Introduction to Graphs ・V = Nodes
Total Page:16
File Type:pdf, Size:1020Kb
Undirected graphs Notation. G = (V, E) Introduction to Graphs ・V = nodes. ・E = edges between pairs of nodes. ・Captures pairwise relationship between objects. Tyler Moore ・Graph size parameters: n = | V |, m = | E |. CS 2123, The University of Tulsa V = { 1, 2, 3, 4, 5, 6, 7, 8 } Some slides created by or adapted from Dr. Kevin Wayne. For more information see E = { 1-2, 1-3, 2-3, 2-4, 2-5, 3-5, 3-7, 3-8, 4-5, 5-6, 7-8 } http://www.cs.princeton.edu/~wayne/kleinberg-tardos m = 11, n = 8 3 2 / 36 One week of Enron emails The evolution of FCC lobbying coalitions 4 “The Evolution of FCC Lobbying Coalitions” by Pierre de Vries in JoSS Visualization Symposium 2010 5 3 / 36 4 / 36 The Spread of Obesity in a Large Social Network Over 32 Years educational level; the ego’s obesity status at the ing both their weights. We estimated these mod- previous time point (t); and most pertinent, the els in varied ego–alter pair types. alter’s obesity status at times t and t + 1.25 We To evaluate the possibility that omitted vari- used generalized estimating equations to account ables or unobserved events might explain the as- for multiple observations of the same ego across sociations, we examined how the type or direc- examinations and across ego–alter pairs.26 We tion of the social relationship between the ego assumed an independent working correlation and the alter affected the association between the structure for the clusters.26,27 ego’s obesity and the alter’s obesity. For example, The use of a time-lagged dependent variable if unobserved factors drove the association be- (lagged to the previous examination) eliminated tween the ego’s obesity and the alter’s obesity, serial correlation in the errors (evaluated with a then the directionality of friendship should not Lagrange multiplier test28) and also substantial- have been relevant. ly controlled for the ego’s genetic endowment and We evaluated the role of a possible spread in any intrinsic, stable predisposition to obesity. The smoking-cessation behavior as a contributor to use of a lagged independent variable for an alter’s the spread of obesity by adding variables for the weight status controlled for homophily.25 The smoking status of egos and alters at times t and key variable of interest was an alter’s obesity at t + 1 to the foregoing models. We also analyzed time t + 1. A significant coefficient for this vari- the role of geographic distance between egos Social Network as a Graph Framinghamable would heart suggest study either that an alter’s weight and alters by adding such a variable. affected an ego’s weight or that an ego and an We calculated 95% confidence intervals by sim- alter experienced contemporaneous events affect- ulating the first difference in the alter’s contem- Alice Nodes: people Edges: friendship Bob Charlie Dawn Eve Figure 1. Largest Connected Subcomponent of the Social Network in the Framingham Heart Study in the Year 2000. Each circle (node) represents one person in the data set. There are 2200 persons in this subcomponent of the social network. Circles with red borders denote women, and circles with blue borders denote men. The size of each circle Fred George is proportional to the person’s body-mass index. The interior color of the circles indicates the person’s obesity status: yellow denotes an obese person (body-mass index, ≥30) and green denotes a nonobese person. The colors of the ties between the nodes indicate the relationship between them: purple denotes a friendship or marital tie and orange denotes a familial tie. “The Spread of Obesity in a Large Social Network over 32 Years” by Christakis and Fowler in New England Journal of Medicine, 2007 6 373 n engl j med 357;4 www.nejm.org july 26, 2007 5 / 36 6 / 36 Road Network as a Graph Electronic Circuits as a Graph 10Ω Nodes: intersections A Main St Edges: roads vx S 20Ω 5Ω vx 1st Ave 5 Main St 2nd Ave B 3rd Ave Elm St Aspen Ave 5th Ave Main St Vertices: junctions Park St 5th Ave Edges: components 7 / 36 8 / 36 Course Dependencies as a Graph Maze as a Graph CS1043 CS2003 CS2033 CS2123 Nodes: courses Edges: prerequisites CS3053 CS4333 Nodes: rooms; edges: doorways 10 / 36 11 / 36 Flavors of Graphs Directed vs. Undirected Graphs A Undirected A Directed The first step in any graph problem is recognizing that you have a B C B C graph problem It's not always obvious! D E D E The second step in any graph problem is determining which flavor of F G F G graph you are dealing with Learning to talk the talk is an important part of walking the walk The flavor of graph has a big impact on which algorithms are A graph G = (V ; E) is undirected if edge (x; y) 2 E implies that appropriate and efficient (y; x) is also in E Road networks between cities are typically undirected Street networks within cities are almost always directed because of one-way streets What about online social networks? 12 / 36 13 / 36 Bank Account # Payment Amount Chess piece at board position Valid move code blocks control flow functions function call Exercise 1: Spot the graph Exercise 2: Spot the graph Pay From Pay To Amount Date 12354 67324 $1000 Sep 1 12398 67324 $500 Sep 4 67324 45721 $750 Sep 7 78923 12398 $500 Sep 8 Nodes: Edges: Nodes: Edges: 14 / 36 15 / 36 Exercise 3: Spot the graph Exercise 4: Spot the graph def foo(arg1): x = ['ham','spam','glam','tram'] x = arg1+2 for a in x: bar(x) print a if a =='ham': def bar(arg1,arg2): print 'awesome' y = arg1+arg2 else: print 'terrible' ham(y) print 'moving on' def ham(arg1): bar(2*arg1) Nodes: Edges: Nodes: Edges: 16 / 36 17 / 36 Some graph applications Some directed graph applications graph node edge directed graph node directed edge transportation street intersection one-way street communication telephone, computer fiber optic cable web web page hyperlink circuit gate, register, processor wire food web species predator-prey relationship mechanical joint rod, beam, spring WordNet synset hypernym financial stock, currency transactions scheduling task precedence constraint transportation street intersection, airport highway, airway route financial bank transaction internet class C network connection cell phone person placed call game board position legal move infectious disease person infection game board position legal move social relationship person, actor friendship, movie cast citation journal article citation neural network neuron synapse object graph object pointer protein network protein protein-protein interaction inheritance hierarchy class inherits from molecule atom bond control flow code block jump 7 39 18 / 36 19 / 36 Weighted vs. Unweighted Graphs Cyclic vs. Acyclic Graphs A Unweighted A Weighted 10 A Cyclic A Acyclic 4 B C 1 B C 5 B C B C 7 8 19 D E D E 4 6 D D 10 8 F G F G F G F G In weighted graphs, each edge (or vertex) of G is assigned a An acyclic graph does not contain any cycles numerical value, or weight Trees are connected, acyclic, undirected graphs The edges of a road network graph might be weighted with their Directed acyclic graphs are called DAGs length, drive-time or speed limit DAGs arise naturally in scheduling problems, where a directed edge In unweighted graphs, there is no cost distinction between various (x; y) indicates that x must occur before y. edges and vertices 20 / 36 21 / 36 Labeled vs. Unlabeled Graphs More Graph Terminology A Labeled Unlabeled B C A path is a sequence of edges connecting two vertices Two common problems: (1) does a path exist between A and B? (2) D E what is the shortest path between A and B? A graph is connected if there is a path between any two vertices F G A directed graph is strongly connected if there is a directed path between any two vertices. In labeled graphs, each vertex is assigned a unique name or identifier The degree of a vertex is the number of edges adjacent to it to distinguish it from all other vertices. An important graph problem is isomorphism testing: determining whether the topological structure of two graphs are in fact identical if we ignore any labels. 22 / 36 23 / 36 Graph representation: adjacency matrix Graph representation: adjacency lists Adjacency matrix. n-by-n matrix with Auv = 1 if (u, v) is an edge. Adjacency lists. Node indexed array of lists. ・Two representations of each edge. ・Two representations of each edge. degree = number of neighbors of u ・Space proportional to n2. ・Space is Θ(m + n). ・Checking if (u, v) is an edge takes Θ(1) time. ・Checking if (u, v) is an edge takes O(degree(u)) time. ・Identifying all edges takes Θ(n2) time. ・Identifying all edges takes Θ(m + n) time. 1 2 3 1 2 3 4 5 6 7 8 1 0 1 1 0 0 0 0 0 2 1 3 4 5 2 1 0 1 1 1 0 0 0 3 1 2 5 7 8 3 1 1 0 0 1 0 1 1 4 2 5 4 0 1 0 0 1 0 0 0 5 0 1 1 1 0 1 0 0 5 2 3 4 6 6 0 0 0 0 1 0 0 0 6 5 7 0 0 1 0 0 0 0 1 7 3 8 8 0 0 1 0 0 0 1 0 8 3 7 8 9 24 / 36 25 / 36 Graph Data Structures Graph Data Structures Option 1: Adjacency Sets Option 2: Adjacency Lists a, b, c, d, e, f, g, h= range ( 8 ) a, b, c, d, e, f, g, h= range ( 8 ) N = [ N = [ fb , c , d , e , f g , # a [b, c, d, e, f], # a fc , e g , # b [ c , e ] , # b fd g , # c [ d ] , # c b c g f e g , # d b c g [ e ] , # d f f g , # e [ f ] , # e fc , g , h g , # f [ c , g , h ] , # f f f , h g , # g [ f , h ] , # g a f f f , gg # h a f [ f , g ] # h ] ] >>> b i n N[ a ] >>> b i n N[ a ] d e h # Neighborhood membership d e h # Neighborhood membership True True >>> l e n (N[ f ] ) # Degree >>> l e n (N[ f ] ) # Degree 3 3 26 / 36 26 / 36 Graph Data Structures Graph Data Structures Option 3: Adjacency Dictionaries w/ Edge Weights Option 4: Dictionaries a, b, c, d, e, f, g, h= range ( 8 ) w/ Adjacency