
Analysis and visualization of large networks with Pajek

Vladimir Batagelj
University of Ljubljana

[Figure: Vienna, St. Stephen’s Cathedral]

Methodenforum der Fakultät für Sozialwissenschaften
Universität Wien, 21-22. 6. 2007

version: June 18, 2007 / 03:11

Outline

• Networks
• Complexity of algorithms
• Pajek
• Approaches to large networks
• Statistics
• Clusters, clusterings, partitions, hierarchies
• Representations of properties
• Example: Snyder and Kick World Trade
• Clustering
• Contraction of cluster
• Subgraph
• Important vertices in network
• Dense groups
• Connectivity
• Cuts
• Citation weights
• k-rings
• Islands
• Bipartite cores
• Directed 4-rings
• Pattern searching
• Multiplication of networks
• Networks from data tables
• EU projects on simulation
• What else?
http://vlado.fmf.uni-lj.si/pub/networks/doc/tut/Vienna07.pdf

Networks

A network is based on two sets: a set of vertices (nodes) that represent the selected units, and a set of lines (links) that represent ties between units. Together they determine a graph. A line can be directed (an arc) or undirected (an edge).

Additional data about vertices or lines can be known: their properties (attributes). For example: name/label, type, value, ...

[Figure: Alexandra Schuler / Marion Laging-Glaser: Analyse von Snoopy Comics]

Network = Graph + Data

The data can be measured or computed.

Networks / Formally

A network N = (V, L, P, W) consists of:
• a graph G = (V, L), where V is the set of vertices and L = E ∪ A is the set of lines; A is the set of arcs and E is the set of edges; n = |V|, m = |L|
• P, the vertex value functions / properties: p : V → A
• W, the line value functions / weights: w : L → B

Example: Wolfe Monkey Data

inter.net       sex.clu        age.vec        rank.per
*Vertices 20    *vertices 20   *vertices 20   *vertices 20
1 "m01"         1              15             1
2 "m02"         1              10             2
3 "m03"         1              10             3
4 "m04"         1              8              4
5 "m05"         1              7              5
6 "f06"         2              15             10
7 "f07"         2              5              11
8 "f08"         2              11             6
9 "f09"         2              8              12
10 "f10"        2              9              9
11 "f11"        2              16             7
12 "f12"        2              10             8
13 "f13"        2              14             18
14 "f14"        2              5              19
15 "f15"        2              7              20
16 "f16"        2              11             13
17 "f17"        2              7              14
18 "f18"        2              5              15
19 "f19"        2              15             16
20 "f20"        2              4              17

The file inter.net continues with the *Edges section:

*Edges
1 2 2
1 3 10
1 4 4
1 5 5
1 6 5
1 7 9
1 8 7
1 9 4
1 10 3

Important notes: 0 is not allowed as a vertex number.
Pajek does not support Unix text files: lines must end with CR LF.

The *Edges list of inter.net continues:

1 11 3
1 12 7
1 13 3
1 14 2
1 15 5
1 16 1
1 17 4
1 18 1
2 3 5
2 4 1
2 5 3
2 6 1
2 7 4
2 8 2
2 9 6
2 10 2
2 11 5
2 12 4
2 13 3
2 14 2
2 15 2
2 16 6
2 17 3
2 18 1
2 19 1
3 4 8
3 5 9
3 6 5
3 7 11
3 8 7
3 9 8
3 10 8
3 11 14
3 12 17
3 13 9
3 14 11
3 15 11
3 16 5
3 17 9
3 18 4
...

Size of network

The size of a network/graph is expressed by two numbers: the number of vertices $n = |V|$ and the number of lines $m = |L|$.

In a simple undirected graph (no parallel edges, no loops) $m \le \frac{1}{2} n(n-1)$; in a simple directed graph (no parallel arcs) $m \le n^2$.

The quotient $\gamma = m / m_{\max}$ is the density of the graph.

Small networks (some tens of vertices) can be represented by a picture and analyzed by many algorithms (UCINET, NetMiner).

Medium-size networks (some hundreds of vertices) can still be represented by a picture (!?), but some analytical procedures can no longer be used.

Until 1990 most networks were small: they were collected by researchers using surveys, observations, archival records, ... Advances in IT made it possible to create networks from data already available in computers. Large networks became a reality.

Large networks are too big to be displayed in detail; special algorithms are needed for their analysis (Pajek).

Large Networks

Large network: several thousands or millions of vertices. It can be stored in the computer’s memory; otherwise it is a huge network.

Usually sparse, $m \ll n^2$; typical: $m = O(n)$ or $m = O(n \log n)$.
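The size bounds and the density quotient can be checked with a few lines of Python. This is only an illustrative sketch: the function names `max_lines` and `density` are made up for this example and are not part of Pajek. The first call uses n = 20 from the Wolfe monkey example; the last one uses the n and m given for the US patents network in the Pajek datasets table, and treating that network as a simple directed graph is an assumption.

```python
def max_lines(n: int, directed: bool = False) -> int:
    """Upper bound m_max on the number of lines of a simple graph on n vertices."""
    # Directed, no parallel arcs (loops allowed): n^2.
    # Undirected, no parallel edges, no loops: n(n-1)/2.
    return n * n if directed else n * (n - 1) // 2


def density(n: int, m: int, directed: bool = False) -> float:
    """Density gamma = m / m_max."""
    return m / max_lines(n, directed)


# Undirected bound for the 20 monkeys.
print(max_lines(20))

# A large, sparse network: the density is tiny because m grows roughly like n.
print(density(3774768, 16522438, directed=True))
```

The second value is on the order of one millionth, which is what "usually sparse" means in practice.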
network                 size     n = |V|      m = |L|   source
ODLIS dictionary         61K        2909        18419   ODLIS online
Citations SOM           168K        4470        12731   Garfield’s collection
Molecula 1ATN            74K        5020         5128   Brookhaven PDB
Comput. geometry        140K        7343        11898   BibTeX bibliographies
English words 2-8       520K       52652        89038   Knuth’s English words
Internet traceroutes    1.7M      124651       207214   Internet Mapping Project
Franklin genealogy       12M      203909       195650   Roperld.com gedcoms
World-Wide-Web          3.6M      325729      1497135   Notre Dame Networks
Internet Movie DB     113.6M     1324748      3792390   IMDB
Wikipedia              53.8M      659388     16582425   Wikimedia
US patents               82M     3774768     16522438   NBER
SI internet              38M     5547916     62259968   Najdi Si

Pajek datasets.

Complexity of algorithms

From some thousands to some (tens of) millions of units (vertices).

Let us look at the time complexities of some typical algorithms:

          T(n)              1,000    10,000    100,000   1,000,000   10,000,000
LinAlg    $O(n)$            0.00 s   0.015 s   0.17 s    2.22 s      22.2 s
LogAlg    $O(n \log n)$     0.00 s   0.06 s    0.98 s    14.4 s      2.8 m
SqrtAlg   $O(n \sqrt{n})$   0.01 s   0.32 s    10.0 s    5.27 m      2.78 h
SqrAlg    $O(n^2)$          0.07 s   7.50 s    12.5 m    20.8 h      86.8 d
CubAlg    $O(n^3)$          0.10 s   1.67 m    1.16 d    3.17 y      3.17 ky

For interactive use on large graphs, even quadratic algorithms, $O(n^2)$, are too slow.

Pajek

The main goals in the design of Pajek are:
• to support abstraction by (recursive) decomposition of a large network into several smaller networks that can be treated further using more sophisticated methods;
• to provide the user with some powerful visualization tools;
• to implement a selection of efficient subquadratic algorithms for analysis of large networks.
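The emphasis on subquadratic algorithms follows from simple arithmetic on the timing table in the section on complexity of algorithms. The sketch below scales one measured time to a larger n, assuming a pure power law $T(n) \propto n^k$; the helper name `extrapolate` is made up for this illustration, and the two starting values (7.50 s and 2.22 s) are taken from that table.

```python
def extrapolate(t_known: float, n_known: int, n_target: int, exponent: float) -> float:
    """Scale a time measured at n_known to n_target for an O(n^exponent) algorithm."""
    return t_known * (n_target / n_known) ** exponent


# Quadratic algorithm: 7.50 s at n = 10,000, scaled to n = 1,000,000.
sq_seconds = extrapolate(7.50, 10_000, 1_000_000, 2)
print(sq_seconds / 3600, "hours")

# Linear algorithm: 2.22 s at n = 1,000,000, scaled to n = 10,000,000.
lin_seconds = extrapolate(2.22, 1_000_000, 10_000_000, 1)
print(lin_seconds, "seconds")
```

A hundredfold increase in n costs a quadratic algorithm a factor of 10,000, which turns seconds into most of a day, while a linear algorithm only becomes proportionally slower.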
With Pajek we can:
• find clusters (components, neighbourhoods of ‘important’ vertices, cores, etc.) in a network;
• extract vertices that belong to the same clusters and show them separately, possibly with parts of the context (detailed local view);
• shrink vertices in clusters and show relations among clusters (global view).

Pajek’s data types

In Pajek, analysis and visualization are performed using 6 data types:
• network (graph),
• partition (nominal or ordinal properties of vertices),
• vector (numerical properties of vertices),
• cluster (subset of vertices),
• permutation (reordering of vertices, ordinal properties), and
• hierarchy (general tree structure on vertices).

Pajek also supports multi-relational, temporal and two-mode networks.

The power of Pajek is based on several transformations that support different transitions among these data structures. The menu structure of Pajek’s main window is also based on them.

Pajek’s main window uses a ‘calculator’ paradigm with a list-accumulator for each data type. The operations are performed on the currently active (selected) data and also return their results through the accumulators.

The procedures are available through the main window menus. Frequently used sequences of operations can be defined as macros.
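The local view (extract) and the global view (shrink) described above can be sketched in plain Python, using a network and a partition as the two data types involved. This is not Pajek's implementation: the toy data and the function names `extract` and `shrink` are made up for illustration, and summing the weights of lines between classes is one possible choice for the shrunk network.

```python
from collections import Counter

# Network: undirected lines as (u, v, weight); vertices are numbered from 1.
edges = [(1, 2, 2), (1, 3, 10), (2, 3, 5), (3, 4, 8), (4, 5, 1)]

# Partition: a class number for every vertex.
partition = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2}


def extract(edges, partition, cluster):
    """Local view: keep only the lines with both endpoints in the given class."""
    return [(u, v, w) for u, v, w in edges
            if partition[u] == cluster and partition[v] == cluster]


def shrink(edges, partition):
    """Global view: one vertex per class; weights of lines between classes are summed."""
    totals = Counter()
    for u, v, w in edges:
        cu, cv = partition[u], partition[v]
        if cu != cv:  # lines inside a class disappear in the shrunk network
            totals[tuple(sorted((cu, cv)))] += w
    return dict(totals)


print(extract(edges, partition, 1))
print(shrink(edges, partition))
```

Both functions take a network and a partition and return a new, smaller network, which mirrors the idea of transformations between data types.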