Network Science
Data-driven Pattern Discovery using Network Science
Frank Takes
LIACS, Leiden University
D4N Meeting | January 27, 2017

Network Science | D4N Meeting | January 27, 2017 1 / 50

Data
Data: facts, measurements or text collected for reference or analysis (Oxford dictionary)
Unstructured data: data that does not fit a certain data structure (text, some numeric measurements)
Structured data: data that fits a certain data structure (table, tree, network, etc.)

Data → Network Science
Data
Data Analysis
Data Mining
Pattern Discovery
Data Science
Big Data
Network science: analyzing "big" structured data consisting of objects connected via certain relationships, in short: networks
Interest from: mathematics, computer science, physics, biology, economics, social sciences, ...

Networks
Network/graph: objects and relationships, $G = (V, E)$
Objects/entities/nodes/vertices: $|V| = n$
Relationships/ties/links/edges: $|E| = m$
Data attributes are annotations on the nodes and the edges
Enrich using labels, weights and multiple node and edge types
Examples:
  Online social networks
  Scientific citation and collaboration networks
  Webgraphs
  Biological networks
  Communication networks
  Corporate networks

Figure: One-mode labeled network
Source: http://web.stanford.edu/class/cs224w

Figure: Two-mode weighted network
Source: http://toreopsahl.com

Figure: LIACS collaboration network

Network science
Network science: understanding data by investigating interactions and relationships between individual data objects as a network
Networks are the central model of computation
Branch of data science focusing on network data
Method in complexity research
Complex systems approach: the behavior emerging from the network reveals patterns not visible when studying the individuals

Example: PPI (protein-protein interaction) network
1706 proteins
6207 interactions

Figure: Visualization of PPI network

Real-world networks
Topological characteristics:
  1. Density
  2. Degree: power law
  3. Components: giant component
  4. Distance: small world
  5. Clustering coefficient

Degree
Figure: Undirected network (nodes u, v, w, x, y, z)
Figure: Directed network (nodes u, v, w, x, y, z)
Undirected networks: degree, $\deg(v) = 5$
Directed networks:
  Indegree: $\mathrm{indeg}(v) = 4$
  Outdegree: $\mathrm{outdeg}(v) = 3$
Degree distribution: frequency of each degree value. Follows a power law distribution with a "fat tail"

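
The degree definitions above can be illustrated with a minimal sketch. The edge list below is hypothetical: the figure on the slide is not recoverable, so the edges are chosen only so that node v reproduces the values stated on the slide (indegree 4, outdegree 3, undirected degree 5). The function names are likewise illustrative and not taken from the talk.

```python
from collections import Counter

# Hypothetical directed edge list (source, target). The slide's figure is not
# recoverable, so these edges are chosen only to reproduce its stated values.
directed_edges = [
    ("u", "v"), ("w", "v"), ("x", "v"), ("y", "v"),  # four edges into v
    ("v", "x"), ("v", "y"), ("v", "z"),              # three edges out of v
    ("u", "w"), ("y", "z"),                          # extra edges among the rest
]


def in_out_degrees(edges):
    """Count incoming and outgoing edges per node."""
    indeg, outdeg = Counter(), Counter()
    for source, target in edges:
        outdeg[source] += 1
        indeg[target] += 1
    return indeg, outdeg


def undirected_degrees(edges):
    """Degree per node after ignoring direction and merging reciprocal edges."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return {node: len(adjacent) for node, adjacent in neighbors.items()}


def degree_distribution(degrees):
    """Map each degree value to the number of nodes that have it."""
    return dict(sorted(Counter(degrees.values()).items()))


indeg, outdeg = in_out_degrees(directed_edges)
deg = undirected_degrees(directed_edges)
distribution = degree_distribution(deg)

print(indeg["v"], outdeg["v"], deg["v"])
print(distribution)
```

On a graph this small the distribution says little; on large real-world networks the same counting step is what produces the degree histogram that is then inspected for a heavy tail.
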
Figure: Outdegree distribution

Figure: Indegree distribution

Figure: Giant component

Figure: Components in PPI network

Figure: Distance in PPI network

Topics in Network Science
  Graph Representation and Structure
  Network Modeling
  Link Prediction
  Spidering and Sampling
  Centrality
  Visualization Algorithms
  Graph Compression
  Community Detection
  Diffusion
  Contagion, Gossiping and Virality
  Privacy, Anonymity and Ethics

Centrality
Given a social network, which person is most important?
What is the most important page on the web?
Which protein is most vital in a biological network?
Who is the most respected author in a scientific citation network?
What is the most crucial router in an internet topology network?

Degree centrality
Undirected graphs: degree centrality measures the number of adjacent nodes, normalized by the maximum possible degree
  $C_d(v) = \frac{\deg(v)}{n - 1}$
Directed graphs: indegree centrality and outdegree centrality
Local measure
O(1) time to compute per node when node degrees are stored

Figure: Character co-occurrence network. Node size based on degree.

Closeness centrality
Closeness centrality: the average distance to each other node in the graph
  $C_c(v) = \frac{1}{n - 1} \sum_{w \in V} d(v, w)$
where $d(v, w)$ is the length of a shortest