Lecture 12: Examples of Networks and Power Laws in Action

The small-world model fits much real data. Most nodes are connected to any other node through a short path, with distance measured as the number of links along the shortest path. A large clustering coefficient C reflects clustered neighborhoods and is much larger than in a random network. The characteristic path length L is much shorter than in a regular network and close to that of a random network. Many such networks are also scale free: the fraction of nodes with k neighbors decays as a power law (after some point), which makes some nodes critical in a hub-and-spoke network.

The examples here show that networks and power laws fit the silly and the serious:

1. Marvel Comic Books

In 2002 Spanish mathematicians analyzed the Marvel Comics database. They made significant Marvel characters the nodes and linked two characters when they jointly appear in the same comic book “after Issue 1 of Fantastic Four (dated November 1961), which is understood as the point of departure of the Marvel Age of Comics.” (http://arxiv.org/abs/cond-mat/0202174). The data came from the Marvel Chronology Project (MCP), which collected over 96 000 appearances by more than 6 500 characters in about 13 000 comic books, and thus yields an extensive picture of the Marvel Universe. In 2011 Amazon made available the Marvel Universe Social Graph (http://aws.amazon.com/datasets/5621954952932508, last updated November 24, 2015).

This can be set up as a bipartite graph with two different types of nodes/vertices, comics and characters. The basic data look like this:

Comic A: Spider-Man, Wolverine, Hulk
Comic B: Spider-Man, Hulk, Hobgoblin
Comic C: Spider-Man, Wolverine, Aquaman, Howard the Duck

3 comics, 6 characters:
average characters per book = (3 + 3 + 4)/3 = 10/3
average books per character = (3 + 2 + 2 + 1 + 1 + 1)/6 = 10/6

The two averages are linked by the identity

average characters per book x # books = average books per character x # characters

because both sides count the total number of appearances (here 10/3 x 3 = 10/6 x 6 = 10).

You can use these data to form a network with characters connected to characters (a collaboration network), with shared comics as the edges. The number of books a character is in is that character's degree in the bipartite graph. Its degree in the character network is its number of distinct partners.
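Here is a minimal Python sketch of the three-comic toy example, assuming the comic-to-character lists above are the whole data set. It checks the averaging identity and builds the one-mode character network. The helper names `books_per_character` and `project_characters` are hypothetical and written only for illustration.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations

# Toy bipartite data from the lecture: each comic maps to the characters in it.
COMICS = {
    "A": ["Spider-Man", "Wolverine", "Hulk"],
    "B": ["Spider-Man", "Hulk", "Hobgoblin"],
    "C": ["Spider-Man", "Wolverine", "Aquaman", "Howard the Duck"],
}


def books_per_character(comics):
    """Character-side degrees of the bipartite graph: # books each character is in."""
    counts = defaultdict(int)
    for cast in comics.values():
        for character in cast:
            counts[character] += 1
    return dict(counts)


def project_characters(comics):
    """One-mode projection: link two characters if they share at least one comic."""
    partners = defaultdict(set)
    for cast in comics.values():
        for a, b in combinations(cast, 2):
            partners[a].add(b)
            partners[b].add(a)
    return {name: sorted(p) for name, p in partners.items()}


books = books_per_character(COMICS)
appearances = sum(len(cast) for cast in COMICS.values())  # edges of the bipartite graph

chars_per_book = Fraction(appearances, len(COMICS))          # 10/3
books_per_char = Fraction(sum(books.values()), len(books))   # 10/6 = 5/3

# Both sides of the identity count the total number of appearances.
assert chars_per_book * len(COMICS) == books_per_char * len(books) == appearances

partners = project_characters(COMICS)
for name in sorted(books):
    n_partners = len(partners.get(name, []))
    print(f"{name}: {books[name]} book(s), {n_partners} partner(s)")
```

Spider-Man is in 3 books but has 5 distinct partners. This is why the number of books and the number of partners are separate distributions.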
You can likewise form a network of comics connected to comics, with shared characters as the edges. The number of characters in a book is its degree in the bipartite graph, so the distribution of characters per book is the book-side degree distribution.

Basic data on appearances of characters in comic books:

Number of characters:  6 486
Number of books:       12 942
Books per character:   14.9 (appearances range from 1 to 1625, for Spider-Man!)
Characters per book:   7.47

The probability that a comic book has k characters in it is $P_b(k) \propto k^{-3.12}$, a power law. Most comic books have a few characters, but a few have a lot. About 50% have 5 or fewer characters and 90% have around 10 or fewer. So the distribution is roughly uniform from 1 to 10, with a power-law tail after 10.

The probability that a character appears in k books is $P_c(k) \propto k^{-0.66}\,10^{-k/1895}$. This distribution is smoother and has a relatively small power-law exponent (0.66), with the exponential factor cutting off very large k. Most characters appear in a few books, but superstar characters appear in many.

Compare the network characteristics to a random network. You could take the 6486 characters as nodes and assign each character randomly to books, keeping each character's number of books (edges) fixed. For example, if the Hulk is in 2 comics, randomly choose which two. The Marvel world diverges greatly from random. The average character collaborates with 52 others, whereas in the random version the average character would collaborate with 176 others.

Table 2. Summary of results of the MU network
Mean partners per character:  51.88
Size of giant component:      6 449 characters (99.42%)
Mean distance:                2.63
Maximum distance:             5
Clustering coefficient:       0.012
Distribution of partners:     $P(k) \propto k^{-0.72}\,10^{-k/2167}$

The giant component is large, at 99.42%, because vertices with high degree tend to connect to vertices with high degree. The clustering coefficient is small and close to that of a random graph, because superheroes dominate the tail. The differences between the MU network and other collaboration networks are a hint of the artificiality of the Marvel Universe.

Paper suggestion: Look at another artificial network, such as pro-wrestling shows or soap operas, where the same characters interact across shows.
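The random-network comparison and the Table 2 measures can be sketched in Python on the same three-comic toy data, assuming comics are given as lists of character names. This is a hypothetical illustration, not the paper's code. `randomize` keeps each character's number of books and draws which books at random, as in the comparison described above. `summarize` computes mean partners, the share of characters in the giant component, mean and maximum distance (by breadth-first search) and the average local clustering coefficient. Which clustering definition the study used is an assumption here, and nodes with fewer than two partners are counted as 0.

```python
import random
from collections import deque
from itertools import combinations

# Hypothetical toy data: the three comics from the bipartite example.
TOY = [
    ["Spider-Man", "Wolverine", "Hulk"],
    ["Spider-Man", "Hulk", "Hobgoblin"],
    ["Spider-Man", "Wolverine", "Aquaman", "Howard the Duck"],
]


def project(comics):
    """Character network: link two characters who share at least one comic."""
    adj = {c: set() for cast in comics for c in cast}
    for cast in comics:
        for a, b in combinations(cast, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def randomize(comics, n_books, rng):
    """Null model: keep each character's number of books, choose the books at random."""
    books_of = {}
    for cast in comics:
        for c in cast:
            books_of[c] = books_of.get(c, 0) + 1
    shuffled = [[] for _ in range(n_books)]
    for c, k in books_of.items():
        for b in rng.sample(range(n_books), k):  # k distinct books for this character
            shuffled[b].append(c)
    return shuffled


def bfs(adj, source):
    """Shortest-path distances (in links) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def summarize(adj):
    n = len(adj)
    # Giant component: the largest set of mutually reachable characters.
    seen, giant = set(), set()
    for node in adj:
        if node not in seen:
            comp = set(bfs(adj, node))
            seen |= comp
            if len(comp) > len(giant):
                giant = comp
    # Distances between all ordered pairs inside the giant component.
    dists = [d for u in giant for v, d in bfs(adj, u).items() if v != u]
    # Local clustering: share of a node's partner pairs that are partners themselves.
    local = []
    for nbrs in adj.values():
        k = len(nbrs)
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        local.append(links / (k * (k - 1) / 2) if k >= 2 else 0.0)
    return {
        "mean_partners": sum(len(nbrs) for nbrs in adj.values()) / n,
        "giant_fraction": len(giant) / n,
        "mean_distance": sum(dists) / len(dists) if dists else 0.0,
        "max_distance": max(dists, default=0),
        "clustering": sum(local) / n,
    }


observed = summarize(project(TOY))
random_summary = summarize(project(randomize(TOY, len(TOY), random.Random(0))))
print("observed:", observed)
print("random:  ", random_summary)
```

On three comics the numbers mean little. The point is the procedure: run `randomize` many times on the full data set and compare the averages with the observed network.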
For a more personal version of this paper, divide your week, treat part of each day as a comic book and count how many characters you interact with. Family and living conditions might come closer to the Marvel world than to the movie and science worlds.

2. Scientists Networks

There are two views of how science proceeds:

1. Individual great scientists/superstars, measured by power laws of papers, citations and collaborators. Stars matter and the rest are bit players. The policy implication is that we reward the superstars and largely ignore the rest, setting up science as a tournament with rewards to individuals, as in tennis, golf and boxing.
2. Team-based science. Science is done in teams with many authors, with PIs giving guidance but with postdoctoral researchers, graduate students and others contributing significantly. The policy implication is that we should worry about team composition and the incentives of team members.

Evidence for the first view comes from power laws.

1) Number of papers. Lotka's (1926) law of inverse square of scientific productivity states that the number of scientists N who publish P papers (in his case in chemistry and physics) is $N = aP^{-2}$, so that $\ln N = \ln a - 2\ln P$.

# papers   # of scientists
1          100
2          25
3          11
4          6
5          4

Different fields get different coefficients, but usually around 2 (the coefficient for economics is 1.84). There is a lot of curve fitting in this business. Some use Tsallis statistics, a family of distributions with a parameter that adjusts so that the distribution has properties intermediate between Gaussian and Lévy statistics. Some use a stretched exponential.

2) Citations to papers. Some papers get a lot of citations and some don't.

3) Cites to the top 10 physicists [table: Most Cited Physicists, 1981-June 1997, out of over 500,000 examined].

The second view: the collaborative network

Science is a giant collaborative network, with a giant connected component in which all scientists are linked up. Would we find astrologers linked? Creationists?
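As an illustration of the curve fitting behind Lotka's law, a small Python sketch can regenerate the papers/scientists table from $N = aP^{-2}$ with a = 100 (the value implied by the one-paper row). It then recovers the exponent by ordinary least squares on $\ln N$ versus $\ln P$. The helpers `lotka_table` and `fit_power_law` are hypothetical. Least squares on log-log axes is only the simplest approach: maximum-likelihood estimators are generally preferred for power-law exponents. The Tsallis and stretched-exponential alternatives are not shown.

```python
import math


def lotka_table(a, alpha, max_papers):
    """Number of scientists publishing P papers under N = a * P**(-alpha), rounded."""
    return {p: round(a * p ** -alpha) for p in range(1, max_papers + 1)}


def fit_power_law(counts):
    """OLS fit of ln N = ln a - alpha * ln P; returns the estimates (a, alpha)."""
    xs = [math.log(p) for p in counts]
    ys = [math.log(n) for n in counts.values()]
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return math.exp(y_bar - slope * x_bar), -slope


table = lotka_table(a=100, alpha=2, max_papers=5)
print(table)  # {1: 100, 2: 25, 3: 11, 4: 6, 5: 4}, the lecture's table
a_hat, alpha_hat = fit_power_law(table)
print(f"a = {a_hat:.1f}, alpha = {alpha_hat:.2f}")
```

Because the table entries are rounded to whole scientists, the fitted exponent comes out close to, but not exactly, 2. A field-specific coefficient such as 1.84 for economics would be estimated the same way from that field's counts.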
Features of the collaborative network:
- Clustering coefficients differ, reflecting different modes of research. Biologists have more, smaller clusters than physicists.
- There is a rising trend in co-authored papers.
- Papers with more authors get more cites.
- Papers with more references get more cites.
- Homophily in paper writing and in citations: national/ethnic/gender groups are more likely to write together and to cite their own group.
- A small number of extremely well-connected scientists serve as "brokers" for communication between others, with most connections among collaborators passing through them.
- Most of a particular scientist's connections to others in the field run through only one or two collaborators. Hence: power to the connector.

Science is a small world. Mathematicians are separated from one another by 7.6 links on average, while the 1.6 million biomedical researchers in the analysis were separated by only four links.

THE ECONOMICS in networks relates to the decision to do collaborative research. There are two decisions:
- whether to write a paper alone or with 1, 2, ..., N others;
- which other papers to cite.
Both matter because the number of papers and the number of cites are important in promotion decisions.

Why collaborate? Time is the chief input. If writing a paper with you takes me half the time and I get full credit for the paper, wow! But more likely I get half credit. Half the credit for half the time gives the same credit per unit of time as writing alone. So there is no reason to write with you unless the joint paper actually has a productive payoff. We imagine collaboration works best with comparative advantage: I do the blue sky, you do the heavy math, I do the writing. We mix biology and engineering to create a Frankenstein monster.

Can tweets predict citations?

Other Networks
Sports: www.baseball-reference.com/oracle/ and www.math.uaa.alaska.edu/~afkjm/nbawelcome.html
Game of Thrones: http://www.lyonwj.com/2016/06/26/graph-of-thrones-neo4j-social-network-analysis/
Vikings: http://nautil.us/blog/vikingstheyre-just-like-us-social-networks-in-norse-sagas
Criminal networks: https://papers.ssrn.com/sol3/papers2.cfm?abstract_id=945369
Financial control of multinationals: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0025995

Does knowing that people form social networks help policy? If we could identify the 2-3 most connected terrorists, could we use that to weaken the terrorist network? Say you removed the 2-3 top terrorists: what would happen? If the network is an "endogenous" structure, 2-3 new people would move into the jobs of those removed. Say you broke up big banks?
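The removal question can be made concrete with a hypothetical Python sketch on a toy hub-and-spoke network: three hubs linked in a ring, each with five spokes. All sizes are assumptions, not data. Removing the two best-connected nodes shrinks the largest connected component from 18 to 6 nodes. The `replace_hubs` step is an assumed, deliberately simple model of the endogenous response, in which a newcomer inherits each removed node's remaining links.

```python
from collections import deque


def hub_and_spoke(n_hubs, spokes_per_hub):
    """Hypothetical toy network: hubs linked in a ring, each with its own spokes."""
    adj = {}

    def link(a, b):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    hubs = [f"hub{i}" for i in range(n_hubs)]
    for i, hub in enumerate(hubs):
        link(hub, hubs[(i + 1) % n_hubs])
        for j in range(spokes_per_hub):
            link(hub, f"{hub}-spoke{j}")
    return adj


def largest_component(adj, removed=frozenset()):
    """Size of the largest connected component once the `removed` nodes are deleted."""
    seen, best = set(removed), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        size, queue = 0, deque([start])
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best


def replace_hubs(adj, removed):
    """Assumed endogenous response: a newcomer takes over each removed node's links."""
    new = {n: nbrs - removed for n, nbrs in adj.items() if n not in removed}
    for old in removed:
        newcomer = f"{old}-replacement"
        new[newcomer] = adj[old] - removed
        for v in new[newcomer]:
            new[v].add(newcomer)
    return new


net = hub_and_spoke(n_hubs=3, spokes_per_hub=5)
top = set(sorted(net, key=lambda n: len(net[n]), reverse=True)[:2])  # two best-connected nodes
print(largest_component(net))                     # 18: everyone is connected
print(largest_component(net, removed=top))        # 6: only one hub's star survives
print(largest_component(replace_hubs(net, top)))  # 18: newcomers reconnect the network
```

The static case and the replacement case bracket the lecture's question. Which one real networks resemble depends on how quickly removed roles get refilled, which this sketch does not model.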