arXiv:2105.06318v1 [cs.SI] 13 May 2021

Weighted directed clustering: interpretations and requirements for heterogeneous, inferred, and measured networks

Tanguy Fardet1,2 and Anna Levina1,2
1 University of Tübingen, Tübingen, Germany
2 Max Planck Institute for Biological Cybernetics, Tübingen, Germany

arXiv:2105.06318v2 [cs.SI] 29 Aug 2021

Weights and directionality of the edges carry a large part of the information we can extract from a complex network. However, many network measures were formulated initially for undirected binary networks. The necessity to incorporate information about the weights led to the conception of multiple extensions, particularly for definitions of the local clustering coefficient discussed here. We uncover that not all of these extensions are fully-weighted; some depend on the degree and thus change substantially when an edge of infinitesimally small weight is exchanged for the absence of an edge, a feature that is not always desirable. We call these methods "hybrid" and argue that, in many situations, one should prefer fully-weighted definitions. After listing the necessary requirements for a method to analyze a wide variety of weighted networks properly, we propose a fully-weighted continuous clustering coefficient that satisfies all the previously proposed criteria while also being continuous with respect to vanishing weights. We demonstrate that the Zhang–Horvath clustering and our new continuous definition have complementary behavior and meaning, and that they significantly outperform other definitions in multiple relevant conditions. Using synthetic and real-world examples, we show that when the network is inferred, noisy, or very heterogeneous, it is essential to use the fully-weighted clustering definitions.

CONTENTS

I. Introduction
II. Interpretation and purpose of weighted clustering
  A. Desired properties of weighted clustering coefficients
  B. State of the art for weighted clustering
  C. A continuous definition for weighted clustering and closure
  D. Directed weighted clustering
III. The advantages of fully-weighted definitions
  A. Sensitivity to weight-encoded topological features
  B. Continuity and resilience to noise
IV. Application to real world networks
  A. Mouse mesoscale connectome
  B. Decentralized social media: the Fediverse
  C. Using local clustering to infer dynamical properties
V. Discussion
Acknowledgments
References
A. Limitations of other fully-weighted definitions
  1. Holme et al. (2007)
  2. Miyajima and Sakuragawa (2014)
B. Comparison of clustering properties
C. Derivation of the evolution of hybrid clustering coefficients
  1. Barrat
  2. Onnela
  3. Directed versions of the clustering coefficients
D. Closure
  1. Undirected weighted closure
  2. Directed weighted closure
E. Network generation algorithms
  1. Core-periphery network
  2. Watts–Strogatz
F. Real-world networks
  1. Mouse mesoscale connectome
  2. Fediverse mesoscale network
  3. Closure in the shuffled networks
  4. Networks with a high number of single-node triangles

I. INTRODUCTION

The clustering coefficient (CC) was originally introduced for binary undirected networks to quantify strong connectedness within a local neighborhood. It was defined as the fraction of all possible triangles that were realized, i.e., the ratio between all triangles in which node $i$ participates ($n_{\Delta,i}$) and the total number of triangles that could theoretically be made given its degree $d_i$, which is the number of triplets ($n_{T,i}$):

$$C_i^{\mathrm{bin}} = \frac{n_{\Delta,i}}{n_{T,i}} = \frac{n_{\Delta,i}}{d_i (d_i - 1)} \tag{1}$$

From a neighbor-centric perspective, it can be seen perhaps more intuitively as the probability that two neighbors of a node are connected. However, as network science expanded, more and more graphs were encountered where directedness and edge weights play a central role. Generalizations of the clustering coefficient were therefore introduced to account for asymmetry in the connections between pairs of nodes or heterogeneity in their strength.

The importance of clustering, including its directed variants, for understanding complex dynamics on networks has been stressed in multiple studies [1–4]. This is notably the case for the middleman motif, which is a marker of feedforward loops in transcriptional networks and of information transfer redundancy, e.g., in neuroscience. More generally, such motifs will influence the evolution of dynamical processes on the networks, for instance synchronization patterns, and have been shown to characterize families of networks such as transcription or language networks [2]. Finally, clustering is used in other measurements to assess the small-world propensity of networks [5], and the choice of a specific definition can therefore influence whether the network of interest will register as small-world or not.

In many applications, network topology and weights are measured only up to a certain precision [6, 7]. For example, in neuroscience, functional connectivity networks are measured using the indirect inference of connections from the recorded activity [8, 9]. Accepting the inevitability of noise in a network brings forward new requirements on the network measures, namely that they are stable to the noise and do not change dramatically if the weights are perturbed or weak connections are randomly omitted.

There is no agreement among researchers on which weighted extension of the clustering coefficient definition is most appropriate. The three predominantly used methods at the moment [10–12] differ in many properties of their definitions. Part of the reason for the absence of a single best weighted clustering lies in the different interpretations of weights in various datasets. Consequently, a different weighted extension might be most appropriate for different data and specific scientific questions. However, to understand which method to use, when, and why, we need to understand their differences precisely.

The difficulty of extending graph measures to weighted networks is not specific to the clustering coefficient but can occur whenever ratios of degrees or path-lengths are involved. We will therefore also discuss a second clustering-related measure, called the closure coefficient and introduced as the fraction of all open walks of length 2 starting from node $i$ that are part of a triangle [13]. This will also enable us to discuss the complementarity of closure and clustering, as the former provides an important complement to analyze the tendency of nodes to form 3- and 4-cliques.

We introduce here a distinction between fully-weighted and hybrid definitions and discuss why, for several classes of networks, fully-weighted and directed definitions should be preferred to other clustering definitions that are currently used for network analysis. We also propose a new definition that obeys additional conditions, including continuity of the results with respect to infinitesimal changes in edge weights, which has significant consequences for the resilience to noise in inferred networks. We demonstrate why fully-weighted methods are essential for measured and inferred networks, which are pervasive in biological fields such as neuroscience, and for networks dealing with flows of information, money, or goods that display a very broad weight distribution.

II. INTERPRETATION AND PURPOSE OF WEIGHTED CLUSTERING

A. Desired properties of weighted clustering coefficients

Weighted measures are crucial for many network types where the binary connectivity is either uninformative (fully connected network) or displays similar or lower heterogeneity compared to the weighted structure. In this study, we focus on two classes of real-world networks: inferred or measured networks, where there can be a large number of spurious (false positive) edges with small weights; and networks associated with flows of information or goods, which often display broad weight distributions. This is notably the case for many networks in neuroscience, and more generally in information, transportation, or other social and economic networks. Weights are essential to understand the dynamical processes that occur in these networks, requiring measures that go beyond the binary structure.

There could be multiple requirements for weighted clustering coefficients [14] depending on the particular question of interest and on the network properties. The main requirements that we considered necessary for a weighted clustering coefficient are:

• normalization ($C_i \in [0, 1]$),
• consistency with the binary definition (for binary networks, it should give back the classical result),
• linearity (scaling by $\alpha$ all edges involving node $i$ and all edges in triangles including node $i$ scales $C_i$ by $\alpha$),
• continuity (weak influence of the addition or deletion of edges having very small weights, meaning that an edge with infinitesimally small weight should be equivalent to the absence of that edge).

Compared to a previously proposed list of conditions [14], we added a continuity condition but did not include a requirement of a specific normalization factor (the global max(w)): as long as the normalization condition is fulfilled, the particular factor does not matter. We omitted the last two conditions of Saramäki's paper (invariance under weight permutation and ignorance of weights not participating in any triangle). Although they might be of interest for some specific applications, we do not consider them to be generally desired properties for a clustering coefficient. We also did not require that all weights in a triangle should be accounted for because this condition is necessarily met if the continuity condition is fulfilled.

Continuity can be expressed mathematically as follows: for a graph $G(V, E)$, if a weighted edge $(u, v, w)$ with $u, v \in V$ and weight $w \in \mathbb{R}$ is added to this graph to form a new graph $G'(V, E')$, with $E' = E + \{(u, v, w)\}$, then the clustering measure is continuous if and only if

$$\forall i \in V, \quad C_i^{(G')} \xrightarrow[w \to 0]{} C_i^{(G)}.$$

[...] triangles to which node $i$ participates. In terms of triangle intensity, this definition was originally written:

$$C_i^{B} = \frac{\sum_{j \neq k} \frac{w_{ij} + w_{ik}}{2}\, a_{ij} a_{ik} a_{jk}}{s_i (d_i - 1)} = \frac{1}{d_i (d_i - 1)} \sum_{j \neq k} \frac{w_{ij} + w_{ik}}{2 w_i}\, a_{ij} a_{ik} a_{jk} \tag{3}$$

thus defining the intensity of triangle $\Delta_{ijk}$ as $I_{\Delta_{ijk}} = \frac{w_{ij} + w_{ik}}{2 w_i}\, a_{ij} a_{ik} a_{jk}$, as the function of two of the triangle's weights and the average weight of the edges connected to [...]
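As a rough illustration of the consistency and continuity requirements, the following Python sketch evaluates Eq. (1) and the Barrat form of Eq. (3) on a toy undirected network. It is only a sketch under stated assumptions: the function names and the toy graph are hypothetical, $n_{\Delta,i}$ is counted over ordered pairs of neighbors (so that a fully connected neighborhood gives 1 with the denominator $d_i(d_i - 1)$), and $s_i$ is taken to be the node strength, i.e. the sum of the weights of the edges of node $i$, which the excerpt above does not define explicitly.

```python
from itertools import permutations


def make_graph(n, edges):
    """Build a symmetric weight matrix from (u, v, weight) triples."""
    w = [[0.0] * n for _ in range(n)]
    for u, v, weight in edges:
        w[u][v] = w[v][u] = weight
    return w


def neighbors(w, i):
    """Nodes joined to i by an edge of non-zero weight (a_ij = 1)."""
    return [j for j in range(len(w)) if j != i and w[i][j] > 0]


def binary_clustering(w, i):
    """Eq. (1), assuming n_Delta counts ordered neighbor pairs (j, k)."""
    nbrs = neighbors(w, i)
    d = len(nbrs)
    if d < 2:
        return 0.0
    closed = sum(1 for j, k in permutations(nbrs, 2) if w[j][k] > 0)
    return closed / (d * (d - 1))


def barrat_clustering(w, i):
    """Eq. (3), assuming s_i is the strength (sum of weights) of node i."""
    nbrs = neighbors(w, i)
    d = len(nbrs)
    if d < 2:
        return 0.0
    s = sum(w[i][j] for j in nbrs)
    total = sum((w[i][j] + w[i][k]) / 2
                for j, k in permutations(nbrs, 2) if w[j][k] > 0)
    return total / (s * (d - 1))


# Toy example: a triangle 0-1-2 with unit weights; node 3 starts isolated.
triangle = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0)]
g = make_graph(4, triangle)
# Same graph plus one edge of nearly vanishing weight between nodes 0 and 3.
g_eps = make_graph(4, triangle + [(0, 3, 1e-9)])

c_bin = binary_clustering(g, 0)         # binary value for node 0
c_before = barrat_clustering(g, 0)      # equals the binary value here
c_after = barrat_clustering(g_eps, 0)   # drops to roughly one half
print(c_bin, c_before, c_after)
```

On this toy graph the Barrat value matches the binary one when all weights are equal, which is the consistency requirement. Adding a single edge of weight 1e-9 roughly halves it because the degree $d_i$ jumps from 2 to 3, which is the kind of degree dependence that the text attributes to hybrid definitions and that the continuity condition rules out.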