
Decentralized Social Networks: Pros and Cons of the Platform

Charlot R. Shaw
Division of Science and Mathematics
University of Minnesota, Morris
Morris, Minnesota, USA 56267
[email protected]

ABSTRACT

Mastodon is a federated social network designed to allow diversity in users, small but interconnected communities, and user ownership of their data. It builds on standardized protocols, and follows a decentralized model. This architecture allows for more truly public online spaces, and has a number of differences from its closest popular alternative, Twitter. However, the Mastodon network is negatively affected by various forces, and is potentially more fragile than its distributed nature suggests.

Keywords

Mastodon, Social Networks, Decentralized Social Networks

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/4.0/. UMM CSci Senior Seminar Conference, November 2020, Morris, MN.

1. INTRODUCTION

Online social networks have become a common platform the world over, a means of communicating and a medium of discovery. Despite their global reach, the most common social networks operate on a centralized model, with a single platform provider operating the entire platform. For example, there is only one Twitter service, and it is the one managed by Twitter. However, the centralized model is not the only option. A decentralized social network known as Mastodon has been operating since 2016, and has accumulated a mesh of independent communities and users (over a million at time of writing). Mastodon's decentralized structure changes how its users interact with one another, how the social network grows, and how it behaves under stress. Although far from flawless, Mastodon and its continuing growth show that the centralized model is not the only option for online social networks.

2. MASTODON

2.1 Background

2.1.1 Federation

Federation is the concept that independent servers can communicate through a common protocol and work together without sacrificing independence. A common example would be email, in which different email providers can send messages between themselves on behalf of their users. A user with a Gmail address can message someone with a Protonmail address just as easily as they could message a fellow Gmail user, and there is no need for a supervising entity that manages all email.

Email is one example of a federated protocol, but there are others. The World Wide Web Consortium (W3C) published the ActivityPub standard [2] in 2018, a federated protocol for social networking. This standard is not a social network; it is instead an agreed-upon set of signals and actions that express common interactions in social media, such as following a user, liking some content, or sharing content of one's own. Software that implements the standard can inter-operate with other implementations, and form a federated network. Data from one program is intelligibly presented to users in other programs, and connecting to a user on a different implementation is as simple as connecting to a user on the same implementation. This network is colloquially termed the Fediverse (federated universe), and this paper will use that term when discussing the ActivityPub network as a whole.

2.1.2 Mastodon

Mastodon is a software built on a subset of the ActivityPub standard, and published under the Affero General Public License 3.0 in 2016. It will be familiar to users of Twitter, with users posting short messages, known as Toots, and receiving Toots written by other users they are subscribed to. With federation, users sign up for a specific server, also known as an instance. They receive a feed of public Toots written by other users on the server; this is known as the local timeline. Additionally, they see all public Toots from users anyone on the server has subscribed to. This is the federated timeline, and it provides the most complete vision of the Fediverse as a whole for an individual user.

As there is no managing entity, there is no server which sees every message in the Fediverse, or even the subset of the Fediverse that consists only of Mastodon servers. Because of this, there is no truly global view; each user can only see what their instance can see. Some servers host bots, Mastodon accounts controlled by software, that systematically follow as many other users as possible. As these bots exist on the server, the content the bots follow appears in the federated timeline, widening the server's perspective on the Fediverse.

2.1.3 Federation Follower Example

Suppose we have a Mastodon instance running at the URL xeno.space, with a user, Alex. Alex finds that a blogger they like, Babbar, has posted about his Mastodon account on yottabyte.rocks, and wants to follow him. When Alex tells their server this, xeno.space opens a connection to yottabyte.rocks, and requests that yottabyte.rocks let xeno.space know whenever the account [email protected] posts a public message, and that Babbar has a new follower named [email protected].

The next time Babbar posts something that is marked as public, yottabyte.rocks sends a message to xeno.space, which then notifies Alex. xeno.space also places Babbar's post in the federated timeline, for other xeno.space users to see.

2.2 Graphs

The connections between users and those they follow, or between intercommunicating servers, form graphs: collections of nodes with connections between them. This data is one of the best ways available to understand the Mastodon network, but we need to be clear about which graph we are referring to, as there are two interconnected ones that shape the Mastodon network's activity.

The network graph consists of the connections made by servers when their users interact. These are carried out through HTTP(S) requests, and routed by the DNS servers that underpin the internet. They are utilitarian in nature, created by the software, and as much as possible hidden from the user. The content of these messages has forms specified by the ActivityPub standard, allowing the different servers to parse the messages automatically.

The social graph is the relations between users: who follows whom, who has liked which posts, etc. This graph is made of the connection records in each instance's database. They are created as reactions to signals received through the network graph, and the network graph forms and abandons connections between servers based on them. These records are available on request, to verify, for example, that [email protected] really is being followed by [email protected].

The network graph is complex, but not altogether novel. It consists of the connections between servers required to exchange the messages required by the social graph. The social graph is constrained by the structure of the network graph, with users able to discover new content only through their instance encountering it while serving a fellow user's request, no matter how popular that content is elsewhere in the network. Mastodon has not implemented a recommender algorithm to steer user connections; this community-driven discovery process is the only way users encounter each other on-platform. This makes Mastodon a valuable data source for the behavior of communities online, especially decentralized ones. [7]

However, the decentralized nature of Mastodon does make study difficult in some ways. There is no way to access the network wholesale through negotiation with a single entity. Instead, researchers carry out automated explorations of public instance and user data, starting with as broad a list of servers as possible, and following the connections between users to discover new instances and thus map the network. [7]

Some of the crawls considered in this paper were carried out over six months starting in August of 2018, and covered 479,425 users, an estimated 46% of all users in the Mastodon network. [7] The other crawl considered was carried out between April 2017 and July 2018, and found 239 thousand unique users and 62% of all toots in the crawled graph. The remainder could not be accessed, due to instances configuring themselves to block crawling, or the toots being non-public.

These datasets include both snapshots of the network, and long-term data on how Mastodon grew and changed. Although not discussed thoroughly here, the overall trend is steady growth, with spikes that likely correspond to popular campaigns on other platforms, such as #DeleteFacebook, or to the publication of articles about Mastodon in mass media, such as Wired Magazine. [3]

3. EFFECTS OF MASTODON'S MODEL

With the rough structure of Mastodon understood, and an introduction to the data out of the way, we can now talk about what this means: first, the main issue that Mastodon sets out to address, and how it does so; second, the differences in its structure compared to Twitter, its closest popular analog; third, some issues with Mastodon's federation model, and some proposed strategies to mitigate them.

3.1 Benefits

To understand what benefits Mastodon provides, and what issues it was designed to avoid, let's compare it with its closest analog, Twitter. Though their internal structures are vastly different, both are microblogging platforms, focused on short, public messages. Mastodon improves on Twitter in three ways we will enumerate here: public space, privacy, and diversity.

3.1.1 Public Space

Twitter is certainly a public space in one sense, with any action occurring before a potentially vast audience. However, as Aral Balkan observes [1], Twitter is closer to a shopping mall. The public is admitted to the property, and quite often people carry out social interactions there. However, the purpose of a mall is not to facilitate a public space, but rather to achieve profits. Socializing at the mall serves this motive, but if the socialization hampers the commerce of the mall, it is in the mall's interests to put a stop to it. Similarly, the mall focuses its amenities on those who contribute to profits. Twitter is in the position of a mall here, maintaining an appearance of being an open and public space, while keeping the same profit motive.

Mastodon at first glance has merely shuffled this problem around, with each and every instance being a mall-like space. However, the Mastodon network as a whole doesn't belong to any one instance. The public space Mastodon creates is the network between every node, which is an open protocol.

3.1.2 Privacy

Twitter holds all its users' data, and tracks them even offsite. [5] Their motivation for this is profit, through selling that data, access to that data, or insights derived from that data. Privacy, from this standpoint, is something between users. Although options like blocking unwanted contacts, or protected Twitter accounts, offer more privacy, none of them protect the user from Twitter itself. Consider two scenarios: if a hundred random people each learned a different fact about you, that would be potentially unpleasant. However, if one person learned one hundred different facts about you, the situation is worse. The odds that something you would not wish known is among them are a hundred times greater, and the amount of information that person can infer, correctly or not, is also increased. In Twitter's case, this same entity is also collecting information on everyone else, not just you. Twitter may claim that no humans read this data, or that it works without exposing user information. The author of this paper does not find comfort in the reassurance that our surveillance is automated and industrialized.

Being decentralized, Mastodon already has a vast advantage in terms of truer privacy. Again, issues arise with the administrators of an instance having access to the data of the instance's users, but there is no central entity that sees all information. Furthermore, each instance has fewer users than Twitter, and so less data to offer if it wished to monetize it. If an instance did monetize its users' data, Mastodon has tools to allow easy switching of instances, so users would have little reason to stay on an instance that did not respect their privacy. This creates an environment where instance owners have strong incentives to protect users' information, and users are not locked into any particular host.

3.1.3 Diversity

Twitter, as a singular platform, is placed in the unenviable position of trying to please everyone. With all users existing in more or less the same virtual space, there is bound to be conflict between users' different styles of communication, expectations of behavior, and so on, even assuming that all parties on the platform are acting in good faith. Its terms of service and various moderation systems are necessarily generic, and whether they are fair guidelines or well applied is beyond the scope of this paper. It is sufficient to note that they exist, and must necessarily take a one-size-fits-all approach.

Mastodon, by its decentralized model, does not have everyone on the same platform. Each instance can create and enforce its own code of conduct, with users able to opt out by switching instances, or hosting their own instance, if they find themselves in an uncomfortable position. Instance moderators also have effective tools to filter content coming from other servers, so content coming in through federation can be rejected if it doesn't fit the code of conduct.

This raises an issue of siloization: of users only encountering those who have compatible viewpoints, creating a fractured perception of reality. This issue is wider than Mastodon, and occurs on most social networks, including centralized ones. The author acknowledges this, but did not focus their research on this issue specifically, and so cannot state whether Mastodon handles it better or worse than other platforms.

3.2 Differences

Not all of Mastodon's features are clearly beneficial or detrimental. The instance structure provides explicit groups whose interrelations are of interest. Additionally, instances have individual traits, some explicit and others implicit, that are visible in the social graph.

3.2.1 Instance Topics

Starting with instances, Zignani et al. collected the topics that instances chose to self-describe with. A very large group of these describe themselves as "General", or do not state any topics at all. Unsurprisingly, for an open source network growing by word of mouth, the specific topics "Technology" and "Programming" were the most common, with various forms of science cumulatively taking second place. Instances dedicated to various forms of art, including "writing" and "anime", were also well represented. [7]

Aside from the number of servers dedicated to a topic, there are also interesting patterns in how users interact with topics. For example, instances dedicated to "Journalism" or "General" are more common than average, but have lower user counts per instance. [3] From this, one could suppose that servers with less to differentiate themselves from others on a common topic tend not to attract users as readily as a server that is unique in its topic.

The inverse of this pattern also exists, with some topics served by few instances holding a disproportionate number of users. This has its clearest examples in instances covering erotic or sexual subjects. Though few in number, these instances host a larger number of users than servers not focused on erotic content.

3.2.2 Instance Behavior Patterns

Topics might reveal patterns of behavior across instances, but within an instance, unique patterns also emerge. Looking at the social graph, one finds that users in an instance behave like other users in the same instance, specifically in terms of their interactions and connections.

One could assume that connections between users on the same server would be more common, as they have immediate access to each other and would share common interests. Some instances, following this pattern, have few connections to other instances. However, this is not the common case, with Zignani et al. finding that 35-40% of all users have significant inter-instance connections. [8] Is the opposite true, then? Are users using their home instance simply as a service to access the wider Fediverse? Again, Zignani et al. find this to be somewhat untrue, with some instances being very outgoing, and others very insular. (Ignoring the extreme case of the user on a single-user instance, who has only outgoing connections.) Mastodon allows a wide variety of instance behavior, and Zignani et al. find that the instances match this diversity:

    In general we confirm that the architecture based on independent instances has a stronger impact on the how users' ego-networks cluster; sometimes instances act as a bound on how the neighborhood of their members clusters, other times instances promote an external clustering.

The culture of an instance has a measurable effect on its users' behavior. This is also expressed in how likely users are to form mutual connections, or only one-sided ones (follower/followee). This is not correlated with instance size; it is not the case that users in smaller instances are seeking connections with larger servers, nor are large servers seeking small servers in any significantly different way. Instead, users follow the unique style of their own server.

A crucial thing to note, though, is that all the effects of instance style are of lesser import than the effects of instance location. Users have a very strong tendency not to interact with users outside their country; fewer than 10% of users have 50% or more of their connections going to users on instances in other countries. [8]

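Before turning to the drawbacks, it is worth making the network-graph messages of Section 2.2 concrete. The follower handshake from Section 2.1.3 travels between servers as ActivityStreams JSON. The sketch below builds a minimal Follow activity and the Accept reply; the @context value and the Follow/Accept/actor/object vocabulary come from the ActivityPub standard, while the URLs and id scheme are invented for the Alex and Babbar example (real servers additionally sign these requests and deliver them over HTTPS):

```python
import json

AS_CONTEXT = "https://www.w3.org/ns/activitystreams"

def make_follow(actor_uri: str, target_uri: str, activity_id: str) -> dict:
    """Build a minimal ActivityStreams Follow activity."""
    return {
        "@context": AS_CONTEXT,
        "id": activity_id,
        "type": "Follow",
        "actor": actor_uri,    # who wants to follow
        "object": target_uri,  # who is to be followed
    }

def make_accept(follow: dict, accept_id: str) -> dict:
    """Build the Accept activity the receiving server answers with,
    echoing the original Follow as its object."""
    return {
        "@context": AS_CONTEXT,
        "id": accept_id,
        "type": "Accept",
        "actor": follow["object"],
        "object": follow,
    }

# Alex on xeno.space asks to follow Babbar on yottabyte.rocks.
follow = make_follow(
    "https://xeno.space/users/alex",
    "https://yottabyte.rocks/users/babbar",
    "https://xeno.space/activities/1",
)
accept = make_accept(follow, "https://yottabyte.rocks/activities/99")

# The JSON body xeno.space would POST to Babbar's inbox.
print(json.dumps(follow, indent=2))
```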
3.3 Drawbacks

Mastodon is decentralized, a fully functional network of independent servers. However, there are a number of pressures that can cause hidden interdependencies between servers. These drives towards re-centralization are present in the Mastodon network, and we will explore how they can manifest, and the consequences for the network if left unchecked. In Mastodon and other decentralized networks, the network often fails by degrees, which is a significant feature, but it also means that the network is practically never going to be in a perfect state.

3.3.1 Massive Servers

Most users coming from a centralized service want to join the biggest server, or don't understand that mastodon.social is not the only option. mastodon.social itself, as the largest server, headed by the primary developer, has tried various tactics to balance between people joining the network at all, and people joining mastodon.social specifically. [4] The issue is bigger than mastodon.social, though, as there are other servers of enormous size out there. In fact, 52% of all users on Mastodon are hosted on only 10% of all servers. [3] This is a threat to the diversity of the network, as the moderators and admins on those 10% of servers hold far greater sway over the network than a more even distribution would grant them. Though problematic in and of itself, these massive servers will be shown to have other effects later on.

3.3.2 Social Graph Failure Example

Supposing we have three users, on three different servers: Angela, on a third instance, follows [email protected], who in turn follows [email protected]. Angela can see Babbar's posts, and Babbar can see Alex's. If Babbar replies to Alex, Angela would see that reply, and might even reply herself, which would mean anyone who followed her would then see that post. This is how messages can spread beyond the single hop to the followers of the poster and their federated timelines.

Now consider that yottabyte.rocks fails, perhaps due to a software bug. Alex's posts might now have no way to reach Angela, and Alex can't see Babbar or his posts. Until yottabyte.rocks comes back online, Babbar can neither post, nor can anyone see his posts. If yottabyte.rocks is unrecoverable, Babbar has ceased to exist.

In a centralized service, the network status tends to be more binary: either the service is up and running, or it is down. Some grey areas exist in case a regional server goes down, and so traffic must be routed to a distant server at a slower rate.

Given these warnings, what is the actual uptime of the Mastodon network? Following the data from Raman et al. in Figure 2, we can see that for the average user, Mastodon has an order of magnitude worse availability than Twitter did when it was of a similar size in 2007: 1.25% downtime for Twitter versus 10.95% for Mastodon. [3] Surprisingly, although less active instances have more downtime, the toot count of an instance is not a good predictor of its uptime. [3]

Figure 2: Distribution of downtime over number of toots on an instance, with the average totals and a comparable snapshot of Twitter.

3.3.3 Centralization Through Hosting Providers

As most instances are hosted by volunteers and often funded by themselves or through donations, there is an emphasis on cost-efficiency in hosting providers. This means that only three hosting services support more than 62% of all users. Amazon alone hosts 6% of all instances, but due to the size of those instances, they account for 30% of all users in the Mastodon network. Cloudflare is the second most common hosting service, and it hosts 5.4% of all servers, and 31.7% of all messages. [3] Though a massive failure at Amazon or Cloudflare would be a nightmare scenario for much of the internet, even smaller changes in policy, price, or technology could negatively affect large portions of the Mastodon network.

Consider Figure 1, which shows how top-heavy this distribution is. Something not shown in the graph is the long tail of diverse hosting providers, with the average host holding only 10 instances.

Figure 1: Distribution of instances, users, and toots (messages) across top 5 countries and hosting providers.

3.3.4 User Abandonment and the Social Graph

We can now see the downtime of individual servers, which affects the users on them, but how is the network affected when connections are severed? This is measured by the Largest Connected Component (LCC), which measures how far a post could spread by sharing alone. Not all users are connected to each other, so there will be separate islands in the social graph, where a post has spread as far as it theoretically could, and there are no more connections to the remaining parts of the graph.
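The LCC measure described above can be sketched in a few lines. This is an illustrative reimplementation, not the researchers' code; it treats follow links as undirected (a post can spread either way through boosts), and the five-user graph is invented:

```python
from collections import defaultdict, deque

def largest_component_share(edges, users):
    """Fraction of users inside the largest connected component (LCC),
    treating follow links as undirected for reachability by sharing."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, best = set(), 0
    for user in users:
        if user in seen:
            continue
        # Breadth-first search over one island of the social graph.
        size, queue = 0, deque([user])
        seen.add(user)
        while queue:
            node = queue.popleft()
            size += 1
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        best = max(best, size)
    return best / len(users)

users = ["alex", "babbar", "angela", "dana", "eli"]
edges = [("alex", "babbar"), ("babbar", "angela"), ("dana", "eli")]
print(largest_component_share(edges, users))  # 0.6
```

Removing a node (a departing user or a failed instance's accounts) and re-running this computation is the basic experiment behind the removal curves discussed next.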


Figure 3: Mastodon's proportion of connected users (solid green), and number of strongly connected components (solid red). A snapshot of Twitter (dashed lines) from 2011 is provided for comparison. X axis is percentage of total nodes removed.

First, Raman et al. considered the effect of users leaving the network, to see how a mass abandonment scenario could spiral, prompting further abandonment. Mastodon has a strong social graph at the outset, with 99.95% of all users in the LCC (Largest Connected Component). With the top 1% of most connected users removed, however, the LCC covers only 26.38% of all users. This is an incredibly steep drop. For comparison, Twitter in 2011 had an LCC which covered 95% of all users, and with the removal of the top 10% of accounts it only dropped to 80%. [3]

It is possible that Mastodon's instance system contributes to the creation of these keystone users, in that popular users are followed from many instances, but other connections between those instances are never established. If the popular user vanishes, all those interconnections go with them. Another possibility is that these users with high numbers of connections are explorative bots; Raman et al. do not state whether they attempted to filter out bot users. If this is the case, the social graph could have a different shape for human users.
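This kind of targeted-removal experiment is straightforward to prototype. The sketch below is our own illustration, not code from Raman et al. [3]: it grows a small preferential-attachment graph whose hubs stand in for keystone users, then removes the best-connected 1% of nodes and measures how much of the network the largest connected component still covers.

```python
# Illustrative sketch, not the paper's code: measure how much of a
# hub-heavy network survives removal of its best-connected nodes.
import random

def lcc_fraction(adj):
    """Fraction of nodes inside the largest connected component."""
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:
            node = stack.pop()
            size += 1
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        best = max(best, size)
    return best / len(adj)

def remove_top(adj, fraction):
    """Delete the best-connected `fraction` of nodes (a targeted failure)."""
    doomed = set(sorted(adj, key=lambda n: len(adj[n]), reverse=True)
                 [:int(len(adj) * fraction)])
    return {n: {m for m in nbrs if m not in doomed}
            for n, nbrs in adj.items() if n not in doomed}

# Preferential attachment: each new node links to a degree-biased pick,
# so early nodes become hubs, loosely mimicking "keystone" users.
random.seed(1)
adj, endpoints = {0: {1}, 1: {0}}, [0, 1]
for new in range(2, 2000):
    target = random.choice(endpoints)
    adj[new] = {target}
    adj[target].add(new)
    endpoints += [new, target]

print(f"intact LCC: {lcc_fraction(adj):.1%}")   # 100.0%
print(f"top 1% removed: {lcc_fraction(remove_top(adj, 0.01)):.1%}")
```

Because the toy graph is a tree, every removed hub disconnects all of its subtrees at once; Mastodon's real follower graph has more redundancy, but the same power-law degree distribution drives the steep drop Raman et al. observed.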
3.3.5 Infrastructure failure and the Social Graph
Earlier, we noted the unintentional centralization of instances and hosting providers. Raman et al. examine the issues such centralization can cause. In the case of instances, removals can be ranked by number of messages or by number of users: mastodon.social holds the greatest number of users, but mstdn.jp has more toots, despite having over 3,000 fewer users (see Table 2). [3] Similarly, hosting providers can be ranked by the number of instances hosted or by the number of users those instances support. Both rankings have similar effects on the size of the LCC when used to remove hosting providers, but removal by number of users causes a catastrophic failure in the number of components: removing the top 5 hosting providers by number of users breaks the federation graph into 272 components, whereas removal by number of instances results in only 139 components. [3] Ideally, removing instances would produce a linear decay, but with the top 5 hosting providers hosting 20% of instances and 85% of all users [3], a failure at the hosting provider level leaves a massive gap in the social graph. This directly shows the effect of removing massive instances, and how unintentional centralization weakens the fediverse; losing a comparable number of random instances would not have nearly the same effect.

Figure 4: Removals sorted by users hosted (red), toots posted (green), and instances hosted (blue). Raman et al. use the term Autonomous System (AS) to refer to hosting providers.

3.4 Possible Improvements
Though all these issues are problematic, most of them seem rooted in socioeconomic constraints. However, that does not mean that software solutions cannot play a part. Raman et al. studied the effectiveness of replication: storing toots and other data on servers other than the original instance. This adds a significant layer of complexity, as Mastodon is structured around a user's instance being the sole source of truth about them. Even so, replication strategies hold large potential. All of the proposed systems relied on a global index of toot replicas being maintained, likely through a distributed hash table.

3.4.1 Subscription Replication
One of the simplest methods of replication is having instances store copies of the toots they receive through subscriptions, with an instance looking for a toot on an offline server looking up a backup replica instead. This is the less effective of the two schemes Raman et al. simulated, as it essentially spreads the centralization out one hop in all directions. Under this system, with the top ten servers by toots removed, the network would lose access to only 2.1% of all toots. Compared to the 62.69% loss without replication, this is already a significant improvement. The improvement largely holds for the removal of the top ten hosting providers by toots as well, with 18.66% lost with replication, compared to 90.1% without. [3]

Subscription-based replication re-centralizes itself rapidly, though, as the majority of replicas would be hosted on the servers with the largest numbers of subscribers. These large servers would already be among the worst ones to lose, even if they were not a key part of a backup mechanism. The same issue is mirrored in popular users receiving an excessive amount of backup while more out-of-the-way users receive none: 23% of all toots would have more than 10 replicas, while 9.7% would have no replicas at all. [3]

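The re-centralization effect of subscription-based replication is easy to see in a toy model. The simulation below is our own hypothetical sketch, not the simulator used by Raman et al. [3]; the instance sizes, subscription bias, and toot counts are all invented. A toot counts as available if its home instance, or any subscribing instance that cached a copy, is still online.

```python
# Hypothetical sketch, not the paper's simulator. Subscription replication:
# a toot survives if its home instance or any subscriber's instance is up.
import random

random.seed(7)
N = 50
sizes = [1000 // (i + 1) + 1 for i in range(N)]   # heavy-tailed instance sizes

def biased_instance():
    """Pick an instance with probability proportional to its size."""
    return random.choices(range(N), weights=sizes, k=1)[0]

# Each toot: (home instance, instances that cached it via subscriptions).
# Subscribers skew toward big instances, so replicas pile up there too.
toots = []
for _ in range(5000):
    home = biased_instance()
    replicas = frozenset(biased_instance() for _ in range(3)) - {home}
    toots.append((home, replicas))

def availability(down):
    """Fraction of toots still reachable with the `down` instances offline."""
    reachable = sum(home not in down or any(r not in down for r in replicas)
                    for home, replicas in toots)
    return reachable / len(toots)

top10 = set(range(10))   # the ten largest instances fail together
bare = sum(home not in top10 for home, _ in toots) / len(toots)
print(f"no replication:           {bare:.1%} of toots reachable")
print(f"subscription replication: {availability(top10):.1%} of toots reachable")
```

Because both homes and subscribers are drawn from the same size-biased distribution, most replicas end up on exactly the instances whose loss we are guarding against, which is the re-centralization problem described above.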
3.4.2 Randomized Replication
Avoiding the centralization inherent in following the social graph, the other technique Raman et al. studied was random replication, where every new toot is replicated on one or more servers chosen at random from the whole federation. This adds a great deal of complexity, needing a system to fairly distribute replication duties and to prevent abuse by overburdening an individual server with replication requests. It would also need a way to index all instances, without that index itself becoming a point of centralization.

If these issues are overcome, then a single randomly placed replica can keep 99.2% of all toots available after removing the top 25 instances by number of toots, whereas subscription-based replication would leave only 95% of toots available after the same failure. More randomized replicas have increasingly better persistence, as demonstrated by Figure 5. Although complicated, randomized replication could provide a way to avoid the downsides of semi-centralized networks, though it does not actually address the issue of centralization in the network itself.
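Both schemes presuppose a global index of replicas, and randomized replication additionally needs placement that no single party controls. One standard way to get both at once, sketched below purely as an illustration (Raman et al. [3] do not prescribe a mechanism, and the instance names here are invented), is a consistent-hashing ring: every server hashes the same instance list, so all servers independently agree on which r instances hold a given toot's replicas, with no central coordinator.

```python
# Illustrative sketch: consistent hashing as a decentralized replica index.
import bisect
import hashlib

def h(key: str) -> int:
    """Stable 64-bit hash so every server computes identical placements."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

class Ring:
    def __init__(self, instances, vnodes=64):
        # Each instance gets many points on the ring to even out load.
        self.points = sorted((h(f"{name}#{i}"), name)
                             for name in instances for i in range(vnodes))

    def replicas(self, toot_id: str, r: int = 3):
        """The r distinct instances responsible for this toot's copies."""
        out, seen = [], set()
        idx = bisect.bisect(self.points, (h(toot_id), ""))
        while len(out) < r:
            _, name = self.points[idx % len(self.points)]
            idx += 1
            if name not in seen:
                seen.add(name)
                out.append(name)
        return out

ring = Ring([f"instance{i}.example" for i in range(20)])
print(ring.replicas("toot:42", r=3))   # identical on every server
```

Because placement is a pure function of the toot ID and the membership list, the "index" is just the hash function itself; the remaining hard problem, as noted above, is agreeing on the membership list without a central authority.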
Figure 5: Availability percentage of all toots with the top ten hosting providers and instances removed. Raman et al. use the term Autonomous System (AS) to refer to hosting providers.

4. CONCLUSIONS
Mastodon as a platform has its share of downsides, including the drive towards centralization and the impact of unreliable servers. For some of these problems, Mastodon can engineer solutions, such as replication [3]. Others, such as the growth of megaservers, are still a subject of debate, and will likely need to be resolved through community action rather than software features. [4]

Some of these changes are simply differences from what has come before in the history of social networks. Though not decentralized, one could point towards the micro-communities of Reddit as an example of a similar arrangement of users and interests on a significant platform. It is not a stretch to say that the limitations imposed by Mastodon's structure are within the constraints other social networks have imposed while still being successful.

Mastodon also shows a way out of the non-public gathering places issue that most other social networks represent. This is a significant benefit, beside the emphasis on user control and the strong moderation tools that it provides. Access to the source code is also highly beneficial.

The most important detail, however, is that Mastodon is not a theoretical idea. It is a network, growing in the wild, and built to uphold a common standard published by the W3C. [2] Though analysis of how it grows and the issues it faces is important, it is not waiting to be theoretically sound. With frequent updates from over eight hundred authors [6], Mastodon is present and active.

Acknowledgments
The author would like to thank their academic advisor, associate professor Elena Machkasova, their seminar advisor, professor Nic McPhee, and seminar teacher, professor KK Lamberty.

5. REFERENCES
[1] A. Balkan. Encouraging individual sovereignty and a healthy commons, 2017.
[2] C. L. Webber, J. Tallon, E. Shepherd, A. Guy, and E. Prodromou. ActivityPub. Technical report, World Wide Web Consortium, 2018.
[3] A. Raman, S. Joglekar, E. De Cristofaro, N. Sastry, and G. Tyson. Challenges in the decentralised web: The Mastodon case. In Proceedings of the Internet Measurement Conference, IMC '19, pages 217–229, New York, NY, USA, 2019. Association for Computing Machinery.
[4] E. Rochko. The role of mastodon.social in the Mastodon ecosystem, 2019.
[5] M. Stockley. More relevant ads with tailored audiences, 2013.
[6] TootSuite. Authors.md, 2020.
[7] M. Zignani, S. Gaito, and G. P. Rossi. Follow the "Mastodon": Structure and evolution of a decentralized online social network. In Twelfth International AAAI Conference on Web and Social Media, 2018.
[8] M. Zignani, C. Quadri, S. Gaito, H. Cherifi, and G. P. Rossi. The footprints of a "Mastodon": How a decentralized architecture influences online social relationships. In IEEE INFOCOM 2019 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), pages 472–477. IEEE, 2019.
