<<

Proceedings of the 54th Hawaii International Conference on System Sciences | 2021

Profiling Online Platforms: vs.

Veruska Ayora Flávio Horita Carlos Kamienski Federal University of ABC, Federal University of ABC, Brazil Federal University of ABC, Brazil [email protected] [email protected] [email protected]

Abstract governments, policymakers, authorities, and businesses to understand market trends and behavioral patterns Online Social Networks (OSN) have been increasingly [38]. used as sources of information for different Several studies have also demonstrated the applications, ranging from business, politics, and potential of OSN in defining 's sentiments about public services. However, there is a lack of information events, incidents, products, and services [41]. The on OSN platforms' behavior that may impact big data potential of Twitter as a platform for improving processing and real-time services. In this paper, two of awareness over variables of interest and thus the most widely used social networks, Instagram and supporting a more informed decision-making process Twitter, are investigated to broaden the understanding has been highlighted in the literature for different of how each platform's message characteristics areas, such as earthquake detection [46], influenza influence data completeness and latency. We virus epidemics [1][6], disasters [14][25], elections performed a series of experiments to emulate data [21], mobility [10], organizational issues [43], or posting and collection automatically. Our results crimes [33]. increase the level of transparency of the platforms' Existing methods are mostly focused internal behavior, showing that both can deliver data on event detection [20][52], content analysis [27], and with reasonably low latencies and high completeness, rumor analysis [13], which can describe specific but Twitter can be up to eight times faster when it phenomena often retrieve data to transform it into comes to messages. services. The challenge now is to investigate how the behavior of such platforms can influence data 1. Introduction completeness and latency. This additional understanding is necessary as OSN platforms were not originally designed to support real-time services, even The increasing use of Online Social Networks though they belong to private providers and data (OSN) in different areas reveals a critical and mostly audition is not allowed [7]. unexplored question regarding the performance In this context, the main goal of this paper is to provided by their underlying platforms. With the rapid examine the influence of message characteristics of growth and proliferation of OSN platforms, a vast OSN platforms (in particular, photo, video, ) amount of -generated content becomes a valuable on data completeness and latency. We shed some light information source for applications in different areas. on the currently unexplored and poorly understood Also, data is widely accessible since it can be collected OSN platform behaviors, increasing the level of through web-crawlers or public . These two transparency of their internal working. Particularly, we characteristics, .e., massive and open data, represent seek to answer the following research questions: the primary motivation for most OSN research [41]. User-generated messages on OSN platforms, such 1 2 RQ1: Are there messages shown to the Twitter and as Instagram and Twitter , have emerged as powerful, Instagram users, but not readily shown to other users? real-time means of information sharing on the . If so, what proportion of posted messages can be By the online communication of billions of individual reliably retrieved? users, these platforms have involuntarily created a global participatory sensing network that can be RQ2: Does any message characteristic (e.g., photo, harnessed as an observatory of social events. Data video, ) affect the decision to make them collected from OSN provides social, economic, and available to other users? cultural information that can be utilized by RQ3: After a user posts a message, how long does

1 it take for Twitter and Instagram to make it available https://www.instagram.com/about/us to other users? 2 https://about.twitter.com/

URI: https://hdl.handle.net/10125/70955 978-0-9981331-4-0 Page 2792 (CC BY-NC-ND 4.0) information that could shed light on why and how the Here, we analyze two social networks, namely, the events and patterns measured by physical sensors platform Twitter and the photo- and emerge. video-sharing platform Instagram, as they have The potential of this approach has also been millions of users worldwide and allow data to be confirmed by [3], which mentions the power of OSN collected by automated processes. We utilized two key during emergencies: “In July 2013, a Boeing 777 aircraft metrics to understand OSN platform characteristics, crashed on landing at the International i.e., completeness, measured as the message return Airport, after a transpacific flight from Seoul, South Korea. rate, and latency, measured as the time it takes for a An observer waiting to board another flight snapped a message to be available to other users. We are aware photograph of the accident with her and uploaded it to Twitter less than 1 min after the impact. Within that OSN data distribution policies and algorithms may 30 min, there had been more than 44,000 tweets about the change according to different criteria. Still, we believe accident, including photos and videos taken by survivors as there are solid reasons to understand and track them they escaped from the wreckage (International Air Transport over time, by taking snapshots of their most significant Association, 2014).” behaviors as perceived by their users. Despite all the attention to OSN, using data without Hence, the main contributions of this paper are clearly understanding what it comprises might be very threefold. Firstly, we introduce a methodology for problematic [17]. The more we understand a system's evaluating OSN platforms based on heterogeneous data inner workings, the more justifiable it can be governed quality and its impact on the users. Secondly, we and held accountable [4]. Transparency provides examine two of the most widely used OSNs, Twitter, additional reliability and validity for algorithms in and Instagram to broaden the understanding of the charge of decisions that affect society. Also, it makes it similarities and differences in data quality across possible for users to evaluate the correctness of OSN platforms. For , our results reveal that the outputs and identify incorrect data [44]. Twitter platform reposts up to 18% of the tweets However, even though transparency mechanisms shown in the user timeline. On the other hand, this ideally can empower users to question and critique the behavior has not been observed on Instagram. Thirdly, system, Ananny & Crawford [4] have highlighted we help developers by summarizing relevant OSN some technical limitations of the difficulties in getting technical limitations when designing scenarios for to know how algorithms operate. They suggested that different applications. imposing transparency is not as simple as it seems, as In the remainder of this paper, Section 2 presents system developers themselves are often unable to related works, and Section 3 explains the proposed explain how a complex system works or which parts research methods. Experimental results are presented are essential for its operation [9]. in Section 4. Section 5 discusses the lessons learned, Transparency becomes even more relevant, as and finally, Section 6 draws some conclusions and many OSN platforms are owned by commercial presents work. corporations that profit over data and interaction [30]. Eslami et al. [18] showed that the majority of 2. Background & Related Work users have a lack of awareness or understanding about their Feed being structured Recently, due to the broad adoption of mobile by an algorithm. When users miss an important post platforms, OSN represents an increasingly up-to-date from a friend, they usually blame themselves, ignoring digital reflection of society. It supports people in that the OSN has an algorithm making decisions that interacting with places around the city, with other could be misleading or hiding important content from people, and businesses, while talking about topics of them. On the other hand, even if users are aware of common interest, such as sentiments, political beliefs, such complex algorithms, they have no way of social interactions, human mobility, presence in knowing how it works, which may prevent them from specific events, likes [47]. being sure about the results of their actions [18]. OSN behaves as "human as sensors," recording human activities and preferences [22]. This data 2.1. Social Networks Platforms becomes a crucial asset since it can be transformed into valuable knowledge, helping the decision-making From the literature review, it is possible to process [37]. Furthermore, enabled by OSN, these understand how to integrate OSN platforms as data human sensors provide 24 per 7 real-time data streams distributors for various services that can be provided at virtually no cost [16][46][1][6][33]. Also, [15] have for the needs of a variety of smart applications. For argued that OSN updates shared by the citizens can instance, Anthopoulos & Fitsilis [5] explore the need serve as a complementary or a supplementary source of of custom social networks for smart cities, considering

Page 2793 that OSN platforms are the ideal implementations 2.2. Social Networks Methods within the smart city ecosystem. OSN platforms and their applications have Existing studies on OSN are mostly focused on revolutionized the Internet, radically changing the content analysis [5][8][12][35][39][47]. However, communication methods while enabling dynamic there are a few studies that investigate the data interactions between users [48]. However, OSN completeness collected through public APIs. For platforms differ significantly in their conventions and instance, Morstatter et al. [36] analyzed the characteristics [24]. For instance, Twitter is primarily a completeness of samples collected by the Twitter text-based that, until recently, only allowed Streaming API, using a feed that allows access to all tweets up to 140 characters, which meant individuals public tweets. Their study concluded that query needed to condense their ideas into simple messages. parameters impact the coverage of API results. In the In contrast, Instagram is a visual-first medium that same , Joseph, Landwehr, Carley [29] emphasizes pictures and videos over written text [23]. established five simultaneous connections for tracking Twitter is nowadays a popular microblogging similar keywords simultaneously and analyzing service that allows its users to send short messages of whether the data returned by each connection was the 3 up to 280 characters , called tweets, as well as images same. Their results revealed an average of more than 4 and videos. Monthly, almost 500 million messages are 96% overlap and concluded by the impossibility of created and redistributed by millions of , collecting 100% of the data with multiple connections. 5 around 330 million users worldwide. Like Instagram, Driscoll & Walker [17] discussed how platform Twitter is hashtag-driven [19] and has been associated bias could influence data completeness, comparing two with the proliferation and dissemination of news and sources retrieved from Twitter, namely the public events [45]. Twitter holds a prominent position among Streaming API and the commercial firehose streaming OSN as it offers: i) real-time update; ii) flexibility, as a connection provided by PowerTrack. While the user can track someone else's post without being Streaming API excels at longitudinal data collection, it friends; iii) ability to harvest vast amounts of data is a poor choice for massive, short-term events. through its APIs, and; iv) potentiality for the Firehose offers an extensive collection of tweets sent predictions of future situations [28]. within short periods, but the companies do not disclose With 1 billion monthly active users [11] [34], data collection and processing procedures. Instagram focuses on sharing photos and short videos Another approach focused on examining the that motivate the interaction between users through data rate returned by the Twitter Streaming API. Perera private conversations, public comments, and the like et al. [42] found a pattern in 's tweets concept to show approval. Most photos have tags and that can be modeled as a Poisson distribution, while captions with high-level descriptions [51]. Instagram is retweets follow a geometric distribution. Sakaki et al. often used for self-expression and self-documentation [46] identified that inter-arrival times during natural through the showcasing of everyday life [2]. disasters such as earthquakes and typhoons, fit well Due to growing concerns about and data into an exponential distribution with λ = 0.34. security, OSN platforms face frequent changes to their Other studies also examined the current security policies and data restrictions through APIs. technological issues of OSN. For example, Stieglitz et

The latest changes on Instagram included restrictions al. [49] presented a three-layers on data permissions, updating platform policies, and framework determining the main components and regularly evaluating an app's access to user analytical approaches to gain deeper insights. On an permissions [31][40]. Thus, this study focuses on data extension of their previous work, Stieglitz et al. [50] captured by web-crawlers and not by public APIs, as further highlighted the need for suitable data storage our previous study focused on the Twitter Streaming and scalable and flexible software architecture to deal API performance [7]. Instagram is currently changing with the high volume of data gathered from OSN. On its privacy policy due to recent data leaks, and the the other hand, Hammerl et al. [26] introduced a series European General Data Protection Regulation (GDPR) of critical factors and key performance indicators to may require changes in the future. OSN usage's success. The authors mentioned the indicator “Reduction of response time” as part of the critical success factor “Team”. The differences in our study in comparison to the 3https://blog.twitter.com/official/en_us/topics/product/2017/tuíteing previously mentioned papers are manifold. Firstly, it madeeasier.html does not to compare extracted data through APIs 4https://www.internetlivestats.com/twitter-statistics/ 5 but using web-crawlers. Our purpose is to mimic user https://www.statista.com/statistics/282087/number-of-monthly- behavior and compare Twitter and Instagram, where active-twitter-users

Page 2794 the latter does not provide an API like the former. Secondly, to understand data completeness and latency, we automatically simulated all the steps of a legitimate user's post, producing a synthetic workload. Furthermore, most studies are related to evaluations and classification of OSN content and user behavior. Unlike them, our paper seeks to evaluate the OSN data perceived by its users, such as completeness and latency of posts.

3. Research Design and Methods Figure 1. Environment Configuration Even though each platform has different interface features, they support similar core functions for users Our experiments were performed between March to interact with one another. These core functions (e.g., 13 and May 6, 2019, with ten replications to avoid bias to attach a photo, or to send a simple text message) in the samples. It is essential to notice that the enable us to compare data completeness and latency experiments took a long time to complete and had to be across platforms. The following subsections describe repeated often since the platforms have different our 4-step methodology to achieve this purpose. mechanisms to avoid automated robots. Users were blocked many times, and the procedure to unblock 3.1. Step 1: Environment Configuration them frequently took many days.

This step aims at simulating how users interact with 3.2. Step 2: Experimental Design each other on the Twitter and Instagram platforms. We This step defines the test scenarios for each set up a test environment, as shown in Figure 1. On the left side, user A sends messages from her account to platform, considering that the evaluation process was Twitter and Instagram. On the right side, user B, who repeated several times with different parameters to test follows user A, checks the update messages on her the robustness of the experiment. Our process involves timeline. Two computers were used to establish three test scenarios, depicted in Table 1, aiming at simultaneous connections, having the same evaluating whether messages with hashtags, video, or configuration: 1.8 GHz Intel Core i5, 8GB 1600MHz image can interfere in the metrics under analysis. In all DDR3. scenarios, each experiment was repeated 10 times. For Instagram is geared towards mobile devices with each replication, 50 messages were published. The Android and iOS. Posting photos on a laptop is estimated execution time for each scenario was about possible, but it requires simulating a mobile user in a 10 hours to post 500 messages per platform. . For doing so, we used Chrome that provides a shortcut to access the mobile version. ● Scenario 1: Image & Specific Hashtag - the This approach does not work for posting videos messages are composed of a 3.1 MB image, the though. In this case, we used Gramblr6, a free specific hashtag #aksurevlorrainearoya, the app and web service with a desktop , replication number, and the post number. These where the primary interactions are with the web posts are sent from the timeline of user A, while service. While Gramblr is now discontinued after the user B waits until they are available on her submission of this work, other similar tools should be timeline. used for the same purpose. ● Scenario 2: Image & Generic Hashtag - the Although OSN platforms allow different settings messages are composed of a 3.1 MB image and 6 for the visualization of the timeline, we configured most common generic hashtags, namely #beautiful Twitter for showing messages in chronological order. #cute #instagood (from worldwide), #tbt #love and Instagram does not offer such an option, but we #brasil (from Brazil). Also, the replication number, observed that it returned the results in a chronological and the post number. These posts are sent from the order for new posts (a strict chronological order is not timeline of user A, while user B waits until they are enforced since it automatically reposts old messages available on her timeline. sometimes). ● Scenario 3: Video - messages are composed of a 14.1 MB video, the replication number, and the 6 https://gramblr.com post number. These posts are sent from the timeline

Page 2795 of user A, while user B waits until they are The message capture algorithm is developed in available on her timeline. Python 3 to record the exact moment the message sent by user A and becomes available to be viewed by user Table 1. Configuration of the Experiments B (Figure 3). The application monitors the HTML Scenario Media Hashtag Msgs Replications webpage over the user B timeline to check if a new post is available. The Selenium WebDriverWait and Image Specific 50 10 ExpectedCondition are also used here to refresh the 3 MB page and establish the conditions for recording messages. Image Generic 50 10 3 MB 1 Log in with user B Video 2 L = last post at User B timeline -- 50 10 3 L’= Last post saved 14 MB 4 L’=L 5 refresh page 6 Read L 7 If L’ ↑ L 3.3. Step 3: Posting and Capturing Messages 8 Save L and time 9 Repeat steps 4, 5, 6 and 7 10 else This step required the development of two 11 Repeat steps 5, 6 and algorithms for different purposes. The first algorithm Figure 3. Algorithm for automated simulates message submissions step by step as if a user message capture in front of a computer performs them. We used Python 3 and PyAutoGUI7 for controlling the mouse 3.4. Step 4: Data Analysis 8 programmatically. Selenium WebDriverAPI allows the web browsing automation, automating web This last step identifies how the collected data can applications by driving a browser natively like a user be evaluated and measured. Using Venn diagrams, the either locally or on a remote machine, using the 9 set theory allows us to understand five possibilities for Selenium Server. Also, Selenium WebDriverWait and creating the captured dataset (Figure 4). We considered ExpectedCondition allow the program to wait for a the U, P, and C sets: particular condition to occur (i.e., finding an element ● Set U: the universe set of messages posted by all from the next page) before proceeding further in the users of the OSN platform. code. ● Set P: the set of messages posted by user A in our Figure 2 shows the algorithm used for posting experiments. messages. The content of each message varies for the ● Set C: the set of messages captured by user B, three scenarios, as presented in Table 1. We used the including not only those posted by user A. Poisson distribution for the message arrival rate, The scenarios analyzed include the following meaning that the time between posting two consecutive possibilities for data completeness. In Figure 4(a) we messages is given by an Exponential distribution with have � ∩ � = {� ∈ � | � ∈ � ��� � ∈ �}, that is, λ = 0.34 seconds, according to the literature [42]. how many tweets are captured (C) from the set of

posted ones (P), and how many belong only to the set 1 Log in with user A U posted by other Twitter users. In Figure 4(b), we 2 T = exponential random variable with ⎣=0.34 3 set seed = n, n>0 have � ∩ � = {}, where samples were unable to 4 T= get variate from T capture any tweet posted by the experiment. 5 sleep t 6 choose media (image or video) by user clicks In Figure 4(c), C is a proper subset of P (� ⊊ �), 7 write hashtag (specific or generic) i.e., samples returned only tweets posted by the 8 post message experiment, but in a smaller number. In Figure 4(d), 9 repeat steps 4, 5, 6, 7 and 8 for all messages we have the opposite situation, i.e., P is a proper subset Figure 2. Algorithm for automated of C, which means that all posted tweets were captured, but many more tweets from other users were message posting also captured. Finally, Figure 4 (e) shows that � = � (� ⊆ � and � ⊆ �), which means that samples returned all and only the tweets posted by the experiment. 7 https://pyautogui.readthedocs.io/en/latest/ 8 https://docs.seleniumhq.org/projects/webdriver/ 9https://selenium-python.readthedocs.io/waits.html

Page 2796 RQ1: Are there messages shown to the Twitter and Instagram users, but not readily accessible by automated applications? If so, what proportion of posted messages can be reliably retrieved? Figure 5 shows that the average completeness for scenario 1 is 100% and 97% for Twitter and Instagram, a) b) respectively. On the other hand, scenario 2 (generic

hashtags: #beautiful #cute #instagood #tbt #love #brasil and 3 MB image) reveals that Twitter also achieved average completeness of 100%, but for Instagram, this figure was much lower, amounting 61%. Finally, scenario 3 (no hashtag and 14 MB video) shows that Twitter amounted again 100% and c) d) e) Instagram a lower percentage of 73%. Figure 4. Venn diagram for the evaluation of collected data

3.5. Metrics

We used the following metrics to analyze the results of the experiments:

● Twitter completeness: calculated as the number of messages captured from user B timeline divided by the number of messages posted by user A. ● Twitter latency: calculated as the difference Figure 5. Data completeness for Twitter and between the timestamp of a message captured by Instagram user B and the timestamp of the same message According to the possible scenarios presented in posted by user A. Figure 4, the data captured from the Twitter platform ● Instagram completeness: calculated as the number presented the characteristics of Figure 4(d), in which P of messages captured from user B timeline divided is a proper subset of C. This means that all tweets by the number of messages posted by user A. posted by user A are shown in the timeline of user B, ● Instagram latency: calculated as the difference but tweets from other users (set U) are visualized as between the timestamp of a message captured by well. On the other hand, for Instagram the captured user B and the timestamp of the same message posts behave as shown by Figure 4(a), where the set C posted by user A. of messages visualized by user B was less than the set P of messages posted by user A. Also, other messages The values of the metrics presented in the results are appeared in the timeline of user B that belong to the U the mean of the 10 replications. We computed set (i.e., posted by other users). asymptotic confidence intervals at the level of 95% Some findings were found observing the captured that are shown as error bars whenever they are data. We observed in Twitter datasets that some tweets meaningful. were reposted by the platform itself, even though the profile is configured to display messages in chronological order in the timeline. The experiments 4. Results show that about 18% of previously viewed tweets appear on their timeline again, without being reposted By performing experiments with Instagram and by user A. Of that percentage, 67% had videos Twitter, we can compare how these platforms respond (scenario 3), and 33% had images (scenarios 1 and 2). to their users and, consequently, perform better for Besides, we observed that the platform policies could different configurations. Based on the data collected in repost a single message up to 8 times. The interval these experiments, we revisit the research questions between the first reposting and the last one ranged formulated in Section 1. from 3 minutes to 18 hours within the experiments. This behavior was not observed for Instagram.

Page 2797 RQ2: Does any message characteristic (e.g., photo, with videos on the Twitter platform take 4.79 seconds video, hashtag) affect the decision to make them on average (varying between 4.19 and 5.39 seconds). available to other users? Thus, the same message with video travels through the Figure 5 shows that for Twitter, user B can Twitter platform about seven times faster (up to 13 visualize 100% of messages from user A, regardless of times faster if we consider the upper margin of error) the hashtag type (specific or generic) or media type than the Instagram platform at scenario 3. (image or video). However, for Instagram, the completeness involved in the messages with six top 5. Discussion hashtags (scenario 2) presented the lowest completeness of 61%, meaning that 39% of the Our 4-step research methodology and results messages with this characteristic are not visualized by presented here helped us obtain insights from which user B. Also, it is possible to verify (scenario 3) that significant lessons could be learned. the posts involving videos are less likely to be visualized (73%), compared to (scenario 1) images and Twitter vs. Instagram: Our experiments specific hashtag (97%). demonstrated that the Twitter platform presented an improved performance when the metrics of message RQ3: After a user posts a message, how long does completeness and latency are analyzed. However, user it take for Twitter and Instagram to make it available B friends who posted on Instagram did not necessarily to other users? make the posts on the Twitter platform. In other words, Figure 6 shows the latency for both platforms, i.e., the “competition of posts” between platforms is not how many seconds have elapsed from user A posting under control. We have to consider that Instagram an image up to the point it is available in the timeline allows fewer automated captures of messages per of user B. For scenario 1, the latency for Twitter is minute, which might be caused by a higher activity about 3 seconds (varying between 2.89 and 3.12 level of Instagram users than Twitter. Specifically, seconds). In contrast, for Instagram, it is about 25.616 even though the percentage of monthly active users on seconds (varying between 21.95 and 29.29 seconds), Instagram and Twitter is comparable (33% vs. 32%), considering a confidence level of 95%. We can notice 61% of Instagram users visit the platform daily that posts with images travel eight times faster through compared to 45% of Twitter users. Thus, we can infer the Twitter platform than through the Instagram that the competition of photos and videos on Instagram platform for this case. is higher since it is a photo-sharing app, while Twitter For scenario 2, Twitter achieves higher seems optimized for short text messages. performance once again, with an average of 6.89 Twitter behavior: The Twitter dataset revealed seconds, varying between 6.18 and 7.62 seconds. On some curiosities, e.g., reposting some messages up to 8 the other hand, it is 25.73 seconds for Instagram, times, even though the was configured for varying between 18.503 and 32.950 seconds. In messages to be presented in chronological order. We scenario 2, messages with generic hashtags travel can conclude that Twitter selects specific tweets, through the Twitter platform about four times faster mainly ones with videos, to be retweeted in followers' than through Instagram. timelines. The criteria are not clear since the messages contained the same type of content. The only difference was the posting time, which may be considered by the platform to decide whether to repost a message. Observer effect: As we used different tools (Selenium, Gramblr, and our programs), our results might have been affected by them. In physics, this phenomenon is widely known as the observer effect, which also happens in active network and system measurements. This is a limitation of experimental methodologies. For example, we cannot quantify Figure 6. Latency for Twitter and Instagram whether Gamblr introduced some significant additional delay to Instagram or if the computed delay is accurate The results show that when it comes to posts with to the time spent for a post to appear in a user timeline. videos (scenario 3), Instagram can take up to 71.45 User profiles: Our experiments depicted a snapshot seconds to make it available for user B timeline, with of Twitter and Instagram behavior when the data was an average of 33.15 seconds. On the other hand, posts collected. This behavior might change for other users

Page 2798 and even for the same users at different circumstances However, when comparing platforms, Twitter according to the platforms' internal algorithm working, outperforms Instagram in all metrics evaluated. which is for us a black box. Our experiments were Future research should repeat the experiments conducted with only two essentially and considering users in different geographic locations antisocial users, with few followers and following since our experiments were performed with users in the fellows. While the platforms' behavior may differ for same geographic location. Moreover, it is relevant to users with a more intense record of interactions, conduct a different set of experiments expanding the conducting such experiments would require the OSN configuration settings, e.g., text, video, or even existing provider's cooperation, given the practical and ethical users with several followers and following of various concerns involved. Therefore, despite these limitations, users. This might allow investigating the role played by we believe our results are valid and useful for different the platform algorithms on data completeness and purposes, as a first attempt to disclose and compare the latency. behavior of two different OSNs. Experiment automation: This research further Acknowledgements contributed to the study carried out by Perera et al. [42] on the analysis of the tweet arrival rate. The use of an appropriate distribution for the message arrival rate This research is part of the INCT of the Future allowed us to work stealthily with the Twitter and Internet for Smart Cities funded by CNPq proc. Instagram platforms outsmarting the detection of 465446/2014-0, Coordenação de Aperfeiçoamento de artificial bot behavior. Pessoal de Nível Superior – Brasil (CAPES) – Finance Platform constraints: We faced some constraints Code 001, FAPESP proc. 14/50937-1, and FAPESP during the execution of the experiments. In step 2 of proc. 15/24485-9. our methodology, we estimated spending 20 hours performing all experiments on the two platforms. References However, in practice, we spent three times more because several replications were detected as unusual behavior by the platforms. For instance, the Twitter [1] Achrekar, H., Gandhe, A., , R., Yu, S. H., & Liu, platform presented some instability, i.e., "Internal B. (2011, April). Predicting flu trends using Twitter data. server error" and other messages of unclear errors, i.e., In 2011 IEEE conference on computer communications workshops (INFOCOM WKSHPS) (pp. 702-707). "Your account may not be authorized to perform this action. Please refresh the page and try again.". Once [2] Alhabash, S., & Ma, tag">M. (2017). A tale of four platforms: the user account is locked, it is necessary to execute a Motivations and uses of Facebook, Twitter, Instagram, few steps to unlock it, including a verification code and among college students? Social Media+ sent by e-mail or SMS to the user. These actions Society, 3(1), 2056305117691544. caused interruptions in the experiments, which [3] Alkhatib, M., El Barachi, M., & Shaalan, K. (2019). An frequently required the experiment replication to be social media based framework for incidents and restarted from scratch. events monitoring in smart cities. Journal of Cleaner Production, 220, 771-785 6. Conclusion [4] Ananny, M., & Crawford, K. (2018). Seeing without knowing: Limitations of the transparency ideal and its This study aimed to evaluate the performance of application to algorithmic accountability. & OSNs, given their increasing use in real-life event society, 20(3), 973-989. detection based on public data. We evaluated how the Twitter and Instagram platforms respond to their users [5] Anthopoulos, L., & Fitsilis, P. (2015, October). Social networks in smart cities: Comparing evaluation models. when providing similar functionalities (e.g., message In 2015 IEEE First International Smart Cities Conference publications, receiving posts at the timeline). Our (ISC2) (pp. 1-6). IEEE. experiments allowed us to understand some of the platforms' behavior. When a user A posts a message, it [6] Aramaki, E., Maskawa, S., & Morita, M. (2011, July). appears in the timeline of another user B over a higher Twitter catches the flu: detecting influenza epidemics performance on Twitter than Instagram. More posts using Twitter. In Proceedings of the conference on published by user A were available to user B on empirical methods in natural language processing (pp. 1568-1576). Association for Computational Linguistics. Twitter than on Instagram. Study findings also showed that both platforms offered a latency near real-time, [7] Ayora, V., Horita, F. and Kamienski, C., Social making them adequate for real-time processing. Networks as Real-time Data Distribution Platforms for

Page 2799 Smart Cities. 10th Latin America Networking Information, Communication & Society, 19(9), 1214- Conference (LANC 2018), October 2018. 1232.

[8] Batrinca, B., & Treleaven, P. C. (2015). Social media [20] Giridhar, P., Wang, S., Abdelzaher, T., Al Amin, T., & analytics: a survey of techniques, tools and platforms. Ai Kaplan, L. (2017, July). Social fusion: Integrating & Society, 30(1), 89-116. Twitter and Instagram for event monitoring. 2017 IEEE International Conference on Autonomic Computing [9] Burrell, J. (2016). How the machine ‘thinks’: (ICAC) (pp. 1-10). Understanding opacity in algorithms. Big Data & Society, 3(1), 2053951715622512.). [21] Gonawela, A., Kumar, R., Thawani, U., Ahmed, D., Chandrasekaran, R. & Pal, J. (2020, January). The [10] Cerrone, D., Pau, H., & Lehtovuori, P. (2015). A sense of Anointed Son, The Hired Gun, and the Chai Wala: place: Exploring the potentials and possible uses of Enemies and Insults in Politicians’ Tweets in the Run-Up Location Based Social Network Data for urban and to the 2019 Indian General Elections. In Proceedings of transportation planning in Turku City Centre. Turku the 53rd Hawaii International Conference on System Urban Research Programme’s Research Report 1/2015. Sciences (HICSS) (pp. 2878-2887).

[11] Constine J. Instagram hits 1 billion monthly users, up [22] Goodchild, M. F. (2007). Citizens as sensors: the world from 800M in September. TechCrunch. 2018. Available of volunteered geography. GeoJournal, 69(4), 211-221. at: https://techcrunch.com/2018/06/20/ instagram-1-billion-users/ Accessed 30 September 2020. [23] Gruzd, A., Lannigan, J., & Quigley, K. (2018). Examining government cross-platform engagement in [12] D’Andrea, A., Ferri, F., & Grifoni, P. (2010). An social media: Instagram vs Twitter and the big lift overview of methods for virtual social networks analysis. project. Government Information Quarterly, 35(4), 579- In Computational (pp. 3-25). . 587.

[13] Dayani, R., Chhabra, N., Kadian, T., & Kaushal, R. [24] Guidry, J. P., Jin, Y., Orr, C. A., Messner, M., & (2015, December). Rumor detection in twitter: An Meganck, S. (2017). on Instagram and Twitter: analysis in retrospect. In 2015 IEEE International How health organizations address the health crisis in Conference on Advanced Networks and their social media engagement. Public relations review, Telecommunications Systems (ANTS)(pp. 1-3). 43(3), 477-486.

[14] De Assis, L. F. F. G., de Albuquerque, J. P., Herfort, B., [25] Horita, F. E., de Albuquerque, J. P., Marchezini, V., & Steiger, E., & Horita, F. E. A. (2015). Geographical Mendiondo, E. M. (2017). Bridging the gap between prioritization of social network messages in near real- decision-making and emerging big data sources: An time using sensor data streams: an application to floods. application of a -based framework to disaster Revista Brasileira de Cartografia, 68(6). management in Brazil. Decision Support Systems, 97, 12-22. [15] Doran, D., Gokhale, S., & Dagnino, A. (2013, August). Human sensing for smart cities. In Proceedings of the [26] Hammerl, T., Schwaiger, J. M., & Leist, S. (2019). 2013 IEEE/ACM International Conference on Advances Measuring the Success of Social Media: Matching in Social Networks Analysis and Mining (pp. 1323- Identified Success Factors to Social Media KPIs. 1330). [27] Hu, Y., Manikonda, L., & Kambhampati, S. (2014, May). [16] Doran, D., Severin, K., Gokhale, S., & Dagnino, A. What we instagram: A first analysis of instagram photo (2016). Social media enabled human sensing for smart content and user types. In Eighth International AAAI cities. AI Communications, 29(1), 57-75. conference on weblogs and social media.

[17] Driscoll, K., & Walker, S. (2014). Big data, big [28] Hutchinson, A., 2016. Here's why Twitter is so questions| working within a black box: Transparency in important, to everyone. Available at: the collection and production of big twitter data. https://www.socialmediatoday.com/social- International Journal of Communication, 8, 20. networks/heres-why-twitter-soimportant-everyone, Accessed date: 10 June 2020. [18] Eslami, M., Rickman, A., Vaccaro, K., Aleyasen, A., Vuong, A., Karahalios, K., ... & Sandvig, C. (2015, [29] Joseph, K., Landwehr, P. M., & Carley, K. M. (2014, April). " I always assumed that I wasn't really that close April). Two 1% s don’t make a whole: Comparing to [her]" Reasoning about Invisible Algorithms in News simultaneous samples from Twitter’s streaming API. In Feeds. 33rd annual ACM conference on human factors in International conference on social computing, computing systems (pp. 153-162). behavioral-cultural modeling, and prediction(pp. 75-83).

[19] Gilbert, S. (2016). Learning in a Twitter-based [30] Karlsson, M., & Sjøvaag, H. (2016). Content analysis of practice: an exploration of knowledge and online news: epistemologies of analysing the exchange as a motivation for participation in# hcsmca. ephemeral Web. Digital , 4(1), 177-192.

Page 2800 [31] Instagram. Changes to Improve Your Instagram Feed. [42] Perera, R. D., Anand, S., Subbalakshmi, K. P., & March 22, 2018. Available at: https://instagram- Chandramouli, R. (2010, October). Twitter analytics: press.com/blog/2018/03/22/changes-to-improve-your- Architecture, tools and analysis. 2010 Military instagram-feed/. Accessed 10 June 2020. Communications Conference (pp. 2186-2191).

[32] Instagram. Changes to Our Account Disable Policy. July [43] Prakash, A., Caton, S., Haas, C.. (2019, January). 18, 2019. Available at: https://instagram- Utilizing Social Media For . In press.com/blog/2019/07/18/changes-to-our-account- Proceedings of the 52nd Hawaii International Conference disable-policy/. Accessed 10 June 2020. on System Sciences (HICSS) (pp. 2304-2313).

[33] Li, R., Lei, K. H., Khadiwala, R., & Chang, K. C. C. [44] Rader, E., Cotter, K., & Cho, J. (2018, April). (2012, April). Tedas: A Twitter-based event detection Explanations as mechanisms for supporting algorithmic and analysis system. In 2012 IEEE 28th International transparency. In Proceedings of the 2018 CHI conference Conference on Data Engineering (pp. 1273-1276). IEEE. on human factors in computing systems (pp. 1-13)).

[34] Mohsin M. 10 Instagram Stats Every Marketer Should [45] Rathnayake, C., & Suthers, D. D. (2018). Twitter issue Know in 2019. Oberlo.2019. Available at: response hashtags as affordances for momentary https://www.oberlo.com/blog/instagram-stats-every- connectedness. Social Media+ Society, 4(3), marketer-should-know. Accessed 5 June 2019 2056305118784780.

[35] Mora, N., Caballé, S., Daradoumis, T., Barolli, L., Kulla, [46] Sakaki, T., Okazaki, M., & Matsuo, Y. (2010, April). E., & Spaho, E. (2014, July). Characterizing social Earthquake shakes Twitter users: real-time event network e-assessment in collaborative complex learning detection by social sensors. 19th international conference resources. In 2014 Eighth International Conference on on (pp. 851-860). Complex, Intelligent and Software Intensive Systems (pp. 257-264). [47] Singh, P., Dwivedi, Y. K., Kahlon, K. S., Sawhney, R. S., Alalwan, A. A., & Rana, N. P. (2019). Smart Monitoring [36] Morstatter, F., Pfeffer, J., Liu, H., & Carley, K. M. (2013, and Controlling of Government Policies Using Social June). Is the sample good enough? comparing data from Media and Cloud Computing. Information Systems twitter's streaming with twitter's firehose. In Seventh Frontiers, 1-23. international AAAI conference on weblogs and social media. [48] Smith, K., 2019. : 126 Amazing Social Media Statistics and Facts. blog. Available at. [37] Moustaka, V., Vakali, A., & Anthopoulos, L. G. (2018). https://www.brandwatch.com/blog/amazing-social- A systematic review for smart city data analytics. ACM media-statistics-and-facts/, Accessed date: 24 June 2019. Computing Surveys (CSUR), 51(5), 103. [49] Stieglitz, S., & Dang-Xuan, L. (2013). Social media and [38] Moustaka, V., Theodosiou, Z., Vakali, A., Kounoudes, political communication: a social media analytics A., & Anthopoulos, L. G. (2019). Εnhancing social framework. Social network analysis and mining, 3(4), networking in smart cities: Privacy and security 1277-1291. borderlines. Technological Forecasting and Social Change, 142, 285-300. [50] Stieglitz, S., Mirbabaie, M., Ross, B., & Neuberger, C. (2018). Social media analytics–Challenges in topic [39] Nummi, P. (2019). Social media data analysis in urban e- discovery, data collection, and data preparation. planning. In Smart Cities and Smart : Concepts, International journal of information management, 39, Methodologies, Tools, and Applications (pp. 636-651). 156-168.

[40] O’Neil E. API Updates and Importante Changes. [51] Thakur, M., Patel, D., Kumar, S., & Barua, J. (2014, Facebook.com. April 25, 2019. Available at: December). Newsinstaminer: Enriching news article https://developers.facebook.com/blog/post/v2/2019/04/25 using instagram. In International Conference on Big Data /api-updates/. Accessed , 2019. Analytics(pp. 174-188). Springer, Cham.

[41] Panagiotou, N., Katakis, I., & Gunopulos, D. (2016). [52] Wang, S., Giridhar, P., Kaplan, L., & Abdelzaher, T. Detecting events in online social networks: Definitions, (2017, April). Unsupervised event tracking by integrating trends and challenges. In Solving Large Scale Learning twitter and instagram. In Proceedings of the 2nd Tasks. Challenges and Algorithms (pp. 42-84). International Workshop on Social Sensing (pp. 81-86).

Page 2801