A Dataset of Media Realeases (Twitter, News and Comments, Youtube, Facebook) Form Poland Related to COVID-19 for Open Research
Total Page:16
File Type:pdf, Size:1020Kb
A dataset of media realeases (Twitter, News and Comments, Youtube, Facebook) form Poland related to COVID-19 for open research Andrzej Jarynowski, Monika Wójta-Kempa, Małgorzata Stochmal, Daniel Płatek, Martyna Mencel, Jan Maciejewski, Alexander Semenov, Vitaly Belik Interdisciplinary Research Institute, Wroclaw, Poland; System Modeling Group, Institute of Veterinary Epidemiology and Biostatistics, Freie Universität Berlin, Germany; and many others EOSC USE Case 03.09.2020 1 / 1 Outline I Timeline of events in Poland I The use of the traditional and social-content media on the Internet in Poland I infodemics I Data: Twitter, News and Comments, Youtube, Facebook I Analysis of text records I Analysis of networks 2 / 1 Internet media Poland In this study, by quantitative analysis of “digital traces/footprints” on the Internet (social and traditional media), we concentrate on the number and nature of events such as information queries and information propagation. We cover ca. 28 millions Polish language speaking residents of Poland representing over 85% of the literate population. The Polish Internet users are younger (especially among content creators) with females prevailing (especially among content receivers) as compared to the average population. Our results do not describe the perception of the oldest groups, due to digital exclusions, health literacy limitations or consideration of social media as less informative 3 / 1 Interest in “Coronavirus” in Poland 16.03 services 24.03 closure mobility restriction 29.01 31.01 16.04 First case 27.02 04.03 25.05 WHO declares 02-03.03 Restriction In Germany Fake news first case Restriction 16.06 Global threat Special releasing On first case In Poland 25.04 releasing Restriction Act plan 25.01 In Poland New 2 releasing MD Liang exams 3 dates Wudong 11.03 death Pandemic and 31.03 school closure 04.04 Epidemic Anticrisis declaration Anticrisis And law Mitigation Physical Law 2.0 Italian Waiting phase distansing Restrictions releasing Chinese phase phase Phase Anticrisis We have identified several temporal major clusters of interest on the topic COVID-19: 1) Chinese, 2) Italian, 3) Waiting, 4) Mitigations, 5) Social distancing and Lockdown, 6) Anti-crisis shield, 7) Restrictions releasing, 8) Incease of incidence. 4 / 1 SYTUACJA OGÓLNOPOLSKA Phases PHASES EMOTIONS TOPICS Chinese neutral economics 15.01-17.02 Italian negative healthcare system 18-27.02 waiting diverse politics 28.02-09.03 declaration of restrictions fear restrictions 10.03-17.03 Stay at home positive social 18.03-31.03 anticrisis shield diverse epidemiology 01.04-07.05 restrictions releasing optimism politics and economy 08.05-25.07 5 / 1 incidence growth fear social 26.07-05.08 Intensity of the interest 0 25 50 75 100 in the Internet Interest in “Coronavirus” did not reflect transmision dynamics Moreover, “Are Online Searches for the Novel Coronavirus (COVID-19) Related to Media or Epidemiology? A Cross-sectional Study”, published in International Journal of Infectious Diseases (doi: https://doi.org/10.1016/j.ijid.2020.06.028) presented the case important conclusions. that online searches for COVID-19 in Europe are not correlated with epidemiology. 6 / 1 Interest in “Coronavirus” could reflect transmision dynamics in second wave Incease of both incidence and interests in last weeks 7 / 1 Infodemic I Traditional vs social-containment media I a concept in the social medicine area: “lay-referral framing system” where opinions and beliefs of the general public are something different from medical knowledge. I information ws widely treated as a trigger of consequences impacting daily-life security I declarations of mitigation strategies by the prime minister or the minister of health gathered the highest attention to the COVID-19 pandemic of the Internet users in Poland. Thus enacted or in force events do not affect the interest to such extent. However, databases on mitigation strategies (WHO, ECDC, etc.) do not keep this in their records, although it would be of great importance. 8 / 1 RSV by Voivevodship 100 90 First case 80 in Lubusz 70 Fake case Outbreaks 60 in Lodz in Silesia 50 The Power of Fake News V S 40 R 30 20 10 0 5 9 3 7 1 4 8 2 6 0 4 8 3 7 1 5 9 3 7 1 4 8 2 6 0 4 8 2 6 0 4 8 2 6 0 3 7 1 5 9 3 7 -1 -1 -2 -2 -3 -0 -0 -1 -1 -2 -2 -2 -0 -0 -1 -1 -1 -2 -2 -3 -0 -0 -1 -1 -2 -2 -2 -0 -0 -1 -1 -1 -2 -2 -3 -0 -0 -1 -1 -1 -2 -2 1 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 -0 -0 -0 -0 -0 -0 -0 -0 -0 -0 -0 -0 -0 -0 -0 -0 -0 -0 -0 -0 -0 -0 -0 -0 -0 -0 -0 -0 -0 -0 -0 -0 -0 -0 -0 -0 -0 -0 -0 -0 -0 -0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 koronawirus: (Kuyavian-Pomeranian Voivodeship) koronawirus: (Greater Poland Voivodeship) koronawirus: (Lesser Poland Voivodeship) koronawirus: (lodz Voivodeship) koronawirus: (Lower Silesian Voivodeship) koronawirus: (Lublin Voivodeship) koronawirus: (Lubusz Voivodeship) koronawirus: (Masovian Voivodeship) koronawirus: (Opole Voivodeship) koronawirus: (Lower Silesian Voivodeship) koronawirus: (Pomeranian Voivodeship) koronawirus: (Silesian Voivodeship) koronawirus: (Subcarpathian Voivodeship) koronawirus: (Swietokrzyskie) koronawirus: (Warmian-Masurian Voivodeship) koronawirus: (West Pomeranian Voivodeship) An effect of fake news of the first confirmed case 9 / 1 The Power of Information RSV by Voivevodship 100 90 80 70 Lubuskie 60 Zachodniopomorskie Warmińsko-Mazurskie 50 Dolnośląske V 40 S Śląskie R 30 Mazowieckie Małopolskie 20 Wielkopolskie 10 Lubelskie day Opolskie 0 Podkarpackie Łódzkie Świętokrzyskie Pomorskie koronawirus: (Kuyavian-Pomeranian Voivodeship) koronawirus: (Greater Poland Voivodeship) Podlaskie Kujawsko-Pomorskie koronawirus: (Lesser Poland Voivodeship) koronawirus: (lodz Voivodeship) 0 6 2 8 koronawirus: (Lower Silesian Voivodeship) koronawirus: (Lublin Voivodeship) 1 1 koronawirus: (Lubusz Voivodeship) koronawirus: (Masovian Voivodeship) koronawirus: (Opole Voivodeship) koronawirus: (Lower Silesian Voivodeship) koronawirus: (Pomeranian Voivodeship) koronawirus: (Silesian Voivodeship) koronawirus: (Subcarpathian Voivodeship) koronawirus: (Swietokrzyskie) koronawirus: (Warmian-Masurian Voivodeship) koronawirus: (West Pomeranian Voivodeship) Introduction day as the main driving force 10 / 1 Datasets (deposited) https://zenodo.org/record/3985807 I extracted 57,306 representative articles in Polish using Eventregitry.org tool in language Polish and topic “Coronavirus” in article body; I extracted 1,015,199 and Tweets from #Koronawirus in language Polish usiing Twitter API. I collected 1,574 videos with keyword: Koronawirus on YouTube and 247,575 comments on them using Google API; 11 / 1 Datasets (deposited) 2 I manually labeled 1,449 articles / Facebook posts from Lower Silesia and 111 texts from outside this region; I manually labeled 1,500 Twitts / Youtube comments (for ML training); I analysis of 244 social empirical studies till 25.05 on COVID-19 in Poland. 12 / 1 Techniques Methodological and statistical analysis is a crucial point in this kind of study. We apply: I the number and nature of social media events such as information queries (time series analysis); I sentiment and conceptual fields analysis; I social network analysis; I topic modeling technique; I comparative approach bewtween local and national-wide media; I quasi-systematic review of studies. 13 / 1 Twitter I We have collected >1 000 000 with #Koronawiurs I Twitter in Poland has relatively low popularity (~3 million registered users or less than 8% of the country population) and is mainly used by expats, journalists and politicians I For Twitter data we will deploy the temporal multilayer network analysis of users (using metadata on quotes, retweets, mentions, replies, and followings), because networks can be constructed based on the mentioned metadata types. 14 / 1 Retweet networks ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● M_K_Blonska●●● ● ● ● ●●● ● ●● ● ● ● ● ●MichalSzczerba● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● AndrzejTurczyn● ●●● ● ●● ● ●● ● ● ● ● ● ● ● ● ●● MZ_GOV_PL● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● PremierRP● ● ● ● ●● ● ●●● ● ● ● ● ● ● ● ● ●● ● ● ● ● KONFEDERACJA_ ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● A network of retweets with the Koronawirus hashtag in Polish language ● during 28.01-11.03.2020 period (250 most central nodes according to weighted centrality, coloring according to● Louvain community detection algorithm). Yellow – the ruling coalition, green – the mainstream opposition, dark blue – a Protestant minority, light blue – a far right party ● 15 / 1 Polarization 0.9 net_layer 0.8 mod_retweet mod_reply mod_quote modularity 0.7 0.6 10 20 30 weeks Modularity of the full networks (over 30 thousands accounts) and their weekly dynamics for different type of communication relations (retweeting, reply, quotation). 16 / 1 Actors The most central users 17 / 1 News articles I We have collected >60000 epresentative digital traditional media news (selected with EventRegistry tool) with stemmed word Koronawirus in title or short body. I We choose EventRegistry (eventregistry.org) as a traditional media search engine because it has a large range of online news services representing a broad political spectrum.