A General Method for Estimating the Prevalence of Influenza-Like-Symptoms with Wikipedia Data

Total Page:16

File Type:pdf, Size:1020Kb

A General Method for Estimating the Prevalence of Influenza-Like-Symptoms with Wikipedia Data A general method for estimating the prevalence of Influenza-Like-Symptoms with Wikipedia data Giovanni De Toni Cristian Consonni Alberto Montresor DISI, University of Trento Eurecat - Centre Tecnòlogic of DISI, University of Trento [email protected] Catalunya [email protected] [email protected] ABSTRACT Wikipedia1 is a multilingual encyclopedia, written collabora- Influenza is an acute respiratory seasonal disease that affects mil- tively by volunteers online. As of this writing, the English version lions of people worldwide and causes thousands of deaths in Europe alone has more than 6M articles and 50M pages and it is edited on alone. Being able to estimate in a fast and reliable way the impact average by more than 65k active users every month [16]. of an illness on a given country is essential to plan and organize Research has shown that Wikipedia is a first-stop source for in- effective countermeasures, which is now possible by leveraging un- formation of all kinds, including information about science [12, 19]. conventional data sources like web searches and visits. In this study, This is of particular significance in fields such as medicine, where it we show the feasibility of exploiting information about Wikipedia’s has been shown that Wikipedia is a prominent result for over 70% page views of a selected group of articles and machine learning of search results for health-related keywords [5]. Wikipedia is also models to obtain accurate estimates of influenza-like illnesses in- the most viewed medical resources globally [3], and it is used as an cidence in four European countries: Italy, Germany, Belgium, and information source by 50% to 70% of practicing physicians [2]. the Netherlands. We propose a novel language-agnostic method, In this study, we investigate how to exploit Wikipedia as a source based on two algorithms, Personalized PageRank and CycleRank, of data for doing rapid now-casting of the influenza activity levels in to automatically select the most relevant Wikipedia pages to be several European countries. McIver and Brownstein [7] developed monitored without the need for expert supervision. We then show a method based on Wikipedia’s pageviews, which behaves quite how our model is able to reach state-of-the-art results by comparing well when predicting influenza levels in the United States provided it with previous solutions. by the Centers for Disease Control and Prevention (CDC). They use pageviews of a predefined set of pages from the English language edition of Wikipedia chosen by a panel of experts. Generous et al. [1] also developed linear machine learning models and demon- 1 INTRODUCTION strated the ability to forecast incidence levels for 14 location-disease combinations. Influenza is a widespread acute respiratory infection that occurs We extend upon the state of the art by proposing a novel method with seasonal frequency, mainly during the autumn-winter period. that can be applied automatically to other languages and countries. The European Centre for Disease and Control (ECDC) estimates We focused our attention on predicting influenza levels in Italy, that influenza causes up to 50 millions cases and upto 70,000 deaths Germany, Belgium and the Netherlands. In extending this work, we annually in Europe [13, 20]. Globally, the number of deaths caused have developed a method that is more flexible with respect of the by this infection ranges from 290,000 to 650,000 [21]. The most af- characteristics of the edition of Wikipedia and that can be applied fected categories are children in the pediatric age range and seniors. to any language thus enabling us to build and deploy machine Severe cases are recorded in people above 65 years old and with pre- learning models for Influenza-like Illnesses (ILI) estimation without vious health conditions, for example, respiratory or cardiovascular the need of expert supervision. illnesses [8, 14]. The paper is structured as follows: in Section 2 we review the The ECDC monitors the influenza virus activities and provides relevant datasets that we have used in our study. In Section 3 and 4 weekly bulletins that aggregate the open data coming from all the arXiv:2010.14903v1 [cs.CY] 28 Oct 2020 we discuss our models and how we selected the relevant features to European countries. The local data are gathered from the national use for predicting influenza activity levels from Wikipedia data. We networks of volunteer physicians [22]. The Centre supplies only present the results of our study in Section 5 and 6, and we discuss data about the current situation, which means that they do not pro- them in Section 7. Finally, we conclude the paper in Section 8. vide any valuable information about which will be the future impact of the influenza illness in a European country, namely the number of future infected people. Moreover, ECDC data are available with a 2 DATA SOURCES delay ranging between one and two weeks, which prevents taking Wikipedia page views. Data about Wikipedia page views are preventive measures to mitigate the impact of influenza. In recent freely available to download. For our study, we have used two 2 years, thanks to machine learning techniques, new methods have datasets: the pagecounts dataset (December 2007-May 2016, for been developed to estimate the activity level of diseases by harness- a total of 461 weeks) and the pageviews dataset (October 2016 to ing the power of the internet and big data. Researchers tried to use April 2019, for a total of 134 weeks). Both of them count the number unconventional sources of data to make predictions, for example, tweets from Twitter [4, 10, 11], Google’s search keywords [18] or 1https://www.wikipedia.org social media [17]. 2Pagecounts-raw on wikitech.wikimedia.org, https://w.wiki/gnq, visited on 2020-10-14 of page hits to Wikipedia articles. The pagecounts dataset contains Italian data were downloaded from the InfluNet system4, which non-filtered page views, including automatically bot-generated is the Italian flu surveillance program managed by the Istituto ones; moreover, it does not contain data of page hits from mobile Superiore di Sanità; German data were obtained from the Robert devices. The pageviews dataset has been recently developed, su- Koch Institute5; Dutch and Belgian data were take from the GISRS perseding pagecounts in October 2016, and includes only human (Global Influenza Surveillance and Response System) online tool traffic from both desktop and mobile devices. By considering both which is available from the WHO website6, since at the best of our these datasets, we can examine a longer time range while estimating knowledge, the two countries do not offer open data from national the impact of different models on our results. health institute websites. From these datasets, we extracted the page views of three differ- For Germany and Italy, we cover 12 influenza seasons (from ent versions of Wikipedia: Italian, German, and Dutch. We selected 2007-2008 to 2018-2019); for the Netherlands and Belgium, we have these languages because the majority of page views for each of data that cover 9 influenza seasons (from 2010-2011 to 2018-2019). them come almost exclusively from one single area.3 For instance, Each of the datasets records the influenza incidence aggregated just 41.5% of the page views of the English Wikipedia come from weekly over 100 000 patients. For each influenza seasons, each of US and UK, while all the rest comes from all over the world. On the the dataset entries represents the influenza level on a specific week. contrary, 87.0% of the page views of the Italian version of Wikipedia For each influenza seasons, data cover 26 weeks, from the 42nd come from Italy, 77.0% of the page views of the German version week of the previous year to the 15th week of the following year; of Wikipedia come from Germany, and almost 89% of the Dutch for instance, the 2009-2010 influenza season comprises data ranging Wikipedia’s page views come from the Netherlands and Belgium from the 42nd week of 2009 to the 15th week of 2010. The complete (20.3% coming from Belgium and 69.1% coming from the Nether- dataset has the following size: 312 weeks of data for Germany and lands alone). This enables us to build datasets which contain less Italy, 234 for the Netherlands and Belgium. noise and which are only affected by the internet behaviour ofthe population of the country or area we are examining. 3 FEATURE SELECTION Each of row is composed of four columns, namely: the project’s The selection of the pages to monitor was based on the following language, the page title, the number of requests and the size in byte assumption: during the influenza seasons, we should see an incre- of the page retrieved. Data are available with hourly granularity; ment in the page view number of some specific Wikipedia’s pages, to use them in our project, we filtered them to select only specific which should be related to flu topics (e.g., “Influenza”, “Headache”, entries and then we computed an aggregated weekly value of the “Flu vaccine”, etc.). That occurs because most people tend to search page views for each of the entries selected. online the symptoms of their disease when they are sick, trying to We generated weekly aggregates for page views for the period get more information about their illness. ranging from the 50th week of 2007 to the 15th week of 2019, for a We use three different methodologies in order to avoid to make total of 591 weeks. pageviews data were used to cover the period unnecessary assumptions about which pages to choose. from September 2016 to April 2019 period, while for the previous period pagecounts data were used.
Recommended publications
  • Modeling Popularity and Reliability of Sources in Multilingual Wikipedia
    information Article Modeling Popularity and Reliability of Sources in Multilingual Wikipedia Włodzimierz Lewoniewski * , Krzysztof W˛ecel and Witold Abramowicz Department of Information Systems, Pozna´nUniversity of Economics and Business, 61-875 Pozna´n,Poland; [email protected] (K.W.); [email protected] (W.A.) * Correspondence: [email protected] Received: 31 March 2020; Accepted: 7 May 2020; Published: 13 May 2020 Abstract: One of the most important factors impacting quality of content in Wikipedia is presence of reliable sources. By following references, readers can verify facts or find more details about described topic. A Wikipedia article can be edited independently in any of over 300 languages, even by anonymous users, therefore information about the same topic may be inconsistent. This also applies to use of references in different language versions of a particular article, so the same statement can have different sources. In this paper we analyzed over 40 million articles from the 55 most developed language versions of Wikipedia to extract information about over 200 million references and find the most popular and reliable sources. We presented 10 models for the assessment of the popularity and reliability of the sources based on analysis of meta information about the references in Wikipedia articles, page views and authors of the articles. Using DBpedia and Wikidata we automatically identified the alignment of the sources to a specific domain. Additionally, we analyzed the changes of popularity and reliability in time and identified growth leaders in each of the considered months. The results can be used for quality improvements of the content in different languages versions of Wikipedia.
    [Show full text]
  • Mapping the Nature of Content and Processes on the English Wikipedia
    ‘Creating Knowledge’: Mapping the nature of content and processes on the English Wikipedia Report submitted by Sohnee Harshey M. Phil Scholar, Advanced Centre for Women's Studies Tata Institute of Social Sciences Mumbai April 2014 Study Commissioned by the Higher Education Innovation and Research Applications (HEIRA) Programme, Centre for the Study of Culture and Society, Bangalore as part of an initiative on ‘Mapping Digital Humanities in India’, in collaboration with the Centre for Internet and Society, Bangalore. Supported by the Ford Foundation’s ‘Pathways to Higher Education Programme’ (2009-13) Introduction Run a search on Google and one of the first results to show up would be a Wikipedia entry. So much so, that from ‘googled it’, the phrase ‘wikied it’ is catching up with students across university campuses. The Wikipedia, which is a ‘collaboratively edited, multilingual, free Internet encyclopedia’1, is hugely popular simply because of the range and extent of topics covered in a format that is now familiar to most people using the internet. It is not unknown that the ‘quick ready reference’ nature of Wikipedia makes it a popular source even for those in the higher education system-for quick information and even as a starting point for academic writing. Since there is no other source which is freely available on the internet-both in terms of access and information, the content from Wikipedia is thrown up when one runs searches on Google, Yahoo or other search engines. With Wikipedia now accessible on phones, the rate of distribution of information as well as the rate of access have gone up; such use necessitates that the content on this platform must be neutral and at the same time sensitive to the concerns of caste, gender, ethnicity, race etc.
    [Show full text]
  • Does Wikipedia Matter? the Effect of Wikipedia on Tourist Choices Marit Hinnosaar, Toomas Hinnosaar, Michael Kummer, and Olga Slivko Discus­­ Si­­ On­­ Paper No
    Dis cus si on Paper No. 15-089 Does Wikipedia Matter? The Effect of Wikipedia on Tourist Choices Marit Hinnosaar, Toomas Hinnosaar, Michael Kummer, and Olga Slivko Dis cus si on Paper No. 15-089 Does Wikipedia Matter? The Effect of Wikipedia on Tourist Choices Marit Hinnosaar, Toomas Hinnosaar, Michael Kummer, and Olga Slivko First version: December 2015 This version: September 2017 Download this ZEW Discussion Paper from our ftp server: http://ftp.zew.de/pub/zew-docs/dp/dp15089.pdf Die Dis cus si on Pape rs die nen einer mög lichst schnel len Ver brei tung von neue ren For schungs arbei ten des ZEW. Die Bei trä ge lie gen in allei ni ger Ver ant wor tung der Auto ren und stel len nicht not wen di ger wei se die Mei nung des ZEW dar. Dis cus si on Papers are inten ded to make results of ZEW research prompt ly avai la ble to other eco no mists in order to encou ra ge dis cus si on and sug gesti ons for revi si ons. The aut hors are sole ly respon si ble for the con tents which do not neces sa ri ly repre sent the opi ni on of the ZEW. Does Wikipedia Matter? The Effect of Wikipedia on Tourist Choices ∗ Marit Hinnosaar† Toomas Hinnosaar‡ Michael Kummer§ Olga Slivko¶ First version: December 2015 This version: September 2017 September 25, 2017 Abstract We document a causal influence of online user-generated information on real- world economic outcomes. In particular, we conduct a randomized field experiment to test whether additional information on Wikipedia about cities affects tourists’ choices of overnight visits.
    [Show full text]
  • Flags and Banners
    Flags and Banners A Wikipedia Compilation by Michael A. Linton Contents 1 Flag 1 1.1 History ................................................. 2 1.2 National flags ............................................. 4 1.2.1 Civil flags ........................................... 8 1.2.2 War flags ........................................... 8 1.2.3 International flags ....................................... 8 1.3 At sea ................................................. 8 1.4 Shapes and designs .......................................... 9 1.4.1 Vertical flags ......................................... 12 1.5 Religious flags ............................................. 13 1.6 Linguistic flags ............................................. 13 1.7 In sports ................................................ 16 1.8 Diplomatic flags ............................................ 18 1.9 In politics ............................................... 18 1.10 Vehicle flags .............................................. 18 1.11 Swimming flags ............................................ 19 1.12 Railway flags .............................................. 20 1.13 Flagpoles ............................................... 21 1.13.1 Record heights ........................................ 21 1.13.2 Design ............................................. 21 1.14 Hoisting the flag ............................................ 21 1.15 Flags and communication ....................................... 21 1.16 Flapping ................................................ 23 1.17 See also ...............................................
    [Show full text]
  • Wikipedia Matters∗
    Wikipedia Matters∗ Marit Hinnosaar† Toomas Hinnosaar‡ Michael Kummer§ Olga Slivko¶ September 29, 2017 Abstract We document a causal impact of online user-generated information on real-world economic outcomes. In particular, we conduct a randomized field experiment to test whether additional content on Wikipedia pages about cities affects tourists’ choices of overnight visits. Our treatment of adding information to Wikipedia increases overnight visits by 9% during the tourist season. The impact comes mostly from improving the shorter and incomplete pages on Wikipedia. These findings highlight the value of content in digital public goods for informing individual choices. JEL: C93, H41, L17, L82, L83, L86 Keywords: field experiment, user-generated content, Wikipedia, tourism industry 1 Introduction Asymmetric information can hinder efficient economic activity. In recent decades, the Internet and new media have enabled greater access to information than ever before. However, the digital divide, language barriers, Internet censorship, and technological con- straints still create inequalities in the amount of accessible information. How much does it matter for economic outcomes? In this paper, we analyze the causal impact of online information on real-world eco- nomic outcomes. In particular, we measure the impact of information on one of the primary economic decisions—consumption. As the source of information, we focus on Wikipedia. It is one of the most important online sources of reference. It is the fifth most ∗We are grateful to Irene Bertschek, Avi Goldfarb, Shane Greenstein, Tobias Kretschmer, Thomas Niebel, Marianne Saam, Greg Veramendi, Joel Waldfogel, and Michael Zhang as well as seminar audiences at the Economics of Network Industries conference in Paris, ZEW Conference on the Economics of ICT, and Advances with Field Experiments 2017 Conference at the University of Chicago for valuable comments.
    [Show full text]
  • ORES: Lowering Barriers with Participatory Machine Learning in Wikipedia
    ORES: Lowering Barriers with Participatory Machine Learning in Wikipedia AARON HALFAKER∗, Microsoft, USA R. STUART GEIGER†, University of California, San Diego, USA Algorithmic systems—from rule-based bots to machine learning classifiers—have a long history of supporting the essential work of content moderation and other curation work in peer production projects. From counter- vandalism to task routing, basic machine prediction has allowed open knowledge projects like Wikipedia to scale to the largest encyclopedia in the world, while maintaining quality and consistency. However, conversations about how quality control should work and what role algorithms should play have generally been led by the expert engineers who have the skills and resources to develop and modify these complex algorithmic systems. In this paper, we describe ORES: an algorithmic scoring service that supports real-time scoring of wiki edits using multiple independent classifiers trained on different datasets. ORES decouples several activities that have typically all been performed by engineers: choosing or curating training data, building models to serve predictions, auditing predictions, and developing interfaces or automated agents that act on those predictions. This meta-algorithmic system was designed to open up socio-technical conversations about algorithms in Wikipedia to a broader set of participants. In this paper, we discuss the theoretical mechanisms of social change ORES enables and detail case studies in participatory machine learning around ORES from the 5 years since its deployment. CCS Concepts: • Networks → Online social networks; • Computing methodologies → Supervised 148 learning by classification; • Applied computing → Sociology; • Software and its engineering → Software design techniques; • Computer systems organization → Cloud computing; Additional Key Words and Phrases: Wikipedia; Reflection; Machine learning; Transparency; Fairness; Algo- rithms; Governance ACM Reference Format: Aaron Halfaker and R.
    [Show full text]
  • Cross-Lingual Infobox Alignment in Wikipedia Using Entity-Attribute Factor Graph
    Cross-lingual infobox alignment in Wikipedia using Entity-Attribute Factor Graph Yan Zhang, Thomas Paradis, Lei Hou, Juanzi Li, Jing Zhang and Haitao Zheng Knowledge Engineering Group, Tsinghua University, Beijing, China [email protected], [email protected], [email protected], [email protected], [email protected], [email protected] Abstract. Wikipedia infoboxes contain information about article enti- ties in the form of attribute-value pairs, and are thus a very rich source of structured knowledge. However, as the different language versions of Wikipedia evolve independently, it is a promising but challenging prob- lem to find correspondences between infobox attributes in different lan- guage editions. In this paper, we propose 8 effective features for cross lingual infobox attribute matching containing categories, templates, at- tribute labels and values. We propose entity-attribute factor graph to consider not only individual features but also the correlations among attribute pairs. Experiments on the two Wikipedia data sets of English- Chinese and English-French show that proposed approach can achieve high F1-measure:85.5% and 85.4% respectively on the two data sets. Our proposed approach finds 23,923 new infobox attribute mappings be- tween English and Chinese Wikipedia, and 31,576 between English and French based on no more than six thousand existing matched infobox attributes. We conduct an infobox completion experiment on English- Chinese Wikipedia and complement 76,498 (more than 30% of EN-ZH Wikipedia existing cross-lingual links) pairs of corresponding articles with more than one attribute-value pairs.
    [Show full text]
  • Wikipedia Matters∗
    Wikipedia Matters∗ Marit Hinnosaar† Toomas Hinnosaar‡ Michael Kummer§ Olga Slivko¶ July 14, 2019 Abstract We document a causal impact of online user-generated information on real-world economic outcomes. In particular, we conduct a randomized field experiment to test whether additional content on Wikipedia pages about cities affects tourists’ choices of overnight visits. Our treatment of adding information to Wikipedia in- creases overnight stays in treated cities compared to non-treated cities. The impact is largely driven by improvements to shorter and relatively incomplete pages on Wikipedia. Our findings highlight the value of content in digital public goods for informing individual choices. JEL: C93, H41, L17, L82, L83, L86 Keywords: field experiment, user-generated content, Wikipedia, tourism industry ∗We are grateful to Irene Bertschek, Avi Goldfarb, Shane Greenstein, Tobias Kretschmer, Michael Luca, Thomas Niebel, Marianne Saam, Greg Veramendi, Joel Waldfogel, and Michael Zhang as well as seminar audiences at the Economics of Network Industries conference (Paris), ZEW Conference on the Economics of ICT (Mannheim), Advances with Field Experiments 2017 Conference (University of Chicago), 15th Annual Media Economics Workshop (Barcelona), Conference on Digital Experimentation (MIT), and Digital Economics Conference (Toulouse) for valuable comments. Ruetger Egolf, David Neseer, and Andrii Pogorielov provided outstanding research assistance. Financial support from SEEK 2014 is gratefully acknowledged. †Collegio Carlo Alberto and CEPR, [email protected] ‡Collegio Carlo Alberto, [email protected] §Georgia Institute of Technology & University of East Anglia, [email protected] ¶ZEW, [email protected] 1 1 Introduction Asymmetric information can hinder efficient economic activity (Akerlof, 1970). In recent decades, the Internet and new media have enabled greater access to information than ever before.
    [Show full text]
  • Discovering Missing Wikipedia Inter-Language Links by Means of Cross-Lingual Word Sense Disambiguation
    Discovering Missing Wikipedia Inter-language Links by means of Cross-lingual Word Sense Disambiguation Els Lefever1 ;2 ,Veronique´ Hoste1 ;3 and Martine De Cock2 1 LT3, Language and Translation Technology Team, University College Ghent Groot-Brittannielaan¨ 45, 9000 Gent, Belgium 2 Department of Applied Mathematics and Computer Science, Ghent University Krijgslaan 281 (S9), 9000 Gent, Belgium 3 Dept. of Linguistics, Ghent University Blandijnberg 2, 9000 Gent, Belgium els.lefever, [email protected], [email protected] Abstract Wikipedia pages typically contain inter-language links to the corresponding pages in other languages. These links, however, are often incomplete. This paper describes a set of experiments in which the viability of discovering such missing inter-language links for ambiguous nouns by means of a cross-lingual Word Sense Disambiguation approach is investigated. The input for the inter-language link detection system is a set of Dutch pages for a given ambiguous noun and the output of the system is a set of links to the corresponding pages in three target languages (viz. French, Spanish and Italian). The experimental results show that although it is a very challenging task, the system succeeds to detect missing inter-language links between Wikipedia documents for a manually labeled test set. The final goal of the system is to provide a human editor with a list of possible missing links that should be manually verified. Keywords: Wikipedia links, Cross-lingual WSD, Word Sense Disambiguation 1. Introduction when they do exist, because no human contributor has es- Wikipedia1 is a very popular collaborative multilingual tablished the appropriate inter-language link yet.
    [Show full text]
  • FULLTEXT01.Pdf
    Blekinge Institute of Technology School of Computing Department of Technology and Aesthetics "WRITING FOR THE ENEMY": KURDISH LANGUAGE STAN- DARDIZATION ONLINE Agri Dehqan 2014 BACHELOR THESIS B.S. in Digital Culture Supervisor: Maria Engberg Dehqan 1 Agri Dehqan Maria Engberg 13 May 2014 Table of content 1. Introduction 2 2. Kurdistan 12 2.1 The land 14 2.2 Population 14 2.3 Language 15 2.4 Northern Kurdistan, Turkey 18 2.5 Southern Kurdistan, Iraq 21 2.6 Eastern Kurdistan, Iran 21 2.7 Western Kurdistan, Syria 22 3. Wikipedia 22 3.1 Inception 22 3.2 Statistics 24 3.3 Policies and Guidelines 25 3.4 Criticism 26 3.5 Systemic Bias 27 4. Kurdish Wikipedia 28 4.1 Case studies 32 4.1.1 First Case Study, Potato 33 4.1.2 Second Case Study, Kirkuk 36 5. The Role of Kurdish Wikipedia 39 Conclusion 47 Works Cited 48 Dehqan 2 1. Introduction Wikis and the social web have created platforms conducive to collaboration and dis- course at an unprecedented rate and level. These social tools have brought about new oppor- tunities, affordances, demands and risks that not only challenge existing power relations (Castells, Communication Power), but give rise to bottom-up solutions to societal problems. Democratization of both knowledge creation and dissemination (Fuchs 255; Shirky 297)— stimulated by participatory culture—is a natural by-product of such change which enables users’ to contribute regardless of their commitment or their expertise (Jenkins et al. 6). Such transformation is both facilitated by technology and helps spur technological innovation (Castells, Rise of Network Society 37).
    [Show full text]
  • The Effect of Wikipedia on Tourist Choices
    A Service of Leibniz-Informationszentrum econstor Wirtschaft Leibniz Information Centre Make Your Publications Visible. zbw for Economics Hinnosaar, Marit; Hinnosaar, Toomas; Kummer, Michael; Slivko, Olga Working Paper Does Wikipedia matter? The effect of Wikipedia on tourist choices ZEW Discussion Papers, No. 15-089 Provided in Cooperation with: ZEW - Leibniz Centre for European Economic Research Suggested Citation: Hinnosaar, Marit; Hinnosaar, Toomas; Kummer, Michael; Slivko, Olga (2017) : Does Wikipedia matter? The effect of Wikipedia on tourist choices, ZEW Discussion Papers, No. 15-089, Zentrum für Europäische Wirtschaftsforschung (ZEW), Mannheim This Version is available at: http://hdl.handle.net/10419/173092 Standard-Nutzungsbedingungen: Terms of use: Die Dokumente auf EconStor dürfen zu eigenen wissenschaftlichen Documents in EconStor may be saved and copied for your Zwecken und zum Privatgebrauch gespeichert und kopiert werden. personal and scholarly purposes. Sie dürfen die Dokumente nicht für öffentliche oder kommerzielle You are not to copy documents for public or commercial Zwecke vervielfältigen, öffentlich ausstellen, öffentlich zugänglich purposes, to exhibit the documents publicly, to make them machen, vertreiben oder anderweitig nutzen. publicly available on the internet, or to distribute or otherwise use the documents in public. Sofern die Verfasser die Dokumente unter Open-Content-Lizenzen (insbesondere CC-Lizenzen) zur Verfügung gestellt haben sollten, If the documents have been made available under an Open gelten abweichend von diesen Nutzungsbedingungen die in der dort Content Licence (especially Creative Commons Licences), you genannten Lizenz gewährten Nutzungsrechte. may exercise further usage rights as specified in the indicated licence. www.econstor.eu Dis cus si on Paper No. 15-089 Does Wikipedia Matter? The Effect of Wikipedia on Tourist Choices Marit Hinnosaar, Toomas Hinnosaar, Michael Kummer, and Olga Slivko Dis cus si on Paper No.
    [Show full text]
  • Linguistic Neighbourhoods: Explaining Cultural Borders on Wikipedia Through Multilingual Co-Editing Activity
    Samoilenko et al. RESEARCH Linguistic neighbourhoods: Explaining cultural borders on Wikipedia through multilingual co-editing activity Anna Samoilenko1,3*, Fariba Karimi1, Daniel Edler2, J´er^omeKunegis3 and Markus Strohmaier1,3 *Correspondence: [email protected] Abstract 1GESIS { Leibniz-Institute for the Social Sciences, 6-8 Unter In this paper, we study the network of global interconnections between language Sachsenhausen, 50667 Cologne, communities, based on shared co-editing interests of Wikipedia editors, and show Germany that although English is discussed as a potential lingua franca of the digital Full list of author information is available at the end of the article space, its domination disappears in the network of co-editing similarities, and instead local connections come to the forefront. Out of the hypotheses we explored, bilingualism, linguistic similarity of languages, and shared religion provide the best explanations for the similarity of interests between cultural communities. Population attraction and geographical proximity are also significant, but much weaker factors bringing communities together. In addition, we present an approach that allows for extracting significant cultural borders from editing activity of Wikipedia users, and comparing a set of hypotheses about the social mechanisms generating these borders. Our study sheds light on how culture is reflected in the collective process of archiving knowledge on Wikipedia, and demonstrates that cross-lingual interconnections on Wikipedia are not dominated by one powerful language. Our findings also raise some important policy questions for the Wikimedia Foundation. Keywords: Wikipedia; Multilingual; Cultural similarity; Network; Digital language divide; Socio-linguistics; Digital Humanities; Hypothesis testing 1 Introduction Measuring the extent to which cultural communities overlap via the knowledge they preserve can paint a picture of how culturally proximate or diverse they are.
    [Show full text]