Content Categorization for Contextual Advertising Using Wikipedia
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Complete Issue
Culture Unbound: Journal of Current Cultural Research Thematic Section: Changing Orders of Knowledge? Encyclopedias in Transition Edited by Jutta Haider & Olof Sundin Extraction from Volume 6, 2014 Linköping University Electronic Press Culture Unbound: ISSN 2000-1525 (online) URL: http://www.cultureunbound.ep.liu.se/ © 2014 The Authors. Culture Unbound, Extraction from Volume 6, 2014 Thematic Section: Changing Orders of Knowledge? Encyclopedias in Transition Jutta Haider & Olof Sundin Introduction: Changing Orders of Knowledge? Encyclopaedias in Transition ................................ 475 Katharine Schopflin What do we Think an Encyclopaedia is? ........................................................................................... 483 Seth Rudy Knowledge and the Systematic Reader: The Past and Present of Encyclopedic Reading .............................................................................................................................................. 505 Siv Frøydis Berg & Tore Rem Knowledge for Sale: Norwegian Encyclopaedias in the Marketplace .............................................. 527 Vanessa Aliniaina Rasoamampianina Reviewing Encyclopaedia Authority .................................................................................................. 547 Ulrike Spree How readers Shape the Content of an Encyclopedia: A Case Study Comparing the German Meyers Konversationslexikon (1885-1890) with Wikipedia (2002-2013) ........................... 569 Kim Osman The Free Encyclopaedia that Anyone can Edit: The -
Strengthening and Unifying the Visual Identity of Wikimedia Projects: a Step Towards Maturity
Strengthening and unifying the visual identity of Wikimedia projects: a step towards maturity Guillaume Paumier∗ Elisabeth Bauer [[m:User:guillom]] [[m:User:Elian]] Abstract In January 2007, the Wikimedian community celebrated the sixth birthday of Wikipedia. Six years of constant evolution have now led to Wikipedia being one of the most visited websites in the world. Other projects developing free content and supported by the Wikimedia Foundation have been expanding rapidly too. The Foundation and its projects are now facing some communication issues due to the difference of scale between the human and financial resources of the Foundation and the success of its projects. In this paper, we identify critical issues in terms of visual identity and marketing. We evaluate the situation and propose several changes, including a redesign of the default website interface. Introduction The first Wikipedia project was created in January 2001. In these days, the technical infrastructure was provided by Bomis, a dot-com company. In June 2003, Jimmy Wales, founder of Wikipedia and owner of Bomis, created the Wikimedia Foundation [1] to provide a long-term administrative and technical structure dedicated to free content. Since these days, both the projects and the Foundation have been evolving. New projects have been created. All have grown at different rates. Some have got more fame than the others. New financial, technical and communication challenges have risen. In this paper, we will first identify some of these challenges and issues in terms of global visual identity. We will then analyse logos, website layouts, projects names, trademarks so as to provide some hindsight. -
Wikipedia Data Analysis
Wikipedia data analysis. Introduction and practical examples Felipe Ortega June 29, 2012 Abstract This document offers a gentle and accessible introduction to conduct quantitative analysis with Wikipedia data. The large size of many Wikipedia communities, the fact that (for many languages) the already account for more than a decade of activity and the precise level of detail of records accounting for this activity represent an unparalleled opportunity for researchers to conduct interesting studies for a wide variety of scientific disciplines. The focus of this introductory guide is on data retrieval and data preparation. Many re- search works have explained in detail numerous examples of Wikipedia data analysis, includ- ing numerical and graphical results. However, in many cases very little attention is paid to explain the data sources that were used in these studies, how these data were prepared before the analysis and additional information that is also available to extend the analysis in future studies. This is rather unfortunate since, in general, data retrieval and data preparation usu- ally consumes no less than 75% of the whole time devoted to quantitative analysis, specially in very large datasets. As a result, the main goal of this document is to fill in this gap, providing detailed de- scriptions of available data sources in Wikipedia, and practical methods and tools to prepare these data for the analysis. This is the focus of the first part, dealing with data retrieval and preparation. It also presents an overview of useful existing tools and frameworks to facilitate this process. The second part includes a description of a general methodology to undertake quantitative analysis with Wikipedia data, and open source tools that can serve as building blocks for researchers to implement their own analysis process. -
1 Wikipedia As an Arena and Source for the Public. a Scandinavian
Wikipedia as an arena and source for the public. A Scandinavian Comparison of “Islam” Hallvard Moe Department of Information Science and Media Studies University of Bergen [email protected] Abstract This article compares Wikipedia as an arena and source for the public through analysis of articles on “Islam” across the three Scandinavian languages. Findings show that the Swedish article is continuously revised and adjusted by a fairly high number of contributors, with comparatively low concentration to a small group of top users. The Norwegian article is static, more basic, but still serves as a matter-of-factly presentation of Islam as religion to a stable amount of views. In contrast, the Danish article is at once more dynamic through more changes up until recently, it portrays Islam differently with a distinct focus on identity issues, and it is read less often. The analysis illustrates how studying Wikipedia can bring light to the receiving end of what goes on in the public sphere. The analysis also illustrates how our understanding of the online realm profits from “groundedness”, and how comparison of similar sites in different languages can yield insights into cultural as well as political differences, and their implications. Keywords Wikipedia, public sphere, freedom of information, comparative, digital methods Introduction The online encyclopedia Wikipedia is heralded as a non-commercial, user generated source of information. It is also a space for debate over controversial issues. Wikipedia, therefore, stands out from other online media more commonly analyzed in studies of public debate: on the one hand, mainstream media such as online newspapers are typically deemed interesting since they (are thought to) reach a wide audience with curated or edited content. -
Operationalizing a National Digital Library: the Case for a Norwegian Transformer Model Per E Kummervold Javier De La Rosa [email protected] [email protected]
Operationalizing a National Digital Library: The Case for a Norwegian Transformer Model Per E Kummervold Javier de la Rosa [email protected] [email protected] Freddy Wetjen Svein Arne Brygfjeld [email protected] [email protected] The National Library of Norway Mo i Rana, Norway Abstract (Devlin et al., 2019). Later research has shown that the corpus size might have even been too In this work, we show the process of build- small, and when Facebook released its Robustly ing a large-scale training set from digi- Optimized BERT (RoBERTa), it showed a consid- tal and digitized collections at a national erable gain in performance by increasing the cor- library. The resulting Bidirectional En- pus to 160GB (Liu et al., 2019). coder Representations from Transformers (BERT)-based language model for Nor- Norwegian is spoken by just 5 million peo- wegian outperforms multilingual BERT ple worldwide. The reference publication Ethno- (mBERT) models in several token and se- logue lists the 200 most commonly spoken na- quence classification tasks for both Nor- tive languages, and it places Norwegian as num- wegian Bokmal˚ and Norwegian Nynorsk. ber 171. The Norwegian language has two differ- Our model also improves the mBERT per- ent varieties, both equally recognized as written formance for other languages present in languages: Bokmal˚ and Nynorsk. The number of the corpus such as English, Swedish, and Wikipedia pages written in a certain language is Danish. For languages not included in the often used to measure its prevalence, and in this corpus, the weights degrade moderately regard, Norwegian Bokmal˚ ranges as number 23 while keeping strong multilingual prop- and Nynorsk as number 55. -
Bootstrap Quantification of Estimation Uncertainties in Network Degree Distributions Received: 15 February 2017 Yulia R
www.nature.com/scientificreports OPEN Bootstrap quantification of estimation uncertainties in network degree distributions Received: 15 February 2017 Yulia R. Gel1, Vyacheslav Lyubchich2 & L. Leticia Ramirez Ramirez3 Accepted: 5 June 2017 We propose a new method of nonparametric bootstrap to quantify estimation uncertainties in functions Published: xx xx xxxx of network degree distribution in large ultra sparse networks. Both network degree distribution and network order are assumed to be unknown. The key idea is based on adaptation of the “blocking” argument, developed for bootstrapping of time series and re-tiling of spatial data, to random networks. We first sample a set of multiple ego networks of varying orders that form a patch, or a network block analogue, and then resample the data within patches. To select an optimal patch size, we develop a new computationally efficient and data-driven cross-validation algorithm. The proposed fast patchwork bootstrap (FPB) methodology further extends the ideas for a case of network mean degree, to inference on a degree distribution. In addition, the FPB is substantially less computationally expensive, requires less information on a graph, and is free from nuisance parameters. In our simulation study, we show that the new bootstrap method outperforms competing approaches by providing sharper and better-calibrated confidence intervals for functions of a network degree distribution than other available approaches, including the cases of networks in an ultra sparse regime. We illustrate the FPB in application to collaboration networks in statistics and computer science and to Wikipedia networks. Motivated by a plethora of modern large network applications and rapid advances in computing technologies, the area of network modeling is undergoing a vigorous developmental boom, spreading over numerous disciplines, from computer science to engineering to social and health sciences. -
Global Gender Differences in Wikipedia Readership
Global gender differences in Wikipedia readership Isaac Johnson1, Florian Lemmerich2, Diego Saez-Trumper´ 1, Robert West3, Markus Strohmaier2,4, and Leila Zia1 1Wikimedia Foundation; fi[email protected] 2RWTH Aachen University; fi[email protected] 3EPFL; fi[email protected] 4GESIS - Leibniz Institute for the Social Sciences; fi[email protected] ABSTRACT Wikipedia represents the largest and most popular source of encyclopedic knowledge in the world today, aiming to provide equal access to information worldwide. From a global online survey of 65,031 readers of Wikipedia and their corresponding reading logs, we present novel evidence of gender differences in Wikipedia readership and how they manifest in records of user behavior. More specifically we report that (1) women are underrepresented among readers of Wikipedia, (2) women view fewer pages per reading session than men do, (3) men and women visit Wikipedia for similar reasons, and (4) men and women exhibit specific topical preferences. Our findings lay the foundation for identifying pathways toward knowledge equity in the usage of online encyclopedic knowledge. Equal access to encyclopedic knowledge represents a critical prerequisite for promoting equality more broadly in society and for an open and informed public at large. With almost 54 million articles written by roughly 500,000 monthly editors across more than 160 actively edited languages, Wikipedia is the most important source of encyclopedic knowledge for a global audience, and one of the most important knowledge resources available on the internet. Every month, Wikipedia attracts users on more than 1.5 billion unique devices from across the globe, for a total of more than 15 billion monthly pageviews.1 Data about who is represented in this readership provides unique insight into the accessibility of encyclopedic knowledge worldwide. -
Wikipedia – Free and Reliable? Aspects of a Collaboratively Shaped Encyclopaedia*
Nordicom Review 30 (2009) 1, pp. 183-199 Wikipedia – Free and Reliable? Aspects of a Collaboratively Shaped Encyclopaedia* MARIA MATTUS Abstract Wikipedia is a multilingual, Internet-based, free, wiki-encyclopaedia that is created by its own users. The aim of the present article is to let users’ descriptions of their impressions and ex- periences of Wikipedia increase our understanding of the function and dynamics of this collaboratively shaped wiki-encyclopaedia. Qualitative, structured interviews concerning users, and the creation and use of Wikipe- dia were carried out via e-mail with six male respondents – administrators at the Swedish Wikipedia – during September and October, 2006. The results focus on the following themes: I. Passive and active users; II. Formal and informal tasks; III. Common and personal visions; IV. Working together; V. The origins and creation of articles; VI. Contents and quality; VII. Decisions and interventions; VIII. Encyclopaedic ambitions. The discussion deals with the approach of this encyclopaedic phenomenon, focusing on its “unfinishedness”, its development in different directions, and the social regulation that is implied and involved. Wikipedia is a product of our time, having a powerful vision and engagement, and it should therefore be interpreted and considered on its own terms. Keywords: Wikipedia, encyclopaedia, wiki, cooperation, collaboration, social regulation Introduction Wikipedia is an international, multilingual1, Internet-based, free-content encyclopaedia. Consisting of about 8 million2 articles (27 million pages), it has become the world’s largest encyclopaedic wiki. Wikipedia uses the wiki technique and hypertext environ- ment, and besides its encyclopaedic content, it contains meta-level information and communication channels to enable further interaction. -
Wikipedia – Free and Reliable? Aspects of a Collaboratively Shaped Encyclopaedia*
10.1515/nor-2017-0146 Nordicom Review 30 (2009) 1, pp. 183-199 Wikipedia – Free and Reliable? Aspects of a Collaboratively Shaped Encyclopaedia* MARIA MATTUS Abstract Wikipedia is a multilingual, Internet-based, free, wiki-encyclopaedia that is created by its own users. The aim of the present article is to let users’ descriptions of their impressions and ex- periences of Wikipedia increase our understanding of the function and dynamics of this collaboratively shaped wiki-encyclopaedia. Qualitative, structured interviews concerning users, and the creation and use of Wikipe- dia were carried out via e-mail with six male respondents – administrators at the Swedish Wikipedia – during September and October, 2006. The results focus on the following themes: I. Passive and active users; II. Formal and informal tasks; III. Common and personal visions; IV. Working together; V. The origins and creation of articles; VI. Contents and quality; VII. Decisions and interventions; VIII. Encyclopaedic ambitions. The discussion deals with the approach of this encyclopaedic phenomenon, focusing on its “unfinishedness”, its development in different directions, and the social regulation that is implied and involved. Wikipedia is a product of our time, having a powerful vision and engagement, and it should therefore be interpreted and considered on its own terms. Keywords: Wikipedia, encyclopaedia, wiki, cooperation, collaboration, social regulation Introduction Wikipedia is an international, multilingual1, Internet-based, free-content encyclopaedia. Consisting of about 8 million2 articles (27 million pages), it has become the world’s largest encyclopaedic wiki. Wikipedia uses the wiki technique and hypertext environ- ment, and besides its encyclopaedic content, it contains meta-level information and communication channels to enable further interaction. -
Wikipedia Research and Tools: Review and Comments
Wikipedia research and tools: Review and comments Finn Arup˚ Nielsen January 24, 2019 Abstract research is to find the answer to the question \why does it work at all?" I here give an overview of Wikipedia and wiki re- Research using information from Wikipedia will search and tools. Well over 1,000 reports have been typically hope for the correctness and perhaps com- published in the field and there exist dedicated sci- pleteness of the information in Wikipedia. Many entific meetings for Wikipedia research. It is not aspects of Wikipedia can be used, not just the raw possible to give a complete review of all material text, but the links between components: Language published. This overview serves to describe some links, categories and information in templates pro- i key areas of research. vide structured content that can be used in a vari- ety of other applications, such as natural language processing and translation tools. Large-scale efforts 1 Introduction extract structured information from Wikipedia and link the data and connect it as Linked Data. The Wikipedia has attracted researchers from a wide 2 range of diciplines| phycisist, computer scientists, papers with description of DBpedia. represent ex- librarians, etc.|examining the online encyclopedia amples on this line of research. from a several different perspectives and using it in It is odd to write a Science 1.0 article about a variety of contexts. a Web 2.0 phenomenon: An interested researcher Broadly, Wikipedia research falls into four cate- may already find good collaborative written arti- gories: cles about Wikipedia research on Wikipedia itself, see Table1. -
Wikipedia and Its Tensions with Paid Labour
tripleC 14(1): 78-98, 2016 http://www.triple-c.at Monetary Materialities of Peer-Produced Knowledge: The Case of Wikipedia and Its Tensions with Paid Labour Arwid Lund* and Juhana Venäläinen** *Uppsala University, Uppsala, Sweden, [email protected] **University of Eastern Finland, Joensuu, Finland, [email protected] Abstract: This article contributes to the debate on the possibilities and limits of expanding the sphere of peer production within and beyond capitalism. As a case in point, it discusses the explicit and tacit monetary dependencies of Wikipedia, which are not only ascribable to the need to sustain the techno- logical structures that render the collaboration possible, but also about the money-mediated suste- nance of the peer producers themselves. In Wikipedia, the “bright line” principle for avoiding conflicts of interest has been that no one should be paid for directly editing an article. By examining the after- math of the Wiki-PR scandal, where a consulting firm was allegedly involved in helping more than 12,000 clients to edit Wikipedia articles until 2014, the goal of the analysis is to shed light on the para- doxical situation where the institution for supporting the peer production (Wikimedia Foundation) found itself taking a more strict perspective vis-à-vis commercial alliances than the unpaid community edi- tors. Keywords: Wikipedia, Wiki-PR, Materialities, Peer Production, Commons 1. Introduction Peer production has often been portrayed as a potential complement or a radical alternative to capitalism (e.g. Moore 2011; Rigi 2013). As an early and prominent figure in the debate, Yochai Benkler (2002; 2006; 2011) depicted the regime of commons-based peer production as a revolutionary social form that will eventually transform the ways of organizing production in the contemporary economy. -
The Case for a Norwegian Transformer Model Per E Kummervold Javier De La Rosa [email protected] [email protected]
Operationalizing a National Digital Library: The Case for a Norwegian Transformer Model Per E Kummervold Javier de la Rosa [email protected] [email protected] Freddy Wetjen Svein Arne Brygfjeld [email protected] [email protected] The National Library of Norway Mo i Rana, Norway Abstract (Devlin et al., 2019). Later research has shown that the corpus size might have even been too In this work, we show the process of build- small, and when Facebook released its Robustly ing a large-scale training set from digi- Optimized BERT (RoBERTa), it showed a consid- tal and digitized collections at a national erable gain in performance by increasing the cor- library. The resulting Bidirectional En- pus to 160GB (Liu et al., 2019). coder Representations from Transformers (BERT)-based language model for Nor- Norwegian is spoken by just 5 million peo- wegian outperforms multilingual BERT ple worldwide. The reference publication Ethno- (mBERT) models in several token and se- logue lists the 200 most commonly spoken na- quence classification tasks for both Nor- tive languages, and it places Norwegian as num- wegian Bokmal˚ and Norwegian Nynorsk. ber 171. The Norwegian language has two differ- Our model also improves the mBERT per- ent varieties, both equally recognized as written formance for other languages present in languages: Bokmal˚ and Nynorsk. The number of the corpus such as English, Swedish, and Wikipedia pages written in a certain language is Danish. For languages not included in the often used to measure its prevalence, and in this corpus, the weights degrade moderately regard, Norwegian Bokmal˚ ranges as number 23 while keeping strong multilingual prop- and Nynorsk as number 55.