Complete Issue
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Content Categorization for Contextual Advertising Using Wikipedia
Content Categorization for Contextual Advertising Using Wikipedia Ingrid Grønlie Guren August 2, 2015 Content Categorization for Contextual Advertising Using Wikipedia Ingrid Grønlie Guren August 2, 2015 ii Abstract Automatic categorization of content is an important functionality in online ad- vertising and automated content recommendations, both for ensuring contextual relevancy of placements and for building up behavioral profiles for users that consume the content. Within the advertising domain, the taxonomy tree that content is classified into is defined with some commercial application in mind to somehow reflect the advertising platform’s ad inventory. The nature of the ad inventory and the language of the content might vary across brokers (i.e., the operator of the advertising platform), so it was of interest to develop a system that can easily bootstrap the development of a well-working classifier. We developed a dictionary-based classifier based on titles from Wikipedia articles where the titles represent entries in the dictionary. The idea of the dictionary-based classifier is so simple that it can be understood by users of the program, also those who lack technical experience. Further, it has the ad- vantage that its users easily can expand the dictionary with desirable words for specific advertisement purposes. The process of creating the classifier includes a processing of all Wikipedia article titles to a form more likely to occur in docu- ments, before each entry is graded to their most describing Wikipedia category path. The Wikipedia category paths are further mapped to categories based on the taxonomy of Interactive Advertising Bureau (IAB), which are categories relevant for advertising. -
Position Description Addenda
POSITION DESCRIPTION January 2014 Wikimedia Foundation Executive Director - Addenda The Wikimedia Foundation is a radically transparent organization, and much information can be found at www.wikimediafoundation.org . That said, certain information might be particularly useful to nominators and prospective candidates, including: Announcements pertaining to the Wikimedia Foundation Executive Director Search Kicking off the search for our next Executive Director by Former Wikimedia Foundation Board Chair Kat Walsh An announcement from Wikimedia Foundation ED Sue Gardner by Wikimedia Executive Director Sue Gardner Video Interviews on the Wikimedia Community and Foundation and Its History Some of the values and experiences of the Wikimedia Community are best described directly by those who have been intimately involved in the organization’s dramatic expansion. The following interviews are available for viewing though mOppenheim.TV . • 2013 Interview with Former Wikimedia Board Chair Kat Walsh • 2013 Interview with Wikimedia Executive Director Sue Gardner • 2009 Interview with Wikimedia Executive Director Sue Gardner Guiding Principles of the Wikimedia Foundation and the Wikimedia Community The following article by Sue Gardner, the current Executive Director of the Wikimedia Foundation, has received broad distribution and summarizes some of the core cultural values shared by Wikimedia’s staff, board and community. Topics covered include: • Freedom and open source • Serving every human being • Transparency • Accountability • Stewardship • Shared power • Internationalism • Free speech • Independence More information can be found at: https://meta.wikimedia.org/wiki/User:Sue_Gardner/Wikimedia_Foundation_Guiding_Principles Wikimedia Policies The Wikimedia Foundation has an extensive list of policies and procedures available online at: http://wikimediafoundation.org/wiki/Policies Wikimedia Projects All major projects of the Wikimedia Foundation are collaboratively developed by users around the world using the MediaWiki software. -
A Topic-Aligned Multilingual Corpus of Wikipedia Articles for Studying Information Asymmetry in Low Resource Languages
Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), pages 2373–2380 Marseille, 11–16 May 2020 c European Language Resources Association (ELRA), licensed under CC-BY-NC A Topic-Aligned Multilingual Corpus of Wikipedia Articles for Studying Information Asymmetry in Low Resource Languages Dwaipayan Roy, Sumit Bhatia, Prateek Jain GESIS - Cologne, IBM Research - Delhi, IIIT - Delhi [email protected], [email protected], [email protected] Abstract Wikipedia is the largest web-based open encyclopedia covering more than three hundred languages. However, different language editions of Wikipedia differ significantly in terms of their information coverage. We present a systematic comparison of information coverage in English Wikipedia (most exhaustive) and Wikipedias in eight other widely spoken languages (Arabic, German, Hindi, Korean, Portuguese, Russian, Spanish and Turkish). We analyze the content present in the respective Wikipedias in terms of the coverage of topics as well as the depth of coverage of topics included in these Wikipedias. Our analysis quantifies and provides useful insights about the information gap that exists between different language editions of Wikipedia and offers a roadmap for the Information Retrieval (IR) community to bridge this gap. Keywords: Wikipedia, Knowledge base, Information gap 1. Introduction other with respect to the coverage of topics as well as Wikipedia is the largest web-based encyclopedia covering the amount of information about overlapping topics. -
Modeling Popularity and Reliability of Sources in Multilingual Wikipedia
information Article Modeling Popularity and Reliability of Sources in Multilingual Wikipedia Włodzimierz Lewoniewski * , Krzysztof W˛ecel and Witold Abramowicz Department of Information Systems, Pozna´nUniversity of Economics and Business, 61-875 Pozna´n,Poland; [email protected] (K.W.); [email protected] (W.A.) * Correspondence: [email protected] Received: 31 March 2020; Accepted: 7 May 2020; Published: 13 May 2020 Abstract: One of the most important factors impacting quality of content in Wikipedia is presence of reliable sources. By following references, readers can verify facts or find more details about described topic. A Wikipedia article can be edited independently in any of over 300 languages, even by anonymous users, therefore information about the same topic may be inconsistent. This also applies to use of references in different language versions of a particular article, so the same statement can have different sources. In this paper we analyzed over 40 million articles from the 55 most developed language versions of Wikipedia to extract information about over 200 million references and find the most popular and reliable sources. We presented 10 models for the assessment of the popularity and reliability of the sources based on analysis of meta information about the references in Wikipedia articles, page views and authors of the articles. Using DBpedia and Wikidata we automatically identified the alignment of the sources to a specific domain. Additionally, we analyzed the changes of popularity and reliability in time and identified growth leaders in each of the considered months. The results can be used for quality improvements of the content in different languages versions of Wikipedia. -
Omnipedia: Bridging the Wikipedia Language
Omnipedia: Bridging the Wikipedia Language Gap Patti Bao*†, Brent Hecht†, Samuel Carton†, Mahmood Quaderi†, Michael Horn†§, Darren Gergle*† *Communication Studies, †Electrical Engineering & Computer Science, §Learning Sciences Northwestern University {patti,brent,sam.carton,quaderi}@u.northwestern.edu, {michael-horn,dgergle}@northwestern.edu ABSTRACT language edition contains its own cultural viewpoints on a We present Omnipedia, a system that allows Wikipedia large number of topics [7, 14, 15, 27]. On the other hand, readers to gain insight from up to 25 language editions of the language barrier serves to silo knowledge [2, 4, 33], Wikipedia simultaneously. Omnipedia highlights the slowing the transfer of less culturally imbued information similarities and differences that exist among Wikipedia between language editions and preventing Wikipedia’s 422 language editions, and makes salient information that is million monthly visitors [12] from accessing most of the unique to each language as well as that which is shared information on the site. more widely. We detail solutions to numerous front-end and algorithmic challenges inherent to providing users with In this paper, we present Omnipedia, a system that attempts a multilingual Wikipedia experience. These include to remedy this situation at a large scale. It reduces the silo visualizing content in a language-neutral way and aligning effect by providing users with structured access in their data in the face of diverse information organization native language to over 7.5 million concepts from up to 25 strategies. We present a study of Omnipedia that language editions of Wikipedia. At the same time, it characterizes how people interact with information using a highlights similarities and differences between each of the multilingual lens. -
L'exemple De Wikipédia Laure Endrizzi Chargée D'études Et De Recherche, Cellule Veille Scientifique Et Technologique, INRP, Lyon
La communauté comme auteur et éditeur : l’exemple de Wikipédia Laure Endrizzi To cite this version: Laure Endrizzi. La communauté comme auteur et éditeur : l’exemple de Wikipédia. Journée nationale du réseau des URFIST : Evaluation et validation de l’information sur internet, Jan 2007, Paris, France. edutice-00184888 HAL Id: edutice-00184888 https://edutice.archives-ouvertes.fr/edutice-00184888 Submitted on 2 Nov 2007 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Journée d'études des URFIST 31 janvier 2007, Paris « Evaluation et validation de l'information sur internet » La communauté comme auteur et éditeur : l'exemple de Wikipédia Laure Endrizzi chargée d'études et de recherche, cellule Veille scientifique et technologique, INRP, Lyon Résumé L’ensemble des technologies dites 2.0 place l’usager au cœur de la création des contenus numériques tout en l’inscrivant dans une dynamique collective. Ces transformations remettent en cause le modèle éditorial traditionnel, sans offrir de représentations claires et stabilisées des modes de production et de validation qui sont à l’œuvre. Avec l’exemple de Wikipédia, nous tenterons de comprendre les mécanismes de la régulation éditoriale, pour ensuite nous interroger sur les formes d’expertise sollicitées et les figures de l’auteur. -
Are Encyclopedias Dead? Evaluating the Usefulness of a Traditional Reference Resource Rachel S
St. Cloud State University theRepository at St. Cloud State Library Faculty Publications Library Services 2012 Are Encyclopedias Dead? Evaluating the Usefulness of a Traditional Reference Resource Rachel S. Wexelbaum St. Cloud State University, [email protected] Follow this and additional works at: https://repository.stcloudstate.edu/lrs_facpubs Part of the Library and Information Science Commons Recommended Citation Wexelbaum, Rachel S., "Are Encyclopedias Dead? Evaluating the Usefulness of a Traditional Reference Resource" (2012). Library Faculty Publications. 26. https://repository.stcloudstate.edu/lrs_facpubs/26 This Article is brought to you for free and open access by the Library Services at theRepository at St. Cloud State. It has been accepted for inclusion in Library Faculty Publications by an authorized administrator of theRepository at St. Cloud State. For more information, please contact [email protected]. Are Encyclopedias Dead? Evaluating the Usefulness of a Traditional Reference Resource Author Rachel Wexelbaum is Collection Management Librarian and Assistant Professor at Saint Cloud State University, Saint Cloud, Minnesota. Contact Details Rachel Wexelbaum Collection Management Librarian MC135D Collections Saint Cloud State University 720 4 th Avenue South Saint Cloud, MN 56301 Email: [email protected] Abstract Purpose – To examine past, current, and future usage of encyclopedias. Design/methodology/approach – Review the history of encyclopedias, their composition, and usage by focusing on select publications covering different subject areas. Findings – Due to their static nature, traditionally published encyclopedias are not always accurate, objective information resources. Intentions of editors and authors also come into question. A researcher may find more value in using encyclopedias as historical documents rather than resources for quick facts. -
The Culture of Wikipedia
Good Faith Collaboration: The Culture of Wikipedia Good Faith Collaboration The Culture of Wikipedia Joseph Michael Reagle Jr. Foreword by Lawrence Lessig The MIT Press, Cambridge, MA. Web edition, Copyright © 2011 by Joseph Michael Reagle Jr. CC-NC-SA 3.0 Purchase at Amazon.com | Barnes and Noble | IndieBound | MIT Press Wikipedia's style of collaborative production has been lauded, lambasted, and satirized. Despite unease over its implications for the character (and quality) of knowledge, Wikipedia has brought us closer than ever to a realization of the centuries-old Author Bio & Research Blog pursuit of a universal encyclopedia. Good Faith Collaboration: The Culture of Wikipedia is a rich ethnographic portrayal of Wikipedia's historical roots, collaborative culture, and much debated legacy. Foreword Preface to the Web Edition Praise for Good Faith Collaboration Preface Extended Table of Contents "Reagle offers a compelling case that Wikipedia's most fascinating and unprecedented aspect isn't the encyclopedia itself — rather, it's the collaborative culture that underpins it: brawling, self-reflexive, funny, serious, and full-tilt committed to the 1. Nazis and Norms project, even if it means setting aside personal differences. Reagle's position as a scholar and a member of the community 2. The Pursuit of the Universal makes him uniquely situated to describe this culture." —Cory Doctorow , Boing Boing Encyclopedia "Reagle provides ample data regarding the everyday practices and cultural norms of the community which collaborates to 3. Good Faith Collaboration produce Wikipedia. His rich research and nuanced appreciation of the complexities of cultural digital media research are 4. The Puzzle of Openness well presented. -
The Importance of Female Voices
The Importance of Female Voices Goals Collaborate to read and understand a text, and examine and use topic-specific vocabulary; Identify and define individual and collective knowledge, differentiating knowledge of girls and boys. Essential Questions Are women today participating in Canadian public life in percentages equal to that of men? Have efforts to increase women’s participation in public life succeeded in Canada? Does it matter if female voices are not heard in more equal proportions? Enduring Understandings Even after much progress in the past 30 years, women are vastly under-represented in many aspects of Canadian public life. This imbalance even showed up in Wikipedia, the open online resource encyclopedia, where only 15-20% of content contributors are women. Efforts to increase female participation in public life in the Canada have succeeded but growth is slow. The executives of the Wikipedia foundation viewed the small number of female contributors as a problem. Many believe that increasing the percentage of female voices will enrich diversity of opinions, and better reflect the user community. Materials Define Gender Gap? (2011) Making the edit: why we need more women in Wikipedia (2019) Wikipedia is a mirror of the world's gender biases (2018) Vocabulary intractable [ in-trak-tuh-buh ] (adjective) not easily controlled or directed; not docile or manageable; stubborn; obstinate. nuance [ noo-ahns ] (noun) a subtle difference or distinction in expression or meaning. disparity [ di-ˈsper-ə-tee ] (noun) difference; containing or made up of fundamentally different or unequal elements. egalitarian [ ih-gal-i-tair-ee-uh n ] (adjective) characterized by a belief in the equal status of all people in political, economic, or social life. -
Wikimedia Foundation 2010-11 Plan
Wikimedia Foundation The Year Ahead, and the Year in Review SUE GARDNER WIKIMANIA – WASHINGTON DC – 14 JULY 2012 Purpose of this presentation is two things: Tell you what the WMF did in 2011-12 Tell you what the WMF plans to do in 2012-13 3 34.8 million dollars 4 You wrote the projects. You earned the goodwill. Your work is what donors are supporting. 5 6 7 Recapping 2011-12 8 Recapping 2011-12 Activities Supported 25% increase in readership from 400 to 500 million UVs; Visual Editor – released with save/edit capability & new parser, in restricted namespace on Mediawiki.org; Global Ed work is being done in 12 countries, and in eight the work is driven by volunteers – 19 million characters added to Wikipedia, with 50% female editors; The mobile platform has been redeveloped, Android app released, mobile PVs up 187%. Wikipedia Zero launched with new deals with two companies covering 28 countries, and more underway; Editor recruitment activity is underway in India and Brazil; Wikimedia Labs is launched and exceeding participation targets; Internationalization team has made significant improvements, particularly for Indic languages. It has also created new translation tools; New editor engagement – MoodBar/Feedback Dashboard, AFTv5, New Pages Feed tool, plus many research projects and small experiments (e.g., warnings & welcoming study, lapsed editor e-mail experiment,Teahouse). Nonetheless, editors are down to 85 from 89K; Upload wizard increased image uploads 27% over the year, with WLM causing a visible spike in September 2011. 9 Strategy Plan 2015 Goals (for reference) 1) Increase readership. 2) Increase the quantity of material we offer. -
Amplifying the Impact of Open Access: Wikipedia and the Diffusion of Science
(forthcoming in the Journal of the Association for Information Science and Technology) Amplifying the Impact of Open Access: Wikipedia and the Diffusion of Science Misha Teplitskiy Grace Lu Eamon Duede Dept. of Sociology and KnowledgeLab Computation Institute and KnowledgeLab University of Chicago KnowledgeLab University of Chicago [email protected] University of Chicago [email protected] (773) 834-4787 [email protected] (773) 834-4787 5735 South Ellis Avenue (773) 834-4787 5735 South Ellis Avenue Chicago, Illinois 60637 5735 South Ellis Avenue Chicago, Illinois 60637 Chicago, Illinois 60637 Abstract With the rise of Wikipedia as a first-stop source for scientific knowledge, it is important to compare its representation of that knowledge to that of the academic literature. Here we identify the 250 most heavi- ly used journals in each of 26 research fields (4,721 journals, 19.4M articles in total) indexed by the Scopus database, and test whether topic, academic status, and accessibility make articles from these journals more or less likely to be referenced on Wikipedia. We find that a journal’s academic status (im- pact factor) and accessibility (open access policy) both strongly increase the probability of its being ref- erenced on Wikipedia. Controlling for field and impact factor, the odds that an open access journal is referenced on the English Wikipedia are 47% higher compared to paywall journals. One of the implica- tions of this study is that a major consequence of open access policies is to significantly amplify the dif- fusion of science, through an intermediary like Wikipedia, to a broad audience. Word count: 7894 Introduction Wikipedia, one of the most visited websites in the world1, has become a destination for information of all kinds, including information about science (Heilman & West, 2015; Laurent & Vickers, 2009; Okoli, Mehdi, Mesgari, Nielsen, & Lanamäki, 2014; Spoerri, 2007). -
Language-Agnostic Relation Extraction from Abstracts in Wikis
information Article Language-Agnostic Relation Extraction from Abstracts in Wikis Nicolas Heist, Sven Hertling and Heiko Paulheim * ID Data and Web Science Group, University of Mannheim, Mannheim 68131, Germany; [email protected] (N.H.); [email protected] (S.H.) * Correspondence: [email protected] Received: 5 February 2018; Accepted: 28 March 2018; Published: 29 March 2018 Abstract: Large-scale knowledge graphs, such as DBpedia, Wikidata, or YAGO, can be enhanced by relation extraction from text, using the data in the knowledge graph as training data, i.e., using distant supervision. While most existing approaches use language-specific methods (usually for English), we present a language-agnostic approach that exploits background knowledge from the graph instead of language-specific techniques and builds machine learning models only from language-independent features. We demonstrate the extraction of relations from Wikipedia abstracts, using the twelve largest language editions of Wikipedia. From those, we can extract 1.6 M new relations in DBpedia at a level of precision of 95%, using a RandomForest classifier trained only on language-independent features. We furthermore investigate the similarity of models for different languages and show an exemplary geographical breakdown of the information extracted. In a second series of experiments, we show how the approach can be transferred to DBkWik, a knowledge graph extracted from thousands of Wikis. We discuss the challenges and first results of extracting relations from a larger set of Wikis, using a less formalized knowledge graph. Keywords: relation extraction; knowledge graphs; Wikipedia; DBpedia; DBkWik; Wiki farms 1. Introduction Large-scale knowledge graphs, such as DBpedia [1], Freebase [2], Wikidata [3], or YAGO [4], are usually built using heuristic extraction methods, by exploiting crowd-sourcing processes, or both [5].