arXiv:1811.12349v2 [cs.SI] 4 Dec 2018 Content for Different Purposes in Very Large Scale
Total Page:16
File Type:pdf, Size:1020Kb
Recommended publications
Personalized Search
Personalized Search. Fredrik Nygård Carlsen. arXiv:1509.02207v1 [cs.IR] 7 Sep 2015. Submission date: September 2015. Supervisor: Trond Aalberg, IDI. Norwegian University of Science and Technology, Department of Computer Science. Abstract: As the volume of electronically available information grows, relevant items become harder to find. This work presents an approach to personalizing search results in scientific publication databases. It focuses on re-ranking search results from existing search engines such as Solr or ElasticSearch, and includes the development of Obelix, a new recommendation system used to re-rank search results. The project was proposed and performed at CERN, using the scientific publications available on the CERN Document Server (CDS). This work experiments with re-ranking using offline and online evaluation of users and documents in CDS. The experiments conclude that the personalized search results outperform both latest-first and word-similarity ranking in terms of click position in the search results for global search in CDS. Acknowledgments: I would first of all like to thank my supervisor, Trond Aalberg, for valuable guidance and support. Furthermore, I would like to thank the people at CERN: Head Librarian at Invenio, Jens Vigen; CDS Developer, Patrick Glauner; Head of CDS, Jean-Yves Le Meur; and Head developer at Invenio, Tibor Simko, for valuable insight and ideas. I would like to thank my friends who supported me through the entire process: Thea Christine Mathisen and Rolf Erik Lekang. Contents: Acknowledgment; List of Figures; List of Tables; 1 Introduction; 1.1 Motivation; 1.2 Problem Formulation
In Search of the Climate Change Filter Bubble
In Search of the Climate Change Filter Bubble: A Content-based Method for Studying Ideological Segregation in Google. Genot, Emmanuel; Jiborn, Magnus; Hahn, Ulrike; Volzhanin, Igor; Olsson, Erik J; von Gerber, Ylva. 2020. Document Version: Early version, also known as pre-print. Citation for published version (APA): Genot, E., Jiborn, M., Hahn, U., Volzhanin, I., Olsson, E. J., & von Gerber, Y. (2020). In Search of the Climate Change Filter Bubble: A Content-based Method for Studying Ideological Segregation in Google. Manuscript submitted for publication. Total number of authors: 6. Creative Commons License: Other. General rights: Unless other specific re-use rights are stated, the following general rights apply: Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners, and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights. • Users may download and print one copy of any publication from the public portal for the purpose of private study or research. • You may not further distribute the material or use it for any profit-making activity or commercial gain. • You may freely distribute the URL identifying the publication in the public portal. Read more about Creative Commons licenses: https://creativecommons.org/licenses/ Take down policy: If you believe that this document breaches copyright, please contact us providing details, and we will remove access to the work immediately and investigate your claim. LUND UNIVERSITY, PO Box 117, 221 00 Lund, +46 46-222 00 00. In Search of the Climate Change Filter Bubble: A Content-based Method for Studying Ideological Segregation in Google. Emmanuel Genot1, Magnus Jiborn2, Ulrike Hahn3, Igor Volzhanin4, Erik J.
Bursting the Filter Bubble
BURSTING THE FILTER BUBBLE: DEMOCRACY, DESIGN, AND ETHICS. Dissertation for the purpose of obtaining the degree of doctor at Technische Universiteit Delft, by the authority of the Rector Magnificus, prof. ir. K. C. A. M. Luyben, chair of the Board for Doctorates, to be defended publicly on Wednesday, 16 September 2015 at 10:00 by Engin BOZDAĞ, Master of Science in Computer Science (Technische Informatica), born in Malatya, Turkey. This dissertation has been approved by: Promotors: Prof. dr. M.J. van den Hoven; Prof. dr. ir. I.R. van de Poel. Copromotor: dr. M.E. Warnier. Composition of the doctoral committee: Rector Magnificus, chair; Prof. dr. M.J. van den Hoven, Technische Universiteit Delft, promotor; Prof. dr. ir. I.R. van de Poel, Technische Universiteit Delft, promotor; dr. M.E. Warnier, Technische Universiteit Delft, copromotor. Independent members: dr. C. Sandvig, Michigan State University, USA; Prof. dr. M. Binark, Hacettepe University, Turkey; Prof. dr. R. Rogers, Universiteit van Amsterdam; Prof. dr. A. Hanjalic, Technische Universiteit Delft; Prof. dr. ir. M.F.W.H.A. Janssen, Technische Universiteit Delft, reserve member. Printed by: CPI Koninklijke Wöhrmann. Cover Design: Özgür Taylan Gültekin. E-mail: [email protected] WWW: http://www.bozdag.nl Copyright © 2015 by Engin Bozdağ. All rights reserved. No part of the material protected by this copyright notice may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording or by any information storage and retrieval system, without written permission of the author. An electronic version of this dissertation is available at http://repository.tudelft.nl/. PREFACE: For Philip Serracino Inglott, for his passion and dedication to Information Ethics. Rest in peace.
A Longitudinal Analysis of YouTube's Promotion of Conspiracy Videos
A longitudinal analysis of YouTube’s promotion of conspiracy videos. Marc Faddoul1, Guillaume Chaslot3, and Hany Farid1,2. Abstract: Conspiracy theories have flourished on social media, raising concerns that such content is fueling the spread of disinformation, supporting extremist ideologies, and in some cases, leading to violence. Under increased scrutiny and pressure from legislators and the public, YouTube announced efforts to change their recommendation algorithms so that the most egregious conspiracy videos are demoted and demonetized. To verify this claim, we have developed a classifier for automatically determining if a video is conspiratorial (e.g., the moon landing was faked, the pyramids of Giza were built by aliens, end of the world prophecies, etc.). We coupled this classifier with an emulation of YouTube’s watch-next algorithm on more than a thousand popular informational channels to obtain a year-long picture of the videos actively promoted by YouTube. We also obtained trends of the so-called filter-bubble effect for conspiracy theories. Keywords: Online Moderation, Disinformation, Algorithmic Transparency, Recommendation Systems. Introduction: By allowing for a wide range of opinions to coexist, social media has allowed for an open exchange of ideas. There have, however, been concerns that the recommendation engines which power these services amplify sensational content because of its tendency to generate more engagement. [...] social media [21]; (2) Although view-time might not be the only metric driving the recommendation algorithms, YouTube has not fully explained what the other factors are, or their relative contributions. It is unarguable, nevertheless, that keeping users engaged remains the main driver for YouTube's advertising revenues [22, 23]; and (3) While recommendations may span a spectrum, users preferably engage with content
Convergence of Search, Recommendations and Advertising
Information Seeking: Convergence of Search, Recommendations and Advertising. Hector Garcia-Molina, Georgia Koutrika, Aditya Parameswaran. 1. INTRODUCTION: All of us are faced with a "deluge of data" [2], in our workplaces and our homes: an ever-growing World Wide Web, digital books and magazines, photographs, blogs, tweets, e-mails, databases, activity logs, sensor streams, on-line videos, movies and music, and so on. Thus, one of the fundamental problems in Computer Science has become even more critical today: how to identify objects satisfying a user's information need. The goal is to present to the user only information that is of interest and relevance, at the right place and time. At least three types of information providing mechanisms have been developed over the years to satisfy user information needs: • A search mechanism takes as input a query that describes the current user interests. [...] • An advertisement mechanism is similar to a recommendation mechanism, except that the objects presented to the user are commercial advertisements, and financial considerations play a central role in ranking. An advertising mechanism also considers the user context, e.g., what Web site is currently being visited, or what query the user just entered. The selection of advertisements can be based on the similarity of the context to the ad, or the similarity of the context to a bid phrase (where the advertiser specifies the contexts he is interested in), or on financial considerations (e.g., which advertiser is willing to pay more for the ad placement). The main thesis of this paper is that these mechanisms are not that different to begin with, and designing these three mechanisms making use of this commonality could lead to
Voorgeprogrammeerd
Voorgeprogrammeerd: Hoe internet ons leven leidt (Preprogrammed: How the Internet Leads Our Lives). Editors: Christian van ’t Hof, Jelte Timmer, Rinie van Est. Boom Lemma uitgevers, Den Haag, 2012. Cover design: Textcetera, Den Haag. Cover photo: Shutterstock. Interior layout: Textcetera, Den Haag. © 2012 Rathenau | Boom Lemma uitgevers. Except for the exceptions laid down in or pursuant to the Dutch Copyright Act (Auteurswet), no part of this publication may be reproduced, stored in an automated database, or made public, in any form or by any means, whether electronic, mechanical, by photocopying, recording or in any other way, without the prior written permission of the publisher. Insofar as making reprographic reproductions from this publication is permitted under Article 16h of the Auteurswet, the fees legally owed for this must be paid to Stichting Reprorecht (Postbus 3051, 2130 KB Hoofddorp, www.reprorecht.nl). For the reproduction of part(s) of this publication in anthologies, readers and other compilation works (Art. 16 Auteurswet), one may contact Stichting PRO (Stichting Publicatie- en Reproductierechten Organisatie, Postbus 3060, 2130 KB Hoofddorp, www.cedar.nl/pro). No part of this book may be reproduced in any form, by print, photoprint, microfilm or any other means without written permission from the publisher. ISBN 978-90-5931-797-0. NUR 800. www.boomlemma.nl www.rathenau.nl Foreword: The Rathenau Instituut stimulates discussion about technological developments that have a major impact on society.
Algorithms, Filter Bubbles and Fake News
ALGORITHMS, FILTER BUBBLES AND FAKE NEWS. By Dominik Grau, Managing Director, Beuth. Edited by Jessica Patterson. Below the surface of digital platforms lies a deep sea of algorithms. Hidden in plain sight, they shape how we consume content and the way content or services are recommended to us. Publishers need to understand the growing influence of algorithms and the digital filter bubbles they create. THE NATURE OF ALGORITHMS. Algorithms help publishers, platforms and brands to amplify and personalize content on almost all digital channels. Most of today’s content management systems and recommendation engines are driven by tens of thousands of algorithms. On the most basic level, algorithms are “sets of defined steps structured to process instructions/data to produce an output”, according to National University of Ireland Professor Rob Kitchin. Kitchin, who recently published a paper on ‘Thinking critically about and researching algorithms,’ states that “algorithms can be conceived in a number of ways – technically, computationally, mathematically, politically, culturally, economically, contextually, materially, philosophically, ethically.” [1] Dartmouth professors Tom Cormen and Devin Balkcom recently offered a specific example of the binary nature of algorithms: “Binary search is an efficient algorithm for finding an item from a sorted list of items. It works by repeatedly dividing in half the portion of the list that could contain the item, until you've narrowed down the possible locations to just one.” [2] Just like mathematical formulae, algorithms compute predefined metrics and calculations, step by step. Their code has no bias; they “know” or “understand” nothing, yet they compute everything that has been implemented as a rule or path.
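The binary search quoted in the excerpt above can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the article: the function name `binary_search`, its parameters, and the choice to return `None` for a missing item are assumptions made here, and the input is assumed to be sorted in ascending order.

```python
from typing import Optional, Sequence


def binary_search(items: Sequence[int], target: int) -> Optional[int]:
    """Return the index of target in the ascending-sorted items, or None.

    Hypothetical helper for illustration; not taken from the excerpt.
    """
    low, high = 0, len(items) - 1
    # The slice items[low..high] is the portion that could still hold target.
    while low <= high:
        mid = (low + high) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            # Target can only be in the upper half.
            low = mid + 1
        else:
            # Target can only be in the lower half.
            high = mid - 1
    # The candidate portion is empty, so target is not present.
    return None
```

Each pass through the loop halves the candidate range, which is the "repeatedly dividing in half" step the quotation describes; the loop ends either on a match or when no candidate positions remain.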
Measuring Personalization of Web Search
Measuring Personalization of Web Search. ANIKÓ HANNÁK, Northeastern University; PIOTR SAPIEŻYŃSKI, Technical University of Denmark; ARASH MOLAVI KAKHKI, Northeastern University; DAVID LAZER, ALAN MISLOVE, and CHRISTO WILSON, Northeastern University. Web search is an integral part of our daily lives. Recently, there has been a trend of personalization in Web search, where different users receive different results for the same search query. The increasing level of personalization is leading to concerns about Filter Bubble effects, where certain users are simply unable to access information that the search engines’ algorithm decides is irrelevant. Despite these concerns, there has been little quantification of the extent of personalization in Web search today, or the user attributes that cause it. In light of this situation, we make three contributions. First, we develop a methodology for measuring personalization in Web search results. While conceptually simple, there are numerous details that our methodology must handle in order to accurately attribute differences in search results to personalization. Second, we apply our methodology to 200 users on Google Web Search and 100 users on Bing. We find that, on average, 11.7% of results show differences due to personalization on Google, while 15.8% of results are personalized on Bing, but that this varies widely by search query and by result ranking. Third, we investigate the user features used to personalize on Google Web Search and Bing. Surprisingly, we only find measurable personalization as a result of searching with a logged in account and the IP address of the searching user. Our results are a first step towards understanding the extent and effects of personalization on Web search engines today.
Characterizing Search-Engine Traffic to Internet Research Agency Web Properties
Characterizing Search-Engine Traffic to Internet Research Agency Web Properties. Alexander Spangher (Information Sciences Institute, University of Southern California), Gireeja Ranade (University of California, Berkeley and Microsoft Research), Besmira Nushi (Microsoft Research), Adam Fourney (Microsoft Research), Eric Horvitz (Microsoft Research). ABSTRACT: The Russia-based Internet Research Agency (IRA) carried out a broad information campaign in the U.S. before and after the 2016 presidential election. The organization created an expansive set of internet properties: web domains, Facebook pages, and Twitter bots, which received traffic via purchased Facebook ads, tweets, and search engines indexing their domains. In this paper, we focus on IRA activities that received exposure through search engines, by joining data from Facebook and Twitter with logs from the Internet Explorer 11 and Edge browsers and the Bing.com search engine. We find that a substantial volume of Russian content was apolitical and emotionally-neutral in nature. Our observations demonstrate that such content gave IRA web-properties considerable exposure through search-engines and brought readers to websites hosting inflammatory content and engagement hooks. Our findings show that, like social media, web search also directed traffic to IRA generated web content, and the resultant traffic patterns are distinct from those of social media. Figure 1: IRA campaign structure. Illustration of the structure of IRA sponsored content on the web. Content spread via a combination of paid promotions (Facebook ads), unpaid promotions (tweets and Facebook posts), and search referrals (organic search and recommendations). These path- ACM Reference Format: Alexander Spangher, Gireeja Ranade, Besmira Nushi, Adam Fourney, and Eric Horvitz.
Filter Bubbles and Social Media
"FILTER BUBBLES", SOCIAL MEDIA & BIAS. WHAT IS A "FILTER BUBBLE"? A filter bubble is an environment, especially an online environment, in which a person only ever comes into contact with opinions and views that are the same as their own. WHAT IS "BIAS"? Bias is an opinion in favour of or against a particular person, group, or idea. We all have certain biases; it's almost impossible to avoid them. This is OK, but we need to acknowledge that we have them, as they will affect our response to the news. WHAT IS "POLITICAL BIAS"? Political bias is your preferred political stance. Almost every newspaper has one. Again, this is not a problem; we just need to know it's there. Mainstream newspapers are usually quite open about their political stance, which is a good thing. WHAT ABOUT SOCIAL MEDIA? Most of us get our news from social media. What we see on our social media is decided by a computer programme called an "algorithm", which shows us things that we are likely to enjoy. This means the news you see will almost always align with your views. WHAT'S THE PROBLEM HERE? If you only ever read news that confirms your point of view, then you won't have a balanced understanding of events. If you don't have the full picture, it can be hard to know what's actually happening! WHAT CAN I DO TO AVOID THIS? Read articles from a wide range of sources to get a balanced view. Be aware of bias while you read (your own and that of the news source).
Ethical Implications of Filter Bubbles and Personalized News-Streams
Volume 3, 190-189, 2017. ISSN 2501-3513. ETHICAL IMPLICATIONS OF FILTER BUBBLES AND PERSONALIZED NEWS-STREAMS. Dan Valeriu VOINEA, University of Craiova, Center for Scientific Research Communication, Media and Public Opinion, Romania. 1. Abstract: In the days of traditional media, be it print, radio, or television, every consumer of a product received the same information. However, with the transition to the online world, personalization, tailoring information to the individual’s interests, became a possibility. Starting with the advent of personalized search (Pitkow et al., 2002), which was introduced by Google as a feature of its search engine in 2004, interest-based advertising and personalized news became both features, improving the relevance of the information to a specific user, and ethical problems – how can information only I receive influence me? The issue is even more complicated when talking about social media and personalized news – how can we ensure algorithms are transparent and indiscriminate? The present research is focused on bringing to attention possible ethical dilemmas and solutions to filter bubbles brought on by news personalization. 2. Keywords: filter bubbles, personalized news, information filtering, fake news, news algorithms, social media. 3. Introduction: Even though discussions regarding the personalization of content served to consumers on the internet started in the late ’90s, there were not many mainstream analyses into the ethical implications of such manipulation of content until the introduction of the term “filter bubble” by Eli Pariser. “The filter bubble tends to dramatically amplify confirmation bias—in a way, it’s designed to. Consuming information that conforms to our ideas of the world is easy and pleasurable; consuming information that challenges us to think in new ways or question our assumptions is frustrating and difficult.” (Pariser, 2012, p.
RePriv: Re-Envisioning In-Browser Privacy
RePriv: Re-Envisioning In-Browser Privacy. Matt Fredrikson (University of Wisconsin), Ben Livshits (Microsoft Research). [Slide: New York Times, Netflix, Google News, Amazon. Share data to get personalized results; privacy concerns.] Browser: Personalization & Privacy. • Broad applications: site personalization, personalized search, ads. • User data in browser. • Control information release. [Figure: your browser distills browsing history into a user interest profile, e.g. Top: Computers: Security: Internet: Privacy; Top: Arts: Movies: Genres: Film Noir; Top: Sports: Hockey: Ice Hockey; Top: Science: Math: Number Theory; Top: Recreation: Outdoors: Fishing.] Scenario #1: Online Shopping. [Prompt: "bn.com would like to learn your top interests. We will let them know you are interested in: Science, Technology, Outdoors." Accept / Decline. The interest profile is sent via the RePriv Protocol.] Scenario #2: Personalized Search. [Prompt: "Would you like to install an extension called “Bing Personalizer” that will: watch mouse clicks on bing.com; modify appearance of bing.com; store personal data in browser." Accept / Decline. Personalized results: “weather” weather.com; “sports” espn.com; “movies” imdb.com; “recipes” epicurious.com.] Contributions of RePriv: • RePriv Core: an in-browser framework for collecting & managing personal data to facilitate personalization. • Behavior Mining: efficient in-browser behavior mining & controlled dissemination of personal data.