Fingerprinting Keywords in Search Queries Over Tor


Se Eun Oh*, Shuai Li, and Nicholas Hopper
Proceedings on Privacy Enhancing Technologies; 2017 (4):251–270
DOI 10.1515/popets-2017-0048
Received 2017-02-28; revised 2017-06-01; accepted 2017-06-02.

*Corresponding Author: Se Eun Oh: University of Minnesota, E-mail: [email protected]
Shuai Li: University of Minnesota, E-mail: [email protected]
Nicholas Hopper: University of Minnesota, E-mail: [email protected]

Abstract: Search engine queries contain a great deal of private and potentially compromising information about users. One technique to prevent search engines from identifying the source of a query, and Internet service providers (ISPs) from identifying the contents of queries, is to query the search engine over an anonymous network such as Tor. In this paper, we study the extent to which Website Fingerprinting can be extended to fingerprint individual queries or keywords to web applications, a task we call Keyword Fingerprinting (KF). We show that by augmenting traffic analysis using a two-stage approach with new task-specific feature sets, a passive network adversary can in many cases defeat the use of Tor to protect search engine queries. We explore three popular search engines, Google, Bing, and Duckduckgo, and several machine learning techniques with various experimental scenarios. Our experimental results show that KF can identify Google queries containing one of 300 targeted keywords with recall of 80% and precision of 91%, while identifying the specific monitored keyword among 300 search keywords with accuracy 48%. We also further investigate the factors that contribute to keyword fingerprintability to understand how search engines and users might protect against KF.

[Fig. 1. Principal Components Analysis (PCA) plot of (a) Google and (b) Duckduckgo query traces and background webpage traces, based on the CUMUL feature set. Axes: PCA1 (×10^7), PCA2 (×10^6).]

1 Introduction

Internet search engines are the most popular and convenient method by which users locate information on the web. As such, the queries a user makes to these search engines necessarily contain a great deal of private and personal information about the user. For example, AOL search data leaked in 2006 was found to contain queries revealing medical conditions, drug use, sexual assault victims, marital problems, and political leanings [18, 38]. Thus, popular search engines such as Bing and Google, and ISPs, are in a position to collect sensitive details about users. These search queries have also been among the targets of censorship [29] and surveillance infrastructures [17] built through the cooperation of state and private entities.

One mechanism to conceal the link between users and their search queries is an anonymous network such as Tor, where the identities of clients are concealed from servers, and the contents and destinations of connections are concealed from network adversaries, by sending connections through a series of encrypted relays. However, even Tor cannot always guarantee connection anonymity, since the timing and volume of traffic still reveal some information about user browsing activity. Over the past decade, researchers have applied various machine learning techniques to features based on packet size and timing information from Tor connections to show that Website Fingerprinting (WF) attacks can identify frequently visited or sensitive web pages [7, 14, 20, 30, 31, 44].

In this paper, we describe a new application of these attacks on Tor, Keyword Fingerprinting (KF). In this attack model, a local passive adversary attempts to infer a user's search engine queries based only on analysing traffic intercepted between the client and the entry guard in the Tor network. A KF attack proceeds in three stages. First, the attacker must identify which Tor connections carry the search result traces of a particular search engine, as opposed to other webpage traces. The second is to determine whether a target query trace is in a list of "monitored queries" targeted for identification. The third goal is to classify each query trace correctly, to predict the query (keyword) that the victim typed.

Note that while KF and WF attacks share some common techniques, KF focuses on the latter stages of this attack, distinguishing between multiple results from a single web application, which is challenging for WF techniques. Figure 1 illustrates this distinction; it shows the website fingerprints (using the CUMUL features proposed by Panchenko et al. [30]) of 10,000 popular web pages in green, and 10,000 search engine queries in blue. Search engine queries are easy to distinguish from general web traffic, but show less variation between keywords. Hence, while direct application of WF performs the first step very well and can perform adequately in the second stage, it performs poorly for differentiating between monitored keywords. Thus, the different level of application as well as the multi-stage nature of the attack make it difficult to directly use or compare results from the WF setting.
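The projection in Figure 1 can be reproduced in outline as follows. This is a minimal sketch, not the authors' code: it interpolates the cumulative sum of signed packet sizes down to a fixed-length vector, in the spirit of the CUMUL features of Panchenko et al. [30], and runs on synthetic traces; the helper name cumul_features and the choice of 100 interpolation points are our assumptions.

# A minimal sketch (not the paper's code) of CUMUL-style features and a
# 2-D PCA projection like Fig. 1, using numpy and scikit-learn.
import numpy as np
from sklearn.decomposition import PCA

def cumul_features(signed_sizes, n_points=100):
    # Cumulative sum of signed packet sizes (+outgoing, -incoming),
    # sampled at n_points evenly spaced positions so that every trace
    # yields a feature vector of the same length.
    cumsum = np.cumsum(signed_sizes)
    xs = np.linspace(0, len(cumsum) - 1, n_points)
    return np.interp(xs, np.arange(len(cumsum)), cumsum)

# Synthetic stand-in for captured traces: each trace is a sequence of
# signed Tor cell/packet sizes.
rng = np.random.default_rng(0)
traces = [rng.choice([-512, 512], size=2000) for _ in range(100)]
X = np.vstack([cumul_features(t) for t in traces])

pca = PCA(n_components=2)
coords = pca.fit_transform(X)  # columns correspond to PCA1/PCA2 in Fig. 1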
The primary contribution of this paper is to demonstrate the feasibility of KF across a variety of settings. We collect a large keyword dataset, yielding about 160,000 search query traces for monitored and background keywords. Based on this dataset, we determine that WF features do not carry sufficient information to distinguish between queries, as Figure 1 illustrates. Thus we conduct a feature analysis to identify more useful features for KF; adding these features significantly improves the accuracy that can be achieved in the second and third stages of the attack.

Using these new features, we explore the effectiveness of KF by considering different sizes of "monitored" and "background" keyword sets; both incremental search (e.g. Google Instant¹) and "high security" search (with JavaScript disabled); across popular search engines — Google, Bing, and Duckduckgo; and with different classifiers: support vector machines (SVM), k-Nearest Neighbor, and random forests (k-fingerprinting). We show that well-known high-overhead WF defenses [5, 6] significantly reduce the success of KF attacks but do not completely prevent them with standard parameters. We examine factors that differ between non-fingerprintable keywords (those with recall of 0%) and fingerprintable keywords. Overall, our results indicate that use of Tor alone may be inadequate to defend the content of users' search engine queries.

¹ Google Instant is a predictive search for potential responses as soon as or before the user types keywords into the search box. By default, Google Instant predictions are enabled only when the computer is fast enough.

2 Threat Model

Our attack model for KF follows the standard WF attack model, in which the attacker is a passive observer who can only monitor the communication between the client and the Tor entry guard. He cannot add, drop, modify, or decrypt packets, or compromise the entry guard. The attacker has a list of monitored phrases, which we refer to as keywords, and hopes to identify packet traces in which the user queries a search engine for one of these phrases. We also assume that search engine servers are uncooperative, and that users consequently relay their queries through Tor to hide the link between previous and future queries. Thus, our threat model only monitors traces entering through the Tor network.

The attacker progresses through two sequential fingerprinting steps. First, he performs website fingerprinting to identify the traffic traces that contain queries to the targeted search engine, rather than other webpage traffic. Traffic not identified by this step is ignored. We refer to the remaining traces, passed to the next step, as search query traces or keyword traces. Second, the attacker performs KF to predict the keywords in the query traces, classifying query traces into individual keywords. Depending on the purpose of the attack, the attacker trains this second-stage classifier as either a binary or a multiclass classifier: binary classification detects monitored keywords against background keywords, while multiclass classification infers which of the monitored keywords the victim queried.

We assume the adversary periodically re-trains the classifiers used in both steps with new training data; since both training and acquiring the data are resource-intensive, we used data gathered over a three-month period to test the generalizability of our results.
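The two sequential steps can be read as a simple filter-then-classify pipeline. The sketch below shows that structure under assumptions: the classifier choices echo those the paper evaluates (random forests, SVM), but the feature data is synthetic and every variable name and hyperparameter is illustrative, not the paper's implementation.

# A minimal sketch (assumed structure, not the paper's implementation)
# of the two sequential fingerprinting steps.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(1)

# Stage 1: search-engine trace vs. background webpage (binary).
X1 = rng.normal(size=(200, 100))      # feature vectors (e.g. CUMUL-style)
y1 = rng.integers(0, 2, size=200)     # 1 = search query trace
stage1 = RandomForestClassifier(n_estimators=100).fit(X1, y1)

# Stage 2: keyword identification, trained only on query traces.
# Binary variant: monitored vs. background keyword; multiclass variant
# (shown here): which of the monitored keywords was queried.
X2 = rng.normal(size=(300, 100))
y2 = rng.integers(0, 300, size=300)   # one label per monitored keyword
stage2 = SVC(kernel="rbf").fit(X2, y2)

def classify_trace(features):
    # Traffic not identified as a query trace by stage 1 is ignored.
    if stage1.predict([features])[0] != 1:
        return None
    return stage2.predict([features])[0]  # predicted keyword label

print(classify_trace(rng.normal(size=100)))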
Second, Thus we conduct a feature analysis to identify more use- the attacker performs KF to predict keywords in query ful features for KF; adding these features significantly traces. The attacker will classify query traces into in- improves the accuracy that can be achieved in the sec- dividual keywords, creating a multiclass classifier. De- ond and third stages of the attack. pending on the purpose of the attack, the attacker trains Using these new features, we explore the effective- classifiers with either binary or multiclass classifications. ness of KF, by considering different sizes of “mon- For example, binary classification will detect monitored itored” and “background” keyword sets; both incre- keywords against background keywords while multiclass mental search (e.g. Google Instant1) and “high secu- classification will infer which of the monitored keywords rity” search (with JavaScript disabled); across popu- the victim queried. lar search engines — Google, Bing, and Duckduckgo; We assume the adversary periodically re-trains the and with different classifiers, support vector machines classifiers used in both steps with new training data; (SVM), k-Nearest Neighbor, and random forests (k- since both training and acquiring the data are resource- fingerprinting). We show that well-known high-overhead intensive, we used data gathered over a three-month pe- WF defenses [5, 6] significantly reduce the success of riod to test the generalizability of our results. KF attacks but do not completely prevent them with 3 Data Set standard parameters. We examine factors that differ be- tween non-fingerprintable (those with recall of 0%) and 3.1 Data Collection fingerprintable keywords. Overall, our results indicate We describe the Tor traffic capture tool and data that use of Tor alone may be inadequate to defend the sets collected for our experiment. To collect search content of users’ search engine queries. query traces on Tor, we used the Juarez et al. Tor Browser Crawler [22, 39], built based on Selenium, Stem, and Dumpcap to automate Tor Browser browsing and packet capture. We added two additional features to the crawler for our experiments. The first is to identify abnor- mal Google responses due to rate limiting and rebuild 1 Google Instant is a predictive search for potential responses as the Tor circuit. When an exit node is rate-limited by soon as or before the user types keywords into the search box. By default, Google Instant predictions are enabled only when Google, it responds (most often) with a CAPTCHA, the computer is fast enough. Unauthenticated Download Date | 1/3/18 2:35 AM Proceedings on Privacy Enhancing Technologies ; 2017 (4):253–270 thus, we added a mechanism to detect these cases and Table 1. Data collection settings for Google, Bing, and Duck- reset the Tor process, resulting in a different exit node. duckgo (inc:incremental, one:one-shot, Y:Yes, N:No) Among 101,685 HTML files for Google queries and keyword set Google(inc) Google(one) Bing Duck 125,245 HTML files for Bing, we manually inspected Top-ranked Y Y Y Y those of size less than 100KB, yielding 3,315 HTML re- Blacklisted Y Y N N sponses for Google and 9,985 for Bing, and found that AOL Y N N N a simple size threshold of 10KB for the response doc- ument was sufficient to distinguish rate-limit responses search query traces from January to February in 2017.) from normal results with perfect accuracy.
Recommended publications
  • Duckduckgo Search Engines Android
    DuckDuckGo Privacy Browser is DuckDuckGo's Android app (recent releases at the time of capture ranged from 5.57.1 to 5.65.0, roughly 10.3–10.8MB each). © DuckDuckGo. Privacy, simplified. This article is about the search engine; for the children's game, see duck, duck, goose. DuckDuckGo is an Internet search engine that emphasizes searchers' privacy and avoids the filter bubble of personalized search results. DuckDuckGo differs from other search engines by not profiling its users and by showing all users the same search results for a given search term. The company, Duck Duck Go, Inc., was created by Gabriel Weinberg and registered on September 25, 2008; its homepage is duckduckgo.com, and its Alexa rank was 158 as of October 2020. It is headquartered at 20 Paoli Pike, Paoli, Pennsylvania, USA, in Greater Philadelphia, and has had 111 employees since October 2020. The name of the company is a reference to the children's game duck, duck, goose. DuckDuckGo's search results are a compilation of more than 400 sources, including Yahoo! Search BOSS, Wolfram Alpha, Bing, Yandex, its own web crawler (DuckDuckBot), and others. It also uses data from crowdsourced sites, including Wikipedia, to fill in the knowledge panel boxes to the right of the results.
  • EFF, Duckduckgo, and Internet Archive Amicus Brief
    NO. 17-16783, IN THE UNITED STATES COURT OF APPEALS FOR THE NINTH CIRCUIT. HIQ LABS, INC., Plaintiff-Appellee, v. LINKEDIN CORPORATION, Defendant-Appellant. On appeal from the United States District Court for the Northern District of California, Case No. 3:17-cv-03301-EMC, the Honorable Edward M. Chen, District Court Judge. BRIEF OF AMICI CURIAE ELECTRONIC FRONTIER FOUNDATION, DUCKDUCKGO, AND INTERNET ARCHIVE IN SUPPORT OF PLAINTIFF-APPELLEE. Jamie Williams, Corynne McSherry, Cindy Cohn, Nathan Cardozo, ELECTRONIC FRONTIER FOUNDATION, 815 Eddy Street, San Francisco, CA 94109. Email: [email protected]. Telephone: (415) 436-9333. Counsel for Amici Curiae. DISCLOSURE OF CORPORATE AFFILIATIONS AND OTHER ENTITIES WITH A DIRECT FINANCIAL INTEREST IN LITIGATION: Pursuant to Rule 26.1 of the Federal Rules of Appellate Procedure, Amici Curiae Electronic Frontier Foundation, DuckDuckGo, and Internet Archive each individually state that they do not have a parent corporation and that no publicly held corporation owns 10 percent or more of their stock.
  • The Autonomous Surfer
    Renée Ridgway, The Autonomous Surfer. CAIS Report, Fellowship May to October 2018. Research Questions: The Autonomous Surfer endeavoured to discover the unknown unknowns of alternative search through the following research questions: What are the alternatives to Google search? What are their hidden revenue models, even if they do not collect user data? How do they deliver divergent (and qualitative) results or knowledge? What are the criteria that determine ranking and relevance? How do p2p search engines such as YaCy work? Do they deliver alternative results compared to other search engines? Is there still a movement for a larger, public index? Can there be serendipitous search, which is the ability to come across books, articles, images, information, objects, and so forth, by chance? Aims and Projected Results: My PhD research investigates Google search – its early development, its technological innovation, its business model of the past 20 years, and how it works now. Furthermore, I have experimented with Tor (The Onion Router) in order to find out whether I could be anonymous online, and if so, whether I would receive divergent results from Google with the same keywords. For my fellowship at CAIS I decided first to research the search engines that were incorporated into the Tor browser as defaults (Startpage, Disconnect) or that are the default now (DuckDuckGo). I then researched search engines from my original CAIS proposal that I had come across in my PhD but had not had time to research; some are from the Society of the Query Reader (2014) and others I found en route or through colleagues' suggestions.
  • GOGGLES: Democracy Dies in Darkness, and So Does the Web
    GOGGLES: Democracy dies in darkness, and so does the Web. Brave Search Team, Brave, Munich, Germany. [email protected]. Abstract: This paper proposes an open and collaborative system by which a community, or a single user, can create sets of rules and filters, called Goggles, to define the space from which a search engine can pull results. Instead of a single ranking algorithm, we could have as many as needed, overcoming the biases that a single actor (the search engine) embeds into the results. Transparency and openness, all desirable qualities, will become accessible through the deep re-ranking capabilities Goggles would enable. Such a system would be made possible by the availability of a host search engine, providing the index and infrastructure, which are unlikely to be replicated without major development and infrastructure costs. Besides the … The aggregation of information has led to a significant transfer of power from creators to aggregators. Access to information has been monopolized by companies like Google and Facebook [27]. While everything is theoretically still retrievable, in practice we are looking at the world through the biases of a few providers, who act, unintentionally or not, as gatekeepers. Akin to the thought experiment about the tree falling in the forest [3], if a page is not listed on Google's results page or in the Facebook feed, does it really exist? The biases of Google and Facebook, whether algorithmic, data induced, commercial or political, dictate what version of the world we get to see. Reality becomes what the models we are fed depict it to be [24].
  • Mass Surveillance
    Mass Surveillance: What are the risks for the citizens and the opportunities for the European Information Society? What are the possible mitigation strategies? Part 1 - Risks and opportunities raised by the current generation of network services and applications. Study IP/G/STOA/FWC-2013-1/LOT 9/C5/SC1, January 2015, PE 527.409. STOA - Science and Technology Options Assessment. The STOA project "Mass Surveillance Part 1 – Risks, Opportunities and Mitigation Strategies" was carried out by TECNALIA Research and Investigation in Spain. AUTHORS: Arkaitz Gamino Garcia, Concepción Cortes Velasco, Eider Iturbe Zamalloa, Erkuden Rios Velasco, Iñaki Eguía Elejabarrieta, Javier Herrera Lotero, Jason Mansell (Linguistic Review), José Javier Larrañeta Ibañez, Stefan Schuster (Editor). The authors acknowledge and would like to thank the following experts for their contributions to this report: Prof. Nigel Smart, University of Bristol; Matteo E. Bonfanti PhD, Research Fellow in International Law and Security, Scuola Superiore Sant'Anna Pisa; Prof. Fred Piper, University of London; Caspar Bowden, independent privacy researcher; Maria Pilar Torres Bruna, Head of Cybersecurity, Everis Aerospace, Defense and Security; Prof. Kenny Paterson, University of London; Agustín Martin and Luis Hernández Encinas, Tenured Scientists, Department of Information Processing and Cryptography (Cryptology and Information Security Group), CSIC; Alessandro Zanasi, Zanasi & Partners; Fernando Acero, Expert on Open Source Software; Luigi Coppolino, Università degli Studi di Napoli; Marcello Antonucci, EZNESS srl; Rachel Oldroyd, Managing Editor of The Bureau of Investigative Journalism; Peter Kruse, Founder of CSIS Security Group A/S; Ryan Gallagher, Investigative Reporter of The Intercept; Capitán Alberto Redondo, Guardia Civil; Prof. Bart Preneel, KU Leuven; Raoul Chiesa, Security Brokers SCpA, CyberDefcon Ltd.; Prof. …
  • Applying Library Values to Emerging Technology Decision-Making in the Age of Open Access, Maker Spaces, and the Ever-Changing Library
    ACRL Publications in Librarianship No. 72. Applying Library Values to Emerging Technology: Decision-Making in the Age of Open Access, Maker Spaces, and the Ever-Changing Library. Editors: Peter D. Fernandez and Kelly Tilton. Association of College and Research Libraries, a division of the American Library Association, Chicago, Illinois, 2018. The paper used in this publication meets the minimum requirements of American National Standard for Information Sciences – Permanence of Paper for Printed Library Materials, ANSI Z39.48-1992. Cataloging-in-Publication data is on file with the Library of Congress. Copyright ©2018 by the Association of College and Research Libraries. All rights reserved except those which may be granted by Sections 107 and 108 of the Copyright Revision Act of 1976. Printed in the United States of America. Contents: Introduction (Peter Fernandez, Head, LRE Liaison Programs, University of Tennessee Libraries; Kelly Tilton, Information Literacy Instruction Librarian, University of Tennessee Libraries). Part I: Contemplating Library Values. Chapter 1. The New Technocracy: Positioning Librarianship's Core Values in Relationship to Technology Is a Much Taller Order Than We Think (John Buschman, Dean of University Libraries, Seton Hall University). Chapter 2. …
  • Online-Quellen Nutzen: Recherche Im Internet
    ONLINE-QUELLEN NUTZEN: RECHERCHE IM INTERNET (Using online sources: research on the Internet). Katharina Gilarski, Verena Müller, Martin Nissen (as of 08/2020). Contents: 1 Research with universal search engines; 1.1 Googling cleverly; 1.2 Search operators for simple searches; 1.3 Advanced search with Google; 1.4 Reverse image search with Google; 1.5 The secret of result ranking in Google; 1.6 Alternative universal search engines; 1.7 Surface vs. Deep Web; 1.8 Advantages and disadvantages of universal search engines; 1.9 Evaluating Internet sources; 2 Research with academic search engines; 2.1 Google Scholar: Google's search engine for academics; 2.2 Search options in Google Scholar.
  • Fingerprinting Detection
    ACTIONS SPEAK LOUDER THAN WORDS: SEMI-SUPERVISED LEARNING FOR BROWSER FINGERPRINTING DETECTION. Sarah Bird (Mozilla, [email protected]), Vikas Mishra (INRIA, [email protected]), Steven Englehardt (Mozilla, [email protected]), Rob Willoughby (University of British Columbia, [email protected]), David Zeber (Mozilla, [email protected]), Walter Rudametkin (INRIA, [email protected]), Martin Lopatka (Mozilla, [email protected]). November 1, 2019. arXiv:2003.04463v1 [cs.CR] 9 Mar 2020. ABSTRACT: As online tracking continues to grow, existing anti-tracking and fingerprinting detection techniques that require significant manual input must be augmented. Heuristic approaches to fingerprinting detection are precise but must be carefully curated. Supervised machine learning techniques proposed for detecting tracking require manually generated label-sets. Seeking to overcome these challenges, we present a semi-supervised machine learning approach for detecting fingerprinting scripts. Our approach is based on the core insight that fingerprinting scripts have similar patterns of API access when generating their fingerprints, even though their access patterns may not match exactly. Using this insight, we group scripts by their JavaScript (JS) execution traces and apply a semi-supervised approach to detect new fingerprinting scripts. We detail our methodology and demonstrate its ability to identify the majority of scripts (>94.9%) identified by existing heuristic techniques. We also show that the approach expands beyond detecting known scripts by surfacing candidate scripts that are likely to include fingerprinting. Through an analysis of these candidate scripts we discovered fingerprinting scripts that were missed by heuristics and for which there are no heuristics. In particular, we identified over one hundred device-class fingerprinting scripts present on hundreds of domains.
  • Online Legal Research Handout
    The New On-Line Search Fundamentals. 1. Search Process. Natural language searches – Natural language searching is essentially the keyword searching we all know and love; the search results you get reflect algorithmic decision-making by the people who designed the search engine. Often the results you get will be based on the number of times your search terms appear in a document or on a web page. Boolean searches – Boolean searches are also referred to as "terms and connectors" searches. These are the searches using quotation marks, minus signs, slashes, and other symbols to "tell" the search engine exactly what you are searching for. Boolean searching gives you a way to limit your searches by dates, run searches in specific fields, and overall return more precise results. Natural language v. Boolean searching compared: Most search engines (including Google, Bing, Westlaw Next, and Lexis Advance) are designed for natural language searching. You can override natural language searching (1) if you know the terms and connectors used by that search engine (I often click on "help" or something similar to see what terms and connectors the search engine uses), or (2) if you can find the "advanced search" page for the search engine. Natural language searching provides the "best" search results when: you are not sure where to start and you need to figure out what search terms to use (i.e., you don't know the words used on the webpages or in the documents you are looking for); you want a large number of search results you can sift through to figure out what kinds of results to expect; or you are researching a conceptual issue rather than a specific topic.
  • The Tor Dark Net
    PAPER SERIES: NO. 20 — SEPTEMBER 2015. The Tor Dark Net. Gareth Owen and Nick Savage. Copyright © 2015 by Gareth Owen and Nick Savage. Published by the Centre for International Governance Innovation and the Royal Institute of International Affairs. The opinions expressed in this publication are those of the authors and do not necessarily reflect the views of the Centre for International Governance Innovation or its Board of Directors. This work is licensed under a Creative Commons Attribution — Non-commercial — No Derivatives License. To view this license, visit www.creativecommons.org/licenses/by-nc-nd/3.0/. For re-use or distribution, please include this copyright notice. 67 Erb Street West, Waterloo, Ontario N2L 6C2, Canada, tel +1 519 885 2444, fax +1 519 885 5450, www.cigionline.org. 10 St James's Square, London, England SW1Y 4LE, United Kingdom, tel +44 (0)20 7957 5700, fax +44 (0)20 7957 5710, www.chathamhouse.org. Table of contents: About the Global Commission on Internet Governance; About the Authors; Executive Summary; Introduction; Hidden Services; Related Work; Study of HSes; Content and Popularity Analysis; Deanonymization of Tor Users and HSes; Blocking of Tor; HS Blocking; Conclusion; Works Cited; About CIGI; About Chatham House; CIGI Masthead. ABOUT THE AUTHORS: Gareth Owen is a senior lecturer in the School of Computing at the University of Portsmouth.
  • How Do Tor Users Interact with Onion Services?
    How Do Tor Users Interact With Onion Services? Philipp Winter, Anne Edmundson, Laura M. Roberts (Princeton University); Agnieszka Dutkowska-Żuk (Independent); Marshini Chetty, Nick Feamster (Princeton University). Abstract: Onion services are anonymous network services that are exposed over the Tor network. In contrast to conventional Internet services, onion services are private, generally not indexed by search engines, and use self-certifying domain names that are long and difficult for humans to read. In this paper, we study how people perceive, understand, and use onion services based on data from 17 semi-structured interviews and an online survey of 517 users. We find that users have an incomplete mental model of onion services, use these services for anonymity, and have varying trust in onion services in general. Users also have difficulty discovering and tracking onion sites and authenticating them. … messaging [4] and file sharing [15]. The Tor Project currently does not have data on the number of onion service users, but Facebook reported in 2016 that more than one million users logged into its onion service in one month [20]. Onion services differ from conventional web services in four ways. First, they can only be accessed over the Tor network. Second, onion domains are hashes over their public key, which make them difficult to remember. Third, the network path between the client and the onion service is typically longer, increasing latency and thus reducing the performance of the service. Finally, onion services are private by default, meaning that users must discover these sites organically, rather than with a search engine.
  • Appendix I: Search Quality and Economies of Scale
    Appendix I: search quality and economies of scale. Introduction. 1. This appendix presents detailed evidence concerning how general search engines compete on quality. It also examines network effects, scale economies and other barriers to search engines competing effectively on quality and other dimensions of competition. 2. This appendix draws on academic literature, submissions and internal documents from market participants. Comparing quality across search engines. 3. As set out in Chapter 3, the main way that search engines compete for consumers is through various dimensions of quality, including relevance of results, privacy and trust, and social purpose or rewards.1 Of these dimensions, the relevance of search results is generally considered to be the most important. 4. In this section, we present consumer research and other evidence regarding how search engines compare on the quality and relevance of search results. Evidence on quality from consumer research. 5. We reviewed a range of consumer research produced or commissioned by Google or Bing, to understand how consumers perceive the quality of these and other search engines. The following evidence concerns consumers' overall quality judgements and preferences for particular search engines. 6. Google submitted the results of its Information Satisfaction tests. Google's Information Satisfaction tests are quality comparisons that it carries out in the normal course of its business. In these tests, human raters score unbranded search results from mobile devices for a random set of queries from Google's search logs. Google submitted the results of 98 tests from the US between 2017 and 2020; Google outscored Bing in each test, and the gap stood at 8.1 percentage points in the most recent test and 7.3 percentage points on average over the period.