Cached and Confused: Web Cache Deception in the Wild

Total Pages: 16

File Type: PDF, Size: 1020 KB

Cached and Confused: Web Cache Deception in the Wild

Seyed Ali Mirheidari (University of Trento), Sajjad Arshad∗ (Northeastern University), Kaan Onarlioglu (Akamai Technologies), Bruno Crispo (University of Trento & KU Leuven), Engin Kirda (Northeastern University), William Robertson (Northeastern University)

∗ Currently employed by Google.

https://www.usenix.org/conference/usenixsecurity20/presentation/mirheidari

This paper is included in the Proceedings of the 29th USENIX Security Symposium, August 12–14, 2020. ISBN 978-1-939133-17-5. Open access to the Proceedings of the 29th USENIX Security Symposium is sponsored by USENIX.

Abstract

Web cache deception (WCD) is an attack proposed in 2017, where an attacker tricks a caching proxy into erroneously storing private information transmitted over the Internet and subsequently gains unauthorized access to that cached data. Due to the widespread use of web caches and, in particular, the use of massive networks of caching proxies deployed by content distribution network (CDN) providers as a critical component of the Internet, WCD puts a substantial population of Internet users at risk.

We present the first large-scale study that quantifies the prevalence of WCD in 340 high-profile sites among the Alexa Top 5K. Our analysis reveals WCD vulnerabilities that leak private user data as well as secret authentication and authorization tokens that can be leveraged by an attacker to mount damaging web application attacks. Furthermore, we explore WCD in a scientific framework as an instance of the path confusion class of attacks, and demonstrate that variations on the path confusion technique used make it possible to exploit sites that are otherwise not impacted by the original attack. Our findings show that many popular sites remain vulnerable two years after the public disclosure of WCD.

Our empirical experiments with popular CDN providers underline the fact that web caches are not plug & play technologies. In order to mitigate WCD, site operators must adopt a holistic view of their web infrastructure and carefully configure cache settings appropriate for their applications.

1 Introduction

Web caches have become an essential component of the Internet infrastructure with numerous use cases such as reducing bandwidth costs in private enterprise networks and accelerating content delivery over the World Wide Web. Today caching is implemented at multiple stages of Internet communications, for instance in popular web browsers [45, 58], at caching proxies [55, 64], and directly at origin web servers [6, 46].

In particular, Content Delivery Network (CDN) providers heavily rely on effective web content caching at their edge servers, which together comprise a massively-distributed Internet overlay network of caching reverse proxies. Popular CDN providers advertise accelerated content delivery and high availability via global coverage and deployments reaching hundreds of thousands of servers [5, 15]. A recent scientific measurement also estimates that more than 74% of the Alexa Top 1K are served by CDN providers, indicating that CDNs and more generally web caching play a central role in the Internet [26].

While there exist technologies that enable limited caching of dynamically-generated pages, web caching primarily targets static, publicly accessible content. In other words, web caches store static content that is costly to deliver due to an object's size or distance. Importantly, these objects must not contain private or otherwise sensitive information, as application-level access control is not enforced at cache servers. Good candidates for caching include frequently accessed images, software and document downloads, streaming media, style sheets, and large static HTML and JavaScript files.

In 2017, Gil presented a novel attack called web cache deception (WCD) that can trick a web cache into incorrectly storing sensitive content, and consequently give an attacker unauthorized access to that content [23, 24]. Gil demonstrated the issue with a real-life attack scenario targeting a high-profile site, PayPal, and showed that WCD can successfully leak details of a private payment account. Consequently, WCD garnered significant media attention, and prompted responses from major web cache and CDN providers [8, 9, 12, 13, 43, 48].

At its core, WCD results from path confusion between an origin server and a web cache. In other words, different interpretations of a requested URL at these two points lead to a disagreement on the cacheability of a given object. This disagreement can then be exploited to trick the web cache into storing non-cacheable objects. WCD does not imply that these individual components—the origin server and web cache—are incorrectly configured per se. Instead, their hazardous interactions as a system lead to the vulnerability. As a result, detecting and correcting vulnerable systems is a cumbersome task, and may require careful inspection of the entire caching architecture. Combined with the aforementioned pervasiveness and critical role of web caches in the Internet infrastructure, WCD has become a severely damaging issue.

In this paper, we first present a large-scale measurement and analysis of WCD over 295 sites in the Alexa Top 5K. We present a repeatable and automated methodology to discover vulnerable sites over the Internet, and a detailed analysis of our findings to characterize the extent of the problem. Our results show that many high-profile sites that handle sensitive and private data are impacted by WCD and are vulnerable to practical attacks. We then discuss additional path confusion methods that can maximize the damage potential of WCD, and demonstrate their impact in a follow-up experiment over an extended data set of 340 sites.

To the best of our knowledge, this is the first in-depth investigation of WCD in a scientific framework and at this scale. In addition, the scope of our investigation goes beyond private data leakage to provide novel insights into the severity of WCD. We demonstrate how WCD can be exploited to steal other types of sensitive data including security tokens, explain advanced attack techniques that elevate WCD vulnerabilities to injection vectors, and quantify our findings through further analysis of collected data.

Finally, we perform an empirical analysis of popular CDN providers, documenting their default caching settings and customization mechanisms. Our findings underline the fact that WCD is a system safety problem. Site operators must adopt a holistic view of their infrastructure, and carefully configure web caches taking into consideration their complex interactions with origin servers.

To summarize, we make the following contributions:

• We propose a novel methodology to detect sites impacted by WCD at scale. Unlike existing WCD scan tools that are designed for site administrators to test their own properties in a controlled environment, our methodology is designed to automatically detect WCD in the wild.

• We present findings that quantify the prevalence of WCD in 295 sites among the Alexa Top 5K, and provide a detailed breakdown of leaked information types. Our analysis …

Ethical Considerations. We have designed our measurement methodology to minimize the impact on scanned sites, and limit the inconvenience we impose on site operators. Similarly, we have followed responsible disclosure principles to notify the impacted parties, and limited the information we share in this paper to minimize the risk of any inadvertent damage to them or their end-users. We discuss details of the ethical considerations pertaining to this work in Section 3.5.

2 Background & Related Work

In this section, we present an overview of how web cache deception (WCD) attacks work and discuss related concepts and technologies such as web caches, path confusion, and existing WCD scanners. As of this writing, the academic literature has not yet directly covered WCD. Nevertheless, in this section we summarize previous publications pertaining to other security issues around web caches and CDNs.

2.1 Web Caches

Repeatedly transferring heavily used and large web objects over the Internet is a costly process for both web servers and their end-users. Multiple round-trips between a client and server over long distances, especially in the face of common technical issues with the Internet infrastructure and routing problems, can lead to increased network latency and result in web applications being perceived as unresponsive. Likewise, routinely accessed resources put a heavy load on web servers, wasting valuable computational cycles and network bandwidth. The Internet community has long been aware of these problems, and deeply explored caching strategies and technologies as an effective solution.

Today web caches are ubiquitous, and are used at various—and often multiple—steps of Internet communications. For instance, client applications such as web browsers implement their own private cache for a single user. Otherwise, web caches deployed together with a web server, or as a man-in-the-middle proxy on the communication path implement a shared cache designed to store and serve objects frequently accessed by multiple users. In all cases, a cache hit eliminates …
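To make the path confusion underlying WCD concrete, consider the canonical example: a victim is lured into visiting https://example.com/account.php/nonexistent.css. A vulnerable origin server ignores the bogus suffix and serves the private account page, while the cache keys on the .css extension and stores the response for anyone to retrieve. The Python sketch below illustrates such a probe; it is not the paper's measurement tooling, and the domain, session cookie, and marker string are hypothetical.

    import requests

    # Hypothetical target and a string that appears only in the victim's
    # authenticated page (both are illustrative assumptions).
    PRIVATE_PAGE = "https://example.com/account.php"
    MARKER = "victim@example.com"
    probe = PRIVATE_PAGE + "/nonexistent.css"  # path-confused URL with a "static" suffix

    # Step 1: the victim, carrying a valid session, requests the confused URL.
    # A vulnerable origin ignores the suffix and returns the private page.
    victim = requests.get(probe, cookies={"session": "VICTIM_SESSION"})

    # Step 2: the attacker requests the exact same URL with no credentials.
    # If an intermediate cache stored the victim's response, it is replayed here.
    attacker = requests.get(probe)

    if MARKER in attacker.text:
        print("WCD: private content was cached and served to an unauthenticated client")
    print("cache status:", attacker.headers.get("X-Cache", "n/a"))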
Recommended publications
  • NVMe-Based Caching in Video-Delivery CDNs
    NVMe-Based Caching in Video-Delivery CDNs. Thomas Colonna (1901942), Double Degree INSA Rennes. Supervisor: Sébastien Lafond. Faculty of Science and Engineering, Åbo Akademi University, 2020. In HTTP-based video-delivery CDNs (content delivery networks), a critical component is the caching servers that serve clients with content obtained from an origin server. These caches store the content they obtain in RAM or on disk so they can serve additional clients without fetching it from the origin. For most use cases, access to the disk remains the limiting factor, thus requiring a significant amount of RAM to avoid these accesses and achieve good performance, but increasing the cost. In this master's thesis, we benchmark various approaches to providing storage, such as regular disks and NVMe-based SSDs. Based on these insights, we design a caching module for a web server relying on kernel bypass, implemented using the reference framework SPDK. The outcome of the master's thesis is a caching module leveraging specific properties of NVMe disks, and benchmark results for the various types of disks with the two approaches to caching (i.e., regular filesystem-based or NVMe-specific).
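    A filesystem-based version of the two-tier lookup the thesis describes (RAM first, then disk, then origin) can be sketched briefly; the thesis's actual module bypasses the kernel via SPDK, and the class and function names here are illustrative:

        import os

        class TwoTierCache:
            """Illustrative RAM-over-disk CDN cache; not the thesis's SPDK module."""
            def __init__(self, cache_dir, ram_capacity=1000):
                self.ram = {}                  # hot objects served from memory
                self.ram_capacity = ram_capacity
                self.cache_dir = cache_dir
                os.makedirs(cache_dir, exist_ok=True)

            def _disk_path(self, key):
                return os.path.join(self.cache_dir, key.replace("/", "_"))

            def get(self, key, fetch_from_origin):
                if key in self.ram:                      # RAM hit: fastest path
                    return self.ram[key]
                path = self._disk_path(key)
                if os.path.exists(path):                 # disk hit: avoids origin fetch
                    with open(path, "rb") as f:
                        body = f.read()
                else:                                    # miss: fetch and persist
                    body = fetch_from_origin(key)
                    with open(path, "wb") as f:
                        f.write(body)
                if len(self.ram) < self.ram_capacity:    # promote to RAM while there is room
                    self.ram[key] = body
                return body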
  • The Study of Open Source CMSs by CHETAN GOPILAL JAIN
    The Study of Open Source CMSs. By CHETAN GOPILAL JAIN. A thesis submitted to the Graduate School–New Brunswick, Rutgers, The State University of New Jersey, in partial fulfillment of the requirements for the degree of Master of Science, Graduate Program in Electrical and Computer Engineering, written under the direction of Prof. Deborah Silver. New Brunswick, New Jersey, May 2010. © 2010 CHETAN GOPILAL JAIN. ALL RIGHTS RESERVED. ABSTRACT OF THE THESIS: In this thesis, we evaluate different open source content management systems (CMSs) and determine their appropriateness for managing the website content of scientific research laboratories. We describe different CMSs and evaluate them based on the following criteria: ease of installation, usability, maintenance and updates, scalability, community strength and support, user roles and workflow, security, and Web 2.0 features. We then choose one of these systems, Drupal, and demonstrate its effectiveness for two different scientific websites, Bio-1 and Vizlab. Drupal allows integrating new features using community-contributed modules and eases future upgrades. The successful implementation of both projects using Drupal highlights the importance of open source CMSs. Acknowledgement: I would like to thank my advisor, Prof. Deborah Silver, for her support and encouragement while writing this thesis. Also, I would like to thank my parents and family, who provided me with a strong educational foundation and supported me in all my academic pursuits. I also acknowledge the help of VIZLAB at Rutgers.
  • TinkerTool System 7 Reference Manual
    TinkerTool System 7 Reference Manual. MBS Documentation 0642-1075/2. Version 7.5, August 24, 2021. US-English edition. © Copyright 2003–2021 by Marcel Bresink Software-Systeme, Ringstr. 21, 56630 Kretz, Germany. All rights reserved. No part of this publication may be redistributed, translated in other languages, or transmitted, in any form or by any means, electronic, mechanical, recording, or otherwise, without the prior written permission of the publisher. This publication may contain examples of data used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental. This publication could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. The publisher may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice. Make sure that you are using the correct edition of the publication for the level of the product. The version number can be found at the top of this page. Apple, macOS, iCloud, and FireWire are registered trademarks of Apple Inc. Intel is a registered trademark of Intel Corporation. UNIX is a registered trademark of The Open Group. Broadcom is a registered trademark of Broadcom, Inc. Amazon Web Services is a registered trademark of Amazon.com, Inc.
  • ICP and the Squid Web Cache
    ICP and the Squid Web Cache. Duane Wessels, k claffy. August 13, 1997. Abstract: We describe the structure and functionality of the Internet Cache Protocol (ICP) and its implementation in the Squid Web Caching software. ICP is a lightweight message format used for communication among Web caches. Caches exchange ICP queries and replies to gather information to use in selecting the most appropriate location from which to retrieve an object. We present background on the history of ICP, and discuss issues in ICP deployment, efficiency, security, and interaction with other aspects of Web traffic behavior. We catalog successes, failures, and lessons learned from using ICP to deploy a global Web cache hierarchy. 1 Introduction. Ever since the World-Wide Web rose to popularity around 1994, much effort has focused on reducing latency experienced by users. Surfing the Web can be slow for many reasons. Server systems become slow when overloaded, especially when hot spots suddenly appear. Congestion can also occur at network exchange points or across links, and is especially prevalent across trans-oceanic links that often cost millions of dollars per month. A common, albeit expensive, way to alleviate such problems is to upgrade the overloaded resource: get a faster server, another E1, a bigger switch. However, this approach is not only often economically infeasible, but perhaps more importantly, it also fails to consider the numerous parties involved in even a single, simple Web transaction.
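    ICP's query/reply exchange is simple enough to sketch. The snippet below builds an ICP_OP_QUERY datagram following the 20-byte header layout of RFC 2186, the protocol this article describes, and sends it to a neighbor cache on ICP's default UDP port 3130. This is a rough sketch rather than Squid's implementation, and the neighbor address is a placeholder.

        import socket
        import struct

        ICP_OP_QUERY = 1
        ICP_VERSION = 2

        def build_icp_query(url, request_number=1):
            """Build an ICP_OP_QUERY datagram (sketch of the RFC 2186 layout)."""
            # Query payload: 4-byte requester host address, then a null-terminated URL.
            payload = struct.pack("!I", 0) + url.encode() + b"\x00"
            length = 20 + len(payload)  # 20-byte fixed header precedes the payload
            # Header: opcode, version, message length, request number,
            # options, option data, sender host address.
            header = struct.pack("!BBHIIII",
                                 ICP_OP_QUERY, ICP_VERSION, length,
                                 request_number, 0, 0, 0)
            return header + payload

        # Ask a neighbor cache whether it holds a URL (placeholder address).
        NEIGHBOR = ("127.0.0.1", 3130)
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.sendto(build_icp_query("http://example.com/logo.png"), NEIGHBOR)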
  • Squirrel: A Decentralized Peer-to-Peer Web Cache
    To appear in the 21st ACM Symposium on Principles of Distributed Computing (PODC 2002). Squirrel: A decentralized peer-to-peer web cache. Sitaram Iyer (Rice University), Antony Rowstron (Microsoft Research), Peter Druschel (Rice University). ABSTRACT: This paper presents a decentralized, peer-to-peer web cache called Squirrel. The key idea is to enable web browsers on desktop machines to share their local caches, to form an efficient and scalable web cache, without the need for dedicated hardware and the associated administrative cost. We propose and evaluate decentralized web caching algorithms for Squirrel, and discover that it exhibits performance comparable to a centralized web cache in terms of hit ratio, bandwidth usage and latency. It also achieves the benefits of decentralization, such as being scalable, self-organizing and resilient to node failures. There is substantial literature in the areas of cooperative web caching [3, 6, 9, 20, 23, 24] and web cache workload characterization [4]. This paper demonstrates how it is possible, desirable and efficient to adopt a peer-to-peer approach to web caching in a corporate LAN type environment, located in a single geographical region. Using trace-based simulation, it shows how most of the functionality and performance of a traditional web cache can be achieved in a completely self-organizing system that needs no extra hardware or administration, and is fault-resilient. The following paragraphs elaborate on these ideas.
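    The core of Squirrel's design is a deterministic mapping from each URL to a "home" peer that brokers caching for that object. A minimal sketch of that mapping, assuming a static peer list (function and host names are illustrative; real Squirrel routes over the Pastry overlay and tolerates peers joining and leaving):

        import hashlib

        def home_node(url, peers):
            """Map a URL to the desktop node responsible for caching it."""
            digest = hashlib.sha1(url.encode()).hexdigest()
            # Modulo selection is enough for a sketch; Pastry's prefix routing
            # keeps the mapping stable under churn in the real system.
            return peers[int(digest, 16) % len(peers)]

        peers = ["desktop-01", "desktop-02", "desktop-03"]
        print(home_node("http://example.com/index.html", peers))  # prints this URL's home peer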
  • Performance Guide 3.7.0.123608
    Vizrt Community Expansion Performance Guide 3.7.0.123608. Copyright © 2010-2012 Vizrt. All rights reserved. No part of this software, documentation or publication may be reproduced, transcribed, stored in a retrieval system, translated into any language, computer language, or transmitted in any form or by any means, electronically, mechanically, magnetically, optically, chemically, photocopied, manually, or otherwise, without prior written permission from Vizrt. Vizrt specifically retains title to all Vizrt software. This software is supplied under a license agreement and may only be installed, used or copied in accordance with that agreement. Disclaimer: Vizrt provides this publication "as is" without warranty of any kind, either expressed or implied. This publication may contain technical inaccuracies or typographical errors. While every precaution has been taken in the preparation of this document to ensure that it contains accurate and up-to-date information, the publisher and author assume no responsibility for errors or omissions. Nor is any liability assumed for damages resulting from the use of the information contained in this document. Vizrt's policy is one of continual development, so the content of this document is periodically subject to modification without notice. These changes will be incorporated in new editions of the publication. Vizrt may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time. Vizrt may have patents or pending patent applications covering subject matter in this document. The furnishing of this document does not give you any license to these patents. Technical Support: For technical support and the latest news of upgrades, documentation, and related products, visit the Vizrt web site at www.vizrt.com.
  • AppXimity: A Context-Aware Mobile Application Management Framework
    AppXimity: A Context-Aware Mobile Application Management Framework, by Ernest E. Alexander Jr. Aaron, B.Sc., Universiti Tenaga Nasional, 2011. A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of MASTER OF SCIENCE in the Department of Computer Science. © Ernest E. Alexander Jr. Aaron, 2017, University of Victoria. All rights reserved. This thesis may not be reproduced in whole or in part, by photocopying or other means, without the permission of the author. Supervisory Committee: Dr. Hausi A. Müller, Supervisor (Department of Computer Science); Dr. Issa Traoré, Outside Member (Department of Electrical and Computer Engineering). ABSTRACT: The Internet of Things is an emerging technology where everyday devices with sensing and actuating capabilities are connected to the Internet and seamlessly communicate with other devices over the network. The proliferation of mobile devices enables access to unprecedented levels of rich information sources. Mobile app creators can leverage this information to create personalized mobile applications. The number of mobile apps available for download will increase over time, and thus, accessing and managing apps can become cumbersome. This thesis presents AppXimity, a mobile-app-management framework that provides personalized app suggestions and recommendations by leveraging user preferences and contextual information to surface relevant apps in a given context. Suggested apps represent a subset of the installed apps that match nearby businesses or have been identified by AppXimity as apps of interest to the user, and recommended apps are those apps that are not installed on the user's device, but may be of interest to the user in that location.
  • Exploring Search Engine Optimization (SEO) Techniques for Dynamic Websites
    Master Thesis, Computer Science, Thesis no: MCS-2011-10, March 2011. Exploring Search Engine Optimization (SEO) Techniques for Dynamic Websites. Wasfa Kanwal. School of Computing, Blekinge Institute of Technology, SE-371 39 Karlskrona, Sweden. This thesis is submitted to the School of Computing at Blekinge Institute of Technology in partial fulfillment of the requirements for the degree of Master of Science in Computer Science. The thesis is equivalent to 20 weeks of full-time studies. Contact Information: Author: Wasfa Kanwal, E-mail: [email protected]. University advisor: Martin Boldt, PhD, School of Computing. ABSTRACT. Context: With the growing number of online businesses, Search Engine Optimization (SEO) has become vital to capitalizing on a business, because SEO is a key factor in marketing an online business. SEO is the process of optimizing a website so that it ranks well on Search Engine Result Pages (SERPs). Dynamic websites are commonly used for e-commerce because they are easier to update and expand; however, they are subject to indexing-related problems. Objectives: This research aims to examine and address the indexing-related issues of dynamic websites. To achieve the aims and objectives of this research, I explore indexing considerations for dynamic websites, investigate SEO tools for carrying out SEO campaigns in three major search engines (Google, Yahoo, and Bing), experiment with SEO techniques, and determine to what extent dynamic websites can be made search-engine friendly on these major search engines.
  • Ubuntu Server Guide
    Ubuntu Server Guide. Welcome to the Ubuntu Server Guide! This site includes information on using Ubuntu Server for the latest LTS release, Ubuntu 20.04 LTS (Focal Fossa). For an offline version as well as versions for previous releases, see below. Improving the Documentation: If you find any errors or have suggestions for improvements to pages, please use the link at the bottom of each topic titled: "Help improve this document in the forum." This link will take you to the Server Discourse forum for the specific page you are viewing. There you can share your comments or let us know about bugs with any page. PDFs and Previous Releases: Below are links to the previous Ubuntu Server release server guides as well as an offline copy of the current version of this site: Ubuntu 20.04 LTS (Focal Fossa): PDF; Ubuntu 18.04 LTS (Bionic Beaver): Web and PDF; Ubuntu 16.04 LTS (Xenial Xerus): Web and PDF. Support: There are a couple of different ways that the Ubuntu Server edition is supported: commercial support and community support. The main commercial support (and development funding) is available from Canonical, Ltd. They supply reasonably-priced support contracts on a per-desktop or per-server basis. For more information see the Ubuntu Advantage page. Community support is also provided by dedicated individuals and companies that wish to make Ubuntu the best distribution possible. Support is provided through multiple mailing lists, IRC channels, forums, blogs, wikis, etc. The large amount of information available can be overwhelming, but a good search engine query can usually provide an answer to your questions.
  • Chapter 4 Working with Content
    CHAPTER 4. Working with Content. WordPress comes with several basic content types: posts, pages, links, and media files. In addition, you can create your own content types, which I'll talk more about in Chapter 12. Posts and pages make up the heart of your site. You'll probably add images, audio, video, or other documents like Office files to augment your posts and pages, and WordPress makes it easy to upload and link to these files. WordPress also includes a robust link manager, which you can use to maintain a blogroll or other link directory. WordPress automatically generates a number of different feeds to syndicate your content. I'll talk about the four feed formats, the common feeds, and the hidden ones that even experienced WordPress users might not know about. Since WordPress is known for its exceptional blogging capabilities, I'll talk about posts first, and then discuss how pages differ from posts—and how you can modify them to be more alike. Posts: Collectively, posts make up the blog (or news) section of your site. Posts are generally listed according to date, but can also be tagged or filed into categories. At its most basic, a post consists of a title and some content. In addition, WordPress will add some required metadata to every post: an ID number, an author, a publication date, a category, the publication status, and a visibility setting. There are a number of other things that may be added to posts, but the aforementioned are the essentials.
  • A Two-Level Intelligent Web Caching Scheme with a Hybrid Extreme Learning Machine and Least Frequently Used
    A Two-Level Intelligent Web Caching Scheme with a Hybrid Extreme Learning Machine and Least Frequently Used. Phet Imtongkhum (1), Chakchai So-In (1), Surasak Sanguanpong (2), Songyut Phoemphon (1). (1) Department of Computer Science, Faculty of Science, Khon Kaen University, Thailand. (2) Department of Computer Engineering, Faculty of Engineering, Kasetsart University, Thailand. [email protected], [email protected], [email protected], [email protected]. Abstract: The immense increase in data traffic has created several issues for the Internet community, including long delays and low throughput. Most Internet user activity occurs via web access, thus making it a major source of Internet traffic. Due to a lack of effective management schemes, Internet usage is inefficient. Advances in caching mechanisms have led to the introduction of web proxies that have improved real-time communication and cost savings. Although several traditional caching policies have been implemented to increase speed and simplicity, cache replacement accuracy remains a key limitation due … [4-5]. Consequently, many studies and applications have been developed to not only evaluate policy-based approaches of multi-tier ISPs but also efficiently use the existing Internet infrastructure, particularly from the user perspective, including factors such as throughput and delay, which contribute to the quality of accessible web objects. Web proxies (web caching) have been developed to address this issue. The key concept that underlies proxy/caching is similar to that of the traditional caching in the memory hierarchy of a CPU, including the CPU cache, RAM, and hard disk [6].
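    The Least Frequently Used half of the proposed scheme admits a compact sketch; the extreme-learning-machine component that predicts object worth is omitted here, so this shows only the baseline replacement policy:

        class LFUCache:
            """Evict the least frequently accessed object when full (sketch only)."""
            def __init__(self, capacity):
                self.capacity = capacity
                self.store = {}   # key -> cached object
                self.freq = {}    # key -> access count

            def get(self, key):
                if key in self.store:
                    self.freq[key] += 1          # every hit raises the object's frequency
                    return self.store[key]
                return None

            def put(self, key, value):
                if self.capacity <= 0:
                    return
                if key not in self.store and len(self.store) >= self.capacity:
                    victim = min(self.freq, key=self.freq.get)  # least frequently used
                    del self.store[victim]
                    del self.freq[victim]
                self.store[key] = value
                self.freq[key] = self.freq.get(key, 0) + 1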
  • Web Tracking: Mechanisms, Implications, and Defenses Tomasz Bujlow, Member, IEEE, Valentín Carela-Español, Josep Solé-Pareta, and Pere Barlet-Ros
    Web Tracking: Mechanisms, Implications, and Defenses. Tomasz Bujlow, Member, IEEE, Valentín Carela-Español, Josep Solé-Pareta, and Pere Barlet-Ros. Abstract—This article surveys the existing literature on the methods currently used by web services to track users online, as well as their purposes, implications, and possible user defenses. A significant majority of the reviewed articles and web resources are from the years 2012–2014. Privacy seems to be the Achilles' heel of today's web. Web services make continuous efforts to obtain as much information as they can about the things we search, the sites we visit, the people we contact, and the products we buy. Tracking is usually performed for commercial purposes. We present five main groups of methods used for user tracking, which are based on sessions, client storage, client cache, fingerprinting, or yet other approaches. A special focus is placed on mechanisms that use web caches, operational caches, and fingerprinting, as they are usually very rich in terms of using various creative methodologies. … of ads [1], [2], price discrimination [3], [4], assessing our health and mental condition [5], [6], or assessing financial credibility [7]–[9]. Apart from that, the data can be accessed by government agencies and identity thieves. Some affiliate programs (e.g., pay-per-sale [10]) require tracking to follow the user from the website where the advertisement is placed to the website where the actual purchase is made [11]. Personal information on the web can be voluntarily given by the user (e.g., by filling in web forms) or it can be collected indirectly without their knowledge through the analysis of IP headers, HTTP requests, queries in search engines, or even by using JavaScript and Flash programs embedded in web …
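    Of the cache-based tracking mechanisms the survey covers, ETag tracking is the easiest to illustrate: the server issues each new client a unique ETag, and the browser's conditional revalidation echoes it back on every later visit, re-identifying the client without cookies. A framework-free sketch (all names are hypothetical):

        import uuid

        def handle_request(headers, known_ids):
            """Return (status, response_headers); tracks clients via the ETag validator."""
            etag = headers.get("If-None-Match")
            if etag and etag in known_ids:
                # Returning client: the cached ETag re-identifies it without cookies.
                return 304, {"ETag": etag}
            new_id = uuid.uuid4().hex      # unique identifier for a first-time client
            known_ids.add(new_id)
            return 200, {"ETag": new_id, "Cache-Control": "private, max-age=31536000"}

        seen = set()
        print(handle_request({}, seen))                           # first visit: new ID issued
        first_id = next(iter(seen))
        print(handle_request({"If-None-Match": first_id}, seen))  # revisit: re-identified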