Cached and Confused: Web Cache Deception in the Wild

Seyed Ali Mirheidari, University of Trento; Sajjad Arshad, Northeastern University; Kaan Onarlioglu, Akamai Technologies; Bruno Crispo, University of Trento, KU Leuven; Engin Kirda and William Robertson, Northeastern University

https://www.usenix.org/conference/usenixsecurity20/presentation/mirheidari

This paper is included in the Proceedings of the 29th USENIX Security Symposium (August 12–14, 2020). ISBN 978-1-939133-17-5

Open access to the Proceedings of the 29th USENIX Security Symposium is sponsored by USENIX.

Cached and Confused: Web Cache Deception in the Wild

Seyed Ali Mirheidari, University of Trento
Sajjad Arshad∗, Northeastern University
Kaan Onarlioglu, Akamai Technologies
Bruno Crispo, University of Trento & KU Leuven
Engin Kirda, Northeastern University
William Robertson, Northeastern University

∗Currently employed by Google.

Abstract

Web cache deception (WCD) is an attack proposed in 2017, where an attacker tricks a caching proxy into erroneously storing private information transmitted over the Internet and subsequently gains unauthorized access to that cached data. Due to the widespread use of web caches and, in particular, the use of massive networks of caching proxies deployed by content distribution network (CDN) providers as a critical component of the Internet, WCD puts a substantial population of Internet users at risk.

We present the first large-scale study that quantifies the prevalence of WCD in 340 high-profile sites among the Alexa Top 5K. Our analysis reveals WCD vulnerabilities that leak private user data as well as secret authentication and authorization tokens that can be leveraged by an attacker to mount damaging web application attacks. Furthermore, we explore WCD in a scientific framework as an instance of the path confusion class of attacks, and demonstrate that variations on the path confusion technique used make it possible to exploit sites that are otherwise not impacted by the original attack. Our findings show that many popular sites remain vulnerable two years after the public disclosure of WCD.

Our empirical experiments with popular CDN providers underline the fact that web caches are not plug & play technologies. In order to mitigate WCD, site operators must adopt a holistic view of their web infrastructure and carefully configure cache settings appropriate for their applications.

1 Introduction

Web caches have become an essential component of the Internet infrastructure with numerous use cases such as reducing bandwidth costs in private enterprise networks and accelerating content delivery over the web. Today caching is implemented at multiple stages of Internet communications, for instance in popular web browsers [45, 58], at caching proxies [55, 64], and directly at origin web servers [6, 46].

In particular, Content Delivery Network (CDN) providers heavily rely on effective web content caching at their edge servers, which together comprise a massively-distributed Internet overlay network of caching reverse proxies. Popular CDN providers advertise accelerated content delivery and high availability via global coverage and deployments reaching hundreds of thousands of servers [5, 15]. A recent scientific measurement also estimates that more than 74% of the Alexa Top 1K are served by CDN providers, indicating that CDNs and more generally web caching play a central role in the Internet [26].

While there exist technologies that enable limited caching of dynamically-generated pages, web caching primarily targets static, publicly accessible content. In other words, web caches store static content that is costly to deliver due to an object's size or distance. Importantly, these objects must not contain private or otherwise sensitive information, as application-level access control is not enforced at cache servers. Good candidates for caching include frequently accessed images, software and document downloads, streaming media, style sheets, and large static HTML and JavaScript files.

In 2017, Gil presented a novel attack called web cache deception (WCD) that can trick a web cache into incorrectly storing sensitive content, and consequently give an attacker unauthorized access to that content [23, 24]. Gil demonstrated the issue with a real-life attack scenario targeting a high-profile site, PayPal, and showed that WCD can successfully leak details of a private payment account. Consequently, WCD garnered significant media attention, and prompted responses from major web cache and CDN providers [8, 9, 12, 13, 43, 48].

At its core, WCD results from path confusion between an origin server and a web cache. In other words, different interpretations of a requested URL at these two points lead to a disagreement on the cacheability of a given object. This disagreement can then be exploited to trick the web cache into storing non-cacheable objects. WCD does not imply that these individual components—the origin server and web cache—are incorrectly configured per se. Instead, their hazardous interactions as a system lead to the vulnerability. As a result, detecting and correcting vulnerable systems is a cumbersome task, and may require careful inspection of the entire caching architecture. Combined with the aforementioned pervasiveness and critical role of web caches in the Internet infrastructure, WCD has become a severely damaging issue.

In this paper, we first present a large-scale measurement and analysis of WCD over 295 sites in the Alexa Top 5K. We present a repeatable and automated methodology to discover vulnerable sites over the Internet, and a detailed analysis of our findings to characterize the extent of the problem. Our results show that many high-profile sites that handle sensitive and private data are impacted by WCD and are vulnerable to practical attacks. We then discuss additional path confusion methods that can maximize the damage potential of WCD, and demonstrate their impact in a follow-up experiment over an extended data set of 340 sites.

To the best of our knowledge, this is the first in-depth investigation of WCD in a scientific framework and at this scale. In addition, the scope of our investigation goes beyond private data leakage to provide novel insights into the severity of WCD. We demonstrate how WCD can be exploited to steal other types of sensitive data including security tokens, explain advanced attack techniques that elevate WCD vulnerabilities to injection vectors, and quantify our findings through further analysis of collected data.

Finally, we perform an empirical analysis of popular CDN providers, documenting their default caching settings and customization mechanisms. Our findings underline the fact that WCD is a system safety problem. Site operators must adopt a holistic view of their infrastructure, and carefully configure web caches taking into consideration their complex interactions with origin servers.

To summarize, we make the following contributions:

• We propose a novel methodology to detect sites impacted by WCD at scale. Unlike existing WCD scan tools that are designed for site administrators to test their own properties in a controlled environment, our methodology is designed to automatically detect WCD in the wild.

• We present findings that quantify the prevalence of WCD in 295 sites among the Alexa Top 5K, and provide a detailed breakdown of leaked information types. Our analysis also covers security tokens that can be stolen via WCD as well as novel security implications of the attack, all areas left unexplored by existing WCD literature.

• We conduct a follow-up measurement over 340 sites among the Alexa Top 5K showing that variations on the path confusion technique make it possible to successfully exploit sites that are not impacted by the original attack.

• We analyze the default settings of popular CDN providers and document their distinct caching behavior, highlighting that mitigating WCD necessitates a comprehensive examination of a site's infrastructure.

Ethical Considerations. We have designed our measurement methodology to minimize the impact on scanned sites, and limit the inconvenience we impose on site operators. Similarly, we have followed responsible disclosure principles to notify the impacted parties, and limited the information we share in this paper to minimize the risk of any inadvertent damage to them or their end-users. We discuss details of the ethical considerations pertaining to this work in Section 3.5.

2 Background & Related Work

In this section, we present an overview of how web cache deception (WCD) attacks work and discuss related concepts and technologies such as web caches, path confusion, and existing WCD scanners. As of this writing, the academic literature has not yet directly covered WCD. Nevertheless, in this section we summarize previous publications pertaining to other security issues around web caches and CDNs.

2.1 Web Caches

Repeatedly transferring heavily used and large web objects over the Internet is a costly process for both web servers and their end-users. Multiple round-trips between a client and server over long distances, especially in the face of common technical issues with the Internet infrastructure and routing problems, can lead to increased network latency and result in web applications being perceived as unresponsive. Likewise, routinely accessed resources put a heavy load on web servers, wasting valuable computational cycles and network bandwidth. The Internet community has long been aware of these problems, and deeply explored caching strategies and technologies as an effective solution.

Today web caches are ubiquitous, and are used at various—and often multiple—steps of Internet communications. For instance, client applications such as web browsers implement their own private cache for a single user. Otherwise, web caches deployed together with a web server, or as a man-in-the-middle proxy on the communication path, implement a shared cache designed to store and serve objects frequently accessed by multiple users. In all cases, a cache hit eliminates the need to request the object from the origin server, improving performance for both the client and server.

In particular, web caches are a key component of Content Delivery Networks (CDNs) that provide web performance and availability services to their users. By deploying massively-distributed networks of shared caching proxies (also called edge servers) around the globe, CDNs aim to serve as many requests as possible from their caches deployed closest to clients, offloading the origin servers in the process. As a result of multiple popular CDN providers that cover different market segments ranging from simple personal sites to large enterprises, web caches have become a central component of
the Internet infrastructure. A recent study by Guo et al. estimates that 74% of the Alexa Top 1K make use of CDNs [26].

The most common targets for caching are static but frequently accessed resources. These include static HTML pages, scripts and style sheets, images and other media files, and large document and software downloads. Due to the shared nature of most web caches, objects containing dynamic, personalized, private, or otherwise sensitive content are not suitable for caching. We point out that there exist technologies such as Edge Side Includes [63] that allow caching proxies to assemble responses from a cached static part and a freshly-retrieved dynamic part, and the research community has also explored caching strategies for dynamic content. That being said, caching of non-static objects is not common, and is not relevant to WCD attacks. Therefore, it will not be discussed further in this paper.

The HTTP/1.1 specification defines Cache-Control headers that can be included in a server's response to signal to all web caches on the communication path how to process the transferred objects [21]. For example, the header "Cache-Control: no-store" indicates that the response should not be stored. While the specification states that web caches MUST respect these headers, web cache technologies and CDN providers offer configuration options for their users to ignore and override header instructions. Indeed, a common and easy configuration approach is to create simple caching rules based on resource paths and file names, for instance, instructing the web cache to store all files with extensions such as jpg, ico, css, or js [14, 18].

2.2 Path Confusion

Traditionally, URLs referenced web resources by directly mapping these to a web server's filesystem structure, followed by a list of query parameters. For instance, example.com/home/index.html?lang=en would correspond to the file home/index.html at that web server's document root directory, and lang=en represents a parameter indicating the preferred language.

However, as web applications grew in size and complexity, web servers introduced sophisticated URL rewriting mechanisms to implement advanced application routing structures as well as to improve usability and accessibility. In other words, web servers parse, process, and interpret URLs in ways that are not clearly reflected in the externally-visible representation of the URL string. Consequently, the rest of the communication endpoints and man-in-the-middle entities may remain oblivious to this additional layer of abstraction between the resource filesystem path and its URL, and process the URL in an unexpected—and potentially unsafe—manner. This is called path confusion.

The widespread use of clean URLs (also known as RESTful URLs) helps illustrate this disconnect and the subsequent issues resulting from different interpretations of a URL. Clean URL schemes use structures that abstract away from a web server's internal organization of resources, and instead provide a more readable API-oriented representation. For example, a given web service may choose to structure the URL example.com/index.php?p1=v1&p2=v2 as example.com/index/v1/v2 in clean URL representation.

Now, consider the case where a user accesses the same web service using the URL example.com/index/img/pic.jpg. The user and all technologies in the communication path (e.g., the web browser, caches, proxies, web application firewalls) are likely to misinterpret this request, expect an image file in return, and treat the HTTP response accordingly (e.g., web caches may choose to store the response payload). However, in reality, the web service will internally map this URL to example.com/index.php?p1=img&p2=pic.jpg, and return the contents of index.php with an HTTP 200 status code. Note that even when img/pic.jpg is an arbitrary resource that does not exist on the web server, the HTTP 200 status code will falsely indicate that the request was successfully handled as intended.

Web application attacks that involve malicious payload injection, such as cross-site scripting, are well-understood and studied by both academics and the general security community. Unfortunately, the security implications of path confusion have started to garner attention only recently, and academic literature on the subject is sparse.

One notable class of attacks based on path confusion is Relative Path Overwrite (RPO), first presented by Gareth Heyes in 2014 [28]. RPO targets sites that utilize relative paths for security-sensitive resource inclusions such as style sheets and scripts. The attack is made possible by maliciously-crafted URLs that are still interpreted in the same way their benign counterparts are by web servers, but when used as the base URL cause a web browser to expand relative paths incorrectly. This results in attacker-controlled same-origin inclusions. Other researchers have since proposed variations on more advanced applications of RPO, which can elevate this attack vector into numerous other vulnerabilities [17, 33, 36, 57]. Recently, Arshad et al. conducted a large-scale measurement study of RPO in the wild and reported that 9% of the Alexa Top 1M are vulnerable, and that more than one third of these are exploitable [7].

Other related work includes more general techniques for exploiting URL parser behavior. For instance, Orange Tsai presented a series of exploitation techniques that take advantage of the quirks of built-in URL parsers in popular programming languages and web frameworks [61, 62]. While Tsai's discussion mainly focuses on Server-Side Request Forgery, these techniques are essentially instances of path confusion and can be utilized in many attacks in the category.

Our focus in this paper is web cache deception, the most recently discovered major security issue that is enabled by an attacker exploiting a path confusion vulnerability. To the best of our knowledge, this paper is the first academic exploration of WCD in the literature, and also constitutes the first large-scale analysis of its spread and severity.

2.3 Web Cache Deception

WCD is a recently-discovered manifestation of path confusion that an attacker can exploit to break the confidentiality properties of a web application. This may result in unauthorized disclosure of private data belonging to end-users of the target application, or give the attacker access to sensitive security tokens (e.g., CSRF tokens) that could be used to facilitate further web application attacks by compromising authentication and authorization mechanisms. Gil proposed WCD in 2017, and demonstrated its impact with a practical attack against a major online payment provider, PayPal [23, 24].

In order to exploit a WCD vulnerability, the attacker crafts a URL that satisfies two properties:

1. The URL must be interpreted by the web server as a request for a non-cacheable page with private information, and it should trigger a successful response.

2. The same URL must be interpreted by an intermediate web cache as a request for a static object matching the caching rules in effect.

Next, the attacker uses social engineering channels to lure a victim into visiting this URL, which would result in the incorrect caching of the victim's private information. The attacker would then repeat the request and gain access to the cached contents. Figure 1 illustrates these interactions.

In Step 1, the attacker tricks the victim into visiting a URL that requests /account.php/nonexistent.jpg. At a first glance this appears to reference an image file, but in fact does not point to a valid resource on the server.

In Step 2, the request reaches the web server and is processed. The server in this example applies rewrite rules to discard the non-existent part of the requested object, a common default behavior for popular web servers and application frameworks. As a result, the server sends back a success response, but actually includes the contents of account.php in the body, which contains private details of the victim's account. Unaware of the URL mapping that happened at the server, the web cache stores the response, interpreting it as a static image.

Finally, in Step 3, the attacker visits the same URL, which results in a cache hit and grants him unauthorized access to the victim's cached account information.

Using references to non-existent cacheable file names that are interpreted as path parameters is an easy and effective path confusion technique to mount a WCD attack, and is the original attack vector proposed by Gil. However, we discuss novel and more advanced path confusion strategies in Section 5. Also note that the presence of a Cache-Control: no-store header value has no impact in our example, as it is common practice to enable caching rules on proxy services that simply ignore header instructions and implement aggressive rules based on path and file extension patterns (see Section 2.1).

WCD garnered significant media attention due to its security implications and high damage potential. Major web cache technology and CDN providers also responded, and some published configuration hardening guidelines for their customers [8, 9, 43]. More recently, Cloudflare announced options for new checks on HTTP response content types to mitigate the attack [12].

Researchers have also published tools to scan for and detect WCD, for instance, as an extension to the Burp Suite scanner or as stand-alone tools [31, 54]. We note that these tools are oriented towards penetration testing, and are designed to perform targeted scans on web properties directly under the control of the tester. That is, by design, they operate under certain pre-conditions, perform information disclosure tests via simple similarity and edit distance checks, and otherwise require manual supervision and interpretation of the results. This is orthogonal to the methodology and findings we present in this paper. Our experiment is, instead, designed to discover WCD vulnerabilities at scale in the wild, and does not rely on page similarity metrics that would result in an overwhelming number of false positives in an uncontrolled test environment.

2.4 Other Related Work

Caching mechanisms in many Internet technologies (e.g., ARP, DNS) have been targeted by cache poisoning attacks, which involve an attacker storing a malicious payload in a cache later to be served to victims. For example, James Kettle recently presented practical cache poisoning attacks against caching proxies [37, 38]. Likewise, Nguyen et al. demonstrated that negative caching (i.e., caching of 4xx or 5xx error responses) can be combined with cache poisoning to launch denial-of-service attacks [47]. Although the primary goal of a cache poisoning attack is malicious payload injection and not private data disclosure, these attacks nevertheless manipulate web caches using mechanisms similar to web cache deception. Hence, these two classes of attacks are closely related.

More generally, the complex ecosystem of CDNs and their critical position as massively-distributed networks of caching reverse proxies have been studied in various security contexts [26, 56]. For example, researchers have explored ways to use CDNs to bypass Internet censorship [22, 29, 67], exploit or weaponize CDN resources to mount denial-of-service attacks [11, 60], and exploit vectors to reveal origin server addresses behind proxies [34, 65]. On the defense front, researchers have proposed techniques to ensure the integrity of data delivered over untrusted CDNs and other proxy services [40, 42, 44]. This research is orthogonal to WCD, and is not directly relevant to our results.
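The clean-URL mapping from Section 2.2, which underlies the attack described above, can be sketched as a toy router. This is our own illustration, not code from the paper; the routing logic and names are assumptions chosen to mirror the example.com scenario:

```python
# Toy router mimicking the clean-URL scheme from Section 2.2:
# example.com/index/<v1>/<v2> is handled internally as
# index.php?p1=<v1>&p2=<v2>, so a request for /index/img/pic.jpg
# returns index.php's content with a 200 status even though no
# image file exists on the server.

def route(path):
    segments = [s for s in path.split("/") if s]
    if segments and segments[0] == "index":
        # Map positional path segments onto query parameters.
        params = dict(zip(["p1", "p2"], segments[1:3]))
        return 200, "index.php", params   # body: contents of index.php
    return 404, None, {}

status, handler, params = route("/index/img/pic.jpg")
print(status, handler, params)
```

A cache keying on the .jpg extension sees a successful "image" response here, while the server actually executed index.php; that disagreement is the path confusion WCD exploits.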

[Figure 1 omitted: a diagram in which the victim's browser sends "GET /account.php/nonexistent.jpg" through the web cache to the web server (Step 1), the server answers "200 OK" with "Cache-Control: no-store" and the cache stores it anyway (Step 2), and the attacker replays "GET /account.php/nonexistent.jpg" and receives the cached "200 OK" response (Step 3).]

Figure 1: An illustrated example of web cache deception. Path confusion between a web cache and a web server leads to unexpected caching of the victim's private account details. The attacker can then issue a request resulting in a cache hit, gaining unauthorized access to cached private information.
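The three steps in Figure 1 can be simulated end to end with an in-memory "edge cache" in front of a toy origin server. The rewrite rule and the extension-based caching rule below are illustrative assumptions for this sketch, not any specific server's or vendor's behavior:

```python
# Simulate Figure 1: a victim's authenticated visit populates the cache,
# and the attacker's replay of the same URL hits the cached copy.

cache = {}

def origin(path, authenticated):
    # The origin rewrites /account.php/<anything> back to /account.php
    # and serves the private account page for authenticated sessions.
    if path.startswith("/account.php") and authenticated:
        return "200 OK: victim's private account details"
    return "302: redirect to login"

def edge(path, authenticated):
    if path in cache:                  # Step 3: cache hit for the attacker
        return cache[path]
    response = origin(path, authenticated)
    if path.endswith(".jpg"):          # unsafe rule: cache anything *.jpg,
        cache[path] = response         # ignoring Cache-Control: no-store
    return response

url = "/account.php/nonexistent.jpg"
edge(url, authenticated=True)          # Steps 1-2: the victim's lured visit
leak = edge(url, authenticated=False)  # Step 3: the attacker's replay
print(leak)
```

Neither component is misconfigured in isolation; the leak emerges only from their composition, which is the system-safety point the paper makes.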

3 Methodology

We present our measurement methodology in three stages: (1) measurement setup, (2) attack surface detection, and (3) WCD detection. We illustrate this process in Figure 2. We implemented the tools that perform the described tasks using a combination of Google Chrome and Python's Requests library [52] for web interactions, and Selenium [53] and Google Remote Debugging Protocol [25] for automation.

3.1 Stage 1: Measurement Setup

WCD attacks are only meaningful when a vulnerable site manages private end-user information and allows performing sensitive operations on this data. Consequently, sites that provide authentication mechanisms are prime targets for attacks, and thus also for our measurements. The first stage of our methodology identifies such sites and creates test accounts on them.¹

Domain Discovery. This stage begins by visiting the sites in our seed pool in an initial measurement (e.g., the Alexa Top 5K domains). We then increase site coverage by performing sub-domain discovery using open-source intelligence tools [1, 27, 50]. We add these newly-discovered sub-domains of the primary sites (filtered for those that respond to HTTP(S) requests) to the seed pool.

Account Creation. Next, we create two test accounts on each site: one for a victim, and the other for an attacker. We populate each account with unique dummy values. Next, we manually explore each victim account to discover data fields that should be considered private information (e.g., name, email, address, payment account details, security questions and responses) or user-created content (e.g., comments, posts, internal messages). We populate these fields with predefined markers that can later be searched for in cached responses to detect a successful WCD attack. On the other hand, no data entry is necessary for attacker accounts.

Cookie Collection. Once successfully logged into the sites in our seed pool, crawlers collect two sets of cookies for all victim and attacker accounts. These are saved in a cookie jar to be reused in subsequent steps of the measurement. Note that we have numerous measures to ensure our crawlers remain authenticated during our experiments. Our crawlers periodically re-authenticate, taking into account cookie expiration timestamps. In addition, the crawlers use regular expressions and blacklists to avoid common logout links on visited pages.

¹In the first measurement study we present in Section 4, we scoped our investigation to sites that support Google OAuth [51] for authentication due to its widespread use. This was a design choice made to automate a significant chunk of the initial account setup workload, a necessity for a large-scale experiment. In our follow-up experiment later described in Section 5 we supplemented this data set with an additional 45 sites that do not use Google OAuth. We discuss these considerations in their corresponding sections.
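The crawler's logout-avoidance heuristic mentioned above can be sketched as a simple URL filter. The paper does not list its exact regular expressions and blacklists, so the patterns here are illustrative assumptions:

```python
import re

# Hypothetical blacklist of logout-like link patterns; a real deployment
# would tune these per site.
LOGOUT_RE = re.compile(r"log[-_]?out|sign[-_]?out|session/(end|destroy)",
                       re.IGNORECASE)

def is_safe_to_follow(href):
    """True if following this link should not end the crawler's session."""
    return LOGOUT_RE.search(href) is None

print(is_safe_to_follow("https://example.com/account/settings"))
print(is_safe_to_follow("https://example.com/Logout?next=/"))
```

Combined with periodic re-authentication on cookie expiry, such a filter keeps an automated crawl authenticated without manual supervision.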

[Figure 2 omitted: a flow diagram showing the Alexa Top 5K feeding Measurement Setup (Domain Discovery, Account Creation, Cookie Collection), followed by Attack Surface Detection (Domain Crawls, URL Grouping) and WCD Detection (WCD Attack, Marker Extraction, Secret Extraction).]

Figure 2: A high-level overview of our WCD measurement methodology.
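As a concrete sketch of the URL Grouping stage shown in Figure 2 (described in Section 3.2 and Table 1), a grouping key can keep query parameter names while discarding their values, and collapse numeric path segments to a placeholder. This is our own approximation of the described strategy, not the paper's implementation:

```python
from urllib.parse import urlparse, parse_qs

def url_group(url):
    """Abstract a URL into a group key: parameter names only, and
    numeric path segments collapsed to a placeholder."""
    p = urlparse(url)
    path = "/".join("{n}" if seg.isdigit() else seg
                    for seg in p.path.split("/"))
    params = tuple(sorted(parse_qs(p.query, keep_blank_values=True)))
    return (p.netloc, path, params)

# URLs differing only in parameter values or numeric segments group together.
print(url_group("http://example.com/?lang=en") ==
      url_group("http://example.com/?lang=fr"))
print(url_group("http://example.com/028") ==
      url_group("http://example.com/142"))
```

One random member of each group is then tested, which bounds the scan traffic sent to each site.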

Table 1: Sample URL grouping for attack surface discovery. WCD Attack. The attack we mount directly follows the scenario previously described in Section 2.3 and illustrated in Group By URL Figure 1. For each URL: http://example.com/?lang=en Query Parameter http://example.com/?lang=fr 1. We craft an attack URL that references a non-existent http://example.com/028 static resource. In particular, we append to the original Path Parameter 2 http://example.com/142 page “/.css” . We use a random string as the file name in order to prevent ordinary end-users of the site from coincidentally requesting the same resource. 3.2 Stage 2: Attack Surface Detection 2. We initiate a request to this attack URL from the victim Domain Crawls. In the second stage, our goal is to map account and record the response. from domains in the seed pool to a set of pages (i.e., complete URLs) that will later be tested for WCD vulnerabilities. To 3. We issue the same request from the attacker account, this end, we run a recursive crawler on each domain in the and save the response for comparison. seed pool to record links to pages on that site. 4. Finally, we repeat the attack as an unauthenticated user by omitting any session identifiers saved in the attacker URL Grouping. Many modern web applications customize cookie jar. We later analyze the response to this step pages based on query string or URL path parameters. These to ascertain whether attackers without authentication pages have similar structures and are likely to expose similar credentials (e.g., when the site does not offer open or attack surfaces. Ideally, we would group them together and free sign ups) can also exploit WCD vulnerabilities. select only one random instance as a representative URL to test for WCD in subsequent steps. Marker Extraction. 
Once the attack scenario described Since performing a detailed content analysis is a costly above is executed, we first check for private information dis- process that could generate an unreasonable amount of load on closure by searching the attacker response for the markers that the crawled site, our URL grouping strategy instead focuses were entered into victim accounts in Stage 1. If victim mark- on the structure of URLs, and approximates page similarity ers are present in URLs requested by an attacker account, the without downloading each page for analysis. Specifically, we attacker must have received the victim’s incorrectly cached convert the discovered URLs into an abstract representation content and, therefore, the target URL contains an exploitable by grouping those URLs by query string parameter names or WCD vulnerability. Because these markers carry relatively by numerical path parameters. We select one random instance high entropy, it is probabilistically highly unlikely that this and filter out the rest. Table 1 illustrates this process. methodology will produce false positives. This filtering of URLs significantly accelerates the mea- surements, and also avoids overconsumption of the target Secret Extraction. We scan the attacker response for the site’s resources with redundant scans in Stage 3. We stop disclosure of secret tokens frequently used as part of web attack surface detection crawls after collecting 500 unique application security mechanisms. These checks include com- pages per domain for similar reasons. mon secrets (e.g., CSRF tokens, session identifiers) as well 3.3 Stage 3: WCD Detection 2Our choice to use a style sheet in our payload is motivated by the fact that style sheets are essential components of most modern sites, and also In this final stage, we launch a WCD attack against every URL prime choices for caching. They are also a robust choice for our tests. 
For instance, many CDN providers offer solutions to dynamically resize image discovered in Stage 2, and analyze the response to determine files on the CDN edge depending on the viewport of a requesting client whether a WCD vulnerability was successfully exploited. device. Style sheets are unlikely to be manipulated in such ways.

as any other application-specific authentication and authorization tokens (e.g., API credentials). We also check for session-dependent resources such as dynamically-generated JavaScript, which may have private information and secrets embedded in them (e.g., as explored by Lekies et al. [39]).

In order to extract candidates for leaked secrets, we scan attacker responses for name & value pairs, where either (1) the name contains one of our keywords (e.g., csrf, xsrf, token, state, client_id), or (2) the value has a random component. We check for these name & value pairs in hidden HTML form elements, query strings extracted from HTML anchor elements, and inline JavaScript variables and constants. Similarly, we extract random file names referenced in HTML script elements. We perform all tests for randomness by first removing dictionary words from the target string (i.e., using a list of 10,000 common English words [35]), and then computing Shannon entropy over the remaining part.

Note that unlike our checks for private information leaks, this process can result in false positives. Therefore, we perform this secret extraction process only when the victim and attacker responses are identical (a strong indicator of caching), or otherwise when we can readily confirm a WCD vulnerability by searching for the private information markers. In addition, we later manually verify all candidate secrets extracted in this step.

3.4 Verification and Limitations

Researchers have repeatedly reported that large-scale Internet measurements, especially those that use automated crawlers, are prone to being blocked or served fake content by security solutions designed to block malicious bots and content scrapers [49, 66]. In order to minimize this risk during our measurement, we used a real browser (i.e., Google Chrome) for most steps in our methodology. For other interactions, we set a valid Chrome user-agent string. We avoided generating excessive amounts of traffic and limited our crawls as described above in order to avoid triggering rate-limiting alerts, in addition to ethical motivations. After performing our measurements, we manually verified all positive findings and confirmed the discovered vulnerabilities.

Note that this paper has several important limitations, and the findings should be considered a potentially loose lower bound on the incidence of WCD vulnerabilities in the wild. For example, as described in Section 4, our seed pool is biased toward sites that support Google OAuth, which was a necessary compromise to automate our methodology and render a large-scale measurement feasible. Even under this constraint, creating accounts on some sites required entering and verifying sensitive information such as credit card or US social security numbers which led to their exclusion from our study. Furthermore, decisions such as grouping URLs based on their structure without analyzing page content, and limiting site crawls to 500 pages may have caused us to miss additional instances of vulnerabilities. Similarly, even though we manually filtered out false positives during our secret token extraction process and verified all findings, we do not have a scalable way of detecting false negatives. We believe that these trade-offs were worthwhile given the overall security benefits of and lessons learned from our work. We emphasize that the results in this paper represent a lower bound.

3.5 Ethical Considerations

Here, we explain in detail important ethical considerations pertaining to this work and the results we present.

Performance Considerations. We designed our methodology to minimize the performance impact on scanned sites and inconvenience imposed on their operators. We did not perform repeated or excessive automated scans of the targeted sites, and ensured that our measurements did not generate unreasonable amounts of traffic. We used only passive techniques for sub-domain enumeration and avoided abusing external resources or the target site's DNS infrastructure.

Similarly, our stored modifications to crawled web applications only involved creating two test accounts and filling out editable fields with markers that we later used for data leakage detection. We believe this will have no material impact on site operators, especially in the presence of common threats such as malicious bots and credential stuffing tools that generate far more excessive junk traffic and data.

Security Considerations. Our methodology entirely avoids jeopardizing the security of crawled sites or their end-users. In this work, we never injected or stored any malicious payload to target sites, to web caches on the communication path, or otherwise maliciously tampered with any technology involved in the process. Likewise, the experiments we performed all incorporated randomized strings as the non-existent parts of URLs, thereby preventing unsuspecting end-users from accidentally accessing our cached data and receiving unexpected responses.

Note that this path randomization measure was used to prevent inconveniencing or confusing end-users; since we never exploited WCD to leak real personal data from a web application or stored a malicious payload, our work never posed a security risk to end-users.

Our experiments did not take into account robots.txt files. This was a risk-based decision we consciously made, and we believe that ignoring exclusion directives had no negative impact on the privacy of these sites' visitors. Robots.txt is not a security or privacy mechanism, but is intended to signal to data aggregators and search engines what content to index – including a directive to exclude privacy sensitive pages would actually be a misuse of this technology. This is not relevant to our experiments, as we only collect content for our analysis, and we do not index or otherwise publicly present site content.
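The randomness test from the secret-extraction step (Section 3.3) — strip dictionary words from a candidate value, then compute Shannon entropy over what remains — can be sketched as below. The tiny word list and the entropy threshold are illustrative assumptions, not the paper's actual 10,000-word list or tuned parameters.

```python
import math
import re

# Minimal sketch of the Section 3.3 randomness test. COMMON_WORDS stands in
# for the 10,000-word English list used in the paper; the length and entropy
# thresholds are illustrative choices, not the paper's parameters.
COMMON_WORDS = {"token", "secret", "value", "test", "user"}

def shannon_entropy(s: str) -> float:
    """Shannon entropy of s, in bits per character."""
    if not s:
        return 0.0
    freq = {c: s.count(c) / len(s) for c in set(s)}
    return -sum(p * math.log2(p) for p in freq.values())

def looks_random(value: str, threshold: float = 3.0) -> bool:
    """Heuristic: does `value` retain a long, high-entropy component after
    dictionary words are removed?"""
    remainder = value.lower()
    for word in COMMON_WORDS:
        remainder = remainder.replace(word, "")
    remainder = re.sub(r"[^a-z0-9]", "", remainder)
    return len(remainder) >= 8 and shannon_entropy(remainder) >= threshold

print(looks_random("c9f0f895fb98ab9159f51fd0297e236d"))  # hex digest -> True
print(looks_random("username"))                          # dictionary-like -> False
```

A hex digest or base64 token scores high because its characters are close to uniformly distributed, while English-word values are largely consumed by the dictionary pass before entropy is measured.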

Responsible Disclosure. In this paper, we present a detailed breakdown of our measurement findings and results of our analysis, but we refrain from explicitly naming the impacted sites. Even though our methodology only utilized harmless techniques for WCD detection, the findings point at real-world vulnerabilities that could be severely damaging if publicly disclosed before remediation.

We sent notification emails to publicly listed security contacts of all impacted parties promptly after our discovery. In the notification letters we provided an explanation of the vulnerability with links to online resources and listed the vulnerable domain names under ownership of the contacted party. We informed them of our intention to publicly publish these results, noted that they will not be named, and advised that they remediate the issue as adversaries can easily repeat our experiment and compromise their sites. We also explicitly stated that we did not seek or accept bug bounties for these notifications.

We sent the notification letters prior to submitting this work for review, therefore giving the impacted parties reasonably early notice. As of this writing, 12 of the impacted sites have implemented mitigations.

Repeatability. One of the authors of this paper is affiliated with a major CDN provider at the time of writing. However, the work and results we present in this paper do not use any internal or proprietary company information, or any such information pertaining to the company's customers. We conducted this work using only publicly available data sources and tools. Our methodology is repeatable by other researchers without access to any CDN provider internals.

4 Web Cache Deception Measurement Study

We conducted two measurement studies to characterize web cache deception (WCD) vulnerabilities on the Internet. In the first study, presented in this section, the research questions we specifically aim to answer are:

(Q1) What is the prevalence of WCD vulnerabilities on popular, highly-trafficked domains? (§4.2)
(Q2) Do WCD vulnerabilities expose PII and, if so, what kinds? (§4.3)
(Q3) Can WCD vulnerabilities be used to defeat defenses against web application attacks? (§4.3)
(Q4) Can WCD vulnerabilities be exploited by unauthenticated users? (§4.3)

In the following, we describe the data we collected to carry out the study. We discuss the results of the measurement, and then consider implications for PII and important web security defenses. Finally, we summarize the conclusions we draw from the study. In Section 5, we will present a follow-up experiment focusing on advanced path confusion techniques.

4.1 Data Collection

We developed a custom web crawler to collect the data used in this measurement. The crawler ran from April 20-27, 2018 as a Kubernetes pod that was allocated 16 Intel Xeon 2.4 GHz CPUs and 32 GiB of RAM. Following the methodology described in Section 3, we configured the crawler to identify vulnerable sites from the Alexa Top 5K at the time of the experiment. In order to scalably create test accounts, we filtered this initial measurement seed pool for sites that provide an option for user authentication via Google OAuth. This filtering procedure narrowed the set of sites considered in this measurement to 295. Table 2 shows a summary of our crawling statistics.

Table 2: Summary of crawling statistics.

            Crawled     Vulnerable
  Pages     1,470,410   17,293 (1.2%)
  Domains   124,596     93 (0.1%)
  Sites     295         16 (5.4%)

4.2 Measurement Overview

Alexa Ranking. From the 295 sites comprising the collected data set, the crawler identified 16 sites (5.4%) to contain WCD vulnerabilities. Figure 3 presents the distribution of all sites and vulnerable sites across the Alexa Top 5K. From this, we observe that the distribution of vulnerable sites is roughly proportional to the number of sites crawled; that is, our data does not suggest that the incidence of WCD vulnerabilities is correlated with site popularity.

Figure 3: Distribution of the measurement data and vulnerable sites across the Alexa Top 5K. (Bar chart; x-axis: Alexa Rank in 1K-wide buckets, y-axis: # Sites. Crawled per bucket: 97, 64, 57, 46, 31; vulnerable per bucket: 7, 4, 2, 1, 2.)
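The per-rank-bucket grouping behind Figure 3 is straightforward to reproduce; a minimal sketch (the example ranks are made up — only the 1K-wide bucketing logic mirrors the figure):

```python
# Sketch: grouping sites into the 1K-wide Alexa rank bins used by Figure 3.
# The example ranks below are invented; only the bucketing logic matters.
from collections import Counter

def rank_bucket(rank: int, width: int = 1000) -> str:
    lo = (rank - 1) // width * width   # e.g. rank 1500 -> bin starting at 1001
    return f"[{lo + 1} - {lo + width}]"

ranks = [12, 980, 1500, 2750, 3100, 4999]
print(Counter(rank_bucket(r) for r in ranks))
```

Tallying crawled and vulnerable sites separately per bucket yields the two bar series shown in the figure.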

Table 3: Pages, domains, and sites labeled by CDN using HTTP header heuristics. These heuristics simply check for unique vendor-specific strings added by CDN proxy servers.

                  Crawled                                        Vulnerable
  CDN             Pages            Domains         Sites         Pages            Domains       Sites
  Cloudflare      161,140 (11.0%)  4,996 (4.0%)    143 (48.4%)   16,234 (93.9%)   72 (77.4%)    8 (50.0%)
  Akamai          225,028 (15.3%)  16,473 (13.2%)  100 (33.9%)   1,059 (6.1%)     21 (22.6%)    8 (50.0%)
  CloudFront      100,009 (6.8%)   10,107 (8.1%)   107 (36.3%)   2 (<0.1%)        1 (1.1%)      1 (6.2%)
  Other CDNs      244,081 (16.6%)  2,456 (2.0%)    137 (46.4%)   0 (0.0%)         0 (0.0%)      0 (0.0%)
  Total CDN Use   707,210 (48.1%)  33,675 (27.0%)  244 (82.7%)   17,293 (100.0%)  93 (100.0%)   16 (100.0%)

Content Delivery Networks (CDNs). Using a set of heuristics that searches for well-known vendor strings in HTTP headers, we labeled each domain and site with the corresponding CDN. Table 3 shows the results of this labeling. Note that many sites use multiple CDN solutions, and therefore the sum of values in the first four rows may exceed the totals we report in the last row.

The results show that, even though WCD attacks are equally applicable to any web cache technology, all instances of vulnerable pages we observed are served over a CDN. That being said, vulnerabilities are not unique to any one CDN vendor. While this may seem to suggest that CDN use is correlated with an increased risk of WCD, we point out that 82.7% of sites in our experiment are served over a CDN. A more balanced study comparing CDNs to centralized web caches is necessary to eliminate this inherent bias in our experiment and draw meaningful conclusions. Overall, these results indicate that CDN deployments are prevalent among popular sites, and the resulting widespread use of web caches may in turn lead to more opportunities for WCD attacks.

Response Codes. Table 4 presents the distribution of HTTP response codes observed for the vulnerable sites. This distribution is dominated by 404 Not Found which, while perhaps unintuitive, is indeed allowed behavior according to RFC 7234 [21]. On the other hand, while only 12 sites leaked resources with a 200 OK response, during our manual examination of these vulnerabilities (discussed below) we noted that more PII was leaked from this category of resource.

Table 4: Response codes observed in the vulnerable data set.

  Response Code   Pages            Domains      Sites
  404 Not Found   17,093 (98.8%)   82 (88.2%)   10 (62.5%)
  200 OK          205 (1.2%)       19 (20.4%)   12 (75.0%)

Cache Headers. Table 5 shows a breakdown of cache-relevant headers collected from vulnerable sites. In particular, we note that despite the presence of headers whose semantics prohibit caching—e.g., "Pragma: no-cache", "Cache-Control: no-store"—pages carrying these headers are cached regardless, as they were found to be vulnerable to WCD. This finding suggests that site administrators indeed take advantage of the configuration controls provided by web caches that allow sites to override header-specified caching policies.

A consequence of this observation is that user-agents cannot use cache headers to determine with certainty whether a resource has in fact been cached or not. This has important implications for WCD detection tools that rely on cache headers to infer the presence of WCD vulnerabilities.

Table 5: Cache headers present in HTTP responses collected from vulnerable sites.

  Header                                                            Pages            Domains      Sites
  Expires:                                                          1,642 (9.5%)     23 (24.7%)   13 (81.2%)
  Pragma: no-cache                                                  652 (3.8%)       11 (11.8%)   6 (37.5%)
  Cache-Control:                                                    1,698 (9.8%)     26 (28.0%)   14 (87.5%)
    max-age=, public                                                1,093 (6.3%)     10 (10.8%)   7 (43.8%)
    max-age=                                                        307 (1.8%)       1 (1.1%)     1 (6.2%)
    must-revalidate, private                                        102 (0.6%)       1 (1.1%)     1 (6.2%)
    max-age=, no-cache, no-store                                    67 (0.4%)        3 (3.2%)     2 (12.5%)
    max-age=, no-cache                                              64 (0.4%)        4 (4.3%)     1 (6.2%)
    max-age=, must-revalidate                                       51 (0.3%)        1 (1.1%)     1 (6.2%)
    max-age=, must-revalidate, no-transform, private                5 (<0.1%)        3 (3.2%)     1 (6.2%)
    no-cache                                                        5 (<0.1%)        2 (2.2%)     1 (6.2%)
    max-age=, private                                               3 (<0.1%)        1 (1.1%)     1 (6.2%)
    must-revalidate, no-cache, no-store, post-check=, pre-check=    1 (<0.1%)        1 (1.1%)     1 (6.2%)
  All                                                               1,698 (9.8%)     26 (28.0%)   14 (87.5%)
  (none)                                                            15,595 (90.2%)   67 (72.0%)   3 (18.8%)

4.3 Vulnerabilities

Table 6 presents a summary of the types of vulnerabilities discovered in the collected data, labeled by manual examination.

Table 6: Types of vulnerabilities discovered in the data.

  Leakage                 Pages            Domains      Sites
  PII                     17,215 (99.5%)   88 (94.6%)   14 (87.5%)
    User                  934 (5.4%)       17 (18.3%)   8 (50.0%)
    Name                  16,281 (94.1%)   71 (76.3%)   7 (43.8%)
    Email                 557 (3.2%)       10 (10.8%)   6 (37.5%)
    Phone                 102 (0.6%)       1 (1.1%)     1 (6.2%)
  CSRF                    130 (0.8%)       10 (10.8%)   6 (37.5%)
    JS                    59 (0.3%)        5 (5.4%)     4 (25.0%)
    POST                  72 (0.4%)        5 (5.4%)     3 (18.8%)
    GET                   8 (<0.1%)        4 (4.3%)     2 (12.5%)
  Sess. ID / Auth. Code   1,461 (8.4%)     11 (11.8%)   6 (37.5%)
    JS                    1,461 (8.4%)     11 (11.8%)   6 (37.5%)
  Total                   17,293           93           16

PII. 14 of the 16 vulnerable sites leaked PII of various kinds, including names, usernames, email addresses, and phone numbers. In addition to these four main categories, a variety of other categories of PII were found to be leaked. Broad examples of other PII include financial information (e.g., account balances, shopping history) and health information (e.g., calories burned, number of steps, weight). While it is tempting to dismiss such information as trivial, we note that PII such as the above can be used as the basis for highly effective spearphishing attacks [10, 19, 30, 32].

Security Tokens. Using the entropy-based procedure described in Section 3, we also analyzed the data for the presence of leaked security tokens. Then, we manually verified our findings by accessing the vulnerable sites using a browser and checking for the presence of the tokens suspected to have been leaked. Finally, we manually verified representative examples of each class of leaked token for exploitability using the test accounts established during the measurement.

6 of the 16 vulnerable sites leaked CSRF tokens valid for a session, which could allow an attacker to conduct CSRF attacks despite the presence of a deployed CSRF defense. 3 of these were discovered in hidden form elements used to protect POST requests, while an additional 4 were found in inline JavaScript that was mostly used to initiate HTTP requests. We also discovered 2 sites leaking CSRF tokens in URL query parameters for GET requests, which is somewhat at odds with the convention that GET requests should be idempotent.

6 of the 16 vulnerable sites leaked session identifiers or user-specific API tokens in inline JavaScript. These session identifiers could be used to impersonate victim users at the vulnerable site, while the API tokens could be used to issue API requests as a victim user.

Authenticated vs. Unauthenticated Attackers. The methodology we described in Section 3 includes a detection step intended to discover whether a suspected WCD vulnerability was exploitable by an unauthenticated user by accessing a cached page without sending any stored session identifiers in the requests. In only a few cases did this automated check fail; that is, in virtually every case the discovered vulnerability was exploitable by an unauthenticated user. Even worse, manual examination of the failure cases revealed that in each one the crawler had produced a false negative and that in fact all of the remaining vulnerabilities were exploitable by unauthenticated users as well. This implies that WCD, as a class of vulnerability, tends not to require an attacker to authenticate to a vulnerable site in order to exploit those vulnerabilities. In other words, requiring strict account verification through credentials such as valid SSNs or credit card numbers is not a viable mitigation for WCD.

4.4 Study Summary

Summarizing the major findings of this first experiment, we found that 16 out of 295 sites drawn from the Alexa Top 5K contained web cache deception (WCD) vulnerabilities. We note that while this is not a large fraction of the sites scanned, these sites have substantial user populations as to be expected with their placement in the Alexa rankings. This, combined with the fact that WCD vulnerabilities are relatively easy to exploit, leads us to conclude that these vulnerabilities are serious and that this class of vulnerability deserves attention from both site administrators and the security community.

We found that the presence of cache headers was an unreliable indicator for whether a resource is cached, implying that existing detection tools relying on this signal may inadvertently produce false negatives when scanning sites for WCD vulnerabilities. We found vulnerable sites to leak PII that would be useful for launching spearphishing attacks, or security tokens that could be used to impersonate victim users or bypass important web security defenses. Finally, the WCD vulnerabilities discovered here did not require attackers to authenticate to vulnerable sites, meaning sites with restrictive sign-up procedures are not immune to WCD vulnerabilities.

5 Variations on Path Confusion

Web cache technologies may be configured to make their caching decisions based on complex rules such as pattern
matches on file names, paths, and header contents. Launching a successful WCD attack requires an attacker to craft a malicious URL that triggers a caching rule, but also one that is interpreted as a legitimate request by the web server. Caching rules often cannot be reliably predicted from an attacker's external perspective, rendering the process of crafting an attack URL educated guesswork.

Based on this observation, we hypothesize that exploring variations on the path confusion technique may increase the likelihood of triggering caching rules and a valid web server response, and make it possible to exploit additional WCD vulnerabilities on sites that are not impacted by the originally proposed attack. To test our hypothesis, we performed a second round of measurements fourteen months after the first experiment, in July 2019.

Specifically, we repeated our methodology, but tested payloads crafted with different path confusion techniques in an attempt to determine how many more pages could be exploited with path confusion variations. We used an extended seed pool for this study, containing 295 sites from the original set and an additional 45 randomly selected from the Alexa Top 5K, for a total of 340. In particular, we chose these new sites among those that do not use Google OAuth in an attempt to mitigate potential bias in our previous measurement. One negative consequence of this decision was that we had to perform the account creation step entirely manually, which limited the number of sites we could include in our study in this way. Finally, we revised the URL grouping methodology by only selecting and exploiting a page among the first 500 pages when there is at least one marker in the content, making it more efficient for our purposes, and less resource-intensive on our targets. In the following, we describe this experiment and present our findings.

5.1 Path Confusion Techniques

Recall from our analysis and Table 4 that our WCD tests resulted in a 404 Not Found status code in the great majority of cases, indicating that the web server returned an error page that is less likely to include PII. In order to increase the chances of eliciting a 200 OK response while still triggering a caching rule, we propose additional path confusion techniques below based on prior work [59, 61, 62], also illustrated in Figure 4. Note that Path Parameter in the rest of this section refers to the original path confusion technique discussed in this work.

(a) Path Parameter
    example.com/account.php
    example.com/account.php/nonexistent.css

(b) Encoded Newline (\n)
    example.com/account.php
    example.com/account.php%0Anonexistent.css

(c) Encoded Semicolon (;)
    example.com/account.php;par1;par2
    example.com/account.php%3Bnonexistent.css

(d) Encoded Pound (#)
    example.com/account.php#summary
    example.com/account.php%23nonexistent.css

(e) Encoded Question Mark (?)
    example.com/account.php?name=val
    example.com/account.php%3Fname=valnonexistent.css

Figure 4: Five practical path confusion techniques for crafting URLs that reference nonexistent file names. In each example, the first URL corresponds to the regular page, and the second one to the malicious URL crafted by the attacker. More generally, nonexistent.css corresponds to a nonexistent file, where nonexistent is an arbitrary string and .css is a popular static file extension such as .css, .txt, .jpg, .ico, .js, etc.

Encoded Newline (\n). Web servers and proxies often (but not always) stop parsing URLs at a newline character, discarding the rest of the URL string. For this path confusion variation, we use an encoded newline (%0A) in our malicious URL (see Figure 4b). We craft this URL to exploit web servers that drop path components following a newline (i.e., the server sees example.com/account.php), but are fronted by caching proxies that instead do not properly decode newlines (the proxy sees example.com/account.php%0Anonexistent.css). As a result, a request for this URL would result in a successful response, and the cache would store the contents believing that this is static content based on the nonexistent file's extension.

Encoded Semicolon (;). Some web servers and web application frameworks accept lists of parameters in the URL delimited by semicolons; however, the caching proxy fronting the server may not be configured to recognize such lists. The path confusion technique we present in Figure 4c exploits this scenario by appending the nonexistent static file name after a semicolon. In a successful attack, the server would decode the URL and return a response for example.com/account.php, while the proxy would fail to decode the semicolon, interpret example.com/account.php%3Bnonexistent.css as a resource, and attempt to cache the nonexistent style sheet.

Encoded Pound (#). Web servers often process the pound character as an HTML fragment identifier, and therefore stop parsing the URL at its first occurrence. However, proxies and their caching rules may not be configured to
decode pound signs, causing them to process the entire URL string. The path confusion technique we present in Figure 4d once again exploits this inconsistent interpretation of the URL between a web server and a web cache, and works in a similar manner to the encoded newline technique above. That is, in this case the web server would successfully respond for example.com/account.php, while the proxy would attempt to cache example.com/account.php%23nonexistent.css.

Encoded Question Mark (?). This technique, illustrated in Figure 4e, targets proxies with caching rules that are not configured to decode and ignore standard URL query strings that begin with a question mark. Consequently, the web server would generate a valid response for example.com/account.php and the proxy would cache it, misinterpreting the same URL as example.com/account.php%3Fname=valnonexistent.css.

5.2 Results

We applied our methodology to the seed pool of 340 sites, using each path confusion variation shown in Figure 4. We also performed the test with the Path Parameter technique, which was an identical test case to our original experiment. We did this in order to identify those pages that are not vulnerable to the original WCD technique, but only to its variations.

We point out that the results we present in this second experiment for the Path Parameter technique differ from our first measurement. This suggests that, in the fourteen-month gap between the two experiments, either the site operators fixed the issue after our notification, or that there were changes to the site structure or caching rules that mitigated existing vulnerabilities or exposed new vulnerable pages. In particular, we found 16 vulnerable sites in the previous experiment and 25 in this second study, while the overlap between the two is only 4.

Of the 25 vulnerable sites we discovered in this experiment, 20 were among the previous set of 295 that uses Google OAuth, and 5 among the newly picked 45 that do not. To test whether the incidence distributions of vulnerabilities among these two sets of sites show a statistically significant difference, we applied Pearson's χ² test, where vulnerability incidence is treated as the categorical outcome variable and OAuth/non-OAuth site sets are comparison groups. We obtained a test statistic of 1.07 and a p-value of 0.30, showing that the outcome is independent of the comparison groups, and that incidence distributions do not differ significantly at typically chosen significance levels (i.e., p > 0.05). That is, our seed pool selection did not bias our findings.

Response Codes. We present the server response codes we observed for vulnerable pages in Table 7. Notice that there is a stark contrast in the number of 200 OK responses observed with some of the new path confusion variations compared to the original. For instance, while there were 3,870 success codes for Path Parameter, Encoded # and Encoded ? resulted in 7,849 and 11,282 success responses, respectively. That is, two new path confusion techniques were indeed able to elicit significantly higher numbers of successful server responses, which is correlated with a higher chance of returning private user information. The remaining two variations performed closer to the original technique.

Table 7: Response codes observed with successful WCD attacks for each path confusion variation.

                   Pages             Domains       Sites
  Technique        200      !200     200    !200   200   !200
  Path Parameter   3,870    25,932   31     93     13    7
  Encoded \n       1,653    24,280   79     76     9     7
  Encoded ;        3,912    25,576   91     92     13    7
  Encoded #        7,849    20,794   102    85     14    7
  Encoded ?        11,282   26,092   122    86     17    8
  All Encoded      11,345   31,063   128    94     20    9
  Total            12,668   32,281   132    97     22    9

Table 8: Vulnerable targets for each path confusion variation.

  Technique        Pages             Domains         Sites
  Path Parameter   29,802 (68.9%)    103 (69.6%)     14 (56.0%)
  Encoded \n       25,933 (59.9%)    86 (58.1%)      11 (44.0%)
  Encoded ;        29,488 (68.2%)    105 (70.9%)     14 (56.0%)
  Encoded #        28,643 (66.2%)    109 (73.6%)     15 (60.0%)
  Encoded ?        37,374 (86.4%)    130 (87.8%)     19 (76.0%)
  All Encoded      42,405 (98.0%)    144 (97.3%)     23 (92.0%)
  Total            43,258 (100.0%)   148 (100.0%)    25 (100.0%)

Vulnerabilities. In this experiment we identified a total of 25 vulnerable sites. Table 8 shows a breakdown of vulnerable pages, domains, and sites detected using different path confusion variations. Overall, the original path confusion technique resulted in a fairly successful attack, exploiting 68.9% of pages and 14 sites. Still, the new techniques combined were able to exploit 98.0% of pages, and 23 out of 25 vulnerable sites, showing that they significantly increase the likelihood of a successful attack.

We next analyze whether any path confusion technique was able to successfully exploit pages that were not impacted by others. We present these results in Table 9 in matrix form, where each element (i, j) shows how many pages/domains/sites were exploitable using the technique in row i, whereas utilizing the technique listed in column j was ineffective for the same pages/domains/sites.

Table 9: Number of unique pages/domains/sites exploited by each path confusion technique. Element (i, j) indicates the number of pages/domains/sites exploitable using the technique in row i where the technique in column j is ineffective.

  Technique        Path Parameter     Encoded \n         Encoded ;          Encoded #          Encoded ?
  Path Parameter   -                  4,390 / 26 / 7     1,010 / 5 / 4      5,691 / 11 / 3     5,673 / 12 / 3
  Encoded \n       521 / 9 / 4        -                  206 / 5 / 3        3,676 / 5 / 3      3,668 / 5 / 3
  Encoded ;        696 / 7 / 4        3,761 / 24 / 6     -                  4,881 / 9 / 2      4,863 / 8 / 0
  Encoded #        4,532 / 17 / 4     6,386 / 28 / 7     4,036 / 13 / 3     -                  90 / 1 / 1
  Encoded ?        13,245 / 39 / 8    15,109 / 49 / 11   12,749 / 33 / 5    8,821 / 22 / 5     -
  All Encoded      13,456 / 45 / 11   16,472 / 58 / 12   12,917 / 39 / 9    13,762 / 35 / 8    5,031 / 14 / 4

The results in Table 9 confirm that each path confusion variation was able to attack a set of unique pages/domains/sites that were not vulnerable to other techniques, attesting to the fact that utilizing a variety of techniques increases the chances of successful exploitation. In fact, of the 25 vulnerable sites, 11 were only exploitable using one of the variations we presented here, but not the Path Parameter technique.

All in all, the results we present in this section confirm our hypothesis that launching WCD attacks with variations on path confusion, as opposed to only using the originally proposed Path Parameter technique, results in an increased possibility of successful exploitation. Moreover, two of the explored variations elicit significantly more 200 OK server responses in the process, increasing the likelihood of the web server returning valid private information.

We stress that the experiment we present in this section is necessarily limited in scale and scope. Still, we believe the findings sufficiently demonstrate that WCD can be easily modified to render the attack more damaging, exploiting unique characteristics of web servers and caching proxies in parsing URLs. An important implication is that defending against WCD through configuration adjustments is difficult and error prone. Attackers are likely to have the upper hand in devising new and creative path confusion techniques that site operators may not anticipate.

6 Empirical Experiments

Practical exploitation of WCD vulnerabilities depends on many factors such as the caching technology used and caching rules configured. In this section, we present two empirical experiments we performed to demonstrate the impact of different cache setups on WCD, and discuss our exploration of the default settings for popular CDN providers.

6.1 Cache Location

While centralized server-side web caches can be trivially exploited from any location in the world, exploiting a distributed set of CDN cache servers is more difficult. A successful WCD attack may require attackers to correctly target the same edge server that their victim connects to, where the cached sensitive information is stored. As extensively documented in existing WCD literature, attackers often achieve that by connecting to the server of interest directly using its IP address and a valid HTTP Host header corresponding to the vulnerable site.

We tested the impact of this practical constraint by performing the victim interactions of our methodology from a machine located in Boston, MA, US, and launching the attack from another server in Trento, Italy. We repeated this test for each of the 25 sites confirmed to be vulnerable in our second measurement described in Section 5.

The results showed that our attack failed for 19 sites as we predicted, requiring tweaks to target the correct cache server. Surprisingly, the remaining 6 sites were still exploitable even though headers indicated that they were served over CDNs (3 Akamai, 1 Cloudflare, 1 CloudFront, and 1 Fastly).

Upon closer inspection of the traffic, we found headers in our Fastly example indicating that a cache miss was recorded in their Italy region, followed by a retry in the Boston region that resulted in the cache hit, which led to a successful attack. We were not able to explore the remaining cases with the data exposed to us.

Many CDN providers are known to use a tiered cache model, where content may be available from a parent cache even when evicted from a child [3, 20]. The Fastly example above demonstrates this situation, and is also a plausible explanation for the remaining cases. Another possibility is that the vulnerable sites were using a separate centralized server-side cache fronted by their CDN provider. Unfortunately, without a clear understanding of proprietary CDN internals and visibility into site owners' infrastructure, it is not feasible to determine the exact cache interactions.

Our experiment confirms that cache location is a practical constraint for a successful WCD attack where a distributed set of cache servers is involved, but also shows that attacks are viable in certain scenarios without necessitating additional traffic manipulation.

6.2 Cache Expiration

Web caches typically store objects for a short amount of time, and then evict them once they expire. Eviction may also take place prematurely when web caches are under heavy load. Consequently, an attacker may have a limited window of opportunity to launch a successful WCD attack until the web cache drops the cached sensitive information.

USENIX Association 29th USENIX Security Symposium 677

Table 10: Default caching behavior for popular CDNs, and cache control headers honored by default to prevent caching.

CDN        | Default Cached Objects                                                  | no-store | no-cache | private
-----------+-------------------------------------------------------------------------+----------+----------+--------
Akamai     | Objects with a predefined list of static file extensions only.          |    –     |    –     |    –
Cloudflare | Objects with a predefined list of static file extensions, AND all       |    ✓     |    ✓     |    ✓
           | objects with cache control headers public or max-age > 0.               |          |          |
CloudFront | All objects.                                                            |    ✓     |    ✓     |    ✓
Fastly     | All objects.                                                            |    –     |    –     |    ✓
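The rules in Table 10 can be summarized as a small decision procedure. The sketch below is a toy model we constructed from the table and the surrounding discussion; the extension list is an illustrative subset, and real CDN behavior has many more knobs:

```python
# Toy model of Table 10: would a default-configured CDN cache a 200 OK
# response, given the request path and the origin's Cache-Control header?

STATIC_EXTS = {".jpg", ".css", ".js", ".pdf", ".exe"}  # illustrative subset

def default_caches(cdn: str, path: str, cache_control: str = "") -> bool:
    cc = {d.strip().lower() for d in cache_control.split(",") if d.strip()}
    negative = cc & {"no-store", "no-cache", "private"}
    static = any(path.lower().endswith(ext) for ext in STATIC_EXTS)
    if cdn == "akamai":        # extension list only; header directives not honored
        return static
    if cdn == "cloudflare":    # extension list, or public/max-age > 0; honors all three
        positive = "public" in cc or any(
            d.startswith("max-age=") and int(d.split("=", 1)[1]) > 0 for d in cc)
        return not negative and (static or positive)
    if cdn == "cloudfront":    # caches everything; honors all three directives
        return not negative
    if cdn == "fastly":        # caches everything; honors only "private"
        return "private" not in cc
    raise ValueError("unknown CDN: " + cdn)
```

Note how the first row makes a WCD-style URL cacheable regardless of headers: `default_caches("akamai", "/account.php/nonexistent.css", "no-store")` evaluates to True in this model.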

In order to measure the impact of cache expiration on WCD, we repeated the attacker interactions of our methodology with 1 hour, 6 hour, and 1 day delays.³ We found that 16, 10, and 9 sites were exploitable in each case, respectively.

These results demonstrate that exploitation is viable in realistic attack scenarios, where there are delays between the victim's and attacker's interactions with web caches. That being said, caches will eventually evict sensitive data, meaning that attacks with shorter delays are more likely to be successful. We also note that we performed this test with a randomly chosen vulnerable page for each site, as that was sufficient for our purposes. In practice, different resources on a given site may have varying cache expiration times, imposing additional constraints on what attacks are possible.

³We only tested 19 sites out of 25, as the remaining 6 had fixed their vulnerabilities by the time we performed this experiment.

6.3 CDN Configurations

Although any web cache technology can be affected by WCD, we established in Section 4.2 that CDNs play a large role in cache use on the Internet. Therefore, we conducted an exploratory experiment to understand the customization features CDN vendors offer and, in particular, to observe their default caching behavior. To that end, we created free or trial accounts with four major CDN providers: Akamai, Cloudflare, CloudFront, and Fastly. We only tested the basic content delivery solutions offered by each vendor and did not enable add-on features such as web application firewalls.

We stress that major CDN providers offer rich configuration options, including mechanisms for site owners to programmatically interact with their traffic. A systematic and exhaustive analysis of CDN features and corresponding WCD vectors is an extremely ambitious task beyond the scope of this paper. The results we present in this section are only intended to give high-level insights into how much effort must be invested in setting up a secure and safe CDN environment, and how the defaults behave.

Configuration. All four CDN providers we experimented with offer a graphical interface and APIs for users to set up their origin servers, apply caching rules, and configure how HTTP headers are processed. In particular, all vendors provide ways to honor or ignore Cache-Control headers, and users can choose whether to strip headers or forward them downstream to clients. Users can apply caching decisions and time-to-live values for cached objects based on expressions that match the requested URLs.

Akamai and Fastly configurations are translated to and backed by domain-specific configuration languages, while Cloudflare and CloudFront do not expose their back-end to users. Fastly internally uses Varnish caches, and gives users full control over the Varnish Configuration Language (VCL) that governs their setup. In contrast, Akamai appears to support more powerful HTTP processing features than Varnish, but does not expose all features to users directly. Quoting an Akamai blog post: "Metadata [Akamai's configuration language] can do almost anything, good and bad, which is why WRITE access to metadata is restricted, and only Akamai employees can add metadata to a property configuration directly." [4]

In addition to static configurations, both Akamai and Cloudflare offer mechanisms for users to write programs that execute on the edge server, and dynamically manipulate traffic and caches [2, 16].

In general, while Cloudflare, CloudFront, and Fastly offer free accounts suitable for personal use, they also have paid tiers that lift restrictions (e.g., Cloudflare only supports 3 cache rules in the free tier) and provide professional services support for advanced customization. Akamai strictly operates in the business-to-business market, where configuration is driven by a professional services team, as described above.

Cacheability. Next, we tested the caching behavior of CDN providers with a default configuration. Our observations here are limited to 200 OK responses pertaining to WCD; for an in-depth exploration of caching decisions involving 4xx or 5xx error responses, we refer readers to Nguyen et al. [47]. We summarize our observations in Table 10, which lists the conditions for caching objects in HTTP responses, and whether including the relevant Cache-Control headers prevents caching.

These results show that both Akamai and Cloudflare rely on a predefined list of static file extensions (e.g., .jpg, .css, .pdf, .exe) when making cacheability decisions.
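The delay experiment above probes the attack window empirically; a tester can also estimate it up front from the response's freshness headers. A sketch assuming RFC 7234 semantics (s-maxage or max-age minus Age), and ignoring heuristic freshness and the early eviction under load noted in Section 6.2:

```python
# Estimate how many seconds a cached copy may still be served, from the
# response's Cache-Control and Age headers (RFC 7234 freshness model).

def remaining_freshness(headers: dict) -> int:
    h = {k.lower(): v for k, v in headers.items()}
    directives = {}
    for part in h.get("cache-control", "").split(","):
        part = part.strip().lower()
        if "=" in part:
            name, value = part.split("=", 1)
            directives[name] = value
    # s-maxage takes precedence over max-age for shared caches
    lifetime = int(directives.get("s-maxage", directives.get("max-age", "0")))
    age = int(h.get("age", "0") or 0)
    return max(lifetime - age, 0)
```

The result is only an upper bound on the attacker's window of opportunity; as discussed above, caches may evict entries earlier when under heavy load.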

While Cloudflare allows origin servers to override the decision in both directions via Cache-Control headers, either to cache non-static files or to prevent caching static files, Akamai's default rule applies unconditionally.

CloudFront and Fastly adopt a more aggressive caching strategy: in the absence of Cache-Control headers, all objects are cached with a default time-to-live value. Servers behind CloudFront can prevent caching via Cache-Control headers as expected. However, Fastly only honors the private header value.

6.4 Lessons Learned

The empirical evidence we presented in this section suggests that configuring web caches correctly is not a trivial task. Moreover, the complexity of detecting and fixing a WCD vulnerability is disproportionately high compared to launching an attack.

As we have seen above, many major CDN vendors do not make RFC-compliant caching decisions in their default configurations [21]. Even the more restrictive default caching rules based on file extensions are prone to security problems; for example, both Akamai and Cloudflare could cache dynamically generated PDF files containing tax statements if configured incorrectly. On the other hand, we do not believe that these observations implicate CDN vendors in any way, but instead emphasize that CDNs are not intended to be plug & play solutions for business applications handling sensitive data. All CDNs provide fine-grained mechanisms for caching and traffic manipulation, and site owners must carefully configure and test these services to meet their needs.

We reiterate that, while CDNs may be a prominent component of the Internet infrastructure, WCD attacks impact all web cache technologies. The complexity of configuring CDNs correctly, the possibility of multi-CDN arrangements, and other centralized caches that may be involved all imply that defending against WCD requires site owners to adopt a holistic view of their environment. Traditional security practices such as asset, configuration, and vulnerability management must be adapted to take into consideration the entire communication infrastructure as a system.

From an external security researcher's perspective, the challenge is even greater. As we have also discussed in the cache location and expiration experiments, reasoning about a web cache system's internals in a black-box fashion is a challenging task, which in turn makes it difficult to pinpoint issues before they can be exploited. In contrast, attackers are largely immune to this complexity; they often do not need to disentangle the cache structure for a successful attack. Developing techniques and tools for reliable detection of WCD, and similar web cache attacks, is an open research problem. We believe a combination of systems security and safety approaches would be a promising research direction, which we discuss next as we conclude this paper.

7 Discussion & Conclusion

In this paper, we presented the first large-scale investigation of WCD vulnerabilities in the wild, and showed that many sites among the Alexa Top 5K are impacted. We demonstrated that the vulnerable sites not only leak user PII but also secrets that, once stolen by an attacker, can be used to bypass existing authentication and authorization mechanisms to enable even more damaging web application attack scenarios. Alarmingly, despite the severity of the potential damage, these vulnerabilities still persist more than two years after the public introduction of the attack in February 2017. Similarly, our second experiment showed that in the fourteen months between our two measurements, only 12 out of 16 sites were able to mitigate their WCD vulnerabilities, while the total number of vulnerable sites rose to 25.

One reason for this slow adoption of necessary mitigations could be a lack of user awareness. However, the attention WCD garnered from security news outlets, research communities, official web cache vendor press releases, and even mainstream media also suggests that there may be other contributing factors. In fact, it is interesting to note that there exists no technology or tool proposed to date that allows site operators to reliably determine if any part of their online architecture is vulnerable to WCD, or to close their security gaps. Similarly, there does not exist a mechanism for end-users and web browsers to detect a WCD attack and protect themselves. Instead, countermeasures are largely limited to general guidance by web cache vendors and CDN providers for their users to configure their services in consideration of WCD vectors, and the tools available offer limited manual penetration-testing capabilities for site operators with domain-specific knowledge.

We assert that the above is a direct and natural consequence of the fact that WCD vulnerabilities are a system safety problem. In an environment with WCD vulnerabilities, there are no isolated faulty components; that is, web servers, load balancers, proxies, and caches all individually perform the functionality they are designed for. Similarly, determining whether there is human error involved and, if so, identifying where that lies are both non-trivial tasks. In fact, site operators often have legitimate needs to configure their systems in seemingly hazardous ways. For example, a global corporation operating hundreds to thousands of machines may find it technically or commercially infeasible to revise the Cache-Control header settings of their individual web servers, and may be forced to instruct their CDN provider to perform caching based purely on file names.

These are all strong indicators that the growing ecosystem of web caches, in particular CDN-fronted web applications, and more generally highly-distributed Internet-based architectures, should be analyzed in a manner that captures their security and safety properties as a system.
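Although, as noted above, no tool yet exists for operators to reliably determine WCD exposure, the basic attack sequence is simple enough to self-test. A minimal sketch with a hypothetical fetch callback standing in for real authenticated and unauthenticated HTTP requests; the names and flow are our own illustration, not a published tool:

```python
# WCD self-check: prime the cache with an authenticated, path-confused request
# whose response contains a unique canary, then re-fetch the same URL without
# credentials. If the canary comes back, private content leaked via the cache.

import secrets

def wcd_self_check(fetch, url: str) -> bool:
    canary = secrets.token_hex(16)
    confused = url + "/" + canary + ".css"              # path confusion suffix
    fetch(confused, authenticated=True, canary=canary)  # "victim" request
    body = fetch(confused, authenticated=False)         # "attacker" request
    return canary in body

# Simulated vulnerable edge cache, for demonstration purposes only:
_cache = {}

def vulnerable_fetch(url, authenticated=False, canary=""):
    if authenticated:
        _cache[url] = "account page for alice, canary=" + canary
        return _cache[url]
    return _cache.get(url, "login page")
```

Against the simulated cache, `wcd_self_check(vulnerable_fetch, "https://site.example/account")` returns True; a deployment that refuses to cache the authenticated response would yield False.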

As aforementioned, venerable yet still widely-used root cause analysis techniques are likely to fall short in these efforts, because there is no individual system component to blame for the failure. Instead, security researchers should adopt a systems-centric security analysis, examining not only individual system components but also their interactions, expected outcomes, hazardous states, and accidents that may result. Modeling and analyzing WCD attacks in this way, drawing from the rich safety engineering literature [41], is a promising future research direction that will help the security community understand and address similar systems-level attacks effectively.

Acknowledgments

We thank our shepherd Ben Stock and the anonymous reviewers; this paper is all the better for their helpful feedback. This work was supported by the National Science Foundation under grant CNS-1703454, Secure Business Austria, ONR project "In-Situ Malware Containment and Deception through Dynamic In-Process Virtualization," and EU H2020-SU-ICT-03-2018 Project No. 830929 CyberSec4Europe.

References

[1] Ahmed Aboul-Ela. Sublist3r. https://github.com/aboul3la/Sublist3r.
[2] Akamai Developer. Akamai EdgeWorkers. https://developer.akamai.com/akamai-edgeworkers-overview.
[3] Akamai Developer. Content Caching. https://developer.akamai.com/legacy/learn/Caching/Content_Caching.html.
[4] Akamai Developer – Jay Sikkeland. Advanced Metadata: A Brief Overview. https://developer.akamai.com/blog/2017/04/28/advanced-metadata-brief-overview.
[5] Akamai Technologies. Facts & Figures. https://www.akamai.com/us/en/about/facts-figures.jsp.
[6] Apache HTTP Server Project. Apache HTTP Server Version 2.4 – Caching Guide. https://httpd.apache.org/docs/2.4/caching.html.
[7] Sajjad Arshad, Seyed Ali Mirheidari, Tobias Lauinger, Bruno Crispo, Engin Kirda, and William Robertson. Large-Scale Analysis of Style Injection by Relative Path Overwrite. In International World Wide Web Conference, 2018.
[8] Shay Berkovich. ProxySG and Web Cache Deception. Symantec Connect, 2017. https://www.symantec.com/connect/blogs/proxysg-and-web-cache-deception.
[9] Benjamin Brown. On Web Cache Deception Attacks. The Akamai Blog, 2017. https://blogs.akamai.com/2017/03/on-web-cache-deception-attacks.html.
[10] Deanna D. Caputo, Shari Lawrence Pfleeger, Jesse D. Freeman, and M. Eric Johnson. Going Spear Phishing: Exploring Embedded Training and Awareness. In IEEE Security & Privacy, 2014.
[11] Jianjun Chen, Jian Jiang, Xiaofeng Zheng, Haixin Duan, Jinjin Liang, Kang Li, Tao Wan, and Vern Paxson. Forwarding-Loop Attacks in Content Delivery Networks. In The Network and Distributed System Security Symposium, 2016.
[12] Ka-Hing Cheung. Web Cache Deception Attack revisited. Cloudflare Blog, 2018. https://blog.cloudflare.com/web-cache-deception-attack-revisited/.
[13] Catalin Cimpanu. Web Cache Deception Attack Tricks Servers Into Caching Pages with Personal Data. Bleeping Computer, 2017. https://www.bleepingcomputer.com/news/security/web-cache-deception-attack-tricks-servers-into-caching-pages-with-personal-data/.
[14] Cloudflare. Origin Cache-Control. https://support.cloudflare.com/hc/en-us/articles/115003206852s.
[15] Cloudflare. The Cloudflare Global Anycast Network. https://www.cloudflare.com/network/.
[16] Cloudflare Developers. Cloudflare Workers Documentation. https://developers.cloudflare.com/workers/.
[17] Soroush Dalili. Non-Root-Relative Path Overwrite (RPO) in IIS and .Net Applications, 2015. https://soroush.secproject.com/blog/2015/02/non-root-relative-path-overwrite-rpo-in-iis-and-net-applications/.
[18] Akamai Documentation. Caching, 2019. https://learn.akamai.com/en-us/webhelp/ion/oca/GUID-AAA2927B-BFF8-4F25-8CFE-9D8E920C008F.html.
[19] Julie S. Downs, Mandy B. Holbrook, and Lorrie Faith Cranor. Decision Strategies and Susceptibility to Phishing. In Symposium On Usable Privacy and Security, 2006.
[20] Fastly – Hooman Beheshti. The truth about cache hit ratios. https://www.fastly.com/blog/truth-about-cache-hit-ratios.
[21] Roy T. Fielding, Mark Nottingham, and Julian F. Reschke. Hypertext Transfer Protocol (HTTP/1.1): Caching. IETF – RFC 7234, 2014. https://www.rfc-editor.org/info/rfc7234.
[22] David Fifield, Chang Lan, Rod Hynes, Percy Wegmann, and Vern Paxson. Blocking-Resistant Communication Through Domain Fronting. In Privacy Enhancing Technologies, 2015.
[23] Omer Gil. Web Cache Deception Attack, 2017. https://omergil.blogspot.com/2017/02/web-cache-deception-attack.html.
[24] Omer Gil. Web Cache Deception Attack. Black Hat USA, 2017. https://www.blackhat.com/us-17/briefings.html#web-cache-deception-attack.
[25] Google. Chrome Remote Debugging Protocol. https://chromedevtools.github.io/devtools-protocol/.
[26] Run Guo, Jianjun Chen, Baojun Liu, Jia Zhang, Chao Zhang, Haixin Duan, Tao Wan, Jian Jiang, Shuang Hao, and Yaoqi Jia. Abusing CDNs for Fun and Profit: Security Issues in CDNs' Origin Validation. In IEEE International Symposium on Reliable Distributed Systems, 2018.
[27] Michael Henriksen. AQUATONE. https://github.com/michenriksen/aquatone.
[28] Gareth Heyes. RPO. The Spanner, 2014. http://www.thespanner.co.uk/2014/03/21/rpo/.
[29] John Holowczak and Amir Houmansadr. CacheBrowser: Bypassing Chinese Censorship Without Proxies Using Cached Content. In ACM Conference on Computer and Communications Security, 2015.
[30] Jason Hong. The State of Phishing Attacks. Communications of the ACM, 55(1):74–81, 2012.
[31] Arbaz Hussain. Auto Web Cache Deception Tool, 2017. https://medium.com/@arbazhussain/auto-web-cache-deception-tool-2b995c1d1ab2.
[32] Tom N. Jagatic, Nathaniel A. Johnson, Markus Jakobsson, and Filippo Menczer. Social Phishing. Communications of the ACM, 50(10):94–100, 2007.
[33] XSS Jigsaw. RPO Gadgets, 2016. https://blog.innerht.ml/rpo-gadgets/.
[34] Lin Jin, Shuai Hao, Haining Wang, and Chase Cotton. Your Remnant Tells Secret: Residual Resolution in DDoS Protection Services. In IEEE/IFIP International Conference on Dependable Systems and Networks, 2018.
[35] Josh Kaufman. 10,000 Most Common English Words, 2013. https://github.com/first20hours/google-10000-english.
[36] James Kettle. Detecting and Exploiting Path-Relative Stylesheet Import (PRSSI) Vulnerabilities. PortSwigger Web Security Blog, 2015. https://portswigger.net/blog/detecting-and-exploiting-path-relative-stylesheet-import-prssi-vulnerabilities.
[37] James Kettle. Practical Web Cache Poisoning. PortSwigger Web Security Blog, 2018. https://portswigger.net/blog/practical-web-cache-poisoning.
[38] James Kettle. HTTP Desync Attacks: Request Smuggling Reborn. PortSwigger Web Security Blog, 2019. https://portswigger.net/blog/http-desync-attacks-request-smuggling-reborn.
[39] Sebastian Lekies, Ben Stock, Martin Wentzel, and Martin Johns. The Unexpected Dangers of Dynamic JavaScript. In USENIX Security Symposium, 2015.
[40] Chris Lesniewski-Laas and M. Frans Kaashoek. SSL Splitting: Securely Serving Data from Untrusted Caches. In USENIX Security Symposium, 2003.
[41] Nancy G. Leveson. Engineering a Safer World. The MIT Press, Cambridge, MA, USA, 2011.
[42] Amit Levy, Henry Corrigan-Gibbs, and Dan Boneh. Stickler: Defending against Malicious Content Distribution Networks in an Unmodified Browser. In IEEE Security & Privacy (S&P), 2016.
[43] Joshua Liebow-Feeser. Understanding Our Cache and the Web Cache Deception Attack. Cloudflare Blog, 2017. https://blog.cloudflare.com/understanding-our-cache-and-the-web-cache-deception-attack/.
[44] Nikolaos Michalakis, Robert Soulé, and Robert Grimm. Ensuring Content Integrity for Untrusted Peer-to-Peer Content Distribution Networks. In USENIX Symposium on Networked Systems Design & Implementation, 2007.
[45] Mozilla. MDN web docs – HTTP Cache. https://developer.mozilla.org/en-US/docs/Mozilla/HTTP_cache.
[46] NGINX. NGINX Content Caching. https://docs.nginx.com/nginx/admin-guide/content-cache/content-caching/.
[47] Hoai Viet Nguyen, Luigi Lo Iacono, and Hannes Federrath. Your Cache Has Fallen: Cache-Poisoned Denial-of-Service Attack. In ACM Conference on Computer and Communications Security, 2019.
[48] Mark Nottingham. How (Not) to Control Your CDN, 2017. https://www.mnot.net/blog/2017/06/07/safe_cdn.
[49] Kaan Onarlioglu. Security Researchers Struggle with Bot Management Programs. Dark Reading, 2018. https://www.darkreading.com/perimeter/security-researchers-struggle-with-bot-management-programs/a/d-id/1332976.
[50] OWASP. Amass. https://github.com/OWASP/Amass.
[51] Google Identity Platform. Using OAuth 2.0 to Access Google APIs. https://developers.google.com/identity/protocols/OAuth2.
[52] Kenneth Reitz. Requests: HTTP for Humans. http://docs.python-requests.org/en/master/.
[53] SeleniumHQ. Selenium – Web Browser Automation. https://www.seleniumhq.org/.
[54] Johan Snyman. Airachnid: Web Cache Deception Burp Extender. Trustwave – SpiderLabs Blog, 2017. https://www.trustwave.com/Resources/SpiderLabs-Blog/Airachnid--Web-Cache-Deception-Burp-Extender/.
[55] Squid. Squid: Optimising Web Delivery. http://www.squid-cache.org/.
[56] Volker Stocker, Georgios Smaragdakis, William Lehr, and Steven Bauer. The growing complexity of content delivery networks: Challenges and implications for the Internet ecosystem. Telecommunications Policy, 41(10):1003–1016, 2017.
[57] Takeshi Terada. A Few RPO Exploitation Techniques, 2015. https://www.mbsd.jp/Whitepaper/rpo.pdf.
[58] The Chromium Projects. HTTP Cache. https://www.chromium.org/developers/design-documents/network-stack/http-cache.
[59] Aleksei Tiurin. A Fresh Look On Reverse Proxy Related Attacks, 2019. https://www.acunetix.com/blog/articles/a-fresh-look-on-reverse-proxy-related-attacks.
[60] Sipat Triukose, Zakaria Al-Qudah, and Michael Rabinovich. Content Delivery Networks: Protection or Threat? In European Symposium on Research in Computer Security, 2009.
[61] Orange Tsai. A New Era of SSRF - Exploiting URL Parser in Trending Programming Languages! Black Hat USA, 2017. https://www.blackhat.com/us-17/briefings.html#a-new-era-of-ssrf-exploiting-url-parser-in-trending-programming-languages.
[62] Orange Tsai. Breaking Parser Logic: Take Your Path Normalization off and Pop 0days Out! Black Hat USA, 2018. https://www.blackhat.com/us-18/briefings/schedule/index.html#breaking-parser-logic-take-your-path-normalization-off-and-pop-days-out-10346.
[63] Mark Tsimelzon, Bill Weihl, Joseph Chung, Dan Frantz, John Brasso, Chris Newton, Mark Hale, Larry Jacobs, and Conleth O'Connell. ESI Language Specification 1.0. World Wide Web Consortium (W3C), 2001. https://www.w3.org/TR/esi-lang.
[64] Varnish. Varnish HTTP Cache. https://varnish-cache.org/.
[65] Thomas Vissers, Tom Van Goethem, Wouter Joosen, and Nick Nikiforakis. Maneuvering Around Clouds: Bypassing Cloud-based Security Providers. In ACM Conference on Computer and Communications Security, 2015.
[66] David Y. Wang, Stefan Savage, and Geoffrey M. Voelker. Cloak and Dagger: Dynamics of Web Search Cloaking. In ACM Conference on Computer and Communications Security, 2011.
[67] Hadi Zolfaghari and Amir Houmansadr. Practical Censorship Evasion Leveraging Content Delivery Networks. In ACM Conference on Computer and Communications Security, 2016.
