
NVMe-Based Caching In
Video-Delivery CDNs

Thomas Colonna (1901942)
Double Degree, INSA Rennes
Supervisor: Sébastien Lafond
Faculty of Science and Engineering
Åbo Akademi University
2020
In HTTP-based video-delivery CDNs (content delivery networks), a critical component is the caching servers that serve clients with content obtained from an origin server. These caches keep the content they obtain in RAM or on disk so that additional clients can be served without fetching it from the origin again. For most use cases, disk access remains the limiting factor, so achieving good performance requires a significant amount of RAM to avoid these accesses, which increases the cost.
In this master's thesis, we benchmark various approaches to providing storage, such as regular disks and NVMe-based SSDs. Based on these insights, we design a caching module for a web server that relies on kernel bypass, implemented using the reference framework SPDK.
The outcome of the master's thesis is a caching module leveraging specific properties of NVMe disks, together with benchmark results for the various types of disks under the two approaches to caching (i.e., regular file-system-based and NVMe-specific).

Contents

  • 1 Introduction
  • 2 Background
    • 2.1 Caching in the context of CDNs
    • 2.2 Performance of the different disk models
      • 2.2.1 Hard-Disk Drive
      • 2.2.2 Random-Access Memory
      • 2.2.3 Solid-State Drive
      • 2.2.4 Non-Volatile Main Memory
      • 2.2.5 Performance comparison of 2019-2020 storage devices
    • 2.3 Analysing Nginx
      • 2.3.1 Event processing
      • 2.3.2 Caching with Nginx
      • 2.3.3 Overhead
      • 2.3.4 Kernel bypasses
  • 3 Related Work
    • 3.1 Kernel bypass for network
    • 3.2 Kernel bypass for storage
    • 3.3 Optimizing the existing access to the storage
    • 3.4 Optimizing the application
    • 3.5 Optimizing the CDN
  • 4 Technical Choices
    • 4.1 First measurements
    • 4.2 Data visualization
    • 4.3 Disk performance comparison
  • 5 NVMe-specific caching module
    • 5.1 Architecture of the module
      • 5.1.1 Asynchronous programming
    • 5.2 Integration with Nginx
    • 5.3 Experimental protocol
    • 5.4 Results
  • 6 Conclusion
    • 6.1 Future work

1. Introduction

Broadpeak is a company created in 2010 that designs and provides components for content delivery networks. The company focuses on content delivery networks for video content, such as IPTV, cable, and on-demand video. Broadpeak provides its services to content providers and internet service providers such as Orange.
The classical ways of consuming videos and films, broadcasts and DVDs, have become less popular since the emergence of services providing on-demand video via the internet. To provide this kind of service, technologies such as the HLS [1] and DASH [2] protocols were created. These protocols build on standards and protocols that already existed for the web, allowing video streaming to reuse a large part of the infrastructure created for the web. HLS and DASH contributed to popularizing on-demand video streaming by allowing a large number of devices (smartphones, laptops, televisions...) to access the system and by creating a standard ensuring interoperability between all the devices and servers in the network.
Now the focus is on creating systems able to manage the increasing number of users switching from traditional television to OTT (Over The Top) streaming because of the advantages of this technology: choice, replay, time-shifting... The lockdowns during the COVID-19 pandemic showed the importance of having optimized and scalable systems able to handle huge demand for content. In the context of video distribution, one key element is the Content Delivery Network (CDN). The role of the CDN is to deliver the content once the user is connected to it. Because video content is huge (e.g., 1.8 GB for an hour of video delivered at 4 Mbps), the pressure on CDN throughput is higher for video than for any other content. Note that this trend is strengthening, as the shift to large screens and Ultra HD doubles the video bitrate. In addition, video CDNs have stringent performance requirements: a CDN must be able to store this huge amount of data and at the same time answer user requests at a constant rate. The constant rate is important because even the slightest fluctuation can pause playback for one or more users, which has a huge negative impact on the user experience.
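As a quick sanity check of that figure, an hour of video at 4 Mbps amounts to

4 Mb/s × 3600 s = 14 400 Mb = 1800 MB = 1.8 GB,

which matches the number above.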
Video streaming relies heavily on I/O operations, and these I/O operations are the bottleneck of a Content Delivery Network (CDN). To increase the performance of the system, we can increase the number of servers in the CDN and, thus, its cost and energy consumption. Improving I/O performance instead allows reducing the number of servers without losing performance, leading to lower cost and lower energy consumption.
Concerning I/O, with the evolution of storage-device technology, the software overhead, negligible until now, is becoming huge compared to the time spent in the disk, especially the overhead related to the isolation between the kernel and user space. This isolation requires a context switch every time the storage system is accessed [3]. The POSIX API used to access files is another element that slows down the process.
In this master’s thesis, we analyze the performance of a streaming server using standard modules of Nginx. We show the time distribution between the kernel and the disk, and compare it to a configuration where an NVMe disk is used to illustrate the impact of the overhead (Chapter 4). We present a new Nginx caching module using SPDK [4], aiming to eliminate this overhead when using NVMe disks (Chapter 5).
The design of our module is similar to EvFS [5], but we specialized it for caching purposes and do not expose a POSIX API. We moved the API for interacting with the storage device out of the kernel, reducing the overhead due to the file-system API. We also designed it with the caching server's specific workload in mind, to maximize the lifetime of the disks' flash memory. Several projects aim at a similar objective: BlueStore [6] and NVMeDirect [7] are two examples.


2. Background

2.1 Caching in the context of CDNs

A Content Delivery Network (CDN) is a network of servers located in various geographical places [8]. Its purpose is to quickly deliver content such as web pages, files, and videos. The infrastructure of a CDN is designed to reduce the network load and the server load, improving the performance of the applications that use it [9].
The infrastructure of a CDN relies heavily on the principle of caching. Web caching is a technology developed to allow websites and applications to cope with the increasing number of internet users [10, 11, 12, 13]. Web caching can reduce bandwidth consumption, improve load balancing, reduce network latency, and provide higher availability.
To achieve caching, we need at least two servers: the origin server and the caching server. When a user requests some content from the server, the request first goes through the caching server, which checks whether the requested content is already in the cache. If it is, the caching server provides the content to the user. If not, the caching server requests the content from the origin server that stores the data, provides it to the user, and keeps a copy of the content for the next time someone requests it, as illustrated in Figure 2.1.
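To make the principle concrete, here is a minimal, self-contained C sketch of that decision logic (the table layout and the names serve, origin_fetch, etc. are hypothetical illustrations, not the implementation of an actual caching server):

#include <stdio.h>
#include <string.h>

#define CACHE_SLOTS 128

struct entry { char key[64]; char data[256]; int used; };
static struct entry cache[CACHE_SLOTS];

/* Stand-in for a request to the origin server. */
static void origin_fetch(const char *key, char *out, size_t len) {
    snprintf(out, len, "content-for-%s", key);
}

/* Serve one request: answer from the cache on a hit, otherwise fetch
 * from the origin and keep a copy for the next client. */
static const char *serve(const char *key) {
    unsigned h = 0;
    for (const char *p = key; *p; p++) h = h * 31u + (unsigned char)*p;
    struct entry *e = &cache[h % CACHE_SLOTS];

    if (!e->used || strcmp(e->key, key) != 0) {      /* cache miss */
        snprintf(e->key, sizeof e->key, "%s", key);
        origin_fetch(key, e->data, sizeof e->data);  /* ask the origin */
        e->used = 1;                                 /* keep a copy */
    }
    return e->data;                                  /* serve the content */
}

int main(void) {
    printf("%s\n", serve("segment-42.ts"));  /* miss: fetched from origin */
    printf("%s\n", serve("segment-42.ts"));  /* hit: served from the cache */
    return 0;
}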
Using the same principle, a CDN is composed of one or several origin servers, which store the content to be delivered, and multiple caching servers spread across the geographical area it serves, as shown in Figure 2.2. When a user requests content from the CDN, the request goes through the closest caching server. Thus, the latency is greatly reduced, because the data does not need to travel around the world, and the workload is spread among all the servers.
This infrastructure is widely used by the industry because of its efficiency, reliability, and performance. However, in some cases this is not enough, as in a video CDN. In the context of a video CDN, objects are large, so the request payload is larger and the number of requests is smaller. Since there are fewer requests to serve, the load on the CPU is reduced. Because the content to deliver is larger, a video CDN requires more storage capacity. To improve the performance of these systems, we can leverage the technology used to store the data.

Figure 2.1: Caching diagram.

2.2 Performance of the different disk models

2.2.1 Hard-Disk Drive

The Hard-Disk Drive (HDD) is a permanent storage technology made of rotating magnetic platters that store the information. To read or write data with this kind of storage, the drive needs to place its read/write head over the correct location before it can start transferring the data.
The main advantage of this kind of storage is that it is a well-known technology, and devices can be produced that store a huge amount of information. HDDs have the best capacity/price ratio, making them appealing for use in caching servers. However, due to the multiple steps required to locate the data before reading or writing, this technology has high latency. The latency is even greater when accessing small files, which in a video-CDN context could be the video manifests or the subtitle files. The limited number of read heads installed on an HDD also increases the latency of concurrent accesses to the data stored on it.
Before the popularization of the technologies presented below, some techniques were developed to improve the performance of HDDs. Some strategies for writing to disk perform better than others: it is more efficient to split a write across several disks than to write it to a single disk [14]. The main inconvenience of this technique is that it requires more disks while the storage capacity is not increased. This principle is used in RAID (Redundant Array of Independent Disks) technology. It requires a large number of disks and greatly improves all aspects of a storage system's performance: speed, reliability, and availability. However, this system still performs worse than the other systems presented below.

Figure 2.2: CDN diagram.
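As a toy illustration of the striping principle just described (ordinary files stand in for disks; this is not a real RAID implementation), the sketch below writes chunk i of a buffer to disk i mod N, so the writes land on independent devices that can work in parallel:

#include <stdio.h>
#include <string.h>

#define NDISKS 4
#define CHUNK  4   /* bytes per stripe unit; real systems use KiB-sized chunks */

/* Write each chunk to the next disk in round-robin order (RAID-0-style). */
static void stripe_write(const char *buf, size_t len, FILE *disks[]) {
    for (size_t off = 0, i = 0; off < len; off += CHUNK, i++) {
        size_t n = (len - off < CHUNK) ? len - off : CHUNK;
        fwrite(buf + off, 1, n, disks[i % NDISKS]);   /* chunk i -> disk i mod N */
    }
}

int main(void) {
    FILE *disks[NDISKS];
    for (int i = 0; i < NDISKS; i++) {        /* files stand in for disks */
        char name[32];
        snprintf(name, sizeof name, "disk%d.img", i);
        disks[i] = fopen(name, "wb");
        if (!disks[i]) { perror("fopen"); return 1; }
    }
    const char *data = "0123456789abcdef";    /* 16 bytes -> 4 chunks */
    stripe_write(data, strlen(data), disks);
    for (int i = 0; i < NDISKS; i++) fclose(disks[i]);
    return 0;
}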

2.2.2 Random-Access Memory

The Random-Access Memory (RAM) is a kind of data storage that is used to store "working data". It is mainly used for this purpose because it is volatile memory, meaning that when the power supply of the system is shut off, all the data stored in the RAM is lost. In the context of a caching server, since the server is supposed to be constantly running, this is not a real problem, and a backup policy can be implemented to periodically copy all the data stored in the RAM to a more reliable storage device.
The interest of storing data in the RAM is its extremely low latency: it is the fastest memory technology (excluding CPU caches). On a caching server delivering video to a large number of clients, the performance gained by storing the data in RAM instead of on HDDs is crucial. This is why some server models have terabytes of RAM. However, although RAM performs well, it is also orders of magnitude more expensive than HDDs, as shown in Table 2.1.

2.2.3 Solid-State Drive

Solid-State Drives (SSDs) are a newer type of storage device, built from electronic components, namely NAND flash memory. Unlike the RAM, SSDs are non-volatile memory storage. They are still slower to access than the RAM, but they are orders of magnitude faster than HDDs. SSDs are now so fast that the protocol used to interact with the disks has become the bottleneck.
To interact with storage disks, a computer must use a protocol supported by the disks. Today there are multiple protocols; the most used are the Serial AT Attachment (SATA) and the Serial Attached SCSI (SAS). These protocols were designed with the slow Hard-Disk Drive (HDD) in mind. To use the full potential of SSDs, a new protocol was created: the Non-Volatile Memory Host Controller Interface Specification, or NVM Express (NVMe). This protocol requires the disk to be connected to the computer through PCI Express (PCIe). It allows more efficient access to the data: higher speed and better parallel access. With this new protocol, SSDs achieve great performance while remaining cheaper than RAM.
Since then, some hardware manufacturers have made significant progress and created disks that can be considered Ultra-Low-Latency SSDs [15, 16]. These disks are designed to exploit the NVMe protocol to its fullest.

2.2.4 Non-Volatile Main Memory

Non-Volatile Main Memory (NVMM) is a class of storage devices composed of flash memory, like SSDs, but connected to the computer through the DIMM slots. One example of this kind of storage is the Intel Optane DIMM [17].

2.2.5 Performance comparison of 2019-2020 storage devices

Table 2.1 shows the characteristics and performance of eight NVMe SSDs compared to standard RAM. Even though the RAM has the best performance, it comes at a high cost. To build a caching server with a large capacity, NVMe SSDs are a viable solution thanks to their price and capacity.

2.3 Analysing Nginx

Nginx is an asynchronous web server. Two large CDN and related operators, (i) Cloudflare and (ii) Netflix, have publicly stated that they use Nginx. It is also used as the load balancer for HTTP services in Kubernetes. It can fulfill multiple roles: web server, reverse proxy, load balancer, mail proxy, and HTTP cache.

2.3.1 Event processing

Nginx uses a multi-process pattern in which a master process controls worker processes. This allows reloading the configuration of the server without interrupting the service. Nginx is asynchronous: each request made to the server creates an event that is posted to the list of events, and the event loop picks one event and starts processing the associated request. The pseudo-code of the event loop is presented in Figure 2.3.

while (true) {
    next_timer = get_next_timer();
    wait_for_io_events(XX, next_timer - now);
    process_timer_event();
    process_list_event();
}

Figure 2.3: Pseudo-code of the event loop of Nginx
The main advantage of this programming model is that requests are non-blocking: whenever a request is waiting for further I/O (reading from or writing to disk or network), it does not block the process, so other requests can be processed in the meantime, minimizing the CPU time spent doing nothing.
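On Linux, such a loop is typically built on an I/O-multiplexing call like epoll. The fragment below is a minimal, self-contained sketch of the pattern (a single pipe stands in for client connections and the timer is fixed at 100 ms; this is not Nginx's actual code):

#include <stdio.h>
#include <sys/epoll.h>
#include <unistd.h>

int main(void) {
    int pipefd[2];
    if (pipe(pipefd) < 0) { perror("pipe"); return 1; }

    int ep = epoll_create1(0);                        /* the event list */
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = pipefd[0] };
    epoll_ctl(ep, EPOLL_CTL_ADD, pipefd[0], &ev);     /* register interest */

    write(pipefd[1], "x", 1);                         /* simulate a request */

    for (int iter = 0; iter < 2; iter++) {
        struct epoll_event events[16];
        /* Sleep until some I/O is ready or the next timer expires;
         * a request waiting for I/O never blocks the whole process. */
        int n = epoll_wait(ep, events, 16, 100 /* ms */);
        if (n == 0) { printf("timer fired, no I/O ready\n"); continue; }
        for (int i = 0; i < n; i++) {                 /* process ready events */
            char buf[8];
            ssize_t r = read(events[i].data.fd, buf, sizeof buf);
            printf("processed an event (%zd byte(s))\n", r);
        }
    }
    close(pipefd[0]); close(pipefd[1]); close(ep);
    return 0;
}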

2.3.2 Caching with Nginx

Nginx is often used to build a caching server. The standard cache implemented in Nginx uses the abstraction of a file system to store the data to cache. The cache can therefore be stored on any connected storage device, even the RAM via a RAMFS (a file system located in the RAM of the computer). To access the cache, Nginx must use a system call to ask the kernel to retrieve the data. The goal of this cache is to be deployable everywhere; to achieve that, it does not rely on any special properties of the hardware of the storage device.
Since Nginx is modular, some modules implement different caching mechanisms. For instance, "nginx_tcache" is a module developed for Tengine [18], a fork of Nginx. This module keeps its cache in RAM: it stores the data to cache in a large RAM segment allocated at the start of the server. Since the cache is stored in RAM, all the cached data is lost whenever the server is shut down. The same drawback applies to the standard cache of Nginx if we choose to use a RAMFS.
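A minimal sketch of this RAM-segment approach, assuming nothing about tcache's real internals: reserve one large anonymous segment at startup and place cached objects inside it. Everything in the segment disappears with the process, which is exactly the drawback described above.

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void) {
    size_t cache_size = 1UL << 30;   /* one large RAM segment, e.g. 1 GiB */
    char *cache = mmap(NULL, cache_size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (cache == MAP_FAILED) { perror("mmap"); return 1; }

    /* A real module would run its own allocator inside the segment;
     * here we simply drop one object at the start of it. */
    const char *object = "HTTP/1.1 200 OK ... video segment bytes";
    memcpy(cache, object, strlen(object) + 1);
    printf("cached: %s\n", cache);

    munmap(cache, cache_size);       /* on shutdown, all cached data is gone */
    return 0;
}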

2.3.3 Overhead

The role of the operating system is to provide an abstraction of the hardware, allowing applications to run without worrying about the hardware they run on. This abstraction provides an API usable by any program launched on the computer. On UNIX-based systems this API is called POSIX and is used by almost all software developed for UNIX systems.
However, when a piece of software needs to use a specific feature of a hardware component, or needs faster access to memory or the network, this API is not efficient, mostly because it is too generic to expose special features.
Each time we want to access a file on a disk or send a request over the network, the program executes a system call, i.e., it calls a function provided by the kernel of the operating system. This call requires a context switch from user space, the area of memory the applications can use, to kernel space, the area of memory the kernel of the operating system and its extensions use. This context switch is heavy on the CPU, especially when done frequently; a large number of cycles are lost because of it.
When a program executes a system call, the calling thread initializes some registers and is then suspended to allow the kernel to execute the system call. Once the call is finished, the kernel restores the context of the suspended thread where it was stopped and resumes its execution. If the system call is an I/O operation, the suspension of the thread can be long, leading to a significant loss of performance.
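To make the cost concrete, each call marked below crosses the user/kernel boundary once, and a cache built on the file-system abstraction pays this price on every access. A minimal sketch (the file path is purely illustrative):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* open() is a system call: the thread traps into the kernel and
     * its user-space context is saved until the call returns. */
    int fd = open("/etc/hostname", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    char buf[4096];
    /* read() is another system call; if the data is not already in the
     * page cache, the thread stays suspended for the whole disk access. */
    ssize_t n = read(fd, buf, sizeof buf);
    if (n < 0) { perror("read"); close(fd); return 1; }

    printf("read %zd bytes; each call above cost a context switch\n", n);
    close(fd);   /* a third system call */
    return 0;
}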

2.3.4 Kernel bypasses

Some use cases do not need the abstraction of the operating system and are considerably slowed down by repeated context switches, for instance when an application makes intensive use of the network. One available solution to improve performance is to bypass the kernel: the application directly uses the hardware it needs without asking the kernel, so there is no context switch, leading to a significant improvement in the performance of the application. Software with intensive use of the network frequently uses this solution; Open vSwitch [19], a software switch, uses DPDK to improve its performance.
Without the kernel API to access the network or the storage disks, the application performs better, but it is more difficult to develop, because it must handle everything itself, even security, which is usually handled by the kernel.
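To illustrate what the storage side of kernel bypass looks like, the sketch below issues a single read through SPDK's user-space NVMe driver and polls for the completion. It is a minimal sketch based on SPDK's public API (spdk_nvme_probe, spdk_nvme_ns_cmd_read, spdk_nvme_qpair_process_completions); device selection and error handling are reduced to the bare minimum, and this is not the module developed in this thesis:

#include <stdbool.h>
#include "spdk/env.h"
#include "spdk/nvme.h"

static struct spdk_nvme_ctrlr *g_ctrlr;
static struct spdk_nvme_ns *g_ns;
static volatile bool g_done;

static bool probe_cb(void *ctx, const struct spdk_nvme_transport_id *trid,
                     struct spdk_nvme_ctrlr_opts *opts) {
    return true;                       /* attach to the first NVMe device found */
}

static void attach_cb(void *ctx, const struct spdk_nvme_transport_id *trid,
                      struct spdk_nvme_ctrlr *ctrlr,
                      const struct spdk_nvme_ctrlr_opts *opts) {
    g_ctrlr = ctrlr;
    g_ns = spdk_nvme_ctrlr_get_ns(ctrlr, 1);         /* namespace 1 */
}

static void read_done(void *arg, const struct spdk_nvme_cpl *cpl) {
    g_done = true;                     /* completion observed by polling */
}

int main(void) {
    struct spdk_env_opts opts;
    spdk_env_opts_init(&opts);
    if (spdk_env_init(&opts) < 0) return 1;

    /* Claim an NVMe device from user space: no kernel driver on the I/O path. */
    if (spdk_nvme_probe(NULL, NULL, probe_cb, attach_cb, NULL) != 0 || !g_ns)
        return 1;

    struct spdk_nvme_qpair *qpair =
        spdk_nvme_ctrlr_alloc_io_qpair(g_ctrlr, NULL, 0);
    void *buf = spdk_zmalloc(4096, 4096, NULL,       /* DMA-able buffer */
                             SPDK_ENV_SOCKET_ID_ANY, SPDK_MALLOC_DMA);

    /* Submit a one-block read at LBA 0, then poll for its completion:
     * no system call, no context switch, no interrupt involved. */
    spdk_nvme_ns_cmd_read(g_ns, qpair, buf, 0, 1, read_done, NULL, 0);
    while (!g_done)
        spdk_nvme_qpair_process_completions(qpair, 0);

    spdk_free(buf);
    spdk_nvme_ctrlr_free_io_qpair(qpair);
    return 0;
}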


3. Related Work

3.1 Kernel bypass for network

One of the first uses of kernel bypass was in networking. The DPDK (Data Plane Development Kit) project is a popular project used to achieve kernel bypass for networking. The F-Stack project is also frequently used for kernel bypass for networking.

3.2 Kernel bypass for storage

This project is not the first to use kernel bypass for storage. Ceph's BlueStore [6] is one example; however, its main goal is not to serve as a memory cache but to store data permanently.
EvFS [5] is another kernel-bypass storage project. Its goal is to provide a POSIX API in user space, meaning that a program that needs access to data on an NVMe drive does not need to make a system call to retrieve it, eliminating the overhead. Exposing a POSIX API makes it easier to integrate into pre-existing systems.
The NVMeDirect [7] project aims to give applications direct access to the NVMe SSD to improve their performance. With NVMeDirect, other applications can still use the default I/O stack to access the storage device, whereas the SPDK framework claims the whole device, making it impossible to access the storage device without using SPDK. The project also provides various features that make interaction with the storage device easier and more flexible.

3.3 Optimizing the existing access to the storage

The I/O stack and the abstractions developed with it were created when storage devices were slow compared to the CPU. Some projects are trying to modernize the kernel's existing I/O stacks to use the full potential of the new storage technologies.

Enberg, Rao, and Tarkoma proposed a new structure for the operating system, which they call the parakernel [20]. The role of this structure is to partition the hardware of the computer so that different applications can use it at the same time. Thanks to this separation, it is easier to parallelize access to I/O devices. With this structure, a program that needs some resources asks the parakernel for access, and the parakernel isolates a part of the device and grants the program access to it. The program can then interact with the device directly, without involving the kernel.
Lee, Shin, Song, Ham, Lee, and Jeong are creating a new I/O stack for NVMe devices, optimized for low-latency NVMe SSDs such as the Intel Optane [16] or the Samsung Z-SSD [15]. They do not create a new API; they only change the implementation of some functions of the POSIX standard to take advantage of the hardware. To improve performance, they developed a special block I/O layer, lighter and faster but only usable with NVMe SSDs. They also overlapped the creation of the structures necessary to store a request's result with the data transfer from the device to memory, and they implemented several lazy mechanisms.
