The Performance Impact of Kernel Prefetching on Buffer Cache Replacement Algorithms∗

Ali R. Butt, Chris Gniady, and Y. Charlie Hu
Purdue University, West Lafayette, IN 47907
{butta, gniady, ychu}@purdue.edu

ABSTRACT
A fundamental challenge in improving the file system performance is to design effective block replacement algorithms to minimize buffer cache misses. Despite the well-known interactions between prefetching and caching, almost all buffer cache replacement algorithms have been proposed and studied comparatively without taking into account file system prefetching, which exists in all modern operating systems. This paper shows that such kernel prefetching can have a significant impact on the relative performance, in terms of the number of actual disk I/Os, of many well-known replacement algorithms; it can not only narrow the performance gap but also change the relative performance benefits of different algorithms. These results demonstrate the importance for buffer caching research to take file system prefetching into consideration.

Categories and Subject Descriptors
D.4.8 [Operating Systems]: Performance—Measurements, Simulation

General Terms
Design, Experimentation, Measurement, Performance

Keywords
Buffer caching, prefetching, replacement algorithms

∗This work is supported in part by the U.S. National Science Foundation under CAREER award ACI-0238379.

1. INTRODUCTION
A critical problem in improving file system performance is to design an effective block replacement algorithm for the buffer cache. Over the years, developing such algorithms has remained one of the most active research areas in operating systems design. The oldest and yet still widely used replacement algorithm is the Least Recently Used (LRU) replacement policy [9]. The effectiveness of LRU comes from the simple yet powerful principle of locality: recently accessed blocks are likely to be accessed again in the near future. Numerous other replacement algorithms have been developed [2, 11, 12, 13, 14, 18, 19, 21, 25, 27, 30, 33, 34]. However, the vast majority of these replacement algorithm studies used trace-driven simulations and used the cache hit ratio as the sole performance metric in comparing different algorithms.

Prefetching is another highly effective technique for improving the I/O performance. The main motivation for prefetching is to overlap computation with I/O and thus reduce the exposed latency of I/Os. One way to induce prefetching is via user-inserted hints of I/O access patterns which are then used by the file system to perform asynchronous I/Os [7, 8, 32]. Since prefetched disk blocks need to be stored in the buffer cache, prefetching can potentially compete for buffer cache entries. The close interactions between prefetching and caching that exploit user-inserted hints have also been studied [7, 8, 32]. However, such user-inserted hints place a burden on the programmer, as the programmer has to accurately identify the access patterns of the application.

File systems in most modern operating systems implement prefetching transparently by detecting sequential patterns and issuing asynchronous I/Os. In addition, file systems perform synchronous read-ahead where requests are clustered to 64KB (typically) to amortize seek costs over larger reads. As in the user-inserted hints scenario, such kernel-driven prefetching also interacts with and potentially affects the performance of the buffer caching algorithm being used. However, despite the well-known potential interactions between prefetching and caching [7], almost all buffer cache replacement algorithms have been proposed and studied comparatively without taking into account the kernel-driven prefetching [2, 13, 18, 19, 25, 27, 30, 33, 34].
In this paper, we perform a detailed simulation study of the impact of kernel prefetching on the performance of a set of representative buffer cache replacement algorithms developed over the last decade. Using a cache simulator that faithfully implements the kernel prefetching of Linux, we compare different replacement algorithms in terms of the miss ratio, the actual number of aggregated synchronous and asynchronous disk I/O requests issued from the kernel to the disk driver, as well as the ultimate performance measure – the actual running time of applications using an accurate disk simulator, DiskSim [15]. Our study shows that the widely used kernel prefetching can indeed have a significant impact on the relative performance of different replacement algorithms. In particular, the findings and contributions of this paper are:

• We develop a buffer caching simulator that faithfully implements the Linux kernel prefetching to allow the performance study of different replacement algorithms in a realistic environment;

• We show how to adapt different cache replacement algorithms to exploit kernel prefetching to minimize disk I/Os while preserving the nature of these replacement algorithms;

• We find that kernel prefetching can not only significantly narrow the performance gap of different replacement algorithms, it can even change the relative performance benefits of different algorithms;

• We present results demonstrating that the hit ratio is far from a definitive metric in comparing different replacement algorithms, the number of aggregated disk I/Os gives much more accurate information of disk I/O load, but the actual application running time is the only definitive performance metric in the presence of asynchronous kernel prefetching.

The outline of the paper is as follows. Section 2 describes kernel prefetching in Linux and in 4.4BSD. Section 3 shows the potential impact of kernel prefetching on buffer caching algorithms using Belady's algorithm as an example. Section 4 summarizes the various buffer cache replacement algorithms that are evaluated in this paper. Section 5 presents trace-driven simulation results of performance evaluation and comparison of the studied replacement algorithms. Finally, Section 6 discusses additional related work, and Section 7 concludes the paper.

2. PREFETCHING IN FILE SYSTEM
In this section, we describe the kernel prefetching mechanisms in Linux and 4.4BSD.

2.1 Kernel Prefetching in Linux
The file system accesses from a program are processed by multiple kernel subsystems before any I/O request is actually issued to the disk. Figure 1 shows the various steps that a file system access has to go through before it is issued as a disk request. The first critical component is the buffer cache, which can significantly reduce the number of on-demand I/O requests that are issued to the components below. For sequentially accessed files, the kernel also attempts to prefetch consecutive blocks from the disk to amortize the cost of on-demand I/Os. Moreover, the kernel has a clustering facility that attempts to increase the size of a disk I/O to the size of a cluster – a cluster is a set of file system blocks that are stored on the disk contiguously. As the cost of reading a block or the whole cluster is comparable, the advantage of clustering is that it provides prefetching at minimal cost.

Figure 1: Various kernel components on the path from file system operations to the disk. (The diagram shows the path: file system access → buffer cache → kernel I/O prefetching → kernel I/O clustering → disk driver → disk.)

Prefetching in the Linux kernel is beneficial for sequential accesses to a file, i.e., accesses to consecutive blocks of that file. When a file is not accessed sequentially, prefetching can potentially result in extra I/Os by reading data that is not used. For this reason, it is critical for the kernel to make its best guesses of whether future accesses are sequential, and decide whether to perform prefetching.

The Linux kernel decides on prefetching by examining the pattern of accesses to the file, and only considers prefetching for read accesses. To simplify the description, we assume an access is to one block only. Although an access (system call) can be to multiple consecutive blocks, the simplification does not change the behavior of the prefetching algorithm.
On the first access (A1) to a file, the kernel has no information about the access pattern. In this case, the kernel resorts to conservative prefetching¹; it reads the on-demand accessed block and prefetches a minimum number of blocks following the on-demand accessed block. The minimum number of blocks prefetched is at least one, and is typically three. This prefetching is called synchronous prefetching, as the prefetched blocks are read along with the on-demand accessed block. The blocks that are prefetched are also referred to as a read-ahead group. The kernel remembers the current read-ahead group per file and updates it on each access to the file. Note that as A1 was the first access, no blocks were previously prefetched for this file, and thus the previous read-ahead group was empty.

¹An exception is that if A1 is to the first block in the file, the kernel assumes the following accesses to be sequential.

The next access (A2) may or may not be sequential with respect to A1. If A2 accesses a block that the kernel has not already prefetched, i.e., the block is not in A1's read-ahead group, the kernel decides that prefetching was not useful and resorts to conservative prefetching as described above. However, if the block accessed by A2 is in the previous read-ahead group, showing that prefetching was beneficial, the kernel decides that the file is being accessed sequentially and performs more aggressive prefetching. The size of the previous read-ahead group, i.e., the number of blocks that were previously prefetched, is doubled to determine the number of blocks (N) to be prefetched on this access. However, N is never increased beyond a pre-specified maximum (usually 32 blocks). The kernel then attempts to prefetch the N contiguous blocks that follow the blocks in the previous read-ahead group in the file. This prefetching is called asynchronous prefetching as the on-demand block is already prefetched, and the new prefetching requests are issued asynchronously. In any case, the kernel updates the current read-ahead group to contain the blocks prefetched on the current access.

So far, we used the previous read-ahead group to determine whether a file is being accessed sequentially or not. After the prefetching associated with A2 is also issued, the next access can benefit from prefetching if it either accesses a block in A2's read-ahead group or in A1's read-ahead group. For this reason, the kernel also defines a read-ahead window which contains the current read-ahead group as well as the previous read-ahead group. The read-ahead window is updated at the completion of an access and its associated prefetching, and is used for the access that immediately follows.

Any further (on-demand) access to the file falls into one of the following two cases: (i) the block is within the read-ahead window and has already been prefetched, justifying further asynchronous prefetching; or (ii) the block is outside the read-ahead window and synchronous prefetching is invoked.

Figure 2 (adapted from [4]) shows the adjustment of the read-ahead group and the read-ahead window under synchronous and asynchronous prefetching. In Figure 2(a), the light-gray area represents the previous read-ahead group, the dark-gray area represents the current read-ahead group, and the read-ahead window is also shown at the completion of an access. If the next access (shown as black) is to a block outside the read-ahead window, synchronous prefetching is employed, and the read-ahead window is reset to the new read-ahead group (Figure 2(b)). If the next access is to a block within the read-ahead window, asynchronous prefetching is employed, and the read-ahead group and read-ahead window are updated accordingly (Figure 2(c)).

Figure 2: Prefetching in Linux. Shaded regions show blocks of the file that are cached and the black region shows the block being accessed. Also shown are the read-ahead window and read-ahead group. (Panels: (a) before prefetching, (b) after synchronous prefetching, (c) after asynchronous prefetching.)

Note that in both synchronous and asynchronous prefetching, it is possible that some blocks in the coverage of the read-ahead group already exist in the cache, in which case they are not prefetched, but are included in the current read-ahead group.

A subtlety with kernel prefetching is that prefetching of multiple blocks is atomic, but the creation of ready entries in the cache is done one by one. Linux first sequentially creates entries in the cache, marking all entries locked, implying I/O has not completed on them. It then issues prefetching which results in these blocks being read either at once for synchronous prefetching or sequentially for asynchronous prefetching. It marks the entries unlocked as the blocks arrive from the disk. In any case, a newly created entry will never be evicted before it is marked as unlocked.

Two special situations can occur in asynchronous prefetching. First, before the actual I/O operation on a prefetched block is completed, an on-demand access to the block may arrive. When this happens, the kernel does not perform any further prefetching. This is because the disk is already busy serving the prefetched block, and thus starting additional prefetching may over-load the disk [4]. Second, it is possible that an on-demand accessed block is found in the read-ahead window but not in the current read-ahead group, i.e., it is found in the previous read-ahead group. Further prefetching is not performed in this case either. For example, if the next access after the situation in Figure 2(c) is to the light-gray area, no further prefetching will be performed.
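To make the bookkeeping above concrete, the following is a minimal sketch (in Python, not kernel code) of how a per-file read-ahead group and read-ahead window could be maintained. The 3-block minimum and 32-block cap follow the text; the class and method names, and the simplification that every prefetch succeeds and every access is to a single block, are assumptions of the example.

    # Minimal sketch of the read-ahead group/window bookkeeping described above.
    MIN_RA, MAX_RA = 3, 32

    class ReadAheadState:
        def __init__(self):
            self.group = set()    # current read-ahead group
            self.window = set()   # previous group + current group

        def on_read(self, block):
            """Return the blocks to prefetch for an on-demand access to `block`."""
            if block in self.window:
                # Sequential access detected: asynchronous prefetching.
                # Double the amount prefetched, capped at MAX_RA blocks.
                n = min(max(2 * len(self.group), MIN_RA), MAX_RA)
                start = max(self.group) + 1
                new_group = set(range(start, start + n))
                self.window = self.group | new_group  # window spans both groups
                self.group = new_group
            else:
                # Not sequential (or first access): conservative synchronous
                # prefetching of MIN_RA blocks following the accessed block.
                self.group = set(range(block + 1, block + 1 + MIN_RA))
                self.window = set(self.group)
            return sorted(self.group)

    ra = ReadAheadState()
    print(ra.on_read(0))   # first access: blocks 1-3 prefetched synchronously
    print(ra.on_read(1))   # window hit: prefetching doubles, issued asynchronously

A real implementation would also track which blocks were actually prefetched and handle the cases described above (blocks already cached, in-flight I/O, accesses falling only in the previous group), all of which are omitted here.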
After the kernel has issued I/O requests for blocks that it intends to prefetch, I/O clustering is invoked to minimize the number of disk requests that are issued by clustering multiple I/O requests into a single disk request. When an I/O is requested on a block, it is passed to the clustering function, which puts the I/O request in the I/O request queue. If this I/O request is for a block following the block accessed by the previous I/O request (which is still in the queue), no new disk request is generated. Otherwise, a new disk request is created to fetch all the consecutive blocks requested by all the previous I/O requests accumulated in the queue. A new disk request can also be triggered directly (e.g., via timing mechanisms) to ensure that I/O requests are not left in the queue indefinitely.
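The clustering step itself can be illustrated with a short sketch that merges consecutive block numbers from the I/O request queue into single disk requests. The 64KB cluster limit comes from the text; the 16-block cap assumes 4KB blocks, and the function name is illustrative.

    # Merge an ordered list of block numbers into (start, length) disk requests,
    # extending the previous request only while the blocks stay consecutive and
    # the request stays within the cluster limit.
    MAX_CLUSTER_BLOCKS = 16   # 64KB / 4KB blocks (assumed block size)

    def cluster(io_queue):
        requests = []
        for block in io_queue:
            if (requests
                    and block == requests[-1][0] + requests[-1][1]  # consecutive
                    and requests[-1][1] < MAX_CLUSTER_BLOCKS):      # room left
                start, length = requests[-1]
                requests[-1] = (start, length + 1)
            else:
                requests.append((block, 1))   # start a new disk request
        return requests

    print(cluster([7, 8, 9, 10, 42]))   # -> [(7, 4), (42, 1)]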
2.2 Kernel Prefetching in 4.4BSD
The file system prefetching employed in 4.4BSD is similar to that used in Linux. An access is considered sequential if it is either to the last accessed block, or a block immediately following the last accessed block, i.e., access to a block B is sequential if the last access was to block B or B-1. For sequential accesses, the file system prefetches one or more additional blocks. The number of blocks that are prefetched is doubled on each disk read. However, it does not exceed the maximum cluster size (usually 32) or the remaining blocks of a file. If the file only occupies a fraction of the last block allocated to it, this final fragmented block is not prefetched.

It may happen that a prefetched block is evicted before it is used even once. If the accesses remain sequential, the block will be subsequently accessed but will not be found in the cache (as it was prefetched and evicted), indicating that the prefetching process is too aggressive. This is addressed by halving the number of blocks to be prefetched on each access whenever such an event occurs. Finally, if the access is not sequential, no prefetching is used. This is in contrast to Linux, where one extra block is always read from the disk even if the access falls outside the read-ahead window.
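The 4.4BSD behavior just described (prefetch only on sequential accesses, double the amount on each disk read, halve it after a prefetched block is evicted unused) can be sketched as follows. The 32-block cap is from the text; the class name, the starting amount of one block, and resetting the amount on a non-sequential access are assumptions of this sketch.

    class BSDReadAhead:
        MAX_PREFETCH = 32   # maximum cluster size, in blocks

        def __init__(self):
            self.last_block = None
            self.amount = 1

        def on_read(self, block):
            """Return the blocks to prefetch for an access to `block`."""
            sequential = (self.last_block is not None
                          and block in (self.last_block, self.last_block + 1))
            self.last_block = block
            if not sequential:
                self.amount = 1
                return []                 # non-sequential access: no prefetching
            to_prefetch = list(range(block + 1, block + 1 + self.amount))
            self.amount = min(self.amount * 2, self.MAX_PREFETCH)   # double
            return to_prefetch

        def on_early_eviction(self):
            # A prefetched block was evicted before it was used: back off.
            self.amount = max(self.amount // 2, 1)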
3. MOTIVATION
The goal of a buffer cache replacement algorithm is to minimize the number of disk I/O operations and ultimately reduce the running time of the applications. To show the potential performance impact of kernel prefetching on buffer caching replacement algorithms, we give an example showing that, given kernel prefetching, Belady's algorithm [3] can be non-optimal in reducing disk requests.

For simplicity, we assume a prefetching algorithm simpler than that used in Linux. We use a cache size of 8 blocks, and assume that prefetching is only done on a miss so that the I/Os for the prefetched blocks can be clustered with the I/O for the on-demand block. On the first access, the minimum number of prefetched blocks is set to 3. If the following access is to a block that has been prefetched on the previous access, the number of blocks to be prefetched on the next miss is increased to 8, otherwise it remains as 3. Furthermore, if a block to be prefetched is already in the cache, only the blocks between the on-demand accessed block and the cached block are prefetched. The reason for this is that prefetching beyond the cached block will prevent some prefetched requests from being clustered with the on-demand access. We also assume that if a block would cause the eviction of the on-demand accessed block or any other associated prefetched block, we do not prefetch it or the blocks after it, as they would not be clustered with the on-demand access.

Acc.  Blk   Belady's algorithm            LRU
Num.        cache content        I/O      cache content        I/O
 1    a     [a b c - - - - -]    y        [a b c - - - - -]    y
 2    c     [a b c - - - - -]    n        [a b c - - - - -]    n
 3    e     [a b c e f g h i]    y        [e f g h i j k l]    y
 4    g     [a b c e f g h i]    n        [e f g h i j k l]    n
 5    i     [a b c e f g h i]    n        [e f g h i j k l]    n
 6    k     [a b c e f g k l]    y        [e f g h i j k l]    n
 7    m     [a b c e f g m n]    y        [g i k l m n o p]    y
 8    o     [a b c e f g o p]    y        [g i k l m n o p]    n
 9    a     [a b c e f g o p]    n        [k m n o p a b c]    y
10    b     [a b c e f g o p]    n        [k m n o p a b c]    n
11    c     [a b c e f g o p]    n        [k m n o p a b c]    n
12    d     [b c d e f g o p]    y        [d e f g h i j k]    y
13    e     [b c d e f g o p]    n        [d e f g h i j k]    n
14    f     [b c d e f g o p]    n        [d e f g h i j k]    n
15    g     [b c d e f g o p]    n        [d e f g h i j k]    n
16    h     [h i j k l m n o]    y        [d e f g h i j k]    n
17    i     [h i j k l m n o]    n        [d e f g h i j k]    n
18    j     [h i j k l m n o]    n        [d e f g h i j k]    n
19    k     [h i j k l m n o]    n        [d e f g h i j k]    n
20    l     [h i j k l m n o]    n        [i j k l m n o p]    y
21    m     [h i j k l m n o]    n        [i j k l m n o p]    n
22    n     [h i j k l m n o]    n        [i j k l m n o p]    n
23    o     [h i j k l m n o]    n        [i j k l m n o p]    n
24    p     [i j k l m n o p]    y        [i j k l m n o p]    n
I/Os        8                             6

Table 1: An example scenario where LRU results in fewer disk I/Os compared to Belady's replacement algorithm. The cache content after each access is shown; the blocks read on a cache miss are those newly appearing in the cache content.

Table 1 shows the behavior of Belady's replacement algorithm and LRU for a sequence of file system requests under the above simple prefetching. In a system without prefetching, as used in almost all previous studies of cache replacement algorithms, the Belady algorithm will result in 16 cache misses, which translate into 16 I/O requests, whereas LRU will result in 23 cache misses, or 23 I/O requests. However, with prefetching, the number of I/O requests is reduced to 8 using Belady's algorithm and 6 using LRU. The reason for this is that Belady's algorithm has knowledge of the blocks that will be accessed in the nearest future and keeps them in the cache without any regard to how retaining these blocks will affect prefetching. Since Belady's algorithm results in more I/O requests than LRU, it is not optimal in minimizing the number of I/O operations.
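For concreteness, the simplified prefetching rule assumed in Table 1 can be written down as a small sketch: on a miss, a clustered read of up to 3 blocks (8 once the previous prefetch has proven useful) is issued, interpreted here as the total size of the clustered read, which matches the cache contents shown in Table 1, and the read stops at a block that is already cached. This is only a partial, illustrative rendering; the additional rule about never evicting the on-demand block or its co-prefetched blocks is omitted, and all names are assumptions.

    def blocks_to_read(miss_block, cache, prev_prefetch_used):
        """All blocks fetched in one clustered I/O: the missed block plus prefetch."""
        total = 8 if prev_prefetch_used else 3
        blocks = [miss_block]
        for b in range(miss_block + 1, miss_block + total):
            if b in cache:       # stop at a cached block so the request stays
                break            # a single contiguous clustered I/O
            blocks.append(b)
        return blocks

    print(blocks_to_read(0, cache=set(), prev_prefetch_used=False))   # [0, 1, 2]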
4. REPLACEMENT ALGORITHMS
In this section, we discuss the eight representative recency/frequency-based buffer cache replacement algorithms used in our evaluation of the impact of kernel prefetching. For each replacement algorithm, we summarize the original algorithm followed by the adapted version that manages the blocks brought in by kernel prefetching. We emphasize that all algorithms assume an unmodified kernel prefetching underneath. The reason is simply to compare the different algorithms in a realistic scenario, i.e., when implemented in the Linux buffer cache.

All practical replacement algorithms use the notion of recency in deciding the victim block for eviction, and how to assign the recency of prefetched blocks is an important issue. Since placing prefetched blocks at the most recently used (MRU) location is consistent with the actual Linux implementation, we use this design in LRU and all other algorithms that utilize recency information, including LRU-2, 2Q, LIRS, LRFU, MQ, and ARC.

With the exception of LRU, the above algorithms also use the notion of frequency in deciding the victim block for eviction, and thus it is important to assign appropriate frequency information to prefetched blocks to preserve the original behavior of each replacement algorithm. Fortunately, for most algorithms, this can be achieved by recording each prefetched block as "not accessed yet". When the block is accessed, its frequency is adjusted accordingly. Some of these algorithms also use a ghost cache to record the history of a larger set of blocks than can be accommodated in the actual cache. For these schemes, if a prefetched block is evicted from the cache before it is ever accessed, it is simply discarded, i.e., not moved into the ghost cache.

OPT
The optimal or OPT scheme is based on Belady's cache replacement algorithm [3]. This is an off-line algorithm as it requires an oracle to determine future references to a block. OPT evicts the block that will be referenced farthest in the future. As a result it maximizes the hit rate.

In the presence of the Linux kernel prefetching, prefetched blocks are assumed to be accessed most recently, one after another, and inserted into the cache according to the original OPT algorithm. Note that the kernel prefetching is oblivious of future references, but OPT can immediately determine wrong prefetches, i.e., prefetched blocks that will not be accessed on-demand at all. Such blocks become immediate candidates for removal, and can be replaced right after they are fetched. Similarly, prefetched blocks can also be evicted right away if they are accessed further in the future than all other blocks in the cache. However, blocks prefetched together do not evict each other. This is because such blocks are most likely prefetched in a single disk request, and all the blocks should stay resident in the cache till the I/O operation associated with them completes.

LRU
LRU is the most widely used replacement policy. It is usually implemented as an approximation [9] without significant impact on the performance. LRU is simple and does not require tuning of parameters to adapt to changing workloads. However, LRU can suffer from its pathological case when the working set size is larger than the cache and the application has a looping access pattern. In this case, LRU will replace all blocks before they are used again, resulting in every reference incurring a miss.

Kernel prefetching is incorporated into LRU in a straightforward manner. On each access, the kernel determines the number of blocks that need to be prefetched based on the algorithm explained in Section 2.1. The prefetched blocks are inserted at the MRU location just like regular blocks.
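The LRU adaptation is simple enough to sketch directly: prefetched blocks enter at the MRU end exactly like on-demand blocks and are merely remembered as "not accessed yet", the same bookkeeping the frequency-based schemes below rely on. This is an illustrative sketch, not the simulator's code; the OrderedDict representation and the method names are assumptions.

    from collections import OrderedDict

    class LRUCache:
        def __init__(self, capacity):
            self.capacity = capacity
            self.blocks = OrderedDict()          # LRU order: oldest first

        def _insert_mru(self, block, accessed):
            if block in self.blocks:
                accessed = accessed or self.blocks.pop(block)
            elif len(self.blocks) >= self.capacity:
                self.blocks.popitem(last=False)  # evict the LRU block
            self.blocks[block] = accessed        # True once seen on demand

        def access(self, block):                 # on-demand reference
            hit = block in self.blocks
            self._insert_mru(block, accessed=True)
            return hit

        def prefetch(self, block):               # block brought in by prefetching
            if block not in self.blocks:
                self._insert_mru(block, accessed=False)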
LRU-2
The LRU-K [30, 31] scheme tries to avoid the pathological cases of LRU. LRU-K replaces a block based on the Kth-to-the-last reference. The oldest resident based on this metric is evicted. For simplicity, the authors recommended K=2. By taking the time of the penultimate reference to a block as the basis for comparisons, LRU-2 can quickly remove cold blocks from the cache. However, for blocks without significant differences of reference frequencies, LRU-2 performs similarly to LRU. In addition, LRU-2 is costly; each block access requires log(N) operations to manipulate a priority queue, where N is the number of blocks in the cache.

In the presence of kernel prefetching, LRU-2 is adapted as follows. First, when a block is prefetched, it is marked as without any access history, so that when it is accessed on-demand for the first time, its prefetching time will not be mistaken as its penultimate reference time. Second, to implement the Correlated Reference Period (CRP), after a block is accessed and before it becomes eligible for replacement, it is put in a list for recording ineligible blocks. Only eligible blocks are added to the replacement priority queue. With prefetching, all prefetched blocks are initially ineligible for replacement as they are considered to be last accessed (together) less than the CRP ago.

2Q
2Q [19] was proposed to perform as well as LRU-2 yet with a constant overhead. It uses a special buffer, called the A1in queue, in which all missed blocks are initially placed. When the blocks are replaced from the A1in queue in the FIFO order, the addresses of these replaced blocks are temporarily placed in a ghost buffer called the A1out queue. When a block is re-referenced and its address is in the A1out queue, it is promoted to a main buffer called Am, which stores frequently accessed blocks. Thus this approach filters temporarily high frequency accesses. By setting the relative sizes of A1in and Am, 2Q picks a victim block from either A1in or Am, whichever grows beyond the preset boundary.

In the presence of kernel prefetching, 2Q is adapted similarly as in the previous schemes, i.e., prefetched blocks are treated as on-demand blocks. When a block is prefetched before any on-demand access, it is placed into the A1in queue. On the subsequent on-demand access, the block stays in the A1in queue, as if it is being accessed for the first time. If the block is evicted from the A1in queue before any on-demand access, it is simply discarded, as opposed to being moved into the A1out queue. This is to ensure that on its actual on-demand access, the block will not be incorrectly promoted to Am. If a block currently in the A1out queue is prefetched, it is promoted into Am as if it is accessed on-demand.
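Only the prefetch-specific handling of the adapted 2Q is sketched below: prefetched blocks enter A1in marked as not yet seen, are silently dropped (rather than remembered in A1out) if evicted before any on-demand access, and a prefetch that hits A1out promotes the block to Am. Queue sizing and the management of Am are elided, and all names are illustrative.

    from collections import deque

    class TwoQPrefetchSketch:
        def __init__(self, a1in_size):
            self.a1in = deque()        # FIFO of (block, seen_on_demand) pairs
            self.a1in_size = a1in_size
            self.a1out = set()         # ghost queue: block addresses only
            self.am = set()            # frequently accessed blocks

        def _evict_from_a1in(self):
            block, seen = self.a1in.popleft()
            if seen:
                self.a1out.add(block)  # normal 2Q behavior
            # else: prefetched but never used on demand -> simply discarded

        def prefetch(self, block):
            if block in self.a1out:            # history says it was useful before
                self.am.add(block)
            elif block not in self.am and all(b != block for b, _ in self.a1in):
                if len(self.a1in) >= self.a1in_size:
                    self._evict_from_a1in()
                self.a1in.append((block, False))

        def access(self, block):               # on-demand reference
            for i, (b, seen) in enumerate(self.a1in):
                if b == block:
                    self.a1in[i] = (b, True)   # stays in A1in, now seen once
                    return
            if block in self.am:
                return
            if block in self.a1out:
                self.a1out.discard(block)
                self.am.add(block)             # re-reference: promote to Am
                return
            if len(self.a1in) >= self.a1in_size:
                self._evict_from_a1in()
            self.a1in.append((block, True))    # ordinary miss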
LIRS
The Low Inter-reference Recency Set (LIRS) [18] (and its variant Clock-Pro [17]) is another recently proposed algorithm that maintains a complexity similar to that of LRU by using the distance between the last and second-to-the-last references to estimate the likelihood of the block being referenced again. LIRS maintains a variable-size LRU stack of blocks that have been seen recently. LIRS classifies each block into an LIR block if it has been accessed again since it was inserted on the LRU stack, or an HIR block if the block was not on the LRU stack. HIR blocks are referenced less frequently. The stack variability is caused by removal of blocks below the least recently seen LIR block on the stack. Similar to the CRP of LRU-2 or Kin of 2Q, LIRS allocates a small portion of the cache, 1% as suggested by the authors, to store recently seen HIR blocks.

In the presence of kernel prefetching, LIRS is modified not to insert any prefetched blocks into the LRU stack, to prevent distortion of the history stored in the LRU stack. Instead, a prefetched block is inserted into the portion of the cache that maintains HIR blocks, since an HIR block does not have to appear in the LRU stack. If a prefetched block did not have an existing entry on the stack, the first on-demand access to the block will cause it to be inserted onto the stack as an HIR block. If an entry for the block was present in the stack, the first on-demand access that follows will result in the block being treated as an LIR block. In both cases, the outcome is consistent with the behavior of the original LIRS.

LRFU
Least Recently/Frequently Used (LRFU) [25] is a recently proposed algorithm that provides a continuous range of policies between LRU and LFU. A weight C(x) is associated with every block x, and at every time t, C(x) is updated as

    C(x) = 1 + 2^(-λ) C(x)   if x is referenced at time t
    C(x) = 2^(-λ) C(x)       otherwise

where λ is a tunable parameter. The LRFU algorithm replaces the block with the smallest C(x) value. The performance of LRFU critically depends on the choice of λ.

Prefetching is employed in LRFU similarly as in LRU; prefetched blocks are treated as the most recently accessed. One problem arises as to how to assign the initial weight for a prefetched block, as the single weight combines both recency and frequency information. Our solution is to set a prefetched flag to indicate that a block is prefetched and not yet accessed on-demand. When the block is accessed on-demand and the prefetched flag is set, we reset the value of C(x) to the default initial value instead of applying the above function. This ensures that the algorithm counts an on-demand accessed block as once-seen and not twice-seen.

MQ
The Multi-Queue buffer management scheme (MQ) [41] was recently proposed as a second-level replacement scheme for storage controllers. The idea is to use m LRU queues (typically m = 8), Q0, Q1, ..., Qm−1, where Qi contains blocks that have been seen at least 2^i times but no more than 2^(i+1) − 1 times recently. The algorithm also maintains a history buffer Qout. Within a given queue, blocks are ranked by the recency of access, i.e., according to LRU. On a cache hit, the block frequency is incremented, and the block is placed at the MRU position of the appropriate queue, and its expireTime is set to currentTime + lifeTime. The lifeTime is a tunable parameter indicating the amount of time a block can reside in a particular queue without an access. On each access, the expireTime for the LRU block in each queue is checked, and if it is less than currentTime, the block is demoted to the MRU position of the next queue.

In the presence of prefetching, MQ is adapted similarly as in LRFU; the only issue is how to correctly count block access frequency in the presence of prefetching. This is solved by not incrementing the reference counter when a block is prefetched. MQ also maintains a ghost cache equal to the size of the cache for remembering information about blocks that were evicted. We modified the behavior of the ghost cache to not record any prefetched blocks that have not been accessed upon eviction from the cache.
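Going back to LRFU for a moment, the weight update above and the prefetched-flag fix can be expressed in a few lines. The per-step decay form mirrors the displayed recurrence (real implementations compute the weights lazily from timestamps), λ = 0.001 matches Table 4, and the class name is illustrative.

    class LRFUBlock:
        def __init__(self, prefetched, lambda_=0.001):
            self.lambda_ = lambda_
            self.prefetched = prefetched   # brought in by prefetching, not yet used
            self.weight = 1.0              # default initial C(x) on entering the cache

        def decay(self):
            # At a time step where x is NOT referenced, C(x) becomes 2^(-lambda)*C(x).
            self.weight *= 2 ** (-self.lambda_)

        def reference(self):
            if self.prefetched:
                # First on-demand access to a prefetched block: reset to the
                # initial value so it is counted as once-seen, not twice-seen.
                self.prefetched = False
                self.weight = 1.0
            else:
                # On a reference, C(x) becomes 1 + 2^(-lambda)*C(x).
                self.weight = 1.0 + 2 ** (-self.lambda_) * self.weight

The block with the smallest weight is the replacement victim, exactly as in the unmodified algorithm.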
ARC
The most recent addition to the recency/frequency-based policies is the Adaptive Replacement Cache (ARC) [27] (and its variant CAR [2]). The basic idea of ARC/CAR is to partition the cache into two queues, each managed using either LRU (ARC) or CLOCK (CAR): the first contains pages accessed only once, while the second contains pages accessed more than once. A hit to the first queue moves the accessed block to the second queue. Moreover, a hit to a block whose history information is retained in the ghost cache also causes the block to move to the second queue. Like LRU, ARC/CAR has a constant complexity per request.

Kernel prefetching is exploited in ARC similarly as in 2Q. A prefetched block is put into the first queue with a special flag, so that upon the subsequent on-demand access, it will stay in the first queue. An important difference from 2Q is that if a prefetched block is already in the ghost cache, it is not moved to the second queue, but to the first queue. If the block is prefetched correctly, it will be moved to the second queue upon the subsequent on-demand access. In this way, if prefetching brings in blocks that are not accessed again, they do not pollute the second queue. Finally, ARC also implements a ghost cache. As in MQ and 2Q, prefetched blocks that have not been accessed upon eviction will not be put into the ghost cache.
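A sketch of how prefetched blocks are steered in the adapted ARC is given below. The queue names T1/T2 and ghost lists B1/B2 follow the usual ARC terminology and are assumptions here; ARC's adaptive target size and the LRU ordering within the lists are elided.

    class ARCPrefetchSketch:
        def __init__(self):
            self.t1, self.t2 = {}, {}          # block -> seen-on-demand flag
            self.b1, self.b2 = set(), set()    # ghost caches (addresses only)

        def prefetch(self, block):
            # A ghost-cache hit does not promote a prefetched block to T2.
            self.b1.discard(block); self.b2.discard(block)
            if block not in self.t1 and block not in self.t2:
                self.t1[block] = False         # first queue, "not accessed yet"

        def access(self, block):               # on-demand reference
            if block in self.t1 and not self.t1[block]:
                self.t1[block] = True          # stays in T1, as if seen once
            elif block in self.t1:
                del self.t1[block]
                self.t2[block] = True          # second on-demand access: promote
            elif block in self.b1 or block in self.b2:
                self.b1.discard(block); self.b2.discard(block)
                self.t2[block] = True          # normal ARC ghost-hit promotion
            elif block not in self.t2:
                self.t1[block] = True          # ordinary miss

        def evict(self, block):
            # Blocks evicted before any on-demand access leave no ghost entry.
            if block in self.t1:
                if self.t1.pop(block):
                    self.b1.add(block)
            elif block in self.t2:
                del self.t2[block]
                self.b2.add(block)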
5. PERFORMANCE EVALUATION
In this section we evaluate the impact of Linux kernel prefetching on the performance of replacement algorithms.

5.1 Traces
The detailed traces of the applications were obtained by modifying the strace Linux utility. Strace intercepts the system calls of the traced process and is modified to record the following information about the I/O operations: access type, time, file identifier (inode), and I/O size.

Tables 2 and 3 show the six applications and three concurrent executions of the mixed applications used in this study. For each application, Table 2 lists the number of I/O references, the size of the I/O reference stream, the number of unique files accessed, and the fraction of references to consecutive file blocks. The selected applications and workload sizes are comparable to the workloads in recent studies [12, 18, 25, 27] and require cache sizes of up to 1024MB. We classify applications in three groups: sequential applications that read entire files mostly sequentially, random access applications that perform small accesses to different parts of the file, and a third group which represents applications containing a mix of sequential and random accesses.

Appl.      Num. of       Data        Num. of    Seq.
           references    size [MB]   files      refs.
cscope     1119161       260         10635      76%
glimpse    3102248       669         43649      74%
gcc        158667        41          2098       27%
viewperf   303123        495         289        99%
tpc-h      13468995      1187        49         3%
tpc-r      9415527       1087        49         3%
multi1     1278135       297         12246      70%
multi2     1580908       792         12514      75%
multi3     16571229      1855        43696      16%

Table 2: Applications and trace statistics

Appl.      Applications executed concurrently
multi1     cscope, gcc
multi2     cscope, gcc, viewperf
multi3     glimpse, tpc-h

Table 3: Concurrent applications

5.1.1 Sequential access applications
Cscope, glimpse, gcc and viewperf read entire files sequentially and thus prefetching will benefit these applications. Cscope [35] performs source code examination; the examined source code is Linux kernel 2.4.20. Glimpse [26] is an indexing and query system and is used to search for text strings in 550MB of text files under the /usr directory. In both cscope and glimpse, an index is built first, and single word queries are then issued. Only I/O operations during the query phases are used in the experiments. Table 2 shows that cscope and glimpse are good candidates for sequential prefetching since 76% and 74% of references, respectively, occur to the consecutive blocks. The remaining fraction of references are not consecutive because they reference the beginning of new files.

Gcc builds Linux kernel 2.4.20 and is one of the commonly used benchmarks. It has a very small working set; 4MB of buffer cache is enough to contain the header files. Gcc reads entire files sequentially. However, the benefit of prefetching will be limited since only 27% of references occur to consecutive blocks. Most of the remaining 73% of references are to the beginning of new files, since in gcc, 50% of references are to files that are at most two blocks long.

Viewperf is a SPEC benchmark that measures the performance of a graphics workstation. The benchmark executes multiple tests to stress different capabilities of the system. The patterns are mostly regular loops as viewperf reads entire files to render images. Over 99% of references are to consecutive blocks within a few large files, resulting in a perfect opportunity for prefetching.

5.1.2 Random access applications
The MySQL [29] database system is used to run the TPC-H (tpc-h) and TPC-R (tpc-r) benchmarks [39]. Tpc-h and tpc-r access a few large data files, some of which have multiple concurrent access patterns. Tpc-h and tpc-r perform random accesses to the database files. As a result, only 3% of references occur to consecutive blocks. We can predict that neither of these benchmarks will benefit from prefetching. Prefetching will not only increase the disk bandwidth demand, but also pollute the cache.

5.1.3 Concurrent applications
Multi1 consists of concurrent executions of cscope and gcc. It represents the workload in a code development environment. Multi2 consists of concurrent executions of cscope, gcc, and viewperf. It represents the workload in a workstation environment used to develop graphical applications and simulations. Multi3 consists of concurrent executions of glimpse and tpc-h. It represents the workload in a server environment running a database server and a web index server.

5.2 Simulation environment
We implemented a buffer cache simulator that faithfully implements the kernel prefetching and I/O clustering of the Linux 2.4 kernel as described in Section 2.1. The I/O clustering mechanism attempts to cluster I/Os to consecutive blocks into disk requests of up to 64KB. In addition, our simulator implements the eight cache replacement algorithms discussed in Section 4 (OPT, LRU, LRU-2, LRFU, LIRS, MQ, 2Q, and ARC), as well as their corresponding adapted versions that assume kernel prefetching and manage prefetched blocks. Table 4 lists the default parameters for each algorithm. To evaluate each algorithm with comparable parameters, we provide an additional history (ghost cache) equal to the cache size for each algorithm that uses history information.

Scheme   Parameters
OPT      NA
LRU      NA
LRU-2    CRP = 20, history size = cache size
LRFU     λ = 0.001
LIRS     HIR = 10% of cache, LRU stack = 2 * cache size
MQ       4 queues, ghost cache = cache size
2Q       A1in = 25% of cache, ghost cache = cache size
ARC      ghost cache = cache size

Table 4: Parameters for various replacement algorithms

For each simulation, we measure the respective hit ratio, the number of resulting (clustered) disk requests, and the execution time. With prefetching, access to a prefetched block is counted as a hit if the prefetching is completed, and as a miss otherwise. Since obtaining an accurate hit ratio in the presence of prefetching is not possible without a time-aware simulator, we interfaced our buffer cache simulator with DiskSim 3.0, an accurate disk simulator [15], to simulate the Seagate ST39102LW disk model. The combined simulator allows us to simulate the I/O time of an application and measure the reduction of the execution time under each replacement algorithm.

In comparing the execution times of each application under different replacement algorithms, we do not consider the execution overhead of implementing the different replacement algorithms. We believe that the effects of prefetching on the execution time that we will observe in the following overshadow the execution overhead of any reasonably efficient implementations of the studied algorithms.
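The hit-accounting rule above (an access to a prefetched block counts as a hit only if the prefetch has completed by the time the block is referenced) can be stated as a small helper. The set-based interface is an assumption of this sketch, not the simulator's actual API.

    class HitCounter:
        def __init__(self):
            self.hits = self.misses = 0

        def record_access(self, block, in_cache, in_flight):
            if block in in_cache and block not in in_flight:
                self.hits += 1       # resident and the prefetch (if any) completed
            else:
                self.misses += 1     # absent, or its prefetch is still in flight

        def hit_ratio(self):
            total = self.hits + self.misses
            return self.hits / total if total else 0.0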
5.3 Results
In this section, we examine how prefetching affects each group of the applications.

5.3.1 Sequential access applications
The accesses in the applications in this group exhibit looping reference patterns where blocks are referenced repeatedly with regular intervals. As expected for applications in this group, prefetching improves the hit ratios. However, the improvement from prefetching is more pronounced in some applications than in others.

Cscope
Figure 3 shows the results for cscope. The following observations can be made. First, kernel prefetching has a significant impact on the hit ratio, but the improvement differs across algorithms. At 128MB cache size, while the hit ratios for LRU, LRFU and ARC increase by 39%, 39%, and 38%, respectively, the hit ratios for LRU-2, LIRS, and 2Q increase from 7.0%, 18.8%, and 58.4% to 78.1%, 77.1%, and 75.8%, respectively. This suggests that it is critical to consider kernel prefetching when comparing different replacement schemes.

Figure 3: The hit ratio, number of clustered disk requests, and execution time for cscope under various schemes. (The panels plot the hit ratio (%), the number of disk requests, and the execution time (secs) against the cache size (MB), without and with prefetching.)

Second, the clustering of I/O requests in the presence of prefetching also results in a significant reduction in the number of disk requests compared to without prefetching. For instance, at 64MB cache size, all schemes besides OPT perform about 604,000 disk requests without prefetching, but only about 422,000 requests with prefetching. This is because without prefetching, cscope offers little opportunity for clustering as it reads files in small chunks (mostly 2 blocks at a time). When prefetching is enabled, the sequential nature of the accesses allows a larger number of contiguous blocks to be read asynchronously. These blocks are clustered together, resulting in more blocks read per disk request. As a result, the number of disk requests decreases.

Third, the effect of prefetching on disk requests cannot always be predicted based on the effect of prefetching on the hit ratio. The relationship between prefetching and disk requests can be complex and is closely tied to the application file access patterns. Cscope gives an example where prefetching increases the opportunity for clustering, which in turn reduces the number of disk requests. However, if prefetched blocks are not accessed, it may not result in a decrease in the number of disk requests. This is observed for random access applications in Section 5.3.2.

Fourth, the reduction in the number of disk requests due to kernel prefetching does not necessarily translate into a proportional reduction in the execution time. For example, with prefetching and at 64MB cache size, the numbers of disk requests for LRU, LRFU and ARC decrease by 31%, 31%, and 30%, while the corresponding running times only decrease by 1.6%, 1.2%, and 1.0%, respectively. In contrast, at 128MB cache size, the numbers of disk requests for LIRS and LRU-2 are reduced by 64.4% and 75.4%, respectively, which are significant changes and cause the execution time to decrease by 43.2% and 52.5%, respectively.

Lastly, prefetching can result in significant changes in the relative performance of replacement algorithms. Some interesting effects are seen as the cache size is increased to 128MB. For example, without prefetching the hit ratios of 2Q and OPT differ by 18%. With prefetching, the gap is reduced to under 6%. As another example, without prefetching LRU-2 and LIRS achieve 51% and 40% lower hit ratios than 2Q, respectively, but with prefetching, they achieve 2% and 1% higher hit ratios than 2Q, respectively.
Glimpse
Glimpse (Figure 4) also benefits from prefetching. The curves for the hit ratio shift up by an average of 10% for small cache sizes. In contrast, the hit ratios in cscope increase by over 35%. The smaller increase can be explained as follows. About 72% of the files accessed by glimpse are one or two blocks long, where there is little benefit from prefetching. In addition, 5% of the accesses to the rest of the files read chunks of 25 or more blocks, which limits the additional benefit that can be derived from clustering with prefetching compared to without prefetching. As a result, the benefit from prefetching is small compared to that in cscope.

Figure 4: The hit ratio, number of clustered disk requests, and execution time for glimpse under various schemes.

The changes in the relative behavior of different algorithms observed in cscope with prefetching are also observed in glimpse. In fact, there is a flip between the performance of 2Q and ARC at 128MB cache size. Without prefetching, 2Q and ARC have hit ratios of 7.2% and 11.4%, respectively; with prefetching, the hit ratios change to 19.4% and 11.5%, respectively. Similarly, there is a flip between 2Q and LIRS at cache sizes 64MB and 128MB.

The improved hit ratio due to prefetching does not translate into proportionally reduced disk requests, as shown in Figure 4. The hit ratio for LRU-2 at 64MB cache size increases by 10% with prefetching, but the corresponding number of disk requests decreases by less than 2%. This can be explained as follows. As discussed earlier, glimpse either reads small files where clustering has little advantage, or reads big files in big chunks where clustering is able to minimize disk requests even without prefetching. While prefetching provides improvements in the hit ratio by bringing in blocks before they are accessed, it does not provide any additional benefit of clustering I/Os together into chunks as it would for small-sized accesses to large files. Hence, there is only a small reduction in the number of disk requests.

In contrast to the hit ratio, the number of disk requests provides a much better indication of the relative execution time among different replacement algorithms. For instance, at 32MB cache size with prefetching, the hit ratios for 2Q, LRU-2, LRFU, and MQ increase by 13%, 12%, 14%, and 14%, respectively. But the execution times for these schemes show virtually no improvement (a mere 0.5% decrease in time) as observed from the graph. Examining the number of disk requests shows that for the four schemes, the number of disk requests decreases by under 1% with prefetching, which gives a much better indication of the effect of prefetching on the execution time. As another example, at 128MB cache size, the hit ratios for 2Q, LRU-2, LRFU, and MQ increase by about 12%. The corresponding numbers of disk requests decrease by less than 1% except for LRU-2, for which the number decreases by 2%. Hence, while the hit ratio suggests improvement in execution time for all schemes, the number of disk requests suggests a slight improvement for LRU-2 only. The actual execution time is virtually unchanged except for LRU-2 where it is 1% lower than without prefetching. Similar correspondence between the number of disk requests and the execution time can be observed for other cache sizes.

Gcc
In gcc (the results for gcc can be found in [6]), the benefit from prefetching is not as pronounced as in cscope and glimpse. This is because in gcc many accesses are to small files, for which there is little opportunity for prefetching. As a result, all three performance metrics, the hit ratio, the number of disk requests, and the execution time, are almost identical with and without prefetching for each replacement algorithm.

Viewperf
The behavior of the different cache replacement algorithms in viewperf (the results for viewperf can be found in [6]) is similar to that observed in cscope. As viewperf accesses large files in small chunks, it is able to see maximum benefit from prefetching. For instance, the hit ratio on average improves from 35% to 87%, and the number of disk requests is reduced by a factor of 11 when prefetching is turned on.

Finally, since viewperf is a CPU-bound application, the improvements in the hit ratio and number of disk requests do not translate into any significant reduction in the execution time. For example, while the hit ratio is improved and the number of disk requests is reduced as mentioned above, the execution time is reduced from 880 seconds to only 850 seconds, i.e., by 3.41%.

5.3.2 Random access applications
The two applications in this group, tpc-h and tpc-r, exhibit predominantly random accesses. As expected (shown in Figures 5 and 6), prefetching provides little improvement in the hit ratio for these applications. Furthermore, most of the prefetched blocks are not accessed, and as a result both the number of disk requests and the execution time are doubled. For example, for tpc-h, with prefetching, the number of synchronous disk requests, each of which prefetches an extra block, is almost the same as the number of disk requests that only access on-demand blocks without prefetching. Furthermore, an almost equal number of asynchronous disk requests are issued, each of which prefetches between 2 to 4 blocks. As a result, the total number of disk blocks read under prefetching is about 5 times that without prefetching. This in turn results in the running time of tpc-h being doubled with prefetching.

Figure 5: The hit ratio, number of clustered disk requests, and execution time for tpc-h under various schemes.

Figure 6: The hit ratio, number of clustered disk requests, and execution time for tpc-r under various schemes.

Although with prefetching both tpc-h and tpc-r show a small improvement in hit ratio, the significant increase in the number of I/Os translates into a significant increase in the execution time. This is a clear example where the relative hit ratio is not indicative of the relative execution time while the number of disk requests gives a much better indication of the relative performance of different replacement algorithms.

We observe that even the elaborate prefetching scheme of the Linux kernel is unable to stop useless prefetching and prevent performance degradation. This implies that applications dominated by random I/O accesses, such as database applications, should disable prefetching or use alternative file access methods when running on standard file systems such as in Linux.

5.3.3 Concurrent applications
The applications in this group contain accesses that are a mix of sequential and random accesses. Multi1 contains more sequential accesses as compared to multi2 and multi3, and is dominated by cscope. Therefore, the hit ratios and disk requests for multi1 with or without prefetching exhibit similar behavior as those for cscope. Multi2 behaves similarly to multi1. However, prefetching does not improve the execution time, since this mix contains the CPU-bound viewperf. Multi3 has a large number of random accesses due to tpc-h, and therefore its performance curves look similar to those of tpc-h. The detailed results for these applications can be found in [6].
5.3.4 Synchronous vs. asynchronous prefetching
We illustrate the breakdown of disk requests into synchronous and asynchronous requests for cscope only, at 128MB cache size, due to page limitation. Other sequential access applications follow a similar trend. Table 5 shows the number of (clustered) disk requests that are issued without prefetching, along with the synchronous and asynchronous disk requests when prefetching is enabled. For each case, the average number of blocks accessed per disk request is also reported. The case without prefetching represents the actual on-demand accesses issued by the programs. These blocks, if not present in the cache, are always read synchronously.

         No prefetching      Prefetching
                             Synchronous         Asynchronous
         I/Os      Size      I/Os      Size      I/Os      Size
OPT      190860    1.5       141128    1.5       15573     5.4
LIRS     486799    1.7       153078    1.5       14811     7.3
2Q       273568    1.7       161746    1.7       27718     6.9
LRU-2    609333    1.7       38542     1.8       111544    4.1
LRFU     607421    1.7       358085    1.7       61778     6.8
ARC      600312    1.7       352821    1.7       61682     6.8
MQ       607421    1.7       358064    1.4       6254      9.3
LRU      607421    1.7       358085    1.7       61778     6.8

Table 5: Number and size (Size = average blocks per disk request) of synchronous and asynchronous disk I/Os in cscope at 128MB cache size.

Table 5 shows that the total number of disk requests (synchronous and asynchronous) with prefetching is at least 30% lower than without prefetching for all schemes except OPT, and most of the reduction in disk requests comes from reducing synchronous disk requests and instead issuing asynchronous disk requests, which can be overlapped with the CPU time. The asynchronous disk requests prefetch between 5 to 9 blocks per access, efficiently utilizing the disk bandwidth.
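As a quick sanity check on the LRU row of Table 5, the request counts and average request sizes can be combined as follows; all inputs come from the table and the arithmetic is only illustrative.

    # LRU row of Table 5: requests and average blocks per request.
    no_pf_reqs, no_pf_size = 607421, 1.7
    sync_reqs, sync_size = 358085, 1.7
    async_reqs, async_size = 61778, 6.8

    pf_reqs = sync_reqs + async_reqs
    print(f"disk requests: {no_pf_reqs} -> {pf_reqs} "
          f"({1 - pf_reqs / no_pf_reqs:.0%} fewer)")                    # ~31% fewer
    print(f"blocks moved:  {no_pf_reqs * no_pf_size:,.0f} -> "
          f"{sync_reqs * sync_size + async_reqs * async_size:,.0f}")    # ~unchanged

Roughly the same number of blocks is transferred either way, but with prefetching they move in about 31% fewer disk requests, and a sizable share of those requests are asynchronous and can be overlapped with computation.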

No prefetching Prefetching (MRU) policy to those pages. For other pages, the LRU Synchronous Asynchronous replacement is applied. However, SEQ does not distinguish I/Os Size I/Os Size I/Os Size sequential and looping references. EELRU [34] detects loop- OPT 190860 1.5 141128 1.5 15573 5.4 LIRS 486799 1.7 153078 1.5 14811 7.3 ing references by examining aggregate recency distribution 2Q 273568 1.7 161746 1.7 27718 6.9 of referenced pages and changes the eviction point using a LRU-2 609333 1.7 38542 1.8 111544 4.1 simple cost/benefit analysis. DEAR [11, 12], UBM [21], and LRFU 607421 1.7 358085 1.7 61778 6.8 PCC [14] are three closely related pattern-based buffer cache ARC 600312 1.7 352821 1.7 61682 6.8 replacement schemes that explicitly separate and manage MQ 607421 1.7 358064 1.4 6254 9.3 blocks that belong to different reference patterns. The pat- LRU 607421 1.7 358085 1.7 61778 6.8 terns are classified into three categories: sequential, looping, and other (random). The three schemes differ in the granu- Table 5: Number and size of synchronous and asyn- larity of classification; classification is on a per-application chronous disk I/Os in cscope at 128MB cache size. basis in DEAR, a per-file basis in UBM, and a per-call-site chronous and asynchronous) with prefetching is at least 30% basis in PCC. Due to page limitation, we did not evaluate lower than without prefetching for all schemes except OPT, pattern-based algorithms in this paper. and most of the reduction in disk requests comes from re- Hint-based algorithms In application-controlled cache man- ducing synchronous disk requests and instead issuing asyn- agement [8, 32], the programmer is responsible for inserting chronous disk requests which can be overlapped with the hints into the application which indicate to OS what data CPU time. The asynchronous disk requests prefetch be- will or will not be accessed in the future and when. The OS tween 5 to 9 blocks per access, efficiently utilizing the disk then takes advantage of these hints to decide what cached bandwidth. data to discard and when. This can be a difficult task as the 6. RELATED WORK programmer has to accurately identify the access patterns of the application so that the resulting hints do not degrade the 6.1 Other Replacement Algorithms performance. To eliminate the burden on the programmers, compiler inserted hints are proposed [5]. These methods In addition to recency/frequency-based cache replacement provide the benefits of user inserted hints for existing ap- algorithms, many of which are described in Section 4, there plications that can be simply recompiled with the proposed are two other classes of cache replacement algorithms: hint- compiler. However, more complicated access patterns or in- based and pattern-based. put dependent patterns may be difficult for the compiler to Pattern-based algorithms SEQ [13] detects sequential characterize. page fault patterns and applies the Most Recently Used 100 6e+06 30000 OPT OPT OPT LIRS LIRS LIRS 95 2Q 2Q 2Q LRU-2 5e+06 LRU-2 25000 LRU-2 90 LRFU LRFU LRFU ARC ARC ARC 20000 MQ 4e+06 MQ MQ 85 LRU LRU LRU

80 15000 3e+06 75 10000 70 2e+06

Hit ratio(%) without prefetching 5000 65 Disk requests without prefetching 1e+06 Execution time(secs) without prefetching 60 0 2 4 8 16 32 64 128 256 2 4 8 16 32 64 128 256 2 4 8 16 32 64 128 256 Cache size (MB) Cache size (MB) Cache size (MB) 100 6e+06 30000 OPT LIRS 95 2Q LRU-2 5e+06 25000 90 LRFU ARC 20000 MQ 4e+06 85 LRU

80 15000 3e+06 75 OPT OPT LIRS 10000 LIRS 2e+06 2Q 2Q 70 LRU-2 LRU-2

Hit ratio(%) with prefetching LRFU LRFU Disk requests with prefetching 5000 65 ARC ARC 1e+06 MQ Execution time(secs) with prefetching MQ LRU LRU 60 0 2 4 8 16 32 64 128 256 2 4 8 16 32 64 128 256 2 4 8 16 32 64 128 256 Cache size (MB) Cache size (MB) Cache size (MB)

Figure 6: The hit ratio, number of clustered disk requests, and execution time for tpc-r under various schemes.

6.2 I/O Prefetching

A number of works have considered I/O prefetching based on hints (reference patterns) about an application's I/O behavior. Such hints can be explicitly inserted by the programmer, derived and inserted by a compiler [28], or even explicitly prescribed by a binary rewriter [10] in cases where recompilation is not possible.

Alternatively, dynamic prefetching has been proposed to detect reference patterns in the application and predict future block references. Future references to a file can be predicted by probability graphs [16, 40]. Another approach uses time series modeling [38] to predict temporal access patterns and issue prefetches during computation intervals. Prefetching algorithms tailored to parallel I/O systems have also been studied [1, 20, 22].

6.3 Integrated Prefetching and Caching

In [7], Cao et al. point out the interaction between integrated prefetching and caching and derive an aggressive prefetching policy with excellent competitive performance in the context of complete knowledge of future accesses. This work has been followed by many integrated approaches, for example [1, 8, 20, 22, 23, 32, 37], which are either offline or based on hints of I/O access patterns.

While an integrated prefetching and caching design is not supported in any modern operating system, all modern operating systems implement some form of kernel prefetching in their file systems, on top of which buffer caching is layered. Though not integrated, kernel prefetching is therefore expected to affect buffer caching behavior much as in integrated approaches. However, most recent caching algorithm studies did not consider the performance impact of kernel prefetching [2, 13, 18, 19, 25, 27, 30, 33, 34].

Finally, in [36], Belady's algorithm [3] is extended to perform caching and read-ahead simultaneously; the extended algorithm minimizes the cache miss ratio, i.e., the number of disk I/Os. This offline algorithm is effectively a layered approach in which caching is layered on top of disk read-ahead.
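To make the layering described above concrete, the following minimal sketch (a toy model written for this text, not the paper's simulator; the trace, cache size, and read-ahead window are arbitrary) places a simple kernel-style sequential read-ahead layer in front of an LRU buffer cache. Prefetched blocks occupy the same cache as demand-fetched blocks, so enabling read-ahead changes both the hit counts and the number of (clustered) disk requests the cache generates:

```python
from collections import OrderedDict

def simulate(trace, cache_blocks, readahead=0):
    """Count hits and disk requests for an LRU buffer cache, optionally with a
    simple sequential read-ahead layer in front of it (toy model)."""
    cache = OrderedDict()          # block -> None, ordered by recency (LRU at front)
    disk_requests = hits = 0

    def insert(block):
        cache[block] = None
        cache.move_to_end(block)
        if len(cache) > cache_blocks:
            cache.popitem(last=False)      # evict the least recently used block

    prev = None
    for block in trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)
        else:
            disk_requests += 1             # one (possibly clustered) disk request
            insert(block)
            # Kernel-style read-ahead: on a sequential miss, fetch the next few
            # blocks in the same request and place them in the buffer cache.
            if readahead and prev is not None and block == prev + 1:
                for b in range(block + 1, block + 1 + readahead):
                    insert(b)
        prev = block
    return hits, disk_requests

# Toy trace: two sequential scans of blocks 0..999.
trace = list(range(1000)) * 2
for ra in (0, 7):
    hits, reqs = simulate(trace, cache_blocks=128, readahead=ra)
    print(f"read-ahead={ra}: hits={hits}, disk requests={reqs}")
```

Under this toy model the read-ahead layer cuts the number of disk requests by roughly a factor of eight on the sequential trace without changing the replacement policy at all, which is the kind of interaction the paper's simulations quantify for real applications.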
7. CONCLUSION

Despite the well-known interactions between prefetching and caching, almost all buffer cache replacement schemes proposed over the last decade were studied without taking into account the file system prefetching that exists in all modern operating systems. In this paper, we performed a detailed simulation study of the impact of Linux kernel prefetching on the performance of a set of eight replacement algorithms.

Our study shows that such kernel prefetching can have a significant impact on the relative performance of different replacement algorithms. In particular, prefetching can significantly improve the hit ratios of some sequential access applications (cscope) but not of other sequential access applications (gcc) or of random access applications (tpc-h and tpc-r). The difference in hit ratios may (cscope, gcc, viewperf) or may not (glimpse, tpc-h, tpc-r) translate into a similar difference in the number of disk requests. The difference in the number of disk requests may (gcc, glimpse, tpc-h, tpc-r) or may not (cscope, viewperf) translate into a difference in the execution time. As a result, the relative hit ratios and numbers of disk requests may (gcc) or may not (cscope, glimpse, viewperf) be a good indication of the relative execution times. Finally, for random access applications (tpc-h and tpc-r), prefetching has little impact on the relative hit ratios of different algorithms, but it can have a significant adverse effect on the number of disk requests and the execution time compared to running without prefetching. This implies that, if possible, prefetching should be disabled when running random access applications such as database systems on standard file systems such as those in Linux. These results clearly demonstrate the importance for buffer caching research of taking file system prefetching into consideration.

Our study also raises several questions about buffer cache replacement algorithm design. How should an existing buffer cache replacement algorithm be modified (or a new one designed) to leverage knowledge of access history and explicitly perform prefetching, as opposed to passively relying on kernel prefetching? What are the potential benefits of such a change? How should Linux or BSD kernel prefetching be changed to avoid counter-productive prefetches for random access applications? We are studying these questions as part of our future work.
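As a side note on the point about disabling prefetching for random access applications: on Linux and other POSIX systems an application can already suppress kernel read-ahead for an individual file with posix_fadvise(), without any file system or kernel changes. The fragment below is a generic illustration only (the file name is a placeholder, and os.posix_fadvise requires Python 3.3+ on a platform that provides the call):

```python
import os

# Advise the kernel that this file will be read in a random-access pattern,
# so that sequential read-ahead (kernel prefetching) is not applied to it.
fd = os.open("table.db", os.O_RDONLY)          # placeholder file name
try:
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_RANDOM)  # offset=0, len=0 => whole file
    data = os.pread(fd, 4096, 123 * 4096)             # example random 4KB read
finally:
    os.close(fd)
```

Suppressing read-ahead per file descriptor in this way targets only the random-access files, leaving kernel prefetching in place for sequential workloads.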
8. REFERENCES

[1] S. Albers and M. Büttner. Integrated prefetching and caching in single and parallel disk systems. In Proc. 15th ACM SPAA, June 2003.
[2] S. Bansal and D. S. Modha. CAR: Clock with Adaptive Replacement. In Proc. 3rd USENIX FAST, Mar. 2004.
[3] L. A. Belady. A study of replacement algorithms for a virtual-storage computer. IBM Systems Journal, 5(2):78-101, 1966.
[4] D. P. Bovet and M. Cesati. Understanding the Linux Kernel (2nd Edition). O'Reilly and Associates, Inc., 2003.
[5] A. D. Brown, T. C. Mowry, and O. Krieger. Compiler-based I/O prefetching for out-of-core applications. ACM TOCS, 19(2):111-170, 2001.
[6] A. R. Butt, C. Gniady, and Y. Hu. The performance impact of kernel prefetching on buffer cache replacement algorithms. Technical Report TR-ECE-05-04, Purdue University, Mar. 2005.
[7] P. Cao, E. Felten, and K. Li. A study of integrated prefetching and caching strategies. In Proc. ACM SIGMETRICS, May 1995.
[8] P. Cao, E. W. Felten, A. R. Karlin, and K. Li. Implementation and performance of integrated application-controlled file caching, prefetching, and disk scheduling. ACM TOCS, 14(4):311-343, 1996.
[9] R. W. Carr and J. L. Hennessy. WSCLOCK - a simple and effective algorithm for virtual memory management. In Proc. ACM SOSP-08, Dec. 1981.
[10] F. W. Chang and G. A. Gibson. Automatic I/O hint generation through speculative execution. In Proc. 3rd USENIX OSDI, Feb. 1999.
[11] J. Choi, S. H. Noh, S. L. Min, and Y. Cho. An implementation study of a detection-based adaptive block replacement scheme. In Proc. USENIX ATC, June 1999.
[12] J. Choi, S. H. Noh, S. L. Min, and Y. Cho. Towards application/file-level characterization of block references: a case for fine-grained buffer management. In Proc. ACM SIGMETRICS, June 2000.
[13] G. Glass and P. Cao. Adaptive page replacement based on memory reference behavior. In Proc. ACM SIGMETRICS, June 1997.
[14] C. Gniady, A. R. Butt, and Y. C. Hu. Program counter based pattern classification in buffer caching. In Proc. 6th USENIX OSDI, Dec. 2004.
[15] G. R. Ganger, B. L. Worthington, and Y. N. Patt. The DiskSim simulation environment. Technical Report CSE-TR-358-98, University of Michigan, EECS, Feb. 1998.
[16] J. Griffioen and R. Appleton. Performance measurements of automatic prefetching. In Proc. 8th ICPDCS, Sep. 1995.
[17] S. Jiang, F. Chen, and X. Zhang. CLOCK-Pro: an effective improvement of the CLOCK replacement. In Proc. USENIX ATC, Apr. 2005.
[18] S. Jiang and X. Zhang. LIRS: an efficient low inter-reference recency set replacement policy to improve buffer cache performance. In Proc. ACM SIGMETRICS, June 2002.
[19] T. Johnson and D. Shasha. 2Q: a low overhead high performance buffer management replacement algorithm. In Proc. 20th VLDB, Jan. 1994.
[20] M. Kallahalla and P. J. Varman. Optimal prefetching and caching for parallel I/O systems. In Proc. 13th ACM SPAA, July 2001.
[21] J. M. Kim, J. Choi, J. Kim, S. H. Noh, S. L. Min, Y. Cho, and C. S. Kim. A low-overhead, high-performance unified buffer management scheme that exploits sequential and looping references. In Proc. 4th USENIX OSDI, Oct. 2000.
[22] T. Kimbrel and A. R. Karlin. Near-optimal parallel prefetching and caching. SIAM J. Comput., 29(4):1051-1082, 2000.
[23] T. Kimbrel, A. Tomkins, R. H. Patterson, B. Bershad, P. Cao, E. Felten, G. Gibson, A. Karlin, and K. Li. A trace-driven comparison of algorithms for parallel prefetching and caching. In Proc. 2nd USENIX OSDI, Oct. 1996.
[24] D. Lee, J. Choi, J.-H. Kim, S. H. Noh, S. L. Min, Y. Cho, and C. S. Kim. On the existence of a spectrum of policies that subsumes the least recently used (LRU) and least frequently used (LFU) policies. In Proc. ACM SIGMETRICS, May 1999.
[25] D. Lee, J. Choi, J.-H. Kim, S. H. Noh, S. L. Min, Y. Cho, and C. S. Kim. LRFU: a spectrum of policies that subsumes the least recently used and least frequently used policies. IEEE Transactions on Computers, 50(12):1352-1360, 2001.
[26] U. Manber and S. Wu. GLIMPSE: a tool to search through entire file systems. In Proc. USENIX Winter 1994 Technical Conference, Jan. 1994.
[27] N. Megiddo and D. S. Modha. ARC: a self-tuning, low overhead replacement cache. In Proc. 2nd USENIX FAST, Mar. 2003.
[28] T. C. Mowry, A. K. Demke, and O. Krieger. Automatic compiler-inserted I/O prefetching for out-of-core applications. In Proc. 2nd USENIX OSDI, Oct. 1996.
[29] MySQL: The world's most popular open source database. http://www.mysql.com/, 2005.
[30] E. J. O'Neil, P. E. O'Neil, and G. Weikum. The LRU-K page replacement algorithm for database disk buffering. In Proc. ACM SIGMOD Conference, May 1993.
[31] E. J. O'Neil, P. E. O'Neil, and G. Weikum. An optimality proof of the LRU-K page replacement algorithm. J. ACM, 46(1):92-112, 1999.
[32] R. H. Patterson, G. A. Gibson, E. Ginting, D. Stodolsky, and J. Zelenka. Informed prefetching and caching. In Proc. ACM SOSP-15, Dec. 1995.
[33] J. T. Robinson and M. V. Devarakonda. Data cache management using frequency-based replacement. In Proc. ACM SIGMETRICS, May 1990.
[34] Y. Smaragdakis, S. Kaplan, and P. Wilson. EELRU: simple and effective adaptive page replacement. In Proc. ACM SIGMETRICS, May 1999.
[35] J. Steffen. Interactive examination of a C program with cscope. In Proc. USENIX Winter Technical Conference, Jan. 1985.
[36] O. Temam. An algorithm for optimally exploiting spatial and temporal locality in upper memory levels. IEEE Transactions on Computers, 48(2):150-158, 1999.
[37] A. Tomkins, R. H. Patterson, and G. A. Gibson. Informed multi-process prefetching and caching. In Proc. ACM SIGMETRICS, June 1997.
[38] N. Tran and D. A. Reed. ARIMA time series modeling and forecasting for adaptive I/O prefetching. In Proc. 15th ICS, June 2001.
[39] Transaction Processing Performance Council. http://www.tpc.org/, 2005.
[40] V. Vellanki and A. L. Chervenak. A cost-benefit scheme for high performance predictive prefetching. In Proc. ACM/IEEE SC'99, Nov. 1999.
[41] Y. Zhou, P. M. Chen, and K. Li. The multi-queue replacement algorithm for second-level buffer caches. In Proc. USENIX ATC, June 2001.