The Performance Impact of Kernel Prefetching on Buffer Cache Replacement Algorithms

Ali R. Butt, Member, IEEE, Chris Gniady, Member, IEEE, and Y. Charlie Hu, Member, IEEE

Abstract—A fundamental challenge in improving file system performance is to design effective block replacement algorithms to minimize buffer cache misses. Despite the well-known interactions between prefetching and caching, almost all buffer cache replacement algorithms have been proposed and studied comparatively, without taking into account file system prefetching, which exists in all modern operating systems. This paper shows that such kernel prefetching can have a significant impact on the relative performance in terms of the number of actual disk I/Os of many well-known replacement algorithms; it can not only narrow the performance gap but also change the relative performance benefits of different algorithms. Moreover, since prefetching can increase the number of blocks clustered for each disk I/O and, hence, the time to complete the I/O, the reduction in the number of disk I/Os may not translate into proportional reduction in the total I/O time. These results demonstrate the importance of buffer caching research taking file system prefetching into consideration and comparing the actual disk I/Os and the execution time under different replacement algorithms.

Index Terms—Metrics/measurement, operating systems, file systems management, operating systems performance, measurements, simulation.

• A.R. Butt is with the Department of Computer Science, Virginia Polytechnic Institute and State University, McBryde Hall (0106), Blacksburg, VA 24061. E-mail: [email protected].
• C. Gniady is with the Department of Computer Science, University of Arizona, Gould-Simpson Building, 1040 E. 4th Street, PO Box 210077, Tucson, AZ 85721-0077. E-mail: [email protected].
• Y.C. Hu is with the School of Electrical and Computer Engineering, Purdue University, 465 Northwestern Ave., West Lafayette, IN 47907. E-mail: [email protected].

1 INTRODUCTION

A fundamental challenge in improving file system performance is to design an effective block replacement algorithm for the buffer cache. Over the years, developing such algorithms has remained one of the most active research areas in operating systems design. The oldest and yet still widely used replacement algorithm is the Least Recently Used (LRU) replacement policy [10]. The effectiveness of LRU comes from the simple yet powerful principle of locality: Recently accessed blocks are likely to be accessed again in the near future. Numerous other replacement algorithms have been developed [3], [12], [13], [15], [16], [21], [22], [24], [27], [29], [32], [35], [36]. However, although a few of these replacement algorithm studies have been implemented and reported measured results, the vast majority used trace-driven simulations and used the cache hit ratio as the main performance metric in comparing different algorithms.

Prefetching is another highly effective technique for improving the I/O performance. The main motivation of prefetching is to overlap computation with I/O and, thus, reduce the exposed latency of I/Os. One way to induce prefetching is via user-inserted hints of I/O access patterns, which are then used by the file system to perform asynchronous I/Os [8], [9], [34]. Since prefetched disk blocks need to be stored in the buffer cache, prefetching can potentially compete for buffer cache entries. The close interactions between prefetching and caching, which exploit user-inserted hints, have also been studied [8], [9], [34]. However, such user-inserted hints place a burden on the programmer, as the programmer has to accurately identify the access patterns of the application.

The file systems in most modern operating systems implement prefetching transparently by detecting sequential patterns and issuing asynchronous I/Os. In addition, file systems perform synchronous read-ahead, where requests are clustered to 64 Kbytes (typically) to amortize seek costs over larger reads. As in the user-inserted-hints scenario, such kernel-driven prefetching also interacts with and potentially affects the performance of the buffer caching algorithm being used. However, despite the well-known potential interactions between prefetching and caching [8], almost all buffer cache replacement algorithms have been proposed and studied comparatively without taking into account kernel-driven prefetching [3], [15], [21], [22], [27], [29], [32], [35], [36].

In this paper, we perform a detailed simulation study of the impact of kernel prefetching on the performance of a set of representative buffer cache replacement algorithms developed over the last decade. By using a cache simulator that faithfully implements the kernel prefetching of the OS, we compare different replacement algorithms in terms of the miss ratio, the actual number of aggregated synchronous and asynchronous disk I/O requests issued from the kernel to the disk driver, and the ultimate performance measure—the actual running time of applications using an accurate disk simulator, DiskSim [17]. Our study shows that the widely used kernel prefetching can indeed have a significant impact on the relative performance of different replacement algorithms. In particular, the findings and contributions of this paper are:

• We develop a buffer caching simulator, AccuSim, that faithfully implements the Linux kernel prefetching and I/O clustering and interfaces with a realistic disk simulator, DiskSim [17], to allow the performance study of different replacement algorithms in a realistic environment.


• We show how we can adapt a set of eight representative cache replacement algorithms to exploit kernel prefetching to minimize disk I/Os while preserving the nature of these replacement algorithms.
• We show that kernel prefetching can not only significantly narrow the performance gap of different replacement algorithms but also change the relative performance benefits of different algorithms.
• We present results demonstrating that the hit ratio is far from a definitive metric in comparing different replacement algorithms. The number of aggregated disk I/Os gives much more accurate information on disk I/O load, but the actual application running time is the only definitive performance metric in the presence of asynchronous kernel prefetching.

Our experimental results clearly demonstrate the importance of future buffer caching research incorporating file system prefetching. To facilitate this, we are making our simulator publicly available [1]. The simulator implements the kernel prefetching and I/O clustering mechanisms of the Linux 2.4 kernel and provides functionalities that allow easy integration of any cache replacement algorithm into the simulator for accurately studying and comparing the buffer cache hit ratio, the number of disk requests, and the application execution time under different replacement algorithms.

The outline of this paper is as follows: Section 2 describes kernel prefetching in Linux and in 4.4BSD. Section 3 shows the potential impact of kernel prefetching on buffer caching algorithms by using Belady's algorithm as an example. Section 4 summarizes the various buffer cache replacement algorithms that are evaluated in this paper. Section 5 describes the architecture of our buffer cache simulator, AccuSim, and how it interfaces with DiskSim to accurately simulate synchronous and asynchronous prefetching requests. Section 6 presents trace-driven simulation results of performance evaluation and comparison of the studied replacement algorithms. Finally, Section 7 discusses additional related work and Section 8 concludes this paper.

2 PREFETCHING IN FILE SYSTEMS

Our study is motivated by the widespread adoption of transparent prefetching in the file system buffer cache management in modern operating systems. In this section, we describe in detail the kernel prefetching mechanisms in two modern operating systems: Linux and 4.4BSD.

2.1 Kernel Prefetching in Linux

The file system accesses from a program are processed by multiple kernel subsystems before any I/O request is actually issued to the disk. Fig. 1 shows the various steps that a file system access has to go through before it is issued as a disk request. The first critical component is the buffer cache, which can significantly reduce the number of on-demand I/O requests that are issued to the components below. For sequentially accessed files, the kernel also attempts to prefetch consecutive blocks from the disk to amortize the cost of on-demand I/Os. Moreover, the kernel has a clustering facility that attempts to increase the size of a disk I/O to the size of a cluster—a cluster is a set of file system blocks that are stored on the disk contiguously. As the cost of reading a block or the whole cluster is comparable, the advantage of clustering is that it provides prefetching at a minimal cost.

Fig. 1. Various kernel components on the path from file system operations to the disk.

Prefetching in the Linux kernel is beneficial for sequential accesses to a file, that is, accesses to consecutive blocks of that file. When a file is not accessed sequentially, prefetching can potentially result in extra I/Os by reading data that is not used. For this reason, it is critical for the kernel to make its best guesses on whether future accesses are sequential and decide whether to perform prefetching.

The Linux kernel decides on prefetching by examining the pattern of accesses to the file and only considers prefetching for read accesses. To simplify the description, we assume that an access is to one block only. Although an access (system call) can be to multiple consecutive blocks, the simplification does not change the behavior of the prefetching algorithm. On the first access (A1) to a file, the kernel has no information about the access pattern (an exception is that, if A1 is to the first block in the file, then the kernel assumes the following accesses to be sequential). In this case, the kernel resorts to conservative prefetching; it reads the on-demand accessed block and prefetches a minimum number of blocks following the on-demand accessed block. The minimum number of blocks prefetched is at least one and is typically three. This prefetching is called synchronous prefetching, as the prefetched blocks are read along with the on-demand accessed block. The blocks that are prefetched are also referred to as a read-ahead group. The kernel remembers the current read-ahead group per file and updates it on each access to the file. Note that, as A1 was the first access, no blocks were previously prefetched for this file and, thus, the previous read-ahead group was empty.

The next access (A2) may or may not be sequential with respect to A1. If A2 accesses a block that the kernel has not already prefetched, that is, the block is not in A1's read-ahead group, then the kernel decides that prefetching was not useful and resorts to conservative prefetching, as described above. However, if the block accessed by A2 is in the previous read-ahead group, showing that prefetching was beneficial, then the kernel decides that the file is being accessed sequentially and performs more aggressive prefetching. The size of the previous read-ahead group, that is, the number of blocks that were previously prefetched, is doubled to determine the number of blocks (N) to be prefetched on this access. However, N is never increased beyond a prespecified maximum (usually 32 blocks). The kernel then attempts to prefetch the N contiguous blocks that follow the blocks in the previous read-ahead group in the file. This prefetching is called asynchronous prefetching, as the on-demand block is already prefetched and the new prefetching requests are issued asynchronously. In any case, the kernel updates the current read-ahead group to contain the blocks prefetched on the current access.

So far, we used the previous read-ahead group to determine whether a file is being accessed sequentially or not. After the prefetching associated with A2 is also issued, the next access can benefit from prefetching if it accesses a block either in A2's read-ahead group or in A1's read-ahead group. For this reason, the kernel also defines a read-ahead window, which contains the current read-ahead group and the previous read-ahead group. The read-ahead window is updated at the completion of an access and its associated prefetching and is used for the access that immediately follows. Any further (on-demand) access to the file falls into one of the following cases: 1) The block is within the read-ahead window and has already been prefetched, justifying further asynchronous prefetching, or 2) the block is outside the read-ahead window and synchronous prefetching is invoked.

Fig. 2 (adapted from [5]) shows the adjustment of the read-ahead group and the read-ahead window under synchronous and asynchronous prefetching. In Fig. 2a, the light gray area represents the previous read-ahead group, the dark gray area represents the current read-ahead group, and the read-ahead window is also shown at the completion of an access. If the next access (shown as black) is to a block outside the read-ahead window, then synchronous prefetching is employed and the read-ahead window is reset to the new read-ahead group (Fig. 2b). If the next access is to a block within the read-ahead window, then asynchronous prefetching is employed and the read-ahead group and read-ahead window are updated accordingly (Fig. 2c).

Fig. 2. Prefetching in Linux. The shaded regions show the blocks of the file that are cached and the black region shows the block being accessed. Also shown are the read-ahead windows and read-ahead groups before and after prefetching. (a) Before prefetching. (b) After synchronous prefetching. (c) After asynchronous prefetching.

Note that, in both synchronous and asynchronous prefetching, it is possible that some blocks in the coverage of the read-ahead group already exist in the cache, in which case, they are not prefetched but are included in the current read-ahead group.

A subtlety with kernel prefetching is that prefetching of multiple blocks is atomic, but the creation of ready entries in the cache is done one by one. Linux first sequentially creates entries in the cache, marking all entries locked, implying that I/O has not completed on them. It then issues prefetching, which results in these blocks being read either at once for synchronous prefetching or sequentially for asynchronous prefetching. It marks the entries unlocked as the blocks arrive from the disk. In any case, a newly created entry will never be evicted before it is marked as unlocked.

Two special situations can occur in asynchronous prefetching. First, before the actual I/O operation on a prefetched block is completed, an on-demand access to the block may arrive. When this happens, the kernel does not perform any further prefetching. This is because the disk is already busy serving the prefetched block and, thus, starting additional prefetching may overload the disk [5]. Second, it is possible that an on-demand accessed block is found in the read-ahead window but not in the current read-ahead group, that is, it is found in the previous read-ahead group. Further prefetching is not performed in this case either. For example, if the next access after the situation in Fig. 2c is to the light gray area, then no further prefetching will be performed.
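To make this control flow concrete, the following Python sketch models the per-file read-ahead state just described (read-ahead group, read-ahead window, and the doubling of the prefetch size). It is a simplified illustration, not the Linux 2.4 implementation; the constants MIN_RA and MAX_RA follow the typical values quoted in the text, and the timing-related special cases (locked entries, in-flight prefetches) are not modeled.

# Simplified model of the Linux 2.4 per-file read-ahead decision described
# above. Illustrative sketch only; not the kernel implementation.

MIN_RA = 3    # minimum number of blocks prefetched (typical value)
MAX_RA = 32   # the prefetch size is never grown beyond this

class ReadAheadState:
    def __init__(self):
        self.group = set()    # current read-ahead group (blocks just prefetched)
        self.window = set()   # read-ahead window = previous group + current group

    def access(self, block):
        """Return (mode, blocks_to_prefetch) for an on-demand access."""
        cur_group = self.group
        if block in cur_group:
            # Sequential access detected: asynchronous prefetching. Double the
            # previous prefetch size (capped at MAX_RA) and prefetch the blocks
            # that follow the previous read-ahead group.
            n = min(2 * len(cur_group), MAX_RA)
            start = max(cur_group) + 1
            new_group = set(range(start, start + n))
            self.window = cur_group | new_group   # previous + current group
            self.group = new_group
            return "async", new_group
        if block in self.window:
            # Hit in the window but only in the previous read-ahead group:
            # no further prefetching is performed.
            return "none", set()
        # Miss outside the window: conservative synchronous prefetching of the
        # on-demand block plus MIN_RA following blocks; the window is reset.
        new_group = set(range(block + 1, block + 1 + MIN_RA))
        self.window = set(new_group)
        self.group = new_group
        return "sync", new_group

# Example: a purely sequential scan triggers synchronous prefetching once,
# then asynchronous prefetching with a growing read-ahead group.
ra = ReadAheadState()
for b in range(0, 12):
    print(b, ra.access(b))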

After the kernel has issued I/O requests for blocks that it intends to prefetch, I/O clustering is invoked to minimize the number of disk requests that are issued by clustering multiple I/O requests into a single disk request. When an I/O is requested on a block, it is passed to the clustering function, which puts the I/O request in the I/O request queue. If this I/O request is for a block following the block accessed by the previous I/O request (which is still in the queue), then no new disk request is generated. Otherwise, a new disk request is created to fetch all of the consecutive blocks requested by all of the previous I/O requests accumulated in the queue. A new disk request can also be triggered directly (for example, via timing mechanisms) to ensure that I/O requests are not left in the queue indefinitely.
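The following sketch illustrates this queue-and-flush clustering policy. It is a simplified model, assuming 4-Kbyte blocks and the 64-Kbyte cluster size mentioned earlier; it is not the kernel's actual clustering code.

# Illustrative model of the I/O clustering described above: block-level I/O
# requests are queued, and a run of consecutive blocks is flushed as a single
# disk request. Assumes 4-Kbyte blocks and a 64-Kbyte cluster (16 blocks).

MAX_CLUSTER_BLOCKS = 64 * 1024 // 4096

class ClusteringQueue:
    def __init__(self):
        self.pending = []          # consecutive blocks awaiting one disk request
        self.disk_requests = []    # each entry is a list of block numbers

    def request_block(self, block):
        if (self.pending
                and block == self.pending[-1] + 1
                and len(self.pending) < MAX_CLUSTER_BLOCKS):
            # Consecutive with the previously queued request: keep accumulating.
            self.pending.append(block)
        else:
            # Not consecutive (or cluster full): flush the accumulated run as
            # one disk request and start a new run.
            self.flush()
            self.pending = [block]

    def flush(self):
        # In the kernel, a timer also triggers this so requests never linger.
        if self.pending:
            self.disk_requests.append(self.pending)
            self.pending = []

q = ClusteringQueue()
for blk in [10, 11, 12, 13, 40, 41, 7]:   # three runs of consecutive blocks
    q.request_block(blk)
q.flush()
print(q.disk_requests)                    # [[10, 11, 12, 13], [40, 41], [7]]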
2.2 Kernel Prefetching in 4.4BSD

The file system prefetching employed in 4.4BSD is similar to that used in Linux. An access is considered sequential if it is either to the last accessed block or to a block immediately following the last accessed block, that is, access to a block B is sequential if the last access was to block B or B-1. For sequential accesses, the file system prefetches one or more additional blocks. The number of blocks that are prefetched is doubled on each disk read. However, it does not exceed the maximum cluster size (usually 32) or the remaining blocks of a file. If the file only occupies a fraction of the last block allocated to it, then this final fragmented block is not prefetched.

It may happen that a prefetched block is evicted before it is used even once. If the accesses remain sequential, then the block will be subsequently accessed but will not be found in the cache (as it was prefetched and evicted), indicating that the prefetching process is too aggressive. This is addressed by halving the number of blocks to be prefetched whenever such an event occurs. Finally, if the access is not sequential, then no prefetching is used. This is in contrast to Linux, where one extra block is always read from the disk even if the access falls outside the read-ahead window.
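A compact sketch of this adjustment policy follows; it is an illustration of the behavior described above, not the 4.4BSD source.

# Illustrative model of the 4.4BSD prefetch-size adjustment described above.
MAX_CLUSTER = 32

class BSDPrefetcher:
    def __init__(self):
        self.last_block = None
        self.prefetch_size = 1

    def on_read(self, block, blocks_left_in_file):
        """Return how many blocks to prefetch for this access."""
        sequential = self.last_block is not None and block in (self.last_block,
                                                               self.last_block + 1)
        self.last_block = block
        if not sequential:
            return 0                       # non-sequential: no prefetching at all
        n = min(self.prefetch_size, MAX_CLUSTER, blocks_left_in_file)
        self.prefetch_size = min(self.prefetch_size * 2, MAX_CLUSTER)  # double per read
        return n

    def on_early_eviction(self):
        # A prefetched block was evicted before being used: prefetching is too
        # aggressive, so halve the prefetch size.
        self.prefetch_size = max(self.prefetch_size // 2, 1)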

3 MOTIVATION

The goal of the buffer replacement algorithm is to minimize the number of disk I/O operations and ultimately reduce the running time of the applications. To show the potential performance impact of kernel prefetching on buffer cache replacement algorithms, we give an example that shows that, given kernel prefetching, Belady's algorithm [4] can be nonoptimal in reducing disk requests.

For simplicity, we assume a prefetching algorithm that is simpler than that used in Linux. We use a cache size of eight blocks and assume that prefetching is only done on a miss so that the I/Os for the prefetched blocks can be clustered with the I/O for the on-demand block. On the first access, the minimum number of prefetched blocks is set to three. If the following access is to a block that has been prefetched on the previous access, then the number of blocks to be prefetched on the next miss is increased to eight; otherwise, it remains three. Furthermore, if a block to be prefetched is already in the cache, only the blocks between the on-demand accessed block and the cached block are prefetched. The reason for this is that prefetching beyond the cached block would prevent some prefetch requests from being clustered with the on-demand access. We also assume that, as prefetched blocks are added to the cache, they do not cause eviction of the current on-demand block or any of the blocks already prefetched along with the on-demand block. If prefetching a block would result in such an eviction, then we do not prefetch any more blocks on the current access. For example, consider a case where block 1 is accessed on demand and blocks 2 to 7 are to be prefetched. If, after blocks 1 to 4 are added to the cache, adding block 5 would result in the eviction of block 2 (which was just prefetched), then we will not prefetch blocks 5 to 7.

TABLE 1 An Example Scenario where LRU Results in Fewer Disk I/Os Compared to Belady's Replacement Algorithm (The cache contents after each access are shown in LRU stack order, from left to right, for both of the algorithms. The blocks read on a cache miss are shown in boldface.)

Table 1 shows the behavior of Belady's replacement algorithm and LRU for a sequence of file system requests under the above simple prefetching. In a system without prefetching, as used in almost all of the previous studies of cache replacement algorithms, the Belady algorithm will result in 16 cache misses, which translate into 16 I/O requests, whereas LRU will result in 23 cache misses or 23 I/O requests. However, with prefetching, the number of I/O requests is reduced to eight by using Belady's algorithm and six by using LRU. The reason for this is that Belady's algorithm has knowledge of the blocks that will be accessed in the nearest future and keeps them in the cache, without regard to how retaining these blocks will affect prefetching. Since Belady's algorithm results in more I/O requests than LRU, it is not optimal in minimizing the number of I/O operations.
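The simplified prefetching model assumed in the example above can be captured in a few lines; this sketch is only meant to make the stated assumptions explicit and omits the replacement policy itself (eviction interactions are left to the caller).

# Simplified prefetching model of the example above: prefetching happens only
# on a miss, the prefetch degree is 3 by default and 8 when the previous
# access hit a prefetched block, and prefetching stops at an already-cached
# block so that the resulting I/Os can still be clustered.

def blocks_to_prefetch(miss_block, cache, prev_access_hit_prefetched):
    degree = 8 if prev_access_hit_prefetched else 3
    prefetch = []
    for b in range(miss_block + 1, miss_block + 1 + degree):
        if b in cache:
            # Only the blocks between the on-demand block and the cached
            # block are prefetched.
            break
        prefetch.append(b)
    return prefetch

# Example: a miss on block 1 with block 5 already cached prefetches blocks 2-4.
print(blocks_to_prefetch(1, cache={5}, prev_access_hit_prefetched=True))  # [2, 3, 4]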
4 REPLACEMENT ALGORITHMS

In this section, we discuss the eight representative recency/frequency-based algorithms used in our evaluation of the impact of kernel prefetching. For each algorithm, we summarize the original algorithm, followed by the adapted version, which manages the blocks brought in by kernel prefetching. We emphasize that all algorithms assume an unmodified kernel prefetching underneath. The reason is simply to compare the different algorithms in a realistic scenario, that is, when implemented in the Linux buffer cache. Since all practical replacement algorithms use the notion of recency/frequency in deciding the victim block for eviction, to preserve the original design philosophy in incorporating prefetched blocks into each of the replacement algorithms, we strive to preserve the recency/frequency information associated with the prefetched blocks.

All of the replacement algorithms use the notion of recency in deciding the victim block for eviction. Since the prefetched blocks have not been accessed, they can be added to the cache in two obvious ways: as if they were accessed least recently and, hence, they would be placed in the LRU location, or as if they were accessed most recently and, hence, they would be placed in the most recently used (MRU) location. Since the former approach makes the prefetched blocks immediate candidates for eviction and since placing prefetched blocks at the MRU location is consistent with the actual Linux implementation, we use this design in all eight algorithms that utilize recency information: optimal replacement algorithm (OPT), LRU, LRU-2, 2Q, Low Interreference Recency Set (LIRS), Least Recently/Frequently Used (LRFU), Multi-Queue buffer management algorithm (MQ), and Adaptive Replacement Cache (ARC).

With the exception of LRU, the above algorithms also use the notion of frequency in deciding the victim block for eviction and, thus, it is important to assign appropriate frequency information to prefetched blocks to preserve the original behavior of each replacement algorithm. Fortunately, for most algorithms, this can be achieved by recording each prefetched block as "not accessed yet." When the block is accessed, its frequency is adjusted accordingly. Some of these algorithms also use a ghost cache to record the history of a larger set of blocks than can be accommodated in the actual cache. For these schemes, if a prefetched block is evicted from the cache before it is accessed, then it is simply discarded, that is, not moved into the ghost cache.

OPT. OPT is based on Belady's cache replacement algorithm [4]. This is an offline algorithm, as it requires an oracle to determine future references to a block. OPT evicts the block that will be referenced furthest in the future. As a result, it maximizes the hit rate.

In the presence of the Linux kernel prefetching, prefetched blocks are assumed to be accessed most recently, one after another, and inserted into the cache according to the original OPT algorithm. Note that the kernel prefetching is oblivious to future references. However, OPT can immediately determine wrong prefetches, that is, prefetched blocks that will not be accessed on demand at all. Such blocks become immediate candidates for removal and can be replaced right after they are fetched. Similarly, prefetched blocks can also be evicted right away if they are accessed further in the future than all of the other blocks in the cache. However, blocks prefetched together do not evict each other. This is because such blocks are most likely prefetched in a single disk request and all of the blocks should stay resident in the cache till the I/O operation associated with them completes.

LRU. LRU is the most widely used replacement algorithm. It is usually implemented as an approximation [10] without significant impact on the performance. LRU is simple and does not require tuning of parameters to adapt to changing workloads. However, LRU can suffer from its pathological case when the working set size is larger than the cache and the application has a looping access pattern. In this case, LRU will replace all blocks before they are used again, resulting in every reference incurring a miss.

Kernel prefetching is incorporated into LRU in a straightforward manner. On each access, the kernel determines the number of blocks that need to be prefetched based on the algorithm explained in Section 2.1. The prefetched blocks are inserted at the MRU location just like regular blocks.
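A minimal sketch of the adapted LRU policy follows, with prefetched blocks placed at the MRU end exactly like on-demand blocks. This is illustrative only and is not the AccuSim implementation.

from collections import OrderedDict

# Minimal LRU cache where prefetched blocks are inserted at the MRU position,
# as described above.
class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()       # ordered from LRU (front) to MRU (back)

    def _insert_mru(self, block):
        if block in self.blocks:
            self.blocks.move_to_end(block)            # move to the MRU position
        else:
            if len(self.blocks) >= self.capacity:
                self.blocks.popitem(last=False)       # evict the LRU block
            self.blocks[block] = True

    def access(self, block):
        """On-demand access: returns True on a hit."""
        hit = block in self.blocks
        self._insert_mru(block)
        return hit

    def prefetch(self, block):
        """Prefetched blocks are treated like regular blocks at the MRU end."""
        self._insert_mru(block)

cache = LRUCache(capacity=8)
cache.access(1)
for b in (2, 3, 4):      # the kernel prefetches the following blocks
    cache.prefetch(b)
print(cache.access(2))   # True: the prefetched block is a cache hit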
LRU-2. The LRU-K [32], [33] algorithm tries to avoid the pathological cases of LRU. LRU-K replaces a block based on the Kth-to-the-last reference. The oldest resident based on this metric is evicted. For simplicity, the authors recommended K = 2. By taking the time of the penultimate reference to a block as the basis for comparisons, LRU-2 can quickly remove cold blocks from the cache. However, for blocks without significant differences of reference frequencies, LRU-2 performs similarly to LRU. In addition, LRU-2 is costly: Each block access requires log(N) operations to manipulate a priority queue, where N is the number of blocks in the cache.

In the presence of kernel prefetching, LRU-2 is adapted as follows: First, when a block is prefetched, it is marked as without any access history so that, when it is accessed on demand for the first time, its prefetching time will not be mistaken as its penultimate reference time. Second, to implement the Correlated Reference Period (CRP), after a block is accessed and before it becomes eligible for replacement, it is put in a list for recording ineligible blocks. Only eligible blocks are added to the replacement priority queue. With prefetching, all prefetched blocks are initially ineligible for replacement as they are considered to be last accessed (together) less than the CRP ago.

2Q. 2Q [22] was proposed to perform as well as LRU-2, yet with a constant overhead. It uses a special buffer, called the A1in queue, in which all missed blocks are initially placed. When blocks are replaced from the A1in queue in first-in, first-out (FIFO) order, the addresses of these replaced blocks are temporarily placed in a ghost buffer called the A1out queue. When a block is rereferenced and its address is in the A1out queue, it is promoted to a main buffer called Am, which stores frequently accessed blocks. Thus, this approach filters temporarily high-frequency accesses. By setting the relative sizes of A1in and Am, 2Q picks a victim block from either A1in or Am, whichever grows beyond the preset boundary.

In the presence of kernel prefetching, 2Q is adapted similarly as in the previous algorithms, that is, prefetched blocks are treated as on-demand blocks. When a block is prefetched before any on-demand access, it is placed into the A1in queue. On the subsequent on-demand access, the block stays in the A1in queue as if it were being accessed for the first time. If the block is evicted from the A1in queue before any on-demand access, then it is simply discarded, as opposed to being moved into the A1out queue. This is to ensure that, on its actual on-demand access, the block will not be incorrectly promoted to Am. If a block currently in the A1out queue is prefetched, then it is promoted into Am as if it were accessed on demand.
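A sketch of this 2Q adaptation follows, focusing only on how prefetched blocks are routed among A1in, A1out, and Am. Queue sizing and ghost-entry cleanup are deliberately simplified; this is not the authors' implementation.

from collections import OrderedDict, deque

# Sketch of how prefetched blocks are handled in the adapted 2Q policy
# described above. Queue-size management is intentionally simplified.
class TwoQ:
    def __init__(self, a1in_size, a1out_size, am_size):
        self.a1in = OrderedDict()               # FIFO of recently missed blocks
        self.a1out = deque(maxlen=a1out_size)   # ghost buffer (addresses only)
        self.am = OrderedDict()                 # LRU of frequently accessed blocks
        self.a1in_size, self.am_size = a1in_size, am_size
        self.prefetched = set()                 # blocks brought in only by prefetching

    def _evict_a1in(self):
        block, _ = self.a1in.popitem(last=False)
        if block in self.prefetched:
            self.prefetched.discard(block)      # never accessed: drop silently,
        else:                                   # do NOT remember it in A1out
            self.a1out.append(block)

    def _insert_am(self, block):
        self.am[block] = True
        self.am.move_to_end(block)
        if len(self.am) > self.am_size:
            self.am.popitem(last=False)

    def prefetch(self, block):
        if block in self.a1in or block in self.am:
            return
        if block in self.a1out:                 # history says it was seen before:
            self._insert_am(block)              # promote as if accessed on demand
            return
        self.prefetched.add(block)
        self.a1in[block] = True
        if len(self.a1in) > self.a1in_size:
            self._evict_a1in()

    def access(self, block):
        if block in self.am:
            self.am.move_to_end(block)
        elif block in self.a1in:
            # First on-demand access to a prefetched block: it stays in A1in,
            # now counted as seen once.
            self.prefetched.discard(block)
        elif block in self.a1out:
            self._insert_am(block)
        else:
            self.a1in[block] = True
            if len(self.a1in) > self.a1in_size:
                self._evict_a1in()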
LIRS. LIRS [21] (and its variant Clock-Pro [19]) is another recently proposed algorithm, which maintains a complexity similar to that of LRU by using the distance between the last and the second-to-the-last references to estimate the likelihood of the block being referenced again. LIRS maintains a variable-sized LRU stack of blocks that have been seen recently. LIRS classifies a block as an LIR block if it has been accessed again since it was inserted on the LRU stack, or as an HIR block if the block was not on the LRU stack. HIR blocks are referenced less frequently. The stack variability is caused by the removal of blocks below the least recently seen LIR block on the stack. Similarly to the CRP of LRU-2 or the Kin of 2Q, LIRS allocates a small portion of the cache to store recently seen HIR blocks.

In the presence of kernel prefetching, LIRS is modified so as not to insert any prefetched blocks into the LRU stack, to prevent distortion of the history stored in the LRU stack. Instead, a prefetched block is inserted into the portion of the cache that maintains HIR blocks, since an HIR block does not have to appear in the LRU stack. If a prefetched block did not have an existing entry on the stack, then the first on-demand access to the block will cause it to be inserted onto the stack as an HIR block. If an entry for the block was present in the stack, then the first on-demand access that follows will result in the block being treated as an LIR block. In both cases, the outcome is consistent with the behavior of the original LIRS.

LRFU. LRFU [27] is a recently proposed algorithm that provides a continuous range of policies between LRU and LFU. A weight C(x) is associated with every block x and, at every time t, C(x) is updated as

    C(x) = 1 + 2^{-lambda} * C(x)   if x is referenced at time t,
    C(x) = 2^{-lambda} * C(x)       otherwise,

where lambda is a tunable parameter. The LRFU algorithm replaces the block with the smallest C(x) value. The performance of LRFU critically depends on the choice of lambda.

Prefetching is employed in LRFU similarly as in LRU: Prefetched blocks are treated as the most recently accessed. One problem arises as to how the initial weight for a prefetched block should be assigned, as the single weight combines both recency and frequency information. Our solution is to set a prefetched flag to indicate that a block is prefetched and not yet accessed on demand. When the block is accessed on demand and the prefetched flag is set, we reset the value of C(x) to the default initial value instead of applying the above function. This ensures that the algorithm counts an on-demand accessed block as once seen and not twice seen.
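The adapted LRFU bookkeeping can be written compactly. In the sketch below, the decay is applied lazily (which is equivalent to applying the per-step rule at every time step), and the initial weight of 1.0 for a newly or first on-demand accessed block is an assumption of this illustration, since the paper does not give the default value.

# Sketch of the adapted LRFU bookkeeping described above. If a block was last
# updated at time t0 and is referenced at time t, then
#   C(x) <- 1 + 2^(-lam*(t-t0)) * C(x),
# which follows from repeatedly applying the update rule above.

LAM = 0.1   # the tunable parameter lambda

class LRFUBlock:
    def __init__(self, now, prefetched=False):
        self.prefetched = prefetched   # brought in by prefetching, not yet accessed
        self.weight = 1.0              # assumed initial value
        self.last = now

    def decayed_weight(self, now):
        return self.weight * 2 ** (-LAM * (now - self.last))

    def on_demand_access(self, now):
        if self.prefetched:
            # First on-demand access to a prefetched block: reset to the initial
            # value so the block counts as seen once, not twice.
            self.prefetched = False
            self.weight = 1.0
        else:
            self.weight = 1.0 + self.decayed_weight(now)
        self.last = now

def victim(blocks, now):
    """Evict the block with the smallest current C(x) value."""
    return min(blocks, key=lambda b: blocks[b].decayed_weight(now))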

MQ. MQ [43] was recently proposed as a second-level replacement scheme for storage controllers. The idea is to use m LRU queues (typically, m = 8), Q0, Q1, ..., Q(m-1), where Qi contains blocks that have been seen at least 2^i times but not more than 2^(i+1) - 1 times recently. The algorithm also maintains a history buffer, Qout. Within a given queue, blocks are ranked by the recency of access, that is, according to LRU. On a cache hit, the block frequency is incremented, the block is placed at the MRU position of the appropriate queue, and its expireTime is set to currentTime + lifeTime. The lifeTime is a tunable parameter indicating the amount of time that a block can reside in a particular queue without an access. On each access, the expireTime of the LRU block in each queue is checked and, if it is less than currentTime, then the block is demoted to the MRU position of the next lower queue.

In the presence of prefetching, MQ is adapted similarly as in LRFU; the only issue is how to correctly count block access frequency in the presence of prefetching. This is solved by not incrementing the reference counter when a block is prefetched. MQ also maintains a ghost cache equal to the size of the cache for remembering information about blocks that were evicted. We modified the behavior of the ghost cache to not record any prefetched blocks that have not been accessed upon eviction from the cache.
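The queue placement rule in MQ follows directly from the 2^i boundaries quoted above. The sketch below shows only that rule and the lifeTime-based demotion of a queue's LRU block; the lifeTime value is arbitrary, and prefetched blocks simply do not have their counter incremented, per the adaptation described above.

import math

M = 8              # number of LRU queues, Q0 .. Q7
LIFETIME = 100     # tunable: how long a block may sit in a queue unreferenced

def queue_index(frequency):
    # Qi holds blocks seen at least 2^i but at most 2^(i+1)-1 times.
    return min(int(math.log2(frequency)), M - 1)

def on_hit(block, current_time):
    """Illustrative MQ hit handling (not the full policy). Prefetches do not
    call this function, so the reference counter stays unchanged for them."""
    block["freq"] += 1
    block["queue"] = queue_index(block["freq"])
    block["expire"] = current_time + LIFETIME   # placed at the MRU of that queue

def maybe_demote(lru_block_of_queue, current_time):
    # On each access, the LRU block of every queue is checked; an expired
    # block is demoted to the MRU position of the next lower queue.
    if (lru_block_of_queue["expire"] < current_time
            and lru_block_of_queue["queue"] > 0):
        lru_block_of_queue["queue"] -= 1
        lru_block_of_queue["expire"] = current_time + LIFETIME

blk = {"freq": 1, "queue": 0, "expire": 0}
on_hit(blk, current_time=10)   # freq becomes 2, so the block moves to Q1
print(blk)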
ARC. The most recent addition to the recency/frequency-based policies is ARC [29] (and its variant CAR [3]). The basic idea of ARC/CAR is to partition the cache into two queues, each managed using either LRU (ARC) or CLOCK (CAR): The first contains pages accessed only once, whereas the second contains pages accessed more than once. A hit to the first queue moves the accessed block to the second queue. Moreover, a hit to a block whose history information is retained in the ghost cache also causes the block to move to the second queue. Like LRU, ARC/CAR has a constant complexity per request.

Kernel prefetching is exploited in ARC similarly as in 2Q. A prefetched block is put into the first queue with a special flag so that, upon the subsequent on-demand access, it will stay in the first queue. An important difference from 2Q is that, if a prefetched block is already in the ghost cache, then it is moved not to the second queue but to the first queue. If the block is prefetched correctly, then it will be moved to the second queue upon the subsequent on-demand access. This way, if prefetching brings in blocks that are not accessed again, then they do not pollute the second queue. Finally, ARC also implements a ghost cache. As in MQ and 2Q, prefetched blocks that have not been accessed upon eviction will not be put into the ghost cache.

5 ACCUSIM: ACCURATE BUFFER CACHE SIMULATOR

We have discussed the various kernel subsystems that come into play before an I/O access from an application is delivered to the actual disk. The complexity of the interactions between these subsystems entails that the various buffer caching algorithms be comparatively studied in a realistic setting. However, creating an actual kernel implementation of the algorithm being designed and the algorithms to be compared against can be a laborious task. In addition, even with such a kernel implementation, it is often nontrivial to obtain a controlled environment, for example, a fixed cache size in the presence of a unified buffer cache, in the actual kernel for comparing various algorithms on an even basis. To overcome this challenge faced by cache replacement algorithm designers, we have developed a buffer cache simulator, AccuSim, that realistically models all of the kernel subsystems involved in buffer cache management in modern operating systems: the buffer cache replacement algorithm, kernel prefetcher, and I/O clustering. In addition, the simulator interfaces with a realistic disk simulator, DiskSim [17], to enable the comprehensive study of the hit ratios, the number of disk requests, and the execution times of applications under various replacement algorithms and cache sizes. Such a simulator will greatly simplify the otherwise difficult task of systematic and accurate evaluation of various caching algorithms. In the following, we present the design details of the simulator.

5.1 Simulator Architecture

Fig. 3 shows the main modules in the architecture of the simulator.

Fig. 3. The architecture of the cache replacement algorithm simulator.

The simulator is driven by a trace that records the I/O operations during the execution of an application. In particular, for each I/O operation, the trace contains the identifier (inode) of the file that is accessed, the offset into the file of the first block that is accessed, the number of blocks being accessed, the I/O issue time, and the I/O completion time. A trace with this information can easily be collected using the standard system call tracing tool strace, which can be easily modified to collect only the desired information. The trace is then processed to calculate the time difference between the completion time of an I/O operation and the issue time of the next, which is interpreted as the computation time between the two I/O operations. Afterward, the I/O issue and completion times are discarded and the processed trace only records the sequence of I/O operations interleaved with the computation time between every two consecutive I/O operations.

A limitation of the above method for trace collection is that it does not distinguish synchronous I/O operations from explicit asynchronous I/O operations issued by the application and treats all application-issued asynchronous I/Os as synchronous. Extending the simulator to handle explicit asynchronous I/O operations from the application is part of our future work.
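The trace preprocessing step can be summarized as follows. This sketch assumes a simple in-memory record layout and is only meant to illustrate how the computation time between I/Os is derived; it is not AccuSim's trace format.

from dataclasses import dataclass
from typing import List, Tuple

# Illustration of the trace preprocessing described above: raw records carry
# issue/completion times; the processed trace keeps only the I/O operations
# interleaved with the computation time between consecutive operations.
# The record layout here is an assumption of this sketch.

@dataclass
class RawIO:
    inode: int
    offset_blocks: int
    num_blocks: int
    issue_time: float        # seconds
    completion_time: float   # seconds

def preprocess(raw: List[RawIO]) -> List[Tuple[float, RawIO]]:
    """Return a list of (compute_time_before_this_io, io) pairs."""
    processed = []
    prev_completion = None
    for io in raw:
        compute = 0.0 if prev_completion is None else max(
            0.0, io.issue_time - prev_completion)
        processed.append((compute, io))
        prev_completion = io.completion_time
    return processed

trace = [RawIO(7, 0, 2, 0.000, 0.008), RawIO(7, 2, 2, 0.020, 0.027)]
for compute, io in preprocess(trace):
    print(f"compute {compute*1000:.1f} ms, then read {io.num_blocks} blocks")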

TABLE 2 Parameters for Various Replacement Algorithms

TABLE 3 Seagate Cheetah 9LP Parameters Used for the Simulation

To drive the simulation, the collected trace is fed to the simulator driver, which processes the I/O information from the trace and uses the information to drive the rest of the modules of the simulator. The driver also maintains data structures to simulate the system-wide open-file table for keeping track of the different files being accessed, the associated file pointers, and the per-file prefetching information (read-ahead group, read-ahead window), which the kernel typically stores in the system-wide open-file table. After processing an entry from the trace, the inode, the block offset, and the number of blocks to be read are forwarded to the block sequence generator module.

The block sequence generator mimics the Linux kernel in that it converts an I/O request ({inode, block offset, number of blocks}) that may be for multiple blocks into multiple individual block requests, each of the format ({inode, block offset}). These individual block requests are used for ease of cache management, that is, the blocks can be added individually to the cache and the timing information associated with the blocks remains unchanged. In particular, all blocks within an I/O request are issued at the same time as the original I/O request. The disk block requests are, in turn, used to drive the buffer cache.

The replacement algorithm support module provides support functions and data structures, enabling easy integration of any replacement algorithm into the simulator. The module manages the cache on behalf of the replacement algorithm, that is, issuing I/O requests to the disk, evicting blocks, and so forth. This offloads the task of cache management from the replacement algorithm, which can then focus on determining what blocks, if any, need to be evicted. Specifically, the support module simply calls the replacement algorithm with the ({inode, block offset}) of the block being referenced, and the replacement algorithm replies with the ({inode, block offset}) of a block that needs to be evicted. The cache replacement algorithm module maintains internal data structures for recording the needed access information of the blocks, such as the recency and frequency.

The cache replacement algorithm module implements the algorithm under study. Our simulator currently includes implementations of the eight cache replacement algorithms discussed in Section 4: OPT, LRU, LRU-2, LRFU, LIRS, MQ, 2Q, and ARC, as well as their corresponding adapted versions, which assume kernel prefetching and explicitly manage prefetched blocks. Table 2 lists the default parameters for each algorithm. To evaluate each algorithm with comparable parameters, we provide additional history (a ghost cache) equal to the cache size for each algorithm that uses history information. Note that the authors of LIRS suggest using 1 percent of the cache size for HIR blocks, but we observed that using this value in the presence of prefetching takes away much of the benefit from prefetching. Therefore, we used 10 percent of the cache size for HIR blocks in LIRS; our selection did not have an adverse effect on the performance of LIRS.
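The division of labor between the support module and a pluggable replacement algorithm can be pictured with a small interface sketch. The method names below are assumptions of this illustration; they are not AccuSim's actual API.

# Illustration of the support-module / replacement-algorithm split described
# above. The interface names are hypothetical, not AccuSim's actual API.

class ReplacementAlgorithm:
    """A pluggable policy only decides which (inode, offset) to evict."""
    def block_referenced(self, inode, offset, prefetched):
        raise NotImplementedError
    def choose_victim(self):
        raise NotImplementedError   # returns an (inode, offset) pair

class SupportModule:
    """Owns the cache contents; delegates only the eviction decision."""
    def __init__(self, capacity, policy: ReplacementAlgorithm):
        self.capacity, self.policy = capacity, policy
        self.cache = set()

    def reference(self, inode, offset, prefetched=False):
        key = (inode, offset)
        hit = key in self.cache
        self.policy.block_referenced(inode, offset, prefetched)
        if not hit:
            if len(self.cache) >= self.capacity:
                self.cache.discard(self.policy.choose_victim())
            self.cache.add(key)     # the support module issues the disk I/O
        return hit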
The kernel prefetcher and I/O clustering modules simulate a buffer cache that faithfully implements the kernel prefetching and I/O clustering of the Linux 2.4 kernel, as described in Section 2.1. The I/O clustering mechanism attempts to cluster I/Os to consecutive blocks into disk requests of up to 64 Kbytes.

The disk layout module maps the logical block offsets within a file to unique disk blocks on the simulated disk. In our experiments, we assumed an empty disk at the start of each simulation and utilized a simple first-touch layout scheme, where the files are arranged sequentially on the disk one after another in the order in which they are first accessed. We preprocessed the trace to calculate the maximum size that a file can grow to. However, the users of AccuSim can easily modify the disk layout scheme to best suit their needs. The output of the disk layout module is absolute disk block numbers. This information is then used by DiskSim to simulate the proper location of file blocks on the disk.

Finally, the accounting and reporting module interacts with all other modules of the simulator and provides statistics about the performance of the replacement algorithm being simulated. For each simulation, the simulator measures the hit ratio, the number of resulting (clustered) disk requests, and the execution time. With prefetching, access to a prefetched block is counted as a hit if the prefetching is completed and as a miss otherwise. For an application with multiple processes, the I/O requests from all of the processes of the application are captured in the trace and are used to drive the simulator as if they were issued from a single process.

5.2 Simulating I/O Time under I/O Clustering and Prefetching

In the presence of I/O clustering and prefetching, it is not possible to obtain an accurate hit ratio without a time-aware simulator. Therefore, we interfaced our buffer cache simulator with DiskSim 3.0 [17], which is an accurate disk simulator. In this paper, we use DiskSim to simulate the Seagate ST39102LW disk. Table 3 shows the various parameters of the disk model.

DiskSim exports an API designed to allow it to be easily integrated into a complete storage system simulator. Our simulator utilizes this API to interface with DiskSim. The disksim_initialize, disksim_shutdown, and disksim_dump_stats APIs are used to start, stop, and extract I/O statistics from DiskSim, respectively. To issue a disk request to the disk, our simulator first creates a tracking entry for the request and initially marks it as "pending." Next, the simulator uses the disksim_request_arrive function to issue the request to DiskSim and to register a callback function that will be automatically triggered by DiskSim upon completion of the request.

Fig. 4. Examples showing the steps involved in simulating synchronous and asynchronous disk requests. (i) (a) AccuSim issues a synchronous request to DiskSim. (b) AccuSim queries DiskSim to determine the I/O time for the request. (c) AccuSim then increments the time both locally and in DiskSim by the determined I/O time. (d) DiskSim calls the callback function associated with the I/O to mark it as complete. (e) AccuSim increments the time by the next computation time. (ii) (a) AccuSim issues an asynchronous request to DiskSim. (b) AccuSim increments the time by the next computation time. (c) DiskSim determines that the I/O has completed in the interval of the increment and calls the callback function associated with the I/O to mark it as complete.

To synchronize the time of the buffer cache simulator with DiskSim's internal time reference, the simulator driver uses disksim_internal_event to update the internal DiskSim time. The internal DiskSim I/O scheduler uses this time information to generate events for all disk requests that have been completed in that time period. The events then invoke the callback functions associated with any completed requests. Finally, the callback functions mark their associated requests as "ready" in the simulator.

The trace driver updates DiskSim's time differently for synchronous and asynchronous disk requests, that is, from synchronous and asynchronous prefetching. For a synchronous request, after issuing the request, the driver queries DiskSim to determine the time that it would take the request to complete and then updates both the local and DiskSim's time by the request completion time. The steps of the process are shown in Fig. 4i. For an asynchronous request, the driver simply issues the request and continues to process the trace immediately by updating the DiskSim time by the computation time that should follow the I/O request. The steps of the process are shown in Fig. 4ii. This way, the asynchronous request is effectively overlapped with the computation and may be completed at the end of the computation interval, depending on the length of the computation interval. Note that it can happen that the next I/O request (originated by the application) is to a block that was previously prefetched by the kernel using an asynchronous request and the block is not yet ready. In this case, the simulator will wait for that request to complete by once again querying DiskSim to determine the remaining time required for the request to complete and then updating both the local and DiskSim's time by the remaining request completion time. In summary, the combined cache simulator and DiskSim allow us to simulate the I/O time of an application and measure the reduction of the execution time under each replacement algorithm.
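The timing rules of Fig. 4 can be expressed as a small event loop. The sketch below uses a toy stand-in for DiskSim (a fixed service-time model) purely to show how simulated time advances for synchronous versus asynchronous requests; the disk-model function names are hypothetical and do not correspond to the DiskSim API.

# Toy illustration of the timing rules of Fig. 4. The "disk" here is a
# hypothetical stand-in with a fixed per-request service time; only the way
# simulated time advances mirrors the description above.

class ToyDisk:
    SERVICE_TIME = 5.0                       # ms per request (arbitrary)
    def __init__(self):
        self.outstanding = {}                # request id -> completion time
    def request_arrive(self, req_id, now):
        self.outstanding[req_id] = now + self.SERVICE_TIME
    def completion_time(self, req_id):
        return self.outstanding[req_id]
    def completed_by(self, now):
        return [r for r, t in self.outstanding.items() if t <= now]

class Timeline:
    def __init__(self, disk):
        self.disk, self.now, self.ready = disk, 0.0, set()

    def synchronous_request(self, req_id):
        # Issue, then advance time by the full service time (Fig. 4i).
        self.disk.request_arrive(req_id, self.now)
        self.now = self.disk.completion_time(req_id)
        self.ready.add(req_id)

    def asynchronous_request(self, req_id, compute_time):
        # Issue and immediately overlap with computation (Fig. 4ii).
        self.disk.request_arrive(req_id, self.now)
        self.now += compute_time
        self.ready.update(self.disk.completed_by(self.now))

    def wait_for(self, req_id):
        # On-demand access to a block whose prefetch has not completed yet:
        # advance time by the remaining service time.
        if req_id not in self.ready:
            self.now = max(self.now, self.disk.completion_time(req_id))
            self.ready.add(req_id)

t = Timeline(ToyDisk())
t.synchronous_request("r1")                  # now = 5.0
t.asynchronous_request("r2", compute_time=2) # now = 7.0, r2 still outstanding
t.wait_for("r2")                             # now = 10.0 once r2 completes
print(t.now)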
In comparing the execution times of each application under different replacement algorithms, we do not consider the execution overhead of implementing the different replacement algorithms. We expect that the execution overhead of any reasonably efficient implementation of the replacement algorithms will be insignificant compared to the difference in the execution time from using different algorithms or prefetching.

The AccuSim simulator, which includes the implementations of the eight cache replacement algorithms studied in this paper, has been made publicly available [1].

6 PERFORMANCE EVALUATION

In this section, we evaluate the impact of Linux kernel prefetching on the performance of replacement algorithms by using our simulator described in the previous section.

6.1 Traces

The detailed traces of the applications were obtained by modifying the strace Linux utility. Strace intercepts the system calls of the traced process and is modified to record the following information about the I/O operations: access type, access issue and completion time, file identifier (inode), offset into the file of the first block being accessed, and I/O size.

In this study, we consider a reference to be consecutive if it is to a block that is either the same as or immediately next to (in the same file) the block that was accessed in the preceding reference. We define the fraction of consecutive references as the fraction of total references in a trace that are consecutive. For example, if the accesses to a file are to blocks 5, 6, 7, and 8, then the fraction of consecutive references is 75 percent. Similarly, the fractions of consecutive references in the reference sequences (1, 2, 3, 4, 100, 101, 102, 103), (1, 100, 2, 3, 4, 5, 6, 7), and (1, 2, 3, 100, 101, 102, 103, 104) are 75 percent, 62.5 percent, and 75 percent, respectively.
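The fraction of consecutive references is straightforward to compute; the helper below reproduces the worked examples in the text (it is an illustration of the definition, not the tool used in the study).

# Computes the fraction of consecutive references as defined above: a
# reference is consecutive if it is to the same block as, or the block
# immediately following, the previously referenced block of the same file.
def consecutive_fraction(refs):
    """refs: list of (inode, block) pairs in reference order."""
    last = {}                      # inode -> last referenced block
    consecutive = 0
    for inode, block in refs:
        if inode in last and block in (last[inode], last[inode] + 1):
            consecutive += 1
        last[inode] = block
    return consecutive / len(refs)

same_file = lambda blocks: [(1, b) for b in blocks]
print(consecutive_fraction(same_file([5, 6, 7, 8])))                       # 0.75
print(consecutive_fraction(same_file([1, 100, 2, 3, 4, 5, 6, 7])))         # 0.625
print(consecutive_fraction(same_file([1, 2, 3, 100, 101, 102, 103, 104]))) # 0.75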

TABLE 4 Applications and Trace Statistics

Tables 4 and 5 show the six applications and three concurrent executions of the mixed applications used in this study. For each application, Table 4 lists the number of I/O references (in 4-Kbyte blocks), the size of the I/O footprint (unique file blocks), the working set size (defined below), the number of unique files accessed, the fraction of consecutive references, and the average and standard deviation of the amount of execution time spent between every two consecutive application-issued I/O system calls. The selected applications and workload sizes are comparable to the workloads in recent studies [13], [21], [27], [29], [14], [20] and require cache sizes of up to 1,024 Mbytes. We determined the working set size of the studied applications as follows: First, the maximum hit ratio for an application under Belady's optimal cache replacement algorithm is determined using an infinite-sized cache. The working set size for this application is then determined as the minimum cache size at which the optimal replacement achieves a hit ratio equal to 90 percent of the maximum hit ratio. We classify the applications into three groups: sequential-access applications, which read entire files mostly consecutively (the fraction of consecutive references is high); random-access applications, which perform small accesses to different parts of the file (the fraction of consecutive references is low); and mixed-access applications, which represent a mix of concurrently running sequential-access and random-access applications. Finally, the intersystem call times in Table 4 give an indication of how much opportunity is available in each application to overlap computation with asynchronous disk requests.
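The working-set-size rule above can be stated as a short search. The sketch assumes a helper opt_hit_ratio(trace, cache_blocks) that simulates Belady's algorithm (not shown) and simply scans candidate cache sizes.

# Sketch of the working-set-size definition above: the smallest cache size at
# which OPT reaches 90 percent of its infinite-cache hit ratio. The helper
# opt_hit_ratio is assumed to exist (a Belady simulation).

def working_set_size(trace, candidate_sizes, opt_hit_ratio):
    max_hit = opt_hit_ratio(trace, cache_blocks=float("inf"))
    for size in sorted(candidate_sizes):
        if opt_hit_ratio(trace, cache_blocks=size) >= 0.9 * max_hit:
            return size
    return max(candidate_sizes)   # working set exceeds all candidate sizes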
6.1.1 Sequential-Access Applications

Cscope, glimpse, gcc, and viewperf almost always read entire files sequentially and, thus, prefetching will benefit these applications.

Cscope [37] performs examination of C source code. The examined source code is Linux kernel 2.4.20. Glimpse [28] is an indexing and query system and is used to search for text strings in 669 Mbytes of text files under the /usr directory. In both cscope and glimpse, an index is built first and single-word queries are then issued. Only I/O operations during the query phases are used in the experiments. Table 4 shows that cscope and glimpse are good candidates for sequential prefetching since 73.3 percent and 74.8 percent of their references, respectively, occur to consecutive blocks. The remaining fraction of references are not consecutive because they reference the beginning of new files.

Gcc builds Linux kernel 2.6.10 and is one of the commonly used benchmarks. It has a very small working set: 4 Mbytes of buffer cache is enough to contain the header files. Gcc reads entire files sequentially. However, the benefit of prefetching will be limited since only 35 percent of the references occur to consecutive blocks, that is, the fraction of consecutive references is 35 percent. The reason for the low fraction of consecutive references is that, in gcc, 40 percent of the references are to files that are only one block long. Since accesses to such files are always to the first block of the files, the kernel always attempts prefetching (of two more blocks), but the one-block length implies that there are no blocks to be prefetched. If we ignore the references to the one-block-long files, then we find that, of the remaining references, 36 percent occur to first blocks of files and 56 percent occur to consecutive blocks as those files are scanned. Since gcc almost always scans through the files that it processes, we have classified it as a sequential-access application.

Viewperf is a System Performance Evaluation Cooperative (SPEC) benchmark that measures the performance of a graphics workstation. The benchmark executes multiple tests to stress different capabilities of the system. The patterns are mostly regular loops, as viewperf reads entire files to render images. Over 99 percent of the references are to consecutive blocks within a few large files, resulting in a perfect opportunity for prefetching.

6.1.2 Random-Access Applications

The MySQL [31] database system is used to run the TPC-H (tpc-h) and TPC-R (tpc-r) benchmarks [41] against a database of size 10 Gbytes. Tpc-h and tpc-r access a few large data files, some of which have multiple concurrent access patterns. Tpc-h and tpc-r perform random accesses to the database files.² However, 55 percent of the references occur to consecutive blocks for the following reason: These two benchmarks perform a separate I/O operation to access a database record, and consecutive I/O operations tend to access records that reside on the same disk block. In this case, although only one block is accessed several times, the I/O operations are counted as consecutive and contribute to the overall high percentage of consecutive references. Note, however, that such kinds of consecutive references provide no opportunity for prefetching and, hence, we classify these applications as random-access applications. In fact, prefetching will not only increase the disk bandwidth demand but also pollute the cache.

6.1.3 Concurrent Applications

TABLE 5 Concurrent Applications

Multi1 consists of concurrent executions of cscope and gcc; it represents the workload in a code development environment.

² We note that the predominant access pattern seen in database workloads depends on the size of the database. For large databases used in industry database workloads, which can be 10 to 100 times larger than ours, the entries accessed by the queries will be significantly larger and the workload tends to show a predominantly sequential pattern.

Fig. 5. The hit ratio, number of clustered disk requests, and execution time for cscope under various algorithms.

Multi2 consists of concurrent executions of cscope, gcc, and 344,000 requests with prefetching. This is because, without viewperf; it represents the workload in a workstation prefetching, cscope offers little opportunity for clustering as it environment used to develop graphical applications and reads files in small chunks (mostly, two blocks at a time). simulations. Multi3 consists of concurrent executions of When prefetching is enabled, the sequential nature of the glimpse and tpc-h; it represents the workload in a server accesses allows a larger number of contiguous blocks to be environment running a database server and a Web index read asynchronously. These blocks are clustered together, server. resulting in more blocks read per disk request. As a result, the number of disk requests decreases. 6.2 Results Third, the effect of prefetching on disk requests cannot In this section, we examine the impact of kernel prefetching always be predicted based on the effect of prefetching on the on each group of the applications. hit ratio. The relationship between prefetching and disk requests can be complex and is closely tied to the application 6.2.1 Sequential-Access Applications file access patterns. Cscope gives an example where prefetch- The accesses in the applications in this group exhibit ing increases the opportunity for clustering, which, in turn, looping reference patterns, where blocks are referenced reduces the number of disk requests. However, if prefetched repeatedly with regular intervals. As expected, for applica- blocks are not accessed, then it may not result in a decrease in tions in this group, prefetching improves the hit ratios. the number of disk requests. This is observed for random- However, the improvement from prefetching is more access applications in Section 6.2.2. pronounced in some applications than in others. Fourth, prefetching can result in significant changes in Cscope. Fig. 5 shows the results for cscope. The following the relative performance of replacement algorithms. Some observations can be made: First, kernel prefetching has a interesting effects are seen as the cache size is increased to significant impact on the hit ratio, but the improvements for 128 Mbytes. For example, without prefetching, the hit ratios different algorithms differ. At 128 Mbytes cache size, of 2Q and OPT differ by 16 percent. With prefetching, the although the hit ratios for LRU, LRFU, and ARC increase gap is reduced to less than 5 percent. As another example, by 45.5 percent, 45.5 percent, and 45.4 percent, respectively, without prefetching, LRU-2 and LIRS achieve 51 percent the hit ratios for LRU-2, LIRS, and 2Q increase by and 40 percent lower hit ratios than 2Q, respectively, but, 67.7 percent, 60.6 percent, and 20.2 percent, respectively. with prefetching, they achieve 4 percent lower and 1 percent This suggests that it is important to consider kernel higher hit ratios than 2Q, respectively. prefetching when comparing the hit ratios of different Last, the reduction in the number of disk requests due to replacement schemes. kernel prefetching does not necessarily translate into a Second, kernel prefetching can significantly affect the proportional reduction in the execution time. For example, effectiveness of clustering of I/O requests and, hence, the at 128 Mbytes cache size, prefetching reduces the numbers reduction in the number of disk requests. 
Third, the effect of prefetching on disk requests cannot always be predicted based on the effect of prefetching on the hit ratio. The relationship between prefetching and disk requests can be complex and is closely tied to the application file access patterns. Cscope gives an example where prefetching increases the opportunity for clustering, which, in turn, reduces the number of disk requests. However, if prefetched blocks are not accessed, then it may not result in a decrease in the number of disk requests. This is observed for random-access applications in Section 6.2.2.

Fourth, prefetching can result in significant changes in the relative performance of replacement algorithms. Some interesting effects are seen as the cache size is increased to 128 Mbytes. For example, without prefetching, the hit ratios of 2Q and OPT differ by 16 percent. With prefetching, the gap is reduced to less than 5 percent. As another example, without prefetching, LRU-2 and LIRS achieve 51 percent and 40 percent lower hit ratios than 2Q, respectively, but, with prefetching, they achieve 4 percent lower and 1 percent higher hit ratios than 2Q, respectively.

Last, the reduction in the number of disk requests due to kernel prefetching does not necessarily translate into a proportional reduction in the execution time.

TABLE 6 Number and Size of Synchronous and Asynchronous Disk I/Os in cscope at 128 Mbytes Cache Size

All times are in milliseconds.

For example, at 128 Mbytes cache size, prefetching reduces the numbers of disk requests for LRU, LRFU, and ARC by 44 percent, 44 percent, and 45 percent, respectively, whereas the corresponding execution times only decrease by 3.3 percent, 3.3 percent, and 3.1 percent, respectively. In contrast, prefetching reduces the numbers of disk requests for LIRS, LRU-2, and MQ by 70 percent, 72 percent, and 55 percent, respectively, which causes the execution time to decrease by 43.4 percent, 45.4 percent, and 25.2 percent, respectively.

To understand the above disparate improvement in the execution time for different algorithms, we examined cscope at 128 Mbytes for each of the studied algorithms: the behavior of (clustered) synchronous disk requests issued without prefetching and of synchronous and asynchronous disk requests issued when prefetching is enabled. Table 6 shows the numbers for the eight algorithms, sorted by the decreasing amount of improvement from prefetching. For each kind of request, we measured the number of requests issued, the average number of blocks accessed per request, and the average time that it took to service the request. The case without prefetching represents the actual on-demand accesses issued by the application; these blocks, if not present in the cache, are always read synchronously. For prefetching, we also show the average time to service a request over both synchronous and asynchronous requests and the ratio of this time to that without prefetching. Additionally, we list the percentage improvement in the hit ratio, the number of I/Os, the I/O time, and the overall execution time.

Table 6 shows that the total number of disk requests (synchronous and asynchronous) decreases by 44 percent to 72 percent with prefetching for all schemes except OPT. The number of synchronous disk requests is reduced and, instead, a large number of asynchronous disk requests, each for a larger number of blocks, are issued. As expected, with prefetching, the disk requests (synchronous or asynchronous) are for a larger number of blocks than without prefetching. However, with prefetching, the average time to service a disk request also increases, by a factor between 1.2 and 2.0 for all schemes. The reason for this increase is as follows: The time to service a disk request is, in general, spent on moving the disk arm to the desired location on disk (seek time and rotational delay) and reading the data from that location (transfer time). Since cscope has predominantly sequential accesses, the I/O time is mainly the transfer time, with or without prefetching. Since more blocks are read per request with prefetching than without prefetching, the time to service a prefetching request is proportionally longer.

Since cscope is an I/O-bound application with an average interval between successive I/Os of only 0.13 ms (Table 4), there is little opportunity for overlapping the asynchronous prefetching request time of over 5 ms with computation. Hence, the total execution time with and without prefetching is dominated by the total I/O time, whether it is spent on synchronous or asynchronous I/O requests. For ARC, LRFU, LRU, and 2Q, the number of disk requests is reduced by about 44 percent, but the time to service a request increases by a factor of 1.7. As a result, the overall execution time only improves a little. In contrast, for LRU-2, LIRS, and MQ, the disk requests are reduced by 72 percent, 70 percent, and 55 percent, respectively, whereas the average time to service a request increases by a factor of 2.0, 1.5, and 1.2, respectively. As a result, the execution time is reduced by 45 percent, 43 percent, and 25 percent, respectively. In summary, with prefetching, although the increase in the average time to service a request is comparable across these algorithms, the numbers of disk requests are quite different, resulting in different execution times for different algorithms.
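A rough sanity check of these numbers is possible with a simple model of ours (not one used in the paper): since cscope is I/O bound, approximate the execution time by the number of disk requests times the average service time per request and ignore any overlap with computation. Using the rounded figures above,

\[
\frac{T_{\text{with prefetching}}}{T_{\text{without prefetching}}}
\;\approx\;
\frac{N_{\text{pref}}\,\bar{t}_{\text{pref}}}{N\,\bar{t}}
\;\approx\;
\begin{cases}
(1-0.44)\times 1.7 \approx 0.95 & \text{for LRU, LRFU, ARC, and 2Q,}\\
(1-0.72)\times 2.0 \approx 0.56 & \text{for LRU-2,}
\end{cases}
\]

that is, roughly a 5 percent and a 44 percent reduction, close to the measured 3.3 percent and 45.4 percent. The model is only approximate (for MQ, for example, it overestimates the savings), but it shows why a large drop in the request count can be largely cancelled by the longer service time of each clustered request.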
Finally, the above significant difference in the average number of disk requests for the studied algorithms with prefetching can be explained by the difference in the hit ratios of the caching algorithms. Cscope is a benchmark with a strong looping reference pattern. Prefetching only reduces the total number of requests by enlarging the size of each request, but the average hit ratio is still correlated with the request number. This is because cscope is I/O bound and, hence, asynchronous prefetching cannot be overlapped with computation. Further, each prefetching request is for consecutive blocks and, hence, the subsequent (looping) accesses after a miss, which are to the prefetched blocks, will be hits.

Glimpse. Glimpse (Fig. 6) also benefits from prefetching. The curves for the hit ratio shift up by an average of 10 percent for small cache sizes. In contrast, the hit ratios in cscope increase by over 40 percent. The smaller increase can be explained as follows: About 72 percent of the files accessed by glimpse are one or two blocks long. For these file accesses, there is little benefit from prefetching. In addition, 5 percent of the accesses to the rest of the files read chunks of 25 or more blocks, which limits the additional benefit that can be derived from clustering with prefetching compared to without prefetching. As a result, the benefit from prefetching is small compared to cscope.

The changes in the relative behavior of different algorithms observed in cscope with prefetching are also observed in glimpse. In fact, there is a flip between the performance of 2Q and ARC at 128 Mbytes cache size. Without prefetching, 2Q and ARC have hit ratios of 7.2 percent and 11.4 percent, respectively. With prefetching, the hit ratios change to 19.4 percent and 11.5 percent, respectively. Similarly, there is a flip between 2Q and LIRS at cache sizes of 32 and 64 Mbytes.

The improved hit ratio due to prefetching does not translate into a proportionally reduced number of disk requests, as shown in Fig. 6.

Fig. 6. The hit ratio, number of clustered disk requests, and execution time for glimpse under various algorithms.

The hit ratio for LRU-2 at 64 Mbytes cache size increases by 10 percent with prefetching, but the corresponding number of disk requests decreases by less than 2 percent. This can be explained as follows: As discussed earlier, glimpse reads either small files, for which clustering has little advantage, or large files in large chunks, where clustering is able to minimize disk requests even without prefetching. Although prefetching provides improvements in the hit ratio by bringing in blocks before they are accessed, it does not provide any additional benefit of clustering I/Os together into chunks, as it would for small-sized accesses to large files. Hence, there is only a small reduction in the number of disk requests.

In contrast to the hit ratio, the number of disk requests provides a much better indication of the relative execution time among different replacement algorithms. For instance, at 32 Mbytes cache size with prefetching, the hit ratios for 2Q, LRU-2, LRFU, and MQ increase by 13 percent, 12 percent, 14 percent, and 14 percent, respectively. However, the execution times for these schemes show virtually no improvement (a mere 0.5 percent decrease in time), as observed in the graph. Examining the number of disk requests shows that, for the four schemes, the number of disk requests decreases by less than 1 percent with prefetching, which gives a much better indication of the effect of prefetching on the execution time. As another example, at 128 Mbytes cache size, the hit ratios for 2Q, LRU-2, LRFU, and MQ increase by about 12 percent. The corresponding numbers of disk requests decrease by less than 1 percent, except for LRU-2, for which the number decreases by 2 percent.
Hence, although the hit ratio suggests an improvement in execution time for all schemes, the number of disk requests suggests a slight improvement in execution time for LRU-2 only. The actual execution time is virtually unchanged, except for LRU-2, where it is 1 percent lower than without prefetching. A similar correspondence between the number of disk requests and the execution time can be observed for other cache sizes.

Gcc. In gcc (Fig. 7), the benefit from prefetching is not as pronounced as in cscope and glimpse. Here, even for OPT, the average improvement in hit ratios is under 0.8 percent, which has a negligible effect on the number of disk requests and the resulting execution time. The main reason for such a small impact of prefetching in gcc is that many accesses are to small files, for which there is little opportunity for prefetching. Hence, prefetching is unable to create longer disk requests that can be issued asynchronously and overlapped with the execution time. As a result, all three performance metrics—the hit ratio, the number of disk requests, and the execution time—are almost identical with or without prefetching for all replacement algorithms.

An exception to this is ARC at 1 Mbyte cache size, where prefetching improves the hit ratio by 16.6 percent. This improvement allows ARC to perform better I/O clustering, resulting in a decrease of 41.1 percent in disk requests and consequently decreasing the execution time by 21.2 percent. Also notice the flip in the performance of ARC at 1 Mbyte with respect to both LRFU and 2Q. Without prefetching, LRFU and 2Q achieve hit ratios of 71.8 percent and 73.0 percent, respectively, whereas ARC achieves only 62.0 percent. However, prefetching improves the hit ratios of LRFU and 2Q by a mere 1.2 percent and 0.9 percent, respectively, but that of ARC by 16.6 percent (to 78.6 percent), which makes ARC outperform both LRFU and 2Q.

Table 7 shows the breakdown of synchronous and asynchronous disk requests issued with and without prefetching at 1 Mbyte cache size for gcc. Unlike in Table 6 for cscope, the total number of disk requests does not decrease much with prefetching.

Fig. 7. The hit ratio, number of clustered disk requests, and execution time for gcc under various algorithms.

TABLE 7 Number and Size of Synchronous and Asynchronous Disk I/Os in gcc at 1 Mbyte Cache Size

All times are in milliseconds.

As in cscope, the number of synchronous disk requests is reduced and a large number of asynchronous disk requests, each for a larger number of blocks, are issued. As gcc accesses small files, the average size of individual disk requests does not increase by a large factor, as was the case for cscope. As a result, the average time to service a disk request only increases by at most a factor of 1.1. Once again, we observe that the time to service a request for more blocks is proportionally longer. Finally, with prefetching, with the exception of ARC, which posts a 7 percent improvement in execution time, the other algorithms experience less than 1 percent improvement in execution time.

Viewperf. The behaviors of the different cache replacement algorithms in viewperf, as shown in Fig. 8, are similar to those observed in cscope. As viewperf accesses large files in small chunks, it is able to see the maximum benefit from prefetching. For instance, when prefetching is turned on, the hit ratio improves from 37 percent to 89 percent and the number of disk requests is reduced by a factor of 12.4, on average, across different cache sizes for all algorithms except OPT.

Although all algorithms post a huge gain with prefetching for viewperf, we can still observe the different effects prefetching has on different algorithms. At 32 Mbytes cache size without prefetching, the hit ratios of LRU-2, ARC, and 2Q, 33.4 percent, 38.5 percent, and 43.3 percent, are about 5 percent apart. Prefetching reduces this difference to a mere 1.4 percent, with LRU-2, ARC, and 2Q achieving 87.3 percent, 88.7 percent, and 90.3 percent hit ratios, respectively. This shows that prefetching can take away the benefit that one cache replacement algorithm may have over another algorithm without prefetching.

These observations stress the importance of taking prefetching into account while comparing different algorithms, even if the benefit of prefetching is as small as in gcc or as large as in viewperf.

Fig. 8. The hit ratio, number of clustered disk requests, and execution time for viewperf under various algorithms.

Finally, since viewperf is a CPU-bound application, the improvement in the hit ratio and the number of disk requests does not translate into any significant reduction in the execution time. For example, although the hit ratios and the numbers of disk requests improve significantly for all algorithms, as seen in the hit ratio and disk request graphs, the execution time is reduced, on average over all cache sizes and all algorithms except OPT and LRU-2, only from 876 to 837 sec, that is, by 4.7 percent. Although small, this improvement in execution time is due to the large time interval between successive I/O requests, which allows asynchronously issued disk requests to be overlapped with the computation time.

In summary, the trend is not predictable from a simple investigation of the effects of prefetching on the hit ratio or the number of disk requests, stressing that it is important to take execution time into account when comparing different algorithms.

6.2.2 Random-Access Applications

The two applications in this group, tpc-h and tpc-r, exhibit predominantly random accesses. As expected (shown in Figs. 9 and 10), prefetching provides little improvement in the hit ratio for these applications. Furthermore, most of the prefetched blocks are not accessed and, as a result, both the number of disk requests and the execution time are doubled.

We investigated tpc-h further by examining the breakdown of the total number of disk requests into synchronous and asynchronous disk requests with prefetching and by comparing them to the number of synchronous requests without prefetching. We found that the number of synchronous disk requests stays almost the same with and without prefetching. However, with prefetching, the synchronous disk requests also read an extra prefetched block. Furthermore, an almost equal number of asynchronous disk requests as synchronous requests are issued, and each asynchronous disk request was found to prefetch two to four blocks. Hence, although the number of disk requests doubles, the number of blocks accessed from the disk increases by a factor of 5. This, in turn, results in the execution time of tpc-h being almost doubled with prefetching.
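The factor-of-five increase follows directly from this breakdown. Writing N for the number of synchronous requests without prefetching (one block each) and taking roughly three prefetched blocks per asynchronous request (our arithmetic for the two-to-four-block range reported above),

\[
\text{requests: } N \;\rightarrow\; \underbrace{N}_{\text{sync}} + \underbrace{N}_{\text{async}} = 2N,
\qquad
\text{blocks: } N \;\rightarrow\; \underbrace{2N}_{\text{sync (demand + 1 prefetched)}} + \underbrace{\approx 3N}_{\text{async}} \;\approx\; 5N,
\]

so the disk transfers about five times the data over twice as many requests, most of it spent on blocks that are never referenced, which is why the execution time roughly doubles.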
Although, with prefetching, both tpc-h and tpc-r show a small improvement in hit ratio, the significant increase in the number of I/Os translates into a significant increase in the execution time. This is a clear example where the relative hit ratio is not indicative of the relative execution time and the number of disk requests gives a much better indication of the relative performance of different replacement algorithms.

The above results show that even the elaborate prefetching scheme of the Linux kernel is unable to stop useless prefetching and prevent performance degradation. This implies that applications dominated by random I/O accesses, such as database applications, should disable prefetching or use alternative file access methods when running on standard file systems such as in Linux.
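On a POSIX system, an application can opt out of kernel readahead itself, without any kernel change. The sketch below is our illustration (the file name is a placeholder), using the standard posix_fadvise call; on Linux, marking a descriptor as randomly accessed disables readahead for it. Opening the file with O_DIRECT to bypass the buffer cache entirely is another commonly used alternative access method.

    #define _XOPEN_SOURCE 600
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        int fd = open("table.db", O_RDONLY);   /* placeholder data file */
        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* Advise the kernel that accesses will be random; on Linux this
         * turns off readahead for this file descriptor. posix_fadvise()
         * returns 0 on success or an error number on failure. */
        int err = posix_fadvise(fd, 0, 0, POSIX_FADV_RANDOM);
        if (err != 0)
            fprintf(stderr, "posix_fadvise: %s\n", strerror(err));

        /* ... issue the application's random reads here ... */
        return 0;
    }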

Fig. 9. The hit ratio, number of clustered disk requests, and execution time for tpc-h under various algorithms.

Fig. 10. The hit ratio, number of clustered disk requests, and execution time for tpc-r under various algorithms.

6.2.3 Concurrent Applications

The applications in this group contain accesses that are a mix of sequential and random accesses. Multi1 and multi2 contain more sequential accesses than multi3 and are dominated by cscope. The hit ratios and disk requests for multi1 (Fig. 11) with or without prefetching exhibit behavior similar to that for cscope. Observe the different effects of prefetching at 128 Mbytes cache size on the hit ratios of 2Q, LIRS, LRU-2, MQ, and ARC, which increase by 18.1 percent, 53.8 percent, 59.6 percent, 48.1 percent, and 39.9 percent, respectively. The different effects on hit ratios cause LIRS to perform 1 percent better than 2Q, whereas it did 35 percent worse than 2Q without prefetching. Similarly, LRU-2 at 128 Mbytes achieves a 17.7 percent higher hit ratio than ARC with prefetching but a 2.0 percent lower one without prefetching.

The improvement in hit ratios causes the number of disk requests with prefetching to be almost half that without prefetching (an average reduction of 47.8 percent). However, this does not translate into similar savings in the execution time. Only for MQ, LIRS, and LRU-2 at 128 Mbytes cache size does the improvement in hit ratio and number of disk requests cause the execution time to decrease, by 22.4 percent, 38.6 percent, and 41.0 percent, respectively. For all other algorithms at 128 Mbytes, the average reduction in the number of disk requests of 39.3 percent translates into a mere average improvement of 2.6 percent in the execution time.

Multi2 (Fig. 12) behaves similarly to multi1 due to the predominant sequential accesses of cscope.

Fig. 11. The hit ratio, number of clustered disk requests, and execution time for multi1 under various algorithms.

Fig. 12. The hit ratio, number of clustered disk requests, and execution time for multi2 under various algorithms.

However, due to the presence of sequential accesses from viewperf, prefetching causes the average hit ratios for the different algorithms to increase from about 20 percent to 60 percent.

The varying effect of prefetching on the different algorithms can also be observed. For instance, at 128 Mbytes cache size, without prefetching, LIRS achieved a 30 percent lower hit ratio than 2Q. Prefetching changes this behavior and causes LIRS to achieve a 1 percent higher hit ratio than 2Q.

Fig. 13. The hit ratio, number of clustered disk requests, and execution time for multi3 under various algorithms.

Another interesting effect can be noticed for LRU-2 at 128 Mbytes cache size. Without prefetching, LRU-2 achieves 4.3 percent and 3.7 percent lower hit ratios than ARC and MQ, respectively, but, with prefetching, LRU-2 achieves 12.5 percent and 7.3 percent higher hit ratios than ARC and MQ, respectively. Interestingly, ARC and MQ also flip behavior with respect to each other at 128 Mbytes cache size, with ARC doing 0.7 percent better than MQ without prefetching and MQ doing 5.2 percent better than ARC with prefetching.

The trend in hit ratios for multi2 is also observed in the disk requests, as prefetching provides a better opportunity for clustering the predominantly sequential accesses in the mix of applications. However, except for LIRS, LRU-2, and MQ, the improvement in disk requests is not mirrored in the execution time. This is because 1) the presence of the CPU-bound viewperf in the mix lowers the improvement in the execution time for all algorithms and 2) a significant reduction in disk requests is needed to achieve a reduced execution time for cscope, as discussed in Section 6.2.1.

Multi3 (Fig. 13) has a large number of random accesses due to tpc-h and, therefore, its performance curves look similar to those of tpc-h. Similar observations as for tpc-h can be made from the graphs for multi3. For instance, the improvement in hit ratios is very small, but prefetching causes data that are not used by the application to be read from the disk, resulting in the number of disk requests increasing, on average, by a factor of 1.7. Consequently, the execution time, on average, increases by a factor of 1.3.

The observations made for the concurrent applications stress the same inferences drawn from the study of individual applications: Prefetching affects the hit ratio, the number of disk requests, and the execution time differently for different algorithms and, moreover, kernel prefetching degrades the performance of random-access applications.

6.2.4 Summary

Table 8 categorizes the kinds of impact of prefetching on the hit ratio, number of disk requests, and execution time of different algorithms, along with example scenarios where these impacts can be observed.

First, prefetching has an impact on the hit ratio for all applications and all algorithms studied. However, this impact is different for different algorithms, benefiting some algorithms more than others. As a result, the relative performance of two algorithms can flip with respect to each other, that is, an algorithm that achieves a higher hit ratio than another algorithm without prefetching can end up with a lower hit ratio than the other algorithm with prefetching. Therefore, it is critical to take kernel prefetching into account while comparing the hit ratios of different cache replacement algorithms.

Second, the effect of prefetching on the hit ratio may not provide an indication of how the disk requests will be affected. Prefetching increases the opportunity for I/O request clustering for sequential accesses, which results in a reduced number of disk requests. However, for random accesses, prefetching can cause wrong data to be read, resulting in an increased number of disk requests. Therefore, the number of disk requests should be carefully studied when comparing different cache replacement algorithms.
Finally, the effect of prefetching on disk requests may not provide a good indication of the execution time. In addition to the reduction in the number of disk requests, the improvement in the final execution time of an application is also affected by the amount of computation time between successive I/O requests and, hence, the opportunities for overlapping asynchronous disk requests (from asynchronous prefetching) with computation. For example, cscope, which is I/O bound, does not provide such an opportunity, whereas viewperf, which is CPU bound, does. Hence, the final execution time is the only definitive measure of application performance and it should be studied when comparing different cache replacement algorithms.
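This dependence on overlap can be summarized with a simple model of ours (not a formula from the paper): only the part of each asynchronous request that cannot be hidden behind the computation gap before the next I/O adds to the running time,

\[
T_{\text{exec}} \;\approx\; T_{\text{cpu}} \;+\; T_{\text{sync I/O}} \;+\; \sum_{i\,\in\,\text{async requests}} \max\bigl(0,\; t_i - g_i \bigr),
\]

where t_i is the service time of asynchronous request i and g_i is the computation time available to overlap it. For cscope, the gaps are about 0.13 ms against request times of over 5 ms, so almost nothing is hidden; for viewperf, the gaps are large and most of the asynchronous I/O falls off the critical path.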

TABLE 8 Summary of Observations Made during Evaluation of Various Schemes

7 RELATED WORK

7.1 Other Replacement Algorithms

In addition to recency/frequency-based cache replacement algorithms, many of which are described in Section 4, there are two other classes of cache replacement algorithms: hint-based and pattern-based.

Pattern-based algorithms. The Sequential algorithm (SEQ) [15] detects sequential page fault patterns and applies the MRU policy to those pages. For other pages, the LRU replacement is applied. However, SEQ does not distinguish sequential and looping references. Early Eviction LRU (EELRU) [36] detects looping references by examining the aggregate recency distribution of referenced pages and changes the eviction point by using a simple cost-benefit analysis. DEtection-based Adaptive Replacement (DEAR) [12], [13], Unified Buffer Management (UBM) [24], and Program-Counter-based Classification (PCC) [16] are three closely related pattern-based buffer cache replacement schemes that explicitly separate and manage blocks that belong to different reference patterns. The patterns are classified into three categories: sequential, looping, and other (random). The three schemes differ in the granularity of classification: Classification is on a per-application basis in DEAR, a per-file basis in UBM, and a per-call-site basis in PCC. Due to page limitations, we did not evaluate pattern-based algorithms in this paper.
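As a rough illustration of the classification these schemes perform, the sketch below captures only the general idea; the actual DEAR, UBM, and PCC detectors are more elaborate, and the threshold, the field names, and the per-file granularity here are our assumptions.

    /* Simplified per-file detector for sequential, looping, and other
     * (random) reference patterns; illustrative only. */
    enum pattern { PAT_OTHER, PAT_SEQUENTIAL, PAT_LOOPING };

    struct file_state {
        long first_block;    /* first block referenced in this file   */
        long last_block;     /* most recently referenced block        */
        long run_length;     /* length of the current sequential run  */
        enum pattern pat;    /* current classification                */
    };

    #define SEQ_THRESHOLD 8  /* assumed run length before calling a file sequential */

    void pattern_init(struct file_state *f, long first)
    {
        f->first_block = f->last_block = first;
        f->run_length = 1;
        f->pat = PAT_OTHER;
    }

    /* Called on every block reference to the file. */
    void pattern_update(struct file_state *f, long block)
    {
        if (block == f->last_block + 1) {
            f->run_length++;                  /* sequential run continues */
            if (f->pat == PAT_OTHER && f->run_length >= SEQ_THRESHOLD)
                f->pat = PAT_SEQUENTIAL;
        } else if (block == f->first_block && f->pat != PAT_OTHER) {
            f->pat = PAT_LOOPING;             /* a detected run restarted from the beginning */
            f->run_length = 1;
        } else {
            f->pat = PAT_OTHER;               /* irregular (random) access */
            f->run_length = 1;
        }
        f->last_block = block;
    }

A replacement policy can then treat each class differently, for example, evicting blocks of sequentially accessed files early, which is the idea the three schemes apply at their respective granularities (application, file, or call site).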
Hint-based algorithms. In application-controlled cache management [9], [34], the programmer is responsible for inserting hints into the application that indicate to the OS what data will or will not be accessed in the future and when. The OS then takes advantage of these hints to decide what cached data to discard and when. This can be a difficult task, as the programmer has to accurately identify the access patterns of the application. To eliminate the burden on the programmers, compiler-inserted hints have been proposed [6]. These methods provide the benefits of user-inserted hints for existing applications by simply recompiling the applications with the proposed compiler. However, more complicated access patterns or input-dependent patterns may be difficult for the compiler to characterize.

7.2 I/O Prefetching

A number of works have considered I/O prefetching based on hints (reference patterns) about an application's I/O behavior. Such hints can be explicitly inserted by the programmer, derived and inserted by a compiler [30], or even explicitly prescribed by a binary rewriter [11] in cases where recompilation is not possible. Alternatively, dynamic prefetching has been proposed to detect reference patterns in the application and predict future block references. Future references to a file can be predicted by probability graphs [18], [42]. Another approach uses time series modeling [40] to predict temporal access patterns and issue prefetches during computation intervals. Prefetch algorithms tailored for parallel I/O systems have also been studied [2], [23], [25].
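A present-day counterpart of such hints on POSIX systems (our illustration, not one of the cited mechanisms) is the advisory posix_fadvise call with POSIX_FADV_WILLNEED, with which an application asks the kernel to start readahead for a region it expects to read soon and then continues computing:

    #define _XOPEN_SOURCE 600
    #include <fcntl.h>

    /* Hint that the byte range [offset, offset + len) of fd will be needed
     * soon so the kernel can begin readahead in the background. The call is
     * advisory: it returns 0 on success or an error number, and the kernel
     * may ignore the hint. */
    int hint_will_need(int fd, off_t offset, off_t len)
    {
        return posix_fadvise(fd, offset, len, POSIX_FADV_WILLNEED);
    }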
7.3 Integrated Prefetching and Caching

In [8], Cao et al. point out the interaction between integrated prefetching and caching and derive an aggressive prefetching policy with excellent competitive performance in the context of complete knowledge of future accesses. The work is followed by many integrated approaches, for example, [2], [9], [23], [25], [26], [34], and [39], which are either offline or based on hints of I/O access patterns.

Although an integrated prefetching and caching design is not supported in any modern OS, all modern operating systems implement some form of kernel prefetching in their file systems, on top of which buffer caching is layered. Though not integrated, kernel prefetching is expected to affect the buffer caching behavior as in integrated approaches. However, most recent caching algorithm studies did not consider the performance impact of kernel prefetching [3], [15], [21], [22], [27], [29], [32], [35], [36].

In [38], Belady's algorithm [4] is extended to simultaneously perform caching and read-ahead, and the extended algorithm minimizes the cache miss ratio or the number of disk I/Os. The offline algorithm is effectively a layered approach, where caching is layered on top of disk read-ahead.

Most recently, DUal LOcality (DULO) [20] and Wise Ordering of Writes (WOW) [14] study the combined effect of temporal and spatial locality on buffer caching. DULO improves the buffer cache performance by maintaining a separate buffer for prefetched blocks, whereas WOW exploits similar principles specifically for write caches. Both works are good examples that reaffirm the importance of taking into account the effects of kernel prefetching in designing better cache replacement algorithms, as first pointed out by our work [7].

8 CONCLUSION

Despite the well-known interactions between prefetching and caching, almost all buffer cache replacement schemes proposed over the last decade were studied without taking into account the file system prefetching that exists in all modern operating systems. In this paper, we performed a detailed simulation study of the impact of the Linux kernel prefetching on the performance of a set of eight representative replacement algorithms.

Our study shows that such kernel prefetching can have a profound impact on the relative performance of different replacement algorithms. In particular, prefetching can significantly improve the hit ratios of some sequential-access applications (cscope), but not of other sequential-access applications (gcc) or random-access applications (tpc-h and tpc-r). The difference in hit ratios may (cscope, gcc, and viewperf) or may not (glimpse, tpc-h, and tpc-r) translate into a similar difference in the number of disk requests. The difference in the number of disk requests may (gcc, glimpse, tpc-h, and tpc-r) or may not (cscope and viewperf) translate into differences in the execution time. As a result, the relative hit ratios and numbers of disk requests may (gcc) or may not (cscope, glimpse, and viewperf) be a good indication of the relative execution times. For random-access applications (tpc-h and tpc-r), prefetching has little impact on the relative hit ratios of different algorithms, but can have a significantly adverse effect on the number of disk requests and the execution time compared to without prefetching. This implies that, if possible, prefetching should be disabled when running random-access applications such as database systems on a standard file system such as in Linux. These results clearly demonstrate the importance for buffer caching research of taking file system prefetching into consideration.
Our study also raises several questions on buffer cache replacement algorithm design. How can we adapt an existing buffer cache replacement algorithm to exploit kernel prefetching? How can we modify an existing buffer cache replacement algorithm (or design a new one) to leverage the knowledge about access history to explicitly perform prefetching, as opposed to passively using kernel prefetching? How should the Linux or BSD kernel prefetching be changed to avoid counterproductive prefetches for random-access applications? We are studying these questions as part of our future work.

ACKNOWLEDGMENTS

This work is supported in part by US National Science Foundation CAREER Award ACI-0238379. The authors thank the anonymous reviewers, whose comments have greatly improved the content and presentation of this paper. They thank Zheng Zhang for his help with distributing and maintaining the AccuSim simulator.

REFERENCES

[1] AccuSim, http://shay.ecn.purdue.edu/~dsnl/accusim/, Nov. 2005.
[2] S. Albers and M. Büttner, "Integrated Prefetching and Caching in Single and Parallel Disk Systems," Proc. 15th Ann. ACM Symp. Parallel Algorithms and Architectures, June 2003.
[3] S. Bansal and D.S. Modha, "CAR: Clock with Adaptive Replacement," Proc. USENIX Conf. File and Storage Technologies, Mar. 2004.
[4] L.A. Belady, "A Study of Replacement Algorithms for a Virtual-Storage Computer," IBM Systems J., vol. 5, no. 2, pp. 78-101, 1966.
[5] D.P. Bovet and M. Cesati, Understanding the Linux Kernel, second ed. O'Reilly and Associates, 2003.
[6] A.D. Brown, T.C. Mowry, and O. Krieger, "Compiler-Based I/O Prefetching for Out-of-Core Applications," ACM Trans. Computer Systems, vol. 19, no. 2, pp. 111-170, 2001.
[7] A.R. Butt, C. Gniady, and Y.C. Hu, "The Performance Impact of Kernel Prefetching on Buffer Cache Replacement Algorithms," Proc. ACM Int'l Conf. Measurement and Modeling of Computer Systems (SIGMETRICS '05), June 2005.
[8] P. Cao, E. Felten, and K. Li, "A Study of Integrated Prefetching and Caching Strategies," Proc. ACM Int'l Conf. Measurement and Modeling of Computer Systems (SIGMETRICS '95), May 1995.
[9] P. Cao, E.W. Felten, A.R. Karlin, and K. Li, "Implementation and Performance of Integrated Application-Controlled File Caching, Prefetching, and Disk Scheduling," ACM Trans. Computer Systems, vol. 14, no. 4, pp. 311-343, 1996.
[10] R.W. Carr and J.L. Hennessy, "WSCLOCK—A Simple and Effective Algorithm for Virtual Memory Management," Proc. Eighth Symp. Operating Systems Principles, Dec. 1981.
[11] F.W. Chang and G.A. Gibson, "Automatic I/O Hint Generation through Speculative Execution," Proc. Third USENIX Symp. Operating Systems Design and Implementation, Feb. 1999.
[12] J. Choi, S.H. Noh, S.L. Min, and Y. Cho, "An Implementation Study of a Detection-Based Adaptive Block Replacement Scheme," Proc. 1999 USENIX Ann. Technical Conf., June 1999.
[13] J. Choi, S.H. Noh, S.L. Min, and Y. Cho, "Towards Application/File-Level Characterization of Block References: A Case for Fine-Grained Buffer Management," Proc. ACM Int'l Conf. Measurement and Modeling of Computer Systems (SIGMETRICS '00), June 2000.
[14] B.S. Gill and D.S. Modha, "WOW: Wise Ordering for Writes—Combining Spatial and Temporal Locality in Non-Volatile Caches," Proc. USENIX Conf. File and Storage Technologies, Dec. 2005.
[15] G. Glass and P. Cao, "Adaptive Page Replacement Based on Memory Reference Behavior," Proc. ACM Int'l Conf. Measurement and Modeling of Computer Systems (SIGMETRICS '97), June 1997.
[16] C. Gniady, A.R. Butt, and Y.C. Hu, "Program Counter Based Pattern Classification in Buffer Caching," Proc. Sixth Symp. Operating Systems Design and Implementation, Dec. 2004.
[17] G.R. Ganger, B.L. Worthington, and Y.N. Patt, "The DiskSim Simulation Environment," Technical Report CSE-TR-358-98, Dept. of Electrical Eng. and Computer Science, Univ. of Michigan, Feb. 1998.
[18] J. Griffioen and R. Appleton, "Performance Measurements of Automatic Prefetching," Proc. Parallel and Distributed Computing Systems, Sept. 1995.
[19] S. Jiang, F. Chen, and X. Zhang, "CLOCK-Pro: An Effective Improvement of the CLOCK Replacement," Proc. USENIX Ann. Technical Conf., Apr. 2005.
[20] S. Jiang, X. Ding, F. Chen, E. Tan, and X. Zhang, "DULO: An Effective Buffer Cache Management Scheme to Exploit Both Temporal and Spatial Locality," Proc. USENIX Conf. File and Storage Technologies, Dec. 2005.
[21] S. Jiang and X. Zhang, "LIRS: An Efficient Low Inter-Reference Recency Set Replacement Policy to Improve Buffer Cache Performance," Proc. ACM Int'l Conf. Measurement and Modeling of Computer Systems (SIGMETRICS '02), June 2002.
[22] T. Johnson and D. Shasha, "2Q: A Low-Overhead High-Performance Buffer Management Replacement Algorithm," Proc. 20th Int'l Conf. Very Large Data Bases, Jan. 1994.
[23] M. Kallahalla and P.J. Varman, "Optimal Prefetching and Caching for Parallel I/O Systems," Proc. 13th Ann. ACM Symp. Parallel Algorithms and Architectures, July 2001.
[24] J.M. Kim, J. Choi, J. Kim, S.H. Noh, S.L. Min, Y. Cho, and C.S. Kim, "A Low-Overhead, High-Performance Unified Buffer Management Scheme that Exploits Sequential and Looping References," Proc. Fourth USENIX Symp. Operating Systems Design and Implementation, Oct. 2000.

[25] T. Kimbrel and A.R. Karlin, "Near-Optimal Parallel Prefetching and Caching," SIAM J. Computing, vol. 29, no. 4, pp. 1051-1082, 2000.
[26] T. Kimbrel, A. Tomkins, R.H. Patterson, B. Bershad, P. Cao, E. Felten, G. Gibson, A. Karlin, and K. Li, "A Trace-Driven Comparison of Algorithms for Parallel Prefetching and Caching," Proc. Symp. Operating Systems Design and Implementation, Oct. 1996.
[27] D. Lee, J. Choi, J.-H. Kim, S.H. Noh, S.L. Min, Y. Cho, and C.S. Kim, "LRFU: A Spectrum of Policies that Subsumes the Least Recently Used and Least Frequently Used Policies," IEEE Trans. Computers, vol. 50, no. 12, pp. 1352-1360, Dec. 2001.
[28] U. Manber and S. Wu, "GLIMPSE: A Tool to Search through Entire File Systems," Proc. USENIX Winter 1994 Technical Conf., Jan. 1994.
[29] N. Megiddo and D.S. Modha, "ARC: A Self-Tuning, Low Overhead Replacement Cache," Proc. Second USENIX Conf. File and Storage Technologies, Mar. 2003.
[30] T.C. Mowry, A.K. Demke, and O. Krieger, "Automatic Compiler-Inserted I/O Prefetching for Out-of-Core Applications," Proc. Second USENIX Symp. Operating Systems Design and Implementation, 1996.
[31] MySQL, http://www.mysql.com/, 2005.
[32] E.J. O'Neil, P.E. O'Neil, and G. Weikum, "The LRU-K Page Replacement Algorithm for Database Disk Buffering," Proc. ACM Int'l Conf. Management of Data (SIGMOD '93), May 1993.
[33] E.J. O'Neil, P.E. O'Neil, and G. Weikum, "An Optimality Proof of the LRU-K Page Replacement Algorithm," J. ACM, vol. 46, no. 1, pp. 92-112, 1999.
[34] R.H. Patterson, G.A. Gibson, E. Ginting, D. Stodolsky, and J. Zelenka, "Informed Prefetching and Caching," Proc. 15th ACM Symp. Operating Systems Principles, Dec. 1995.
[35] J.T. Robinson and M.V. Devarakonda, "Data Cache Management Using Frequency-Based Replacement," Proc. ACM Conf. Measurement and Modeling of Computer Systems (SIGMETRICS '90), May 1990.
[36] Y. Smaragdakis, S. Kaplan, and P. Wilson, "EELRU: Simple and Effective Adaptive Page Replacement," Proc. ACM Int'l Conf. Measurement and Modeling of Computer Systems (SIGMETRICS '99), May 1999.
[37] J. Steffen, "Interactive Examination of a C Program with Cscope," Proc. USENIX Winter 1985 Technical Conf., Jan. 1985.
[38] O. Temam, "An Algorithm for Optimally Exploiting Spatial and Temporal Locality in Upper Memory Levels," IEEE Trans. Computers, vol. 48, no. 2, pp. 150-158, Feb. 1999.
[39] A. Tomkins, R.H. Patterson, and G.A. Gibson, "Informed Multi-Process Prefetching and Caching," Proc. ACM Int'l Conf. Measurement and Modeling of Computer Systems (SIGMETRICS '97), June 1997.
[40] N. Tran and D.A. Reed, "ARIMA Time Series Modeling and Forecasting for Adaptive I/O Prefetching," Proc. 15th Int'l Conf. Supercomputing, June 2001.
[41] Transaction Processing Council, http://www.tpc.org, 2005.
[42] V. Vellanki and A.L. Chervenak, "A Cost-Benefit Scheme for High-Performance Predictive Prefetching," Proc. 1999 ACM/IEEE Conf. Supercomputing, Nov. 1999.
[43] Y. Zhou, P.M. Chen, and K. Li, "The Multi-Queue Replacement Algorithm for Second-Level Buffer Caches," Proc. 2001 USENIX Ann. Technical Conf., June 2001.

Ali R. Butt received the BSc (Hons.) degree in electrical engineering from the University of Engineering and Technology Lahore, Pakistan, in 2000 and the PhD degree in electrical and computer engineering from Purdue University in 2006. At Purdue, he also served as the president of the Electrical and Computer Engineering Graduate Student Association for 2003 and 2004. He is an assistant professor of computer science at Virginia Polytechnic Institute and State University. His research interests lie broadly in distributed systems and operating systems. In particular, he has explored distributed resource sharing systems spanning multiple administrative domains, applications of peer-to-peer overlay networking to resource discovery and self-organization, and techniques for ensuring fairness in the sharing of such resources. His research in operating systems focuses on techniques for improving the efficiency of modern file systems via innovative buffer cache management. He is a member of USENIX, the ACM, and the IEEE.

Chris Gniady received the BS degree in electrical and computer engineering from Purdue University in 1997 and the PhD degree from the School of Electrical and Computer Engineering, Purdue University, in 2005. He is an assistant professor of computer science at the University of Arizona. His research focuses on providing personalized computing by adapting software and hardware to the user's behavior. He uses instruction-based prediction and user-behavior monitoring to optimize energy and performance in computer systems. He is a member of the IEEE.

Y. Charlie Hu received the MS and MPhil degrees from Yale University in 1992 and the PhD degree in computer science from Harvard University in 1997. He is an assistant professor of electrical and computer engineering and computer science at Purdue University. From 1997 to 2001, he was a research scientist at Rice University. His research interests include operating systems, distributed systems, networking, and parallel computing. He has published more than 85 papers in these areas. He received the Honda Initiation Grant Award in 2002 and the US National Science Foundation CAREER Award in 2003. He served as a technical program committee (TPC) vice chair for the 2004 International Conference on Parallel Processing and the IEEE International Conference on Distributed Computing Systems (ICDCS) 2007. He was also a cofounder and TPC cochair for the International Workshop on Mobile Peer-to-Peer Computing. He is a member of USENIX, the ACM, and the IEEE.