New NFS Tracing Tools and Techniques for System Analysis
Total Page:16
File Type:pdf, Size:1020Kb
New NFS Tracing Tools and Techniques for System Analysis The Harvard community has made this article openly available. Please share how this access benefits you. Your story matters Citation Ellard, Daniel and Margo Seltzer. 2003. New NFS tracing tools and techniques for system analysis. Proceedings of the Seventeenth Large Installation Systems Administration Conference (LISA XVII): October 26 - 31, 2003, San Diego, California, ed. LISA, 73-85. Berkeley, California: USENIX Association. Citable link http://nrs.harvard.edu/urn-3:HUL.InstRepos:2798959 Terms of Use This article was downloaded from Harvard University’s DASH repository, and is made available under the terms and conditions applicable to Other Posted Material, as set forth at http:// nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of- use#LAA New NFS Tracing Tools and Techniques for System Analysis Daniel Ellard and Margo Seltzer – Harvard University ABSTRACT Passive NFS traces provide an easy and unobtrusive way to measure, analyze, and gain an understanding of an NFS workload. Historically, such traces have been used primarily by file system researchers in an attempt to understand, categorize, and generalize file system workloads. However, because such traces provide a wealth of detailed information about how a specific system is actually used, they should also be of interest to system administrators. We introduce a new open-source toolkit for passively gathering and summarizing NFS traces and show how to use this toolkit to perform analyses that are difficult or impossible with existing tools. Introduction be analyzed several ways and at different time scales. It also means that there is a clear interface between these The purpose of most passive NFS tracing tools and that they can be used independently.For exam- research tools is to discover and investigate workload characteristics that can be exploited by the operating ple, we have written several other programs to analyze system. The goals of the tools and methods described the output of nfsdump and used them in our own studies in this paper are quite different – we wish to use pas- of NFS workloads and usage patterns [6, 7]. sive NFS trace analysis to help system administrators Another advantage of the separation of the gathering understand the workload on their systems, identify and analysis of traces is that the data may be anonymized latent performance issues, and diagnose specific prob- before analysis. The nfsdump toolkit includes an lems. For example, file system researchers might be anonymizing postprocessor that anonymizes the client and content to know that on a particular system most of server IP addresses, user and group ID numbers, and file the reads come from a small number of large files, but names. All of the data presented in this paper have been the administrators of that system might also want to anonymized so that they may be made public. In ordinary know the names of those files. If the system becomes use by system administrators, however, this anonymiza- swamped with requests for those files, the administra- tion step would be omitted so that specific and potentially tors might also want to identify the owner of the files useful identifying information about users, groups, client and the users responsible for the requests so that those hosts, and file names is preserved. users can be consulted. The rest of this paper is organized as follows: There already exist a number of tools that make • A discussion of related work and other tools for it possible to quantify many aspects of NFS traffic, NFS analysis and system administrators have devised ways to use • An overview of nfsdump and nfsscan these tools in order to identify and debug common • A description of the systems used in our exam- problems [17]. Unfortunately, most of these tech- ple analyses and case studies niques are limited in their ability to analyze the activi- • Basic analyses using nfsscan ties of specific clients and users. Furthermore, they are • Case studies tailored to solving specific problems and are difficult • A discussion of the limits of passive tracing and to use as a way to explore the wealth of information how to augment nfsdump/nfsscan with active that can be gleaned from NFS traces. methods In this paper, we describe nfsdump and nfsscan, • Conclusion a pair of tools that provide a framework for gathering and analyzing NFS traces in a general and extensible Related Work manner.nfsdump collects NFS traces, while nfsscan NFS1 trace analysis has a long and rich history as scans through the traces gathered by nfsdump and cre- a tool for file system research. Because of this, most of ates a set of summary tables that can be analyzed by the work in passive NFS trace analysis has focused on simple scripts or imported into a spreadsheet or other research topics such as characterizing workloads and data analysis tool for further processing. In addition to investigating the effects of client-side caching [5, 15]. nfsdump and nfsscan, we describe ns_timeagg, ns_split, and ns_quickview, three applications that 1In the context of this paper, NFS refers only to NFS ver- simplify manipulation of the tables created by nfsscan. sion 2 protocol [14] and the NFS version 3 protocol [2]. nfs- dump and nfsscan do not support version 4 of the NFS pro- The advantage of decoupling the process of gather- tocol [13, 16] or any variants of NFSv2 and NFSv3, al- ing traces from trace analysis is that the same traces can though our tools may be extended to handle these protocols. 2003 LISA XVII – October 26-31, 2003 – San Diego, CA 75 New NFS Tracing Tools and Techniques for System Analysis Ellard & Seltzer Matthew Blaze introduced several techniques for uses libpcap [10], the same packet-capture library as inferring the client workload from an NFS trace and tcpdump, and has the ability to read and write raw implemented these techniques in rpcspy and nfstrace packet files in the same format as tcpdump or any [1]. Andrew Moore gives a survey of these and other other tool that uses this format. techniques and contrasts the analysis of file system The most important functional differences between workloads via passive network monitoring and direct nfsdump and earlier tools are that nfsdump captures the kernel-level traces [12]. This work shows that passive effective UID and GID of each call, and properly NFS tracing reveals useful and accurate information decodes RPC over TCP (even with jumbo frames). about many aspects of an NFS workload. nfsscan There already exist many tools for gathering NFS traces or summary information about NFS activity. The first step of our analysis system, nfsscan, writes its data in a format designed to be easy to parse Basic tools like nfsstat(1M), iostat(1M), and nfswatch [4] provide aggregate statistics about NFS traffic. For many and interface with other tools. This is in contrast with diagnostic purposes, these statistics suffice. Specialized most other tracing tools, which often generate output in tools such as nfstrace and the NFS Logging System [18] an irregular, difficult to parse, or undocumented format focus entirely on NFS. General-purpose network proto- – for example, tcpdump 3.7.2 prints file offsets for read col monitors, such as tcpdump [11], ethereal [3], rpcspy and write operations in hexadecimal for NFSv3 requests [1], and snoop [19] can be used to analyze NFS traffic and decimal for NFSv2 requests, but does not print and other system activity.Most of these tools contain whether the request is NFSv2 or NFSv3. code to decode and print detailed information about The output of nfsscan consists of one or more NFS calls and responses, and therefore general-purpose distinct tables (depending on how nfsscan is invoked). tools like tcpdump have replaced special-purpose NFS These tables can be analyzed by simple scripts or monitors in many contexts. imported into a spreadsheet, database, or other data In terms of general capability, our system most analysis tool for further processing. These tables closely resembles nfswatch and the NFS logging/nfs- include the following information: logd system provided with Solaris. In comparison to • The total number of NFS operations and the nfswatch, nfsdump/nfsscan is more flexible because it number of times each NFS operation is called runs on more platforms and captures more information during each analysis period. The default analysis – nfsscan can gather any combination of per-user, per- period is five minutes, but a different analysis group, per-client, per-file, and per-directory informa- period may be specified on the command line. tion, while nfswatch gathers either per-user or per- • The average latency of each type of NFS operation. client information. • A map of the file system hierarchy, and infor- mation about each file accessed. In contrast to the NFS logging system and nfslogd, The file system hierarchy is inferred from the our system is completely passive, can run on a separate results of lookup, create, mkdir, rename, host instead of running on the server, and is portable remove, and link calls. Information about each across a range of platforms instead of being tied to file is gathered from the responses to getattr Solaris. A shortcoming of nfsdump/nfsscan is that it does not attempt to reconstruct an application-level trace and other calls. of the NFS activity in the same manner as nfslogd, but By default, nfsscan only records the counts for this capability could be added as a postprocess to the the NFS operations that appear most frequently in a output of nfsdump (much as nfslogd postprocesses the typical workload – read, write, lookup, getattr, access, records created by the NFS Logging System). create, and remove. Additional operations or an alter- native list of operations may be specified on the com- The Toolkit mand line.