The Brave Little Toaster Meets Usenet†

Karl L. Swartz – Network Appliance

†This work supported by the United States Department of Energy under contract number DE-AC03-76SF00515, and simultaneously published as SLAC PUB-7254.

ABSTRACT

Usenet volume has been growing exponentially for many years; this growth places ever-increasing demands on the resources of a netnews server, particularly disk space – and file system performance – for the article spool area. Keeping up with this demand became substantially more difficult when it could no longer be satisfied by a single disk, and since netnews is incidental to SLAC's research mission, we wanted to find a solution that could easily scale to meet future growth while requiring minimal system administration effort. In the process of evaluating the various solutions that were proposed, we developed benchmarks to measure performance for this specialized application, and were surprised to find that some of our beliefs and intuition were not supported by the facts. The alternatives considered by SLAC are described, as are the benchmarks developed to evaluate the alternatives, and the results of those benchmarks. While netnews is the application we examined, our experience will hopefully provide inspiration for others to more carefully evaluate their applications instead of using stock benchmarks that may not correlate well with the intended use. Our results may also break down some biases and encourage the reader to consider alternatives which might otherwise have been ignored.

Introduction

This paper discusses work begun while the author was employed by the Stanford Linear Accelerator Center (SLAC).

In 1992, Usenet had outgrown the then-current netnews server at SLAC. A study of growth trends, based on data gathered from our server and other sources, provided forecasting tools to guide the configuration of a server that would be adequate until early 1995 [1]. Despite an unprecedented surge in growth between mid-1993 and early 1995 [2], that server managed to outlive its original design life with minimal tinkering, lasting until mid-1995 before suffering a major collapse due to netnews volume.

Even before that collapse, SLAC had been studying what to do for a next generation server. The laboratory's primary mission is research in high-energy physics and related fields, so the ongoing drain of resources for the care and feeding of an incidental service like Usenet was becoming an irritant, especially in a time of shrinking budgets and layoffs. Four goals were established to guide specification of the new server:

• Capacity and performance to accommodate projected growth in traffic and readership thru 1997, without reducing newsgroups or expiration times.
• Simplicity of future growth.
• Greater reliability.
• Reduction in administrative labor costs.

The capital equipment budget for this project was hardly munificent, imposing yet another constraint. (This was eased somewhat after it was shown that the initial budget would not even pay for the necessary disks.)

The key to this problem was the second point, simplicity of future growth. Most of the reliability shortcomings of SLAC's earlier netnews servers had come as they neared their design limits and became increasingly susceptible to collapse when a minor surge in traffic overwhelmed their strained resources. By mid-1995, shutting down building power or networking for maintenance over a weekend could result in the netnews server spending several weeks to work through the resulting backlog. Nursing and tuning such a sickly server just so it could do the job was a major consumer of labor at times. For the holiday shutdown, netnews – along with mail and payroll – was deemed a critical system which would be fixed immediately, instead of waiting until after New Year's Day, because of the cost of recovering from a protracted outage.

Critical Server Resources

The core of a netnews server consists of two large databases located on disk: the articles themselves; and the history file, which tracks which articles have been received and when they should be removed. For a full news feed and a given set of expiration times, the size of each is roughly proportional to the number of articles accepted per week. From 1984 until mid-1993 this value grew at a remarkably consistent rate of approximately 67% per year, or doubling every 15 to 16 months [1, 3]. For nearly two years starting mid-1993, growth surged to a 100% annual rate before dropping back to the historic curve and perhaps even lower [2].

This history file is still relatively manageable despite exponential growth – with the generous expiration times used by SLAC, the history file and associated indexes will only require about 763MB at the end of 1997¹. The main problem is that in older versions of the dbz library used by C News and INN, the dbzagain() routine did not automatically reduce the size of the tag stored in the history.pag file as the history file grew beyond the 2ⁿ bytes initially planned for. SLAC discovered this when investigating why expire was taking over two days to run – it seemed to be trying to keep the entire history file in memory and was thrashing badly. Rebuilding the history dbz index provided a quick fix until updated software could be installed to keep the problem from recurring as the relentless growth continued.

Unfortunately, the article spool area is a far more difficult beast to tame. Using the SLAC server as an example again, 8.9GB will be needed at the start of 1997, growing to 14.9GB by the end of the year. Even the later, larger size wouldn't be too bad if not for the fact that it will consist of nearly 4.7 million files, with over 1.8 million new files being created (and nearly as many deleted) each week. The average rate is three file creates and deletes per second – greater capacity is needed to handle short-term surges. If that isn't bad enough yet, the problem gets worse if the articles are not all stored in a single file system.

The reason for this is that the structure of news is mapped directly onto the file system. The hierarchical nature of newsgroup names becomes a directory hierarchy, and each article is stored in its own file in the directory corresponding to its newsgroup. Cross-posted articles are implemented with links – preferably hard links, though symbolic links are used if necessary. This is the reason for wanting to have the entire article spool in a single file system, since a hard link requires just another directory entry, while a symbolic link imposes yet another file creation (and eventually deletion) along with hits on two file systems when referenced.

Using multiple file systems also forces the news administrator to invest effort in guessing how to allocate newsgroups to the available file systems in a manner which balances both space and load. It may be difficult to change this allocation later, and a good balance now may not be good in the future if one set of groups grows faster than another. Managing such a setup is an intractable problem.

¹SLAC uses 17 days as the expiration period for essentially all newsgroups and for history data. History data had been kept for 30 days, but when disk and memory constraints became severe it was decided that this no longer added much value given the fast propagation of netnews in the net today.
Large File System Alternatives

SLAC's netnews server was using two file systems for the article spool so we were all too familiar with the problems with that solution. It was fairly clear that a single, large disk would probably be a problem for performance, even if we could get one that was large enough (9GB, the largest readily available disk when the system was being acquired, would work, but not even into 1997) and access it as a single file system (with SunOS, we were limited to a 2GB file system). That would still leave the requirement for simplicity of future growth unaddressed.

To get a single, large file system, Sun's OnLine: DiskSuite [4] appeared to be the answer. Our netnews server was a Sun (running SunOS 4.1.3) and other sites seemed to be successfully using it for news. This product increases the maximum size of a file system on SunOS from 2 gigabytes to 1 terabyte. It allows the creation of large volumes via striping ("RAID 0") or non-interleaved concatenation of multiple disks. It also offers the option of higher availability and reliability through mirroring (RAID 1) [5] and hot spares.

The choice between organizing the disks as a concatenation or a stripe set was difficult. Striping would seem to be better for performance since it spreads data over all disks. A superficial analysis of concatenation suggests it would fill most of one disk before moving on to the next one. However, the BSD Fast File System creates directories "in a cylinder group that has a greater than average number of free inodes, and the smallest number of directories in it," then tries to place files close to where their directory is located [6]. With the large number of directories in the article spool, one would expect data to be spread amongst disks fairly quickly.

The weakness of RAID 0 is that the size of the stripe set is fixed when the RAID virtual device is created, whereas a concatenation can be expanded as needed. A file system on a stripe set can be expanded by concatenating another stripe set, but that may mean buying more disks than are required for the desired capacity. With the price of disks always dropping (while capacity and performance increase), buying disks well ahead of need is not appealing. An alternative is to concatenate just one or two disks, but that raises the same performance concerns that motivated the choice of striping in the first place. Why not just start with a concatenation?

    Many system managers claim that holes in an NNTP stream are more valuable than the data. [7]

While many might debate the value of most netnews content, there's no doubt that a file system composed of multiple disks in which there is no redundancy – wherein a single drive failure can cause the loss of all file data – is not a step towards the goal of greater reliability. It also adds to administrative costs because a failure becomes a crisis instead of a nuisance which can be resolved at relative leisure.

With OnLine: DiskSuite, mirroring (RAID 1) is the only available solution, possibly with one or more hot spares to even further reduce the urgency of a failure.

Mirroring unfortunately requires twice as many disks, preferably spread over twice as many controllers. With a requirement of 14.9GB, and using fast 4GB disk drives, eight drives and two controllers would be needed. Another year's growth would require two more controllers, forcing us to consider a more expensive SPARCserver 20 instead of the SPARCserver 5 we were contemplating, just to get enough SBus slots. The hardware costs were escalating at an alarming rate! The only bright spot was that the ability to choose from amongst several disks when reading articles might mean a mirrored file system would perform better, assuming reads account for a significant percentage of the requests and the overhead of mirroring doesn't overwhelm this advantage.

RAID 5 would certainly have reduced the hardware investment, but DiskSuite didn't have it until the Solstice DiskSuite 4.0 release [8]. This would have required at least Solaris 2.3, and with limited resources, SLAC had not yet taken on the challenge of supporting Solaris 2. Other vendors were also undesirable because we hoped not to invest the effort in migrating our netnews service to a whole new operating system.

Around the time of this design effort, SLAC's High-Performance Computing Team (known informally as the "Farm Team") was looking at various large, high-performance file systems for use in a prototype data analysis effort. One of the alternatives being explored was a Network Appliance filer (commonly known as a toaster because of its appliance-like simplicity), and it was suggested as an appealing solution to the netnews problem. This was not the first time this product had been considered for netnews at SLAC – the large file system was very appealing – but the prospect of NFS achieving adequate performance relative to local disk for the many small files involved in processing netnews seemed far-fetched.

There weren't any other appealing ideas on the horizon, so with some trepidation, we looked at the Network Appliance product further. In addition to the large file system, it had several other appealing features. More disk drives could be added as needed, even a single disk of different geometry from the others, without impacting performance.

Even more interesting was the addition of support for very large directories [9] since very high volume newsgroups produce directories with many thousands of files in them. Processing all files in a directory of n files requires Order(n²) search time in a traditional UNIX file system. BSD 4.3 reduces this to Order(n) for programs that process files in sequential, directory order [10], but netnews often accesses files in a manner which reduces the effectiveness of this optimization. The BSD solution also does not help file creation, which is also Order(n) (i.e., Order(n²) to populate a directory of n files). The Network Appliance design only needs to examine n/256 directory entries in the file creation case, and by using hash signatures even fewer string comparisons are required. This is still technically Order(n), but with a much smaller constant multiplier it should be considerably faster. (Processing all files in a directory in non-sequential order is similarly still Order(n²) but with a smaller multiplier.)
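To put rough numbers on the Order(n²) claim (our arithmetic, not taken from the measurements): if creating the k-th file requires scanning the k−1 entries already in the directory, then populating a directory of n files costs about

    \sum_{k=1}^{n} (k-1) = \frac{n(n-1)}{2} = O(n^2)

directory-entry examinations, while a scheme that inspects only about k/256 entries per creation still sums to roughly n²/512 – quadratic as well, but with a constant small enough to matter for the multi-thousand-file directories that netnews produces.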
We agreed to at least give an NFS solution from Network Appliance a chance, and they supplied a NetApp 1400 for evaluation.

Benchmark Specification

Standard benchmarks can be useful tools for comparing the performance of products from different vendors for common uses. Their usefulness is diminished when one is confronted with a very specialized application. A netnews server places unique and demanding load patterns on a file system, so standard benchmarks were not considered for more than a moment in evaluating NetApp's filer. Besides, the obvious benchmark would have been SPEC SFS (LADDIS) [11], which only tests NFS. Thus it could not have been run against a local file system, one of the two alternatives we were considering. We therefore set about devising a suitable application benchmark which could be run against both alternatives and which would provide data which could be clearly correlated to an actual netnews server.

The first step was to identify the key activities of a netnews server. Four such activities were identified.

• Receiving and storing new articles.
• Sending articles to other sites.
• Expiring old articles.
• Serving articles to readers.

Receiving and storing new articles consumes the majority of most netnews servers' time. There is no opportunity for parallelism, even in the presence of multiple news feeds, because the incoming article streams are effectively serialized before checking the history file and possibly updating it and storing the article. The goal of a server able to support the load expected at the end of 1997 means the server must be able to process each article in less than a third of a second.

One of the challenges is that adding a file to a directory is an Order(n) problem in a normal UNIX file system, as discussed above. Each new article requires a file creation, and possibly the addition of links to other directories if the article was cross-posted. Typical user directories rarely have more than a few hundred entries, and adding files to them does not significantly impede other activity on the system. Unfortunately, neither property is true for netnews.


Sending articles to other sites was not initially perceived as being an important activity with regard to the file system, since nearly all of SLAC's outgoing news feeds use NNTP², and INN [12] tries to send articles as soon as they are accepted, which means they come from buffer cache and not from disk. David Lawrence noted that UUNET had encountered problems catching up when downstream NNTP sites went down. Reading the articles back from disk turned out to be a significant bottleneck [13].

Much of the process of expiring old articles happens in parallel with other netnews processing, but if expire doesn't get rid of old articles fast enough, incoming news may be stalled until sufficient space is available. Traditionally, expire's file deletion pattern exhibits the Order(n²) behavior of pathological cases.³ While a command like rm will process files in the order they appear in a directory, taking advantage of BSD's optimizations, expire generates delete requests within a given directory in the order the files were created. With previous expirations plus article cancellations and other activities creating many holes in a directory, creation order may end up being fairly random. Expire also tends to jump from one directory to another, rather than focusing its efforts on one directory before moving on, which not only may cause disk seeks but also causes BSD to flush the cache which it uses to improve sequential directory accesses. Fortunately, INN includes the fastrm program which avoids these and other shortcomings of older expire implementations. INN's expire merely generates a list of files which it wants to delete, and feeds this list to fastrm to do the deletion.

Finally, serving articles to readers would seem to be a very important part of what a netnews server does. In terms of performance, given a modest reader population, this task is in fact of little consequence. Multiple readers can be served in parallel, and any given reader most likely won't mind if fetching an article takes half a second instead of a quarter of a second. Such delays won't cause a backlog of work to pile up, at least for the server.

The odds are that most articles won't ever be read anyway. Consider that at the time of LISA X, SLAC's netnews server is expected to be accepting about 900,000 articles per week. SLAC has roughly 1,200 employees and if half of them read netnews, each will have to read an average of 1,500 articles per week – 1,500 different articles from those read by anyone else at SLAC – for all of the incoming articles to be read. With a much larger user community, such as at a large Internet Service Provider, all of the articles might be read, but not at a place such as SLAC.

With these guidelines in mind, a benchmark was constructed which would measure the performance of receiving articles and of expiring them. Batching articles was added to the test set later on, after the need became more apparent.

Benchmark Construction

The simplest way to construct a meaningful and repeatable benchmark seemed to be to capture a snapshot of a news feed, then feed it into actual netnews software, timing key parts of the process. While the new server was expected to run INN, C News⁴ was chosen for the benchmarks as its many pieces seemed better suited to isolation and individual examination than the monolithic structure of INN. The data structures on disk are identical and are manipulated in similar ways, so for a study of file system performance, results from one should apply to the other.

The benchmark is a simple shell script performing some setup work, then multiple passes consisting of three phases: unbatch, batch, and expire. The critical piece of each phase is invoked with time to collect elapsed time and other statistics.⁵ Only the core program in each phase is actually used, since the locking and other overhead of the higher-level scripts would serve no purpose and would obscure the results that are of interest.
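As a rough illustration of that structure, the skeleton below shows the shape such a driver script might take. It is a reconstruction for this discussion, not the actual SLAC script: NEWSBIN, batches.N, togo, doomed, and synth_expire are invented names, the relaynews arguments are elided, and the pass structure (batching every third pass, a synthesized expire) is described in the sections that follow.

    #!/bin/sh
    # Sketch of the benchmark driver (a reconstruction, not the original).
    NEWSBIN=/usr/lib/newsbin        # assumed location of the C News binaries
    RELAYFLAGS="..."                # the same arguments newsrun would use
    PASSES=14
    pass=1
    while [ $pass -le $PASSES ]; do
        # unbatch phase: only relaynews itself is timed, fed this pass's
        # staged batches; nfsstat -c snapshots bracket the timed piece
        nfsstat -c > nfs.pre.$pass
        time $NEWSBIN/relay/relaynews $RELAYFLAGS < batches.$pass
        nfsstat -c > nfs.post.$pass

        # batch phase: the togo list from the unbatch phase is set aside
        # each pass; every third pass, batcher alone is timed against it
        if [ `expr $pass % 3` -eq 0 ]; then
            time $NEWSBIN/batch/batcher < togo.aside > /dev/null
        fi
        mv togo togo.aside

        # expire phase: a synthesized expire, sketched below, does the
        # history work and times only the resulting file deletions
        synth_expire $pass
        pass=`expr $pass + 1`
    done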
The unbatch phase of the benchmark, which measures the receiving and storing of new articles, is fairly straightforward. First, a large number of batches was copied to the incoming spool area. Enough data was used to ensure that all caches were flushed so as not to mask the performance of the underlying file systems. Then, the relaynews program was invoked directly on the batches. The same arguments were used as the newsrun script would use in a live system, except that no stale value was specified since the batches would likely contain very old news by the time the final benchmarks were run.

The batch phase measures sending backlogged articles to other sites. To generate the batches, C News was configured to feed a single downstream site with the following sys file:

    # what we'll accept
    ME:all
    # downstream - everything (almost)
    downstream:all,!junk:f:

²SLAC was still running C News at the time, even though it was fairly clear that INN was better suited to SLAC's needs. With resources scarce, installing the new software had been deferred until the new server was acquired.
³Netnews is exceedingly good at finding and exploiting pathological cases to greatest disadvantage.
⁴The "Cleanup Release of C News, with patch CR.E," from January 1995.
⁵Nfsstat -c is invoked before and after the timed piece to also capture NFS statistics. This data has not yet been analyzed as it doesn't directly impact the results of this project. It may provide some interesting tuning information, though.


In each pass, the batch generated by the unbatch phase is moved aside. Actual batching is only done every third pass in an attempt to reduce the already lengthy test time without losing too many data points, and the batch generated in the previous phase is used in order to simulate a seriously backlogged feed. (Since datasets were chosen to exceed memory size and thus eliminate cache effects, this probably has no real effect.) Again, only the component of the batching process which is directly affected by file system performance was used, in this case the batcher utility with input from the batch file and output to /dev/null.

The expire phase is a little trickier since file deletion is rolled into one program along with scanning and rebuilding the history file, at least in the C News version. Moreover, expire works by looking at article timestamps, which for this benchmark are totally meaningless.

Instead of using expire, its actions were synthesized in simplified form. First, the number of history entries was recorded after each unbatch phase. A simple awk script then uses this information to scan the history file, expiring articles from more than retention previous passes. The list of articles to be deleted is written to one file while the modified history file is written to another. (History entries are kept for the duration of the benchmark.) The new history file is then run thru dbz and moved into place.

To measure the file system component of the expiration process, the file containing the list of article files to be deleted is then fed into either dumbrm or sort|fastrm. Dumbrm is a simple C program which reads pathnames from stdin and deletes them, generating the same unlink() calls as expire itself would. Fastrm is the utility from INN⁶, invoked with the same options used by INN's expire. The sort is included in the timing to produce a more fair comparison to dumbrm; it's debatable whether or not this should have been included.

⁶INN 1.4sec, from December 22, 1993.
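A sketch of how the synthesized expire might look is given below. It is a reconstruction for illustration, not the script actually used: the counts.N, doomed, and history.new file names, the RETAIN and pass variables (from the surrounding driver), and the assumption that spool-relative paths come from the history file's group.name/number entries by mapping dots to slashes are all ours.

    # Synthesized expire for pass $pass (a reconstruction, not the original).
    # counts.N records how many history lines existed after pass N's unbatch
    # phase; RETAIN is the expiration step (6 in the runs described below).
    old=`expr $pass - $RETAIN`
    if [ $old -ge 1 ]; then
        cutoff=`cat counts.$old`               # entries 1..cutoff are old enough
        awk 'NR <= CUTOFF && NF >= 3 {
                 for (i = 3; i <= NF; i++) {   # one group.name/number per field
                     path = $i
                     gsub(/\./, "/", path)     # news.group/1234 -> news/group/1234
                     print path > "doomed"
                 }
                 print $1 "\t" $2 > "history.new"   # keep entry, drop file list
                 next
             }
             { print > "history.new" }         # newer entries copied unchanged
        ' CUTOFF=$cutoff /usr/lib/news/history
        dbz history.new                        # run the new history thru dbz
        mv history.new /usr/lib/news/history   # (.dir and .pag moved similarly)
        # only the deletions themselves touch the article spool and are timed
        (cd /var/spool/news && time dumbrm) < doomed
    fi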
All the News That's Fit to Test

Collecting data to fuel the benchmark was accomplished by creating a fake uucp feed of all articles on SLAC's netnews server and capturing approximately half a week's batches from August, 1995. A total of 2759 batches were collected, containing 300,150 articles in 911 megabytes.

While this was good enough to generate interesting results, a set large enough to represent at least a week was desired. A script was written which reads an existing batch and modifies message ids in Message-ID and Supersedes headers and the target message id of cancel control messages. To do this, a delimiter and a clone number are appended to the host portion of each message id, a combination unlikely to conflict with any real message id. Byte counts were fixed and the clone batch written.
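The cloning script itself is not reproduced here; the sketch below is a reconstruction of the approach, assuming uncompressed rnews batches (a "#! rnews <bytes>" line followed by exactly that many bytes of article) and using "=" as the delimiter – the delimiter choice and the file names are assumptions.

    # Clone a batch by tagging message ids (a reconstruction, not the original).
    clone=2
    awk 'function emit(   i, len) {
             len = 0
             for (i = 1; i <= nl; i++) len += length(art[i]) + 1
             printf "#! rnews %d\n", len       # recompute the byte count
             for (i = 1; i <= nl; i++) print art[i]
             nl = 0
         }
         need <= 0 {                           # between articles
             if (nl) emit()
             if ($1 == "#!" && $2 == "rnews") { need = $3; inhdr = 1 }
             next
         }
         {                                     # inside an article
             line = $0
             need -= length(line) + 1          # consume the original bytes
             if (inhdr && line == "") inhdr = 0
             if (inhdr && line ~ /^(Message-ID|Supersedes|Control):/)
                 gsub(/>/, "=" N ">", line)    # append delimiter and clone number
             art[++nl] = line
         }
         END { if (nl) emit() }' N=$clone original.batch > clone.batch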

With the original data and two clones, over 900,000 articles were available, approximating the expected feed for the first week of October, 1996. The benchmark was configured to run 14 passes with 592 batches per pass (shortchanging pass 14 by 11 batches), with an expiration step of 6. This simulates a system running expire twice per day, with a three day retention period for all newsgroups. Most sites probably only run expire once per day, but the more frequent expirations encourage rapid fragmentation of the file system and directories, thus simulating a more mature system.

Testbed Configuration

The initial testbed used at SLAC consisted of a SPARCserver 2 and a NetApp 1400. While this setup produced sufficient results to make an unequivocal choice between the alternatives, the author's move to a job at Network Appliance offered the opportunity to run a more thorough set of tests on a wider variety of configurations. These results are the ones presented in this paper.

The primary test host used was an Axil 311, a SPARCserver 20 clone but with a CPU module that appeared to be equivalent to that in a SPARCserver 10/41. This system was equipped with 128MB of memory, a Sun Quad Ethernet interface card, and a Cisco CDDI interface card. Five 4GB Seagate ST15230N (Hawk) Fast SCSI disks were attached to the on-board SCSI bus, with the first used as the system disk (the internal disk was disconnected) and the other four used for /var/spool/news (the article spool file system(s)) or unused when testing against a filer. SunOS 4.1.4 was installed with a large (nearly 2GB) partition left on the system disk for /usr/lib/news (where the history file and news configuration files are stored). C News binaries and the benchmark itself were also stored on this disk.

Two filers were used, both running the NetApp 3.1.4c Data ONTAP software release. The first was a NetApp 1400 configured identically to the one purchased by SLAC – 128MB, 2MB NVRAM, a single 10 megabit/second Ethernet interface, and seven 4GB Seagate ST15230N Fast SCSI disks. (This model is no longer offered; the current entry-level NetApp F220 is about twice as fast as the NetApp 1400 based on LADDIS results.) To explore the performance of a high-end filer which a large Internet Service Provider might prefer for greater performance and/or disk capacity, a NetApp F540 was also tested. This filer had 256MB, 8MB NVRAM, both 10/100 megabit/second Ethernet and CDDI interfaces, and seven 4GB Seagate ST15150W (Barracuda) Fast/Wide SCSI disks.

There was some debate about the effect of several filer options recommended by Network Appliance for netnews applications, as well as the value of FDDI versus a dedicated Ethernet, so another Axil was borrowed to run some abridged test runs against the filers while the first Axil was running various test runs involving only local disks. This second unit was an Axil 235, apparently a SPARCserver 10 clone with a SPARC 20 CPU module (!), with 64MB, a Cisco CDDI interface card, and a single 4GB Seagate ST15230N Fast SCSI disk that was cloned using dd from the first system after it had been configured but before OnLine: DiskSuite was installed.

Other than one of the filer software configuration options (no_atime_update), default values were used on the filers and on SunOS. In particular, extra inodes were not allocated on either the filers or on SunOS file systems during initialization, nor was the Snapshot feature of the filers [14] disabled. The default inodes value for newfs seemed to be adequate. The filers would need more inodes in practice, but since the number of inodes can be increased on-the-fly there's no need to add more until one has a better idea of how many will be needed. The first impulse would be to disable Snapshots entirely, but having at least one hourly Snapshot might be handy for recovering from the occasional slip of the fingers as root. Since the cost of creating a Snapshot is inconsequential, there's no performance reason to disable them, only the need to recover disk space fairly quickly after expire deletes articles.

The Ethernet consisted of a crossover (hub-to-hub) cable for the Ethernet tests, and an isolated DEC CDDI concentrator for the FDDI tests. Another machine was also attached to the FDDI network to provide a repository for the test batches and for test results. (No access to this machine took place during instrumented portions of the benchmark.)

    disk  type  hierarchy
    1     mnt   alt (alt.binaries linked to disk 4)
    2     link  rec, soc, talk, de
    3     link  comp, misc, sci, news, bit, gnu, vmsnet
    4     dir   everything else (incl. alt.binaries)

Figure 1: Newsgroup allocation for symlinked local disks and method of attaching hierarchy to main spool area.

The benchmark runs using the four local disks for /var/spool/news were done with three different configurations. The first was run without OnLine: DiskSuite and used a 2GB partition on each disk⁷, with space parceled out by hand via symlinks (and a mount for alt) as detailed in Figure 1. For the next test, OnLine: DiskSuite was used to create a 16GB striped partition using an interlace factor of 16KB.⁸ The third configuration used two concatenated disks, with the second pair of disks mirroring the first pair, providing an 8GB file system.

⁷2GB is the largest partition supported by SunOS 4.1.4 without using OnLine: DiskSuite.
⁸The default for OnLine: DiskSuite is the size of a cylinder on the first disk, which seemed inappropriate for a modern SCSI disk for which all cylinders might not be the same size. The 16KB value was borrowed from the default in Solstice DiskSuite 4.0.
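To make Figure 1 concrete, the commands below sketch one way the "mnt", "link", and "dir" attachments could have been set up. Only the allocation itself comes from Figure 1; the device and partition names, the /news2 and /news3 mount points, and the directory used to push alt.binaries back onto disk 4 are invented for illustration.

    # Hypothetical reconstruction of the Figure 1 spool layout.
    # Disk 4 holds the main spool directory itself ("dir").
    mount /dev/sd4g /var/spool/news
    # Disk 1 is mounted directly as the alt hierarchy ("mnt").
    mkdir /var/spool/news/alt
    mount /dev/sd1g /var/spool/news/alt
    # Disks 2 and 3 are mounted elsewhere and attached by symlink ("link").
    mkdir /news2 /news3
    mount /dev/sd2g /news2
    mount /dev/sd3g /news3
    for h in rec soc talk de; do
        mkdir /news2/$h
        ln -s /news2/$h /var/spool/news/$h
    done
    for h in comp misc sci news bit gnu vmsnet; do
        mkdir /news3/$h
        ln -s /news3/$h /var/spool/news/$h
    done
    # alt.binaries lives back on disk 4: the alt/binaries directory becomes
    # a symlink into the main spool file system.
    mkdir /var/spool/news/alt.binaries
    ln -s ../alt.binaries /var/spool/news/alt/binaries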
Toaster or Local Disk?

The results of the benchmark runs were dramatic. Because of the added operational flexibility of the toaster, we would have been happy if its performance was comparable to local disks. In fact, as Figure 2 shows, the 1400 was nearly four times as fast, on average, as the best configuration of local disks on the unbatch tests. The F540 was over five times as fast! Equally surprising was the poor performance of the OnLine: DiskSuite configurations. The mirrored concatenation arrangement was barely 50% faster than the absolute minimum of three articles per second required by the end of 1997, precious little headroom for catching up, much less capacity for future growth.

[Figure 2: Average number of articles unbatched per second over full benchmark run for various configurations. Bar chart, articles per second (0-45), for the F540 (FDDI), 1400 (Ether), symlink, stripe, and mirror concat configurations.]

The only result that didn't come as a big surprise was that striping was faster than concatenation (with mirroring), by about 33%. The conjecture that concatenation might cause one disk (or mirrored pair) to be filled before moving on to the next was not borne out, however. During the runs, visual observation of the disk activity lights and monitoring via the iostat utility both indicated that the data was being spread amongst the two disks even though neither was near capacity.

Looking at the data in more detail, some performance degradation can be seen in Figure 3 as the file system fills and ages. In pass 14, the NetApp F540 and the symlinked local disks both retain 86.5% of their performance from pass 1; the NetApp 1400 lost a bit more (down to 85%) but the difference is probably not significant. Pass 8 is the first pass after expiring articles (only articles more than six passes old are expired, so the expire phase at the end of pass 7 is the first to actually do anything), which presumably has something to do with the surge in filer performance.

The local disk test shows a similar though less pronounced effect, with 7.1% better performance in pass 8 than in pass 7, compared to 12.6% for the NetApp 1400 and 14.8% for the NetApp F540. This may be an artifact of the test data, but no examination has been done to determine if there is anything unusual about the articles processed in pass 8.

[Figure 3: Articles unbatched per second. Line chart, articles per second (0-45) versus benchmark pass (0-14), for the F540, 1400, and symlink configurations.]

The performance advantage of the filers over local disks is even more dramatic for expiration. As seen in Figure 4, the F540 is over an order of magnitude faster than local disk and the 1400 is a respectable 8.5 times as fast. Data for the OnLine: DiskSuite configurations is not shown because each pass was painfully slow and getting slower. With limited equipment time available, the mirrored concatenation test was stopped after six passes, which were sufficient to show that it was significantly slower than the symlinked arrangement, while the stripe run was interrupted after five complete passes by a power failure.

[Figure 4: Average time in minutes required for delete portion of expire. Bar chart, minutes (0-100), for the F540 (FDDI), 1400 (Ether), and symlink configurations.]

All of the tests in this set used dumbrm rather than INN's fastrm. This made little difference for the filers, as detailed later in this paper. Presumably expiration on local disks would have benefited greatly from fastrm, but the unbatch numbers had already convincingly shown that local disks were marginal at best for the job, so it was felt that the several days of test time needed to complete a fastrm run would not be productive.
The final set of comparisons are those for batching outgoing news feeds, shown in Figure 5. While the difference is not as dramatic as it is for the other parts of the benchmark, the filers are still faster than local disks. In addition, the local disk results appear to be getting slower as the file system ages, whereas the filers suffer much less performance degradation.

[Figure 5: Average batch time in minutes. Line chart, minutes (0-70) versus benchmark pass (3-12), for the F540, 1400, and symlink configurations.]

Light or Dark Toast?

Prior to the final set of filer runs described above, another set of tests was run to evaluate the effects of several alternative filer and networking options. Network Appliance recommended that the following filer options be changed from the defaults for netnews applications:

    options no_atime_update on
    options minra on

The first option causes the filer to not update access times on files. For netnews, there's no apparent value in maintaining file access times, and not updating them saves the expense of writing the updates to disk.

The case for the second option, which causes the filer to refrain from aggressive read ahead (in anticipation of sequential access to an entire file), is less clear. Assuming all user access is via NNTP [15], the only apparent cases in which one would start reading part of an article and not read to the end-of-file would be using NNTP's HEAD command or if the connection is broken during an ARTICLE or BODY command. With contemporary newsreaders using the XOVER command and the NOV database to access most data formerly obtained using HEAD, it's not clear what value turning off read ahead provides. The only plausible justification is that Internet Service Providers with many customers fetching large binary postings, via comparatively slow modem links, might end up wasting filer memory, and perhaps causing thrashing, because too much data is being cached too far in advance of its being needed.


Besides studying these options, we wondered how much benefit would be derived from the lower latency of FDDI, the benefit of fastrm on the filer, and whether or not it would be advantageous to also place the history file on the filer.

A series of benchmark runs were done using the NetApp F540 and the second Axil, described above, to compare these alternatives. Since there were a number of different runs to perform, they were abbreviated to only ten passes. This allowed three batch samples and three post-expire unbatch samples, which was felt would provide enough data to draw reasonable conclusions without taking an inordinate amount of time.

The first pairing was FDDI versus Ethernet. Using FDDI, the unbatch tests ran in an average of 90.8% of the time needed with Ethernet. Batching was even faster, taking only 88% of the Ethernet time. Expire was marginally slower over FDDI, but the difference is probably statistically insignificant. The remainder of the tests with the NetApp F540 were done using FDDI. (Tests of the NetApp 1400 were done using Ethernet because it did not have an FDDI interface.)

The next set of tests individually compared the two recommended options against the baseline FDDI test. Neither had any significant effect on unbatching, which was unsurprising since both options influence reads, and unbatching does little reading from the article spool. For batching, minra had little effect (it was expected that it would hurt) but not updating access times saved about 5% of the baseline FDDI time. As expected, expire saw no benefit from not updating access times, but seemed to be slightly faster with minra. It's not clear why this option would have any effect on file deletion. Since only no_atime_update was clearly beneficial it was the only option used in the remainder of the tests.

[Figure 6: Average number of articles unbatched per second with the history file on the filer instead of local disk, with articles stored on the filer in both cases. Line chart, articles per second (0-45) versus benchmark pass (1-10), for local and filer history placement.]

The next test showed that expire ran several percent faster using fastrm. However, since we did not have fastrm data for the local disk cases in the primary testbed, the full benchmarks run against the filers were done using the marginally suboptimal (for filers) dumbrm.
The last of these tests produced the most interesting results. With the history file and other contents of /usr/lib/news on the filer along with the article spool, batching was about 2% slower and expire about 3% slower than only having the article spool on the filer. No difference would have been expected for expire since the timed portion of expire does not access anything in /usr/lib/news. Presumably the extra filer activity before this step pushed some data out of cache, slowing expire. The big difference came during the unbatch tests. As illustrated in Figure 6, putting the history file on the filer produced unbatch times which started off taking 22% longer, progressing to as much as 150% more time as the system aged.

Future Research

One of the implementors of the log-structured file system (LFS) in BSD 4.4 [16, 17] suggested that netnews would be a good application for LFS, since a log-structured file system is designed to optimize writing to disk. This is how the critical unbatch part of netnews processing spends much of its time. The success of the Network Appliance filers is consistent with this conjecture since their Write Anywhere File Layout (WAFL) [14], while not a log-structured design, similarly optimizes writes to minimize head seeks. Using a BSD 4.4 system to compare an LFS-based article spool file system to an FFS-based equivalent would be interesting, though a filer would still be expected to offer better performance due to the large directory support, if nothing else.

The b+ tree data structures used for directories in the Windows NT file system [18], which allow it to perform quick file lookups in large directories, make NTFS seem appealing for netnews. Other aspects of NTFS appear to incur more overhead than would be desired for an application as demanding as netnews. Since Network Appliance's new Multiprotocol Filer software features native support for CIFS, the equivalent of NFS in the Windows networking world, and several netnews implementations are available for Windows NT, another netnews comparison project is likely in the author's future.

Conclusions

    Then let us praise the brave appliance
    In which we place this just reliance [19]

The surprisingly fast performance of the Network Appliance filer in the netnews benchmarks made the decision regarding SLAC's netnews server obvious. More important, though, the results served as a strong reminder to avoid preconceptions. Benchmarks can produce surprising results, which presumably is why many people run them in the first place. Finally, NFS, despite its age and weaknesses, can still do a remarkably good job.


Availability

The benchmark tools may be made available if there is interest. Please contact the author via e-mail at [email protected] for more information.

Acknowledgments

Special thanks to Mom and to my wife, Krissie. Mom was always encouraging even when she had no idea what I was really working on. I'll miss her. Krissie has been very understanding and supportive of my long work hours during what is supposed to be our honeymoon year. Others who helped in various ways include Mark Barnett, George Berg, Chuck Boeheim, Bob Cook, Renata Dart, Rosemary Dinelli, Walt Disney Co., Guy Harris, Dave Hitz, Moana Kutsche, Randy Melen, Lincoln Myers, Rob Salmon, Arnie Thompson, Andy Watson, Bebo White, and others whose contribution is not diminished by my failure to remember them here. My gratitude goes to all of them. Thanks, too, to Alexander for his patience. Still no more skunks, but he's on Bath Row anyway.

Author Information

Karl Swartz was Team Leader of the System Administration Team in SLAC Computing Services at the Stanford Linear Accelerator Center when this work was started. He was so impressed by the toaster's performance that he joined Network Appliance as a Technical Marketing Engineer. Prior to SLAC, he worked at the Los Alamos National Laboratory on computer security and nuclear materials accounting, and in Pittsburgh at Formtek, a start-up now owned by Lockheed-Martin, on vector and raster CAD systems. He attended the University of Oregon where he studied computer science and economics. Between work and a new wife, Karl hasn't been on the racetrack in far too long, but he does find time to moderate a newsgroup (sci.aeronautics.airliners) and to enjoy good food and good beer and trips to the beach with his wife, Krissie, and their Golden Retriever, Alexander. Krissie would be very upset if either of them castrated or slaughtered cattle. E-mail Karl at [email protected] or [email protected].
References

1. Karl L. Swartz, "Forecasting Disk Resource Requirements for a Usenet Server," Proceedings of the 7th USENIX Large Installation System Administration Conference (LISA VII), pp. 101-108, Monterey, California, November 1993. Also published as SLAC-PUB-6353.

2. Karl L. Swartz, "Usenet Growth Graphs," http://www.chicago.com/~kls/news-growth.html.

3. Rick Adams, Usenet post, c. September 1993.

4. OnLine: DiskSuite Reference Manual, Sun Microsystems, Mountain View, California, 1991.

5. D. Patterson, G. Gibson, and R. Katz, "A Case for Redundant Arrays of Inexpensive Disks (RAID)," ACM SIGMOD 88, pp. 109-116, Chicago, June 1988.

6. Marshall Kirk McKusick, William N. Joy, Samuel J. Leffler, and Robert S. Fabry, "A Fast File System for UNIX," in 4.4BSD System Manager's Manual, O'Reilly & Associates, Sebastopol, California, April 1994.

7. V. Jacobson, "Compressing TCP/IP Headers for Low-Speed Serial Links," RFC 1144, February 1990. Footnote 29.

8. Solstice DiskSuite 4.0 Administration Guide, Sun Microsystems, Mountain View, California, March 1995.

9. Byron Rakitzis and Andy Watson, Accelerated Performance for Large Directories, Technical Report 3006, Network Appliance, Mountain View, California, February 1996.

10. Samuel J. Leffler, Marshall Kirk McKusick, Michael J. Karels, and John S. Quarterman, The Design and Implementation of the 4.3BSD UNIX Operating System, Addison-Wesley, Reading, Massachusetts, 1989.

11. Mark Wittle and Bruce E. Keith, "LADDIS: The Next Generation in NFS File Server Benchmarking," Proceedings of the 1993 Summer USENIX Technical Conference, pp. 111-128, Cincinnati, Ohio, June 1993.

12. Rich Salz, "InterNetNews: Usenet transport for Internet sites," Proceedings of the 1992 Summer USENIX Technical Conference, pp. 93-98, San Antonio, Texas, June 1992.

13. David Lawrence, private conversation, January 1996.

14. Dave Hitz, James Lau, and Michael Malcolm, "File System Design for an NFS File Server Appliance," Proceedings of the 1994 Winter USENIX Technical Conference, pp. 235-245, San Francisco, January 1994. Also published as Network Appliance Technical Report 3002.

15. Brian Kantor and Phil Lapsley, "Network News Transfer Protocol," RFC 977, February 1986.

16. Margo Seltzer, Keith Bostic, Marshall Kirk McKusick, and Carl Staelin, "An Implementation of a Log-Structured File System for UNIX," Proceedings of the 1993 Winter USENIX Technical Conference, pp. 307-326, San Diego, California, January 1993.

17. Margo Seltzer, Keith A. Smith, Hari Balakrishnan, Jacqueline Chang, Sara McMains, and Venkata Padmanabhan, "File System Logging versus Clustering: A Performance Comparison," Proceedings of the 1995 Winter USENIX Technical Conference, pp. 249-264, New Orleans, Louisiana, January 1995.


18. Helen Custer, Inside the Windows NT File System, Microsoft Press, Redmond, Washington, 1994.

19. Thomas M. Disch, The Brave Little Toaster, Doubleday & Company, Garden City, New York, 1986. The full-length animated movie of the same title, based on this novella, inspired the title of this paper.
