Forecasting Disk Resource Requirements for a Usenet Server

The following paper was originally presented at the Seventh System Administration Conference (LISA ’93), Monterey, California, November 1993.

For more information about the USENIX Association, contact:
1. Phone: 510 528-8649
2. FAX: 510 548-5738
3. Email: [email protected]
4. WWW URL: http://www.usenix.org

Forecasting Disk Resource Requirements for a Usenet Server†
Karl L. Swartz, Stanford Linear Accelerator Center

ABSTRACT

Three years ago the Stanford Linear Accelerator Center (SLAC) decided to embrace netnews as a site-wide, multi-platform communications tool for the laboratory’s diverse user community. The Usenet newsgroups as well as other world-wide newsgroup hierarchies were appealing for their unique ability to tap a broad pool of information, while the availability of the software on a number of platforms provided a way to communicate to and amongst the computing community. The previous way of doing this ran only on the VM mainframe system and had become increasingly ineffective as users migrated to other platforms.

The increasing dependence on netnews brought with it the requirement that the service be reliable. This was dramatically demonstrated when the long-neglected netnews service collapsed under the load of the traditional fall surge in Usenet traffic and the site was without news service for a week while an upgraded system was installed. One result of that painful event was that efforts were made to forecast growth and the accompanying hardware requirements so that equipment could be acquired and installed before problems became visible to the users.

This paper describes the major on-disk databases associated with news software, then presents an analysis of the storage requirements for these databases based on data collected at SLAC.
A model is developed from this data which permits forecasting of disk resource requirements for a full feed as a function of time and local policies. Suggestions are also made as to how to modify this model for sites which do not carry a full feed.

Why NetNews? Why Usenet?

The Stanford Linear Accelerator Center (SLAC) is a medium-sized national research laboratory. The lab’s primary missions are research in elementary particle physics and development of new techniques in particle accelerators. These are large projects that often involve international collaborations and a diverse user community. This can present a formidable problem for communication with and amongst users, from discussions on the design of new experiments to progress reports on current experiments, as well as the more mundane but equally necessary announcements of network or server outages. There is also a tremendous need to stay in touch with what other researchers are doing.

In the eighties, the majority of computer users at SLAC logged onto an IBM mainframe running VM. The default user profile on VM brought up a VM News session upon login, where announcements could be made. Most users would see them since they logged into VM regularly. Discussions were handled by mailing lists maintained by LISTSERV, plus a home-grown VM conferencing system named CONSPIRE. (VM News was also developed at SLAC.) BITNET mailing lists provided the main contact with researchers at other sites.

There was also a smaller, though still sizable, group of users who used VAX/VMS systems most of the time, if not exclusively. These users tended to be excluded from VM News and from CONSPIRE, though they did make use of BITNET. DECnet mail with other physics sites was also available. This communications block posed some problems but was tolerated at the time.

Over the past few years the computing environment became far more diversified. Unix workstations began to appear, while the PCs, Macintoshes, and Amigas grew powerful enough that their users had dwindling need for VM. The number of users who did not use VM on a regular basis, if at all, increased until the VM-based communication model began to fail completely.

A new model was needed that would permit users to read announcements and participate in discussions from whatever platform they were accustomed to using. The multitude of platforms made another home-grown solution undesirable. The software which supports Usenet (referred to hereafter as netnews software to distinguish it from the network itself), with ready availability of NNTP-based readers for a variety of platforms, seemed to solve the problem except for VM. The discovery of the PennState VM NetNews system completed the solution.

B News and NNTP software was built and installed on a Sun fileserver which had some spare disk space, and the PennState NetNews software, with NNTP software from Queen’s University, was installed on VM.¹ Once the bugs were worked out of this system, NNTP-based readers were acquired for other platforms, often with help from interested users. Meanwhile, a local slac newsgroup hierarchy was being populated with new groups and users were being introduced to the new system.

There was also a great deal of interest in non-local groups, of course, i.e., Usenet. After a great deal of debate over proper use of government-funded equipment, the appropriateness of censorship in an academic/research community, and the feasibility of determining just what was and was not appropriate, it was decided to carry all groups and assume a mature and responsible user community.

Netnews flourished and mostly solved the communication problem, but created a new problem in that users came to expect it to be reliable. By early 1992 the ever-increasing growth of Usenet traffic had begun to severely strain the resources of the Sun which was trying to run netnews while also handling file service and a variety of other tasks. Spool areas would overflow and the now-obsolete B News software would collapse, causing substantial delays and user ire. The expiration noose would be tightened another notch, staving off disaster for a bit longer but also aggravating already irate users as articles expired before they could be read.

Work was begun to determine what equipment was needed to support netnews for the next few years, and to define requirements in the form of minimum expiration times. Eventually a SPARCserver 2 with 4.5 GB of disk space was ordered, arriving mere days too late to avert the collapse of SLAC’s netnews service from the traditional September surge in Usenet traffic. While painful at the time, the degree of pain made it clear just how critical netnews had become to SLAC.

Popular reference books on managing Usenet are notably silent on the matter of forecasting resources. [1, 2] Therefore, a study of Usenet growth was begun and has continued, so that future growth needs can be anticipated and handled before another crisis occurs. This paper documents the current state of this ongoing study.

Organization of a Netnews Server’s Data

A netnews server is composed of a number of databases, several of which have the potential to require a substantial amount of disk space and which fluctuate in size as a function of news activity. The three primary databases are the articles themselves, the history file, and the active file. Ancillary structures include thread databases, incoming and outgoing spool areas, and log files. While the following description is based on a Unix system running the C News software, most structures will likely be similar on other systems.

The article database is by far the largest portion of a news system. Unix’s hierarchical directory structure is a handy analog to the newsgroup hierarchy, so the software uses a simple lexical mapping of newsgroup names into directory names (“.” is mapped to “/”) with each article stored in a separate file. For example, article 42 in group news.announce.important is stored as news/announce/important/42 within the spool directory (often /usr/spool/news). Cross-posted articles are handled by links. These are normally hard links, though there is some support for falling back to symbolic links if a hard link fails.²

Storing each article in a separate file implies a tremendous number of relatively small files. If one eschews the apparently lightly tested symlink support (an option not likely to be available for large netnews servers for much longer), the use of links for cross-posted articles further implies that this entire database must reside within a single disk partition. Unfortunately, the combination of cross-posts, cancellations, varying expiration times for different newsgroups, and the common desire to be able to use grep and other Unix tools on the article database make an alternative implementation difficult and thus unlikely in the near future. (This is a recurrent thread in the news.software.* newsgroups.)

The history file, typically located in /usr/lib/news/history, records the article ID of each article seen recently by the news system along with the time it was received, any explicit expiration time, and, if the article has not yet expired, a list enumerating the newsgroups the article appears in and its sequence number in each group. All of this is stored in a simple, albeit large, file with one article mentioned per line. In order to speed lookups by article ID, an index is maintained, typically using dbz.

The last of the primary news databases is the active file, typically /usr/lib/news/active, which lists each newsgroup known to the news system and, for each group, the range of article

†This work supported by the United States Department of Energy under contract number DE-AC03-76SF00515, and simultaneously published as SLAC PUB-6353.
¹VM NetNews is a full news system, not an NNTP-based reader.
²The documentation for the February 1993 C News
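
The spool layout described above (newsgroup names mapped to directories, one file per article, cross-posts handled by links) can be illustrated with a short Python sketch. This is not code from C News; the function names article_path and store_article are hypothetical and chosen only for illustration, and the fallback to a symbolic link simply mirrors the behaviour the text attributes to the news software.

```python
import os


def article_path(spool_dir, newsgroup, number):
    """Map a newsgroup name and article number to a file in the spool.

    Each "." in the group name becomes a directory separator.
    """
    return os.path.join(spool_dir, *newsgroup.split("."), str(number))


def store_article(spool_dir, text, placements):
    """Store one article and link it into every group it was posted to.

    placements is a list of (newsgroup, article_number) pairs; the first
    pair receives the real file, the rest receive links to it.
    (Hypothetical helper, assumed for illustration only.)
    """
    paths = [article_path(spool_dir, group, num) for group, num in placements]
    for path in paths:
        os.makedirs(os.path.dirname(path), exist_ok=True)

    first, rest = paths[0], paths[1:]
    with open(first, "w") as f:
        f.write(text)

    for path in rest:
        try:
            # A hard link only works within a single filesystem.
            os.link(first, path)
        except OSError:
            # Fall back to a symbolic link if the hard link fails.
            os.symlink(first, path)
    return paths
```

On a POSIX system, article_path("/usr/spool/news", "news.announce.important", 42) yields /usr/spool/news/news/announce/important/42, matching the example in the text. Because a hard link cannot cross a filesystem boundary, a spool that relies only on hard links has to live on one partition, which is the constraint the paper points out.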
