USENIX Association
Proceedings of the 2001 USENIX Annual Technical Conference
Boston, Massachusetts, USA, June 25-30, 2001
THE ADVANCED COMPUTING SYSTEMS ASSOCIATION
© 2001 by The USENIX Association. All Rights Reserved. For more information about the USENIX Association: Phone: 1 510 528 8649; FAX: 1 510 548 5738; Email: [email protected]; WWW: http://www.usenix.org. Rights to individual papers remain with the author or the author's employer. Permission is granted for noncommercial reproduction of the work for educational or research purposes. This copyright notice must be included in the reproduced paper. USENIX acknowledges all trademarks herein.
Storage management for web proxies
Lan Huang (SUNY Stony Brook), Eran Gabber (Bell Laboratories), Elizabeth Shriver (Bell Laboratories)
[email protected], [email protected], [email protected]
Christopher A. Stein
Harvard University
Abstract

Today, caching web proxies use general-purpose file systems to store web objects. Proxies, e.g., Squid or Apache, when running on a UNIX system, typically use the standard UNIX file system (UFS) for this purpose. UFS was designed for research and engineering environments, which have different characteristics from those of a caching web proxy. Some of the differences are high temporal locality, relaxed persistence requirements, and a different read/write ratio. In this paper, we characterize the web proxy workload, describe the design of Hummingbird, a light-weight file system for web proxies, and present performance measurements of Hummingbird. Hummingbird has two distinguishing features: it separates object naming and storage locality through direct application-provided hints, and its clients are compiled with a linked library interface for memory sharing. When we simulated the Squid proxy, Hummingbird achieves document request throughput 2.3-9.4 times larger than with several different versions of UFS. Our experimental results are verified within the Polygraph proxy benchmarking environment.

1 Introduction

Caching web proxies are computer systems dedicated to caching and delivering web content. Typically, they exist on a corporate firewall or at the point where an Internet Service Provider (ISP) peers with its network access provider. From a web performance and scalability point of view, these systems have three purposes: improve web client latency, drive down the ISP's network access costs because of reduced bandwidth requirements, and reduce request load on origin servers.

Squid and Apache are two popular web proxies. Both of these systems use the standard file system services provided by the host operating system. On UNIX this is usually UFS, a descendant of the 4.2BSD UNIX Fast File System (FFS) [13]. FFS was designed for workstation workloads and is not optimized for the different workload and requirements of a web proxy. It has been observed that file system latency is a key component in the latency observed by web clients [21].

Some commercial vendors have improved I/O performance by rebuilding the entire system stack: a special operating system with an application-specific file system executing on dedicated hardware (e.g., CacheFlow, Network Appliance). Needless to say, these solutions are expensive. We believe that a lightweight and portable file system can be built that will allow proxies to achieve performance close to that of a specialized system on commodity hardware, within a general-purpose operating system, and with minimal changes to their source code; Gabber and Shriver [5] discuss this view in detail.

We have built a simple, lightweight file system library named Hummingbird that runs on top of a raw disk partition. This system is easily portable; we have run it with minimal changes on FreeBSD, IRIX, Solaris, and Linux. In this paper we describe the design, interface, and implementation of this system along with some experimental results that compare the performance of our system with a UNIX file system. Our results indicate that Hummingbird's throughput is 2.3-4.0 times larger than a simulated version of Squid running UFS mounted asynchronously on FreeBSD, 5.4-9.4 times faster than Squid running UFS mounted synchronously on FreeBSD, 5.6-8.4 times larger than a simulated version of Squid running UFS with soft updates on FreeBSD, and 5.4-13 times larger than XFS and EFS on IRIX (see Section 4). We also performed experiments using the Polygraph environment [18] with an Apache proxy; the mean response time for hits in the proxy is 14 times smaller with Hummingbird than with UFS (see Section 5).

Throughout the rest of this paper, we use the terms proxy or web proxy to mean caching web proxy. Section 2 presents the important characteristics of the proxy workload considered for our file system. It also presents some background on file systems and proxies that is important because it motivates much of our design. Section 3 describes the Hummingbird file system. Our experiments and results are presented in Sections 4 and 5. Section 6 discusses related work in file systems and web proxy caching.

2 File system limitations

The web proxy workload is different than the standard UNIX workload; these differences are discussed in Section 2.1. Due to these differences, the performance which can be attained for a web proxy running on UFS is limited by some of its features; these limitations are discussed in Section 2.2.

2.1 Web proxy workload characteristics

Web proxy workloads have special characteristics, which are different from those of a traditional UNIX file system workload. This section describes the special characteristics of proxy workloads.

We studied a week's worth of web proxy logs from a major, national ISP, collected from January 30 to February 5, 1999. This proxy ran Netscape Enterprise server proxy software and the logs were generated in Netscape-Extended2 format. For the purpose of our analysis we isolated the request stream to those requests that would affect the file system underlying the proxy. Thus we excluded 34% of the GET requests, which are considered non-cacheable by the proxy. If we could not find the file size, we removed the log event. This preprocessing results in the removal of about 4% of the log records, nearly all during the first few days. We eliminated the first few days and are left with 4 days of processed logs containing 4.8 million requests for 14.3 GB of unique cacheable data and 27.6 GB total requested cacheable data.

Characteristics of a web proxy and its workload are:

Idleness. The proxy workload is characterized by a large variability in request rate and frequent idle periods. Using simulation, we found, in fact, that in each 1-minute interval, the disk is idle 57-99% of the duration of the interval, with a median of 82%. If the request rate in the trace doubles, the disk is idle less of the time, with a median of 60%. In the busiest periods, it is idle only 1%. Note that when the request rate doubles, the disk is almost saturated in the busy intervals (idle time drops to 1%), while the disk remains idle in the low-activity intervals. This idleness study used assumptions that represent Hummingbird: the disk has 64 KB blocks, all new files are written to the disk, and only data needs to be written to disk (no meta-information). The existence of idle periods is crucial for the design of Hummingbird since Hummingbird performs background bookkeeping operations in those idle periods (see Section 3.5).

Persistence. A web proxy only writes files retrieved from an origin server. Thus, these are just cached files, and can be retrieved again if necessary.

Naming. The web proxy application determines the name of a file to be written into the file system. In a traditional UNIX file system, file names are normally selected by a user.

Reference locality. Client web accesses are characterized by a request for an HTML page followed by requests for embedded images. The set of images within a cacheable HTML page changes slowly over time. So, if a page is requested by a client, it makes sense for the proxy to anticipate future requests by prefetching its historically associated images.

We studied one day of the web proxy log for this reference locality. The first time we see an HTML file, we coalesce it with all following non-HTML files from the same client to form the primordial locality set, which does not change. The next time the HTML file is referenced, we form another locality set in the same manner and compare its file members with those of the primordial locality set. The average hit rate (the ratio between the size of the latter sets and the size of the primordial set) across all references is 47%. Thus, on average, a locality set re-reference accesses almost half of the files of the original reference. One of the reasons that this hit rate is small might be the assumption that all non-HTML files that follow an HTML file are in the same locality set; this is clearly not true if a user has multiple active browser sessions. Also, we determined the type of file using the file extension, thus possibly placing some HTML files in another file's locality set. We also studied the size of the locality sets in bytes, and found that 42% are 32 KB or smaller, 62% are 64 KB or smaller, 73% are 96 KB or smaller, and 80% are 128 KB or smaller.

File access. Several older UNIX file system performance studies have shown that files are usually accessed sequentially [16, 1]. A recent study [19] has suggested that this is changing, especially for memory-mapped and large files. However, UNIX file systems have been designed for the traditional sequential workload. Web proxies have an even stronger and more predictable pattern of behavior. Web proxies currently always access files sequentially and in their entirety. (This may change when range requests are more popular.)

File size. Most cacheable web documents are small; the median size of requested files is 1986 B and the average size is 6167 B. Over 90% of the references are for files smaller than 8 KB.

2.2 UFS performance limitations

Due to the proxy workload characteristics presented in the previous section, UFS has a number of features which are not needed, or could be greatly simplified, so that file system performance could be improved. Table 1 presents the features on which UFS and a desired caching web proxy file system should differ. We now discuss these features.

UFS files are collections of fixed-size blocks, typically 8 KB in size. When accessing a file, the disk delays are due to disk head positioning occurring when the file blocks are not stored contiguously on disk. UFS attempts to minimize the disk head positioning time by storing the file blocks contiguously and prefetching blocks when a file is accessed sequentially, and does a good job of this for small files. Thus, when the workload consists of mostly small files, the largest component of disk delays is due to the reference stream locality not corresponding with the on-disk layout. UFS attempts to reduce this delay by having the user place files into directories, and locates files in a directory on a group of contiguous disk blocks called cylinder groups. Thus, reference locality is tied to naming. Here, the responsibility for performance lies with the application or user, who must construct a hierarchy with directory locality that matches future usage patterns. In addition, to reduce file lookup times, the directory hierarchy must be well-balanced and any single directory should not have too many entries. Squid attempts to balance the directory hierarchy, but in the process distributes the reference stream across directories, thus destroying locality. Apache maps files from the same origin server into the same directory. For specifying locality by a web proxy, a more direct and low-overhead mechanism can be used.

Experience with Squid and Apache has shown that it is difficult for web proxies to use directory hierarchies to their advantage [11]. Deep pathnames mean long traversal times. Populated directories slow down lookup further because many legacy implementations still do a linear search for the filename through the directory contents. The hierarchical name space allows files to be organized by separating them across directories, but this is not needed by a proxy. What the proxy actually needs is a flat name space and the ability to specify storage locality.

UFS file meta-data is stored in the i-node, which is updated using synchronous disk writes to ensure meta-data integrity. A caching web proxy does not require file durability for correctness, so it is free to replace synchronous meta-data writes with asynchronous writes to improve the performance (which is done with soft updates [6]) and eliminate the need to maintain much of the meta-data associated with persistence.

Traditional file systems also force two architectural issues. First, the standard file system interface copies from kernel VM into the applications' address space. Second, the file system caches file blocks in its own buffer cache. Web proxies manage their own application-level VM caches to eliminate memory copies and use private information to facilitate more effective cache management. However, web documents cached at the application level are likely to also exist in the file system buffer cache, especially if recently accessed from disk. This multiple buffering reduces the effective size of the memory. A single unified cache solves the multiple buffering and configuration problems. Memory copy costs can be alleviated by passing data by reference rather than copying. Both of these can be done using a file system implemented by a library that accesses the raw disk partition. Another approach for alleviating multiple buffering is using memory-mapped files as done by Maltzahn et al. [11].

3 File system design

The design of Hummingbird is influenced by the proxy workload characteristics discussed in Section 2. Hummingbird uses locality hints generated by the proxy to pack files into large, fixed-size extents called clusters, which are the unit of disk access. Therefore, reads and writes are large, amortizing disk positioning times and interrupt processing.
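To make the hint mechanism concrete, here is a hypothetical sketch of how a proxy could generate such locality hints: it remembers the last HTML URL seen from each client and pairs every subsequent embedded object with it. The client table, its size limits, and the extension test are our simplifications, and collocate_files() is stubbed out rather than being the real library call.

```c
#include <string.h>

/* Hypothetical hint generation on the proxy side: remember the last HTML
 * URL requested by each client, and pair every following non-HTML request
 * from that client with it via collocate_files(). */

#define MAX_CLIENTS 4
#define MAX_URL 256

static char current_html[MAX_CLIENTS][MAX_URL];
static int hints_emitted;

static int is_html(const char *url) {
    const char *dot = strrchr(url, '.');
    return dot && strcmp(dot, ".html") == 0;
}

/* Stand-in for the Hummingbird call; a real proxy would link the library. */
static void collocate_files(const char *a, const char *b) {
    (void)a; (void)b;
    hints_emitted++;
}

static void on_request(int client, const char *url) {
    if (is_html(url)) {
        /* start a new locality set for this client */
        strncpy(current_html[client], url, MAX_URL - 1);
    } else if (current_html[client][0] != '\0') {
        /* embedded object: hint that it belongs with the current page */
        collocate_files(current_html[client], url);
    }
}
```

This is essentially the pairing rule the wg-Hummingbird workload generator applies in Section 4.1, and it inherits the same caveat noted in Section 2.1: with multiple active browser sessions per client, some objects will be paired with the wrong page.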
Table 1: Comparison of file system features.

feature               UFS                                                  Hummingbird
name space            hierarchical name space                              flat name space
reference locality    application manages directories                      application sends locality hints
file meta-data        meta-data kept on disk; synchronous updates          most meta-data in memory
                      preserve consistency
disk layout of files  data in blocks, with separate i-node for meta-data   file and meta-data stored contiguously
disk access           could be as small as a block size                    cluster size (typically 32 or 64 KB)
interface             buffers are passed; memory copies are necessary      pointers are passed
Hummingbird manages a large memory cache. Since Hummingbird is implemented by a library that is linked in with the proxy, no memory copies or memory mappings are required to move data from the file system to the proxy. The file system simply passes a pointer. Likewise, the proxy passes the file system a pointer when it writes data. We have only designed the file system for a single client, so protection is not necessary. Since the client (the proxy) handles all data transfers to and from the system, it must be trusted in any case. The proxy may be multi-threaded or have multiple processes[1], in which case access is serialized with a single lock on the file system meta-data that is released before blocking for disk I/O. Using a single lock may slow the system under a heavy load. Since the file system is I/O bound, the lock is held only when referring to Hummingbird's data structures. The lock is released before a thread blocks for disk I/O. No locking is needed when accessing file data.

[1] To perform memory management for the multi-process version of Hummingbird, the processes have a shared memory region which is used for malloc().

Since the typical workload is bursty, Hummingbird is designed to reduce the response time during bursty periods, and to perform maintenance activities during the idle periods present in the workload. Hummingbird performs the maintenance activities by calling several daemons responsible for: (1) reclaiming main memory space by writing files into clusters, and (2) reclaiming disk space by deleting unused clusters.

While there are invariants across proxy workloads, some characteristics will change. We have designed Hummingbird to be configurable so that the system can be optimized for a proxy workload and the underlying storage hardware. To this effect, Hummingbird has several parameters that the proxy is free to set to optimize the system for its workload. The parameters set at file system initialization include: size of a cluster, memory cache eviction policy, file hash table size, file and cluster lifetimes, disk data layout policy, and recovery policies.

3.1 Hummingbird objects

Hummingbird stores two main types of objects in main memory: files and clusters. A file is created by a write_file() call. Clusters contain files and some file meta-data. Grouping files into clusters allows the file system to physically collocate files together, since when a cluster is read from disk, all of the files contained in the cluster are read. Clusters are clean, i.e., they can be evicted from main memory by reclaiming their space without writing to disk, since a cluster is written to disk as soon as it is created. (Section 3.5 discusses where on disk the clusters are written.)

The application provides locality hints by the collocate_files(fnameA, fnameB) call. The file system saves these hints until the files are assigned to clusters. This assignment occurs as late as possible, that is, when space is needed in main memory. At this point, the file system attempts to write fnameA and fnameB in the same cluster. It is possible for a file to be a member of multiple clusters, and stored in multiple locations on disk, by the application sending multiple hints (e.g., collocate_files(fnameA, fnameB) and collocate_files(fnameC, fnameB)). For proxy caches, this is a useful feature since embedded images are frequently referenced in a number of related HTML pages.

When the file system is building a cluster, it determines which files to add to the cluster using an LRU ordering according to the last time the file was read. If the least-recently-used file has a list of collocated files, then these files are added to the cluster if they are in main memory. (If a file is on the collocation list, and already has been added to a cluster, it can still be added to the current cluster if the file is in memory.) Files are packed into the cluster until the cluster size threshold is reached, or until all files on the LRU list have been processed. This way, small locality sets with similar last-read times can be packed into the same cluster. Another possible algorithm to pack files into clusters is the Greedy Dual Size algorithm [3].

Large files are special. They account for a very small fraction of the requests, but a significant fraction of the bytes transferred. In the log we analyzed, files over 1 MB accounted for over 8% of the bytes transferred, but only 0.02% of the requests. Caching these large files is not important for the average latency perceived by clients, but is an important factor in the network access costs of the ISP. It is better to store these large files on disk, and not in the file system cache in memory. The write_nomem_file() call bypasses the main memory cache and writes a file directly to disk; if the file is larger than the cluster size, multiple clusters are allocated. Having an explicit write_nomem_file() function allows the application to request that any file can bypass main memory, not just large files.

3.2 Meta-data

Hummingbird maintains three types of meta-data: file system meta-data, file meta-data, and cluster meta-data.

File system meta-data. To determine when main memory space needs to be freed, Hummingbird maintains counts of the amount of space used for storing the files and the file system data. To assist with determining which files and clusters to evict from main memory, Hummingbird maintains two LRU lists, one for files which have not yet been packed into clusters and another for clusters that are in memory.

File meta-data. A hash table stores pointers to the file information such as the file number (discussed below), status, and a reference count of the number of users that are currently reading the file. The file status field identifies whether the file is not in a cluster, in one cluster, in multiple clusters, or not cacheable. Until a file becomes a member of a cluster, the file name and file size need to be maintained as part of the file meta-data. We also maintain a list of files that should be collocated with this file. When a file is added to a cluster, the file meta-data must include the cluster ID and the file reference count for that file.

It is natural for a proxy to use the URL as a file name. URLs may be extremely long, and since we have many small files, the file names may take up a large portion of main memory if they were kept permanently in memory. Thus, we save the file name with the file data in its cluster and not permanently in memory. Internally, Hummingbird hashes the file name into a 32-bit index, which is used to locate the file meta-data. Hash collisions can be detected by comparing the requested file name with the file name stored in the cluster. If there is a collision, the next element in the hash table bucket is checked.

Cluster meta-data. A cluster table contains information about each cluster on disk: the status, last-time accessed, and a linked list of the files in the cluster. The cluster status field identifies whether the cluster is empty, on disk, or in memory. For our file system, the cluster ID identifies the location of the cluster on disk. While a cluster is in memory, the address of the cluster in memory is needed. The last-time accessed is needed to determine the amount of time since the cluster was last touched.

3.3 The Hummingbird interface

This section describes the basic Hummingbird calls. All routines return a positive integer or zero if the operation succeeds; a negative return value is an error code.

int write_file(char* fname, void* buf, size_t sz); This function writes the contents of the memory area starting at buf with size sz to the file system with filename fname. It returns the size of the file. Once the pointer buf is handed over to the file system, the application should not use it again. Hummingbird will eventually pack the file into a cluster, free the buffer, and write the cluster to stable storage.

int read_file(char* fname, void** buf); This function sets *buf to the beginning of the memory area containing the contents of the file fname. It increments a reference count and returns the size of the file. If the file is not in main memory, another file or files may be evicted to make room, and a cluster containing the specified file will be read from disk.

int done_read_file(char* fname, void* buf); This function releases the space occupied by the file in main memory by decrementing the reference count; the application should not use the pointer again. Every read_file() must be accompanied by a done_read_file(). Otherwise, the file will stay in memory indefinitely (until the application program terminates).

int delete_file(char* fname); This function deletes the file fname. An error code is returned if the file has any active read_file()'s.

int collocate_files(char* fnameA, char* fnameB); This function attempts to collocate file fnameB with file fnameA on disk. Both files must be previously written (by calling write_file()).

int write_nomem_file(char* fname, void* buf, size_t sz); This function bypasses the main memory cache and writes a file directly to disk. This file is flagged so that when it is read, it does not compete with other documents for cache space and is immediately released after the application issues the done_read_file().

Missing from this API are commands such as ls and df. We have seen no need for such commands for a caching web proxy. For example, the existence of a file can be determined with read_file().

3.4 Recovery

Web proxies are slowly convergent; it takes days to reach the maximal hit rate. Consequently, proxy cache contents must be saved across system failures. At the same time, recovery must be quick. With today's disk sizes, the system cannot wait for full disk scans before servicing document requests. Hummingbird warms the main memory cache with files and clusters that were "hot" before the system reboot. Bounding file create and delete persistence rather than attaching them to system call semantics allows for higher performance.

Data stored on disk. Hummingbird's disk storage is segmented into four regions:

- clusters, which store the file data and meta-data,
- mappings of files to clusters, which allow a file to be quickly located on disk by identifying which clusters the file is in,
- the hot cluster log, which caches frequently used clusters, and
- the delete log, which stores small records describing intentional deletes.

The mappings of files to clusters is part of the file meta-data described in Section 3.2.

Warming the cache. Hummingbird lacks a directory structure and all meta-data consistency dependencies are contained within clusters. There is no need for an analog of the UNIX fsck utility to ensure file system consistency after a crash. During a planned shutdown, Hummingbird will write the file-to-cluster mappings to disk. (They are also written periodically.) During a crash, the system has no such luxury. So, while the file system is guaranteed to be consistent with itself, the lack of directories might make it impossible to locate files on disk immediately after a crash if the file-to-cluster mapping was not up to date. Hummingbird speeds up crash recovery time with a log containing the cluster identifiers of popular clusters. The sync_hot_clusters_daemon() creates this log using the cluster LRU list and writes it to disk periodically. After a crash, these clusters are scanned in first, quickly achieving a hit rate close to that before the crash.

Persistence. Hummingbird does not change disk contents immediately for file and cluster deletions. Research with journalling file systems has shown that hard meta-data update persistence is expensive, due to the necessity of updating stable storage synchronously [22]. Hummingbird does not provide hard persistence, but uses a log of intentional deletions to bound the persistence of deletions. Records describing deletions, either cluster or file, are written into the log, which is buffered in memory. Periodically, the log is written out to disk according to the specified thresholds. The user specifies when the log should be written by specifying either a number-of-files threshold (i.e., once X files have been recorded in the log, it must be written to disk), or a threshold on the passing of time (i.e., the log must be written to disk at least once every Y seconds).

The log is structured as a table, indexed by cluster or file identifier. When a file or cluster is overwritten on disk, the delete intent record is removed from the log. Records contain file or cluster identifiers as well as a generation number, which is stored in the on-disk meta-data and used to eliminate the possibility of replayed deletes.

The recovery procedure. During recovery, all four regions of the disk are locked exclusively by the recovery process. The Hummingbird interface comes alive to the application early in the process, after the hot cluster log of the previous session has been recovered. However, Hummingbird will not write new files or clusters to disk until recovery has completed; write_file() calls will fail. It continues to recover the clusters in the background and rebuild the in-memory meta-data. During this phase, requests that do not hit in the currently-rebuilt meta-data are checked in the file-to-cluster mappings to identify the cluster that needs to be recovered. Since the file-to-clusters mapping can be out-of-date (due to a crash), a file without a file-to-cluster mapping is viewed as not in the file system.

Figure 1: Simulation environment. (The proxy log drives the workload generator, which issues file system operations to the file system, which accesses the disk.)
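The delete log's bounded persistence (flush after X records or Y seconds, as described under Persistence in Section 3.4) can be sketched as follows. The structure and field names are our invention, not Hummingbird's, and the actual disk write is stubbed out.

```c
#include <stddef.h>
#include <time.h>

/* Sketch of the bounded-persistence rule for the delete log: the buffered
 * log is forced to disk once X records accumulate or once Y seconds have
 * passed since the last flush, whichever comes first. */

struct delete_log {
    size_t pending;       /* records buffered in memory since last flush */
    time_t last_flush;    /* when the log last reached disk */
    size_t max_records;   /* the "X files" threshold */
    double max_seconds;   /* the "Y seconds" threshold */
};

static int should_flush(const struct delete_log *log, time_t now) {
    return log->pending >= log->max_records ||
           difftime(now, log->last_flush) >= log->max_seconds;
}

/* Record one delete intent; flush if either threshold is crossed. */
static void log_delete(struct delete_log *log, time_t now) {
    log->pending++;
    if (should_flush(log, now)) {
        /* write the buffered records to the on-disk delete log region here */
        log->pending = 0;
        log->last_flush = now;
    }
}
```

Either threshold alone bounds how much deletion history can be lost in a crash: at most X unflushed records, and none older than Y seconds.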
Pack les from head daemon. The pack
Wenow outline the sequence of events during crash
files daemon() takes les o the tail of the LRU
recovery; recovery is much simpler during a planned
list to pack into clusters; pack files from head
shutdown. (1) Crash or power failure. The sys-
daemon() takes les o the head. This has the
tem reb o ots and enters recovery mo de. (2) The hot
e ect of packing the most-frequently accessed les
cluster log is scanned, the hot clusters are read in,
into clusters.
and their contents are used to initialize in-memory
meta-data. (3) The le-to-cluster mappings are read
Free main memory data daemon. This dae-
in from disk. (4) The delete log is scanned and
mon evicts data using le and cluster LRU lists
records are applied, mo difying meta-data as neces-
when the amount of main memory used by the le
sary. (5) Proxy service is enabled. Hummingbird
system exceeds a tunable threshold.
will now service requests. (6) During idle time, the
recovery pro cess scans the cluster region, rebuilding
Free disk space daemon. This daemon deletes
the in-memory meta-data for all les and clusters.
les or clusters whose age exceeds tunable thresh-
As les are recovered, they are available to the ap-
olds set bythedelete le policy and the delete cluster
plication. (7) Recovery is complete. All les and
policy.
clusters are now available. Hummingbird can now
write new clusters to disk.
3.5 Daemons
4 Exp erimental results
Hummingbird employs a numb er of daemons in ad-
We built an environment to run trace-driven exp er-
dition to the recovery daemons mentioned ab ove
iments on real implementations of UFS and Hum-
that run when the file system is idle and can be called on demand. In the future, we hope to support client-provided daemons, which would, for example, support different cache eviction policies.

Pack files daemon. If the amount of main memory used to store files exceeds a tunable threshold, the pack_files_daemon() uses the file LRU list to create a cluster of files and write the cluster to disk. The daemon packs the files using the information from the collocate_files() calls, attempting to pack files from the same locality set in the same cluster. If a file is larger than the cluster size, it is split between multiple clusters.

This daemon uses the disk data layout policy to decide which cluster to pack next. Implemented policies include: closest cluster, which picks the closest free cluster to the previous cluster accessed on disk; closest cluster to previous write, which picks the closest free cluster to the previous cluster written to on disk; and overwrite, which overwrites the next cluster. All of these policies tend to write files accessed within a short time to clusters close together on disk. Thus, long-term fragmentation does not occur.

This environment consists of four components, as depicted in Figure 1. (1) The processed proxy log, which was discussed in Section 2.1. (2) The workload generator, which simulates the operation of a caching web proxy. It reads the proxy log events and generates the file system operations that a proxy would have generated during processing of the original HTTP requests. (3) The file system, which is either Hummingbird or various implementations of UFS, EFS, or XFS. (4) The disk, which is a physical magnetic disk that the file system uses to store the files. The workload generators, implementation details, and experiments are described in the following section.

4.1 Workload generators

We developed two workload generators: wg-Squid, which mimics Squid's interaction with the file system, and wg-Hummingbird, which issues Hummingbird calls. The same wg-Squid workload generator was used with the various UFS implementations, EFS, and XFS. The generators take as input the modified proxy access log. The workload generators operate in a trace-driven loop which processes
logged events sequentially without pause. This simulates a heavily loaded server. (We call this timing mode THROTTLE. We have implemented two other timing modes: REAL and FAST. REAL issues the events with the frequency at which they were recorded. FAST processes the events with a speed-up heuristic parameterized by n; if the time between events is longer than time n, then we only wait time n between them.)

UFS workload generator: wg-Squid. The wg-Squid simulates the file system behavior of Squid-2.2. The Squid cache has a 3-level directory hierarchy for storing files; the number of children at each level is a configurable parameter. In order to minimize file lookup times, Squid attempts to keep directories small by distributing files evenly across the directory hierarchy.

When a file is written to the cache, a top-level directory, or SwapDir, is selected. Squid attempts to load balance across the SwapDirs. Once the SwapDir has been selected, the file is assigned a file number which uniquely identifies the file within the SwapDir. The value of this file number is used to compute the names of the level-2 and level-3 directories. Thus, Squid does not use the URL or URL reference locality for file placement into directories, limiting the ability of the file system to collocate files which will be accessed together. Once the directory path has been determined, the file is created and written.

Squid has configurable high and low water marks. When the total cache size passes the high water mark, eviction begins. Files are deleted from the cache in modified LRU order with a small bias towards expired files. Eviction continues until the low water mark is reached. The wg-Squid simulates this behavior except that it uses LRU instead of modified LRU; our log does not contain the expires information.

Hummingbird workload generator: wg-Hummingbird. Each iteration of the wg-Hummingbird loop parses the next sequential event from the log and attempts to read the URL from Hummingbird. If successful, it explicitly releases the file buffer. If the read attempt fails, it attempts to write the URL into the file system; this would have occurred after the proxy fetched the URL from the server.

The wg-Hummingbird maintains a hash table keyed by client IP address to store information for generating the collocate hints. The values in the hash table are the URLs of the most recent HTML file request seen from a particular client. The hash table only stores the URLs of static HTML files. As requests for non-HTML documents are processed, wg-Hummingbird generates collocate_files() calls for the non-HTML file paired with its client's current HTML file, as stored in the hash table.

Note that Squid (and wg-Squid) do not provide explicit locality hints to the underlying operating system; they only place files in the directory hierarchy. This is in contrast with wg-Hummingbird, which provides explicit locality hints via the collocate_files() call. There are other ways in which a proxy could use UFS to obtain file locality; we did not test against these other approaches. Thus, our comparisons are restricted to the Squid approach for file locality.

4.2 Experiments

We performed our experiments on PCs with a 700 MHz Pentium III running FreeBSD 4.1 with an 18 GB IBM 10,000 RPM Ultra2 LVD SCSI disk (Ultrastar 18LZX). We also performed experiments on an SGI Origin 2000 with 18 GB 10,000 RPM Seagate Cheetah disks (ST118202FC) under IRIX 6.5. The results with different parameter settings were similar to each other, so we only present a representative subset of the results.

Our experiments used the 4-day web proxy log as input into the workload generators. Among other measurements, we measured the proxy hit rate, which represents how frequently the web page will not have to be fetched from the server, and the file system read time, which represents how long the proxy must wait to get the file from the file system. The file system write time is the time it takes the file system to return after a write; this may not include the time to write the file or file meta-data to disk. (We call the file system read (write) times FS read (write) time in our figures and tables.) wg-Squid and Hummingbird measured the file system read/write times directly, and the output of the iostat -o and sar commands was used to determine the disk I/O times. We also report the throughput, which we compute as the ratio of the total number of requests processed to the experiment run time. The measurements are averaged over the entire log; warm-up effects were insignificant.

Comparing Hummingbird with UFS: single thread. We compared Hummingbird with three versions of UFS on FreeBSD 4.1. The three versions of UFS were: UFS, which is UFS mounted synchronously (the default); UFS-soft, which is UFS with soft updates; and UFS-async, which is UFS mounted asynchronously, so that meta-data updates are not synchronous and the file system is not guaranteed to be recoverable after a crash. We used a version of Hummingbird with a single working thread, where the daemons were called explicitly every 1000 log events. Table 2 presents comparisons for two different disk sizes, 4 GB and 8 GB, with two memory sizes, 256 MB and 1024 MB, when files greater than 64 KB are not cached. The memory was split evenly between the Squid cache and the file system buffer cache. The proxy-perceived latency in Table 2 is the 5th column, the FS read time. Hummingbird's smaller file system read time is due to the hits in main memory caused by grouping files in locality sets into clusters. Hummingbird's smaller file system write time (6th column) when compared to UFS-async is due to cluster writes, which write multiple files to disk in a single operation. The FS write times for UFS and UFS-soft are greater than UFS-async due to the synchronous file create operation.

Table 2: Comparing Hummingbird with UFS, UFS-async, and UFS-soft when files greater than 64 KB are not cached.

file        disk  main         proxy     FS read    FS write   # of disk   mean disk
system      size  memory (MB)  hit rate  time (ms)  time (ms)  I/Os        I/O time (ms)
Hummingbird 4GB   256          0.62      1.81       0.32       1,161,163   5.14
UFS-async   4GB   128+128      0.64      6.13       2.54       4,807,440   4.66
UFS-soft    4GB   128+128      0.64      5.54       21.07      10,737,300  4.97
UFS         4GB   128+128      0.64      5.93       20.77      10,238,460  5.02
Hummingbird 4GB   1024         0.63      1.05       0.03       552,882     5.75
UFS-async   4GB   512+512      0.64      3.21       2.35       3,217,440   3.95
UFS-soft    4GB   512+512      0.64      3.27       20.85      9,714,420   4.76
UFS         4GB   512+512      0.64      3.92       18.43      9,022,200   4.63
Hummingbird 8GB   256          0.64      2.03       0.32       1,194,494   5.64
UFS-async   8GB   128+128      0.67      5.69       1.47       4,605,360   4.34
UFS-soft    8GB   128+128      0.67      5.78       15.66      9,058,141   4.86
UFS         8GB   128+128      0.67      6.17       12.83      8,313,360   4.63
Hummingbird 8GB   1024         0.64      1.20       0.03       578,562     6.37
UFS-async   8GB   512+512      0.67      4.05       1.34       3,561,180   4.13
UFS-soft    8GB   512+512      0.67      3.74       15.65      8,148,721   4.62
UFS         8GB   512+512      0.67      3.81       15.70      7,611,840   4.68

Figure 2: Request throughput from Table 2. (The figure plots throughput in FS reads per second for the 4GB/256MB, 4GB/1024MB, 8GB/256MB, and 8GB/1024MB configurations, with bars for Hummingbird, UFS-async, UFS-soft, and UFS.)
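The cluster packing that gives Hummingbird its smaller write times can be sketched as follows. This is an illustrative sketch, not Hummingbird's actual implementation; the 64 KB cluster size is an assumption made for the example and is not taken from the paper.

```python
# Illustrative sketch of cluster packing: many small files are packed into
# fixed-size clusters, and each cluster is written to disk in a single
# operation. A file larger than the cluster size is split across multiple
# clusters, as described for the pack files daemon.
CLUSTER_SIZE = 64 * 1024  # bytes; arbitrary choice for this example

def pack_into_clusters(files, cluster_size=CLUSTER_SIZE):
    """Greedily pack (name, size) pairs into clusters.

    Returns a list of clusters; each cluster is a list of (name, bytes)
    pieces. len(result) is the number of disk writes needed, versus at
    least one write (plus synchronous meta-data updates) per file in UFS.
    """
    clusters, current, used = [], [], 0
    for name, size in files:
        remaining = size
        while remaining > 0:
            take = min(remaining, cluster_size - used)
            current.append((name, take))
            used += take
            remaining -= take
            if used == cluster_size:   # cluster full: emit one disk write
                clusters.append(current)
                current, used = [], 0
    if current:                        # partially filled final cluster
        clusters.append(current)
    return clusters
```

Packing twelve 10 KB files this way yields two cluster-sized disk writes instead of twelve file-sized ones, which is the kind of reduction visible in the "# of disk I/Os" column of Table 2.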
(We controlled the size of the file system buffer cache by locking Squid's memory using the mlock system call and locking additional memory, so that the file system buffer cache could use only the remaining unlocked memory. In this way we also prevented paging of the Squid process.)

The effectiveness of the clustered reads and writes and the collocation strategy is illustrated in the number of disk I/Os. In all test configurations, Hummingbird issued substantially fewer disk I/Os than any of the UFS configurations. Also, note that the number of disk I/Os in the UFS experiments is larger than the total number of requests in the log. This is because file operations resulted in multiple disk I/Os. This also explains why UFS read and write operations (as seen in FS read and write times) are slower than individual disk I/Os. The mean disk I/O time is larger in Hummingbird since the request size is a cluster, which is larger when compared to the mean data transfer size accessed by UFS.

The throughput for each experiment in Table 2 is
shown in Figure 2. Figure 2 shows that Hummingbird throughput is much higher than that of UFS, UFS-soft, and UFS-async on the same disk size and memory size. This is not quite a fair comparison, since the proxy hit rate is lower with Hummingbird. (We do not expect the experiment run time to increase more than 10% when the Hummingbird policies are set so that it would have an equivalent hit rate to wg-Squid.) The throughput is larger for Hummingbird since much less time is spent in disk I/O. Using throughput as a comparison metric, we see that Hummingbird is 2.3–4.0 times faster than simulated Squid running on UFS-async, 5.6–8.4 times faster than a simulated version of Squid running on UFS-soft, and 5.4–9.4 times faster than simulated Squid running on UFS. These numbers also include the results from Table 3.

The experiments for Table 2 assumed that files greater than 64 KB were not cached by the proxy. We got similar results when assuming the proxy would cache all files; see Table 3. Note that the proxy hit rate in Table 3 is lower than in Table 2. This is the result of the cache being "polluted" with large files, which cause some smaller files to be evicted. The end result is that there are fewer hits, which translates into less file system activity and fewer file accesses.

Comparing Hummingbird with EFS and XFS: single thread. We compared Hummingbird with EFS and XFS on SGI IRIX. EFS is an extent file system. XFS is a high-performance journalling file system that interacts with the kernel through the traditional VFS and vnode interfaces.

We experimented with 3 different disk sizes: 4 GB, 9 GB, and 18 GB. Since EFS supports file systems up to 8 GB in size, we could not test it with the 9 or 18 GB disks. Table 4 presents results where the wg-Squid cache and the IRIX buffer cache together use 256 MB of main memory, which are divided in two ways: 50 MB + 206 MB and 128 MB + 128 MB (wg-Squid and buffer cache, respectively). The 50 MB for the wg-Squid was selected according to the Squid administration guidelines [23]. Hummingbird has 256 MB of main memory, and it has the best choice of policies as previously discussed.

Our experimental results are in Table 4. Hummingbird throughput is much higher than both XFS and EFS on the same disk size, as seen in the UFS experiments, and the user-perceived latency (5th column in Table 4) is smallest in Hummingbird. The second observation from Table 4 is that EFS is much slower than XFS, which is a journalling file system. In particular, the file system write time for EFS is more than 3 times larger than the XFS write time. This is the result of frequent synchronous write operations for meta-data, which are performed by EFS. The third observation is that increasing the wg-Squid cache size to 128 MB and reducing the file system buffer cache size actually improved the file system performance, which indicates that the wg-Squid cache is more effective than the file system buffer cache.

Comparing Hummingbird with UFS: multi-threaded workload generators. Many proxies are either multi-threaded or have multiple processes. Hummingbird is both thread- and process-safe. We implemented multi-threaded versions of both our workload generators, with a thread for the daemons in wg-Hummingbird, and ran experiments similar to the above with one processor running FreeBSD 4.1 with the LinuxThreads library. Table 5 contains a subset of the results of our experiments using four threads and two disks. In the multi-threaded Squid workload generator, two cache root directories are used, each residing on a disk. The experiment run times for the experiments in Table 5 are consistently longer than the corresponding cases in Table 3 due to an uneven distribution of the files on the 2 disks; we observed a bursty access pattern to each disk in the iostat log. Queuing in the device driver results in longer FS read/write times too. However, the Hummingbird throughput is again 2–4 times greater than for UFS, UFS-async, and UFS-soft.

Recovery performance. Recovering the hot clusters and the deletion log is quick; it takes about 30 seconds to read 2500 hot clusters and a log of 3000 deletion entries into memory. Thus, after a system crash with an 18 GB disk, Hummingbird can start to service requests in 30 seconds, while UFS will take more than 20 minutes to perform the necessary fsck before requests can be serviced. While rebuilding the in-memory meta-data (Step 6 in Section 3.4), the FS read time increased a small amount (e.g., from 2.03 ms to 2.50 ms for an 8 GB disk).

Journalling file systems will recover integrity quickly compared with traditional UFSs due to log-based recovery. However, journalling alone will not fetch hot data from disk as part of the recovery process.

5 Polygraph results

A web cache benchmarking package called Web Polygraph [18] is used for comparison of caching
Table 3: Comparing Hummingbird with UFS, UFS-async, and UFS-soft when all files are cached.
file        disk  main         proxy     FS read    FS write   # of disk   mean disk       experiment
system      size  memory (MB)  hit rate  time (ms)  time (ms)  I/Os        I/O time (ms)   run time (s)
Hummingbird 4GB 256 0.60 1.68 0.39 1,349,175 4.17 6,362
UFS-async 4GB 128+128 0.62 6.61 2.88 5,134,380 4.72 25,510
UFS-soft 4GB 128+128 0.62 5.81 22.91 11,858,100 5.02 59,475
UFS 4GB 128+128 0.62 6.13 22.67 11,283,060 5.05 59,807
Hummingbird 4GB 1024 0.61 0.88 0.52 630,362 5.62 6,030
UFS-async 4GB 512+512 0.62 3.67 2.70 3,919,620 3.98 16,384
UFS-soft 4GB 512+512 0.62 3.51 22.89 10,868,700 4.84 52,377
UFS 4GB 512+512 0.62 4.24 20.23 10,115,400 4.69 49,749
Hummingbird 8GB 256 0.64 2.17 0.40 1,464,727 5.03 7,923
UFS-async 8GB 128+128 0.66 6.42 2.30 5,017,920 4.67 24,657
UFS-soft 8GB 128+128 0.66 6.14 19.31 10,461,841 4.98 51,797
UFS 8GB 128+128 0.66 7.37 16.42 9,954,540 4.89 51,062
Hummingbird 8GB 1024 0.62 1.18 0.51 695,771 6.41 6,802
UFS-async 8GB 512+512 0.66 4.66 1.90 4,016,400 4.32 18,240
UFS-soft 8GB 512+512 0.67 3.69 15.71 8,158,080 4.61 37,097
UFS 8GB 512+512 0.66 4.24 18.90 8,894,760 4.82 45,044
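Since every workload generator replays the same 4-day log, relative throughput can be read directly off the experiment run time column of Table 3. The following illustrative check uses the numbers copied from the 4 GB disk / 1024 MB memory rows; the script itself is not part of the paper.

```python
# Run times (s) from the 4 GB / 1024 MB rows of Table 3. Because every
# generator processes the same number of requests, the ratio of run times
# equals the ratio of request throughputs.
run_time_s = {
    "Hummingbird": 6_030,
    "UFS-async": 16_384,
    "UFS-soft": 52_377,
    "UFS": 49_749,
}

def speedup_vs_hummingbird(system):
    """How many times faster Hummingbird completed the same log replay."""
    return run_time_s[system] / run_time_s["Hummingbird"]

# On this configuration, UFS-async takes roughly 2.7 times as long as
# Hummingbird, i.e., Hummingbird's throughput is roughly 2.7x higher.
```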
Table 4: Comparing Hummingbird with XFS and EFS with 256 MB of main memory when all files are cached.
file    disk   cache      proxy     FS read    FS write   # of disk   mean disk       experiment
system  size   size (MB)  hit rate  time (ms)  time (ms)  I/Os        I/O time (ms)   run time (s)
Hum 4GB 256 0.62 2.82 0.20 922,871 9.85 9,926
EFS 4GB 50+206 0.62 14.79 46.38 10,784,969 10.99 128,878
XFS 4GB 50+206 0.62 14.97 16.56 10,870,353 6.67 75,565
EFS 4GB 128+128 0.62 14.08 48.33 10,540,778 11.37 130,359
XFS 4GB 128+128 0.62 14.90 13.95 10,115,369 6.72 70,746
Hum 9GB 256 0.66 3.30 0.21 1,028,922 10.77 11,861
XFS 9GB 50+206 0.66 15.47 12.39 9,558,954 7.02 69,929
XFS 9GB 128+128 0.66 15.65 8.80 8,737,878 7.12 64,726
Hum 15 GB 256 0.67 3.71 0.21 1,049,121 11.92 13,313
XFS 15 GB 50+206 0.67 16.71 5.73 8,103,576 7.60 63,371
XFS 15 GB 128+128 0.67 15.91 5.79 7,841,184 7.55 60,943
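A rough way to see where the run time in Table 4 goes is to multiply the number of disk I/Os by the mean disk I/O time, estimating the total time the disk was busy. This is an illustrative back-of-the-envelope check, not a computation from the paper.

```python
# Estimated disk-busy seconds = (# of disk I/Os) * (mean I/O time in ms) / 1000,
# using the 4 GB rows of Table 4 (Hummingbird vs. XFS with the 50+206 split).
table4_rows = {
    # system: (disk_ios, mean_io_ms, run_time_s)
    "Hummingbird": (922_871, 9.85, 9_926),
    "XFS": (10_870_353, 6.67, 75_565),
}

def disk_busy_seconds(system):
    ios, mean_ms, _ = table4_rows[system]
    return ios * mean_ms / 1000.0

# Hummingbird keeps the disk busy for roughly 9,100 s of its 9,926 s run,
# while XFS needs roughly 72,500 s of disk time: about 8x more in total,
# even though each individual XFS I/O is cheaper.
```

This is consistent with the text's explanation that Hummingbird's higher throughput comes from spending much less total time in disk I/O.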
Table 5: Comparing Hummingbird with UFS, UFS-async, and UFS-soft with 256 MB of main memory when all files are cached, with 2 disks and 4 threads in the workload generator.
file        disk  proxy     FS read    FS write   # of disk   mean disk       experiment
system      size  hit rate  time (ms)  time (ms)  I/Os        I/O time (ms)   run time (s)
Hummingbird 4GB 0.64 4.86 1.34 1,478,005 11.68 11,743
UFS-async 4GB 0.66 5.92 6.39 5,638,201 10.72 30,427
UFS-soft 4GB 0.66 3.25 20.42 9,405,301 9.76 45,655
UFS 4GB 0.66 6.71 15.86 10,771,201 9.05 48,577
Hummingbird 8GB 0.67 6.12 1.40 1,556,169 14.08 12,997
UFS-async 8GB 0.67 6.19 5.18 5,466,061 10.67 29,354
UFS-soft 8GB 0.67 6.04 14.67 8,678,761 10.52 44,622
UFS 8GB 0.67 6.78 9.73 8,903,641 8.74 38,464
web proxies. The clients and servers are simulated; the client workload parameters, such as hit ratio, cacheability, and response sizes, can be specified, as can server-side delays. We used the PolyMix-2 traffic model to compare Apache [25] using UFS and a slightly modified version of Apache using Hummingbird without collocate_files() calls. Due to space considerations of this paper, we only briefly discuss our results.

We used one of our FreeBSD Pentium IIIs for the proxy. We used a number of different client request rates; the following results are for 8 requests/second. We found that the mean response time for hits in the proxy is 14 times smaller with Hummingbird than with UFS, and the median response time for hits is 20 times smaller. The improvement in mean and median response times for proxy misses was much smaller, as expected; Hummingbird is 20% faster than UFS. Since Hummingbird serves proxy hits faster, its request queue is shorter, which, in turn, shortens the queue time of proxy misses.

6 Related work

Related work falls into two categories: first, analyses of traditional UFS-based systems and ways to beat their performance limitations, and second, analyses of the behavior of web proxies and how they can better use the underlying I/O and file systems.

The first set of research extends back to the original FFS work of McKusick et al. [13], which addressed the limitations of the System V file system by introducing larger block sizes, fragments, and cylinder groups. With increasing memory and buffer cache sizes, UNIX file systems were able to satisfy more reads out of memory. The FFS clustering work of McVoy and Kleiman [14] sought to improve write times by lazily writing the data to disk in contiguous extents called clusters. LFS [20] sought to improve write times by packing dirty file blocks together and writing to an on-disk log in large extents called segments. The LFS approach necessitates a cleaner daemon to coalesce live data and free on-disk segments. As well, new on-disk structures are required. Work in soft updates [6] and journalling [7, 4] has sought to alleviate the performance limitations due to synchronous meta-data operations, such as file create or delete, which must modify file system structures in a specified order. Soft updates maintains dependency information in kernel memory to order disk updates. Journalling systems write meta-data updates to an auxiliary log using the write-ahead logging protocol. This differs from LFS, in which the log contains all data, including meta-data. LFS also addresses the meta-data update problem by ordering updates within segments.

The Bullet server [26, 24] is the file system for Amoeba, a distributed operating system. The Bullet service supports entire-file operations to read, create, and delete files. All files are immutable. Each file is stored contiguously, both on disk and in memory. Even though the file system API is similar to Hummingbird's, the Bullet service does not perform clustering of files together, so it would not have the same type of performance improvement that Hummingbird has for a caching web proxy workload.

Kaashoek et al. [9] approach high performance through developing server operating systems, where a server operating system is a set of abstractions and runtime support for specialized, high-performance server applications. Their implementation of the Cheetah web server is similar to Hummingbird in one way: collocating an HTML page and its images on disk and reading them from disk as a unit. Web servers' data storage is represented naturally by the UFS file hierarchy. This is not true for caching web proxies, as discussed in Section 2.2.

CacheFlow [2] builds a cache operating system called CacheOS with an optimized object storage system which minimizes the number of disk seek and I/O operations per object. Unfortunately, details of the object storage are not public. The Network Appliance filer [8] is a prime example of a combination of an operating system and a specialized file system (WAFL) inside a storage appliance. Novell [15] has developed the Cache Object Store (COS), which they state is 10 times more efficient than typical file systems; few details on the design are available. The COS prefetches the components for a page when the page is requested, leading us to believe that the components are not stored contiguously as they are in Hummingbird.

Rousskov and Soloviev [21] studied the performance of Squid and its use of the file system. Markatos et al. [12] present methods for web proxies to work around costly file system file opens, closes, and deletes. One of their methods, LAZY-READS, gathers read requests n-at-a-time and issues them all at the same time to the disk; results are presented for n of 10. This is similar to our clustering of locality sets, since a read for a cluster will, on average, access 8 files. We feel that Hummingbird is a more general solution to decreasing the effect of costly file system operations on a web proxy.

Maltzahn et al. [10] compared the disk I/O of Apache and Squid and concluded that they were remarkably similar. In a later paper [11], they simulated the operation of several techniques for modifying Squid, one of which was to use a memory-mapped interface to access small files. Other techniques improved the locality of related files based on domain names. This paper reported a reduction of up to 70% in the number of disk operations relative to unmodified Squid. An inherent problem with using one memory-mapped file to access all small objects is that it cannot scale to handle a very large number of objects. Like Hummingbird, using memory-mapped files requires modification to the proxy code.

Pai et al. [17] developed a kernel I/O system called IO-lite to permit sharing of "buffer aggregates" between multiple applications and kernel subsystems. This system solves the multiple buffering problem, but, like Hummingbird, applications must use a different interface that supersedes the traditional UNIX read and write.

7 Conclusions

This paper explores file system support for applications which can take advantage of the performance/persistence tradeoff. Such a file system is especially useful for local caching of data, where permanent storage of the data is available elsewhere. A caching web proxy is the prime example of an application that may benefit from this file system.

We implemented Hummingbird, a light-weight file system that is designed to support caching web proxies. Hummingbird has two distinguishing features: it stores its meta-data in memory, and it stores groups of related objects (e.g., an HTML page and its embedded images) together on the disk. By setting the tunable parameters to achieve persistence, Hummingbird can also be used to improve the performance of web servers, which have similar reference locality to proxies. Our results are very promising; Hummingbird's throughput is 2.3–4.0 times larger than a simulated version of Squid running UFS mounted asynchronously on FreeBSD, 5.4–9.4 times larger than a simulated version of Squid running UFS mounted synchronously on FreeBSD, 5.6–8.4 times larger than a simulated version of Squid running UFS with soft updates on FreeBSD, and 5.4–13 times larger than XFS and EFS on IRIX. The Web Polygraph environment confirmed these improvements for the response times for proxy hits.

Additional information about Hummingbird is available at http://www.bell-labs.com/project/hummingbird/.

Acknowledgements. Many thanks to Arthur Goldberg for giving us copies of the ISP log files that we studied. WeeTeck Ng helped us with preparing our experiment environment. Bruce Hillyer and Philip Bohannon were very helpful in initial design discussions. Hao Sun performed the Polygraph experiments.

References

[1] Baker, M. G., Hartman, J. H., Kupfer, M. D., Shirriff, K. W., and Ousterhout, J. K. Measurements of a distributed file system. In Proceedings of the Thirteenth ACM Symposium on Operating Systems Principles (Asilomar (Pacific Grove), CA, Oct. 1991), ACM Press, pp. 198–212.

[2] Network cache performance measurements. CacheFlow White Papers Version 2.1, CacheFlow, Sept. 1998.

[3] Cao, P., and Irani, S. Cost-aware WWW proxy caching algorithms. In Proceedings of the USENIX Symposium on Internet Technologies and Systems (Monterey, CA, Dec. 1997), pp. 193–206.

[4] Chutani, S., Anderson, O. T., Kazar, M. L., Leverett, B. W., Mason, W. A., and Sidebotham, R. N. The Episode file system. In Proceedings of the Winter 1992 USENIX Conference (San Francisco, CA, Winter 1992), pp. 43–60.

[5] Gabber, E., and Shriver, E. Let's put NetApp and CacheFlow out of business! In Proceedings of the 9th ACM SIGOPS European Workshop (Kolding, Denmark, Sept. 2000), pp. 85–90.

[6] Ganger, G. R., and Patt, Y. N. Metadata update performance in file systems. In Proceedings of the First USENIX Symposium on Operating Systems Design and Implementation (OSDI) (Monterey, CA, Nov. 1994), pp. 49–60.

[7] Hagmann, R. Reimplementing the Cedar file system using logging and group commit. In Proceedings of the Eleventh ACM Symposium on Operating Systems Principles (Austin, TX, Nov. 1987), pp. 155–162. In ACM Operating Systems Review 21:5.

[8] Hitz, D., Lau, J., and Malcolm, M. File system design for an NFS file server appliance. In Proceedings of the USENIX 1994 Winter Technical Conference (San Francisco, CA, Jan. 1994), pp. 235–246.

[9] Kaashoek, M. F., Engler, D. R., Ganger, G. R., and Wallach, D. A. Server operating systems. In Proceedings of the 1996 SIGOPS European Workshop (Connemara, Ireland, Sept. 1996), pp. 141–148.

[10] Maltzahn, C., Richardson, K., and Grunwald, D. Performance issues of enterprise level proxies. In Proceedings of the ACM Sigmetrics Conference on Measurement and Modeling of Computer Systems (Sigmetrics '97) (Seattle, WA, June 1997), pp. 13–23.

[11] Maltzahn, C., Richardson, K. J., and Grunwald, D. Reducing the disk I/O of web proxy server caches. In Proceedings of the 1999 USENIX Annual Technical Conference (Monterey, CA, June 1999), pp. 225–238.

[12] Markatos, E. P., Katevenis, M. G., Pnevmatikatos, D., and Flouris, M. Secondary storage management for web proxies. In Proceedings of the 2nd USENIX Symposium on Internet Technologies and Systems (Boulder, CO, Oct. 1999), pp. 93–114.

[13] McKusick, M. K., Joy, W. N., Leffler, S. J., and Fabry, R. S. A fast file system for UNIX. ACM Transactions on Computer Systems 2, 3 (Aug. 1984), 181–197.

[14] McVoy, L. W., and Kleiman, S. R. Extent-like performance from a UNIX file system. In Proceedings of the Winter 1991 USENIX Conference (Dallas, TX, Jan. 1991), pp. 33–43.

[15] The Novell ICS advantage: Competitive white paper. Tech. rep. Available at http://www.novell.com/advantage/nics/nics-compwp.html.

[16] Ousterhout, J. K., Da Costa, H., Harrison, D., Kunze, J. A., Kupfer, M., and Thompson, J. G. A trace-driven analysis of the UNIX 4.2 BSD file system. In Proceedings of the 10th ACM Symposium on Operating Systems Principles (Orcas Island, WA, Dec. 1985), vol. 19, ACM Press, pp. 15–24.

[17] Pai, V. S., Druschel, P., and Zwaenepoel, W. IO-lite: a unified I/O buffering and caching system. In Proceedings of the Third USENIX Symposium on Operating System Design and Implementation (OSDI '99) (New Orleans, LA, Feb. 1999), pp. 15–28.

[18] Polyteam. Web Polygraph site. http://polygraph.ircache.net/.

[19] Roselli, D., Lorch, J. R., and Anderson, T. E. A comparison of file system workloads. In Proceedings of the 2000 USENIX Annual Technical Conference (USENIX-00) (San Diego, CA, June 2000), pp. 41–54.

[20] Rosenblum, M., and Ousterhout, J. K. The design and implementation of a log-structured file system. ACM Transactions on Computer Systems 10, 1 (Feb. 1992), 26–52.

[21] Rousskov, A., and Soloviev, V. A performance study of the Squid proxy on HTTP/1.0. World-Wide Web Journal, Special Edition on WWW Characterization and Performance Evaluation 2, 1–2 (1999).

[22] Seltzer, M. I., Ganger, G. R., McKusick, M. K., Smith, K. A., Soules, C. A. N., and Stein, C. A. Journaling versus soft updates: asynchronous meta-data protection in file systems. In Proceedings of the 2000 USENIX Annual Technical Conference (USENIX-00) (San Diego, CA, June 2000), pp. 71–84.

[23] Squid users mailing list archive, 1999. http://squid.nlanr.net/mail-archive/squid-users/.

[24] Tanenbaum, A. S., van Renesse, R., van Staveren, H., Sharp, G. J., Mullender, S. J., Jansen, J., and van Rossum, G. Experiences with the Amoeba distributed operating system. Communications of the ACM 33 (Dec. 1990), 46–63.

[25] The Apache Software Foundation. Apache HTTP server project. http://www.apache.org/httpd.html.

[26] van Renesse, R., Tanenbaum, A. S., and Wilschut, A. N. The design of a high-performance file server. In Proceedings of the Ninth International Conference on Distributed Computing Systems (ICDCS 1989) (Newport Beach, CA, June 1989), IEEE Computer Society Press, pp. 22–27.