Atropos: a Disk Array Volume Manager for Orchestrated Use of Disks

Appears in Proceedings of the 3rd USENIX Conference on File and Storage Technologies (FAST’04). San Francisco, CA. March 2004.

Jiri Schindler,* Steven W. Schlosser, Minglong Shao, Anastassia Ailamaki, Gregory R. Ganger
Carnegie Mellon University

* Now with EMC Corporation.

Abstract

The Atropos logical volume manager allows applications to exploit characteristics of its underlying collection of disks. It stripes data in track-sized units and explicitly exposes the boundaries, allowing applications to maximize efficiency for sequential access patterns even when they share the array. Further, it supports efficient diagonal access to blocks on adjacent tracks, allowing applications to orchestrate the layout and access to two-dimensional data structures, such as relational database tables, to maximize performance for both row-based and column-based accesses.

Figure 1: Atropos logical volume manager architecture. Atropos exploits disk characteristics (arrow 1), automatically extracted from disk drives, to construct a new data organization. It exposes high-level parameters that allow applications to directly take advantage of this data organization for efficient access to one- or two-dimensional data structures (arrow 2).

1 Introduction

Many storage-intensive applications, most notably database systems and scientific computations, have some control over their access patterns. Wanting the best performance possible, they choose the data layout and access patterns they believe will maximize I/O efficiency. Currently, however, their decisions are based on manual tuning knobs and crude rules of thumb. Application writers know that large I/Os and sequential patterns are best, but are otherwise disconnected from the underlying reality. The result is often unnecessary complexity and inefficiency on both sides of the interface.

Today’s storage interfaces (e.g., SCSI and ATA) hide almost everything about underlying components, forcing applications that want top performance to guess and assume [7, 8]. Of course, arguing to expose more information highlights a tension between the amount of information exposed and the added complexity in the interface and implementations. The current storage interface, however, has remained relatively unchanged for 15 years, despite the shift from (relatively) simple disk drives to large disk array systems with logical volume managers (LVMs). The same information gap exists inside disk array systems: although their LVMs sit below a host’s storage interface, most do not exploit device-specific features of their component disks.

This paper describes a logical volume manager, called Atropos (see Figure 1), that exploits information about its component disks and exposes high-level information about its data organization. With a new data organization and minor extensions to today’s storage interface, it accomplishes two significant ends. First, Atropos exploits automatically-extracted knowledge of disk track boundaries, using them as its stripe unit boundaries. By also exposing these boundaries explicitly, it allows applications to use previously proposed “track-aligned extents” (traxtents), which provide substantial benefits for mid-sized segments of blocks and for streaming patterns interleaved with other I/O activity [22].

Second, Atropos uses and exposes a data organization that lets applications go beyond the “only one dimension can be efficient” assumption associated with today’s linear storage address space. In particular, two-dimensional data structures (e.g., database tables) can be laid out for almost maximally efficient access in both row- and column-orders, eliminating a trade-off [15] currently faced by database storage managers. Atropos enables this by exploiting automatically-extracted knowledge of track/head switch delays to support semi-sequential access: diagonal access to ranges of blocks (one range per track) across a sequence of tracks.

In this manner, a relational database table can be laid out such that scanning a single column occurs at streaming bandwidth (for the full array of disks), and reading a single row costs only 16%–38% more than if it had been the optimized order. We have implemented Atropos as a host-based LVM, and we evaluate it with both database workload experiments (TPC-H) and analytic models. Because Atropos exposes its key parameters explicitly, these performance benefits can be realized with no manual tuning of storage-related application knobs.

The rest of the paper is organized as follows. Section 2 discusses current logical volume managers and the need for Atropos. Section 3 describes the design and implementation of Atropos. Section 4 describes how Atropos is used by a database storage manager. Section 5 evaluates Atropos and its value for database storage management. Section 6 discusses related work.

2 Background

This section overviews the design of current disk array LVMs, which do not exploit the performance benefits of disk-specific characteristics. It highlights the features of the Atropos LVM, which addresses shortcomings of current LVMs, and describes how Atropos supports efficient access in both column- and row-major orders to applications accessing two-dimensional data structures.

2.1 Conventional LVM design

Current disk array LVMs do not sufficiently exploit or expose the unique performance characteristics of their individual disk drives. Since an LVM sits below the host’s storage interface, it could internally exploit disk-specific features without the host being aware beyond possibly improved performance. Instead, most use data distribution schemes designed and configured independently of the underlying devices. Many stripe data across their disks, assigning fixed-sized sets of blocks to their disks in a round-robin fashion; others use more dynamic assignment schemes for their fixed-size units. With a well-chosen unit size, disk striping can provide effective load balancing of small I/Os and parallel transfers for large I/Os [13, 16, 18].

A typical choice for the stripe unit size is 32–64 KB. For example, EMC’s Symmetrix 8000 spreads and replicates 32 KB chunks across disks [10]. HP’s AutoRAID [25] spreads 64 KB “relocation blocks” across disks. These values conform to the conclusions of early studies [4] of stripe unit size trade-offs, which showed that a unit size roughly matching a single disk track (32–64 KB at the time of these systems’ first implementations) was a good rule-of-thumb. Interestingly, many such systems seem not to track the growing track size over time (200–350 KB for 2002 disks), perhaps because the values are hard-coded into the design. As a consequence, medium- to large-sized requests to the array result in suboptimal performance due to small inefficient disk accesses.

2.2 Exploiting disk characteristics

Track-sized stripe units: Atropos matches the stripe unit size to the exact track size of the disks in the volume. In addition to conforming to the rule-of-thumb as disk technology progresses, this choice allows applications (and the array itself [11]) to utilize track-based accesses: accesses aligned and sized for one track. Recent research [22] has shown that doing so increases disk efficiency by up to 50% for streaming applications that share the disk system with other activity and for components (e.g., log-structured file systems [17]) that utilize medium-sized segments. In fact, track-based access provides almost the same disk efficiency for such applications as would sequential streaming.

The improvement results from two disk-level details. First, firmware support for zero-latency access eliminates rotational latency for full-track accesses; the data of one track can be read in one revolution regardless of the initial rotational offset after the seek. Second, no head switch is involved in reading a single track. Combined, these positioning delays represent over a third of the total service time for non-aligned, approximately track-sized accesses. Using small stripe unit sizes, as do the array controllers mentioned above, increases the proportion of time spent on these overheads.

Atropos uses automated extraction methods described in previous work [21, 22] to match stripe units to disk track boundaries. As detailed in Section 3.3, Atropos also deals with multi-zoned disk geometries, whereby tracks at different radial distances have different numbers of sectors that are not multiples of any useful block size.

Efficient access to non-contiguous blocks: In addition to exploiting disk-specific information to determine its stripe unit size, Atropos exploits disk-specific information to support efficient access to data across several stripe units mapped to the same disk. This access pattern, called semi-sequential, reads some data from each of several tracks such that, after the initial seek, no positioning delays other than track switches are incurred. Such access is appropriate for two-dimensional data structures, allowing efficient access in both row- and column-major order.

In order to arrange data for efficient semi-sequential access, Atropos must know the track switch time as well as the track sizes. Carefully deciding how much data to access on each track, before moving to the next, allows Atropos to access data from several tracks in one full revolution by taking advantage of the Shortest-Positioning-Time-First (SPTF) [12, 23] request scheduler built into disk firmware. Given the set of accesses to the different tracks, the scheduler can arrange them to ensure efficient execution by minimizing the total positioning time. If the sum of the data transfer times and the track switch times equals the time for one rotation, the scheduler will service them in an order that largely eliminates rotational latency (similar to the zero-latency feature for single track access). The result is that semi-sequential accesses are much more efficient than a like number of random or unorchestrated accesses.
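The one-rotation condition for semi-sequential access can be illustrated with a little arithmetic. The following Python sketch is not taken from the Atropos implementation; the function name `semi_sequential_plan`, its parameters, and the example disk values are hypothetical. It assumes a constant track switch time, a single zone with a fixed number of sectors per track, and rotational positions measured in sectors from the point where the first access starts. Under those assumptions it picks an equal number of sectors per track so that transfers plus track switches fit within one revolution.

```python
def semi_sequential_plan(rotation_us, sectors_per_track, switch_us, tracks):
    """Plan a semi-sequential access over `tracks` adjacent tracks.

    Returns a list of (track_index, start_position, sector_count) tuples,
    where start_position is a rotational offset in sectors. All names and
    the model itself are illustrative assumptions, not the Atropos API.
    """
    if tracks < 1:
        raise ValueError("need at least one track")

    # Sectors that pass under the head during one track switch, rounded up
    # with integer arithmetic so the head is never asked to arrive early.
    switch_sectors = -(-(switch_us * sectors_per_track) // rotation_us)

    # One rotation is worth `sectors_per_track` sector times; subtract the
    # switches between consecutive tracks to get the transfer budget.
    transfer_sectors = sectors_per_track - (tracks - 1) * switch_sectors
    if transfer_sectors < tracks:
        raise ValueError("track switches alone exceed one rotation")

    per_track = transfer_sectors // tracks
    # Each range starts where the previous one ended plus one switch,
    # which produces the "diagonal" pattern across tracks.
    stride = per_track + switch_sectors
    return [(t, t * stride, per_track) for t in range(tracks)]


# Hypothetical disk: 6 ms rotation, 600 sectors per track, 1 ms switch.
plan = semi_sequential_plan(6000, 600, 1000, 4)
print(plan)  # [(0, 0, 75), (1, 175, 75), (2, 350, 75), (3, 525, 75)]
```

In this example the four transfers (4 × 75 sectors) and three switches (3 × 100 sector times) add up to exactly 600 sector times, that is, one rotation. A real layout would also have to account for zoned geometries and the disk's own track skew, which this sketch ignores.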