Block-level Virtualization: How far can we go?

Michail D. Flouris†, Stergios V. Anastasiadis∗ and Angelos Bilas‡
Institute of Computer Science (ICS), Foundation for Research and Technology - Hellas,
P.O. Box 1385, Heraklion, GR 71110, Greece
{flouris, stergios, bilas}@ics.forth.gr

† Also a graduate student at the Department of Computer Science, University of Toronto, Toronto, Ontario M5S 3G4, Canada.
∗ Also with the Department of Electronic and Computer Engineering, Technical University of Crete, Chania, GR 73100, Greece.
‡ Also with the Department of Computer Science, University of Crete, P.O. Box 2208, Heraklion, GR 71409, Greece.

Abstract

In this paper, we present our vision on building large-scale storage systems from commodity components in a data center. For reasons of efficiency and flexibility, we advocate maintaining sophisticated data management functions behind a block-based interface. We anticipate that such an interface can be customized to meet the diverse needs of end-users and applications through the extensible storage system architecture that we propose.

1. Introduction

Data storage is becoming increasingly important as larger amounts of data assets need to be stored for archival or online-processing purposes. Increasing requirements for improving storage efficiency and cost-effectiveness introduce the need for scalable storage systems in a data center that consolidate system resources into a single system image. Consolidation enables continuous system operation uninterrupted by device faults, easier storage management, and ultimately lower cost of purchase and maintenance.

Our target is primarily a data-center environment, where application servers and user workstations are able to access the storage system through potentially different network paths and protocols. In this environment, although storage consolidation has potential for lower costs and more efficient storage management, its success depends on the ability of the system architecture to face three main challenges:

1. The first significant challenge is the platform, that is, how to provide a scalable system with minimal hardware costs.

2. The second crucial challenge is the software infrastructure required for (i) virtualizing, (ii) sharing, and (iii) managing the physical storage in a data center. This includes the virtualization capability to mask applications from each other and to facilitate storage management tasks. It also includes the significant objective of lowering administration costs through automated administration. Furthermore, the software infrastructure should provide monitoring and dynamic reconfiguration mechanisms to allow the definition and implementation of high-level quality-of-service policies.

3. Finally, the third challenge is efficient storage sharing between remote data centers, enabling global-scale storage distribution.

Next we explore each of these challenges in more detail.

1.1. Scalable, Low-cost Platform

Today, commercial storage systems mostly use expensive custom-made hardware components and proprietary protocols. We advocate building scalable, low-cost storage systems from storage clusters composed of common off-the-shelf components. Current commodity computers and networks have (or will soon have) significant computing power and network performance, at a fraction of the cost of custom storage-specific hardware (controllers or networks). Using such components, we can build low-cost storage clusters with enough capacity and performance to meet practically any storage needs, and scale them economically. This idea is not new. It has been proposed and predicted a few years ago by several researchers and vendors [8, 9]. We think that the necessary commodity technologies are maturing now with the emergence of new standards, protocols and interconnects, such as Serial ATA, PCI-X/Express/AS and iSCSI. For the rest of this discussion, we assume that future storage will be based on clusters built of commodity PCs (e.g. x86, SATA disks) and commodity interconnects (e.g. gigabit Ethernet, PCI-Express/AS) accessible through standard protocols (e.g. SCSI).

A commodity PC with PCI-X or PCI-Express can currently accommodate at least 2 TBytes of storage using commodity I/O controllers with 8 or 16 SATA disks each. The PCI-X bus can support aggregate throughput in excess of 800 MBytes/sec to the disks, while gigabit Ethernet NICs and switches are an inexpensive means for interconnecting the storage nodes. The nodes can also run an existing high-performance operating system, such as Linux or FreeBSD, at very low cost. Similarly, interconnect technology that will allow efficient scaling is currently being developed (e.g. PCI-Express/AS). Thus, we assume that the commodity components needed to build such a storage platform are or will be available at a low cost. We believe that the next missing component for building cost-effective storage systems is addressing challenge 2. This will provide the software infrastructure that will run on top, integrating storage resources into a system that provides flexibility, sharing, and automated management. This is the main motivation for our work.

1.2. Software Infrastructure

A consolidated storage system requires increased flexibility to serve multiple applications and satisfy diverse needs in both storage management and data access. Moreover, as we accumulate massive amounts of storage resources behind a single system image, the automation of administration tasks becomes crucial for sustaining efficient and cost-effective operation. Continuous, unobtrusive measurement of device efficiency and user-perceived performance should drive data reorganization and device reconfiguration towards load-balanced system operation. Finally, policy-based data administration should meet preagreed performance objectives for individual applications. These are the challenges for the software infrastructure in a data center. We argue that such requirements can be satisfied through a virtualization infrastructure.

The term "virtualization" has been used to describe mainly two concepts: indirection and sharing. The first refers to the addition of an indirection layer between the physical resources and the logical devices accessed by the applications. Such a layer allows administrators to create, and applications to access, various types of virtual volumes that are mapped to physical devices, while offering higher-level semantics through multiple layers of mappings. This kind of virtualization has several advantages: (i) It enables many useful management and reconfiguration operations, such as non-disruptive data reorganization and migration during system operation. (ii) It adds flexibility to the system configuration, since physical storage space can be arbitrarily mapped to virtual volumes. For example, a virtual volume can be defined as any combination of basic operators, such as aggregation, partitioning, striping or mirroring. (iii) It allows the implementation of useful functionality, such as fault tolerance with RAID levels, backup with volume snapshots, or encryption with data filters. (iv) It allows extensibility, that is, the addition of new functionality to the system. This is achieved by implementing new virtualization modules with the desired functions and incrementally adding them to the system.

Currently, the "indirection" virtualization mechanisms of most storage systems and products [11, 13, 14] support a set of predefined functions and leave the administrator freedom only in the way of combining them. That is, they provide configuration flexibility, but very little functional flexibility. For example, predefined virtualization semantics include virtual volumes mapped to an aggregation of disks or to RAID levels. In this category belong both research prototypes of volume managers [2, 5, 7, 12, 19] as well as commercial products [3, 4, 10, 21, 22]. In all these cases the storage administrator can switch various features on or off at the volume level. However, there is no support for extending the I/O protocol stack with new functionality, such as snapshots, encryption, compression, virus scanning, application-specific data placement or content hash computation. Moreover, it is not possible to combine such new functions in a flexible manner.

The second notion of virtualization refers to the ability to share the storage resources across many nodes and multiple applications without compromising the performance of each application in any significant way. Even though some vendors claim that they can support such sharing, efficient virtualization remains an elusive target in large-scale systems because of the software complexity. Maintaining consistency is complex and usually introduces significant overheads in the system.

1.3. Global-scale Sharing

In this area, issues include interoperability, hiding network latencies, and achieving high throughput in networks with large delay-bandwidth products. We consider this area out of the scope of our work, but we believe that the virtualization infrastructure we propose facilitates global storage connectivity and interoperability between data centers.

The rest of this paper is organized as follows. Section 2 presents our position on virtualization issues, while Section 3 discusses our approach to block-level virtualization. Finally, Section 4 discusses related work and Section 5 draws our conclusions.

2. Our Approach

Our goal in this work is to address the software infrastructure challenges by exploring storage virtualization solutions. The term virtualization is used here to signify both an indirection layer between physical resources and virtual volumes, and resource sharing. Next we describe the directions we have taken in our approach.

2.1. Block-level functionality

We advocate block-level virtualization (we use the indirection notion here), as opposed to file-level virtualization, for the following reasons. First, certain functionality, such as compression or encryption, may be simpler and more efficient to provide on unstructured, fixed-size data blocks rather than on variable-size files. Second, storage hardware at the back-end has evolved significantly from simple disks and fixed controllers to powerful storage nodes [1, 8, 9] that offer block-level storage to multiple applications over a storage area network [16, 17]. Block-level virtualization modules can exploit the processing capacity of these storage nodes, where filesystems (running mainly on the application servers and the clients) cannot. Third, storage management (e.g. migrating data, backup, redundancy) at the block level is less complex to implement and easier to administer, because of the unstructured form of data blocks. For these reasons, and with the evolution of storage technology over time, a number of virtualization features, e.g. volume management functions, RAID and snapshots, have moved from higher system layers to the block level.

With the traditional file vs. block abstractions, intelligent control functions remain at higher-level abstractions, such as filesystems or databases, running at the front-end to support application servers and client workstations. In this manner, all the processing power of the storage hardware has to be hidden behind virtual disks for transparency reasons. This imbalance in favor of front-end control leaves the hardware capabilities of the back-end underutilized. On the other hand, offloading functionality to the back-end could free up the processor at the front-end to actually run application code instead of filesystem functions. The concept is similar to using DMA and smart hardware to relieve the main processor from performing memory transfers between main memory and the hardware on the bus (e.g. a RAID controller on the PCI bus). Similarly, we could offload processing and data transfers to the storage back-end. For example, instead of the client filesystem moving data over the network in order to copy a file, it could instruct the back-end to perform the copy locally at the level of data blocks. Thus, we could improve the performance of individual applications and the entire system.
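
To make this offloading idea concrete, the following sketch (plain C, with a made-up command structure; none of the names below come from Violin or from a standard such as SCSI EXTENDED COPY) shows how a front-end could describe a block-range copy and submit it to the back-end as a single control message, so that only the command, and not the data, crosses the network.

```c
/*
 * Hypothetical sketch: offloading a block-range copy to the storage
 * back-end as one control command, instead of reading every block over
 * the network and writing it back. Illustration only, not Violin's API.
 */
#include <stdint.h>
#include <stdio.h>

#define BLOCK_SIZE 4096u

/* Made-up control command describing a copy inside one back-end volume. */
struct copy_cmd {
    uint64_t src_lba;   /* first source block address      */
    uint64_t dst_lba;   /* first destination block address */
    uint64_t nblocks;   /* number of blocks to copy        */
};

/* Stand-in for submitting one control message to the back-end. */
static int backend_submit_copy(const struct copy_cmd *cmd)
{
    /* The back-end performs the copy locally; the data never leaves it. */
    printf("back-end: copy %llu blocks from LBA %llu to LBA %llu\n",
           (unsigned long long)cmd->nblocks,
           (unsigned long long)cmd->src_lba,
           (unsigned long long)cmd->dst_lba);
    return 0;
}

int main(void)
{
    /* Copy a 1 GiB extent: one small message instead of roughly 2 GiB of
     * network traffic (1 GiB read to the client, 1 GiB written back).   */
    struct copy_cmd cmd = {
        .src_lba = 0,
        .dst_lba = 1u << 20,
        .nblocks = (1024ull * 1024 * 1024) / BLOCK_SIZE,
    };
    return backend_submit_copy(&cmd);
}
```

In a real system such a request would also have to be routed through the virtualization hierarchy, so that mappings such as striping or encryption are applied consistently to both the source and the destination ranges.
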

[Figure 1. Violin in the OS. (Diagram: user-space applications issue filesystem, block I/O, and raw I/O requests; in kernel space, the Violin kernel module sits in the block I/O path alongside the buffer cache and the disk device drivers, hosting user-extension modules behind a high-level I/O API and keeping persistent metadata.)]

[Figure 2. The virtual device graph in Violin. (Diagram: two hierarchies, A and B, of internal virtual devices A-M inside the Violin context, mapping source devices with input capacities to sink devices with output capacities.)]

By using block-level virtualization we have to restrict our system to the traditional block-level interface. A higher API, such as objects, may be better in this case and we are considering it for the future. However, we currently aim to explore the limits of the block-level API before dealing with higher abstractions.

3. Block-level Virtualization

Our work addresses block-level virtualization, both for indirection and for sharing purposes. In this section we present our approach to both notions in more detail.

3.1. Indirection Virtualization

3.1.1. Single-storage-node Virtualization

As a building block towards a distributed storage system with flexible and efficient virtualization support, we have designed and implemented Violin [6]. Violin (Virtual I/O Layer INtegrator) is a kernel-level framework for (i) building and (ii) combining block-level virtualization functions. Violin is a virtual I/O framework for commodity storage nodes that replaces the current block-level I/O stack with an improved I/O hierarchy that allows for (i) easy extension of the virtual storage hierarchy with new mechanisms and (ii) flexible combination of these mechanisms to create modular hierarchies with rich semantics. Figure 1 illustrates a high-level view of Violin in the operating system context.

The main contributions of Violin are: (i) it significantly reduces the effort of introducing new functionality in the block I/O stack of a single storage node and (ii) it provides ample configuration flexibility to combine simple virtualization operators into hierarchies with semantics that can satisfy diverse application needs. To achieve configuration flexibility, Violin allows storage administrators to create arbitrary, acyclic graphs of virtual devices, each adding to the functionality of the successor devices in the graph. Figure 2 shows such a device graph, where mappings between higher and lower devices are represented by arrows. Every virtual device's functionality is provided by independent virtualization modules that are linked to the main Violin framework. A linked device graph is called a hierarchy and represents practically a virtualization layer stack. In each hierarchy, the blocks of each virtual device can be mapped in arbitrary ways to the successor devices, enabling advanced storage functions, such as dynamic relocation of blocks.
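
As a rough illustration of such a device graph (using an invented in-memory representation rather than Violin's actual configuration interface or user-level tools), the sketch below aggregates four sink disks under a striping device and layers an encryption device on top, forming a single hierarchy whose top device is the virtual volume that applications see.

```c
/* Hypothetical sketch of a virtualization hierarchy as an acyclic graph
 * of virtual devices. Illustration only; this is not Violin's real API. */
#include <stdio.h>

enum vdev_kind { VDEV_DISK, VDEV_STRIPE, VDEV_MIRROR, VDEV_ENCRYPT };

struct vdev {
    const char      *name;
    enum vdev_kind   kind;
    struct vdev     *succ[8];   /* successor (lower) devices in the graph */
    int              nsucc;
};

/* Add an arrow from an upper device to one of its successors (max 8 here). */
static void link_vdev(struct vdev *upper, struct vdev *lower)
{
    upper->succ[upper->nsucc++] = lower;
}

/* Print the hierarchy top-down, indenting one level per layer. */
static void print_hierarchy(const struct vdev *v, int depth)
{
    printf("%*s%s\n", depth * 2, "", v->name);
    for (int i = 0; i < v->nsucc; i++)
        print_hierarchy(v->succ[i], depth + 1);
}

int main(void)
{
    /* Sink devices: the physical disks. */
    struct vdev d0 = { "disk0", VDEV_DISK }, d1 = { "disk1", VDEV_DISK };
    struct vdev d2 = { "disk2", VDEV_DISK }, d3 = { "disk3", VDEV_DISK };

    /* A striping (RAID-0) device aggregates the four disks... */
    struct vdev stripe = { "stripe0", VDEV_STRIPE };
    link_vdev(&stripe, &d0); link_vdev(&stripe, &d1);
    link_vdev(&stripe, &d2); link_vdev(&stripe, &d3);

    /* ...and an encryption device maps its blocks onto the stripe,
     * forming the virtual volume exposed to applications (the source). */
    struct vdev crypt = { "crypt0", VDEV_ENCRYPT };
    link_vdev(&crypt, &stripe);

    print_hierarchy(&crypt, 0);
    return 0;
}
```

Each node would correspond to an instance of a virtualization module, and the succ pointers play the role of the arrows from higher to lower devices in Figure 2.
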
Violin also provides virtual devices with full access to both the request and completion paths of I/Os, allowing for easy implementation of synchronous and asynchronous I/O. Supporting asynchronous I/O is important for performance reasons, but also raises significant challenges when implemented in real systems. Additionally, Violin deals with metadata persistence for the full storage hierarchy, offloading the related complexity from individual virtual devices. It supports persistent objects for storing virtual device metadata and automatically synchronizes these persistent objects to stable storage, resulting in much less effort for the module developer.

Systems such as Violin can be combined with standard storage access protocols, such as iSCSI, to build large-scale distributed volumes. Figure 3 shows a system with multiple storage nodes that provide a common view of the physical storage in a cluster. We believe that future large-scale storage systems will be built in this manner to satisfy application needs in a cost-effective manner. We have implemented Violin as a block device driver under Linux. To demonstrate the effectiveness of our approach in extending the I/O hierarchy, we implemented various virtual modules as dynamically loadable kernel devices that bind to Violin's API. We also provide simple user-level tools that are able to perform on-line, fine-grain configuration, control, and monitoring of arbitrary hierarchies of instances of these modules.

In [6], we evaluate the effectiveness of our approach in three areas: ease of module development, configuration flexibility, and performance. In the first area, we are able to quickly prototype modules for RAID levels (0, 1 and 5), versioning, partitioning, aggregation, MD5 hashing, migration and encryption. Further examples of useful functionality that can be implemented as modules include all kinds of encryption algorithms, storage virus-scanning modules, online migration and transparent disk layout algorithms (e.g. log-structured allocation or cylinder group placement). In many cases, writing a new module is just a matter of recompiling existing user-level library code. Overall, using Violin encourages the development of simple virtual modules that can later be combined into more complex hierarchies. Regarding configuration flexibility, we are able to easily configure I/O hierarchies that combine the functionality of multiple layers and provide complex high-level semantics that are difficult to achieve otherwise. Finally, we use Postmark and IOmeter to examine the overhead that Violin introduces over traditional block-level I/O hierarchies. We find that, overall, internal modules perform within 10% (in throughput) of their native Linux block-driver counterparts.
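
The module API itself is not spelled out in this paper, so the sketch below invents a minimal one for illustration: a module is a small table of callbacks on the block write (request) and read (completion) paths, which is roughly what a transforming layer such as the encryption or hashing modules mentioned above needs. The structure, the callback names and the toy XOR transform are all hypothetical, not Violin's real interface.

```c
/*
 * Hypothetical sketch of a block-level virtualization module interface:
 * a module exports callbacks that the framework invokes per request.
 * Illustration only; this is not Violin's actual API.
 */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct vmodule {
    const char *name;
    /* Transform a block in place on its way down (write) or up (read). */
    void (*on_write)(uint64_t lba, uint8_t *buf, size_t len);
    void (*on_read)(uint64_t lba, uint8_t *buf, size_t len);
};

/* A toy "encryption" layer: XOR with a fixed key byte (for illustration). */
static void xor_transform(uint64_t lba, uint8_t *buf, size_t len)
{
    (void)lba;                        /* a real cipher would use the address */
    for (size_t i = 0; i < len; i++)
        buf[i] ^= 0x5a;
}

static const struct vmodule xor_module = {
    .name     = "xor-crypt",
    .on_write = xor_transform,        /* applied on the request path    */
    .on_read  = xor_transform,        /* applied on the completion path */
};

int main(void)
{
    uint8_t block[16] = "hello, violin!";

    xor_module.on_write(0, block, sizeof(block));   /* "encrypt" */
    xor_module.on_read(0, block, sizeof(block));    /* "decrypt" */
    printf("round trip: %s\n", (char *)block);
    return 0;
}
```

A real module would additionally handle request splitting, asynchronous completion and its own persistent metadata, which is the kind of boilerplate the framework is meant to absorb.
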

[Figure 3. Violin in a distributed environment. (Diagram: cluster nodes A, B and C, each running Violin over iSCSI/TCP; one node also runs an application and a filesystem on top of Violin.)]

3.1.2. Multi-storage-node Virtualization

Our step towards a scalable virtualization platform is to enhance the Violin framework with the necessary primitives to allow multi-node hierarchies. We find that two primitives are required to achieve this: (i) locking of layer metadata, and (ii) control messaging through the distributed I/O stack.

The single-node Violin I/O stack allows the creation of multi-layered storage volumes. These are exported to other storage nodes through block-level network protocols, such as iSCSI. The nodes can further build I/O stacks on top of the remote storage volumes, creating higher-level volumes as needed. This organization is depicted in Figure 3. We find that data I/O requests are correctly forwarded through this distributed hierarchy by the layers distributed to different nodes. Reliability issues regarding node or device failures are effectively handled by fault-tolerance layers (e.g. RAID levels) that map storage volumes between nodes or devices. The system can thus tolerate storage node failures in the same way that a RAID system tolerates device failures. A remaining issue with layer distribution is the consistency of layer metadata: since a layer can be distributed to two or more nodes (i.e. every node runs a separate instance of the layer on the same volume), each layer has to synchronize its metadata to the storage volume on every metadata operation in order to keep the layer metadata consistent between nodes. Since this can be very expensive in terms of performance, the system minimizes the overhead by supporting metadata locking for layers. Thus, the metadata can be locked and updated asynchronously.

A second issue with distributing the virtualization hierarchy is effectively passing control messages to layers both downstream (from a higher layer to a lower one) and upstream (from a lower layer to a higher one). Storage network protocols such as iSCSI offer some support for transferring control commands between remote volumes through SCSI command blocks (SCBs), but only in the downstream direction. We propose more generic and flexible support for control message transfers through the distributed I/O stack. To maintain compatibility with current standards (e.g. iSCSI), we advocate using the common data I/O mechanisms, combined with virtualization layers, to handle control messages. One approach we are considering for the downstream direction is defining special virtual block addresses in each volume, where the usual I/O operations are translated to control messages. For the upstream direction we are considering other options, such as ticket callbacks from higher to lower layers. The upstream direction, however, is currently considered future work.
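
One possible shape for that downstream encoding is sketched below, with an invented address layout: requests that fall into a small reserved window at the top of a volume's virtual address space are intercepted by the layer and reinterpreted as control messages, while all other addresses follow the normal data path. Because these control messages travel as ordinary block writes, they can pass through protocols such as iSCSI unchanged; the window size, operation codes and names here are hypothetical.

```c
/*
 * Hypothetical sketch: encoding downstream control messages as ordinary
 * writes to a reserved virtual block address range. Illustration only.
 */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define VOL_BLOCKS   (1ull << 32)             /* volume size in blocks      */
#define CTRL_WINDOW  256ull                   /* reserved blocks at the top */
#define CTRL_BASE    (VOL_BLOCKS - CTRL_WINDOW)

enum ctrl_op { CTRL_LOCK_RANGE = 1, CTRL_UNLOCK_RANGE, CTRL_SYNC_METADATA };

/* Dispatch one incoming write: data I/O or control message? */
static void handle_write(uint64_t lba, const void *payload, size_t len)
{
    (void)payload;   /* the write's data would carry the message body */

    if (lba >= CTRL_BASE) {
        /* The offset inside the reserved window selects the operation. */
        enum ctrl_op op = (enum ctrl_op)(lba - CTRL_BASE);
        printf("control message: op=%d, payload of %zu bytes\n", (int)op, len);
    } else {
        printf("data write to block %llu\n", (unsigned long long)lba);
    }
}

int main(void)
{
    uint64_t range[2] = { 1000, 1063 };        /* e.g. blocks to be locked */

    handle_write(42, range, sizeof(range));                          /* data path    */
    handle_write(CTRL_BASE + CTRL_LOCK_RANGE, range, sizeof(range)); /* control path */
    return 0;
}
```

Vendor-specific SCSI command blocks could serve the same downstream purpose; the point of the sketch is only that control traffic can reuse the existing data path.
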
3.2. Sharing Virtualization

The second notion of virtualization concerns storage sharing. Sharing is required at two levels: at the block level, where storage volumes are shared, and at the file level, where applications share files.

3.2.1. Shared Block-level Access

Many block-level applications, such as databases, web caches or video servers using raw disk access, must share concurrent access to volumes in a consolidated storage system. Thus, we need to support concurrent volume sharing. We find that a single primitive is required for this purpose: locking support for data block address ranges on the volumes. Using such locks, applications can synchronize accesses to shared blocks on the volume.

We have designed and are currently implementing this feature in Violin as a virtualization module. The "locking" layer captures lock commands and generates distributed control commands in Violin. These commands are forwarded through the virtualization hierarchy to special "lock server" modules, which serialize and enforce the locks. Lock commands are extensions to the block API, but they are compatible with the block device interface of most operating systems. The SCSI interface also supports custom "command blocks" that can transfer such commands.

3.2.2. Shared File Access

Violin achieves shared block-level access and layer distribution within the storage cluster. However, the most common API for accessing storage is the file interface, which we can offer with a filesystem on top of a Violin shared block volume. Our goal for the filesystem implementation is that the filesystem itself should be as simple as possible, while independent functionality should be moved to the storage back-end. Our approach to the filesystem design is similar to Frangipani [20], where many independent filesystem instances run on a single shared block volume without direct communication among the instances themselves. The filesystem instances share the blocks of the volume as any other block-level application, using locks on the volume's blocks to maintain consistency of the filesystem metadata. This organization initially simplifies the filesystem design. However, in contrast to previous work, our goal is to simplify filesystem design and implementation by reconsidering the division of functionality between the filesystem and the block-level layer.

We find that the block-level layer can be enhanced with three types of support: (a) block allocation, (b) block-range locking, and (c) metadata consistency. The free-block allocation and the lock services greatly reduce the filesystem complexity in comparison to other distributed filesystems. The filesystem code is only aware of a large shared volume and is not aware of other instances running on it. It simply locks the blocks it needs for atomic metadata updates, and uses the free-block allocation commands to allocate or free data blocks on the shared volume.
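
The sketch below illustrates what such an enhanced block-level interface might look like to a filesystem instance, and how it could be used for an atomic metadata update: lock the affected metadata range, obtain a free block from the volume's allocation service, write the update, then unlock. The function names and signatures are invented placeholders, not the actual Violin extensions.

```c
/*
 * Hypothetical sketch of the block-level support a simple pass-through
 * filesystem could rely on: block-range locking plus free-block allocation.
 * Names and signatures are illustrative, not the actual Violin interface.
 */
#include <stdint.h>
#include <stdio.h>

/* --- services assumed to be offered by the shared block volume --- */
static int vol_lock_range(uint64_t lba, uint64_t nblocks)
{
    printf("lock   [%llu, +%llu)\n", (unsigned long long)lba,
           (unsigned long long)nblocks);
    return 0;                   /* would block until the range is granted */
}

static void vol_unlock_range(uint64_t lba, uint64_t nblocks)
{
    printf("unlock [%llu, +%llu)\n", (unsigned long long)lba,
           (unsigned long long)nblocks);
}

static uint64_t vol_alloc_block(void)
{
    static uint64_t next = 5000;    /* stand-in for the free-block service */
    return next++;
}

static void vol_write_block(uint64_t lba, const char *buf)
{
    printf("write  block %llu: %s\n", (unsigned long long)lba, buf);
}

/* --- one filesystem-level operation built on top --- */
/* Create a file: lock the directory's metadata block, allocate a data block,
 * record the new entry, then release the lock. Other filesystem instances
 * sharing the volume are serialized by the same lock.                      */
static int fs_create_file(uint64_t dir_meta_lba, const char *name)
{
    char dirent[64];

    if (vol_lock_range(dir_meta_lba, 1) != 0)
        return -1;

    uint64_t data_lba = vol_alloc_block();
    snprintf(dirent, sizeof(dirent), "%s -> block %llu", name,
             (unsigned long long)data_lba);
    vol_write_block(dir_meta_lba, dirent);

    vol_unlock_range(dir_meta_lba, 1);
    return 0;
}

int main(void)
{
    return fs_create_file(128, "report.txt");
}
```

Because allocation and locking live below the filesystem, each instance can stay unaware of its peers, as described above.
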

These extensions allow us to build a simple pass-through filesystem that provides only naming and file-level access rights.

4. Related Work

Storage Tank (ST) is a scalable distributed file and storage management system running on heterogeneous machines over storage area networks [14]. It offers virtualization and centralized policy-based management of storage resources. It separates data storage from metadata management and enables clients to access data directly for improved scalable performance. Although we share several goals with ST, the main difference of our system is its design approach, which aims at pushing significant storage system functionality behind the block-based interface. A second difference is the versatility of our system in serving diverse application needs through our extensible design. Another difference is the architecture of our distributed filesystem. Our filesystem is based on independent instances operating on a shared block volume, similar to Frangipani [20]. In contrast, the ST filesystem has metadata servers, uses many disk volumes, and each filesystem instance is aware of the others and directly cooperates with them.

Object-based storage advocates that the traditional block-based interface of storage devices should be changed to directly support the management of objects [15]. Although blocks offer fast, scalable access to shared data, a file server is currently needed to authorize the I/O and maintain the metadata, at the cost of limited security and data sharing. Support for data objects essentially moves issues of internal space management, quality-of-service guarantees and access authorization to the storage devices. Although decentralized storage management is one of our goals, we aim at supporting metadata management on top of commodity devices rather than modifying their hardware interface. Additionally, we strive to hide storage management functions behind the block-based interface whenever possible.

Cluster filesystems such as GPFS offer shared access to distributed homogeneous resources [18]. Metadata management is distributed across multiple file server nodes, and shared data can be accessed in parallel from different storage devices. Even though cluster filesystems successfully manage storage space in data centers, their design is monolithic and non-extensible. Furthermore, such systems offer little flexibility in reorganizing data or reconfiguring devices, making administration tasks tedious.

Stackable filesystems facilitate file system development through incremental refinement of features and extensible interfaces [23]. In the present paper, we advocate using software modularity and extensibility to build sophisticated storage system functionality while preserving the block-based interface. In essence, our approach is complementary to the stackable filesystem design, since we target different layers of the data storage hierarchy.

5. Conclusions

In this paper we identify two main challenges for consolidated storage in a data center: (i) the scalable, low-cost platform issue and (ii) the software infrastructure issue. Our goal is to address the latter. We propose a virtualization infrastructure to tackle the storage management, flexibility and reconfiguration issues. We have designed and implemented Violin, a virtualization framework for (i) building and (ii) combining block-level virtualization functions. We have also designed and are currently implementing extensions that will allow distributed virtualization stacks and sharing at both the block and the file level.

REFERENCES

[1] A. Acharya, M. Uysal, and J. Saltz. Active Disks: Programming Model, Algorithms and Evaluation. In Proc. of the 8th ASPLOS, pages 81-91, San Jose, California, Oct. 3-7, 1998.
[2] M. de Icaza, I. Molnar, and G. Oxman. The Linux RAID-1,-4,-5 code. In LinuxExpo, Apr. 1997.
[3] EMC. Enginuity(TM): The Storage Platform Operating Environment (White Paper). http://www.emc.com/pdf/techlib/c1033.pdf.
[4] EMC. Introducing RAID 5 on Symmetrix DMX. http://www.emc.com/products/systems/enginuity/pdf/H1114_Intro_raid5_DMX_ldv.pdf.
[5] Enterprise Volume Management System. evms.sourceforge.net.
[6] M. D. Flouris and A. Bilas. Violin: A Framework for Extensible Block-level Storage. In Proc. of the 13th IEEE/NASA Goddard Conference on Mass Storage Systems and Technologies (MSST 2005), Monterey, CA, Apr. 11-14, 2005.
[7] FreeBSD: GEOM Modular Disk I/O Request Transformation Framework. http://kerneltrap.org/node/view/454.
[8] G. A. Gibson, D. F. Nagle, K. Amiri, J. Butler, F. W. Chang, H. Gobioff, C. Hardin, E. Riedel, D. Rochberg, and J. Zelenka. A Cost-Effective, High-Bandwidth Storage Architecture. In Proc. of the 8th ASPLOS, pages 92-103, San Jose, California, Oct. 3-7, 1998.
[9] J. Gray. Storage Bricks Have Arrived. Invited talk at the 1st USENIX Conference on File and Storage Technologies (FAST '02), 2002.
[10] HP. OpenView Storage Area Manager. http://h18006.www1.hp.com/products/storage/software/sam/index.html.
[11] E. K. Lee and C. A. Thekkath. Petal: Distributed Virtual Disks. In Proc. of ASPLOS VII, pages 84-93, Oct. 1996.
[12] G. Lehey. The Vinum Volume Manager. In Proc. of the FREENIX Track (FREENIX-99), pages 57-68, Berkeley, CA, June 6-11, 1999. USENIX Association.
[13] J. MacCormick, N. Murphy, M. Najork, C. A. Thekkath, and L. Zhou. Boxwood: Abstractions as the Foundation for Storage Infrastructure. In Proc. of the 6th Symposium on Operating Systems Design and Implementation (OSDI '04). USENIX, Dec. 6-8, 2004.
[14] J. Menon, D. A. Pease, R. Rees, L. Duyanovich, and B. Hillsberg. IBM Storage Tank - A heterogeneous scalable SAN file system. IBM Systems Journal, 42(2):250-267, 2003.
[15] M. Mesnier, G. R. Ganger, and E. Riedel. Object-based storage. IEEE Communications Magazine, 41(8):84-90, Aug. 2003.
[16] D. Patterson. The UC Berkeley ISTORE Project: Bringing availability, maintainability, and evolutionary growth to storage-based clusters. http://roc.cs.berkeley.edu, January 2000.
[17] B. Phillips. Industry Trends: Have Storage Area Networks Come of Age? Computer, 31(7):10-12, July 1998.
[18] F. Schmuck and R. Haskin. GPFS: A Shared-Disk File System for Large Computing Clusters. In Proc. of the USENIX Conference on File and Storage Technologies (FAST '02), pages 231-244, Monterey, CA, Jan. 2002.
[19] D. Teigland and H. Mauelshagen. Volume Managers in Linux. In Proc. of the USENIX 2001 Technical Conference, June 2001.
[20] C. A. Thekkath, T. Mann, and E. K. Lee. Frangipani: A Scalable Distributed File System. In Proc. of the 16th Symposium on Operating Systems Principles (SOSP '97), pages 224-237. ACM Press, Oct. 5-8, 1997.
[21] Veritas. Storage Foundation(TM). http://www.veritas.com/Products/www?c=product&refId=203.
[22] Veritas. Volume Manager(TM). http://www.veritas.com/vmguided.
[23] E. Zadok and J. Nieh. FiST: A Language for Stackable File Systems. In Proc. of the 2000 USENIX Annual Technical Conference, pages 55-70. USENIX Association, June 18-23, 2000.
