
Clotho: Decoupling Memory Page Layout from Storage Organization

Minglong Shao    Jiri Schindler‡    Steven W. Schlosser†    Anastassia Ailamaki    Gregory R. Ganger
Carnegie Mellon University

Abstract

As database application performance depends on the utilization of the memory hierarchy, smart data placement plays a central role in increasing locality and in improving memory utilization. Existing techniques, however, do not optimize accesses to all levels of the memory hierarchy and for all workloads, because each storage level uses different technology (cache, memory, disks) and each application accesses data using different patterns. Clotho is a new buffer pool and storage management architecture that decouples in-memory page layout from data organization on non-volatile storage devices to enable independent data layout design at each level of the storage hierarchy. Clotho can maximize cache and memory utilization by (a) transparently using appropriate data layouts in memory and non-volatile storage, and (b) dynamically synthesizing data pages to follow application access patterns at each level as needed. Clotho creates in-memory pages individually tailored for compound and dynamically changing workloads, and enables efficient use of different storage technologies (e.g., disk arrays or MEMS-based storage devices). This paper describes the Clotho design and prototype implementation and evaluates its performance under a variety of workloads using both disk arrays and simulated MEMS-based storage devices.

‡ Now with EMC Corporation. † Now with Intel Corporation.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment. Proceedings of the 30th VLDB Conference, Toronto, Canada, 2004.

1 Introduction

Page structure and storage organization have been the subject of numerous studies [1, 3, 6, 9, 10], because they play a central role in database system performance. Research continues as no single data organization serves all needs within all systems. In particular, the access patterns resulting from queries posed by different workloads can vary significantly. One query, for instance, might access all the attributes in a table (full-record access), while another accesses only a subset of them (partial-record access). Full-record accesses are typical in transactional (OLTP) applications, where insert and delete statements require the entire record to be read or written, whereas partial-record accesses are common in decision-support system (DSS) queries. Moreover, when executing compound workloads, one query may access records sequentially while others access the same records "randomly" (e.g., via a non-clustered index). Currently, database storage managers implement a single page layout and storage organization scheme, which is used by all applications running thereafter. As a result, in an environment with a variety of workloads, only a subset of query types can be serviced well.

Several data page layout techniques have been proposed in the literature, each targeting a different query type. Notably, the N-ary Storage Model (NSM) [13] stores records consecutively, optimizing for full-record accesses while penalizing partial-record sequential scans. By contrast, the Decomposition Storage Model (DSM) [7] stores the values of each attribute in a separate table, optimizing for partial-record accesses while penalizing queries that need the entire record. More recently, PAX [1] optimizes cache performance, but not memory utilization. Fractured mirrors [14] reduce DSM's record reconstruction cost by using an optimized structure and scan operators, but must also keep an NSM-organized copy of the database to support full-record access queries. None of the previously proposed schemes provides a universally efficient solution, however, because they all make the fundamental assumption that the pages used in main memory must have the same contents as those stored on disk.

This paper proposes Clotho, a buffer pool and storage management architecture that decouples the memory page layout from the non-volatile storage data organization. This decoupling allows memory page contents to be determined dynamically according to the queries being served and offers two significant advantages. First, it optimizes storage access and memory utilization by requesting only the data accessed by a given query. Second, it allows new two-dimensional storage mechanisms to be exploited to mitigate the trade-off between the NSM and DSM storage models. Clotho chooses data layouts at each level of the memory hierarchy to match NSM where NSM performs best, to match DSM where DSM performs best, and to outperform both for query mixes and access types in between. It leverages the Atropos logical volume manager [16] to efficiently access two-dimensional data structures stored both in disk arrays and in MEMS-based storage devices (MEMStores) [19, 22].
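To make the NSM versus DSM trade-off mentioned above concrete, the following sketch (hypothetical layout code, not Clotho's actual implementation; record values follow the paper's example table R) shows how the same records serialize row-major versus column-major:

```python
# Hypothetical sketch: the same table serialized under NSM (row-major,
# full records stored consecutively) and DSM (column-major, one
# attribute per page). Structures and names are illustrative only.

records = [(1237, "Jane", 30), (4322, "John", 52), (1563, "Jim", 45)]

# NSM: a full-record (OLTP) access touches one contiguous region, but a
# scan of one attribute must stride over all the other attributes.
nsm_page = [field for record in records for field in record]

# DSM: a single-attribute (DSS) scan reads only the values it needs,
# but reconstructing a full record requires one lookup per attribute.
dsm_pages = {
    "ID":   [r[0] for r in records],
    "Name": [r[1] for r in records],
    "Age":  [r[2] for r in records],
}

print(nsm_page[:3])      # one full record, contiguous
print(dsm_pages["Age"])  # one attribute, contiguous
```

Either serialization is efficient along exactly one dimension; the decoupling proposed here aims to avoid committing to a single one at every level of the hierarchy.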

This paper also describes and evaluates a prototype implementation of Clotho within the Shore database storage manager [4]. Experiments with disk arrays show that, with only a single storage organization, the performance of DSS and OLTP workloads is comparable to that of the page layouts best suited for the respective workload (i.e., DSM and PAX, respectively). Experiments with a simulated MEMStore confirm that similar benefits will be realized with these future devices as well.

The remainder of this paper is organized as follows. Section 2 gives background and related work. Section 3 describes the architecture of a database system that enables decoupling of the in-memory and storage layouts. Section 4 describes the design of a buffer pool manager that supports query-specific in-memory page layout. Section 5 describes the design of a volume manager that allows efficient access when an arbitrary subset of table attributes is needed by a query. Section 6 describes our initial implementation, and Section 7 evaluates this implementation for several database workloads using both a disk array logical volume and a simulated MEMStore.

Data            Cache–Memory       Memory–Storage
Organization    OLTP    DSS        OLTP    DSS
NSM             ✓       ✗          ✓       ✗
DSM             ✗       ✓          ✗       ✓
PAX             ✓       ✓          ✓       ✗

Table 1: Summary of performance with current page layouts.

2 Background and related work

Conventional relational database systems store data in fixed-size pages (typically 4 to 64 KB). To access individual records of a relation (table) requested by a query, a scan operator of a database system accesses main memory. Before its data can be accessed, a page must first be fetched from non-volatile storage (e.g., a logical volume of a disk array) into main memory. Hence, a page is the basic allocation and access unit for non-volatile storage. A database storage manager facilitates this access and sends requests to a storage device to fetch the necessary blocks.

A single page contains a header describing what records are contained within and how they are laid out. In order to retrieve data requested by a query, a scan operator must understand the page layout, a.k.a. the storage model. Since the page layout determines what records and which attributes of a relation are stored in a single page, the storage model employed by a database system has far-reaching implications on the performance of a particular workload [2].

The page layout prevalent in commercial database systems, called the N-ary storage model (NSM), is optimized for queries with full-record access common in on-line transaction processing (OLTP) workloads. NSM stores all attributes of a relation in a single page [13], and full records are stored within a page one after another. Accessing a full record is accomplished by reading a particular record from consecutive memory locations. Following the unwritten rule that access to consecutive logical blocks (LBNs) of a storage device is more efficient than random access, a storage manager maps a single page to consecutive LBNs. Thus, an entire page can be accessed by a single I/O request.

An alternative page layout, called the Decomposition Storage Model (DSM) [7], is optimized for decision-support system (DSS) workloads. Since DSS queries typically access a small number of attributes and most of the data in the page is not touched in memory by the scan operator, DSM stores only one attribute per page. To ensure efficient storage device access, a storage manager maps DSM pages with consecutive records containing the same attribute into extents of contiguous LBNs. In anticipation of a sequential scan through records stored in multiple pages, a storage manager can prefetch all pages in one extent with a single large I/O, which is more efficient than accessing each page individually with a separate I/O.

A page layout optimized for CPU cache performance, called PAX [1], offers good CPU-memory performance for both the individual attribute scans of DSS queries and the full-record accesses of OLTP workloads. The PAX layout partitions data within a page into separate minipages. A single minipage contains data of only one attribute and occupies consecutive memory locations. Collectively, a single page contains all attributes for a given set of records. Scanning individual attributes in PAX accesses consecutive memory locations and thus can take advantage of cache-line prefetch logic. With proper alignment to cache-line sizes, a single cache miss can effectively prefetch data for several records, amortizing the high latency of memory access relative to cache access. However, PAX does not address memory-storage performance.

All of the described storage models share the same characteristics. They (i) are highly optimized for one workload type, (ii) focus predominantly on one level of the memory hierarchy, (iii) use a static data layout that is determined a priori when the relation is created, and (iv) apply the same layout across all levels of the memory hierarchy, even though each level has unique (and very different) characteristics. As a consequence, there are inherent performance trade-offs for each layout that arise when a workload changes. For example, NSM or PAX layouts waste memory capacity and storage device bandwidth for DSS workloads, since most data within a page is never touched. Similarly, a DSM layout is inefficient for OLTP queries accessing random full records. To reconstruct a full record with n attributes, n pages must be fetched and n − 1 joins on record identifiers performed to assemble the full record. In addition to wasting memory capacity and storage bandwidth, this access is inefficient at the storage device level; accessing these pages results in random one-page I/Os. In summary, each page layout exhibits good performance for a specific type of access at a specific level of the memory hierarchy, as shown in Table 1.

Several researchers have proposed solutions to address these performance trade-offs. Ramamurthy et al. proposed fractured mirrors that store data in both NSM and DSM layouts [14] to eliminate the need to reload and reorganize data when access patterns change. Based on the workload type, a database system can choose the appropriate data organization. Unfortunately, this approach doubles the required storage space and complicates data management; two physically different layouts must be maintained in synchrony to preserve data integrity. Hankins and Patel [9] proposed data morphing as a technique to reorganize data within individual pages based on the needs of workloads that change over time. Since morphing takes place within memory pages that are then stored in that format on the storage device, these fine-grained changes cannot address the trade-offs involved in accessing non-volatile storage. The multi-resolution block storage model (MBSM) [23] groups DSM table pages together into superpages, improving DSM performance when running decision-support systems. The Lachesis database storage manager [15] exploits unique disk drive characteristics to improve the performance of DSS workloads and of compound workloads that consist of DSS and OLTP queries competing for the same storage device. It matches page allocation and access policies to leverage these characteristics, but the storage model itself is not different; Lachesis transparently stores the in-memory NSM pages in the storage device's logical blocks.

MEMStores [5] are a promising new type of storage device with the potential to provide efficient access to two-dimensional data. Schlosser et al. proposed a data layout for MEMStores that exploits their inherent access parallelism [19]. Yu et al. devised an efficient mapping of database tables to this layout that takes advantage of the unique characteristics of MEMStores [22] to improve query performance. Similarly, Schindler et al. [16] proposed a data organization for two-dimensional access to database tables mapped to logical volumes composed of several disk drives. However, these initial works did not explore the implications of this new data organization on in-memory access performance.

In summary, these solutions either address only some of the performance trade-offs or are applicable to only one level of the memory hierarchy. Clotho builds on the previous work and uses a decoupled data layout that can adapt to dynamic changes in workloads without the need to maintain multiple copies of data, reorganize the data layout, or compromise between memory and I/O access efficiency.

3 Decoupling data layouts

From the discussion in the previous section, it is clear that designing a static scheme for data placement in memory and on non-volatile storage that performs well across different workloads and different device types and technologies is difficult. Instead of accepting the trade-offs inherent to a particular page layout that affect all levels of the memory hierarchy, we propose a new approach. As each level of the memory hierarchy has vastly different performance characteristics, the data organization at each level should be different. Therefore, Clotho decouples the in-memory data layout from the storage organization and implements different data layouts tailored to each level, without compromising performance at the other levels. The challenge is to ensure that this decoupling works seamlessly within current database systems. This section introduces the different data organizations and describes the key components of the Clotho architecture.

3.1 Data organization in Clotho

Clotho allows for decoupled data layouts and different representations of the same table at the memory and storage levels. Figure 1 depicts an example table, R, with three attributes: ID, Name, and Age.

Figure 1: Decoupled storage device and in-memory layouts, shown for the query SELECT ID FROM R WHERE Age > 30. A-pages on the storage device hold all three attributes of R, while the in-memory C-page holds only ID and Age.

At the storage level, the data is organized into A-pages. An A-page contains all attributes of the records; only one A-page needs to be fetched to retrieve a full record. Exploiting the idea used in PAX [1], an A-page organizes data into minipages that group values from the same attribute for efficient predicate evaluation, while the rest of the attributes remain in the same A-page. To ensure that the record reconstruction cost is minimized regardless of the size of the A-page, Clotho allows the device to use optimized methods for placing the contents of the A-page onto the storage medium. Therefore, not only does Clotho fully exploit sequential scan for evaluating predicates, but it also places A-pages carefully on the device to ensure near-sequential (or semi-sequential [16]) access when reconstructing a record. The placement of A-pages on the disk is further explained in Section 5.1.

The rightmost part of Figure 1 depicts a C-page, which is the in-memory representation of a page. The page frame is sized by the buffer pool manager and is on the order of 8 KB. A C-page is similar to an A-page in that it also contains attribute values grouped in minipages, to maximize processor cache performance. Unlike an A-page, however, a C-page contains values only for the attributes the query accesses. Since the query in the example uses only ID and Age, the C-page includes only these two attributes, maximizing memory utilization. Note that the C-page uses data from two A-pages to fill up the space "saved" by omitting Name. In the rest of this paper, we refer to the C-page layout as the Clotho storage model (CSM).
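The synthesis of a C-page from A-pages described above can be sketched as follows (a minimal illustration with hypothetical structures, not Clotho's actual code; the record values are those of Figure 1):

```python
# Hypothetical sketch of C-page synthesis: an A-page stores minipages
# for every attribute of a set of records; a C-page keeps minipages only
# for the attributes a query needs, so one C-page can pack the projected
# minipages of several A-pages. Names and shapes are illustrative only.

a_pages = [
    {"ID": [1237, 4322, 1563, 7658],
     "Name": ["Jane", "John", "Jim", "Susan"],
     "Age": [30, 52, 45, 20]},
    {"ID": [2865, 1015, 2534, 8791],
     "Name": ["Tom", "Jerry", "Jean", "Kate"],
     "Age": [31, 25, 54, 33]},
]

def make_c_page(a_pages, query_schema):
    """Project each A-page onto the query's attributes and pack the
    resulting minipages into a single in-memory C-page."""
    return {attr: [v for ap in a_pages for v in ap[attr]]
            for attr in query_schema}

# SELECT ID FROM R WHERE Age > 30 needs only ID and Age; the space that
# Name would have occupied holds records from the second A-page instead.
c_page = make_c_page(a_pages, ("ID", "Age"))
```

The real system performs this packing with scatter/gather I/O rather than in-memory copies, but the projection-and-pack structure is the same.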

3.2 System architecture

The difficulty in building a database system that can decouple in-memory page layout from storage organization lies in implementing the necessary changes without undue increase in system and code complexity. To allow decoupled data layouts, Clotho changes parts of two database system components, namely the buffer pool manager and the storage manager. The changes span limited areas in these components and do not alter the query processing interface. Clotho also takes advantage of a logical volume manager (LVM) that allows efficient access to two-dimensional structures in both dimensions.

Figure 2 shows the relevant components of a database system to highlight the interplay between Clotho and other components: operators (tblscan, idxscan, ...) request pages with a subset of attributes from the buffer pool manager; the Clotho storage manager places payload data directly via scatter/gather I/O over a storage interface that exposes efficient access to non-contiguous blocks; and the Atropos logical volume manager manages the disk array logical volume. Each component can independently take advantage of enabling hardware/OS technologies at each level of the memory hierarchy, while hiding the details from the rest of the system. This section outlines the role of each component. The changes to the components are further explained in Sections 4 and 5.1, while details specific to our prototype implementation are provided in Section 6.

Figure 2: Interaction of Clotho with other components.

The operators are essentially predicated scan and store procedures that access data from in-memory pages stored in a common buffer pool. They take advantage of the query-specific page layout of C-pages, which leverages L1/L2 CPU cache characteristics and cache prefetch logic for efficient access to data.

The buffer pool manager manages C-pages in the buffer pool and enables sharing across different queries that need the same data. In traditional buffer pool managers, a buffer page is assumed to have the same schema and contents as the corresponding relation. In Clotho, however, this page may contain a subset of the table schema attributes. To ensure sharing, correctness during updates, and high memory utilization, the Clotho buffer pool manager maintains a page-specific schema that denotes which attributes are stored within each buffered page (i.e., the page schema). The challenge of this approach is to ensure minimal I/O by determining sharing and partial overlapping across concurrent queries with minimal book-keeping overhead. Section 4 describes the buffer pool manager's operation in detail.

The storage manager maps A-pages to specific logical volume blocks, called LBNs. Since the A-page format is different from the in-memory layout, the storage manager rearranges A-page data on-the-fly into C-pages using the query-specific CSM layout. Unlike traditional storage managers, where pages are also the smallest access units, the Clotho storage manager selectively retrieves a portion of a single A-page. With scatter/gather I/O and direct memory access (DMA), the pieces of individual A-pages can be delivered directly into the proper memory frame(s) in the buffer pool as they arrive from the logical volume. The storage manager simply sets up the appropriate I/O vectors with the destination address ranges for the requested LBNs. The data is placed directly at its destination without the storage manager's involvement or the need for data shuffling and extraneous memory copies.

To efficiently access data under a variety of access patterns, the storage manager relies on explicit hints provided by the logical volume manager. These hints convey which LBNs can be accessed together efficiently, and Clotho uses them for allocating A-pages.

The logical volume manager (LVM) maps volume LBNs to the physical blocks of the underlying storage device(s). It is independent of the database system and is typically implemented within the storage system (e.g., a disk array). The Atropos LVM leverages device-specific characteristics to create mappings that yield efficient access to a collection of LBNs. In particular, the Atropos storage interface exports two functions that establish explicit relationships between individual LBNs of the logical volume and enable the Clotho storage manager to effectively map A-pages to individual LBNs. One function returns the set of consecutive LBNs that yield efficient access (e.g., all blocks mapped onto one disk or MEMStore track). Clotho maps minipages containing the same attribute to these LBNs for efficient scans of a subset of attributes. The other function returns a set of non-contiguous LBNs that can be efficiently accessed together (e.g., parallel-accessible LBNs mapped to different MEMStore tips or to different disks of a logical volume). Clotho maps a single A-page to these LBNs. The Atropos LVM is briefly described in Section 5.1 and detailed elsewhere [16, 19]. Clotho need not rely on the Atropos LVM to create query-specific C-pages. With conventional LVMs, it can map a full A-page to a contiguous run of LBNs, with each minipage mapped to one or more discrete LBNs. However, with these conventional LVMs, access along only one dimension will be efficient.

3.3 Benefits of decoupled data layouts

The concept of decoupling data layouts at different levels of the memory hierarchy offers several benefits.

Leveraging unique device characteristics. At the volatile (main memory) level, Clotho uses CSM, a data layout that maximizes processor cache utilization by minimizing unnecessary accesses to memory. CSM organizes data in C-pages and also groups attribute values to ensure that only useful information is brought into the processor caches [1, 9]. At the storage-device level, the granularity of accesses is naturally much coarser. The objective there is to maximize memory utilization for all types of queries by bringing into the buffer pool only the data that the query needs.

Query-specific memory layout. With memory organization decoupled from storage layout, Clotho can decide what data is needed by a particular query, request only the needed data from a storage device, and arrange the data on-the-fly into the organization best suited to the particular query's needs. This fine-grained control over what data is fetched and stored also puts less pressure on buffer pool and storage system resources. By not requesting data that will not be needed, a storage device can devote more time to servicing requests for other queries executing concurrently and hence speed up their execution.

Dynamic adaptation to changing workloads. A system with flexible data organization does not experience performance degradation when query access patterns change over time. Unlike systems with static page layouts, where the binding of data representation to workload occurs during table creation, this binding is done in Clotho only during query execution. Thus, a system with decoupled data organizations can easily adapt to changing workloads and also fine-tune the use of available resources when they are under contention.

4 Buffer pool manager

The Clotho buffer pool manager organizes relational table data in C-pages using the CSM data layout. CSM is a query-optimized in-memory page layout that stores only the subset of attributes needed by a query. Consequently, a C-page can contain a single attribute (similar to DSM), a few attributes, or all attributes of a given set of records (similar to NSM and PAX), depending on query needs. This section describes how the buffer pool manager constructs and maintains C-pages and ensures data sharing and consistency in the buffer pool.

4.1 In-memory C-page layout

Figure 3 depicts two examples of C-pages for a table with four attributes of different sizes. In our design, C-pages contain only fixed-size attributes. Variable-size attributes are stored separately in other page layouts (see Section 6.2). A C-page contains a page header and a set of minipages, each containing data for one attribute and collectively holding all attributes needed by queries. In a minipage, a single attribute's values are stored in consecutive memory locations to maximize processor cache performance. The current number of records and the presence bits are distributed across the minipages. Because the C-page handles only fixed-size attributes, the size of each minipage is determined at the time of table creation.

Figure 3: C-page layout. (a) Full records. (b) Partial records.

The page header stores the following information: the page id of the first A-page, the number of partial A-pages contained, the starting address of each A-page, a bit vector indicating the schema of the C-page's contents, and the maximal number of records that can fit in an A-page.

Figures 3(a) and 3(b) depict C-pages with complete and partial records, respectively. The leftmost C-page is created for queries that access full records, whereas the rightmost C-page is customized for queries touching only the first two attributes. The space occupied by minipages 3 and 4 on the left is used on the right to store more partial records from additional A-pages. In this example, a single C-page can hold the requested attributes from three A-pages, increasing memory utilization by a factor of three.

On the right side of each C-page in Figure 3 we list the number of storage device blocks each minipage occupies; in our example each block is 512 bytes. Depending on the relative attribute sizes, as we fill up the C-page using data from more A-pages, there may be some unused space. Instead of performing costly operations to fill up that space, we choose to leave it unused. Our experiments show that, with the right page size and aggressive prefetching, this unused space does not cause a detectable performance deterioration (details about space utilization are in Section 7.6).

4.2 Data sharing in buffer pool

Concurrent queries do not necessarily access the same sets of attributes; the sets of attributes accessed concurrently may be disjoint, inclusive, or otherwise overlapping. The Clotho buffer pool manager must (a) maximize sharing, ensuring memory space efficiency, (b) minimize book-keeping to keep the buffer pool operations light-weight, and (c) maintain consistency in the presence of updates.

As an example, query Q1 asks for attributes a1 and a2, while query Q2 asks for attributes a2 and a3. Using a simple approach, the buffer manager could create two separate C-pages tailored to each query. This approach ignores the sharing possibilities in case these queries scan the table concurrently. To achieve better memory utilization, the buffer manager can instead dynamically reorganize the minipages of a1 and a2 inside the two C-pages, fetching only the needed values and keeping track of the progress of each query, dynamically creating new C-pages for Q1 and

if read-only query then When looking for a record, a traditional database buffer  if p sch q sch then manager looks for the corresponding page id in the page

Do nothing table, and determines whether the record is in memory.

else if q sch  allp sch then To support record lookup in the query-specific C-page in Add q sch to the schema list Clotho, the page table of the buffer pool manager contains

else the page ids of all the A-pages used to construct the active

Q2. However, this approach incurs too much book-keeping overhead and is inefficient in practice.

The Clotho buffer pool manager balances memory utilization and management complexity. Each frame in the buffer pool stores a C-page which conforms to a page schema, a bitvector that describes which attributes the C-page holds. For each active table, we keep a list of the different page schemas for C-pages that belong to the table and are currently in the buffer pool. Finally, each active query keeps a query schema, a bitvector that describes which attributes the query needs for each accessed table. The buffer pool's lookup structure indexes C-pages and is augmented with the page schema bitvectors. To perform a record lookup, Clotho uses a key consisting of the page id and the page schema requested. A hit means that the page id matches one of the A-page page ids and that the schema of the C-page subsumes the schema of the requested record as described in the key.

Whenever a query starts executing, the buffer pool manager notes the query schema and inspects the other, already active, page schemas. If the new query schema accesses a set of attributes disjoint from those of the other active queries, if any, the buffer pool manager creates a new C-page. Otherwise, it merges the new schema with the most-efficient overlapping one already in memory. The algorithm in Figure 4 modifies the page schema list (p_sch), which is initially empty, based on the query schema (q_sch). Once the query is complete, the system removes the corresponding query schema from the list and adjusts the page schema list accordingly using the currently active query schemas.

    if it is a read query then
        New p_sch = Merge(q_sch, any overlapping p_sch)
        Add the new p_sch to the list
    else if it is a write query (update/delete/insert) then
        Use full schema as the q_sch
        Modify the list: only one full p_sch now
    end if

Figure 4: Buffer pool manager algorithm.

During query execution the page schema list dynamically adapts to changing workloads, depending on the degree of concurrency and the overlap among the attribute sets accessed by queries. This list ensures that queries having common attributes can share data in the buffer pool, while queries with disjoint attributes will not affect each other. In the above example, Q1 comes along first and the buffer pool manager creates C-pages with a1 and a2. When Q2 arrives, the buffer pool manager creates a C-page with a1, a2, and a3 for these two queries. After Q1 finishes, C-pages with only a2 and a3 are created for Q2.

4.3 Maintaining data consistency

With the algorithm in Figure 4, the buffer pool may contain multiple copies of the same minipage. To ensure data consistency when a transaction modifies a C-page, Clotho uses the mechanisms described below to deliver the latest copy of the data to other queries.

In the case of insertions, deletions, and updates, Clotho uses full-schema C-pages. Insertions and deletions need full-record access and modify all respective minipages, whereas full-schema pages for updates help keep buffer pool data consistent at no additional cost. Queries asking for updated records automatically obtain the correct dirty page from the buffer pool. Since deletion of records always operates on full C-pages, CSM can work with any existing deletion algorithms, such as "pseudo deletion" [11].

When a write query is looking up a C-page, it invalidates all of the other buffered C-pages that contain minipages from one A-page. Thus, there is only one valid copy of the modified data. Since the C-page with updated data has a full schema, the updated page will serve all other queries asking for records in this page until it is flushed to the disk. Clotho does not affect locking policies because page data organization is transparent to the lock manager.

5 Logical volume manager

This section briefly describes the storage device-specific data organization and the mechanisms exploited by the LVM in creating logical volumes that consist of either disk drives or a single MEMStore. Much of this work builds upon our previous work on the Atropos disk array logical volume manager [16] and MEMStore [19]. This section describes the high-level points for each device type.

5.1 Atropos disk array LVM

The standard interface of disk drives and disk arrays uses a simple linear abstraction, meaning that any two-dimensional data structure that is to be stored on disk needs to be serialized. For example, NSM serializes along full records (row-major) and DSM serializes along single attributes (column-major). Once the table is stored, access along the dimension of serialization is sequential and efficient. However, access along the other dimension is random and inefficient. Atropos uses the same linear abstraction as before, but solves this problem by using a new internal data organization and exposing a few abstractions to higher-level software.

By exposing enough information about its data organization, the database's storage manager can achieve efficient access along either dimension. Atropos exploits the request scheduler built into the disk's firmware and automatically-extracted knowledge of track switch delays to support semi-sequential access: diagonal access to ranges of blocks (one range per track) across multiple adjacent disk tracks. This second dimension of access enables two-dimensional data structures to be accessed efficiently. To support efficient sequential access, Atropos exploits automatically-extracted knowledge of disk track boundaries, using them as its stripe unit boundaries. By also exposing these boundaries explicitly, it allows the Clotho storage manager to use previously proposed "track-aligned extents" (traxtents), which provide substantial benefits for streaming patterns interleaved with other I/O activity [17, 15]. Finally, as with other logical volume managers, it delivers the aggregate bandwidth of all disks in the volume and offers the same reliability/performance tradeoffs as traditional RAID schemes [12].

Figure 5: Mapping of a database table with 12 attributes onto an Atropos logical volume with 4 disks. The numbers to the left of disk 0 are the LBNs mapped to the gray disk locations connected by the arrow, not the first block of each row. The arrow illustrates efficient semi-sequential access fetching a single A-page with 63 records. A single sequential I/O for 16 LBNs can efficiently fetch one attribute from 1008 records striped across four disks.

5.2 Semi-sequential access

To understand semi-sequential access, imagine sending two I/O requests to a disk: one request for the first LBN on the first track, and one for the second LBN on the second track. These two adjacent tracks are typically in the same cylinder, but different heads are used to access them. First, the disk will seek to the cylinder, then there will be some initial rotational latency before the first LBN is read. Next, the disk will switch to the next track, which takes some fixed amount of time (typically around 1 ms), and access the second LBN. With properly chosen LBNs, the second LBN is accessible right after a track switch, with no additional seek or rotational latency incurred. Requesting more LBNs laid out in this fashion on successive tracks allows further semi-sequential access.

Naturally, the sustained bandwidth of semi-sequential access is less than that of sequential access. However, semi-sequential access is more efficient than reading randomly chosen LBNs spread across adjacent tracks, as would be the case when accessing data in CSM along the secondary dimension of a table without Atropos. Accessing random LBNs would incur an additional rotational latency equal to half a revolution, on average.

5.3 Efficient database organization

With Atropos, the Clotho storage manager can lay out A-pages such that access in one dimension of the table is sequential and access to the other dimension is semi-sequential. Figure 5 shows the mapping of a simple table with 12 attributes and 1008 records to A-pages stored on an Atropos logical volume with four disks. A single A-page includes 63 records and maps to the diagonal semi-sequential LBNs, with each minipage mapped to a single LBN. When accessing one attribute from all records, Atropos can use four track-sized, track-aligned reads. For example, a sequential scan of attribute A1 results in an access of LBN 0 through LBN 15. Accessing a full A-page results in three semi-sequential accesses, one to each disk. For example, fetching attributes A1 through A12 for record 0 results in three semi-sequential accesses, each proceeding in parallel on different disks, starting at LBNs 0, 64, and 128.

5.4 MEMS-based storage devices

Two groups have evaluated the use of internal access parallelism in MEMStores to efficiently access database tables [19, 22].
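The diagonal mapping of Figure 5 reduces to simple arithmetic. The sketch below is our own illustration, not part of the Clotho or Atropos code: the constants mirror the Figure 5 example (4 disks, 16 LBNs per track, 12 attributes, one LBN per minipage), and the helper names are hypothetical.

```python
# Parameters from the Figure 5 example; the mapping is a simplified
# model of the described layout, not the actual LVM implementation.
DISKS = 4
LBNS_PER_TRACK = 16
ATTRS = 12

def minipage_lbn(attr, apage=0):
    """LBN of the minipage holding attribute `attr` (0-based) of A-page
    `apage`. Attributes form bands of DISKS attributes; within a band,
    one A-page's minipages sit on the semi-sequential diagonal, one
    track apart."""
    band = attr // DISKS                        # A1-A4 -> 0, A5-A8 -> 1, A9-A12 -> 2
    band_base = band * DISKS * LBNS_PER_TRACK   # 0, 64, 128
    return band_base + (attr % DISKS) * LBNS_PER_TRACK + apage

def full_record_accesses(apage=0):
    """The three semi-sequential runs (one per disk) needed to fetch all
    twelve minipages of one A-page."""
    return [[minipage_lbn(band * DISKS + i, apage) for i in range(DISKS)]
            for band in range(ATTRS // DISKS)]
```

For A-page 0 this yields the runs [0, 16, 32, 48], [64, 80, 96, 112], and [128, 144, 160, 176], matching the three semi-sequential accesses starting at LBNs 0, 64, and 128 described above, while a scan of attribute A1 over sixteen consecutive A-pages touches LBNs 0 through 15 sequentially.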
Clotho shows the benefit of using these techniques in the context of a complete database management system. As these devices are not yet available, we simulate their behavior using the DiskSim simulator combined with the Atropos logical volume manager.

Most MEMStore designs [5, 21] consist of a media sled and an array of several thousand probe tips. Actuators position the spring-mounted media sled in the X-Y plane, and the stationary probe tips access data as the sled is moved in the Y dimension. Each read/write tip accesses its own small portion of the media, which naturally divides the media into squares and reduces the range of motion required of the media sled.

When a seek occurs, the media is positioned to a specific offset relative to the entire read/write tip array. As a result, at any point in time, all of the tips access the same locations within their squares. An example of this is shown in Figure 6, in which LBNs at the same location within each square are identified with ovals. Realistic MEMStores are expected to have enough read/write tips to potentially access 100 LBNs in parallel. However, because of power and shared-component constraints, only about 10 to 20 of those LBNs could actually be accessed in parallel.

Given the simple device in Figure 6, if one third of the read/write tips can be active in parallel, a system could access together up to 3 LBNs out of the 9 shown with ovals. The three LBNs chosen could be sequential (e.g., 33, 34, and 35), or could be disjoint (e.g., 33, 36, and 51). In each case, all of those LBNs would be transferred to or from the media in parallel with the same efficiency.

Figure 6: Data layout with parallel-accessible LBNs highlighted. The LBNs marked with ovals are at the same location within each square and, thus, comprise an equivalence class. That is, they can potentially be accessed in parallel.

A-pages are arranged onto the rows and columns of read/write tips, much as they are across sequential and semi-sequential LBNs on a disk drive. By activating the appropriate read/write tips, parallel access to either the rows or the columns of the table is possible. In contrast to disk drives, in which access to one dimension is less efficient than access to the other (semi-sequential vs. sequential), access along the rows and along the columns is equally efficient.

6 Clotho implementation

Clotho is implemented within the Shore database storage manager [4]. This section describes the implementation of C-pages, scan operators, and the LVM. The implementation does not modify the layout of index pages.

6.1 Creating and scanning C-pages

We implemented CSM as a new page layout in Shore, according to the format described in Section 4.1. The only significant change in the internal Shore page structure is that the page header is aligned to occupy one block (512 B in our experiments). As described in Section 4, the original buffer pool manager is augmented with schema management information to control and reuse C-page contents. These modifications were minor and limited to the buffer pool module. To access a set of records, a scan operator issues a request to the buffer pool manager to return a pointer to the C-page with the (first of the) records requested. This pointer consists of the first A-page id in the C-page plus the page schema id.

If there is no appropriate C-page in the buffer pool to serve the request, the buffer pool manager allocates a new frame for the requested page. It then fills the page header with schema information that allows the storage manager to determine which data (i.e., minipages) is needed. This decision depends on the number of attributes in the query payload and on their relative sizes. Once the storage manager determines from the header information what minipages to request, it constructs an I/O vector with memory locations for individual minipages and issues a batch of I/O requests to fetch them. Upon completion of the individual I/Os, the requested blocks with minipages are "scattered" to their appropriate locations.

We implemented two scan operators. S-scan is similar to a scan operator on NSM pages, with the only difference that it scans only the attributes accessed by the query (in the predicate and in the payload). Clotho invokes S-scan to read tuples containing the attributes in the predicate and those in the payload, reads the predicate attributes, and, if the condition is true, returns the payload. The second scan operator, SI-scan, works similarly to an index scan. SI-scan first fetches and evaluates only the attributes in the predicates, then makes a list of the qualifying record ids, and finally retrieves the projected attribute values directly. Section 7.2.1 evaluates these two operators. To implement the above changes, we wrote about 2000 lines of C++ code.

6.2 Storing variable-sized attributes

Our current implementation stores fixed-sized and variable-sized attributes in separate A-pages. Fixed-sized attributes are stored in A-pages as described in Section 3.1. Each variable-sized attribute is stored in a separate A-page whose format is similar to a DSM page. To fetch the full record of a table with variable-sized attributes, the storage manager issues one (batch) I/O to fetch the A-page containing all of the fixed-size attributes and an additional I/O for each variable-sized attribute in the table. As future work, we plan to design storage of variable-sized attributes in the same A-pages with fixed-sized attributes, using attribute size estimations [1] and overflow pages whenever needed.

6.3 Logical volume manager

The Atropos logical volume manager is implemented as a standalone C++ application. It communicates with Shore through a socket (control path) and shared memory (data path) to avoid data copies between the two user-level processes. Atropos determines how I/O requests are broken into individual disk I/Os and issues them directly to the attached SCSI disks using the /dev/sg Linux raw SCSI device. With an SMP host, the process runs on a separate CPU with minimal impact on Shore execution.

Since real MEMStores do not exist yet, the Atropos MEMStore LVM implementation relies on simulation. It uses an existing model of MEMS-based storage devices [18] integrated into the DiskSim storage subsystem simulator [8]. The LVM process runs the I/O timings through DiskSim and uses main memory for storing data.

7 Evaluation

This section evaluates the benefits of decoupling in-memory data layout from storage device organization using our Clotho prototype. The evaluation is presented in two parts. The first part uses representative microbenchmarks [20] to perform a sensitivity analysis by varying several parameters such as the query payload (projectivity) and the selectivity in the predicate. The second part of the section presents experimental results from running

DSS and OLTP workloads, demonstrating the efficiency of Clotho when running these workloads with only one common storage organization. The microbenchmarks include queries with sequential and random access, point updates, and bulk insert operations, and evaluate the performance of the worst- and best-case scenarios.

Figure 7: Microbenchmark performance for different layouts: (a) Atropos disk LVM; (b) MEMS device; (c) scan operator performance. The graphs show the total microbenchmark query run time relative to NSM. The performance of S-scan and SI-scan is shown for the CSM layout running on the Atropos disk array.

7.1 Experimental setup

The experiments are conducted on a two-way 1.7 GHz Pentium 4 Xeon workstation running Linux kernel v. 2.4.24 and the RedHat 7.1 distribution. The machine for the disk array experiment has 1024 MB of memory and is equipped with two Adaptec Ultra160 Wide SCSI adapters, each controlling two 36 GB Seagate Cheetah 36ES disks (ST336706LC). The Atropos LVM exports a single 35 GB logical volume created from the four disks in the experimental setup and maps it to the blocks on the disks' outermost zone.

An identical machine configuration is used for the MEMStore experiments; it has 2 GB of memory, with half used as the data store. The emulated MEMStore parameters are based on the G2 MEMStore [18] that includes 6400 probe tips that can simultaneously access 16 LBNs, each of size 512 bytes; the total capacity is 3.46 GB.

7.2 Microbenchmark performance

To establish Clotho baseline performance, we first run a range query of the form SELECT a1, a2 FROM R WHERE Lo < a2 < Hi. R has 15 attributes of type FLOAT and is populated with 8 million records (roughly 1 GB of data). All attribute values are uniformly distributed. We show the results of varying the query's payload by increasing the number of attributes in the select clause from one up to the entire record, and the selectivity by changing the values of Lo and Hi. We first run the query using sequential scan, and then using a non-clustered index to simulate random access. The order of the attributes accessed does not affect the performance results, because Atropos uses track-aligned extents [17] to fetch each attribute for sequential scans.

7.2.1 Queries using sequential scan

Varying query payload. Figure 7 compares the performance of the microbenchmark query with varying projectivity for the four data layouts. CSM uses the S-scan operator. The data are shown for a query with 10% selectivity; using 100% selectivity exhibits the same trends.
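Because the attribute values are uniformly distributed, the Lo and Hi bounds map directly to a target selectivity. The sketch below is purely illustrative and not part of the benchmark harness; the [0.0, 1.0) value domain and the helper names are our assumptions.

```python
# Illustrative only: with uniformly distributed values in an assumed
# [0.0, 1.0) domain, the width of the (Lo, Hi) interval equals the
# query's selectivity.
NUM_RECORDS = 8_000_000   # table R, as described in Section 7.2

def bounds_for_selectivity(sel_percent, lo=0.0):
    """Choose Hi so that `Lo < a2 < Hi` qualifies sel_percent of records."""
    return lo, lo + sel_percent / 100.0

def expected_matches(sel_percent):
    """Expected number of qualifying records out of NUM_RECORDS."""
    return round(NUM_RECORDS * sel_percent / 100.0)
```

At the 0.0001% end of the selectivity sweep in Figure 7(c) this is only 8 qualifying records, while 10% selectivity touches 800,000, which is why the relative cost of the two scan operators shifts so sharply across the sweep.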
All experiments compare CSM to the NSM, DSM, and PAX implementations in Shore. NSM and PAX are implemented as described in [1], whereas DSM is implemented in a tight, space-efficient form using the tuple-at-a-time reconstruction algorithm [14]. For CSM, the Atropos LVM uses its default configuration [16]. The NSM, DSM, and PAX page layouts do not take advantage of the semi-sequential access that Atropos provides. However, they still run over the logical volume, which is effectively a conventional striped logical volume with the stripe unit size equal to the individual disks' track size to ensure efficient sequential access. Unless otherwise stated, the buffer pool size in all experiments is set to 128 MB and the page sizes for NSM, PAX, and DSM are 8 KB. For CSM, both the A-page and C-page sizes are also set to 8 KB. The TPC-H queries used in our experiments (Q1, Q6, Q12, Q14) do not reference variable-sized attributes. The TPC-C new-order transaction has one query asking for a variable-size attribute, C_DATA, which is stored separately as described in Section 6.2.

Clotho shows the best performance at both low and high projectivities. At low projectivity, CSM achieves comparable performance to DSM, which is the best page layout when accessing a small fraction of the record. The slightly lower runtime of DSM for the one-attribute payload in Figure 7(a) is caused by a limitation of the Linux operating system that prevents us from using DMA-supported scatter/gather I/O for large transfers (the size of an I/O vector for scatter/gather I/O in Linux is limited to 16 elements, while commercial UNIXes support up to 1024 elements). As a result, CSM must read all data into a contiguous memory region and do an extra memory copy to "scatter" the data to their final destinations. DSM does not experience this extra memory copy; its pages can be put verbatim into the proper memory frames. Like DSM, CSM effectively pushes the projection to the I/O level. Attributes not involved in the query are not fetched from storage, saving I/O bandwidth and memory space and accelerating query execution.

With increasing projectivity, CSM performance is better than or equal to the best case at the other end of the spectrum, i.e., NSM and PAX, when selecting the full record. DSM's suboptimal performance at high projectivities is due to the additional joins needed between the table fragments spread out across the logical volume. Clotho, on the other hand, fetches the requested data in lock-step from the disk and places it in memory using CSM, maximizing spatial locality and eliminating the need for a join. Clotho performs a full-record scan over 3x faster than DSM. As shown in Figure 7(b), the MEMStore performance shows the same results.

Figure 8: Microbenchmark performance for Atropos LVM: (a) point queries; (b) full scan.

Figure 9: TPC-H performance for different layouts (queries Q1, Q6, Q12, Q14 on the Atropos disk array and the MEMS device). The performance is shown relative to NSM; an inset table lists absolute runtimes of NSM 58 s, DSM 236 s, PAX 63 s, and CSM 57 s.

7.2.4 Full table scans and bulk inserts

When scanning the full table (full-record, 100% selectivity) or when populating tables through bulk insertions, Clotho exhibits comparable performance to NSM and PAX.

Similarly, point updates (i.e., updates to a single record) exhibit comparable performance across all data placement methods as point queries. Clotho updates single records using full-schema C-pages, therefore its performance is always 22% worse than NSM, regardless of payload. To alleviate this behavior, we are currently investigating efficient ways to use partial-record C-pages for updates as we do for queries. As with point queries, the performance of DSM deteriorates much faster.

Comparison of S-scan and SI-scan. Figure 7(c) compares the performance of the above query for the S-scan and SI-scan operators. We vary selectivity from 0.0001% to 20% and use a payload of four attributes (the trend continues for higher selectivities). As expected, SI-scan exhibits better performance at low selectivities, whereas S-scan wins as the selectivity increases.
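The behavioral difference between the two operators is easy to see in a toy model. The sketch below is our own simplification, with plain Python lists standing in for minipages and hypothetical function names; it is not Shore/Clotho code.

```python
# Toy model of the two CSM scan operators described in Section 6.1.
def s_scan(pages, pred_attr, pred, payload_attrs):
    """S-scan: one pass; evaluate the predicate tuple-at-a-time and
    return the payload immediately when the condition holds."""
    for page in pages:
        for rid, value in enumerate(page[pred_attr]):
            if pred(value):
                yield tuple(page[a][rid] for a in payload_attrs)

def si_scan(pages, pred_attr, pred, payload_attrs):
    """SI-scan: first evaluate only the predicate minipages and collect
    qualifying record ids, then fetch the projected values directly."""
    hits = [(pno, rid)
            for pno, page in enumerate(pages)
            for rid, value in enumerate(page[pred_attr])
            if pred(value)]
    for pno, rid in hits:
        yield tuple(pages[pno][a][rid] for a in payload_attrs)
```

Both produce the same tuples; they differ in when payload minipages are touched. At very low selectivity, SI-scan's record-id list lets it skip payload minipages on pages without matches, while at high selectivity virtually every page has a match and S-scan's simpler same-page record locator wins.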
The performance gain comes from the fact that only pages containing qualified records are processed. The performance deterioration of SI-scan with increasing selectivity is due to two factors. First, SI-scan must process a higher number of pages than S-scan. At selectivity equal to 1.6%, all pages will have qualifying records, because of the uniform data distribution. Second, for each qualifying record, SI-scan must first locate the page and then calculate the record address, while S-scan uses a much simpler same-page record locator. The optimizer can use SI-scan or S-scan depending on which one will perform best given the estimated selectivity.

For full table scans, DSM performance is much worse, which corroborates previous results [1]. Figure 8(b) shows the total runtime when scanning table R and accessing full records. The results are similar when doing bulk inserts. Our optimized algorithm issues track-aligned I/O requests and uses aggressive prefetching for all data placement methods. Because bulk loading is an I/O-intensive operation, space efficiency is the only factor that affects the relative bulk-loading performance across the different layouts. The experiment is designed so that each layout is as space-efficient as possible (i.e., the table occupies the minimum number of pages possible). CSM exhibits similar space efficiency and the same performance as NSM and PAX.

7.2.2 Point queries using random access

The worst-case scenario for the Clotho data placement scheme is random point tuple access (access to a single record in the relation through a non-clustered index). As only a single record is accessed, sequential scan is never used; on the contrary, as the payload increases, CSM is penalized more by the semi-sequential scan through the disk to obtain all the attributes in the record. Figure 8(a) shows that, when the payload is only a few attributes, CSM performs closely to NSM and PAX. As the payload increases, the CSM performance becomes slightly worse, although it deteriorates much less than DSM performance does.

7.2.3 Updates

Bulk updates (i.e., updates to multiple records using sequential scan) exhibit similar performance to queries using sequential scan, when varying either selectivity or payload.

7.3 DSS workload performance

To quantify the benefits of decoupled layout for database workloads, we run the TPC-H decision support benchmark on our Shore prototype. The TPC-H dataset is 1 GB and the buffer pool size is 128 MB. Figure 9 shows execution time relative to NSM for four representative TPC-H queries (two sequential scans and two joins). The leftmost group of bars represents TPC-H execution on Atropos, whereas the rightmost group represents queries run on a simulated MEMStore. NSM and PAX perform the worst, by a factor of 1.24x to 2.0x (except for DSM in Q1), because they must access all attributes. The performance of DSM is better for all queries except Q1 because of the benchmark's projectivity. CSM performs best because it benefits from projectivity and avoids the cost of the joins that DSM must do to reconstruct records. Again, results on the MEMStore exhibit the same trends.

    Layout   NSM    DSM    PAX    CSM
    TpmC     1115   141    1113   1051

Table 2: TPC-C benchmark results with the Atropos disk array LVM.

7.4 OLTP workload performance

The queries in a typical OLTP workload access a small number of records spread across the entire database. In addition, OLTP applications have several insert and delete statements as well as point updates. With NSM or PAX page layouts, the entire record can be retrieved by a single-page random I/O, because these layouts map a single page to consecutive LBNs. Clotho spreads a single A-page across non-consecutive LBNs of the logical volume, enabling efficient sequential access when scanning a single attribute across multiple records and a less efficient semi-sequential scan when accessing full records.

The TPC-C benchmark approximates an OLTP workload on our Shore prototype with all four data layouts using an 8 KB page size. TPC-C is configured with 10 warehouses, 100 users, no think time, and a 60 s warm-up time. The buffer pool size is 128 MB, so it only caches 10% of the database. The completed-transactions-per-minute (TpmC) throughput is repeatedly measured over a period of 120 s.

Table 2 shows the results of running the TPC-C benchmark. As expected, NSM and PAX have comparable performance, while DSM yields much lower throughput. Despite the less efficient semi-sequential access, CSM achieves only 6% lower throughput than NSM and PAX by taking advantage of the decoupled layouts to construct C-pages that are shared by the queries accessing only partial records. On the other hand, the frequent point updates penalize CSM's performance through the semi-sequential accesses needed to retrieve full records. This penalty is in part compensated by the buffer pool manager's ability to create and share pages containing only the needed data.

7.5 Compound OLTP/DSS workload

Benchmarks involving compound workloads are important in order to measure the impact on performance when different queries access the same logical volume concurrently. With Clotho, the performance degradation could potentially be worse than with other page layouts: the originally efficient semi-sequential access to disjoint LBNs (i.e., for OLTP queries) could be disrupted by competing I/Os from the other workload, creating inefficient access. This does not occur for the other layouts, which map the entire page to consecutive LBNs that can be fetched in one media access.

We simulate a compound workload with a single-user DSS (TPC-H) workload running concurrently with a multi-user OLTP workload (TPC-C) against our Atropos disk LVM and measure the differences in performance relative to the isolated workloads. The respective TPC workloads are configured as described earlier. In previous work [15], we demonstrated the effectiveness of track-aligned disk accesses on compound workloads; here, we compare all of the page layouts using these efficient I/Os to achieve comparable results for TPC-H.

Figure 10: Compound workload performance for different layouts. The figure shows the slowdown of TPC-H query 1 runtime when run together with the TPC-C benchmark, relative to the case when it runs in isolation, and the impact on TPC-C performance.

As shown in Figure 10, undue performance degradation does not occur: CSM exhibits the same or smaller relative performance degradation than the other three layouts. The figure shows indicative performance results for TPC-H query 1 (others exhibit similar behavior) and for TPC-C, relative to the base case when the OLTP and DSS queries run separately. The larger performance impact of compound workloads on DSS with DSM shows that small random I/O traffic aggravates the impact of the seeks necessary to reconstruct a DSM page. Comparing CSM and PAX, the 1% smaller impact of PAX on the TPC-H query is offset by a 2% bigger impact on TPC-C benchmark performance.

7.6 Space utilization

Since the CSM A-page partitions attributes into minipages whose minimal size is equal to the size of a single LBN, CSM is more susceptible to the negative effects of internal fragmentation than NSM or PAX. Consequently, a significant amount of space may potentially be wasted, resulting in diminished access efficiency. With PAX, minipage boundaries can be aligned on word boundaries (i.e., 32 or 64 bits) to easily accommodate schemas with high variance in attribute sizes. In that case, Clotho may use large A-page sizes to accommodate all the attributes without undue loss in access efficiency due to internal fragmentation.

To measure the space efficiency of the CSM A-page, we compare the space efficiency of the NSM and CSM layouts for the TPC-C and TPC-H schemas. NSM exhibits the best possible efficiency among all four page layouts. Figure 11 shows the space efficiency of CSM relative to NSM for all tables of TPC-C and TPC-H as a function of total page size. Space efficiency is defined as the ratio between the maximum number of records that can be packed into a CSM page and the number of records that fit into an NSM page.

Figure 11: Space efficiencies with the CSM page layout, shown for all TPC-C and TPC-H tables as a function of page size [KB].

A 16 KB A-page suffices to achieve over 90% space utilization for all but the customer and stock tables of the TPC-C benchmark. A 32 KB A-page size achieves over 90% space efficiency for the remaining two tables. Both the customer and stock tables include an attribute that is much larger than all other attributes. The customer table includes C_DATA, a 500-byte attribute containing "miscellaneous information", while the next largest attribute has a size of 20 bytes. The stock table includes the 50-byte S_DATA attribute, while the next largest attribute is 24 bytes. Both of these attributes are rarely used in the TPC-C benchmark.

8 Conclusions

Clotho decouples in-memory page layout from in-storage data organization, enabling independent data layout design at each level of the storage hierarchy. Doing so allows Clotho to optimize I/O performance and memory utilization by fetching only the data desired by queries that access partial records, while mitigating the trade-off between NSM and DSM. Experiments with our Clotho implementation show substantial performance improvements across a spectrum of query types, for both a real disk array and future MEMS-based storage devices.

Acknowledgements

We thank the anonymous referees for their detailed and insightful comments and the members and companies of the PDL Consortium (including EMC, Hewlett-Packard, Hitachi, IBM, Intel, LSI Logic, Microsoft, Network Appliance, Oracle, Panasas, Seagate, Sun, and Veritas) for their interest, insights, feedback, and support. This work is sponsored in part by NSF grants CCR-0113660, IIS-0133686, CCR-0205544, and BES-0329549.

References

[1] A. Ailamaki, D. J. DeWitt, M. D. Hill, and M. Skounakis. Weaving relations for cache performance. International Conference on Very Large Databases, pages 169–180. Morgan Kaufmann Publishing, Inc., 2001.
[2] A. Ailamaki, D. J. DeWitt, M. D. Hill, and D. A. Wood. DBMSs on a modern processor: where does time go? International Conference on Very Large Databases, pages 266–277. Morgan Kaufmann Publishing, Inc., 1999.
[3] P. A. Boncz, S. Manegold, and M. L. Kersten. Database architecture optimized for the new bottleneck: memory access. International Conference on Very Large Databases, pages 54–65. Morgan Kaufmann Publishers, Inc., 1999.
[4] M. J. Carey et al. Shoring up persistent applications. ACM SIGMOD International Conference on Management of Data, 1994.
[5] L. R. Carley et al. Single-chip computers with microelectromechanical systems-based magnetic memory. Journal of Applied Physics, 87(9):6680–6685, 1 May 2000.
[6] X. Chen and X. Zhang. Coordinated data prefetching by utilizing reference information at both proxy and web servers. Performance Evaluation Review, 29(2):32–38. ACM, September 2001.
[7] G. P. Copeland and S. Khoshafian. A decomposition storage model. ACM SIGMOD International Conference on Management of Data, pages 268–279. ACM Press, 1985.
[8] The DiskSim Simulation Environment (Version 3.0). http://www.pdl.cmu.edu/DiskSim/index.html.
[9] R. A. Hankins and J. M. Patel. Data morphing: an adaptive, cache-conscious storage technique. International Conference on Very Large Databases, pages 1–12. VLDB, 2003.
[10] S. Manegold, P. A. Boncz, and M. L. Kersten. Generic database cost models for hierarchical memory systems. International Conference on Very Large Databases, pages 191–202. Morgan Kaufmann Publishers, Inc., 2002.
[11] C. Mohan et al. ARIES: a transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging. ACM Transactions on Database Systems, 17(1):94–162, March 1992.
[12] D. A. Patterson, G. Gibson, and R. H. Katz. A case for redundant arrays of inexpensive disks (RAID). ACM SIGMOD International Conference on Management of Data, pages 109–116, 1–3 June 1988.
[13] R. Ramakrishnan and J. Gehrke. Database management systems, 3rd edition. McGraw-Hill, 2003.
[14] R. Ramamurthy, D. J. DeWitt, and Q. Su. A case for fractured mirrors. International Conference on Very Large Databases, pages 430–441. Morgan Kaufmann Publishers, Inc., 2002.
[15] J. Schindler, A. Ailamaki, and G. R. Ganger. Lachesis: robust database storage management based on device-specific performance characteristics. International Conference on Very Large Databases, pages 706–717. Morgan Kaufmann Publishing, Inc., 2003.
[16] J. Schindler et al. Atropos: a disk array volume manager for orchestrated use of disks. Conference on File and Storage Technologies, pages 159–172. USENIX Association, 2004.
[17] J. Schindler, J. L. Griffin, C. R. Lumb, and G. R. Ganger. Track-aligned extents: matching access patterns to disk drive characteristics. Conference on File and Storage Technologies, pages 259–274. USENIX Association, 2002.
[18] S. W. Schlosser, J. L. Griffin, D. F. Nagle, and G. R. Ganger. Designing computer systems with MEMS-based storage. Architectural Support for Programming Languages and Operating Systems, 2000.
[19] S. W. Schlosser, J. Schindler, A. Ailamaki, and G. R. Ganger. Exposing and exploiting internal parallelism in MEMS-based storage. Technical Report CMU-CS-03-125. Carnegie Mellon University, Pittsburgh, PA, March 2003.
[20] M. Shao and A. Ailamaki. DBMbench: microbenchmarking database systems in a small, yet real world. Technical Report CMU-CS-03-161. Carnegie Mellon University, Pittsburgh, PA, October 2003.
[21] P. Vettiger et al. The "Millipede" – more than one thousand tips for future AFM data storage. IBM Journal of Research and Development, 44(3):323–340, 2000.
[22] H. Yu, D. Agrawal, and A. E. Abbadi. Tabular placement of relational data on MEMS-based storage devices. International Conference on Very Large Databases, pages 680–693, 2003.
[23] J. Zhou and K. A. Ross. A multi-resolution block storage model for database design. International Database Engineering & Applications Symposium, 2003.