Middle-Tier Database Caching for E-Business *
Qiong Luo # Sailesh Krishnamurthy ÷ C. Mohan ~ Hamid Pirahesh ~ Honguk Woo ° Bruce G. Lindsay ~ Jeffrey F. Naughton #

* Work done at IBM Almaden Research Center
# Computer Sciences Dept, University of Wisconsin, Madison, WI 53706
÷ Department of EECS, UC Berkeley, Berkeley, CA 94720
~ IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95120
° Department of Computer Sciences, University of Texas-Austin, Austin, TX 78712
Contact email: {[email protected], [email protected]}

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ACM SIGMOD '2002, June 4-6, 2002, Madison, Wisconsin, USA. Copyright 2002 ACM 1-58113-497-5/02/06...$5.00.

ABSTRACT

While scaling up to the enormous and growing Internet population with unpredictable usage patterns, E-commerce applications face severe challenges in cost and manageability, especially for database servers that are deployed as those applications' backends in a multi-tier configuration. Middle-tier database caching is one solution to this problem. In this paper, we present a simple extension to the existing federated features in IBM DB2 UDB, which enables a regular DB2 instance to become a DBCache without any application modification. On deployment of a DBCache at an application server, arbitrary SQL statements generated from the unchanged application that are intended for a backend database server can be answered at the cache, at the backend database server, or at both locations in a distributed manner. The factors that determine the distribution of workload include the SQL statement type, the cache content, the application requirement on data freshness, and cost-based optimization at the cache. We have developed a research prototype of DBCache, and conducted an extensive set of experiments with an E-Commerce benchmark to show the benefits of this approach and illustrate tradeoffs in caching considerations.

1. INTRODUCTION

Various caching techniques have been deployed to increase the performance of multi-tier web-based applications in response to the ever-increasing scale of the Internet. Such applications typically achieve a measure of scalability with application servers running on multiple (relatively cheaper) systems connecting to a single database system. This, however, does not solve the scalability problem for backend database servers. One way to address this problem is middle-tier database caching (shown as the gray box in Figure 1), which is deployed in the middle, usually at the application server, of a multi-tier web site infrastructure. Example commercial products include the Database Cache of Oracle 9i Internet Application Server [19] and TimesTen's Front-Tier [22].

Figure 1: Middle-Tier Database Caching

In a multi-tier e-Business infrastructure, middle-tier database caching is attractive because of improvements to the following attributes:

(1) Scalability: by distributing query workload from the backend to multiple cheap front-end systems.

(2) Flexibility: with QoS (Quality Of Service) control where each cache hosts different parts of the backend data, e.g., the data of Platinum customers is cached while that of ordinary customers is not.

(3) Availability: by continued service for applications that depend only on cached tables even if the backend server is unavailable.

(4) Performance: by potentially responding to locality patterns in the workload and smoothing out load peaks.

Using a general-purpose industrial-strength DBMS for middle-tier database caching is especially attractive to e-Businesses, even though there have been special-purpose solutions (for example, e-Bay uses its own front-end data cache [18]). This is mainly due to crucial business requirements such as reliability, scalability, and manageability. For instance, an industrial-strength DBMS closely tracks SQL enhancements, and provides a variety of tools for application development. More importantly, it provides transactional support, multiple consistency levels, and efficient recovery services. Finally, an ideal cache should be transparent to the application that uses it, and this is difficult to achieve with a special-purpose solution. We would also like to take advantage of existing replication support in commercial products rather than developing and supporting home-grown customer solutions.

Many research questions arise in using a full-fledged database engine for middle-tier database caching, and the answers to some of them affect the relevance of others. In decreasing order of importance, these are:

(1) What are the performance bottlenecks in e-Business applications, or in other words, are we addressing the right problems by focusing on database caching?

(2) Will performance be acceptable using a commercial DBMS as a middle-tier data cache? Features such as transactional semantics, consistency, and recovery come with some overhead. What features can be dispensed with in such an environment?

(3) What database caching schemes are suitable for e-Business applications?

(4) How can a database caching scheme be implemented in a commercial database engine and how does it perform under realistic e-Commerce workloads?

(5) What is the impact of running a database server on the same computer as an application server?

(6) How can we generalize these results to other kinds of web applications?

Most of these questions remain open, partly due to the diversity of e-Commerce applications and the complexity of these systems.

In this paper, we attempt to answer some of these questions. We start with examining the opportunities in e-Commerce applications for middle-tier database caching by running an e-Commerce benchmark on typical web site architectures. We observe that this benchmark generated a large number of simple OLTP-style queries, their table accesses were highly skewed on a few read-dominant tables, and there was a clear separation between write-dominant tables and read-dominant tables. We find that web application clones could scale up to heavy loads and this leaves the backend database server to eventually become the performance bottleneck in the system.

We then explore how to extend DB2 so that it can be used as a middle-tier database cache. By extending DB2's federated features, we turned a DB2 instance into a table level database cache without changing user applications. The novelty of this extension is that query plans at the cache may involve both the cache and the remote server based on cost estimation. Through experiments, we showed that the overhead of adding a full-strength DBMS as a middle-tier database cache was insignificant for e-Commerce workloads. Consequently, middle-tier database caching improved users' response time significantly when the backend database server was heavily loaded.

The remainder of the paper is organized as follows. In Section 2, we present our prototype middle-tier database cache (called DBCache) that leverages existing features of federated technology in a commercial DBMS engine (DB2). In Section 3, we describe our overall evaluation methodology with an e-Commerce benchmark. We then present our experimental results to show the performance impact of middle-tier database caching (Section 4). We discuss related work in Section 5 and conclude in Section 6 with our agenda for future research.

2. TURNING DB2 INTO A DBCACHE

Figure 2: Deploying DBCache (legible diagram labels: Application, Web/App. Server, DBCache, DB Server)

In this section, we will first discuss our design considerations for the DBCache. Then, we present our cache initialization tool and our modification to the DB2 engine.

2.1 Design Considerations

The goal of our DBCache is to improve the performance and scalability of web-based applications by distributing query processing to the clones of the applications and the underlying application servers (as in Figure 2). With this goal in mind, we examine the design requirements, our choice of caching schemes and existing mechanisms.

2.1.1 Requirements

The first requirement in our design of DBCache is that neither the application nor the underlying database schema should have to change. Firstly, it is desirable that the decision to deploy a DBCache can be made for an arbitrary shrink-wrapped application by local administrators who do not have access to the application source code. Secondly, requiring the application to be cognizant of the DBCache would result in increased complexity, which is undesirable especially given that cost and maintainability are already major problems in such environments. We aim to make it easy for database administrators to set up the cache database schema, and to make the DBCache transparent to the applications at run time.

The second requirement is to support reasonable update semantics. Update transactions increase resource contention at the backend database server as well as the cost for cache consistency maintenance. Fortunately, E-commerce applications have high browse-to-buy ratios (read-dominant) and high

2.1.3 Leveraging Existing Mechanisms

Having decided on table level caching, our problem is the following: