2.5 Main Memory Databases (MMDB)
BUILDING HIGH PERFORMANCE MAIN MEMORY WEB DATABASES

A Thesis Presented to The Faculty of Graduate Studies of The University of Guelph
by MIN JIANG
In partial fulfillment of requirements for the degree of Master of Science
April, 2001
© Min Jiang, 2001

The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats. The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.

ABSTRACT

Min Jiang, University of Guelph, 2001
Advisor: Professor Fangju Wang

Disk resident databases (DRDBs) dominate today's Web sites. DRDBs incur high costs in loading programs and reading data from disk, which result in long response times. In this research, we study how main memory Web databases (MMDBs) can help reduce query processing time on the server side and consequently response time, which is a major indicator of Web server performance. We develop and experiment with three simulation models.
The Servlet-MMDB model uses Java servlets to simulate main memory Web databases; the Module-MMDB model uses the server API to simulate main memory Web databases; and the CGI-DRDB model uses the CGI protocol to simulate disk resident databases for comparison. We discuss the implementation details of each model. We quantitatively study and analyze the experimental results, and compare each of the two main memory database models with the disk resident database model. Finally, we give our recommendations for choosing the best implementation strategy.

First of all, I would like to thank my advisor, Dr. Fangju Wang, for his encouragement and guidance, which have been indispensable in completing this work. I would also like to thank my committee members, Dr. David K. Y. Chiu, Dr. Wlodek Dobosiewicz and Dr. Michael Wirth, for their valuable suggestions. I am indebted to my parents. I thank them for their love and support. Last but not least, I thank my wife, Jing Ye, for her love, patience and support.

CONTENTS

1 INTRODUCTION
1.1 Motivations of the Research
1.2 The Research Objectives
1.3 Thesis Organization
2 BACKGROUND AND RELATED WORK
2.1 Network Computing
2.2 World Wide Web (WWW) Architectures
2.3 Web Pages and Backend Databases
2.4 Techniques for Improving the Web Performance
2.4.1 Caching static pages
2.4.2 Caching dynamic pages
2.4.3 Clusters of servers and a DNS distributor
2.4.4 Replication of Web servers
2.4.5 Multicast delivery of Web pages
2.4.6 Improving HTTP latency
2.5 Main Memory Databases (MMDB)
2.5.1 Concurrency control
2.5.2 Backup and recovery
6 CONCLUSIONS
6.1 Achievements
6.2 Future Work
Bibliography

List of Figures

Figure 2.1 The Web architecture
Figure 2.2 Requesting and serving a static Web page
Figure 2.3 Requesting and serving a dynamic Web page
Figure 2.4 The MMDB architecture
Figure 3.1 CGI scripting
Figure 3.2 An overview of the CGI-DRDB model
Figure 3.3 Execution of a CGI script
Figure 3.4 Process Structure with CGI script Execution of a CGI script
Figure 3.5 N concurrent visits handled by the CGI protocols
Figure 3.6 Life cycle of a servlet
Figure 3.7 Servlet flow of execution
Figure 3.8 An overview of the Servlet-MMDB model
Figure 3.9 N concurrent visits handled by a servlet
Figure 3.10 The Apache server life cycle
Figure 3.11 An overview of the server API Module-MMDB model
Figure 3.12 N concurrent visits handled by a server API module
Figure 4.1 Process overview of raw map data to query data sets
Figure 4.2 The flow of database preloading process
Figure 4.3 Execution Flow Web Applications
Figure 5.1 Response time and query processing time of CGI-DRDB model
Figure 5.2 Response time and query processing time of Servlet-MMDB model
Figure 5.3 Response time and query processing time of Module-MMDB model
Figure 5.4 Response time for the three models
Figure 5.5 Query processing time for the three models
Figure 5.6 Response time ratio
Figure 5.7 Query processing time ratio
Figure 5.8 Response time and query processing time of CGI-DRDB model
Figure 5.9 Response time and query processing time of Servlet-MMDB model
Figure 5.10 Response time and query processing time of Module-MMDB model
Figure 5.11 Response time for three models
Figure 5.12 Query processing time for three models
Figure 5.13 Response time ratios
Figure 5.14 Query processing time ratio

CHAPTER 1 INTRODUCTION

1.1 Motivations of the Research

In the last few years, the World Wide Web (WWW or Web) has been growing exponentially, and it has changed our lives dramatically.
In the early stage, the Web allowed users to publish and retrieve information easily via hypertext interfaces. What they published and retrieved were static Web pages, which are stored in the file system of the machine running the Web server. Nowadays, people shop, bank and trade stocks online through the Web, and these have become daily activities. Interactivity, frequent updates and searching/querying are among the most attractive features of a modern Web site. These features are made possible only by the dynamic generation of Web pages with database support. To generate dynamic content, we need Web databases to organize, store and retrieve data; we also need server-side scripting technology, a commonly used approach to interfacing with Web databases, performing queries, retrieving data dynamically from databases and returning results to the client.

When querying a remote Web database, the response time includes the time for transmitting the request and receiving the result over the network, and the time required by the server to process the request. The server processing time mainly comprises the time for loading the server's external program, compiling it, and executing it; invoking a database application program (establishing a database connection) if the server's external program communicates with a database; retrieving data; and manipulating the data. Traditionally, the time required to transmit data was the dominant component of the response time. Currently, with the development of wideband network technologies, more and more high-speed networks are being used for the Internet and intranets (for example, giganets). Data transmission (especially on many LANs and intranets) takes less and less time. Therefore, of the total response time, the proportion of server processing