SQL Query Cache for the MySQL Database System
Masaryk University
Faculty of Informatics

Master Thesis

Martin Klemsa

Brno, April 2002

I hereby declare that this thesis is my own genuine copyrighted work, which I wrote entirely by myself. All sources and literature that I used or consulted are properly cited and their complete references are given.

Acknowledgements

I would like to thank Jan Pazdziora, the advisor of my thesis, for his help throughout the work. I am very grateful for his advice and valuable discussions. I would also like to thank Michael “Monty” Widenius, the MySQL chief developer, for a never-ending stream of suggestions. Thanks also go to my family and my friends for their support while I was working on this thesis.

Abstract

MySQL is a free open source database server. To increase its performance, this paper presents an SQL query cache enhancement for the server. The new feature aims to increase the server's response speed, reduce its load, and save its resources by storing the results of evaluated selection queries and retrieving them when the same queries occur again. Previous work on related subjects is reviewed. A system overview of the developed SQL query cache is presented together with a discussion of the problems encountered during its design and implementation. A comparison with the newly released version of MySQL, which contains a built-in query cache, is also given. The paper concludes with benchmarks showing the performance increase of the modified server on repeated selection queries and with a discussion of possible further enhancements and refinements.

Keywords: database, caching, hashing, MySQL, SQL, query cache

Contents

1 Introduction
  1.1 Data retrieval
  1.2 Structured Query Language
  1.3 The MySQL database server
    1.3.1 SQL query cache enhancement
  1.4 The included CD-ROM
  1.5 Contents overview
2 Caching
  2.1 CPU caching
  2.2 World Wide Web caching
  2.3 Database web server query caching
    2.3.1 Active query caching
  2.4 Database query caching
  2.5 Database query cache requirements
    2.5.1 Memory requirements
  2.6 Differences between various types of caching
3 Hashing
  3.1 Collisions
    3.1.1 Open addressing
    3.1.2 Separate chaining
    3.1.3 Growing hashing tables
  3.2 The internal MySQL hashing table
    3.2.1 Fowler/Noll/Vo hash
4 System Overview
  4.1 The SQL query cache design
  4.2 The MySQL source code
  4.3 Server – client communication
    4.3.1 Sending fields
    4.3.2 Sending results
  4.4 Invalidation
    4.4.1 Naive approach
    4.4.2 Gradual refinement
  4.5 Preventing parsing
    4.5.1 Table list entries duplicities
  4.6 Testing queries
    4.6.1 Sending modified data
    4.6.2 The mar_sql_not_cacheable flag
    4.6.3 Temporary tables
    4.6.4 Special functions
    4.6.5 Procedures and UDF functions
    4.6.6 Variables
    4.6.7 MERGE table types
  4.7 Special MySQL options
  4.8 Caching empty results
  4.9 Caching more queries
  4.10 Generating hash key of stored items
    4.10.1 Active database name
    4.10.2 MySQL environment variables
  4.11 Memory limits
    4.11.1 sql_cache_memory_limit
    4.11.2 sql_cached_query_memory_limit
    4.11.3 Determining cache size
  4.12 Cache replacement algorithm
  4.13 Fine-tuning the cache
    4.13.1 Ensuring thread safety
    4.13.2 Debugging compliance within MySQL
    4.13.3 Cache disabling situations
  4.14 The SQL query cache user interface
    4.14.1 Compile time options
    4.14.2 Command line options
    4.14.3 Client commands
5 Comparison with MySQL 4.0.1-alpha
  5.1 Differences
    5.1.1 Temporary tables
  5.2 Advantages of the built-in query cache
  5.3 Disadvantages of the built-in query cache
  5.4 Functionality errors
  5.5 Comparison conclusion
6 Benchmarks
  6.1 bench count distinct
  6.2 test alter table
  6.3 test ATIS
  6.4 test big tables
  6.5 test connect
  6.6 test create
  6.7 test different select
  6.8 test repeated select
  6.9 test select
  6.10 test wisconsin 100
  6.11 Benchmarks conclusion
7 Discussion and Conclusion
A Installation
B Classes used
  B.1 Class THD
  B.2 Class mar_sql_cache
  B.3 Class mar_sql_cache_item
  B.4 Class mar_sql_cache_packet
  B.5 Class db_plus_table
  B.6 Class db_table_info

Chapter 1 Introduction

1.1 Data retrieval

Storing and retrieving data has been one of the computer's duties since the beginning of computing. Databases (storage facilities for data) and computer languages for working with them have been evolving ever since. As time goes on, more and more speed is required of data retrieval processes. There are fast rotating hard drives, high speed memory units and lightning quick processors. But there is also more data to work with, more users to serve, and larger retrievals demanded.

Data is pulled from the database by submitting a query, an expression in a computer language that tells the database system to find some data and send it to the client. Evaluating such queries on large databases can be very demanding on server resources. Ways are therefore sought to reduce data retrieval latency, minimize the server load, and thus enhance the performance of these systems.

One of the methods (applicable to probably every job a person or a computer ever has to do) is to avoid unnecessarily repeating work that has already been done. This can be achieved (at least in the case of a database server) by storing the retrieved data in a place from which it can be fetched much faster than by pulling it from the database again.
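The idea of keeping an already retrieved result at hand can be illustrated with a minimal sketch in Python. It is not the implementation developed in this thesis: the names QueryResultStore, fetch and fake_database are hypothetical and chosen only for illustration, the "database" is a stand-in function, and the sketch ignores what has to happen when the underlying data changes.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple[str, ...]


class QueryResultStore:
    """Keeps results of already evaluated queries, keyed by the query text.

    Hypothetical illustration only; it never discards or refreshes results.
    """

    def __init__(self, evaluate: Callable[[str], List[Row]]) -> None:
        self._evaluate = evaluate                  # slow path: ask the database
        self._results: Dict[str, List[Row]] = {}   # stored results
        self.evaluations = 0                       # how often the slow path ran

    def fetch(self, query: str) -> List[Row]:
        # Evaluate the query only if its result is not stored yet.
        if query not in self._results:
            self.evaluations += 1
            self._results[query] = self._evaluate(query)
        return self._results[query]


def fake_database(query: str) -> List[Row]:
    # Stand-in for a real server (assumption): always returns the same row.
    return [("Monty",)]


store = QueryResultStore(fake_database)
first = store.fetch("SELECT name1 FROM t1")
second = store.fetch("SELECT name1 FROM t1")
```

In this sketch the second call returns the stored rows without evaluating the query again, so the evaluation counter stays at 1.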
A possible solution is to store the data in a predefined area of the computer's memory, where it can wait to be used again. Such an area is called the cache and the technique is called caching.

1.2 Structured Query Language

Perhaps the most popular language for retrieving data from a database today is SQL (Structured Query Language) or one of its variants. Queries in SQL have a more or less intuitive syntax, e.g.

    SELECT * FROM t1 WHERE t1.name1 = "Monty";

returns all entries from table t1 whose attribute name1 equals Monty. It is also possible to nest queries, e.g.

    SELECT * FROM t1 WHERE t1.name1 IN /* nested query begins here */ (SELECT name2 FROM t2);

which returns all entries from table t1 whose attribute name1 is in the set of all name2 attributes from table t2. Of course, a wide variety of other queries can be expressed in SQL, but their full breadth is not important for this thesis. The queries that are needed are mentioned and briefly explained where it matters.

1.3 The MySQL database server

Many database servers are used worldwide, and these systems, if commercial, are very costly. Non-commercial systems came into being to make database usage available to ordinary users who are not backed by an organization that would finance the use of a commercial database system. Another reason for their development could be the increasingly competitive environment in this field and the uncovering of possible weaknesses in the commercial products.

MySQL is one of these free open source systems. The project was begun in the 1980s by one Finn, Michael “Monty” Widenius, and two Swedes, David Axmark and Allan Larsson. Its increasing popularity is illustrated by some of the companies that use it: Yahoo! Finance, MP3.com, Motorola, NASA, Silicon Graphics, and Texas Instruments [1]. When work on this thesis began, the current version of MySQL was 3.23.33.
During the work, version 4.0.0-alpha was released. Neither of these versions had a query cache implemented.

1.3.1 SQL query cache enhancement

Michael Widenius, the MySQL chief developer, considered adding a query cache to the server a matter of the highest importance, and contacted (among others) Jan Pazdziora, the advisor of this thesis, with a call for proposals and opinions. Thus the SQL query cache became available as a master thesis topic at the Faculty of Informatics and was picked by the author.

The main requirements on the SQL query cache (as the query cache created here is called throughout this work) were correct functionality on as many queries as possible and a server performance gain where it is intuitively expected, namely when clients submit the same query repeatedly. Another requirement was proper behavior in a multithreaded environment, as MySQL is written as a multithreaded program.

1.4 The included CD-ROM

The CD contains all the code written in the course of this thesis. It can be used either as a patch (see the patches/ directory, where all versions of the SQL query cache can be found; the latest version is 2.7), or as a copy of the sql/ directory (for MySQL 4.0.0-alpha) or of the sql-3.23.33/ directory (for the older MySQL 3.23.33).