An Empirical Study of SQL Query Optimization Techniques a Seminar Paper

An Empirical Study of SQL Query Optimization Techniques A Seminar Paper Presented to Dr. Yan Shi University of Wisconsin-Platteville In Partial Fulfillment Of the Requirement for the Degree Master of Science in Computer Science By Megha Patel Year of Graduation - 2019 Acknowledgement This research is part of my graduate study at the University of Wisconsin-Platteville. I thank the MSCS graduate studies program for giving me an opportunity to make this seminar paper happen. I thank Dr. Shi Yan, Associate Professor of Computer Science and Software Engineering at the University of Wisconsin-Platteville for assistance with completing this seminar paper and for comments that greatly improved the manuscript. With all support and guidance of Dr. Shi, I made this seminar paper happen successfully, and I am very thankful for Dr. Shi. Abstract It is necessary to have databases and datasets in all the applications, websites and programs to support their Modern Information System (MIS). These MIS systems use Relational Database Management System (R-DBMS) to store data and Procedure Language/Structured Query Language (PL/SQL) to retrieve, add or generate data. The SQL query may not perform well on big data or complex data tables. Hence, it is important to use improved SQL queries. This paper describes different SQL query optimization methods based on different scenarios and examples. This paper also describes parse trees for each technique. It includes how the parse tree is affected by optimized query. It also shows the difference between parse trees before applied optimized method and after. Table of Contents i. Approval Page ii. Title Page iii. Acknowledgement iv. Abstract v. Table of Contents 1. Introduction: An Overview of SQL Query Optimization 2. Preliminary 3. SQL Query Optimization Techniques to Study 3.1. Using Index 3.2. Using SQL Variables 3.3. Deter Full Table Scan (FTS) 3.4. Identical Statement Processing 3.5. Making SQL Queries Reusable 3.6. Reducing Sub Queries 3.7. Reducing Characteristics 3.8. Dead Code Elimination 3.9. Break Down SQL Queries 4. Conclusion 5. References vi. List of Figures Fig.1 Query Execution Example. Fig. 2 Query Processing Steps. Fig.3 Parse Tree for the query without using Index. Fig.4 Parse Tree for the query using Index. Fig. 5: Parse tree with WHERE clause. Fig. 6: Parse tree without WHERE clause. Fig. 7: Parse tree for different data values query. Fig.8: Query trees for query statement. Fig. 9: Parse tree for Reusable query. Fig. 10: Parse tree for not reusable query. Fig. 11: Parse tree for not reusable query. Fig.12: Parse tree for query without dead code. Fig.13: Parse tree for query with dead code. An Empirical Study of SQL Query Optimization Techniques Megha Patel Master of Science – Computer Science University of Wisconsin, Platteville 1. Introduction: An Overview of SQL applicable and what type of database and Query Optimization queries would be good for the optimization. All the applications, websites and 2. Preliminary programs require database and datasets to support their operations in the Modern Let’s consider an example of simple Information Systems (MIS). These Modern query for the dataset shown in fig. 1. Information Systems store large amounts of Query: Returning the rooms which cost less data that typically use relational databases. than $130. Relational databases use PL/SQL (Procedure Π Room (Hotel Hotel_Name=HotelRoom σ Language/Structured Query Language) to Price<$130 Room_Type=Room (Price)) retrieve, add or generate the data. To improve Hotel_Name City the applications, websites and programs Fairfield Madison performance, it is important to use improved Best Western Verona PL/SQL queries in this era. SQL query on a (a) Hotel large dataset may not keep up with the performance requirements so that it needs to Hotel_Name Room_Type optimize the SQL query. Fairfield King There are too many existing query Fairfield Queen optimization methods, so it is difficult to Fairfield 2 Queens choose one best method for practitioners. This Best Western King seminar paper describes the different existing Best Western Queen (b) Room techniques for optimizing the SQL queries. It explains which technique suits best in different Room_Type Price scenarios. King $120 This seminar paper contains a brief Queen $110 description of the optimization methods and 2 Queens $140 detailed explanations with examples of the (c) Price SQL query optimization methods. This seminar paper also contains the analysis of the Room optimization methods and how much query is King optimized. For example, there is the table Queen EMPLOYEE has X columns and Y records. (d) Result How is the query tree and parse tree affected using each optimization method? It also explains when the optimization methods are next step of query processing. During this Result Hotel Room Price stage, if any errors get detected, the process Room Hotel_ City Hotel_ Room_ Room_ Price Name Name Type Type will stop and return the errors. King Fairfield Madison Fairfield King King $120 Queen Best Verona Fairfield Queen Queen $110 Western (e) Relational encoding of query Fig.1 Query Execution Example. The query retrieves data based on asked criteria. This query accesses the data of the tables name Hotel, Room and Price. SQL statements can be used to write the query. The SQL statement can control transactions, program flow, connections, or sessions. In the programs, SQL statements are used for sending queries from client programs to database servers. SQL statement can be as: Fig. 2 Query Processing Steps SELECT Room FROM Hotel H JOIN Room R The next stage is algebraizing, which ON H.Hotel_Name = R.Hotel_Name can also be referred to as binding. During this JOIN Price P stage, the SQL server performs several ON R.Room_Type = P.Room_Type operations on the parse tree and generates a WHERE P.Price < $130; query tree which passes to the query optimizer. This stage performs several tasks like name This SQL statement sends the query from client resolution, type derivation, aggregate binding program to the database server, and it retrieves and group binding. Name resolution confirms the data from the server. The tables will be that used table and column names are existing. joining by their relationship by using query as Type derivation determines the final type of shown in fig. 1(e) and gives the output of the each parse tree node. Aggregation binding rooms that costs less than $130 shown in fig. checks where to apply aggregation, and Group 1(d). binding binds the appropriate selected list. The SQL server performs in four steps During this stage, if there is any syntax error it as shown in fig. 2 for the SQL query statement will be detected. The process would be halted, process. The parsing step performs some basic and it would return the errors to the user. checks on the source code. It looks for invalid The SQL server finds the best way to SQL syntax such as incorrect column and table execute query using the optimizer. It takes data name, incorrect use of reserved words, etc. statistics as rows and unique data in rows if the When parsing completes successfully, it table increases in size more than one page. generates a parse tree which is an internal Using these statistics, a cost-based plan is representation of the query, and it passed to prepared. Every cost-based plan is prepared using CPU and I/O resources. Among these SELECT * FROM EMPLOYEE plans, optimizer chooses the best plan to WHERE LASTNAME = ‘ANDERSON’; execute. Finally, the last stage executes the execution plan sent by the query optimizer. This example shows the use of an index on the These query processing steps show the table EMPLOYEE to retrieve the data of all the importance of the optimized query usage. The employees with last name Anderson. The table next section will discuss different methods of EMPLOYEE has large amounts of data. It query optimization techniques. would be more convenient to use an index to retrieve data. Without using an index, the select 3. SQL Query Optimization query will force the parser to parse all the data Techniques to Study of the columns which takes more time for query execution. Table data can be indexed by all the 3.1. Using Index predicates by clauses like JOIN, WHERE, GROUP BY, ORDER BY to optimize the SQL Indexes are used to view data from tables query. In the example, WHERE clause is used quickly. We can compare indexes in SQL with to optimize the query. Improper indexes can an index in a book. For example, if we need to cause full table scans which could result in find any specific topic from the chapter, we can locking or performance problem. search for the chapter and the topic name into This optimization technique would be more the index of a book with less time and effort. efficient for the database that has more than one Same as that, an index can be used to find and thousand data. For a query tree, this technique view data from tables more efficiently. may increase the number of nodes but using There are three ways to create an index in index technique reduces the nodes in a parse SQL query: unique, clustered and non- tree. The parse tree for the example shown in clustered. A unique index has no permission to this section 3.1 could be as shown in fig. 3 and have same index key value for two rows. The Fig. 4. clustered index creates an index in which the logical order of the key values determines the physical order of the corresponding rows in the table, and the non-clustered index specifies the logical order of a table but physical order of the data rows are independent to their logical indexed order.

Load more