Index Selection for In-Memory Databases

International Journal of Computer Sciences &&& Engineering Open Access Research Paper Volume-3, Issue-7 E-ISSN: 2347-2693 Index Selection for in-Memory Databases Pratham L. Bajaj 1* and Archana Ghotkar 2 1*,2 Dept. of Computer Engineering, Pune University, India, www.ijcseonline.org Received: Jun/09/2015 Revised: Jun/28/2015 Accepted: July/18/2015 Published: July/30/ 2015 Abstract —Index recommendation is an active research area in query performance tuning and optimization. Designing efficient indexes is paramount to achieving good database and application performance. In association with database engine, index recommendation technique need to adopt for optimal results. Different searching methods used for Index Selection Problem (ISP) on various databases and resemble knapsack problem and traveling salesman. The query optimizer reliably chooses the most effective indexes in the vast majority of cases. Loss function calculated for every column and probability is assign to column. Experimental results presented to evidence of our contributions. Keywords— Query Performance, NPH Analysis, Index Selection Problem provides rules for index selections, data structures and I. INTRODUCTION materialized views. Physical ordering of tuples may vary from index table. In cluster indexing, actual physical Relational Database System is very complex and it is design get change. If particular column is hits frequently continuously evolving from last thirty years. To fetch data then clustering index drastically improves performance. from millions of dataset is time consuming job and new In in-memory databases, memory is divided among searching technologies need to develop. Query databases and indexes. Different data structures are used performance tuning and optimization was an area of for indexes like hash, tree, bitmap, etc. Collision interest for many researchers. Query performance can be avoidance should be considered in hash indexes. Indexes optimize either by query reformation or index allocates some space, so it is important to select indexes recommendation. In Query reformation, compound which will yield profit and leads to less overhead for queries (nested queries) are converted into simple queries index maintenance. In composite indexes, order of and join operations are restructure to reduce columns have major impact on complete system computational cost. Database Engine performs query performance. Compound indexes used only when reformation by converting one operator into other. For particular set of columns are hits simultaneously in query. example, BETWEEN operator is converted into IN For example, supposed column A and column B are operator, if number of values in IN operator are less. OR considered as compound indexes AB. If queries get fired operator is try to convert into AND operator because both on column A or on column A and B then compound index predicates are evaluated separately that means two times AB gives optimal results. If predicate in queries fired on table scan need to do. In general, database engine suggest column B then compound index is not used for user query. indexes from candidate keys. But it may not be Tuples retrieval from millions of records is difficult so considered in user query, so complete table scan is optimized searching methods are used. Generally perform. And also there is cost associated with index indexing is perform on candidate keys and selected table maintenance so number of indexes and type of data columns contributes in query execution [1]. Index structures are need to choose optimally. Selection Problem is NP-Hard problem. If there are N columns then number of indexes are power set of N i.e. Initially databases are solely stored in disk but due to fast N data requirement and by considering limitations of disk, 2 . Indexes required some maintenance time and allocates in-memory databases are evolved. Searching specific set memory. If insert and update queries hit frequently then of tuples from millions of tuples which satisfies input overhead is more. Researchers found ISP resembles query is difficult. Automatic Index recommendation in Knapsack problem and knapsack. Knapsack allows to databases will reduce overall query execution cost. CRUD select indexes which provides maximum profit to user operations may force to change indexing structure so queries. Genetic algorithm is stochastic search, use when selection of indexes should be done carefully. Also there is search space is large and space increasing exponentially. maintenance cost associated with indexing. Number of It is highly recommended when to deal with non-linear index selection is depend upon application requirement. optimization. It follows Darwin’s “survival of fittest” For experimentation, in-memory databases are used. SQL model. Following table discussed different searching like query language used from user queries. techniques for ISP. Select type of index that best fits application specification. II. RELATED WORK Index characteristics include the following: Database tables contains millions of records and indexes • Clustered versus nonclustered helps to reduce overall searching time. Researchers © 2015, IJCSE All Rights Reserved 129 International Journal of Computer Sciences and Engineering Vol.-3(7), PP( 129-132) July 2015, E-ISSN: 2347-2693 • Unique versus nonunique computing ISP by Genetic Algorithm (GA). Objective function explained with time complexity and binary • Single column versus multicolumn encoding. Wearable devices and sensors generating data so frequently that fast searching algorithms need to derive. One can summarized from above literatures, ISP can be With evolving databases and changing user requirements, solve either by linear programming or by non-linear i.e. new query languages comes like NoSQL and NewSQL GA. If rate of growth of data is high then it better to go were developed. with GA. G. Valentin et. al. [2] considered variant of knapsack III. METHODOLOGY problem and suggested best index configuration for query. The paper discussed complete implementation of query Smart column enumeration suggests indexes based on user performance optimization by index recommendation. queries. Solution provided is globally optimal. DB2 work Goals and objectives are as black box index recommendation engine. Various operator combination form within query and remove • To recommend set of indexes that can minimizes duplicates. Also formulated number of indexes possible query execution cost. for given databases. • To calculate query cost estimation. • H. Gupta et. al. [3] provide first model for automatic To perform Space analysis. • index selection by summary table. They rely on To perform Query analysis. • precompute data so that in future queries will give optimal To calculate probability of every column. result. Also provide all possible set of solution for view, • To validate recommended indexes. queries and indexes. Chances of optimal result is very less. System contains database, system block diagram and detail description of algorithms used in each phase. High S. Chaudhari et. al [5] depicted autoAdmin model for dimensional database i.e number of columns more than 7 physical design and provided what-if interface method. and having different data types were considered for Very well explanation of materialize view for index implementation. Also database contains millions of rows. selection. Considered multicolumn indexes with Another Tool For Language Recognition (ANTLR) is enumeration of single column indexes. Explained parser generator that uses LL(*) parsing. LL(1) is top challenges in physical design changes, scaling challenges down parser and lookahead one symbol. ANTLR takes an and how it changes overall system maintenance cost. [11] input a grammar that specific language and generate Focused on exploiting feedback from query execution and output as source code. It generate lexer, parser, tree parser query progress estimation. Specially focused on merging and combine lexer-parser. Parser then generate Abstract and reduction in number of calls from optimizer. Syntax Tree (AST). Following figure shows syntax tree P. Kolackow et. al. [6] solve ISP with genetic for grammar. algorithm by considering Darwin’s survival of fittest model. Genetic algorithm (GA) does not guarantee of optimal solution but converges solution space in one phase. Focus on query execution cost by index recommendation. GA refer chromosomes model by adopting mutation and crossover [4]. S. Chaudhuri [8] resembles ISP with variation of knapsack problem and approach lies in Linear Programming. Experiments done by considering materialize view. Highlighted two sub problems: picking right set of cluster Fig 1. Syntax Tree indexes and non-clustered indexes and referred as hard to approximate. Explained workload with extensive updates. The major steps involved in the proposed system are: 1. Database analysis S. Chaudhuri et. al. [10][13] number of index set evaluated 2. Query analysis by considering both query syntax and cost information. 3. Statistical analysis Find “goodness” for each index set. Iterative complexity 4. Index recommendation analysis done for multicolumn indexes. Depicted ISP as 5. Query execution cost difficult search problem. Efficiency measure on (1) In proposed work, index recommendation done Number of indexes (2) Enumerated number of index dynamically by analyzing both user inputs and databases. selection configuration (3) Number of optimizer calls for Generally, indexes are recommended by database engine each enumeration

Index Selection for In-Memory Databases

Amazon Aurora Mysql Database Administrator's Handbook

Innodb 1.1 for Mysql 5.5 User's Guide Innodb 1.1 for Mysql 5.5 User's Guide

Mysql Table Doesn T Exist in Engine

Migrating Borland® Database Engine Applications to Dbexpress

A Methodology for Evaluating Relational and Nosql Databases for Small-Scale Storage and Retrieval

Digital Forensic Innodb Database Engine for Employee Performance Appraisal Application

Introduction to Databases Presented by Yun Shen ([email protected]) Research Computing

The Design and Implementation of a Non-Volatile Memory Database Management System Joy Arulraj

Introducing Severalnines Clustercontrol™

Mysql Storage Engines Data in Mysql Is Stored in Files (Or Memory) Using a Variety of Different Techniques

Database Buffer Cache • Relation to Overall System • Storage Techniques in Context

Database Engine Tuning Advisor Index Recommendations