A New Mathematical Model for Unsorted Database Search

Vivek Kumar¹ and Sandeep Sharma²
¹,² Department of Electronics and Communication Engineering, Dehradun Institute of Technology, Dehradun-248009, Uttarakhand, India
[email protected], [email protected]

An unsorted database consists of N records, of which only one is of particular interest. Classical sequential search and quantum search algorithms on such a database have upper-bound complexities of O(N) and O(√N), respectively. We describe a new approach that uses simple arithmetic and comparison operations to search for a record with upper-bound space and time complexities of O(N) and O(√N), respectively.

A record in a sorted database can be found either by binary search or through hash tables, both of which give faster lookups [1]. However, most real-world data arrive in random order, so the database elements are sorted before searching. The lower-bound complexity of known sorting algorithms such as timsort [2], cubesort [3] and shellsort [4] reaches O(N), and binary search itself costs O(log N), so the cost of sorting has to be added to the cost of any search that relies on it. For an unsorted database, only classical sequential search is used in practice, at an asymptotic complexity of O(N).

Here we present a new approach that covers the complete database while searching for a particular record, within upper-bound asymptotic space and time complexities of O(N) and O(√N), respectively. Consider a database of size N traversed by a loop that fetches √N records at a time and advances its index by √N on each new run, so that the complete database is traversed in √N runs. We derive a condition (see Methods for the derivation) that tests whether the particular record is present among the fetched records, at an asymptotic complexity of O(√N) for each of the √N runs.
The condition acts as a test of whether any of the fetched records is the one of interest. If any of the √N records satisfies the condition, the loop breaks and a second loop traverses those √N records, comparing each one sequentially with the record being searched for, in O(√N). Hence the final time complexity of the algorithm is O(√N) × O(√N) + O(√N) = O(N) + O(√N) = O(N).

For ease of understanding and demonstration, we replace √N by N/2. Consider a record c to be searched for, along with content-storing functions f(N/2) and g(N/2). On each fresh run a single value is stored in each function, giving a space complexity of O(1). With these conventions we obtain the following comparison condition:

\[
\mathrm{float}\!\left(\frac{1}{1 + \mathrm{int}\!\left(\dfrac{\left(\left(f(N/2)\right)^{2} - c^{2}\right)^{2}}{\left(\left(f(N/2)\right)^{2} - c^{2}\right)^{2} - 1}\right) + \mathrm{int}\!\left(\dfrac{\left(\left(g(N/2)\right)^{2} - c^{2}\right)^{2}}{\left(\left(g(N/2)\right)^{2} - c^{2}\right)^{2} - 1}\right)}\right) \neq \mathrm{float}\!\left(\frac{1}{3}\right)
\]

The two factors inside the integer casts are denoted F(N/2) and G(N/2):

\[
F(N/2) = \frac{\left(\left(f(N/2)\right)^{2} - c^{2}\right)^{2}}{\left(\left(f(N/2)\right)^{2} - c^{2}\right)^{2} - 1},
\qquad
G(N/2) = \frac{\left(\left(g(N/2)\right)^{2} - c^{2}\right)^{2}}{\left(\left(g(N/2)\right)^{2} - c^{2}\right)^{2} - 1}
\]

Both factors are cast to integer (int) so that the results of F(N/2) and G(N/2) have no fractional part. The final values on the left-hand and right-hand sides must be decimal values and are therefore cast to float. The complete condition is evaluated on each fresh run of the loop (here, N/2 times) and can be true only if either f(N/2) or g(N/2) holds the required record c. Once it is satisfied, the program breaks out and enters a second loop that checks each of the N/2 records sequentially, retrieving the position of the required record within an asymptotic time complexity of O(N/4), which is equivalent to O(N). The proposed algorithm works for both positive and negative numbers. Decimal records, however, raise concerns that are left to future work.
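The behaviour of the comparison condition can be checked by hand. The following worked sketch assumes integer records and introduces the shorthand d and x, which is not part of the original text.

```latex
% Shorthand (assumed here): d = (f(N/2))^2 - c^2, x = d^2, so F(N/2) = x / (x - 1).
\begin{align*}
f(N/2) = \pm c &\;\Rightarrow\; x = 0, \quad F = \frac{0}{-1} = 0, \quad \mathrm{int}(F) = 0,\\
|d| \ge 2 &\;\Rightarrow\; x \ge 4, \quad 1 < \frac{x}{x-1} \le \frac{4}{3}, \quad \mathrm{int}(F) = 1,\\
|d| = 1 &\;\Rightarrow\; x - 1 = 0, \quad F \text{ is undefined.}
\end{align*}
% Example with c = 9, f(N/2) = 5, g(N/2) = 9:
%   F = 3136 / 3135, int(F) = 1;  G = 0 / (-1), int(G) = 0.
%   LHS = 1 / (1 + 1 + 0) = 1/2, which differs from 1/3, so the condition is true.
% With no match in either function: LHS = 1 / (1 + 1 + 1) = 1/3, so the condition is false.
```

Two points follow from the expression itself: a stored value of −c zeroes the factor just as c does, so the sequential loop is what finally confirms the record, and the case |d| = 1 (for example c = 0 with a stored value of 1) would need separate handling.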
Because decimal records are not yet handled, the algorithm is currently not recommended for databases of decimal records. The final generalized comparison condition is

\[
\mathrm{float}\!\left(\frac{1}{1 + \displaystyle\sum^{\sqrt{N}} \mathrm{int}\!\left(\dfrac{\left(\left(f(\sqrt{N})\right)^{2} - c^{2}\right)^{2}}{\left(\left(f(\sqrt{N})\right)^{2} - c^{2}\right)^{2} - 1}\right)}\right) \neq \mathrm{float}\!\left(\frac{1}{1 + \sqrt{N}}\right)
\]

where the sum runs over the √N records fetched in the current run. This function checks √N records on each fresh run with an asymptotic space complexity of O(N), giving final space and time complexities of O(N) and O(√N), as stated.

References
[1] Knuth 1998, §6.2 ("Searching by Comparison of Keys").
[2] Peters, Tim. "[Python-Dev] Sorting". https://mail.python.org/pipermail/python-dev/2002-July/026837.html. Retrieved 2 June 2016.
[3] Cypher, Robert; Sanz, Jorge L. C. (1992). Cubesort: A parallel algorithm for sorting N data items with S-sorters.
[4] Pratt, Vaughan Ronald (1979). Shellsort and Sorting Networks. Garland. ISBN 0-8240-4406-1.

Methods
Derivation of algorithm complexity: The proposed algorithm consists of two loops. The first contains the proposed comparison condition and the second a sequential comparison. We analyze the complexity of each loop in turn. The complete proposed algorithm is given below:

    Initialize i, j, rec
    Given an unsorted database array A
    Retrieve the record to be searched for and save it in rec

    // First loop (proposed comparison condition)
    for i = 0; run √N times; increment i by √N
        if (float)(1 / (1 + (int)(((A[i])² − rec²)² / (((A[i])² − rec²)² − 1)) + ⋯ + (int)(((A[i+√N−1])² − rec²)² / (((A[i+√N−1])² − rec²)² − 1)))) != (float)(1 / (1 + √N))
            break
        end if
    end for

    // Second loop (sequential comparison)
    for j = 0; run √N times; increment j by 1
        if A[i] == rec
            Record found at the i-th position
            break
        end if
        increment i by 1
    end for

Complexity analysis of the former loop: Ignoring the inner condition, the loop performs √N runs in total, which gives the first complexity factor, √N.
Next, the proposed condition is responsible for the major change in both space and time complexity. Because it must hold √N records for analysis, the asymptotic space complexity becomes O(√N). To analyze the complexity of factors such as ((A[i])² − rec²)² / (((A[i])² − rec²)² − 1), we drop the constants (here, rec and 1), which reduces the factor to terms in (A[i])⁴ that are equal in numerator and denominator. A single factor therefore costs O(1). However, √N such factors are summed, which finally gives a complexity of O((N+√N)/2) = O(N). To obtain the overall complexity of the former loop, we multiply the two complexities: O(√N) × O(√N) = O(N).

Complexity analysis of the latter loop: The latter loop performs a sequential comparison of √N records, giving an asymptotic space complexity of O(1) and an asymptotic time complexity of O(√N).

Overall, the asymptotic space complexity becomes O(√N) + O(1) = O(√N) and the asymptotic time complexity becomes O(N) + O(√N) = O(N).
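The two loops analyzed above can be sketched in Python as follows. This is a minimal illustration for integer records, not the authors' implementation: the names `factor`, `block_flagged` and `search` are hypothetical, and the sketch makes two assumptions that are not in the pseudocode. It treats the undefined case (denominator equal to zero) as a non-match, and it moves on to the next block when a flagged block turns out not to contain the record, which can happen when the block holds −rec.

```python
import math
from typing import Optional, Sequence


def factor(a: int, rec: int) -> int:
    """Integer part of x / (x - 1), where x = (a^2 - rec^2)^2."""
    x = (a * a - rec * rec) ** 2
    if x == 1:
        # Assumption (not in the source): the expression divides by zero
        # here, so the record is treated as a non-match.
        return 1
    # x is 0 (quotient 0) or at least 4 (quotient 1), so floor division
    # agrees with the int() truncation used in the paper's condition.
    return x // (x - 1)


def block_flagged(block: Sequence[int], rec: int) -> bool:
    """Comparison condition: true when some factor in the block is zero."""
    total = sum(factor(a, rec) for a in block)
    return 1.0 / (1 + total) != 1.0 / (1 + len(block))


def search(A: Sequence[int], rec: int) -> Optional[int]:
    """Return the index of rec in A, or None if it is absent."""
    n = len(A)
    if n == 0:
        return None
    step = max(1, math.isqrt(n))  # block size of about sqrt(N)
    # First loop: test one block of about sqrt(N) records per run.
    for i in range(0, n, step):
        block = A[i:i + step]
        if block_flagged(block, rec):
            # Second loop: sequential comparison inside the flagged block.
            for j, value in enumerate(block):
                if value == rec:
                    return i + j
            # Assumption: a flagged block without rec (it held -rec)
            # does not end the search; continue with the next block.
    return None
```

For example, `search([7, -3, 12, 5, 9, 0, -8, 4, 11], 9)` returns 4. Each record still enters one arithmetic factor, so the sketch touches every record once in the first loop, in line with the O(N) overall time derived above.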
