JASC: Journal of Applied Science and Computations ISSN NO: 1076-5131

ANALYSIS ON SEARCHING

N.Pavani, M.Sree Vyshnavi, K. Sindhu Reddy, Mr. C Kishor Kumar Reddy and Dr.B V Ramana Murthy.

Stanley College of Engineering and Technology for Women, Hyderabad

ABSTRACT

Searching algorithms are designed to check for an element and retrieve it from the data structure in which it is stored, such as an array. Various searching techniques are analyzed based on their time and space complexity. There are two types of searching: internal searching and external searching. The analysis shows the advantages and disadvantages of various searching algorithms. On analysis, it is found that binary search is suitable for mid-sized data sets stored in arrays, whereas hash search is best for larger data sets.

Keywords: Searching, Internal and External Searching.

1. INTRODUCTION

Searching is a problem that programmers are constantly solving. Not a single day passes in which we do not have to search for something in our daily life. Whenever a user asks for some data, the computer has to search its memory for that data and make it available to the user. The computer has its own techniques for searching its memory very quickly. To search for an element in a given array, several algorithms are available, including:

⦁ Ternary Search

⦁ Binary Search

⦁ Hash Search

Volume VI, Issue I, January/2019 Page No:654 JASC: Journal of Applied Science and Computations ISSN NO: 1076-5131

1.1. Complexity Analysis

Complexity analysis is used to determine the amount of resources (such as time and space) that an algorithm needs in order to execute. There are two types of complexity: time complexity and space complexity.

The time complexity of an algorithm is an analysis of the time required to solve a problem of a particular size. The space complexity of an algorithm is an analysis of the computer memory required. Time complexity is considered in three cases: best, average and worst. Asymptotic notations are languages that allow us to analyze an algorithm's running time by identifying its behavior as the input size increases, which is also known as the algorithm's growth rate. Big Oh describes an asymptotic upper bound and is mostly used to describe the worst case of an algorithm. Big Omega is the opposite of Big Oh: it describes an asymptotic lower bound. When the lower bound and the upper bound of an algorithm coincide, the bound is tight and is written with Big Theta; for example, an algorithm whose running time is proportional to n log n in both the best case and the worst case is Θ(n log n).

2. RELEVANT WORK

Bentley discussed searching algorithms and their advantages and disadvantages, analyzing the different searching algorithms and, in particular, binary search for mid-sized lists. Thomas Niemann worked on searching algorithms on the basis of time and space complexities. Reema Thareja discussed hashing and hash functions. Mark Allen Weiss explained the different searching techniques and their implementation with a few examples. The various searching algorithms are:

2.1. LINEAR SEARCH/SEQUENTIAL SEARCH

Linear search is one of the most basic searching techniques. It searches for an element by examining the elements in sequential order, and is therefore also called sequential search. The basic concept is to compare the target with the elements in the array one by one. If any element in the array matches the number we entered, then the position (index) of that number is printed. Linear search is simple to implement. Its performance differs by case: in the average case, linear search takes about n/2 comparisons, and in the worst case it takes n comparisons, where n is the number of elements in the set.

2.1.1. Algorithm

Linear search is simple: the job of this technique is to keep comparing the target x with the elements a[1], ..., a[n] of the list, one after another. This can be easily understood from the algorithm below.

Step 1: Set the value of i to 1.

Step 2: If i is greater than n, go to Step 7.

Step 3: If a[i] = x, go to Step 6.

Step 4: Increment the value of i by 1.

Step 5: Go to Step 2.

Step 6: Print "The element is found" together with the index i, and go to Step 8.

Step 7: Print "The element is not found".

Step 8: Exit.
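The linear search procedure described above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation; the function name `linear_search` is an assumption, and the sketch uses 0-based indices and returns `None` instead of printing a message.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def linear_search(items: Sequence[T], target: T) -> Optional[int]:
    """Return the 0-based index of the first element equal to target, or None."""
    # Compare the target with each element in turn, from the first to the last.
    for index, value in enumerate(items):
        if value == target:
            return index  # found: stop immediately
    return None  # every element was examined without a match


if __name__ == "__main__":
    print(linear_search([4, 2, 7, 1], 7))
    print(linear_search([4, 2, 7, 1], 9))
```

No assumption is made about the order of the elements, which is why the loop may have to visit all of them.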

Figure.1: Image showing target value through linear search.

2.1.2. Analysis

a) Worst case is the condition at which the target is not found in the list and the order is O(n).

b) Best case is when the required target is found in the first position and the order is O(1).

c) Average case is when the target is found after about n/2 comparisons, and the order is O(n).

2.1.3. Advantages

a) This kind of searching technique is simple to perform.

b) If the element is found at the first position, the search ends immediately, and there is no need to know how many elements there are.

c) This technique takes little time on short lists.

d) In that best case, the running time is independent of the number of elements in the list.

e) The best-case time complexity of linear search is O(1).

2.1.4. Disadvantages

a) It is not suitable for long lists.

b) If the list is long, then in the worst case we need to examine each and every element in the list.

c) The Worst time complexity of the linear search is O(n).

2.2. EXPONENTIAL SEARCH

Exponential search, also called doubling search or galloping search, is an algorithm created by Jon Bentley and Andrew Chi-Chih Yao in 1976 for searching sorted, unbounded/infinite lists. There are many ways to implement it, the most common being to determine the range in which the search key resides and then to perform a binary search within that range. This takes O(log i) time, where i is the position of the search key in the list if the key is present, or the position where the key should be if it is not.

2.2.1. Algorithm

Exponential search allows searching through a sorted, unbounded list for a specified input value (the "search key"). The algorithm consists of two stages. The first stage determines the range in which the search key would reside if it were in the list. In the second stage, a binary search is performed on that range. In the first stage, assuming that the list is sorted in ascending order, the algorithm looks for the first exponent, j, for which the element at index 2^j is greater than the search key.

Figure.2: Target value being searched by exponential search.

In each step, the algorithm compares the search key with the key value at the current search index. If the element at the current index is smaller than the search key, the algorithm repeats, skipping to the next search index by doubling it, i.e., calculating the next power of 2. If the element at the current index is larger than the search key, the algorithm knows that the search key, if it is contained in the list at all, is located in the interval formed by the previous search index, 2^(j-1), and the current search index, 2^j. The binary search is then performed on this interval, and its result is either a failure, if the search key is not in the list, or the position of the search key in the list.
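The two stages described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the list is finite, sorted in ascending order and held in memory; the function name `exponential_search` and the return value -1 for a missing key are illustrative choices. The second stage uses `bisect_left` from the standard library as the binary search.

```python
from bisect import bisect_left
from typing import Sequence


def exponential_search(items: Sequence[int], key: int) -> int:
    """Return the index of key in the ascending sequence items, or -1."""
    n = len(items)
    if n == 0:
        return -1
    if items[0] == key:
        return 0
    # Stage 1: double the probe index until the element there is no longer
    # smaller than the key, or the end of the list is passed.
    bound = 1
    while bound < n and items[bound] < key:
        bound *= 2
    # Stage 2: binary search between the previous probe and the current one.
    lo = bound // 2
    hi = min(bound, n - 1) + 1  # exclusive upper limit for bisect_left
    pos = bisect_left(items, key, lo, hi)
    if pos < hi and items[pos] == key:
        return pos
    return -1


if __name__ == "__main__":
    data = [1, 3, 5, 7, 9, 11]
    print(exponential_search(data, 9))
    print(exponential_search(data, 4))
```

Because the probe index doubles, a key near the front of the list is bracketed after only a few probes, whatever the total length of the list.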

2.2.2. Time Complexity

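The first stage of the algorithm takes O(log i) time, where i is the index where the search key would be in the list, because the probe index doubles at every step. The second stage is a binary search over the interval between 2^(j-1) and 2^j, which contains about 2^(j-1) elements; since 2^(j-1) is at most i, this stage also takes O(log i) time. This gives the algorithm a total runtime, calculated by summing the runtimes of the two stages: O(log i) + O(log i) = 2 O(log i) = O(log i).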

2.2.3. Advantages

a) It is easy to learn and apply.

b) It works on sorted lists whose length is unbounded or not known in advance.

c) It is fast when the search key lies near the beginning of the list, since the running time depends on the position i of the key rather than on the size of the list.

2.2.4. Disadvantages

a) It works only on lists that are sorted.

b) When the search key lies near the end of a bounded list, it offers no improvement over a plain binary search.

2.3. TERNARY SEARCH

Ternary search is used on unimodal functions to determine the maximum or minimum value of the function. A unimodal function is a function that has a single highest (or lowest) value on the interval considered. Like linear search and binary search, ternary search is also a searching technique that can be used to determine the position of a specific value in an array. In binary search, the sorted array is divided into two parts, while in ternary search it is divided into three parts, and then you determine in which part the element exists. Ternary search, like binary search, is a divide-and-conquer algorithm.

A ternary search tree is a related data structure: a special trie in which the child nodes of a standard trie are ordered as a binary search tree. Unlike the standard trie data structure, where each node contains 26 pointers for its children, each node in a ternary search tree contains only 3 pointers:

a) The left pointer points to the node whose value is less than the value in the current node.

b) The equal pointer is followed when the current character of the key is equal to the value in the current node; it points to the node for the next character of the key.

c) The right pointer points to the node whose value is greater than the value in the current node.

Figure.3: Ternary Search Tree
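The three-pointer node layout described above can be illustrated with a small Python sketch. The names `Node`, `insert` and `contains`, as well as the `is_end` flag marking the last character of a stored word, are assumptions made for this example, and the sketch assumes non-empty words.

```python
from typing import Optional


class Node:
    """One character of a stored word, with left, equal and right children."""

    def __init__(self, char: str) -> None:
        self.char = char
        self.left: Optional["Node"] = None   # characters smaller than char
        self.equal: Optional["Node"] = None  # next character of the word
        self.right: Optional["Node"] = None  # characters greater than char
        self.is_end = False                  # True if a word ends here


def insert(node: Optional[Node], word: str, i: int = 0) -> Node:
    """Insert the non-empty word below node and return the (new) subtree root."""
    ch = word[i]
    if node is None:
        node = Node(ch)
    if ch < node.char:
        node.left = insert(node.left, word, i)
    elif ch > node.char:
        node.right = insert(node.right, word, i)
    elif i + 1 < len(word):
        node.equal = insert(node.equal, word, i + 1)
    else:
        node.is_end = True
    return node


def contains(node: Optional[Node], word: str) -> bool:
    """Return True if word was previously inserted into the tree."""
    i = 0
    while node is not None and word:
        ch = word[i]
        if ch < node.char:
            node = node.left
        elif ch > node.char:
            node = node.right
        elif i + 1 < len(word):
            node = node.equal
            i += 1
        else:
            return node.is_end
    return False


if __name__ == "__main__":
    root = None
    for w in ["cat", "cap", "dog"]:
        root = insert(root, w)
    print(contains(root, "cap"), contains(root, "ca"))
```

Each comparison either moves sideways among the alternatives for the current character (left or right) or advances to the next character (equal), which is how the tree combines trie and binary search tree behavior.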

2.3.1. Analysis

Let f(x) be a unimodal function on some interval [l; r]. Take any two points m1 and m2 in this segment such that l < m1 < m2 < r. Then there are three possibilities:

a) If f(m1) < f(m2), then the required maximum cannot be located on the left side, [l; m1]. It therefore makes sense to look for the maximum only in the interval [m1; r].

b) If f(m1) > f(m2), then the situation is similar to the previous one, up to symmetry. Now the required maximum cannot be on the right side, [m2; r], so go to the segment [l; m2].

c) If f(m1) = f(m2), then the search should be conducted in [m1; m2], but this case can be attributed to either of the previous two (in order to simplify the code).

Sooner or later the length of the segment will become less than a predetermined constant, and the process can be stopped. The points m1 and m2 are chosen as:

m1 = l + (r-l)/3 and m2 = r - (r-l)/3

Run time order is T(n) = T(2n/3) + 1 = Θ(log n).
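The interval-narrowing procedure analyzed in this section can be sketched in Python as follows. This is a minimal sketch for finding the maximum of a unimodal function on a real interval; the function name `ternary_search_max` and the stopping tolerance `eps` (standing in for the predetermined constant) are assumptions.

```python
from typing import Callable


def ternary_search_max(
    f: Callable[[float], float], left: float, right: float, eps: float = 1e-9
) -> float:
    """Approximate the point at which the unimodal function f is largest."""
    while right - left > eps:
        m1 = left + (right - left) / 3
        m2 = right - (right - left) / 3
        if f(m1) < f(m2):
            left = m1   # the maximum cannot lie in [left, m1]
        else:
            right = m2  # the maximum cannot lie in [m2, right]
    return (left + right) / 2


if __name__ == "__main__":
    # f(x) = -(x - 2)^2 has its single maximum at x = 2.
    print(ternary_search_max(lambda x: -(x - 2) ** 2, 0.0, 5.0))
```

The case f(m1) = f(m2) is handled by the `else` branch, in line with the remark that it can be attributed to either of the other two cases. Each iteration keeps two thirds of the interval, which corresponds to the recurrence T(n) = T(2n/3) + 1.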

2.3.2. Time Complexity

The time complexity of ternary search tree operations is similar to that of a binary search tree, i.e., the insertion, deletion and search operations take time proportional to the height of the ternary search tree. The space is proportional to the length of the strings to be stored.

2.3.3. Applications

a) Ternary search trees are efficient for queries like "Given a word, find the next word in the dictionary" (near-neighbor lookups), "Find all telephone numbers starting with 9342", or "typing a few starting characters in a web browser displays all website names with this prefix" (auto-complete feature).

b) Used in spell checks: ternary search trees can be used as a dictionary to store all the words. Once a word is typed in an editor, it can be searched for in the ternary search tree in parallel to check for correct spelling.

2.4. BINARY SEARCH

A binary search is used to find the position of a specific value contained in a sorted array. Working on the principle of divide and conquer, this search algorithm can be quite fast, but the data has to be in sorted form. It works by starting the search in the middle of the array and then continuing in either the lower or the upper half of the sequence. It searches for a particular data item by comparing it with the middle data item; if they are equal, the location of the data item is returned. If the middle data item is bigger than the search element, then the data item is searched for to the left of the middle data item. Otherwise, it is searched for to the right of the middle data item. This process is repeated until a match is found or the size of the subarray reduces to zero.

2.4.1. Algorithm

Binary search works on sorted arrays. It begins by comparing the middle element of the array with the target value. If the target value matches the middle element, its position in the array is returned. If the target value is less than the middle element, the search continues in the lower half of the array. If the target value is greater than the middle element, the search continues in the upper half of the array. In each iteration, the algorithm thereby eliminates the half in which the target value cannot lie.

a) Start with the middle element:

- If the target value is equal to the middle element of the array, then return the index of the middle element.

- If the target value is greater than the middle element, then pick the elements to the right of the middle index and repeat step a).

- If the target value is less than the middle element, then pick the elements to the left of the middle index and repeat step a).

b) When a match is found, return the index of the element matched.

c) If no elements remain and no match is found, then return -1.
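The binary search steps above can be sketched in Python as follows. This is an illustrative sketch, assuming an ascending list of integers and 0-based indices; the function name `binary_search` is an assumption, and -1 is returned when there is no match, as in the description.

```python
from typing import Sequence


def binary_search(items: Sequence[int], target: int) -> int:
    """Return the index of target in the ascending sequence items, or -1."""
    low, high = 0, len(items) - 1
    while low <= high:
        mid = (low + high) // 2
        if items[mid] == target:
            return mid           # match found
        if items[mid] < target:
            low = mid + 1        # continue in the upper half
        else:
            high = mid - 1       # continue in the lower half
    return -1                    # the search range is empty: no match


if __name__ == "__main__":
    data = [1, 3, 4, 6, 7, 8, 10, 13, 14]
    print(binary_search(data, 7))
    print(binary_search(data, 5))
```

The loop form avoids recursion, so the extra space used is constant.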

2.4.2. Performance

The performance of a binary search can be analyzed by reducing the procedure to a binary comparison tree. The root node of the tree is the middle element of the array. The middle element of the lower half is the left child node of the root, and the middle element of the upper half is the right child node of the root. The rest of the tree is built in a similar fashion. The worst case is reached when the search descends to the deepest level of the tree, which takes floor(log2 n) + 1 comparisons for an array of n elements. The worst case may also be reached when the target element is not in the array.

Figure.4: Visualization of binary search algorithm where target value is 7

2.4.3. Time Complexity

The time complexity of the binary search algorithm belongs to the O(log n) class. This is called big O notation. It should be interpreted as follows: for an input of size n, the time the function takes to execute grows asymptotically no faster than a constant multiple of log n.

a) Data structure - Array

b) Worst-case performance - O(log n)

c) Best-case performance - O(1)

d) Average performance - O(log n)

e) Worst-case space complexity - O(1).

2.4.4. Advantages

a) Compared to linear search (checking each element in the array starting from the first), binary search is much faster.

b) Linear search takes, on average, N/2 comparisons (where N is the number of elements in the array) and N comparisons in the worst case, whereas binary search takes about log2 N comparisons.

c) It is a fairly simple algorithm, although it is easy to implement incorrectly.

d) It’s well known and often implemented for you as a library routine.

2.4.5. Disadvantages

a) It’s more complicated than a linear search and is overkill for very small numbers of elements.

b) It works only on lists that are sorted and kept sorted. That is not always feasible, especially if elements are constantly being added to the list.

2.5. INTERPOLATION SEARCH

The Interpolation Search is an improvement over the Binary Search for instances where the values in the sorted array are uniformly distributed. Interpolation search resembles the method by which people search a telephone directory for a name. In each step, the algorithm calculates where in the remaining search space the sought item might be, based on the key values at the bounds of the search space and the value of the sought key, usually via a linear interpolation. The key value actually found at this estimated position is then compared to the key value being sought. If it is not equal, then, depending on the comparison, the remaining search space is reduced to the part before or after the estimated position. This method will only work if calculations on the size of differences between key values are sensible.

2.5.1. Algorithm

Step1: In a loop, calculate the value of “pos” using the probe position formula.

Step2: If it is a match, return the index of the item, and exit.

Step3: If the item is less than arr[pos], calculate the probe position of the left sub-array. Otherwise, calculate the same in the right sub-array.

Step4: Repeat until a match is found or the sub-array reduces to zero.
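The four steps above can be sketched in Python as follows. The text does not spell out the probe position formula, so this sketch assumes the usual linear interpolation, pos = lo + (x - arr[lo]) * (hi - lo) / (arr[hi] - arr[lo]), rounded down. The function name `interpolation_search` and the return value -1 for a missing item are also assumptions.

```python
from typing import Sequence


def interpolation_search(arr: Sequence[int], x: int) -> int:
    """Return the index of x in the ascending sequence arr, or -1."""
    lo, hi = 0, len(arr) - 1
    # x can only be present while it lies between the bounds of the range.
    while lo <= hi and arr[lo] <= x <= arr[hi]:
        if arr[hi] == arr[lo]:
            # All keys in the range are equal; avoid dividing by zero.
            return lo if arr[lo] == x else -1
        # Step 1: estimate the position by linear interpolation (assumed formula).
        pos = lo + (x - arr[lo]) * (hi - lo) // (arr[hi] - arr[lo])
        # Step 2: a match ends the search.
        if arr[pos] == x:
            return pos
        # Step 3: continue in the right or the left sub-array.
        if arr[pos] < x:
            lo = pos + 1
        else:
            hi = pos - 1
    # Step 4: the sub-array has shrunk to nothing without a match.
    return -1


if __name__ == "__main__":
    data = [10, 20, 30, 40, 50]
    print(interpolation_search(data, 40))
    print(interpolation_search(data, 35))
```

On evenly spaced keys such as the list in the example, the first probe already lands on the sought position; on strongly skewed keys the probes advance only one position at a time.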

2.5.2. Advantages

a) When all elements in the list are sorted and evenly distributed, the execution time of the interpolation search algorithm is O(log(log n)) on average.

b) On such data, it requires less time than binary search.

c) The number of steps required to find an element is then comparatively smaller than in binary search.

2.5.3. Disadvantages

a) However, when the key values in the list grow exponentially, the execution time of the interpolation search algorithm is O(n), i.e., the worst case.

b) The calculation of the probe position is more complicated than the midpoint calculation of binary search and requires more time per step.

c) The elements must be in sorted order.

2.6. HASH SEARCH

A hash table is a data structure which stores data in an associative manner. In a hash table, data is stored in an array format, where each data value has its own unique index value. Access to data becomes very fast if we know the index of the desired data. Thus, it becomes a data structure in which insertion and search operations are very fast irrespective of the size of the data. A hash table uses an array as a storage medium and uses a hash technique to generate the index at which an element is to be inserted or from which it is to be located. Hashing is a technique to convert a range of key values into a range of indexes of an array. A common choice is to use the modulo operator to map the key values onto the range of array indexes.
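The idea of mapping keys to array indexes with the modulo operator can be illustrated with a small Python sketch. The class name `ChainedHashTable`, its methods and the integer keys are assumptions made for this example; collisions are resolved by chaining, i.e., each array slot holds a list of (key, value) pairs.

```python
from typing import List, Optional, Tuple


class ChainedHashTable:
    """A fixed-size hash table for integer keys with chained buckets."""

    def __init__(self, size: int = 8) -> None:
        self.size = size
        self.buckets: List[List[Tuple[int, str]]] = [[] for _ in range(size)]

    def _index(self, key: int) -> int:
        # Hash function: the modulo operator maps any key into 0..size-1.
        return key % self.size

    def insert(self, key: int, value: str) -> None:
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))       # new key: extend the chain

    def search(self, key: int) -> Optional[str]:
        # Only the chain at the hashed index has to be examined.
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        return None


if __name__ == "__main__":
    table = ChainedHashTable(size=5)
    table.insert(12, "a")
    table.insert(17, "b")  # 12 and 17 both hash to index 2
    print(table.search(17), table.search(22))
```

A search touches only one bucket, so its cost depends on the length of that chain rather than on the total number of stored items.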

2.6.1 Algorithm

Given an array A(1, N) whose elements are arranged in ascending order, where N>=0, find J such that X=A(J); J=0 if X is not present.

a) Declare integers low, mid, high, N, J.

b) Low is set to 1 and high is set to N.

c) While low<=high, perform steps d) to g).

d) [(High+Low)/2] gives the Mid value.

e) If X<A(Mid), Mid-1 gives the High value.

f) If X>A(Mid), Mid+1 gives the Low value.

g) Else Mid is assigned to J, and J is returned.

h) If the loop ends without a match, J is set to 0 and returned.

Figure.5: Table for Hash function

2.6.2. Analysis

We stated earlier that in the best case hashing would provide an O(1), constant time search technique. However, due to collisions, the number of comparisons is typically not so simple. It depends on the load factor λ, the number of stored items divided by the table size. If λ is large, meaning that the table is filling up, then there are more and more collisions. This means that collision resolution is more difficult, requiring more comparisons to find an empty slot.

2.6.3. Advantages

It is simple to implement. With chaining, the hash table never fills up: we can always add more elements to a chain.

2.6.4. Disadvantages

If a chain becomes long, then the search time can become O(n) in the worst case. Chaining also uses extra space for links.

3. ANALYSIS ON SEARCHING ALGORITHMS

| NAME | AVERAGE | WORST | SPACE |
|---|---|---|---|
| Linear Search | O(N) | O(N) | O(N) |
| Exponential Search | O(logN) | O(logN) | O(N) |
| Ternary Search | O(logN) | O(logN) | O(N) |
| Binary Search | O(logN) | O(logN) | O(N) |
| Interpolation Search | O(log(logN)) | O(N) | - |
| Hash Search | O(1) | O(N) | O(N) |

4. CONCLUSION

This paper discusses various searching algorithms. These algorithms are used to search for an element in a list. The analysis shows their advantages, disadvantages and time complexities. On analysis, we found that binary search is suitable for mid-sized data held in sorted arrays, whereas hash search is best for large data sets. We can also conclude that exponential search can be used on unbounded (infinite) sorted lists.

5. REFERENCES

1) Eduardo Laber, Marco Molinaro, "An Approximation Algorithm for Binary Searching in Trees", Algorithmica, 2011.

2) D. E. Knuth, The Art of Computer Programming, Vol. 3: Sorting and Searching. Reading, MA: Addison-Wesley, 1973.

3) E. Horowitz and S. Sahni, Fundamentals of Data Structures.

4) Reema Thareja, Data Structures Using C, Oxford University Press, 2011.

5) Yedidyah Langsam, Aaron M. Tenenbaum, "Data Structures Using C and C++", second Indian printing, New Delhi-110001: Prentice Hall of India Private Limited.

6) Seymour Lipschutz, G.A. Vijayalakshmi Pai, Data Structures, Tata McGraw Hill Companies.

7) R.S. Boyer, J. Strother Moore, "A fast string searching algorithm", Communications of the Association for Computing Machinery, vol. 20, no. 10, pp. 762-772, 1977.

8) Smita Paira, Sourabh Chandra, Sk Safikul Alam, Subhendu Sekhar Patra, "Bi Linear search a new session of searching", IJARCSSE, vol. 4, no. 3, pp. 459-463, March 2014.

9) Horvath, Adam. "Binary search and linear search performance on the .NET and Mono platform". Retrieved 19 April 2013.

10) W. W. Peterson (1957). "Addressing for Random-Access Storage". IBM J. Res. Dev. 1 (2): 130–146. doi:10.1147/rd.12.0130.

11) Weiss, Mark Allen (2006). Data Structures and Problem Solving Using Java, Pearson Addison Wesley.

12) Armenakis, A. C., Garey, L. E., Gupta, R. D., "An adaptation of a root finding method to searching ordered disk files", BIT Numerical Mathematics, Volume 25, Number 4, December 1985.

13) Sedgewick, Robert (1990), Algorithms in C, Addison-Wesley.

14) Andersson, Arne, and Christer Mattsson. "Dynamic Interpolation Search in o(log log n) Time". In Automata, Languages and Programming, edited by Andrzej Lingas, Rolf Karlsson, and Svante Carlsson, 700:15–27. Lecture Notes in Computer Science. Springer Berlin / Heidelberg, 1993.

15) Weisstein, Eric W. "Binary search".

16) Bentley 2000, §4.1 ("The Challenge of Binary Search").
