Binary Search Algorithm Anthony Lin¹* Et Al

Total Page:16

File Type:pdf, Size:1020Kb

Binary Search Algorithm Anthony Lin¹* Et Al WikiJournal of Science, 2019, 2(1):5 doi: 10.15347/wjs/2019.005 Encyclopedic Review Article Binary search algorithm Anthony Lin¹* et al. Abstract In In computer science, binary search, also known as half-interval search,[1] logarithmic search,[2] or binary chop,[3] is a search algorithm that finds a position of a target value within a sorted array.[4] Binary search compares the target value to an element in the middle of the array. If they are not equal, the half in which the target cannot lie is eliminated and the search continues on the remaining half, again taking the middle element to compare to the target value, and repeating this until the target value is found. If the search ends with the remaining half being empty, the target is not in the array. Binary search runs in logarithmic time in the worst case, making 푂(log 푛) comparisons, where 푛 is the number of elements in the array, the 푂 is ‘Big O’ notation, and 푙표푔 is the logarithm.[5] Binary search is faster than linear search except for small arrays. However, the array must be sorted first to be able to apply binary search. There are spe- cialized data structures designed for fast searching, such as hash tables, that can be searched more efficiently than binary search. However, binary search can be used to solve a wider range of problems, such as finding the next- smallest or next-largest element in the array relative to the target even if it is absent from the array. There are numerous variations of binary search. In particular, fractional cascading speeds up binary searches for the same value in multiple arrays. Fractional cascading efficiently solves a number of search problems in compu- tational geometry and in numerous other fields. Exponential search extends binary search to unbounded lists. The binary search tree and B-tree data structures are based on binary search. Algorithm 1. Set 퐿 to 0 and 푅 to 푛 − 1. 2. While 퐿 ≤ 푅, Binary search works on sorted arrays. Binary search be- 1. Set 푚 (the position of the middle element) to gins by comparing an element in the middle of the array 퐿+푅 the floor of , which is the greatest integer with the target value. If the target value matches the el- 2 퐿+푅 ement, its position in the array is returned. If the target less than or equal to . 2 value is less than the element, the search continues in 2. If 퐴 < 푇, set 퐿 to 푚 + 1. the lower half of the array. If the target value is greater 푚 3. If 퐴 > 푇, set 푅 to 푚 − 1. than the element, the search continues in the upper half 푚 of the array. By doing this, the algorithm eliminates the 4. Else, 퐴푚 = 푇; return 푚. half in which the target value cannot lie in each itera- 3. If the search has not returned a value by the time tion.[6] While 퐿 > 푅, the search terminates as unsuccessful. This iterative procedure keeps track of the search Procedure boundaries with the two variables 퐿 and 푅. The proce- Given an array 퐴 of 푛 elements with values or records dure may be expressed in pseudocode as follows, where the variable names and types remain the same as 퐴0, 퐴1, 퐴2, … , 퐴푛−1 sorted such that 퐴0 ≤ 퐴1 ≤ 퐴2 ≤ above, floor is the floor function, and unsuccessful ⋯ ≤ 퐴푛−1, and target value 푇, the following subroutine uses binary search to find the index of 푇 in 퐴.[6] refers to a specific value that conveys the failure of the search.[6] *Author correspondence: by online form Licensed under: CC BY-SA Received 29-10-2018; accepted 02-07-2019 1 of 13 | WikiJournal of Science WikiJournal of Science, 2019, 2(1):5 doi: 10.15347/wjs/2019.005 Encyclopedic Review Article function binary_search(A, n, T): L := 0 Duplicate elements R := n − 1 while L <= R: The procedure may return any index whose element is m := floor((L + R) / 2) equal to the target value, even if there are duplicate el- if A[m] < T: ements in the array. For example, if the array to be L := m + 1 else if A[m] > T: searched was [1,2,3,4,4,5,6,7] and the target was 4, R := m - 1 then it would be correct for the algorithm to either re- else: turn the 4th (index 3) or 5th (index 4) element. The reg- return m return unsuccessful ular procedure would return the 4th element (index 3). However, it is sometimes necessary to find the leftmost 퐿+푅 Alternatively, the algorithm may take the ceiling of , 2 element or the rightmost element for a target value 퐿+푅 or the least integer greater than or equal to . This that is duplicated in the array. In the above example, the 2 4th element is the leftmost element of the value 4, may change the result if the target value appears more while the 5th element is the rightmost element of the than once in the array. value 4. The alternative procedure above will always re- turn the index of the rightmost element if such an ele- Alternative procedure ment exists.[8] In the above procedure, the algorithm checks whether the middle element (푚) is equal to the target (푇) in Procedure for finding the leftmost element every iteration. Some implementations leave out this To find the leftmost element, the following procedure check during each iteration. The algorithm would per- can be used:[9] form this check only when one element is left (when 1. Set 퐿 to 0 and 푅 to 푛. 퐿 = 푅). This results in a faster comparison loop, as one comparison is eliminated per iteration. However, it re- 2. While 퐿 < 푅, quires one more iteration on average.[1] 1. Set 푚 (the position of the middle element) to 퐿+푅 the floor of , which is the greatest integer Hermann Bottenbruch published the first implementa- 2 퐿+푅 [1][8] less than or equal to . tion to leave out this check in 1962. 2 1. Set 퐿 to 0 and 푅 to 푛 − 1. 2. If 퐴푚 < 푇, set 퐿 to 푚 + 1. 2. While 퐿 ≠ 푅, 3. Else 퐴푚 ≥ 푇, set 푅 to 푚. 1. Set 푚 (the position of the middle element) to 3. Return 퐿. 퐿+푅 the ceiling of , which is the least integer 2 If 퐿 < 푛 and 퐴퐿 = 푇, then 퐴퐿 is the leftmost element 퐿+푅 greater than or equal to . that equals 푇. Even if 푇 is not in the array, 퐿 is the rank 2 of 푇 in the array, or the number of elements in the array 2. If 퐴푚 > 푇, set 푅 to 푚 − 1. that are less than 푇. 3. Else 퐴푚 ≤ 푇, set 퐿 to 푚. floor 3. Now 퐿 = 푅, the search is done. If 퐴퐿 = 푇, return 퐿. Where is the floor function, the pseudocode for Otherwise, the search terminates as unsuccessful. this version is: Where ceil is the ceiling function, the pseudocode for function binary_search_rightmost(A, n, T): this version is: L := 0 R := n function binary_search_alternative(A, n, T): while L < R: L := 0 m := floor((L + R) / 2) R := n − 1 if A[m] > T: while L != R: R := m m := ceil((L + R) / 2) else: if A[m] > T: L := m + 1 R := m - 1 return L - 1 else: L := m if A[L] == T: return L return unsuccessful Procedure for finding the rightmost element To find the rightmost element, the following procedure can be used:[9] 2 of 13 | WikiJournal of Science WikiJournal of Science, 2019, 2(1):5 doi: 10.15347/wjs/2019.005 Encyclopedic Review Article 1. Set 퐿 to 0 and 푅 to 푛. 2. While 퐿 < 푅, 1. Set 푚 (the position of the middle element) to 퐿+푅 the floor of , which is the greatest integer 2 퐿+푅 less than or equal to . 2 2. If 퐴푚 > 푇, set 푅 to 푚. 3. Else 퐴푚 ≤ 푇, set 퐿 to 푚 + 1. 3. Return 퐿 − 1. Figure 1 | Binary search can be adapted to compute approxi- If 퐿 > 0 and 퐴 = 푇, then 퐴 is the rightmost ele- 퐿−1 퐿−1 mate matches. In the example above, the rank, predecessor, ment that equals 푇. Even if 푇 is not in the array, successor, and nearest neighbor are shown for the target (푛 − 1) − 퐿 is the number of elements in the array that value 5, which is not in the array. are greater than 푇. Where floor is the floor function, the pseudocode for • The nearest neighbor of the target value is either its this version is: predecessor or successor, whichever is closer. • Range queries are also straightforward.[11] Once the function binary_search_rightmost(A, n, T): ranks of the two values are known, the number of L := 0 elements greater than or equal to the first value and R := n while L < R: less than the second is the difference of the two m := floor((L + R) / 2) ranks. This count can be adjusted up or down by one if A[m] > T: according to whether the endpoints of the range R := m else: should be considered to be part of the range and L := m + 1 whether the array contains entries matching those return L - 1 endpoints.[12] Approximate matches Performance The above procedure only performs exact matches, finding the position of a target value. However, it is triv- Number of comparisons ial to extend binary search to perform approximate matches because binary search operates on sorted ar- In terms of the number of comparisons, the perfor- rays.
Recommended publications
  • Lecture 11: Heapsort & Its Analysis
    Lecture 11: Heapsort & Its Analysis Agenda: • Heap recall: – Heap: definition, property – Max-Heapify – Build-Max-Heap • Heapsort algorithm • Running time analysis Reading: • Textbook pages 127 – 138 1 Lecture 11: Heapsort (Binary-)Heap data structure (recall): • An array A[1..n] of n comparable keys either ‘≥’ or ‘≤’ • An implicit binary tree, where – A[2j] is the left child of A[j] – A[2j + 1] is the right child of A[j] j – A[b2c] is the parent of A[j] j • Keys satisfy the max-heap property: A[b2c] ≥ A[j] • There are max-heap and min-heap. We use max-heap. • A[1] is the maximum among the n keys. • Viewing heap as a binary tree, height of the tree is h = blg nc. Call the height of the heap. [— the number of edges on the longest root-to-leaf path] • A heap of height k can hold 2k —— 2k+1 − 1 keys. Why ??? Since lg n − 1 < k ≤ lg n ⇐⇒ n < 2k+1 and 2k ≤ n ⇐⇒ 2k ≤ n < 2k+1 2 Lecture 11: Heapsort Max-Heapify (recall): • It makes an almost-heap into a heap. • Pseudocode: procedure Max-Heapify(A, i) **p 130 **turn almost-heap into a heap **pre-condition: tree rooted at A[i] is almost-heap **post-condition: tree rooted at A[i] is a heap lc ← leftchild(i) rc ← rightchild(i) if lc ≤ heapsize(A) and A[lc] > A[i] then largest ← lc else largest ← i if rc ≤ heapsize(A) and A[rc] > A[largest] then largest ← rc if largest 6= i then exchange A[i] ↔ A[largest] Max-Heapify(A, largest) • WC running time: lg n.
    [Show full text]
  • CS321 Spring 2021
    CS321 Spring 2021 Lecture 2 Jan 13 2021 Admin • A1 Due next Saturday Jan 23rd – 11:59PM Course in 4 Sections • Section I: Basics and Sorting • Section II: Hash Tables and Basic Data Structs • Section III: Binary Search Trees • Section IV: Graphs Section I • Sorting methods and Data Structures • Introduction to Heaps and Heap Sort What is Big O notation? • A way to approximately count algorithm operations. • A way to describe the worst case running time of algorithms. • A tool to help improve algorithm performance. • Can be used to measure complexity and memory usage. Bounds on Operations • An algorithm takes some number of ops to complete: • a + b is a single operation, takes 1 op. • Adding up N numbers takes N-1 ops. • O(1) means ‘on order of 1’ operation. • O( c ) means ‘on order of constant’. • O( n) means ‘ on order of N steps’. • O( n2) means ‘ on order of N*N steps’. How Does O(k) = O(1) O(n) = c * n for some c where c*n is always greater than n for some c. O( k ) = c*k O( 1 ) = cc * 1 let ccc = c*k c*k = c*k* 1 therefore O( k ) = c * k * 1 = ccc *1 = O(1) O(n) times for sorting algorithms. Technique O(n) operations O(n) memory use Insertion Sort O(N2) O( 1 ) Bubble Sort O(N2) O(1) Merge Sort N * log(N) O(1) Heap Sort N * log(N) O(1) Quicksort O(N2) O(logN) Memory is in terms of EXTRA memory Primary Notation Types • O(n) = Asymptotic upper bound.
    [Show full text]
  • Rate of Growth Linear Vs Logarithmic Growth O() Complexity Measure
    Where were we…. Rate of Growth • Comparing worst case performance of algorithms. We don't know how long the steps actually take; we only know it is some constant time. We can • Do it in machine-independent way. just lump all constants together and forget about • Time usage of an algorithm: how many basic them. steps does an algorithm perform, as a function of the input size. What we are left with is the fact that the time in • For example: given an array of length N (=input linear search grows linearly with the input, while size), how many steps does linear search perform? in binary search it grows logarithmically - much slower. Linear vs logarithmic growth O() complexity measure Linear growth: Logarithmic growth: Input size T(N) = c log N Big O notation gives an asymptotic upper bound T(N) = N* c 2 on the actual function which describes 10 10c c log 10 = 4c time/memory usage of the algorithm: logarithmic, 2 linear, quadratic, etc. 100 100c c log2 100 = 7c The complexity of an algorithm is O(f(N)) if there exists a constant factor K and an input size N0 1000 1000c c log2 1000 = 10c such that the actual usage of time/memory by the 10000 10000c c log 10000 = 16c algorithm on inputs greater than N0 is always less 2 than K f(N). Upper bound example In other words f(N)=2N If an algorithm actually makes g(N) steps, t(N)=3+N time (for example g(N) = C1 + C2log2N) there is an input size N' and t(N) is in O(N) because for all N>3, there is a constant K, such that 2N > 3+N for all N > N' , g(N) ≤ K f(N) Here, N0 = 3 and then the algorithm is in O(f(N).
    [Show full text]
  • Mock In-Class Test COMS10007 Algorithms 2018/2019
    Mock In-class Test COMS10007 Algorithms 2018/2019 Throughout this paper log() denotes the binary logarithm, i.e, log(n) = log2(n), and ln() denotes the logarithm to base e, i.e., ln(n) = loge(n). 1 O-notation 1. Let f : N ! N be a function. Define the set Θ(f(n)). Proof. Θ(f(n)) = fg(n) : There exist positive constants c1; c2 and n0 s.t. 0 ≤ c1f(n) ≤ g(n) ≤ c2f(n) for all n ≥ n0g 2. Give a formal proof of the statement: p 10 n 2 O(n) : p Proof. We need to show that there are positive constants c; n0 such that 10 n ≤ c · n, 10 p for every n ≥ n0. The previous inequality is equivalent to c ≤ n, which in turn 100 100 gives c2 ≤ n. Hence, we can pick c = 1 and n0 = 12 = 100. 3. Use the racetrack principle to prove the following statement: n 2 O(2n) : Hint: The following facts can be useful: • The derivative of 2n is ln(2)2n. 1 • 2 ≤ ln(2) ≤ 1 holds. n Proof. We need to show that there are positive constants c; n0 such that n ≤ c·2 . We n pick c = 1 and n0 = 1. Observe that n ≤ 2 holds for n = n0(= 1). It remains to show n that n ≤ 2 also holds for every n ≥ n0. To show this, we use the racetrack principle. Observe that the derivative of n is 1 and the derivative of 2n is ln(2)2n. Hence, by n the racetrack principle it is enough to show that 1 ≤ ln(2)2 holds for every n ≥ n0, 1 1 1 1 or log( ln 2 ) ≤ n.
    [Show full text]
  • Bubble Sort, Selection Sort, and Insertion Sort
    Data Structures & Algorithms Lecture 3 & 4: Searching and Sorting Algorithm Halabja University Collage of Science /Computer Science Department Karwan Mahdi [email protected] 2020-2021 Objectives Introduction to Search Algorithms which is consist of Linear Search and Binary Search Understand the significance of sorting Distinguish between internal and external sorting Understand the logic behind the simple sorting methods of bubble sort, selection sort, and insertion sort. Introduction to Search Algorithms Search: locate an item in a list of information. Two common algorithms (methods): • Linear search • Binary search Linear search: also called sequential search. Examines all values in an array until it finds a match or reaches the end. • Number of visits for a linear search of an array of n elements: • The average search visits n/2 elements • The maximum visits is n • A linear search locates a value in an array in O(n) steps Example for linear search Example: linear search Algorithm of Linear Search Order in linear search Advantages and Disadvantages • Benefits: • Easy algorithm to understand • Array can be in any order • Disadvantages: • Inefficient (slow): for array of N elements, examines N/2 elements on average for value in array, • N elements for value not in array “Java code for linear search REQUIRE” Binary Search Binary search is a smart algorithm that is much more efficient than the linear search ,the elements in the array must be sorted in order 1. Divides the array into three sections: • middle element • elements on one side of the middle element • elements on the other side of the middle element 2.
    [Show full text]
  • Comparing the Heuristic Running Times And
    Comparing the heuristic running times and time complexities of integer factorization algorithms Computational Number Theory ! ! ! New Mexico Supercomputing Challenge April 1, 2015 Final Report ! Team 14 Centennial High School ! ! Team Members ! Devon Miller Vincent Huber ! Teacher ! Melody Hagaman ! Mentor ! Christopher Morrison ! ! ! ! ! ! ! ! ! "1 !1 Executive Summary....................................................................................................................3 2 Introduction.................................................................................................................................3 2.1 Motivation......................................................................................................................3 2.2 Public Key Cryptography and RSA Encryption............................................................4 2.3 Integer Factorization......................................................................................................5 ! 2.4 Big-O and L Notation....................................................................................................6 3 Mathematical Procedure............................................................................................................6 3.1 Mathematical Description..............................................................................................6 3.1.1 Dixon’s Algorithm..........................................................................................7 3.1.2 Fermat’s Factorization Method.......................................................................7
    [Show full text]
  • Binary Search Tree
    ADT Binary Search Tree! Ellen Walker! CPSC 201 Data Structures! Hiram College! Binary Search Tree! •" Value-based storage of information! –" Data is stored in order! –" Data can be retrieved by value efficiently! •" Is a binary tree! –" Everything in left subtree is < root! –" Everything in right subtree is >root! –" Both left and right subtrees are also BST#s! Operations on BST! •" Some can be inherited from binary tree! –" Constructor (for empty tree)! –" Inorder, Preorder, and Postorder traversal! •" Some must be defined ! –" Insert item! –" Delete item! –" Retrieve item! The Node<E> Class! •" Just as for a linked list, a node consists of a data part and links to successor nodes! •" The data part is a reference to type E! •" A binary tree node must have links to both its left and right subtrees! The BinaryTree<E> Class! The BinaryTree<E> Class (continued)! Overview of a Binary Search Tree! •" Binary search tree definition! –" A set of nodes T is a binary search tree if either of the following is true! •" T is empty! •" Its root has two subtrees such that each is a binary search tree and the value in the root is greater than all values of the left subtree but less than all values in the right subtree! Overview of a Binary Search Tree (continued)! Searching a Binary Tree! Class TreeSet and Interface Search Tree! BinarySearchTree Class! BST Algorithms! •" Search! •" Insert! •" Delete! •" Print values in order! –" We already know this, it#s inorder traversal! –" That#s why it#s called “in order”! Searching the Binary Tree! •" If the tree is
    [Show full text]
  • Characterization Quaternaty Lookup Table in Standard Cmos Process
    International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 02 Issue: 07 | Oct-2015 www.irjet.net p-ISSN: 2395-0072 CHARACTERIZATION QUATERNATY LOOKUP TABLE IN STANDARD CMOS PROCESS S.Prabhu Venkateswaran1, R.Selvarani2 1Assitant Professor, Electronics and Communication Engineering, SNS College of Technology,Tamil Nadu, India 2 PG Scholar, VLSI Design, SNS College Of Technology, Tamil Nadu, India -------------------------------------------------------------------------**------------------------------------------------------------------------ Abstract- The Binary logics and devices have been process gets low power and easy to scale down, gate of developed with an latest technology and gate design MOS need much lower driving current than base current of area. The design and implementation of logical circuits two polar, scaling down increase CMOS speed Comparing become easier and compact. Therefore present logic MVL present high power consumption, due to current devices that can implement in binary and multi valued mode circuit element or require nonstandard multi logic system. In multi-valued logic system logic gates threshold CMOS technologies. multiple-value logic are increase the high power required for level of transition varies in different logic systems, a quaternary has and increase the number of required interconnections become mature in terms of logic algebra and gates. ,hence also increasing the overall energy. Interconnections Some multi valued logic systems such as ternary and are increase the dominant contributor to delay are and quaternary logic schemes have been developed. energy consumption in CMOS digital circuit. Quaternary logic has many advantages over binary logic. Since it require half the number of digits to store 2. QUATERNARY LOGIC AND LUTs any information than its binary equivalent it is best for storage; the quaternary storage mechanism is less than Ternary logic system: It is based upon CMOS compatible twice as complex as the binary system.
    [Show full text]
  • Optimizing MPC for Robust and Scalable Integer and Floating-Point Arithmetic
    Optimizing MPC for robust and scalable integer and floating-point arithmetic Liisi Kerik1, Peeter Laud1, and Jaak Randmets1,2 1 Cybernetica AS, Tartu, Estonia 2 University of Tartu, Tartu, Estonia {liisi.kerik, peeter.laud, jaak.randmets}@cyber.ee Abstract. Secure multiparty computation (SMC) is a rapidly matur- ing field, but its number of practical applications so far has been small. Most existing applications have been run on small data volumes with the exception of a recent study processing tens of millions of education and tax records. For practical usability, SMC frameworks must be able to work with large collections of data and perform reliably under such conditions. In this work we demonstrate that with the help of our re- cently developed tools and some optimizations, the Sharemind secure computation framework is capable of executing tens of millions integer operations or hundreds of thousands floating-point operations per sec- ond. We also demonstrate robustness in handling a billion integer inputs and a million floating-point inputs in parallel. Such capabilities are ab- solutely necessary for real world deployments. Keywords: Secure Multiparty Computation, Floating-point operations, Protocol design 1 Introduction Secure multiparty computation (SMC) [19] allows a group of mutually distrust- ing entities to perform computations on data private to various members of the group, without others learning anything about that data or about the intermedi- ate values in the computation. Theory-wise, the field is quite mature; there exist several techniques to achieve privacy and correctness of any computation [19, 28, 15, 16, 21], and the asymptotic overheads of these techniques are known.
    [Show full text]
  • Binary Search Trees (BST)
    CSCI-UA 102 Joanna Klukowska Lecture 7: Binary Search Trees [email protected] Lecture 7: Binary Search Trees (BST) Reading materials Goodrich, Tamassia, Goldwasser: Chapter 8 OpenDSA: Chapter 18 and 12 Binary search trees visualizations: http://visualgo.net/bst.html http://www.cs.usfca.edu/~galles/visualization/BST.html Topics Covered 1 Binary Search Trees (BST) 2 2 Binary Search Tree Node 3 3 Searching for an Item in a Binary Search Tree3 3.1 Binary Search Algorithm..............................................4 3.1.1 Recursive Approach............................................4 3.1.2 Iterative Approach.............................................4 3.2 contains() Method for a Binary Search Tree..................................5 4 Inserting a Node into a Binary Search Tree5 4.1 Order of Insertions.................................................6 5 Removing a Node from a Binary Search Tree6 5.1 Removing a Leaf Node...............................................7 5.2 Removing a Node with One Child.........................................7 5.3 Removing a Node with Two Children.......................................8 5.4 Recursive Implementation.............................................9 6 Balanced Binary Search Trees 10 6.1 Adding a balance() Method........................................... 10 7 Self-Balancing Binary Search Trees 11 7.1 AVL Tree...................................................... 11 7.1.1 When balancing is necessary........................................ 12 7.2 Red-Black Trees.................................................. 15 1 CSCI-UA 102 Joanna Klukowska Lecture 7: Binary Search Trees [email protected] 1 Binary Search Trees (BST) A binary search tree is a binary tree with additional properties: • the value stored in a node is greater than or equal to the value stored in its left child and all its descendants (or left subtree), and • the value stored in a node is smaller than the value stored in its right child and all its descendants (or its right subtree).
    [Show full text]
  • Quick Sort Algorithm Song Qin Dept
    Quick Sort Algorithm Song Qin Dept. of Computer Sciences Florida Institute of Technology Melbourne, FL 32901 ABSTRACT each iteration. Repeat this on the rest of the unsorted region Given an array with n elements, we want to rearrange them in without the first element. ascending order. In this paper, we introduce Quick Sort, a Bubble sort works as follows: keep passing through the list, divide-and-conquer algorithm to sort an N element array. We exchanging adjacent element, if the list is out of order; when no evaluate the O(NlogN) time complexity in best case and O(N2) exchanges are required on some pass, the list is sorted. in worst case theoretically. We also introduce a way to approach the best case. Merge sort [4] has a O(NlogN) time complexity. It divides the 1. INTRODUCTION array into two subarrays each with N/2 items. Conquer each Search engine relies on sorting algorithm very much. When you subarray by sorting it. Unless the array is sufficiently small(one search some key word online, the feedback information is element left), use recursion to do this. Combine the solutions to brought to you sorted by the importance of the web page. the subarrays by merging them into single sorted array. 2 Bubble, Selection and Insertion Sort, they all have an O(N2) time In Bubble sort, Selection sort and Insertion sort, the O(N ) time complexity that limits its usefulness to small number of element complexity limits the performance when N gets very big. no more than a few thousand data points.
    [Show full text]
  • A Set Intersection Algorithm Via X-Fast Trie
    Journal of Computers A Set Intersection Algorithm Via x-Fast Trie Bangyu Ye* National Engineering Laboratory for Information Security Technologies, Institute of Information Engineering, Chinese Academy of Sciences, Beijing, 100091, China. * Corresponding author. Tel.:+86-10-82546714; email: [email protected] Manuscript submitted February 9, 2015; accepted May 10, 2015. doi: 10.17706/jcp.11.2.91-98 Abstract: This paper proposes a simple intersection algorithm for two sorted integer sequences . Our algorithm is designed based on x-fast trie since it provides efficient find and successor operators. We present that our algorithm outperforms skip list based algorithm when one of the sets to be intersected is relatively ‘dense’ while the other one is (relatively) ‘sparse’. Finally, we propose some possible approaches which may optimize our algorithm further. Key words: Set intersection, algorithm, x-fast trie. 1. Introduction Fast set intersection is a key operation in the context of several fields when it comes to big data era [1], [2]. For example, modern search engines use the set intersection for inverted posting list which is a standard data structure in information retrieval to return relevant documents. So it has been studied in many domains and fields [3]-[8]. One of the most typical situations is Boolean query which is required to retrieval the documents that contains all the terms in the query. Besides, set intersection is naturally a key component of calculating the Jaccard coefficient of two sets, which is defined by the fraction of size of intersection and union. In this paper, we propose a new algorithm via x-fast trie which is a kind of advanced data structure.
    [Show full text]