Binary Search Algorithm Anthony Lin¹* Et Al
Total Page:16
File Type:pdf, Size:1020Kb
WikiJournal of Science, 2019, 2(1):5 doi: 10.15347/wjs/2019.005 Encyclopedic Review Article Binary search algorithm Anthony Lin¹* et al. Abstract In In computer science, binary search, also known as half-interval search,[1] logarithmic search,[2] or binary chop,[3] is a search algorithm that finds a position of a target value within a sorted array.[4] Binary search compares the target value to an element in the middle of the array. If they are not equal, the half in which the target cannot lie is eliminated and the search continues on the remaining half, again taking the middle element to compare to the target value, and repeating this until the target value is found. If the search ends with the remaining half being empty, the target is not in the array. Binary search runs in logarithmic time in the worst case, making 푂(log 푛) comparisons, where 푛 is the number of elements in the array, the 푂 is ‘Big O’ notation, and 푙표푔 is the logarithm.[5] Binary search is faster than linear search except for small arrays. However, the array must be sorted first to be able to apply binary search. There are spe- cialized data structures designed for fast searching, such as hash tables, that can be searched more efficiently than binary search. However, binary search can be used to solve a wider range of problems, such as finding the next- smallest or next-largest element in the array relative to the target even if it is absent from the array. There are numerous variations of binary search. In particular, fractional cascading speeds up binary searches for the same value in multiple arrays. Fractional cascading efficiently solves a number of search problems in compu- tational geometry and in numerous other fields. Exponential search extends binary search to unbounded lists. The binary search tree and B-tree data structures are based on binary search. Algorithm 1. Set 퐿 to 0 and 푅 to 푛 − 1. 2. While 퐿 ≤ 푅, Binary search works on sorted arrays. Binary search be- 1. Set 푚 (the position of the middle element) to gins by comparing an element in the middle of the array 퐿+푅 the floor of , which is the greatest integer with the target value. If the target value matches the el- 2 퐿+푅 ement, its position in the array is returned. If the target less than or equal to . 2 value is less than the element, the search continues in 2. If 퐴 < 푇, set 퐿 to 푚 + 1. the lower half of the array. If the target value is greater 푚 3. If 퐴 > 푇, set 푅 to 푚 − 1. than the element, the search continues in the upper half 푚 of the array. By doing this, the algorithm eliminates the 4. Else, 퐴푚 = 푇; return 푚. half in which the target value cannot lie in each itera- 3. If the search has not returned a value by the time tion.[6] While 퐿 > 푅, the search terminates as unsuccessful. This iterative procedure keeps track of the search Procedure boundaries with the two variables 퐿 and 푅. The proce- Given an array 퐴 of 푛 elements with values or records dure may be expressed in pseudocode as follows, where the variable names and types remain the same as 퐴0, 퐴1, 퐴2, … , 퐴푛−1 sorted such that 퐴0 ≤ 퐴1 ≤ 퐴2 ≤ above, floor is the floor function, and unsuccessful ⋯ ≤ 퐴푛−1, and target value 푇, the following subroutine uses binary search to find the index of 푇 in 퐴.[6] refers to a specific value that conveys the failure of the search.[6] *Author correspondence: by online form Licensed under: CC BY-SA Received 29-10-2018; accepted 02-07-2019 1 of 13 | WikiJournal of Science WikiJournal of Science, 2019, 2(1):5 doi: 10.15347/wjs/2019.005 Encyclopedic Review Article function binary_search(A, n, T): L := 0 Duplicate elements R := n − 1 while L <= R: The procedure may return any index whose element is m := floor((L + R) / 2) equal to the target value, even if there are duplicate el- if A[m] < T: ements in the array. For example, if the array to be L := m + 1 else if A[m] > T: searched was [1,2,3,4,4,5,6,7] and the target was 4, R := m - 1 then it would be correct for the algorithm to either re- else: turn the 4th (index 3) or 5th (index 4) element. The reg- return m return unsuccessful ular procedure would return the 4th element (index 3). However, it is sometimes necessary to find the leftmost 퐿+푅 Alternatively, the algorithm may take the ceiling of , 2 element or the rightmost element for a target value 퐿+푅 or the least integer greater than or equal to . This that is duplicated in the array. In the above example, the 2 4th element is the leftmost element of the value 4, may change the result if the target value appears more while the 5th element is the rightmost element of the than once in the array. value 4. The alternative procedure above will always re- turn the index of the rightmost element if such an ele- Alternative procedure ment exists.[8] In the above procedure, the algorithm checks whether the middle element (푚) is equal to the target (푇) in Procedure for finding the leftmost element every iteration. Some implementations leave out this To find the leftmost element, the following procedure check during each iteration. The algorithm would per- can be used:[9] form this check only when one element is left (when 1. Set 퐿 to 0 and 푅 to 푛. 퐿 = 푅). This results in a faster comparison loop, as one comparison is eliminated per iteration. However, it re- 2. While 퐿 < 푅, quires one more iteration on average.[1] 1. Set 푚 (the position of the middle element) to 퐿+푅 the floor of , which is the greatest integer Hermann Bottenbruch published the first implementa- 2 퐿+푅 [1][8] less than or equal to . tion to leave out this check in 1962. 2 1. Set 퐿 to 0 and 푅 to 푛 − 1. 2. If 퐴푚 < 푇, set 퐿 to 푚 + 1. 2. While 퐿 ≠ 푅, 3. Else 퐴푚 ≥ 푇, set 푅 to 푚. 1. Set 푚 (the position of the middle element) to 3. Return 퐿. 퐿+푅 the ceiling of , which is the least integer 2 If 퐿 < 푛 and 퐴퐿 = 푇, then 퐴퐿 is the leftmost element 퐿+푅 greater than or equal to . that equals 푇. Even if 푇 is not in the array, 퐿 is the rank 2 of 푇 in the array, or the number of elements in the array 2. If 퐴푚 > 푇, set 푅 to 푚 − 1. that are less than 푇. 3. Else 퐴푚 ≤ 푇, set 퐿 to 푚. floor 3. Now 퐿 = 푅, the search is done. If 퐴퐿 = 푇, return 퐿. Where is the floor function, the pseudocode for Otherwise, the search terminates as unsuccessful. this version is: Where ceil is the ceiling function, the pseudocode for function binary_search_rightmost(A, n, T): this version is: L := 0 R := n function binary_search_alternative(A, n, T): while L < R: L := 0 m := floor((L + R) / 2) R := n − 1 if A[m] > T: while L != R: R := m m := ceil((L + R) / 2) else: if A[m] > T: L := m + 1 R := m - 1 return L - 1 else: L := m if A[L] == T: return L return unsuccessful Procedure for finding the rightmost element To find the rightmost element, the following procedure can be used:[9] 2 of 13 | WikiJournal of Science WikiJournal of Science, 2019, 2(1):5 doi: 10.15347/wjs/2019.005 Encyclopedic Review Article 1. Set 퐿 to 0 and 푅 to 푛. 2. While 퐿 < 푅, 1. Set 푚 (the position of the middle element) to 퐿+푅 the floor of , which is the greatest integer 2 퐿+푅 less than or equal to . 2 2. If 퐴푚 > 푇, set 푅 to 푚. 3. Else 퐴푚 ≤ 푇, set 퐿 to 푚 + 1. 3. Return 퐿 − 1. Figure 1 | Binary search can be adapted to compute approxi- If 퐿 > 0 and 퐴 = 푇, then 퐴 is the rightmost ele- 퐿−1 퐿−1 mate matches. In the example above, the rank, predecessor, ment that equals 푇. Even if 푇 is not in the array, successor, and nearest neighbor are shown for the target (푛 − 1) − 퐿 is the number of elements in the array that value 5, which is not in the array. are greater than 푇. Where floor is the floor function, the pseudocode for • The nearest neighbor of the target value is either its this version is: predecessor or successor, whichever is closer. • Range queries are also straightforward.[11] Once the function binary_search_rightmost(A, n, T): ranks of the two values are known, the number of L := 0 elements greater than or equal to the first value and R := n while L < R: less than the second is the difference of the two m := floor((L + R) / 2) ranks. This count can be adjusted up or down by one if A[m] > T: according to whether the endpoints of the range R := m else: should be considered to be part of the range and L := m + 1 whether the array contains entries matching those return L - 1 endpoints.[12] Approximate matches Performance The above procedure only performs exact matches, finding the position of a target value. However, it is triv- Number of comparisons ial to extend binary search to perform approximate matches because binary search operates on sorted ar- In terms of the number of comparisons, the perfor- rays.