Investigating Search Algorithms

To what extent is A* search algorithm more efficient than Bidirectional Dijkstra’s algorithm at finding the closest relationship between 2 people in a network of human connections?

Computer Science
Word Count: 3814
Candidate Number:

Contents
1 Introduction
2 Search Algorithms
2.1 Basics
2.2 Breadth-First Search
2.3 Branch-and-Bound
2.4 Extended Set
2.5 Heuristic Function
3 Algorithms Used
3.1 Bidirectional Dijkstra’s Algorithm
3.1.1 Dijkstra’s Algorithm
3.1.2 Bidirectional Search
3.1.3 In Context
3.2 A* Search Algorithm
3.2.1 In Context
4 Methodology
4.1 Setup
4.2 Procedure
4.3 Success Criteria
4.3.1 Time Complexity
4.3.2 Expanded Nodes
4.3.3 Optimal Distance
4.4 Clarifications
4.4.1 Six Degrees of Separation
4.4.2 Calculating Heuristic Values
4.4.3 Privacy
4.4.4 Logistical Setback
5 Comparison
5.1 Hypothesis
5.2 Analysis
5.2.1 Bidirectional Dijkstra’s Algorithm Explanation
5.2.2 A* Search Algorithm Explanation
6 Conclusion
7 Evaluation
8 Works Cited
9 Works Consulted

1 Introduction

Artificial intelligence is taking over. It is used to recognise faces, detect fake news, and even identify the type of pizza in a photo. It has allowed us to process large amounts of data, quantities that would take generations of humans to absorb, let alone analyse. Computers are known for their ability to solve seemingly unsolvable problems, and search algorithms are indispensable to fast software. Search algorithms solve problems ranging from locating where data is stored in a database to finding anomalies in data sets. Investigating search algorithms identifies those that can provide more effective solutions to complex problems. Searching networks of human connections explores an integral part of human life and has a wide variety of applications, such as forensic investigation, social networking and tracing one's ancestors.
This paper explores the strengths and weaknesses of the A* search algorithm and Bidirectional Dijkstra’s algorithm, and compares their efficiency in finding the shortest connection between two people in a large network by analysing their time complexity and the optimality of their results. In this essay, the optimality of a path is determined by its length: the shorter the path, the more optimal it is. A* and Bidirectional Dijkstra found similar, if not identical, optimal paths. A*’s time complexity depends on the effectiveness of its heuristic function. The essay concludes that A* with an effective heuristic function (see 2.5 Heuristic Function) is more efficient than Bidirectional Dijkstra at finding the shortest connection between humans in a network.

2 Search Algorithms

The A* search algorithm and Dijkstra’s algorithm are extensions of the algorithms below. Note that paths are referenced by their terminal node.

2.1 Basics

Table 1 contains some related terms, with examples referring to Graph 1 and Tree 1 below.

Term | Definition | Example
Search tree | Records the search paths | Tree 1 is the full search tree of Graph 1.
Node | A point on the graph | S and A to I are nodes.
Start node | The node the search starts from | S is the start node.
Goal node | The node the search aims to find a path to | G is the goal node.
Edges | A line joining two nodes |
Weights (of edges) | The actual “distance” between two nodes | The weight of the edge between A and C is 7.
Path | A series of nodes that forms the explored route |
Terminal node | The last node in a path | H and G are terminal nodes.
Child node | A subnode of a given node | H and F are child nodes of B.
Parent node | The node that a given node is a subnode of | B is the parent node of H and F.
Ancestors | The series of preceding parent nodes | S and B are the ancestors of H.
Descendants | The series of nodes expanded from the node | F, I and G are the descendants of B.
Root node | A node with no parent node, usually at the top of the tree | S is the root node.
Leaf node | A node with no child nodes, usually at the bottom of the tree | H and G are the leaf nodes.
Partial path | A path that does not reach the goal | S-B-H is a partial path.
Complete path | A path that reaches the goal | S-A-C-E-I-G, S-A-D-E-I-G and S-B-F-I-G are complete paths.
Expanding (a node) | Finding the child nodes of a given node | H and F are expanded from B.
Open | Has potential children | B is open.
Closed | Has no potential children | H is closed.
Layer | A set of nodes of the same generation | C, D, H and F are in the same layer.
Branching factor (b) | The (average) number of children a node has in a graph | B has a branching factor of 2.
Depth (d) | The number of layers of nodes (or the number of nodes in the complete path) | Tree 1 has a depth of 6.
Frontiers | Terminal nodes of partial paths in a search | The frontiers in Tree 1 are G and H.
Extension/Expansion | Child nodes are extended/expanded from a parent node | C, D and B are extended/expanded from A.
Accumulated distance | The sum of the weights of the path’s edges | The accumulated distance of S-A-C is 3 + 7 = 10.
Heuristic distance | The predicted remaining distance from the node to the goal node. In Graph 1, it is the length of a straight line drawn from the node to the goal node (see 2.5 Heuristic Function for a detailed explanation) | The heuristic distance of A is 15.

Table 1 - Search Trees Relevant Terms

Graph 1 - Example graph

Tree 1 - Full Search Tree of Graph 1

2.2 Breadth-First Search

Breadth-first search (BFS) checks every node in one layer of the tree for the goal node before moving on to the next layer, and stops once the goal node is found. A queue is usually used to store the paths waiting to be checked. A queue is first-in, first-out: an element is added (enqueued) to the back of the queue, while removing an element (dequeuing) takes the first element in the queue. Diagram 1 shows how a queue works.

Diagram 1 - Queue

Flowchart 1 shows how breadth-first search works.
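The BFS procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the essay's implementation: `GRAPH` is a hypothetical directed, unweighted adjacency list loosely modelled on the example paths listed in Table 1 (Graph 1's full set of edges and weights is not reproduced), and the `bfs` function and its extension counter are names introduced here for illustration. Paths are stored as lists and identified by their terminal node, and `collections.deque` serves as the first-in, first-out queue.

```python
from collections import deque

# Hypothetical directed, unweighted graph loosely based on the example
# paths in Table 1; it is NOT an exact copy of Graph 1.
GRAPH = {
    "S": ["A", "B"],
    "A": ["C", "D"],
    "B": ["F", "H"],
    "C": ["E"],
    "D": ["E"],
    "E": ["I"],
    "F": ["I"],
    "H": [],
    "I": ["G"],
    "G": [],
}


def bfs(graph, start, goal):
    """Breadth-first search over paths; returns (path, number of extensions)."""
    queue = deque([[start]])  # FIFO queue of paths, starting with the start node
    extensions = 0
    while queue:
        path = queue.popleft()  # dequeue the path at the front of the queue
        node = path[-1]  # a path is referenced by its terminal node
        if node == goal:
            return path, extensions
        extensions += 1  # this path is now extended
        for child in graph[node]:
            if child not in path:  # never revisit a node already on this path
                queue.append(path + [child])  # enqueue at the back
    return None, extensions  # queue exhausted without reaching the goal
```

With this hypothetical graph, `bfs(GRAPH, "S", "G")` returns `(['S', 'B', 'F', 'I', 'G'], 12)`: the path with the fewest edges, found after 12 extensions. BFS ignores edge weights entirely, which is the gap that branch-and-bound addresses by sorting paths by length.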
Flowchart 1 - Breadth-First Search Procedure

Tree 2 is an example of a BFS of Graph 1 (refer to Table 1 for the definitions of extension and expansion).

Tree 2 - BFS Search Tree Example

2.3 Branch-and-Bound

Branch-and-bound (BnB) sorts the list of paths waiting to be expanded by path length, from shortest to longest. BnB introduces node costs: each node is assigned a cost that determines its search priority. In BnB, a node's cost is its path length from the start node, and the lower the cost, the higher the node's priority. Flowchart 2 shows how BnB works. The red text indicates what BnB adds to the BFS procedure.

Flowchart 2 - BnB Search Procedures

Tree 3 is an example of a BnB search of Graph 1. The orange numbers in Tree 3 indicate the accumulated distance of each path.

Tree 3 - BnB Search Tree Example

Since the first path in the list of paths to explore (PATHS in Flowchart 2) is always the shortest path so far and is expanded first, the path found is likely to be of optimal distance. This also prunes away paths that are too long, reducing the number of extensions from 16 in Tree 2 to 15 in Tree 3.

2.4 Extended Set

An extended set is a collection that keeps track of the nodes that have been extended. Flowchart 3 shows how BnB with an extended set (BnB w/ extended set) works. The blue boxes indicate the additions made in the BnB w/ extended set model.

Flowchart 3 - BnB w/ Extended Set Search Procedure

Tree 4 is an example of a BnB w/ extended set search of Graph 1. The green cross in Tree 4 marks the path eliminated by the extended set.

Tree 4 - BnB w/ Extended Set Search Tree Example

Since this model keeps track of the expanded nodes and sorts the paths by length, it prioritises the shortest paths; if a node has already been expanded, the earlier path to it is most likely more efficient, since it reached the same node in a shorter distance.
This prunes away nodes that are unlikely to lead to an optimal path, reducing the number of extensions from 15 in Tree 3 to 12 in Tree 4.

2.5 Heuristic Function

A heuristic distance is the estimated remaining distance to the goal. It should be a lower bound on the actual remaining distance, meaning it should never overestimate how far the goal still is. The estimated total distance is the sum of the path length so far and the heuristic distance to the goal. Let e(total path length) be the estimated total distance, d(travelled distance) the path length so far, and e(remaining distance) the heuristic distance between the terminal node and the goal node. Then:

$$e(\text{total path length}) = d(\text{travelled distance}) + e(\text{remaining distance})$$

Equation 1 - Heuristic Value

Heuristic values are generated by a function that takes in a node in the graph and the goal node.
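The ideas in 2.3 to 2.5 can be combined into one sketch: a BnB w/ extended set search whose list of paths is ordered by Equation 1. This is a hedged illustration, not the essay's implementation. The graph (`COORDS`, `EDGES`) is hypothetical, and `build_graph`, `search`, `zero` and `straight_line` are names introduced here. A priority queue from Python's `heapq` module stands in for re-sorting the list of paths after each extension, and the goal test is applied when a path is taken off the list. Passing the `zero` heuristic gives plain BnB w/ extended set; passing `straight_line` adds a straight-line heuristic distance, which never overestimates here because every hypothetical edge weight is at least the straight-line distance between its endpoints.

```python
import heapq
import math

# Hypothetical node coordinates (x, y); these are NOT Graph 1 from the essay.
COORDS = {"S": (0, 0), "A": (2, 2), "B": (2, -1), "C": (4, 3), "G": (6, 0)}

# Hypothetical undirected weighted edges. Each weight is at least the
# straight-line distance between its endpoints, so the straight-line
# heuristic never overestimates the remaining distance.
EDGES = [("S", "A", 3), ("S", "B", 3), ("A", "C", 3), ("B", "G", 5), ("C", "G", 4)]


def build_graph(edges):
    """Turn an edge list into an adjacency list of (neighbour, weight) pairs."""
    graph = {}
    for u, v, weight in edges:
        graph.setdefault(u, []).append((v, weight))
        graph.setdefault(v, []).append((u, weight))
    return graph


def zero(node, goal):
    """No heuristic: the search behaves like BnB w/ extended set."""
    return 0


def straight_line(node, goal):
    """Heuristic distance: straight-line (Euclidean) distance from node to goal."""
    return math.dist(COORDS[node], COORDS[goal])


def search(graph, start, goal, heuristic=zero):
    """Return (path, travelled distance, number of extensions)."""
    # Each entry is (estimated total, travelled distance, path), so the heap
    # always pops the path with the lowest e(total) = d(travelled) + e(remaining).
    paths = [(heuristic(start, goal), 0, [start])]
    extended = set()  # the extended set
    extensions = 0
    while paths:
        _, travelled, path = heapq.heappop(paths)
        node = path[-1]  # a path is referenced by its terminal node
        if node == goal:  # goal test when the path is taken off the list
            return path, travelled, extensions
        if node in extended:
            continue  # skip: a path popped earlier already extended this node
        extended.add(node)
        extensions += 1
        for child, weight in graph[node]:
            if child not in extended:
                new_travelled = travelled + weight
                estimate = new_travelled + heuristic(child, goal)
                heapq.heappush(paths, (estimate, new_travelled, path + [child]))
    return None, math.inf, extensions  # goal unreachable
```

On this hypothetical graph, `search(build_graph(EDGES), "S", "G")` and `search(build_graph(EDGES), "S", "G", straight_line)` both return the path S-B-G with a travelled distance of 8, but the zero heuristic needs 4 extensions while the straight-line heuristic needs 3, because node C is never extended. Adding a heuristic to BnB w/ extended set in this way is essentially the idea behind the A* search algorithm.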