Investigating Search Algorithms

To what extent is A* search algorithm more efficient than Bidirectional Dijkstra’s algorithm at finding the closest relationship between 2 people in a network of human connections?

Computer Science

Word Count: 3814

Candidate Number:

Contents page

1 Introduction
2 Search Algorithms
2.1 Basics
2.2 Breadth-First Search
2.3 Branch-and-Bound
2.4 Extended Set
2.5 Heuristic Function
3 Algorithms Used
3.1 Bidirectional Dijkstra’s Algorithm
3.1.1 Dijkstra’s Algorithm
3.1.2 Bidirectional Search
3.1.3 In Context
3.2 A* Search Algorithm
3.2.1 In Context
4 Methodology
4.1 Setup
4.2 Procedure
4.3 Success Criteria
4.3.1 Time Complexity
4.3.2 Expanded Nodes
4.3.3 Optimal Distance
4.4 Clarifications
4.4.1 Six Degrees of Separation
4.4.2 Calculating Heuristic Values
4.4.3 Privacy
4.4.4 Logistical Setback
5 Comparison
5.1 Hypothesis
5.2 Analysis
5.2.1 Bidirectional Dijkstra’s Algorithm Explanation
5.2.2 A* Search Algorithm Explanation
6 Conclusion
7 Evaluation
8 Works Cited
9 Works Consulted

1 Introduction

Artificial intelligence is taking over. It is used to recognise faces, detect fake news, and even identify the type of pizza in a photo. It has allowed us to process large amounts of data, quantities that would take generations of humans to absorb, let alone analyse.

Computers are known for their ability to solve seemingly unsolvable problems, and search algorithms are indispensable to fast software. Search algorithms solve problems that range from locating where data is stored in a database to finding anomalies in data sets. Investigating search algorithms helps identify which of them provide more effective solutions to complex problems. Searching in human networks explores an integral part of human lives and has a wide variety of applications, such as forensic investigation, social networking and tracing one's ancestors.

This paper explores the strengths and weaknesses of the A* search algorithm and Bidirectional Dijkstra’s algorithm, and compares their efficiency in finding the shortest connection between two people in a large network by analysing their time complexity and the optimality of their results. In this essay, the optimality of a path is determined by the shortness of its length.

A* and Bidirectional Dijkstra found similarly, if not equally, optimal paths. A*’s time complexity depends on the effectiveness of its heuristic function. The essay concludes that A* with an effective heuristic function (see 2.5 Heuristic Function) is more efficient than Bidirectional Dijkstra in finding the shortest connection between humans in a network.

2 Search Algorithms

A* search algorithm and Dijkstra’s algorithm are extensions of the algorithms below. Note that paths are referenced by their terminal node.

2.1 Basics

Table 1 defines the related terms, with examples that refer to Graph 1 and Tree 1 below.

Term | Definition | Example
Search tree | Records the search paths | Tree 1 is the full search tree of Graph 1.
Node | A point along the graph | S and A to I are nodes.
Start node | The search starts from said node. | S is the start node.
Goal node | The aim of the search is to find a path to said node. | G is the goal node.
Edges | A line joining two nodes. |
Weights (of edges) | The actual “distance” between two nodes. | The weight of the edge between A and C is 7.
Path | A series of nodes that contains the explored route |
Terminal node | The last node in a path | H and G are terminal nodes.
Child node | Subnode of a given node | H and F are children nodes of B.
Parent node | The node that a given node is a subnode of | B is the parent node of H and F.
Ancestors | A series of preceding parent nodes | S and B are the ancestors of H.
Descendants | A series of nodes that expands from the node | F, I and G are the descendants of B.
Root node | Node with no parent node, usually at the top of the tree | S is the root node.
Leaf node | Node with no children nodes, usually at the bottom | H and G are the leaf nodes.
Partial path | Path that does not reach the goal | S-B-H is a partial path.
Complete path | Path that reaches the goal | S-A-C-E-I-G, S-A-D-E-I-G and S-B-F-I-G are complete paths.
Expanding (a node) | Finding the children nodes of a given node | H and F are expanded from B.
Open | Has potential children | B is open.
Closed | Has no potential children | H is closed.
Layer | Nodes in a layer are of the same generation | C, D, H and F are in the same layer.
Branching factor (b) | (Average) number of children (a node has in a graph) | B has a branching factor of 2.
Depth (d) | Layers of nodes (or number of nodes in the complete path) | Tree 1 has a depth of 6.
Frontiers | Terminal nodes of partial paths in a search | The frontiers in Tree 1 are: G and H.
Extension/Expansion | Children nodes are extended/expanded from a parent node. | C, D and B are extended/expanded from A.
Accumulated Distance | The sum of the weights of the path’s edges. | The accumulated distance of S-A-C is 3 + 7 = 10.
Heuristic Distance | The predicted remaining distance from the node to the goal node. In this Graph, it is the length of a straight line drawn from the corresponding node to the goal node (see 2.5 Heuristic Function for detailed explanation). | The heuristic distance of A is 15.

Table 1 - Search Trees Relevant Terms ​

Graph 1 - Example graph ​

Tree 1 - Full Search Tree of Graph 1 ​ ​

2.2 Breadth-First Search

Breadth-first search (BFS) checks every node in one layer of a tree for the goal node before moving on to the next layer, and continues until the goal node is found.

A queue is usually used to store the paths waiting to be checked. An element is added, or enqueued, at the back of the queue, while extracting an element, or dequeuing, takes the first element in the queue.

Diagram 1 shows how a queue works. ​

Diagram 1 - Queue ​

Flowchart 1 shows how breadth-first search works. ​

Flowchart 1 - Breadth-First Search Procedure ​
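The breadth-first procedure can be sketched in Python as follows. This is a minimal illustration rather than the essay's own implementation: the graph is a small hypothetical adjacency list (it is not Graph 1), and the function name `breadth_first_search` is chosen here for illustration.

```python
from collections import deque

# Hypothetical directed adjacency list; NOT the essay's Graph 1.
GRAPH = {
    "S": ["A", "B"],
    "A": ["C", "D"],
    "B": ["D", "G"],
    "C": [],
    "D": ["G"],
    "G": [],
}


def breadth_first_search(graph, start, goal):
    """Return the first path found from start to goal, layer by layer."""
    queue = deque([[start]])            # queue of paths; each path is a list of nodes
    while queue:
        path = queue.popleft()          # dequeue the path at the front
        node = path[-1]                 # a path is referenced by its terminal node
        if node == goal:
            return path
        for child in graph[node]:
            if child not in path:       # do not loop back along the same path
                queue.append(path + [child])   # enqueue at the back
    return None                         # the goal cannot be reached


print(breadth_first_search(GRAPH, "S", "G"))
```

Because paths are dequeued in the order they were enqueued, every path of a given layer is checked before any path of the next layer.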

Tree 2 is an example of a BFS search of Graph 1 above (refer to Table 1 for the definitions of extension and expansion).

Tree 2 - BFS Search Tree Example ​

2.3 Branch-and-Bound

Branch-and-bound (BnB) sorts the list of paths waiting to be expanded by the lengths of their paths, from shortest to longest.

BnB introduces node costs. Each node is assigned a cost, a value that determines its search priority. In BnB, the node cost is its path length from the start node. The lower the cost, the higher the priority the node receives.

Flowchart 2 shows how BnB works. The red text indicates what BnB adds to the BFS procedure.

Flowchart 2 - BnB Search Procedures ​

Tree 3 is an example of a BnB search of Graph 1. The orange numbers in Tree 3 indicate the accumulated distance of the path.

Tree 3 - BnB Search Tree Example ​

Since the first path in the list of paths to explore (PATHS in Flowchart 2) is always the shortest path found so far, it is expanded first, and the resulting complete path is likely to be of optimal distance. This ordering also prunes away paths that are too long, reducing the number of extensions from 16 in Tree 2 to 15 in Tree 3.
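A minimal Python sketch of BnB is given below, assuming the list of paths is kept as a priority queue ordered by accumulated distance. The graph and its weights are hypothetical and do not reproduce Graph 1, so the extension count printed here is not comparable to the counts quoted for Tree 2 and Tree 3.

```python
import heapq

# Hypothetical undirected weighted graph; the weights are invented for illustration.
GRAPH = {
    "S": {"A": 3, "B": 5},
    "A": {"S": 3, "C": 7, "D": 4},
    "B": {"S": 5, "D": 1},
    "C": {"A": 7, "G": 1},
    "D": {"A": 4, "B": 1, "G": 6},
    "G": {"C": 1, "D": 6},
}


def branch_and_bound(graph, start, goal):
    """Expand the shortest partial path first.

    Returns (path length, path, number of extensions), or None if no path exists.
    """
    paths = [(0, [start])]              # priority queue keyed by accumulated distance
    extensions = 0
    while paths:
        length, path = heapq.heappop(paths)   # shortest path so far
        node = path[-1]
        if node == goal:
            return length, path, extensions
        extensions += 1
        for child, weight in graph[node].items():
            if child not in path:       # avoid cycles within one path
                heapq.heappush(paths, (length + weight, path + [child]))
    return None


print(branch_and_bound(GRAPH, "S", "G"))
```

A heap is used here instead of re-sorting the whole list after every extension; both approaches always expand the path with the lowest node cost first.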

2.4 Extended Set

An extended set is a collection that keeps track of the nodes that have been extended.

Flowchart 3 shows how BnB with an extended set (BnB w/ extended set) works. The blue boxes indicate the additions made for the BnB w/ extended set model.

Flowchart 3 - BnB w/ Extended Set Search Procedure ​

Tree 4 is an example of a BnB w/ extended set search of Graph 1. The green cross in Tree 4 indicates the path eliminated due to the addition of the extended set.

Tree 4 - BnB w/ Extended Set Search Tree Example ​

Since this model keeps track of the expanded nodes and sorts the paths by path length, it prioritises the shortest paths first. If a node has previously been expanded, the previous path is most likely more efficient, since it took a shorter distance to reach the same node. This prunes away nodes that are unlikely to lead to an optimal path, reducing the number of extensions from 15 in Tree 3 to 12 in Tree 4.
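The effect of the extended set can be sketched as below. This is an illustrative sketch on a hypothetical graph (the weights are invented and are not those of Graph 1); the only change from plain BnB is the `extended` set, which stops a node from being extended a second time.

```python
import heapq

# Hypothetical undirected weighted graph; the weights are invented for illustration.
GRAPH = {
    "S": {"A": 3, "B": 5},
    "A": {"S": 3, "C": 7, "D": 4},
    "B": {"S": 5, "D": 1},
    "C": {"A": 7, "G": 1},
    "D": {"A": 4, "B": 1, "G": 6},
    "G": {"C": 1, "D": 6},
}


def bnb_with_extended_set(graph, start, goal):
    """BnB that never extends the same node twice.

    Returns (path length, path, number of extensions), or None if no path exists.
    """
    paths = [(0, [start])]
    extended = set()                    # nodes that have already been extended
    extensions = 0
    while paths:
        length, path = heapq.heappop(paths)
        node = path[-1]
        if node == goal:
            return length, path, extensions
        if node in extended:            # a shorter path already reached this node
            continue
        extended.add(node)
        extensions += 1
        for child, weight in graph[node].items():
            if child not in extended:
                heapq.heappush(paths, (length + weight, path + [child]))
    return None


print(bnb_with_extended_set(GRAPH, "S", "G"))
```

On this hypothetical graph the search returns the same path as plain BnB would, but the second path to reach node D is discarded instead of being extended again.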

2.5 Heuristic Function

A heuristic distance is the estimated remaining distance to the goal. It is a minimum value for the remaining distance, so it should not exceed the actual remaining distance. The estimated total distance is the sum of the path length so far and the heuristic distance to the goal. This is shown in Equation 1, where e(total path length) is the estimated total distance, d(travelled distance) is the path length so far and e(remaining distance) is the heuristic distance between the terminal node and the goal node.

e(total path length) = d(travelled distance) + e(remaining distance)

Equation 1 - Heuristic Value ​

Heuristic values are generated by a function that takes a node in the graph and the goal node as input. There are different methods of generating the value. As-the-crow-flies distances are often used in heuristic functions for map searching. However, a different function is needed when searching a human network (see 4.4.2 Calculating Heuristic Values).

The heuristic function introduces heuristic values into the node cost, which becomes the sum of the node’s path length and its heuristic distance to the goal node.

Flowchart 4 shows how BnB with a heuristic function (BnB w/ heuristic function) works. The green text indicates the additions made for the BnB w/ heuristic function model.

Flowchart 4 - BnB w/ Heuristic Function Search Procedure ​

Tree 5 is an example of a BnB w/ heuristic function search of Graph 1. In Tree 5, the purple numbers between “+” and “=” are the heuristic distances from the node to the goal. The purple number after “=” is the estimated total distance.

Tree 5 - BnB w/ Heuristic Function Search Tree Example ​

In the BnB w/ heuristic function procedure, the collection of paths to explore is sorted by each path’s estimated total distance. Taking the heuristic value into account speeds up the search, since it deprioritises paths that head away from the goal node, reducing the number of extensions from 12 in Tree 4 to 6 in Tree 5. This also means that the first complete path found is likely the optimal path.
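The ordering by estimated total distance (Equation 1) can be sketched as follows. Both the graph and the `HEURISTIC` table are hypothetical values invented for this illustration; the heuristic values were chosen so that none exceeds the true remaining distance in this particular graph.

```python
import heapq

# Hypothetical undirected weighted graph; the weights are invented for illustration.
GRAPH = {
    "S": {"A": 3, "B": 5},
    "A": {"S": 3, "C": 7, "D": 4},
    "B": {"S": 5, "D": 1},
    "C": {"A": 7, "G": 1},
    "D": {"A": 4, "B": 1, "G": 6},
    "G": {"C": 1, "D": 6},
}

# Hypothetical heuristic distances to G; each is at most the true remaining distance.
HEURISTIC = {"S": 9, "A": 7, "B": 6, "C": 1, "D": 5, "G": 0}


def bnb_with_heuristic(graph, heuristic, start, goal):
    """Expand the path with the lowest estimated total distance first.

    Returns (path length, path, number of extensions), or None if no path exists.
    """
    # Each entry is (estimated total, travelled distance, path).
    paths = [(heuristic[start], 0, [start])]
    extensions = 0
    while paths:
        _, travelled, path = heapq.heappop(paths)
        node = path[-1]
        if node == goal:
            return travelled, path, extensions
        extensions += 1
        for child, weight in graph[node].items():
            if child not in path:
                d = travelled + weight
                # e(total path length) = d(travelled distance) + e(remaining distance)
                heapq.heappush(paths, (d + heuristic[child], d, path + [child]))
    return None


print(bnb_with_heuristic(GRAPH, HEURISTIC, "S", "G"))
```

The only difference from plain BnB is the sort key: the queue is ordered by travelled distance plus heuristic distance rather than by travelled distance alone.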

3 Algorithms Used

3.1 Bidirectional Dijkstra’s Algorithm

Bidirectional Dijkstra’s algorithm (Bidirectional Dijkstra) is a combination of Dijkstra’s algorithm and bidirectional search.

3.1.1 Dijkstra’s Algorithm

Dijkstra’s algorithm (Dijkstra) is a type of best-first search that prioritises paths by path distance. It can be described as BnB w/ extended set.

Flowchart 3 above shows how Dijkstra works. Tree 4 above shows an example of a Dijkstra search of Graph 1.

3.1.2 Bidirectional Search

Bidirectional search alternates between searching from the start node and searching from the goal node. When the frontiers of the two searches intersect, a path is formed. One potential problem is that the partial paths from the start and goal nodes may fail to meet and form a path. However, this is not a problem for Dijkstra, since it is derived from breadth-first search. In other words, the children nodes are widely spread, so the two searches are very likely to meet.

Flowchart 5 shows how Bidirectional Dijkstra works. The orange text and boxes indicate the additions made for Bidirectional Dijkstra.

Flowchart 5 - Bidirectional Dijkstra Search Procedure ​

Diagram 2 is a visual representation of a Bidirectional Dijkstra search. ​

Diagram 2 - Bidirectional Dijkstra Search Example ​

(“Pathfinding.js”)

*Blue squares = visited area | Light green squares = leaf nodes (terminal nodes of partial paths at the end of the search) | Yellow line = final route | Dark green square = start | Red square = goal

Here is the annotated pseudocode that was used to create Flowchart 5. The annotations were made by the author of this essay.

(Sawlani 11)

3.1.3 In Context

Dijkstra is known for finding the shortest path in graphs. However, since it does not have a heuristic function, a Dijkstra search takes longer than an A* search. The bidirectional search component speeds up the search by expanding from the start and goal nodes at the same time. In theory, Bidirectional Dijkstra should offer the best of both worlds: it should be faster than a basic Dijkstra search and still be able to find an optimal path.
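A compact Python sketch of a bidirectional Dijkstra search is shown below. It is one common formulation, not a transcription of Flowchart 5 or of the pseudocode cited from Sawlani: the two searches alternate, and the search stops once the smallest distances waiting in the two queues add up to at least the best complete path found so far. The graph is hypothetical and is assumed to be undirected.

```python
import heapq

# Hypothetical undirected weighted graph; the weights are invented for illustration.
GRAPH = {
    "S": {"A": 3, "B": 5},
    "A": {"S": 3, "C": 7, "D": 4},
    "B": {"S": 5, "D": 1},
    "C": {"A": 7, "G": 1},
    "D": {"A": 4, "B": 1, "G": 6},
    "G": {"C": 1, "D": 6},
}


def bidirectional_dijkstra(graph, start, goal):
    """Return (length, path) of a shortest path, or None if none exists."""
    if start == goal:
        return 0, [start]
    dist = [{start: 0}, {goal: 0}]            # best known distance from each end
    parent = [{start: None}, {goal: None}]    # back-pointers for rebuilding the path
    heaps = [[(0, start)], [(0, goal)]]
    extended = [set(), set()]                 # one extended set per direction
    best, meet = float("inf"), None
    side = 0                                  # 0 = from the start, 1 = from the goal
    while heaps[0] and heaps[1]:
        # No undiscovered path can beat the best complete path: stop.
        if heaps[0][0][0] + heaps[1][0][0] >= best:
            break
        d, node = heapq.heappop(heaps[side])
        if node not in extended[side]:
            extended[side].add(node)
            for child, weight in graph[node].items():
                if d + weight < dist[side].get(child, float("inf")):
                    dist[side][child] = d + weight
                    parent[side][child] = node
                    heapq.heappush(heaps[side], (d + weight, child))
                # A node reached from both ends closes a candidate path.
                if child in dist[1 - side]:
                    total = dist[side][child] + dist[1 - side][child]
                    if total < best:
                        best, meet = total, child
        side = 1 - side                       # alternate between the two searches
    if meet is None:
        return None
    path, node = [], meet
    while node is not None:                   # walk back to the start node
        path.append(node)
        node = parent[0][node]
    path.reverse()
    node = parent[1][meet]
    while node is not None:                   # walk forward to the goal node
        path.append(node)
        node = parent[1][node]
    return best, path


print(bidirectional_dijkstra(GRAPH, "S", "G"))
```

Stopping on the first node seen by both searches is not enough to guarantee the shortest path, which is why the sketch keeps the best candidate and continues until the queue condition is met.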

3.2 A* Search Algorithm

A* search algorithm (A*) is a type of best-first search, meaning it expands the best node according to an evaluation based on a rule or function. It is a BnB search that takes heuristic values into account and keeps a list, called an extended set, of the nodes that have already been visited.

Flowchart 6 shows how A* works.

Flowchart 6 - A* Search Procedure ​

For A* to return an optimal path, its heuristic must meet two conditions: admissibility and consistency.

Admissibility refers to the heuristic values of a graph generated by a heuristic function. If the heuristic distance of any node in a graph is larger than its actual distance to the goal, the heuristic function is not admissible. This can be shown mathematically. In the equation below, x is any node and G is the goal node. H(x,G) is the heuristic distance between x and G. D(x,G) is the actual distance between x and G.

H (x,G) ≤ D (x,G)

Equation 2 - Admissibility ​

Consistency can be expressed mathematically, as shown in Equation 3. In Equation 3, H(x,G) and H(y,G) are the heuristic distances between x and G, and y and G respectively. D(x,y) is the actual distance between x and y.

| H (x,G) - H (y,G) | ≤ D (x,y)

Equation 3 - Consistency

Graph 2 is an example of inconsistent heuristics in a graph from MIT Professor Patrick Winston’s Lecture on A* Search (Winston 39:38).

Graph 2 - Example of graph with inconsistent heuristics

Equation 4 explains why the heuristics in Graph 2 are not consistent. The absolute value of the difference between the heuristic distance H(x,G) from a node x to the goal node G and the heuristic distance H(y,G) from another node y to the goal node must not exceed the actual accumulated distance D(x,y) between x and y. Values from Graph 2 are plugged into this condition for demonstration in Equation 4.

Equation 4 - Graph 2 Inconsistency Explanation ​
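The two conditions can be checked mechanically, as in the sketch below. The edge weights and heuristic values are assumptions: Graph 2 itself is not reproduced in the text, so the values were chosen only so that S-A-C-G has length 102 and S-B-C-G has length 111, the path lengths the essay reports for Graph 2. Consistency is tested as "the heuristic difference across an edge is no greater than the edge weight".

```python
# ASSUMED values, chosen to match the path lengths 102 and 111 reported for Graph 2.
EDGES = {
    ("S", "A"): 1,
    ("S", "B"): 1,
    ("A", "C"): 1,
    ("B", "C"): 10,
    ("C", "G"): 100,
}
H = {"S": 0, "A": 100, "B": 0, "C": 0, "G": 0}   # assumed heuristic distances to G


def true_distances(edges, goal):
    """Shortest distance from every node to the goal by repeated relaxation."""
    nodes = {n for edge in edges for n in edge}
    dist = {n: float("inf") for n in nodes}
    dist[goal] = 0
    for _ in range(len(nodes) - 1):
        for (x, y), w in edges.items():          # edges are treated as undirected
            dist[x] = min(dist[x], dist[y] + w)
            dist[y] = min(dist[y], dist[x] + w)
    return dist


def is_admissible(edges, h, goal):
    """H(x,G) <= D(x,G) for every node x."""
    dist = true_distances(edges, goal)
    return all(h[n] <= dist[n] for n in dist)


def inconsistent_edges(edges, h):
    """Edges (x, y) for which |H(x,G) - H(y,G)| exceeds D(x,y)."""
    return [(x, y) for (x, y), w in edges.items() if abs(h[x] - h[y]) > w]


print(is_admissible(EDGES, H, "G"))
print(inconsistent_edges(EDGES, H))
```

With these assumed values the heuristic is admissible but not consistent, which shows that the two conditions are separate requirements.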

Tree 6 is the search tree for an A* search of Graph 2. In Tree 6, the purple numbers between “+” and “=” are the heuristic distances from the node to the goal. The purple number after “=” is the estimated total distance. The green cross indicates the path eliminated due to the addition of the extended set.

Tree 6 - A* Search Tree of Graph 2 ​ ​

The optimal path is S-A-C-G with a path length of 102. However, the A* search returned S-B-C-G with a path length of 111, which is not optimal.
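This behaviour can be reproduced with a short A* sketch. The edge weights and heuristic values below are assumptions selected so that the two path lengths match the figures above (102 and 111); they are not read from Graph 2. The function `a_star` is an illustrative implementation of BnB with a heuristic and an extended set.

```python
import heapq

# ASSUMED undirected graph: S-A-C-G has length 102 and S-B-C-G has length 111.
GRAPH_2 = {
    "S": {"A": 1, "B": 1},
    "A": {"S": 1, "C": 1},
    "B": {"S": 1, "C": 10},
    "C": {"A": 1, "B": 10, "G": 100},
    "G": {"C": 100},
}
INCONSISTENT_H = {"S": 0, "A": 100, "B": 0, "C": 0, "G": 0}   # assumed values
ZERO_H = {node: 0 for node in GRAPH_2}                        # trivially consistent


def a_star(graph, heuristic, start, goal):
    """A* with an extended set. Returns (path length, path) or None."""
    paths = [(heuristic[start], 0, [start])]   # (estimated total, travelled, path)
    extended = set()
    while paths:
        _, travelled, path = heapq.heappop(paths)
        node = path[-1]
        if node == goal:
            return travelled, path
        if node in extended:                   # node already extended: drop this path
            continue
        extended.add(node)
        for child, weight in graph[node].items():
            if child not in extended:
                d = travelled + weight
                heapq.heappush(paths, (d + heuristic[child], d, path + [child]))
    return None


print(a_star(GRAPH_2, INCONSISTENT_H, "S", "G"))
print(a_star(GRAPH_2, ZERO_H, "S", "G"))
```

Under the inconsistent heuristic, node C is first extended through B, so the shorter route through A is discarded when it later reaches C. With the all-zero heuristic the same code behaves like Dijkstra and returns the shorter path.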

3.2.1 In Context

A* is known to prioritise speed over the quality of the result due to elimination of unlikely paths by a heuristic function. While it generates a reasonably optimal path, meaning that the path generated is shorter than one generated by BFS but longer than one generated by Dijkstra’s algorithm which is the shortest possible path, its main feature is its fast searching speed. If the heuristic function is not admissible in regards to the graph, the search process will not speed up.

4 Methodology

4.1 Setup

● Anonymised user database in the form of a Graph API from social media (see 4.4.3 Privacy)
● NetworkX library in Python for graph generation and search algorithms

4.2 Procedure

1. Import user database

This step includes acquiring an Access Token to access the API before installing and importing the data into Python.

2. Import NetworkX

NetworkX is a library for graph studies in Python. It has graph-generating functions and pre-made search algorithms.

3. Create Heuristic Function

The heuristic function would read certain information about each anonymised user, such as geographical location, workplace, schools, hometown, and the number of comments and likes on each other’s posts, to generate a “score” of the closeness between two users, which determines the weight of the edge between the two user nodes.

4. Use NetworkX to generate graph

5. Use NetworkX’s A* and Bidirectional Dijkstra functions to search the graph

6. Run 3 trials and collect data

7. Record time taken, number of nodes expanded and path length of each trial

4.3 Success Criteria

The algorithms will be evaluated by their time complexity, the number of nodes expanded and the length of the path generated.

4.3.1 Time Complexity

Time complexity indicates the efficiency of an algorithm, or the time taken to run it. The time complexity of a search algorithm is often measured by the number of nodes expanded. Time complexity is expressed in big O notation.

4.3.2 Expanded Nodes

This criterion measures the number of nodes expanded, in other words the number of nodes in the extended set. The fewer expanded nodes there are, the better the algorithm, because it means the algorithm is more efficient in determining the direction of the best path.

4.3.3 Optimal Distance

The shorter the distance between the start and goal nodes, the better the algorithm. The distance is calculated by adding up the weights of the edges between the nodes. The weight of an edge indicates the closeness of the relationship between the two nodes it connects. The closer the relationship, the lower the weight of the edge.

4.4 Clarifications

4.4.1 Six Degrees of Separation

Six degrees of separation is the concept that any 2 randomly selected people in the world are 6 or fewer people away from each other (Watts).

British anthropologist Robin Dunbar suggested that humans are capable of maintaining a social sphere of 150 at maximum. Dunbar’s number is a theorised limit to the number of people in a person’s social circle according to the size of their brain. (Emerging Technology from the arXiv)

Therefore, 150 is the branching factor of a network of human connections.

Suppose that everyone knows 150 people. The first person knows 150 people, who each know 150 people, who each know 150 people, who each know 150 people, who each know 150 people. The number of people reached after 5 extensions is shown in Equation 5.

150 ✕ 150 ✕ 150 ✕ 150 ✕ 150 = 150^5 = 75,937,500,000

Equation 5 - Number of people reached

According to the United Nations, the world population in 2019 was 7.7 billion, which is less than the number of people reached after 5 extensions ("World Population"). In other words, two random people are connected to each other through the social circles of 5 or fewer people.
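The arithmetic can be checked with a few lines of Python. The sketch keeps the essay's idealisation that every person knows exactly 150 people and that social circles never overlap; the function names are chosen here for illustration.

```python
BRANCHING_FACTOR = 150              # Dunbar's number, used as the branching factor
WORLD_POPULATION = 7_700_000_000    # United Nations figure for 2019


def people_reached(branching_factor, extensions):
    """People reached after a number of extensions, assuming no overlap."""
    return branching_factor ** extensions


def extensions_needed(branching_factor, population):
    """Smallest number of extensions that reaches at least the whole population."""
    extensions = 0
    while people_reached(branching_factor, extensions) < population:
        extensions += 1
    return extensions


print(people_reached(BRANCHING_FACTOR, 5))
print(extensions_needed(BRANCHING_FACTOR, WORLD_POPULATION))
```

Four extensions reach 506,250,000 people, which is below the world population, so 5 is the smallest number of extensions that covers everyone under these assumptions.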

This implies that in an ideal dataset that includes everyone on Earth, the number of nodes expanded should be 5 or fewer. However, since the datasets used in this test are not ideal, the number of nodes expanded will most likely be larger than 5.

4.4.2 Calculating Heuristic Values

Since as-the-crow-flies distance does not exist in an abstract map of human connections, a heuristic function will be written to generate the heuristic values between nodes in the map. Heuristic values are necessary for A* to work, since it uses a heuristic function as mentioned in 2.5 Heuristic Function.

According to Suzanne Degges-White, Professor of Counselling at Northern Illinois University, there are 2 major aspects that influence the likelihood of a connection between people: individual factors and environmental factors. Individual factors, such as approachability, social skills, self-disclosure, similarity, and closeness, are difficult to quantify and will not be accounted for in the heuristic function. Environmental factors, such as proximity, geography, activities, and life events, can be quantified to a degree and will be accounted for in the heuristic function (Degges-White).

The heuristic function will take into account factors such as geographical location, workplaces, schools, occupation industry and age. Geographical location determines the likelihood of encounter in daily life. Workplaces is the list of companies at which the person is or has been employed. Schools is the list of schools at which the person is studying or has studied. Occupation industry is the industry in which the person works. People in the same or related industries have a higher chance of contact. Age determines the compatibility of people and the likelihood of encounter throughout their lives. These factors are helpful indicators, but not strict determinants, of the likelihood of encounter between people. There may be other factors that are not accounted for in this heuristic function.
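One way such a heuristic function could look is sketched below. Everything in it is hypothetical: the `Profile` fields, the equal weighting of one point per factor and the 10-year age window are illustrative assumptions, not values taken from the essay or from any dataset, and the sketch makes no claim that the resulting values are admissible or consistent.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Hypothetical anonymised user record holding the environmental factors."""
    location: str
    industry: str
    age: int
    workplaces: set = field(default_factory=set)
    schools: set = field(default_factory=set)


def heuristic_distance(a, b):
    """Estimate separation: 0 = much shared context, 5 = no shared context.

    One point is added for every environmental factor the two people
    do NOT share (an assumed, equal weighting).
    """
    score = 0
    score += 0 if a.location == b.location else 1
    score += 0 if a.workplaces & b.workplaces else 1
    score += 0 if a.schools & b.schools else 1
    score += 0 if a.industry == b.industry else 1
    score += 0 if abs(a.age - b.age) <= 10 else 1   # assumed age window
    return score


# Invented example profiles.
alice = Profile("Hong Kong", "finance", 30, {"Bank A"}, {"School X"})
bob = Profile("Hong Kong", "finance", 34, {"Bank A"}, {"School Y"})
carol = Profile("Toronto", "healthcare", 62, {"Clinic B"}, {"School Z"})

print(heuristic_distance(alice, bob))
print(heuristic_distance(alice, carol))
```

A lower score suggests that two people are more likely to be closely connected, so the search would favour nodes with low scores relative to the goal person.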

4.4.3 Privacy

Privacy is a complex issue that needs to be addressed to prevent abuse of user information, such as large-scale psychological manipulation. Anonymising users, as mentioned in 4.1 Setup, conceals the users’ names and identities. The user information will not be used against the users and is purely for research purposes. The abuse of user information to manipulate the users’ psyche for the selfish motivations of another is in no way justifiable.

4.4.4 Logistical Setback

In light of recent events concerning user privacy, major social media companies have removed developer access to user friend lists. Hence, one of the major components of the procedure is unattainable. An alternative solution would be to artificially generate a human network by creating fake human nodes with personal information, such as geographical location, chosen from a predetermined list, but an artificial network would defeat this paper’s goal of studying real human relationships. Therefore, this paper has resorted to a theoretical and analytical approach backed by small-scale hypothetical examples.

5 Comparison

5.1 Hypothesis

A* and Dijkstra are technically the same except that the former takes heuristic values into account. If a graph of people can be traversed, then Dijkstra is guaranteed to find the shortest path. However, if the graph is large or efficiency is prioritised, A* is preferred. If there is not enough information on the graph, such as the size, only Dijkstra can guarantee a result. If the heuristic function is unable to generate reliable heuristic values, A* will lose its advantage of speed from eliminating paths based on heuristic values and be the same as Dijkstra.

As mentioned, A* is theoretically faster than Dijkstra. In this test, the bidirectional component of Bidirectional Dijkstra makes up for the slow speed of Dijkstra.

It is hypothesised that Bidirectional Dijkstra is both faster and finds more optimal paths than A* with a functional heuristic function in a graph.

5.2 Analysis

5.2.1 Bidirectional Dijkstra’s Algorithm Explanation

Bidirectional Dijkstra does not have a heuristic function, meaning it has no intuition of where the goal node is. It explores all of the neighbouring nodes around the start node and the goal node until the two frontiers meet, which slows down the processing speed.

However, Dijkstra is guaranteed to find the best path since it explores every possibility, while bidirectional search speeds up the process (see 3.1.3 In Context).

The time complexity is O(b^3), where b is the branching factor, which is 150 in a human network (see 4.4.1 Six Degrees of Separation). Graph 3 and Equation 6 explain the derivation of the time complexity. In Graph 3, the dots represent the nodes in the final path. The coloured regions represent the search coverage. The other nodes were not drawn for the sake of simplicity.

Graph 3 - Bidirectional Dijkstra Time Complexity Visualization ​

The highlight colours in Equation 6 correspond to the expansion regions of the same colour in Graph 3. The red region represents the nodes expanded from the start node, which is b. Then, each of the nodes expanded from the start node has b child nodes, hence b^2. Finally, half of the leaf nodes are expanded and, since each node has around b children nodes, this gives b^3/2. Simultaneously, the same process is taking place from the goal node, meaning that the yellow and green search regions will meet halfway; thus only half of the nodes in the orange region are expanded.

Equation 6 - Bidirectional Dijkstra Time Complexity Derivation ​

Table 2 is a small-scale example to illustrate the time complexity where branching factor is ​ scaled down to 5.

1) No. of Nodes Reached: 0
2) No. of Nodes Reached: 10
Red dot: start node
Green dot: goal node
3) No. of Nodes Reached: 14
4) No. of Nodes Reached: 19
The circled nodes are expanded
5) No. of Nodes Reached: 22
6) No. of Nodes Reached: 24
A complete path is found. Continue expanding nodes in queue to check for shorter paths.
A shorter path is found.
7) No. of Nodes Reached: 26
Remaining nodes expanded and no other complete paths found. Search complete.

Table 2 - Bidirectional Dijkstra Time Complexity Example ​

b^3 = 5^3 = 125

Equation 7 - Time Complexity of Bidirectional Dijkstra Example

The number of nodes expanded is less than the theorised time complexity, 26 < 125; therefore, the example is consistent with O(b^3) as an upper bound.

5.2.2 A* Search Algorithm Explanation

A* has a heuristic function, meaning it has an intuition about the general direction of the goal node and is able to eliminate most neighbouring nodes. Since it expands fewer nodes, its processing speed is faster than that of Bidirectional Dijkstra’s algorithm.

However, A*’s time complexity depends on the quality of its heuristic function. A* with an ineffective heuristic function, one that cannot compute a value, would result in a Dijkstra search, since the default heuristic value is 0. In this case, A* prioritises based on path lengths only. A* with a misleading heuristic function, one that computes a wrong value and leads the search away from the goal node instead of towards it, might result in the expansion of all the nodes in the graph. Therefore, the worst case time complexity is O(b^d) (Russell and Norvig 97–104), where b is the branching factor and d is the depth of the search tree (see Table 1 in 2.1 Basics for full definitions). A heuristic function may be misleading because a lack of quantifiable factors leaves it with insufficient or incomplete information. Table 3 and Equation 8 show a small-scale example of an A* search with a misleading heuristic function (same graph as Table 2).

1) No. of Nodes Reached: 5
2) No. of Nodes Reached: 8
3) No. of Nodes Reached: 11
4) No. of Nodes Reached: 39

Table 3 - A* Worst Case Time Complexity Example

b^d = 5^6 = 15,625

Equation 8 - Worst Case Time Complexity of A* Example

Since this is a limited graph, it is not an accurate quantitative representation of an A* search with a bad heuristic function. However, it visually shows the extensive coverage caused by misleading heuristic functions.

The best case time complexity is O(5b) because of the concept of six degrees of separation (see 4.4.1 Six Degrees of Separation). The expansion of the first 5 nodes will reach the 6th node; therefore, b is multiplied by 5, not 6. An effective heuristic function would compute an accurate value that identifies only relevant nodes for expansion. Table 4 and Equation 9 illustrate a small-scale example of this.

1) No. of Nodes Reached: 5
2) No. of Nodes Reached: 8
3) No. of Nodes Reached: 11
4) No. of Nodes Reached: 13

Table 4 - A* Best Case Time Complexity Example ​

6b = 6(5) = 30

Equation 9 - Best Case Time Complexity of A* Example

The number of nodes reached is smaller than the theorised value, 13 < 30. Given that the small scale of this example contributes to a smaller time complexity, O(5b) is justified.
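To put the three bounds side by side, the short sketch below evaluates them for the human-network branching factor of 150. It takes the expressions exactly as stated in this section (b^3, b^d with d = 6 as in Equation 8, and 5b); the function names are illustrative, and the figures are theoretical bounds rather than measured node counts.

```python
def bidirectional_dijkstra_bound(b):
    """Bound on nodes expanded by Bidirectional Dijkstra: b^3."""
    return b ** 3


def a_star_worst_case(b, d):
    """Worst-case bound for A* with a misleading heuristic: b^d."""
    return b ** d


def a_star_best_case(b, steps=5):
    """Best-case bound for A* with an effective heuristic: 5b."""
    return steps * b


b = 150   # branching factor of a human network (see 4.4.1)
d = 6     # depth used in Equation 8

print("Bidirectional Dijkstra:", bidirectional_dijkstra_bound(b))
print("A* worst case:         ", a_star_worst_case(b, d))
print("A* best case:          ", a_star_best_case(b))
```

For b = 150 the best-case A* bound is in the hundreds and the Bidirectional Dijkstra bound is in the millions, while the worst-case A* bound is in the trillions.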

6 Conclusion

Through this research, I compared the efficiency of A* and Bidirectional Dijkstra in traversing a human network. Both algorithms have their strengths and weaknesses. Bidirectional Dijkstra explores every possibility, guaranteeing an optimal path. However, this meticulous searching also slows the search. Bidirectional search shortens the search time by searching from both the start and the goal. A* has a heuristic function that prunes irrelevant nodes and speeds up the search.

The analysis concluded that A*’s efficiency depends heavily on the quality of its heuristic function. A*’s worst case time complexity is significantly larger than Bidirectional Dijkstra’s average time complexity. However, A*’s best case time complexity is smaller than Bidirectional Dijkstra’s average time complexity. The same applies to the number of nodes expanded, since time complexity is calculated from it. Worst-case A*, best-case A* and Bidirectional Dijkstra find paths of similar, if not the same, optimality.

In conclusion, theoretically, the A* search algorithm with an effective heuristic function is more efficient than Bidirectional Dijkstra’s algorithm at finding the closest relationship between 2 people in a network of human connections.

7 Evaluation

Six degrees of separation occurs if the graph includes a large number of people and accurately represents their connections. The graph’s edge weights have to quantify the closeness of each relationship accurately. The graph needs directed edges of different weights to represent each individual’s different perception of the familiarity level in a relationship (e.g. unrequited affection). Creating such a graph is challenging but not impossible.

A perfect heuristic function needs to take human factors into account. The individual factors mentioned in 4.4.2 Calculating Heuristic Values are difficult to quantify; thus, a perfect heuristic function is difficult to create. Even with a heuristic function that measures quantifiable factors, perfect information about the people must be accessible. This was a major issue during the setup process, since existing datasets do not have perfect information. However, despite imperfect conditions, heuristic functions can compute a reasonable heuristic value based on publicly available data, such as location, occupation, age group, company, school, etc.

Further investigation would require access to publicly unavailable data (see 4.4.4 Logistical Setback).

8 Works Cited

Degges-White, Suzanne, Ph.D. "Friendology: The Science of Friendship." Psychology Today, www.psychologytoday.com/us/blog/lifetime-connections/201805/friendology-the-science-friendship. Accessed 11 Sept. 2019.

Emerging Technology from the arXiv. "Your Brain Limits You to Just Five BFFs." MIT Technology Review. MIT Technology Review, www.technologyreview.com/s/601369/your-brain-limits-you-to-just-five-bffs/. Accessed 11 Sept. 2019.

"Lab 2." Ai6034.mit.edu, 14 Sept. 2018, ai6034.mit.edu/wiki/index.php?title=Lab_2. Accessed 17 June 2019.

"Pathfinding.js." Qiao.github.io, qiao.github.io/PathFinding.js/visual/. Accessed 11 Sept. 2019.

Russell, Stuart, and Peter Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall, 2003.

Sawlani, Sneha. "Explaining the Performance of Bidirectional Dijkstra and A* on Road Networks." Digital Commons @ DU, p. 11. Digital Commons @ DU, digitalcommons.du.edu/cgi/viewcontent.cgi?article=2303&context=etd. Accessed 11 Sept. 2019.

Watts, Duncan J. Six Degrees: The Science of a Connected Age. W. W. Norton & Company, 2004.

Winston, Patrick H., narrator. Lecture 5: Search: Optimal, Branch and Bound, A*. MIT OpenCourseWare, ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-034-artificial-intelligence-fall-2010/lecture-videos/lecture-5-search-optimal-branch-and-bound-a/. Accessed 17 June 2019.

"World Population Prospects 2019: Highlights." United Nations, 17 June 2019, www.un.org/development/desa/publications/world-population-prospects-2019-highlights.html. Accessed 11 Sept. 2019.

9 Works Consulted

"Dijkstra vs Bi-directional Dijkstra Algorithm on US Road Network." Youtube, 20 May 2017, www.youtube.com/watch?v=1oVuQsxkhY0. Accessed 11 Sept. 2019.

"Dijkstra vs Bi-directional Dijkstra Progress - Rectangular and Hexagonal Grid." Youtube, 14 May 2017, www.youtube.com/watch?v=8Jjdp6f7oaE. Accessed 11 Sept. 2019.

Geeks for Geeks. www.geeksforgeeks.org/fundamentals-of-algorithms/#SearchingandSorting. Accessed 11 Sept. 2019.

"Lab 2." Ai6034.mit.edu, 7 Sept. 2019, ai6034.mit.edu/wiki/index.php?title=Lab_2. Accessed 11 Sept. 2019.

"The Science of Six Degrees of Separation." Youtube, 25 Aug. 2015, www.youtube.com/watch?v=TcxZSmzPw8k. Accessed 11 Sept. 2019.

Winston, Patrick. Artificial Intelligence. 3rd ed., Pearson. Courses.csail.mit.edu, Pearson, courses.csail.mit.edu/6.034f/ai3. Accessed 11 Sept. 2019.