Tarjan's Lowest Common Ancestor Algorithm
Maria Mahbub
Algorithms - COSC 581
04/20/2021

Test Questions
1. What is the difference between static and off-line LCA finding problems?
2. Which data structure is used in Tarjan's lowest common ancestor algorithm?
3. What is the overall time complexity of Tarjan's lowest common ancestor algorithm?

About Me
● PhD student
  ○ Computer Science major, EECS, UTK
  ○ Joined the Data Science Program at UTK in Fall 2018
  ○ Moved to Computer Science in Fall 2020
● Research interest: Natural Language Processing, more specifically vulnerability assessment of NLP models
● Research collaborator, Oak Ridge National Laboratory (NSSD)
  ○ Currently working on the REACHVET project, focusing on suicide prevention
● Advisors:
  ○ Dr. Gregory Peterson (primary advisor at UTK)
  ○ Dr. Edmon Begoli (co-advisor at ORNL)

Hometown & Interests
● Home country: Bangladesh
● Hometown: Dhaka, the capital of Bangladesh, 8,354 miles away
● BS in Mathematics & MS in Applied Mathematics
● Love to travel with friends and family
● LOVE Biryani (my comfort food!)
My undergrad institution: University of Dhaka (the oldest, largest, and one of the most prestigious universities in the country)

[Photo slide: Some Pictures!]

Outline
● Overview
● Motivation
● History
● Some Developments since Tarjan's Algorithm
● Algorithm
● Implementation
● Comparison with Another Frequently Used Algorithm: RMQ-LCA
● Applications
● References
● Discussion

Overview
● Lowest Common Ancestor (LCA)
  ○ Consider two nodes x and y in a tree.
  ○ LCA(x, y) is the lowest (deepest) node in the tree that has both x and y as descendants.
  ○ A node can be a descendant of itself.
  ○ In the figure, LCA(z, y) is z: y is a descendant of z (its child), and z counts as a descendant of itself.
[Figure: example tree with nodes x, z, and y]

Overview
Problem definition: given a rooted tree, how fast can we answer Lowest Common Ancestor (LCA) queries for any pair of nodes?
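For comparison with the algorithms discussed here, a minimal Python sketch of the direct approach implied by the definition: collect x and all of its ancestors, then climb from y until the first node in that collection is reached. The parent-pointer dictionary `PARENT` and the function name `naive_lca` are illustrative assumptions; the tree is the seven-node example used in the implementation slides. Each query costs time proportional to the tree's height, O(n) in the worst case, which is the per-query cost that LCA algorithms aim to beat.

```python
def naive_lca(parent, x, y):
    """Return LCA(x, y) by collecting x and its ancestors, then climbing from y."""
    on_path_of_x = set()
    node = x
    while node is not None:  # x counts as its own ancestor
        on_path_of_x.add(node)
        node = parent[node]
    node = y
    while node not in on_path_of_x:  # the first hit is the lowest common node
        node = parent[node]
    return node


# Hypothetical parent-pointer encoding of the seven-node example tree:
#         1
#       /   \
#      2     3
#     / \   / \
#    4   5 6   7
PARENT = {1: None, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 3}
print(naive_lca(PARENT, 2, 5), naive_lca(PARENT, 6, 7), naive_lca(PARENT, 5, 6))  # 2 3 1
```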
● The LCA problem was first formulated by Aho, Hopcroft, and Ullman in 1973.
● Applications:
  ○ In object-oriented programming, to find the closest common superclass of two classes in an inheritance hierarchy
  ○ In compiler design, to find a common ancestor of two basic blocks, where a computation common to both can be performed

Motivation
● Aho, Hopcroft, and Ullman introduced three versions of the LCA problem:
● Online LCA: find the LCA of each node pair as the query is made.
● Static LCA: answers are still required online, but all tree-merging instructions precede the queries.
● Off-line LCA: first collect all queries, then find the LCA of each node pair.
  ○ Time complexity: O(n log* n) in the 1973 version.
  ○ In 1976, Aho, Hopcroft, and Ullman used the set union algorithm, through an intermediate problem called the off-line min problem.
  ○ The time complexity becomes O(n α(n)).
● The motivation behind Tarjan's LCA algorithm was a cleaner approach that uses the set union algorithm directly.

History
● Developed in 1979 by Robert Endre Tarjan
● Published in the paper "Applications of path compression on balanced trees"
● Uses the disjoint-set (union-find) data structure
● Time complexity:
  ○ Barely slower than linear
  ○ For a rooted tree with N nodes and Q queries, the total running time is O(N α(N) + Q).
  ○ α is the inverse Ackermann function.
[Photo: Robert Endre Tarjan]

Chronological Developments
● 1983, Gabow and Tarjan (off-line)
  ○ For a special case of the disjoint set union problem, the time complexity is O(n).
  ○ A slight, but theoretically significant, improvement
  ○ The theory was too complicated to implement.
● 1984, Harel and Tarjan (off-line)
  ○ Uses the observation that on complete binary trees the LCA can be found in O(1) time by direct calculation
  ○ O(n) to preprocess the tree
  ○ Also not easy to implement efficiently
● 1988, Schieber and Vishkin (off-line)
  ○ Uses the EREW PRAM
  ○ q queries take O(log n) time to process on (n + q)/log n processors
  ○ Read conflicts are allowed

Chronological Developments
● 1989, Berkman, Breslauer, Galil, Schieber, and Vishkin (static)
  ○ Based on the observation by Gabow, Bentley, and Tarjan that computing the minimum over any interval can be reduced to answering an LCA query in a Cartesian tree, and that the RMQ problem can be solved serially in linear time
  ○ Proposed an algorithm on the CRCW PRAM that takes O(α(n)) time to preprocess and answer LCA queries
  ○ Complexity arises from the PRAM model
● 1998, Buchsbaum, Kaplan, Rogers, and Westbrook (off-line)
  ○ Pointer-machine implementation
  ○ Uses pointer-based radix sort
  ○ Time complexity: O(n + q)
● 2000, Bender and Farach-Colton (static)
  ○ Simplification of the previous approach
  ○ Implemented without the PRAM, as a sequential approach
  ○ Uses RMQ
  ○ O(n) for preprocessing and O(1) per query

Algorithm Basics
● Disjoint-set (union-find) data structure
  ○ Keeps track of a set of elements partitioned into several disjoint subsets
  ○ Supports two basic operations:
    ■ Find: determines which subset a particular element is in
    ■ Union: merges two subsets into a single subset
  ○ Represented by rooted trees: each node is a member, each tree is a set
  ○ A member points only to its parent. The root holds the representative and is its own parent.
[Figure: disjoint-set forest with nodes a to f]
● Time complexity: O(m α(n)) for a sequence of m MAKE-SET, UNION, and FIND-SET operations on a disjoint-set forest with n nodes (using both heuristics below), where α(n) is the extremely slow-growing inverse Ackermann function.

Algorithm Basics
● Disjoint-set union implementation with path compression and union by rank:
  ○ MAKE-SET: creates a tree with just one node. Time complexity: O(1)
  ○ FIND-SET: follows parent pointers until the root of the tree is found.
    ■ Path compression: makes every node on the search path point directly to the root.
  ○ UNION: makes the root of one tree point to the root of the other.
    ■ Union by rank: attaches the shorter tree to the root of the taller tree.
[Figure: forests before and after union(e, g); afterwards find(e) = find(g)]

Algorithm
LCA(u)
    MAKE-SET(u)
    FIND-SET(u).ancestor = u
    for each child v of u in T
        LCA(v)
        UNION(u, v)
        FIND-SET(u).ancestor = u
    u.color = BLACK
    for each node v such that {u, v} ∈ P
        if v.color == BLACK
            print "The least common ancestor of" u "and" v "is" FIND-SET(v).ancestor
(P is the set of query pairs; every node is colored WHITE before the first call, LCA(root).)

Implementation
● Find the answers to the queries LCA(2, 5), LCA(6, 7), and LCA(5, 6) in this tree:
        1
      /   \
     2     3
    / \   / \
   4   5 6   7
➢ LCA(1): create a disjoint set for node 1; ancestor[1] = 1. Walk toward its left child 2.
➢ LCA(2): create a disjoint set for node 2; ancestor[2] = 2. Walk toward its left child 4.
➢ LCA(4): create a disjoint set for node 4 and color 4 BLACK. Return to 2: UNION(2, 4); the set {2, 4} has ancestor 2.
➢ Walk from 2 toward its right child 5. LCA(5): create a set for node 5 and color 5 BLACK; its query partners 2 and 6 are not BLACK yet, so nothing is answered. Return to 2: UNION(2, 5); the set {2, 4, 5} has ancestor 2.
➢ Color 2 BLACK. Node 5 is already BLACK, so LCA(2, 5) = FIND-SET(5).ancestor = ancestor[FIND(5)] = 2.
➢ Return from 2 to 1: UNION(1, 2); the set {1, 2, 4, 5} has ancestor 1.
➢ Walk from 1 toward its right child 3. LCA(3): create a set for node 3; ancestor[3] = 3. Walk toward its left child 6.
➢ LCA(6): create a set for node 6 and color 6 BLACK. Node 5 is already BLACK, so LCA(5, 6) = FIND-SET(5).ancestor = 1 (node 7 is not BLACK yet). Return to 3: UNION(3, 6); the set {3, 6} has ancestor 3.
➢ Walk from 3 toward its right child 7. LCA(7): create a set for node 7 and color 7 BLACK. Node 6 is already BLACK, so LCA(6, 7) = FIND-SET(6).ancestor = 3. Return to 3: UNION(3, 7); the set {3, 6, 7} has ancestor 3.
➢ Color 3 BLACK and return to 1: UNION(1, 3); the set of all nodes has ancestor 1. Color 1 BLACK.
❏ LCA(2, 5) = 2
❏ LCA(6, 7) = 3
❏ LCA(5, 6) = 1

Tarjan's LCA vs. RMQ-LCA
● Find the answers to the queries LCA(2, 5), LCA(6, 7), and LCA(5, 6) in the same tree T as above.

RMQ-LCA Algorithm
● Uses Range Minimum Query (RMQ) and an Euler tour to find the LCA in a static tree.
● In LCA(u, v), u must occur first in the Euler tour (its first occurrence must come before that of v); otherwise swap u and v.
● Range Minimum Query: finds the position of the element with the minimum value between two given indices of an array.
● Euler tour: traverses the tree starting from the root, recording a node every time it is reached (including on the way back up), and ends back at the root after visiting all vertices, like tracing the tree without lifting the pencil.
● Time complexity:
  ○ Preprocessing: O(n)
  ○ RMQ with a segment tree: O(log n) per query

RMQ-LCA Algorithm
● Perform an Euler tour of the tree and fill three arrays:
  ○ Euler tour array: the nodes in the order they are visited during the tour
  ○ Level array: the level (depth) of each entry of the Euler tour array
  ○ First occurrence array: the index of each node's first occurrence in the Euler tour
● Use the first occurrence array to get the indices of the two given nodes; they are the ends of the range in the level array that is passed to the RMQ algorithm to find the minimum value.
● The node stored in the Euler tour array at the position of that minimum is the LCA.
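To make the comparison concrete, here is a minimal Python sketch that answers the example queries both ways. It assumes the tree is given as child lists (`CHILDREN`) and the query set P as a list of pairs (`QUERIES`); those names and the helper names `DisjointSet`, `tarjan_lca`, `euler_tour`, and `rmq_lca` are hypothetical. `tarjan_lca` mirrors the LCA(u) pseudocode, storing FIND-SET(u).ancestor in a dictionary keyed by each set's representative and modelling "u.color = BLACK" as membership in a Python set. `rmq_lca` builds the Euler tour, level, and first-occurrence arrays and uses a bottom-up segment tree over the level array as its RMQ structure, which is one possible choice (a sparse table is another). Both use recursion, so very deep trees would need an iterative traversal or a higher recursion limit.

```python
# Hypothetical encoding of the seven-node example tree and its queries.
CHILDREN = {1: [2, 3], 2: [4, 5], 3: [6, 7], 4: [], 5: [], 6: [], 7: []}
QUERIES = [(2, 5), (6, 7), (5, 6)]


class DisjointSet:
    """Union-find with path compression and union by rank."""

    def __init__(self):
        self.parent, self.rank = {}, {}

    def make_set(self, x):
        self.parent[x], self.rank[x] = x, 0

    def find_set(self, x):
        if self.parent[x] != x:  # path compression: re-point x at the root
            self.parent[x] = self.find_set(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        rx, ry = self.find_set(x), self.find_set(y)
        if rx == ry:
            return
        if self.rank[rx] < self.rank[ry]:  # union by rank: shorter under taller
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1


def tarjan_lca(children, root, queries):
    """Off-line LCA: one DFS that mirrors the LCA(u) pseudocode."""
    dsu, ancestor, black = DisjointSet(), {}, set()
    pending = {u: [] for u in children}  # the query set P, indexed by node
    for u, v in queries:
        pending[u].append(v)
        pending[v].append(u)
    answers = {}

    def visit(u):
        dsu.make_set(u)
        ancestor[dsu.find_set(u)] = u
        for v in children[u]:
            visit(v)
            dsu.union(u, v)
            ancestor[dsu.find_set(u)] = u
        black.add(u)  # u.color = BLACK
        for v in pending[u]:
            if v in black:
                answers[(u, v)] = answers[(v, u)] = ancestor[dsu.find_set(v)]

    visit(root)
    return {q: answers[q] for q in queries}


def euler_tour(children, root):
    """Fill the Euler tour, level, and first-occurrence arrays."""
    euler, level, first = [], [], {}

    def walk(u, depth):
        first[u] = len(euler)
        euler.append(u)
        level.append(depth)
        for v in children[u]:
            walk(v, depth + 1)
            euler.append(u)  # back at u after finishing child v
            level.append(depth)

    walk(root, 0)
    return euler, level, first


def rmq_lca(children, root, queries):
    """LCA via Euler tour plus a bottom-up min segment tree over levels."""
    euler, level, first = euler_tour(children, root)
    n = len(level)

    def better(i, j):  # index with the lower level wins
        return i if level[i] <= level[j] else j

    tree = [0] * n + list(range(n))  # leaves hold indices into `level`
    for i in range(n - 1, 0, -1):
        tree[i] = better(tree[2 * i], tree[2 * i + 1])
    result = {}
    for u, v in queries:
        lo, hi = sorted((first[u], first[v]))  # RMQ range needs lo <= hi
        best, lo, hi = lo, lo + n, hi + n + 1  # half-open range of leaves
        while lo < hi:  # O(log n) range-minimum query
            if lo & 1:
                best, lo = better(best, tree[lo]), lo + 1
            if hi & 1:
                hi -= 1
                best = better(best, tree[hi])
            lo, hi = lo // 2, hi // 2
        result[(u, v)] = euler[best]
    return result


print(tarjan_lca(CHILDREN, 1, QUERIES))  # {(2, 5): 2, (6, 7): 3, (5, 6): 1}
print(rmq_lca(CHILDREN, 1, QUERIES))     # {(2, 5): 2, (6, 7): 3, (5, 6): 1}
```

Both calls report LCA(2, 5) = 2, LCA(6, 7) = 3, and LCA(5, 6) = 1, matching the walkthrough. The practical difference is when the queries must be known: `tarjan_lca` needs the whole query list before its single DFS (off-line), while the Euler-tour arrays and segment tree built in `rmq_lca` do not depend on the queries, so they could be built once and reused for queries that arrive later (static).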