1 Persistent Trees

Full copy: bad.

Fat nodes method
• replace each single-valued field of the data structure by a list of all values taken, sorted by time
• requires $O(1)$ space per data change (unavoidable if we keep old data)
• to look up a data field, need to search based on time
• store values in a binary tree
• checking/changing a data item takes $O(\log m)$ time after $m$ updates
• slowdown of $O(\log m)$ in structure access

Path copying
• much of a data structure consists of fixed-size nodes connected by pointers
• can only reach a node by traversing pointers starting from the root
• changes to a node are only visible to its ancestors in the pointer structure
• when changing a node, copy it and its ancestors (back to the root of the data structure)
• keep a list of roots sorted by update time
• $O(\log m)$ time to find the right root (or constant, if times are integers)
• same access time as the original structure (additive instead of multiplicative $O(\log m)$)
• modification time and space usage equal the number of ancestors: possibly huge

Combined solution (trees only)
• in each node, store an extra timestamped field
• if full, it overrides one of the standard fields for any access later than the stamped time
• access rule: standard access, just check for overrides while following pointers; constant-factor increase in access time
• update rule: when you need to change/copy a pointer, use the extra field if available
  - otherwise, make a new copy of the node with the new info, and recursively modify the parent
• Analysis
  - live node: pointed at by the current root
  - potential function: number of full live nodes
  - copying a node is free (the new copy is not full, which pays for the copy space/time)
  - pay for filling an extra pointer (do this only once, since we can stop at that point)
  - amortized space per update: $O(1)$

Power of twos: like Fibonacci heaps. Show binary tree of modifications.

Application: persistent trees
• amortized cost $O(1)$ to change a field
• a splay tree has $O(\log n)$ amortized field changes per access
• $O(\log n)$ space per access
• drawback: rotations on access mean unbounded space usage

Red-black trees
• aggressive rebalancers
• store
  red-black bit in each node
• use the red-black bit to rebalance
• depth $O(\log n)$
• search: standard binary tree search; no changes
• update: causes changes in red-black fields on the path to the item, $O(1)$ rotations
• result: $O(\log n)$ space per insert/delete
• the geometry application does $O(n)$ changes, so $O(n \log n)$ space
• $O(\log n)$ query time

Improvement
• red-black bits are used only for updates
• only need the current version of the red-black bits
• don't store old versions: just overwrite
• the only updates needed are for $O(1)$ rotations
• so $O(1)$ space per update
• so $O(n)$ space overall

Result: $O(n)$ space, $O(\log n)$ query time for planar point location.

Extensions
• method extends to arbitrary pointer-based structures
• $O(1)$ cost per update for any pointer-based structure with any constant indegree
• full persistence with the same bounds

Suffix Trees

Crochemore and Rytter, Text Algorithms.
Gusfield, Algorithms on Strings, Trees, and Sequences.
Weiner, Linear pattern-matching algorithms, IEEE conference on automata and switching theory.
McCreight, A space-economical suffix tree construction algorithm, JACM.
Chen and Seiferas, Efficient and Elegant Suffix Tree Construction, in Apostolico/Galil, Combinatorial Algorithms on Words.

Another search structure, dedicated to strings.

Basic problem: match a "pattern" to a "text"
• goal: decide if a given string (the pattern) is a substring of the text
• text possibly created by concatenating short ones, e.g. a newspaper
• application in IR, also computational biology (DNA sequences)
• if the pattern is available first, can build a DFA and run in time linear in the text
• if the text is available first, can build a suffix tree and run in time linear in the pattern

First idea: binary tree on strings. Inefficient because we run over the pattern many times.

Tries
• Idea: like bucket heaps, use a bounded alphabet $\Sigma$
• used to store a dictionary of strings
• trees with children indexed by the alphabet
• time to search equals the length of the query string
• insertion ditto
• optimal, since even hashing requires this time to hash
• but better, because no hash function is computed
• space is an issue:
  - using an array increases storage cost by a factor of $|\Sigma|$
  - using a binary tree on the alphabet increases search time by a factor of $\log |\Sigma|$
  - OK for a constant-size alphabet
• size in the worst case: sum of word lengths (so this pretty much solves the dictionary problem)

But what about substrings?
• idea: trie of all $n^2$ substrings
• equivalent to a trie of all $n$ suffixes
• put a marker at the end, so no suffix is contained in another (otherwise, some suffix can be an internal node, hidden by a piece of another suffix)
• means one leaf per suffix
• Naive construction: insert each suffix
• basic algorithm:
  - text $a_1 \cdots a_m$
  - define $s_i = a_i \cdots a_m$
  - for $i = 1$ to $m$
  - insert $s_i$
• time, space $O(m^2)$

Better construction
• note the trie size $|T|$ may be much smaller than $m^2$: e.g. aaaaaaa
• algorithm with time $O(|T|)$
• idea: avoid repeated work by memoizing
• also shades of the finger search tree idea
• suppose we just inserted $aw$
• next insert is $w$
• a big prefix of $w$ might already be in the trie
• avoid traversing it: skip to the end of the prefix

Suffix links
• any node in the trie corresponds to a string
• arrange for the node corresponding to $ax$ to point at the node corresponding to $x$
• suppose we just inserted $aw$
• walk up the tree until we find a suffix link
• follow the link (puts you on the path corresponding to $w$)
• walk down the tree (adding nodes) to insert the rest of $w$

Memoizing (save your work)
• can add a suffix link to every node we walked up
• (since we walked up the end of $aw$, and are putting in $w$ now)
• charging scheme: charge traversal up a node to the creation of its suffix link
• traversal up also covers the (same length) traversal down
• once a node has a suffix link, it is never passed going up again
• thus, total time spent going up/down equals the number of suffix links
• one suffix link per node, so time $O(|T|)$

Suffix Trees
• problem: maybe $|T|$ is large ($m^2$)
• compress paths in the suffix trie
• a path on letters $a_i \cdots a_j$ corresponds to a substring of the text
• replace it by an edge labelled by $(i, j)$ (implicit nodes)
• gives a tree where every internal node has outdegree at least 2
• in such a tree, size is on the order of the number of leaves, $O(m)$
• Search still works:
  - preserves the invariant: at most one edge starting with a given character leaves a node
  - store edges in an array indexed by starting character
  - walk down the same as in a trie, but use indexing into the text to find characters
  - called
    "slowfind", for later

Construction
• obvious: build the suffix trie, then compress
• drawback: may take $m^2$ time and intermediate space
• better: use the original construction idea, but work in the compressed domain
• as before, insert suffixes in order $s_1, \ldots, s_m$
• maintain the compressed tree of what has been inserted so far
• to insert $s_i$, walk down the tree
• at some point, the path diverges from what is in the tree
• may force us to break an edge (show)
• tack on one new edge for the rest of the string (cheap!)

McCreight
• use the suffix link idea of up-link-down
• problem: can't suffix link every character, only explicit nodes
• want work proportional to the real nodes traversed
• need to skip characters inside edges (since we can't pay for them)
• introduced "fastfind"
  - idea: fast algorithm for descending the tree if we know the string is present in the tree
  - just check the first character on an edge, then skip a number of characters equal to the edge length
  - may land you in the middle of an edge (at a specified offset)
  - cost of search: number of explicit nodes in the path
  - pay for it with suffix links

Analysis
• suppose we just inserted string $aw$
• sitting on its leaf, which has a parent
• invariant: every internal node except for the parent of the current leaf has a suffix link to another explicit node
• plausible?
  - suppose $s_j$ and $s_k$ diverge, creating an explicit node at $v$
  - claim: $s_{j+1}$ and $s_{k+1}$ diverge at $\mathrm{suffix}(v)$, creating another explicit node
  - only a problem if $s_{k+1}$ is not yet present
  - happens only if $k$ is the current suffix
  - only blocks the parent of the current leaf
• insertion step:
  - consider the parent $p_i$ and grandparent (parent of parent) $g_i$ of the current node
  - the $g_i$ to $p_i$ link has string $w_1$
  - the $p_i$ to $l_i$ link has string $w_2$
  - go up to the grandparent
  - follow its suffix link
  - fastfind $w_1$
  - claim: we know $w_1$ is present in the tree!
  - create the suffix link from $p_i$ (preserves the invariant)
  - slowfind $w_2$ (stopping when we leave the current tree)
  - break the current edge
  - add a new edge for the rest of $w_2$

Analysis
• Break into two costs: from $\mathrm{suf}(g_i)$ to $\mathrm{suf}(p_i)$ (string $w_1$), then from $\mathrm{suf}(p_i)$ to $p_{i+1}$ (string $w_2$)
• slowfind cost is the time to get from $\mathrm{suf}(p_i)$ to $p_{i+1}$, plus a constant
• note $p_{i+1}$ is a descendant of $\mathrm{suf}(p_i)$
• so total cost is $O\left(\sum_i \left(|p_{i+1}| - |p_i| + 1\right)\right) = O(n)$, since $\mathrm{suf}(p_i)$ is one character shorter than $p_i$
• what about the time to find $\mathrm{suf}(p_i)$?
• fastfind costs less than slowfind, so at most $|g_{i+1}| - |g_i| + 1$ to reach $g_{i+1}$
• sums to linear
• done if $g_{i+1}$ is below $\mathrm{suf}(p_i)$ (double counts, but who cares)
• what if $g_{i+1}$ is above $\mathrm{suf}(p_i)$?
• can only happen if $\mathrm{suf}(p_i) = p_{i+1}$ (this is the only node below $g_{i+1}$)
• in this case, fastfind takes 1 step to go from $g_{i+1}$ to $p_{i+1}$ (landing in the middle of an edge)
• only case where fastfind is necessary, but we can't tell in advance

Weiner algorithm: insert strings backwards, use prefix links.
Ukkonen: online version.
Suffix arrays.

Applications
• preprocess bottom up, storing the first, last, and number of suffixes in each subtree
• allows us to answer queries: what is the first, last, or count of $w$ in the text, in time $O(|w|)$
• enumerate $k$ occurrences in time $O(|w| + k)$ (traverse the subtree; it is binary, so its size is on the order of the number of occurrences)
• longest common substring: work bottom up, deciding if each subtree contains suffixes of both strings. Take the deepest node satisfying this.
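
The naive suffix-trie construction and the occurrence-count query from the notes can be sketched as follows. This is a minimal illustration under stated assumptions, not the linear-time algorithm: it uses the uncompressed trie with the quadratic insert-every-suffix construction, and it accumulates the per-subtree suffix counts during insertion rather than in a separate bottom-up pass (the resulting counts are the same). The names `Node`, `build_suffix_trie`, and `count_occurrences`, and the choice of `$` as the end marker, are hypothetical and do not come from the notes.

```python
class Node:
    """Suffix-trie node (hypothetical name, not from the notes)."""
    __slots__ = ("children", "count")

    def __init__(self):
        self.children = {}  # next character -> child Node
        self.count = 0      # number of suffixes whose path passes through this node


def build_suffix_trie(text, end="$"):
    """Naive construction: insert every suffix of text + end marker."""
    if end in text:
        raise ValueError("end marker must not occur in the text")
    s = text + end
    root = Node()
    for i in range(len(s)):          # insert suffix s[i:]
        node = root
        node.count += 1
        for ch in s[i:]:
            child = node.children.get(ch)
            if child is None:
                child = node.children[ch] = Node()
            node = child
            node.count += 1
    return root


def count_occurrences(root, w):
    """Number of (possibly overlapping) occurrences of w in the text, in O(|w|) steps."""
    node = root
    for ch in w:
        node = node.children.get(ch)
        if node is None:
            return 0
    return node.count


root = build_suffix_trie("banana")
print(count_occurrences(root, "ana"))  # 2 (overlapping matches at offsets 1 and 3)
print(count_occurrences(root, "nab"))  # 0
```

Because the end marker guarantees one leaf per suffix, the count stored at a node equals the number of leaves below it, which is the number of positions where the node's string occurs in the text.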
