A Version Numbering Scheme with a Useful Lexicographical Order

AVersion Numb ering Scheme with a Useful Lexicographical Order y z Arthur M. Keller Je rey D. Ullman Stanford University Stanford University Computer Science Dept. Computer Science Dept. Stanford, CA 94305-2140 Stanford, CA 94305-2140 Abstract all situations. A con guration C mayhave an ances- 1 tor con guration C , in which each of the versions par- 0 ticipating in con guration C must b e a descendant We describe a numbering scheme for versions with 1 of the corresp onding version in con guration C . The alternatives that has a useful lexicographical ordering. 0 CEDB computational strategy is to check incremen- The version hierarchy is a tree. By inspection of the tally for constraint violations by comparing a con g- version numbers, we can easily determine whether one uration with an ancestor con guration. This compar- version is an ancestor of another. If so, we can de- ison relies on the determination of changes \deltas" termine the version sequencebetween these two ver- that o ccurred b etween twoversions. sions. If not, we can determine the most recent common ancestor to these two versions i.e., the least up- Our approach is to record the incremental changes per bound, lub. Sorting the version numbers lexico- between a version and its immediate ancestor. As a graphical ly results in a version being fol lowedbyall result, if we need to nd the changes that haveoc- descendants and preceded by al l its ancestors. We use curred b etween versions A and B ,wemust nd the arepresentation of nonnegative integers that is self changes asso ciated with all versions that lie b etween delimiting and whose lexicographical ordering matches A and B in the version hierarchy. Imagine that all the ordering by value. changes are stored in a relation, along with the version to which they p ertain. For example, if in going 1 Intro duction from version P toachild version C ,we insert the ele- ment a, then we might store the change record, a triple Our motivation is a pro ject in collab orative design insert, a, C in the relation. for Civil Engineering b eing carried out at Stanford. Ideally,we should b e able, given version identi ers The approach taken by the pro ject, named CEDB A and B , to nd all the change records asso ciated Collab orativeEnvironment for the Design of Build- with versions that lie b etween the version numb ers for ings, leads to an interesting problem in version num- A and B in the version hierarchy including B but b ering. Our approachtoversion numb ering is the sub- not A. Surely,we can do so if wehave access to the ject of this pap er. hierarchy.However, there is a lack of eciency in an CEDB uses a version and con guration manage- approach that issues one query for eachversion on the ment system that supp orts incremental checking of path b etween A and B .We prop ose instead to issue constraints see Krishnamurthy [1993]. There is a hi- one range query that will access all the change records erarchical version system, with a separate version hier- for all the versions b etween A and B . With the prop er archy for each design discipline e.g., plumbing, elec- index and storage structure, these change records can trical, structural engineering. A con guration con- b e obtained in little more than the time it takes to sists of a version from each discipline together with a access data of this bulk. set of constraints that the con guration should satisfy Our primary goal, therefore, is to nd a scheme for although we do not insist on constraint satisfaction in numb ering versions, so that all of the relevantchange This work is part of the CEDB pro ject: Collab orative En- records can b e found by a sequential scan of the data vironment for the Design of Buildings, or Civil Engineering between the twoversion numb ers when change records Database. This e ort is funded in part by NSF grant IRI{ are sorted by their asso ciated version numb er. There 91{16646. y are twointeresting asp ects of our version numb ering Arthur Keller's e-mail address is [email protected]. z scheme. Je Ullman's e-mail address is [email protected]. 1. A metho d for assigning numb ers to versions so the parent and the three children, a sequential traver- that all the versions b etween A and B in the sal from at least one of the children must skip over at hierarchyhaveversion numb ers that lie b etween least one of its siblings to reach its parent. In general, the version numb ers for A and B however, there we shall have to skip some versions, but it is highly may also b e other version numb ers that lie b e- desirable to nd all the ancestors of a version within tween A and B . The version numb ering scheme some small region of the complete ordering. For ex- must make it p ossible to add new children with- ample: out renumb ering old versions. If we wish to nd the changes from some version V to a descendantversion W ,we shall need to 2. A metho d for enco ding nonnegativeintegers as retrieve less information from secondary storage character strings so that if i<j, then the co de if the ancestors of W are in a small region of the for integer i is lexicographically less than the co de ordering. for j . If we store all change records in a single relation, Note that the conventional enco ding of integers and we sort or index those records byversion do es not satisfy 2. For example, in decimal the char- numb er, we can use a range query to help nd acter strings for the integers 5 and 43 are in the wrong the changes b etween a version V and a descen- order. That is, as integers, 5 < 43, but as character dantversion W , and this query will b e eciently strings, 43 lexically precedes 5. implemented. 2 Representing a Hierarchy Now supp ose we wish to add to the hierarchy.We assume that new versions can only b e added as chil- In this section, we describ e how to represent the dren of existing versions, as is the case in the CEDB hierarchical relationship of version numb ers using a system. When we add new versions, we are faced with linear enco ding that sorts in a useful manner. Let the problem of nding new names for the new versions, us consider the simple version hierarchy in Figure 1, esp ecially if we wish to retain the prop erty that names where the names of the versions o er no help in nding are in lexicographic order along paths of the hierarchy. the versions that lie b etween two given versions in the Figure 2 shows two p ossible approaches. In each hierarchy. case, wehave added some children to Figure 1. In Figure 2a wehave renamed versions so children fol- A low their parent in lexicographic order. Figure 2b shows another approach, where names of versions are unchanged and new versions are given names that fol- B lowany previously used names in lexicographic order. The primary advantage of ordering new versions near the parent as in Figure 2a is lo cality of refer- C D E ence. The primary advantage of naming versions in order of creation as in Figure 2b is that wedonot have to rename versions when adding other versions. Figure 1: Simple version hierarchy. However, in the latter scheme there tend to b e many \false drops," that is, version names that lie in lexi- Version A has one child, version B , which in turn cographic order b etween the names of some version V has three children, versions C , D , and E . In this case, and a descendant W ,yet are not lo cated b etween V the lexicographical ordering of the version names is and W in the hierarchy. useful. To trace the ancestry of version D ,we can By using variable-length version names, we can progress sequentially though the ordering. We tra- avoid the problem of renaming. Consider Figure 3, verse, A, B , skip C , and then stop when we reach where wehave enco ded the ancestry in the version D . name. That is, the name of eachversion is the name of its parent, followed byanumb er that indicates which Observe that it is not p ossible to order the ver- child it is in the ordering of children of its parent. sions so that a sequential traversal of the ordering will There are several further improvements we can encounter all the ancestors of eachversion in an unin- make to this scheme. First, we can optimize for the terrupted sequence. For example, consider a parent common case where a version has only one child. That has three children. Given some sequential ordering of A A B B C F I BA BB BC D E G H BAA BAB BBA BBB a Sorted with parents. Figure 3: Enco ding ancestry. A A B B C D E C BA BB F G H I D CA BAA BAB b Sorted by level. Figure 4: Another way to enco de ancestry. Figure 2: Sorting versions. 3 Our Prop osed Version-Numb ering Scheme is, wemay assign to the rst child of certain versions anumb er that is one greater than the numb er of its Note the problem in naming the descendants of ver- parent, than treating all alternatives children of the sion BA of Figure 4.

Load more