<<

AVersion Numb ering Scheme



with a Useful Lexicographical Order

y z

Arthur M. Keller Je rey D. Ullman

Stanford University Stanford University

Computer Science Dept. Science Dept.

Stanford, CA 94305-2140 Stanford, CA 94305-2140

Abstract all situations. A con guration C mayhave an ances-

1

tor con guration C , in which each of the versions par-

0

ticipating in con guration C must b e a descendant

We describe a numbering scheme for versions with

1

of the corresp onding version in con guration C . The

alternatives that has a useful lexicographical ordering.

0

CEDB computational strategy is to check incremen-

The version hierarchy is a . By inspection of the

tally for constraint violations by comparing a con g-

version numbers, we can easily determine whether one

uration with an ancestor con guration. This compar-

version is an ancestor of another. If so, we can de-

ison relies on the determination of changes \deltas"

termine the version sequencebetween these two ver-

that o ccurred b etween twoversions.

sions. If not, we can determine the most recent com-

mon ancestor to these two versions i.e., the least up-

Our approach is to record the incremental changes

per bound, lub. Sorting the version numbers lexico-

between a version and its immediate ancestor. As a

graphical ly results in a version being fol lowedbyall

result, if we need to nd the changes that haveoc-

descendants and preceded by al l its ancestors. We use

curred b etween versions A and B ,wemust nd the

arepresentation of nonnegative integers that is self

changes asso ciated with all versions that lie b etween

delimiting and whose lexicographical ordering matches

A and B in the version hierarchy. Imagine that all

the ordering by value.

changes are stored in a relation, along with the ver-

sion to which they p ertain. For example, if in going

1 Intro duction

from version P toachild version C ,we insert the ele-

ment a, then we might store the change record, a triple

Our motivation is a pro ject in collab orative design

insert, a, C in the relation.

for Civil Engineering b eing carried out at Stanford.

Ideally,we should b e able, given version identi ers

The approach taken by the pro ject, named CEDB

A and B , to nd all the change records asso ciated

Collab orativeEnvironment for the Design of Build-

with versions that lie b etween the version numb ers for

ings, leads to an interesting problem in version num-

A and B in the version hierarchy including B but

b ering. Our approachtoversion numb ering is the sub-

not A. Surely,we can do so if wehave access to the

ject of this pap er.

hierarchy.However, there is a lack of eciency in an

CEDB uses a version and con guration manage-

approach that issues one query for eachversion on the

ment system that supp orts incremental checking of

path b etween A and B .We prop ose instead to issue

constraints see Krishnamurthy [1993]. There is a hi-

one range query that will access all the change records

erarchical version system, with a separate version hier-

for all the versions b etween A and B . With the prop er

archy for each design discipline e.g., plumbing, elec-

index and storage structure, these change records can

trical, structural engineering. A con guration con-

b e obtained in little more than the time it takes to

sists of a version from each discipline together with a

access data of this bulk.

of constraints that the con guration should satisfy

Our primary goal, therefore, is to nd a scheme for

although we do not insist on constraint satisfaction in

numb ering versions, so that all of the relevantchange



This work is part of the CEDB pro ject: Collab orative En-

records can b e found by a sequential scan of the data

vironment for the Design of Buildings, or Civil Engineering

between the twoversion numb ers when change records

Database. This e ort is funded in part by NSF grant IRI{

are sorted by their asso ciated version numb er. There

91{16646.

y

are twointeresting asp ects of our version numb ering

Arthur Keller's e-mail address is [email protected].

z

scheme. Je Ullman's e-mail address is [email protected].

1. A metho d for assigning numb ers to versions so the parent and the three children, a sequential traver-

that all the versions b etween A and B in the sal from at least one of the children must skip over at

hierarchyhaveversion numb ers that lie b etween least one of its siblings to reach its parent. In general,

the version numb ers for A and B however, there we shall have to skip some versions, but it is highly

may also b e other version numb ers that lie b e- desirable to nd all the ancestors of a version within

tween A and B . The version numb ering scheme some small region of the complete ordering. For ex-

must make it p ossible to add new children with- ample:

out renumb ering old versions.

 If we wish to nd the changes from some version

V to a descendantversion W ,we shall need to

2. A metho d for enco ding nonnegativeintegers as

retrieve less information from secondary storage

character strings so that if i

if the ancestors of W are in a small region of the

for integer i is lexicographically less than the co de

ordering.

for j .

 If we store all change records in a single relation,

Note that the conventional enco ding of integers

and we sort or index those records byversion

do es not satisfy 2. For example, in the char-

numb er, we can use a range query to help nd

acter strings for the integers 5 and 43 are in the wrong

the changes b etween a version V and a descen-

order. That is, as integers, 5 < 43, but as character

dantversion W , and this query will b e eciently

strings, 43 lexically precedes 5.

implemented.

2 Representing a Hierarchy

Now supp ose we wish to add to the hierarchy.We

assume that new versions can only b e added as chil-

In this section, we describ e how to represent the

dren of existing versions, as is the case in the CEDB

hierarchical relationship of version numb ers using a

system. When we add new versions, we are faced with

linear enco ding that sorts in a useful manner. Let

the problem of nding new names for the new versions,

us consider the simple version hierarchy in Figure 1,

esp ecially if we wish to retain the prop erty that names

where the names of the versions o er no help in nding

are in lexicographic order along paths of the hierarchy.

the versions that lie b etween two given versions in the

Figure 2 shows two p ossible approaches. In each

hierarchy.

case, wehave added some children to Figure 1. In

Figure 2a wehave renamed versions so children fol-

A

low their parent in lexicographic order. Figure 2b

shows another approach, where names of versions are

unchanged and new versions are given names that fol-

B

lowany previously used names in lexicographic order.

The primary advantage of ordering new versions

near the parent as in Figure 2a is lo cality of refer-

C D E

ence. The primary advantage of naming versions in

order of creation as in Figure 2b is that wedonot

have to rename versions when adding other versions.

Figure 1: Simple version hierarchy.

However, in the latter scheme there tend to b e many

\false drops," that is, version names that lie in lexi-

Version A has one child, version B , which in turn

cographic order b etween the names of some version V

has three children, versions C , D , and E . In this case,

and a descendant W ,yet are not lo cated b etween V

the lexicographical ordering of the version names is

and W in the hierarchy.

useful. To trace the ancestry of version D ,we can

By using variable-length version names, we can

progress sequentially though the ordering. We tra-

avoid the problem of renaming. Consider Figure 3,

verse, A, B , skip C , and then stop when we reach

where wehave enco ded the ancestry in the version

D .

name. That is, the name of eachversion is the name of

its parent, followed byanumb er that indicates which Observe that it is not p ossible to order the ver-

child it is in the ordering of children of its parent. sions so that a sequential traversal of the ordering will

There are several further improvements we can encounter all the ancestors of eachversion in an unin-

make to this scheme. First, we can optimize for the terrupted . For example, consider a parent

common case where a version has only one child. That has three children. Given some sequential ordering of

A A

B B

C F I BA BB BC

D E G H BAA BAB BBA BBB

a Sorted with parents.

Figure 3: Enco ding ancestry.

A

A

B

B

C D E

C BA BB

F G H I

D CA BAA BAB

b Sorted by level.

Figure 4: Another way to enco de ancestry.

Figure 2: Sorting versions.

3 Our Prop osed Version-Numb ering Scheme

is, wemay assign to the rst child of certain versions

anumb er that is one greater than the numb er of its

Note the problem in naming the descendants of ver-

parent, than treating all alternatives children of the

sion BA of Figure 4. We need to distinguish b etween

same parentversion the same, as shown in Figure 4.

the rst child of BA and the alternativetoBA, b oth

This approach reduces the average length of names in

of which could plausibly have b een named BB. To

the common case when most versions have one child.

handle this problem, we shall alternate b etween en-

In comparison, the scheme of Figure 3 forces names to

co ding the generation numb er and enco ding the al-

b e as long as the paths in the hierarchy.

ternativenumb er. To facilitate the exp osition, we

In summary, there are three imp ortant criteria we

will use numb ers to represent the generation sequence

need from a version-naming scheme.

and letters to represent alternatives. However, in ver-

sion names, think of A;B;::: as representing integers

1. The lexicographic property : all the versions b e-

0; 1;::: . This approach is illustrated in Figure 5.

tween some version V and a descendantversion

Formally,a version numb er for a version V is a se-

W lie lexicographically b etween V and W .

quence g a g :::a g . Here, g indicates the number

0 1 1 n n n

of generations from the matriarch of V , whose num-

2. Given any hierarchyofversions, we can nd a

ber is g a g :::a 0. The matriarchof V is the closest

0 1 1 n

name for any additional child of an existing ver-

not necessarily prop er ancestor of V that is not a rst

sion. This name must b e distinct from the names

child of its parent or the ro ot if there is no such an-

of other versions in the hierarchy.

cestor of V . For example, in Figure 5, the matriarch

of 1B 1A2is1B 1A0, and the matriarchof3A2A0is

itself.

3. The length of a name do es not grow signi cantly

The number a is one less than the numb er of sib- along paths that follow a sequence of leftmost

n

lings that the matriarchofV has to its left. For exam- children.

0

1

2 1A0 1B 0

3 1A1 1A0A0 1B 1 2A0

4 1A2 1B 2 3A0 1B 1A0

5 1B 3 3A1 1B 1A1

6 1B 4 3A2 1B 1A2

1B 5 3A3 3A2A0 1B 1A3 1B 1A2A0 7

Figure 5: Improved version numb ering.

ple, in Figure 5 consider the children of version 1. The 4 Some Basic Using the Version-

rst child gets numb er 2. The second child instead has Numb ering Scheme

app ended to the name \1" the letter A which repre-

We b egin with an for determining the

sents \0". The third child has B app ended to the

ancestors of a version.

name \1". Both the second and third children havea

0 app ended to their names after the A or B , b ecause

Algorithm 1 Traverse ancestors

they are their own matriarchs; i.e., they are 0 genera-

tions from their matriarch. That is, the following rules

Input: Version g a g :::a g .

0 1 1 n n

summarize how the children of a version are named.

Output: All ancestors in order from parent to oldest.

 The rst child of version g a g :::a g is

0 1 1 n n

for i := n downto 0 do

g a g :::a g + 1.

0 1 1 n n

for j := g downto 0 do

i

Printg a g :::g a j 

0 1 1 i1 i

 The k th child of version g a g :::a g , for k>1,

0 1 1 n n

3

is g a g :::a g k 20.

0 1 1 n n

Note that when i is 0, then merely j is output.

Here is an algorithm for determining the closest

This scheme o ers an imp ortant economy in the

common ancestor i.e., least upp er b ound to twover-

situation that is most common: very few versions have

sions.

more than one child. For example, in the extreme

case where there are n versions in a chain, that is, no

Algorithm 2 Closest common ancestor

version has more than one child, log n bits suce to

00 00 00 00 00 0 0 0 0 0

representversion names. In comparison, anyscheme . g :::a g a and g g :::a g a Input: Versions g

m m 1 1 0 n n 1 1 0

that allows us to extend the name of a version to get

the name of its child would require linear space to Output: Closest common ancestor g a g :::a g .

0 1 1 l l

represent these n version names.

0 00

if g 6= g W lie b etween V and W in lexicographic order. In a

0 0

0 00

then b egin l := 0; g := ming ;g ; exit end sense, that is all we need. However, the scheme of

0

0 0

0

else g := g ; Section 3 su ers from the problem that the elements

0

0

for i := 1 to minn; m do b egin of the lists that form names are arbitrary integers.

0 00

if a 6= a Since wemust represent suchintegers in some system

i i

then exit; that uses a nite numb er of symb ols, wemay not b e

0

a := a ; able to preserve the lexicographic order.

i

i

0 00

if g 6= g

For example, consider the names 10A0 and 2B 3.

i i

0 00

then b egin l := i; g := ming ;g ; exit end;

The rst of these represents a version that is found by

i

i i

0

g := g ;

following 10 generations of rst children from the ro ot,

i

i

end;

then moving to the second child. The second is found

l := minn; m;

by following two generations of rst children from the

exit

ro ot, moving to the third child, and then going down

3

three more generations of rst children. The latter

precedes the former, since 2 < 10. However, if we

We call the traversal that results from lexicograph-

representintegers in decimal, and treat A as 0, B as

ical ordering from our enco ding the inverse left-corner

1, and so on, then the two names b ecome 1000 and

traversal. This traversal is pro duced by the following

213. The rst precedes the second in lexicographic

algorithm.

order, which is wrong.

Wethus face the problem of representing integers

Algorithm 3 Inverse left-corner traversal

by co des that are in the same order, lexicographically,

Input: Aversion history represented by tree T .

as the values of the integers. Such an enco ding of inte-

gers will b e said to have the lexicographic property.We

Output: Traversal according to lexicographical ordering.

just observed that the ordinary representations such

as decimal do not have the lexicographic prop erty.We

walkT =

saw that 2 < 10 as integers, but 10 precedes 2 lexico-

for i:=2 to last child of ro ot do

graphically.

walkith subtree of ro ot of T ;

The simplest solution is to use xed-length numb ers

printrootofT;

of sucient length, with numb ers padded on the left

walk rst subtree of ro ot of T 

by leading zero es. For example, 5 digits are enough to

end;

handle 10,000 generations of versions and up to 10,000

3

children of one version. The xed-length solution suf-

fers from three problems: wasted space for lowvalues,

To traverse from a version v to some descendant

a

limited maximum value, and the need to sp ecify the

v , scan the ordering starting at the version v and

d a

length somehow.

ending at the desired descendant v , skipping all ver-

d

Alternatively,we can use a variable-length repre-

sions that are not ancestors of the desired descendant.

sentation that is self-delimiting. Consider the follow-

The only versions that will have to b e skipp ed are

ing simple scheme for representing binary numb ers.

other descendants of ancestor v .

a

Represent a binary number of b bits by pre xing it

with b 1 one bits and a zero bit. This scheme is called

5 Representing Version Names

Elias co des [Elias 1975]. For example, the number 5,

or 101 , is represented by the binary string 110101.

Wemay see the version naming scheme of Section 3

2

Deco ding is easy. The pre x consists of all the bits

as one where version names are lists of integers, the

up to and including the rst zero; the numb er of bits

alternating g 's generation numb ers and a's p osition

in the pre x tells how many bits follow. The number

among siblings. These lists have a natural lexico-

follows in binary.

graphic order. That is, to compare two lists, nd the

common pre x and delete it. After this mo di cation, A similar scheme can b e used for variable-length

an empty list precedes any list, and otherwise, the list enco dings using decimal numb ers. Here, h digits with

with the lower rst element precedes the other. For the value 9 are followed by h + 1 digits, where the

example, in Figure 5, 1B 1A2 precedes b oth 1B 4 and rst of those latter digits is not 9. The numb ers 0

1B 1A2A0. through 8 are represented as themselves. The numb ers

It is easy to check that with this numb ering scheme, 9 through 98 are represented by 900 through 989. The

all the versions b etween a version V and its descendant numb ers 99 through 998 are represented by 99000 to

We shall intro duce the optimal scheme rst by con- 99899, and so on. In general, to get the value from

sidering the case of the decimal representation. such a representation add the pre x the consecutive

9's to the rest of the numb er using ordinary decimal

Enco de the rst 5 numb ers 0 through 4 by 0 through

arithmetic.

4.

To obtain a representation we use the following al-

Enco de the next 25 numb ers 5 through 29 by 50

gorithm. Let d b e the numb er of digits ordinarily used

through 74.

to represent the desired value v . If all d digits of v are

Enco de the next 125 numb ers 30 through 154 by 750

the digit 9, then the representation is d 9's followed

through 874.

0

by d + 1 0's. Otherwise, let v = v x, where x

Enco de the next 625 numb ers 155 through 779 by

is the decimal numb er formed by d 1 9's; that is,

8750 through 9374.

d1

x =10 1. Then the representation of v is d 1

Enco de the next 3125 numb ers 780 through 3904 by

0

9's followed by v padded out with leading zero es to d

93750 through 96874, and so on.

digits, if necessary.

This approach o ers smaller expansion of co des at

We might also adapt the same idea to any radix

the exp ense of slightly more complicated enco ding and

other than binary or decimal. In any radix it is easy

deco ding. For example, the scheme of Section 5 repre-

to prove that for anyintegers i and j ,ifi

sents 999 numb ers in 5 digits, while this scheme rep-

the enco ding of i precedes the enco ding of j , lexico-

resents 3905 numb ers in 5 digits. Here are algorithms

graphically.We shall fo cus on the binary system, for

for enco ding and deco ding numb ers using this scheme.

simplicity.

Supp ose i

Algorithm 4 Enco ding

then the enco ding of i will b egin with fewer 1's than j 's

Input: n to b e enco ded.

enco ding. Thus, i precedes j lexicographically, when

b oth are enco ded. The only other p ossibility when

Output: Decimal representation of the enco ded

i< j is that i and j have the same numb er of bits

numb er.

in their binary representations, say b bits. Then b oth

b1

their enco dings start with 1 0, followed by the b-bit

d

Let d b e the smallest integer such that n<5 5=4.

representations of i and j . Since i

Let :h h :::h b e the decimal fraction

1 2 d2 enco ding of i will precede that of j , lexicographically.

d2

representation of 1 1=2 .

Then the enco ding of n is the decimal representation

6 Optimal Lexicographic Enco dings of Inte-

d1

of h h :::h 0, plus n, minus 5 5=4.

1 2 d2

gers

3

For example, let n = 2000. Then d = 6, b ecause

Let us de ne the expansion of an enco ding to b e

6 5

5 5=4 = 3905 is greater than 2000, but 5

the limit, as i gets large, of the ratio of the length of

5=4 = 780 is not. Thus, h h :::h = 9375, since

the enco ding of i to the length of the ordinary radix

1 2 d2

1

1 = :9375. Wethus compute the enco ding of 2000

representation of i in the base that has as many digits

16

5

by adding 93,750 to 2000 and subtracting 5 5=4,

as the enco ding uses. For example, whether in binary,

or 780. The resulting co de is 94970.

decimal, or another base, the expansion of the enco d-

ing schemes of Section 5 is 2. The reason is that the

Algorithm 5 Deco ding

pre xes intro duced have, within 1, as many digits as

the ordinary representation of the numb er.

Input: A k -digit enco ding, whose decimal value is d,

We might ask whether 2 is the minimum expansion

to b e deco ded.

p ossible. The answer is rather surprising. It is p ossi-

ble to approach expansion 1 as closely as we like; that

Output: Numerical value of the deco ded numb er.

is, asymptotically the requirement of a lexicographic

k k k

enco ding costs nothing. However, to approach1it

The enco ded integer is d 10 +2 5 +5 5=4.

is necessary to inde nitely increase the base of the

3

representation. That has the e ect of putting a pro-

gressively higher o or on the minimum length of an For example, supp ose we are given the co de 94970.

5

enco ded integer when all representations are reduced Then d = 94,970 and k =5.Wemust subtract 10

5

to binary, as they surely must b e for representation = 100,000 from d, then add 2  5 = 6250 and add

5

within a computer. 5 5=4 = 780. The result is 2000.

i

7 Optimality of Our Integer Enco ding symb ols is b=2 . Note that the scheme of Section 6,

which uses b = 10, actually assigns exactly this many

It turns out that the enco ding of Section 6, or its

integers.

n

generalization to an arbitrary base, is essentially the

For a general base b,we can represent b numb ers

b est we can do. We shall show that any enco ding in

by co des of length up to m, provided m satis es

base b that satis es two constraints app ears to use only

m m1 n

b=2 +b=2 +  +1  b

half of the b available digits. The two constraints, b oth

of which are consequences of our assumptions ab out

Some algebra shows that the smallest m that satis es

howversion names are to b e enco ded, are:

the ab ove is approximately

1. The enco ding must b e a pre x co de; that is, no m = n log b=log b 1

2 2

integer mayhave an enco ding that is a pre x of

Put another way, the expansion of such a co de is ap-

another integer's enco ding. To understand the

proximately 1 + 1= log b. For b = 10, wehavean

2

need for this requirement, wemust rememb er that

expansion of ab out 1.3, compared with an expansion

whatever alphab et is used to represent enco dings

of 2 for the enco ding of Section 5.

is the entire alphab et. There is no extra \blank"

Of course, we need to show the existence of an en-

or \endmarker" symbol available. Thus, if one

co ding for an arbitrary numberofsymb ols b that pre-

integer's enco ding were a pre x of another's, it

serves lexicographic order the way the enco ding for

would not b e clear whichinteger was meant when

b =10does.We claim that such a co de always exists.

the latter's app eared.

The existence is easy to exhibit and also comparatively

easy to understand for the case that b isapower of

2. The enco ding must not b e biased in favor of large

k

2; say b =2 . It also helps our understanding if we

or small integers. The formal requirement that

represent the digits in base b by their binary represen-

we b elieve re ects this condition is that the num-

tation. For example, if b = 16, we represent eachof

b er of enco dings of a given length i grows in a

the hexadecimal digits by a string of four 0's and 1's.

uniformly exp onential way. That is, the number

i

Then the co de consists of the following:

of integers that have enco dings of length i is c for

some constant c. While it is p ossible to develop

k 1

2 bit strings of length k b eginning with 0,

co des that favor either small or large numb ers,

2k 2

2 bit strings of length 2k b eginning with 10,

wefavor the common case where version numb ers

3k 3

2 bit strings of length 3k b eginning with 110,

may b e arbitrarily large.

4k 4

2 bit strings of length 4k b eginning with 1110,

5k 5

2 bit strings of length 5k b eginning with 11110,

Condition 1 has a well-known arithmetic interpre-

and so on.

tation, called the Kraft McMillan [1956],

Cover and Thomas [1991]. Let n b e the number of

For example, again assuming b = 16, so k = 4, our

i

integers with co des of length i, and let b b e the num-

co de uses 8 bit strings of length 4, which are 0000

b er of symb ols in the enco ding alphab et. Then if the

through 0111, corresp onding to the hexadecimal dig-

enco ding satis es condition 1, i.e., it is a pre x co de,

its 0 through 7. Wehave 64 bit strings of length 8,

then

each of which b egins with 10. These are the two-digit

1

X

hexadecimal numb ers whose rst digit is b etween 8

i

n b  1

i

and 11 and whose second digit is arbitrary. The last

i=1

of the ve listed cases ab ove strings of length 5k 

i

Now, wemay use condition 2 to say that n = c

i

tells us that there are 32,768 bit strings of length 20,

for some constant c. That, combined with the Kraft

each b eginning 11110. These corresp ond to hexadeci-

inequalitysays

mal numb ers whose rst digit is 15 and whose second

1

X

digit is b etween 0 and 7; the remaining three digits

i

c=b  1

are arbitrary.

i=1

If we assign co des to integers lowest rst, it should

If we sum the in nite series we get

b e clear that the co des will have the lexicographic

prop erty. The reason is that shorter strings always

c=b

 1

precede longer ones in lexicographic order, b ecause

1 c=b

the longer the string the longer the initial pre x of

or c  b=2. That is, the numberofintegers that can 1's. Among strings of the same length, their lexico-

b e assigned co des of length i in an enco ding using b graphic and numeric orders are the same.

two levels are release and revision numb er, as in SCCS. The expansion of this enco ding scheme is easily seen

When a new branch is created i.e., a second child, to approximate 1 + 1=k . Thus, by picking a higher

then two extra numb ers are added: the branchnum- power of two as the base, we can create a co de with

b er followed by the child numb er of that branch. For expansion arbitrarily close to 1. Moreover, we can

k

example, supp ose version 1.3 already has a child ver- represent the digits in base 2 = b in binary,sowedo

sion 1.4 and wants to create a new child version. That not have a problem with representing arbitrarily large

new child version is numb ered 1.3.1.1. A subsequent alphab ets.

child of version 1.3 is numb ered 1.3.2.1. The rst child

The only downside with making k as large as we

of version 1.3.1.1 is called 1.3.1.2. There is a prob-

like is that wedoworse when we enco de small inte-

lem with naming the second child of version 1.3.1.1.

gers. The reason is that no integer has an enco ding

The approach used in RCS is to pick the next avail-

with fewer than k bits. For example, if wecho ose the

k

able branchnumb er. Thus, the second child of version

co de with b =2 and represent base-b digits in binary,

1.3.1.1 is named 1.3.3.1.

then the integers 0 and 1 have co des k times as long

k k 1

Zob el, Mo at, and Sacks-Davis [1992] consider in- as their binary representations, namely 0 and 0 1,

teger enco ding for full-text database systems. They resp ectively.

consider ecient one-pass enco ding schemes, but are

Fortunately, there are reasonable compromises. For

not concerned with the lexicographical ordering prop-

example, we can pick k = 8, thereby representing inte-

erties of those schemes. Bentley, Sleator, Tarjan, and

gers bybyte strings, with a 12.5 expansion. Version

Wei [1986] consider compression schemes with lo cal-

names, which it should b e recalled is what we really

ity of reference. These are pre x co des, but they do

wish to represent, are then likewise represented by bit

not ob ey the prop erty that the lexicographical order-

strings. Because of the pre x prop erty of our enco d-

ing of the enco ding matches the ordering of the num- ing, we can represent a list of integers, which are the

b ers enco ded. Rather, their ob jective is to use shorter

version names according to the scheme of Section 3,

enco dings for frequently used values, like Hu man

concatenate them, and still are able to pull the en-

co des. Gallagher and Van Vo orhis [1975] enco de in -

co ded byte string apart into its constituent enco ded

nite sources like our enco ding of the non-negative in-

integers.

tegers using a Hu man-like enco ding. They also give

shorter enco dings to frequently used values, while we

8 Related Work

give shorter enco dings to values that are smaller. We

assume that smaller values are used more frequently

Katz [1990] gives a survey of version mo deling in

b ecause there are manyversion hierarchies that start

engineering databases. Some of approaches he men-

with the version numb er zero, and fewer and fewer

tions allow the version hierarchytobeaDAG di-

version numb ers with larger and larger values.

rected acyclic graph, whichwe do not handle. How-

ever, there is no mention of version naming schemes

9 Conclusion

in this survey. Ambriola et al., [1990] gives a sur-

vey of version control and con guration management

Wehave presented a version numb ering scheme for

in the software engineering environment. The SCCS

versions with alternatives that has a useful lexico-

system byRochkind [1975] allows a version hierarchy,

graphical ordering. The version numb ering scheme

but version numb ering is limited to release number

uses two techniques. The rst technique enco des the

for ma jor or imp ortantchanges b efore the decimal

version hierarchy, that is, the part from the ro ot to the

p oint, followed byversion numb er for minor changes.

given version. Our approachfavors the common case

The deltas b etween a parent and child version are in-

of the leftmost child, so that the length of the version

sertions and deletions to the parentversion. All of

name for this child is at most slightly longer than its

the deltas for an entire version hierarchy are stored as

parent. A version's numb er is prop ortional in length

change records nested if necessary in a single UNIX

to the numb er of times that one of its ancestors was

le. The deltas are thus sorted by where they apply

not the leftmost child.

in the version. To compute a version or the deltas b e-

The second technique is to enco de nonnegative in- tween anytwoversions with an ancestor relationship,

tegers so that their lexical order matches the ordering it is necessary to scan the entire le.

of their values. This enco ding is also self-delimiting

The RCS system by Tichy [1985] includes an ex-

and optimal.

plicit revision tree with some of the characteristics of

We are implementing these metho ds in the version our scheme, but less general and less compact. The

and con guration management system b eing devel- version hierarchy is limited to four levels. The rst

op ed for the CEDB pro ject.

10 Acknowledgments

We thank memb ers of the CEDB pro ject for their

feedback. We thank Marianne Siroker for her assis-

tance in preparing this pap er.

11 Bibliography

Ambriola, V., L. Bendix, and P. Ciancarini [1990].

\The evolution of con guration management and ver-

sion control," Software Engineering Journal,Novem-

b er 1990, pp. 303{310.

Bentley, J. L., D. D. Sleator, R. E. Tarjan, and V. K.

Wei [1986]. \A Lo cally Adaptive Data Compression

Scheme," Communications of the ACM 29:4, April

1986, pp. 320{330.

Cover, T. M. and J. A. Thomas [1991]. Elements of

Information Theory, Wiley, New York.

Elias, P. [1975]. \Universal Co deword Sets and Rep-

resentations of the Integers," IEEE Trans. on Infor-

mation Theory IT-21, pp. 194{203.

Gallagher, R. G. and D. C. Van Vo orhis [1975]. \Opti-

mal Source Co des for Geometrically Distributed Inte-

ger Alphab ets," IEEE Trans. on Information Theory,

March 1975, pp. 228{230.

Katz, R. [1990]. \Towards a uni ed framework for ver-

sion mo deling in engineering databases," ACM Com-

puting Surveys, pp. 375-408.

Krishnamurthy, K. [1993]. \Version and con guration

management for collab orative design," TR{92, Stan-

ford Center for Integrated Facility Engineering, Nov.,

1993.

McMillan, B. [1956]. \Two inequalities implied by

unique decipherability," IEEE Trans. Information Th.

IT-2, pp. 115{116.

Ro chkind, M.J. [1975]. \The Source Co de Control

System," IEEE Trans. on Software Eng., SE-1:4,

pp. 364{370.

Tichy, W.F. [1985]. \RCS | A System for Version

Control," Software|Practice and Experience 15:7,

John Wiley, pp. 647{654.

Zob el, J., A. Mo at, and R. Sacks-Davis [1992]. \An

Ecient Indexing Technique for Full-Text Database

Systems," 18th VLDB, 1992, pp. 352{362.