University of Pennsylvania ScholarlyCommons

Technical Reports (CIS) Department of Computer & Information Science

April 1997

Some Undecidable Implication Problems for Path Constraints

Peter Buneman University of Pennsylvania

Wenfei Fan University of Pennsylvania

Scott Weinstein University of Pennsylvania, [email protected]


Recommended Citation: Peter Buneman, Wenfei Fan, and Scott Weinstein, "Some Undecidable Implication Problems for Path Constraints", April 1997.

University of Pennsylvania Department of Computer and Information Science Technical Report No. MS-CIS-97-14.

This paper is posted at ScholarlyCommons. https://repository.upenn.edu/cis_reports/84

Abstract

We present a class of path constraints of interest in connection with both structured and semi-structured databases, and investigate their associated implication problems. These path constraints are capable of expressing natural integrity constraints that are not only a fundamental part of the semantics of the data, but are also important in query optimization. We show that, despite the simple syntax of the constraints, the implication problem for the constraints is r.e. complete and the finite implication problem for the constraints is co-r.e. complete. Indeed, we establish the existence of a conservative reduction of the set of all first-order sentences to the path constraint language.


Some Undecidable Implication Problems for Path Constraints*

Peter Buneman (peter@central.cis.upenn.edu)
Wenfei Fan† (wfan@saul.cis.upenn.edu)
Scott Weinstein‡ (weinstein@linc.cis.upenn.edu)

Department of Computer and Information Science
University of Pennsylvania

April 1997


Introduction

Path inclusion constraints have been studied in [...] in the context of semistructured data. Consider the following object-oriented schema:

    class student {
        Name: string;
        Taking: set(course);
    }

    class course {
        CName: string;
        Enrolled: set(student);
    }

    Students: set(student);
    Courses: set(course);

in which we assume that the declarations Students and Courses define persistent entry points into the database. As it stands, this declaration does not provide full information about the intended structure. Given such a database, we would expect the following informally stated constraints to hold:

(a) $\forall s \in \mathit{Students}\ \forall c \in s.\mathit{Taking}\ (c \in \mathit{Courses})$

(b) $\forall c \in \mathit{Courses}\ \forall s \in c.\mathit{Enrolled}\ (s \in \mathit{Students})$

That is, any course taken by a student must be a course that occurs in the database extent of courses, and any student enrolled in a course must be a student that similarly occurs in the database. We shall call such constraints extent constraints.

[Figure 1: Representation of a student-course database. The root r has Students edges to the student nodes S1 and S2 and Courses edges to the course nodes C1 and C2; student and course nodes are linked by Taking and Enrolled edges, and carry Name edges ("Smith", "Jones") and CName edges ("Chem3", "Phil4").]

* This work was partly supported by the Army Research Office (DAAH...) and NSF Grant CCR...
† Supported by an IRCS graduate fellowship.
‡ Supported by NSF Grant CCR...

We might also expect an inverse relationship to hold between Taking and Enrolled. Object-oriented databases differ in the ways they enable one to state and enforce extent constraints and inverse relationships. Compare, for example, $O_2$ and ObjectStore. The presence of such constraints is important both for the semantics of the database and for query optimization.

Let us develop a more formal notation for describing such constraints. To do this, we borrow an idea that has been exploited in semistructured data models (e.g., [...]) of regarding semistructured data as an edge-labeled graph. The database consists of two sets, and we express this by a root node r with edges emanating from it that are labeled either Students or Courses. These connect to nodes that respectively represent students and courses, which have edges emanating from them that respectively describe the structure of students and courses. For example, a student has a single Name edge connected to a string node, and multiple Taking edges connected to course nodes. See Figure 1 for an example of such a graph.

Using this representation of data, we can examine certain kinds of constraints.

Extent Constraints. By taking edge labels as binary predicates, constraints of the form (a) and (b) above can be stated as:

$$\forall c\, (\exists s\, (\mathit{Students}(r, s) \wedge \mathit{Taking}(s, c)) \rightarrow \mathit{Courses}(r, c))$$

$$\forall s\, (\exists c\, (\mathit{Courses}(r, c) \wedge \mathit{Enrolled}(c, s)) \rightarrow \mathit{Students}(r, s))$$

These constraints are examples of word constraints studied in [...]; the implication problems for word constraints were shown to be decidable there.

Inverse Constraints. These are common in object-oriented databases. With respect to our student-course schema, the inverse between Taking and Enrolled is expressed as:

$$\forall s\, \forall c\, (\mathit{Students}(r, s) \wedge \mathit{Taking}(s, c) \rightarrow \mathit{Enrolled}(c, s))$$

$$\forall c\, \forall s\, (\mathit{Courses}(r, c) \wedge \mathit{Enrolled}(c, s) \rightarrow \mathit{Taking}(s, c))$$

Such constraints cannot be expressed as word constraints, or even by the more general path constraints given in [...].
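To make the extent and inverse constraints concrete, the following minimal Python sketch checks them on a small edge-labeled graph. The instance below (nodes S1, S2, C1, C2 and their edges) is a hypothetical example in the spirit of Figure 1, and the function names are illustrative assumptions, not part of the paper.

```python
from collections import defaultdict

def index_edges(edges):
    """Group (source, label, target) triples into one binary relation per label."""
    rel = defaultdict(set)
    for src, label, dst in edges:
        rel[label].add((src, dst))
    return rel

def successors(rel, label, node):
    """All y such that label(node, y) holds."""
    return {dst for (src, dst) in rel[label] if src == node}

def extent_constraints_hold(rel, root="r"):
    """Courses taken by students are in Courses; enrolled students are in Students."""
    students = successors(rel, "Students", root)
    courses = successors(rel, "Courses", root)
    taken = {c for s in students for c in successors(rel, "Taking", s)}
    enrolled = {s for c in courses for s in successors(rel, "Enrolled", c)}
    return taken <= courses and enrolled <= students

def inverse_constraints_hold(rel, root="r"):
    """Taking and Enrolled are inverses of each other below the root."""
    students = successors(rel, "Students", root)
    courses = successors(rel, "Courses", root)
    forward = all((c, s) in rel["Enrolled"]
                  for s in students for c in successors(rel, "Taking", s))
    backward = all((s, c) in rel["Taking"]
                   for c in courses for s in successors(rel, "Enrolled", c))
    return forward and backward

# Hypothetical instance: two students, two courses, consistent edges.
EDGES = [
    ("r", "Students", "S1"), ("r", "Students", "S2"),
    ("r", "Courses", "C1"), ("r", "Courses", "C2"),
    ("S1", "Taking", "C1"), ("C1", "Enrolled", "S1"),
    ("S2", "Taking", "C2"), ("C2", "Enrolled", "S2"),
]
GOOD = index_edges(EDGES)
# Adding a Taking edge to an unregistered course violates both kinds of constraint.
BAD = index_edges(EDGES + [("S1", "Taking", "C9")])
```

On `GOOD` both checks succeed; on `BAD` the extra edge to the unregistered course `C9` makes both fail.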

Local Database Constraints. In database integration, it is sometimes desirable to make one database a component of another database, or to build a "database of databases". Suppose, for example, we wanted to bring together a number of student-course databases as described above. We might write something like:

    class SchoolDB {
        DBidentifier: string;
        Students: set(student);    // as defined above
        Courses: set(course);      // as defined above
    }

    Schools: set(SchoolDB);

Now we may want certain constraints to hold on components of this database. For example, the extent constraints described above now hold on each member of the Schools set. Here we refer to a component database, such as a member of the set Schools, as a local database, and to its constraints as local database constraints. Extending our graph representation by adding Schools edges from the new root node to the roots of local databases, the local extent constraints are:

$$\forall d\, \forall c\, (\mathit{Schools}(r, d) \wedge \exists s\, (\mathit{Students}(d, s) \wedge \mathit{Taking}(s, c)) \rightarrow \mathit{Courses}(d, c))$$

$$\forall d\, \forall s\, (\mathit{Schools}(r, d) \wedge \exists c\, (\mathit{Courses}(d, c) \wedge \mathit{Enrolled}(c, s)) \rightarrow \mathit{Students}(d, s))$$

Again, these cannot be stated as path constraints of [...].

These considerations give rise to the question of whether there is a natural generalization of the constraints of [...] that will capture these slightly more complicated forms. Here we consider a class of path constraints of either the form

$$\forall x\, \forall y\, (\alpha(r, x) \wedge \beta(x, y) \rightarrow \gamma(x, y))$$

or the form

$$\forall x\, \forall y\, (\alpha(r, x) \wedge \beta(x, y) \rightarrow \gamma(y, x)),$$

where $\alpha(x, y)$ (and likewise $\beta(x, y)$, $\gamma(x, y)$) represents a path from node $x$ to node $y$.

This class of path constraints can be used to express all the constraints we have so far encountered. For semistructured data in particular, this class of constraints is useful not only for optimizing navigational queries, but also for inferring structure; see [...] on this subject.

Surprisingly, the implication problems for this mild generalization of word constraints are undecidable. However, certain restricted cases are decidable in semistructured databases, and these cases are sufficient to express at least the constraints we have described above. In this paper, we establish the undecidability of the implication problems for this class of path constraints. We defer to another paper a full treatment of the decidability results.

Related work

The idea of representing data as an edge-labeled graph and using paths to specify navigational queries dates back to the early 1980s. Recently, the idea has been exploited and adapted to a variety of new database applications, ranging from querying object-oriented databases (e.g., XSQL, OQL-doc) to querying semistructured data (e.g., UnQL, Lorel, WebSQL, STRUQL).

There has also been work on constraint languages defined in terms of paths, for structured data as well as for semistructured data. A class of functional constraints, called path functional dependencies, was proposed in [...]. The axiomatizability and decidability of its associated unrestricted implication problem were established in [...]. This constraint language generalizes functional dependencies in the relational data model for semantic and object-oriented data models. It differs significantly from ours, which is a generalization of unary inclusion dependencies in the relational model for both structured and semistructured data.

In [...], a class of constraints for specifying range restrictions associated with paths, called specialization constraints, was proposed for object-oriented data models. The axiomatizability and decidability of its associated implication problems were established in [...]. A specialization constraint asserts a type condition, which is often specified by a schema. The central difference between specialization constraints and our path constraints is that specialization constraints are type constraints for a graph representing a schema, whereas our path constraints specify inclusion relations for a graph representing data. In particular, specialization constraints must be associated with a schema, whereas our path constraints are defined for both structured and semistructured data.

Closer to our work is the path inclusion constraint language introduced and investigated in [...]. A constraint in this language is an expression of the form $p \subseteq q$ or $p = q$, where $p$ and $q$ are regular expressions denoting paths in a graph representing semistructured data. In particular, if $p$ and $q$ are simply sequences of labels, the constraint is called a word constraint. A constraint $p \subseteq q$ ($p = q$) expresses the inclusion (equality) relation between the two sets of nodes reachable via $p$ and $q$. The decidability of the implication problems for the language was established in [...]. In addition, [...] also showed that word constraint implication is decidable in PTIME. This constraint language differs from ours in expressive power. On the one hand, the constraint language of [...] allows a more general form of path expressions than ours. On the other hand, it does not capture the inverse constraints and local database constraints mentioned above, whereas these constraints can be expressed in our language. In short, our constraint language is a generalization of the class of word constraints given in [...].

The rest of the paper is organized as follows. Section 2 formally presents our path constraint language and identifies two of its fragments. Sections 3 and 4 show that, for each of the two fragments, the implication problem is r.e. complete and the finite implication problem is co-r.e. complete, and therefore establish the undecidability of the implication problems for our path constraint language.

Path Constraint Language $P$

In this section, we formalize the path constraint language $P$. We first present an abstraction of semistructured databases and define the language $P$ in terms of first-order logic. We then describe the implication problems for $P$ and state the main results of the paper.

We assume the standard notions of sentences, models and implication used in first-order logic.

Abstraction of semistructured data

Semistructured data is usually represented as an edge-labeled, rooted, directed graph, e.g., in UnQL and in OEM. See [...] for a survey of semistructured data models. Along the same lines, here we use an abstraction of semistructured databases as finite first-order logic structures of signature

$$\sigma = (r, E),$$

where $r$ is a constant denoting the root and $E$ is a finite set of binary relations denoting the edge labels.

Path constraints

A path, i.e., a sequence of labels, can be represented as a logic formula with two free variables.

Definition. A path is a formula $\alpha(x, y)$ having one of the following forms:

- $x = y$, denoted $\epsilon(x, y)$ and called an empty path;
- $K(x, y)$, where $K \in E$; or
- $\exists z\, (K(x, z) \wedge \beta(z, y))$, where $K \in E$ and $\beta(z, y)$ is a path.

Here the free variables $x$ and $y$ denote the tail and head nodes of the path, respectively. We write $\alpha(x, y)$ as $\alpha$ when the parameters $x$ and $y$ are clear from the context.

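The path constraint language $P$ is formalized as follows.

Definition. A path constraint $\varphi$ is an expression of either the forward form

$$\forall x\, \forall y\, (\alpha(r, x) \wedge \beta(x, y) \rightarrow \gamma(x, y))$$

or the backward form

$$\forall x\, \forall y\, (\alpha(r, x) \wedge \beta(x, y) \rightarrow \gamma(y, x)),$$

where $\alpha$, $\beta$, $\gamma$ are paths. The path $\alpha$ is called the prefix of $\varphi$. The paths $\alpha$, $\beta$ and $\gamma$ are denoted by $\mathrm{pf}(\varphi)$, $\mathrm{lt}(\varphi)$ and $\mathrm{rt}(\varphi)$, respectively.

The set of all path constraints is denoted by $P$.

For example, all the path constraints presented in the last section are constraints in the set $P$.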
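As an illustration of the two forms, satisfaction of a path constraint on a finite graph can be checked mechanically. The sketch below is only one possible reading of the definitions: a path is represented as a list of labels (the empty list standing for the empty path), and the names `reach`, `satisfies` and the sample relation `REL` are assumptions made for this example.

```python
def reach(rel, start, path):
    """Return {y : path(start, y)}; the empty path [] denotes equality."""
    frontier = {start}
    for label in path:
        frontier = {dst for (src, dst) in rel.get(label, set()) if src in frontier}
    return frontier

def satisfies(rel, root, pf, lt, rt, backward=False):
    """Check: for all x, y, pf(root, x) and lt(x, y) imply rt(x, y),
    or rt(y, x) when backward=True."""
    for x in reach(rel, root, pf):
        for y in reach(rel, x, lt):
            if backward:
                holds = x in reach(rel, y, rt)
            else:
                holds = y in reach(rel, x, rt)
            if not holds:
                return False
    return True

# Hypothetical one-student, one-course database.
REL = {
    "Students": {("r", "S1")},
    "Courses": {("r", "C1")},
    "Taking": {("S1", "C1")},
    "Enrolled": {("C1", "S1")},
}

# Extent constraint: a word constraint, so the prefix is the empty path.
extent = satisfies(REL, "r", [], ["Students", "Taking"], ["Courses"])
# Inverse constraint: backward form with prefix Students.
inverse = satisfies(REL, "r", ["Students"], ["Taking"], ["Enrolled"], backward=True)
# Dropping the Enrolled edges breaks the inverse constraint.
broken = satisfies(dict(REL, Enrolled=set()), "r",
                   ["Students"], ["Taking"], ["Enrolled"], backward=True)
```

Here `extent` and `inverse` evaluate to `True` and `broken` to `False`. The check is only meaningful for finite structures; it says nothing about implication, which is the subject of the undecidability results.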

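Next, we identify several special subclasses of $P$.

We call a path constraint $\varphi$ in $P$ a simple path constraint if $\mathrm{pf}(\varphi) = \epsilon$. That is, $\varphi$ is of either the form

$$\forall y\, (\beta(r, y) \rightarrow \gamma(r, y))$$

or the form

$$\forall y\, (\beta(r, y) \rightarrow \gamma(y, r)).$$

The set of all simple path constraints is denoted by $P_s$.

A proper subclass of simple path constraints, called word constraints and denoted by $P_w$, was introduced and investigated in [...]. A word constraint can be represented as

$$\forall y\, (\beta(r, y) \rightarrow \gamma(r, y)),$$

where $\beta$ and $\gamma$ are paths.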

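Path constraint implication

We next describe the implication problems for path constraints in $P$.

We borrow the standard notations of models and implication from first-order logic. For a structure $G$ and a constraint $\varphi$ in $P$, we use $G \models \varphi$ to denote that $G$ satisfies $\varphi$ (i.e., $G$ is a model of $\varphi$). For any finite subset $\Sigma \cup \{\varphi\}$ of $P$, we use $\Sigma \models \varphi$ to denote that $\Sigma$ implies $\varphi$. That is, for every structure $G$, if $G \models \Sigma$ then $G \models \varphi$. Similarly, we use $\Sigma \models_f \varphi$ to denote that $\Sigma$ finitely implies $\varphi$. That is, for every finite structure $G$, if $G \models \Sigma$ then $G \models \varphi$.

The implication problem for $P$ is the problem of determining, given any finite set $\Sigma \cup \{\varphi\}$ of sentences in $P$, whether $\Sigma \models \varphi$. The finite implication problem for $P$ is the problem of determining, given any finite subset $\Sigma \cup \{\varphi\}$ of $P$, whether $\Sigma \models_f \varphi$.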

As observed by [...], every word constraint (in fact, every simple path constraint) can be expressed by a sentence in two-variable first-order logic ($FO^2$), the fragment of first-order logic consisting of all relational sentences with at most two distinct variables. Recently, [...] has shown that the satisfiability problem for $FO^2$ is NEXPTIME-complete, by establishing that any satisfiable $FO^2$ sentence has a model of size exponential in the length of the sentence. The decidability of the implication and finite implication problems for word constraints and for simple constraints follows immediately. In fact, [...] directly establishes (without reference to the embedding into $FO^2$) that the implication problems for word constraints are in PTIME.

In contrast to word constraint implication, both the implication and the finite implication problems for $P$ are undecidable. These undecidability results, which are the main results of the paper, are stated as follows.

Theorem. The implication problem for $P$ is r.e. complete, and the finite implication problem for $P$ is co-r.e. complete.

In fact, these results hold true for two proper subclasses of $P$. One of the subclasses, $P_f$, is the set of all the constraints of the forward form in $P$. The other, $P_+$, is the set

$$P_+ = \{\varphi \mid \varphi \in P,\ \mathrm{lt}(\varphi) \neq \epsilon,\ \mathrm{rt}(\varphi) \neq \epsilon\}.$$

Note that $P_+$ is the largest subset of $P$ without equality.

For $P_+$ and $P_f$ we have the following theorems, from which the theorem for $P$ stated above follows immediately.

Theorem. The implication problem for $P_+$ is r.e. complete, and the finite implication problem for $P_+$ is co-r.e. complete.

Theorem. The implication problem for $P_f$ is r.e. complete, and the finite implication problem for $P_f$ is co-r.e. complete.

We prove these two theorems in the next two sections.

The Implication Problems for $P_+$

In this section, we establish the undecidability of the implication and finite implication problems for $P_+$.

Preliminaries

We first recall the definitions of two-register machines and conservative reduction classes from [...].

Two-register machines

A two-register machine (2-RM) $M$ consists of two registers, $\mathit{register}_1$ and $\mathit{register}_2$, and is programmed by a numbered sequence $I_0, I_1, \ldots, I_l$ of instructions. Each register contains a natural number. An instantaneous description (ID) of $M$ is $(i, m, n)$, where $i \in [0, l]$ and $m$ and $n$ are natural numbers. It indicates that $M$ is ready to execute instruction $I_i$ (or is at state $i$) with $\mathit{register}_1$ and $\mathit{register}_2$ containing $m$ and $n$, respectively.

An instruction of $M$ is either an addition or a subtraction, which defines a relation $\Rightarrow_M$ between IDs, described as follows.

- Addition: $(i, rg, j)$, where $rg$ is either $\mathit{register}_1$ or $\mathit{register}_2$, and $i, j \in [0, l]$. The semantics of the addition instruction is: at state $i$, $M$ adds 1 to the content of $rg$, and then goes to state $j$. Accordingly,

$$(i, m, n) \Rightarrow_M \begin{cases} (j, m+1, n) & \text{if } rg = \mathit{register}_1 \\ (j, m, n+1) & \text{otherwise} \end{cases}$$

- Subtraction: $(i, rg, j, k)$, where $rg$ is either $\mathit{register}_1$ or $\mathit{register}_2$, and $i, j, k \in [0, l]$. The semantics of the subtraction instruction is: at state $i$, $M$ tests whether the content of $rg$ is 0, and if it is, then goes to state $j$; otherwise $M$ subtracts 1 from the content of $rg$ and goes to state $k$. Accordingly,

$$(i, m, n) \Rightarrow_M \begin{cases} (j, 0, n) & \text{if } rg = \mathit{register}_1 \text{ and } m = 0 \\ (k, m-1, n) & \text{if } rg = \mathit{register}_1 \text{ and } m \neq 0 \\ (j, m, 0) & \text{if } rg = \mathit{register}_2 \text{ and } n = 0 \\ (k, m, n-1) & \text{if } rg = \mathit{register}_2 \text{ and } n \neq 0 \end{cases}$$

The relation $\Rightarrow_M$ can be understood as a set of rewrite rules for IDs. We use $\Rightarrow^*_M$ to denote the reflexive and transitive closure of $\Rightarrow_M$. The relation of $M$-reachability, $C \Rightarrow^*_M D$, holds just in case $M$, started from ID $C$, reaches ID $D$ by application of zero or more $\Rightarrow_M$ rewrite rules.
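The rewrite rules above can be sketched as a small Python simulator. This is an illustrative reading only: instructions are written as tuples `("add", rg, j)` and `("sub", rg, j, k)` with `rg` equal to 1 or 2, the sample program `MOVE` is hypothetical, and a state is treated as halted once a step leaves the ID unchanged. Because reachability for two-register machines is undecidable in general, `reaches_within` only explores a bounded number of steps.

```python
def step(program, ident):
    """Apply the rewrite rule of instruction I_i to the ID (i, m, n)."""
    i, m, n = ident
    instr = program[i]
    if instr[0] == "add":                      # addition (i, rg, j)
        _, rg, j = instr
        return (j, m + 1, n) if rg == 1 else (j, m, n + 1)
    _, rg, j, k = instr                        # subtraction (i, rg, j, k)
    if rg == 1:
        return (j, 0, n) if m == 0 else (k, m - 1, n)
    return (j, m, 0) if n == 0 else (k, m, n - 1)

def reaches_within(program, c, d, max_steps=10_000):
    """Bounded check of C =>* D along the (deterministic) run from C.
    A False result means D was not reached before the run became constant
    or before max_steps was exhausted."""
    cur = c
    for _ in range(max_steps + 1):
        if cur == d:
            return True
        nxt = step(program, cur)
        if nxt == cur:                         # ID sequence became constant
            return False
        cur = nxt
    return False

# Hypothetical program: move the content of register 1 into register 2.
MOVE = [
    ("sub", 1, 2, 1),   # I_0: if register_1 = 0 go to 2, else decrement and go to 1
    ("add", 2, 0),      # I_1: increment register_2 and go to 0
    ("sub", 1, 2, 2),   # I_2: stop state (register_1 is 0 here, so the ID is constant)
]
```

For instance, the run from `(0, 3, 0)` ends in the constant ID `(2, 0, 3)`.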

We will need the following definitions from [...].

Definition. Let $X$ be a class of sentences. We write

- $N(X)$ for the set of all unsatisfiable sentences in $X$, that is, $N(X) = \{\varphi \mid \varphi \in X,\ \varphi \text{ does not have a model}\}$;
- $F(X)$ for the set of all finitely satisfiable sentences in $X$, that is, $F(X) = \{\varphi \mid \varphi \in X,\ \varphi \text{ has a finite model}\}$.

We write FO for the set of all first-order sentences.

Recall the following well-known result:

    There is an effective partial procedure by which, given a sentence in FO, we can test whether it has no model, a finite model, or only infinite models. The procedure terminates in the first two cases, but does not terminate in the last case.

We fix $M_L$ to be a two-register machine with the following behavior (the existence of such a machine follows from the result just quoted; see [...] for additional discussions). The two-register machine $M_L$ has two halting states, 1 and 2. For each $\varphi \in FO$, let $m(\varphi)$ be an appropriate encoding of $\varphi$ as a natural number, and $C(\varphi)$ the ID $(0, m(\varphi), 0)$ of $M_L$. Started from $C(\varphi)$,

- $M_L$ halts at $(1, 0, 0)$ iff $\varphi$ is not satisfiable;
- $M_L$ halts at $(2, 0, 0)$ iff $\varphi$ has a finite model.

In other words, $M_L$ has the following property. For $i \in \{1, 2\}$, let

$$H_{M_L, i} = \{\varphi \mid \varphi \in FO,\ C(\varphi) \Rightarrow^*_{M_L} (i, 0, 0)\}.$$

Then $H_{M_L, 1}$ is $N(FO)$ and $H_{M_L, 2}$ is $F(FO)$.

Here, halting of $M_L$ means that the ID sequence becomes constant when reaching a stop state. This stop condition can be assumed without loss of generality.

Conservative reductions

Recall the following definitions from [...].

Definition. Let $X$ and $Y$ be recursive classes of sentences. A reduction from $X$ to $Y$ is a recursive function $f: X \rightarrow Y$ such that for any $\varphi \in X$, $\varphi$ is satisfiable iff $f(\varphi)$ is satisfiable.

A conservative reduction from $X$ to $Y$ is a recursive function $f: X \rightarrow Y$ such that for any $\varphi \in X$,

- $\varphi$ is satisfiable iff $f(\varphi)$ is satisfiable, and
- $\varphi$ is finitely satisfiable iff $f(\varphi)$ is finitely satisfiable.

A recursive class of sentences $X$ is a conservative reduction class if there exists a conservative reduction from FO to $X$.

The finite satisfiability problem for a recursive class of sentences $X$ is the problem of determining, given any $\varphi \in X$, whether $\varphi$ has a finite model.

Obviously, if a recursive class of sentences $X$ is a conservative reduction class, then

- the satisfiability problem for $X$ is co-r.e. complete, and
- the finite satisfiability problem for $X$ is r.e. complete.

To show that a recursive subset $X$ of FO is a conservative reduction class, it suffices to reduce $N(FO)$ and $F(FO)$ to $N(X)$ and $F(X)$, respectively. More precisely, we define the notion of semi-conservative reductions as follows.

Definition. Let $X$ and $Y$ be recursive classes of sentences. A semi-conservative reduction from $X$ to $Y$ is a recursive function $f: X \rightarrow Y$ such that

- $f(N(X)) \subseteq N(Y)$, and
- $f(F(X)) \subseteq F(Y)$.

Lemma. If there exists a semi-conservative reduction from FO to a recursive subset $X$ of FO, then $X$ is a conservative reduction class.

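Reduction from the halting problem for 2-RMs

We next prove the theorem for $P_+$. It suffices to show that the set

$$S(P_+) = \{\textstyle\bigwedge \Sigma \wedge \neg\varphi \mid \varphi \in P_+,\ \Sigma \subset P_+,\ \Sigma \text{ is finite}\}$$

is a conservative reduction class. Indeed, $\Sigma \models \varphi$ holds exactly when $\bigwedge \Sigma \wedge \neg\varphi$ is unsatisfiable, and $\Sigma \models_f \varphi$ holds exactly when $\bigwedge \Sigma \wedge \neg\varphi$ has no finite model. To establish the conservative reduction class property for $S(P_+)$, by the lemma above, it suffices to show that there is a semi-conservative reduction from FO to $S(P_+)$.

We establish the existence of such a semi-conservative reduction by reduction from the halting problem for two-register machines. We first present an encoding of two-register machines by sentences in $P_+$, and then prove a reduction property of the encoding. Using this reduction property, we define a semi-conservative reduction from FO to $S(P_+)$.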

Encoding

Let $M$ be a two-register machine. Assume that $M$ is programmed by

$$I_0, I_1, \ldots, I_l.$$

We also assume that the set $E$ of binary relations in the signature $\sigma$ includes:

- the predicates encoding the states of $M$:
  - $K_0, K_1, \ldots, K_l$,
  - $\overline{K}_0, \overline{K}_1, \ldots, \overline{K}_l$;
- the predicates encoding the contents of the registers (natural numbers):
  - $R_1^+$, $R_1^-$: to encode the successor and the predecessor of the contents of $\mathit{register}_1$,
  - $R_2^+$, $R_2^-$: to encode the successor and the predecessor of the contents of $\mathit{register}_2$,
  - $E_{01}$, $\overline{E}_{01}$: to indicate that the content of $\mathit{register}_1$ is 0,
  - $E_{02}$, $\overline{E}_{02}$: to indicate that the content of $\mathit{register}_2$ is 0;
- the predicates distinguishing $\mathit{register}_1$ from $\mathit{register}_2$:
  - $L_1$, $\overline{L}_1$: to identify $\mathit{register}_1$,
  - $L_2$, $\overline{L}_2$: to identify $\mathit{register}_2$, and
  - $L_r$: to identify the root $r$.

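We now define the encoding as follows.

Registers

We encode the contents of the registers by $\Phi_N$, which is the conjunction of the path constraints of $P_+$ given below.

Successor, predecessor:

$$\varphi_1 = \forall x\, \forall y\, (L_1(r, x) \wedge R_1^+(x, y) \rightarrow R_1^-(y, x))$$
$$\varphi_2 = \forall x\, \forall y\, (L_1(r, x) \wedge R_1^-(x, y) \rightarrow R_1^+(y, x))$$
$$\varphi_3 = \forall x\, \forall y\, (L_2(r, x) \wedge R_2^+(x, y) \rightarrow R_2^-(y, x))$$
$$\varphi_4 = \forall x\, \forall y\, (L_2(r, x) \wedge R_2^-(x, y) \rightarrow R_2^+(y, x))$$

$\varphi_1$, $\varphi_2$, $\varphi_3$ and $\varphi_4$ are constraints of the backward form.

$$\varphi_5 = \forall x\, (L_1(r, x) \rightarrow \exists y\, (R_1^+(x, y) \wedge \overline{L}_1(y, r)))$$
$$\varphi_6 = \forall x\, (L_2(r, x) \rightarrow \exists y\, (R_2^+(x, y) \wedge \overline{L}_2(y, r)))$$

$\varphi_5$ and $\varphi_6$ are simple constraints of the backward form.

Register identification:

$$\varphi_7 = \forall x\, (\exists y\, (L_1(r, y) \wedge R_1^+(y, x)) \rightarrow L_1(r, x))$$
$$\varphi_8 = \forall x\, (\exists y\, (L_1(r, y) \wedge R_1^-(y, x)) \rightarrow L_1(r, x))$$
$$\varphi_9 = \forall x\, (\exists y\, (L_2(r, y) \wedge R_2^+(y, x)) \rightarrow L_2(r, x))$$
$$\varphi_{10} = \forall x\, (\exists y\, (L_2(r, y) \wedge R_2^-(y, x)) \rightarrow L_2(r, x))$$

$\varphi_7$, $\varphi_8$, $\varphi_9$ and $\varphi_{10}$ are simple constraints of the forward form.

States: for $i \in [0, l]$,

$$\varphi^i_{11} = \forall x\, \forall y\, (L_1(r, x) \wedge K_i(x, y) \rightarrow \overline{K}_i(y, x))$$
$$\varphi^i_{12} = \forall x\, \forall y\, (L_2(r, x) \wedge \overline{K}_i(x, y) \rightarrow K_i(y, x))$$

$\varphi^i_{11}$ and $\varphi^i_{12}$ are constraints of the backward form.

Zeros:

$$\varphi_{13} = \forall x\, \forall y\, (L_1(r, x) \wedge \overline{E}_{01}(x, y) \rightarrow E_{01}(y, x))$$
$$\varphi_{14} = \forall x\, \forall y\, (L_1(r, x) \wedge \overline{E}_{01}(x, y) \rightarrow L_r(r, y))$$
$$\varphi_{15} = \forall x\, \forall y\, (L_r(r, x) \wedge E_{01}(x, y) \rightarrow E_{01}(r, y))$$
$$\varphi_{16} = \forall x\, (\exists z\, (L_1(r, z) \wedge \exists y\, (\overline{E}_{01}(z, y) \wedge E_{02}(y, x))) \rightarrow E_{02}(r, x))$$

$\varphi_{13}$ is a constraint of the backward form, and $\varphi_{14}$, $\varphi_{15}$, $\varphi_{16}$ are simple constraints of the forward form.

$$\varphi_{17} = \forall x\, (E_{01}(r, x) \rightarrow L_1(r, x))$$
$$\varphi_{18} = \forall x\, (E_{02}(r, x) \rightarrow L_2(r, x))$$

$\varphi_{17}$ and $\varphi_{18}$ are simple constraints of the forward form.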

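IDs

We encode each ID $C = (i, m, n)$ of $M$ by

$$\varphi_C = \forall x\, \forall y\, (L_1(r, x) \wedge (R_1^-)^m \cdot \overline{E}_{01} \cdot E_{02} \cdot (R_2^+)^n (x, y) \rightarrow K_i(x, y)),$$

where

- $\alpha \cdot \beta$, an abbreviation for $\alpha(x, z) \cdot \beta(z, y)$, stands for the concatenation of paths $\alpha(x, z)$ and $\beta(z, y)$, which is defined by

$$\alpha(x, z) \cdot \beta(z, y) = \begin{cases} \beta(x, y) & \text{if } \alpha = \epsilon \\ \exists z\, (K(x, z) \wedge \beta(z, y)) & \text{if } \alpha = K \\ \exists u\, (K(x, u) \wedge (\alpha'(u, z) \cdot \beta(z, y))) & \text{if } \alpha(x, z) = \exists u\, (K(x, u) \wedge \alpha'(u, z)) \end{cases}$$

- $\alpha^m$ stands for the $m$-time concatenation of $\alpha$, which is defined by

$$\alpha^m = \begin{cases} \epsilon & \text{if } m = 0 \\ \alpha \cdot \alpha^{m-1} & \text{otherwise} \end{cases}$$

It is easy to see that the constraint $\varphi_C$ is in $P_+$. It has the forward form with $\mathrm{pf}(\varphi_C) = L_1$, $\mathrm{lt}(\varphi_C) = (R_1^-)^m \cdot \overline{E}_{01} \cdot E_{02} \cdot (R_2^+)^n$, and $\mathrm{rt}(\varphi_C) = K_i$.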
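The shape of $\varphi_C$ can be made concrete with a short Python sketch that builds its prefix, left path and right path as label sequences. The string names used for the predicates (for example `"R1-"` for $R_1^-$ and `"E01~"` for $\overline{E}_{01}$) are naming assumptions made for this illustration only.

```python
def power(path, m):
    """m-fold concatenation of a path; the empty list is the empty path."""
    return [] if m == 0 else path + power(path, m - 1)

def encode_id(ident):
    """Return (pf, lt, rt) of the forward constraint encoding the ID (i, m, n)."""
    i, m, n = ident
    pf = ["L1"]
    lt = power(["R1-"], m) + ["E01~", "E02"] + power(["R2+"], n)
    rt = [f"K{i}"]
    return pf, lt, rt
```

For example, the ID $(3, 2, 1)$ yields the left path `R1-, R1-, E01~, E02, R2+`: two predecessor steps down to 0 on the first chain, a hop through the root, and one successor step on the second chain.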

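Instructions

For each $i \in [0, l]$, we encode the instruction $I_i$ by $\Phi_{I_i}$, given below.

Addition

For $I_i = (i, \mathit{register}_1, j)$, $\Phi_{I_i}$ is

$$\varphi^i_{a_1} = \forall x\, \forall y\, (L_1(r, x) \wedge \exists x'\, (R_1^-(x, x') \wedge K_i(x', y)) \rightarrow K_j(x, y)).$$

Here $\varphi^i_{a_1}$ is a constraint of the forward form with $\mathrm{pf}(\varphi^i_{a_1}) = L_1$, $\mathrm{lt}(\varphi^i_{a_1}) = R_1^- \cdot K_i$, and $\mathrm{rt}(\varphi^i_{a_1}) = K_j$.

For $I_i = (i, \mathit{register}_2, j)$, $\Phi_{I_i}$ is

$$\varphi^i_{a_2} = \forall x\, \forall y\, (L_1(r, x) \wedge \exists y'\, (K_i(x, y') \wedge R_2^+(y', y)) \rightarrow K_j(x, y)).$$

Here $\varphi^i_{a_2}$ is a constraint of the forward form with $\mathrm{pf}(\varphi^i_{a_2}) = L_1$, $\mathrm{lt}(\varphi^i_{a_2}) = K_i \cdot R_2^+$, and $\mathrm{rt}(\varphi^i_{a_2}) = K_j$.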

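Subtraction

For $I_i = (i, \mathit{register}_1, j, k)$, $\Phi_{I_i}$ is

$$\varphi^i_{s_1} = \varphi^i_{s_{10}} \wedge \varphi^i_{s_{1n}},$$

where

$$\varphi^i_{s_{10}} = \forall x\, \forall y\, (E_{01}(r, x) \wedge K_i(x, y) \rightarrow K_j(x, y))$$

and

$$\varphi^i_{s_{1n}} = \forall x\, \forall y\, (L_1(r, x) \wedge \exists x'\, (R_1^+(x, x') \wedge K_i(x', y)) \rightarrow K_k(x, y)).$$

Here $\varphi^i_{s_1}$ is a conjunction of two constraints having the forward form. The first conjunct is a constraint with $\mathrm{pf}(\varphi^i_{s_{10}}) = E_{01}$, $\mathrm{lt}(\varphi^i_{s_{10}}) = K_i$ and $\mathrm{rt}(\varphi^i_{s_{10}}) = K_j$. In the second conjunct, $\mathrm{pf}(\varphi^i_{s_{1n}}) = L_1$, $\mathrm{lt}(\varphi^i_{s_{1n}}) = R_1^+ \cdot K_i$ and $\mathrm{rt}(\varphi^i_{s_{1n}}) = K_k$.

For $I_i = (i, \mathit{register}_2, j, k)$, $\Phi_{I_i}$ is

$$\varphi^i_{s_2} = \varphi^i_{s_{20}} \wedge \varphi^i_{s_{2n}},$$

where

$$\varphi^i_{s_{20}} = \forall x\, \forall y\, (E_{02}(r, y) \wedge \overline{K}_i(y, x) \rightarrow K_j(x, y))$$

and

$$\varphi^i_{s_{2n}} = \forall x\, \forall y\, (L_1(r, x) \wedge \exists y'\, (K_i(x, y') \wedge R_2^-(y', y)) \rightarrow K_k(x, y)).$$

Here $\varphi^i_{s_2}$ is a conjunction of two constraints. The first conjunct is a constraint of the backward form with $\mathrm{pf}(\varphi^i_{s_{20}}) = E_{02}$, $\mathrm{lt}(\varphi^i_{s_{20}}) = \overline{K}_i$ and $\mathrm{rt}(\varphi^i_{s_{20}}) = K_j$. The second conjunct is a constraint of the forward form with $\mathrm{pf}(\varphi^i_{s_{2n}}) = L_1$, $\mathrm{lt}(\varphi^i_{s_{2n}}) = K_i \cdot R_2^-$ and $\mathrm{rt}(\varphi^i_{s_{2n}}) = K_k$.

The encoding of the program of $M$ is

$$\Phi_M = \bigwedge_{i=0}^{l} \Phi_{I_i}.$$

Clearly, $\Phi_M$ is a conjunction of path constraints in $P_+$.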
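The instruction encoding can be summarized as a function from an instruction to its list of constraints. In the Python sketch below, each constraint is a tuple `(pf, lt, rt, backward)` of label sequences plus a flag for the backward form; the tuple layout, the instruction format `("add", rg, j)` / `("sub", rg, j, k)`, and the label strings (with `~` marking a barred predicate) are assumptions made for illustration.

```python
def encode_instruction(i, instr):
    """Return the constraints (pf, lt, rt, backward) encoding instruction I_i."""
    def K(state):
        return f"K{state}"

    if instr[0] == "add":
        _, rg, j = instr
        if rg == 1:
            # x is one step above x' on chain 1, and K_i(x', y) holds
            return [(["L1"], ["R1-", K(i)], [K(j)], False)]
        # y is one step above y' on chain 2, and K_i(x, y') holds
        return [(["L1"], [K(i), "R2+"], [K(j)], False)]

    _, rg, j, k = instr
    if rg == 1:
        return [
            (["E01"], [K(i)], [K(j)], False),          # register_1 is zero
            (["L1"], ["R1+", K(i)], [K(k)], False),    # register_1 is non-zero
        ]
    return [
        (["E02"], [K(i) + "~"], [K(j)], True),         # register_2 is zero
        (["L1"], [K(i), "R2-"], [K(k)], False),        # register_2 is non-zero
    ]

def encode_program(program):
    """The conjunction Phi_M, represented as a flat list of constraints."""
    return [c for i, instr in enumerate(program) for c in encode_instruction(i, instr)]
```

Every constraint produced this way has non-empty left and right paths, which is the syntactic condition for membership in $P_+$.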

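Reduction property

Now we show that the encoding above has the following reduction property.

Proposition. Let $M$ be a two-register machine. For all IDs $C$ and $D$ of $M$,

$$C \Rightarrow^*_M D \quad \text{iff} \quad \Phi_N \wedge \Phi_M \wedge \varphi_C \rightarrow \varphi_D \text{ is valid.}$$

Proof.

Assume $C \Rightarrow^*_M D$. We show that for each model $G$ of $\Phi_N \wedge \Phi_M \wedge \varphi_C$, $G \models \varphi_D$. To show this, it suffices to show that for each natural number $t$ and each ID $E$ of $M$, if $E$ is reached by $M$ in $t$ steps starting from $C$, denoted $C \Rightarrow^t_M E$, then $G \models \varphi_E$. We prove this by induction on $t$.

Base case: If $t = 0$, then the claim holds since $G \models \varphi_C$.

Inductive step: Assume the claim for $t$. Suppose that $C \Rightarrow^t_M C_1 \Rightarrow^{I_i}_M E$, where $C_1 = (i, m, n)$ and $C_1 \Rightarrow^{I_i}_M E$ means that $E$ is reached by executing instruction $I_i$ at $C_1$. Then, by the induction hypothesis, we have $G \models \varphi_{C_1}$. That is,

$$G \models \forall x\, \forall y\, (L_1(r, x) \wedge (R_1^-)^m \cdot \overline{E}_{01} \cdot E_{02} \cdot (R_2^+)^n (x, y) \rightarrow K_i(x, y)).$$

We show that the claim holds for $t + 1$ for each case of $I_i$, which has six cases in total.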

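Case 1: $I_i = (i, \mathit{register}_1, j)$. In this case, $E$ must be $(j, m+1, n)$.

Suppose, for reductio, that there exist $a, b \in |G|$ such that

$$G \models L_1(r, a) \wedge (R_1^-)^{m+1} \cdot \overline{E}_{01} \cdot E_{02} \cdot (R_2^+)^n (a, b) \wedge \neg K_j(a, b).$$

Then there exists $c \in |G|$ such that

$$G \models R_1^-(a, c) \wedge (R_1^-)^m \cdot \overline{E}_{01} \cdot E_{02} \cdot (R_2^+)^n (c, b).$$

By $\varphi_8$ in $\Phi_N$, we have $G \models L_1(r, c)$. Thus, by $G \models \varphi_{C_1}$, $G \models K_i(c, b)$. Hence

$$G \models L_1(r, a) \wedge R_1^-(a, c) \wedge K_i(c, b).$$

Thus, by $\varphi^i_{a_1}$ in $\Phi_M$, we have $G \models K_j(a, b)$. This contradicts the assumption.

Case 2: $I_i = (i, \mathit{register}_2, j)$. In this case, $E$ must be $(j, m, n+1)$.

Suppose, for reductio, that there exist $a, b \in |G|$ such that

$$G \models L_1(r, a) \wedge (R_1^-)^m \cdot \overline{E}_{01} \cdot E_{02} \cdot (R_2^+)^{n+1} (a, b) \wedge \neg K_j(a, b).$$

Then there exists $c \in |G|$ such that

$$G \models (R_1^-)^m \cdot \overline{E}_{01} \cdot E_{02} \cdot (R_2^+)^n (a, c) \wedge R_2^+(c, b).$$

By $G \models \varphi_{C_1}$, we have $G \models K_i(a, c)$. Hence

$$G \models L_1(r, a) \wedge K_i(a, c) \wedge R_2^+(c, b).$$

Thus, by $\varphi^i_{a_2}$ in $\Phi_M$, we have $G \models K_j(a, b)$. This contradicts the assumption.

Case 3: $I_i = (i, \mathit{register}_1, j, k)$ and $m = 0$. In this case, $E$ must be $(j, 0, n)$.

Suppose, for reductio, that there exist $a, b \in |G|$ such that

$$G \models L_1(r, a) \wedge \overline{E}_{01} \cdot E_{02} \cdot (R_2^+)^n (a, b) \wedge \neg K_j(a, b).$$

Then, by $G \models \varphi_{C_1}$, we have $G \models K_i(a, b)$. In addition, there exists $c \in |G|$ such that

$$G \models L_1(r, a) \wedge \overline{E}_{01}(a, c).$$

By $\varphi_{13}$, $\varphi_{14}$ and $\varphi_{15}$ in $\Phi_N$, we have $G \models E_{01}(r, a)$. Hence

$$G \models E_{01}(r, a) \wedge K_i(a, b).$$

Thus, by $\varphi^i_{s_{10}}$ in $\Phi_M$, we have $G \models K_j(a, b)$. This contradicts the assumption.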

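Case 4: $I_i = (i, \mathit{register}_1, j, k)$ and $m = p + 1$. In this case, $E$ must be $(k, p, n)$.

Suppose, for reductio, that there exist $a, b \in |G|$ such that

$$G \models L_1(r, a) \wedge (R_1^-)^p \cdot \overline{E}_{01} \cdot E_{02} \cdot (R_2^+)^n (a, b) \wedge \neg K_k(a, b).$$

Then, by $\varphi_5$ in $\Phi_N$, there exists $c \in |G|$ such that

$$G \models L_1(r, a) \wedge R_1^+(a, c).$$

By $\varphi_7$ and $\varphi_1$ in $\Phi_N$, we have $G \models L_1(r, c) \wedge R_1^-(c, a)$. Hence

$$G \models L_1(r, c) \wedge (R_1^-)^{p+1} \cdot \overline{E}_{01} \cdot E_{02} \cdot (R_2^+)^n (c, b).$$

Thus, by $G \models \varphi_{C_1}$, $G \models K_i(c, b)$. Hence

$$G \models L_1(r, a) \wedge R_1^+(a, c) \wedge K_i(c, b).$$

Thus, by $\varphi^i_{s_{1n}}$ in $\Phi_M$, we have $G \models K_k(a, b)$. This contradicts the assumption.

Case 5: $I_i = (i, \mathit{register}_2, j, k)$ and $n = 0$. In this case, $E$ must be $(j, m, 0)$.

Suppose, for reductio, that there exist $a, b \in |G|$ such that

$$G \models L_1(r, a) \wedge (R_1^-)^m \cdot \overline{E}_{01} \cdot E_{02} (a, b) \wedge \neg K_j(a, b).$$

Then, by $G \models \varphi_{C_1}$, we have $G \models K_i(a, b)$. By $\varphi^i_{11}$ in $\Phi_N$, $G \models \overline{K}_i(b, a)$. Moreover, there exist $c, d \in |G|$ such that

$$G \models (R_1^-)^m(a, d) \wedge \overline{E}_{01}(d, c) \wedge E_{02}(c, b).$$

By $G \models L_1(r, a)$ and $\varphi_8$ in $\Phi_N$, we have $G \models L_1(r, d)$. Thus, by $\varphi_{16}$ in $\Phi_N$, $G \models E_{02}(r, b)$. Hence

$$G \models E_{02}(r, b) \wedge \overline{K}_i(b, a).$$

Thus, by $\varphi^i_{s_{20}}$ in $\Phi_M$, we have $G \models K_j(a, b)$. This contradicts the assumption.

Case 6: $I_i = (i, \mathit{register}_2, j, k)$ and $n = p + 1$. In this case, $E$ must be $(k, m, p)$.

Suppose, for reductio, that there exist $a, b \in |G|$ such that

$$G \models L_1(r, a) \wedge (R_1^-)^m \cdot \overline{E}_{01} \cdot E_{02} \cdot (R_2^+)^p (a, b) \wedge \neg K_k(a, b).$$

Then there exist $c, d \in |G|$ such that

$$G \models (R_1^-)^m(a, c) \wedge \overline{E}_{01} \cdot E_{02}(c, d) \wedge (R_2^+)^p(d, b).$$

By $\varphi_8$ in $\Phi_N$, we have $G \models L_1(r, c)$. By $\varphi_{16}$ in $\Phi_N$, $G \models E_{02}(r, d)$. By $\varphi_{18}$ in $\Phi_N$, $G \models L_2(r, d)$. By $\varphi_9$ in $\Phi_N$, $G \models L_2(r, b)$. Therefore, by $\varphi_6$ in $\Phi_N$, there exists $e \in |G|$ such that

$$G \models R_2^+(b, e).$$

Hence

$$G \models L_1(r, a) \wedge (R_1^-)^m \cdot \overline{E}_{01} \cdot E_{02} \cdot (R_2^+)^{p+1} (a, e).$$

By $G \models \varphi_{C_1}$, we have $G \models K_i(a, e)$. By $\varphi_3$ in $\Phi_N$ and $G \models R_2^+(b, e)$, $G \models R_2^-(e, b)$. Hence

$$G \models L_1(r, a) \wedge K_i(a, e) \wedge R_2^-(e, b).$$

Thus, by $\varphi^i_{s_{2n}}$ in $\Phi_M$, we have $G \models K_k(a, b)$. This contradicts the assumption.

Hence the claim holds for $t + 1$ for all the cases of $I_i$.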

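Conversely, assume $C \not\Rightarrow^*_M D$. We show that $\Phi_N \wedge \Phi_M \wedge \varphi_C \rightarrow \varphi_D$ is not valid. That is, we show that $\Phi_N \wedge \Phi_M \wedge \varphi_C \wedge \neg\varphi_D$ is satisfiable. To show this, we construct a structure $G$ such that

$$G \models \Phi_N \wedge \Phi_M \wedge \varphi_C \quad \text{and} \quad G \models \neg\varphi_D.$$

The structure $G$ is defined as follows. The universe of $G$ consists of a distinguished node $\mathit{rt}$, which is the interpretation of the constant $r$ in $G$, and two distinct infinite chains of natural numbers. More specifically, let $\mathbb{N}$ denote the set of all natural numbers; then

$$|G| = \{\mathit{rt}\} \cup \mathbb{N} \cup \{i' \mid i \in \mathbb{N}\}.$$

The binary relations in $G$ are populated as follows:

$$L_r = \{(\mathit{rt}, \mathit{rt})\}$$
$$E_{01} = \{(\mathit{rt}, 0)\} \qquad \overline{E}_{01} = \{(0, \mathit{rt})\}$$
$$E_{02} = \{(\mathit{rt}, 0')\} \qquad \overline{E}_{02} = \{(0', \mathit{rt})\}$$
$$L_1 = \{(\mathit{rt}, i) \mid i \in \mathbb{N}\} \qquad \overline{L}_1 = \{(i, \mathit{rt}) \mid i \in \mathbb{N}\}$$
$$L_2 = \{(\mathit{rt}, i') \mid i \in \mathbb{N}\} \qquad \overline{L}_2 = \{(i', \mathit{rt}) \mid i \in \mathbb{N}\}$$
$$R_1^+ = \{(i, i+1) \mid i \in \mathbb{N}\} \qquad R_1^- = \{(i+1, i) \mid i \in \mathbb{N}\}$$
$$R_2^+ = \{(i', (i+1)') \mid i \in \mathbb{N}\} \qquad R_2^- = \{((i+1)', i') \mid i \in \mathbb{N}\}$$
$$K_i = \{(m, n') \mid C \Rightarrow^*_M (i, m, n)\} \qquad \overline{K}_i = \{(n', m) \mid (m, n') \in K_i\}$$

See Figure 2 for the structure $G$ (the $\overline{E}_{01}$, $\overline{E}_{02}$, $\overline{L}_1$, $\overline{L}_2$, $R_1^-$, $R_2^-$ and $\overline{K}_i$ edges are omitted in the graph).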

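It is easy to verify the following claims.

Claim 1: $G \models \varphi_C \wedge \neg\varphi_D$.

This is because $C \Rightarrow^*_M C$ and $C \not\Rightarrow^*_M D$.

Claim 2: $G \models \Phi_N$.

This is immediate from the construction of $G$.

Claim 3: $G \models \Phi_M$.

Claim 3 follows from the simple facts given below.

Fact 1: $G \models K_i(m, n')$ iff $C \Rightarrow^*_M (i, m, n)$.

Fact 2: If $C \Rightarrow^*_M (i, m, n) \Rightarrow^{I_i}_M E$, then $C \Rightarrow^*_M E$. Moreover, $E$ must satisfy the following conditions:

- If $I_i = (i, \mathit{register}_1, j)$, then $E = (j, m+1, n)$.
- If $I_i = (i, \mathit{register}_2, j)$, then $E = (j, m, n+1)$.
- If $I_i = (i, \mathit{register}_1, j, k)$ and $m = 0$, then $E = (j, 0, n)$.
- If $I_i = (i, \mathit{register}_1, j, k)$ and $m = p+1$, then $E = (k, p, n)$.
- If $I_i = (i, \mathit{register}_2, j, k)$ and $n = 0$, then $E = (j, m, 0)$.
- If $I_i = (i, \mathit{register}_2, j, k)$ and $n = p+1$, then $E = (k, m, p)$.

Fact 3: If $G \not\models \Phi_M$, i.e., there exists $I_i$ for some $i \in [0, l]$ such that $G \not\models \Phi_{I_i}$, then there exist $m, n' \in |G|$ such that

- if $I_i = (i, \mathit{register}_1, j)$, then $G \models K_i(m, n') \wedge \neg K_j(m+1, n')$;
- if $I_i = (i, \mathit{register}_2, j)$, then $G \models K_i(m, n') \wedge \neg K_j(m, (n+1)')$;
- if $I_i = (i, \mathit{register}_1, j, k)$, then either
  - $G \models K_i(0, n') \wedge \neg K_j(0, n')$, where $m = 0$, or
  - $G \models K_i(p+1, n') \wedge \neg K_k(p, n')$, where $m = p+1$;
- if $I_i = (i, \mathit{register}_2, j, k)$, then either
  - $G \models K_i(m, 0') \wedge \neg K_j(m, 0')$, where $n = 0$, or
  - $G \models K_i(m, (p+1)') \wedge \neg K_k(m, p')$, where $n = p+1$.

[Figure 2: The structure $G$ in the proof of the proposition. The root node $\mathit{rt}$ carries an $L_r$ self-loop, $E_{01}$ and $E_{02}$ edges to the nodes 0 and 0', and $L_1$ and $L_2$ edges to the nodes of the two chains 0, 1, 2, ... and 0', 1', 2', ...; consecutive chain nodes are connected by $R_1^+$ and $R_2^+$ edges, and a $K_i$ edge leads from $m$ to $n'$.]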

Using these facts, Claim 3 can be easily verified by reductio. More specifically, suppose that $G \not\models \Phi_M$. Then there is $i \in [0, l]$ such that $G \not\models \Phi_{I_i}$. Here $I_i$ has six cases. For each of these cases, the assumption contradicts the facts above. As an example, consider the case in which $I_i = (i, \mathit{register}_1, j)$. By Fact 3, there exist $m, n' \in |G|$ such that $G \models K_i(m, n') \wedge \neg K_j(m+1, n')$. By Fact 1, $C \Rightarrow^*_M (i, m, n)$. In addition, by Fact 2, $C \Rightarrow^*_M (j, m+1, n)$. Thus, again by Fact 1, $G \models K_j(m+1, n')$. This contradicts the assumption. The proofs for the other cases are similar.

Therefore, if $C \not\Rightarrow^*_M D$, then $\Phi_N \wedge \Phi_M \wedge \varphi_C \wedge \neg\varphi_D$ is satisfiable.

This completes the proof of the proposition.

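Semi-conservative reduction

Taking advantage of the reduction property established above, we now define a recursive function $f: FO \rightarrow S(P_+)$ by

$$f(\varphi) = \Phi_N \wedge \Phi_{M_L} \wedge \varphi_{C(\varphi)} \wedge \neg\varphi_{(1,0,0)},$$

where $C(\varphi)$ is the ID $(0, m(\varphi), 0)$ of $M_L$, with an appropriate encoding $m(\varphi)$ of $\varphi$, as described in the preliminaries on two-register machines.

The proposition below shows that $f$ is indeed a semi-conservative reduction from FO to $S(P_+)$.

Proposition. Let $M_L$ be the two-register machine described in the preliminaries on two-register machines. For each first-order sentence $\varphi$,

(1) $\varphi \in H_{M_L, 1}$ iff $f(\varphi)$ is not satisfiable, and

(2) if $\varphi \in H_{M_L, 2}$, then $f(\varphi)$ has a finite model.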

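Proof.

For the first claim, by the reduction property established above, we have that

$$C(\varphi) \Rightarrow_{M_L} (1,0,0) \quad\text{iff}\quad \Phi_N \wedge \Phi_M \wedge \Phi_{C(\varphi)} \rightarrow \Phi_{(1,0,0)} \text{ is valid.}$$

Therefore,

$$C(\varphi) \Rightarrow_{M_L} (1,0,0) \quad\text{iff}\quad \Phi_N \wedge \Phi_M \wedge \Phi_{C(\varphi)} \wedge \neg\Phi_{(1,0,0)} \text{ is not satisfiable.}$$

Notice that $\varphi \in H_{M_L,1}$ iff $C(\varphi) \Rightarrow_{M_L} (1,0,0)$. Therefore, $\varphi \in H_{M_L,1}$ iff $f(\varphi)$ is not satisfiable. This completes the proof of the first claim.

For the second claim, we show that if $\varphi \in H_{M_L,2}$, then $f(\varphi)$ has a finite model.

First note that if $\varphi \in H_{M_L,2}$, then the computation of $M_L$ with initial ID $C(\varphi)$ is finite. Therefore, the set

$$SID_{C(\varphi)} = \{ (i, m, n) \mid C(\varphi) \Rightarrow_{M_L} (i, m, n) \}$$

is finite. Hence there is a natural number $p$ such that for each $(i, m, n) \in SID_{C(\varphi)}$, $m \le p$ and $n \le p$.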
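The finiteness argument above can be illustrated with a small Python sketch that collects the IDs of a halting computation and extracts a bound $p$ on the register contents. Everything named here (`sid`, `bound`, `toy_step`, the `max_steps` cutoff) is a hypothetical helper of this sketch; the toy machine is not the machine $M_L$ of the text, and the cutoff only guards the demonstration against non-halting inputs.

```python
from typing import Callable, List, Optional, Tuple

ID = Tuple[int, int, int]            # (state, register 1, register 2)


def sid(step: Callable[[ID], Optional[ID]], start: ID, max_steps: int = 10_000) -> List[ID]:
    """List the IDs reached from `start`; `step` returns None when the machine halts."""
    ids = [start]
    for _ in range(max_steps):
        nxt = step(ids[-1])
        if nxt is None:
            return ids
        ids.append(nxt)
    raise RuntimeError("no halt within max_steps; the computation may be infinite")


def bound(ids: List[ID]) -> int:
    """Smallest p with m <= p and n <= p for every (i, m, n) in ids."""
    return max(max(m, n) for _, m, n in ids)


def toy_step(c: ID) -> Optional[ID]:
    """Hypothetical machine: move the content of register 1 into register 2."""
    i, m, n = c
    if i == 0:                       # (0, register_1, 2, 1): test and decrement
        return (2, m, n) if m == 0 else (1, m - 1, n)
    if i == 1:                       # (1, register_2, 0): increment
        return (0, m, n + 1)
    return None                      # state 2 has no instruction: halt


ids = sid(toy_step, (0, 2, 0))
p = bound(ids)
```

Because a two-register machine is deterministic, the reachable IDs of a halting computation form a single finite sequence, which is why a plain list suffices here.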

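Now we construct a finite model $H$ for $\Phi_N \wedge \Phi_M \wedge \Phi_{C(\varphi)} \wedge \neg\Phi_{(1,0,0)}$. The universe of $H$ has $2p + 3$ nodes. More specifically,

$$|H| = \{rt, 0, \ldots, p\} \cup \{0', \ldots, p'\},$$

where $rt$ is the interpretation of the constant $r$ in $H$.

The binary relations $L_r$, $E_{01}$, $E_{02}$, $E_{01}^-$, $E_{02}^-$, $K_i$ and $K_i^-$ in $H$ are exactly the same as those in the structure $G$ given in the proof of the reduction property. The binary relations $L_1$, $L_2$, $L_1^-$, $L_2^-$, $R_1^+$, $R_1^-$, $R_2^+$ and $R_2^-$ are populated in $H$ as follows:

$$R_1^+ = \{ (i, i+1) \mid i < p \} \cup \{ (p, p) \}$$

$$R_1^- = \{ (i+1, i) \mid i < p \} \cup \{ (p, p) \}$$

$$R_2^+ = \{ (i', (i+1)') \mid i < p \} \cup \{ (p', p') \}$$

$$R_2^- = \{ ((i+1)', i') \mid i < p \} \cup \{ (p', p') \}$$

$$L_1 = \{ (rt, i) \mid i \le p \}, \qquad L_1^- = \{ (i, rt) \mid i \le p \}$$

$$L_2 = \{ (rt, i') \mid i \le p \}, \qquad L_2^- = \{ (i', rt) \mid i \le p \}$$

See the figure for the structure $H$ ($E_{01}^-$, $E_{02}^-$, $L_1^-$, $L_2^-$, $R_1^-$, $R_2^-$ and $K_i^-$ edges are omitted in the graph).

Note that the relations $K_i$ and $K_i^-$ in $H$ are well-defined, since if $C(\varphi) \Rightarrow_{M_L} (i, m, n)$ then $m \le p$ and $n \le p$. In addition, it is easy to verify that $H$ is well-defined.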
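As a concrete illustration of the register part of $H$, the Python sketch below builds the universe and the $L$ and $R$ relations for a given bound $p$. It is a sketch under stated assumptions: the function names and the string spellings of nodes and relations (`"rt"`, `"3'"`, `"R1+"`, `"R1-"`, and so on) are this example's own, and the relations $E_{01}$, $E_{02}$ and $K_i$ are left out because they depend on the machine and on $G$.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def register_part_of_h(p: int) -> Tuple[Set[str], Dict[str, Set[Edge]]]:
    """Build the universe of H and its L and R relations for a given bound p."""
    left = [str(i) for i in range(p + 1)]        # nodes 0, ..., p (register 1)
    right = [f"{i}'" for i in range(p + 1)]      # nodes 0', ..., p' (register 2)
    universe = {"rt", *left, *right}
    rel: Dict[str, Set[Edge]] = {}
    for name, chain in (("1", left), ("2", right)):
        loop = {(chain[p], chain[p])}            # self-loop at the last node
        rel["R" + name + "+"] = {(chain[i], chain[i + 1]) for i in range(p)} | loop
        rel["R" + name + "-"] = {(chain[i + 1], chain[i]) for i in range(p)} | loop
        rel["L" + name] = {("rt", x) for x in chain}
        rel["L" + name + "-"] = {(x, "rt") for x in chain}
    return universe, rel


def has_successor(edges: Set[Edge], node: str) -> bool:
    """True if some edge in `edges` starts at `node`."""
    return any(src == node for src, _ in edges)


universe, rel = register_part_of_h(3)
```

With $p = 3$ the universe has $2 \cdot 3 + 3 = 9$ nodes. Removing the self-loop at the last node leaves that node without an outgoing `R1+` edge, which is the situation the loops in the finite structure are there to avoid.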

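We now show that $H$ is indeed a model of $\Phi_N \wedge \Phi_M \wedge \Phi_{C(\varphi)} \wedge \neg\Phi_{(1,0,0)}$.

First, by $C(\varphi) \Rightarrow_{M_L} C(\varphi)$ and $C(\varphi) \not\Rightarrow_{M_L} (1,0,0)$, we have that $H \models \Phi_{C(\varphi)} \wedge \neg\Phi_{(1,0,0)}$.

Second, it is easy to verify that $H \models \Phi_N$. Note here that it is to ensure $H \models \phi_5 \wedge \phi_6$ that we require $H \models R_1^+(p, p) \wedge R_2^+(p', p')$.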

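Finally, we show that $H \models \Phi_M$. It is straightforward to verify the following simple facts.

Fact. If $C(\varphi) \Rightarrow_{M_L} (i, m, n)$, then $m \le p$ and $n \le p$.

Fact. If $(i, m, n) \rightarrow_{M_L} (j, m_1, n_1)$ by instruction $I_i$, then $m_1 \le m + 1$ and $n_1 \le n + 1$. As a result of the preceding fact, $m_1 \le p$ and $n_1 \le p$ whenever $(j, m_1, n_1)$ is reachable from $C(\varphi)$.

Consequently, the facts used for showing $G \models \Phi_M$ in the proof of the reduction property are also true here. Therefore, the argument for showing $G \models \Phi_M$ in that proof, together with the two facts above, proves $H \models \Phi_M$.

Hence $H$ is indeed a finite model of $\Phi_N \wedge \Phi_M \wedge \Phi_{C(\varphi)} \wedge \neg\Phi_{(1,0,0)}$.

This completes the proof of the proposition.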

Corollary. The function $f$ defined above is a reduction from FO to $S(P_+)$.

Proof. By the definition of $M_L$, for each $\varphi \in FO$, $\varphi$ is satisfiable iff $\varphi \notin H_{M_L,1}$. As shown in the proof of the proposition above, $\varphi \notin H_{M_L,1}$ iff $f(\varphi)$ is satisfiable. Therefore, $f$ is a reduction from FO to $S(P_+)$.

As an immediate result of the proposition above and the earlier lemma, we have the following corollary.

Corollary. The set $S(P_+)$ is a conservative reduction class.

From this corollary, the theorem for $P_+$ follows immediately.

The Implication Problems for $P_f$

In this section, we establish the corresponding theorem for $P_f$. As in the proof of the theorem for $P_+$, we show that the set

$$S(P_f) = \{ \bigwedge \Sigma \wedge \neg\phi \mid \phi \in P_f,\ \Sigma \subseteq P_f,\ \Sigma \text{ is finite} \}$$

is a conservative reduction class. To do this, we first present an encoding of two-register machines by sentences in $P_f$, and then prove a reduction property of the encoding. Using this reduction property, we define a semi-conservative reduction from FO to $S(P_f)$.

Figure: The structure $H$ (root $rt$; nodes $0, 1, 2, \ldots, m, \ldots, p$ and $0', 1', 2', \ldots, n', \ldots, p'$; edges labeled $L_r$, $E_{01}$, $E_{02}$, $L_1$, $L_2$, $R_1^+$, $R_2^+$ and $K_i$, with self-loops $R_1^+$ at $p$ and $R_2^+$ at $p'$).

Encoding

Let $M$ be a two-register machine as described earlier. Assume that the set $E$ of binary relations in the signature is the same as the one described for the encoding in $P_+$, except that the predicates $L_r$ and $K_i^-$ for $i \in [0, l]$ are no longer required here.

We define the encoding as follows.

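Registers

We encode the contents of the registers by $\Phi^f_N$, which is the conjunction of the path constraints of $P_f$ given below.

Successor, predecessor:

$$\phi_1 = \forall x \forall y\, (L_1(r, x) \wedge \exists z\, (R_1^+(x, z) \wedge R_1^-(z, y)) \rightarrow x = y),$$

with $pf(\phi_1) = L_1$, $lt(\phi_1) = R_1^+ \cdot R_1^-$ and $rt(\phi_1) = \epsilon$.

$$\phi_2 = \forall x \forall y\, (L_1(r, x) \wedge \exists z\, (R_1^-(x, z) \wedge R_1^+(z, y)) \rightarrow x = y)$$

$$\phi_3 = \forall x \forall y\, (L_2(r, x) \wedge \exists z\, (R_2^+(x, z) \wedge R_2^-(z, y)) \rightarrow x = y)$$

$$\phi_4 = \forall x \forall y\, (L_2(r, x) \wedge \exists z\, (R_2^-(x, z) \wedge R_2^+(z, y)) \rightarrow x = y)$$

$$\phi_5 = \forall x \forall y\, (L_1(r, x) \wedge x = y \rightarrow \exists z\, (R_1^+(x, z) \wedge R_1^-(z, y))),$$

with $pf(\phi_5) = L_1$, $lt(\phi_5) = \epsilon$ and $rt(\phi_5) = R_1^+ \cdot R_1^-$.

$$\phi_6 = \forall x \forall y\, (L_2(r, x) \wedge x = y \rightarrow \exists z\, (R_2^+(x, z) \wedge R_2^-(z, y)))$$

Register identification: $\phi_7$, $\phi_8$, $\phi_9$ and $\phi_{10}$ are the same as given for $P_+$.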

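Zeros:

$$\phi_{11} = \forall x \forall y\, (L_1(r, y) \wedge E_{01}^-(y, x) \rightarrow r = x),$$

where $\phi_{11}$ is a simple path constraint with $lt(\phi_{11}) = L_1 \cdot E_{01}^-$ and $rt(\phi_{11}) = \epsilon$.

$$\phi_{12} = \forall x \forall y\, (L_1(r, x) \wedge E_{01}^-(x, y) \rightarrow \exists z \exists z'\, (E_{01}^-(x, z) \wedge E_{01}(z, z') \wedge E_{01}^-(z', y))),$$

with $pf(\phi_{12}) = L_1$, $lt(\phi_{12}) = E_{01}^-$ and $rt(\phi_{12}) = E_{01}^- \cdot E_{01} \cdot E_{01}^-$.

$$\phi_{13} = \forall x \forall y\, (L_1(r, x) \wedge \exists z\, (E_{01}^-(x, z) \wedge E_{01}(z, y)) \rightarrow x = y),$$

with $pf(\phi_{13}) = L_1$, $lt(\phi_{13}) = E_{01}^- \cdot E_{01}$ and $rt(\phi_{13}) = \epsilon$.

$$\phi_{14} = \forall x \forall y\, (L_1(r, y) \wedge \exists z\, (E_{01}^-(y, z) \wedge E_{02}(z, x)) \rightarrow E_{02}(r, x)),$$

where $\phi_{14}$ is a simple path constraint with $lt(\phi_{14}) = L_1 \cdot E_{01}^- \cdot E_{02}$ and $rt(\phi_{14}) = E_{02}$.

$$\phi_{15} = \forall x \forall y\, (E_{02}(r, x) \wedge x = y \rightarrow \exists z\, (E_{02}^-(x, z) \wedge E_{02}(z, y))),$$

with $pf(\phi_{15}) = E_{02}$, $lt(\phi_{15}) = \epsilon$ and $rt(\phi_{15}) = E_{02}^- \cdot E_{02}$.

$$\phi_{16} = \forall x\, (E_{02}(r, x) \rightarrow L_2(r, x))$$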
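Constraints given by a prefix $pf$, a left tail $lt$ and a right tail $rt$ can be checked mechanically on a finite edge-labeled graph. The Python sketch below does this under stated assumptions: it reads such a constraint as "for every $x$ reached from the root by $pf$, every $y$ reached from $x$ by $lt$ is also reached from $x$ by $rt$", with the empty path standing for equality. The names `targets` and `holds_forward`, the dictionary layout of the graph, and the label spellings `"R1+"` and `"R1-"` are hypothetical choices of this sketch.

```python
from typing import Dict, Hashable, List, Set, Tuple

Graph = Dict[str, Set[Tuple[Hashable, Hashable]]]   # edge label -> set of (source, target)


def targets(g: Graph, path: List[str], start: Hashable) -> Set[Hashable]:
    """Nodes reachable from `start` by following the labels of `path` in order."""
    frontier = {start}
    for label in path:
        edges = g.get(label, set())
        frontier = {t for (s, t) in edges if s in frontier}
    return frontier                  # for the empty path this is {start}, i.e. equality


def holds_forward(g: Graph, root: Hashable, pf: List[str], lt: List[str], rt: List[str]) -> bool:
    """Check: for all x with pf(root, x), every y with lt(x, y) also satisfies rt(x, y)."""
    for x in targets(g, pf, root):
        if not targets(g, lt, x) <= targets(g, rt, x):
            return False
    return True


# Toy graph: a three-node successor/predecessor chain with a loop at the last node.
g: Graph = {
    "L1": {("rt", 0), ("rt", 1), ("rt", 2)},
    "R1+": {(0, 1), (1, 2), (2, 2)},
    "R1-": {(1, 0), (2, 1), (2, 2)},
}
with_loop = holds_forward(g, "rt", ["L1"], [], ["R1+", "R1-"])
no_loop = holds_forward({**g, "R1+": {(0, 1), (1, 2)}}, "rt", ["L1"], [], ["R1+", "R1-"])
```

The call with `lt = []` and `rt = ["R1+", "R1-"]` has the shape of $\phi_5$: every node under `L1` must return to itself via a successor edge followed by a predecessor edge. On the toy graph this holds with the loop at node 2 and fails once the `R1+` loop is removed.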

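IDs

The encoding $\Phi_C$ of each ID $C$ of $M$ is the same as the one given for $P_+$. Note that $\Phi_C$ is in $P_f$.

Instructions

The encoding $\Phi^f_{I_i}$ of each instruction $I_i$ is the same as the encoding $\Phi_{I_i}$ given for $P_+$, except

$$\phi^i_{s_{20}} = \forall x \forall y\, (L_1(r, x) \wedge \exists z\, (K_i(x, z) \wedge E_{02}^- \cdot E_{02}(z, y)) \rightarrow K_j(x, y)).$$

Here $pf(\phi^i_{s_{20}}) = L_1$, $lt(\phi^i_{s_{20}}) = K_i \cdot E_{02}^- \cdot E_{02}$ and $rt(\phi^i_{s_{20}}) = K_j$.

The encoding of the program of $M$ is

$$\Phi^f_M = \bigwedge_{i=0}^{l} \Phi^f_{I_i}.$$

It is clear that $\Phi^f_M$ is a conjunction of path constraints in $P_f$.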

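Reduction property

Analogous to the proposition for $P_+$, we establish the reduction property of the encoding above as follows.

Proposition. Let $M$ be a two-register machine. For all IDs $C$ and $D$ of $M$, we have that

$$C \Rightarrow_M D \quad\text{iff}\quad \Phi^f_N \wedge \Phi^f_M \wedge \Phi_C \rightarrow \Phi_D \text{ is valid.}$$

Proof.

Assume that $C \Rightarrow_M D$. As in the proof of the proposition for $P_+$, we prove by induction on the step $t$ that for each ID $E$ of $M$ and each model $G$ of $\Phi^f_N \wedge \Phi^f_M \wedge \Phi_C$, if $C \Rightarrow^t_M E$ then $G \models \Phi_E$. This can be shown in basically the same way as for that proposition, except for the following cases in the inductive step. In each case, $C_1 = (i, m, n)$ denotes the ID reached immediately before $E$, so that $G \models \Phi_{C_1}$ by the induction hypothesis.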

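Case: $I_i = (i, register_1, j, k)$ and $m = 0$. In this case, $E$ must be $(j, 0, n)$.

Suppose, for reductio, that there exist $a, b \in |G|$ such that

$$G \models L_1(r, a) \wedge E_{01}^- \cdot E_{02} \cdot (R_2^+)^n (a, b) \wedge \neg K_j(a, b).$$

Then by $G \models \Phi_{C_1}$, we have $G \models K_i(a, b)$. In addition, there exists $e \in |G|$ such that

$$G \models L_1(r, a) \wedge E_{01}^-(a, e).$$

By $\phi_{12}$ in $\Phi^f_N$, there exist $c, d \in |G|$ such that

$$G \models L_1(r, a) \wedge E_{01}^-(a, c) \wedge E_{01}(c, d).$$

Thus by $\phi_{13}$ in $\Phi^f_N$, we have $G \models a = d$. Hence

$$G \models L_1(r, a) \wedge E_{01}^-(a, c) \wedge E_{01}(c, a).$$

By $G \models L_1(r, a) \wedge E_{01}^-(a, c)$ and $\phi_{11}$ in $\Phi^f_N$, we have $G \models r = c$. Thus $G \models E_{01}(r, a)$. Hence

$$G \models E_{01}(r, a) \wedge K_i(a, b).$$

Thus by $\phi^i_{s_{10}}$ in $\Phi^f_M$, we have $G \models K_j(a, b)$. This contradicts the assumption.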

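Case: $I_i = (i, register_1, j, k)$ and $m = p + 1$. In this case, $E$ must be $(k, p, n)$.

Suppose, for reductio, that there exist $a, b \in |G|$ such that

$$G \models L_1(r, a) \wedge (R_1^-)^p \cdot E_{01}^- \cdot E_{02} \cdot (R_2^+)^n (a, b) \wedge \neg K_k(a, b).$$

Then by $\phi_5$ in $\Phi^f_N$, there exists $c \in |G|$ such that

$$G \models L_1(r, a) \wedge R_1^+(a, c) \wedge R_1^-(c, a).$$

By $\phi_7$ in $\Phi^f_N$, we have $G \models L_1(r, c) \wedge R_1^-(c, a)$. Hence

$$G \models L_1(r, c) \wedge (R_1^-)^{p+1} \cdot E_{01}^- \cdot E_{02} \cdot (R_2^+)^n (c, b).$$

Thus by $G \models \Phi_{C_1}$, we have $G \models K_i(c, b)$. Hence

$$G \models L_1(r, a) \wedge R_1^+(a, c) \wedge K_i(c, b).$$

Thus by $\phi^i_{s_{1n}}$ in $\Phi^f_M$, we have $G \models K_k(a, b)$. This contradicts the assumption.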

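Case: $I_i = (i, register_2, j, k)$ and $n = 0$. In this case, $E$ must be $(j, m, 0)$.

Suppose, for reductio, that there exist $a, b \in |G|$ such that

$$G \models L_1(r, a) \wedge (R_1^-)^m \cdot E_{01}^- \cdot E_{02} (a, b) \wedge \neg K_j(a, b).$$

Then by $G \models \Phi_{C_1}$, we have $G \models K_i(a, b)$. Moreover, there exist $c, d \in |G|$ such that

$$G \models (R_1^-)^m(a, d) \wedge E_{01}^-(d, c) \wedge E_{02}(c, b).$$

By $G \models L_1(r, a)$ and $\phi_8$ in $\Phi^f_N$, we have $G \models L_1(r, d)$. Thus by $\phi_{14}$ in $\Phi^f_N$, we have

$$G \models E_{02}(r, b).$$

By $\phi_{15}$ in $\Phi^f_N$, there exists $e \in |G|$ such that

$$G \models E_{02}^-(b, e) \wedge E_{02}(e, b).$$

Hence

$$G \models L_1(r, a) \wedge K_i(a, b) \wedge E_{02}^-(b, e) \wedge E_{02}(e, b).$$

Thus by $\phi^i_{s_{20}}$ in $\Phi^f_M$, we have $G \models K_j(a, b)$. This contradicts the assumption.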

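Case: $I_i = (i, register_2, j, k)$ and $n = p + 1$. In this case, $E$ must be $(k, m, p)$.

Suppose, for reductio, that there exist $a, b \in |G|$ such that

$$G \models L_1(r, a) \wedge (R_1^-)^m \cdot E_{01}^- \cdot E_{02} \cdot (R_2^+)^p (a, b) \wedge \neg K_k(a, b).$$

Then there exist $c, d \in |G|$ such that

$$G \models (R_1^-)^m(a, c) \wedge E_{01}^- \cdot E_{02}(c, d) \wedge (R_2^+)^p(d, b).$$

By $\phi_8$ in $\Phi^f_N$, we have $G \models L_1(r, c)$. By $\phi_{14}$ in $\Phi^f_N$, $G \models E_{02}(r, d)$. By $\phi_{16}$ in $\Phi^f_N$, $G \models L_2(r, d)$. By $\phi_9$ in $\Phi^f_N$, $G \models L_2(r, b)$. Therefore, by $\phi_6$ in $\Phi^f_N$, there exists $e \in |G|$ such that

$$G \models R_2^+(b, e) \wedge R_2^-(e, b).$$

Hence

$$G \models L_1(r, a) \wedge (R_1^-)^m \cdot E_{01}^- \cdot E_{02} \cdot (R_2^+)^{p+1} (a, e).$$

By $G \models \Phi_{C_1}$, we have $G \models K_i(a, e)$. Hence

$$G \models L_1(r, a) \wedge K_i(a, e) \wedge R_2^-(e, b).$$

Thus by $\phi^i_{s_{2n}}$ in $\Phi^f_M$, we have $G \models K_k(a, b)$. This contradicts the assumption.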

Conversely, assume that $C \not\Rightarrow_M D$. It is easy to verify that the structure $G$ constructed in the proof of the proposition for $P_+$, without its $L_r$ and $K_i^-$ edges, is a model of $\Phi^f_N \wedge \Phi^f_M \wedge \Phi_C \wedge \neg\Phi_D$.

Semi-conservative reduction

We define a recursive function $g : FO \to S(P_f)$ by

$$g(\varphi) = \Phi^f_N \wedge \Phi^f_M \wedge \Phi_{C(\varphi)} \wedge \neg\Phi_{(1,0,0)},$$

where $C(\varphi)$ is the ID of $M_L$ determined by an appropriate encoding $m$ of $\varphi$, as described earlier.

The proposition below shows that the function $g$ is indeed a semi-conservative reduction from FO to $S(P_f)$.

Proposition. Let $M_L$ be the two-register machine described earlier. For each first-order sentence $\varphi$:

(1) $\varphi \in H_{M_L,1}$ iff $g(\varphi)$ is not satisfiable; and

(2) if $\varphi \in H_{M_L,2}$, then $g(\varphi)$ has a finite model.

Proof. The proof is the same as the proof of the corresponding proposition for $f$, except that here, in the structure $H$ shown in the figure, there are no $L_r$ and $K_i^-$ edges.

Corollary. The function $g$ defined above is a reduction from FO to $S(P_f)$.

Corollary. The set $S(P_f)$ is a conservative reduction class.

From this corollary, the theorem for $P_f$ follows immediately.

Conclusions

We have presented a class of path constraints, $P$. These constraints are important in both structured and semi-structured data for specifying natural integrity constraints. They are not only a fundamental part of the semantics of the data; they are also important in query optimization. For example, the familiar inverse constraints that occur in object-oriented databases can be stated as path constraints of $P$.

For semi-structured data, we have shown that, despite the simple syntax of the language $P$, its associated implication problem is r.e. complete and its finite implication problem is co-r.e. complete. Indeed, we have established these undecidability results for two fragments of $P$. One of the fragments is the largest subset of $P$ without equality. The other is the set of path constraints of the forward form in $P$.

These undecidability results motivate our search for decidable fragments of $P$. In a companion technical report (Buneman, Fan, and Weinstein, listed in the references), we establish the decidability of the implication problems for several fragments of $P$ which retain sufficient expressive power to capture important semantic information, such as inverse constraints and local database constraints commonly found in object-oriented databases.

Acknowledgements. The authors thank Victor Vianu, Val Tannen and Susan Davidson for helpful discussions.

References

S. O. Aanderaa. On the decision problem for formulas in which all disjunctions are binary. In Proc. 2nd Scandinavian Logic Symp.

S. Abiteboul. Querying semi-structured data. In Proc. ICDT.

S. Abiteboul, S. Cluet, V. Christophides, T. Milo, G. Moerkotte, and J. Siméon. Querying documents in object databases. Journal of Digital Libraries.

S. Abiteboul, D. Quass, J. McHugh, J. Widom, and J. Weiner. The Lorel query language for semistructured data. Journal of Digital Libraries.

S. Abiteboul and V. Vianu. Regular path queries with constraints. In Proc. ACM Symp. on Principles of Database Systems.

F. Bancilhon, C. Delobel, and P. Kanellakis, editors. Building an object-oriented database system: the story of O2. Morgan Kaufmann, San Mateo, California.

J. Barwise. On Moschovakis closure ordinals. Journal of Symbolic Logic.

M. F. van Bommel and G. E. Weddell. Reasoning about equations and functional dependencies on complex objects. IEEE Trans. on Knowledge and Data Engineering.

E. Börger, E. Grädel, and Y. Gurevich. The classical decision problem. Springer.

P. Buneman, S. Davidson, M. Fernandez, and D. Suciu. Adding structure to unstructured data. In Proc. ICDT.

P. Buneman, S. Davidson, G. Hillebrand, and D. Suciu. A query language and optimization techniques for unstructured data. In Proc. ACM SIGMOD International Conf. on Management of Data.

P. Buneman, W. Fan, and S. Weinstein. The decidability of some restricted implication problems for path constraints. Technical Report MS-CIS, Department of Computer and Information Science, University of Pennsylvania.

R. G. G. Cattell (ed.). The object-oriented standard: ODMG Release. Morgan Kaufmann, San Mateo, California.

N. Coburn and G. E. Weddell. Path constraints for graph-based data model: Towards a unified theory of typing constraints, equations and functional dependencies. In Proc. 2nd International Conf. on Deductive and Object-Oriented Databases.

H. B. Enderton. A mathematical introduction to logic. Academic Press.

M. Fernandez, D. Florescu, A. Levy, and D. Suciu. A query language and processor for a web-site management system. In Workshop on Management of Semi-structured Data.

E. Grädel, P. Kolaitis, and M. Vardi. On the decision problem for two-variable first-order logic. Bulletin of Symbolic Logic, March.

M. Ito and G. E. Weddell. Implication problems for functional constraints on databases supporting complex objects. Journal of Computer and System Science.

M. Ito, G. E. Weddell, and N. Coburn. On specialization constraints over complex objects. Technical Report CS, Department of Computer Science, University of Waterloo.

M. Kifer, W. Kim, and Y. Sagiv. Querying object-oriented databases. In Proc. ACM SIGMOD International Conf. on Management of Data.

C. Lamb, G. Landis, J. Orenstein, and D. Weinreb. The ObjectStore Database system. Comm. ACM, October.

A. O. Mendelzon, G. A. Mihaila, and T. Milo. Querying the World Wide Web. In Proc. PDIS.

J. Mylopoulos, P. Bernstein, and H. Wong. A language facility for designing database-intensive applications. ACM Trans. on Database Sys., June.

S. Nestorov, S. Abiteboul, and R. Motwani. Inferring structure in semistructured data. In Workshop on Management of Semi-structured Data.

S. Nestorov, J. Ullman, J. Weiner, and S. Chawathe. Representative objects: Concise representations of semistructured hierarchical data. In Proc. Thirteenth International Conf. on Data Engineering.

Y. Papakonstantinou, H. Garcia-Molina, and J. Widom. Object exchange across heterogeneous information sources. In Proc. Eleventh International Conf. on Data Engineering, March.

H. Wang. Dominoes and the case of the decision problem. In Proc. Symp. on Mathematical Theory of Automata, Brooklyn Polytechnic Institute.

G. E. Weddell. Reasoning about functional dependencies generalized for semantic data models. ACM Trans. on Database Sys., March.

G. E. Weddell and N. Coburn. A theory of specialization constraints for complex objects. In Proc. of 3rd International Conf. on Database Theory, December.

C. Zaniolo. The database language GEM. In Proc. ACM SIGMOD International Conf. on Management of Data.