Some Undecidable Implication Problems for Path Constraints
Total Page:16
File Type:pdf, Size:1020Kb
University of Pennsylvania ScholarlyCommons Technical Reports (CIS) Department of Computer & Information Science April 1997 Some Undecidable Implication Problems for Path Constraints Peter Buneman University of Pennsylvania Wenfei Fan University of Pennsylvania Scott Weinstein University of Pennsylvania, [email protected] Follow this and additional works at: https://repository.upenn.edu/cis_reports Recommended Citation Peter Buneman, Wenfei Fan, and Scott Weinstein, "Some Undecidable Implication Problems for Path Constraints", . April 1997. University of Pennsylvania Department of Computer and Information Science Technical Report No. MS-CIS-97-14. This paper is posted at ScholarlyCommons. https://repository.upenn.edu/cis_reports/84 For more information, please contact [email protected]. Some Undecidable Implication Problems for Path Constraints Abstract We present a class of path constraints of interest in connection with both structured and semi-structured databases, and investigate their associated implication problems. These path constraints are capable of expressing natural integrity constraints that are not only a fundamental part of the semantics of the data, but are also important in query optimization. We show that, despite the simple syntax of the constraints, the implication problem for the constraints is r.e. complete and the finite implication problem for the constraints is co-r.e. complete. Indeed, we establish the existence of a conservative reduction of the set of all first-order sentences to the path constraint language. Comments University of Pennsylvania Department of Computer and Information Science Technical Report No. MS- CIS-97-14. This technical report is available at ScholarlyCommons: https://repository.upenn.edu/cis_reports/84 Some Undecidable Implication Problems for Path Constraints y z Peter Buneman Wenfei Fan Scott Weinstein p etercentralcisup ennedu wfansaulcisup ennedu weinsteinlinccisup ennedu Department of Computer and Information Science University of Pennsylvania April Abstract We present a class of path constraints of interest in connection with b oth structured and semistructured databases and investigate their asso ciated implication problems These path constraints are capable of expressing natural integrity constraints that are not only a funda mental part of the semantics of the data but are also imp ortant in query optimization We show that despite the simple syntax of the constraints the implication problem for the constraints is re complete and the nite implication problem for the constraints is core complete Indeed we establish the existence of a conservative reduction of the set of all rstorder sentences to the path constraint language Intro duction Path inclusion constraints have b een studied in in the context of semistructured data Consider the following ob jectoriented schema class studentf Name string Taking setcourse g class coursef CName string setstudent Enrolled g Students setstudent Courses setcourse in which we assume that the declarations Students and Courses dene p ersistent entry p oints it stands this declaration do es not provide full information ab out the intended into the database As structure Given such a database we would exp ect the following informally stated constraints to hold This work was partly supp orted by the Army Research Oce DAAH and NSF Grant CCR y Supp orted by an IRCS graduate fellowship z Supp orted by NSF Grant CCR r Students Courses Students Courses Taking Taking Taking S1 C1 S2 C2 Enrolled Enrolled Enrolled Name CName Name CName "Smith" "Chem3" "Jones" "Phil4" Figure Representation of a studentcourse database a s S tudents c sT ak ing c C our ses nr ol l ed s S tudents b c C our ses s cE That is any course taken by a student must b e a course that o ccurs in the database extent of courses e and any student enrolled in a course must b e a student that similarly o ccurs in the database W shall call such constraints extent constraints We might also exp ect an inverse relationship to hold b etween Taking and Enrolled Ob ject oriented databases dier in the ways they enable one to state and enforce extent constraints and inverse relationships Compare for example O and Ob jectStore The presence of such 2 constraints is imp ortant b oth for database and for query optimization Let us develop a more formal notation for describing such constraints To do this we b orrow an idea that has b een exploited in semistructured data mo dels eg of regarding semistructured data as an edgelab eled graph The database consists of two sets and we express this by a ro ot no de r with edges emanating from it that are lab eled either Students or Courses These connect to no des that resp ectively represent students and courses which have edges emanating from them that resp ectively describ e the structure of students and courses For example a student has a single Name edge connected to a string no de and multiple Taking edges connected to course no des See Figure for an example of such a graph this representation of data we can examine certain kinds of constraints Using Extent Constraints By taking edge lab els as binary predicates constraints of the form a and b ab ove can b e stated as c s S tudentsr s T ak ing s c C our sesr c s c C our sesr c E nr ol l edc s S tudentsr s These constraints are examples of word constraints studied in the implication problems for word constraints were shown to b e decidable there verse Constraints These are common in ob jectoriented databases With resp ect to our In studentcourse schema the inverse b etween Taking and Enrolled is expressed as s c S tudentsr s T ak ing s c E nr ol l edc s c s C our sesr c E nr ol l edc s T ak ing s c Such constraints cannot b e expressed as word constraints or even by the more general path constraints given in Lo cal Database Constraints In database integration it is sometimes desirable to make one database a comp onent of another database or to build a database of databases Supp ose for example we wanted to bring together a numb er of studentcourse databases as describ ed ab ove We might write something like class SchoolDBf DBidentifier string Studentssetstudent as defined above Courses setcourse as defined above g Schools setSchoolDB Now we may want certain constraints to hold on comp onents of this database For example the we refer to extent constraints describ ed ab ove now hold on each memb er of the Schools set Here a comp onent database such as a memb er of the set Schools as a loc al database and its constraints our graph representation by adding Schools edges from as local database constraints Extending the new ro ot no de to the ro ots of lo cal databases the lo cal extent constraints are d c S chool sr d s S tudentsd s T ak ing s c C our sesd c d s S chool sr d c C our sesd c E nr ol l edc s S tudentsd s Again these cannot b e stated as path constraints of These considerations give rise to the question whether there is a natural generalization of the constraints of which will capture these slightly more complicated forms Here we consider a class of path constraints of either the form y r x x y x y x or the form x y r x x y y x where x y x y x y represents a path from no de x to no de y This class of path constraints can b e used to express all the constraints we have so far en countered For semistructured data in particular this class of constraints is useful not only for on this sub ject optimizing navigational queries but also for inferring structure see Surprisingly the implication problems for this mild generalization of word constraints are unde cidable However certain restricted cases are decidable in semistructured databases and these t to express at least the constraints we have describ ed ab ove In this pap er we cases are sucien establish the undecidability of the implication problems for this class of path constraints We defer to another pap er a full treatment of the decidability results Related work The idea of representing data as an edgelab eled graph and using paths to sp ecify navigational queries dates back to the early s Recently the idea has b een exploited and adapted to a variety of new database applications ranging from querying ob jectoriented databases eg XSQL OQLdo c to querying semistructured data eg UnQL Lorel WebSQL STRUQL There has also b een work in constraint languages dened in terms of paths for structured data as well as for semistructured data A class of functional constraints called path functional dependencies was prop osed in The axiomatizability and decidability of its asso ciated unrestricted implication problem were established in This constraint language generalizes functional dep endencies in the relational data mo del for semantic and ob ject oriented data mo dels It diers signicantly from ours which is a generalization of unary inclusion dep endencies in the relational mo del for b oth structured and semistructured data In a class of constraints for sp ecifying range restrictions asso ciated with paths called specialization constraints was prop osed for ob jectoriented data mo dels The axiomatizability and decidability of its asso ciated implication problems were established in A sp ecialization constraint asserts a typ e condition which is often sp ecied by a schema The central dierence b etween sp ecialization constraints and our path constraints is that sp ecialization constraints are