Extending Data Processing Capabilities of Relational Database Management Systems

Extending Data Processing Capabilities of Relational Database Management Systems. Igor Wojnicki Cezary Z. Janikow University of Missouri { St. Louis University of Missouri { St. Louis Department of Mathematics Department of Mathematics and Computer Science and Computer Science 8001 Natural Bridge Road 8001 Natural Bridge Road St. Louis, MO 63121{4499 St. Louis, MO 63121{4499 phone: (314) 516-6353 phone: (314) 516-6352 fax: (314) 516-5400 fax: (314) 516-5400 e-mail: [email protected] e-mail: [email protected] Abstract Relational Database Management Sys- 1 The Relational Model tems proven to be robust and efficient for storing and retrieving data. However, they have limita- The relational model represents a database as tions. Sophisticated data processing (a rule-based a collection of relations [1, 2, 3]. Each relation processing with inferences) is possible but hard to is characterized by its name and attributes. A implement. To overcome these problems, several set of attribute values for a certain relation is methods have been developed. Data can be processed called a tuple. A value of the attribute is de- by a client application (a client-side processing) termined by the attribute domain (data type). or by a RDBMS server (a server-side processing). A domain is a set of atomic (indivisible) val- These methods overcome the main limitations, but ues. A relation schema is defined as a relation such solutions are not flexible, hard to modify or al- name (R) and a list of attributes(Ai): ter. This paper presents a novel approach, a case of R(A ;A ;:::;A ) server-side processing with the use of a declarative 1 2 n language. A program, written in Prolog, is decom- The domain is defined for each attribute posed into relations, and stored in the database. It dom(Ai). A relation state (or just relation) can be easily managed, modified or altered. It has of the relation schema R(A1;A2;:::;An), de- access to database relations, and it also has an abil- noted by r(R), is a set of tuples t1; : : : ; tm, ity to create relations on demand, so-called dynamic where each tuple is an ordered list of n values views or Jelly Views. Jelly Views serve as tempo- t =< v1; : : : ; vn >, and each value vi is an ele- rary relations holding data obtained as the result of ment of dom(Ai) or null (does not exist). The the program. The communication interface between values in tuple t are referred to as t[Ai], so by the client and the database is preserved, it is still the attribute's name. Relations represent facts SQL. about entities or facts about relationships between relations. A relational database schema is a set of relation schemas: S = R1;:::;Ro 2 RDBMS Boundaries Keywords: RDBMS, Prolog, deductive Each Relational Database Management Sys- database, inference tem uses two languages: Data Definition Lan- guage for defining a relational database schema The Acceptable Solutions problem is another and Data Manipulation Language for storing application where RDBMS can not be applied and retrieving data. directly. It can be also called a Reverse Ag- SQL has evolved as a high level, robust and gregation. Let's use the same Traveling Sales- compact DDL and DML query language. The man Problem. The question would be: find all power of SQL lies in its declarative nature. paths shorter then some value. Each path is a A query can be expressed in a logical form list of cities. As a result, the replay would be a with little care for procedural processing. It set of paths. Each path would be shorter than gives meaningful and highly abstractive way the given in the query value. for querying. But here is a limitation. SQL There are several applications where these offers an abstract interface allowing to build so- problems appear: phisticated queries, but more complicated data phone billing: this usually involves so- processing is out of reach. In particular, the • phisticated rule-based processing, cost of following disadvantages can be identified: connections is calculated basing on differ- lack of more sophisticated, i.e. rule-based ent variables: day time, number of active • data processing, phone numbers of the customer, calling plan, length of the call etc.; rules are sub- a query can not be recursive, ject to change, • searching for acceptable solutions is lim- flight planes, trip planning (the Traveling • ited. • Salesman Problem in general), Rule-based processing requires rules, usu- budget planning, alternative flight/trip • ally the a form of decision trees. Decisions routes. are based on information acquired from a These problems can be tackled using either database. At each node of such a tree, the client-side processing, or server-side processing database is queried and the replay is compared with functions. with appropriate values (or set of values) in the Client-side processing is based on serializa- node. Depending on the comparison, the next tion of queries and taking actions accordingly node of the tree is taken and the process contin- to partial replies. The process is as follows: the ues. Rule-based processing is a core part of the client sends a query, the RDBMS replies, de- most Expert and DDB (Deductive Database) pending on the reply the client sends another systems [4, 5, 2]. query. The logic is usually hard-coded into the Recursive queries, in general, regard search- client. There are a few drawbacks of such a ing for data which is stored in some tree-like or solution: graph-like way. The Traveling Salesman Prob- lem can be an example. A relation holds infor- the inference process involves many mation about distances between neighboring • queries, cities, there is no data redundancy. A relaxed queries are not optimized by the RDBMS question is: find the shortest path between two • given cities. If the question was asked with (series of simple queries and travel known number of cities on the path, it would through a decission tree instead of few be issued as an ordinary query. However, usu- more complex queries), ally the number of cities is not known in ad- with the decision tree hard-coded into the vance. SQL3 (known also as SQL99) standard • client, it is hard to modify the logic, allows recursive queries. However, this feature is very rarely implemented (it is implemented only dedicated client software is able to in DB2 [6]). • conduct the inference process, and thus the principle of a database as a universal where, < a1; : : : ; an > is a tuple (ordered list of source of information fails. values). There is an analogy between a relation and a clause. (in terms of the Relational Model Such a solution is implemented in expert sys- see Sec. 1) Both of them describe facts using tems, with an external RDBMS data source. tuples. In the case of the Relational Model, a Server-side processing seems to be more ro- value in the tuple can be referenced using ap- bust, but there is a need for support from propriate attribute name or its position, while RDBMS in the form of additional languages, for the Logic Programming only position can usually procedural ones. Using such a lan- be used. Concluding, any Relational Model tu- guage, a function can be written. A user query ple can be represented as a simple clause, so can use such a function as a data-source. Com- compatibility between the relational databases paring to client-side processing, the number of and logic programs has been stated. queries is reduced, a function is stored and pro- Furthermore logic programs can be con- cessed by the RDBMS so it is optimized, and structed with so-called complex clauses. Such the database remains an universal source of in- a clause doesn't express facts explicitly, but formation. Still, there are a few drawbacks: defines relationships among them (intensional knowledge), with a conclusion. the function's language is RDBMS spe- • cific, can be used only for dedicated RDBMS, con(C1;:::;Cn) c1(Ci :::;Cj) 1 ::: 4 ::: r cm(Ck;:::;Cl): logic, i.e. decission tree, is hard-coded 4 • within the function code, inflexible, hard to modify, Here, con is the concluding clause with arity the input and output for the function have n; are logical operators joining predicates • 4 to be known when the function is defined. cq (preconditions). Predicates such as cq are matched against their clauses. If such a match is found, appropriate facts are set and the con- 3 Relational Model vs Logic cluding tuple is passed as a new fact. As a result, a non-deterministic conclusion can be In general, knowledge processing involves two drawn; a single complex clause can generate kinds of data: extensional and intensional. multiple conclusions, and new facts. Function- Extensional knowledge is just bare facts, ality of complex clauses is not supported by the while intensional defines relationships among Relational Model. facts. Logic can be used to express extensional Logic Programs provide a uniform language and intensional knowledge as well [7]. To for- for representation of databases and even more, malize syntax for writing logical sentences, the enabling additional expressive power in the alphabet has to be stated. Data (or knowl- form of intensional knowledge. edge) is stored as predicates. A predicate p is defined as a symbol denoting appropriate relation among n number of arguments: 4 Logic Programs in RDBMS p=n The main proposed improvement to RDMBS is adding capabilities to store and process inten- where n is called its arity. A fact compound sional knowledge, along with the extensional of n values, concerning predicate p is named a one. The inference engine will be coupled simple clause: with the data source, and available to a client through standard SQL queries [8].

Load more