<<

Extending Data Processing Capabilities of Relational Management Systems.

Igor Wojnicki Cezary Z. Janikow University of Missouri – St. Louis University of Missouri – St. Louis Department of Mathematics Department of Mathematics and Computer Science and Computer Science 8001 Natural Bridge Road 8001 Natural Bridge Road St. Louis, MO 63121–4499 St. Louis, MO 63121–4499 phone: (314) 516-6353 phone: (314) 516-6352 fax: (314) 516-5400 fax: (314) 516-5400 e-mail: [email protected] e-mail: [email protected]

Abstract Management Sys- 1 The tems proven to be robust and efficient for storing and retrieving data. However, they have limita- The relational model represents a database as tions. Sophisticated data processing (a rule-based a collection of relations [1, 2, 3]. Each processing with inferences) is possible but hard to is characterized by its name and attributes. A implement. To overcome these problems, several of attribute values for a certain relation is methods have been developed. Data can be processed called a . A value of the attribute is de- by a client application (a client-side processing) termined by the attribute domain (). or by a RDBMS server (a server-side processing). A domain is a set of atomic (indivisible) val- These methods overcome the main limitations, but ues. A relation schema is defined as a relation such solutions are not flexible, hard to modify or al- name (R) and a list of attributes(Ai): ter. This paper presents a novel approach, a case of R(A ,A ,...,A ) server-side processing with the use of a declarative 1 2 n language. A program, written in Prolog, is decom- The domain is defined for each attribute posed into relations, and stored in the database. It dom(Ai). A relation state (or just relation) can be easily managed, modified or altered. It has of the relation schema R(A1,A2,...,An), de- access to database relations, and it also has an abil- noted by r(R), is a set of t1, . . . , tm, ity to create relations on demand, so-called dynamic where each tuple is an ordered list of n values views or Jelly Views. Jelly Views serve as tempo- t =< v1, . . . , vn >, and each value vi is an ele- rary relations holding data obtained as the result of ment of dom(Ai) or (does not exist). The the program. The communication interface between values in tuple t are referred to as t[Ai], so by the client and the database is preserved, it is still the attribute’s name. Relations represent facts SQL. about entities or facts about relationships be- tween relations. A relational is a set of relation schemas: S = R1,...,Ro

2 RDBMS Boundaries

Keywords: RDBMS, Prolog, deductive Each Relational Database Management Sys- database, inference tem uses two languages: Data Definition Lan- guage for defining a relational database schema The Acceptable Solutions problem is another and Data Manipulation Language for storing application where RDBMS can not be applied and retrieving data. directly. It can be also called a Reverse Ag- SQL has evolved as a high level, robust and gregation. Let’s use the same Traveling Sales- compact DDL and DML . The man Problem. The question would be: find all power of SQL lies in its declarative nature. paths shorter then some value. Each path is a A query can be expressed in a logical form list of cities. As a result, the replay would be a with little care for procedural processing. It set of paths. Each path would be shorter than gives meaningful and highly abstractive way the given in the query value. for querying. But here is a limitation. SQL There are several applications where these offers an abstract interface allowing to build so- problems appear: phisticated queries, but more complicated data phone billing: this usually involves so- processing is out of reach. In particular, the • phisticated rule-based processing, cost of following disadvantages can be identified: connections is calculated basing on differ- lack of more sophisticated, i.e. rule-based ent variables: day time, number of active • data processing, phone numbers of the customer, calling plan, length of the call etc.; rules are sub- a query can not be recursive, ject to change, • searching for acceptable solutions is lim- flight planes, trip planning (the Traveling • ited. • Salesman Problem in general),

Rule-based processing requires rules, usu- budget planning, alternative flight/trip • ally the a form of decision trees. Decisions routes. are based on acquired from a These problems can be tackled using either database. At each node of such a tree, the client-side processing, or server-side processing database is queried and the replay is compared with functions. with appropriate values (or set of values) in the Client-side processing is based on serializa- node. Depending on the comparison, the next tion of queries and taking actions accordingly node of the tree is taken and the process contin- to partial replies. The process is as follows: the ues. Rule-based processing is a core part of the client sends a query, the RDBMS replies, de- most Expert and DDB () pending on the reply the client sends another systems [4, 5, 2]. query. The logic is usually hard-coded into the Recursive queries, in general, regard search- client. There are a few drawbacks of such a ing for data which is stored in some tree-like or solution: graph-like way. The Traveling Salesman Prob- lem can be an example. A relation holds infor- the inference process involves many mation about distances between neighboring • queries, cities, there is no data redundancy. A relaxed queries are not optimized by the RDBMS question is: find the shortest path between two • given cities. If the question was asked with (series of simple queries and travel known number of cities on the path, it would through a decission tree instead of few be issued as an ordinary query. However, usu- more complex queries), ally the number of cities is not known in ad- with the decision tree hard-coded into the vance. SQL3 (known also as SQL99) standard • client, it is hard to modify the logic, allows recursive queries. However, this feature is very rarely implemented (it is implemented only dedicated client software is able to in DB2 [6]). • conduct the inference process, and thus the principle of a database as a universal where, < a1, . . . , an > is a tuple (ordered list of source of information fails. values). There is an analogy between a relation and a clause. (in terms of the Relational Model Such a solution is implemented in expert sys- see Sec. 1) Both of them describe facts using tems, with an external RDBMS data source. tuples. In the case of the Relational Model, a Server-side processing seems to be more ro- value in the tuple can be referenced using ap- bust, but there is a need for support from propriate attribute name or its position, while RDBMS in the form of additional languages, for the Logic Programming only position can usually procedural ones. Using such a lan- be used. Concluding, any Relational Model tu- guage, a function can be written. A user query ple can be represented as a simple clause, so can use such a function as a data-source. Com- compatibility between the relational paring to client-side processing, the number of and logic programs has been stated. queries is reduced, a function is stored and pro- Furthermore logic programs can be con- cessed by the RDBMS so it is optimized, and structed with so-called complex clauses. Such the database remains an universal source of in- a clause doesn’t express facts explicitly, but formation. Still, there are a few drawbacks: defines relationships among them (intensional knowledge), with a conclusion. the function’s language is RDBMS spe- • cific, can be used only for dedicated

RDBMS, con(C1,...,Cn) c1(Ci ...,Cj) 1 ... ← 4 ... r cm(Ck,...,Cl). logic, i.e. decission tree, is hard-coded 4 • within the function code, inflexible, hard to modify, Here, con is the concluding clause with arity the input and output for the function have n; are logical operators joining predicates • 4 to be known when the function is defined. cq (preconditions). Predicates such as cq are matched against their clauses. If such a match is found, appropriate facts are set and the con- 3 Relational Model vs Logic cluding tuple is passed as a new fact. As a result, a non-deterministic conclusion can be In general, knowledge processing involves two drawn; a single complex clause can generate kinds of data: extensional and intensional. multiple conclusions, and new facts. Function- Extensional knowledge is just bare facts, ality of complex clauses is not supported by the while intensional defines relationships among Relational Model. facts. Logic can be used to express extensional Logic Programs provide a uniform language and intensional knowledge as well [7]. To for- for representation of databases and even more, malize syntax for writing logical sentences, the enabling additional expressive power in the alphabet has to be stated. Data (or knowl- form of intensional knowledge. edge) is stored as predicates. A predicate p is defined as a symbol denoting appropriate rela- tion among n number of arguments: 4 Logic Programs in RDBMS

p/n The main proposed improvement to RDMBS is adding capabilities to store and process inten- where n is called its arity. A fact compound sional knowledge, along with the extensional of n values, concerning predicate p is named a one. The inference engine will be coupled simple clause: with the data source, and available to a client through standard SQL queries [8]. This ap- p(a1, . . . , an) proach addresses the major disadvantages of current RDBMS (see Sec. 2): limited rule- Let’s define a behavior of the system. All based processing, recursive queries, searching queries requiring a Jelly View have to be pro- for acceptable solutions. cessed. Once such a query appears (detected To keep the architecture flexible and easy to by External Matching routines), the Logic Pro- modify, the following assumptions are taken: gram should be built. All extensional data is available to the Logic Program through the In- logic programs are used to construct a • ternal Matching mechanism. Then the infer- called a Jelly View, ence engine generates the necessary Jelly View. the view is created on demand, Upon the completion of the query, all dynamic • data (the Jelly View) is assumed to be no the query language is preserved, the view longer valid and thus it is removed. If a query • is transparent – not different from other doesn’t refer to Jelly View, it is passed on to RDBMS views. the RDBMS, and the response is passed to the user. In other words, the system in transparent As a result, RDBMS has its functionality ex- to the user. tended toward those of deductive databases (DDB). The chosen syntax is that of the Prolog lan- 5 Decomposition guage. This provides all necessary mechanisms for declaring intensional knowledge, and it also Decomposing the three main components of provides some features of procedural languages a Jelly View: External Matching, Internal that allow to control the inference process. In Matching and Logic Program, is a crucial prob- turn, this allows to create logic programs and lem. They must comply with the relational optimize them. A Prolog language program model. One possible approach is illustrated in is similar to pure logical programs, with some the ER diagram in Fig. 1. technical differences [9, 7] (see Sec. 3). To ac- External Matching matches a relation name complish the coupling of the Prolog program and a clause name. If a relation name is found with RDBMS, that program must be repre- in -clause.table, a Jelly View for this sented in a way complying with the Relational relation is generated. The Jelly View has at- Model, including the normal forms. There are tributes defined by the entity argument. To following challenges to face: generate the view, the clause name is speci- Internal Matching — matching between fied (table-clause.clause) and subsequently • relations and clauses, enabling access to used as goal for the inference engine. The ar- data (extensional knowledge) from the ity of the goal predicate is calculated from the logic program, number of arguments (the relation argument). Internal Matching is a bridge between the External Matching — stating the goal for inference engine and the RDBMS, it handles • the prolog program to generate a Jelly extensional knowledge. A predicate can be View, mapped on a relation. If the inference en- gine requests a simple clause dealing with the Logic Program itself, which should be de- predicate named clause-table.name of ar- • composed to meet the normal form re- ity clause-table.arity, the clause is gen- quirements. erated from the data stored in the relation These challenges can be met by the three clause-table.table. This is the way exten- main parts of a Prolog program: extensional sional knowledge stored in the RDBMS is ac- knowledge (Internal Matching), intensional cessed by the inference engine. knowledge (Logic Program) and goal (External The goal for the logic program has to be Matching). specified. It is held by the clause relation. EXT Matching INT Matching

clause position arity argument clause table table 1 N has type name N M table-clause has clause-table

clause order name order N M consists_of argument 1 N has clause name

1 1 N has preconditioned order 1 logical operator symbol

Program

Figure 1: External Matching, Internal Matching and Logic Program Entity Relationship Diagram.

Each logic program consists of certain num- mains and situations. Combining speed and ef- ber (one or more) of complex clauses. Each ficiency of a RDBMS with a server-side declar- table-clause is an ordered list of complex ative programming gives a powerful, robust, clauses (the relation clause). A single clause and flexible environment, not only for data has a number of named arguments. This clause storage and retrieval, but also for sophisticated is either a concluding clause or a precondition processing such as drawing conclusions and dis- predicate. These are not differentiated because covering new knowledge. A client application they are represented by the same relation. As issues ordinary SQL queries, all the processing in Prolog, such a construct always has a logi- is done on the server side. Intensional knowl- cal operator on the right (the relation logical edge (rules) is stored within RDBMS as a de- operator). The concluding clauses and pre- composed Prolog program. This architecture condition predicates are not distinct in terms of makes the system flexible and transparent to decomposition., preconditioned relationship the client. Any rule can be altered individually, is used to assign preconditions to the conclud- with changes applied only to appropriate rela- ing clause. As shown, any complex clause de- tions, using ordinary SQL queries. This can be fined in Sec. 3 can be decomposed into such done without altering any client application. entities. As a result, RDBMS can have DDB function- ality, preserving communication methods and the Relational Model. 6 Conclusions This approach can have applications in many fields of Computer Science, including: Extending knowledge processing capabilities of RDBMS makes them applicable to new do- Data Mining and Data Discovery, • rule systems, e.g., telephone billing sys- [9] Michael A. Covington, Donald Nute, and • tems, tax systems, Andr´e Vellino. Prolog programming in depth. Prentice-Hall, 1997. Expert Systems, including those self- • teaching (RDBMS could now posses Ex- pert System capabilities),

Artificial Intelligence, Knowledge Discov- • ery tools.

References

[1] E. F. Codd. A relational model of data for large shared data banks. CACM, 13(6):377–387, 1970.

[2] Ramez Elmasri and Shamkant B. Navathe. Fundamentals of Database Systems. Addi- son Wesley, 2000.

[3] Jeffrey D. Ullman and Jennifer Widom. A first course in Database systems. Prentice- Hall Inc., 1997.

[4] Marcia A. Derr, Shinichi Morishita, and Geoffrey Phipps. The glue-nail deductive database system: Design, implementation, and evaluation. VLDB Journal, 3(2):123– 160, 1994.

[5] Raghu Ramakrishnan, Divesh Srivastava, S. Sudarshan, and Praveen Seshadri. The CORAL deductive system. VLDB Jour- nal: Very Large Data Bases, 3(2):161–210, 1994.

[6] Srini Venigalla Netsetgo. Expanding re- cursive opportunities with udfs in db2 v7.2. Technical report, International Busi- ness Machines Corporation, 2002.

[7] Ulf Nilsson and Jan Małuszyński. Logic, Programming and Prolog. John Wiley & Sons, 1990.

[8] Igor Wojnicki and Antoni Ligęza. An in- ference engine for rdbms. In 6th Interna- tional Conference on Soft Computing and Distributed Processing, Rzeszów, Poland, 2002.