Semantic Modeling of Object Oriented

Mokrane Bouzeghoub, Elisabeth Mttais

Laboratoire MAX, Universite P. et M. Curie - CNRS, Centre de Versailles 45, avenue des Etats-Unis 78000 Versailles, France Email: [email protected]

Abstract: can easily be specified. Class hierarchies and inheritance are generally defined in the same way. The dynamic aspect can be This paper describes a design methodology for fulfilled by introducing the concept of behavior in the an object oriented , based on a semantic semantic data models. In some sense, this was already done in network. This approach is based on the the AI domain by the concept of script which has been assumption that Yemantic data models are more developped ‘to enhance the expressive power of semantic powerful and more easy to use than current networks and frames. We follow the same approach and proposed object oriented data models. They are describe the behavior of a semantic by means of especially more poweful in representing production rules. This kind of a declarative language permits integrity constraints and various relationships, to avoid the complexily of procedural languages which arc Object oriented data models are generally based gcncrally used in objccl oriented data models. The behavior of only on class hierarchies and inheritance, plus each object in the will be described by one or several rules expressing either integrity constraints or their ability to represent the behavior of objects, any management rules concerning objects. But this latter capability is generally provided through an algorithmic language which cannot This paper highlights on one hand the object-oriented be considered as a conceptual language. This databasedesign methodology we have developped, and on the paper describes a design procedure which olher hand the design tool which supports this methodology. generates an object oriented This methodology is baseci on two design levels: a Semantic (both the structural aspect and the dynamic qbicct oriented level and an eperatio&lect oriented 1e el (Figure 1). The process of interactive acquisiG& aspect) from an abstract specification giveri in a completeness and consistency checkings of the behavioural high level language. This specification language rules is particularly emphasized in the first level. At the is built upon a semantic network and allows to second level, we use as an operational object oriented data define integrity constraints and behavior rules. model, called 02 model, which was developed by Alta’ir This approach is presented through a CASE project [ Bancilhon 871. Then a mapping process between the tool environment. two models is proposed. Besides the mappings, the transformation of a semantic object oriented schema into 1. Introduction an operational object oriented schema consists, among others, in the generation of procedural methods (C written) from a Like relational databases, the design of an object oriented declarative language specification (production rules). database is a complex art which needs many expertise in the domain. The simultaneous modeling of the structural aspect 1 CONCEPTUAL LEVEL! and the behaviouralaspect of objectsincreases the complexity of the design. The current object oriented data models are mainly defmed by a few basic constructors (like the tuple constructor and the set constructor) and a taxonomy of objects + productlon rules (i.e. hierarchy of classes and inheritance). The power of object oriented data models is highlighted by their ability to describe the dynamic behavior of the objects (methods). However, as generally proposed in the object oriented database systems, this dynamic description is made in a procedural language; this fact makes the specification of the methods too difficult at the conceptual level. Another weakness of current object oriented data models is that, except through methods, they do not easily permit to specify integrity constraints on the objects. Except for the dynamic aspect, the expressive power of semantic data models is stronger than that of object-oriented data models. Various relationships and integrity constraints 0 level v

Barcelona, September, 1991 Proceedings df the 17th International 3 Conference on Very Large Data Bases The second section of this paper will give an overview of o(Y,X), i(Y,X), s(Y,X)). To simplify the graphical the two data models used. Section 3 describes the general representation, we use only one specification which mapping process from the semantic network to the object subsumes the other (for example p, o, c and g) except if oriented model 02. This mapping process is only concerned constraint specification is needed for each specific arc. with the structural aspect and with some static integrity constraints. Section 4 describes the mapping of general integrity constraints (Fist order formulas) and behavioural rules (production rules) to the operational level expressed in C language. Roblems of method definition and attachment are also addressed in this section. Section 5 gives a general flavour of the modelling prototype which was designed. Section 6 concludes on the obtained results and the remaining problems to solve. 2. The Hierarchy of Data Models Used This section gives a general overview of the two data models considered in the methodology. The first one is a generic semantic network called Morse. The second one is an object oriented data model called 02.

2.1. The Semantic Whght A semantic network is an oriented diagram where the nodes represent real world objects and the arcs represent semantic The inheritance is one of the interesting properties of relationships between these objects. In addition, constraints generalization hierarchies; each atomic or molecular can be defined over these nodes and arcs. In the following, component of an object X can be transfered by inheritance to such a semantic network data model is designated by the name objects X1 . ..Xn. if these latters are sub-classes of X. Morse [Bouzeghoub 841. This model is formalized in such a Inversely, each instance of a sub-class is an instance of its way it can represent the most important concepts used in super-classes. We say that components of objects propagate semantic data models. The objective of this formal definition toward the leaves of the hierarchy whereas the instances is to provide a general framework whithin which a CASE tool propagate toward the root(s) of the hierarchy. could be specified and programmed. The external vision of the model could be any desired diagram (e.g. Entity-Relationship) Different integrity constraints can be specified in a Morse or abstract syntaxe. semantic network to enhance its capability to capture more meaning from the real world. Among these constraints, we An object is a generic term to designate the different real can mention domains, cardinalities, functional dependencies, world individuals refered to in Morse schemas.We distinguish keys, intersection and disjunction of classes, etc. In the four categories of objects: in one hand instances of atomic semantic network, some of these constraints are defined over objects (IA) and instances of molecular objects (IM), in the nodes, others are defined over arcs. The constraints are other hand classes of atomic objects (NA) and classes of specified either as a complementary information of binary arcs molecular objects (NM). Then, in the following, we use the or as new predicates. For example, cardinal@ constraints are term object in a generic way, and whenever necessary, we use expressed as complementary information over a/p arcs and r/o the more specific term. arcs, while other constraints like functional dependencies are The distinction between atomic objects and molecular representedby a new fd arc: objects permits to highlight their structural links for a better specification of the corresponding constraints. Atomic objects a(Number,VEHICLE, [l,l] ), have values taken from basic domain such as: integer, real, p(VEHICLE,Number, [l,ll), boolean and string. The set of all atomic values in all domains are refered to by the name VA. Molecular objects a(Type,VEHICLE, [l,N]), have molecular values which are composed from the p(VEHICLE, Type, [I,11 ), corresponding atomic objects which constitute the molecular object. Both atomic objects and molecular objects have unique a(Power,VEHICLE, [O,N]), identifiers which are independant of their values. p(VEHICLE,Power, [l,l]), Semantic links are basic binary relationships between the r(VEHICLE,CONTRACT,[l,l)), different categories of objects mentioned above. These binary o(CONTRACT,VEHICLE,[l,N]), relationships formalize the well-known concepts of aggregation and generalization [Smith 771. Specific r(PERSON,CONTRACT,[l,N]), refinements of these concepts are introduced to take into o(CONTRACT,PERSON,[l,l]). account the distinction between atomic objects and molecular objects. The aggregation concept is refined as aromic fd(VEHICLE,lhs(Type),rhs(Power)) aggregation (arc a(X,Y)) and molecular aggregation (arc r(X,Y)). Generalization is refiied as instance generalisation Graphically, a given semantic network can be represented as (arc c(X,Y)) and class generalizafion (arc g(X,l’)). Each portrayed in figure 2. binary relationship has its reverse link (respectively p(Y,X),

Proceedings of the 17th International 4 Barcelona, September, 1991 Conference on Very Large Data Bases 2.2. The ObJect Orlented Data Model type tuple(name:string,age:integer, address:string,children:set(Person)) The 02 data model belongs to the category of the so-called add cla88 Agent inharite Person object oriented data models [Lecluse 871. Then, its basic type tupl*(code:string,salary:integer) concepts are objects and types, type constructors and type hierarchies. The data manipulation language could be the C /* method declaration */ language with embedded 02 expressions (called C02) [Haux method category: 8tring i8 public 881or an SQL like declarative query language (called LOOQ). body category:string in cla88 PgentC02 In the 02 data model, an object is composed of an /* method procedure */ identifier (the name of the object) and a value. Values could I if (self->salary > 50) be either: (i) atomic values (integers, reals, booleans, return (“VIP”) ; strings), for example: (iI,22), (i2,3.14); (ii) tuple values, for 1 example (i3, [name:“John”,age:22]); and (iii) set values, for The keyword inherits defines a hierarchy of types. The example (i4,(red, black, green)). Objects can be defined by keywords public makes the object-integrity method visible construction using list, tuple and set constructors. Objects can from anywhere. The keyword In class CO2 defines the mutually reference each other. For example: class for which this body is defined; this is useful to solve ambiguities of names, as method bodies can be specified (i7, [name:“John”,wife:i8]), (i8, [name:“Mary”,husband:i7]). independently of the class description. The brackets {) A class is an abstraction which represents a set of objects delimite the C source statementsof the procedure. with their behavior. A class is composed of two parts: (i) a The definition of a database schema in CO2 needs the type which contains the structure that characterizes all the knowledge of the objects structure, the status of objects, i.e. instances of the class, (ii) methods which contain operations identified object or non identified object (value), and the which will be applied to these instances. A class may have a sharing of the objects. basic type (integer, real, boolean, string) representing atomic objects, a tuple type representing objects with tuple values or 3. Mapping from the Semantic Level to the a set type representing objects having set values. The Operational Level following expressions are examples of classes : Person=name:string,age:integer], Employees={ p:Person). The The CO2 model describes both the static aspect (data constructors could be composed to create more elaborated structures) and the dynamic aspect (methods). Relationships types (e.g. sets of tuples or tuples of sets). between objects or object classes are not represented by a specific concept; but they are represented by a uniform way The 02 data model makes a clear distinction between based on objects composition and objects sharing. As in the identified objects and non identified objects. The formers can , references are the unique way to represent be stored and manipulated independently, while the lattcrs relationships between objects. Integrity constraints are not exist only as property values of other objects. For example, considered as specific concepts of the model; they are defined in the following specifications , persons and vehicles could be in a uniform way as any procedure describing the behavior of manipulated independently: Person= [name:sning, age:integer, an object. The object identity allows to make a clear vehicle:Vehicle], Vehicle= [number:integer, color:string] But distinction between objects having their own existence, and in the following example, the object vehicle exists only as a values which are only relevant when characterizing other composite attribute value of person: objects. The object identity is represented in CO2 by different Person=[name:string,age:integer, syntactic forms. vehicle:(number5nteger,color:string]] The semantic data model Morse is concerned only with the A partial order between types defines a hierarchy of types static aspect. The different categories of aggregation arcs within which the inheritance concept permits to transfer allow to specify different types of relationships between components from one type toward its subtypes [Lecluse 881. objects. Integrity constraints are represented as declarative assertions on the data structure. A method is a procedure which is associated to a type in order to describe the behavior of the instances of this type. In the following we are only interested in the mapping Methods introduce the notion of encapsulation which from Morse to CO2 and not in the reverse mapping. First we permits the manipulation of objects without any knowledge consider the structural mappings between the two models, about their structure, nor about the internal code of the then we study the representation of constraints by methods, procedurescorresponding to these methods. and we finally describe the general mapping process. This plan is made only for the soundness of the paper; in fact The CO2 language is an embedded databaselanguage (02) structural mapping rules often depend on the integrity into a procedural host language (C) [Haux 881. Besides the constraints [Bouzeghoub 901. . usual programming of algorithms, it permits to specify and access database objects. Objects are manipulated through 3.1. The mapping between objects methods, A method is characterized by its signature (its name, its type and the type of its parameters) and its body An atomic object defined in Morse is equivalent in 02 either (procedure). The following example shows the declaration of to an identified atomic object or to an atomic value (non types and the programming of methods in CO2 (version identified object) inside another object. A molecular object 1.0) : defined in Morse is equivalent to either an identified tuplestructured object or to a tuple value in 02. A class of /* type declaration */ objects defined in Morse is partly equivalent to a class of add cla88 Person objects defined in 02. Indeed, and as stated before, Morse Barcelona. September, 1991 Proceedings of the 17th International 5 Conference on Very Large Data Bases classes describe only the static aspect of the objects, while 02 by basic types in 02 (integer, real, boolean, string), all the classes describe their behavior too, thanks to methods, The other Morse integrity constraints are represented by methods fist part (a) of Figure 3 summerizes the correspondence in the 02 model. In the following, we illustrate this latter between the Morse objects and the 02 objects. case with cardinalities and functional dependencies. Methods which implement integrity constraints are particular in the 3.2. The mapping between constructors sense they are not directly invoked by the users but by other methods which guarantee the encapsulation of the concerned Both atomic and molecular aggregations defined in Morse are object. The part (c) of Figure 3 summerizes the different equivalent to the tuple constructor of the 02 model. More mappings between the Morse constraints and the 02 concepts. precisely, we have to include what is considered as domain constraints in Morse to obtain what is considered as attribute basic type in 02. For example, the following Morse MORSE CONCEPT 02 CONCEPT 1 specification: atomic object atomic objectl atomic value dom (Name, string) p(PERSON, Name) molecular object atn~atmxl objecthuple value p (PERSON, Age) dom (Age, integer) o (PERSON, Address) (a) class dass dam (Number, integer) p (Address, Number) subclass subclass p (Address, Street) dom (Street, string) p (Address, Postcod) dom (Postcod, integer) instance object will be mapped into 02 as for example: obiea identifier object identifier atomic aggregation tuple constructor Person= [Name:string,Age:ii&eqer,Pbdr:Pbdress] I I ~~~ ~~ I Address=[Nunber:int,Street:string,Postccd:int] (b) molecular aggregation tuple consuuctor which can be described in CO2 by the following statements: Classgeneralization Inheritancf! add claam Person typr tuplr (Name:atring, Age: intager, Addr:Address) add cl888 Address (cl type tupb (Number: int, Street : 8tring, Postcod:integer)) GeneralInkgrity m&cd Constraint I If we consider that all of Name and Age are values of the I Person (thus they are not identified), but the Address is an a: 0 esnondence between Morse and 02 CQB.QQQ object by itself (thus it is identified). Addr is called a reference; it is considered as an attribute of Person which Among various constraint we can specify over the referencesanother object, i.e. Address. semantic network, we consider here the mapping of cardinalities. Formally, cardinalities characterize binary The classification/iitanciation defined in Morse is partly relationships (a/p and r/o arcs) by specifiing the frequence of equivalent to an 02 class. In fact the Morse abstraction can object participation in a given binary relationship. More define a class only by extension, without necessarily precisely, a cardinality is a couple of values [m,n] which describing its structure. The generalization/specialization is respectively specify the minimum and the maximum number equivalent to the inheritance hierarchy in 02. In Morse, a of a given relationship instances to which the same object given class can be defined by generalization from other classes could participate. Cardinalities where n=l are called even the structures of these latters are unknown. Inversely, a monovalued cardinalities and those where n>l are called Morse subclassecan be defined as a restriction of a superclass, multivalued cardinalities. In the following, we study the but without any refinements on its structure. This makes the methods which will implement these constraints. As we generalization/specialization more general than a partial order have several situations, we will only focus on two examples. of types which is defined in 02. Case 1: p(X,Y,[l,l]) : which specifies that for a given The inheritance is defined in Morse as a logical property instance of X, there is only one instance of Y. For example: which propagates components and constraints of generic p(PERSON, Name, [l,l] 1 Dom (Name, string) classes to their subclasses. In the 02 model, there is a p(PERSON, Age, [l, 11) Dom (Age, integer) uniform formalization of hierarchies of types and inheritance (partial order of types). The part (b) of Figure 3 summerizes will be implemented into 02 as: the different mappings between the Morse constructors and the add cla88 PERSON 02 constructors. type tuple (Name: string,Age: integer) method Nulle-value:boolean 3.3. The mapping of the constraints body Nulle-value:boolean in cla88 PERSON CO2 Semantic integrity constraints are useful for many reasons: (if ((! (self->Name==(o2 string) NULL) 1 (i) to check the consistency of the object structure and values, &h (! (self->Age==(o2 int) NULL) )) (ii) and possibly to assist in the decision process which (return (true);) determines whether a Morse object coincides or not with an else return (false); 02 object. Except for the usual domains which are represented 1 Barcelona.September. 1991 Proceedings6f the 17thInternational 6 Conferenceon Very Large Data Bases PERSON=[Name:string, Age:integer, which specifies that for a Case 2: o(X,Y,[l,N]): Address:([Number:integer, given instance of X, there is N instances of Y. For example: Street:string,Postcod:integer, Town:string])] oP=, -ss, LNI 1 All other components are considered as values characterizing a P(ass, *, (Lll) Cm(Nwber, integer) person. Solution 2: One 02 object ADDRESS corresponding to Dan(Street, string) pt-ss, street, [l, 11) the whole Morse structure: p (Address, Dun (Postcod, string) Postccd, (1, 11) ADDRESS:[Number:integer,Street:string, Postcod: integer, Town: string, Dan(Town, string) P(-% Town, (Lll) Person : ( [Name : string, will be mapped into 02 as: Age: integer] 1 I In this case, persons do not have any existence, they are just 8dd Cl888 PERSON characterizing addresses. type tupl~(Mdr:~dzof(Address)) umthod Pcmded..set(min:integer, Solution 3: Two 02 objects corresponding to the two max:integer) :boolmn Morse molecular objects: 8dd Cl8U Pss type tuple (Nu&er: integer, Street : string, PERSON- [Name: string, Age: integer, Postcod: string, Town : string) ) Addr : (ADDRESSE) 1 body Bounded_set(min:integ,max:integ) :bool ADDRESSE=[Name:integer, Street:string, in ~1~88 PERSCNCO2 Postcod: integer, Town:string, Pers : (PERSON} ] l 02 set (Adress) x; x = (self->Addr); In this case there is a mutual reference between the two if ((min =c count (xl) && (count(x) ti max)) objects. A person references its set of addresses and an address (return (true); } else return (false); references its set of persons. 1 There are many other solutions where we can consider that 3.4. The obJect Identity and the object sharing towns or telephones are independant objects. To decide between all these solutions, a computer design tool can help In the Morse semantic data model, everything is considered as in the decision process by taking into account several an object. Each object has a unique identification, then heuristics derived, for example, from the following objects can be shared between different other related objects. parameters: In the 02 object oriented data model, there are objects and values; objects are sharable while values are not. So, when - Users’operations and general constraints defined on the mapping a Morse schema into an 02 schema, we have to Morse objects: basic operations like insert, delete and decide whether a Morse object can be considered as an 02 update, can be considered as the main means to identify object or as an 02 value. This decision mainly depends on objects. We shall see in the next section how these the user’s desire in the way to manipulate his database. He operations are detined in the Morse semantic data model. could arbitrarily decide whether a given Morse object is an 02 - Cardinality constraints defined over arcs a/p and r/o of object or value. For example, for the mapping of the the semantic network: if the minimal cardinality of one of following Morse schema (figure 4), he can envision many these arcs is equal 0, then the origine object of the arc can solutions: exist independently of the related one. 4. Extending the Semantic Data Model to (PERSON hut Represent General Constraints This section describes the generalized integrity constraints. Many previous works have been done to manage integrity constraints [Brodie 811 (Brodie 841 [Brodie 861 [ Bertino 841 [Tsalgatidou 901 Different formalisms have already been proposed to specify integrity constraints in the , for example [Morgenstren 891 allows their specification through constraints equations and [Oliv6 891 uses a Deductive . In our approach, general integrity constraints are first order logic formula whose variables refer to the content of the semantic database. Before Number S&t P&cod Town presenting these constraints, let us give a formal . : OtJJ&$ representation of a semantic database as well as for its conceptual schema and for its extension. This representation Solution 1: One 02 object PERSON describing the whole is not intended to implement real databases but just to give a Morse structure: formal abstract representation in order to correctly specify integrity constraints.

Barcelona, September, 1991 Proceedings df the 17th International 7 Conference on Very Large Data Bases 4.1. The representation of a semantic database representation for a formal reasoning. It can be considered as an abstract representation of the content of a given database. A Morse databaseschema is composed of: This representation permits a better understanding of the - the list of names of all classes of atomic objects (i.e. constraint specifications, and provides a convenient framework instances of NA), for a CASE tool.

- the list of names of all classes of molecular objects (i.e. 4.2. The Internal representation of general instances of NM), lntegrlty constraints - for each atomic object, its domain values (basic type), A general integrity constraint is a first order closed formula, - for each molecular object, its data structure (i.e. the set of restricted to conjunction connectors and at most only one all its p/a and oh arcs), implication symbdle. Variables can be quantified existentially or universally. The universe of discourse in - for each binary relationship (i.e. p/a and o/r arcs), its which these formulas are interpreted is constituted as follows: cardinalities, - a set of constants: composed of (i) the union of atomic - for each multiple reference to the same component, the objects domains (VA), (ii) the union of atomic objects different roles played by the component in the abstraction. identifiers (IA) and molecular objects identifiers (IM) and For Example: of (iii) the union of class names of atomic objects (NA) and class names of molecular objects (NM), i (NA, Name, string) - a set of variables taking their values in the previous i (NA,Age, integer) defined universe of discourse, i (NA, number, integer) - a set of predicates: composed of (i) all atomic and piPERSON,Name, [l,l] [l,Nl) molecular aggregation relationships (i.e. p/a and o/r arcs), p(PERSON,Age, [l, 11 [l,Nl) (ii) instance generalization and class generalization p(VEHICLE,Power, [l, 1) [O,N]) relationships (i.e. c/i et g/s arcs), (iii) usual mathematic . . . ptiicates : c, >, <, 2, =, #, and the v (value) predicate. i (NM, PERSON) For example, over the previous database schema, we can i (NM, VEHICLE) define a general integrity constraint which states that if the i (NM, CONTRACT) vehicle power is greater than 10 and the person’s age is less than 20, then the contract premium is at least equal to 5000. ~;CONTRACT,PERSON, [I,11 [I,N]) o(CONTRACT,VEHICLE, [l, 11 [l, I.]) ICl: VP bfC VV tlG \JS VM QVG VVS VVM . . . [i (PERSON,P) A i(VEHICLE,V) A g (CAR, VEHICLE) i(CONTRACT,C) A i(Age,G) A As previously stated, everything in Morse is an objecl. i (Power, S) A i (Premium, M) Then each atomic or molecular object is formally identified. A O(c,P) A O(c,v) The relationship between an atomic object identifier and its A p(P,G) A v(G,VG) A VG<20 corresponding value is represented by a specific predicate V. A p(v,s) A V(s,vs) A vs>lo The relationship between a molecular object identifier and its A p(C,M) A v(M,VM)I corresponding structur& value is represented by a sequenceof ->[VMr5000] v predicates. This systematic identification of all objects As these constraints are specified using the same semantic implies a systematic sharing of objects. Then values of arcs as for describing the static data structure, they can be objects are represented only once. This identification permits represented by a semantic network in which each variable or also an independent manipulation of all object classes. The constant is represented by a node. Variable nodes are generalization arcs (i.e. g/s) are not directly represented in a considered as instances of object classes. The quantifier databaseextension. They are captured by the inclusion of sets corresponding to each variable is represented as a of identifiers with respect to the generalization hierarchy. The complementary information of the arc I relating a variable to following example is an extension of the previous database its class. For example, i(Person,x,t/) describes a variable x schema: universally quantified over the class Person. As the order of the quantifiers is meaningful in a given formula, an indice is I (PERSON,Pl) it-,w, v(NL~PH associated with the quantifier. For example, i(Person,x,V,l). i (Pge,A.U VW, 33) Finally, new binary arcs (inf, sup, equ, einf, esup, diff) are v(N2,duracd) i (PERSON,P2) i(Name,N2), added to the semantic network to represent the predicates: C, i (pge,JQ) VW, 44) >, =, <, 2. To give more meaning to this representation, we i(vEHIcIE,w i (Mrmber, 11) v(Il,l23) must complete each predicate to specify whether it belongs to i (Power,WU VW, 5) the left hand side or to the right hand side of the rule I(vEHI(=IE,v2) i oaunber, 12) v(I2,345) representing the integrity constraint. i (Power,W2) VW, 7) i(vEHIcIE,v3) I (Nunber, 12) v(I2,345) ICl: i (PERSON, P,v, 1, left, ICl) i(m,Cl) I (Premiun, Ml ) v(Ml, 5500) i (VEHICLE,V, V, 2, left, ICl) it-,a I (Premiun, M2) VW, 6000) i (CONTRACT, C, V, 4, leftright, ICl) Obviously this representation is not defined for implementing real databases, but just as a formal i(Age,G,V,S,left,ICl)

Barcelona, September, 1991 Proceedings df the 17th International 8 Conference on Very Large Data Bases i(Power,S,V,6,left,ICl) X(Al:doml,...,A,:dom,) i(Premium,M,V,7,right,ICl) <=> i (NA, Al, doml) , .. . . i (NA, An, dam,) , i (NM, X) o(C,P,left,ICl) p(XtAl) t -I P(XtAn) o(C,V, left, ICl) or by the following if the structure is composed by molecular p(P,G,left,ICl) v (G, VG, left, ICl) objects: inf (VG.20, left, ICl)

p(V,S,left,ICl) v(S,VS, left, ICl) X(Y1, .. ..Y.) sup(VS, 10, left, ICl) <=> o(X,Yl),oaar O(XtYn)t i(NM,X)

p(C,M,left,ICl) v(M,VM,left,ICl) or by the following statement if the structure is either sup(VM, 5000, right, ICl) composed of atomic objects and molecular objects. The following schema illustrates the representation of the X(Al:doml, .. ..A.:dom,,Yl, .. ..Y.) constraint The lower part represents the static data ICl. <=> i (NA,Al, doml) , .. . . i (NA,A,, dam,) , schema, the upper part represents the behavioral schema. In this latter one, we have separated the rule left hand side part i (NM, X) and the right hand side part; although some nodes appear in p(X,Al), .. . . p(X,A,) the both parts. When. the constraint doesn’t have an O(X,Yl) ,*+., Cl(Xt Yn) implication symbol, the semantic network doesn’t have a left hand side. Lf the cardinality constraints are specified, we shall have the following description:

X( tAl:dom1) [amI, anI1 , *.. , A n:dOmn, tam,, l n,1 , (Yll, l-1, rnll , .. . , ym, [rmn, rn,l 1 <=> i(NA,Al,doml), .. . . i(NA,A,,domn), i (NM, X) p(X,Al, tl,Nl [aml,anll),

P(X,Ant [l,Nl [amn,annl 1, o(X,Yl, [1,11 [rq, rnll 1, ..a, o(X,Y,, [1,11 [rm,,rn,l)

(2) Each set of generalization arcs can be declared as follows: g(X,Y) <=> x: Y g(X, Yl) ,...,g(X,Yn) <=> X : Yl, .. . , Yn g(Xl,Y) , .. . . g (Xn, Y) <=> Xl,...,Xn:Y

le of rule reoresentatb Example of an external speclflcatlon: PERSCN(ssn(l,l], name, (surname], age, ADDRESS). 4.3. The semantic object oriented language ADDRESS (no, street, code) VEHICLE (number, colour, type, power). The Morse language exposed in the previous section is a CONTRACT ( PERSON, VEHICLE, premium) . formal language to represent the detailed. . description of a CLIENT : PERSON. conceptual schema. m IS not Wded to be used b end-w. Consequently, we need a friendly user interface b) Specification of general constraints to specify data structures and constraints. This subsection describes the declarative language offered lo specify an The external interface to specify general constraints must application and its integrity constraints. allow the user to specify easily his integrity constraints defined over the external description of the data structures (i.e. a) Specification of data structures previous data language). Each integrity constraint is specified as a production rule. The external language must have the (1) Each set of p or o predicates which defines the siructure of same expressive power as the Morse formal language, but a molecular object class is replaced by the following must be more concise and more easy to learn and to use. The statement, if the structure is composed of atomic objects: external constraint language is built from the Morse formal language as follows:

Barcelona,September, 1991 Proceedingsof the 17th International 9 Conferenceon Very Large Data Bases (1) The alphabet of the external language is roughly the same Example 2 : For each contract relating a person and a as that of the internal language; except that “A” and “->” vehicle, if the age of the person is less than 20 and the power symbols are respectively replaced by “and” and the two of the vehicle greater than 10, then the premium of the keywords “if” - “then” to distinguish between the left part contract is at least equal to 5000. and the right part of a given rule. The quantified variables Vx et 3x are respectively replaced by (x) and [x] to IC3:{person/Person) {vehicle/Vehicle) alleviate the absence of the mathematical symbols in {contract/Contract) {age/Age) common keyboards. {power/Power]{premium/Premium)

(2) The domain of interpretation of the external language is .* If contract->person.age<20 the same of that of Morse language: we distinguish and contract->vehicle.power>lO names of atomic object classes (NA) and molecular object Then contract.premium25000. classes (NM), atomic and molecular object identifiers (respectively IA and IM) and the values of atomic objects 4.4. Code generation from lntegrlty constraints (VA). (3) The following restriction is made for variables: the scope This subsection deals with 02 code generation from logical formulas describing integrity constraints. Before this of each defined variable is the set of instances of a specific class. We use the notation x/class-name to represent generation process, a semantic control of each formula is done. Then we discuss the method definition and attachment, this declaration. at the operational level. (4) The only allowed predicates are: <, >, <, 2, =, f. These predicates apply only on atomic values. a) Consistency checking of integrity constraints (5) To access object identifiers and objects values through The consistency checking of the constraints aims to verify another object, the following functions are defined: in one hand the semantics of the constraints and in other hand their compatibility with the static database schema. It is a) A value vy of an atomic object y through another composed of the following steps: object x is delivered by the function “.” defined as follows: Bach constraint variable must be defined over an existing Ix x Iy -> vy class of the static databaseschema, (XtY> -> X.Y=V~ / i(X,x)Ap(X,A)Ap(x,ai)r\v(ai,vai) For each function symbol there must correspond an aggregation arc in the static semantic where IX and VY are respectively the set of all identifiers network, of the class X and the class Y, and where VY is the set of Arguments of the same predicates have compatible types, all values of the class Y. No predicate is subsumed by another predicate, Check wether different predicates of the same formula are b) The accessto an object identifier through another object contradictory or not, redondant or not, is done by the function “->‘I defmed as following: Ix x NC -> IO As we have not considered the exception handling, no constraint has to be contradictory with another one. We -> x->Y = iy/ i(Y, iy) A o(x,iy) (x,Y> only check the consistency of the set of constraints, but were IX is the set of instances of X, NC the set of all some works could also be done on the problem of class names (i.e. NM+NA) and IO the set of all object satisflability of this set of constraints [ Bry 861. identifiers (i.e. IM+IA), and were x, Y, iy be respectively b) Methods definition and attachment elements of these categories. An integrity constraint is a fust order formula specified on To facilitate the rule expression, we introduce the a semantic network. To give an interpretation to this formula following compositions of functions : (by assigning one of the logical values: true or false) with i) “x->y->z.. .” which is equivalent to: respect to the application universe of discourse represented in “x->Y = y and y->Z = z and.. . “5 the database, we must generate one or several enforcement ii) “x->y.z = v” which is equivalent to : proceduresdepending on different kinds of updates expected for “x->Y = y and y.z = v”. the database (insert, delete, modify). For example, from the following constraint which asserts a classical referential (6) The only allowed terms are constant terms, variable terms constraint, and functional terms obtained by “.” et “->” function symbols. RC: (person/PERSON) (8) The well-formed formulas are those elaborated with (agency-name/Agency-name) conjunction (and) and one implication (lf...Then). [agency/AGENCY] [name/Name] person.agency-name = agency.name. Example 1 : If a student has at least one mark less than 16, then his honors is not a first class. we may generatetwo enforcement procedures: - one procedure Ml triggered by the insertion of a person IC2 : {student/Student) [mark/Mark] (honors/Honors 1 (or the modification of his agency-name), which checks If student.mark

Proceedings tif the 17th International 10 Barcelona, September, 1991 Conference on Very Large Data Bases - one procedure M2 triggered by the deletion of an agency IC3-tmployee2 : for one given employee, each time his (or the modification of its name), which checks whether age is lower than an other employee’s age. his salary is lower referencing persons exist or not. too. Then, we notice that from one constraint specification, we An optimization step will verify that the two generated may generatedifferent controle procedures, attached to different methods on the same class are not similar. This happens objects. We call each of these procedures a constraint-method. when the constraint is perfectly symmetrical in relation to the As the example shows, each constraint-method is attached to a two variables). specific class. A given constraint-method attachment is In the process described in this section, we have not characterized by the following tuple: (Const r Name, considered the case where several different logical formulas Class name, Set-of-updates) where set ol’updates may generate a unique constraint-method. We just focused on can be-( insert, delete, modify,...). Then an integrity the case where a formula may generate one or several constraint specification may be characterized by a set of methods. attachments of this form. For example, the set of I . attachments characterizing the previous constraint RC is the Jdentification of an Operation whcation following: af an htemin, ~onsrrpint For each couple (method, class) previously built, we have RC-A : { (Ml,PERSON, (Insert,Modify)), to determine if it is concerned by a constraint on insertion, (M2,AGENCY, (Delete, Modify)) } modification, or deletion. Our approach consists in The code generation of constraint-methods from a logical considering on one hand, the type of quantifier applied to the constraint specification needs the knowledge of: variables of the class and on the other hand, the position of 1) Classes involved in the constraint specification (known the predicate defined on these variables (left hand side (LHS) through variable declaration), methods needed. or right hand side (RHS) of the rule). In each case, we will 2) For each involved class, update operations which trigger show which operations (insertion, modification, or deletion) the methods implementing the constraint. could induce a violation of an integrity constraint, and thus, require the enforcement of the constraint. This method is very Identifving involved classes close to those used for constraints simplification in the We fist suppose that the distinction between identified relational model [Nicolas 82) [Bry 881 [Quian 881 [Borla 901. objects and value-object (with respect to 02 concepts) is Because we operate at the design step and not during the step already done. Each class of identified objects involved in the of integrity checking by the SGBD, we can’t take into account constraint needa a method in order to check the constraint after the instances of the base, nor the queries, but only the syntaxe an update done on an object of this class. We name the of the constraint. method by the concatenation of the name of the constraint and Let us for example consider the case of a rule where the name of the class. For example, from the constraint: variables are ranging over the class X. The rule is of the form IC2: {employee/Employee) : “(3 x/X (P(x)) => Q”. At any moment in time, for (probationer/Probationer} example, before an operation which will update the database, (salary-l/Salary) (salary-2/Salary) the assertion “(3 xjX P(x)) => Q” is necessarily valid since it employee.salary-1 > probationer.salary-2. corresponds to an integrity constraint. The truth table will generate two methods, IC2-employee, attached to the presented in figure 6 shows that the formula “(3 x/X P(x)) => Q” can be valid in three different states; these correspond to Employee class, IC2grobationer. attached to the probationer class. Even if the two methods are generated from the same lines 1,3 and 4 of the table. constraint, the corresponding bodies are not the same. The method generated for the class Employe will check that the salary of an employee is greater than the salary of each probationer (it will be triggered after an update over ITITIT I employeee). The method generated for the class Probationer will check whether or not his salary is less that the salary of every employee (it will be triggered after an update over Probationer). Fie. 6: Truth table for (UP) = > Q If several variables of a class appear in a constraint, one method is generated for each variable. For example, the Let us now consider the impact of an operation applied to constraint : an object of a class X over lines 1, 3 and 4 of the truth table. IC3 : (enployeeJ/~loyee), (er@oyee2/~loyee) An operation on the object x of class X can change the truth (agel/Age) {age2/Agel value of the predicate (3x P(x) ) (see figure 7). If the truth (salaryl/Salary} (salary2/Salary2) value of 3x P(x) changes while it was in a state which If employeel.agel > employee2.age2 corresponds to lines 1 or 3 of the table, the database remains Then employeel.salaryl>employee2.salary2” in a coherent state (we only change from line 1 to line 3 or vice-versa). However, if the predicate was previously in a will generate two methods for the employee class : state which corresponds to line 4, and if the operation applied iC3-employee1 : for one given employee if his age is to an object x changes the truth value of the predicate 3x P(x) greater than another employee’s age, his salary is greater too, from false to true, then the integrity is violated because the predicate assumesthe truth value of line 2.

Barcelona, September, 1991 Proceedings of the 17th International 11 Conference on Very Large Data Bases { 02 set (string) ENS; 3P(x) 0 SetRes = set () ; if (! ([self NulleValue] 1) may change to F (SET t= set (“NulleValue”) ; 1; if (! ([self UniqueValue] )) (SET t= set( "UniqueValue");); may change to T if (! ( [self BoundedSet] )) (SET t= set (“PoundedSet”) ; ); may change to T if (! ( [self IClPerson] )) (SET += set (“IClPerson”) ; ) ; . : anaes which mav occur after an oneration over if ( ! ( [self IG!Person] )) ansbl=t of class (SET += set (“IC2Person”) ; ); We will now try to identify the operations which can, return (SetPes) ; when applied to objects of class X, change the value of the 1 predicate 3x P(x) from false to true. This logical change In this example, “NulleValue”, “UniqueValue” and may occur if before the operation, no instance of class X “BoundedSet” come from the mapping of the MORSE satisfied the predicate P, and after the operation, an object ol structures. IClPerson and IC2Person comes from the X satisfies the predicate P. This can only happen under the mapping of the integrity constraints ICI and IC2 explicitly two following operations: given by the user as rules. - an object which satisfies the predicate P has been inserted into x, Each object-integrity method (i.e. the method - an object of class X which didn’t satisfy P has been Objectlntegrity of the class Person in the previous example) modified to a value which satisfies P. returns a set type value. This set (e.g. SetRes in the previous cxamplc) contains the names of the constraint-methods which We conclude by stating that the integrity constraint must were not satisfied during the update operation, Depending on be applied to class X in the case of an insertion or a whether this set is empty or not, the programmer can commit modification of an object of the class X. or not the update operation. For example, the generator provides a new insertion method which inserts an object in c) The generator of CO2 code the extension of its class. ln the definition of this method, the corresponding “ObjectJntegrity” method must be activated to The generator of CO2 code is composed of a set of be sure that the update is allowed with respect to the integrity mapping rules which transform objects, relationships and constraints : integrity constraints of Morse toward objects and methods of 02. As each Morse object may satisfy one or several add method insert:boolean integrity constraints, each corresponding 02 object may in close CO2 PERSON satisfy one or several methods called “constraint-methods”. body insert:boolean in claaa CO2 PERSON These latter methods are particular in the sense they are I 02 set (string) ENS activated by other methods which realize the encapsulation of SetRes = [self Integrity] ; the object. Then each update operation on a given object if (SetRes==(o2 set (string) 1 set () 1 should activate by message passing the set of constraint- (PERSONt=set(self)];return(true);] methods associated to this object. This set of constraints is else (printf(“Integrity constraints called the “object-integrity”. It can be itself considered as a not respected: ‘I); general constraint-method associated to an object. Thus each display (SetRes) ; update operation has to know only one general constraint- return (false);) method instead of knowing the set of all specific constraint- methods. ) ln the previous code generation, the cost of the integrity The CO2 code generator is composed of two parts: (i) one checking process is not considered. Constraint-methods are part generatesthe definition of the object data structure and the specified in such a way they semantically correspond to the constraint-methods signatures, (ii) the other part generates the declarative assertions of the semantic network. The experience body of the object-integrity method and the bodies of the in traditional databaseshas shown that integrity checking is a corresponding specific constraints-methods. An example of very expensive process [Simon 841. If we want to avoid the code generation could be the following: multiple scanning of the same class, we have to merge in the same procedure the different constraint-methods which have 8dd Cl888 PENa- been defined for this class and don’t modify the objects. type tupla (Name:atring,Zqe:integer, kkir:aatof (ADDRESS)) method ~jectIntegrity:mot (&ring) i8 public 5. The Prototype NulleValue:boolrm A large number of database design tools are based on tJniqueVelue : boolrm semantic data models. Secsi is one among others Bounded&t (min:int,~~~:int) :boolean [Bouzeghoub 851. It was devoted to the design of relational ICIPerscn :bool*a databases. Starting from the Secsi Expert System IC.2Person:boolaan environment, we have implemented a prototype for all the methodology previously developped. Much of the experience body t%jectIntegrity:ret (string) we got from this previous project (e.g. interactive acquisition in clm8 PERSONCO2 of knowledge, completeness of specifications, consistency

Barcelona, September, 1991 Proceedings 6f the 17th International 12 Conference on Very Large Data Bases checking) is reused in the new design tool prototype integrity constraint. In addition to the class declarations and [ Bouzeghoub 901. the methods describing the integrity constraints, the tool generatesother useful methods necessary for the manipulation During the analysis step, an aid is provided in order to refine incomplete specifications. To simplify the language, a of the objects. For example, we need a method called “ObjectIntegrity” which groups all the integrity constraints few syntactic sugars can be introduced. The absence of a concerning a given object. This global method is called by variable declaration, the usage of class names instead of class any other operation defined on the object class concerned (e.g. variables are examples. The control phase in the compilation an insertion operation creates an object instance, adds it into process can be vued as a design specification phase where the extension of the corresponding class and then calls the rules can be corrected by the system (with the help of the “ObjectIntegrity” method which insure the integrity defuKd on user) instead of brutally rejecting the rule. This phase occurs the class. after the syntactic analysis of the constraint and before its semantic analysis. 6. Conciuding Remada and Current Extensions m vari&& : The user can omit variables. For example he can specify : In this paper, we have described a general framework for a CASE tool devoted to the design of object oriented databases. IC7 : If Person.Age < 20 The design approach is based on two levels: the semantic Then Person.,salary c 1000. object oriented level and the operational object oriented level. The first level is based on a semantic data model which was In this case, the system must introduce variables and their extended to represent more information about the behavior of corresponding quantifien. By default, the quantifier is V and objects (general integrity constraints and deductive rules). its scope is the whole formula. The preceding formula is then The second level is more operational, and is based on an translated into the following one: existing object oriented DBMS. The design methodology described in this paper is implemented in the Secsi Expert IC7 ’ : {person/Person) (age&e) (salary/Salary) system environment which already provides a design If person.age < 20 environment for relational databases. Then person.salary < 1000. This design tool is interfaced with 02 object oriented This transformation supposes that the integrity constraint is database system. It automatically generates a CO2 database mono-object. However, this must be confirmed by the schema and gives a very convenient way to populate the designer. database and to check its constistency with respect to the . . 6 constraint-methods generated. A syntactic analysis of v : We can envisage that the user may specifications, an interactive acquisition aid of constraints and introduce integrity contraints on multiple objects without rules and a set of consistency checking rules are provided too. specifying the predicates which will serve as links between This design environment can be considered as a powerful1 these objects, for example : mean for validating user requirements against an image of the projected databaseapplication. ICB : If Person.AgelO. Then Contract.Premium > 5000. An extension concerning behavioral rules is in progress. We call “behavioral rule” an assertion which has in its right In this case, the links between objects are found in the static hand side some updates to be performed on the database; the graph, and the integrity constraint can be rewritten as : left hand side has the same syntax and the same semantic than a constraint rule’s left hand side. Further developments should ICB ’ : If Contract->Vehicle.Power > 10 mainly concern the formalization of the external language. and Contract->Person.Age < 18 Besides this formalization aspect, many other problems Then Contract.Premium > 5000. remain, like the efficiency of the code generation procedure Many links may exist between two objects. For example, and the code optimization. if a constraint holds on vehicles and persons, the link between them can either represent the fact that a person drives a car or References the fact that a person owns a car. So the designer has to choose one of the reformulations proposed by the system. [Bancilhon 871 BANCILHON F. “Les objectifs scientiiques du GIP Alta’ir” , Rapport Altaii 8/1987. Similarly, each time there is a lack in the static schema, it [Bernstein 82) BERNSTEIN P. & BLAUSTEIN B. “Fast may be completed to be consistent with the integrity Method for Testing Quantified Relational Calculus constraints. For example, the constraint “Person. Age>O” Assertions” ACM-SIGMOD Conf., Colorado, June 1982. may create in the static semantic network an object ‘person’ [Bertlno 841 BERTINO E. & APUZZO D. “Integrity which is composed by an object ‘age’ whose type is ‘integer’. aspects in Database Management Systems” Proceed. of The aid provided by the design tool is glaring at sight of lntemat. Conf. on Trends and Applications of Databases” the final results. The previous constraint, IC8, can be IEEE-NBS, Gaithersburg, USA 1984. expressed with two lines by the user (because of the [Borla 901 BORLA-SALAMET P. “Le contr8le de refinement made by the tool and because of the conciseness l’integrit6 Semantique des bases de don&es Relationnelles et of the rule langage). The generated code is about twenty lines d&ductives” Phd thesis Paris VI, 1990. (because of the procedural aspect of the langage used). One [Bouzeghoub 841 BOUZEGHOUB M. “MORSE: a object type specified by one line including cardinality Functional Query Language and its Semantic Data Model” consuaints, may generate two pages in CO2 code because Proceed. of Internat. Conf. on Trends and Applications of some cardiialities have to be expressed by methods, like any Databases” IEEE-NBS, Gaithersburg, USA 1984.

Barcelona, September, 1991 Proceedings df the 17th International 13 Conference on Very Large Data Bases [Bouzeghoub 851 BOUZEGHOUB M., GARDARIN Cl. & [Lecluse 881 LECLUSE C,RICHARD P. & VELEZ F., METAIS E. “SECSI: An Expert System for Database “Modeling Inheritance and Genericity in Object Oriented Design” 1lth VLDB Conf., Stockholm Sweeden 1985. Databases,Version 1” Rapport Altai? 18/ 1988. [Brodie 811 BRODIE M. “On Modelling Behavioural [Loucopoulos 891 LOUCOPOULOS 0. & KARAKOS- Semantics of Databases” 7th VLDB Conf., Cannes, France TAS V. “Modelling and validating office information 1981. systems: an object and logic oriented approach” Software [Brodie 841 BRODIE M., MYLOPOULOS J., SCHMIDT Engineering Journal, March 1989. Y. “On Conceptual Modelling: Perspectives from Artificial [Morgenstern 841 MORGENSTERN M. “Constraint Intelligence, Data Bases and Programming languages” Equations : Declarative Expression of Constraints with Springer-Verlag, NY 1984. Automatic Enforcement”. 10th VLDB Conference, [Brodle 861 BRODIE M. & MYLOPOULOS J, “On Singapore, 1984. Knowledge Base Management Systems” (editors) Springer [Nicolas 821 NICOLAS J.M. ” Logic for Improving Verlag, 1986. Integrity Checking in Relational Databases” Acta Informatica, [Bry 861 BRY F., MANTHEY R. “Checking July 1982. Consistency od Database Constraints : a Logical Basis”, 12th [Olive 891 OLIVE A. “On the design and implementation VLDB Conf, Kyoto, Japan, 1986. of information systems from deductive conceptual models” 15 [Bry 881 BRY F., DECKER H. “Preserver l’integrite th VLDB Conference, Amsterdam, 1989. d’une base de donntes deductive : unee methcde et son [OODB 861 Object-Oriented Databases, Proceed. of the 1st implementation”, 45me joumBes BD3, B&&et, France, 1988. Intemat. Workshop, IEEE Computer Society Press 1986. [Buchmann 861 BUCHMANN A-P., CARRERA R.S. & (Qulan 881 QUIAN X. “An effective Method for Integrity VASQUEZ-GALINDO M.A. “A Generalized Contraint and Constraint Simplification” 4th Conf. on Data Engineering, Exception Handler for an Object Oriented CAD-DBMS”, in Los Angeles, California, 1988. [OODB 861. [Slmon &I] SIMON E., VALDURIEZ V. “Design and [Gustat’sson 831 GUSTAFSSON M.R., BUBENKO J.A. Implementation of an Extendible Integrity Subsystem” & KARLSSON T. “A declarative Approach to conceptual Proceed. of ACM SIGMOD Intemat Conf., Boston, USA, information modelling” in OLLE,SOL,VERRIJN-STUART 1984. (eds): Information System Methodology: a comparative [Smith 771 SMITH J.M. & SMITH D.C.P., “Database approach, North Holland Publ. Co 1983. Abstractions: Aggregation and Generalization” ACM TODS [Hagelstein 881 HAGELSTEIN T. “A declarative June 1977. Approach to information system requirements” J. Knowledge [Tsalgatidou 901 TSALGATIDOU A., KARAKOSTAS Based Systems, l(4) 1988. V. % LOUCOPOULOS P. “Rule-Based Requirements [Hammer 811 HAMMER M.M. & McLEOD D-J,, Specification and Validation” Proceed. of the 2nd Nordic “Database Description with SDM: A Semantic Data Model” Conf. on Advanced Information System Engineering, Kista ACM TODS Vo16,N03, Sept 1981. CAiSE90, in Lecture Notes in Computer Sciences (436), [Haux 881 HAUX L. , C. LECLUSE, RICHARD P. Steinholtz-Solvberg-Bergman (Eds), Springer-Verlag May “The CO2 V0.4 Language and Some Extensions, Release 1990. 3.1” Rapport inteme Altair 4-88, octobre 1988. [Tucherman 851 TUCHERMAN L., FURTADO A. & [Lecluse 87) LECLUSE C., RICHARD P. & VELEZ Casanova M.A. “A Tool for Modular Database Design” 1lth F.“An Object Oriented Data Model” C. Rapport Altaii 101 VLDB Conf, Stockholm, Sweeden 1985. 1987.

Barcelona, September, 1991 Proceedings of the 17th International 14 Conference on Very Large Data Bases