<<

Modelling in UML By Geoffrey Sparks, [email protected] : http://www.sparxsystems.com.au Originally published in Methods & Tools e-newsletter : http://www.martinig.ch/mt/index.html

Introduction Profile (as proposed by Rational ). When it comes to providing reliable, Some familiarity with object-oriented flexible and efficient object persistence design, UML and for software systems, today's designers modelling is assumed. and architects are faced with many choices. From the technological The Class Model perspective, the choice is usually between pure Object-Oriented, Object- The Class Model in the UML is the Relational hybrids, pure Relational and main artefact produced to represent the custom solutions based on open or logical structure of a software system. proprietary file formats (eg. XML, It captures the both the data OLE structured storage). From the requirements and the behaviour of vendor aspect Oracle, IBM, Microsoft, objects within the model domain. The POET and others offer similar but techniques for discovering and often-incompatible solutions. elaborating that model are outside the scope of this article, so we will assume This article is about only one of those the existence of a well designed class choices, that is the layering of an model that requires mapping onto a object-oriented class model on top of a relational database. purely relational database. This is not to imply this is the only, best or The class is the basic logical entity in simplest solution, but pragmatically it the UML. It defines both the data and is one of the most common, and one the behaviour of a structural unit. A that has the potential for the most class is a template or model from misuse. which instances or objects are created at run time. When we develop a logical We will begin with a quick tour of the model such as a structural hierarchy in two design domains we are trying to UML we explicitly deal with classes. bridge: firstly the object-oriented class model as represented in the UML, and When we work with dynamic secondly the relational . diagrams, such as sequence diagrams and collaborations, we work with For each domain we look only at the objects or instances of classes and their main features that will affect our task. inter-actions at run-time. We will then look at the techniques and issues involved in mapping from The principal of data hiding or the class model to the database model, encapsulation is based on localisation including object persistence, object of effect. A class has internal data behaviour, relationships between elements that it is responsible for. objects and object identity. We will Access to these data elements should conclude with a review of the UML be through the class's exposed behaviour or interface. Adherence to Person Person Class attributes : the encapsulated - Address: CAddress data # Name: # Age:St i double

+ getAge() : int + setAge(n) + getName() : +setName(s)St i A simple person class with no state or Class operations: behaviour shown the behaviour

Attributes and operations define the state of an object run-timet and the capabilities or behaviour of the bj t

Figure 1 - Classes, attributes and operations this principal results in more relationship that is most interesting: for maintainable code. example an Address class may be associated with a Person class. The Behaviour mapping of this relationship into the relational data space requires some Behaviour is captured in the class care. model using the operations that are defined for the class. Operations may Aggregation is a form of association be externally visible (public), visible to that implies the collection of one class children (protected) or hidden of objects within another. Composition (private). By combining hidden data is a stronger form of aggregation that with a publicly accessible interface and implies one object is actually hidden or protected data manipulation, composed of others. Like the a class designer can create highly association relationship, this implies a maintainable structural units that complex class attribute that requires support rather than hinder change. careful consideration in the process of mapping to the relational domain. Relationships and Identity While a class represents the template Association is a relationship between 2 or model from which many object classes indicating that at least one side instances may be created, an object at of the relationship knows about and run time requires some means of somehow uses or manipulates the other identifying itself such that associated side. This relationship may by objects may act upon the correct object functional (do something for me) or instance. In a programming language structural (be something for me). For like C++, object pointers may be this article it is the structural passed around and held to allow objects access to a unique object with which to locate or re-create the instance. required object. Often though, an object will be destroyed and require that it be re- The created as it was during its last active instance. These objects require a The relational has been storage mechanism to save their around for many years and has a internal state and associations into and proven track record of providing to retrieve that state as required. performance and flexibility. It is essentially based and has as its Inheritance provides the class model fundamental unit the '', which is with a means of factoring out common composed of a set of one or more behaviour into generalised classes that 'columns', each of which contains a then act as the ancestors of many data element. variations on a common theme. Inheritance is a means of managing Tables and Columns both re-use and complexity. As we will see, the relational model has no direct A relational table is collection of one counterpart of inheritance, which or more columns each of which has a creates a dilemma for the data unique name within the table construct. modeller mapping an Each is defined to be of a onto a relational framework. certain basic data type, such as a number, text or binary data. A table Navigation from one object at run time definition is a template from which to another is based on absolute table rows are created, each being references. One object has some form an instance of a possible table instance. of link (a pointer or unique object ID)

A class hierarchy Person Aggregation captures the concept Family showing a generalised collectionof or composition between classes person class from which other classes are derived

Parent Child The main relationships we are interested in are Association, 2 Aggregation and Inheritance. 1..n These Association captures describe the ways classes interact a having or using or relate to each other relationship between classes

Figure 2 - UML Class model notation Share A Person may own a set of Shares

A Person is composed of a strict set of ID 0..n A Person may documents (having n reside at zero elements) or more addresses 0..1 Address Person ID Document

0..n 0..n 1 n

An Address may Three forms of the Aggregation relationship. The weak form is have zero or depicted with an unfilled diamond head, the strong form more Persons in (composition) with a filled head. residence

Figure 3- Aggregation Relationships

Public Data Access Database stored procedures provide a means of extending database The relational model only offers a functionality through proprietary public data access model. All data is language extensions used to construct equally exposed and open to any functional units (scripts). These process to update, query or manipulate functional procedures do not map it. Information hiding is unknown. directly to entities, nor have a logical relationship to them. Behaviour Navigation through relational data sets The behaviour associated with a table is based on row traversal and table is usually based on the business or joins. SQL is the primary language logical rules applied to that entity. used to rows and locate instances from a table set. Constraints may be applied to columns in the form of uniqueness Relationships and Identity requirements, relational integrity constraints to other tables/rows, The of a table provides the allowable values and data types. unique identifying value for a particular row. There are two kinds of Triggers provide some additional primary key that we are interested in: behaviour that can be associated with firstly the meaningful key, made up of an entity. Typically this is used to data columns which have a meaning enforce before or after within the business domain, and updates, inserts and deletes. second the abstract unique identifier, such as a counter value, which have no business meaning but uniquely identify Having looked at the two domains of a row. We will discuss this and the interest and compared some of the implications of meaningful keys later. important features of each, we will digress briefly to look at the notation A table may contain columns that map proposed to represent relational data to the primary key of another table. models in the UML. This relationship between tables defines a and implies a The UML Data Model Profile structural relationship or association between the two tables. The Data Model Profile is a proposed UML extension (and currently under Summary review - Jan 2001) to support the modelling of relational in From the above overview we can see UML. It includes custom extensions that the object model is based on for such things as tables, data base discrete entities having both state schema, table keys, triggers and (attributes/data) and behaviour, with constraints. While this is not a ratified access to the encapsulated data extension, it still illustrates one generally through the class public possible technique for modelling a interface only. The relational model relational database in the UML. exposes all data equally, with limited support for associating behaviour with Tables data elements through triggers, indexes and constraints. Customer

You navigate to distinct information in the object model by moving from object to object using unique object identifiers and established object relationships (similar to a network A table in the UML Data Profile is a data model). In the relational model class with the «Table» stereotype, you find rows by joining and filtering displayed as above with a table icon in result sets using SQL using generalised the top right corner. search criteria. Columns Identity in the object model is either a run-time reference or persistent unique ID (termed an OID). In the relational Customer world, primary keys define the PK OID: int Name: VARCHAR2 uniqueness of a data set in the overall Address: VARCHAR2 data space.

In the object model we have a rich set of relationships: inheritance, aggregation, association, composition, Database columns are modelled as dependency and others. In the attributes of the «Table» class. For relational model we can really only example, the figure above shows some specify a relationship using foreign attributes associated with the Customer keys. table. In the example, an object id has been defined as the primary key, as well as two other columns, Name and Customer Address. Note that the example above PK OID: int defines the column type in terms of the Name: VARCHAR2 native DBMS data types. Address: VARCHAR2 + «PK» idx_customer00() + «FK» idx_customer02() Behaviour + «Index» idx_customer01() + «Trigger» trg_customer00() + «Unique» unq_customer00() So far we have only defines the logical + «Proc» spUpdateCustomer() (static) structure of the table; in + «Check» chk_customer00() addition we should describe the behaviour associated with columns, including indexes, keys, triggers, The example illustrates the following procedures & etc. Behaviour is possible behaviour: represented as stereotyped operations. 1. A primary key constraint (PK); The figure below shows our table 2. A Foreign key constraint (FK); above with a primary key constraint 3. An index constraint (Index); and index, both defined as stereotyped 4. A trigger (Trigger); operations: 5. A uniqueness constraint (Unique); 6. A (Proc) - not formally part of the data profile, Customer but an example of a possible PK OID: int Name: VARCHAR2 modelling technique; and a Address: VARCHAR2 7. Validity check (Check).

+ «PK» idx_customer00() + «index» idx_customer01() Using the notation provided above, it is possible to model complex data structures and behaviour at the DBMS level. In addition to this, the UML Note that the PK flag on the column provides the notation to express 'OID' defines the logical primary key, relationships between logical entities. while the stereotyped operation "«PK» idx_customer00" defines the Relationships constraints and behaviour associated with the primary key implementation The UML data modelling profile (that is, the behaviour of the primary defines a relationship as a dependency key). of any kind between two tables. It is represented as a stereotyped Adding to our example, we may now association and includes a set of define additional behaviour such as primary and foreign keys. triggers, constraints and stored procedures as in the example below: The data profile goes on to require that a relationship always involves a parent and child, the parent defining a primary key and the child implementing a foreign key based on all or part of the parent primary key.

The relationship is termed 'identifying' if the child foreign key includes all the elements of the parent primary key and hardware (a 'node' in UML). 'non-identifying' if only some elements of the primary key are included. To represent schema within the database, use the «schema» stereotype The relationship may include on a package. A table may be placed in cardinality constraints and be modelled a «schema» to establish its scope and with the relevant PK - FK pair named location within a database. as association roles. Figure 4 illustrates this kind of relationship modelling «schema» using UML. User

Child Grandchild The Physical Model Grandparent Parent UML also provides some mechanisms Person for representing the overall physical structure of the database, its contents and deployed location. To represent a physical database in UML, use a stereotyped component as in the figure Mapping from the Class Model to below: the Relational Model

«Database» Having described the two domains of MainOraDB interest and the notation to be used, we can now turn our attention as to how to map or translate from one domain to the other. The strategy and sequence A component represents a discrete and presented below is meant to be deployable entity within the model. In suggestive rather than proscriptive - the physical model, a component may adapt the steps and procedures to your be mapped on to a physical piece of personal requirements and

Parent Child PK_PersonID FK_PersonID

2 «identifying» 0..n

An identifying relationship between child and parent, with role names based on primary to foreign key relationship.

Figure 4 - UML relationship environment. construct from the object-oriented model that requires translating into the 1. Model Classes relational model. The relational space is essentially flat, every entity being Firstly we will assume we are complete in its self, while the object engineering a new relational database model is often quite deep with a well- schema from a class model we have developed class hierarchy. created. This is obviously the easiest direction as the models remain under The deep class model may have many our control and we can optimise the layers of inherited attributes and relational data model to the class behaviour, resulting in a final, fully model. In the real world it may be that featured object at run-time. There are you need to layer a class model on top three basic ways to handle the of a legacy data model - a more translation of inheritance to a relational difficult situation and one that presents model: its own challenges. For the current discussion will focus on the first 1. Each class hierarchy has a single situation. At a minimum, your class corresponding table that contains model should capture associations, all the inherited attributes for all inheritance and aggregation between elements - this table is therefore the elements. of every class in the hierarchy. For example, Person, 2. Identify persistent objects Parent, Child and Grandchild may all form a single class hierarchy, Having built our class model we need and elements from each will appear to separate it into those elements that in the same relational table; require persistence and those that do 2. Each class in the hierarchy has a not. For example, if we have designed corresponding table of only the our application using the Model-View- attributes accessible by that class Controller design pattern, then only (including inherited attributes). For classes in the model section would example, if Child is inherited from require persistent state. Person only, then the table will contain elements of Person and 3. Assume each persistent class maps Child only; to one relational table 3. Each generation in the class hierarchy has a table containing A fairly big assumption, but one that only that generation's actual works in most cases (leaving the attributes. For example, Child will inheritance issue aside for the map to a single table with Child moment). In the simplest model a class attributes only from the logical model maps to a relational table, either in whole or in There are cases to be made for each part. The logical extension of this is approach, but I would suggest the that a single object (or instance of a simplest, easiest to maintain and less class) maps to a single table row. error prone is the third option. The first option provides the best performance 4. Select an inheritance strategy at run-time and the second is a compromise between the first and last. Inheritance is perhaps the most problematic relationship and logical The first option flattens the hierarchy be replicated in many child tables. and locates all attributes in one table - Even worse, the parental data in two or convenient for updates and retrievals more child classes may be redundantly of any class in the hierarchy, but stored in many tables; if a parent's difficult to authenticate and maintain. attributes are modified, there is Business rules associated with a row considerable effort in locating are hard to implement, as each row dependent children and updating the may be instantiated as any object in the affected rows. hierarchy. The dependencies between columns can become quite The third option more accurately complicated. In addition, an update to reflects the object model, with each any class in the hierarchy will class in the hierarchy mapped to its potentially impact every other class in own independent table. Updates to the hierarchy, as columns are added, parents or children are localised in the deleted or modified from the table. correct space. Maintenance is also relatively easier, as any modification The second option is a compromise of an entity is restricted to a single that provides better encapsulation and relational table also. The down side is eliminates empty columns. However, a the need to re-construct the hierarchy change to a parent class may need to at run-time to accurately re-create a

Parent tbl_Parent A Parent class with unique ID (OID) - OID: GUID and Name and Sex attributes maps to AddressOID: VARCHAR # Name: String a relational table. Name: VARCHAR # Sex: Gender PK OID: VARCHAR Sex: VARCHAR + setName(String) + getName() : String + setSex(String) <> + getSex() : String

m_Address 0..n The Address association from the logical model becomes a foreign key relationship in the data model

1 Address - OID: GUID # City: String The Address class in the logical tbl_Address # Phone: String model becomes a table in the City: VARCHAR # State: String data model PK OID: VARCHAR # Street: String Phone: VARCHAR State: VARCHAR + getCity() : String Street: VARCHAR + getStreet() : String + setCity(String) <> + setStreet(String)

Figure 5 - Class to Table mapping child class's state. A Child object may A complete run-time scenario can then require a Person member variable to be loaded from storage reasonably represent their model parentage. As efficiently. both require loading, two database calls are required to initialise one An important point about the OID object. As the hierarchy deepens, with values above is that they have no more generations, the number of inherent meaning beyond simple database calls required to initialise or identity. They are only logical pointers update a single object increases. and nothing more. In the relational model, the situation is often quite It is important to understand the issues different. that arise when you map inheritance onto a relational model, so you can Identity in the relational model is decide which solution is right for you. normally implemented with a primary key. A primary key is a set of columns in a table that together uniquely 5. For each class add a unique identify a row. For example, name and object identifier address may uniquely identify a 'Customer'. Where other entities, such In both the relational and the object as a 'Salesperson', reference the world, there is the need to uniquely 'Customer', they implement a foreign identify an object or entity. key based on the 'Customer' primary key. In the object model, non-persistent objects at run-time are typically The problem with this approach for our identified by direct reference or by a purposes is the impact of having pointer to the object. Once an object is business information (such as customer created, we can refer to it by its run- name and address) embedded in the time identity. However, if we write out identifier. Imagine three or four tables an object to storage, the problem is all have foreign keys based on the how to retrieve the exact same instance customer primary key, and a system on demand. change requires the customer primary key to change (for example to include The most convenient method is to 'customer type'). The work required to define an OID (object identifier) that is modify both the 'customer' table and guaranteed to be unique in the the entities related by foreign key is namespace of interest. This may be at quite large. the class, package or system level, depending on actual requirements. On the other hand, if an OID was implemented as the primary key and An example of a system level OID formed the foreign key for other tables, might be a GUID (globally unique the scope of the change is limited to identifier) created with Microsoft's the primary table and the impact of the 'guidgen' tool; eg. {A1A68E8E-CD92- change is therefore much less. 420b-BDA7-118F847B71EB}. A class level OID might be implemented using Also, in practice, a primary key based a simple numeric (eg. 32 bit counter). on business data may be subject to change. For example a customer may If an object holds references to other change address or name. In this case objects, it may do so using their OID. the changes must be propagated correctly to all other related entities, detailed below for handling not to mention the difficulty of associations and aggregation. changing information that is part of the primary key. 7. Map associations to foreign keys

An OID always refers to the same More complex class attributes (ie. entity - no matter what other those which represent other classes), information changes. In the above are usually modelled as associations. example, a customer may change name An association is a structural or address and the related tables relationship between objects. For require no change. example, a Person may live at an Address. While this could be modelled When mapping object models into as a Person has City, Street and Zip relational tables, it is often more attributes, in both the object and the convenient to implement absolute relational world we are inclined to identity using OID's rather than structure this information as a separate business related primary keys. The entity, an Address. OID as primary and foreign key approach will usually give better load In the object domain an address and update times for objects and represents a unique physical object, minimise maintenance effort. In possibly with a unique OID. In the practice, a business related primary relational, an address may be a row in key might be replaced with: an Address table, with other entities having a foreign key to the Address 1. A uniqueness constraint or index primary key. on the columns concerned; 2. Business rules embedded in the In both models then, there is the class behaviour; tendency to move the address 3. A combination of 1 and 2. information into a separate entity. This helps to avoid redundant data and Again, the decision to use meaningful improves maintainability. keys or OID's will depend on the exact requirements of the system being So for each association in the class developed. model, consider creating a foreign key from the child to the parent table. 6. Map attributes to columns

In general we will map the simple data attributes of a class to columns in the relational table. For example a text and number field may represent a person's name and age respectively. This sort of direct mapping should pose no problem - simply select the appropriate data type in the vendor's relational model to host your class attribute.

For complex attributes (ie. attributes that are other objects) use the approach Salesperson Customer

PK OID: int PK OID: int Name: VARCHAR2 PK_Salesperson FK_Salesperson Name: VARCHAR2 Department: VARCHAR2 Address: VARCHAR2 1 0..* Salesperson: int + «PK» PK_Salesperson() + «PK» PK_Customer() + «FK» FK_SalesPerson()

Relationships are based on the PK- FK pair. This example relates a Salesperson to a Customer by the appropriate primary and foreign keys. The assumption is that a customer may only be associated with one salesperson.

Figure 6 - Table relationships in UML

8. Map Aggregation and shares from a Share table, but each Composition Share may be associated with zero or one Person. If the Person ceases to Aggregation and composition exist, the Shares become un-owned or relationships are similar to the are passed to another Person. In the association relationship and map to relational world, this could be tables related by primary-foreign key implemented as each Share having an pairs. There are however, some points 'owner' column which stored a Person to bear in mind. ID (or OID) .

Ordinary aggregation (the weak form) The strong form of aggregation, models relationships such as a Person however, has important integrity resides at one or more Addresses. In constraints associated with it. this instance, more than one person Composition, implies that an entity is could live at the same address, and if composed of parts, and those parts the Person ceased to exist, the have a dependent relationship to the Addresses associated with them would whole. For example, a Person may still exist. This example parallels the have identifying documents such as a many-to-many relationship in Passport, Birth Certificate, Driver's relational terminology, and is usually License & etc. A Person entity may be implemented as a separate table composed of the set of such identifying containing a mapping of primary keys documents. If the Person is deleted from one table to the primary keys of from the system, then the identifying another. documents must be deleted also, as they are mapped to a unique A second example of the weak form of individual. aggregation is an entity has use or exclusive ownership of another. For If we ignore the OID issue for the example, a Person entity aggregates a moment, a weak aggregation could be set of shares. This implies a Person implemented using either an may be associated with zero or more intermediate table (for the many-to- many case) or with a foreign key in the uniqueness and data constraints, and aggregated class/table (one-to-many relational integrity. case). In the case of the many-to-many relationship, if the parent is deleted, A non-persistent object model would the entries in the intermediate table for typically implement all the behaviour that entity must also be deleted also. In required in one or more programming the case of the one-to-many languages (eg. Java or C++). Each relationship, if the parent is deleted, class will be given its required the foreign key entry (ie. 'owner') must behaviour and responsibilities in the be cleared. form of public, protected and private methods. In the case of composition, the use of a foreign key is mandatory, with the Relational databases from different added constraint that on deletion of the vendors typically include some form of parent the part must be deleted also. programmable SQL based scripting Logically there is also the implication language to implement data with composition that the primary key manipulation. The two common of the part forms part of the primary examples are triggers and stored key of the whole - for example, a procedures. Person's primary key may composed of their identifying documents ID's. In When we mix the object and relational practice this would be cumbersome, models, the decision is usually whether but the logical relationship holds true. to implement all the business logic in the class model, or to move some to the often more efficient triggers and 9. Define relationship roles stored procedures implemented in the relational DBMS. For each association type relationship, each end of the relationship may be From a purely object-oriented point of further specified with role information. , the answer is obviously to avoid Typically, you will include the Primary triggers and stored procedures and Key constraint name and the Foreign place all behaviour in the classes. This Key Constraint name. Figure 6 localises behaviour, provides for a illustrates this concept. This logically cleaner design, simplifies maintenance defines the relationship between the and provides good portability between two classes. DBMS vendors.

In addition, you may specify additional In the real world, the bottom line may constraints (eg. {Not NULL}) on the be scaling to 100's or 1000's of role and cardinality constraints (eg. , something 0..n). stored procedures and triggers are purpose designed for. 10. Model behaviour If purity of design, portability, We now come to another difficult maintenance and flexibility are the issue: whether to map some or all class main drivers, localise all behaviour in behaviour to the functional capabilities the object methods. provided by database vendors in the form of triggers, stored procedures, If performance is an over-riding concern, consider delegating some behaviour to the more efficient DBMS In UML, the physical model describes scripting languages. Be aware though how something will be deployed into that the extra time taken to integrate the real world - the hardware platform, the object model with the stored network connectivity, software, procedures in a safe way, including operating system, dll's and other issues with remote effects and components. You produce a physical debugging, may cost more in model to complete the cycle - from an development time than simply initial use case or domain model, deploying to more capable hardware. through the class model and data models and finally the deployment As mentioned earlier, the UML Data model. Profile provides the following extensions (stereotyped operations) Typically for this model you will with which you can model DBMS create one or more nodes that will host behaviour: the database(s) and place DBMS software components on them. If the ! Primary key constraint (PK); database is split over more than one ! Foreign key constraint (FK); DBMS instance, you can assign ! Index constraint (Index); packages («schema») of tables to a ! Trigger (Trigger); single DBMS component to indicate ! Uniqueness constraint (Unique); where the data will reside. ! Validity check (Check). Conclusion

11. Produce a physical model This concludes this short article on database modelling using the UML. As

«schema» System MainServer sys_aliases sys_procs sys_queries sys_tables sys_users «Database» MainOraDB

«schema» Us e r Child Grandchild Grandparent Par ent Per s on

A Node is a physical piece of hardw are (such as a Unix server) on w hich components are deployed. The database component in this example is also mapped to tw o logical «schema», each of w hich contains a number of tables.

Figure 7 - The Physical Model you can see, there are quite a few issues to consider when mapping from the object world to the relational. The UML provides support for bridging the gap between both domains, and together with extensions such as the UML Data Profile is a good language for successfully integrating both worlds.

References

Muller, Robert J., for Smarties, Morgan Kaufman, 1999.

Rational Software, The UML and Data Modelling, Rational Software

Ambler, Scott W., Mapping Objects to Relational Databases, AmbySoft inc, 1999

About the Author

Geoffrey Sparks is the director of Sparx Systems, an Australian company that specialises in UML tools.

The Sparx Systems web site is at: www.sparxsystems.com.au

Geoffrey may be contacted at [email protected] A quick summary guide to data modelling in UML

1. Create a class model for your development domain

2. Identify persistent classes from the model

3. Assume each persistent class in the model will map to one relational table

4. Select a suitable inheritance strategy for each class hierarchy

5. For each class add a unique ID (OID) or select a suitable primary key

6. For each class map simple data types to table columns

7. For each class, map complex attributes (association, aggregation) to PK/ FK pairs. Take special note of the strong and weak forms of aggregation.

8. For related classes, map PK, FK pairs naming the role ends according to selected key.

9. Label relationship roles with their appropriate cardinality and stereotype: <> or <>

10. Add stereotyped operations for table behaviour (keys, indexes, uniqueness, checks, and triggers)

11. Divide persistent classes into

12. Create a deployment model and link database components to physical nodes