A Logical Design Methodology for Relational Using the Extended Entity-Relationship Model

TOBY J. TEOREY Computing Research Laboratory, Electrical Engineering and Computer Science, The University of Michigan, Ann Arbor, Michigan 48109-2122

DONGQING YANG Computer Science and Technology, Peking Uniuersity, Beijing, The People’s Republic of China

JAMES P. FRY Computer and Information Systems, Graduate School of Business Administration, The University of Michigan, Ann Arbor, Michigan 48109-1234

A design methodology is defined for the design of large relational databases. First, the data requirements are conceptualized using an extended entity-relationship model, with the extensions being additional semantics such as ternary relationships, optional relationships, and the generalization abstraction. The extended entity- relationship model is then decomposed according to a set of basic entity-relationship constructs, and these are transformed into candidate relations. A set of basic transformations has been developed for the three types of relations: entity relations, extended entity relations, and relationship relations. Candidate relations are further analyzed and modified to attain the highest degree of normalization desired. The methodology produces database designs that are not only accurate representations of reality, but flexible enough to accommodate future processing requirements. It also reduces the number of data dependencies that must be analyzed, using the extended ER model conceptualization, and maintains data integrity through normalization. This approach can be implemented manually or in a simple software package as long as a “good” solution is acceptable and absolute optimality is not required.

Categories and Subject Descriptors: H.2.1 [Database Management]: Logical Design- data models General Terms: Databases, Design Additional Key Words and Phrases: Entity-relationship model, integrity, logical design, relational databases

INTRODUCTION approach has been a low-level bottom-up activity synthesizing data elements into Relational has been ac- normalized relations using the inter-data- complished with a variety of approaches, element dependencies resulting from the including the top-down, bottom-up, and requirements analysis [Codd 1970, 1974; combined methodologies. The traditional Martin 1982; Date 1984; Smith 19851.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. 0 1986 ACM 0360-0300/86/0600-0197 $00.75

Computing Surveys, Vol. 18, No. 2, June 1986 198 l T. J. Teorey, D. Yang, and J. P. Fry CONTENTS nication between the designer and the end user during the requirements analysis and conceptual design phases because of its ease of understanding and its convenience in INTRODUCTION representation [Chen 19761. One of the 1. ER MODELING AND EXTENDED reasons for its effectiveness is that it is a CONSTRUCTS top-down approach using the concept of 1.1 Original Classes of Objects (ER Model) 1.2 Extended Classes of Objects (EER Model) abstraction. The number of entities (i.e., 1.3 Fundamental EER Constructs the objects that we want to collect infor- 2. EER MODELING OF REQUIREMENTS mation about) in a database is typically an (STEP 1) order of magnitude less than the number of 2.1 Design Step 1 Details 2.2 An Example Database: data elements. Therefore, using entities as Cohpany Personnel and Projects an abstraction for data elements and focus- 3. TRANSFORMATION OF THE EER MODEL ing on the interentity relationships greatly TO RELATIONS (STEP 2) reduces the number of objects under con- 3.1 Transformation Rules sideration and simplifies the analysis. Al- 3.2 Design Step 2 Details 3.3 Example though it is still necessary to represent data 4. NORMALIZATION OF RELATIONS elements by attributes of entities at the (STEP 3) conceptual level, their dependencies are 4.1 Design Step 3 Details normally confined to the other attributes 4.2 Example 5. REFINEMENTS TO THE LOGICAL within the entity or, in some cases, to those DESIGN PROCESS attributes associated with other entities 5.1 Addition of More Semantics that have a direct relationship to their to Conceptual Modeling entity. 5.2 Relation Refinement for Usage Efficiency The major interattribute dependencies 6. CONCLUSION APPENDIX: SUMMARY OF LOGICAL are between the entity keys (unique iden- RELATIONAL DATABASE tifiers) of different entities that are cap- DESIGN STEPS tured in the ER modeling process. Special ACKNOWLEDGMENTS cases, such as dependencies among data REFERENCES elements of unrelated entities, can be analyzed upon identification in the data analysis. This relational database design approach Although the traditional process is vital to uses both the ER model and the relational the design of relational databases, its com- model in successive stages. It benefits from plexity, particularly in large databases, can the simplicity and ease of use of the entity- be overwhelming to the point where prac- relationship model and the structure (and tical designers often do not bother to mas- associated formalism) of the relational ter it or even use it with any regularity. At model. In order to achieve this approach, it the theoretical level a top-down approach is necessary to build a framework for trans- has been investigated with regard to the forming the variety of ER constructs into universal relation assumption [Beeri et al. relations that can be easily normalized. 1978; Kent 19811. In practice, typically a Before we do this, however, we first define few basic relations are defined by the re- the major steps of the relational design quirements analysis process, and then a methodology. combination of the top-down and bottom- The logical relational design methodol- up approaches is used. The combined ap- ogy (LRDM) is both a refinement and an proach has recently become much more extension of the design methodology pro- popular because of the introduction of a posed in Teorey and Fry [1982]. The basic well-established conceptual design tool, the steps of this methodology, as shown in entity-relationship model, into this process Figure 1, are summarized as follows: [Date 1984; Sweet 1985; Yang et al. 19851. The entity-relationship (ER) model has Step 1. Extended ER Modeling of Re- been most successful as a tool for commu- quirements. The data requirements are

Computing Surveys, Vol. 18, No. 2, June 1986 Logical Design Methodology for Relational Databases l 199 Step 1 Candidate relations associated with all de- Requirements rived FDs and MVDs are then normalized analysis and to the highest degree desired using standard extended ER(EER) manual normalization techniques. Redun- modeling dancies that occur in normalized candidate relations are then analyzed further for pos- EER sible elimination, with the constraint that diagrams data integrity must be preserved. Step2 Transformation of The LRDM methodology simplifies the EER diagrams to approach to designing large relational relations databases by reducing the number of data candidate dependencies that need to be analyzed. relations This is accomplished by introducing a con- Step3 v ceptual design step in the traditional rela- Normalization tional modeling approach. The objective of relations of this step is to capture an accurate rep- normalized resentation of reality using the extended candidate ER model. Data integrity is preserved relations through normalization of the candidate re- V lations formed from the transformation of To physical design the extended ER model. Processing effi- and implementation ciency for query, update, and maintenance of integrity constraints is considered to be Figure 1. Relational database design: basic steps. part of physical design and is not discussed here. Next we build the foundation for the analyzed and modeled using an extended LRDM by defining the extended ER model ER diagram that includes semantics for and providing a graphical representation optional relationships, ternary relation- scheme for it. ships, and subtyping (categories). Pro- cessing requirements are assumed to be 1. ER MODELING AND EXTENDED specified using natural language expres- sions, along with the frequency of occur- CONSTRUCTS rence. Logical views from multiple sources The entity-relationship approach initi- are integrated into a common global view ally proposed by Chen, although modified of the entire database. and extended by others, still remains the premier model for conceptual design. It Step 2. Transformation of the Extended is used to represent information in terms ER Model to Relations. On the basis of a of entities, their attributes, and asso- categorization of extended ER constructs ciations among entity occurrences called and a set of mapping rules, each relation- relationships. ship and its associated entities are trans- Powerful extensions to data models formed into a set of candidate relations. providing greater semantics have been Redundant relations are eliminated. proposed by others ISmith and Smith 1977; Step 3. Normalization of Relations. Hawryszkiewycz 19841. Other researchers Functional dependencies (FDs) are derived have focused their extensions primar- from the extended ER diagram to represent ily on the ER model, in particular the ab- the dependencies among data elements that straction concepts such as generalization are keys of entities. Additional FDs and [Scheuermann et al. 1980; Atzeni et al. multivalued dependencies (MVDs), which 1981; Navathe and Cheng 1983; Sakai represent the dependencies among key and 1983; Elmasri et al. 1985; Ling 19851. Ter- nonkey attributes within entities, are de- nary relationships and composite attrib- rived from the requirements specification. utes were also studied by Ling [1985].

Computing Surveys, Vol. 18, No. 2, June 1986 200 l T. J. Teorey, D. Yang, and J. P. Fry

?elationship 01

(4 (b)

Figure 2. Extended ER (EER) model representations.

Other work has concentrated on such top- erties such as name, color, and weight. ics as existence constraints [Webre 1981; Finally, relationships (formerly called Sakai 19831 or on more general integrity relationship sets) represented real-world constraints [Lenzerini and Santucci 1983; associations among one or more entities. Oren 19851. There are two types of attributes: iden- There is also a large body of work devoted tifiers and descriptors. The former is used to the transformation of the ER model to to uniquely distinguish among the occur- the . Most of the earlier rences of an entity, whereas the latter is work focused on the original ER model used to describe an entity occurrence. En- [Wong and Katz 1980; Date 1985; Martin tities can be distinguished by the “strength” 1983; Howe 1983; Hawryszkiewycz 1984; of their identifying attributes. Strong enti- Briand et al. 19851. Existence dependency ties have internal identifiers that uniquely was added by Webre [1981]. Later trans- determine the existence of entity occur- formation algorithms included abstraction rences. Weak entities derive their existence in an extended ER model [Elmasri et al. from the identifying attributes (sometimes 1985; Ling 19851. Transformations based called external attributes) of one or more on a normal form for ER models have a “parent” entities. Relationships have se- theoretical basis and a strong potential for mantic meaning, which is indicated by the future applications [Chung et al. 1981; connectivity between entity occurrences Jajodia and Ng 1983,1984; Ling 19851. We (one to one, one to many, and many to take a more pragmatic approach by synthe- many), and the participation in this con- sizing recent research and applying it to nectivity by the member entities may be current model implementation methods either optional or mandatory. For example, [Yang et al. 19851. the entity “person” may or may not have a spouse. Finally, each of the entities may 1.1 Original Classes of Objects (ER Model) have one or more synonyms associated with it. The diagrams for representing entities, Initially, Chen proposed three classes of relationships, and attributes are shown in objects: entities, attributes, and relation- Figure 2a. ships (Figure 2a). Entity sets (we drop the term set in our discussion) were the prin- 1.2 Extended Classes of Objects cipal objects about which information was (EER Model) to be collected and usually denoted a per- son, place, thing, or event of informational The original ER model has long been effec- interest. Attributes were used to detail the tively used for communicating fundamen- entities by giving them descriptive prop- tal data and relationship definitions with

Computing Surveys, Vol. 18, No. 2, June 1986 Logical Design Methodology for Relational Databases l 201 the end user. Using the ER model as a is partitioned by different values of a com- representation, how- mon attribute (Figure 2b). For example, the ever, has proved difficult because of the entity EMPLOYEE is a generalization of inadequacy of the initial modeling con- ENGINEER, SECRETARY, and TECH- structs. View integration, for example, re- NICIAN. The generalization object (EM- quires the use of abstraction concepts such PLOYEE) is called an “IS-A” exclusive as generalization [Navathe et al. 19861. hierarchy because each occurrence of the Data integrity involving null attribute val- entity EMPLOYEE is an occurrence of one ues requires defining relationships such and only one of the entities ENGINEER, that a null set on either side of the relation- SECRETARY, TECHNICIAN. ship is either allowed or disallowed. Also, certain relationships of degree higher than 1.3 Fundamental EER Constructs 2 (binary) may be present and are awkward The following classification of EER con- (or incorrect) when represented in binary structs is defined to facilitate development form. The extended ER model provides of a concise and easy to understand EER simple representations for these commonly diagram. used concepts and is compatible with the simplicity of the original ER model. (1) Degree of a relationship. The degree The introduction of the category abstrac- of a relationship is the number of entities tion into the ER model resulted in two associated with the relationship. An n-ary additional types of objects: subset hier- relationship is of degree n. Unary, binary, archies and generalization hierarchies and ternary relationships are special cases [Navathe and Cheng 1983; Elmasri et al. in which the degree is 1, 2, and 3, respec- 19851. The subset hierarchy specifies tively. This is indicated in Figure 3. possibly overlapping subsets, while the (2) Connectivity of a relationship. The generalization hierarchy specifies strictly connectivity of a relationship specifies the nonoverlapping subsets. Both subset ob- mapping of the associated entity occur- jects will transform equivalently to a rela- rences in the relationship. Values for con- tional scheme, but they will nectivity are either “one” or “many.” For differ significantly with regard to update a relationship among entities El, E2, . . . , (integrity) rules. Ei, ..., E, a connectivity of “one” for entity Ei means that given all entities ex- Subset Hierarchy Definition. An entity cept Ei, there is at most one related entity El is a subset of another entity Ez if every occurrence of Ei. occurrence of El is also an occurrence of The actual number associated with the EP. term “many” is called the cardinality of the A subset hierarchy is the case in which connectivity. Cardinality may be repre- every occurrence of the generic entity may sented by upper and lower bounds. Figure also be an occurrence of other entities that 3 shows the basic constructs for connectiv- are potentially overlapping subsets (Figure ity: one to one (unary or binary relation- 2b). For example, the entity EMPLOYEE ship), one to many (unary or binary may include “employees attending college,” relationship), and many to many (unary or “employees who hold political office,” or binary relationship). The shaded area in the unary or binary relationship diamond “employees who are also shareholders” as represents the “many” side, while the specialized classifications. unshaded area represents the “one” side Generalization Hierarchy Definition. An [Reiner et al. 19851. entity E is generalization of the entities El, We use an n-sided polygon to represent n-ary relationships for n > 2 in order to EP, . . . . E, if each occurrence of E is also an occurrence of one and only one of the show explicitly each entity associated with entities El, E2, . . . , E,,. the relationship to be either “one” or “many” related to the other entities. Each A generalization hierarchy occurs when corner of the n-sided polygon connects to an entity (which we call the generic entity) an entity. A shaded corner denotes “many”

Computing Surveys, Vol. 18, No. 2, June 1986 202 l T. J. Teorey, D. Yang, and J. P. Fry

CONCEPT REPRESENTATION 1 EXAMPLE DEGREE unary

binary I OF ternary

CONNECTIVITY I : 1 -o- 1 DEPT j-+jMPLOYEEI

1 :n DEPT EMPLOYEE CONTAINS

m:n -+-/~IPL~Y;!---++ PROJECT ]

MEMBERSHIP 1 OFFICE 1 CLASS mandatory OCCUPIED-BY

optional 4EMPLOYEE

Figure 3. Fundamental EER constructs: relationship types. and an unshaded corner denotes “one.” The example is considered “many” (e.g., each ternary relationship (see Figures 3 and 6b, employee with a given skill could work on Section 2.1, Step 1.3) illustrates this type many projects). This is functionally equiv- of association, which is much more complex alent to the meaning of the functional de- than either a unary or binary relationship. pendencies in Figure lob (Section 3.1.3) for An entity in a ternary relationship is con- this relationship. In Figure 10 we see that sidered to be “one” if only one occurrence each entity considered “one” appears on of it can be associated with one occurrence the right-hand side of exactly one FD. No of each of the other two associated entities. entity considered “many” ever appears on It is “many” if more than one occurrence the right-hand side of an FD. Equivalent of it can be associated with one occurrence FDs are used to express ternary relation- of each of the other two associated entities. ships in Ling [ 19851. In either case, one occurrence of each of A ternary relationship cannot be reduced the other entities is assumed to be given. to equivalent binary relationships if the The relationship SKILL-USED in Fig- relation used to represent it is in 4NF. For ure 3 associates the entities EMPLOYEE, example, SKILL-USED in Figures 3 and PROJECT, and SKILL. Each entity in this lob (Section 3.1.3) is in 4NF and cannot be

Computing Surveys, Vol. 18, No. 2, June 1986 Logical Design Methodology for Relational Databases l 203

SKILL-USED

(a)

SKILL-AVAILABLE ‘4l

(b) Figure 4. Nondecomposable and decomposable ternary relationships ex- pressed as relations. (a) 4NF relation (nondecomposable); (b) 3NF relation decomposable to 4NF relations. decomposed (Figure 4a). However, SKILL- implied by existence dependency in the AVAILABLE, which has the same ER real-world system; for example, an inde- representation as SKILL-USED, is not in pendent (strong) entity associated with a 4NF if all the skills of an employee can be dependent (weak) entity cannot be op- used on all projects worked on by that tional, but the weak entity may be optional. employee (Figure 4b). In such a case Weak entities are sometimes depicted with SKILL-AVAILABLE can be decomposed a double-bordered rectangle (see Figure 2). into two many-to-many binary relation- (4) Object class of entities and relution- ships between EMPLOYEE and PROJ- ships. The basic objects are the n-ary ECT, and EMPLOYEE and SKILL. Each relationships with their associated entities. of these two new relationships represents a Objects resulting from abstraction are the relation in 4NF. generalization hierarchy and the subset (3) Membership class in a relatidnship. hierarchy (see Figure 2). The generalization Membership class specifies whether either hierarchy implies that the subsets are a full the “one” or “many” side in a relationship partition, such that the subsets are disjoint is mandatory or optional. If an occurrence and their combination makes up the full of the “one’‘-side entity must always exist set. The subset hierarchy implies that the for the entity to be included in the system, subsets are potentially overlapping. then it is mandatory. When an occurrence of that entity need not exist, it is considered 2. EER MODELING OF REQUIREMENTS optional. The “many” side of a relationship (STEP 1) is similarly mandatory if at least one entity occurrence must exist, and optional other- The objective of requirements analysis is wise. The optional membership class, de- manifold: (1) to delineate the data require- fined by a “0” on the connectivity line ments of the enterprise, (2) to describe the between an entity and a relationship, is information about the objects and their shown in Figure 3. Membership class is associations needed to model these data

ComputingSurveys, Vol. 18, No. 2, June 1986 204 . T. J. Teorey, D. Yang, and J. P. Fry

requirements, and (3) to determine the bute. For example, in the above store and types of transactions that are intended to city example, if there is some descriptive be executed on the database. We use the information such as STATE and POPU- extended entity-relationship (EER) model LATION for cities, then CITY should be to describe these objects and their inter- classified as an entity. If only CITY-NAME relationships, and assume natural language is needed to identify a city, then CITY expressions to describe transactions. should be classified as an attribute. The EER model enhances the designer’s (2) Multivalued attributes should be clas- ability to capture the real data require- sified as entities. If more than one value of ments accurately because it requires one to a descriptor corresponds to one value of focus on greater semantic detail in the data identifier, this descriptor should be classi- relationships. The semantics of EER allows fied as an entity instead of an attribute, for direct transformations of entities and even though it does not have descriptors relationships to at least 1NF relations and for itself. For example, in the above store specifies clear guidelines for integrity con- and city example, if one store (a chain) straints. Also, abstraction techniques, such could be located in several cities, then as generalization, provide useful tools for CITY should be classified as an entity even integration of user views to define a global if it only needs an identifier CITY-NAME. conceptual schema. Further discussion of (3) Mahe an attribute that has a many- the requirements data collection process to-one relationship with an entity. If a de- can be found in Martin [1982], Teorey and scriptor in one entity has a many-to-one Fry [1982], and Yao [1985]. relationship with another entity, the de- Let us now look more closely at the scriptor should be classified as an entity, basic objects and their relationships that even if it does not have its own descriptors. should be defined during the requirements For example, if two entities have been de- analysis. fined, STORE (with identifier STORE- NUMBER, descriptors OWNER and 2.1 Design Step 1 Details CITY) and STATE, because there is a many-to-one relationship between CITY Step 1.1. Classify entities and attributes. and STATE, CITY should be classified as Although it is easy to define entity, an entity. attribute, and relationship constructs (cf. (4) Attach attributes to entities that they Section l.l), it is not so easy to distin- describe most directly. For example, attri- guish their role in modeling the database. bute OFFICE-BUILDING should be an What makes an object an entity, an attri- attribute of the entity DEPARTMENT bute, or even a relationship? For example, instead of the entity EMPLOYEE. stores are located in cities. Should CITY (5) Avoid composite identifiers as much be an entity or an attribute? Registra- as possible, If an entity has been defined tion records are kept for each student. Is with a composite identifier, that is, an iden- REGISTRATION-RECORD an entity or tifier composed of two or more attributes, a relationship? What is a “normalized” and the components of the identifier are all entity? identifiers of other entities, then eliminate The following guidelines for classifying this entity. The corresponding object could entities and attributes will help the de- be defined as a relationship in a subsequent signer converge to a normalized relation- step. If an entity has been defined with a ship database design. composite identifier, but components of the identifier are not identifiers of other enti- (1) Entities have descriptive information; ties, then there are two possible solutions. identifying attributes do not. If there is de- One is to eliminate this entity and define scriptive information about an object, the new entities with components of the corn- ’ object should be classified as an entity. If posite identifier as entity identifiers, and only an identifier is needed for an object, in a subsequent step define a relationship the object should be classified as an attri- to represent this object. The other solution

Computing Surveys, Vol. 18, No. 2, June 1986 Logical Design Methodology for Relational Databases l 205 is to keep the entity with the composite identifier if it is reasonably natural. As an example, if an entity REGISTRA- TION-RECORD has been defined, with STUDENT and COURSE as a composite identifier, then the entity REGISTRA- I LOCATED-IN \ TION-RECORD could be eliminated, and CLUB SCHOOL two new entities STUDENT and COURSE could be defined; later in a subsequent step, Figure 5. Transitive relationships. a relationship between STUDENT and COURSE could be defined to represent the EMPLOYEE is a generalization of EN- object REGISTRATION-RECORD. In an- GINEER, SECRETARY, and TECHNI- other example, if an entity VOLLEY- CIAN. Then we reattach attributes to the BALL-TEAM has been defined, with entities. We put identifier EMP-NO and COUNTRY and GENDER as a composite generic descriptors EMP-NAME, HOME- identifier, then it seems suitable to keep ADDRESS, DATE-OF-BIRTH, JOB- this entity because defining an entity GEN- TITLE, and SALARY in the generic entity DER is not very natural. EMPLOYEE; we put identifier EMP-NO and specific descriptor SPECIALITY in The procedure of identifying entities entity ENGINEER, we put identifier and attaching attributes to entities is iter- EMP-NO and specific descriptor SPEED- ative: classifying some objects as entities, OF-TYPING in entity SECRETARY; and attaching identifiers and descriptors to we put identifier EMP-NO and specific de- them, then finding some violation to the scriptors SKILL, YEARS-OF-EXPERI- above guidelines, changing some objects ENCE in entity TECHNICIAN. from entity to attribute or from attribute to entity, then attaching attributes to the new entities, etc. Step 1.3. Define relationships.

Step 1.2. Identify the generalization We now deal with objects that were hierarchies and subset hierarchies. not classified as entities or attributes, but represent associations among objects. We If there is a generalization or subset define them as relationships. For every re- hierarchy among entities, then reattach at- lationship the following should be specified: tributes to the relevant entities. Put iden- degree, connectivity, membership class, tifier and generic descriptors in the generic and attributes. entity, and put identifier and specific de- The following are some guidelines for scriptors in the subset entities. defining relationships. For example, suppose that the following entities were identified in the EER model: (1) Redundant relationships should be EMPLOYEE (with identifier EMP-NO eliminated. Two or more relationships that and descriptors EMP-NAME, HOME- are used to represent the same concept are ADDRESS, DATE-OF-BIRTH, JOB- considered redundant. Redundant relation- TITLE, SALARY, SKILL), ENGINEER ships are more likely to result in unnor- (with identifier EMP-NO and descrip- malized relations when transforming the tors EMP-NAME, HOME-ADDRESS, EER model into relational schemas. Note SPECIALTY), SECRETARY (with iden- that two or more relationships are allowed tifier EMP-NO and descriptors EMP- between the same two entities as long as NAME, DATE-OF-BIRTH, SALARY, the two relationships have different mean- SPEED-OF-TYPING), TECHNICIAN ings. They are not considered redundant. (with identifier EMP-NO and descrip- One important case of redundancy is tors EMP-NAME, SKILL, YEARS- transitive dependency (see Figure 5). If OF-EXPERIENCE). We identify that BELONGS-TO is a many-to-one relation-

ComputingSurveys, Vol. 18, No. 2, June 1986 206 . T. J. Teorey, D. Yang, and J. P. Fry

(a) (b)

Figure 6. Comparison of binary and ternary relationships.

ship between STUDENT and CLUB, LO- Typically, when the design is large and CATED-IN is a many-to-one relationship more than one person is involved in re- between CLUB and SCHOOL, ATTENDS quirements analysis, multiple views of data is a many-to-one relationship between and relationships occur. These views must STUDENT and SCHOOL, and both BE- eventually be consolidated into a single LONGS-TO and LOCATED-IN are man- global view to eliminate redundancy and datory, then ATTENDS is redundant and inconsistency from the model. View inte- should be eliminated. gration requires the use of such extended (2) Ternary relationships must be de- ER semantic tools as identity (identifying fined carefully. We define a ternary rela- synonyms), aggregation, and generaliza- tionship among three entities only when tion. the concept (association) cannot be repre- Recent research has advanced view sented by several binary relationships integration from a representation tool among those entities. For example, there is [Teorey and Fry 19821 to heuristic algo- an association among entities TEACHER, rithms [Elmasri and Wiederhold 1979; STUDENT, and PROJECT. If each stu- Navathe and Gadgil 1982; Navathe et al. dent can be involved in many projects and 1984; Elmasri et al. 1985; Navathe et al. can work under the instruction of several 19861. These algorithms are typically inter- teachers for any of these projects, and if active, allowing the database designer to each teacher can instruct many students make decisions based on suggested alter- on any project, then two binary relation- native integration actions. Adopting an ships could be defined instead of one ter- ER extension called the entity-category- nary relationship (Figure 6a). If, however, relationship model [Elmasri et al. 19851, each student can be involved in several Navathe and others have organized the dif- projects and work under the instruction of ferent classes of objects and relationships several teachers, but, if for every project into forms that are either compatible or the student works under the instruction of incompatible for view integration [Navathe exactly one teacher, then a ternary rela- et al. 19861. A category is defined as a tionship must be defined (Figure 6b). subset of entities from an entity type, thus The meaning of connectivity for ternary representing a form of generalization hier- relationships is important. Figure 6b shows archy. An object class is a set of entities that for a given pair of occurrences of STU- that is either an entity type or category. DENT and PROJECT, there is only one Prior to actual integration, the algorithm corresponding occurrence of TEACHER; performs several functions on object however, for a given pair of occurrences of classes: First, it establishes naming conven- TEACHER and STUDENT, there could tions for object classes and attributes to be many corresponding occurrences of resolve synonyms and homonyms; second, PROJECT. it defines the candidate keys and attribute domains for each object class; and third, it Step 1.4. Integrate multiple views of en- defines mappings between equivalent attri- tities, attributes, and relationships. butes of corresponding object classes. The I

Computing Surveys, vol. 18, No. 2, June 1986 Logical Design Methodology for Relational Databases l 207

PART-OF

SKILL

- \ CONTAIN& /’

BELONGS-TO

PRF-ASSOC PC

Figure 7. Company personnel and project database (EER diagram). integration process is applied to four pos- advanced the automatability of the view sible forms of object class similarity: iden- integration process. tical domains, contained (subset) domains, overlapping domains, and disjoint domains. 2.2 An Example Database: Company Overlapping domains closely correspond to Personnel and Projects our definition of two categories related to each other in a subset hierarchy, and the We define a simple database design prob- disjoint domain form corresponds to a gen- lem to illustrate the major steps in this eralization hierarchy. The subset domain is relational database design methodology. similar to the relationship between a cate- Let us suppose it is desirable to build a gory and its generic entity type. company-wide database for a large engi- Relationships are classified in terms of neering firm that keeps track of all per- their degree, the role of each object class in sonnel, their skills and projects assigned, the relationship, and various constraints, departments worked in, and personal such as cardinality constraints, that may computers allocated. Each employee is differ among object classes. Relationships given a job title (engineer, technician, sec- with equal degree, the same roles, and the retary, manager). Engineers and techni- same cardinality constraints are easy to cians work on an average of two projects at merge. Those with differing characteristics one time, and each project could be head- are more difficult and in some cases impos- quartered at a different location (city). We sible to merge. assume that analysis of the detailed re- Although much more work is needed to quirements for data relationships in this understand the semantic impact of view company results in the global-view EER integration techniques, it is clear that cur- diagram in Figure 7, which becomes the rent ER extensions have significantly focal point for developing the normalized

Computing Surveys, Vol. 18, No. 2, June 1986 208 l T. J. Teorey, D. Yang, and J. P. Fry relations. Each relationship in Figure 7 is one for one of the entities, and with a unary based on a verifiable assertion about the relationship that is one to one or one to actual data in the company. Analysis of many for each entity. those assertions leads to the transforma- (3) Relationship relation with the for- tion of EER constructs into candidate eign keys of all the entities that are thus relations, as shown in Figures 8-12 and related. This transformation always occurs summarized in Figure 13 (see Section 3). for relationships that are binary and many Attributes are not included in Figures to many, relationships that are unary and 8-13 for simplicity, but are defined later many to many, and all relationships that in this example. are ternary or of a higher degree. As an example of view integration, the generalization of EMPLOYEE over JOB- The following rules are set forth for han- TITLE could represent the consolidation dling null values in these transformations. of two views of the database, one based on Nulls are only allowed for foreign keys of EMPLOYEE as the basic unit of personnel any optional entity in an entity relation, and the other based on the classification of but are not allowed for foreign keys of any the employees by job titles and special re- mandatory entity in an entity relation. lationships with those classifications, such Nulls are also not allowed for any foreign as the allocation of personal computers key in a relationship relation. (PCs) to engineers. 3.1.1 Two Entities, One (Binary) Relationship

3. TRANSFORMATION OF THE EER MODEL The one-to-one relationship between enti- TO RELATIONS (STEP 2) ties is illustrated in Figure 8a, b, and c. When both entities are mandatory (Figure 3.1 Transformation Rules 8a), each entity becomes a relation, and the We now look at each EER construct in key of either entity can appear in the other entity’s relation as a foreign key. One of more detail to see how each transformation rule is defined and applied. Our example is the entities in an optional relationship (see drawn from the company personnel and DEPARTMENT in Figure 8b) should con- project database EER schema, illustrated tain the foreign key of the other entity in in Figure 7 (Section 2.2), which indicates its transformed relation. The other entity the transformation of all types of EER (EMPLOYEE) could also contain a foreign constructs to relations. key (of DEPARTMENT), with nulls al- We note that the basic transforma- lowed, but would require more storage tions result in three types of relations space because of the much greater number [McGee 1974; Martin 1983; Sakai 1983; of EMPLOYEE entity occurrences than Hawryszkiewycz 19841: DEPARTMENT entity occurrences. When both entities are optional (Figure 8c), either (1) Entity relation with the same infor- entity could contain the embedded foreign mation content as the original entity. This key of the other entity, with nulls allowed. transformation always occurs for entities The one-to-many relationship is shown with binary relationships that are many to as either mandatory or optional on the many, one to many on the one (parent) “many” side without affecting the transfor- side, or one to one on one side; entities mation. On the “one” side it may be either with unary relationships that are many to mandatory (Figure Bd) or optional (Figure many; and entities with any ternary or 8e). In all cases the foreign key must appear higher degree relationship, generalization on the ‘%nany” side, which represents the hierarchy, or subset hierarchy. child entity, with nulls allowed for foreign (2) Entity relation with the embedded keys only in the optional “one” case. foreign key of the parent entity. This The many-to-many relationship, shown transformation always occurs with binary here as totally optional, requires a relation- relationships that are one to many for the ship relation with primary keys of both entity on the many (child) side and one to entities (Figure 8f). The same transforma-

Computing Surveys, Vol. 18, No. 2, June 1986 Every apprentice has one sponsor, Every employee works in exactly one 1APPRENTICEI andeverysponsorsponsorsone department. Every department could apprentice. contain many employees. CONTAINS Relations : SPONSORED Relations: -BY DEPARTMENTtDEPT-NO, ) APPRENTICE(EMP-NO, SPON-EMP-JO) 0 EMPLOYEE(EMP-NO, .,_DEE:NC> 6 SPONSOR(SPON-EMP-NO, .: .T Null DEPT-NO not allowed in EMPLOYEE Null SPON-EMP-NO not allowed in APPRENTICE. (4 (4

Each engineer can have at most one Every department must have a manager. SECRETARY An employee can be a manager of at secretary. One secretary could work most one department. Q for several engineers. Relations : DEPARTMENT(DEPT-NO, ‘---EMP-NO) -- ENGINEER(EMP-NO, ,sE_C-EMP-NO)----- EMPLOYEE(EMP-NO, ) SECRETARYtEMP-NO, )

Null EMP-NO not allowed in DEPARTMENT Null SEC-EMP-NO allowed in ENGINEER (b) (4

Every professional association could have ( ENGINEER ) Some personal computer (PCs) are many members who are engineers, or no allocated to engineers, but not engineers. Every engineer could belong to necessarily to all engineers. many professional associations, or none.

Relations : Relations : ENGINEER(EMP-NO, p_C-No) PRF-ASSOC(PA-NO.. ) PC(PC-NO, ) ENGINEERtEMP-NO, ) BELONGS-TO(PA-Nfl EMP-NO) Null PC-NO allowed in ENGINEER. (c) (f)

Figure 8. Binary relationship transformation rules. 210 l T. J. Teorey, D. Yang, and J. P. Fry tion applies to either the optional or man- When all relationships are “many” (Fig- datory case. Embedded foreign keys are not ure lob), the relationship relation is all key possible because of the “many” property in unless the relationship has its own attri- both directions. butes. In general the number of entities with connectivity “one” determines the lower bound on the number of FDs. 3.1.2 One Entity, One (Mary) Relationship One entity with a one-to-one relationship 3.1.4 Generalization and Subset Hierarchies implies some form of entity occurrence pairing, as specified by the relationship The generalization hierarchy resulting in name, and this must be either completely disjoint subsets is produced by partition- optional or completely mandatory. In both ing the generic entity by different values the mandatory case (Figure 9a) and the of a common attribute, for example, JOB- optional case (Figure 9b) the pairing entity TITLE in Figure 11. The transformation key appears as a foreign key in the resulting of disjoint subset generalization produces relation. In both cases the two key attri- a separate relation for the whole set (the butes are taken from the same domain but generic entity) and each of the subsets. The are given different names to designate their generic entity relation contains the generic unique use. The one-to-many relationship entity key and all common attributes, in- requires a foreign key in the entity relation cluding the common attribute used for par- for both the optional case (Figure SC), with titioning. This, of course, assumes that nulls allowed, and the mandatory case (Fig- such an attribute for partitioning actually ure 9d), with nulls not allowed. The many- exists. When the attribute does not exist, to-many relationship is shown as optional it must be created. (Figure 9e) and uses a relationship relation; Each subset relation contains the generic it could also be defined as mandatory (using entity key and only attributes specific to the word “must” instead of “may”) but that subset. Update integrity is maintained having the same transformation as the by requiring all insertions and deletions to optional case. occur in both the set (generic entity) rela- tion and relevant subset relation. If the change is to the key, then one subset, as 3.1.3 n Entities, One (n-ary) Relationship well as the set relation, must be updated. A @ > 2) change to a nonkey attribute affects either An n-ary relationship has n + 1 possible the set or one subset relation. varieties of connectivity: all n sides with Overlapping subsets are produced by par- connectivity “one,” n - 1 sides with con- titioning the generic entity by values of nectivity “one” and one side with connec- different attributes (Figure 12). The trans- tivity “many,” n - 2 sides with connectivity formation of this construct produces sepa- “one” and two sides with “many,” and so rate relations for the generic entity and on, until all sides are “many.” The four each of the subset entities. The key of each possible varieties of a ternary relationship relation is the key of the generic entity, and are shown in Figure 10. All varieties are whereas the generic entity relation contains transformed by creating a relationship re- only common attributes, the subset rela- lation containing the primary keys of all n tions contain attributes specific to that sub- entities; in each case, however, the meaning set entity. Thus the transformation rules of the keys is different. When all relation- for the disjoint and overlapping subsets are ships are “one” (Figure lOa), the relation- the same. ship relation consists of three possible The integrity rules between these two distinct candidate keys. This represents the cases are different, however. With overlap- fact that there are three functional depend- ping subsets, deletion from the set (generic encies (FDs) needed to describe this rela- entity) relation cascades to anywhere from tionship. The optional “one” allows null none to all of the subsets. Also, before foreign keys; the mandatory “one” does not. insertion into a subset relation, it is

Computing Surveys, Vol. 18, No. 2, June 1986 Logical Design Methodology for Relational Databases l 211

APPRENTICE Every apprentice has exactly one of the other apprentice as a partner in a Project

Relations : APPRENTICE(EMP-NO, ~-E~P-~O) wPARTNER-OF 64

EMPLOYEE An employee could have one of the other employee as his or her spouse.

Relations : EMPLOYEEtEMP-NO, S_P-EE-NO)- vMARRIED-TO Null SP-EMP-NO allowed in EMPLOYEE (b)

Engineers are divided into groups for certain projects. Each group has a leader

Relation : ENGINEER(EMP-NO, .,_ENG=E_M_P=N@ GROUP-LEADER-OF Null ENG-EMP-NO allowed in ENGINEER. (c)

Every apprentice tutors one of the other apprentices, One may be tutored by several other apprentices.

Relation : APPRENTICE(FMP-NO, ., _AP_P:E_M!3Q) TUTORSw Null APP-EMP-NO not allowed in APPRENTICE. (4

PROJECT Each project may require special communication with many other projects.

Relations : PROJECT(PROJ-NO, .) SPEC-COMM-WITH(PROJ-NAME, RELA-PROJ-NAME) SPEC-COMM-WITH6 (4

Figure 9. Unary relationship transformation rules.

Computing Surveys, Vol. 18, No. 2, June 1986 An engineer will use one casebook for a given project. Different engineers use different casebooks for the same project. No engineer will use the same casebook for different projects, but different engineers can use the ( ENGINEER 1 same casebook for different projects,

Relations : ENGINEER(ENP-NO, 1 USE-CASEBOOK PROJECT(PROJ-NAME, 1 CASEBOOK(BOOK-NO, 1 USE-CASEBOOK(ENP-NO, PROJ-NAME, BOOK-NO)

PROJECT CASEBOOK FDs : ENP-NO, PROJ-NAME ---> BOOK-NO BOOK-NO, PROJ-NAME ---> ENP-NO ENP-NO, BOOK-NO ---> PROJ-NAME

USE-CASEBOOK

(a)

Employees use a wide range.of different skills on each project they are associated ENGINEER with. I lelations : .NPLOYEE(FNP-NO, 1 SKILLCSKILL-NO, 1 PROJECT(PROJ-NAME, ) / SKILL-USEDVMP-NO. SKILL-NO. PROJ-NAME) FDs : ENP-NO, SKILL-NO, PROJ-NO ---> @I (all key)

SKILL-USED

(b) Figure 10. Ternary relationship transformation rules. EMPLOYEE Employees are assigned to one or more projects, but can only be assigned to at most one project at a given location.

i-1 Relations : EMPLOYEE(EMP-NO, 1 PROJECT(PROJ-NAMF, ) LOCATION(LOC-NAME, .I ASSIGNED-TO(EMP-NO. LOC-NAME,PROJ-NAME) FDs : EMP-NO. LOC-NAME ---> PROJ-NAME

ASSIGNED-TO

Apprentices work on projects under instructions of sponsors. No sponsor can instruct any given APPRENTICE apprentice on more than one project. No apprentice can work on any given project under the instructlon w of more than one sponsor. SPONSORS Relations : A APPRENTICE(FMP-NO, ) SPONSOR(EMP-NO, ) PROJECT(PROJ-NAME, ) SPONSORS(SPON-EMP-NO, APP-EMP-NO, PROJ-NAME)

FDs : APP-EMP-NO,SPON-EMP-NO ---> PROJ-NAME APP-EMP-NO,PROJ-NAME ---> SPON-EMP-NO

SPONSORS

APP-EMP-NO 1 SPON-EMP-NO PROJ-NAME I 101 3 BETA 101 9 EPSILON 207 9 ALPHA 207 4 DELTA 512 4 GAMMA 512 9 ALPHA 763 6 BETA (4 Figure 10. (continued) 214 . T. J. Teorey, D. Yang, and J. P. Fry

Different types of employees are partitioned by values of a common attribute JOB-TITLE.

Relations : EMPLOYEE(EMP-NO, JOB-TITLE, common attributes) JOE-TITLE EMP.MANAGER(EMP-NO, specific attributes) EMP.SECRETARY(EMP-NO, specific attributes) EMP .TECHNICIAN(EMP-NO, MANAGER SECRETARY TECHNICIAN specific attributes)

Figure 11. Generalization hierarchy.

Employees with special situations are shown as overlapping subsets based on partitions on values of different attributes.

Relations : EMPLOYEE(EMP-NO, common attr-lbutes) EMP.STUDENT(EMP-NO, specific attributes) EMP.POLITICIAN(EMP-NO< specific attributes)

Figure 12. Subset hierarchy. necessary to check whether a tuple with the to establish their uniqueness. Otherwise same key value exists in the set relation. A they have the same transformation prop- change to a nonkey attribute affects the set erties as strong entities, and no special rules or one of the subsets. A change to a key are needed. When a weak entity is already affects the set and anywhere from none to derived from two or more strong entities in all of the subsets. the ER diagram, it can be directly trans- formed into an entity relation without 3.1.5 Multiple Relationships further change.

Multiple relationships among n entities are 3.1.7 Aggregation always considered to be completely inde- pendent. Each one-to-one or one-to-many The aggregation abstraction can occur relationship results in entity relations that among entities or relate attributes to a sin- are either equivalent or that differ only in gle entity [Smith and Smith 19771. Aggre- the addition of a foreign key and thus can gation among entities, defined by the be merged into a single entity relation con- PART-OF relationship, is a special case taining all foreign keys. Many-to-many re- of the collection of one-to-one or one-to- lationships result in relationship relations many binary relationships and can be that are unique and cannot be merged. transformed as defined in Section 3.1.1. For example, BICYCLE can represent the whole entity, while SEAT, PEDALS, 3.1.6 Existence-Dependent (Weak) Entities HANDLEBARS, etc., represent its parts, Weak entities differ from (strong) entities each part being an entity with its own only in the need for keys from other entities distinct attributes.

Computing Surveys, Vol. 18, No. 2, June 1986 Logical Design Methodology for Relational Databases 9 215 3.2 Design Step 2 Details such a transformation, normalization could easily be reestablished using any of the The following steps summarize the basic well-known methods [Bernstein 1976; transformation rules given in Section 3.1. Fagin 1977; Ullman 1980; Lien 1981; Step 2.1. Transform each entity into a Zaniolo and Melkanoff 1981; Maier 1983; relation containing the key and nonkey at- Martin 1983; Yao 19851. tributes of the entity. If there is a many- to-one relationship between an entity and 3.3 Example another (or same type) entity, add the key of the entity on the “one” side (the parent) The transformation of EER diagrams to into the relation. If there is a one-to-one candidate relations is applied to our exam- relationship between an entity and another ple database of company personnel and (or same type) entity, then add the key of projects, as shown in Figures 7-12, and one of the entities into the relation for the summarized in Figure 13. A summary of other entity. The addition of a foreign key the transformation of all entities and due to a one-to-one relationship can be their relationships to candidate relations made in either direction. One strategy is to (Steps 2.1-2.3) is illustrated in Table 1. maintain the most natural parent-child re- Primary keys are italicized. We include lationship by putting the parent key into some of the most typical nonkey attributes the child relation. Another strategy is based we assume have been obtained from the on efficiency: Add the foreign key to the requirements analysis. relation with fewer tuples. Every entity in a generalization hier- 4. NORMALIZATION OF RELATIONS archy or subset hierarchy is transformed (STEP 3) into a relation. Each of these relations con- tains the key of the generic entity. The Normalization of candidate relations is ac- generic entity relation also contains nonkey complished by analyzing the FDs and values that are common to all the entities MVDs associated with those relations. so related, and the other relations contain Further analysis is then needed to elimi- nonkey values specific to each subtype en- nate data redundancies in the normalized tity. Another option is to include all sub- candidate relations. type attributes in the supertype (generic) entity and allow null values. 4.1 Design Step 3 Details Step 2.2. Transform every many-to- Step 3.1. Derive the primary FDs from many binary (or unary) relationship into a the EER diagram. relationship relation with the keys of the entities and the attributes of the relation- Primary FDs represent the dependencies ship. among data elements that are keys of en- tities, that is, the interentity dependencies. Step 2.3. Transform every ternary (or Secondary FDs, on the other hand, repre- higher n-ary) relationship into a rela- sent dependencies among data elements tionship relation using the rules given in that comprise a single entity, that is, the Figure 10. intraentity dependencies (see Step 3.2). Entity normalization is normally pre- Table 2 shows the type of primary FDs served under these transformations. The derivable from each type of EER construct introduction of a foreign key into an entity defined in Section 1.3 and consistent with relation will not result in additional func- the derivable candidate relations in Figures tional dependencies in a normalized rela- 8-12. In fact, each primary FD is associated tion if the EER diagram is correct, that is, with exactly one candidate relation that if all attributes are associated with the represents a relationship among entities in proper entities. If the EER diagram is not the EER diagram. correct, however, additional FDs could oc- On the basis of the transformations in cur, causing some denormalization. After Table 2 we summarize the basic types of

Computing Surveys, Vol. 18, No. 2, June 1986 216 l T. J. Teorey, D. Yang, and J. P. Fry

SKILL DEPARTMENT DIVISION li*ulil) /OIPT-NOI (,,,...I

SKILL-USED

PROJECT EMPLOYEE

ASSIGNED-TO

LOCATION

EMP.MANAGER EllP.ENGINEER EMP.TECHNICIAN EMPSECRETARY

BELONGS-TO

PA-NO EMP-NO

PRF-ASSOC PC

Figure 13. Company personnel and project database candidate relations.

Table 1. Transformation of Entities and Relationships to Relations (Example) Step 2.1. Entities to relations 1. DIVISION(DZV-NO, . . . , HEAD-EMP-NO) 2. DEPARTMENT(DEZ’T-NO, DEPT-NAME, ROOM-NO, PHONE-NO,. . . , DIV-NO, MANAG-EMP-NO) 3. EMPLOYEE(EMP-NO, EMP-NAME, JOB-TITLE,. . . , DEPT-NO, SPOUSE-EMP-NO, PC-NO) 4. SKILL(SKILL-NO, . . .) 5. PROJECT(PROJ-NAME, . . .) 6. LOCATION(LOC-NAME, . . .) 7. EMP.MANAGER(EMP-NO, . . .) 8. EMP.ENGINEER(EMP-NO, , . .) 9. EMP.TECHNICIAN(EMZ=-NO, . . .) 10. EMP.SECRETARY(EMZ’-NO, .) 11. PC(PC-NO, .) 12. PRF-ASSOC(PA-NO, . . .)

Step 2.2. Binary or unary relationships to relations 13. BELONGS-TO(PA-NO, EMP-NO)

Step 2.3. Ternary (or any n-ary) relationships to relations 14. SKILL-USED(EMP-NO, SKILL-NO, PROJ-NAME) 15. ASSIGNED-TO(EMP-NO, LOC-NAME, PROJ-NAME)

Computing Surveys, Vol. 18, No. 2, June 1986 Logical Design Methodology for Relational Databases l 217

Table 2. Primary FDs Derivable from EER Relationship Constructs Degree Connectivitv Primarv FD Binary 1 to 1 Keycone A) + key(one B) Key (one B) + keytone A) 1 to l(opt) Key(one A) + keytone B) Keytone B) --) keycone A) l(opt) to l(opt) Keytone A) --) keytone B) Key(one B) + keytone A) 1 to many Key(many) + keytone) l(opt) to many Key(many) -+ keytone) Many to many Composite key + 0 Unary 1 to 1 Key(one A) + key(one B) Keytone B) --) keyfone A) l(opt) to l(opt) Keytone A) + keytone B) Keycone B) + key(one A) l(opt) to many Key(many + key(one) 1 to many Key(many) + key(one) Many to many Composite key + 0 Ternary 1 to 1 to 1 Key(A), key(B) ---) key(C) Key(A), key(C) -+ key(B) Key(B), key(C) + key(A) 1 to 1 to many Keytone A), key(many) + key(one B) Keycone B), key(many) + key(one A) 1 to many to many Keycmany A), key(many B) + key(one) Many to many to many Composite key + 0 Generalization hierarchy (Secondary FD only) Subset hierarchy (Secondary FD only) primary FDs derivable from EER relation- Step 3.2 Examine all the candidate re- ship constructs: lations for MVDs and secondary FDs. Each candidate relation is examined to (1) key (many side) + determine what dependencies exist among key (one side); primary key, foreign key, and nonkey attri- (2) key (one side A) + key (one side B); butes. If the EER constructs do not include (3) key (many side A), nonkey attributes, the data requirements specification (or data dictionary) must be key (many side B) -+ consulted. key (one side); (4) key (one side A), key (many side) + Step 3.3. Normalize all candidate rela- key (one side B); tions to the highest degree desired, elimi- (5) key (one side A), nating any redundancies that occur in the key (one side B) + normalized relations. key (one side C); Each candidate relation now has possibly (6) composite key + 0. some primary FDs, secondary FDs, and MVDs uniquely associated with it. These Types (1) and (2) represent an embedded dependencies determine the current degree foreign key functionally determined by the of normalization of the relation. Any of the primary key in a unary or binary relation- well-known techniques for increasing the ship; types (3)~(5) apply only to ternary degree of normalization can now be applied relationships; and type (6) applies to all to each relation, with the desired degree as degrees of relationships in which the rela- stated in the requirements specification. tion is represented as all key. Functional The elimination of data redundancy dependencies for higher degree n-ary rela- tends to minimize storage space and update tionships can be obtained by extending cost without sacrificing data integrity. In- (3)-(6). tegrity is maintained by constraining the

Computing Surveys, Vol. 18, No. 2, June 1986 218 l T. J. Teorey, D. Yang, and J. P. Fry

new relation schema to include all data Table 3. Functional Dependencies Derived dependencies existing in the candidate nor- from the EER Diaaram IExamoleI malized relation schema. A relation B that 1. DIV-NO + HEAD-EMP-NO is subsumed by another relation A can be 2. HEAD-EMP-NO -+ DIV-NO 3. DEPT-NO + DIV-NO potentially eliminated. Relation B is sub- 4. DEPT-NO + MANAG-EMP-NO sumed by another relation A when all the 5. MANAG-EMP-NO + DEPT-NO attributes in B are also contained in A and 6. EMP-NO + DEPT-NO all data dependencies in B also occur in A. 7. EMP-NO + JOB-TITLE As a trivial case, any relation containing 8. EMP-NO + SPOUSE-EMP-NO 9. SPOUSE-EMP-NO + EMP-NO only a composite key and no nonkey attri- 10. EMP-NO + PC-NO butes is automatically subsumed by any 11. PC-NO + EMP-NO other relation containing the same key at- 12. EMP-NO, SKILL-NO, PROJ-NAME + 0 tributes, because the composite key is the 13. EMP-NO, LOC-NAME + PROJ-NAME weakest form of data dependency. If, how- 14. EMP-NO, PA-NO --+ 0 ever, relations A and B represent the ge- neric and specific cases, respectively, of entities defined by the generalization (or Table 4. Secondary Functional Dependencies (Example) subset) hierarchy abstraction, and A sub- sumes B because B has no additional spe- 1. DEPT-NO + DEPT-NAME 2. DEPT-NO -+ ROOM-NO cific attributes, the designer must collect 3. ROOM-NO + PHONE-NO and analyze additional information to de- 4. EMP-NO + EMP-NAME cide whether or not to eliminate B. A relation can also be subsumed by the construction of a join of two other relations PHONE-NO) and deleting PHONE-NO (a “join” relation). When this occurs, the from the DEPARTMENT relation. elimination of a subsumed relation may This example contains no data redun- result in the loss of retrieval efficiency, dancies that can be eliminated without los- although storage and update costs will tend ing some degree of normalization. In Table to be decreased. This trade-off must be 1 neither SKILL-USED nor ASSIGNED- further analyzed in physical design with TO relations can be subsumed by joins of regard to processing requirements to deter- other relations. SKILL-USED has been mine whether elimination of the subsumed defined as 4NF and cannot be lossless de- relation is reasonable. composed. ASSIGNED-TO contains the functional dependency EMP-NO, LOC- 4.2 Example NAME + PROJ-NAME, which cannot be In Step 3.1 we obtain the primary FDs by reconstructed with any of the other rela- applying the rules in Table 2 to each rela- tions. Thus the final normalized relations tionship in the EER diagram in Figure 7. are now completely defined (Table 5). In The results are shown in Table 3. general, we observe that the candidate re- In Step 3.2 we determine the secondary lations are good estimators of the final FDs and MVDs from the EER diagram or schema and normally require very little requirements specification. Let us assume refinement. that the dependencies given in Table 4 are derived from the requirements specifica- 5. REFINEMENTS TO THE LOGICAL tion. DESIGN PROCESS Normalization of the candidate relations 5.1 Addition of More Semantics is accomplished in Step 3.3. In Table 1, to Conceptual Modeling only the DEPARTMENT relation is not at least 3NF due to the transitive functional Conceptual modeling of databases is by no dependency DEPT-NO + ROOM-NO + means confined to the ER approach. A PHONE-NO. This is easily resolved by number of other schools of thought have creating the relation ROOM(ROOM-NO, received attention and some perhaps offer

Computing Surveys, Vol. 18, No. 2, June 1986 Logical Design Methodology for Relational Databases l 219

Table 5. Normalized Relations (Example) 1. DIVISION(DZV-NO. HEAD-EMP-NO. . . .) 2. DEPARTMENT(Dk”-NO, DEPT-NAME, ROOM-NO,. . , DIV-NO, MANAG-EMP-NO) 3. EMPLOYEE(EMP-NO, EMP-NAME, JOB-TITLE,. . . , DEPT-NO, SPOUSE-EMP-NO, PC-NO) 4. SKILL(SKILL-NO, . .) 5. PROJECT(PROJ-NAME, . . .) 6. LOCATION(LOC-NAME, .) 7. EMP.MANAGER(EMP-NO, . . .) 8. EMP.ENGINEER(EMP-NO, . . .) 9. EMP.TECHNICIAN(EMP-NO, . . .) 10. EMP.SECRETARY(EMP-NO, . . .) 11. PC(PC-NO, . . .) 12. PRF-ASSOC(PA-NO, . . .) 13. BELONGS-TO(PA-NO, EMP-NO) 14. SKILL-USED(EMP-NO. SKILL-NO. PROJ-NAME) 15. ASSIGNED-Tb(EMP-hi0, LOC-NAME, PROJ-NAME) 16. ROOM(ROOM-NO, PHONE-NO)

a richer semantic base than the ER model. tivity (cardinality), and membership class The application of the semantic network (mandatory or optional existence). model to conceptual schema design was One of the most significant advantages shown by Bachman [1977] and McLeod of the binary relationship model is the in- and King [1979], and the binary relation- clusion of constraints on role occurrences: ship model concepts were studied by Abrial identifier, subset, equality, uniqueness, and [1974], Bracchi et al. [1976], Nijssen et al. disjoint. Identifiers are included in the ER [1979], IS0 [1982], and Kent [1984]. The model, but not all the variations explicitly significant semantic network concepts have defined in the binary relationship model. been incorporated into the EER model de- The disjoint constraint can be modeled by scribed here, but the binary relationship the generalization and subset hierarchies in model incorporates considerably more se- EER, but the subset, equality, and unique- mantics than either the ER or EER models ness contraints have no equivalent in EER. [ISO 19821. Other extensions to the original Further work is needed to investigate the ER model, such as the inclusion of the time potential of these constraints as additional dimension, have also been described else- extensions to the ER model and whether where [Bubenko 1977; Clifford and Warren semantic equivalence can be achieved be- 1983; Ferg 19851. tween these (and other) approaches. The binary relationship approach is the basis of the information analysis method 5.2 Relation Refinement for Usage Efficiency called NIAM [Verheijen and Van Bekkum 19821. This approach defines a lexical ob- Database design techniques for network ject type, nonlexical object type, and role; and hierarchical systems often make use of these roughly correspond to the attribute, processing requirements to refine the logi- entity, and relationship concepts in the ER cal schema before the physical design phase model. However, the binary relationship if there are obvious large efficiency gains to model tries to avoid making entity-attri- be made [Teorey and Fry 1982; Bertaina et bute decisions as early in the conceptual al. 1983; Hawryszkiewycz 19841. The justi- modeling process as with the ER model. fication for this approach is that once phys- One difference between the models is that ical design begins, the is role names in the binary relationship model considered to be fixed and is thus a con- are directional between two lexical object straint on efficiency. The database designer types or between a lexical and nonlexical would obviously like to remove this inflex- object type. The binary relationship model ibility if possible. A similar technique would also includes the EER concepts of subtyp- be useful in relational database design if it ing (generalization), relationship connec- were to result in more efficient database

Computing Surveys, Vol. 18, No. 2, June 1986 220 l T. J. Teorey, D. Yang, and J. P. Fry

schemas without loss of data integrity and 6. CONCLUSION were relatively easy to implement. We have shown that a practical step-by- This approach is not considered part of step methodology for relational database logical design by many, whereas others de- design can be derived using a variety of fine it as a separate step between logical extensions to the ER conceptual model. and physical design. We take the latter view The methodology has been illustrated with and think of it as a prephysical design step, a simple database design problem, showing definitely not part of conceptual design, but each design step in detail. The strategy of potentially an extension of logical design first modeling the natural data relation- defined as broader in scope than conceptual ships and later refining the design for nor- design. Regardless of where this step be- malized relations was emphasized as clearly longs, its goal is to refine a relational separable phases. The methodology pro- schema. Its algorithm is based on a process- duces nearly reproducible designs from a oriented or usage view of the data to in- given requirements specification and can crease the database efficiency for current be easily implemented as an interactive processing requirements and yet retain all database design tool. the information content of the functional dependency or natural view of data. Thus APPENDIX: SUMMARY OF LOGICAL the database would still be an accurate RELATIONAL DATABASE representation of real-world relationships, DESIGN STEPS more efficient, and potentially more adapt- able to future processing requirements. The 1. Extended ER (EER) modeling of re- results of this algorithm could be used to quirements specify alternative logical structures to be 1.1 Identify entities and attach attri- considered during physical design, and thus butes to each. provide the physical designers with more 1.2 Identify generalization and subset feasible solutions to choose from. More ef- hierarchies. ficient databases are therefore likely to be 1.3 Define relationships. defined. 1.4 Integrate multiple views of entities, One approach is to assume that all attri- attributes, and relationships. butes are initially assigned to relations on the basis of functional dependencies, and 2. Transformation of the EER model to that the relations are at least 3NF. Effi- relations ciency for the current query requirements 2.1 Transform every entity into one could increase by redundantly adding attri- relation with the key and nonkey butes, used together in a query, to an exist- attributes of the entity. ing relation so that all attributes needed for 2.2 Transform every many-to-many bi- that query would reside in a new relation, nary (or unary) relationship into a called a join relation. This is known as relationship relation. materializing the join [Schkolnick and 2.3 Transform every ternary (or higher Sorenson 19801. Query processing time n-ary) relationship into a relation- might be greatly reduced because fewer ship relation. joins would be needed. However, the side effects of this redundancy include an 3. Normalization of relations increase in storage space required, an in- 3.1 Derive the primary FDs from the crease in the update and referential integ- EER diagram. rity cost, potential denormalization and 3.2 Examine all the candidate relations loss of integrity, and program transforma- for MVDs and secondary FDs. tions for all applications containing joins 3.3 Normalize all candidate relations to that are materialized. Further research is the highest degree desired, eliminat- needed to demonstrate the practicality of ing any redundancies that occur in this step. the normalized relations.

Computing Surveys, Vol. 18, No. 2, June 1986 Logical Design Methodology for Relational Databases l 221 ACKNOWLEDGMENTS CODD, E. 1970. A relational model for large shared data banks. Commun. ACM 13, 6 (June), The authors wish to thank Emerson Hevia and the 377-387. referees for their critique of the manuscript. CODD, E. 1974. Recent investigations into relational data base systems. In Proceedings of the ZFZP REFERENCES Congress. North-Holland, Amsterdam. DATE, C. 1984. A Guide to DB2. Addison-Wesley, ABRIAL, J. 1974. Data semantics. In Data Base Man- Reading, Mass. agement, Proceedings of the ZFZP TC2 Conference DATE, C. 1985. An Introduction to Database Systems, (Cargese, Corsica). North-Holland, Amsterdam. vol. 1,4th ed. Addison-Wesley, Reading, Mass. ATZENI, P., BATINI, C., LENZERINI, M., AND ELMASRI, R. AND WIEDERHOLD, G. 1979. Data VILLANELLI, F. 1981. INCOD: A system for model integration using the structural model. In conceptual design of data and transactions in the Proceedings of ACM SZGMOD International entity-relationship model. In Entity-Relationship Conference on Management of Data (Boston, Approach to Information Modeling and Analysis. May 30June 1). ACM, New York, pp. 319-326. ER Institute, Saugus, Calif. ELMASRI, R., HEVNER, A., AND WEELDREYER, J. BACHMAN,C. 1977. The role concept in data models. 1985. The category concept: An extension to the In Proceedings of the 3rd International Conference entity-relationship model. Data Knowl. Eng. 1, 1, on Very Z&ge’Data Bases (Tokyo, Oct. 6-8). 75-116. IEEE, New York, pp. 464-476. FAGIN, R. 1977. Multivalued dependencies and a new BEERI, C., BERNSTEIN, P., AND GOODMAN, N. normal form for relational databases. ACM 1978. A sophisticate’s introduction to database Trans. Database Syst. 2,3 (Sept.), 262-278. normalization theory. In Proceedings of the 4th International Conference on Very Large Data FERG, S. 1985. Modeling the time dimension in Bases (Berlin, Sept. 13-15). IEEE, New York, an entity-relationship diagram. In Proceedings pp. 113-124. of the 4th Znternationnl Conference on the Entity-Relationship Approach (Chicago). IEEE BERNSTEIN, P. 1976. Synthesizing 3NF Relations Computer Society Press, Silver Spring, Md., from functional dependencies. ACM Trans. pp. 280-286. Database Syst. I, 4 (Dec.), 272-298. HAWRYSZKIEWYCZ,I. 1984. Database Analysis and BERTAINA, P., DILEVA, A., AND GIOLITO, P. 1983. Design. SRA, Chicago. Logical design in CODASYL and relational HOWE, D. 1983. Data Analysis and Data Base environments. In MethodoZogy and Tools for Design. Arnold, London. Data Base Design, S. Ceri, Ed. North-Holland, Amsterdam, pp. 85-117. IS0 1982. Concepts and terminology for the concep- BRACCHI, G., PAOLINI, P., AND PELAGAT~I, G. tual schema and the information base. J. van 1976. Binary logical associations in data mo- Griethuysen, Ed. ISO/TC97/SCS/WG3-N695 Report. ANSI, New York, 180 pp. delling. In Modelling in Data Base Management Systems, Proceedings of the ZFZP TC2 Conference JAJODIA,S., AND NC, P. 1983. On the representation (Freudenstadt, West Germany). North-Holland, of relational structures by entitycrelationship Amsterdam. Diagrams. In The Entity-Relationship Approach BRIAND, H., HABRIAS, H., HUE, J., AND SIMON, Y. to Software Engineering; G. C. Davis-et al., Eds. Elsevier North-Holland, New York, pp. 223-248. 1985. Expert system for translating an E-R dia- gram into databases. In Proceedings of the 4th JAJODIA, S., AND NC, P. 1984. Translation of entity- international Conference on EntityIRekionship relationship diagrams into relational structures. Approach (Chicago). IEEE Computer Society J. Syst. Softw. 4, 123-133. Press, Silver Spring, Md., pp. 199-206. KENT, W. 1981. Consequences of assuming a univer- BUBENKO, J. 1977. The temporal dimension in in- sal relation. ACM Trans. Database Syst. 6, 4 formation modelling. In Architecture and Models (Dec.), 539-556. in Data Base Management Systems, G. Nijssen, KENT, W. 1984. Fact-based data analysis and design. Ed. North-Holland, Amsterdam. J. Syst. Softw. 4,99-121. CHEN, P. 1976. The entity-relationship model- LENZERINI, M., AND SANTUCCI,G. 1983. Cardinality Toward a unified view of data. ACM Trans. constraints in the entity-relationship model. In Database Syst. 1, 1 (Mar.), 9-36. The Entity-Relationship Approach to Software CHUNG, I., NAKAIUURA,F., AND CHEN, P. 1981. A Engineering, G. C. Davis et al., Eds. Elsevier decomposition of relations using the entity- North-Holland, New York, pp. 529-549. relationship approach. In Entity-Relationship LIEN, Y. 1981. Hierarchical schemata for relational Approach to Information Modeling and Analysis, databases. ACM Trans. Database Syst. 6, 1 P. Chen, Ed. North-Holland, Amsterdam. (Mar.), 48-69. CLIFFORD, J., AND WARREN, D. 1983. Formal se- LING, T. 1985. A normal form for entity-relationship mantics for time in databases. ACM Trans. diagrams. In Proceedings of the 4th International Database Syst. 8, 2 (June), 214-254. Conference on the Entity-Relationship Approach

Computing Surveys, Vol. 18, No. 2, June 1986 222 l T. J. Teorey, D. Yang, and J. P. Fry (Chicago). IEEE Computer Society Press, Silver SAKAI, H. 1983. Entity-relationship approach to log- Spring, Md., pp. 24-35. ical database design. In Entity-Relationship Ap- MAIER, D. 1983. Theory of Relational Databases. proach to Software Engineering, C. G. Davis, Computer Science Press, Rockville, Md. S. Jajodia, P. A. Ng, and R. T. Yeh, Eds. Elsevier MARTIN, J. 1982. Strategic Data-Planning Method- North-Holland, New York, pp. 155-187. ologies. Prentice-Hall, Englewood Cliffs, N.J. SCHEUERMANN,P., SCHEFFNER,G., AND WEBER, H. MARTIN, J. 1983. Managing the Data-Base Environ- 1980. Abstraction capabilities and invariant ment. Prentice-Hall, Englewood Cliffs, N.J. properties modelling within the entity-relation- ship approach. In Entity-Relationship Approach MCGEE, W. 1974. A contribution to the study of data to Systems Analysis and Design, P. Cben, Ed. equivalence. In Data Base Management, J. W. North-Holland, Amsterdam, pp. 121-140. Klimbie and K. L. Koffeman, Eds. North- Holland, Amsterdam, pp. 123-148. SCHKOLNICK, M., AND SORENSON, P. 1980. DENORMALIZATION: A performance oriented MCLEOD, D., AND KING, R. 1979. Applying a database design technique. In Proceedings of semantic . In Proceedings of the AZCA 1980 Congress (Bologna, Italy). AICA, the 1st International Conference on the Entity- Brussels, pp. 363-377. Relationship Approach to Systems Analysis and SMITH, H. 1985. Database design: Composing fully Design (Los Angeles). North-Holland, Amster- normalized tables from a rigorous dependency dam, pp. 193-210. diagram. Commun. ACM 28,8 (Aug.), 826-838. NAVATHE, S., AND CHENG, A. 1983. A methodology SMITH, J., AND SMITH, D. 1977. Database abstrac- for mapping from extended en- tions: Aggregation and generalization. ACM tity relationship models into the hierarchical Trans. Database Syst. 2,2 (June), 105-133. model. In The Entity-Relationship Approach to Software Engineering, G. C. Davis et al., Eds. SWEET, F. 1985. Process-driven data design. Data- Elsevier North-Holland, New York. mation 31,16,84-85, first of a series of 14 articles. NAVATHE, S., AND GADGIL, S. 1982. A methodol- TEOREY, T., AND FRY, J. 1982. Design of Database ogy for view integration in logical database de- Structures. Prentice-Hall, Englewood Cliffs, N.J. sign. In Proceedings of the 8th International ULLMAN, J. 1980. Principles of Database Systems. Conference on Very Large Data Bases (Mexico Computer Science Press, Potomac, Md. City). VLDB Endowment, Saratoga, Calif., VERHEIJEN, G., AND VAN BEKKUM, J. 1982. NIAM: pp. 142-152. An information analysis method. In Information NAVATHE, S., SASHIDHAR, T., AND ELMASRI, R. Systems Design Me0wdolagies, Olle, Sol, and 1984. Relationship merging in schema integra- Verrvn-Stuart. Eds. North-Holland, Amsterdam, tion. In Proceedings of the 10th International pp. 537-590. Conference on Very Large Data Bases (Singa- WEBRE, N. 1981. An Extended E-R model and its pore). VLDB Endowment, Saratoga, Calif., use on a defense project. In Proceedings of the pp. 78-90. 2nd International Conference on the Entity- NAVATHE, S., ELMASRI, R., AND LARSON, J. 1986. Relationship Approach, (Washington, D.C.). Integrating user views in database design. IEEE North-Holland, Amsterdam, pp. 175-194. Computer 19, 1, 50-62. WONG, E., AND KATZ, R. 1980. Logical design and NIJSSEN, G., VAN ASSCHE, F., AND SNIJDERS, J. schema conversion for relational and DBTG 1979. End user tools for information systems databases. In Entity-Relationship Approach to requirement definition. In Formal Models and Systems Analysis and Design, P. Chen, Ed. North- Practical Tools for Information System Design, Holland, Amsterdam, pp. 311-322. H. Schneider, Ed. North-Holland, Amsterdam. YANG, D., TEOREY,T., AND FRY, J. 1985. A practical OREN, 0. 1985. Integrity constraints in the con- approach to transforming extended ER diagrams ceptual schema language SYSDOC. In Proceed- into the relational model. Computing Research ings of the 4th International Conference on the Laboratory CRL-TR-6-85, Electrical Engineer- Entity-Relationship Approach (Chicago). IEEE ing and Computer Science Dept., Univ., of Computer Society Press, Silver Spring, Md., Michigan, Ann Arbor. pp. 288-294. YAO, S. 1985. Principles of Database Design: Vol. 1, REINER, D., BRODIE, M., BROWN, G., FRIEDELL, M., Logical Organizations. Prentice-Hall, Englewood KRAMLICH, D., LEHMAN, J., AND ROSENTHAL, Cliffs, N.J. A. 1985. The database design and evaluation ZANIOLO, C., AND MELKANOFF, M. 1981. On the workbench (DDEW) project at CCA. Database design of relational database schemas. ACM Eng. 7,4,10-15. Trans. Database Syst. 6, 1 (Mar.), l-47.

Received January 1986; revised May 1986; final revision accepted September 1986.

ComputingSurveys, Vol. 18,No. 2, June 1986