Database design

- Satya

Satya © 2004

Architect’s buzzword

"Normalization is a logical concept, performance is determined at the physical level. Therefore, it is impossible to denormalize for performance."

Fabian Pascal – co-founder & editor of Database Debunkings (dbdebunk.com)

“Denormalization, if necessary, should be done at the level of stored files, not at the level of base relvars”

“denormalization, is not ‘good for performance’, it is good for the performance of specific applications”

C. J. Date – widely respected database expert; author of An Introduction to Database Systems.

Background

• Presenting concepts, not syntax.
• Presenting “How” & “What”, not “Why”, in the RDBMS.

Agenda

• Introduction

Ready for the data explosion?

[Figure: timeline of stored information in gigabytes. Milestones: cave paintings and bone tools (40,000 BCE), writing (3500), paper (105 C.E.), printing (1450), electricity and telephone (1870), transistor (1947), computing (1950), Internet (DARPA, late 1960s), the web (1993). Volume reaches 1½B gigabytes in 1999.]

Source: UC Berkeley

The coming content “Big Bang”

[Figure: the same timeline extended past 1999: 3B gigabytes in 2000, 6B in 2001, 12B in 2002, 24B in 2003.]

Source: UC Berkeley

Future data size

• Terabytes of data
  – Common corporate expression
  – Petabytes (10^15 bytes) and exabytes (10^18 bytes) are fast approaching
• 2-3 exabytes = total volume of all information generated worldwide annually
  – Need structure to handle large data efficiently.

Source: 2001 - IBM Informix Conference, Las Vegas.

Database – the solution

Database: an organized store of information

– Flat files: Adabas, FileMaker
– Hierarchical: IBM’s Information Management System (IMS), used in the Apollo Moon landing
– Network databases: GE’s Integrated Data Store (IDS)
– Relational databases: Oracle, DB2, Sybase, MS SQL, Postgres
– Object-relational databases: Oracle 9
– Object databases: Cloudscape

Database development activities during the systems development life cycle (SDLC)

Project Identification and Selection: enterprise modeling

Project Initiation and Planning: conceptual data modeling

Analysis

Logical Design

Physical Design

Implementation

Maintenance

Database Application Lifecycle

DATABASE PLANNING
SYSTEMS DEFINITION
REQUIREMENTS ANALYSIS
DB DESIGN (CONCEPTUAL DESIGN, LOGICAL DESIGN, PHYSICAL DESIGN), with DBMS SELECTION and APPLICATION DESIGN alongside
PROTOTYPING
IMPLEMENTATION
DATA LOADING / MIGRATION
TESTING
MAINTENANCE

Database design flow

Conceptual design (DBMS independent)
• Data analysis and requirements: determine end-user views, outputs, and transaction processing requirements.
• Entity relationship modeling and normalization: define entities, attributes and relationships. Draw ER diagrams. Normalize tables.
• Data model verification: identify main processes and the insert, update and delete rules. Validate reports, queries, views, integrity, sharing and security.
• Distributed database design: define location of tables, access requirements and fragmentation strategy.

DBMS software selection

Logical design (DBMS dependent)
• Translate the conceptual model into definitions for tables, views and so on.

Physical design (hardware dependent)
• Define storage structures and access paths for optimum performance.

Design Approach

• Entity-Relationship (ER) data modeling
  – A graphical technique for understanding and organizing the data independently of the eventual database implementation
• Normalization
  – An algorithmic process for evaluating the quality of a database design - most applicable to designs
• Types of models
  – Models (of databases or anything else) can be built at different levels of abstraction
  – For databases (following the text):
    • Conceptual – logical → ER models (represent semantics)
    • Internal - for the chosen DBMS
    • External - the way the user sees the data
    • Physical - for the actual physical storage

ER Modeling

ER Modeling concepts: Entity-Relationship Basics

• The concepts upon which ER models are built are:
  – Entities (or, more correctly, entity types)
    – also called “relvars” or “base relvars”
    – called tables at the physical implementation level
  – Relationships (between entities)
  – Attributes (of entities and relationships)

Entities & Entity types

• An entity is “A person, place, event, or thing for which we intend to collect data”
• Normally a database will contain data about groups of similar entities (e.g. students, subjects, licenses, aircraft or whatever)
• These groups of similar entities are referred to as entity types, but often this is shortened to just “entity” or “entities”

Entity types & Attributes

• Entity types are conventionally named in the singular
• Attributes are represented on ER diagrams as ellipses attached to the relevant entity type symbol
• There are other notations as well (e.g. a list of attributes next to the entity type symbol) but they are conceptually equivalent

[Figure: entity type student with attributes Name, DOB, Address, Gender and Number, drawn as attached ellipses.]

Relationships

• A relationship is an association between entity types
• Relationships are represented by diamond-shaped symbols on ER diagrams
• A descriptive name is placed inside the relationship symbol

[Figure: student – enrols in – subject.]

Relationship & entity

• Entity type names are usually nouns
• Relationship names are usually, though not always, verbs (or verb phrases)
• Most relationships are binary (i.e. connect 2 entity types), like “enrolls in”
• Other types of relationships are possible

Degree of a Relationship

• The degree of a relationship is the number of entity types that it connects
  – One: unary
  – Two: binary
  – Three: ternary
• Relationships of degree higher than three are rare

[Figure: a binary relationship (student – enrolls – subject), a unary relationship (employee – supervises – employee), and a ternary relationship (sale, connecting item, vendor and purchaser) drawn next to its representation as three binary relationships.]

Relationship Connectivity (Cardinality)

• Relationships can have different connectivities
  • one-to-one (1:1)
  • one-to-many (1:N)
  • many-to-many (M:N)
• Indicated on the ER diagram by placing an appropriate symbol on each “leg” of the relationship

[Figure: employee – supervises – employee (1 supervisor : N); student – enrolls – subject (M:N); lecturer – teaches – subject (1:N).]

For a relationship R between entity types E and F:

One-to-one relationship: min-card(E,R)=0, max-card(E,R)=1, min-card(F,R)=0, max-card(F,R)=1
Many-to-one relationship: min-card(E,R)=0, max-card(E,R)=N, min-card(F,R)=1, max-card(F,R)=1
Many-to-many relationship: min-card(E,R)=0, max-card(E,R)=N, min-card(F,R)=0, max-card(F,R)=N
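
The min-card / max-card values above can be read off a set of relationship instances. The following is a minimal Python sketch, not part of the original slides; the function names `card` and `connectivity` and the sample lecturer/subject data are hypothetical.

```python
from collections import Counter

def card(pairs, side, entities):
    """Return (min-card, max-card) for the entity type on `side` (0 = E, 1 = F).

    `pairs` holds the relationship instances; `entities` lists every instance
    of the entity type, including those that take part in no pair.
    """
    counts = Counter(pair[side] for pair in pairs)
    per_entity = [counts.get(e, 0) for e in entities]
    return min(per_entity), max(per_entity)

def connectivity(pairs, e_entities, f_entities):
    """Classify the relationship using only the two max-card values."""
    _, e_max = card(pairs, 0, e_entities)
    _, f_max = card(pairs, 1, f_entities)
    if e_max <= 1 and f_max <= 1:
        return "one-to-one"
    if e_max > 1 and f_max > 1:
        return "many-to-many"
    return "many-to-one"

# Hypothetical data: each lecturer teaches one or more subjects,
# each subject is taught by exactly one lecturer.
lecturers = ["L1", "L2"]
subjects = ["S1", "S2", "S3"]
teaches = [("L1", "S1"), ("L1", "S2"), ("L2", "S3")]

lecturer_card = card(teaches, 0, lecturers)
subject_card = card(teaches, 1, subjects)
kind = connectivity(teaches, lecturers, subjects)
print(lecturer_card, subject_card, kind)
```

A min-card of 0 corresponds to partial participation and a min-card of 1 or more to total participation, which is why the full list of entity instances has to be passed in.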

Relationship Participation

• Entity types connected by a relationship can have two kinds of “participation” in it
  • Partial (or optional)
  • Total (or mandatory)
• “Total” means that every entity instance must be connected (through the relationship) to an instance of the other participating entity type(s)
• “Partial” means not total

[Figure: staff – Head of – department, a 1:1 relationship.]

Key Attribute(s)

• There will normally be one, or perhaps several, attributes that will be unique for every entity instance
• Example: every student will have a unique student number
• Such an attribute (or combination) is called a key
• If the key for an entity set consists of two or more attributes in combination it is called a concatenated key
• Key attribute(s) are underlined on the ER diagram

[Figure: entity type person with attributes Number, Name, DOB, Age, Address, Gender and Qualification.]

Derived, Multi-valued attributes

• Sometimes it is useful to have, on the ER diagram, attributes that can be derived from other attributes
• Example: an attribute Age can be derived from an attribute DOB and the current date
• Derived attributes can be indicated on the ER diagram by using a dashed ellipse and connecting line to the relevant entity type
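
As a small illustration of a derived attribute, Age can be computed from DOB and the current date instead of being stored. This is a sketch only; the function name `age_on` is hypothetical.

```python
from datetime import date

def age_on(dob: date, today: date) -> int:
    """Derive Age in whole years from DOB and a reference date."""
    years = today.year - dob.year
    # Subtract one year if this year's birthday has not happened yet.
    if (today.month, today.day) < (dob.month, dob.day):
        years -= 1
    return years

before_birthday = age_on(date(1980, 6, 15), date(2004, 6, 14))
on_birthday = age_on(date(1980, 6, 15), date(2004, 6, 15))
print(before_birthday, on_birthday)
```

Because the value changes with the current date, storing it would require constant updates, which is one reason such attributes are usually derived on demand.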

Relationship attributes

• A relationship is an association between entity sets
• Relationships can also have attributes
• An attribute of a relationship is drawn attached to the relationship diamond
• Usually only M:N relationships have attributes

[Figure: employee – supervises – employee, an M:N relationship with the attribute Task attached to the relationship diamond.]

Strong & Weak entities/entity types

• Sometimes the instances of one entity type depend, for their unique identification, on their relationship to the instances of another entity type

[Figure: building – consists of – room; building carries the attribute Name and room carries the attribute Number.]
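
One common way to implement a weak entity type is a composite key that includes the owner's key. The sketch below uses Python's built-in `sqlite3` module. It assumes, from the figure, that a building is identified by its Name and a room by its Number within a building; the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE building (
    name TEXT PRIMARY KEY
);
CREATE TABLE room (
    building_name TEXT    NOT NULL REFERENCES building(name),
    number        INTEGER NOT NULL,
    PRIMARY KEY (building_name, number)  -- room number is unique only within a building
);
""")

conn.execute("INSERT INTO building VALUES ('A')")
conn.execute("INSERT INTO building VALUES ('B')")
conn.execute("INSERT INTO room VALUES ('A', 101)")
conn.execute("INSERT INTO room VALUES ('B', 101)")  # same number, different building: allowed

def rejected(sql):
    """Return True if the statement violates a key or foreign key constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

duplicate_rejected = rejected("INSERT INTO room VALUES ('A', 101)")  # same building and number
orphan_rejected = rejected("INSERT INTO room VALUES ('C', 7)")       # building 'C' does not exist
room_count = conn.execute("SELECT COUNT(*) FROM room").fetchone()[0]
print(duplicate_rejected, orphan_rejected, room_count)
```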

Supertypes & Subtypes

• Sometimes notionally different entity types are really specializations of a more general entity type
• Example: trucks, cars, motorcycles, buses, taxis are all motor vehicles
• Some attributes are common to all, others are specific to one group
• This kind of situation can be dealt with using a generalization hierarchy (or supertype/subtype hierarchy)
• The attribute(s) that are common belong to the supertype
• The attributes that are specific are attached to the relevant subtype

[Figure: supertype motor vehicle (attributes Registration, Seats) with disjoint (d) subtypes truck, car and bus, each with its own attributes.]

[Figure: supertype employee (attributes TFN, Gender, DOB, Address) with overlapping (o) subtypes safety officer, engineer and pilot, each with its own attributes.]

Three schema architecture for Database development

External level (individual user views)

External (COBOL):
  01 EMPC.
     02 EMPNO PIC X(6).
     02 DEPTNO PIC X(4).

External (XML)

Conceptual level (community user view)

Conceptual:
  EMPLOYEE
     EMPLOYEE_NUMBER CHARACTER(6)
     DEPARTMENT_NUMBER CHARACTER(4)
     SALARY NUMERIC(5)

Internal level (storage view)

Internal view:
  STORED_EMP BYTES=20
     PREFIX TYPE=BYTE(6), OFFSET=0
     EMP# TYPE=BYTE(6), OFFSET=6, INDEX=EMPX
     DEPT# TYPE=BYTE(4), OFFSET=12
     PAY TYPE=FULLWORD, OFFSET=16

[Figure: the conceptual schema (the neutral view) sits between the external schema and the internal schema.]

Normalization

Levels of normalization

1NF relvars (normalized entities)

2NF relvars

3NF relvars

BCNF relvars

4NF relvars

5NF relvars

Normalization - Keys

Superkey: A superkey is a set of one or more attributes that, taken collectively, allows us to identify an entity uniquely.
Candidate key: Any subset of a superkey that is also a superkey, and is not reducible to another superkey, is called a candidate key.
Primary key: A primary key is selected arbitrarily from the set of candidate keys to be used in an index for that table.
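
The three definitions can be checked mechanically once functional dependencies (FDs) are known. The sketch below is an illustration only and does not come from the slides. It uses the standard attribute-closure technique, and the relvar Employee(eid, email, name) with its FDs is hypothetical.

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def superkeys(all_attrs, fds):
    """Every attribute set whose closure covers the whole relvar."""
    attrs = sorted(all_attrs)
    return [frozenset(combo)
            for size in range(1, len(attrs) + 1)
            for combo in combinations(attrs, size)
            if closure(combo, fds) == set(all_attrs)]

def candidate_keys(all_attrs, fds):
    """Superkeys that contain no smaller superkey (irreducible superkeys)."""
    sks = superkeys(all_attrs, fds)
    return [k for k in sks if not any(other < k for other in sks)]

# Hypothetical relvar: eid and email each identify an employee.
attrs = {"eid", "email", "name"}
fds = [
    (frozenset({"eid"}), frozenset({"email", "name"})),
    (frozenset({"email"}), frozenset({"eid"})),
]
all_superkeys = superkeys(attrs, fds)
keys = candidate_keys(attrs, fds)
print(len(all_superkeys), keys)
```

Either candidate key could then be picked as the primary key. The enumeration is exponential in the number of attributes, so this is only practical for small examples.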

Source: Database Modeling & Design – Toby J. Teorey

Normalization – 1NF

First normal form (1NF):
Defn: A relvar is in 1NF if and only if, in every legal value of that relvar, every tuple contains exactly one value for each attribute.
Explanation: At each row and column position in the table, there exists one value, never a set of values.

Essence: The value at every row and column position should be atomic.
Violation of 1NF: Employee (EID#, Name, SkillSet, Address1, Address2)

• SkillSet stores comma-separated values (C, VisualBasic, Oracle)
• How many more addresses can be stored in this fashion?
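
A minimal sketch of how the SkillSet violation could be removed: the comma-separated value is split so that each row holds exactly one skill. The sample rows, the function name `to_1nf` and the target relvar names are hypothetical.

```python
# Hypothetical unnormalized rows: skill_set holds several values in one column.
employees = [
    {"eid": 1, "name": "Asha", "skill_set": "C, VisualBasic, Oracle"},
    {"eid": 2, "name": "Ravi", "skill_set": "Oracle"},
]

def to_1nf(rows):
    """Split each row into Employee(eid, name) and EmployeeSkill(eid, skill)."""
    employee = [{"eid": r["eid"], "name": r["name"]} for r in rows]
    employee_skill = [
        {"eid": r["eid"], "skill": skill.strip()}
        for r in rows
        for skill in r["skill_set"].split(",")
        if skill.strip()
    ]
    return employee, employee_skill

employee, employee_skill = to_1nf(employees)
for row in employee_skill:
    print(row)
```

The Address1/Address2 columns could be handled the same way, with one address per row in a separate relvar, so the number of addresses is no longer fixed by the column count.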

Source: Administration Guide: Planning – DB2 Database Systems – C.J. Date

Normalization – 2NF

Second normal form (2NF): (Assuming one candidate key, which we assume is the primary key)
Defn: A relvar is in 2NF if and only if it is in 1NF and every nonkey attribute is irreducibly dependent on the primary key.
Explanation: Each column that is not part of the key is dependent upon the whole key, not just a part of it.

Essence: All non-keys must depend on the whole key value.
Violation of 2NF: WarehouseParts(PART#, WAREHOUSE#, Qty, WHAddr)

• WAREHOUSE# → WHAddr
• PART# → Qty
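
The two partial dependencies listed above suggest a decomposition into one relvar per dependency plus one for the key itself. The sketch below uses hypothetical sample data, chosen to satisfy both dependencies, and a hypothetical helper `project`. It shows that joining the projections gives back the original rows.

```python
# Hypothetical rows of WarehouseParts(PART#, WAREHOUSE#, Qty, WHAddr),
# consistent with WAREHOUSE# -> WHAddr and PART# -> Qty as stated on the slide.
warehouse_parts = [
    ("P1", "W1", 10, "12 Dock Rd"),
    ("P2", "W1", 5, "12 Dock Rd"),
    ("P1", "W2", 10, "9 Canal St"),
]

def project(rows, *cols):
    """Relational projection: keep the given column positions, drop duplicates."""
    return sorted({tuple(row[c] for c in cols) for row in rows})

part = project(warehouse_parts, 0, 2)            # Part(PART#, Qty)
warehouse = project(warehouse_parts, 1, 3)       # Warehouse(WAREHOUSE#, WHAddr)
part_warehouse = project(warehouse_parts, 0, 1)  # PartWarehouse(PART#, WAREHOUSE#)

# Natural join of the three projections.
rejoined = sorted(
    (p, w, qty, addr)
    for (p, w) in part_warehouse
    for (p2, qty) in part if p2 == p
    for (w2, addr) in warehouse if w2 == w
)
print(warehouse)
print(rejoined == sorted(warehouse_parts))
```

After the split each warehouse address is stored once, so changing it no longer means updating one row per stocked part.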

Source: Administration Guide: Planning – DB2 Database Systems – C.J. Date

Normalization – 3NF

Third normal form (3NF): (Assuming one candidate key, which we assume is the primary key)
Defn: A relvar is in 3NF if and only if it is in 2NF and every nonkey attribute is nontransitively dependent on the primary key.

Note: “No transitive dependencies” implies no mutual dependencies.
Explanation: Each column that is not part of the key is dependent upon the key, and not upon another non-key column.

Essence: All non-keys must depend “only” on the key value and on no other non-key.

Violation of 3NF: Emp_Dept(EID#, FirstName, LastName, WorkDept, DeptName), where DeptName depends on the non-key attribute WorkDept.
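
A short sketch of the Emp_Dept case, assuming that the transitive dependency is WorkDept → DeptName. The sample rows and the helper `fd_holds` are hypothetical. It first checks the dependency on the data, then splits the relvar so that a department name is stored only once.

```python
def fd_holds(rows, lhs, rhs):
    """True if rows that agree on `lhs` always agree on `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical sample rows for Emp_Dept.
emp_dept = [
    {"EID": 1, "FirstName": "Asha", "LastName": "Rao", "WorkDept": "D01", "DeptName": "Sales"},
    {"EID": 2, "FirstName": "Ravi", "LastName": "Iyer", "WorkDept": "D01", "DeptName": "Sales"},
    {"EID": 3, "FirstName": "Mei", "LastName": "Lin", "WorkDept": "D02", "DeptName": "Support"},
]

transitive = fd_holds(emp_dept, ["WorkDept"], ["DeptName"])

# Decompose into Emp(EID, FirstName, LastName, WorkDept) and Dept(WorkDept, DeptName).
emp = [{k: r[k] for k in ("EID", "FirstName", "LastName", "WorkDept")} for r in emp_dept]
dept = {r["WorkDept"]: r["DeptName"] for r in emp_dept}

# Renaming a department is now a single update instead of one per employee.
dept["D01"] = "Field Sales"
names = [dept[e["WorkDept"]] for e in emp]
print(transitive, names)
```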

Source: Administration Guide: Planning – DB2 Database Systems – C.J. Date

Normalization – BCNF

Boyce/Codd normal form (BCNF): (Assuming composite candidate key as primary key)
Defn: A relvar is in BCNF if and only if every non-trivial, left-irreducible FD has a candidate key as its determinant.
Explanation: Each column that is not part of the key is fully dependent upon the whole composite key and not on any single key alone.

Essence: All non-keys must depend “only” on the “composite” key value and not on a single key.

Violation of BCNF: HotelRoom (HNo#, Room#, RoomType)

RoomType → Room# & RoomType → HNo#

Source: Database Systems – C.J. Date

Normalization – 4NF

Fourth normal form (4NF):

Defn: Relvar R is in 4NF if and only if, whenever there exist subsets A and B of the attributes of R such that the nontrivial MVD A →→ B is satisfied, then all attributes of R are also functionally dependent on A.
Explanation: No row contains two or more independent multi-valued facts about an entity.

Essence: Two separate facts cannot be in the same entity.

Violation of 4NF: Emp_Skill(EID#, SkillName#, Language#)
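
To make the Emp_Skill violation concrete, the sketch below assumes that skills and languages are independent facts about an employee (EID →→ SkillName | Language). The sample data is hypothetical. Every skill has to be paired with every language, and splitting the relvar into two projections loses nothing.

```python
# Hypothetical Emp_Skill(EID#, SkillName#, Language#) rows.
# Employee 1 has two skills and two languages, which forces four rows.
emp_skill = {
    (1, "Oracle", "English"), (1, "Oracle", "Hindi"),
    (1, "C", "English"), (1, "C", "Hindi"),
    (2, "Java", "Tamil"),
}

# Decompose into two relvars, one per independent fact.
emp_skill_only = {(eid, skill) for eid, skill, _ in emp_skill}
emp_language = {(eid, lang) for eid, _, lang in emp_skill}

# Joining the projections on EID reproduces the original relvar exactly.
rejoined = {
    (eid, skill, lang)
    for eid, skill in emp_skill_only
    for eid2, lang in emp_language
    if eid == eid2
}
print(len(emp_skill), len(emp_skill_only), len(emp_language), rejoined == emp_skill)
```

Adding a third language for employee 1 would take one new row in the decomposed design, against one row per skill in the original.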

Source: Administration Guide: Planning – DB2 Database Systems – C.J. Date

Normalization – 5NF (PJ/NF)

Fifth normal form (5NF):

Defn: Relvar R is in 5NF (also called projection-join normal form) if and only if every nontrivial join dependency that holds for R is implied by the candidate keys of R.
Explanation: If a table can be decomposed further without loss, then it should be decomposed.
R{A,B,C} satisfies the JD * {AB, AC} if and only if the MVDs A →→ B and A →→ C hold in R:

A →→ B | C ⇔ * {AB, AC}

Essence: Two separate facts cannot be in the same entity.

Violation of 5NF:

Source: Database Systems – C.J. Date

Normalization – Others

Domain-key normal form (DK/NF):

Defn: A relvar R is said to be in DK/NF if and only if every constraint on R is a logical consequence of the domain constraints and key constraints that apply to R.
Explanation: --

Principle of orthogonal design (a digression):
E.g.: SA holds suppliers in Paris, SB holds suppliers not in Paris or with status 30. A row can then be present in both SA and SB, which gives rise to an update anomaly.

SX(S#, Sname, Status), SY(S#, Sname, City)

This principle is best used in distributed database design.

Source: Database Systems – C.J. Date

Denormalization - Types

1. Collapsing tables
   a. Two entities in an m:n relationship: can be applied to avoid frequent joins.
   b. Two entities in a 1:1 relationship: avoids updates to two separate entities that are in 1:1.
2. Reference data in a 1:m relationship (add redundant columns): when large composite keys / derived keys are used, they can be added to the child entity in a 1:m relationship as a foreign key, again to avoid certain join operations.
3. Entities with the most detailed data: when MVDs / temporal design is in place, we could store summarized data about the MVD attribute / temporal dimension (e.g. months).
4. Derived attributes: when an attribute is derived by a function of another, but it is better to store the derived attribute (e.g. SearchName, y = f(x) → store x, y in R).
5. Splitting tables (horizontal / vertical splitting)

Source: Denormalization effects on Performance of RDBMS, G. Lawrence Sanders, Seungkyoon Shin, State University of New York, Buffalo

Denormalization – Criteria

Criteria
• General application performance requirements indicated by business needs.
• On-line response time requirements for application queries, updates and processes.
• Minimum number of data access paths.
• Minimum amount of storage.

Source: Database Modeling & Design – Toby J. Teorey

Denormalization – Alternatives

Alternatives
• Application performance criteria.
• Future application development and maintenance considerations.
• Volatility of application requirements.
• Relations between transactions and relations of entities involved.
• Transaction type (update/query, OLTP/OLAP).
• Transaction frequency.
• Access paths needed by each transaction.
• Number of rows accessed by each transaction.
• Number of pages/blocks accessed by each transaction.
• Cardinality of each relation.

Source: Database Modeling & Design – Toby J. Teorey

Data Modeling

Diagramming Notations

Notation
• Bachman Notation
• Chen ERD
• Database Model Diagram

[Figures: example diagrams in these notations, including a Database Model Diagram and an ER Source Model.]

Diagramming Notations - IDEF1X Notation

Relationship Cardinality

• zero, one or more: no symbol
• one or more: P
• zero or one: Z
• exactly n: n
• from n to m: n-m
• (n): reference to note (n) where cardinality is specified

Attribute and Primary Key Syntax

Entity-name/Entity-number
Primary-key attributes: Attribute-Name, [Attribute-Name]
Attributes: [Attribute-Name], [Attribute-Name], [Attribute-Name]

Source: IDEF1X Formalization, 1993, Robert G. Brown (The Database Design Group)

Diagramming Notations - IDEF1X Notation

Identifying Relationship

Entity-A (*Parent Entity): key Key-Attribute-A
Relationship Name: Identifying Relationship
Entity-B (**Child Entity): key Key-Attribute-A (FK), Key-Attribute-B

* The Parent Entity in an Identifying Relationship may be an Identifier-Independent Entity (as shown) or an Identifier-Dependent Entity depending upon other relationships.

** The Child Entity in an Identifying Relationship is always an Identifier-Dependent Entity.

Mandatory Non-Identifying Relationship

Entity-A (*Parent Entity): key Key-Attribute-A
Relationship Name: Mandatory Non-Identifying Relationship
Entity-B (**Child Entity): key Key-Attribute-B; non-key Key-Attribute-A (FK)

* The Parent Entity in a Mandatory Non-Identifying Relationship may be an Identifier-Independent Entity (as shown) or an Identifier-Dependent Entity depending upon other relationships.

** The Child Entity in a Mandatory Non-Identifying Relationship will be an Identifier-Independent Entity unless the entity is also a Child Entity in some Identifying Relationship.

Source: IDEF1X Formalization, 1993, Robert G. Brown (The Database Design Group)

Diagramming Notations - IDEF1X Notation

Optional Non-Identifying Relationship

Entity-A (*Parent Entity): key Key-Attribute-A
Relationship Name: Optional Non-Identifying Relationship
Entity-B (**Child Entity): key Key-Attribute-B; non-key Key-Attribute-A (FK)

* The Parent Entity in an Optional Non-Identifying Relationship may be an Identifier-Independent Entity (as shown) or an Identifier-Dependent Entity depending upon other relationships.

** The Child Entity in an Optional Non-Identifying Relationship will be an Identifier-Independent Entity unless the entity is also a Child Entity in some Identifying Relationship.

Source: IDEF1X Formalization, 1993, Robert G. Brown (The Database Design Group)

Diagramming Notations - IDEF1X Notation

Domain Hierarchy

Base domain: Frequency
Typed domains:
  Radio Frequency
    Ultra High Frequency (UHF)
    Very High Frequency (VHF)
    High Frequency (HF)
  Audio Frequency
    Ultra-Sonic
    Sonic
    Sub-Sonic

Source: IDEF1X Formalization, 1993, Robert G. Brown (The Database Design Group)

Diagramming Notations - IDEF1X Notation

Team Organization

[Figure: team roles: Project Manager, Modeler, Source, Expert, and Acceptance Review Committee.]

Source: IDEF1X Formalization, 1993, Robert G. Brown (The Database Design Group)

Diagramming Notations - IDEF1X Diagram

[Figure: IDEF1X diagram of the IDEF1X metamodel. The generic entity idef1xObject (id, name (AK1), description (O)) has the categories entity, view and domain. Further entities shown include aliasDomain, baseDomain, typedDomain, aliasEntity, viewEntity, connectionRelationship, cluster, viewEntityAttribute, alternateKey, category, primaryKeyAttribute, alternateKeyAttribute and connectionForeignKeyAttribute, linked mainly through entity_id, view_id, domain_id and attribute_id foreign keys.]

Source: IDEF1X Formalization, 1993, Robert G. Brown (The Database Design Group)

Diagramming Guidelines

• Identify layout conventions
• Analyze information requirements for attributes
• Model attributes
• Identify multi-valued attributes
• Validate attributes
• Identify common and derived data
• Understand the use of domains
• Identify the components of a data warehouse

Source: Adding Detail to the Diagram - Annette Scott, Oracle

Lay Out the ER Diagram

• Neat and tidy

• Unambiguous text

• Memorable patterns

Layout Guidelines

Dead Crows Fly East!

Source: Adding Detail to the Diagram - Annette Scott, Oracle

Attributes

Badge Number - Identifies an employee

Name - Qualifies an employee

Payroll category (weekly or salaried) - Classifies an employee

Date of birth - Quantifies an employee

Employment status (active, leave, terminated) - Expresses the status of an employee

Source: Adding Detail to the Diagram - Annette Scott, Oracle

Finding Attributes

Is this attribute really needed ?

Beware of obsolete requirements from previous systems

Beware of derived data

Source: Adding Detail to the Diagram - Annette Scott, Oracle

Attribute Diagramming Conventions

• Inside the entity's soft box
• Singular
• Lowercase

[Figure: EMPLOYEE soft box listing badge num, first name, last name, payroll num, date of birth, employment status.]

Source: Adding Detail to the Diagram - Annette Scott, Oracle

Meaningful Components

Break down aggregate attributes

PERSON (name) → PERSON (last name, first name)

ITEM (code) → ITEM (type, vendor num)

Source: Adding Detail to the Diagram - Annette Scott, Oracle

Verify for Single Value

RENTAL (transaction date, total amount paid, item)

Can an attribute have more than one value for an instance of the entity?

Yes, more than one item may be rented at a time. An entity is missing.

RENTAL (transaction date, total amount paid)
RENTAL ITEM (item num)

Source: Adding Detail to the Diagram - Annette Scott, Oracle

Attributes Which have Attributes

TITLE (product code, title, description, review details)

Does information need to be stored about any of the attributes?

Yes, review details. An entity is missing.

TITLE (product code, title, description)
REVIEW (author, comment, date recorded)

Source: Adding Detail to the Diagram - Annette Scott, Oracle

Finding Common or Derived Data

• Count
• Total
• Maximum, Minimum, Average
• Calculation

[Figure: a column of values 12, 08, 30 and 22 with the total 72 beneath it.]

Derived attributes are redundant and can lead to inconsistent values

Source: Adding Detail to the Diagram - Annette Scott, Oracle

Attribute Optionality

Mandatory attributes
• A value must be stored for each entity instance
• Tagged with *

Optional attributes
• A value may be stored for each entity instance
• Tagged with o

[Figure: EMPLOYEE with attributes badge num, * first name, * last name, o title, o weight.]

Source: Adding Detail to the Diagram - Annette Scott, Oracle

Attribute Details and Volumes

Attribute: * Engine Size

Format
  Type: Number
  Maximum length: 4
  Average length: 4
  Decimal place: 1
  Unit of measure: cc
  Allowable values: 900, 1000, 1500, 1800, 2000

Volume
  Initial: 100%

Source: Adding Detail to the Diagram - Annette Scott, Oracle

Using a Domain

[Figure: a domain AUDIO with the values MON (Mono), STE (Stereo) and SUR (Surround Sound), used for the Audio attribute of both Movie and Game.]

Source: Adding Detail to the Diagram - Annette Scott, Oracle

Data Warehousing

• Fact data
• Reference data
• Summary data
• Meta data
• Load management
• Warehouse management
• Query management

Source: Adding Detail to the Diagram - Annette Scott, Oracle

Database Design Techniques

Design Techniques

Three approaches can be followed.

Sentence Analysis:
• Ask business users to tell “their story”
• The resulting sentences serve as basic constituents of the tasks and processes performed in the IS to be supported.
• Extract data requirements from those sentences.

Document Analysis:
• Analyze documents, including transactions and reports.
• Interview results, observation results, policies and procedures, output of existing systems (reports & screens), inputs to existing systems (forms & screens), database/file specifications of existing systems.

Event Analysis:
• Identify and describe what happens (the events), who is involved (actors and business resources), and what responses are required. (Follow the Zachman Interrogatives)

Source: 1) Logical Data Modeling - Salvatore T. March, 2) IDEF1x.doc

Sentence Analysis

• Salespeople service Customers.
• Customers place Orders through a Salesperson.
• Freight is determined when an Order is Shipped.
• Salespeople are paid commission based on their commission rate and Invoiced sales.
• Each Salesperson has a number, name, and address.
• Each Customer has a number and a bill-to-address.

• Identify subjects, verb phrases and objects.
• Specific instances must be generalized.
• If subject and object are both entities, then the verb phrase represents a relationship.
• If the subject is an entity but the object is a fact about that entity, then the object is an attribute and the verb phrase explains the meaning of the attribute.

Source: Logical Data Modeling - Salvatore T. March

Document Analysis

INVOICE

Sample Company, Inc.          Number: 157289
111 Any Street                Date: 10/02/90
Anytown, USA

Bill To:                      Customer Number: 0361
Local Grocery Store           Salesperson: 4531 – Joe Smith
132 Local Street              Customer PO: 3291
Localtown, USA                Terms: Net 30
                              FOB Point: Anytown

Line  Product  Product      Unit of  Quantity  Quantity  Quantity  Unit    Discount  Extension
No.   Number   Description  Sale     Order     Ship      Backord   Price
1     2157     Cheerios     Carton   40        40        0         50.00   5 %       1900.00
2     2283     Oat Rings    Each     300       200       100       2.00    0 %       400.00
3     0579     Corn Flakes  Carton   30        30        0         40.00   10 %      1080.00

Order Gross   4380.00
Tax at 6 %     262.80
Freight         50.00
Order Net     4692.80

• Each heading can be an entity, an attribute or a derived attribute.
• Relationships can be defined only from a careful analysis.
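
The Extension column is an example of a derived attribute on this document. The sketch below assumes the rule is extension = quantity shipped × unit price × (1 − discount). That rule is an inference, not something the invoice states, though it reproduces the three printed line values. The function name `extension` is hypothetical.

```python
from decimal import Decimal

def extension(qty_shipped, unit_price, discount_pct):
    """Line extension, assuming: shipped quantity x unit price, less the discount."""
    gross = Decimal(qty_shipped) * Decimal(unit_price)
    net = gross * (Decimal(100) - Decimal(discount_pct)) / Decimal(100)
    return net.quantize(Decimal("0.01"))

# (quantity shipped, unit price, discount %) for invoice lines 1 to 3
lines = [(40, "50.00", 5), (200, "2.00", 0), (30, "40.00", 10)]
extensions = [extension(*line) for line in lines]
print(extensions)
```

Decimal is used instead of float so that currency values are not subject to binary rounding. In a data model, a value like this would normally be flagged as derived, not stored as an independent attribute.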

Source: Logical Data Modeling - Salvatore T. March

Document Analysis – Data Flow Diagram (DFD)

Examples of DFDs: Level 0, Level 1, Level 2, etc.

Source: Logical Data Modeling - Salvatore T. March

Event Analysis

• Define an entity for each event. Identify the associated actor.
• Hence, Place Order, Ship Order, Invoice Order, and Pay Invoice are all entities.

Source: Logical Data Modeling - Salvatore T. March

Design Evaluation

• Each entity must be uniquely identified.
• Attributes are associated with entities (not relationships), and each entity must have one and only one value for each of its attributes (otherwise an additional entity must be created).
• Relationships associate a pair of entities or associate an entity with itself (only binary relationships are allowed, but relationships can be recursive).
• Many-to-many relationships are not allowed.
• Subtypes are identified when the minimum degree of a relationship descriptor is zero or when an attribute does not apply to all instances of an entity.

Source: Logical Data Modeling - Salvatore T. March

Further Reading

• CASE*Method: Entity Relationship Modeling by Richard Barker is an excellent introduction to ER modeling.

• Relational Database Design by Fleming and von Halle goes step by step into the nuts and bolts, all the way to the physical side.

• Practical Issues in Database Management by Fabian Pascal introduces many of the perennial tough problems in data modeling, and will help assure the new data modeler that there is more to data modeling than what is supported by current commercial implementations of SQL and relational database management products.

Concurrency

Benchmarking

Dependability Estimation

Mean time to failure (MTTF):
Mean time to repair (MTTR):

Availability:
  MTBF = MTTF + MTTR
  Ai = MTTFi / MTBFi

Reliability:

Mean transaction time:
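
The two availability formulas can be applied directly. The sketch below uses hypothetical figures (990 hours to failure, 10 hours to repair) and hypothetical function names.

```python
def mtbf(mttf: float, mttr: float) -> float:
    """Mean time between failures: MTBF = MTTF + MTTR."""
    return mttf + mttr

def availability(mttf: float, mttr: float) -> float:
    """Availability of a component: A = MTTF / MTBF."""
    return mttf / mtbf(mttf, mttr)

# Hypothetical component: fails on average every 990 hours, takes 10 hours to repair.
component_mtbf = mtbf(990.0, 10.0)
component_availability = availability(990.0, 10.0)
print(component_mtbf, round(component_availability, 4))
```

Both inputs must be in the same unit. Availability itself is a unitless fraction between 0 and 1.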

Data Warehousing - Concepts

Database types

Relational:
Network:
Hierarchical:
Object-Oriented:
Spatial-Geographic:
Multimedia:
Temporal:
Text:
Active:
Real Time:

Database – J2EE

Database - .NET