Database design
- Satya
Satya © 2004

Architect’s buzZ word
"Normalization is a logical concept, performance is determined at the physical level. Therefore, it is impossible to denormalize for performance."
Fabian Pascal – co-founder & editor of Database Debunkings (dbdebunk.com)
Architect’s buzZ word
“Denormalization, if necessary, should be done at the level of stored files, not at the level of base relvars”
“denormalization is not ‘good for performance’, it is good for the performance of specific applications”
Chris J. Date – widely regarded as one of the most respected database experts in the computer industry; author of Database Systems.
Background
• Presenting concepts, not syntax.
• Presenting “How” & “What”, not “Why”, in the RDBMS.
Agenda
• Introduction
Ready for the data eXplosion?
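[Figure: timeline of information technology, from cave paintings and bone tools (40,000 BCE), writing (3500 BCE), paper (105 C.E.), printing (1450), electricity and the telephone (1870), the transistor (1947), computing (1950) and the Internet (DARPA, late 1960s) to the web (1993); about 1½ billion gigabytes of information in 1999.]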
Source: UC Berkeley

The coming content - “Big Bang”
[Figure: the same timeline extended with information volumes of 3B gigabytes (2000), 6B (2001), 12B (2002) and 24B (2003).]
Source: UC Berkeley

Future data size
• Terabytes of data
  – A common corporate expression
  – Petabytes (10^15 bytes) and exabytes (10^18 bytes) are fast approaching
• 2-3 exabytes = total volume of all information generated worldwide annually
  – Structure is needed to handle large volumes of data efficiently.
Source: 2001 - IBM Informix Conference, Las Vegas.

Database – the solution
Database
• An organized store of information
  – Flat files: Adabas, FileMaker
  – Hierarchical databases: IBM’s Information Management System (IMS), used in the Apollo Moon landing
  – Network databases: GE’s Integrated Data Store (IDS)
  – Relational databases: Oracle, Db2, Sybase, MS SQL, Postgres
  – Object relational databases: Oracle 9
  – Object databases: Cloudscape
Database development activities during the systems development life cycle (SDLC)
Project Identification and Selection: Enterprise modeling
Project Initiation and Planning: Conceptual data modeling
Analysis
Logical Design
Physical Design
Implementation
Maintenance
Database Application Lifecycle
DATABASE PLANNING
SYSTEMS DEFINITION
REQUIREMENTS ANALYSIS
DATABASE DESIGN: CONCEPTUAL DESIGN, LOGICAL DESIGN, PHYSICAL DESIGN
DBMS SELECTION
APPLICATION DESIGN
PROTOTYPING
IMPLEMENTATION
DATA LOADING / MIGRATION
TESTING
MAINTENANCE

Database design flow
Conceptual design (DBMS independent)
• Data analysis and requirements: determine end-user views, outputs, and transaction processing requirements.
• Entity relationship modeling and normalization: define entities, attributes and relationships; draw ER diagrams; normalize tables.
• Data model verification: identify main processes and insert, update and delete rules; validate reports, queries, views, integrity, sharing and security.
Distributed database design: define location of tables, access requirements and fragmentation strategy.
DBMS software selection
Logical design (DBMS dependent): translate the conceptual model into DBMS definitions for tables, views and so on.
Physical design (hardware dependent): define storage structures and access paths for optimum performance.
Design Approach
• Entity-Relationship (ER) data modeling – a graphical technique for understanding and organizing the data independently of the eventual database implementation
• Normalization – an algorithmic process for evaluating the quality of a database design; most applicable to relational database designs
• Types of models
  – Models (of databases or anything else) can be built at different levels of abstraction
  – For databases (following the text):
    • Conceptual/logical – ER models (represent semantics)
    • Internal – for the chosen DBMS
    • External – the way users see the data
    • Physical – for the actual physical storage
ER Modeling
ER Modeling concepts
Entity-Relationship Basics
• The concepts upon which ER models are built are:
  – Entities (or, more correctly, entity types)
    – in relational terms, represented by “relvars” (“base relvars”) or “relations”
    – at the physical implementation level, called tables
  – Relationships (between entities)
  – Attributes (of entities and relationships)
Entities & Entity types
• An entity is “A person, place, event, or thing for which we intend to collect data”
• Normally a database will contain data about groups of similar entities (e.g. students, subjects, licenses, aircraft or whatever)
• These groups of similar entities are referred to as entity types, but often this is shortened to just “entity” or “entities”
Entity types & Attributes
• Entity types are conventionally named in the singular
• Attributes are represented on ER diagrams as ellipses attached to the relevant entity type symbol
• There are other notations as well (e.g. a list of attributes next to the entity type symbol), but they are conceptually equivalent
[Figure: entity type student with attributes Number, Name, Address, DOB and Gender, shown as ellipses and, equivalently, as an attribute list.]
Relationships
• A relationship is an association between entity types
• Relationships are represented by diamond-shaped symbols on ER diagrams
• A descriptive name is placed inside the relationship symbol
[Figure: the relationship “enrols in” connecting student and subject.]
Relationship & entity
• Entity type names are usually nouns
• Relationship names are usually, though not always, verbs (or verb phrases)
• Most relationships are binary (i.e. connect 2 entity types), like “enrolls in”
• Other types of relationships are possible
Degree of a Relationship
• The degree of a relationship is the number of entity types that it connects
  – One: unary
  – Two: binary
  – Three: ternary
• Relationships of degree higher than three are rare
[Figure: a binary relationship (student enrolls subject), a unary relationship (employee supervises employee), and a ternary relationship sale connecting item, vendor and purchaser, shown alongside an alternative of three binary sale relationships.]

Relationship Connectivity (Cardinality)
• Relationships can have different connectivities:
  – one-to-one (1:1)
  – one-to-many (1:N)
  – many-to-many (M:N)
• Connectivity is indicated on the ER diagram by placing an appropriate symbol on each “leg” of the relationship
[Figure: student M:N enrolls subject; employee 1:M supervises employee (1 = supervisor); lecturer 1:N teaches subject.]
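A 1:N relationship such as “teaches” is usually implemented by placing a foreign key on the “many” side. The sketch below uses Python’s built-in `sqlite3` with an in-memory database; the table and column names, and the assumption that each subject has exactly one lecturer, are illustrative rather than taken from the slide.

```python
import sqlite3

# Sketch: lecturer 1:N teaches subject. The "many" side (subject) carries a
# foreign key to the "one" side (lecturer). Names and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE lecturer (
        lecturer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE subject (
        code        TEXT PRIMARY KEY,
        title       TEXT NOT NULL,
        lecturer_id INTEGER REFERENCES lecturer(lecturer_id)  -- the 1:N link
    );
    INSERT INTO lecturer VALUES (1, 'Lee');
    INSERT INTO subject VALUES ('DB101', 'Database Design', 1);
    INSERT INTO subject VALUES ('DB201', 'Normalization', 1);
""")


def subjects_taught_by(lecturer_id):
    """Follow the relationship from the 'one' side to the 'many' side."""
    rows = conn.execute(
        "SELECT code FROM subject WHERE lecturer_id = ? ORDER BY code", (lecturer_id,))
    return [code for (code,) in rows]


def add_subject(code, title, lecturer_id):
    """Insert a subject; return False if lecturer_id names no existing lecturer."""
    try:
        conn.execute("INSERT INTO subject VALUES (?, ?, ?)", (code, title, lecturer_id))
        return True
    except sqlite3.IntegrityError:
        return False
```

An M:N relationship such as “enrolls” cannot be held in a single foreign key column; it normally needs a separate table whose key combines the keys of both entity types.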
Cardinality constraints for a binary relationship R between entity types E and F:
• One-to-one relationship: min-card(E,R)=0, max-card(E,R)=1; min-card(F,R)=0, max-card(F,R)=1
• Many-to-one relationship: min-card(E,R)=0, max-card(E,R)=N; min-card(F,R)=1, max-card(F,R)=1
• Many-to-many relationship: min-card(E,R)=0, max-card(E,R)=N; min-card(F,R)=0, max-card(F,R)=N
Relationship Participation
• Entity types connected by a relationship can have two kinds of “participation” in it:
  – Partial (or optional)
  – Total (or mandatory)
• “Total” means that every entity instance must be connected (through the relationship) to an instance of the other participating entity type(s)
• “Partial” means not total
[Figure: staff 1:1 “head of” department.]
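In a relational schema, total participation is commonly enforced with a `NOT NULL` foreign key and a 1:1 connectivity with `UNIQUE`. The following `sqlite3` sketch assumes that every department must have a head (total for department) while most staff head nothing (partial for staff); the schema and data are hypothetical.

```python
import sqlite3

# Sketch: staff 1:1 "head of" department, assuming total participation of
# department (NOT NULL) and partial participation of staff.
# UNIQUE keeps the relationship 1:1. Names and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE staff (
        staff_id INTEGER PRIMARY KEY,
        name     TEXT NOT NULL
    );
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        head_id INTEGER NOT NULL UNIQUE REFERENCES staff(staff_id)
    );
    INSERT INTO staff VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Chen');
    INSERT INTO department VALUES (10, 'Research', 1);
""")


def add_department(dept_id, name, head_id):
    """Insert a department; return False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO department VALUES (?, ?, ?)", (dept_id, name, head_id))
        return True
    except sqlite3.IntegrityError:
        return False
```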
Key Attribute(s)
• There will normally be one, or perhaps several, attributes that will be unique for every entity instance
  – Example: every student will have a unique student number
• Such an attribute (or combination) is called a key
• If the key for an entity set consists of two or more attributes in combination, it is called a concatenated (composite) key
• Key attribute(s) are underlined on the ER diagram
[Figure: entity type person with attributes Number, Name, Address, DOB, Age, Gender and Qualification.]
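When an entity type is mapped to a table, its key attribute typically becomes the `PRIMARY KEY`, and a concatenated key becomes a multi-column `PRIMARY KEY (a, b)`. This `sqlite3` sketch uses hypothetical table names (`student`, `subject_offering`) and sample data to show the database rejecting a duplicate key.

```python
import sqlite3

# Sketch: the student entity type becomes a table, its attributes columns,
# and its key attribute (student number) the PRIMARY KEY.
# Table/column names and sample data are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (
        number  INTEGER PRIMARY KEY,   -- key attribute: unique for every instance
        name    TEXT NOT NULL,
        address TEXT,
        dob     TEXT,                  -- ISO date string, e.g. '1985-04-12'
        gender  TEXT
    );
    -- A concatenated key: neither column alone identifies a row.
    CREATE TABLE subject_offering (
        subject_code TEXT NOT NULL,
        year         INTEGER NOT NULL,
        lecturer     TEXT,
        PRIMARY KEY (subject_code, year)
    );
    INSERT INTO student VALUES (1001, 'Asha', 'Pune', '1985-04-12', 'F');
""")


def try_insert(sql, params):
    """Run one INSERT; return False if a constraint (here, key uniqueness) rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```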
Derived, Multi-valued attributes
• Sometimes it is useful to have, on the ER diagram, attributes that can be derived from other attributes
  – Example: an attribute Age can be derived from an attribute DOB and the current date
• Derived attributes can be indicated on the ER diagram by using a dashed ellipse and connecting line to the relevant entity type
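Because Age is derived from DOB and the current date, it can be computed on demand instead of stored. A minimal Python sketch follows; passing the reference date as a parameter (rather than reading the clock) is a design choice made here so the result is reproducible.

```python
from datetime import date


def age(dob: date, today: date) -> int:
    """Derived attribute: whole years elapsed between dob and today."""
    had_birthday = (today.month, today.day) >= (dob.month, dob.day)
    return today.year - dob.year - (0 if had_birthday else 1)
```

Storing Age instead would turn it into redundant data that has to be refreshed every day to stay correct.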
Relationship attributes
• A relationship is an association between entity sets
• Relationships can also have attributes
• An attribute of a relationship is drawn attached to the relationship diamond
• Usually only M:N relationships have attributes
[Figure: employee M:N supervises, with the relationship attribute Task.]
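An M:N relationship is usually implemented as its own table, and that table is also where any relationship attribute lives. The `sqlite3` sketch below assumes “supervises” links employees to employees M:N and that Task describes each supervising pair; all names and data are hypothetical.

```python
import sqlite3

# Sketch: employee M:N supervises employee, with Task as an attribute of the
# relationship (it belongs to the pair, not to either employee).
# Names and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL
    );
    CREATE TABLE supervises (
        supervisor_id INTEGER REFERENCES employee(emp_id),
        supervisee_id INTEGER REFERENCES employee(emp_id),
        task          TEXT NOT NULL,          -- attribute of the relationship
        PRIMARY KEY (supervisor_id, supervisee_id)
    );
    INSERT INTO employee VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Chen');
    INSERT INTO supervises VALUES (1, 3, 'Code review'), (2, 3, 'Budget');
""")


def supervisors_of(emp_id):
    """Return (supervisor name, task) pairs for one employee."""
    return conn.execute("""
        SELECT e.name, s.task
        FROM supervises AS s JOIN employee AS e ON e.emp_id = s.supervisor_id
        WHERE s.supervisee_id = ?
        ORDER BY e.name
    """, (emp_id,)).fetchall()
```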
Strong & Weak entities/entity types
• Sometimes the instances of one entity type depend, for their unique identification, on their relationship to the instances of another entity type
[Figure: building “consists of” room; attributes Name and Number.]
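A weak entity type is commonly mapped by borrowing the owner’s key into its own primary key. This `sqlite3` sketch assumes a room number (e.g. 101) is unique only within its building and that rooms should disappear with their building (`ON DELETE CASCADE`); names and data are illustrative.

```python
import sqlite3

# Sketch: room as a weak entity type owned by building. Its primary key is
# the owner's key plus its own partial key. Names and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # required for ON DELETE CASCADE
conn.executescript("""
    CREATE TABLE building (
        name TEXT PRIMARY KEY
    );
    CREATE TABLE room (
        building_name TEXT NOT NULL REFERENCES building(name) ON DELETE CASCADE,
        number        INTEGER NOT NULL,
        PRIMARY KEY (building_name, number)   -- owner's key + partial key
    );
    INSERT INTO building VALUES ('Library'), ('Science');
    INSERT INTO room VALUES ('Library', 101), ('Science', 101), ('Science', 102);
""")


def room_count():
    return conn.execute("SELECT COUNT(*) FROM room").fetchone()[0]


def demolish(building):
    """Delete a building; the cascade removes its rooms. Returns rooms left."""
    conn.execute("DELETE FROM building WHERE name = ?", (building,))
    return room_count()
```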
Supertypes & Subtypes
• Sometimes notionally different entity types are really specializations of a more general entity type
  – Example: trucks, cars, motorcycles, buses and taxis are all motor vehicles
  – Some attributes are common to all, others are specific to one group
• This kind of situation can be dealt with using a generalization hierarchy (or supertype/subtype hierarchy)
  – The attribute(s) that are common belong to the supertype
  – The attributes that are specific are attached to the relevant subtype
Supertypes & Subtypes
[Figure: supertype motor vehicle (attributes Registration, Seats) with disjoint (d) subtypes truck, car and bus, each with its own attributes.]
Supertypes & Subtypes
[Figure: supertype employee (attributes TFN, DOB, Gender, Address) with overlapping (o) subtypes safety officer, engineer and pilot, each with its own attributes.]
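One common mapping of a generalization hierarchy is a table for the supertype plus one per subtype, all sharing the same primary key. The `sqlite3` sketch below follows the overlapping employee hierarchy; the subtype columns (`licence_no`, `discipline`) are hypothetical stand-ins for the unnamed “attributes” on the slide.

```python
import sqlite3

# Sketch: one table per supertype and per subtype, sharing the primary key.
# Because the hierarchy is overlapping ("o"), one employee may appear in
# several subtype tables. Subtype columns and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE employee (
        tfn     TEXT PRIMARY KEY,
        dob     TEXT,
        gender  TEXT,
        address TEXT
    );
    CREATE TABLE pilot (
        tfn        TEXT PRIMARY KEY REFERENCES employee(tfn),
        licence_no TEXT NOT NULL
    );
    CREATE TABLE engineer (
        tfn        TEXT PRIMARY KEY REFERENCES employee(tfn),
        discipline TEXT NOT NULL
    );
    INSERT INTO employee (tfn) VALUES ('E1'), ('E2');
    INSERT INTO pilot VALUES ('E1', 'PL-77');
    INSERT INTO engineer VALUES ('E1', 'Avionics'), ('E2', 'Structures');
""")


def roles(tfn):
    """List the subtypes an employee belongs to."""
    found = []
    for table in ("engineer", "pilot"):   # fixed names, never user input
        if conn.execute(f"SELECT 1 FROM {table} WHERE tfn = ?", (tfn,)).fetchone():
            found.append(table)
    return found
```

A disjoint (d) hierarchy such as motor vehicle would need an extra rule, for example a discriminator column on the supertype, to stop one row from appearing in two subtype tables.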
Three schema architecture for Database development
External level (individual user views): External (COBOL), External (XML)
Conceptual level (neutral view):
  EMPLOYEE
    EMPLOYEE_NUMBER CHARACTER(6)
    DEPARTMENT_NUMBER CHARACTER(4)
    SALARY NUMERIC(5)
Internal level (storage view):
  STORED_EMP BYTES=20
    PREFIX TYPE=BYTE(6), OFFSET=0
    EMP# TYPE=BYTE(6), OFFSET=6, INDEX=EMPX
    DEPT# TYPE=BYTE(4), OFFSET=12
    PAY TYPE=FULLWORD, OFFSET=16
Schemas: External schema, Conceptual schema (neutral view), Internal schema
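The internal-level definition of STORED_EMP can be mirrored with Python’s `struct` module to show how one conceptual EMPLOYEE row maps onto a 20-byte stored record. This assumes BYTE(n) is an n-byte character field and FULLWORD a 4-byte big-endian signed integer (as on IBM mainframes); the helper names are hypothetical.

```python
import struct

# Sketch of the STORED_EMP record. '>' means big-endian with no padding, so
# the fields land at offsets 0 (PREFIX), 6 (EMP#), 12 (DEPT#) and 16 (PAY).
STORED_EMP = struct.Struct(">6s6s4si")


def pack_emp(prefix: bytes, emp_no: str, dept_no: str, pay: int) -> bytes:
    """Encode one conceptual EMPLOYEE row as an internal-level stored record."""
    return STORED_EMP.pack(prefix, emp_no.encode("ascii"), dept_no.encode("ascii"), pay)


def unpack_emp(record: bytes):
    """Decode a stored record back into the conceptual view's fields."""
    _prefix, emp, dept, pay = STORED_EMP.unpack(record)
    return emp.decode("ascii"), dept.decode("ascii"), pay
```

The point of the three-schema split is that this byte layout can change (say, PAY widened or EMP# reordered) without changing the conceptual EMPLOYEE definition or the external views.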
Normalization
Levels of normalization
1NF relvars (normalized entities)
2NF relvars
3NF relvars
BCNF relvars
4NF relvars
5NF relvars
Normalization - Keys
Superkey: a set of one or more attributes that, taken collectively, allows us to identify an entity uniquely.
Candidate key: a superkey that is irreducible, i.e. no proper subset of it is also a superkey.
Primary key: one key selected arbitrarily from the set of candidate keys, to be used in an index for that table.
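These definitions can be made operational with attribute closure: a set of attributes is a superkey if the functional dependencies (FDs) let it determine every attribute, and a candidate key if no proper subset also does. The Python sketch below assumes the FDs {PART#, WAREHOUSE#} → Qty and WAREHOUSE# → WHAddr for WarehouseParts; the brute-force search is exponential and only meant for small examples.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes functionally determined by attrs; fds is a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, all_attrs, fds):
    return closure(attrs, fds) >= set(all_attrs)


def candidate_keys(all_attrs, fds):
    """Minimal superkeys, smallest first. Any superset of a key found earlier is skipped."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            s = set(combo)
            if is_superkey(s, all_attrs, fds) and not any(k <= s for k in keys):
                keys.append(s)
    return keys


# Assumed example: the WarehouseParts relvar and its FDs.
ATTRS = {"PART#", "WAREHOUSE#", "Qty", "WHAddr"}
FDS = [({"PART#", "WAREHOUSE#"}, {"Qty"}), ({"WAREHOUSE#"}, {"WHAddr"})]
```

With these FDs, {PART#, WAREHOUSE#, Qty} is a superkey but not a candidate key, because dropping Qty still leaves a superkey.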
Source: Database Modeling & Design – Toby J. Teorey

Normalization – 1NF
First normal form (1NF): Defn: A relvar is in 1NF if and only if, in every legal value of that relvar, every tuple contains exactly one value for each attribute. Explanation: At each row and column position in the table, there exists one value, never a set of values.
Essence: every row/column value should be atomic.
Violation of 1NF: Employee(EID#, Name, SkillSet, Address1, Address2)
• SkillSet stores comma-separated values (C, VisualBasic, Oracle).
• Address1, Address2: how many more addresses can be stored in this fashion?
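Bringing Employee into 1NF means moving the comma-separated SkillSet into a separate relation with one atomic skill per row. A minimal Python sketch on hypothetical sample data:

```python
# Sketch: split the multi-valued SkillSet column into (EID#, Skill) rows.
# The sample employees are hypothetical.
employees = [
    {"EID#": "E1", "Name": "Asha", "SkillSet": "C, VisualBasic, Oracle"},
    {"EID#": "E2", "Name": "Ravi", "SkillSet": "Oracle"},
]


def to_1nf(rows):
    """Return (employee rows without SkillSet, list of (EID#, Skill) rows)."""
    emp, emp_skill = [], []
    for r in rows:
        emp.append({"EID#": r["EID#"], "Name": r["Name"]})
        for skill in r["SkillSet"].split(","):
            skill = skill.strip()
            if skill:                      # ignore empty entries such as "C,,Oracle"
                emp_skill.append((r["EID#"], skill))
    return emp, emp_skill


emp, emp_skill = to_1nf(employees)
```

After the split, “who knows Oracle?” is a simple equality test on one column instead of a substring search inside a list.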
Sources: Administration Guide: Planning (DB2); Database Systems – C.J. Date

Normalization – 2NF
Second normal form (2NF): (assuming one candidate key, which we assume is the primary key)
Defn: A relvar is in 2NF if and only if it is in 1NF and every nonkey attribute is irreducibly dependent on the primary key.
Explanation: each column that is not part of the key depends upon the whole key, not on just part of it.
Essence: all non-key attributes must depend on the whole key value.
Violation of 2NF: WarehouseParts(PART#, WAREHOUSE#, Qty, WHAddr)
• WAREHOUSE# → WHAddr (WHAddr depends on only part of the key)
• {PART#, WAREHOUSE#} → Qty
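The 2NF fix moves the partially dependent WHAddr into a table keyed by WAREHOUSE# alone. This `sqlite3` sketch assumes WAREHOUSE# → WHAddr and that Qty depends on the whole key {PART#, WAREHOUSE#}; column names drop the `#` for SQL, and the data are hypothetical.

```python
import sqlite3

# Sketch of the 2NF decomposition of WarehouseParts. Names and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE warehouse (
        warehouse_no TEXT PRIMARY KEY,
        wh_addr      TEXT NOT NULL          -- depends on warehouse_no only
    );
    CREATE TABLE stock (
        part_no      TEXT,
        warehouse_no TEXT REFERENCES warehouse(warehouse_no),
        qty          INTEGER NOT NULL,      -- depends on the whole key
        PRIMARY KEY (part_no, warehouse_no)
    );
    INSERT INTO warehouse VALUES ('W1', '1 Dock Rd'), ('W2', '9 Hill St');
    INSERT INTO stock VALUES ('P1', 'W1', 40), ('P2', 'W1', 15), ('P1', 'W2', 7);
""")


def move_warehouse(warehouse_no, new_addr):
    """The address is stored once, so relocating a warehouse updates one row."""
    cur = conn.execute("UPDATE warehouse SET wh_addr = ? WHERE warehouse_no = ?",
                       (new_addr, warehouse_no))
    return cur.rowcount


def original_view():
    """Rebuild WarehouseParts(PART#, WAREHOUSE#, Qty, WHAddr) with a join."""
    return conn.execute("""
        SELECT s.part_no, s.warehouse_no, s.qty, w.wh_addr
        FROM stock AS s JOIN warehouse AS w USING (warehouse_no)
        ORDER BY s.part_no, s.warehouse_no
    """).fetchall()
```

In the undecomposed table the same relocation would have to update every row stocked in W1, and missing one would leave conflicting addresses.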
Sources: Administration Guide: Planning (DB2); Database Systems – C.J. Date

Normalization – 3NF
Third normal form (3NF): (assuming one candidate key, which we assume is the primary key)
Defn: A relvar is in 3NF if and only if it is in 2NF and every nonkey attribute is nontransitively dependent on the primary key.
Note: “No transitive dependencies” implies no mutual dependencies.
Explanation: each column that is not part of the key depends upon the key and upon no other non-key column.
Essence: all non-key attributes must depend “only” on the key value and on no other non-key attribute.
Violation of 3NF: Emp_Dept(EID#, FirstName, LastName, WorkDept, DeptName)
• WorkDept → DeptName, so DeptName depends on the key only transitively (EID# → WorkDept → DeptName)
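The sketch below checks the dependency WorkDept → DeptName against hypothetical Emp_Dept rows and then performs the 3NF decomposition on WorkDept. Note that checking rows can only refute an FD, never prove it: an FD is a constraint on every legal value of the relvar, not a property of one sample.

```python
# Sketch on hypothetical data: Emp_Dept with the transitive dependency
# EID# -> WorkDept -> DeptName.
emp_dept = [
    {"EID#": 1, "FirstName": "Asha", "LastName": "Rao", "WorkDept": "D11", "DeptName": "Manufacturing"},
    {"EID#": 2, "FirstName": "Ravi", "LastName": "Iyer", "WorkDept": "D11", "DeptName": "Manufacturing"},
    {"EID#": 3, "FirstName": "Mei", "LastName": "Lin", "WorkDept": "E21", "DeptName": "Support"},
]


def fd_holds(rows, lhs, rhs):
    """True if, in these rows, equal lhs values always come with equal rhs values."""
    seen = {}
    for r in rows:
        key = tuple(r[a] for a in lhs)
        val = tuple(r[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True


# 3NF decomposition: Employee(EID#, FirstName, LastName, WorkDept) and Department(WorkDept, DeptName).
employee = [{k: r[k] for k in ("EID#", "FirstName", "LastName", "WorkDept")} for r in emp_dept]
department = {r["WorkDept"]: r["DeptName"] for r in emp_dept}   # one entry per department


def rejoin():
    """Natural join of the two projections on WorkDept."""
    return [{**e, "DeptName": department[e["WorkDept"]]} for e in employee]
```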
Sources: Administration Guide: Planning (DB2); Database Systems – C.J. Date

Normalization – BCNF
Boyce/Codd normal form (BCNF): (relevant when a relvar has composite candidate keys)
Defn: A relvar is in BCNF if and only if every non-trivial, left-irreducible FD has a candidate key as its determinant.
Explanation: every determinant (the left-hand side of a non-trivial FD) must be a candidate key. BCNF is stricter than 3NF only when there are overlapping composite candidate keys.
Essence: every determinant is a candidate key.
Violation of BCNF: HotelRoom(HNo#, Room#, RoomType)
• RoomType → HNo#, but RoomType ↛ Room#: RoomType is a determinant without being a candidate key
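A BCNF test checks each non-trivial FD and flags those whose left side is not a superkey. The Python sketch assumes, for illustration, the FDs {HNo#, Room#} → RoomType and RoomType → HNo# (each room type belongs to one hotel); these FDs are an assumption, not taken verbatim from the slide.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs; fds is a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """FDs whose determinant is not a superkey (trivial FDs are ignored)."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs
            and closure(lhs, fds) != set(all_attrs)]


# Assumed FDs for HotelRoom(HNo#, Room#, RoomType).
HOTEL_ROOM = {"HNo#", "Room#", "RoomType"}
FDS = [({"HNo#", "Room#"}, {"RoomType"}), ({"RoomType"}, {"HNo#"})]
```

Decomposing into RoomTypes(RoomType, HNo#) and Rooms(Room#, RoomType) removes the violation, but the FD {HNo#, Room#} → RoomType then spans two tables. This loss of dependency preservation is the usual price of going from 3NF to BCNF.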
Source: Database Systems – C.J. Date

Normalization – 4NF
Fourth normal form (4NF):
Defn: Relvar R is in 4NF if and only if, whenever there exist subsets A and B of the attributes of R such that the nontrivial MVD A →→ B is satisfied, all attributes of R are also functionally dependent on A.
Explanation: no row contains two or more independent multi-valued facts about an entity.
Essence: two independent multi-valued facts cannot be stored in the same entity.
Violation of 4NF: Emp_Skill(EID#, SkillName#, Language#)
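If an employee’s skills and languages are independent (EID# →→ SkillName# | Language#), Emp_Skill must hold every skill × language pairing, so a fact such as “E1 speaks French” is repeated once per skill. The Python sketch below uses hypothetical data to show that the two projections rejoin to the original table.

```python
# Sketch on hypothetical data: independent multi-valued facts in one table.
emp_skill = {
    ("E1", "C", "English"), ("E1", "C", "French"),
    ("E1", "Oracle", "English"), ("E1", "Oracle", "French"),
    ("E2", "Oracle", "Hindi"),
}

# 4NF decomposition: Emp_Skills(EID#, SkillName#) and Emp_Languages(EID#, Language#).
skills = {(e, s) for e, s, _ in emp_skill}
languages = {(e, l) for e, _, l in emp_skill}


def rejoin(skills, languages):
    """Natural join of the two projections on EID#."""
    return {(e, s, l) for e, s in skills for e2, l in languages if e == e2}


def add_language(eid, lang):
    """A new language is one row in the decomposed design. Returns the size of the rejoined table."""
    languages.add((eid, lang))
    return len(rejoin(skills, languages))
```

Adding “German” for E1 is a single insert here, whereas the original Emp_Skill would need one new row per skill that E1 has.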
Sources: Administration Guide: Planning (DB2); Database Systems – C.J. Date

Normalization – 5NF (PJ/NF)
Fifth normal form (5NF):
Defn: Relvar R is in 5NF (also called projection-join normal form) if and only if every nontrivial join dependency that holds for R is implied by the candidate keys of R.
Explanation: if a table can still be decomposed losslessly (other than in ways implied by its candidate keys), it should be decomposed. R{A,B,C} satisfies the JD *{AB, AC} if and only if the MVDs A →→ B and A →→ C hold in R:
$A \twoheadrightarrow B \mid C \iff {*}\{AB, AC\}$
Essence: a table should not be decomposable without loss into smaller tables, except along its candidate keys.
Violation of 5NF:
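A join dependency *{X1, …, Xn} holds when the relation equals the natural join of its projections onto X1, …, Xn. The Python sketch below tests that on a relation instance, using a supplier/part/project sample modelled on the classic textbook example (the data are an assumption). In this sample, the three-way JD holds but no two-way decomposition is lossless, which is the situation 5NF addresses.

```python
from itertools import product

# Assumed sample relation: rows are dicts keyed by attribute name.
SPJ = [
    {"S": "S1", "P": "P1", "J": "J2"},
    {"S": "S1", "P": "P2", "J": "J1"},
    {"S": "S2", "P": "P1", "J": "J1"},
    {"S": "S1", "P": "P1", "J": "J1"},
]


def project(rows, attrs):
    """Projection onto attrs, duplicates removed; each tuple is sorted (attr, value) pairs."""
    return {tuple(sorted((a, r[a]) for a in attrs)) for r in rows}


def natural_join(left, right):
    """Combine every pair of tuples that agree on their shared attributes."""
    out = set()
    for lt, rt in product(left, right):
        ld, rd = dict(lt), dict(rt)
        if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
            out.add(tuple(sorted({**ld, **rd}.items())))
    return out


def satisfies_jd(rows, components):
    """True if rows equal the join of their projections onto the given components."""
    joined = project(rows, components[0])
    for comp in components[1:]:
        joined = natural_join(joined, project(rows, comp))
    return joined == project(rows, rows[0].keys())
```

Joining only SP and PJ produces a spurious row (S2, P1, J2); the third projection JS filters it out again.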
Source: Database Systems – C.J. Date

Normalization – Others
Domain Key normal form (DK/NF):
Defn: A relvar R is said to be in DKNF if and only if every constraint on R is a logical consequence of the domain constraints and key constraints that apply to R.
Explanation: --
Principle of Orthogonal Design (a digression): e.g. if SA holds suppliers located in Paris and SB holds suppliers that are not in Paris or have status 30, a Paris supplier with status 30 can appear in both SA and SB, giving rise to update anomalies.
SX(S#, Sname, Status), SY(S#, Sname, City)
This principle is best used in distributed database design.
Source: Database Systems – C.J. Date

Denormalization - Types
1. Collapsing tables
   a. Two entities in an M:N relationship: this can be applied to avoid frequent joins.
   b. Two entities in a 1:1 relationship: to avoid updates to two separate entities that are in a 1:1 relationship.
2. Reference data in a 1:M relationship (add redundant columns): when large composite keys or derived keys are used, they can be added to the child entity in a 1:M relationship as a foreign key, again to avoid certain join operations.
3. Entities with the most detailed data: when MVDs or a temporal design is in place, summarized data about the MVD attribute or temporal dimension (e.g. months) can be stored.
4. Derived attributes: when an attribute is derived by a function of another, it may be better to store the derived attribute as well (e.g. SearchName: for y = f(x), store both x and y in R).
5. Splitting tables (horizontal / vertical splitting)
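Denormalization type 4 (derived attributes) stores y = f(x) next to x, so the redundancy has to be kept consistent. The `sqlite3` sketch below assumes a hypothetical definition SearchName = upper-case Name with spaces removed. It stores and indexes the derived column and uses triggers to recompute it whenever Name is inserted or changed.

```python
import sqlite3

# Sketch: a stored, indexed derived column kept in sync by triggers.
# The SearchName rule and all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        cust_id     INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        search_name TEXT                       -- redundant: derived from name
    );
    CREATE INDEX customer_search ON customer(search_name);
    CREATE TRIGGER customer_ins AFTER INSERT ON customer BEGIN
        UPDATE customer SET search_name = UPPER(REPLACE(NEW.name, ' ', ''))
        WHERE cust_id = NEW.cust_id;
    END;
    CREATE TRIGGER customer_upd AFTER UPDATE OF name ON customer BEGIN
        UPDATE customer SET search_name = UPPER(REPLACE(NEW.name, ' ', ''))
        WHERE cust_id = NEW.cust_id;
    END;
""")


def stored_search_name(cust_id):
    return conn.execute("SELECT search_name FROM customer WHERE cust_id = ?",
                        (cust_id,)).fetchone()[0]


def add_customer(cust_id, name):
    conn.execute("INSERT INTO customer (cust_id, name) VALUES (?, ?)", (cust_id, name))
    return stored_search_name(cust_id)


def rename(cust_id, name):
    conn.execute("UPDATE customer SET name = ? WHERE cust_id = ?", (name, cust_id))
    return stored_search_name(cust_id)


def find(text):
    """Look up by the stored derived value (can use the index)."""
    key = text.upper().replace(" ", "")
    return [r[0] for r in conn.execute(
        "SELECT cust_id FROM customer WHERE search_name = ? ORDER BY cust_id", (key,))]
```

The trade-off is typical of denormalization. Reads get cheaper, but every write now costs a trigger execution, and any path that bypasses the triggers (for example, a bulk load with triggers disabled) can leave x and y inconsistent.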
Source: Denormalization effects on Performance of RDBMS, G. Lawrence Sanders, Seungkyoon Shin, State University of New York, Buffalo

Denormalization – Criteria
Criteria
• General application performance requirements, indicated by business needs
• On-line response time requirements for application queries, updates and processes
• Minimum number of data access paths
• Minimum amount of storage
Source: Database Modeling & Design – Toby J. Teorey

Denormalization – Alternatives
Alternatives
• Application performance criteria
• Future application development and maintenance considerations
• Volatility of application requirements
• Relations between transactions and relations of entities involved
• Transaction type (update/query, OLTP/OLAP)
• Transaction frequency
• Access paths needed by each transaction
• Number of rows accessed by each transaction
• Number of pages/blocks accessed by each transaction
• Cardinality of each relation
Source: Database Modeling & Design – Toby J. Teorey
Data Modeling
Diagramming Notations
Notation
• Bachman notation
• Chen ERD
• Database Model Diagram
Diagramming Notations

Diagramming Notations
Database Model Diagram

Diagramming Notations
ER Source Model
Diagramming Notations - IDEF1X Notation
[Figure: IDEF1X relationship cardinality symbols (zero, one or more; one or more (P); zero or one (Z); exactly n; from n to m (n-m); (n) reference to a note where cardinality is specified) and attribute and primary key syntax (an Entity-name/Entity-number box with the primary-key attributes above the line and the remaining attributes below).]
Source: IDEF1X Formalization, 1993, Robert G. Brown (The Database Design Group)

Diagramming Notations - IDEF1X Notation
[Figure: a named identifying relationship and a named mandatory non-identifying relationship, each between a parent Entity-A (Key-Attribute-A) and a child Entity-B. In the identifying relationship, Entity-B's key includes Key-Attribute-A (FK) along with Key-Attribute-B; in the mandatory non-identifying relationship, Entity-B is keyed by Key-Attribute-B alone and carries Key-Attribute-A (FK) as a non-key attribute.]
* The Parent Entity in an Identifying Relationship may be an Identifier-Independent Entity (as shown) or an Identifier-Dependent Entity, depending upon other relationships.
* The Parent Entity in a Mandatory Non-Identifying Relationship may be an Identifier-Independent Entity (as shown) or an Identifier-Dependent Entity, depending upon other relationships.
** The Child Entity in an Identifying Relationship is always an Identifier-Dependent Entity.
** The Child Entity in a Mandatory Non-Identifying Relationship will be an Identifier-Independent Entity unless the entity is also a Child Entity in some Identifying Relationship.
Source: IDEF1X Formalization, 1993, Robert G. Brown (The Database Design Group)

Diagramming Notations - IDEF1X Notation
Optional Non-Identifying Relationship
[Figure: a named optional non-identifying relationship between parent Entity-A (Key-Attribute-A) and child Entity-B, which is keyed by Key-Attribute-B and carries Key-Attribute-A (FK) as a non-key attribute.]
* The Parent Entity in an Optional Non-Identifying Relationship may be an Identifier-Independent Entity (as shown) or an Identifier-Dependent Entity, depending upon other relationships.
** The Child Entity in an Optional Non-Identifying Relationship will be an Identifier-Independent Entity unless the entity is also a Child Entity in some Identifying Relationship.
Source: IDEF1X Formalization, 1993, Robert G. Brown (The Database Design Group)

Diagramming Notations - IDEF1X Notation
Domain Hierarchy
[Figure: base domain Frequency with typed domains Radio Frequency (subdivided into Ultra High Frequency (UHF), Very High Frequency (VHF) and High Frequency (HF)) and Audio Frequency (subdivided into Ultra-Sonic, Sonic and Sub-Sonic).]
Source: IDEF1X Formalization, 1993, Robert G. Brown (The Database Design Group)

Diagramming Notations - IDEF1X Notation
Team Organization
[Figure: modeling team roles: Project Manager, Modeler, Source, Expert, and Acceptance Review Committee.]
Source: IDEF1X Formalization, 1993, Robert G. Brown (The Database Design Group)

Diagramming Notations - IDEF1X Diagram
[Figure: the IDEF1X metamodel drawn as an IDEF1X diagram, with entities idef1xObject, entity, view, domain, aliasDomain, baseDomain, typedDomain, aliasEntity, viewEntity, connectionRelationship, cluster, category, viewEntityAttribute, alternateKey, primaryKeyAttribute, alternateKeyAttribute and connectionForeignKeyAttribute, and their keys (AK1 = alternate key, FK = foreign key, O = optional).]
Source: IDEF1X Formalization, 1993, Robert G. Brown (The Database Design Group)

Diagramming Guidelines
• Identify layout conventions
• Analyze information requirements for attributes
• Model attributes
• Identify multi-valued attributes
• Validate attributes
• Identify common and derived data
• Understand the use of domains
• Identify the components of a data warehouse
Source: Adding Detail to the Diagram - Annette Scott, Oracle

Lay Out the ER Diagram
• Neat and tidy
• Unambiguous text
• Memorable patterns
Source: Adding Detail to the Diagram - Annette Scott, Oracle

Layout Guidelines
Dead Crows Fly East!
Source: Adding Detail to the Diagram - Annette Scott, Oracle

Attributes
Badge Number - Identifies an employee
Name - Qualifies an employee
Payroll category (weekly or salaried) - Classifies an employee
Date of birth - Quantifies an employee
Employment status (active, leave, terminated) - Expresses the status of an employee
Source: Adding Detail to the Diagram - Annette Scott, Oracle

Finding Attributes
Is this attribute really needed?
Beware of obsolete requirements from previous systems
Beware of derived data
Source: Adding Detail to the Diagram - Annette Scott, Oracle

Attribute Diagramming Conventions
• Inside the entity's soft box
• Singular
• Lowercase
[Figure: EMPLOYEE soft box listing badge num, first name, last name, payroll num, date of birth and employment status.]
Source: Adding Detail to the Diagram - Annette Scott, Oracle

Meaningful Components
Break down aggregate attributes.
[Figure: PERSON with name becomes PERSON with last name and first name; ITEM with code becomes ITEM with type and vendor num.]
Source: Adding Detail to the Diagram - Annette Scott, Oracle

Verify for Single Value
Can an attribute have more than one value for an instance of the entity?
[Figure: RENTAL with attributes transaction date, total amount paid and item.]
Yes, more than one item may be rented at a time. An entity is missing.
[Figure: RENTAL (transaction date, total amount paid) with a new entity RENTAL ITEM (item num).]
Source: Adding Detail to the Diagram - Annette Scott, Oracle

Attributes Which Have Attributes
Does information need to be stored about any of the attributes?
[Figure: TITLE with attributes product code, title, description and review details.]
Yes, review details. An entity is missing.
[Figure: TITLE (product code, title, description) with a new entity REVIEW (author, comment, date recorded).]
Source: Adding Detail to the Diagram - Annette Scott, Oracle

Finding Common or Derived Data
• Count
• Total
• Maximum, minimum, average
• Calculation
Derived attributes are redundant and can lead to inconsistent values
Source: Adding Detail to the Diagram - Annette Scott, Oracle

Attribute Optionality
Mandatory attributes
• A value must be stored for each entity instance
• Tagged with *
Optional attributes
• A value may be stored for each entity instance
• Tagged with o
[Figure: EMPLOYEE with badge num, * first name, * last name, o title, o weight.]
Source: Adding Detail to the Diagram - Annette Scott, Oracle

Attribute Details and Volumes
Attribute: * Engine Size
Format type: Number
Maximum length: 4
Average length: 4
Decimal places: 1
Unit of measure: cc
Allowable values: 900, 1000, 1500, 1800, 2000
Volume: initial 100%
Source: Adding Detail to the Diagram - Annette Scott, Oracle

Using a Domain
[Figure: the domain AUDIO, with values MON (Mono), STE (Stereo) and SUR (Surround Sound), used by the Audio attribute of both Movie and Game.]
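A domain shared by several entities can be implemented once, for example as a small lookup table that every using column references. This `sqlite3` sketch assumes that approach for AUDIO; the `movie`/`game` tables and sample titles are hypothetical. (A `CHECK (audio IN (...))` constraint is an alternative, but it would have to be repeated in each table.)

```python
import sqlite3

# Sketch: the AUDIO domain as a lookup table referenced by Movie and Game.
# Table names and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE audio (
        code        TEXT PRIMARY KEY,
        description TEXT NOT NULL
    );
    INSERT INTO audio VALUES ('MON', 'Mono'), ('STE', 'Stereo'), ('SUR', 'Surround Sound');
    CREATE TABLE movie (title TEXT PRIMARY KEY, audio TEXT REFERENCES audio(code));
    CREATE TABLE game  (title TEXT PRIMARY KEY, audio TEXT REFERENCES audio(code));
""")


def add_item(table, title, audio_code):
    """Insert into movie or game; return False if audio_code is outside the domain."""
    if table not in ("movie", "game"):
        raise ValueError(f"unknown table: {table}")
    try:
        conn.execute(f"INSERT INTO {table} VALUES (?, ?)", (title, audio_code))
        return True
    except sqlite3.IntegrityError:
        return False
```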
Source: Adding Detail to the Diagram - Annette Scott, Oracle

Data Warehousing
[Figure: data warehouse components (fact data, reference data, summary data and metadata) managed by load management, warehouse management and query management.]
Source: Adding Detail to the Diagram - Annette Scott, Oracle
Database Design Techniques
Design Techniques
Three approaches can be followed.
Sentence Analysis:
• Ask the business user to tell ‘their story’.
• The resulting sentences serve as the basic constituents of the tasks and processes performed in the IS to be supported.
• Extract data requirements from those sentences.
Document Analysis:
• Analyze documents, including transactions and reports.
• Interview results, observation results, policies and procedures, output of existing systems (reports & screens), inputs to existing systems (forms & screens), database/file specifications of existing systems.
Event Analysis:
• Identify and describe what happens (the events), who is involved (actors and business resources), and what responses are required. (Follow the Zachman interrogatives.)
Sources: 1) Logical Data Modeling - Salvatore T. March, 2) IDEF1x.doc

Sentence Analysis
• Salespeople service Customers.
• Customers place Orders through a Salesperson.
• Freight is determined when an Order is Shipped.
• Salespeople are paid commission based on their commission rate and Invoiced sales.
• Each Salesperson has a number, name, and address.
• Each Customer has a number and a bill-to-address.

• Identify subjects, verb phrases and objects.
• Specific instances must be generalized.
• If subject and object are both entities, then the verb phrase represents a relationship.
• If subject is an entity but object is a fact about that entity, then the object is an attribute and the verb phrase explains the meaning of the attribute.
Source: Logical Data Modeling - Salvatore T. March

Document Analysis
INVOICE
Sample Company, Inc., 111 Any Street, Anytown, USA
Number: 157289   Date: 10/02/90
Bill To: Local Grocery Store, 132 Local Street, Localtown, USA
Customer Number: 0361   Salesperson: 4531 – Joe Smith   Customer PO: 3291
Terms: Net 30   FOB Point: Anytown

Line No. | Product Number | Product Description | Unit of Sale | Qty Order | Qty Ship | Qty Backord | Unit Price | Discount | Extension
1 | 2157 | Cheerios | Carton | 40 | 40 | 0 | 50.00 | 5 % | 1900.00
2 | 2283 | Oat Rings | Each | 300 | 200 | 100 | 2.00 | 0 % | 400.00
3 | 0579 | Corn Flakes | Carton | 30 | 30 | 0 | 40.00 | 10 % | 1080.00

Order Gross: 4380.00
Tax at 6 %: 262.80
Freight: 50.00
Order Net: 4692.80

• Each heading can be an entity, an attribute or a derived attribute.
• Relationships need to be defined from a careful analysis only.
Source: Logical Data Modeling - Salvatore T. March

Document Analysis – Data Flow Diagram (DFD)
Examples of DFDs: Level 0, Level 1, Level 2, etc.
Source: Logical Data Modeling - Salvatore T. March

Event Analysis
• Define an entity for each event. Identify the associated actor.
• Hence, Place Order, Ship Order, Invoice Order, and Pay Invoice are all entities.
Source: Logical Data Modeling - Salvatore T. March

Design Evaluation
• Each entity must be uniquely identified.
• Attributes are associated with entities (not relationships), and each entity must have one and only one value for each of its attributes (otherwise an additional entity must be created).
• Relationships associate a pair of entities or associate an entity with itself (only binary relationships are allowed, but relationships can be recursive).
• Many-to-many relationships are not allowed.
• Subtypes are identified when the minimum degree of a relationship descriptor is zero or when an attribute does not apply to all instances of an entity.
Source: Logical Data Modeling - Salvatore T. March

Further Reading
• CASE*Method: Entity Relationship Modeling by Richard Barker is an excellent introduction to ER modeling.
• Relational Database Design by Fleming and von Halle goes step by step into the nuts and bolts, all the way to the physical side.
• Practical Issues in Database Management by Fabian Pascal will introduce many of the perennial tough problems in data modeling, and will help assure the new data modeler that there's more to data modeling than what is supported by current commercial implementations of SQL and relational database management products.
Concurrency
Benchmarking
Dependability Estimation
Mean time to failure (MTTF):
Mean time to repair (MTTR):
Availability: $A_i = \dfrac{\mathrm{MTTF}_i}{\mathrm{MTBF}_i}$, where $\mathrm{MTBF}_i = \mathrm{MTTF}_i + \mathrm{MTTR}_i$
Reliability:
Mean transaction time:
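The availability formula applies directly to each component i. The Python sketch below computes it and, as an added assumption not stated on the slide, combines components connected in series (every component must be up, failures independent) by multiplying their availabilities. The hour figures used to exercise it are made up.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """A = MTTF / MTBF, with MTBF = MTTF + MTTR."""
    mtbf = mttf_hours + mttr_hours
    return mttf_hours / mtbf


def system_availability(components):
    """Series system of independent components given as (MTTF, MTTR) pairs:
    the system is up only when all components are up, so availabilities multiply."""
    result = 1.0
    for mttf, mttr in components:
        result *= availability(mttf, mttr)
    return result
```

For example, a component that fails on average every 999 hours and takes 1 hour to repair has availability 0.999; two such components in series give about 0.998.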
Data Warehousing - Concepts
Database types
Relational:
Network:
Hierarchical:
Object-Oriented:
Spatial-Geographic:
Multimedia:
Temporal:
Text:
Active:
Real Time:
Database – J2EE
Database - .NET