Database Design
Database design - Satya
Satya © 2004

Architect's buzZ word
"Normalization is a logical concept, performance is determined at the physical level. Therefore, it is impossible to denormalize for performance."
– Fabian Pascal, co-founder and editor of Database Debunkings (dbdebunk.com)

"Denormalization, if necessary, should be done at the level of stored files, not at the level of base relvars"
"denormalization, is not 'good for performance', it is good for the performance of specific applications"
– Chris J. Date, one of the most respected database experts in the computer industry; author of An Introduction to Database Systems

Background
• Presenting concepts, not syntax.
• Presenting "How" & "What", not "Why", in the RDBMS.

Agenda
• Introduction

Ready for the data eXplosion?
[Figure: timeline from cave paintings and bone tools (40,000 BCE) through writing (3500 BCE), paper (105 C.E.), printing (1450), electricity and the telephone (1870), the transistor (1947), computing (1950), the Internet (DARPA, late 1960s) and the web (1993), reaching 1.5B gigabytes of content in 1999. Source: UC Berkeley]

The coming content - "Big Bang"
[Figure: the same timeline, with gigabytes of content rising to 3B (2000), 6B (2001), 12B (2002) and 24B (2003). Source: UC Berkeley]

Future data size
• Terabytes of data
– are now a common corporate expression
– petabytes (10^15 bytes) and exabytes (10^18 bytes) are fast approaching
• 2-3 exabytes = total volume of all information generated worldwide annually
– Structure is needed to handle such large volumes of data efficiently.
Source: IBM Informix Conference, Las Vegas, 2001.

Database - the solution
Database? An organized store of information:
– Flat files: Adabas, FileMaker
– Hierarchical databases: IBM's Information Management System (IMS), used in the Apollo Moon landing program
– Network databases: GE's Integrated Data Store (IDS)
– Relational databases: Oracle, Db2, Sybase, MS SQL Server, Postgres
– Object-relational databases: Oracle 9
– Object databases: Cloudscape

Database development activities during the systems development life cycle (SDLC)
• Project identification and selection: enterprise modeling
• Project initiation and planning: conceptual data modeling
• Analysis
• Logical design
• Physical design
• Implementation
• Maintenance

Database application lifecycle
• Database planning
• Systems definition
• Requirements analysis
• Database design: conceptual design, logical design, physical design
• DBMS selection
• Application design
• Prototyping
• Implementation
• Data loading / migration
• Testing
• Maintenance

Database design flow
Conceptual design (DBMS independent)
• Data analysis and requirements: determine end-user views, outputs, and transaction processing requirements.
• Entity relationship modeling and normalization: define entities, attributes and relationships; draw ER diagrams; normalize tables.
• Data model verification: identify main processes and insert, update and delete rules; validate reports, queries, views, integrity, sharing and security.
• Distributed database design: define the location of tables, access requirements and fragmentation strategy.
DBMS software selection
Logical design (DBMS dependent)
• Translate the conceptual model into definitions for tables, views and so on.
Physical design (hardware dependent)
• Define storage structures and access paths for optimum performance.

Design approach
• Entity-Relationship (ER) data modeling
– A graphical technique for understanding and organizing the data independently of the eventual database implementation
• Normalization
– An algorithmic process for evaluating the quality of a database design; most applicable to relational database designs
• Types of models
– Models (of databases or anything else) can be built at different levels of abstraction.
– For databases (following the text):
• Conceptual / logical: ER models (represent semantics)
• Internal: for the chosen DBMS
• External: the way the user sees the data
• Physical: for the actual physical storage

ER Modeling

ER modeling concepts
Entity-relationship basics
• The concepts upon which ER models are built are:
– Entities (or, more correctly, entity types), also called "relvars", "base relvars" or "relations"; at the physical implementation level they are called tables
– Relationships (between entities)
– Attributes (of entities and relationships)

Entities & entity types
• An entity is "a person, place, event, or thing for which we intend to collect data".
• Normally a database will contain data about groups of similar entities (e.g. students, subjects, licenses, aircraft).
• These groups of similar entities are referred to as entity types, but this is often shortened to just "entity" or "entities".

Entity types & attributes
• Entity types are conventionally named in the singular.
• Attributes are represented on ER diagrams as ellipses attached to the relevant entity type symbol.
• There are other notations as well (e.g.
a list of attributes next to the entity type symbol), but they are conceptually equivalent.
[Figure: entity type student with attributes Number, Name, DOB, Address and Gender]

Relationships
• A relationship is an association between entity types.
• Relationships are represented by diamond-shaped symbols on ER diagrams.
• A descriptive name is placed inside the relationship symbol.
[Figure: relationship "enrols in" connecting student and subject]

Relationship & entity
• Entity type names are usually nouns.
• Relationship names are usually, though not always, verbs (or verb phrases).
• Most relationships are binary (i.e. they connect 2 entity types), like "enrols in".
• Other types of relationships are possible.

Degree of a relationship
• The degree of a relationship is the number of entity types that it connects:
– One: unary
– Two: binary
– Three: ternary
• Relationships of degree higher than three are rare.
[Figure: binary relationship "enrols in" between student and subject; unary relationship "supervises" on employee; ternary relationship "sale" among item, vendor and purchaser, shown alongside three binary "sale" relationships among the same entity types]

Relationship connectivity (cardinality)
• Relationships can have different connectivities:
– one-to-one (1:1)
– one-to-many (1:N)
– many-to-many (M:N)
• Connectivity is indicated on the ER diagram by placing an appropriate symbol on each "leg" of the relationship.
[Figure: M:N "enrols in" between student and subject; 1:N unary "supervises" on employee (1 supervisor, N supervised); 1:N "teaches" between lecturer and subject]

Minimum and maximum cardinality of entity types E and F in a relationship R:

                  One-to-one   Many-to-one   Many-to-many
min-card(E, R)    0            0             0
max-card(E, R)    1            N             N
min-card(F, R)    0            1             0
max-card(F, R)    1            1             N

Relationship participation
• Entity types connected by a relationship can have two kinds of "participation" in it:
– Partial (or optional)
– Total (or mandatory)
• "Total" means that every entity instance must be connected (through the relationship) to an instance of the other participating entity type(s).
• "Partial" means not total.
[Figure: 1:1 relationship "head of" between staff and department]

Key attribute(s)
• There will normally be one, or perhaps several, attributes that are unique for every entity instance.
• Example: every student will have a unique student number.
• Such an attribute (or combination) is called a key.
• If the key for an entity set consists of two or more attributes in combination, it is called a concatenated key.
• Key attribute(s) are underlined on the ER diagram.
[Figure: entity type person with attributes Number, Name, DOB, Age, Address, Gender and Qualification]

Derived, multi-valued attributes
• Sometimes it is useful to have, on the ER diagram, attributes that can be derived from other attributes.
• Example: an attribute Age can be derived from an attribute DOB and the current date.
• Derived attributes can be indicated on the ER diagram by using a dashed ellipse and connecting line to the relevant entity type.

Relationship attributes
• A relationship is an association between entity sets.
• Relationships can also have attributes.
• An attribute of a relationship is drawn attached to the relationship diamond.
• Usually only M:N relationships have attributes.
[Figure: M:N relationship "supervises" involving employee, with the attribute Task]

Strong & weak entities/entity types
• Sometimes the instances of one entity type depend, for their unique identification, on their relationship to the instances of another entity type.
[Figure: building (Name) "consists of" room (Number); room depends on building for its identification]

Supertypes & subtypes
• Sometimes notionally different entity types are really specializations of a more general entity type.
• Example: trucks, cars, motorcycles, buses and taxis are all motor vehicles.
• Some attributes are common to all; others are specific to one group.
• This kind of situation can be dealt with using a generalization hierarchy (or supertype/subtype hierarchy):
– the attribute(s) that are common belong to the supertype
– the attributes that are specific are attached to the relevant subtype
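As a rough illustration of how the ER constructs above might be carried into a relational schema, here is a minimal sketch using Python's built-in sqlite3 module. It is one possible mapping, not the author's method: every table and column name (lecturer, student, subject, enrols_in, building, room, motor_vehicle, truck, grade, payload_kg) is hypothetical, chosen only to echo the slide examples. It shows a key attribute as a primary key, a 1:N relationship as a foreign key on the N side (NOT NULL standing in for total participation), an M:N relationship as its own table with a concatenated key and a relationship attribute, a weak entity borrowing its owner's key, a subtype table sharing the supertype's key, and Age kept as a derived attribute computed from DOB rather than stored.

```python
import sqlite3
from datetime import date

# Hypothetical schema: every table and column name here is illustrative,
# chosen to echo the slide examples rather than taken from a real system.
SCHEMA = """
CREATE TABLE lecturer (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE student (
    number  INTEGER PRIMARY KEY,  -- key attribute
    name    TEXT NOT NULL,
    dob     TEXT NOT NULL,        -- ISO date; Age is derived, so not stored
    address TEXT,
    gender  TEXT
);
-- 1:N "teaches": the foreign key goes on the N side. NOT NULL models
-- total participation of subject (every subject has a lecturer).
CREATE TABLE subject (
    code        TEXT PRIMARY KEY,
    title       TEXT NOT NULL,
    lecturer_id INTEGER NOT NULL REFERENCES lecturer(id)
);
-- M:N "enrols in" becomes its own table with a concatenated key;
-- a relationship attribute (here a hypothetical grade) lives here too.
CREATE TABLE enrols_in (
    student_number INTEGER NOT NULL REFERENCES student(number),
    subject_code   TEXT    NOT NULL REFERENCES subject(code),
    grade          TEXT,
    PRIMARY KEY (student_number, subject_code)
);
-- Weak entity: a room is identified only together with its building.
CREATE TABLE building (
    name TEXT PRIMARY KEY
);
CREATE TABLE room (
    building_name TEXT    NOT NULL REFERENCES building(name),
    number        INTEGER NOT NULL,
    PRIMARY KEY (building_name, number)
);
-- Supertype/subtype, mapped as one table per subtype sharing the
-- supertype's key (one of several possible mappings).
CREATE TABLE motor_vehicle (
    registration TEXT PRIMARY KEY,
    seats        INTEGER
);
CREATE TABLE truck (
    registration TEXT PRIMARY KEY REFERENCES motor_vehicle(registration),
    payload_kg   INTEGER          -- hypothetical subtype-specific attribute
);
"""


def connect() -> sqlite3.Connection:
    """Open an in-memory database and create the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn


def age(dob: date, today: date) -> int:
    """Derived attribute: Age computed from DOB and the current date."""
    had_birthday = (today.month, today.day) >= (dob.month, dob.day)
    return today.year - dob.year - (0 if had_birthday else 1)
```

With this schema, enrolling the same student in the same subject twice, enrolling a non-existent student, or adding a truck without a matching motor_vehicle row is rejected with sqlite3.IntegrityError, while two different buildings may each have a room 101. SQLite checks foreign keys only when PRAGMA foreign_keys is switched on for the connection, which is why the sketch enables it first.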
Supertypes & subtypes
[Figure: supertype motor vehicle (attributes Registration, Seats) with disjoint ("d") subtypes truck, car and bus, each with its own attributes]

Supertypes & subtypes
[Figure: supertype employee (attributes TFN, DOB, Gender, Address) with overlapping ("o") subtypes safety officer, engineer and pilot, each with its own attributes]

Three-schema architecture for database development
• External level (individual user views)
• Conceptual level
External (COBOL):
01 EMPC.
   02 EMPNO PIC X(6).
External (XML):
<xsd:element name="Emp">
<xsd:element name="Eno" type="Number" />
<xsd:element name="Dno"