Normal Forms

Database Management
Peter Wood

Relational Database Design: Update Anomalies, Data Redundancy, Normal Forms, FD Inference, Boyce-Codd Normal Form, Third Normal Form
Normalisation Algorithms: Lossless Join, BCNF Algorithm, BCNF Examples, Dependency Preservation, 3NF Algorithm

Relational Database Design

There are two interconnected problems which are caused by bad database design:

• Redundancy problems
• Update anomalies

Good database design is based on using certain normal forms for relation schemas.

Example 1

Let F1 = {E → D, D → M, M → D}.

E stands for ENAME, D stands for DNAME and M stands for MNAME.

A relation r1 over EMP1 (whose schema is {ENAME, DNAME, MNAME}):

ENAME    DNAME      MNAME
Mark     Computing  Peter
Angela   Computing  Peter
Graham   Computing  Peter
Paul     Math       Donald
George   Math       Donald

E is the only key for EMP1 w.r.t. F1.

Problems with EMP1 and F1

1. We cannot represent a department and manager without any employees (i.e., we cannot insert a tuple with a null ENAME because of entity integrity); such a problem is called an insertion anomaly.
2. For the same reason as (1), we cannot delete all the employees in a department and keep just the department information; such a problem is called a deletion anomaly.

More problems with EMP1 and F1

3. E.g. in the first tuple, modifying “Peter” to “Philip” or “Computing” to “Math”, does not violate any FD resulting from a key but D → M would be violated (D is not a key for EMP1 w.r.t. F1); such a problem is called a modification anomaly.

• In (3) it is not sufficient to check that r1 satisfies the FDs resulting from the keys of EMP1 w.r.t. F1.
• Ideally, we would like all the FDs of a relation schema to be inferred from key dependencies, i.e. FDs of the form K → schema(R), where K is a key for R w.r.t. F.

Final problem with EMP1 and F1

4. There is redundancy in r1, i.e. for every employee in a given department MNAME is repeated.

• “Peter” appears three times for “Computing” and “Donald” twice for “Math”.

Example 2

Let F2 = {E → S}.
E stands for ENAME, S stands for SAL and C stands for CNAME.

A relation r2 over EMP2 (whose schema is {ENAME, CNAME, SAL}):

ENAME    CNAME   SAL
Jack     Jill    25
Jack     Jake    25
Jack     John    25
Donald   Dan     30
Donald   David   30

EC is the only key for EMP2 w.r.t. F2.

Problems with EMP2 and F2

1. Insertion anomaly: we cannot insert an employee without any children.
2. Deletion anomaly: if there is a mistake and “Donald” does not have any children, we cannot record this fact by deleting the two tuples for “Donald”.
3. Modification anomaly: if we modify the salary of “Jack” in the first tuple to be 27 instead of 25, no FD resulting from a key is violated, but E → S is violated.
4. Redundancy: the salary of each employee is repeated for every child.

Formalising Redundancy Problems

Let R be a relation schema and F be a set of FDs over R.

Definition. R has a redundancy problem if
(1) there exists a relation r over R that satisfies F, and
(2) there exists an FD X → A in F and two distinct tuples in r that have equal XA values.

• It can be shown that redundancy problems give rise to update anomalies and vice versa.
• Verify that the schemas of Examples 1 and 2 have redundancy problems.
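The definition of a redundancy problem can be checked mechanically on a concrete relation. The following is only a minimal Python sketch, not part of the original slides: the function names (satisfies, redundancy_witnesses, has_redundancy_problem) and the list-of-dicts encoding of a relation are assumptions made for illustration. It tests whether a given relation r is a witness for conditions (1) and (2), using r1 with F1 and r2 with F2 from the two examples.

```python
from itertools import combinations

def satisfies(rows, lhs, rhs):
    """True if any two rows that agree on lhs also agree on rhs (the FD lhs -> rhs holds)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores val the first time key is seen, else returns the stored value
        if seen.setdefault(key, val) != val:
            return False
    return True

def redundancy_witnesses(rows, fds):
    """Yield (lhs, rhs, t1, t2) for pairs of distinct tuples with equal values on lhs + rhs."""
    for lhs, rhs in fds:
        attrs = list(lhs) + list(rhs)
        for t1, t2 in combinations(rows, 2):
            if t1 != t2 and all(t1[a] == t2[a] for a in attrs):
                yield lhs, rhs, t1, t2

def has_redundancy_problem(rows, fds):
    """Conditions (1) and (2) of the definition, checked for this particular relation."""
    if not all(satisfies(rows, lhs, rhs) for lhs, rhs in fds):
        return False
    return any(True for _ in redundancy_witnesses(rows, fds))

# Example 1: r1 over EMP1 with F1 = {E -> D, D -> M, M -> D}
r1 = [
    {"ENAME": "Mark",   "DNAME": "Computing", "MNAME": "Peter"},
    {"ENAME": "Angela", "DNAME": "Computing", "MNAME": "Peter"},
    {"ENAME": "Graham", "DNAME": "Computing", "MNAME": "Peter"},
    {"ENAME": "Paul",   "DNAME": "Math",      "MNAME": "Donald"},
    {"ENAME": "George", "DNAME": "Math",      "MNAME": "Donald"},
]
F1 = [(["ENAME"], ["DNAME"]), (["DNAME"], ["MNAME"]), (["MNAME"], ["DNAME"])]

# Example 2: r2 over EMP2 with F2 = {E -> S}
r2 = [
    {"ENAME": "Jack",   "CNAME": "Jill",  "SAL": 25},
    {"ENAME": "Jack",   "CNAME": "Jake",  "SAL": 25},
    {"ENAME": "Jack",   "CNAME": "John",  "SAL": 25},
    {"ENAME": "Donald", "CNAME": "Dan",   "SAL": 30},
    {"ENAME": "Donald", "CNAME": "David", "SAL": 30},
]
F2 = [(["ENAME"], ["SAL"])]

# Modification anomaly from Example 1: change "Peter" to "Philip" in the first tuple only
modified = [dict(r1[0], MNAME="Philip")] + r1[1:]

print(has_redundancy_problem(r1, F1))
print(has_redundancy_problem(r2, F2))
print(satisfies(modified, ["ENAME"], ["DNAME", "MNAME"]))  # key dependency E -> DM
print(satisfies(modified, ["DNAME"], ["MNAME"]))           # non-key FD D -> M
```

The last two checks mirror point (3) of Example 1: the modified relation still satisfies the key dependency on ENAME, so testing only the FDs that come from keys would not catch the violation of D → M.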
Problem for you to work on

Consider the following relation over schema Films:

Title           Year   Genre       StarName
Star Wars       1977   SciFi       Carrie Fisher
Star Wars       1977   SciFi       Harrison Ford
Raiders ...     1981   Action      Harrison Ford
Raiders ...     1981   Adventure   Harrison Ford
When Harry ...  1989   Comedy      Carrie Fisher

Assume that the only FD that holds on Films is Title → Year.

What is the only key for Films?

Give an example of
1. an insertion anomaly
2. a deletion anomaly
3. a modification anomaly
4. a redundancy problem
for Films.

Normal Forms

• We assume that we are given a (1NF) relation schema R and a set F of functional dependencies (FDs) over R.
• We define two normal forms for relation schemas:
  - Boyce-Codd Normal Form (BCNF)
  - Third Normal Form (3NF)
• BCNF guarantees that the relation schema has no redundancy problems
• BCNF is stronger than 3NF: If R is in BCNF, then R is in 3NF
• 3NF, however, does sometimes have some advantages (see later)

Rules of inference for FDs

Given a set F of FDs, other FDs can be derived from those in F.

For example, if F contains E → D and D → M, then E → M can be derived from F (transitivity).

An FD X → Y is trivial if Y ⊆ X; otherwise it is nontrivial.

There are 3 rules of inference for FDs, known as Armstrong’s Axioms:
1. Reflexivity. If Y ⊆ X, then X → Y (trivial FDs).
2. Augmentation. If X → Y, then XA → YA for any attribute A not in X or Y.
3. Transitivity. If X → Y and Y → Z, then X → Z.

Closure of a set of FDs

If X → Y can be derived for a set of FDs F, we write this as F ⊢ X → Y.

Given F, the closure of F, denoted by F⁺, is the set of all FDs that can be derived (or proven) from F. That is,

F⁺ = {X → Y | F ⊢ X → Y}.

The closure of a set of attributes, CLOSURE(X, F), effectively uses Armstrong’s Axioms to find all attributes determined by X.

From CLOSURE(X, F) one can find all FDs in F⁺ that have X on the left-hand side. For example, if CLOSURE(HR, F) = HRCT, then F⁺ contains HR → C, HR → T, ...
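The slides name CLOSURE(X, F) without giving an algorithm in this excerpt. One common way to compute it is a fixpoint iteration: keep adding the right-hand side of any FD whose left-hand side is already contained in the result. The Python sketch below follows that idea; the function names closure and is_key and the encoding of attributes as single letters are assumptions for illustration, not taken from the slides. It is applied to F1 and F2 from Examples 1 and 2.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds; each FD is a (lhs, rhs) pair of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If everything in lhs is already determined, rhs is determined as well
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_key(attrs, schema, fds):
    """True if attrs determines the whole schema and no proper subset of attrs does."""
    attrs, schema = set(attrs), set(schema)
    if not schema <= closure(attrs, fds):
        return False
    return all(not schema <= closure(attrs - {a}, fds) for a in attrs)

# Example 1: F1 = {E -> D, D -> M, M -> D} over schema EDM
F1 = [("E", "D"), ("D", "M"), ("M", "D")]
# Example 2: F2 = {E -> S} over schema ECS
F2 = [("E", "S")]

print(sorted(closure("E", F1)))   # attributes determined by E
print(sorted(closure("D", F1)))   # attributes determined by D
print(is_key("E", "EDM", F1))     # the slides state E is the only key for EMP1
print(is_key("EC", "ECS", F2))    # the slides state EC is the only key for EMP2
```

Because the closure of D under F1 does not contain E, D is not a key for EMP1, which is why D → M is not a key dependency in Example 1.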