Session 7 – Main Theme

Total Page:16

File Type:pdf, Size:1020Kb

Session 7 – Main Theme Database Systems Session 7 – Main Theme Functional Dependencies and Normalization Dr. Jean-Claude Franchitti New York University Computer Science Department Courant Institute of Mathematical Sciences Presentation material partially based on textbook slides Fundamentals of Database Systems (6th Edition) by Ramez Elmasri and Shamkant Navathe Slides copyright © 2011 and on slides produced by Zvi Kedem copyight © 2014 1 Agenda 1 Session Overview 2 Logical Database Design - Normalization 3 Normalization Process Detailed 4 Summary and Conclusion 2 Session Agenda . Logical Database Design - Normalization . Normalization Process Detailed . Summary & Conclusion 3 What is the class about? . Course description and syllabus: » http://www.nyu.edu/classes/jcf/CSCI-GA.2433-001 » http://cs.nyu.edu/courses/spring15/CSCI-GA.2433-001/ . Textbooks: » Fundamentals of Database Systems (6th Edition) Ramez Elmasri and Shamkant Navathe Addition Wesley ISBN-10: 0-1360-8620-9, ISBN-13: 978-0136086208 6th Edition (04/10) 4 Icons / Metaphors Information Common Realization Knowledge/Competency Pattern Governance Alignment Solution Approach 55 Agenda 1 Session Overview 2 Logical Database Design - Normalization 3 Normalization Process Detailed 4 Summary and Conclusion 6 Agenda . Informal guidelines for good design . Functional dependency . Basic tool for analyzing relational schemas . Informal Design Guidelines for Relation Schemas . Normalization: . 1NF, 2NF, 3NF, BCNF, 4NF, 5NF • Normal Forms Based on Primary Keys • General Definitions of Second and Third Normal Forms • Boyce-Codd Normal Form • Multivalued Dependency and Fourth Normal Form • Join Dependencies and Fifth Normal Form 7 Logical Database Design . We are given a set of tables specifying the database » The base tables, which probably are the community (conceptual) level . They may have come from some ER diagram or from somewhere else . We will need to examine whether the specific choice of tables is good for » For storing the information needed » Enforcing constraints » Avoiding anomalies, such as redundancies . If there are issues to address, we may want to restructure the database, of course not losing any information . Let us quickly review an example from “long time ago” 8 A Fragment Of A Sample Relational Database R Name SSN DOB Grade Salary A 121 2367 2 80 A 132 3678 3 70 B 101 3498 4 70 C 106 2987 2 80 Business rule (one among several): • The value of Salary is determined only by the value of Grade Comment: • We keep track of the various Grades for more than just computing salaries, though we do not show it • For instance, DOB and Grade together determine the number of vacation days, which may therefore be different for SSN 121 and 106 9 Anomalies Name SSN DOB Grade Salary A 121 2367 2 80 A 132 3678 3 70 B 101 3498 4 70 C 106 2987 2 80 . “Grade = 2 implies Salary = 80” is written twice . There are additional problems with this design. » We are unable to store the salary structure for a Grade that does not currently exist for any employee. » For example, we cannot store that Grade = 1 implies Salary = 90 » For example, if employee with SSN = 132 leaves, we forget which Salary should be paid to employee with Grade = 3 » We could perhaps invent a fake employee with such a Grade and such a Salary, but this brings up additional problems, e.g., What is the SSN of such a fake employee? It cannot be NULL as SSN is the primary key 10 Better Representation Of Information . The problem can be solved by replacing R Name SSN DOB Grade Salary A 121 2367 2 80 A 132 3678 3 70 B 101 3498 4 70 C 106 2987 2 80 . by two tables S Name SSN DOB Grade T Grade Salary A 121 2367 2 2 80 A 132 3678 3 3 70 B 101 3498 4 4 70 C 106 2987 2 11 Decomposition . SELECT INTO S Name, SSN, DOB, Grade FROM R; . SELECT INTO T Grade, Salary FROM R; 12 Better Representation Of Information . And now we can » Store “Grade = 3 implies Salary = 70”, even after the last employee with this Grade leaves » Store “Grade = 2 implies Salary = 90”, planning for hiring employees with Grade = 1, while we do not yet have any employees with this Grade S Name SSN DOB Grade T Grade Salary A 121 2367 2 1 90 B 101 3498 4 2 80 3 70 C 106 2987 2 4 70 13 No Information Was Lost . Given S and T, we can reconstruct R using natural join S Name SSN DOB Grade T Grade Salary A 121 2367 2 2 80 A 132 3678 3 3 70 B 101 3498 4 4 70 C 106 2987 2 SELECT INTO R Name, SSN, DOB, S.Grade AS Grade, Salary FROM T, S WHERE T.Grade = S.Grade; R Name SSN DOB Grade Salary A 121 2367 2 80 A 132 3678 3 70 B 101 3498 4 70 C 106 2987 2 80 14 Natural Join (Briefly, More Later) . Given several tables, say R1, R2, …, Rn, their natural join is computed using the following “template”: SELECT INTO R one copy of each column name FROM R1, R2, …, Rn WHERE equal named columns have to be equal . The intuition is that R was “decomposed” into R1, R2, …,Rn by appropriate SELECT statements, and now we are putting it back together 15 Comment On Decomposition . It does not matter whether we remove duplicate rows . But some systems insist that that a row cannot appear more than once with a specific value of a primary key . So this would be OK for such a system T Grade Salary 2 80 3 70 4 70 . This would not be OK for such a system T Grade Salary 2 80 3 70 4 70 2 80 16 Comment On Decomposition . We can always make sure, in a system in which DISTINCT is allowed, that there are no duplicate rows by writing SELECT INTO T DISTINCT Grade, Salary FROM R; . And similarly elsewhere 17 Natural Join And Lossless Join Decomposition . Natural Join is: » Cartesian join with condition of equality on corresponding columns » Only one copy of each column is kept . “Lossless join decomposition” is another term for information not being lost, that is we can reconstruct the original table by “combining” information from the two new tables by means of natural join . This does not necessarily always hold . We will have more material about this later . Here we just observe that our decomposition satisfied this condition at least in our example 18 Elaboration On “Corresponding Columns” (Using Semantically “Equal” Columns) . It is suggested by some that no two columns in the database should have the same name, to avoid confusion, then we should have columns and join similar to these S S_Name S_SSN S_DOB S_Grade T T_Grade T_Salary A 121 2367 2 2 80 A 132 3678 3 3 70 B 101 3498 4 4 70 C 106 2987 2 SELECT INTO R S_Name AS R_Name, S_SSN AS R_SSN, S_DOB AS R_DOB, S_Grade AS R_Grade, T_Salary AS R_Salary FROM T, S WHERE T_Grade = S_Grade; R R_Name R_SSN R_DOB R_Grade R_Salary A 121 2367 2 80 A 132 3678 3 70 B 101 3498 4 70 C 106 2987 2 80 19 Mathematical Notation For Natural Join (We Will Use Sparingly) . There is a special mathematical symbol for natural join . It is not part of SQL, of course, which only allows standard ANSI font . In mathematical, relational algebra notation, natural join of two tables is denoted by a bow-like symbol (this symbol appears only in special mathematical fonts, so we may use in these notes instead) . So we have: R = S T . It is used when “corresponding columns” means “equal columns” 20 Revisiting The Problem . Let us look at R Name SSN DOB Grade Salary A 121 2367 2 80 A 132 3678 3 70 B 101 3498 4 70 C 106 2987 2 80 A 132 3678 3 70 B 101 3498 4 70 . The problem is not that there are duplicate rows . The problem is the same as before, business rule assigning Salary to Grade is written a number of time . So how can we “generalize” the problem? 21 Stating The Problem In General R Name SSN DOB Grade Salary A 121 2367 2 80 A 132 3678 3 70 B 101 3498 4 70 C 106 2987 2 80 A 132 3678 3 70 B 101 3498 4 70 . We have a problem whenever we have two sets of columns X and Y (here X is just Grade and Y is just Salary), such that 1. X does not contain a key either primary or unique (thus there could be several/many non-identical rows with the same value of X) 2. Whenever two rows are equal on X, they must be equal on Y . Why a problem: the business rule specifying how X “forces” Y is “embedded” in different rows and therefore » Inherently written redundantly » Cannot be stored by itself 22 What Did We Do? Think X = Grade And Y = Salary . We had a table U X V Y W . We replaced this one table by two tables U X V W X Y 23 Goodness of Relational Schemas . Levels at which we can discuss goodness of relation schemas . Logical (or conceptual) level . Implementation (or physical storage) level . Approaches to database design: . Bottom-up or top-down 24 Informal Design Guidelines for Relation Schemas . Measures of quality . Making sure attribute semantics are clear . Reducing redundant information in tuples . Reducing NULL values in tuples . Disallowing possibility of generating spurious tuples 25 Imparting Clear Semantics to Attributes in Relations . Semantics of a relation . Meaning resulting from interpretation of attribute values in a tuple .
Recommended publications
  • Functional Dependencies
    Basics of FDs Manipulating FDs Closures and Keys Minimal Bases Functional Dependencies T. M. Murali October 19, 21 2009 T. M. Murali October 19, 21 2009 CS 4604: Functional Dependencies Students(Id, Name) Advisors(Id, Name) Advises(StudentId, AdvisorId) Favourite(StudentId, AdvisorId) I Suppose we perversely decide to convert Students, Advises, and Favourite into one relation. Students(Id, Name, AdvisorId, AdvisorName, FavouriteAdvisorId) Basics of FDs Manipulating FDs Closures and Keys Minimal Bases Example Name Id Name Id Students Advises Advisors Favourite I Convert to relations: T. M. Murali October 19, 21 2009 CS 4604: Functional Dependencies I Suppose we perversely decide to convert Students, Advises, and Favourite into one relation. Students(Id, Name, AdvisorId, AdvisorName, FavouriteAdvisorId) Basics of FDs Manipulating FDs Closures and Keys Minimal Bases Example Name Id Name Id Students Advises Advisors Favourite I Convert to relations: Students(Id, Name) Advisors(Id, Name) Advises(StudentId, AdvisorId) Favourite(StudentId, AdvisorId) T. M. Murali October 19, 21 2009 CS 4604: Functional Dependencies Basics of FDs Manipulating FDs Closures and Keys Minimal Bases Example Name Id Name Id Students Advises Advisors Favourite I Convert to relations: Students(Id, Name) Advisors(Id, Name) Advises(StudentId, AdvisorId) Favourite(StudentId, AdvisorId) I Suppose we perversely decide to convert Students, Advises, and Favourite into one relation. Students(Id, Name, AdvisorId, AdvisorName, FavouriteAdvisorId) T. M. Murali October 19, 21 2009 CS 4604: Functional Dependencies Name and FavouriteAdvisorId. Id ! Name Id ! FavouriteAdvisorId AdvisorId ! AdvisorName I Can we say Id ! AdvisorId? NO! Id is not a key for the Students relation. I What is the key for the Students? fId, AdvisorIdg.
    [Show full text]
  • Normalized Form Snowflake Schema
    Normalized Form Snowflake Schema Half-pound and unascertainable Wood never rhubarbs confoundedly when Filbert snore his sloop. Vertebrate or leewardtongue-in-cheek, after Hazel Lennie compartmentalized never shreddings transcendentally, any misreckonings! quite Crystalloiddiverted. Euclid grabbles no yorks adhered The star schemas in this does not have all revenue for this When done use When doing table contains less sensible of rows Snowflake Normalizationde-normalization Dimension tables are in normalized form the fact. Difference between Star Schema & Snow Flake Schema. The major difference between the snowflake and star schema models is slot the dimension tables of the snowflake model may want kept in normalized form to. Typically most of carbon fact tables in this star schema are in the third normal form while dimensional tables are de-normalized second normal. A relation is danger to pause in First Normal Form should each attribute increase the. The model is lazy in single third normal form 1141 Options to Normalize Assume that too are 500000 product dimension rows These products fall under 500. Hottest 'snowflake-schema' Answers Stack Overflow. Learn together is Star Schema Snowflake Schema And the Difference. For step three within the warehouses we tested Redshift Snowflake and Bigquery. On whose other hand snowflake schema is in normalized form. The CWM repository schema is a standalone product that other products can shareeach product owns only. The main difference between in two is normalization. Families of normalized form snowflake schema snowflake. Star and Snowflake Schema in Data line with Examples. Is spread the dimension tables in the snowflake schema are normalized. Like price weight speed and quantitiesie data execute a numerical format.
    [Show full text]
  • ACS-3902 Ron Mcfadyen Slides Are Based on Chapter 5 (7Th Edition)
    ACS-3902 Ron McFadyen Slides are based on chapter 5 (7th edition) (chapter 3 in 6th edition) ACS-3902 1 The Relational Data Model and Relational Database Constraints • Relational model – Ted Codd (IBM) 1970 – First commercial implementations available in early 1980s – Widely used ACS-3902 2 Relational Model Concepts • Database is a collection of relations • Implementation of relation: table comprising rows and columns • In practice a table/relation represents an entity type or relationship type (entity-relationship model … later) • At intersection of a row and column in a table there is a simple value • Row • Represents a collection of related data values • Formally called a tuple • Column names • Columns may be referred to as fields, or, formally as attributes • Values in a column are drawn from a domain of values associated with the column/field/attribute ACS-3902 3 Relational Model Concepts 7th edition Figure 5.1 ACS-3902 4 Domains • Domain – Atomic • A domain is a collection of values where each value is indivisible • Not meaningful to decompose further – Specifying a domain • Name, data type, rules – Examples • domain of department codes for UW is a list: {“ACS”, “MATH”, “ENGL”, “HIST”, etc} • domain of gender values for UW is the list (“male”, “female”) – Cardinality: number of values in a domain – Database implementation & support vary ACS-3902 5 Domain example - PostgreSQL CREATE DOMAIN posint AS integer CHECK (VALUE > 0); CREATE TABLE mytable (id posint); INSERT INTO mytable VALUES(1); -- works INSERT INTO mytable VALUES(-1); -- fails https://www.postgresql.org/docs/current/domains.html ACS-3902 6 Domain example - PostgreSQL CREATE DOMAIN domain_code_type AS character varying NOT NULL CONSTRAINT domain_code_type_check CHECK (VALUE IN ('ApprovedByAdmin', 'Unapproved', 'ApprovedByEmail')); CREATE TABLE codes__domain ( code_id integer NOT NULL, code_type domain_code_type NOT NULL, CONSTRAINT codes_domain_pk PRIMARY KEY (code_id) ) ACS-3902 7 Relation • Relation schema R – Name R and a list of attributes: • Denoted by R (A1, A2, ...,An) • E.g.
    [Show full text]
  • Relational Database Design Chapter 7
    Chapter 7: Relational Database Design Chapter 7: Relational Database Design First Normal Form Pitfalls in Relational Database Design Functional Dependencies Decomposition Boyce-Codd Normal Form Third Normal Form Multivalued Dependencies and Fourth Normal Form Overall Database Design Process Database System Concepts 7.2 ©Silberschatz, Korth and Sudarshan 1 First Normal Form Domain is atomic if its elements are considered to be indivisible units + Examples of non-atomic domains: Set of names, composite attributes Identification numbers like CS101 that can be broken up into parts A relational schema R is in first normal form if the domains of all attributes of R are atomic Non-atomic values complicate storage and encourage redundant (repeated) storage of data + E.g. Set of accounts stored with each customer, and set of owners stored with each account + We assume all relations are in first normal form (revisit this in Chapter 9 on Object Relational Databases) Database System Concepts 7.3 ©Silberschatz, Korth and Sudarshan First Normal Form (Contd.) Atomicity is actually a property of how the elements of the domain are used. + E.g. Strings would normally be considered indivisible + Suppose that students are given roll numbers which are strings of the form CS0012 or EE1127 + If the first two characters are extracted to find the department, the domain of roll numbers is not atomic. + Doing so is a bad idea: leads to encoding of information in application program rather than in the database. Database System Concepts 7.4 ©Silberschatz, Korth and Sudarshan 2 Pitfalls in Relational Database Design Relational database design requires that we find a “good” collection of relation schemas.
    [Show full text]
  • Boyce-Codd Normal Forms Lecture 10 Sections 15.1 - 15.4
    Boyce-Codd Normal Forms Lecture 10 Sections 15.1 - 15.4 Robb T. Koether Hampden-Sydney College Wed, Feb 6, 2013 Robb T. Koether (Hampden-Sydney College) Boyce-Codd Normal Forms Wed, Feb 6, 2013 1 / 15 1 Third Normal Form 2 Boyce-Codd Normal Form 3 Assignment Robb T. Koether (Hampden-Sydney College) Boyce-Codd Normal Forms Wed, Feb 6, 2013 2 / 15 Outline 1 Third Normal Form 2 Boyce-Codd Normal Form 3 Assignment Robb T. Koether (Hampden-Sydney College) Boyce-Codd Normal Forms Wed, Feb 6, 2013 3 / 15 Third Normal Form Definition (Transitive Dependence) A set of attributes Z is transitively dependent on a set of attributes X if there exists a set of attributes Y such that X ! Y and Y ! Z. Definition (Third Normal Form) A relation R is in third normal form (3NF) if it is in 2NF and there is no nonprime attribute of R that is transitively dependent on any key of R. 3NF is violated if there is a nonprime attribute A that depends on something less than a key. Robb T. Koether (Hampden-Sydney College) Boyce-Codd Normal Forms Wed, Feb 6, 2013 4 / 15 Example Example order_no cust_no cust_name 222-1 3333 Joe Smith 444-2 4444 Sue Taylor 555-1 3333 Joe Smith 777-2 7777 Bob Sponge 888-3 4444 Sue Taylor Table 3 Table 3 is in 2NF, but it is not in 3NF because [order_no] ! [cust_no] ! [cust_name]: Robb T. Koether (Hampden-Sydney College) Boyce-Codd Normal Forms Wed, Feb 6, 2013 5 / 15 3NF Normalization To put a relation into 3NF, for each set of transitive function dependencies X ! Y ! Z , make two tables, one for X ! Y and another for Y ! Z .
    [Show full text]
  • Normalization
    Normalization Normalization is a process in which we refine the quality of the logical design. Some call it a form of evaluation or validation. Codd said normalization is a process of elimination, through which we represent the table in a preferred format. Normalization is also called “non-loss decomposition” because a table is broken into smaller components but no information is lost. Are We Normal? If all values in a table are single atomic values (this is first normal form, or 1NF) then our database is technically in “normal form” according to Codd—the “math” of set theory and predicate logic will work. However, we usually want to do more to get our DB into a preferred format. We will take our databases to at least 3rd normal form. Why Do We Normalize? 1. We want the design to be easy to understand, with clear meaning and attributes in logical groups. 2. We want to avoid anomalies (data errors created on insertion, deletion, or modification) Suppose we have the table Nurse (SSN, Name, Unit, Unit Phone) Insertion anomaly: You need to insert the unit phone number every time you enter a nurse. What if you have a new unit that does not have any assigned staff yet? You can’t create a record with a blank primary key, so you can’t just leave nurse SSN blank. You would have no place to record the unit phone number until after you hire at least 1 nurse. Deletion anomaly: If you delete the last nurse on the unit you no longer have a record of the unit phone number.
    [Show full text]
  • Relational Algebra
    Relational Algebra Instructor: Shel Finkelstein Reference: A First Course in Database Systems, 3rd edition, Chapter 2.4 – 2.6, plus Query Execution Plans Important Notices • Midterm with Answers has been posted on Piazza. – Midterm will be/was reviewed briefly in class on Wednesday, Nov 8. – Grades were posted on Canvas on Monday, Nov 13. • Median was 83; no curve. – Exam will be returned in class on Nov 13 and Nov 15. • Please send email if you want “cheat sheet” back. • Lab3 assignment was posted on Sunday, Nov 5, and is due by Sunday, Nov 19, 11:59pm. – Lab3 has lots of parts (some hard), and is worth 13 points. – Please attend Labs to get help with Lab3. What is a Data Model? • A data model is a mathematical formalism that consists of three parts: 1. A notation for describing and representing data (structure of the data) 2. A set of operations for manipulating data. 3. A set of constraints on the data. • What is the associated query language for the relational data model? Two Query Languages • Codd proposed two different query languages for the relational data model. – Relational Algebra • Queries are expressed as a sequence of operations on relations. • Procedural language. – Relational Calculus • Queries are expressed as formulas of first-order logic. • Declarative language. • Codd’s Theorem: The Relational Algebra query language has the same expressive power as the Relational Calculus query language. Procedural vs. Declarative Languages • Procedural program – The program is specified as a sequence of operations to obtain the desired the outcome. I.e., how the outcome is to be obtained.
    [Show full text]
  • Normalization Exercises
    DATABASE DESIGN: NORMALIZATION NOTE & EXERCISES (Up to 3NF) Tables that contain redundant data can suffer from update anomalies, which can introduce inconsistencies into a database. The rules associated with the most commonly used normal forms, namely first (1NF), second (2NF), and third (3NF). The identification of various types of update anomalies such as insertion, deletion, and modification anomalies can be found when tables that break the rules of 1NF, 2NF, and 3NF and they are likely to contain redundant data and suffer from update anomalies. Normalization is a technique for producing a set of tables with desirable properties that support the requirements of a user or company. Major aim of relational database design is to group columns into tables to minimize data redundancy and reduce file storage space required by base tables. Take a look at the following example: StdSSN StdCity StdClass OfferNo OffTerm OffYear EnrGrade CourseNo CrsDesc S1 SEATTLE JUN O1 FALL 2006 3.5 C1 DB S1 SEATTLE JUN O2 FALL 2006 3.3 C2 VB S2 BOTHELL JUN O3 SPRING 2007 3.1 C3 OO S2 BOTHELL JUN O2 FALL 2006 3.4 C2 VB The insertion anomaly: Occurs when extra data beyond the desired data must be added to the database. For example, to insert a course (CourseNo), it is necessary to know a student (StdSSN) and offering (OfferNo) because the combination of StdSSN and OfferNo is the primary key. Remember that a row cannot exist with NULL values for part of its primary key. The update anomaly: Occurs when it is necessary to change multiple rows to modify ONLY a single fact.
    [Show full text]
  • Adequacy of Decompositions of Relational Databases*9+
    JOURNAL OF COMPUTER AND SYSTEM SCIENCES 21, 368-379 (1980) Adequacy of Decompositions of Relational Databases*9+ DAVID MAIER State University of New York, Stony Brook, New York 11794 0. MENDELZON IBM Thomas J. Watson Research Center, Yorktown Heights, New York 10598 FEREIDOON SADRI Princeton University, Princeton, New Jersey 08540 AND JEFFREY D. ULLMAN Stanford University, Stanford, California 94305 Received August 6, 1979; revised July 16, 1980 We consider conditions that have appeared in the literature with the purpose of defining a “good” decomposition of a relation scheme. We show that these notions are equivalent in the case that all constraints in the database are functional dependencies. This result solves an open problem of Rissanen. However, for arbitrary constraints the notions are shown to differ. I. BASIC DEFINITIONS We assume the reader is familiar with the relational model of data as expounded by Codd [ 111, in which data are represented by tables called relations. Rows of the tables are tuples, and the columns are named by attributes. The notation used in this paper is that found in Ullman [23]. A frequent viewpoint is that the “real world” is modeled by a universal set of attributes R and constraints on the set of relations that can be the “current” relation for scheme R (see Beeri et al. [4], Aho et al. [ 11, and Beeri et al. [7]). The database scheme representing the real world is a collection of (nondisjoint) subsets of the * Work partially supported by NSF Grant MCS-76-15255. D. Maier and A. 0. Mendelzon were also supported by IBM fellowships while at Princeton University, and F.
    [Show full text]
  • Fundamentals of Database Systems [Normalization – II]
    Outline First Normal Form Second Normal Form Third Normal Form Boyce-Codd Normal Form Fundamentals of Database Systems [Normalization { II] Malay Bhattacharyya Assistant Professor Machine Intelligence Unit Indian Statistical Institute, Kolkata October, 2019 Outline First Normal Form Second Normal Form Third Normal Form Boyce-Codd Normal Form 1 First Normal Form 2 Second Normal Form 3 Third Normal Form 4 Boyce-Codd Normal Form Outline First Normal Form Second Normal Form Third Normal Form Boyce-Codd Normal Form First normal form The domain (or value set) of an attribute defines the set of values it might contain. A domain is atomic if elements of the domain are considered to be indivisible units. Company Make Company Make Maruti WagonR, Ertiga Maruti WagonR, Ertiga Honda City Honda City Tesla RAV4 Tesla, Toyota RAV4 Toyota RAV4 BMW X1 BMW X1 Only Company has atomic domain None of the attributes have atomic domains Outline First Normal Form Second Normal Form Third Normal Form Boyce-Codd Normal Form First normal form Definition (First normal form (1NF)) A relational schema R is in 1NF iff the domains of all attributes in R are atomic. The advantages of 1NF are as follows: It eliminates redundancy It eliminates repeating groups. Note: In practice, 1NF includes a few more practical constraints like each attribute must be unique, no tuples are duplicated, and no columns are duplicated. Outline First Normal Form Second Normal Form Third Normal Form Boyce-Codd Normal Form First normal form The following relation is not in 1NF because the attribute Model is not atomic. Company Country Make Model Distributor Maruti India WagonR LXI, VXI Carwala Maruti India WagonR LXI Bhalla Maruti India Ertiga VXI Bhalla Honda Japan City SV Bhalla Tesla USA RAV4 EV CarTrade Toyota Japan RAV4 EV CarTrade BMW Germany X1 Expedition CarTrade We can convert this relation into 1NF in two ways!!! Outline First Normal Form Second Normal Form Third Normal Form Boyce-Codd Normal Form First normal form Approach 1: Break the tuples containing non-atomic values into multiple tuples.
    [Show full text]
  • Aslmple GUIDE to FIVE NORMAL FORMS in RELATIONAL DATABASE THEORY
    COMPUTING PRACTICES ASlMPLE GUIDE TO FIVE NORMAL FORMS IN RELATIONAL DATABASE THEORY W|LL|AM KErr International Business Machines Corporation 1. INTRODUCTION The normal forms defined in relational database theory represent guidelines for record design. The guidelines cor- responding to first through fifth normal forms are pre- sented, in terms that do not require an understanding of SUMMARY: The concepts behind relational theory. The design guidelines are meaningful the five principal normal forms even if a relational database system is not used. We pres- in relational database theory are ent the guidelines without referring to the concepts of the presented in simple terms. relational model in order to emphasize their generality and to make them easier to understand. Our presentation conveys an intuitive sense of the intended constraints on record design, although in its informality it may be impre- cise in some technical details. A comprehensive treatment of the subject is provided by Date [4]. The normalization rules are designed to prevent up- date anomalies and data inconsistencies. With respect to performance trade-offs, these guidelines are biased to- ward the assumption that all nonkey fields will be up- dated frequently. They tend to penalize retrieval, since Author's Present Address: data which may have been retrievable from one record in William Kent, International Business Machines an unnormalized design may have to be retrieved from Corporation, General several records in the normalized form. There is no obli- Products Division, Santa gation to fully normalize all records when actual perform- Teresa Laboratory, ance requirements are taken into account. San Jose, CA Permission to copy without fee all or part of this 2.
    [Show full text]
  • Relational Algebra and SQL Relational Query Languages
    Relational Algebra and SQL Chapter 5 1 Relational Query Languages • Languages for describing queries on a relational database • Structured Query Language (SQL) – Predominant application-level query language – Declarative • Relational Algebra – Intermediate language used within DBMS – Procedural 2 1 What is an Algebra? · A language based on operators and a domain of values · Operators map values taken from the domain into other domain values · Hence, an expression involving operators and arguments produces a value in the domain · When the domain is a set of all relations (and the operators are as described later), we get the relational algebra · We refer to the expression as a query and the value produced as the query result 3 Relational Algebra · Domain: set of relations · Basic operators: select, project, union, set difference, Cartesian product · Derived operators: set intersection, division, join · Procedural: Relational expression specifies query by describing an algorithm (the sequence in which operators are applied) for determining the result of an expression 4 2 The Role of Relational Algebra in a DBMS 5 Select Operator • Produce table containing subset of rows of argument table satisfying condition σ condition (relation) • Example: σ Person Hobby=‘stamps’(Person) Id Name Address Hobby Id Name Address Hobby 1123 John 123 Main stamps 1123 John 123 Main stamps 1123 John 123 Main coins 9876 Bart 5 Pine St stamps 5556 Mary 7 Lake Dr hiking 9876 Bart 5 Pine St stamps 6 3 Selection Condition • Operators: <, ≤, ≥, >, =, ≠ • Simple selection
    [Show full text]