Week 11: Normal Forms Logical Database Design

Total Page:16

File Type:pdf, Size:1020Kb

Week 11: Normal Forms Logical Database Design Week 11: Normal Forms Logical Database Design Database Design We have seen how to design a relational Database Redundancies and Anomalies schema by first designing an ER schema Functional Dependencies and then transforming it into a relational Entailment, Closure and Equivalence one. Lossless Decompositions The Third Normal Form (3NF) Now we focus on how to transform the The Boyce-Codd Normal Form (BCNF) generated relational schema into a "better" Normal Forms and Database Design one. Goodness of relational schemas is defined in terms of the notion of normal form. CSC343 – Introduction to Databases Normal Forms — 1 CSC343 – Introduction to Databases Normal Forms — 2 1 Normal Forms and Normalization Examples of Redundancy A normal form is a property of a database schema. When a database schema is un-normalized (that is, Employee Salary Project Budget Function does not satisfy the normal form), it allows Brown 20 Mars 2 technician redundancies of various types which can lead to Green 35 Jupiter 15 designer anomalies and inconsistencies. Green 35 Venus 15 designer Hoskins 55 Venus 15 manager Normal forms can serve as basis for evaluating the Hoskins 55 Jupiter 15 consultant quality of a database schema and constitutes a useful Hoskins 55 Mars 2 consultant tool for database design. Moore 48 Mars 2 manager Moore 48 Venus 15 designer Normalization is a procedure that transforms an un- Kemp 48 Venus 15 designer normalized schema into a normalized one. Kemp 48 Jupiter 15 manager CSC343 – Introduction to Databases Normal Forms — 3 CSC343 – Introduction to Databases Normal Forms — 4 2 Anomalies What’s Wrong??? The value of the salary of an employee is repeated in every tuple where the employee is mentioned, We are using a single relation to represent data of leading to a redundancy. Redundancies lead to very different types. anomalies: In particular, we are using a single relation to store If salary of an employee changes, we have to the following types of entities, relationships and modify the value in all corresponding tuples attributes: (update anomaly) Employees and their salaries; If an employee ceases to work in projects, but Projects and their budgets; stays with company, all corresponding tuples are Participation of employees in projects, along with deleted, leading to loss of information (deletion their functions. anomaly) To set the problem on a formal footing, we introduce A new employee cannot be inserted in the the notion of functional dependency (FD). relation until the employee is assigned to a project (insertion anomaly) CSC343 – Introduction to Databases Normal Forms — 5 CSC343 – Introduction to Databases Normal Forms — 6 3 Functional Dependencies (FDs) Functional Dependencies in the Example Given schema R(X) and non-empty subsets Y and Z of the attributes X, we say that there is a functional Each employee has a unique salary. We dependency between Y and Z (Y→Z), iff for every represent this dependency as relation instance r of R(X) and every pair of tuples t , t 1 2 Employee → Salary of r, if t .Y = t .Y, then t .Z = t .Z. and say " functionally depends on 1 2 1 2 Salary A functional dependency is a statement about all Employee". allowable relations for a given schema. Meaning: if two tuples have the same Functional dependencies have to be identified by Employee attribute value, they must also understanding the semantics of the application. have the same Salary attribute value Given a particular relation r0 of R(X), we can tell if a Likewise, dependency holds or not; but just because it holds for Project → Budget r0, doesn’t mean that it also holds for R(X)! i.e., each project has a unique budget CSC343 – Introduction to Databases Normal Forms — 7 CSC343 – Introduction to Databases Normal Forms — 8 4 Non-Trivial Dependencies Looking for FDs A functional dependency Y→Z is non-trivial if no attribute in Z appears among attributes of Y, e.g., Employee Salary Project Budget Function Employee → Salary is non-trivial; Brown 20 Mars 2 technician Employee,Project → Project is trivial. Green 35 Jupiter 15 designer Green 35 Venus 15 designer Anomalies arise precisely for the attributes which are Hoskins 55 Venus 15 manager involved in (non-trivial) functional dependencies: Hoskins 55 Jupiter 15 consultant Hoskins 55 Mars 2 consultant Employee → Salary; Moore 48 Mars 2 manager Project → Budget. Moore 48 Venus 15 designer Kemp 48 Venus 15 designer Moreover, note that our example includes another Kemp 48 Jupiter 15 manager functional dependency: Employee,Project → Function. CSC343 – Introduction to Databases Normal Forms — 9 CSC343 – Introduction to Databases Normal Forms — 10 5 Dependencies Cause Anomalies, Another Example ...Sometimes! ER Model The first two dependencies cause undesirable redundancies and anomalies. (0.N) The third dependency, however, does not cause redundancies because {Employee,Project} constitutes a key of the SI# Name Address Hobbies This is NOT a relation relation (...and a relation cannot contain two 1111 Joe 123 Main {biking, hiking} tuples with the same values for the key attributes.) Relational Model SI# Name Address Hobby Dependencies on keys are OK, 1111 Joe 123 Main biking other dependencies are not! 1111 Joe 123 Main hiking ……………. Redundancy CSC343 – Introduction to Databases Normal Forms — 11 CSC343 – Introduction to Databases Normal Forms — 12 6 Superkey Constraints How Do We Eliminate Redundancy? A superkey constraint is a special functional Decomposition: Use two relations to store dependency: Let K be a set of attributes of R, and U Person information: the set of all attributes of R. Then K is a superkey iff the functional dependency K U is satisfied in Person1 (SI#, Name, Address) → R. Hobbies (SI#, Hobby) E.g., SI# → SI#,Name,Address (for a Person The decomposition is more general: people with relation) hobbies can now be described independently of A key is a minimal superkey, I.e., for each X ⊂ K, X their name and address. is not a superkey No update anomalies: SI#, Hobby → SI#, Name, Address, Hobby but Name and address stored once; SI# → SI#, Name, Address, Hobby A hobby can be separately supplied or Hobby SI#, Name, Address, Hobby deleted; → We can represent persons with no hobbies. A key attribute is an attribute that is part of a key. CSC343 – Introduction to Databases Normal Forms — 13 CSC343 – Introduction to Databases Normal Forms — 14 7 More Examples When are FDs "Equivalent"? Address → PostalCode Sometimes functional dependencies (FDs) DCS’s postal code is M5S 3H5 seem to be saying the same thing, Author, Title, Edition → PublicationDate e.g., Addr → PostalCode,Str# vs Addr PostalCode, Addr Str# Ramakrishnan, et al., Database → → Management Systems, 3rd publication date Another example is 2003 Addr → PostalCode, PostalCode → Province CourseID → ExamDate, ExamTime vs Addr → PostalCode, PostalCode → Province vs Addr Province CSC343’s exam date is December 18, → When are two sets of FDs equivalent? How do starting at 7pm we "infer" new FDs from given FDs? CSC343 – Introduction to Databases Normal Forms — 15 CSC343 – Introduction to Databases Normal Forms — 16 8 Entailment, Closure, Equivalence How Do We Compute Entailment? Satisfaction, entailment, and equivalence are If F is a set of FDs on schema R and f is another semantic concepts – defined in terms of the FD on R, then F entails f (written F |= f) if every "meaning" of relations in the “real world.” instance r of R that satisfies every FD in F also satisfies f. How to check if F entails f, F and G are equivalent? Example: F = {A B, B C} and f is A C → → → Apply the respective definitions for all possible If Phone# → Address and Address → relation instances for a schema R …… ZipCode, then Phone# → ZipCode Find algorithmic, syntactic ways to compute The closure of F, denoted F+, is the set of all FDs these notions. entailed by F. Note: The syntactic solution must be "correct” with F and G are equivalent if F entails G and G respect to the semantic definitions. entails F. Correctness has two aspects: soundness and completeness – see later. CSC343 – Introduction to Databases Normal Forms — 17 CSC343 – Introduction to Databases Normal Forms — 18 9 Soundness Armstrong’s Axioms for FDs Theorem: F |- f implies F |= f This is the syntactic way of computing/testing In words: If FD f: X → Y can be derived from a set of semantic properties of FDs FDs F using the axioms, then f holds in every relation Reflexivity: Y ⊆ X |- X → Y (trivial FD) that satisfies every FD in F. Example: Given X → Y and X → Z then e.g., |- Name, Address → Name X XY Augmentation by X Augmentation: X → Y |- XZ → YZ → YX → YZ Augmentation by Y e.g., Address → ZipCode |- X → YZ Transitivity Address,Name → ZipCode, Name Thus, X → YZ is satisfied in every relation where both Transitivity: X Y, Y Z |- X Z → → → X → Y and X → Z are satisfied. We have derived e.g., Phone# → Address, Address → ZipCode the union rule for FDs. |- Phone# → ZipCode CSC343 – Introduction to Databases Normal Forms — 19 CSC343 – Introduction to Databases Normal Forms — 20 10 Completeness Correctness Theorem: F |= f implies F |- f The notions of soundness and In words: If F entails f, then f can be derived completeness link the syntax (Armstrong’s from F using Armstrong's axioms. axioms) with semantics, i.e., entailment defined in terms of relational instances. A consequence of completeness is the following (naïve) algorithm to determining if This is a precise way of saying that the F entails f: algorithm for entailment based on the
Recommended publications
  • Functional Dependencies
    Basics of FDs Manipulating FDs Closures and Keys Minimal Bases Functional Dependencies T. M. Murali October 19, 21 2009 T. M. Murali October 19, 21 2009 CS 4604: Functional Dependencies Students(Id, Name) Advisors(Id, Name) Advises(StudentId, AdvisorId) Favourite(StudentId, AdvisorId) I Suppose we perversely decide to convert Students, Advises, and Favourite into one relation. Students(Id, Name, AdvisorId, AdvisorName, FavouriteAdvisorId) Basics of FDs Manipulating FDs Closures and Keys Minimal Bases Example Name Id Name Id Students Advises Advisors Favourite I Convert to relations: T. M. Murali October 19, 21 2009 CS 4604: Functional Dependencies I Suppose we perversely decide to convert Students, Advises, and Favourite into one relation. Students(Id, Name, AdvisorId, AdvisorName, FavouriteAdvisorId) Basics of FDs Manipulating FDs Closures and Keys Minimal Bases Example Name Id Name Id Students Advises Advisors Favourite I Convert to relations: Students(Id, Name) Advisors(Id, Name) Advises(StudentId, AdvisorId) Favourite(StudentId, AdvisorId) T. M. Murali October 19, 21 2009 CS 4604: Functional Dependencies Basics of FDs Manipulating FDs Closures and Keys Minimal Bases Example Name Id Name Id Students Advises Advisors Favourite I Convert to relations: Students(Id, Name) Advisors(Id, Name) Advises(StudentId, AdvisorId) Favourite(StudentId, AdvisorId) I Suppose we perversely decide to convert Students, Advises, and Favourite into one relation. Students(Id, Name, AdvisorId, AdvisorName, FavouriteAdvisorId) T. M. Murali October 19, 21 2009 CS 4604: Functional Dependencies Name and FavouriteAdvisorId. Id ! Name Id ! FavouriteAdvisorId AdvisorId ! AdvisorName I Can we say Id ! AdvisorId? NO! Id is not a key for the Students relation. I What is the key for the Students? fId, AdvisorIdg.
    [Show full text]
  • Normalized Form Snowflake Schema
    Normalized Form Snowflake Schema Half-pound and unascertainable Wood never rhubarbs confoundedly when Filbert snore his sloop. Vertebrate or leewardtongue-in-cheek, after Hazel Lennie compartmentalized never shreddings transcendentally, any misreckonings! quite Crystalloiddiverted. Euclid grabbles no yorks adhered The star schemas in this does not have all revenue for this When done use When doing table contains less sensible of rows Snowflake Normalizationde-normalization Dimension tables are in normalized form the fact. Difference between Star Schema & Snow Flake Schema. The major difference between the snowflake and star schema models is slot the dimension tables of the snowflake model may want kept in normalized form to. Typically most of carbon fact tables in this star schema are in the third normal form while dimensional tables are de-normalized second normal. A relation is danger to pause in First Normal Form should each attribute increase the. The model is lazy in single third normal form 1141 Options to Normalize Assume that too are 500000 product dimension rows These products fall under 500. Hottest 'snowflake-schema' Answers Stack Overflow. Learn together is Star Schema Snowflake Schema And the Difference. For step three within the warehouses we tested Redshift Snowflake and Bigquery. On whose other hand snowflake schema is in normalized form. The CWM repository schema is a standalone product that other products can shareeach product owns only. The main difference between in two is normalization. Families of normalized form snowflake schema snowflake. Star and Snowflake Schema in Data line with Examples. Is spread the dimension tables in the snowflake schema are normalized. Like price weight speed and quantitiesie data execute a numerical format.
    [Show full text]
  • Powerdesigner 16.6 Data Modeling
    SAP® PowerDesigner® Document Version: 16.6 – 2016-02-22 Data Modeling Content 1 Building Data Models ...........................................................8 1.1 Getting Started with Data Modeling...................................................8 Conceptual Data Models........................................................8 Logical Data Models...........................................................9 Physical Data Models..........................................................9 Creating a Data Model.........................................................10 Customizing your Modeling Environment........................................... 15 1.2 Conceptual and Logical Diagrams...................................................26 Supported CDM/LDM Notations.................................................27 Conceptual Diagrams.........................................................31 Logical Diagrams............................................................43 Data Items (CDM)............................................................47 Entities (CDM/LDM)..........................................................49 Attributes (CDM/LDM)........................................................55 Identifiers (CDM/LDM)........................................................58 Relationships (CDM/LDM)..................................................... 59 Associations and Association Links (CDM)..........................................70 Inheritances (CDM/LDM)......................................................77 1.3 Physical Diagrams..............................................................82
    [Show full text]
  • Denormalization Strategies for Data Retrieval from Data Warehouses
    Decision Support Systems 42 (2006) 267–282 www.elsevier.com/locate/dsw Denormalization strategies for data retrieval from data warehouses Seung Kyoon Shina,*, G. Lawrence Sandersb,1 aCollege of Business Administration, University of Rhode Island, 7 Lippitt Road, Kingston, RI 02881-0802, United States bDepartment of Management Science and Systems, School of Management, State University of New York at Buffalo, Buffalo, NY 14260-4000, United States Available online 20 January 2005 Abstract In this study, the effects of denormalization on relational database system performance are discussed in the context of using denormalization strategies as a database design methodology for data warehouses. Four prevalent denormalization strategies have been identified and examined under various scenarios to illustrate the conditions where they are most effective. The relational algebra, query trees, and join cost function are used to examine the effect on the performance of relational systems. The guidelines and analysis provided are sufficiently general and they can be applicable to a variety of databases, in particular to data warehouse implementations, for decision support systems. D 2004 Elsevier B.V. All rights reserved. Keywords: Database design; Denormalization; Decision support systems; Data warehouse; Data mining 1. Introduction houses as issues related to database design for high performance are receiving more attention. Database With the increased availability of data collected design is still an art that relies heavily on human from the Internet and other sources and the implemen- intuition and experience. Consequently, its practice is tation of enterprise-wise data warehouses, the amount becoming more difficult as the applications that data- of data that companies possess is growing at a bases support become more sophisticated [32].Cur- phenomenal rate.
    [Show full text]
  • A Comprehensive Analysis of Sybase Powerdesigner 16.0
    white paper A Comprehensive Analysis of Sybase® PowerDesigner® 16.0 InformationArchitect vs. ER/Studio XE2 Version 2.0 www.sybase.com TABLe OF CONTENtS 1 Introduction 1 Product Overviews 1 ER/Studio XE2 3 Sybase PowerDesigner 16.0 4 Data Modeling Activities 4 Overview 6 Types of Data Model 7 Design Layers 8 Managing the SAM-LDM Relationship 10 Forward and Reverse Engineering 11 Round-trip Engineering 11 Integrating Data Models with Requirements and Processes 11 Generating Object-oriented Models 11 Dependency Analysis 17 Model Comparisons and Merges 18 Update Flows 18 Required Features for a Data Modeling Tool 18 Core Modeling 25 Collaboration 27 Interfaces & Integration 29 Usability 34 Managing Models as a Project 36 Dependency Matrices 37 Conclusions 37 Acknowledgements 37 Bibliography 37 About the Author IntrOduCtion Data modeling is more than just database design, because data doesn’t just exist in databases. Data does not exist in isolation, it is created, managed and consumed by business processes, and those business processes are implemented using a variety of applications and technologies. To truly understand and manage our data, and the impact of changes to that data, we need to manage more than just models of data in databases. We need support for different types of data models, and for managing the relationships between data and the rest of the organization. When you need to manage a data center across the enterprise, integrating with a wider set of business and technology activities is critical to success. For this reason, this review will use the InformationArchitect version of Sybase PowerDesigner rather than their DataArchitect™ version.
    [Show full text]
  • Relational Database Design Chapter 7
    Chapter 7: Relational Database Design Chapter 7: Relational Database Design First Normal Form Pitfalls in Relational Database Design Functional Dependencies Decomposition Boyce-Codd Normal Form Third Normal Form Multivalued Dependencies and Fourth Normal Form Overall Database Design Process Database System Concepts 7.2 ©Silberschatz, Korth and Sudarshan 1 First Normal Form Domain is atomic if its elements are considered to be indivisible units + Examples of non-atomic domains: Set of names, composite attributes Identification numbers like CS101 that can be broken up into parts A relational schema R is in first normal form if the domains of all attributes of R are atomic Non-atomic values complicate storage and encourage redundant (repeated) storage of data + E.g. Set of accounts stored with each customer, and set of owners stored with each account + We assume all relations are in first normal form (revisit this in Chapter 9 on Object Relational Databases) Database System Concepts 7.3 ©Silberschatz, Korth and Sudarshan First Normal Form (Contd.) Atomicity is actually a property of how the elements of the domain are used. + E.g. Strings would normally be considered indivisible + Suppose that students are given roll numbers which are strings of the form CS0012 or EE1127 + If the first two characters are extracted to find the department, the domain of roll numbers is not atomic. + Doing so is a bad idea: leads to encoding of information in application program rather than in the database. Database System Concepts 7.4 ©Silberschatz, Korth and Sudarshan 2 Pitfalls in Relational Database Design Relational database design requires that we find a “good” collection of relation schemas.
    [Show full text]
  • Boyce-Codd Normal Forms Lecture 10 Sections 15.1 - 15.4
    Boyce-Codd Normal Forms Lecture 10 Sections 15.1 - 15.4 Robb T. Koether Hampden-Sydney College Wed, Feb 6, 2013 Robb T. Koether (Hampden-Sydney College) Boyce-Codd Normal Forms Wed, Feb 6, 2013 1 / 15 1 Third Normal Form 2 Boyce-Codd Normal Form 3 Assignment Robb T. Koether (Hampden-Sydney College) Boyce-Codd Normal Forms Wed, Feb 6, 2013 2 / 15 Outline 1 Third Normal Form 2 Boyce-Codd Normal Form 3 Assignment Robb T. Koether (Hampden-Sydney College) Boyce-Codd Normal Forms Wed, Feb 6, 2013 3 / 15 Third Normal Form Definition (Transitive Dependence) A set of attributes Z is transitively dependent on a set of attributes X if there exists a set of attributes Y such that X ! Y and Y ! Z. Definition (Third Normal Form) A relation R is in third normal form (3NF) if it is in 2NF and there is no nonprime attribute of R that is transitively dependent on any key of R. 3NF is violated if there is a nonprime attribute A that depends on something less than a key. Robb T. Koether (Hampden-Sydney College) Boyce-Codd Normal Forms Wed, Feb 6, 2013 4 / 15 Example Example order_no cust_no cust_name 222-1 3333 Joe Smith 444-2 4444 Sue Taylor 555-1 3333 Joe Smith 777-2 7777 Bob Sponge 888-3 4444 Sue Taylor Table 3 Table 3 is in 2NF, but it is not in 3NF because [order_no] ! [cust_no] ! [cust_name]: Robb T. Koether (Hampden-Sydney College) Boyce-Codd Normal Forms Wed, Feb 6, 2013 5 / 15 3NF Normalization To put a relation into 3NF, for each set of transitive function dependencies X ! Y ! Z , make two tables, one for X ! Y and another for Y ! Z .
    [Show full text]
  • Normalization Exercises
    DATABASE DESIGN: NORMALIZATION NOTE & EXERCISES (Up to 3NF) Tables that contain redundant data can suffer from update anomalies, which can introduce inconsistencies into a database. The rules associated with the most commonly used normal forms, namely first (1NF), second (2NF), and third (3NF). The identification of various types of update anomalies such as insertion, deletion, and modification anomalies can be found when tables that break the rules of 1NF, 2NF, and 3NF and they are likely to contain redundant data and suffer from update anomalies. Normalization is a technique for producing a set of tables with desirable properties that support the requirements of a user or company. Major aim of relational database design is to group columns into tables to minimize data redundancy and reduce file storage space required by base tables. Take a look at the following example: StdSSN StdCity StdClass OfferNo OffTerm OffYear EnrGrade CourseNo CrsDesc S1 SEATTLE JUN O1 FALL 2006 3.5 C1 DB S1 SEATTLE JUN O2 FALL 2006 3.3 C2 VB S2 BOTHELL JUN O3 SPRING 2007 3.1 C3 OO S2 BOTHELL JUN O2 FALL 2006 3.4 C2 VB The insertion anomaly: Occurs when extra data beyond the desired data must be added to the database. For example, to insert a course (CourseNo), it is necessary to know a student (StdSSN) and offering (OfferNo) because the combination of StdSSN and OfferNo is the primary key. Remember that a row cannot exist with NULL values for part of its primary key. The update anomaly: Occurs when it is necessary to change multiple rows to modify ONLY a single fact.
    [Show full text]
  • Fundamentals of Database Systems [Normalization – II]
    Outline First Normal Form Second Normal Form Third Normal Form Boyce-Codd Normal Form Fundamentals of Database Systems [Normalization { II] Malay Bhattacharyya Assistant Professor Machine Intelligence Unit Indian Statistical Institute, Kolkata October, 2019 Outline First Normal Form Second Normal Form Third Normal Form Boyce-Codd Normal Form 1 First Normal Form 2 Second Normal Form 3 Third Normal Form 4 Boyce-Codd Normal Form Outline First Normal Form Second Normal Form Third Normal Form Boyce-Codd Normal Form First normal form The domain (or value set) of an attribute defines the set of values it might contain. A domain is atomic if elements of the domain are considered to be indivisible units. Company Make Company Make Maruti WagonR, Ertiga Maruti WagonR, Ertiga Honda City Honda City Tesla RAV4 Tesla, Toyota RAV4 Toyota RAV4 BMW X1 BMW X1 Only Company has atomic domain None of the attributes have atomic domains Outline First Normal Form Second Normal Form Third Normal Form Boyce-Codd Normal Form First normal form Definition (First normal form (1NF)) A relational schema R is in 1NF iff the domains of all attributes in R are atomic. The advantages of 1NF are as follows: It eliminates redundancy It eliminates repeating groups. Note: In practice, 1NF includes a few more practical constraints like each attribute must be unique, no tuples are duplicated, and no columns are duplicated. Outline First Normal Form Second Normal Form Third Normal Form Boyce-Codd Normal Form First normal form The following relation is not in 1NF because the attribute Model is not atomic. Company Country Make Model Distributor Maruti India WagonR LXI, VXI Carwala Maruti India WagonR LXI Bhalla Maruti India Ertiga VXI Bhalla Honda Japan City SV Bhalla Tesla USA RAV4 EV CarTrade Toyota Japan RAV4 EV CarTrade BMW Germany X1 Expedition CarTrade We can convert this relation into 1NF in two ways!!! Outline First Normal Form Second Normal Form Third Normal Form Boyce-Codd Normal Form First normal form Approach 1: Break the tuples containing non-atomic values into multiple tuples.
    [Show full text]
  • Aslmple GUIDE to FIVE NORMAL FORMS in RELATIONAL DATABASE THEORY
    COMPUTING PRACTICES ASlMPLE GUIDE TO FIVE NORMAL FORMS IN RELATIONAL DATABASE THEORY W|LL|AM KErr International Business Machines Corporation 1. INTRODUCTION The normal forms defined in relational database theory represent guidelines for record design. The guidelines cor- responding to first through fifth normal forms are pre- sented, in terms that do not require an understanding of SUMMARY: The concepts behind relational theory. The design guidelines are meaningful the five principal normal forms even if a relational database system is not used. We pres- in relational database theory are ent the guidelines without referring to the concepts of the presented in simple terms. relational model in order to emphasize their generality and to make them easier to understand. Our presentation conveys an intuitive sense of the intended constraints on record design, although in its informality it may be impre- cise in some technical details. A comprehensive treatment of the subject is provided by Date [4]. The normalization rules are designed to prevent up- date anomalies and data inconsistencies. With respect to performance trade-offs, these guidelines are biased to- ward the assumption that all nonkey fields will be up- dated frequently. They tend to penalize retrieval, since Author's Present Address: data which may have been retrievable from one record in William Kent, International Business Machines an unnormalized design may have to be retrieved from Corporation, General several records in the normalized form. There is no obli- Products Division, Santa gation to fully normalize all records when actual perform- Teresa Laboratory, ance requirements are taken into account. San Jose, CA Permission to copy without fee all or part of this 2.
    [Show full text]
  • A Hybrid Approach to Functional Dependency Discovery
    A Hybrid Approach to Functional Dependency Discovery Thorsten Papenbrock Felix Naumann Hasso Plattner Institute (HPI) Hasso Plattner Institute (HPI) 14482 Potsdam, Germany 14482 Potsdam, Germany [email protected] [email protected] ABSTRACT concrete metadata. One reason is that they depend not only Functional dependencies are structural metadata that can on a schema but also on a concrete relational instance. For be used for schema normalization, data integration, data example, child ! teacher might hold for kindergarten chil- cleansing, and many other data management tasks. Despite dren, but it does not hold for high-school children. Conse- their importance, the functional dependencies of a specific quently, functional dependencies also change over time when dataset are usually unknown and almost impossible to dis- data is extended, altered, or merged with other datasets. cover manually. For this reason, database research has pro- Therefore, discovery algorithms are needed that reveal all posed various algorithms for functional dependency discov- functional dependencies of a given dataset. ery. None, however, are able to process datasets of typical Due to the importance of functional dependencies, many real-world size, e.g., datasets with more than 50 attributes discovery algorithms have already been proposed. Unfor- and a million records. tunately, none of them is able to process datasets of real- world size, i.e., datasets with more than 50 columns and We present a hybrid discovery algorithm called HyFD, which combines fast approximation techniques with efficient a million rows, as a recent study showed [20]. Because validation techniques in order to find all minimal functional the need for normalization, cleansing, and query optimiza- dependencies in a given dataset.
    [Show full text]
  • Database Normalization
    Outline Data Redundancy Normalization and Denormalization Normal Forms Database Management Systems Database Normalization Malay Bhattacharyya Assistant Professor Machine Intelligence Unit and Centre for Artificial Intelligence and Machine Learning Indian Statistical Institute, Kolkata February, 2020 Malay Bhattacharyya Database Management Systems Outline Data Redundancy Normalization and Denormalization Normal Forms 1 Data Redundancy 2 Normalization and Denormalization 3 Normal Forms First Normal Form Second Normal Form Third Normal Form Boyce-Codd Normal Form Elementary Key Normal Form Fourth Normal Form Fifth Normal Form Domain Key Normal Form Sixth Normal Form Malay Bhattacharyya Database Management Systems These issues can be addressed by decomposing the database { normalization forces this!!! Outline Data Redundancy Normalization and Denormalization Normal Forms Redundancy in databases Redundancy in a database denotes the repetition of stored data Redundancy might cause various anomalies and problems pertaining to storage requirements: Insertion anomalies: It may be impossible to store certain information without storing some other, unrelated information. Deletion anomalies: It may be impossible to delete certain information without losing some other, unrelated information. Update anomalies: If one copy of such repeated data is updated, all copies need to be updated to prevent inconsistency. Increasing storage requirements: The storage requirements may increase over time. Malay Bhattacharyya Database Management Systems Outline Data Redundancy Normalization and Denormalization Normal Forms Redundancy in databases Redundancy in a database denotes the repetition of stored data Redundancy might cause various anomalies and problems pertaining to storage requirements: Insertion anomalies: It may be impossible to store certain information without storing some other, unrelated information. Deletion anomalies: It may be impossible to delete certain information without losing some other, unrelated information.
    [Show full text]