Normalization Rules

Normalization is the process of removing data redundancy by applying normalization rules. There are five degrees of normal forms, from the first normal form through the fifth normal form, as described in this appendix.

First Normal Form

The following are the characteristics of first normal form (1NF):

• There must not be any repeating columns or groups of columns. An example of a repeating column is a customer table with Phone Number 1 and Phone Number 2 columns. Using "table (column, column)" notation, an example of a repeating group of columns is Order Table (Order ID, Order Date, Product ID, Price, Quantity, Product ID, Price, Quantity). Product ID, Price, and Quantity are the repeating group of columns.

• Each table must have a primary key (PK) that uniquely identifies each row. The PK can be a composite, that is, it can consist of several columns, for example, Order Table (Order ID, Order Date, Customer ID, Product ID, Product Name, Price, Quantity). In this notation, the underlined columns form the PK; in this case, Order ID and Product ID are a composite PK.

Second Normal Form

The following are the characteristics of second normal form (2NF):

• It must be in 1NF.

• When each value in column 1 is associated with exactly one value in column 2, we say that column 2 is dependent on column 1, for example, Customer (Customer ID, Customer Name). Customer Name is dependent on Customer ID, noted as Customer ID ➤ Customer Name.

• In 2NF, all non-PK columns must be dependent on the entire PK, not just on part of it, for example, Order Table (Order ID, Order Date, Product ID, Price, Quantity). The underlined columns are a composite PK. Order Date is dependent on Order ID but not on Product ID. This violates 2NF.

• To make it 2NF, we need to break it into two tables: Order Header (Order ID, Order Date) and Order Item (Order ID, Product ID, Price, Quantity). Now all non-PK columns are dependent on the entire PK. In the Order Header table, Order Date is dependent on Order ID. In the Order Item table, Price and Quantity are dependent on Order ID and Product ID. Order ID in the Order Item table is a foreign key.

Third Normal Form

The following are the characteristics of third normal form (3NF):

• It must be in 2NF.

• If column 2 is dependent on column 1 and column 3 is dependent on column 2, we say that column 3 is transitively dependent on column 1. In 3NF, no column is transitively dependent on the PK, for example, Product (Product ID, Product Name, Category ID, Category Name). Category Name is dependent on Category ID, and Category ID is dependent on Product ID. Category Name is transitively dependent on the PK (Product ID). This violates 3NF.

• To make it 3NF, we need to break it into two tables: Product (Product ID, Product Name, Category ID) and Category (Category ID, Category Name). Now no column is transitively dependent on the PK. Category ID in the Product table is a foreign key. Both the 2NF and 3NF decompositions are sketched in the DDL example below.
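The following is a minimal DDL sketch of the 2NF and 3NF decompositions just described. The table and column names follow the appendix's own examples; the data types, constraints, and snake_case spellings are assumptions added for illustration.

```sql
-- 2NF: split the order table in two. Order Date depends only on
-- Order ID, so it moves to the header table.
CREATE TABLE order_header
( order_id   INT  NOT NULL PRIMARY KEY
, order_date DATE NOT NULL
);

CREATE TABLE order_item
( order_id   INT          NOT NULL REFERENCES order_header (order_id)
, product_id INT          NOT NULL
, price      DECIMAL(9,2) NOT NULL
, quantity   INT          NOT NULL
, PRIMARY KEY (order_id, product_id)  -- composite PK: every non-PK column
                                      -- now depends on the whole key
);

-- 3NF: Category Name depends on Category ID, not directly on
-- Product ID, so it moves to its own table.
CREATE TABLE category
( category_id   INT         NOT NULL PRIMARY KEY
, category_name VARCHAR(50) NOT NULL
);

CREATE TABLE product
( product_id   INT         NOT NULL PRIMARY KEY
, product_name VARCHAR(50) NOT NULL
, category_id  INT         NOT NULL REFERENCES category (category_id)
);
```

Splitting the tables this way means each fact (an order's date, a category's name) is stored exactly once, which is precisely the redundancy that normalization aims to remove.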
Boyce-Codd Normal Form

Boyce-Codd Normal Form (BCNF) is between 3NF and 4NF. The following are the characteristics of BCNF:

• It must be in 3NF.

• In Customer ID ➤ Customer Name, we say that Customer ID is a determinant. In BCNF, every determinant must be a candidate PK. A candidate PK means capable of being a PK; that is, it uniquely identifies each row.

• BCNF is applicable to situations where you have two or more candidate composite PKs, such as with a cable TV service engineer visiting customers: Visit (Date, Route ID, Shift ID, Customer ID, Engineer ID, Vehicle ID). A visit to a customer can be identified using Date, Route ID, and Customer ID as the composite PK. Alternatively, the PK can be Shift ID and Customer ID. Shift ID is the determinant of Date and Route ID, yet Shift ID on its own is not a candidate PK, so the table violates BCNF; one decomposition is sketched below.
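To make the Visit example concrete, here is a minimal sketch of one BCNF decomposition, in the same assumed dialect as the earlier sketch. The shift table name, the visit_date column (Date is a reserved word in most SQL dialects), and all data types are illustrative assumptions.

```sql
-- Shift ID -> (Date, Route ID): the determinant becomes the PK of
-- its own table, so it is now a candidate PK where it determines.
CREATE TABLE shift
( shift_id   INT  NOT NULL PRIMARY KEY
, visit_date DATE NOT NULL
, route_id   INT  NOT NULL
);

-- The remaining visit table has a single composite candidate key.
CREATE TABLE visit
( shift_id    INT NOT NULL REFERENCES shift (shift_id)
, customer_id INT NOT NULL
, engineer_id INT NOT NULL
, vehicle_id  INT NOT NULL
, PRIMARY KEY (shift_id, customer_id)
);
```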
Higher Normal Forms

The following are the characteristics of the other normal forms:

• A table is in fourth normal form (4NF) when it is in BCNF and there are no multivalued dependencies.

• A table is in fifth normal form (5NF) when it is in 4NF and there are no cyclic dependencies.

It is a good practice to apply 4NF or 5NF when they are applicable; a 4NF illustration is sketched after the note below.

■Note A sixth normal form (6NF) has been suggested, but it's not widely accepted or implemented yet.
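The appendix gives no worked example for 4NF, so the following sketch is an invented illustration rather than one from the original text: if an engineer's skills and coverage regions vary independently, a single table must store every skill and region combination for each engineer; 4NF puts each independent multivalued fact in its own table.

```sql
-- Violates 4NF: engineer_id ->> skill and engineer_id ->> region are
-- independent multivalued dependencies, so a single table
--   engineer_skill_region (engineer_id, skill, region)
-- would have to repeat every skill for every region and vice versa.

-- 4NF: one table per independent multivalued fact.
CREATE TABLE engineer_skill
( engineer_id INT         NOT NULL
, skill       VARCHAR(30) NOT NULL
, PRIMARY KEY (engineer_id, skill)
);

CREATE TABLE engineer_region
( engineer_id INT         NOT NULL
, region      VARCHAR(30) NOT NULL
, PRIMARY KEY (engineer_id, region)
);
```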