Database Normalization


Olav Dæhli – 2018 (slides dated 25.10.2018)

What is normalization and why normalize?
• Normalization: A set of rules to decompose relations (tables) into smaller relations (tables), without losing any data dependencies.
• The reason for doing this is to avoid:
  • duplicates, when not needed
  • redundant data, when not needed
  • unnecessary null values (because they make selections and joins more difficult)
  • anomalies when updating data (when the same data have to be updated in several places)

Redundancy and Anomalies
• Redundant data
  • Data that are unnecessary to store in the database
  • Duplicated data (the same information stored in several places)
  • Information that can be derived/calculated from other data
    • Age from BirthDate
    • Price including VAT (Value Added Tax) calculated from price excluding VAT
• Anomalies
  • When the same data are stored in several places, update anomalies can occur

Functional Dependency
• X -> Y (means that Y is functionally dependent on X)
• X and Y can each be either one attribute (column) or a set of attributes (columns) (X1, X2, ..., Xi -> Y1, Y2, ..., Yj)
• X is called a determinant, since X functionally determines Y. We have to know the meaning of X and Y to decide if a functional dependency exists.
• Example:
  • PostalCode -> City (a specific postal code will always reference the same city). There exists a functional dependency between PostalCode and City, with PostalCode as the determinant.
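
A dependency such as PostalCode -> City can be checked against sample rows. The following is a minimal Python sketch; the function name `holds`, the column OwnerId in this context, and all sample values are made up for illustration. Sample data can only refute a dependency: a passing check does not prove that it holds in general, which is why the meaning of the data still has to be known.

```python
def holds(rows, determinant, dependent):
    """Return True if no two rows agree on `determinant` but differ on `dependent`."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in determinant)
        y = tuple(row[a] for a in dependent)
        # The first time a determinant value is seen, remember its dependent value;
        # any later row with the same determinant must carry the same value.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical sample rows (postal codes and city names are invented).
rows = [
    {"OwnerId": 1, "PostalCode": "1000", "City": "Northtown"},
    {"OwnerId": 2, "PostalCode": "1000", "City": "Northtown"},
    {"OwnerId": 3, "PostalCode": "2000", "City": "Southtown"},
    {"OwnerId": 4, "PostalCode": "2500", "City": "Southtown"},
]

postal_determines_city = holds(rows, ["PostalCode"], ["City"])
city_determines_postal = holds(rows, ["City"], ["PostalCode"])
print(postal_determines_city, city_determines_postal)
```

In this sample, PostalCode -> City is not contradicted, while City -> PostalCode is refuted because one city appears with two different postal codes.
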

Functional Dependency (FD)
• To know if a functional dependency exists, the data relations have to be analyzed by someone who knows the meaning of the data
[Figure: functional dependencies labelled FD 1, FD 2 and FD 3]

Superkey
• A Superkey is one or more attributes (columns) in a relation (table) that uniquely determine all the attributes (columns) in the relation (table)
• All the attributes together (in a table) will always be a Superkey. Example:
  • RegNumber, CarMake, Color, OwnerId, OwnerLastName, PostalCode, City
• But lots of other combinations will also be Superkeys. Examples:
  • RegNumber, CarMake, Color, OwnerId, OwnerLastName, PostalCode
  • RegNumber, OwnerId, PostalCode
  • RegNumber

Candidate Key
• A Candidate Key is a minimal Superkey
  • Criterion 1: If any attribute is removed from a Candidate Key, it will lose its characteristics as a Superkey
• A Candidate Key is a «candidate» to be a Primary Key (a Primary Key is always selected from among the Candidate Keys)
  • Criterion 2: Since it must have the ability to act as a Primary Key, it must be able to uniquely identify each and every row in the table

Candidate Key - Examples
• This is not a Candidate Key (but it is a Superkey): RegNumber, CarMake, Color, OwnerId, OwnerLastName, PostalCode, City
• This is a Candidate Key: RegNumber
• Case: PROJECTMEMBER (ProjectId, EmployeeId, HoursWorked)
  • This is a Candidate Key: ProjectId, EmployeeId
  • These are not Candidate Keys:
    • ProjectId (neither a Superkey nor a Candidate Key)
    • EmployeeId (neither a Superkey nor a Candidate Key)
    • ProjectId, EmployeeId, HoursWorked (a Superkey, but not a Candidate Key)

Using letters instead of column names
• When the functional dependencies are described, letters (instead of column names) are often used to simplify the expressions

Normalization Criteria
• 1NF: Only «atomic» values for each attribute
• 2NF: No partial functional dependencies
• 3NF: No transitive functional dependencies
• BCNF: No other functional dependencies
The goal is to achieve at least 3NF

First normal form (1NF)
• Criterion: Only «atomic» values for each attribute
• Any multivalued attributes (repeating groups) have to be removed
[Figure: example of repeating groups (violates the 1NF rule)]

First normal form (1NF)
• One solution is to make one column for each car
• Problems:
  1) How many columns do we need?
  2) It will result in many NULL values
  3) It will complicate searching for data
This is a bad solution. We want to expand the number of rows, not the number of columns

First normal form (1NF) - Normalization
• Make a new table
• Move the repeating values to the new table
• To preserve the functional dependency, the repeating values will act as Primary Key and a new attribute will act as a Foreign Key to the original table
[Figure labels: Primary Key, Foreign Key]

Second normal form (2NF)
• The table must satisfy 1NF
• A Primary Key has to be chosen (from among the Candidate Keys)
• To qualify for 2NF, no non-key attribute can be dependent on only a part of the Primary Key (i.e. «partial dependency» is not allowed)
• NB! Only relevant when the Primary Key consists of more than one attribute (Combined Primary Key)
[Figure labels: Primary Key, Partial Functional Dependency] C is dependent on a part of the Primary Key (B). This violates the 2NF rule.

Second normal form (2NF) - Normalization
Split the table into two tables.
Remove the partial dependency, and make it a new table. B will become a Foreign Key.

Second normal form (2NF) - Example
Split the table into two tables without 2NF violations. CourseId will become a Foreign Key.

Third normal form (3NF)
• The table must satisfy 2NF
• A Primary Key has to be chosen (from among the Candidate Keys)
• No transitive functional dependencies from the Primary Key are allowed (which means that dependencies between non-key attributes will break the rule)
• A -> F and F -> G (and then: A -> G) is a Transitive Dependency. This breaks the 3NF rule.

Third normal form (3NF) - Normalization
[Figure: three steps (1, 2, 3) with tables labelled "Still breaking the 3NF rule" and "Normalized to 3NF"]

Third normal form (3NF) - Example
[Figure: three steps (1, 2, 3) with tables labelled "Still breaking the 3NF criteria" and "Normalized to 3NF"]

Third normal form (3NF) - Example
The result is that we end up with three 3NF-normalized tables, where all the original dependencies are preserved (through Foreign Keys).
[Figure labels: Foreign Key, Foreign Key]

BCNF (Boyce-Codd Normal Form)
• This is a very rare condition in practice, so we only take a brief look at it
• BCNF is achieved by removing all remaining dependencies not already covered by 2NF and 3NF
• To meet the requirements of BCNF, every determinant in the relation (table) has to be a Candidate Key
• A BCNF violation can only occur in relations (tables) with more than one composite Candidate Key, where at least one common attribute overlaps.
  In other cases, 3NF and BCNF will be equivalent.

BCNF - Example
• Goal: A database for arranging parent-teacher conferences at a school
• Table Name: PARENT_TEACHER_CONFERENCE
• Constraint: A teacher is always assigned the same room for all the conferences listed on the same date. Various teachers can use the same room.
• Table before normalization: We start with the Universal Relation (all the data we want to register):
  Date  Time  Teacher  Pupil  Room

BCNF – Example: Identifying dependencies
[Figure labels: Candidate Key 1 (CK1), Candidate Key 2 (CK2)]
A third functional dependency exists whose determinant is not a Candidate Key. This violates the BCNF criterion.

BCNF – Example: Normalization solution
Remove the attributes involved in the functional dependency that violates BCNF, and place them in a new, separate table.
The remaining part becomes the second table.
Both remaining dependencies now have Candidate Keys as determinants, and no other dependencies exist in the relation (table).

Normalization - A step-by-step strategy
1) Start with the Universal Relation (all the attributes from all the tables)
2) Analyze the data to reveal all the functional dependencies
3) Remove any repeating values (1NF criterion)
4) Identify all Candidate Keys
5) Select a Primary Key
6) Remove any partial dependencies (to achieve 2NF)
7) Remove any transitive dependencies (to achieve 3NF)
8) Remove any remaining dependencies (to achieve BCNF)

Problems with Normalization
• Normalization leads to smaller tables
• Smaller tables (and more foreign keys) complicate inserting data
• Normalization increases the need for joining tables when performing queries against the tables.
• More duplicated data, because of more foreign keys (but less redundancy for the same reason, which is positive)
• 3NF is often the goal (because the rules of BCNF are more complicated to maintain than the rules of 3NF)
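
The trade-off between less redundancy and more joins can be illustrated with the car attributes from the Superkey slide. This is a minimal sketch using Python's built-in sqlite3 module. It assumes the dependencies RegNumber -> CarMake, Color, OwnerId; OwnerId -> OwnerLastName, PostalCode; and PostalCode -> City. The table names CAR, OWNER and POSTAL, the helper `universal_rows` and all sample rows are hypothetical and not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Assumed 3NF decomposition: one table per determinant.
conn.executescript("""
CREATE TABLE POSTAL (
    PostalCode TEXT PRIMARY KEY,
    City       TEXT NOT NULL
);
CREATE TABLE OWNER (
    OwnerId       INTEGER PRIMARY KEY,
    OwnerLastName TEXT NOT NULL,
    PostalCode    TEXT NOT NULL REFERENCES POSTAL(PostalCode)
);
CREATE TABLE CAR (
    RegNumber TEXT PRIMARY KEY,
    CarMake   TEXT NOT NULL,
    Color     TEXT NOT NULL,
    OwnerId   INTEGER NOT NULL REFERENCES OWNER(OwnerId)
);
""")

# Invented sample data; referenced tables are filled first.
conn.executemany("INSERT INTO POSTAL VALUES (?, ?)",
                 [("1000", "Northtown"), ("2000", "Southtown")])
conn.executemany("INSERT INTO OWNER VALUES (?, ?, ?)",
                 [(1, "Hansen", "1000"), (2, "Olsen", "2000")])
conn.executemany("INSERT INTO CAR VALUES (?, ?, ?, ?)",
                 [("AB12345", "Volvo", "Red", 1),
                  ("CD67890", "Toyota", "Blue", 1),
                  ("EF11111", "Ford", "White", 2)])


def universal_rows(connection):
    """Rebuild the original wide rows by joining along the foreign keys."""
    return connection.execute("""
        SELECT c.RegNumber, c.CarMake, c.Color,
               o.OwnerId, o.OwnerLastName, p.PostalCode, p.City
        FROM CAR AS c
        JOIN OWNER AS o ON o.OwnerId = c.OwnerId
        JOIN POSTAL AS p ON p.PostalCode = o.PostalCode
        ORDER BY c.RegNumber
    """).fetchall()


before = universal_rows(conn)

# The city name is stored once, so a single-row update is enough.
conn.execute("UPDATE POSTAL SET City = ? WHERE PostalCode = ?", ("Newtown", "1000"))
after = universal_rows(conn)

for row in after:
    print(row)
```

Reading the wide rows now takes two joins, but renaming a city touches exactly one row in POSTAL instead of every car registered under that postal code.
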