Introduction to Database Systems

Slides adapted from Loreto Bravo

What is a database?

- A database is an organized collection of data for one or more uses.
- A database organizes its data according to a data model.
  - A data model is a collection of conceptual tools for describing data, data relationships, data semantics and data constraints.
  - Components:
    - structural part
    - manipulative part
    - integrity rules
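The three components of a data model can be seen in a small relational example. This is only a sketch using Python's built-in sqlite3 module; the `student` table, its columns and the sample rows are hypothetical and do not come from the slides.

```python
import sqlite3

def demo_data_model():
    conn = sqlite3.connect(":memory:")
    # Structural part: a relation (table) and its attributes.
    conn.execute(
        "CREATE TABLE student ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " credits INTEGER NOT NULL CHECK (credits >= 0))"
    )
    # Manipulative part: operations that insert and query data.
    conn.execute(
        "INSERT INTO student (id, name, credits) VALUES (1, 'Mary Johnson', 12)"
    )
    names = [row[0] for row in conn.execute("SELECT name FROM student")]
    # Integrity rules: the CHECK constraint rejects invalid data.
    try:
        conn.execute(
            "INSERT INTO student (id, name, credits) VALUES (2, 'Bob', -5)"
        )
        rejected = False
    except sqlite3.IntegrityError:
        rejected = True
    conn.close()
    return names, rejected
```

Calling `demo_data_model()` returns the stored names and a flag telling whether the row with negative credits was refused.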

Data Models

- Types of Data Models:
  - Describe data at the conceptual and external levels
    - Object-based Data Models
      - Entity-relationship model, Object-oriented model, Semantic data model, Functional data model
    - Record-based Data Models
      - relational, network, and hierarchical data model, etc.
  - Describe data at the internal level
    - Unifying model or Frame memory.

Historical Perspective -- Before 1960

- File systems
  - Problems:
    - data redundancy
    - data is separated and cannot be easily combined
    - high cost of propagation of updates
    - update anomalies and inconsistencies
    - no abstract data model
    - requires knowledge of storage details
    - no standard query language
    - need to enforce security policies in which different users have permission to access different subsets of the data

File Systems

[Figure: User / Client processing / Client Files; User / Loans processing / Loan Files]

For each loan, the client's information is stored: redundancy.

File Systems

- Enroll “Mary Johnson” in “CSE444”:

  Write a C program to do the following:

    Read ‘students.txt’
    Read ‘courses.txt’
    Find & update the record “Mary Johnson”
    Find & update the record “CSE444”
    Write “students.txt”
    Write “courses.txt”
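The slide asks for a C program; the following is a minimal Python sketch of the same six steps instead. The file format (one `key:item1,item2` record per line) and the helper names `read_records`, `write_records`, `enroll` and `demo_enroll` are assumptions made for illustration; the slides do not specify them.

```python
import os
import tempfile

def read_records(path):
    """Read 'key:item1,item2' lines into a dict of lists (assumed format)."""
    records = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            key, _, items = line.partition(":")
            records[key] = [i for i in items.split(",") if i]
    return records

def write_records(path, records):
    """Rewrite the whole file from the in-memory records."""
    with open(path, "w", encoding="utf-8") as f:
        for key, items in records.items():
            f.write(f"{key}:{','.join(items)}\n")

def enroll(students_path, courses_path, student, course):
    students = read_records(students_path)   # Read 'students.txt'
    courses = read_records(courses_path)     # Read 'courses.txt'
    students[student].append(course)         # Find & update the student record
    courses[course].append(student)          # Find & update the course record
    write_records(students_path, students)   # Write 'students.txt'
    write_records(courses_path, courses)     # Write 'courses.txt'

def demo_enroll():
    """Run the enrollment against two temporary files and read them back."""
    with tempfile.TemporaryDirectory() as d:
        s = os.path.join(d, "students.txt")
        c = os.path.join(d, "courses.txt")
        write_records(s, {"Mary Johnson": []})
        write_records(c, {"CSE444": []})
        enroll(s, c, "Mary Johnson", "CSE444")
        return read_records(s), read_records(c)
```

Even in this small form, the program has to know the storage format of both files and rewrites each file in full for a single enrollment.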

File Systems

- System crashes:

    Read ‘students.txt’
    Read ‘courses.txt’                          CRASH!
    Find & update the record “Mary Johnson”
    Find & update the record “CSE444”
    Write “students.txt”
    Write “courses.txt”

  - What is the problem?
- Large data sets (say 50GB)
  - What is the problem?
- Simultaneous access by many users
  - Need locks

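One way to read the crash question: if the program stops partway through, some updates may have been applied and others not. Database systems typically handle this with transactions, a topic the slides have not introduced at this point. The sketch below uses Python's sqlite3 module to show a simulated failure being rolled back; the `student` and `course` tables, their counters and the `crash` flag are hypothetical.

```python
import sqlite3

def enroll_atomically(conn, student, course, crash=False):
    """Apply both updates in one transaction; 'crash' simulates a failure."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE student SET courses = courses + 1 WHERE name = ?",
                (student,),
            )
            if crash:
                raise RuntimeError("simulated crash between the two updates")
            conn.execute(
                "UPDATE course SET enrolled = enrolled + 1 WHERE code = ?",
                (course,),
            )
    except RuntimeError:
        pass  # the partial update above has been rolled back

def read_counters(conn):
    """Return (courses taken by the student, students enrolled in the course)."""
    return (
        conn.execute("SELECT courses FROM student").fetchone()[0],
        conn.execute("SELECT enrolled FROM course").fetchone()[0],
    )

def demo_crash():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE student (name TEXT PRIMARY KEY, courses INTEGER)")
    conn.execute("CREATE TABLE course (code TEXT PRIMARY KEY, enrolled INTEGER)")
    conn.execute("INSERT INTO student VALUES ('Mary Johnson', 0)")
    conn.execute("INSERT INTO course VALUES ('CSE444', 0)")
    conn.commit()
    enroll_atomically(conn, "Mary Johnson", "CSE444", crash=True)
    after_crash = read_counters(conn)
    enroll_atomically(conn, "Mary Johnson", "CSE444")
    after_success = read_counters(conn)
    conn.close()
    return after_crash, after_success
```

After the simulated crash both counters are unchanged, so the data never shows a student enrolled in a course that does not list her.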
1960

- Hierarchical Databases
  - Developed by North American Rockwell and IBM as the IMS (Information Management System)
    - IMS formed the basis for hierarchical data model
    - Still Available!! http://www-01.ibm.com/software/data/ims/
  - American Airlines and IBM jointly developed SABRE for making airline reservations
    - SABRE is used today to populate Web-based travel services such as Travelocity
  - Based on a tree structure
  - Problems:
    - Changes in data structure require changes in application programs that access that structure
    - No Many-to-Many relationships
    - Programmers must be thoroughly familiar with the database structure.
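The lack of many-to-many relationships follows from the tree structure: every record has exactly one parent. The sketch below is not IMS code, only a plain Python tree with hypothetical names (`Node`, `build_catalog`, `count_copies`, and the second course `CSE451`). It shows that a student taking two courses has to be stored twice.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A record in a tree: it hangs under exactly one parent."""
    name: str
    children: list = field(default_factory=list)

    def add(self, name):
        child = Node(name)
        self.children.append(child)
        return child

def build_catalog():
    root = Node("university")
    cse444 = root.add("CSE444")
    cse451 = root.add("CSE451")
    # Mary takes both courses, but a tree node has only one parent,
    # so her record is stored once under each course.
    cse444.add("Mary Johnson")
    cse451.add("Mary Johnson")
    return root

def count_copies(node, name):
    """Count how many separate records in the tree carry the given name."""
    own = 1 if node.name == name else 0
    return own + sum(count_copies(child, name) for child in node.children)
```

Here `count_copies(build_catalog(), "Mary Johnson")` finds two separate records, which is the kind of redundancy a many-to-many relationship would avoid.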

1960

- Network Databases
  - Integrated Data Store, first general-purpose DBMS designed by Charles Bachman at GE
    - Formed basis for network data model
    - Bachman received Turing Award in 1973 for his work in database area
  - Extension of the hierarchical data model
    - Based on acyclic digraph
  - Standardized (1971) by the CODASYL group (Conference on Data Systems Languages)
  - Advantage: Many-to-Many relationships are implemented
  - Problems: “Navigation” is even harder


Problems with the first DBMSs

- Access to database was through low-level pointer operations
- Storage details depended on the type of data to be stored
- Adding a field to the DB required rewriting the underlying access/modification scheme
- Emphasis on records to be processed, not overall structure
- User had to know physical structure of the DB in order to query for information
- Overall, the first DBMSs were very complex and inflexible, which made life difficult when it came to adding new applications or reorganizing the data

1970

- Relational Databases
  - Edgar Codd, at IBM, proposed relational data model.
  - Codd's paper “A Relational Model of Data for Large Shared Data Banks.”
    - “It provides a means of describing data with its natural structure only--that is, without superimposing any additional structure for machine representation purposes. Accordingly, it provides a basis for a high level data language which will yield maximal independence between programs on the one hand and machine representation on the other.” (Codd 1970)
  - In other words the Relational Model consisted of:
    - Data independence from hardware and storage implementation
    - High level, nonprocedural language for accessing data. Instead of processing one record at a time, a programmer could use the language to specify single operations that would be performed across the entire data.
  - Codd won 1981 Turing Award.

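The difference between processing one record at a time and specifying a single operation over the entire data can be sketched as follows. The example uses Python's sqlite3 module and SQL, which came after Codd's paper; the `loan` table and the function names are hypothetical.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE loan (id INTEGER PRIMARY KEY, branch TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO loan (id, branch, amount) VALUES (?, ?, ?)",
        [(1, "north", 100.0), (2, "south", 200.0), (3, "north", 300.0)],
    )
    return conn

def raise_record_at_a_time(conn, branch, factor):
    """Procedural style: fetch every record and handle each one in a loop."""
    rows = conn.execute("SELECT id, branch, amount FROM loan").fetchall()
    for loan_id, loan_branch, amount in rows:
        if loan_branch == branch:
            conn.execute(
                "UPDATE loan SET amount = ? WHERE id = ?",
                (amount * factor, loan_id),
            )

def raise_set_at_a_time(conn, branch, factor):
    """Nonprocedural style: one statement describes the change for the whole set."""
    conn.execute(
        "UPDATE loan SET amount = amount * ? WHERE branch = ?",
        (factor, branch),
    )

def amounts(conn):
    return [row[0] for row in conn.execute("SELECT amount FROM loan ORDER BY id")]
```

Both functions leave the table in the same state, but the second says only what should change and leaves the choice of access path to the system.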
Codd vs. IBM

- Codd’s model had an immediate impact on research; however, to gain legitimacy within the field, it had to survive at least two battles:
  - One in the technical community at large
  - One within IBM
- Within IBM
  - Conflict with existing product IMS, which had been heavily invested in
  - New technology had to prove itself before replacing existing revenue-producing product
  - Codd published his paper in open literature because no one at IBM (himself included) recognized its eventual impact
  - Outside technical community showed that the idea had great potential

Codd vs. IBM

- Within IBM
  - IBM declared IMS its sole strategic product, setting up Codd and his ideas as counter to company goals
  - Codd spoke out in spite of IBM’s dissatisfaction and promoted the relational model to computer scientists. He arranged a public debate between himself and Charles Bachman, who at the time was a key proponent of the CODASYL standard.
  - Debate produced further criticism from IBM for undermining its goals, but also established his relational model as a cornerstone for the technical community.
- Finally, two main relational prototypes emerged in the 70’s
  - System R from IBM
  - Ingres from UC-Berkeley

System R

- Prototype intended to provide a high-level, nonnavigational, data-independent interface to many users simultaneously, with high integrity and robustness.
- Led to a query language called SEQUEL (Structured English Query Language), later renamed to Structured Query Language (SQL) for legal reasons. Now a standard for database access.
- Project finished with the conclusion that relational databases were a feasible commercial product
- Eventually evolved into SQL/DS which later became DB2

Ingres

- Two scientists at UC-Berkeley, Michael Stonebraker and Eugene Wong, became interested in relational databases
- Used QUEL as its query language
- Similar to System R, but based on different hardware and operating system
- Developers eventually branched off to form Ingres Corp, Sybase, MS SQL Server, Britton-Lee.

System R and Ingres inspired the development of virtually all commercial relational databases, including those from Sybase, Informix, Tandem, and even Microsoft’s SQL Server

Where’s Oracle!?

- Larry Ellison learned of IBM’s work and founded Relational Software Inc. in 1977 in California
- Their first product was a relational database based on IBM’s System R model and SQL technology
- Released in 1979, it was the first commercial RDBMS, beating IBM to the market by 2 years.
- In the 1980’s the company was renamed Oracle Corporation. Throughout the 80’s new features were added and performance improved as the price of hardware came down, and Oracle became the largest independent RDBMS vendor.

1975

- ANSI-SPARC Three-Level Architecture
  - Views describe how users see the data.
  - Conceptual schema defines logical structure
    - Describes what data is stored and relationships among the data.
  - Physical schema describes the files and indexes used.
    - Describes how the data is stored in the database

[Figure: View 1, View 2, View 3 above the Conceptual Schema, which sits above the Physical Schema]
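The three levels map loosely onto familiar SQL objects: a table for the conceptual schema, a view for the external level, and an index as one piece of the physical schema. The sketch below uses Python's sqlite3 module; the `client` table, the `client_directory` view and the sample rows are hypothetical. It checks that a physical change leaves the view's answer unchanged.

```python
import sqlite3

def three_levels():
    conn = sqlite3.connect(":memory:")
    # Conceptual schema: what data is stored and how it is related.
    conn.execute(
        "CREATE TABLE client ("
        " id INTEGER PRIMARY KEY, name TEXT, city TEXT, balance REAL)"
    )
    conn.executemany(
        "INSERT INTO client (id, name, city, balance) VALUES (?, ?, ?, ?)",
        [(1, "Ana", "Santiago", 50.0), (2, "Luis", "Ottawa", 75.0)],
    )
    # External level: a view exposing only what one group of users should see.
    conn.execute("CREATE VIEW client_directory AS SELECT name, city FROM client")
    query = "SELECT name, city FROM client_directory ORDER BY name"
    before = conn.execute(query).fetchall()
    # Physical level: an index changes how the data is accessed,
    # not what the view returns.
    conn.execute("CREATE INDEX client_by_name ON client (name)")
    after = conn.execute(query).fetchall()
    conn.close()
    return before, after
```

The users of `client_directory` never see the `balance` column, and they are unaffected when the index is added.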

1976

- Entity-Relationship (ER) Models
  - Proposed by Peter Chen for database design giving an important insight into conceptual data models
  - Allows the designer to concentrate on the use of data instead of the logical table structure

1980's

- Birth of IBM PC. RDBMS market begins to boom.
- SQL becomes standardized through ANSI (American National Standards Institute) and ISO (International Organization for Standardization)
- By the mid 80’s it had become apparent that there were some fields (medicine, multimedia, physics) where relational databases were not practical, due to the types of data involved.
  - More flexibility was needed in how their data was represented and accessed.
- This led to research in Object Oriented Databases in which users could define their own methods of access to data and how to represent and manipulate it. This coincided with the appearance of Object Oriented Programming languages such as C++.

1980's

! Birth of IBM PC. RDBMS market begins to boom.

! SQL becomes standardized through ANSI (American National Standards Institute) and ISO (International Organization for Standardization)

! By the mid-80's it had become apparent that there were some fields (medicine, multimedia, physics) where relational databases were not practical, due to the types of data involved.

" More flexibility was needed in how their data was represented and accessed.

! This led to research in Object Oriented Databases, in which users could define their own methods of access to data and how to represent and manipulate it. This coincided with the appearance of Object Oriented Programming languages such as C++.

1990’s

! Considerable research into more powerful query languages and richer data models, with emphasis on supporting complex analysis of data from all parts of an enterprise

! First OODBMSs start to appear from companies like Objectivity. Object-relational DBMS hybrids also begin to appear.

! Several vendors (e.g., IBM’s DB2, Oracle 8, Informix UDS) extended their systems with the ability to store new data types such as images and text, and to ask more complex queries

! New application areas: data warehousing and OLAP (Online Analytical Processing, a category of software tools that provides analysis of data stored in a database), internet, multimedia, etc.

! Development of personal/small business productivity tools such as Excel and Access from Microsoft.

Late 90’s-2000’s

! XML

" Starts being incorporated (as middleware or XML-enabled DBMS) in 1997

! Data Junction, ADO, Delphi

! Oracle 8i, 9i, MS Access 2002, SQL Server 2000, DB2, Informix

" Native XML DBMS, 2000

! TigerLogic XDMS, Raining Data, Tamino, Software AG, Birdstep

! Large investment in internet companies fuels tools-market boom for Web/Internet/DB connectors:

" Active Server Pages, FrontPage, Java Servlets, JDBC, JavaBeans, ColdFusion, Dreamweaver, Oracle Developer 2000, etc.

! Open source projects come online with widespread use of gcc, cgi, Apache, MySQL

! Three main companies dominate in the large DB market: IBM, Microsoft, and Oracle

2010’s…

! Big Data:

" Google processes 20 PB a day (2008)

" Wayback Machine has 3 PB + 100 TB/month (3/2009)

" eBay has 6.5 PB of user data + 50 TB/day (5/2009)

" Facebook has 36 PB of user data + 80-90 TB/day (6/2010)

! New ways for efficient query answering are needed:

" For example:

! INSERT only, no UPDATEs/DELETEs

! No JOINs, thereby reducing query time

" This involves de-normalizing data
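To make the "no JOINs" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The tables `users`, `visits` and `visits_wide` and their contents are invented for this illustration. The same question is answered once with a JOIN over normalized tables and once from a de-normalized table that is only ever appended to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Normalized design: answering the query needs a JOIN.
    CREATE TABLE users  (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE visits (user_id INTEGER, page TEXT);
    -- De-normalized design: the user name is copied into every row.
    CREATE TABLE visits_wide (user_id INTEGER, name TEXT, page TEXT);
    """
)
names = {1: "Ana", 2: "Luis"}
events = [(1, "/home"), (2, "/home"), (1, "/cart")]
conn.executemany("INSERT INTO users VALUES (?, ?)", list(names.items()))
conn.executemany("INSERT INTO visits VALUES (?, ?)", events)
# Insert-only: rows are appended, never updated or deleted.
conn.executemany(
    "INSERT INTO visits_wide VALUES (?, ?, ?)",
    [(uid, names[uid], page) for uid, page in events],
)

with_join = conn.execute(
    "SELECT u.name, v.page FROM visits v JOIN users u ON u.user_id = v.user_id"
    " ORDER BY u.name, v.page"
).fetchall()
without_join = conn.execute(
    "SELECT name, page FROM visits_wide ORDER BY name, page"
).fetchall()
print(with_join == without_join)
```

The price of the second design is redundancy: the name is stored once per visit, so renaming a user would mean rewriting many rows. That is one reason de-normalization tends to be paired with insert-only workloads.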

Understanding the data: units of measurement

Name        SI standard   Binary usage
Kilobyte    10^3          2^10
Megabyte    10^6          2^20
Gigabyte    10^9          2^30
Terabyte    10^12         2^40
Petabyte    10^15         2^50
Exabyte     10^18         2^60
Zettabyte   10^21         2^70
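The SI and binary readings of each prefix drift apart as the units grow. The short sketch below computes the gap. The function names are made up for this example, and the only assumption is the usual pairing of 10^(3n) with 2^(10n) for the n-th prefix.

```python
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta"]


def si_bytes(n: int) -> int:
    """Size of the n-th prefix (1 = kilo) in the SI convention: 10^(3n)."""
    return 10 ** (3 * n)


def binary_bytes(n: int) -> int:
    """Size of the n-th prefix (1 = kilo) in the binary convention: 2^(10n)."""
    return 2 ** (10 * n)


def gap_percent(n: int) -> float:
    """How much larger the binary unit is than the SI unit, in percent."""
    return (binary_bytes(n) / si_bytes(n) - 1) * 100


for n, prefix in enumerate(PREFIXES, start=1):
    print(f"{prefix}byte: binary is {gap_percent(n):.1f}% larger than SI")
```

At the kilobyte level the two conventions differ by 2.4%. At the terabyte level the difference is already close to 10%, so it matters which convention a quoted figure uses.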

Human Scale

KILO 10^3 (2^10)
Cellular memory
Text (email, document)

MEGA 10^6 (2^20)
Book, Picture

GIGA 10^9 (2^30)
RAM, Good video

(This is our world)

More

TERA 10^12 (2^40)
-- Library of Congress (USA): 160 TB
-- Daily internet traffic (100 TB)
-- Wikipedia: 6 TB dump (2010)
-- 3-D movie Monsters vs. Aliens (needed 100 TB of disk)

This is beyond human scale, but still routine for any ordinary company

Even More…

PETA 10^15 (2^50)

" World of Warcraft uses 1.3 PB of storage to maintain its game

" Internet Archive: 3 PB (growing by 100 TB per month)

" Google processes 24 petabytes per day

" 1/2 PB: enough to film the life of a person (100 years in high definition).

" Facebook has 60 billion images, that is, 1.5 PB.

" AT&T transfers around 19 petabytes per day.

2010’s

! NoSQL

" Stands for Not Only SQL

" Class of non-relational data storage systems

" Usually do not require a fixed table schema nor do they use the concept of joins

! NoSQL movement started from:

" BigTable (Google)

" Dynamo (Amazon)

! Gossip protocol (discovery and error detection)

! Distributed key-value data store

! Eventual consistency
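The three Dynamo ingredients listed above can be sketched together in a toy model. This is an illustrative simplification, not Dynamo's actual design: the `Replica` class, the integer version counter and the last-write-wins merge are assumptions made for this example. A real system has to handle concurrent writes to the same key, for instance with vector clocks.

```python
import random
from typing import Dict, List, Tuple

Versioned = Tuple[int, str]  # (version counter, value)


class Replica:
    """One node of a toy key-value store (hypothetical, for illustration)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, Versioned] = {}

    def put(self, key: str, value: str) -> None:
        # A local write bumps the version; other replicas are now stale.
        version = self.data.get(key, (0, ""))[0] + 1
        self.data[key] = (version, value)

    def get(self, key: str) -> str:
        return self.data[key][1]

    def merge_from(self, other: "Replica") -> None:
        # Last-write-wins: keep whichever copy has the higher version.
        for key, versioned in other.data.items():
            if key not in self.data or versioned[0] > self.data[key][0]:
                self.data[key] = versioned


def gossip_round(replicas: List[Replica], rng: random.Random) -> None:
    # Each replica exchanges state with one randomly chosen peer.
    for replica in replicas:
        peer = rng.choice([r for r in replicas if r is not replica])
        replica.merge_from(peer)
        peer.merge_from(replica)


def converged(replicas: List[Replica]) -> bool:
    return all(r.data == replicas[0].data for r in replicas)


replicas = [Replica(f"node{i}") for i in range(5)]
replicas[0].put("cart:42", "book")  # only node0 knows about the write
rng = random.Random(0)
rounds = 0
while not converged(replicas) and rounds < 100:
    gossip_round(replicas, rng)
    rounds += 1
print(rounds, [r.get("cart:42") for r in replicas])
```

Right after the write, four of the five replicas would fail to find the key. After a few gossip rounds every replica returns the same value. That is eventual consistency: reads may be stale for a while, but replicas converge once updates stop.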

NoSQL solutions

! NoSQL solutions fall into two major areas:

" Key/Value or ‘the big hash table’.

" Schema-less, which comes in multiple flavors: column-based, document-based or graph-based.

! In NoSQL solutions we are giving up:

" joins

" group by

" order by

" ACID transactions

" SQL as a sometimes frustrating but still powerful query language

" easy integration with other applications that support SQL

A lot has been left out!

1970's

2000's

References

! "The History of Databases" by Patrick Rogers-Ostema

! Database Management Systems, R. Ramakrishnan and J. Gehrke (slides)