Applying ER Model

• Problem domain: the Winter Olympics
• Fans:
  • Get country medal counts
  • See who won an event
  • Find out how a favorite athlete is doing
• Scheduler:
  • Set times for competitions
  • Assign competitors to heats, brackets, etc.
• Officials:
  • Record results (winners, times, medals, etc.)

2/11/2010

Topics for Today
• Data models
  • Network data models
  • Hierarchical data models
• Learning objectives
  • Explain the technical landscape from which the relational model emerged.
  • When we present the relational model, articulate the most salient points that distinguish it from what preceded it.

Network Data Model

• Basics
  • Think of this as key/data pairs where the data can contain a reference to another key/data pair (either in the same database or in a different database).
  • Data are represented by collections of records (i.e., a database of key/data pairs).
  • Relationships are expressed via links between records.
  • Navigational style of access: you find a record and then traverse links among records to find other information.
• Analogies with ER modeling
  • Collections of records == entity sets
  • Links == relationships
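The key/data-pair view can be sketched in a few lines of Python. This is a toy under stated assumptions: a plain dict stands in for the database, and the keys (`"P001"`, `"A9"`) and the `appt` link field are hypothetical names chosen for illustration. Access is navigational: select one record by its key, then follow the key stored in its data to reach the next record.

```python
# Toy network database (assumption: a dict plays the role of the key/data store).
# Each value is a record; a record may hold a link, i.e. the key of another record.
db = {
    "P001": {"name": "David", "phone": "6-1234", "appt": "A9"},  # hypothetical keys
    "A9": {"date": "2/16/10", "time": "9"},
}


def get(key):
    """Record selection: fetch a single record by its key."""
    return db[key]


def follow(record, link_field):
    """Navigation: traverse the link stored in link_field to reach another record."""
    return db[record[link_field]]


patient = get("P001")
appointment = follow(patient, "appt")
print(patient["name"], appointment["date"])  # David 2/16/10
```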

Network Model Details

• Records are composed of attributes.
  • Attributes are single-valued.
• A link connects exactly two records.
• The design tool for the network model is the data-structure diagram (very similar to an ER diagram).
  • Think of a data-structure diagram as an ER diagram in which relationships are drawn as lines connecting entities (that is, the relationship itself is not depicted as a separate symbol).

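Diagram

[Figure: an ER diagram (entity sets Patient, with attributes PID, name, and phone, and Appointment, with attribute date, related by Schedule) beside the equivalent data-structure diagram (Patient and Appointment records joined by a "scheduled" link).]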

Ternary Relationships

• Relationships that relate more than two records are represented by a separate record that simply consists of links.

[Figure: ER diagram of a ternary Schedule relationship among Patient (PID, name, phone), Doctor (name), and Appointment (date).]

Link Records

[Figure: data-structure diagram in which a schedule record (the link record) connects Patient (PID, name, phone), Doctor (name), and Appointment (date) through PatientLink, DoctorLink, and ApptLink.]
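A link record can be sketched with Python dataclasses. The sketch assumes object references stand in for pointers, and the doctor's name is hypothetical. The `Schedule` record carries no attributes of its own, only the three binary links that the diagram calls PatientLink, DoctorLink, and ApptLink, so one `Schedule` instance records one occurrence of the three-way relationship.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    pid: str
    name: str
    phone: str


@dataclass
class Doctor:
    name: str


@dataclass
class Appointment:
    date: str


@dataclass
class Schedule:
    """Link record: no data of its own, just one link to each participant."""
    patient: Patient          # PatientLink
    doctor: Doctor            # DoctorLink
    appointment: Appointment  # ApptLink


david = Patient("001", "David", "6-1234")
dr_smith = Doctor("Smith")  # hypothetical doctor
appt = Appointment("9@2/16/10")

link = Schedule(david, dr_smith, appt)
print(link.patient.name, link.doctor.name, link.appointment.date)
# David Smith 9@2/16/10
```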

Navigating a Network Database

[Figure: a Doctor record and Appointment records (Fri AM, Sat PM) connected through schedule records to Patient records (PID, name, phone).]
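Navigation through link records can be sketched as follows. The assignment of patients to time slots and the doctor's name are hypothetical, since the figure's exact contents are not recoverable; dicts stand in for records and direct object references stand in for links. Starting from one doctor record, the code walks that doctor's schedule records and follows each one's links to the appointment slot and the patient.

```python
# Patient records (values from the running example).
patients = {
    "001": {"pid": "001", "name": "David", "phone": "6-1234"},
    "002": {"pid": "002", "name": "Peter", "phone": "6-5678"},
    "003": {"pid": "003", "name": "Ryan", "phone": "6-9012"},
}
fri_am = {"slot": "Fri AM"}
sat_pm = {"slot": "Sat PM"}

# Hypothetical doctor whose record holds links to its schedule (link) records.
doctor = {"name": "Smith", "schedule": []}
for pid, slot in [("001", fri_am), ("002", fri_am), ("003", sat_pm)]:
    # Each schedule record, reached from the doctor, links to one slot and one patient.
    doctor["schedule"].append({"appt": slot, "patient": patients[pid]})


def patients_in_slot(doc, slot_name):
    """Navigate doctor -> schedule records -> appointment and patient links."""
    names = []
    for sched in doc["schedule"]:
        if sched["appt"]["slot"] == slot_name:
            names.append(sched["patient"]["name"])
    return names


print(patients_in_slot(doctor, "Fri AM"))  # ['David', 'Peter']
```

Answering even this small question means writing the traversal by hand: the program, not the database system, decides which links to follow and in what order.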

Standardization of Network Model

• CODASYL DBTG 1971: the first database standard
  • Written in the late 1960s by the Database Task Group (DBTG).
  • Allowed only many:one links (1:1 is represented as many:1).
  • Disallowed many:many links.
    • Many:many relationships are implemented using dummy (link) records.
  • The canonical many:one relationship is called a DBTG set.

[Figure: a DBTG set; owner record A is related to member (child) record B.]

• Each set has exactly one owner and zero or more members.
• A member may occur only once in each set, but a member can belong to multiple different sets.
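The DBTG set rules, and the dummy-record workaround for many:many, can be sketched in Python. The `DBTGSet` class, the patient/doctor pairings, and the link-record names are hypothetical illustrations, not CODASYL syntax. Each set type maps a member to a single owner, which is what makes it many:one; a many:many relationship between patients and doctors is then built from dummy link records that are members of two set types at once.

```python
class DBTGSet:
    """A set type: each occurrence has one owner and zero or more members."""

    def __init__(self, name):
        self.name = name
        self.members_of = {}  # owner -> list of its members
        self.owner_of = {}    # member -> its single owner in this set type

    def connect(self, owner, member):
        # A member may appear only once in this set type, so it has at most one owner.
        if member in self.owner_of:
            raise ValueError(f"{member!r} is already a member of {self.name}")
        self.owner_of[member] = owner
        self.members_of.setdefault(owner, []).append(member)


# Two many:one set types whose members are the dummy link records.
patient_links = DBTGSet("patient_links")  # owner: patient
doctor_links = DBTGSet("doctor_links")    # owner: doctor

# Hypothetical many:many pairs: David sees two doctors, Smith sees two patients.
pairs = [("David", "Smith"), ("David", "Jones"), ("Ryan", "Smith")]
for i, (patient, doctor) in enumerate(pairs):
    link = f"link{i}"  # dummy record: a member of two different set types
    patient_links.connect(patient, link)
    doctor_links.connect(doctor, link)


def doctors_of(patient):
    """Patient -> its link records -> each link record's owner in the doctor set."""
    return [doctor_links.owner_of[rec] for rec in patient_links.members_of.get(patient, [])]


print(doctors_of("David"))  # ['Smith', 'Jones']
```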

More on CODASYL

• As in Berkeley DB, data are manipulated through APIs embedded in a host language.
  • Record selection: DBP->get
  • Iteration over the members of a set: a cursor (DBP->cursor, then DBC->get)
  • Find the owner (given a member): secondary-index lookup
• Implementation
  • Links are pointers (in-memory).
  • What are links in a Berkeley DB-like solution? Primary keys.
• Why only many:one (not many:many)?
  • Implementation artifact: how do you store many pointers in a single record? Create a linked list through the members.
  • This makes many:many difficult: each member can be threaded onto only one chain per set type.
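The linked-list implementation can be sketched as follows, assuming each record has one `first` slot per set type it owns, plus one `next` slot and one owner pointer per set type it belongs to. The `Record` class and the set name `"scheduled"` are hypothetical. The owner pointer makes find-owner a single hop; the Berkeley DB-style alternative above is a secondary-index lookup. Because a member has exactly one `next` slot per set type, it can sit on only one chain of that type, which is the many:one restriction.

```python
class Record:
    def __init__(self, name):
        self.name = name
        self.first = {}  # set type -> first member (used when this record is an owner)
        self.next = {}   # set type -> next member on the chain this record is on
        self.owner = {}  # set type -> owner of that chain


def insert_member(set_type, owner, member):
    """Thread member onto owner's chain for set_type (insertion at the head)."""
    if set_type in member.owner:
        raise ValueError(f"{member.name} is already on a {set_type} chain")
    member.next[set_type] = owner.first.get(set_type)
    member.owner[set_type] = owner
    owner.first[set_type] = member


def members(set_type, owner):
    """Iterate over an owner's members by walking the next pointers."""
    m = owner.first.get(set_type)
    while m is not None:
        yield m
        m = m.next[set_type]


ryan = Record("003 Ryan")
for date in ["11@2/18/10", "12@2/19/10"]:
    insert_member("scheduled", ryan, Record(date))

print([m.name for m in members("scheduled", ryan)])  # ['12@2/19/10', '11@2/18/10']
```

Head insertion keeps each insert constant-time at the cost of listing members newest first.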

Hierarchical Model

• Similar to the network model, except that records are organized as collections of trees rather than as arbitrary graphs.
• Basics
  • Data are represented by collections of records.
  • Relationships are expressed via links between records.
• Analogies
  • Collections of records: entity sets
  • Links: relationships

Hierarchical Model: Details

• Records are composed of attributes.
• All members of a set (e.g., a DBTG set) are thought of as having a common root, and therefore form a tree.
• Consider the data-structure diagram for several records in our patient database (i.e., a diagram in the network model):

Patients            Appointments
001 David 6-1234    9@2/16/10
002 Peter 6-5678    10@2/17/10
003 Ryan  6-9012    11@2/18/10, 12@2/19/10

Rooted Trees

• The database is a collection of trees.
• Each tree is organized hierarchically.
• The hierarchy is: Patients at the root, with Appointments and Insurance as children.
• The data look like:

001 David 6-1234
  Appointments: 9@2/16/10
  Insurance: Harvard Pilgrim
002 Peter 6-5678
  Appointments: 10@2/17/10
  Insurance: HUGHP, Tufts
003 Ryan 6-9012
  Appointments: 11@2/18/10, 12@2/19/10
  Insurance: Tufts
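The patient trees can be sketched with a small node type, using David's and Ryan's records from the example and assuming each node stores its segment (record) type and its data as a string; the `Node` class and the `preorder` helper are hypothetical names. A preorder walk (parent first, then children left to right) visits a tree top to bottom, so collecting every appointment in the database is a walk over each tree in turn.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    segment: str  # record type: "Patient", "Appointment", or "Insurance"
    data: str
    children: list = field(default_factory=list)


def preorder(node):
    """Yield a tree's nodes top to bottom, left to right (parent before children)."""
    yield node
    for child in node.children:
        yield from preorder(child)


david = Node("Patient", "001 David 6-1234", [
    Node("Appointment", "9@2/16/10"),
    Node("Insurance", "Harvard Pilgrim"),
])
ryan = Node("Patient", "003 Ryan 6-9012", [
    Node("Appointment", "11@2/18/10"),
    Node("Appointment", "12@2/19/10"),
    Node("Insurance", "Tufts"),
])
database = [david, ryan]  # the database is a collection of trees

appts = [n.data for root in database for n in preorder(root) if n.segment == "Appointment"]
print(appts)  # ['9@2/16/10', '11@2/18/10', '12@2/19/10']
```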

Yet another tree

• This time the hierarchy is: Appointments at the root, with Patients as children.
• And the data look like:

9@2/16/10   001 David 6-1234
10@2/17/10  002 Peter 6-5678
11@2/18/10  003 Ryan  6-9012
12@2/19/10  003 Ryan  6-9012
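The redundancy in this second tree can be shown directly. In this sketch (an assumption for illustration) each appointment root stores its own physical copy of the patient record, and the new phone number is hypothetical. Updating Ryan through one appointment leaves the other copy stale, which is exactly the inconsistency that redundancy invites.

```python
ryan = {"pid": "003", "name": "Ryan", "phone": "6-9012"}

# Appointments are the roots; each root physically stores its own copy of the patient.
appointments = {
    "11@2/18/10": dict(ryan),
    "12@2/19/10": dict(ryan),
}

# Update Ryan's phone through one appointment only (hypothetical new number).
appointments["11@2/18/10"]["phone"] = "6-0000"

phones = {appt: rec["phone"] for appt, rec in appointments.items()}
print(phones)  # {'11@2/18/10': '6-0000', '12@2/19/10': '6-9012'}
```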

Implementation Issues

• This model is inherently redundant.
  • Records are represented in multiple trees.
  • Redundancy results in:
    • Inconsistency
    • Inefficiency
• The solution is the virtual record:
  • A virtual record is a reference to a record.
  • Rather than storing the entire record multiple times, maintain a pointer to the record.
• Rather than navigating through a network as in the network model, we navigate through trees.
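A virtual record can be sketched by storing Ryan once and letting the appointment tree hold only a reference to him. The sketch assumes the PID serves as the reference (a stand-in for a pointer), and the updated phone number is hypothetical. Because both appointments dereference the same stored record, one update is seen through both.

```python
# The single physical copy of each patient record.
patients = {"003": {"pid": "003", "name": "Ryan", "phone": "6-9012"}}

# Appointment tree: each child is a virtual record, i.e. only a reference (the PID).
appointments = {"11@2/18/10": "003", "12@2/19/10": "003"}


def patient_for(appt):
    """Dereference the virtual record to reach the one stored patient record."""
    return patients[appointments[appt]]


patients["003"]["phone"] = "6-0000"  # hypothetical update, applied once

print(patient_for("11@2/18/10")["phone"], patient_for("12@2/19/10")["phone"])
# 6-0000 6-0000
```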

Real World Systems

• IBM’s IMS (Information Management System) is the oldest, and perhaps one of the most widely used, database systems.
• IMS databases were some of the largest databases.
• IMS was the first system that had to deal with concurrency, recovery, integrity, efficient query processing, etc.
• IMS Fast Path was an optimized version that kept the most active parts of the database in main memory (a forerunner of modern main-memory database systems).
• Its data manipulation language, called DL/I, is similar to that of the network model but has some differences.
