CSC4480: Principles of Systems

Lecture 2: Models/Entity Relationship Model

Slide 1 Steps in Designing a Database

• Requirements and Specification – This involves scoping out the requirements and limitations of the database

• Conceptual Design – This involves using a data model (e.g., the Entity Relationship Model) to represent the requirements captured in the specification

• Logical Design – This involves translating the conceptual design into tables and relationships expressed in a data model and implemented in a DBMS.

Slide 2 Importance of

• A foundation for good applications
• A communication tool that facilitates interaction among the designer, the application programmer, and the end user
• Both an art and a science

Slide 3 Defining Data Models

• Models are simplified abstractions of real world events or conditions

• A data model is a relatively simple representation, usually graphical, of the facts (data) used to describe real-world events, objects, or ideas
  – A data model illustrates what data are being represented (those of interest to the end user or system) and how the data elements relate to each other
  – A data model represents data characteristics, relationships, constraints, and transformations

Slide 4 The importance of Data Models

• Facilitates communication
• Gives various views of the database
• Organizes data for various users
• Provides an abstraction for the creation of a good database

Slide 5 Database Models

• A database model is a collection of logical constructs (based on a data model) used to represent the data structure and the data relationships found within a specific database.

• They include a set of concepts to describe the structure of a database, the operations for manipulating these structures, and certain constraints that the database should obey.

Slide 6 Data Model Structure

• Entity: Represents an object of interest to a user. Each entity describes an object in the real world.
• Attributes: Characteristics that describe an entity
• Relationships: Associations between entities
  – One to Many (1:M)
    • One team consists of many players
  – Many to Many (M:N)
    • An employee might learn many job skills, and each job skill might be learned by many employees
    • EMPLOYEE(M) learns SKILLS(N)
  – One to One (1:1)
    • A president leads one country
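
The three building blocks above can be sketched in a few lines of Python. This is only an illustration: the class names, the sample team, and the employee and skill names are made up for the example and do not come from the slides.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Player:
    # "name" is an attribute: a characteristic that describes the entity.
    name: str

@dataclass
class Team:
    name: str
    # 1:M relationship: one team consists of many players.
    players: List[Player] = field(default_factory=list)

# M:N relationship: EMPLOYEE learns SKILLS, kept as (employee, skill) pairs
# so that either side can appear in many pairs. Sample data is hypothetical.
learns = {("Ana", "welding"), ("Ana", "forklift"), ("Ben", "welding")}

team = Team("Hawks", [Player("Lee"), Player("Kim")])
skills_of_ana = sorted(skill for emp, skill in learns if emp == "Ana")
learners_of_welding = sorted(emp for emp, skill in learns if skill == "welding")

print(len(team.players), skills_of_ana, learners_of_welding)
```

The pair set is the same idea a relational design would express with a separate linking table between EMPLOYEE and SKILL.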

Slide 7 • Identify the Model Characteristics for the database below:

Slide 8 Data Model Structure (cont’d)

• Constraints: Restrictions placed on the data
  – For example, a salary must be greater than 0 and less than 350
  – A GPA must be greater than or equal to 0.00 and less than or equal to 4.00, using 2 decimal places
  – A pilot cannot fly more than 10 hours during any 24-hour period
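
As a rough sketch, the three example constraints can be written as small Python checks. The function names are hypothetical, and the pilot rule is simplified: it assumes the caller passes only the flight durations that fall inside one 24-hour window.

```python
from decimal import Decimal

def valid_salary(salary):
    # Salary must be greater than 0 and less than 350.
    return 0 < salary < 350

def valid_gpa(gpa):
    # GPA must lie in [0.00, 4.00] and use at most 2 decimal places.
    in_range = 0.00 <= gpa <= 4.00
    decimals = -Decimal(str(gpa)).as_tuple().exponent
    return in_range and decimals <= 2

def valid_flight_hours(hours_in_window):
    # Simplification: hours_in_window lists flights within one 24-hour period.
    return sum(hours_in_window) <= 10

print(valid_salary(100), valid_gpa(3.75), valid_flight_hours([4, 6]))
print(valid_salary(350), valid_gpa(3.755), valid_flight_hours([6, 5]))
```

In a DBMS the same rules would normally be declared once in the schema so that every application is held to them.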

Slide 9 Naming Conventions

• Entity name requirements
  – Be descriptive of the objects in the business environment
  – Use terminology that is familiar to the users
• Attribute name
  – Required to be descriptive of the data represented by the attribute
• Proper naming
  – Facilitates communication between parties
  – Promotes self-documentation

Slide 10 In Class Exercise

• What entities does Uber have?
• What attributes do you think Uber would have for the entities you identified?
• Describe relationships that might exist between the entities

Slide 11 Business Rules

• A business rule is a brief, precise, and unambiguous description of a policy, procedure, or principle within an organization
• Business rules are essential for:
  – Setting standards within an organization
  – Serving as a communication tool among users and designers
  – Describing the nature, role, and scope of the data
  – Understanding business processes
  – Developing the data model

Slide 12 Business Rule Example

• A customer may generate many invoices
• An invoice is generated by one customer

Slide 13 Business Rule Example

• What are some business rules for Uber?

Slide 14 Translating Business Rules into Data Model Components

• Business rules set the stage for the proper identification of entities, attributes, relationships, and constraints
  – Nouns translate into entities
  – Verbs translate into relationships among entities

• Relationships are bidirectional
  – Questions to identify the relationship type
    • How many instances of B are related to one instance of A?
    • How many instances of A are related to one instance of B?
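
To make the translation concrete, here is a minimal sketch that turns the two invoice business rules into tables, using Python's built-in sqlite3 module. The table names, column names, and sample rows are assumptions made for the example. The nouns (customer, invoice) become tables and the verb (generates) becomes a foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    -- "An invoice is generated by one customer": exactly one customer_id per row.
    CREATE TABLE invoice (
        invoice_id  INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        total       REAL NOT NULL
    );
""")

# "A customer may generate many invoices": two invoices share customer 1.
conn.execute("INSERT INTO customer VALUES (1, 'Ada')")
conn.executemany("INSERT INTO invoice VALUES (?, ?, ?)",
                 [(10, 1, 25.0), (11, 1, 40.0)])
invoice_count = conn.execute(
    "SELECT COUNT(*) FROM invoice WHERE customer_id = 1").fetchone()[0]

# An invoice that points at a customer who does not exist is rejected.
try:
    conn.execute("INSERT INTO invoice VALUES (12, 99, 5.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(invoice_count, orphan_rejected)
```

Asking the two questions from the slide gives the 1:M type: one customer relates to many invoices, and one invoice relates to one customer.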

Slide 15 Categories of Data Models

• Conceptual (high-level, semantic) data models:
  – Provide concepts that are close to the way many users perceive data.
    • Also called entity-based or object-based data models.

• Physical (low-level, internal) data models:
  – Provide concepts that describe details of how data is stored in the computer. These are usually specified in an ad hoc manner through DBMS design and administration manuals.

• Implementation (representational) data models:
  – Provide concepts that fall between the above two, used by many commercial DBMS implementations (e.g., relational data models used in many commercial systems).

Slide 16

• Database Schema:
  – Represents the logical structure of a database.
    • Consists of tables
  – Defines how the data is organized, the relationships among the data, and any constraints that apply.
  – A good schema can mean the difference between a query that takes a second and one that takes hours.

• Schema Diagram: – An illustrative display of (most aspects of) a database schema.

• Schema Construct: – A component of the schema or an object within the schema, e.g., STUDENT, COURSE.

Slide 17 Database State

• Database State:
  – The actual data stored in a database at a particular moment in time. This includes the collection of all the data in the database.
  – Also called a database instance (or occurrence or snapshot).
    • The term instance is also applied to individual database components, e.g., record instance, table instance, entity instance

Slide 18 Database Schema vs Database State

• Distinction – The database schema changes very infrequently. – The database state changes every time the database is updated.

• Schema is also called intension.
  – The intension is independent of time and defines all permissible extensions.

• State is also called extension. – Extension is the state of all records appearing in a database at a given time
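
The distinction can be observed directly with Python's built-in sqlite3 module. In this small sketch the student table and its rows are invented for the example. The column list (schema) stays the same while the stored rows (state) change with every update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT)")

def schema():
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    return [row[1] for row in conn.execute("PRAGMA table_info(student)")]

def state():
    # The state (extension) is whatever rows exist right now.
    return conn.execute(
        "SELECT student_id, name FROM student ORDER BY student_id").fetchall()

schema_before, state_before = schema(), state()

conn.execute("INSERT INTO student VALUES (1, 'Ada')")
conn.execute("INSERT INTO student VALUES (2, 'Lin')")

schema_after, state_after = schema(), state()

print(schema_before == schema_after)  # schema unchanged by the inserts
print(state_before, state_after)      # state differs before and after
```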

Slide 19 Example of a Database State

Slide 20 Example of a Database Schema

Slide 21 Three-Schema Architecture

• Proposed to support DBMS characteristics of: – Program-data independence. – Support of multiple views of the data.

• Not explicitly used in commercial DBMS products, but has been useful in explaining database system organization

Slide 22 Three-Schema Architecture

• Defines DBMS schemas at three levels:
  – Internal schema at the internal level to describe physical storage structures and access paths (e.g., indexes).
    • Typically uses a physical data model.
  – Conceptual schema at the conceptual level to describe the structure and constraints for the whole database for a community of users.
    • Uses a conceptual or an implementation data model.
  – External schemas at the external level to describe the various user views.
    • Usually uses the same data model as the conceptual schema.

Slide 23 Three-Schema Architecture

Slide 24 Three-Schema Architecture

• Mappings among schema levels are needed to transform requests and data. – Programs refer to an external schema, and are mapped by the DBMS to the internal schema for execution. – Data extracted from the internal DBMS level is reformatted to match the user’s external view (e.g. formatting the results of an SQL query for display in a Web page)

Slide 25 Data Independence

• Logical Data Independence: – The capacity to change the conceptual schema without having to change the external schemas and their associated application programs.

• Physical Data Independence: – The capacity to change the internal schema without having to change the conceptual schema. – For example, the internal schema may be changed when certain file structures are reorganized or new indexes are created to improve database performance

Slide 26 Data Independence (continued)

• When a schema at a lower level is changed, only the mappings between this schema and higher-level schemas need to be changed in a DBMS that fully supports data independence.

• The higher-level schemas themselves are unchanged. – Hence, the application programs need not be changed since they refer to the external schemas.
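
A small sqlite3 sketch of both kinds of independence follows. It assumes a view can stand in for an external schema and an index for an internal-level structure. The table, view, and index names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, salary REAL);
    INSERT INTO employee VALUES (1, 'Ada', 120.0), (2, 'Lin', 90.0);
    -- External schema: one user view over the conceptual schema.
    CREATE VIEW employee_directory AS SELECT emp_id, name FROM employee;
""")

# The "application program" only ever refers to the external schema.
query = "SELECT emp_id, name FROM employee_directory ORDER BY emp_id"
before = conn.execute(query).fetchall()

# Physical data independence: a new index changes the internal level only.
conn.execute("CREATE INDEX idx_employee_name ON employee(name)")
after_index = conn.execute(query).fetchall()

# Logical data independence: the conceptual schema gains a column,
# and the view (and the program using it) is untouched.
conn.execute("ALTER TABLE employee ADD COLUMN department TEXT")
after_column = conn.execute(query).fetchall()

print(before == after_index == after_column)
```

The query text never changes. Only the DBMS's internal mapping from the view to the stored table has to account for the new index and column.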

Slide 27 DBMS Languages

• Data Definition Language (DDL)
• Data Manipulation Language (DML)
  – High-Level or Non-procedural Languages: These include the relational language SQL
    • May be used in a standalone way or may be embedded in a programming language
  – Low-Level or Procedural Languages:
    • These must be embedded in a programming language

Slide 28 DBMS Languages

• Data Definition Language (DDL):
  – Used by the DBA and database designers to specify the conceptual schema of a database.
  – In many DBMSs, the DDL is also used to define internal and external schemas (views).
  – In some DBMSs, a separate storage definition language (SDL) and view definition language (VDL) are used to define internal and external schemas.
    • SDL is typically realized via DBMS commands provided to the DBA and database designers

Slide 29 DBMS Languages

• Data Manipulation Language (DML):
  – Used to specify database retrievals and updates
  – DML commands (data sublanguage) can be embedded in a general-purpose programming language (host language), such as COBOL, C, C++, or Java.
    • A library of functions can also be provided to access the DBMS from a programming language
  – Alternatively, stand-alone DML commands can be applied directly (called a query language).
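
As an illustration of the function-library approach, the sketch below uses Python as the host language and its built-in sqlite3 module as the library. One DDL statement defines the table, and the DML statements then retrieve and update its data. The course table and the code DB101 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: defines structure, not data.
conn.execute(
    "CREATE TABLE course (code TEXT PRIMARY KEY, title TEXT, credits INTEGER)")

# DML updates: insert a row, then modify it. The ? placeholders let the
# host language pass values in without building SQL strings by hand.
conn.execute("INSERT INTO course VALUES (?, ?, ?)", ("DB101", "Databases", 3))
conn.execute("UPDATE course SET credits = ? WHERE code = ?", (4, "DB101"))

# DML retrieval.
row = conn.execute(
    "SELECT title, credits FROM course WHERE code = ?", ("DB101",)).fetchone()

# DML delete.
conn.execute("DELETE FROM course WHERE code = ?", ("DB101",))
remaining = conn.execute("SELECT COUNT(*) FROM course").fetchone()[0]

print(row, remaining)
```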

Slide 30 Types of DML

• High-Level or Non-procedural Language:
  – For example, the SQL relational language
  – Are “set”-oriented and specify what data to retrieve rather than how to retrieve it.
  – Also called declarative languages.
• Low-Level or Procedural Language:
  – Retrieve data one record at a time
  – Constructs such as looping are needed to retrieve multiple records, along with positioning pointers.
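
The contrast can be sketched with Python's sqlite3 module. The employee table and its rows are invented for the example. Note that the second half only imitates the record-at-a-time style on top of an SQL cursor. A true procedural DML would navigate the stored records itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, salary REAL)")
conn.executemany("INSERT INTO employee VALUES (?, ?)",
                 [("Ada", 120.0), ("Lin", 90.0), ("Sam", 150.0)])

# Declarative, set-oriented: state WHAT is wanted; the DBMS decides how.
declarative = [row[0] for row in conn.execute(
    "SELECT name FROM employee WHERE salary > 100 ORDER BY name")]

# Procedural style: fetch one record at a time, loop, and test each record.
procedural = []
cursor = conn.execute("SELECT name, salary FROM employee")
while True:
    record = cursor.fetchone()   # advance the position by one record
    if record is None:           # no more records
        break
    if record[1] > 100:
        procedural.append(record[0])
procedural.sort()

print(declarative, procedural)
```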

Slide 31 DBMS Interfaces

• Stand-alone query language interfaces
  – Example: Entering SQL queries at the DBMS interactive SQL interface (e.g., SQL*Plus in ORACLE)
• Programmer interfaces for embedding DML in programming languages
• User-friendly interfaces
  – Menu-based, forms-based, graphics-based, etc.
• Mobile interfaces: interfaces allowing users to perform transactions using mobile apps

Slide 32 Types of Data Models

• Hierarchical Model
• Network Model
• Relational Model
• Object-oriented Data Models
• Object-Relational Models

Slide 33 Network Model

• The first network DBMS was implemented by Honeywell in 1964-65 (IDS System).

• Adopted heavily due to the support by CODASYL (Conference on Data Systems Languages) (CODASYL - DBTG report of 1971).

• Later implemented in a large variety of systems: IDMS (Cullinet, now Computer Associates), DMS 1100 (Unisys), IMAGE (Hewlett-Packard), VAX-DBMS (Digital Equipment Corp., then COMPAQ, now H.P.).

• The schema is represented as a graph in which object types are nodes and relationships are arcs.

Slide 34 Network Model

• Advantages:
  – The network model is able to model complex relationships and represents the semantics of add/delete on the relationships.
  – Can handle most modeling situations using record types and relationship types.
  – Language is navigational; uses constructs like FIND, FIND member, FIND owner, FIND NEXT within set, GET, etc.
    • Programmers can do optimal navigation through the database.

• Disadvantages:
  – Navigational and procedural nature of processing
  – Database contains a complex array of pointers that thread through a set of records.
    • Little scope for automated “query optimization”
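
To give a feel for navigational processing, here is a toy Python sketch. It is not a real CODASYL DML. The two set types, the record names, and the helper functions find_owner and find_next are all invented to mimic the FIND owner and FIND NEXT within set idea.

```python
# Each set type maps an owner record to its ordered member records.
# A member (an employee) can take part in more than one set type,
# which is what lets the schema form a graph rather than a tree.
works_in = {"Sales": ["Ana", "Ben"], "Engineering": ["Kim"]}
assigned_to = {"Apollo": ["Ana", "Kim"], "Zephyr": ["Ben"]}

def find_owner(set_type, member):
    # Walk the set occurrences until the one containing the member is found.
    for owner, members in set_type.items():
        if member in members:
            return owner
    return None

def find_next(set_type, owner, current=None):
    # Return the first member, or the member after `current`, within one set.
    members = set_type[owner]
    if current is None:
        return members[0] if members else None
    position = members.index(current) + 1
    return members[position] if position < len(members) else None

# The programmer chooses the path: Kim -> owning project -> all its members.
kim_department = find_owner(works_in, "Kim")
kim_project = find_owner(assigned_to, "Kim")
teammates = []
member = find_next(assigned_to, kim_project)
while member is not None:
    teammates.append(member)
    member = find_next(assigned_to, kim_project, member)

print(kim_department, kim_project, teammates)
```

Because the program spells out every step of the path, the system has little room to pick a better access strategy on its own.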

Slide 35 Hierarchical Model

• Initially implemented in a joint effort by IBM and North American Rockwell around 1965. Resulted in the IMS family of systems.

• IBM’s IMS product had (and still has) a very large customer base worldwide

• Hierarchical model was formalized based on the IMS system.

• Other systems based on this model: System 2k (SAS Inc.)

Slide 36 Hierarchical Model

• Data is stored in a tree-like structure
  – Data is stored as records and connected by links
  – Each child record has one parent, while each parent record has one or more child records.
  – Record retrieval is achieved by traversing the tree from the root node
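
Retrieval by traversal from the root can be sketched as follows. The Record class, the find function, and the sample organization chart are assumptions made for illustration, not part of any hierarchical DBMS.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Record:
    name: str
    # Links to child records; each child appears under exactly one parent.
    children: List["Record"] = field(default_factory=list)

def find(root, name):
    """Depth-first search from the root; returns the path of names, or None."""
    if root.name == name:
        return [root.name]
    for child in root.children:
        path = find(child, name)
        if path is not None:
            return [root.name] + path
    return None

org = Record("Company", [
    Record("Sales", [Record("Ana")]),
    Record("Engineering", [Record("Ben"), Record("Kim")]),
])

path_to_kim = find(org, "Kim")
print(path_to_kim)
```

Every lookup starts at the root, so reaching a record deep in the tree means passing through all of its ancestors first.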

Slide 37 Hierarchical Model

• Advantages:
  – Simple to construct and operate
  – Corresponds to a number of natural hierarchically organized domains, e.g., organization (“org”) chart
  – Language is simple:
    • Uses constructs like GET, GET UNIQUE, GET NEXT, GET NEXT WITHIN PARENT, etc.
• Disadvantages:
  – Navigational and procedural nature of processing
  – Database is visualized as a linear arrangement of records
  – Little scope for "query optimization"

Slide 38 Relational Model

• Proposed in 1970 by E.F. Codd (IBM); first commercial system in 1981-82.
• Now in several commercial products (e.g., DB2, ORACLE, MS SQL Server, SYBASE, INFORMIX).
• Several free open source implementations, e.g., MySQL, PostgreSQL
• Currently the most dominant model for developing database applications.
• SQL relational standards: SQL-89 (SQL1), SQL-92 (SQL2), SQL-99 (SQL3), …
• Chapters 5 through 11 describe this model in detail

Slide 39 Relational Model

• Advantages
  – Structural independence is promoted using independent tables
  – Tabular view improves conceptual simplicity
  – Ad hoc query capability is based on SQL
  – Isolates the end user from physical-level details
  – Improves implementation and management simplicity
• Disadvantages
  – Requires substantial hardware and system overhead
  – Conceptual simplicity gives untrained people the tools to use a good system poorly
  – May promote islands of information problems

Slide 40 Object-oriented Data Model

• Several object-oriented models have been proposed for implementation in database systems.

• One set comprises models of persistent O-O Programming Languages such as C++ (e.g., in OBJECTSTORE or VERSANT), and Smalltalk (e.g., in GEMSTONE).

• Additionally, there are systems such as O2, ORION (at MCC, then ITASCA), and IRIS (at H.P., used in Open OODB).

• Standard: ODMG-93, ODMG version 2.0, ODMG version 3.0.

Slide 41 Object-oriented Data Model

• The trend to mix object models with relational models was started with Informix Universal Server.
• Relational systems incorporated concepts from object databases, leading to object-relational systems.
• Exemplified in versions of Oracle, DB2, SQL Server, and other DBMSs.
• The current trend among relational DBMS vendors is to extend relational DBMSs with the capability to process XML, text, and other data types.
• The term “Object-relational” is receding in the marketplace.

Slide 42 Object-Oriented Model

• Advantages
  – Semantic content is added
  – Visual representation includes semantic content
  – Inheritance promotes data integrity
• Disadvantages
  – Slow development of standards caused vendors to supply their own enhancements
  – Complex navigational system
  – Learning curve is steep
  – High system overhead slows transactions

Slide 43 The Entity Relationship Model (ERM)

• Graphical representation of entities and their relationships in a database structure
  – Entity relationship diagram (ERD): uses graphic representations to model database components
  – Entity instance or entity occurrence: rows in the relational table
  – Attributes: describe particular characteristics
  – Connectivity: term used to label the relationship types

Slide 45 ERD Representation

Slide 46 ERD Representation – Crow’s Foot Notation

What are the business rules? Draw it using Google Drawings

Slide 47 ERD Translation

Slide 48 In Class Exercise

What are the business rules?

Slide 49 In class Exercise

• In groups, design an ERD for the UNIVERSITY database described in the previous class.

Slide 50