Chapter 2

Objectives: to understand
• Data modeling and why data models are important
• The basic data-modeling building blocks
• What business rules are and how they influence database design
• How the major data models evolved historically
• How data models can be classified by level of abstraction

Introduction to Modeling
• Data modeling reduces complexities of database design
• Designers, programmers, and end users see data in different ways
• Different views of same data lead to designs that do not reflect organization’s operation
• Various degrees of data abstraction help reconcile varying views of same data

CS275 Fall 2010 1 CS275 Fall 2010 2

Data Modeling and Data Models
• Model: an abstraction of a real-world object or event
  – Useful in understanding complexities of the real-world environment
• Data models
  – Relatively simple representations of complex real-world data structures
    • Often graphical
• Creating a data model is iterative and progressive

The Importance of Data Models
• Facilitate interaction among the designer, the applications programmer, and the end user
• End users have different views and needs for data
• Data model organizes data for various users
• Data model is an abstraction
  – It’s a graphical collection of logical constructs representing the data structures and relationships within the database.
  – Cannot draw required data out of the data model
  – An implementation model would represent how the data are represented in the database.

Data Model Basic Building Blocks
• Entity: anything about which data are to be collected and stored
• Attribute: a characteristic of an entity
• Relationship: describes an association among entities
  – One-to-many (1:M) relationship
  – Many-to-many (M:N or M:M) relationship
  – One-to-one (1:1) relationship
• Constraint: a restriction placed on the data

Business Rules Terminology
• Descriptions of policies or principles within an organization
• Description of operations or procedures, to create/enforce actions within an organization’s environment
  – Must be in writing and kept up to date
  – Must be easy to understand and widely disseminated
  – Sometimes externally defined, e.g. government regulations
• These describe characteristics of data as viewed by the company
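
The four building blocks (entity, attribute, relationship, constraint) can be sketched in code. This is a minimal illustration only: the `Customer` and `Invoice` entities, their attributes, and the "amount cannot be negative" rule are hypothetical examples, not taken from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Invoice:
    """Entity: something about which data are collected and stored."""
    number: int      # attribute
    amount: float    # attribute

    def __post_init__(self):
        # Constraint (hypothetical rule): an invoice amount cannot be negative.
        if self.amount < 0:
            raise ValueError("amount must not be negative")


@dataclass
class Customer:
    """Entity on the 'one' side of a 1:M relationship with Invoice."""
    name: str
    invoices: list = field(default_factory=list)

    def add_invoice(self, invoice: Invoice) -> None:
        # Relationship: one customer generates many invoices (1:M).
        self.invoices.append(invoice)


customer = Customer("Ada")
customer.add_invoice(Invoice(1001, 25.0))
customer.add_invoice(Invoice(1002, 40.0))
print(len(customer.invoices))  # 2
```

A rule such as "a customer may generate many invoices" is the kind of written business rule that would justify the 1:M relationship here.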


Discovering Business Rules
• Sources of business rules:
  – Company managers
  – Policy makers
  – Department managers
  – Written documentation
    • Procedures
    • Standards
    • Operations manuals
  – Direct interviews with end users
• Always verify sources

Importance of Business Rules
• Standardize company’s view of data
• Useful as a communications tool between users and designers
• Allows the designer to
  – understand the nature, role, and scope of data
  – understand business processes
  – develop appropriate relationship participation rules and constraints
• Promotes the creation of an accurate data model

Translating Business Rules into Data Model Components
• Generally, nouns translate into entities
• Verbs translate into relationships among entities
• Relationships are bidirectional
• Two questions to identify the relationship type:
  – How many instances of B are related to one instance of A?
  – How many instances of A are related to one instance of B?

Naming Conventions
• Naming occurs during translation of business rules to data model components
• Names should make the object unique and distinguishable from other objects
• Names should also be descriptive of objects in the environment and be familiar to users
• Proper naming:
  – Facilitates communication between parties
  – Promotes self-documentation
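
The two questions for identifying a relationship type can be asked mechanically of sample data. The sketch below assumes a hypothetical helper, `relationship_type`, that receives observed (A, B) pairs. Sample data can only suggest a type; the business rules decide what is actually allowed.

```python
from collections import defaultdict


def relationship_type(pairs):
    """Classify the A-to-B relationship from observed (a, b) pairs."""
    b_per_a = defaultdict(set)  # how many B instances relate to one A?
    a_per_b = defaultdict(set)  # how many A instances relate to one B?
    for a, b in pairs:
        b_per_a[a].add(b)
        a_per_b[b].add(a)

    many_b = any(len(bs) > 1 for bs in b_per_a.values())
    many_a = any(len(as_) > 1 for as_ in a_per_b.values())

    if many_a and many_b:
        return "M:N"
    if many_b:
        return "1:M"
    if many_a:
        return "M:1"
    return "1:1"


# One customer with two invoices: one A relates to many B.
print(relationship_type([("cust1", "inv1"), ("cust1", "inv2")]))  # 1:M
```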


Evolution of Data Implementation Models
• Hierarchical
  – Logically represented by an upside down tree
    • Each parent can have many children
    • Each child has only one parent
• Network
• Relational
• Object oriented
• Hybrid, XML

The Hierarchical Model
• The hierarchical model was developed in the 1960s to manage large amounts of data for manufacturing projects
• Basic logical structure is represented by an upside-down “tree”
• Hierarchical structure contains levels or segments
  – Segment analogous to a record type
  – Set of one-to-many relationships between segments
• Example – manufacturing a car from components (a, b, or c), each made of subassemblies (1, 2, or 3), each having parts (x, y, and z) (tree structure)


Hierarchical Structure
• Each parent can have many children
• Each child has only one parent
• Tree is defined by path that traces parent segments to child segments, beginning from the left
• Hierarchical path
  – Ordered sequencing of segments tracing hierarchical structure
• Preorder traversal or hierarchic sequence
  – “Left-list” path
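
The preorder (hierarchic sequence) traversal described above visits a parent segment first and then each child subtree from left to right. A minimal sketch, assuming a hypothetical tree stored as a dictionary that maps each parent to its ordered children:

```python
def preorder(segment, children):
    """Return segments in hierarchic sequence: the parent first,
    then each child subtree from left to right."""
    path = [segment]
    for child in children.get(segment, []):
        path.extend(preorder(child, children))
    return path


# Hypothetical tree: a car, two components, and their subassemblies.
children = {
    "car": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1"],
}

print(preorder("car", children))
# ['car', 'a', 'a1', 'a2', 'b', 'b1']
```

Because each child has only one parent, every segment appears exactly once in the resulting path.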


The Hierarchical Model
• GUAM (Generalized Update Access Method)
  – Based on the recognition that the many smaller parts would come together as components of still larger components
• Information Management System (IMS)
  – World’s leading mainframe hierarchical database system in the 1970s and early 1980s
• TCDMS/ADABAS – jointly developed by IBM and Lane County

The Hierarchical Model
• Advantages
  – Conceptual simplicity
  – Database security
  – Data independence
  – Database integrity
  – Efficiency
• Disadvantages
  – Complex implementation
  – Difficult to manage
  – Lacks structural independence
  – Complex applications programming and use
  – Implementation limitations
  – Lack of standards


The Network Model
• The network model was created to represent complex data relationships more effectively than the hierarchical model
  – Improves database performance
  – Imposes a database standard
  – Represents complex data relationships more effectively, such as a child with multiple parents
• Conference on Data Systems Languages (CODASYL)
• American National Standards Institute (ANSI)
• Database Task Group (DBTG)

The Network Model
• Collection of records in 1:M relationships
• A set is a relationship composed of two record types:
  – Owner: equivalent to the hierarchical model’s parent
  – Member: equivalent to the hierarchical model’s child
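
The owner/member idea can be sketched as follows. Each set is a 1:M relationship, but a member record may take part in several sets and so have more than one owner, which the hierarchical model does not allow. The `NetworkSet` class and the invoice, product, and line records are hypothetical names chosen for illustration.

```python
from collections import defaultdict


class NetworkSet:
    """One named set: each owner record has many member records (1:M)."""

    def __init__(self, name):
        self.name = name
        self.members = defaultdict(list)  # owner -> list of members
        self.owner_of = {}                # member -> its single owner in this set

    def connect(self, owner, member):
        # Within one set a member has exactly one owner.
        if member in self.owner_of:
            raise ValueError(f"{member} already has an owner in {self.name}")
        self.owner_of[member] = owner
        self.members[owner].append(member)


# The same member record ("line-1") belongs to two different sets,
# so it has two owners: an invoice and a product.
invoice_lines = NetworkSet("INVOICE-LINE")
product_lines = NetworkSet("PRODUCT-LINE")
invoice_lines.connect("invoice-1", "line-1")
product_lines.connect("product-9", "line-1")

owners = [s.owner_of["line-1"] for s in (invoice_lines, product_lines)]
print(owners)  # ['invoice-1', 'product-9']
```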


The Network Model Components
• Components still used today:
  – Schema: Conceptual organization of entire database as viewed by the database administrator
  – Subschema: Database portion “seen” by the application programs
  – Data manipulation language (DML): Defines the environment in which data can be managed
  – Data definition language (DDL): Enables the administrator to define the schema components

The Network Model
• Advantages:
  – Conformance to standards
  – Handled more relationship types
  – Data access flexibility
• Disadvantages of the network model:
  – System complexity
  – Lack of ad hoc query capability placed burden on programmers to generate code for reports
  – Structural change in the database could produce havoc in all application programs

The Relational Model
• Developed by E.F. Codd (IBM) in 1970
• Relational models were considered impractical in the 1970s
• Model was conceptually simple at expense of computer overhead
• A relational table is a purely logical structure
  – How data are physically stored in the database is of no concern to the user or the designer
  – This is the source of a real database revolution
• A relational table stores a collection of related entities
  – Resembles a file
• Table (relation)
  – Matrix consisting of a series of row/column intersections
  – Each row in a relation is called a tuple
  – Tables are related to each other by sharing a common entity characteristic

The Relational Model Components
• Relational database management system (RDBMS)
  – Performs same functions provided by hierarchical model, but hides complexity from the user
• Relational schema
  – Visual representation of the relational database’s entities, attributes within those entities, and relationships between those entities
• Relational diagram
  – Representation of entities, attributes, and relationships
• Relational table stores collection of related entities

The Relational DBMS Application
• SQL-based relational database application involves three parts:
  – User interface
    • Allows end user to interact with the data
  – Set of tables stored in the database
    • Each table is independent from another
    • Rows in different tables are related based on common values in common attributes
  – SQL “engine”
    • Executes all queries

The Relational Implementation Model
• Advantages
  – Structural independence
  – Improved conceptual simplicity
  – Easier database design, implementation, management, and use
  – Ad hoc query capability (SQL)
  – Powerful database management system
• Disadvantages
  – Substantial hardware and system overhead
  – Can facilitate poor design and implementation
  – May promote “islands of information” problems
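
The three parts of a SQL-based application can be seen in a small sketch that uses Python's built-in `sqlite3` module as the SQL engine. The `agent` and `customer` tables and their rows are hypothetical. The point is that rows in independent tables are related only through common values in a common attribute (`agent_code`), and that an ad hoc query needs no navigation code.

```python
import sqlite3

# SQL "engine": an in-memory SQLite database.
conn = sqlite3.connect(":memory:")

# Set of tables stored in the database (hypothetical sample data).
conn.executescript("""
    CREATE TABLE agent (
        agent_code INTEGER PRIMARY KEY,
        agent_name TEXT
    );
    CREATE TABLE customer (
        cus_code   INTEGER PRIMARY KEY,
        cus_name   TEXT,
        agent_code INTEGER REFERENCES agent(agent_code)
    );
    INSERT INTO agent VALUES (501, 'Alby'), (502, 'Hahn');
    INSERT INTO customer VALUES
        (10010, 'Ramas', 502),
        (10011, 'Dunne', 501),
        (10012, 'Smith', 502);
""")

# Ad hoc query: which customers does agent Hahn serve?
# The tables are joined on the common attribute agent_code.
rows = conn.execute("""
    SELECT c.cus_name, a.agent_name
    FROM customer AS c
    JOIN agent AS a ON a.agent_code = c.agent_code
    WHERE a.agent_name = 'Hahn'
    ORDER BY c.cus_name
""").fetchall()

print(rows)  # [('Ramas', 'Hahn'), ('Smith', 'Hahn')]
conn.close()
```

Here the Python script plays the role of the user interface; in a real application that part would be a form or report tool.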


Logical/Conceptual Model: The Entity Relationship Model
• Widely accepted standard for data modeling
• Introduced by Chen in 1976
• Graphical representation of entities and their relationships in a database structure
• Entity relationship diagram (ERD)
  – Uses graphic representations to model database components
  – Entity is mapped to a relational table

The Entity Relationship Model
• Entity instance (or occurrence) is row in table
• Entity set is collection of like entities
• Connectivity labels types of relationships
• Relationships are expressed using Chen notation
  – Relationships are represented by a diamond
  – Relationship name is written inside the diamond
• Crow’s Foot notation used as design standard in this book


Logical/Conceptual Model: The Object-Oriented (OO) Model
• Models both data and relationships contained in a single structure known as an object
• OODM (object-oriented data model) is the basis for OODBMS (object-oriented database management system)
• An object is described by its factual content:
  – Is self-contained: a basic building block for autonomous structures
  – Is an abstraction of a real-world entity
  – Contains information about relationships between facts within the object and with other objects


The Object-Oriented (OO) Model
• An object is the logical abstraction or basic building block for autonomous structures
  – Attributes describe the properties of an object
  – Objects that share similar characteristics are grouped in classes
  – Classes are organized in a class hierarchy
  – Inheritance: an object inherits methods and attributes of parent class
  – UML (Unified Modeling Language) is used to graphically model a system
    • Based on OO concepts that describe diagrams and symbols
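
Classes, a class hierarchy, and inheritance can be sketched in a few lines. The `Person`, `Employee`, and `Student` classes are hypothetical examples, not taken from the slides.

```python
class Person:
    """Parent class: holds attributes and methods shared by subclasses."""

    def __init__(self, name):
        self.name = name  # attribute

    def describe(self):
        # Method defined once here and inherited by every subclass.
        return f"{type(self).__name__}: {self.name}"


class Employee(Person):
    """Child class: inherits name and describe(), adds salary."""

    def __init__(self, name, salary):
        super().__init__(name)
        self.salary = salary


class Student(Person):
    """Child class: inherits everything from Person unchanged."""


print(Employee("Ada", 5000).describe())  # Employee: Ada
print(Student("Lin").describe())         # Student: Lin
```

Both child objects answer `describe()` although neither class defines it, which is inheritance from the parent class.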

Newer Data Models: Object/Relational
• Extended relational data model (ERDM)
  – Semantic data model developed in response to increasing complexity of applications
  – Includes many of OO model’s best features
  – Often described as an object/relational database management system (O/RDBMS)
  – Primarily geared to business applications

Logical Models: Object Oriented Model
• Advantages
  – Adds semantic content
  – Visual presentation includes semantic content
  – Database integrity
  – Both structural and data independence
• Disadvantages
  – Slow pace of OODM standards development
  – Complex navigational data access
  – Steep learning curve
  – High system overhead slows transactions
  – Lack of market penetration


Newer Data Models: XML
• The Internet revolution created the potential to exchange critical business information
• Dominance of Web has resulted in growing need to manage unstructured information
• In this environment, Extensible Markup Language (XML) emerged as the de facto standard
• Current databases support XML
  – XML: the standard protocol for data exchange among systems and Internet services

The Future of Data Models
• Hybrid DBMSs
  – Retain advantages of relational model
  – Provide object-oriented view of the underlying data
• SQL data services – ‘Cloud’
  – Store data remotely without incurring expensive hardware, software, and personnel costs
  – Companies operate on a “pay-as-you-go” system
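
The role of XML as a data exchange format can be sketched with Python's standard `xml.etree.ElementTree` module: one system serializes a record to XML text and another parses it back. The `customer` element name, the field names, and the helper functions `to_xml` and `from_xml` are hypothetical.

```python
import xml.etree.ElementTree as ET


def to_xml(record):
    """Serialize a flat dict into a <customer> XML element (as text)."""
    root = ET.Element("customer")
    for key, value in record.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Parse the XML text back into a dict of tag -> text."""
    root = ET.fromstring(text)
    return {child.tag: child.text for child in root}


record = {"code": "10010", "name": "Ramas"}
message = to_xml(record)     # what would be sent to another system
received = from_xml(message)

print(message)
print(received == record)
```

The receiving side needs only the XML text, not the sender's internal storage format, which is what makes XML useful for exchange among systems.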


The Development of Data Models

Data Models: A Summary
• Each new data model addressed the shortcomings of previous models
• Common characteristics:
  – Conceptual simplicity with semantic completeness
  – Represent the real world as closely as possible
  – Real-world transformations (behavior) must comply with consistency and integrity characteristics
• Some models better suited for some tasks


SPARC Framework: Degrees of Data Abstraction
• Database designer starts with abstracted view, then adds details
• ANSI Standards Planning and Requirements Committee (SPARC)
  – Defined a framework for data modeling based on degrees of data abstraction (1970s):
    1. External
    2. Conceptual
    3. Internal

The SPARC External Model
• Represents the end users’ view of the data environment
• ER diagrams represent external views
• External schema: specific representation of an external view
  – Entities
  – Relationships
  – Processes
  – Constraints


The External Model
• End users’ view of the data environment
• Requires that the modeler subdivide set of requirements and constraints into functional modules that can be examined within the framework of their external models
• Advantages:
  – Easy to identify specific requirements to support each business unit’s operations
  – Facilitates designer’s job by providing feedback about the model’s adequacy
  – Ensures security constraints in database design
  – Simplifies application program development

External Models showing two different Users
[Figure: external models for two different users]
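
One common way to give each group of end users its own external view is a database view over shared tables. The sketch below uses Python's built-in `sqlite3` module; the `student` table and the registrar and bursar views are hypothetical, and SQL views are only one possible realization of an external model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Shared data, as one global table (hypothetical).
    CREATE TABLE student (
        stu_id   INTEGER PRIMARY KEY,
        stu_name TEXT,
        gpa      REAL,
        balance  REAL
    );
    INSERT INTO student VALUES (1, 'Lee', 3.4, 120.0), (2, 'Okoro', 3.9, 0.0);

    -- External view for the registrar: academic data only.
    CREATE VIEW registrar_view AS
        SELECT stu_id, stu_name, gpa FROM student;

    -- External view for the bursar: financial data only.
    CREATE VIEW bursar_view AS
        SELECT stu_id, stu_name, balance FROM student;
""")


def columns(view):
    """Return the column names a given view exposes."""
    cursor = conn.execute(f"SELECT * FROM {view}")
    return [d[0] for d in cursor.description]


registrar_cols = columns("registrar_view")
bursar_cols = columns("bursar_view")
print(registrar_cols)  # ['stu_id', 'stu_name', 'gpa']
print(bursar_cols)     # ['stu_id', 'stu_name', 'balance']
```

Each user sees only the attributes relevant to that business unit, which also illustrates how external views can support security constraints.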

The SPARC Conceptual Model
• Represents global view of the entire database
  – All external views integrated into single global view
• Representation of data as viewed by high-level managers
• ER diagram graphically represents the conceptual schema
  – ER model most widely used conceptual model
• Basis for identification and description of main data objects, avoiding details

The Conceptual Model
• Advantages
  – Provides a relatively easily understood macro-level view of data environment
  – Independent of both software and hardware
    • Does not depend on the DBMS software used to implement the model
    • Does not depend on the hardware used in the implementation of the model
    • Changes in hardware or software do not affect database design at the conceptual level


The SPARC Internal Model
• Representation of the database as “seen” by the DBMS
  – Maps the conceptual model to the DBMS
• Internal schema depicts a specific representation of an internal model
• Depends on specific database software
  – Change in DBMS software requires internal model be changed
• Logical independence: change internal model without affecting conceptual model


The Physical Model
• Operates at lowest level of abstraction
  – Describes the way data are saved on storage media such as disks or tapes
  – Software and hardware dependent
• Requires the definition of physical storage and data access methods
• Relational model aimed at logical level
  – Does not require physical-level details
• Physical independence: changes in physical model do not affect internal model

Summary
• A data model is an abstraction of a complex real-world data environment
• Basic data modeling components:
  – Entities
  – Attributes
  – Relationships
  – Constraints
• Business rules identify and define basic modeling components


Summary
• Hierarchical model
  – Set of one-to-many (1:M) relationships between a parent and its children segments
• Network data model
  – Uses sets to represent 1:M relationships between record types
• Relational model
  – Current database implementation standard
  – ER model is a tool for data modeling
    • Complements relational model
• Object-oriented data model: object is basic modeling structure
• Relational model adopted object-oriented extensions: extended relational data model (ERDM)
• OO data models depicted using UML
• Data-modeling requirements are a function of different data views and abstraction levels
  – Three SPARC abstraction levels: external, conceptual, internal

