FactMod - September 26, 2014 Fact Modeling Fundamentals Data Modeling Zone, Portland, 2014 October Page 1

Slide 1: Fact Modeling Fundamentals
Data Modeling Zone, Portland, 2014 October
©Gordon C. Everest, Professor Emeritus of MIS and Database
Carlson School of Management, University of Minnesota
[email protected] www.tc.umn.edu/~geverest

Slide 2: The Advanced Database Design Course
DESCRIPTION:
• Teaches a different way of thinking – Fact Modeling (ORM)
  – Avoids “TABLE THINK” and all of its consequent problems: the need for normalization; diagrams difficult to understand
  – Learning ORM is the treatment for “Tableitis”
• Mix of students & working professionals
• Professionals often have more trouble unlearning what they know, have learned, and have practiced
• Prior knowledge and experience – used a DBMS to set up and query/manipulate tables
• Now offered online, piggybacking on a face-to-face class
• This workshop arises out of my observations of problems stemming from faulty thinking – “TABLE THINK”

Slide 3: Logical Database Design Outline
• Data Modeling – What, Why, How, the process, constructs (slide 4)
  The WHAT – traditional ERel data modeling
• “TABLE THINK” – and consequent problems (slide 11)
  Fact Modeling (e.g., ORM) – interplay among entities/objects, relationships, attributes, identifiers
• Transitioning from ERel to Fact Modeling (ORM) (slide 29)
• ORM in greater detail – verbalization in fact sentences; symbolization (diagramming) (slide 38)
• ORM Constraints – uniqueness (multiplicity/exclusivity); mandatory role; handling ternary++ relationships; other (slide 47)
• ORM Modeling Tool – architecture; NORMA demo; generating Relational tables; abstraction in model presentation (slide 67)
• Data Modeling Schemes – where ORM fits (slide 74)
• APPENDIX: Some ER/Relational design exercises (slide 77)

Slide 4: Objective, Principles, Benefits
OBJECTIVE of LOGICAL DATABASE DESIGN:
TO ACCURATELY AND COMPLETELY MODEL SELECTED PORTIONS OF THE REAL WORLD OF INTEREST TO A COMMUNITY OF USERS.
• USERS (COLLECTIVELY) WILL ALWAYS KNOW MORE ABOUT A DATA STRUCTURE THAN THE SYSTEM KNOWS, OR THAN COULD BE DEFINED TO THE SYSTEM.
• WHAT IS NOT FORMALLY DEFINED TO THE SYSTEM, THE SYSTEM CANNOT MANAGE . . . THE USERS MUST!
• THEREFORE, NEED TO CAPTURE RICH SEMANTICS WITH COMPREHENSIVE DATA MODELING and DEFINITION, INCLUDING INTEGRITY CONSTRAINTS AND OPERATIONS.
Let the ‘system’ do it! Implications for a Tool!

Slide 5: Purpose of Modeling – the WHY
To Facilitate Human Communication, Understanding, Validation
• Capture semantics – all relevant, important details
• Document – record and remember
• Understand – learn, raise questions, record answers, refine
• Communicate – shared with all interested parties – Users, stakeholders, management, developers
• Validate – a complete and accurate representation
  – Internal validation – consistent with the modeling rules
  – External validation – Who can do this?
SECONDARY:
• Blueprint to Build (a Database)

Slide 6: Modeling Process – the HOW
MODEL = Abstract (Re)present(ation)
[Diagram: Reality, MODELING PROCESS, MODEL. Three stages of knowledge: Knowledge in the world (infinitely complex); Knowledge in the head (mental models); Knowledge externalized, formalized, shared.]
What drives or guides the process?

© Gordon C. Everest, All rights reserved.

Slide 7: The Modeling Process
Context:
  MODELING SCHEME: Constructs + Composition + Constraints
  METHODOLOGY: Steps/Tasks + Milestones + Deliverables
[Diagram: Real World / Universe of Discourse, perception and selection/filtering, MODELING PROCESS, MODEL]
REPRESENTATIONAL FORMS (the Syntax): Narrative, Graphical Diagram, Formal Language Statements
The Semantics are most important.
A Day of the Week: Tuesday, Tues, Tu, Mardi, Martes... What’s the difference?
The SEMANTICS of a data model can only be seen through the presentation, the SYNTAX.

Slide 8: Data Modeling Constructs
What to look for:
  ENTITY (Object)
  RELATIONSHIP
  IDENTIFIER
  DOMAIN [ ]
  ATTRIBUTE (Data Item)
  characteristics of each

Slide 9: Data Modeling
• STARTS from some expression of the users’ world to be modeled… in data
  – Applications depend on a well-designed database
• TRADITIONAL APPROACH – think ER/Relational tables
• PROBLEM: some data items not in the right place
• SOLUTION: to find errors, apply the rules of normalization
• NORMALIZATION - the Achilles heel of data modeling
  – Even professional data modelers get it wrong
• REMEDY for violations => record decomposition
So wouldn’t it be nice to have:
• Modeling scheme which avoids the need for normalization
  => ORM (Object Role Modeling, or Fact Oriented Modeling)
• Modeling tool for ORM => NORMA (also Visio in VS)
  – To automatically generate tables… in fifth normal form!

Slide 10: Fact Modeling (ORM) - Preview
• CONSTRUCTS
  ERel uses three: Entity, Attribute, Relationship (E-A-R)
  Fact Modeling uses two: Object, Relationship/Role
• Elementary FACT SENTENCE is the basic construct
  OBJECT domain = Subject or Object (noun)
  RELATIONSHIP = Predicate (verb phrase)
  hence can directly verbalize a fact data model, including all the constraints in the model diagram
• Represents all DOMAINS directly & only once
  - both entity populations and attribute value sets
• Explicitly represents all Relationships and in the same way
  - including all Functional Dependencies.
• Add Constraints from a rich set
=> Defers clustering attributes to form entity tables.

Slide 11: Record-based Design
WHAT SEMANTICS ARE PRESUMED BY THE FOLLOWING RECORD STRUCTURE?
  [ X | A | B ]
• Do we know if it is Normalized (to 3NF)? How?
• Is X an attribute? Of what?
• What does it say about A alone?
  – Attribute? Of what ______? Presumes ______?
• What does it say about X-A?
• What does it say about A-B?
Design is always done at the schema (type) level.

Slide 12: ER / Record-based Modeling
[Diagram: a Real World POPULATION of similar entities maps to an ENTITY TABLE [ X | A | B ] with an ID column and ATTRIBUTES columns. Each column draws on a VALUE DOMAIN: Surrogate values for the ID, Lexical Values for the attributes.]
Anything different for attributes? Consider the population of Birthdate or Date Hired.

Slide 13: What is an Attribute?
An ATTRIBUTE is an OBJECT... playing a ROLE in a RELATIONSHIP with some (other) OBJECT.
What comes first?
[Diagram: X related to A and to B; labels "many" and "dependent"]

Slide 14: H1 - Typical Average Results
40% miss: There exists a population of things called A.
90% miss: There exists a relationship between X and A.
100% miss: X is a descriptor of A (even when asked!)
35% miss/wrong: dependency/optionality characteristic.
60% miss/wrong: multiplicity/exclusivity characteristic.
50% say or imply there is a relationship between A and B.
What might the results look like if presented with:
1. Object Domains
2. Relationships
3. Constraints
NO Tables, Identifiers or Foreign Keys

Slide 15: Extending an ER Diagram
Now that you know what this means: [ X | A | B ]
Revise the diagram to handle some changed semantics.
• There exist some orphan A’s which have no X.
  - Is A still dependent on X?
• A has additional attributes that are of interest.
• A can be multivalued for X.
  - so now what is the nature of the relationship X-A?
  - Does A remain in the table of X?
• There exists a 1:Many relationship between A and B.
• There exists a M:N relationship between A and B.
  - Does the original diagram change?

Slide 16: H2 - Problems for the Students
• Much confusion with the Foreign Key
  – Inconsistent with the relationship arc
  – Must have an ID to point to
  – Can only represent at most a 1:Many relationship
• Putting an Attribute or Foreign Key in a table means the entity can have at most one of them.
• Every Relationship must be represented somewhere in the model with a pair of values (for a binary relationship)
• M:N Relationship means there must be a composite key somewhere in the model, and vice versa.
• Just because two attributes are together in a table does not mean there is a relationship.

Slide 17: Representing Relationships
Multiple different ways:
(1) Intra-record => Entity with Attributes [ X | A | B | C ]
(2) Inter-record => between Entities, Entity with another Entity [ A | E | F ], [ Y | P | Q | R | X ]
(3) Between/Among Attributes? Spurious Associations?
Where are the Foreign Keys? How are they different?

Slide 18: Representing a Relationship
• The schema design level: [ X | A ]
• The instance (data) level: all valid X-A pairs
• What is the Identifier? Don’t know until…?
What if:
• some A's can be orphans?
• Is A still dependent on X? (in the R/W)
• A has some other attributes?
Must know the multiplicity/exclusivity characteristics of X-A before you can put X and A in a Table Diagram.
That requires the designation of an identifier; a foreign key requires the prior designation of an identifier.

Slide 19: Representing a M:N Relationship (WATSON5, Ch.5, p.115)
Another Pattern: EMPLOYEE, PROJECT
• If you cannot store multiple Projects (or Project IDs) in an Employee record, or multiple Employees (or Employee IDs) in a Project record (as is the case in 1NF), then … you must introduce an “Intersection Entity” between them to represent the Many-to-Many Relationship.
  [Diagram: EMPLOYEE, intersection entity (EMPL-ID, PROJ-ID), PROJECT]
• In the initial stages of modeling, must we always resolve M:N relationships into two 1:M relationships (i.e., enforce 1NF)?
• What about modeling ternary++ relationships?

Slide 20: The problem
“TABLE THINK” … is the problem
Is the Relational (“Logical”) Data Model for people or for the “machine/system”?
Why do we draw Data Model Diagrams?
See description of “Tableitis” – a serious malady in our discipline.
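As a small illustration of the many-to-many point on slide 19, the sketch below stores the EMPLOYEE-PROJECT relationship as nothing more than a set of pairs, which is what the "intersection entity" amounts to. It is written in Python purely for illustration; the identifiers `E1`, `P1`, `works_on`, and the two helper functions are hypothetical and do not come from the slides.

```python
# Hypothetical sample population: each pair is one "EMPLOYEE works on PROJECT" fact.
works_on = {("E1", "P1"), ("E1", "P2"), ("E2", "P1")}

def projects_of(employee):
    """All projects the given employee works on (may be many)."""
    return {proj for emp, proj in works_on if emp == employee}

def employees_on(project):
    """All employees working on the given project (may be many)."""
    return {emp for emp, proj in works_on if proj == project}

print(projects_of("E1"))   # both P1 and P2
print(employees_on("P1"))  # both E1 and E2
```

Neither side is forced to be single-valued, so no decision about tables or keys is needed to record the facts themselves.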


Slide 21: “TABLE THINK”
The process: focus on entity tables
• Find Entities
  – Top down from a macro view to find major entities, or
  – Bottom up from a list of data items (say from a DFD)
• Design a Table for each entity type, clustering or adding attributes with each entity
• Columns for attributes of the entity – Name, Type
• Designate/define the Identifier
• Represent Entity Relationships with Foreign Keys
• [Normalize – not enforced or required by the system.]
To jump in and think of putting data into tables means we must have already determined:
• Entities, Attributes, Entity Identifiers, Foreign Keys (relationships), and relationship characteristics

Slide 22: Attribute Migration
Attribute... or Entity... or what?
Given:
  EMPLOYEE [ Emp# | Name | ... | SkillCode ]
  What is SkillCode?
Now add a Description, etc. for SkillCode:
  SKILL [ SkillCode | Desc | Class | ... ]
  Now what is SkillCode? What is the Relationship? What is SkillCode in the Employee table?
Suppose an Employee has multiple Skills:
  Table (Entity) Name ? [ Emp# | SkillCode ]   (In SQL, every table must have a name)
  Now what is the Entity? What are the Relationships? Is SkillCode still an attribute of Employee?

Slide 23: Functional Dependency in Relationships
Basis for Database/Table Normalization.
  A ← f(X)   A is functionally dependent on X
  or
  X → A      X determines A
GIVEN: X and A related (IE notation). Fact Modeling stops here.
Clustered into a record/table for the entity of X: [ X | A | … ]
RULE: Store A with its Determinant(s) as the Key. Encompasses all rules of Normalization!
A is dependent on X.
At most one A for each X (exclusive on A).
There can be multiple Xs for a given A.

Slide 24: Normalization – Testing your Understanding
Assuming that A is single valued with respect to X (i.e. 1NF) (enforced by construction).
Could you have a violation of: (if not, why not?)
  Record      2NF?   3NF?   4NF or 5NF?
  X A B       no     YES    no
  X A B       YES    no     no
  [The two records differ in which columns are marked as the key; the marking did not survive extraction.]
Clustering attributes into tables, the modeler may make a mistake.
Normalization is the test; record decomposition is the remedy.
If you don’t cluster attributes into tables, you cannot violate the rules of normalization.
Normalization becomes unnecessary and irrelevant.
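To make "at most one A for each X" concrete, here is a minimal Python sketch that tests whether a sample population of X-A pairs is consistent with the functional dependency X → A. The function name `holds_fd` and the sample values are hypothetical. Note the usual caveat: sample data can refute a dependency, but it cannot prove that the dependency holds in the users' world.

```python
from collections import defaultdict

def holds_fd(pairs):
    """True if every X in the (x, a) pairs is associated with exactly one A."""
    seen = defaultdict(set)
    for x, a in pairs:
        seen[x].add(a)
    return all(len(values) == 1 for values in seen.values())

# Several Xs may share the same A; that does not violate X -> A.
ok_pairs = [("x1", "a1"), ("x2", "a1"), ("x3", "a2")]
# x1 appears with two different As, so A is not single-valued for X.
bad_pairs = [("x1", "a1"), ("x1", "a2")]

print(holds_fd(ok_pairs))   # True
print(holds_fd(bad_pairs))  # False
```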

Slide 25: “TABLE THINK” – Problems-1
• Anchors you on the entity of the identifier, which frames your thinking in one direction.
• Inter-entity relationships represented redundantly with a foreign key and an arc; users see it twice in the model.
• Difficult to represent the entire population of an object, whether entity or attribute domain.
• Enforcing single-valued (atomic) attributes (i.e., 1NF) makes it impossible to directly represent M:N relationships. It requires an intersection “entity.”
• When speaking of attribute characteristics, most of the time it is characteristics of the relationship with the entity (required or optional, single- or multi-valued, unique).
• Putting an attribute in an entity table presumes a particular relationship with the entity without being explicitly defined; consequently we can do it wrong, requiring normalization.

Slide 26: At the Root, What’s wrong with ERel Modeling?
CLUSTERING, i.e., “TABLE THINK”

Slide 27: Problems with ERel Modeling - Summary
• Cannot capture the "conceptual" view directly. Must mentally map to the "logical" (record-based) view by clustering Attributes into Entity records/tables.
  – Modeler must a priori choose whether Entity or Attribute
  – Too much clustering; attributes in the wrong place
  – Ignores (or presumes normalized) intra-record structure (that is, relationships between/among Attributes)
    - creates (implies) spurious inter-attribute relationships
• Human modeler is responsible for normalization; remedy is always record decomposition
• Must choose unique names – for attributes in a record; for spurious new "entities"
  – column names = domain + role; lose object domains
• Modeling / Processing dilemma:
  – Complete representation of an entity object - more clustering
  – Full normalization (1NF) – decomposition, more fragmentation
• Indirect representation of M:N relationships with intersection “entity”
• Difficulty representing Ternary relationships
• Stability of the query language (SQL)
All a consequence of clustering! All are solved in ORM!

Slide 28: “TABLE THINK” – My Epiphany
• When we design a relational database we are actually modeling all and only the relationships among things in the users’ world
  – Relationships both within and between entity tables
  – Clustering (“throwing in”) attributes into tables without explicitly defining the intra-entity relationships.
  – Designers make mistakes – thinking of tables first
• We can only speak of attributes in the context of a relationship: _ _ _ _ is an attribute of _ _ _ _?
• We are not modeling all object domains first.

Slide 29: Record-based Design
WHAT DOES THIS “RECORD” REPRESENT?
  [ X | A | B | C ]   What do you assume?
Design minimal "records" with at most one non-key domain.
Remedy for Normal form violations is Decomposition. This is the ultimate end of Record Decomposition.
  [ X | A ]
  [ X | B ]
  [ X | C ]
Now what do these “records” represent? Perhaps Codd was right in naming it a ______!
Avoids spurious associations, e.g., A-B …
Could there be any violations of normal forms?
What about representing the entity X? or any domain?
What if A is related to (or attribute of) other “entities”?

Slide 30: Transform Record-based (ER) Design
TO REALLY REPRESENT THE ENTITY DOMAINS
[Diagram: the ENTITY record [ X | A | B | C ] becomes an Object Role Model in which object X is linked to each of the objects A, B, and C through its own predicate, e.g. X - R(X) | R(B) - B.]
OBJECTS (ENTITIES) have "ATTRIBUTES" (DESCRIPTORS) by playing ROLES in RELATIONSHIPS with other OBJECTS.
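The decomposition into minimal "records" can be sketched in a few lines of Python. This is only an illustration under assumed names: `decompose`, the column names, and the sample rows are hypothetical. Each wide record is split into one (key, value) pair per non-key column, so every resulting fact table has at most one non-key domain.

```python
def decompose(rows, key, non_keys):
    """Split each wide record into one (key, value) pair per non-key column."""
    facts = {col: set() for col in non_keys}
    for row in rows:
        for col in non_keys:
            value = row.get(col)
            if value is not None:  # a missing value simply records no fact
                facts[col].add((row[key], value))
    return facts

# Hypothetical records of the form [ X | A | B | C ].
rows = [
    {"X": "x1", "A": "a1", "B": "b1", "C": "c1"},
    {"X": "x2", "A": "a1", "B": None, "C": "c2"},
]
facts = decompose(rows, key="X", non_keys=["A", "B", "C"])
for column, pairs in facts.items():
    print("X -", column, sorted(pairs))
```

A side effect worth noticing: the null in column B disappears, because an absent fact is simply not recorded.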

Slide 31: Defining an Object Population
• What is the population of A?
• How can you get the population of A?
1. [ X | A ]   Any orphan A’s?
2. [ X | A ] [ Y | A ]   Is A still dependent on X?
3. [ X | A ] [ Y | A ] [ A | P | Q ]   Any FKeys?
4. [ X | A | B | A2 ] [ Y | A ] [ A | P | Q ]
How many times do you define a population of A’s? Redundant?

Slide 32: A Better Data Model Diagram
[Diagram: objects X, A, Y, P, Q, B connected by relationship arcs]
• How many different objects here?
• How many relationships?
1. Objects (domains) shown only once each. Does not distinguish Entities and Attributes.
2. All Relationships shown, and the same way (an arc).
3. Add multiplicity constraints.
4. Where are the tables? Do we care? Who does care?
Which is easier to design? To understand?

Slide 33: Fact (ORM) Data Modeling
THE ESSENTIAL DIFFERENCE:
  Record-based modeling: ENTITY, ATTRIBUTE, RELATIONSHIP, Identifier
  FACT (ORM) modeling: OBJECT, Role in RELATIONSHIP
  ENTITY + ATTRIBUTE = ? ? ? ?  What to call it? ENTRIBUTE!
  An OBJECT becomes an Entity (table) or an Attribute.

Slide 34: Fact Modeling (ORM) Constructs
• Three main constructs ..rolled into.. Two main constructs
  ENTITY, ATTRIBUTE => OBJECT DOMAIN (noun), with REFERENCE MODE
  RELATIONSHIP => PREDICATE (verb)
then rolled into a single construct – the Elementary Fact Sentence.
Modifiers and qualifiers become the CONSTRAINTS

Slide 35: Record-Based Modeling Ex.1
GIVEN TWO FACTS (conceptually):
• one about the CITY a PERSON lives in
• another about the CITY a PERSON works in
ASSUME:
• every person has to live and work in a city
• each person can live and work in only one city (at a time)
• not interested in anything more about persons or cities
EXAMPLE (two elementary fact instances):
  Gordon Everest lives in Roseville and works in Minneapolis
DIAGRAM A CONCEPTUAL DATA MODEL
– to represent this information (a database to contain these facts)

Slide 36: Record-Based Data Model Ex.2
For: PERSON lives in / works in CITY
  PERSON [ PersonID [key] | LiveCity | WorkCity ]
• What is the entity and what is the attribute?
• Would it make any sense to say (to a novice layperson - a user):
  – CITY was an "attribute" of PERSON, not an entity (no table)?
• Doing more than is necessary at the conceptual level
• cannot have CITY and CITY as attributes of PERSON
• column/attribute name reflects "entity + role"
• CITY as an entity/object is lost (not its own table)
* what if there is a CITY where no one lives or works?
• some add the concept of a DOMAIN in SQL (but in a Relational database it is not in our table diagram!)

Slide 37: Object-Role Model Ex.3
for: PERSON lives in / works in CITY
[Diagram: PERSON (id) connected to CITY (name) by two predicates, "lives in" and "works in"]
FORML language statements (verbalization):
• PERSON lives in CITY   (FACT)
• Every PERSON lives in some CITY   (Required / Mandatory)
• Each PERSON lives in at most one CITY   (Unique; Identifier)
... Similarly for works in
Now add: PERSON makes sales calls in multiple CITIES

Slide 38: FACT (ORM) Modeling Process - detail
1. Start with user descriptions of the modeling domain & processes
   - Look at what they say and do, at forms, reports, files, etc.
2. Break the narrative down into elementary fact sentences
   - identify the nouns (objects), verbs (relationships), and constraints
3. Define object populations (types) for entities and “attributes”
   - give each a name, description, population of __, criteria for inclusion
4. Define relationships among members of those populations
5. Define constraints, business rules.. on objects and relationships
   - words like: mandatory/optional, multiple/exclusive, every, must, may, many, only, if ...
6. Draw a diagram (build iteratively) according to a modeling scheme
7. Present model diagram, narrative, & verbalization for user validation
• Derive the tables (automatically; normalized) for implementation
NOTE: No tables, no identifiers, no data item types, no foreign keys, no distinguishing attributes vs. entities… yet! - that’s implementation
TABLE THINK puts the cart before the horse and frames our view of the world.
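The verbalization pattern on slide 37 is regular enough to generate mechanically. The Python sketch below is only an approximation of that idea, not NORMA's or FORML's actual wording rules; the function `verbalize` and its parameters are hypothetical names.

```python
def verbalize(subject, predicate, obj, mandatory=False, unique=False):
    """Return the fact sentence plus one sentence per declared constraint."""
    sentences = [f"{subject} {predicate} {obj}"]
    if mandatory:
        sentences.append(f"Every {subject} {predicate} some {obj}")
    if unique:
        sentences.append(f"Each {subject} {predicate} at most one {obj}")
    return sentences

# "lives in" is both mandatory and unique on the PERSON role.
for line in verbalize("PERSON", "lives in", "CITY", mandatory=True, unique=True):
    print(line)

# "makes sales calls in" carries neither constraint, so only the fact is stated.
print(verbalize("PERSON", "makes sales calls in", "CITY"))
```

Reading the generated sentences back to a domain expert is the validation step: each constraint is either confirmed or rejected in plain language.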

Slide 39: User Descriptions
[Legend for the highlighting used on the slide: noun, verb, constraint]
GIVEN A DESCRIPTION FROM THE USER(S):
Famous Foods, a small, specialty food wholesaler, fills orders for restaurants. Customers have names, addresses, etc. An order can include several products. Products have unique SKU numbers, descriptions, manufacturer, etc. The company has one big warehouse with many rooms on several floors. Each product is stored in only one bin location in the warehouse, but it can change frequently. Multiple products may be stored in the same bin. Bin numbers are only unique within a room, hence the same number can be used in different rooms. Since the bin locations can be hard to find in a room (could be on a shelf, on the floor, in a cabinet or cooler, hanging from the ceiling, etc.), and the rooms can be hard to find in the warehouse (with many hallways, doors, tunnels, split levels, mezzanines, etc.), explicit location directions must be recorded for each room and for each bin in the room. Location information is a textual narrative and is used by the pickers who run around gathering the items to fill an order. Each product has its own standard price but it may be modified by applying a discount (a fraction) on any individual order. The discount can be different for each of the products on an order, and for the same product on different orders. The quantity of each product on an order is recorded (it is not the quantity on hand or in inventory). Terms indicates the number of days during which a standard discount can be taken on the payment. The terms can vary from one customer to the next, and from one order to the next for the same customer. ..

Slide 40: Verbalize Elementary Fact Sentences (HALPIN 08, §3.3)
ENGLISH GRAMMAR - Structure of a Sentence:
  SUBJECT + PREDICATE [ + OBJECT ]
  NOUN      VERB (phrase)   NOUN
Elementary Fact -- cannot be decomposed into pieces which collectively provide the same information as the original fact.
  George runs. => UNARY
  George runs to the store. => BINARY
  George likes to run.
NOT:
  George likes to run and jump.
  George does not like to run. … CLOSED WORLD ASSUMPTION
  George and Mary like to run. … together! => TERNARY
  If George runs, then Mary runs.
  All people who run are happy! … HANDLED WITH ROLE SET CONSTRAINTS

Slide 41: Fact Sentence - Verbalize (HALPIN 08, §3.3)
• Naming Object Type Populations
  - A surrogate to represent and reference those things
  - Singular noun
  – INSTANCE (VALUE) e.g., “Gordon Everest”
  – ROLE e.g., “Employee”
  – GENERIC e.g., “Person”
  – REFERENCE MODE e.g., “FirstName” + “LastName”
• Arity - the number of object “holes” in the Predicate
  – UNARY: “Ann smiles”
    - only 2 states: true/false, present/absent, yes/no
    - making the closed world assumption
  – BINARY: “Ann likes to run”
    - most common
    - has an inverse: “Running is liked by Ann” (passive voice)
    - Inverse name is different (else symmetric, handled differently)
  – TERNARY: “Ann married Bob in 1967”
    - with types: “PERSON married PERSON in YEAR”
    - verbalizing can be difficult with more than 2 (sequence problem)

Slide 42: Symbolize: ORM Basic Diagram (binary)
An elementary fact sentence – the building block
  Object       Predicate                 Object
  X            R(X) | R(B)               B
               role of X | role of B
  (Employee)   (works in | employs)      (Department)
Describe each of the Object Populations.
Describe the Relationships.
Describe the Constraints.
  PERSON [EMPLOYEE] -- works in | employs -- DEPARTMENT
Verbalization:
  “PERSON works in DEPARTMENT”
  “DEPARTMENT employs PERSON”

Slide 43: Exercise – ORM Diagrams (HALPIN 08, Ex.3.4 #8, p.94)
Find the illegal ORM Diagrams; explain why:
[Diagrams over objects A through L; not recoverable from the extraction]
RULE: Each role box connects to exactly one object population.

Slide 44: Nested ‘Objectified’ Predicate/Fact (HALPIN 08, §3.4, p.88)
Adding ‘attributes’ to a relationship:
[Diagram: Student takes Course, objectified as “Enrollment”; Enrollment earns Grade]
What kind of relationship?
What must exist to represent the relationship?
Where do we put the Grade?
What is the determinant of Grade?
Give the objectified predicate a name.

Slide 45: Object Reference Mode (HALPIN 08, §3.4, p.84)
Non-Lexical Object Types (NOLOTS): need a surrogate for each entity
  PERSON -- works in | employs -- DEPARTMENT
[Diagram: a "bridge" predicate connects each Non-Lexical Object Type to a Lexical Object Type (name, code)]
Shorthand (simplifies the diagram):
• add a REFERENCE MODE to each
  PERSON (name) -- works in | employs -- DEPARTMENT (code)
Now, every Object has a lexical handle.

Slide 46: Rules of Logical Database Design (Everest-DM: §6.5.3, p.247; WATSON5-7, p.184, 189f; SIMSION-Essentials)
IDENTIFIERS:
Designate the attribute(s) and/or relationship(s) which uniquely identify entity instances in each entity type.
CRITERIA: (conflicts necessitate compromise)
• Unique - Guaranteed!
• Ubiquitous - have a value for every entity instance
• Unlimited - won't ever run out of values; reuse?
• Unchanging, never changes - "immutable"
• Under your own control - manage the values/codes
• Used by people in the user environment
• 'Mnemonic' (easy for people to recognize/remember/generate)
• "Dataless" - sole purpose is to identify
  - should carry no other information; don't overwork an identifier
• Compact, easy to process and store – often means 'numeric' and 'fixed length' (secondary)
> else, invent one (“Autonumber”) - only if none exists naturally

Slide 47: Adding ORM Constraints
  PERSON -- works in | employs -- DEPARTMENT
Verbalization:
  “PERSON works in DEPARTMENT”
  “DEPARTMENT employs PERSON”
DEPENDENCY (REQUIRED or MANDATORY):
  “PERSON must work[s] in some DEPARTMENT” (at least one)
EXCLUSIVITY (UNIQUENESS):
  “PERSON works in at most one DEPARTMENT”
Prefer the "fork" for Multiplicity.

Slide 48: Uniqueness Constraint
• to represent the exclusivity / multiplicity characteristic of a relationship (ER focuses on the Entity tables)
• ORM focuses on the Elementary Fact Sentence, so... construct a ‘fact table’ of representative instances:
    EMPLOYEE    works in    DEPARTMENT
    Peterson                2000
    Lynn                    2000
    Carr                    2100
    Callagan                2100
    Guttman                 2110
    ::
• Put a line across the role(s) that make the predicate unique:
    EMPLOYEE -- works in | employs -- DEPARTMENT
  An EMPLOYEE can participate in the relationship with DEPARTMENT at most once.

Slide 49: Graphical Notation - Uniqueness/Multiplicity
in an ER Diagram:
  [EMPLOYEE and DEPARTMENT joined by an arc with a fork]
Thinking about diagramming this:
  [EMPLOYEE, relationship box, DEPT] Relationship is like another entity; a predicate!
Alternatively:
  [EMPLOYEE, predicate box, DEPT] Must think of the arc as “going through” the predicate box. The ORM Uniqueness line would be redundant.
in ORM:
  [EMPLOYEE, predicate box with uniqueness bar, DEPT] Intuitive visualization of manyness is lost. So NORMA has an option to add the fork.

Slide 50: Uniqueness Constraint (UC) - Rules
• Most difficult to grasp; poor intuitive visualization
• Must be one on every predicate (the only required constraint)
• Four possibilities for a binary predicate (multiplicity). Legal possibilities for a ternary predicate? Draw them.
• UC across all roles in a predicate is always implied, but not shown if a “shorter” one is also true.
• There may be multiple UCs on a predicate, but never one wholly within the other. Why?
• Arity Check: Never more than one non-key role. Why?
• A unary predicate always has a UC, so it is never shown.
• Can enter sample data for the predicate and have NORMA infer the Uniqueness Constraint (click Analyze). Try several different cases; verify with user domain experts.
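The idea of inferring a uniqueness constraint from sample data can be sketched as follows. This is a rough Python illustration, not NORMA's actual algorithm; `infer_uniqueness` and its return strings are hypothetical. The sample rows are the EMPLOYEE-DEPARTMENT instances from the fact table on slide 48. As the slide warns, the result is only a suggestion to be verified with domain experts.

```python
from collections import Counter

def infer_uniqueness(pairs):
    """Classify a binary fact population into one of the four UC patterns."""
    pairs = set(pairs)
    first = Counter(a for a, _ in pairs)
    second = Counter(b for _, b in pairs)
    first_unique = all(n == 1 for n in first.values())
    second_unique = all(n == 1 for n in second.values())
    if first_unique and second_unique:
        return "UC on each role (1:1)"
    if first_unique:
        return "UC on first role (many:1)"
    if second_unique:
        return "UC on second role (1:many)"
    return "UC spanning both roles (many:many)"

works_in = [
    ("Peterson", 2000), ("Lynn", 2000),
    ("Carr", 2100), ("Callagan", 2100), ("Guttman", 2110),
]
print(infer_uniqueness(works_in))  # UC on first role (many:1)
```

Each employee appears once while departments repeat, so the sample supports "EMPLOYEE works in at most one DEPARTMENT".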

Slide 51: Uniqueness Constraints - Exercises
• The keys for certain fact types are as shown. On this basis, which of these fact types are definitely splittable? (Halpin 08, Ex. §4.5, #1)
  [Diagrams (a) through (h); not recoverable from the extraction]
RULE: At most one non-key domain.

Slide 52: Primary Reference Identifier (Halpin 08, §5.3)
• an External Uniqueness Constraint (U) is like a Composite Key. If the object reference scheme requires a composite key, replace (U) with (P).
  E.g., Room numbers are only unique within a Building, so a composite of Building Name and Room Number is required to uniquely identify a physical room.
[Diagram 1: BUILDING (name), ROOM, ROOM#. Okay?]
[Diagram 2: ROOM identified by BUILDING (name) and ROOM# with (P). Why not?]
In ORM these are equivalent. ... tables?
NOTICE the placement of the forks.
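The Building/Room example can be checked on sample data with a short Python sketch. The helper `duplicate_keys`, the field names, and the building names are all hypothetical. It shows that the room number alone repeats, while the combination of building name and room number does not, which is what the external uniqueness constraint asserts.

```python
def duplicate_keys(records, key_fields):
    """Return the key values that occur in more than one record."""
    seen, duplicates = set(), set()
    for record in records:
        key = tuple(record[field] for field in key_fields)
        if key in seen:
            duplicates.add(key)
        seen.add(key)
    return duplicates

# Hypothetical rooms: number 101 exists in two different buildings.
rooms = [
    {"building": "North Hall", "room_no": "101"},
    {"building": "South Hall", "room_no": "101"},
    {"building": "South Hall", "room_no": "102"},
]
print(duplicate_keys(rooms, ["room_no"]))              # {('101',)}
print(duplicate_keys(rooms, ["building", "room_no"]))  # set()
```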

Slide 53: Mandatory Role Constraint (Halpin 08, §5.2)
• the Optional / Dependent characteristic of a relationship
• also called:
  - total role constraint (other notations: ∀, T)
  - exhaustibility
  EMPLOYEE -- works in | employs -- DEPARTMENT
  – every EMPLOYEE must work in some DEPARTMENT.
• Disjunctive mandatory: every EMPLOYEE must receive a SALARY or earn a COMMISSION (or both = inclusive OR)
  [Diagram: EMPLOYEE receives SALARY; EMPLOYEE earns COMMISSION; EMPLOYEE draws PENSION]
• External Disjunctive Mandatory: Verbalize it:

Slide 54: Mandatory Role Constraint - Rules
• Every Object (type) has a population.
• Every Role has a population… of what?
• The Pop(Object) = UNION of Pop of all its roles, except…
• World recorded in the database cannot be stronger than what can be tolerated in the real world
• All roles in a predicate are mandatory (all or nothing)
• Mandatory constraint is implied if an object plays only one role in the database (not shown unless…)
• Default Disjunctive Mandatory is always implied across all roles played by an object in the database, i.e., each object instance (value) must play a role with at least one other object in the database …
• If not, i.e., there may be orphans of that object type, then must declare the object to be INDEPENDENT (shown in NORMA with a ! after the name)
• NEVER show the implied disjunctive mandatory constraint - why?
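Both the simple and the disjunctive mandatory constraint reduce to the same set comparison: every member of the object population must appear in the union of the constrained role populations. The Python sketch below illustrates this; `missing_mandatory` and the sample names are hypothetical.

```python
def missing_mandatory(population, *role_populations):
    """Objects in the population that play none of the given roles."""
    playing = set().union(*role_populations)
    return set(population) - playing

# Hypothetical sample populations.
employees = {"Ann", "Bob", "Cy"}
works_in_department = {"Ann", "Bob", "Cy"}
receives_salary = {"Ann"}
earns_commission = {"Bob"}

# Simple mandatory: every EMPLOYEE works in some DEPARTMENT.
print(missing_mandatory(employees, works_in_department))

# Disjunctive mandatory: every EMPLOYEE receives a SALARY or earns a COMMISSION.
print(missing_mandatory(employees, receives_salary, earns_commission))
```

Passing every role an object plays gives the implied default disjunctive mandatory check; any object it reports is an orphan, which is only legal if the object type is declared independent.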

Slide 55: Ternary++ Relationship: Ternary Facts… are suspect (4NF)
• Must examine carefully (if you really think you have one)
[Diagram 1: Employee, Skill, Proficiency. What is wrong with this?]
[Diagram 2: What is wrong with this?]
[Diagram 3: INDIVIDUAL, HOUSEHOLD, ROLE. What is wrong with this?]
For any two:
(1) Can the third have multiple values?
(2) Can the third be null?
If (1) YES and (2) NO for all 3 pairs, then this constraint is correct.

Slide 56: Nested ‘Objectified’ Predicate/Fact (HALPIN 08, §4.3)
[Ternary form: Student, Course, Grade]
• Grade is single-valued.
• Must have a Grade.
[Nested form: (Student takes Course) earns Grade]
• Equivalent, … IF grade is mandatory. How?
• Ternary is best for clarity, … if only one “attribute” on the relationship.
• Nested is best if… several roles on the nested predicate (avoids redundancy) and/or more might be added later.
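Question (1) above, "for any two, can the third have multiple values?", can be tried against sample rows. The Python sketch below is a hedged illustration using the Student/Course/Grade ternary from slide 56; `pair_determines_third` and the sample rows are hypothetical. For each pair of roles it reports whether the sample shows at most one value in the remaining role.

```python
from collections import defaultdict
from itertools import combinations

def pair_determines_third(rows):
    """Map each pair of role positions to True if the third role is single-valued in the sample."""
    result = {}
    for pair in combinations(range(3), 2):
        third = ({0, 1, 2} - set(pair)).pop()
        seen = defaultdict(set)
        for row in rows:
            seen[(row[pair[0]], row[pair[1]])].add(row[third])
        result[pair] = all(len(values) == 1 for values in seen.values())
    return result

# Hypothetical (student, course, grade) facts.
rows = [
    ("Ann", "DB", "A"),
    ("Ann", "OS", "B"),
    ("Ann", "AI", "A"),
    ("Bob", "DB", "A"),
]
print(pair_determines_third(rows))
```

Here only the (student, course) pair fixes the third role, so the uniqueness constraint spans those two roles rather than all three, consistent with "Grade is single-valued".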

Slide 57: Ternary++ Relationships (HALPIN 08, §3.4, p.88)
What about this?
  CONTRIBUTION: DATE, DONOR, AMOUNT, FUND
In "Table Think" – what is the first question you must ask?
In ORM, define in sequence:
1. Objects / Domains
2. Relationships between/among them
3. Identifier for each Object
4. Constraints.
ORM gives an explicit means to analyze and represent higher order relationships.

Slide 58: Value Set Constraint (Halpin 08, §6.3)
On the Population of an Object, defined by: {.....}
• Enumeration - { M , F }
• Range (if an ordering) – { 1 … 10 }
• Pattern of Characters
  – e.g. { a15 } = 15 alpha characters
  – e.g. { d6.d2 } = up to 6 digits followed by 2 digits after a decimal point
• Reference Entity (Table) or Master List (of codes) to contain all possible values of the Entity, even if not used elsewhere in the database.
  – Declare the Entity as ‘Independent’ to allow for orphans, e.g., SKILL ! (code)
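The three kinds of value set constraint (enumeration, range, character pattern) can each be expressed as a simple membership test. The Python sketch below is one possible reading; the regular expressions are my own translation of the `{ a15 }` and `{ d6.d2 }` examples, and the assumption that "up to 6 digits" means at least one digit is mine, not the slide's.

```python
import re

ALLOWED_CODES = {"M", "F"}                      # enumeration { M, F }
ALPHA_15 = re.compile(r"[A-Za-z]{15}")          # { a15 }: exactly 15 alpha characters
DECIMAL_6_2 = re.compile(r"\d{1,6}\.\d{2}")     # { d6.d2 }: assumed 1 to 6 digits, then 2 decimals

def in_enumeration(value, allowed):
    return value in allowed

def in_range(value, low, high):
    """Inclusive range check, e.g. { 1 ... 10 }."""
    return low <= value <= high

def matches(pattern, text):
    """True only if the whole text fits the pattern."""
    return pattern.fullmatch(text) is not None

print(in_enumeration("M", ALLOWED_CODES), in_range(11, 1, 10))
print(matches(DECIMAL_6_2, "123456.78"), matches(DECIMAL_6_2, "1234567.00"))
```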

Frequency Constraints
(slide 59; ORMCONSTR; Halpin 08, §7.2)

• Role Frequency
  – Limits the number of times an object can play a role; or the number of times a role (or role combination) can appear in a fact table.
  – Place on the role(s) of the Predicate:
    - an ‘n’ or a range
    - optionally, with comparator operators ( <, >, … )
    - or a Range - (min … max)
  [Diagram: EMPLOYEE works on / contains COMMITTEE; frequency labels shown: <=3, 7, 9]
  DO A DEMO >
• Object Cardinality (indirect in NORMA thru Value Set)
  – Limits the size of an Object Population
Not in Record-based modeling schemes!

Role Population Constraints
(slide 60; ORMCONSTR; Halpin 08, §6.4)

• Roles must be on the same entity type population
• Only meaningful if both roles are optional
• Applies at the type level, on whole populations
• Based on existence in populations, not value-based
May also apply to multiple roles in the predicates (a composite).
[Diagram: ENTITY playing role1 (R1) and role2 (R2)]

SUBSET:    If R2, then R1. Order matters.    pop(R2)<=pop(R1)
EQUALITY:  R1 IFF R2.                        pop(R1)=pop(R2)
EXCLUSION: If R2, then -R1, and vv.          pop(R1)^pop(R2)=NULL
Not in Record-based modeling schemes!
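The set-comparison and frequency constraints above translate directly into checks on role populations. The sketch below assumes a fact table is a list of tuples and a role population is the set of objects in one column. The function names and the sample data (including the "chairs" fact type) are hypothetical.

```python
from collections import Counter

def subset(pop_r2, pop_r1):
    """SUBSET: every object playing R2 also plays R1. Order matters."""
    return set(pop_r2) <= set(pop_r1)

def equality(pop_r1, pop_r2):
    """EQUALITY: an object plays R1 if and only if it plays R2."""
    return set(pop_r1) == set(pop_r2)

def exclusion(pop_r1, pop_r2):
    """EXCLUSION: no object plays both R1 and R2."""
    return set(pop_r1).isdisjoint(pop_r2)

def frequency_ok(fact_rows, role_index, low=1, high=None):
    """FREQUENCY: each object that plays the role does so between low and high times."""
    counts = Counter(row[role_index] for row in fact_rows)
    return all(low <= n and (high is None or n <= high) for n in counts.values())

# Hypothetical populations: EMPLOYEE works on COMMITTEE, EMPLOYEE chairs COMMITTEE.
works_on = [("Ann", "Audit"), ("Ann", "Safety"), ("Ann", "Budget"), ("Barb", "Audit")]
chairs = [("Ann", "Audit")]
members = {employee for employee, _ in works_on}
chairpersons = {employee for employee, _ in chairs}

print(subset(chairpersons, members))      # chairs must also be members
print(frequency_ok(works_on, 0, high=3))  # each employee on at most 3 committees
```

All four checks look only at which objects exist in a population, which matches the slide's point that these constraints are based on existence rather than on values.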

Homogeneous Predicate
Reflexive Relationship, Ring Fact Type
(slides 61-62; ORMCONSTR; Halpin 08, §7.3)

• PERSON ----- parent of | child of ----- PERSON (M:M)
• PERSON ----- mother of | child of ----- PERSON (1:M)
• PERSON ----- husband of | wife of ----- PERSON (roles)

Ring Constraints on PERSON "parent of | child of"   ( => implies )

parent of   child of
Ann         Ann        NO -> IRreflexive
Ann         Barb       TRUE
Barb        Ann        NO -> ASymmetric => IR
Barb        Curt       TRUE
Ann         Curt       NO -> InTransitive => IR
Curt        Ann        NO -> ACyclic => AS => IR

Not in Record-based modeling schemes
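The Ann / Barb / Curt population above can be tested mechanically. The following is a minimal sketch that treats a ring fact type as a set of (x, y) pairs. The function names are hypothetical, and the acyclic check rejects cycles of any length, which is a more general reading than the three-step form.

```python
def is_irreflexive(r):
    """No object relates to itself."""
    return all(x != y for x, y in r)

def is_asymmetric(r):
    """If xRy then not yRx (this also rules out xRx)."""
    return all((y, x) not in r for x, y in r)

def is_intransitive(r):
    """If xRy and yRz then not xRz."""
    return all((x, z) not in r for x, y in r for y2, z in r if y == y2)

def is_acyclic(r):
    """No object can reach itself by following one or more facts."""
    successors = {}
    for x, y in r:
        successors.setdefault(x, set()).add(y)

    def reaches(start, target, seen):
        for nxt in successors.get(start, ()):
            if nxt == target:
                return True
            if nxt not in seen:
                seen.add(nxt)
                if reaches(nxt, target, seen):
                    return True
        return False

    return not any(reaches(x, x, set()) for x in successors)

# The two facts marked TRUE in the table above.
parent_of = {("Ann", "Barb"), ("Barb", "Curt")}

# Each rejected row from the table violates the constraint named beside it.
print(is_irreflexive(parent_of | {("Ann", "Ann")}))
print(is_asymmetric(parent_of | {("Barb", "Ann")}))
print(is_intransitive(parent_of | {("Ann", "Curt")}))
print(is_acyclic(parent_of | {("Curt", "Ann")}))
```

Each of the four print calls shows False, one per row marked NO in the table, while the two-fact population itself satisfies all four constraints.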

Ring Constraints
(slide 63; ORMCON)

Applies at Instance Level.
Constraint choices shown: ANS, IR, SYM, AS, AC, IT, AC+IT, IT+SYM, AS+IT, IR+SYM

• IRreflexive: -(xRx), i.e., no object relates to itself.
• Reflexive*
• ASymmetric: IF xRy, THEN -(yRx). ==> IR
• ANtiSymmetric: IF xRy AND x =/= y, THEN -(yRx). =/=> IR
• SYMmetric: IF xRy, THEN yRx (forced)
• InTransitive: IF xRy AND yRz, THEN -(xRz). ==> IR
• Strongly InTransitive*
• Transitive*: IF xRy AND yRz, THEN xRz (forced)
• ACyclic: IF xRy AND yRz, THEN -(zRx). ==> AS (IR)
* added 2012.

Subtype / Supertype Constraints
(slide 64; SSTYPE)

WITHOUT CONSTRAINTS, ASSUME THE MORE GENERAL CASE:
• overlapping subtype populations
  – a supertype member may be in more than one subtype
• non-exhaustive (Partial) on the supertype
  – a supertype member need not be in any subtype

DIAGRAMMING CONSTRAINTS:
1. Exclusive or disjoint subtypes – X
2. Exhaustive (Mandatory) on supertype
3. Rule for membership in a Subtype
   - based on attribute(s) of Supertype
   e.g., M: where Sex = Male
         C: where Age < 13 years
e.g., MALE (M), FEMALE (F), CHILD (C)
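The subtype constraints can be illustrated with populations derived from supertype attributes, following the slide's MALE / FEMALE / CHILD example. This is a sketch only: the sample people and the function names are hypothetical.

```python
def exclusive(*subtype_pops):
    """Exclusive (disjoint): no supertype member appears in more than one subtype."""
    seen = set()
    for pop in subtype_pops:
        if seen & set(pop):
            return False
        seen |= set(pop)
    return True

def exhaustive(supertype_pop, *subtype_pops):
    """Exhaustive (mandatory): every supertype member appears in at least one subtype."""
    return set(supertype_pop) <= set().union(*subtype_pops)

# Hypothetical supertype population with the attributes used by the membership rules.
people = [
    {"name": "Ann", "sex": "F", "age": 40},
    {"name": "Bob", "sex": "M", "age": 9},
    {"name": "Cal", "sex": "M", "age": 35},
]
person = {p["name"] for p in people}

# Membership rules based on supertype attributes.
male = {p["name"] for p in people if p["sex"] == "M"}
female = {p["name"] for p in people if p["sex"] == "F"}
child = {p["name"] for p in people if p["age"] < 13}

print(exclusive(male, female), exhaustive(person, male, female))
print(exclusive(male, child), exhaustive(person, male, child))
```

MALE and FEMALE come out exclusive and exhaustive here, while MALE and CHILD overlap (Bob) and leave Ann uncovered. That is the general case the slide says to assume when no constraint is drawn.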

Constraints Summary
(slide 65; ORMCONSTR)

ORM can capture and graphically represent many more integrity constraints than ERel.
• on Object Populations
  – Value Set
  – Independent
  – Subtype/Supertype
• on Predicates
  – Uniqueness; Ring
• on Role Populations
  – Frequency; Role sets
  – involving Roles from multiple Predicates
• on Object Roles
  – Mandatory

Fact Modeling - Reprise
(slide 66; ORMvER)

Overcoming the Limitations of ERel… requires a different way of thinking

1. First, think and model object populations ('Entity Type')
   – Name with a singular noun reflecting the real-world (R/W) object
   – Describe the population; criteria for inclusion
   – Show only once in the model diagram
   – [Optional: Designate a lexical surrogate, i.e., identifier]
   – Note: all object populations are mutually exclusive. Introduce Subtypes/Supertypes if not.
2. Then find all relevant relationships
   – Define them explicitly and in the same way
   – Name the roles objects play in the relationships (verbs)
3. Define integrity constraints or “business rules”
4. Present it to the user domain experts
   External: for human understanding and validation
Only then are you ready to put into tables ......


Introduction to ORM
(slide 67; ORMINTRO)

Demo of NORMA
• Open source (SOURCEFORGE) - free plug-in to VS.net
• Based on second generation notation for ORM2
  • More compact display
  • Language Neutral (internationalization)
  • Agreed on by a majority of 18 experts
VisioEA (Microsoft Visual Studio, team editions) based on first generation of ORM.

VisioEA / NORMA Architecture
(slide 68; ORMINTRO)

[Architecture diagram. Labels shown: Population; FORML fact sentences; Tables; FACT EDITOR (Free form, Guided); DIAGRAMMER; VERBALIZER; VALIDATE (CHECK); BUILD; DICTIONARY “Repository”; BROWSER; correct; refine; GENERATE. Models shown: CONCEPTUAL DATA MODEL (ORM); "LOGICAL" DATA MODEL (TABLES); PHYSICAL DATABASE STRUCTURE & DEFINITION for a target DBMS.]

Generating the Relational Tables
(slide 69; ORMvER)

GIVEN a complete ORM diagram:
Two simple rules:
1. Each object which has a M:1 relationship with any other object, gets its own table.
2. Every relationship (predicate) which is based on a composite key, gets its own table.
Guaranteed to be fully normalized!

Can we go the opposite direction: ORM from a Relational database?

Transforming ERel to ORM
(slide 70; ORMvER)

LEGEND: Primary Key, Foreign Key
Customer 1 ----- M Order ----- Line Item ----- Item

Customer:  [ C | N | A | P ]
Order:     [ O | D | T | C ]
Line Item: [ I | O | Q | Pr ]
Item:      [ I | De | Pr | L ]

ADD Foreign Keys (redundant). NOTE: Duplicate fields - C, Pr, I, O.
Break out all Entities and Attributes into separate Objects:
[Diagram: each of C, N, A, P, O, D, T, I, Q, Pr, De, L drawn as a separate object]
ADD Relationships (presuming above to be fully normalized):
NOTE: All Object Domains are shown only once!
NOTE: All Functional Dependencies are now explicitly shown!
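The two table-generation rules from slide 69 can be sketched as a grouping step over binary fact types. This is a deliberately reduced illustration, not NORMA's actual mapping procedure: it assumes every fact type is binary and tagged either "M:1" (the first object determines the second) or "M:N" (composite key), and it ignores 1:1 facts, ternaries, subtypes, and mandatory roles. The function name and the sample facts are hypothetical.

```python
def generate_tables(facts):
    """Group binary fact types into tables using the two rules above.

    facts: iterable of (subject, obj, multiplicity), multiplicity in {"M:1", "M:N"}.
    Returns {table_name: {"key": [...], "columns": [...]}}.
    """
    tables = {}
    for subject, obj, multiplicity in facts:
        if multiplicity == "M:1":
            # Rule 1: an object with a M:1 relationship gets its own table,
            # keyed on that object; the other object becomes a column.
            table = tables.setdefault(subject, {"key": [subject], "columns": []})
            table["columns"].append(obj)
        elif multiplicity == "M:N":
            # Rule 2: a predicate with a composite key gets its own table.
            tables[f"{subject}_{obj}"] = {"key": [subject, obj], "columns": []}
        else:
            raise ValueError(f"unsupported multiplicity: {multiplicity}")
    return tables

# Hypothetical fact types loosely based on the Customer / Order / Item example.
facts = [
    ("Order", "Customer", "M:1"),   # each Order is placed by one Customer
    ("Order", "OrderDate", "M:1"),  # each Order has one OrderDate
    ("Order", "Item", "M:N"),       # an Order contains many Items and vice versa
]

for name, table in generate_tables(facts).items():
    print(name, table)
```

Because every non-key column is placed in the table of the one object that functionally determines it, no column ends up depending on part of a key or on another non-key column. This is the intuition behind the slide's claim that the result is fully normalized.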

Data Modeling in ORM
(slide 71; ORMvER)

Try it yourself, starting with => [ X | A | B | C ]
Modify / extend the diagram with these semantics:
For each X:
1. A is REQUIRED ( ● )
2. A is UNIQUE for all X
3. A is MULTIVALUED ( –< )
4. A is INDEPENDENT (!) i.e., can be ORPHAN
5. B is FUNCTIONALLY DEPENDENT on A
6. B and C are RELATED
7. D is an ATTRIBUTE of A

ORM Diagram = ?
Relational Tables?
Relational Tables: [ X | B? | C ]  [ X | A ]  [ A | B | D ]  [ B | C ]

NOTE:
• No "Attributes"
• No Foreign Keys
• No Normalization
Also:
• No 'TABLE THINK'
• Focus on Object Domains
• Think all Relationships

ORM Data Model - Presentation
(slide 72; DMODPRE)

[ORM diagram: EMPLOYEE (number) earns / paid to SALARY (dollars); works in / employs DEPT (number); supervises / reports to BOSS, acyclic (ac); is headed by / superior to; may spend up to / of spending for LIMIT { 1000 .. 9999 }; possesses / possessed by SKILL (code), <=5, objectified as "EmployeeSkill!"; SKILL has / is of DESCRIPTION (name); with proficiency of / assigned to RATING { 1 .. 10 }]

A major criticism of NIAM / ORM, both by protagonists and proponents, is that it is too detailed, a bottom-up design,
BUT… ER Diagrams usually hide the details of attributes and most constraints.
So, present the ORM model using a series of top-down abstractions.


Abstractions of ORM Data Model
(slide 73; DMODPRE)

[Same ORM diagram as on the previous slide: EMPLOYEE, DEPT, SALARY, BOSS, LIMIT, SKILL, DESCRIPTION, RATING, "EmployeeSkill!"]

1. Hide "Terminal" (M:1) Objects ( => Attributes)
2. Hide Reference Modes
3. Hide Constraints
4. Hide Less Important Objects & Predicates
   - Subtypes
   - Objectified Predicates
   - Reflexive Relationships
5. Hide all Predicates, Leaving BASE Entities !
6. Add back Multiplicity char. on relationships
=> A High-level Abstract “Conceptual” Data Model... an ER Diagram ?!!!
Is this the same data model we started with?

Stages of Data Modeling
(slide 74; DMOD)

Start at the highest Conceptual Level!
USER Domain Knowledge
Stage labels on the diagram: “CONCEPTUAL”, “LOGICAL”, PHYSICAL

ORM
• Objects
• Obj. ID’s
• Roles/Relships
• (Fnl. Dep)
• Sub/SupTypes
NO clustering => NO “attributes”

E-R ------> RELATIONAL
Cluster attributes into Records
MultiValued, Nested ------> Decompose, Flatten (1NF)
Ternaries ------> Binary only
M:N ------> 1:Many only
Normalized (2,3,4); Primary Keys
Relationships w/attributes - - -> Foreign Keys

PHYSICAL
• Implementation in/for a DBMS
• Denormalize (for performance)
+ triggers, stored procedures

A common thread: DATABASE SCHEMA (DDL)

The Many Faces of Databases
(slide 75; DMOD)

[Figure: numbered family of database structures: No File; Flat File (FORTRAN); Single File (COBOL); Hierarchical; CODASYL Network; Multi-File; Relational (ANSI SQL); Multi-Dimensional / Snowflake; Object-Oriented (UML); Object-Role (ORM)]
What do all these have in common? -----> Logical Database Structures

Fact Modeling
(slide 76)

Questions?

©Gordon C. Everest
Professor Emeritus
Carlson School of Management
University of Minnesota
[email protected] www.tc.umn.edu/~geverest

H2 – An Orphan Attribute
(slide 77; ORMvER)

5. There exists an A (one or more) which is NOT associated with any X, that is, some As are orphans. There is nothing else of interest about A.
HINT: Think “What is the correct design” then pick it.
• Is A a ‘foreign key’ in X?
• All A’s in A?
• Redundancy?
[Five candidate table designs, numbered 1-5, built from X, A, and B]

H2 – Attributes of an Attribute
(slide 78; ORMvER)

6. Suppose we now have an additional data attribute D which is of interest to us about B. Is B still dependent on X?
Many got this, but missed 5.
[Five candidate table designs, numbered 1-5, built from X, A, B, and B Desc]
Special case: a ‘Decode Table’ for B


H2 – Multi-valued Attribute
(slide 79; ORMvER)

7. X is NOT exclusive on A, that is, there may be multiple values of A for a given X.
Many missed this.
[Five candidate table designs, numbered 1-5, built from X, A, and B]

H2 – Attributes with M:N Relationship
(slide 80; ORMvER)

8. There exists a M:N relationship between A and B, in addition to the relationship each of them has with X.
First, understand the semantics
[Five candidate table designs, numbered 1-5, built from X, A, and B]

H2 – Attributes with 1:M Relationship
(slide 81; ORMvER)

9. There is a many-to-one relationship between A and B since we observe that for every different or unique value of A, the values of B are all the same value.

1. X A B
2. X A B A
3. XB BA
4. XA AB
5. XA AB
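The observation in item 9 (every occurrence of a given A value carries the same B value) is a functional dependency of B on A. It can be checked against sample rows as sketched below. The function name and the rows are hypothetical. A population can only refute such a dependency, never prove it: whether it holds in general is a question for the domain experts.

```python
def functionally_determines(rows, a, b):
    """Return True if no two rows agree on column a but differ on column b."""
    seen = {}
    for row in rows:
        # Remember the first b seen for each a; any later mismatch breaks the dependency.
        if seen.setdefault(row[a], row[b]) != row[b]:
            return False
    return True

# Hypothetical sample population of [ X | A | B ].
rows = [
    {"X": 1, "A": "a1", "B": "b1"},
    {"X": 2, "A": "a1", "B": "b1"},
    {"X": 3, "A": "a2", "B": "b1"},
]

print(functionally_determines(rows, "A", "B"))  # each A has a single B
print(functionally_determines(rows, "B", "A"))  # b1 appears with both a1 and a2
```

In this sample B depends on A but not the reverse, which is the many-to-one situation the exercise describes.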
