Fact Modeling Fundamentals. Data Modeling Zone, Portland, 2014 October
FactMod - September 26, 2014

Data Modeling Zone, Portland, 2014 October
Fact Modeling Fundamentals
©Gordon C. Everest
Professor Emeritus of MIS and Database
Carlson School of Management, University of Minnesota
[email protected]
www.tc.umn.edu/~geverest

The Advanced Database Design Course
DESCRIPTION:
• Teaches a different way of thinking – Fact Modeling (ORM)
  – Avoids “TABLE THINK” and all of its consequent problems
  – The need for normalization; diagrams difficult to understand
  – Learning ORM is the treatment for “Tableitis”
• Mix of students & working professionals
• Professionals often have more trouble unlearning what they know, have learned, and have practiced
• Prior knowledge and experience – Used a DBMS to set up and query/manipulate tables
• Now offered online, piggybacking on a face-to-face class
• This workshop arises out of my observations of problems stemming from faulty thinking – “TABLE THINK”

Outline
• Data Modeling – What, Why, How, the process, constructs
  The WHAT – traditional ERel data modeling
• “TABLE THINK” – and consequent problems
  Fact Modeling (e.g., ORM) – interplay among entities/objects, relationships, attributes, identifiers
• Transitioning from ERel to Fact Modeling (ORM)
• ORM in greater detail – verbalization in fact sentences; symbolization (diagramming)
• ORM Constraints – uniqueness (multiplicity/exclusivity); mandatory role; handling ternary++ relationships; other
• ORM Modeling Tool – architecture; NORMA demo; generating Relational tables; abstraction in model presentation
• Data Modeling Schemes – where ORM fits.
• APPENDIX: Some ER/Relational design exercises

Logical Database Design: Objective, Principles, Benefits
OBJECTIVE of LOGICAL DATABASE DESIGN:
TO ACCURATELY AND COMPLETELY MODEL SELECTED PORTIONS OF THE REAL WORLD OF INTEREST TO A COMMUNITY OF USERS.
• USERS (COLLECTIVELY) WILL ALWAYS KNOW MORE ABOUT A DATA STRUCTURE THAN THE SYSTEM KNOWS, OR THAN COULD BE DEFINED TO THE SYSTEM.
• WHAT IS NOT FORMALLY DEFINED TO THE SYSTEM, THE SYSTEM CANNOT MANAGE. THE USERS MUST!
• THEREFORE, NEED TO CAPTURE RICH SEMANTICS WITH COMPREHENSIVE DATA MODELING and DEFINITION, INCLUDING INTEGRITY CONSTRAINTS AND OPERATIONS.
Let the ‘system’ do it! Implications for a Tool!

Purpose of Modeling – the WHY
To Facilitate Human Communication, Understanding, Validation
• Capture semantics – all relevant, important details
• Document – record and remember
• Understand – learn, raise questions, record answers, refine
• Communicate – shared with all interested parties
  – Users, stakeholders, management, developers
• Validate – a complete and accurate representation
  – Internal validation – consistent with the modeling rules
  – External validation – Who can do this?
SECONDARY:
• Blueprint to Build (a Database)

Modeling Process – the HOW
MODEL = Abstract (Re)present(ation)
[Diagram: Reality, MODELING PROCESS, MODEL. Three stages of knowledge: knowledge in the world (infinitely complex); knowledge in the head (mental models); knowledge externalized, formalized, shared.]
What drives or guides the process?

© Gordon C. Everest, All rights reserved.
The Modeling Process
[Diagram: Context; Real World / Universe of Discourse; perception, selection/filtering; MODELING PROCESS; MODEL.]
MODELING SCHEME: Constructs; Composition; Constraints
METHODOLOGY: Steps/Tasks + Milestones + Deliverables
REPRESENTATIONAL FORMS (the Syntax): Narrative, Graphical Diagram, Formal Language Statements
The Semantics are most important.
The SEMANTICS of a data model can only be seen through the presentation, the SYNTAX.

Data Modeling Constructs
What to look for:
[Diagram of constructs: ENTITY (Object); RELATIONSHIP; IDENTIFIER; DOMAIN; [FOREIGN KEY]; ATTRIBUTE (Data Item); characteristics.]
A Day of the Week: Tuesday, Tues, Tu, Mardi, Martes... What’s the difference?

Data Modeling
• STARTS from some expression of the users’ world to be modeled… in data
  – Applications depend on a well-designed database
• TRADITIONAL APPROACH – think ER/Relational tables
• PROBLEM: some data items not in the right place
• SOLUTION: to find errors, apply the rules of normalization
• NORMALIZATION - the Achilles heel of data modeling
  – Even professional data modelers get it wrong
• REMEDY for violations => record decomposition
So wouldn’t it be nice to have:
• Modeling scheme which avoids the need for normalization => ORM (Object Role Modeling, or Fact Oriented Modeling)
• Modeling tool for ORM => NORMA (also Visio in VS)
  – To automatically generate tables… in fifth normal form!

Fact Modeling (ORM) - Preview
• CONSTRUCTS
  – ERel uses three: Entity, Attribute, Relationship (E-A-R)
  – Fact Modeling uses two: Object, Relationship/Role
• Elementary FACT SENTENCE is the basic construct
  – OBJECT domain = Subject or Object (noun)
  – RELATIONSHIP = Predicate (verb phrase)
  – hence can directly verbalize a fact data model, including all the constraints in the model diagram
• Represents all DOMAINS directly & only once
  - both entity populations and attribute value sets
• Explicitly represents all Relationships and in the same way
  - including all Functional Dependencies.
• Add Constraints from a rich set
=> Defers clustering attributes to form entity tables.

Record-based Design
WHAT SEMANTICS ARE PRESUMED BY THE FOLLOWING RECORD STRUCTURE?
  X  A  B
• Do we know if it is Normalized (to 3NF)? How?
• Is X an attribute? Of what?
• What does it say about A alone?
  – Attribute? Of what _______? Presumes ________?
• What does it say about X-A?
• What does it say about A-B?
Design is always done at the schema (type) level.

ER / Record-based Modeling
[Diagram: Real World; POPULATION of similar entities; ENTITY TABLE: X A B; ATTRIBUTES, each drawn from a VALUE DOMAIN; Surrogate and Lexical Values.]
Anything different for ID attributes?
Consider population of Birthdate or Date Hired.

What is an Attribute?
An ATTRIBUTE is an OBJECT... playing a ROLE in a RELATIONSHIP with some (other) OBJECT.
What comes first?

H1 - Typical Average Results
• 40% miss: There exists a population of things called A.
• 90% miss: There exists a relationship between X and A.
• 100% miss: X is a descriptor of A (even when asked!)
• 35% miss/wrong: dependency/optionality characteristic.
• 60% miss/wrong: multiplicity/exclusivity characteristic.
• 50% say or imply there is a relationship between A and B.
What might the results look like if presented with:
1. Object Domains
2. Relationships
3. Constraints
NO Tables, Identifiers or Foreign Keys
[Diagram: X, A and B as objects, with relationship arcs labeled "many" and "dependent".]

Extending an ER Diagram
Now that you know what this means:
  X  A  B
Revise the diagram to handle some changed semantics.
• There exist some orphan A’s which have no X.
  - Is A still dependent on X?
• A has additional attributes that are of interest.
• A can be multivalued for X.
  - so now what is the nature of the relationship X-A?
  - Does A remain in the table of X?
• There exists a 1:Many relationship between A and B.
• There exists a M:N relationship between A and B.
  - Does the original diagram change?

H2 - Problems for the Students
• Much confusion with the Foreign Key
  – Inconsistent with the relationship arc
  – Must have an ID to point to
  – Can only represent at most a 1:Many relationship
• Putting an Attribute or Foreign Key in a table means the entity can have at most one of them.
• Every Relationship must be represented somewhere in the model with a pair of values (for a binary relationship)
• M:N Relationship means there must be a composite key somewhere in the model, and vice versa.
• Just because two attributes are together in a table does not mean there is a relationship.

Representing Relationships
Multiple different ways:
(1) Intra-record => Entity with Attributes
(2) Inter-record => between Entities, Entity with another Entity
(3) Between/Among Attributes? Spurious Associations?
[Record sketches: X A B C; A E F; Y P Q R X.]
• What is the Identifier? Don’t know until…?
Where are the Foreign Keys? How are they different?

Representing a Relationship
• The schema design level: X A
• The instance (data) level: All valid X-A pairs
What if:
• some A’s can be orphans?
• Is A still dependent on X? (in the real world)
• A has some other attributes?
[Table sketches showing alternative placements of X and A.]
Must know the multiplicity/exclusivity characteristics of X-A before you can put X and A in a Table Diagram. That requires the designation of an identifier; a foreign key requires the prior designation of an identifier.
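The slides' point that the multiplicity of X-A must be known before X and A can be placed in a table can be illustrated with a small Python sketch. It is not from the slides: the function names `uniqueness` and `placement` and the sample populations are hypothetical, chosen only for illustration. Note also that a sample population can only refute a uniqueness constraint, never prove it; the constraint itself remains a schema-level decision made with the users.

```python
from collections import defaultdict


def uniqueness(pairs):
    """Return (x_has_one_a, a_has_one_x) for a sample population of (x, a) pairs.

    Hypothetical helper: it only reports what the sample shows.
    """
    a_per_x = defaultdict(set)
    x_per_a = defaultdict(set)
    for x, a in pairs:
        a_per_x[x].add(a)
        x_per_a[a].add(x)
    # True when no X in the sample is paired with more than one A.
    x_has_one_a = all(len(values) == 1 for values in a_per_x.values())
    # True when no A in the sample is paired with more than one X.
    a_has_one_x = all(len(values) == 1 for values in x_per_a.values())
    return x_has_one_a, a_has_one_x


def placement(pairs):
    """Suggest where the X-A relationship could be stored (illustrative only)."""
    x_has_one_a, a_has_one_x = uniqueness(pairs)
    if x_has_one_a:
        # Each X has at most one A, so A fits as a column beside X.
        return "A as a column in the table identified by X"
    if a_has_one_x:
        # Each A has at most one X, so X fits as a column beside A.
        return "X as a column in the table identified by A"
    # Neither side is unique: M:N, which needs a composite key.
    return "separate table with composite key (X, A)"


if __name__ == "__main__":
    many_x_per_a = [("x1", "a1"), ("x2", "a1"), ("x3", "a2")]
    many_to_many = [("e1", "p1"), ("e1", "p2"), ("e2", "p1")]
    print(placement(many_x_per_a))
    print(placement(many_to_many))
```

The sketch works on pairs only, so orphan A's (those with no X) never appear in the input; whether A is still dependent on X is a separate question that the pair population alone cannot answer.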
The problem
“TABLE THINK” … is the problem
Is the Relational (“Logical”) Data Model for people or for the “machine/system”?

Representing a M:N Relationship (WATSON5, Ch.5, p.115)
Another Pattern: EMPLOYEE, PROJECT
• If you cannot store multiple Projects (or Project IDs) in an Employee record, or multiple Employees (or Employee IDs) in a Project record (as is the case in a Relational Database in 1NF), then … you must introduce an “Intersection Entity” between them to represent the Many-to-Many Relationship.
[Diagram: EMPLOYEE and PROJECT, linked through an intersection record containing EMPL-ID and PROJ-ID.]
• In the initial stages of modeling, must we always resolve