Data Modeling Overview


A data model is a conceptual representation of the data structures (tables) required for a database, and it is a powerful way to express and communicate business requirements.

A data model visually represents the nature of the data, the business rules governing the data, and how the data will be organized in the database. A data model comprises two parts: the logical design and the physical design.

A data model helps functional and technical teams design the database. The functional team normally consists of one or more business analysts, business managers, subject matter experts, end users, etc., and the technical team consists of one or more programmers, DBAs, etc. Data modelers are responsible for designing the data model; they communicate with the functional team to gather the business requirements and with the technical team to implement the database.

The concept of data modeling can be better understood if we compare the development cycle of a data model to the construction of a house. For example, Company ABC is planning to build a guest house (database). It calls a building architect (data modeler) and explains its building requirements (business requirements). The building architect (data modeler) develops the plan (data model) and gives it to Company ABC. Finally, Company ABC calls civil engineers (DBAs) to construct the guest house (database).

Need for developing a Data Model

• A new application for OLTP (Online Transaction Processing), ODS (Operational Data Store), data warehouse, or data marts.
• Rewriting data models from existing systems that may need to change reports.
• Incorrect data modeling in the existing systems.
• A database that has no data models.

Advantages and Importance of Data Model

• The goal of a data model is to make sure that all data objects provided by the functional team are completely and accurately represented.
• The data model is detailed enough to be used by the technical team for building the physical database.
• The information contained in the data model is used to define the significance of business, relational tables, primary and foreign keys, stored procedures, and triggers.
• A data model can be used to communicate the business within and across businesses.

Data Modeling Development Cycle

Gathering Business Requirements - First Phase: Data modelers interact with business analysts to get the functional requirements and with end users to find out the reporting needs.

Conceptual Data Modeling (CDM) - Second Phase: This data model includes all major entities and relationships, does not contain much detail about attributes, and is often used in the initial planning phase.

Logical Data Modeling (LDM) - Third Phase: This is the actual implementation of a conceptual model in a logical data model. A logical data model is the version of the model that represents all of the business requirements of an organization.

Physical Data Modeling (PDM) - Fourth Phase: This is a complete model that includes all required tables, columns, relationships, and database properties for the physical implementation of the database.

Database - Fifth Phase: DBAs instruct the data modeling tool to generate SQL code from the physical data model. The SQL code is then executed on the server to create the database.
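As a rough illustration of the fifth phase, the sketch below turns a tiny physical model description into SQL and executes it. The dictionary layout, the `render_ddl` helper, and the use of Python's built-in `sqlite3` module as the target server are all assumptions made for this example; real modeling tools generate the script through their own forward-engineering features.

```python
import sqlite3

# Hypothetical physical model: table name -> list of (column, type, nullable).
physical_model = {
    "STATE_LKP": [
        ("ST_CD", "TEXT", False),
        ("ST_DESC", "TEXT", False),
    ],
}


def render_ddl(model):
    """Render one CREATE TABLE statement per table in the model."""
    statements = []
    for table, columns in model.items():
        column_sql = ", ".join(
            f"{name} {sql_type}{'' if nullable else ' NOT NULL'}"
            for name, sql_type, nullable in columns
        )
        statements.append(f"CREATE TABLE {table} ({column_sql})")
    return statements


ddl = render_ddl(physical_model)

# Execute the generated script against an in-memory database.
conn = sqlite3.connect(":memory:")
for statement in ddl:
    conn.execute(statement)

# List the tables that now exist in the database.
created = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
print(ddl[0])
print(created)
```

Keeping the model description separate from the generated script mirrors the hand-off in the text: the modeler maintains the model, and the DBA receives and runs the SQL.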

Data modeling standardization has been in practice for many years, and a later section highlights the need for, and the implementation of, data modeling standards.

Conceptual Data Modeling

A conceptual data model includes all major entities and relationships, does not contain much detail about attributes, and is often used in the initial planning phase.

A conceptual data model is created by gathering business requirements from various sources such as business documents and discussions with functional teams, business analysts, subject matter experts, and the end users who do the reporting on the database. Data modelers create the conceptual data model and forward it to the functional team for review.

Conceptual Data Model - Highlights

• CDM is the first step in constructing a data model in a top-down approach and is a clear and accurate visual representation of the business of an organization.
• CDM visualizes the overall structure of the database and provides high-level information about the subject areas or data structures of an organization.
• CDM discussion starts with the main subject areas of an organization, and then all the major entities of each subject area are discussed in detail.
• CDM comprises entity types and relationships. The relationships between the subject areas and between the entities in a subject area are drawn using symbolic notation (IDEF1X or IE). In a data model, cardinality represents the relationship between two entities, i.e. a one-to-one, one-to-many, or many-to-many relationship.
• CDM contains data structures that have not been implemented in the database.
• In CDM discussions, technical as well as non-technical teams contribute their ideas for building a sound logical data model.

Consider an example of a bank that has different lines of business such as savings, credit card, investment, loans, and so on. In the example (Figure 1.1), the conceptual data model contains major entities from savings, credit card, investment, and loans. Conceptual data modeling gives the functional and technical teams an idea of how the business requirements would be projected in the logical data model.

Figure 1.1 : Example of Conceptual Data Model

Enterprise Data Modeling

The development of a common consistent view and understanding of data elements and their relationships across the enterprise is referred to as Enterprise Data Modeling. This type of data modeling provides access to information scattered throughout an enterprise under the control of different divisions or departments with different databases and data models.

Enterprise Data Modeling is sometimes called the global business model, and the entire information about the enterprise is captured in the form of entities.

Data Model Highlights

When an enterprise logical data model is transformed into a physical data model, super types and sub types may not carry over as is; i.e. the logical and physical structures of super types and sub types may be entirely different. The data modeler has to change them according to the physical and reporting requirements.

When an enterprise logical data model is transformed into a physical data model, the length of table names, column names, etc. may exceed the maximum number of characters allowed by the database. The data modeler then has to manually edit the physical names according to the database's or the organization's standards.

One of the important things to note is the standardization of the data model. Since the same attribute may be present in several entities, the attribute names and data types should be standardized, and a conformed dimension should be used to connect to the same attribute present in several tables.

A standard abbreviation document is a must so that all data structure names are consistent across the data model.

See Figure 1.4 below:

Consider an example of a bank that has different lines of business such as savings, credit card, investment, loans, and so on. In the example (Figure 1.4), the enterprise data model contains all entities, attributes, and relationships from the lines of business savings, credit card, investment, and loans.

Figure 1.4 : Example of Enterprise Data Model

Logical Data Modeling

This is the actual implementation and extension of a conceptual data model. A logical data model is the version of a data model that represents the business requirements (entire or part) of an organization and is developed before the physical data model.

As soon as the conceptual data model is accepted by the functional team, development of the logical data model starts. Once the logical data model is completed, it is forwarded to the functional teams for review. A sound logical design should streamline the physical design process by clearly defining data structures and the relationships between them. A good data model is created by thinking clearly about the current and future business requirements. A logical data model includes all required entities, attributes, key groups, and relationships that represent business information and define business rules.

Example of Logical Data Model: Figure 1.2

In the example, we have identified the entity names, attribute names, and relationships. For a detailed explanation, refer to relational data modeling.

Physical Data Modeling

A physical data model includes all required tables, columns, relationships, and database properties for the physical implementation of databases. Database performance, indexing strategy, physical storage, and denormalization are important parameters of a physical model.

Once the functional team approves the logical data model, development of the physical data model starts. Once the physical data model is completed, it is forwarded to the technical teams (developer, group lead, DBA) for review. The transformations from the logical model to the physical model include imposing database rules and implementing referential integrity, super types and sub types, etc.

Example of Physical Data Model: Figure 1.3

In the example, the entity names have been changed to table names and the attribute names to column names, and nulls or not nulls and a datatype have been assigned to each column.

When a data modeler works with a client, the title may be logical data modeler, physical data modeler, or a combination of both. A logical data modeler designs the data model to suit business requirements, creates and maintains the lookup data, compares versions of the data model, maintains the change log, and generates reports from the data model, whereas a physical data modeler has to know the properties of the source and target databases.

A physical data modeler should have the technical know-how to create data models from existing databases and to tune the data models with referential integrity, alternate keys, and indexes, and should know how to match indexes to SQL code. It also helps if the physical data modeler knows about replication, clustering, and so on.

The differences between a logical data model and a physical data model are shown below.

Logical vs Physical Data Modeling

Logical Data Model | Physical Data Model
Represents business information and defines business rules | Represents the physical implementation of the model in a database
Entity | Table
Attribute | Column
Primary Key | Primary Key Constraint
Alternate Key | Unique Constraint or Unique Index
Inversion Key Entry | Non Unique Index
Rule | Check Constraint, Default Value
Relationship | Foreign Key
Definition | Comment
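To make the logical-to-physical mapping concrete, here is a minimal sketch that expresses each logical construct as its physical counterpart, using Python's built-in `sqlite3` module. The `CREDIT_CARD` and `CREDIT_CARD_TYPE_LKP` tables, their columns, and the sample values are hypothetical and chosen only for illustration; other databases use slightly different DDL syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
-- Entity -> table, attribute -> column
CREATE TABLE CREDIT_CARD_TYPE_LKP (
    CRDT_CARD_TYPE_CD   TEXT NOT NULL,
    CRDT_CARD_TYPE_DESC TEXT NOT NULL,
    -- Primary key -> primary key constraint
    CONSTRAINT CREDIT_CARD_TYPE_LKP_PK01 PRIMARY KEY (CRDT_CARD_TYPE_CD)
);

CREATE TABLE CREDIT_CARD (
    CRDT_CARD_KEY     INTEGER NOT NULL,
    CRDT_CARD_ID      TEXT NOT NULL,
    CRDT_CARD_TYPE_CD TEXT NOT NULL,
    -- Rule -> default value (checked by CREDIT_CARD_CK01 below)
    ACTIVE_IND        TEXT NOT NULL DEFAULT 'Y',
    CONSTRAINT CREDIT_CARD_PK01 PRIMARY KEY (CRDT_CARD_KEY),
    -- Alternate key -> unique constraint
    CONSTRAINT CREDIT_CARD_AK01 UNIQUE (CRDT_CARD_ID),
    -- Rule -> check constraint
    CONSTRAINT CREDIT_CARD_CK01 CHECK (ACTIVE_IND IN ('Y', 'N')),
    -- Relationship -> foreign key
    CONSTRAINT CREDIT_CARD_FK01 FOREIGN KEY (CRDT_CARD_TYPE_CD)
        REFERENCES CREDIT_CARD_TYPE_LKP (CRDT_CARD_TYPE_CD)
);

-- Inversion entry -> non-unique index
CREATE INDEX CREDIT_CARD_IDX01 ON CREDIT_CARD (CRDT_CARD_TYPE_CD);
""")

conn.execute("INSERT INTO CREDIT_CARD_TYPE_LKP VALUES ('VISA', 'Visa')")
conn.execute(
    "INSERT INTO CREDIT_CARD (CRDT_CARD_KEY, CRDT_CARD_ID, CRDT_CARD_TYPE_CD) "
    "VALUES (1, 'CC-0001', 'VISA')"
)


def is_rejected(sql):
    """Return True when the database refuses the statement."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# Same identifier twice: violates the alternate key (unique constraint).
duplicate_rejected = is_rejected(
    "INSERT INTO CREDIT_CARD (CRDT_CARD_KEY, CRDT_CARD_ID, CRDT_CARD_TYPE_CD) "
    "VALUES (2, 'CC-0001', 'VISA')"
)
# Unknown type code: violates the relationship (foreign key).
orphan_rejected = is_rejected(
    "INSERT INTO CREDIT_CARD (CRDT_CARD_KEY, CRDT_CARD_ID, CRDT_CARD_TYPE_CD) "
    "VALUES (3, 'CC-0003', 'AMEX')"
)
# Indicator outside Y/N: violates the rule (check constraint).
bad_flag_rejected = is_rejected(
    "INSERT INTO CREDIT_CARD (CRDT_CARD_KEY, CRDT_CARD_ID, CRDT_CARD_TYPE_CD, ACTIVE_IND) "
    "VALUES (4, 'CC-0004', 'VISA', 'X')"
)
default_flag = conn.execute(
    "SELECT ACTIVE_IND FROM CREDIT_CARD WHERE CRDT_CARD_KEY = 1"
).fetchone()[0]
```

Each rejected insert corresponds to one row of the comparison: the database, not the application, enforces the alternate key, the relationship, and the rule.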

Standardization Needs

Several data modelers may work on different subject areas of a data model, and all of them should use the same conventions for naming and for writing definitions and business rules.

Nowadays, business-to-business (B2B) transactions are quite common, and standardization helps in understanding the business better. Inconsistency across column names and definitions would create chaos across the business.

For example, when a data warehouse is designed, it may get data from several source systems, and each source may have its own names, data types, etc. These anomalies can be eliminated if proper standardization is maintained across the organization.

Table Names Standardization: Giving a table its full name gives an idea of what the data is about. Generally, do not abbreviate table names; however, this may differ according to the organization's standards. If the table name's length exceeds the database standards, then try to abbreviate the table name. Some general guidelines are listed below that may be used as a prefix or suffix for the table. Examples:

• Lookup – LKP – Used for code and type tables by which a fact table can be directly accessed. e.g. Credit Card Type Lookup – CREDIT_CARD_TYPE_LKP
• Fact – FCT – Used for transaction tables. e.g. Credit Card Fact – CREDIT_CARD_FCT
• Cross Reference – XREF – Tables that resolve many-to-many relationships. e.g. Credit Card Member XREF – CREDIT_CARD_MEMBER_XREF
• History – HIST – Tables that store history. e.g. Credit Card Retired History – CREDIT_CARD_RETIRED_HIST
• Statistics – STAT – Tables that store statistical information. e.g. Credit Card Web Statistics – CREDIT_CARD_WEB_STAT

Column Names Standardization: Some general guidelines are listed below that may be used as a prefix or suffix for the column. Examples:

• Key – KEY – System-generated surrogate key. e.g. Credit Card Key – CRDT_CARD_KEY
• Identifier – ID – Character column that is used as an identifier. e.g. Credit Card Identifier – CRDT_CARD_ID
• Code – CD – Numeric or alphanumeric column that is used as an identifying attribute. e.g. State Code – ST_CD
• Description – DESC – Description for a code, identifier, or key. e.g. State Description – ST_DESC
• Indicator – IND – Denotes indicator columns. e.g. Gender Indicator – GNDR_IND

Database Parameters Standardization: Some general guidelines are listed below that may be used for other physical parameters. Examples:

• Index – IDX – for index names. e.g. Credit Card Fact IDX01 – CRDT_CARD_FCT_IDX01
• Primary Key – PK – for primary key constraint names. e.g. Credit Card Fact PK01 – CRDT_CARD_FCT_PK01
• Alternate Keys – AK – for alternate key names. e.g. Credit Card Fact AK01 – CRDT_CARD_FCT_AK01
• Foreign Keys – FK – for foreign key constraint names. e.g. Credit Card Fact FK01 – CRDT_CARD_FCT_FK01
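The naming guidelines above can be applied mechanically once a standard abbreviation document exists. The following is a minimal sketch in Python; the `ABBREVIATIONS` dictionary, the helper names, and the 30-character limit are assumptions made for illustration, and an organization's own abbreviation document and database limits would replace them.

```python
# Hypothetical excerpt of a standard abbreviation document.
ABBREVIATIONS = {
    "CREDIT": "CRDT",
    "STATE": "ST",
    "GENDER": "GNDR",
    "IDENTIFIER": "ID",
    "CODE": "CD",
    "DESCRIPTION": "DESC",
    "INDICATOR": "IND",
    "LOOKUP": "LKP",
    "FACT": "FCT",
    "HISTORY": "HIST",
    "STATISTICS": "STAT",
}


def physical_name(logical_name, max_length=30):
    """Turn a logical name into an abbreviated, underscore-separated physical name.

    max_length is an assumed database limit; adjust it to the target DBMS.
    """
    words = logical_name.upper().split()
    name = "_".join(ABBREVIATIONS.get(word, word) for word in words)
    if len(name) > max_length:
        raise ValueError(f"{name} exceeds {max_length} characters")
    return name


def constraint_name(table_name, kind, sequence):
    """Build names such as CRDT_CARD_FCT_PK01 from a table name, kind, and number."""
    return f"{table_name}_{kind}{sequence:02d}"


print(physical_name("Credit Card Identifier"))      # CRDT_CARD_ID
print(physical_name("State Code"))                  # ST_CD
print(constraint_name("CRDT_CARD_FCT", "PK", 1))    # CRDT_CARD_FCT_PK01
```

Note that this sketch abbreviates every word it finds in the dictionary, whereas the table-name guideline above prefers full names unless the length limit is exceeded; a real implementation would apply abbreviations to table names only when needed.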

Steps to create a Data Model

These are general guidelines for creating a standard data model; in practice, a data model may not be created in the same sequence as shown below. Based on the enterprise's requirements, some of the steps may be excluded or others added.

Sometimes a data modeler may be asked to develop a data model based on an existing database. In that situation, the data modeler has to reverse engineer the database and create a data model.

1» Get business requirements.
2» Create high-level conceptual data model.
3» Create logical data model.
4» Select the target DBMS for which the data modeling tool creates the physical schema.
5» Create standard abbreviation document according to business standard.
6» Create domains.
7» Create entities and add definitions.
8» Create attributes and add definitions.
9» Based on the analysis, try to create surrogate keys, super types, and sub types.
10» Assign a datatype to each attribute. If a domain is already present, then the attribute should be attached to the domain.
11» Create primary or unique keys on attributes.
12» Create check constraints or defaults on attributes.
13» Create unique indexes or bitmap indexes on attributes.
14» Create foreign key relationships between entities.
15» Create physical data model.
16» Add database properties to the physical data model.
17» Create SQL scripts from the physical data model and forward them to the DBA.
18» Maintain logical and physical data models.
19» For each release (version of the data model), try to compare the present version with the previous version of the data model. Similarly, try to compare the data model with the database to find the differences.
20» Create a change log document for the differences between the current version and the previous version of the data model.

Relational (OLTP) Data Modeling

A relational data model is a data model that views the real world as entities and relationships. Entities are concepts, real or abstract, about which information is collected. Entities are associated with each other by relationships, and attributes are properties of entities. Business rules determine the relationships between the entities in a data model.

The goal of a relational data model is to normalize the data (avoid redundancy) and to present it in a good normal form. While working with relational data modeling, a data modeler has to understand the first through fifth normal forms to design a good data model.

Following are some of the questions that arise during the development of an entity relationship data model. A complete business and data analysis leads to the design of a good data model.

1» What will be the future scope of the data model?
2» How to group attributes in entities?
3» How to name entities, attributes, key groups, and relationships?
4» How to connect one entity to another? What sort of relationship is that?
5» How to validate the data?
6» How to normalize the data?
7» How to present reports?

The sample source data shown in the table below provides information about employees, their residential state, county, and city, and their employer names and manager names. It also describes employees working for an "American Bank" that has many branches in several states. From a data modeler's point of view, analysis of the source data raises the following questions.

• How to group and organize the data?
• How to avoid redundancy (denormalization), since the employee's residential data such as state name, county name, and city name are repeated in most of the records?
• What sort of relationship is there between employer and employee?
• What sort of relationship is there between the employee and state, city, and county?

Sample Source Data

State Name | County Name | City Name | Emp First Name | Emp Last Name | Emp Full Name | Manager Name | Employer Name | DateTimeStamp
New York | Shelby | Manhattan | Paul | Young | Paul Young | | American Bank of New York | 1/1/2005 11:23:31 AM
Florida | Jefferson | Panama City | Chris | Davis | Chris Davis | Paul Young | American Bank of Florida | 1/1/2005 11:23:31 AM
California | Montgomery | San Hose | Louis | Johnson | Louis Johnson | Paul Young | American Bank of California | 1/1/2005 11:23:31 AM
New Jersey | Hudson | Jersey City | Sam | Mathew | Sam Mathew | Paul Young | American Bank of New Jersey | 1/1/2005 11:23:31 AM
New York | Shelby | Manhattan | Nancy | Robinson | Nancy Robinson | Paul Young | American Bank of New York | 1/1/2005 11:23:31 AM
Florida | Jefferson | Panama City | Sheela | Shellum | Sheela Shellum | Chris Davis | American Bank of Florida | 1/1/2005 11:23:31 AM
California | Montgomery | Shelby | Jeff | Bill | Jeff Bill | Louis Johnson | American Bank of California | 1/1/2005 11:23:31 AM
New Jersey | Hudson | Jersey City | John | Burrell | John Burrell | Sam Mathew | American Bank of New Jersey | 1/1/2005 11:23:31 AM

Next, we discuss how to resolve these problems in order to design a good relational data model.

After discussion with business analysts, the data modeler can come to the following conclusions regarding the grouping of, and relationships between, the data. These conclusions play a vital role in designing the data model as well as in expanding it for future scope.

• Many cities can be in one county. City names will be unique across the country.
• Many counties can be in one state. County names will be unique across the country.
• Many states can be in the USA. State names will be unique across the country.
• One employee can work with many branches at the same time.
• Some employees may not have a manager.

In order to implement the above decisions, relational data modeling is done in the following manner.

• To achieve normalization, the relevant attributes are grouped into employee, employer lookup, state lookup, county lookup, and city lookup tables.
• To validate the data in the employee table, the employee table is connected to the state, county, and city lookups. Whenever state, county, or city data is entered in the employee table, it is checked against the respective lookup table and only correct data is stored. Hence there is no need to carry the redundant state, county, and city lookup data in the employee table.
• All tables are identified by primary keys (PK), so data can be uniquely identified in each table.
• Records can be inserted or updated directly in the respective lookup table. For example, if a state name changes, the change is made only in the state lookup and does not affect other tables such as employee.
• Since one employee can work in many branches at the same time, the table EmployeeEmployerXREF has been created to resolve the many-to-many relationship.
• Since an employee can also be a manager, the column "manager identifier" has been added as a foreign key to the column employee identifier. The "manager identifier" column contains the employee identifier of the manager, and it may contain null values. For example, Paul Young is the topmost person and does not have a manager.
• A new column DateTimeStamp has been added to all tables. It records the date and time when the row was inserted or updated.

The completed relational data model is shown in Figure 1.5, and the corresponding data stored in the database are shown in separate tables below.

Example of Relational Data Model: Figure 1.5


State Lookup

State Code | State Name | DateTimeStamp
NY | New York | 1/1/2005 11:23:31 AM
FL | Florida | 1/1/2005 11:23:31 AM
CA | California | 1/1/2005 11:23:31 AM
NJ | New Jersey | 1/1/2005 11:23:31 AM

County Lookup

County Code | County Name | DateTimeStamp
NYSH | Shelby | 1/1/2005 11:23:31 AM
FLJE | Jefferson | 1/1/2005 11:23:31 AM
CAMO | Montgomery | 1/1/2005 11:23:31 AM
NJHU | Hudson | 1/1/2005 11:23:31 AM

City Lookup

City Code | City Name | DateTimeStamp
NYSHMA | Manhattan | 1/1/2005 11:23:31 AM
FLJEPC | Panama City | 1/1/2005 11:23:31 AM
CAMOSH | San Hose | 1/1/2005 11:23:31 AM
NJHUJC | Jersey City | 1/1/2005 11:23:31 AM

Employee

Emp Id | State Code | County Code | City Code | Manager Id | Emp First Name | Emp Last Name | Emp Full Name | DateTimeStamp
1 | NY | NYSH | NYSHMA | | Paul | Young | Paul Young | 1/1/2005 11:23:31 AM
2 | FL | FLJE | FLJEPC | 1 | Chris | Davis | Chris Davis | 1/1/2005 11:23:31 AM
3 | CA | CAMO | CAMOSH | 1 | Louis | Johnson | Louis Johnson | 1/1/2005 11:23:31 AM
4 | NJ | NJHU | NJHUJC | 1 | Sam | Mathew | Sam Mathew | 1/1/2005 11:23:31 AM
5 | NY | NYSH | NYSHMA | 1 | Nancy | Robinson | Nancy Robinson | 1/1/2005 11:23:31 AM
6 | FL | FLJE | FLJEPC | 2 | Sheela | Shellum | Sheela Shellum | 1/1/2005 11:23:31 AM
7 | CA | CAMO | CAMOSH | 3 | Jeff | Bill | Jeff Bill | 1/1/2005 11:23:31 AM
8 | NJ | NJHU | NJHUJC | 4 | John | Burrell | John Burrell | 1/1/2005 11:23:31 AM

Employer Lookup

Employer Id | Employer Name | DateTimeStamp
1001 | American Bank of New York | 1/1/2005 11:23:31 AM
1002 | American Bank of Florida | 1/1/2005 11:23:31 AM
1003 | American Bank of California | 1/1/2005 11:23:31 AM
1004 | American Bank of New Jersey | 1/1/2005 11:23:31 AM

Employee Employer XREF

Employee Id | Employer Id | DateTimeStamp
1 | 1001 | 1/1/2005 11:23:31 AM
2 | 1002 | 1/1/2005 11:23:31 AM
3 | 1003 | 1/1/2005 11:23:31 AM
4 | 1004 | 1/1/2005 11:23:31 AM
5 | 1001 | 1/1/2005 11:23:31 AM
6 | 1002 | 1/1/2005 11:23:31 AM
7 | 1003 | 1/1/2005 11:23:31 AM
8 | 1004 | 1/1/2005 11:23:31 AM
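The normalized design can be tried out with a small script. The sketch below uses Python's built-in `sqlite3` module and loads a subset of the rows shown above. The physical table and column names (for example `STATE_LKP`, `MGR_ID`) are assumptions that follow the naming guidelines in this document, and the DateTimeStamp column and the first and last name columns are omitted to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.executescript("""
CREATE TABLE STATE_LKP    (ST_CD TEXT PRIMARY KEY, ST_NAME TEXT NOT NULL);
CREATE TABLE COUNTY_LKP   (CNTY_CD TEXT PRIMARY KEY, CNTY_NAME TEXT NOT NULL);
CREATE TABLE CITY_LKP     (CITY_CD TEXT PRIMARY KEY, CITY_NAME TEXT NOT NULL);
CREATE TABLE EMPLOYER_LKP (EMPLR_ID INTEGER PRIMARY KEY, EMPLR_NAME TEXT NOT NULL);

CREATE TABLE EMPLOYEE (
    EMP_ID        INTEGER PRIMARY KEY,
    ST_CD         TEXT NOT NULL REFERENCES STATE_LKP (ST_CD),
    CNTY_CD       TEXT NOT NULL REFERENCES COUNTY_LKP (CNTY_CD),
    CITY_CD       TEXT NOT NULL REFERENCES CITY_LKP (CITY_CD),
    MGR_ID        INTEGER REFERENCES EMPLOYEE (EMP_ID),  -- NULL for the topmost person
    EMP_FULL_NAME TEXT NOT NULL
);

-- Resolves the many-to-many relationship between employees and employers.
CREATE TABLE EMPLOYEE_EMPLOYER_XREF (
    EMP_ID   INTEGER NOT NULL REFERENCES EMPLOYEE (EMP_ID),
    EMPLR_ID INTEGER NOT NULL REFERENCES EMPLOYER_LKP (EMPLR_ID),
    PRIMARY KEY (EMP_ID, EMPLR_ID)
);
""")

conn.executemany("INSERT INTO STATE_LKP VALUES (?, ?)",
                 [("NY", "New York"), ("FL", "Florida")])
conn.executemany("INSERT INTO COUNTY_LKP VALUES (?, ?)",
                 [("NYSH", "Shelby"), ("FLJE", "Jefferson")])
conn.executemany("INSERT INTO CITY_LKP VALUES (?, ?)",
                 [("NYSHMA", "Manhattan"), ("FLJEPC", "Panama City")])
conn.executemany("INSERT INTO EMPLOYER_LKP VALUES (?, ?)",
                 [(1001, "American Bank of New York"),
                  (1002, "American Bank of Florida")])
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?, ?, ?)", [
    (1, "NY", "NYSH", "NYSHMA", None, "Paul Young"),
    (2, "FL", "FLJE", "FLJEPC", 1, "Chris Davis"),
    (6, "FL", "FLJE", "FLJEPC", 2, "Sheela Shellum"),
])
conn.executemany("INSERT INTO EMPLOYEE_EMPLOYER_XREF VALUES (?, ?)",
                 [(1, 1001), (2, 1002), (6, 1002)])

# Rebuild the flat source view: the self-join on EMPLOYEE supplies the manager name.
rows = conn.execute("""
    SELECT e.EMP_FULL_NAME, s.ST_NAME, c.CITY_NAME, m.EMP_FULL_NAME, r.EMPLR_NAME
    FROM EMPLOYEE e
    JOIN STATE_LKP s ON s.ST_CD = e.ST_CD
    JOIN CITY_LKP c ON c.CITY_CD = e.CITY_CD
    LEFT JOIN EMPLOYEE m ON m.EMP_ID = e.MGR_ID
    JOIN EMPLOYEE_EMPLOYER_XREF x ON x.EMP_ID = e.EMP_ID
    JOIN EMPLOYER_LKP r ON r.EMPLR_ID = x.EMPLR_ID
    ORDER BY e.EMP_ID
""").fetchall()

for row in rows:
    print(row)
```

The `LEFT JOIN` on the manager keeps Paul Young in the result even though his manager identifier is null, and each state, county, and city name is stored exactly once in its lookup table.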

Dimensional Data Modeling

Dimensional data modeling comprises one or more dimension tables and fact tables. Good examples of dimensions are location, product, time, promotion, organization, etc. Dimension tables store records related to that particular dimension, and no facts (measures) are stored in these tables.

For example, the product dimension table stores information about products (product category, product sub-category, product, and product features), and the location dimension table stores information about location (country, state, county, city, zip). A fact (measure) table contains measures (sales gross value, total units sold) and dimension columns. These dimension columns are actually foreign keys from the respective dimension tables.

Example of Dimensional Data Model: Figure 1.6

In the example (Figure 1.6), the sales fact table is connected to the dimensions location, product, time, and organization. Data can be sliced across all dimensions, and it can also be aggregated across multiple dimensions. "Sales Dollar" in the sales fact table can be calculated across all dimensions independently or in a combined manner, as explained below.

• Sales Dollar value for a particular product
• Sales Dollar value for a product in a location
• Sales Dollar value for a product in a year within a location
• Sales Dollar value for a product in a year within a location sold or serviced by an employee
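The first three combinations above can be sketched as queries against a small star schema. This example uses Python's built-in `sqlite3` module; the table names, column names, and sales figures are invented for illustration, and the organization dimension is left out to keep the sketch short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PRODUCT_DIM  (PRODUCT_DIM_ID INTEGER PRIMARY KEY, PRODUCT_NAME TEXT);
CREATE TABLE LOCATION_DIM (LOCATION_DIM_ID INTEGER PRIMARY KEY, STATE_NAME TEXT);
CREATE TABLE TIME_DIM     (TIME_DIM_ID INTEGER PRIMARY KEY, CAL_YEAR INTEGER);

-- The fact table holds the measure plus one foreign key per dimension.
CREATE TABLE SALES_FCT (
    PRODUCT_DIM_ID  INTEGER REFERENCES PRODUCT_DIM (PRODUCT_DIM_ID),
    LOCATION_DIM_ID INTEGER REFERENCES LOCATION_DIM (LOCATION_DIM_ID),
    TIME_DIM_ID     INTEGER REFERENCES TIME_DIM (TIME_DIM_ID),
    SALES_DOLLAR    INTEGER
);

INSERT INTO PRODUCT_DIM  VALUES (1, 'Nike'), (2, 'Arrow');
INSERT INTO LOCATION_DIM VALUES (1, 'New York'), (2, 'Florida');
INSERT INTO TIME_DIM     VALUES (1, 2004), (2, 2005);
INSERT INTO SALES_FCT VALUES
    (1, 1, 1, 100), (1, 1, 2, 150), (1, 2, 2, 200), (2, 1, 2, 50);
""")


def sales_dollar(product, state=None, year=None):
    """Sum SALES_DOLLAR for a product, optionally narrowed by state and year."""
    sql = """
        SELECT COALESCE(SUM(f.SALES_DOLLAR), 0)
        FROM SALES_FCT f
        JOIN PRODUCT_DIM p  ON p.PRODUCT_DIM_ID = f.PRODUCT_DIM_ID
        JOIN LOCATION_DIM l ON l.LOCATION_DIM_ID = f.LOCATION_DIM_ID
        JOIN TIME_DIM t     ON t.TIME_DIM_ID = f.TIME_DIM_ID
        WHERE p.PRODUCT_NAME = ?
          AND (? IS NULL OR l.STATE_NAME = ?)
          AND (? IS NULL OR t.CAL_YEAR = ?)
    """
    return conn.execute(sql, (product, state, state, year, year)).fetchone()[0]


print(sales_dollar("Nike"))                    # 450
print(sales_dollar("Nike", "New York"))        # 250
print(sales_dollar("Nike", "New York", 2005))  # 150
```

Every query joins the fact table directly to the dimensions it needs, which is what makes slicing by any combination of dimensions straightforward in a star schema.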

In dimensional data modeling, the hierarchies for a dimension are stored in the dimension table itself. For example, the location dimension holds its whole hierarchy from country, state, and county down to city. There is no need for the individual hierarchical lookups such as country lookup, state lookup, county lookup, and city lookup to be shown in the model.

Uses of Dimensional Data Modeling

Dimensional data modeling is used for calculating summarized data. For example, sales data could be collected on a daily basis and then aggregated to the week level, the week data could be aggregated to the month level, and so on. The data can then be referred to as aggregate data. Aggregation is synonymous with summarization, and aggregate data is synonymous with summary data. The performance of dimensional data models can be significantly increased when materialized views are used. A materialized view is a precomputed table comprising aggregated or joined data from fact and possibly dimension tables; it is also known as a summary or aggregate table.
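The roll-up from daily to monthly data can be sketched as follows. SQLite, used here through Python's built-in `sqlite3` module, has no materialized views, so this example builds an ordinary summary table with `CREATE TABLE ... AS SELECT` as a stand-in; the table names and figures are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DAILY_SALES_FCT (SALE_DATE TEXT, SALES_DOLLAR INTEGER);
INSERT INTO DAILY_SALES_FCT VALUES
    ('2005-01-01', 100), ('2005-01-02', 50), ('2005-02-01', 75);

-- Precompute the monthly aggregate once; reports then read the small table.
CREATE TABLE MONTHLY_SALES_STAT AS
SELECT strftime('%Y-%m', SALE_DATE) AS SALE_MONTH,
       SUM(SALES_DOLLAR)            AS SALES_DOLLAR
FROM DAILY_SALES_FCT
GROUP BY strftime('%Y-%m', SALE_DATE);
""")

monthly = conn.execute(
    "SELECT SALE_MONTH, SALES_DOLLAR FROM MONTHLY_SALES_STAT ORDER BY SALE_MONTH"
).fetchall()
print(monthly)  # [('2005-01', 150), ('2005-02', 75)]
```

Unlike a materialized view in databases that support one, a hand-built summary table like this has to be rebuilt or updated by the load process whenever the fact table changes.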

Dimension Table

A dimension table is one that describes the business entities of an enterprise, represented as hierarchical, categorical information such as time, departments, locations, and products. Dimension tables are sometimes called lookup or reference tables.

Location Dimension

In relational data modeling, for normalization purposes, the country lookup, state lookup, county lookup, and city lookup are not merged into a single table. In dimensional data modeling (star schema), these tables are merged into a single table called LOCATION DIMENSION for performance and for slicing the data. This location dimension helps to compare the sales in one region with those in another region. We may see good sales profit in one region and a loss in another. If it is a loss, the reasons may be a new competitor in that area, a failure of our marketing strategy, etc.

Example of Location Dimension: Figure 1.8

In the above example, the location part of the dimensional data model diagram is shown for easy understanding. It shows that all the lookups (country, state, county, and city) are connected to the single location dimension. Below are the data stored in each table found in the location part. Dimension tables are explained in detail under the section Dimensions.

Country Lookup

Country Code | Country Name | DateTimeStamp
USA | United States Of America | 1/1/2005 11:23:31 AM

State Lookup

State Code | State Name | DateTimeStamp
NY | New York | 1/1/2005 11:23:31 AM
FL | Florida | 1/1/2005 11:23:31 AM
CA | California | 1/1/2005 11:23:31 AM
NJ | New Jersey | 1/1/2005 11:23:31 AM

County Lookup

County Code | County Name | DateTimeStamp
NYSH | Shelby | 1/1/2005 11:23:31 AM
FLJE | Jefferson | 1/1/2005 11:23:31 AM
CAMO | Montgomery | 1/1/2005 11:23:31 AM
NJHU | Hudson | 1/1/2005 11:23:31 AM

City Lookup

City Code | City Name | DateTimeStamp
NYSHMA | Manhattan | 1/1/2005 11:23:31 AM
FLJEPC | Panama City | 1/1/2005 11:23:31 AM
CAMOSH | San Hose | 1/1/2005 11:23:31 AM
NJHUJC | Jersey City | 1/1/2005 11:23:31 AM

Location Dimension

Location Dimension Id | Country Name | State Name | County Name | City Name | DateTimeStamp
1 | USA | New York | Shelby | Manhattan | 1/1/2005 11:23:31 AM
2 | USA | Florida | Jefferson | Panama City | 1/1/2005 11:23:31 AM
3 | USA | California | Montgomery | San Hose | 1/1/2005 11:23:31 AM
4 | USA | New Jersey | Hudson | Jersey City | 1/1/2005 11:23:31 AM
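One way to picture the merge is as a join over the lookup tables that is stored as a single wide table. The sketch below uses Python's built-in `sqlite3` module with two of the sample locations. It assumes that the hierarchy can be read from the code prefixes (the first two characters of a city code are the state code, the first four the county code), which holds for the sample codes but is not stated in the source; a real model would normally carry explicit parent keys. The physical names are also assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE COUNTRY_LKP (CNTRY_CD TEXT PRIMARY KEY, CNTRY_NAME TEXT);
CREATE TABLE STATE_LKP   (ST_CD TEXT PRIMARY KEY, ST_NAME TEXT);
CREATE TABLE COUNTY_LKP  (CNTY_CD TEXT PRIMARY KEY, CNTY_NAME TEXT);
CREATE TABLE CITY_LKP    (CITY_CD TEXT PRIMARY KEY, CITY_NAME TEXT);

INSERT INTO COUNTRY_LKP VALUES ('USA', 'United States Of America');
INSERT INTO STATE_LKP   VALUES ('NY', 'New York'), ('FL', 'Florida');
INSERT INTO COUNTY_LKP  VALUES ('NYSH', 'Shelby'), ('FLJE', 'Jefferson');
INSERT INTO CITY_LKP    VALUES ('NYSHMA', 'Manhattan'), ('FLJEPC', 'Panama City');

-- One denormalized row per city; the id is a generated surrogate key.
CREATE TABLE LOCATION_DIM (
    LOCATION_DIM_ID INTEGER PRIMARY KEY,
    CNTRY_NAME TEXT,
    ST_NAME    TEXT,
    CNTY_NAME  TEXT,
    CITY_NAME  TEXT
);

-- Assumption: parent codes are prefixes of the city code.
INSERT INTO LOCATION_DIM (CNTRY_NAME, ST_NAME, CNTY_NAME, CITY_NAME)
SELECT n.CNTRY_NAME, s.ST_NAME, c.CNTY_NAME, t.CITY_NAME
FROM CITY_LKP t
JOIN COUNTY_LKP c ON c.CNTY_CD = substr(t.CITY_CD, 1, 4)
JOIN STATE_LKP s  ON s.ST_CD = substr(t.CITY_CD, 1, 2)
CROSS JOIN COUNTRY_LKP n;
""")

location_rows = conn.execute("""
    SELECT CNTRY_NAME, ST_NAME, CNTY_NAME, CITY_NAME
    FROM LOCATION_DIM
    ORDER BY CITY_NAME
""").fetchall()

for row in location_rows:
    print(row)
```

After the merge, a query that needs the state and the city of a sale reads one dimension row instead of joining several lookup tables, which is the performance argument made above.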

Relational data modeling is used in OLTP systems, which are transaction oriented, and dimensional data modeling is used in OLAP systems, which are analytical. In a data warehouse environment, the staging area is designed on OLTP concepts, since data has to be normalized, cleansed, and profiled before it is loaded into a data warehouse or data mart. In an OLTP environment, lookups are stored as independent tables in detail, whereas these independent tables are merged into a single dimension in an OLAP environment such as a data warehouse.

Relational vs Dimensional

Relational Data Modeling | Dimensional Data Modeling
Data is stored in RDBMS | Data is stored in RDBMS or multidimensional databases
Tables are units of storage | Cubes are units of storage
Data is normalized and used for OLTP. Optimized for OLTP processing | Data is denormalized and used in data warehouse and data mart. Optimized for OLAP
Several tables and chains of relationships among them | Few tables, and fact tables are connected to dimensional tables
Volatile (several updates) and time variant | Non volatile and time invariant
SQL is used to manipulate data | MDX is used to manipulate data
Detailed level of transactional data | Summary of bulky transactional data (aggregates and measures) used in business decisions
Normal reports | User friendly, interactive, drag and drop multidimensional OLAP reports

Product Dimension

In a relational data model, for normalization purposes, the product category lookup, product sub-category lookup, product lookup, and product feature lookup are not merged into a single table. In dimensional data modeling (star schema), these tables are merged into a single table called PRODUCT DIMENSION for performance and for slicing the data.

Example of Product Dimension: Figure 1.9

Product Category Lookup

| Product Category Code | Product Category Name | DateTimeStamp |
|---|---|---|
| 1 | Apparel | 1/1/2005 11:23:31 AM |
| 2 | Shoe | 1/1/2005 11:23:31 AM |

Product Sub-Category Lookup

| Product Sub-Category Code | Product Sub-Category Name | DateTimeStamp |
|---|---|---|
| 11 | Shirt | 1/1/2005 11:23:31 AM |
| 12 | Trouser | 1/1/2005 11:23:31 AM |
| 13 | Casual | 1/1/2005 11:23:31 AM |
| 14 | Formal | 1/1/2005 11:23:31 AM |

Product Lookup

| Product Code | Product Name | DateTimeStamp |
|---|---|---|
| 1001 | Van Heusen | 1/1/2005 11:23:31 AM |
| 1002 | Arrow | 1/1/2005 11:23:31 AM |
| 1003 | Nike | 1/1/2005 11:23:31 AM |
| 1004 | Adidas | 1/1/2005 11:23:31 AM |

Product Feature Lookup

| Product Feature Code | Product Feature Description | DateTimeStamp |
|---|---|---|
| 10001 | Van-M | 1/1/2005 11:23:31 AM |
| 10002 | Van-L | 1/1/2005 11:23:31 AM |
| 10003 | Arr-XL | 1/1/2005 11:23:31 AM |
| 10004 | Arr-XXL | 1/1/2005 11:23:31 AM |
| 10005 | Nike-8 | 1/1/2005 11:23:31 AM |
| 10006 | Nike-9 | 1/1/2005 11:23:31 AM |
| 10007 | Adidas-10 | 1/1/2005 11:23:31 AM |
| 10008 | Adidas-11 | 1/1/2005 11:23:31 AM |

Product Dimension

| Product Dimension Id | Product Category Name | Product Sub-Category Name | Product Name | Product Feature Desc | DateTimeStamp |
|---|---|---|---|---|---|
| 100001 | Apparel | Shirt | Van Heusen | Van-M | 1/1/2005 11:23:31 AM |
| 100002 | Apparel | Shirt | Van Heusen | Van-L | 1/1/2005 11:23:31 AM |
| 100003 | Apparel | Shirt | Arrow | Arr-XL | 1/1/2005 11:23:31 AM |
| 100004 | Apparel | Shirt | Arrow | Arr-XXL | 1/1/2005 11:23:31 AM |
| 100005 | Shoe | Casual | Nike | Nike-8 | 1/1/2005 11:23:31 AM |
| 100006 | Shoe | Casual | Nike | Nike-9 | 1/1/2005 11:23:31 AM |
| 100007 | Shoe | Casual | Adidas | Adidas-10 | 1/1/2005 11:23:31 AM |
| 100008 | Shoe | Casual | Adidas | Adidas-11 | 1/1/2005 11:23:31 AM |
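As a rough illustration of how the four product lookups could be flattened into one PRODUCT DIMENSION table, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names are hypothetical, and the parent-code columns that link each lookup to the level above it are assumptions, since the lookup tables shown here do not list them. The placement of Trouser under Apparel and Formal under Shoe is likewise assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized lookups. The parent-code columns (category_code,
-- sub_category_code, product_code) are assumptions added for the join.
CREATE TABLE product_category_lkp (
    category_code INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE product_sub_category_lkp (
    sub_category_code INTEGER PRIMARY KEY, sub_category_name TEXT,
    category_code INTEGER);
CREATE TABLE product_lkp (
    product_code INTEGER PRIMARY KEY, product_name TEXT,
    sub_category_code INTEGER);
CREATE TABLE product_feature_lkp (
    feature_code INTEGER PRIMARY KEY, feature_desc TEXT,
    product_code INTEGER);

INSERT INTO product_category_lkp VALUES (1, 'Apparel'), (2, 'Shoe');
INSERT INTO product_sub_category_lkp VALUES
    (11, 'Shirt', 1), (12, 'Trouser', 1), (13, 'Casual', 2), (14, 'Formal', 2);
INSERT INTO product_lkp VALUES
    (1001, 'Van Heusen', 11), (1002, 'Arrow', 11),
    (1003, 'Nike', 13), (1004, 'Adidas', 13);
INSERT INTO product_feature_lkp VALUES
    (10001, 'Van-M', 1001), (10002, 'Van-L', 1001),
    (10003, 'Arr-XL', 1002), (10004, 'Arr-XXL', 1002),
    (10005, 'Nike-8', 1003), (10006, 'Nike-9', 1003),
    (10007, 'Adidas-10', 1004), (10008, 'Adidas-11', 1004);

-- The denormalized star-schema dimension: one row per product feature.
CREATE TABLE product_dimension (
    product_dimension_id INTEGER PRIMARY KEY,
    category_name TEXT, sub_category_name TEXT,
    product_name TEXT, feature_desc TEXT);
""")

# Join the lookups from the most detailed level (feature) up to category.
joined = conn.execute("""
    SELECT c.category_name, s.sub_category_name, p.product_name, f.feature_desc
    FROM product_feature_lkp f
    JOIN product_lkp p ON p.product_code = f.product_code
    JOIN product_sub_category_lkp s ON s.sub_category_code = p.sub_category_code
    JOIN product_category_lkp c ON c.category_code = s.category_code
    ORDER BY f.feature_code
""").fetchall()

# Assign surrogate keys starting at 100001, mirroring the sample data.
conn.executemany(
    "INSERT INTO product_dimension VALUES (?, ?, ?, ?, ?)",
    [(100001 + i, *row) for i, row in enumerate(joined)],
)

dimension_rows = conn.execute(
    "SELECT * FROM product_dimension ORDER BY product_dimension_id").fetchall()

# Slicing by category needs no joins once the dimension is flattened.
rows_per_category = conn.execute(
    "SELECT category_name, COUNT(*) FROM product_dimension "
    "GROUP BY category_name ORDER BY category_name").fetchall()
```

Sub-categories without products (Trouser, Formal) drop out of the inner joins, which is consistent with the eight rows in the sample dimension.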

Organization Dimension

In a relational data model, for normalization purposes, the corporate office lookup, region lookup, branch lookup, and employee lookup are not merged into a single table. In a dimensional data model (star schema), these tables would be merged into a single table called ORGANIZATION DIMENSION for performance and for slicing data.

This dimension helps us find the products sold or serviced within the organization by the employees. In any industry, we can calculate sales by region, by branch and by employee. Based on performance, an organization can provide incentives to employees and subsidies to branches to increase sales further.

Example of Organization Dimension: Figure 1.10

Corporate Lookup

| Corporate Code | Corporate Name | DateTimeStamp |
|---|---|---|
| CO | American Bank | 1/1/2005 11:23:31 AM |

Region Lookup

| Region Code | Region Name | DateTimeStamp |
|---|---|---|
| SE | South East | 1/1/2005 11:23:31 AM |
| MW | Mid West | 1/1/2005 11:23:31 AM |

Branch Lookup

| Branch Code | Branch Name | DateTimeStamp |
|---|---|---|
| FLTM | Florida-Tampa | 1/1/2005 11:23:31 AM |
| ILCH | Illinois-Chicago | 1/1/2005 11:23:31 AM |

Employee Lookup

| Employee Code | Employee Name | DateTimeStamp |
|---|---|---|
| E1 | Paul Young | 1/1/2005 11:23:31 AM |
| E2 | Chris Davis | 1/1/2005 11:23:31 AM |

Organization Dimension

| Organization Dimension Id | Corporate Name | Region Name | Branch Name | Employee Name | DateTimeStamp |
|---|---|---|---|---|---|
| 1 | American Bank | South East | Florida-Tampa | Paul Young | 1/1/2005 11:23:31 AM |
| 2 | American Bank | Mid West | Illinois-Chicago | Chris Davis | 1/1/2005 11:23:31 AM |
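To illustrate how the organization dimension supports sales calculations at different levels, the following sketch joins it to a fact table and groups by a chosen level. The `sales_fact` table, its columns, and the sales amounts are invented for illustration; only the two dimension rows come from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE organization_dimension (
    org_dim_id INTEGER PRIMARY KEY,
    corporate_name TEXT, region_name TEXT,
    branch_name TEXT, employee_name TEXT);
INSERT INTO organization_dimension VALUES
    (1, 'American Bank', 'South East', 'Florida-Tampa', 'Paul Young'),
    (2, 'American Bank', 'Mid West', 'Illinois-Chicago', 'Chris Davis');

-- Hypothetical fact table; amounts are made-up whole dollars.
CREATE TABLE sales_fact (
    org_dim_id INTEGER REFERENCES organization_dimension (org_dim_id),
    sales_amount INTEGER);
INSERT INTO sales_fact VALUES (1, 100), (1, 250), (2, 400);
""")

# Levels of the dimension that sales may be rolled up to.
LEVELS = {"corporate_name", "region_name", "branch_name", "employee_name"}


def sales_by(level):
    """Total sales grouped by one level of the organization dimension."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    # The level name is validated against LEVELS before being interpolated.
    sql = (
        f"SELECT d.{level}, SUM(f.sales_amount) "
        "FROM sales_fact f "
        "JOIN organization_dimension d ON d.org_dim_id = f.org_dim_id "
        f"GROUP BY d.{level} ORDER BY d.{level}"
    )
    return conn.execute(sql).fetchall()


region_totals = sales_by("region_name")
corporate_totals = sales_by("corporate_name")
```

Because every level sits in the same dimension row, switching from a regional to a branch or employee rollup only changes the grouping column.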

Time Dimension

In a relational data model, for normalization purposes, the year lookup, quarter lookup, month lookup, and week lookup are not merged into a single table. In a dimensional data model (star schema), these tables would be merged into a single table called TIME DIMENSION for performance and for slicing data.

This dimension helps to find the sales made on a daily, weekly, monthly and yearly basis. We can perform a trend analysis by comparing this year's sales with the previous year's, or this week's sales with the previous week's.
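The attributes of a time dimension row can usually be derived from the calendar date alone. The sketch below is one possible derivation, not the method of any particular tool. The function name and dictionary keys are hypothetical, and two conventions are assumptions: days of the week are numbered with Sunday as 1, and the week number counts seven-day blocks from January 1.

```python
from datetime import date

MONTH_NAMES = (
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
)


def time_dimension_row(time_dim_id, cal_date):
    """Build one time dimension row from a calendar date."""
    day_of_year = cal_date.timetuple().tm_yday
    return {
        "time_dim_id": time_dim_id,
        "year_no": cal_date.year,
        "day_of_year": day_of_year,
        "quarter_no": f"Q{(cal_date.month - 1) // 3 + 1}",
        "month_no": cal_date.month,
        "month_name": MONTH_NAMES[cal_date.month - 1],
        "month_day_no": cal_date.day,
        # Assumption: weeks are seven-day blocks starting on January 1.
        "week_no": (day_of_year - 1) // 7 + 1,
        # Assumption: Sunday = 1 ... Saturday = 7.
        "day_of_week": cal_date.isoweekday() % 7 + 1,
        "cal_date": cal_date,
    }


feb_2004 = time_dimension_row(2, date(2004, 2, 1))
jan_2005 = time_dimension_row(3, date(2005, 1, 1))
```

With these conventions, February 1, 2004 comes out as day 32 of the year, week 5, day of week 1 (Sunday). Other numbering schemes, such as ISO weeks, would give different week numbers.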

Example of Time Dimension: Figure 1.11

Year Lookup

| Year Id | Year Number | DateTimeStamp |
|---|---|---|
| 1 | 2004 | 1/1/2005 11:23:31 AM |
| 2 | 2005 | 1/1/2005 11:23:31 AM |

Quarter Lookup

| Quarter Number | Quarter Name | DateTimeStamp |
|---|---|---|
| 1 | Q1 | 1/1/2005 11:23:31 AM |
| 2 | Q2 | 1/1/2005 11:23:31 AM |
| 3 | Q3 | 1/1/2005 11:23:31 AM |
| 4 | Q4 | 1/1/2005 11:23:31 AM |

Month Lookup

| Month Number | Month Name | DateTimeStamp |
|---|---|---|
| 1 | January | 1/1/2005 11:23:31 AM |
| 2 | February | 1/1/2005 11:23:31 AM |
| 3 | March | 1/1/2005 11:23:31 AM |
| 4 | April | 1/1/2005 11:23:31 AM |
| 5 | May | 1/1/2005 11:23:31 AM |
| 6 | June | 1/1/2005 11:23:31 AM |
| 7 | July | 1/1/2005 11:23:31 AM |
| 8 | August | 1/1/2005 11:23:31 AM |
| 9 | September | 1/1/2005 11:23:31 AM |
| 10 | October | 1/1/2005 11:23:31 AM |
| 11 | November | 1/1/2005 11:23:31 AM |
| 12 | December | 1/1/2005 11:23:31 AM |

Week Lookup

| Week Number | Day of Week | DateTimeStamp |
|---|---|---|
| 1 | Sunday | 1/1/2005 11:23:31 AM |
| 1 | Monday | 1/1/2005 11:23:31 AM |
| 1 | Tuesday | 1/1/2005 11:23:31 AM |
| 1 | Wednesday | 1/1/2005 11:23:31 AM |
| 1 | Thursday | 1/1/2005 11:23:31 AM |
| 1 | Friday | 1/1/2005 11:23:31 AM |
| 1 | Saturday | 1/1/2005 11:23:31 AM |
| 2 | Sunday | 1/1/2005 11:23:31 AM |
| 2 | Monday | 1/1/2005 11:23:31 AM |
| 2 | Tuesday | 1/1/2005 11:23:31 AM |
| 2 | Wednesday | 1/1/2005 11:23:31 AM |
| 2 | Thursday | 1/1/2005 11:23:31 AM |
| 2 | Friday | 1/1/2005 11:23:31 AM |
| 2 | Saturday | 1/1/2005 11:23:31 AM |

Time Dimension

| Time Dim Id | Year No | Day Of Year | Quarter No | Month No | Month Name | Month Day No | Week No | Day of Week | Cal Date | DateTimeStamp |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2004 | 1 | Q1 | 1 | January | 1 | 1 | 5 | 1/1/2004 | 1/1/2005 11:23:31 AM |
| 2 | 2004 | 32 | Q1 | 2 | February | 1 | 5 | 1 | 2/1/2004 | 1/1/2005 11:23:31 AM |
| 3 | 2005 | 1 | Q1 | 1 | January | 1 | 1 | 7 | 1/1/2005 | 1/1/2005 11:23:31 AM |
| 4 | 2005 | 32 | Q1 | 2 | February | 1 | 5 | 3 | 2/1/2005 | 1/1/2005 11:23:31 AM |

Slowly Changing Dimensions

Dimensions that change over time are called Slowly Changing Dimensions. For instance, a product's price changes over time, people change their names, and country and state names may change. These are a few examples of Slowly Changing Dimensions, since changes happen to them over a period of time.

Slowly Changing Dimensions are often categorized into three types: Type 1, Type 2 and Type 3. The following section deals with how to capture and handle these changes over time.

The "Product" table below contains a product named Product1, with Product ID as the primary key. In the year 2004, the price of Product1 was $150; over time, its price changes from $150 to $350. With this information, let us explain the three types of Slowly Changing Dimensions.

Product Price in 2004:

| Product ID(PK) | Year | Product Name | Product Price |
|---|---|---|---|
| 1 | 2004 | Product1 | $150 |

Type 1: Overwriting the old values. In the year 2005, if the price of the product changes to $250, then the old values of the columns "Year" and "Product Price" are updated and replaced with the new values. With Type 1, there is no way to find out the price of "Product1" in 2004, since the table now contains only the new price and year information.

Product

| Product ID(PK) | Year | Product Name | Product Price |
|---|---|---|---|
| 1 | 2005 | Product1 | $250 |
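A Type 1 change amounts to a plain in-place update. The following minimal sketch uses Python's sqlite3 module; the table layout follows the example above, while the snake_case column names and the function name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product ("
    " product_id INTEGER PRIMARY KEY,"
    " year INTEGER, product_name TEXT, product_price INTEGER)"
)
conn.execute("INSERT INTO product VALUES (1, 2004, 'Product1', 150)")


def apply_type1_change(conn, product_id, year, price):
    """Type 1: overwrite the old year and price; no history is kept."""
    conn.execute(
        "UPDATE product SET year = ?, product_price = ? WHERE product_id = ?",
        (year, price, product_id),
    )


apply_type1_change(conn, 1, 2005, 250)
rows_after_type1 = conn.execute("SELECT * FROM product").fetchall()
```

After the update the table holds a single row for Product1, and the 2004 price of $150 can no longer be recovered from it.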

Type 2: Creating an additional record. With Type 2, the old values are not replaced; instead, a new row containing the new values is added to the product table. So at any point in time, the old and new values can be retrieved and easily compared. This is very useful for reporting purposes.

Product

| Product ID(PK) | Year | Product Name | Product Price |
|---|---|---|---|
| 1 | 2004 | Product1 | $150 |
| 1 | 2005 | Product1 | $250 |

The problem with the above data structure is that "Product ID" cannot hold the same value twice for "Product1", since "Product ID" is the primary key. Also, the current data structure does not specify the effective date and expiry date of each row, that is, when the change to the price of Product1 happened. So it is better to change the data structure to avoid the primary key violation.

Product

| Product ID(PK) | Effective DateTime(PK) | Year | Product Name | Product Price | Expiry DateTime |
|---|---|---|---|---|---|
| 1 | 01-01-2004 12.00PM | 2004 | Product1 | $150 | 12-31-2004 11.59PM |
| 1 | 01-01-2005 12.00PM | 2005 | Product1 | $250 | |

In the changed Product table's data structure, "Product ID" and "Effective DateTime" together form a composite primary key, so there is no violation of the primary key constraint. The new columns "Effective DateTime" and "Expiry DateTime" record when each version of the product became effective and when it expired, which adds clarity and enhances the scope of this table. The Type 2 approach may need additional space in the database, since an additional row has to be stored for every changed record. Since dimensions are usually not that big in the real world, the additional space is negligible.
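The Type 2 structure with a composite primary key can be sketched as follows. This is an illustrative outline under stated assumptions: the column and function names are hypothetical, timestamps are stored as ISO-style text so that string comparison matches chronological order, and a NULL expiry is assumed to mark the current row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE product (
    product_id         INTEGER NOT NULL,
    effective_datetime TEXT    NOT NULL,
    year               INTEGER,
    product_name       TEXT,
    product_price      INTEGER,
    expiry_datetime    TEXT,  -- assumption: NULL marks the current row
    PRIMARY KEY (product_id, effective_datetime)
)
""")


def apply_type2_change(conn, product_id, effective, year, name, price,
                       previous_expiry=None):
    """Type 2: expire the current row (if any) and add a new row."""
    with conn:  # both statements commit together or not at all
        conn.execute(
            "UPDATE product SET expiry_datetime = ? "
            "WHERE product_id = ? AND expiry_datetime IS NULL",
            (previous_expiry, product_id),
        )
        conn.execute(
            "INSERT INTO product VALUES (?, ?, ?, ?, ?, NULL)",
            (product_id, effective, year, name, price),
        )


def price_as_of(conn, product_id, when):
    """Return the price that was effective at the given timestamp."""
    row = conn.execute(
        "SELECT product_price FROM product "
        "WHERE product_id = ? AND effective_datetime <= ? "
        "AND (expiry_datetime IS NULL OR ? <= expiry_datetime)",
        (product_id, when, when),
    ).fetchone()
    return row[0] if row else None


apply_type2_change(conn, 1, "2004-01-01 00:00:00", 2004, "Product1", 150)
apply_type2_change(conn, 1, "2005-01-01 00:00:00", 2005, "Product1", 250,
                   previous_expiry="2004-12-31 23:59:59")

history_count = conn.execute("SELECT COUNT(*) FROM product").fetchone()[0]
price_mid_2004 = price_as_of(conn, 1, "2004-06-15 00:00:00")
price_mid_2005 = price_as_of(conn, 1, "2005-06-15 00:00:00")
```

Both versions of Product1 coexist because the primary key spans the product id and the effective timestamp, and a date-range lookup returns the price that applied at any moment.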

Type 3: Creating new fields. With Type 3, only the latest change to the values can be seen. The example below illustrates how to add new columns to keep track of the change. From it, we can see both the current price and the previous price of Product1.

Product

| Product ID(PK) | Current Year | Product Name | Current Product Price | Old Product Price | Old Year |
|---|---|---|---|---|---|
| 1 | 2005 | Product1 | $250 | $150 | 2004 |

The problem with the Type 3 approach is that if the product price keeps changing over the years, the complete history is not stored; only the latest change is kept. For example, if the price of Product1 changes to $350 in 2006, we would no longer be able to see the 2004 price, since the old-value columns would have been overwritten with the 2005 information.

Product

| Product ID(PK) | Year | Product Name | Product Price | Old Product Price | Old Year |
|---|---|---|---|---|---|
| 1 | 2006 | Product1 | $350 | $250 | 2005 |
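The Type 3 behaviour, including the loss of older history, can be reproduced with a single UPDATE that shifts the current values into the "old" columns. The sketch below uses Python's sqlite3 module; column and function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product ("
    " product_id INTEGER PRIMARY KEY,"
    " year INTEGER, product_name TEXT, product_price INTEGER,"
    " old_product_price INTEGER, old_year INTEGER)"
)
conn.execute(
    "INSERT INTO product VALUES (1, 2004, 'Product1', 150, NULL, NULL)")


def apply_type3_change(conn, product_id, year, price):
    """Type 3: move current values to the old columns, then overwrite.

    The right-hand sides of SET see the row as it was before the update,
    so old_product_price and old_year receive the previous values.
    """
    conn.execute(
        "UPDATE product SET"
        " old_product_price = product_price,"
        " old_year = year,"
        " product_price = ?,"
        " year = ?"
        " WHERE product_id = ?",
        (price, year, product_id),
    )


apply_type3_change(conn, 1, 2005, 250)
row_2005 = conn.execute("SELECT * FROM product").fetchone()

apply_type3_change(conn, 1, 2006, 350)
row_2006 = conn.execute("SELECT * FROM product").fetchone()
```

After the second change the row holds only the 2006 and 2005 values; the 2004 price of $150 is gone, which is the limitation described above.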

Data Modeler Role

Business Requirement Analysis:
» Interact with Business Analysts to get the functional requirements.
» Interact with end users and find out the reporting needs.
» Conduct interviews and brainstorming discussions with the project team to get additional requirements.
» Gather accurate data by data analysis and functional analysis.

Development of data model:
» Create a standard abbreviation document for logical, physical and dimensional data models.
» Create logical, physical and dimensional data models (data warehouse data modeling).
» Document logical, physical and dimensional data models (data warehouse data modeling).

Reports:
» Generate reports from the data model.

Review:
» Review the data model with the functional and technical teams.

Creation of database:
» Create SQL code from the data model and coordinate with DBAs to create the database.
» Check that data models and databases are in sync.

Support & Maintenance:
» Assist developers, the ETL and BI teams, and end users in understanding the data model.
» Maintain a change log for each data model.

Data Modeling Report

With data modeling tools, reports can be easily generated for technical and business needs. Reports generated from the logical data model and the physical data model are called business reports and technical reports, respectively. Most data modeling tools provide default reports such as subject area reports, entity reports, attribute reports, table reports, column reports, indexing reports, relationship reports etc. The advantage of these reports is that everybody, technical or non-technical, can understand what is going on within the organization.

Other than the default reports provided by data modeling tools, a data modeler can also create customized reports as per the needs of an organization. For example, if an expert asks for both the logical and physical reports of a particular subject area in one file (e.g., in .xls), the logical and physical reports can be merged and generated accordingly. Data modeling tools provide sorting and filtering options, and the reports can be exported into file formats like .xls, .doc, .xml etc.

Logical Data Model Report: The Logical Data Model Report describes information about the business such as entity names, attribute names, definitions, business rules, mapping information etc.

Logical Data Model Report Example:

Physical Data Model Report: The Physical Data Model Report describes information such as the ownership of the database, physical characteristics of the database (in Oracle: tablespaces, extents, segments, blocks, partitions etc.), performance tuning (processors, indexing), table names, column names, data types, relationships between the tables, constraints, abbreviations, derivation rules, glossary, data dictionary, etc., and is used by the technical team.
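To give a feel for what a simple column-level technical report contains, the sketch below builds one from a database catalog. It does not reproduce the output of any data modeling tool: it assumes an SQLite database, reads the catalog through `sqlite_master` and `PRAGMA table_info`, and renders CSV as a stand-in for the export formats mentioned above. The function names and the sample tables are hypothetical.

```python
import csv
import io
import sqlite3

FIELDS = ["table", "column", "data_type", "nullable", "primary_key"]


def column_report(conn):
    """List every column of every table with its type and key flags."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    rows = []
    for table in tables:
        # table_info yields: cid, name, type, notnull, dflt_value, pk
        for _cid, column, data_type, notnull, _default, pk in conn.execute(
                f'PRAGMA table_info("{table}")'):
            rows.append({
                "table": table,
                "column": column,
                "data_type": data_type,
                "nullable": "NO" if notnull else "YES",
                "primary_key": "YES" if pk else "NO",
            })
    return rows


def to_csv(rows):
    """Render report rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample schema used only to exercise the report.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Country (
    Country_Code VARCHAR(10) NOT NULL,
    Country_Name VARCHAR(50) NOT NULL,
    PRIMARY KEY (Country_Code));
CREATE TABLE Bank (
    Bank_Code VARCHAR(10) NOT NULL,
    Bank_Name VARCHAR(50),
    Country_Code VARCHAR(10) NOT NULL,
    PRIMARY KEY (Bank_Code));
""")

report = column_report(conn)
report_csv = to_csv(report)
```

Sorting or filtering the `report` list before rendering would correspond to the sorting and filtering options that modeling tools offer.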

Physical Data Model Report Example:

Data Modeling Tools

There are a number of data modeling tools that transform business requirements into a logical data model, and a logical data model into a physical data model. From the physical data model, these tools can be instructed to generate SQL code for creating the database.

Popular Data Modeling Tools

| Tool Name | Company Name |
|---|---|
| Erwin | Computer Associates |
| Embarcadero | Embarcadero Technologies |
| Rational Rose | IBM Corporation |
| Power Designer | Sybase Corporation |
| Oracle Designer | Oracle Corporation |

Data Modeling Tools: What to Learn?

Data modeling tools are the means by which powerful data models are created. The following are the options we need to know and learn in a data modeling tool before starting to build data models.

Software:
» How to install the data modeling tool on server/client?

Logical Data Model:
» How to create an entity and add a definition and business rule?
» How to create domains?
» How to create an attribute and add a definition, business rule, and validation rules like default values and check constraints?
» How to create supertypes and subtypes?
» How to create primary keys, unique constraints, foreign key relationships, and recursive relationships?
» How to create identifying and non-identifying relationships?
» How to assign relationship cardinality?
» How to phrase a relationship connecting two tables?
» How to assign role names?
» How to create key groups?
» How to create sequence numbers?

Physical Data Model:
» How to rename a table?
» How to rename a column and validation rules like default and check constraints?
» How to assign NULL and NOT NULL to columns?
» How to name foreign key constraints?
» How to connect to databases like MS Access, Oracle, Sybase, Teradata etc.?
» How to generate SQL code from a data model to run against databases like MS Access, Oracle, Sybase, Teradata etc.?
» How to create a data model from an existing database like MS Access, Oracle, Sybase, Teradata etc.?
» How to add database-related properties to tables and indexes?
» How to check different versions of the data model?
» How many data modelers can concurrently work on the same version of a data model?

Dimensional Data Model:
» Is there any specific notation to identify Data Warehouse/Data Mart data models?

Subject Area:
» How to create a subject area and assign relevant entities to it?

Reports:
» How to generate reports from a data model and export them to .XLS, .DOC, .XML file formats?

Naming Options:
» Is there any method to change the entity/table and attribute/column names from upper case to lower case or lower case to upper case?

Import & Export:
» How to create data models from .xls, .txt files etc.?
» How to import and export metadata into ETL tools?

Abbreviation Document:
» How to create/attach a standard abbreviation document (for naming tables, columns etc.)?

Print:
» How to send data models to a printer/plotter/Acrobat Reader?

Backup:
» How to take a backup of a data model?

Others:
» How to split a data model into logical and physical data models?
» How to copy and paste objects within a data model and across data models?
» How to search for an object within a data model?
» How to change the font size and color of entities, attributes and relationship lines?
» How to create a legend?
» How to show a data model at different levels like entity level, attribute level, and definition level?

Erwin Tutorial

All Fusion Erwin Data Modeler, commonly known as Erwin, is a powerful and leading data modeling tool from Computer Associates. Computer Associates delivers software for enterprise management, storage management, security, application life cycle management, data management and business intelligence.

Erwin makes database creation very simple by generating DDL (SQL) scripts from a data model using its Forward Engineering technique. Erwin can also be used to create data models from an existing database using its Reverse Engineering technique.

The Erwin workplace consists of the following main areas:

• Logical: In this view, the data model represents business requirements like entities, attributes etc.
• Physical: In this view, the data model represents physical structures like tables, columns, datatypes etc.
• ModelMart: Many users can work on the same data model concurrently.

What can be done with Erwin?

• Logical, physical and dimensional data models can be created.
• Data models can be created from existing systems (RDBMS, DBMS, files etc.).
• Different versions of a data model can be compared.
• A data model and a database can be compared.
• SQL scripts can be generated to create databases from a data model.
• Reports can be generated in different file formats like .html, .rtf, and .txt.
• Data models can be opened and saved in several different file types like .er1, .ert, .bpx, .xml, .ers, .sql, .cmt, .df, .dbf, and .mdb files.
• By using ModelMart, concurrent users can work on the same data model.
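Comparing a data model with a database comes down to checking the tables and columns the model expects against what the database catalog actually contains. The sketch below shows the idea only and says nothing about how Erwin implements it. It assumes an SQLite database, represents the model as a plain dictionary of table names to column-name sets, and uses hypothetical function names and sample tables.

```python
import sqlite3


def database_columns(conn):
    """Map each table in the database to the set of its column names."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    return {
        table: {row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')}
        for table in tables
    }


def compare(model, conn):
    """Return a sorted list of differences between model and database."""
    db = database_columns(conn)
    diffs = []
    for table in sorted(model.keys() | db.keys()):
        if table not in db:
            diffs.append(f"table {table}: missing in database")
        elif table not in model:
            diffs.append(f"table {table}: missing in model")
        else:
            for column in sorted(model[table] - db[table]):
                diffs.append(f"column {table}.{column}: missing in database")
            for column in sorted(db[table] - model[table]):
                diffs.append(f"column {table}.{column}: missing in model")
    return diffs


# Hypothetical model and a database that has drifted away from it.
model = {
    "Country": {"Country_Code", "Country_Name"},
    "Bank": {"Bank_Code", "Bank_Name", "Country_Code"},
}
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Country (Country_Code TEXT PRIMARY KEY, Country_Name TEXT);
CREATE TABLE Bank (Bank_Code TEXT PRIMARY KEY, Bank_Name TEXT);
CREATE TABLE Audit_Log (Entry_Id INTEGER PRIMARY KEY);
""")

differences = compare(model, conn)
```

In this sample, the database lacks the `Country_Code` column on `Bank` and contains an `Audit_Log` table the model does not know about, so both show up in `differences`.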

In order to create data models in Erwin, you need to have All Fusion Erwin Data Modeler installed on your system. If you have installed ModelMart, then more than one user can work on the same model.

How to create a Logical Data Model:

The following section explains, step by step, how to create a simple logical data model with two entities and the relationship between them.

1: Open the All Fusion Erwin Data Modeler software.
2: Select the view as "Logical" from the drop-down list. By default, Logical will be your workplace.
3: Click New from the File menu. Select the option "Logical/Physical" from the displayed wizard. Click OK.
4: To create an entity, click the "Entity" icon and drop it on the workplace. By default, E/1 will be displayed as the entity name. Change it to "Country".
5: To create an attribute, place the cursor on the entity "Country" and right-click it. From the displayed menu, click Attributes, which takes you to the attribute wizard. Click the "New" button on the wizard and type the attribute name "Country Code". Select the data type "String" and click OK. Select the Primary Key option to identify the attribute "Country Code" as the primary key. Follow the same approach to create another attribute, "Country Name", without selecting the primary key option. Click OK; you will now have two attributes, Country Code and Country Name, under the entity "Country" in the current logical workplace.
6: Create another entity, "Bank", with two attributes, Bank Code and Bank Name, by following steps 4 and 5.
7: To relate the two entities, Country and Bank, a foreign key relationship must be created. To create a foreign key relationship, follow these steps:
(a) Click the symbol "Non Identifying Relationship".
(b) Place the cursor on the entity "Country".
(c) Place the cursor on the entity "Bank".
Now you can see the relationship (a line drawn from Bank to Country) between "Country" and "Bank". Double-click that relationship line to open the "Relationships wizard" and change the option from "Nulls Allowed" to "No Nulls", since a bank should have a country code.

The Logical Data Model created by following the above steps looks similar to the following diagram.

How to create a Physical Data Model:

1: Change the view from "Logical" to "Physical" using the drop-down list.
2: Click "Database" in the main menu and then click "Choose Database" in the submenu. Then select the target database server where the database is to be created. Click OK.
3: Place the cursor on the table "Country" and right-click it. From the displayed menu, click Columns, which takes you to the column wizard. Click the "Database" tab, which is next to the "General" tab, and assign the datatypes VARCHAR2(10) and VARCHAR2(50) to the columns COUNTRY_CODE and COUNTRY_NAME respectively. Change the default NULL to NOT NULL for the column COUNTRY_NAME. Repeat this step for the BANK table. Once you have done all of this, you can see the physical version of the logical data model in the current workplace.

The Physical Data Model created by following the above steps looks similar to the following diagram.

How to generate DDL(sql) scripts to create a database:

1: Select the view as Physical from the drop-down list.
2: Click "Tools" in the main menu and then click "Forward Engineer/Schema Generation" in the submenu, which takes you to the "Schema Generation Wizard". Select the properties that satisfy your database requirements, like schema, table, primary key etc. Click Preview to see your scripts. You can either generate the tables in a database directly or store the scripts and run them against the database later.

The DDL (SQL) scripts generated by Erwin by following the above steps look similar to the following script.

CREATE TABLE Country(Country_Code VARCHAR2(10) NOT NULL, Country_Name VARCHAR2(50) NOT NULL, CONSTRAINT PK_Country PRIMARY KEY (Country_Code));

CREATE TABLE Bank(Bank_Code VARCHAR2(10) NOT NULL, Bank_Name VARCHAR2(50) NOT NULL, Country_Code VARCHAR2(10) NOT NULL, CONSTRAINT PK_Bank PRIMARY KEY(Bank_Code) );

ALTER TABLE Bank ADD( CONSTRAINT FK_Bank FOREIGN KEY (Country_Code) REFERENCES Country );
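To see what the generated constraints enforce, the script can be tried against a throwaway database. The sketch below is an adaptation for SQLite through Python's sqlite3 module, not the Oracle script itself: SQLite cannot add a constraint with ALTER TABLE, so FK_Bank is declared inside CREATE TABLE, and foreign key checking has to be switched on explicitly. The helper function and the sample rows are hypothetical.

```python
import sqlite3

DDL = """
CREATE TABLE Country(
    Country_Code VARCHAR2(10) NOT NULL,
    Country_Name VARCHAR2(50) NOT NULL,
    CONSTRAINT PK_Country PRIMARY KEY (Country_Code));

CREATE TABLE Bank(
    Bank_Code VARCHAR2(10) NOT NULL,
    Bank_Name VARCHAR2(50) NOT NULL,
    Country_Code VARCHAR2(10) NOT NULL,
    CONSTRAINT PK_Bank PRIMARY KEY (Bank_Code),
    -- Declared inline because SQLite has no ALTER TABLE ... ADD CONSTRAINT.
    CONSTRAINT FK_Bank FOREIGN KEY (Country_Code)
        REFERENCES Country (Country_Code));
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript(DDL)

with conn:
    conn.execute("INSERT INTO Country VALUES ('US', 'United States')")


def try_insert_bank(conn, bank_code, bank_name, country_code):
    """Insert a bank row; return False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO Bank VALUES (?, ?, ?)",
                (bank_code, bank_name, country_code),
            )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert_bank(conn, "B1", "Bank One", "US"),    # valid country
    try_insert_bank(conn, "B2", "Bank Two", "XX"),    # unknown country
    try_insert_bank(conn, "B3", "Bank Three", None),  # missing country
]
```

Only the first insert is accepted: the foreign key rejects a country code that does not exist in Country, and the "No Nulls" choice made in the relationship wizard surfaces as the NOT NULL on Bank.Country_Code.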

Note: This is not a complete tutorial on Erwin. We will add more tips and guidelines on Erwin in the near future. To learn more about Erwin, visit its official website, www.ca.com.
