Sagent Data Flow Solution

6.8 POPULATION GUIDE

Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., One Global View, Troy, New York 12180-8399.

©2012 Pitney Bowes Software Inc. All rights reserved. MapInfo, Group 1 Software, and Sagent Data Flow Solution are trademarks of Pitney Bowes Software Inc. Information Studio and Sagent are registered trademarks of Pitney Bowes Software, Inc. The Sagent Logo, Sagent Design Studio, Flashcube, Data Flow, and StarMart are trademarks of Pitney Bowes Software, Inc. All other trademarks are property of their respective companies. All other marks and trademarks are property of their respective holders. Any provisions of this license related to ICU that differ from the IBM Public License 1.0 are offered by Pitney Bowes Software, Inc. alone and not by any other party. The Source Code of the ICU program is available from Pitney Bowes Software, Inc. upon written request. Further information regarding the ICU Program may be found at: http://oss.software.ibm.com/icu/. Portions Copyright ©1996, Microsoft Corporation. All rights reserved. This software uses Xerces-J Copyright ©1999 The Apache Software Foundation. All rights reserved.

Contact information for all Pitney Bowes Software Inc. offices is located at: http://www.pbinsight.com/about/contact-us. April 2012

Table of Contents

Chapter 1: Preface
    Who Should Read this Manual
    Conventions
    If You Need Help
        Contact Information
        PBBI’s Website
        Education
    Before You Begin
    How to Proceed
Chapter 2: Introduction
    About Data Marts
    Star Schema Structure
        Fact Tables
        Dimension Tables
        Joins
    Advantages of the Star Schema
    About the Sagent Environment
        Sagent Design Studio
        Sagent Server
    Understanding the Data Movement Processes
        Extracting
        Transforming
        Loading
    Defining Your Sagent Data Mart
        Design Phase
        Development Phase
        Production Phase
Chapter 3: Preparing Your Environment
    Overview
    Preparing the Sagent Environment
        Permissions
        Repository Configurations
    Preparing Your Source Environment
        Identifying Data Sources
        Obtaining Permissions and Privileges
        Analyzing Existing Data
        Cleaning Source Data

    Preparing Your Data Mart Environment
        Obtaining Permissions and Privileges
        Backing Up the Data Mart
        Creating Star Schema Tables
Chapter 4: Defining Sources and Data Marts
    About this Tutorial
    Representing the Data Source Structure
        Creating BaseViews and MetaViews
        Creating Join Groups
    Representing the Data Mart Structure
        Creating BaseViews
Chapter 5: Creating Population Plans
    About Population Plans
    Displaying the Data Flow Editor
    Creating Population Plans
        Creating the Time Dimension Plan
        Creating Plans for Other Dimension Tables
        Creating a Plan for the Fact Table
    Using Flat File Sources
Chapter 6: Scheduling Population Plans
    About Scheduling
        Schedule Intervals
        Population Plan Sequence
        Plan Dependencies
    Using Design Studio Scheduler
        Scheduling a Plan
        Viewing Scheduled Plans
        Removing a Scheduled Plan
        Receiving Notification
        Viewing the Event Log
Chapter 7: Advanced Population Options
    Optimizing Load Performance
        Changed Data Capture and Scrubbing Source Data
        Timestamping the Data
        Appending New Fact Records
        Appending New Dimension Records (Type II Records)
        Updating Existing Records (Type I Records)
        Using a Third Party Tool

    Auditing Population Plans
    Enhancing Query Performance on the Star Schema
    Managing Indexes and Audit Trails
Index

Chapter 1: Preface

In this chapter:
• Who Should Read this Manual
• Conventions
• If You Need Help
• Before You Begin
• How to Proceed

Note: Throughout this manual, the term “Windows” refers to all Microsoft Windows operating systems (Windows 2000, Windows XP, Windows Server 2003, Windows Server 2008, etc.) unless otherwise specified.

Who Should Read this Manual

This manual is intended for data designers and database administrators who are responsible for population tasks. You must know your particular database environment and understand the concepts of programming and database schema design. The instructions in this manual assume you are familiar with Sagent Design Studio™ and the concepts presented in the Sagent Design Studio User’s Guide.

Conventions

The following conventions are used in this document:
• Keys on the keyboard are referred to by name, such as Escape and Enter.
• Commands you type or items you click with the mouse are printed in bold, such as clicking Update or typing yes.
• Machine names or other variables that you must provide are referred to using angle brackets. Do not type the brackets (<>).
• Paths, file names, and code appear in Courier font, such as C:\Sagent.

If You Need Help

The Customer Support e-mail address is [email protected]. Put the product name, Sagent Solution, in the subject line.

Contact Information

Technical Support US
Hours: Monday - Friday from 08:00 – 20:00 EST, excluding US Holidays
Phone: 800 367 6950 or +1 301 731 2316 (if dialing from outside the US)

Technical Support Europe, Middle East, Africa (Excluding Germany)
Hours: Monday - Friday from 09:00 – 17:30 GMT, excluding UK Bank Holidays
Phone: Inside the UK: 1 800 840 0001 (Option 1 > Option 3)
Outside the UK for Sagent: +44 1923 279103

Technical Support Germany
Hours: Monday - Thursday from 09:00 – 18:00 CET, excluding German Holidays; Friday from 09:00 – 17:00
Phone: +49 89 462 387 55

Technical Support APAC, Australia and New Zealand (excluding Japan)
Australia
Hours: Monday – Friday from 08:00 - 18:00 AEST, excluding Public Holidays
Phone: 1 800 648 899
New Zealand (Critchlow Pty)
Phone: 0800 MAPPING (0800 627 7464)
Email: [email protected]
Web: www.critchlow.co.nz

Technical Support Japan
Hours: Monday - Friday from 09:00 - 18:00 JST, excluding Holidays
Phone: +81 3 5468 6991
Email: [email protected] (Group1 and Sagent)

If you are having difficulty with a particular function, first review the section of this manual describing that function. If you are still unable to correct the problem, please note:
• The release, version, and modification level numbers from the Data Flow Admin software
• The task you are trying to accomplish
• Any error messages
Next, contact a PBBI Customer Support Representative, who will use this information to determine the source of the difficulty. Reporting complete details to Customer Support will enable you and your company to pinpoint the problem and resolve it more quickly. You may send an e-mail or fax a copy of your reports to speed up problem resolution.


PBBI’s Website

You can also obtain technical support through the PBBI website: http://www.pbinsight.com/support/online-support-services. Registered users can obtain electronic copies of product announcements, download software, find out about our training classes, or sign up for PBBI’s List Services. To enter the Sagent technical support area of the website, you must have a password and user ID. You can obtain an ID and password by contacting PBBI’s Technical Support department by phone or e-mail, or by visiting: http://www.g1.com/support. In the right panel of the site, click the hyperlinked text “Do you need your User ID and password?” and follow the instructions provided.

Education

PBBI offers comprehensive training courses on many of its products, allowing you to get the most out of your software. PBBI offers access to its courses in three ways: public seminars, instructor-led online classes, and on-site training. You can choose the option that best fits your needs.
• For information about public seminars or online classes, visit our website at http://www.pbinsight.com/support/training/
• For information about on-site training, contact the PBBI University Manager, [email protected].

Before You Begin

You require the Sagent Solution to populate your data mart or data warehouse. For information on how to install Sagent software, see the Sagent Installation Instructions included with your Sagent CD. The Sagent Administrator’s Guide contains additional information on setting up and managing your Sagent environment.

How to Proceed

This guide provides:
• Concepts of data mart population
• A tutorial that describes how to use Sagent Design Studio to populate a data mart or data warehouse.
Follow each chapter in sequence to create and populate a Sagent Data Mart. The following summarizes the content of each chapter:
• Chapter 2, “Introduction”, introduces concepts and tasks necessary for you to design and populate your own Sagent Data Mart.
• Chapter 3, “Preparing Your Environment”, provides an overview of the preliminary tasks you must complete before you create and run Plans to populate your data mart.
• Chapter 4, “Defining Sources and Data Marts”, is a tutorial that describes how to create BaseViews, MetaViews, and joins for data sources and targets.
• Chapter 5, “Creating Population Plans”, is a tutorial that describes how to create Plans to populate dimension and fact tables for a star schema data mart.
• Chapter 6, “Scheduling Population Plans”, is a tutorial that describes how to schedule Plans using Design Studio Scheduler.
• Chapter 7, “Advanced Population Options”, describes optional procedures you can complete while preparing your environment to enhance extraction and load performance.

Chapter 2: Introduction

This chapter introduces the concepts and tasks necessary for you to design and populate your own Sagent Data Mart. In this chapter:
• About Data Marts
• Star Schema Structure
• Advantages of the Star Schema
• About the Sagent Environment
• Understanding the Data Movement Processes
• Defining Your Sagent Data Mart

About Data Marts

A data mart is an online analytical processing (OLAP), or decision support, database that gets data from various sources, such as online transaction processing (OLTP) databases, files and other proprietary systems.
• OLTP system—A database optimized for modification (inserting, updating and deleting records) of data. For an example of an OLTP database, see the samples_creditcard_oltp database schema diagram in Figure 1 on page 36.
• OLAP system—A database optimized for data retrieval (querying). The information is often used to make business decisions, and this kind of database is often referred to as a decision support database. For an example of an OLAP database, see the samples_creditcard_star database schema diagram in Figure 2 on page 37.
Table 1 summarizes the differences between OLTP and OLAP.

Table 1: OLTP versus OLAP (Decision Support) Databases

    OLTP                                        OLAP or Decision Support
    ------------------------------------------  ------------------------------------------
    Online Transaction Processing               Online Analytical Processing
    Order Entry                                 Sales and Marketing
    Billing                                     Department Managers
    Entity Relation Model                       Star Schema or Dimensional Model
    Highly normalized                           De-normalized
    Many tables                                 Fewer tables
    Many join paths                             Fewer join paths
    Complex schema                              Simple schema
    Data is time-variant, frequently updated    Data is a snapshot, non-volatile,
                                                infrequently updated, or updated
                                                according to a schedule
    Structure is activity oriented              Structure is subject oriented
    Many users                                  Fewer users
    Optimized for inserts, updates, deletes     Optimized for queries

Star Schema Structure

A star schema is composed of:
• Fact Tables
• Dimension Tables
• Joins
Star schemas have a subject-oriented design. Data is stored according to logical business relationships, instead of how it was entered into the relational database. In a star schema, data is stored as either facts or dimensional attributes. Data is organized according to how it is measured and whether it changes over time. Facts change regularly; dimensions change slowly or never.

[Figure: Star Schema Model]

Fact Tables

Fact tables are the central tables in the star schema of your data mart. Fact tables usually contain numeric or quantitative information (called measures) that you want to access quickly. Fact tables are populated with data extracted from a data source. The data source can be an OLTP system or a data warehouse. A snapshot of the source data is regularly extracted and moved to the data mart, usually at the same time every day, week or month.


For example, if you have a data mart that you use to generate a report on company revenue, you would have dollar_sales, and dollar_cost as columns within your fact table. Fact tables also contain a set of columns that form a concatenated, or composite key. Each column of the concatenated key is a foreign key drawn from a dimension table primary key. Typically, facts are numeric, continuously valued and additive. A continuously valued fact is a numeric measurement that has a value every time it is measured. A fact is additive if it makes sense to add the measurement across the dimensions. Most queries against a fact table access thousands, or hundreds of thousands, of records to construct a result set of relatively few rows. It is helpful if you can compress these records into the result set by adding them or performing other mathematical operations. The level of detail in a fact table is called the grain. Every row in the fact table must be recorded to the same level of detail. In the figure, “Star Schema Model” on page 17, the measurements in the fact table are daily totals of sales in dollars, sales in units, and cost in dollars of each product sold. The grain is daily item totals. Each record in the fact table represents the total sales of a specific product in a retail store on one day. Each new combination of product, store or day generates a different record in the fact table. A star schema can have multiple fact tables. Use a schema with multiple fact tables to separate sets of measurements that share a common subset of dimension tables, or to track measurements with different grains. All operations you can perform in a Sagent Data Mart are supported for schemas with multiple fact tables.
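For illustration, a fact table like the SALES_FACT table in the “Star Schema Model” figure might be created with DDL along these lines. The table and column names echo the examples in this chapter; the exact types, and the sample schema shipped on the Sagent CD, may differ:

```sql
-- Sketch of a daily-grain fact table: one row per product, per store, per day.
CREATE TABLE sales_fact (
    product_key   INTEGER NOT NULL REFERENCES prod_dimension (product_key),
    store_key     INTEGER NOT NULL REFERENCES store_dimension (store_key),
    time_key      INTEGER NOT NULL REFERENCES time_dimension (time_key),
    dollar_sales  DECIMAL(12,2),        -- additive, continuously valued measures
    unit_sales    INTEGER,
    dollar_cost   DECIMAL(12,2),
    PRIMARY KEY (product_key, store_key, time_key)   -- the concatenated key
);
```

Each column of the primary key is a foreign key drawn from a dimension table, and together the key columns enforce the grain: one record per combination of product, store and day.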

Dimension Tables

Dimension tables store data that describe the information in the fact table. Dimension tables are flat and denormalized. Dimensions are used to describe the queries on the fact table. For example, if sales_total differed one month from the next you would look to the dimensions to tell you why. The same dimension table can be used with different fact tables. Dimension tables have attributes and a single part primary key that joins the dimension table to the fact table. Attributes are the columns in the dimension table. The single part primary key allows you to quickly browse a single dimension table. Browsing a dimension table can help determine the best way to query the fact table. Most star schemas include a time dimension. A time dimension table makes it possible to analyze historic data without using complex SQL calculations. For example, you can analyze your data by workdays versus holidays, weekdays versus weekends, by fiscal periods or by special events. If the grain of the fact table is daily sales, each record in the time dimension table represents a day.
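A time dimension of the kind described above might be sketched as follows. The attribute list is illustrative, not the shipped sample schema; choose attributes that match how your users analyze dates:

```sql
-- Sketch of a time dimension: one row per calendar day (daily grain).
CREATE TABLE time_dimension (
    time_key      INTEGER PRIMARY KEY,  -- single part primary key
    calendar_date DATE,
    day_of_week   VARCHAR(9),
    is_weekend    CHAR(1),              -- 'Y'/'N' supports weekday vs. weekend analysis
    is_holiday    CHAR(1),              -- supports workday vs. holiday analysis
    fiscal_period VARCHAR(8),
    special_event VARCHAR(30)
);
```

Because the flags and periods are precomputed attributes, queries can filter and group by them directly instead of performing complex SQL date calculations at query time.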

Joins

Joins define relationships between a fact table and dimension tables in the star schema. The primary key in the dimension table is the foreign key in the fact table. The fact table must contain a primary key value from each dimension table. The reference from the foreign key to the primary key is the mechanism for verifying values between the two tables. Join relationships of this type ensure the referential integrity of a data mart.


Referential integrity is an important requirement in decision support databases. Referential integrity must be maintained to ensure valid query results. The primary key of a fact table is a combination of the foreign keys it contains. This type of primary key is called a concatenated key. Each record in a dimension table can describe many records in the fact table, making the join cardinality of dimension tables to fact tables one-to-many. In the example “Star Schema Model” on page 17, PRODUCT_KEY is the primary key in the PROD_DIMENSION table and the foreign key in the SALES_FACT table. This join represents the relationship between the company’s products and its sales.
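As a sketch of how these joins are used, a query might total sales by product for weekend days only. The table and column names are hypothetical examples in the spirit of the star schema model, not the shipped sample schema:

```sql
-- Total dollar sales by product, restricted to weekend days.
SELECT p.product_name,
       SUM(f.dollar_sales) AS total_sales
FROM   sales_fact     f
JOIN   prod_dimension p ON p.product_key = f.product_key
JOIN   time_dimension t ON t.time_key    = f.time_key
WHERE  t.is_weekend = 'Y'
GROUP  BY p.product_name;
```

Each dimension joins to the fact table through exactly one key, which is what keeps star schema join paths few and unambiguous.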

Advantages of the Star Schema

The Sagent Data Mart is a high-performance decision support system, designed to work with a star schema. The star schema is a database design that allows you to take advantage of the Sagent optimization in:
– Query and load performance
– Aggregate navigation in Sagent Information Studio or Design Studio
– Skip drilling with attribute hierarchies in Sagent Analysis

Note: You can choose to implement a different type of relational database design, such as an OLTP design or a snowflake schema.

A well-designed schema allows you to quickly understand, navigate and analyze large multidimensional data sets. The main advantages of star schemas in a decision support environment are:
• Query Performance
Queries run faster against a star schema database than an OLTP system because the star schema has fewer tables and clear join paths. In a star schema design, dimensions are linked through the central fact table. Dimensions are linked with each other through one join path intersecting the fact table. This design feature enforces accurate and consistent query results.
• Load Performance and Administration
The star schema structure reduces the time required to load large batches of data into a database. By defining facts and dimensions and separating them into different tables, the impact of a load operation is reduced. Dimension tables can be populated once and occasionally refreshed. New facts can be added regularly and selectively by appending records to a fact table.
• Built-in Referential Integrity
A star schema is designed to enforce referential integrity of loaded data. Referential integrity is enforced by the use of primary and foreign keys. Primary keys in dimension tables become foreign keys in fact tables to link each record across dimension and fact tables.
• Efficient Navigation Through Data
Navigating through data is efficient because dimensions are joined through fact tables. These joins are significant because they represent fundamental relationships of real business processes. You can browse a single dimension table in order to select attribute values to construct an efficient query.

About the Sagent Environment

The Sagent Solution is made up of many pieces. Data mart population requires a subset of the Sagent Solution. To build your data mart you need the following pieces of the Sagent Solution:
• Sagent Design Studio
• Sagent Server
For more information on the Sagent Solution, see the Sagent Design Studio User’s Guide.

Sagent Design Studio

Sagent Design Studio is the client piece of the Sagent Solution. Use Design Studio to develop and implement your data mart. You can define business views for end users and extract, transform and load data into decision support databases.

To develop and populate your data mart, use Design Studio to define the following:
• BaseViews
BaseViews represent source and data mart schemas. BaseViews define tables, joins and the connection information to and from all databases necessary to create your data mart. You can create different BaseViews from the same database. For example, you can create a view for a database administrator that includes system tables, and another view that includes a subset of data tables.
• MetaViews
A MetaView is a logical layer built on top of one or more source BaseViews. MetaViews are based on objects in the BaseView. You can edit a MetaView to rearrange and delete items, and you can add tables and columns to the MetaView by dragging them from a BaseView. You can also create calculated fields in a MetaView using SQL expressions to perform calculations or establish selection constraints.
• Plans
A Plan is a set of instructions used to extract, transform and load data. Plans control data flow, population, and data cleansing. Use a Plan in Design Studio to model the process of data movement between the source and data mart. A Plan always begins with a source and ends with a sink. For example:

[Figure: a Plan connecting a source step to a sink step]

A Transform adds a step to a Plan. Transforms can be SQL scripts, built-in functions or custom programs.
• Snap
A Snap is a table that stores Plan results. A Snap lets you archive information from different points in time.


• Schedules
Plans can be scheduled to run on specified days and times without intervention. Scheduling is useful for population Plans, because most load operations require system resources typically unavailable during the business day. For information on scheduling Plans, see “Scheduling a Plan” on page 86.
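For example, a calculated field in a MetaView might be defined with a simple SQL expression such as the following sketch (the column and field names are hypothetical):

```sql
dollar_sales - dollar_cost AS gross_profit
```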

Note: If your data source is a file, you can create Plans without defining BaseViews and MetaViews.

Sagent Server

Design Studio is a server-based application that operates on top of the Sagent Load or Access Servers. A Sagent Server can be enabled for both load and access capabilities. There are two types of Sagent Servers:
• Load Server
The Load Server provides data movement and data mart population capabilities. You must have the Load Server to design and populate your data mart.
• Access Server
The Access Server provides data access.
The Sagent Server includes:
• Repository
The Repository is a database that stores metadata and Sagent-specific items, such as:
– BaseViews, MetaViews and population Plans
– Instructions for retrieving, manipulating and displaying results
You can have single or multiple Repository configurations. For more information on Repositories, see the Sagent Installation Instructions and the Sagent Administrator’s Guide.

Note: Directly altering a Repository database is not recommended. If you modify Repository data or a Repository table directly, items can disappear or you might receive errors.

• Data Flow Service
The Data Flow Service brokers all client-server communication in a Sagent environment. The Data Flow Service can be configured for accessing or loading data, or both. The Data Flow Service must be started for Design Studio to run Plans. The Data Flow Service:
– Packages a database request
– Dynamically constructs the correct type of SQL code
– Locates the source database on the network
– Submits the request
– Returns results to the client application

Understanding the Data Movement Processes

The goal of populating a decision support database is to enhance the ease and efficiency of queries, whether you move operational data to a star schema, or stage subsets of data from a data warehouse to departmental data marts. The ETL data movement process includes:
• Extracting data from your sources
• Transforming data
• Loading data into the data mart

Extracting

Extraction is the process of moving data from your sources. Data for extraction needs to be identified and may need to be moved to another location. To extract data from databases and legacy systems compatible with Sagent connection functionality, you need to know the connect string information for a user with the correct permissions. To extract data in delimited or flat files, you must copy or move the files to the machine that hosts the Data Flow Service, or to a location that can be referenced relative to that server.

Transforming

Transformation is the process of changing or adding to data extracted from sources. Sagent Transforms are used in Plans to define what happens to source data after it is extracted and before it is loaded into your data mart. For more information on Sagent Transforms, see the Sagent Transforms Reference Guide. To help you understand and learn the population process, a sample OLTP database and a sample star schema database are included on your Sagent CD. For detailed instructions on how to populate the sample star schema tables with OLTP data, follow the tutorial in this manual.

Loading

Loading is the process of moving data into your data mart. Data is loaded by executing various Plans created in Design Studio. The data is saved in the target database. There are two types of load operations that are required at different times. Each type of load requires a separate set of population Plans:

• Initial load - Plans typically insert records into an empty table, or drop the table and recreate it. The initial load populates empty dimension and fact tables with an initial set of data. An initial load operation is simpler to perform because you do not need to integrate the load data with existing data. However, an initial load usually takes longer to complete than an incremental load because the volume of data loaded is much greater.


• Incremental load - Plans use existing tables and either append new records or update existing records. An incremental load is an update of the initial load. The goal is to load only data that is new or has changed since the last load operation. Incremental loads usually involve appending records to the fact table or tables, and carefully tracking dimension records to identify changes and additions. The interval between incremental loads depends on the granularity of data required by client users. Nightly or weekly incremental loads are the most common intervals. You should regularly evaluate changes in source data and choose the best way to refresh the data mart.
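Assuming each staged source record carries a load timestamp (see “Timestamping the Data” in Chapter 7), the append step of an incremental load is logically equivalent to SQL like the following sketch. In practice a population Plan’s source, Transforms and sink perform this work, and the table and column names here are hypothetical:

```sql
-- Append only the rows added since the last successful load.
INSERT INTO sales_fact (product_key, store_key, time_key,
                        dollar_sales, unit_sales, dollar_cost)
SELECT s.product_key, s.store_key, s.time_key,
       s.dollar_sales, s.unit_sales, s.dollar_cost
FROM   staged_source s
WHERE  s.load_timestamp > :last_load_time;  -- bookmark kept by the load process
```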

Defining Your Sagent Data Mart

The goal of your data mart is to enhance the ease and efficiency of queries. Planning ahead simplifies population tasks, minimizes load time, and helps reduce the number of records discarded due to preventable errors. Defining your data mart can be broken into the following phases:
• Design Phase
• Development Phase
• Production Phase
Each phase contains many tasks. You can undertake the phases and tasks at the same time, or in any order.

Design Phase

The goal of the design phase is research and analysis. To design a dimensional data mart, you must understand:
• the business rules of your organization
• the content of data the business collects
Design phase tasks can happen in the order that makes the most sense for your company. The tasks in the design phase are:
• Define the goal of your data mart. For example, to provide a weekly calculation of company total revenue.
• Choose a business process to model in order to identify the fact tables. A business process is a process in your organization that generates data. Examples of business processes include sales, orders, invoices, shipments, inventory and general ledger. The data from the process becomes the source for your data mart.
• Identify the sources of data for your data mart. Data sources can be flat files, databases or legacy systems.
• Analyze your existing data.
• Determine which data you want to load into your data mart database.
• Determine how often you want to update data in your data mart.


• Define your star schema.
– Choose the level of detail (grain) of each fact table. Grain is the most detailed level of data to include in the fact table for the business process. The finer the grain of each dimension, the more precisely a query can cut through the database. The grain of a fact table is usually the individual transaction, the individual line item, a daily snapshot or a monthly snapshot.
– Choose the dimensions for each fact table and their respective grains. Examples of typical dimensions are time, product, customer, promotion, transaction type, and status. For each dimension, identify all of the distinct attributes that describe the dimension. The most useful attributes are textual and discrete. A dimension can include numeric attributes, such as package size, if the attribute acts more like a description than a measurement.
– Choose the measured facts. These are the measurements that make up each fact table record. Typical measured facts are numeric additive quantities such as sales in dollars, or sales in units.

Development Phase

The goal of the development phase is to build the data mart. This phase involves using Sagent products to implement your design. To build a data mart:
• Prepare your source data.
• Prepare your data mart environment. Use SQL or database administration tools to create the fact and dimension tables of your star schema in the database that becomes your data mart.
• Create the necessary Sagent items:
– BaseViews
– MetaViews
– Plans
• Perform an initial data load to move data into the data mart database.

Production Phase

After your data mart is built, you must keep it current. To keep the data mart current, do the following on a regular basis:
• Perform incremental data loads by scheduling population Plans to run on a recurrent basis.
• Publish Plans for common queries and reports.

Chapter 3: Preparing Your Environment

This chapter provides an overview of the preliminary tasks you must complete before you create and run Plans to populate your data mart. In this chapter:

• Overview
• Preparing the Sagent Environment
• Preparing Your Source Environment
• Preparing Your Data Mart Environment

Overview

Before you can load data from source databases to data mart databases, you must prepare the:
• Sagent environment
• Source database environment
• Data mart database environment

Preparing the Sagent Environment

To prepare the Sagent Environment, consider and gather information about:

• Permissions
• Repository Configurations

Permissions

To complete the tasks required for data mart population, you must have permission to:
• View the Data Flow Editor in Design Studio
• Create BaseViews and MetaViews
• Create and schedule Plans
For more information on required permissions, see the Sagent Administrator’s Guide.

Repository Configurations

To create and access the BaseViews and MetaViews required for data mart population, you need to know the:
• Number of Repositories in your data mart
• Names of Repositories
• Content of Repositories
The two most common Repository configurations are:
• Single Repository
• Multiple Repositories

Single Repository

In a single Repository configuration, all BaseViews, MetaViews and Plans are stored in one location. The advantage of a single Repository design is ease of Repository creation and maintenance. The trade-off is that the Administrator must organize security for BaseViews, MetaViews and population Plans, because users and population processes share the Repository.

Multiple Repositories

In a multiple Repository configuration, BaseViews, MetaViews, and Plans used for population are in one Repository, and BaseViews, MetaViews and Plans for data query and access are in other Repositories. The advantage of a multiple Repository design is security: although additional effort is required to create and maintain two Repositories, users never access the BaseViews, MetaViews and Plans used for population, so configuring BaseView logins, MetaView permissions and other population-side security features is not necessary.


With multiple Repositories, you must update Registry settings to switch between Repositories. If population Plans are scheduled using the Scheduler, you cannot update Registry settings on the Design Studio machine between the time a Plan is scheduled and the time it is run.

Preparing Your Source Environment

Before you can extract source data to populate your data mart, you must prepare your source environment by:
• Identifying Data Sources
• Obtaining Permissions and Privileges
• Analyzing Existing Data
• Cleaning Source Data

Identifying Data Sources

For data mart population you must understand and specify the following for data sources:
• Location of the data sources. Know the location of the data sources, such as tables or files, that contain the data to load into the data mart.
• Content of data sources. Know which data sources contain the data to load into the data mart.
• Type of data sources. Verify that source data is in a format Design Studio can interpret. See the Sagent Installation Instructions included with your Sagent CD for a list of relational databases, text databases and ODBC data sources that Sagent supports. For non-relational sources, such as flat files or text files, vendor restrictions are not an issue. If source data is not in a supported format, an intermediate step can resolve the incompatibility. For example, you could use your database tools to extract data from an unsupported source database into a comma-separated value (CSV) file.
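The CSV work-around mentioned above can be as simple as a short export script. A sketch, assuming only that the unsupported source is reachable programmatically (here it is faked with an in-memory sqlite3 table, and the table and file names are made up):

```python
import csv
import sqlite3

# Stand-in for an unsupported source database (names are illustrative).
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
src.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 12.0)])

# Dump a query result to a comma-separated value (CSV) file that a
# Delimited Text File Source step could then read.
cur = src.execute("SELECT id, amount FROM orders ORDER BY id")
with open("orders.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow([col[0] for col in cur.description])  # header row
    writer.writerows(cur)                                 # data rows
```

The resulting file is a plain delimited text file, which is one of the source formats Design Studio supports.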

Obtaining Permissions and Privileges

Once you identify the data sources, you must have permission to:
• Select against tables that contain data to load, if source data is in a database
• Access the necessary files, if source data is in a delimited text file or flat file
For more information on required permissions and privileges, see the Sagent Administrator’s Guide.

Analyzing Existing Data

You can analyze existing data using various tools, or you can use Sagent. You need to determine:
• What data is available
• What data needs to be generated
• What data anomalies exist
• What strategy deals with those anomalies


Using Sagent to Verify Data

You can verify data before you load it into the data mart database by displaying it in the Workspace. To verify that the data you want to load is correct:
1. Open Design Studio.
2. Log in to the Repository.
3. Drag a Plan from the Plan Bin into the Workspace.
4. Open the Data Flow Editor.
5. If the sink step in the data flow Plan is not a Grid, replace the sink step with a Grid.

6. If the volume of input data is large, add a Filter or Conditional Splitter Transform to the data flow Plan. The Filter and Conditional Splitter Transforms limit the number of records displayed. For more information on these Transforms, see the Sagent Transforms Reference Guide.

7. Run the population Plan to display the data in the Workspace.
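Conceptually, a Filter step applies a row predicate between source and sink so the Grid only has to display a subset. A rough Python analogue (the record fields and predicate are made up for illustration; this is not Sagent's implementation):

```python
def filter_step(records, predicate):
    """Pass through only the records that satisfy the predicate,
    so a downstream sink (the Grid) displays a manageable subset."""
    for record in records:
        if predicate(record):
            yield record

# Hypothetical source records and a predicate limiting the output.
rows = [{"state": "NY", "amount": 10}, {"state": "CA", "amount": 7}]
ny_only = list(filter_step(rows, lambda r: r["state"] == "NY"))
```

A Conditional Splitter works the same way except that non-matching records are routed to a second output instead of being dropped.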

Cleaning Source Data

After you analyze your source data, identify and fix inconsistencies or problems in your source systems. For more information, see “Scrubbing Source Data” on page 96.

Preparing Your Data Mart Environment

Before you load data into your data mart, you must prepare your data mart environment. If your data mart environment consists of more than one physical location, prepare each data mart. Prepare your data mart environment by:
• Obtaining Permissions and Privileges
• Backing Up the Data Mart Database
• Creating Star Schema Tables

Obtaining Permissions and Privileges

To load data into your data mart, you must have permission to:
• Insert data into data mart databases
• Bulk insert or copy, for SQL Server and Sybase data mart databases with no constraints or indexes
• Create tables, if your population Plans create tables during the load

Backing Up the Data Mart Database

Protect your data mart database in case an error during population renders it unrecoverable:
• Create a backup copy of the data mart database before running any population Plans.
• Timestamp data mart records as you load them so that changes can be backed out or recovered. For more information, see “Timestamping the Data” on page 97.
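Timestamping makes backing out a bad load a one-statement operation: delete everything stamped with that run's timestamp. A sketch of the idea (table and column names are illustrative, with sqlite3 standing in for the data mart database):

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE acct_dimension (account_id INTEGER, load_ts TEXT)")

# Stamp every record in this load run with the same timestamp...
load_ts = datetime(2012, 4, 1, tzinfo=timezone.utc).isoformat()
conn.executemany(
    "INSERT INTO acct_dimension VALUES (?, ?)",
    [(1, load_ts), (2, load_ts)],
)

# ...so that a failed run can be backed out in a single statement.
conn.execute("DELETE FROM acct_dimension WHERE load_ts = ?", (load_ts,))
```

The same column also lets you identify which records a given incremental load produced, which is useful for auditing as well as recovery.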

Creating Star Schema Tables

Extracted data is loaded into tables in your data mart. You create dimension and fact tables either:
• Before you run population Plans, using a database modeling tool, SQL scripts or an interactive SQL tool, or
• When running population Plans, using Sagent Batch Loader Transforms.

Chapter 4: Defining Sources and Data Marts

This chapter is a tutorial that describes how to create BaseViews, MetaViews and joins for data sources and targets. In this chapter:
• About this Tutorial
• Representing the Data Source Structure
• Representing the Data Mart Structure

About this Tutorial

The goals of this tutorial are to:
– Divide the population process into well-defined steps
– Show each step of the process in detail
– Illustrate data mart and population concepts in a practical situation
To complete this tutorial, you must have access to the following:
– the samples_repository database
– the samples_creditcard_oltp database
– the empty samples_creditcard_star database
In this tutorial, you use Design Studio to:
– Create a BaseView and MetaView for the data source
– Create a BaseView for the data mart database
– Create population Plans to populate data mart dimension tables
– Create a population Plan to populate the data mart fact table
This data mart population tutorial uses the sample database samples_creditcard_oltp, included on your Sagent CD, as the source database. The Sagent Installation Instructions contain instructions for installing the sample database. The sample database must be installed before you start the tutorial. Figure 1 on page 36 shows the normalized schema structure of the samples_creditcard_oltp source database. Figure 2 on page 37 shows the star schema structure of the samples_creditcard_star database.


Figure 1: The samples_creditcard_oltp database structure

[Schema diagram, not legible in this extraction: the normalized tables of samples_creditcard_oltp and the key columns that join them.]


Figure 2: The samples_creditcard_star database structure

[Schema diagram, not legible in this extraction: the samples_creditcard_star fact table and its surrounding dimension tables.]

Representing the Data Source Structure

You must define the structure of data and the connection to the data sources so population Plans can access data to extract. Define the structure of and connection to your source databases by:
• Creating BaseViews and MetaViews
• Creating Join Groups

Creating BaseViews and MetaViews

Create a BaseView and MetaView of each database that contains the source data. In this tutorial, samples_creditcard_oltp contains the source data.

To create the source BaseView and MetaView: 1. Start the Sagent Design Studio application. The Design Studio login dialog displays.

2. Type sa into the User Name field.

3. Click the Advanced button. The Design Studio login dialog expands.

4. Type samples_repository in the Database field. 5. Type the name of the server that contains the samples_repository database, for example DB_Server, in the Server Name field.

6. Click OK. The Design Studio application opens.

7. Select ToolsBaseView Editor. 8. Click Create New BaseView in the toolbar at the top of the BaseView Editor window.

Create New BaseView button

The Define BaseView dialog displays.
9. Type SourceData in the BaseView Name field.
10. Select SQL SERVER from the Database Type dropdown.
11. Select DATABASE from the Type dropdown.
12. Type the name of the server that contains the source data in the Server field.
13. Type samples_creditcard_oltp into the Database field.
14. Click the Use Standard Connection radio button.
15. Type sa into the User ID field.


16. Type the sa user password into the Password field.
17. Type dbo in the Database User field.
18. Click the Autocreate MetaView check box.
19. Type Population in the field under Autocreate MetaView.
The Define BaseView dialog looks like the following:


20. Click the Tables button. The dialog expands to show a list of database tables.

21. Click the User check box.
22. Select all tables in the Database Tables field.

23. Click OK. Design Studio creates a new BaseView named SourceData and a MetaView named Population. In the Population MetaView, tables are represented as Categories and columns are represented as Parts.

Creating Join Groups

Create join groups so that the correct data is extracted from the source database. In this OLTP schema, multiple join paths exist between the account, address, primary_cardholder and secondary_cardholder tables. To create a join group:
1. Draw joins among the tables in the SourceData BaseView as shown in Figure 1 on page 36. For more information on creating joins, see the Sagent Design Studio User’s Guide.

2. Click Edit Join Groups in the BaseView Editor.

Edit Join Groups button


The Edit Join Groups dialog displays.

3. Select the Population MetaView from the dropdown list.
4. In the BaseView Editor, press the Ctrl key while clicking all of the following joins to include them in the join group:
– address.address_id = primary_cardholder.address_id
– primary_cardholder.primary_id = account.primary_id
– secondary_cardholder.secondary_id = account.secondary_id

5. In the Edit Join Groups dialog, type Account in the field next to the Create New button.

Note: The Create New button is enabled after you type the name of the join group.

6. Click Create New. Account displays in the Edit Join Groups dialog.

7. Click Close.
8. Close the BaseView Editor.
Use the Account join group to run the population Plan for the account dimension table in samples_creditcard_star. For more information, see “Specifying the Join Group” on page 55.

Representing the Data Mart Structure

You must define the structure of and connection to your data mart so population Plans can direct extracted data to the appropriate tables in your data mart. Define the structure of and connection to your data mart with a BaseView.

Creating BaseViews

To define the data mart BaseView:
1. Start the Sagent Design Studio application. The Design Studio login dialog displays.

2. Type sa into the User Name field.

3. Click the Advanced button. The Design Studio login dialog expands.

4. Type samples_repository in the Database field.
5. Type the name of the server that contains the samples_repository database, for example DB_Server, in the Server Name field.

6. Click OK. The Design Studio application opens.

7. Select ToolsBaseView Editor. 8. Click the Create New BaseView button in the toolbar at the top of the BaseView Editor window.

Create New BaseView button

The Define BaseView dialog displays.
9. Type Target Data in the BaseView Name field.
10. Select SQL SERVER from the Database Type dropdown.
11. Select DATABASE from the Data Dictionary Type dropdown.
12. Type the name of the server that contains the samples_creditcard_star database in the Server field, for example DB_Server.
13. Type samples_creditcard_star into the Database field.
14. Click the Use Standard Connection radio button.
15. Type sa into the User ID field.
16. Type the sa user password into the Password field.


17. Type dbo in the Database User field. The Define BaseView dialog looks like the following:

18. Click Tables. The dialog expands to show a list of available source database tables. In this example, you create the target tables in samples_creditcard_star during the load process, so the BaseView must be empty.


19. Click the User check box to remove the check. The Database Tables field is empty:

20. Click OK. 21. If tables appear in your Target Data BaseView, delete them. 22. Close the BaseView Editor.

Chapter 5: Creating Population Plans

This chapter is a continuation of the tutorial from the previous chapter. This part of the tutorial describes how to create Plans to populate dimension and fact tables for a star schema data mart. In this chapter:

• About Population Plans
• Displaying the Data Flow Editor
• Creating Population Plans

About Population Plans

Population Plans extract and transform data from the normalized structure in the data source into a star schema structure in the data mart. Population Plans use the BaseViews and MetaViews you created in Chapter 4, “Defining Sources and Data Marts” to determine the location and structure of data in the data source and data mart. You create a population Plan for:
• The time dimension table
• Each remaining dimension table
• The fact table
The steps you include in a Plan depend on the type of table you load with data. Every population Plan has:

• One or more steps to define the data source
• One or more steps to define the data mart

Note: To guarantee proper key creation, populate dimension tables before you populate fact tables.
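The reason for that ordering: the fact table stores the surrogate keys the dimension loads generate, so a fact row can only be resolved once its dimensions exist. A rough sketch of the lookup (names and values are illustrative, not the tutorial's actual data):

```python
# Surrogate keys produced by an earlier dimension load, keyed by the
# source system's natural keys (illustrative values).
acct_keys = {"A-100": 0, "A-101": 1}   # account_id -> account_key

def fact_row(source_row):
    """Resolve the natural key to the dimension's surrogate key.
    This lookup fails if the dimension was not populated first,
    which is why dimension Plans run before the fact Plan."""
    return {
        "account_key": acct_keys[source_row["account_id"]],
        "amount": source_row["amount"],
    }

row = fact_row({"account_id": "A-101", "amount": 25.0})
```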

Displaying the Data Flow Editor

Create and view Plans using the Data Flow Editor. The Data Flow Editor is the upper right pane of the Design Studio interface.

To display the Data Flow Editor:
1. Start the Sagent Design Studio application. The Design Studio login dialog displays.

2. Type sa into the User Name field.

3. Click the Advanced button. The Design Studio login dialog expands.

4. Type samples_repository in the Database field. 5. Type the name of the server that contains the samples_repository database.

6. Click OK. The Design Studio application opens.
7. Place your cursor over the handle at the top of the Workspace so that the cursor becomes a split arrow. This handle is located just below the toolbar.


8. Drag the mouse part way down the Workspace to reveal the Data Flow Editor.

[Screenshot, not legible in this extraction: the Design Studio window with the Data Flow Editor revealed; the Bin Explorer lists data sources such as Advanced XML Source and Delimited Text File Source.]

Creating Population Plans

You need different types of population Plans to populate a data mart. Populate a data mart by:
• Creating the Time Dimension Plan
• Creating Plans for Other Dimension Tables
• Creating a Plan for the Fact Table
• Using Flat File Sources

Creating the Time Dimension Plan

Use the Time Generation Transform to populate the time dimension table. Records for the time dimension table are generated independently of other data. The time dimension table must include at least one integer column for unique Julian day values, and a column of date or datetime type for unique date values. In this example, these columns are created for you. To populate the time dimension table, day_dimension:

1. Create a data flow Plan, see “Creating the Data Flow Plan” on page 49.
2. Complete the Time Generation dialog, see “Defining Time Generation” on page 50.
3. View the day_dimension table values, see “Viewing Table Values” on page 51.
4. Complete the Microsoft Batch Loader dialog, see “Defining the Microsoft Batch Loader” on page 52.

Creating the Data Flow Plan

To create a data flow Plan:
1. Open the Data Flow Editor, see “Displaying the Data Flow Editor” on page 47.
2. Click the Tool Bin button.


3. Drag the following Transforms from the Tool Bin into the Data Flow Editor:
– Time Generation Transform
– Microsoft Batch Loader Transform

Note: Each Plan includes a Batch Loader Transform that corresponds to the type of database server that contains your data mart. The Plans in this example use a Microsoft SQL Server Batch Loader Transform. For information on using other Batch Loader Transforms, see the Sagent Transforms Reference Guide.

4. To continue creating the time dimension Plan, go to “Defining Time Generation” on page 50.

Defining Time Generation

This procedure is continued from “Creating the Data Flow Plan” on page 49. To define the Time Generation dialog:
1. Double-click the Time Generation Transform. The Time Generation dialog displays.
2. Click the Start Date radio button.
3. Type 1/1/94 into the Start Date field.
4. Type 1095 into the Duration field.


5. Select all output types except Weekday. The columns are named by default. The completed dialog looks like the following:

6. Click OK. The Time Generation Transform uses the Julian Day column as a primary key, and generates a unique datetime value for each record in the Date column. 7. To continue creating the time dimension Plan, go to “Viewing Table Values” on page 51.
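The Time Generation step above can be sketched in Python: one record per calendar day, keyed by a unique integer. Here the integer key is the proleptic Gregorian ordinal, a stand-in for the Transform's Julian day numbering, which may use a different epoch; the output columns are a subset of what the dialog offers:

```python
from datetime import date, timedelta

def generate_time_dimension(start, days):
    """Yield one record per calendar day. The integer key is unique
    and increases by one per day (here: date.toordinal(), which is
    only an approximation of a true Julian day number)."""
    for offset in range(days):
        d = start + timedelta(days=offset)
        yield {
            "julian_day": d.toordinal(),  # unique integer key column
            "date": d.isoformat(),        # unique date value column
            "year": d.year,
            "month": d.month,
            "day_of_month": d.day,
        }

# A 1/1/94 start date with a 1095-day duration, as in the dialog above.
rows = list(generate_time_dimension(date(1994, 1, 1), 1095))
```

1095 days is three years' worth of records starting at 1994-01-01 (1996 being a leap year, the run ends on 1996-12-30).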

Viewing Table Values

This procedure is continued from “Defining Time Generation” on page 50. To view the generated time values for day_dimension before loading them into the data mart table:
1. Right-click on the Microsoft Batch Loader Transform. A menu displays.
2. Select Delete Step.
3. Drag a Grid Transform from the Tool Bin into the Data Flow Editor. The day_dimension table displays in the Workspace.


4. Click Update to run the Plan. The Grid displays the generated time data.

5. Replace the Grid sink with a Microsoft Batch Loader. 6. To continue creating the time dimension Plan, go to “Defining the Microsoft Batch Loader” on page 52.

Defining the Microsoft Batch Loader

This procedure is continued from “Viewing Table Values” on page 51. To define the Microsoft Batch Loader dialog:
1. Double-click the Microsoft Batch Loader Transform. The Microsoft Batch Loader dialog displays.
2. Select Target Data from the BaseView dropdown.

3. Type day_dimension in the Table field.


4. Click the Create Table radio button.

5. Click OK.

6. Click Update. The Plan runs and populates the time dimension table. A key is generated for the Julian Day column, and values for all the specified output columns are generated. You can run the Plan again when you need to generate new or additional records for the time dimension table.
7. Use SQL Server Enterprise Manager to verify that the table was added to the samples_creditcard_star database.
8. Save your Plan.

Creating Plans for Other Dimension Tables

A dimension table has a single primary key and contains detailed information for columns in the fact table. To populate the remaining dimension tables in samples_creditcard_star, see:
• Creating the acct_dimension Plan
• Creating the card_prod_dimension Plan
• Creating the household_dimension Plan
• Creating the status_dimension Plan


• Creating the trans_type_dimension Plan

Creating the acct_dimension Plan

To populate the acct_dimension:
1. Create the data flow Plan, see “Creating the Data Flow Plan” on page 54.
2. Create the acct_dimension table, see “Creating the Dimension Table” on page 54.
3. Specify the join group, see “Specifying the Join Group” on page 55.
4. Define the Key Generation dialog, see “Defining Key Generation” on page 56.
5. View the acct_dimension table values, see “Viewing Table Values” on page 56.
6. Complete the Microsoft Batch Loader dialog, see “Defining the Microsoft Batch Loader” on page 57.

Creating the Data Flow Plan

To create a data flow Plan:
1. Open the Data Flow Editor. For more information, see “Displaying the Data Flow Editor” on page 47.
2. Click the Tool Bin button.
3. Drag the following Transforms from the Tool Bin into the Data Flow Editor:
– SQL Query Transform
– Key Generation Transform
– Microsoft Batch Loader Transform

4. To continue creating the acct_dimension Plan, go to “Creating the Dimension Table” on page 54.

Creating the Dimension Table

This procedure is continued from “Creating the Data Flow Plan” on page 54. To create the acct_dimension table:
1. Click the Parts Bin button.
2. Click the Search button. A list of MetaViews displays.
3. Select the Population MetaView.


4. Expand the following Categories:
– dbo.account
– dbo.address
– dbo.primary_cardholder
– dbo.secondary_cardholder
5. Drag the following Parts from the Parts Bin into the Workspace:

Note: Parts are not displayed in the Workspace.

– account_id
– date_opened
– address
– city
– state
– zip_code
– age
– marital
– sex
– primary_last_name
– secondary_last_name

The Parts are added to the SQL Query step.
6. Double-click the SQL Query Transform. The SQL Editor dialog displays.
7. Click the Suppress Duplicate Records check box.
8. Click OK.
9. To continue creating the acct_dimension Plan, go to the next section, “Specifying the Join Group” on page 55.

Specifying the Join Group

This procedure is continued from “Creating the Dimension Table” on page 54.

To specify the join group:

1. Select ToolsJoins. The Joins dialog displays

2. Select SQL Query from the Join for step dropdown. 3. Click the Join Groups radio button.

4. Select Account to specify this join group to use for your Plan. This join group ensures that the Plan retrieves address information through the primary_cardholder table.

5. Click OK. The join group is specified. 6. To continue creating the acct_dimension Plan go to “Defining Key Generation” on page 56.


Defining Key Generation

This procedure is continued from the previous section, “Specifying the Join Group” on page 55. To define the Key Generation:
1. Double-click the Key Generation Transform. The Key Generation dialog displays.

2. Type account_key into the Key Column field. 3. Click the Start Key Value At radio button and leave the value as 0. The completed dialog looks like the following:

4. Click OK. 5. To continue creating the acct_dimension Plan go to “Viewing Table Values” on page 56.
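The Key Generation step amounts to attaching a sequential integer column to each record flowing through. A rough analogue (the Transform's exact behavior may differ; function and record names are made up):

```python
def add_surrogate_key(records, key_column, start=0):
    """Attach a sequential integer surrogate key to each record,
    mirroring a 'Start Key Value At' setting of 0 in the dialog."""
    for value, record in enumerate(records, start=start):
        yield {key_column: value, **record}

# Two hypothetical source records gain account_key values 0 and 1.
rows = list(add_surrogate_key(
    [{"account_id": 7}, {"account_id": 9}], "account_key"))
```

Because the key is just a counter, rerunning the Plan from 0 would reissue the same key values; that is one reason incremental loads need care with key starting values.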

Viewing Table Values

This procedure is continued from the previous section, “Defining Key Generation” on page 56. To view the transformed data for the dimension before loading the data mart table:
1. Right-click on the Microsoft Batch Loader Transform. A menu displays.


2. Select Delete Step. 3. Click the Tool Bin button. 4. Drag a Grid Transform from the Tool Bin into the Data Flow Editor. The dimension table displays in the Workspace.

5. Click Update to run the Plan. The Grid displays the data. The acct_dimension table appears as follows:

6. Replace the Grid sink with a Microsoft Batch Loader Transform. 7. To continue creating the acct_dimension Plan go to “Defining the Microsoft Batch Loader” on page 57.

Defining the Microsoft Batch Loader

This procedure is continued from “Viewing Table Values” on page 56. To define the Microsoft Batch Loader:

1. Double-click the Microsoft Batch Loader Transform. The Microsoft Batch Loader dialog displays. 2. Select Target Data from the BaseView dropdown.

3. Type acct_dimension in the Table field.


4. Click the Create Table radio button. The completed dialog looks like the following:

5. Click OK. 6. Click Update. The Plan runs and populates the acct_dimension table. The acct_dimension table is created. A primary key is generated for account_key, and data from the source table is loaded into the acct_dimension table. 7. Save your Plan.

Creating the card_prod_dimension Plan

To populate the card_prod_dimension:
1. Make sure Design Studio is logged in to the samples_repository database.
2. Click the Parts Bin button.
3. Click the Search button. A list of MetaViews displays.
4. Select the Population MetaView.


5. Drag the dbo.card_product Category from the Parts Bin into the Workspace. The following Plan displays in the Data Flow Editor:

6. Double-click the SQL Query Transform. The SQL Editor dialog displays.
7. Click the Suppress Duplicate Records check box.
8. Click OK.
9. Click the Tool Bin button.
10. Drag the Key Generation Transform from the Tool Bin onto the connector between the SQL Query Transform and the Grid Transform.
11. Double-click the Key Generation Transform. The Key Generation dialog displays.

12. Type cardproduct_key into the Key Column field. 13. Click the Start Key Value At radio button and leave the value as 0. 14. Click OK. 15. Click Update. The following displays:

16. Delete the Grid Transform.

17. Drag the Microsoft Batch Loader Transform from the Tool Bin into the Data Flow Editor. 18. Double-click the Microsoft Batch Loader Transform. The Microsoft Batch Loader dialog displays. 19. Select Target Data from the BaseView dropdown.

20. Type card_prod_dimension in the Table field.

21. Click the Create Table radio button.


22. Click OK.
23. Click Update. The Plan runs and populates the card_prod_dimension table. The card_prod_dimension table is created. A primary key is generated for cardproduct_key, and data from the source table is loaded into the card_prod_dimension table.
24. Save your Plan.

Creating the household_dimension Plan

To populate the household_dimension:
1. Follow the instructions in “Creating the Data Flow Plan” on page 54.
2. Click the Parts Bin button.
3. Click the Search button. A list of MetaViews displays.
4. Select the Population MetaView.
5. Expand the following Categories:
– dbo.address
– dbo.household
6. Drag the following Parts from the Parts Bin into the Workspace:
– address
– city
– state
– zip_code
– household_head_name
– household_id
– household_income
– household_type
7. Double-click the SQL Query Transform. The SQL Editor dialog displays.
8. Click the Suppress Duplicate Records check box.
9. Click OK.
10. Double-click the Key Generation Transform. The Key Generation dialog displays.
11. Type household_key into the Key Column field.
12. Click the Start Key Value At radio button and leave the value as 0.
13. Click OK.
14. Double-click the Microsoft Batch Loader Transform. The Microsoft Batch Loader dialog displays.
15. Select Target Data from the BaseView dropdown.
16. Type household_dimension in the Table field.
17. Click the Create Table radio button.
18. Click OK.
19. Click Update. The Plan runs and populates the household_dimension table. The household_dimension table is created. A primary key is generated for household_key, and data from the source table is loaded into the household_dimension table.
20. Save your Plan.

Creating the status_dimension Plan

To populate the status_dimension:

1. Follow the instructions in “Creating the Data Flow Plan” on page 54.
2. Click the Parts Bin button.
3. Click the Search button. A list of MetaViews displays.
4. Select the Population MetaView.

5. Drag the dbo.status Category from the Parts Bin into the Workspace.
6. Double-click the SQL Query Transform. The SQL Editor dialog displays.
7. Click the Suppress Duplicate Records check box.
8. Click OK.

9. Double-click the Key Generation Transform. The Key Generation dialog displays.

10. Type status_key into the Key Column field.
11. Click the Start Key Value At radio button and leave the value as 0.
12. Click OK.
13. Double-click the Microsoft Batch Loader Transform. The Microsoft Batch Loader dialog displays.
14. Select Target Data from the BaseView dropdown.
15. Type status_dimension in the Table field.

16. Click the Create Table radio button.
17. Click OK.


18. Click Update. The Plan runs and populates the status_dimension table: the table is created, a primary key is generated for status_key, and data from the source table is loaded into it.
19. Save your Plan.

Creating the trans_type_dimension Plan

To populate the trans_type_dimension:

1. You must follow the instructions in “Creating the Data Flow Plan” on page 54.
2. Click the Parts Bin button.
3. Click the Search button. A list of MetaViews displays.
4. Select the Population MetaView.
5. Drag the dbo.transaction_desc Category from the Parts Bin into the Workspace.
6. Double-click the SQL Query Transform. The SQL Editor dialog displays.
7. Click the Suppress Duplicate Records check box.
8. Click OK.
9. Double-click the Key Generation Transform. The Key Generation dialog displays.

10. Type transaction_key into the Key Column field.
11. Click the Start Key Value At radio button and leave the value as 0.
12. Click OK.
13. Double-click the Microsoft Batch Loader Transform. The Microsoft Batch Loader dialog displays.
14. Select Target Data from the BaseView dropdown.
15. Type trans_type_dimension in the Table field.

16. Click the Create Table radio button.
17. Click OK.
18. Click Update. The Plan runs and populates the trans_type_dimension table: the table is created, a primary key is generated for transaction_key, and data from the source table is loaded into it.
19. Save your Plan.


Creating a Plan for the Fact Table

Now that the dimension tables are populated, you are ready to populate the fact table. You populate a fact table with numeric measurements from tables in the OLTP database.

Note: You must populate the dimension tables before populating the fact table.

Keys from the source tables are replaced in the fact table with keys from the dimension tables. Quantitative data from source tables is loaded directly into the fact table. To populate the fact table, daily_trans_fact:

1. Create the data flow Plan, see “Creating the Fact Data Flow Plan” on page 63.
2. Create the fact table, see “Creating the Fact Table” on page 64.
3. Complete the Time Lookup dialog, see “Defining Time Lookup” on page 65.
4. Complete the Key Lookup dialog, see “Defining Key Lookup” on page 66.
5. Complete the Key Lookup 2 dialog, see “Defining Key Lookup 2” on page 67.
6. Complete the Key Lookup 3 dialog, see “Defining Key Lookup 3” on page 68.

7. Complete the Key Lookup 4 dialog, see “Defining Key Lookup 4” on page 69.
8. Complete the Key Lookup 5 dialog, see “Defining Key Lookup 5” on page 69.
9. Define the columns of the fact table, see “Defining Columns in the Fact Table” on page 70.
10. Complete the Microsoft Batch Loader dialog, see “Defining the Microsoft Batch Loader” on page 71.
11. Create the star schema in the data mart, see “Creating the Star Schema” on page 73.
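The key-replacement idea behind the lookup steps can be sketched in a few lines. This is a hypothetical miniature (the dictionaries and values are invented for the example): each lookup maps a natural key from the source to the surrogate key generated during dimension load, while quantitative columns pass through unchanged.

```python
# Hypothetical dimension lookups: natural key -> surrogate key.
acct_dimension = {"A-100": 0, "A-101": 1}        # account_id -> account_key
status_dimension = {"OPEN": 0, "CLOSED": 1}      # status_id  -> status_key

# Invented source rows carrying natural keys plus a quantitative measurement.
source_fact_rows = [
    {"account_id": "A-100", "status_id": "OPEN", "amount": 25.00},
    {"account_id": "A-101", "status_id": "CLOSED", "amount": 99.95},
]

fact_rows = []
for row in source_fact_rows:
    fact_rows.append({
        "account_key": acct_dimension[row["account_id"]],    # Key Lookup
        "status_key": status_dimension[row["status_id"]],    # Key Lookup 4
        "amount": row["amount"],                             # loaded directly
    })

print(fact_rows[0])  # {'account_key': 0, 'status_key': 0, 'amount': 25.0}
```

In the real Plan, one Key Lookup Transform per dimension performs this mapping against the dimension table in the data mart.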

Creating the Fact Data Flow Plan

To create the data flow Plan:

1. Open the Data Flow Editor. For more information, see “Displaying the Data Flow Editor” on page 47.
2. Click the Tool Bin button.


3. Drag the following Transforms from the Tool Bin into the Data Flow Editor:
– SQL Query Transform
– Time Lookup Transform
– Key Lookup Transform for each dimension table
– Column Select Transform
– Microsoft Batch Loader Transform

4. To continue creating the fact table Plan, go to “Creating the Fact Table” in the next section.

Creating the Fact Table

This section is continued from “Creating the Fact Data Flow Plan” in the previous section. To create the fact table:

1. Click the Parts Bin button.
2. Click the Search button. A list of MetaViews displays.
3. Select the Population MetaView.
4. Expand the following Categories:
– dbo.account
– dbo.card_product
– dbo.card_transaction
– dbo.household
– dbo.status
– dbo.transaction_desc
5. Drag the following Parts from the Parts Bin into the Workspace:
– amount
– current_debt
– available_credit

These Parts become the non-key columns in the fact table, daily_trans_fact. The Parts are added to the SQL Query step. Because there is no Grid sink in the data flow Plan, the Parts are not displayed in the Workspace.


6. Drag the following Parts from the Parts Bin into the Workspace:
– account_id
– household_id
– card_product_id
– status_id
– transaction_desc_id
– the_date

These Parts represent the natural keys in the source database. The Key Lookup Transforms use these natural keys to look up new keys in the dimension tables.

7. To continue creating the fact table Plan, go to “Defining Time Lookup” on page 65.

Defining Time Lookup

This procedure is continued from “Creating the Fact Table” on page 64. To define Time Lookup:

1. Double-click the Time Lookup Transform. The Time Lookup dialog displays.

2. Click the Input Column radio button.
3. Select the_date from the Input Column dropdown.
4. Click the Day check box.


5. Type day_key in the Output Column Name field.

6. Click OK.
7. To continue creating the fact table Plan, go to “Defining Key Lookup” on page 66.
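The Time Lookup Transform derives a day-grain key from the_date. The exact key format Sagent generates is not documented in this guide; purely as an illustration, one common convention encodes the date as a YYYYMMDD integer:

```python
from datetime import date

def day_key(d: date) -> int:
    """Derive a day-grain key from a date. YYYYMMDD is a common convention;
    the format Sagent actually produces may differ."""
    return d.year * 10000 + d.month * 100 + d.day

print(day_key(date(1997, 7, 1)))  # 19970701
```

A key of this shape sorts chronologically and joins cleanly to a time dimension keyed at the day grain.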

Defining Key Lookup

This procedure is continued from “Defining Time Lookup” on page 65. To define the Key Lookup:

1. Double-click the Key Lookup Transform. The Key Lookup dialog displays.

2. Select Target Data from the BaseView dropdown. 3. Type dbo.acct_dimension in the Table field.

4. Type account_key in the Column field. 5. Click the New Field radio button.

6. Type account_key in the field below the New Field radio button.


7. Select account_id from the Input Table Column dropdown. 8. Type account_id in the Lookup Table Column field.

9. Click Add.
10. Click OK.
11. To continue creating the fact table Plan, go to “Defining Key Lookup 2” on page 67.

Defining Key Lookup 2

This procedure is continued from “Defining Key Lookup” on page 66. To define Key Lookup 2:

1. Double-click the Key Lookup 2 Transform. The Key Lookup dialog displays.


2. In the Key Lookup dialog, specify lookup criteria for the card_product_key column as follows:

   Field                          Criteria
   BaseView                       Target Data
   Table                          card_prod_dimension
   Column                         cardproduct_key
   Place Result In radio button   New Field
   Place Result In text field     card_product_key
   Input Table Column             card_product_id
   Lookup Table Column            card_product_id

3. Click Add.
4. Click OK.
5. To continue creating the fact table Plan, go to “Defining Key Lookup 3” on page 68.

Defining Key Lookup 3

This procedure is continued from “Defining Key Lookup 2” on page 67. To define Key Lookup 3:

1. Double-click the Key Lookup 3 Transform. The Key Lookup dialog displays.
2. In the Key Lookup dialog, specify lookup criteria for the household_key column as follows:

   Field                          Criteria
   BaseView                       Target Data
   Table                          household_dimension
   Column                         household_key
   Place Result In radio button   New Field
   Place Result In text field     household_key
   Input Table Column             household_id
   Lookup Table Column            household_id


3. Click Add to complete the mapping.
4. Click OK.
5. To continue creating the fact table Plan, go to “Defining Key Lookup 4” on page 69.

Defining Key Lookup 4

This procedure is continued from “Defining Key Lookup 3” on page 68. To define Key Lookup 4:

1. Double-click the Key Lookup 4 Transform. The Key Lookup dialog displays.
2. In the Key Lookup dialog, specify lookup criteria for the status_key column as follows:

   Field                          Criteria
   BaseView                       Target Data
   Table                          status_dimension
   Column                         status_key
   Place Result In radio button   New Field
   Place Result In text field     status_key
   Input Table Column             status_id
   Lookup Table Column            status_id

3. Click Add.
4. Click OK.
5. To continue creating the fact table Plan, go to “Defining Key Lookup 5” on page 69.

Defining Key Lookup 5

This procedure is continued from “Defining Key Lookup 4” on page 69. To define Key Lookup 5:

1. Double-click the Key Lookup 5 Transform. The Key Lookup dialog displays.


2. In the Key Lookup dialog, specify lookup criteria for the transaction_key column as follows:

   Field                          Criteria
   BaseView                       Target Data
   Table                          trans_type_dimension
   Column                         transaction_key
   Place Result In radio button   New Field
   Place Result In text field     transaction_key
   Input Table Column             transaction_desc_id
   Lookup Table Column            transaction_desc_id

3. Click Add.
4. Click OK.
5. To continue creating the fact table Plan, go to “Defining Columns in the Fact Table” on page 70.

Defining Columns in the Fact Table

This procedure is continued from “Defining Key Lookup 5” on page 69.

To define the columns in the fact table:


1. Double-click the Column Select Transform. The Column Select dialog displays.

2. Click the following check boxes to clear the checks:
– account_id
– card_product_id
– transaction_desc_id
– household_id
– status_id
– the_date
The columns that remain selected appear in the fact table.
3. Click OK.
4. To continue creating the fact table Plan, go to “Defining the Microsoft Batch Loader” on page 71.

Defining the Microsoft Batch Loader

This procedure is continued from “Defining Columns in the Fact Table” on page 70. To define the Microsoft Batch Loader:


1. Double-click the Microsoft Batch Loader Transform. The Microsoft Batch Loader dialog displays.
2. Select Target Data from the BaseView dropdown.
3. Type daily_trans_fact in the Table field.

4. Click the Create Table radio button.

Note: When you use the Create Table option to populate the fact table, natural keys in dimension tables are stored in the fact table for reference purposes. In a real population scenario, you might want to remove these columns. To filter out these key columns, add a Column Select Transform after the last Key Lookup Transform in the fact table Plan. For more information on the Column Select Transform, see the Sagent Transforms Reference Guide.

5. Click OK.
6. Click Update. The daily_trans_fact table is populated with data from the source database, and its concatenated key is created from primary keys in the data mart dimension tables. Natural keys from the source tables are stored in the data mart dimension tables.


7. Save your Plan.

Creating the Star Schema

To create the star schema:

1. Select ToolsBaseView Editor. 2. Select Target Data from the dropdown.

3. Click the Add Tables button on the toolbar.



The Add Tables to BaseView dialog displays.

4. Select all tables.
5. Click OK. The tables are added to the Target Data BaseView.
6. Draw the joins to match the diagram “The samples_creditcard_star database structure” on page 37.

7. Close the BaseView Editor. A star schema is created in the samples_creditcard_star database.
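The joins you draw in the BaseView Editor mirror the primary-foreign key relationships of the star schema. A much-reduced sqlite3 sketch (table and column names follow the guide; the sample data is invented) shows two of those relationships and the kind of star join they enable:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Two dimensions and the fact table; each fact key references a dimension key.
con.executescript("""
CREATE TABLE acct_dimension      (account_key   INTEGER PRIMARY KEY, account_id);
CREATE TABLE household_dimension (household_key INTEGER PRIMARY KEY, household_id);
CREATE TABLE daily_trans_fact (
    account_key   INTEGER REFERENCES acct_dimension(account_key),
    household_key INTEGER REFERENCES household_dimension(household_key),
    amount        REAL
);
""")

con.execute("INSERT INTO acct_dimension VALUES (0, 'A-100')")
con.execute("INSERT INTO daily_trans_fact VALUES (0, NULL, 25.0)")

# A star join: facts filtered and described through a dimension.
row = con.execute("""
    SELECT a.account_id, f.amount
    FROM daily_trans_fact f
    JOIN acct_dimension a ON f.account_key = a.account_key
""").fetchone()
print(row)  # ('A-100', 25.0)
```

Each join drawn in the editor corresponds to one such key equality between the fact table and a dimension.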


Using Flat File Sources

You can use flat files as data sources to populate tables. Make sure the flat file is stored on the machine hosting the Data Flow Service, or is referenced in relation to the server. To extract and load using a flat file source:

1. Create the flat file source, see “Creating the Flat File Source” on page 75.
2. Create a data flow Plan, see “Creating the Data Flow Plan” on page 75.
3. Complete the Flat File Source dialog, see “Defining the Flat File Source” on page 76.
4. View a sample file, see “Viewing a Sample File” on page 79.

5. Complete the Key Generation dialog, see “Defining Key Generation” on page 80.
6. Load the data into the data mart, see “Defining the Microsoft Batch Loader” on page 81.

Creating the Flat File Source

To create the flat file source:

1. Open a text editor, such as Notepad.
2. Type the following into the text editor:

3. Save the data as a text file named flat file on your machine.
4. Close the text editor.
5. To continue using a flat file source to load a dimension table, go to “Creating the Data Flow Plan” on page 75.
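The sample data for step 2 appears only in a screenshot that is not reproduced here. Purely as a hypothetical stand-in (the column names and rows are invented), a small comma-delimited file with a header row could look like the following, written here with Python for convenience:

```python
# Hypothetical delimited flat file content; not the guide's actual sample data.
sample = """region_id,region_name,manager
1,Northeast,Kim
2,Southwest,Lee
"""
with open("flatfile.txt", "w") as f:   # hypothetical file name
    f.write(sample)

with open("flatfile.txt") as f:
    lines = f.read().splitlines()
print(len(lines))  # 3
```

Any consistently delimited text with one record per line works; the Flat File Source Transform's Delimited Text setting handles the parsing.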

Creating the Data Flow Plan

This procedure is continued from “Creating the Flat File Source” on page 75. To create the data flow Plan:

1. Open the Data Flow Editor. For more information, see “Displaying the Data Flow Editor” on page 47.
2. Click the Tool Bin button.


3. Drag the following Transforms into the Data Flow Editor from the Tool Bin:
– Flat File Source
– Key Generation
– Microsoft Batch Loader

4. To continue using a flat file source to load a dimension table, go to “Defining the Flat File Source” on page 76.

Defining the Flat File Source

This procedure is continued from “Creating the Data Flow Plan” on page 75. To define the Flat File source:

1. Double-click the Flat File Source Transform. The Flat File Source dialog displays.
2. Type the name and location of the flat file source you created in the Source File Path or URL field.


3. Type the name and location of the flat file source you created in the Sample File Path or URL field.

4. Click the Attributes tab.


5. Click the Delimited Text radio button.

6. Click the Columns tab.
7. Click Detect Columns.


8. Click Detect Data Types.

9. To continue using a flat file source to load a dimension table, go to “Viewing a Sample File” on page 79.
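Detect Columns and Detect Data Types split the delimited text into columns and infer a type for each one. Sagent's actual inference rules are not documented here; the sketch below, with invented data, shows the general idea of guessing the narrowest type that fits every value in a column:

```python
def column_type(values):
    # Try the narrowest type first; fall back to text if nothing fits.
    for cast, name in ((int, "integer"), (float, "float")):
        try:
            for v in values:
                cast(v)
            return name
        except ValueError:
            continue
    return "text"

def detect(lines, delimiter=","):
    """Split a header row into column names and infer each column's type."""
    columns = lines[0].split(delimiter)
    rows = [line.split(delimiter) for line in lines[1:]]
    types = [column_type([r[i] for r in rows]) for i in range(len(columns))]
    return columns, types

cols, types = detect(["id,name,income", "1,Smith,52000.50", "2,Jones,61000.00"])
print(cols)   # ['id', 'name', 'income']
print(types)  # ['integer', 'text', 'float']
```

If the detected types look wrong, you can override them in the Columns tab before loading.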

Viewing a Sample File

This procedure is continued from “Defining the Flat File Source” on page 76. Optionally, you can use a sample file to view how the settings in the dialog affect fields in the source file. The sample file should have the same characteristics as the source file, but usually contains only a representative amount of data. To view the sample file:

1. Click the File Select tab.


2. Click View Data to view the data in the flat file source. The data is displayed in the lower field.

3. Click OK. 4. To continue using a flat file source to load a dimension table, go to “Defining Key Generation” on page 80.

Defining Key Generation

This procedure is continued from “Viewing a Sample File” on page 79. To complete the Key Generation dialog:

1. Double-click the Key Generation Transform. The Key Generation dialog displays.
2. Type FlatFile_key into the Key Column field.


3. Click the Start Key Value At radio button, and leave the value as 0.

4. Click OK.
5. To continue using a flat file source to load a dimension table, go to “Defining the Microsoft Batch Loader” on page 81.

Defining the Microsoft Batch Loader

This procedure is continued from “Defining Key Generation” on page 80. To complete the Microsoft Batch Loader:

1. Double-click the Microsoft Batch Loader Transform. The Microsoft Batch Loader dialog displays. 2. Select Target Data from the BaseView dropdown.

3. Type flat_file_dimension in the Table field.
4. Click the Create Table radio button.
5. Click OK.
6. Click Update. A table named flat_file_dimension is created and loaded with data from the flat file source.


7. Save your Plan.

Chapter 6: Scheduling Population Plans

This chapter is a tutorial that describes how to schedule Plans using Design Studio Scheduler. In this chapter:
• About Scheduling
• Using Design Studio Scheduler

About Scheduling

You can schedule Plans to run automatically at specified times. Scheduling population Plans is useful because most load operations require system resources that are unavailable during the business day. You can use several scheduling mechanisms with Design Studio to run Plans automatically.

Note: The Schedule service and Data Flow Service must be running on the same machine when you schedule a Plan and when it runs. Do not update Registry settings on the Design Studio machine between the time a Plan is scheduled and the time it is run.

Ensure that system clocks on the Repository, Data Flow Service, and Design Studio machines are synchronized before scheduling any Plans. Discrepancies among system clock settings cause Plans to run at varying times. Before you create and schedule Plans, determine the following:
• Schedule Intervals
• Population Plan Sequence
• Plan Dependencies

Schedule Intervals

Schedule population Plans to execute based on the grain of the most detailed fact table. For example:
• If the grain of the fact table is daily, refresh the fact table daily.
• If the grain is monthly, refresh the table monthly, not sooner, because users work only with data from completed past months.
Most population Plans process large amounts of data, so schedule population Plans to execute when use of the Sagent Load Server, the source and data mart databases, and the network is minimal. Populate all dimension and fact tables during the initial load. After the initial load, refresh tables based on what was added or changed. Generally, fact tables are refreshed more frequently than dimension tables because:
• Dimension tables are usually static unless an attribute in the source is changed or added.
• Fact table data in a decision support database is typically historical, and requires regular additions and updates to remain current. The initial load and most incremental loads affect fact tables.

Population Plan Sequence

Dependencies exist among data in the data mart databases, so determine the sequence in which to run population Plans before setting the Plan execution schedule. Populate dimension tables before fact tables. Every dimension record and key must exist before a related fact table can be populated. This restriction is a function of the primary-foreign key relationship between dimension and fact tables in a star schema.


Refresh base-level tables before populating aggregate tables in your decision support database. This sequence ensures that base-level and aggregate tables remain synchronized. The correct order to run population Plans is:
1. Base-level dimension table Plans
2. Base-level fact table Plans
3. Aggregate dimension table Plans
4. Aggregate fact table Plans
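If you drive Plan execution from a script rather than the Scheduler, the four phases above translate into a strictly sequential loop. In this sketch the Plan names are hypothetical and launch_plan is a placeholder for however Plans are actually invoked (for example, via SARUN):

```python
# Run population Plans phase by phase; each phase must finish before the next.
# Plan names are hypothetical; launch_plan stands in for the real launcher.
phases = [
    ["acct_dimension_plan", "household_dimension_plan"],  # base-level dimensions
    ["daily_trans_fact_plan"],                            # base-level facts
    ["agg_month_dimension_plan"],                         # aggregate dimensions
    ["agg_monthly_fact_plan"],                            # aggregate facts
]

run_order = []

def launch_plan(name):
    run_order.append(name)  # in practice: start the Plan and wait for completion

for phase in phases:
    for plan in phase:
        launch_plan(plan)

# Dimension Plans must precede the fact Plans that depend on them.
assert run_order.index("acct_dimension_plan") < run_order.index("daily_trans_fact_plan")
```

The point of the structure is that no fact Plan starts until every Plan in the preceding dimension phase has completed.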

Plan Dependencies

You can create Plan dependencies if several population Plans need to run in a specific order, or if the amount of time needed to run Plans is unpredictable. A dependent Plan runs only when certain requirements are met, such as the previous Plan having completed or having failed. You can create Plan dependencies and schedule dependent Plans to run from the command line using the SARUN command. For more information, see the Sagent Administrator’s Guide.


Using Design Studio Scheduler

Scheduler is a graphical scheduling tool integrated with Design Studio. Plans run at the times specified in the Scheduler. Use the Scheduler to run:
• Single population Plans
• Multiple population Plans
To run multiple Plans, you must know how long each population Plan takes to complete so you can predict how much time to leave between Plans. When you schedule a Plan using the Scheduler, the information is stored in several places:
• Registries on the Design Studio machine
• Sagent Load Server
• Repository
Coordinate all components that store Plan schedule information. Verify that the Design Studio machine is configured to use the same Data Flow Service and Repository that were used when the Plan was scheduled.

Note: Do not update Registry settings on the Design Studio machine between the time a Plan is scheduled and the time it runs. If Data Flow Service or Repository settings are modified, the scheduled Plan will not run.

Use Design Studio Scheduler when:
• Scheduling a Plan
• Viewing Scheduled Plans
• Removing a Scheduled Plan
• Receiving Notification
• Viewing the Event Log

Scheduling a Plan

Note: You must specify in Sagent Admin the account under which Scheduler will run.

To schedule a Plan using the Design Studio Scheduler:

1. Open Design Studio, see “Displaying the Data Flow Editor” on page 47, steps 1 to 6.
2. Click the Search button. A list of MetaViews displays.
3. Select the Marketing MetaView.


4. Expand the following Categories:
– Product
– Sales
5. Drag the following Parts from the Parts Bin:
– Product Name
– Cost in Dollars
– Units Sold
6. Click Update. The results display in the Workspace.
7. Save the Plan in the Plan Bin.


8. Click the Schedule Plans button on the Standard toolbar.


The Scheduler dialog displays a calendar and time slots. The current date and time are highlighted.

9. Select the current date on the monthly calendar.

10. Select the time of day that you want to run the Plan.
11. Drag the Plan you created from the Plan Bin to the time of day you want it to run.


12. Select Every week from the Frequency dropdown to schedule the Plan to run every week on the same weekday. The calendar displays a clock icon on every date the Plan is scheduled to run.


The first scheduled occurrence displays a clock icon with two dots. Subsequent occurrences display a clock only.

13. Select how you want to be notified that the Plan has run from the Notify by dropdown.

14. If you select Email or Email and logfile from the Notify by dropdown, the Enter Email Address dialog displays.


15. Type the email address where the notification is sent.

16. Click OK.

17. Select Save Plan as Snap from the Run result dropdown. Plan results are saved as a Snap in the Snap Bin when the Plan runs.

18. Click OK. The scheduled Plan in the Plan Bin displays a clock icon.

19. To view the Scheduler, right-click on the Plan.

20. Select Properties. The Properties dialog displays.
21. Close the Properties dialog.
22. Close the Design Studio application. The Plan is scheduled to run every week. If Design Studio is open while the scheduled Plan runs, you must close and re-open the application to view Snaps and to use the Event Log button.

Viewing Scheduled Plans

To view all Plans scheduled to run on a specific date:

1. Click the Schedule Plans button on the Standard toolbar. The Scheduler dialog displays.


2. Right-click the date on the calendar. A dialog displays a list of Plans scheduled to run.

You can use Sagent Admin to view scheduled Plans. For more information, see the Sagent Administrator’s Guide.

Removing a Scheduled Plan

To delete all Plans scheduled to run on the same day:

1. Click the Schedule Plans button on the Standard toolbar. The Scheduler dialog displays.
2. Click the scheduled day on the calendar.
3. Press Delete. A confirmation dialog displays.

4. Click Yes.


5. If a confirmation dialog displays, click OK.

All Plans scheduled to run on the same day are deleted. To delete one of several Plans scheduled to run on the same day:

1. Click the Schedule Plans button on the Standard toolbar. The Scheduler dialog displays.
2. Click the scheduled day on the calendar.
3. Click the Plan to delete in the time slot of the Time of Day section.
4. Press Delete. A confirmation dialog displays.

5. Click Yes. The Plan is deleted from the Scheduler.

6. Click OK.

Receiving Notification

When using Design Studio Scheduler, you can be notified when a Plan is run by:
• Email. You specify the email address in a dialog box.
• A log file entry. A log file, called schedule.log, is placed in the Sagent directory on the Sagent Load Server.


• Email and a log file entry
To select the type of notification, see “Scheduling a Plan” on page 86.

Viewing the Event Log

You can view the event log when the Event Log button is active. To view the event log:

1. Click the Event Log button.


The Event Log dialog displays.

2. Click Close. The Event Log button is disabled.

Chapter 7: Advanced Population Options

This chapter describes optional procedures you can complete while you prepare your environment to enhance extraction and loading. In this chapter:
• Optimizing Load Performance
• Changed Data Capture and Data Cleansing
• Auditing Population Plans
• Enhancing Query Performance on the Star Schema
• Managing Indexes and Audit Trails

Optimizing Load Performance

To optimize load performance, you should set transaction management in the target database to UNRECOVERABLE mode. The volume of data processed during population usually makes transaction management impractical.


To ensure that data remains useful and relevant, preserve old data during loading and eliminate redundancy.

The information in a decision support database is historical. If you reload all of the data in a dimension table every time you perform an extraction, you lose historical information. For example, if a customer is deleted from the source customer table, and you overwrite the target customer table with each extraction, you lose all the facts for that customer. To prevent loss of historical information, extract only data that has changed since the previous extraction. This process improves load performance and preserves historical data. After you identify data that has changed since the last extraction, Design Studio supports several methods of appending data to a table or updating existing records. These methods include:
• Scrubbing Source Data
• Timestamping the Data
• Appending New Fact Records
• Appending New Dimension Records (Type II Records)
• Updating Existing Records (Type I Records)
• Using a Third Party Tool

Scrubbing Source Data

Scrubbing is the process of correcting errors in the source data and eliminating redundancies that invalidate analysis by end users. To correct invalid strings or characters from Design Studio, add a Search & Replace Transform to the population Plan.


Use this Transform to search for specific characters or strings, or use wildcard characters to broaden your search. For example, if the sales region name has changed from New England to Northeast, specify the following settings in the Search & Replace Transform dialog to ensure that loaded records have the new region name.

You can store the changed record in a new column for tracking purposes. For more information, see the Sagent Transforms Reference Guide.
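The region-name example above can be sketched in plain Python (the row data is invented, and this is only an illustration of the idea, not the Transform's actual implementation): rewrite the old value in place and keep the original in a new column for tracking.

```python
import re

# Invented records passing through a scrub step.
rows = [
    {"region": "New England", "amount": 10.0},
    {"region": "Northeast",   "amount": 20.0},
]

for row in rows:
    row["region_original"] = row["region"]   # track the pre-scrub value
    # Anchored search-and-replace: only exact "New England" values are rewritten.
    row["region"] = re.sub(r"^New England$", "Northeast", row["region"])

print([r["region"] for r in rows])  # ['Northeast', 'Northeast']
```

Loosening the pattern (for example, with wildcards) broadens the search, just as the dialog's wildcard characters do.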

Timestamping the Data

Timestamp your data by:
• Timestamping the Load
• Timestamping the Source Database

Timestamping the Load

A convenient way to ensure data quality is to flag records in the target according to the date they were loaded. If the load process does not complete, or if you notice problems after data is already loaded, a timestamp column makes it easy to identify which records were affected. You can then delete all the records processed during a particular phase, return to the pre-load state, and address any problems before running population Plans again. You can timestamp the load operation by adding an extra column, such as load_date, to your fact table using the Expression Calculator Transform. To timestamp a load:

1. Open the Expression Calculator dialog.


2. Click New. The Expression Builder dialog displays.
3. Double-click dtCurrentDT from the Functions scroll list.
4. Click the ( and ) buttons.

5. Click OK. A timestamp column named Calculated1 is added to the Expression Calculator dialog.

6. To change the name of the column, right-click Calculated1 and select Rename.
7. Type the new name for the column.
8. Click OK. The Expression Calculator Transform creates a new column and populates it with the current date and time when a Plan is run. For more information, see the Sagent Transforms Reference Guide.
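The effect of the dtCurrentDT() expression can be sketched as follows (a hypothetical illustration, not Sagent's implementation): every record in a given load is stamped with the same timestamp, which is what makes a bad load easy to identify and back out later.

```python
from datetime import datetime

def add_load_date(rows):
    """Stamp every record in this load with one shared load timestamp,
    as the Expression Calculator's dtCurrentDT() column does."""
    now = datetime.now()          # stands in for dtCurrentDT()
    for row in rows:
        row["load_date"] = now
    return rows

rows = add_load_date([{"amount": 25.0}, {"amount": 99.95}])
```

To back out a problem load, you would delete (or filter out) exactly the rows whose load_date matches the bad run's timestamp.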

Timestamping the Source Database

Consider creating a timestamp column in your source database that identifies when records were last modified.

For records processed during the initial load, populate the timestamp column with a baseline date:
1. During the first incremental load, edit the extract SQL to constrain on timestamp values.
2. Specify an interval between the baseline and current dates.


3. For each subsequent incremental load, modify this interval to span from the date of the last incremental load to the current date.
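The steps above can be sketched as extract SQL. The table and column names here (orders, last_modified) are illustrative assumptions, not names from this manual:

```sql
-- First incremental load: constrain on the interval between the
-- baseline date used for the initial load and the current date.
SELECT order_id, customer_id, order_total, last_modified
FROM orders
WHERE last_modified >  '1997-01-01'   -- baseline date
  AND last_modified <= '1997-07-01';  -- date of this incremental load

-- For each subsequent load, replace the interval with the date of the
-- last incremental load and the current date.
```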

Appending New Fact Records

Incremental load operations often append records to a target fact table to preserve existing facts. To append records to a fact table, edit the extract SQL in the fact table population Plan. For example, many tables have a timestamp column that identifies when a record was last changed. If data is loaded daily, add a constraint such as WHERE date_last_modified = '07/01/97' in the SQL Editor. The population Plan then extracts only records updated on the date or during the interval you specify.
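For a daily load, a date-range constraint is often safer than the equality test shown above, because it also catches records stamped with a time of day. A hedged sketch, using an illustrative table name:

```sql
-- Extract only the records changed during the previous day.
-- A half-open range avoids missing rows whose timestamp includes
-- hours and minutes.
SELECT *
FROM source_fact
WHERE date_last_modified >= '1997-07-01'
  AND date_last_modified <  '1997-07-02';
```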

If your fact table population Plan ends in a Batch Loader, to append records select the Use Existing Table option or Append mode in the Batch Loader dialog.

Appending New Dimension Records (Type II Records)

Most dimensions change slowly over time. By appending new records to a dimension table when a change occurs, you keep the old and new values. Appending new records results in a richer historical perspective on your data than updating the record. To append records to a dimension table, specify an incremented value when generating the new key. Creating a new record and key in the dimension table allows updated and original dimension information to coexist.

Design Studio uses sequential integer keys to identify dimension records. Determine the next available key value by either tracking existing values manually or looking up the maximum value in the dimension table key column. The Key Generation Transform looks up existing key values. For more information, see the Sagent Transforms Reference Guide.
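Looking up the maximum value in the dimension table key column can be done with a simple query; the table and column names in this sketch are illustrative:

```sql
-- Find the next available surrogate key for the customer dimension.
-- New Type II records are then inserted starting at this key value.
SELECT MAX(customer_key) + 1 AS next_key
FROM customer_dim;
```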

If your dimension table population Plan ends in a Batch Loader, append records by selecting the Use Existing Table option or Append mode in the Batch Loader dialog.

Updating Existing Records (Type I Records)

If a record has changed in the source since the initial load, you can update the existing record in the target. This is an alternate way to handle slowly changing dimensions. Note, however, that when you update a dimension record, all associated facts are described by the new value and historical information is lost. You can use a SQL Command Transform to update existing dimension records in a target table. If you are appending dimension records, include this Transform as a step in the dimension table population Plan. Then, edit or remove the SQL Command step as needed before running the Plan again. Alternatively, you can create a special Plan that only performs the update operation. In this case, extract dimension records with a SQL Query source and perform the update with a SQL Command Sink Transform. For more information on using these Transforms, see the Sagent Transforms Reference Guide.
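A SQL Command Transform performing a Type I update might contain a statement along these lines. The table, column, and literal values are illustrative assumptions; in practice the new values flow into the Transform from the data flow:

```sql
-- Overwrite the changed attribute in place. Every fact that references
-- this dimension record is now described by the new value; the old
-- value is not preserved.
UPDATE customer_dim
SET    region = 'Northeast'
WHERE  customer_id = 1042;
```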


Using a Third-Party Tool

Another option is to use a third-party tool to capture and extract changed data. Most change capture tools write changed records to a database table. To perform an incremental load, create a source BaseView of this table and create population Plans to extract and load changed records.

Auditing Population Plans

You can use the auditing or logging features of your target database, the Queue Monitor, or the Status Monitor to track how population Plans affect target tables. However, if you need to disable database auditing during the load operation, or if you want to monitor Plan execution directly, you can use Design Studio to get information about population Plans. In Design Studio, auditing is implemented for each Plan as a Plan property.

To activate Plan auditing in Design Studio:
1. Right-click the Plan in the Plan Bin and select Properties. The Properties dialog displays.

2. Click the Track Plan Execution checkbox.
3. Click OK.

When you audit a Plan, information about its execution is recorded in the current Repository.


When you run a Plan with Plan tracking enabled, tables in your Repository database log events associated with the Plan:
– sarp_track_plan contains a record for each tracked Plan executed
– sarp_track_step tracks each step of the Plan
– sarp_track_table tracks affected target tables

Plan tracking also lists all the records that are processed or rejected in each iteration of an iterative SubPlan. The total number of records processed in each iteration is tracked using the tables sarp_track_step and sarp_track_table. Columns in these tables store information such as start and end time, error text, number of records processed, and number of records rejected. To retrieve Plan tracking information, create a BaseView and MetaView of the Sagent Repository that include these tables, and execute queries against them. For more information on creating BaseViews and MetaViews, see the Sagent Design Studio User's Guide.
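A query against the tracking tables might resemble the following sketch. The column names shown here are assumptions for illustration only; check the actual Repository schema for the real column names before writing queries:

```sql
-- List each tracked Plan run with its timing and record counts.
-- Column names (plan_id, plan_name, start_time, etc.) are assumed.
SELECT p.plan_name,
       s.start_time,
       s.end_time,
       s.records_processed,
       s.records_rejected
FROM   sarp_track_plan p,
       sarp_track_step s
WHERE  p.plan_id = s.plan_id
ORDER BY s.start_time;
```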

Enhancing Query Performance on the Star Schema

Aggregates are summary records that you calculate and store to enhance query performance. If your organization uses Sagent client applications to access data, you can use Design Studio to automate aggregate tasks, including building, populating and navigating aggregates. For more information, see the Sagent Design Studio User’s Guide. Unlike base-level population Plans, which run against the source database, an aggregate population Plan runs against tables in the target database after the tables are populated. Using a Plan instead of a SQL script or other tool to build aggregates gives you convenient access to all population Plans in the same environment. It allows you to create scheduling dependencies between base-level and aggregate population Plans.

An aggregate population Plan can be as simple as a SQL Query step followed by a Terminal Sink, or a SQL Query followed by a Batch Loader. The SQL you type in the SQL Editor dialog should select the columns to aggregate, apply an aggregate function (typically with a GROUP BY operation), and insert the summarized records into aggregate tables. You can use existing aggregate tables, or use SQL to create the tables. End the Plan with a Terminal Sink Transform placed after the SQL Query step.
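As a sketch of the kind of SQL an aggregate population Plan might contain, assuming a daily fact table and a monthly aggregate table (all names are illustrative):

```sql
-- Summarize sales facts into a monthly aggregate table. This assumes
-- the fact table carries a month_key column; otherwise derive the
-- month by joining to the time dimension.
INSERT INTO sales_month_agg (product_key, month_key, total_sales)
SELECT product_key,
       month_key,
       SUM(sales_amount)
FROM   sales_fact
GROUP BY product_key, month_key;
```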

Use the Scheduler to run aggregate population Plans after all Plans that populate base-level tables have completed. For more information about scheduling population Plans, see Chapter 6, “Scheduling Population Plans”.

Managing Indexes and Audit Trails

You can execute SQL statements in Design Studio before or after data is loaded. This feature is useful for performing database tasks related to population, such as maintaining indexes on target tables or an audit trail of database activity during population. You can execute SQL in the data flow in the following places:

– A Batch Loader Transform dialog
– The Pre SQL or Post SQL tab of a SQL Command Transform
– The SQL Editor dialog of a SQL Query Transform

For example, after an initial load you might create an index on the fact table.
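An index-creation statement of that kind might look like this sketch; the index, table, and column names are illustrative assumptions:

```sql
-- Build a composite index on the fact table's foreign key columns
-- after the initial load completes, so index maintenance does not
-- slow the load itself.
CREATE INDEX sales_fact_idx
ON sales_fact (customer_key, product_key, time_key);
```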

How you manage indexes during a load operation depends on the target database and the number of records loaded. Generally, loading records into an indexed table is slower than loading into a non-indexed table.

You can also execute SQL before and after extracting data. To do this, either edit the SQL statement in the SQL Query step or add a SQL Command Transform to the Plan. For example, consider a Plan that transforms data and populates an Informix fact table, ending in a SQL Command Transform.


Typing an audit statement on the Pre SQL tab of the SQL Command Transform creates an audit trail file for the fact table and allows you to monitor the load operation.
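In INFORMIX-SE, for example, an audit trail file can be created with a CREATE AUDIT statement. The table name and file path in this sketch are illustrative assumptions:

```sql
-- Create an audit trail file recording subsequent inserts, updates,
-- and deletes against the fact table during the load.
CREATE AUDIT FOR sales_fact IN "/audit/sales_fact.aud";
```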

In this example, the target BaseView is selected on the SQL Command tab, so the SQL executes against the target database. The SQL Command tab of a SQL Command Transform can be left empty if you only want to use the Pre SQL or Post SQL tab. Unlike SQL on the SQL Command tab, a statement on the Pre SQL or Post SQL tab is executed only once, and not directly against records in the data flow. Similarly, SQL you type in a SQL Editor or Batch Loader dialog is executed only once.

Index

A
access server 22
audit trails, managing 104
auditing population Plans 101

B
BaseView Editor, opening 38
BaseViews
  about 21
  creating for data mart 42

C
Column Select Transform
  completing dialog 70
  using 70

D
Data Flow Editor
  about 47
  displaying 47
Data Flow Service 22
data mart
  about 16, 73
  creating BaseView 42
  design phase 25–26
  preparing 33
data movement 23
  extraction 23
  loading 23
  transformation 23
data sources
  analyzing 31
  compatibility with Design Studio 31, 38
  flat file, creating 75
  identifying 31
  preparing 31
  representing structure 38
  timestamping 98
  verifying data 32
data, see data sources
Design Studio
  about 21
  login 47
  scheduling Plans 86
  using 21
dimension tables
  about 18, 73, 99
  creating non-time dimensions 53
  populating non-time dimension 53
  updating 99
dimensions
  about 18
  see also dimension tables

E
Education, help from Group 1 Software 12
enhancing query performance, star schema 103
event log, see log file
extracting data 23

F
fact tables
  about 17, 73, 99
  grain 18
  incremental load 99
  measures 17–18
  populating 63
facts
  about 18
  see also fact tables
flat file sources
  compatibility with Design Studio 31, 75
  using 75

G
Getting help from Group 1 11
  e-mail address 11
grain, about 18

H
Help, getting from Group 1 Software 11

I
incremental load 24
indexes, managing 104
initial load 23

J
join groups
  about 40
  creating 40
  using 55
joins, about 18
Julian Day value 51

K
key
  composite 18
  foreign 18
  primary 18
Key Generation Transform, completing dialog 56
Key Lookup Transform, completing dialog 66

L
load server 22
loading
  incremental 24
  timestamping 97
loading data 23
log file
  receiving notification 92
  viewing 93

M
measures 17
MetaViews
  about 21
  creating for data sources 38

N
notification
  email 92
  log file 92

O
OLAP
  about 16
  compared to OLTP 16
  star schema structure 35
OLTP
  about 16
  compared to OLAP 16
  normalized schema structure 35

P
permissions 31, 33
  Sagent Environment 29
Plans
  about 21
  see also population Plans
population Plans
  about 46, 101
  creating fact table 63
  dependencies 85
  population sequence 84
  scheduling 86

R
receiving notification
  email 92
  logfile 92
referential integrity 19–20
Repository
  about 22
  configuration 29
  multiple 29
  single 29

S
Sagent Environment
  about 21
  preparing 29
Sagent server 22
Scheduler, using 86
scheduling
  about 84
  intervals 84
  Plan dependencies 85–86
  receiving notification 92
  viewing scheduled Plans 90
scrubbing data
server
  access 22
  load 22
  Sagent 22
Snaps
  about 21
  saving scheduled Plan results as 90
star schema
  about 17, 73
  creating 73
  dimension tables 18
  enhancing query performance 20, 103
  fact tables 17
  structure 17, 35

T
time dimension, populating 49
Time Generation Transform
  completing dialog 50
  Julian Day value 51
Time Lookup Transform, completing dialog 65
timestamping
  load 97
  source database 98
transforming data 23

Type I Records 99
Type II Records 99

W
Windows operating system 8
  Microsoft Windows 2000 8
  Windows Server 2003 8
