Hands on Lab: Building Your First Extract-Transform-Load Process with SQL Server 2008 R2
Hands-On Lab
Building Your First Extract-Transform-Load Process with SQL Server 2008 R2 Integration Services

Lab version: 1.0.0
Last updated: 6/1/2018

Overview
This lab introduces Integration Services package design, specifically to populate a fact table.
Note: Before you start with this exercise you must ensure that your machine meets the system requirements detailed in the next section. Additionally, you must complete the setup steps described in the next section.

Objectives
The objectives of this exercise are to:
Create an Integration Services project
Define package variables
Define package connection managers
Configure control flow
Configure data flow

System Requirements
You must have installed the following items to complete this lab:
Microsoft SQL Server 2008 R2:
Database Engine
Integration Services
SQL Server Business Intelligence Development Studio
SQL Server Management Studio
SQL Server AdventureWorks2008 R2 sample databases
AdventureWorks Data Warehouse 2008R2

Setup
All the prerequisites for this lab are verified using the Configuration Wizard. To make sure that everything is correctly configured, follow these steps.
Note: To perform the setup steps, you need to run the scripts in a command window with administrator privileges.
1. Launch the Configuration Wizard for this lab by double-clicking the Dependencies.dep file located under the Source\Setup folder of this lab. Install any prerequisites that are missing (rescanning if necessary) and complete the wizard.

Cleanup
There is no need to clean up if you intend to continue the sequence of labs in this training kit.
To restore the original state of the AdventureWorksDW2008R2 database, execute the Cleanup.cmd script located under the Setup folder in the Source folder of this lab.

Exercises
This Hands-On Lab comprises the following exercise:
Populating the FactSalesQuota Table
Estimated time to complete this lab: 30 minutes.

Exercise 1: Populating the FactSalesQuota Table
In this exercise, you will develop an Integration Services package to load data from CSV files into the data warehouse’s FactSalesQuota table. The data in the files requires several transformations before it can be loaded into the destination table, and this will be achieved by using the Data Flow Task.

Task 1 – Exploring the FactSalesQuota Data
In this task, you will use SQL Server Management Studio to execute a query to explore the current FactSalesQuota data. You will execute this query again later in this lab to review the data newly loaded into the table by the Integration Services package.
1. Open SQL Server Management Studio from Start | All Programs | Microsoft SQL Server 2008 R2 | SQL Server Management Studio.
2. If prompted to connect to the server, click Cancel.
3. On the File menu, select New | Database Engine Query.
4. In the Connect to Database Engine window, configure the connection based on the following, and then click Connect.
   Property          Value
   Server Name       SqlServerTrainingKitAlias
   Authentication    Windows Authentication
5. On the toolbar, select the AdventureWorksDW2008R2 database.
   Figure 1: Selecting the AdventureWorksDW2008R2 Database
6. In the query pane, type the following query.
   T-SQL
   SELECT * FROM dbo.FactSalesQuota
   ORDER BY DateKey DESC;
7. On the toolbar, click Execute.
8. Review the query results. A total of 129 rows should be returned. Notice that there is no data beyond calendar year 2007.
9. Leave SQL Server Management Studio open.

Task 2 – Exploring the Quota Extract Files
In this task, you will navigate to the QuotaExtracts folder and explore the 2008 quota extract data. Subsequent tasks in this lab will extract the data from these files and load it into the data warehouse’s FactSalesQuota table.
1. To open File Explorer, right-click Start, and then select Open Windows Explorer.
2. In the Explorer window, navigate to the Assets\QuotaExtracts folder located in the Source folder for this lab.
3. Right-click the 2008_SalesQuota.csv file, and then select Open With | Notepad.
4. In the Notepad window, notice the structure of the data. Each file represents a year’s quotas for a sales territory group. The file stores the quota values by employee ID and calendar quarter, delimited by commas. A header row is also included.
5. To close the file, on the File menu, select Exit.
6. Leave the File Explorer window open.
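
The file layout described in this task can be sketched in Python as follows. This is only an illustration: the column names (EmployeeID, Q1 to Q4) follow the names used later in this lab, while the two data rows and the read_quota_rows helper are invented for the example and are not taken from the real extract file.

```python
import csv
import io

# Hypothetical sample content: the header follows the lab's column names,
# but the data rows are invented for illustration.
SAMPLE = """EmployeeID,Q1,Q2,Q3,Q4
111111111,1000.00,2000.00,3000.00,4000.00
222222222,1500.00,2500.00,3500.00,4500.00
"""


def read_quota_rows(text):
    """Parse quota extract text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_quota_rows(SAMPLE)
for row in rows:
    # Every value arrives as a string; typing the columns is a separate step.
    print(row["EmployeeID"], row["Q1"], row["Q2"], row["Q3"], row["Q4"])
```

Note that the parsed values are all strings, which is also how a flat file is treated by default before column data types are configured.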

Task 3 – Creating the Integration Services Project
In this task, you will begin by creating an empty Visual Studio solution. You will then add an Integration Services project and rename the default package.
1. Open SQL Server Business Intelligence Development Studio from Start | All Programs | Microsoft SQL Server 2008 R2 | SQL Server Business Intelligence Development Studio.
2. To create a new solution, on the File menu, select New | Project.
3. In the New Project window, in the Project Types pane, expand Other Project Types, and then select Visual Studio Solutions.
   Figure 2: Selecting the Visual Studio Solutions Project Type
4. In the Templates pane, ensure Blank Solution is selected, and then in the Name box, replace the text with AdventureWorksBI.
5. To set the project location, click Browse.
6. In the Project Location window, navigate to the Source folder for this lab, and then click Select Folder.
7. In the New Project window, click OK.
8. When the solution is created, in Solution Explorer, right-click the AdventureWorksBI solution, and then select Add | New Project.
9. In the Add New Project window, in the Templates pane, select Integration Services Project.
10. In the Name box, replace the text with Populate DW, and then click OK.
   Note: The Populate DW Integration Services project is added to the solution and a default package, Package.dtsx, is added to the project. The package is automatically opened in the package designer.
11. To rename the package, in Solution Explorer, right-click Package.dtsx, and then select Rename.
12. Modify the name to LoadFactSalesQuota.dtsx, and then press Enter.
13. When prompted to rename the package object, click Yes.
14. To save the solution, on the File menu, select Save All.

Task 4 – Configuring the Integration Services Options
In this task, you will configure the Integration Services options. These settings are recommended because they speed up package development and keep your control flow and data flow layouts tidy.
1. On the Tools menu, select Options.
2. In the Options window, in the left pane, expand Business Intelligence Designers | Integration Services Designers, and then select Control Flow Auto Connect.
3. Check the Connect a New Shape to the Selected Shape by Default checkbox.
4. In the second dropdown list, select Add the New Shape Below the Selected Shape.
   Figure 3: Configuring the Control Flow Auto Connect
5. In the left pane, select Data Flow Auto Connect.
6. Check the Connect a New Shape to the Selected Shape by Default checkbox.
7. In the dropdown list, select Add the New Shape Below the Selected Shape.
   Figure 4: Configuring the Data Flow Auto Connect
8. Click OK.

Task 5 – Defining the Package Variables
In this task, you will define two variables. The first will store the year and will be used to source the data file and update the FactSalesQuota table. The second will store the folder path used to locate the data file. It is a recommended practice to store such settings in variables so the package can adapt at execution time.
1. To open the Variables window, right-click anywhere inside the package designer, and then select Variables.
2. In the Variables window, click Add Variable.
   Figure 5: Add a New Variable
3. Modify the name of the variable to Year, and then scroll to the right to modify the Value to 2008.
   Note: This variable will be used to dynamically configure the package behavior. When executing the package, it is possible to update the variable’s value in order to load data for any given year.
4. To add a second variable, in the Variables window, click Add Variable.
5. Modify the name of the variable to FolderPath, and then set the Data Type to String.
6. To copy the quota extract folder path to the Clipboard, switch to the File Explorer window opened in Task 2.
7. Right-click the folder path, and then select Copy Address as Text.
8. Close the File Explorer window.
9. Switch to SQL Server Business Intelligence Development Studio.
10. Inside the Variables window, for the FolderPath variable, click inside the Value box, right-click, and then select Paste.
   Note: This variable will be used to store the folder path where the quota extracts are stored.
11. Verify that the two package variables look like the following.
   Figure 6: Verifying the Package Variables

Task 6 – Creating the Sales Quota Connection Manager
In this task, you will create the Sales Quota connection manager. This will involve configuring column data types and defining an expression that will allow the connection manager to dynamically connect to the file based on the values in the FolderPath and Year variables.
1. Right-click inside the Connection Managers pane, and then select New Flat File Connection.
2. In the Flat File Connection Manager Editor window, in the Connection Manager Name box, type Sales Quota.
3. To open a sales quota file, click Browse.
4. In the Open window, in the File Name box, right-click and then choose Paste to paste the quota extract folder path from the Clipboard, and then press Enter.
5. In the dropdown list to the right of the File Name box, select CSV Files (*.csv).
6. Select the 2008_SalesQuota.csv file, and then click Open.
7. Check the Column Names in the First Data Row checkbox.
8. Select the Columns page.
   Figure 7: Selecting the Columns Page
9. Review the preview data.
   Note: The editor has parsed the data in the opened file and has automatically determined the row and column delimiters.
10. Select the Advanced page.
11. Ensure that the EmployeeID column is selected, and then modify the OutputColumnWidth property to 15.
12. Select the Q1 column, and then modify the DataType to Currency [DT_CY].
13. Repeat the last step for each of the remaining three quarter columns (Q2, Q3 and Q4).
   Note: Integration Services assumes all data in a text file is of data type String. It is important to accurately configure the data types in the connection manager to ensure the data is retrieved as the correct type.
14. To complete the connection manager configuration, click OK.
15. To configure a dynamic reference to the file based on the value of the Year package variable, ensure the Sales Quota connection manager is selected, and then in the Properties window, select the Expressions property, and then click the corresponding ellipsis button.
   Note: If the Properties window is not open, on the View menu, select Properties Window.
16. In the Property Expressions Editor window, in the Property column, select ConnectionString, and then at the end of the row, click the ellipsis button to open the Expression Builder.
   Figure 8: Configuring the ConnectionString Expression
17. In the Expression Builder window, expand the Variables folder, and then drag the User::FolderPath variable into the Expression box.
18. Complete the expression based on the following.
   Note: The expression required in this step may be copied from the Assets\Snippets.txt file in the Source folder of this lab.
   SSIS Expression Language
   @[User::FolderPath] + "\\" + (DT_WSTR, 4)@[User::Year] + "_SalesQuota.csv"
   Note: Inside a string literal, Integration Services interprets the backslash (\) as an escape character. To produce a backslash in the string literal, two backslashes are required. Note also the cast operator that converts the Year variable value (integer) to a four-character Unicode string.
19. Click Evaluate Expression to review the file path for the 2008 quota extract file, and then click OK.
20. In the Property Expressions Editor window, click OK.
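
As a rough analogy to the ConnectionString expression in this task, the same string can be assembled in Python. This is a sketch only: quota_file_path is a hypothetical helper, and the folder in the usage line is a placeholder rather than the actual lab path.

```python
def quota_file_path(folder_path: str, year: int) -> str:
    """Build the extract file path from a folder and a year."""
    # "\\" in a Python string literal is a single backslash, which mirrors
    # the escaping rule described for the SSIS expression.
    # str(year) plays the role of the (DT_WSTR, 4) cast on the Year variable.
    return folder_path + "\\" + str(year) + "_SalesQuota.csv"


# Placeholder folder; in the lab the real path is pasted from File Explorer.
print(quota_file_path(r"C:\Labs\QuotaExtracts", 2008))
```

Changing only the year argument yields the path of a different extract file, which is the behavior the lab later relies on when the Year variable is set to 2009.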

Task 7 – Creating the Data Warehouse Connection Manager
In this task, you will create a connection manager for the data warehouse.
1. Right-click inside the Connection Managers pane, and then select New OLE DB Connection.
2. In the Configure OLE DB Connection Manager window, click New.
3. In the Connection Manager window, configure the connection based on the following, and then click OK.
   Property          Value
   Server Name       SqlServerTrainingKitAlias
   Database Name     AdventureWorksDW2008R2
4. In the Configure OLE DB Connection Manager window, click OK.
5. To rename the connection manager, right-click the connection manager, and then select Rename.
6. Modify the name to AdventureWorksDW2008R2, and then press Enter.

Task 8 – Introducing a Sequence Container
In this task, you will begin the control flow design with a Sequence Container that will host two tasks. The first will be an Execute SQL Task that deletes existing quota records, and the second will be a Data Flow Task that inserts new quota records. These two operations must happen as a single transaction, and you will configure the Sequence Container to support this requirement.
1. From the Toolbox, drag the Sequence Container and drop it into the package designer.
   Note: If the Toolbox is not open, on the View menu, select Toolbox. The Toolbox is used frequently during package development. It is recommended that you click the pushpin to keep the Toolbox permanently open.
2. To rename the Sequence Container, right-click the container, and then select Rename.
3. Modify the text to Update FactSalesQuota, and then press Enter.
4. To configure the transaction support, ensure the container is selected, and then in the Properties window, modify the TransactionOption property to Required.
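
The requirement that the delete and the insert succeed or fail together can be illustrated outside Integration Services. The following Python sketch uses the standard sqlite3 module with an in-memory database as a stand-in. The simplified table definition and the reload_year helper are assumptions made for this example, and the mechanism Integration Services itself uses to enlist tasks in a transaction is not shown.

```python
import sqlite3


def reload_year(conn, year, new_rows):
    """Delete one year's quotas and insert replacements as a single transaction."""
    # "with conn" commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute("DELETE FROM FactSalesQuota WHERE CalendarYear = ?", (year,))
        conn.executemany(
            "INSERT INTO FactSalesQuota (EmployeeKey, CalendarYear, SalesAmountQuota) "
            "VALUES (?, ?, ?)",
            new_rows,
        )


# Simplified, hypothetical version of the fact table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE FactSalesQuota ("
    "EmployeeKey INTEGER NOT NULL, CalendarYear INTEGER NOT NULL, "
    "SalesAmountQuota REAL NOT NULL)"
)
with conn:
    conn.execute("INSERT INTO FactSalesQuota VALUES (1, 2008, 100.0)")

# The second new row violates NOT NULL, so the insert fails and the
# preceding delete is rolled back along with it.
try:
    reload_year(conn, 2008, [(1, 2008, 200.0), (2, 2008, None)])
except sqlite3.IntegrityError:
    pass

remaining = conn.execute("SELECT SalesAmountQuota FROM FactSalesQuota").fetchall()
print(remaining)
```

Because the failed insert undoes the delete, the original 2008 row is still present afterwards, which is the outcome the Required transaction option is meant to guarantee for the container.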

Task 9 – Introducing the Execute SQL Task
In this task, you will add an Execute SQL Task inside the container, and configure the task to execute a stored procedure.
1. From the Toolbox, drag the Execute SQL Task and drop it inside the Update FactSalesQuota container.
2. To configure the task to delete existing facts, right-click it, and then choose Edit.
3. In the Execute SQL Task Editor window, configure the following properties, and then click OK.
   Property        Value
   Name            Delete Existing Quotas
   Connection      AdventureWorksDW2008R2
   SQLStatement    EXEC dbo.uspDeleteFactSalesQuota ?
   Note: The stored procedure in the SQLStatement property requires a single input parameter named @CalendarYear. The question mark (?) is a placeholder for this parameter.
4. Double-click the Delete Existing Quotas task to reopen the Execute SQL Task Editor window, and then select the Parameter Mapping page.
5. To add a parameter mapping, click Add.
6. In the Variable Name column, select User::Year.
7. In the Parameter Name column, replace the text with @CalendarYear, and then click OK.
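
The idea of a question mark placeholder that is bound to a variable at execution time can be sketched with Python's sqlite3 module, which uses the same marker. This is an analogy only: the body of dbo.uspDeleteFactSalesQuota is not shown in this lab, so the DELETE statement and the two-column table below are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified table for the example.
conn.execute("CREATE TABLE FactSalesQuota (CalendarYear INTEGER, SalesAmountQuota REAL)")
conn.executemany(
    "INSERT INTO FactSalesQuota VALUES (?, ?)",
    [(2007, 10.0), (2008, 20.0), (2008, 30.0)],
)

year = 2008  # plays the role of the User::Year package variable

# The ? marker is filled from the parameter tuple when the statement runs,
# so the SQL text itself never changes between years.
cursor = conn.execute("DELETE FROM FactSalesQuota WHERE CalendarYear = ?", (year,))
deleted = cursor.rowcount
left = conn.execute("SELECT COUNT(*) FROM FactSalesQuota").fetchone()[0]
print(deleted, left)
```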

Task 10 – Introducing the Data Flow Task
In this task, you will add a Data Flow Task and then create a precedence constraint that ensures this task executes upon successful completion of the Execute SQL Task.
1. From the Toolbox, drag the Data Flow Task and drop it inside the Update FactSalesQuota container, directly beneath the Delete Existing Quotas task.
2. To rename the Data Flow Task, right-click it, and then select Rename.
3. Modify the text to Insert New Quotas, and then press Enter.
4. To introduce a precedence constraint, select the Delete Existing Quotas task, and then drag the green arrow connector onto the Insert New Quotas task.
5. Verify that your package control flow resembles the following.
   Figure 9: Reviewing the Package Control Flow

Task 11 – Configuring the Data Flow Task
In this task, you will configure the Data Flow Task. The task will be responsible for unpivoting the four quarter columns to produce four rows per employee (one for each quarter). It will then add date calculations and perform a lookup against the DimEmployee table to retrieve the dimension surrogate key.
1. Right-click the Insert New Quotas task, and then select Edit.
   Note: The package designer switches to the Data Flow tab. In this tab you will assemble the data flow. Notice also that the content of the Toolbox now consists of the data flow components, categorized by sources, transformations and destinations.

Task 12 – Introducing the Flat File Source
In this task, you will add a Flat File Source and configure it to retrieve the data from the Sales Quota connection manager.
1. From the Toolbox, from inside the Data Flow Sources category, drag the Flat File Source and drop it into the package designer.
2. To rename the component, right-click the Flat File Source, and then select Rename.
3. Modify the text to Sales Quota, and then press Enter.
4. To configure the component, right-click it, and then select Edit.
5. In the Flat File Source Editor window, in the Flat File Connection Manager dropdown list, ensure the Sales Quota connection manager is selected.
6. To review the data in the file, click Preview.
7. In the Data View window, click Close.
8. Select the Columns page.
9. Review the columns to be output by this component, and then click OK.

Task 13 – Introducing the Unpivot Transformation
In this task, you will add the Unpivot transformation and configure it to turn the four quarter columns into rows. You will then use the Advanced Editor to update the output CalendarQuarter data type to match the target table’s data type.
1. Select the Sales Quota component, and then in the Toolbox, from inside the Data Flow Transformations category, double-click the Unpivot component.
   Note: The introduction, layout alignment and auto connection of the component happened because you configured the Integration Services designer options in Task 4.
2. Right-click the Unpivot component, and then select Edit.
3. In the Unpivot Transformation Editor window, configure the component as shown below (there are four parts to update), and then click OK.
   Figure 10: Configuring the Unpivot Component
   Note: The output of this component will have three columns: EmployeeID, CalendarQuarter, and SalesAmountQuota. The CalendarQuarter column will contain the Pivot Key Value of 1, 2, 3 or 4. A downstream component that you will configure shortly will convert these numbers to the first day of the first month of the quarter. The SalesAmountQuota column will contain the value that was previously stored in the quota extract file at the intersection of the EmployeeID and the quarter.
4. Right-click the Unpivot component, and then select Show Advanced Editor.
5. In the Advanced Editor for Unpivot window, select the Input and Output Properties tab.
6. In the Inputs And Outputs pane, expand Unpivot Output, and then expand Output Columns.
7. Select the CalendarQuarter column, modify the DataType property to Single-Byte Unsigned Integer [DT_UI1], and then click OK.
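
The effect of the Unpivot transformation on a single input row can be sketched in Python. The output column names match those given in the note for this task. The unpivot function and the sample row are hypothetical and only illustrate the reshaping, not how the component is implemented.

```python
def unpivot(row):
    """Turn one wide row (Q1..Q4 columns) into four narrow rows."""
    return [
        {
            "EmployeeID": row["EmployeeID"],
            "CalendarQuarter": quarter,  # pivot key value 1, 2, 3 or 4
            "SalesAmountQuota": row[f"Q{quarter}"],
        }
        for quarter in (1, 2, 3, 4)
    ]


# Invented sample row for illustration.
wide = {"EmployeeID": "111111111", "Q1": 1000.0, "Q2": 2000.0, "Q3": 3000.0, "Q4": 4000.0}
narrow = unpivot(wide)
for r in narrow:
    print(r)
```

Each input row therefore contributes four output rows, one per quarter, each carrying the value found at the intersection of the employee and that quarter.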

Task 14 – Introducing the Derived Column Transformation
In this task, you will add a Derived Column transformation to add two columns to the data flow. The two columns will evaluate expressions to produce date values required in the target table. The first, DateKey, will represent the date of the first day of the first month of the quarter, based on the value assigned to the CalendarQuarter column in the Unpivot component and the year in the Year package variable. The second, CalendarYear, will convert the Year package variable to a two-byte signed integer (DT_I2), which corresponds to the SMALLINT data type.
   Note: The expressions required in the following steps may be copied from the Assets\Snippets.txt file in the Source folder of this lab.
1. Select the Unpivot component, and then in the Toolbox, from inside the Data Flow Transformations category, double-click the Derived Column component.
2. Right-click the Derived Column component, and then select Edit.
3. In the Derived Column Transformation Editor window, in the Derived Column Name column, type DateKey.
4. In the Expression column, type the following expression.
   SSIS Expression Language
   (@[User::Year] * 10000) + ((([CalendarQuarter] * 3) - 2) * 100) + 1
5. Create a second derived column named CalendarYear based on the following expression.
   SSIS Expression Language
   (DT_I2)@[User::Year]
6. Verify that the two derived columns resemble the following.
   Figure 11: Reviewing the Derived Column Definitions
7. In the Derived Column Transformation Editor window, click OK.
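
To see how the DateKey expression produces a yyyymmdd integer, the same arithmetic can be worked through in Python. The date_key function is a hypothetical name used only for this illustration.

```python
def date_key(year: int, calendar_quarter: int) -> int:
    """Return the yyyymmdd key for the first day of the given quarter."""
    first_month = calendar_quarter * 3 - 2  # 1, 4, 7 or 10
    return year * 10000 + first_month * 100 + 1  # day is always 01


keys = [date_key(2008, q) for q in (1, 2, 3, 4)]
print(keys)  # [20080101, 20080401, 20080701, 20081001]
```

For quarter 3 of 2008, for example, the first month is 3 * 3 - 2 = 7, so the key is 20080000 + 700 + 1 = 20080701.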

Task 15 – Introducing the Lookup Transformation
In this task, you will add a Lookup transformation and configure it to relate the input EmployeeID column to the EmployeeNationalIDAlternateKey column in the dimension table. It will then add the surrogate key, EmployeeKey, as an output column.
1. Select the Derived Column component, and then in the Toolbox, from inside the Data Flow Transformations category, double-click the Lookup component.
2. Right-click the Lookup component, and then select Edit.
3. Select the Connection page.
4. In the OLE DB Connection Manager dropdown list, ensure the AdventureWorksDW2008R2 connection manager is selected.
5. Select the Use Results of an SQL Query option, and then in the box below it, type the following query.
   Note: The query required in this step may be copied from the Assets\Snippets.txt file in the Source folder of this lab.
   T-SQL
   SELECT EmployeeKey, EmployeeNationalIDAlternateKey
   FROM dbo.DimEmployee
   WHERE (SalesPersonFlag=1) AND (CurrentFlag=1);
6. To review the lookup data, click Preview.
7. In the Preview Query Results window, click Close.
8. Select the Columns page.
9. From the Available Input Columns table, drag the EmployeeID column and drop it onto the EmployeeNationalIDAlternateKey column in the Available Lookup Columns table.
10. In the Available Lookup Columns table, check the EmployeeKey column.
   Figure 12: Configuring the Lookup Column
11. Click OK.
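
Conceptually, the lookup configured in this task resembles a dictionary lookup keyed on the alternate key. The following Python sketch assumes that rows without a matching employee are separated from the matched rows. The function name, the sample dimension rows and the sample input rows are all invented for illustration.

```python
def lookup_employee_keys(rows, dim_employee):
    """Add EmployeeKey to each row whose EmployeeID matches the dimension."""
    # Build the reference set once, keyed on the alternate key.
    index = {d["EmployeeNationalIDAlternateKey"]: d["EmployeeKey"] for d in dim_employee}
    matched, unmatched = [], []
    for row in rows:
        key = index.get(row["EmployeeID"])
        if key is None:
            unmatched.append(row)
        else:
            matched.append({**row, "EmployeeKey": key})
    return matched, unmatched


# Invented reference data standing in for the DimEmployee query result.
dim_employee = [
    {"EmployeeKey": 272, "EmployeeNationalIDAlternateKey": "111111111"},
    {"EmployeeKey": 281, "EmployeeNationalIDAlternateKey": "222222222"},
]
rows = [
    {"EmployeeID": "111111111", "CalendarQuarter": 1},
    {"EmployeeID": "999999999", "CalendarQuarter": 1},
]
matched, unmatched = lookup_employee_keys(rows, dim_employee)
print(matched)
print(unmatched)
```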

Task 16 – Introducing the OLE DB Destination
In this task, you will add the OLE DB Destination and configure it to store the data flow data using the AdventureWorksDW2008R2 connection manager.
1. Select the Lookup component, and then in the Toolbox, from inside the Data Flow Destinations category, double-click the OLE DB Destination.
2. In the Input Output Selection window, in the Output dropdown list, select Lookup Match Output, and then click OK.
3. To rename the component, right-click the OLE DB Destination, and then select Rename.
4. Modify the text to FactSalesQuota, and then press Enter.
5. Right-click the FactSalesQuota component, and then select Edit.
6. In the OLE DB Destination Editor window, in the OLE DB Connection Manager dropdown list, ensure the AdventureWorksDW2008R2 connection manager is selected.
7. In the Name of the Table or the View dropdown list, select [dbo].[FactSalesQuota].
8. Select the Mappings page.
   Note: When you select this page for the first time, column mappings are automatically assigned when the column names match and the data types of the matched columns are at least equivalent. Because all insertable columns have been mapped by this step, you do not need to perform any additional mapping configurations. Note that the SalesQuotaKey column is an identity column; SQL Server will automatically insert unique sequential values into this column.
9. Click OK.
10. Verify that your data flow resembles the following.
   Figure 13: Reviewing the Data Flow

Task 17 – Executing the Package
In this task, you will execute the package to load the 2008 sales quota data and observe the data flow progress. You will then modify the Year package variable and execute the package again to load the 2009 sales quota data.
1. In the package designer, select the Control Flow tab.
2. In Solution Explorer, right-click the LoadFactSalesQuota.dtsx package, and then select Execute Package.
3. When the package execution completes (note the status line at the bottom of the package designer), select the Data Flow tab, and then review the execution status and row statistics displayed beside the data flow paths.
4. On the Debug menu, select Stop Debugging.
5. Select the Variables window.
   Figure 14: Selecting the Variables Window
6. In the Variables window, modify the value of the Year variable to 2009.
7. Repeat the steps in this task to execute the package again.

Task 18 – Reviewing the FactSalesQuota Data
In this task, you will query the FactSalesQuota table again to review the loaded data.
1. Switch to SQL Server Management Studio.
2. To execute the query again, on the toolbar, click Execute.
3. Review the query results. A total of 265 rows should be returned. Notice that there is now data for calendar years 2008 and 2009.

Task 19 – Finishing Up
In this task, you will finish up by closing all applications.
1. In SQL Server Management Studio, on the File menu, select Exit.
2. When prompted to save changes, click No.
3. In SQL Server Business Intelligence Development Studio, on the File menu, select Exit.

Summary
In this lab, you have created an Integration Services package to populate the data warehouse’s FactSalesQuota table.