Benchmark Financial Brokers

UNIVERSITY OF HOUSTON-CLEAR LAKE
Benchmark Financial Brokers
A Case Study on Building a Data Warehouse
Prepared by: Terry Lee Tran Ngo Sandeep Udhani Navin Negi
Prepared for: Dr. Rob ISAM 5332
5/5/2010 University of Houston-Clear Lake ISAM 5332 Dr. Rob
Abstract
The purpose of this paper is to discuss how we approached building a data warehouse for Benchmark Financial Brokers (Benchmark), starting with taking the wrong approach and ending up with a functioning project.

Introduction
Benchmark is involved in the securities trading business and as such has to maintain multiple transactional systems. These systems are adequate to run the day-to-day operations of the business; however, they do not allow for timely access to strategic information. This is information that will help the owner and managers maintain the long-term health of the organization. The purpose of a data warehouse is to collect desired data from various transactional systems, convert it into a common format, cleanse it, and allow it to be queried for useful strategic information in the future. While the concept of a data warehouse sounds quite simple, in reality it is almost the complete opposite. Data must be continually collected, transformed, and cleansed from each of the different transactional systems in order to keep the warehouse properly maintained.

Case Study: Benchmark Financial Brokers

1. Business Scenario
Benchmark started with a single location in Texas and has now expanded to forty-five locations throughout Texas, Louisiana, and Florida. The business owner plans on expanding to include more locations throughout the country. The problem that the owner and the managers of this business face is that with their continual expansion it is becoming increasingly difficult for them to conduct timely visits to each of the locations in order to monitor employee performance, track revenue, and ensure sound financial advice is being offered to the customer. Benchmark has operational systems that allow the advisors to monitor the constantly fluctuating stock markets, as well as transactional systems that allow them to buy or sell various financial instruments for their customers. These various systems present a challenge when it comes to generating useful reports that provide the information that the owner is looking for. The owner is convinced that if he can find a way to extract the data that he needs from each of these systems he will be able to maintain greater control over his business.

2. Why a Data Warehouse?
Benchmark's owner needs better strategic information about his business. This is what a data warehouse is perfect for. The concept of a data warehouse is to be able to collect data from multiple operational and transactional systems, combine them in some manner, store them in a central location, and provide Online Analytical Processing abilities in the future. This is exactly what the owner of Benchmark has been looking for. He needs the ability to collect and query this data in order to obtain reports that will provide him with the strategic information that he needs. This can include reports such as revenue for each location by state, revenue by investment type, customers who have money in high-risk investments, and employees who may be purposely selling these investments. The ability to query this kind of system online from his office is the perfect solution to his problem.

3. Methodology
Beginning this project as a group, we did not have a very clear picture of how we needed to go about designing a data warehouse. We began with designing what was essentially a transactional database. One of our primary concerns when we started the process was how to track the price of the securities that are being sold, as the stock market is constantly fluctuating. Even though it made complete sense to us, our initial approach proved to be wrong. A data warehouse is not meant to be a transactional database; it is a database that warehouses historical data about the organization. This is important because the stored data will be used strictly for designing the reports that management needs to help them make strategic decisions about the future.

Approaching the project a second time, we realized that instead of trying to figure out how to collect the data, we needed to figure out how to present the data that we already had. When designing the data warehouse, probably the best method to employ is to approach it from a reporting aspect. Approaching the project this way means that you must be thinking about the information that is ultimately desired by the end user. Once we realized that we needed to look at the project from this aspect, it led us to completely rethink our strategy and to completely redesign our dimensional models.

4. Dimensional Modeling and defining data structure
Dimensional modeling incorporates the business dimensions into the logical data model. (Ponniah 206) After defining requirements from the business needs, data structures are designed within the logical model. When defining requirements, users may not be able to precisely describe what they want in a data warehouse, but they can provide you with very important insights into how they think about the business, as well as tell you what measurement units are important for them. Managers think of the business in terms of how they want to measure it. These measurements are the facts that indicate to the users how their departments are doing in fulfilling their objectives. We can say business metrics or facts are what managers want to analyze in order to know the current situation of their business. These are important when making decisions that will affect the future of their organization.

When designing dimensions for a data warehouse it is extremely important to pay attention to the hierarchies that are contained within each of the dimensions. These dimensional hierarchies are the various levels of detail contained within a business dimension. Managers can use the dimensional hierarchies as the paths for drilling down or rolling up in analysis. Dimensional modeling is the technique that is used in designing a data warehouse. Many software vendors have expanded their modeling CASE tools to include dimensional modeling. Modern software is very useful when designing fact tables, dimension tables, and establishing the relationships between them. When you have finished modeling the dimensions and establishing the relationships, you end up with a database schema.

The two types of schemas that are generally used in a data warehouse are the STAR schema and the snowflake schema. The STAR schema is a simple database schema for data design using a dimensional model. This schema consists of a fact table in the center that is directly related to the dimension tables that surround it. Although the STAR schema is a relational model, it is not a normalized model. The snowflake method normalizes the dimension tables in a STAR schema. As mentioned earlier, it is very important to employ the correct approach when designing the schema for a data warehouse. This is where we made our first mistake. As a result of this, we have two different database schemas. The first schema that we designed was a complex snowflake schema which is depicted in figure 1.1.

Figure 1.1

When the dimensions in a STAR schema are completely normalized the resulting structure resembles a snowflake with the fact table in the middle. In the case of Benchmark, the most important fact to analyze is the sales commission, which is the revenue for the company. The transaction table is the fact table, which contains commission as an attribute. Transactions are analyzed based on dimensions such as Customer, Employee, Investment, Time, and Date, with the commission collected as the measure. At first, we approached this project from the wrong direction, and because of that we normalized the tables in our schema. This was done because we were thinking in terms of a transactional system where we would need to track changes in the price of stocks, bonds, and other marketable securities. We stored detailed information about employees such as position and salary.

After we realized that we weren't looking at the project correctly, we recognized that this information was not necessary for the data warehouse. Although a data warehouse is based on a relational database like one that would be used in an operational system, what we were building was a decision-support system. The important component of the entire project is the ability to track and analyze the commission. We do not need to worry about the price of investments or employee salary. In order to correct our error, we redesigned our database schema and arrived at a STAR schema which is depicted in figure 1.2. In this schema, we have the Transaction as the fact table with four dimension tables: Employee, Customer, Investment and Date_Time. We kept these dimensions because we needed to analyze commission by state, by employee, by customer, by investment type, and by investment risk class.

Figure 1.2

From our new STAR schema, we were able to define the data format that we need for the warehouse. We defined the table names, the attribute names, the data type of each attribute in each table, and the relationships among tables. From Benchmark's operational system, we extracted data into an Excel file that is depicted in figure 1.3. Many of the fields that were extracted are necessary for an operational system, but were not needed in the data warehouse. This is where cleansing data becomes important. In order to keep the warehouse efficient, we used Excel to remove the extraneous data before it was imported. After cleansing the data, we had only the attributes that were important to the structure of our system. An example of the cleansed data is found in figure 1.4. When we were satisfied that our data was formatted and cleansed correctly, we moved to our next step, which was to implement our database schema in Microsoft Access. This schema is displayed in figure 1.5. We used Access to define our relationships and make sure that the system functioned before importing the database into SQL Server 2008.

Figure 1.3 Figure 1.4
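
To make the STAR schema described in this section concrete, the following is a minimal sketch in Python using the standard sqlite3 module rather than Access or SQL Server. The paper names only the fact table (Transaction) and the four dimensions, so every column name here, the table name transaction_fact, the placement of state on the employee dimension, and all of the sample rows are assumptions made for illustration, not Benchmark's actual design or data.

```python
import sqlite3

# Hypothetical table and column names: the paper names only the fact table
# (Transaction) and its four dimensions, not their attributes.
SCHEMA = """
CREATE TABLE employee   (employee_id   INTEGER PRIMARY KEY, name TEXT, state TEXT);
CREATE TABLE customer   (customer_id   INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE investment (investment_id INTEGER PRIMARY KEY, name TEXT,
                         investment_type TEXT, risk_class TEXT);
CREATE TABLE date_time  (date_time_id  INTEGER PRIMARY KEY, year INTEGER,
                         month INTEGER, day INTEGER);
-- Fact table: one row per transaction, one foreign key per dimension.
CREATE TABLE transaction_fact (
    transaction_id INTEGER PRIMARY KEY,
    employee_id    INTEGER NOT NULL REFERENCES employee(employee_id),
    customer_id    INTEGER NOT NULL REFERENCES customer(customer_id),
    investment_id  INTEGER NOT NULL REFERENCES investment(investment_id),
    date_time_id   INTEGER NOT NULL REFERENCES date_time(date_time_id),
    commission     REAL NOT NULL
);
"""

# Whitelist of dimension attributes that a report may group by.
GROUP_BY_COLUMNS = {
    "state": "e.state",
    "employee": "e.name",
    "customer": "c.name",
    "investment_type": "i.investment_type",
    "risk_class": "i.risk_class",
}


def build_warehouse():
    """Create an in-memory warehouse and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                     [(1, "Employee A", "TX"), (2, "Employee B", "LA")])
    conn.executemany("INSERT INTO customer VALUES (?, ?)",
                     [(1, "Customer A"), (2, "Customer B")])
    conn.executemany("INSERT INTO investment VALUES (?, ?, ?, ?)",
                     [(1, "Fund A", "Mutual Fund", "Low"),
                      (2, "Stock B", "Stock", "High")])
    conn.execute("INSERT INTO date_time VALUES (1, 2010, 4, 1)")
    conn.executemany("INSERT INTO transaction_fact VALUES (?, ?, ?, ?, ?, ?)",
                     [(1, 1, 1, 1, 1, 100.0),
                      (2, 1, 2, 2, 1, 50.0),
                      (3, 2, 2, 2, 1, 25.0)])
    return conn


def commission_by(conn, dimension):
    """Total the commission measure grouped by one dimension attribute."""
    column = GROUP_BY_COLUMNS[dimension]  # KeyError for unknown attributes
    query = f"""
        SELECT {column}, SUM(t.commission)
        FROM transaction_fact AS t
        JOIN employee   AS e ON e.employee_id   = t.employee_id
        JOIN customer   AS c ON c.customer_id   = t.customer_id
        JOIN investment AS i ON i.investment_id = t.investment_id
        GROUP BY {column}
        ORDER BY {column}
    """
    return conn.execute(query).fetchall()


conn = build_warehouse()
print(commission_by(conn, "state"))
print(commission_by(conn, "risk_class"))
```

Because every dimension joins directly to the fact table, each report in this sketch needs only one join per dimension, which is the practical difference from the normalized snowflake design.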
Figure 1.5

5. Implementation in SQL Server 2008
In order to browse the data that is contained within the data warehouse, you must design a data structure called a cube. Constructing the cube is done in SQL Server Analysis Services. A cube is comprised of the fact table and all of the data that is directly related to it. The cube organizes the data into a format that can be easily queried, rolled up, drilled down, and sliced and diced based on the measures and hierarchies that are applicable to your particular data set.

Importing a database from Access to SQL Server is supposed to be an easy process, but trust us, it is anything but. Trying to figure out how to get the program to accept your data and process it turned out to be one of the biggest challenges of the whole project. According to Scott Cameron's SQL Server 2008 Analysis Services: Step by Step, if you have an existing relational database such as Access, Teradata, Oracle, IBM DB2, as well as some others, you should be able to select the appropriate driver and connect to your data source without any difficulties. (Cameron 39) If only this were true. Due to not having sufficient security clearance to upload our database onto the University of Houston-Clear Lake (UHCL) server, we decided to use a personal laptop to run the software. Operating system compatibility was one of the first problems that we encountered. The solution to this problem was to download the applicable service pack from Microsoft Update. Once the software was installed we attempted to import the database from Access. The next attempt to import the database resulted in being able to import the data, but this time we could not build or deploy our cube. Do not get frustrated when you encounter this problem. We have chosen not to outline the steps that we took to get the program to function correctly as they will be different for each application.

Once we had the database imported and functioning properly, we commenced building our cube. The ability to roll up and drill down your data is based on the hierarchies that are defined within your dimensions. This very important step is depicted in figure 1.6.

Figure 1.6

Without defining the hierarchies in each of your dimensions, you will not have access to all of the data. When a manager is looking for information, he may want a very high level of granularity, or a very low level of granularity. These types of details are very important when deciding how to define the dimensions that are contained within your data warehouse. When all of the hierarchies are defined, you must set up the relationships that are contained within the dimensions. An example of these relationships can be seen in figure 1.7.

Figure 1.7

When all of the hierarchies and relationships are set up, the cube can be launched. A fully implemented cube will look something like figure 1.8.

Figure 1.8

6. Browsing the Cube
Keeping in mind that the ultimate goal of the data warehouse is to provide strategic information to managers and business owners, it is now time to browse the cube that you have created. This is the process where you are actually designing the queries that will provide the reports the end user is looking for. The Benchmark project is concerned primarily with the commission that is collected from each transaction. In order to get a picture of the business as a whole it is more reasonable to query the data for commission from a particular region, or in our case, by each state. In figure 1.9, we have shown commission by state as it is presented in the cube browser.

Figure 1.9

This is a very high-level view of the data. If you were to add all of the hierarchies that are available to this query, you could drill the data down to provide commission for each employee in each zip code, as it relates to each type and name of investment. This is shown in figure 1.10.
Figure 1.10
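
The roll-up and drill-down behavior described for the cube browser can be illustrated outside of Analysis Services with a short Python sketch. It is only an approximation of the idea: the hierarchy levels (state, zip code, employee, investment type) follow the text, but the field names, the zip codes, and every figure in the sample rows are made up for illustration and are not Benchmark data.

```python
from collections import defaultdict

# Made-up fact rows, already joined to their dimension attributes.
ROWS = [
    {"state": "TX", "zip_code": "77058", "employee": "E1",
     "investment_type": "Stock", "commission": 100.0},
    {"state": "TX", "zip_code": "77058", "employee": "E2",
     "investment_type": "Bond", "commission": 50.0},
    {"state": "TX", "zip_code": "77002", "employee": "E3",
     "investment_type": "Stock", "commission": 25.0},
    {"state": "LA", "zip_code": "70001", "employee": "E4",
     "investment_type": "Bond", "commission": 40.0},
]


def aggregate(rows, levels):
    """Sum commission for each combination of the requested hierarchy levels.

    A short list of levels gives a rolled-up view; adding levels drills down.
    """
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[level] for level in levels)
        totals[key] += row["commission"]
    return dict(totals)


# Rolled up: one total per state.
print(aggregate(ROWS, ["state"]))

# Drilled down: one total per employee in each zip code, by investment type.
print(aggregate(ROWS, ["state", "zip_code", "employee", "investment_type"]))
```

The same rows answer both questions; only the list of levels changes, which is why the hierarchies have to be defined in the dimensions before the cube can be browsed at different levels of detail.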

7. Generating useful reports
Being able to browse the cube and design queries is a very powerful and useful tool. Unfortunately, to the end user of the system, some of these queries are almost unreadable within the cube browser. Remember that the final result of this project is to provide strategic information that will be useful to management in making decisions that will affect the future health of the organization. These reports are not going to be provided to a member of the IT staff who would be comfortable viewing the format in the browser. Management will want a report that can be read and interpreted easily. Providing these kinds of reports is easily done once you have a functional cube. The cube that was initially created within Analysis Services can also be accessed with Reporting Services, which is another very powerful tool that is included in SQL Server 2008. By creating a Reporting Services project, we were able to generate reports from the Benchmark warehouse that will be useful to the owner and management. The same information that is depicted in figure 1.9 is again displayed in figure 1.11 in a much easier to read format.

Figure 1.11

Another report that the Benchmark management wanted was Customers with High Risk Investments, figure 1.12, which would allow them to find customers who have money in an investment that is now considered to be high risk.

Figure 1.12

Even though this particular report does not directly track commission, it is directly related to the amount of commission that the company collects. The goal of Benchmark is to help grow the retirement funds of their customers, and if they were to ignore these risky investments, they would lose money, ruin the reputation they have strived to build, and drive new and existing customers away. When there are no customers, there is no commission to keep track of.

Conclusion
Entering into the process of constructing a data warehouse with no prior knowledge of the subject proved to be quite a challenge. It also turned into an exceptional learning experience. We learned to carefully analyze the project that has been presented before diving into it head first. It is essential to do this so that you can be sure that the correct approach is being taken with regard to the end result. Starting with the desired result and working backwards turned out to be the direction that we ultimately took with this project, and is probably a viable approach to take when designing a data warehouse. A data warehouse is a report-centric system, so beginning with an understanding of the desired output will lead to a much more efficient design plan. We believe that we have constructed a system that Benchmark will be able to rely on for their reporting needs for the foreseeable future.

Works Cited
Cameron, Scott. (2009) Microsoft SQL Server 2008 Analysis Services Step by Step. Redmond, Washington: Microsoft Press.
Ponniah, Paulraj. (2001) Data Warehousing Fundamentals: A Comprehensive Guide for IT Professionals. New York, New York: John Wiley & Sons.