MHC Data Warehouse Project
MHC Data Warehouse Project Glossary of Terms
Table of Contents
Basic Data Warehousing Terminology
SAP Business Objects
Universe Project Process Flow
Bibliography
Basic Data Warehousing Terminology
Attribute- Individual data elements that are represented and stored in a dimension. Each attribute contains data relating to that dimension.
Business Intelligence (BI)- The collection of one or more reports and analyses, using data from the data warehouse, that provide insight into the performance of a business organization. These reports and analyses are typically interactive to enable further understanding of specific areas of interest. They are used to support business professionals in their decision-making processes.
Business Measures- The complete set of facts, base and derived, that are defined and made available for reporting and analysis.
Conformed Dimension- A dimension that is shared between two or more fact tables. It enables the integration of data from different fact tables at query time. This is a foundational principle that enables the longevity of a data warehousing environment. By using conformed dimensions, facts can be used together, aligned along these common dimensions. The beauty of using conformed dimensions is that facts that were designed independently of each other, perhaps over a number of years, can be integrated. The use of conformed dimensions is the central technique for building an enterprise data warehouse from a set of data marts.
Conformed Fact- A fact or measure whose definition is consistent across fact tables and data marts. A conformed fact, such as revenue, can be correctly added and compared across different fact tables.
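The drill-across idea behind conformed dimensions and facts can be sketched with a toy schema. All table and column names below are invented for illustration, not the project's actual design: two fact tables built independently are aligned through the shared `dim_date` dimension.

```python
import sqlite3

# Two independent fact tables that share the conformed dimension dim_date,
# so their results can be aligned by the common month attribute.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE sales_fact (date_key INTEGER, revenue REAL);
CREATE TABLE returns_fact (date_key INTEGER, refund REAL);
INSERT INTO dim_date VALUES (1, '2013-01'), (2, '2013-02');
INSERT INTO sales_fact VALUES (1, 100.0), (1, 50.0), (2, 75.0);
INSERT INTO returns_fact VALUES (2, 20.0);
""")

# Drill-across: aggregate each fact table separately, then align the
# result sets along the shared dimension.
rows = con.execute("""
SELECT d.month,
       (SELECT COALESCE(SUM(revenue), 0) FROM sales_fact s
         WHERE s.date_key = d.date_key) AS revenue,
       (SELECT COALESCE(SUM(refund), 0) FROM returns_fact r
         WHERE r.date_key = d.date_key) AS refunds
FROM dim_date d ORDER BY d.month
""").fetchall()
print(rows)  # [('2013-01', 150.0, 0), ('2013-02', 75.0, 20.0)]
```

Because both facts are keyed to the same conformed dimension, revenue and refunds designed in separate marts can be compared month by month.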
Dashboard (also called performance dashboard)- The presentation of key business measurements on a single interface designed for quick interpretation, often using graphics. The most effective dashboards are supported by a full data mart that enables drilling down into more detailed data to better understand the indicators.
Data Architecture- Describes how data is organized and structured to support the development, maintenance, and use of the data by application systems. This includes guidelines and recommendations for historical retention of the data, and how the data is to be used and accessed.
Data Cleansing- The process of verifying and correcting data using a series of business rules for validation, and specifying how to handle cases that fail the checks.
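As a sketch, rule-based cleansing might look like the following; the rule names and fields here are purely illustrative, not actual MHC validation rules:

```python
# Each business rule validates one aspect of a row; rows that fail any
# check are set aside for handling rather than loaded.
RULES = {
    "id_present":  lambda row: bool(row.get("id")),
    "valid_state": lambda row: row.get("state") in {"MA", "NY", "CT"},
}

def cleanse(rows):
    clean, rejected = [], []
    for row in rows:
        failures = [name for name, check in RULES.items() if not check(row)]
        if failures:
            rejected.append((row, failures))  # route failures for review
        else:
            clean.append(row)
    return clean, rejected

clean, rejected = cleanse([
    {"id": "A1", "state": "MA"},
    {"id": "",   "state": "ZZ"},
])
print(len(clean), len(rejected))  # 1 1
```

Keeping the rejected rows together with the names of the rules they failed is one way of "specifying how to handle cases that fail the checks."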
Data Dictionary- The place where information about data that exists in the organization is stored. This should include both technical and business details about each element.
Data Element- The smallest unit of data that is named. The values are stored in a column or a field in a database.
Data Governance- The practice of organizing and implementing policies, procedures, and standards for the effective use of an organization's structured or unstructured information assets.
Data Mart- Typically, a data model (star schema) that supports a particular business process or workflow. A data warehouse is the collection of many data marts standardized by the use of conformed dimensions.
Data Model- An abstraction of how individual data elements relate to each other. It visually depicts how the data is to be organized and stored in a database. A data model provides the mechanism to document and understand how data is organized.
Data quality- Assessment of the cleanliness, accuracy, and reliability of data.
Data Warehouse- A broad definition to describe an integrated information repository that marries data from several systems (Colleague, Lawson, PowerFaids, etc.). These data are loaded into a database around specific subject-area data models, or marts. The data warehouse is the physical layer, i.e., where the data actually live, which we then access with front-end BI tools like Business Objects, Tableau, Excel, or any other tool.
Degenerate Dimension- A single-attribute dimension whereby the only attribute is a reference identifier such as invoice number, P.O. number, or transaction ID. This is needed to support analysis of the individual parts of a business transaction (individual line items) and the entire business transaction (the whole purchase order).
Derived Attribute- An attribute that is created to facilitate the overall usefulness of the dimensional model. This enables different attributes to be pulled together using a single identifier that can help the technical implementation of the dimensional model. It is used when the underlying source systems do not have a data element to uniquely define this case, and it is required to pull together the unrelated attributes of a junk dimension.
Derived Fact- A fact that is calculated on-the-fly and not stored in the database.
Dimension- Major business categories of information or groupings used to describe business data. Dimensions contain information used for constraining queries, report headings, and defining drill paths. Within a dimension, specific attributes are the data elements that are used as row and column headers on reports. Dimensional attributes are also considered to be reference data. When describing the need to report information by region, by week, and by month, the attributes following "by" are dimensions. Each of these would be included in a dimension.
Dimensional Model- A data model organized for the purposes of user understandability and high performance. In a relational database, a dimensional model is a star join schema characterized by a central fact table with a multi-part key.
Dimensional Modeling- A formal data modeling technique that is used to organize and represent data for analytical and reporting use. The focus is on the business perspective and the representation of data.
Entity-Relationship (ER) Model- A data model that is used to represent data in its purest form and to define relationships between different entities. It is often the type of model used to design online transaction processing systems. See also normalized model.
Extract, transform, and load (ETL)- The collection of processes that are used to prepare data for another purpose. This is typically applied to data warehousing, whereby the extracted process collects data from the appropriate underlying source systems. The transformation processes perform cleansing, manipulation, and reorganization of the data in preparation for its intended use. Finally the load processes put the data into the data structures where it is held for data delivery. While ETL processes are regularly discussed in the context of building the data warehouse, these techniques can also be used for moving and manipulating data for a variety of other purposes.
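A minimal illustration of the three ETL stages, assuming made-up source fields rather than any actual MHC feed:

```python
# Hypothetical "source system" rows with the raw quirks ETL must handle:
# padded IDs, inconsistent casing, numbers stored as text.
source = [
    {"STUDENT_ID": " 001 ", "TERM": "FA13", "CREDITS": "4"},
    {"STUDENT_ID": "002",   "TERM": "fa13", "CREDITS": "3"},
]

def extract(system):
    return list(system)                       # collect rows from the source

def transform(rows):
    return [{                                 # cleanse and reorganize
        "student_id": r["STUDENT_ID"].strip(),
        "term": r["TERM"].upper(),
        "credits": int(r["CREDITS"]),
    } for r in rows]

def load(rows, warehouse):
    warehouse.extend(rows)                    # write into the target structure

warehouse = []
load(transform(extract(source)), warehouse)
print(warehouse[1])  # {'student_id': '002', 'term': 'FA13', 'credits': 3}
```

Real ETL tools add scheduling, error handling, and staging, but the extract → transform → load pipeline shape is the same.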
Factless Fact Table- A fact table that captures the existence of business events that do not have an associated quantitative measurement. The existence of the relationship is what is relevant.
Facts- The fundamental measurements of the business. These are captured as specific information about a business event or transaction. They are measured, monitored, and tracked over time. Facts are typically the amounts and counts that show up as the body of reports. Facts are used for any and all calculations that are performed.
Grain- The level of detail at which data is stored and available for analysis.
Information Management- In its simplest form, this is the work associated with collecting, maintaining, applying, and leveraging data across an organization.
Infrastructure- A basic foundation technology that all other initiatives in the organization can rely on and use. This includes basic networking services such as providing shared network drives for storing the group's files (e.g., word processing documents, spreadsheets, and presentations). The existence of some sort of computer for each user can also be considered infrastructure. Basic networking of computers, via a local area or wireless network, is another example of infrastructure.
Junk Dimension- A dimension that brings together single attributes that may or may not have any true relationship to each other in order to simplify the model, improve query performance, and/or reduce data storage.
Master Data Management (MDM)- The processes and tools to help an organization consistently define and manage core reference or descriptive data across the organization. This may involve providing a centralized view of the data to ensure that its use for all business processes is consistent and accurate.
Multi-dimensional OLAP (MOLAP)- OLAP technology whereby the data is stored in proprietary array structures called multi-dimensional cubes. See also OLAP.
Normalized Model- A data model organized to clarify pure data relationships and targeted at gaining efficiencies in data storage and maintenance. This is used for the design of transaction processing systems. There are specific rules for normalization. Depending upon the number of rules followed (for different purposes), there are different "forms," such as third normal form. See also entity-relationship model.
Online Analytical Processing (OLAP)- A collection of common business analysis functions that are difficult to perform directly with SQL. Some of the specific functions that fall under the OLAP umbrella include time series comparison, ranking, ratios, penetration, thresholds, and contribution to the report or to the whole data population. Most business intelligence tools provide this type of functionality. The capabilities can be implemented in a variety of different data storage mechanisms. See also MOLAP, ROLAP, HOLAP.
Online Transaction Processing (OLTP)- OLTP systems are the fundamental systems used to run the business. These are also called operational systems or operational applications. They are often used as sources of data for the data warehouse.
Operational Data Store (ODS)- A collection of data from operational systems, most often integrated together, that is used for some operational purpose. The most critical characteristic here is that it is used for some operational function. This operational dependency takes precedence, and the ODS should not be considered a central component of the data warehousing environment. An ODS can be a clean, integrated source of data to be pulled into the data warehousing environment.
Query- The mechanism to get data out of a database. A query is composed of constraints used to filter the results, and it defines the data elements to be included in the result set, possibly with some mathematical computation, grouping, or sorting of the data.
Relational OLAP (ROLAP)- OLAP technology in which data is stored in relational database management systems. Data is usually organized dimensionally using a star or snowflake schema.
Role-Playing Dimension- Instances of a dimension that legitimately has more than one value for a given business transaction, such as order date and shipped date. Each attribute within the dimension is uniquely identified to enable easy differentiation between the different roles, such as Order Date and Order Quarter versus Shipped Date and Shipped Quarter.
Scorecard (or performance scorecard)- An application that helps organizations measure and align the strategic and tactical aspects of their businesses, comparing organizational and individual performance to goals and targets.
Slowly Changing Dimension (SCD)- A dimension that accommodates changes to the reference data over time. Several dimensional modeling techniques are used to determine how to handle changes to the reference data stored in dimensions. This may be to retain only the current values (Type 1), to store different versions of the reference data (Type 2), or to retain one previous version of each changed attribute (Type 3).
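The Type 1 and Type 2 techniques can be sketched as follows; the table layout, keys, and department names are illustrative only, not the project's design:

```python
def scd_type1(dim, natural_key, attrs):
    """Type 1: overwrite in place; only the current values are retained."""
    dim[natural_key].update(attrs)

def scd_type2(rows, natural_key, attrs, as_of):
    """Type 2: expire the current row and insert a new dated version."""
    for row in rows:
        if row["key"] == natural_key and row["end_date"] is None:
            row["end_date"] = as_of          # close out the old version
    rows.append({"key": natural_key, "start_date": as_of,
                 "end_date": None, **attrs})

# Type 1: the old value is simply lost.
current = {"D-1": {"dept": "Biology"}}
scd_type1(current, "D-1", {"dept": "Biological Sciences"})

# Type 2: both versions survive, each bounded by effective dates.
history = [{"key": "D-1", "dept": "Biology",
            "start_date": "2010-01-01", "end_date": None}]
scd_type2(history, "D-1", {"dept": "Biological Sciences"}, "2013-04-10")
print(len(history))  # 2
```

Type 2 is what lets facts recorded before the change continue to join to the reference values that were true at the time.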
Snowflake Schema- A variation of the star schema in which the business dimensions are implemented as a set of normalized tables. The resulting diagram resembles a snowflake.
Source System- An operational system of record whose function it is to capture the transactions of the business. Source systems are often large online transaction processing systems, but could also be smaller departmental databases or spreadsheets that are maintained and used by members of the business community. These are the origin of the data used to build the data warehouse.
Staging Area- The place where data is stored while it is being prepared for use, typically by ETL processes. This may encompass everything from where the data is extracted from its original source until it is loaded into presentation servers for end-user access. It may also be where data is stored to prepare it for loading into a normalized data warehouse.
Star Schema- The implementation of a dimensional model in a relational database. The tables are organized around a single central fact table possessing a multi-part key, and each surrounding dimension table has its own primary key.
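A minimal star schema and star-join query can be sketched in SQLite; all table and column names here are invented for illustration:

```python
import sqlite3

# One central fact table whose multi-part key is made of foreign keys
# into the surrounding dimension tables.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_student (student_key INTEGER PRIMARY KEY, class_year TEXT);
CREATE TABLE dim_term    (term_key INTEGER PRIMARY KEY, term TEXT);
CREATE TABLE enrollment_fact (
    student_key INTEGER REFERENCES dim_student,
    term_key    INTEGER REFERENCES dim_term,
    credits     INTEGER,
    PRIMARY KEY (student_key, term_key)      -- multi-part key
);
INSERT INTO dim_student VALUES (1, '2015'), (2, '2016');
INSERT INTO dim_term VALUES (10, 'FA13');
INSERT INTO enrollment_fact VALUES (1, 10, 16), (2, 10, 12);
""")

# A typical star-join query: constrain and group by dimension attributes,
# aggregate the facts.
rows = con.execute("""
SELECT s.class_year, SUM(f.credits)
FROM enrollment_fact f
JOIN dim_student s ON s.student_key = f.student_key
JOIN dim_term t    ON t.term_key = f.term_key
WHERE t.term = 'FA13'
GROUP BY s.class_year ORDER BY s.class_year
""").fetchall()
print(rows)  # [('2015', 16), ('2016', 12)]
```

The query shape (facts aggregated, dimensions filtered and grouped) is what universes and other BI semantic layers generate behind the scenes.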
Structured Query Language (SQL)- The programming language used to access data stored in a relational database.
Technical Architecture- Addresses the organization and structure of the collection of hardware and software technologies that are installed to support the development and delivery of the data warehouse.
Third Normal Form (3NF)- The most common form of a normalized model. See also normalized model. (Source: Reeves, Laura L. A Manager's Guide to Data Warehousing. Indianapolis, IN: Wiley, 2009. Print.)
SAP Business Objects
MHC Business Objects Servers- 3 servers:
Panda- https://panda.mtholyoke.edu/BOE/BI (System Test)
Whale- https://whale.mtholyoke.edu/BOE/BI (Development)
Swan- https://swan.mtholyoke.edu/BOE/BI (Production)
Web Intelligence (Webi)- You perform data analysis with SAP BusinessObjects Web Intelligence by creating reports based on data you want to analyze, or by opening pre-existing documents. Depending on your licenses and security rights, you can then analyze the data in your reports by, for example, filtering, drilling down to reveal more details, merging data from different data sources, displaying data in charts, or adding formulas.
Web Intelligence has 3 interfaces:
Web- also referred to as the DHTML interface; you launch this via BI Launch Pad
Rich Internet Application- also referred to as the Java applet; you launch this via BI Launch Pad
Web Intelligence Rich Client- you download and install this via BI Launch Pad
Rich Client- You download and install this via BI Launch Pad; it is a desktop application.
Universes- Data comes from universes, which organize data from relational or OLAP databases into objects or hierarchies; from personal data providers such as Microsoft Excel or CSV files; from BEx queries based on SAP InfoCubes; from Web Services; or from Advanced Analysis workspaces. You build data providers to retrieve data from these data sources, and you create reports from the data in the data providers. The universe is the semantic layer that one directly interfaces with in the Query Panel of Webi. This layer is considered a "translation" layer, translating the database-level complexities (joins, technical definitions, and implementation) into a business view of the data. Data are organized into classes (folders) that contain objects called dimensions and measures. (Note: SAP BO uses the word "dimension" in the same manner we often use "attribute" in the DW.) Everything, such as names and object organization, is merely to support ease of use and navigation.
BI Platform- The overall BI system.
BI Launch Pad- The BI Platform includes BI Launch Pad, a web application that acts as a window to business information about your college. In BI Launch Pad you can perform the following tasks:
Access Crystal Reports, Web Intelligence documents, and other objects, and organize them to suit your needs
View information in a web browser, export it to other business applications (such as Microsoft Excel and SAP StreamWork), and save it to a specified location
Use analytical tools to explore the information in detail
>Object- An object is a document or file created by the BI platform or other software that is stored and managed in the BI platform repository.
>Categories- A category is an organizational alternative to a folder. Use categories to label objects.
>Scheduling- Scheduling is the process of automatically running an object at a specified time. Scheduling refreshes dynamic content or data in the object, creates instances, and distributes the instances to users or stores them locally.
>Events- An event is an object that represents an occurrence in the BI platform. Events can be used for a variety of purposes, including: as scheduling dependencies that trigger actions after a scheduled job has run; to trigger alert notifications; to monitor BI platform performance.
>Calendars- A calendar is a customized list of run dates for scheduling jobs.
>Instances- An instance is a snapshot of an object that contains data from the time the object was run.
>Publishing- Publishing is the process of making personalized dynamic content publicly available for mass consumption.
>Profiles- A profile is an object that associates users and groups with personalization values. Profiles are used with Publishing to create personalized content and distribute it to recipients.
>Alerting- Alerting is the process of notifying users and administrators when events occur in the BI Platform.
Universe Project Process Flow
Define Scope - Scope should be defined by the Business Owner. A scope statement should be created and reviewed and signed off by all core team members.
Inventory- Once the project scope has been defined and agreed upon, the Business Owner will create an inventory of the data elements and reports required to support the defined scope. The Business Owner should try to identify the source systems and data domains, as well as the needs and gaps of the project.
Analyze- Review all of the data elements and reports. Engage in detailed discussions to profile the data and determine the business rules around the data. Identify any data issues and challenges, and try to identify any risks. Document and define the business and data requirements. Determine the security requirements as well: who should be able to view the data, and who should not. The business requirements should be reviewed and signed off by the core team so everyone is aware of the actual requirements.
Design- Develop the Conceptual Data Design, the Logical Models, the Security structure, and the BOE layout.
Prototype- Construct the physical database structures, and load sample data. Build the BO Universe prototype. (Note: this step skips a mature ETL step.)
Prototype Validation- Validate the data model with the customer by visually displaying the model in BO. Determine whether the model works as expected and comprises all of the required data. Review the BO Universe layout, design, and data quality with the customer to ensure it meets their expectations. This validation is strictly for the design of the data model, not the actual data. The data will be validated during User Acceptance Testing (UAT).
Iterate/Refine- If the prototype is not approved by the customer, go back to the Analyze step to ensure there is a common understanding of the business requirements. Follow the next steps in sequence until the prototype has been validated by the customer: Analyze, Design, Prototype, Prototype Validation.
Construct- Business Definition and Data Source- The Business Owner will create business definitions for all data elements that are being added to BO. The technical team will provide the data source. At a minimum, both the business definition and the data source will be added in BO as a reference tool for the users. The users will be able to hover over any data element to display the business definition and data source.
Test- There is a test strategy document that outlines the test strategy in further detail. Functional Testing- This testing is done by the Data Orchestra and the Data Modeler prior to UAT. It is done to ensure that the system works as defined in the business requirements. The specific test cases will be logged in the System Test Template and stored on the shared drive so the team can review what has been tested.
UAT (User Acceptance Testing)- This phase entails the user validating the data and business requirements as outlined in the Data Inventory Document and the Requirements Document. The user will use the UAT Test document to plan and document the test scenarios to ensure that BO is functioning as required.
Go Live- Prepare the production environments; migrate BO content, reports, and universes; migrate ETL. The Go Live Checklist will be used to ensure all steps were done to support the move to production.
Post Production Support - This is a four week post-production period to finish up non-critical loose ends, and to address any unforeseen issues.
Bibliography
"BI Launch Pad Users Guide: SAP Business Objects Business Intelligence Platform 4.0 Support Package 4." help.sap.com. SAP Business Objects, 10 Aug. 2012. Web. 10 Apr. 2013.
Reeves, Laura L. A Manager's Guide to Data Warehousing. Indianapolis, IN: Wiley, 2009. Print.
"SAP Business Objects Web Intelligence Rich Client Users Guide: SAP Business Objects Business Intelligence Suite 4.0 Feature Pack 3." docs.google.com. Mount Holyoke College file, 16 Mar. 2012. Web. 10 Apr. 2013.
"SAP Business Objects Web Intelligence Users Guide: SAP Business Objects Business Intelligence Suite 4.0 Support Package 5.0." help.sap.com. SAP Business Objects, 11 Mar. 2013. Web. 10 Apr. 2013.