Building Data Warehouse Using ERP Data Sources
ERP Data Warehouse Architectures, Tools & technologies
by
Wipro Technologies
January 2002
Table of Contents
1 Executive Summary
2 Introduction
3 Technical Challenges Associated with ERP Data warehousing
4 Desired features of the ERP Data Warehouse
5 Architectural Choices
6 Tools & Technology Available
  6.1 Packaged Solution from ERP vendors
    6.1.1 SAP Business Information Warehouse
  6.2 Extraction Tools
    6.2.1 ActaWorks from Acta
    6.2.2 DataStage from Ascential
    6.2.3 PowerCenter from Informatica
7 Conclusion
8 Appendix A
Wipro Confidential Page 2 of 37 ERP Data Warehouse
1 Executive Summary
ERP applications came into existence with the promise of an integrated application environment that would curb the uncontrolled growth of stovepipe IS applications and serve the needs of the whole enterprise.
After implementing expensive ERP packages, organizations as well as product vendors realized that although these solutions streamlined operational processes and IS applications, it was extremely difficult to serve the information needs of management. As a result organizations had to implement data warehouses for their decision support and business intelligence needs.
Organizations have three options for implementing the data warehouse.
ERP-centric Data Warehouse: The data warehouse is implemented using the ERP vendor’s data warehousing package, such as SAP Business Information Warehouse or PeopleSoft Enterprise Warehouse.
Because these packages are proprietary, this option is recommended only when more than 80% of the data in the data warehouse comes from the same vendor’s OLTP systems. Otherwise, data integration and customization costs may exceed the benefits of the well-integrated application environment.
Two Independent Data Warehouses: One data warehouse is built with non-ERP source data and the other is built within the ERP environment with ERP source data.
This option does not provide a true enterprise or cross-functional view and can result in multiple versions of the truth. It also carries the burden of maintaining two environments, with overheads in cost, manpower and diverse skill sets, and it creates confusion among business users.
Custom Build Data Warehouse: This is built outside the ERP environment using best-of-breed tools and technologies.
This is a highly flexible and scalable solution that enables a single version of the truth and can grow incrementally as organizational information needs grow. However, it takes somewhat longer to implement and requires more development effort. This option is recommended for a cross-functional, high-performance, high-volume, multi-dimensional analytical environment with a large user base.
Detailed advantages and disadvantages of each of these options are provided in section 5.
ETL tools for extraction of data from SAP R/3 and loading into SAP BW:
ActaWorks from Acta: ActaWorks is tightly integrated with SAP R/3 and works seamlessly with both SAP R/3 and SAP BW. It can also extract data from non-SAP R/3 data sources. It is becoming popular among SAP BW installations where SAP R/3 is the primary source, and it can extract incremental changes from SAP R/3.
DataStage from Ascential: Ascential’s DataStage is also one of the leading ETL tools. SAP is a reseller of DataStage and of the DataStage Load PACK for SAP BW. These tools are integrated into the mySAP Business Intelligence framework.
PowerCenter from Informatica: Informatica PowerCenter is a strong ETL tool. It has separate plug-ins (PowerConnect) for SAP R/3, Siebel, PeopleSoft and others, so it can extract data from SAP R/3, other ERP systems and legacy systems. It could be a better choice when the majority of the data comes from non-SAP legacy sources.
All three products are SAP certified. ActaWorks was the first product developed to integrate well with SAP R/3 and is popular among SAP R/3 users. SAP later became a reseller of DataStage and integrated it into its mySAP BI platform.
A detailed comparison of these three ETL tools is provided in Appendix A.
2 Introduction
Operational systems have been streamlined by deploying packaged enterprise resource planning (ERP) applications. These packages replace legacy and homegrown systems that are not well integrated. Traditionally, ERP packages have automated back-office operations, such as finance, human resources, and manufacturing. Now there are packages for front-office operations, such as sales, marketing, and customer service.
However, ERP systems cannot address decision-support requirements for several reasons:
- ERP applications are designed to process large volumes of simple requests
  - Larger queries take a long time to process and need more resources
- ERP databases contain thousands of small tables that eliminate data redundancies
  - It is easy to find and update a single data item, but querying is difficult
- ERP databases are very difficult to access, query, and navigate
  - Some ERP systems store data in proprietary formats, making it difficult to access
  - Finding the right entity within thousands of tables is a formidable barrier
- An ERP system does not satisfy all the operational requirements of an enterprise. Similarly, not all the modules of an ERP package meet the requirements of an enterprise, resulting in the implementation of part of the ERP package or of multiple ERP packages that may co-exist with other legacy applications
Therefore, there is a need to implement a data warehouse sourcing the data from the ERP, CRM and legacy systems to serve the information needs of business users.
This paper outlines the technical issues involved, the desired features, and the architectural options available for implementing the data warehouse in ERP and non-ERP environments.
3 Technical Challenges Associated with ERP Data warehousing
The following technical issues are involved in extracting data from ERP sources:
- The proprietary nature of ERP systems’ programming environments and APIs
- The complex architectures of ERP systems, which embed business logic and processes
- The data schemas of ERP systems, which are complex and typically contain thousands of tables (SAP has about 9,000), often described with abbreviations
- The use of non-standard storage formats
- Change data capture
4 Desired features of the ERP Data Warehouse
- ERP data warehousing requires an ETL infrastructure that enables the extraction and integration of data from multiple diverse platforms such as legacy, CRM and sales force automation systems and external marketing data providers.
- Capturing changed data from ERP and legacy applications is a challenge because of the large volume of transactions, the complex architecture and the short time window available for extracting data from ERP applications.
- Organizations require information and analysis in real time to facilitate important decisions. To achieve this, the ERP data warehouse must extract and transform data from ERP applications in near real time.
- Meta data management and the reconciliation of inconsistent Meta data are the biggest problems facing organizations with regard to their data warehousing applications.
- The ERP data warehouse should support both technical analysts and less technical general business users.
- ERP data warehouses are expected to store the global data of an organization. This requires separating reference data, which changes over time, from transactional data, which is constant. A dimensional model with slowly changing dimensions can address this well.
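The slowly changing dimension idea can be illustrated with a minimal Python sketch of a Type 2 dimension, in which a change to a reference attribute closes the current row and appends a new version so that history is preserved. The customer dimension, its field names and the surrogate key scheme are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DimRow:
    """One version of a dimension member (hypothetical customer dimension)."""
    surrogate_key: int
    customer_id: str          # natural key from the source system
    region: str               # reference attribute that may change over time
    valid_from: date
    valid_to: Optional[date]  # None marks the current version


def apply_scd2(dim: list, customer_id: str, region: str, as_of: date) -> DimRow:
    """Apply a Type 2 change: close the current version and append a new one."""
    current = next(
        (r for r in dim if r.customer_id == customer_id and r.valid_to is None),
        None,
    )
    if current is not None and current.region == region:
        return current  # nothing changed, keep the current version
    if current is not None:
        current.valid_to = as_of  # expire the old version; history is preserved
    new_row = DimRow(len(dim) + 1, customer_id, region, as_of, None)
    dim.append(new_row)
    return new_row


def key_for(dim: list, customer_id: str, txn_date: date) -> int:
    """Find the surrogate key that was valid on a transaction date."""
    for r in dim:
        if (r.customer_id == customer_id and r.valid_from <= txn_date
                and (r.valid_to is None or txn_date < r.valid_to)):
            return r.surrogate_key
    raise KeyError(customer_id)


dim = []
apply_scd2(dim, "C100", "EMEA", date(2001, 1, 1))
apply_scd2(dim, "C100", "APAC", date(2001, 7, 1))
```

Transactions dated before the change resolve to the first version of the customer and later ones to the second, so the constant transactional data keeps pointing at the reference data that was valid when it was recorded.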
5 Architectural Choices
Approaches for Implementing Data Warehouses with advantages and disadvantages:
ERP Centric Data Warehouse: The data warehouse is built within the ERP environment, using the DSS provided by the ERP vendor, and non-ERP source data is also pulled into that DSS.
This option is recommended when the majority of the data warehouse data (more than 80%) is sourced from ERP systems and business content for the required functional areas is available in the DSS provided by the ERP vendor. Otherwise, the integration and customization effort can outweigh the benefits of tight integration.
Two Independent Data Warehouses: One data warehouse is built with ERP data and the other is built from non-ERP data sources. This is a natural growth path, as it is technically easier and the politically right solution.
Custom Build Data Warehouse outside ERP environment: The data warehouse is built using best-of-breed tools outside the ERP environment. This option requires data extraction from ERP sources, which could prove costly. However, with the advent of ETL tools such as ActaWorks, DataStage and PowerCenter, which can extract data from the ERP application layer, the issue is mitigated to some extent.
The following table elaborates on the advantages and disadvantages of each of the above options:

Option: ERP centric Data Warehouse
  Advantages:
  - Tight integration of operational and decision support systems
  - Easier to implement closed feedback loop DW
  - Industry best practices are made available in the form of business processes and standard reports
  Disadvantages:
  - Not flexible
  - Considerable customization effort and requires 3rd party ETL tools to integrate non-ERP source data
  - Integration of non-ERP data (organizational or external) into ERP environment is complex due to proprietary interfaces and limited business content
  - ERP vendors are traditionally strong in OLTP, but not in DSS applications
  - Not proven for high performance, high volume multi-dimensional analysis with large user base
  - Not all the functionality may be supported by any given ERP vendor
  - Growth to real-time Data Warehouse may not be possible

Option: Two Independent Data Warehouses
  Advantages:
  - Easier to implement technically
  - Politically natural solution
  - Earlier investments on existing DW initiatives are protected
  Disadvantages:
  - No enterprise/cross functional view
  - Higher maintenance and sustenance costs
  - Prone to inconsistencies across two data warehouses leading to two versions of truth
  - Ambiguity among the user community

Option: Custom built Data Warehouse outside ERP environment
  Advantages:
  - Flexible
  - True enterprise wide single version of truth can be attained
  - Easier to integrate external data
  - Scalability is not an issue
  - Open Architecture is amenable to real-time Data Warehouse refresh and closed loop feedback
  Disadvantages:
  - Data extraction from ERP OLTP systems is complex
  - 3rd party vendor tools need to keep up to date with changing ERP environment
  - Longer time to implement
6 Tools & Technology Available
6.1 Packaged Solution from ERP vendors
6.1.1 SAP Business Information Warehouse
Since SAP announced its Business Information Warehouse in 1998, the product has gone through many transformations. Until version 2.1C, SAP BW was used primarily for operational reporting that was not possible within SAP R/3. It had several limitations in areas such as drill-across, ODS structure and scalability. Version 2.1C (mySAP BI) seems to have addressed these issues, and it now offers a sound BI platform for SAP R/3 users.
SAP has tied up with Ascential to integrate its ETL tool, DataStage, into the BI platform. With this, it has overcome the weakness in bringing non-ERP data into its business warehouse.
On the UI end, it still does not have a competitive OLAP tool, though its partners’ OLAP tools, such as those from Business Objects and Cognos, can be used instead. The Business Explorer UI that comes with the business warehouse is Excel-like and does not offer robust OLAP functionality.
Business content is also still limited and does not match its competitors’ offerings in the packaged applications space, such as those from Epiphany, Broadbase/EPM, DecisionPoint Application, Hyperion, Gentia, NCR, SAS and Alphablox.
6.2 Extraction Tools
6.2.1 ActaWorks from Acta
Acta was the first vendor to bring a product to market specifically tailored to support data warehousing with ERP systems. Today Acta offers the most comprehensive data warehousing and data integration products for use with ERP systems.
ActaWorks for SAP is designed to support tight integration with SAP ERP applications. In addition to providing an intuitive GUI for mapping data from SAP and non-SAP sources to a data warehouse or data mart, ActaWorks extracts data via the SAP R/3 application layer, allowing access to all SAP data and business logic. ActaWorks also features a component that supports real-time updates and change-data capture for data warehouses. Acta also offers pre-packaged data marts, or RapidMarts, for use with ActaWorks to speed warehouse development.
ActaWorks for SAP consists of five key components: ActaWorks Designer, a Meta data repository, ActaWorks Server, ActaWorks Integrator for SAP and ActaWorks Administrator.
ActaWorks Designer is a graphical tool for defining the data mappings, transformations and control logic necessary for managing a complex multi-step process for populating a data warehouse. Designer allows users to define data mappings and transformation rules using a GUI modeled on SQL.
The data mappings and transformation rules specified with Designer are stored in the ActaWorks Meta data repository. The repository also stores information describing the schema for SAP and non-SAP data sources and the target data warehouse schema. To facilitate the process of identifying the right information to extract, ActaLink provides English-language descriptions of both tables and columns.
The hub of the transformation process is ActaWorks Server, which performs complex data transformations and integrates data from non-SAP sources with SAP data. The server is designed to provide high throughput and uses in-memory transformations and parallel pipelining.
To extract data from SAP, the ActaWorks Integrator for SAP automatically generates optimized ABAP/4 code. This removes the need to write and maintain custom ABAP/4 code. The features of the Integrator are:
- Populates the Meta data repository with the SAP logical view of the data.
- Translates ANSI SQL constructs specified in the Designer into their ABAP/4 equivalents (Open SQL).
- Automatically generates ABAP/4 code for extracting data.
- Uses the SAP administrative infrastructure by extracting data via SAP’s application server layer, thereby providing access to all SAP data, including data stored in pool and cluster tables, and other SAP business logic.
- Automatically extracts the hierarchies from SAP.
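The idea of generating extraction code from a declarative mapping can be sketched in a few lines of Python. This is not ActaWorks' generator, whose internals are not described here; `ExtractSpec`, `generate_abap` and the target table name `lt_result` are hypothetical, and the output is a single classic Open SQL SELECT statement rather than a complete ABAP program. The SAP material master table MARA is used only as a familiar example.

```python
from dataclasses import dataclass, field


@dataclass
class ExtractSpec:
    """Declarative description of one extraction (names are illustrative)."""
    table: str
    columns: list
    filters: dict = field(default_factory=dict)  # column -> literal value


def generate_abap(spec: ExtractSpec, target: str = "lt_result") -> str:
    """Render the spec as a classic Open SQL SELECT statement."""
    cols = " ".join(c.lower() for c in spec.columns)
    stmt = (f"SELECT {cols} FROM {spec.table.lower()} "
            f"INTO CORRESPONDING FIELDS OF TABLE {target}")
    if spec.filters:
        conds = " AND ".join(
            # A quote inside an ABAP text literal is escaped by doubling it.
            "{} = '{}'".format(col.lower(), str(val).replace("'", "''"))
            for col, val in spec.filters.items()
        )
        stmt += f" WHERE {conds}"
    return stmt + "."


spec = ExtractSpec("MARA", ["MATNR", "MTART"], {"MTART": "FERT"})
statement = generate_abap(spec)
```

Keeping the mapping as data and rendering the statement from it is what lets a tool regenerate the extraction code when the mapping changes, instead of someone maintaining hand-written ABAP.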
ActaWorks Administrator provides facilities for warehouse administrators to schedule and monitor jobs.
Changed transactions in the source (SAP) can be captured using IDocs (Intermediate Documents). IDocs capture data while a transaction is being processed. This is a very effective means of capturing data from SAP when the underlying tables do not contain date and time stamps. ActaWorks generates ABAP to read staged IDoc data from header and detail.
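A minimal sketch of how staged header and detail records might be applied to a warehouse table is given below. The record layout (`docnum`, `order_id`, `item`, `qty`) is hypothetical and much simpler than a real IDoc, and the sketch assumes that a higher document number means a later change.

```python
from collections import defaultdict


def apply_staged_idocs(headers, details, warehouse):
    """Upsert staged header/detail records into `warehouse`; return rows applied."""
    lines_by_doc = defaultdict(list)
    for line in details:
        lines_by_doc[line["docnum"]].append(line)

    applied = 0
    # Replay documents in ascending document number so the latest change wins
    # (assumption: document numbers are assigned in creation order).
    for head in sorted(headers, key=lambda h: h["docnum"]):
        for line in lines_by_doc[head["docnum"]]:
            warehouse[(head["order_id"], line["item"])] = line["qty"]
            applied += 1
    return applied


headers = [
    {"docnum": 2, "order_id": "SO-1"},  # later change to the same order
    {"docnum": 1, "order_id": "SO-1"},
]
details = [
    {"docnum": 1, "item": 10, "qty": 5},
    {"docnum": 2, "item": 10, "qty": 8},
    {"docnum": 2, "item": 20, "qty": 1},
]
warehouse = {}
applied = apply_staged_idocs(headers, details, warehouse)
```

Because each document carries the change itself, the load does not need to compare timestamps on the source tables to find what changed.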
ActaWorks supports real-time data transformation, including receiving messages from ERP systems or XML-based e-commerce applications. “Real-time” means that ActaWorks reacts to messages as they are sent, performing predefined operations to respond appropriately. For real-time updates from SAP, the Acta RealTime Component must be installed. For real-time data extraction, ActaWorks Real-Time uses SAP R/3 Application Link Enabling (ALE) technology and Intermediate Documents (IDocs) to capture and process transactions. IDocs can be enriched with other R/3 or non-R/3 data as you specify in the real-time data flow design.
6.2.2 DataStage from Ascential
Using DataStage XE, warehouse developers can take data from diverse sources and complex data forms such as legacy data, B2B and web environments, as well as enterprise applications such as SAP and Siebel. They can transform this data and load it into a warehouse, data mart or business intelligence application for analysis. By managing the Meta data, DataStage XE completely integrates Meta data with the most commercially popular data modeling and data access tools. Finally, the quality assurance component enables warehouse administrators to audit, monitor, and manage the quality of the data as the warehouse expands and evolves.
Specifically, DataStage XE is an integrated set of software components consisting of:
- Quality Manager for data quality assurance critical for accurate business analysis
- MetaStage for Meta data integration in order to maintain consistent analytic interpretations as well as track changes to the data warehouse
- DataStage for data collection and integration from diverse sources for complete "snapshots" and data movement and transformation for system and end-user productivity
- DataStage XE/390 for extracting legacy data while using the power of the mainframe infrastructure
As part of DataStage XE, Quality Manager gives development teams and business users the ability to audit, monitor, and certify data quality at key points throughout the data integration lifecycle. Further they can identify a wide range of data quality problems and business rule violations that can inhibit data migration efforts as well as generate data quality metrics for projecting financial returns.
By improving the quality of the data going into DataStage transformations, organizations also improve warehouse performance and the data quality of the resultant target data. The end result is validated data and information for making smart business decisions and a reliable, repeatable and accurate process for making sure information maintains its superior quality over time.
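The kind of audit described above, checking rows against business rules and producing quality metrics, can be sketched generically in Python. This is not Quality Manager's interface; the `audit` function, the rule names and the sample rows are hypothetical.

```python
def audit(rows, rules):
    """Return per-rule violation counts and the share of rows passing all rules."""
    violations = {name: 0 for name in rules}
    clean = 0
    for row in rows:
        failed = [name for name, check in rules.items() if not check(row)]
        for name in failed:
            violations[name] += 1
        if not failed:
            clean += 1
    pass_rate = clean / len(rows) if rows else 1.0
    return violations, pass_rate


# Hypothetical business rules for an order feed.
rules = {
    "amount_non_negative": lambda r: r["amount"] >= 0,
    "currency_present": lambda r: bool(r.get("currency")),
}
rows = [
    {"order_id": "SO-1", "amount": 100.0, "currency": "USD"},
    {"order_id": "SO-2", "amount": -5.0, "currency": "USD"},
    {"order_id": "SO-3", "amount": 20.0, "currency": ""},
    {"order_id": "SO-4", "amount": 7.5, "currency": "EUR"},
]
violations, pass_rate = audit(rows, rules)
```

Running the same rules at several points in the load makes it possible to see where quality degrades and to track the pass rate over time.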
A critical component of DataStage XE is MetaStage, Ascential’s solution for meta data management across data warehouse environments. Most data warehouses and marts are created using a wide variety of tools that cannot exchange Meta data. As a result, business users are unable to understand and leverage enterprise data because the contextual information, or Meta data, required is unavailable or unintelligible. Based on patented technology, MetaStage offers broad support for sharing Meta data between third-party data environments. MetaStage uses MetaBrokers to ensure the complete exchange of all related meta data, regardless of source type.
DataStage is a client/server development tool for building and supporting data migration applications. Ascential Software offers options such as XML Pack, Enterprise Application Packs, and the MQ Series Plug-in. On the server side, DataStage has a transformation engine that enables complex processing while providing ease of use, management control and maximum performance. The DataStage client is a graphical tool with the following major components: Manager, Designer, Director, and Administrator. The DataStage Manager supports the import/export of meta data, as well as the central control of shared transformation objects. The Designer is the tool that visually represents the data transformation process with an intuitive easy-to-use graphical engine. The Director, as its name implies, supports the scheduling and execution of completed transformations, and the Administrator provides for housekeeping and security functions. Data warehousing professionals use the DataStage client to interact with the DataStage Server, the workhorse that processes the transformations and moves data at run-time.
Enterprise application (EA) systems provide critical data sources for business analysis. DataStage XE provides full integration with leading enterprise applications including SAP, Siebel, and PeopleSoft.
The DataStage Extract PACKs for SAP R/3, Siebel and PeopleSoft, and the DataStage Load PACK for SAP BW enable warehouse developers to integrate this data with the organization's other data sources. The DataStage Extract pack provides:
1. Extensive transformation capabilities to manipulate SAP R/3 data and load it into a new or existing data warehouse or data mart.
2. Generation of ABAP/4, SAP’s programming language. Automated generation of ABAP code shields the developer from the complexity of writing ABAP code manually and, more importantly, reduces development and maintenance costs.
3. Access to all SAP R/3 data, including transparent, pool, view and cluster tables, through a unique feature, the DataStage Meta data object browser. With over 15,000 SAP tables and their known complexity, the Meta data object browser enables easy navigation through the info hierarchies before joining multiple R/3 tables, simplifying the process.
4. Two methods of operation to optimize performance and resources: generated ABAP code can be uploaded to the R/3 system via remote function call or, for warehouse developers who do not have direct access to the R/3 system, the R/3 script can be moved manually via FTP and imported by an R/3 administrator. Job scheduling can be controlled either from the DataStage Director or natively from the SAP scheduling services.
5. Complex transformations performed easily with drag-and-drop operations using the DataStage Designer’s graphical mapping tool.
6. Use of SAP’s RFC library and IDocs, two of the primary data interchange mechanisms for access to SAP R/3, thus conforming to SAP interfacing standards.
7. The ability to capture incremental changes and produce event-triggered updates with SAP’s IDoc (Intermediate Document) functionality. DataStage’s IDoc extract interface retrieves IDoc meta data and automatically translates the segment fields into DataStage, achieving real-time SAP data integration.
6.2.3 PowerCenter from Informatica
PowerCenter from Informatica is one of the popular and powerful tools in the ETL space. It offers seamless integration with a wide range of data sources, including ERP, mainframe and relational systems as well as e-commerce and legacy applications. Informatica’s PowerConnect for PeopleSoft and PowerConnect for SAP can directly extract and integrate data from SAP R/3 and PeopleSoft applications, as well as other formats. PowerConnect modules are component-based offerings that complement and extend the functionality of Informatica’s core data warehouse development platform, PowerCenter.
PowerConnect for SAP provides Informatica PowerMart/PowerCenter users with native, high-speed data extraction from SAP R/3 systems, enabling full access to all SAP R/3 tables and SAP R/3 Info hierarchies. PowerConnect for SAP extracts data from SAP using ABAP/4, SAP’s proprietary 4GL. Using PowerConnect, users can access all SAP R/3 tables, including transparent, pool and cluster tables. This allows full access to all data residing in SAP R/3’s application layer. Once extracted, SAP data is delivered to the PowerCenter server, which transforms the data for delivery to the target data warehouse, data marts, or other analytic applications.
PowerConnect for SAP lets you customize the R/3 extraction routines for load processing. You can choose to stage the data in an intermediary file or stream it directly into the PowerCenter Server. In addition when accessing data in R/3 PowerConnect only performs the actual extraction processes on the R/3 system. Transformation and load processing occur within the PowerCenter helping to minimize the load on the R/3 environment.
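The division of labor described above, with extraction kept minimal on the source and transformation done on the ETL server, along with the choice between staging to a file and streaming, can be sketched with Python generators. The function names, field names and CSV staging format are assumptions for illustration and do not reflect PowerConnect's actual mechanisms.

```python
import csv
import os
import tempfile

FIELDS = ["order_id", "amount", "currency"]  # hypothetical extract layout


def extract(source_rows):
    """Stand-in for source-side extraction: reads rows, applies no logic."""
    for row in source_rows:
        yield row


def stage(rows, path):
    """Write extracted rows to an intermediary file, then read them back."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    with open(path, newline="") as f:
        yield from csv.DictReader(f)


def transform(rows):
    """Runs on the ETL server: type conversion and normalization."""
    for row in rows:
        yield {
            "order_id": row["order_id"],
            "amount": float(row["amount"]),
            "currency": row["currency"].upper(),
        }


def run(source_rows, staged):
    """Run the pipeline either through a staging file or as a direct stream."""
    rows = extract(source_rows)
    if staged:
        with tempfile.TemporaryDirectory() as tmp:
            rows = stage(rows, os.path.join(tmp, "extract.csv"))
            return list(transform(rows))
    return list(transform(rows))


SOURCE = [
    {"order_id": "SO-1", "amount": "100.50", "currency": "usd"},
    {"order_id": "SO-2", "amount": "20", "currency": "eur"},
]
streamed = run(SOURCE, staged=False)
via_file = run(SOURCE, staged=True)
```

Both paths produce the same result. Staging decouples the extraction window from the load at the cost of extra I/O, while streaming avoids the intermediate file.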
7 Conclusion
Companies have been struggling for some time now to build data warehouses and data marts that will allow their users to perform better and easier analysis of SAP data. Due to the complexity of the SAP R/3 system and a lack of good data warehousing products specifically designed to handle SAP data, companies were forced to write their own custom extraction programs in ABAP/4. This, however, is changing: a good number of vendors, recognizing the opportunity, have introduced ETL products that can assist in extracting and integrating SAP and non-SAP data and moving it into the warehouse.
SAP is seriously pursuing its efforts to provide a scalable BI platform by upgrading its Business Information Warehouse. It is enhancing the business content in each new version, but still lacks the capabilities provided by competing packaged solutions. It has also integrated DataStage (an ETL tool) to bring non-SAP data into the BW platform.
META Group predicts that by 2005 SAP BW can become a dominant player among packaged data warehouses catering to the enterprise-level information needs of SAP R/3 users. It may not achieve the same success among non-SAP R/3 users.
8 Appendix A
Tools compared: Informatica PowerCenter, ActaWorks, Ascential DataStage XE

Version
  Informatica PowerCenter: 5.0
  ActaWorks: 5.0
  Ascential DataStage XE: 5.1

Category: Architecture

Architecture Platform
  Informatica PowerCenter: Hub and Spoke Architecture; facilitates the sharing of Meta Data
  ActaWorks: Open Client Server
  Ascential DataStage XE: Client Server Architecture

Scalable and Extensible Technology
  Informatica PowerCenter: Highly scalable and extensible technology. Scales up as the data and load grow.
  ActaWorks: Scalable, flexible technology. Scales up w.r.t. the hardware and software.
  Ascential DataStage XE: Highly scalable. Scales up w.r.t. the hardware and software.

Client Platform
  Informatica PowerCenter: Windows 2000/NT/98
  ActaWorks: Windows 98/NT/2000, OS/2
  Ascential DataStage XE: Windows 95/NT/2000

Server Platforms
  Informatica PowerCenter: Sun Solaris, AIX, HP-UNIX, Windows NT/2000
  ActaWorks: Windows NT/2000, HP-Unix, Solaris, AIX
  Ascential DataStage XE: Windows NT (Intel and Alpha platforms), UNIX (AIX, HP-UX, Sun Solaris, COMPAQ Tru64). DataStage XE/390 works on the OS/390 platform.
Which DBMS are supported for extraction and loading
  Informatica PowerCenter: For extraction: DB/2, DB/2/400, Flat Files, IMS, Informix, MS SQL Server, MS Access, Oracle, Sybase, UDB, VSAM, ODBC, others. Targets: Informix, DB/2/400, MS SQL Server, MS Access, Oracle, PeopleSoft Enterprise Performance Management (EPM), SAP Business Information Warehouse (BW), Sybase, UDB, Flat Files, others.
  ActaWorks: Oracle, Informix, Microsoft SQL Server, Sybase, DB2 UDB, ODBC-compliant databases, and flat files
  Ascential DataStage XE: QSAM (sequential flat files); ISAM; VSAM (KSDS, RRDS, ESDS), with support for GROUPS, multi-level arrays, REDEFINES, and all PICTURE clauses. DB2, Adabas, Oracle OCI (for releases 7 and 8), Sybase Open Client, Informix CLI, OLE/DB for Microsoft SQL Server 7, ODBC.

Support for ERP Sources
  Informatica PowerCenter: (no entry)
  ActaWorks: (no entry)
  Ascential DataStage XE: DataStage XE provides full integration with leading enterprise applications including SAP, Siebel, and PeopleSoft. The DataStage Extract PACKs for SAP R/3, Siebel and PeopleSoft, and the DataStage Load PACK for SAP BW enable warehouse developers to integrate this data with the organization's other data sources.
Code Reusability capability within the product
  Informatica PowerCenter: Supports development of Mapplets, which act as a library between Mappings, and can also make transformations shareable across Mappings.
  ActaWorks: All the objects in the object library are reusable. An object can be a data flow, workflow, job etc.
  Ascential DataStage XE: Permits the reuse of existing code through APIs, thereby eliminating redundancy and retesting of established business rules.

Parallelism
  Informatica PowerCenter: Supports parallelism; one can run multiple mapping sessions on the same server.
  ActaWorks: Supports parallelism if it is running on a multi-processor computer. It takes full advantage of the hardware architecture.
  Ascential DataStage XE: Automatically distributes independent job flows across multiple CPU processes. This feature ensures the best use of available resources and speeds up overall processing time for the application.
Code Generator
  Informatica PowerCenter: PowerCenter does not generate code; all the mappings developed are in the form of a GUI interface.
  ActaWorks: Does generate code, but the Data Flow or Job Flow defined can be converted to code to check with Acta Support.
  Ascential DataStage XE: Only the DataStage XE/390 version automatically generates and optimizes native COBOL code and JCL scripts that run on the OS/390 mainframe.

Data Transformation Method (Engine Based?)
  Informatica PowerCenter: PowerCenter is based on Hub & Spoke architecture and has an inbuilt transformation engine.
  ActaWorks: Transformation is engine based and relies on the server.
  Ascential DataStage XE: Transformation is engine based, with column-to-column mappings.
Building & Managing Aggregates
  Informatica PowerCenter: Aggregation can be built using the built-in transformation provided.
  ActaWorks: Aggregation through a ready-to-use transformation function.
  Ascential DataStage XE: Enhances performance and reduces I/O with its built-in sorting and aggregation capabilities. The Sort and Aggregation stages of DataStage work directly on rows as they pass through the engine rather than depending on SQL and intermediate tables.

Support for various data types
  Informatica PowerCenter: Supports most of the industry standard data types. This also depends on the kind of source system being used.
  ActaWorks: Supports most of the industry standard data types.
  Ascential DataStage XE: Supports most of the industry standard data types. It also supports XML.

Data Quality Check functionality or feature
  Informatica PowerCenter: (no entry)
  ActaWorks: (no entry)
  Ascential DataStage XE: Through Quality Manager it is possible to audit, monitor, and certify data quality at key points throughout the data integration lifecycle.
Debugging and logging features
  Informatica PowerCenter: Does not have a separate debugging tool. The workaround is to set the "verbose" property on each transformation. Informatica then creates log files on the server, which can be used for further analysis.
  ActaWorks: Error correction can be done for each job, workflow, data flow and even object.
  Ascential DataStage XE: Helps developers verify their code with a built-in debugger, thereby increasing application reliability as well as reducing the amount of time developers spend fixing errors and bugs. Supports debugging on a row-by-row basis using break points. DataStage immediately detects and corrects errors in logic or unexpected legacy data values using this. Highly useful for complex transformations, date conversions etc.

Exception Handling
  Informatica PowerCenter: Throws out the error records or rejected records into a log file.
  ActaWorks: Supports exception handling; no extra effort required.
  Ascential DataStage XE: Supports exception handling.
How Tool Provides information about exception
  Informatica PowerCenter: Through log files stored on the server.
  ActaWorks: Through log files.
  Ascential DataStage XE: Developers can closely observe the running jobs in the Monitor Window, which provides run-time feedback at user-selected intervals. The powerful process viewer estimates rows per second and allows developers to pinpoint possible bottlenecks and/or points of failure. Using the Director, the developer can browse detailed log records as each step of a job completes. These date and time stamped log records include notes reported by the DataStage Server as well as messages returned by the operating environment or source and target database systems. DataStage highlights log records with colored icons (green for informational, yellow for warnings, red for fatal) for easy identification.

Restarting an aborted ETL process
  Informatica PowerCenter: Supports restarting of the mappings.
  ActaWorks: Restart is possible. Can restart from the point of failure.
  Ascential DataStage XE: Restart is possible. Can restart from the point of failure.
Memory (Minimum/Recommended) requirement at client machine
  Informatica PowerCenter: 128 MB / 256 MB
  ActaWorks: 64 MB / 128 MB
  Ascential DataStage XE: 64 MB

Memory (Minimum/Recommended) requirement at Server machine
  Informatica PowerCenter: Depends on the kind of application running; 128 MB / 256 MB
  ActaWorks: 64 MB / 128 MB
  Ascential DataStage XE: Minimum 256 MB

Repository Backup and Recovery
  Informatica PowerCenter: PowerCenter comes with good features for backup and recovery of the repository. This can be done through Repository Manager.
  ActaWorks: Repository backup can be taken by using Repository Manager.
  Ascential DataStage XE: Supports distributed Repository. Remote sites can subscribe to a set of meta data objects within the warehouse application. These sites are notified via email when meta data changes occur within their subscription. DataStage XE offers version control of components such as table definitions, transformation rules, and source/target column mappings within a 2-part numbering scheme.
Category: Meta data support

Automatic meta data capture
  PowerCenter: Meta data is captured and stored in the PowerCenter repository.
  ActaWorks: Automatically captures the meta data and stores it in the repository.
  DataStage: Stores all the meta data in the Repository. Captures the meta data using a component called 'Meta Stage'. It also offers broad support for sharing meta data between third-party data environments using MetaBrokers. It maintains a complete catalog of the organization's meta data, including physical, technical, business and process meta data.

Business view of meta data
  PowerCenter: Business meta data needs to be documented while building the mappings. This data is stored in the meta data repository, and it can be queried using SQL commands.
  ActaWorks: Not available. Only technical meta data is stored.
  DataStage: DataStage XE provides warehouse developers with a central hub that manages meta data at the tool-integration level. Remote sites can subscribe to a set of meta data objects within the warehouse application. These sites are notified via email when meta data changes occur within their subscription.

Meta data security
  PowerCenter: Since meta data is stored in the repository of the product, it is very well protected.
  ActaWorks: Provides meta data security through Repository Manager; a user ID and password are needed to log in.
  DataStage: User-level security is provided by DataStage Administrator.
Web integration support
  PowerCenter: Does not have any web integration.
  ActaWorks: Provided by using Access Server for Web administration. Using this it is possible to control the whole loading process from a remote machine.
  DataStage: Yes, supports Web integration using the Plugin API.

Versioning support
  PowerCenter: Supports versioning with the help of the repository and allows one to define the baseline.
  ActaWorks: Supports versioning through the central repository.
  DataStage: DataStage XE offers version control, which saves the history of all the ETL development. It preserves application components such as table definitions, transformation rules, and source/target column mappings within a 2-part numbering scheme. Developers can review older rules and optionally restore entire releases that can then be moved to distributed locations.

Meta data repository's compliance with one of the industry meta data standards
  PowerCenter: Sharable through the Metadata Exchange (MX2) API.
  ActaWorks: Does not exchange meta data with other applications.
  DataStage: Has its own version of the Common Meta Model. The meta data can be shared using the MetaBroker.
Meta data views using query tools
  PowerCenter: Comes with a meta data reporting tool which helps users access the meta data stored in the repository. One can also view meta data using query tools such as SQL.
  ActaWorks: The central repository provides a meta data viewing facility, and the repository tables can also be queried using SQL statements.
  DataStage: No tool currently available. The entire history of the data can be derived and viewed using Data Lineage.
Category: Ease of setup

Easy installation procedure
  PowerCenter: The installation process depends on the platform on which it is being installed. Sometimes it can run into difficulties for various reasons, but in most cases it is very easy to install.
  ActaWorks: Easy to install; only two components need to be installed.
  DataStage: An industry-standard installation script provided for each DataStage "Package" helps in easier installation and automated configuration.

Ability to generate a target data mart schema similar to the source database
  PowerCenter: It is possible to generate the target data mart schema similar to the source database.
  ActaWorks: Possible to generate the data mart schema.
  DataStage: Possible to create the data mart schema similar to the source.

Support for designing data marts
  PowerCenter: Supports the star schema data model for target data mart design.
  ActaWorks: E-Caches provides ready-to-use data mart suites with all the ETL facility defined.
  DataStage: Does not support this directly. However, by combining the data integration capabilities of DataStage/DataStage 390 with DB2 Warehouse Manager's data warehouse generation and management capabilities, it is possible to design a data mart/warehouse.
Importing data models from modeling tools
  PowerCenter: It is possible to import data models from different modelling tools by using a plug-in called MX.
  ActaWorks: Does not support this.
  DataStage: The MetaBroker for a particular tool represents the meta data just as it is expressed in the tool's schema. It accomplishes the exchange of meta data between tools by automatically decomposing the meta data concepts of one tool into their atomic elements via the MetaHub and recomposing those elements to represent the meta data concepts from the perspective of the receiving tool. In this way all meta data and their relationships in the integrated suite are captured and retained for use by any of the tools. In summary, MetaBrokers facilitate meta data exchange between DataStage and popular data modeling and business intelligence tools.
Category: Transformations

Filter
  PowerCenter: Supports Filter transformation.
  ActaWorks: Supports various types of transformations: Filtering, Merging, Key Generation, Table Comparison etc.
  DataStage: Supports Filter transformation.

Format conversion
  PowerCenter: Supports format conversion and data type conversion.
  ActaWorks: Format conversion is possible.
  DataStage: Supports format conversion such as date and time display, numeric representation, national currency rules, collating sequences etc.

Lookup
  PowerCenter: Supports Lookup transformation very well.
  ActaWorks: Lookup functionality is possible, with three types of functionality: pre-cached, cache-on-demand and no-cache.
  DataStage: Supports lookup procedures and hashed lookup tables to increase performance.
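The three lookup modes named in the Lookup row above (pre-cached, cache-on-demand, no-cache) can be sketched as follows. This is an illustrative Python sketch only, not ActaWorks code; the class `Lookup`, the mode strings and the `source_reads` counter are hypothetical names, and a plain dictionary stands in for the database lookup table.

```python
from typing import Dict, Optional


class Lookup:
    """Illustrative lookup with three caching modes (all names hypothetical)."""

    MODES = ("pre_cached", "cache_on_demand", "no_cache")

    def __init__(self, table: Dict[str, str], mode: str = "no_cache") -> None:
        if mode not in self.MODES:
            raise ValueError(f"unknown mode: {mode}")
        self._table = table  # stands in for the database lookup table
        self._mode = mode
        self.source_reads = 0  # counts simulated round trips to the table
        self._cache: Dict[str, Optional[str]] = {}
        if mode == "pre_cached":
            # Load the whole table once, before any row is processed.
            self._cache = dict(table)
            self.source_reads += 1

    def _read_source(self, key: str) -> Optional[str]:
        self.source_reads += 1
        return self._table.get(key)

    def get(self, key: str) -> Optional[str]:
        if self._mode == "pre_cached":
            return self._cache.get(key)
        if self._mode == "cache_on_demand":
            # Read each distinct key from the source at most once.
            if key not in self._cache:
                self._cache[key] = self._read_source(key)
            return self._cache[key]
        return self._read_source(key)  # no_cache: always go to the source


if __name__ == "__main__":
    countries = {"IN": "India", "US": "United States"}
    for mode in Lookup.MODES:
        lookup = Lookup(countries, mode)
        values = [lookup.get(k) for k in ["IN", "US", "IN", "IN"]]
        print(mode, values, lookup.source_reads)
```

The trade-off is the usual one: pre-caching costs memory and a full table read up front, cache-on-demand pays only for keys actually seen, and no-cache keeps memory flat at the price of one source read per row.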
Scope for user defined fields
  PowerCenter: One can define user-defined variables, but there is no notion of scope.
  ActaWorks: Possible to define variables with global or local scope, and parameter values can also be passed between projects.
  DataStage: One can define user-defined variables.

Joins
  PowerCenter: Supports most of the join types.
  ActaWorks: Supports all types of joins.
  DataStage: Supports most of the join types using the join transformation.

Support for external procedures
  PowerCenter: Supports external procedures; it is possible to call stored procedures through mappings.
  ActaWorks: Possible to call COM objects, DLL functions etc.
  DataStage: Built into DataStage are several features exclusively designed to support the packaging and deployment of completed data migration applications.
Category: Management feature

Scheduling
  PowerCenter: Supports a good scheduling feature; it is possible to schedule the job/session using Server Manager. The work-flow mechanism is limited.
  ActaWorks: Good scheduler within the tool, with work flow mechanism and calendar.
  DataStage: Good graphical scheduling and monitoring feature provided by the DataStage component called Data Director. It can also generate CRON scripts to schedule from Unix. With the DataStage Job Control API and Command Language interface provided, any remote C program or command shell can be used to initiate jobs, query their results or program a more complex job execution sequence.

Defining a calendar and using it for ad-hoc scheduling
  ActaWorks: Yes, it is possible in a very sophisticated manner.
  DataStage: Using the DataStage Director it is possible to schedule the jobs.

Performance monitoring of ETL process
  ActaWorks: Provides more control to the user through more attributes, for better monitoring.
  DataStage: No special performance monitor tool, but developers can closely observe running jobs in the Monitor Window, which provides run-time feedback at user-selected intervals. The process viewer estimates rows per second and allows developers to pinpoint possible bottlenecks and/or points of failure.
Performance options
  ActaWorks: This is a strong point of Acta, as it gives more parameters for performance improvement.
  DataStage: Can provide very high performance. Can enhance performance using in-memory hash tables, reducing I/O operations with its built-in sorting and aggregation capabilities. DataStage allows bypassing ODBC and "talking" natively to the source and target structures using direct calls, thereby increasing performance.

Specifying atomicity of the updates
  PowerCenter: It is possible to load a large set of records to the target database.
  ActaWorks: Possible to specify the atomicity of the updates.
  DataStage: Does not support atomicity of updates.

Security - Encryption
  PowerCenter: Has good security features, managed through Repository Manager. No encryption facility.
  ActaWorks: Provides good security through Repository Manager. Does not provide an encryption facility.
  DataStage: Provides security features using Data Administrator.

Security and access control using LDAP
  PowerCenter: Not available.
  ActaWorks: No option to provide an LDAP interface.
  DataStage: Not available.
Category: Adaptability

Impact analysis capability
  PowerCenter: It is possible to find out the impact of a change which needs to be made.
  ActaWorks: Provides impact analysis capability.
  DataStage: Good impact analysis capabilities provided by the MetaStage Hub across the integrated environment. It gives the entire set of relationships associated with an object.

SCD (slowly changing dimensions)
  PowerCenter: Requires programmatic design to update the SCD.
  ActaWorks: Can be handled using filter and lookup transforms.
  DataStage: Requires programmatic design to update the SCD.
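To make the SCD row above concrete, the following Python sketch shows one common programmatic design, a "Type 2" update that keeps history by closing the old dimension row and appending a new one. The comparison above does not say which SCD type each tool targets, so Type 2 is an assumption here, and `DimRow`, `apply_scd2` and the field names are hypothetical. The lookup and filter steps mirror, in spirit only, the "filter and lookup" approach mentioned for ActaWorks.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class DimRow:
    key: str  # business key of the dimension member
    attrs: Dict[str, str]  # tracked attributes
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_scd2(
    dim: List[DimRow],
    incoming: Dict[str, Dict[str, str]],
    load_date: date,
) -> List[DimRow]:
    """Apply incoming attribute values to `dim` in place, keeping history."""
    # Lookup step: index the current version of each business key.
    current = {row.key: row for row in dim if row.valid_to is None}
    for key, attrs in incoming.items():
        existing = current.get(key)
        # Filter step: skip members whose attributes did not change.
        if existing is not None and existing.attrs == attrs:
            continue
        if existing is not None:
            existing.valid_to = load_date  # close the old version
        dim.append(DimRow(key, dict(attrs), load_date))  # new current version
    return dim


if __name__ == "__main__":
    dimension = [DimRow("C1", {"city": "Pune"}, date(2001, 1, 1))]
    apply_scd2(dimension, {"C1": {"city": "Bangalore"}}, date(2002, 1, 15))
    for row in dimension:
        print(row)
```

A production design would also generate surrogate keys and handle deletions, both of which are left out of this sketch.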
Version/configuration management
  PowerCenter: Supports versioning and configuration management.
  ActaWorks: Provides a good interface to control the versions.
  DataStage: Provides version control through a distributed repository. (The repository can exist on either source or target.)

Category: Support for growth

Ability to handle various source types, from flat files to major RDBMS
  PowerCenter: Supports flat files, Oracle, SQL Server, DB2, and other ODBC-compliant RDBMS.
  ActaWorks: Oracle 8.x, Informix, SQL Server and DB2 only. Also provides SAP R/3 connectivity without any plug-ins.
  DataStage: Supports heterogeneous sources like Oracle, Informix, SQL Server, DB2, flat files, XML, and ERP sources like Oracle Apps, SAP R/3, PeopleSoft etc.
Incremental upload
  PowerCenter: This needs to be handled in mappings manually.
  ActaWorks: Yes.
  DataStage: Supports incremental load. Changed Data Capture captures changes to the operational data and produces Delta Store files. DataStage XE uses these files to update the data warehouse. From a workflow perspective, the warehouse developer defines a Delta Data Store file as an input table within one of the DataStage XE products on a Windows 95/NT platform.

Support for external bulk loader
  PowerCenter: One can call an external procedure in the mapping using an external transformation.
  ActaWorks: Yes.
  DataStage: Supports a wide variety of such load utilities, either by directly calling a vendor's bulk load API or by generating the control and matching data file for batch input processing. DataStage developers simply connect a Bulk Load Stage icon to their jobs and then fill in the performance settings that are appropriate for their particular environment.
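The incremental upload row above describes captured changes being applied to the warehouse instead of a full reload. A minimal Python sketch of that idea follows. The delta record layout `(operation, key, row)` with operation codes "I", "U" and "D" is a hypothetical stand-in and is not the actual Delta Store file format; `apply_deltas` is likewise an illustrative name.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical delta record: (operation, business key, row or None for deletes)
Delta = Tuple[str, str, Optional[dict]]


def apply_deltas(target: Dict[str, dict], deltas: Iterable[Delta]) -> Dict[str, int]:
    """Apply insert/update/delete deltas to `target` and return per-operation counts."""
    counts = {"I": 0, "U": 0, "D": 0}
    for op, key, row in deltas:
        if op in ("I", "U"):
            if row is None:
                raise ValueError(f"{op} delta for {key!r} has no row")
            target[key] = dict(row)  # insert or overwrite the target row
        elif op == "D":
            target.pop(key, None)  # deleting a missing key is ignored
        else:
            raise ValueError(f"unknown operation: {op!r}")
        counts[op] += 1
    return counts


if __name__ == "__main__":
    warehouse = {"1001": {"status": "open"}}
    changes = [
        ("U", "1001", {"status": "shipped"}),
        ("I", "1002", {"status": "open"}),
    ]
    print(apply_deltas(warehouse, changes), warehouse)
```

Only the changed keys are touched, which is the point of an incremental load: the cost scales with the number of changes rather than with the size of the source table.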
Intermediate file generation during loading
  PowerCenter: Only generates a temp file when doing sorting or loading.
  ActaWorks: Does not generate an intermediate file during loading.
  DataStage: Does not require intermediate files or secondary storage locations to perform aggregation or intermediate sorting during the loading process.

Event based loading
  PowerCenter: Does not support a "true" work flow mechanism. This can be done using external schedulers or workflow tools like AppWorks, NT Scheduling or Mainframe OPC Scheduling tools.
  ActaWorks: Yes, it is possible.
  DataStage: Supports event based loading.

Support for wide range of databases for storing (target) information
  PowerCenter: Supports Oracle, Informix, SQL Server, DB2 etc.
  ActaWorks: Oracle 8.x, Informix, SQL Server and DB2 only.
  DataStage: Sybase Adaptive Server, Sybase Adaptive Server IQ, Microsoft SQL Server 7 via OLE/DB, Microsoft SQL Server 6.5 via BCP, Informix Redbrick, Teradata, UDB. Bulk loaders: Oracle, Informix ADO/XPO High Performance. Ascential databases: UniVerse, Unidata. Also XML, e-mail systems and Web logs, ERP data and MQSeries messages.

Support for multi-user development environment
  PowerCenter: Supports a multi-user development environment.
  ActaWorks: Supports a multi-user development environment.
  DataStage: Supports a multi-user client-server development environment.
Category: Advance Data Transformation

Re-usability
  PowerCenter: Supports re-usability of the code by making transformations reusable.
  ActaWorks: Provides various reusable objects like jobs, workflows, dataflows etc.
  DataStage: Code reusability is supported. Ascential's Quality Manager provides a framework for developing a self-contained and reusable Project which consists of business rules, analysis results, measurements, history and reports about a particular source or target environment.

Support for pre-built functions
  PowerCenter: Supports built-in transformations like aggregator, filter etc.
  ActaWorks: Supports built-in functions.
  DataStage: Pre-built functions and built-in routines are available.
Handling duplicate records
  PowerCenter: Does not handle duplicate rows. To be handled programmatically.
  ActaWorks: Possible to handle duplicate records.
  DataStage: Does not handle duplicate rows. To be handled programmatically.
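Where the row above says duplicates must be "handled programmatically", the logic is typically a small key-based filter. The Python sketch below is one illustrative way to do it, keeping the first occurrence of each key and routing later ones to a reject list. The function `split_duplicates` and the sample field names are hypothetical, and keeping the first occurrence is an assumption; a real job might keep the latest row instead.

```python
from typing import Dict, Hashable, List, Sequence, Set, Tuple

Row = Dict[str, object]


def split_duplicates(
    rows: Sequence[Row], key_fields: Sequence[str]
) -> Tuple[List[Row], List[Row]]:
    """Return (unique_rows, rejected_rows), keeping the first row seen per key."""
    seen: Set[Tuple[Hashable, ...]] = set()
    unique: List[Row] = []
    rejected: List[Row] = []
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        if key in seen:
            rejected.append(row)  # later occurrence of an existing key
        else:
            seen.add(key)
            unique.append(row)
    return unique, rejected


if __name__ == "__main__":
    source = [
        {"order_id": 1, "item": "A"},
        {"order_id": 2, "item": "B"},
        {"order_id": 1, "item": "A"},
    ]
    kept, dropped = split_duplicates(source, ["order_id"])
    print(len(kept), len(dropped))
```

Returning the rejected rows, rather than silently dropping them, lets the job write them to a reject log for later review.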
Lookup cache
  PowerCenter: Supports caching of lookup tables.
  ActaWorks: Possible to define a lookup cache through lookup transformations.
  DataStage: Supports lookup cache.
Category: Consistency and re-use

Global meta data
  PowerCenter: Using the PowerCenter and PowerMart model it is possible to handle global meta data.
  ActaWorks: Supports global meta data.
  DataStage: MetaBrokers enable the sharing of meta data among all of the tools in the warehouse environment. With MetaBrokers, tools can share meta data without having to change their internal meta schema to conform to a common model.
Category: Compatibility with third party tools

Compatibility of ETL tools with EAI tools
  PowerCenter: Currently supports the following EAI vendors as source/target for the data: IBM MQ Series, TIBCO, Vitria and webMethods.
  ActaWorks: Supports the EAI tool TIBCO as an input.
  DataStage: Only IBM MQ Series is supported.
Category: Licensing & Pricing

Server licensing
  PowerCenter: Licensing includes the following for the Basic Version:
    · No ability to add on PowerMarts
    · No Global Repository
    · No centralized monitoring
    · 1 Server Engine*
    · 2 Relational Database Source Types
    · 2 Target Instances
    · Unlimited Flat File Sourcing
    · Unlimited Developers
    Single CPU Unix version costs US$ 140 K; Windows NT/2000 version US$ 95 K.
  ActaWorks: Provides evaluation and permanent licenses, which support a multi-user environment and SAP R/3 connectivity.
  DataStage: Information not available.
Client licensing
  PowerCenter: There is no separate licensing for the client. It comes along with the server.
  ActaWorks: There is no separate license required for the client.
  DataStage: Information not available.

ODC licensing
  PowerCenter: No transfers are allowed from the client-owned software to Wipro. A separate license has to be procured. A lab license, at half the cost of the production license, may be sufficient.
  ActaWorks, DataStage: Information not available.
Category: Vendor Information

2 consecutive years of profitability
  Informatica: Informatica was recently named the 11th fastest-growing technology company in Silicon Valley by Deloitte & Touche. The ranking resulted from the company's 10,491 percent revenue growth between 1995 and 1999.
  Acta: Acta continues to see strong growth in data integration, with second quarter revenue growth results up 110%.
Significant third party partner support
  Informatica: PowerCenter works with most software, database and hardware vendors and is built mostly on open systems. Products like PowerConnect for DB2 are offered and supported by Informatica.
  Ascential: SAP is a reseller of Ascential's DataStage and DataStage Load PACK for SAP BW, with the sole target being SAP BW.

Global presence and support
  Informatica: Has a global presence and provides support in most continents.
  Ascential: Ascential Software Corporation is the leading provider of Information Asset Management solutions to the Global 2000.

Number of customers
  Informatica: Around 1300 as of Oct 2001.
  Acta: More than 200 customers as of Oct 2001.
  Ascential: More than 1800 as of Aug 2001.
Company financial info readily available
  Informatica: All the information regarding the health of the company is reported on its website.
  Ascential: Revenue for Ascential Software's DataStage®, Media360™ and related product and service offerings was $27.0 million in the third quarter, an increase of 14% from $23.6 million in the third quarter of 2000. Revenue for these offerings for the nine months ended September 30, 2001 was $93.9 million, an increase of 47% over the $63.8 million in the first nine months of 2000.
Company focus on ETL segment for the future
  Informatica: Informatica came to the BI market with the ETL product and has established itself as a major player in the market. This product will continue to be the flagship product despite the change in its positioning in the BI market.
  Acta: Acta is well positioned to drive the "data integration market" and is coming up as a major player.
  Ascential: Adds significant meta data management services to the entire data warehouse, including ETL. Intends to offer the capability for heterogeneous cross-tool analysis and query. Exploits XML integration to enhance e-business communication. Delivers key MetaBroker development capabilities for its customers and partners.