Oracle Data Integrator – Solution Overview
Nguyen Tuan Khang, [email protected]
Senior Solutions Consultant, Fusion Middleware, Oracle Vietnam

Why Data Integration?
NEED… Information How and Where You Want It
• Corporate Performance Management
• Business Process Management
• Business Activity Monitoring
• Business Intelligence

Data Integration
• Migration
• Data Warehousing
• Master Data Management
• Data Synchronization
• Federation
• Real Time Messaging

HAVE… Data in Disparate Sources
• Legacy
• ERP
• CRM
• Best-of-breed Applications

3 Pillars of Data Integration
• Batch
• Sync
• Async

Enterprise Information Integration
The Traditional Approach
Source Applications -> Extract -> Transform -> Load -> Target Data Warehouse
• ETL “processes” often use batch processing approaches
  - Example: customer nightly batch runs can take more than 24 hours!
• “Services” that operate on data are not easily reusable in other contexts
• ETL “Services” and “Processes” are insecure and hard to monitor (i.e., no SLA)

Challenges In Data Integration
CHALLENGE
1. Increasing data volumes; decreasing batch windows
2. Non-integrated integration
3. Complexity, manual effort of conventional ETL design
4. Lack of knowledge capture

Oracle Data Integrator
Based on Technology from
Data Movement and Transformation from Multiple Sources to Heterogeneous Targets

    BENEFIT                  DIFFERENTIATOR
1   Best Performance         Heterogeneous “E-LT”
2   Productivity             Declarative Design
3   Real-time Integration    Declarative CDC
4   Hot-Pluggable            Knowledge Modules
5   Future Proof             The Chosen Integration Technology of Oracle Fusion

Typical Considerations for ODI
• High volume data synchronization
  - more than 20MB/min
• Heterogeneous data sources
  - DB2/AS400, Oracle, Excel, File, SQL, BAM…
• Capture new data changes regardless of data sources
  - CDC using Native Journal, LogMiner or Trigger…
• Real-time data synchronization
• Easy to implement the solution without changing your current IT infrastructure
  - No separate server required

Challenges & Emerging Solutions In Data Integration

CHALLENGE                                                  EMERGING SOLUTION
1. Increasing data volumes; decreasing batch windows       Shift from E-T-L to E-LT
2. Non-integrated integration                              Convergence of integration solutions
3. Complexity, manual effort of conventional ETL design    Shift from custom coding to declarative design
4. Lack of knowledge capture                               Shift to pattern-driven development

E-LT Architecture
High Performance

Conventional ETL Architecture: Extract -> Transform -> Load
Transform in Separate ETL Server
• Proprietary Engine
• Poor Performance
• High Costs
E.g. Informatica, IBM DataStage

Next Generation Architecture “E-LT”: Extract -> Load -> Transform (Oracle Data Integrator)
Transform in Existing RDBMS
• Leverage Resources
• Efficient
• High Performance

Benefits
• Optimal Performance & Scalability
• Easier to Manage & Lower Cost
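
The difference between the two architectures can be made concrete with a minimal sketch of the E-LT pattern. It uses Python's built-in sqlite3 as a stand-in for the target RDBMS; the table names (stg_orders, sales_by_country) and the sample rows are assumptions made up for illustration, not part of ODI. Rows are loaded unchanged into a staging table on the target, and the transformation is then a single set-based SQL statement executed by the target database, with no separate transform server involved.

```python
import sqlite3

# Stand-in for rows extracted from a source system (hypothetical sample data).
source_rows = [
    ("A-1", "vn", "120.50"),
    ("A-2", "VN", "80.00"),
    ("A-3", "us", "15.25"),
]

target = sqlite3.connect(":memory:")
target.executescript("""
    CREATE TABLE stg_orders (order_id TEXT, country TEXT, amount TEXT);
    CREATE TABLE sales_by_country (country TEXT PRIMARY KEY, total REAL);
""")

# Extract + Load: copy the rows as-is into a staging table on the target.
target.executemany("INSERT INTO stg_orders VALUES (?, ?, ?)", source_rows)

# Transform: one set-based statement run by the target database engine itself.
target.execute("""
    INSERT INTO sales_by_country (country, total)
    SELECT UPPER(country), SUM(CAST(amount AS REAL))
    FROM stg_orders
    GROUP BY UPPER(country)
""")

totals = dict(target.execute("SELECT country, total FROM sales_by_country"))
print(totals)
```

In a conventional ETL design the cleansing and aggregation would instead run row by row in a separate engine before anything reaches the target.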

Traditional E-T-L
Technical Detail
• Requires one powerful server to host the transform engine and its staging tables
• High total maintenance cost
• Inflexible when adding more source and target data sources
• Requires coding
• Poor performance (more I/O between the staging tables and the sources/targets)
Conventional ETL Architecture: S1, S2, S3 -> Extract -> Transform Server (ETL DB, Repository, Staging tables) -> Load -> Target 1

Next Generation Architecture: E-LT
Technical Detail
• Leverage existing database resources for transformation: high performance, less I/O, and lower license cost
• Design data flows from pre-defined templates, open to all types of data sources (drag & drop)
• Capture changed data for near real-time data synchronization
• No coding required
E-LT Architecture: S1, S2, S3 -> Extract -> Load -> Staging tables -> Transform -> Target 1
ODI Agent: for scheduling and real-time monitoring (changes only)
ODI Designer: not needed in production

Active Integration
Batch, Event-based, and Service-oriented Integration
• Evolve from Batch to Near Real-time Warehousing on Common Platform
• Unify the Silos of Data Integration
• Data Integrity on the Fly
• Services Plug into Oracle SOA Suite

Oracle Data Integrator
• Event Conductor: Event-oriented Integration
• Service Conductor: Service-oriented Integration
• Data Conductor: Data-oriented Integration
• Metadata, Declarative Design

Benefits
• Enables real-time data warehousing and operational data hubs
• Services plug into Oracle SOA Suite for comprehensive integration

Declarative Design
Developer Productivity

Conventional ETL Design: Specify ETL Data Flow Graph
• Developer must define every step of complex ETL flow logic
• Traditional approach requires specialized ETL skills
• And significant development and maintenance efforts

Declarative Set-based Design: ODI Declarative Design
• Reduces the number of steps
• Automatically generates the data flow, whatever the sources and target DB
1. Define What You Want
2. Automatically Generate Dataflow
Define How: Built-in Templates

Benefits
• Significantly reduce the learning curve
• Shorter implementation times
• Streamline access to non-IT pros
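
The "define what you want, generate the dataflow" idea can be sketched in a few lines. This is an illustration only, not ODI's actual generator: the function generate_dataflow, the mapping dictionary layout, and the table and column names are all hypothetical. The developer states the target, the source, and one expression per target column; the SQL that moves the data is produced from that declaration.

```python
import sqlite3

def generate_dataflow(mapping):
    """Build one INSERT ... SELECT statement from a declarative mapping (sketch)."""
    columns = ", ".join(mapping["columns"])               # target column names
    expressions = ", ".join(mapping["columns"].values())  # source expressions
    sql = (f"INSERT INTO {mapping['target']} ({columns}) "
           f"SELECT {expressions} FROM {mapping['source']}")
    if mapping.get("filter"):
        sql += f" WHERE {mapping['filter']}"
    return sql

# The "what": a declaration of the desired result, with no flow logic.
mapping = {
    "target": "dim_customer",
    "source": "src_customer",
    "columns": {
        "customer_id": "id",
        "full_name": "first_name || ' ' || last_name",
    },
    "filter": "active = 1",
}

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE src_customer (id INTEGER, first_name TEXT, last_name TEXT, active INTEGER);
    CREATE TABLE dim_customer (customer_id INTEGER, full_name TEXT);
    INSERT INTO src_customer VALUES (1, 'Tuan', 'Le', 1), (2, 'Mai', 'Tran', 0);
""")

statement = generate_dataflow(mapping)  # the "how" is generated, not hand-written
db.execute(statement)
loaded = db.execute("SELECT customer_id, full_name FROM dim_customer").fetchall()
print(statement)
print(loaded)
```

Changing a column expression or the filter changes only the declaration; the generated statement follows automatically.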

Pluggable Data Integration Architecture
Hot-Pluggable: Modular, Flexible, Extensible

Pluggable Architecture
• Reverse: Engineer Metadata
• Journalize: Read from CDC Source
• Load: From Sources to Staging
• Check: Constraints before Load
• Integrate: Transform and Move to Targets
• Service: Expose Data and Transformation Services
[Diagram: Sources (Reverse, Journalize/CDC) -> Load -> Staging Tables -> Check -> Integrate -> Target Tables; Services exposed as web services (WS)]

Benefits
• Tailor to existing best practices
• Ease administration work
• Depending on the specific data source, we select the right pre-defined code module (Knowledge Module) -> Hot-Pluggable
• Support all types of data sources (DB2/AS400, Oracle, Excel, File…)
• Reduce cost of ownership

Knowledge Modules
Hot-Pluggable: Modular, Flexible, Extensible

Pluggable Knowledge Modules Architecture
• Reverse: Engineer Metadata
• Journalize: Read from CDC Source
• Load: From Sources to Staging
• Check: Constraints before Load
• Integrate: Transform and Move to Targets
• Service: Expose Data and Transformation Services
[Diagram: Sources (Reverse, Journalize/CDC) -> Load -> Staging Tables -> Check (Error Tables) -> Integrate -> Target Tables; Services exposed as web services (WS)]

Sample out-of-the-box Knowledge Modules
• Reverse: SAP/R3, Siebel
• Journalize: Log Miner, SQL Server Triggers, DB2 Journals
• Load: Oracle DBLink, JMS Queues, DB2 Exp/Imp, Oracle SQL*Loader
• Check: Check MS Excel, Check Sybase
• Integrate: TPump/Multiload, Oracle Merge, Type II SCD, Siebel EIM Schema
• Service: Oracle Web Services, DB2 Web Services

Benefits
• Tailor to existing best practices
• Ease administration work
• Reduce cost of ownership
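
One way to picture a Knowledge Module is as a swappable code template that is filled in from metadata. The sketch below is an assumption-laden illustration, not real ODI Knowledge Module code: the registry KNOWLEDGE_MODULES, the strategy names "append" and "replace", and the function generate_code are hypothetical. It shows the hot-pluggable idea: the same metadata produces different generated code depending on which template is plugged in.

```python
from string import Template

# Hypothetical registry of integration strategies, each expressed as a code template.
KNOWLEDGE_MODULES = {
    "append": Template(
        "INSERT INTO $target ($columns) SELECT $columns FROM $staging"
    ),
    "replace": Template(
        "DELETE FROM $target;\n"
        "INSERT INTO $target ($columns) SELECT $columns FROM $staging"
    ),
}

def generate_code(km_name, target, staging, columns):
    """Fill the chosen template with table and column metadata."""
    template = KNOWLEDGE_MODULES[km_name]
    return template.substitute(
        target=target, staging=staging, columns=", ".join(columns)
    )

# Same metadata, two different modules plugged in.
metadata = {"target": "dim_product", "staging": "stg_product", "columns": ["sku", "name"]}
append_sql = generate_code("append", **metadata)
replace_sql = generate_code("replace", **metadata)
print(append_sql)
print(replace_sql)
```

Adding support for another technology or loading strategy then means registering one more template, without touching the mappings that use it.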

KMs: Truly Heterogeneous
Out-of-Box Knowledge Modules
• Generic SQL DB
• Oracle DB 9i
• Oracle DB 10g
• Oracle DB 10g XE
• IBM DB2/400
• IBM DB2/UDB
• IBM Informix SE
• IBM LDAP Server
• MS SQL Server 2000
• MS SQL Server 2005
• MS SQL Server 2005 SE
• MS Office Access 2000
• MS Office Excel 2000
• MS Active Directory
• Sybase ASA 8.x & 9.x
• Sybase IQ 12.x
• Sonic MQ v7.0
• Teradata V2R5.x
• Teradata V2R6.x
• Netezza Performance Server 2.2.1
• Hyperion Essbase
• PostgreSQL 8.1
• MySQL 4.0
• MySQL 5.0
• Oracle BI Suite 10g
• Oracle BAM 10g
• Oracle Internet Directory 9i
• OpenLDAP 2.3
• Siebel CRM 7.8
• JD Edwards
• PeopleSoft
• SAP R/3
• Oracle EBusiness Suite
• Oracle AQ 10g
• Oracle SOA Suite
• Oracle ESB 10g
• SalesForce.com App Exchange
• Any JMS Standard Implementation

Popular Usage Scenarios

E-LT for Data Warehouse
Create Data Warehouse for Business Intelligence
Populate Warehouse with High Performance ODI
• Heterogeneous sources and targets
• Incremental load
• Slowly changing dimensions
• Data integrity and consistency
• Changed data capture
• Data lineage
[Diagram: Operational sources -> ODI (Load, Transform, Capture Changes, Incremental Update, Data Integrity, Aggregate, Export; Metadata) -> Data Warehouse -> Cubes -> Analytics. Labels: Data Transformation, Data Warehousing]

ODI for Master Data Management
Common Data Quality, and Middleware Services

Solutions & Applications
• Industry Solutions: Telco, Energy, Banking, Retail, Mfr, …
• MDM Applications: Customer, Supplier, Employee, Product, Asset, …

Master Data Management
• Vertical Driven
• Data Object Centric
• Application Focus
• Middleware Foundation

Fusion Middleware Foundation
• Process Orchestration
• Business Intelligence
• Registry & Policies
• Data Integration & Quality

Oracle Data Integrator
• Batch & Real-time Integration
• Data Quality & Profiling
• Transformation & Data Routing
[Diagram: Oracle Data Integrator (E-LT Agent, E-LT Metadata, Golden Master Records) over sources: Oracle EBS, Siebel CRM, PeopleSoft, SAP/R3, Other Sources]

ODI Enhances Oracle BI
Populate Warehouse with High Performance ODI

Oracle Business Intelligence Suite EE:
• Simplified Business Model View
• Advanced Calculation & Integration Engine
• Intelligent Request Generation
• Optimized Data Access

Oracle Data Integrator:
• Populate Enterprise Data Warehouse
• Optimized Performance for Load and Transform
• Extensible Pre-packaged E-LT Content
[Diagram: Oracle BI Suite EE (Interactive Dashboards, Answers, Publisher, Delivers; Oracle BI Presentation Server; Oracle BI Server) over the Enterprise Data Warehouse, loaded by Bulk E-LT from Oracle Data Integrator (E-LT Agent, E-LT Metadata) from sources: Oracle EBS, Siebel CRM, PeopleSoft, SAP/R3, Other Sources]

ODI Enhances Oracle SOA Suite
Add Bulk Data Transformation to BPEL Process

Oracle SOA Suite:
• BPEL Process Manager for Business Process Orchestration
• Declarative Rules Engine

Oracle Data Integrator:
• Efficient Bulk Data Processing as Part of Business Process
• Interact via Data Services and Transformation Services
[Diagram: Oracle SOA Suite (Business Activity Monitoring, BPEL Process Manager, Web Services Manager, Rules Engine, Enterprise Service Bus) with Oracle Data Integrator (E-LT Agent, E-LT Metadata) providing Bulk Data Processing]

ODI with BAM
Populate BAM with ETL Data Efficiently

Oracle SOA Suite Business Activity Monitoring
• Business Activity Monitoring for Real-time Business Insight
• Message-based, event-driven, memory-resident architecture

Oracle Data Integrator
• High Performance Loading of BAM’s Active Data Cache
• Pre-built and Integrated via Knowledge Modules
• BAM Java APIs Exposed through “Interface” Like Any Other Target

Sample Combined Use Cases
• Monitor Together Events and the Aggregate Implications of Events
[Diagram: Oracle SOA Suite (Business Activity Monitoring with Event Monitoring, Web Applications, Event Engine, Report Cache, Active Data Cache; BPEL Process Manager; Web Services Manager; Business Rules Engine; Enterprise Service Bus) and Oracle Data Integrator (Agent, Metadata; Bulk and Real-Time Data Processing) over sources: Message Queues, Data Warehouse, CDC, PeopleSoft, SAP/R3]

Integration with SOA/BI/Fusion
Resolve All Integration Challenges
[Diagram: Oracle BPA and Human Workflow, Oracle BI (Dashboards, Reporting, Analysis, Publishing), BPEL Process Manager, and Oracle BAM (Active Data Cache) invoke Oracle Data Integrator's Transformation Services and Data Services (WSDL). Oracle Data Integrator (E-LT Agent, Metadata Repository, Knowledge Modules) offers: Generate Data Services, Service as Data Source, High speed Batch ELT, High speed JMS ELT, CDC based ELT. Sources and targets: XML, Oracle, JMS, Oracle BI Enterprise Data Warehouse, CDC]

Performance

ODI vs. ESB
[Chart legend: Recommended, Considered, Can use]

Performance Report
Source and Target: 2 dual core CPU, 12GB RAM

ODI with ESB
[Chart: Real-life Scenarios. Vertical axis, Data Latency: Batch (over 2 hours), Asynchronous, Synchronous (immediate). Horizontal axis, Data Volume: Message by Message, Mini Batches (over 1M), Large Volume Processing. Regions labeled: Oracle Data Integrator, Oracle Enterprise Service Bus]

Understanding Performance Choices
When you need to transform large data sets

Less than 10MB
Source \ Target    XML        File       DB
XML                ESB        ESB        ESB
File               ESB        ESB        depends
DB                 ESB        depends    ODI
"depends": on whether an intermediary XML format is useful for other processing (use ESB), or if joining File data to tabular RDB data is required (use ODI)

Between 10-50MB
Source \ Target    XML        File       DB
XML                depends    depends    ODI
File               depends    ODI        ODI
DB                 ODI        ODI        ODI
"depends": on how much cross-referencing among the data values and rows is required during transformation – the more there is, the faster ODI will perform relative to ESB

Greater than 50MB
Source \ Target    XML        File       DB
XML                depends    ODI        ODI
File               ODI        ODI        ODI
DB                 ODI        ODI        ODI
"depends": if the source and target are both XML, and there is no cross-referencing of data among rows, then a streaming-type or parallel-engine-type approach might scale
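
The three source-by-target matrices above can be encoded as a simple lookup, sketched below. The function name recommend, the band keys, and the treatment of the exact boundaries (10MB and 50MB are placed in the middle band) are assumptions, since the slide does not say which band the boundary values belong to.

```python
FORMATS = ("XML", "File", "DB")  # column order of each matrix (targets)

# Rows are sources; each tuple lists the recommendation per target in FORMATS order.
MATRIX = {
    "under_10mb": {
        "XML":  ("ESB", "ESB", "ESB"),
        "File": ("ESB", "ESB", "depends"),
        "DB":   ("ESB", "depends", "ODI"),
    },
    "10_to_50mb": {
        "XML":  ("depends", "depends", "ODI"),
        "File": ("depends", "ODI", "ODI"),
        "DB":   ("ODI", "ODI", "ODI"),
    },
    "over_50mb": {
        "XML":  ("depends", "ODI", "ODI"),
        "File": ("ODI", "ODI", "ODI"),
        "DB":   ("ODI", "ODI", "ODI"),
    },
}

def recommend(source, target, size_mb):
    """Look up the slide's recommendation for one source/target/size combination."""
    if size_mb < 10:
        band = "under_10mb"
    elif size_mb <= 50:      # assumption: boundary values fall in the middle band
        band = "10_to_50mb"
    else:
        band = "over_50mb"
    return MATRIX[band][source][FORMATS.index(target)]

print(recommend("XML", "DB", 5))
print(recommend("File", "XML", 30))
print(recommend("XML", "XML", 100))
```

A result of "depends" means the accompanying note for that size band has to be applied by hand; the lookup does not try to automate that judgment.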
*caveat – always benchmark if you are unsure and require best possible results

Topology 1 – Oracle to Oracle
Vietnamese Customer PoC
• Hardware: Quad Core/4 GB RAM
• Oracle 10.2+/Linux
• ODI Designer
• Data Synchronization
• Oracle 10.2+/Win
• Repositories
• Hardware: Dual Core/2 GB RAM
• Agent

Performance Results
• 100k rows, 15 fields
  - Load: LKM DBLink 3s
  - Real-time synchronization (JKM DBLink)
    - Update 65k: 13s
    - Delete 30k: 8s
• 1.2m rows, 8 fields (about 120 bytes/row)
  - Load: LKM DBLink 24s, JDBC 4.5 minutes
  - Real-time synchronization (JKM DBLink)
    - Update 5000 rows, 8s
    - Delete 5000 rows, 8s

Real-time Synchronization with CDC
CPU Usage
• Without CDC: CPU 10%, 1s-1.5s
• Enable CDC (LogMiner) and use AgentScheduler
  - CPU 2%, 1s-1.5s
• Scenario with 1.2m rows
  - Update 3900 rows, CPU 23%, 2s
  - Delete 3900 rows, CPU 21%, 2s

Summary
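
The PoC load timings reported above (1.2m rows at about 120 bytes/row, loaded in 24s over DBLink versus 4.5 minutes over JDBC) can be summarized as throughput figures. The short calculation below is a back-of-envelope sketch: the helper name throughput is hypothetical, the row size is the slide's approximate figure, and megabytes are taken as 10^6 bytes.

```python
ROWS = 1_200_000
BYTES_PER_ROW = 120  # "about 120 bytes/row" on the slide, so results are approximate

def throughput(rows, bytes_per_row, seconds):
    """Return (rows per second, megabytes per second) for one load."""
    return rows / seconds, rows * bytes_per_row / 1e6 / seconds

dblink_rows_s, dblink_mb_s = throughput(ROWS, BYTES_PER_ROW, 24)
jdbc_rows_s, jdbc_mb_s = throughput(ROWS, BYTES_PER_ROW, 4.5 * 60)
speedup = (4.5 * 60) / 24  # how many times faster the DBLink load finished

print(f"DBLink: {dblink_rows_s:,.0f} rows/s, {dblink_mb_s:.2f} MB/s")
print(f"JDBC:   {jdbc_rows_s:,.0f} rows/s, {jdbc_mb_s:.2f} MB/s")
print(f"DBLink finished {speedup:.2f}x faster")
```

Under these assumptions the DBLink load works out to 50,000 rows/s (about 6 MB/s) and the JDBC load to roughly 4,400 rows/s, a ratio of 11.25.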

Oracle Data Integrator
Data Movement and Transformation from Multiple Sources to Heterogeneous Targets

    BENEFIT                  DIFFERENTIATOR
1   Best Performance         Heterogeneous “E-LT”
2   Productivity             Declarative Design
3   Real-time Integration    Declarative CDC
4   Hot-Pluggable            Knowledge Modules
5   Future Proof             The Chosen Integration Technology of Oracle Fusion

Reference Customers

Customer: Overstock.com
Solution: High-Volume Real-Time Data Transformation
Technology: Oracle Data Integrator, Oracle 9i & 10g RAC, Dell Linux, IBM AIX, Teradata 8-node 54000

Oracle Data Integrator Solution:
• Found a way to ensure that Teradata data warehouse was constantly updated.
• Even highly complex transformations are automated within the Enterprise Warehousing
• Supporting several terabytes of data stored in the enterprise warehouse, and millions of daily transactions

“Having access to key business metrics in real-time is no longer a fantasy.”
“Oracle Data Integrator is helping us turn our data into gold”
“Data Integrator allows us to perform data transformations using the power of our Teradata platform. […] With Oracle, over 300 users are now able to have access to their relevant data in real-time, hourly, daily, or weekly depending upon their needs.”
“In short, Oracle Data Integrator give us the ability to make better decisions and better manage our bottom line.”

Business Problem:
• Wanted to enable sales, finance, marketing and merchandising teams to have access to near real-time data so that they could make timely, more intelligent business decisions.
• Wanted to know at any point in time if company performance is meeting the target metrics.
• Needed a data integration product that could handle our high-volume loading and transformation requirements in near real time.

Solution Architecture:
Data Sources, Targets, and Platforms
• Oracle 9i RAC & 10g RAC
• Teradata 8-node 54000
• GoldenGate TDM Transactional Management
• Platforms: IBM AIX, Dell Linux
Data Integration Architecture
• Oracle Data Integrator: 100% Java architecture, high-performance E-LT transformations, business-rules driven transformation design tool, automatic load script generation
• >1.2M SKU’s, >5M daily transactions, >300 users, deployable for both batch and real-time use cases, leverages power of Teradata engine for improved speed of data transformation

Company: Overstock.com
Product: Oracle Data Integrator
Contact: Miranda Nash
Email: [email protected]
Overstock.com, Inc. (NASDAQ: OSTK) operates as an online retailer offering bed-and-bath goods, furniture, watches, jewelry, electronics, sporting goods, and designer accessories.

Customer: Sabre Holdings
Solution: High-Volume Real-Time Data Transformation
Technology: Oracle Data Integrator, Oracle DB, MQ sources, Teradata Data Warehouse target

Oracle Data Integrator Solution:
• E-LT architecture maximizes performance and leverages existing investment in Teradata infrastructure
• Lower development and maintenance costs for E-LT driven by declarative design tools
• Bottom Line: Integrated travel industry data in consolidated view enables Sabre to better serve their customers and travel suppliers

“We needed a data integration tool that would reduce our dependency on manual coding of E-LT scripts and leverage the power of our Teradata Warehouse for data transformation.”

Business Problem:
• High costs associated with Data Warehouse loading from new sources
• Large Teradata Data Warehouse requires top performance for loading data in near-real time
• Integrated views of data require complex transformations, expensive to maintain

Solution Architecture:
Data Sources, Targets, and Platforms
• Oracle RDBMS
• Teradata Data Warehouse
• Flat Files
• Various other sources over MQ
Data Integration Architecture
• Oracle Data Integrator: 100% Java architecture, high-performance E-LT transformations, business-rules driven transformation design tool, automatic load script generation

Company: Sabre Holdings
Product: Oracle Data Integrator
Contact: Miranda Nash
Email: [email protected]
For more than 40 years, Sabre Holdings (NYSE: TSG) has transformed the airline industry through technological advancement. The Company offers a portfolio of travel marketing, distribution and technology solutions.

Customer: DHL
Solution: High-Volume Real-Time Data Transformation
Technology: Oracle Data Integrator, Oracle RDBMS’s, Teradata Data Warehouse, Cobol Flat Files…

Oracle Data Integrator Solution:
• With Oracle Data Integrator, every batch that used to last one hour now lasts seconds
• Reducing window time is critical to adding more functionality
• Running mini-batches more often results in more customer services and more revenue
• Using the RDBMS as an engine for data transformation simplifies the administrative workload

“Solution completely meets our needs. […] Oracle Data Integrator was developed by ETL developers, who really know and understand ETL concerns and pains, and how to do things better.”

Business Problem:
• 24/7 business cannot be compromised by long ETL batches (via an ETL Tool)
• Every daily load cannot last more than one hour
• When the volume of data doubles, execution time triples
• Data Integration was the bottleneck in providing more services

Solution Architecture:
Data Sources, Targets, and Platforms
• Oracle RDBMS
• Teradata Data Warehouse
• Flat Files
• Platforms: Linux, Cobol
Data Integration Architecture
• Oracle Data Integrator: 100% Java architecture, high-performance E-LT transformations, business-rules driven transformation design tool, automatic load script generation
• 2.5 terabytes loaded every 15 minutes from 8 major data sources >50 events, >5 shipments and > piece/parcel records per day

Company: DHL
Product: Oracle Data Integrator
Contact: Miranda Nash
Email: [email protected]
For more than 35 years, DHL has built the world's premier global delivery network by trailblazing express shipping in one country after another. Over 220 countries and territories later, DHL is the global market leader of the international express and logistics industry.

Customer: iBasis
Solution: High-Volume Real-Time Data Transformation
Technology: Oracle Data Integrator, Oracle 10g, Netezza PowerCenter NPS8350 Warehouse Appliance

Oracle Data Integrator Solution:
"Given the massive volumes of data we need to process every day, getting timely data in the data warehouse requires high performance loading processes. Using Oracle Data Integrator’s set of Knowledge Modules for Netezza, we are able to take advantage of the massively parallel processing capabilities of Netezza and to reduce load times significantly. […] as our goal is to go more and more toward real-time, it will be easy for us to change the latency of these flows – without having to redevelop them."

“The first thing that struck us was the speed with which we ramped up our ETL developments with Oracle Data Integrator.”

Business Problem:
• Data warehouse had become obsolete and could not respond to the growing requirements of management, sales, and operational centers
• Needed more accurate and timely data
• Replaced entire Data Warehouse infrastructure
• Needed a data integration product that would provide the scalability and performance they needed to aggregate, transform, and load their data

Solution Architecture:
Data Sources, Targets, and Platforms
• Oracle RDBMS
• Netezza PowerCenter NPS8350
• Flat Files
• Applications (future): Call Billing, Network Monitoring
Data Integration Architecture
• Oracle Data Integrator: 100% Java architecture, high-performance E-LT transformations, business-rules driven transformation design tool, automatic load script generation
• 4.5TB data warehouse, >8 billion records, company processes >150 million transactions per day

Company: iBasis
Product: Oracle Data Integrator
Contact: Miranda Nash
Email: [email protected]
Founded in 1996, iBasis (NASDAQ: IBAS) is one of the largest carriers of international voice traffic in the world and a leading provider of prepaid calling services.

Analyst Coverage

Gartner
“Sunopsis (Oracle) has made strides in building market awareness beyond its base in Europe. Sunopsis has a range of capabilities, spanning ETL and real-time messaging, and an architecture that enables distribution of transformation workload across data sources and targets.”
Ted Friedman, Bill Gassman, “Magic Quadrant for Extraction, Transformation and Loading, 1H05”, May 11, 2005

Bloor Research
“While there are many relatively young vendors within the ETL market, Sunopsis has undoubtedly made the biggest impression, both in terms of the users that it has gained and in the way that its approach has influenced the market.”
Philip Howard, “Bullseye Report - Extract, Transform & Load”, March 28, 2006

Gartner
By purchasing Sunopsis, Oracle has acquired a server-independent and platform-independent data integration tool, which will be renamed Oracle Data Integrator (ODI). OFM and Oracle Applications customers will welcome the addition of the ODI's database independence. In particular, the acquisition could provide needed new momentum for Fusion Middleware. Fusion Middleware customers have heterogeneous IT environments, as do former PeopleSoft, Siebel Systems and JD Edwards customers, who have an ongoing requirement for integration with non- Oracle systems. The acquisition will provide OFM with a data integration tool that is capable of deploying small-grained data services within a service-oriented architecture (SOA) environment. This capability could have a positive influence on Fusion Middleware - if Oracle leverages the Sunopsis philosophy.
Mark A. Beyer, Ted Friedman, “Sunopsis Data Integration May Fuel Oracle Fusion Middleware”, October 23, 2006

Forrester Research
“Oracle has recognized that its customers require diverse data integration features without having to integrate and manage products from many vendors. Integrating Sunopsis’ heterogeneous extract, load, transform (ELT) and event-driven CDC capabilities within its middleware offerings is a great start.”
Rob Karel, “Oracle Makes Serious Move In Data Heterogeneity by Acquiring Sunopsis”, October 29, 2006