Oracle Data Integrator – Solution Overview

Nguyen Tuan Khang, [email protected]
Senior Solutions Consultant, Fusion Middleware, Oracle Vietnam

Why Data Integration?

NEED… Information How and Where you Want It
• Corporate Performance Management
• Business Process Management
• Business Activity Monitoring
• Business Intelligence

Data Integration
• Migration
• Data Warehousing
• Master Data Management
• Data Synchronization
• Federation
• Real Time Messaging

HAVE… Data in Disparate Sources
• Legacy
• ERP
• CRM
• Best-of-breed Applications

3 Pillars of Data Integration

[Figure: three pillars labeled Batch, Sync, and Async]

Enterprise Information Integration
The Traditional Approach

[Diagram: Source Applications -> Extract -> Transform -> Load -> Target Data Warehouse]

• ETL “processes” often use batch processing approaches
  • Example: Customer nightly batch runs can take > 24 hours!
• “Services” that operate on data are not easily reusable in other contexts
• ETL “Services” and “Processes” are insecure and hard to monitor (i.e. no SLA)

Challenges In Data Integration

CHALLENGE

1. Increasing data volumes; decreasing batch windows

2. Non-integrated integration

3. Complexity, manual effort of conventional ETL design

4. Lack of knowledge capture

Oracle Data Integrator
Based on Technology from

Data Movement and Transformation from Multiple Sources to Heterogeneous Targets

   BENEFIT                  DIFFERENTIATOR
1  Best Performance         Heterogeneous “E-LT”
2  Productivity             Declarative Design
3  Real-time Integration    Declarative CDC
4  Hot-Pluggable            Knowledge Modules
5  Future Proof             The Chosen Integration Technology of Oracle Fusion

Typical Considerations for ODI

• High volume data synchronization
  • more than 20MB/min
• Heterogeneous data sources
  • DB2/AS400, Oracle, Excel, File, SQL, BAM…
• Capture new data changes regardless of data sources
  • CDC using Native Journal, LogMiner or Trigger…
• Real-time data synchronization
• Easy to implement the solution without changing your current IT infrastructure
  • No separate server required
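To make the trigger-based CDC option concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in source database. The table names (customer, j_customer), the trigger names, and the consume_changes helper are hypothetical and chosen for illustration; this is not the code that ODI generates.

```python
import sqlite3

# Hypothetical source table "customer" and journal table "j_customer".
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE j_customer (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,  -- order in which changes occurred
    op  TEXT NOT NULL,                      -- 'I', 'U' or 'D'
    id  INTEGER NOT NULL                    -- key of the changed row
);
CREATE TRIGGER trg_customer_ins AFTER INSERT ON customer
BEGIN INSERT INTO j_customer (op, id) VALUES ('I', NEW.id); END;
CREATE TRIGGER trg_customer_upd AFTER UPDATE ON customer
BEGIN INSERT INTO j_customer (op, id) VALUES ('U', NEW.id); END;
CREATE TRIGGER trg_customer_del AFTER DELETE ON customer
BEGIN INSERT INTO j_customer (op, id) VALUES ('D', OLD.id); END;
""")


def consume_changes(conn, last_seq):
    """Return journal rows newer than last_seq and the new high-water mark."""
    rows = conn.execute(
        "SELECT seq, op, id FROM j_customer WHERE seq > ? ORDER BY seq",
        (last_seq,),
    ).fetchall()
    new_last = rows[-1][0] if rows else last_seq
    return rows, new_last


# Ordinary application activity on the source table.
conn.execute("INSERT INTO customer VALUES (1, 'Ann')")
conn.execute("INSERT INTO customer VALUES (2, 'Bob')")
conn.execute("UPDATE customer SET name = 'Anna' WHERE id = 1")
conn.execute("DELETE FROM customer WHERE id = 2")

# A subscriber reads only what changed since its last run.
changes, last_seq = consume_changes(conn, 0)
```

The subscriber keeps last_seq between runs, so each synchronization pass moves only the rows that changed instead of rescanning the whole source table.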

Challenges & Emerging Solutions In Data Integration

CHALLENGE                                                  EMERGING SOLUTION
1. Increasing data volumes; decreasing batch windows       Shift from E-T-L to E-LT
2. Non-integrated integration                              Convergence of integration solutions
3. Complexity, manual effort of conventional ETL design    Shift from custom coding to declarative design
4. Lack of knowledge capture                               Shift to pattern-driven development

1. E-LT Architecture
High Performance

Conventional ETL Architecture: Extract -> Transform -> Load
Transform in Separate ETL Server
• Proprietary Engine
• Poor Performance
• High Costs
E.g. Informatica, IBM DataStage

Next Generation Architecture “E-LT”: Extract -> Load, with Transform in the databases
Transform in Existing RDBMS
• Leverage Resources
• Efficient
• High Performance

Benefits
• Optimal Performance & Scalability
• Easier to Manage & Lower Cost

Oracle Data Integrator
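As an illustration of the E-LT idea, the sketch below loads rows unchanged into a staging table on the target and then runs the transformation as one set-based SQL statement inside the target database. It uses two in-memory sqlite3 databases as stand-ins for the source and target systems; the table names (orders, stg_orders, customer_sales) are assumptions made for this example, not ODI objects.

```python
import sqlite3

# Stand-in source system with raw order rows.
source = sqlite3.connect(":memory:")
source.executescript("""
CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL);
INSERT INTO orders VALUES (1, 'acme', 100.0), (2, 'acme', 50.0), (3, 'globex', 75.0);
""")

# Stand-in target system holding both the staging table and the final table.
target = sqlite3.connect(":memory:")
target.executescript("""
CREATE TABLE stg_orders (order_id INTEGER, customer TEXT, amount REAL);
CREATE TABLE customer_sales (customer TEXT PRIMARY KEY, total_amount REAL);
""")

# Extract + Load: move the rows as they are, with no transformation in between.
rows = source.execute("SELECT order_id, customer, amount FROM orders").fetchall()
target.executemany("INSERT INTO stg_orders VALUES (?, ?, ?)", rows)

# Transform: a single set-based statement executed by the target database engine.
target.execute("""
INSERT INTO customer_sales (customer, total_amount)
SELECT UPPER(customer), SUM(amount)
FROM stg_orders
GROUP BY UPPER(customer)
""")

result = target.execute(
    "SELECT customer, total_amount FROM customer_sales ORDER BY customer"
).fetchall()
```

No separate transformation server appears anywhere in the flow: the only engines doing work are the source and target databases themselves.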

1. Traditional E-T-L
Technical Detail

• Needs one powerful server for the Transform Server and for its staging data tables
• High total cost for maintenance
• Not flexible when more source and target data sources are added
• Requires coding
• Bad performance (more I/O among staging tables and source/target)

[Diagram: Conventional ETL Architecture. Sources S1, S2, S3 -> Extract -> Transform Server (ETL DB, Repository, Staging tables) -> Load -> Target 1]

1. Next Generation Architecture: E-LT
Technical Detail

• Leverage existing database resources for transformation: high performance, less I/O, and lower license cost
• Design data flow by pre-defined templates, open for all types of data sources (drag & drop)
• Capture changed data for near real-time data synchronization
• No coding required

[Diagram: E-LT Architecture. Sources S1, S2, S3 -> Extract -> Load -> Transform on Target 1 (Staging tables). ODI Agent: for scheduling and real-time monitoring, changes only. ODI Designer: not needed in production]

2. Active Integration
Batch, Event-based, and Service-oriented Integration

• Evolve from Batch to Near Real-time Warehousing on Common Platform
• Unify the Silos of Data Integration
• Data Integrity on the Fly
• Services Plug into Oracle SOA Suite
• Benefits
  • Enables real-time data warehousing and operational data hubs
  • Services plug into Oracle SOA Suite for comprehensive integration

[Diagram: Oracle Data Integrator. Event Conductor (Event-oriented Integration), Service Conductor (Service-oriented Integration), Data Conductor (Data-oriented Integration), built on Metadata and Declarative Design]

3. Declarative Design
Developer Productivity

Conventional ETL Design: Specify ETL Data Flow Graph
• Developer must define every step of Complex ETL Flow Logic
• Traditional approach requires specialized ETL skills
• And significant development and maintenance efforts

ODI Declarative Design: Declarative Set-based Design
• Reduces the number of steps
• Automatically generates the Data Flow whatever the sources and target DB

1. Define What You Want
2. Automatically Generate Dataflow
Define How: Built-in Templates

Benefits
• Significantly reduce the learning curve
• Shorter implementation times
• Streamline access to non-IT pros
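The split between "what" and "how" can be sketched as follows: the developer writes only a declarative mapping, and a reusable template turns it into an executable statement. The mapping keys, the INSERT_SELECT template, and the generate_dataflow function are hypothetical names for this illustration and do not reflect ODI's internal format.

```python
from string import Template

# "What": a declarative mapping from source expressions to target columns.
# All table and column names here are made up for the example.
mapping = {
    "target": "dw_customer",
    "source": "src_customer",
    "columns": {
        "customer_id": "id",
        "full_name": "TRIM(first_name) || ' ' || TRIM(last_name)",
    },
    "filter": "active = 1",
}

# "How": a built-in template that knows the shape of the data flow.
INSERT_SELECT = Template(
    "INSERT INTO $target ($target_cols) SELECT $exprs FROM $source WHERE $filter"
)


def generate_dataflow(mapping, template=INSERT_SELECT):
    """Generate the SQL for a mapping by filling in the chosen template."""
    cols = mapping["columns"]
    return template.substitute(
        target=mapping["target"],
        target_cols=", ".join(cols),        # target column names, in order
        exprs=", ".join(cols.values()),     # matching source expressions
        source=mapping["source"],
        filter=mapping.get("filter", "1 = 1"),
    )


sql = generate_dataflow(mapping)
```

Swapping in a different template changes how the data moves without touching the mapping, which is the productivity argument the slide makes.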

4. Pluggable Data Integration Architecture
Hot-Pluggable: Modular, Flexible, Extensible

Pluggable Architecture
• Reverse: Reverse Engineer Metadata
• Journalize: Read from CDC Source
• Load: From Sources to Staging
• Check: Constraints before Load
• Integrate: Transform and Move to Targets
• Service: Expose Data and Transformation Services

[Diagram: Sources -> Staging Tables -> Target Tables, annotated with the Reverse, Journalize (CDC), Load, Check, Integrate, and Services (WS) steps]

Benefits
• Tailor to existing best practices
• Ease administration work
• Depending on the specific data source, the right pre-defined code module (Knowledge Module) is selected -> Hot-Pluggable
• Support all types of data sources (DB2/AS400, Oracle, Excel, File…)
• Reduce cost of ownership
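One way to picture the hot-pluggable idea is a registry that picks a code template by step and technology and falls back to a generic one. The registry below, its keys, and the template strings are purely illustrative assumptions and are not taken from real ODI Knowledge Modules.

```python
# Hypothetical registry: (step, technology) -> code template.
# The SQL text is illustrative only, not real Knowledge Module content.
KNOWLEDGE_MODULES = {
    ("load", "oracle"): "INSERT INTO {staging} SELECT * FROM {source}@{dblink}",
    ("load", "generic_sql"): "INSERT INTO {staging} SELECT * FROM {source}",
    ("integrate", "generic_sql"): "INSERT INTO {target} SELECT * FROM {staging}",
}


def select_km(step, technology):
    """Pick the technology-specific module, or fall back to generic SQL."""
    key = (step, technology)
    if key in KNOWLEDGE_MODULES:
        return KNOWLEDGE_MODULES[key]
    return KNOWLEDGE_MODULES[(step, "generic_sql")]


def render(step, technology, **names):
    """Fill the selected template with object names for one data flow."""
    return select_km(step, technology).format(**names)


# An Oracle source gets the specialized module; DB2 falls back to generic SQL.
oracle_load = render("load", "oracle", staging="STG_CUST", source="CUST", dblink="SRC_LINK")
db2_load = render("load", "db2", staging="STG_CUST", source="CUST")
```

Adding support for a new technology then means registering one more template, with no change to the flows that use it.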

4. Knowledge Modules
Hot-Pluggable: Modular, Flexible, Extensible

Pluggable Knowledge Modules Architecture
• Reverse: Reverse Engineer Metadata
• Journalize: Read from CDC Source
• Load: From Sources to Staging
• Check: Constraints before Load
• Integrate: Transform and Move to Targets
• Service: Expose Data and Transformation Services

[Diagram: Sources -> Staging Tables -> Target Tables, with Error Tables, annotated with the Reverse, Journalize (CDC), Load, Check, Integrate, and Services (WS) steps]

Sample out-of-the-box Knowledge Modules
• SQL Server Triggers
• Oracle DBLink
• Check MS Excel
• TPump/Multiload
• Log Miner
• JMS Queues
• Oracle Merge
• Oracle Web Services
• SAP/R3
• Oracle SQL*Loader
• Check Sybase
• Siebel EIM Schema
• DB2 Web Services
• Siebel
• DB2 Journals
• DB2 Exp/Imp
• Type II SCD

Benefits
• Tailor to existing best practices
• Ease administration work
• Reduce cost of ownership
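The Check step and its error tables can be sketched like this: rows in staging that violate a rule are copied to an error table with a reason, and only the remaining rows are integrated into the target. The tables (stg_orders, orders, err_orders) and the two rules are assumptions for the example, with sqlite3 standing in for the target database.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE stg_orders (order_id INTEGER, customer TEXT, amount REAL);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT NOT NULL, amount REAL);
CREATE TABLE err_orders (order_id INTEGER, customer TEXT, amount REAL, reason TEXT);
INSERT INTO stg_orders VALUES (1, 'acme', 10.0), (2, NULL, 5.0), (3, 'globex', -1.0);
""")

# Hypothetical rules: (reason recorded in the error table, SQL violation predicate).
CHECKS = [
    ("customer is mandatory", "customer IS NULL"),
    ("amount must be positive", "amount <= 0"),
]

# Check: copy every violating row to the error table before anything is loaded.
for reason, violation in CHECKS:
    db.execute(
        "INSERT INTO err_orders "
        "SELECT order_id, customer, amount, ? FROM stg_orders WHERE " + violation,
        (reason,),
    )

# Integrate: move only the rows that passed all checks.
db.execute("""
INSERT INTO orders
SELECT order_id, customer, amount FROM stg_orders
WHERE order_id NOT IN (SELECT order_id FROM err_orders)
""")

loaded = db.execute("SELECT order_id, customer, amount FROM orders").fetchall()
rejected = db.execute("SELECT order_id, reason FROM err_orders ORDER BY order_id").fetchall()
```

Rejected rows stay available for review and recycling, while the target only ever receives data that satisfied the declared constraints.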

4. KMs: Truly Heterogeneous

Out-of-Box Knowledge Modules
• Generic SQL DB
• Oracle DB 9i
• Oracle DB 10g
• Oracle DB 10g XE
• IBM DB2/400
• IBM DB2/UDB
• IBM Informix SE
• IBM LDAP Server
• MS SQL Server 2000
• MS SQL Server 2005
• MS SQL Server 2005 SE
• MS Office Access 2000
• MS Office Excel 2000
• MS Active Directory
• Sybase ASA 8.x & 9.x
• Sybase IQ 12.x
• Sonic MQ v7.0
• Teradata V2R5.x
• Teradata V2R6.x
• Netezza Performance Server 2.2.1
• Hyperion
• PostgreSQL 8.1
• MySQL 4.0
• MySQL 5.0
• Oracle BI Suite 10g
• Oracle BAM 10g
• Oracle Internet Directory 9i
• OpenLDAP 2.3
• Siebel CRM 7.8
• JD Edwards
• PeopleSoft
• SAP R/3
• Oracle EBusiness Suite
• Oracle AQ 10g
• Oracle SOA Suite
• Oracle ESB 10g
• SalesForce.com App Exchange
• Any JMS Standard Implementation

Popular Usage Scenarios

E-LT for Data Warehouse
Create Data Warehouse for Business Intelligence

Populate Warehouse with High Performance ODI
• Heterogeneous sources and targets
• Incremental load
• Slowly changing dimensions
• Data integrity and consistency
• Changed data capture
• Data lineage

[Diagram: Operational sources -> Data Transformation (Load, Transform, Capture Changes, Incremental Update, Data Integrity) -> Data Warehouse -> Aggregate, Export -> Cubes -> Analytics; Metadata; stages labeled Data Transformation and Data Warehousing]
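For readers unfamiliar with slowly changing dimensions, here is a minimal Type II sketch: when a tracked attribute changes, the current dimension row is closed and a new current row is added, so history is preserved. The dim_customer layout and the apply_scd2 helper are hypothetical, and the logic is written row by row for readability rather than as the set-based SQL a warehouse load would normally use.

```python
import sqlite3

dw = sqlite3.connect(":memory:")
dw.executescript("""
CREATE TABLE dim_customer (
    sk          INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
    customer_id INTEGER,                            -- business key
    city        TEXT,                               -- tracked attribute
    valid_from  TEXT,
    valid_to    TEXT,                               -- NULL while the row is current
    is_current  INTEGER
);
INSERT INTO dim_customer (customer_id, city, valid_from, valid_to, is_current)
VALUES (1, 'Hanoi', '2007-01-01', NULL, 1);
""")


def apply_scd2(dw, customer_id, city, load_date):
    """Apply one incoming record with Type II history handling."""
    current = dw.execute(
        "SELECT sk, city FROM dim_customer WHERE customer_id = ? AND is_current = 1",
        (customer_id,),
    ).fetchone()
    if current and current[1] == city:
        return  # attribute unchanged: keep the current row as it is
    if current:
        # Close the old version instead of overwriting it.
        dw.execute(
            "UPDATE dim_customer SET valid_to = ?, is_current = 0 WHERE sk = ?",
            (load_date, current[0]),
        )
    # Insert the new version as the current row.
    dw.execute(
        "INSERT INTO dim_customer (customer_id, city, valid_from, valid_to, is_current) "
        "VALUES (?, ?, ?, NULL, 1)",
        (customer_id, city, load_date),
    )


apply_scd2(dw, 1, "Hanoi", "2007-06-01")    # no change, nothing happens
apply_scd2(dw, 1, "Da Nang", "2007-07-01")  # change, history row is kept

history = dw.execute(
    "SELECT customer_id, city, valid_from, valid_to, is_current "
    "FROM dim_customer ORDER BY sk"
).fetchall()
```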

ODI for Master Data Management
Common Data Quality and Middleware Services

Solutions & Applications
• Vertical Driven
• Data Object Centric
• Application Focus

Middleware Foundation
• Process Orchestration
• Business Intelligence
• Registry & Policies
• Data Integration & Quality

Oracle Data Integrator
• Batch & Real-time Integration
• Data Quality & Profiling
• Transformation & Data Routing

[Diagram: Master Data Management stack. Industry Solutions (Telco, Energy, Banking, Retail, Mfr, …); MDM Applications (Customer, Supplier, Employee, Product, Asset, …); Fusion Middleware Foundation; Oracle Data Integrator (E-LT Agent, E-LT Metadata, Golden Master Records); Sources: Oracle EBS, Siebel CRM, PeopleSoft, SAP/R3, Other Sources]

ODI Enhances Oracle BI
Populate Warehouse with High Performance ODI

Oracle Business Intelligence Suite EE:
• Simplified Business Model View
• Advanced Calculation & Integration Engine
• Intelligent Request Generation
• Optimized Data Access

Oracle Data Integrator:
• Populate Enterprise Data Warehouse
• Optimized Performance for Load and Transform
• Extensible Pre-packaged E-LT Content

[Diagram: Oracle BI Suite EE (Interactive Dashboards, Answers, Publisher, Delivers; Oracle BI Presentation Server; Oracle BI Server) on top of the Enterprise Data Warehouse, populated by Bulk E-LT from Oracle Data Integrator (E-LT Agent, E-LT Metadata); Sources: Oracle EBS, Siebel CRM, PeopleSoft, SAP/R3, Other Sources]

ODI Enhances Oracle SOA Suite
Add Bulk Data Transformation to BPEL Process

Oracle SOA Suite:
• BPEL Process Manager for Business Process Orchestration

Oracle Data Integrator:
• Efficient Bulk Data Processing as Part of Business Process
• Interact via Data Services and Transformation Services

[Diagram: Oracle SOA Suite (BPEL Process Manager, Business Activity Monitoring, Web Services Manager, Declarative Rules Engine, Enterprise Service Bus) with Oracle Data Integrator (E-LT Agent, E-LT Metadata) providing Bulk Data Processing]

ODI with BAM
Populate BAM with ETL Data Efficiently

Oracle SOA Suite Business Activity Monitoring
• Business Activity Monitoring for Real-time Business Insight
• Message-based, event-driven, memory-resident architecture

Oracle Data Integrator
• High Performance Loading of BAM’s Active Data Cache
• Pre-built and Integrated via Knowledge Modules
• BAM Java APIs Exposed through “Interface” Like Any Other Target

Sample Combined Use Cases
• Monitor Together Events and the Aggregate Implications of Events

[Diagram: Oracle SOA Suite (Business Activity Monitoring with Event Monitoring, Web Applications, Event Engine, Report Cache, Active Data Cache; BPEL Process Manager; Web Services Manager; Business Rules Engine; Enterprise Service Bus) fed by Oracle Data Integrator (Bulk and Real-Time Data Processing; Agent, Metadata) from Message Queues, Data Warehouse, CDC, PeopleSoft, SAP/R3]

Integration with SOA/BI/Fusion
Resolve All Integration Challenges

[Diagram labels: Oracle BPA and Human Workflow; Oracle BI (Dashboards, Reporting, Analysis, Publishing); BPEL Process Manager; Oracle Data Integrator (Transformation Services, Data Services, E-LT Agent, Metadata Repository, Knowledge Modules); Oracle BAM (Active Data Cache); Invoke arrows between components; WSDL; Generate Data Services; Data Service as Data Source; High speed Batch ELT; High speed JMS ELT; CDC based ELT; endpoints: XML, Oracle, JMS, Oracle BI Enterprise Data Warehouse, CDC]

Performance

ODI vs. ESB

[Figure: ODI vs. ESB comparison; legend: Recommended, Considered, Can use]

Performance Report

Source and Target: 2 dual core CPU, 12GB RAM

ODI with ESB

[Figure: Data Latency (Batch (over 2 hours), Asynchronous, Synchronous (immediate)) plotted against Data Volume (Message by Message Processing, Mini Batches, Large Volume (over 1M)); regions: Oracle Data Integrator, Oracle Enterprise Service Bus, Real-life Scenarios]

Understanding Performance Choices
When you need to transform data at large size

Less than 10MB

                Target: XML    Target: File    Target: DB
Source: XML     ESB            ESB             ESB
Source: File    ESB            ESB             depends
Source: DB      ESB            depends         ODI

Depends on whether an intermediary XML format is useful for other processing (use ESB), or if joining File data to tabular RDB data is required (use ODI).

Between 10-50MB

                Target: XML    Target: File    Target: DB
Source: XML     depends        depends         ODI
Source: File    depends        ODI             ODI
Source: DB      ODI            ODI             ODI

Depends on how much cross-referencing among the data values and rows is required during transformation – the more there is, the faster ODI will perform relative to ESB.

Greater than 50MB

                Target: XML    Target: File    Target: DB
Source: XML     depends        ODI             ODI
Source: File    ODI            ODI             ODI
Source: DB      ODI            ODI             ODI

If the source and target are both XML, and there is no cross-referencing of data among rows, then a streaming-type or parallel-engine-type approach might scale.
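The three size bands above can be encoded as a small lookup, which is handy when the same choice has to be made for many interfaces. This is only a sketch: the band boundaries at exactly 10MB and 50MB are an assumption (the slide does not say which band they belong to), and the names MATRIX, size_band, and recommend are invented for the example.

```python
ESB, ODI, DEPENDS = "ESB", "ODI", "depends"

# Column order of each row below: target is xml, file, db.
TARGETS = ("xml", "file", "db")

# One table per size band; keys of the inner dict are the source type.
MATRIX = {
    "lt_10mb": {
        "xml": (ESB, ESB, ESB),
        "file": (ESB, ESB, DEPENDS),
        "db": (ESB, DEPENDS, ODI),
    },
    "10_50mb": {
        "xml": (DEPENDS, DEPENDS, ODI),
        "file": (DEPENDS, ODI, ODI),
        "db": (ODI, ODI, ODI),
    },
    "gt_50mb": {
        "xml": (DEPENDS, ODI, ODI),
        "file": (ODI, ODI, ODI),
        "db": (ODI, ODI, ODI),
    },
}


def size_band(size_mb):
    """Map a payload size in MB to a band (boundary handling is an assumption)."""
    if size_mb < 10:
        return "lt_10mb"
    if size_mb <= 50:
        return "10_50mb"
    return "gt_50mb"


def recommend(size_mb, source, target):
    """Return 'ESB', 'ODI' or 'depends' for a source/target pair and size."""
    return MATRIX[size_band(size_mb)][source][TARGETS.index(target)]


choice = recommend(20, "file", "db")
```

A "depends" result means the side note for that size band applies and a benchmark is the safest way to decide.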

*caveat – always benchmark if you are unsure and require best possible results

Topology 1 – Oracle to Oracle
Vietnamese Customer PoC

[Diagram: Data Synchronization between Oracle 10.2+/Linux and Oracle 10.2+/Win; components: ODI Designer, Repositories, Agent; hardware labels: Quad Core/4 GB RAM and Dual Core/2 GB RAM]

Performance Results

• 100k rows, 15 fields
  • Load: LKM DBLink 3s
  • Real-time synchronization (JKM DBLink)
    • Update 65k: 13s
    • Delete 30k: 8s
• 1.2m rows, 8 fields (about 120 bytes/row)
  • Load: LKM DBLink 24s, JDBC 4.5 minutes
  • Real-time synchronization (JKM DBLink)
    • Update 5000 rows, 8s
    • Delete 5000 rows, 8s

Real-time Synchronization with CDC
CPU Usage

• Without CDC: CPU 10%, 1s-1.5s
• Enable CDC (LogMiner) and Use AgentScheduler
  • CPU 2%, 1s-1.5s
• Scenario with 1.2m rows
  • Update 3900 rows, CPU 23%, 2s
  • Delete 3900 rows, CPU 21%, 2s

Summary

Oracle Data Integrator

Data Movement and Transformation from Multiple Sources to Heterogeneous Targets

   BENEFIT                  DIFFERENTIATOR
1  Best Performance         Heterogeneous “E-LT”
2  Productivity             Declarative Design
3  Real-time Integration    Declarative CDC
4  Hot-Pluggable            Knowledge Modules
5  Future Proof             The Chosen Integration Technology of Oracle Fusion

Reference Customers

Customer: Overstock.com
Solution: High-Volume Real-Time Data Transformation
Technology: Oracle Data Integrator, Oracle 9i & 10g RAC, Dell Linux, IBM AIX, Teradata 8-node 54000

Oracle Data Integrator Solution:
• Found a way to ensure that Teradata data warehouse was constantly updated.
• Even highly complex transformations are automated within the Enterprise Warehousing
• Supporting several terabytes of data stored in the enterprise warehouse, and millions of daily transactions

“Having access to key business metrics in real-time is no longer a fantasy.”
“Oracle Data Integrator is helping us turn our data into gold”
“Data Integrator allows us to perform data transformations using the power of our Teradata platform. […] With Oracle, over 300 users are now able to have access to their relevant data in real-time, hourly, daily, or weekly depending upon their needs.”
“In short, Oracle Data Integrator give us the ability to make better decisions and better manage our bottom line.”

Business Problem:
• Wanted to enable sales, finance, marketing and merchandising teams to have access to near real-time data so that they could make timely, more intelligent business decisions.
• Wanted to know at any point in time if company performance is meeting the target metrics.
• Needed a data integration product that could handle our high-volume loading and transformation requirements in near real time.

Solution Architecture:
Data Sources, Targets, and Platforms
• Oracle 9i RAC & 10g RAC
• Teradata 8-node 54000
• GoldenGate TDM Transactional Management
• Platforms: IBM AIX, Dell Linux
Data Integration Architecture
• Oracle Data Integrator: 100% Java architecture, high-performance E-LT transformations, business-rules driven transformation design tool, automatic load script generation
• >1.2M SKU’s, > 5M daily transactions, >300 users, deployable for both batch and real-time use cases, leverages power of Teradata engine for improved speed of data transformation

Company: Overstock.com
Product: Oracle Data Integrator
Contact: Miranda Nash
Email: [email protected]

Overstock.com, Inc. (NASDAQ: OSTK) operates as an online retailer offering bed-and-bath goods, furniture, watches, jewelry, electronics, sporting goods, and designer accessories.

Customer: Sabre Holdings
Solution: High-Volume Real-Time Data Transformation
Technology: Oracle Data Integrator, Oracle DB, MQ sources, Teradata Data Warehouse target

Oracle Data Integrator Solution:
• E-LT architecture maximizes performance and leverages existing investment in Teradata infrastructure
• Lower development and maintenance costs for E-LT driven by declarative design tools
• Bottom Line: Integrated travel industry data in consolidated view enables Sabre to better serve their customers and travel suppliers

“We needed a data integration tool that would reduce our dependency on manual coding of E-LT scripts and leverage the power of our Teradata Warehouse for data transformation.”

Business Problem:
• High costs associated with Data Warehouse loading from new sources
• Large Teradata Data Warehouse requires top performance for loading data in near-real time
• Integrated views of data require complex transformations, expensive to maintain

Solution Architecture:
Data Sources, Targets, and Platforms
• Oracle RDBMS
• Teradata Data Warehouse
• Flat Files
• Various other sources over MQ
Data Integration Architecture
• Oracle Data Integrator: 100% Java architecture, high-performance E-LT transformations, business-rules driven transformation design tool, automatic load script generation

Company: Sabre Holdings
Product: Oracle Data Integrator
Contact: Miranda Nash
Email: [email protected]

For more than 40 years, Sabre Holdings (NYSE: TSG) has transformed the airline industry through technological advancement. The Company offers a portfolio of travel marketing, distribution and technology solutions.

Customer: DHL
Solution: High-Volume Real-Time Data Transformation
Technology: Oracle Data Integrator, Oracle RDBMS’s, Teradata Data Warehouse, Cobol Flat Files…

Oracle Data Integrator Solution:
• With Oracle Data Integrator, every batch that used to last one hour now lasts seconds
• Reducing window time is critical to adding more functionality
• Running mini-batches more often results in more customer services and more revenue
• Using the RDBMS as an engine for data transformation simplifies the administrative workload

“Solution completely meets our needs.” […] Oracle Data Integrator was developed by ETL developers, who really know and understand ETL concerns and pains, and how to do things better.”

Business Problem:
• 24/7 business cannot be compromised by long ETL batches (via an ETL Tool)
• Every daily load cannot last more than one hour
• When the volume of data doubles, execution time triples
• Data Integration was the bottleneck in providing more services

Solution Architecture:
Data Sources, Targets, and Platforms
• Oracle RDBMS
• Teradata Data Warehouse
• Flat Files
• Platforms: Linux, Cobol
Data Integration Architecture
• Oracle Data Integrator: 100% Java architecture, high-performance E-LT transformations, business-rules driven transformation design tool, automatic load script generation
• 2.5 terabytes loaded every 15 minutes from 8 major data sources >50 events, >5 shipments and > piece/parcel records per day

Company: DHL
Product: Oracle Data Integrator
Contact: Miranda Nash
Email: [email protected]

For more than 35 years, DHL has built the world's premier global delivery network by trailblazing express shipping in one country after another. Over 220 countries and territories later, DHL is the global market leader of the international express and logistics industry.

Customer: iBasis
Solution: High-Volume Real-Time Data Transformation
Technology: Oracle Data Integrator, Oracle 10g, Netezza PowerCenter NPS8350 Warehouse Appliance

Oracle Data Integrator Solution:

"Given the massive volumes of data we need to process every day, getting timely data in the data warehouse requires high performance loading processes. Using Oracle Data Integrator’s set of Knowledge Modules for Netezza, we are able to take advantage of the massively parallel processing capabilities of Netezza and to reduce load times significantly. […] as our goal is to go more and more toward real-time, it will be easy for us to change the latency of these flows – without having to redevelop them."

“The first thing that struck us was the speed with which we ramped up our ETL developments with Oracle Data Integrator.”

Business Problem:
• Data warehouse had become obsolete and could not respond to the growing requirements of management, sales, and operational centers
• Needed more accurate and timely data
• Replaced entire Data Warehouse infrastructure
• Needed a data integration that would provide the scalability and performance they needed to aggregate, transform, and load their data

Solution Architecture:
Data Sources, Targets, and Platforms
• Oracle RDBMS
• Netezza PowerCenter NPS8350
• Flat Files
• Applications (future): Call Billing, Network Monitoring
Data Integration Architecture
• Oracle Data Integrator: 100% Java architecture, high-performance E-LT transformations, business-rules driven transformation design tool, automatic load script generation
• 4.5TB data warehouse, > 8 billion records, company processes >150 million transactions per day

Company: iBasis
Product: Oracle Data Integrator
Contact: Miranda Nash
Email: [email protected]

Founded in 1996, iBasis (NASDAQ: IBAS) is one of the largest carriers of international voice traffic in the world and a leading provider of prepaid calling services.

Analyst Coverage

Gartner

“Sunopsis (Oracle) has made strides in building market awareness beyond its base in Europe. Sunopsis has a range of capabilities, spanning ETL and real-time messaging, and an architecture that enables distribution of transformation workload across data sources and targets.”

Ted Friedman, Bill Gassman, “Magic Quadrant for Extraction, Transformation and Loading, 1H05”, May 11, 2005

Bloor Research

“While there are many relatively young vendors within the ETL market, Sunopsis has undoubtedly made the biggest impression, both in terms of the users that it has gained and in the way that its approach has influenced the market.”

Philip Howard, “Bullseye Report - Extract, Transform & Load”, March 28, 2006

Gartner

By purchasing Sunopsis, Oracle has acquired a server-independent and platform-independent data integration tool, which will be renamed Oracle Data Integrator (ODI). OFM and Oracle Applications customers will welcome the addition of the ODI's database independence. In particular, the acquisition could provide needed new momentum for Fusion Middleware. Fusion Middleware customers have heterogeneous IT environments, as do former PeopleSoft, and JD Edwards customers, who have an ongoing requirement for integration with non- Oracle systems. The acquisition will provide OFM with a data integration tool that is capable of deploying small-grained data services within a service-oriented architecture (SOA) environment. This capability could have a positive influence on Fusion Middleware - if Oracle leverages the Sunopsis philosophy.

Mark A. Beyer, Ted Friedman “Sunopsis Data Integration May Fuel ” October 23, 2006

Forrester Research

“Oracle has recognized that its customers require diverse data integration features without having to integrate and manage products from many vendors. Integrating Sunopsis’ heterogeneous extract, load, transform (ELT) and event-driven CDC capabilities within its middleware offerings is a great start.”

Rob Karel “Oracle Makes Serious Move In Data Heterogeneity by Acquiring Sunopsis” October 29, 2006
