Data Monolith to Data Mesh: Future of Data Integration Tools and a Focus on Oracle GoldenGate and Stream Processing

Oracle Development September 2020

Copyright © 2020, Oracle and/or its affiliates | Confidential: Internal/Restricted/Highly Restricted

For more than 30 years, Data Integration Architecture:

ETL Hub: ETL Tools…

Data Hub: Vendor DI Tools…

ODS Hub: Kimball EDWs…

Big Data Hub: Data Lakes…

Is “Hub and Spoke” our destiny forever…or could we be on a journey to somewhere else?

The world around us will keep moving faster and faster…

…IT systems and the data that fuel them are going to need to become faster and more agile…the people processes that we follow for DevOps and DataOps must also be faster and more agile

Our old ways of Data Integration are no longer sufficient to meet the future.

Data Fabric | Stream Processing | Data Mesh

The future of Data Integration…is Mesh

…a new generation of Data Mesh capabilities will leave behind the Monolithic Tools of the past to interconnect modern, multi-cloud, data-driven Applications, Analytics, Data Services and Data Products of all types

Evolution towards Real-Time Data Mesh

[Figure: evolution from Monoliths to Mesh in three stages: “Industry 3.0: Hub and Spoke”, “Transitional: Kappa Hub”, and “Mature: Distributed Kappa”. Diagram labels: ETL, mesh & microservice controls]

Lambda: http://nathanmarz.com/blog/how-to-beat-the-cap-theorem.html https://en.wikipedia.org/wiki/Dimensional_modeling Kappa: https://www.oreilly.com/radar/questioning-the-lambda-architecture/

This data pattern, popularized by Ralph Kimball and Bill Inmon, has been the foundation for enterprise data management since 1993. It is transaction consistent, can scale up nicely for most use cases, and is based on SQL, the lingua franca for most tools.

By 2010, the Lambda (big data) pattern was common. In 2014, Jay Kreps (of LinkedIn) questioned the Lambda Architecture and spawned Kappa. Kappa architecture principles consider batch processing as a special case of stream processing: use a historized event log for both real-time and batch processing.

By 2020, IT infrastructure has dramatically changed – networking, containers, cloud, compute, Kubernetes, IoT etc. have all pushed data to the edge and redefined DevOps. The emerging model is a distributed mesh of data, logs, stream data pipelines, change events, and time series data that flows seamlessly across clouds.

Data Mesh Series
https://www.youtube.com/playlist?list=PLbqmhpwYrlZJ-583p3KQGDAd6038i1ywe

Part 1: Best Practices for CDC and Distributed Commit Logs
• Maintaining Transaction Consistency with Replication and Kafka
• How to Handle the Change Stream: Partial, Supplemental CDC, Schema Evolution etc.
• Managing Table to Topic Mappings, Accounting for Partitions etc.
• Deployment Topologies: Mid-tier, End-Point, Topic Caching, Lookups etc.

Part 2: Microservices Data Architecture w/CDC
• Microservices Design Patterns for the Data Tier
• Event Patterns for CDC: Transaction Outbox, CQRS with CDC, Event Sourcing with CDC, Saga with CDC
• Event Driven Processing, with CEP, ESP, Time Series, & GoldenGate Stream Analytics
• Understanding the GoldenGate Microservices Architecture

Part 3: Demo of Application & Data Integration Mesh
• What’s in a Name? Data Fabric, Data Hub, Data Mesh, Service Mesh
• Purpose of a Data Architecture: Operational vs. Analytic Use Cases
• Demonstration Video: Retail / Inventory Analysis
  • Sources: Weather.com, Oracle DB, Retail Cloud, AWS S3, Salesforce
  • Targets: Data Lake (Object Storage and Autonomous Data Warehouse), Data Services (Mobile APIs), Stream Analytics

Part 4: Monolith to Mesh, the Future of DI Tools
• Brief History of (Monolithic) Integration Tools
• Future of Data Integration is Mesh
• Business Value of a Data Mesh (vs. the Monoliths)
• Creating Successful Data Products
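The Kappa principle from the evolution slide (batch processing as a special case of stream processing over a historized event log) also underlies the commit-log topics in this series. A minimal Python sketch follows, assuming a hypothetical in-memory `EventLog` in place of a real distributed commit log; the class, the `process` function and the account data are illustrative only and do not come from the deck.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List


@dataclass(frozen=True)
class Event:
    offset: int   # position in the append-only log
    account: str
    amount: int   # signed change to the account balance


class EventLog:
    """Hypothetical in-memory stand-in for a historized event log."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, account: str, amount: int) -> Event:
        event = Event(len(self._events), account, amount)
        self._events.append(event)
        return event

    def read_from(self, offset: int) -> Iterator[Event]:
        return iter(self._events[offset:])


def process(state: Dict[str, int], log: EventLog, start: int) -> int:
    """Fold events from `start` onward into `state`; return the next offset."""
    position = start
    for event in log.read_from(start):
        state[event.account] = state.get(event.account, 0) + event.amount
        position = event.offset + 1
    return position


log = EventLog()

# "Real-time" path: process each event as soon as it is appended.
live: Dict[str, int] = {}
position = 0
for account, amount in [("a", 100), ("b", 50), ("a", -30)]:
    log.append(account, amount)
    position = process(live, log, position)

# "Batch" path: the same function, replaying the log from offset 0.
rebuilt: Dict[str, int] = {}
process(rebuilt, log, 0)
```

Both paths use one processing function and arrive at the same state; the only difference is the offset at which reading starts.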

Agenda

1 Brief History of Integration Tools

2 Data Mesh as a Next Step

3 GoldenGate Strategy for Data Mesh

4 GoldenGate Event Stream Processing

5 Call to Action

Brief History of Enterprise-class Integration Tools

Historically, integration tools have focused on specific tiers of the software architecture:

[Figure: timeline, 1980 to 2020, of integration tool categories by tier]
• App / Biz Process Integration: Transaction Processing Systems (TPS), B2B, Enterprise Application Integration (EAI), Business Process Management (BPM), Service-Oriented Architecture (SOA), Integration Platform as a Service (iPaaS), Robotic Process Automation (RPA)
• Messaging & Event Systems: Message Queue (MQ), Enterprise Service Bus (ESB), APIs, Complex Event Processing (CEP), Event Stream Processing (ESP), Messaging: Kafka, Pulsar etc.
• Data Integration: Extract Transform Load (ETL), Data Integration (DI), Change Data Capture (CDC) & Data Replication, Data Federation/Virtualization, Data Hubs etc., Data Quality (DQ), Master Data (MDM), Big Data Integration, Stream Integration, Data Governance, Catalog etc.

Data Consistency: focus on committed, reliable and ACID-grade data, typically for both Operational (OLTP) and Analytic (OLAP) workloads…

Traditional Data Integration Core Capabilities

[Figure: four core capabilities]
• Data Federation: Clients, Request-Reply, cache, Query / Views
• Stream Integration: as events happen…
• ETL & ELT: Schedule-driven (E T L, E-L T)
• CDC & Replication: Event-driven


Software Design: Monoliths to Microservices

Note: a monolith with REST APIs is still a monolith, and putting a monolith in a container/K8S doesn’t make it a microservice!

[Figure: spectrum from Monoliths to Mesh. Diagram labels: App / Component, shared frameworks, host, sidecar, Mesh Controller, Event sourcing log]

Classic Monolith
Very coarse grained, many components within single App boundary, layers of shared frameworks, dependencies on one/more schema, single host is often mandatory.

“Minilith” / Client Server
Coarse grained, some components may be independently upgraded, but cross-component dependencies still generally tightly-coupled. Single host is preferred. Dependencies between Apps and shared schema still exist.

Microservice & Mesh
Component isolation. Encapsulated App schema, or use of Event Sourcing instead. All comms via public APIs. Shared schema is an anti-pattern. Many components may be stateless, serverless/ephemeral.


Data Integration Tools Must Also Transition…

[Figure: spectrum from Monoliths to Mesh. Diagram labels: Physical Site / Network, data, Compute + Storage, Compute, Storage]

Data Hub (Classic Monolith)
Batch ETL:
• Ab Initio
• DataStage Grid, PowerCenter (pre-10.x)
• Hadoop (CDH, HDP etc)
Streaming / Realtime:
• IBM Streams, Software AG Streams
• Lambda Big Data Architecture (eg: Apache Hadoop + Apache Storm)

Data Hub (Minilith)
Batch ETL / ELT / Cloud Native:
• PowerCenter (10.x and higher)
• Talend/Stitch, ODI, SAP, SAS etc
• StreamSets DataOps
Streaming / Realtime:
• Qlik/Attunity, IBM IIDR, etc.
• Kappa Big Data Architecture (including: Confluent KSQL and Flink)

Data Mesh (Microservices)
Streaming / Realtime:
• GoldenGate + GoldenGate Stream Analytics
• AWS Kinesis + Lambda + Glue Streaming
• WSO2 Stream Integrator
• Event Sourcing Pattern (bespoke microservices)

Graph of Hubs != Data Mesh

Note: a monolith with REST APIs is still a monolith, and putting a monolith in a container/K8S doesn’t make it a microservice!

[Figure: spectrum from Monoliths to Distributed, showing a graph of pipelines (pipe A, pipe B, pipe C) with ingest, cleanse, prepare and sink stages]

Benefits of a “real” Data Mesh (vs. monolithic DI Hubs)

Monolithic Data Integration Issues:

• High Cost of Operations
  • DevOps: Continuous Integration / Continuous Delivery (CICD) is not possible
  • Integrations usually require a Waterfall approach with many dependencies
• Error-prone Lifecycle Management
  • Data Hub / Hub and Spoke architecture requires tight-coupling and all-or-nothing upgrades, migrations, bug fixes
• Slow Pace of Innovation
  • Batch processing is a “lock in”; a chain of dependent batches is only as fast as the slowest link

Data Mesh Changes Things by:

• Microservices-based Components
  • CICD is possible if the DI software is itself a distributed Mesh of small components
  • Agile methodologies work better when the software is de-coupled
• De-coupling the Infrastructure
  • Patterns for Data Mesh are explicitly event-driven, thereby taking advantage of serverless & K8S frameworks
• Shift to an “Event Time” Data Architecture
  • Timing of data pipelines can be from milliseconds to hours, as a business decision not a technical limitation

Agenda

1 Brief History of Integration Tools

2 Data Mesh as a Next Step

3 GoldenGate Strategy for Data Mesh

4 GoldenGate Event Stream Processing

5 Call to Action

What is a Data Mesh?

Data Mesh is a data-tier architecture to integrate and govern enterprise data assets across distributed multi-cloud environments – three defining characteristics are:

Data Product Oriented:
• Low code management of data services, data models, governance and easy integrations to analytics / data science

De-Centralized Processing:
• De-centralized data processing; no ETL/Hubs/Lake monoliths
• Microservices / Service Mesh deployments, utilization of “sidecar proxy” patterns, encapsulation, etc.
• Simplified continuous integration continuous delivery (CICD) and lifecycle management (LCM) across public/private clouds

Event-Driven, Stream Centric:
• Real-time by default, batch patterns only when necessary
• Immutable event logs for messaging and data store events
• Semantics for consistent (ACID) and inconsistent data sets
https://en.wikipedia.org/wiki/ACID

[Figure: Data Mesh surrounded by related concepts: Data Product Oriented (eg: low code, data domain centric), Microservice Patterns, Log-based Integrations, Event Streaming (eg: distributed commit log in Kafka), Service Mesh “Sidecars” (eg: Kubernetes controller + kubelet), Immutable Logs, Domain Driven Design, Data Replication, Edge / 5G Frameworks, Polyglot Persistence, Polyglot Replication (eg: data consistency guarantees)]

Data Mesh Conceptual View – Data Domains

[Figure: conceptual view of a Data Mesh (distributed Kappa, microservices, cloud agnostic)]
• Enterprise Data Producers: ERP Apps, DBs, Middleware etc.
• IoT Data Producers: Devices & Things (Automated Devices, Edge Nodes (5G), Scheduled Routines, eg: ETL etc.)
• Data Mesh stages: Raw Data, Prepared Data, Canonical Data
• Storage Choices: RDBMS, Data Lake, Object Store, Graph, etc.
• Data Domains (A, B, C): people owners of “Data Products”, collections of data sets in various stages of curation
• Data Consumers: Domain-Specific Views of Data, Data Product-Specific Consumers, Raw Event Consumers

Consumer-Driven, Event-Centric Data Mesh

Data Mesh puts the consumer needs first – they require data at different latency, fidelity, trust levels and views.

[Figure: Enterprise Data Producers, Data Domain and Data Consumers, arranged by Speed & Fidelity, Trusted Views and Ease of Consumption]
• Producer events: User Action, App Object Model, APIs and system log events; System of Record (SoR) Data Model; DB events (LCR), Schema Events (DDL), committed Logical Change Records (LCRs)
• Data Domain: Raw Data, Prepared Data Topics, Canonical “Master” Data Topics; formats include JSON, XML, Avro, Parquet, CSV
• Applications, Data Services, Biz Consumers: canonical data (Data Objects)
• Analytics & Data Marts: prepared data events are pushed (Table Data)
• Data Science & Streaming / Alerts Applications: raw data, time series & alerting events are pushed (Raw Data)
• SQL Consumers, DBAs for HA, DR and OLTP: CDC Replication of LCRs direct to the database (high fidelity transaction semantics fully preserved)

Distributed by Design, Microservices Based

[Figure: distributed, microservices-based Data Mesh running on Edge Compute or Cloud]
• Data Producers: User Action, App Object Model, App Events; System of Record (SoR) Data Model, DB data; committed Logical Change Records (LCRs)
• Data Domain: Events (ephemeral or persisted), Raw Data, Stream Process, Stream Events (persisted), Prepare for Views, Business Views, Technical Views, Canonical Data
• CDC Replication Microservices: LCRs direct to database (high fidelity transaction semantics fully preserved)
• Data Consumers: Applications, Data Services, Biz Consumers (Data Objects); Analytics & Data Marts (Table Data; prepared data events are pushed); Data Science & Streaming / Alerts Applications (Raw Data; raw data, time series & alerting events are pushed); SQL Consumers, DBAs for HA, DR and OLTP

Example 1: Mesh of Data Integration Microservices

single pane of glass…

[Figure: a mesh of small data integration microservices (capture, dist., ingest, filter, join, xform, load, replicat, λ) deployed across Enterprise Applications, Exadata Cloud@Customer, Edge Gateways and Multi-Cloud Analytics]

Example 2: Maintain Data Consistency in Pipelines

[Figure: an OLTP transaction emits change messages A and B into a single Kafka partition]

Data Consumer is responsible to maintain transaction boundaries.

Message A:
{ "customer_id": "1", "first_name": "Debra", "last_name": "Burks", "phone": "", "email": "[email protected]", "SCN": "130", "CSN": "130" }

Message B:
{ "customer_id": "1", "9273 Thome Ave.", "city": "Orchard Park", "state": "NY", "zip_code": "14127", "SCN": "130", "CSN": "130" }

Updates and Deletes both show up in Kafka as new messages; Consumers must interpret the flags correctly.

SCN – System Change Number, is the Oracle DB clock – every time a transaction commits, the clock increments. The SCN marks a consistent point in time in the database.

CSN – Commit Sequence Number, is the GoldenGate clock – GG uses CSN during apply to identify the point in time at which the transaction is committed for maintaining transaction consistency and data integrity. A CSN is available for all Source DB transactions captured via GoldenGate:
https://docs.oracle.com/en/middleware/goldengate/core/19.1/admin/commit-sequence-number.html
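As an illustration of a consumer maintaining transaction boundaries, the following Python sketch groups consecutive messages that share a CSN into one transaction. It assumes messages arrive in commit order from a single partition and that each message carries a "CSN" field as in the example above; the `transactions` function and the third sample message are hypothetical, and no Kafka client is involved.

```python
from itertools import groupby
from typing import Dict, Iterable, Iterator, List

Message = Dict[str, str]


def transactions(messages: Iterable[Message]) -> Iterator[List[Message]]:
    """Group consecutive messages with the same CSN into one transaction.

    Assumes ordered delivery (single partition), so all messages of a
    transaction are adjacent. A group is emitted only once a message with
    a different CSN arrives or the input ends.
    """
    for _csn, group in groupby(messages, key=lambda m: m["CSN"]):
        yield list(group)


stream = [
    # Trimmed versions of messages A and B from the slide.
    {"customer_id": "1", "first_name": "Debra", "last_name": "Burks", "CSN": "130"},
    {"customer_id": "1", "city": "Orchard Park", "state": "NY", "CSN": "130"},
    # Hypothetical later transaction, added only to show a boundary.
    {"customer_id": "2", "state": "CA", "CSN": "131"},
]

txns = list(transactions(stream))
for txn in txns:
    # Apply all changes of a transaction together, never partially.
    print(txn[0]["CSN"], len(txn), "change(s)")
```

A real consumer would also need to interpret operation flags for updates and deletes, which this sketch leaves out.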

Example 3: Continuous Transformation and Loading (CTL)

[Figure: continuous transformation between data producers and data consumers]
• Enterprise Data Producers: ERP Apps, DBs, Middleware etc.
• IoT Data Producers: Devices & Things
• Continuous processing: Filter, Aggregate, Correlate/Enrich, Joins; Queries, Data Patterns, Windowing; Time Series, Spatial Analytics, Anomalies, Thresholds; Data Policies, Business Rules; Classification, Scoring Models
• Data Owners & Data Products: Applications, Data Services, Biz Consumers (Data Objects); Analytics & Data Marts (Table Data); Data Science & Streaming / Alerts Applications (Raw Data); SQL Consumers, DBAs for HA, DR and OLTP

Top “Tactical IT” Data Mesh Opportunities

• Distributed Data Lake Ingestion
• Move From Batch ETL to Streaming CTL (Continuous Transformation and Loading)
• Multi-Cloud (VCN) Operational Replication
• DataOps for Microservices Apps & Analytics: event-based data fabric to connect Apps & Analytics across IT estate

[Figure labels: GG Admin Service, Extracts, Replicats; PaaS Cloud, Mid Tier, Edge; Schedule Driven ETL / E-LT versus Event Driven CTL; AWS East, AWS West]
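The move from schedule-driven batch ETL to event-driven CTL listed above can be illustrated with a small Python sketch: readings are filtered, aggregated in tumbling windows, and emitted as soon as each window closes, rather than on a batch schedule. The `tumbling_average` function, the sensor data and the 60-second window are assumptions for illustration, not GoldenGate behavior; input is assumed to arrive in event-time order.

```python
from typing import Dict, Iterable, Iterator, Tuple

Reading = Tuple[int, str, float]   # (event_time_seconds, sensor_id, value)
Result = Tuple[int, str, float]    # (window_start, sensor_id, average)


def _flush(window_start: int, sums: Dict[str, Tuple[float, int]]) -> Iterator[Result]:
    for sensor, (total, count) in sorted(sums.items()):
        yield window_start, sensor, total / count


def tumbling_average(readings: Iterable[Reading], window_seconds: int) -> Iterator[Result]:
    """Continuously filter and aggregate; emit results when a window closes."""
    current = None
    sums: Dict[str, Tuple[float, int]] = {}
    for ts, sensor, value in readings:
        if value < 0:                      # transform step 1: filter bad readings
            continue
        window = ts - ts % window_seconds  # start of the window containing ts
        if current is not None and window != current:
            yield from _flush(current, sums)   # load step: previous window is done
            sums = {}
        current = window
        total, count = sums.get(sensor, (0.0, 0))
        sums[sensor] = (total + value, count + 1)  # transform step 2: aggregate
    if current is not None:
        yield from _flush(current, sums)


readings = [(0, "s1", 10.0), (30, "s1", 20.0), (45, "s1", -1.0), (61, "s1", 40.0)]
loaded = list(tumbling_average(readings, window_seconds=60))
print(loaded)
```

The window size is an ordinary parameter here, which mirrors the earlier point that pipeline timing becomes a business decision rather than a technical limitation.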

Top Data-Driven Business Transformation Solutions

• Next Best Action / Realtime Offers
• Supply Chain / Smart Inventory Management
• Logistics / Location Intelligence

Agenda

1 Brief History of Integration Tools

2 Data Mesh as a Next Step

3 GoldenGate Strategy for Data Mesh

4 GoldenGate Event Stream Processing

5 Call to Action

GoldenGate Microservices

GoldenGate is itself a set of microservices that human users or other services may interact with.

Embedded User Interface: Native GUIs and Full REST APIs
• C-based microservices with embedded HTTP client for native JavaScript based GUI
• Oracle JET frameworks for intuitive interaction model

REST native APIs:
• Fully REST native
• Also available, a Command Line Interface (CLI) produces REST calls to the native services

Full GoldenGate Replication Capabilities:
• 100% coverage for all traditional GG replication patterns; fully capable of HA/DR use cases

[Figure: Your Services, the GG GUI and Client Libraries reach the GoldenGate services through an API Gateway or Proxy Service: GG Admin Service (Extracts, Replicats), GG Distro Service, GG Metrics Service, GG Receiver Service and GG Service Manager, running on Cloud, Containers or Edge Devices. Data replications: bi-directional, ms/sec updates, consistency guarantees]

Transaction Outbox for the Whole Enterprise

[Figure: replication of real-time data transactions & events between DBMS, DB2/z, Relational, Non-Relational, NoSQL, Big Data, Object Storage, Cloud and Microservices-Based Apps, with GoldenGate Stream Analytics for Streams ETL & ML]

https://www.oracle.com/middleware/technologies/goldengate.html

Copyright © 2019 Oracle and/or its affiliates.

Single Pane of Glass for Real-Time Data Mesh

[Figure: DBAs & Data Engineers build Event Centric Pipelines (connect, Raw Data, Real-Time Stream Processing) from sources such as DB2/z; the pipelines are Data Consumer Driven and serve Data Owners & Data Products: Applications, Data Services, Biz Consumers (Data Objects); Analytics & Data Marts (Table Data); Data Science & Streaming / Alerts Applications (Raw Data); SQL Consumers, DBAs for HA, DR and OLTP]

Deploys in a Mesh Across Containers, Public Clouds and 5G Edge Devices

GoldenGate Direction: Mesh Platform for Fast Data

[Figure: platform layers between Streaming Data (Apps, DBs, IoT, etc) and Data Owners & Data Products]
• Single Pane of Glass for Key Personas: Cloud Admin, Data Ops, Data Engineer, Data Analyst
• Sample Apps, Pre-Built Templates, Accelerators
• Governance: Workspaces, Catalog, Data Verification, Security
• Operations: Management, Monitoring, Metering (cloud), Administration
• Capabilities: Change Capture (CDC), Continuous Transformation & Loading (CTL), Data Replication (source/target), Stream Analytics (ML, Time-Series etc)
• Microservices, Distributed Deployments
• Optional Containers, Service Mesh (K8S Pods)
• Oracle Cloud | 3rd Party Cloud | On-Prem Data Centers | Embedded Edge
• Consumers: Applications, Data Services, Biz Consumers (Data Objects); Analytics & Data Marts (Table Data); Data Science & Streaming / Alerts Applications (Raw Data); SQL Consumers, DBAs for HA, DR and OLTP

Oracle Focus for Data Mesh: Data Products

Consistent Enterprise Data

• ERP Applications: Ledger, HR, Supply Chain
• Financial & Ecommerce Apps
• Inventory and Logistics Data
• Operational Data Availability
• Consistent Data Stores
• Trusted Data Lakes
• Dependable Analytics

A Note about Data Fidelity and Impedance Mismatch

Schema evolution and semantic drift should be part of the design, not an after-thought!

Impedance Mismatch:
As a User interacts with the Application Object Model, data may be serialized as Message Payloads or persisted in Database Tables. Impedance mismatch can lead to lossy semantics → each time data formats change, there is often some loss of meaning at the data level.

Modern DI tools must reliably share the full change stream of the data – consistently!

Data Fidelity:
A SQL view on a Table, or a batch ETL job, only tells a small part of the “data story”… you are only seeing the state of the data at a point-in-time. It is the Commit Log (eg: the change stream) that tells the whole story, a full narrative of the data.

[Figure labels: ER, {json}, Updates over time…, Time Dimension]

Sequence and Order are as Valuable as State
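To make the point about the commit log concrete, here is a minimal Python sketch assuming a hypothetical change log of (sequence, operation, key, value) tuples; the operation names and the order data are illustrative and not a GoldenGate format. The current state alone shows nothing about the order, while replaying the log recovers the state at any point and the full history.

```python
from typing import Dict, List, Optional, Tuple

Change = Tuple[int, str, str, Optional[str]]   # (sequence, op, key, value)

change_log: List[Change] = [
    (1, "insert", "order-1", "NEW"),
    (2, "update", "order-1", "SHIPPED"),
    (3, "update", "order-1", "RETURNED"),
    (4, "delete", "order-1", None),
]


def state_as_of(log: List[Change], sequence: int) -> Dict[str, str]:
    """Replay the ordered change log up to and including `sequence`."""
    state: Dict[str, str] = {}
    for seq, op, key, value in log:
        if seq > sequence:
            break
        if op == "delete":
            state.pop(key, None)
        elif value is not None:
            state[key] = value
    return state


def history(log: List[Change], key: str) -> List[str]:
    """Every value a key has held, in order: the narrative a snapshot hides."""
    return [value for _seq, _op, k, value in log if k == key and value is not None]


current = state_as_of(change_log, 4)     # what a point-in-time query would see
mid_point = state_as_of(change_log, 2)   # state recovered from the log
story = history(change_log, "order-1")
print(current, mid_point, story)
```

A point-in-time view after the last change sees an empty table; only the log shows that the order was shipped and then returned.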

Running GoldenGate in a Service Mesh

[Figure: GoldenGate deployed in Kubernetes within a VCN]
• Pod Management: K8S Master (API Server, Controller, Scheduler, etcd); Worker Node 1 (Kubelet, Kube-proxy)
• GoldenGate Management: Pod 1, Container 1 running Extract and Deliver; SQL*Net / SSL-TLS 1.3; Reverse Proxy (Nginx)
• Block Storage 1 (writable): My deployments home (/u02/deployments/dpora12c, /u02/deployments/dpora19c) and Service Manager (/u02/deployments/ServiceManager)
• Block Storage 2 (read only): Primary GG Installation, Seamless Patching/Upgrade
• Block Storage 3 (read only): CacheMgr, Bounded Recovery

Agenda

1 Brief History of Integration Tools

2 Data Mesh as a Next Step

3 GoldenGate Strategy for Data Mesh

4 GoldenGate Event Stream Processing

5 Call to Action

Oracle GoldenGate Stream Analytics

Ingest Events
GoldenGate events from any DB or NoSQL; also supports events from Kafka, JMS, AQ and MQTT

Select Processing Patterns
Rich set of pre-built patterns will dramatically improve developer efficiency and time-to-value

Build Event Pipelines
Easily leverage geo-fencing, machine-learning, and other reference data within the stream

Serve Data Downstream
Data can be delivered out to Kafka, or easily staged for external ETL jobs

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Oracle OpenWorld 2018

GoldenGate Stream Analytics

• Interactive, Low-Code Designer
• Time Series Analytics
• Event Driven Dashboards
• Patterns / Accelerators
• Geo-Spatial Analysis & Geo-Fencing
• Predictive Analytics / ML

Stream Processing Leadership

• More than 70 patents on stream processing
• Mature tech stack for Event Processing
• Over 10 years of IP investment
• Leader for global initiative on streaming standards

[Figure: release timeline: 11g, 12.2, 12.3, 18c, 19c]

Use Cases for Event Driven, Stream Data Processing

[Figure: GoldenGate Stream Analytics (& CEP) consumes Data & Microservice Events for these use cases: Event/Data Pipelines, Geospatial Actions, Continuous Real-time ETL, Time-Series Analysis, AI/ML]

There has been a widespread awakening to the benefits of Event Driven Architecture (EDA) for increasing the scalability and agility of business systems. […] Stream analytics is based on the mathematics of complex-event processing (CEP). CEP is a computing technique in which incoming data about what is happening (event data) is processed as it arrives (data in motion or recently in motion) to generate higher level, more useful, summary information (complex events).

W. Roy Schulte (of Gartner), March 2020: EDA is Suddenly Popular Will Stream Analytics be Next?
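The CEP definition quoted above can be sketched in a few lines of Python: low-level events are processed as they arrive, and a higher-level "complex event" is emitted when a pattern is seen. The pattern (three declined card events within 60 seconds), the `detect_bursts` function and the sample data are assumptions for illustration; this is not the GoldenGate Stream Analytics API, and events are assumed to arrive in timestamp order.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Iterator, Tuple

Event = Tuple[int, str, str]          # (timestamp_seconds, card_id, event_type)
ComplexEvent = Tuple[int, str, str]   # (timestamp_seconds, card_id, summary)


def detect_bursts(
    events: Iterable[Event],
    event_type: str = "declined",
    count: int = 3,
    within_seconds: int = 60,
) -> Iterator[ComplexEvent]:
    """Emit a complex event when `count` matching events occur within the window."""
    recent: Dict[str, Deque[int]] = defaultdict(deque)
    for ts, key, kind in events:
        if kind != event_type:
            continue
        window = recent[key]
        window.append(ts)
        # Drop timestamps that have fallen out of the sliding window.
        while window and ts - window[0] > within_seconds:
            window.popleft()
        if len(window) >= count:
            yield ts, key, f"{count}x {event_type} within {within_seconds}s"
            window.clear()   # start counting afresh after raising the alert


events = [
    (0, "card-1", "declined"),
    (10, "card-2", "approved"),
    (20, "card-1", "declined"),
    (50, "card-1", "declined"),
    (200, "card-1", "declined"),
]
alerts = list(detect_bursts(events))
print(alerts)
```

Five raw events produce a single summary event, which is the "higher level, more useful" information the quote describes.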

Agenda

1 Brief History of Integration Tools

2 Data Mesh as a Next Step

3 GoldenGate Strategy for Data Mesh

4 GoldenGate Event Stream Processing

5 Call to Action

Customer Success

What Next?

Ask your Oracle Rep for a demo!

Oracle #1 in Data Fabric Strategy: https://blogs.oracle.com/dataintegration/oracle_forresterwave_datafabric_2020?xd_co_f=66bcf41f-e285-4ccc-a5b5-1c790cab0db0

GoldenGate YouTube | Data Mesh: https://www.youtube.com/playlist?list=PLbqmhpwYrlZJ-583p3KQGDAd6038i1ywe

Free Trial of GoldenGate Streaming: https://cloudmarketplace.oracle.com/marketplace/en_US/listing/70961838


Our mission is to help people see data in new ways, discover insights, unlock endless possibilities.