Enterprise NoSQL: Converging Analysis and Operations
Ken Krupa, Enterprise CTO, MarkLogic

© COPYRIGHT 2015 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.

A Brief History: Duality

. Specialization between analysis and operations
. Accelerated by disruptive IT shifts (e.g. Internet, Hadoop)
. Need for greater convergence

Figure labels: Analytical Specialization; Operational Specialization; Analysis / Operations Gap

Timeline:
~1990: Star Schema EDW mainstream
~2000: WWW mainstream, 1st peak
~2010: Big Data mainstream
~2013: NoSQL mainstream


Accelerating the divide?

Our world is changing
Figure: data volume grows from 4.4 ZB (2013) to 44 ZB (2020); 12% Structured, 88% Unstructured. Source: IDC

…and heterogeneous data is a problem
Figure labels: Reference Data; Warehouse; OLTP; Data Marts; Archives; ?


THE DATA WAREHOUSES

Traditional Enterprise (RDBMS) EDW

Definition: “A subject oriented, nonvolatile, integrated, time variant collection of data in support of management's decisions” – Bill Inmon

. Pre-defined schemas
. Complex ETL processes
. Changes depend on SDLC


Star Schema Modeling - Waterfall

1. Choose the business process
2. Declare the grain
3. Identify the dimensions
4. Identify the fact

Figure labels: TIME; COST; stages Identify, Model, Integrate, Discover

Source: http://en.wikipedia.org/wiki/Dimensional_modeling


View from the Enterprise

Figure labels: Reference Data; Warehouse; OLTP; Archives; Data Marts
“Unstructured”: Documents, Messages; Video; Audio; Signals, Metadata; Logs, Streams; Social; Search


WHAT ABOUT HADOOP

Hadoop – What You Get

Advantages
. HDFS provides scale and economies of scale
. File-based nature allows for greater Variety
. Raw data is fine and any shape will do
. Schema-on-read possible
. MapReduce and YARN enable massive parallel scaling

Gaps
. Hadoop was designed for batch processing
. Does not support real-time applications on its own
. Requires expertise to configure, deploy and manage
. Has security limitations
. On its own, it is not a database
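As a rough illustration of the "schema-on-read" point, the sketch below keeps raw records in whatever shape they arrived and applies a schema only when reading. The record contents, field names, and the `read_customer` helper are all hypothetical and are not taken from Hadoop or any specific product.

```python
import json

# Hypothetical raw records of differing shapes, stored exactly as they arrived.
RAW_LINES = [
    '{"id": 1, "name": "Jean Dubois", "city": "Paris"}',
    '{"id": "2", "full_name": "Bob", "tags": ["art"]}',
]


def read_customer(line):
    """Apply a schema at read time: pick and coerce only the fields we need."""
    record = json.loads(line)
    return {
        "id": int(record["id"]),  # coerce string ids to integers
        "name": record.get("name") or record.get("full_name"),
        "city": record.get("city"),  # missing fields become None
    }


customers = [read_customer(line) for line in RAW_LINES]
print(customers)
```

Nothing is rejected at load time; the cost of interpreting each shape moves to the reader.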


RDBMS + Hadoop

. Still a lot of ETL – RDBMS is still in the picture

. Shortcomings in security and governance capabilities with Hadoop

. Reliance on RDBMS for anything operational

. A mismatch between analytical and operational aspects


ENTERPRISE NOSQL

Enterprise NoSQL

Flexible Data Model
Store and manage JSON, XML, RDF, and Geospatial data with a document-centric, schema-agnostic database. Pre-requisite modeling not required

Search and Query
Built-in search to find answers in documents, relationships, and metadata

Scalability and Elasticity
Scale out on commodity hardware, and also scale down

ACID Transactions
MVCC for data consistency and simultaneous read+write

Enterprise-Grade Security
Certified, granular security for modern data governance


Core Benefits of Enterprise NoSQL

. A database more in line with today’s data processing problems and expectations
  – Handle all types of data
  – Minimize (or eliminate) ETL and data copying
  – Scale out on commodity hardware and in the cloud
  – Deliver results more quickly

. A database that offers opportunities for operational convergence
  – Handles mixed workloads (real-time and batch)
  – Does not abandon enterprise capabilities – e.g. transactions and security
  – A database that is built to integrate with the ecosystem (e.g. Hadoop and related)

More Than Just Query…

What if an analyst could talk back to the data warehouse…?

I found Something!


Semantics

Enterprise triple store, document store, and database combined

. Store and query billions of facts and relationships; infer new facts
. Facts and relationships provide context for better search
. Flexible data modeling: integrate and link data from different sources
. Standards-based for ease of use and integration
  – RDF, SPARQL, and standard REST interfaces


Semantics: A New Way to Organize Data

Data is stored in Triples, expressed as:
Subject : Predicate : Object
Jean Dubois : livesIn : Paris
Paris : isIn : France


Query with SPARQL gives us simple lookup ... and more!
Find people who live in (a place that's in) France

Figure (RDF Triples): graph with nodes "Jean Dubois", "Paris", "France" and edges livesIn, isIn, livesIn
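The "people who live in a place that's in France" lookup can be sketched in plain Python as a join over two triple patterns. This is a minimal in-memory illustration, not MarkLogic's API; the `match` and `people_living_in` helpers are hypothetical, and the SPARQL in the comment is an approximate equivalent that assumes a default prefix for the slide's predicate names.

```python
# Hypothetical in-memory triple store holding the two facts from the slide.
TRIPLES = {
    ("Jean Dubois", "livesIn", "Paris"),
    ("Paris", "isIn", "France"),
}


def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def people_living_in(triples, region):
    """Join two patterns: ?person livesIn ?place . ?place isIn <region>.

    Roughly: SELECT ?person WHERE { ?person :livesIn ?place . ?place :isIn :France }
    """
    places = {s for s, _, _ in match(triples, p="isIn", o=region)}
    return sorted(s for s, _, o in match(triples, p="livesIn") if o in places)


print(people_living_in(TRIPLES, "France"))  # ['Jean Dubois']
```

No triple states that Jean Dubois lives in France; the answer falls out of linking the two stored facts.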


Asserting Facts with Semantics

. Assert newly discovered information during analysis
  . “I received an email about the date of birth”
. Decorate with additional items of interest
  . “Bob has an interest in art”
. Assert relationships as they are discovered
  . “Bob knows Alice”

Source: http://www.w3.org/TR/2014/NOTE-rdf11-primer-20140225/


Data Provenance with Semantics

. Data lineage & provenance
. Utilize PROV-O
  . The PROV Ontology
  . W3C Recommendation
  . Expressed with RDF Triples
  . For example…
    prov:wasGeneratedBy
    prov:wasDerivedFrom
    prov:wasAttributedTo
    prov:wasAssociatedWith
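To make the provenance idea concrete, the sketch below records lineage as triples using the four PROV-O predicates listed above and walks `prov:wasDerivedFrom` links back to the sources. The `ex:` resource names and the `lineage` helper are hypothetical; only the `prov:` predicate names come from the ontology.

```python
# Hypothetical resources (ex:...); only the prov: predicates come from PROV-O.
PROV = [
    ("ex:report", "prov:wasDerivedFrom", "ex:cleanedData"),
    ("ex:cleanedData", "prov:wasDerivedFrom", "ex:rawFeed"),
    ("ex:cleanedData", "prov:wasGeneratedBy", "ex:cleaningRun"),
    ("ex:cleaningRun", "prov:wasAssociatedWith", "ex:analyst"),
    ("ex:report", "prov:wasAttributedTo", "ex:analyst"),
]


def lineage(triples, entity):
    """Follow prov:wasDerivedFrom links breadth-first back to the sources."""
    chain, frontier, seen = [], [entity], {entity}
    while frontier:
        current = frontier.pop(0)
        for s, p, o in triples:
            if s == current and p == "prov:wasDerivedFrom" and o not in seen:
                seen.add(o)
                chain.append(o)
                frontier.append(o)
    return chain


print(lineage(PROV, "ex:report"))  # ['ex:cleanedData', 'ex:rawFeed']
```

Because provenance is stored as ordinary triples, it can be queried with the same machinery as the data it describes.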


Benefits of Enterprise NoSQL with Semantics

. Make analysis conversational
  – Using machine-readable standards

. Provide even more modeling flexibility
  – Ad-hoc facts and relationships
  – Richer metadata

. Further enable operational/analytical convergence


HANDLING TIME

Bi-Temporality

. Audits
  – Preserved history
. Regulation and compliance
. Risk Management
  – Financial risk assessment models need to factor in all history

What were my customer’s credit ratings last Monday, as I knew them last Friday?

. A complete history (audit trail) of what you knew and when you knew it
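A bi-temporal question like the one above needs two time axes: when a fact was true (valid time) and when the database learned it (recorded time). The sketch below is a minimal illustration under that assumption; the `Version` class, the sample history, and `rating_as_of` are hypothetical and do not represent any product's API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Version:
    rating: str
    valid_from: date   # when the rating took effect in the real world
    valid_to: date     # exclusive end of validity
    recorded_on: date  # when the database learned this fact


FOREVER = date.max

# Hypothetical history for one customer; corrections are appended, never overwritten.
HISTORY = [
    Version("A", date(2015, 1, 1), FOREVER, date(2015, 1, 1)),
    # Learned on Mon 15 June: the rating had already dropped to "B" on Mon 8 June.
    Version("B", date(2015, 6, 8), FOREVER, date(2015, 6, 15)),
]


def rating_as_of(history, valid_on, known_on):
    """Rating valid on `valid_on`, using only facts recorded by `known_on`."""
    known = [
        v for v in history
        if v.recorded_on <= known_on and v.valid_from <= valid_on < v.valid_to
    ]
    if not known:
        return None
    # The most recently recorded matching version wins.
    return max(known, key=lambda v: v.recorded_on).rating


# Rating for Mon 8 June as known on Fri 12 June, then as known on Tue 16 June.
print(rating_as_of(HISTORY, date(2015, 6, 8), date(2015, 6, 12)))  # A
print(rating_as_of(HISTORY, date(2015, 6, 8), date(2015, 6, 16)))  # B
```

Keeping both versions is what preserves the audit trail: the earlier answer stays reproducible even after the correction arrives.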


Bi-temporality with Enterprise NoSQL

Others | Enterprise NoSQL
Other NoSQL: No transactions means no bi-temporal. | Full ACID transactions with MVCC
RDBMS: Bi-temporal at table level, composed objects make implementation complex. | Bi-temporal at the object/document level. Implementations much more straightforward
RDBMS: Bi-temporal data only. What happens when the schema changes? | For Enterprise NoSQL, schema is data.
RDBMS: Bi-temporal data only. What happens when the security changes? | Security may also be bi-temporally managed.
RDBMS: Inflexible implementations with respect to bi-temporal views and clocks. | Flexible implementations based on customer input and without compromising auditability. Capabilities such as multi-layered bi-temporality and use of external transaction clocks possible.


PUTTING THINGS TOGETHER

Enterprise NoSQL Operational Data Warehouse

Figure labels: Discover & Enrich; RT Events or Batch Load; RDF

. Schema-agnostic
. Straightforward data integration
. Full-text indexing and search
. Scale-out infrastructure
. Real-time or batch load and analysis
. Write back during discovery!


Getting Noticed


BEYOND THE DATA WAREHOUSE

If only the EDW was the only problem…

SOA


Application Centric Architecture

Characterized by:
– Applications that own “their own” data
– Small pockets of “authoritative” sources (e.g. reference data, CRM)
– Data exchange between systems for cross-LoB operations

Resulting in:
– Multiple copies of the same data (even from authoritative sources)
– Diminishing with each copy

But that’s not all…
– Accelerated by SOA (an otherwise good thing)


Operational Data Hub

. A read & write Database supports more than analysis

. Enables a Data Centric Architecture for the Enterprise

. Brings all of the data-centric goodness beyond the Data Warehouse space
  – React immediately to important events – e.g. alerts
  – Create workflow based on analysis
  – Make SOA better, redeem broken implementations


Operational Data Hub

. Read+write real-time DW
. Data-centric enterprise analysis of all data
. Unified distribution
. Direct external feedback
. Makes use of Hadoop investment
. Semantics plays a key role

Figure labels: Bidirectional; Multi-channel distribution; Operational Feedback; Customers; Applications; Warm archives


Convergence

. Platform for mixed workloads
  – Simultaneous read & write during discovery
  – Analytical and Operational functions within the same DB
  – React immediately to important events – e.g. alerts
  – Create workflow based on analysis

. A Data Centric Architecture for the Enterprise
  – Bring applications to the data

. Bring the flexibility of “Big Data” beyond the Data Warehouse space
  – “Three V’s” for running the business


Thank You

[email protected]
marklogic.com
@kenkrupa
world.marklogic.com