Enterprise NoSQL
Converging Analysis and Operations
Ken Krupa, Enterprise CTO, MarkLogic
© COPYRIGHT 2015 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.

A Brief History: Database Duality
. Specialization between analysis and operations
. Accelerated by disruptive IT shifts (e.g. Internet, Hadoop)
. Need for greater convergence
[Figure: timeline of a widening Analysis / Operations gap between Analytical Specialization and Operational Specialization. ~1990: Star Schema mainstream; ~2000: WWW mainstream, 1st EDW peak; ~2010: Big Data mainstream; ~2013: NoSQL mainstream.]
Accelerating the divide?
Our world is changing… and heterogeneous data is a problem
[Figure: data volume growing from 4.4 ZB in 2013 to 44 ZB in 2020; 12% structured, 88% unstructured. Enterprise stores shown: Reference Data, Warehouse, OLTP, Data Marts, Archives.]
Source: IDC
THE DATA WAREHOUSES

Traditional Enterprise Data Warehouse (RDBMS)
EDW Definition: “A subject oriented, nonvolatile, integrated, time variant collection of data in support of management's decisions” – Bill Inmon
. Pre-defined schemas
. Complex ETL processes
. Changes depend on SDLC
Star Schema Modeling - Waterfall
1. Choose the business process
2. Declare the grain
3. Identify the dimensions
4. Identify the fact
[Figure: waterfall of stages (Identify, Model, Integrate, Discover) plotted against TIME and COST.]
Source: http://en.wikipedia.org/wiki/Dimensional_modeling
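To make the four modeling steps concrete, here is a minimal sketch using Python's built-in sqlite3 module. The business process (retail sales), the table names, the columns, and the sample rows are all assumptions chosen for illustration; they do not come from the slides.

```python
import sqlite3

# Step 1 (assumed business process): retail sales.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Step 3: dimensions give descriptive context to each fact row.
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, day TEXT);
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    -- Step 4: the fact table. Step 2 (grain): one row per product per day.
    CREATE TABLE fact_sales (
        date_id INTEGER REFERENCES dim_date(date_id),
        product_id INTEGER REFERENCES dim_product(product_id),
        amount REAL
    );
    INSERT INTO dim_date VALUES (1, '2015-06-01'), (2, '2015-06-02');
    INSERT INTO dim_product VALUES (10, 'widget'), (11, 'gadget');
    INSERT INTO fact_sales VALUES (1, 10, 5.0), (1, 11, 7.5), (2, 10, 2.5);
""")

# A typical star-schema query: join the fact to a dimension and aggregate.
rows = conn.execute("""
    SELECT p.name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()
print(rows)  # [('gadget', 7.5), ('widget', 7.5)]
```

Every table and column has to be declared before any data is loaded, which is why schema changes in this style tend to go through the full modeling cycle.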
View from the Enterprise
[Figure: enterprise data landscape. Enterprise stores: Reference Data, Warehouse, OLTP, Archives, Data Marts. “Unstructured” sources: Documents, Messages, Video, Audio, Signals, Metadata, Logs, Streams, Social, Search.]
WHAT ABOUT HADOOP

Hadoop – What You Get
Advantages
. HDFS provides scale and economies of scale
. File-based nature allows for greater Variety
. Raw data is fine and any shape will do
. Schema-on-read possible
. Map-reduce and YARN enable massive parallel scaling
Gaps
. Hadoop was designed for batch processing
. Does not support real-time applications on its own
. Requires expertise to configure, deploy and manage
. Has security limitations
. On its own, it is not a database
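As a rough illustration of "any shape will do" and schema-on-read, the sketch below reads raw JSON lines of differing shapes and applies a schema only at read time. It is plain Python rather than Hadoop code, and the field names and sample records are assumptions made up for the example.

```python
import json

# Raw records land as-is; no two need share the same shape.
RAW_LINES = [
    '{"user": "ann", "amount": "12.50", "channel": "web"}',
    '{"user": "bob", "amount": 3}',
    '{"customer": "cy", "amount": "n/a", "notes": ["call back"]}',
]

def read_with_schema(lines):
    """Apply a schema while reading: a user name and a numeric amount."""
    for line in lines:
        record = json.loads(line)
        # Tolerate either field name for the same concept.
        user = record.get("user") or record.get("customer")
        try:
            amount = float(record.get("amount"))
        except (TypeError, ValueError):
            amount = None  # missing or unparseable values stay visible as None
        yield {"user": user, "amount": amount}

rows = list(read_with_schema(RAW_LINES))
print(rows)
```

The interpretation lives in the reader, so a new question about the same raw data means a new reader function, not a reload.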
RDBMS + Hadoop
. Still a lot of ETL – RDBMS is still in the picture
. Shortcomings in security and governance capabilities with Hadoop
. Reliance on RDBMS for anything operational
. A mismatch between analytical and operational aspects
ENTERPRISE NOSQL

Enterprise NoSQL
. Flexible Data Model
– Store and manage JSON, XML, RDF, and Geospatial data with a document-centric, schema-agnostic database. Pre-requisite modeling not required
. Search and Query
– Built-in search to find answers in documents, relationships, and metadata
. Scalability and Elasticity
– Scale out on commodity hardware, and also scale down
. ACID Transactions
– MVCC for data consistency and simultaneous read+write
. Enterprise-Grade Security
– Certified, granular security for modern data governance
Core Benefits of Enterprise NoSQL
. A database more in line with today’s data processing problems and expectations
– Handle all types of data
– Minimize (or eliminate) ETL and data copying
– Scale out on commodity hardware and in the cloud
– Deliver results more quickly
. A database that offers opportunities for operational convergence
– Handles mixed workloads (real-time and batch)
– Does not abandon enterprise capabilities, e.g. transactions and security
. A database that is built to integrate with the Big Data ecosystem (e.g. Hadoop and related)
More Than Just Query…
What if an analyst could talk back to the data warehouse…?
“I found something!”
Semantics
Enterprise triple store, document store, and database combined
. Store and query billions of facts and relationships; infer new facts
. Facts and relationships provide context for better search
. Flexible data modeling: integrate and link data from different sources
. Standards-based for ease of use and integration
– RDF, SPARQL, and standard REST interfaces
Semantics: A New Way to Organize Data
Data is stored in RDF triples, expressed as:
Subject : Predicate : Object
Jean Dubois : livesIn : Paris
Paris : isIn : France

Query with SPARQL gives us simple lookup... and more!
Find people who live in (a place that's in) France
[Figure: graph with nodes “Jean Dubois”, “Paris”, “France”; edges livesIn, isIn, and a second livesIn.]
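The two-hop lookup on this slide can be sketched in plain Python over the same two triples. This is only an illustration of the pattern-matching idea, not how a triple store or a SPARQL engine is implemented; the function names are made up for the example, and the SPARQL text in the comment assumes a hypothetical `:` prefix.

```python
# The triples from the slide, as (subject, predicate, object) tuples.
TRIPLES = {
    ("Jean Dubois", "livesIn", "Paris"),
    ("Paris", "isIn", "France"),
}

def match(triples, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]

def people_living_in(triples, region):
    """Roughly the SPARQL pattern (with an assumed ':' prefix):
    SELECT ?person WHERE { ?person :livesIn ?place . ?place :isIn :France }
    """
    places = {s for s, _, _ in match(triples, p="isIn", o=region)}
    return sorted(s for s, _, o in match(triples, p="livesIn") if o in places)

print(people_living_in(TRIPLES, "France"))  # ['Jean Dubois']
```

Nothing in the data says that Jean Dubois lives in France; the answer comes from joining two facts on the shared node Paris.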
Asserting Facts with Semantics
. Assert newly discovered information during analysis
– “I received an email about the date of birth”
. Decorate with additional items of interest
– “Bob has an interest in art”
. Assert relationships as they are discovered
– “Bob knows Alice”
Source: http://www.w3.org/TR/2014/NOTE-rdf11-primer-20140225/
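A small sketch of what "asserting" looks like in practice: facts are added one at a time as they are discovered, with no schema change in between. The `FactStore` class, the predicate names (`bornOn`, `interestedIn`, `knows`), and the date value are assumptions for illustration only.

```python
from collections import defaultdict

class FactStore:
    """A toy in-memory store of (subject, predicate, object) facts."""

    def __init__(self):
        self._by_subject = defaultdict(set)

    def assert_fact(self, subject, predicate, obj):
        # New kinds of facts need no upfront modeling; they are just added.
        self._by_subject[subject].add((predicate, obj))

    def facts_about(self, subject):
        return sorted(self._by_subject.get(subject, set()))

store = FactStore()
store.assert_fact("Bob", "bornOn", "1990-07-04")   # learned from an email (illustrative value)
store.assert_fact("Bob", "interestedIn", "art")    # decoration added during analysis
store.assert_fact("Bob", "knows", "Alice")         # relationship discovered later

print(store.facts_about("Bob"))
# [('bornOn', '1990-07-04'), ('interestedIn', 'art'), ('knows', 'Alice')]
```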
Data Provenance with Semantics
. Data lineage & provenance
. Utilize PROV-O
– The PROV Ontology
– W3C Recommendation
– Expressed with RDF Triples
. For example…
– prov:wasGeneratedBy
– prov:wasDerivedFrom
– prov:wasAttributedTo
– prov:wasAssociatedWith
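Because PROV-O statements are ordinary triples, lineage becomes a graph walk. The sketch below uses the four properties listed above; the entity, activity, and agent identifiers (`report:q3`, `dataset:trades-raw`, and so on) are hypothetical, and the traversal is plain Python rather than a real RDF library.

```python
# Provenance expressed as (subject, predicate, object) triples.
PROV = [
    ("report:q3", "prov:wasDerivedFrom", "dataset:trades-clean"),
    ("dataset:trades-clean", "prov:wasDerivedFrom", "dataset:trades-raw"),
    ("dataset:trades-clean", "prov:wasGeneratedBy", "activity:cleanse"),
    ("activity:cleanse", "prov:wasAssociatedWith", "agent:etl-service"),
    ("report:q3", "prov:wasAttributedTo", "agent:analyst"),
]

def lineage(triples, entity):
    """Follow prov:wasDerivedFrom links transitively, nearest source first."""
    seen, order, frontier = set(), [], [entity]
    while frontier:
        current = frontier.pop(0)
        for s, p, o in triples:
            if s == current and p == "prov:wasDerivedFrom" and o not in seen:
                seen.add(o)
                order.append(o)
                frontier.append(o)
    return order

print(lineage(PROV, "report:q3"))
# ['dataset:trades-clean', 'dataset:trades-raw']
```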
Benefits of Enterprise NoSQL with Semantics
. Make analysis conversational
– Using machine-readable standards

. Provide even more modeling flexibility
– Ad-hoc facts and relationships
– Richer metadata
. Further enable operational/analytical convergence
HANDLING TIME

Bi-Temporality
. Audits
– Preserved history
. Regulation and compliance
. Risk Management
– Financial risk assessment models need to factor in all history

What were my customer’s credit ratings last Monday as I knew them last Friday?

. A complete history (audit trail) of what you knew and when you knew it
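The question above has two time axes: when a rating applies (valid time) and when the database learned it (system time). A minimal sketch, assuming half-open date ranges and made-up customer, ratings, and dates; it shows the idea only and is not any product's bi-temporal API.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

FOREVER = date.max  # open-ended upper bound

@dataclass(frozen=True)
class RatingVersion:
    customer: str
    rating: str
    valid_from: date     # when the rating applies in the real world
    valid_to: date       # exclusive
    recorded_from: date  # when the database learned this version
    recorded_to: date    # exclusive; FOREVER while still the current belief

HISTORY: List[RatingVersion] = [
    # Belief recorded on Mon 2015-06-01: rating A, open-ended.
    RatingVersion("acme", "A", date(2015, 1, 1), FOREVER,
                  date(2015, 6, 1), date(2015, 6, 9)),
    # Correction recorded on Tue 2015-06-09: downgraded as of Mon 2015-06-08.
    RatingVersion("acme", "A", date(2015, 1, 1), date(2015, 6, 8),
                  date(2015, 6, 9), FOREVER),
    RatingVersion("acme", "BBB", date(2015, 6, 8), FOREVER,
                  date(2015, 6, 9), FOREVER),
]

def rating_as_of(history, customer, valid_on, known_on) -> Optional[str]:
    """Rating that applied on valid_on, according to what was known on known_on."""
    for v in history:
        if (v.customer == customer
                and v.valid_from <= valid_on < v.valid_to
                and v.recorded_from <= known_on < v.recorded_to):
            return v.rating
    return None

monday, friday_before = date(2015, 6, 8), date(2015, 6, 5)
print(rating_as_of(HISTORY, "acme", monday, friday_before))      # A
print(rating_as_of(HISTORY, "acme", monday, date(2015, 6, 10)))  # BBB
```

Superseded versions are closed off rather than deleted, which is what preserves the audit trail of what was known and when.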
Bi-temporality with Enterprise NoSQL
Others vs. Enterprise NoSQL
. Other NoSQL: No transactions means no bi-temporality.
– Enterprise NoSQL: Full ACID transactions with MVCC.
. RDBMS: Bi-temporal at table level; composed objects make implementation complex.
– Enterprise NoSQL: Bi-temporal at the object/document level. Implementations much more straightforward.
. RDBMS: Bi-temporal data only. What happens when the schema changes?
– Enterprise NoSQL: Schema is data.
. RDBMS: Bi-temporal data only. What happens when the security changes?
– Enterprise NoSQL: Security may also be bi-temporally managed.
. RDBMS: Inflexible implementations with respect to bi-temporal views and clocks.
– Enterprise NoSQL: Flexible implementations based on customer input and without compromising auditability. Capabilities such as multi-layered bi-temporality and use of external transaction clocks possible.
PUTTING THINGS TOGETHER

Enterprise NoSQL Operational Data Warehouse
[Figure: operational data warehouse with labels “RT Events or Batch Load”, “Discover & Enrich”, and “RDF”.]
. Schema-agnostic
. Straightforward data integration
. Full-text indexing and search
. Scale-out infrastructure
. Real-time or batch load and analysis
. Write back during discovery!
Getting Noticed
BEYOND THE DATA WAREHOUSE

If only the EDW was the only problem…
SOA
Application Centric Architecture
Characterized by:
– Applications that own “their own” data
– Small pockets of “authoritative” sources (e.g. reference data, CRM)
– Data exchange between systems for cross-LoB operations
Resulting in:
– Multiple copies of the same data (even from authoritative sources)
– Diminishing data quality with each copy
But that’s not all…
– Accelerated by SOA (an otherwise good thing)
Operational Data Hub
. A read & write Database supports more than analysis
. Enables a Data Centric Architecture for the Enterprise
. Brings all of the data-centric goodness beyond the Data Warehouse space
– React immediately to important events, e.g. alerts
– Create workflow based on analysis
– Make SOA better, redeem broken implementations
Operational Data Hub
. Read+write real-time DW
. Analysis of all data
. Unified distribution
. Direct external feedback
. Makes use of Hadoop investment
. Semantics plays a key role
[Figure: data-centric enterprise hub with labels “Bidirectional”, “Multi-channel distribution”, “Operational Feedback”, “Customers”, “Applications”, and “Warm archives”.]
Convergence
. Platform for mixed workloads
– Simultaneous read & write during discovery
– Analytical and Operational functions within the same DB
– React immediately to important events, e.g. alerts
– Create workflow based on analysis
. A Data Centric Architecture for the Enterprise
– Bring applications to the data
. Bring the flexibility of “Big Data” beyond the Data Warehouse space
– “Three V’s” for running the business
Thank You
marklogic.com
@kenkrupa
world.marklogic.com