Introducing Enterprise NoSQL

Enterprise NoSQL: Converging Analysis and Operations
Ken Krupa, Enterprise CTO, MarkLogic
© COPYRIGHT 2015 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.

A Brief History: Database Duality
- Specialization between analysis and operations
- Accelerated by disruptive IT shifts (e.g. Internet, Hadoop)
- Need for greater convergence
Diagram: analytical specialization and operational specialization diverge over time, opening an analysis/operations gap. Timeline: star schema mainstream (~1990), WWW mainstream (~2000), first EDW peak, Big Data mainstream (~2010), NoSQL mainstream (~2013).

Accelerating the Divide?
Our world is changing... and heterogeneous data is a problem.
- 4.4 ZB in 2013, 44 ZB by 2020 (Source: IDC)
- 12% structured, 88% unstructured
Diagram: reference data, data warehouse, OLTP, data marts, archives, and "?".

THE DATA WAREHOUSES

Traditional Enterprise Data Warehouse (RDBMS)
EDW definition: "A subject oriented, nonvolatile, integrated, time variant collection of data in support of management's decisions" (Bill Inmon)
- Pre-defined schemas
- Complex ETL processes
- Changes depend on the SDLC

Star Schema Modeling: Waterfall
1. Choose the business process
2. Declare the grain
3. Identify the dimensions
4. Identify the facts
Diagram: the phases Discover, Identify, Model, and Integrate laid out as a waterfall against time and cost.
Source: http://en.wikipedia.org/wiki/Dimensional_modeling

View from the Enterprise
Diagram: "unstructured" data (documents, messages, { }, video, audio, signals, metadata, logs, streams, "social") alongside reference data, warehouse, OLTP, search, archives, and data marts.

WHAT ABOUT HADOOP

Hadoop: What You Get
Advantages:
- HDFS provides scale and economies of scale
- File-based nature allows for greater variety
- Raw data is fine and any shape will do
- Schema-on-read possible
- MapReduce and YARN enable massive parallel scaling
Gaps:
- Hadoop was designed for batch processing
- Does not support real-time applications on its own
- Requires expertise to configure, deploy and manage
- Has security limitations
- On its own, it is not a database

RDBMS + Hadoop
- Still a lot of ETL: the RDBMS is still in the picture
- Shortcomings in security and governance capabilities with Hadoop
- Reliance on the RDBMS for anything operational
- A mismatch between analytical and operational aspects

ENTERPRISE NOSQL

Enterprise NoSQL
- Flexible Data Model: store and manage JSON, XML, RDF, and geospatial data with a document-centric, schema-agnostic database. No prerequisite modeling required.
- Search and Query: built-in search to find answers in documents, relationships, and metadata
- Scalability and Elasticity: scale out on commodity hardware, and also scale down
- ACID Transactions: MVCC for data consistency and simultaneous read+write
- Enterprise-Grade Security: certified, granular security for modern data governance

Core Benefits of Enterprise NoSQL
- A database more in line with today's data processing problems and expectations
  - Handles all types of data
  - Minimizes (or eliminates) ETL and data copying
  - Scales out on commodity hardware and in the cloud
  - Delivers results more quickly
- A database that offers opportunities for operational convergence
  - Handles mixed workloads (real-time and batch)
  - Does not abandon enterprise capabilities, e.g. transactions and security
  - Is built to integrate with the Big Data ecosystem (e.g. Hadoop and related technologies)

More Than Just Query...
What if an analyst could talk back to the data warehouse? "I found something!"
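The "Flexible Data Model" and "Search and Query" points above can be illustrated with a minimal sketch. It assumes a toy in-memory store in which documents of different shapes are loaded with no schema declared up front. Every (path, value) pair is then indexed, so the data can be queried by structure immediately. `ToyDocumentStore` and `flatten` are hypothetical names, and this is not MarkLogic's API or indexing design. The documents are invented for illustration.

```python
from collections import defaultdict
from typing import Any, Iterator


def flatten(doc: Any, prefix: str = "") -> Iterator[tuple[str, Any]]:
    """Yield a (path, scalar) pair for every leaf of a nested JSON-like value."""
    if isinstance(doc, dict):
        for key, value in doc.items():
            yield from flatten(value, f"{prefix}/{key}")
    elif isinstance(doc, list):
        for item in doc:
            yield from flatten(item, prefix)  # array items share their parent's path
    else:
        yield prefix, doc


class ToyDocumentStore:
    """Hypothetical in-memory store: no schema is declared before loading data."""

    def __init__(self) -> None:
        self.docs: dict[str, Any] = {}
        # Inverted index from every (path, value) pair to the URIs that contain it.
        self.index: dict[tuple[str, Any], set[str]] = defaultdict(set)

    def insert(self, uri: str, doc: Any) -> None:
        if uri in self.docs:  # replacing a document: drop its stale index entries
            for key in set(flatten(self.docs[uri])):
                self.index[key].discard(uri)
        self.docs[uri] = doc
        for key in flatten(doc):
            self.index[key].add(uri)

    def find(self, path: str, value: Any) -> list[str]:
        """Return the URIs of documents whose `path` holds `value`."""
        return sorted(self.index.get((path, value), set()))


# Two differently shaped documents, loaded with no upfront modeling.
store = ToyDocumentStore()
store.insert("/trades/1.json", {"trade": {"id": 1, "counterparty": "ACME", "amount": 5000000}})
store.insert("/emails/7.json", {"email": {"sender": "Bob", "mentions": ["ACME", "Initech"]}})

print(store.find("/trade/counterparty", "ACME"))  # ['/trades/1.json']
print(store.find("/email/mentions", "ACME"))      # ['/emails/7.json']
```

The design choice is to index structure and content together. A new document shape becomes searchable as soon as it is inserted, without a schema migration or an ETL step.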
Semantics
Enterprise triple store, document store, and database combined
- Store and query billions of facts and relationships; infer new facts
- Facts and relationships provide context for better search
- Flexible data modeling: integrate and link data from different sources
- Standards-based for ease of use and integration: RDF, SPARQL, and standard REST interfaces

Semantics: A New Way to Organize Data
Data is stored in RDF triples, expressed as Subject : Predicate : Object
  Jean Dubois : livesIn : Paris
  Paris : isIn : France
Querying with SPARQL gives us simple lookup... and more! Find people who live in (a place that's in) France.
Diagram: "Jean Dubois" -livesIn-> "Paris" -isIn-> "France", plus an inferred livesIn link from "Jean Dubois" to "France".

Asserting Facts with Semantics
- Assert newly discovered information during analysis
  - "I received an email about the date of birth"
- Decorate with additional items of interest
  - "Bob has an interest in art"
- Assert relationships as they are discovered
  - "Bob knows Alice"
Source: http://www.w3.org/TR/2014/NOTE-rdf11-primer-20140225/

Data Provenance with Semantics
- Data lineage and provenance
- Utilize PROV-O, the PROV Ontology
  - A W3C Recommendation
  - Expressed with RDF triples
  - For example: prov:wasGeneratedBy, prov:wasDerivedFrom, prov:wasAttributedTo, prov:wasAssociatedWith

Benefits of Enterprise NoSQL with Semantics
- Make analysis conversational
  - Using machine-readable standards
- Provide even more modeling flexibility
  - Ad-hoc facts and relationships
  - Richer metadata
- Further enable operational/analytical convergence

HANDLING TIME

Bi-Temporality
- Audits: preserved history
- Regulation and compliance
- Risk management: financial risk assessment models need to factor in all history
  - What were my customer's credit ratings last Monday, as I knew them last Friday?
- A complete history (audit trail) of what you knew and when you knew it

Bi-temporality with Enterprise NoSQL (Others vs. Enterprise NoSQL)
- Other NoSQL: no transactions means no bi-temporality.
  Enterprise NoSQL: full ACID transactions with MVCC.
- RDBMS: bi-temporal at the table level; composed objects make implementation complex.
  Enterprise NoSQL: bi-temporal at the object/document level; implementations are much more straightforward.
- RDBMS: bi-temporal data only. What happens when the schema changes?
  Enterprise NoSQL: schema is data.
- RDBMS: bi-temporal data only. What happens when the security changes?
  Enterprise NoSQL: security may also be bi-temporally managed.
- RDBMS: inflexible implementations with respect to bi-temporal views and clocks.
  Enterprise NoSQL: flexible implementations based on customer input, without compromising auditability. Capabilities such as multi-layered bi-temporality and use of external transaction clocks are possible.

PUTTING THINGS TOGETHER

Enterprise NoSQL Operational Data Warehouse
Diagram labels: Discover & Enrich, RT Events or Batch Load, RDF.
- Schema-agnostic
- Straightforward data integration
- Full-text indexing and search
- Scale-out infrastructure
- Real-time or batch load and analysis
- Write back during discovery!
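The Jean Dubois example from the semantics slides ("find people who live in (a place that's in) France") can be sketched with plain Python tuples. The sketch assumes a toy in-memory triple set and a single hand-written inference rule: livesIn propagates along isIn. The helpers `match`, `infer_lives_in`, and `people_living_in` are hypothetical and are not a triple-store API. The SPARQL 1.1 property-path query in the comment is a roughly equivalent standard formulation.

```python
Triple = tuple[str, str, str]

# The two facts from the slide: Jean Dubois : livesIn : Paris, Paris : isIn : France.
facts: set[Triple] = {
    ("Jean Dubois", "livesIn", "Paris"),
    ("Paris", "isIn", "France"),
}


def match(store: set[Triple], s=None, p=None, o=None) -> list[Triple]:
    """Triple-pattern match; a None position acts as a variable (wildcard)."""
    return [
        t for t in store
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    ]


def infer_lives_in(store: set[Triple]) -> set[Triple]:
    """Hypothetical rule, applied until no new facts appear (forward chaining):
    (?x livesIn ?y) and (?y isIn ?z)  =>  (?x livesIn ?z)"""
    inferred = set(store)
    while True:
        new = {
            (x, "livesIn", z)
            for (x, _, y) in match(inferred, p="livesIn")
            for (_, _, z) in match(inferred, s=y, p="isIn")
        } - inferred
        if not new:
            return inferred
        inferred |= new


def people_living_in(store: set[Triple], place: str) -> list[str]:
    # Roughly the SPARQL 1.1 query (prefix ":" assumed), for place = France:
    #   SELECT ?person WHERE { ?person :livesIn/:isIn* :France }
    return sorted(s for (s, _, _) in match(infer_lives_in(store), p="livesIn", o=place))


print(people_living_in(facts, "Paris"))   # ['Jean Dubois']  (simple lookup)
print(people_living_in(facts, "France"))  # ['Jean Dubois']  (needs the inferred fact)
```

The rule loops until nothing new is derived, so longer chains also resolve. For example, a city that is in a region that is in a country would still reach the country.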
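The Bi-Temporality question ("What were my customer's credit ratings last Monday, as I knew them last Friday?") becomes concrete in a minimal sketch. It assumes a simple model in which every version of a record carries two intervals. The valid-time interval records when the rating applied in the real world. The system-time interval records when the database held that belief. `Version`, `correct`, and `as_of` are hypothetical names, and the customer, ratings, and dates are invented. MarkLogic's own temporal API is not shown.

```python
from dataclasses import dataclass, replace
from datetime import date

FOREVER = date.max  # open-ended interval end


@dataclass(frozen=True)
class Version:
    customer: str
    rating: str
    valid_start: date             # valid time: when the rating applied in the real world
    valid_end: date               # exclusive
    system_start: date            # system time: when the database recorded this belief
    system_end: date = FOREVER    # exclusive; closed (never deleted) when superseded


def correct(history, customer, new_rating, effective, recorded):
    """Record at system time `recorded` that `customer` is rated `new_rating`
    from valid time `effective` onward (simplification: open-ended)."""
    updated = []
    for v in history:
        current = v.customer == customer and v.system_end == FOREVER and v.valid_end > effective
        if not current:
            updated.append(v)
            continue
        updated.append(replace(v, system_end=recorded))  # keep the old belief for audit
        if v.valid_start < effective:  # the part before `effective` is still believed
            updated.append(replace(v, valid_end=effective, system_start=recorded))
    updated.append(Version(customer, new_rating, effective, FOREVER, recorded))
    return updated


def as_of(history, customer, valid_at, known_at):
    """The rating that applied at `valid_at`, as the database knew it at `known_at`."""
    for v in history:
        if (v.customer == customer
                and v.valid_start <= valid_at < v.valid_end
                and v.system_start <= known_at < v.system_end):
            return v.rating
    return None


# Recorded Sun 2015-03-01: ACME is rated A from 2015-01-01 onward.
history = [Version("ACME", "A", date(2015, 1, 1), FOREVER, date(2015, 3, 1))]
# Recorded Tue 2015-03-10: ACME was actually downgraded to BBB effective Mon 2015-03-02.
history = correct(history, "ACME", "BBB", effective=date(2015, 3, 2), recorded=date(2015, 3, 10))

monday, friday, today = date(2015, 3, 2), date(2015, 3, 6), date(2015, 3, 11)
print(as_of(history, "ACME", monday, friday))  # A   (what we knew last Friday)
print(as_of(history, "ACME", monday, today))   # BBB (what we know now)
```

Corrections only close system-time intervals and never delete versions, so the history doubles as the audit trail of what was known and when. In a real database, the steps inside `correct` would need to run in a single transaction, which is why the deck ties bi-temporality to ACID support.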
BEYOND THE DATA WAREHOUSE
If only the EDW were the only problem...
Diagram: SOA

Application-Centric Architecture
Characterized by:
- Applications that own "their own" data
- Small pockets of "authoritative" sources (e.g. reference data, CRM)
- Data exchange between systems for cross-LoB operations
Resulting in:
- Multiple copies of the same data (even from authoritative sources)
- Diminishing data quality with each copy
But that's not all:
- Accelerated by SOA (an otherwise good thing)

Operational Data Hub
- A read and write database supports more than analysis
- Enables a data-centric architecture for the enterprise
- Brings all of the data-centric goodness beyond the data warehouse space
  - React immediately to important events, e.g. alerts
  - Create workflow based on analysis
  - Make SOA better and redeem broken implementations

Operational Data Hub (architecture)
- Read+write real-time DW
- Analysis of all data
- Unified distribution
- Direct external feedback
- Makes use of Hadoop investment
- Semantics plays a key role
Diagram labels: data-centric enterprise, bidirectional, multi-channel distribution, operational feedback, customers, applications, warm archives.

Convergence
- Platform for mixed workloads
  - Simultaneous read and write during discovery
  - Analytical and operational functions within the same DB
  - React immediately to important events, e.g. alerts
  - Create workflow based on analysis
- A data-centric architecture for the enterprise
  - Bring applications to the data
- Bring the flexibility of "Big Data" beyond the data warehouse space
  - "Three V's" for running the business

Thank You
[email protected] | marklogic.com | @kenkrupa | world.marklogic.com
