Introducing Enterprise NoSQL
Enterprise NoSQL: Converging Analysis and Operations
Ken Krupa, Enterprise CTO, MarkLogic
© COPYRIGHT 2015 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.

A Brief History: Database Duality
- Specialization between analysis and operations
- Accelerated by disruptive IT shifts (e.g. Internet, Hadoop)
- Need for greater convergence
[Figure: widening Analysis / Operations Gap between Analytical Specialization and Operational Specialization. Timeline marks: ~1990, ~2000, ~2010, ~2013. Timeline labels: Star Schema Mainstream, WWW Mainstream, Big Data Mainstream, NoSQL Mainstream, EDW 1st peak.]

Accelerating the divide?
Our world is changing … and heterogeneous data is a problem
[Figure: data volume growing from 4.4 ZB (2013) to 44 ZB (2020); 12% Structured, 88% Unstructured. Labels: Reference Data, Warehouse, OLTP, Data Marts, Archives, "?".]
Source: IDC

THE DATA WAREHOUSES

Traditional Enterprise Data Warehouse (RDBMS)
EDW Definition: “A subject oriented, nonvolatile, integrated, time variant collection of data in support of management's decisions” – Bill Inmon
- Pre-defined schemas
- Complex ETL processes
- Changes depend on SDLC

Star Schema Modeling - Waterfall
1. Choose the business process
2. Declare the grain
3. Identify the dimensions
4. Identify the fact
[Figure: waterfall of Identify, Model, Integrate, Discover plotted against TIME and COST.]
Source: http://en.wikipedia.org/wiki/Dimensional_modeling

View from the Enterprise
[Figure labels: “Unstructured”; Reference Data; Warehouse; OLTP; Archives; Data Marts; Documents, Messages; Video; Audio; Signals, Metadata; Logs, Streams; “Social”; Search; { }.]

WHAT ABOUT HADOOP

Hadoop – What You Get
Advantages
- HDFS provides scale and economies of scale
- File-based nature allows for greater Variety
- Raw data is fine and any shape will do
- Schema-on-read possible
- MapReduce and YARN enable massively parallel scaling
Gaps
- Hadoop was designed for batch processing
- Does not support real-time applications on its own
- Requires expertise to configure, deploy and manage
- Has security limitations
- On its own, it is not a database

RDBMS + Hadoop
- Still a lot of ETL – RDBMS is still in the picture
- Shortcomings in security and governance capabilities with Hadoop
- Reliance on RDBMS for anything operational
- A mismatch between analytical and operational aspects

ENTERPRISE NOSQL

Enterprise NoSQL
- Flexible Data Model: Store and manage JSON, XML, RDF, and Geospatial data with a document-centric, schema-agnostic database. Pre-requisite modeling not required
- Search and Query: Built-in search to find answers in documents, relationships, and metadata
- Scalability and Elasticity: Scale out on commodity hardware, and also scale down
- ACID Transactions: MVCC for data consistency and simultaneous read+write
- Enterprise-Grade Security: Certified, granular security for modern data governance

Core Benefits of Enterprise NoSQL
- A database more in line with today’s data processing problems and expectations
  - Handle all types of data
  - Minimize (or eliminate) ETL and data copying
  - Scale out on commodity hardware and in the cloud
  - Deliver results more quickly
- A database that offers opportunities for operational convergence
  - Handles mixed workloads (real-time and batch)
  - Does not abandon enterprise capabilities – e.g. transactions and security
  - A database that is built to integrate with the Big Data ecosystem (e.g. Hadoop and related)

More Than Just Query…
What if an analyst could talk back to the data warehouse…?
“I found Something!”

Semantics
Enterprise triple store, document store, and database combined
- Store and query billions of facts and relationships; infer new facts
- Facts and relationships provide context for better search
- Flexible data modeling: integrate and link data from different sources
- Standards-based for ease of use and integration
  - RDF, SPARQL, and standard REST interfaces

Semantics: A New Way to Organize Data
Data is stored in RDF Triples, expressed as:
  Subject : Predicate : Object
  Jean Dubois : livesIn : Paris
  Paris : isIn : France
Query with SPARQL gives us simple lookup ... and more!
Find people who live in (a place that's in) France
[Figure: graph of the nodes "Jean Dubois", "Paris" and "France" with edges labeled livesIn, isIn and livesIn.]

Asserting Facts with Semantics
- Assert newly discovered information during analysis
  - “I received an email about the date of birth”
- Decorate with additional items of interest
  - “Bob has an interest in art”
- Assert relationships as they are discovered
  - “Bob knows Alice”
Source: http://www.w3.org/TR/2014/NOTE-rdf11-primer-20140225/

Data Provenance with Semantics
- Data lineage & provenance
- Utilize PROV-O
- The PROV Ontology
- W3C Recommendation
- Expressed with RDF Triples
- For example…
  prov:wasGeneratedBy
  prov:wasDerivedFrom
  prov:wasAttributedTo
  prov:wasAssociatedWith

Benefits of Enterprise NoSQL with Semantics
- Make analysis conversational
  - Using machine-readable standards
- Provide even more modeling flexibility
  - Ad-hoc facts and relationships
  - Richer metadata
- Further enable operational/analytical convergence

HANDLING TIME

Bi-Temporality
- Audits
  - Preserved history
- Regulation and compliance
- Risk Management
  - Financial risk assessment models need to factor in all history
- A complete history (audit trail) of what you knew and when you knew it
Example question: What were my customer’s credit ratings last Monday as I knew them last Friday?

Bi-temporality with Enterprise NoSQL
Others | Enterprise NoSQL
Other NoSQL: No transactions means no bi-temporal. | Full ACID transactions with MVCC.
RDBMS: Bi-temporal at table level, composed objects make implementation complex. | Bi-temporal at the object/document level. Implementations much more straightforward.
RDBMS: Bi-temporal data only. What happens when the schema changes? | For Enterprise NoSQL, schema is data.
RDBMS: Bi-temporal data only. What happens when the security changes? | Security may also be bi-temporally managed.
RDBMS: Inflexible implementations with respect to bi-temporal views and clocks. | Flexible implementations based on customer input and without compromising auditability. Capabilities such as multi-layered bi-temporality and use of external transaction clocks possible.

PUTTING THINGS TOGETHER

Enterprise NoSQL Operational Data Warehouse
- Schema-agnostic
- Straightforward data integration
- Full-text indexing and search
- Scale-out infrastructure
- Real-time or batch load and analysis
- Write back during discovery!
[Figure labels: Discover & Enrich; RT Events or Batch Load; RDF.]

Getting Noticed
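
The question posed on the Bi-Temporality slide (a value as of one date, as it was known on another date) can be illustrated with a minimal Python sketch. This is not MarkLogic's API: the `RatingVersion` record, the `rating_as_of` function, the customer name and all dates are hypothetical, and the sketch only assumes that each stored version carries a valid-time range and a system-time (recorded) range.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class RatingVersion:
    """One version of a customer's credit rating (hypothetical record shape)."""
    customer: str
    rating: str
    valid_from: date     # valid time: when the rating applies in the real world
    valid_to: date       # exclusive upper bound
    recorded_from: date  # system time: when the database learned this version
    recorded_to: date    # exclusive; date.max means "still the current belief"


def rating_as_of(history: Iterable[RatingVersion], customer: str,
                 valid_on: date, known_on: date) -> Optional[str]:
    """Return the rating that applied on valid_on, as it was known on known_on."""
    for v in history:
        if (v.customer == customer
                and v.valid_from <= valid_on < v.valid_to
                and v.recorded_from <= known_on < v.recorded_to):
            return v.rating
    return None


# Hypothetical audit trail: a rating of "A" is recorded on 2015-03-01, then
# corrected to "BBB" on 2015-03-10. The old version is closed, never deleted.
HISTORY = [
    RatingVersion("acme", "A", date(2015, 3, 1), date.max,
                  date(2015, 3, 1), date(2015, 3, 10)),
    RatingVersion("acme", "BBB", date(2015, 3, 1), date.max,
                  date(2015, 3, 10), date.max),
]

monday = date(2015, 3, 2)
friday = date(2015, 3, 6)

# Monday's rating as it was known on Friday (before the correction arrived).
print(rating_as_of(HISTORY, "acme", monday, friday))
# Monday's rating as it is known after the correction.
print(rating_as_of(HISTORY, "acme", monday, date(2015, 3, 11)))
```

Because a correction closes the earlier version's recorded range instead of overwriting it, both answers stay reproducible, which is the audit-trail property the slide describes.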

BEYOND THE DATA WAREHOUSE

If only the EDW was the only problem…
[Figure label: SOA.]

Application Centric Architecture
Characterized by:
  - Applications that own “their own” data
  - Small pockets of “authoritative” sources (e.g. reference data, CRM)
  - Data exchange between systems for cross-LoB operations
Resulting in:
  - Multiple copies of the same data (even from authoritative sources)
  - Diminishing data quality with each copy
But that’s not all…
  - Accelerated by SOA (an otherwise good thing)

Operational Data Hub
- A read & write Database supports more than analysis
- Enables a Data Centric Architecture for the Enterprise
- Brings all of the data-centric goodness beyond the Data Warehouse space
  - React immediately to important events – e.g. alerts
  - Create workflow based on analysis
  - Make SOA better, redeem broken implementations

Operational Data Hub
- Read+write real-time DW
- Analysis of all data
- Unified distribution
- Direct external feedback
- Makes use of Hadoop investment
- Semantics plays a key role
[Figure labels: Data-centric enterprise; Bidirectional; Multi-channel distribution; Operational Feedback; Customers; Applications; Warm archives.]

Convergence
- Platform for mixed workloads
  - Simultaneous read & write during discovery
  - Analytical and Operational functions within the same DB
  - React immediately to important events – e.g. alerts
  - Create workflow based on analysis
- A Data Centric Architecture for the Enterprise
  - Bring applications to the data
- Bring the flexibility of “Big Data” beyond the Data Warehouse space
  - “Three V’s” for running the business

Thank You
[email protected]
marklogic.com
@kenkrupa
world.marklogic.com