Introducing Enterprise NoSQL

Enterprise NoSQL
Converging Analysis and Operations
Ken Krupa, Enterprise CTO, MarkLogic
© COPYRIGHT 2015 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.

A Brief History: Database Duality
• Specialization between analysis and operations
• Accelerated by disruptive IT shifts (e.g. Internet, Hadoop)
• Need for greater convergence
[Figure: timeline of the Analysis / Operations Gap between Analytical Specialization and Operational Specialization. Time markers: ~1990, ~2000, ~2010, ~2013. Milestones: Star Schema and EDW 1st peak, WWW Mainstream, Big Data Mainstream, NoSQL Mainstream.]
SLIDE: 2

Accelerating the divide?
Our world is changing
[Figure: data volume growing from 4.4 ZB (2013) to 44 ZB (2020); 12% Structured, 88% Unstructured. Source: IDC]
…and heterogeneous data is a problem
[Figure: Reference Data, Warehouse, OLTP, Data Marts, Archives, ?]
SLIDE: 3

THE DATA WAREHOUSES

Traditional Enterprise Data Warehouse (RDBMS)
EDW Definition: “A subject oriented, nonvolatile, integrated, time variant collection of data in support of management's decisions” – Bill Inmon
• Pre-defined schemas
• Complex ETL processes
• Changes depend on SDLC
SLIDE: 5

Star Schema Modeling - Waterfall
1. Choose the business process
2. Declare the grain
3. Identify the dimensions
4. Identify the fact
Source: http://en.wikipedia.org/wiki/Dimensional_modeling
[Figure: waterfall stages Identify, Model, Integrate, Discover plotted against TIME and COST]
SLIDE: 6

View from the Enterprise
[Figure: enterprise data sources. Labels: Reference Data, Warehouse, OLTP, Archives, Data Marts; “Unstructured”: Documents, Messages; Video; Audio; Signals, Metadata; Logs, Streams; “Social”; Search]
SLIDE: 7

WHAT ABOUT HADOOP

Hadoop – What You Get
Advantages
• HDFS provides scale and economies of scale
• File-based nature allows for greater Variety
• Raw data is fine and any shape will do
• Schema-on-read possible
• Map-reduce and YARN enable massive parallel scaling
Gaps
• Hadoop was designed for batch processing
• Does not support real-time applications on its own
• Requires expertise to configure, deploy and manage
• Has security limitations
• On its own, it is not a database
SLIDE: 10

RDBMS + Hadoop
• Still a lot of ETL – RDBMS is still in the picture
• Shortcomings in security and governance capabilities with Hadoop
• Reliance on RDBMS for anything operational
• A mismatch between analytical and operational aspects
SLIDE: 11

ENTERPRISE NOSQL

Enterprise NoSQL
Flexible Data Model: Store and manage JSON, XML, RDF, and Geospatial data with a document-centric, schema-agnostic database. Pre-requisite modeling not required
Search and Query: Built-in search to find answers in documents, relationships, and metadata
Scalability and Elasticity: Scale out on commodity hardware, and also scale down
ACID Transactions: MVCC for data consistency and simultaneous read+write
Enterprise-Grade Security: Certified, granular security for modern data governance
SLIDE: 13

Core Benefits of Enterprise NoSQL
• A database more in line with today’s data processing problems and expectations
  – Handle all types of data
  – Minimize (or eliminate) ETL and data copying
  – Scale out on commodity hardware and in the cloud
  – Deliver results more quickly
• A database that offers opportunities for operational convergence
  – Handles mixed workloads (real-time and batch)
  – Does not abandon enterprise capabilities – e.g. transactions and security
  – A database that is built to integrate with the Big Data ecosystem (e.g. Hadoop and related)
SLIDE: 14

More Than Just Query…
What if an analyst could talk back to the data warehouse…?
I found Something!
SLIDE: 15

Semantics
Enterprise triple store, document store, and database combined
• Store and query billions of facts and relationships; infer new facts
• Facts and relationships provide context for better search
• Flexible data modeling: integrate and link data from different sources
• Standards-based for ease of use and integration
  – RDF, SPARQL, and standard REST interfaces
SLIDE: 16

Semantics: A New Way to Organize Data
Data is stored in Triples, expressed as:
Subject : Predicate : Object
Jean Dubois : livesIn : Paris
Paris : isIn : France
SLIDE: 17

Semantics: A New Way to Organize Data
Data is stored in Triples, expressed as:
Subject : Predicate : Object
Jean Dubois : livesIn : Paris
Paris : isIn : France
Query with SPARQL gives us simple lookup... and more!
Find people who live in (a place that's in) France
[Figure: RDF Triples graph with nodes "Jean Dubois", "Paris", "France" and edges livesIn, isIn, livesIn]
SLIDE: 18

Asserting Facts with Semantics
• Assert newly discovered information during analysis
  – “I received an email about the date of birth”
• Decorate with additional items of interest
  – “Bob has an interest in art”
• Assert relationships as they are discovered
  – “Bob knows Alice”
Source: http://www.w3.org/TR/2014/NOTE-rdf11-primer-20140225/
SLIDE: 19

Data Provenance with Semantics
• Data lineage & provenance
• Utilize PROV-O
  – The PROV Ontology
  – W3C Recommendation
  – Expressed with RDF Triples
• For example…
  prov:wasGeneratedBy
  prov:wasDerivedFrom
  prov:wasAttributedTo
  prov:wasAssociatedWith
SLIDE: 20

Benefits of Enterprise NoSQL with Semantics
• Make analysis conversational
  – Using machine-readable standards
• Provide even more modeling flexibility
  – Ad-hoc facts and relationships
  – Richer metadata
• Further enable operational/analytical convergence
SLIDE: 21

HANDLING TIME

Bi-Temporality
• Audits
  – Preserved history
• Regulation and compliance
• Risk Management
  – Financial risk assessment models need to factor in all history
• A complete history (audit trail) of what you knew and when you knew it
What were my customer’s credit ratings last Monday as I knew it last Friday?
SLIDE: 23

Bi-temporality with Enterprise NoSQL

Others: Other NoSQL: No transactions means no bi-temporal.
Enterprise NoSQL: Full ACID transactions with MVCC.

Others: RDBMS: Bi-temporal at table level, composed objects make implementation complex.
Enterprise NoSQL: Bi-temporal at the object/document level. Implementations much more straightforward.

Others: RDBMS: Bi-temporal data only. What happens when the schema changes?
Enterprise NoSQL: For Enterprise NoSQL, schema is data.

Others: RDBMS: Bi-temporal data only. What happens when the security changes?
Enterprise NoSQL: Security may also be bi-temporally managed.

Others: RDBMS: Inflexible implementations with respect to bi-temporal views and clocks.
Enterprise NoSQL: Flexible implementations based on customer input and without compromising auditability. Capabilities such as multi-layered bi-temporality and use of external transaction clocks possible.
SLIDE: 24

PUTTING THINGS TOGETHER

Enterprise NoSQL Operational Data Warehouse
• Schema-agnostic
• Straightforward data integration
• Full-text indexing and search
• Scale-out infrastructure
• Real-time or batch load and analysis
• Write back during discovery!
[Figure labels: Discover & Enrich; RT Events or Batch Load; RDF]
SLIDE: 26

Getting Noticed
SLIDE: 28
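
The triple query from the Semantics slides ("Find people who live in (a place that's in) France") can be sketched in a few lines of plain Python. This is only an illustration of pattern matching over subject/predicate/object triples, not MarkLogic's API and not a SPARQL engine. The names `Triple`, `match`, and `people_living_in` are hypothetical and introduced here; the data is the two triples shown on slide 17.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    """One fact, stored as Subject : Predicate : Object (hypothetical type)."""
    subject: str
    predicate: str
    obj: str


# The two triples shown on the slide.
TRIPLES = [
    Triple("Jean Dubois", "livesIn", "Paris"),
    Triple("Paris", "isIn", "France"),
]


def match(triples, subject: Optional[str] = None,
          predicate: Optional[str] = None, obj: Optional[str] = None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


def people_living_in(triples, region: str):
    """Subjects that live in `region` itself or in a place that isIn it.

    Only one isIn hop is followed. With a suitable prefix declared, a SPARQL
    pattern such as
        ?person :livesIn ?place . ?place :isIn :France
    expresses the one-hop part of this lookup.
    """
    places = {region} | {
        t.subject for t in match(triples, predicate="isIn", obj=region)
    }
    residents = {
        t.subject for t in match(triples, predicate="livesIn")
        if t.obj in places
    }
    return sorted(residents)


if __name__ == "__main__":
    print(people_living_in(TRIPLES, "France"))
```

Asserting a newly discovered fact, as on the "Asserting Facts with Semantics" slide, amounts in this sketch to appending another `Triple` to the list; no schema change is involved.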
BEYOND THE DATA WAREHOUSE

If only the EDW was the only problem…
[Figure label: SOA]
SLIDE: 30

Application Centric Architecture
Characterized by:
  – Applications that own “their own” data
  – Small pockets of “authoritative” sources (e.g. reference data, CRM)
  – Data exchange between systems for cross-LoB operations
Resulting in:
  – Multiple copies of the same data (even from authoritative sources)
  – Diminishing data quality with each copy
But that’s not all…
  – Accelerated by SOA (an otherwise good thing)
SLIDE: 31

Operational Data Hub
• A read & write Database supports more than analysis
• Enables a Data Centric Architecture for the Enterprise
• Brings all of the data-centric goodness beyond the Data Warehouse space
  – React immediately to important events – e.g. alerts
  – Create workflow based on analysis
  – Make SOA better, redeem broken implementations
SLIDE: 32

Operational Data Hub
• Read+write real-time DW
• analysis of all data
• Unified distribution
• Direct external feedback
• Makes use of Hadoop investment
• Semantics plays a key role
[Figure labels: Data-centric enterprise; Bidirectional; Multi-channel distribution; Operational Feedback; Customers; Applications; Warm archives]
SLIDE: 33

Convergence
• Platform for mixed workloads
  – Simultaneous read & write during discovery
  – Analytical and Operational functions within the same DB
  – React immediately to important events – e.g. alerts
  – Create workflow based on analysis
• A Data Centric Architecture for the Enterprise
  – Bring applications to the data
• Bring the flexibility of “Big Data” beyond the Data Warehouse space
  – “Three V’s” for running the business
SLIDE: 34

Thank You
[email protected]
marklogic.com
@kenkrupa
world.marklogic.com