CDP DATA CENTER 7.1
Speakers:
• Laurent Edel, Solution Engineer
• Jacques Marchand, Solution Engineer
• Mael Ropars, Principal Solution Engineer

30 June 2020

© 2019 Cloudera, Inc. All rights reserved.

AGENDA
• CDP DATA CENTER OVERVIEW
• DETAILS ABOUT MAJOR COMPONENTS
• PATH TO CDP DC & SMART MIGRATION
• Q/A

CLOUDERA DATA PLATFORM

TARGET ARCHITECTURE: ENTERPRISE DATA CLOUD
• CDP Public Cloud (platform-as-a-service)
• CDP On-Prem (installable software)

CDP DATA CENTER OVERVIEW
CDP Data Center (installable software): enterprise analytics and data management platform, built for hybrid cloud, optimized for bare metal and ready for private cloud.
[Diagram labels: Cloudera Manager, Cloudera Runtime]
NEW CDP Data Center features include:
• High-performance SQL analytics
• Real-time stream processing, analytics, and management
• Fine-grained security, enterprise metadata, and scalable data lineage
• Support for object storage (tech preview)
• Single pane of glass for management - multi-cluster support

A NEW OPEN SOURCE DISTRIBUTION FOR BETTER CAPABILITY
Cloudera Runtime - created from the best of CDH and HDP
• Deprecate competitive technologies
• Merge overlapping technologies
• Keep complementary technologies
• Upgrade shared technologies

COMPONENT LIST
CDP Data Center 7.1 (May 2020)
• Cloudera Manager 7.1 • HBase 2.2 • Key HSM 7.1 • Kafka Schema Registry 0.8
• Hadoop 3.1 • Phoenix 5.0 • Knox 1.3 • Streams Messaging Mgr 1.0
• Spark 2.4 / Spark 3(b2) • Kudu 1.12 • Livy 0.7 • Streams Replication Mgr 2.1
• Hive 3.1 • Sqoop 1.4.7 • Navigator Encrypt 7.1 • Ozone (Beta) 0.6
• Impala 3.4 • Parquet 1.10 • Ranger KMS 7.1 • Kafka Connect 2.4
• Oozie 5.1 • Avro 1.8 • Zeppelin • Cruise Control 2.0
• Hue 4.5 • ORC 1.5 • Hive Warehouse Connector 1.0 • Tez 0.9
• Ranger 2.0 • Zookeeper 3.5 • Kafka 2.4 • Key Trustee Server 7
• Atlas 2.0 • Solr 8.4
• RHEL/CentOS/OEL 7.7 • MySQL 5.7 • Upgrades from CDP DC 7.0 • Postgres 10
• Oracle DB 12 (Fresh Install Only) • Upgrades from CDH 5.13-5.16 • JDK 8 • PostgreSQL 10
• Upgrades from HDP 2.6.5 • JDK 11 Runtime • MariaDB 10.2

CDP DATA CENTER OVERVIEW [What is the scope of CDP Data Center]
[Diagram: five numbered stages (Collect, Report, Predict, Enrich, Serve) mapped to Impala, Spark, Hive, Zeppelin, Kudu, HBase, Phoenix and Solr, on a common base of SECURITY | GOVERNANCE | LINEAGE | MANAGEMENT | AUTOMATION]

KAFKA COMPUTE CLUSTERS WITH CLOUDERA MANAGER
Kafka Clusters using Shared Security & Governance Data Lake with Atlas and Ranger
• Kafka 2.4
• Ranger & Atlas Integration
• Support of Kafka Connect, Kafka Streams
• Cruise Control for load balancing
• Create multiple Kafka compute clusters using shared Security Data Lake with Ranger & Atlas

KAFKA MANAGEMENT SERVICES
Kafka Services for Schema Management, Replication and Monitoring
• Schema Registry: New Kafka Schema Governance
• Streams Messaging Manager (SMM): New Kafka Monitoring Service
• Streams Replication Manager (SRM): New Kafka Replication Engine powered by MirrorMaker2

SPARK
• Spark 2.4
• Integration with Ranger for Fine Grained Authorizations
• Coming soon: Spark 3!
  – Better performance
  – Enhanced support for Deep Learning
  – New modules
  – MLLib replaced with SparkML
  – Tech Preview available

SQL USER EXPERIENCE: HUE

DATA WAREHOUSE: Hive 3
Apache Hive 3
• Comprehensive ANSI SQL 2016 coverage
• GDPR: new ACID v2 as fast as regular tables, transactions, UPDATE/DELETE/MERGE
• Cloud-ready: optimized for S3/WASB/GCP
• Support for JDBC/Kafka/Druid out of the box
• EDW offload:
  – "DBA" tooling: surrogate keys, materialized views, constraints
  – information schema
• Performance:
  – workload management
  – query result cache
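
The UPDATE/DELETE/MERGE support listed on the Hive 3 slide can be illustrated with a small sketch. The Java below is not Hive code: it only models the row-level effect of a MERGE (update when the key matches, insert otherwise) on an in-memory map. The class `MergeSketch`, the method `merge`, and the table and column names in the comment are hypothetical and chosen for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * In-memory model of MERGE semantics (hypothetical names; not Hive code).
 * Roughly what a statement of this shape does to a transactional table:
 *
 *   MERGE INTO customers AS t USING updates AS s ON t.id = s.id
 *   WHEN MATCHED THEN UPDATE SET name = s.name
 *   WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.name);
 */
final class MergeSketch {
    /** Applies every source row to target; returns {rowsUpdated, rowsInserted}. */
    static int[] merge(Map<Integer, String> target, Map<Integer, String> source) {
        int updated = 0;
        int inserted = 0;
        for (Map.Entry<Integer, String> row : source.entrySet()) {
            if (target.containsKey(row.getKey())) {
                updated++;   // WHEN MATCHED: overwrite the existing value
            } else {
                inserted++;  // WHEN NOT MATCHED: add a new row
            }
            target.put(row.getKey(), row.getValue());
        }
        return new int[] {updated, inserted};
    }

    public static void main(String[] args) {
        Map<Integer, String> customers = new LinkedHashMap<>();
        customers.put(1, "Alice");
        customers.put(2, "Bob");

        Map<Integer, String> updates = new LinkedHashMap<>();
        updates.put(2, "Robert"); // matches id 2: update
        updates.put(3, "Carol");  // no match: insert

        int[] counts = merge(customers, updates);
        System.out.println(customers + " updated=" + counts[0] + " inserted=" + counts[1]);
    }
}
```

In a real deployment the statement in the comment would be submitted to Hive (for example from Hue or over JDBC) against an ACID table; the map version is only meant to make the matched / not matched branches concrete.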

DATA WAREHOUSE: Impala and Kudu
Apache Impala
• Leading MPP SQL Engine for DW - optimized for Parquet/Kudu
• Ideal for Data Mart implementations that require interactive/ad-hoc BI
• 1000+ enterprise customers - many running on 10s of PBs and 100s of nodes
• Certified with leading BI tools with broad SQL coverage
• Latest release adds improvements for resiliency, concurrency, and metadata
Apache Kudu
• Leading columnar storage engine for fast analytics on fast data
• Ideal for low-latency time series data ingest and analytics (with Impala SQL engine)
• Combines fast single-row ingest, like HBase, with large scans, like HDFS
• ACID (insert/update/delete) semantics with single rows

WORKLOAD MANAGER
• Global view on analytic processing
• Deep Dive Query analysis

HBASE + PHOENIX
HBASE: flexible, scale-out, NoSQL database
    Put put = new Put(Bytes.toBytes(rowKey));
    put.addColumn(COLUMN_FAMILY_NAME, COLUMN_NAME, Bytes.toBytes(GREETINGS));
    table.put(put);
• Maximally flexible & customizable
• SQL only for data remediation
• All advanced functionality available
• New async client
• JDK8/G1GC
• Off-Heap read path
• API clean-up, HBCK2
PHOENIX: RDBMS-like, scale-out database
    // rowKey and GREETINGS stand for the literal values to write
    stmt.executeUpdate("UPSERT INTO TABLE_NAME VALUES (rowKey, GREETINGS)");
• Programmatic ANSI SQL support
• RDBMS-like data architecture
• Auto-applies performance best practices
• Can co-exist with HBase apps
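
The Put snippet on the HBASE + PHOENIX slide addresses a value by row key, column family and column name. As a rough illustration of that addressing scheme, here is a toy in-memory model written with the Java standard library only. It is not the HBase client API: the class `ToyWideColumnTable`, its methods, and the sample keys are all hypothetical, and real HBase additionally handles versions, regions and persistence.

```java
import java.util.Collections;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy in-memory stand-in for a wide-column table (hypothetical; not the HBase client API). */
final class ToyWideColumnTable {
    // rowKey -> ("family:qualifier" -> value); both levels are kept sorted by key.
    private final NavigableMap<String, NavigableMap<String, String>> rows = new TreeMap<>();

    /** Upserts one cell: creates the row if absent, overwrites the cell if present. */
    void put(String rowKey, String family, String qualifier, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(family + ":" + qualifier, value);
    }

    /** Point read of a single cell; returns null when the row or the cell does not exist. */
    String get(String rowKey, String family, String qualifier) {
        NavigableMap<String, String> cells = rows.get(rowKey);
        return cells == null ? null : cells.get(family + ":" + qualifier);
    }

    /** Range scan over row keys in [startRow, stopRow), returned in sorted key order. */
    NavigableMap<String, NavigableMap<String, String>> scan(String startRow, String stopRow) {
        return Collections.unmodifiableNavigableMap(rows.subMap(startRow, true, stopRow, false));
    }

    public static void main(String[] args) {
        ToyWideColumnTable table = new ToyWideColumnTable();
        table.put("user#002", "cf", "greeting", "bonjour");
        table.put("user#001", "cf", "greeting", "hello");
        table.put("user#001", "cf", "greeting", "hello again"); // same cell: overwritten
        System.out.println(table.get("user#001", "cf", "greeting"));
        System.out.println(table.scan("user#001", "user#002").keySet());
    }
}
```

The sorted map is the design point of the sketch: because rows are ordered by key, a scan over a key range is a contiguous slice, while a put to an existing cell behaves as an upsert, which is also the behaviour the Phoenix UPSERT statement exposes through SQL.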

CDP SEARCH
Scalable and Robust Index Storage with SOLR 8.4
• Scalable, cost-efficient index storage
• High availability, integrated security with Atlas/Ranger
• Shared data store with other processing tools (Spark, Impala..)
• Search AND process data in one platform
[Diagram labels: Querying API, Indexing API, Solr Cloud, Distributed processing coordinator, Solr Extraction Mapping, Indexing engine (Lucene), Shared Data Storage]

DATA SCIENCE AND ENGINEERING TOOLS
• CLOUDERA DATA SCIENCE WORKBENCH
• APACHE ZEPPELIN

SIMPLIFIED MANAGEMENT
Cloudera Manager
• Management of multiple clusters
• Knox, Ranger, Atlas, Hive-on-Tez, DAS
• Cluster-level configuration history
• Improved global search
• Resume after errors when enabling Kerberos
• Scalability improvements
• Improved alerts configuration
• Upgrade Support
• Support for Private Cloud (Beta)

CONSISTENT SECURITY AND GOVERNANCE
Built for multi-functional analytics anywhere
• Data Catalog: a comprehensive catalog of all data sets, spanning on-premises, cloud object stores, structured, unstructured, and semi-structured
• Schema: automatic capture and storage of any and all schema and metadata definitions as they are used and created by platform workloads
• Replication: deliver data as well as data policies wherever the enterprise needs to work, with complete consistency and security
• Security: role-based access control applied consistently across the platform. Includes full stack encryption and key management
• Governance: enterprise-grade auditing, lineage, and governance capabilities applied across the platform with rich extensibility for partner integrations

SECURITY AND GOVERNANCE
• Identity & Perimeter: Validate users in enterprise directory
• Access: Defining what users and applications can do with data
• Visibility: Reporting on where data came from and how it's being used
• Data Protection: Shielding data in the cluster from unauthorized visibility