Superset Documentation

Apache Superset Dev

Dec 05, 2019

CONTENTS

1 Superset Resources
2 Apache Software Foundation Resources
3 Overview
  3.1 Features
  3.2 Databases
  3.3 Screenshots
  3.4 Contents
  3.5 Indices and tables

Apache Superset (incubating) is a modern, enterprise-ready business intelligence web application.

Important: Disclaimer: Apache Superset is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision-making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.

Note: Apache Superset, Superset, Apache, the Apache feather logo, and the Apache Superset project logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.

CHAPTER ONE: SUPERSET RESOURCES

• Superset's GitHub; note that we use GitHub for issue tracking
• Superset's contribution guidelines and code of conduct on GitHub
• Our mailing list archives; to subscribe, send an email to [email protected]
• Join our Slack

CHAPTER TWO: APACHE SOFTWARE FOUNDATION RESOURCES

• The Apache Software Foundation website
• Current events
• License
• Thanks to the ASF's sponsors
• Sponsor Apache!

CHAPTER THREE: OVERVIEW

3.1 Features

• A rich set of data visualizations
• An easy-to-use interface for exploring and visualizing data
• Create and share dashboards
• Enterprise-ready authentication with integration with major authentication providers (database, OpenID, LDAP, OAuth, and REMOTE_USER through Flask AppBuilder)
• An extensible, high-granularity security/permission model allowing intricate rules on who can access individual features and datasets
• A simple semantic layer, allowing users to control how data sources are displayed in the UI by defining which fields should show up in which drop-down and which aggregations and function metrics are made available to the user
• Integration with most SQL-speaking RDBMSs through SQLAlchemy
• Deep integration with Druid.io

3.2 Databases

The following RDBMSs are currently supported:

• Amazon Athena
• Amazon Redshift
• Apache Drill
• Apache Druid
• Apache Hive
• Apache Impala
• Apache Kylin
• Apache Pinot
• Apache Spark SQL
• BigQuery
• ClickHouse
• Google Sheets
• Greenplum
• IBM Db2
• MySQL
• Oracle
• PostgreSQL
• Presto
• Snowflake
• SQLite
• SQL Server
• Teradata
• Vertica

Other database engines with a proper DB-API driver and SQLAlchemy dialect should be supported as well.
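As an illustration of what "a proper DB-API driver and SQLAlchemy dialect" means in practice, the following minimal Python sketch checks that a connection URI works with SQLAlchemy before the same URI is registered as a database in Superset. The URI, credentials, and driver choice are hypothetical placeholders, and the snippet assumes the corresponding driver (psycopg2 here) is installed and the server is reachable:

# Minimal sketch: verify a SQLAlchemy dialect/driver pair works
# before registering the same URI as a Superset data source.
# The URI below is a placeholder, not a value from this documentation.
from sqlalchemy import create_engine, text

# Format: dialect+driver://username:password@host:port/database
uri = "postgresql+psycopg2://superset:secret@localhost:5432/analytics"

engine = create_engine(uri)
with engine.connect() as conn:
    # If this query round-trips, Superset should be able to use the same URI.
    print(conn.execute(text("SELECT 1")).scalar())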
3.3 Screenshots

[This section of the original document contains screenshots of the Superset UI.]

3.4 Contents

3.4.1 Installation & Configuration

Getting Started

Superset has deprecated support for Python 2.* and supports only ~=3.6 to take advantage of newer Python features and to reduce the burden of supporting previous versions. We run our test suite against 3.6, but running on 3.7 should work as well.

Cloud-native!

Superset is designed to be highly available. It is "cloud-native" in that it has been designed to scale out in large, distributed environments and works well inside containers. While you can easily test drive Superset on a modest setup or simply on your laptop, there's virtually no limit to scaling out the platform.

Superset is also cloud-native in the sense that it is flexible and lets you choose your web server (Gunicorn, Nginx, Apache), your metadata database engine (MySQL, Postgres, MariaDB, ...), your message queue (Redis, RabbitMQ, SQS, ...), your results backend (S3, Redis, Memcached, ...), and your caching layer (Memcached, Redis, ...). It works well with services like NewRelic, StatsD, and DataDog, and has the ability to run analytic workloads against most popular database technologies.

Superset is battle-tested in large environments with hundreds of concurrent users. Airbnb's production environment runs inside Kubernetes and serves 600+ daily active users viewing over 100K charts a day.

The Superset web server and the Superset Celery workers (optional) are stateless, so you can scale out by running on as many servers as needed.

Start with Docker

Note: The Docker-related files and documentation have been community-contributed and are not actively maintained or managed by the core committers working on the project. Some issues have been reported as of 2019-01. Help and contributions around Docker are welcome!

If you know Docker, you're in luck: there is a shortcut for initializing a development environment:

git clone https://github.com/apache/incubator-superset/
cd incubator-superset/contrib/docker
# prefix with SUPERSET_LOAD_EXAMPLES=yes to load examples:
docker-compose run --rm superset ./docker-init.sh
# you can run this command every time you need to start Superset now:
docker-compose up

After several minutes for the Superset initialization to finish, you can open a browser and view http://localhost:8088 to start your journey. From there, the container server will reload on modification of the Superset Python and JavaScript source code. Don't forget to reload the page to pick up the new frontend, though. See also CONTRIBUTING.md#building for an alternative way of serving the frontend.

It is also possible to run Superset in non-development mode: in the docker-compose.yml file, remove the volumes needed for development and change the variable SUPERSET_ENV to production.

If you are attempting to build on a Mac and it exits with 137, you need to increase your Docker resources. OS X instructions: https://docs.docker.com/docker-for-mac/#advanced (search for memory).

Or, if you're curious and want to install Superset from the bottom up, go ahead. See also contrib/docker/README.md.

OS dependencies

Superset stores database connection information in its metadata database. For that purpose, we use the cryptography Python library to encrypt connection passwords. Unfortunately, this library has OS-level dependencies.

You may want to attempt the next step ("Superset installation and initialization") and come back to this step if you encounter an error.
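To make the role of the cryptography library concrete, here is a minimal, self-contained sketch of symmetric encryption with its Fernet API. This only illustrates what the library provides; it is not Superset's actual code path, which manages its own key material internally:

# Minimal sketch of the cryptography library's Fernet API,
# illustrating why Superset depends on this package. Superset's
# real encryption of connection passwords is handled internally.
from cryptography.fernet import Fernet

key = Fernet.generate_key()  # symmetric key (bytes)
f = Fernet(key)

# Ciphertext that would be safe to persist in a metadata database:
token = f.encrypt(b"postgresql://user:password@host/db")
print(f.decrypt(token))      # round-trips to the original bytes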
Here's how to install them:

For Debian and Ubuntu, the following command will ensure that the required dependencies are installed:

sudo apt-get install build-essential libssl-dev libffi-dev python-dev python-pip libsasl2-dev libldap2-dev

Ubuntu 18.04: If you have Python 3.6 installed alongside Python 2.7, as is the default on Ubuntu 18.04 LTS, also run this command:

sudo apt-get install build-essential libssl-dev libffi-dev python3.6-dev python-pip libsasl2-dev libldap2-dev

otherwise the build for cryptography fails.

For Fedora and RHEL derivatives, the following commands will ensure that the required dependencies are installed:

sudo yum upgrade python-setuptools
sudo yum install gcc gcc-c++ libffi-devel python-devel python-pip python-wheel openssl-devel libsasl2-devel openldap-devel

Mac OS X: If possible, you should upgrade to the latest version of OS X, as issues are more likely to be resolved for that version. You will likely need the latest version of XCode available for your installed version of OS X. You should also install the XCode command line tools:

xcode-select --install

System Python is not recommended. Homebrew's Python also ships with pip:

brew install pkg-config libffi openssl python
env LDFLAGS="-L$(brew --prefix openssl)/lib" CFLAGS="-I$(brew --prefix openssl)/include" pip install cryptography==2.4.2

Windows isn't officially supported at this point, but if you want to attempt it, download get-pip.py and run python get-pip.py, which may need admin access. Then run the following:

C:\> pip install cryptography

# You may also have to create C:\Temp
C:\> md C:\Temp

Python virtualenv

It is recommended to install Superset inside a virtualenv. Python 3 already ships with virtualenv (as the venv module). But if it's not installed in your environment for some reason, you can install it via the package manager for your operating system, or from pip:

pip install virtualenv

You can create and activate a virtualenv with:

# virtualenv is shipped in Python 3.6+ as venv instead of pyvenv.
# See https://docs.python.org/3.6/library/venv.html
python3 -m venv venv
. venv/bin/activate

On Windows the syntax for activating it is a bit different:

venv\Scripts\activate

Once you have activated your virtualenv, everything you do is confined inside the virtualenv. To exit a virtualenv, just type deactivate.

Python's setup tools and pip

Put all the chances on your side by getting the very latest pip and setuptools libraries:

pip install --upgrade setuptools pip

Superset installation and initialization

Follow these few simple steps to install Superset:

# Install superset
pip install superset

# Initialize the database
superset db upgrade
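Beyond installation, Superset can be customized through a superset_config.py module placed on your PYTHONPATH, which Superset imports at startup. The sketch below shows the general shape of such a file; the keys shown (ROW_LIMIT, SUPERSET_WEBSERVER_PORT, SECRET_KEY, SQLALCHEMY_DATABASE_URI, WTF_CSRF_ENABLED) are standard configuration settings for Superset of this era, but the values are illustrative placeholders, so check the configuration documentation for your version:

# superset_config.py -- a minimal configuration sketch.
# Superset imports this module if it is found on PYTHONPATH;
# every value below is an illustrative placeholder.

ROW_LIMIT = 5000                # max rows returned during data exploration
SUPERSET_WEBSERVER_PORT = 8088  # port the web server binds to

# Flask secret key; also used when encrypting sensitive metadata.
# Replace with a long random string in any real deployment.
SECRET_KEY = "CHANGE_ME_TO_A_LONG_RANDOM_SECRET"

# SQLAlchemy URI for Superset's own metadata database.
SQLALCHEMY_DATABASE_URI = "sqlite:////path/to/superset.db"

# Enable CSRF protection via Flask-WTF.
WTF_CSRF_ENABLED = True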