Superset Documentation

Apache Superset Dev

May 12, 2020

CONTENTS

1 Superset Resources

2 Apache Software Foundation Resources

3 Overview
3.1 Features
3.2 Databases
3.3 Screenshots
3.4 Contents
3.5 Indices and tables



Apache Superset (incubating) is a modern, enterprise-ready business intelligence web application.

Important: Disclaimer: Apache Superset is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.

Note: Apache Superset, Superset, Apache, the Apache feather logo, and the Apache Superset project logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.


CHAPTER ONE

SUPERSET RESOURCES

• Versioned versions of this documentation: https://readthedocs.org/projects/apache-superset/
• Superset's Github; note that we use Github for issue tracking
• Superset's contribution guidelines and code of conduct on Github
• Our mailing list archives. To subscribe, send an email to [email protected]
• Join our Slack


CHAPTER TWO

APACHE SOFTWARE FOUNDATION RESOURCES

• The Apache Software Foundation Website
• Current Events
• License
• Thanks to the ASF's sponsors
• Sponsor Apache!


CHAPTER THREE

OVERVIEW

3.1 Features

• A rich set of data visualizations
• An easy-to-use interface for exploring and visualizing data
• Create and share dashboards
• Enterprise-ready authentication with integration with major authentication providers (database, OpenID, LDAP, OAuth & REMOTE_USER through Flask AppBuilder)
• An extensible, high-granularity security/permission model allowing intricate rules on who can access individual features and the dataset
• A simple semantic layer, allowing users to control how data sources are displayed in the UI by defining which fields should show up in which drop-down and which aggregation and function metrics are made available to the user
• Integration with most SQL-speaking RDBMS through SQLAlchemy
• Deep integration with Druid.io

3.2 Databases

The following RDBMS are currently supported:

• Amazon Athena
• Amazon Redshift
• Apache Drill
• Apache Druid
• Apache Hive
• Apache Impala
• Apache Kylin
• Apache Pinot
• Apache Spark SQL
• BigQuery
• ClickHouse
• CockroachDB
• Dremio
• Elasticsearch
• Exasol
• Google Sheets
• Greenplum
• IBM Db2
• MySQL
• Oracle
• PostgreSQL
• Presto
• Snowflake
• SQLite
• SQL Server
• Teradata
• Vertica
• Hana

Other database engines with a proper DB-API driver and SQLAlchemy dialect should be supported as well.

3.3 Screenshots

[Screenshot: a dashboard (bank_dash.png)]

[Screenshot: the Explore view (explore.png)]

[Screenshot: SQL Lab (sqllab.png)]

[Screenshot: a deck.gl dashboard (deckgl_dash.png)]

3.4 Contents

3.4.1 Installation & Configuration

Getting Started

Superset has deprecated support for Python 2.* and supports only Python ~=3.6 to take advantage of newer Python features and reduce the burden of supporting previous versions. We run our test suite against 3.6, but 3.7 is fully supported as well.

Cloud-native!

Superset is designed to be highly available. It is "cloud-native" in that it has been designed to scale out in large, distributed environments and works well inside containers. While you can easily test drive Superset on a modest setup or simply on your laptop, there's virtually no limit to scaling out the platform. Superset is also cloud-native in the sense that it is flexible and lets you choose your web server (Gunicorn, Nginx, Apache), your metadata database engine (MySQL, Postgres, MariaDB, ...), your message queue (Redis, RabbitMQ, SQS, ...), your results backend (S3, Redis, Memcached, ...), and your caching layer (Memcached, Redis, ...). It works well with services like NewRelic, StatsD and DataDog, and has the ability to run analytic workloads against most popular database technologies. Superset is battle tested in large environments with hundreds of concurrent users. Airbnb's production environment runs inside Kubernetes and serves 600+ daily active users viewing over 100K charts a day. The Superset web server and the Superset Celery workers (optional) are stateless, so you can scale out by running on as many servers as needed.

Start with Docker

Note: The Docker-related files and documentation are actively maintained and managed by the core committers working on the project. Help and contributions around Docker are welcomed!

If you know Docker, then you're in luck: we provide a shortcut to initialize a development environment:

git clone https://github.com/apache/incubator-superset/
cd incubator-superset
# you can run this command every time you need to start superset now:
docker-compose up


After several minutes, once Superset initialization has finished, you can open a browser and view http://localhost:8088 to start your journey. By default the system configures an admin user with the username admin and the password admin; if you are in a non-local environment it is highly recommended to change this username and password at your earliest convenience.

From there, the container server will reload on modification of the Superset Python and JavaScript source code. Don't forget to reload the page to take the new frontend into account though. See also CONTRIBUTING.md#building for an alternative way of serving the frontend.

It is currently not recommended to run docker-compose in production.

If you are attempting to build on a Mac and it exits with 137, you need to increase your Docker resources. OSX instructions: https://docs.docker.com/docker-for-mac/#advanced (search for memory).

Or, if you're curious and want to install Superset from the bottom up, then go ahead. See also docker/README.md

OS dependencies

Superset stores database connection information in its metadata database. For that purpose, we use the cryptography Python library to encrypt connection passwords. Unfortunately, this library has OS-level dependencies. You may want to attempt the next step ("Superset installation and initialization") and come back to this step if you encounter an error. Here's how to install them: For Debian and Ubuntu, the following command will ensure that the required dependencies are installed:

sudo apt-get install build-essential libssl-dev libffi-dev python-dev python-pip libsasl2-dev libldap2-dev

Ubuntu 18.04
If you have Python 3.6 installed alongside Python 2.7, as is the default on Ubuntu 18.04 LTS, also run this command:

sudo apt-get install build-essential libssl-dev libffi-dev python3.6-dev python-pip libsasl2-dev libldap2-dev

otherwise the build for cryptography fails. For Fedora and RHEL-derivatives, the following command will ensure that the required dependencies are installed:

sudo yum upgrade python-setuptools
sudo yum install gcc gcc-c++ libffi-devel python-devel python-pip python-wheel openssl-devel cyrus-sasl-devel openldap-devel

Mac OS X
If possible, you should upgrade to the latest version of OS X as issues are more likely to be resolved for that version. You will likely need the latest version of XCode available for your installed version of OS X. You should also install the XCode command line tools:

xcode-select --install

System python is not recommended. Homebrew’s python also ships with pip:

brew install pkg-config libffi openssl python
env LDFLAGS="-L$(brew --prefix openssl)/lib" CFLAGS="-I$(brew --prefix openssl)/include" pip install cryptography==2.4.2


Windows isn’t officially supported at this point, but if you want to attempt it, download get-pip.py, and run python get-pip.py which may need admin access. Then run the following:

C:\> pip install cryptography

# You may also have to create C:\Temp
C:\> md C:\Temp

Python virtualenv

It is recommended to install Superset inside a virtualenv. Python 3 already ships virtualenv (as the built-in venv module), but if it's not installed in your environment for some reason, you can install it via your operating system's package manager, or from pip:

pip install virtualenv

You can create and activate a virtualenv by:

# virtualenv is shipped in Python 3.6+ as venv instead of pyvenv.
# See https://docs.python.org/3.6/library/venv.html
python3 -m venv venv
. venv/bin/activate

On Windows the syntax for activating it is a bit different:

venv\Scripts\activate

Once you have activated your virtualenv, everything you do is confined to the virtualenv. To exit a virtualenv, just type deactivate.

Python’s setup tools and pip

Put all the chances on your side by getting the very latest pip and setuptools libraries:

pip install --upgrade setuptools pip

Superset installation and initialization

Follow these few simple steps to install Superset:

# Install superset
pip install apache-superset

# Initialize the database
superset db upgrade

# Create an admin user (you will be prompted to set a username, first and last name before setting a password)
export FLASK_APP=superset
superset fab create-admin

# Load some data to play with
superset load_examples

# Create default roles and permissions
superset init

# To start a development web server on port 8088, use -p to bind to another port
superset run -p 8088 --with-threads --reload --debugger

After installation, you should be able to point your browser to the right hostname:port (http://localhost:8088 by default), log in using the credentials you entered while creating the admin account, and navigate to Menu -> Admin -> Refresh Metadata. This action should bring in all of your datasources for Superset to be aware of, and they should show up in Menu -> Datasources, from where you can start playing with your data!

A proper WSGI HTTP Server

While you can set up Superset to run on Nginx or Apache, many use Gunicorn, preferably in async mode, which allows for impressive concurrency and is fairly easy to install and configure. Please refer to the documentation of your preferred technology to set up this Flask WSGI application in a way that works well in your environment. Here's an async setup known to work well in production:

gunicorn \
    -w 10 \
    -k gevent \
    --timeout 120 \
    -b 0.0.0.0:6666 \
    --limit-request-line 0 \
    --limit-request-field_size 0 \
    --statsd-host localhost:8125 \
    "superset.app:create_app()"

Refer to the Gunicorn documentation for more information. Note that the development web server (superset run or flask run) is not intended for production use. If not using gunicorn, you may want to disable the use of flask-compress by setting ENABLE_FLASK_COMPRESS = False in your superset_config.py.
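For example, a minimal sketch of the relevant superset_config.py entry (the flag name comes from the paragraph above; whether you need it depends on your serving setup):

# superset_config.py
# Disable flask-compress when a server other than Gunicorn handles compression
ENABLE_FLASK_COMPRESS = False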

Flask-AppBuilder Permissions

By default, every time the Flask-AppBuilder (FAB) app is initialized, the permissions and views are added automatically to the backend and associated with the 'Admin' role. The issue, however, is that when you are running multiple concurrent workers this creates a lot of contention and race conditions when defining permissions and views. To alleviate this issue, the automatic updating of permissions can be disabled by setting FAB_UPDATE_PERMS = False (defaults to True). In a production environment, initialization could take on the following form:

superset init
gunicorn -w 10 ... superset:app
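A minimal sketch of the corresponding superset_config.py entry, assuming permissions are synced once via superset init as shown above:

# superset_config.py
# Skip automatic permission/view syncing on app start; run `superset init` instead
FAB_UPDATE_PERMS = False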

Configuration behind a load balancer

If you are running superset behind a load balancer or reverse proxy (e.g. NGINX or ELB on AWS), you may need to utilise a healthcheck endpoint so that your load balancer knows if your superset instance is running. This is provided at /health, which will return a 200 response containing "OK" if the webserver is running.


If the load balancer is inserting X-Forwarded-For/X-Forwarded-Proto headers, you should set ENABLE_PROXY_FIX = True in the superset config file to extract and use the headers. In case the reverse proxy is used to provide SSL encryption, an explicit definition of the X-Forwarded-Proto header may be required. For the Apache webserver this can be set as follows:

RequestHeader set X-Forwarded-Proto "https"
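A minimal sketch of the matching superset_config.py entry (the flag name is taken from the paragraph above; only set it when Superset actually sits behind a proxy that injects these headers):

# superset_config.py
# Trust X-Forwarded-For / X-Forwarded-Proto headers set by the reverse proxy
ENABLE_PROXY_FIX = True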

Configuration

To configure your application, you need to create a file (module) superset_config.py and make sure it is in your PYTHONPATH. Here are some of the parameters you can copy / paste in that configuration module:

#---------------------------------------------------------
# Superset specific config
#---------------------------------------------------------
ROW_LIMIT = 5000

SUPERSET_WEBSERVER_PORT = 8088
#---------------------------------------------------------

#---------------------------------------------------------
# Flask App Builder configuration
#---------------------------------------------------------
# Your App secret key
SECRET_KEY = '\2\1thisismyscretkey\1\2\e\y\y\h'

# The SQLAlchemy connection string to your database backend
# This connection defines the path to the database that stores your
# superset metadata (slices, connections, tables, dashboards, ...).
# Note that the connection information to connect to the datasources
# you want to explore are managed directly in the web UI
SQLALCHEMY_DATABASE_URI = 'sqlite:////path/to/superset.db'

# Flask-WTF flag for CSRF
WTF_CSRF_ENABLED = True
# Add endpoints that need to be exempt from CSRF protection
WTF_CSRF_EXEMPT_LIST = []
# A CSRF token that expires in 1 year
WTF_CSRF_TIME_LIMIT = 60 * 60 * 24 * 365

# Set this API key to enable Mapbox visualizations
MAPBOX_API_KEY = ''

All the parameters and default values defined in https://github.com/apache/incubator-superset/blob/master/superset/config.py can be altered in your local superset_config.py. Administrators will want to read through the file to understand what can be configured locally as well as the default values in place.

Since superset_config.py acts as a Flask configuration module, it can be used to alter the settings of Flask itself, as well as Flask extensions like flask-wtf, flask-cache, flask-migrate, and flask-appbuilder. Flask App Builder, the web framework used by Superset, offers many configuration settings. Please consult the Flask App Builder Documentation for more information on how to configure it.

Make sure to change:
• SQLALCHEMY_DATABASE_URI: by default the metadata database is stored at ~/.superset/superset.db
• SECRET_KEY: set it to a long random string (a generation sketch follows below)
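A minimal sketch for generating such a random string, assuming Python 3.6+ (any sufficiently long random value works):

# Generate a long random string suitable for SECRET_KEY
import secrets
print(secrets.token_urlsafe(42))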


In case you need to exempt endpoints from CSRF, e.g. you are running a custom auth postback endpoint, you can add them to WTF_CSRF_EXEMPT_LIST:

WTF_CSRF_EXEMPT_LIST = ['']

Database dependencies

Superset does not ship bundled with connectivity to databases, except for SQLite, which is part of the Python standard library. You'll need to install the required packages for the database you want to use as your metadata database as well as the packages needed to connect to the databases you want to access through Superset. Here's a list of some of the recommended packages.


database | pypi package | SQLAlchemy URI prefix
Amazon Athena | pip install "PyAthenaJDBC>1.0.9" | awsathena+jdbc://
Amazon Athena | pip install "PyAthena>1.2.0" | awsathena+rest://
Amazon Redshift | pip install sqlalchemy-redshift | redshift+psycopg2://
Apache Drill | pip install sqlalchemy-drill | For the REST API: drill+sadrill:// ; for JDBC: drill+jdbc://
Apache Druid | pip install pydruid | druid://
Apache Hive | pip install pyhive | hive://
Apache Impala | pip install impyla | impala://
Apache Kylin | pip install kylinpy | kylin://
Apache Pinot | pip install pinotdb | pinot+http://CONTROLLER:5436/query?server=http://CONTROLLER:5983/
Apache Spark SQL | pip install pyhive | jdbc+hive://
BigQuery | pip install pybigquery | bigquery://
ClickHouse | pip install sqlalchemy-clickhouse |
CockroachDB | pip install cockroachdb | cockroachdb://
Dremio | pip install sqlalchemy_dremio | dremio://user:pwd@host:31010/
Elasticsearch | pip install elasticsearch-dbapi | elasticsearch+http://
Exasol | pip install sqlalchemy-exasol | exa+pyodbc://
Google Sheets | pip install gsheetsdb | gsheets://
IBM Db2 | pip install ibm_db_sa | db2+ibm_db://
MySQL | pip install mysqlclient | mysql://
Oracle | pip install cx_Oracle | oracle://
PostgreSQL | pip install psycopg2 | postgresql+psycopg2://
Presto | pip install pyhive | presto://
Snowflake | pip install snowflake-sqlalchemy | snowflake://
SQLite | | sqlite://
SQL Server | pip install pymssql | mssql://
Teradata | pip install sqlalchemy-teradata | teradata://
Vertica | pip install sqlalchemy-vertica-python | vertica+vertica_python://
Hana | pip install hdbcli sqlalchemy-hana or pip install apache-superset[hana] | hana://

Note that many other databases are supported, the main criteria being the existence of a functional SQLAlchemy dialect and Python driver. Searching for the keyword "sqlalchemy" together with a keyword that describes the database you want to connect to should get you to the right place.

Hana

The connection string for Hana looks like this:

hana://{username}:{password}@{host}:{port}

(AWS) Athena

The connection string for Athena looks like this:

awsathena+jdbc://{aws_access_key_id}:{aws_secret_access_key}@athena.{region_name}.amazonaws.com/{schema_name}?s3_staging_dir={s3_staging_dir}&...

Where you need to escape/encode at least the s3_staging_dir, i.e., s3://... -> s3%3A//...

You can also use the PyAthena library (no Java required) like this:

awsathena+rest://{aws_access_key_id}:{aws_secret_access_key}@athena.{region_name}.amazonaws.com/{schema_name}?s3_staging_dir={s3_staging_dir}&...

See PyAthena.

(Google) BigQuery

The connection string for BigQuery looks like this:

bigquery://{project_id}

Additionally, you will need to configure authentication via a Service Account. Create your Service Account via the Google Cloud Platform control panel, provide it access to the appropriate BigQuery datasets, and download the JSON configuration file for the service account. In Superset, add a JSON blob to the "Secure Extra" field in the database configuration page with the following format:

{ "credentials_info": }

The resulting file should have this structure:

{
    "credentials_info": {
        "type": "service_account",
        "project_id": "...",
        "private_key_id": "...",
        "private_key": "...",
        "client_email": "...",
        "client_id": "...",
        "auth_uri": "...",
        "token_uri": "...",
        "auth_provider_x509_cert_url": "...",
        "client_x509_cert_url": "..."
    }
}

You should then be able to connect to your BigQuery datasets. To be able to upload data, e.g. sample data, the python library pandas_gbq is required.

Elasticsearch

The connection string for Elasticsearch looks like this

elasticsearch+http://{user}:{password}@{host}:9200/

Using HTTPS

elasticsearch+https://{user}:{password}@{host}:9200/

Elasticsearch has a default limit of 10000 rows, so you can increase this limit on your cluster or set Superset's row limit in your config:

ROW_LIMIT = 10000

You can query multiple indices in SQL Lab, for example:

select timestamp, agent from "logstash-*"

But, to use visualizations for multiple indices you need to create an alias index on your cluster

POST /_aliases
{
    "actions": [
        {"add": {"index": "logstash-*", "alias": "logstash_all"}}
    ]
}

Then register your table with the alias name logstash_all.

Snowflake

The connection string for Snowflake looks like this

snowflake://{user}:{password}@{account}.{region}/{database}?role={role}&warehouse={warehouse}

The schema is not necessary in the connection string, as it is defined per table/query. The role and warehouse can be omitted if defaults are defined for the user, i.e.

snowflake://{user}:{password}@{account}.{region}/{database}

Make sure the user has privileges to access and use all required databases/schemas/tables/views/warehouses, as the Snowflake SQLAlchemy engine does not test for user/role rights during engine creation by default. However, when pressing the "Test Connection" button in the Create or Edit Database dialog, user/role credentials are validated by passing "validate_default_parameters": True to the connect() method during engine creation. If the user/role is not authorized to access the database, an error is recorded in the Superset logs. See Snowflake SQLAlchemy.

Teradata

The connection string for Teradata looks like this

teradata://{user}:{password}@{host}

Note: It's required to have Teradata ODBC drivers installed and environment variables configured for the SQLAlchemy dialect to work properly. Teradata ODBC drivers are available here: https://downloads.teradata.com/download/connectivity/odbc-driver/linux. Required environment variables:

export ODBCINI=/.../teradata/client/ODBC_64/odbc.ini
export ODBCINST=/.../teradata/client/ODBC_64/odbcinst.ini

See Teradata SQLAlchemy.

Apache Drill

At the time of writing, the SQLAlchemy dialect is not available on PyPI and must be downloaded here: SQLAlchemy Drill. Alternatively, you can install it completely from the command line as follows:

git clone https://github.com/JohnOmernik/sqlalchemy-drill
cd sqlalchemy-drill
python3 setup.py install

Once that is done, you can connect to Drill in two ways, either via the REST interface or by JDBC. If you are connecting via JDBC, you must have the Drill JDBC Driver installed. The basic connection string for Drill looks like this

drill+sadrill://{username}:{password}@{host}:{port}/{storage_plugin}?use_ssl=True

If you are using JDBC to connect to Drill, the connection string looks like this:

drill+jdbc://{username}:{password}@{host}:{port}/{storage_plugin}

For a complete tutorial about how to use Apache Drill with Superset, see this tutorial: Visualize Anything with Superset and Drill

Caching

Superset uses Flask-Cache for caching purposes. Configuring your caching backend is as easy as providing a CACHE_CONFIG constant in your superset_config.py that complies with the Flask-Cache specifications. Flask-Cache supports multiple caching backends (Redis, Memcached, SimpleCache (in-memory), or the local filesystem). If you are going to use Memcached, please use the pylibmc client library, as python-memcached does not handle storing binary data correctly. If you use Redis, please install the redis Python package:


pip install redis

Timeouts are set in the Superset metadata and go up the "timeout searchpath": from your slice configuration, to your data source's configuration, to your database's, and ultimately fall back to the global default defined in CACHE_CONFIG.

CACHE_CONFIG = {
    'CACHE_TYPE': 'redis',
    'CACHE_DEFAULT_TIMEOUT': 60 * 60 * 24,  # 1 day default (in secs)
    'CACHE_KEY_PREFIX': 'superset_results',
    'CACHE_REDIS_URL': 'redis://localhost:6379/0',
}

It is also possible to pass a custom cache initialization function in the config to handle additional caching use cases. The function must return an object that is compatible with the Flask-Cache API.

from custom_caching import CustomCache

def init_cache(app):
    """Takes an app instance and returns a custom cache backend"""
    config = {
        'CACHE_DEFAULT_TIMEOUT': 60 * 60 * 24,  # 1 day default (in secs)
        'CACHE_KEY_PREFIX': 'superset_results',
    }
    return CustomCache(app, config)

CACHE_CONFIG = init_cache

Superset has a Celery task that will periodically warm up the cache based on different strategies. To use it, add the following to the CELERYBEAT_SCHEDULE section in config.py:

CELERYBEAT_SCHEDULE = {
    'cache-warmup-hourly': {
        'task': 'cache-warmup',
        'schedule': crontab(minute=0, hour='*'),  # hourly
        'kwargs': {
            'strategy_name': 'top_n_dashboards',
            'top_n': 5,
            'since': '7 days ago',
        },
    },
}

This will cache all the charts in the top 5 most popular dashboards every hour. For other strategies, check the superset/tasks/cache.py file.

Deeper SQLAlchemy integration

It is possible to tweak the database connection information using the parameters exposed by SQLAlchemy. In the Database edit view, you will find an extra field as a JSON blob.

[Screenshot: the Database edit view with the extra JSON field (add_db.png)]

This JSON string contains extra configuration elements. The engine_params object gets unpacked into the sqlalchemy.create_engine call, while the metadata_params get unpacked into the sqlalchemy.MetaData call. Refer to the SQLAlchemy docs for more information.


Note: If you're using CTAS in SQL Lab with PostgreSQL, take a look at Create Table As (CTAS) for specific engine_params.

Schemas (Postgres & Redshift)

Postgres and Redshift, as well as other databases, use the concept of schema as a logical entity on top of the database. For Superset to connect to a specific schema, there’s a schema parameter you can set in the table form.

External Password store for SQLAlchemy connections

It is possible to use an external store for your database passwords. This is useful if you are running a custom secret distribution framework and do not wish to store secrets in Superset's meta database. Example: Write a function that takes a single argument of type sqla.engine.url and returns the password for the given connection string. Then set SQLALCHEMY_CUSTOM_PASSWORD_STORE in your config file to point to that function.

def example_lookup_password(url):
    secret = <>
    return 'secret'

SQLALCHEMY_CUSTOM_PASSWORD_STORE = example_lookup_password

A common pattern is to use environment variables to make secrets available. SQLALCHEMY_CUSTOM_PASSWORD_STORE can also be used for that purpose.

def example_password_as_env_var(url):
    # assuming the uri looks like
    # mysql://localhost?superset_user:{SUPERSET_PASSWORD}
    return url.password.format(os.environ)

SQLALCHEMY_CUSTOM_PASSWORD_STORE = example_password_as_env_var

SSL Access to databases

This example worked with a MySQL database that requires SSL. The configuration may differ with other backends. This is what was put in the extra parameter

{ "metadata_params": {}, "engine_params":{ "connect_args":{ "sslmode":"require", "sslrootcert":"/path/to/my/pem" } } }


Druid

The native Druid connector (behind the DRUID_IS_ACTIVE feature flag) is slowly getting deprecated in favor of the SQLAlchemy/DBAPI connector made available in the pydruid library. To use a custom SSL certificate to validate HTTPS requests, the certificate contents can be entered in the Root Certificate field in the Database dialog. When using a custom certificate, pydruid will automatically use the https scheme. To disable SSL verification, add the following to extras:

"engine_params": {"connect_args": {"scheme": "https", "ssl_verify_cert": false}}

Dremio

Install the following dependencies to connect to Dremio:
• Dremio SQLAlchemy: pip install sqlalchemy_dremio
• Dremio's ODBC driver: https://www.dremio.com/drivers/

Example SQLAlchemy URI: dremio://dremio:dremio123@localhost:31010/dremio

Presto

By default Superset assumes the most recent version of Presto is being used when querying the datasource. If you're using an older version of Presto, you can configure it in the extra parameter:

{ "version":"0.123" }

Exasol

The connection string for Exasol looks like this

exa+pyodbc://{user}:{password}@{host}

Note: It’s required to have Exasol ODBC drivers installed for the sqlalchemy dialect to work properly. Exasol ODBC Drivers available are here: https://www.exasol.com/portal/display/DOWNLOAD/Exasol+Download+Section Example config (odbcinst.ini can be left empty)

$ cat $/.../path/to/odbc.ini
[EXAODBC]
DRIVER = /.../path/to/driver/EXASOL_driver.so
EXAHOST = host:8563
EXASCHEMA = main

See SQLAlchemy for Exasol.

CORS

The extra CORS Dependency must be installed:

pip install apache-superset[cors]

The following keys in superset_config.py can be specified to configure CORS:


• ENABLE_CORS: must be set to True in order to enable CORS
• CORS_OPTIONS: options passed to Flask-CORS (documentation)

Domain Sharding

Chrome allows up to 6 open connections per domain at a time. When there are more than 6 slices in a dashboard, fetch requests are often queued up, waiting for the next available socket. PR 5039 adds domain sharding to Superset, and this feature will be enabled by configuration only (by default Superset doesn't allow cross-domain requests).
• SUPERSET_WEBSERVER_DOMAINS: list of allowed hostnames for the domain sharding feature. Default: None (see the sketch below).
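A minimal sketch of the corresponding superset_config.py entry; the hostnames are hypothetical placeholders and must all resolve to the same Superset instance:

# superset_config.py
# Hypothetical shard hostnames pointing at the same Superset web server
SUPERSET_WEBSERVER_DOMAINS = [
    'superset-1.example.com',
    'superset-2.example.com',
    'superset-3.example.com',
]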

Middleware

Superset allows you to add your own middleware. To do so, update the ADDITIONAL_MIDDLEWARE key in your superset_config.py. ADDITIONAL_MIDDLEWARE should be a list of your additional middleware classes. For example, to use AUTH_REMOTE_USER from behind a proxy server like nginx, you have to add a simple middleware class to add the value of HTTP_X_PROXY_REMOTE_USER (or any other custom header from the proxy) to Gunicorn's REMOTE_USER environment variable:

class RemoteUserMiddleware(object):
    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        user = environ.pop('HTTP_X_PROXY_REMOTE_USER', None)
        environ['REMOTE_USER'] = user
        return self.app(environ, start_response)

ADDITIONAL_MIDDLEWARE = [RemoteUserMiddleware, ]

Adapted from http://flask.pocoo.org/snippets/69/

Event Logging

Superset by default logs special action events to its database. These logs can be accessed in the UI by navigating to "Security" -> "Action Log". You can freely customize these logs by implementing your own event log class. Example of a simple JSON-to-stdout class:

class JSONStdOutEventLogger(AbstractEventLogger):

    def log(self, user_id, action, *args, **kwargs):
        records = kwargs.get('records', list())
        dashboard_id = kwargs.get('dashboard_id')
        slice_id = kwargs.get('slice_id')
        duration_ms = kwargs.get('duration_ms')
        referrer = kwargs.get('referrer')

        for record in records:
            log = dict(
                action=action,
                json=record,
                dashboard_id=dashboard_id,
                slice_id=slice_id,
                duration_ms=duration_ms,
                referrer=referrer,
                user_id=user_id,
            )
            print(json.dumps(log))

Then on Superset's config pass an instance of the logger type you want to use:

EVENT_LOGGER = JSONStdOutEventLogger()

Upgrading

Upgrading should be as straightforward as running:

pip install apache-superset --upgrade
superset db upgrade
superset init

We recommend following standard best practices when upgrading Superset, such as taking a database backup prior to the upgrade, upgrading a staging environment prior to upgrading production, and upgrading production while fewer users are active on the platform.

Note: Some upgrades may contain backward-incompatible changes or require scheduling downtime. When that is the case, contributors attach notes in UPDATING.md in the repository. It's recommended to review this file prior to running an upgrade.

Celery Tasks

On large analytic databases, it's common to run queries that execute for minutes or hours. To enable support for long-running queries that execute beyond the typical web request's timeout (30-60 seconds), it is necessary to configure an asynchronous backend for Superset, which consists of:
• one or many Superset workers (implemented as Celery workers), which can be started with the celery worker command; run celery worker --help to view the related options
• a celery broker (message queue), for which we recommend using Redis or RabbitMQ
• a results backend that defines where the worker will persist the query results

Configuring Celery requires defining a CELERY_CONFIG in your superset_config.py. Both the worker and web server processes should have the same configuration.

class CeleryConfig(object):
    BROKER_URL = 'redis://localhost:6379/0'
    CELERY_IMPORTS = (
        'superset.sql_lab',
        'superset.tasks',
    )
    CELERY_RESULT_BACKEND = 'redis://localhost:6379/0'
    CELERYD_LOG_LEVEL = 'DEBUG'
    CELERYD_PREFETCH_MULTIPLIER = 10
    CELERY_ACKS_LATE = True
    CELERY_ANNOTATIONS = {
        'sql_lab.get_sql_results': {
            'rate_limit': '100/s',
        },
        'email_reports.send': {
            'rate_limit': '1/s',
            'time_limit': 120,
            'soft_time_limit': 150,
            'ignore_result': True,
        },
    }
    CELERYBEAT_SCHEDULE = {
        'email_reports.schedule_hourly': {
            'task': 'email_reports.schedule_hourly',
            'schedule': crontab(minute=1, hour='*'),
        },
    }

CELERY_CONFIG = CeleryConfig

• To start a Celery worker to leverage the configuration, run:

celery worker --app=superset.tasks.celery_app:app --pool=prefork -O fair -c 4

• To start a job which schedules periodic background jobs, run:

celery beat --app=superset.tasks.celery_app:app

To set up a results backend, you need to pass an instance of a derivative of werkzeug.contrib.cache.BaseCache to the RESULTS_BACKEND configuration key in your superset_config.py. It's possible to use Memcached, Redis, S3 (https://pypi.python.org/pypi/s3werkzeugcache), memory or the file system (in a single-server-type setup or for testing), or to write your own caching interface. Your superset_config.py may look something like:

# On S3
from s3cache.s3cache import S3Cache
S3_CACHE_BUCKET = 'foobar-superset'
S3_CACHE_KEY_PREFIX = 'sql_lab_result'
RESULTS_BACKEND = S3Cache(S3_CACHE_BUCKET, S3_CACHE_KEY_PREFIX)

# On Redis
from werkzeug.contrib.cache import RedisCache
RESULTS_BACKEND = RedisCache(
    host='localhost', port=6379, key_prefix='superset_results')

For performance gains, MessagePack and PyArrow are now used for results serialization. This can be disabled by setting RESULTS_BACKEND_USE_MSGPACK = False in your configuration, should any issues arise. Please clear your existing results cache store when upgrading an existing environment.

Important notes
• It is important that all the worker nodes and web servers in the Superset cluster share a common metadata database. This means that SQLite will not work in this context since it has limited support for concurrency and typically lives on the local file system.
• There should only be one instance of celery beat running in your entire setup. If not, background jobs can get scheduled multiple times, resulting in weird behaviors like duplicate delivery of reports, higher than expected load / traffic etc.
• SQL Lab will only run your queries asynchronously if you enable "Asynchronous Query Execution" in your database settings.

Email Reports

Email reports allow users to schedule email reports for:
• chart and dashboard visualization (attachment or inline)
• chart data (CSV attachment on inline table)

Setup

Make sure you enable email reports in your configuration file:

ENABLE_SCHEDULED_EMAIL_REPORTS = True

Now you will find two new items in the navigation bar that allow you to schedule email reports:
• Manage -> Dashboard Emails
• Manage -> Chart Email Schedules

Schedules are defined in crontab format and each schedule can have a list of recipients (all of them can receive a single mail, or separate mails). For audit purposes, all outgoing mails can have a mandatory bcc. In order to get picked up, you need to configure a celery worker and a celery beat (see section above "Celery Tasks"). Your celery configuration also needs an entry email_reports.schedule_hourly for CELERYBEAT_SCHEDULE. To send emails you need to configure SMTP settings in your configuration file, e.g.:

EMAIL_NOTIFICATIONS = True

SMTP_HOST = "email-smtp.eu-west-1.amazonaws.com"
SMTP_STARTTLS = True
SMTP_SSL = False
SMTP_USER = "smtp_username"
SMTP_PORT = 25
SMTP_PASSWORD = os.environ.get("SMTP_PASSWORD")
SMTP_MAIL_FROM = "[email protected]"

To render dashboards you need to install a local browser on your Superset instance:
• geckodriver and Firefox is preferred
• chromedriver is a good option too

You need to adjust EMAIL_REPORTS_WEBDRIVER accordingly in your configuration (see the sketch after the example below). You also need to specify on behalf of which username to render the dashboards. In general, dashboards and charts are not accessible to unauthorized requests, which is why the worker needs to take over the credentials of an existing user to take a snapshot.

EMAIL_REPORTS_USER = 'username_with_permission_to_access_dashboards'
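A minimal sketch of the matching webdriver setting in superset_config.py; the 'firefox' value is an assumption for illustration (matching the geckodriver/Firefox option above), not a default confirmed by this document:

# superset_config.py
# Hypothetical example: render reports with Firefox/geckodriver
EMAIL_REPORTS_WEBDRIVER = 'firefox'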

Important notes
• Be mindful of the concurrency setting for celery (using -c 4). Selenium/webdriver instances can consume a lot of CPU / memory on your servers.


• In some cases, if you notice a lot of leaked geckodriver processes, try running your celery processes with

celery worker --pool=prefork --max-tasks-per-child=128 ...

• It is recommended to run separate workers for the sql_lab and email_reports tasks. This can be done using the queue field in CELERY_ANNOTATIONS.
• Adjust WEBDRIVER_BASEURL in your config if celery workers can't access Superset via its default value http://0.0.0.0:8080/ (notice the port number 8080; many other setups use port 8088). See the sketch below.
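A minimal sketch of such an override, assuming a hypothetical internal hostname the celery workers can reach:

# superset_config.py
# Hypothetical internal address reachable from the celery workers
WEBDRIVER_BASEURL = 'http://superset-web.internal:8088/'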

SQL Lab

SQL Lab is a powerful SQL IDE that works with all SQLAlchemy-compatible databases. By default, queries are executed in the scope of a web request, so they may eventually time out as queries exceed the maximum duration of a web request in your environment, whether it be a reverse proxy or the Superset server itself. In such cases, it is preferred to use celery to run the queries in the background. Please follow the examples/notes mentioned above to get your celery setup working.

Also note that SQL Lab supports Jinja templating in queries and that it's possible to overload the default Jinja context in your environment by defining JINJA_CONTEXT_ADDONS in your superset configuration. Objects referenced in this dictionary are made available for users to use in their SQL.

JINJA_CONTEXT_ADDONS = {
    'my_crazy_macro': lambda x: x*2,
}

SQL Lab also includes a live query validation feature with pluggable backends. You can configure which validation implementation is used with which database engine by adding a block like the following to your config.py:

FEATURE_FLAGS = {
    'SQL_VALIDATORS_BY_ENGINE': {
        'presto': 'PrestoDBSQLValidator',
    }
}

The available validators and names can be found in sql_validators/.

Scheduling queries

You can optionally allow your users to schedule queries directly in SQL Lab. This is done by adding extra meta