SQream DB Release 2021.1

Sep 27, 2021

Contents:

1 First steps with SQream DB 3 1.1 Preparing for this tutorial ...... 3 1.2 Create a new database for playing around in ...... 4 1.3 Creating your first table ...... 4 1.4 Listing tables ...... 5 1.5 Inserting rows ...... 5 1.6 Queries ...... 6 1.7 Deleting rows ...... 8 1.8 Saving query results to a CSV or PSV file ...... 8

2 Guides 11 2.1 Full list of guides ...... 11 2.1.1 Inserting data ...... 11 2.1.1.1 Data loading overview ...... 12 2.1.1.2 Data loading considerations ...... 12 2.1.1.2.1 Verify data and performance after load ...... 12 2.1.1.2.2 File source location for loading ...... 13 2.1.1.2.3 Supported load methods ...... 13 2.1.1.2.4 Unsupported data types ...... 13 2.1.1.2.5 Extended error handling ...... 13 2.1.1.2.6 Best practices for CSV ...... 13 2.1.1.2.7 Best practices for Parquet ...... 14 2.1.1.2.7.1 Type support and behavior notes ...... 14 2.1.1.2.8 Best practices for ORC ...... 14 2.1.1.2.8.1 Type support and behavior notes ...... 14 2.1.1.3 Further reading and migration guides ...... 15 2.1.1.3.1 Insert from CSV ...... 15 2.1.1.3.1.1 1. Prepare CSVs ...... 16 2.1.1.3.1.2 2. Place CSVs where SQream DB workers can access ..... 16 2.1.1.3.1.3 3. Figure out the table structure ...... 17 2.1.1.3.1.4 4. Bulk load the data with COPY FROM ...... 18 2.1.1.3.1.5 Loading different types of CSV files ...... 18 2.1.1.3.1.6 Loading a standard CSV file from a local filesystem ..... 18 2.1.1.3.1.7 Loading a PSV (pipe separated value) file ...... 18 2.1.1.3.1.8 Loading a TSV (tab separated value) file ...... 18 2.1.1.3.1.9 Loading a text file with non-printable delimiter ...... 18

i 2.1.1.3.1.10 Loading a text file with multi-character delimiters ...... 18 2.1.1.3.1.11 Loading files with a header row ...... 19 2.1.1.3.1.12 Loading files formatted for Windows (\r\n) ...... 19 2.1.1.3.1.13 Loading a file from a public S3 bucket ...... 19 2.1.1.3.1.14 Loading files from an authenticated S3 bucket ...... 19 2.1.1.3.1.15 Loading files from an HDFS storage ...... 19 2.1.1.3.1.16 Saving rejected rows to a file ...... 19 2.1.1.3.1.17 Stopping the load if a certain amount of rows were rejected . 19 2.1.1.3.1.18 Load CSV files from a set of directories ...... 20 2.1.1.3.1.19 Rearrange destination columns ...... 20 2.1.1.3.1.20 Loading non-standard dates ...... 20 2.1.1.3.2 Insert from Parquet ...... 20 2.1.1.3.2.1 1. Prepare the files ...... 21 2.1.1.3.2.2 2. Place Parquet files where SQream DB workers can access them ...... 21 2.1.1.3.2.3 3. Figure out the table structure ...... 22 2.1.1.3.2.4 4. Verify table contents ...... 23 2.1.1.3.2.5 5. Copying data into SQream DB ...... 23 2.1.1.3.2.6 Working around unsupported column types ...... 23 2.1.1.3.2.7 Modifying data during the copy process ...... 24 2.1.1.3.2.8 Further Parquet loading examples ...... 24 2.1.1.3.2.9 Loading a table from a directory of Parquet files on HDFS . 24 2.1.1.3.2.10 Loading a table from a bucket of files on S3 ...... 24 2.1.1.3.3 Insert from ORC ...... 25 2.1.1.3.3.1 1. Prepare the files ...... 25 2.1.1.3.3.2 2. Place ORC files where SQream DB workers can access them 25 2.1.1.3.3.3 3. Figure out the table structure ...... 26 2.1.1.3.3.4 4. Verify table contents ...... 27 2.1.1.3.3.5 5. Copying data into SQream DB ...... 27 2.1.1.3.3.6 Working around unsupported column types ...... 27 2.1.1.3.3.7 Modifying data during the copy process ...... 28 2.1.1.3.3.8 Further ORC loading examples ...... 28 2.1.1.3.3.9 Loading a table from a directory of ORC files on HDFS ... 28 2.1.1.3.3.10 Loading a table from a bucket of files on S3 ...... 28 2.1.1.3.4 Migrating from Oracle ...... 
29 2.1.1.3.4.1 1. Preparing the tools and login information ...... 29 2.1.1.3.4.2 2. Export the desired schema ...... 30 2.1.1.3.4.3 Examples ...... 30 2.1.1.3.4.4 Dump all tables ...... 30 2.1.1.3.4.5 Dump only specific tables ...... 30 2.1.1.3.4.6 3. Convert the Oracle dump to standard SQL ...... 30 2.1.1.3.4.7 Example ...... 30 2.1.1.3.4.8 4. Figure out the database structures ...... 30 2.1.1.3.4.9 Remove unsupported attributes ...... 31 2.1.1.3.4.10 Match data types ...... 31 2.1.1.3.4.11 Additional considerations ...... 31 2.1.1.3.4.12 5. Create the tables in SQream DB ...... 31 2.1.1.3.4.13 Example ...... 31 2.1.1.3.4.14 6. Export tables to CSVs ...... 33 2.1.1.3.4.15 Using SQL*Plus to export data lists ...... 33 2.1.1.3.4.16 Creating CSVs using stored procedures ...... 34 2.1.1.3.4.17 CSV generation considerations ...... 34 2.1.1.3.4.18 7. Place CSVs where SQream DB workers can access ..... 34 2.1.1.3.4.19 8. Bulk load the CSVs ...... 35 ii 2.1.1.3.4.20 Example ...... 35 2.1.1.3.4.21 9. Rewrite Oracle queries ...... 35 2.1.2 Client drivers for v2021.1 ...... 35 2.1.2.1 Client driver downloads ...... 36 2.1.2.1.1 All operating systems ...... 36 2.1.2.1.2 Windows ...... 36 2.1.2.1.3 ...... 36 2.1.2.1.3.1 Python (pysqream) ...... 36 2.1.2.1.3.2 Installing the Python connector ...... 37 2.1.2.1.3.3 Prerequisites ...... 37 2.1.2.1.3.4 1. Python ...... 37 2.1.2.1.3.5 2. PIP ...... 37 2.1.2.1.3.6 3. OpenSSL for Linux ...... 38 2.1.2.1.3.7 4. Cython (optional) ...... 38 2.1.2.1.3.8 Install via pip ...... 38 2.1.2.1.3.9 Upgrading an existing installation ...... 39 2.1.2.1.3.10 Validate the installation ...... 39 2.1.2.1.3.11 SQLAlchemy examples ...... 40 2.1.2.1.3.12 A simple connection example ...... 40 2.1.2.1.3.13 Pulling a table into Pandas ...... 40 2.1.2.1.3.14 API Examples ...... 41 2.1.2.1.3.15 Explaining the connection example ...... 41 2.1.2.1.3.16 Using the cursor ...... 42 2.1.2.1.3.17 Reading result metadata ...... 
43 2.1.2.1.3.18 Loading data into a table ...... 44 2.1.2.1.3.19 Reading data from a CSV file for load into a table ...... 45 2.1.2.1.3.20 Using SQLAlchemy ORM to create tables and fill them with data ...... 46 2.1.2.1.3.21 pysqream API reference ...... 47 2.1.2.1.3.22 C++ Driver ...... 49 2.1.2.1.3.23 Installing the C++ driver ...... 49 2.1.2.1.3.24 Prerequisites ...... 49 2.1.2.1.3.25 Getting the library ...... 50 2.1.2.1.3.26 Extract the tarball archive ...... 50 2.1.2.1.3.27 Examples ...... 50 2.1.2.1.3.28 Testing the connection to SQream DB ...... 50 2.1.2.1.3.29 Compiling and running the application ...... 51 2.1.2.1.3.30 Creating a table and inserting values ...... 51 2.1.2.1.3.31 Compiling and running the application ...... 52 2.1.2.1.3.32 JDBC ...... 52 2.1.2.1.3.33 Installing the JDBC driver ...... 53 2.1.2.1.3.34 Prerequisites ...... 53 2.1.2.1.3.35 Getting the JAR file ...... 53 2.1.2.1.3.36 Extract the zip archive ...... 53 2.1.2.1.3.37 Setting up the Class Path ...... 53 2.1.2.1.3.38 Connect to SQream DB with a JDBC application ...... 53 2.1.2.1.3.39 Driver class ...... 53 2.1.2.1.3.40 Connection string ...... 54 2.1.2.1.3.41 Connection parameters ...... 54 2.1.2.1.3.42 Connection string examples ...... 54 2.1.2.1.3.43 Sample Java program ...... 54 2.1.2.1.3.44 ODBC ...... 56 2.1.2.1.3.45 Install and Configure ODBC on Windows ...... 56 2.1.2.1.3.46 Installing the ODBC Driver ...... 56

iii 2.1.2.1.3.47 Prerequisites ...... 56 2.1.2.1.3.48 Visual Studio 2015 Redistributables ...... 56 2.1.2.1.3.49 Administrator Privileges ...... 57 2.1.2.1.3.50 1. Run the Windows installer ...... 57 2.1.2.1.3.51 2. Selecting Components ...... 57 2.1.2.1.3.52 3. Configuring the ODBC Driver DSN ...... 58 2.1.2.1.3.53 Connection Parameters ...... 61 2.1.2.1.3.54 Troubleshooting ...... 61 2.1.2.1.3.55 Solving “Code 126” ODBC errors ...... 61 2.1.2.1.3.56 Install and configure ODBC on Linux ...... 61 2.1.2.1.3.57 Prerequisites ...... 62 2.1.2.1.3.58 unixODBC ...... 62 2.1.2.1.3.59 Install unixODBC on RHEL 7 / CentOS 7 ...... 62 2.1.2.1.3.60 Install unixODBC on Ubuntu ...... 62 2.1.2.1.3.61 Install the ODBC driver with a script ...... 62 2.1.2.1.3.62 Install the ODBC driver manually ...... 63 2.1.2.1.3.63 Install the driver dependencies ...... 64 2.1.2.1.3.64 Testing the connection ...... 64 2.1.2.1.3.65 ODBC DSN Parameters ...... 66 2.1.2.1.3.66 Downloading the ODBC driver ...... 67 2.1.2.1.3.67 Install and configure the ODBC driver ...... 67 2.1.2.1.3.68 Node.JS ...... 67 2.1.2.1.3.69 Installing the Node.JS driver ...... 68 2.1.2.1.3.70 Prerequisites ...... 68 2.1.2.1.3.71 Install with NPM ...... 68 2.1.2.1.3.72 Install from an offline package ...... 68 2.1.2.1.3.73 Connect to SQream DB with a Node.JS application ..... 69 2.1.2.1.3.74 Create a simple test ...... 69 2.1.2.1.3.75 Run the test ...... 69 2.1.2.1.3.76 API reference ...... 70 2.1.2.1.3.77 Connection parameters ...... 70 2.1.2.1.3.78 Events ...... 70 2.1.2.1.3.79 Example ...... 70 2.1.2.1.3.80 Input placeholders ...... 70 2.1.2.1.3.81 Examples ...... 71 2.1.2.1.3.82 Setting configuration flags ...... 71 2.1.2.1.3.83 Lazyloading ...... 71 2.1.2.1.3.84 Reusing a connection ...... 72 2.1.2.1.3.85 Using placeholders in queries ...... 73 2.1.2.1.3.86 Troubleshooting and recommended configuration ...... 73 2.1.2.1.3.87 Preventing heap out of memory errors ...... 73 2.1.2.1.3.88 BIGINT support ...... 
73 2.1.3 Third Party Tools ...... 74 2.1.3.1 Overview ...... 74 2.1.3.1.1 Connect to SQream Using SQL Workbench ...... 74 2.1.3.1.1.1 Installing SQL Workbench with the SQream DB installer (Windows only) ...... 75 2.1.3.1.1.2 Installing SQL Workbench manually (Linux, MacOS) ...... 76 2.1.3.1.1.3 Install Java Runtime ...... 76 2.1.3.1.1.4 Get the SQream DB JDBC driver ...... 77 2.1.3.1.1.5 Install SQL Workbench ...... 77 2.1.3.1.1.6 Setting up the SQream DB JDBC driver profile ...... 77 2.1.3.1.1.7 Create a new connection profile for your cluster ...... 80 2.1.3.1.1.8 Suggested optional configuration ...... 81 2.1.3.1.2 Connecting to SQream Using Tableau ...... 81 2.1.3.1.2.1 Overview ...... 81 2.1.3.1.2.2 Installing the JDBC Driver and Tableau Connector Plugin ...... 82 2.1.3.1.2.3 Installing the JDBC Driver Using the Windows Installer ...... 82 2.1.3.1.2.4 Installing the JDBC Driver Manually ...... 82 2.1.3.1.2.5 Installing the ODBC Driver for Tableau Versions 2019.3 and Earlier ...... 84 2.1.3.1.2.6 Automatically Reconfiguring the ODBC Driver After Initial Installation ...... 84 2.1.3.1.2.7 Manually Reconfiguring the ODBC Driver After Initial Installation ...... 85 2.1.3.1.2.8 Configuring the ODBC Connection ...... 86 2.1.3.1.2.9 Connecting Tableau to SQream ...... 88 2.1.3.1.2.10 Connecting to SQream ...... 89 2.1.3.1.2.11 Setting Up SQream Tables as Data Sources ...... 89 2.1.3.1.2.12 Tableau Best Practices and Troubleshooting ...... 90 2.1.3.1.2.13 Inserting Only Required Data ...... 90 2.1.3.1.2.14 Using Tableau’s Table Query Syntax ...... 90 2.1.3.1.2.15 Creating a Separate Service for Tableau ...... 91 2.1.3.1.2.16 Troubleshooting Workbook Performance Before Deploying to the Tableau Server ...... 91 2.1.3.1.2.17 Troubleshooting Error Codes ...... 91 2.1.3.1.3 Connect to SQream Using Pentaho Data Integration ...... 92 2.1.3.1.3.1 Overview ...... 92 2.1.3.1.3.2 Installing Pentaho ...... 92 2.1.3.1.3.3 Installing and Setting Up the JDBC Driver ......
93 2.1.3.1.3.4 Creating a Transformation ...... 93 2.1.3.1.3.5 Defining Your Output ...... 100 2.1.3.1.3.6 Importing Data ...... 103 2.1.3.1.4 Connect to SQream Using MicroStrategy ...... 108 2.1.3.1.4.1 Overview ...... 108 2.1.3.1.4.2 What is MicroStrategy? ...... 108 2.1.3.1.4.3 Connecting a Data Source ...... 108 2.1.3.1.4.4 Supported SQream Drivers ...... 110 2.1.3.1.5 Connect to SQream Using R ...... 110 2.1.3.1.5.1 JDBC ...... 111 2.1.3.1.5.2 A full example ...... 112 2.1.3.1.5.3 ODBC ...... 112 2.1.3.1.5.4 A full example ...... 113 2.1.3.1.6 Connect to SQream Using PHP ...... 113 2.1.3.1.6.1 Prerequisites ...... 113 2.1.3.1.6.2 Testing the connection ...... 113 2.1.3.1.7 Connect to SQream Using SAS Viya ...... 114 2.1.3.1.7.1 Installing SAS Viya ...... 114 2.1.3.1.7.2 Installing the JDBC driver ...... 114 2.1.3.1.7.3 Configuring the JDBC driver in SAS Viya ...... 115 2.1.3.1.7.4 Browsing data and workbooks ...... 116 2.1.3.1.7.5 SAS Viya best practices and troubleshooting ...... 118 2.1.3.1.7.6 Cut out what you don’t need ...... 118 2.1.3.1.7.7 Create a separate service for SAS Viya ...... 118 2.1.3.1.7.8 Troubleshooting java.lang.ClassNotFoundException: com.sqream.jdbc.SQDriver exceptions ...... 118 2.1.4 Operations ...... 119 2.1.4.1 Optimization and Best Practices ...... 119

v 2.1.4.1.1 Table design ...... 119 2.1.4.1.1.1 Use date and datetime types for columns ...... 120 2.1.4.1.1.2 Reduce varchar length to a minimum ...... 120 2.1.4.1.1.3 Don’t flatten or denormalize data ...... 120 2.1.4.1.1.4 Convert foreign tables to native tables ...... 120 2.1.4.1.1.5 Use information about the column data to your advantage . 120 2.1.4.1.1.6 Set NULL or NOT NULL when relevant ...... 120 2.1.4.1.1.7 Keep VARCHAR lengths to a minimum ...... 121 2.1.4.1.2 Sorting ...... 121 2.1.4.1.3 Query best practices ...... 121 2.1.4.1.3.1 Reduce data sets before joining tables ...... 121 2.1.4.1.3.2 Prefer the ANSI JOIN ...... 121 2.1.4.1.3.3 Use the high selectivity hint ...... 122 2.1.4.1.3.4 Cast smaller types to avoid overflow in aggregates ...... 122 2.1.4.1.3.5 Prefer COUNT(*) and COUNT on non-nullable columns ..... 123 2.1.4.1.3.6 Return only required columns ...... 123 2.1.4.1.3.7 Use saved queries to reduce recurring compilation time ... 123 2.1.4.1.3.8 Pre-filter to reduce JOIN complexity ...... 123 2.1.4.1.4 Data loading considerations ...... 124 2.1.4.1.4.1 Allow and use natural sorting on data ...... 124 2.1.4.1.5 Further reading and monitoring query performance ...... 124 2.1.4.2 Installing Monit ...... 124 2.1.4.2.1 Getting Started ...... 124 2.1.4.2.2 Overview ...... 124 2.1.4.2.2.1 Installing Monit on CentOS: ...... 124 2.1.4.2.2.2 Installing Monit on CentOS Offline: ...... 125 2.1.4.2.2.3 Building Monit from Source Code ...... 125 2.1.4.2.2.4 Building Monit from Pre-Built Binaries ...... 125 2.1.4.2.2.5 Installing Monit on Ubuntu: ...... 126 2.1.4.2.2.6 Installing Monit on Ubuntu Offline: ...... 126 2.1.4.2.3 Configuring Monit ...... 126 2.1.4.2.4 Starting Monit ...... 127 2.1.4.3 Launching SQream with Monit ...... 128 2.1.4.3.1 Launching SQream ...... 128 2.1.4.3.2 Monit Usage Examples ...... 131 2.1.4.3.2.1 Stopping Monit and SQream Separately ...... 131 2.1.4.3.2.2 Stopping SQream Using a Monit Command ...... 
131 2.1.4.3.2.3 Monit Command Line Options ...... 131 2.1.4.3.3 Using Monit While Upgrading Your Version of SQream ...... 132 2.1.4.4 Installing SQream Using Binary Packages ...... 133 2.1.4.4.1 Upgrading SQream Version ...... 135 2.1.4.5 Installing SQream with Kubernetes ...... 137 2.1.4.5.1 Preparing the SQream Environment to Launch SQream Using Kubernetes ...... 137 2.1.4.5.1.1 Overview ...... 137 2.1.4.5.1.2 Operating System Requirements ...... 138 2.1.4.5.1.3 Compute Server Specifications ...... 138 2.1.4.5.2 Setting Up Your Hosts ...... 138 2.1.4.5.2.1 Configuring the Hosts File ...... 138 2.1.4.5.2.2 Installing the Required Packages ...... 139 2.1.4.5.2.3 Disabling the Linux UI ...... 139 2.1.4.5.2.4 Disabling SELinux ...... 140 2.1.4.5.2.5 Disabling Your Firewall ...... 140 2.1.4.5.2.6 Checking the CUDA Version ...... 140 2.1.4.5.3 Installing Your Kubernetes Cluster ...... 141 2.1.4.5.3.1 Generating and Sharing SSH Keypairs Across All Existing Nodes ...... 141 2.1.4.5.3.2 Installing and Deploying a Kubernetes Cluster Using Kubespray ...... 142 2.1.4.5.3.3 Adjusting Kubespray Deployment Values ...... 145 2.1.4.5.3.4 Checking Your Kubernetes Status ...... 146 2.1.4.5.3.5 Adding a SQream Label to Your Kubernetes Cluster Nodes ...... 147 2.1.4.5.3.6 Copying Your Kubernetes Configuration API File to the Master Cluster Nodes ...... 148 2.1.4.5.3.7 Creating an env_file in Your Home Directory ...... 149 2.1.4.5.3.8 Creating a Base Kubernetes Namespace ...... 150 2.1.4.5.3.9 Pushing the env_file File to the Kubernetes Configmap ...... 150 2.1.4.5.3.10 Installing the Docker2 Toolkit ...... 150 2.1.4.5.3.11 Installing the NVIDIA Docker2 Toolkit on an x86_64 Bit Processor on CentOS ...... 150 2.1.4.5.3.12 Installing the NVIDIA Docker2 Toolkit on an x86_64 Bit Processor on Ubuntu ...... 151 2.1.4.5.3.13 Modifying the Docker Daemon JSON File for GPU and Compute Nodes ...... 152 2.1.4.5.3.14 Modifying the Docker Daemon JSON File for GPU Nodes ......
152 2.1.4.5.3.15 Modifying the Docker Daemon JSON File for Compute Nodes 153 2.1.4.5.3.16 Installing the Nvidia-device-plugin Daemonset ...... 153 2.1.4.5.3.17 Creating an Nvidia Device Plugin ...... 154 2.1.4.5.3.18 Checking GPU Resources Allocatable to GPU Nodes .... 154 2.1.4.5.3.19 Preparing the WatchDog Monitor ...... 155 2.1.4.5.4 Installing the SQream Software ...... 155 2.1.4.5.4.1 Getting the SQream Package ...... 155 2.1.4.5.4.2 Setting Up and Configuring Hadoop ...... 156 2.1.4.5.4.3 Starting a Local Docker Image Registry ...... 156 2.1.4.5.4.4 Installing the Kubernetes Dashboard ...... 157 2.1.4.5.4.5 Installing the SQream Prometheus Package ...... 159 2.1.4.5.4.6 Installing the Exporter Service ...... 159 2.1.4.5.4.7 Checking the Exporter Status ...... 160 2.1.4.5.5 Running the Sqream-install Service ...... 160 2.1.4.5.5.1 Installing Your License ...... 160 2.1.4.5.5.2 Changing Your Data Ingest Folder ...... 161 2.1.4.5.5.3 Checking Your System Settings ...... 161 2.1.4.5.5.4 SQream Installation Command Reference ...... 161 2.1.4.5.5.5 Controlling Your Kubernetes Cluster Using SQream Flags .. 162 2.1.4.5.6 Using the sqream-start Commands ...... 163 2.1.4.5.6.1 Starting Your SQream Services ...... 163 2.1.4.5.6.2 Starting Your SQream Services in Split Mode ...... 164 2.1.4.5.6.3 Starting the Sqream Studio UI ...... 165 2.1.4.5.6.4 Stopping the SQream Services ...... 165 2.1.4.5.6.5 Advanced sqream-start Commands ...... 166 2.1.4.5.6.6 Controlling Your SQream Spool Size ...... 166 2.1.4.5.6.7 Using a Custom .json File ...... 166 2.1.4.5.6.8 Checking the Status of the SQream Services ...... 166 2.1.4.5.7 Upgrading Your SQream Version ...... 166 2.1.4.5.7.1 Before Upgrading Your System ...... 167 2.1.4.5.7.2 Upgrading Your System ...... 167 2.1.4.6 Running SQream in a Docker Container ...... 167 2.1.4.6.1 Setting Up a Host ...... 168

2.1.4.6.1.1 Operating System Requirements ...... 168 2.1.4.6.1.2 Creating a Local User ...... 168 2.1.4.6.1.3 Setting a Local Language ...... 169 2.1.4.6.1.4 Adding the EPEL Repository ...... 169 2.1.4.6.1.5 Installing the Required NTP Packages ...... 169 2.1.4.6.1.6 Installing the Recommended Tools ...... 169 2.1.4.6.1.7 Updating to the Current Version of the Operating System ...... 169 2.1.4.6.1.8 Configuring the NTP Package ...... 170 2.1.4.6.1.9 Configuring the Performance Profile ...... 170 2.1.4.6.1.10 Configuring Your Security Limits ...... 170 2.1.4.6.1.11 Disabling Automatic Bug-Reporting Tools ...... 171 2.1.4.6.1.12 Preparing the Nvidia CUDA Driver for Installation ...... 171 2.1.4.6.1.13 Installing the Nvidia CUDA Driver ...... 172 2.1.4.6.1.14 Installing the CUDA Driver Version 10.1 for x86_64 ...... 172 2.1.4.6.1.15 Installing the CUDA Driver Version 10.1 for IBM Power9 ...... 174 2.1.4.6.2 Installing the Docker Engine (Community Edition) ...... 175 2.1.4.6.2.1 Installing the Docker Engine Using an x86_64 Processor on CentOS ...... 176 2.1.4.6.2.2 Installing the Docker Engine Using an x86_64 Processor on Ubuntu ...... 176 2.1.4.6.2.3 Installing the Docker Engine on an IBM Power9 Processor ...... 176 2.1.4.6.3 Docker Post-Installation ...... 176 2.1.4.6.4 Installing the Nvidia Docker2 Toolkit ...... 177 2.1.4.6.4.1 Installing the NVIDIA Docker2 Toolkit on an x86_64 Processor ...... 177 2.1.4.6.4.2 Installing the NVIDIA Docker2 Toolkit on a CentOS Operating System ...... 177 2.1.4.6.4.3 Installing the NVIDIA Docker2 Toolkit on an Ubuntu Operating System ...... 178 2.1.4.6.4.4 Installing the NVIDIA Docker2 Toolkit on a PPC64le Processor ...... 179 2.1.4.6.4.5 Accessing the Hadoop and Kubernetes Configuration Files ...... 180 2.1.4.6.5 Installing the SQream Software ...... 180 2.1.4.6.5.1 Preparing Your Local Environment ...... 180 2.1.4.6.5.2 Deploying the SQream Software ......
180 2.1.4.6.5.3 Configuring the Hadoop and Kubernetes Configuration Files 181 2.1.4.6.5.4 Configuring the SQream Software ...... 182 2.1.4.6.5.5 Configuring Your Local Environment ...... 182 2.1.4.6.5.6 Installing Your License ...... 183 2.1.4.6.5.7 Validating Your License ...... 183 2.1.4.6.5.8 Setting the Hadoop and Kubernetes Connectivity Parameters 184 2.1.4.6.5.9 Modifying Your Data Ingest Folder ...... 184 2.1.4.6.5.10 Configuring Your Network for Docker ...... 184 2.1.4.6.5.11 Checking and Verifying Your System Settings ...... 185 2.1.4.6.6 Using the SQream Console ...... 185 2.1.4.6.6.1 SQream Console - Basic Commands ...... 185 2.1.4.6.6.2 Starting Your SQream Console ...... 186 2.1.4.6.6.3 Starting the SQream Master ...... 186 2.1.4.6.6.4 Starting SQream Workers ...... 186 2.1.4.6.6.5 Listing the Running Services ...... 186 2.1.4.6.6.6 Stopping the Running Services ...... 187 2.1.4.6.6.7 Using SQream Studio ...... 187 2.1.4.6.6.8 Using the SQream Client ...... 188 2.1.4.6.6.9 Moving from Docker Installation to Standard On-Premises Installation ...... 188 2.1.4.6.6.10 SQream Console - Advanced Commands ...... 188 viii 2.1.4.6.6.11 Controlling the Spool Size ...... 188 2.1.4.6.6.12 Splitting a GPU ...... 189 2.1.4.6.6.13 Splitting GPU and Setting the Spool Size ...... 189 2.1.4.6.6.14 Using a Custom Configuration File ...... 189 2.1.4.6.6.15 Clustering Your Docker Environment ...... 189 2.1.4.6.6.16 Checking the Status of SQream Services ...... 190 2.1.4.6.6.17 Checking the Status of SQream Services from the SQream Console ...... 190 2.1.4.6.6.18 Checking the Status of SQream Services from Outside the SQream Console ...... 190 2.1.4.6.6.19 Upgrading Your SQream System ...... 190 2.1.4.7 Monitoring query performance ...... 191 2.1.4.7.1 Setting up the system for monitoring ...... 192 2.1.4.7.1.1 Adjusting the logging frequency ...... 193 2.1.4.7.1.2 Reading execution plans with a foreign table ...... 193 2.1.4.7.2 The SHOW_NODE_INFO command ...... 
194 2.1.4.7.3 Understanding the query execution plan output ...... 195 2.1.4.7.3.1 Information presented in the execution plan ...... 197 2.1.4.7.3.2 Commonly seen nodes ...... 197 2.1.4.7.4 Examples ...... 198 2.1.4.7.4.1 1. Spooling to disk ...... 199 2.1.4.7.4.2 Identifying the offending nodes ...... 199 2.1.4.7.4.3 Common solutions for reducing spool ...... 201 2.1.4.7.4.4 2. Queries with large result sets ...... 201 2.1.4.7.4.5 Identifying the offending nodes ...... 201 2.1.4.7.4.6 Common solutions for reducing gather time ...... 202 2.1.4.7.4.7 3. Inefficient filtering ...... 202 2.1.4.7.4.8 Identifying the situation ...... 202 2.1.4.7.4.9 Common solutions for improving filtering ...... 205 2.1.4.7.4.10 4. Joins with varchar keys ...... 206 2.1.4.7.4.11 Identifying the situation ...... 206 2.1.4.7.4.12 Improving query performance ...... 207 2.1.4.7.4.13 5. Sorting on big VARCHAR fields ...... 208 2.1.4.7.4.14 Identifying the situation ...... 208 2.1.4.7.4.15 Improving sort performance on text keys ...... 210 2.1.4.7.4.16 6. High selectivity data ...... 210 2.1.4.7.4.17 Identifying the situation ...... 211 2.1.4.7.4.18 Improving performance with high selectivity hints ...... 211 2.1.4.7.4.19 7. Performance of unsorted data in joins ...... 211 2.1.4.7.4.20 Identifying the situation ...... 211 2.1.4.7.4.21 Improving join performance when data is sparse ...... 212 2.1.4.7.4.22 8. Manual join reordering ...... 213 2.1.4.7.4.23 Identifying the situation ...... 213 2.1.4.7.4.24 Changing the join order ...... 213 2.1.4.7.5 Further reading ...... 214 2.1.4.8 Logging ...... 214 2.1.4.8.1 Locating the Log Files ...... 214 2.1.4.8.1.1 Log Structure and Contents ...... 215 2.1.4.8.1.2 Log-Naming ...... 217 2.1.4.8.2 Log Control and Maintenance ...... 217 2.1.4.8.2.1 Changing Log Verbosity ...... 217 2.1.4.8.2.2 Changing Log Rotation ...... 217 2.1.4.8.3 Collecting Logs from Your Cluster ...... 217 2.1.4.8.3.1 SQL Syntax ...... 218

2.1.4.8.3.2 Command Line Utility ...... 218 2.1.4.8.3.3 Parameters ...... 218 2.1.4.8.3.4 Example ...... 218 2.1.4.8.4 Troubleshooting with Logs ...... 219 2.1.4.8.4.1 Loading Logs with Foreign Tables ...... 219 2.1.4.8.4.2 Counting Message Types ...... 219 2.1.4.8.4.3 Finding Fatal Errors ...... 220 2.1.4.8.4.4 Counting Error Events Within a Certain Timeframe ...... 220 2.1.4.8.4.5 Tracing Errors to Find Offending Statements ...... 220 2.1.4.9 Configuration ...... 221 2.1.4.9.1 Overview ...... 221 2.1.4.9.2 Configuring Your Instance of SQream ...... 221 2.1.4.9.2.1 Configuring SQream Using the SQream Configuration File ...... 222 2.1.4.9.2.2 Configuring SQream Using a Legacy Configuration File ...... 222 2.1.4.9.3 Session-Based Configuration ...... 222 2.1.4.9.4 Cluster-Based Configuration ...... 223 2.1.4.9.5 Configuration Flag Types ...... 223 2.1.4.9.6 Configuration Commands ...... 223 2.1.4.9.7 Configuration Roles ...... 225 2.1.4.9.8 Showing All Flags in the Catalog Table ...... 225 2.1.4.10 Troubleshooting ...... 225 2.1.4.10.1 Troubleshooting common issues ...... 227 2.1.4.10.1.1 Troubleshoot cluster setup and configuration ...... 227 2.1.4.10.1.2 Troubleshoot connectivity issues ...... 227 2.1.4.10.1.3 Troubleshoot query performance ...... 227 2.1.4.10.1.4 Troubleshoot query behavior ...... 227 2.1.4.10.1.5 File an issue with SQream support ...... 227 2.1.4.10.2 Examining logs ...... 228 2.1.4.10.3 Start a temporary SQream DB for testing ...... 228 2.1.4.11 Gathering information for SQream support ...... 228 2.1.4.11.1 Getting support and reporting bugs ...... 228 2.1.4.11.2 How SQream debugs issues ...... 228 2.1.4.11.2.1 Reproduce ...... 228 2.1.4.11.2.2 Logs ...... 229 2.1.4.11.2.3 Fix ...... 229 2.1.4.11.3 Collecting a reproducible example of a problematic statement ...... 229 2.1.4.11.3.1 SQL Syntax ...... 229 2.1.4.11.3.2 Parameters ...... 229 2.1.4.11.3.3 Example ...... 230 2.1.4.11.4 Collecting logs and metadata database ......
230 2.1.4.11.4.1 Examples ...... 230 2.1.4.12 Creating or cloning a storage cluster ...... 230 2.1.4.12.1 Creating a new storage cluster ...... 230 2.1.4.12.2 Tell SQream DB to use this storage cluster ...... 231 2.1.4.12.2.1 Permanently setting the storage cluster setting ...... 231 2.1.4.12.2.2 Start a temporary SQream DB worker with a storage cluster ...... 231 2.1.4.12.2.3 Using a configuration file (recommended) ...... 231 2.1.4.12.2.4 Using the command line parameters ...... 232 2.1.4.12.3 Copying an existing storage cluster ...... 232 2.1.4.13 Installing Studio on a Stand-Alone Server ...... 232 2.1.4.13.1 Installing NodeJS Version 12 on the Server ...... 233 2.1.4.13.2 Installing Studio ...... 234 2.1.4.13.3 Starting Studio Manually ...... 235 2.1.4.13.4 Starting Studio as a Service ...... 235 2.1.4.13.5 Accessing Studio ...... 236 2.1.4.13.6 Maintaining Studio with the Process Manager (PM2) ...... 237 2.1.4.13.7 Upgrading Studio ...... 237 2.1.4.13.7.1 Installing Studio in a Docker Container ...... 238 2.1.4.13.8 Installing SQream Studio in a Docker Container ...... 238 2.1.4.13.9 Accessing Studio ...... 239 2.1.4.13.10 Docker Container Commands ...... 239 2.1.4.13.11 Setup Argument Configurations ...... 239 2.1.4.14 SQream Acceleration Studio 5.4.0 ...... 240 2.1.4.14.1 Getting Started ...... 241 2.1.4.14.1.1 Setting Up and Starting Studio ...... 241 2.1.4.14.1.2 Logging In to Studio ...... 241 2.1.4.14.1.3 Navigating Studio’s Main Features ...... 242 2.1.4.14.2 Monitoring Workers and Services from the Dashboard ...... 242 2.1.4.14.2.1 Displaying System Disk Usage from the Data Storage Panel ...... 243 2.1.4.14.2.2 Subscribing to Workers from the Services Panel ...... 244 2.1.4.14.2.3 Adding A Service ...... 245 2.1.4.14.2.4 Managing Workers from the Workers Panel ...... 245 2.1.4.14.2.5 Viewing Workers ...... 245 2.1.4.14.2.6 Adding A Worker to A Service ...... 246 2.1.4.14.2.7 Viewing A Worker’s Active Query Information ......
246 2.1.4.14.2.8 Viewing A Worker’s Host Utilization ...... 246 2.1.4.14.2.9 Viewing a Worker’s Execution Plan ...... 247 2.1.4.14.2.10 Managing Worker Status ...... 248 2.1.4.14.2.11 License Information ...... 248 2.1.4.14.3 Executing Statements and Running Queries from the Editor .... 248 2.1.4.14.3.1 Executing Statements from the Toolbar ...... 249 2.1.4.14.3.2 Performing Statement-Related Operations from the Database Tree ...... 249 2.1.4.14.3.3 Optimizing Database Tables Using the DDL Optimizer ... 251 2.1.4.14.3.4 Executing Pre-Defined Queries from the System Queries Panel251 2.1.4.14.3.5 Writing Statements and Queries from the Statement Panel . 252 2.1.4.14.3.6 Viewing Statement and Query Results from the Results Panel 252 2.1.4.14.3.7 Searching Query Results in the Results View ...... 253 2.1.4.14.3.8 Running Parallel Statements ...... 253 2.1.4.14.3.9 Execution Details View ...... 254 2.1.4.14.3.10 Viewing Wrapped Strings in the SQL View ...... 254 2.1.4.14.3.11 Saving Results to the Clipboard ...... 255 2.1.4.14.3.12 Saving Results to a Local File ...... 255 2.1.4.14.3.13 Analyzing Results ...... 255 2.1.4.14.4 Viewing Logs ...... 255 2.1.4.14.4.1 Filtering Table Data ...... 256 2.1.4.14.4.2 Viewing Query Logs ...... 257 2.1.4.14.4.3 Viewing Session Logs ...... 257 2.1.4.14.4.4 Viewing System Logs ...... 257 2.1.4.14.4.5 Viewing All Log Lines ...... 258 2.1.4.14.5 Creating, Assigning, and Managing Roles and Permissions ..... 258 2.1.4.14.5.1 Overview ...... 258 2.1.4.14.5.2 Viewing Information About a Role ...... 259 2.1.4.14.5.3 Creating a New Role ...... 259 2.1.4.14.5.4 Editing a Role ...... 260 2.1.4.14.5.5 Deleting a Role ...... 260 2.1.4.14.6 Configuring Your Instance of SQream ...... 260 2.1.4.14.6.1 Editing Your Parameters ...... 260

xi 2.1.4.14.6.2 Exporting and Importing Configuration Files ...... 261 2.1.4.15 Using the statement editor (deprecated) ...... 261 2.1.4.15.1 Setting up and starting the statement editor ...... 262 2.1.4.15.2 Logging in ...... 262 2.1.4.15.3 Familiarizing yourself with the editor ...... 263 2.1.4.15.3.1 Toolbar ...... 264 2.1.4.15.3.2 Statement area ...... 265 2.1.4.15.3.3 Keyboard shortcuts ...... 265 2.1.4.15.3.4 Results ...... 265 2.1.4.15.3.5 Sorting results ...... 266 2.1.4.15.3.6 Viewing execution information ...... 266 2.1.4.15.3.7 Saving results to a file or clipboard ...... 266 2.1.4.15.3.8 Database tree ...... 266 2.1.4.15.3.9 Database ...... 268 2.1.4.15.3.10 Schema ...... 268 2.1.4.15.3.11 Table ...... 268 2.1.4.15.3.12 Create a table ...... 268 2.1.4.15.3.13 Catalog views ...... 269 2.1.4.15.3.14 Predefined queries ...... 270 2.1.4.15.4 Notifications ...... 270 2.1.4.15.5 DDL optimizer ...... 270 2.1.4.15.5.1 Using the DDL optimizer ...... 270 2.1.4.15.5.2 Analyzing the results ...... 271 2.1.4.16 Hardware Guide ...... 272 2.1.4.16.1 A SQream DB Cluster ...... 272 2.1.4.16.1.1 Single-Node Cluster Example ...... 272 2.1.4.16.1.2 Multi-Node Cluster Example ...... 273 2.1.4.16.2 Cluster Design Considerations ...... 273 2.1.4.16.2.1 Balancing Cost and Performance ...... 274 2.1.4.16.2.2 CPU Compute ...... 274 2.1.4.16.2.3 GPU Compute and RAM ...... 274 2.1.4.16.2.4 RAM ...... 275 2.1.4.16.2.5 Operating System ...... 275 2.1.4.16.2.6 Storage ...... 275 2.1.4.16.3 Cluster Supporting 32 Concurrent Active User Example ...... 275 2.1.4.17 Security ...... 276 2.1.4.17.1 Overview ...... 276 2.1.4.17.2 Security best practices for SQream DB ...... 277 2.1.4.17.2.1 Secure OS access ...... 277 2.1.4.17.2.2 Change the default SUPERUSER ...... 277 2.1.4.17.2.3 Create distinct user roles ...... 277 2.1.4.17.2.4 Limit SUPERUSER access ...... 277 2.1.4.17.2.5 Password strength guidelines ...... 278 2.1.4.17.2.6 Use TLS/SSL when possible ...... 
278 2.1.4.18 Setting up SQream DB ...... 278 2.1.4.18.1 Before you begin ...... 278 2.1.4.18.1.1 What does the installation process look like? ...... 279 2.1.4.18.2 Start a local SQream DB cluster with Docker ...... 279 2.1.4.18.2.1 Preparing your machine for NVIDIA Docker ...... 280 2.1.4.18.2.2 CentOS 7 / RHEL 7 / Amazon Linux ...... 280 2.1.4.18.2.3 Ubuntu 18.04 ...... 281 2.1.4.18.2.4 Install Docker CE and NVIDIA docker ...... 282 2.1.4.18.2.5 CentOS 7 / RHEL 7 / Amazon Linux (x64) ...... 282 2.1.4.18.2.6 CentOS 7.6 / RHEL 7.6 (IBM POWER9) ...... 284 2.1.4.18.2.7 Ubuntu 18.04 (x64) ...... 285 2.1.4.18.2.8 Preparing directories and mounts for SQream DB ...... 286 2.1.4.18.2.9 Install the SQream DB Docker container ...... 287 2.1.4.18.2.10 Starting your first local cluster ...... 288 2.1.4.18.3 Recommended Pre-Installation Configuration ...... 288 2.1.4.18.3.1 Recommended BIOS Settings ...... 289 2.1.4.18.3.2 Installing the Operating System ...... 290 2.1.4.18.3.3 Configuring the Operating System ...... 291 2.1.4.18.3.4 Logging In to the Server ...... 291 2.1.4.18.3.5 Automatically Creating a SQream User ...... 291 2.1.4.18.3.6 Manually Creating a SQream User ...... 292 2.1.4.18.3.7 Setting Up A Locale ...... 292 2.1.4.18.3.8 Installing the Required Packages ...... 292 2.1.4.18.3.9 Installing the Recommended Tools ...... 293 2.1.4.18.3.10 Installing Python 3.6.7 ...... 293 2.1.4.18.3.11 Installing NodeJS on CentOS ...... 293 2.1.4.18.3.12 Installing NodeJS on Ubuntu ...... 294 2.1.4.18.3.13 Configuring the Network Time Protocol (NTP) ...... 294 2.1.4.18.3.14 Configuring the Network Time Protocol Server ...... 294 2.1.4.18.3.15 Configuring the Server to Boot Without the UI ...... 295 2.1.4.18.3.16 Configuring the Security Limits ...... 295 2.1.4.18.3.17 Configuring the Kernel Parameters ...... 295 2.1.4.18.3.18 Configuring the Firewall ...... 296 2.1.4.18.3.19 Disabling SELinux ...... 296 2.1.4.18.3.20 Configuring the /etc/hosts File ......
297 2.1.4.18.3.21 Configuring the DNS ...... 297 2.1.4.18.3.22 Installing the Nvidia CUDA Driver ...... 297 2.1.4.18.3.23 CUDA Driver Prerequisites ...... 297 2.1.4.18.3.24 Updating the Kernel Headers ...... 298 2.1.4.18.3.25 Disabling Nouveau ...... 298 2.1.4.18.3.26 Installing the CUDA Driver ...... 299 2.1.4.18.3.27 Installing the CUDA Driver from the Repository ...... 299 2.1.4.18.3.28 Tuning Up NVIDIA Performance ...... 301 2.1.4.18.3.29 Disabling Automatic Bug Reporting Tools ...... 301 2.1.4.18.3.30 Enabling Core Dumps ...... 302 2.1.4.18.3.31 Checking the abrtd Status ...... 302 2.1.4.18.3.32 Setting the Limits ...... 303 2.1.4.18.3.33 Creating the Core Dumps Directory ...... 303 2.1.4.18.3.34 Setting the Output Directory of the /etc/sysctl.conf File .. 303 2.1.4.18.3.35 Verifying that the Core Dumps Work ...... 304 2.1.4.18.3.36 Troubleshooting Core Dumping ...... 304 2.1.5 Features guides ...... 305 2.1.5.1 Access control ...... 305 2.1.5.1.1 Overview ...... 306 2.1.5.1.2 Terminology ...... 306 2.1.5.1.2.1 Roles ...... 306 2.1.5.1.2.2 Authentication ...... 306 2.1.5.1.2.3 Authorization ...... 306 2.1.5.1.3 Roles ...... 306 2.1.5.1.3.1 Creating new roles (users) ...... 306 2.1.5.1.3.2 Dropping a user ...... 307 2.1.5.1.3.3 Altering a user name ...... 307 2.1.5.1.3.4 Changing user passwords ...... 307 2.1.5.1.3.5 Public Role ...... 307

xiii 2.1.5.1.3.6 Role membership (groups) ...... 308 2.1.5.1.4 Permissions ...... 309 2.1.5.1.4.1 GRANT ...... 309 2.1.5.1.4.2 REVOKE ...... 310 2.1.5.1.4.3 Default permissions ...... 311 2.1.5.1.5 Departmental Example ...... 312 2.1.5.1.5.1 Setting up the department permissions ...... 312 2.1.5.1.5.2 Creating new users in the departments ...... 314 2.1.5.2 Concurrency and locks ...... 315 2.1.5.2.1 Locking modes ...... 316 2.1.5.2.2 When are locks obtained? ...... 316 2.1.5.2.2.1 Global locks ...... 316 2.1.5.2.3 Monitoring locks ...... 316 2.1.5.2.4 Troubleshooting locks ...... 317 2.1.5.2.4.1 Example ...... 317 2.1.5.3 Concurrency and scaling in SQream DB ...... 318 2.1.5.3.1 Scaling when data sizes grow ...... 318 2.1.5.3.2 Scaling when queries are queueing ...... 318 2.1.5.3.3 What to do when queries are slow ...... 318 2.1.5.4 Metadata system ...... 318 2.1.5.4.1 How is metadata collected? ...... 319 2.1.5.4.2 How is metadata used? ...... 319 2.1.5.4.3 Why is metadata always on? ...... 320 2.1.5.5 Chunks and extents ...... 320 2.1.5.5.1 What are chunks? What are extents? ...... 320 2.1.5.5.1.1 Chunks ...... 320 2.1.5.5.1.2 Extents ...... 321 2.1.5.5.2 Why was SQream DB built with chunks and extents? ...... 321 2.1.5.5.2.1 Benefits of chunking ...... 321 2.1.5.5.2.2 Storage reorganization ...... 322 2.1.5.5.2.3 Metadata ...... 322 2.1.5.6 Compression ...... 322 2.1.5.6.1 Encoding ...... 322 2.1.5.6.2 Compression ...... 323 2.1.5.6.2.1 Automatic compression ...... 323 2.1.5.6.2.2 Compression strategies ...... 324 2.1.5.6.2.3 Specifying compression strategies ...... 325 2.1.5.6.2.4 Explicitly specifying automatic compression ...... 325 2.1.5.6.2.5 Forcing no compression (flat) ...... 325 2.1.5.6.2.6 Forcing compressions ...... 325 2.1.5.6.2.7 Examining compression effectiveness ...... 326 2.1.5.6.2.8 Notes on reading this table: ...... 329 2.1.5.6.3 Compression best practices ...... 329 2.1.5.6.3.1 Let SQream DB decide on the compression strategy ..... 
329 2.1.5.6.3.2 Maximize the advantage of each compression scheme .... 329 2.1.5.6.3.3 Choose data types that fit the data ...... 330 2.1.5.7 Data clustering ...... 330 2.1.5.7.1 Clustered tables ...... 330 2.1.5.7.2 When does clustering help? ...... 330 2.1.5.7.3 Controlling data clustering ...... 331 2.1.5.7.4 Examples ...... 331 2.1.5.7.4.1 Creating a clustered table ...... 331 2.1.5.8 Deleting Data ...... 331 2.1.5.8.1 How does deleting in SQream DB work? ...... 331 2.1.5.8.1.1 Phase 1: Delete ...... 332 2.1.5.8.1.2 Phase 2: Clean-up ...... 332 2.1.5.8.2 Notes on data deletion ...... 332 2.1.5.8.2.1 Deleting data does not free up space ...... 332 2.1.5.8.2.2 SELECT performance on deleted rows ...... 333 2.1.5.8.2.3 Use TRUNCATE instead of DELETE ...... 333 2.1.5.8.2.4 Cleanup is I/O intensive ...... 333 2.1.5.8.3 Example ...... 333 2.1.5.8.3.1 Deleting values from a table ...... 333 2.1.5.8.3.2 Deleting values based on more complex predicates ...... 334 2.1.5.8.3.3 Identifying and cleaning up tables ...... 334 2.1.5.8.3.4 List tables that haven’t been cleaned up ...... 334 2.1.5.8.3.5 Identify predicates for clean-up ...... 334 2.1.5.8.3.6 Triggering a cleanup ...... 335 2.1.5.8.4 Best practices for data deletion ...... 335 2.1.5.9 Time based data management ...... 335 2.1.5.9.1 Timestamped data ...... 336 2.1.5.9.2 Chunking ...... 336 2.1.5.9.3 Use cases ...... 336 2.1.5.9.4 Best practices for time-based data ...... 337 2.1.5.9.4.1 Use date and datetime types for columns ...... 337 2.1.5.9.4.2 Ordering ...... 337 2.1.5.9.4.3 Limit workload by timestamp ...... 337 2.1.5.10 Foreign Tables ...... 337 2.1.5.10.1 What kind of data is supported? ...... 338 2.1.5.10.2 What kind of data staging is supported? ...... 338 2.1.5.10.3 Using foreign tables - a practical example ...... 338 2.1.5.10.3.1 Planning for data staging ...... 338 2.1.5.10.3.2 Creating the foreign table ...... 339 2.1.5.10.3.3 From S3 ...... 339 2.1.5.10.3.4 From HDFS ......
340 2.1.5.10.3.5 Querying foreign tables ...... 340 2.1.5.10.3.6 Modifying data from staging ...... 341 2.1.5.10.3.7 Converting a foreign table to a standard database table ... 341 2.1.5.10.4 Error handling and limitations ...... 342 2.1.5.11 Working with External Data ...... 342 2.1.5.11.1 Amazon S3 ...... 342 2.1.5.11.1.1 S3 Configuration ...... 343 2.1.5.11.1.2 S3 URI Format ...... 343 2.1.5.11.1.3 Authentication ...... 343 2.1.5.11.1.4 Examples ...... 343 2.1.5.11.1.5 Planning for Data Staging ...... 343 2.1.5.11.1.6 Creating the External Table ...... 344 2.1.5.11.1.7 Querying External Tables ...... 345 2.1.5.11.1.8 Bulk Loading a File from a Public S3 Bucket ...... 345 2.1.5.11.1.9 Loading Files from an Authenticated S3 Bucket ...... 345 2.1.5.11.2 Using SQream in an HDFS Environment ...... 346 2.1.5.11.2.1 Configuring an HDFS Environment for the User sqream .. 346 2.1.5.11.2.2 Authenticate Hadoop Servers that Require Kerberos ..... 346 2.1.5.12 Transactions ...... 349 2.1.5.13 Workload Manager ...... 349 2.1.5.13.1 Why use the workload manager? ...... 349 2.1.5.13.2 Setting up service queues ...... 350 2.1.5.13.3 Example - Allocating resources for ETL ...... 350

xv 2.1.5.13.3.1 Creating this configuration ...... 350 2.1.5.13.3.2 Verifying the configuration ...... 351 2.1.5.13.4 Configuring a client to connect to a specific service ...... 352 2.1.5.13.4.1 Using Sqream SQL CLI Reference ...... 352 2.1.5.13.4.2 Using JDBC ...... 352 2.1.5.13.4.3 Using ODBC ...... 352 2.1.5.13.4.4 Using Python (pysqream) ...... 352 2.1.5.13.4.5 Using Node.JS ...... 353 2.1.5.14 Python UDF (user-defined functions) ...... 353 2.1.5.14.1 A simple example ...... 354 2.1.5.14.2 Why use UDFs? ...... 355 2.1.5.14.3 SQream DB’s UDF support ...... 355 2.1.5.14.3.1 Scalar functions ...... 355 2.1.5.14.3.2 Python ...... 355 2.1.5.14.3.3 Using modules ...... 355 2.1.5.14.4 Finding existing UDFs in the catalog ...... 356 2.1.5.14.5 Getting the DDL for a function ...... 356 2.1.5.14.6 Error handling ...... 356 2.1.5.14.7 Permissions and sharing ...... 356 2.1.5.14.8 Best practices ...... 357 2.1.5.15 Saved queries ...... 357 2.1.5.15.1 How saved queries work ...... 357 2.1.5.15.2 Parameters support ...... 357 2.1.5.15.3 Creating a saved query ...... 357 2.1.5.15.3.1 Saving a simple query ...... 357 2.1.5.15.3.2 Saving a parametrized query ...... 357 2.1.5.15.4 Listing and executing saved queries ...... 358 2.1.5.15.5 Dropping a saved query ...... 359 2.1.5.16 Seeing system objects as DDL ...... 359 2.1.5.16.1 Dump specific objects ...... 359 2.1.5.16.1.1 Tables ...... 359 2.1.5.16.1.2 Getting the DDL for a table ...... 359 2.1.5.16.1.3 Exporting table DDL to a file ...... 359 2.1.5.16.1.4 Views ...... 359 2.1.5.16.1.5 Listing all views ...... 360 2.1.5.16.1.6 Getting the DDL for a view ...... 360 2.1.5.16.1.7 Exporting view DDL to a file ...... 360 2.1.5.16.1.8 User defined functions ...... 360 2.1.5.16.1.9 Listing all UDFs ...... 360 2.1.5.16.1.10 Getting the DDL for a function ...... 360 2.1.5.16.1.11 Exporting function DDL to a file ...... 361 2.1.5.16.1.12 Saved queries ...... 361 2.1.5.16.2 Dump entire database DDLs ...... 
361 2.1.5.16.2.1 Exporting database DDL to a client ...... 361 2.1.5.16.2.2 Exporting database DDL to a file ...... 362 2.1.6 Architecture ...... 362 2.1.6.1 Internals and architecture ...... 362 2.1.6.1.1 SQream DB internals ...... 362 2.1.6.1.1.1 Statement compiler ...... 363 2.1.6.1.1.2 Concurrency and concurrency control ...... 363 2.1.6.1.1.3 Transactions ...... 363 2.1.6.1.1.4 Storage ...... 363 2.1.6.1.1.5 Metadata layer ...... 363 2.1.6.1.1.6 Bulk data layer ...... 363 xvi 2.1.6.1.1.7 Building blocks ...... 364 2.1.6.1.2 Columnar ...... 364 2.1.6.1.3 GPU usage ...... 364 2.1.6.2 Filesystem and usage ...... 364 2.1.6.2.1 Directory organization ...... 364 2.1.6.2.1.1 databases ...... 365 2.1.6.2.1.2 metadata or leveldb ...... 366 2.1.6.2.1.3 temp ...... 366 2.1.6.2.1.4 logs ...... 367

3 Reference 369 3.1 Data Types ...... 369 3.1.1 Supported Types ...... 369 3.1.2 Converting and Casting Types ...... 371 3.1.3 Data Type Reference ...... 371 3.1.3.1 Numeric (NUMERIC, DECIMAL) ...... 371 3.1.3.1.1 Numeric Examples ...... 371 3.1.3.2 Boolean (BOOL) ...... 372 3.1.3.2.1 Boolean Examples ...... 372 3.1.3.2.2 Boolean Casts and Conversions ...... 372 3.1.3.3 Integers (TINYINT, SMALLINT, INT, BIGINT) ...... 372 3.1.3.3.1 Integer Types ...... 372 3.1.3.3.2 Integer Examples ...... 373 3.1.3.3.3 Integer Casts and Conversions ...... 373 3.1.3.4 Floating Point (REAL, DOUBLE) ...... 373 3.1.3.4.1 Floating Point Types ...... 374 3.1.3.4.2 Floating Point Examples ...... 374 3.1.3.4.3 Floating Point Casts and Conversions ...... 374 3.1.3.5 String (TEXT, VARCHAR) ...... 375 3.1.3.5.1 String Types ...... 375 3.1.3.5.2 Length ...... 375 3.1.3.5.3 Syntax ...... 375 3.1.3.5.4 Size ...... 375 3.1.3.5.5 String Examples ...... 376 3.1.3.5.6 String Casts and Conversions ...... 376 3.1.3.6 Date (DATE, DATETIME) ...... 376 3.1.3.6.1 Date Types ...... 376 3.1.3.6.2 Aliases ...... 377 3.1.3.6.3 Syntax ...... 377 3.1.3.6.4 Size ...... 377 3.1.3.6.5 Date Examples ...... 377 3.1.3.6.6 Date Casts and Conversions ...... 378 3.2 SQL Statements and Syntax ...... 378 3.2.1 SQL Syntax Features ...... 378 3.2.1.1 Keywords and Identifiers ...... 378 3.2.1.1.1 Keywords ...... 378 3.2.1.1.2 Identifiers ...... 378 3.2.1.1.2.1 Identifier rules ...... 378 3.2.1.1.3 Reserved keywords ...... 379 3.2.1.2 Literals ...... 381 3.2.1.2.1 Numeric Literals ...... 381 3.2.1.2.1.1 Examples ...... 381 3.2.1.2.2 String Literals ...... 382 3.2.1.2.2.1 Examples ...... 382

xvii 3.2.1.2.2.2 Regular String Literals ...... 383 3.2.1.2.2.3 Examples ...... 383 3.2.1.2.2.4 Dollar-Quoted String Literals ...... 383 3.2.1.2.2.5 Examples ...... 383 3.2.1.2.2.6 Escaped String Literals ...... 384 3.2.1.2.3 Typed Literals ...... 384 3.2.1.2.3.1 Syntax Reference ...... 384 3.2.1.2.3.2 Examples ...... 385 3.2.1.2.4 Boolean Literals ...... 385 3.2.1.2.4.1 Example ...... 385 3.2.1.2.5 Other Constants ...... 385 3.2.1.3 Scalar expressions ...... 386 3.2.1.3.1 Literals ...... 386 3.2.1.3.2 Column references ...... 386 3.2.1.3.3 Operators ...... 386 3.2.1.3.3.1 Unary operator ...... 386 3.2.1.3.3.2 Binary operator ...... 386 3.2.1.3.3.3 Special operators for set membership ...... 387 3.2.1.3.3.4 Comparisons ...... 387 3.2.1.3.3.5 Operator precedence ...... 387 3.2.1.4 Joins ...... 388 3.2.1.4.1 Syntax ...... 388 3.2.1.4.1.1 Join types ...... 389 3.2.1.4.1.2 Inner join ...... 389 3.2.1.4.1.3 Left outer join ...... 389 3.2.1.4.1.4 Right outer join ...... 389 3.2.1.4.1.5 Cross join ...... 389 3.2.1.4.1.6 Join conditions ...... 389 3.2.1.4.1.7 ON value_expr ...... 389 3.2.1.4.2 Examples ...... 390 3.2.1.4.2.1 Inner join ...... 390 3.2.1.4.2.2 Left join ...... 390 3.2.1.4.2.3 Right join ...... 390 3.2.1.4.2.4 Cross join ...... 391 3.2.1.4.2.5 Join hints ...... 392 3.2.1.5 Common table expressions (CTEs) ...... 392 3.2.1.5.1 Syntax ...... 393 3.2.1.5.2 Examples ...... 393 3.2.1.5.2.1 Simple CTE ...... 393 3.2.1.5.2.2 Nested CTEs ...... 394 3.2.1.5.2.3 Reusing CTEs ...... 394 3.2.1.5.2.4 Using CTEs with CREATE TABLE AS ...... 394 3.2.1.6 Window Functions ...... 394 3.2.1.6.1 Syntax ...... 395 3.2.1.6.2 Arguments ...... 396 3.2.1.6.3 Supported Window Functions ...... 396 3.2.1.6.4 How Window Functions Work ...... 396 3.2.1.6.4.1 PARTITION BY ...... 397 3.2.1.6.4.2 ORDER BY ...... 397 3.2.1.6.4.3 Frames ...... 397 3.2.1.6.4.4 Restrictions ...... 398 3.2.1.6.4.5 Frame Exclusion ...... 398 3.2.1.6.5 Limitations ...... 398 3.2.1.6.6 Examples ...... 
398 xviii 3.2.1.6.6.1 Window Function Application ...... 399 3.2.1.6.6.2 Ranking Results ...... 400 3.2.1.6.6.3 Using LEAD to Access Following Rows Without a Join .... 400 3.2.1.7 Subqueries ...... 401 3.2.1.7.1 Table Subqueries ...... 401 3.2.1.7.2 Scalar Subqueries ...... 402 3.2.1.7.2.1 Simple Subquery ...... 402 3.2.1.7.2.2 Combining a Subquery with a Join ...... 402 3.2.1.7.2.3 WITH subqueries ...... 403 3.2.1.8 Null handling ...... 403 3.2.1.8.1 Comparisons ...... 403 3.2.1.8.2 Conditionals ...... 404 3.2.1.8.3 Operations ...... 404 3.2.1.8.3.1 Scalar functions ...... 404 3.2.1.8.3.2 Aggregates ...... 404 3.2.1.8.3.3 Distincts ...... 404 3.2.1.8.4 Sorting ...... 405 3.2.2 SQL Statements ...... 405 3.2.2.1 Data Definition Commands (DDL) ...... 406 3.2.2.2 Data Manipulation Commands (DML) ...... 406 3.2.2.3 Utility Commands ...... 406 3.2.2.4 Saved Queries ...... 407 3.2.2.5 Monitoring ...... 407 3.2.2.6 Workload Management ...... 407 3.2.2.7 Access Control Commands ...... 408 3.2.2.7.1 ADD COLUMN ...... 408 3.2.2.7.1.1 Permissions ...... 408 3.2.2.7.1.2 Syntax ...... 408 3.2.2.7.1.3 Parameters ...... 409 3.2.2.7.1.4 Examples ...... 409 3.2.2.7.1.5 Adding a simple column with default value ...... 409 3.2.2.7.1.6 Adding several columns in one command ...... 409 3.2.2.7.2 ALTER DEFAULT SCHEMA ...... 409 3.2.2.7.2.1 Permissions ...... 409 3.2.2.7.2.2 Syntax ...... 409 3.2.2.7.2.3 Parameters ...... 410 3.2.2.7.2.4 Examples ...... 410 3.2.2.7.2.5 Altering the default schema for a role ...... 410 3.2.2.7.3 ALTER TABLE ...... 410 3.2.2.7.3.1 Locks ...... 410 3.2.2.7.3.2 Subcommands ...... 410 3.2.2.7.4 CLUSTER BY ...... 411 3.2.2.7.4.1 Permissions ...... 411 3.2.2.7.4.2 Syntax ...... 411 3.2.2.7.4.3 Parameters ...... 411 3.2.2.7.4.4 Usage notes ...... 411 3.2.2.7.4.5 Examples ...... 411 3.2.2.7.4.6 Reclustering a table ...... 411 3.2.2.7.5 CREATE DATABASE ...... 411 3.2.2.7.5.1 Permissions ...... 412 3.2.2.7.5.2 Syntax ...... 412 3.2.2.7.5.3 Parameters ...... 
412 3.2.2.7.5.4 Examples ...... 412 3.2.2.7.6 CREATE EXTERNAL TABLE ...... 412

xix 3.2.2.7.6.1 Permissions ...... 413 3.2.2.7.6.2 Syntax ...... 413 3.2.2.7.6.3 Parameters ...... 414 3.2.2.7.6.4 Examples ...... 414 3.2.2.7.6.5 A simple table from Tab-delimited file (TSV) ...... 414 3.2.2.7.6.6 A table from a directory of Parquet files on HDFS ...... 414 3.2.2.7.6.7 A table from a bucket of files on S3 ...... 415 3.2.2.7.6.8 Changing an external table to a regular table ...... 415 3.2.2.7.7 CREATE FOREIGN TABLE ...... 415 3.2.2.7.7.1 Permissions ...... 415 3.2.2.7.7.2 Syntax ...... 415 3.2.2.7.7.3 Parameters ...... 417 3.2.2.7.7.4 Examples ...... 417 3.2.2.7.7.5 A simple table from Tab-delimited file (TSV) ...... 417 3.2.2.7.7.6 A table from a directory of Parquet files on HDFS ...... 417 3.2.2.7.7.7 A table from a bucket of ORC files on S3 ...... 418 3.2.2.7.7.8 Changing a foreign table to a regular table ...... 418 3.2.2.7.8 CREATE FUNCTION ...... 418 3.2.2.7.8.1 Permissions ...... 418 3.2.2.7.8.2 Syntax ...... 418 3.2.2.7.8.3 Parameters ...... 419 3.2.2.7.8.4 Examples ...... 419 3.2.2.7.8.5 Calculate distance between two points ...... 419 3.2.2.7.8.6 Calling files from other locations ...... 419 3.2.2.7.9 CREATE SCHEMA ...... 420 3.2.2.7.9.1 Overview ...... 420 3.2.2.7.9.2 Permissions ...... 420 3.2.2.7.9.3 Syntax ...... 420 3.2.2.7.9.4 Parameters ...... 421 3.2.2.7.9.5 Examples ...... 421 3.2.2.7.9.6 Creating a Schema ...... 421 3.2.2.7.9.7 Altering the Default Schema for a Role ...... 421 3.2.2.7.10 CREATE TABLE ...... 421 3.2.2.7.10.1 Permissions ...... 422 3.2.2.7.10.2 Syntax ...... 422 3.2.2.7.10.3 Parameters ...... 422 3.2.2.7.10.4 Default Value Constraints ...... 423 3.2.2.7.10.5 Syntax ...... 423 3.2.2.7.10.6 Identity ...... 423 3.2.2.7.10.7 Examples ...... 424 3.2.2.7.10.8 Standard Table ...... 424 3.2.2.7.10.9 Table with Default Value Constraints for Some Columns ... 424 3.2.2.7.10.10 Table with an Identity Column ...... 424 3.2.2.7.10.11 Creating a Table from a SELECT Query ...... 
425 3.2.2.7.10.12 Creating a Table with a Clustering Key ...... 425 3.2.2.7.11 CREATE TABLE AS ...... 425 3.2.2.7.11.1 Permissions ...... 425 3.2.2.7.11.2 Syntax ...... 426 3.2.2.7.11.3 Parameters ...... 426 3.2.2.7.11.4 Examples ...... 426 3.2.2.7.11.5 Create a copy of a foreign table or view ...... 426 3.2.2.7.11.6 Filtering ...... 426 3.2.2.7.11.7 Adding columns ...... 426 3.2.2.7.11.8 Creating a table from values ...... 426 3.2.2.7.12 CREATE VIEW ...... 427 3.2.2.7.12.1 Permissions ...... 427 3.2.2.7.12.2 Syntax ...... 427 3.2.2.7.12.3 Parameters ...... 427 3.2.2.7.12.4 Examples ...... 428 3.2.2.7.12.5 A simple view ...... 428 3.2.2.7.12.6 Overriding default column names ...... 428 3.2.2.7.13 DROP CLUSTERING KEY ...... 428 3.2.2.7.13.1 Permissions ...... 428 3.2.2.7.13.2 Syntax ...... 428 3.2.2.7.13.3 Parameters ...... 428 3.2.2.7.13.4 Usage notes ...... 428 3.2.2.7.13.5 Examples ...... 429 3.2.2.7.13.6 Dropping clustering keys in a table ...... 429 3.2.2.7.14 DROP COLUMN ...... 429 3.2.2.7.14.1 Permissions ...... 429 3.2.2.7.14.2 Syntax ...... 429 3.2.2.7.14.3 Parameters ...... 429 3.2.2.7.14.4 Examples ...... 429 3.2.2.7.14.5 Removing a column ...... 429 3.2.2.7.14.6 Removing a column with a quoted identifier name ...... 430 3.2.2.7.15 DROP DATABASE ...... 430 3.2.2.7.15.1 Permissions ...... 430 3.2.2.7.15.2 Syntax ...... 430 3.2.2.7.15.3 Parameters ...... 430 3.2.2.7.15.4 Examples ...... 430 3.2.2.7.15.5 Dropping a database and all of its objects ...... 430 3.2.2.7.15.6 Dropping the current database ...... 430 3.2.2.7.16 DROP FUNCTION ...... 431 3.2.2.7.16.1 Permissions ...... 431 3.2.2.7.16.2 Syntax ...... 431 3.2.2.7.16.3 Parameters ...... 431 3.2.2.7.16.4 Examples ...... 431 3.2.2.7.16.5 Dropping a function ...... 431 3.2.2.7.16.6 Dropping a function (always succeeds) ...... 431 3.2.2.7.17 DROP SCHEMA ...... 431 3.2.2.7.17.1 Permissions ...... 432 3.2.2.7.17.2 Syntax ...... 432 3.2.2.7.17.3 Parameters ...... 432 3.2.2.7.17.4 Examples ......
432 3.2.2.7.17.5 Dropping a schema ...... 432 3.2.2.7.17.6 Dropping a schema if it’s not empty ...... 432 3.2.2.7.18 DROP TABLE ...... 433 3.2.2.7.18.1 Permissions ...... 433 3.2.2.7.18.2 Syntax ...... 433 3.2.2.7.18.3 Parameters ...... 433 3.2.2.7.18.4 Examples ...... 433 3.2.2.7.18.5 Dropping a table ...... 433 3.2.2.7.18.6 Dropping a table (always succeeds) ...... 433 3.2.2.7.19 DROP VIEW ...... 434 3.2.2.7.19.1 Permissions ...... 434 3.2.2.7.19.2 Syntax ...... 434 3.2.2.7.19.3 Parameters ...... 434 3.2.2.7.19.4 Examples ...... 434

3.2.2.7.19.5 Dropping a table ...... 434 3.2.2.7.19.6 Dropping a view (always succeeds) ...... 434 3.2.2.7.20 RENAME COLUMN ...... 435 3.2.2.7.20.1 Permissions ...... 435 3.2.2.7.20.2 Syntax ...... 435 3.2.2.7.20.3 Parameters ...... 435 3.2.2.7.20.4 Examples ...... 435 3.2.2.7.20.5 Renaming a column ...... 435 3.2.2.7.20.6 Renaming a quoted name ...... 435 3.2.2.7.21 RENAME TABLE ...... 436 3.2.2.7.21.1 Permissions ...... 436 3.2.2.7.21.2 Syntax ...... 436 3.2.2.7.21.3 Parameters ...... 436 3.2.2.7.21.4 Examples ...... 436 3.2.2.7.21.5 Renaming a table ...... 436 3.2.2.7.22 COPY FROM ...... 436 3.2.2.7.22.1 Permissions ...... 437 3.2.2.7.22.2 Syntax ...... 437 3.2.2.7.22.3 Elements ...... 440 3.2.2.7.22.4 Supported Date Formats ...... 441 3.2.2.7.22.5 Supported Field Delimiters ...... 441 3.2.2.7.22.6 Customizing Quotations Using Alternative Characters .... 441 3.2.2.7.22.7 Syntax Example 1 - Customizing Quotations Using Alternative Characters ...... 441 3.2.2.7.22.8 Usage Example 1 - Customizing Quotations Using Alternative Characters ...... 442 3.2.2.7.22.9 Syntax Example 2 - Customizing Quotations Using ASCII Character Codes ...... 442 3.2.2.7.22.10 Usage Example 2 - Customizing Quotations Using ASCII Character Codes ...... 442 3.2.2.7.22.11 Multi-Character Delimiters ...... 442 3.2.2.7.22.12 Printable Characters ...... 442 3.2.2.7.22.13 Non-Printable Characters ...... 442 3.2.2.7.22.14 Unsupported Field Delimiters ...... 443 3.2.2.7.22.15 Capturing Rejected Rows ...... 443 3.2.2.7.22.16 CSV Support ...... 444 3.2.2.7.22.17 Marking Null Markers ...... 444 3.2.2.7.22.18 Examples ...... 444 3.2.2.7.22.19 Loading a Standard CSV File ...... 444 3.2.2.7.22.20 Skipping Faulty Rows ...... 444 3.2.2.7.22.21 Skipping a Maximum of 100 Faulty Rows ...... 445 3.2.2.7.22.22 Loading a Pipe Separated Value (PSV) File ...... 445 3.2.2.7.22.23 Loading a Tab Separated Value (TSV) File ...... 445 3.2.2.7.22.24 Loading an ORC File ......
445 3.2.2.7.22.25 Loading a Parquet File ...... 445 3.2.2.7.22.26 Loading a Text File with Non-Printable Delimiters ...... 445 3.2.2.7.22.27 Loading a Text File with Multi-Character Delimiters ..... 445 3.2.2.7.22.28 Loading Files with a Header Row ...... 446 3.2.2.7.22.29 Loading Files Formatted for Windows (\r\n) ...... 446 3.2.2.7.22.30 Loading a File from a Public S3 Bucket ...... 446 3.2.2.7.22.31 Loading Files from an Authenticated S3 Bucket ...... 446 3.2.2.7.22.32 Saving Rejected Rows to a File ...... 446 3.2.2.7.22.33 Loading CSV Files from a Set of Directories ...... 447 3.2.2.7.22.34 Rearranging Destination Columns ...... 447 3.2.2.7.22.35 Loading Non-Standard Dates ...... 447 3.2.2.7.23 COPY TO ...... 447 3.2.2.7.23.1 Permissions ...... 448 3.2.2.7.23.2 Syntax ...... 448 3.2.2.7.23.3 Elements ...... 449 3.2.2.7.23.4 Usage notes ...... 449 3.2.2.7.23.5 Supported field delimiters ...... 449 3.2.2.7.23.6 Printable characters ...... 449 3.2.2.7.23.7 Non-printable characters ...... 449 3.2.2.7.23.8 Date format ...... 449 3.2.2.7.23.9 Examples ...... 449 3.2.2.7.23.10 Export table to a CSV without HEADER ...... 449 3.2.2.7.23.11 Export table to a CSV with a HEADER row ...... 450 3.2.2.7.23.12 Export table to a TSV with a header row ...... 450 3.2.2.7.23.13 Use non-printable ASCII characters as delimiter ...... 450 3.2.2.7.23.14 Exporting the result of a query to a CSV ...... 451 3.2.2.7.23.15 Saving files to an authenticated S3 bucket ...... 451 3.2.2.7.23.16 Saving files to an HDFS path ...... 451 3.2.2.7.23.17 Export table to a parquet file ...... 451 3.2.2.7.23.18 Export a query to a parquet file ...... 451 3.2.2.7.23.19 Export table to an ORC file ...... 451 3.2.2.7.24 DELETE ...... 452 3.2.2.7.24.1 Permissions ...... 452 3.2.2.7.24.2 Syntax ...... 452 3.2.2.7.24.3 Parameters ...... 452 3.2.2.7.24.4 How SQream DB deletes data ...... 452 3.2.2.7.24.5 Notes ...... 453 3.2.2.7.24.6 Examples ...... 453 3.2.2.7.24.7 Deleting values from a table ......
453 3.2.2.7.24.8 Deleting values based on more complex predicates ...... 453 3.2.2.7.24.9 Identifying and cleaning up tables ...... 454 3.2.2.7.24.10 List tables that haven’t been cleaned up ...... 454 3.2.2.7.24.11 Identify predicates for clean-up ...... 454 3.2.2.7.24.12 Triggering a cleanup ...... 454 3.2.2.7.25 INSERT ...... 455 3.2.2.7.25.1 Permissions ...... 455 3.2.2.7.25.2 Syntax ...... 455 3.2.2.7.25.3 Elements ...... 455 3.2.2.7.25.4 Examples ...... 456 3.2.2.7.25.5 Inserting a single row ...... 456 3.2.2.7.25.6 Inserting into rows with default values ...... 456 3.2.2.7.25.7 Changing column order ...... 456 3.2.2.7.25.8 Inserting multiple rows ...... 456 3.2.2.7.25.9 Import data from other tables ...... 456 3.2.2.7.25.10 Inserting data with positional placeholders ...... 457 3.2.2.7.26 SELECT ...... 457 3.2.2.7.26.1 Permissions ...... 458 3.2.2.7.26.2 Syntax ...... 458 3.2.2.7.26.3 Elements ...... 460 3.2.2.7.26.4 Notes ...... 460 3.2.2.7.26.5 Query processing ...... 460 3.2.2.7.26.6 Select lists ...... 460 3.2.2.7.26.7 Examples ...... 461 3.2.2.7.26.8 Simple queries ...... 462

xxiii 3.2.2.7.26.9 Show a count of the rows ...... 462 3.2.2.7.26.10 Get all columns ...... 462 3.2.2.7.26.11 Filter on conditions ...... 463 3.2.2.7.26.12 Filter based on a list ...... 463 3.2.2.7.26.13 Select only distinct rows ...... 463 3.2.2.7.26.14 Count distinct values ...... 464 3.2.2.7.26.15 Rename columns with aliases ...... 464 3.2.2.7.26.16 Searching with LIKE ...... 465 3.2.2.7.26.17 Aggregate functions ...... 465 3.2.2.7.26.18 Filtering on aggregates ...... 465 3.2.2.7.26.19 Sorting results ...... 466 3.2.2.7.26.20 Sorting with multiple columns ...... 467 3.2.2.7.26.21 Combining two or more queries ...... 467 3.2.2.7.26.22 Common table expressions (CTE) ...... 467 3.2.2.7.26.23 Nested CTEs ...... 468 3.2.2.7.26.24 Reusing CTEs ...... 468 3.2.2.7.27 TRUNCATE ...... 468 3.2.2.7.27.1 Permissions ...... 468 3.2.2.7.27.2 Syntax ...... 469 3.2.2.7.27.3 Parameters ...... 469 3.2.2.7.27.4 Examples ...... 469 3.2.2.7.27.5 Truncating a table ...... 469 3.2.2.7.28 VALUES ...... 469 3.2.2.7.28.1 Permissions ...... 470 3.2.2.7.28.2 Syntax ...... 470 3.2.2.7.28.3 Notes ...... 470 3.2.2.7.28.4 Examples ...... 470 3.2.2.7.28.5 Tabular data with VALUES ...... 470 3.2.2.7.28.6 Using VALUES with a SELECT query ...... 470 3.2.2.7.28.7 Creating a table with VALUES ...... 470 3.2.2.7.29 DROP_SAVED_QUERY ...... 471 3.2.2.7.29.1 Permissions ...... 471 3.2.2.7.29.2 Syntax ...... 471 3.2.2.7.29.3 Returns ...... 471 3.2.2.7.29.4 Parameters ...... 471 3.2.2.7.29.5 Examples ...... 471 3.2.2.7.29.6 Dropping a previously saved query ...... 471 3.2.2.7.30 DUMP_DATABASE_DDL ...... 472 3.2.2.7.30.1 Permissions ...... 472 3.2.2.7.30.2 Syntax ...... 472 3.2.2.7.30.3 Parameters ...... 472 3.2.2.7.30.4 Examples ...... 472 3.2.2.7.30.5 Getting the DDL for a database ...... 472 3.2.2.7.30.6 Exporting database DDL to a file ...... 473 3.2.2.7.31 EXECUTE_SAVED_QUERY ...... 473 3.2.2.7.31.1 Permissions ...... 473 3.2.2.7.31.2 Syntax ...... 473 3.2.2.7.31.3 Returns ...... 
473 3.2.2.7.31.4 Parameters ...... 474 3.2.2.7.31.5 Notes ...... 474 3.2.2.7.31.6 Examples ...... 474 3.2.2.7.31.7 Saving and executing a simple query ...... 475 3.2.2.7.31.8 Saving and executing parametrized query ...... 475 3.2.2.7.32 GET_DDL ...... 476 xxiv 3.2.2.7.32.1 Permissions ...... 476 3.2.2.7.32.2 Syntax ...... 476 3.2.2.7.32.3 Parameters ...... 476 3.2.2.7.32.4 Examples ...... 477 3.2.2.7.32.5 Getting the DDL for a table ...... 477 3.2.2.7.32.6 Exporting table DDL to a file ...... 477 3.2.2.7.33 GET_FUNCTION_DDL ...... 477 3.2.2.7.33.1 Permissions ...... 477 3.2.2.7.33.2 Syntax ...... 478 3.2.2.7.33.3 Parameters ...... 478 3.2.2.7.33.4 Examples ...... 478 3.2.2.7.33.5 Getting the DDL for a function ...... 478 3.2.2.7.33.6 Exporting function DDL to a file ...... 479 3.2.2.7.34 GET_LICENSE_INFO ...... 479 3.2.2.7.35 GET_VIEW_DDL ...... 479 3.2.2.7.35.1 Permissions ...... 479 3.2.2.7.35.2 Syntax ...... 479 3.2.2.7.35.3 Parameters ...... 480 3.2.2.7.35.4 Examples ...... 480 3.2.2.7.35.5 Getting the DDL for a view ...... 480 3.2.2.7.35.6 Exporting view DDL to a file ...... 480 3.2.2.7.36 LIST_SAVED_QUERIES ...... 480 3.2.2.7.36.1 Permissions ...... 481 3.2.2.7.36.2 Syntax ...... 481 3.2.2.7.36.3 Returns ...... 481 3.2.2.7.36.4 Parameters ...... 481 3.2.2.7.36.5 Notes ...... 481 3.2.2.7.36.6 Examples ...... 481 3.2.2.7.36.7 Listing previously saved queries ...... 481 3.2.2.7.36.8 Listing saved queries with the catalog ...... 481 3.2.2.7.37 RECOMPILE_SAVED_QUERY ...... 482 3.2.2.7.37.1 Permissions ...... 482 3.2.2.7.37.2 Syntax ...... 482 3.2.2.7.37.3 Returns ...... 482 3.2.2.7.37.4 Parameters ...... 482 3.2.2.7.37.5 Examples ...... 482 3.2.2.7.37.6 Recreating a query that has been invalidated ...... 482 3.2.2.7.38 RECOMPILE_VIEW ...... 483 3.2.2.7.38.1 Permissions ...... 483 3.2.2.7.38.2 Syntax ...... 484 3.2.2.7.38.3 Parameters ...... 484 3.2.2.7.38.4 Examples ...... 484 3.2.2.7.38.5 Recreating a view that has been invalidated ...... 
484 3.2.2.7.39 SAVE_QUERY ...... 484 3.2.2.7.39.1 Permissions ...... 484 3.2.2.7.39.2 Syntax ...... 484 3.2.2.7.39.3 Returns ...... 485 3.2.2.7.39.4 Parameters ...... 485 3.2.2.7.39.5 Notes ...... 485 3.2.2.7.39.6 Examples ...... 485 3.2.2.7.39.7 Saving and executing a simple query ...... 486 3.2.2.7.39.8 Saving and executing parametrized query ...... 486 3.2.2.7.40 SHOW_SAVED_QUERY ...... 487 3.2.2.7.40.1 Permissions ...... 487

xxv 3.2.2.7.40.2 Syntax ...... 487 3.2.2.7.40.3 Returns ...... 487 3.2.2.7.40.4 Parameters ...... 487 3.2.2.7.40.5 Examples ...... 488 3.2.2.7.40.6 Showing a previously saved query ...... 488 3.2.2.7.41 EXPLAIN ...... 488 3.2.2.7.41.1 Permissions ...... 488 3.2.2.7.41.2 Syntax ...... 488 3.2.2.7.41.3 Parameters ...... 488 3.2.2.7.41.4 Notes ...... 488 3.2.2.7.41.5 Examples ...... 488 3.2.2.7.41.6 Generating a static query plan ...... 488 3.2.2.7.42 SHOW_CONNECTIONS ...... 489 3.2.2.7.42.1 Permissions ...... 489 3.2.2.7.42.2 Syntax ...... 489 3.2.2.7.42.3 Parameters ...... 489 3.2.2.7.42.4 Returns ...... 489 3.2.2.7.42.5 Notes ...... 489 3.2.2.7.42.6 Examples ...... 490 3.2.2.7.42.7 Using SHOW_CONNECTIONS to get statement IDs ...... 490 3.2.2.7.43 SHOW_LOCKS ...... 490 3.2.2.7.43.1 Permissions ...... 490 3.2.2.7.43.2 Syntax ...... 490 3.2.2.7.43.3 Parameters ...... 490 3.2.2.7.43.4 Returns ...... 490 3.2.2.7.43.5 Examples ...... 491 3.2.2.7.43.6 Using SHOW_LOCKS to see active locks ...... 491 3.2.2.7.44 SHOW_NODE_INFO ...... 491 3.2.2.7.44.1 Permissions ...... 492 3.2.2.7.44.2 Syntax ...... 492 3.2.2.7.44.3 Parameters ...... 492 3.2.2.7.44.4 Returns ...... 492 3.2.2.7.44.5 Node types ...... 492 3.2.2.7.44.6 Statement statuses ...... 495 3.2.2.7.44.7 Notes ...... 495 3.2.2.7.44.8 Examples ...... 495 3.2.2.7.44.9 Getting execution details for a statement ...... 495 3.2.2.7.45 SHOW_SERVER_STATUS ...... 496 3.2.2.7.45.1 Permissions ...... 496 3.2.2.7.45.2 Syntax ...... 496 3.2.2.7.45.3 Parameters ...... 497 3.2.2.7.45.4 Returns ...... 497 3.2.2.7.45.5 Notes ...... 497 3.2.2.7.45.6 Examples ...... 498 3.2.2.7.45.7 Using SHOW_SERVER_STATUS to get statement IDs ...... 498 3.2.2.7.46 STOP_STATEMENT ...... 498 3.2.2.7.46.1 Permissions ...... 498 3.2.2.7.46.2 Syntax ...... 498 3.2.2.7.46.3 Parameters ...... 498 3.2.2.7.46.4 Returns ...... 498 3.2.2.7.46.5 Notes ...... 499 3.2.2.7.46.6 Examples ...... 499 3.2.2.7.46.7 Using SHOW_CONNECTIONS to get statement IDs .... 
499 3.2.2.7.47 SHOW_SUBSCRIBED_INSTANCES ...... 499 xxvi 3.2.2.7.47.1 Permissions ...... 499 3.2.2.7.47.2 Syntax ...... 499 3.2.2.7.47.3 Parameters ...... 500 3.2.2.7.47.4 Notes ...... 500 3.2.2.7.47.5 Examples ...... 500 3.2.2.7.47.6 List service queues and workers ...... 500 3.2.2.7.48 SUBSCRIBE_SERVICE ...... 500 3.2.2.7.48.1 Permissions ...... 500 3.2.2.7.48.2 Syntax ...... 500 3.2.2.7.48.3 Parameters ...... 501 3.2.2.7.48.4 Notes ...... 501 3.2.2.7.48.5 Examples ...... 501 3.2.2.7.48.6 Subscribing a worker to a service queue ...... 501 3.2.2.7.49 UNSUBSCRIBE_SERVICE ...... 502 3.2.2.7.49.1 Permissions ...... 502 3.2.2.7.49.2 Syntax ...... 502 3.2.2.7.49.3 Parameters ...... 502 3.2.2.7.49.4 Notes ...... 502 3.2.2.7.49.5 Examples ...... 502 3.2.2.7.49.6 Unsubscribing a worker from a service queue ...... 502 3.2.2.7.50 ALTER DEFAULT PERMISSIONS ...... 503 3.2.2.7.50.1 Permissions ...... 503 3.2.2.7.50.2 Syntax ...... 503 3.2.2.7.50.3 Supported permissions ...... 504 3.2.2.7.50.4 Examples ...... 504 3.2.2.7.50.5 Automatic permissions for newly created schemas ...... 504 3.2.2.7.50.6 Automatic permissions for newly created tables in a schema . 505 3.2.2.7.50.7 Revoke (DROP GRANT) permissions for newly created tables . 505 3.2.2.7.51 ALTER ROLE ...... 505 3.2.2.7.51.1 Subcommands ...... 505 3.2.2.7.52 CREATE ROLE ...... 505 3.2.2.7.52.1 Permissions ...... 505 3.2.2.7.52.2 Syntax ...... 505 3.2.2.7.52.3 Parameters ...... 506 3.2.2.7.52.4 Notes ...... 506 3.2.2.7.52.5 Examples ...... 506 3.2.2.7.52.6 Creating a role with no permissions ...... 506 3.2.2.7.52.7 Creating a user role ...... 506 3.2.2.7.53 DROP ROLE ...... 506 3.2.2.7.53.1 Permissions ...... 507 3.2.2.7.53.2 Syntax ...... 507 3.2.2.7.53.3 Parameters ...... 507 3.2.2.7.53.4 Examples ...... 507 3.2.2.7.53.5 Dropping a role ...... 507 3.2.2.7.54 GET_STATEMENT_PERMISSIONS ...... 507 3.2.2.7.54.1 Permissions ...... 507 3.2.2.7.54.2 Syntax ...... 507 3.2.2.7.54.3 Parameters ...... 508 3.2.2.7.54.4 Returns ...... 
508 3.2.2.7.54.5 Supported permissions ...... 508 3.2.2.7.54.6 Examples ...... 509 3.2.2.7.54.7 Getting permission details for a simple statement ...... 509 3.2.2.7.54.8 Getting permission details for a DDL statement ...... 509 3.2.2.7.55 GRANT ...... 509

3.2.2.7.55.1 Permissions ...... 509 3.2.2.7.55.2 Syntax ...... 509 3.2.2.7.55.3 Parameters ...... 511 3.2.2.7.55.4 Supported permissions ...... 512 3.2.2.7.55.5 Examples ...... 512 3.2.2.7.55.6 Creating a user role with login permissions ...... 512 3.2.2.7.55.7 Promoting a user to a superuser ...... 512 3.2.2.7.55.8 Creating a new role for a group of users ...... 513 3.2.2.7.55.9 Granting with admin option ...... 513 3.2.2.7.55.10 Change password for user role ...... 513 3.2.2.7.56 RENAME ROLE ...... 513 3.2.2.7.56.1 Permissions ...... 513 3.2.2.7.56.2 Syntax ...... 513 3.2.2.7.56.3 Parameters ...... 514 3.2.2.7.56.4 Notes ...... 514 3.2.2.7.56.5 Examples ...... 514 3.2.2.7.56.6 Renaming a role ...... 514 3.2.2.7.57 REVOKE ...... 514 3.2.2.7.57.1 Permissions ...... 514 3.2.2.7.57.2 Syntax ...... 515 3.2.2.7.57.3 Parameters ...... 516 3.2.2.7.57.4 Supported permissions ...... 517 3.2.2.7.57.5 Examples ...... 517 3.2.2.7.57.6 Prevent a role from modifying table contents ...... 517 3.2.2.7.57.7 Demoting a user from superuser ...... 517 3.2.2.7.57.8 Revoking admin option ...... 517 3.2.3 SQL Functions ...... 518 3.2.3.1 Summary of functions ...... 518 3.2.3.1.1 Built-In Scalar Functions ...... 518 3.2.3.1.1.1 Bitwise Operations ...... 518 3.2.3.1.1.2 Conditionals ...... 519 3.2.3.1.1.3 Conversion ...... 519 3.2.3.1.1.4 Date and Time ...... 519 3.2.3.1.1.5 Numeric ...... 519 3.2.3.1.1.6 Strings ...... 521 3.2.3.1.2 User-Defined Scalar Functions ...... 521 3.2.3.1.3 Aggregate Functions ...... 521 3.2.3.1.4 Window Functions ...... 522 3.2.3.1.5 System Functions ...... 522 3.2.3.1.6 Workload Management Functions ...... 522 3.2.3.1.6.1 Built-In Scalar functions ...... 522 3.2.3.1.6.2 & (bitwise AND) ...... 523 3.2.3.1.6.3 Syntax ...... 523 3.2.3.1.6.4 Arguments ...... 523 3.2.3.1.6.5 Returns ...... 523 3.2.3.1.6.6 Notes ...... 523 3.2.3.1.6.7 Examples ...... 523 3.2.3.1.6.8 ~ (bitwise NOT) ...... 524 3.2.3.1.6.9 Syntax ...... 524 3.2.3.1.6.10 Arguments ...... 
524 3.2.3.1.6.11 Returns ...... 524 3.2.3.1.6.12 Notes ...... 524 3.2.3.1.6.13 Examples ...... 524 3.2.3.1.6.14 | (bitwise OR) ...... 525 xxviii 3.2.3.1.6.15 Syntax ...... 525 3.2.3.1.6.16 Arguments ...... 525 3.2.3.1.6.17 Returns ...... 525 3.2.3.1.6.18 Notes ...... 525 3.2.3.1.6.19 Examples ...... 526 3.2.3.1.6.20 << (bitwise shift left) ...... 526 3.2.3.1.6.21 Syntax ...... 526 3.2.3.1.6.22 Arguments ...... 526 3.2.3.1.6.23 Returns ...... 527 3.2.3.1.6.24 Notes ...... 527 3.2.3.1.6.25 Examples ...... 527 3.2.3.1.6.26 >> (bitwise shift right) ...... 527 3.2.3.1.6.27 Syntax ...... 527 3.2.3.1.6.28 Arguments ...... 528 3.2.3.1.6.29 Returns ...... 528 3.2.3.1.6.30 Notes ...... 528 3.2.3.1.6.31 Examples ...... 528 3.2.3.1.6.32 XOR (bitwise XOR) ...... 528 3.2.3.1.6.33 Syntax ...... 529 3.2.3.1.6.34 Arguments ...... 529 3.2.3.1.6.35 Returns ...... 529 3.2.3.1.6.36 Notes ...... 529 3.2.3.1.6.37 Examples ...... 529 3.2.3.1.6.38 BETWEEN ...... 530 3.2.3.1.6.39 Syntax ...... 530 3.2.3.1.6.40 Arguments ...... 530 3.2.3.1.6.41 Returns ...... 530 3.2.3.1.6.42 Notes ...... 530 3.2.3.1.6.43 Examples ...... 530 3.2.3.1.6.44 BETWEEN ...... 530 3.2.3.1.6.45 NOT BETWEEN ...... 531 3.2.3.1.6.46 CASE ...... 531 3.2.3.1.6.47 Syntax ...... 531 3.2.3.1.6.48 Arguments ...... 531 3.2.3.1.6.49 Returns ...... 532 3.2.3.1.6.50 Notes ...... 532 3.2.3.1.6.51 Simple case expression ...... 532 3.2.3.1.6.52 Searched case expression ...... 532 3.2.3.1.6.53 Examples ...... 532 3.2.3.1.6.54 Simple case ...... 532 3.2.3.1.6.55 Searched case ...... 533 3.2.3.1.6.56 Replacing IF with CASE ...... 533 3.2.3.1.6.57 COALESCE ...... 534 3.2.3.1.6.58 Syntax ...... 534 3.2.3.1.6.59 Arguments ...... 534 3.2.3.1.6.60 Returns ...... 534 3.2.3.1.6.61 Notes ...... 534 3.2.3.1.6.62 Examples ...... 534 3.2.3.1.6.63 Coalesce ...... 534 3.2.3.1.6.64 Replacing NULL values in aggregations ...... 534 3.2.3.1.6.65 IN ...... 535 3.2.3.1.6.66 Syntax ...... 535 3.2.3.1.6.67 Arguments ...... 535 3.2.3.1.6.68 Returns ...... 
535

3.2.3.1.6.69 Notes ...... 535 3.2.3.1.6.70 Examples ...... 535 3.2.3.1.6.71 IN ...... 535 3.2.3.1.6.72 NOT IN ...... 536 3.2.3.1.6.73 IS_ASCII ...... 536 3.2.3.1.6.74 Syntax ...... 536 3.2.3.1.6.75 Arguments ...... 536 3.2.3.1.6.76 Returns ...... 536 3.2.3.1.6.77 Notes ...... 536 3.2.3.1.6.78 Examples ...... 536 3.2.3.1.6.79 IS NULL ...... 537 3.2.3.1.6.80 IS NULL ...... 537 3.2.3.1.6.81 Syntax ...... 537 3.2.3.1.6.82 Arguments ...... 537 3.2.3.1.6.83 Returns ...... 537 3.2.3.1.6.84 Examples ...... 537 3.2.3.1.6.85 IS NULL ...... 538 3.2.3.1.6.86 Using IS NOT NULL to filter unwanted results ...... 538 3.2.3.1.6.87 ISNULL ...... 538 3.2.3.1.6.88 Syntax ...... 538 3.2.3.1.6.89 Arguments ...... 538 3.2.3.1.6.90 Returns ...... 538 3.2.3.1.6.91 Notes ...... 538 3.2.3.1.6.92 Examples ...... 539 3.2.3.1.6.93 ISNULL ...... 539 3.2.3.1.6.94 Replacing NULL values in aggregations ...... 539 3.2.3.1.6.95 FROM_UNIXTS, FROM_UNIXTSMS ...... 539 3.2.3.1.6.96 Syntax ...... 539 3.2.3.1.6.97 Arguments ...... 539 3.2.3.1.6.98 Returns ...... 539 3.2.3.1.6.99 Notes ...... 540 3.2.3.1.6.100 Examples ...... 540 3.2.3.1.6.101 Convert a UNIX timestamp to DATETIME ...... 540 3.2.3.1.6.102 TO_HEX ...... 540 3.2.3.1.6.103 Syntax ...... 540 3.2.3.1.6.104 Arguments ...... 540 3.2.3.1.6.105 Returns ...... 540 3.2.3.1.6.106 Examples ...... 540 3.2.3.1.6.107 Convert numbers to hexadecimal ...... 540 3.2.3.1.6.108 TO_UNIXTS, TO_UNIXTSMS ...... 541 3.2.3.1.6.109 Syntax ...... 541 3.2.3.1.6.110 Arguments ...... 541 3.2.3.1.6.111 Returns ...... 541 3.2.3.1.6.112 Notes ...... 541 3.2.3.1.6.113 Examples ...... 541 3.2.3.1.6.114 Get the current UNIX timestamp ...... 541 3.2.3.1.6.115 Filter on a range of UNIX timestamps ...... 542 3.2.3.1.6.116 CURDATE ...... 542 3.2.3.1.6.117 Syntax ...... 542 3.2.3.1.6.118 Arguments ...... 542 3.2.3.1.6.119 Returns ...... 542 3.2.3.1.6.120 Notes ...... 542 3.2.3.1.6.121 Examples ...... 542 3.2.3.1.6.122 Get the current system date ...... 
542 xxx 3.2.3.1.6.123 CURRENT_DATE ...... 543 3.2.3.1.6.124 Syntax ...... 543 3.2.3.1.6.125 Arguments ...... 543 3.2.3.1.6.126 Returns ...... 543 3.2.3.1.6.127 Notes ...... 543 3.2.3.1.6.128 Examples ...... 543 3.2.3.1.6.129 Get the current system date ...... 543 3.2.3.1.6.130 CURRENT_TIMESTAMP ...... 543 3.2.3.1.6.131 Syntax ...... 544 3.2.3.1.6.132 Arguments ...... 544 3.2.3.1.6.133 Returns ...... 544 3.2.3.1.6.134 Notes ...... 544 3.2.3.1.6.135 Examples ...... 544 3.2.3.1.6.136 Get the current system date and time ...... 544 3.2.3.1.6.137 Find events that happen before this month ...... 544 3.2.3.1.6.138 DATEADD ...... 544 3.2.3.1.6.139 Syntax ...... 545 3.2.3.1.6.140 Arguments ...... 545 3.2.3.1.6.141 Valid date parts ...... 545 3.2.3.1.6.142 Returns ...... 546 3.2.3.1.6.143 Notes ...... 546 3.2.3.1.6.144 Examples ...... 546 3.2.3.1.6.145 Add a quarter to each date ...... 546 3.2.3.1.6.146 Getting next month’s date ...... 546 3.2.3.1.6.147 Filtering +- 50 years from a specific date ...... 547 3.2.3.1.6.148 Check if a year is a leap year ...... 547 3.2.3.1.6.149 DATEDIFF ...... 547 3.2.3.1.6.150 Syntax ...... 547 3.2.3.1.6.151 Arguments ...... 548 3.2.3.1.6.152 Valid date parts ...... 548 3.2.3.1.6.153 Returns ...... 548 3.2.3.1.6.154 Notes ...... 548 3.2.3.1.6.155 Examples ...... 548 3.2.3.1.6.156 Find out how far past events are ...... 549 3.2.3.1.6.157 In years ...... 549 3.2.3.1.6.158 In days ...... 549 3.2.3.1.6.159 In hours ...... 549 3.2.3.1.6.160 DATEPART ...... 550 3.2.3.1.6.161 Syntax ...... 550 3.2.3.1.6.162 Arguments ...... 550 3.2.3.1.6.163 Valid date parts ...... 551 3.2.3.1.6.164 Returns ...... 551 3.2.3.1.6.165 Notes ...... 551 3.2.3.1.6.166 Examples ...... 551 3.2.3.1.6.167 Break up a DATE into components ...... 551 3.2.3.1.6.168 Break up a DATETIME into time components ...... 552 3.2.3.1.6.169 Count number of rows grouped by quarter ...... 552 3.2.3.1.6.170 EOMONTH ...... 552 3.2.3.1.6.171 Syntax ...... 553 3.2.3.1.6.172 Arguments ...... 
553 3.2.3.1.6.173 Returns ...... 553 3.2.3.1.6.174 Notes ...... 553 3.2.3.1.6.175 Examples ...... 553 3.2.3.1.6.176 Find last day of the month for a DATE ...... 553

3.2.3.1.6.177 Find the last day of the month for a DATETIME ...... 554 3.2.3.1.6.178 EXTRACT ...... 554 3.2.3.1.6.179 Syntax ...... 554 3.2.3.1.6.180 Arguments ...... 554 3.2.3.1.6.181 Valid date parts ...... 555 3.2.3.1.6.182 Returns ...... 555 3.2.3.1.6.183 Notes ...... 555 3.2.3.1.6.184 Examples ...... 555 3.2.3.1.6.185 Break up a DATE into components ...... 555 3.2.3.1.6.186 GETDATE ...... 556 3.2.3.1.6.187 Syntax ...... 556 3.2.3.1.6.188 Arguments ...... 556 3.2.3.1.6.189 Returns ...... 556 3.2.3.1.6.190 Notes ...... 556 3.2.3.1.6.191 Examples ...... 556 3.2.3.1.6.192 Get the current system date and time ...... 556 3.2.3.1.6.193 Find events that happen before this month ...... 556 3.2.3.1.6.194 SYSDATE ...... 557 3.2.3.1.6.195 Syntax ...... 557 3.2.3.1.6.196 Arguments ...... 557 3.2.3.1.6.197 Returns ...... 557 3.2.3.1.6.198 Notes ...... 557 3.2.3.1.6.199 Examples ...... 557 3.2.3.1.6.200 Get the current system date and time ...... 557 3.2.3.1.6.201 Find events that happen before this month ...... 557 3.2.3.1.6.202 TRUNC ...... 558 3.2.3.1.6.203 Syntax ...... 558 3.2.3.1.6.204 Arguments ...... 558 3.2.3.1.6.205 Valid date parts ...... 558 3.2.3.1.6.206 Returns ...... 559 3.2.3.1.6.207 Notes ...... 559 3.2.3.1.6.208 Examples ...... 559 3.2.3.1.6.209 Set all DATE columns to DATETIME at midnight ...... 559 3.2.3.1.6.210 Find the first day of the month for dates ...... 559 3.2.3.1.6.211 Calculate number of hours from New Years ...... 560 3.2.3.1.6.212 ABS ...... 560 3.2.3.1.6.213 Syntax ...... 560 3.2.3.1.6.214 Arguments ...... 560 3.2.3.1.6.215 Returns ...... 560 3.2.3.1.6.216 Notes ...... 560 3.2.3.1.6.217 Examples ...... 560 3.2.3.1.6.218 Absolute value on an integer ...... 561 3.2.3.1.6.219 Absolute value on integer and floating point ...... 561 3.2.3.1.6.220 ACOS ...... 561 3.2.3.1.6.221 Syntax ...... 561 3.2.3.1.6.222 Arguments ...... 561 3.2.3.1.6.223 Returns ...... 561 3.2.3.1.6.224 Notes ...... 562 3.2.3.1.6.225 Examples ...... 
562 3.2.3.1.6.226 Inverse cosine of 0 (= π/2) ...... 562 3.2.3.1.6.227 Computing inverse cosine for a column ...... 562 3.2.3.1.6.228 Arithmetic operators ...... 562 3.2.3.1.6.229 Syntax ...... 563 3.2.3.1.6.230 Arguments ...... 563 xxxii 3.2.3.1.6.231 Arithmetic operators ...... 563 3.2.3.1.6.232 Notes ...... 563 3.2.3.1.6.233 Examples ...... 563 3.2.3.1.6.234 Coercing with a unary + ...... 563 3.2.3.1.6.235 Using infix operators ...... 563 3.2.3.1.6.236 ASIN ...... 564 3.2.3.1.6.237 Syntax ...... 564 3.2.3.1.6.238 Arguments ...... 564 3.2.3.1.6.239 Returns ...... 564 3.2.3.1.6.240 Notes ...... 564 3.2.3.1.6.241 Examples ...... 564 3.2.3.1.6.242 Inverse sine of 1 (= π/2) ...... 565 3.2.3.1.6.243 Computing inverse sine for a column ...... 565 3.2.3.1.6.244 ATAN ...... 565 3.2.3.1.6.245 Syntax ...... 565 3.2.3.1.6.246 Arguments ...... 565 3.2.3.1.6.247 Returns ...... 565 3.2.3.1.6.248 Notes ...... 565 3.2.3.1.6.249 Examples ...... 566 3.2.3.1.6.250 Inverse tangent of 1 (= π/4) ...... 566 3.2.3.1.6.251 Computing inverse tangent for a column ...... 566 3.2.3.1.6.252 ATN2 ...... 566 3.2.3.1.6.253 Syntax ...... 566 3.2.3.1.6.254 Arguments ...... 567 3.2.3.1.6.255 Returns ...... 567 3.2.3.1.6.256 Notes ...... 567 3.2.3.1.6.257 Examples ...... 567 3.2.3.1.6.258 Inverse tangent of (1, 0) (= π/2) ...... 567 3.2.3.1.6.259 Inverse tangent of x=0.5, y=SQRT(3)/2 (= π/3, or 60°) ... 567 3.2.3.1.6.260 CEILING / CEIL ...... 567 3.2.3.1.6.261 Syntax ...... 568 3.2.3.1.6.262 Arguments ...... 568 3.2.3.1.6.263 Returns ...... 568 3.2.3.1.6.264 Notes ...... 568 3.2.3.1.6.265 Examples ...... 568 3.2.3.1.6.266 FLOOR vs. CEIL vs. ROUND ...... 568 3.2.3.1.6.267 COS ...... 568 3.2.3.1.6.268 Syntax ...... 568 3.2.3.1.6.269 Arguments ...... 569 3.2.3.1.6.270 Returns ...... 569 3.2.3.1.6.271 Notes ...... 569 3.2.3.1.6.272 Examples ...... 569 3.2.3.1.6.273 Cosine of 0 (= 1) ...... 569 3.2.3.1.6.274 Computing cosine for a column, in radians ...... 569 3.2.3.1.6.275 COT ...... 
570 3.2.3.1.6.276 Syntax ...... 570 3.2.3.1.6.277 Arguments ...... 570 3.2.3.1.6.278 Returns ...... 570 3.2.3.1.6.279 Notes ...... 570 3.2.3.1.6.280 Examples ...... 570 3.2.3.1.6.281 Cotangent of π/4 (= 1) ...... 570 3.2.3.1.6.282 Computing cotangents for a column, in radians ...... 571 3.2.3.1.6.283 CRC64 ...... 571 3.2.3.1.6.284 Syntax ...... 571

xxxiii 3.2.3.1.6.285 Arguments ...... 571 3.2.3.1.6.286 Returns ...... 571 3.2.3.1.6.287 Notes ...... 571 3.2.3.1.6.288 Examples ...... 572 3.2.3.1.6.289 Calculate a CRC-64 hash of a string ...... 572 3.2.3.1.6.290 DEGREES ...... 572 3.2.3.1.6.291 Syntax ...... 572 3.2.3.1.6.292 Arguments ...... 572 3.2.3.1.6.293 Returns ...... 572 3.2.3.1.6.294 Notes ...... 572 3.2.3.1.6.295 Examples ...... 572 3.2.3.1.6.296 EXP ...... 573 3.2.3.1.6.297 Syntax ...... 573 3.2.3.1.6.298 Arguments ...... 573 3.2.3.1.6.299 Returns ...... 573 3.2.3.1.6.300 Notes ...... 573 3.2.3.1.6.301 Examples ...... 573 3.2.3.1.6.302 Natural exponent values ...... 573 3.2.3.1.6.303 FLOOR ...... 574 3.2.3.1.6.304 Syntax ...... 574 3.2.3.1.6.305 Arguments ...... 574 3.2.3.1.6.306 Returns ...... 574 3.2.3.1.6.307 Notes ...... 574 3.2.3.1.6.308 Examples ...... 574 3.2.3.1.6.309 FLOOR vs. CEILING / CEIL vs. ROUND ...... 574 3.2.3.1.6.310 LOG ...... 575 3.2.3.1.6.311 Syntax ...... 575 3.2.3.1.6.312 Arguments ...... 575 3.2.3.1.6.313 Returns ...... 575 3.2.3.1.6.314 Notes ...... 575 3.2.3.1.6.315 Examples ...... 575 3.2.3.1.6.316 Natural log values ...... 575 3.2.3.1.6.317 LOG10 ...... 575 3.2.3.1.6.318 Syntax ...... 575 3.2.3.1.6.319 Arguments ...... 576 3.2.3.1.6.320 Returns ...... 576 3.2.3.1.6.321 Notes ...... 576 3.2.3.1.6.322 Examples ...... 576 3.2.3.1.6.323 Log (base 10) values ...... 576 3.2.3.1.6.324 MOD, % ...... 576 3.2.3.1.6.325 Syntax ...... 576 3.2.3.1.6.326 Arguments ...... 576 3.2.3.1.6.327 Returns ...... 576 3.2.3.1.6.328 Notes ...... 577 3.2.3.1.6.329 Examples ...... 577 3.2.3.1.6.330 Using the MOD function ...... 577 3.2.3.1.6.331 Using the infix operator ...... 577 3.2.3.1.6.332 PI ...... 577 3.2.3.1.6.333 Syntax ...... 577 3.2.3.1.6.334 Arguments ...... 577 3.2.3.1.6.335 Returns ...... 577 3.2.3.1.6.336 Examples ...... 577 3.2.3.1.6.337 Using PI() to represent degrees in radians ...... 577 3.2.3.1.6.338 Tangent of π/4 (= 1) ...... 578 xxxiv 3.2.3.1.6.339 Sine of π/2 (= 1) ...... 
578 3.2.3.1.6.340 POWER ...... 578 3.2.3.1.6.341 Syntax ...... 578 3.2.3.1.6.342 Arguments ...... 578 3.2.3.1.6.343 Returns ...... 578 3.2.3.1.6.344 Notes ...... 579 3.2.3.1.6.345 Examples ...... 579 3.2.3.1.6.346 On integers ...... 579 3.2.3.1.6.347 On floating point ...... 579 3.2.3.1.6.348 RADIANS ...... 579 3.2.3.1.6.349 Syntax ...... 579 3.2.3.1.6.350 Arguments ...... 579 3.2.3.1.6.351 Returns ...... 580 3.2.3.1.6.352 Notes ...... 580 3.2.3.1.6.353 Examples ...... 580 3.2.3.1.6.354 ROUND ...... 580 3.2.3.1.6.355 Syntax ...... 580 3.2.3.1.6.356 Arguments ...... 581 3.2.3.1.6.357 Returns ...... 581 3.2.3.1.6.358 Notes ...... 581 3.2.3.1.6.359 Examples ...... 581 3.2.3.1.6.360 Rounding to the nearest integer ...... 581 3.2.3.1.6.361 Rounding to 2 digits after the decimal point ...... 581 3.2.3.1.6.362 FLOOR vs. CEILING / CEIL vs. ROUND ...... 581 3.2.3.1.6.363 SIN ...... 582 3.2.3.1.6.364 Syntax ...... 582 3.2.3.1.6.365 Arguments ...... 582 3.2.3.1.6.366 Returns ...... 582 3.2.3.1.6.367 Notes ...... 582 3.2.3.1.6.368 Examples ...... 582 3.2.3.1.6.369 Sine of π/2 (= 1) ...... 583 3.2.3.1.6.370 Computing sine for a column, in radians ...... 583 3.2.3.1.6.371 SQRT ...... 583 3.2.3.1.6.372 Syntax ...... 583 3.2.3.1.6.373 Arguments ...... 583 3.2.3.1.6.374 Returns ...... 583 3.2.3.1.6.375 Notes ...... 584 3.2.3.1.6.376 Examples ...... 584 3.2.3.1.6.377 Replacing negative values with NULL ...... 584 3.2.3.1.6.378 SQUARE ...... 584 3.2.3.1.6.379 Syntax ...... 585 3.2.3.1.6.380 Arguments ...... 585 3.2.3.1.6.381 Returns ...... 585 3.2.3.1.6.382 Notes ...... 585 3.2.3.1.6.383 Examples ...... 585 3.2.3.1.6.384 TAN ...... 585 3.2.3.1.6.385 Syntax ...... 585 3.2.3.1.6.386 Arguments ...... 585 3.2.3.1.6.387 Returns ...... 585 3.2.3.1.6.388 Notes ...... 586 3.2.3.1.6.389 Examples ...... 586 3.2.3.1.6.390 Tangent of π/4 (= 1) ...... 586 3.2.3.1.6.391 Computing tangents for a column, in radians ...... 586 3.2.3.1.6.392 TRUNC ...... 586

xxxv 3.2.3.1.6.393 Syntax ...... 587 3.2.3.1.6.394 Arguments ...... 587 3.2.3.1.6.395 Returns ...... 587 3.2.3.1.6.396 Notes ...... 587 3.2.3.1.6.397 Examples ...... 587 3.2.3.1.6.398 Rounding to the nearest integer ...... 587 3.2.3.1.6.399 TRUNC vs. ROUND ...... 587 3.2.3.1.6.400 CHAR_LENGTH ...... 588 3.2.3.1.6.401 Syntax ...... 588 3.2.3.1.6.402 Arguments ...... 588 3.2.3.1.6.403 Returns ...... 588 3.2.3.1.6.404 Notes ...... 588 3.2.3.1.6.405 Examples ...... 588 3.2.3.1.6.406 Length in characters and bytes of strings ...... 588 3.2.3.1.6.407 CHARINDEX ...... 589 3.2.3.1.6.408 Syntax ...... 589 3.2.3.1.6.409 Arguments ...... 589 3.2.3.1.6.410 Returns ...... 589 3.2.3.1.6.411 Notes ...... 589 3.2.3.1.6.412 Examples ...... 589 3.2.3.1.6.413 Using CHARINDEX ...... 590 3.2.3.1.6.414 || (Concatenate) ...... 590 3.2.3.1.6.415 Syntax ...... 590 3.2.3.1.6.416 Arguments ...... 590 3.2.3.1.6.417 Returns ...... 590 3.2.3.1.6.418 Notes ...... 590 3.2.3.1.6.419 Examples ...... 590 3.2.3.1.6.420 Concatenate two values ...... 591 3.2.3.1.6.421 Concatenate two strings ...... 592 3.2.3.1.6.422 Adding spaces ...... 592 3.2.3.1.6.423 ISPREFIXOF ...... 592 3.2.3.1.6.424 Syntax ...... 592 3.2.3.1.6.425 Arguments ...... 592 3.2.3.1.6.426 Returns ...... 593 3.2.3.1.6.427 Notes ...... 593 3.2.3.1.6.428 Examples ...... 593 3.2.3.1.6.429 Filtering using ISPREFIXOF ...... 593 3.2.3.1.6.430 LEFT ...... 593 3.2.3.1.6.431 Syntax ...... 593 3.2.3.1.6.432 Arguments ...... 594 3.2.3.1.6.433 Returns ...... 594 3.2.3.1.6.434 Notes ...... 594 3.2.3.1.6.435 Examples ...... 594 3.2.3.1.6.436 Using LEFT ...... 594 3.2.3.1.6.437 LEN ...... 594 3.2.3.1.6.438 Syntax ...... 595 3.2.3.1.6.439 Arguments ...... 595 3.2.3.1.6.440 Returns ...... 595 3.2.3.1.6.441 Notes ...... 595 3.2.3.1.6.442 Examples ...... 595 3.2.3.1.6.443 Length in characters and bytes of strings ...... 596 3.2.3.1.6.444 Length of an ASCII string ...... 596 3.2.3.1.6.445 Absolute value on integer and floating point ...... 
596 3.2.3.1.6.446 LIKE ...... 596 xxxvi 3.2.3.1.6.447 Syntax ...... 597 3.2.3.1.6.448 Arguments ...... 597 3.2.3.1.6.449 Test patterns ...... 597 3.2.3.1.6.450 Returns ...... 597 3.2.3.1.6.451 Notes ...... 597 3.2.3.1.6.452 Examples ...... 597 3.2.3.1.6.453 Match the beginning of a string ...... 598 3.2.3.1.6.454 Match a wildcard character by escaping ...... 599 3.2.3.1.6.455 Negate with NOT ...... 599 3.2.3.1.6.456 Match the middle of a string ...... 599 3.2.3.1.6.457 Find players with a middle name or suffix ...... 599 3.2.3.1.6.458 Find NON-LITERAL patterns ...... 600 3.2.3.1.6.459 Select rows in which z is a prefix of y: ...... 600 3.2.3.1.6.460 Select rows in which y contains z as a substring: ...... 600 3.2.3.1.6.461 Values that contain wildcards as well: ...... 600 3.2.3.1.6.462 LOWER ...... 600 3.2.3.1.6.463 Syntax ...... 601 3.2.3.1.6.464 Arguments ...... 601 3.2.3.1.6.465 Returns ...... 601 3.2.3.1.6.466 Notes ...... 601 3.2.3.1.6.467 Examples ...... 601 3.2.3.1.6.468 Lower-casing a literal value ...... 601 3.2.3.1.6.469 Lower-casing a column of values ...... 601 3.2.3.1.6.470 LTRIM ...... 602 3.2.3.1.6.471 Syntax ...... 602 3.2.3.1.6.472 Arguments ...... 602 3.2.3.1.6.473 Returns ...... 602 3.2.3.1.6.474 Notes ...... 602 3.2.3.1.6.475 Examples ...... 602 3.2.3.1.6.476 Trimming a literal value ...... 603 3.2.3.1.6.477 Trimming a column of values ...... 603 3.2.3.1.6.478 OCTET_LENGTH ...... 603 3.2.3.1.6.479 Syntax ...... 603 3.2.3.1.6.480 Arguments ...... 604 3.2.3.1.6.481 Returns ...... 604 3.2.3.1.6.482 Notes ...... 604 3.2.3.1.6.483 Examples ...... 604 3.2.3.1.6.484 Length in characters and bytes of strings ...... 604 3.2.3.1.6.485 PATINDEX ...... 604 3.2.3.1.6.486 Syntax ...... 605 3.2.3.1.6.487 Arguments ...... 605 3.2.3.1.6.488 Test patterns ...... 605 3.2.3.1.6.489 Returns ...... 605 3.2.3.1.6.490 Notes ...... 605 3.2.3.1.6.491 Examples ...... 605 3.2.3.1.6.492 Simple patterns ...... 606 3.2.3.1.6.493 Complex wildcards expressions ...... 
606 3.2.3.1.6.494 REGEXP_COUNT ...... 606 3.2.3.1.6.495 Syntax ...... 607 3.2.3.1.6.496 Arguments ...... 607 3.2.3.1.6.497 Test patterns ...... 607 3.2.3.1.6.498 Returns ...... 607 3.2.3.1.6.499 Notes ...... 607 3.2.3.1.6.500 Examples ...... 608

xxxvii 3.2.3.1.6.501 Find players with at least 3 names (2 spaces) ...... 608 3.2.3.1.6.502 Using the offset index ...... 609 3.2.3.1.6.503 REGEXP_INSTR ...... 609 3.2.3.1.6.504 Syntax ...... 609 3.2.3.1.6.505 Arguments ...... 609 3.2.3.1.6.506 Test patterns ...... 610 3.2.3.1.6.507 Returns ...... 610 3.2.3.1.6.508 Notes ...... 610 3.2.3.1.6.509 Examples ...... 610 3.2.3.1.6.510 Find players with ‘ow’ in their name ...... 611 3.2.3.1.6.511 Using the return_position argument ...... 611 3.2.3.1.6.512 REGEXP_SUBSTR ...... 612 3.2.3.1.6.513 Syntax ...... 612 3.2.3.1.6.514 Arguments ...... 612 3.2.3.1.6.515 Test patterns ...... 613 3.2.3.1.6.516 Returns ...... 613 3.2.3.1.6.517 Notes ...... 613 3.2.3.1.6.518 Examples ...... 613 3.2.3.1.6.519 Find players with ‘o’ in their name ...... 614 3.2.3.1.6.520 Using the return_position argument ...... 614 3.2.3.1.6.521 REPEAT ...... 615 3.2.3.1.6.522 Syntax ...... 615 3.2.3.1.6.523 Arguments ...... 615 3.2.3.1.6.524 Returns ...... 615 3.2.3.1.6.525 Notes ...... 615 3.2.3.1.6.526 Examples ...... 615 3.2.3.1.6.527 Repeat the text in customername 2 times: ...... 616 3.2.3.1.6.528 Repeat the string 0 times: ...... 616 3.2.3.1.6.529 Repeat the string 1 times: ...... 616 3.2.3.1.6.530 Repeat the string 3 times: ...... 616 3.2.3.1.6.531 Repeat an empty string 10 times: ...... 616 3.2.3.1.6.532 Repeat a string -3 times: ...... 617 3.2.3.1.6.533 REPLACE ...... 617 3.2.3.1.6.534 Syntax ...... 617 3.2.3.1.6.535 Arguments ...... 617 3.2.3.1.6.536 Returns ...... 617 3.2.3.1.6.537 Notes ...... 617 3.2.3.1.6.538 Examples ...... 617 3.2.3.1.6.539 Replacing single characters ...... 618 3.2.3.1.6.540 Replacing words ...... 618 3.2.3.1.6.541 REVERSE ...... 618 3.2.3.1.6.542 Syntax ...... 618 3.2.3.1.6.543 Arguments ...... 618 3.2.3.1.6.544 Returns ...... 619 3.2.3.1.6.545 Notes ...... 619 3.2.3.1.6.546 Examples ...... 619 3.2.3.1.6.547 Using REVERSE ...... 619 3.2.3.1.6.548 RIGHT ...... 619 3.2.3.1.6.549 Syntax ...... 
619 3.2.3.1.6.550 Arguments ...... 620 3.2.3.1.6.551 Returns ...... 620 3.2.3.1.6.552 Notes ...... 620 3.2.3.1.6.553 Examples ...... 620 3.2.3.1.6.554 Using RIGHT ...... 620 xxxviii 3.2.3.1.6.555 RLIKE ...... 620 3.2.3.1.6.556 Syntax ...... 621 3.2.3.1.6.557 Arguments ...... 621 3.2.3.1.6.558 Test patterns ...... 621 3.2.3.1.6.559 Returns ...... 621 3.2.3.1.6.560 Notes ...... 621 3.2.3.1.6.561 Examples ...... 622 3.2.3.1.6.562 Match the beginning of a string ...... 622 3.2.3.1.6.563 Negate with NOT ...... 623 3.2.3.1.6.564 Match the middle of a string ...... 623 3.2.3.1.6.565 Find players with a Roman numeral suffix ...... 623 3.2.3.1.6.566 Find players with just one middle name ...... 623 3.2.3.1.6.567 RTRIM ...... 624 3.2.3.1.6.568 Syntax ...... 624 3.2.3.1.6.569 Arguments ...... 624 3.2.3.1.6.570 Returns ...... 624 3.2.3.1.6.571 Notes ...... 624 3.2.3.1.6.572 Examples ...... 624 3.2.3.1.6.573 Trimming a literal value ...... 625 3.2.3.1.6.574 Trimming a column of values ...... 625 3.2.3.1.6.575 SUBSTRING ...... 625 3.2.3.1.6.576 Syntax ...... 625 3.2.3.1.6.577 Arguments ...... 626 3.2.3.1.6.578 Returns ...... 626 3.2.3.1.6.579 Notes ...... 626 3.2.3.1.6.580 Examples ...... 626 3.2.3.1.6.581 Substring using fixed offsets ...... 627 3.2.3.1.6.582 Truncating strings ...... 627 3.2.3.1.6.583 TRIM ...... 628 3.2.3.1.6.584 Syntax ...... 628 3.2.3.1.6.585 Arguments ...... 628 3.2.3.1.6.586 Returns ...... 628 3.2.3.1.6.587 Notes ...... 628 3.2.3.1.6.588 Examples ...... 628 3.2.3.1.6.589 Trimming a literal value ...... 628 3.2.3.1.6.590 Trimming a column of values ...... 629 3.2.3.1.6.591 UPPER ...... 629 3.2.3.1.6.592 Syntax ...... 629 3.2.3.1.6.593 Arguments ...... 629 3.2.3.1.6.594 Returns ...... 630 3.2.3.1.6.595 Notes ...... 630 3.2.3.1.6.596 Examples ...... 630 3.2.3.1.6.597 Upper-casing a literal value ...... 630 3.2.3.1.6.598 Upper-casing a column of values ...... 630 3.2.3.1.6.599 User-Defined Functions ...... 630 3.2.3.1.6.600 Scalar SQL UDF ...... 
631 3.2.3.1.6.601 Syntax ...... 631 3.2.3.1.6.602 Examples ...... 631 3.2.3.1.6.603 Example 1 – Support for Different Syntax ...... 631 3.2.3.1.6.604 Example 2 – Manipulating Strings ...... 631 3.2.3.1.6.605 Example 3 – Manually Building Functionality ...... 632 3.2.3.1.6.606 Usage Notes ...... 632 3.2.3.1.6.607 Restrictions ...... 632 3.2.3.1.6.608 Aggregate Functions ...... 633

xxxix 3.2.3.1.6.609 Overview ...... 633 3.2.3.1.6.610 Available Aggregate Functions ...... 633 3.2.3.1.6.611 AVG ...... 633 3.2.3.1.6.612 Syntax ...... 633 3.2.3.1.6.613 Arguments ...... 633 3.2.3.1.6.614 Returns ...... 633 3.2.3.1.6.615 Notes ...... 634 3.2.3.1.6.616 Examples ...... 634 3.2.3.1.6.617 Simple average ...... 635 3.2.3.1.6.618 Combine AVG with other aggregates ...... 635 3.2.3.1.6.619 CORR ...... 635 3.2.3.1.6.620 Syntax ...... 636 3.2.3.1.6.621 Arguments ...... 636 3.2.3.1.6.622 Returns ...... 636 3.2.3.1.6.623 Notes ...... 636 3.2.3.1.6.624 Examples ...... 636 3.2.3.1.6.625 Simple correlation ...... 637 3.2.3.1.6.626 COUNT ...... 638 3.2.3.1.6.627 Syntax ...... 638 3.2.3.1.6.628 Arguments ...... 638 3.2.3.1.6.629 Returns ...... 638 3.2.3.1.6.630 Notes ...... 638 3.2.3.1.6.631 Examples ...... 639 3.2.3.1.6.632 Count rows in a table ...... 639 3.2.3.1.6.633 Count distinct values in a table ...... 640 3.2.3.1.6.634 Combine COUNT with other aggregates ...... 640 3.2.3.1.6.635 COVAR_POP ...... 640 3.2.3.1.6.636 Syntax ...... 641 3.2.3.1.6.637 Arguments ...... 641 3.2.3.1.6.638 Returns ...... 641 3.2.3.1.6.639 Notes ...... 641 3.2.3.1.6.640 Examples ...... 641 3.2.3.1.6.641 Simple covariance ...... 642 3.2.3.1.6.642 COVAR_SAMP ...... 643 3.2.3.1.6.643 Syntax ...... 643 3.2.3.1.6.644 Arguments ...... 643 3.2.3.1.6.645 Returns ...... 643 3.2.3.1.6.646 Notes ...... 643 3.2.3.1.6.647 Examples ...... 643 3.2.3.1.6.648 Simple covariance ...... 644 3.2.3.1.6.649 MAX ...... 645 3.2.3.1.6.650 Syntax ...... 645 3.2.3.1.6.651 Arguments ...... 645 3.2.3.1.6.652 Returns ...... 645 3.2.3.1.6.653 Notes ...... 646 3.2.3.1.6.654 Examples ...... 646 3.2.3.1.6.655 Simple Minimum, Maximum on numeric columns ...... 646 3.2.3.1.6.656 Minimum and maximum on text columns ...... 647 3.2.3.1.6.657 Combine MAX with GROUP BY ...... 647 3.2.3.1.6.658 MIN ...... 647 3.2.3.1.6.659 Syntax ...... 647 3.2.3.1.6.660 Arguments ...... 647 3.2.3.1.6.661 Returns ...... 
648 3.2.3.1.6.662 Notes ...... 648 xl 3.2.3.1.6.663 Examples ...... 648 3.2.3.1.6.664 Simple Minimum, Maximum on numeric columns ...... 649 3.2.3.1.6.665 Minimum and maximum on text columns ...... 649 3.2.3.1.6.666 Combine MIN with GROUP BY ...... 649 3.2.3.1.6.667 STDDEV_POP ...... 649 3.2.3.1.6.668 Syntax ...... 649 3.2.3.1.6.669 Arguments ...... 650 3.2.3.1.6.670 Returns ...... 650 3.2.3.1.6.671 Notes ...... 650 3.2.3.1.6.672 Examples ...... 650 3.2.3.1.6.673 Simple standard deviation ...... 651 3.2.3.1.6.674 Combine STDEVP with other aggregates ...... 651 3.2.3.1.6.675 STDDEV_SAMP ...... 652 3.2.3.1.6.676 Syntax ...... 652 3.2.3.1.6.677 Arguments ...... 652 3.2.3.1.6.678 Returns ...... 653 3.2.3.1.6.679 Notes ...... 653 3.2.3.1.6.680 Examples ...... 653 3.2.3.1.6.681 Simple standard deviation ...... 654 3.2.3.1.6.682 Combine stddev with other aggregates ...... 654 3.2.3.1.6.683 SUM ...... 654 3.2.3.1.6.684 Syntax ...... 654 3.2.3.1.6.685 Arguments ...... 655 3.2.3.1.6.686 Returns ...... 655 3.2.3.1.6.687 Notes ...... 655 3.2.3.1.6.688 Examples ...... 655 3.2.3.1.6.689 Simple sum ...... 656 3.2.3.1.6.690 Sum only distinct values ...... 656 3.2.3.1.6.691 Combine sum with GROUP BY ...... 656 3.2.3.1.6.692 VAR_POP ...... 657 3.2.3.1.6.693 Syntax ...... 657 3.2.3.1.6.694 Arguments ...... 657 3.2.3.1.6.695 Returns ...... 658 3.2.3.1.6.696 Notes ...... 658 3.2.3.1.6.697 Examples ...... 658 3.2.3.1.6.698 Simple variance ...... 659 3.2.3.1.6.699 Combine VARP with other aggregates ...... 659 3.2.3.1.6.700 VAR_SAMP ...... 659 3.2.3.1.6.701 Syntax ...... 660 3.2.3.1.6.702 Arguments ...... 660 3.2.3.1.6.703 Returns ...... 660 3.2.3.1.6.704 Notes ...... 660 3.2.3.1.6.705 Examples ...... 660 3.2.3.1.6.706 Simple variance ...... 661 3.2.3.1.6.707 Combine VAR with other aggregates ...... 662 3.2.3.1.6.708 Window Functions ...... 662 3.2.3.1.6.709 LAG ...... 662 3.2.3.1.6.710 Syntax ...... 662 3.2.3.1.6.711 Arguments ...... 663 3.2.3.1.6.712 Returns ...... 
663 3.2.3.1.6.713 Notes ...... 663 3.2.3.1.6.714 Examples ...... 663 3.2.3.1.6.715 Using LAG to access previous rows ...... 664 3.2.3.1.6.716 LEAD ...... 665

xli 3.2.3.1.6.717 Syntax ...... 665 3.2.3.1.6.718 Arguments ...... 665 3.2.3.1.6.719 Returns ...... 665 3.2.3.1.6.720 Notes ...... 665 3.2.3.1.6.721 Examples ...... 665 3.2.3.1.6.722 Using LEAD to access next rows ...... 666 3.2.3.1.6.723 ROW_NUMBER ...... 667 3.2.3.1.6.724 Syntax ...... 667 3.2.3.1.6.725 Arguments ...... 667 3.2.3.1.6.726 Returns ...... 667 3.2.3.1.6.727 Notes ...... 667 3.2.3.1.6.728 Examples ...... 667 3.2.3.1.6.729 RANK vs. ROW_NUMBER ...... 668 3.2.3.1.6.730 RANK ...... 669 3.2.3.1.6.731 Syntax ...... 669 3.2.3.1.6.732 Arguments ...... 670 3.2.3.1.6.733 Returns ...... 670 3.2.3.1.6.734 Notes ...... 670 3.2.3.1.6.735 Examples ...... 670 3.2.3.1.6.736 RANK vs. ROW_NUMBER ...... 671 3.2.3.1.6.737 FIRST_VALUE ...... 672 3.2.3.1.6.738 Syntax ...... 672 3.2.3.1.6.739 Arguments ...... 673 3.2.3.1.6.740 Returns ...... 673 3.2.3.1.6.741 LAST_VALUE ...... 673 3.2.3.1.6.742 Syntax ...... 673 3.2.3.1.6.743 Arguments ...... 673 3.2.3.1.6.744 Returns ...... 673 3.2.3.1.6.745 NTH_VALUE ...... 673 3.2.3.1.6.746 Syntax ...... 673 3.2.3.1.6.747 Examples ...... 674 3.2.3.1.6.748 DENSE_RANK ...... 674 3.2.3.1.6.749 Syntax ...... 674 3.2.3.1.6.750 Arguments ...... 674 3.2.3.1.6.751 Returns ...... 674 3.2.3.1.6.752 PERCENT_RANK ...... 675 3.2.3.1.6.753 Syntax ...... 675 3.2.3.1.6.754 Arguments ...... 675 3.2.3.1.6.755 Returns ...... 675 3.2.3.1.6.756 CUME_DIST ...... 675 3.2.3.1.6.757 Syntax ...... 675 3.2.3.1.6.758 Arguments ...... 675 3.2.3.1.6.759 Returns ...... 675 3.2.3.1.6.760 NTILE ...... 676 3.2.3.1.6.761 Syntax ...... 676 3.2.3.1.6.762 System Functions ...... 676 3.2.3.1.6.763 SHOW_VERSION ...... 676 3.2.3.1.6.764 Permissions ...... 676 3.2.3.1.6.765 Syntax ...... 676 3.2.3.1.6.766 Parameters ...... 676 3.2.3.1.6.767 Notes ...... 676 3.2.3.1.6.768 Examples ...... 677 3.2.3.1.6.769 Getting the current SQream DB version ...... 677 3.3 Catalog reference ...... 677 xlii 3.3.1 Types of data exposed by sqream_catalog ...... 678 3.3.2 Tables in the catalog ...... 
678 3.3.2.1 clustering_keys ...... 678 3.3.2.2 columns ...... 679 3.3.2.3 external_tables ...... 679 3.3.2.4 external_table_columns ...... 679 3.3.2.5 databases ...... 680 3.3.2.6 database_permissions ...... 680 3.3.2.7 identity_key ...... 680 3.3.2.8 permission_types ...... 680 3.3.2.9 roles ...... 680 3.3.2.10 roles_memberships ...... 680 3.3.2.11 savedqueries ...... 681 3.3.2.12 schemas ...... 681 3.3.2.13 schema_permissions ...... 681 3.3.2.14 tables ...... 681 3.3.2.15 table_permissions ...... 682 3.3.2.16 udf_permissions ...... 682 3.3.2.17 user_defined_functions ...... 682 3.3.2.18 views ...... 682 3.3.3 Additional tables ...... 682 3.3.3.1 extents ...... 682 3.3.3.2 chunk_columns ...... 683 3.3.3.3 chunks ...... 683 3.3.3.4 delete_predicates ...... 684 3.3.4 Examples ...... 684 3.3.4.1 List all tables in the database ...... 684 3.3.4.2 List all schemas in the database ...... 684 3.3.4.3 List columns and their types for a specific table ...... 684 3.3.4.4 List delete predicates ...... 685 3.3.4.5 List Saved queries ...... 685 3.4 Command line programs ...... 685 3.4.1 metadata_server ...... 685 3.4.1.1 Positional command line arguments ...... 686 3.4.1.2 Starting metadata server ...... 686 3.4.1.2.1 Starting temporarily ...... 686 3.4.1.2.2 Starting temporarily with non-default port ...... 686 3.4.1.2.3 Stopping metadata server ...... 686 3.4.2 sqreamd ...... 687 3.4.2.1 Starting SQream DB ...... 687 3.4.2.1.1 Start SQream DB temporarily ...... 687 3.4.2.2 Command line arguments ...... 687 3.4.2.2.1 Positional command arguments ...... 687 3.4.3 sqream-console ...... 688 3.4.3.1 Starting the console ...... 689 3.4.3.2 Operations and flag reference ...... 690 3.4.3.2.1 Commands ...... 690 3.4.3.2.2 Master ...... 690 3.4.3.2.2.1 Syntax ...... 690 3.4.3.2.2.2 Common usage ...... 691 3.4.3.2.2.3 Start master node ...... 691 3.4.3.2.2.4 Start master node on different ports ...... 691 3.4.3.2.2.5 Listing active master nodes and workers ...... 
691 3.4.3.2.2.6 Stopping all SQream DB workers and master ...... 691

xliii 3.4.3.2.3 Workers ...... 691 3.4.3.2.3.1 Syntax ...... 691 3.4.3.2.3.2 Common usage ...... 692 3.4.3.2.3.3 Start 2 workers ...... 692 3.4.3.2.3.4 Stop a single worker ...... 692 3.4.3.2.3.5 Start workers with a different spool size ...... 692 3.4.3.2.3.6 Starting multiple workers on non-dedicated GPUs ...... 693 3.4.3.2.3.7 Overriding default configuration files ...... 693 3.4.3.2.4 Client ...... 693 3.4.3.2.4.1 Syntax ...... 693 3.4.3.2.4.2 Common usage ...... 694 3.4.3.2.4.3 Start a client ...... 694 3.4.3.2.4.4 Start a client to a specific worker ...... 694 3.4.3.2.4.5 Start master node on different ports ...... 694 3.4.3.2.4.6 Listing active master nodes and worker nodes ...... 694 3.4.3.2.5 Editor ...... 695 3.4.3.2.5.1 Syntax ...... 695 3.4.3.2.5.2 Common usage ...... 695 3.4.3.2.5.3 Start the editor UI ...... 695 3.4.3.2.5.4 Stop the editor UI ...... 695 3.4.3.3 Using the console to start SQream DB ...... 695 3.4.3.3.1 Starting a SQream DB cluster for the first time ...... 695 3.4.4 sqream-installer ...... 696 3.4.4.1 Operations and flag reference ...... 697 3.4.4.1.1 Command line flags ...... 697 3.4.4.2 Usage ...... 697 3.4.4.2.1 Install SQream DB for the first time ...... 697 3.4.4.2.2 Modify exposed directories ...... 697 3.4.4.2.3 Install a new license package ...... 698 3.4.4.2.4 View system settings ...... 698 3.4.4.2.5 Upgrading to a new version of SQream DB ...... 698 3.4.5 server_picker ...... 699 3.4.5.1 Positional command line arguments ...... 699 3.4.5.2 Starting server picker ...... 699 3.4.5.2.1 Starting temporarily ...... 699 3.4.5.2.2 Starting temporarily with non-default port ...... 699 3.4.5.2.3 Stopping server picker ...... 699 3.4.6 SqreamStorage ...... 700 3.4.6.1 Running SqreamStorage ...... 700 3.4.6.2 Command line arguments ...... 700 3.4.6.3 Examples ...... 700 3.4.6.3.1 Create a new storage cluster ...... 700 3.4.7 Sqream SQL CLI Reference ...... 700 3.4.7.1 Installing Sqream SQL ...... 
701 3.4.7.1.1 Troubleshooting Sqream SQL Installation ...... 702 3.4.7.2 Using Sqream SQL ...... 702 3.4.7.2.1 Running Commands Interactively (SQL shell) ...... 702 3.4.7.2.2 Executing Batch Scripts (-f) ...... 703 3.4.7.2.3 Executing Commands Immediately (-c) ...... 704 3.4.7.3 Examples ...... 704 3.4.7.3.1 Starting a Regular Interactive Shell ...... 704 3.4.7.3.2 Executing Statements in an Interactive Shell ...... 704 3.4.7.3.3 Executing SQL Statements from the Command Line ...... 705 3.4.7.3.4 Controlling the Client Output ...... 705 xliv 3.4.7.3.4.1 Exporting SQL Query Results to CSV ...... 705 3.4.7.3.4.2 Changing a CSV to a TSV ...... 706 3.4.7.3.5 Executing a Series of Statements From a File ...... 706 3.4.7.3.6 Connecting Using Environment Variables ...... 706 3.4.7.3.7 Connecting to a Specific Queue ...... 707 3.4.7.4 Operations and Flag References ...... 707 3.4.7.4.1 Command Line Arguments ...... 707 3.4.7.4.1.1 Supported Record Delimiters ...... 707 3.4.7.4.2 Meta-Commands ...... 708 3.4.7.4.3 Basic Commands ...... 708 3.4.7.4.4 Moving Around the Command Line ...... 708 3.4.7.4.5 Searching ...... 709 3.4.8 upgrade_storage ...... 709 3.4.8.1 Running upgrade_storage ...... 709 3.4.8.2 Command line arguments ...... 709 3.4.8.3 Results and error codes ...... 709 3.4.8.4 Examples ...... 709 3.4.8.4.1 Upgrade SQream DB’s storage cluster ...... 709 3.5 SQL Feature Checklist ...... 710 3.5.1 Data types and values ...... 710 3.5.2 Contraints ...... 711 3.5.3 Transactions ...... 711 3.5.4 Indexes ...... 711 3.5.5 Schema changes ...... 712 3.5.6 Statements ...... 712 3.5.7 Clauses ...... 712 3.5.8 Table expressions ...... 713 3.5.9 Scalar expressions ...... 713 3.5.10 Permissions ...... 713 3.5.11 Extra functionality ...... 714

4 Releases 715 4.1 What’s New in 2021.2 ...... 715 4.1.1 New Features ...... 715 4.1.1.1 New Driver Compatibility ...... 715 4.1.1.2 Centralized Configuration System ...... 716 4.1.1.3 Qualifying Schemas Without Providing an Alias ...... 716 4.1.1.4 Double-Quotations Supported When Importing and Exporting CSVs .... 716 4.2 What’s New in 2021.1 ...... 717 4.2.1 Release Notes Version 2021.1.2 ...... 717 4.2.1.1 New Features ...... 717 4.2.1.1.1 Aliases Added to SUBSTRING Function and Length Argument .. 717 4.2.1.1.2 Data Type Aliases Added ...... 718 4.2.1.1.3 String Literals Containing ASCII Characters Interepreted as TEXT 718 4.2.1.1.4 Decimal Literals Interpreted as Numeric Columns ...... 718 4.2.1.1.5 Roles Area Added to Studio Version 5.3.3 ...... 718 4.2.1.2 Resolved Issues ...... 718 4.2.2 Release Notes Version 2021.1.1 ...... 718 4.2.2.1 New Features ...... 719 4.2.2.1.1 Complete Ranking Function Support ...... 719 4.2.2.2 Resolved Issues ...... 719 4.2.3 Release Notes Version 2021.1 ...... 719 4.2.3.1 Version Content ...... 719 4.2.3.2 New Features ...... 720

xlv 4.2.3.2.1 SQream DB on Cloud ...... 720 4.2.3.2.2 Numeric Data Types ...... 720 4.2.3.2.3 Text Data Type ...... 720 4.2.3.2.4 Supports Scalar Subqueries ...... 720 4.2.3.2.5 Literal Arguments ...... 721 4.2.3.2.6 Simple Scalar SQL UDFs ...... 721 4.2.3.2.7 Logging Enhancements ...... 721 4.2.3.2.8 Improved Presented License Information ...... 721 4.2.3.2.9 Optimized Foreign Data Wrapper Export ...... 721 4.2.3.3 Main Features ...... 722 4.2.3.4 Resolved Issues ...... 722 4.2.3.5 Operations and Configuration Changes ...... 723 4.2.3.5.1 Recommended SQream Configuration on Cloud ...... 723 4.2.3.5.2 Optimized Foreign Data Wrapper Export Configuration Flag .... 723 4.2.3.6 Naming Changes ...... 723 4.2.3.7 Deprecated Features ...... 723 4.2.3.8 Known Issues and Limitations ...... 723 4.2.3.9 Upgrading to v2021.1 ...... 723 4.3 What’s New in 2020.3 ...... 724 4.3.1 Release Notes Version 2020.3.2.1 ...... 724 4.3.1.1 Performance Enhancements ...... 724 4.3.1.2 Known Issues and Limitations ...... 724 4.3.1.3 Upgrading to v2020.3.2.1 ...... 724 4.3.2 Release Notes Version 2020.3.1 ...... 724 4.3.2.1 New Features ...... 724 4.3.2.2 Performance Enhancements ...... 725 4.3.2.3 Resolved Issues ...... 725 4.3.2.4 Known Issues and Limitations ...... 725 4.3.2.5 Upgrading to v2020.3.1 ...... 725 4.3.3 Release Notes Version 2020.3 ...... 726 4.3.3.1 New Features ...... 726 4.3.3.2 Performance Enhancements ...... 726 4.3.3.3 Resolved Issues ...... 726 4.3.3.4 Known Issues and Limitations ...... 727 4.3.3.5 Upgrading to v2020.3 ...... 727 4.4 What’s New in 2020.2 ...... 727 4.4.1 New features ...... 728 4.4.1.1 UI ...... 728 4.4.1.2 Integrations ...... 728 4.4.1.3 SQL support ...... 728 4.4.2 Improvements and fixes ...... 728 4.4.3 Operations ...... 729 4.4.4 Known Issues & Limitations ...... 729 4.4.5 Upgrading to v2020.2 ...... 729 4.5 What’s New in 2020.1 ...... 729 4.5.1 New features ...... 730 4.5.1.1 Integrations ...... 730 4.5.1.2 SQL support ...... 
730 4.5.2 Improvements and fixes ...... 730 4.5.3 Behaviour changes ...... 731 4.5.4 Operations ...... 732 4.5.5 Known Issues & Limitations ...... 733 4.5.6 Upgrading to v2020.1 ...... 733

xlvi 5 Glossary 735

Index 737


For SQream DB v2021.1.

Tip: This documentation is available online at https://docs.sqream.com/

SQream DB is a columnar analytic SQL database management system. SQream DB supports regular SQL including a substantial amount of ANSI SQL, uses serializable transactions, and scales horizontally for concurrent statements. Even a basic SQream DB machine can support tens to hundreds of terabytes of data. SQream DB easily plugs in to third-party tools like Tableau and comes with standard SQL client drivers, including JDBC, ODBC, and Python DB-API.

Get Started
• First steps with SQream DB
• SQL Feature Checklist
• Bulk load CSVs

Reference
• SQL Reference
• SQL Statements
• SQL Functions

Guides
• Setting up SQream DB
• Best practices
• Connecting to SQream Using Tableau

Releases
• 2021.2
• 2021.1
• 2020.3
• 2020.2
• 2020.1
• All recent releases

Driver and Deployment
• Client drivers
• Third party tools integration
• Connecting to SQream Using Tableau

Help and Support
• Troubleshooting guide
• Gathering information for SQream support

Need help?

If you couldn’t find what you’re looking for, we’re always happy to help. Visit SQream’s support portal for additional support.

Looking for older versions?

This version of the documentation is for SQream DB v2021.1. If you’re looking for an older version of the documentation, versions 1.10 through 2019.2.1 are available at http://previous.sqream.com .

CHAPTER 1

First steps with SQream DB

This tutorial takes you through a few basic operations in SQream DB.

In this topic:

• Preparing for this tutorial
• Create a new database for playing around in
• Creating your first table
• Listing tables
• Inserting rows
• Queries
• Deleting rows
• Saving query results to a CSV or PSV file

1.1 Preparing for this tutorial

This tutorial assumes you already have a SQream DB cluster running and the SQream command line client installed on the machine you are on.

If you haven’t already:

• Set up a SQream DB cluster.
• Install SQream SQL CLI.
• See our Hardware guide.


Run the SQream SQL client like this. It will interactively ask for the password.

$ sqream sql --port=5000 --username=rhendricks -d master
Password:

Interactive client mode
To quit, use ^D or \q.

master=> _

You should use a username and password that you have set up or your DBA has given you.

Tip:
• To exit the shell, type \q or press Ctrl-D.
• A new SQream DB cluster contains a database named master. We will start with this database.

1.2 Create a new database for playing around in

To create a database, we will use the CREATE DATABASE syntax.

master=> CREATE DATABASE test;
executed

Now, reconnect to the newly created database. First, exit the client by typing \q and hitting enter. From the Linux shell, restart the client with the new database name:

$ sqream sql --port=5000 --username=rhendricks -d test
Password:

Interactive client mode
To quit, use ^D or \q.

test=> _

The new database name appears in the prompt. This lets you know which database you’re connected to.

1.3 Creating your first table

To create a table, we will use the CREATE TABLE syntax, with a table name and some column specifications. For example,

CREATE TABLE cool_animals (
   id INT NOT NULL,
   name VARCHAR(20),
   weight INT
);


If the table already exists and you want to drop the current table and create a new one, you can add OR REPLACE after the CREATE keyword.

CREATE OR REPLACE TABLE cool_animals (
   id INT NOT NULL,
   name VARCHAR(20),
   weight INT
);

You can ask SQream DB to list the full, verbose CREATE TABLE statement for any table by using the GET_DDL function with the table name.

test=> SELECT GET_DDL('cool_animals');
create table "public"."cool_animals" (
"id" int not null,
"name" varchar(20),
"weight" int
);

If you are done with this table, you can use DROP TABLE to remove the table and all of its data.

test=> DROP TABLE cool_animals;
executed

1.4 Listing tables

To see the tables in the current database, we will query the catalog:

test=> SELECT table_name FROM sqream_catalog.tables;
cool_animals

1 rows

1.5 Inserting rows

Inserting rows into a table can be performed with the INSERT statement. The statement includes the table name, an optional list of column names, and column values listed in the same order as the column names:

test=> INSERT INTO cool_animals VALUES (1, 'Dog', 7);
executed

To change the order of values, specify the column order:

test=> INSERT INTO cool_animals(weight, id, name) VALUES (3, 2, 'Possum');
executed

You can use INSERT to insert multiple rows too. Here, you use sets of parentheses separated by commas:


test=> INSERT INTO cool_animals VALUES
   (3, 'Cat', 5),
   (4, 'Elephant', 6500),
   (5, 'Rhinoceros', 2100);
executed

Note: To load big data sets, use bulk loading methods instead. See our Inserting data guide for more information.

When you leave out columns that have a default value (including a default NULL value), the default value is used.

test=> INSERT INTO cool_animals (id) VALUES (6);
executed

test=> SELECT * FROM cool_animals;
1,Dog ,7
2,Possum ,3
3,Cat ,5
4,Elephant ,6500
5,Rhinoceros ,2100
6,\N,\N

6 rows

Note: Null row values are represented as \N

1.6 Queries

For querying, use the SELECT keyword, followed by a list of columns and values to be returned, and the table to get the data from.

test=> SELECT id, name, weight FROM cool_animals;
1,Dog ,7
2,Possum ,3
3,Cat ,5
4,Elephant ,6500
5,Rhinoceros ,2100
6,\N,\N

6 rows

To get all columns without specifying them, use the star operator *:


test=> SELECT * FROM cool_animals;
1,Dog ,7
2,Possum ,3
3,Cat ,5
4,Elephant ,6500
5,Rhinoceros ,2100
6,\N,\N

6 rows

To get the number of rows in a table without getting the full result set, use COUNT(*):

test=> SELECT COUNT(*) FROM cool_animals;
6

1 row

Filter results by adding a WHERE clause and specifying the filter condition:

test=> SELECT id, name, weight FROM cool_animals WHERE weight > 1000;
4,Elephant ,6500
5,Rhinoceros ,2100

2 rows

Sort the results by adding an ORDER BY clause, and specifying ascending (ASC) or descending (DESC) order:

test=> SELECT * FROM cool_animals ORDER BY weight DESC;
4,Elephant ,6500
5,Rhinoceros ,2100
1,Dog ,7
3,Cat ,5
2,Possum ,3
6,\N,\N

6 rows

Filter out rows with null values by adding an IS NOT NULL filter:

test=> SELECT * FROM cool_animals WHERE weight IS NOT NULL ORDER BY weight DESC;
4,Elephant ,6500
5,Rhinoceros ,2100
1,Dog ,7
3,Cat ,5
2,Possum ,3

5 rows


1.7 Deleting rows

To delete rows in a table selectively, use the DELETE command, with a table name and a WHERE clause to specify which rows are to be deleted:

test=> DELETE FROM cool_animals WHERE weight is null;
executed

test=> SELECT * FROM cool_animals;
1,Dog ,7
2,Possum ,3
3,Cat ,5
4,Elephant ,6500
5,Rhinoceros ,2100

5 rows

To delete all rows in a table, use the TRUNCATE command followed by the table name:

test=> TRUNCATE TABLE cool_animals;
executed

Note: While TRUNCATE deletes data from disk immediately, DELETE does not physically remove the deleted rows. For more information on removing the rows from disk, see DELETE.

1.8 Saving query results to a CSV or PSV file

The command line client sqream sql can be used to save query results to a CSV or other delimited file format.

$ sqream sql --username=mjordan --database=nba --host=localhost --port=5000 -c "SELECT * FROM nba LIMIT 5" --results-only --delimiter='|' > nba.psv
$ cat nba.psv
Avery Bradley |Boston Celtics |0|PG|25|6-2 |180|Texas |7730337
Jae Crowder |Boston Celtics |99|SF|25|6-6 |235|Marquette |6796117
John Holland |Boston Celtics |30|SG|27|6-5 |205|Boston University |\N
R.J. Hunter |Boston Celtics |28|SG|22|6-5 |185|Georgia State |1148640
Jonas Jerebko |Boston Celtics |8|PF|29|6-10|231|\N|5000000

See the Controlling the output of the client section of the reference for more options.
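The exported file can be read by any tool that understands delimited text. As a minimal sketch (not part of the SQream tooling), the Python snippet below parses pipe-separated client output and maps the \N marker to None. The helper name read_psv and the inline sample rows are illustrative assumptions.

```python
import csv
import io

NULL_MARKER = r"\N"  # how the client prints NULL values


def read_psv(text, delimiter="|"):
    """Parse delimited client output into rows, mapping the NULL marker to None."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [
        [None if field == NULL_MARKER else field.strip() for field in row]
        for row in reader
    ]


sample = (
    "John Holland |Boston Celtics |30|SG|27|6-5 |205|Boston University |\\N\n"
    "Jonas Jerebko |Boston Celtics |8|PF|29|6-10|231|\\N|5000000\n"
)
rows = read_psv(sample)
print(rows[0][-1])   # None
print(rows[1][-2:])  # [None, '5000000']
```

Stripping each field removes the trailing padding that fixed-width text columns produce in the output.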

What’s next?

• Explore all of SQream DB’s SQL Syntax
• See the full SQream SQL CLI reference
• Connect a third party tool to SQream DB and start analyzing data

CHAPTER 2

Guides

This section contains concept and feature guides, as well as task-focused guides.

Recommended guides

• Optimization and best practices in SQream DB
• Using third party tools
• Client drivers for SQream DB

2.1 Full list of guides

2.1.1 Inserting data

This guide covers inserting data into SQream DB, with subguides on inserting data from a variety of sources and locations.

In this topic:

• Data loading overview
• Data loading considerations
  – Verify data and performance after load
  – File source location for loading
  – Supported load methods
  – Unsupported data types
  – Extended error handling
  – Best practices for CSV
  – Best practices for Parquet
    ∗ Type support and behavior notes
  – Best practices for ORC
    ∗ Type support and behavior notes
• Further reading and migration guides

2.1.1.1 Data loading overview

SQream DB supports importing data from the following sources:
• Using INSERT with a client driver
• Using COPY FROM:
  – Local filesystem and locally mounted network filesystems
  – Amazon S3
  – Using SQream in an HDFS Environment
• Using Foreign Tables:
  – Local filesystem and locally mounted network filesystems
  – Amazon S3
  – Using SQream in an HDFS Environment

SQream DB supports loading files in the following formats:
• Text - CSV, TSV, PSV
• Parquet
• ORC

2.1.1.2 Data loading considerations

2.1.1.2.1 Verify data and performance after load

Like other RDBMSs, SQream DB has its own set of best practices for table design and query optimization. After a load, SQream therefore recommends verifying that:
• The data is as you expect it (e.g. row counts, data types, formatting, content)
• The performance of your queries is adequate
• Best practices were followed for table design
• Applications such as Tableau and others have been tested, and work
• Data types were not over-provisioned (e.g. don’t use VARCHAR(2000) to store a short string)


2.1.1.2.2 File source location for loading

During loading using COPY FROM, the statement can run on any worker. If you are running multiple nodes, make sure that all nodes have the same view of the source files. If you load from a local file that exists on only one node and not on shared storage, the load will fail whenever the statement runs on a different node. (If you need to, you can also control which node a statement runs on using the Workload Manager.)

2.1.1.2.3 Supported load methods

SQream DB’s COPY FROM syntax can be used to load CSV files, but can’t be used for Parquet and ORC. FOREIGN TABLE can be used to load text files, Parquet, and ORC files, and can also transform the data prior to materialization as a full table.

Method / File type   Text (CSV)   Parquet   ORC   Streaming data
COPY FROM            ✓            ✗         ✗     ✗
Foreign Tables       ✓            ✓         ✓     ✗
INSERT               ✗            ✗         ✗     ✓ (Python, JDBC, Node.JS)

2.1.1.2.4 Unsupported data types

SQream DB doesn’t support every data type that some other database systems may have, such as ARRAY, BLOB, ENUM, and SET. These data types will have to be converted before load. For example, ENUM can often be stored as a VARCHAR.

2.1.1.2.5 Extended error handling

While external tables can be used to load CSVs, the COPY FROM statement provides more fine-grained error handling options, as well as extended support for non-standard CSVs with multi-character delimiters, alternate timestamp formats, and more.

2.1.1.2.6 Best practices for CSV

Text files like CSV rarely conform to RFC 4180, so alterations may be required:
• Use OFFSET 2 for files containing header rows.
• Failed rows can be captured in a log file for later analysis, or simply skipped. See Unsupported Field Delimiters for information on skipping rejected rows.
• Record delimiters (new lines) can be modified with the RECORD DELIMITER syntax.
• If the date formats differ from ISO 8601, refer to the Supported Date Formats section to see how to override default parsing.
• Fields in a CSV can be optionally quoted with double-quotes ("). However, any field containing a newline or another double-quote character must be quoted. If a field is quoted, any double quote that appears must be double-quoted (similar to the string literals quoting rules). For example, to encode What are "birds"?, the field should appear as "What are ""birds""?".
• Field delimiters don’t have to be a displayable ASCII character. See Supported Field Delimiters for all options.
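The double-quote rule above is the same convention used by RFC 4180 writers in many languages. As an illustration only (Python is not required for loading), the standard csv module applies this doubling by default, which can be handy when generating files for a load:

```python
import csv
import io

buffer = io.StringIO()
# The default dialect quotes only when needed and doubles embedded quotes.
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow([1, 'What are "birds"?', "plain"])
encoded = buffer.getvalue()
print(encoded, end="")  # 1,"What are ""birds""?",plain

# Reading the line back recovers the original field value.
decoded = next(csv.reader(io.StringIO(encoded)))
print(decoded[1])  # What are "birds"?
```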

2.1.1.2.7 Best practices for Parquet

• Parquet files are loaded through Foreign Tables. The destination table must have the same number of columns as the source files.
• Parquet files support predicate pushdown. When a query is issued over Parquet files, SQream DB uses row-group metadata to determine which row-groups in a file need to be read for a particular query, and the row indexes can narrow the search to a particular set of rows.

2.1.1.2.7.1 Type support and behavior notes

• Unlike ORC, the column types should match the data types exactly (see table below).

Parquet source ↓ / SQream DB type →   BOOL   TINYINT   SMALLINT   INT   BIGINT   REAL   DOUBLE   Text [1]   DATE   DATETIME
BOOLEAN                               ✓
INT16                                                  ✓
INT32                                                             ✓
INT64                                                                   ✓
FLOAT                                                                            ✓
DOUBLE                                                                                  ✓
BYTE_ARRAY [2]                                                                                   ✓
INT96 [3]                                                                                                          ✓ [4]

• If a Parquet file has an unsupported type like enum, uuid, time, json, bson, lists, or maps, but the data is not referenced in the table (it does not appear in the SELECT query), the statement will succeed. If the column is referenced, an error will be thrown to the user, explaining that the type is not supported, but that the column may be omitted.

2.1.1.2.8 Best practices for ORC

• ORC files are loaded through Foreign Tables. The destination table must have the same number of columns as the source files.
• ORC files support predicate pushdown. When a query is issued over ORC files, SQream DB uses ORC metadata to determine which stripes in a file need to be read for a particular query, and the row indexes can narrow the search to a particular set of 10,000 rows.

2.1.1.2.8.1 Type support and behavior notes

• The types should match to some extent within the same “class” (see table below).

[1] Text values include TEXT, VARCHAR, and NVARCHAR
[2] With UTF8 annotation
[3] With TIMESTAMP_NANOS or TIMESTAMP_MILLIS annotation
[4] Any microseconds will be rounded down to milliseconds.

ORC source ↓ / SQream DB type →       BOOL    TINYINT   SMALLINT   INT     BIGINT   REAL   DOUBLE   Text [1]   DATE   DATETIME
boolean                               ✓       ✓ [5]     ✓ [5]      ✓ [5]   ✓ [5]
tinyint                               ○ [6]   ✓         ✓          ✓       ✓
smallint                              ○ [6]   ○ [7]     ✓          ✓       ✓
int                                   ○ [6]   ○ [7]     ○ [7]      ✓       ✓
bigint                                ○ [6]   ○ [7]     ○ [7]      ○ [7]   ✓
float                                                                               ✓      ✓
double                                                                              ✓      ✓
string / char / varchar                                                                             ✓
date                                                                                                           ✓      ✓
timestamp, timestamp with timezone                                                                                    ✓

• If an ORC file has an unsupported type like binary, list, map, or union, but the data is not referenced in the table (it does not appear in the SELECT query), the statement will succeed. If the column is referenced, an error will be thrown to the user, explaining that the type is not supported, but that the column may be omitted.

2.1.1.3 Further reading and migration guides

2.1.1.3.1 Insert from CSV

This guide covers inserting data from CSV files into SQream DB using the COPY FROM method.

In this topic:

• 1. Prepare CSVs
• 2. Place CSVs where SQream DB workers can access
• 3. Figure out the table structure
• 4. Bulk load the data with COPY FROM
• Loading different types of CSV files
  – Loading a standard CSV file from a local filesystem
  – Loading a PSV (pipe separated value) file
  – Loading a TSV (tab separated value) file
  – Loading a text file with non-printable delimiter
  – Loading a text file with multi-character delimiters
  – Loading files with a header row
  – Loading files formatted for Windows (\r\n)
  – Loading a file from a public S3 bucket
  – Loading files from an authenticated S3 bucket
  – Loading files from an HDFS storage
  – Saving rejected rows to a file
  – Stopping the load if a certain amount of rows were rejected
  – Load CSV files from a set of directories
  – Rearrange destination columns
  – Loading non-standard dates

[5] Boolean values are cast to 0, 1
[6] Will succeed if all values are 0, 1
[7] Will succeed if all values fit the destination type

2.1.1.3.1.1 1. Prepare CSVs

Prepare the source CSVs, with the following requirements:
• Files should be valid CSV. By default, SQream DB’s CSV parser can handle RFC 4180 standard CSVs, but can also be modified to support non-standard CSVs (with multi-character delimiters, unquoted fields, etc).
• Files are UTF-8 or ASCII encoded.
• The field delimiter is an ASCII character or characters.
• The record delimiter, also known as a new line separator, is a Unix-style newline (\n), DOS-style newline (\r\n), or Mac-style newline (\r).
• Fields are optionally enclosed by double-quotes, but must be quoted if they contain one of the following characters:
  – The record delimiter or field delimiter
  – A double quote character
  – A newline
• If a field is quoted, any double quote that appears must be double-quoted (similar to the string literals quoting rules). For example, to encode What are "birds"?, the field should appear as "What are ""birds""?". Other modes of escaping are not supported (e.g. 1,"What are \"birds\"?" is not a valid way of escaping CSV values).
• NULL values can be marked in two ways in the CSV:
  – An explicit null marker. For example, col1,\N,col3
  – An empty field delimited by the field delimiter. For example, col1,,col3

Note: If a text field is quoted but contains no content ("") it is considered an empty text field. It is not considered NULL.
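The quoting and NULL rules above can be summarized in a few lines of code. The following Python sketch is one possible way to encode values when generating a CSV for loading; the helper names encode_field and encode_row are hypothetical and not part of any SQream tool.

```python
def encode_field(value, delimiter=","):
    """Encode one value following the quoting and NULL rules listed above."""
    if value is None:
        return r"\N"  # explicit NULL marker (an empty unquoted field also means NULL)
    text = str(value)
    if text == "":
        return '""'  # quoted empty field: empty text, not NULL
    needs_quotes = any(ch in text for ch in (delimiter, '"', "\n", "\r"))
    if needs_quotes:
        # Embedded double quotes are doubled; backslash escaping is not supported.
        return '"' + text.replace('"', '""') + '"'
    return text


def encode_row(values, delimiter=","):
    """Join encoded fields into one record (without the record delimiter)."""
    return delimiter.join(encode_field(v, delimiter) for v in values)


print(encode_row(["col1", None, "col3"]))        # col1,\N,col3
print(encode_row([1, "", 'What are "birds"?']))  # 1,"","What are ""birds""?"
```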

2.1.1.3.1.2 2. Place CSVs where SQream DB workers can access

During data load, the COPY FROM command can run on any worker (unless explicitly specified with the Workload Manager). It is important that every node has the same view of the storage being used - meaning, every SQream DB worker should have access to the files.
• For files hosted on NFS, ensure that the mount is accessible from all servers.
• For HDFS, ensure that SQream DB servers can access the HDFS name node with the correct user-id. See our Using SQream in an HDFS Environment guide for more information.
• For S3, ensure network access to the S3 endpoint. See our Amazon S3 guide for more information.

2.1.1.3.1.3 3. Figure out the table structure

Prior to loading data, you will need to write out the table structure, so that it matches the file structure. For example, to import the data from nba.csv, we will first look at the file:

Table 1: nba.csv

Name            Team             Number   Position   Age    Height   Weight   College             Salary
Avery Bradley   Boston Celtics   0.0      PG         25.0   6-2      180.0    Texas               7730337.0
Jae Crowder     Boston Celtics   99.0     SF         25.0   6-6      235.0    Marquette           6796117.0
John Holland    Boston Celtics   30.0     SG         27.0   6-5      205.0    Boston University
R.J. Hunter     Boston Celtics   28.0     SG         22.0   6-5      185.0    Georgia State       1148640.0
Jonas Jerebko   Boston Celtics   8.0      PF         29.0   6-10     231.0                        5000000.0
Amir Johnson    Boston Celtics   90.0     PF         29.0   6-9      240.0                        12000000.0
Jordan Mickey   Boston Celtics   55.0     PF         21.0   6-8      235.0    LSU                 1170960.0
Kelly Olynyk    Boston Celtics   41.0     C          25.0   7-0      238.0    Gonzaga             2165160.0
Terry Rozier    Boston Celtics   12.0     PG         22.0   6-2      190.0    Louisville          1824360.0

• The file format in this case is CSV, and it is stored as an S3 object.
• The first row of the file is a header containing column names.
• The record delimiter is a DOS newline (\r\n).
• The file is stored on S3, at s3://sqream-demo-data/nba.csv.

We will make note of the file structure to create a matching CREATE TABLE statement.

CREATE TABLE nba
(
   Name varchar(40),
   Team varchar(40),
   Number tinyint,
   Position varchar(2),
   Age tinyint,
   Height varchar(4),
   Weight real,
   College varchar(40),
   Salary float
);
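When sizing text columns such as varchar(2) or varchar(4), it can help to measure the longest value in each column of the source file first, so that columns are neither truncated nor over-provisioned. A small sketch, assuming a file with a header row; the helper name max_field_lengths is hypothetical and the inline sample uses a few columns from the table above:

```python
import csv
import io


def max_field_lengths(csv_text):
    """Return {column name: longest value length} for a CSV with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lengths = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            lengths[name] = max(lengths[name], len(value or ""))
    return lengths


sample = (
    "Name,Team,Position,Height\r\n"
    "Avery Bradley,Boston Celtics,PG,6-2\r\n"
    "Jonas Jerebko,Boston Celtics,PF,6-10\r\n"
)
print(max_field_lengths(sample))
# {'Name': 13, 'Team': 14, 'Position': 2, 'Height': 4}
```

For a real file, the same function can be applied to the file contents; leave some headroom if future files may contain longer values.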


2.1.1.3.1.4 4. Bulk load the data with COPY FROM

The CSV is a standard CSV, but with two differences from SQream DB defaults:
• The record delimiter is not a Unix newline (\n), but a Windows newline (\r\n).
• The first row of the file is a header containing column names, which we’ll want to skip.

COPY nba FROM 's3://sqream-demo-data/nba.csv' WITH RECORD DELIMITER '\r\n' OFFSET 2;

Repeat steps 3 and 4 for every CSV file you want to import.

2.1.1.3.1.5 Loading different types of CSV files

COPY FROM contains several configuration options. See more in the COPY FROM elements section.

2.1.1.3.1.6 Loading a standard CSV file from a local filesystem

COPY table_name FROM '/home/rhendricks/file.csv';

2.1.1.3.1.7 Loading a PSV (pipe separated value) file

COPY table_name FROM '/home/rhendricks/file.psv' WITH DELIMITER '|';

2.1.1.3.1.8 Loading a TSV (tab separated value) file

COPY table_name FROM '/home/rhendricks/file.tsv' WITH DELIMITER '\t';

2.1.1.3.1.9 Loading a text file with non-printable delimiter

In the file below, the separator is DC1, which is represented by ASCII 17 decimal or 021 octal.

COPY table_name FROM 'file.txt' WITH DELIMITER E'\021';
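To see how the octal escape relates to the character code, the following Python lines (an illustration only) build the same DC1 character and use it as a field separator in a test record:

```python
DC1 = "\021"  # octal escape, the same notation as E'\021' in the statement above

print(ord(DC1))       # 17
print(DC1 == "\x11")  # True

# Build one record using DC1 as the field separator, then split it again.
record = DC1.join(["1", "Dog", "7"])
print(record.split(DC1))  # ['1', 'Dog', '7']
```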

2.1.1.3.1.10 Loading a text file with multi-character delimiters

In the file below, the separator is '|.

COPY table_name FROM 'file.txt' WITH DELIMITER '''|';


2.1.1.3.1.11 Loading files with a header row

Use OFFSET to skip rows.

Note: When loading multiple files (e.g. with wildcards), this setting affects each file separately.

COPY table_name FROM 'filename.psv' WITH DELIMITER '|' OFFSET 2;
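OFFSET is a 1-based starting row, so OFFSET 2 begins loading at the second row and skips a single header line. The Python sketch below only models that behavior as described in this guide, including the per-file handling mentioned in the note; the function name and sample rows are hypothetical.

```python
def rows_from_offset(lines, offset=1):
    """Keep rows starting at the 1-based position `offset`."""
    return lines[offset - 1:]


file_a = ["id|name", "1|Dog", "2|Possum"]
file_b = ["id|name", "3|Cat"]

# The offset is applied to each file on its own, so both headers are dropped.
loaded = [row for f in (file_a, file_b) for row in rows_from_offset(f, offset=2)]
print(loaded)  # ['1|Dog', '2|Possum', '3|Cat']
```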

2.1.1.3.1.12 Loading files formatted for Windows (\r\n)

COPY table_name FROM 'filename.psv' WITH DELIMITER '|' RECORD DELIMITER '\r\n';
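If you are unsure which record delimiter a file uses, a quick check of its raw bytes before loading can save a failed load. This is a rough heuristic sketch in Python, not a SQream utility; the function name is hypothetical, and files with newlines inside quoted fields can mislead it.

```python
def detect_record_delimiter(data: bytes) -> str:
    """Guess the record delimiter from raw file bytes."""
    if b"\r\n" in data:
        return "\\r\\n"  # DOS/Windows
    if b"\r" in data:
        return "\\r"     # classic Mac
    return "\\n"         # Unix


print(detect_record_delimiter(b"a|b\r\nc|d\r\n"))  # \r\n
print(detect_record_delimiter(b"a|b\nc|d\n"))      # \n
```

The returned text is the escape sequence to place inside the RECORD DELIMITER string.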

2.1.1.3.1.13 Loading a file from a public S3 bucket

Note: The bucket must be publicly accessible, and its objects must be listable.

COPY nba FROM 's3://sqream-demo-data/nba.csv' WITH OFFSET 2 RECORD DELIMITER '\r\n';

2.1.1.3.1.14 Loading files from an authenticated S3 bucket

COPY nba FROM 's3://secret-bucket/*.csv' WITH OFFSET 2 RECORD DELIMITER '\r\n' AWS_ID '12345678' AWS_SECRET 'super_secretive_secret';

2.1.1.3.1.15 Loading files from an HDFS storage

COPY nba FROM 'hdfs://hadoop-nn.piedpiper.com/rhendricks/*.csv' WITH OFFSET 2 RECORD DELIMITER '\r\n';

2.1.1.3.1.16 Saving rejected rows to a file

See Unsupported Field Delimiters for more information about the error handling capabilities of COPY FROM.

COPY table_name FROM 'filename.psv'
   WITH DELIMITER '|'
   ERROR_LOG '/temp/load_error.log' -- Save error log
   ERROR_VERBOSITY 0;               -- Only save rejected rows

2.1.1.3.1.17 Stopping the load if a certain amount of rows were rejected

COPY table_name FROM 'filename.csv'
   WITH delimiter '|'
   ERROR_LOG '/temp/load_err.log' -- Save error log
   OFFSET 2                       -- Skip header row
   LIMIT 100                      -- Only load 100 rows
   STOP AFTER 5 ERRORS;           -- Stop the load if 5 errors reached

2.1.1.3.1.18 Load CSV files from a set of directories

Use glob patterns (wildcards) to load multiple files to one table.

COPY table_name from '/path/to/files/2019_08_*/*.csv';
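Before running a wildcard load, it can be useful to preview which files a pattern selects. The Python sketch below uses the standard glob module on a temporary directory tree; the directory and file names are made up for the example, and SQream DB's own pattern matching may differ in edge cases.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Create a small hypothetical directory tree to match against.
    for day, name in [("2019_08_01", "a.csv"), ("2019_08_02", "b.csv"),
                      ("2019_09_01", "c.csv"), ("2019_08_02", "notes.txt")]:
        os.makedirs(os.path.join(root, day), exist_ok=True)
        open(os.path.join(root, day, name), "w").close()

    pattern = os.path.join(root, "2019_08_*", "*.csv")
    matched = sorted(
        os.path.relpath(path, root).replace(os.sep, "/")
        for path in glob.glob(pattern)
    )

print(matched)  # ['2019_08_01/a.csv', '2019_08_02/b.csv']
```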

2.1.1.3.1.19 Rearrange destination columns

When the column order of the source files does not match the table structure, tell the COPY command what the order of columns should be:

COPY table_name (fifth, first, third) FROM '/path/to/files/*.csv';

Note: Any column not specified will revert to its default value or NULL value if nullable
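Conceptually, the column list maps each field in the file, in order, to the named destination column. The Python sketch below models that mapping for a hypothetical five-column table; the column names second and fourth and the helper map_fields are assumptions made for the illustration.

```python
TABLE_COLUMNS = ["first", "second", "third", "fourth", "fifth"]


def map_fields(file_fields, target_columns):
    """Place file fields into the named columns; unnamed columns get None."""
    row = dict.fromkeys(TABLE_COLUMNS)  # None stands in for NULL or the column default
    row.update(zip(target_columns, file_fields))
    return row


print(map_fields(["e", "a", "c"], ["fifth", "first", "third"]))
# {'first': 'a', 'second': None, 'third': 'c', 'fourth': None, 'fifth': 'e'}
```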

2.1.1.3.1.20 Loading non-standard dates

If files contain dates not formatted as ISO8601, tell COPY how to parse the column. After parsing, the date will appear as ISO8601 inside SQream DB. In this example, date_col1 and date_col2 in the table are non-standard. date_col3 is mentioned explicitly, but can be left out. Any column that is not specified is assumed to be ISO8601.

COPY table_name FROM '/path/to/files/*.csv' WITH PARSERS 'date_col1=YMD,date_col2=MDY,date_col3=default';
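The effect of a parser is to read the source text in one layout and store the value in ISO 8601 form. The Python sketch below illustrates that conversion. The layouts it uses (YMD as YYYY/MM/DD and MDY as MM/DD/YYYY) are assumptions made for the example, so check the Supported date formats section for the actual definitions.

```python
from datetime import datetime

# Assumed layouts for illustration; not taken from the COPY FROM reference.
ASSUMED_LAYOUTS = {"YMD": "%Y/%m/%d", "MDY": "%m/%d/%Y"}


def to_iso8601(value, parser):
    """Parse `value` with the assumed layout and return an ISO 8601 date string."""
    return datetime.strptime(value, ASSUMED_LAYOUTS[parser]).date().isoformat()


print(to_iso8601("2019/08/31", "YMD"))  # 2019-08-31
print(to_iso8601("08/31/2019", "MDY"))  # 2019-08-31
```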

Tip: The full list of supported date formats can be found under the Supported date formats section of the COPY FROM reference.

2.1.1.3.2 Insert from Parquet

This guide covers inserting data from Parquet files into SQream DB using FOREIGN TABLE.

In this topic:

• 1. Prepare the files
• 2. Place Parquet files where SQream DB workers can access them
• 3. Figure out the table structure
• 4. Verify table contents
• 5. Copying data into SQream DB
  – Working around unsupported column types
  – Modifying data during the copy process
• Further Parquet loading examples
  – Loading a table from a directory of Parquet files on HDFS
  – Loading a table from a bucket of files on S3

2.1.1.3.2.1 1. Prepare the files

Prepare the source Parquet files, with the following requirements:

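Parquet source \ SQream DB type       | BOOL | TINYINT | SMALLINT | INT | BIGINT | REAL | DOUBLE | Text [1] | DATE | DATETIME
--------------------------------------+------+---------+----------+-----+--------+------+--------+----------+------+---------
BOOLEAN                               | ✓    |         |          |     |        |      |        |          |      |
INT16                                 |      |         | ✓        |     |        |      |        |          |      |
INT32                                 |      |         |          | ✓   |        |      |        |          |      |
INT64                                 |      |         |          |     | ✓      |      |        |          |      |
FLOAT                                 |      |         |          |     |        | ✓    |        |          |      |
DOUBLE                                |      |         |          |     |        |      | ✓      |          |      |
BYTE_ARRAY / FIXED_LEN_BYTE_ARRAY [2] |      |         |          |     |        |      |        | ✓        |      |
INT96 [3]                             |      |         |          |     |        |      |        |          |      | ✓ [4]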

• If a Parquet file has a column of an unsupported type (such as enum, uuid, time, json, bson, lists, or maps) but that column is not referenced (it does not appear in the SELECT query), the statement will succeed. If the column is referenced, an error is returned explaining that the type is not supported and that the column may be omitted. The examples below show how to work around this.
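The following Python sketch shows one way to apply the support table when planning a load: it pairs each Parquet type with a SQream DB type and separates the columns that can be referenced from those that have to be left out of queries. The mapping is this sketch's reading of the table above, TEXT is picked as the text type for simplicity, and the file schema is hypothetical.

```python
# Mapping as read from the support table above (an interpretation, not an official API).
PARQUET_TO_SQREAM = {
    "BOOLEAN": "BOOL",
    "INT16": "SMALLINT",
    "INT32": "INT",
    "INT64": "BIGINT",
    "FLOAT": "REAL",
    "DOUBLE": "DOUBLE",
    "BYTE_ARRAY": "TEXT",
    "FIXED_LEN_BYTE_ARRAY": "TEXT",
    "INT96": "DATETIME",
}


def split_columns(parquet_schema):
    """Return (column definitions for supported columns, names of unsupported columns)."""
    supported, unsupported = [], []
    for name, parquet_type in parquet_schema:
        sqream_type = PARQUET_TO_SQREAM.get(parquet_type)
        if sqream_type is None:
            unsupported.append(name)
        else:
            supported.append(f"{name} {sqream_type}")
    return supported, unsupported


# Hypothetical file schema used only for this example.
schema = [("id", "INT64"), ("name", "BYTE_ARRAY"), ("tags", "LIST"), ("created", "INT96")]
supported, unsupported = split_columns(schema)
print(supported, unsupported)
```

Columns that end up in the unsupported list are the ones to leave out of any query against the foreign table.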

2.1.1.3.2.2 2. Place Parquet files where SQream DB workers can access them

Any worker may try to access files (unless explicitly specified with the Workload Manager). It is important that every node has the same view of the storage being used - meaning, every SQream DB worker should have access to the files.
• For files hosted on NFS, ensure that the mount is accessible from all servers.
• For HDFS, ensure that SQream DB servers can access the HDFS name node with the correct user-id. See our Using SQream in an HDFS Environment guide for more information.
• For S3, ensure network access to the S3 endpoint. See our Amazon S3 guide for more information.

[1] Text values include TEXT, VARCHAR, and NVARCHAR
[2] With UTF8 annotation
[3] With TIMESTAMP_NANOS or TIMESTAMP_MILLIS annotation
[4] Any microseconds will be rounded down to milliseconds.


2.1.1.3.2.3 3. Figure out the table structure

Prior to loading data, you will need to write out the table structure, so that it matches the file structure. For example, to import the data from nba.parquet, we will first look at the source table:

Table 2: nba.parquet

Name           Team            Number  Position  Age   Height  Weight  College            Salary
Avery Bradley  Boston Celtics  0.0     PG        25.0  6-2     180.0   Texas              7730337.0
Jae Crowder    Boston Celtics  99.0    SF        25.0  6-6     235.0   Marquette          6796117.0
John Holland   Boston Celtics  30.0    SG        27.0  6-5     205.0   Boston University
R.J. Hunter    Boston Celtics  28.0    SG        22.0  6-5     185.0   Georgia State      1148640.0
Jonas Jerebko  Boston Celtics  8.0     PF        29.0  6-10    231.0                      5000000.0
Amir Johnson   Boston Celtics  90.0    PF        29.0  6-9     240.0                      12000000.0
Jordan Mickey  Boston Celtics  55.0    PF        21.0  6-8     235.0   LSU                1170960.0
Kelly Olynyk   Boston Celtics  41.0    C         25.0  7-0     238.0   Gonzaga            2165160.0
Terry Rozier   Boston Celtics  12.0    PG        22.0  6-2     190.0   Louisville         1824360.0

• The file is stored on S3, at s3://sqream-demo-data/nba.parquet. We will make note of the file structure to create a matching CREATE FOREIGN TABLE statement.

CREATE FOREIGN TABLE ext_nba
(
  Name       VARCHAR(40),
  Team       VARCHAR(40),
  Number     BIGINT,
  Position   VARCHAR(2),
  Age        BIGINT,
  Height     VARCHAR(4),
  Weight     BIGINT,
  College    VARCHAR(40),
  Salary     FLOAT
)
WRAPPER parquet_fdw
OPTIONS
(
  LOCATION = 's3://sqream-demo-data/nba.parquet'
);

Tip: Types in SQream DB must match Parquet types exactly. If the column type isn’t supported, a possible workaround is to set it to any arbitrary type and then exclude it from subsequent queries.

2.1.1.3.2.4 4. Verify table contents

External tables do not verify file integrity or structure, so verify that the table definition matches up and contains the correct data.

t=> SELECT * FROM ext_nba LIMIT 10;
Name          | Team           | Number | Position | Age | Height | Weight | College           | Salary
--------------+----------------+--------+----------+-----+--------+--------+-------------------+---------
Avery Bradley | Boston Celtics | 0      | PG       | 25  | 6-2    | 180    | Texas             | 7730337
Jae Crowder   | Boston Celtics | 99     | SF       | 25  | 6-6    | 235    | Marquette         | 6796117
John Holland  | Boston Celtics | 30     | SG       | 27  | 6-5    | 205    | Boston University |
R.J. Hunter   | Boston Celtics | 28     | SG       | 22  | 6-5    | 185    | Georgia State     | 1148640
Jonas Jerebko | Boston Celtics | 8      | PF       | 29  | 6-10   | 231    |                   | 5000000
Amir Johnson  | Boston Celtics | 90     | PF       | 29  | 6-9    | 240    |                   | 12000000
Jordan Mickey | Boston Celtics | 55     | PF       | 21  | 6-8    | 235    | LSU               | 1170960
Kelly Olynyk  | Boston Celtics | 41     | C        | 25  | 7-0    | 238    | Gonzaga           | 2165160
Terry Rozier  | Boston Celtics | 12     | PG       | 22  | 6-2    | 190    | Louisville        | 1824360
Marcus Smart  | Boston Celtics | 36     | PG       | 22  | 6-4    | 220    | Oklahoma State    | 3431040

If any errors show up at this stage, verify the structure of the Parquet files and match them to the external table structure you created.

2.1.1.3.2.5 5. Copying data into SQream DB

To load the data into SQream DB, use the CREATE TABLE AS statement:

CREATE TABLE nba AS SELECT * FROM ext_nba;

2.1.1.3.2.6 Working around unsupported column types

Suppose you only want to load some of the columns - for example, if one of the columns isn’t supported. When unsupported columns are omitted from queries that access the FOREIGN TABLE, they are never read, and do not cause a “type mismatch” error. For this example, assume that the Position column isn’t supported because of its type.


CREATE TABLE nba AS
  SELECT Name, Team, Number, NULL as Position, Age, Height, Weight, College, Salary
  FROM ext_nba;

-- We omitted the unsupported column `Position` from this query, and replaced it with a
-- default NULL value, to maintain the same table structure.

2.1.1.3.2.7 Modifying data during the copy process

One of the main reasons for staging data with a FOREIGN TABLE is to examine the contents and modify them before loading them. Assume we are unhappy with weight being in pounds, because we want to use kilograms instead. We can apply the transformation as part of the CREATE TABLE AS statement. Similar to the previous example, we will also set the Position column as a default NULL.

CREATE TABLE nba AS
  SELECT name, team, number, NULL as position, age, height, (weight / 2.205) as weight, college, salary
  FROM ext_nba
  ORDER BY weight;
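For a quick sanity check of the conversion factor, the same arithmetic can be reproduced in Python on the first three weights from the sample table. This is only a worked example of the division by 2.205; it does not model how SQream DB types or rounds the result.

```python
LB_PER_KG = 2.205  # same factor as in the statement above

# Weights (in pounds) of the first three players in the sample table.
weights_lb = [180, 235, 205]

# Convert each weight to kilograms, rounded to two decimals for display.
weights_kg = [round(w / LB_PER_KG, 2) for w in weights_lb]

for lb, kg in zip(weights_lb, weights_kg):
    print(f"{lb} lb = {kg} kg")
```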

2.1.1.3.2.8 Further Parquet loading examples

CREATE FOREIGN TABLE contains several configuration options. See more in the CREATE FOREIGN TABLE parameters section.

2.1.1.3.2.9 Loading a table from a directory of Parquet files on HDFS

CREATE FOREIGN TABLE ext_users
  (id INT NOT NULL, name VARCHAR(30) NOT NULL, email VARCHAR(50) NOT NULL)
WRAPPER parquet_fdw
OPTIONS
  (
    LOCATION = 'hdfs://hadoop-nn.piedpiper.com/rhendricks/users/*.parquet'
  );

CREATE TABLE users AS SELECT * FROM ext_users;

2.1.1.3.2.10 Loading a table from a bucket of files on S3

CREATE FOREIGN TABLE ext_users
  (id INT NOT NULL, name VARCHAR(30) NOT NULL, email VARCHAR(50) NOT NULL)
WRAPPER parquet_fdw
OPTIONS
  (
    LOCATION = 's3://pp-secret-bucket/users/*.parquet',
    AWS_ID = 'our_aws_id',
    AWS_SECRET = 'our_aws_secret'
  );

CREATE TABLE users AS SELECT * FROM ext_users;

2.1.1.3.3 Insert from ORC

This guide covers inserting data from ORC files into SQream DB using FOREIGN TABLE.

2.1.1.3.3.1 1. Prepare the files

Prepare the source ORC files, with the following requirements:

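ORC source \ SQream DB type        | BOOL  | TINYINT | SMALLINT | INT   | BIGINT | REAL | DOUBLE | Text [1] | DATE | DATETIME
-----------------------------------+-------+---------+----------+-------+--------+------+--------+----------+------+---------
boolean                            | ✓     | ✓ [2]   | ✓ [2]    | ✓ [2] | ✓ [2]  |      |        |          |      |
tinyint                            | ○ [3] | ✓       | ✓        | ✓     | ✓      |      |        |          |      |
smallint                           | ○ [3] | ○ [4]   | ✓        | ✓     | ✓      |      |        |          |      |
int                                | ○ [3] | ○ [4]   | ○ [4]    | ✓     | ✓      |      |        |          |      |
bigint                             | ○ [3] | ○ [4]   | ○ [4]    | ○ [4] | ✓      |      |        |          |      |
float                              |       |         |          |       |        | ✓    | ✓      |          |      |
double                             |       |         |          |       |        | ✓    | ✓      |          |      |
string / char / varchar            |       |         |          |       |        |      |        | ✓        |      |
date                               |       |         |          |       |        |      |        |          | ✓    | ✓
timestamp, timestamp with timezone |       |         |          |       |        |      |        |          |      | ✓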

• If an ORC file has a column of an unsupported type (such as binary, list, map, or union) but that column is not referenced (it does not appear in the SELECT query), the statement will succeed. If the column is referenced, an error is returned explaining that the type is not supported and that the column may be omitted. The examples below show how to work around this.

2.1.1.3.3.2 2. Place ORC files where SQream DB workers can access them

Any worker may try to access files (unless explicitly specified with the Workload Manager). It is important that every node has the same view of the storage being used - meaning, every SQream DB worker should have access to the files.
• For files hosted on NFS, ensure that the mount is accessible from all servers.
• For HDFS, ensure that SQream DB servers can access the HDFS name node with the correct user-id. See our Using SQream in an HDFS Environment guide for more information.
• For S3, ensure network access to the S3 endpoint. See our Amazon S3 guide for more information.

[1] Text values include TEXT, VARCHAR, and NVARCHAR
[2] Boolean values are cast to 0, 1
[3] Will succeed if all values are 0, 1
[4] Will succeed if all values fit the destination type
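Footnotes 3 and 4 describe conversions that depend on the data and not only on the declared type. Before declaring a narrower destination type, the values can be checked with a few lines of Python. The ranges below are assumptions of this sketch (in particular, TINYINT is treated as an unsigned one-byte integer); confirm them against the SQream DB data types reference.

```python
# Assumed value ranges for each destination type (not taken from this guide).
RANGES = {
    "BOOL": (0, 1),
    "TINYINT": (0, 255),
    "SMALLINT": (-2**15, 2**15 - 1),
    "INT": (-2**31, 2**31 - 1),
    "BIGINT": (-2**63, 2**63 - 1),
}


def fits(values, target_type):
    """Return True if every value is inside the assumed range of target_type."""
    low, high = RANGES[target_type]
    return all(low <= v <= high for v in values)


# Hypothetical values read from an ORC int column.
sample = [100, 40000]
print(fits(sample, "SMALLINT"), fits(sample, "INT"))
```

In this example the column would have to be declared as INT or BIGINT, because 40000 is outside the SMALLINT range.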


2.1.1.3.3.3 3. Figure out the table structure

Prior to loading data, you will need to write out the table structure, so that it matches the file structure. For example, to import the data from nba.orc, we will first look at the source table:

Table 3: nba.orc

Name           Team            Number  Position  Age   Height  Weight  College            Salary
Avery Bradley  Boston Celtics  0.0     PG        25.0  6-2     180.0   Texas              7730337.0
Jae Crowder    Boston Celtics  99.0    SF        25.0  6-6     235.0   Marquette          6796117.0
John Holland   Boston Celtics  30.0    SG        27.0  6-5     205.0   Boston University
R.J. Hunter    Boston Celtics  28.0    SG        22.0  6-5     185.0   Georgia State      1148640.0
Jonas Jerebko  Boston Celtics  8.0     PF        29.0  6-10    231.0                      5000000.0
Amir Johnson   Boston Celtics  90.0    PF        29.0  6-9     240.0                      12000000.0
Jordan Mickey  Boston Celtics  55.0    PF        21.0  6-8     235.0   LSU                1170960.0
Kelly Olynyk   Boston Celtics  41.0    C         25.0  7-0     238.0   Gonzaga            2165160.0
Terry Rozier   Boston Celtics  12.0    PG        22.0  6-2     190.0   Louisville         1824360.0

• The file is stored on S3, at s3://sqream-demo-data/nba.orc. We will make note of the file structure to create a matching CREATE FOREIGN TABLE statement.

CREATE FOREIGN TABLE ext_nba
(
  Name       VARCHAR(40),
  Team       VARCHAR(40),
  Number     BIGINT,
  Position   VARCHAR(2),
  Age        BIGINT,
  Height     VARCHAR(4),
  Weight     BIGINT,
  College    VARCHAR(40),
  Salary     FLOAT
)
WRAPPER orc_fdw
OPTIONS
(
  LOCATION = 's3://sqream-demo-data/nba.orc'
);

Tip: Types in SQream DB must match ORC types according to the table above. If the column type isn’t supported, a possible workaround is to set it to any arbitrary type and then exclude it from subsequent queries.

2.1.1.3.3.4 4. Verify table contents

External tables do not verify file integrity or structure, so verify that the table definition matches up and contains the correct data.

t=> SELECT * FROM ext_nba LIMIT 10;
Name          | Team           | Number | Position | Age | Height | Weight | College           | Salary
--------------+----------------+--------+----------+-----+--------+--------+-------------------+---------
Avery Bradley | Boston Celtics | 0      | PG       | 25  | 6-2    | 180    | Texas             | 7730337
Jae Crowder   | Boston Celtics | 99     | SF       | 25  | 6-6    | 235    | Marquette         | 6796117
John Holland  | Boston Celtics | 30     | SG       | 27  | 6-5    | 205    | Boston University |
R.J. Hunter   | Boston Celtics | 28     | SG       | 22  | 6-5    | 185    | Georgia State     | 1148640
Jonas Jerebko | Boston Celtics | 8      | PF       | 29  | 6-10   | 231    |                   | 5000000
Amir Johnson  | Boston Celtics | 90     | PF       | 29  | 6-9    | 240    |                   | 12000000
Jordan Mickey | Boston Celtics | 55     | PF       | 21  | 6-8    | 235    | LSU               | 1170960
Kelly Olynyk  | Boston Celtics | 41     | C        | 25  | 7-0    | 238    | Gonzaga           | 2165160
Terry Rozier  | Boston Celtics | 12     | PG       | 22  | 6-2    | 190    | Louisville        | 1824360
Marcus Smart  | Boston Celtics | 36     | PG       | 22  | 6-4    | 220    | Oklahoma State    | 3431040

If any errors show up at this stage, verify the structure of the ORC files and match them to the external table structure you created.

2.1.1.3.3.5 5. Copying data into SQream DB

To load the data into SQream DB, use the CREATE TABLE AS statement:

CREATE TABLE nba AS SELECT * FROM ext_nba;

2.1.1.3.3.6 Working around unsupported column types

Suppose you only want to load some of the columns - for example, if one of the columns isn’t supported. When unsupported columns are omitted from queries that access the FOREIGN TABLE, they are never read, and do not cause a “type mismatch” error. For this example, assume that the Position column isn’t supported because of its type.


CREATE TABLE nba AS
  SELECT Name, Team, Number, NULL as Position, Age, Height, Weight, College, Salary
  FROM ext_nba;

-- We omitted the unsupported column `Position` from this query, and replaced it with a
-- default NULL value, to maintain the same table structure.

2.1.1.3.3.7 Modifying data during the copy process

One of the main reasons for staging data with a FOREIGN TABLE is to examine the contents and modify them before loading them. Assume we are unhappy with weight being in pounds, because we want to use kilograms instead. We can apply the transformation as part of the CREATE TABLE AS statement. Similar to the previous example, we will also set the Position column as a default NULL.

CREATE TABLE nba AS
  SELECT name, team, number, NULL as position, age, height, (weight / 2.205) as weight, college, salary
  FROM ext_nba
  ORDER BY weight;

2.1.1.3.3.8 Further ORC loading examples

CREATE FOREIGN TABLE contains several configuration options. See more in the CREATE FOREIGN TABLE parameters section.

2.1.1.3.3.9 Loading a table from a directory of ORC files on HDFS

CREATE FOREIGN TABLE ext_users
  (id INT NOT NULL, name VARCHAR(30) NOT NULL, email VARCHAR(50) NOT NULL)
WRAPPER orc_fdw
OPTIONS
  (
    LOCATION = 'hdfs://hadoop-nn.piedpiper.com/rhendricks/users/*.ORC'
  );

CREATE TABLE users AS SELECT * FROM ext_users;

2.1.1.3.3.10 Loading a table from a bucket of files on S3

CREATE FOREIGN TABLE ext_users
  (id INT NOT NULL, name VARCHAR(30) NOT NULL, email VARCHAR(50) NOT NULL)
WRAPPER orc_fdw
OPTIONS
  (
    LOCATION = 's3://pp-secret-bucket/users/*.ORC',
    AWS_ID = 'our_aws_id',
    AWS_SECRET = 'our_aws_secret'
  );

CREATE TABLE users AS SELECT * FROM ext_users;

2.1.1.3.4 Migrating from Oracle

This guide covers actions required for migrating from Oracle to SQream DB with CSV files.

In this topic:

• 1. Preparing the tools and login information
• 2. Export the desired schema
  – Examples
    ∗ Dump all tables
    ∗ Dump only specific tables
• 3. Convert the Oracle dump to standard SQL
  – Example
• 4. Figure out the database structures
  – Remove unsupported attributes
  – Match data types
  – Additional considerations
• 5. Create the tables in SQream DB
  – Example
• 6. Export tables to CSVs
  – Using SQL*Plus to export data lists
  – Creating CSVs using stored procedures
    ∗ CSV generation considerations
• 7. Place CSVs where SQream DB workers can access
• 8. Bulk load the CSVs
  – Example
• 9. Rewrite Oracle queries

2.1.1.3.4.1 1. Preparing the tools and login information

• Migrating data from Oracle requires a username and password for your Oracle system.
• In this guide, we’ll use the Oracle Data Pump, specifically the Data Pump Export utility.


2.1.1.3.4.2 2. Export the desired schema

Use the Data Pump Export utility to export the database schema. The format for using the Export utility is:

expdp <user>/<password> DIRECTORY=<directory> DUMPFILE=<dump file> CONTENT=metadata_only NOLOGFILE

The resulting Oracle-only schema is stored in a dump file.

2.1.1.3.4.3 Examples

2.1.1.3.4.4 Dump all tables

$ expdp rhendricks/secretpassword DIRECTORY=dpumpdir DUMPFILE=tables.dmp CONTENT=metadata_only NOLOGFILE

2.1.1.3.4.5 Dump only specific tables

In this example, we specify two tables for dumping.

$ expdp rhendricks/secretpassword DIRECTORY=dpumpdir DUMPFILE=tables.dmp CONTENT=metadata_only TABLES=employees,jobs NOLOGFILE
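When the export has to be scripted, the command line can be assembled programmatically. The sketch below only builds the argument list shown in the two examples above; the helper name expdp_args is hypothetical, and the resulting list could be handed to subprocess.run on a machine where the Oracle client tools are installed.

```python
def expdp_args(credentials, directory, dumpfile, tables=None):
    """Build the expdp argument list for a metadata-only export.

    tables is an optional list of table names; when omitted, all tables are dumped.
    """
    args = [
        "expdp",
        credentials,
        f"DIRECTORY={directory}",
        f"DUMPFILE={dumpfile}",
        "CONTENT=metadata_only",
    ]
    if tables:
        args.append("TABLES=" + ",".join(tables))
    args.append("NOLOGFILE")
    return args


all_tables = expdp_args("rhendricks/secretpassword", "dpumpdir", "tables.dmp")
two_tables = expdp_args("rhendricks/secretpassword", "dpumpdir", "tables.dmp",
                        tables=["employees", "jobs"])
print(" ".join(two_tables))
```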

2.1.1.3.4.6 3. Convert the Oracle dump to standard SQL

Oracle’s Data Pump Import utility will help us convert the dump from the previous step to standard SQL. The format for using the Import utility is:

impdp <user>/<password> DIRECTORY=<directory> DUMPFILE=<dump file> SQLFILE=<SQL file> TRANSFORM=SEGMENT_ATTRIBUTES:N:table PARTITION_OPTIONS=MERGE

• TRANSFORM=SEGMENT_ATTRIBUTES:N:table excludes segment attributes (both STORAGE and TABLESPACE) from the tables.
• PARTITION_OPTIONS=MERGE combines all partitions and subpartitions into one table.

2.1.1.3.4.7 Example

$ impdp rhendricks/secretpassword DIRECTORY=dpumpdir DUMPFILE=tables.dmp SQLFILE=sql_export.sql TRANSFORM=SEGMENT_ATTRIBUTES:N:table PARTITION_OPTIONS=MERGE

2.1.1.3.4.8 4. Figure out the database structures

Using the SQL file created in the previous step, write CREATE TABLE statements to match the schemas of the tables.


2.1.1.3.4.9 Remove unsupported attributes

Trim unsupported primary keys, indexes, constraints, and other unsupported Oracle attributes.

2.1.1.3.4.10 Match data types

Refer to the table below to match the Oracle source data type to a new SQream DB type:

Table 4: Data types

Oracle data type         | Precision | SQream DB data type
-------------------------+-----------+-------------------------
CHAR(n), CHARACTER(n)    | Any n     | VARCHAR(n)
BLOB, CLOB, NCLOB, LONG  |           | TEXT
DATE                     |           | DATE
FLOAT(p)                 | p <= 63   | REAL
FLOAT(p)                 | p > 63    | FLOAT, DOUBLE
NCHAR(n), NVARCHAR2(n)   | Any n     | TEXT (alias of NVARCHAR)
NUMBER(p), NUMBER(p,0)   | p < 5     | SMALLINT
NUMBER(p), NUMBER(p,0)   | p < 9     | INT
NUMBER(p), NUMBER(p,0)   | p < 19    | BIGINT
NUMBER(p), NUMBER(p,0)   | p >= 20   | BIGINT
NUMBER(p,f), NUMBER(*,f) | f > 0     | FLOAT / DOUBLE
VARCHAR(n), VARCHAR2(n)  | Any n     | VARCHAR(n) or TEXT
TIMESTAMP                |           | DATETIME

Read more about supported data types in SQream DB.
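When many tables have to be converted, the NUMBER rules can be applied mechanically. The following Python sketch is one possible encoding of those rules, under stated assumptions: SMALLINT for a precision below 5, INT for a precision below 9, BIGINT for anything wider, and DOUBLE whenever the scale is greater than zero. The function name is hypothetical, and the thresholds should be checked against the data types table and your actual data.

```python
def sqream_type_for_number(precision, scale=0):
    """Suggest a SQream DB type for an Oracle NUMBER(precision, scale) column."""
    if scale > 0:
        # Fractional values: the table lists FLOAT / DOUBLE; DOUBLE is used here.
        return "DOUBLE"
    if precision < 5:
        return "SMALLINT"
    if precision < 9:
        return "INT"
    # Assumption: everything wider goes to BIGINT. A 64-bit integer holds every
    # 18-digit value, so wider columns need a check of the real data first.
    return "BIGINT"


# Columns from Oracle's HR.EMPLOYEES sample table.
employee_id = sqream_type_for_number(6)
department_id = sqream_type_for_number(4)
salary = sqream_type_for_number(8, 2)
print(employee_id, department_id, salary)
```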

2.1.1.3.4.11 Additional considerations

• Understand how tables are created in SQream DB.
• Learn how SQream DB handles null values, particularly with regards to constraints.
• Oracle roles and user management commands need to be rewritten to SQream DB’s format. SQream DB supports full role-based access control (RBAC) similar to Oracle.

2.1.1.3.4.12 5. Create the tables in SQream DB

After rewriting the table structures, create them in SQream DB.

2.1.1.3.4.13 Example

Consider Oracle’s HR.EMPLOYEES sample table:

CREATE TABLE employees
    ( employee_id    NUMBER(6)
    , first_name     VARCHAR2(20)
    , last_name      VARCHAR2(25)
      CONSTRAINT     emp_last_name_nn  NOT NULL
    , email          VARCHAR2(25)
      CONSTRAINT     emp_email_nn  NOT NULL
    , phone_number   VARCHAR2(20)
    , hire_date      DATE
      CONSTRAINT     emp_hire_date_nn  NOT NULL
    , job_id         VARCHAR2(10)
      CONSTRAINT     emp_job_nn  NOT NULL
    , salary         NUMBER(8,2)
    , commission_pct NUMBER(2,2)
    , manager_id     NUMBER(6)
    , department_id  NUMBER(4)
    , CONSTRAINT     emp_salary_min
                     CHECK (salary > 0)
    , CONSTRAINT     emp_email_uk
                     UNIQUE (email)
    ) ;

CREATE UNIQUE INDEX emp_emp_id_pk
ON employees (employee_id) ;

ALTER TABLE employees
ADD ( CONSTRAINT     emp_emp_id_pk
                     PRIMARY KEY (employee_id)
    , CONSTRAINT     emp_dept_fk
                     FOREIGN KEY (department_id)
                      REFERENCES departments
    , CONSTRAINT     emp_job_fk
                     FOREIGN KEY (job_id)
                      REFERENCES jobs (job_id)
    , CONSTRAINT     emp_manager_fk
                     FOREIGN KEY (manager_id)
                      REFERENCES employees
    ) ;

This table rewritten for SQream DB would be created like this:

CREATE TABLE employees
(
  employee_id      INT NOT NULL,
  first_name       VARCHAR(20),
  last_name        VARCHAR(25) NOT NULL,
  email            VARCHAR(25) NOT NULL,
  phone_number     VARCHAR(20),
  hire_date        DATE NOT NULL,
  job_id           VARCHAR(10) NOT NULL,
  salary           FLOAT,
  commission_pct   REAL,
  manager_id       INT,
  department_id    SMALLINT
);


2.1.1.3.4.14 6. Export tables to CSVs

Exporting CSVs from Oracle servers is not a trivial task.

Options for exporting to CSVs

• Using SQL*Plus to export data lists
• Creating CSVs using stored procedures
  – CSV generation considerations

2.1.1.3.4.15 Using SQL*Plus to export data lists

Here’s a sample SQL*Plus script that will export PSVs in a format that SQream DB can read: Download to_csv.sql

Listing 1: Oracle SQL*Plus CSV export script

SET ECHO OFF
SET TERMOUT OFF
SET FEEDBACK OFF
SET WRAP OFF
SET PAGESIZE 0
SET LINESIZE 10000
SET ARRAYSIZE 5000
SET NUMW 30
SET TRIMSPOOL ON
SET UNDERLINE OFF
SET RECSEP OFF
SET VERIFY OFF
SET COLSEP '|'


SPOOL '&1'

ALTER SESSION SET nls_date_format = 'YYYY-MM-DD HH24:MI:SS';

SELECT * FROM &1;

SPOOL OFF

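Enter SQL*Plus and export tables one-by-one interactively. The @spool commands below assume that the script above was saved as spool.sql in the current directory: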

$ sqlplus rhendricks/secretpassword

@spool employees
@spool jobs
[...]
EXIT

Each table is exported as a data list file (.lst).


2.1.1.3.4.16 Creating CSVs using stored procedures

You can use stored procedures if you have them set up. Examples of stored procedures for generating CSVs can be found in the Ask The Oracle Mentors forums.

2.1.1.3.4.17 CSV generation considerations

• Files should be a valid CSV. By default, SQream DB’s CSV parser can handle RFC 4180 standard CSVs, but can also be modified to support non-standard CSVs (with multi-character delimiters, unquoted fields, etc).
• Files are UTF-8 or ASCII encoded.
• Field delimiter is an ASCII character or characters.
• Record delimiter, also known as a new line separator, is a Unix-style newline (\n), DOS-style newline (\r\n), or Mac style newline (\r).
• Fields are optionally enclosed by double-quotes, or mandatory quoted if they contain one of the following characters:
  – The record delimiter or field delimiter
  – A double quote character
  – A newline
• If a field is quoted, any double quote that appears must be double-quoted (similar to the string literals quoting rules). For example, to encode What are "birds"?, the field should appear as "What are ""birds""?". Other modes of escaping are not supported (e.g. 1,"What are \"birds\"?" is not a valid way of escaping CSV values).
• NULL values can be marked in two ways in the CSV:
  – An explicit null marker. For example, col1,\N,col3
  – An empty field delimited by the field delimiter. For example, col1,,col3

Note: If a text field is quoted but contains no content ("") it is considered an empty text field. It is not considered NULL.
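If the CSVs are produced by a script instead of SQL*Plus, a standard CSV library usually takes care of the quoting rules listed above. As a minimal sketch, Python's csv module with its default double-quote handling writes embedded quotes in the doubled form and quotes fields that contain the delimiter. The sample rows are invented for illustration.

```python
import csv
import io

buffer = io.StringIO()

# QUOTE_MINIMAL quotes only the fields that need it; embedded double quotes are
# doubled by default (doublequote=True), and records end with a Unix newline.
writer = csv.writer(buffer, delimiter=",", quoting=csv.QUOTE_MINIMAL,
                    lineterminator="\n")

writer.writerow([1, 'What are "birds"?', "plain"])
writer.writerow([2, "a,b", "x"])

output = buffer.getvalue()
print(output)
```

Note that this module writes both an empty string and None as an empty unquoted field, which the rules above treat as NULL, so empty text values need separate handling if the distinction matters.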

2.1.1.3.4.18 7. Place CSVs where SQream DB workers can access

During data load, the COPY FROM command can run on any worker (unless explicitly specified with the Workload Manager). It is important that every node has the same view of the storage being used - meaning, every SQream DB worker should have access to the files.
• For files hosted on NFS, ensure that the mount is accessible from all servers.
• For HDFS, ensure that SQream DB servers can access the HDFS name node with the correct user-id.
• For S3, ensure network access to the S3 endpoint.


2.1.1.3.4.19 8. Bulk load the CSVs

Issue a COPY FROM command to SQream DB to load each table from the CSV created for it. Repeat the COPY FROM command for each table exported from Oracle.

2.1.1.3.4.20 Example

For the employees table, run the following command:

COPY employees FROM 'employees.lst' WITH DELIMITER '|';
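Since the same statement is repeated for every exported table, it can be generated from a list of table names. The sketch below assumes that each table was exported to a file named after the table with a .lst extension, as in the SQL*Plus example, and that the files are reachable under that relative name; the helper name is hypothetical. The generated strings could then be passed to a cursor's execute() method.

```python
def copy_statement(table, delimiter="|"):
    """Build a COPY FROM statement for a table exported to <table>.lst."""
    return f"COPY {table} FROM '{table}.lst' WITH DELIMITER '{delimiter}';"


# Tables exported in the SQL*Plus step.
tables = ["employees", "jobs"]
statements = [copy_statement(t) for t in tables]

for statement in statements:
    print(statement)
```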

2.1.1.3.4.21 9. Rewrite Oracle queries

SQream DB supports a large subset of ANSI SQL. You will have to refactor much of Oracle’s SQL and functions that often are not ANSI SQL. We recommend the following resources:
• SQL Feature Checklist - to understand SQream DB’s SQL feature support.
• Optimization and Best Practices - to understand best practices for SQL queries and schema design.
• Common table expressions (CTEs) - CTEs can be used to rewrite complex queries in a compact form.
• Concurrency and locks - to understand the difference between Oracle’s transactions and SQream DB’s concurrency.
• Identity - SQream DB supports sequences, but no triggers for auto-increment.
• Joins - SQream DB supports ANSI join syntax. Oracle uses the + operator which SQream DB doesn’t support.
• Saved queries - Saved queries can be used to emulate some stored procedures.
• Subqueries - SQream DB supports a limited set of subqueries.
• Python UDF (user-defined functions) - SQream DB supports Python User Defined Functions which can be used to run complex operations in-line.
• Views - SQream DB supports logical views, but does not support materialized views.
• Window Functions - SQream DB supports a wide array of window functions.

See also:

• COPY FROM
• INSERT
• Foreign Tables

2.1.2 Client drivers for v2021.1

These guides explain how to use the SQream DB client drivers, and how to use client applications with SQream DB.


2.1.2.1 Client driver downloads

2.1.2.1.1 All operating systems

• JDBC - sqream-jdbc-4.4.0 (.jar)
  JDBC Driver (SQream recommends installing via mvn)
• Python - pysqream v3.1.3 (.tar.gz)
  Python (pysqream) - Python driver (SQream recommends installing via pip)
• Node.JS - sqream-v4.2.4 (.tar.gz)
  Node.JS - Node.JS driver (SQream recommends installing via npm)

2.1.2.1.2 Windows

• .Net driver - SQream .Net driver v3.0.2

2.1.2.1.3 Linux

• SQream SQL (x86) - sqream-sql-v2020.1.1_stable.x86_64.tar.gz
  sqream sql - Interactive command-line SQL client for Intel-based machines
• SQream SQL (IBM POWER9) - sqream-sql-v2020.1.1_stable.ppc64le.tar.gz
  sqream sql - Interactive command-line SQL client for IBM POWER9-based machines
• C++ connector - libsqream-4.1
  C++ shared object library

2.1.2.1.3.1 Python (pysqream)

The SQream Python connector is a set of packages that allows Python programs to connect to SQream DB.
• pysqream is a pure Python connector. It can be installed with pip on any operating system, including Linux, Windows, and macOS.
• pysqream-sqlalchemy is a SQLAlchemy dialect for pysqream.

The connector supports Python 3.6.5 and newer. The base pysqream package conforms to Python DB-API specifications PEP-249.

In this topic:

• Installing the Python connector
  – Prerequisites
    ∗ 1. Python
    ∗ 2. PIP
    ∗ 3. OpenSSL for Linux
    ∗ 4. Cython (optional)
  – Install via pip
  – Upgrading an existing installation
  – Validate the installation
• SQLAlchemy examples
  – A simple connection example
  – Pulling a table into Pandas
• API Examples
  – Explaining the connection example
  – Using the cursor
  – Reading result metadata
  – Loading data into a table
  – Reading data from a CSV file for load into a table
  – Using SQLAlchemy ORM to create tables and fill them with data

2.1.2.1.3.2 Installing the Python connector

2.1.2.1.3.3 Prerequisites

2.1.2.1.3.4 1. Python

The connector requires Python 3.6.5 or newer. To verify your version of Python:

$ python --version
Python 3.7.3

Note: If both Python 2.x and 3.x are installed, you can run python3 and pip3 instead of python and pip respectively for the rest of this guide.

Warning: If you’re running on an older version, pip will fetch an older version of pysqream, with version <3.0.0. This version is currently not supported.
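The same requirement can be checked from inside Python, which is convenient in setup scripts. This is a small sketch that compares the running interpreter against the 3.6.5 minimum stated above; the function name is hypothetical.

```python
import sys

MINIMUM = (3, 6, 5)  # minimum version required by the connector, per this guide


def is_supported(version_info=sys.version_info):
    """Return True if the given (major, minor, micro) version meets the minimum."""
    return tuple(version_info[:3]) >= MINIMUM


if is_supported():
    print("This Python version can run the connector.")
else:
    print("Python 3.6.5 or newer is required.")
```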

2.1.2.1.3.5 2. PIP

The Python connector is installed via pip, the Python package manager and installer.


We recommend upgrading to the latest version of pip before installing. To upgrade pip, run the following command:

$ python -m pip install --upgrade pip
Collecting pip
  Downloading https://files.pythonhosted.org/packages/00/b6/9cfa56b4081ad13874b0c6f96af8ce16cfbc1cb06bedf8e9164ce5551ec1/pip-19.3.1-py2.py3-none-any.whl (1.4MB)
     |████████████████████████████████| 1.4MB 1.6MB/s
Installing collected packages: pip
  Found existing installation: pip 19.1.1
    Uninstalling pip-19.1.1:
      Successfully uninstalled pip-19.1.1
Successfully installed pip-19.3.1

Note:
• On macOS, you may want to use virtualenv to install Python and the connector, to ensure compatibility with the built-in Python environment.
• If you encounter an error that includes SSLError, or the message "WARNING: pip is configured with locations that require TLS/SSL, however the ssl module in Python is not available.", reinstall Python with SSL enabled, or use virtualenv or Anaconda.

2.1.2.1.3.6 3. OpenSSL for Linux

Some distributions of Python do not include OpenSSL. The Python connector relies on OpenSSL for secure connections to SQream DB.
• To install OpenSSL on RHEL/CentOS

$ sudo yum install -y libffi-devel openssl-devel

• To install OpenSSL on Ubuntu

$ sudo apt-get install libssl-dev libffi-dev -y

2.1.2.1.3.7 4. Cython (optional)

Optional but highly recommended is Cython, which improves performance of Python applications.

$ pip install cython

2.1.2.1.3.8 Install via pip

The Python connector is available via PyPI. Install the connector with pip:

$ pip install pysqream pysqream-sqlalchemy


pip will automatically install all necessary libraries and modules.

2.1.2.1.3.9 Upgrading an existing installation

The Python drivers are updated periodically. To upgrade an existing pysqream installation, use pip’s -U flag.

$ pip install pysqream pysqream-sqlalchemy -U

2.1.2.1.3.10 Validate the installation

Create a file called test.py, containing the following:

Listing 2: pysqream Validation Script

#!/usr/bin/env python

import pysqream

"""
Connection parameters include:
* IP/Hostname
* Port
* database name
* username
* password
* Connect through load balancer, or direct to worker (Default: false - direct to worker)
* use SSL connection (default: false)
* Optional service queue (default: 'sqream')
"""

# Create a connection object

con = pysqream.connect(host='127.0.0.1', port=5000, database='master'
                   , username='sqream', password='sqream'
                   , clustered=False)

# Create a new cursor
cur = con.cursor()

# Prepare and execute a query
cur.execute('select show_version()')

result = cur.fetchall() # `fetchall` gets the entire data set

print (f"Version: {result[0][0]}")

# This should print the SQream DB version. For example ``Version: v2020.1``.

# Finally, close the connection

con.close()


Make sure to replace the parameters in the connection with the respective parameters for your SQream DB installation. Run the test file to verify that you can connect to SQream DB:

$ python test.py
Version: v2020.1

If all went well, you are now ready to build an application using the SQream DB Python connector! If any connection error appears, verify that you have access to a running SQream DB and that the connection parameters are correct.

2.1.2.1.3.11 SQLAlchemy examples

SQLAlchemy is an ORM for Python. When you install the SQream DB dialect (pysqream-sqlalchemy) you can use frameworks like Pandas, TensorFlow, and Alembic to query SQream DB directly.

2.1.2.1.3.12 A simple connection example

import sqlalchemy as sa
from sqlalchemy.engine.url import URL

engine_url = URL('sqream'
              , username='rhendricks'
              , password='secret_password'
              , host='localhost'
              , port=5000
              , database='raviga'
              , query={'use_ssl': False})

engine = sa.create_engine(engine_url)

res = engine.execute('create table test (ints int)')
res = engine.execute('insert into test values (5), (6)')
res = engine.execute('select * from test')

2.1.2.1.3.13 Pulling a table into Pandas

In this example, we use the URL method to create the connection string.

import sqlalchemy as sa
import pandas as pd
from sqlalchemy.engine.url import URL

engine_url = URL('sqream'
              , username='rhendricks'
              , password='secret_password'
              , host='localhost'
              , port=5000
              , database='raviga'
              , query={'use_ssl': False})

engine = sa.create_engine(engine_url)

table_df = pd.read_sql("select * from nba", con=engine)

2.1.2.1.3.14 API Examples

2.1.2.1.3.15 Explaining the connection example

First, import the package and create a connection:

# Import pysqream package
import pysqream

"""
Connection parameters include:
* IP/Hostname
* Port
* database name
* username
* password
* Connect through load balancer, or direct to worker (Default: false - direct to worker)
* use SSL connection (default: false)
* Optional service queue (default: 'sqream')
"""

# Create a connection object
con = pysqream.connect(host='127.0.0.1', port=3108, database='raviga'
                   , username='rhendricks', password='Tr0ub4dor&3'
                   , clustered=True)

Then, run a query and fetch the results cur = con.cursor() # Create a new cursor # Prepare and execute a query cur.execute('select show_version()') result = cur.fetchall() # `fetchall` gets the entire data set print (f"Version: {result[0][0]}")

This should print the SQream DB version. For example v2020.1. Finally, we will close the connection

2.1. Full list of guides 41 SQream DB, Release 2021.1

con.close()

2.1.2.1.3.16 Using the cursor

The DB-API specification includes several methods for fetching results from the cursor. We will use the nba example. Here’s a peek at the table contents:

Table 5: nba

Name           Team            Number  Position  Age   Height  Weight  College            Salary
Avery Bradley  Boston Celtics  0.0     PG        25.0  6-2     180.0   Texas              7730337.0
Jae Crowder    Boston Celtics  99.0    SF        25.0  6-6     235.0   Marquette          6796117.0
John Holland   Boston Celtics  30.0    SG        27.0  6-5     205.0   Boston University
R.J. Hunter    Boston Celtics  28.0    SG        22.0  6-5     185.0   Georgia State      1148640.0
Jonas Jerebko  Boston Celtics  8.0     PF        29.0  6-10    231.0                      5000000.0
Amir Johnson   Boston Celtics  90.0    PF        29.0  6-9     240.0                      12000000.0
Jordan Mickey  Boston Celtics  55.0    PF        21.0  6-8     235.0   LSU                1170960.0
Kelly Olynyk   Boston Celtics  41.0    C         25.0  7-0     238.0   Gonzaga            2165160.0
Terry Rozier   Boston Celtics  12.0    PG        22.0  6-2     190.0   Louisville         1824360.0

As before, we import the library and create a Connection(), then call execute() with a simple SELECT * query.

import pysqream

con = pysqream.connect(host='127.0.0.1', port=3108, database='master',
                       username='rhendricks', password='Tr0ub4dor&3',
                       clustered=True)

cur = con.cursor()  # Create a new cursor
# The select statement:
statement = 'SELECT * FROM nba'
cur.execute(statement)

After executing the statement, the cursor holds the result set. A cursor is iterable, meaning that every fetch advances the cursor to the next row.

Use fetchone() to get one record at a time:

first_row = cur.fetchone()   # Fetch one row at a time (first row)
second_row = cur.fetchone()  # Fetch one row at a time (second row)

To get several rows at a time, use fetchmany():

# executing `fetchone` twice is equivalent to this form:
third_and_fourth_rows = cur.fetchmany(2)

To get all rows at once, use fetchall():

# To get all rows at once, use `fetchall`
remaining_rows = cur.fetchall()

# Close the connection when done
con.close()

Here are the contents of the row variables we used:

>>> print(first_row)
('Avery Bradley', 'Boston Celtics', 0, 'PG', 25, '6-2', 180, 'Texas', 7730337)
>>> print(second_row)
('Jae Crowder', 'Boston Celtics', 99, 'SF', 25, '6-6', 235, 'Marquette', 6796117)
>>> print(third_and_fourth_rows)
[('John Holland', 'Boston Celtics', 30, 'SG', 27, '6-5', 205, 'Boston University', None),
 ('R.J. Hunter', 'Boston Celtics', 28, 'SG', 22, '6-5', 185, 'Georgia State', 1148640)]
>>> print(remaining_rows)
[('Jonas Jerebko', 'Boston Celtics', 8, 'PF', 29, '6-10', 231, None, 5000000),
 ('Amir Johnson', 'Boston Celtics', 90, 'PF', 29, '6-9', 240, None, 12000000),
 ('Jordan Mickey', 'Boston Celtics', 55, 'PF', 21, '6-8', 235, 'LSU', 1170960),
 ('Kelly Olynyk', 'Boston Celtics', 41, 'C', 25, '7-0', 238, 'Gonzaga', 2165160), [...]

Note: Calling a fetch command after all rows have been fetched will return an empty array ([]).
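The fetch pattern above is common to DB-API (PEP 249) drivers. The sketch below replays it against Python's built-in sqlite3 module, used here purely as a stand-in so that the example runs without a SQream DB cluster; the players table and its rows are made up for illustration. One difference to keep in mind: sqlite3 returns None from fetchone() once the rows are exhausted, whereas the note above describes pysqream returning an empty array.

```python
import sqlite3

# sqlite3 is only a stand-in: it exposes the same DB-API cursor methods
# (fetchone / fetchmany / fetchall) that this section describes.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE players (name TEXT, number INTEGER)")
cur.executemany(
    "INSERT INTO players VALUES (?, ?)",
    [("Avery Bradley", 0), ("Jae Crowder", 99), ("John Holland", 30),
     ("R.J. Hunter", 28), ("Jonas Jerebko", 8)],
)

cur.execute("SELECT name, number FROM players ORDER BY rowid")
first_row = cur.fetchone()        # advances the cursor by one row
next_two_rows = cur.fetchmany(2)  # advances the cursor by two more rows
remaining_rows = cur.fetchall()   # everything that is left
after_end = cur.fetchall()        # nothing left, so an empty list

con.close()
```

Each fetch call continues from where the previous one stopped, so the three result variables never overlap.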

2.1.2.1.3.17 Reading result metadata

When a statement is executed, metadata about the result set (e.g. column names, types, etc.) also becomes available. The metadata is stored in the description attribute of the cursor.

>>> import pysqream
>>> con = pysqream.connect(host='127.0.0.1', port=3108, database='master',
...                        username='rhendricks', password='Tr0ub4dor&3',
...                        clustered=True)
>>> cur = con.cursor()
>>> statement = 'SELECT * FROM nba'
>>> cur.execute(statement)
>>> print(cur.description)
[('Name', 'STRING', 24, 24, None, None, True),
 ('Team', 'STRING', 22, 22, None, None, True),
 ('Number', 'NUMBER', 1, 1, None, None, True),
 ('Position', 'STRING', 2, 2, None, None, True),
 ('Age (as of 2018)', 'NUMBER', 1, 1, None, None, True),
 ('Height', 'STRING', 4, 4, None, None, True),
 ('Weight', 'NUMBER', 2, 2, None, None, True),
 ('College', 'STRING', 21, 21, None, None, True),
 ('Salary', 'NUMBER', 4, 4, None, None, True)]

To get a list of column names, iterate over the description list:

>>> [ i[0] for i in cur.description ]
['Name', 'Team', 'Number', 'Position', 'Age (as of 2018)', 'Height', 'Weight', 'College', 'Salary']
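A common use of the column names is to turn each result row into a dictionary. The following is a minimal sketch of that idea; rows_as_dicts is a hypothetical helper (not part of pysqream), and sqlite3 stands in for the SQream DB connection so the example is self-contained. It relies only on the first element of each description entry, which PEP 249 defines as the column name.

```python
import sqlite3

def rows_as_dicts(cursor):
    """Pair every remaining row with the column names in cursor.description."""
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]

# Stand-in database with a cut-down version of the nba table
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE nba_sample (Name TEXT, Team TEXT, Number INTEGER)")
cur.execute("INSERT INTO nba_sample VALUES ('Avery Bradley', 'Boston Celtics', 0)")

cur.execute("SELECT * FROM nba_sample")
records = rows_as_dicts(cur)
con.close()
```

With a pysqream cursor the same helper should apply unchanged, assuming description is populated as shown above.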

2.1.2.1.3.18 Loading data into a table

This example loads 10,000 rows of dummy data into a SQream DB instance:

import pysqream
from datetime import date, datetime
from time import time

con = pysqream.connect(host='127.0.0.1', port=3108, database='master',
                       username='rhendricks', password='Tr0ub4dor&3',
                       clustered=True)

# Create a table for loading
create = 'create or replace table perf (b bool, t tinyint, sm smallint, i int, bi bigint, f real, d double, s varchar(12), ss text, dt date, dtt datetime)'
con.execute(create)

# After creating the table, we can load data into it with the INSERT command

# Create dummy data which matches the table we created
data = (False, 2, 12, 145, 84124234, 3.141, -4.3, "Marty McFly", u"キウイは楽しい鳥です",
        date(2019, 12, 17), datetime(1955, 11, 4, 1, 23, 0, 0))

row_count = 10**4

# Get a new cursor
cur = con.cursor()
insert = 'insert into perf values (?,?,?,?,?,?,?,?,?,?,?)'
start = time()
cur.executemany(insert, [data] * row_count)
print(f"Total insert time for {row_count} rows: {time() - start} seconds")

# Close this cursor
cur.close()

# Verify that the data was inserted correctly
# Get a new cursor
cur = con.cursor()
cur.execute('select count(*) from perf')
result = cur.fetchall()  # `fetchall` collects the entire data set
print(f"Count of inserted rows: {result[0][0]}")

# When done, close the cursor
cur.close()

# Close the connection
con.close()
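The example above builds all 10,000 rows in memory before calling executemany(). When the source data is larger, one option is to feed executemany() in fixed-size batches. The sketch below illustrates that pattern under some assumptions: batches is a hypothetical helper, the batch size of 2,500 is arbitrary rather than a SQream recommendation, and sqlite3 (which also uses the qmark placeholder style) stands in for the SQream DB connection.

```python
import sqlite3
from itertools import islice

def batches(rows, batch_size):
    """Yield lists of at most batch_size rows from any iterable of rows."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE perf_sample (i INTEGER, s TEXT)")

row_count = 10**4
rows = ((n, "Marty McFly") for n in range(row_count))  # generated lazily

insert = "INSERT INTO perf_sample VALUES (?, ?)"
batch_sizes = []
for batch in batches(rows, 2500):
    cur.executemany(insert, batch)   # one call per batch
    batch_sizes.append(len(batch))
con.commit()

cur.execute("SELECT count(*) FROM perf_sample")
inserted = cur.fetchone()[0]
con.close()
```

Only one batch is held in memory at a time, at the cost of one executemany() call per batch.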

2.1.2.1.3.19 Reading data from a CSV file for load into a table

We will write a helper function to create an INSERT statement, by reading an existing table's metadata.

import csv
import datetime
import pysqream

def insert_from_csv(cur, table_name, csv_filename, field_delimiter = ',', null_markers = []):
    """
    We will first ask SQream DB for some table information.
    This is important for understanding the number of columns, and will help
    to create a matching INSERT statement
    """

    column_info = cur.execute(f"SELECT * FROM {table_name} LIMIT 0").description

    def parse_datetime(v):
        # Try the most specific format first, then fall back
        try:
            return datetime.datetime.strptime(v, '%Y-%m-%d %H:%M:%S.%f')
        except ValueError:
            try:
                return datetime.datetime.strptime(v, '%Y-%m-%d %H:%M:%S')
            except ValueError:
                return datetime.datetime.strptime(v, '%Y-%m-%d')

    # Create enough placeholders (`?`) for the INSERT query string
    qstring = ','.join(['?'] * len(column_info))
    insert_statement = f"insert into {table_name} values ({qstring})"

    # Open the CSV file
    with open(csv_filename, mode='r') as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=field_delimiter)

        # Execute the INSERT statement with the CSV data
        cur.executemany(insert_statement, [row for row in csv_reader])

con = pysqream.connect(host='127.0.0.1', port=3108, database='master',
                       username='rhendricks', password='Tr0ub4dor&3',
                       clustered=True)

cur = con.cursor()
insert_from_csv(cur, 'nba', 'nba.csv', field_delimiter = ',', null_markers = [])

con.close()
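The helper above passes every CSV field through as a string and accepts a null_markers argument without using it. The following sketch shows one way the rows could be converted before the insert. It is an illustration only: convert_row and the converters list are hypothetical, the sample data (including the dates) is invented, and how pysqream itself treats strings or None for typed columns is not covered here.

```python
import csv
import io
from datetime import datetime

def parse_datetime(value):
    """Try the three timestamp layouts used by the helper above."""
    for fmt in ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S", "%Y-%m-%d"):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized date/time value: {value!r}")

def convert_row(row, converters, null_markers=()):
    """Convert one CSV row: null markers become None, other fields are cast."""
    return tuple(
        None if field in null_markers else convert(field)
        for field, convert in zip(row, converters)
    )

# Invented three-column file: name, number, a date or timestamp
sample = io.StringIO(
    "Avery Bradley,0,2010-06-24\n"
    "John Holland,NULL,2016-04-11 09:30:00\n"
)
converters = [str, int, parse_datetime]  # one callable per column
rows = [convert_row(r, converters, null_markers=("NULL", ""))
        for r in csv.reader(sample)]
```

The resulting list of tuples has the same shape as the data passed to executemany() in the earlier loading example.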

2.1.2.1.3.20 Using SQLAlchemy ORM to create tables and fill them with data

You can also use the ORM to create tables and insert data into them from Python objects. For example:

import sqlalchemy as sa
import pandas as pd
from sqlalchemy.engine.url import URL

engine_url = URL('sqream',
                 username='rhendricks',
                 password='secret_password',
                 host='localhost',
                 port=5000,
                 database='raviga',
                 query={'use_ssl': False})

engine = sa.create_engine(engine_url)

# Build a metadata object and bind it
metadata = sa.MetaData()
metadata.bind = engine

# Create a table in the local metadata
employees = sa.Table(
    'employees',
    metadata,
    sa.Column('id', sa.Integer),
    sa.Column('name', sa.VARCHAR(32)),
    sa.Column('lastname', sa.VARCHAR(32)),
    sa.Column('salary', sa.Float)
)

# The create_all() function uses the SQream DB engine object
# to create all the defined table objects.
metadata.create_all(engine)

# Now that the table exists, we can insert data into it.

# Build the data rows
insert_data = [
    {'id': 1, 'name': 'Richard', 'lastname': 'Hendricks', 'salary': 12000.75},
    {'id': 3, 'name': 'Bertram', 'lastname': 'Gilfoyle', 'salary': 8400.0},
    {'id': 8, 'name': 'Donald', 'lastname': 'Dunn', 'salary': 6500.40}
]

# Build the insert command
ins = employees.insert(insert_data)

# Execute the command
result = engine.execute(ins)

2.1.2.1.3.21 pysqream API reference

The SQream Python connector allows Python programs to connect to SQream DB. pysqream conforms to the Python DB-API specification, PEP 249.

The main module is pysqream, which contains the Connection() class.

connect(host, port, database, username, password, clustered=False, use_ssl=False, service='sqream', reconnect_attempts=3, reconnect_interval=10)
    Creates a new Connection() object and connects to SQream DB.

    host                 SQream DB hostname or IP
    port                 SQream DB port
    database             Database name
    username             Username to use for connection
    password             Password for username
    clustered            Connect through load balancer, or direct to worker (default: false - direct to worker)
    use_ssl              Use SSL connection (default: false)
    service              Optional service queue (default: 'sqream')
    reconnect_attempts   Number of reconnection attempts before closing the connection
    reconnect_interval   Time in seconds between each reconnection attempt

class Connection

    arraysize
        Specifies the number of rows to fetch at a time with fetchmany(). Defaults to 1 - one row at a time.

    rowcount
        Unused, always returns -1.

    description
        Read-only attribute that contains result set metadata. This attribute is populated after a statement is executed.

        Value           Description
        name            Column name
        type_code       Internal type code
        display_size    Not used - same as internal_size
        internal_size   Data size in bytes
        precision       Precision of numeric data (not used)
        scale           Scale for numeric data (not used)
        null_ok         Specifies if NULL values are allowed for this column

    execute(self, query, params=None)
        Execute a statement. Parameters are not supported.

        self     Connection()
        query    Statement or query text
        params   Unused

    executemany(self, query, rows_or_cols=None, data_as='rows', amount=None)
        Prepares a statement and executes it against all parameter sequences found in rows_or_cols.

        self           Connection()
        query          INSERT statement
        rows_or_cols   Data buffer to insert. This should be a sequence of lists or tuples.
        data_as        (Optional) Read data as rows or columns
        amount         (Optional) Count of rows to insert

    close(self)
        Close a statement and connection. After a statement is closed, it must be reopened by creating a new cursor.

        self   Connection()

    cursor(self)
        Create a new Connection() cursor. We recommend creating a new cursor for every statement.

        self   Connection()

    fetchall(self, data_as='rows')
        Fetch all remaining records from the result set. An empty sequence is returned when no more rows are available.

        self      Connection()
        data_as   (Optional) Read data as rows or columns

    fetchone(self, data_as='rows')
        Fetch one record from the result set. An empty sequence is returned when no more rows are available.

        self      Connection()
        data_as   (Optional) Read data as rows or columns

    fetchmany(self, size=[Connection.arraysize], data_as='rows')
        Fetches the next several rows of a query result set. An empty sequence is returned when no more rows are available.

        self      Connection()
        size      Number of records to fetch. If not set, fetches Connection.arraysize (1 by default) records
        data_as   (Optional) Read data as rows or columns

    __iter__()
        Makes the cursor iterable.

apilevel = '2.0'
    String constant stating the supported API level. The connector supports API "2.0".

threadsafety = 1
    Level of thread safety the interface supports. pysqream currently supports level 1, which states that threads can share the module, but not connections.

paramstyle = 'qmark'
    The placeholder marker. Set to qmark, which is a question mark (?).
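Because the reference lists close() but does not say whether Connection() can be used in a with statement, one conservative way to guarantee cleanup is contextlib.closing, which only requires that the object has a close() method. The sketch below demonstrates the pattern with sqlite3 as a stand-in; applying it to pysqream.connect(...) is an assumption based on the close() entry above, not something this reference confirms.

```python
import sqlite3
from contextlib import closing

# closing() calls .close() on exit, even if the body raises an exception.
with closing(sqlite3.connect(":memory:")) as con:
    with closing(con.cursor()) as cur:
        cur.execute("SELECT 1 + 1")
        result = cur.fetchall()

# At this point both the cursor and the connection have been closed.
```

This keeps the "close the cursor, then close the connection" steps from the earlier examples in one place and makes them hard to forget.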

2.1.2.1.3.22 C++ Driver

The SQream DB C++ driver allows C++ programs and tools to connect to SQream DB. This tutorial shows how to write a C++ program that uses this driver.

In this topic:

• Installing the C++ driver
  – Prerequisites
  – Getting the library
  – Extract the tarball archive
• Examples
  – Testing the connection to SQream DB
  – Creating a table and inserting values

2.1.2.1.3.23 Installing the C++ driver

2.1.2.1.3.24 Prerequisites

The SQream DB C++ driver was built on 64-bit Linux, and is designed to work with RHEL 7 and Ubuntu 16.04 and newer.


2.1.2.1.3.25 Getting the library

The C++ driver is provided as a tarball containing the compiled libsqream.so file and a header, sqream.h. Get the driver from the SQream Drivers page. The library can be integrated into your C++-based applications or projects.

2.1.2.1.3.26 Extract the tarball archive

Extract the library files from the tarball

$ tar xf libsqream-3.0.tar.gz

2.1.2.1.3.27 Examples

Assuming there is a SQream DB worker to connect to, we’ll connect to it using the application and run some statements.

2.1.2.1.3.28 Testing the connection to SQream DB

Save the following listing as connect_test.cpp.

Listing 3: Connect to SQream DB

// Trivial example

#include <iostream>

#include "sqream.h"

int main () {

    sqream::driver sqc;

    // Connection parameters: Hostname, Port, Use SSL, Username, Password,
    // Database name, Service name
    sqc.connect("127.0.0.1", 5000, false, "rhendricks", "Tr0ub4dor&3",
                "raviga", "sqream");

    // create table with data
    run_direct_query(&sqc, "CREATE TABLE test_table (x int)");
    run_direct_query(&sqc, "INSERT INTO test_table VALUES (5), (6), (7), (8)");

    // query it
    sqc.new_query("SELECT * FROM test_table");
    sqc.execute_query();

    // See the results
    while (sqc.next_query_row()) {
        std::cout << "Received: " << sqc.get_int(0) << std::endl;
    }

    sqc.finish_query();

    // Close the connection completely
    sqc.disconnect();

}

2.1.2.1.3.29 Compiling and running the application

To build this code, place the library and header file in ./libsqream-3.0/ and run

$ g++ -Wall -Ilibsqream-3.0 -Llibsqream-3.0 connect_test.cpp -lsqream -o connect_test
$ ./connect_test

Modify the -I and -L arguments to match the .so library and .h file if they are in another directory.

2.1.2.1.3.30 Creating a table and inserting values

Save the following listing as insert_test.cpp.

Listing 4: Inserting data to a SQream DB table

// Insert with parameterized statement example

#include <iostream>
#include <string>

#include "sqream.h"

int main () {

    sqream::driver sqc;

    // Connection parameters: Hostname, Port, Use SSL, Username, Password,
    // Database name, Service name
    sqc.connect("127.0.0.1", 5000, false, "rhendricks", "Tr0ub4dor&3",
                "raviga", "sqream");

    run_direct_query(&sqc,
        "CREATE TABLE animals (id INT NOT NULL, name VARCHAR(10) NOT NULL)");

    // prepare the statement
    sqc.new_query("INSERT INTO animals VALUES (?, ?)");
    sqc.execute_query();

    // Data to insert
    int row0[] = {1,2,3};
    std::string row1[] = {"Dog","Cat","Possum"};
    int len = sizeof(row0)/sizeof(row0[0]);

    for (int i = 0; i < len; ++i) {
        sqc.set_int(0, row0[i]);
        sqc.set_varchar(1, row1[i]);
        sqc.next_query_row();
    }

    // This commits the insert
    sqc.finish_query();

    sqc.disconnect();

}

2.1.2.1.3.31 Compiling and running the application

To build this code, use

$ g++ -Wall -Ilibsqream-3.0 -Llibsqream-3.0 insert_test.cpp -lsqream -o insert_test
$ ./insert_test

2.1.2.1.3.32 JDBC

The SQream DB JDBC driver allows many Java applications and tools to connect to SQream DB. This tutorial shows how to write a Java application using the JDBC interface. The JDBC driver requires Java 1.8 or newer.

In this topic:

• Installing the JDBC driver
  – Prerequisites
  – Getting the JAR file
  – Extract the zip archive
  – Setting up the Class Path
• Connect to SQream DB with a JDBC application
  – Driver class
  – Connection string
    ∗ Connection parameters
    ∗ Connection string examples
  – Sample Java program

2.1.2.1.3.33 Installing the JDBC driver

2.1.2.1.3.34 Prerequisites

The SQream DB JDBC driver requires Java 1.8 or newer. We recommend either Oracle Java or OpenJDK.

Oracle Java
    Download and install Java 8 from Oracle for your platform: https://www.java.com/en/download/manual.jsp

OpenJDK
    For Linux and BSD, see https://openjdk.java.net/install/
    For Windows, SQream recommends Zulu 8: https://www.azul.com/downloads/zulu-community/?&version=java-8-lts&architecture=x86-64-bit&package=jdk

2.1.2.1.3.35 Getting the JAR file

The JDBC driver is provided as a zipped JAR file, available for download from the client drivers download page. This JAR file can integrate into your Java-based applications or projects.

2.1.2.1.3.36 Extract the zip archive

Extract the JAR file from the zip archive

$ unzip sqream-jdbc-4.3.0.zip

2.1.2.1.3.37 Setting up the Class Path

To use the driver, the JAR named sqream-jdbc-<version>.jar (for example, sqream-jdbc-4.3.0.jar) needs to be included in the class path, either by putting it in the CLASSPATH environment variable, or by using flags on the relevant Java command line. For example, if the JDBC driver has been unzipped to /home/sqream/sqream-jdbc-4.3.0.jar, the application should be run as follows:

$ export CLASSPATH=/home/sqream/sqream-jdbc-4.3.0.jar:$CLASSPATH
$ java my_java_app

An alternative method is to pass -classpath to the Java executable:

$ java -classpath .:/home/sqream/sqream-jdbc-4.3.0.jar my_java_app

2.1.2.1.3.38 Connect to SQream DB with a JDBC application

2.1.2.1.3.39 Driver class

Use com.sqream.jdbc.SQDriver as the driver class in the JDBC application.


2.1.2.1.3.40 Connection string

JDBC drivers rely on a connection string. Use the following syntax for SQream DB

jdbc:Sqream://<host and port>/<database name>;user=<username>;password=<password>;[<optional parameters>; ...]

2.1.2.1.3.41 Connection parameters

Item                  Optional  Default  Description
<host and port>       ✗         None     Hostname and port of the SQream DB worker. For example, 127.0.0.1:5000, sqream.mynetwork.co:3108
username=<username>   ✗         None     Username of a role to use for connection. For example, username=rhendricks
password=<password>   ✗         None     Specifies the password of the selected role. For example, password=Tr0ub4dor&3
service=<service>     ✓         sqream   Specifies the service queue to use. For example, service=etl
ssl                   ✓         false    Specifies SSL for this connection. For example, ssl=true
cluster               ✓         true     Connect via load balancer (use only if exists, and check port).

2.1.2.1.3.42 Connection string examples

For a SQream DB cluster with load balancer and no service queues, with SSL

jdbc:Sqream://sqream.mynetwork.co:3108/master;user=rhendricks;password=Tr0ub4dor&3;ssl=true;cluster=true

Minimal example for a local, standalone SQream DB

jdbc:Sqream://127.0.0.1:5000/master;user=rhendricks;password=Tr0ub4dor&3

For a SQream DB cluster with load balancer and a specific service queue named etl, to the database named raviga

jdbc:Sqream://sqream.mynetwork.co:3108/raviga;user=rhendricks;password=Tr0ub4dor&3;cluster=true;service=etl
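The three examples follow the same layout, so a connection string can be assembled mechanically from its parts. The sketch below does this in Python for tools that generate configuration for Java applications; build_jdbc_url is a hypothetical helper, the parameter order mirrors the examples above, and values are inserted as-is with no escaping of characters such as a semicolon.

```python
def build_jdbc_url(host, port, database, user, password,
                   ssl=None, cluster=None, service=None):
    """Assemble a SQream DB JDBC connection string from its parts.

    Optional parameters left as None are omitted from the result.
    """
    parts = [
        f"jdbc:Sqream://{host}:{port}/{database}",
        f"user={user}",
        f"password={password}",
    ]
    if ssl is not None:
        parts.append("ssl=true" if ssl else "ssl=false")
    if cluster is not None:
        parts.append("cluster=true" if cluster else "cluster=false")
    if service is not None:
        parts.append(f"service={service}")
    return ";".join(parts)

# Minimal example for a local, standalone SQream DB
minimal = build_jdbc_url("127.0.0.1", 5000, "master",
                         "rhendricks", "Tr0ub4dor&3")

# Load balancer with a specific service queue
clustered = build_jdbc_url("sqream.mynetwork.co", 3108, "raviga",
                           "rhendricks", "Tr0ub4dor&3",
                           cluster=True, service="etl")
```

Both calls reproduce the corresponding connection strings shown in this section.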

2.1.2.1.3.43 Sample Java program

The sample program below is provided as sample.java.

Listing 5: JDBC application sample

import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.Statement;
import java.sql.ResultSet;

import java.io.IOException;
import java.security.KeyManagementException;
import java.security.NoSuchAlgorithmException;
import java.sql.SQLException;

public class SampleTest {

    // Replace with your connection string
    static final String url = "jdbc:Sqream://sqream.mynetwork.co:3108/master;user=rhendricks;password=Tr0ub4dor&3;ssl=true;cluster=true";

    // Allocate objects for result set and metadata
    Connection conn = null;
    Statement stmt = null;
    ResultSet rs = null;
    DatabaseMetaData dbmeta = null;

    int res = 0;

    public void testJDBC() throws SQLException, IOException {

        // Create a connection
        conn = DriverManager.getConnection(url, "rhendricks", "Tr0ub4dor&3");

        // Create a table with a single integer column
        String sql = "CREATE TABLE test (x INT)";
        stmt = conn.createStatement();  // Prepare the statement
        stmt.execute(sql);              // Execute the statement
        stmt.close();                   // Close the statement handle

        // Insert some values into the newly created table
        sql = "INSERT INTO test VALUES (5),(6)";
        stmt = conn.createStatement();
        stmt.execute(sql);
        stmt.close();

        // Get values from the table
        sql = "SELECT * FROM test";
        stmt = conn.createStatement();
        rs = stmt.executeQuery(sql);
        // Fetch all results one-by-one
        while (rs.next()) {
            res = rs.getInt(1);
            System.out.println(res);  // Print results to screen
        }
        rs.close();    // Close the result set
        stmt.close();  // Close the statement handle
    }

    public static void main(String[] args) throws SQLException,
            KeyManagementException, NoSuchAlgorithmException, IOException,
            ClassNotFoundException {

        // Load SQream DB JDBC driver
        Class.forName("com.sqream.jdbc.SQDriver");

        // Create test object and run
        SampleTest test = new SampleTest();
        test.testJDBC();
    }
}

2.1.2.1.3.44 ODBC

2.1.2.1.3.45 Install and Configure ODBC on Windows

The ODBC driver for Windows is provided as a self-contained installer. This tutorial shows you how to install and configure ODBC on Windows.

In this topic:

• Installing the ODBC Driver
  – Prerequisites
  – 1. Run the Windows installer
• 3. Configuring the ODBC Driver DSN
  – Connection Parameters
• Troubleshooting
  – Solving “Code 126” ODBC errors

2.1.2.1.3.46 Installing the ODBC Driver

2.1.2.1.3.47 Prerequisites

2.1.2.1.3.48 Visual Studio 2015 Redistributables

To install the ODBC driver you must first install Microsoft’s Visual C++ Redistributable for Visual Studio 2015. To install Visual C++ Redistributable for Visual Studio 2015, see the Install Instructions.


2.1.2.1.3.49 Administrator Privileges

The SQream DB ODBC driver requires administrator privileges on your computer to add the DSNs (data source names).

2.1.2.1.3.50 1. Run the Windows installer

Install the driver by following the on-screen instructions in the easy-to-follow installer.

Note: The installer will install the driver in C:\Program Files\SQream Technologies\ODBC Driver by default. This path can be changed during the installation.

2.1.2.1.3.51 2. Selecting Components

The installer includes additional components, like JDBC and Tableau customizations.


You can deselect items you don’t want to install, but the items named ODBC Driver DLL and ODBC Driver Registry Keys must remain selected for a complete installation of the ODBC driver. Once the installer finishes, you will be ready to configure the DSN for connection.

2.1.2.1.3.52 3. Configuring the ODBC Driver DSN

ODBC driver configurations are done via DSNs. Each DSN represents one SQream DB database.

1. Open the Windows menu by pressing the Windows key (Win) on your keyboard or clicking the Windows button with your mouse.

2. Type ODBC and select ODBC Data Sources (64-bit). Click the item to open the setup window.

3. The installer has created a sample User DSN named SQreamDB. You can modify this DSN, or create a new one (Add → SQream ODBC Driver → Next).

4. Enter your connection parameters. See the reference below for a description of the parameters.

5. When completed, save the DSN by selecting OK.

Tip: Test the connection by clicking Test before saving.

6. You can now use this DSN in ODBC applications like Tableau.


2.1.2.1.3.53 Connection Parameters

Item                Description
Data Source Name    An easily recognizable name that you'll use to reference this DSN. Once you set this, it cannot be changed.
Description         A description of this DSN for your convenience. You can leave this blank.
User                Username of a role to use for connection. For example, rhendricks
Password            Specifies the password of the selected role. For example, Tr0ub4dor&3
Database            Specifies the database name to connect to. For example, master
Service             Specifies the service queue to use. For example, etl. Leave blank for default service sqream.
Server              Hostname of the SQream DB worker. For example, 127.0.0.1 or sqream.mynetwork.co
Port                TCP port of the SQream DB worker. For example, 5000 or 3108
Use server picker   Connect via load balancer (use only if exists, and check port)
SSL                 Specifies SSL for this connection
Logging options     Use this screen to alter logging options when tracing the ODBC connection for possible connection issues.

2.1.2.1.3.54 Troubleshooting

2.1.2.1.3.55 Solving “Code 126” ODBC errors

After installing the ODBC driver, you may experience the following error:

The setup routines for the SQreamDriver64 ODBC driver could not be loaded due to system error code 126: The specified module could not be found.
(c:\Program Files\SQream Technologies\ODBC Driver\sqreamOdbc64.dll)

This is an issue with the Visual Studio Redistributable packages. Verify you’ve correctly installed them, as described in the Visual Studio 2015 Redistributables section above.

2.1.2.1.3.56 Install and configure ODBC on Linux

The ODBC driver for Linux is provided as a shared library. This tutorial shows how to install and configure ODBC on Linux.

In this topic:

• Prerequisites
  – unixODBC
• Install the ODBC driver with a script
• Install the ODBC driver manually
• Install the driver dependencies
• Testing the connection
• ODBC DSN Parameters

2.1.2.1.3.57 Prerequisites

2.1.2.1.3.58 unixODBC

The ODBC driver requires a driver manager to manage the DSNs. SQream DB’s driver is built for unixODBC. Verify unixODBC is installed by running:

$ odbcinst -j
unixODBC 2.3.4
DRIVERS............: /etc/odbcinst.ini
SYSTEM DATA SOURCES: /etc/odbc.ini
FILE DATA SOURCES..: /etc/ODBCDataSources
USER DATA SOURCES..: /home/rhendricks/.odbc.ini
SQLULEN Size.......: 8
SQLLEN Size........: 8
SQLSETPOSIROW Size.: 8

Take note of the location of .odbc.ini and .odbcinst.ini. In this case, /etc. If odbcinst is not installed, follow the instructions for your platform below:

Install unixODBC on:

• Install unixODBC on RHEL 7 / CentOS 7 • Install unixODBC on Ubuntu

2.1.2.1.3.59 Install unixODBC on RHEL 7 / CentOS 7

$ yum install -y unixODBC unixODBC-devel

2.1.2.1.3.60 Install unixODBC on Ubuntu

$ sudo apt-get install unixodbc unixodbc-dev

2.1.2.1.3.61 Install the ODBC driver with a script

Use this method if you have never used ODBC on your machine before. If you have existing DSNs, see the manual install process below.

1. Unpack the tarball. Copy the downloaded file to any directory, and untar it to a new directory:

$ mkdir -p sqream_odbc64
$ tar xf sqream_2019.2.1_odbc_3.0.0_x86_64_linux.tar.gz -C sqream_odbc64

2. Run the first-time installer. The installer will create an editable DSN.

$ cd sqream_odbc64
$ ./odbc_install.sh --install

3. Edit the newly created DSN in /etc/.odbc.ini. See the parameter explanation in the section ODBC DSN Parameters.

2.1.2.1.3.62 Install the ODBC driver manually

Use this method when you have existing ODBC DSNs on your machine.

1. Unpack the tarball. Copy the file you downloaded to the directory where you want to install it, and untar it:

$ mkdir -p sqream_odbc64
$ tar xf sqream_2019.2.1_odbc_3.0.0_x86_64_linux.tar.gz -C sqream_odbc64

Take note of the directory where the driver was unpacked. For example, /home/rhendricks/sqream_odbc64

2. Locate the .odbc.ini and .odbcinst.ini files, using odbcinst -j.

   1. In .odbcinst.ini, add the following lines to register the driver (change the paths to match your specific driver):

[ODBC Drivers]
SqreamODBCDriver=Installed

[SqreamODBCDriver]
Description=Driver DSII SqreamODBC 64bit
Driver=/home/rhendricks/sqream_odbc64/sqream_odbc64.so
Setup=/home/rhendricks/sqream_odbc64/sqream_odbc64.so
APILevel=1
ConnectFunctions=YYY
DriverODBCVer=03.80
SQLLevel=1
IconvEncoding=UCS-4LE

   2. In .odbc.ini, add the following lines to configure the DSN (change the parameters to match your installation):

[ODBC Data Sources]
MyTest=SqreamODBCDriver

[MyTest]
Description=64-bit Sqream ODBC
Driver=/home/rhendricks/sqream_odbc64/sqream_odbc64.so
Server="127.0.0.1"
Port="5000"
Database="raviga"
Service=""
User="rhendricks"
Password="Tr0ub4dor&3"
Cluster=false
Ssl=false

Parameters are in the form of parameter = value. For details about the parameters that can be set for each DSN, see the section ODBC DSN Parameters.

3. Create a file called .sqream_odbc.ini for managing the driver settings and logging. Create this file alongside the other files, and add the following lines (change the parameters to match your installation):

# Note that this default DriverManagerEncoding of UTF-32 is for iODBC. unixODBC uses UTF-16 by default.
# If unixODBC was compiled with -DSQL_WCHART_CONVERT, then UTF-32 is the correct value.
# Execute 'odbc_config --cflags' to determine if you need UTF-32 or UTF-16 on unixODBC
[Driver]
DriverManagerEncoding=UTF-16
DriverLocale=en-US
ErrorMessagesPath=/home/rhendricks/sqream_odbc64/ErrorMessages
LogLevel=0
LogNamespace=
LogPath=/tmp/
ODBCInstLib=libodbcinst.so
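When several DSNs have to be set up, the .odbc.ini entries can be generated rather than typed by hand. The sketch below renders one DSN with Python's standard configparser module. It is an illustration under assumptions: render_dsn is a hypothetical helper, the key names and quoting simply copy the MyTest example above, and the function only returns text, so merging the result into an existing .odbc.ini is left to the reader.

```python
import configparser
import io

def render_dsn(name, driver_path, server, port, database, user, password,
               cluster=False, ssl=False, service=""):
    """Return .odbc.ini text for one DSN, laid out like the MyTest example."""
    config = configparser.ConfigParser(interpolation=None)
    config.optionxform = str  # keep key capitalisation (Driver, Server, ...)
    config["ODBC Data Sources"] = {name: "SqreamODBCDriver"}
    config[name] = {
        "Description": "64-bit Sqream ODBC",
        "Driver": driver_path,
        "Server": f'"{server}"',
        "Port": f'"{port}"',
        "Database": f'"{database}"',
        "Service": f'"{service}"',
        "User": f'"{user}"',
        "Password": f'"{password}"',
        "Cluster": "true" if cluster else "false",
        "Ssl": "true" if ssl else "false",
    }
    buffer = io.StringIO()
    # Write key=value without surrounding spaces, as in the example above
    config.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()

ini_text = render_dsn(
    "MyTest",
    "/home/rhendricks/sqream_odbc64/sqream_odbc64.so",
    server="127.0.0.1", port=5000, database="raviga",
    user="rhendricks", password="Tr0ub4dor&3",
)
```

Setting interpolation=None keeps configparser from treating a percent sign in a password as an interpolation marker.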

2.1.2.1.3.63 Install the driver dependencies

Add the ODBC driver path to LD_LIBRARY_PATH:

$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/rhendricks/sqream_odbc64/lib

You can also add this command to your ~/.bashrc file so that the setting persists between reboots without re-entering the command manually.

2.1.2.1.3.64 Testing the connection

Test the driver using isql. If the DSN you created is called MyTest, as in the example, run isql in this format:

$ isql MyTest


2.1.2.1.3.65 ODBC DSN Parameters

Item                    Default  Description
Data Source Name        None     An easily recognizable name that you'll use to reference this DSN.
Description             None     A description of this DSN for your convenience. This field can be left blank.
User                    None     Username of a role to use for connection. For example, User="rhendricks"
Password                None     Specifies the password of the selected role. For example, Password="Tr0ub4dor&3"
Database                None     Specifies the database name to connect to. For example, Database="master"
Service                 sqream   Specifies the service queue to use. For example, Service="etl". Leave blank (Service="") for default service sqream.
Server                  None     Hostname of the SQream DB worker. For example, Server="127.0.0.1" or Server="sqream.mynetwork.co"
Port                    None     TCP port of the SQream DB worker. For example, Port="5000" or Port="3108" for the load balancer
Cluster                 false    Connect via load balancer (use only if exists, and check port). For example, Cluster=true
Ssl                     false    Specifies SSL for this connection. For example, Ssl=true
DriverManagerEncoding   UTF-16   Depending on how unixODBC is installed, you may need to change this to UTF-32.
ErrorMessagesPath       None     Location where the driver was installed. For example, ErrorMessagesPath=/home/rhendricks/sqream_odbc64/ErrorMessages
LogLevel                0        Set to 0-6 for logging. Use this setting when instructed to by SQream Support. For example, LogLevel=1
                                 • 0 = Disable tracing
                                 • 1 = Fatal only error tracing
                                 • 2 = Error tracing
                                 • 3 = Warning tracing
                                 • 4 = Info tracing
                                 • 5 = Debug tracing
                                 • 6 = Detailed tracing

SQream has an ODBC driver to connect to SQream DB. This tutorial shows how to install the ODBC driver for Linux or Windows for use with applications like Tableau, PHP, and others that use ODBC.

Platform | Versions supported

Windows
• Windows 7 (64 bit)
• Windows 8 (64 bit)
• Windows 10 (64 bit)
• Windows Server 2008 R2 (64 bit)
• Windows Server 2012
• Windows Server 2016
• Windows Server 2019

Linux
• Red Hat Enterprise Linux (RHEL) 7
• CentOS 7
• Ubuntu 16.04
• Ubuntu 18.04

Other distributions may also work, but are not officially supported by SQream.

In this topic:

• Downloading the ODBC driver
• Install and configure the ODBC driver

2.1.2.1.3.66 Downloading the ODBC driver

The SQream DB ODBC driver is distributed by your SQream account manager. Before contacting your account manager, verify which platform the ODBC driver will be used on. Go to SQream Support or contact your SQream account manager to get the driver. The driver is provided as an executable installer for Windows, or a compressed tarball for Linux platforms. After downloading the driver, follow the relevant instructions to install and configure the driver for your platform:

2.1.2.1.3.67 Install and configure the ODBC driver

Continue based on your platform:
• Install and Configure ODBC on Windows
• Install and configure ODBC on Linux

2.1.2.1.3.68 Node.JS

The SQream DB Node.JS driver allows JavaScript applications and tools to connect to SQream DB. This tutorial shows you how to write a Node application using the Node.JS interface. The driver requires Node 10 or newer.

In this topic:

• Installing the Node.JS driver
  – Prerequisites
  – Install with NPM
  – Install from an offline package
• Connect to SQream DB with a Node.JS application
  – Create a simple test
  – Run the test
• API reference
  – Connection parameters
  – Events
    ∗ Example
  – Input placeholders
• Examples
  – Setting configuration flags
  – Lazyloading
  – Reusing a connection
  – Using placeholders in queries
• Troubleshooting and recommended configuration
  – Preventing heap out of memory errors
  – BIGINT support

2.1.2.1.3.69 Installing the Node.JS driver

2.1.2.1.3.70 Prerequisites

• Node.JS 10 or newer. Follow the instructions at nodejs.org.

2.1.2.1.3.71 Install with NPM

Installing with npm is the easiest and most reliable method. If you need to install the driver in an offline system, see the offline method below.

$ npm install @sqream/sqreamdb

2.1.2.1.3.72 Install from an offline package

The Node driver is provided as a tarball for download from the SQream Drivers page.

After downloading the tarball, use npm to install the offline package.

$ sudo npm install sqreamdb-4.0.0.tgz

2.1.2.1.3.73 Connect to SQream DB with a Node.JS application

2.1.2.1.3.74 Create a simple test

Replace the connection parameters with real parameters for a SQream DB installation.

Listing 6: sqreamdb-test.js

    const Connection = require('@sqream/sqreamdb');

    const config = {
      host: 'localhost',
      port: 3109,
      username: 'rhendricks',
      password: 'super_secret_password',
      connectDatabase: 'raviga',
      cluster: true,
      is_ssl: true,
      service: 'sqream'
    };

    const query1 = 'SELECT 1 AS test, 2*6 AS "dozen"';

    const sqream = new Connection(config);
    sqream.execute(query1).then((data) => {
      console.log(data);
    }, (err) => {
      console.error(err);
    });

2.1.2.1.3.75 Run the test

A successful run should look like this:

$ node sqreamdb-test.js
[ { test: 1, dozen: 12 } ]

2.1.2.1.3.76 API reference

2.1.2.1.3.77 Connection parameters

Item | Optional | Default | Description
host | ✗ | None | Hostname for SQream DB worker. For example, 127.0.0.1, sqream.mynetwork.co
port | ✗ | None | Port for SQream DB end-point. For example, 3108 for the load balancer, 5000 for a worker.
username | ✗ | None | Username of a role to use for connection. For example, rhendricks
password | ✗ | None | Specifies the password of the selected role. For example, Tr0ub4dor&3
connectDatabase | ✗ | None | Database name to connect to. For example, master
service | ✓ | sqream | Specifies the service queue to use. For example, etl
is_ssl | ✓ | false | Specifies SSL for this connection. For example, true
cluster | ✓ | false | Connect via load balancer (use only if one exists, and check the port). For example, true
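The split between required and optional parameters can be checked before a connection is attempted. The following is a minimal sketch, not part of the driver: prepareConfig is a hypothetical helper name, and the required keys and defaults are copied from the connection parameters table.

```javascript
// Keys marked as not optional in the connection parameters table.
const REQUIRED = ['host', 'port', 'username', 'password', 'connectDatabase'];

// Defaults listed in the table for the optional keys.
const DEFAULTS = { service: 'sqream', is_ssl: false, cluster: false };

// Hypothetical helper (an assumption for illustration, not a driver API):
// rejects a config that lacks a required key and fills in optional defaults.
function prepareConfig(userConfig) {
  const missing = REQUIRED.filter(
    (key) => userConfig[key] === undefined || userConfig[key] === ''
  );
  if (missing.length > 0) {
    throw new Error(`Missing connection parameters: ${missing.join(', ')}`);
  }
  return { ...DEFAULTS, ...userConfig };
}

const config = prepareConfig({
  host: '127.0.0.1',
  port: 5000,
  username: 'rhendricks',
  password: 'Tr0ub4dor&3',
  connectDatabase: 'master'
});
console.log(config.service, config.is_ssl, config.cluster);
```

The resulting object can then be passed to new Connection(config) as in the test listing.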

2.1.2.1.3.78 Events

The connector handles event returns with an event emitter.

getConnectionId
    The getConnectionId event returns the executing connection ID.
getStatementId
    The getStatementId event returns the executing statement ID.
getTypes
    The getTypes event returns the result column types.

2.1.2.1.3.79 Example

    const myConnection = new Connection(config);

    myConnection.runQuery(query1, function (err, data){
      myConnection.events.on('getConnectionId', function(data){
        console.log('getConnectionId', data);
      });

      myConnection.events.on('getStatementId', function(data){
        console.log('getStatementId', data);
      });

      myConnection.events.on('getTypes', function(data){
        console.log('getTypes', data);
      });
    });
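The listener pattern can be tried without a cluster. The sketch below assumes that the driver's events property behaves like a standard Node.JS EventEmitter; it uses a stand-in emitter, a hypothetical collectMetadata helper, and made-up payload values, so it only illustrates how the three events could be gathered into one object.

```javascript
const { EventEmitter } = require('events');

// Stand-in for `myConnection.events` (assumption: a standard EventEmitter).
const events = new EventEmitter();

// Hypothetical helper: registers one listener per event name and stores
// the most recent payload of each event under that name.
function collectMetadata(emitter) {
  const metadata = {};
  for (const name of ['getConnectionId', 'getStatementId', 'getTypes']) {
    emitter.on(name, (data) => {
      metadata[name] = data;
    });
  }
  return metadata;
}

const metadata = collectMetadata(events);

// Simulated emissions; the payload values are invented for this sketch.
events.emit('getConnectionId', 42);
events.emit('getStatementId', 7);
events.emit('getTypes', ['INT', 'INT']);

console.log(metadata);
```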

2.1.2.1.3.80 Input placeholders

The Node.JS driver can replace parameters in a statement.

Input placeholders allow values like user input to be passed as parameters into queries, with proper escaping. The valid placeholder formats are provided in the table below.

Placeholder | Type
%i | Identifier (e.g. table name, column name)
%s | A text string
%d | A number value
%b | A boolean value

See the input placeholders example below.

2.1.2.1.3.81 Examples

2.1.2.1.3.82 Setting configuration flags

SQream DB configuration flags can be set per statement, as a parameter to runQuery. For example:

    const setFlag = 'SET showfullexceptioninfo = true;';
    const query_string = 'SELECT 1';

    const myConnection = new Connection(config);
    myConnection.runQuery(query_string, function (err, data){
      console.log(err, data);
    }, setFlag);

2.1.2.1.3.83 Lazyloading

To process rows without keeping them in memory, you can lazyload the rows with an async iterator:

    const Connection = require('@sqream/sqreamdb');

    const config = {
      host: 'localhost',
      port: 3109,
      username: 'rhendricks',
      password: 'super_secret_password',
      connectDatabase: 'raviga',
      cluster: true,
      is_ssl: true,
      service: 'sqream'
    };

    const sqream = new Connection(config);
    const query = "SELECT * FROM public.a_very_large_table";

    (async () => {
      const cursor = await sqream.executeCursor(query);
      let count = 0;
      for await (let rows of cursor.fetchIterator(100)) {
        // fetch rows in chunks of 100
        count += rows.length;
      }
      await cursor.close();
      return count;
    })().then((total) => {
      console.log('Total rows', total);
    }, (err) => {
      console.error(err);
    });
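The same chunked pattern can be exercised without a database. In the sketch below, fakeFetchIterator is a hypothetical stand-in for cursor.fetchIterator and the row shape is invented; the point is that only one chunk and a running aggregate are held at any time.

```javascript
// Hypothetical stand-in for cursor.fetchIterator(chunkSize):
// yields arrays of rows, one chunk at a time.
async function* fakeFetchIterator(totalRows, chunkSize) {
  for (let start = 0; start < totalRows; start += chunkSize) {
    const size = Math.min(chunkSize, totalRows - start);
    // Each chunk is built on demand; earlier chunks can be garbage-collected.
    yield Array.from({ length: size }, (_, i) => ({ id: start + i }));
  }
}

// Consumes chunks and keeps only a running aggregate,
// never the full result set.
async function summarize(chunks) {
  let count = 0;
  let sum = 0;
  for await (const rows of chunks) {
    count += rows.length;
    for (const row of rows) {
      sum += row.id;
    }
  }
  return { count, sum };
}

summarize(fakeFetchIterator(250, 100)).then((result) => {
  console.log(result);
}, (err) => {
  console.error(err);
});
```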

2.1.2.1.3.84 Reusing a connection

It is possible to execute multiple queries with the same connection (although only one query can be executed at a time).

    const Connection = require('@sqream/sqreamdb');

    const config = {
      host: 'localhost',
      port: 3109,
      username: 'rhendricks',
      password: 'super_secret_password',
      connectDatabase: 'raviga',
      cluster: true,
      is_ssl: true,
      service: 'sqream'
    };

    const sqream = new Connection(config);

    (async () => {
      const conn = await sqream.connect();
      try {
        const res1 = await conn.execute("SELECT 1");
        const res2 = await conn.execute("SELECT 2");
        const res3 = await conn.execute("SELECT 3");
        conn.disconnect();
        return {res1, res2, res3};
      } catch (err) {
        conn.disconnect();
        throw err;
      }
    })().then((res) => {
      console.log('Results', res);
    }, (err) => {
      console.error(err);
    });

2.1.2.1.3.85 Using placeholders in queries

Input placeholders allow values like user input to be passed as parameters into queries, with proper escaping.

    const Connection = require('@sqream/sqreamdb');

    const config = {
      host: 'localhost',
      port: 3109,
      username: 'rhendricks',
      password: 'super_secret_password',
      connectDatabase: 'raviga',
      cluster: true,
      is_ssl: true,
      service: 'sqream'
    };

    const sqream = new Connection(config);
    const sql = "SELECT %i FROM public.%i WHERE name = %s AND num > %d AND active = %b";
    sqream.execute(sql, "col1", "table2", "john's", 50, true);

The query that will run is:

    SELECT col1 FROM public.table2 WHERE name = 'john''s' AND num > 50 AND active = true
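To make the substitution rules concrete, here is a rough sketch of how such placeholders could be expanded. It is an illustration only and not the driver's implementation: formatQuery is a hypothetical function, and the validation rules (word-character identifiers, finite numbers) are assumptions chosen for this example.

```javascript
// Illustration only: a hypothetical formatter, NOT the driver's own code.
function formatQuery(sql, ...values) {
  let index = 0;
  return sql.replace(/%([isdb])/g, (match, kind) => {
    if (index >= values.length) {
      throw new Error(`Missing value for placeholder ${match}`);
    }
    const value = values[index++];
    switch (kind) {
      case 'i': // identifier: accept only plain word characters (assumption)
        if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(String(value))) {
          throw new Error(`Invalid identifier: ${value}`);
        }
        return String(value);
      case 's': // text: wrap in single quotes and double any embedded quote
        return `'${String(value).replace(/'/g, "''")}'`;
      case 'd': // number: reject anything that is not a finite number
        if (typeof value !== 'number' || !Number.isFinite(value)) {
          throw new Error(`Invalid number: ${value}`);
        }
        return String(value);
      case 'b': // boolean: reject anything that is not true or false
        if (typeof value !== 'boolean') {
          throw new Error(`Invalid boolean: ${value}`);
        }
        return value ? 'true' : 'false';
      default:
        return match;
    }
  });
}

const template =
  'SELECT %i FROM public.%i WHERE name = %s AND num > %d AND active = %b';
console.log(formatQuery(template, 'col1', 'table2', "john's", 50, true));
```

With the same arguments as the driver example, the sketch produces the query text shown in this section.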

2.1.2.1.3.86 Troubleshooting and recommended configuration

2.1.2.1.3.87 Preventing heap out of memory errors

Some workloads may cause Node.JS to fail with the error:

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory

To prevent this error, modify the heap size configuration by setting the --max-old-space-size run flag. For example, set the space size to 2GB:

$ node --max-old-space-size=2048 my-application.js
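To confirm that the flag took effect, the heap limit can be read from inside the application. This is a minimal sketch using Node's built-in v8 module; heapLimitMiB is a hypothetical helper name, and the reported figure may differ slightly from the flag value because it describes the whole heap rather than the old space alone.

```javascript
const v8 = require('v8');

// Hypothetical helper: returns the V8 heap size limit in MiB.
function heapLimitMiB() {
  const bytes = v8.getHeapStatistics().heap_size_limit;
  return Math.round(bytes / (1024 * 1024));
}

console.log(`Heap limit: ${heapLimitMiB()} MiB`);
```

Running the script with and without --max-old-space-size shows whether the setting reaches the process.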

2.1.2.1.3.88 BIGINT support

The Node.JS connector supports fetching BIGINT values from SQream DB. However, some applications may encounter an error when trying to serialize those values. The error that appears is:

TypeError: Do not know how to serialize a BigInt

This is because the JSON specification does not support BIGINT values, even when they are supported by JavaScript engines. To resolve this issue, objects with BIGINT values should be converted to strings before serializing, and converted back after deserializing. For example:

    const rows = [{test: 1n}];
    const json = JSON.stringify(rows, (key, value) =>
      typeof value === 'bigint'
        ? value.toString()
        : value // return everything else unchanged
    );
    console.log(json); // [{"test":"1"}]
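The example above covers serializing only. Converting back requires some way to tell former BIGINT values apart from ordinary strings. One possible approach, sketched here under that assumption, wraps each value in a small marker object; the $bigint key and the serialize and deserialize helper names are hypothetical choices for this illustration.

```javascript
// Hypothetical marker key used to tag values that were BigInt.
const TAG = '$bigint';

// Replace every BigInt with a tagged object holding its decimal string.
function serialize(data) {
  return JSON.stringify(data, (key, value) =>
    typeof value === 'bigint' ? { [TAG]: value.toString() } : value
  );
}

// Turn tagged objects back into BigInt; leave everything else unchanged.
function deserialize(json) {
  return JSON.parse(json, (key, value) => {
    const isTagged =
      value !== null &&
      typeof value === 'object' &&
      Object.keys(value).length === 1 &&
      typeof value[TAG] === 'string';
    return isTagged ? BigInt(value[TAG]) : value;
  });
}

const original = [{ test: 1n, big: 9007199254740993n, name: '1' }];
const restored = deserialize(serialize(original));

console.log(typeof restored[0].test, typeof restored[0].name);
```

The tag keeps the string '1' in the name field distinct from the BIGINT value 1n, and values beyond Number.MAX_SAFE_INTEGER survive the round trip because they never pass through a JavaScript number.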

Need help?

If you couldn’t find what you’re looking for, we’re always happy to help. Visit SQream’s support portal for additional support.

Looking for older drivers?

If you’re looking for an older version of SQream DB drivers, versions 1.10 through 2019.2.1 are available at https://sqream.com/product/client-drivers/ .

2.1.3 Third Party Tools

These topics explain how to install and connect a variety of third party tools. Browse the articles below, in the sidebar, or use the search to find the information you need.

2.1.3.1 Overview

SQream DB is designed to work with most common database tools and interfaces, allowing you direct access through a variety of drivers, connectors, tools, visualizers, and utilities. The tools listed have been tested and approved for use with SQream DB. Most third-party tools that work through JDBC, ODBC, and Python should work. If you are looking for a tool that is not listed, SQream and our partners can help. Go to SQream Support or contact your SQream account manager for more information.

2.1.3.1.1 Connect to SQream Using SQL Workbench

You can use SQL Workbench to interact with a SQream DB cluster. SQL Workbench/J is a free SQL query tool, and is designed to run on any JRE-enabled environment. This tutorial is a guide that will show you how to connect SQL Workbench to SQream DB.

In this topic:

• Installing SQL Workbench with the SQream DB installer (Windows only)
• Installing SQL Workbench manually (Linux, MacOS)
  – Install Java Runtime
  – Get the SQream DB JDBC driver
  – Install SQL Workbench
  – Setting up the SQream DB JDBC driver profile
• Create a new connection profile for your cluster
• Suggested optional configuration

2.1.3.1.1.1 Installing SQL Workbench with the SQream DB installer (Windows only)

SQream DB’s driver installer for Windows can install the Java prerequisites and SQL Workbench for you.

1. Get the JDBC driver installer, available for download from the SQream Drivers page. The Windows installer takes care of the Java prerequisites and subsequent configuration.
2. Install the driver by following the on-screen instructions. By default, the installer does not install SQL Workbench, so make sure to select that item.

Note: The installer will install SQL Workbench in C:\Program Files\SQream Technologies\SQLWorkbench by default. You can change this path during the installation.

3. Once finished, SQL Workbench is installed and contains the necessary configuration for connecting to SQream DB clusters.
4. Start SQL Workbench from the Windows start menu. Be sure to select SQL Workbench (64) if you’re on 64-bit Windows.

You are now ready to create a profile for your cluster. Continue to Creating a new connection profile.

2.1.3.1.1.2 Installing SQL Workbench manually (Linux, MacOS)

2.1.3.1.1.3 Install Java Runtime

Both SQL Workbench and the SQream DB JDBC driver require Java 1.8 or newer. You can install either Oracle Java or OpenJDK.

Oracle Java
    Download and install Java 8 from Oracle for your platform - https://www.java.com/en/download/manual.jsp
OpenJDK
    For Linux and BSD, see https://openjdk.java.net/install/
    For Windows, SQream recommends Zulu 8 https://www.azul.com/downloads/zulu-community/?&version=java-8-lts&architecture=x86-64-bit&package=jdk

2.1.3.1.1.4 Get the SQream DB JDBC driver

SQream DB’s JDBC driver is provided as a zipped JAR file, available for download from the SQream Drivers page. Download and extract the JAR file from the zip archive.

2.1.3.1.1.5 Install SQL Workbench

1. Download the latest stable release from https://www.sql-workbench.eu/downloads.html. The Generic package for all systems is recommended.
2. Extract the downloaded ZIP archive into a directory of your choice.
3. Start SQL Workbench. If you are using 64-bit Windows, run SQLWorkbench64.exe instead of SQLWorkbench.exe.

2.1.3.1.1.6 Setting up the SQream DB JDBC driver profile

1. Define a connection profile - File → Connect window (Alt+C)

2. Open the drivers management window - Manage Drivers

3. Create the SQream DB driver profile:
   1. Click the Add new driver button (“New” icon).
   2. Name the driver as you see fit. We recommend calling it SQream DB <version>, where <version> is the version you have installed.
   3. Add the JDBC drivers from the location where you extracted the SQream DB JDBC JAR. If you used the SQream installer, the file will be in C:\Program Files\SQream Technologies\JDBC Driver\
   4. Click the magnifying glass button to detect the classname automatically. Other details are purely optional.
   5. Click OK to save and return to the “new connection” screen.

2.1.3.1.1.7 Create a new connection profile for your cluster

1. Create a new connection by clicking the New icon (top left).
2. Give your connection a descriptive name.
3. Select the SQream Driver that was created in the previous screen.
4. Type in your connection string. To find out more about your connection string (URL), see the Connection string documentation.
5. Test the connection details.
6. Click OK to save the connection profile and connect to SQream DB.

2.1.3.1.1.8 Suggested optional configuration

If you installed SQL Workbench manually, you can set a customization to help SQL Workbench show information correctly in the DB Explorer panel.

1. Locate your workbench.settings file. On Windows, typically: C:\Users\<user name>\.sqlworkbench\workbench.settings. On Linux, $HOME/.sqlworkbench
2. Add the following line at the end of the file:

workbench.db.sqreamdb.schema.retrieve.change.catalog=true

3. Save the file and restart SQL Workbench.

2.1.3.1.2 Connecting to SQream Using Tableau

2.1.3.1.2.1 Overview

SQream’s Tableau connector plugin, based on standard JDBC, enables storing and quickly querying large volumes of data. The Connecting to SQream Using Tableau page is a Quick Start Guide that describes how to install Tableau and the JDBC and ODBC drivers, and how to connect to SQream using those drivers for data analysis. It also describes best practices and how to troubleshoot issues that may occur while installing Tableau. SQream supports both Tableau Desktop and Tableau Server on Windows, MacOS, and Linux distributions. For more information on SQream’s integration with Tableau, see Tableau’s Extension Gallery. The Connecting to SQream Using Tableau page describes the following:

• Installing the JDBC Driver and Tableau Connector Plugin
  – Installing the JDBC Driver Using the Windows Installer
  – Installing the JDBC Driver Manually
• Installing the ODBC Driver for Tableau Versions 2019.3 and Earlier
  – Automatically Reconfiguring the ODBC Driver After Initial Installation
  – Manually Reconfiguring the ODBC Driver After Initial Installation
  – Configuring the ODBC Connection
  – Connecting Tableau to SQream
• Connecting to SQream
• Setting Up SQream Tables as Data Sources
• Tableau Best Practices and Troubleshooting
  – Inserting Only Required Data
  – Using Tableau’s Table Query Syntax
  – Creating a Separate Service for Tableau
  – Troubleshooting Workbook Performance Before Deploying to the Tableau Server
  – Troubleshooting Error Codes

2.1.3.1.2.2 Installing the JDBC Driver and Tableau Connector Plugin

This section describes how to install the JDBC driver using the fully-integrated Tableau connector plugin (Tableau Connector, or .taco file). SQream has been tested with Tableau versions 9.2 and newer.

To connect to SQream using Tableau:

1. Install the Tableau Desktop application. For more information about installing the Tableau Desktop application, see the Tableau products page and click Download Free Trial. Note that Tableau offers a 14-day trial version.
2. Do one of the following:
   • For Windows - See Installing the JDBC Driver Using the Windows Installer.
   • For MacOS or Linux - See Installing the JDBC Driver Manually.

Note: For Tableau 2019.4 versions and later, SQream recommends installing the JDBC driver instead of the previously recommended ODBC driver.

2.1.3.1.2.3 Installing the JDBC Driver Using the Windows Installer

If you are using Windows, after installing the Tableau Desktop application you can install the JDBC driver using the Windows installer. The Windows installer is an installation wizard that guides you through the JDBC driver installation steps. When the driver is installed, you can connect to SQream.

To install the JDBC driver using the Windows installer:

1. Close Tableau Desktop.

2. Download the most current version of the SQream JDBC driver.

3. Do the following:
   1. Start the installer.
   2. Verify that the Tableau Desktop connector item is selected.
   3. Follow the installation steps.

You can now restart Tableau Desktop or Server to begin using the SQream driver by connecting to SQream.

2.1.3.1.2.4 Installing the JDBC Driver Manually

If you are using MacOS, Linux, or the Tableau server, after installing the Tableau Desktop application you can install the JDBC driver manually. When the driver is installed, you can connect to SQream.

To install the JDBC driver manually:

1. Download the JDBC installer and SQream Tableau connector (.taco) file from the client drivers page.

2. Install the JDBC driver by unzipping the JDBC driver into a Tableau driver directory. Based on the installation method that you used, your Tableau driver directory is located in one of the following places:
   • Tableau Desktop on Windows: C:\Program Files\Tableau\Drivers
   • Tableau Desktop on MacOS: ~/Library/Tableau/Drivers
   • Tableau on Linux: /opt/tableau/tableau_driver/jdbc

   Note: If the driver includes only a single .jar file, copy it to C:\Program Files\Tableau\Drivers. If the driver includes multiple files, create a subfolder A in C:\Program Files\Tableau\Drivers and copy all files to folder A.

   Note the following when installing the JDBC driver:
   • You must have read permissions on the .jar file.
   • Tableau requires a JDBC 4.0 or later driver.
   • Tableau requires a Type 4 JDBC driver.
   • The latest 64-bit version of Java 8 must be installed.

3. Install the SQreamDB.taco file by moving it into the Tableau connectors directory. Based on the installation method that you used, your Tableau connectors directory is located in one of the following places:
   • Tableau Desktop on Windows: C:\Users\<username>\My Tableau Repository\Connectors
   • Tableau Desktop on MacOS: ~/My Tableau Repository/Connectors

4. Optional - If you are using the Tableau Server, do the following:
   1. Create a directory for Tableau connectors and give it a descriptive name, such as C:\tableau_connectors. This directory needs to exist on all Tableau servers.

   2. Copy the SQreamDB.taco file into the new directory.

   3. Use tsm to set the native_api.connect_plugins_path option, as shown in the following example:

$ tsm configuration set -k native_api.connect_plugins_path -v C:/tableau_connectors

If a configuration error is displayed, add --force-keys to the end of the command as shown in the following example:

$ tsm configuration set -k native_api.connect_plugins_path -v C:/tableau_connectors --force-keys

4. To apply the pending configuration changes, run the following command:

$ tsm pending-changes apply

Warning: This restarts the server.

You can now restart Tableau Desktop or Server to begin using the SQream driver by connecting to SQream as described in the section below.

2.1.3.1.2.5 Installing the ODBC Driver for Tableau Versions 2019.3 and Earlier

This section describes the installation method for Tableau version 2019.3 or earlier and describes the following:

• Automatically Reconfiguring the ODBC Driver After Initial Installation
• Manually Reconfiguring the ODBC Driver After Initial Installation
• Configuring the ODBC Connection
• Connecting Tableau to SQream

Note: SQream recommends installing the JDBC driver to provide improved connectivity.

2.1.3.1.2.6 Automatically Reconfiguring the ODBC Driver After Initial Installation

If you’ve already installed the SQream ODBC driver and installed Tableau, SQream recommends reinstalling the ODBC driver with the .TDC Tableau Settings for SQream DB configuration shown in the image below:

SQream recommends this configuration because Tableau creates temporary tables and runs several discovery queries that may impact performance. The ODBC driver installer avoids this by automatically reconfiguring Tableau. For more information about reinstalling the ODBC driver installer, see Install and Configure ODBC on Windows. If you want to manually reconfigure the ODBC driver, see Manually Reconfiguring the ODBC Driver After Initial Installation below.

2.1.3.1.2.7 Manually Reconfiguring the ODBC Driver After Initial Installation

The Tableau Datasource Customization (TDC) file lets Tableau make full use of SQream DB’s features and capabilities.

To manually reconfigure the ODBC driver after initial installation:

1. Do one of the following:
   • Download the odbc-sqream.tdc file to your machine and open it in a text editor.
   • Copy the text below into a text editor:

Listing 7: SQream ODBC TDC File

2. Check which version of Tableau you are using.

3. In the text of the file shown above, in the highlighted line, replace the version number with the major version of Tableau that you are using. For example, if you are using Tableau version 2019.2.1, replace it with 2019.2.

4. Do one of the following:
   • If you are using Tableau Desktop - save the TDC file to C:\Users\<username>\Documents\My Tableau Repository\Datasources, where <username> is the Windows username that you have installed Tableau under.

   • If you are using the Tableau Server - save the TDC file to C:\ProgramData\Tableau\Tableau Server\data\tabsvc\vizqlserver\Datasources.

2.1.3.1.2.8 Configuring the ODBC Connection

The ODBC connection uses a DSN when connecting to ODBC data sources, and each DSN represents one SQream database.

To configure the ODBC connection:

1. Create an ODBC DSN.

2. Open the Windows menu by pressing the Windows button (Win) or clicking the Windows menu button.

3. Type ODBC and select ODBC Data Sources (64-bit). During installation, the installer created a sample user DSN named SQreamDB.

4. Optional - Do one or both of the following: • Modify the DSN name.

• Create a new DSN name by clicking Add and selecting SQream ODBC Driver.

5. Click Finish.

6. Enter your connection parameters. The following table describes the connection parameters:

Item | Description | Example
Data Source Name | The Data Source Name. SQream recommends using a descriptive and easily recognizable name for referencing your DSN. Once set, the Data Source Name cannot be changed. |
Description | The description of your DSN. This field is optional. |
User | The username of a role to use for establishing the connection. | rhendricks
Password | The password of the selected role. | Tr0ub4dor
Database | The database name to connect to. For example, master | master
Service | The service queue to use. | For example, etl. For the default service sqream, leave blank.
Server | The hostname of the SQream worker. | 127.0.0.1 or sqream.mynetwork.co
Port | The TCP port of the SQream worker. | 5000 or 3108
User Server Picker | Uses the load balancer when establishing a connection. Use only if exists, and check port. |
SSL | Uses SSL when establishing a connection. |
Logging Options | Lets you modify your logging options when tracking the ODBC connection for connection issues. |

Tip: Test the connection by clicking Test before saving your DSN.

7. Save the DSN by clicking OK.

2.1.3.1.2.9 Connecting Tableau to SQream

To connect Tableau to SQream:

1. Start Tableau Desktop.

2. In the Connect menu, in the To a server sub-menu, click More Servers and select Other Databases (ODBC). The Other Databases (ODBC) window is displayed.

3. In the Other Databases (ODBC) window, select the DSN that you created in Configuring the ODBC Connection. Tableau may display the Sqream ODBC Driver Connection Dialog window and prompt you to provide your username and password.

4. Provide your username and password and click OK.

2.1.3.1.2.10 Connecting to SQream

After installing the JDBC driver you can connect to SQream.

To connect to SQream:

1. Start Tableau Desktop.

2. In the Connect menu, in the To a Server sub-menu, click More…. More connection options are displayed.

3. Select SQream DB by SQream Technologies. The New Connection dialog box is displayed.

4. In the New Connection dialog box, fill in the fields and click Sign In. The following table describes the fields:

Item | Description | Example
Server | Defines the server of the SQream worker. | 127.0.0.1 or sqream.mynetwork.co
Port | Defines the TCP port of the SQream worker. | 3108 when using a load balancer, or 5100 when connecting directly to a worker with SSL.
Database | Defines the database to establish a connection with. | master
Cluster | Enables (true) or disables (false) the load balancer. After enabling or disabling the load balancer, verify the connection. |
Username | Specifies the username of a role to use when connecting. | rhendricks
Password | Specifies the password of the selected role. | Tr0ub4dor&3
Require SSL (recommended) | Sets SSL as a requirement for establishing this connection. |

The connection is established and the data source page is displayed.

Tip: Tableau automatically assigns your connection a default name based on the DSN and table. SQream recommends giving the connection a more descriptive name.

2.1.3.1.2.11 Setting Up SQream Tables as Data Sources

After connecting to SQream you must set up the SQream tables as data sources.

To set up SQream tables as data sources:

1. From the Table menu, select the desired database and schema. SQream’s default schema is public.

2. Drag the desired tables into the main area (labeled Drag tables here). This area is also used for specifying joins and data source filters.

3. Open a new sheet to analyze data.

Tip: For more information about configuring data sources, joining, and filtering, see Tableau’s Set Up Data Sources tutorials.

2.1.3.1.2.12 Tableau Best Practices and Troubleshooting

This section describes the following best practices and troubleshooting procedures when connecting to SQream using Tableau:

• Inserting Only Required Data
• Using Tableau’s Table Query Syntax
• Creating a Separate Service for Tableau
• Troubleshooting Workbook Performance Before Deploying to the Tableau Server
• Troubleshooting Error Codes

2.1.3.1.2.13 Inserting Only Required Data

When using Tableau, SQream recommends using only data that you need, as described below:

• Insert only the data sources you need into Tableau, excluding tables that don’t require analysis.

• To increase query performance, add filters before analyzing. Every modification you make while analyzing data queries the SQream database, sometimes several times. Adding filters to the datasource before exploring limits the amount of data analyzed and increases query performance.

2.1.3.1.2.14 Using Tableau’s Table Query Syntax

Dragging your desired tables into the main area in Tableau builds queries based on its own syntax. This helps ensure increased performance, while using views or custom SQL may degrade performance. In addition, SQream recommends using CREATE VIEW to create pre-optimized views, which your datasources point to.

2.1.3.1.2.15 Creating a Separate Service for Tableau

SQream recommends creating a separate service for Tableau with the DWLM. This reduces the impact that Tableau has on other applications and processes, such as ETL. In addition, this works in conjunction with the load balancer to ensure good performance.

2.1.3.1.2.16 Troubleshooting Workbook Performance Before Deploying to the Tableau Server

Tableau has a built-in performance recorder that shows how time is being spent. If you’re seeing slow performance, this could be the result of a misconfiguration such as setting concurrency too low. Use the Tableau Performance Recorder for viewing the performance of queries run by Tableau. You can use this information to identify queries that can be optimized by using views.

2.1.3.1.2.17 Troubleshooting Error Codes

Tableau may be unable to locate the SQream JDBC driver. The following message is displayed when Tableau cannot locate the driver:

Error Code: 37CE01A3, No suitable driver installed or the URL is incorrect

To troubleshoot error codes:

If Tableau cannot locate the SQream JDBC driver, do the following:

1. Verify that the JDBC driver is located in the correct directory:
   • Tableau Desktop on Windows: C:\Program Files\Tableau\Drivers
   • Tableau Desktop on MacOS: ~/Library/Tableau/Drivers
   • Tableau on Linux: /opt/tableau/tableau_driver/jdbc
2. Find the file path for the JDBC driver and add it to the Java classpath:
   • For Linux - export CLASSPATH=<path to the JDBC driver>:$CLASSPATH

   • For Windows - add an environment variable for the classpath.

If you experience issues after restarting Tableau, see the SQream support portal.

2.1.3.1.3 Connect to SQream Using Pentaho Data Integration

2.1.3.1.3.1 Overview

This document is a Quick Start Guide that describes how to install Pentaho, create a transformation, and define your output. The Connecting to SQream Using Pentaho page describes the following:

• Installing Pentaho
• Installing and setting up the JDBC driver
• Creating a transformation
• Defining your output
• Importing your data

2.1.3.1.3.2 Installing Pentaho

To install PDI, see the Pentaho Community Edition (CE) Installation Guide.

The Pentaho Community Edition (CE) Installation Guide describes how to do the following:

• Download the PDI software.
• Install the JRE (Java Runtime Environment) and JDK (Java Development Kit).
• Set up the JRE and JDK environment variables for PDI.

2.1.3.1.3.3 Installing and Setting Up the JDBC Driver

After installing Pentaho you must install and set up the JDBC driver. This section explains how to set up the JDBC driver using Pentaho. These instructions use Spoon, the graphical transformation and job designer associated with the PDI suite. You can install the driver by copying and pasting the SQream JDBC .jar file into your /design-tools/data-integration/lib directory.
NOTE: Contact your SQream license account manager for the JDBC .jar file.
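
On a machine where a shell is available, the copy step can be sketched as follows. This is a minimal sketch, not a Pentaho or SQream tool: the function name install_jdbc_jar and both example paths are hypothetical, and the caller supplies the real locations.

```shell
#!/bin/sh
# Copy the SQream JDBC jar into the PDI lib directory.
# Both arguments are supplied by the caller; no path here is a fixed SQream or Pentaho location.
install_jdbc_jar() {
    jar="$1"       # path to the SQream JDBC .jar file
    lib_dir="$2"   # the design-tools/data-integration/lib directory of your Pentaho installation
    if [ ! -f "$jar" ]; then
        echo "JDBC jar not found: $jar" >&2
        return 1
    fi
    if [ ! -d "$lib_dir" ]; then
        echo "PDI lib directory not found: $lib_dir" >&2
        return 1
    fi
    cp "$jar" "$lib_dir/" && echo "Copied $(basename "$jar") to $lib_dir"
}

# Example call (hypothetical paths):
# install_jdbc_jar "$HOME/Downloads/sqream-jdbc.jar" "$HOME/pentaho/design-tools/data-integration/lib"
```

Checking both paths before copying makes a wrong directory fail loudly instead of silently leaving Spoon without the driver.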

2.1.3.1.3.4 Creating a Transformation

After installing Pentaho you can create a transformation.
To create a transformation:
1. Use the CLI to open the PDI client for your operating system (Windows):

$ spoon.bat

2. Open the spoon.bat file from its folder location.

3. In the View tab, right-click Transformations and click New.

A new transformation tab is created.

4. In the Design tab, click Input to show its file contents.

5. Drag and drop the CSV file input item to the new transformation tab that you created.

6. Double-click CSV file input. The CSV file input panel is displayed.

7. In the Step name field, type a name.

8. To the right of the Filename field, click Browse.

9. Select the file that you want to read from and click OK.

10. In the CSV file input window, click Get Fields.

11. In the Sample data window, enter the number of lines you want to sample and click OK. The default setting is 100.

The tool reads the file and suggests the field name and type.

12. In the CSV file input window, click Preview.

13. In the Preview size window, enter the number of rows you want to preview and click OK. The default setting is 1000.

14. Verify that the preview data is correct and click Close.

15. Click OK in the CSV file input window.

2.1.3.1.3.5 Defining Your Output

After creating your transformation you must define your output.
To define your output:
1. In the Design tab, click Output.
2. Drag and drop the Table output item to the Transformation window.

3. Double-click Table output to open the Table output dialog box.
4. From the Table output dialog box, type a Step name and click New to create a new connection. Your steps are the building blocks of a transformation, such as file input or a table output.

The Database Connection window is displayed with the General tab selected by default.

5. Enter or select the following information in the Database Connection window and click Test.

The following table shows and describes the information that you need to fill out in the Database Connection window:

No. | Element Name | Description
1 | Connection name | Enter a name that uniquely describes your connection, such as sampledata.
2 | Connection type | Select Generic database.
3 | Access | Select Native (JDBC).
4 | Custom connection URL | Insert jdbc:Sqream:///;user=;password=;[; …];. The IP is a node in your SQream cluster and is the name or schema of the database you want to connect to. Verify that you have not used any leading or trailing spaces.
5 | Custom driver class name | Insert com.sqream.jdbc.SQDriver. Verify that you have not used any leading or trailing spaces.
6 | Username | Your SQream DB username. If you leave this blank, you will be prompted to provide it when you connect.
7 | Password | Your password. If you leave this blank, you will be prompted to provide it when you connect.

The following message is displayed:

6. Click OK in the window above, in the Database Connection window, and in the Table output window.
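
The custom connection URL follows the same pattern as the complete JDBC connection strings used elsewhere in this guide, for example jdbc:Sqream://127.0.0.1:3108/master followed by the user, password, and cluster parameters. As a rough sketch, the URL can be assembled from its parts as shown below. The function name build_sqream_jdbc_url, its argument order, and the sample values are assumptions for illustration only.

```shell
#!/bin/sh
# Build a SQream JDBC URL of the form
#   jdbc:Sqream://<host>:<port>/<database>;user=<user>;password=<password>[;cluster=true]
# The function name and argument order are illustrative assumptions.
build_sqream_jdbc_url() {
    host="$1"
    port="$2"
    database="$3"
    user="$4"
    password="$5"
    cluster="${6:-}"   # pass "true" to append ;cluster=true, as in the examples in this guide

    url="jdbc:Sqream://${host}:${port}/${database};user=${user};password=${password}"
    if [ "$cluster" = "true" ]; then
        url="${url};cluster=true"
    fi
    printf '%s\n' "$url"
}

# Placeholder values; replace them with your own host, database, and credentials.
build_sqream_jdbc_url 127.0.0.1 3108 master rhendricks 'secret' true
```

Because the URL is built with printf from separate arguments, it cannot pick up the leading or trailing spaces that the table warns about.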

2.1.3.1.3.6 Importing Data

After defining your output you can begin importing your data. For more information about backing up users, permissions, or schedules, see Backup and Restore Pentaho Repositories.
To import data:
1. Double-click the Table output connection that you just created.

2. To the right of the Target schema field, click Browse and select a schema name.

3. Click OK. The selected schema name is displayed in the Target schema field.

4. Create a new hop connection between the CSV file input and Table output steps: 1. On the CSV file input step item, click the new hop connection icon.

2. Drag an arrow from the CSV file input step item to the Table output step item.

3. Release the mouse button. The following options are displayed. 4. Select Main output of step.

5. Double-click Table output to open the Table output dialog box. 6. In the Target table field, define a target table name.

7. Click SQL to open the Simple SQL editor.

8. In the Simple SQL editor, click Execute.

The system processes and displays the results of the SQL statements.

9. Close all open dialog boxes. 10. Click the play button to execute the transformation.

The Run Options dialog box is displayed.

11. Click Run. The Execution Results are displayed.

2.1.3.1.4 Connect to SQream Using MicroStrategy

2.1.3.1.4.1 Overview

This document is a Quick Start Guide that describes how to install MicroStrategy and connect a data source to the MicroStrategy dashboard for analysis. The Connecting to SQream Using MicroStrategy page describes the following:

• What is MicroStrategy? • Connecting a Data Source • Supported SQream Drivers

2.1.3.1.4.2 What is MicroStrategy?

MicroStrategy is a Business Intelligence software offering a wide variety of data analytics capabilities. SQream uses the MicroStrategy connector for reading and loading data into SQream.
MicroStrategy provides the following:
• Data discovery
• Advanced analytics
• Data visualization
• Embedded BI
• Banded reports and statements
For more information about MicroStrategy, see MicroStrategy.

2.1.3.1.4.3 Connecting a Data Source

1. Activate the MicroStrategy Desktop app. The app displays the Dossiers panel to the right.

2. Download the most current version of the SQream JDBC driver.

3. Click Dossiers and New Dossier. The Untitled Dossier panel is displayed.

4. Click New Data.

5. From the Data Sources panel, select Databases to access data from tables. The Select Import Options panel is displayed.

6. Select one of the following: • Build a Query • Type a Query • Select Tables

7. Click Next.

8. In the Data Source panel, do the following:
1. From the Database dropdown menu, select Generic. The Host Name, Port Number, and Database Name fields are removed from the panel.

2. In the Version dropdown menu, verify that Generic DBMS is selected.

3. Click Show Connection String.

4. Select the Edit connection string checkbox.

5. From the Driver dropdown menu, select a driver for one of the following connectors:
• JDBC - The SQream driver is not integrated with MicroStrategy and does not appear in the dropdown menu. However, to proceed, you must select an item, and in the next step you must specify the path to the SQream driver that you installed on your machine.
• ODBC - SQreamDB ODBC

6. In the Connection String text box, type the relevant connection string and path to the JDBC jar file using the following syntax:

$ jdbc:Sqream:///;user=;password=sqream;[; ...]

The following example shows the correct syntax for the JDBC connector:

jdbc;MSTR_JDBC_JAR_FOLDER=C:\path\to\jdbc\folder;DRIVER=;URL={jdbc:Sqream:///;user=;password=;[; ...];}

The following example shows the correct syntax for the ODBC connector:

odbc:Driver={SqreamODBCDriver};DSN={SQreamDB ODBC};Server=;Port=;Database=;User=;Password=;Cluster=;

For more information about the available connection parameters and other examples, see Connection Parameters.
7. In the User and Password fields, fill out your user name and password.

8. In the Data Source Name field, type SQreamDB.

9. Click Save. The SQreamDB that you picked in the Data Source panel is displayed.
9. In the Namespace menu, select a namespace. The table files are displayed.

10. Drag and drop the tables into the panel on the right in your required order.

11. Recommended - Click Prepare Data to customize your data for analysis.

12. Click Finish.

13. From the Data Access Mode dialog box, select one of the following:
• Connect Live
• Import as an In-memory Dataset
Your populated dashboard is displayed and is ready for data discovery and analytics.

2.1.3.1.4.4 Supported SQream Drivers

The following list shows the supported SQream drivers and versions:
• JDBC - Version 4.3.3 and higher.
• ODBC - Version 4.0.0.

2.1.3.1.5 Connect to SQream Using R

You can use R to interact with a SQream DB cluster. This tutorial shows you how to connect R to SQream DB.

In this topic:

• JDBC – A full example • ODBC – A full example

2.1.3.1.5.1 JDBC

1. Get the SQream DB JDBC driver.
2. In R, install RJDBC

> install.packages("RJDBC")
Installing package into 'C:/Users/r/...'
(as 'lib' is unspecified)

package 'RJDBC' successfully unpacked and MD5 sums checked

3. Import the RJDBC library

> library(RJDBC)

4. Set the classpath and initialize the JDBC driver which was previously installed. For example, on Windows:

> cp = c("C:\\Program Files\\SQream Technologies\\JDBC Driver\\2020.1-3.2.0\\sqream-jdbc-3.2.jar")
> .jinit(classpath=cp)
> drv <- JDBC("com.sqream.jdbc.SQDriver","C:\\Program Files\\SQream Technologies\\JDBC Driver\\2020.1-3.2.0\\sqream-jdbc-3.2.jar")

5. Open a connection with a JDBC connection string and run your first statement

> con <- dbConnect(drv,"jdbc:Sqream://127.0.0.1:3108/master;user=rhendricks;password=Tr0ub4dor&3;cluster=true")

> dbGetQuery(con,"select top 5 * from t")
  xint xtinyint xsmallint xbigint
1    1       82      5067       1
2    2       14      1756       2
3    3       91     22356       3
4    4       84     17232       4
5    5       13     14315       5

6. Close the connection

> dbDisconnect(con)

2.1.3.1.5.2 A full example

> library(RJDBC)
> cp = c("C:\\Program Files\\SQream Technologies\\JDBC Driver\\2020.1-3.2.0\\sqream-jdbc-3.2.jar")
> .jinit(classpath=cp)
> drv <- JDBC("com.sqream.jdbc.SQDriver","C:\\Program Files\\SQream Technologies\\JDBC Driver\\2020.1-3.2.0\\sqream-jdbc-3.2.jar")
> con <- dbConnect(drv,"jdbc:Sqream://127.0.0.1:3108/master;user=rhendricks;password=Tr0ub4dor&3;cluster=true")
> dbGetQuery(con,"select top 5 * from t")
  xint xtinyint xsmallint xbigint
1    1       82      5067       1
2    2       14      1756       2
3    3       91     22356       3
4    4       84     17232       4
5    5       13     14315       5
> dbDisconnect(con)

2.1.3.1.5.3 ODBC

1. Install the SQream DB ODBC driver for your operating system, and create a DSN.
2. In R, install RODBC

> install.packages("RODBC")
Installing package into 'C:/Users/r/...'
(as 'lib' is unspecified)

package 'RODBC' successfully unpacked and MD5 sums checked

3. Import the RODBC library

> library(RODBC)

4. Open a connection handle to an existing DSN (my_cool_dsn in this example)

> ch <- odbcConnect("my_cool_dsn",believeNRows=F)

5. Run your first statement

> sqlQuery(ch,"select top 5 * from t")
  xint xtinyint xsmallint xbigint
1    1       82      5067       1
2    2       14      1756       2
3    3       91     22356       3
4    4       84     17232       4
5    5       13     14315       5

6. Close the connection

> close(ch)

2.1.3.1.5.4 A full example

> library(RODBC)
> ch <- odbcConnect("my_cool_dsn",believeNRows=F)
> sqlQuery(ch,"select top 5 * from t")
  xint xtinyint xsmallint xbigint
1    1       82      5067       1
2    2       14      1756       2
3    3       91     22356       3
4    4       84     17232       4
5    5       13     14315       5
> close(ch)

2.1.3.1.6 Connect to SQream Using PHP

You can use PHP to interact with a SQream DB cluster. This tutorial shows you how to connect a PHP application to SQream DB.

In this topic:

• Prerequisites • Testing the connection

2.1.3.1.6.1 Prerequisites

1. Install the SQream DB ODBC driver for Linux and create a DSN.
2. Install the uODBC extension for your PHP installation. To enable uODBC, either configure PHP with ./configure --with-pdo-odbc=unixODBC,/usr/local when compiling it, or install php-odbc and php-pdo along with PHP (version 7.1 minimum for best results) using your distribution's package manager.

2.1.3.1.6.2 Testing the connection

1. Create a test connection file. Be sure to use the correct parameters for your SQream DB installation. Download this PHP example connection file .

<?php
...
        echo "Result is " . odbc_result($rs, $i);
    }
}
echo "\n";
odbc_close($conn); // Finally, close the connection
?>

Tip: An example of a valid DSN line is:

$dsn = "odbc:Driver={SqreamODBCDriver};Server=192.168.0.5;Port=5000;Database=master;User=rhendricks;Password=super_secret;Service=sqream";

For more information about supported DSN parameters, see ODBC DSN Parameters.

2. Run the PHP file either directly with PHP (php test.php) or through a browser.

2.1.3.1.7 Connect to SQream Using SAS Viya

You can use SAS Viya to connect to a SQream DB cluster. This tutorial shows you how to connect SAS Viya to SQream DB.

In this topic:

• Installing SAS Viya
• Installing the JDBC driver
• Configuring the JDBC driver in SAS Viya
• Browsing data and workbooks
• SAS Viya best practices and troubleshooting
  – Cut out what you don’t need
  – Create a separate service for SAS Viya
  – Troubleshooting java.lang.ClassNotFoundException: com.sqream.jdbc.SQDriver exceptions

2.1.3.1.7.1 Installing SAS Viya

SQream DB has been tested with SAS Viya v.03.05 and newer. If you do not already have SAS Viya installed, refer to SAS’s website https://www.sas.com/en_us/software/viya.html.

2.1.3.1.7.2 Installing the JDBC driver

1. Download the JDBC driver from the client drivers page.

2. Unzip the JDBC driver into a location on the SAS Viya server. SQream recommends creating the directory /opt/sqream on the SAS Viya server.

2.1.3.1.7.3 Configuring the JDBC driver in SAS Viya

1. Sign in to SAS Studio.
2. Create a new SAS program.

3. Create a sample program to explore data

Listing 8: Sample SAS program

1 options sastrace='d,d,d,d'
2 sastraceloc=saslog
3 nostsuffix
4 msglevel=i
5 sql_ip_trace=(note,source)
6 DEBUG=DBMS_SELECT;
7
8 options validvarname=any;
9
10 libname sqlib jdbc driver="com.sqream.jdbc.SQDriver"
11 classpath="/opt/sqream/sqream-jdbc-4.0.0.jar"
12 URL="jdbc:Sqream://sqream-cluster.piedpiper.com:3108/raviga;cluster=true"
13 user="rhendricks"
14 password="Tr0ub4dor3"
15 schema="public"
16 PRESERVE_TAB_NAMES=YES
17 PRESERVE_COL_NAMES=YES;
18
19 proc sql;
20 title 'Customers table';
21 select *
22 from sqlib.customers;
23 quit;
24
25 data sqlib.customers;
26 set sqlib.customers;
27 run;

This sample program does several things:
• Line 10: Start a JDBC session named sqlib, associated with the SQream DB driver. This statement extends the SAS global LIBNAME statement so that you can assign a libref to your data source. This feature lets you reference a table in SQream DB directly in a DATA Step or SAS procedure.
• Line 11: Instruct SAS Viya where to find the SQream DB JDBC driver. This step is required because SAS Viya will not honor the SAS_ACCESS_CLASSPATH environment variable for this connection.
• Lines 12-15: Associate the libref to be able to use it in the program as sqlib.tablename. The libref will be sqlib and it will use the JDBC engine and connect to the sqream-cluster.piedpiper.com SQream DB cluster. The database name is raviga and the schema is public. See our JDBC guide for connection string documentation.
• Lines 19-23: Data preparation step. We load data from the customers table into the in-memory space in SAS Viya.
• Lines 25-27: DATA step. This step should be familiar to SAS v9 users. We use standard SAS naming conventions to reference the data, with sqlib as the libref and customers as the table.
4. Run the program by clicking the Run button.
If the sample ran correctly, three new tabs will appear: Log, Results, and Output Data.
5. The query results can be seen in the Results tab.

2.1.3.1.7.4 Browsing data and workbooks

1. From the panel on the left, navigate to Libraries to open the navigation tree.
2. Our previously created library named SQLIB will populate, and show the table customers. Double-clicking the table name will expand it and show the columns.
3. Find the workbook you created in the DATA step. It should appear under WORK. The workbook will be named sqlib.customers. Double-click it to expand the table tree.

Fig. 1: The Run button runs the current SAS program

Fig. 2: The results tab contains query results

2.1.3.1.7.5 SAS Viya best practices and troubleshooting

2.1.3.1.7.6 Cut out what you don’t need

• Bring only the data sources you need into SAS Viya. As a best practice, do not bring in tables that you don’t intend to explore.
• Add filters before the DATA step to reduce in-memory size. Add filters to the data source before exploring, so that the queries sent to SQream DB run faster.

2.1.3.1.7.7 Create a separate service for SAS Viya

SQream recommends that SAS Viya get a separate service with the DWLM. This will reduce the impact of SAS Viya on other applications and processes, such as ETL. This works in conjunction with the load balancer to ensure good performance.

2.1.3.1.7.8 Troubleshooting java.lang.ClassNotFoundException: com.sqream.jdbc.SQDriver exceptions

In some cases, SAS Viya may have trouble finding the SQream DB JDBC driver. This message explains that the driver can’t be found.
To solve this issue, try two things:
1. Verify that the JDBC driver was placed in a directory that SAS Viya can access.
2. Verify the classpath in your SAS program. Make sure that the classpath is correct, and that the file it references can be accessed by SAS Viya.
If you’re still experiencing issues after restarting SAS Viya, we’re always happy to help. Visit SQream’s support portal for additional support.

2.1.4 Operations

The guides in this section include information about installing Monit, best practices, monitoring, logging, troubleshooting, and maintaining a SQream DB cluster.

2.1.4.1 Optimization and Best Practices

This topic explains some best practices of working with SQream DB. See also our Monitoring query performance guide for more information.

In this topic:

• Table design
  – Use date and datetime types for columns
  – Reduce varchar length to a minimum
  – Don’t flatten or denormalize data
  – Convert foreign tables to native tables
  – Use information about the column data to your advantage
    ∗ Set NULL or NOT NULL when relevant
    ∗ Keep VARCHAR lengths to a minimum
• Sorting
• Query best practices
  – Reduce data sets before joining tables
  – Prefer the ANSI JOIN
  – Use the high selectivity hint
  – Cast smaller types to avoid overflow in aggregates
  – Prefer COUNT(*) and COUNT on non-nullable columns
  – Return only required columns
  – Use saved queries to reduce recurring compilation time
  – Pre-filter to reduce JOIN complexity
• Data loading considerations
  – Allow and use natural sorting on data
• Further reading and monitoring query performance

2.1.4.1.1 Table design

This section describes best practices and guidelines for designing tables.

2.1.4.1.1.1 Use date and datetime types for columns

When creating tables with dates or timestamps, use the purpose-built DATE and DATETIME types rather than integer types or VARCHAR. Doing so reduces the storage footprint, improves data integrity, and in many cases improves query performance considerably. SQream DB stores dates and datetimes very efficiently and can strongly optimize queries using these specific types.

2.1.4.1.1.2 Reduce varchar length to a minimum

With the VARCHAR type, the length has a direct effect on query performance. If the size of your column is predictable, defining an appropriate column length (no longer than the maximum actual value) gives you the following benefits:
• Data loading issues can be identified more quickly
• SQream DB can reserve less memory for decompression operations
• Third-party tools that expect a data size are less likely to over-allocate memory

2.1.4.1.1.3 Don’t flatten or denormalize data

SQream DB executes JOIN operations very effectively. It is almost always better to JOIN tables at query-time rather than flatten/denormalize your tables. This will also reduce storage size and reduce row-lengths. We strongly recommend using INT or BIGINT as join keys, rather than a text/string type.

2.1.4.1.1.4 Convert foreign tables to native tables

SQream DB’s native storage is heavily optimized for analytic workloads. It is always faster for querying than other formats, even columnar ones such as Parquet. It also enables the use of additional metadata to help speed up queries, in some cases by many orders of magnitude. You can improve the performance of all operations by converting foreign tables into native tables by using the CREATE TABLE AS syntax. For example,

CREATE TABLE native_table AS SELECT * FROM external_table

The one situation when this wouldn’t be as useful is when data will be only queried once.

2.1.4.1.1.5 Use information about the column data to your advantage

Knowing the data types and their ranges can help design a better table.

2.1.4.1.1.6 Set NULL or NOT NULL when relevant

For example, if a value can’t be missing (or NULL), specify a NOT NULL constraint on the columns. Not only does specifying NOT NULL save on data storage, it lets the query compiler know that a column cannot have a NULL value, which can improve query performance.

2.1.4.1.1.7 Keep VARCHAR lengths to a minimum

While it won’t make a big difference in storage, large strings allocate a lot of memory at query time. If a column’s string length never exceeds 50 characters, specify VARCHAR(50) rather than an arbitrarily large number.
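
The table design guidelines above can be combined in a single CREATE TABLE statement. The sketch below is written as a shell snippet that saves the statement to a file, so that it can be passed to your SQL client of choice. The customers table, its column names, and the chosen lengths are all hypothetical and serve only to illustrate the guidelines.

```shell
#!/bin/sh
# Write a hypothetical DDL script; the table and column names are illustrative only.
write_customers_ddl() {
    out_file="$1"
    cat > "$out_file" <<'SQL'
CREATE TABLE customers (
    customer_id  INT         NOT NULL,  -- integer key, suitable as a join key
    country_code VARCHAR(2)  NOT NULL,  -- length matches the longest real value
    full_name    VARCHAR(50) NOT NULL,  -- not an arbitrarily large length
    signup_date  DATE        NOT NULL,  -- purpose-built type instead of VARCHAR or INT
    last_login   DATETIME    NULL       -- nullable only because the value can be missing
);
SQL
}

ddl_file="${TMPDIR:-/tmp}/customers_ddl.sql"
write_customers_ddl "$ddl_file"
cat "$ddl_file"
```

Keeping the statement in a file makes it easy to review the column types and constraints before running it against a cluster.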

2.1.4.1.2 Sorting

Data sorting is an important factor in minimizing storage size and improving query performance.
• Minimizing storage saves on physical resources and increases performance by reducing overall disk I/O. Prioritize the sorting of low-cardinality columns. This reduces the number of chunks and extents that SQream DB reads during query execution.
• Where possible, sort columns with the lowest cardinality first. Avoid sorting VARCHAR and TEXT/NVARCHAR columns with lengths exceeding 50 characters.
• For longer-running queries that run on a regular basis, performance can be improved by sorting data based on the WHERE and GROUP BY parameters. Data can be sorted during insert by using Foreign Tables or by using CREATE TABLE AS.
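
As a rough sketch of sorting with CREATE TABLE AS, the helper below prints a statement that copies a table in sorted order. The helper name sorted_ctas is hypothetical, the table and column names are borrowed from the query examples later in this topic, and the assumption that store_id has lower cardinality than p_date is for illustration only.

```shell
#!/bin/sh
# Print a CREATE TABLE AS statement that copies a table sorted by the given columns.
# List the lowest-cardinality column first.
sorted_ctas() {
    source_table="$1"
    target_table="$2"
    sort_columns="$3"
    printf 'CREATE TABLE %s AS SELECT * FROM %s ORDER BY %s;\n' \
        "$target_table" "$source_table" "$sort_columns"
}

# Assumed column order: store_id (lower cardinality), then p_date.
sorted_ctas store_fact store_fact_sorted "store_id, p_date"
```

The printed statement can be reviewed and then run with your SQL client of choice.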

2.1.4.1.3 Query best practices

This section describes best practices for writing SQL queries.

2.1.4.1.3.1 Reduce data sets before joining tables

Reducing the input to a JOIN clause can increase performance. Some queries benefit from retrieving a reduced dataset as a subquery prior to a join. For example,

SELECT store_name, SUM(amount)
FROM store_dim AS dim
INNER JOIN store_fact AS fact ON dim.store_id=fact.store_id
WHERE p_date BETWEEN '2018-07-01' AND '2018-07-31'
GROUP BY 1;

Can be rewritten as

SELECT store_name, sum_amount
FROM store_dim AS dim
INNER JOIN (SELECT SUM(amount) AS sum_amount, store_id
            FROM store_fact
            WHERE p_date BETWEEN '2018-07-01' AND '2018-07-31'
            GROUP BY 2) AS fact
ON dim.store_id=fact.store_id;

2.1.4.1.3.2 Prefer the ANSI JOIN

SQream DB prefers the ANSI JOIN syntax. In some cases, the ANSI JOIN performs better than the non-ANSI variety.

For example, this ANSI JOIN example will perform better:

Listing 9: ANSI JOIN will perform better
SELECT p.name, s.name, c.name
FROM "Products" AS p
JOIN "Sales" AS s ON p.product_id = s.sale_id
JOIN "Customers" as c ON s.c_id = c.id AND c.id = 20301125;

This non-ANSI JOIN is supported, but not recommended:

Listing 10: Non-ANSI JOIN may not perform well
SELECT p.name, s.name, c.name
FROM "Products" AS p, "Sales" AS s, "Customers" as c
WHERE p.product_id = s.sale_id
  AND s.c_id = c.id
  AND c.id = 20301125;

2.1.4.1.3.3 Use the high selectivity hint

Selectivity is the ratio of cardinality to the number of records of a chunk. We define selectivity as

selectivity = (distinct values) / (total number of records in a chunk)

SQream DB has a hint function called HIGH_SELECTIVITY, which you can wrap a condition in. The hint signals to SQream DB that the result of the condition will be very sparse, and that it should attempt to rechunk the results into fewer, fuller chunks.
Use the high selectivity hint when you expect a predicate to filter out most values, for example when the data is dispersed over many chunks (meaning that the data is not well-clustered).
For example,

SELECT store_name, SUM(amount)
FROM store_dim
WHERE HIGH_SELECTIVITY(p_date = '2018-07-01')
GROUP BY 1;

This hint tells the query compiler that the WHERE condition is expected to filter out more than 60% of values. It never affects the query results, but when used correctly can improve query performance.

Tip: The HIGH_SELECTIVITY() hint function can only be used as part of the WHERE clause. It can’t be used in equijoin conditions, cases, or in the select list.

Read more about identifying the scenarios for the high selectivity hint in our Monitoring query performance guide.

2.1.4.1.3.4 Cast smaller types to avoid overflow in aggregates

When using an INT or smaller type, the SUM and COUNT operations return a value of the same type. To avoid overflow on large results, cast the column up to a larger type.

For example

SELECT store_name, SUM(amount :: BIGINT) FROM store_dim GROUP BY 1;

2.1.4.1.3.5 Prefer COUNT(*) and COUNT on non-nullable columns

SQream DB optimizes COUNT(*) queries very strongly. This also applies to COUNT(column_name) on non-nullable columns. Using COUNT(column_name) on a nullable column will operate quickly, but much slower than the previous variations.

2.1.4.1.3.6 Return only required columns

Returning only the columns you need to client programs can improve overall query performance. This also reduces the overall result set, which can improve performance in third-party tools. SQream is able to optimize out unneeded columns very strongly due to its columnar storage.

2.1.4.1.3.7 Use saved queries to reduce recurring compilation time

Saved queries are compiled when they are created. The query plan is saved in SQream DB’s metadata for later re-use. Because the query plan is saved, they can be used to reduce compilation overhead, especially with very complex queries, such as queries with lots of values in an IN predicate. When executed, the saved query plan is recalled and executed on the up-to-date data stored on disk. See how to use saved queries in the saved queries guide.

2.1.4.1.3.8 Pre-filter to reduce JOIN complexity

Filter and reduce table sizes prior to joining on them. For example,

SELECT store_name, SUM(amount)
FROM dimension dim
JOIN fact ON dim.store_id = fact.store_id
WHERE p_date BETWEEN '2019-07-01' AND '2019-07-31'
GROUP BY store_name;

Can be rewritten as:

SELECT store_name, sum_amount
FROM dimension AS dim
INNER JOIN (SELECT SUM(amount) AS sum_amount, store_id
            FROM fact
            WHERE p_date BETWEEN '2019-07-01' AND '2019-07-31'
            GROUP BY store_id) AS fact
ON dim.store_id = fact.store_id;

2.1.4.1.4 Data loading considerations

2.1.4.1.4.1 Allow and use natural sorting on data

Very often, tabular data is already naturally ordered along a dimension such as a timestamp or area. This natural order is a major factor for query performance later on, as data that is naturally sorted can be more easily compressed and analyzed with SQream DB’s metadata collection. For example, when data is sorted by timestamp, filtering on this timestamp is more effective than filtering on an unordered column. Natural ordering can also be used for effective DELETE operations.

2.1.4.1.5 Further reading and monitoring query performance

Read our Monitoring query performance guide to learn how to use the built-in monitoring utilities. The guide also gives concrete examples for improving query performance.

2.1.4.2 Installing Monit

2.1.4.2.1 Getting Started

Before installing SQream with Monit, verify that you have followed the recommended pre-installation configurations. The procedures in the Installing Monit guide must be performed on each SQream cluster node.

2.1.4.2.2 Overview

Monit is a free open source supervision utility for managing and monitoring Unix and Linux systems. Monit lets you view system status directly from the command line or from a native HTTP web server. Monit can be used to conduct automatic maintenance and repair, such as executing meaningful causal actions in error situations.
SQream uses Monit as a watchdog utility, but you can use any other utility that provides the same or similar functionality.
The Installing Monit procedures describe how to install, configure, and start Monit.
You can install Monit in one of the following ways:
• Installing Monit on CentOS
• Installing Monit on CentOS offline
• Installing Monit on Ubuntu
• Installing Monit on Ubuntu offline

2.1.4.2.2.1 Installing Monit on CentOS:

To install Monit on CentOS:
1. Install Monit as a superuser on CentOS:

$ sudo yum install monit

2.1.4.2.2.2 Installing Monit on CentOS Offline:

Installing Monit on CentOS offline can be done in either of the following ways:
• Building Monit from Source Code
• Building Monit from Pre-Built Binaries

2.1.4.2.2.3 Building Monit from Source Code

To build Monit from source code:
1. Extract the Monit package for the current version:

$ tar zxvf monit-x.y.z.tar.gz

The value x.y.z denotes the version number.
2. Navigate to the extracted package directory:

$ cd monit-x.y.z

3. Configure the files in the package:

$ ./configure

Use ./configure --help to view the available options.

4. Build and install the package:

$ make && make install

The following are the default storage directories:
• The Monit package: /usr/local/bin/
• The monit.1 man-file: /usr/local/man/man1/
5. Optional - To change the above default location(s), use the --prefix option to ./configure.
6. Optional - Create an RPM package for CentOS directly from the source code:

$ rpmbuild -tb monit-x.y.z.tar.gz
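
The source build steps above can be chained in a small script. The following is a minimal sketch, not an official installer: the helper names run and build_monit and the DRY_RUN variable are hypothetical, and by default the script only prints the commands it would run.

```shell
#!/bin/sh
# Echo each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Extract, configure, build, and install Monit from a source tarball.
# The body runs in a subshell, so the caller's working directory is unchanged.
build_monit() (
    version="$1"               # for example x.y.z
    prefix="${2:-/usr/local}"  # installation prefix passed to ./configure
    run tar zxvf "monit-${version}.tar.gz" &&
    run cd "monit-${version}" &&
    run ./configure --prefix="$prefix" &&
    run make &&
    run make install           # usually needs root when the prefix is /usr/local
)

# Preview the steps without executing anything.
build_monit x.y.z
```

Chaining the steps with && stops the sequence at the first failing command, so a failed configure step never reaches make install.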

2.1.4.2.2.4 Building Monit from Pre-Built Binaries

To build Monit from pre-built binaries:
1. Extract the Monit package for the current version:

$ tar zxvf monit-x.y.z-linux-x64.tar.gz

The value x.y.z denotes the version number.
2. Navigate to the extracted package directory:

$ cd monit-x.y.z

3. Copy the bin/monit file into the /usr/local/bin/ directory:

$ cp bin/monit /usr/local/bin/

4. Copy the conf/monitrc file into the /etc/ directory:

$ cp conf/monitrc /etc/

For examples of pre-built Monit binaries, see Download Precompiled Binaries.
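
The two copy steps above can be wrapped in one function, sketched below. The function name install_monit_files and its optional destination arguments are hypothetical and are included so that the copy can be tried against a scratch directory first. The chmod step is an addition based on Monit's requirement that its control file be accessible only by its owner (permissions no wider than 0700).

```shell
#!/bin/sh
# Copy the Monit binary and control file from an extracted pre-built package.
# bin_dir and etc_dir default to the locations used in the steps above.
install_monit_files() {
    package_dir="$1"                 # for example ./monit-x.y.z
    bin_dir="${2:-/usr/local/bin}"
    etc_dir="${3:-/etc}"

    if [ ! -f "$package_dir/bin/monit" ] || [ ! -f "$package_dir/conf/monitrc" ]; then
        echo "bin/monit or conf/monitrc not found under $package_dir" >&2
        return 1
    fi
    cp "$package_dir/bin/monit" "$bin_dir/" || return 1
    cp "$package_dir/conf/monitrc" "$etc_dir/" || return 1
    # Restrict the control file to its owner, as Monit expects.
    chmod 600 "$etc_dir/monitrc"
}

# Example call (writing to /usr/local/bin and /etc usually requires root):
# install_monit_files ./monit-x.y.z
```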

2.1.4.2.2.5 Installing Monit on Ubuntu:

To install Monit on Ubuntu: 1. Install Monit as a superuser on Ubuntu:

$ sudo apt-get install monit


2.1.4.2.2.6 Installing Monit on Ubuntu Offline:

You can install Monit on Ubuntu when you do not have an internet connection. To install Monit on Ubuntu offline: 1. Extract the required file:

$ tar zxvf monit-x.y.z-linux-x64.tar.gz

NOTICE: x.y.z denotes the version number. 2. Navigate to the extracted package directory:

$ cd monit-x.y.z

3. Copy the bin/monit file into the /usr/local/bin/ directory:

$ cp bin/monit /usr/local/bin/

4. Copy the conf/monitrc file into the /etc/ directory:

$ cp conf/monitrc /etc/

2.1.4.2.3 Configuring Monit

When the installation is complete, you can configure Monit. You configure Monit by modifying the Monit configuration file, called monitrc. This file contains blocks for each service that you want to monitor. The following is an example of a service block:

$ #SQREAM1-START
$ check process sqream1 with pidfile /var/run/sqream1.pid
$     start program = "/usr/bin/systemctl start sqream1"
$     stop program = "/usr/bin/systemctl stop sqream1"
$ #SQREAM1-END

For example, if you have 16 services, you can configure this block by copying the entire block 15 times and modifying all service names as required, as shown below:

$ #SQREAM2-START
$ check process sqream2 with pidfile /var/run/sqream2.pid
$     start program = "/usr/bin/systemctl start sqream2"
$     stop program = "/usr/bin/systemctl stop sqream2"
$ #SQREAM2-END
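Copying and renaming the block by hand for many services is error-prone, so the blocks can also be generated. The following is a minimal sketch, assuming the services are named sqream1 through sqreamN and follow the pidfile and systemctl paths shown above. The gen_monit_blocks helper is hypothetical and is not part of Monit or SQream.

```shell
# Print one Monit service block per sqreamd service, numbered 1..N.
# gen_monit_blocks is a hypothetical helper for illustration only.
gen_monit_blocks() {
    local count="$1"
    local i=1
    while [ "$i" -le "$count" ]; do
        printf '#SQREAM%s-START\n' "$i"
        printf 'check process sqream%s with pidfile /var/run/sqream%s.pid\n' "$i" "$i"
        printf '    start program = "/usr/bin/systemctl start sqream%s"\n' "$i"
        printf '    stop program = "/usr/bin/systemctl stop sqream%s"\n' "$i"
        printf '#SQREAM%s-END\n' "$i"
        i=$((i + 1))
    done
}

# Example: print the blocks for three services.
gen_monit_blocks 3
```

The output can be reviewed and then pasted into the monitrc file. Generating the blocks keeps the service name, pidfile, and start and stop commands consistent within each block.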

For servers that don’t run the metadataserver and serverpicker commands, you can use the block example above, but comment out the related commands, as shown below:

$ #METADATASERVER-START
$ #check process metadataserver with pidfile /var/run/metadataserver.pid
$ #start program = "/usr/bin/systemctl start metadataserver"
$ #stop program = "/usr/bin/systemctl stop metadataserver"
$ #METADATASERVER-END

To configure Monit: 1. Copy the required block for each required service. 2. Modify all service names in the block. 3. Copy the configured monitrc file to the /etc/monit.d/ directory:

$ cp monitrc /etc/monit.d/

4. Set file permissions to 600 (read and write access for the file owner only):

$ sudo chmod 600 /etc/monit.d/monitrc

5. Reload the systemd manager configuration to activate the current configurations:

$ sudo systemctl daemon-reload

6. Optional - Navigate to the /etc/sqream directory and create a symbolic link to the monitrc file:

$ cd /etc/sqream
$ sudo ln -s /etc/monit.d/monitrc monitrc

2.1.4.2.4 Starting Monit

After configuring Monit, you can start it. To start Monit: 1. Start Monit as a super user:


$ sudo systemctl start monit

2. View Monit’s service status:

$ sudo systemctl status monit

3. If Monit is functioning correctly, enable the Monit service to start on boot:

$ sudo systemctl enable monit

2.1.4.3 Launching SQream with Monit

This procedure describes how to launch SQream using Monit.

2.1.4.3.1 Launching SQream

After doing the following, you can launch SQream according to the instructions on this page:
1. Installing Monit
2. Installing SQream with Binary
The following is an example of a working monitrc file configured to monitor the metadataserver and serverpicker services, and four sqreamd services. The monitrc configuration file is located at conf/monitrc. Note that the monitrc in the example is configured for eight sqreamd services, but only the first four are enabled:

$ set daemon 5 # check services at 5-second intervals
$ set logfile syslog
$
$ set httpd port 2812 and
$     use address localhost # only accept connection from localhost
$     allow localhost # allow localhost to connect to the server and
$     allow admin:monit # require user 'admin' with password 'monit'
$
$ ##set mailserver smtp.gmail.com port 587
$ ## using tlsv12
$ #METADATASERVER-START
$ check process metadataserver with pidfile /var/run/metadataserver.pid
$     start program = "/usr/bin/systemctl start metadataserver"
$     stop program = "/usr/bin/systemctl stop metadataserver"
$ #METADATASERVER-END
$ # alert [email protected] on {nonexist, timeout}
$ # with mail-format {
$ # from: Monit@$HOST
$ # subject: metadataserver $EVENT - $ACTION
$ # message: This is an automated mail, sent from monit.
$ # }
$ #SERVERPICKER-START
$ check process serverpicker with pidfile /var/run/serverpicker.pid
$     start program = "/usr/bin/systemctl start serverpicker"
$     stop program = "/usr/bin/systemctl stop serverpicker"
$ #SERVERPICKER-END
$ # alert [email protected] on {nonexist, timeout}
$ # with mail-format {
$ # from: Monit@$HOST
$ # subject: serverpicker $EVENT - $ACTION
$ # message: This is an automated mail, sent from monit.
$ # }
$ #
$ #SQREAM1-START
$ check process sqream1 with pidfile /var/run/sqream1.pid
$     start program = "/usr/bin/systemctl start sqream1"
$     stop program = "/usr/bin/systemctl stop sqream1"
$ #SQREAM1-END
$ # alert [email protected] on {nonexist, timeout}
$ # with mail-format {
$ # from: Monit@$HOST
$ # subject: sqream1 $EVENT - $ACTION
$ # message: This is an automated mail, sent from monit.
$ # }
$ #SQREAM2-START
$ check process sqream2 with pidfile /var/run/sqream2.pid
$     start program = "/usr/bin/systemctl start sqream2"
$     stop program = "/usr/bin/systemctl stop sqream2"
$ #SQREAM2-END
$ # alert [email protected] on {nonexist, timeout}
$ # with mail-format {
$ # from: Monit@$HOST
$ # subject: sqream2 $EVENT - $ACTION
$ # message: This is an automated mail, sent from monit.
$ # }
$ #SQREAM3-START
$ check process sqream3 with pidfile /var/run/sqream3.pid
$     start program = "/usr/bin/systemctl start sqream3"
$     stop program = "/usr/bin/systemctl stop sqream3"
$ #SQREAM3-END
$ # alert [email protected] on {nonexist, timeout}
$ # with mail-format {
$ # from: Monit@$HOST
$ # subject: sqream3 $EVENT - $ACTION
$ # message: This is an automated mail, sent from monit.
$ # }
$ #SQREAM4-START
$ check process sqream4 with pidfile /var/run/sqream4.pid
$     start program = "/usr/bin/systemctl start sqream4"
$     stop program = "/usr/bin/systemctl stop sqream4"
$ #SQREAM4-END
$ # alert [email protected] on {nonexist, timeout}
$ # with mail-format {
$ # from: Monit@$HOST
$ # subject: sqream4 $EVENT - $ACTION
$ # message: This is an automated mail, sent from monit.
$ # }
$ #
$ #SQREAM5-START
$ #check process sqream5 with pidfile /var/run/sqream5.pid
$ #start program = "/usr/bin/systemctl start sqream5"
$ #stop program = "/usr/bin/systemctl stop sqream5"
$ #SQREAM5-END
$ # alert [email protected] on {nonexist, timeout}
$ # with mail-format {
$ # from: Monit@$HOST
$ # subject: sqream5 $EVENT - $ACTION
$ # message: This is an automated mail, sent from monit.
$ # }
$ #
$ #SQREAM6-START
$ #check process sqream6 with pidfile /var/run/sqream6.pid
$ #start program = "/usr/bin/systemctl start sqream6"
$ #stop program = "/usr/bin/systemctl stop sqream6"
$ #SQREAM6-END
$ # alert [email protected] on {nonexist, timeout}
$ # with mail-format {
$ # from: Monit@$HOST
$ # subject: sqream6 $EVENT - $ACTION
$ # message: This is an automated mail, sent from monit.
$ # }
$ #
$ #SQREAM7-START
$ #check process sqream7 with pidfile /var/run/sqream7.pid
$ #start program = "/usr/bin/systemctl start sqream7"
$ #stop program = "/usr/bin/systemctl stop sqream7"
$ #SQREAM7-END
$ # alert [email protected] on {nonexist, timeout}
$ # with mail-format {
$ # from: Monit@$HOST
$ # subject: sqream7 $EVENT - $ACTION
$ # message: This is an automated mail, sent from monit.
$ # }
$ #
$ #SQREAM8-START
$ #check process sqream8 with pidfile /var/run/sqream8.pid
$ #start program = "/usr/bin/systemctl start sqream8"
$ #stop program = "/usr/bin/systemctl stop sqream8"
$ #SQREAM8-END
$ # alert [email protected] on {nonexist, timeout}
$ # with mail-format {
$ # from: Monit@$HOST
$ # subject: sqream8 $EVENT - $ACTION
$ # message: This is an automated mail, sent from monit.
$ # }

2.1.4.3.2 Monit Usage Examples

This section shows examples of two methods for stopping the sqream3 service using Monit’s command syntax:
• Stopping Monit and SQream separately
• Stopping SQream using a Monit command

2.1.4.3.2.1 Stopping Monit and SQream Separately

You can stop the Monit service and SQream separately as follows:

$ sudo systemctl stop monit
$ sudo systemctl stop sqream3

You can restart Monit as follows:

$ sudo systemctl start monit

Restarting Monit automatically restarts the SQream services.

2.1.4.3.2.2 Stopping SQream Using a Monit Command

You can stop SQream using a Monit command as follows:

$ sudo monit stop sqream3

This command stops SQream only (and not Monit). You can restart SQream as follows:

$ sudo monit start sqream3

2.1.4.3.2.3 Monit Command Line Options

This section describes some of the most commonly used Monit command options. You can show the command line options by running:

$ monit --help

$ start all             - Start all services
$ start <name>          - Only start the named service
$ stop all              - Stop all services
$ stop <name>           - Stop the named service
$ restart all           - Stop and start all services
$ restart <name>        - Only restart the named service
$ monitor all           - Enable monitoring of all services
$ monitor <name>        - Only enable monitoring of the named service
$ unmonitor all         - Disable monitoring of all services
$ unmonitor <name>      - Only disable monitoring of the named service
$ reload                - Reinitialize monit
$ status [name]         - Print full status information for service(s)
$ summary [name]        - Print short status information for service(s)
$ report [up|down|..]   - Report state of services. See manual for options
$ quit                  - Kill the monit daemon process
$ validate              - Check all services and start if not running
$ procmatch <pattern>   - Test process matching pattern

2.1.4.3.3 Using Monit While Upgrading Your Version of SQream

While upgrading your version of SQream, you can use Monit to avoid conflicts (such as a service starting during the upgrade). This is done by pausing or stopping all running services while you manually upgrade SQream. When you finish successfully upgrading SQream, you can use Monit to restart all SQream services.
To use Monit while upgrading your version of SQream: 1. Stop all actively running SQream services:

$ sudo monit stop all

2. Verify that SQream has stopped listening on ports 500X, 510X, and 310X:

$ sudo netstat -nltp # to make sure sqream stopped listening on 500X, 510X and 310X ports.

3. Replace the old version with the new version. The example below shows the old version sqream-db-v2020.2 being replaced with the new version sqream-db-v2025.200.

$ cd /home/sqream
$ mkdir tempfolder
$ mv sqream-db-v2025.200.tar.gz tempfolder/
$ cd tempfolder
$ tar -xf sqream-db-v2025.200.tar.gz
$ sudo mv sqream /usr/local/sqream-db-v2025.200
$ cd /usr/local
$ sudo chown -R sqream:sqream sqream-db-v2025.200
$ sudo rm sqream # this should only remove the symlink
$ sudo ln -s sqream-db-v2025.200 sqream # this creates a new symlink named "sqream" pointing to the new version
$ ls -l

The sqream symbolic link should point to the real folder:

$ sqream -> sqream-db-v2025.200

4. Restart the SQream services:


$ sudo monit start all

5. Verify that the latest version has been installed:

$ SELECT SHOW_VERSION();

The correct version is output. 6. Restart the UI:

$ pm2 start all
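The port check in step 2 above can be narrowed to the relevant lines. The following is a minimal sketch, assuming that 500X, 510X, and 310X mean four-digit ports starting with 500, 510, and 310, and that the local address is the fourth column of the netstat output. The sqream_listeners helper is hypothetical.

```shell
# Read `netstat -nltp` output on stdin and print only the TCP listeners
# whose local port matches 500X, 510X or 310X.
# sqream_listeners is a hypothetical helper for illustration only.
sqream_listeners() {
    awk '$1 ~ /^tcp/ {
        n = split($4, parts, ":")      # local address is the 4th column
        port = parts[n]                # the port follows the last colon
        if (port ~ /^(500|510|310)[0-9]$/) print
    }'
}

# Usage (not run here, because it needs netstat and root access):
#   sudo netstat -nltp | sqream_listeners
```

Empty output suggests that nothing is listening on those ports any longer. Any line that is printed shows the PID and program name that still hold the port.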

2.1.4.4 Installing SQream Using Binary Packages

This procedure describes how to install SQream using Binary packages and must be done on all servers. To install SQream using Binary packages: 1. Copy the SQream package for the current version to the /home/sqream directory and extract it:

$ tar -xf sqream-db-v<2020.2>.tar.gz

2. Append the version number to the name of the SQream folder. The version number in the following example is v2020.2:

$ mv sqream sqream-db-v<2020.2>

3. Move the new version of the SQream folder to the /usr/local/ directory:

$ sudo mv sqream-db-v<2020.2> /usr/local/

4. Change the ownership of the folder to the sqream user:

$ sudo chown -R sqream:sqream /usr/local/sqream-db-v<2020.2>

5. Navigate to the /usr/local/ directory and create a symbolic link to SQream:

$ cd /usr/local
$ sudo ln -s sqream-db-v<2020.2> sqream

6. Verify that the symbolic link that you created points to the folder that you created:

$ ls -l

7. Confirm that the output shows the symbolic link pointing to the folder that you created, as in the following example:

$ sqream -> sqream-db-v<2020.2>

8. Create the SQream configuration file destination folders and set their ownership to sqream:

$ sudo mkdir /etc/sqream
$ sudo chown -R sqream:sqream /etc/sqream

9. Create the SQream service log destination folders and set their ownership to sqream:

$ sudo mkdir /var/log/sqream
$ sudo chown -R sqream:sqream /var/log/sqream

10. Navigate to the /usr/local/sqream/etc/ directory and copy the SQream configuration files from it:

$ cd /usr/local/sqream/etc/
$ cp * /etc/sqream

The configuration files are service configuration files, and the JSON files are SQream configuration files, for a total of four files. The number of service configuration files and JSON files must be identical.
NOTICE - Verify that the JSON files have been configured correctly and that all required flags have been set to the correct values.
In each JSON file, the following parameters must be updated:
• instanceId
• machineIP
• metadataServerIp
• spoolMemoryGB
• limitQueryMemoryGB
• gpu
• port
• ssl_port
Note the following:
• The value of the metadataServerIp parameter must point to the IP that the metadata server is running on.
• The value of the machineIP parameter must point to the IP of your local machine. It is the same as metadataServerIp on the server running metadataserver, and different on the other server nodes.
11. Optional - To run additional SQream services, copy the required configuration files and create additional JSON files:

$ cp sqream2_config.json sqream3_config.json
$ vim sqream3_config.json

NOTICE: A unique instanceId must be used in each JSON file. In the example above, the instanceId sqream_2 is changed to sqream_3.
12. Optional - If you created additional services in Step 11, verify that you have also created their additional configuration files:

$ cp sqream2-service.conf sqream3-service.conf
$ vim sqream3-service.conf

13. For each SQream service configuration file, do the following:
1. Change the SERVICE_NAME=sqream2 value to SERVICE_NAME=sqream3.
2. Change LOGFILE=/var/log/sqream/sqream2.log to LOGFILE=/var/log/sqream/sqream3.log.

NOTE: If you are running SQream on more than one server, you must configure the serverpicker and metadataserver services to start on only one of the servers. If metadataserver is running on the first server, the metadataServerIp value in the second server’s /etc/sqream/sqream1_config.json file must point to the IP of the server on which the metadataserver service is running.
14. Set up serverpicker: 1. Open the server_picker.conf file:

$ vim /etc/sqream/server_picker.conf

2. Change the IP 127.0.0.1 to the IP of the server that the metadataserver service is running on. 3. Change the CLUSTER to the value of the cluster path. 15. Set up your service files:

$ cd /usr/local/sqream/service/
$ cp sqream2.service sqream3.service
$ vim sqream3.service

16. In each SQream service file, update the EnvironmentFile=/etc/sqream/sqream2-service.conf entry so that it references the matching service configuration file, as shown below:

$ EnvironmentFile=/etc/sqream/sqream<3>-service.conf

17. Copy and register your service files into systemd:

$ sudo cp metadataserver.service /usr/lib/systemd/system/
$ sudo cp serverpicker.service /usr/lib/systemd/system/
$ sudo cp sqream*.service /usr/lib/systemd/system/

18. Verify that your service files have been copied into systemd:

$ ls -l /usr/lib/systemd/system/sqream*
$ ls -l /usr/lib/systemd/system/metadataserver.service
$ ls -l /usr/lib/systemd/system/serverpicker.service
$ sudo systemctl daemon-reload

19. Copy the license into the /etc/sqream directory:

$ cp license.enc /etc/sqream/

If you have an HDFS environment, see Configuring an HDFS Environment for the User sqream.
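The manual edits in steps 12 and 13 above can also be applied with sed. The following is a minimal sketch, assuming the service configuration file contains SERVICE_NAME and LOGFILE lines exactly as shown in step 13. The retarget_service_conf helper is hypothetical, and any other keys in the file are passed through unchanged.

```shell
# Read a service configuration on stdin and print a copy in which the
# service name and log file point at the new service.
# retarget_service_conf is a hypothetical helper for illustration only.
retarget_service_conf() {
    local old="$1" new="$2"
    sed -e "s|^SERVICE_NAME=${old}\$|SERVICE_NAME=${new}|" \
        -e "s|^LOGFILE=/var/log/sqream/${old}\.log\$|LOGFILE=/var/log/sqream/${new}.log|"
}

# Usage (not run here, because it needs the real configuration file):
#   retarget_service_conf sqream2 sqream3 \
#       < /etc/sqream/sqream2-service.conf > /etc/sqream/sqream3-service.conf
```

Anchoring both patterns to whole lines avoids rewriting other values that happen to contain the old service name.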

2.1.4.4.1 Upgrading SQream Version

Upgrading your SQream version requires stopping all running services while you manually upgrade SQream. To upgrade your version of SQream: 1. Stop all actively running SQream services. Notice- All SQream services must remain stopped while the upgrade is in process. Ensuring that SQream services remain stopped depends on the tool being used. For an example of stopping actively running SQream services, see Launching SQream with Monit. 2. Verify that SQream has stopped listening on ports 500X, 510X, and 310X:

$ sudo netstat -nltp # to make sure sqream stopped listening on 500X, 510X and 310X ports.

3. Replace the old version, sqream-db-v2020.2, with the new version, sqream-db-v2021.1:

$ cd /home/sqream
$ mkdir tempfolder
$ mv sqream-db-v2021.1.tar.gz tempfolder/
$ cd tempfolder
$ tar -xf sqream-db-v2021.1.tar.gz
$ sudo mv sqream /usr/local/sqream-db-v2021.1
$ cd /usr/local
$ sudo chown -R sqream:sqream sqream-db-v2021.1

4. Remove the symbolic link:

$ sudo rm sqream

5. Create a new symbolic link named “sqream” pointing to the new version:

$ sudo ln -s sqream-db-v2021.1 sqream

6. Verify that the symbolic SQream link points to the real folder:

$ ls -l

The following is an example of the correct output:

$ sqream -> sqream-db-v2021.1

7. Optional- (for major versions) Upgrade your version of SQream storage cluster, as shown in the following example:

$ cat /etc/sqream/sqream1_config.json |grep cluster
$ ./upgrade_storage

The following is an example of the correct output:

get_leveldb_version path{}
current storage version 23
upgrade_v24
upgrade_storage to 24
upgrade_storage to 24 - Done
upgrade_v25
upgrade_storage to 25
upgrade_storage to 25 - Done
upgrade_v26
upgrade_storage to 26
upgrade_storage to 26 - Done
validate_leveldb
...
upgrade_v37
upgrade_storage to 37
upgrade_storage to 37 - Done
validate_leveldb
storage has been upgraded successfully to version 37

8. Verify that the latest version has been installed:

$ ./sqream sql --username sqream --password sqream --host localhost --databasename master -c "SELECT SHOW_VERSION();"

The following is an example of the correct output:

v2021.1
1 row
time: 0.050603s

For more information, see the upgrade_storage command line program.
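The symbolic link check in step 6 above can be scripted instead of read from the ls -l listing. The following is a minimal sketch that uses readlink. The link_points_to helper is hypothetical, and the demonstration runs in a temporary directory rather than in /usr/local.

```shell
# Succeed only if the first argument is a symbolic link whose target is
# exactly the second argument.
# link_points_to is a hypothetical helper for illustration only.
link_points_to() {
    local link="$1" expected="$2"
    [ -L "$link" ] && [ "$(readlink "$link")" = "$expected" ]
}

# Demonstration in a scratch directory, mirroring the layout in /usr/local.
demo_dir="$(mktemp -d)"
mkdir "$demo_dir/sqream-db-v2021.1"
ln -s sqream-db-v2021.1 "$demo_dir/sqream"
if link_points_to "$demo_dir/sqream" sqream-db-v2021.1; then
    echo "sqream points to the new version"
fi
rm -rf "$demo_dir"
```

On a real server, the equivalent call would be link_points_to /usr/local/sqream sqream-db-v2021.1, run after step 5.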

2.1.4.5 Installing SQream with Kubernetes

Kubernetes, also known as k8s, is a portable open source platform that automates Linux container operations. Kubernetes supports outsourcing data centers to public cloud service providers or can be scaled for web hosting. SQream uses Kubernetes as an orchestration and recovery solution. The Installing SQream with Kubernetes guide describes the following:

• Preparing the SQream Environment to Launch SQream Using Kubernetes • Setting Up Your Hosts • Installing Your Kubernetes Cluster • Installing the SQream Software • Running the Sqream-install Service • Using the sqream-start Commands • Upgrading Your SQream Version

2.1.4.5.1 Preparing the SQream Environment to Launch SQream Using Kubernetes

The Preparing the SQream environment to Launch SQream Using Kubernetes section describes the following:

• Overview • Operating System Requirements • Compute Server Specifications

2.1.4.5.1.1 Overview

A minimum of three servers is required for preparing the SQream environment using Kubernetes. Kubernetes uses clusters, which are sets of nodes running containerized applications. A cluster consists of at least two GPU nodes and one additional server without GPU to act as the quorum manager.

Each server must have the following IP addresses:
• An IP address located in the management network.
• An additional IP address from the same subnet to function as a floating IP.
All servers must mount the same shared storage folder.
The following list shows the server host name format requirements:
• A maximum of 253 characters.
• Only lowercase alphanumeric characters, - or .
• Starts and ends with alphanumeric characters.
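The host name format requirements above can be checked before the cluster is installed. The following is a minimal sketch that tests only the three listed rules. The valid_hostname helper is hypothetical, and it does not check anything beyond those rules.

```shell
# Succeed only if the name is 1-253 characters long, uses only lowercase
# alphanumerics, "-" or ".", and starts and ends with an alphanumeric.
# valid_hostname is a hypothetical helper for illustration only.
valid_hostname() {
    local name="$1"
    local LC_ALL=C                      # keep a-z limited to ASCII lowercase
    [ "${#name}" -ge 1 ] && [ "${#name}" -le 253 ] || return 1
    case "$name" in
        *[!a-z0-9.-]*) return 1 ;;          # character outside the allowed set
        [!a-z0-9]*|*[!a-z0-9]) return 1 ;;  # bad first or last character
    esac
    return 0
}

# Example: check the host name of the current server.
if valid_hostname "$(hostname)"; then
    echo "host name format is acceptable"
else
    echo "host name format does not meet the requirements"
fi
```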

2.1.4.5.1.2 Operating System Requirements

The required operating system is a version of x86 CentOS/RHEL between 7.6 and 7.9. Regarding PPC64le, the required version is RHEL 7.6.

2.1.4.5.1.3 Compute Server Specifications

Installing SQream with Kubernetes includes the following compute server specifications:
• CPU: 4 cores
• RAM: 16GB
• HD: 500GB

2.1.4.5.2 Setting Up Your Hosts

SQream requires you to set up your hosts. Setting up your hosts requires the following:

• Configuring the Hosts File • Installing the Required Packages • Disabling the Linux UI • Disabling SELinux • Disabling Your Firewall • Checking the CUDA Version

2.1.4.5.2.1 Configuring the Hosts File

To configure the /etc/hosts file:


1. Edit the /etc/hosts file:

$ sudo vim /etc/hosts

2. Call your local host:

$ 127.0.0.1 localhost $

2.1.4.5.2.2 Installing the Required Packages

The first step in setting up your hosts is to install the required packages. To install the required packages: 1. Run the following command based on your operating system: • RHEL:

$ sudo yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

• CentOS:

$ sudo yum install epel-release
$ sudo yum install pciutils openssl-devel python36 python36-pip kernel-devel-$(uname -r) kernel-headers-$(uname -r) gcc jq net-tools ntp

2. Verify that the required packages were successfully installed by running the following commands:

ntpq --version
jq --version
python3 --version
pip3 --version
rpm -qa |grep kernel-devel-$(uname -r)
rpm -qa |grep kernel-headers-$(uname -r)
gcc --version

3. Enable the ntpd (Network Time Protocol daemon) program on all servers:

$ sudo systemctl start ntpd
$ sudo systemctl enable ntpd
$ sudo systemctl status ntpd
$ sudo ntpq -p

2.1.4.5.2.3 Disabling the Linux UI

After installing the required packages, you must disable the Linux UI if it has been installed. You can disable the Linux UI by running the following command:

$ sudo systemctl set-default multi-user.target

2.1.4.5.2.4 Disabling SELinux

After disabling the Linux UI you must disable SELinux. To disable SELinux: 1. Run the following command:

$ sudo sed -i -e s/enforcing/disabled/g /etc/selinux/config

2. Reboot the system as a root user:

$ sudo reboot
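The sed expression in step 1 replaces every occurrence of the word enforcing in the file, including occurrences in comment lines. The following is a minimal sketch of a narrower variant that changes only the SELINUX=enforcing setting line. The disable_selinux_setting helper is hypothetical, and it works on stdin so that it can be tried on a copy of the file first.

```shell
# Read an SELinux configuration on stdin and print it with only the
# SELINUX=enforcing setting changed to SELINUX=disabled.
# disable_selinux_setting is a hypothetical helper for illustration only.
disable_selinux_setting() {
    sed 's/^SELINUX=enforcing[[:space:]]*$/SELINUX=disabled/'
}

# Usage (not run here, because it reads the real configuration file):
#   disable_selinux_setting < /etc/selinux/config
```

Comment lines and the SELINUXTYPE line are printed unchanged, which makes the result easier to compare with the original file.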


2.1.4.5.2.5 Disabling Your Firewall

After disabling SELinux, you must disable your firewall by running the following commands:

$ sudo systemctl stop firewalld $ sudo systemctl disable firewalld


2.1.4.5.2.6 Checking the CUDA Version

After completing all of the steps above, you must check the CUDA version. To check the CUDA version: 1. Check the CUDA version:

$ nvidia-smi

The following is an example of the correct output:

$ +-----------------------------------------------------------------------------+
$ | NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
$ |-------------------------------+----------------------+----------------------+
$ | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
$ | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
$ |===============================+======================+======================|
$ |   0  GeForce GTX 105...  Off  | 00000000:01:00.0 Off |                  N/A |
$ | 32%   38C    P0    N/A /  75W |      0MiB /  4039MiB |      0%      Default |
$ +-------------------------------+----------------------+----------------------+
$
$ +-----------------------------------------------------------------------------+
$ | Processes:                                                       GPU Memory |
$ |  GPU       PID   Type   Process name                             Usage      |
$ |=============================================================================|
$ |  No running processes found                                                 |
$ +-----------------------------------------------------------------------------+

In the above output, the CUDA version is 10.1.
If the above output is not generated, CUDA has not been installed. To install CUDA, see Installing the CUDA driver.
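When the check has to be repeated on several hosts, the version can be extracted from the header line instead of read by eye. The following is a minimal sketch, assuming the header contains the text "CUDA Version:" followed by the number, as in the output above. The cuda_version helper is hypothetical.

```shell
# Read nvidia-smi output on stdin and print the CUDA version from the header.
# cuda_version is a hypothetical helper for illustration only.
cuda_version() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Usage (not run here, because it needs an NVIDIA driver):
#   nvidia-smi | cuda_version
```

Empty output means that no CUDA version was found in the input.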

2.1.4.5.3 Installing Your Kubernetes Cluster

After setting up your hosts, you must install your Kubernetes cluster. The Kubernetes and SQream software must be installed from the management host, and can be installed on any server in the cluster. Installing your Kubernetes cluster requires the following:

• Generating and Sharing SSH Keypairs Across All Existing Nodes • Installing and Deploying a Kubernetes Cluster Using Kubespray • Adjusting Kubespray Deployment Values • Checking Your Kubernetes Status • Adding a SQream Label to Your Kubernetes Cluster Nodes • Copying Your Kubernetes Configuration API File to the Master Cluster Nodes • Creating an env_file in Your Home Directory • Creating a Base Kubernetes Namespace • Pushing the env_file File to the Kubernetes Configmap • Installing the NVIDIA Docker2 Toolkit • Modifying the Docker Daemon JSON File for GPU and Compute Nodes • Installing the Nvidia-device-plugin Daemonset • Creating an Nvidia Device Plugin • Checking GPU Resources Allocatable to GPU Nodes • Preparing the WatchDog Monitor

2.1.4.5.3.1 Generating and Sharing SSH Keypairs Across All Existing Nodes

You can generate and share SSH keypairs across all existing nodes. Sharing SSH keypairs across all nodes enables passwordless access from the management server to all nodes in the cluster. All nodes in the cluster require passwordless access.

Note: You must generate and share an SSH keypair across all nodes even if you are installing the Kubernetes cluster on a single host.


To generate and share an SSH keypair: 1. Switch to root user access:

$ sudo su -

2. Generate an RSA key pair:

$ ssh-keygen

The following is an example of the correct output:

$ ssh-keygen
$ Generating public/private rsa key pair.
$ Enter file in which to save the key (/root/.ssh/id_rsa):
$ Created directory '/root/.ssh'.
$ Enter passphrase (empty for no passphrase):
$ Enter same passphrase again:
$ Your identification has been saved in /root/.ssh/id_rsa.
$ Your public key has been saved in /root/.ssh/id_rsa.pub.
$ The key fingerprint is:
$ SHA256:xxxxxxxxxxxxxxdsdsdffggtt66gfgfg [email protected]
$ The key's randomart image is:
$ +---[RSA 2048]----+
$ | =*. |
$ | .o |
$ | ..o o|
$ | . .oo +.|
$ | = S =...o o|
$ | B + *..o+.|
$ | o * *..o .+|
$ | o * oo.E.o|
$ | . ..+..B.+o|
$ +----[SHA256]-----+

The generated file is /root/.ssh/id_rsa.pub. 3. Copy the public key to all servers in the cluster, including the one that you are running on.

$ ssh-copy-id -i ~/.ssh/id_rsa.pub root@remote-host

4. Replace the remote host with your host IP address.

2.1.4.5.3.2 Installing and Deploying a Kubernetes Cluster Using Kubespray

SQream uses the Kubespray software package to install and deploy Kubernetes clusters. To install and deploy a Kubernetes cluster using Kubespray: 1. Clone Kubespray: 1. Clone the kubespray.git repository:

$ git clone https://github.com/kubernetes-incubator/kubespray.git

2. Navigate to the kubespray directory:

$ cd kubespray

3. Install the Python packages listed in the requirements.txt file:

$ pip3 install -r requirements.txt

2. Create your SQream inventory directory: 1. Run the following command:

$ cp -rp inventory/sample inventory/sqream

2. Replace the with the defined cluster node IP address(es).

$ declare -a IPS=(, )

For example, the following declares two nodes, host-93 at 192.168.0.93 and host-92 at 192.168.0.92:

$ declare -a IPS=(host-93,192.168.0.93 host-92,192.168.0.92)

Note the following:
• Running a declare requires defining a pair (host name and cluster node IP address), as shown in the above example.
• You can define more than one pair.
3. When the reboot is complete, switch back to the root user:

$ sudo su -

4. Navigate to root/kubespray:

$ cd /root/kubespray

5. Copy inventory/sample as inventory/sqream:

$ cp -rfp inventory/sample inventory/sqream

6. Update the Ansible inventory file with the inventory builder:

$ declare -a IPS=(, , ,)

7. In the kubespray hosts.yml file, set the node IPs:

$ CONFIG_FILE=inventory/sqream/hosts.yml python3 contrib/inventory_builder/inventory.py ${IPS[@]}

If you do not set a specific hostname in declare, the server hostnames will change to node1, node2, etc. To maintain specific hostnames, run declare as in the following example:

$ declare -a IPS=(eks-rhl-1,192.168.5.81 eks-rhl-2,192.168.5.82 eks-rhl-3,192.168.5.83)

Note that the declare must contain pairs (hostname,ip). 8. Verify that the following have been done:


• That the hosts.yml file is configured correctly. • That all children are included with their relevant nodes. You can save your current server hostname by replacing with your server hostname. 9. Generate the content output of the hosts.yml file. Make sure to include the file’s directory:

$ cat inventory/sqream/hosts.yml

The hostname can be lowercase and contain - or . only, and must be aligned with the server’s hostname. The following is an example of the correct output. Each host and IP address that you provided in Step 2 should be displayed once:

$ all:
$   hosts:
$     node1:
$       ansible_host: 192.168.5.81
$       ip: 192.168.5.81
$       access_ip: 192.168.5.81
$     node2:
$       ansible_host: 192.168.5.82
$       ip: 192.168.5.82
$       access_ip: 192.168.5.82
$     node3:
$       ansible_host: 192.168.5.83
$       ip: 192.168.5.83
$       access_ip: 192.168.5.83
$   children:
$     kube-master:
$       hosts:
$         node1:
$         node2:
$         node3:
$     kube-node:
$       hosts:
$         node1:
$         node2:
$         node3:
$     etcd:
$       hosts:
$         node1:
$         node2:
$         node3:
$     k8s-cluster:
$       children:
$         kube-master:
$         kube-node:
$     calico-rr:
$       hosts: {}
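Because the inventory builder expects each entry to be a hostname,ip pair, it can help to check the entries before running it. The following is a minimal sketch. The valid_ips_pairs helper is hypothetical, and the IP test is a loose dotted-quad shape check that does not verify octet ranges.

```shell
# Succeed only if at least one entry is given and every entry has the
# form hostname,ip with a dotted-quad shaped IP address.
# valid_ips_pairs is a hypothetical helper for illustration only.
valid_ips_pairs() {
    local entry ip
    [ "$#" -ge 1 ] || return 1
    for entry in "$@"; do
        case "$entry" in
            *,*,*) return 1 ;;   # more than one comma
            ?*,?*) ;;            # non-empty hostname and non-empty IP
            *) return 1 ;;       # missing comma or an empty half
        esac
        ip="${entry#*,}"
        printf '%s\n' "$ip" | grep -Eq '^[0-9]{1,3}(\.[0-9]{1,3}){3}$' || return 1
    done
}

# Example with the pairs used earlier in this procedure.
if valid_ips_pairs eks-rhl-1,192.168.5.81 eks-rhl-2,192.168.5.82 eks-rhl-3,192.168.5.83; then
    echo "all entries are hostname,ip pairs"
fi
```

With an existing array, the call would be valid_ips_pairs "${IPS[@]}", run before the inventory builder command.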


2.1.4.5.3.3 Adjusting Kubespray Deployment Values

After downloading and configuring Kubespray, you can adjust your Kubespray deployment values. A script is used to modify how the Kubernetes cluster is deployed, and you must set the cluster name variable before running this script.

Note: The script must be run from the kubespray folder.

To adjust Kubespray deployment values: 1. Add the following export to the local user’s ~/.bashrc file by replacing the with the user’s Virtual IP address:

$ export VIP_IP=

2. Log out, log back in, and verify the following:

$ echo $VIP_IP

3. Create the kubespray_settings.sh file, which makes the following replacements:

$ cat <<EOF > kubespray_settings.sh
$ sed -i "/cluster_name: cluster.local/c \cluster_name: cluster.local.$cluster_name" inventory/sqream/group_vars/k8s-cluster/k8s-cluster.yml
$ sed -i "/dashboard_enabled/c \dashboard_enabled\: "false"" inventory/sqream/group_vars/k8s-cluster/addons.yml
$ sed -i "/kube_version/c \kube_version\: "v1.18.3"" inventory/sqream/group_vars/k8s-cluster/k8s-cluster.yml
$ sed -i "/metrics_server_enabled/c \metrics_server_enabled\: "true"" inventory/sqream/group_vars/k8s-cluster/addons.yml
$ echo 'kube_apiserver_node_port_range: "3000-6000"' >> inventory/sqream/group_vars/k8s-cluster/k8s-cluster.yml
$ echo 'kube_controller_node_monitor_grace_period: 20s' >> inventory/sqream/group_vars/k8s-cluster/k8s-cluster.yml
$ echo 'kube_controller_node_monitor_period: 2s' >> inventory/sqream/group_vars/k8s-cluster/k8s-cluster.yml
$ echo 'kube_controller_pod_eviction_timeout: 30s' >> inventory/sqream/group_vars/k8s-cluster/k8s-cluster.yml
$ echo 'kubelet_status_update_frequency: 4s' >> inventory/sqream/group_vars/k8s-cluster/k8s-cluster.yml
$ echo 'ansible ALL=(ALL) NOPASSWD: ALL' >> /etc/sudoers
$ EOF

Note: In most cases, the Docker data resides on the system disk. Because Docker requires a high volume of data (images, containers, volumes, etc.), you can change the default Docker data location to prevent the system disk from running out of space.

4. Optional - Change the default Docker data location:

$ sed -i "/docker_daemon_graph/c \docker_daemon_graph\: """ inventory/sqream/group_vars/all/docker.yml

5. Make the kubespray_settings.sh file executable for your user:

$ chmod u+x kubespray_settings.sh

6. Run the following script:

$ ./kubespray_settings.sh

7. Run the cluster.yml playbook against the inventory/sqream/hosts.yml inventory file:

$ ansible-playbook -i inventory/sqream/hosts.yml cluster.yml -v

The Kubespray installation takes approximately 10 - 15 minutes. The following is an example of the correct output:

$ PLAY RECAP *********************************************************************
$ node-1    : ok=680 changed=133 unreachable=0 failed=0
$ node-2    : ok=583 changed=113 unreachable=0 failed=0
$ node-3    : ok=586 changed=115 unreachable=0 failed=0
$ localhost : ok=1 changed=0 unreachable=0 failed=0
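
A quick way to confirm that the recap is clean is to scan it for non-zero failed or unreachable counters. The following is a minimal sketch and is not part of Kubespray or SQream; the recap_problems helper and the kubespray.log file name are hypothetical, and it assumes that the playbook output was saved to a file:

```shell
# Sketch: list hosts in an Ansible "PLAY RECAP" whose failed= or unreachable=
# counter is non-zero. recap_problems is a hypothetical helper name.
recap_problems() {
  # Reads recap lines on stdin and prints the first field (the host name)
  # of every line with a non-zero failed= or unreachable= counter.
  awk '
    /unreachable=/ && /failed=/ {
      bad = 0
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^(unreachable|failed)=/) {
          split($i, kv, "=")
          if (kv[2] + 0 > 0) bad = 1
        }
      }
      if (bad) print $1
    }
  '
}

# Example usage (assumes the output was captured with tee):
#   ansible-playbook -i inventory/sqream/hosts.yml cluster.yml -v | tee kubespray.log
#   recap_problems < kubespray.log
```

If the helper prints nothing, every host in the recap reported zero failed and zero unreachable tasks.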

In the event that the output is incorrect, or a failure occurred during the installation, please contact a SQream customer support representative. Go back to Installing Your Kubernetes Cluster.

2.1.4.5.3.4 Checking Your Kubernetes Status

After adjusting your Kubespray deployment values, you must check your Kubernetes status.
To check your Kubernetes status:
1. Check the status of the node:

$ kubectl get nodes

The following is an example of the correct output:

$ NAME        STATUS   ROLES                  AGE   VERSION
$ eks-rhl-1   Ready    control-plane,master   29m   v1.21.1
$ eks-rhl-2   Ready    control-plane,master   29m   v1.21.1
$ eks-rhl-3   Ready                           28m   v1.21.1

2. Check the status of the pod:

$ kubectl get pods --all-namespaces

The following is an example of the correct output:

$ NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE
$ kube-system   calico-kube-controllers-68dc8bf4d5-n9pbp   1/1     Running   0          160m
$ kube-system   calico-node-26cn9                          1/1     Running   1          160m
$ kube-system   calico-node-kjsgw                          1/1     Running   1          160m
$ kube-system   calico-node-vqvc5                          1/1     Running   1          160m
$ kube-system   coredns-58687784f9-54xsp                   1/1     Running   0          160m
$ kube-system   coredns-58687784f9-g94xb                   1/1     Running   0          159m
$ kube-system   dns-autoscaler-79599df498-hlw8k            1/1     Running   0          159m
$ kube-system   kube-apiserver-k8s-host-1-134              1/1     Running   0          162m
$ kube-system   kube-apiserver-k8s-host-194                1/1     Running   0          161m
$ kube-system   kube-apiserver-k8s-host-68                 1/1     Running   0          161m
$ kube-system   kube-controller-manager-k8s-host-1-134     1/1     Running   0          162m
$ kube-system   kube-controller-manager-k8s-host-194       1/1     Running   0          161m
$ kube-system   kube-controller-manager-k8s-host-68        1/1     Running   0          161m
$ kube-system   kube-proxy-5f42q                           1/1     Running   0          161m
$ kube-system   kube-proxy-bbwvk                           1/1     Running   0          161m
$ kube-system   kube-proxy-fgcfb                           1/1     Running   0          161m
$ kube-system   kube-scheduler-k8s-host-1-134              1/1     Running   0          161m
$ kube-system   kube-scheduler-k8s-host-194                1/1     Running   0          161m
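
On a larger cluster it can be easier to filter the pod list than to read it. The following is a minimal sketch under the assumption that the list was produced with the --no-headers option; the unhealthy_pods helper name is hypothetical:

```shell
# Sketch: print pods that are not fully ready.
# Expects the output of `kubectl get pods --all-namespaces --no-headers` on stdin,
# where the columns are NAMESPACE NAME READY STATUS ...
unhealthy_pods() {
  awk '
    $4 == "Completed" { next }          # finished jobs are not a problem
    {
      split($3, ready, "/")             # READY looks like "1/1"
      if ($4 != "Running" || ready[1] != ready[2]) print $1 "/" $2
    }
  '
}

# Example usage:
#   kubectl get pods --all-namespaces --no-headers | unhealthy_pods
```

An empty result means that every listed pod is Running with all of its containers ready.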

Go back to Installing Your Kubernetes Cluster

2.1.4.5.3.5 Adding a SQream Label to Your Kubernetes Cluster Nodes

After checking your Kubernetes status, you must add a SQream label to your Kubernetes cluster nodes.
To add a SQream label to your Kubernetes cluster nodes:
1. Get the cluster node list:

$ kubectl get nodes

The following is an example of the correct output:

$ NAME        STATUS   ROLES                  AGE   VERSION
$ eks-rhl-1   Ready    control-plane,master   29m   v1.21.1
$ eks-rhl-2   Ready    control-plane,master   29m   v1.21.1
$ eks-rhl-3   Ready                           28m   v1.21.1

2. Set the node label, replacing <node-name> with each node NAME shown in the example above:

$ kubectl label nodes <node-name> cluster=sqream

The following is an example of the correct output:

$ [root@edk-rhl-1 kubespray]# kubectl label nodes eks-rhl-1 cluster=sqream
$ node/eks-rhl-1 labeled
$ [root@edk-rhl-1 kubespray]# kubectl label nodes eks-rhl-2 cluster=sqream
$ node/eks-rhl-2 labeled
$ [root@edk-rhl-1 kubespray]# kubectl label nodes eks-rhl-3 cluster=sqream
$ node/eks-rhl-3 labeled
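
When there are many nodes, the same label command can be repeated in a loop. The following is a minimal sketch; the label_nodes helper and the KUBECTL override variable are hypothetical and are only there so that the loop can be dry-run with echo:

```shell
# Sketch: apply one label to several nodes.
# Usage: label_nodes <key=value> <node-name>...
# KUBECTL is a hypothetical override; it defaults to the real kubectl binary.
label_nodes() (
  label=$1
  shift
  for node in "$@"; do
    "${KUBECTL:-kubectl}" label nodes "$node" "$label" || exit 1
  done
)

# Example usage:
#   label_nodes cluster=sqream eks-rhl-1 eks-rhl-2 eks-rhl-3
# Dry run that only prints the commands:
#   KUBECTL=echo label_nodes cluster=sqream eks-rhl-1 eks-rhl-2 eks-rhl-3
```

The function body runs in a subshell, so its variables do not leak into the calling shell.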

Go back to Installing Your Kubernetes Cluster

2.1.4.5.3.6 Copying Your Kubernetes Configuration API File to the Master Cluster Nodes

After adding a SQream label to your Kubernetes cluster nodes, you must copy your Kubernetes configuration API file to your Master cluster nodes. When the Kubernetes cluster installation is complete, an API configuration file is automatically created in the .kube folder of the root user. This file enables the kubectl command to access Kubernetes’ internal API service. Following this step lets you run kubectl commands from any node in the cluster.

Warning: You must perform this on the management server only!

To copy your Kubernetes configuration API file to your Master cluster nodes:
1. Create the .kube folder in the local user directory:

$ mkdir /home//.kube

2. Copy the configuration file from the root user directory to the local user directory:

$ sudo cp /root/.kube/config /home//.kube

3. Change the file owner from the root user to the local user:

$ sudo chown . /home//.kube/config

4. Create the .kube folder in the local user directory on the other nodes:

$ ssh @ mkdir .kube

5. Copy the configuration file from the management node to the other nodes:

$ scp /home//.kube/config @:/home//.kube/

6. As the local user on each server to which you copied the .kube folder, run the following command:

$ sudo usermod -aG docker $USER

This grants the local user the necessary permissions to run Docker commands. Go back to Installing Your Kubernetes Cluster

2.1.4.5.3.7 Creating an env_file in Your Home Directory

After copying your Kubernetes configuration API file to your Master cluster nodes, you must create an env_file in your home directory and set the VIP address as a variable.

Warning: You must perform this on the management server only!

To create an env_file for local users in the user’s home directory: 1. Set a variable that includes the VIP IP address:

$ export VIP_IP=

Note: If you use Kerberos, replace the KRB5_SERVER value with the IP address of your Kerberos server.

2. Do one of the following: • For local users:

$ mkdir /home/$USER/.sqream

3. Create the env_file, verifying that the KRB5_SERVER parameter is set to your server IP:

$ cat <<EOF > /home/$USER/.sqream/env_file
SQREAM_K8S_VIP=$VIP_IP
SQREAM_ADMIN_UI_PORT=8080
SQREAM_DASHBOARD_DATA_COLLECTOR_PORT=8100
SQREAM_DATABASE_NAME=master
SQREAM_K8S_ADMIN_UI=sqream-admin-ui
SQREAM_K8S_DASHBOARD_DATA_COLLECTOR=dashboard-data-collector
SQREAM_K8S_METADATA=sqream-metadata
SQREAM_K8S_NAMESPACE=sqream
SQREAM_K8S_PICKER=sqream-picker
SQREAM_K8S_PROMETHEUS=prometheus
SQREAM_K8S_REGISTRY_PORT=6000
SQREAM_METADATA_PORT=3105
SQREAM_PICKER_PORT=3108
SQREAM_PROMETHEUS_PORT=9090
SQREAM_SPOOL_MEMORY_RATIO=0.25
SQREAM_WORKER_0_PORT=5000
KRB5CCNAME=FILE:/tmp/tgt
KRB5_SERVER=kdc.sq.com:1
KRB5_CONFIG_DIR=${SQREAM_MOUNT_DIR}/krb5
KRB5_CONFIG_FILE=${KRB5_CONFIG_DIR}/krb5.conf
HADOOP_CONFIG_DIR=${SQREAM_MOUNT_DIR}/hadoop
HADOOP_CORE_XML=${HADOOP_CONFIG_DIR}/core-site.xml
HADOOP_HDFS_XML=${HADOOP_CONFIG_DIR}/hdfs-site.xml
EOF
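
Because the shell expands variables such as $VIP_IP while the file is being written, an unset variable silently produces an empty value. The following is a minimal sketch for checking the result; the missing_env_keys helper name is hypothetical, and the keys passed to it are only examples taken from the file above:

```shell
# Sketch: report keys that are absent from an env file or have an empty value.
# Usage: missing_env_keys <env_file> <KEY>...
missing_env_keys() (
  file=$1
  shift
  for key in "$@"; do
    # "KEY=." matches only when at least one character follows the equals sign.
    grep -q "^${key}=." "$file" || echo "$key"
  done
)

# Example usage:
#   missing_env_keys "/home/$USER/.sqream/env_file" SQREAM_K8S_VIP SQREAM_PICKER_PORT
```

If the helper prints nothing, every key that was checked has a non-empty value.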

Go back to Installing Your Kubernetes Cluster

2.1.4.5.3.8 Creating a Base Kubernetes Namespace

After creating an env_file in the user’s home directory, you must create a base Kubernetes namespace. You can create a Kubernetes namespace by running the following command:

$ kubectl create namespace sqream-init

The following is an example of the correct output:

$ namespace/sqream-init created

Go back to Installing Your Kubernetes Cluster

2.1.4.5.3.9 Pushing the env_file File to the Kubernetes Configmap

After creating a base Kubernetes namespace, you must push the env_file to the Kubernetes configmap in the sqream-init namespace. This is done by running the following command:

$ kubectl create configmap sqream-init -n sqream-init --from-env-file=/home/$USER/.sqream/env_file

The following is an example of the correct output:

$ configmap/sqream-init created

Go back to Installing Your Kubernetes Cluster

2.1.4.5.3.10 Installing the NVIDIA Docker2 Toolkit

After pushing the env_file file to the Kubernetes configmap, you must install the NVIDIA Docker2 Toolkit. The NVIDIA Docker2 Toolkit lets users build and run GPU-accelerated Docker containers, and must be run only on GPU servers. The NVIDIA Docker2 Toolkit includes a container runtime library and utilities that automatically configure containers to leverage NVIDIA GPUs.

2.1.4.5.3.11 Installing the NVIDIA Docker2 Toolkit on an x86_64 Bit Processor on CentOS

To install the NVIDIA Docker2 Toolkit on an x86_64 bit processor on CentOS: 1. Add the repository for your distribution:

$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | \
    sudo tee /etc/yum.repos.d/nvidia-docker.repo

2. Install the nvidia-docker2 package and reload the Docker daemon configuration:

$ sudo yum install nvidia-docker2
$ sudo pkill -SIGHUP dockerd

3. Verify that the nvidia-docker2 package has been installed correctly:

$ docker run --runtime=nvidia --rm nvidia/cuda:10.1-base nvidia-smi

The following is an example of the correct output:

$ docker run --runtime=nvidia --rm nvidia/cuda:10.1-base nvidia-smi
$ Unable to find image 'nvidia/cuda:10.1-base' locally
$ 10.1-base: Pulling from nvidia/cuda
$ d519e2592276: Pull complete
$ d22d2dfcfa9c: Pull complete
$ b3afe92c540b: Pull complete
$ 13a10df09dc1: Pull complete
$ 4f0bc36a7e1d: Pull complete
$ cd710321007d: Pull complete
$ Digest: sha256:635629544b2a2be3781246fdddc55cc1a7d8b352e2ef205ba6122b8404a52123
$ Status: Downloaded newer image for nvidia/cuda:10.1-base
$ Sun Feb 14 13:27:58 2021
$ +------+
$ | NVIDIA-SMI 418.87.00 Driver Version: 418.87.00 CUDA Version: 10.1 |
$ |------+------+------+
$ | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
$ | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
$ |======+======+======|
$ | 0 GeForce GTX 105... Off | 00000000:01:00.0 Off | N/A |
$ | 32% 37C P0 N/A / 75W | 0MiB / 4039MiB | 0% Default |
$ +------+------+------+
$
$ +------+
$ | Processes: GPU Memory |
$ | GPU PID Type Process name Usage |
$ |======|
$ | No running processes found |
$ +------+

For more information on installing the NVIDIA Docker2 Toolkit on an x86_64 Bit Processor on CentOS, see NVIDIA Docker Installation - CentOS distributions

2.1.4.5.3.12 Installing the NVIDIA Docker2 Toolkit on an x86_64 Bit Processor on Ubuntu

To install the NVIDIA Docker2 Toolkit on an x86_64 bit processor on Ubuntu: 1. Add the repository for your distribution:

$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
    sudo apt-key add -
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
    sudo tee /etc/apt/sources.list.d/nvidia-docker.list
$ sudo apt-get update

2. Install the nvidia-docker2 package and reload the Docker daemon configuration:

$ sudo apt-get install nvidia-docker2
$ sudo pkill -SIGHUP dockerd

3. Verify that the nvidia-docker2 package has been installed correctly:

$ docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi

For more information on installing the NVIDIA Docker2 Toolkit on an x86_64 Bit Processor on Ubuntu, see NVIDIA Docker Installation - Ubuntu distributions.
Go back to Installing Your Kubernetes Cluster

2.1.4.5.3.13 Modifying the Docker Daemon JSON File for GPU and Compute Nodes

After installing the NVIDIA Docker2 toolkit, you must modify the Docker daemon JSON file for GPU and Compute nodes.

2.1.4.5.3.14 Modifying the Docker Daemon JSON File for GPU Nodes

To modify the Docker daemon JSON file for GPU nodes: 1. Enable GPU and set HTTP access to the local Kubernetes Docker registry.

Note: The Docker daemon JSON file must be modified on all GPU nodes.

Note: Contact your IT department for a virtual IP.

2. Replace the VIP address with your assigned VIP address.
3. Connect as the root user:

$ sudo -i

4. Set a variable that includes the VIP address:

$ export VIP_IP=

5. Write the Docker daemon JSON file, which uses the VIP address set in the previous step:

$ cat <<EOF > /etc/docker/daemon.json
{
  "insecure-registries": ["$VIP_IP:6000"],
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
EOF
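
If VIP_IP is empty when the file is written, the registry entry becomes ":6000" and Docker cannot reach the local registry. The following is a minimal sketch of a guarded variant of the step above; the write_daemon_json helper name is hypothetical, and the JSON content is copied from the GPU node configuration shown in this section:

```shell
# Sketch: write the GPU-node daemon.json only when a VIP address is supplied.
# Usage: write_daemon_json <vip-address> <output-file>
write_daemon_json() (
  vip=$1
  out=$2
  if [ -z "$vip" ]; then
    echo "VIP address is empty; refusing to write $out" >&2
    exit 1
  fi
  cat <<EOF > "$out"
{
  "insecure-registries": ["$vip:6000"],
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
EOF
)

# Example usage (as the root user):
#   write_daemon_json "$VIP_IP" /etc/docker/daemon.json
```

Writing to a scratch file first and inspecting it before copying it into /etc/docker is a cautious alternative.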

6. Apply the changes and restart Docker:

$ systemctl daemon-reload && systemctl restart docker

7. Exit the root user:

$ exit

Go back to Installing Your Kubernetes Cluster

2.1.4.5.3.15 Modifying the Docker Daemon JSON File for Compute Nodes

You must follow this procedure only if you have a Compute node. To modify the Docker daemon JSON file for Compute nodes: 1. Switch to a root user:

$ sudo -i

2. Set a variable that includes a VIP address.

Note: Contact your IT department for a virtual IP.

3. Replace the VIP address with your assigned VIP address.

$ cat <<EOF > /etc/docker/daemon.json
{
  "insecure-registries": ["$VIP_IP:6000"]
}
EOF

4. Restart the services:

$ systemctl daemon-reload && systemctl restart docker

5. Exit the root user:

$ exit

Go back to Installing Your Kubernetes Cluster

2.1.4.5.3.16 Installing the Nvidia-device-plugin Daemonset

After modifying the Docker daemon JSON file for GPU or Compute nodes, you must install the Nvidia-device-plugin daemonset. The Nvidia-device-plugin daemonset is only relevant to GPU nodes.
To install the Nvidia-device-plugin daemonset:
1. Set nvidia.com/gpu to true on all GPU nodes:

$ kubectl label nodes nvidia.com/gpu=true

2. In the command above, specify your GPU node name. For a complete list of node names, run the kubectl get nodes command. The following is an example of the correct output:

$ [root@eks-rhl-1 ~]# kubectl label nodes eks-rhl-1 nvidia.com/gpu=true
$ node/eks-rhl-1 labeled
$ [root@eks-rhl-1 ~]# kubectl label nodes eks-rhl-2 nvidia.com/gpu=true
$ node/eks-rhl-2 labeled
$ [root@eks-rhl-1 ~]# kubectl label nodes eks-rhl-3 nvidia.com/gpu=true
$ node/eks-rhl-3 labeled

Go back to Installing Your Kubernetes Cluster

2.1.4.5.3.17 Creating an Nvidia Device Plugin

After installing the Nvidia-device-plugin daemonset, you must create an Nvidia-device-plugin. You can create an Nvidia-device-plugin by running the following command:

$ kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/1.0.0-beta6/nvidia-device-plugin.yml

If needed, you can check the status of the Nvidia-device-plugin-daemonset pods:

$ kubectl get pods -n kube-system -o wide | grep nvidia-device-plugin

The following is an example of the correct output:

$ NAME                                   READY   STATUS    RESTARTS   AGE
$ nvidia-device-plugin-daemonset-fxfct   1/1     Running   0          6h1m
$ nvidia-device-plugin-daemonset-jdvxs   1/1     Running   0          6h1m
$ nvidia-device-plugin-daemonset-xpmsv   1/1     Running   0          6h1m

Go back to Installing Your Kubernetes Cluster

2.1.4.5.3.18 Checking GPU Resources Allocatable to GPU Nodes

After creating an Nvidia Device Plugin, you must check the GPU resources allocatable to the GPU nodes. Each GPU node has records, such as nvidia.com/gpu: <#>. The # indicates the number of allocatable, or available, GPUs in each node. You can output a description of allocatable resources by running the following command:

$ kubectl describe node | grep -i -A 7 -B 2 allocatable:

The following is an example of the correct output:

$ Allocatable:
$   cpu:                3800m
$   ephemeral-storage:  94999346224
$   hugepages-1Gi:      0
$   hugepages-2Mi:      0
$   memory:             15605496Ki
$   nvidia.com/gpu:     1
$   pods:               110
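
To get a single number for the whole cluster, the nvidia.com/gpu records in the Allocatable sections can be summed. The following is a minimal sketch; the allocatable_gpus helper name is hypothetical, and it assumes the usual kubectl describe node layout in which section headers start in the first column and their entries are indented:

```shell
# Sketch: sum the nvidia.com/gpu values found under "Allocatable:" sections.
# Expects the output of `kubectl describe node` on stdin (one or more nodes).
allocatable_gpus() {
  awk '
    /^Allocatable:/ { in_alloc = 1; next }   # entering an Allocatable section
    /^[^ \t]/       { in_alloc = 0 }         # any other top-level line ends it
    in_alloc && $1 == "nvidia.com/gpu:" { total += $2 }
    END { print total + 0 }
  '
}

# Example usage:
#   kubectl describe node | allocatable_gpus
```

The Capacity section also lists nvidia.com/gpu, which is why the sketch tracks the section it is currently in rather than matching the record name alone.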

Go back to Installing Your Kubernetes Cluster

2.1.4.5.3.19 Preparing the WatchDog Monitor

SQream’s deployment includes installing two watchdog services. These services monitor Kubernetes management and the server’s storage network. You can enable the storage watchdogs by adding entries in the /etc/hosts file on each server:

$  k8s-node1.storage
$  k8s-node2.storage
$  k8s-node3.storage

The following is an example of the correct syntax:

$ 10.0.0.1 k8s-node1.storage
$ 10.0.0.2 k8s-node2.storage
$ 10.0.0.3 k8s-node3.storage
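
Because the entries must exist on every server, it can help to check each hosts file after editing it. The following is a minimal sketch; the missing_storage_hosts helper name is hypothetical:

```shell
# Sketch: print the host names that have no entry in a hosts file.
# Usage: missing_storage_hosts <hosts-file> <hostname>...
missing_storage_hosts() (
  file=$1
  shift
  for name in "$@"; do
    awk -v h="$name" '
      /^[ \t]*#/ { next }                                   # skip comment lines
      { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }  # field 1 is the address
      END { exit !found }
    ' "$file" || echo "$name"
  done
)

# Example usage:
#   missing_storage_hosts /etc/hosts k8s-node1.storage k8s-node2.storage k8s-node3.storage
```

No output means that every name passed to the helper has an entry in the file.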

Go back to Installing Your Kubernetes Cluster

2.1.4.5.4 Installing the SQream Software

Once you’ve prepared the SQream environment for launching it using Kubernetes, you can begin installing the SQream software. The Installing the SQream Software section describes the following:

• Getting the SQream Package
• Setting Up and Configuring Hadoop
• Starting a Local Docker Image Registry
• Installing the Kubernetes Dashboard
• Installing the SQream Prometheus Package

2.1.4.5.4.1 Getting the SQream Package

The first step in installing the SQream software is getting the SQream package. Please contact the SQream Support team to get the sqream_k8s-nnn-DBnnn-COnnn-SDnnn-.tar.gz tarball file. The file name includes the following values:
• sqream_k8s- - the SQream installer version.
• DB - the SQream DB version.
• CO - the SQream console version.
• SD - the SQream Acceleration Studio version.
• arch - the server architecture.

You can extract the contents of the tarball by running the following command:

$ tar -xvf sqream_k8s-1.0.15-DB2020.1.0.2-SD0.7.3-x86_64.tar.gz
$ cd sqream_k8s-1.0.15-DB2020.1.0.2-SD0.7.3-x86_64
$ ls

Extracting the contents of the tarball file generates a new folder with the same name as the tarball file. The following shows the output of the extracted file:

drwxrwxr-x. 2 sqream sqream    22 Jan 27 11:39 license
lrwxrwxrwx. 1 sqream sqream    49 Jan 27 11:39 sqream -> .sqream/sqream-sql-v2020.3.1_stable.x86_64/sqream
-rwxrwxr-x. 1 sqream sqream  9465 Jan 27 11:39 sqream-install
-rwxrwxr-x. 1 sqream sqream 12444 Jan 27 11:39 sqream-start
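
The naming convention described above can be read back from a tarball name, for example to record which versions were installed. The following is a minimal sketch; the describe_package helper and its output labels are hypothetical, and it assumes the hyphen-separated layout of the example file name used in this section:

```shell
# Sketch: split a SQream Kubernetes package name into its version fields.
# Usage: describe_package <tarball-name>
describe_package() (
  name=${1##*/}          # strip any directory prefix
  name=${name%.tar.gz}   # strip the extension
  IFS=-
  set -f                 # disable globbing before word splitting
  set -- $name           # split on hyphens into $1, $2, ...
  shift                  # drop the "sqream_k8s" prefix
  echo "installer=$1"
  shift
  while [ "$#" -gt 1 ]; do
    case $1 in
      DB*) echo "db=${1#DB}" ;;
      CO*) echo "console=${1#CO}" ;;
      SD*) echo "studio=${1#SD}" ;;
    esac
    shift
  done
  echo "arch=$1"         # the last field is the server architecture
)

# Example usage:
#   describe_package sqream_k8s-1.0.15-DB2020.1.0.2-SD0.7.3-x86_64.tar.gz
```

Fields that are absent from a given file name, such as the console version in the example, are simply not printed.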

Go back to Installing Your SQream Software

2.1.4.5.4.2 Setting Up and Configuring Hadoop

After getting the SQream package, you can set up and configure Hadoop by configuring the keytab and krb5.conf files.

Note: You only need to configure the keytab and krb5.conf files if you use Hadoop with Kerberos authentication.

To set up and configure Hadoop:
1. Contact IT for the keytab and krb5.conf files.
2. Copy both files into the respective empty .hadoop/ and .krb5/ directories:

$ cp hdfs.keytab krb5.conf .krb5/
$ cp core-site.xml hdfs-site.xml .hadoop/

The SQream installer automatically copies the above files during the installation process. Go back to Installing Your SQream Software

2.1.4.5.4.3 Starting a Local Docker Image Registry

After getting the SQream package, or (optionally) setting up and configuring Hadoop, you must start a local Docker image registry. Because Kubernetes is based on Docker, you must start the local Docker image registry on the host’s shared folder. This allows all hosts to pull the SQream Docker images. To start a local Docker image registry: 1. Create a Docker registry folder:

$ mkdir /docker-registry/

2. Set the docker_path for the Docker registry folder:

$ export docker_path=

3. Apply the docker-registry service to the cluster:

$ cat .k8s/admin/docker_registry.yaml | envsubst | kubectl create -f -

The following is an example of the correct output:

namespace/sqream-docker-registry created
configmap/sqream-docker-registry-config created
deployment.apps/sqream-docker-registry created
service/sqream-docker-registry created

4. Check the pod status of the docker-registry service:

$ kubectl get pods -n sqream-docker-registry

The following is an example of the correct output:

NAME                                      READY   STATUS    RESTARTS   AGE
sqream-docker-registry-655889fc57-hmg7h   1/1     Running   0          6h40m

Go back to Installing Your SQream Software

2.1.4.5.4.4 Installing the Kubernetes Dashboard

After starting a local Docker image registry, you must install the Kubernetes dashboard. The Kubernetes dashboard lets you see the Kubernetes cluster, nodes, services, and pod status. To install the Kubernetes dashboard: 1. Apply the k8s-dashboard service to the cluster:

$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.0.0/aio/deploy/recommended.yaml

The following is an example of the correct output:

namespace/kubernetes-dashboard created
serviceaccount/kubernetes-dashboard created
service/kubernetes-dashboard created
secret/kubernetes-dashboard-certs created
secret/kubernetes-dashboard-csrf created
secret/kubernetes-dashboard-key-holder created
configmap/kubernetes-dashboard-settings created
role.rbac.authorization.k8s.io/kubernetes-dashboard created
clusterrole.rbac.authorization.k8s.io/kubernetes-dashboard created
rolebinding.rbac.authorization.k8s.io/kubernetes-dashboard created
clusterrolebinding.rbac.authorization.k8s.io/kubernetes-dashboard created
deployment.apps/kubernetes-dashboard created
service/dashboard-metrics-scraper created
deployment.apps/dashboard-metrics-scraper created

2. Grant the user external access to the Kubernetes dashboard:

$ cat .k8s/admin/kubernetes-dashboard-svc-metallb.yaml | envsubst | kubectl create -f -

The following is an example of the correct output:

service/kubernetes-dashboard-nodeport created

3. Apply the cluster-admin-sa.yaml file:

$ kubectl create -f .k8s/admin/cluster-admin-sa.yaml

The following is an example of the correct output:

clusterrolebinding.rbac.authorization.k8s.io/cluster-admin-sa-cluster-admin created

4. Check the pod status of the K8s-dashboard service:

$ kubectl get pods -n kubernetes-dashboard

The following is an example of the correct output:

NAME                                         READY   STATUS    RESTARTS   AGE
dashboard-metrics-scraper-6b4884c9d5-n8p57   1/1     Running   0          4m32s
kubernetes-dashboard-7b544877d5-qc8b4        1/1     Running   0          4m32s

5. Obtain the k8s-dashboard access token:

$ kubectl -n kube-system describe secrets cluster-admin-sa-token

The following is an example of the correct output:

Name:         cluster-admin-sa-token-rbl9p
Namespace:    kube-system
Labels:
Annotations:  kubernetes.io/service-account.name: cluster-admin-sa
              kubernetes.io/service-account.uid: 81866d6d-8ef3-4805-840d-58618235f68d

Type:  kubernetes.io/service-account-token

Data
====
ca.crt:     1025 bytes
namespace:  11 bytes
token:      eyJhbGciOiJSUzI1NiIsImtpZCI6IjRMV09qVzFabjhId09oamQzZGFFNmZBeEFzOHp3SlJOZWdtVm5lVTdtSW8ifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJjbHVzdGVyLWFkbWluLXNhLXRva2VuLXJibDlwIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6ImNsdXN0ZXItYWRtaW4tc2EiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiI4MTg2NmQ2ZC04ZWYzLTQ4MDUtODQwZC01ODYxODIzNWY2OGQiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6a3ViZS1zeXN0ZW06Y2x1c3Rlci1hZG1pbi1zYSJ9.mNhp8JMr5y3hQ44QrvRDCMueyjSHSrmqZcoV00ZC7iBzNUqh3n-fB99CvC_GR15ys43jnfsz0tdsTy7VtSc9hm5ENBI-tQ_mwT1Zc7zJrEtgFiA0o_eyfYZOARdhdyFEJg84bzkIxJFPKkBWb4iPWU1Xb7RibuMCjNTarZMZbqzKYfQEcMZWJ5UmfUqp-HahZZR4BNbjSWybs7t6RWdcQZt6sO_rRCDrOeEJlqKKjx4-5jFZB8Du_0kKmnw2YJmmSCEOXrpQCyXIiZJpX08HyDDYfFp8IGzm61arB8HDA9dN_xoWvuz4Cj8klUtTzL9effJJPjHJlZXcEqQc9hE3jw

6. Navigate to https://:5999.

7. Select the Token radio button, paste the token from the previous command output, and click Sign in. The Kubernetes dashboard is displayed. Go back to Installing Your SQream Software

2.1.4.5.4.5 Installing the SQream Prometheus Package

After installing the Kubernetes dashboard, you must install the SQream Prometheus package. To properly monitor the host and GPU statistics, the exporter services must be installed on each Kubernetes cluster node. This section describes how to install the following:
• node_exporter - collects host data, such as CPU and memory usage.
• nvidia_exporter - collects GPU utilization data.

Note: The steps in this section must be done on all cluster nodes.

To install the sqream-prometheus package, you must do the following:
1. Install the exporter service
2. Check the exporter service
Go back to Installing Your SQream Software

2.1.4.5.4.6 Installing the Exporter Service

To install the exporter service:
1. Create a user and group that will be used to run the exporter services:

$ sudo groupadd --system prometheus && sudo useradd -s /sbin/nologin --system -g prometheus prometheus

2. Extract the sqream_exporters_prometheus.0.1.tar.gz file:

$ cd .prometheus
$ tar -xf sqream_exporters_prometheus.0.1.tar.gz

3. Copy the exporter software files to the /usr/bin directory:

$ cd sqream_exporters_prometheus.0.1
$ sudo cp node_exporter/node_exporter /usr/bin/
$ sudo cp nvidia_exporter/nvidia_exporter /usr/bin/

4. Copy the exporters service file to the /etc/systemd/system/ directory:

$ sudo cp services/node_exporter.service /etc/systemd/system/
$ sudo cp services/nvidia_exporter.service /etc/systemd/system/

5. Set the ownership and permissions of the exporter binaries:

$ sudo chown prometheus:prometheus /usr/bin/node_exporter
$ sudo chmod u+x /usr/bin/node_exporter
$ sudo chown prometheus:prometheus /usr/bin/nvidia_exporter
$ sudo chmod u+x /usr/bin/nvidia_exporter

6. Reload the services:

$ sudo systemctl daemon-reload

7. Start both services and set them to start when the server is booted up: • Node_exporter:

$ sudo systemctl start node_exporter && sudo systemctl enable node_exporter

• Nvidia_exporter:

$ sudo systemctl start nvidia_exporter && sudo systemctl enable nvidia_exporter

2.1.4.5.4.7 Checking the Exporter Status

After installing the exporter service, you must check its status. You can check the exporter status by running the following command:

$ sudo systemctl status node_exporter && sudo systemctl status nvidia_exporter
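
The status command prints a full report for each service. For a scripted check on every node, the following minimal sketch lists only the services that are not active; the inactive_services helper and the SYSTEMCTL override variable are hypothetical:

```shell
# Sketch: print the names of services that systemd does not report as active.
# Usage: inactive_services <service>...
# SYSTEMCTL is a hypothetical override; it defaults to the real systemctl binary.
inactive_services() (
  for svc in "$@"; do
    # `systemctl is-active --quiet` exits with 0 only when the unit is active.
    "${SYSTEMCTL:-systemctl}" is-active --quiet "$svc" || echo "$svc"
  done
)

# Example usage:
#   inactive_services node_exporter nvidia_exporter
```

No output means that both exporters are running.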

Go back to Installing Your SQream Software

2.1.4.5.5 Running the Sqream-install Service

The Running the Sqream-install Service section describes the following:

• Installing Your License • Changing Your Data Ingest Folder • Checking Your System Settings • SQream Installation Command Reference • Controlling Your Kubernetes Cluster Using SQream Flags

2.1.4.5.5.1 Installing Your License

After installing the SQream Prometheus package, you must install your license.
To install your license:
1. Copy your license package to the sqream /license folder.

Note: You do not need to untar the license package after copying it to the /license folder because the installer script does it automatically.

The following flags are mandatory during your first run:

$ sudo ./sqream-install -i -k -m

Note: If you cannot run the script with sudo, verify that you have the right permission (rwx for the user) on the relevant directories (config, log, volume, and data-in directories).

Go back to Running the SQream_install Service.

2.1.4.5.5.2 Changing Your Data Ingest Folder

After installing your license, you must change your data ingest folder. You can change your data ingest folder by running the following command:

$ sudo ./sqream-install -d /media/nfs/sqream/data_in

Go back to Running the SQream_install Service.

2.1.4.5.5.3 Checking Your System Settings

After changing your data ingest folder, you must check your system settings. The following command shows you all the variables that your SQream system is running with:

$ ./sqream-install -s

After optionally checking your system settings, you can use the sqream-start application to control your Kubernetes cluster. Go back to Running the SQream_install Service.

2.1.4.5.5.4 SQream Installation Command Reference

If needed, you can display the sqream-install flag reference by running the following command:

$ ./sqream-install --help

The following shows the sqream-install flag descriptions:

Flag  Function                                                            Note
-i    Loads all the software from the hidden .docker folder.              Mandatory
-k    Loads the license package from the /license directory.              Mandatory
-m    Sets the relative path for all SQream folders under the shared      Mandatory
      filesystem available from all nodes (sqreamdb, config, logs and
      data_in). No other flags are required if you use this flag
      (such as c, v, l or d).
-c    Sets the path for reading and writing SQream configuration          Optional
      files. The default is /etc/sqream/.
-v    Shows the location of the SQream cluster. -v creates a cluster      Optional
      if none exists, and mounts it if one does.
-l    Shows the location of the SQream system startup logs. The logs      Optional
      contain startup and Docker logs. The default is /var/log/sqream/.
-d    Shows the folder containing data that you want to import into       Optional
      or copy from SQream.
-n    Sets the Kubernetes namespace. The default is sqream.               Optional
-N    Deletes a specific Kubernetes namespace and sets the factory        Optional
      default namespace (sqream).
-f    Overwrites existing folders and all files located in mounted        Optional
      directories.
-r    Resets the system configuration. This flag is run without any       Optional
      other variables.
-s    Shows the system settings.                                          Optional
-e    Sets the Kubernetes cluster’s virtual IP address.                   Optional
-h    Help, shows all available flags.                                    Optional

Go back to Running the SQream_install Service.

2.1.4.5.5.5 Controlling Your Kubernetes Cluster Using SQream Flags

You can control your Kubernetes cluster using SQream flags. The following command shows you the available Kubernetes cluster control options:

$ ./sqream-start -h

The following describes the sqream-start flags:

Flag  Function                                                Note
-s    Starts the sqream services, starting metadata, server   Mandatory
      picker, and workers. The number of workers started is
      based on the number of available GPUs.
-p    Sets specific ports to the workers services. You must
      enter the starting port for the sqream-start
      application to allocate it based on the number of
      workers.
-j    Uses an external .json configuration file. The file     The workers must each be
      must be located in the configuration directory.         started individually.
-m    Allocates worker spool memory.                          The workers must each be
                                                              started individually.
-a    Starts the SQream Administration dashboard and
      specifies the listening port.
-d    Deletes all running SQream services.
-h    Shows all available flags.                              Help

Go back to Running the SQream_install Service.

2.1.4.5.6 Using the sqream-start Commands

In addition to controlling your Kubernetes cluster using SQream flags, you can control it using sqream-start commands. The Using the sqream-start Commands section describes the following:

• Starting Your SQream Services • Starting Your SQream Services in Split Mode • Starting the Sqream Studio UI • Stopping the SQream Services • Advanced sqream-start Commands

2.1.4.5.6.1 Starting Your SQream Services

You can run the sqream-start command with the -s flag to start SQream services on all available GPUs:

$ sudo ./sqream-start -s

This command starts the SQream metadata, server picker, and sqream workers on all available GPUs in the cluster. The following is an example of the correct output:

./sqream-start -s
Initializing network watchdogs on 3 hosts...
Network watchdogs are up and running

Initializing 3 worker data collectors ...
Worker data collectors are up and running

Starting Prometheus ...
Prometheus is available at 192.168.5.100:9090

Starting SQream master ...
SQream master is up and running

Starting up 3 SQream workers ...
All SQream workers are up and running, SQream-DB is available at 192.168.5.100:3108
All SQream workers are up and running, SQream-DB is available at 192.168.5.100:3108

Go back to Using the SQream-start Commands.

2.1.4.5.6.2 Starting Your SQream Services in Split Mode

Starting SQream services in split mode refers to running multiple SQream workers on a single GPU. You can do this by running the sqream-start command with the -s and -z flags. In addition, you can define the number of hosts on which to run the multiple workers. In the example below, the command runs the multiple workers on three hosts.

To start SQream services in split mode:

1. Run the following command:

$ ./sqream-start -s -z 3

This command starts the SQream metadata, server picker, and sqream workers on a single GPU for three hosts. The following is an example of the correct output:

Initializing network watchdogs on 3 hosts...
Network watchdogs are up and running

Initializing 3 worker data collectors ...
Worker data collectors are up and running

Starting Prometheus ...
Prometheus is available at 192.168.5.101:9090

Starting SQream master ...
SQream master is up and running

Starting up 9 SQream workers over <#> available GPUs ...
All SQream workers are up and running, SQream-DB is available at 192.168.5.101:3108

2. Verify that all pods are running properly in the k8s cluster (STATUS column):

$ kubectl -n sqream get pods

NAME                                          READY   STATUS     RESTARTS   AGE
prometheus-bcf877867-kxhld                    1/1     Running    0          106s
sqream-metadata-fbcbc989f-6zlkx               1/1     Running    0          103s
sqream-picker-64b8c57ff5-ndfr9                1/1     Running    2          102s
sqream-split-workers-0-1-2-6bdbfbbb86-ml7kn   1/1     Running    0          57s
sqream-split-workers-3-4-5-5cb49d49d7-596n4   1/1     Running    0          57s
sqream-split-workers-6-7-8-6d598f4b68-2n9z5   1/1     Running    0          56s
sqream-workers-start-xj75g                    1/1     Running    0          58s
watchdog-network-management-6dnfh             1/1     Running    0          115s
watchdog-network-management-tfd46             1/1     Running    0          115s
watchdog-network-management-xct4d             1/1     Running    0          115s
watchdog-network-storage-lr6v4                1/1     Running    0          116s
watchdog-network-storage-s29h7                1/1     Running    0          116s
watchdog-network-storage-sx9mw                1/1     Running    0          116s
worker-data-collector-62rxs                   0/1     Init:0/1   0          54s
worker-data-collector-n8jsv                   0/1     Init:0/1   0          55s
worker-data-collector-zp8vf                   0/1     Init:0/1   0          54s
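The STATUS check in step 2 can also be scripted. The following is a minimal sketch and is not part of the SQream tooling: not_running_pods is a hypothetical helper name, and it assumes the default kubectl column layout shown above, in which STATUS is the third column.

```shell
# Hypothetical helper (not shipped with SQream): reads `kubectl get pods`
# output on stdin and prints "NAME STATUS" for every pod that is not Running.
not_running_pods() {
  # NR > 1 skips the header row; $3 is the STATUS column.
  awk 'NR > 1 && $3 != "Running" { print $1, $3 }'
}
```

For example, running kubectl -n sqream get pods | not_running_pods prints nothing when every pod reports Running. Pods in a state such as Init:0/1 may simply still be starting, so it can be worth repeating the check after a short wait.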

Go back to Using the SQream-start Commands.

2.1.4.5.6.3 Starting the Sqream Studio UI

You can run the following command to start the SQream Studio UI (Editor and Dashboard):

$ ./sqream-start -a

The following is an example of the correct output:

$ ./sqream-start -a
Please enter USERNAME: sqream
Please enter PASSWORD: ******
Please enter port value or press ENTER to keep 8080:

Starting up SQream Admin UI...
SQream admin ui is available at 192.168.5.100:8080

Go back to Using the SQream-start Commands.

2.1.4.5.6.4 Stopping the SQream Services

You can run the following command to stop all SQream services:

$ ./sqream-start -d

The following is an example of the correct output:

$ ./sqream-start -d
$ Cleaning all SQream services in sqream namespace ...
$ All SQream service removed from sqream namespace


Go back to Using the SQream-start Commands.

2.1.4.5.6.5 Advanced sqream-start Commands

2.1.4.5.6.6 Controlling Your SQream Spool Size

If you do not specify the SQream spool size, the console automatically distributes the available RAM between all running workers. You can define a specific spool size by running the following command:

$ ./sqream-start -s -m 4

2.1.4.5.6.7 Using a Custom .json File

You can use your own .json file for custom configurations. Your .json file must be placed within the path mounted in the installation. SQream recommends placing your .json file in the configuration folder. The SQream console does not validate the integrity of external .json files. You can use the following command (using the j flag) to set the full path of your .json configuration file:

$ ./sqream-start -s -f .json

This command starts one worker with an external configuration file.

Note: The configuration file must be available in the shared configuration folder.
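Because the SQream console does not validate external .json files, it can help to check the file's syntax before starting a worker with it. The following is a minimal sketch under stated assumptions: json_is_valid is a hypothetical helper name, and the check relies on python3 being available on the host.

```shell
# Hypothetical pre-flight check (not part of sqream-start): succeeds only
# if the file exists and parses as JSON. Assumes python3 is on the PATH.
json_is_valid() {
  [ -f "$1" ] && python3 -m json.tool "$1" > /dev/null 2>&1
}
```

The helper only confirms that the file is well-formed JSON. It cannot tell whether the keys and values are meaningful to SQream.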

2.1.4.5.6.8 Checking the Status of the SQream Services

You can show all running SQream services by running the following command:

$ kubectl get pods -n -o wide

This command shows all running services in the cluster and the nodes on which they are running.

Go back to Using the SQream-start Commands.

2.1.4.5.7 Upgrading Your SQream Version

The Upgrading Your SQream Version section describes the following:

• Before Upgrading Your System
• Upgrading Your System


2.1.4.5.7.1 Before Upgrading Your System

Before upgrading your system you must do the following:

1. Contact SQream support for a new SQream package tarball file.

2. Set a maintenance window.

Note: You must stop the system while upgrading it.

2.1.4.5.7.2 Upgrading Your System

After completing the steps in Before Upgrading Your System above, you can upgrade your system.

To upgrade your system:

1. Extract the contents of the tarball file that you received from SQream support. Make sure to extract the contents to the same directory as in Getting the SQream Package and for the same user:

$ tar -xvf sqream_installer-2.0.5-DB2019.2.1-CO1.6.3-ED3.0.0-x86_64.tar.gz
$ cd sqream_installer-2.0.5-DB2019.2.1-CO1.6.3-ED3.0.0-x86_64/

2. To start the upgrade process, run the following command:

$ ./sqream-install -i

The upgrade process checks if the SQream services are running and prompts you to stop them.

3. Do one of the following:

• Stop the upgrade by writing No.
• Continue the upgrade by writing Yes.

If you continue upgrading, all running SQream workers (master and editor) are stopped. When all services have been stopped, the new version is loaded.

Note: SQream periodically upgrades its metadata structure. If an upgrade version includes an upgraded metadata service, an approval request message is displayed. This approval is required to finish the upgrade process. Because SQream supports only specific metadata versions, all SQream services must be upgraded at the same time.

4. When SQream has successfully upgraded, load the SQream console and restart your services.

For questions, contact SQream Support.

2.1.4.6 Running SQream in a Docker Container

This document describes how to prepare your machine’s environment for installing and running SQream in a Docker container.

This page describes the following:


• Running SQream in a Docker Container
  – Setting Up a Host
  – Installing the Docker Engine (Community Edition)
  – Docker Post-Installation
  – Installing the Nvidia Docker2 ToolKit
  – Installing the SQream Software
  – Using the SQream Console

2.1.4.6.1 Setting Up a Host

2.1.4.6.1.1 Operating System Requirements

SQream was tested and verified on the following versions of Linux:

• x86 CentOS/RHEL 7.6 - 7.9
• IBM RHEL 7.6

SQream recommends installing a clean OS on the host to avoid any installation issues.

Warning: Docker-based installation supports only single-host deployment and cannot be used on a multi-node cluster. If you install Docker on a single host, you will not be able to scale it to a multi-node cluster.

2.1.4.6.1.2 Creating a Local User

To run SQream in a Docker container you must create a local user.

To create a local user:

1. Add a local user:

$ useradd -m -U

2. Set the local user’s password:

$ passwd

3. Add the local user to the wheel group:

$ usermod -aG wheel

You can remove the local user from the wheel group when you have completed the installation.

4. Log out and log back in as the local user.
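After logging back in, you can confirm that the group change took effect. The following is a minimal sketch: user_in_group is a hypothetical helper name, and the user name sqream in the usage line is an assumption, chosen because later steps write limits for a user of that name.

```shell
# Hypothetical helper: succeeds if user $1 is a member of group $2.
user_in_group() {
  # id -nG prints the user's group names separated by spaces;
  # grep -x matches only a whole-line (whole-name) hit.
  id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx -- "$2"
}
```

For example, user_in_group sqream wheel && echo "sqream is in wheel" prints the message only when the membership exists. The same check applies later to the docker group.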


2.1.4.6.1.3 Setting a Local Language

After creating a local user you must set a local language.

To set a local language:

1. Set the local language:

$ sudo localectl set-locale LANG=en_US.UTF-8

2. Set the time zone used for the system time and date:

$ sudo timedatectl set-timezone Asia/Jerusalem

You can run the timedatectl list-timezones command to list the available time zones and find your own.

2.1.4.6.1.4 Adding the EPEL Repository

After setting a local language you must add the EPEL repository.

To add the EPEL repository:

1. As a root user, install or upgrade the epel-release-latest-7.noarch.rpm package:

$ sudo rpm -Uvh http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

2.1.4.6.1.5 Installing the Required NTP Packages

After adding the EPEL repository, you must install the required NTP packages. You can install the required NTP packages by running the following command:

$ sudo yum install ntp pciutils python36 kernel-devel-$(uname -r) kernel-headers-$(uname -r) gcc

2.1.4.6.1.6 Installing the Recommended Tools

After installing the required NTP packages you must install the recommended tools. SQream recommends installing the following tools:

$ sudo yum install bash-completion.noarch vim-enhanced.x86_64 vim-common.x86_64 net-tools iotop htop psmisc screen xfsprogs wget yum-utils deltarpm dos2unix

2.1.4.6.1.7 Updating to the Current Version of the Operating System

After installing the recommended tools, SQream recommends updating to the current version of the operating system. Updating is not recommended if the Nvidia driver has not been installed.


2.1.4.6.1.8 Configuring the NTP Package

After updating to the current version of the operating system you must configure the NTP package.

To configure the NTP package:

1. Add your local servers to the NTP configuration.

2. Configure the ntpd service to begin running when your machine is started:

$ sudo systemctl enable ntpd
$ sudo systemctl start ntpd
$ sudo ntpq -p

2.1.4.6.1.9 Configuring the Performance Profile

After configuring the NTP package you must configure the performance profile.

To configure the performance profile:

1. Switch the active profile:

$ sudo tuned-adm profile throughput-performance

2. Set the default run level to multi-user:

$ sudo systemctl set-default multi-user.target

2.1.4.6.1.10 Configuring Your Security Limits

After configuring the performance profile you must configure your security limits. Configuring your security limits refers to configuring the number of open files, processes, and so on.

To configure your security limits:

1. Run the bash shell as a super-user:

$ sudo bash

2. Run the following command:

$ echo -e "sqream soft nproc 500000\nsqream hard nproc 500000\nsqream soft nofile 500000\nsqream hard nofile 500000\nsqream soft core unlimited\nsqream hard core unlimited" >> /etc/security/limits.conf

3. Run the following command:

$ echo -e "vm.dirty_background_ratio = 5 \n vm.dirty_ratio = 10 \n vm.swappiness = 10 \n vm.zone_reclaim_mode = 0 \n vm.vfs_cache_pressure = 200 \n" >> /etc/sysctl.conf
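Because the commands above append to the files with >>, running them twice leaves duplicate entries. The following is a minimal sketch for reading back what a sysctl-style file currently sets for a key: sysctl_conf_value is a hypothetical helper name, and the sketch assumes simple key = value lines like those written above.

```shell
# Hypothetical helper: prints the last value assigned to key $1 in the
# sysctl-style file $2 (defaults to /etc/sysctl.conf).
sysctl_conf_value() {
  awk -F= -v key="$1" '
    {
      k = $1; v = $2
      # Trim the surrounding blanks that the echo command above leaves.
      gsub(/^[ \t]+|[ \t]+$/, "", k)
      gsub(/^[ \t]+|[ \t]+$/, "", v)
      if (k == key) last = v
    }
    END { if (last != "") print last }
  ' "${2:-/etc/sysctl.conf}"
}
```

For example, sysctl_conf_value vm.swappiness reports the value that the file sets for that key. Printing the last assignment reflects the usual behavior in which a later line in the same file overrides an earlier one.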


2.1.4.6.1.11 Disabling Automatic Bug-Reporting Tools

After configuring your security limits you must disable the following automatic bug-reporting tools:

• abrt-ccpp.service
• abrtd.service
• abrt-oops.service
• abrt-pstoreoops.service
• abrt-vmcore.service
• abrt-xorg.service

You can disable and stop the above bug-reporting tools by running the following command:

$ for i in abrt-ccpp.service abrtd.service abrt-oops.service abrt-pstoreoops.service abrt-vmcore.service abrt-xorg.service ; do sudo systemctl disable $i; sudo systemctl stop $i; done

2.1.4.6.1.12 Preparing the Nvidia CUDA Driver for Installation

After disabling the automatic bug-reporting tools you must prepare the Nvidia CUDA driver for installation.

To prepare the Nvidia CUDA driver for installation:

1. Reboot all servers.

2. Verify that the Tesla NVIDIA card has been installed and is detected by the system:

$ lspci | grep -i nvidia

The correct output is a list of Nvidia graphic cards. If you do not receive this output, verify that an NVIDIA GPU card has been installed.

3. Check whether the open-source upstream Nvidia driver (nouveau) is running:

$ lsmod | grep nouveau

No output should be generated.

4. If you receive any output, do the following:

1. Disable the open-source upstream Nvidia driver:

$ sudo bash
$ echo "blacklist nouveau" > /etc/modprobe.d/blacklist-nouveau.conf
$ echo "options nouveau modeset=0" >> /etc/modprobe.d/blacklist-nouveau.conf
$ dracut --force
$ modprobe --showconfig | grep nouveau

2. Reboot the server and verify that the nouveau module has not been loaded:

$ lsmod | grep nouveau
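When this check is part of a provisioning script, an exit status is easier to act on than visually inspecting the output. The following is a minimal sketch: nouveau_loaded is a hypothetical helper name, and it assumes the standard lsmod layout in which the module name is the first column.

```shell
# Hypothetical helper: reads `lsmod` output on stdin and succeeds only if
# a module named exactly "nouveau" appears in the first column.
nouveau_loaded() {
  awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}
```

For example, lsmod | nouveau_loaded && echo "nouveau is still loaded" prints the message only while the module is present.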


2.1.4.6.1.13 Installing the Nvidia CUDA Driver

After preparing the Nvidia CUDA driver for installation you must install it.

To install the Nvidia CUDA driver:

1. Check if the Nvidia CUDA driver has already been installed:

$ nvidia-smi

The following is an example of the correct output:

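nvidia-smi
Wed Oct 30 14:05:42 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000004:04:00.0 Off |                    0 |
| N/A   32C    P0    37W / 300W |      0MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  On   | 00000035:03:00.0 Off |                    0 |
| N/A   33C    P0    37W / 300W |      0MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+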

2. Verify that the installed CUDA version shown in the output above is 10.1.

3. Do one of the following:

1. If CUDA version 10.1 has already been installed, skip to Installing the Docker Engine (Community Edition).

2. If CUDA version 10.1 has not been installed yet, continue with Step 4 below.

4. Do one of the following:

• Install CUDA driver version 10.1 for x86_64.

• Install CUDA driver version 10.1 for IBM Power9.
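The version check in step 2 above can be scripted by reading the banner line of the nvidia-smi output. The following is a minimal sketch: cuda_version is a hypothetical helper name, and it assumes the banner contains the text "CUDA Version:" as in the example output.

```shell
# Hypothetical helper: reads `nvidia-smi` output on stdin and prints the
# value that follows "CUDA Version:" in the banner line.
cuda_version() {
  sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}
```

For example, [ "$(nvidia-smi | cuda_version)" = "10.1" ] && echo "CUDA 10.1 detected" prints the message only when the reported version is 10.1. If nvidia-smi is missing or prints no banner, the helper prints nothing and the comparison fails.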

2.1.4.6.1.14 Installing the CUDA Driver Version 10.1 for x86_64

To install the CUDA driver version 10.1 for x86_64:


1. Make the following target platform selections:

• Operating system: Linux
• Architecture: x86_64
• Distribution: CentOS
• Version: 7
• Installer type: the relevant installer type

For installer type, SQream recommends selecting runfile (local). The available selections show only the supported platforms.

2. Download the base installer for Linux CentOS 7 x86_64.

3. Install the base installer for Linux CentOS 7 x86_64:

1. Run the following command:

$ sudo sh cuda_10.1.105_418.39_linux.run

2. Follow the command line prompts.

4. Enable the Nvidia service to start at boot and start it:

$ sudo systemctl enable nvidia-persistenced.service && sudo systemctl start nvidia-persistenced.service

5. Create a symbolic link from the /etc/systemd/system/multi-user.target.wants/nvidia-persistenced.service file to the /usr/lib/systemd/system/nvidia-persistenced.service file.

6. Reboot the server.

7. Verify that the Nvidia driver has been installed and shows all available GPUs:

$ nvidia-smi

nvidia-smi
Wed Oct 30 14:05:42 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000004:04:00.0 Off |                    0 |
| N/A   32C    P0    37W / 300W |      0MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  On   | 00000035:03:00.0 Off |                    0 |
| N/A   33C    P0    37W / 300W |      0MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
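The symbolic link in step 5 of the procedure above is described in words only. The following is a minimal sketch of one way to create it: link_unit is a hypothetical helper name, and the ln options are a choice made for this example rather than something the procedure prescribes.

```shell
# Hypothetical helper: creates a symbolic link named after the unit file.
# $1 = unit file the link should point at
# $2 = directory in which the link is created
link_unit() {
  # -s creates a symbolic link, -f replaces an existing link, and -n avoids
  # following an existing link that points at a directory.
  ln -sfn "$1" "$2/$(basename "$1")"
}
```

On the host, the equivalent single command, run with root privileges, is sudo ln -sfn /usr/lib/systemd/system/nvidia-persistenced.service /etc/systemd/system/multi-user.target.wants/nvidia-persistenced.service. If systemctl enable has already created the link, the command simply replaces it with an identical one.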

2.1.4.6.1.15 Installing the CUDA Driver Version 10.1 for IBM Power9

To install the CUDA driver version 10.1 for IBM Power9:

1. Make the following target platform selections:

• Operating system: Linux
• Architecture: ppc64le
• Distribution: RHEL
• Version: 7
• Installer type: the relevant installer type

For installer type, SQream recommends selecting runfile (local). The available selections show only the supported platforms.

To disable the udev rule:

1. Copy the file to the /etc/udev/rules.d directory.

2. Comment out, remove, or change the hot-pluggable memory rule located in the file copied to the /etc/udev/rules.d directory. This prevents it from affecting the Power9 Nvidia systems:

• Run the following on RHEL version 7.6 or later:

$ sudo cp /lib/udev/rules.d/40-redhat.rules /etc/udev/rules.d
$ sudo sed -i 's/SUBSYSTEM!="memory",.*GOTO="memory_hotplug_end"/SUBSYSTEM=="*", GOTO="memory_hotplug_end"/' /etc/udev/rules.d/40-redhat.rules

4. Enable the nvidia-persistenced.service file:

$ sudo systemctl enable nvidia-persistenced.service

5. Create a symbolic link from the /etc/systemd/system/multi-user.target.wants/nvidia-persistenced.service file to the /usr/lib/systemd/system/nvidia-persistenced.service file.

6. Reboot your system to initialize the above modifications.

7. Verify that the Nvidia driver and the nvidia-persistenced.service files are running:


$ nvidia-smi

The following is the correct output:

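nvidia-smi
Wed Oct 30 14:05:42 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000004:04:00.0 Off |                    0 |
| N/A   32C    P0    37W / 300W |      0MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  On   | 00000035:03:00.0 Off |                    0 |
| N/A   33C    P0    37W / 300W |      0MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+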

8. Verify that the nvidia-persistenced service is running:

$ systemctl status nvidia-persistenced

The following is the correct output:

[root@gpudb ~]# systemctl status nvidia-persistenced
● nvidia-persistenced.service - NVIDIA Persistence Daemon
   Loaded: loaded (/usr/lib/systemd/system/nvidia-persistenced.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2019-10-15 21:43:19 KST; 11min ago
  Process: 8257 ExecStart=/usr/bin/nvidia-persistenced --verbose (code=exited, status=0/SUCCESS)
 Main PID: 8265 (nvidia-persiste)
    Tasks: 1
   Memory: 21.0M
   CGroup: /system.slice/nvidia-persistenced.service
           └─8265 /usr/bin/nvidia-persistenced --verbose

2.1.4.6.2 Installing the Docker Engine (Community Edition)

After installing the Nvidia CUDA driver you must install the Docker engine.

This section describes how to install the Docker engine on the following processors:

• Using x86_64 processor on CentOS
• Using x86_64 processor on Ubuntu
• Using IBM Power9 (PPC64le) processor

2.1.4.6.2.1 Installing the Docker Engine Using an x86_64 Processor on CentOS

The x86_64 processor supports installing the Docker Community Edition (CE) versions 18.03 and higher. For more information on installing the Docker Engine CE on an x86_64 processor, see Install Docker Engine on CentOS

2.1.4.6.2.2 Installing the Docker Engine Using an x86_64 Processor on Ubuntu

The x86_64 processor supports installing the Docker Community Edition (CE) versions 18.03 and higher. For more information on installing the Docker Engine CE on an x86_64 processor, see Install Docker Engine on Ubuntu

2.1.4.6.2.3 Installing the Docker Engine on an IBM Power9 Processor

The IBM Power9 processor only supports installing the Docker Community Edition (CE) version 18.03.

You can install the Docker Engine on an IBM Power9 processor by running the following commands:

$ wget http://ftp.unicamp.br/pub/ppc64el/rhel/7_1/docker-ppc64el/container-selinux-2.9-4.el7.noarch.rpm
$ wget http://ftp.unicamp.br/pub/ppc64el/rhel/7_1/docker-ppc64el/docker-ce-18.03.1.ce-1.el7.centos.ppc64le.rpm
$ yum install -y container-selinux-2.9-4.el7.noarch.rpm docker-ce-18.03.1.ce-1.el7.centos.ppc64le.rpm

For more information on installing the Docker Engine CE on an IBM Power9 processor, see Install Docker Engine on Ubuntu.

2.1.4.6.3 Docker Post-Installation

After installing the Docker engine you must configure Docker on your local machine.

To configure Docker on your local machine:

1. Enable Docker to start on boot:

$ sudo systemctl enable docker && sudo systemctl start docker

2. Enable managing Docker as a non-root user:

$ sudo usermod -aG docker $USER

3. Log out and log back in via SSH. This causes Docker to re-evaluate your group membership.


4. Verify that you can run the following Docker command as a non-root user (without sudo):

$ docker run hello-world

If you can run the above Docker command as a non-root user, the following occur:

• Docker downloads a test image and runs it in a container.
• When the container runs, it prints an informational message and exits.

For more information on Docker post-installation steps, see Docker Post-Installation.

2.1.4.6.4 Installing the Nvidia Docker2 ToolKit

After configuring Docker on your local machine you must install the Nvidia Docker2 ToolKit. The NVIDIA Docker2 Toolkit lets you build and run GPU-accelerated Docker containers. The Toolkit includes a container runtime library and related utilities for automatically configuring containers to leverage NVIDIA GPUs.

This section describes the following:

• Installing the NVIDIA Docker2 Toolkit on an x86_64 processor.
• Installing the NVIDIA Docker2 Toolkit on a PPC64le processor.

2.1.4.6.4.1 Installing the NVIDIA Docker2 Toolkit on an x86_64 Processor

This section describes the following:

• Installing the NVIDIA Docker2 Toolkit on a CentOS operating system
• Installing the NVIDIA Docker2 Toolkit on an Ubuntu operating system

2.1.4.6.4.2 Installing the NVIDIA Docker2 Toolkit on a CentOS Operating System

To install the NVIDIA Docker2 Toolkit on a CentOS operating system:

1. Install the repository for your distribution:

$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | \
  sudo tee /etc/yum.repos.d/nvidia-docker.repo

2. Install the nvidia-docker2 package and reload the Docker daemon configuration:

$ sudo yum install nvidia-docker2
$ sudo pkill -SIGHUP dockerd

3. Do one of the following:

• If you received an error when installing the nvidia-docker2 package, skip to Step 4.
• If you successfully installed the nvidia-docker2 package, skip to Step 5.

4. Do the following:

1. Run the sudo vi /etc/yum.repos.d/nvidia-docker.repo command if the following error is displayed when installing the nvidia-docker2 package:

https://nvidia.github.io/nvidia-docker/centos7/ppc64le/repodata/repomd.xml: [Errno -1] repomd.xml signature could not be verified for nvidia-docker

2. Change repo_gpgcheck=1 to repo_gpgcheck=0.

5. Verify that the NVIDIA Docker runtime has been installed correctly:

$ docker run --runtime=nvidia --rm nvidia/cuda:10.1-base nvidia-smi

For more information on installing the NVIDIA Docker2 Toolkit on a CentOS operating system, see Installing the NVIDIA Docker2 Toolkit on a CentOS operating system
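The $distribution variable in step 1 is built by sourcing /etc/os-release and joining its ID and VERSION_ID fields. The following is a minimal sketch of the same derivation: distribution_id is a hypothetical helper name, and the optional file argument exists only so that the logic can be tried against a sample file.

```shell
# Hypothetical helper mirroring the $distribution assignment in step 1.
# $1 = os-release file to read (defaults to /etc/os-release).
distribution_id() {
  # The subshell keeps the sourced variables out of the caller's shell.
  ( . "${1:-/etc/os-release}" && printf '%s%s\n' "$ID" "$VERSION_ID" )
}
```

With ID="centos" and VERSION_ID="7" the helper prints centos7, which is the path component that appears in the repository URL quoted in the error message above.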

2.1.4.6.4.3 Installing the NVIDIA Docker2 Toolkit on an Ubuntu Operating System

To install the NVIDIA Docker2 Toolkit on an Ubuntu operating system:

1. Install the repository for your distribution:

$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
  sudo apt-key add -
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list
$ sudo apt-get update

2. Install the nvidia-docker2 package and reload the Docker daemon configuration:

$ sudo apt-get install nvidia-docker2
$ sudo pkill -SIGHUP dockerd

3. Do one of the following:

• If you received an error when installing the nvidia-docker2 package, skip to Step 4.
• If you successfully installed the nvidia-docker2 package, skip to Step 5.

4. Do the following:

1. Run the sudo vi /etc/yum.repos.d/nvidia-docker.repo command if the following error is displayed when installing the nvidia-docker2 package:

https://nvidia.github.io/nvidia-docker/centos7/ppc64le/repodata/repomd.xml: [Errno -1] repomd.xml signature could not be verified for nvidia-docker

2. Change repo_gpgcheck=1 to repo_gpgcheck=0.

5. Verify that the NVIDIA Docker runtime has been installed correctly:

$ docker run --runtime=nvidia --rm nvidia/cuda:10.1-base nvidia-smi

For more information on installing the NVIDIA Docker2 Toolkit on an Ubuntu operating system, see Installing the NVIDIA Docker2 Toolkit on an Ubuntu operating system


2.1.4.6.4.4 Installing the NVIDIA Docker2 Toolkit on a PPC64le Processor

This section describes how to install the NVIDIA Docker2 Toolkit on an IBM RHEL operating system.

To install the NVIDIA Docker2 Toolkit on an IBM RHEL operating system:

1. Import the repository and install the libnvidia-container and the nvidia-container-runtime containers:

$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | \
  sudo tee /etc/yum.repos.d/nvidia-docker.repo
$ sudo yum install -y libnvidia-container*

2. Do one of the following:

• If you received an error when installing the containers, skip to Step 3.
• If you successfully installed the containers, skip to Step 4.

3. Do the following:

1. Run the sudo vi /etc/yum.repos.d/nvidia-docker.repo command if the following error is displayed when installing the containers:

https://nvidia.github.io/nvidia-docker/centos7/ppc64le/repodata/repomd.xml: [Errno -1] repomd.xml signature could not be verified for nvidia-docker

2. Change repo_gpgcheck=1 to repo_gpgcheck=0.

3. Install the libnvidia-container container.

$ sudo yum install -y libnvidia-container*

4. Install the nvidia-container-runtime container:

$ sudo yum install -y nvidia-container-runtime*

5. Add nvidia runtime to the Docker daemon:

$ sudo mkdir -p /etc/systemd/system/docker.service.d/
$ sudo vi /etc/systemd/system/docker.service.d/override.conf

[Service]
ExecStart=
ExecStart=/usr/bin/dockerd

6. Restart Docker:

$ sudo systemctl daemon-reload
$ sudo systemctl restart docker

7. Verify that the NVIDIA-Docker run has been installed correctly:


$ docker run --runtime=nvidia --rm nvidia/cuda-ppc64le nvidia-smi
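The repo_gpgcheck change described in the steps above is made by hand in an editor. The following is a minimal sketch of a non-interactive equivalent: disable_repo_gpgcheck is a hypothetical helper name, and keeping a .bak copy is a choice made for this example.

```shell
# Hypothetical helper: rewrites repo_gpgcheck=1 to repo_gpgcheck=0 in the
# repo file given as $1, keeping a .bak copy of the original next to it.
disable_repo_gpgcheck() {
  cp "$1" "$1.bak" &&
    sed 's/^repo_gpgcheck=1$/repo_gpgcheck=0/' "$1.bak" > "$1"
}
```

Because /etc/yum.repos.d/nvidia-docker.repo is owned by root, the helper has to be defined and called from a root shell to change that file. Lines other than repo_gpgcheck=1 are left untouched.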

2.1.4.6.4.5 Accessing the Hadoop and Kubernetes Configuration Files

The information in this section is optional and is only relevant for Hadoop users. If you require Hadoop and Kerberos (Krb5) connectivity, contact your IT department for access to the following configuration files:

• Hadoop configuration files:
  – core-site.xml
  – hdfs-site.xml

• Kerberos files:
  – Configuration file - krb5.conf
  – Kerberos Hadoop client certificate - hdfs.keytab

Once you have the above files, you must copy them into the correct folders in your working directory.

For more information about the correct directory to copy the above files into, see the Installing the SQream Software section below.

For related information, see the following sections:

• Configuring the Hadoop and Kubernetes Configuration Files.
• Setting the Hadoop and Kubernetes Configuration Parameters.

2.1.4.6.5 Installing the SQream Software

2.1.4.6.5.1 Preparing Your Local Environment

After installing the Nvidia Docker2 ToolKit you must prepare your local environment.

The Linux user preparing the local environment must have read/write access to the following directories for the SQream software to correctly read and write the required resources:

• Log directory - default: /var/log/sqream/
• Configuration directory - default: /etc/sqream/
• Cluster directory - the location where SQream writes its DB system, such as /mnt/sqreamdb
• Ingest directory - the location where the required data is loaded, such as /mnt/data_source/
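The read/write requirement can be checked before running the installer. The following is a minimal sketch: check_rw_dirs is a hypothetical helper name, and it tests access for whichever user runs it, so it should be run as the Linux user that prepares the environment.

```shell
# Hypothetical helper: prints every directory argument that the current
# user cannot both read and write, and fails if there is at least one.
check_rw_dirs() {
  rc=0
  for dir in "$@"; do
    if [ ! -d "$dir" ] || [ ! -r "$dir" ] || [ ! -w "$dir" ]; then
      echo "no read/write access: $dir"
      rc=1
    fi
  done
  return "$rc"
}
```

For example, check_rw_dirs /var/log/sqream/ /etc/sqream/ /mnt/sqreamdb /mnt/data_source/ checks the default and example locations listed above. A directory that does not exist yet is reported in the same way as one without access.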

2.1.4.6.5.2 Deploying the SQream Software

After preparing your local environment you must deploy the SQream software. Deploying the SQream software requires you to access and extract the required files and to place them in the correct directory.

To deploy the SQream software:

1. Contact the SQream Support team for access to the sqream_installer-nnn-DBnnn-COnnn-EDnnn-.tar.gz file.

The sqream_installer-nnn-DBnnn-COnnn-EDnnn-.tar.gz file includes the following parameter values:

• sqream_installer-nnn - sqream installer version
• DBnnn - SQreamDB version
• COnnn - SQream console version
• EDnnn - SQream editor version
• arch - server arch (applicable to x86_64 and ppc64le)

2. Extract the tarball file:

$ tar -xvf sqream_installer-1.1.5-DB2019.2.1-CO1.5.4-ED3.0.0-x86_64.tar.gz

When the tarball file has been extracted, a new folder will be created. The new folder is automatically given the name of the tarball file:

drwxrwxr-x 9 sqream sqream       4096 Aug 11 11:51 sqream_installer-1.1.5-DB2019.2.1-CO1.5.4-ED3.0.0-x86_64/
-rw-rw-r-- 1 sqream sqream 3130398797 Aug 11 11:20 sqream_installer-1.1.5-DB2019.2.1-CO1.5.4-ED3.0.0-x86_64.tar.gz

3. Change the directory to the new folder that you created in the previous step.

4. Verify that the folder you just created contains all of the required files:

$ ls -la

The following is an example of the files included in the new folder:

drwxrwxr-x. 10 sqream sqream  198 Jun  3 17:57 .
drwx------. 25 sqream sqream 4096 Jun  7 18:11 ..
drwxrwxr-x.  2 sqream sqream  226 Jun  7 18:09 .docker
drwxrwxr-x.  2 sqream sqream   64 Jun  3 12:55 .hadoop
drwxrwxr-x.  2 sqream sqream 4096 May 31 14:18 .install
drwxrwxr-x.  2 sqream sqream   39 Jun  3 12:53 .krb5
drwxrwxr-x.  2 sqream sqream   22 May 31 14:18 license
drwxrwxr-x.  2 sqream sqream   82 May 31 14:18 .sqream
-rwxrwxr-x.  1 sqream sqream 1712 May 31 14:18 sqream-console
-rwxrwxr-x.  1 sqream sqream 4608 May 31 14:18 sqream-install
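The comparison against this listing can be automated. The following is a minimal sketch: check_installer_dir is a hypothetical helper name, and treating the entries in the example listing as the complete required set is an assumption, because the listing is given only as an example.

```shell
# Hypothetical helper: checks that the extracted installer folder ($1)
# contains the entries from the example listing; prints the missing ones.
check_installer_dir() {
  rc=0
  for entry in .docker .hadoop .install .krb5 license .sqream \
               sqream-console sqream-install; do
    if [ ! -e "$1/$entry" ]; then
      echo "missing: $entry"
      rc=1
    fi
  done
  return "$rc"
}
```

For example, running check_installer_dir . from inside the new folder prints nothing and succeeds when every listed entry is present.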

For information relevant to Hadoop users, see the following sections:

• Accessing the Hadoop and Kubernetes Configuration Files.
• Configuring the Hadoop and Kubernetes Configuration Files.
• Setting the Hadoop and Kubernetes Configuration Parameters.

2.1.4.6.5.3 Configuring the Hadoop and Kubernetes Configuration Files

The information in this section is optional and is only relevant for Hadoop users. If you require Hadoop and Kerberos (Krb5) connectivity, you must copy the Hadoop and Kerberos files into the correct folders in your working directory as shown below:

• .hadoop/core-site.xml
• .hadoop/hdfs-site.xml
• .krb5/krb5.conf
• .krb5/hdfs.keytab

For related information, see the following sections:

• Accessing the Hadoop and Kubernetes Configuration Files.
• Setting the Hadoop and Kubernetes Configuration Parameters.
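The copy step above can be sketched as a small helper. This is an illustration only: copy_hadoop_krb5_files is a hypothetical name, and the sketch assumes that the four files you received sit together in one source folder.

```shell
# Hypothetical helper: copies the four files into the hidden folders of
# the installer directory.
# $1 = folder holding the files received from your IT department
# $2 = extracted installer folder (defaults to the current directory)
copy_hadoop_krb5_files() {
  src=$1
  dest=${2:-.}
  mkdir -p "$dest/.hadoop" "$dest/.krb5" &&
    cp "$src/core-site.xml" "$src/hdfs-site.xml" "$dest/.hadoop/" &&
    cp "$src/krb5.conf" "$src/hdfs.keytab" "$dest/.krb5/"
}
```

The helper stops at the first failing copy and returns a non-zero status, so a missing source file is reported by cp rather than silently skipped.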

2.1.4.6.5.4 Configuring the SQream Software

After deploying the SQream software, and optionally configuring the Hadoop and Kubernetes configuration files, you must configure the SQream software. Configuring the SQream software requires you to do the following:

• Configure your local environment
• Understand the sqream-install flags
• Install your SQream license
• Validate your SQream license
• Change your data ingest folder

2.1.4.6.5.5 Configuring Your Local Environment

Once you’ve downloaded the SQream software, you can begin configuring your local environment. The following commands must be run (as sudo) from the directory in which you saved your packages. For example, you may have saved your packages in /home/sqream/sqream-console-package/.

The following table shows the flags that you can use to configure your local directory:


Flag | Function | Note
-i | Loads all software from the hidden folder .docker. | Mandatory
-k | Loads all license packages from the /license directory. | Mandatory
-f | Overwrites existing folders. Note: Using -f overwrites all files located in mounted directories. | Mandatory
-c | Defines the origin path for writing/reading SQream configuration files. The default location is /etc/sqream/. | If you are installing the Docker version on a server that already works with SQream, do not use the default path.
-v | The SQream cluster location. If a cluster does not exist yet, -v creates one. If a cluster already exists, -v mounts it. | Mandatory
-l | SQream system startup logs location, including startup logs and docker logs. The default location is /var/log/sqream/. |
-d | The directory containing customer data to be imported and/or copied to SQream. |
-s | Shows system settings. |
-r | Resets the system configuration. This value is run without any other variables. | Mandatory
-h | Help. Shows the available flags. | Mandatory
-K | Runs license validation. |
-e | Used for inserting your Krb5 server DNS name. For more information on setting your Kerberos configuration parameters, see Setting the Hadoop and Kubernetes Configuration Parameters. |
-p | Used for inserting your Kerberos user name. For more information on setting your Kerberos configuration parameters, see Setting the Hadoop and Kubernetes Configuration Parameters. |

2.1.4.6.5.6 Installing Your License

Once you’ve configured your local environment, you must install your license by copying it into the SQream installation package folder located in the ./license folder:

$ sudo ./sqream-install -k

You do not need to extract the license package after uploading it into the ./license folder.

2.1.4.6.5.7 Validating Your License

You can validate the license package located in the /license folder by running the following command:

$ sudo ./sqream-install -K

The following mandatory flags must be used in the first run:

$ sudo ./sqream-install -i -k -v


The following is an example of the correct command syntax:

$ sudo ./sqream-install -i -k -c /etc/sqream -v /home/sqream/sqreamdb -l /var/log/sqream -d /home/sqream/data_ingest

2.1.4.6.5.8 Setting the Hadoop and Kubernetes Connectivity Parameters

The information in this section is optional, and is only relevant for Hadoop users. If you require Hadoop and Kerberos (Krb5) connectivity, you must set their connectivity parameters.

The following is the correct syntax when setting the Hadoop and Kerberos connectivity parameters:

$ sudo ./sqream-install -p -e :

The following is an example of setting the Hadoop and Kubernetes connectivity parameters:

$ sudo ./sqream-install -p -e kdc.sq.com:<192.168.1.111>

For related information, see the following sections:
• Accessing the Hadoop and Kubernetes Configuration Files.
• Configuring the Hadoop and Kubernetes Configuration Files.

2.1.4.6.5.9 Modifying Your Data Ingest Folder

Once you’ve validated your license, you can modify your data ingest folder after the first run by running the following command:

$ sudo ./sqream-install -d /home/sqream/data_in

2.1.4.6.5.10 Configuring Your Network for Docker

Once you’ve modified your data ingest folder (if needed), you must validate that the server network and Docker network that you are setting up do not overlap.

To configure your network for Docker:

1. To verify that your server network and Docker network do not overlap, run the following command:

$ ifconfig | grep 172.

2. Do one of the following:
• If the above command returned no results, continue the installation process.
• If the above command returned results, run the following command:

$ ifconfig | grep 192.168.
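The check above looks for server interfaces in the 172.x range, which Docker typically uses by default for its bridge network (172.17.0.0/16). As a minimal sketch of the same idea, the following Python snippet compares two networks with the standard ipaddress module. The function name and the example addresses are assumptions for illustration; substitute the networks reported on your own server.

```python
import ipaddress


def networks_overlap(server_cidr: str, docker_cidr: str) -> bool:
    """Return True if the two networks share any addresses."""
    # strict=False lets us pass a host address with a prefix length,
    # such as the inet address and netmask reported for an interface.
    server_net = ipaddress.ip_network(server_cidr, strict=False)
    docker_net = ipaddress.ip_network(docker_cidr, strict=False)
    return server_net.overlaps(docker_net)


# Hypothetical server interface that does not collide with the Docker bridge.
print(networks_overlap("192.168.1.62/24", "172.17.0.0/16"))
# Hypothetical server interface inside the Docker bridge range.
print(networks_overlap("172.17.5.10/24", "172.17.0.0/16"))
```

The first call reports no overlap and the second reports an overlap, which corresponds to the case in which the grep command above returns results.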


2.1.4.6.5.11 Checking and Verifying Your System Settings

Once you’ve configured your network for Docker, you can check and verify your system settings. Running the following command shows you all the variables used by your SQream system:

$ ./sqream-install -s

The following is an example of the correct output:

SQREAM_CONSOLE_TAG=1.5.4
SQREAM_TAG=2019.2.1
SQREAM_EDITOR_TAG=3.0.0
license_worker_0=f0:cc:
license_worker_1=26:91:
license_worker_2=20:26:
license_worker_3=00:36:
SQREAM_VOLUME=/media/sqreamdb
SQREAM_DATA_INGEST=/media/sqreamdb/data_in
SQREAM_CONFIG_DIR=/etc/sqream/
LICENSE_VALID=true
SQREAM_LOG_DIR=/var/log/sqream/
SQREAM_USER=sqream
SQREAM_HOME=/home/sqream
SQREAM_ENV_PATH=/home/sqream/.sqream/env_file
PROCESSOR=x86_64
METADATA_PORT=3105
PICKER_PORT=3108
NUM_OF_GPUS=2
CUDA_VERSION=10.1
NVIDIA_SMI_PATH=/usr/bin/nvidia-smi
DOCKER_PATH=/usr/bin/docker
NVIDIA_DRIVER=418
SQREAM_MODE=single_host
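If you want to check these settings from a script, the KEY=VALUE output can be parsed with a few lines of Python. The sketch below assumes that each setting is printed on its own line; the function name parse_settings is hypothetical and is not part of the SQream tooling. The sample text uses a few of the variables from the example output above.

```python
def parse_settings(text: str) -> dict:
    """Parse KEY=VALUE lines into a dictionary, skipping anything else."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        # Split on the first '=' only, so values may contain '=' themselves.
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


sample = """\
SQREAM_MODE=single_host
LICENSE_VALID=true
METADATA_PORT=3105
PICKER_PORT=3108
NUM_OF_GPUS=2
"""

settings = parse_settings(sample)
if settings.get("LICENSE_VALID") != "true":
    print("License is not valid")
print(settings["METADATA_PORT"], settings["PICKER_PORT"])
```

All values are kept as strings, so convert them (for example with int) before comparing ports or GPU counts numerically.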

2.1.4.6.6 Using the SQream Console

After configuring the SQream software and verifying your system settings, you can begin using the SQream console.

2.1.4.6.6.1 SQream Console - Basic Commands

The SQream console offers the following basic commands:
• Starting your SQream console
• Starting Metadata and Picker
• Starting the running services
• Listing the running services
• Stopping the running services
• Using the SQream editor
• Using the SQream Client

2.1.4.6.6.2 Starting Your SQream Console

You can start your SQream console by running the following command:

$ ./sqream-console

2.1.4.6.6.3 Starting the SQream Master

To listen to metadata and picker:

1. Start the metadata server (default port 3105) and picker (default port 3108) by running the following command:

$ sqream master --start

The following is the correct output:

sqream-console> sqream master --start
starting master server in single_host mode ...
sqream_single_host_master is up and listening on ports: 3105,3108

2. Optional - Change the metadata and server picker ports by adding -p and -m :

$ sqream-console>sqream master --start -p 4105 -m 4108
starting master server in single_host mode ...
sqream_single_host_master is up and listening on ports: 4105,4108

2.1.4.6.6.4 Starting SQream Workers

When starting SQream workers, the value that follows --start sets how many workers to start. Leaving the value unspecified starts workers on all of the available resources.

$ sqream worker --start

The following is an example of expected output when setting the value to 2:

sqream-console>sqream worker --start 2
started sqream_single_host_worker_0 on port 5000, allocated gpu: 0
started sqream_single_host_worker_1 on port 5001, allocated gpu: 1

2.1.4.6.6.5 Listing the Running Services

You can list running SQream services to look for container names and IDs by running the following command:


$ sqream master --list

The following is an example of the expected output:

sqream-console>sqream master --list
container name: sqream_single_host_worker_0, container id: c919e8fb78c8
container name: sqream_single_host_master, container id: ea7eef80e038

2.1.4.6.6.6 Stopping the Running Services

You can stop running services either for a single SQream worker, or all SQream services for both master and worker. The following is the command for stopping a running service for a single SQream worker:

$ sqream worker --stop

The following is an example of expected output when stopping a running service for a single SQream worker:

sqream worker stop
stopped container sqream_single_host_worker_0, id: 892a8f1a58c5

You can stop all running SQream services (both master and worker) by running the following command:

$ sqream-console>sqream master --stop --all

The following is an example of expected output when stopping all running services:

sqream-console>sqream master --stop --all
stopped container sqream_single_host_worker_0, id: 892a8f1a58c5
stopped container sqream_single_host_master, id: 55cb7e38eb22

2.1.4.6.6.7 Using SQream Studio

SQream Studio is an SQL statement editor.

To start SQream Studio:

1. Run the following command:

$ sqream studio --start

The following is an example of the expected output:

SQream Acceleration Studio is available at http://192.168.1.62:8080

2. Click the http://192.168.1.62:8080 link shown in the CLI.

To stop SQream Studio:

You can stop your SQream Studio by running the following command:

$ sqream studio --stop

The following is an example of the expected output:


sqream_admin stopped

2.1.4.6.6.8 Using the SQream Client

You can use the embedded SQream Client on the following nodes:
• Master node
• Worker node

When using the SQream Client on the Master node, the following default settings are used:
• Default port: 3108. You can change the default port using the -p variable.
• Default database: master. You can change the default database using the -d variable.

The following is an example:

$ sqream client --master -u sqream -w sqream

When using the SQream Client on a Worker node (or nodes), you should use the -p variable for Worker ports. The default database is master, but you can use the -d variable to change databases. The following is an example:

$ sqream client --worker -p 5000 -u sqream -w sqream

2.1.4.6.6.9 Moving from Docker Installation to Standard On-Premises Installation

Because Docker creates all files and directories on the host at the root level, you must grant ownership of the SQream storage folder to the working directory user.

2.1.4.6.6.10 SQream Console - Advanced Commands

The SQream console offers the following advanced commands:
• Controlling the spool size
• Splitting a GPU
• Splitting a GPU and setting the spool size
• Using a custom configuration file
• Clustering your Docker environment

2.1.4.6.6.11 Controlling the Spool Size

From the console you can define a spool size value. The following example shows the spool size being set to 50:

$ sqream-console>sqream worker --start 2 -m 50

If you don’t define the SQream spool size, the SQream console automatically distributes the available RAM between all running workers.


2.1.4.6.6.12 Splitting a GPU

You can start more than one sqreamd on a single GPU by splitting it. The following example splits the GPU in slot 0 between two sqreamd instances:

$ sqream-console>sqream worker --start 2 -g 0

2.1.4.6.6.13 Splitting GPU and Setting the Spool Size

You can simultaneously split a GPU and set the spool size by appending the -m flag:

$ sqream-console>sqream worker --start 2 -g 0 -m 50

Note: The console does not validate whether the user-defined spool size is available. Before setting the spool size, verify that the requested resources are available.

2.1.4.6.6.14 Using a Custom Configuration File

SQream lets you use your own external custom configuration JSON files. You must place these JSON files in the path mounted in the installation. SQream recommends placing the JSON file in the Configuration folder. The SQream console does not validate the integrity of your external configuration files.

When using your custom configuration file, you can use the -j flag to define the full path to the configuration file, as in the example below:

$ sqream-console>sqream worker --start 1 -j /etc/sqream/configfile.json

Note: To start more than one sqream daemon, you must provide files for each daemon, as in the example below:

$ sqream worker --start 2 -j /etc/sqream/configfile.json /etc/sqream/configfile2.json

Note: To split a specific GPU, you must also list the GPU flag, as in the example below:

$ sqream worker --start 2 -g 0 -j /etc/sqream/configfile.json /etc/sqream/configfile2.json

2.1.4.6.6.15 Clustering Your Docker Environment

SQream lets you connect to a remote Master node to start Docker in Distributed mode. If you have already connected to a Slave node server in Distributed mode, the sqream Master and Client commands are only available on the Master node.


$ --master-host
$ sqream-console>sqream worker --start 1 --master-host 192.168.0.1020

2.1.4.6.6.16 Checking the Status of SQream Services

SQream lets you check the status of SQream services from the following locations:
• From the SQream console
• From outside the SQream console

2.1.4.6.6.17 Checking the Status of SQream Services from the SQream Console

From the SQream console, you can check the status of SQream services by running the following command:

$ sqream-console>sqream master --list

The following is an example of the expected output:

$ sqream-console>sqream master --list
checking 3 sqream services:
sqream_single_host_worker_1 up, listens on port: 5001 allocated gpu: 1
sqream_single_host_worker_0 up, listens on port: 5000 allocated gpu: 1
sqream_single_host_master up listens on ports: 3105,3108

2.1.4.6.6.18 Checking the Status of SQream Services from Outside the SQream Console

From outside the SQream console, you can check the status of SQream services by running the following command:

$ sqream-status
NAMES                         STATUS                 PORTS
sqream_single_host_worker_1   Up 3 minutes           0.0.0.0:5001->5001/tcp
sqream_single_host_worker_0   Up 3 minutes           0.0.0.0:5000->5000/tcp
sqream_single_host_master     Up 3 minutes           0.0.0.0:3105->3105/tcp, 0.0.0.0:3108->3108/tcp
sqream_editor_3.0.0           Up 3 hours (healthy)   0.0.0.0:3000->3000/tcp

2.1.4.6.6.19 Upgrading Your SQream System

This section describes how to upgrade your SQream system.

To upgrade your SQream system:

1. Contact the SQream Support team for access to the new SQream package tarball file.

2. Set a maintenance window to enable stopping the system while upgrading it.

3. Extract the tarball file received from the SQream Support team, using the same user and the same folder that you used while Downloading the SQream Software:


$ tar -xvf sqream_installer-2.0.5-DB2019.2.1-CO1.6.3-ED3.0.0-x86_64/

4. Navigate to the new folder created as a result of extracting the tarball file:

$ cd sqream_installer-2.0.5-DB2019.2.1-CO1.6.3-ED3.0.0-x86_64/

5. Initiate the upgrade process:

$ ./sqream-install -i

Initiating the upgrade process checks if any SQream services are running. If any services are running, you will be prompted to stop them.

6. Do one of the following:
• Select Yes to stop all running SQream workers (Master and Editor) and continue the upgrade process.
• Select No to stop the upgrade process.

SQream periodically upgrades the metadata structure. If an upgrade version includes a change to the metadata structure, you will be prompted with an approval request message. Your approval is required to finish the upgrade process.

Because SQream supports only certain metadata versions, all SQream services must be upgraded at the same time.

7. When the upgrade is complete, load the SQream console and restart your services.

For assistance, contact SQream Support.

2.1.4.7 Monitoring query performance

When analyzing options for query tuning, the first step is to analyze the query plan and execution. The query plan and execution details explain how SQream DB processes a query and where time is spent.

This document details how to analyze query performance with execution plans. This guide focuses specifically on identifying bottlenecks and possible optimization techniques to improve query performance.

Performance tuning options for each query are different. You should adapt the recommendations and tips for your own workloads. See also our Optimization and Best Practices guide for more information about data loading considerations and other best practices.

In this section:

• Setting up the system for monitoring
  – Adjusting the logging frequency
  – Reading execution plans with a foreign table
• The SHOW_NODE_INFO command
• Understanding the query execution plan output
  – Information presented in the execution plan
  – Commonly seen nodes
• Examples
  – 1. Spooling to disk
    ∗ Identifying the offending nodes
    ∗ Common solutions for reducing spool
  – 2. Queries with large result sets
    ∗ Identifying the offending nodes
    ∗ Common solutions for reducing gather time
  – 3. Inefficient filtering
    ∗ Identifying the situation
    ∗ Common solutions for improving filtering
  – 4. Joins with varchar keys
    ∗ Identifying the situation
    ∗ Improving query performance
  – 5. Sorting on big VARCHAR fields
    ∗ Identifying the situation
    ∗ Improving sort performance on text keys
  – 6. High selectivity data
    ∗ Identifying the situation
    ∗ Improving performance with high selectivity hints
  – 7. Performance of unsorted data in joins
    ∗ Identifying the situation
    ∗ Improving join performance when data is sparse
  – 8. Manual join reordering
    ∗ Identifying the situation
    ∗ Changing the join order
• Further reading

2.1.4.7.1 Setting up the system for monitoring

By default, SQream DB logs execution details for every statement that runs for more than 60 seconds. If you want to see the execution details for a currently running statement, see The SHOW_NODE_INFO command below.


2.1.4.7.1.1 Adjusting the logging frequency

To adjust the frequency of logging for statements, you may want to reduce the interval from 60 seconds down to, say, 5 or 10 seconds. Modify the configuration files and set the nodeInfoLoggingSec parameter as you see fit:

{
    "compileFlags":{
    },
    "runtimeFlags":{
    },
    "runtimeGlobalFlags":{
        "nodeInfoLoggingSec" : 5
    },
    "server":{
    }
}
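If you maintain several configuration files, the same change can be scripted. The following is a minimal sketch that sets nodeInfoLoggingSec in a configuration document using Python's standard json module. It assumes that the configuration parses as strict JSON (for example, no trailing commas); the function name set_node_info_logging is hypothetical and is not part of the SQream tooling.

```python
import json


def set_node_info_logging(config_text: str, seconds: int) -> str:
    """Return the configuration text with nodeInfoLoggingSec set to `seconds`."""
    config = json.loads(config_text)
    # Create the runtimeGlobalFlags section if it is missing,
    # then set (or overwrite) the logging interval.
    config.setdefault("runtimeGlobalFlags", {})["nodeInfoLoggingSec"] = seconds
    return json.dumps(config, indent=4)


original = """
{
    "compileFlags": {},
    "runtimeFlags": {},
    "runtimeGlobalFlags": {},
    "server": {}
}
"""

updated = set_node_info_logging(original, 5)
print(updated)
```

The function works on text rather than on a file path, so you can review the result before writing it back to the configuration file and restarting the cluster.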

After restarting the SQream DB cluster, the execution plan details will be logged to the standard SQream DB logs directory, as a message of type 200. You can see these messages with a text viewer or with queries on the log Foreign Tables.

2.1.4.7.1.2 Reading execution plans with a foreign table

First, create a foreign table for the logs:

CREATE FOREIGN TABLE logs
(
  start_marker      VARCHAR(4),
  row_id            BIGINT,
  timestamp         DATETIME,
  message_level     TEXT,
  thread_id         TEXT,
  worker_hostname   TEXT,
  worker_port       INT,
  connection_id     INT,
  database_name     TEXT,
  user_name         TEXT,
  statement_id      INT,
  service_name      TEXT,
  message_type_id   INT,
  message           TEXT,
  end_message       VARCHAR(5)
)
WRAPPER cdv_fdw
OPTIONS
  (
    LOCATION = '/home/rhendricks/sqream_storage/logs/**/sqream*.log',
    DELIMITER = '|'
  )
;


Once you’ve defined the foreign table, you can run queries to observe the previously logged execution plans. This is recommended over looking at the raw logs.

t=> SELECT message
.     FROM logs
.     WHERE message_type_id = 200
.     AND timestamp BETWEEN '2020-06-11' AND '2020-06-13';
message
---------------------------------------------------------------------------------
SELECT *,coalesce((depdelay > 15),false) AS isdepdelayed FROM ontime WHERE year IN (2005, 2006, 2007, 2008, 2009, 2010)
:
: 1,PushToNetworkQueue ,10354468,10,1035446,2020-06-12 20:41:42,-1,,,,13.55
: 2,Rechunk ,10354468,10,1035446,2020-06-12 20:41:42,1,,,,0.10
: 3,ReorderInput ,10354468,10,1035446,2020-06-12 20:41:42,2,,,,0.00
: 4,DeferredGather ,10354468,10,1035446,2020-06-12 20:41:42,3,,,,1.23
: 5,ReorderInput ,10354468,10,1035446,2020-06-12 20:41:41,4,,,,0.01
: 6,GpuToCpu ,10354468,10,1035446,2020-06-12 20:41:41,5,,,,0.07
: 7,GpuTransform ,10354468,10,1035446,2020-06-12 20:41:41,6,,,,0.02
: 8,ReorderInput ,10354468,10,1035446,2020-06-12 20:41:41,7,,,,0.00
: 9,Filter ,10354468,10,1035446,2020-06-12 20:41:41,8,,,,0.07
: 10,GpuTransform ,10485760,10,1048576,2020-06-12 20:41:41,9,,,,0.07
: 11,GpuDecompress ,10485760,10,1048576,2020-06-12 20:41:41,10,,,,0.03
: 12,GpuTransform ,10485760,10,1048576,2020-06-12 20:41:41,11,,,,0.22
: 13,CpuToGpu ,10485760,10,1048576,2020-06-12 20:41:41,12,,,,0.76
: 14,ReorderInput ,10485760,10,1048576,2020-06-12 20:41:40,13,,,,0.11
: 15,Rechunk ,10485760,10,1048576,2020-06-12 20:41:40,14,,,,5.58
: 16,CpuDecompress ,10485760,10,1048576,2020-06-12 20:41:34,15,,,,0.04
: 17,ReadTable ,10485760,10,1048576,2020-06-12 20:41:34,16,832MB,,public.ontime,0.55
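The node lines in a logged message are comma-separated, so they can be turned into structured records for further analysis. The sketch below assumes that each node line holds the SHOW_NODE_INFO columns without stmt_id, in the order node_id, node_type, rows, chunks, avg_rows_in_chunk, time, parent_node_id, read, write, comment, timeSum. That order is inferred from the sample output above rather than taken from a specification, and it assumes that the comment field contains no commas. The PlanNode type and parse_plan_line function are hypothetical names.

```python
from typing import NamedTuple, Optional


class PlanNode(NamedTuple):
    node_id: int
    node_type: str
    rows: int
    chunks: int
    avg_rows_in_chunk: int
    time: str
    parent_node_id: int
    read: str
    write: str
    comment: str
    time_sum: float


def parse_plan_line(line: str) -> Optional[PlanNode]:
    """Parse one node line; return None for lines that are not node lines."""
    # Drop the leading ': ' continuation marker, then split into fields.
    fields = [field.strip() for field in line.lstrip(": ").split(",")]
    if len(fields) != 11 or not fields[0].isdigit():
        return None  # for example, the SQL text of the statement
    return PlanNode(
        int(fields[0]), fields[1], int(fields[2]), int(fields[3]),
        int(fields[4]), fields[5], int(fields[6]),
        fields[7], fields[8], fields[9], float(fields[10]),
    )


node = parse_plan_line(
    ": 17,ReadTable ,10485760,10,1048576,2020-06-12 20:41:34,16,832MB,,public.ontime,0.55"
)
print(node.node_type, node.read, node.time_sum)
```

Lines that do not match the expected shape, such as the statement text at the top of the message, return None and can simply be skipped.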

2.1.4.7.2 The SHOW_NODE_INFO command

The SHOW_NODE_INFO command returns a snapshot of the current query plan, similar to EXPLAIN ANALYZE from other databases.

The SHOW_NODE_INFO result, just like the periodically-logged execution plans described above, is an at-the-moment view of the compiler’s execution plan and runtime statistics for the specified statement.

To inspect a currently running statement, execute the show_node_info utility function in a SQL client like sqream sql, the SQream Studio Editor, or any other third party SQL terminal.

In this example, we inspect a statement with statement ID of 176. The command looks like this:

t=> SELECT SHOW_NODE_INFO(176);
stmt_id | node_id | node_type          | rows | chunks | avg_rows_in_chunk | time                | parent_node_id | read | write | comment    | timeSum
--------+---------+--------------------+------+--------+-------------------+---------------------+----------------+------+-------+------------+--------
    176 |       1 | PushToNetworkQueue |    1 |      1 |                 1 | 2019-12-25 23:53:13 |             -1 |      |       |            |  0.0025
    176 |       2 | Rechunk            |    1 |      1 |                 1 | 2019-12-25 23:53:13 |              1 |      |       |            |       0
    176 |       3 | GpuToCpu           |    1 |      1 |                 1 | 2019-12-25 23:53:13 |              2 |      |       |            |       0
    176 |       4 | ReorderInput       |    1 |      1 |                 1 | 2019-12-25 23:53:13 |              3 |      |       |            |       0
    176 |       5 | Filter             |    1 |      1 |                 1 | 2019-12-25 23:53:13 |              4 |      |       |            |  0.0002
    176 |       6 | GpuTransform       |  457 |      1 |               457 | 2019-12-25 23:53:13 |              5 |      |       |            |  0.0002
    176 |       7 | GpuDecompress      |  457 |      1 |               457 | 2019-12-25 23:53:13 |              6 |      |       |            |       0
    176 |       8 | CpuToGpu           |  457 |      1 |               457 | 2019-12-25 23:53:13 |              7 |      |       |            |  0.0003
    176 |       9 | Rechunk            |  457 |      1 |               457 | 2019-12-25 23:53:13 |              8 |      |       |            |       0
    176 |      10 | CpuDecompress      |  457 |      1 |               457 | 2019-12-25 23:53:13 |              9 |      |       |            |       0
    176 |      11 | ReadTable          |  457 |      1 |               457 | 2019-12-25 23:53:13 |             10 | 4MB  |       | public.nba |  0.0004

2.1.4.7.3 Understanding the query execution plan output

Both SHOW_NODE_INFO and the logged execution plans represent the query plan as a graph hierarchy, with data separated into different columns. Each row represents a single logical database operation, which is also called a node or chunk producer. A node reports several metrics during query execution, such as how much data it has read and written, how many chunks and rows it has processed, and how much time has elapsed.

Consider the example show_node_info presented above. The source node with ID #11 (ReadTable) has a parent node ID #10 (CpuDecompress). If we were to draw this out in a graph, it would look like Fig. 3.

The last node, also called the sink, has a parent node ID of -1, meaning it has no parent. This is typically a node that sends data over the network or into a table.

When using SHOW_NODE_INFO, a tabular representation of the currently running statement execution is presented. See the examples below to understand how the query execution plan is instrumental in identifying bottlenecks and optimizing long-running statements.

Fig. 3: This graph explains how the query execution details are arranged in a logical order, from the bottom up.
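The parent relationships can also be followed programmatically. The sketch below uses the node_id, node_type, and parent_node_id values from the show_node_info example for statement 176 and walks from the source node to the sink. The function name path_to_sink is hypothetical.

```python
# (node_id, node_type, parent_node_id) taken from the statement 176 example.
nodes = [
    (1, "PushToNetworkQueue", -1),
    (2, "Rechunk", 1),
    (3, "GpuToCpu", 2),
    (4, "ReorderInput", 3),
    (5, "Filter", 4),
    (6, "GpuTransform", 5),
    (7, "GpuDecompress", 6),
    (8, "CpuToGpu", 7),
    (9, "Rechunk", 8),
    (10, "CpuDecompress", 9),
    (11, "ReadTable", 10),
]


def path_to_sink(nodes, start_id):
    """Follow parent_node_id links from start_id until the sink (parent -1)."""
    parents = {node_id: parent for node_id, _, parent in nodes}
    types = {node_id: node_type for node_id, node_type, _ in nodes}
    path, current, seen = [], start_id, set()
    while current != -1:
        if current in seen:
            raise ValueError("cycle detected in parent links")
        seen.add(current)
        path.append((current, types[current]))
        current = parents[current]
    return path


for node_id, node_type in path_to_sink(nodes, 11):
    print(node_id, node_type)
```

The output starts at ReadTable (the source) and ends at PushToNetworkQueue (the sink), which is the bottom-up order in which data flows through the plan.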

2.1.4.7.3.1 Information presented in the execution plan

Table 6: Result columns

Column name       | Description
stmt_id           | The ID for the statement
node_id           | This node’s ID in this execution plan
node_type         | The node type
rows              | Total number of rows this node has processed
chunks            | Number of chunks this node has processed
avg_rows_in_chunk | Average amount of rows that this node processes in each chunk (rows / chunks)
time              | Timestamp for this node’s creation
parent_node_id    | The node_id of this node’s parent
read              | Total data read from disk
write             | Total data written to disk
comment           | Additional information (e.g. table name for ReadTable)
timesum           | Total elapsed time for this execution node’s processing

2.1.4.7.3.2 Commonly seen nodes

Table 7: Node types

Node type          | Execution location | Description
CpuDecompress      | CPU | Decompression operation, common for longer VARCHAR types
CpuLoopJoin        | CPU | A non-indexed nested loop join, performed on the CPU
CpuReduce          | CPU | A reduce process performed on the CPU, primarily with DISTINCT aggregates (e.g. COUNT(DISTINCT ...))
CpuToGpu, GpuToCpu |     | An operation that moves data to or from the GPU for processing
CpuTransform       | CPU | A transform operation performed on the CPU, usually a scalar function
DeferredGather     | CPU | Merges the results of GPU operations with a result set
Distinct           | GPU | Removes duplicate rows (usually as part of the DISTINCT operation)
Distinct_Merge     | CPU | The merge operation of the Distinct operation
Filter             | GPU | A filtering operation, such as a WHERE or JOIN clause
GpuDecompress      | GPU | Decompression operation
GpuReduceMerge     | GPU | An operation to optimize part of the merger phases in the GPU
GpuTransform       | GPU | A transformation operation such as a type cast or scalar function
LocateFiles        | CPU | Validates external file paths for foreign data wrappers, expanding directories and GLOB patterns
LoopJoin           | GPU | A non-indexed nested loop join, performed on the GPU
ParseCsv           | CPU | A CSV parser, used after ReadFiles to convert the CSV into columnar data
PushToNetworkQueue | CPU | Sends result sets to a client connected over the network
ReadFiles          | CPU | Reads external flat-files
ReadTable          | CPU | Reads data from a standard table stored on disk
Rechunk            |     | Reorganize multiple small chunks into a full chunk. Commonly found after joins and when HIGH_SELECTIVITY is used
Reduce             | GPU | A reduction operation, such as a GROUP BY
ReduceMerge        | GPU | A merge operation of a reduction operation, helps operate on larger-than-RAM data
ReorderInput       |     | Change the order of arguments in preparation for the next operation
SeparatedGather    | GPU | Gathers additional columns for the result
Sort               | GPU | Sort operation
TakeRowsFromChunk  |     | Take the first N rows from each chunk, to optimize LIMIT when used alongside ORDER BY
Top                |     | Limits the input size, when used with LIMIT (or its alias TOP)
UdfTransform       | CPU | Executes a user defined function
UnionAll           |     | Combines two sources of data when UNION ALL is used
Window             | GPU | Executes a non-ranking window function
WindowRanking      | GPU | Executes a ranking window function
WriteTable         | CPU | Writes the result set to a standard table stored on disk

Tip: The full list of nodes appears in the Node types table, as part of the SHOW_NODE_INFO reference.

2.1.4.7.4 Examples

In general, looking at the top three longest running nodes (as is detailed in the timeSum column) can indicate the biggest bottlenecks. In the following examples you will learn how to identify and solve some common issues.
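As a minimal sketch of this approach, the following Python snippet ranks nodes by timeSum and returns the three largest. The (node_id, node_type, timeSum) tuples are copied from a few rows of the logged execution plan shown earlier in this guide; the function name slowest_nodes is an assumption for illustration.

```python
import heapq


def slowest_nodes(nodes, n=3):
    """Return the n nodes with the largest timeSum, longest first."""
    return heapq.nlargest(n, nodes, key=lambda node: node[2])


# (node_id, node_type, timeSum) for a subset of the plan's nodes.
plan = [
    (1, "PushToNetworkQueue", 13.55),
    (4, "DeferredGather", 1.23),
    (13, "CpuToGpu", 0.76),
    (15, "Rechunk", 5.58),
    (17, "ReadTable", 0.55),
]

for node_id, node_type, time_sum in slowest_nodes(plan):
    print(node_id, node_type, time_sum)
```

For this plan the ranking is PushToNetworkQueue, Rechunk, and DeferredGather, which suggests that most of the time is spent sending and assembling the result set rather than reading the table.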

In this section:

• 1. Spooling to disk
  – Identifying the offending nodes
  – Common solutions for reducing spool
• 2. Queries with large result sets
  – Identifying the offending nodes
  – Common solutions for reducing gather time
• 3. Inefficient filtering
  – Identifying the situation
  – Common solutions for improving filtering
• 4. Joins with varchar keys
  – Identifying the situation
  – Improving query performance
• 5. Sorting on big VARCHAR fields
  – Identifying the situation
  – Improving sort performance on text keys
• 6. High selectivity data
  – Identifying the situation
  – Improving performance with high selectivity hints
• 7. Performance of unsorted data in joins
  – Identifying the situation
  – Improving join performance when data is sparse
• 8. Manual join reordering
  – Identifying the situation
  – Changing the join order

2.1.4.7.4.1 1. Spooling to disk

When there is not enough RAM to process a statement, SQream DB will spill over data to the temp folder in the storage disk. While this ensures that a statement can always finish processing, it can slow down the processing significantly. It’s worth identifying these statements, to figure out if the cluster is configured correctly, as well as potentially reduce the statement size. You can identify a statement that spools to disk by looking at the write column in the execution details. A node that spools will have a value, shown in megabytes in the write column. Common nodes that write spools include Join or LoopJoin.

2.1.4.7.4.2 Identifying the offending nodes

1. Run a query. For example, a query from the TPC-H benchmark:

SELECT o_year,
       SUM(CASE WHEN nation = 'BRAZIL' THEN volume ELSE 0 END) / SUM(volume) AS mkt_share
FROM (SELECT datepart(YEAR,o_orderdate) AS o_year,
             l_extendedprice*(1 - l_discount / 100.0) AS volume,
             n2.n_name AS nation
      FROM lineitem
        JOIN part ON p_partkey = CAST (l_partkey AS INT)
        JOIN orders ON l_orderkey = o_orderkey
        JOIN customer ON o_custkey = c_custkey
        JOIN nation n1 ON c_nationkey = n1.n_nationkey
        JOIN region ON n1.n_regionkey = r_regionkey
        JOIN supplier ON s_suppkey = l_suppkey
        JOIN nation n2 ON s_nationkey = n2.n_nationkey
      WHERE o_orderdate BETWEEN '1995-01-01' AND '1996-12-31') AS all_nations
GROUP BY o_year
ORDER BY o_year;

2. Observe the execution information by using the foreign table, or use show_node_info.

This statement is made up of 199 nodes, starting from a ReadTable, and finishes by returning only 2 results to the client. The execution below has been shortened, but note the rows for LoopJoin:

t=> SELECT message FROM logs WHERE message_type_id = 200 LIMIT 1;
message
---------------------------------------------------------------------------------
SELECT o_year, SUM(CASE WHEN nation = 'BRAZIL' THEN volume ELSE 0 END) / SUM(volume) AS mkt_share
: FROM (SELECT datepart(YEAR,o_orderdate) AS o_year,
: l_extendedprice*(1 - l_discount / 100.0) AS volume,
: n2.n_name AS nation
: FROM lineitem
: JOIN part ON p_partkey = CAST (l_partkey AS INT)
: JOIN orders ON l_orderkey = o_orderkey
: JOIN customer ON o_custkey = c_custkey
: JOIN nation n1 ON c_nationkey = n1.n_nationkey
: JOIN region ON n1.n_regionkey = r_regionkey
: JOIN supplier ON s_suppkey = l_suppkey
: JOIN nation n2 ON s_nationkey = n2.n_nationkey
: WHERE o_orderdate BETWEEN '1995-01-01' AND '1996-12-31') AS all_nations
: GROUP BY o_year
: ORDER BY o_year
: 1,PushToNetworkQueue ,2,1,2,2020-09-04 18:32:50,-1,,,,0.27
: 2,Rechunk ,2,1,2,2020-09-04 18:32:50,1,,,,0.00
: 3,SortMerge ,2,1,2,2020-09-04 18:32:49,2,,,,0.00
: 4,GpuToCpu ,2,1,2,2020-09-04 18:32:49,3,,,,0.00
: 5,Sort ,2,1,2,2020-09-04 18:32:49,4,,,,0.00
: 6,ReorderInput ,2,1,2,2020-09-04 18:32:49,5,,,,0.00
: 7,GpuTransform ,2,1,2,2020-09-04 18:32:49,6,,,,0.00
: 8,CpuToGpu ,2,1,2,2020-09-04 18:32:49,7,,,,0.00
: 9,Rechunk ,2,1,2,2020-09-04 18:32:49,8,,,,0.00
: 10,ReduceMerge ,2,1,2,2020-09-04 18:32:49,9,,,,0.03
: 11,GpuToCpu ,6,3,2,2020-09-04 18:32:49,10,,,,0.00
: 12,Reduce ,6,3,2,2020-09-04 18:32:49,11,,,,0.64
[...]
: 49,LoopJoin ,182369485,7,26052783,2020-09-04 18:32:36,48,1915MB,1915MB,inner,4.94
[...]
: 98,LoopJoin ,182369485,12,15197457,2020-09-04 18:32:16,97,2191MB,2191MB,inner,5.01
[...]
: 124,LoopJoin ,182369485,8,22796185,2020-09-04 18:32:03,123,3064MB,3064MB,inner,6.73
[...]
: 150,LoopJoin ,182369485,10,18236948,2020-09-04 18:31:47,149,12860MB,12860MB,inner,23.62
[...]
: 199,ReadTable ,20000000,1,20000000,2020-09-04 18:30:33,198,0MB,,public.part,0.83

Because of the relatively low amount of RAM in the machine and because the data set is rather large at around 10TB, SQream DB needs to spool. The total spool used by this query is around 20GB (1915MB + 2191MB + 3064MB + 12860MB).
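The total can be reproduced by summing the write values of the four LoopJoin nodes. The following is a small sketch of that arithmetic; the helper name size_in_mb is hypothetical, and the conversion assumes that 1 GB is 1024 MB (dividing by 1000 instead gives about 20.03).

```python
def size_in_mb(value: str) -> int:
    """Convert a string such as '1915MB' to an integer number of megabytes."""
    text = value.strip()
    if not text.endswith("MB"):
        raise ValueError(f"unexpected size format: {value!r}")
    return int(text[:-2])


# Write column of the four LoopJoin nodes (49, 98, 124 and 150).
loop_join_writes = ["1915MB", "2191MB", "3064MB", "12860MB"]

total_mb = sum(size_in_mb(value) for value in loop_join_writes)
print(total_mb)                   # 20030
print(round(total_mb / 1024, 2))  # 19.56
```

Either way the query spools roughly 20GB, most of it (12860MB) in a single LoopJoin node.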


2.1.4.7.4.3 Common solutions for reducing spool

• Increase the amount of spool memory available for the workers, as a proportion of the maximum statement memory. When the amount of spool memory is increased, SQream DB may not need to write to disk. This setting is called spoolMemoryGB. Refer to the Configuration guide.
• Reduce the number of workers per host, and increase the amount of spool available to the (now reduced number of) active workers. This may reduce the number of concurrent statements, but will improve performance for heavy statements.

2.1.4.7.4.4 2. Queries with large result sets

When queries have large result sets, you may see a node called DeferredGather. This gathering occurs when the result set is assembled, in preparation for sending it to the client.

2.1.4.7.4.5 Identifying the offending nodes

1. Run a query. For example, a modified query from the TPC-H benchmark:

SELECT s.*,
       l.*,
       r.*,
       n1.*,
       n2.*,
       p.*,
       o.*,
       c.*
FROM lineitem l
  JOIN part p ON p_partkey = CAST (l_partkey AS INT)
  JOIN orders o ON l_orderkey = o_orderkey
  JOIN customer c ON o_custkey = c_custkey
  JOIN nation n1 ON c_nationkey = n1.n_nationkey
  JOIN region r ON n1.n_regionkey = r_regionkey
  JOIN supplier s ON s_suppkey = l_suppkey
  JOIN nation n2 ON s_nationkey = n2.n_nationkey
WHERE r_name = 'AMERICA'
  AND o_orderdate BETWEEN '1995-01-01' AND '1996-12-31'
  AND high_selectivity(p_type = 'ECONOMY BURNISHED NICKEL');

2. Observe the execution information by using the foreign table, or use show_node_info.
This statement is made up of 221 nodes, containing 8 ReadTable nodes, and finishes by returning billions of results to the client. The execution below has been shortened, but note the rows for DeferredGather:

t=> SELECT show_node_info(494);
stmt_id | node_id | node_type          | rows     | chunks | avg_rows_in_chunk | time                | parent_node_id | read | write | comment     | timeSum
--------+---------+--------------------+----------+--------+-------------------+---------------------+----------------+------+-------+-------------+--------
494     | 1       | PushToNetworkQueue | 242615   | 1      | 242615            | 2020-09-04 19:07:55 | -1             |      |       |             | 0.36
494     | 2       | Rechunk            | 242615   | 1      | 242615            | 2020-09-04 19:07:55 | 1              |      |       |             | 0
494     | 3       | ReorderInput       | 242615   | 1      | 242615            | 2020-09-04 19:07:55 | 2              |      |       |             | 0
494     | 4       | DeferredGather     | 242615   | 1      | 242615            | 2020-09-04 19:07:55 | 3              |      |       |             | 0.16
[...]
494     | 166     | DeferredGather     | 3998730  | 39     | 102531            | 2020-09-04 19:07:47 | 165            |      |       |             | 21.75
[...]
494     | 194     | DeferredGather     | 133241   | 20     | 6662              | 2020-09-04 19:07:03 | 193            |      |       |             | 0.41
[...]
494     | 221     | ReadTable          | 20000000 | 20     | 1000000           | 2020-09-04 19:07:01 | 220            | 20MB |       | public.part | 0.1

When you see DeferredGather operations taking more than a few seconds, that’s a sign that you’re selecting too much data. In this case, the DeferredGather with node ID 166 took over 21 seconds.
3. Modify the statement to see the difference
Altering the select clause to be more restrictive will reduce the deferred gather time back to a few milliseconds.

SELECT DATEPART(year, o_orderdate) AS o_year, l_extendedprice * (1 - l_discount / 100.0) as volume, n2.n_name as nation FROM ...

2.1.4.7.4.6 Common solutions for reducing gather time

• Reduce the effect of the preparation time. Avoid selecting unnecessary columns (SELECT * FROM...), or reduce the result set size by using more filters.

2.1.4.7.4.7 3. Inefficient filtering

When running statements, SQream DB tries to avoid reading data that is not needed for the statement by skipping chunks. If statements do not include efficient filtering, SQream DB will read a lot of data off disk. In some cases, you need the data and there’s nothing to do about it. However, if most of it gets pruned further down the line, it may be more efficient to skip reading the data altogether by using the metadata.

2.1.4.7.4.8 Identifying the situation

We consider the filtering to be inefficient when the Filter node shows that the number of rows processed is less than a third of the rows passed into it by the ReadTable node.
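This rule of thumb can be written as a small check on the two row counts. The Python sketch below is illustrative only: the function names are hypothetical, and the row counts in the usage lines are made-up values rather than output from SQream DB.

```python
def filter_pass_ratio(filter_rows: int, read_rows: int) -> float:
    """Fraction of the rows produced by ReadTable that survive the Filter node."""
    if read_rows <= 0:
        raise ValueError("read_rows must be positive")
    return filter_rows / read_rows


def is_inefficient_filter(filter_rows: int, read_rows: int) -> bool:
    """Apply the rule of thumb: fewer than a third of the rows read are kept."""
    return filter_pass_ratio(filter_rows, read_rows) < 1 / 3


# Hypothetical row counts, for illustration only.
print(is_inefficient_filter(filter_rows=2_000_000, read_rows=100_000_000))   # True
print(is_inefficient_filter(filter_rows=90_000_000, read_rows=100_000_000))  # False
```

The rows values of the Filter and ReadTable nodes reported by show_node_info can be passed to the same check.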


For example:
1. Run a query. In this example, we execute a modified query from the TPC-H benchmark. Our lineitem table contains 600,037,902 rows.

SELECT o_year,
       SUM(CASE WHEN nation = 'BRAZIL' THEN volume ELSE 0 END) / SUM(volume) AS mkt_share
FROM (SELECT datepart(YEAR,o_orderdate) AS o_year,
             l_extendedprice*(1 - l_discount / 100.0) AS volume,
             n2.n_name AS nation
      FROM lineitem
        JOIN part ON p_partkey = CAST (l_partkey AS INT)
        JOIN orders ON l_orderkey = o_orderkey
        JOIN customer ON o_custkey = c_custkey
        JOIN nation n1 ON c_nationkey = n1.n_nationkey
        JOIN region ON n1.n_regionkey = r_regionkey
        JOIN supplier ON s_suppkey = l_suppkey
        JOIN nation n2 ON s_nationkey = n2.n_nationkey
      WHERE r_name = 'AMERICA'
        AND lineitem.l_quantity = 3
        AND o_orderdate BETWEEN '1995-01-01' AND '1996-12-31'
        AND high_selectivity(p_type = 'ECONOMY BURNISHED NICKEL')) AS all_nations
GROUP BY o_year
ORDER BY o_year;

2. Observe the execution information by using the foreign table, or use show_node_info.
The execution below has been shortened, but note the rows for ReadTable and Filter:

 1 t=> SELECT show_node_info(559);
 2 stmt_id | node_id | node_type          | rows      | chunks | avg_rows_in_chunk | time                | parent_node_id | read   | write | comment         | timeSum
 3 --------+---------+--------------------+-----------+--------+-------------------+---------------------+----------------+--------+-------+-----------------+--------
 4 559     | 1       | PushToNetworkQueue | 2         | 1      | 2                 | 2020-09-07 11:12:01 | -1             |        |       |                 | 0.28
 5 559     | 2       | Rechunk            | 2         | 1      | 2                 | 2020-09-07 11:12:01 | 1              |        |       |                 | 0
 6 559     | 3       | SortMerge          | 2         | 1      | 2                 | 2020-09-07 11:12:01 | 2              |        |       |                 | 0
 7 559     | 4       | GpuToCpu           | 2         | 1      | 2                 | 2020-09-07 11:12:01 | 3              |        |       |                 | 0
 8 [...]
 9 559     | 189     | Filter             | 12007447  | 12     | 1000620           | 2020-09-07 11:12:00 | 188            |        |       |                 | 0.3
10 559     | 190     | GpuTransform       | 600037902 | 12     | 50003158          | 2020-09-07 11:12:00 | 189            |        |       |                 | 0.02
11 559     | 191     | GpuDecompress      | 600037902 | 12     | 50003158          | 2020-09-07 11:12:00 | 190            |        |       |                 | 0.16
12 559     | 192     | GpuTransform       | 600037902 | 12     | 50003158          | 2020-09-07 11:12:00 | 191            |        |       |                 | 0.02
13 559     | 193     | CpuToGpu           | 600037902 | 12     | 50003158          | 2020-09-07 11:12:00 | 192            |        |       |                 | 1.47
14 559     | 194     | ReorderInput       | 600037902 | 12     | 50003158          | 2020-09-07 11:12:00 | 193            |        |       |                 | 0
15 559     | 195     | Rechunk            | 600037902 | 12     | 50003158          | 2020-09-07 11:12:00 | 194            |        |       |                 | 0
16 559     | 196     | CpuDecompress      | 600037902 | 12     | 50003158          | 2020-09-07 11:12:00 | 195            |        |       |                 | 0
17 559     | 197     | ReadTable          | 600037902 | 12     | 50003158          | 2020-09-07 11:12:00 | 196            | 7587MB |       | public.lineitem | 0.1
18 [...]
19 559     | 208     | Filter             | 133241    | 20     | 6662              | 2020-09-07 11:11:57 | 207            |        |       |                 | 0.01
20 559     | 209     | GpuTransform       | 20000000  | 20     | 1000000           | 2020-09-07 11:11:57 | 208            |        |       |                 | 0.02
21 559     | 210     | GpuDecompress      | 20000000  | 20     | 1000000           | 2020-09-07 11:11:57 | 209            |        |       |                 | 0.03
22 559     | 211     | GpuTransform       | 20000000  | 20     | 1000000           | 2020-09-07 11:11:57 | 210            |        |       |                 | 0
23 559     | 212     | CpuToGpu           | 20000000  | 20     | 1000000           | 2020-09-07 11:11:57 | 211            |        |       |                 | 0.01
24 559     | 213     | ReorderInput       | 20000000  | 20     | 1000000           | 2020-09-07 11:11:57 | 212            |        |       |                 | 0
25 559     | 214     | Rechunk            | 20000000  | 20     | 1000000           | 2020-09-07 11:11:57 | 213            |        |       |                 | 0
26 559     | 215     | CpuDecompress      | 20000000  | 20     | 1000000           | 2020-09-07 11:11:57 | 214            |        |       |                 | 0
27 559     | 216     | ReadTable          | 20000000  | 20     | 1000000           | 2020-09-07 11:11:57 | 215            | 20MB   |       | public.part     | 0

• The Filter on line 9 has processed 12,007,447 rows, but the output of ReadTable on public.lineitem on line 17 was 600,037,902 rows. This means that it has filtered out 98% ($1 - \frac{12007447}{600037902} \approx 98\%$) of the data, but the entire table was read.
• The Filter on line 19 has processed 133,241 rows, but the output of ReadTable on public.part on line 27 was 20,000,000 rows. This means that it has filtered out more than 99% ($1 - \frac{133241}{20000000} \approx 99.3\%$) of the data, but the entire table was read. However, this table is small enough that we can ignore it.
3. Modify the statement to see the difference
Altering the statement to have a WHERE condition on the clustered l_orderkey column of the lineitem table will help SQream DB skip reading the data.

SELECT o_year,
       SUM(CASE WHEN nation = 'BRAZIL' THEN volume ELSE 0 END) / SUM(volume) AS mkt_share
FROM (SELECT datepart(YEAR,o_orderdate) AS o_year,
             l_extendedprice*(1 - l_discount / 100.0) AS volume,
             n2.n_name AS nation
      FROM lineitem
        JOIN part ON p_partkey = CAST (l_partkey AS INT)
        JOIN orders ON l_orderkey = o_orderkey
        JOIN customer ON o_custkey = c_custkey
        JOIN nation n1 ON c_nationkey = n1.n_nationkey
        JOIN region ON n1.n_regionkey = r_regionkey
        JOIN supplier ON s_suppkey = l_suppkey
        JOIN nation n2 ON s_nationkey = n2.n_nationkey
      WHERE r_name = 'AMERICA'
        AND lineitem.l_orderkey > 4500000
        AND o_orderdate BETWEEN '1995-01-01' AND '1996-12-31'
        AND high_selectivity(p_type = 'ECONOMY BURNISHED NICKEL')) AS all_nations
GROUP BY o_year
ORDER BY o_year;

t=> SELECT show_node_info(586);
stmt_id | node_id | node_type     | rows      | chunks | avg_rows_in_chunk | time                | parent_node_id | read   | write | comment         | timeSum
--------+---------+---------------+-----------+--------+-------------------+---------------------+----------------+--------+-------+-----------------+--------
[...]
586     | 190     | Filter        | 494621593 | 8      | 61827699          | 2020-09-07 13:20:45 | 189            |        |       |                 | 0.39
586     | 191     | GpuTransform  | 494927872 | 8      | 61865984          | 2020-09-07 13:20:44 | 190            |        |       |                 | 0.03
586     | 192     | GpuDecompress | 494927872 | 8      | 61865984          | 2020-09-07 13:20:44 | 191            |        |       |                 | 0.26
586     | 193     | GpuTransform  | 494927872 | 8      | 61865984          | 2020-09-07 13:20:44 | 192            |        |       |                 | 0.01
586     | 194     | CpuToGpu      | 494927872 | 8      | 61865984          | 2020-09-07 13:20:44 | 193            |        |       |                 | 1.86
586     | 195     | ReorderInput  | 494927872 | 8      | 61865984          | 2020-09-07 13:20:44 | 194            |        |       |                 | 0
586     | 196     | Rechunk       | 494927872 | 8      | 61865984          | 2020-09-07 13:20:44 | 195            |        |       |                 | 0
586     | 197     | CpuDecompress | 494927872 | 8      | 61865984          | 2020-09-07 13:20:44 | 196            |        |       |                 | 0
586     | 198     | ReadTable     | 494927872 | 8      | 61865984          | 2020-09-07 13:20:44 | 197            | 6595MB |       | public.lineitem | 0.09
[...]

In this example, the filter processed 494,621,593 rows, while the output of ReadTable on public.lineitem was 494,927,872 rows. This means that the filter removed only about 0.06% ($1 - \frac{494621593}{494927872} \approx 0.06\%$) of the data that was read. The metadata skipping has performed very well, and has pre-filtered the data for us by pruning unnecessary chunks.

2.1.4.7.4.9 Common solutions for improving filtering

• Use clustering keys and naturally ordered data in your filters.
• Avoid full table scans when possible.


2.1.4.7.4.10 4. Joins with varchar keys

Joins on long text keys, such as VARCHAR(100), do not perform as well as joins on numeric data types or very short text keys.

2.1.4.7.4.11 Identifying the situation

When a join is inefficient, you may note that a query spends a lot of time on the Join node. For example, consider these two table structures:

CREATE TABLE t_a (
  amt FLOAT NOT NULL,
  i INT NOT NULL,
  ts DATETIME NOT NULL,
  country_code VARCHAR(3) NOT NULL,
  flag VARCHAR(10) NOT NULL,
  fk VARCHAR(50) NOT NULL
);

CREATE TABLE t_b (
  id VARCHAR(50) NOT NULL,
  prob FLOAT NOT NULL,
  j INT NOT NULL
);

1. Run a query. In this example, we will join t_a.fk with t_b.id, both of which are VARCHAR(50).

SELECT AVG(t_b.j :: BIGINT), t_a.country_code FROM t_a JOIN t_b ON (t_a.fk = t_b.id) GROUP BY t_a.country_code

2. Observe the execution information by using the foreign table, or use show_node_info.
The execution below has been shortened, but note the row for Join. The Join node is by far the most time-consuming part of this statement, clocking in at 69.7 seconds to join 1.5 billion records.

t=> SELECT show_node_info(5);
stmt_id | node_id | node_type            | rows       | chunks | avg_rows_in_chunk | time                | parent_node_id | read  | write | comment    | timeSum
--------+---------+----------------------+------------+--------+-------------------+---------------------+----------------+-------+-------+------------+--------
[...]
5       | 19      | GpuTransform         | 1497366528 | 204    | 7340032           | 2020-09-08 18:29:03 | 18             |       |       |            | 1.46
5       | 20      | ReorderInput         | 1497366528 | 204    | 7340032           | 2020-09-08 18:29:03 | 19             |       |       |            | 0
5       | 21      | ReorderInput         | 1497366528 | 204    | 7340032           | 2020-09-08 18:29:03 | 20             |       |       |            | 0
5       | 22      | Join                 | 1497366528 | 204    | 7340032           | 2020-09-08 18:29:03 | 21             |       |       | inner      | 69.7
5       | 24      | AddSortedMinMaxMet.. | 6291456    | 1      | 6291456           | 2020-09-08 18:26:05 | 22             |       |       |            | 0
5       | 25      | Sort                 | 6291456    | 1      | 6291456           | 2020-09-08 18:26:05 | 24             |       |       |            | 2.06
[...]
5       | 31      | ReadTable            | 6291456    | 1      | 6291456           | 2020-09-08 18:26:03 | 30             | 235MB |       | public.t_b | 0.02
[...]
5       | 41      | CpuDecompress        | 10000000   | 2      | 5000000           | 2020-09-08 18:26:09 | 40             |       |       |            | 0
5       | 42      | ReadTable            | 10000000   | 2      | 5000000           | 2020-09-08 18:26:09 | 41             | 14MB  |       | public.t_a | 0

2.1.4.7.4.12 Improving query performance

• In general, try to avoid VARCHAR as a join key. As a rule of thumb, BIGINT works best as a join key.
• Convert text values on-the-fly before running the query. For example, the CRC64 function takes a text input and returns a BIGINT hash:

SELECT AVG(t_b.j :: BIGINT), t_a.country_code FROM t_a JOIN t_b ON (crc64_join(t_a.fk) = crc64_join(t_b.id)) GROUP BY t_a.country_code

The execution below has been shortened, but note the highlighted rows for Join. The Join node went from taking nearly 70 seconds, to just 6.67 seconds for joining 1.5 billion records.

t=> SELECT show_node_info(6);
stmt_id | node_id | node_type            | rows       | chunks | avg_rows_in_chunk | time                | parent_node_id | read  | write | comment    | timeSum
--------+---------+----------------------+------------+--------+-------------------+---------------------+----------------+-------+-------+------------+--------
[...]
6       | 19      | GpuTransform         | 1497366528 | 85     | 17825792          | 2020-09-08 18:57:04 | 18             |       |       |            | 1.48
6       | 20      | ReorderInput         | 1497366528 | 85     | 17825792          | 2020-09-08 18:57:04 | 19             |       |       |            | 0
6       | 21      | ReorderInput         | 1497366528 | 85     | 17825792          | 2020-09-08 18:57:04 | 20             |       |       |            | 0
6       | 22      | Join                 | 1497366528 | 85     | 17825792          | 2020-09-08 18:57:04 | 21             |       |       | inner      | 6.67
6       | 24      | AddSortedMinMaxMet.. | 6291456    | 1      | 6291456           | 2020-09-08 18:55:12 | 22             |       |       |            | 0
[...]
6       | 32      | ReadTable            | 6291456    | 1      | 6291456           | 2020-09-08 18:55:12 | 31             | 235MB |       | public.t_b | 0.02
[...]
6       | 43      | CpuDecompress        | 10000000   | 2      | 5000000           | 2020-09-08 18:55:13 | 42             |       |       |            | 0
6       | 44      | ReadTable            | 10000000   | 2      | 5000000           | 2020-09-08 18:55:13 | 43             | 14MB  |       | public.t_a | 0

• You can map some text values to numeric types by using a dimension table. Then, reconcile the values when you need them by joining the dimension table.
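The dimension-table idea can be sketched outside the database. The Python below is a minimal illustration under stated assumptions, not SQream DB functionality: the function name and the sample keys are hypothetical. It assigns a small integer to each distinct text key, so that the large tables store and join on the integer, and the text is recovered only when it is needed.

```python
def build_dimension(values):
    """Assign a small integer ID to each distinct text value, in first-seen order."""
    dimension = {}
    for value in values:
        if value not in dimension:
            dimension[value] = len(dimension) + 1
    return dimension


# Hypothetical text keys, for illustration only.
text_keys = ["cust-000017", "cust-000042", "cust-000017", "cust-000099"]

dim = build_dimension(text_keys)
encoded = [dim[key] for key in text_keys]               # numeric keys to store and join on
decode = {num_id: text for text, num_id in dim.items()}  # lookup back to the original text

print(encoded)
print([decode[num_id] for num_id in encoded] == text_keys)
```

In the database, `dim` would correspond to a two-column dimension table (numeric ID and text value), and the final lookup corresponds to joining that table back in when the text value is required.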

2.1.4.7.4.13 5. Sorting on big VARCHAR fields

In general, SQream DB automatically inserts a Sort node which arranges the data prior to reductions and aggregations. When running a GROUP BY on large VARCHAR fields, you may see nodes for Sort and Reduce taking a long time.

2.1.4.7.4.14 Identifying the situation

When running a statement, inspect it with SHOW_NODE_INFO. If you see Sort and Reduce among your top five longest running nodes, there is a potential issue.
For example:
1. Run a query to test it out. Our t_inefficient table contains 60,000,000 rows, and the structure is simple, but with an oversized country_code column:

CREATE TABLE t_inefficient ( i INT NOT NULL, amt DOUBLE NOT NULL, ts DATETIME NOT NULL, country_code VARCHAR(100) NOT NULL, flag VARCHAR(10) NOT NULL, string_fk VARCHAR(50) NOT NULL );

We will run a query, and inspect its execution details:

t=> SELECT country_code,
.          SUM(amt)
.   FROM t_inefficient
.   GROUP BY country_code;
executed time: 47.55s

country_code | sum
-------------+-----------
VUT          | 1195416012
GIB          | 1195710372
TUR          | 1195946178
[...]

t=> select show_node_info(30);
stmt_id | node_id | node_type          | rows     | chunks | avg_rows_in_chunk | time                | parent_node_id | read  | write | comment              | timeSum
--------+---------+--------------------+----------+--------+-------------------+---------------------+----------------+-------+-------+----------------------+--------
30      | 1       | PushToNetworkQueue | 249      | 1      | 249               | 2020-09-10 16:17:10 | -1             |       |       |                      | 0.25
30      | 2       | Rechunk            | 249      | 1      | 249               | 2020-09-10 16:17:10 | 1              |       |       |                      | 0
30      | 3       | ReduceMerge        | 249      | 1      | 249               | 2020-09-10 16:17:10 | 2              |       |       |                      | 0.01
30      | 4       | GpuToCpu           | 1508     | 15     | 100               | 2020-09-10 16:17:10 | 3              |       |       |                      | 0
30      | 5       | Reduce             | 1508     | 15     | 100               | 2020-09-10 16:17:10 | 4              |       |       |                      | 7.23
30      | 6       | Sort               | 60000000 | 15     | 4000000           | 2020-09-10 16:17:10 | 5              |       |       |                      | 36.8
30      | 7       | GpuTransform       | 60000000 | 15     | 4000000           | 2020-09-10 16:17:10 | 6              |       |       |                      | 0.08
30      | 8       | GpuDecompress      | 60000000 | 15     | 4000000           | 2020-09-10 16:17:10 | 7              |       |       |                      | 2.01
30      | 9       | CpuToGpu           | 60000000 | 15     | 4000000           | 2020-09-10 16:17:10 | 8              |       |       |                      | 0.16
30      | 10      | Rechunk            | 60000000 | 15     | 4000000           | 2020-09-10 16:17:10 | 9              |       |       |                      | 0
30      | 11      | CpuDecompress      | 60000000 | 15     | 4000000           | 2020-09-10 16:17:10 | 10             |       |       |                      | 0
30      | 12      | ReadTable          | 60000000 | 15     | 4000000           | 2020-09-10 16:17:10 | 11             | 520MB |       | public.t_inefficient | 0.05

2. We can look to see if there’s any shrinking we can do on the GROUP BY key

t=> SELECT MAX(LEN(country_code)) FROM t_inefficient;
max
---
3

With a maximum string length of just 3 characters, our VARCHAR(100) is way oversized.
3. We can recreate the table with a more restrictive VARCHAR(3), and can examine the difference in performance:

t=> CREATE TABLE t_efficient
.   AS SELECT i,
.             amt,
.             ts,
.             country_code::VARCHAR(3) AS country_code,
.             flag
.      FROM t_inefficient;
executed time: 16.03s

t=> SELECT country_code,
.          SUM(amt::bigint)
.   FROM t_efficient
.   GROUP BY country_code;
executed time: 4.75s
country_code | sum
-------------+-----------
VUT          | 1195416012
GIB          | 1195710372
TUR          | 1195946178
[...]

This time, the entire query took just 4.75 seconds instead of 47.55 seconds, a reduction of about 90%.

2.1.4.7.4.15 Improving sort performance on text keys

When using VARCHAR, ensure that the maximum length defined in the table structure is as small as necessary. For example, if you’re storing phone numbers, don’t define the field as VARCHAR(255), as that affects sort performance. You can run a query to get the maximum column length (e.g. MAX(LEN(a_column))), and potentially modify the table structure.

2.1.4.7.4.16 6. High selectivity data

Selectivity is the ratio of cardinality to the number of records of a chunk. We define selectivity as

$\text{selectivity} = \frac{\text{distinct values}}{\text{total number of records in a chunk}}$
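As a worked illustration of this definition, the Python sketch below computes the ratio for a list of values standing in for one chunk. The function name and the sample values are hypothetical and only mirror the definition; they do not reflect how SQream DB computes or stores this figure internally.

```python
def selectivity(chunk_values):
    """Distinct values divided by the total number of records in the chunk."""
    values = list(chunk_values)
    if not values:
        raise ValueError("chunk is empty")
    return len(set(values)) / len(values)


# Hypothetical chunks, for illustration only.
print(selectivity(["US", "US", "DE", "US"]))  # 2 distinct values / 4 records = 0.5
print(selectivity([1, 2, 3, 4]))              # every value is distinct = 1.0
```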


SQream DB has a hint called HIGH_SELECTIVITY, which is a function you can wrap a condition in. The hint signals to SQream DB that the result of the condition will be very sparse, and that it should attempt to rechunk the results into fewer, fuller chunks.

Note: SQream DB doesn’t do this automatically because it adds a significant overhead on naturally ordered and well-clustered data, which is the more common scenario.

2.1.4.7.4.17 Identifying the situation

This is easily identifiable: the average number of rows in a chunk is small following a Filter operation. Consider this execution plan:

t=> select show_node_info(30);
stmt_id | node_id | node_type | rows     | chunks | avg_rows_in_chunk | time                | parent_node_id | read  | write | comment    | timeSum
--------+---------+-----------+----------+--------+-------------------+---------------------+----------------+-------+-------+------------+--------
[...]
30      | 38      | Filter    | 18160    | 74     | 245               | 2020-09-10 12:17:09 | 37             |       |       |            | 0.012
[...]
30      | 44      | ReadTable | 77000000 | 74     | 1040540           | 2020-09-10 12:17:09 | 43             | 277MB |       | public.dim | 0.058

The table was read entirely - 77 million rows into 74 chunks. The filter node reduced the output to just 18,160 relevant rows, but they’re distributed across the original 74 chunks. All of these rows could fit in one single chunk, instead of spanning 74 rather sparse chunks.
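The sparsity can be checked with a few lines of arithmetic on the rows and chunks columns. The Python sketch below uses the values from the plan above; treating the ReadTable average as the capacity of a full chunk is an assumption made only for this illustration.

```python
import math

rows_read, chunks_read = 77_000_000, 74  # ReadTable node (node ID 44)
rows_kept, chunks_kept = 18_160, 74      # Filter node (node ID 38)

avg_rows_read = rows_read // chunks_read  # matches avg_rows_in_chunk for ReadTable
avg_rows_kept = rows_kept // chunks_kept  # matches avg_rows_in_chunk for Filter

# Assumption: a full chunk holds roughly as many rows as the ReadTable average.
chunks_needed = math.ceil(rows_kept / avg_rows_read)

print(avg_rows_read, avg_rows_kept, chunks_needed)
```

The filtered rows occupy 74 chunks where a single chunk would do, which is the situation the HIGH_SELECTIVITY hint is meant to address.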

2.1.4.7.4.18 Improving performance with high selectivity hints

• Use when there’s a WHERE condition on an unclustered column, and when you expect the filter to cut out more than 60% of the result set.
• Use when the data is uniformly distributed or random.

2.1.4.7.4.19 7. Performance of unsorted data in joins

When data is not well-clustered or naturally ordered, a join operation can take a long time.

2.1.4.7.4.20 Identifying the situation

When running a statement, inspect it with SHOW_NODE_INFO. If you see Join and DeferredGather among your top five longest running nodes, there is a potential issue. In this case, we’re also interested in the number of chunks produced by these nodes. Consider this execution plan:


t=> select show_node_info(30);
stmt_id | node_id | node_type      | rows      | chunks | avg_rows_in_chunk | time                | parent_node_id | read  | write | comment    | timeSum
--------+---------+----------------+-----------+--------+-------------------+---------------------+----------------+-------+-------+------------+--------
[...]
30      | 13      | ReorderInput   | 181582598 | 70596  | 2572              | 2020-09-10 12:17:10 | 12             |       |       |            | 4.681
30      | 14      | DeferredGather | 181582598 | 70596  | 2572              | 2020-09-10 12:17:10 | 13             |       |       |            | 29.901
30      | 15      | ReorderInput   | 181582598 | 70596  | 2572              | 2020-09-10 12:17:10 | 14             |       |       |            | 3.053
30      | 16      | GpuToCpu       | 181582598 | 70596  | 2572              | 2020-09-10 12:17:10 | 15             |       |       |            | 5.798
30      | 17      | ReorderInput   | 181582598 | 70596  | 2572              | 2020-09-10 12:17:10 | 16             |       |       |            | 2.899
30      | 18      | ReorderInput   | 181582598 | 70596  | 2572              | 2020-09-10 12:17:10 | 17             |       |       |            | 3.695
30      | 19      | Join           | 181582598 | 70596  | 2572              | 2020-09-10 12:17:10 | 18             |       |       | inner      | 22.745
[...]
30      | 38      | Filter         | 18160     | 74     | 245               | 2020-09-10 12:17:09 | 37             |       |       |            | 0.012
[...]
30      | 44      | ReadTable      | 77000000  | 74     | 1040540           | 2020-09-10 12:17:09 | 43             | 277MB |       | public.dim | 0.058

• Join is the node that matches rows from both table relations.
• DeferredGather gathers the required column chunks to decompress.
Pay special attention to the volume of data removed by the Filter node. The table was read entirely - 77 million rows into 74 chunks. The filter node reduced the output to just 18,160 relevant rows, but they’re distributed across the original 74 chunks. All of these rows could fit in one single chunk, instead of spanning 74 rather sparse chunks.

2.1.4.7.4.21 Improving join performance when data is sparse

If you know that the filter is going to be quite aggressive, you can tell SQream DB to reduce the number of chunks involved by using the HIGH_SELECTIVITY hint described above. This forces the compiler to rechunk the data into fewer chunks. To tell SQream DB to rechunk the data, wrap a condition (or several) in the HIGH_SELECTIVITY hint:

-- Without the hint
SELECT *
FROM cdrs
WHERE RequestReceiveTime BETWEEN '2018-01-01 00:00:00.000' AND '2018-08-31 23:59:59.999'
  AND EnterpriseID=1150
  AND MSISDN='9724871140341';

-- With the hint
SELECT *
FROM cdrs
WHERE HIGH_SELECTIVITY(RequestReceiveTime BETWEEN '2018-01-01 00:00:00.000' AND '2018-08-31 23:59:59.999')
  AND EnterpriseID=1150
  AND MSISDN='9724871140341';

2.1.4.7.4.22 8. Manual join reordering

When joining multiple tables, you may wish to change the join order to join the smallest tables first.

2.1.4.7.4.23 Identifying the situation

When joining more than two tables, the Join nodes will be the most time-consuming nodes.

2.1.4.7.4.24 Changing the join order

Always prefer to join the smallest tables first.

Note: We consider small tables to be tables that only retain a small number of rows after conditions are applied. This bears no direct relation to the total number of rows in the table.

Changing the join order can reduce the query runtime significantly. In the examples below, we reduce the time from 27.3 seconds to just 6.4 seconds.

Listing 11: Original query
-- This variant runs in 27.3 seconds
SELECT SUM(l_extendedprice / 100.0*(1 - l_discount / 100.0)) AS revenue,
       c_nationkey
FROM lineitem --6B Rows, ~183GB
  JOIN orders --1.5B Rows, ~55GB
    ON l_orderkey = o_orderkey
  JOIN customer --150M Rows, ~12GB
    ON c_custkey = o_custkey
WHERE c_nationkey = 1
  AND o_orderdate >= DATE '1993-01-01'
  AND o_orderdate < '1994-01-01'
  AND l_shipdate >= '1993-01-01'
  AND l_shipdate <= dateadd(DAY,122,'1994-01-01')
GROUP BY c_nationkey


Listing 12: Modified query with improved join order
-- This variant runs in 6.4 seconds
SELECT SUM(l_extendedprice / 100.0*(1 - l_discount / 100.0)) AS revenue,
       c_nationkey
FROM orders --1.5B Rows, ~55GB
  JOIN customer --150M Rows, ~12GB
    ON c_custkey = o_custkey
  JOIN lineitem --6B Rows, ~183GB
    ON l_orderkey = o_orderkey
WHERE c_nationkey = 1
  AND o_orderdate >= DATE '1993-01-01'
  AND o_orderdate < '1994-01-01'
  AND l_shipdate >= '1993-01-01'
  AND l_shipdate <= dateadd(DAY,122,'1994-01-01')
GROUP BY c_nationkey

2.1.4.7.5 Further reading

See our Optimization and Best Practices guide for more information about query optimization and data loading considerations.

2.1.4.8 Logging

2.1.4.8.1 Locating the Log Files

The storage cluster contains a logs directory. Each worker produces a log file in its own directory, which can be identified by the worker’s hostname and port.

Note: Additional internal debug logs may reside in the main logs directory.

The worker logs contain information messages, warnings, and errors pertaining to SQream DB’s operation, including:
• Server start-up and shutdown
• Configuration changes
• Exceptions and errors
• User login events
• Session events
• Statement execution success / failure
• Statement execution statistics


2.1.4.8.1.1 Log Structure and Contents

The log is a CSV, with several fields.

Table 8: Log fields
Field             | Description
------------------+------------------------------------------------------------
#SQ#              | Start delimiter. When used with the end of line delimiter, it can be used to parse multi-line statements correctly
Row Id            | Unique identifier for the row
Timestamp         | Timestamp for the message (ISO 8601 date format)
Information Level | Information level of the message. See the information level table below
Thread Id         | System thread identifier (internal use)
Worker hostname   | Hostname of the worker that generated the message
Worker port       | Port of the worker that generated the message
Connection Id     | Connection Id for the message. Defaults to -1 if no connection
Database name     | Database name that generated the message. Can be empty for no database
User Id           | User role that was connected during the message. Can be empty if no user caused the message
Statement Id      | Statement Id for the message. Defaults to -1 if no statement
Service name      | Service name for the connection. Can be empty.
Message type Id   | Message type Id. See the message type table below
Message           | Content for the message
#EOM#             | End of line delimiter

Table 9: Information Level
Level   | Description
--------+------------------------------------------------------------------
SYSTEM  | System information like start up, shutdown, configuration change
FATAL   | Fatal errors that may cause outage
ERROR   | Errors encountered during statement execution
WARNING | Warnings
INFO    | Information and statistics


Table 10: Message Type
Type | Level  | Description                                        | Example message content
-----+--------+----------------------------------------------------+------------------------
1    | INFO   | Statement start information                        | "Query before parsing" (statement handle opened); "SELECT * FROM nba WHERE ""Team"" NOT LIKE ""Portland%%""" (statement preparing)
2    | INFO   | Statement passed to another worker for execution   | "Reconstruct query before parsing"; "SELECT * FROM nba WHERE ""Team"" NOT LIKE ""Portland%%""" (statement preparing on node)
4    | INFO   | Statement has entered execution                    | "Statement execution"
10   | INFO   | Statement execution completed                      | "Success" / "Failed"
20   | INFO   | Compilation error, with accompanying error message | "Could not find function dateplart in catalog."
21   | INFO   | Execution error, with accompanying error message   | Error text
30   | INFO   | Size of data read from disk in megabytes           | 18
31   | INFO   | Row count of result set                            | 45
32   | INFO   | Processed Rows                                     | 450134749978
100  | INFO   | Session start - Client IP address                  | "192.168.5.5"
101  | INFO   | Login                                              | "Login Success" / "Login Failed"
110  | INFO   | Session end                                        | "Session ended"
200  | INFO   | SHOW_NODE_INFO periodic output                     |
500  | ERROR  | Exception occurred in a statement                  | "Cannot return the inverse cosine of a number not in [-1,1] range"
1000 | SYSTEM | Worker startup message                             | "Server Start Time - 2019-12-30 21:18:31, SQream ver{v2020.1}"
1002 | SYSTEM | Metadata                                           | Metadata server location
1003 | SYSTEM | Show all configuration values                      | "Flags configuration: compileFlags, extendedAssertions, false, true; compileFlags, useSortMergeJoin, false, false; compileFlags, distinctAggregatesOnHost, true, false; [...]"
1004 | SYSTEM | SQream DB metadata version                         | "23"
1010 | FATAL  | Fatal server error                                 | "Mismatch in storage version, upgrade is needed, Storage version: 22, Server version is: 23"
1090 | INFO   | Configuration change                               | Successful set config useSortMergeJoin to value: true
1100 | SYSTEM | Worker shutdown                                    | "Server shutdown"

2.1.4.8.1.2 Log-Naming

Log file name syntax: sqream_<date>_<sequence>.log
• date is formatted %y%m%d, for example 20191231 for December 31st 2019. By default, each worker will create a new log file every time it is restarted.
• sequence is the log’s sequence. When a log is rotated, the sequence number increases. This starts at 000.
For example, /home/rhendricks/sqream_storage/192.168.1.91_5000.
See Changing Log Rotation below for information about controlling this setting.
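A log file name can be split back into its date and sequence parts when sorting or pruning log files. The Python sketch below is illustrative only and rests on assumptions: it expects names of the form sqream_<date>_<sequence>.log, an eight-digit date as in the 20191231 example, and a purely numeric sequence. The function name and the sample file name are hypothetical.

```python
import re
from datetime import date, datetime

# Assumed pattern: sqream_<8-digit date>_<numeric sequence>.log
LOG_NAME = re.compile(r"^sqream_(?P<date>\d{8})_(?P<sequence>\d+)\.log$")


def parse_log_name(file_name: str) -> tuple[date, int]:
    """Return the (date, sequence) encoded in a worker log file name."""
    match = LOG_NAME.match(file_name)
    if match is None:
        raise ValueError(f"not a recognised log file name: {file_name!r}")
    log_date = datetime.strptime(match["date"], "%Y%m%d").date()
    return log_date, int(match["sequence"])


# Hypothetical file name, for illustration only.
print(parse_log_name("sqream_20191231_000.log"))
```

Sorting the parsed tuples orders the files chronologically, with rotated files from the same day ordered by sequence.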

2.1.4.8.2 Log Control and Maintenance

2.1.4.8.2.1 Changing Log Verbosity

A few configuration settings alter the verbosity of the logs:

Table 11: Log verbosity configuration
Flag               | Description                                                                                                                          | Default           | Values
-------------------+--------------------------------------------------------------------------------------------------------------------------------------+-------------------+-------
logClientLevel     | Used to control which log level should appear in the logs                                                                            | 4 (INFO)          | 0 SYSTEM (lowest) - 4 INFO (highest). See information level table above.
nodeInfoLoggingSec | Sets an interval for automatically logging long-running statements’ SHOW_NODE_INFO output. Output is written as a message type 200. | 60 (every minute) | Positive whole number >=1.

2.1.4.8.2.2 Changing Log Rotation

A few configuration settings alter the log rotation policy:

Table 12: Log rotation configuration
Flag                       | Description                                                                                          | Default | Values
---------------------------+------------------------------------------------------------------------------------------------------+---------+-------
useLogMaxFileSize          | Rotate log files once they reach a certain file size. When true, set the logMaxFileSizeMB accordingly. | false   | false or true.
logMaxFileSizeMB           | Sets the size threshold in megabytes after which a new log file will be opened.                      | 20      | 1 to 1024 (1MB to 1GB)
logFileRotateTimeFrequency | Frequency of log rotation                                                                            | never   | daily, weekly, monthly, never

2.1.4.8.3 Collecting Logs from Your Cluster

Collecting logs from your cluster can be as simple as creating an archive from the logs subdirectory: tar -czvf logs.tgz *.log.
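The same kind of archive can be produced from a script with Python's standard tarfile module. This is a minimal sketch rather than a SQream DB utility: the function name is hypothetical, and searching subdirectories recursively is an assumption made because each worker writes to its own directory.

```python
import tarfile
from pathlib import Path


def archive_logs(logs_dir: str, archive_path: str) -> list[str]:
    """Write every *.log file under logs_dir into a gzip-compressed tar archive.

    Returns the archived paths, relative to logs_dir.
    """
    root = Path(logs_dir)
    log_files = sorted(p for p in root.rglob("*.log") if p.is_file())
    with tarfile.open(archive_path, "w:gz") as archive:
        for log_file in log_files:
            # Store paths relative to logs_dir so the archive unpacks cleanly.
            archive.add(log_file, arcname=str(log_file.relative_to(root)))
    return [str(p.relative_to(root)) for p in log_files]


# Example call (paths are placeholders):
# archive_logs("/path/to/sqream_storage/logs", "/tmp/logs.tgz")
```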


However, SQream DB comes bundled with a data collection utility and an SQL utility intended for collecting logs and additional information that can help SQream support drill down into possible issues.

2.1.4.8.3.1 SQL Syntax

SELECT REPORT_COLLECTION(output_path, mode) ;

output_path ::= filepath

mode ::= log | db | db_and_log

2.1.4.8.3.2 Command Line Utility

If you cannot access SQream DB for any reason, you can also use a command line tool to collect the same information:

$ ./bin/report_collection

2.1.4.8.3.3 Parameters

Parameter   | Description
------------+------------
output_path | Path for the output archive. The output file will be named report__

2.1.4.8.3.4 Example

Write an archive to /home/rhendricks, containing log files:

SELECT REPORT_COLLECTION('/home/rhendricks', 'log') ;

Write an archive to /home/rhendricks, containing log files and metadata database:

SELECT REPORT_COLLECTION('/home/rhendricks', 'db_and_log') ;

Using the command line utility:

$ ./bin/report_collection /home/rhendricks/sqream_storage /home/rhendricks db_and_log

2.1.4.8.4 Troubleshooting with Logs

2.1.4.8.4.1 Loading Logs with Foreign Tables

Assuming logs are stored at /home/rhendricks/sqream_storage/logs/, a database administrator can access the logs using the Foreign Tables concept through SQream DB.

CREATE FOREIGN TABLE logs
(
  start_marker      VARCHAR(4),
  row_id            BIGINT,
  timestamp         DATETIME,
  message_level     TEXT,
  thread_id         TEXT,
  worker_hostname   TEXT,
  worker_port       INT,
  connection_id     INT,
  database_name     TEXT,
  user_name         TEXT,
  statement_id      INT,
  service_name      TEXT,
  message_type_id   INT,
  message           TEXT,
  end_message       VARCHAR(5)
)
WRAPPER csv_fdw
OPTIONS
(
  LOCATION = '/home/rhendricks/sqream_storage/logs/**/sqream*.log',
  DELIMITER = '|',
  CONTINUE_ON_ERROR = true
)
;

For more information, see Loading Logs with Foreign Tables.

2.1.4.8.4.2 Counting Message Types

t=> SELECT message_type_id, COUNT(*) FROM logs GROUP BY 1;
message_type_id | count
----------------+------
0               | 9
1               | 5578
4               | 2319
10              | 2788
20              | 549
30              | 411
31              | 1720
32              | 1720
100             | 2592
101             | 2598
110             | 2571
200             | 11
500             | 136
1000            | 19
1003            | 19
1004            | 19
1010            | 5

2.1.4.8.4.3 Finding Fatal Errors

t=> SELECT message FROM logs WHERE message_type_id=1010;
Internal Runtime Error,open cluster metadata database:IO error: lock /home/rhendricks/sqream_storage/leveldb/LOCK: Resource temporarily unavailable
Internal Runtime Error,open cluster metadata database:IO error: lock /home/rhendricks/sqream_storage/leveldb/LOCK: Resource temporarily unavailable
Mismatch in storage version, upgrade is needed,Storage version: 25, Server version is: 26
Mismatch in storage version, upgrade is needed,Storage version: 25, Server version is: 26
Internal Runtime Error,open cluster metadata database:IO error: lock /home/rhendricks/sqream_storage/LOCK: Resource temporarily unavailable

2.1.4.8.4.4 Counting Error Events Within a Certain Timeframe

t=> SELECT message_type_id,
.          COUNT(*)
.   FROM logs
.   WHERE message_type_id IN (1010,500)
.   AND timestamp BETWEEN '2019-12-20' AND '2020-01-01'
.   GROUP BY 1;
message_type_id | count
----------------+------
500             | 18
1010            | 3

2.1.4.8.4.5 Tracing Errors to Find Offending Statements

If we know an error occurred, but don’t know which statement caused it, we can find it using the connection ID and statement ID.

t=> SELECT connection_id, statement_id, message
.   FROM logs
.   WHERE message_level = 'ERROR'
.   AND timestamp BETWEEN '2020-01-01' AND '2020-01-06';
connection_id | statement_id | message
--------------+--------------+--------
79            | 67           | Column type mismatch, expected UByte, got INT64 on column Number, file name: /home/sqream/nba.parquet

Use the connection_id and statement_id to narrow down the results.

t=> SELECT database_name, message FROM logs
.   WHERE connection_id=79 AND statement_id=67 AND message_type_id=1;
database_name | message
--------------+--------------------------
master        | Query before parsing
master        | SELECT * FROM nba_parquet
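When SQream DB itself is unavailable, a similar message-type count can be approximated straight from the log files. The sketch below assumes that each log record sits on a single line and that fields are pipe-delimited in the column order of the logs foreign table, which makes message_type_id field 13. Records whose message text spans several lines may be skipped or miscounted. The function name count_message_types is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count log records per message_type_id.
# Reads the files given as arguments, or standard input when none are given.
count_message_types() {
    awk -F'|' '
        # Field 13 is message_type_id in the assumed column order.
        # Only count lines where it is a whole number.
        $13 ~ /^[0-9]+$/ { count[$13]++ }
        END { for (id in count) print id, count[id] }
    ' "$@" | sort -n
}
```

A possible invocation is count_message_types /home/rhendricks/sqream_storage/logs/*.log, which prints one "message_type_id count" pair per line, sorted by ID.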

2.1.4.9 Configuration

The Configuration page describes SQream’s method for configuring your instance of SQream and includes the following topics:

• Overview
• Configuring Your Instance of SQream
• Session-Based Configuration
• Cluster-Based Configuration
• Configuration Flag Types
• Configuration Commands
• Configuration Roles
• Showing All Flags in the Catalog Table

2.1.4.9.1 Overview

Whether a configuration change persists depends on the level at which it is made: session-level changes are discarded when the session ends, while cluster-level changes are persistent. Persistent configurations are modifications made to attributes that are retained after shutting down your system.

2.1.4.9.2 Configuring Your Instance of SQream

The Configuring Your Instance of SQream section describes the following:

• Configuring SQream Using the SQream Configuration File
• Configuring SQream Using a Legacy Configuration File

2.1.4.9.2.1 Configuring SQream Using the SQream Configuration File

You can configure your instance of SQream using the SQream configuration file (config.json). This file displays all the parameters required for configuring your instance of SQream; these parameters are read-only. The parameter settings in this file are stored at the metadata level and are applied globally to all workers connected to it. The following is an example of a worker configuration file:

{
  "cluster": "/home/test_user/sqream_testing_temp/sqreamdb",
  "gpu": 0,
  "licensePath": "home/test_user/SQream/tests/license.enc",
  "machineIP": "127.0.0.1",
  "metadataServerIp": "127.0.0.1",
  "metadataServerPort": 3105,
  "port": 5000,
  "useConfigIP": true,
  "legacyConfigFilePath": "home/SQream_develop/SqrmRT/utils/json/legacy_congif.json"
}

You can access the legacy configuration file from the path given in the legacyConfigFilePath parameter shown above.

2.1.4.9.2.2 Configuring SQream Using a Legacy Configuration File

The legacy configuration file provides access to the read/write flags used in SQream’s previous configuration method. A link to this file is provided in the legacyConfigFilePath parameter in the worker configuration file. The following is an example of the legacy configuration file:

{
  "developerMode": true,
  "reextentUse": false,
  "useClientLog": true,
  "useMetadataServer": false
}

2.1.4.9.3 Session-Based Configuration

Session-based configurations are not persistent and are deleted when your session ends. This method enables you to modify all required configurations while avoiding conflicts between flag attributes modified on different devices at different points in time.

The SET flag_name command is used to modify flag attributes. Any modifications you make with the SET flag_name command apply only to the open session, and are not saved when the session ends.

For more information, see the following:
• Using SQream SQL - modifying flag attributes from the CLI.
• SQream Acceleration Studio - modifying flag attributes from Studio.

2.1.4.9.4 Cluster-Based Configuration

SQream uses cluster-based configuration, enabling you to centralize configurations for all workers on the cluster. Only flags set to the regular or cluster flag type have access to cluster-based configuration. Configurations made on the cluster level are persistent.

For more information, see the following:
• Using SQream SQL - modifying flag attributes from the CLI.
• SQream Acceleration Studio - modifying flag attributes from Studio.

For more information on flag-based access to cluster-based configuration, see Configuration Flag Types below.

2.1.4.9.5 Configuration Flag Types

The flag type attribute can be set for each flag and determines its write access as follows:
• Regular: session-based read/write flags that can be stored in the metadata file
• Cluster: global cluster-based read/write flags that can be stored in the metadata file
• Worker: single worker-based read-only flags that can be stored in the worker configuration file

The flag type determines which files can be accessed and which commands or command sets users can run. The following table describes the file or command modification rights for each flag type:

Flag Type   Legacy Configuration File   ALTER SYSTEM SET   Worker Config File
Regular     Can modify                  Can modify         Cannot modify
Cluster     Cannot modify               Can modify         Cannot modify
Worker      Cannot modify               Cannot modify      Can modify

2.1.4.9.6 Configuration Commands

The configuration commands are associated with particular flag types based on permissions. The following table describes the commands or command sets that can be run based on their flag type:

Flag Type: Regular
Command: SET
Description: Used for modifying flag attributes.
Example: SET developerMode=true

Flag Type: Cluster
Command: ALTER SYSTEM SET
Description: Used for storing or modifying flag attributes in the metadata file.
Example: ALTER SYSTEM SET

Flag Type: Cluster
Command: ALTER SYSTEM RESET
Description: Used to remove a flag or all flags from the metadata file.
Example: ALTER SYSTEM RESET ALL

Flag Type: Regular, Cluster
Command: SHOW
Description: Used to print flag values.
Example: SHOW

Flag Type: Regular, Cluster, Worker
Command: SHOW ALL LIKE
Description: Used as a wildcard character for flag names.
Example: SHOW

Flag Type: Regular, Cluster, Worker
Command: show_conf UF
Description: Used to print all flags with the following attributes:
• Flag name
• Default value
• Is developer mode (Boolean)
• Flag category
• Flag type
Example: rechunkThreshold,90,true,RND,regular

Flag Type: Regular, Cluster, Worker
Command: show_conf_extended UF
Description: Used to print all information output by the show_conf UF command, in addition to description, usage, data type, default value and range.
Example: spoolMemoryGB,15,false,generic,regular,Amount of memory (GB) the server can use for spooling,"Statement that perform ""group by"", ""order by"" or ""join"" operation(s) on large set of data will run much faster if given enough spool memory, otherwise disk spooling will be used resulting in performance hit.",uint,,0-5000

Flag Type: Regular, Cluster, Worker
Command: show_md_flag UF
Description: Used to show a specific flag/all flags stored in the metadata file.
Examples:
• Example 1: master=> ALTER SYSTEM SET heartbeatTimeout=111;
• Example 2: master=> select show_md_flag('all');
  heartbeatTimeout,111
• Example 3: master=> select show_md_flag('heartbeatTimeout');
  heartbeatTimeout,111

2.1.4.9.7 Configuration Roles

SQream divides flags into the following roles, each with their own set of permissions:
• Generic – Flags that can be modified by standard users on a session basis.
• Admin – Flags that can be modified by administrators on a session and cluster basis using the ALTER SYSTEM SET command.

The following table describes the Generic and Admin configuration flags:

2.1.4.9.8 Showing All Flags in the Catalog Table

SQream uses the sqream_catalog.parameters catalog table for showing all flags, providing the scope (default, cluster and session), description, default value and actual value. The following is the correct syntax for a catalog table query:

SELECT * FROM sqream_catalog.settings

The following is an example of the output of a catalog table query:

externalTableBlobEstimate, 100, 100, default,
varcharEncoding, ascii, ascii, default, Changes the expected encoding for Varchar columns
useCrcForTextJoinKeys, true, true, default,
hiveStyleImplicitStringCasts, false, false, default,

2.1.4.10 Troubleshooting

In this topic:

• Troubleshooting common issues
  – Troubleshoot cluster setup and configuration
  – Troubleshoot connectivity issues
  – Troubleshoot query performance
  – Troubleshoot query behavior
  – File an issue with SQream support
• Examining logs
• Start a temporary SQream DB for testing

Follow this checklist if you find that the performance is slower than you expect.

Table 13: Troubleshooting checklist

Step 1: A single query is slow
If a query isn’t performing as you expect, follow the Query best practices part of the Optimization and Best Practices guide.
If all queries are slow, continue to step 2.

Step 2: All queries on a specific table are slow
1. If all queries on a specific table aren’t performing as you expect, follow the Table design best practices part of the Optimization and Best Practices guide.
2. Check for active delete predicates in the table. Consult the Deleting Data guide for more information.
If the problem spans all tables, continue to step 3.

Step 3: Check that all workers are up
Use SELECT show_cluster_nodes(); to list the active cluster workers.
If the worker list is incomplete, follow the cluster troubleshooting section below.
If all workers are up, continue to step 4.

Step 4: Check that all workers are performing well
1. Identify if a specific worker is slower than others by running the same query on different workers (e.g. by connecting directly to the worker or through a service queue).
2. If a specific worker is slower than others, investigate performance issues on the host using standard monitoring tools (e.g. top).
3. Restart SQream DB workers on the problematic host.
If all workers are performing well, continue to step 5.

Step 5: Check if the workload is balanced across all workers
1. Run the same query several times and check that it appears across multiple workers (use SELECT show_server_status() to monitor).
2. If some workers have a heavier workload, check the service queue usage. Refer to the Workload Manager guide.
If the workload is balanced, continue to step 6.

Step 6: Check if there are long running statements
1. Identify any currently running statements (use SELECT show_server_status() to monitor).
2. If there are more statements than available resources, some statements may be in an In queue mode.
3. If there is a statement that has been running for too long and is blocking the queue, consider stopping it (use SELECT stop_statement()). If the statement does not stop correctly, contact SQream support.
If there are no long running statements or this does not help, continue to step 7.

Step 7: Check if there are active locks
1. Use SELECT show_locks() to list any outstanding locks.
2. If a statement is locking some objects, consider waiting for that statement to end or stop it.
3. If after a statement is completed the locks don’t free up, refer to the Concurrency and locks guide.
If performance does not improve after the locks are released, continue to step 8.

Step 8: Check free memory across hosts
1. Check free memory across the hosts by running $ free -th from the terminal.
2. If the machine has less than 5% free memory, consider lowering the limitQueryMemoryGB and spoolMemoryGB settings. Refer to the Configuration guide.
3. If the machine has a lot of free memory, consider increasing the limitQueryMemoryGB and spoolMemoryGB settings.
If performance does not improve, contact SQream support for more help.
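The 5% free-memory threshold from step 8 of the checklist can be checked with a short script. This is a minimal sketch, assuming the column layout printed by the Linux free utility (total in the second column, free in the fourth). The function names free_memory_percent and memory_hint are hypothetical. The sketch uses the "free" column as the checklist describes; on hosts with a large page cache the "available" column may be a better indicator.

```shell
#!/bin/sh
# Reads the output of `free -b` on standard input and prints the percentage
# of physical memory reported in the "free" column (one decimal place).
free_memory_percent() {
    awk '/^Mem:/ { printf "%.1f\n", ($4 / $2) * 100 }'
}

# Prints a hint based on the 5% threshold from step 8 of the checklist.
memory_hint() {
    percent=$1
    # awk performs the floating-point comparison that plain sh cannot.
    if awk -v p="$percent" 'BEGIN { exit !(p < 5) }'; then
        echo "low: consider lowering limitQueryMemoryGB and spoolMemoryGB"
    else
        echo "ok: free memory is at or above 5%"
    fi
}
```

On a host, the two functions can be combined as memory_hint "$(free -b | free_memory_percent)".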

2.1.4.10.1 Troubleshooting common issues

2.1.4.10.1.1 Troubleshoot cluster setup and configuration

1. Note any errors - Make a note of any error you see, or check the logs for errors you might have missed.
2. If SQream DB can’t start, start SQream DB on a new storage cluster, with default settings. If it still can’t start, there could be a driver or hardware issue. Contact SQream support.
3. Reproduce the issue with a standalone SQream DB - starting up a temporary, standalone SQream DB can isolate the issue to a configuration issue, network issue, or similar.
4. Reproduce on a minimal example - Start a standalone SQream DB on a clean storage cluster and try to replicate the issue if possible.

2.1.4.10.1.2 Troubleshoot connectivity issues

1. Verify the correct login credentials - username, password, and database name.
2. Verify the host name and port.
3. Try connecting directly to a SQream DB worker, rather than via the load balancer.
4. Verify that the driver version you’re using is supported by the SQream DB version. Driver versions often get updated together with major SQream DB releases.
5. Try connecting directly with the built in SQL client. If you can connect with the local SQL client, check network availability and firewall settings.

2.1.4.10.1.3 Troubleshoot query performance

1. Use SHOW_NODE_INFO to examine which building blocks consume time in a statement. If the query has finished, but the results are not yet materialized in the client, it could point to a problem in the application’s data buffering or a network throughput issue.
2. If a problem occurs through a 3rd party client, try reproducing it directly with the built in SQL client. If the performance is better in the local client, it could point to a problem in the application or network connection.
3. Consult the Optimization and Best Practices guide to learn how to optimize queries and table structures.

2.1.4.10.1.4 Troubleshoot query behavior

1. Consult the SQL Statements and Syntax reference to verify if a statement or syntax behaves correctly. SQream DB may have some differences in behavior when compared to other databases.
2. If a problem occurs through a 3rd party client, try reproducing it directly with the built in SQL client. If the problem still occurs, file an issue with SQream support.

2.1.4.10.1.5 File an issue with SQream support

To file an issue, follow our Gathering information for SQream support guide.

2.1.4.10.2 Examining logs

See the Collecting logs and metadata database section of the Gathering information for SQream support guide for information about collecting logs for support.

2.1.4.10.3 Start a temporary SQream DB for testing

Starting a SQream DB temporarily (not as part of a cluster, with default settings) can be helpful in identifying configuration issues. Example:

$ sqreamd /home/rhendricks/raviga_database 0 5000 /home/sqream/.sqream/license.enc

Tip:
• Using nohup and & sends SQream DB to run in the background.
• It is safe to stop SQream DB at any time using kill. No partial data or data corruption should occur when using this method to stop the process.

$ kill -9 $SQREAM_PID
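The nohup and & pattern from the tip can be wrapped in a small helper that also records the process ID needed for the kill command. This is a sketch only: the function name start_in_background, the BG_PID variable, and the log file argument are assumptions for illustration, not part of SQream DB.

```shell
#!/bin/sh
# Hypothetical helper: start a command detached from the terminal.
# Usage: start_in_background <log_file> <command> [arguments...]
# The PID of the started process is stored in the BG_PID variable.
start_in_background() {
    log_file=$1
    shift
    # nohup keeps the process alive after the terminal closes; output is
    # redirected to the log file instead of the default nohup.out.
    nohup "$@" >"$log_file" 2>&1 &
    BG_PID=$!
}

# Example (commented out because it requires a SQream DB installation):
# start_in_background sqreamd.log sqreamd /home/rhendricks/raviga_database 0 5000 /home/sqream/.sqream/license.enc
# SQREAM_PID=$BG_PID
# kill -9 $SQREAM_PID
```

Storing the PID in a variable, rather than printing it, keeps the started process a direct child of the calling shell, so the shell can still wait for it after it is stopped.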

2.1.4.11 Gathering information for SQream support

SQream Support is ready to answer any questions, and help solve any issues with SQream DB.

2.1.4.11.1 Getting support and reporting bugs

When contacting SQream Support, we recommend reporting the following information:
• What is the problem encountered?
• What was the expected outcome?
• How can SQream reproduce the issue?

When possible, please attach as many of the following:
• Error messages or result outputs
• DDL and queries that reproduce the issue
• Log files
• Screen captures if relevant

2.1.4.11.2 How SQream debugs issues

2.1.4.11.2.1 Reproduce

If we are able to easily reproduce your issue in our testing lab, this greatly improves the speed at which we can fix it.

Reproducing an issue consists of understanding:
1. What was SQream DB doing at the time?
2. How is the SQream DB cluster configured?
3. How does the schema look?
4. What is the query or statement that exposed the problem?
5. Were there any external factors? (e.g. Network disconnection, hardware failure, etc.)

See the Collecting a reproducible example of a problematic statement section ahead for information about collecting a full reproducible example.

2.1.4.11.2.2 Logs

The logs produced by SQream DB contain a lot of information that may be useful for debugging. Look for error messages in the log and the offending statements. SQream’s support staff are experienced in correlating logs to workloads, and finding possible problems. See the Collecting logs and metadata database section ahead for information about collecting a set of logs that can be analyzed by SQream support.

2.1.4.11.2.3 Fix

Once we have a fix, it can be issued as a hotfix to an existing version, or as part of a bigger major release. Your SQream account manager will keep you up-to-date about the status of the issue.

2.1.4.11.3 Collecting a reproducible example of a problematic statement

SQream DB contains an SQL utility that can help SQream support reproduce a problem with a query or statement. This utility compiles and executes a statement, and collects the relevant data in a small database which can be used to recreate and investigate the issue.

2.1.4.11.3.1 SQL Syntax

SELECT EXPORT_REPRODUCIBLE_SAMPLE(output_path, query_stmt [, ... ]) ;

output_path ::= filepath

2.1.4.11.3.2 Parameters

Parameter           Description
output_path         Path for the output archive. The output file will be a tarball.
query_stmt [, ...]  Statements to analyze.

2.1.4.11.3.3 Example

SELECT EXPORT_REPRODUCIBLE_SAMPLE('/home/rhendricks', 'SELECT * FROM t', $$SELECT "Name", "Team" FROM nba$$);

2.1.4.11.4 Collecting logs and metadata database

SQream DB comes bundled with a data collection utility and an SQL utility intended for collecting logs and additional information that can help SQream support drill down into possible issues. See more information in the Collect logs from your cluster section of the Logging guide.

2.1.4.11.4.1 Examples

Write an archive to /home/rhendricks, containing log files:

SELECT REPORT_COLLECTION('/home/rhendricks', 'log') ;

Write an archive to /home/rhendricks, containing log files and metadata database:

SELECT REPORT_COLLECTION('/home/rhendricks', 'db_and_log') ;

Using the command line utility:

$ ./bin/report_collection /home/rhendricks/sqream_storage /home/rhendricks db_and_log

2.1.4.12 Creating or cloning a storage cluster

When SQream DB is installed, it comes with a default storage cluster. This guide will help if you need a fresh storage cluster or a separate copy of an existing storage cluster.

2.1.4.12.1 Creating a new storage cluster

SQream DB comes with a CLI tool, SqreamStorage. This tool can be used to create a new empty storage cluster. In this example, we will create a new cluster at /home/rhendricks/raviga_database:

$ SqreamStorage --create-cluster --cluster-root /home/rhendricks/raviga_database
Setting cluster version to: 26

This can also be written in shorthand as SqreamStorage -C -r /home/rhendricks/raviga_database. The Setting cluster version... message confirms that the cluster was created successfully.

2.1.4.12.2 Tell SQream DB to use this storage cluster

2.1.4.12.2.1 Permanently setting the storage cluster setting

To permanently set the new cluster location, change the "cluster" path listed in the configuration file. For example:

{
  "compileFlags":{
  },
  "runtimeFlags":{
  },
  "runtimeGlobalFlags":{
  },
  "server":{
    "gpu": 0,
    "port": 5000,
    "cluster": "/home/sqream/my_old_cluster",
    "licensePath": "/home/sqream/.sqream/license.enc"
  }
}

should be changed to

{
  "compileFlags":{
  },
  "runtimeFlags":{
  },
  "runtimeGlobalFlags":{
  },
  "server":{
    "gpu": 0,
    "port": 5000,
    "cluster": "/home/rhendricks/raviga_database",
    "licensePath": "/home/sqream/.sqream/license.enc"
  }
}

Now, the cluster should be restarted for the changes to take effect.

2.1.4.12.2.2 Start a temporary SQream DB worker with a storage cluster

Starting a SQream DB worker with a custom cluster path can be done in two ways:

2.1.4.12.2.3 Using a configuration file (recommended)

Similar to the technique above, create a configuration file with the correct cluster path. Then, start sqreamd using the -config flag:

$ sqreamd -config config_file.json

2.1.4.12.2.4 Using the command line parameters

Use sqreamd’s command line parameters to override the default storage cluster path:

$ sqreamd /home/rhendricks/raviga_database 0 5000 /home/sqream/.sqream/license.enc

Note: sqreamd’s command line parameters’ order is sqreamd

2.1.4.12.3 Copying an existing storage cluster

Copying an existing storage cluster to another path may be useful for testing or troubleshooting purposes.
1. Identify the location of the active storage cluster. This path can be found in the configuration file, under the "cluster" parameter.
2. Shut down the SQream DB cluster. This prevents very large storage directories from being modified during the copy process.
3. (optional) Create a tarball of the storage cluster, with tar -zcvf sqream_cluster_`date +"%Y-%m-%d-%H-%M"`.tgz . This will create a tarball with the current date and time as part of the filename.
4. Copy the storage cluster directory (or tarball) with cp to another location on the local filesystem, or use rsync to copy to a remote server.
5. After the copy is completed, start the SQream DB cluster to continue using SQream DB.
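The optional tarball step above can be sketched as a small function. It assumes the SQream DB cluster has already been shut down, as step 2 requires. The function name archive_storage_cluster and its arguments are hypothetical, and absolute paths are assumed for both arguments.

```shell
#!/bin/sh
# Hypothetical helper: archive a storage cluster directory into a tarball
# named with the current date and time, then print the tarball path.
# Run it only after the SQream DB cluster has been shut down.
# Usage: archive_storage_cluster <cluster_path> <destination_dir>
archive_storage_cluster() {
    cluster_path=$1
    destination_dir=$2
    tarball="$destination_dir/sqream_cluster_$(date +"%Y-%m-%d-%H-%M").tgz"

    # -C switches to the parent directory so that the archive contains only
    # the cluster directory name rather than its full absolute path.
    tar -zcf "$tarball" \
        -C "$(dirname "$cluster_path")" "$(basename "$cluster_path")" || return 1

    echo "$tarball"
}
```

The printed tarball path can then be passed to cp or rsync as described in step 4.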

2.1.4.13 Installing Studio on a Stand-Alone Server

This guide explains how to install SQream Studio on a stand-alone server. A stand-alone server is a server that does not run SQreamd based on binary, Docker, or Kubernetes.

This guide describes how to do the following:
• Install NodeJS version 12 on the server
• Install Studio
• Start Studio manually
• Start Studio as a service
• Access Studio
• Maintain Studio using the Process Manager (PM2) service
• Upgrade to the new Studio version
• Use the configuration setup arguments

2.1.4.13.1 Installing NodeJS Version 12 on the Server

Before installing Studio, you must install NodeJS version 12 on the server.
To install NodeJS version 12 on the server:
1. Check whether a version of NodeJS older than version 12 has been installed on the target server.

$ node -v

The following is the output if NodeJS has not been installed on the target server:

bash: /usr/bin/node: No such file or directory

2. If a version of NodeJS older than version 12 has been installed, remove it as follows:
• On CentOS:

$ sudo yum remove -y nodejs

• On Ubuntu:

$ sudo apt remove -y nodejs

3. If you have not installed NodeJS version 12, run the following commands:
• On CentOS:

$ curl -sL https://rpm.nodesource.com/setup_12.x | sudo bash -
$ sudo yum clean all && sudo yum makecache fast
$ sudo yum install -y nodejs

• On Ubuntu:

$ curl -sL https://deb.nodesource.com/setup_12.x | sudo -E bash -
$ sudo apt-get install -y nodejs

The following output is displayed if your installation has completed successfully:

Transaction Summary
======
Install  1 Package

Total download size: 22 M
Installed size: 67 M
Downloading packages:
warning: /var/cache/yum/x86_64/7/nodesource/packages/nodejs-12.22.1-1nodesource.x86_64.rpm: Header V4 RSA/SHA512 Signature, key ID 34fa74dd: NOKEY
Public key for nodejs-12.22.1-1nodesource.x86_64.rpm is not installed
nodejs-12.22.1-1nodesource.x86_64.rpm                    | 22 MB  00:00:02
Retrieving key from file:///etc/pki/rpm-gpg/NODESOURCE-GPG-SIGNING-KEY-EL
Importing GPG key 0x34FA74DD:
 Userid     : "NodeSource "
 Fingerprint: 2e55 207a 95d9 944b 0cc9 3261 5ddb e8d4 34fa 74dd
 Package    : nodesource-release-el7-1.noarch (installed)
 From       : /etc/pki/rpm-gpg/NODESOURCE-GPG-SIGNING-KEY-EL
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Warning: RPMDB altered outside of yum.
  Installing : 2:nodejs-12.22.1-1nodesource.x86_64                    1/1
  Verifying  : 2:nodejs-12.22.1-1nodesource.x86_64                    1/1

Installed:
  nodejs.x86_64 2:12.22.1-1nodesource

Complete!

4. Confirm the Node version.

$ node -v

The following is an example of the correct output:

v12.22.1
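The version check can also be scripted. The following is a minimal sketch that extracts the major version from a string such as the node -v output shown above and tests whether it is 12. The function names node_major_version and is_node_12 are hypothetical.

```shell
#!/bin/sh
# Extract the major version number from a string such as "v12.22.1".
node_major_version() {
    version=${1#v}          # drop the leading "v"
    echo "${version%%.*}"   # keep everything before the first dot
}

# Succeeds only when the given version string reports major version 12.
is_node_12() {
    [ "$(node_major_version "$1")" = "12" ]
}
```

On the target server, this could be used as is_node_12 "$(node -v)" && echo "NodeJS 12 is installed".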

2.1.4.13.2 Installing Studio

To install Studio:
1. Copy the SQream Studio package from SQream Artifactory into the target server. For access to the SQream Studio package, contact SQream Support.
2. Extract the package:

$ tar -xvf sqream-acceleration-studio-.x86_64.tar.gz

3. Navigate to the new package folder.

$ cd sqream-admin

4. Build the configuration file to set up SQream Studio. You can use IP address 127.0.0.1 on a single server.

$ npm run setup -- -y --host= --port=3108

The above command creates the sqream-admin-config.json configuration file in the sqream-admin folder and shows the following output:

Config generated successfully. Run `npm start` to start the app.

For more information about the available set-up arguments, see Set-Up Arguments.

5. If you have installed Studio on a server where SQream is already installed, move the sqream-admin-config.json file to /etc/sqream/:

$ mv sqream-admin-config.json /etc/sqream

2.1.4.13.3 Starting Studio Manually

You can start Studio manually by running the following command:

$ cd /home/sqream/sqream-admin
$ NODE_ENV=production pm2 start ./server/build/main.js --name=sqream-studio -- start

The following output is displayed:

[PM2] Starting /home/sqream/sqream-admin/server/build/main.js in fork_mode (1 instance)
[PM2] Done.
┌────┬───────────────┬───────────┬─────────┬──────┬───────┬────────┬───┬────────┬─────┬────────┬────────┬──────────┐
│ id │ name          │ namespace │ version │ mode │ pid   │ uptime │ ↺ │ status │ cpu │ mem    │ user   │ watching │
├────┼───────────────┼───────────┼─────────┼──────┼───────┼────────┼───┼────────┼─────┼────────┼────────┼──────────┤
│ 0  │ sqream-studio │ default   │ 0.1.0   │ fork │ 11540 │ 0s     │ 0 │ online │ 0%  │ 15.6mb │ sqream │ disabled │
└────┴───────────────┴───────────┴─────────┴──────┴───────┴────────┴───┴────────┴─────┴────────┴────────┴──────────┘

2.1.4.13.4 Starting Studio as a Service

SQream uses the Process Manager (PM2) to maintain Studio.
To start Studio as a service:
1. Run the following command:

$ sudo npm install -g pm2

2. Verify that PM2 has been installed successfully.

$ pm2 list

The following is the output:

┌────┬───────────────┬───────────┬─────────┬──────┬───────┬────────┬───┬────────┬─────┬────────┬────────┬──────────┐
│ id │ name          │ namespace │ version │ mode │ pid   │ uptime │ ↺ │ status │ cpu │ mem    │ user   │ watching │
├────┼───────────────┼───────────┼─────────┼──────┼───────┼────────┼───┼────────┼─────┼────────┼────────┼──────────┤
│ 0  │ sqream-studio │ default   │ 0.1.0   │ fork │ 11540 │ 2m     │ 0 │ online │ 0%  │ 31.5mb │ sqream │ disabled │
└────┴───────────────┴───────────┴─────────┴──────┴───────┴────────┴───┴────────┴─────┴────────┴────────┴──────────┘

3. Start the service with PM2:
• If the sqream-admin-config.json file is located in /etc/sqream/, run the following command:

$ cd /home/sqream/sqream-admin
$ NODE_ENV=production pm2 start ./server/build/main.js --name=sqream-studio -- start --config-location=/etc/sqream/sqream-admin-config.json

• If the sqream-admin-config.json file is not located in /etc/sqream/, run the following command:

$ cd /home/sqream/sqream-admin
$ NODE_ENV=production pm2 start ./server/build/main.js --name=sqream-studio -- start

4. Verify that Studio is running.

$ netstat -nltp

5. Verify that SQream_studio is listening on port 8080, as shown below:

(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address   Foreign Address   State    PID/Program name
tcp        0      0 0.0.0.0:22      0.0.0.0:*         LISTEN   -
tcp        0      0 127.0.0.1:25    0.0.0.0:*         LISTEN   -
tcp6       0      0 :::8080         :::*              LISTEN   11540/sqream-studio
tcp6       0      0 :::22           :::*              LISTEN   -
tcp6       0      0 ::1:25          :::*              LISTEN   -

6. Verify the following:
   1. That you can access Studio from your browser (http://:8080).
   2. That you can log in to SQream.
7. Save the configuration to run on boot.

$ pm2 startup

The following is an example of the output:

$ sudo env PATH=$PATH:/usr/bin /usr/lib/node_modules/pm2/bin/pm2 startup systemd -u sqream --hp /home/sqream

8. Copy and paste the output above and run it.
9. Save the configuration.

$ pm2 save
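The check that Studio is listening on port 8080 can be automated by parsing the netstat output. The sketch below assumes the column layout shown in the netstat example above (local address in the fourth column, state in the sixth). The function name is_port_listening is hypothetical.

```shell
#!/bin/sh
# Reads `netstat -nlt` style output on standard input and succeeds when a
# TCP socket in LISTEN state is bound to the given port.
# Usage: netstat -nlt | is_port_listening <port>
is_port_listening() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $6 == "LISTEN" {
            # The port is the part of the local address after the last colon,
            # which works for both "0.0.0.0:22" and ":::8080".
            n = split($4, parts, ":")
            if (parts[n] == port) found = 1
        }
        END { exit !found }
    '
}
```

For example, netstat -nlt | is_port_listening 8080 && echo "Studio port is open" prints the message only when a listener is found on port 8080.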

2.1.4.13.5 Accessing Studio

The Studio page is available on port 8080: http://:8080.

If port 8080 is blocked by the server firewall, you can unblock it by running the following command:

$ firewall-cmd --zone=public --add-port=8080/tcp --permanent
$ firewall-cmd --reload

2.1.4.13.6 Maintaining Studio with the Process Manager (PM2)

SQream uses the Process Manager (PM2) to maintain Studio.
You can use PM2 to do one of the following:
• To check the PM2 service status: pm2 list
• To restart the PM2 service: pm2 reload sqream-studio
• To see the PM2 service logs: pm2 logs sqream-studio

2.1.4.13.7 Upgrading Studio

To upgrade Studio, you need to stop the version that you currently have.
To stop the current version of Studio:
1. List the process name:

$ pm2 list

The process name is displayed.

2. Run the following command with the process name:

$ pm2 stop

3. If only one process is running, run the following command:

$ pm2 stop all

4. Change the name of the current sqream-admin folder to the old version.

$ mv sqream-admin sqream-admin-

5. Extract the new Studio version.

$ tar -xf sqream-acceleration-studio-tar.gz

6. Rebuild the configuration file. You can use IP address 127.0.0.1 on a single server.


$ npm run setup -- -y --host= --port=3108

The above command creates the sqream-admin-config.json configuration file in the sqream-admin folder.

7. Copy the sqream-admin-config.json configuration file to /etc/sqream/ to overwrite the old configuration file.

8. Start PM2.

$ pm2 start all


2.1.4.13.7.1 Installing Studio in a Docker Container

This guide explains how to install SQream Studio in a Docker container and describes how to do the following:
• Install SQream Studio in a Docker container
• Access Studio
• Use Docker container commands

2.1.4.13.8 Installing SQream Studio in a Docker Container

If you have already installed Docker, you can install SQream Studio in a Docker container.

To install SQream Studio in a Docker container:

1. Copy the downloaded image onto the target server.

2. Load the Docker image.

$ docker load -i

3. If the downloaded image is called sqream-acceleration-studio-5.1.3.x86_64.docker18.0.3.tar, run the following command:

$ docker load -i sqream-acceleration-studio-5.1.3.x86_64.docker18.0.3.tar

4. Start the Docker container.

$ docker run -d --restart=unless-stopped -p :8080 -e runtime=docker -e SQREAM_K8S_PICKER= -e SQREAM_PICKER_PORT= -e SQREAM_DATABASE_NAME= -e SQREAM_ADMIN_UI_PORT=8080 --name=sqream-admin-ui

The following is an example of the command above:

$ docker run -d --name sqream-studio -p 8080:8080 -e runtime=docker -e SQREAM_K8S_PICKER=192.168.0.183 -e SQREAM_PICKER_PORT=3108 -e SQREAM_DATABASE_NAME=master -e SQREAM_ADMIN_UI_PORT=8080 sqream-acceleration-studio:5.1.3
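If you start Studio containers from a script, the example command can be assembled programmatically. The following is a minimal sketch that reproduces the example above. The function name studio_docker_command and its parameter names are hypothetical, and the default values are taken from the example, not from any documented defaults.

```python
import shlex


def studio_docker_command(image, picker_host, picker_port=3108, database="master",
                          host_port=8080, container_name="sqream-studio"):
    """Build the `docker run` argument list used in the example above."""
    env = {
        "runtime": "docker",
        "SQREAM_K8S_PICKER": picker_host,
        "SQREAM_PICKER_PORT": str(picker_port),
        "SQREAM_DATABASE_NAME": database,
        "SQREAM_ADMIN_UI_PORT": "8080",
    }
    # Map the chosen host port to the container's port 8080.
    args = ["docker", "run", "-d", "--name", container_name, "-p", f"{host_port}:8080"]
    for key, value in env.items():
        args += ["-e", f"{key}={value}"]
    args.append(image)
    return args


cmd = studio_docker_command("sqream-acceleration-studio:5.1.3", "192.168.0.183")
print(shlex.join(cmd))
```

Building the command as a list (rather than a single string) lets you pass it directly to subprocess.run without shell quoting issues.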


2.1.4.13.9 Accessing Studio

You can access Studio from Port 8080: http://:8080. If you want to use Studio over a secure connection (https), you must use the parameter values shown in the following table:

Parameter            Default Value  Description
--web-ssl-port       8443
--web-ssl-key-path   None           The path of the SSL key PEM file for enabling https. Leave empty to disable.
--web-ssl-cert-path  None           The path of the SSL certificate PEM file for enabling https. Leave empty to disable.

You can configure the above parameters using the following syntax:

$ npm run setup -- -y --host=127.0.0.1 --port=3108
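The command above does not itself include the SSL parameters from the table. The following is a minimal sketch of how they might be appended, assuming they use the same --name=value form as --host and --port. The helper name setup_command and the PEM file paths are hypothetical.

```python
def setup_command(host="127.0.0.1", port=3108, ssl_port=8443,
                  ssl_key_path=None, ssl_cert_path=None):
    """Build the `npm run setup` argument list, adding SSL flags when both paths are given."""
    args = ["npm", "run", "setup", "--", "-y", f"--host={host}", f"--port={port}"]
    # Per the table, leaving the key and certificate paths empty disables https.
    if ssl_key_path and ssl_cert_path:
        args += [
            f"--web-ssl-port={ssl_port}",
            f"--web-ssl-key-path={ssl_key_path}",
            f"--web-ssl-cert-path={ssl_cert_path}",
        ]
    return args


# Hypothetical PEM locations; replace with the paths of your own key and certificate.
print(" ".join(setup_command(ssl_key_path="/etc/ssl/studio/key.pem",
                             ssl_cert_path="/etc/ssl/studio/cert.pem")))
```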


2.1.4.13.10 Docker Container Commands

When installing Studio in Docker, you can run the following commands: • View Docker container logs:

$ docker logs -f sqream-admin-ui

• Restart the Docker container:

$ docker restart sqream-admin-ui

• Kill and remove the Docker container:

$ docker rm -f sqream-admin-ui


2.1.4.13.11 Setup Argument Configurations

When creating the sqream-admin-config.json configuration file, you can add -y to create the configuration file in non-interactive mode. Configuration files created in non-interactive mode use all the parameter defaults not provided in the command. The following table shows the available arguments:


Parameter             Default Value                      Description
--web--host           8443
--web-port            8080
--web-ssl-port        8443
--web-ssl-key-path    None                               The path of the SSL Key PEM file for enabling https. Leave empty to disable.
--web-ssl-cert-path   None                               The path of the SSL Certificate PEM file for enabling https. Leave empty to disable.
--debug-sqream        false                              (flag)
--host                127.0.0.1
--port                3108
is-cluster            true                               (flag)
--service             sqream
--ssl                 false                              Enables the SQream SSL connection. (flag)
--name                default
--data-collector-url  localhost:8100/api/dashboard/data  Enables the Dashboard. Leaving this blank disables the Dashboard. Using a mock URL uses mock data.
--cluster-type        standalone                         (standalone or k8s)
--config-location     ./sqream-admin-config.json
--network-timeout     60000                              (60 seconds)
--access-key          None                               If defined, UI access is blocked unless ?ui-access= is included in the URL.
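The non-interactive behavior described above (defaults fill in anything not provided) can be pictured as a simple dictionary merge. The following is a sketch only, using a subset of the defaults from the table. The names SETUP_DEFAULTS and effective_config are hypothetical, and this is not the setup script's actual implementation.

```python
# A subset of the defaults listed in the table above (leading dashes dropped).
SETUP_DEFAULTS = {
    "web-port": 8080,
    "web-ssl-port": 8443,
    "web-ssl-key-path": None,
    "web-ssl-cert-path": None,
    "host": "127.0.0.1",
    "port": 3108,
    "ssl": False,
    "cluster-type": "standalone",
    "config-location": "./sqream-admin-config.json",
    "network-timeout": 60000,
}


def effective_config(provided: dict) -> dict:
    """Return the defaults overridden by any explicitly provided arguments."""
    unknown = set(provided) - set(SETUP_DEFAULTS)
    if unknown:
        raise KeyError(f"unknown setup arguments: {sorted(unknown)}")
    return {**SETUP_DEFAULTS, **provided}


config = effective_config({"host": "192.168.0.183"})
print(config["host"], config["port"])
```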

2.1.4.14 SQream Acceleration Studio 5.4.0

The SQream Studio is a web-based client for use with SQream DB. The Studio provides users with all functionality available from the command line in an intuitive and easy-to-use format. This includes running statements, managing roles and permissions, and managing SQream DB clusters.

This page describes the following:

• SQream Acceleration Studio 5.4.0
  – Getting Started
    ∗ Setting Up and Starting Studio
    ∗ Logging In to Studio
    ∗ Navigating Studio’s Main Features
  – Monitoring Workers and Services from the Dashboard
    ∗ Displaying System Disk Usage from the Data Storage Panel
    ∗ Subscribing to Workers from the Services Panel
    ∗ Managing Workers from the Workers Panel
    ∗ License Information
  – Executing Statements and Running Queries from the Editor
    ∗ Executing Statements from the Toolbar
    ∗ Performing Statement-Related Operations from the Database Tree
    ∗ Executing Pre-Defined Queries from the System Queries Panel
    ∗ Writing Statements and Queries from the Statement Panel
    ∗ Viewing Statement and Query Results from the Results Panel
    ∗ Analyzing Results
  – Viewing Logs
    ∗ Filtering Table Data
    ∗ Viewing Query Logs
    ∗ Viewing Session Logs
    ∗ Viewing System Logs
    ∗ Viewing All Log Lines
  – Creating, Assigning, and Managing Roles and Permissions
    ∗ Overview
    ∗ Viewing Information About a Role
    ∗ Creating a New Role
    ∗ Editing a Role
    ∗ Deleting a Role
  – Configuring Your Instance of SQream
    ∗ Editing Your Parameters
    ∗ Exporting and Importing Configuration Files

2.1.4.14.1 Getting Started

2.1.4.14.1.1 Setting Up and Starting Studio

Studio is included with all dockerized installations of SQream DB. When started, Studio listens on port 8080 of the local machine.

2.1.4.14.1.2 Logging In to Studio

To log in to SQream Studio:

1. Open a browser to the host on port 8080. For example, if your machine IP address is 192.168.0.100, enter the following address in the browser:

http://192.168.0.100:8080


2. Fill in your SQream DB login credentials. These are the same credentials used for sqream sql or JDBC. When you sign in, the License Warning is displayed.

2.1.4.14.1.3 Navigating Studio’s Main Features

When you log in, you are automatically taken to the Editor screen. The Studio’s main functions are displayed in the Navigation pane on the left side of the screen. From here you can navigate between the main areas of the Studio:

Element        Description
Dashboard      Lets you monitor cluster storage and system health and manage queues and workers.
Editor         Lets you select databases, perform statement operations, and write and execute queries.
Logs           Lets you view usage logs.
Roles          Lets you create users and manage user permissions.
Configuration  Lets you configure your instance of SQream.

Clicking the user icon lets you log out and view the following:
• User information
• Connection type
• SQream version
• SQream Studio version
• Data size limitations
• Log out

2.1.4.14.2 Monitoring Workers and Services from the Dashboard

The Dashboard is used for the following:
• Monitoring cluster storage and system health.
• Viewing, monitoring, and adding defined service queues.
• Viewing and managing worker status and adding workers.
You can only access the Dashboard if you signed in with a SUPERUSER role. The following is a brief description of the Dashboard panels:

No.  Element              Description
1    Data Storage panel   Used to monitor cluster storage.
2    Services panel       Used for viewing and monitoring the defined service queues.
3    Workers panel        Monitors system health and shows each sqreamd worker running in the cluster.
4    License information  Shows the number of days remaining on your license.


2.1.4.14.2.1 Displaying System Disk Usage from the Data Storage Panel

The Data Storage area displays your system’s total disk usage (percentage) and data storage (donut graph). Your data storage is shown as the following four components:
• Data – Storage occupied by databases in SQream DB.
• Free – Free storage space.
• Deleted – Storage that is temporarily occupied, but has not been reclaimed. For more information, see Deleting Data. The deleted value is estimated and may not be accurate.
• Other – Storage used by other applications. On a dedicated SQream DB cluster this should be near zero.
You can show more information by clicking the expand arrow on the Data Storage panel.

The expanded Data Storage panel can be used to drill down into each database’s storage footprint, and displays a breakdown of how much storage each database in the cluster uses. The database information is displayed in a table and shows the following:
• Database name – the name of the database.
• Raw Size – the estimated size of (uncompressed) raw data in the database.
• Storage Size – the physical size of the compressed data.
• Ratio – the effective compression ratio.
• Deleted Data – Storage that is temporarily occupied, but has not been reclaimed.
Below the table, an interactive line graph displays the database storage trends. By default, the line graph displays the total storage for all databases. You can show a particular database’s graph by clicking it in the table. You can also change the line graph’s timespan by selecting one from the timespan dropdown menu.
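The guide does not state how the Ratio column is calculated. A common convention, assumed here, is raw (uncompressed) size divided by stored (compressed) size. The following sketch illustrates that assumption. The function name compression_ratio is hypothetical.

```python
def compression_ratio(raw_size_bytes: int, storage_size_bytes: int) -> float:
    """Assumed definition: uncompressed size divided by compressed size on disk."""
    if storage_size_bytes <= 0:
        raise ValueError("storage size must be positive")
    return raw_size_bytes / storage_size_bytes


# Illustrative numbers only: 500 GiB of raw data stored in 100 GiB.
print(compression_ratio(500 * 1024**3, 100 * 1024**3))
```

Under this definition, a higher ratio means the data compresses better.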


2.1.4.14.2.2 Subscribing to Workers from the Services Panel

Services are used to categorize and associate (also known as subscribing) workers to particular services. The Service panel is used for viewing, monitoring, and adding defined service queues.

The Dashboard includes the four panes shown below:


The following is a brief description of each pane:

No.  Description
1    Adds a worker to the selected service.
2    Shows the service name.
3    Shows a trend graph of queued statements loaded over time.
4    Adds a service.
5    Shows the currently processed queries belonging to the service/total queries for that service in the system (including queued queries).

2.1.4.14.2.3 Adding A Service

You can add a service by clicking + Add and defining the service name.

Note: If you do not associate a worker with the new service, it will not be created.

You can manage workers from the Workers panel. For more information on managing workers, see Workers.

2.1.4.14.2.4 Managing Workers from the Workers Panel

From the Workers panel you can do the following:
• View workers
• Add a worker to a service
• View a worker’s active query information
• View a worker’s execution plan

2.1.4.14.2.5 Viewing Workers

The Worker panel shows each worker (sqreamd) running in the cluster. Each worker has a status bar that represents the status over time. The status bar is divided into 20 equal segments, each showing the most dominant activity in that segment. From the Scale dropdown menu you can set the time scale of the displayed information. You can hover over segments in the status bar to see the date and time corresponding to each activity type:
• Idle – the worker is idle and available for statements.
• Compiling – the worker is compiling a statement and is preparing for execution.
• Executing – the worker is executing a statement after compilation.
• Stopped – the worker was stopped (either deliberately or due to an error).
• Waiting – the worker was waiting on an object locked by another worker.
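To illustrate the idea of a status bar split into 20 segments that each show the dominant activity, the following sketch takes a list of evenly spaced activity samples and picks the most frequent one per segment. How Studio actually computes the bar is not documented here, so the sampling model and the function name dominant_activities are assumptions.

```python
from collections import Counter


def dominant_activities(samples, segments=20):
    """Return the most frequent activity in each of `segments` equal slices.

    `samples` is a list of activity names (for example "Idle" or "Executing")
    taken at evenly spaced times. Slices with no samples yield None.
    """
    n = len(samples)
    result = []
    for i in range(segments):
        window = samples[i * n // segments:(i + 1) * n // segments]
        if not window:
            result.append(None)
            continue
        result.append(Counter(window).most_common(1)[0][0])
    return result


# 40 samples: the first 30 idle, the last 10 executing.
timeline = ["Idle"] * 30 + ["Executing"] * 10
bar = dominant_activities(timeline)
print(bar.count("Idle"), bar.count("Executing"))
```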

2.1.4.14.2.6 Adding A Worker to A Service

You can add a worker to a service by clicking the add button.

Clicking the add button shows the selected service’s workers. You can add the selected worker to the service by clicking Add Worker. Adding a worker to a service does not break associations already made between that worker and other services.

2.1.4.14.2.7 Viewing A Worker’s Active Query Information

You can view a worker’s active query information by clicking Queries, which displays the queries in the selected service. Each statement shows the query ID, status, service queue, elapsed time, execution time, and estimated completion status. In addition, each statement can be stopped or expanded to show its execution plan and progress. For more information on viewing a statement’s execution plan and progress, see Viewing a Worker’s Execution Plan below.

2.1.4.14.2.8 Viewing A Worker’s Host Utilization

While viewing a worker’s query information, clicking the down arrow expands to show the host resource utilization.


The graphs show the resource utilization trends over time, with the CPU memory and utilization and the GPU utilization values shown on the right. You can hover over the graph to see more information about the activity at any point on the graph. Error notifications related to statements are displayed as shown in the figure below, and you can hover over them for more information about the error.

2.1.4.14.2.9 Viewing a Worker’s Execution Plan

Clicking the ellipsis in a service shows the following additional options:
• Stop Query - stops the query.
• Show Execution Plan - shows the execution plan as a table. The columns in the Show Execution Plan table can be sorted.
For more information on the current query plan, see SHOW_NODE_INFO. For more information on checking active sessions across the cluster, see SHOW_SERVER_STATUS.


Table 14: Statement status values

Status        Description
Preparing     Statement is being prepared
In queue      Statement is waiting for execution
Initializing  Statement has entered execution checks
Executing     Statement is executing
Stopping      Statement is in the process of stopping

2.1.4.14.2.10 Managing Worker Status

In some cases you may want to stop or restart workers for maintenance purposes. Each Worker line has a menu used for stopping, starting, or restarting workers.

Starting or restarting workers terminates all queries related to that worker. When you stop a worker, its background turns gray.

2.1.4.14.2.11 License Information

The license information is a counter showing the amount of time in days remaining on the license.

2.1.4.14.3 Executing Statements and Running Queries from the Editor

The Editor is used for the following:
• Selecting an active database and executing queries.
• Performing statement-related operations and showing metadata.
• Executing pre-defined queries.
• Writing queries and statements and viewing query results.


The following is a brief description of the Editor panels:

No.  Element                                 Description
1    Toolbar                                 Used to select the active database you want to work on, limit the number of rows, save queries, and so on.
2    Database Tree and System Queries panel  Shows a hierarchy tree of databases, views, tables, and columns.
3    Statement panel                         Used for writing queries and statements.
4    Results panel                           Shows query results and execution information.

2.1.4.14.3.1 Executing Statements from the Toolbar

The following figure shows the Toolbar pane:

You can access the following from the Toolbar pane:
• Database dropdown list - select a database that you want to run statements on.
• Service dropdown list - select a service that you want to run statements on. The options in the service dropdown menu depend on the database you select from the Database dropdown list.
• Execute - lets you set which statements to execute. The Execute button toggles between Execute and Stop, and can be used to stop an active statement before it completes:
  – Statements - executes the statement at the location of the cursor.
  – Selected - executes only the highlighted text. This mode should be used when executing subqueries or sections of large queries (as long as they are valid).
  – All - executes all statements in a selected tab.
  For more information on stopping active statements, see the STOP_STATEMENT command.
• Format SQL - lets you reformat and reindent statements.
• Download query - lets you download query text to your computer.
• Open query - lets you upload query text from your computer.
• Max Rows - by default, the Editor fetches only the first 10,000 rows. You can modify this number by selecting an option from the Max Rows dropdown list. Note that setting a higher number may slow down your browser if the result is very large. This number is limited to 100,000 results. To work with a larger result, you can save the results in a file or a table using the CREATE TABLE AS command.
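To make the difference between the Execute modes concrete, the following sketch shows one naive way to find the statement at the cursor and to list all statements in a tab. It splits on semicolons only and ignores semicolons inside string literals or comments. It is an illustration under those assumptions, not Studio's actual parser, and the function names are hypothetical.

```python
def split_statements(text):
    """Return (start, end, statement) for each semicolon-terminated statement."""
    spans = []
    start = 0
    for i, ch in enumerate(text):
        if ch == ";":
            spans.append((start, i + 1, text[start:i + 1].strip()))
            start = i + 1
    tail = text[start:].strip()
    if tail:
        # Trailing text without a closing semicolon still counts as a statement.
        spans.append((start, len(text), tail))
    return spans


def statement_at_cursor(text, cursor):
    """Return the statement whose span contains the cursor offset, or None."""
    for start, end, statement in split_statements(text):
        if start <= cursor <= end:
            return statement
    return None


tab_text = "select 1;\nselect 2;\nselect 3;"
print(statement_at_cursor(tab_text, 12))            # cursor inside the second statement
print([s for _, _, s in split_statements(tab_text)])  # what "All" would run
```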

2.1.4.14.3.2 Performing Statement-Related Operations from the Database Tree

From the Database Tree you can perform statement-related operations and show metadata (such as the number of rows in the table). The following figure shows the Database Tree and System Queries panel, with the Database Tree tab selected.

The following figure shows the database object functions in the calcs table object.

The database object functions are used to perform the following:
• The SELECT statement - copies the selected table’s columns into the Statement panel as SELECT parameters.
• The copy feature - copies the selected table’s name into the Statement panel.
• The additional operations - displays the following additional options:

Function                   Description
Insert statement           Generates an INSERT statement for the selected table in the editing area.
Delete statement           Generates a DELETE statement for the selected table in the editing area.
Create Table As statement  Generates a CREATE TABLE AS statement for the selected table in the editing area.
Rename statement           Generates a RENAME TABLE AS statement for renaming the selected table in the editing area.
Adding column statement    Generates an ADD COLUMN statement for adding columns to the selected table in the editing area.
Truncate table statement   Generates a TRUNCATE_IF_EXISTS statement for the selected table in the editing area.
Drop table statement       Generates a DROP statement for the selected object in the editing area.
Table DDL                  Generates a DDL statement for the selected object in the editing area. To get the entire database DDL, click the icon next to the database name in the tree root. See also Seeing System Objects as DDL.
DDL Optimizer              Lets you analyze database tables and recommends possible optimizations.

2.1.4.14.3.3 Optimizing Database Tables Using the DDL Optimizer

The DDL Optimizer tab analyzes database tables and recommends possible optimizations according to SQream’s best practices. As described in the previous table, you can access the DDL Optimizer by clicking the additional options icon and selecting DDL Optimizer. The following table describes the DDL Optimizer screen:

Element            Description
Column area        Shows the column names and column types from the selected table. You can scroll down or to the right/left for long column lists.
Optimization area  Shows the number of rows to sample as the basis for running an optimization, the default setting (1,000,000) when running an optimization (this is also the overhead threshold used when analyzing VARCHAR fields), and the default percent buffer to add to VARCHAR lengths (10%). Attempts to determine field nullability.
Run Optimizer      Starts the optimization process.

Clicking Run Optimizer adds a tab to the Statement panel showing the optimized results of the selected object. The figure below shows the calcs Optimized tab for the optimized calcs table. For more information, see Optimization and Best Practices.
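The Optimization area settings described above (a row sample, a percent buffer added to VARCHAR lengths, and a nullability check) can be pictured with a short sketch. The following assumes the buffer is applied to the longest sampled value and rounded up. The DDL Optimizer's actual rules are not documented here, and both function names are hypothetical.

```python
import math


def suggested_varchar_length(sample_values, buffer_percent=10):
    """Longest sampled value plus a percent buffer, rounded up (assumed rule)."""
    longest = max((len(v) for v in sample_values if v is not None), default=0)
    return math.ceil(longest * (100 + buffer_percent) / 100)


def looks_nullable(sample_values):
    """Treat a column as nullable if any sampled value is missing."""
    return any(v is None for v in sample_values)


sample = ["a" * 20, "abc", None]
print(suggested_varchar_length(sample), looks_nullable(sample))
```

Because the result is based on a sample, a value longer than anything sampled could still exceed the suggested length, which is one reason for adding a buffer.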

2.1.4.14.3.4 Executing Pre-Defined Queries from the System Queries Panel

The System Queries panel lets you execute pre-defined queries and includes the following system query types:
• Catalog queries - used for analyzing table compression rates, users and permissions, etc.
• Admin queries - queries useful for SQream database management.

Clicking an item pastes the query into the Statement pane, and you can undo a previous operation by pressing Ctrl + Z.

2.1.4.14.3.5 Writing Statements and Queries from the Statement Panel

The multi-tabbed statement area is used for writing queries and statements, and is used in tandem with the toolbar. When writing and executing statements, you must first select a database from the Database dropdown menu in the toolbar. When you execute a statement, it passes through a series of statuses until completing. Knowing the status helps you with statement maintenance, and the statuses are shown in the Results panel. The following table shows the statement statuses:

Status             Description
Pending            The statement is pending.
In queue           The statement is waiting for execution.
Initializing       The statement has entered execution checks.
Executing          The statement is executing.
Statement stopped  The statement has been stopped.

You can add and name new tabs for each statement that you need to execute, and Studio preserves your created tabs when you switch between databases. Each new tab is added to the right with a default name of SQL and an increasing number. This helps you keep track of your statements.

You can rename a tab by double-clicking its name and typing a new name. You can also write multiple statements in the same tab by separating them with semicolons (;). If more tabs are open than fit in the Statement panel, the tab arrows are displayed. You can scroll through the tabs by clicking the arrows, and close a tab by clicking its close icon. You can also close all tabs at once by clicking Close all, located to the right of the tabs.

Tip: If this is your first time using SQream, see First steps with SQream DB.

Back to Executing Statements and Running Queries from the Editor

2.1.4.14.3.6 Viewing Statement and Query Results from the Results Panel

The Results panel shows statement and query results. By default, only the first 10,000 results are returned, although you can modify this from the Toolbar, as described in Executing Statements from the Toolbar above.

By default, executing several statements together opens a separate results tab for each statement. Executing statements together executes them serially, and any failed statement cancels all subsequent executions. The following is a brief description of the elements on the Results panel views:

Element                     Description
Results view                Lets you view search query results.
Execution Details view      Lets you view execution details, such as statement ID, number of rows, and average number of rows in chunk.
SQL view                    Lets you see the SQL view.
Save results to clipboard   Lets you save your search results to the clipboard to paste into another text editor.
Save results to local file  Lets you save your search query results to a local file.
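As noted above, statements executed together run serially and a failed statement cancels all subsequent ones. The following sketch models that behavior with a caller-supplied execute function. It illustrates the described semantics only and is not Studio's implementation. The names run_serially and fake_execute are hypothetical.

```python
def run_serially(statements, execute):
    """Run statements in order; after the first failure, mark the rest as cancelled."""
    results = []
    for index, statement in enumerate(statements):
        try:
            results.append(("ok", statement, execute(statement)))
        except Exception as error:
            results.append(("failed", statement, str(error)))
            # Nothing after a failed statement is executed.
            for skipped in statements[index + 1:]:
                results.append(("cancelled", skipped, None))
            break
    return results


def fake_execute(statement):
    """Stand-in for a real database call: fails on one specific statement."""
    if statement == "select bad;":
        raise RuntimeError("syntax error")
    return 1


outcome = run_serially(["select 1;", "select bad;", "select 3;"], fake_execute)
print([status for status, _, _ in outcome])
```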


2.1.4.14.3.7 Searching Query Results in the Results View

The Results view lets you view search query results. From this view you can also do the following:
• View the amount of time (in seconds) taken for a query to finish executing.
• Switch and scroll between tabs.
• Close all tabs at once.
• Enable keeping tabs by selecting Keep tabs.
• Sort column results.
In the Results view you can also run parallel statements, as described in Running Parallel Statements below.

2.1.4.14.3.8 Running Parallel Statements

While Studio’s default functionality is to open a new tab for each executed statement, Studio supports running parallel statements in one statement tab. Running parallel statements requires using macros and is useful for advanced users. The following shows the syntax for running parallel statements:

@@ parallel
$$
select 1;
select 2;
select 3;
$$
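If you generate Editor text from a script, the parallel macro block can be assembled from a list of statements. The following sketch follows the syntax shown above. The helper name parallel_block is hypothetical, and it assumes each statement must end with a semicolon.

```python
def parallel_block(statements):
    """Wrap statements in the `@@ parallel` / `$$` macro syntax shown above."""
    lines = ["@@ parallel", "$$"]
    for statement in statements:
        statement = statement.strip()
        if not statement:
            continue
        # Make sure every statement is terminated with a semicolon.
        lines.append(statement if statement.endswith(";") else statement + ";")
    lines.append("$$")
    return "\n".join(lines)


print(parallel_block(["select 1", "select 2;", "select 3"]))
```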


The following figure shows the parallel statement syntax in the Editor:

2.1.4.14.3.9 Execution Details View

The Execution Details view lets you view a query’s execution plan for monitoring purposes. Most importantly, the Execution Details view highlights rows based on how long they ran relative to the entire query. This can be seen in the timeSum column as follows:
• Rows highlighted red - longest runtime
• Rows highlighted orange - medium runtime
• Rows highlighted yellow - shortest runtime
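The guide does not specify the cutoffs used for the three highlight colors. The following sketch shows one way such relative highlighting could work, classifying each row by its share of the total timeSum. The 50% and 20% thresholds and the function name highlight_rows are assumptions chosen for illustration only.

```python
def highlight_rows(time_sums, red_share=0.5, orange_share=0.2):
    """Classify each row by its share of the total runtime (hypothetical cutoffs)."""
    total = sum(time_sums)
    colors = []
    for value in time_sums:
        share = value / total if total else 0.0
        if share >= red_share:
            colors.append("red")
        elif share >= orange_share:
            colors.append("orange")
        else:
            colors.append("yellow")
    return colors


# Illustrative timeSum values in seconds, not real measurements.
print(highlight_rows([6.0, 3.0, 1.0]))
```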

2.1.4.14.3.10 Viewing Wrapped Strings in the SQL View

The SQL View panel makes certain queries easier to read, such as a long string that appears on one line. The SQL View wraps the string so that you can see it in full at once. It also reformats and organizes query syntax entered in the Statement panel, making it easier to locate particular segments of your queries. The SQL View is identical to the Format SQL feature in the Toolbar, allowing you to retain your originally constructed query while viewing a more intuitively structured snapshot of it. The following figure shows the SQL view:

2.1.4.14.3.11 Saving Results to the Clipboard

The Save results to clipboard function lets you save your results to the clipboard to paste into another text editor or into Excel for further analysis.

2.1.4.14.3.12 Saving Results to a Local File

The Save results to local file function lets you save your search query results to a local file. Clicking Save results to local file downloads the contents of the Results panel to an Excel sheet. You can then copy and paste this content into other editors as needed.

2.1.4.14.3.13 Analyzing Results

When results are produced, a Generate CREATE statement button is displayed. Clicking this button creates a new tab with an optimized CREATE TABLE statement, and an INSERT statement to copy the data to the new table.

2.1.4.14.4 Viewing Logs

The Logs screen is used for viewing logs and includes the following elements:

Element        Description
Filter area    Lets you filter the data shown in the table.
Query tab      Shows basic query information logs, such as query number and the time the query was run.
Session tab    Shows basic session information logs, such as session ID and user name.
System tab     Shows all system logs.
Log lines tab  Shows the total amount of log lines.

2.1.4.14.4.1 Filtering Table Data

In the FILTERS area of the Logs screen you can apply the TIMESPAN, ONLY ERRORS, and additional (Add) filters. The Timespan filter lets you select a timespan. The Only Errors toggle button lets you show all queries, or only queries that generated errors. The Add button lets you add additional filters to the data shown in the table. The Filter button applies the selected filter(s). Some filters require you to type text to define the filter.

Other filters require you to select an item from a dropdown menu:
• INFO
• WARNING
• ERROR
• FATAL
• SYSTEM
You can also export a record of all of your currently filtered logs in Excel format by clicking Download, located above the Filter area.

2.1.4.14.4.2 Viewing Query Logs

The QUERIES log area shows basic query information, such as query number and the time the query was run. The number next to the title indicates the number of queries that have been run. From the Queries area you can see and sort by the following:
• Query ID
• Start time
• Query
• Compilation duration
• Execution duration
• Total duration
• Details (execution details, error details, successful query details)
In the Queries table, you can click on the Statement ID and Query items to set them as your filters. In the Details column you can also click one of the Details options for a more detailed explanation of the query.

2.1.4.14.4.3 Viewing Session Logs

The SESSIONS tab shows the sessions log table and is used for viewing activity that has occurred during your sessions. The number at the top indicates the number of sessions that have occurred. From here you can see and sort by the following:
• Timestamp
• Connection ID
• Username
• Client IP
• Login (Success or Failed)
• Duration (of session)
• Configuration Changes
In the Sessions table, you can click on the Timestamp, Connection ID, and Username items to set them as your filters.

2.1.4.14.4.4 Viewing System Logs

The SYSTEM tab shows the system log table and is used for viewing all system logs. The number at the top indicates the number of sessions that have occurred. Because system logs occur less frequently than queries and sessions, you may need to increase the filter timespan for the table to display any system logs. From here you can see and sort by the following:
• Timestamp
• Log type
• Message
In the Systems table, you can click on the Timestamp and Log type items to set them as your filters. In the Message column, you can also click on an item to show more information about the message.

2.1.4.14.4.5 Viewing All Log Lines

The LOG LINES tab is used for viewing the total amount of log lines in a table. From here users can view a more granular breakdown of log information collected by Studio. The other tabs (QUERIES, SESSIONS, and SYSTEM) show a filtered form of the raw log lines. For example, the QUERIES tab shows an aggregation of several log lines. From here you can see and sort by the following:
• Timestamp
• Message level
• Worker hostname
• Worker port
• Connection ID
• Database name
• User name
• Statement ID
In the LOG LINES table, you can click on any of the items to set them as your filters.
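The relationship between raw log lines and an aggregated view such as the QUERIES tab can be sketched as a group-by on the statement ID. The field names used below (statement_id, timestamp, level) and the function name aggregate_by_statement are hypothetical. They do not describe Studio's actual log schema.

```python
from collections import defaultdict


def aggregate_by_statement(log_lines):
    """Group raw log lines by statement ID and summarize each group."""
    grouped = defaultdict(list)
    for line in log_lines:
        grouped[line["statement_id"]].append(line)
    summary = {}
    for statement_id, lines in grouped.items():
        summary[statement_id] = {
            "lines": len(lines),
            "first_timestamp": min(entry["timestamp"] for entry in lines),
            # ERROR and FATAL are two of the message levels listed in the filter menu.
            "has_error": any(entry["level"] in ("ERROR", "FATAL") for entry in lines),
        }
    return summary


# Illustrative log lines only.
raw_lines = [
    {"statement_id": 7, "timestamp": "2021-09-27 10:00:01", "level": "INFO"},
    {"statement_id": 7, "timestamp": "2021-09-27 10:00:03", "level": "ERROR"},
    {"statement_id": 8, "timestamp": "2021-09-27 10:01:00", "level": "INFO"},
]
print(aggregate_by_statement(raw_lines))
```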

2.1.4.14.5 Creating, Assigning, and Managing Roles and Permissions

2.1.4.14.5.1 Overview

In the Roles area you can create and assign roles and manage user permissions. The Type column displays one of the following assigned role types:

Role Type       Description
Groups          Roles with no users.
Enabled users   Users with log-in permissions and a password.
Disabled users  Users with log-in permissions and with a disabled password. An admin may disable a user’s password permissions to temporarily disable access to the system.

Note: If you disable a password, when you enable it you have to create a new one.


2.1.4.14.5.2 Viewing Information About a Role

Clicking a role in the roles table displays the following information:
• Parent Roles - displays the parent roles of the selected role. Roles inherit all roles assigned to the parent.
• Members - displays all members that the role has been assigned to. The arrow indicates the roles that the role has inherited. Hovering over a member displays the roles that the role is inherited from.
• Permissions - displays the role’s permissions. The arrow indicates the permissions that the role has inherited. Hovering over a permission displays the roles that the permission is inherited from.
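The inheritance described above (a role receives its own permissions plus those of its parent roles, and each permission can be traced to the role it comes from) can be sketched as a walk up the parent chain. The role names, permission names, and the function effective_permissions below are hypothetical examples, not SQream's permission model.

```python
def effective_permissions(role, parents, direct):
    """Map each permission held by `role` to the set of roles it comes from.

    `parents` maps a role to its parent roles; `direct` maps a role to the
    permissions granted to it directly.
    """
    seen = set()
    result = {}
    stack = [role]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # Guards against roles reachable through several parents.
        seen.add(current)
        for permission in direct.get(current, ()):
            result.setdefault(permission, set()).add(current)
        stack.extend(parents.get(current, ()))
    return result


# Hypothetical roles and permissions for illustration.
parents = {"alice": ["analysts"], "analysts": ["readers"]}
direct = {"alice": {"LOGIN"}, "analysts": {"SELECT"}, "readers": {"CONNECT"}}
print(effective_permissions("alice", parents, direct))
```

A permission whose source set does not contain the role itself is inherited, which corresponds to the entries marked with an arrow in the panel.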

2.1.4.14.5.3 Creating a New Role

You can create a new role by clicking New Role.

An admin creates a user by granting login permissions and a password to a role. Each role is defined by a set of permissions. An admin can also group several roles together to form a group to manage them simultaneously. For example, permissions can be granted or revoked at the group level.

Clicking New Role lets you do the following:
• Add and assign a role name (required).
• Enable or disable log-in permissions for the role.
• Set a password.
• Assign or delete parent roles.
• Add or delete permissions.
• Grant superuser permissions to the selected user.

From the New Role panel you can view directly and indirectly granted (inherited) permissions. Disabled permissions have no connect permissions for the referenced database and are displayed in gray text. You can add or remove permissions from the Add permissions field. From the New Role panel you can also search and scroll through the permissions. In the Search field you can use the and operator to search for strings that fulfill multiple criteria.

When adding a new role, you must select the Enable login for this role and Has password check boxes.

2.1.4.14.5.4 Editing a Role

Once you’ve created a role, clicking the Edit Role button lets you do the following:
• Edit the role name.
• Enable or disable log-in permissions.
• Set a password.
• Assign or delete parent roles.
• Assign a role administrator permissions.
• Add or delete permissions.
• Grant superuser permissions to the selected user.

From the Edit Role panel you can view directly and indirectly granted (inherited) permissions. Disabled permissions have no connect permissions for the referenced database and are displayed in gray text. You can add or remove permissions from the Add permissions field. From the Edit Role panel you can also search and scroll through the permissions. In the Search field you can use the and operator to search for strings that fulfill multiple criteria.

2.1.4.14.5.5 Deleting a Role

Clicking the delete icon displays a confirmation message with the number of users and groups that will be impacted by deleting the role.

2.1.4.14.6 Configuring Your Instance of SQream

The Configuration section lets you edit parameters from one centralized location. While you can edit these parameters from the worker configuration file (config.json) or from your CLI, you can also modify them in Studio in an easy-to-use format.

2.1.4.14.6.1 Editing Your Parameters

When configuring your instance of SQream in Studio, you can edit the Generic and Admin parameters only. Studio includes two types of parameters: toggle switches, such as flipJoinOrder, and text fields, such as logSysLevel. After editing a parameter, you can reset each one to its previous value or to its default value individually, or revert all parameters to their default setting simultaneously. Note that you must click Save to save your configurations. You can hover over the information icon located on each parameter to read a short description of its behavior.

2.1.4.14.6.2 Exporting and Importing Configuration Files

You can also export your configuration settings to a .json file and import them from one. This allows you to easily edit your parameters and to share this file with other users if required. For more information about configuring your instance of SQream, see Configuration.

2.1.4.15 Using the statement editor (deprecated)

The statement editor is a web-based client for use with SQream DB. It can be used to run statements, and manage roles and permissions.

Note: The statement editor is deprecated as of SQream DB v2020.2. It is replaced by SQream Studio.

In this topic:

• Setting up and starting the statement editor
• Logging in
• Familiarizing yourself with the editor
  – Toolbar
  – Statement area
    ∗ Keyboard shortcuts
  – Results
    ∗ Sorting results
    ∗ Viewing execution information
    ∗ Saving results to a file or clipboard
  – Database tree
    ∗ Database
    ∗ Schema
    ∗ Table
    ∗ Create a table
    ∗ Catalog views
    ∗ Predefined queries
• Notifications
• DDL optimizer
  – Using the DDL optimizer
  – Analyzing the results

2.1.4.15.1 Setting up and starting the statement editor

Note: We recommend using Google Chrome or Chrome-based browsers to access the statement editor.

The statement editor is included with all dockerized installations of SQream DB. You can start the editor using sqream-console:

$ ./sqream-console
sqream-console> sqream editor --start
access sqream statement editor through Chrome http://192.168.0.100:3000

Once started, the editor listens on the local machine, on port 3000.

2.1.4.15.2 Logging in

Open a browser to the host, on port 3000. If the machine's address is 192.168.0.100, navigate to http://192.168.0.100:3000. Fill in your login details for SQream DB. These are the same credentials you might use when using sqream sql or JDBC.

Tip:
• If using a load balancer, select the Server picker box and make sure to use the correct port (usually 3108). If connecting directly to a worker, make sure to untick the box and use the correct port, usually 5000 and up.
• If this is your first time using SQream DB, use database name master.
• When using SQream DB on AWS, the initial password is the EC2 instance ID.

2.1.4.15.3 Familiarizing yourself with the editor

The editor is made up of four main panes.

• Toolbar - used to select the active database you want to work on, limit the number of rows, save query results, control tab behavior, and more.
• Statement area - a multi-tab text editor where you write SQL statements. Each tab can connect to a different database.
• Results - results from a query populate here.
• Database tree - contains a hierarchy tree of databases, views, tables, and columns. Can be used to navigate and perform some table operations.
See more about each pane below:

2.1.4.15.3.1 Toolbar

In the toolbar, you can perform the following operations (from left to right):
• Toggle Database Tree - Click ••• to show or hide the Database Tree pane.
• Database dropdown - Select the database you want the statements to run on.
• RUN / STOP - Use the RUN button to execute the statement in the Editor pane. When a statement is running, the button changes to STOP, and can be used to stop the active statement.
• SQL - Reformats and reindents the statement.
• Max. Rows - By default, the editor fetches only the first 1000 rows. Click the number to edit it, and click outside the number area to save. Setting a higher limit can slow down your browser if the result set is very large.
• Save - Save the query text to a file.
• Load - Load query text from a file.
• More -
  – Append new results - When checked, every statement executed opens a new Results tab. If unchecked, the Results tab is reused and overwritten with every new statement.

2.1.4.15.3.2 Statement area

The multi-tabbed statement area is where you write queries and statements. Select the database you wish to use in the toolbar, and then write and execute statements.
• A new tab can be opened for each statement. Tabs can be used to separate statements to different databases.
• Multiple statements can be written in the same tab, separated by semicolons.
• When multiple statements exist in the tab, clicking Run executes all statements in the tab, or only the selected statements.

Tip: If this is your first time with SQream DB, see our first steps guide.

2.1.4.15.3.3 Keyboard shortcuts

Ctrl + Enter - Execute all queries in the statement area, or just the highlighted part of the query.
Ctrl + Space - Auto-complete the current keyword.
Ctrl + ↑ - Switch to next tab.
Ctrl + ↓ - Switch to previous tab.

2.1.4.15.3.4 Results

The results pane shows query results and execution information. By default, only the first 1000 results are returned (modify via the Toolbar). A context menu, accessible via a right click on the results tab, lets you:
• Rename the tab
• Show the SQL query text
• Reload results
• Close the current tab
• Close all result tabs

2.1.4.15.3.5 Sorting results

After the results have appeared, sort them by clicking the column name.

2.1.4.15.3.6 Viewing execution information

During query execution the time elapsed is tracked in seconds.

The SHOW STATISTICS button opens the query’s execution plan, for monitoring purposes.

2.1.4.15.3.7 Saving results to a file or clipboard

Query results can be copied to the clipboard (for pasting into another text editor) or saved to a local file.

2.1.4.15.3.8 Database tree

The database tree shows the database objects (e.g. tables, columns, views), as well as some metadata like row counts. It also contains a few predefined catalog queries for execution.

Each level contains a context menu relevant to that object, accessible via a right-click.

2.1.4.15.3.9 Database

• Copy the database DDL to the clipboard

2.1.4.15.3.10 Schema

• Drop the schema (copies statement to the clipboard)

2.1.4.15.3.11 Table

• Show row count in the database tree
• Copy the create table script to the clipboard
• Copy SELECT to clipboard
• Copy INSERT to clipboard
• Copy DELETE to clipboard
• Rename table - Copy RENAME TABLE to clipboard
• Create table LIKE - Copy CREATE TABLE AS to clipboard
• Add column - Copy ADD COLUMN to clipboard
• Truncate table - Copy TRUNCATE to clipboard
• Drop table - Copy DROP TABLE to clipboard
• Create a table - Add a new table by running a statement, or alternatively use the Add new link near the TABLES group.

2.1.4.15.3.12 Create a table

You can also create a new table using the wizard, which guides you through the process. Refer to the CREATE TABLE reference for information about creating a table (e.g. table parameters like default values, identity, etc.).

Fill in the table name, and add a new row for each table column. If a table by the same name exists, check Create or Replace table to overwrite it. Click EXEC to create the table.

2.1.4.15.3.13 Catalog views

To see catalog views, click the catalog view name in the tree. The editor will run a query on that view.

2.1.4.15.3.14 Predefined queries

The editor comes with several predefined catalog queries that are useful for analysis of table compression rates, users and permissions, etc.

2.1.4.15.4 Notifications

Desktop notifications let you receive a notification when a statement is completed. You can minimize the browser or switch to other tabs, and still receive a notification when the query is done.

Enable desktop notifications by selecting Allow Desktop Notification from the menu options.

2.1.4.15.5 DDL optimizer

The DDL optimizer tab analyzes database tables and recommends possible optimizations, per the Optimization and Best Practices guide.

2.1.4.15.5.1 Using the DDL optimizer

Navigate to the DDL optimizer module by selecting it from the “More” menu.

• Database and Table - select the database and desired table to optimize.
• Rows - the number of rows to scan for analysis. Defaults to 1,000,000.
• Buffer Size - overhead threshold to use when analyzing VARCHAR fields. Defaults to 10%.
• Optimize NULLs - attempt to figure out field nullability.
Click EXECUTE to start the optimization process.
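To make the Buffer Size option more concrete, here is a minimal sketch of how a VARCHAR length recommendation could be derived from a sample of rows. It is an illustration only: the function name, the sample data, and the rule itself (longest sampled value plus a percentage buffer, rounded up and capped at the declared length) are assumptions, not a description of how the DDL optimizer is actually implemented.

```python
from typing import Iterable


def suggest_varchar_length(declared_length: int,
                           observed_values: Iterable[str],
                           buffer_percent: int = 10) -> int:
    """Suggest a VARCHAR length from sampled values (hypothetical rule).

    Assumption: the buffer is applied as a percentage on top of the longest
    sampled value, rounded up. The real optimizer may work differently.
    """
    longest = max((len(value) for value in observed_values), default=0)
    # Integer ceiling of longest * (1 + buffer_percent / 100).
    padded = (longest * (100 + buffer_percent) + 99) // 100
    # Never suggest less than 1, and never more than the declared length.
    return max(1, min(declared_length, padded))


# Hypothetical sample of values scanned from a VARCHAR(255) column.
sampled_names = ["Ada", "Grace", "Margaret Hamilton"]
print(suggest_varchar_length(255, sampled_names))
```

Under these assumptions a larger buffer leaves more headroom for values that were not part of the scanned sample, at the cost of a wider column.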

2.1.4.15.5.2 Analyzing the results

The analysis process shows results for each row.

The results are displayed in two tabs:
• OPTIMIZED COLUMNS - review the system recommendation to:
  1. decrease the length of VARCHAR fields
  2. remove the NULL option
• OPTIMIZED DDL - the recommended CREATE TABLE statement

Analyzing the DDL culminates in four possible actions:
• COPY DDL TO CLIPBOARD - Copies the optimized CREATE TABLE to the clipboard.
• CREATE A NEW TABLE - Creates the new table structure with _new appended to the table name. No data is populated.
• CREATE AND INSERT INTO EXISTING DATA - Creates a new table in the same database and schema as the original table and populates it with the data.
• Back - Goes back to the statement editor and abandons any recommendations.

2.1.4.16 Hardware Guide

This guide describes the SQream DB reference architecture, emphasizing the benefits to the technical audience, and provides guidance for end-users on selecting the right configuration for a SQream DB installation.

Need help?

This page is intended as a “reference” to suggested hardware. However, different workloads require different solution sizes. SQream’s customer support has the experience to advise on these matters to ensure the best experience. Visit SQream’s support portal for additional support.

2.1.4.16.1 A SQream DB Cluster

SQream recommends rackmount servers by server manufacturers Dell, Lenovo, HP, Cisco, Supermicro, IBM, and others. A typical SQream DB cluster includes one or more nodes, each consisting of:
• Two-socket enterprise processors, like the Intel® Xeon® Gold processor family or IBM® POWER9 processors, providing the high performance required for compute-bound database workloads.
• NVIDIA Tesla GPU accelerators, with up to 5,120 CUDA and Tensor cores, running on PCIe or fast NVLINK buses, delivering high core count and high-throughput performance on massive datasets.
• High-density chassis design, offering between 2 and 4 GPUs in a 1U, 2U, or 3U package, for best-in-class performance per cm².

2.1.4.16.1.1 Single-Node Cluster Example

A single-node SQream DB cluster can handle between 1 and 8 concurrent users, with up to 1PB of data storage (when connected via NAS). An average single-node cluster can be a rackmount server or workstation, containing the following components:

Component         Type
Server            Dell T640, Dell R740, Dell R940xa, HP ProLiant DL380 Gen10 or similar
Processor         2x Intel Xeon Gold 6240 (18C/36HT) 2.6GHz
RAM               512 GB
Onboard storage   • 2x 960GB SSD 2.5in Hot-plug for OS, RAID1
                  • 14x 3.84TB SSD 2.5in Hot-plug for storage, RAID6
GPU               2x or 4x NVIDIA Tesla T4, V100 or A100

In this system configuration, SQream DB can store about 200TB of raw data (assuming average compression ratio and ~50TB of usable raw storage). If a NAS is used, the 14x SSD drives can be omitted, but SQream recommends 2TB of local spool space on SSD or NVMe drives.
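The capacity figures above can be sanity-checked with a little arithmetic. The sketch below assumes that RAID 6 reserves two drives' worth of capacity for parity, and reads the ~50TB and 200TB figures as implying a compression ratio of roughly 4:1. The function names are illustrative and are not part of SQream DB.

```python
def raid6_usable_tb(drive_count: int, drive_tb: float) -> float:
    """Usable capacity of a RAID 6 array: two drives' worth goes to parity."""
    if drive_count < 4:
        raise ValueError("RAID 6 needs at least 4 drives")
    return (drive_count - 2) * drive_tb


def implied_compression_ratio(raw_data_tb: float, usable_tb: float) -> float:
    """Ratio of raw (uncompressed) data to the disk space that holds it."""
    return raw_data_tb / usable_tb


# Figures from the single-node example: 14x 3.84TB SSDs in RAID 6.
usable = raid6_usable_tb(14, 3.84)
print(f"Usable storage before filesystem overhead: {usable:.2f} TB")

# The guide rounds usable storage to ~50TB and quotes ~200TB of raw data.
ratio = implied_compression_ratio(200, 50)
print(f"Implied average compression ratio: {ratio:.0f}:1")
```

Actual compression depends heavily on the data, so treat the ratio as a planning estimate rather than a guarantee.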

2.1.4.16.1.2 Multi-Node Cluster Example

Multi-node clusters can handle any number of concurrent users. A typical SQream DB cluster relies on shared storage connected over a network fabric like InfiniBand EDR, 40GbE, or 100GbE. An example of a cluster node providing the best performance:

Component         Type
Server            High-density GPU-capable rackmount server, like Dell C4140, IBM AC922, Lenovo SR650.
Processor         2x Intel Xeon Gold 6240 (18C/36HT) 2.6GHz or 2x IBM POWER9
RAM               1024 GB
Onboard storage   • 2x 960GB SSD 2.5in, for OS, RAID1
                  • 2x 2TB SSD or NVMe, for temporary spooling, RAID1
Networking        Mellanox ConnectX-5 100 Gbps for storage fabric.
GPU               4x NVIDIA Tesla V100 32GB or A100

Note: With a NAS connected over GPFS, Lustre, or NFS, each SQream DB worker can read data at up to 5GB/s.

2.1.4.16.2 Cluster Design Considerations

• In a SQream DB installation, the storage and compute are logically separated. While they may reside on the same machine in a standalone installation, they may also reside on different hosts, providing additional flexibility and scalability.
• SQream DB uses all resources in a machine, including CPU, RAM, and GPU to deliver the best performance. 256GB of RAM per physical GPU is recommended, but not required.
• Local disk space is required for good temporary spooling performance, particularly when performing intensive larger-than-RAM operations like sorting. SQream recommends an SSD or NVMe drive, in mirrored RAID 1 configuration, with about 2x the RAM size available for temporary storage. This can be shared with the operating system drive if necessary.
• When using SAN or NAS devices, SQream recommends around 5GB/s of burst throughput from storage, per GPU.
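The rules of thumb above (256GB of RAM per GPU, spool space of about 2x RAM, and around 5GB/s of storage burst throughput per GPU) can be turned into a small sizing helper. This is a sketch only; the names `NodeSizing` and `size_node` are hypothetical, and the constants are the recommendations quoted in this section rather than hard requirements.

```python
from dataclasses import dataclass

RAM_GB_PER_GPU = 256          # recommended, not required
SPOOL_TO_RAM_RATIO = 2        # "about 2x the RAM size" of local spool space
STORAGE_GBPS_PER_GPU = 5      # burst throughput when using SAN or NAS


@dataclass
class NodeSizing:
    ram_gb: int
    spool_gb: int
    storage_burst_gbps: int


def size_node(gpu_count: int) -> NodeSizing:
    """Apply the per-GPU recommendations to a node with gpu_count GPUs."""
    if gpu_count < 1:
        raise ValueError("a SQream DB node needs at least one GPU")
    ram_gb = gpu_count * RAM_GB_PER_GPU
    return NodeSizing(
        ram_gb=ram_gb,
        spool_gb=ram_gb * SPOOL_TO_RAM_RATIO,
        storage_burst_gbps=gpu_count * STORAGE_GBPS_PER_GPU,
    )


# A 4-GPU node, as in the multi-node cluster example.
print(size_node(4))
```

For a 4-GPU node this gives 1024GB of RAM and about 2TB of spool space, which lines up with the multi-node example configuration.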

2.1.4.16.2.1 Balancing Cost and Performance

Prior to designing and deploying a SQream DB cluster, there are a number of important factors to consider. This section provides a breakdown of deployment details intended to help ensure that this installation exceeds or meets the stated requirements. The rationale provided includes the necessary information for modifying configurations to suit the customer use-case scenario.

Component           Value
Compute - CPU       Balance price and performance
Compute - GPU       Balance price with performance and concurrency
Memory - GPU RAM    Balance price with concurrency and performance
Memory - RAM        Balance price and performance
Operating System    Availability, reliability, and familiarity
Storage             Balance price with capacity and performance
Network             Balance price and performance

2.1.4.16.2.2 CPU Compute

SQream DB relies on multi-core Intel® Xeon® processors or IBM® POWER9 processors. SQream recommends a dual-socket machine populated with CPUs with 18C/36HT or better. While a higher core count may not necessarily affect query performance, more cores will enable higher concurrency and better load performance.

2.1.4.16.2.3 GPU Compute and RAM

The NVIDIA Tesla range of high-throughput GPU accelerators provides the best performance for enterprise environments. Most cards have ECC memory, which is crucial for delivering correct results every time. SQream recommends the NVIDIA Tesla V100 32GB GPU for best performance and highest concurrent user support. GPU RAM, sometimes called GRAM or VRAM, is used for processing queries. It is possible to select GPUs with less RAM, like the NVIDIA Tesla V100 16GB or P100 16GB. However, the smaller GPU RAM will result in reduced concurrency, as the GPU RAM is used extensively in operations like JOINs, ORDER BY, GROUP BY, and all SQL transforms.

2.1.4.16.2.4 RAM

Use of error-correcting code memory (ECC) is a practical requirement for SQream DB and is standard on most enterprise servers. SQream DB benefits from having large amounts of memory for improved performance on large ‘external’ operations like sorting and joining. Although SQream DB can function with less, we recommend 256GB of RAM per GPU in the machine.

2.1.4.16.2.5 Operating System

SQream DB can run on 64-bit Linux operating systems:
• Red Hat Enterprise Linux (RHEL) v7
• CentOS v7
• Amazon Linux 2018.03
• Ubuntu v16.04 LTS, v18.04 LTS
• Other Linux distributions may be supported via nvidia-docker

2.1.4.16.2.6 Storage

For clustered scale-out installations, SQream DB relies on NAS/SAN storage. These devices have extremely high reliability and durability, with five 9s of up-time. For stand-alone installations, SQream DB relies on redundant disk configurations, like RAID 10/50. These RAID configurations ensure that blocks of data are replicated between disks, so that the failure of a number of disks will not result in data loss or loss of system availability. Because storage reliability is important, SQream recommends enterprise-grade SAS SSD drives. However, as with other components, there is a tradeoff between cost and performance. When performance and reliability are important, SQream recommends SAS SSD or NVMe drives. SQream DB also functions well with more cost-effective SATA drives and even large spinning-disk arrays.

2.1.4.16.3 Cluster Supporting 32 Concurrent Active User Example

For a 32-user configuration, the number of GPUs should be sized to the number of users. SQream recommends 1 Tesla V100 GPU per 2 users, for full, uninterrupted dedicated access. Each 4-GPU server can therefore support about 8 users on average. The actual number of concurrent users can be higher, depending on the workload.

A SQream DB cluster for 32 users consists of the following components:
1. 4 high-density GPU-enabled servers, like the Dell C4140 (Configuration C) with 4x NVIDIA Tesla V100 32GB PCIe GPUs. Each server is equipped with dual Intel® Xeon® Gold 6240 CPUs and 1,024GB of RAM.
2. NAS/SAN storage, capable of delivering 1 GB/s per GPU. For the system above, with 4x4 NVIDIA Tesla V100 GPUs, this results in 16GB/s, over multiple bonded 40GigE or InfiniBand links via a fabric switch.
3. Top-of-Rack (ToR) 10GigE Ethernet switch for the BI fabric
4. 40GigE or InfiniBand switches for the storage fabric
5. At least 1 PDU
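The sizing in this example follows directly from the per-user and per-GPU ratios. The sketch below reproduces the arithmetic so it can be repeated for other user counts; `size_cluster` and its parameter names are hypothetical, and the defaults are simply the ratios quoted in this example (2 users per GPU, 4 GPUs per server, 1 GB/s of storage throughput per GPU).

```python
import math


def size_cluster(users: int,
                 users_per_gpu: int = 2,
                 gpus_per_server: int = 4,
                 storage_gbps_per_gpu: float = 1.0) -> dict:
    """Estimate GPUs, servers, and storage throughput for a user count."""
    if users < 1:
        raise ValueError("users must be a positive number")
    gpus_needed = math.ceil(users / users_per_gpu)
    servers = math.ceil(gpus_needed / gpus_per_server)
    # Storage is sized for every installed GPU, not only the ones needed.
    installed_gpus = servers * gpus_per_server
    return {
        "gpus_needed": gpus_needed,
        "servers": servers,
        "storage_gbps": installed_gpus * storage_gbps_per_gpu,
    }


print(size_cluster(32))
```

With 32 users this yields 16 GPUs across 4 servers and 16GB/s of storage throughput, matching the component list above. Because the actual number of concurrent users depends on the workload, the result is a starting point rather than a fixed requirement.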

Read more

Download the full SQream DB Reference Architecture document.

2.1.4.17 Security

SQream DB has some security features that you should be aware of to increase the security of your data.

In this topic:

• Overview
• Security best practices for SQream DB
  – Secure OS access
  – Change the default SUPERUSER
  – Create distinct user roles
  – Limit SUPERUSER access
  – Password strength guidelines
  – Use TLS/SSL when possible

2.1.4.17.1 Overview

An initial, unsecured installation of SQream DB can carry some risks:

• Your data is open to any client that can access an open node through an IP and port combination.
• The initial administrator username and password, when unchanged, can let anyone log in.
• Network connections to SQream DB aren’t encrypted.
To avoid these security risks, SQream DB provides authentication, authorization, logging, and network encryption. Read through the best practices guide to understand more.

2.1.4.17.2 Security best practices for SQream DB

2.1.4.17.2.1 Secure OS access

SQream DB often runs as a dedicated user on the host OS. This user is the file system owner of SQream DB data files. Anyone who logs in to the OS as this user can read or delete data from outside of SQream DB. This user can also read any logs, which may contain user login attempts. Therefore, it is very important to secure the host OS and prevent unauthorized access. System administrators should only log in to the host OS to perform maintenance tasks like upgrades. A database user should not log in using the same username in production environments.

2.1.4.17.2.2 Change the default SUPERUSER

To bootstrap SQream DB, a new install will always have one SUPERUSER role, typically named sqream. After creating a second SUPERUSER role, remove the default sqream user or change its default credentials. No database user should ever use the default SUPERUSER role in a production environment. If you don’t change the user role itself, change the password of the default SUPERUSER. See the change password section of our access control guide.
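As an illustration of the procedure described above, the sketch below generates the SQL statements an administrator might run to create a second SUPERUSER role and then change the default role's password. The statement forms (CREATE ROLE, GRANT LOGIN, GRANT PASSWORD, GRANT SUPERUSER) are assumed from SQream DB's access control syntax and should be checked against the access control guide before use. The helper names and the role name `dba_admin` are hypothetical.

```python
from typing import List


def _check(role: str, password: str) -> None:
    """Reject values that would not be safe to splice into a statement."""
    if not role.isidentifier():
        raise ValueError("role name should be a plain identifier")
    if "'" in password or '"' in password:
        raise ValueError("password must not contain quote characters")


def new_superuser_statements(role: str, password: str) -> List[str]:
    """Build statements that create a login role and make it a SUPERUSER."""
    _check(role, password)
    return [
        f"CREATE ROLE {role};",
        f"GRANT LOGIN TO {role};",
        f"GRANT PASSWORD '{password}' TO {role};",
        f"GRANT SUPERUSER TO {role};",
    ]


def change_password_statement(role: str, new_password: str) -> str:
    """Build the statement that assigns a new password to an existing role."""
    _check(role, new_password)
    return f"GRANT PASSWORD '{new_password}' TO {role};"


# Placeholders only: substitute real values when running the statements.
for statement in new_superuser_statements("dba_admin", "<new-admin-password>"):
    print(statement)
print(change_password_statement("sqream", "<new-sqream-password>"))
```

The generated statements can be pasted into a client such as sqream sql. Keep in mind that statements containing passwords may end up in server logs, as noted in the password guidelines.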

2.1.4.17.2.3 Create distinct user roles

Each user that signs in to a SQream DB cluster should have a distinct user role for several reasons:
• For logging and auditing purposes. Each user that logs in to SQream DB can be identified.
• For limiting permissions. Use groups and permissions to manage access.
See our Access control guide for more information.

2.1.4.17.2.4 Limit SUPERUSER access

Limit users who have the SUPERUSER role. A superuser role bypasses all permissions checks. Only system administrators should have SUPERUSER roles. See our Access control guide for more information.

2.1.4.17.2.5 Password strength guidelines

System administrators should verify the passwords used are strong ones. SQream DB stores passwords as salted SHA1 hashes in the system catalog so they are obscured and can’t be recovered. However, passwords may appear in server logs. Prevent access to server logs by securing OS access as described above.

Follow these recommendations to strengthen passwords:
• Pick a password that’s easy to remember
• At least 8 characters
• Mix upper and lower case letters
• Mix letters and numbers
• Include non-alphanumeric characters (except " and ')
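The mechanical parts of these recommendations can be checked automatically, for example before a password is assigned to a role. The following is a minimal sketch; `password_issues` is a hypothetical helper, not a SQream DB feature, and it cannot judge whether a password is easy to remember.

```python
from typing import List


def password_issues(password: str) -> List[str]:
    """Return the guideline violations found in a candidate password."""
    issues = []
    if len(password) < 8:
        issues.append("shorter than 8 characters")
    has_lower = any(c.islower() for c in password)
    has_upper = any(c.isupper() for c in password)
    if not (has_lower and has_upper):
        issues.append("does not mix upper and lower case letters")
    has_letter = any(c.isalpha() for c in password)
    has_digit = any(c.isdigit() for c in password)
    if not (has_letter and has_digit):
        issues.append("does not mix letters and numbers")
    if not any(not c.isalnum() for c in password):
        issues.append("has no non-alphanumeric character")
    if any(c in "\"'" for c in password):
        issues.append("contains a quote character (\" or ')")
    return issues


for candidate in ["password", "Tr1cky-Horse"]:
    problems = password_issues(candidate)
    print(candidate, "->", problems if problems else "meets the guidelines")
```

An empty list means the candidate satisfies every checkable guideline. A non-empty list names each rule that was missed.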

2.1.4.17.2.6 Use TLS/SSL when possible

SQream DB’s protocol implements client/server TLS security (even though it is called SSL). All SQream DB connectors and drivers support transport encryption. Ensure that each connection uses SSL and the correct access port for the SQream DB cluster:
• The load balancer (server_picker) is often started with the secure port at an offset of 1 from the original port (e.g. port 3108 for the unsecured connection and port 3109 for the secured connection).
• A SQream DB worker is often started with the secure port enabled at an offset of 100 from the original port (e.g. port 5000 for the unsecured connection and port 5100 for the secured connection).
Refer to each client driver for instructions on enabling TLS/SSL.
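The port conventions above amount to adding a fixed offset to the unsecured port. The helper below is a small sketch of that rule; `secure_port` and the `SECURE_PORT_OFFSETS` table are hypothetical names, and because the offsets are described only as common defaults, the actual ports on a given cluster should be confirmed with its administrator.

```python
# Conventional offsets between the unsecured and secured ports.
SECURE_PORT_OFFSETS = {
    "server_picker": 1,   # e.g. 3108 -> 3109
    "worker": 100,        # e.g. 5000 -> 5100
}


def secure_port(unsecured_port: int, component: str) -> int:
    """Return the conventional TLS/SSL port for a component."""
    try:
        offset = SECURE_PORT_OFFSETS[component]
    except KeyError:
        raise ValueError(f"unknown component: {component!r}") from None
    return unsecured_port + offset


print(secure_port(3108, "server_picker"))
print(secure_port(5000, "worker"))
```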

2.1.4.18 Setting up SQream DB

The guides below cover installing SQream DB.

2.1.4.18.1 Before you begin

Before you install or deploy SQream DB, please verify you have the following:
• An NVIDIA-capable server, either on-premise or on supported cloud platforms.
  – Supported operating systems: Red Hat Enterprise Linux v7.x / CentOS v7.x / Ubuntu 18.04 / Amazon Linux
  – NVIDIA GPU. A Tesla GPU is highly recommended. See more information in our Hardware Guide.
  – A privileged SSH connection to the server (sudo)
• A SQream DB license (contact SQream Support or your SQream account manager for your license key)
• A 3rd party tool that can connect to SQream DB via JDBC, ODBC, or Python

If you do not have a SQream DB license, visit our website and sign up for SQream DB today! Refer to our Hardware Guide for more information about supported and recommended configurations.

2.1.4.18.1.1 What does the installation process look like?

To install SQream DB, we will go through the following steps:
1. Prepare the host OS for NVIDIA driver installation
2. Install the NVIDIA driver
3. Install Docker CE
4. Install nvidia-docker2
5. Prepare disk space for SQream DB
6. Install SQream DB
7. Additional system configuration for performance and stability

What’s next?

• When ready, start installing SQream DB

2.1.4.18.2 Start a local SQream DB cluster with Docker

See Release Notes to learn about what’s new in the latest release of SQream DB. To upgrade to this release, see Upgrading SQream DB with Docker. SQream DB is installed on your hosts with NVIDIA Docker. There are several preparation steps to complete before installing SQream DB, so follow these instructions carefully.

Note: Installing SQream DB requires a license key. Go to SQream Support or contact your SQream account manager for your license key.

In this topic:

• Preparing your machine for NVIDIA Docker
  – CentOS 7 / RHEL 7 / Amazon Linux
  – Ubuntu 18.04
• Install Docker CE and NVIDIA docker
  – CentOS 7 / RHEL 7 / Amazon Linux (x64)
  – CentOS 7.6 / RHEL 7.6 (IBM POWER9)
  – Ubuntu 18.04 (x64)
• Preparing directories and mounts for SQream DB
• Install the SQream DB Docker container
• Starting your first local cluster

2.1.4.18.2.1 Preparing your machine for NVIDIA Docker

To install NVIDIA Docker, we must first install the NVIDIA driver.

Note: SQream DB works best on NVIDIA Tesla series GPUs, which provide better reliability, performance, and stability. The instructions below are written for NVIDIA Tesla GPUs, but other NVIDIA GPUs may work.

Follow the instructions for your OS and architecture:

• CentOS 7 / RHEL 7 / Amazon Linux
• Ubuntu 18.04

2.1.4.18.2.2 CentOS 7 / RHEL 7 / Amazon Linux

Recommended: Follow the installation instructions on NVIDIA’s CUDA Installation Guide for full instructions suitable for your platform. The information listed below is a summary of the necessary steps, and does not cover the full range of options available.

1. Enable EPEL. EPEL provides additional open-source and free packages from the RHEL ecosystem. The NVIDIA driver depends on packages such as DKMS and libvdpau, which are only available from third-party repositories such as EPEL.

$ sudo rpm -Uvh http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

2. Install the kernel headers and development tools necessary to compile the NVIDIA driver

$ sudo yum -y install kernel-devel-$(uname -r) kernel-headers-$(uname -r) gcc

3. Install the CUDA repository and install the latest display driver

$ sudo rpm -Uvh https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-repo-rhel7-10.1.243-1.x86_64.rpm
$ sudo yum install -y nvidia-driver-latest

Note: If Linux is running with X, switch to text-only mode before installing the display driver.

$ sudo systemctl set-default multi-user.target

This change permanently disables X. If you need X, change set-default to isolate. This will re-enable X on the next reboot.

4. Restart your machine

$ sudo reboot

5. Verify the installation completed correctly, by asking nvidia-smi, NVIDIA’s system management interface application, to list the available GPUs.

$ nvidia-smi -L
GPU 0: Tesla V100-PCIE-16GB (UUID: GPU-...)
GPU 1: Tesla V100-PCIE-16GB (UUID: GPU-...)

6. Enable NVIDIA’s persistence daemon. This is mandatory for IBM POWER9, but is recommended for other platforms as well.

$ sudo systemctl enable nvidia-persistenced && sudo systemctl start nvidia-persistenced

Important: On POWER9 systems only, disable the udev rule for hot-pluggable memory probing.

For Red Hat 7, this rule can be found in /lib/udev/rules.d/40-redhat.rules
For Ubuntu, this rule can be found in /lib/udev/rules.d/40-vm-hotadd.rules

The rule generally takes a form where it detects the addition of a memory block and changes the ‘state’ attribute to online. For example, in RHEL7, the rule looks like this:

SUBSYSTEM=="memory", ACTION=="add", PROGRAM="/bin/uname -p", RESULT!="s390*", ATTR{state}=="offline", ATTR{state}="online"

This rule must be disabled by copying the file to /etc/udev/rules.d and commenting out, removing, or changing the hot-pluggable memory rule in the /etc copy so that it does not apply to NVIDIA devices on POWER9.

• On RHEL 7.5 or earlier versions:

$ sudo cp /lib/udev/rules.d/40-redhat.rules /etc/udev/rules.d
$ sudo sed -i '/SUBSYSTEM=="memory", ACTION=="add"/d' /etc/udev/rules.d/40-redhat.rules

• On RHEL 7.6 and later versions:

$ sudo cp /lib/udev/rules.d/40-redhat.rules /etc/udev/rules.d
$ sudo sed -i 's/SUBSYSTEM!="memory", ACTION!="add", GOTO="memory_hotplug_end"/SUBSYSTEM=="*", GOTO="memory_hotplug_end"/' /etc/udev/rules.d/40-redhat.rules

Reboot the system to initialize the above changes

7. Continue to installing NVIDIA Docker for RHEL
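The nvidia-smi -L check in the steps above can also be scripted, for example to confirm that every expected GPU is visible after the reboot. The sketch below parses output in the format shown in this guide; `parse_gpu_list` is a hypothetical helper, and the sample text is the example output from this page rather than output captured from a real machine.

```python
import re
from typing import List, Tuple

# Matches lines such as: GPU 0: Tesla V100-PCIE-16GB (UUID: GPU-...)
GPU_LINE = re.compile(r"^GPU (\d+): (.+?) \(UUID: (.+)\)$")


def parse_gpu_list(output: str) -> List[Tuple[int, str]]:
    """Extract (index, model name) pairs from `nvidia-smi -L` output."""
    gpus = []
    for line in output.splitlines():
        match = GPU_LINE.match(line.strip())
        if match:
            gpus.append((int(match.group(1)), match.group(2)))
    return gpus


sample_output = """\
GPU 0: Tesla V100-PCIE-16GB (UUID: GPU-...)
GPU 1: Tesla V100-PCIE-16GB (UUID: GPU-...)
"""

gpus = parse_gpu_list(sample_output)
print(f"{len(gpus)} GPU(s) detected")
for index, model in gpus:
    print(f"  GPU {index}: {model}")
```

On a real host, the text could be obtained with `subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True).stdout` and passed to the same function.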

2.1.4.18.2.3 Ubuntu 18.04

Recommended: Follow the installation instructions on NVIDIA’s CUDA Installation Guide for full instructions suitable for your platform. The information listed below is a summary of the necessary steps, and does not cover the full range of options available.

1. Install the kernel headers and development tools necessary

$ sudo apt-get update
$ sudo apt-get install linux-headers-$(uname -r) gcc

2. Install the CUDA repository and driver on Ubuntu

$ curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.1.243-1_amd64.deb
$ sudo dpkg -i cuda-repo-ubuntu1804_10.1.243-1_amd64.deb
$ sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
$ sudo apt-get update && sudo apt-get install -y nvidia-driver-418

3. Restart your machine:

$ sudo reboot

4. Verify the installation completed correctly by asking nvidia-smi, NVIDIA’s system management interface application, to list the available GPUs.

$ nvidia-smi -L
GPU 0: Tesla V100-PCIE-16GB (UUID: GPU-...)
GPU 1: Tesla V100-PCIE-16GB (UUID: GPU-...)

5. Enable NVIDIA’s persistence daemon. This is mandatory for IBM POWER9, but is recommended for other platforms as well.

$ sudo systemctl enable nvidia-persistenced

6. Continue to installing NVIDIA Docker for Ubuntu
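The verification in step 4 can be scripted when several servers need checking. This is a minimal sketch, assuming the one-line-per-GPU output format shown above; count_gpus is a hypothetical helper name, not an NVIDIA command.

```shell
#!/bin/sh
# count_gpus: reads `nvidia-smi -L` style output on stdin and prints the
# number of lines that look like "GPU <index>: ...".
# grep -c exits non-zero when it counts 0 lines, so "|| true" keeps the
# function usable under "set -e" while still printing 0.
count_gpus() {
    grep -c '^GPU [0-9][0-9]*:' || true
}

# Example:
#   nvidia-smi -L | count_gpus
```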

2.1.4.18.2.4 Install Docker CE and NVIDIA docker

Follow the instructions for your OS and architecture:

• CentOS 7 / RHEL 7 / Amazon Linux (x64)
• CentOS 7.6 / RHEL 7.6 (IBM POWER9)
• Ubuntu 18.04 (x64)

2.1.4.18.2.5 CentOS 7 / RHEL 7 / Amazon Linux (x64)

Note: For IBM POWER9, see the next section installing NVIDIA Docker for IBM POWER9

1. Follow the instructions for Docker CE for your platform at Get Docker Engine - Community for CentOS

2. Tell Docker to start after a reboot

$ sudo systemctl enable docker && sudo systemctl start docker

3. Verify that docker is running

$ sudo systemctl status docker
● docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2019-08-12 08:22:30 IDT; 1 months 27 days ago
     Docs: https://docs.docker.com
 Main PID: 65794 (dockerd)
    Tasks: 76
   Memory: 124.5M
   CGroup: /system.slice/docker.service
           └─65794 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock

4. Let your current user manage Docker, without requiring sudo

$ sudo usermod -aG docker $USER

Then, log out and log back in:

$ exit

5. Install nvidia-docker

$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | sudo tee /etc/yum.repos.d/nvidia-docker.repo
$ sudo yum install -y nvidia-container-toolkit
$ sudo yum install nvidia-docker2
$ sudo pkill -SIGHUP dockerd
$ sudo systemctl restart docker

Note: Occasionally, there may be a signature verification error while obtaining nvidia-docker. The error looks something like this:

[Errno -1] repomd.xml signature could not be verified for nvidia-docker

Run the following commands to update the repository keys:

$ DIST=$(sed -n 's/releasever=//p' /etc/yum.conf)
$ DIST=${DIST:-$(. /etc/os-release; echo $VERSION_ID)}
$ sudo rpm -e gpg-pubkey-f796ecb0
$ sudo gpg --homedir /var/lib/yum/repos/$(uname -m)/$DIST/*/gpgdir --delete-key f796ecb0
$ sudo gpg --homedir /var/lib/yum/repos/$(uname -m)/latest/nvidia-docker/gpgdir --delete-key f796ecb0
$ sudo gpg --homedir /var/lib/yum/repos/$(uname -m)/latest/nvidia-container-runtime/gpgdir --delete-key f796ecb0
$ sudo gpg --homedir /var/lib/yum/repos/$(uname -m)/latest/libnvidia-container/gpgdir --delete-key f796ecb0

6. Verify the NVIDIA docker installation

$ sudo docker run --runtime=nvidia --rm nvidia/cuda:10.1-base nvidia-smi -L
GPU 0: Tesla V100-PCIE-16GB (UUID: GPU-...)
GPU 1: Tesla V100-PCIE-16GB (UUID: GPU-...)

7. Continue to Installing the SQream DB Docker container

2.1.4.18.2.6 CentOS 7.6 / RHEL 7.6 (IBM POWER9)

On POWER9, SQream DB is supported only on RHEL 7.6.

1. Install Docker for IBM POWER9:

$ wget http://ftp.unicamp.br/pub/ppc64el/rhel/7_1/docker-ppc64el/container-selinux-2.9-4.el7.noarch.rpm
$ wget http://ftp.unicamp.br/pub/ppc64el/rhel/7_1/docker-ppc64el/docker-ce-18.03.1.ce-1.el7.centos.ppc64le.rpm
$ yum install -y container-selinux-2.9-4.el7.noarch.rpm docker-ce-18.03.1.ce-1.el7.centos.ppc64le.rpm

2. Tell Docker to start after a reboot:

$ sudo systemctl enable docker && sudo systemctl start docker

3. Verify that docker is running:

$ sudo systemctl status docker
● docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2019-08-12 08:22:30 IDT; 1 months 27 days ago
     Docs: https://docs.docker.com
 Main PID: 65794 (dockerd)
    Tasks: 76
   Memory: 124.5M
   CGroup: /system.slice/docker.service
           └─65794 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock

4. Let your current user manage Docker, without requiring sudo

$ sudo usermod -aG docker $USER

Note: Log out and log back in again after this action

5. Install nvidia-docker

• Install the NVIDIA container and container runtime packages from NVIDIA’s repository:

$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | sudo tee /etc/yum.repos.d/nvidia-docker.repo

$ sudo yum install -y libnvidia-container* nvidia-container-runtime*

• Add the NVIDIA runtime to the Docker daemon and restart docker:

$ sudo mkdir -p /etc/systemd/system/docker.service.d/
$ echo -e "[Service]\nExecStart=\nExecStart=/usr/bin/dockerd --add-runtime=nvidia=/usr/bin/nvidia-container-runtime" | sudo tee /etc/systemd/system/docker.service.d/override.conf

$ sudo systemctl daemon-reload && sudo systemctl restart docker

Note: Occasionally, there may be a signature verification error while obtaining nvidia-docker. The error looks something like this:

[Errno -1] repomd.xml signature could not be verified for nvidia-docker

Run the following commands to update the repository keys:

$ DIST=$(sed -n 's/releasever=//p' /etc/yum.conf)
$ DIST=${DIST:-$(. /etc/os-release; echo $VERSION_ID)}
$ sudo rpm -e gpg-pubkey-f796ecb0
$ sudo gpg --homedir /var/lib/yum/repos/$(uname -m)/$DIST/*/gpgdir --delete-key f796ecb0
$ sudo gpg --homedir /var/lib/yum/repos/$(uname -m)/latest/nvidia-docker/gpgdir --delete-key f796ecb0
$ sudo gpg --homedir /var/lib/yum/repos/$(uname -m)/latest/nvidia-container-runtime/gpgdir --delete-key f796ecb0
$ sudo gpg --homedir /var/lib/yum/repos/$(uname -m)/latest/libnvidia-container/gpgdir --delete-key f796ecb0

6. Verify the NVIDIA docker installation succeeded

$ docker run --runtime=nvidia --rm nvidia/cuda-ppc64le:10.1-base nvidia-smi -L
GPU 0: Tesla V100-SXM2-16GB (UUID: GPU-...)
GPU 1: Tesla V100-SXM2-16GB (UUID: GPU-...)

7. Continue to Installing the SQream DB Docker container

2.1.4.18.2.7 Ubuntu 18.04 (x64)

1. Follow the instructions for Docker CE for your platform at Get Docker Engine - Community for Ubuntu

2. Tell Docker to start after a reboot

$ sudo systemctl enable docker && sudo systemctl start docker

3. Verify that docker is running

$ sudo systemctl status docker
● docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2019-08-12 08:22:30 IDT; 1 months 27 days ago
     Docs: https://docs.docker.com
 Main PID: 65794 (dockerd)
    Tasks: 76
   Memory: 124.5M
   CGroup: /system.slice/docker.service
           └─65794 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock

4. Let your current user manage Docker, without requiring sudo

$ sudo usermod -aG docker $USER

Note: Log out and log back in again after this action

5. Install nvidia-docker

$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

$ sudo apt-get update

$ sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit nvidia-docker2
$ sudo pkill -SIGHUP dockerd
$ sudo systemctl restart docker

6. Verify the NVIDIA docker installation

$ sudo docker run --runtime=nvidia --rm nvidia/cuda:10.1-base nvidia-smi -L
GPU 0: Tesla V100-PCIE-16GB (UUID: GPU-...)
GPU 1: Tesla V100-PCIE-16GB (UUID: GPU-...)

7. Continue to Installing the SQream DB Docker container

2.1.4.18.2.8 Preparing directories and mounts for SQream DB

SQream DB contains several directories that need to be defined

Table 15: Directories and paths

Path name      Definition
storage        The location where SQream DB stores data, metadata, and logs
exposed path   A location that SQream DB can read and write to. Used for allowing access to shared raw files like CSVs on local or NFS drives
logs           Optional location for debug logs

Note: By default, SQream DB can’t access any OS path. You must explicitly allow it.
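Before running the installer, it may be worth checking that the directories you plan to use exist and are writable. The following is a minimal sketch; check_sqream_dir is a hypothetical helper name, and the check runs as the current user, which may differ from the user the SQream DB container runs as.

```shell
#!/bin/sh
# check_sqream_dir LABEL PATH: hypothetical pre-flight check (not part of the
# SQream installer). Succeeds if PATH is an existing directory that the
# current user can write to; otherwise prints a message to stderr and fails.
check_sqream_dir() {
    label=$1
    path=$2
    if [ -d "$path" ] && [ -w "$path" ]; then
        echo "ok: $label -> $path"
    else
        echo "missing or not writable: $label -> $path" >&2
        return 1
    fi
}

# Example, with the paths used in the installation example of this guide:
#   check_sqream_dir storage /mnt/largedrive
#   check_sqream_dir "exposed path" /mnt/nfs/source_files
```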

2.1.4.18.2.9 Install the SQream DB Docker container

1. Download the SQream DB tarball and license package

In the e-mail from your account manager at SQream, you have received a download link for the SQream DB installer and a license package. Download the SQream DB tarball to the user home directory. For example:

$ cd ~
$ curl -O {download URL}

2. Extract the tarball into your home directory

$ tar xf sqream_installer-2.0.7-DB2019.2.1.4-CO1.7.5-ED3.0.1-x86_64.tar.gz

3. Copy the license package

Copy the license package from your home directory to the license subdirectory which is located in the newly created SQream installer directory. For example, if the license package is titled license_package.tar.gz:

$ cp ~/license_package.tar.gz sqream_installer-2.0.7-DB2019.2.1.4-CO1.7.5-ED3.0.1-x86_64/license

4. Enter the installer directory

$ cd sqream_installer-2.0.7-DB2019.2.1.4-CO1.7.5-ED3.0.1-x86_64

5. Install SQream DB

In most cases, the installation command will look like this:

$ ./sqream-install -i -k -v -d -l

For example, if the main storage path for SQream DB is /mnt/largedrive and the desired shared access path is /mnt/nfs/source_files, the command will look like:

$ ./sqream-install -i -k -v /mnt/largedrive -d /mnt/nfs/source_files

For a full list of options and commands, see the sqream-installer reference

6. SQream DB is now successfully installed, but not yet running.

2.1.4.18.2.10 Starting your first local cluster

1. Enter the sqream console utility, which helps coordinate SQream DB components

$ ./sqream-console

2. Start the master components:

sqream-console>sqream master --start
starting master server in single_host mode ...
sqream_single_host_master is up and listening on ports: 3105,3108

3. Start workers to join the cluster:

sqream-console>sqream worker --start 2
started sqream_single_host_worker_0 on port 5000, allocated gpu: 0
started sqream_single_host_worker_1 on port 5001, allocated gpu: 1

Note: By default, each worker is allocated a full GPU. To launch more workers than available GPUs, see the sqream console reference

4. SQream DB is now running! (Exit the console by typing exit when done.)

What’s next?

• Test your SQream DB installation by creating your first table
• Connect an external tool to SQream DB
• Additional system configuration for performance and stability

2.1.4.18.3 Recommended Pre-Installation Configuration

Once you’ve installed SQream DB, you can and should tune your system for better performance and stability. This page provides recommendations for production deployments of SQream DB.

In this topic:

• Recommended BIOS Settings
• Installing the Operating System
• Configuring the Operating System
  – Logging In to the Server
  – Automatically Creating a SQream User
  – Manually Creating a SQream User
  – Setting Up A Locale
  – Installing the Required Packages
  – Installing the Recommended Tools
  – Installing Python 3.6.7
  – Installing NodeJS on CentOS
  – Installing NodeJS on Ubuntu
  – Configuring the Network Time Protocol (NTP)
  – Configuring the Network Time Protocol Server
  – Configuring the Server to Boot Without the UI
  – Configuring the Security Limits
  – Configuring the Kernel Parameters
  – Configuring the Firewall
  – Disabling selinux
  – Configuring the /etc/hosts File
  – Configuring the DNS
• Installing the Nvidia CUDA Driver
  – CUDA Driver Prerequisites
  – Updating the Kernel Headers
  – Disabling Nouveau
  – Installing the CUDA Driver
    ∗ Installing the CUDA Driver from the Repository
    ∗ Tuning Up NVIDIA Performance
    ∗ Disabling Automatic Bug Reporting Tools
• Enabling Core Dumps
  – Checking the abrtd Status
  – Setting the Limits
  – Creating the Core Dumps Directory
  – Setting the Output Directory of the /etc/sysctl.conf File
  – Verifying that the Core Dumps Work
  – Troubleshooting Core Dumping

2.1.4.18.3.1 Recommended BIOS Settings

The BIOS settings may have a variety of names, or may not exist on your system. Each system vendor has a different set of settings and variables. It is safe to skip any and all of the configuration steps, but this may impact performance. If any doubt arises, consult the documentation for your server or your hardware vendor for the correct way to apply the settings.

Item: Management console access
Setting: Connected
Rationale: Connection to OOB required to preserve continuous network uptime.

Item: All drives.
Setting: Connected and displayed on RAID interface
Rationale: Prerequisite for cluster or OS installation.

Item: RAID volumes.
Setting: Configured according to project guidelines. Must be rebooted to take effect.
Rationale: Clustered to increase logical volume and provide redundancy.

Item: Fan speed Thermal Configuration.
Setting: Dell fan speed: High Maximum. Specified minimum setting: 60. HPe thermal configuration: Increased cooling.
Rationale: NVIDIA Tesla GPUs are passively cooled and require high airflow to operate at full performance.

Item: Power regulator or iDRAC power unit policy
Setting: HPe: HP static high performance mode enabled. Dell: iDRAC power unit policy (power cap policy) disabled.
Rationale: Other power profiles (such as “balanced”) throttle the CPU and diminish performance. Throttling may also cause GPU failure.

Item: System Profile, Power Profile, or Performance Profile
Setting: High Performance
Rationale: The Performance profile provides potentially increased performance by maximizing processor frequency and disabling certain power-saving features such as C-states. Use this setting for environments that are not sensitive to power consumption.

Item: Power Cap Policy or Dynamic power capping
Setting: Disabled
Rationale: Other power profiles (like “balanced”) throttle the CPU and may diminish performance or cause GPU failure. This setting may appear together with the above (Power profile or Power regulator). This setting allows disabling system ROM power calibration during the boot process. Power regulator settings are named differently in BIOS and iLO/iDRAC.

2.1.4.18.3.2 Installing the Operating System

Either CentOS (versions 7.6-7.9) or RHEL (versions 7.6-7.9) must be installed before installing the SQream database. Either the customer or a SQream representative can perform the installation.

To install the operating system:

1. Select a language (English recommended).
2. From Software Selection, select Minimal.
3. Select the Development Tools group checkbox.
4. Continue the installation.
5. Set up the necessary drives and users as per the installation process.

Using Debugging Tools is recommended for future problem-solving if necessary.

Selecting the Development Tools group installs the following tools:

• autoconf
• automake
• binutils
• bison
• flex
• gcc
• gcc-c++
• gettext
• libtool
• make
• patch
• pkgconfig
• redhat-rpm-config
• rpm-build
• rpm-sign

The root user is created and the OS shell is booted up.

2.1.4.18.3.3 Configuring the Operating System

When configuring the operating system, several basic settings related to creating a new server are required. Configuring these as part of your basic set-up increases your server’s security and usability.

2.1.4.18.3.4 Logging In to the Server

You can log in to the server using the server’s IP address and password for the root user. The server’s IP address and root user were created while installing the operating system above.

2.1.4.18.3.5 Automatically Creating a SQream User

To automatically create a SQream user:

1. If a SQream user was created during installation, verify that the same ID is used on every server:

$ sudo id sqream

The ID 1000 is used on each server in the following example:

uid=1000(sqream) gid=1000(sqream) groups=1000(sqream)

2. If the IDs are different, delete the SQream user and SQream group from both servers:

$ sudo userdel sqream

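3. Remove the deleted user’s leftover mail spool, then recreate the user with the same ID on every server (see Manually Creating a SQream User):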

$ sudo rm /var/spool/mail/sqream
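Comparing the id output by eye works for two servers but gets tedious on larger clusters. The following is a minimal sketch that extracts the numeric UID from a line of id output and compares two such lines. The function names uid_of and same_uid and the host names in the usage comment are hypothetical.

```shell
#!/bin/sh
# uid_of LINE: prints the numeric UID from one line of `id` output,
# e.g. "uid=1000(sqream) gid=1000(sqream) groups=1000(sqream)" gives 1000.
# Prints nothing if the line does not start with "uid=<digits>".
uid_of() {
    printf '%s\n' "$1" | sed -n 's/^uid=\([0-9][0-9]*\).*/\1/p'
}

# same_uid LINE_A LINE_B: succeeds only if both lines carry the same,
# non-empty UID.
same_uid() {
    [ -n "$(uid_of "$1")" ] && [ "$(uid_of "$1")" = "$(uid_of "$2")" ]
}

# Example (node1 and node2 are placeholder host names):
#   same_uid "$(ssh node1 id sqream)" "$(ssh node2 id sqream)" \
#       || echo "sqream UID differs between servers"
```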

2.1.4.18.3.6 Manually Creating a SQream User

To manually create a SQream user:

SQream enables you to manually create users. This section shows you how to manually create a user with the UID 1111. You cannot manually create a user during the operating system installation procedure.

1. Add a user with an identical UID on all cluster nodes:

$ useradd -u 1111 sqream

2. Add the user sqream to the wheel group.

$ sudo usermod -aG wheel sqream

You can remove the sqream user from the wheel group when the installation and configuration are complete. Next, set a password for the sqream user:

$ passwd sqream

3. Log out and log back in as sqream.

Note: If you deleted the sqream user and recreated it with a different ID, you must change the ownership of /home/sqream to the new user to avoid permission errors.

4. Change the ownership of /home/sqream to the sqream user:

$ sudo chown -R sqream:sqream /home/sqream

2.1.4.18.3.7 Setting Up A Locale

SQream enables you to set up a locale. The values in this example are illustrative; use the language and time zone of your own location.

To set up a locale:

1. Set the language of the locale:

$ sudo localectl set-locale LANG=en_US.UTF-8

2. Set the time zone of the locale:

$ sudo timedatectl set-timezone Asia/Jerusalem

If needed, you can run the timedatectl list-timezones command to list the available time zones.

2.1.4.18.3.8 Installing the Required Packages

You can install the required packages by running the following command:

$ sudo yum install ntp pciutils monit zlib-devel openssl-devel kernel-devel-$(uname -r) kernel-headers-$(uname -r) gcc net-tools wget jq

2.1.4.18.3.9 Installing the Recommended Tools

You can install the recommended tools by running the following command:

$ sudo yum install bash-completion.noarch vim-enhanced vim-common net-tools iotop htop psmisc screen xfsprogs wget yum-utils deltarpm dos2unix

2.1.4.18.3.10 Installing Python 3.6.7

1. Download the Python 3.6.7 source code tarball file from the following URL into the /home/sqream directory:

$ wget https://www.python.org/ftp/python/3.6.7/Python-3.6.7.tar.xz

2. Extract the Python 3.6.7 source code into your current directory:

$ tar -xf Python-3.6.7.tar.xz

3. Navigate to the Python 3.6.7 directory:

$ cd Python-3.6.7

4. Run the ./configure script:

$ ./configure

5. Build the software:

$ make -j30

6. Install the software:

$ sudo make install

7. Verify that Python 3.6.7 has been installed:

$ python3.6 --version

2.1.4.18.3.11 Installing NodeJS on CentOS

To install Node.js on CentOS:

1. Download and run the setup_12.x script with root privileges:

$ curl -sL https://rpm.nodesource.com/setup_12.x | sudo bash -

2. Clear the YUM cache and update the local metadata:

$ sudo yum clean all && sudo yum makecache fast

3. Install Node.js:

$ sudo yum install -y nodejs

2.1.4.18.3.12 Installing NodeJS on Ubuntu

To install Node.js on Ubuntu:

1. Download and run the setup_12.x script with root privileges:

$ curl -sL https://deb.nodesource.com/setup_12.x | sudo bash -

2. Install Node.js:

$ sudo apt-get install -y nodejs

3. Verify the node version:

$ node -v

2.1.4.18.3.13 Configuring the Network Time Protocol (NTP)

This section describes how to configure your NTP. If you don’t have internet access, see Configure NTP Client to Synchronize with NTP Server.

To configure your NTP:

1. Install the NTP package.

$ sudo yum install ntp

2. Enable the ntpd program.

$ sudo systemctl enable ntpd

3. Start the ntpd program.

$ sudo systemctl start ntpd

4. Print a list of peers known to the server and a summary of their states.

$ sudo ntpq -p
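In the peer list printed by ntpq -p, the peer that ntpd has selected for synchronization is marked with a leading asterisk. The following is a minimal sketch that checks for such a line. The function name has_sync_peer is hypothetical, and the peer names in the sample are placeholders.

```shell
#!/bin/sh
# has_sync_peer: reads `ntpq -p` output on stdin and succeeds if at least one
# peer line starts with '*', the tally code for the selected system peer.
has_sync_peer() {
    grep -q '^\*'
}

# Example:
#   if ntpq -p | has_sync_peer; then
#       echo "ntpd has selected a synchronization peer"
#   else
#       echo "no peer selected yet (this can take a few minutes after start)"
#   fi
```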

2.1.4.18.3.14 Configuring the Network Time Protocol Server

If your organization has an NTP server, you can configure it.

To configure your NTP server:

1. Append a server line with your NTP server address to the /etc/ntp.conf file (place the address after the word server in the command below).

$ echo -e "\nserver \n" | sudo tee -a /etc/ntp.conf

2. Restart the service.

$ sudo systemctl restart ntpd

3. Check that synchronization is enabled:

$ sudo timedatectl

Checking that synchronization is enabled generates the following output:

Local time: Sat 2019-10-12 17:26:13 EDT
Universal time: Sat 2019-10-12 21:26:13 UTC
RTC time: Sat 2019-10-12 21:26:13
Time zone: America/New_York (EDT, -0400)
NTP enabled: yes
NTP synchronized: yes
RTC in local TZ: no
DST active: yes
Last DST change: DST began at
    Sun 2019-03-10 01:59:59 EST
    Sun 2019-03-10 03:00:00 EDT
Next DST change: DST ends (the clock jumps one hour backwards) at
    Sun 2019-11-03 01:59:59 EDT
    Sun 2019-11-03 01:00:00 EST
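The synchronization check can be automated by looking for the "NTP synchronized: yes" field in the output above. This is a minimal sketch, assuming that field name; newer systemd releases label the field differently, so the pattern may need adjusting. The function name ntp_synchronized is hypothetical.

```shell
#!/bin/sh
# ntp_synchronized: reads `timedatectl` output on stdin and succeeds if it
# contains a line of the form "NTP synchronized: yes" (leading spaces allowed).
ntp_synchronized() {
    grep -Eq '^[[:space:]]*NTP synchronized:[[:space:]]*yes[[:space:]]*$'
}

# Example:
#   timedatectl | ntp_synchronized || echo "clock is not NTP-synchronized"
```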

2.1.4.18.3.15 Configuring the Server to Boot Without the UI

You can configure your server to boot without a UI in cases when it is not required (recommended) by running the following command:

$ sudo systemctl set-default multi-user.target

Running this command activates the NO-UI server mode.

2.1.4.18.3.16 Configuring the Security Limits

The security limits refer to the number of open files, processes, etc. You can configure the security limits by running the echo -e command from a root shell:

$ sudo bash

$ echo -e "sqream soft nproc 1000000\nsqream hard nproc 1000000\nsqream soft nofile 1000000\nsqream hard nofile 1000000\nsqream soft core unlimited\nsqream hard core unlimited" >> /etc/security/limits.conf
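Because the command above appends with >>, running it a second time leaves duplicate entries in limits.conf. The following is a minimal sketch of an idempotent alternative; append_once is a hypothetical helper name, and the entries in the usage comment are the ones from the command above.

```shell
#!/bin/sh
# append_once FILE LINE: appends LINE to FILE only if an identical line is not
# already present, so repeated runs do not create duplicates.
# grep -x matches whole lines only; -F treats LINE as a fixed string.
append_once() {
    file=$1
    line=$2
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example (run from a root shell):
#   for entry in \
#       "sqream soft nproc 1000000" "sqream hard nproc 1000000" \
#       "sqream soft nofile 1000000" "sqream hard nofile 1000000" \
#       "sqream soft core unlimited" "sqream hard core unlimited"
#   do
#       append_once /etc/security/limits.conf "$entry"
#   done
```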

2.1.4.18.3.17 Configuring the Kernel Parameters

To configure the kernel parameters:

1. Append the following kernel parameters to the /etc/sysctl.conf file, one per line:

$ echo -e "vm.dirty_background_ratio = 5 \n vm.dirty_ratio = 10 \n vm.swappiness = 10 \n vm.vfs_cache_pressure = 200 \n vm.zone_reclaim_mode = 0 \n" >> /etc/sysctl.conf

Notice: In the past, the vm.zone_reclaim_mode parameter was set to 7. In the latest SQream version, the vm.zone_reclaim_mode parameter must be set to 0. If it is not set to 0, when a NUMA node runs out of memory, the system will get stuck and will be unable to pull memory from other NUMA nodes.

2. Check the maximum value of fs.file-max.

$ sysctl -n fs.file-max

3. Optional - If the value of fs.file-max is smaller than 2097152, run the following command:

$ echo "fs.file-max=2097152" >> /etc/sysctl.conf

IPv4 forwarding must be enabled for Docker and K8s installations only.
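The comparison in steps 2 and 3 can be expressed as a small check. This is a minimal sketch using the 2097152 threshold from step 3; file_max_ok is a hypothetical helper name.

```shell
#!/bin/sh
# file_max_ok VALUE: succeeds when VALUE is at least 2097152, the minimum
# fs.file-max value named in the steps above.
file_max_ok() {
    [ "$1" -ge 2097152 ]
}

# Example:
#   if ! file_max_ok "$(sysctl -n fs.file-max)"; then
#       echo "fs.file-max is below 2097152; apply step 3"
#   fi
```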

2.1.4.18.3.18 Configuring the Firewall

The example in this section shows the open ports for four sqreamd sessions. If more than four are required, open the required ports as needed. Port 8080 in the example below is a new UI port.

To configure the firewall:

1. Start the service and enable firewalld on boot:

$ systemctl start firewalld

2. Add the following ports to the permanent firewall:

$ firewall-cmd --zone=public --permanent --add-port=8080/tcp
$ firewall-cmd --zone=public --permanent --add-port=3105/tcp
$ firewall-cmd --zone=public --permanent --add-port=3108/tcp
$ firewall-cmd --zone=public --permanent --add-port=5000-5003/tcp
$ firewall-cmd --zone=public --permanent --add-port=5100-5103/tcp
$ firewall-cmd --permanent --list-all

3. Reload the firewall:

$ firewall-cmd --reload

4. Start the service and enable firewalld on boot:

$ systemctl start firewalld

If you do not need the firewall, you can disable it:

$ sudo systemctl disable firewalld
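When more than four sqreamd sessions are needed, the port ranges have to grow accordingly. The following is a minimal sketch that prints (without running) the firewall-cmd lines for a given number of sessions. It assumes that session ports are numbered consecutively from 5000 and 5100, which is extrapolated from the four-session example above and should be checked against your deployment. The function names are hypothetical.

```shell
#!/bin/sh
# port_range START COUNT: prints START for a single port, or START-END for a
# consecutive range of COUNT ports.
port_range() {
    if [ "$2" -le 1 ]; then
        echo "$1"
    else
        echo "$1-$(($1 + $2 - 1))"
    fi
}

# print_firewall_cmds N: prints the firewall-cmd lines for N sqreamd sessions.
# Assumption: ports 8080, 3105 and 3108 are fixed, and session ports start at
# 5000 and 5100 as in the example in this section.
print_firewall_cmds() {
    for port in 8080 3105 3108 "$(port_range 5000 "$1")" "$(port_range 5100 "$1")"; do
        echo "firewall-cmd --zone=public --permanent --add-port=${port}/tcp"
    done
    echo "firewall-cmd --reload"
}

# Print the commands for four sessions; review them, then run them as root.
print_firewall_cmds 4
```

Printing instead of executing keeps the sketch safe to try on any machine and lets the commands be reviewed first.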

2.1.4.18.3.19 Disabling selinux

To disable selinux:

1. Show the status of selinux:

$ sudo sestatus

2. If the status shown is not disabled, edit the /etc/selinux/config file:

$ sudo vim /etc/selinux/config

3. Change SELINUX=enforcing to SELINUX=disabled.

The above changes will only take effect after rebooting the server. To stop selinux enforcement immediately, without waiting for the reboot, run the following command (it switches selinux to permissive mode until the next boot):

$ sudo setenforce 0
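To confirm that the edit in step 3 was saved correctly, the configured mode can be read back from the file. A minimal sketch; selinux_config_mode is a hypothetical helper name.

```shell
#!/bin/sh
# selinux_config_mode FILE: prints the value of the SELINUX= setting in FILE.
# Comment lines and the separate SELINUXTYPE= setting are not matched, because
# the pattern requires the line to start with exactly "SELINUX=".
selinux_config_mode() {
    sed -n 's/^SELINUX=\([a-z]*\).*/\1/p' "$1"
}

# Example:
#   [ "$(selinux_config_mode /etc/selinux/config)" = "disabled" ] \
#       || echo "selinux is not set to disabled in the config file"
```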

2.1.4.18.3.20 Configuring the /etc/hosts File

To configure the /etc/hosts file:

1. Edit the /etc/hosts file:

$ sudo vim /etc/hosts

2. Add an entry for your local host:

127.0.0.1 localhost

2.1.4.18.3.21 Configuring the DNS

To configure the DNS:

1. Run the ifconfig command to check your NIC name, then edit that NIC’s configuration file. In the following example, eth0 is the NIC name:

$ sudo vim /etc/sysconfig/network-scripts/ifcfg-eth0

2. In that file, replace the DNS lines with your own DNS addresses. In the following example, 4.4.4.4 and 8.8.8.8 are the DNS addresses:

DNS1="4.4.4.4"
DNS2="8.8.8.8"

2.1.4.18.3.22 Installing the Nvidia CUDA Driver

Warning: If your UI runs on the server, the server must be stopped before installing the CUDA drivers.

2.1.4.18.3.23 CUDA Driver Prerequisites

1. Verify that the NVIDIA card has been installed and is detected by the system:

$ lspci | grep -i nvidia

2. Check which version of gcc has been installed:

$ gcc --version

3. If gcc has not been installed, install it for one of the following operating systems:

• On RHEL/CentOS:

$ sudo yum install -y gcc

• On Ubuntu:

$ sudo apt-get install gcc

2.1.4.18.3.24 Updating the Kernel Headers

To update the kernel headers:

1. Update the kernel headers on one of the following operating systems:

• On RHEL/CentOS:

$ sudo yum install kernel-devel-$(uname -r) kernel-headers-$(uname -r)

• On Ubuntu:

$ sudo apt-get install linux-headers-$(uname -r)

2. Install wget on one of the following operating systems:

• On RHEL/CentOS:

$ sudo yum install wget

• On Ubuntu:

$ sudo apt-get install wget

2.1.4.18.3.25 Disabling Nouveau

You can disable Nouveau, which is the default driver.

To disable Nouveau:

1. Check if the Nouveau driver has been loaded:

$ lsmod | grep nouveau

If the Nouveau driver has been loaded, the command above generates output.

2. Blacklist the Nouveau drivers to disable them:

$ cat <

3. Regenerate the kernel initramfs:

1. Rebuild the initramfs image:

$ sudo dracut --force

2. Reboot the server:

$ sudo reboot
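The blacklist command in step 2 is truncated in this copy of the guide. As an illustration only, the following sketch writes the two modprobe directives that are commonly used to keep the nouveau module from loading. Both the directives and the target file name in the usage comment are assumptions based on general NVIDIA driver installation practice, not the original SQream command. The function name write_nouveau_blacklist is hypothetical.

```shell
#!/bin/sh
# write_nouveau_blacklist FILE: writes two modprobe directives to FILE.
# Assumption: these directives follow common NVIDIA installation practice;
# the command originally shown in this guide may have differed.
write_nouveau_blacklist() {
    cat > "$1" <<'EOF'
blacklist nouveau
options nouveau modeset=0
EOF
}

# Example (run as root; the file name is an assumption):
#   write_nouveau_blacklist /etc/modprobe.d/blacklist-nouveau.conf
# Then regenerate the initramfs and reboot, as described in step 3.
```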

2.1.4.18.3.26 Installing the CUDA Driver

This section describes how to install the CUDA driver.

Notice: The version of the driver installed on the customer’s server must be equal to or higher than the driver included in the SQream release package. Contact a SQream customer service representative to identify the correct version to install.

2.1.4.18.3.27 Installing the CUDA Driver from the Repository

Installing the CUDA driver from the repository is the recommended installation method.

To install the CUDA driver from the repository:

1. Install the CUDA dependencies for one of the following operating systems:

• For RHEL:

$ sudo rpm -Uvh http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

• For CentOS:

$ sudo yum install epel-release

2. Install the CUDA dependencies from the epel repository:

$ sudo yum install dkms libvdpau

Installing the CUDA dependencies from the epel repository is only required when installing from a runfile.

3. Download the required local repository:

$ wget http://developer.download.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda-repo-rhel7-10-1-local-10.1.243-418.87.00-1.0-1.x86_64.rpm

4. Install the required local repository:

$ sudo yum localinstall cuda-repo-rhel7-10-1-local-10.1.243-418.87.00-1.0-1.x86_64.rpm

This example is for RHEL 7 and CUDA 10.1.

5. Install the CUDA drivers:

a. Clear the YUM cache:

$ sudo yum clean all

b. Install the most current DKMS (Dynamic Kernel Module Support) NVIDIA driver:

$ sudo yum -y install nvidia-driver-latest-dkms

6. Verify that the installation was successful:

$ nvidia-smi

You can prepare the CUDA driver offline from a server connected to the CUDA repo by running the following commands as a root user:

7. Query all the packages installed in your system, and verify that cuda-repo has been installed:

$ rpm -qa |grep cuda-repo

8. Navigate to the correct repository:

$ cd /etc/yum.repos.d/

9. List the files in the directory and filter for the cuda repository file:

$ ls -l |grep cuda

The following is an example of the correct output:

cuda-10-1-local.repo

10. Edit the /etc/yum.repos.d/cuda-10-1-local.repo file:

$ vim /etc/yum.repos.d/cuda-10-1-local.repo

The following is an example of the expected content:

name=cuda-10-1-local

11. Clone the repository to a location where it can be copied from:

$ reposync -g -l -m --repoid=cuda-10-1-local --download_path=/var/cuda-repo-10.1-local

12. Copy the repository to the installation server and create the repository:

$ createrepo -g comps.xml /var/cuda-repo-10.1-local

13. Add a repo configuration file in /etc/yum.repos.d/ by editing the /etc/yum.repos.d/cuda-10.1-local.repo file:

[cuda-10.1-local]
name=cuda-10.1-local
baseurl=file:///var/cuda-repo-10.1-local
enabled=1
gpgcheck=1
gpgkey=file:///var/cuda-repo-10-1-local/7fa2af80.pub

14. Install the CUDA drivers by installing the most current DKMS (Dynamic Kernel Module Support) NVIDIA driver as a root user:

$ sudo yum -y install nvidia-driver-latest-dkms
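The repository definition in step 13 can be generated instead of typed by hand, which helps when the same file has to be created on several installation servers. The following is a minimal sketch. It assumes that the GPG key file 7fa2af80.pub sits in the same directory as the repository, which may not match your layout, and print_local_repo is a hypothetical helper name.

```shell
#!/bin/sh
# print_local_repo NAME DIR: prints a yum repository definition for a local
# directory, using the same fields as the example in step 13.
# Assumption: the key file 7fa2af80.pub is located directly inside DIR.
print_local_repo() {
    name=$1
    dir=$2
    printf '[%s]\nname=%s\nbaseurl=file://%s\nenabled=1\ngpgcheck=1\ngpgkey=file://%s/7fa2af80.pub\n' \
        "$name" "$name" "$dir" "$dir"
}

# Example (run as root):
#   print_local_repo cuda-10.1-local /var/cuda-repo-10.1-local \
#       > /etc/yum.repos.d/cuda-10.1-local.repo
```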

2.1.4.18.3.28 Tuning Up NVIDIA Performance

This section describes how to tune up NVIDIA performance. The procedures in this section are relevant to Intel only.

To tune up NVIDIA performance:

1. Change the permissions on the rc.local file to executable:

$ sudo chmod +x /etc/rc.local

2. Edit the /etc/rc.local file:

$ sudo vim /etc/rc.local

3. Add the following lines:

a. For V100:

$ nvidia-persistenced

b. For IBM (mandatory):

$ sudo systemctl enable nvidia-persistenced

Notice: Setting up the NVIDIA POWER9 CUDA driver includes additional set-up requirements. The NVIDIA POWER9 CUDA driver will not function properly if the additional set-up requirements are not followed. See POWER9 Setup for the additional set-up requirements.

c. For K80:

$ nvidia-persistenced
$ nvidia-smi -pm 1
$ nvidia-smi -acp 0
$ nvidia-smi --auto-boost-permission=0
$ nvidia-smi --auto-boost-default=0

4. Reboot the server and run the NVIDIA System Management Interface (NVIDIA SMI):

$ nvidia-smi

2.1.4.18.3.29 Disabling Automatic Bug Reporting Tools

To disable automatic bug reporting tools:

1. Run the following command to disable and stop the abrt services:

$ for i in abrt-ccpp.service abrtd.service abrt-oops.service abrt-pstoreoops.service abrt-vmcore.service abrt-xorg.service ; do sudo systemctl disable $i; sudo systemctl stop $i; done

The server is ready for the SQream software installation.

2. Run the following checks:

a. Check the OS release:

$ cat /etc/os-release

b. Verify that a SQream user exists and has the same ID on all cluster member services:

$ id sqream

c. Verify that the storage is mounted:

$ mount

d. Verify that the driver has been installed correctly:

$ nvidia-smi

e. Check the maximum value of fs.file-max:

$ sysctl -n fs.file-max

The desired output when checking the maximum value of fs.file-max is greater than or equal to 2097152.

f. Run the following command as a SQream user:

3. Configure the security limits by running the echo -e command from a root shell:

$ sudo bash
$ echo -e "sqream soft nproc 1000000\nsqream hard nproc 1000000\nsqream soft nofile 1000000\nsqream hard nofile 1000000\nsqream soft core unlimited\nsqream hard core unlimited" >> /etc/security/limits.conf

2.1.4.18.3.30 Enabling Core Dumps

Enabling core dumps is recommended, but optional.

To enable core dumps:

1. Check the abrtd status.

2. Set the limits.

3. Create the core dumps directory.

2.1.4.18.3.31 Checking the abrtd Status

To check the abrtd status:

1. Check if abrtd is running:

$ sudo ps -ef |grep abrt

2. If abrtd is running, stop it:

$ sudo service abrtd stop
$ sudo chkconfig abrt-ccpp off
$ sudo chkconfig abrt-oops off
$ sudo chkconfig abrt-vmcore off
$ sudo chkconfig abrt-xorg off
$ sudo chkconfig abrtd off

2.1.4.18.3.32 Setting the Limits

To set the limits:

1. Check the current core file size limit:

$ ulimit -c

2. If the output is 0, add the following lines to the limits.conf file (/etc/security):

* soft core unlimited
* hard core unlimited

3. Log out and log in to apply the limit changes.

2.1.4.18.3.33 Creating the Core Dumps Directory

To create the core dumps directory:

1. Make the /tmp/core_dumps directory:

$ mkdir /tmp/core_dumps

2. Set the ownership of the /tmp/core_dumps directory:

$ sudo chown sqream.sqream /tmp/core_dumps

3. Grant read, write, and execute permissions to all users:

$ sudo chmod -R 777 /tmp/core_dumps

2.1.4.18.3.34 Setting the Output Directory of the /etc/sysctl.conf File

To set the output directory of the /etc/sysctl.conf file:

1. Edit the /etc/sysctl.conf file:

$ sudo vim /etc/sysctl.conf

2. Add the following to the bottom of the file:

kernel.core_uses_pid = 1
kernel.core_pattern = //core-%e-%s-%u-%g-%p-%t
fs.suid_dumpable = 2

3. To apply the changes without rebooting the server, run:

$ sudo sysctl -p

4. Check that the core output directory points to the following:

$ sudo cat /proc/sys/kernel/core_pattern

The correct output is:

/tmp/core_dumps/core-%e-%s-%u-%g-%p-%t

5. Verify that core dumping works by running the following SQL statement from a SQream client (see Verifying that the Core Dumps Work):

select abort_server();

2.1.4.18.3.35 Verifying that the Core Dumps Work

You can verify that the core dumps work only after installing and running SQream. The verification causes the server to crash and writes a new core.xxx file to the directory configured in /etc/sysctl.conf.

To verify that the core dumps work:

1. Stop and restart all SQream services.

2. Connect to SQream with ClientCmd and run the following statement:

select abort_server();

2.1.4.18.3.36 Troubleshooting Core Dumping

This section describes the troubleshooting procedure to follow if all parameters have been configured correctly but no core files have been created.

To troubleshoot core dumping:

1. Reboot the server.

2. Verify that you have folder permissions:

$ sudo chmod -R 777 /tmp/core_dumps

3. Verify that the limits have been set correctly:

$ ulimit -c

If all parameters have been configured correctly, the correct output is:

unlimited

4. If all parameters have been configured correctly, but running ulimit -c outputs 0, run the following:

$ sudo vim /etc/profile

5. Search for the following line and comment it out with the hash symbol (#):

$ ulimit -S -c 0 > /dev/null 2>&1

6. If the line is not found in the /etc/profile file, do the following:

a. Run the following command:

$ sudo vim /etc/init.d/functions

b. Search for the following:

$ ulimit -S -c ${DAEMON_COREFILE_LIMIT:-0} >/dev/null 2>&1

c. If the line is found, comment it out with the hash symbol (#) and reboot the server.
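The two values this procedure keeps checking (the core size limit and the core output directory) can be read programmatically. The following is a minimal Python sketch for Linux, offered as an illustration only; the function names are hypothetical and it does not replace the manual steps above.

```python
import os
import resource


def core_limit_unlimited() -> bool:
    """Equivalent of `ulimit -c` printing 'unlimited' for this process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_CORE)
    return soft == resource.RLIM_INFINITY


def read_core_pattern(path: str = "/proc/sys/kernel/core_pattern"):
    """Return the kernel core pattern, or None if it cannot be read."""
    try:
        with open(path) as handle:
            return handle.read().strip()
    except OSError:
        return None


def core_directory(pattern: str) -> str:
    """Directory part of a core pattern such as the one shown above."""
    return os.path.dirname(pattern.strip())
```

For the pattern shown in this guide, core_directory returns /tmp/core_dumps, which should match the directory created earlier.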

2.1.5 Features guides

2.1.5.1 Access control

In this topic:

• Overview
• Terminology
  – Roles
  – Authentication
  – Authorization
• Roles
  – Creating new roles (users)
  – Dropping a user
  – Altering a user name
  – Changing user passwords
  – Public Role
  – Role membership (groups)
• Permissions
  – GRANT
  – REVOKE
  – Default permissions
• Departmental Example
  – Setting up the department permissions
  – Creating new users in the departments

2.1.5.1.1 Overview

Access control provides authentication and authorization in SQream DB. SQream DB manages authentication and authorization using a role-based access control system (RBAC), like ANSI SQL and other SQL products.

SQream DB has a default permissions system which is inspired by Postgres, but with more power. In most cases, this allows an administrator to set things up so that every object gets permissions set automatically.

In SQream DB, users log in from any worker, which verifies their roles and permissions from the metadata server. Each statement issues commands as the currently logged in role.

Roles are defined at the cluster level, meaning they are valid for all databases in the cluster.

To bootstrap SQream DB, a new install will always have one SUPERUSER role, typically named sqream. To create more roles, you should first connect as this role.

2.1.5.1.2 Terminology

2.1.5.1.2.1 Roles

Role : a role can be a user, a group, or both. Roles can own database objects (e.g. tables), and can assign permissions on those objects to other roles. Roles can be members of other roles, meaning a user role can inherit permissions from its parent role.

2.1.5.1.2.2 Authentication

Authentication : verifying the identity of the role. User roles have usernames (role names) and passwords.

2.1.5.1.2.3 Authorization

Authorization : checking the role has permissions to do a particular thing. The GRANT command is used for this.

2.1.5.1.3 Roles

Roles are used for both users and groups. Roles are global across all databases in the SQream DB cluster. To use a ROLE as a user, it should have a password, the login permission, and connect permissions to the relevant databases.

2.1.5.1.3.1 Creating new roles (users)

A user role can log in to the database, so it should have LOGIN permissions, as well as a password. For example:

CREATE ROLE role_name ;
GRANT LOGIN to role_name ;
GRANT PASSWORD 'new_password' to role_name ;
GRANT CONNECT ON DATABASE database_name to role_name ;

Examples:

CREATE ROLE new_role_name ;
GRANT LOGIN TO new_role_name;
GRANT PASSWORD 'my_password' TO new_role_name;
GRANT CONNECT ON DATABASE master TO new_role_name;

A database role may have a number of permissions that define what tasks it can perform. These are assigned using the GRANT command.

2.1.5.1.3.2 Dropping a user

DROP ROLE role_name ;

Examples:

DROP ROLE admin_role ;

2.1.5.1.3.3 Altering a user name

Renaming a user’s role:

ALTER ROLE role_name RENAME TO new_role_name ;

Examples:

ALTER ROLE admin_role RENAME TO copy_role ;

2.1.5.1.3.4 Changing user passwords

To change a user role’s password, grant the user a new password.

GRANT PASSWORD 'new_password' TO rhendricks;

Note: Granting a new password overrides any previous password. Changing the password while the role has an active running statement does not affect that statement, but will affect subsequent statements.

2.1.5.1.3.5 Public Role

There is a public role which always exists. Each role is granted to the PUBLIC role (i.e. is a member of the public group), and this cannot be revoked. You can alter the permissions granted to the public role.


The PUBLIC role has USAGE and CREATE permissions on the PUBLIC schema by default. Therefore, new users can CREATE, INSERT, DELETE, and SELECT from objects in the PUBLIC schema.

2.1.5.1.3.6 Role membership (groups)

Many database administrators find it useful to group user roles together. By grouping users, permissions can be granted to, or revoked from a group with one command. In SQream DB, this is done by creating a group role, granting permissions to it, and then assigning users to that group role. To use a role purely as a group, omit granting it LOGIN and PASSWORD permissions. The CONNECT permission can be given directly to user roles, and/or to the groups they are part of.

CREATE ROLE my_group;

Once the group role exists, you can add user roles (members) using the GRANT command. For example:

-- Add my_user to this group
GRANT my_group TO my_user;

To manage object permissions like databases and tables, you would then grant permissions to the group-level role (see the permissions table below). All member roles then inherit the permissions from the group. For example:

-- Grant all group users connect permissions
GRANT CONNECT ON DATABASE a_database TO my_group;

-- Grant all permissions on tables in public schema
GRANT ALL ON all tables IN schema public TO my_group;

Removing users and permissions can be done with the REVOKE command:

-- remove my_other_user from this group
REVOKE my_group FROM my_other_user;
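The inheritance described in this section can be pictured as a walk over group memberships. The following is a small Python model, written only to illustrate the idea: the Roles class and its methods are hypothetical and say nothing about how SQream DB stores roles internally.

```python
from collections import defaultdict


class Roles:
    """Toy model: a role's effective permissions include its groups' permissions."""

    def __init__(self):
        self.direct = defaultdict(set)     # role -> permissions granted directly
        self.member_of = defaultdict(set)  # role -> group roles granted to it

    def grant(self, permission, role):
        self.direct[role].add(permission)

    def grant_role(self, group, member):   # models GRANT group TO member
        self.member_of[member].add(group)

    def revoke_role(self, group, member):  # models REVOKE group FROM member
        self.member_of[member].discard(group)

    def effective(self, role):
        """Collect permissions from the role and every group it belongs to."""
        seen, stack, permissions = set(), [role], set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            permissions |= self.direct[current]
            stack.extend(self.member_of[current])
        return permissions
```

In this model, granting CONNECT to my_group and then granting my_group to my_user makes CONNECT appear in my_user's effective permissions, and revoking the membership removes it again.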


2.1.5.1.4 Permissions

Object/layer  | Permission      | Description
--------------+-----------------+------------------------------------------------------------
all databases | LOGIN           | use role to log into the system (the role also needs connect permission on the database it is connecting to)
all databases | PASSWORD        | the password used for logging into the system
all databases | SUPERUSER       | no permission restrictions on any activity
database      | SUPERUSER       | no permission restrictions on any activity within that database (this does not include modifying roles or permissions)
database      | CONNECT         | connect to the database
database      | CREATE          | create schemas in the database
database      | CREATE FUNCTION | create and drop functions
schema        | USAGE           | allows additional permissions within the schema
schema        | CREATE          | create tables in the schema
table         | SELECT          | SELECT from the table
table         | INSERT          | INSERT into the table
table         | DELETE          | DELETE and TRUNCATE on the table
table         | DDL             | drop and alter on the table
table         | ALL             | all the table permissions
function      | EXECUTE         | use the function
function      | DDL             | drop and alter on the function
function      | ALL             | all function permissions

2.1.5.1.4.1 GRANT

GRANT gives permissions to a role.

-- Grant permissions at the instance/ storage cluster level:
GRANT
   { SUPERUSER
   | LOGIN
   | PASSWORD '<password>'
   }
   TO <role> [, ...]

-- Grant permissions at the database level:
GRANT {{CREATE | CONNECT | DDL | SUPERUSER | CREATE FUNCTION} [, ...] | ALL [PERMISSIONS]}
   ON DATABASE <database> [, ...]
   TO <role> [, ...]

-- Grant permissions at the schema level:
GRANT {{ CREATE | DDL | USAGE | SUPERUSER } [, ...] | ALL [ PERMISSIONS ]}
   ON SCHEMA <schema> [, ...]
   TO <role> [, ...]

-- Grant permissions at the object level:
GRANT {{SELECT | INSERT | DELETE | DDL } [, ...] | ALL [PERMISSIONS]}
   ON { TABLE <table_name> [, ...] | ALL TABLES IN SCHEMA <schema_name> [, ...]}
   TO <role> [, ...]

-- Grant execute function permission:
GRANT {ALL | EXECUTE | DDL} ON FUNCTION function_name
   TO role;

-- Allows role2 to use permissions granted to role1
GRANT <role1> [, ...]
   TO <role2>

-- Also allows the role2 to grant role1 to other roles:
GRANT <role1> [, ...]
   TO <role2>
   WITH ADMIN OPTION

GRANT examples:

GRANT LOGIN,superuser TO admin;

GRANT CREATE FUNCTION ON database master TO admin;

GRANT SELECT ON TABLE admin.table1 TO userA;

GRANT EXECUTE ON FUNCTION my_function TO userA;

GRANT ALL ON FUNCTION my_function TO userA;

GRANT DDL ON admin.main_table TO userB;

GRANT ALL ON all tables IN schema public TO userB;

GRANT admin TO userC;

GRANT superuser ON schema demo TO userA;

GRANT admin_role TO userB;

2.1.5.1.4.2 REVOKE

REVOKE removes permissions from a role.

-- Revoke permissions at the instance/ storage cluster level:
REVOKE
   { SUPERUSER
   | LOGIN
   | PASSWORD
   }
   FROM <role> [, ...]

-- Revoke permissions at the database level:
REVOKE {{CREATE | CONNECT | DDL | SUPERUSER | CREATE FUNCTION} [, ...] | ALL [PERMISSIONS]}
   ON DATABASE <database> [, ...]
   FROM <role> [, ...]

-- Revoke permissions at the schema level:
REVOKE {{ CREATE | DDL | USAGE | SUPERUSER } [, ...] | ALL [PERMISSIONS]}
   ON SCHEMA <schema> [, ...]
   FROM <role> [, ...]

-- Revoke permissions at the object level:
REVOKE {{ SELECT | INSERT | DELETE | DDL } [, ...] | ALL }
   ON {[ TABLE ] <table_name> [, ...] | ALL TABLES IN SCHEMA <schema_name> [, ...]}
   FROM <role> [, ...]

-- Removes access to permissions in role1 by role 2
REVOKE <role1> [, ...] FROM <role2> [, ...] WITH ADMIN OPTION

-- Removes permissions to grant role1 to additional roles from role2
REVOKE <role1> [, ...] FROM <role2> [, ...] WITH ADMIN OPTION

Examples:

REVOKE superuser on schema demo from userA;

REVOKE delete on admin.table1 from userB;

REVOKE login from role_test;

REVOKE CREATE FUNCTION FROM admin;

2.1.5.1.4.3 Default permissions

The default permissions system (See ALTER DEFAULT PERMISSIONS) can be used to automatically grant permissions to newly created objects (See the departmental example below for one way it can be used). A default permissions rule looks for a schema being created, or a table (possibly by schema), and is able to grant any permission on that object to any role. This happens when the create table or create schema statement is run.

ALTER DEFAULT PERMISSIONS FOR target_role_name
     [IN schema_name, ...]
     FOR { TABLES | SCHEMAS }
     { grant_clause | DROP grant_clause}
     TO ROLE { role_name | public };

grant_clause ::=
  GRANT
     { CREATE FUNCTION
     | SUPERUSER
     | CONNECT
     | CREATE
     | USAGE
     | SELECT
     | INSERT
     | DELETE
     | DDL
     | EXECUTE
     | ALL
     }

2.1.5.1.5 Departmental Example

You work in a company with several departments. The example below shows you how to manage permissions in a database shared by multiple departments, where each department has different roles for the tables by schema. It walks you through how to set the permissions up for existing objects and how to set up default permissions rules to cover newly created objects.

The concept is that you set up roles for each new schema with the correct permissions, then the existing users can use these roles. A superuser must perform this setup for each new schema, which is a limitation, but superuser permissions are not needed at any other time, and neither are explicit grant statements or object ownership changes.

In the example, the database is called my_database, and the new or existing schema being set up to be managed in this way is called my_schema. There will be a group for this schema for each of the following:

Group              | Activities
-------------------+-----------------------------------------
database designers | create, alter and drop tables
updaters           | insert and delete data
readers            | read data
security officers  | add and remove users from these groups

2.1.5.1.5.1 Setting up the department permissions

Fig. 4: Our departmental example has four user group roles and seven user roles

As a superuser, you connect to the system and run the following:

-- create the groups

CREATE ROLE my_schema_security_officers;
CREATE ROLE my_schema_database_designers;
CREATE ROLE my_schema_updaters;
CREATE ROLE my_schema_readers;

-- grant permissions for each role
-- we grant permissions for existing objects here too,
-- so you don't have to start with an empty schema

-- security officers

GRANT connect ON DATABASE my_database TO my_schema_security_officers;
GRANT usage ON SCHEMA my_schema TO my_schema_security_officers;

GRANT my_schema_database_designers TO my_schema_security_officers WITH ADMIN OPTION;
GRANT my_schema_updaters TO my_schema_security_officers WITH ADMIN OPTION;
GRANT my_schema_readers TO my_schema_security_officers WITH ADMIN OPTION;

-- database designers

GRANT connect ON DATABASE my_database TO my_schema_database_designers;
GRANT usage ON SCHEMA my_schema TO my_schema_database_designers;

GRANT create,ddl ON SCHEMA my_schema TO my_schema_database_designers;

-- updaters

GRANT connect ON DATABASE my_database TO my_schema_updaters;
GRANT usage ON SCHEMA my_schema TO my_schema_updaters;

GRANT SELECT,INSERT,DELETE ON ALL TABLES IN SCHEMA my_schema TO my_schema_updaters;

-- readers

GRANT connect ON DATABASE my_database TO my_schema_readers;
GRANT usage ON SCHEMA my_schema TO my_schema_readers;

GRANT SELECT ON ALL TABLES IN SCHEMA my_schema TO my_schema_readers;
GRANT EXECUTE ON ALL FUNCTIONS TO my_schema_readers;

-- create the default permissions for new objects

ALTER DEFAULT PERMISSIONS FOR my_schema_database_designers IN my_schema
 FOR TABLES GRANT SELECT,INSERT,DELETE TO my_schema_updaters;

-- For every table created by my_schema_database_designers, give access to my_schema_readers:

ALTER DEFAULT PERMISSIONS FOR my_schema_database_designers IN my_schema
 FOR TABLES GRANT SELECT TO my_schema_readers;

Note:

• This process needs to be repeated by a user with SUPERUSER permissions each time a new schema is brought into this permissions management approach.

• By default, any new object created will not be accessible by our new my_schema_readers group. Running a GRANT SELECT ... only affects objects that already exist in the schema or database. If you’re getting a Missing the following permissions: SELECT on table 'database.public.tablename' error, make sure that you’ve altered the default permissions with the ALTER DEFAULT PERMISSIONS statement.
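The difference the note draws between a one-off GRANT and a default permissions rule can be modelled in a few lines. The following Python sketch is a simplified illustration under stated assumptions: the Schema class is hypothetical, the creator is matched by role name only, and group membership is ignored.

```python
class Schema:
    """Toy model of table permissions inside one schema."""

    def __init__(self):
        self.tables = {}    # table name -> {role: set of permissions}
        self.defaults = []  # (creator role, permissions, grantee role)

    def create_table(self, name, creator):
        # Default permission rules fire when the table is created.
        grants = {}
        for rule_creator, permissions, grantee in self.defaults:
            if rule_creator == creator:
                grants.setdefault(grantee, set()).update(permissions)
        self.tables[name] = grants

    def grant_on_all_tables(self, permissions, grantee):
        # Models GRANT ... ON ALL TABLES IN SCHEMA: existing tables only.
        for grants in self.tables.values():
            grants.setdefault(grantee, set()).update(permissions)

    def alter_default_permissions(self, creator, permissions, grantee):
        self.defaults.append((creator, set(permissions), grantee))

    def can(self, role, permission, table):
        return permission in self.tables[table].get(role, set())
```

In this model a table created after grant_on_all_tables is not readable by the grantee, while a table created after alter_default_permissions is, which mirrors the behavior the note describes.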

2.1.5.1.5.2 Creating new users in the departments

After the group roles have been created, you can now create user roles for each of your users.

-- create the new database designer users

CREATE ROLE ecodd;
GRANT LOGIN TO ecodd;
GRANT PASSWORD 'ecodds_secret_password' TO ecodd;
GRANT CONNECT ON DATABASE my_database TO ecodd;
GRANT my_schema_database_designers TO ecodd;

CREATE ROLE ebachmann;
GRANT LOGIN TO ebachmann;
GRANT PASSWORD 'another_secret_password' TO ebachmann;
GRANT CONNECT ON DATABASE my_database TO ebachmann;
GRANT my_schema_database_designers TO ebachmann;

-- If a user already exists, we can assign that user directly to the group

GRANT my_schema_updaters TO rhendricks;

-- Create users in the readers group

CREATE ROLE jbarker;
GRANT LOGIN TO jbarker;
GRANT PASSWORD 'action_jack' TO jbarker;
GRANT CONNECT ON DATABASE my_database TO jbarker;
GRANT my_schema_readers TO jbarker;

CREATE ROLE lbream;
GRANT LOGIN TO lbream;
GRANT PASSWORD 'artichoke123' TO lbream;
GRANT CONNECT ON DATABASE my_database TO lbream;
GRANT my_schema_readers TO lbream;

CREATE ROLE pgregory;
GRANT LOGIN TO pgregory;
GRANT PASSWORD 'c1ca6a' TO pgregory;
GRANT CONNECT ON DATABASE my_database TO pgregory;
GRANT my_schema_readers TO pgregory;

-- Create users in the security officers group

CREATE ROLE hoover;
GRANT LOGIN TO hoover;
GRANT PASSWORD 'mintchip' TO hoover;
GRANT CONNECT ON DATABASE my_database TO hoover;
GRANT my_schema_security_officers TO hoover;

After this setup:

• Database designers will be able to run any DDL on objects in the schema and create new objects, including ones created by other database designers
• Updaters will be able to insert into and delete from existing and new tables
• Readers will be able to read from existing and new tables

All this will happen without having to run any more GRANT statements. Any security officer will be able to add and remove users from these groups. Creating and dropping login users themselves must be done by a superuser.

2.1.5.2 Concurrency and locks

Locks are used in SQream DB to provide consistency when there are multiple concurrent transactions updating the database. Read only transactions are never blocked, and never block anything. Even if you drop a database while concurrently running a query on it, both will succeed correctly (as long as the query starts running before the drop database commits).


2.1.5.2.1 Locking modes

SQream DB has two kinds of locks:

• exclusive - this lock mode prevents the resource from being modified by other statements. This lock tells other statements that they’ll have to wait in order to change an object. DDL operations are always exclusive. They block other DDL operations, and update DML operations (insert and delete).

• inclusive - For insert operations, an inclusive lock is obtained on a specific object. This prevents other statements from obtaining an exclusive lock on the object. This lock allows other statements to insert or delete data from a table, but they’ll have to wait in order to run DDL.

2.1.5.2.2 When are locks obtained?

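Operation        | SELECT     | INSERT     | DELETE, TRUNCATE | DDL
-----------------+------------+------------+------------------+-----------
SELECT           | Concurrent | Concurrent | Concurrent       | Concurrent
INSERT           | Concurrent | Concurrent | Concurrent       | Wait
DELETE, TRUNCATE | Concurrent | Concurrent | Wait             | Wait
DDL              | Concurrent | Wait       | Wait             | Wait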

Statements that wait will exit with an error if they hit the lock timeout. The default timeout is 3 seconds (see statementLockTimeout).
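The compatibility matrix in this section is symmetric, so it can be encoded as a set of unordered pairs. The following Python sketch is only a lookup over the table's contents; the names are hypothetical, and TRUNCATE is treated as DELETE because the table groups them together.

```python
# Pairs of operations that must wait for each other, taken from the table.
# A one-element frozenset means the operation conflicts with itself.
WAITING_PAIRS = {
    frozenset(["INSERT", "DDL"]),
    frozenset(["DELETE", "DDL"]),
    frozenset(["DELETE"]),
    frozenset(["DDL"]),
}


def normalize(operation: str) -> str:
    """Map TRUNCATE onto DELETE, as the table groups them in one row."""
    operation = operation.upper()
    return "DELETE" if operation == "TRUNCATE" else operation


def must_wait(first: str, second: str) -> bool:
    """Return True when the two operations cannot run concurrently."""
    return frozenset([normalize(first), normalize(second)]) in WAITING_PAIRS
```

For example, must_wait("SELECT", "DDL") is False because read-only statements never block, while must_wait("INSERT", "DDL") is True.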

2.1.5.2.2.1 Global locks

Some operations require exclusive global locks at the cluster level. These usually short-lived locks will be obtained for the following operations:

• CREATE DATABASE
• CREATE ROLE
• CREATE TABLE
• ALTER ROLE
• ALTER TABLE
• DROP DATABASE
• DROP ROLE
• DROP TABLE
• GRANT
• REVOKE

2.1.5.2.3 Monitoring locks

Monitoring locks across the cluster can be useful when transaction contention takes place, and statements appear “stuck” while waiting for a previous statement to release locks.


The utility SHOW_LOCKS can be used to see the active locks.

In this example, we create a table based on results (CREATE TABLE AS), but we are also effectively dropping the previous table (by using OR REPLACE, which also drops the table). Thus, SQream DB applies locks during the table creation process to prevent the table from being altered during its creation.

t=> SELECT SHOW_LOCKS();
statement_id | statement_string | username | server | port | locked_object | lockmode | statement_start_time | lock_start_time
-------------+------------------+----------+--------+------+---------------+----------+----------------------+----------------
287 | CREATE OR REPLACE TABLE nba2 AS SELECT "Name" FROM nba WHERE REGEXP_COUNT("Name", '( )+', 8)>1; | sqream | 192.168.1.91 | 5000 | database$t | Inclusive | 2019-12-26 00:03:30 | 2019-12-26 00:03:30
287 | CREATE OR REPLACE TABLE nba2 AS SELECT "Name" FROM nba WHERE REGEXP_COUNT("Name", '( )+', 8)>1; | sqream | 192.168.1.91 | 5000 | globalpermission$ | Exclusive | 2019-12-26 00:03:30 | 2019-12-26 00:03:30
287 | CREATE OR REPLACE TABLE nba2 AS SELECT "Name" FROM nba WHERE REGEXP_COUNT("Name", '( )+', 8)>1; | sqream | 192.168.1.91 | 5000 | schema$t$public | Inclusive | 2019-12-26 00:03:30 | 2019-12-26 00:03:30
287 | CREATE OR REPLACE TABLE nba2 AS SELECT "Name" FROM nba WHERE REGEXP_COUNT("Name", '( )+', 8)>1; | sqream | 192.168.1.91 | 5000 | table$t$public$nba2$Insert | Exclusive | 2019-12-26 00:03:30 | 2019-12-26 00:03:30
287 | CREATE OR REPLACE TABLE nba2 AS SELECT "Name" FROM nba WHERE REGEXP_COUNT("Name", '( )+', 8)>1; | sqream | 192.168.1.91 | 5000 | table$t$public$nba2$Update | Exclusive | 2019-12-26 00:03:30 | 2019-12-26 00:03:30

2.1.5.2.4 Troubleshooting locks

In rare situations, a lock is never freed. The workflow for troubleshooting locks is:

1. Identify which statement has obtained locks
2. Understand if the statement is itself stuck, or waiting for another statement
3. Try to abort the offending statement
4. Force the stale locks to be removed

2.1.5.2.4.1 Example

We will assume that the statement from the previous example is stuck (statement #287). We can attempt to abort it using STOP_STATEMENT:

t=> SELECT STOP_STATEMENT(287);
executed

If the locks still appear in the SHOW_LOCKS utility, we can force remove the stale locks:

t=> SELECT RELEASE_DEFUNCT_LOCKS();
executed

Warning: This operation can cause some statements to fail on the specific worker on which they are queued. This is intended as a “last resort” to solve stale locks.
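The abort-then-release sequence from this example can be wrapped in a small helper. The following Python sketch assumes a hypothetical run_sql callable that executes a statement and returns its result rows as tuples, with statement_id in the first column as in the SHOW_LOCKS output above; it is not part of any SQream client library.

```python
def clear_stuck_statement(run_sql, statement_id: int) -> str:
    """Try STOP_STATEMENT first; release defunct locks only if locks remain.

    `run_sql` is a hypothetical callable: it takes a SQL string and returns
    a list of row tuples (an empty list for utilities that return no rows).
    """
    run_sql(f"SELECT STOP_STATEMENT({int(statement_id)});")

    # Re-check the lock list for rows that still belong to the statement.
    remaining = [
        row for row in run_sql("SELECT SHOW_LOCKS();")
        if row[0] == statement_id
    ]
    if remaining:
        # Last resort, as the warning above explains.
        run_sql("SELECT RELEASE_DEFUNCT_LOCKS();")
        return "released"
    return "stopped"
```

Keeping RELEASE_DEFUNCT_LOCKS behind the re-check reflects the warning: it runs only when stopping the statement did not free the locks.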

2.1.5.3 Concurrency and scaling in SQream DB

A SQream DB cluster can concurrently run one regular statement per worker process. A number of small statements will execute alongside these statements without waiting or blocking anything. SQream DB supports n concurrent statements by having n workers in a cluster. Each worker uses a fixed slice of a GPU’s memory, typically around 8-16GB of GPU memory per worker. This size is ideal for queries running on large data with potentially large row sizes.

2.1.5.3.1 Scaling when data sizes grow

For many statements, SQream DB scales linearly when adding more storage and querying on large data sets. It uses very optimised ‘brute force’ algorithms and implementations, which don’t suffer from sudden performance cliffs at larger data sizes.

2.1.5.3.2 Scaling when queries are queueing

SQream DB scales well by adding more workers, GPUs, and nodes to support more concurrent statements.

2.1.5.3.3 What to do when queries are slow

Adding more workers or GPUs does not boost the performance of a single statement or query. To boost the performance of a single statement, start by examining the best practices and ensure the guidelines are followed. Adding additional RAM to nodes, using more GPU memory, and faster CPUs or storage can also sometimes help.

Need help?

Analyzing complex workloads can be challenging. SQream’s customer support team has the experience to advise on these matters. Visit SQream’s support portal for additional support.

2.1.5.4 Metadata system

SQream DB contains a transparent and automatic system that collects metadata describing each chunk. The collected metadata enables effective skipping of chunks and extents when queries are executed.


2.1.5.4.1 How is metadata collected?

When data is inserted into SQream DB, the load process splits data into chunks. Several parameters are collected and stored for later use, including:

• Range of values for each column chunk (minimum, maximum)
• The number of values
• Additional information for query optimization

Data is collected automatically and transparently on every column type.

Fig. 5: Chunks are collections of rows from a column

Fig. 6: Metadata is automatically added to each chunk

2.1.5.4.2 How is metadata used?

Chunk metadata is collected for identifying column values and potentially skipping accessing them, to reduce unnecessary I/O operations. For example, when a query specifies a filter (e.g. WHERE or JOIN condition) on a range of values that spans a fraction of the table values, SQream DB will optimally scan only that fraction of the table chunks.

Queries that filter on fine-grained date and time ranges will be the most effective, particularly when data is timestamped, and when tables contain a large amount of historical data.
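The data skipping described above relies on comparing a filter range with each chunk's stored minimum and maximum. The following Python sketch illustrates that comparison only; the ChunkMeta type and chunks_to_scan function are hypothetical names, not SQream DB internals.

```python
from dataclasses import dataclass


@dataclass
class ChunkMeta:
    chunk_id: int
    min_value: int  # smallest value stored in the chunk
    max_value: int  # largest value stored in the chunk


def chunks_to_scan(chunks, low, high):
    """Return ids of chunks whose [min, max] range overlaps [low, high].

    Chunks outside the filter range cannot contain matching rows,
    so they can be skipped without being read.
    """
    return [
        chunk.chunk_id
        for chunk in chunks
        if chunk.max_value >= low and chunk.min_value <= high
    ]
```

With three chunks covering 1-100, 101-200, and 201-300, a filter on values between 150 and 180 needs only the second chunk.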

2.1.5.4.3 Why is metadata always on?

Metadata collection adds very little overhead to data load. When possible, most metadata collection is performed in the GPU. Metadata is collected for every chunk, and adds a handful of kilobytes at most per million values, and very few compute cycles. At scale, metadata collection is often negligible, resulting in a 0.005% overhead. For a 10TB dataset, the metadata storage overhead is estimated at 0.5GB. Because SQream DB’s metadata collection is so light-weight and often results in effective data skipping, it is always-on.
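The 0.5GB figure follows directly from the 0.005% overhead. The following Python sketch reproduces the arithmetic, assuming decimal units (1TB = 1000GB); the function name is hypothetical.

```python
def metadata_overhead_gb(dataset_tb: float, overhead_percent: float = 0.005) -> float:
    """Estimated metadata size in GB for a dataset of the given size in TB."""
    dataset_gb = dataset_tb * 1000  # decimal units: 1 TB = 1000 GB
    return dataset_gb * overhead_percent / 100


# A 10 TB dataset gives roughly 0.5 GB of metadata.
estimate = metadata_overhead_gb(10)
```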

2.1.5.5 Chunks and extents

All data in SQream DB is stored in logical tables. Each table is made up of rows, spanning one or more columns. Internally however, SQream DB stores data partitioned vertically by column, and horizontally by chunks. This topic describes the chunk concept, which can be helpful for tuning SQream DB workloads and table structures.

2.1.5.5.1 What are chunks? What are extents?

2.1.5.5.1.1 Chunks

All data in a table is automatically partitioned into columns, and each column is divided up further into chunks. A chunk is a contiguous number of rows from a specific column. It can be thought of as an automatic partition that spans several millions of records of one column. A chunk is often between 1MB and a couple hundred megabytes uncompressed, depending on the data type (however, all data in SQream DB is stored compressed). This chunk size is suitable for filtering and deleting data from large tables, which can contain hundreds of millions or even billions of chunks. SQream DB adds metadata to chunks automatically.

Note: Chunking is automatic and transparent for all tables in SQream DB.


Fig. 7: Chunks are collections of rows from a column

2.1.5.5.1.2 Extents

The next step up from the chunk, an extent is a specific number of contiguous chunks. Extents are designed to optimize disk access patterns, at around 20MB compressed, on-disk. An extent will therefore include between 1 and 25 chunks, based on the actual compressed chunk size.

Fig. 8: Extents are a collection of several contiguous chunks

2.1.5.5.2 Why was SQream DB built with chunks and extents?

2.1.5.5.2.1 Benefits of chunking

Unlike node-partitioning (or sharding), chunking carries several benefits:

• Chunks are small enough to allow multiple workers to read them concurrently
• Chunks are optimized for fast insertion of data
• Chunks carry metadata, which narrows down their contents for the optimizer
• Chunks are ideal for data retention as they can be deleted en masse
• Chunks are optimized for reading into RAM and the GPU
• Chunks are compressed individually, which improves compression and data locality

2.1.5.5.2.2 Storage reorganization

SQream DB performs some background storage reorganization to optimize I/O and read patterns. For example, when data is inserted in small batches, SQream DB will run two background processes called rechunk and reextent to reorganize the data into larger contiguous chunks and extents. This is also what happens when data is deleted. Data is never overwritten in SQream DB. Instead, new optimized chunks and extents are written to replace the old chunks and extents. Once all data has been rewritten, SQream DB will swap over to the new optimized chunks and extents and remove the old, unoptimized data.

2.1.5.5.2.3 Metadata

The chunk metadata that SQream DB collects enables effective skipping of chunks and extents when queries are executed. When a query specifies a filter (e.g. WHERE or JOIN condition) on a range of values that spans a fraction of the table values, SQream DB will optimally scan only that fraction of the table chunks. Queries that filter on fine-grained date and time ranges will be the most effective, particularly when data is timestamped, and when tables contain a large amount of historical data. See more in our Time based data management guide and our Metadata system guide.

2.1.5.6 Compression

SQream DB uses compression and encoding techniques to optimize query performance and save on disk space.

2.1.5.6.1 Encoding

Encoding converts data into a common format. When data is stored in a columnar format, it is often in a common format. This is in contrast with data stored in CSVs for example, where everything is stored in a text format. Because encoding uses specific data formats and encodings, it increases performance and reduces data size.

SQream DB encodes data in several ways depending on the data type. For example, a date is stored as an integer, with March 1st 1CE as the start. This is a lot more efficient than encoding the date as a string, and offers a wider range than storing it relative to the Unix Epoch.
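The date example can be illustrated with a day count from a fixed start date. The following Python sketch only demonstrates the idea: it uses Python's proleptic Gregorian calendar and hypothetical function names, and SQream DB's actual on-disk representation may differ.

```python
from datetime import date, timedelta

# Start date named in the text above (March 1st, year 1).
EPOCH = date(1, 3, 1)


def encode_date(value: date) -> int:
    """Encode a date as the number of days since EPOCH."""
    return (value - EPOCH).days


def decode_date(days: int) -> date:
    """Reverse of encode_date."""
    return EPOCH + timedelta(days=days)
```

An integer like this sorts and compares the same way as the date it represents, and it is far smaller than a ten-character string such as 2021-09-27.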


2.1.5.6.2 Compression

Compression transforms data into a smaller format without losing accuracy (lossless). After encoding a set of column values, SQream DB packs the data and compresses it. Before data can be accessed, SQream DB needs to decompress it. Depending on the compression scheme, the operations can be performed on the CPU or the GPU. Some users find that GPU compressions perform better for their data.

2.1.5.6.2.1 Automatic compression

By default, SQream DB automatically compresses every column (see Specifying compressions below for overriding default compressions). This feature is called automatic adaptive compression strategy. When loading data, SQream DB automatically decides on the compression schemes for specific chunks of data by trying several compression schemes and selecting the one that performs best. SQream DB tries to balance more aggressive compressions with the time and CPU/GPU time required to compress and decompress the data.
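The "try several schemes and keep the best" idea can be sketched with general-purpose compressors from the Python standard library. This is an illustration under loud assumptions: zlib, bz2, and lzma are stand-ins rather than SQream DB's schemes, and the sketch ranks candidates by size only, whereas the text says SQream DB also weighs compression and decompression time.

```python
import bz2
import lzma
import zlib

# Stand-in candidates; "flat" leaves the data uncompressed.
CANDIDATES = {
    "flat": lambda data: data,
    "zlib": zlib.compress,
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}


def pick_scheme(chunk: bytes):
    """Compress the chunk with every candidate and return (name, size) of the smallest."""
    sizes = {name: len(compress(chunk)) for name, compress in CANDIDATES.items()}
    best = min(sizes, key=sizes.get)
    return best, sizes[best]
```

Because the choice is made per chunk, a highly repetitive chunk and a nearly random chunk in the same column can end up with different schemes.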


2.1.5.6.2.2 Compression strategies

Compression name | Supported data types                             | Description | Location
-----------------+--------------------------------------------------+-------------+---------
FLAT             | All types                                        | No compression (forced) | -
DEFAULT          | All types                                        | Automatic scheme selection | -
DICT             | Integer types, dates and timestamps, short texts | Dictionary compression with RLE. For each chunk, SQream DB creates a dictionary of distinct values and stores only their indexes. Works best for integers and texts shorter than 120 characters, with <10% unique values. Useful for storing ENUMs or keys, stock tickers, and dimensions. If the data is sorted, this compression will perform even better. | GPU
P4D              | Integer types, dates and timestamps              | Patched frame-of-reference + Delta. Based on the delta between consecutive values. Works best for monotonically increasing or decreasing numbers and timestamps. | GPU
LZ4              | Text types                                       | Lempel-Ziv general purpose compression, used for texts | CPU
SNAPPY           | Text types                                       | General purpose compression, used for texts | CPU
RLE              | Integer types, dates and timestamps              | Run-length encoding. This replaces sequences of values with a single pair. It is best for low cardinality columns that are used to sort data (ORDER BY). | GPU
SEQUENCE         | Integer types                                    | Optimized RLE + Delta type for built-in identity columns. | GPU


2.1.5.6.2.3 Specifying compression strategies

When creating a table without any compression specifications, SQream DB defaults to automatic adaptive compression ("default"). However, this can be overridden by specifying a compression strategy when creating a table.

2.1.5.6.2.4 Explicitly specifying automatic compression

The following two are equivalent:

CREATE TABLE t (
   x INT,
   y VARCHAR(50)
);

In this version, the default compression is specified explicitly:

CREATE TABLE t (
   x INT CHECK('CS "default"'),
   y VARCHAR(50) CHECK('CS "default"')
);

2.1.5.6.2.5 Forcing no compression (flat)

In some cases, you may wish to remove compression entirely on some columns, in order to reduce CPU or GPU resource utilization at the expense of increased I/O.

CREATE TABLE t (
   x INT NOT NULL CHECK('CS "flat"'), -- This column won't be compressed
   y VARCHAR(50) -- This column will still be compressed automatically
);

2.1.5.6.2.6 Forcing compressions

In some cases, you may wish to force SQream DB to use a specific compression scheme based on your knowledge of the data. For example:

CREATE TABLE t (
   id BIGINT NOT NULL CHECK('CS "sequence"'),
   y VARCHAR(110) CHECK('CS "lz4"'), -- General purpose text compression
   z VARCHAR(80) CHECK('CS "dict"') -- Low cardinality column
);


2.1.5.6.2.7 Examining compression effectiveness

Queries to the internal metadata catalog can expose how effective the compression is, as well as what compression schemes were selected. Here is a sample query we can use to query the catalog:

SELECT c.column_name AS "Column",
       cc.compression_type AS "Actual compression",
       AVG(cc.compressed_size) "Compressed",
       AVG(cc.uncompressed_size) "Uncompressed",
       AVG(cc.uncompressed_size::FLOAT / cc.compressed_size) - 1 AS "Compression effectiveness",
       MIN(c.compression_strategy) AS "Compression strategy"
FROM sqream_catalog.chunk_columns cc
  INNER JOIN sqream_catalog.columns c
          ON cc.table_id = c.table_id
         AND cc.database_name = c.database_name
         AND cc.column_id = c.column_id
WHERE c.table_name = 'some_table' -- This is the table name which we want to inspect
GROUP BY 1, 2;

Example (subset) from the ontime table:

stats=> SELECT c.column_name AS "Column",
.          cc.compression_type AS "Actual compression",
.          AVG(cc.compressed_size) "Compressed",
.          AVG(cc.uncompressed_size) "Uncompressed",
.          AVG(cc.uncompressed_size::FLOAT / cc.compressed_size) - 1 AS "Compression effectiveness",
.          MIN(c.compression_strategy) AS "Compression strategy"
.   FROM sqream_catalog.chunk_columns cc
.     INNER JOIN sqream_catalog.columns c
.             ON cc.table_id = c.table_id
.            AND cc.database_name = c.database_name
.            AND cc.column_id = c.column_id
.   WHERE c.table_name = 'ontime'
.   GROUP BY 1, 2;

Column                    | Actual compression | Compressed | Uncompressed | Compression effectiveness | Compression strategy
--------------------------+--------------------+------------+--------------+---------------------------+---------------------
actualelapsedtime@null    | dict               |     129177 |      1032957 |                         7 | default
actualelapsedtime@val     | dict               |    1379797 |      4131831 |                         2 | default
airlineid                 | dict               |     578150 |      2065915 |                       2.7 | default
airtime@null              | dict               |     130011 |      1039625 |                         7 | default
airtime@null              | rle                |      93404 |      1019833 |                 116575.61 | default
airtime@val               | dict               |    1142045 |      4131831 |                      7.57 | default
arrdel15@null             | dict               |     129177 |      1032957 |                         7 | default
arrdel15@val              | dict               |     129183 |      4131831 |                     30.98 | default
arrdelay@null             | dict               |     129177 |      1032957 |                         7 | default
arrdelay@val              | dict               |    1389660 |      4131831 |                         2 | default
arrdelayminutes@null      | dict               |     129177 |      1032957 |                         7 | default
arrdelayminutes@val       | dict               |    1356034 |      4131831 |                      2.08 | default
arrivaldelaygroups@null   | dict               |     129177 |      1032957 |                         7 | default
arrivaldelaygroups@val    | p4d                |     516539 |      2065915 |                         3 | default
arrtime@null              | dict               |     129177 |      1032957 |                         7 | default
arrtime@val               | p4d                |    1652799 |      2065915 |                      0.25 | default
arrtimeblk                | dict               |     688870 |      9296621 |                     12.49 | default
cancellationcode@null     | dict               |     129516 |      1035666 |                         7 | default
cancellationcode@null     | rle                |      54392 |      1031646 |                 131944.62 | default
cancellationcode@val      | dict               |     263149 |      1032957 |                      4.12 | default
cancelled                 | dict               |     129183 |      4131831 |                     30.98 | default
carrier                   | dict               |     578150 |      2065915 |                       2.7 | default
carrierdelay@null         | dict               |     129516 |      1035666 |                         7 | default
carrierdelay@null         | flat               |    1041250 |      1041250 |                         0 | default
carrierdelay@null         | rle                |       4869 |      1026493 |                  202740.2 | default
carrierdelay@val          | dict               |     834559 |      4131831 |                     14.57 | default
crsarrtime                | p4d                |    1652799 |      2065915 |                      0.25 | default
crsdeptime                | p4d                |    1652799 |      2065915 |                      0.25 | default
crselapsedtime@null       | dict               |     130449 |      1043140 |                         7 | default
crselapsedtime@null       | rle                |       3200 |      1013388 |                 118975.75 | default
crselapsedtime@val        | dict               |    1182286 |      4131831 |                       2.5 | default
dayofmonth                | dict               |     688730 |      1032957 |                       0.5 | default
dayofweek                 | dict               |     393577 |      1032957 |                      1.62 | default
departuredelaygroups@null | dict               |     129177 |      1032957 |                         7 | default
departuredelaygroups@val  | p4d                |     516539 |      2065915 |                         3 | default
depdel15@null             | dict               |     129177 |      1032957 |                         7 | default
depdel15@val              | dict               |     129183 |      4131831 |                     30.98 | default
depdelay@null             | dict               |     129177 |      1032957 |                         7 | default
depdelay@val              | dict               |    1384453 |      4131831 |                      2.01 | default
depdelayminutes@null      | dict               |     129177 |      1032957 |                         7 | default
depdelayminutes@val       | dict               |    1362893 |      4131831 |                      2.06 | default
deptime@null              | dict               |     129177 |      1032957 |                         7 | default
deptime@val               | p4d                |    1652799 |      2065915 |                      0.25 | default
deptimeblk                | dict               |     688870 |      9296621 |                     12.49 | default
month                     | dict               |     247852 |      1035246 |                      3.38 | default
month                     | rle                |          5 |       607346 |                  121468.2 | default
origin                    | dict               |    1119457 |      3098873 |                      1.78 | default
quarter                   | rle                |          8 |      1032957 |                 136498.61 | default
securitydelay@null        | dict               |     129516 |      1035666 |                         7 | default
securitydelay@null        | flat               |    1041250 |      1041250 |                         0 | default
securitydelay@null        | rle                |       4869 |      1026493 |                  202740.2 | default
securitydelay@val         | dict               |     581893 |      4131831 |                     15.39 | default
tailnum@null              | dict               |     129516 |      1035666 |                         7 | default
tailnum@null              | rle                |      38643 |      1031646 |                 121128.68 | default
tailnum@val               | dict               |    1659918 |     12395495 |                     22.46 | default
taxiin@null               | dict               |     130011 |      1039625 |                         7 | default
taxiin@null               | rle                |      93404 |      1019833 |                 116575.61 | default
taxiin@val                | dict               |     839917 |      4131831 |                      8.49 | default
taxiout@null              | dict               |     130011 |      1039625 |                         7 | default
taxiout@null              | rle                |      84327 |      1019833 |                 116575.86 | default
taxiout@val               | dict               |     891539 |      4131831 |                      8.28 | default
totaladdgtime@null        | dict               |     129516 |      1035666 |                         7 | default
totaladdgtime@null        | rle                |       3308 |      1031646 |                 191894.18 | default
totaladdgtime@val         | dict               |     465839 |      4131831 |                     20.51 | default
uniquecarrier             | dict               |     578221 |      7230705 |                     11.96 | default
year                      | rle                |          6 |      2065915 |                 317216.08 | default

2.1.5.6.2.8 Notes on reading this table:

1. Higher numbers in the effectiveness column represent better compressions. 0 represents a column that wasn’t compressed at all.
2. Column names are the internal representation. Names with @null and @val suffixes represent a nullable column’s null (boolean) and values respectively, but are treated as one logical column.
3. The query lists all actual compressions for a column, so it may appear several times if the compression has changed mid-way through the loading (as with the carrierdelay column).
4. When default is the compression strategy, the system automatically selects the best compression. This can also mean no compression at all (flat).
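As a variation on the catalog query above, the following sketch lists only the columns that have chunks stored without compression. It assumes the same sqream_catalog.chunk_columns and sqream_catalog.columns catalog tables, and the lowercase flat value seen in the sample output; some_table is a placeholder.

```sql
-- Sketch: find columns of one table that have uncompressed (flat) chunks.
-- 'some_table' is a placeholder; replace it with the table to inspect.
SELECT c.column_name AS "Column",
       COUNT(*)      AS "Flat chunks"   -- one catalog row per chunk of a column
FROM sqream_catalog.chunk_columns cc
  INNER JOIN sqream_catalog.columns c
          ON cc.table_id = c.table_id
         AND cc.database_name = c.database_name
         AND cc.column_id = c.column_id
WHERE c.table_name = 'some_table'
  AND cc.compression_type = 'flat'
GROUP BY 1;
```

An empty result suggests that every chunk of the table was compressed with one of the other schemes.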

2.1.5.6.3 Compression best practices

2.1.5.6.3.1 Let SQream DB decide on the compression strategy

In most cases, SQream DB will decide on the best compression strategy. When overriding compression strategies, we recommend benchmarking not just storage size but also query and load performance.

2.1.5.6.3.2 Maximize the advantage of each compression scheme

Some compression schemes perform better when data is organized in a specific way.


For example, to take advantage of RLE, sorting a column may result in better performance and reduced disk-space and I/O usage. Sorting a column partially may also be beneficial. As a rule of thumb, aim for run-lengths of more than 10 consecutive values.
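A minimal sketch of this idea follows. The tables flights_staging and flights_sorted and their columns are hypothetical, and the 'CS "rle"' specification assumes that RLE is requested the same way as the other schemes shown in this guide. The low-cardinality column is declared with RLE, and rows are inserted in sorted order so that equal values form long runs.

```sql
-- Hypothetical tables, for illustration only.
CREATE TABLE flights_sorted (
   carrier_id  INT NOT NULL CHECK('CS "rle"'),  -- low cardinality; long runs once sorted
   flight_date DATE NOT NULL,
   distance    INT
);

-- Load in sorted order so that consecutive rows share the same carrier_id.
INSERT INTO flights_sorted
SELECT carrier_id, flight_date, distance
FROM flights_staging
ORDER BY carrier_id, flight_date;
```

The catalog query from the Examining compression effectiveness section can then be used to compare the result against an unsorted load.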

2.1.5.6.3.3 Choose data types that fit the data

Adapting to the narrowest data type will improve query performance and also reduce disk space usage. In addition, smaller data types may compress better than larger types. For example, use the smallest numeric data type that will accommodate your data. Using BIGINT for data that fits in INT or SMALLINT can use more disk space and memory for query execution. Using FLOAT to store integers will reduce compression’s effectiveness significantly.
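The contrast can be sketched with two definitions of the same hypothetical table. The table and column names are made up for illustration, and the comments describe assumed value ranges.

```sql
-- Wider than necessary: every column uses a larger or less specific type.
CREATE TABLE readings_wide (
   sensor_id   BIGINT,        -- only a few hundred distinct sensors
   reading_day VARCHAR(10),   -- dates stored as text
   status_code FLOAT          -- whole numbers stored as floating point
);

-- Narrower equivalent: types chosen to match the actual value ranges.
CREATE TABLE readings_narrow (
   sensor_id   SMALLINT,
   reading_day DATE,
   status_code TINYINT
);
```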

2.1.5.7 Data clustering

SQream DB uses the information collected by its chunking and Metadata system to execute queries efficiently. SQream DB automatically collects metadata on incoming data. This works well when the data is naturally ordered (e.g. with time-based data). There are situations where you know more about the incoming data than SQream DB. If you help by defining clustering keys, SQream DB can automatically improve query processing. SQream DB’s query optimizer typically selects the most efficient method when executing queries. If no clustering keys are available, it may have to scan tables physically.

2.1.5.7.1 Clustered tables

A table is considered “clustered” by one or more clustering keys if rows containing similar values with regard to these expressions are more likely to be located together on disk. For example, a table containing a date column whose values cover a whole month, but in which each chunk on disk covers less than a single day, is considered clustered by this column. Good clustering has a significant positive impact on query performance.

2.1.5.7.2 When does clustering help?

When a table is well-clustered, the metadata collected for each chunk is much more effective (the ranges are more localized). In turn, SQream DB’s query engine reads fewer irrelevant chunks (or avoids processing them altogether). Here are some common scenarios in which data clustering is beneficial:
• When a query contains a WHERE predicate of the form column COMPARISON value (for example, date_column > '2019-01-01' or id = 107) and the columns referenced are clustering keys, SQream DB will only read the portion of the data containing values matching these predicates.
• When two clustered tables are joined on their respective clustering keys, SQream DB will utilize the metadata to identify the matching chunks more easily.


2.1.5.7.3 Controlling data clustering

Some tables are naturally clustered. For example, a call log table containing CDRs can be naturally clustered by call date if data is inserted as it is generated, or bulk loaded in batches. Data can also be clustered by a region ID, per city, or customer type, depending on the source.

If the incoming data is not well-clustered (by the desired key), it is possible to tell SQream DB which keys it should cluster by. This can be done upon table creation (CREATE TABLE), or retroactively (CLUSTER BY). New data will be clustered upon insert.

When data is loaded to an explicitly clustered table, SQream DB partially sorts it. While this slows down the insert time, it is often beneficial for subsequent queries.

Note: Some queries significantly benefit from the decision to use clustering. For example, queries that filter or join extensively on clustered columns will benefit. However, clustering can slow down data insertion. Some insert workloads can be up to 75% slower. If you are not sure whether or not a specific scenario will benefit from clustering, we recommend testing end-to-end (both insert and query performance) on a small subset of the data before committing to permanent clustering keys.

2.1.5.7.4 Examples

2.1.5.7.4.1 Creating a clustered table

Even when the table is naturally ordered by start_date, we can specify a cluster key that is different. This will likely improve performance for queries that order by or rely on users’ country.

CREATE TABLE users (
   name VARCHAR(30) NOT NULL,
   start_date DATETIME NOT NULL,
   country VARCHAR(30) DEFAULT 'Unknown' NOT NULL
) CLUSTER BY country;
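A query of the following shape is the kind that is expected to benefit from this clustering key. The country value is illustrative only.

```sql
-- Filter on the clustering key: chunks whose range of country values
-- excludes 'Canada' can be skipped using the chunk metadata.
SELECT name, start_date
FROM users
WHERE country = 'Canada';
```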

2.1.5.8 Deleting Data

SQream DB supports deleting data, but it’s important to understand how this works and how to maintain deleted data.

2.1.5.8.1 How does deleting in SQream DB work?

In SQream DB, when you run a delete statement, any rows that match the delete predicate will no longer be returned when running subsequent queries. Deleted rows are tracked in a separate location, in delete predicates. After the delete statement, a separate process can be used to reclaim the space occupied by these rows, and to remove the small overhead that queries will have until this is done. Some benefits to this design are:


1. Delete transactions complete quickly
2. The total disk footprint overhead at any time for a delete transaction or cleanup process is small and bounded (while the system still supports low overhead commit, rollback and recovery for delete transactions).

2.1.5.8.1.1 Phase 1: Delete

When a DELETE statement is run, SQream DB records the delete predicates used. These predicates will be used to filter future statements on this table until all of the rows matching the delete predicate have been physically cleaned up. This filtering process takes full advantage of SQream’s zone map feature.

2.1.5.8.1.2 Phase 2: Clean-up

The cleanup process is not automatic. This gives control to the user or DBA, and gives flexibility on when to run the clean up. Files marked for deletion during the logical deletion stage are removed from disk. This is achieved by calling both utility function commands: CLEANUP_CHUNKS and CLEANUP_EXTENTS sequentially.

Note:
• ALTER TABLE and other DDL operations are blocked on tables that require clean-up. See more in the Concurrency and locks guide.
• If the estimated time for a cleanup process is beyond a threshold, you will get an error message about it. The message will explain how to override this limitation and run the process anyway.

2.1.5.8.2 Notes on data deletion

Note:
• If the number of deleted records crosses the threshold defined by the mixedColumnChunksThreshold parameter, the delete operation will be aborted.
• This is intended to alert the user that the large number of deleted records may result in a large number of mixed chunks.
• To circumvent this alert, replace XXX with the desired number of records before running the delete operation:

set mixedColumnChunksThreshold=XXX;

2.1.5.8.2.1 Deleting data does not free up space

With the exception of a full table delete (TRUNCATE), deleting data does not free up disk space. To free up disk space, trigger the cleanup process.


2.1.5.8.2.2 SELECT performance on deleted rows

Queries on tables that have deleted rows may have to scan data that hasn’t been cleaned up. In some cases, this can cause queries to take longer than expected. To solve this issue, trigger the cleanup process.

2.1.5.8.2.3 Use TRUNCATE instead of DELETE

For tables that are frequently emptied entirely, consider using TRUNCATE rather than DELETE. TRUNCATE removes the entire content of the table immediately, without requiring a subsequent cleanup to free up disk space.
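For instance, emptying the cool_animals table used in the examples of this guide could look like the sketch below. It assumes the common TRUNCATE TABLE form of the statement.

```sql
-- Removes all rows at once; no CLEANUP_CHUNKS / CLEANUP_EXTENTS pass is needed afterwards.
TRUNCATE TABLE cool_animals;
```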

2.1.5.8.2.4 Cleanup is I/O intensive

The cleanup process actively compacts tables by writing a complete new version of column chunks with no dead space. This minimizes the size of the table, but can take a long time. It also requires extra disk space for the new copy of the table, until the operation completes. Cleanup operations can create significant I/O load on the database. Consider this when planning the best time for the cleanup process. If this is an issue with your environment, consider using CREATE TABLE AS to create a new table and then rename and drop the old table.

2.1.5.8.3 Example

2.1.5.8.3.1 Deleting values from a table

farm=> SELECT * FROM cool_animals;
1,Dog ,7
2,Possum ,3
3,Cat ,5
4,Elephant ,6500
5,Rhinoceros ,2100
6,\N,\N

6 rows

farm=> DELETE FROM cool_animals WHERE weight > 1000;
executed

farm=> SELECT * FROM cool_animals;
1,Dog ,7
2,Possum ,3
3,Cat ,5
6,\N,\N

4 rows


2.1.5.8.3.2 Deleting values based on more complex predicates

farm=> SELECT * FROM cool_animals;
1,Dog ,7
2,Possum ,3
3,Cat ,5
4,Elephant ,6500
5,Rhinoceros ,2100
6,\N,\N

6 rows

farm=> DELETE FROM cool_animals WHERE weight > 1000;
executed

farm=> SELECT * FROM cool_animals;
1,Dog ,7
2,Possum ,3
3,Cat ,5
6,\N,\N

4 rows

2.1.5.8.3.3 Identifying and cleaning up tables

2.1.5.8.3.4 List tables that haven’t been cleaned up

farm=> SELECT t.table_name FROM sqream_catalog.delete_predicates dp
   JOIN sqream_catalog.tables t ON dp.table_id = t.table_id
   GROUP BY 1;
cool_animals

1 row

2.1.5.8.3.5 Identify predicates for clean-up

farm=> SELECT delete_predicate FROM sqream_catalog.delete_predicates dp
   JOIN sqream_catalog.tables t ON dp.table_id = t.table_id
   WHERE t.table_name = 'cool_animals';
weight > 1000

1 row


2.1.5.8.3.6 Triggering a cleanup

-- Chunk reorganization (aka SWEEP)
farm=> SELECT CLEANUP_CHUNKS('public','cool_animals');
executed

-- Delete leftover files (aka VACUUM)
farm=> SELECT CLEANUP_EXTENTS('public','cool_animals');
executed

farm=> SELECT delete_predicate FROM sqream_catalog.delete_predicates dp
   JOIN sqream_catalog.tables t ON dp.table_id = t.table_id
   WHERE t.table_name = 'cool_animals';

0 rows

2.1.5.8.4 Best practices for data deletion

• Run CLEANUP_CHUNKS and CLEANUP_EXTENTS after large DELETE operations.
• When deleting large proportions of data from very large tables, consider running a CREATE TABLE AS operation instead, then rename and drop the original table.
• Avoid killing CLEANUP_EXTENTS operations after they’ve started.
• SQream DB is optimized for time-based data. When data is naturally ordered by a date or timestamp, deleting based on those columns will perform best. For more information, see our Time based data management guide.
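The CREATE TABLE AS alternative from the list above can be sketched as follows, using the cool_animals example. The new table name is hypothetical, and the ALTER TABLE ... RENAME TO form is an assumption about the rename syntax; check the reference for your version before relying on it.

```sql
-- Keep only the rows that should survive, instead of deleting the rest.
-- Mirrors DELETE ... WHERE weight > 1000: rows with a NULL weight do not
-- match that predicate, so they are kept here as well.
CREATE TABLE cool_animals_kept AS
   SELECT *
   FROM cool_animals
   WHERE weight <= 1000 OR weight IS NULL;

-- Swap the tables once the new one has been verified.
DROP TABLE cool_animals;
ALTER TABLE cool_animals_kept RENAME TO cool_animals;  -- assumed rename syntax
```

Because no DELETE is issued, there are no delete predicates to clean up afterwards.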

2.1.5.9 Time based data management

SQream DB’s columnar-storage system is well adapted to timestamped data. When loading data with natural ordering (sorted by a timestamp), SQream DB organizes and collects metadata in chunks of time. Natural ordering allows for fast retrieval when performing range queries.

In this topic:

• Timestamped data
• Chunking
• Use cases
• Best practices for time-based data
   – Use date and datetime types for columns
   – Ordering
   – Limit workload by timestamp


2.1.5.9.1 Timestamped data

Timestamped data usually has some interesting attributes:
• Data is loaded in a natural order, as it is created
• Updates are infrequent or non-existent. Any updates are done by inserting a new row relating to a new timestamp
• Queries on timestamped data are typically on continuous time ranges
• Inserting and reading data are performed independently, not in the same operation or transaction
• Timestamped data has a high data volume and accumulates faster than typical OLTP workloads

2.1.5.9.2 Chunking

Core to handling timestamped data is SQream DB’s chunking and metadata system. When data is inserted, it is automatically partitioned vertically by column, and horizontally by chunk. A chunk can be thought of as an automatic partition that spans several millions of records of one column. Unlike node-partitioning (or sharding), chunking carries several benefits:
• Chunks are small enough to allow multiple workers to read them concurrently
• Chunks are optimized for fast insertion of data
• Chunks carry metadata, which narrows down their contents for the optimizer
• Chunks are ideal for data retention as they can be deleted en-masse
• Chunks are optimized for reading into RAM and the GPU
• Chunks are compressed individually, which improves compression and data locality

2.1.5.9.3 Use cases

Consider a set of data with a timestamp column. The timestamp order matches the order of data insertion (i.e. newer data is loaded after older data). This is common when you insert data in small bulks: every 15 minutes, every hour, every day, etc. SQream DB’s storage works by appending new data, partitioned into chunks containing millions of values. As new data is loaded, it is chunked and appended to a table. This is particularly useful in many scenarios:
• You run analytical queries spanning specific date ranges (e.g. the sum of transactions during the summer in 2020 vs. the summer in 2019)
• You delete data when it is older than X months
• Regulations instruct you to keep several years’ worth of data, but you’re not interested in querying this data all the time
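The retention scenario above can be sketched as follows. The transactions table and its timestamp column are hypothetical, as is the 12-month window; the DATEADD form and the cleanup functions are the ones used elsewhere in this guide.

```sql
-- Remove everything older than 12 months (hypothetical table and column).
DELETE FROM transactions
WHERE timestamp < DATEADD(MONTH, -12, CURRENT_TIMESTAMP);

-- Reclaim the disk space afterwards.
SELECT CLEANUP_CHUNKS('public', 'transactions');
SELECT CLEANUP_EXTENTS('public', 'transactions');
```

When the data was loaded in timestamp order, the delete predicate lines up with whole chunks, which is the case the Deleting Data guide describes as performing best.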


2.1.5.9.4 Best practices for time-based data

2.1.5.9.4.1 Use date and datetime types for columns

When creating tables with dates or timestamps, using the purpose-built DATE and DATETIME types instead of integer types or VARCHAR reduces the storage footprint, improves data integrity, and in many cases brings large performance improvements. SQream DB stores dates and datetimes very efficiently and can strongly optimize queries using these specific types.

2.1.5.9.4.2 Ordering

Data ordering is an important factor in minimizing storage size and improving query performance. Prioritize inserting data based on timestamps. This will likely reduce the number of chunks that SQream DB reads during query execution. See our Data clustering guide to see how clustering keys can be defined for optimizing data order.

2.1.5.9.4.3 Limit workload by timestamp

Grouping by and filtering data based on timestamps will improve performance. For example,

SELECT DATEPART(HOUR, timestamp),
       MIN(transaction_amount),
       MAX(transaction_amount),
       AVG(transaction_amount)
FROM transactions
WHERE timestamp BETWEEN DATEADD(MONTH,-3,CURRENT_TIMESTAMP) AND CURRENT_TIMESTAMP
GROUP BY 1;

2.1.5.10 Foreign Tables

Foreign tables can be used to run queries directly on data without inserting it into SQream DB first. SQream DB supports read-only foreign tables, so you can query from foreign tables, but you cannot insert to them, or run deletes or updates on them. Running queries directly on foreign (external) data is most effective for one-off querying. If you will be repeatedly querying data, the performance will usually be better if you insert the data into SQream DB first. Although foreign tables can be used without inserting data into SQream DB, one of their main use cases is to help with the insertion process. An INSERT SELECT statement on a foreign table can be used to insert data into SQream DB using the full power of the query engine to perform ETL.

In this topic:

• What kind of data is supported?
• What kind of data staging is supported?
• Using foreign tables - a practical example
   – Planning for data staging
   – Creating the foreign table
      ∗ From S3
      ∗ From HDFS
   – Querying foreign tables
   – Modifying data from staging
   – Converting a foreign table to a standard database table
• Error handling and limitations

2.1.5.10.1 What kind of data is supported?

SQream DB uses foreign data wrappers (FDW) to abstract external sources. SQream DB supports these FDWs:
• text files (e.g. CSV, PSV, TSV) via the csv_fdw
• ORC via the orc_fdw
• Parquet via the parquet_fdw

2.1.5.10.2 What kind of data staging is supported?

SQream DB can stage data from:
• a local filesystem (e.g. /mnt/storage/....)
• Amazon S3 buckets (e.g. s3://pp-secret-bucket/users/*.parquet)
• HDFS (e.g. hdfs://hadoop-nn.piedpiper.com/rhendricks/*.csv); see Using SQream in an HDFS Environment

2.1.5.10.3 Using foreign tables - a practical example

Use a foreign table to stage data before loading from CSV, Parquet or ORC files.

2.1.5.10.3.1 Planning for data staging

For the following examples, we will want to interact with a CSV file. Here’s a peek at the table contents:


Table 16: nba.csv

Name          | Team           | Number | Position | Age  | Height | Weight | College           | Salary
--------------+----------------+--------+----------+------+--------+--------+-------------------+-----------
Avery Bradley | Boston Celtics | 0.0    | PG       | 25.0 | 6-2    | 180.0  | Texas             | 7730337.0
Jae Crowder   | Boston Celtics | 99.0   | SF       | 25.0 | 6-6    | 235.0  | Marquette         | 6796117.0
John Holland  | Boston Celtics | 30.0   | SG       | 27.0 | 6-5    | 205.0  | Boston University |
R.J. Hunter   | Boston Celtics | 28.0   | SG       | 22.0 | 6-5    | 185.0  | Georgia State     | 1148640.0
Jonas Jerebko | Boston Celtics | 8.0    | PF       | 29.0 | 6-10   | 231.0  |                   | 5000000.0
Amir Johnson  | Boston Celtics | 90.0   | PF       | 29.0 | 6-9    | 240.0  |                   | 12000000.0
Jordan Mickey | Boston Celtics | 55.0   | PF       | 21.0 | 6-8    | 235.0  | LSU               | 1170960.0
Kelly Olynyk  | Boston Celtics | 41.0   | C        | 25.0 | 7-0    | 238.0  | Gonzaga           | 2165160.0
Terry Rozier  | Boston Celtics | 12.0   | PG       | 22.0 | 6-2    | 190.0  | Louisville        | 1824360.0

The file is stored on Amazon S3, at s3://sqream-demo-data/nba_players.csv. We will make note of the file structure, to create a matching CREATE FOREIGN TABLE statement.

2.1.5.10.3.2 Creating the foreign table

Based on the source file structure, we create a foreign table with the appropriate structure, and point it to the file. The file format in this case is CSV, with a DOS newline (\r\n). Here’s how the table would be created from S3 and HDFS:

2.1.5.10.3.3 From S3

CREATE FOREIGN TABLE nba
(
   Name varchar(40),
   Team varchar(40),
   Number tinyint,
   Position varchar(2),
   Age tinyint,
   Height varchar(4),
   Weight real,
   College varchar(40),
   Salary float
)
WRAPPER csv_fdw -- Text file
OPTIONS
(
   LOCATION = 's3://sqream-demo-data/nba_players.csv',
   RECORD_DELIMITER = '\r\n' -- DOS delimited file
);

2.1.5.10.3.4 From HDFS

CREATE FOREIGN TABLE nba
(
   Name varchar(40),
   Team varchar(40),
   Number tinyint,
   Position varchar(2),
   Age tinyint,
   Height varchar(4),
   Weight real,
   College varchar(40),
   Salary float
)
WRAPPER csv_fdw -- Text file
OPTIONS
(
   LOCATION = 'hdfs://hadoop-nn.piedpiper.com:8020/demo-data/nba_players.csv',
   RECORD_DELIMITER = '\r\n' -- DOS delimited file
);

2.1.5.10.3.5 Querying foreign tables

Let’s peek at the data from the foreign table:

t=> SELECT * FROM nba LIMIT 10;
name          | team           | number | position | age | height | weight | college           | salary
--------------+----------------+--------+----------+-----+--------+--------+-------------------+---------
Avery Bradley | Boston Celtics |      0 | PG       |  25 | 6-2    |    180 | Texas             |  7730337
Jae Crowder   | Boston Celtics |     99 | SF       |  25 | 6-6    |    235 | Marquette         |  6796117
John Holland  | Boston Celtics |     30 | SG       |  27 | 6-5    |    205 | Boston University |
R.J. Hunter   | Boston Celtics |     28 | SG       |  22 | 6-5    |    185 | Georgia State     |  1148640
Jonas Jerebko | Boston Celtics |      8 | PF       |  29 | 6-10   |    231 |                   |  5000000
Amir Johnson  | Boston Celtics |     90 | PF       |  29 | 6-9    |    240 |                   | 12000000
Jordan Mickey | Boston Celtics |     55 | PF       |  21 | 6-8    |    235 | LSU               |  1170960
Kelly Olynyk  | Boston Celtics |     41 | C        |  25 | 7-0    |    238 | Gonzaga           |  2165160
Terry Rozier  | Boston Celtics |     12 | PG       |  22 | 6-2    |    190 | Louisville        |  1824360
Marcus Smart  | Boston Celtics |     36 | PG       |  22 | 6-4    |    220 | Oklahoma State    |  3431040

2.1.5.10.3.6 Modifying data from staging

One of the main reasons for staging data is to examine the contents and modify them before loading them. Assume we are unhappy with weight being in pounds, because we want to use kilograms instead. We can apply the transformation as part of a query:

t=> SELECT name, team, number, position, age, height, (weight / 2.205) as weight, college, salary
.   FROM nba
.   ORDER BY weight DESC;
name              | team                   | number | position | age | height | weight   | college     | salary
------------------+------------------------+--------+----------+-----+--------+----------+-------------+---------
Nikola Pekovic    | Minnesota Timberwolves |     14 | C        |  30 | 6-11   |  139.229 |             | 12100000
Boban Marjanovic  | San Antonio Spurs      |     40 | C        |  27 | 7-3    | 131.5193 |             |  1200000
Al Jefferson      | Charlotte Hornets      |     25 | C        |  31 | 6-10   | 131.0658 |             | 13500000
Jusuf Nurkic      | Denver Nuggets         |     23 | C        |  21 | 7-0    | 126.9841 |             |  1842000
Andre Drummond    | Detroit Pistons        |      0 | C        |  22 | 6-11   | 126.5306 | Connecticut |  3272091
Kevin Seraphin    | New York Knicks        |      1 | C        |  26 | 6-10   | 126.0771 |             |  2814000
Brook Lopez       | Brooklyn Nets          |     11 | C        |  28 | 7-0    | 124.7166 | Stanford    | 19689000
Jahlil Okafor     | Philadelphia 76ers     |      8 | C        |  20 | 6-11   | 124.7166 | Duke        |  4582680
Cristiano Felicio | Chicago Bulls          |      6 | PF       |  23 | 6-10   | 124.7166 |             |   525093
[...]

Now, if we’re happy with the results, we can convert the staged foreign table to a standard table.

2.1.5.10.3.7 Converting a foreign table to a standard database table

CREATE TABLE AS can be used to materialize a foreign table into a regular table.

Tip: If you intend to use the table multiple times, convert the foreign table to a standard table.


t=> CREATE TABLE real_nba AS
.   SELECT name, team, number, position, age, height, (weight / 2.205) as weight, college, salary
.   FROM nba
.   ORDER BY weight DESC;
executed

t=> SELECT * FROM real_nba LIMIT 5;
name             | team                   | number | position | age | height | weight   | college     | salary
-----------------+------------------------+--------+----------+-----+--------+----------+-------------+---------
Nikola Pekovic   | Minnesota Timberwolves |     14 | C        |  30 | 6-11   |  139.229 |             | 12100000
Boban Marjanovic | San Antonio Spurs      |     40 | C        |  27 | 7-3    | 131.5193 |             |  1200000
Al Jefferson     | Charlotte Hornets      |     25 | C        |  31 | 6-10   | 131.0658 |             | 13500000
Jusuf Nurkic     | Denver Nuggets         |     23 | C        |  21 | 7-0    | 126.9841 |             |  1842000
Andre Drummond   | Detroit Pistons        |      0 | C        |  22 | 6-11   | 126.5306 | Connecticut |  3272091
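When the destination table already exists, an INSERT with a SELECT from the foreign table can be used instead of CREATE TABLE AS. The following is a sketch only: nba_metric is a hypothetical table, and its column types simply mirror the foreign table definition, except for weight, which is cast explicitly to FLOAT.

```sql
-- Hypothetical destination table.
CREATE TABLE nba_metric (
   name     VARCHAR(40),
   team     VARCHAR(40),
   number   TINYINT,
   position VARCHAR(2),
   age      TINYINT,
   height   VARCHAR(4),
   weight   FLOAT,        -- kilograms after conversion
   college  VARCHAR(40),
   salary   FLOAT
);

-- Transform while loading: the foreign table is read and converted in one statement.
INSERT INTO nba_metric
SELECT name, team, number, position, age, height,
       (weight / 2.205)::FLOAT,
       college, salary
FROM nba;
```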

2.1.5.10.4 Error handling and limitations

• Error handling in foreign tables is limited. Any error that occurs during source data parsing will result in the statement aborting.
• Foreign tables are logical and do not contain any data. Their structure is not verified or enforced until a query uses the table. For example, a CSV with the wrong delimiter may cause a query to fail, even though the table has been created successfully:

t=> SELECT * FROM nba;
master=> select * from nba;
Record delimiter mismatch during CSV parsing. User defined line delimiter \n does not match the first delimiter \r\n found in s3://sqream-demo-data/nba.csv

• Since the data for a foreign table is not stored in SQream DB, it can be changed or removed at any time by an external process. As a result, the same query can return different results each time it runs against a foreign table. Similarly, a query might fail if the external data is moved, removed, or has changed structure.
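Because a delimiter mismatch only surfaces at query time, it can help to inspect a sample of the source file before creating the foreign table. The sketch below is one possible check in Python; the function name detect_record_delimiter is hypothetical, and it assumes you can read the first bytes of the file locally.

```python
def detect_record_delimiter(sample: bytes) -> str:
    """Return the first line ending found in a sample of file bytes."""
    newline_at = sample.find(b"\n")
    if newline_at == -1:
        raise ValueError("no line ending found in sample")
    # A carriage return right before the newline means a DOS-style ending
    if newline_at > 0 and sample[newline_at - 1:newline_at] == b"\r":
        return "\r\n"
    return "\n"


print(repr(detect_record_delimiter(b"Name,Team\r\nAvery Bradley,Boston Celtics\r\n")))
```

The value returned is the one to pass as the record delimiter when defining the table or running COPY FROM.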

2.1.5.11 Working with External Data

SQream DB supports external data sources for use with Foreign Tables, COPY FROM, and COPY TO.

2.1.5.11.1 Amazon S3

SQream DB has a native S3 connector for inserting data. The s3:// URI specifies an external file path on an S3 bucket.


File names may contain wildcard characters and the files can be a CSV or columnar format like Parquet and ORC.

In this topic:

• S3 Configuration
• S3 URI Format
• Authentication
• Examples
  – Planning for Data Staging
  – Creating the External Table
  – Querying External Tables
  – Bulk Loading a File from a Public S3 Bucket
  – Loading Files from an Authenticated S3 Bucket

2.1.5.11.1.1 S3 Configuration

Any SQream DB host that has access to S3 endpoints can access S3 without any configuration. To read files from an S3 bucket, the bucket must allow its files to be listed.

2.1.5.11.1.2 S3 URI Format

With S3, specify a location for a file (or files) when using COPY FROM or Foreign Tables. The general S3 syntax is: s3://bucket_name/path
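To make the URI layout concrete, here is a minimal Python sketch that splits an s3:// location into its bucket name and path. The helper name split_s3_uri is an assumption for illustration; SQream DB parses the URI itself, so this is only a way to validate locations before using them in a statement.

```python
from typing import Tuple


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket_name/path' into (bucket_name, path)."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an S3 URI: {uri!r}")
    bucket, _, path = uri[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name: {uri!r}")
    return bucket, path


print(split_s3_uri("s3://sqream-demo-data/nba.csv"))
```

Plain string splitting is used rather than a URL parser so that wildcard characters in the path are passed through untouched.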

2.1.5.11.1.3 Authentication

SQream DB supports AWS ID and AWS SECRET authentication. These should be specified when executing a statement.

2.1.5.11.1.4 Examples

Use an external table to stage data from S3 before loading from CSV, Parquet or ORC files.

2.1.5.11.1.5 Planning for Data Staging

For the following examples, we will want to interact with a CSV file. Here’s a peek at the table contents:


Table 17: nba.csv
Name          | Team           | Number | Position | Age  | Height | Weight | College           | Salary
--------------+----------------+--------+----------+------+--------+--------+-------------------+-----------
Avery Bradley | Boston Celtics | 0.0    | PG       | 25.0 | 6-2    | 180.0  | Texas             | 7730337.0
Jae Crowder   | Boston Celtics | 99.0   | SF       | 25.0 | 6-6    | 235.0  | Marquette         | 6796117.0
John Holland  | Boston Celtics | 30.0   | SG       | 27.0 | 6-5    | 205.0  | Boston University |
R.J. Hunter   | Boston Celtics | 28.0   | SG       | 22.0 | 6-5    | 185.0  | Georgia State     | 1148640.0
Jonas Jerebko | Boston Celtics | 8.0    | PF       | 29.0 | 6-10   | 231.0  |                   | 5000000.0
Amir Johnson  | Boston Celtics | 90.0   | PF       | 29.0 | 6-9    | 240.0  |                   | 12000000.0
Jordan Mickey | Boston Celtics | 55.0   | PF       | 21.0 | 6-8    | 235.0  | LSU               | 1170960.0
Kelly Olynyk  | Boston Celtics | 41.0   | C        | 25.0 | 7-0    | 238.0  | Gonzaga           | 2165160.0
Terry Rozier  | Boston Celtics | 12.0   | PG       | 22.0 | 6-2    | 190.0  | Louisville        | 1824360.0

The file is stored on Amazon S3, and this bucket is public and listable. We will make note of the file structure, to create a matching CREATE FOREIGN TABLE statement.
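Taking note of the file structure can be partly automated. The following Python sketch lists each column with the widest value seen in a sample, which is a useful starting point for choosing varchar lengths. The sample text, including its header row, is an assumption reconstructed from the table above, and column_summary is a hypothetical helper.

```python
import csv
import io

# Assumed sample: a header row plus two records, DOS line endings
SAMPLE = (
    "Name,Team,Number,Position,Age,Height,Weight,College,Salary\r\n"
    "Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0\r\n"
    "John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,\r\n"
)


def column_summary(text):
    """Return (column name, widest value length) pairs for a CSV sample."""
    reader = csv.reader(io.StringIO(text, newline=""))
    header = next(reader)
    widths = [0] * len(header)
    for row in reader:
        for index, value in enumerate(row):
            widths[index] = max(widths[index], len(value))
    return list(zip(header, widths))


for name, width in column_summary(SAMPLE):
    print(f"{name}: widest value has {width} characters")
```

A small sample only gives a lower bound, so declared column sizes should leave headroom for longer values elsewhere in the file.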

2.1.5.11.1.6 Creating the External Table

Based on the source file structure, we create a foreign table with the appropriate structure, and point it to the file.

CREATE FOREIGN TABLE nba
(
   Name varchar(40),
   Team varchar(40),
   Number tinyint,
   Position varchar(2),
   Age tinyint,
   Height varchar(4),
   Weight real,
   College varchar(40),
   Salary float
)
WRAPPER csv_fdw
OPTIONS
(
   LOCATION = 's3://sqream-demo-data/nba_players.csv',
   RECORD_DELIMITER = '\r\n' -- DOS delimited file
)
;

The file format in this case is CSV, and it is stored as an S3 object (if the path is on HDFS, see Using SQream in an HDFS Environment and change the URI accordingly). We also took note that the record delimiter was a DOS newline (\r\n).

2.1.5.11.1.7 Querying External Tables

Let’s peek at the data from the external table:

t=> SELECT * FROM nba LIMIT 10;
name          | team           | number | position | age | height | weight | college           | salary
--------------+----------------+--------+----------+-----+--------+--------+-------------------+---------
Avery Bradley | Boston Celtics | 0      | PG       | 25  | 6-2    | 180    | Texas             | 7730337
Jae Crowder   | Boston Celtics | 99     | SF       | 25  | 6-6    | 235    | Marquette         | 6796117
John Holland  | Boston Celtics | 30     | SG       | 27  | 6-5    | 205    | Boston University |
R.J. Hunter   | Boston Celtics | 28     | SG       | 22  | 6-5    | 185    | Georgia State     | 1148640
Jonas Jerebko | Boston Celtics | 8      | PF       | 29  | 6-10   | 231    |                   | 5000000
Amir Johnson  | Boston Celtics | 90     | PF       | 29  | 6-9    | 240    |                   | 12000000
Jordan Mickey | Boston Celtics | 55     | PF       | 21  | 6-8    | 235    | LSU               | 1170960
Kelly Olynyk  | Boston Celtics | 41     | C        | 25  | 7-0    | 238    | Gonzaga           | 2165160
Terry Rozier  | Boston Celtics | 12     | PG       | 22  | 6-2    | 190    | Louisville        | 1824360
Marcus Smart  | Boston Celtics | 36     | PG       | 22  | 6-4    | 220    | Oklahoma State    | 3431040

2.1.5.11.1.8 Bulk Loading a File from a Public S3 Bucket

The COPY FROM command can also be used to load data without staging it first.

Note: The bucket must be publicly available, and its objects must be listable.

COPY nba FROM 's3://sqream-demo-data/nba.csv' WITH OFFSET 2 RECORD DELIMITER '\r\n';

2.1.5.11.1.9 Loading Files from an Authenticated S3 Bucket

COPY nba FROM 's3://secret-bucket/*.csv' WITH OFFSET 2 RECORD DELIMITER '\r\n' AWS_ID '12345678' AWS_SECRET 'super_secretive_secret';
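When such statements are generated by a script, the text can be assembled programmatically. The following Python sketch builds the statement shown above from its parts. The helper name build_copy_statement is hypothetical, and escaping single quotes by doubling them is an assumption based on the quoting style used elsewhere in this guide.

```python
def build_copy_statement(table: str, uri: str, aws_id: str, aws_secret: str) -> str:
    """Assemble a COPY FROM statement for an authenticated S3 bucket."""

    def quote(value: str) -> str:
        # Assumption: a single quote inside a literal is escaped by doubling it
        return "'" + value.replace("'", "''") + "'"

    return (
        f"COPY {table} FROM {quote(uri)} "
        "WITH OFFSET 2 RECORD DELIMITER '\\r\\n' "
        f"AWS_ID {quote(aws_id)} AWS_SECRET {quote(aws_secret)};"
    )


print(build_copy_statement(
    "nba", "s3://secret-bucket/*.csv", "12345678", "super_secretive_secret"
))
```

Keeping the credentials as function arguments makes it possible to supply them at run time instead of storing them in the script.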


2.1.5.11.2 Using SQream in an HDFS Environment

2.1.5.11.2.1 Configuring an HDFS Environment for the User sqream

This section describes how to configure an HDFS environment for the user sqream and is only relevant for users with an HDFS environment.
To configure an HDFS environment for the user sqream:
1. Open your bash_profile configuration file for editing:

$ vim /home/sqream/.bash_profile

3. Verify that the edits have been made:

source /home/sqream/.bash_profile

4. Check if you can access Hadoop from your machine:

$ hadoop fs -ls hdfs://:8020/

5. Verify that an HDFS environment exists for SQream services:

$ ls -l /etc/sqream/sqream_env.sh

6. If an HDFS environment does not exist for SQream services, create one (sqream_env.sh):

#!/bin/bash

SQREAM_HOME=/usr/local/sqream
export SQREAM_HOME

export JAVA_HOME=${SQREAM_HOME}/hdfs/jdk
export HADOOP_INSTALL=${SQREAM_HOME}/hdfs/hadoop
export CLASSPATH=`${HADOOP_INSTALL}/bin/hadoop classpath --glob`
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_INSTALL}/lib/native
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${SQREAM_HOME}/lib:$HADOOP_COMMON_LIB_NATIVE_DIR

PATH=$PATH:$HOME/.local/bin:$HOME/bin:${SQREAM_HOME}/bin/:${JAVA_HOME}/bin:$HADOOP_INSTALL/bin
export PATH


2.1.5.11.2.2 Authenticate Hadoop Servers that Require Kerberos

If your Hadoop server requires Kerberos authentication, do the following: 1. Create a principal for the user sqream.

$ kadmin -p root/[email protected]
$ addprinc [email protected]


2. If you do not know your Kerberos root credentials, connect to the Kerberos server as a root user with ssh and run kadmin.local:

$ kadmin.local

Running kadmin.local does not require a password. 3. If a password is not required, change your password to [email protected].

$ change_password [email protected]

4. Connect to the hadoop name node using ssh:

$ cd /var/run/cloudera-scm-agent/process

5. Check the most recently modified content of the directory above:

$ ls -lrt

5. Look for a recently updated folder containing the text hdfs. The following is an example of the correct folder name:

cd -hdfs-

This folder should contain a file named hdfs.keytab or another similar .keytab file.
6. Copy the .keytab file to user sqream’s Home directory on the remote machines that you are planning to use Hadoop on.
7. Copy the following files to the sqream sqream@server:/hdfs/hadoop/etc/hadoop: directory:
• core-site.xml
• hdfs-site.xml
8. Connect to the sqream server and verify that the .keytab file is owned by the user sqream and has the correct permissions:

$ sudo chown sqream:sqream /home/sqream/hdfs.keytab
$ sudo chmod 600 /home/sqream/hdfs.keytab

9. Log into the sqream server.
10. Log in as the user sqream.
11. Navigate to the Home directory and check the name of a Kerberos principal represented by the following .keytab file:

$ klist -kt hdfs.keytab

The following is an example of the correct output:

sqream@Host-121 ~ $ klist -kt hdfs.keytab
Keytab name: FILE:hdfs.keytab
KVNO Timestamp           Principal
---- ------------------- ------------------------------------------------------
   5 09/15/2020 18:03:05 HTTP/[email protected]
   5 09/15/2020 18:03:05 HTTP/[email protected]
   5 09/15/2020 18:03:05 HTTP/[email protected]
   5 09/15/2020 18:03:05 HTTP/[email protected]
   5 09/15/2020 18:03:05 HTTP/[email protected]
   5 09/15/2020 18:03:05 HTTP/[email protected]
   5 09/15/2020 18:03:05 HTTP/[email protected]
   5 09/15/2020 18:03:05 HTTP/[email protected]
   5 09/15/2020 18:03:05 hdfs/[email protected]
   5 09/15/2020 18:03:05 hdfs/[email protected]
   5 09/15/2020 18:03:05 hdfs/[email protected]
   5 09/15/2020 18:03:05 hdfs/[email protected]
   5 09/15/2020 18:03:05 hdfs/[email protected]
   5 09/15/2020 18:03:05 hdfs/[email protected]
   5 09/15/2020 18:03:05 hdfs/[email protected]
   5 09/15/2020 18:03:05 hdfs/[email protected]

12. Verify that the hdfs service named hdfs/[email protected] is shown in the generated output above. 13. Run the following:

$ kinit -kt hdfs.keytab hdfs/[email protected]

13. Check the output:

$ klist

The following is an example of the correct output:

Ticket cache: FILE:/tmp/krb5cc_1000
Default principal: [email protected]

Valid starting       Expires              Service principal
09/16/2020 13:44:18  09/17/2020 13:44:18  krbtgt/[email protected]

14. List the files located at the defined server name or IP address:

$ hadoop fs -ls hdfs://:8020/

15. Do one of the following:
• If the list below is output, continue with Step 16.
• If the list is not output, verify that your environment has been set up correctly. If any of the following are empty, verify that you followed Step 6 in the Configuring an HDFS Environment for the User sqream section above correctly:

$ echo $JAVA_HOME
$ echo $SQREAM_HOME
$ echo $CLASSPATH
$ echo $HADOOP_COMMON_LIB_NATIVE_DIR
$ echo $LD_LIBRARY_PATH
$ echo $PATH

16. Verify that you copied the correct keytab file.


17. Review this procedure to verify that you have followed each step.
NOTE: If you are using more than one server, you must start the metadataserver and serverpicker services on one node only.
18. Start the following SQream services manually and verify that they are running correctly:

$ sudo systemctl start metadataserver
$ sudo systemctl start serverpicker
$ sudo systemctl start sqream1
$ sudo systemctl start sqream2
$ sudo systemctl start sqream3
$ sudo systemctl start sqream4

19. Verify that all SQream processes are running and listening:

$ sudo systemctl status metadataserver
$ sudo systemctl status serverpicker
$ sudo systemctl status sqream1

NOTE: Run sudo systemctl status on all servers.
20. Verify that SQream is listening on all ports:

$ sudo netstat -nltp

NOTE: Depending on the package build and GPU optimization, SQream may take several minutes to start listening on its ports. Check the service logs in /var/log/sqream to see whether SQream has started listening on its ports.
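The environment check in step 15 of the procedure above can also be scripted. This is a minimal Python sketch that reports which of the listed variables are unset or empty; the function name missing_variables is hypothetical, and the script is assumed to run in a shell that has already sourced the environment file.

```python
import os

# The variables echoed in step 15 of the procedure above
REQUIRED_VARIABLES = [
    "JAVA_HOME",
    "SQREAM_HOME",
    "CLASSPATH",
    "HADOOP_COMMON_LIB_NATIVE_DIR",
    "LD_LIBRARY_PATH",
    "PATH",
]


def missing_variables(environ=None):
    """Return the required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARIABLES if not env.get(name)]


missing = missing_variables()
print("All variables are set" if not missing else "Missing: " + ", ".join(missing))
```

An empty result only shows that the variables are defined; it does not verify that the paths they point to exist.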

2.1.5.12 Transactions

SQream DB supports serializable transactions. This is also called ‘ACID compliance’.
The implementation of transactions means that commit, rollback and recovery are all extremely fast.
SQream DB has extremely fast bulk insert speed, with minimal slowdown when running concurrent inserts. There is no performance reason to break large inserts up into multiple transactions.
The phrase “supporting transactions” for a database system sometimes means having good performance for OLTP workloads. SQream DB’s transaction system does not have high performance for high-concurrency OLTP workloads.
SQream DB also supports transactional DDL.

2.1.5.13 Workload Manager

The workload manager (WLM) allows SQream DB workers to identify their availability to clients with specific service names. The load balancer will then use that information to route statements to specific workers.

2.1.5.13.1 Why use the workload manager?

The workload manager allows a system engineer or database administrator to allocate specific workers and compute resources for various tasks.


For example:
1. Creating a service queue named ETL and allocating two workers exclusively to this service prevents non-ETL statements from utilizing these compute resources.
2. Creating a service for the company’s leadership during working hours for dedicated access, and disabling this service at night to allow maintenance operations to use the available compute.

2.1.5.13.2 Setting up service queues

By default, every worker subscribes to the sqream service queue. Additional service names are configured in the configuration file for every worker, but can also be set on a per-session basis.

2.1.5.13.3 Example - Allocating resources for ETL

We want to allocate resources for ETL to ensure a good quality of service, but we also have management users who don’t like waiting. The configuration in this example allocates resources as follows:
• 1 worker for ETL work
• 3 workers for general queries
• All workers assigned to queries from management

Service / Worker | Worker #1 | Worker #2 | Worker #3 | Worker #4
-----------------+-----------+-----------+-----------+----------
ETL              | ✓         | ✗         | ✗         | ✗
Query service    | ✗         | ✓         | ✓         | ✓
Management       | ✓         | ✓         | ✓         | ✓

This configuration gives the ETL queue dedicated access to one worker (Worker #1), which can’t be used by regular queries. Queries from management will use any available worker.

2.1.5.13.3.1 Creating this configuration

The persistent configuration for this setup is listed in these four configuration files. Each worker gets a comma-separated list of service queues that it subscribes to. These services are specified in the initialSubscribedServices attribute.

Listing 13: Worker #1
{
    "compileFlags": {
    },
    "runtimeFlags": {
    },
    "runtimeGlobalFlags": {
        "initialSubscribedServices" : "etl,management"
    },
    "server": {
        "gpu": 0,
        "port": 5000,
        "cluster": "/home/rhendricks/raviga_database",
        "licensePath": "/home/sqream/.sqream/license.enc"
    }
}

Listing 14: Workers #2, #3, #4
{
    "compileFlags": {
    },
    "runtimeFlags": {
    },
    "runtimeGlobalFlags": {
        "initialSubscribedServices" : "query,management"
    },
    "server": {
        "gpu": 1,
        "port": 5001,
        "cluster": "/home/rhendricks/raviga_database",
        "licensePath": "/home/sqream/.sqream/license.enc"
    }
}

Tip: This configuration can be created temporarily (for the current session only) by using the SUBSCRIBE_SERVICE and UNSUBSCRIBE_SERVICE statements.
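Since the worker files differ only in a few values, they can be generated rather than edited by hand. The following Python sketch reproduces the two listings above with the standard json module. The function name worker_config is hypothetical, and only the values shown in the listings are used; the GPU and port for workers #3 and #4 are not specified here and would need to be set per worker.

```python
import json


def worker_config(gpu, port, services):
    """Build one worker configuration with a comma-separated service list."""
    return {
        "compileFlags": {},
        "runtimeFlags": {},
        "runtimeGlobalFlags": {
            "initialSubscribedServices": ",".join(services),
        },
        "server": {
            "gpu": gpu,
            "port": port,
            "cluster": "/home/rhendricks/raviga_database",
            "licensePath": "/home/sqream/.sqream/license.enc",
        },
    }


worker_1 = worker_config(0, 5000, ["etl", "management"])
worker_2 = worker_config(1, 5001, ["query", "management"])
print(json.dumps(worker_1, indent=4))
```

Building the service list from a Python list avoids stray spaces or trailing commas in the initialSubscribedServices value.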

2.1.5.13.3.2 Verifying the configuration

Use SHOW_SUBSCRIBED_INSTANCES to view service subscriptions for each worker. Use SHOW_SERVER_STATUS to see the statement queues.

t=> SELECT SHOW_SUBSCRIBED_INSTANCES();
service    | servernode | serverip      | serverport
-----------+------------+---------------+-----------
management | node_9383  | 192.168.0.111 | 5000
etl        | node_9383  | 192.168.0.111 | 5000
query      | node_9384  | 192.168.0.111 | 5001
management | node_9384  | 192.168.0.111 | 5001
query      | node_9385  | 192.168.0.111 | 5002
management | node_9385  | 192.168.0.111 | 5002
query      | node_9551  | 192.168.1.91  | 5000
management | node_9551  | 192.168.1.91  | 5000
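To confirm that the subscriptions match the intended allocation, the rows can be grouped by service. The sketch below does this in Python with the rows from the output above typed in as data; in practice they would come from a client library, and the function name workers_per_service is an assumption for illustration.

```python
from collections import defaultdict

# Rows copied from the SHOW_SUBSCRIBED_INSTANCES output above
ROWS = [
    ("management", "node_9383", "192.168.0.111", 5000),
    ("etl", "node_9383", "192.168.0.111", 5000),
    ("query", "node_9384", "192.168.0.111", 5001),
    ("management", "node_9384", "192.168.0.111", 5001),
    ("query", "node_9385", "192.168.0.111", 5002),
    ("management", "node_9385", "192.168.0.111", 5002),
    ("query", "node_9551", "192.168.1.91", 5000),
    ("management", "node_9551", "192.168.1.91", 5000),
]


def workers_per_service(rows):
    """Count the distinct (ip, port) workers subscribed to each service."""
    workers = defaultdict(set)
    for service, _node, ip, port in rows:
        workers[service].add((ip, port))
    return {service: len(members) for service, members in workers.items()}


print(workers_per_service(ROWS))
```

For this output the counts are one worker for etl, three for query, and four for management, which matches the allocation described for this example.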


2.1.5.13.4 Configuring a client to connect to a specific service

2.1.5.13.4.1 Using Sqream SQL CLI Reference

Add --service= to the command line.

$ sqream sql --port=3108 --clustered --username=mjordan --databasename=master --service=etl
Password:

Interactive client mode
To quit, use ^D or \q.

master=>_

2.1.5.13.4.2 Using JDBC

Add the service= parameter to the connection string.

Listing 15: JDBC connection string
jdbc:Sqream://127.0.0.1:3108/raviga;user=rhendricks;password=Tr0ub4dor&3;service=etl;cluster=true;ssl=false;

2.1.5.13.4.3 Using ODBC

On Linux, modify the DSN parameters in odbc.ini. For example, Service="etl":

Listing 16: odbc.ini
[sqreamdb]
Description=64-bit Sqream ODBC
Driver=/home/rhendricks/sqream_odbc64/sqream_odbc64.so
Server="127.0.0.1"
Port="3108"
Database="raviga"
Service="etl"
User="rhendricks"
Password="Tr0ub4dor&3"
Cluster=true
Ssl=false

On Windows, change the parameter in the DSN editing window.

2.1.5.13.4.4 Using Python (pysqream)

In Python, set the service parameter in the connection command:


Listing 17: Python
import pysqream

con = pysqream.connect(host='127.0.0.1', port=3108, database='raviga',
                       username='rhendricks', password='Tr0ub4dor&3',
                       clustered=True, use_ssl=False, service='etl')

2.1.5.13.4.5 Using Node.JS

Add the service to the connection settings:

Listing 18: Node.JS
const Connection = require('sqreamdb');
const config = {
  host: '127.0.0.1',
  port: 3108,
  username: 'rhendricks',
  password: 'Tr0ub4dor&3',
  connectDatabase: 'raviga',
  cluster: 'true',
  service: 'etl'
};

2.1.5.14 Python UDF (user-defined functions)

User-defined functions (UDFs) are a feature that extends SQream DB’s built-in SQL functionality. SQream DB’s Python UDFs allow developers to create new functionality in SQL by writing the lower-level language implementation in Python.

In this topic:

• A simple example
• Why use UDFs?
• SQream DB’s UDF support
  – Scalar functions
  – Python
  – Using modules
• Finding existing UDFs in the catalog
• Getting the DDL for a function
• Error handling
• Permissions and sharing
• Best practices


2.1.5.14.1 A simple example

Most databases have an UPPER function, including SQream DB. However, assume that this function is missing for the sake of this example. You can write a function in Python to uppercase a text value using the CREATE FUNCTION syntax.

CREATE FUNCTION my_upper (x1 text) RETURNS text AS $$ return x1.upper() $$ LANGUAGE PYTHON;

Let’s break down this example:
• CREATE FUNCTION my_upper - Create a function called my_upper. This name must be unique in the current database.
• (x1 text) - the function accepts one argument named x1 which is of the SQL type TEXT. All data types are supported.
• RETURNS text - the function returns the same type - TEXT. All data types are supported.
• AS $$ - what follows is some code that we don’t want to quote, so we use dollar-quoting ($$) instead of single quotes (').
• return x1.upper() - the Python function’s body is the argument named x1, uppercased.
• $$ LANGUAGE PYTHON - this is the end of the function, and it’s in the Python language.

Running this example

After creating the function, you can use it in any SQL query. For example:

master=> CREATE TABLE jabberwocky(line text);
executed
master=> INSERT INTO jabberwocky VALUES
.   ('''Twas brillig, and the slithy toves '), (' Did gyre and gimble in the wabe: ')
.   ,('All mimsy were the borogoves, '), (' And the mome raths outgrabe. ')
.   ,('"Beware the Jabberwock, my son! '), (' The jaws that bite, the claws that catch! ')
.   ,('Beware the Jubjub bird, and shun '), (' The frumious Bandersnatch!" ');
executed
master=> SELECT line, my_upper(line) FROM jabberwocky;
line                                       | my_upper
-------------------------------------------+-------------------------------------------
'Twas brillig, and the slithy toves        | 'TWAS BRILLIG, AND THE SLITHY TOVES
 Did gyre and gimble in the wabe:          |  DID GYRE AND GIMBLE IN THE WABE:
All mimsy were the borogoves,              | ALL MIMSY WERE THE BOROGOVES,
 And the mome raths outgrabe.              |  AND THE MOME RATHS OUTGRABE.
"Beware the Jabberwock, my son!            | "BEWARE THE JABBERWOCK, MY SON!
 The jaws that bite, the claws that catch! |  THE JAWS THAT BITE, THE CLAWS THAT CATCH!
Beware the Jubjub bird, and shun           | BEWARE THE JUBJUB BIRD, AND SHUN
 The frumious Bandersnatch!"               |  THE FRUMIOUS BANDERSNATCH!"

2.1.5.14.2 Why use UDFs?

• They allow simpler statements - You can create the function once, store it in the database, and call it any number of times in a statement.
• They can be shared - UDFs can be created by a database administrator, and then used by other roles.
• They can simplify downstream code - UDFs can be modified in SQream DB independently of program source code.

2.1.5.14.3 SQream DB’s UDF support

2.1.5.14.3.1 Scalar functions

SQream DB’s UDFs are scalar functions. This means that the UDF returns a single data value of the type defined in the RETURNS clause. For an inline scalar function, the returned scalar value is the result of a single statement.

2.1.5.14.3.2 Python

At this time, SQream DB’s UDFs are supported for Python. Python 3.6.7 is installed alongside SQream DB, for use exclusively by SQream DB. You may have a different version of Python installed on your server. To find which version of Python is installed for use by SQream DB, create and run this UDF:

master=> CREATE OR REPLACE FUNCTION py_version()
.   RETURNS text
.   AS $$
.   import sys
.   return ("Python version: " + sys.version + ". Path: " + sys.base_exec_prefix)
.   $$ LANGUAGE PYTHON;
executed
master=> SELECT py_version();
py_version
-------------------------------------------------------------------------------------------------------
Python version: 3.6.7 (default, Jul 22 2019, 11:03:54) [GCC 5.4.0]. Path: /opt/sqream/python-3.6.7-5.4.0

2.1.5.14.3.3 Using modules

To import a Python module, use the standard import syntax in the first lines of the user-defined function.
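As an illustration of where the import goes, here is a minimal sketch written as a plain Python function so that it can be tried locally first. The name my_hypot and its arguments are hypothetical; in a UDF, the two indented lines would form the body placed between the $$ markers. Only the standard library math module is used, since which other modules are available to SQream DB's Python is not covered here.

```python
def my_hypot(x1, y1):
    # In a UDF, the import sits on the first lines of the function body
    import math
    return math.sqrt(x1 ** 2 + y1 ** 2)


print(my_hypot(3.0, 4.0))  # 5.0
```

Trying the body locally before running CREATE FUNCTION makes it easier to catch errors that would otherwise abort the calling statement.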


2.1.5.14.4 Finding existing UDFs in the catalog

The user_defined_functions catalog view contains function information. Here’s how you’d list all UDFs in the system:

master=> SELECT * FROM sqream_catalog.user_defined_functions;
database_name | function_id | function_name
--------------+-------------+--------------
master        | 1           | my_upper

2.1.5.14.5 Getting the DDL for a function

master=> SELECT GET_FUNCTION_DDL('my_upper');
ddl
--------------------------------------------------
create function "my_upper" (x1 text) returns text as
$$
return x1.upper()
$$
language python volatile;

See GET_FUNCTION_DDL for more information.

2.1.5.14.6 Error handling

In UDFs, any error that occurs causes the execution of the function to stop. This in turn causes the statement that invoked the function to be canceled.

2.1.5.14.7 Permissions and sharing

To create a UDF, the creator needs the CREATE FUNCTION permission at the database level. For example, to grant CREATE FUNCTION to a non-superuser role:

GRANT CREATE FUNCTION ON DATABASE master TO mjordan;

To execute a UDF, the role needs the EXECUTE FUNCTION permission for every function. For example, to grant the permission to the r_bi_users role group, run:

GRANT EXECUTE ON FUNCTION my_upper TO r_bi_users;

Note: Functions are stored for each database, outside of any schema.

See more information about permissions in the Access control guide.


2.1.5.14.8 Best practices

Although user-defined functions add flexibility, they may have some performance drawbacks. They are not usually a replacement for subqueries or views.
In some cases, the user-defined function provides benefits like sharing extended functionality, which makes it very appealing.
Use user-defined functions sparingly in the WHERE clause. SQream DB can’t optimize the function’s usage, and it will be called once for every value. If possible, you should narrow down the number of results before the UDF is called by using a subquery.

2.1.5.15 Saved queries

Saved queries can be used to reuse a query plan for a query to eliminate compilation times for repeated queries. They also provide a way to implement ‘parameterized views’.

2.1.5.15.1 How saved queries work

Saved queries are compiled when they are created. When a saved query is run, this query plan is used instead of compiling a query plan at query time.

2.1.5.15.2 Parameters support

Query parameters can be used as substitutes for literal expressions in queries.
• Parameters cannot be used to substitute things like column names and table names.
• Query parameters of a string datatype (like VARCHAR) must be of a fixed length, and can be used in equality checks, but not patterns (e.g. LIKE, RLIKE, etc.)

2.1.5.15.3 Creating a saved query

A saved query is created using the SAVE_QUERY utility command.

2.1.5.15.3.1 Saving a simple query

t=> SELECT SAVE_QUERY('select_all','SELECT * FROM nba');
executed

2.1.5.15.3.2 Saving a parametrized query

Use ? placeholders as parameters, and supply their values later at execution time.

t=> SELECT SAVE_QUERY('select_by_weight_and_team','SELECT * FROM nba WHERE Weight > ? AND Team = ?');
executed


2.1.5.15.4 Listing and executing saved queries

Saved queries are saved as database objects. They can be listed in one of two ways:
Using the catalog:

t=> SELECT * FROM sqream_catalog.savedqueries;
name                      | num_parameters
--------------------------+---------------
select_all                | 0
select_by_weight          | 1
select_by_weight_and_team | 2

Using the LIST_SAVED_QUERIES utility function:

t=> SELECT LIST_SAVED_QUERIES();
saved_query
-------------------------
select_all
select_by_weight
select_by_weight_and_team

Executing a saved query requires calling it by its name in an EXECUTE_SAVED_QUERY statement. A saved query with no parameters is called without parameters.

t=> SELECT EXECUTE_SAVED_QUERY('select_all');
Name          | Team           | Number | Position | Age | Height | Weight | College           | Salary
--------------+----------------+--------+----------+-----+--------+--------+-------------------+---------
Avery Bradley | Boston Celtics | 0      | PG       | 25  | 6-2    | 180    | Texas             | 7730337
Jae Crowder   | Boston Celtics | 99     | SF       | 25  | 6-6    | 235    | Marquette         | 6796117
John Holland  | Boston Celtics | 30     | SG       | 27  | 6-5    | 205    | Boston University |
R.J. Hunter   | Boston Celtics | 28     | SG       | 22  | 6-5    | 185    | Georgia State     | 1148640
[...]

Executing a saved query with parameters requires specifying the parameters in the order they appear in the query:

t=> SELECT EXECUTE_SAVED_QUERY('select_by_weight_and_team', 240, 'Toronto Raptors');
Name              | Team            | Number | Position | Age | Height | Weight | College     | Salary
------------------+-----------------+--------+----------+-----+--------+--------+-------------+--------
Bismack Biyombo   | Toronto Raptors | 8      | C        | 23  | 6-9    | 245    |             | 2814000
James Johnson     | Toronto Raptors | 3      | PF       | 29  | 6-9    | 250    | Wake Forest | 2500000
Jason Thompson    | Toronto Raptors | 1      | PF       | 29  | 6-11   | 250    | Rider       | 245177
Jonas Valanciunas | Toronto Raptors | 17     | C        | 24  | 7-0    | 255    |             | 4660482
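Because parameters are positional, a client that assembles these calls needs to preserve their order. The following Python sketch builds the call text from a query name and its parameters. The helper execute_saved_query_sql is hypothetical, it handles only numbers, booleans and strings, and escaping single quotes by doubling them is an assumption based on the quoting used elsewhere in this guide.

```python
def execute_saved_query_sql(name, *params):
    """Build an EXECUTE_SAVED_QUERY call with parameters in the given order."""

    def literal(value):
        if isinstance(value, bool):
            return "true" if value else "false"
        if isinstance(value, (int, float)):
            return str(value)
        # Assumption: single quotes inside a string are escaped by doubling
        return "'" + str(value).replace("'", "''") + "'"

    parts = [literal(name)] + [literal(param) for param in params]
    return "SELECT EXECUTE_SAVED_QUERY(" + ", ".join(parts) + ");"


print(execute_saved_query_sql("select_by_weight_and_team", 240, "Toronto Raptors"))
```

The printed text matches the statement shown above, so the same helper can be reused for other saved queries by changing the name and the argument list.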

2.1.5.15.5 Dropping a saved query

When you’re done with a saved query, or would like to replace it with another, you can drop it with DROP_SAVED_QUERY:

t=> SELECT DROP_SAVED_QUERY('select_all');
executed
t=> SELECT DROP_SAVED_QUERY('select_by_weight_and_team');
executed
t=> SELECT LIST_SAVED_QUERIES();
saved_query
----------------
select_by_weight

2.1.5.16 Seeing system objects as DDL

2.1.5.16.1 Dump specific objects

2.1.5.16.1.1 Tables

See GET_DDL for more information.

Examples

2.1.5.16.1.2 Getting the DDL for a table

farm=> SELECT GET_DDL('cool_animals');
create table "public"."cool_animals" (
  "id" int not null,
  "name" varchar(30) not null,
  "weight" double null,
  "is_agressive" bool default false not null
) ;

2.1.5.16.1.3 Exporting table DDL to a file

COPY (SELECT GET_DDL('cool_animals')) TO '/home/rhendricks/animals.ddl';

2.1.5.16.1.4 Views

See GET_VIEW_DDL for more information.


Examples

2.1.5.16.1.5 Listing all views

farm=> SELECT view_name FROM sqream_catalog.views;
view_name
----------------------
angry_animals
only_agressive_animals

2.1.5.16.1.6 Getting the DDL for a view

farm=> SELECT GET_VIEW_DDL('angry_animals');
create view "public".angry_animals as
  select
    "cool_animals"."id" as "id",
    "cool_animals"."name" as "name",
    "cool_animals"."weight" as "weight",
    "cool_animals"."is_agressive" as "is_agressive"
  from
    "public".cool_animals as cool_animals
  where
    "cool_animals"."is_agressive" = false;

2.1.5.16.1.7 Exporting view DDL to a file

COPY (SELECT GET_VIEW_DDL('angry_animals')) TO '/home/rhendricks/angry_animals.sql';

2.1.5.16.1.8 User defined functions

See GET_FUNCTION_DDL for more information.

Examples

2.1.5.16.1.9 Listing all UDFs

master=> SELECT * FROM sqream_catalog.user_defined_functions;
database_name | function_id | function_name
--------------+-------------+--------------
master        | 1           | my_distance

2.1.5.16.1.10 Getting the DDL for a function

master=> SELECT GET_FUNCTION_DDL('my_distance');
create function "my_distance" (x1 float, y1 float, x2 float, y2 float) returns float as
$$
import math
if y1 < x1:
    return 0.0
else:
    return math.sqrt((y2 - y1) ** 2 + (x2 - x1) ** 2)
$$
language python volatile;

2.1.5.16.1.11 Exporting function DDL to a file

COPY (SELECT GET_FUNCTION_DDL('my_distance')) TO '/home/rhendricks/my_distance.sql';

2.1.5.16.1.12 Saved queries

See LIST_SAVED_QUERIES, SHOW_SAVED_QUERY for more information.

2.1.5.16.2 Dump entire database DDLs

Dumping the database DDL includes tables and views, but not UDFs and saved queries. See DUMP_DATABASE_DDL for more information.

Examples

2.1.5.16.2.1 Exporting database DDL to a client

farm=> SELECT DUMP_DATABASE_DDL();
create table "public"."cool_animals" (
  "id" int not null,
  "name" varchar(30) not null,
  "weight" double null,
  "is_agressive" bool default false not null
) ;
create view "public".angry_animals as
  select
    "cool_animals"."id" as "id",
    "cool_animals"."name" as "name",
    "cool_animals"."weight" as "weight",
    "cool_animals"."is_agressive" as "is_agressive"
  from
    "public".cool_animals as cool_animals
  where
    "cool_animals"."is_agressive" = false;

2.1.5.16.2.2 Exporting database DDL to a file

COPY (SELECT DUMP_DATABASE_DDL()) TO '/home/rhendricks/database.ddl';

Note: To export data in tables, see COPY TO.

2.1.6 Architecture

This topic includes guides that walk an end-user, database administrator, or system architect through the main ideas behind SQream DB. While SQream DB has many similarities to other database management systems, it has some unique and additional capabilities. Explore the guides below for information about SQream DB’s architecture.

2.1.6.1 Internals and architecture

2.1.6.1.1 SQream DB internals

Here is a high level architecture diagram of SQream DB’s internals.


2.1.6.1.1.1 Statement compiler

The statement compiler is written in Haskell. This takes SQL text and produces an optimised statement plan.

2.1.6.1.1.2 Concurrency and concurrency control

The execution engine in SQream DB is built around thread workers with message passing. It uses threads to overlap different kinds of operations (including IO and GPU operations with CPU operations), and to accelerate CPU intensive operations.

2.1.6.1.1.3 Transactions

SQream DB has serializable transactions, with these features:
• Serializable, with any kind of statement
• Run multiple SELECT queries concurrently with anything
• Run multiple inserts to the same table at the same time
• Cannot run multiple statements in a single transaction
• Other operations such as DELETE, TRUNCATE, and DDL use coarse-grained exclusive locking.

2.1.6.1.1.4 Storage

The storage is split into the metadata layer and an append-only/ garbage collected bulk data layer.

2.1.6.1.1.5 Metadata layer

The metadata layer uses LevelDB, and uses LevelDB’s snapshot and write atomic features as part of the transaction system. The metadata layer, together with the append-only bulk data layer, helps ensure consistency.

2.1.6.1.1.6 Bulk data layer

The bulk data layer is composed of extents, which are optimised for IO performance as much as possible. Inside the extents are chunks, which are optimised for processing in the CPU and GPU. Compression is used in the extents and chunks. When you run small inserts, you will get less optimised chunks and extents, but the system is designed both to still run efficiently on these and to reorganise them transactionally in the background, without blocking DML operations. By writing small chunks in small inserts and reorganising them later, SQream DB supports both fast medium-sized insert transactions and fast querying.

2.1.6.1.1.7 Building blocks

The heavy lifting in SQream DB is done by single purpose C++/CUDA building blocks. These are purposely designed to not be smart - they have to be instructed exactly what to do. Most of the intelligence in piecing things together is in the statement compiler.

2.1.6.1.2 Columnar

Like many other analytical database management systems, SQream DB uses a column store for tables. Column stores offer better I/O and performance with analytic workloads. Columns also compress much better, and lend themselves well to bulk data.

2.1.6.1.3 GPU usage

SQream DB uses GPUs for accelerating database operations. This acceleration brings additional benefit to columnar data processing. SQream DB’s GPU acceleration is integral to database operations. It is not an additional feature, but rather core to most data operations, e.g. GROUP BY, scalar functions, JOIN, ORDER BY, and more. Using a GPU is an extended form of SIMD (single instruction, multiple data) intended for high throughput operations. When GPU acceleration is used, SQream DB uses special building blocks to take advantage of the high degree of parallelism of the GPU. This means that GPU operations use a single instruction that runs on multiple values.

2.1.6.2 Filesystem and usage

SQream DB writes and reads data from disk. The SQream DB storage directory, sometimes referred to as a storage cluster, is a collection of database objects, the metadata database, and logs. Each SQream DB worker and the metadata server must have access to the storage cluster in order to function properly.

2.1.6.2.1 Directory organization

The cluster root is the directory in which all data for SQream DB is stored.

SQream DB storage cluster directories

• databases
• metadata or leveldb
• temp
• logs

2.1.6.2.1.1 databases

The databases directory houses all of the actual data in tables and columns. Each database is stored as its own directory. Each table is stored under its respective database, and columns are stored in their respective table. In the example above, the database named retail contains a table directory with a directory named 23.

Tip: To find table IDs, use a catalog query:

master=> SELECT table_name, table_id FROM sqream_catalog.tables WHERE table_name = 'customers';
table_name | table_id
-----------+---------
customers  | 23

Each table directory contains a directory for each physical column. An SQL column may be built up of several physical columns (e.g. if the data type is nullable).

Tip: To find column IDs, use a catalog query:

master=> SELECT column_id, column_name FROM sqream_catalog.columns WHERE table_id=23;
column_id | column_name
----------+------------
0         | name@null
1         | name@val
2         | age@null
3         | age@val
4         | email@null
5         | email@val
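The split of a nullable SQL column into a @null and a @val physical column can be pictured with a short Python sketch. This is an illustrative model only: the function name split_nullable and the placeholder stored for NULL rows are assumptions, not SQream DB's actual on-disk format.

```python
from typing import List, Optional, Tuple

def split_nullable(values: List[Optional[int]], placeholder: int = 0) -> Tuple[List[bool], List[int]]:
    """Model one nullable SQL column as two physical columns.

    Returns (null_flags, vals): null_flags[i] is True where row i is NULL,
    and vals[i] holds the value (or a placeholder for NULL rows).
    """
    null_flags = [v is None for v in values]
    vals = [placeholder if v is None else v for v in values]
    return null_flags, vals

# Hypothetical "age" column with one NULL row
age_null, age_val = split_nullable([34, None, 27])
print(age_null)  # [False, True, False]
print(age_val)   # [34, 0, 27]
```

Keeping the NULL flags apart from the values means each physical column holds a single, uniform data type.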

Each column directory will contain extents, which are collections of chunks.

2.1.6.2.1.2 metadata or leveldb

SQream DB’s metadata is an embedded key-value store, based on LevelDB. LevelDB helps SQream DB ensure efficient storage for keys, handle atomic writes, snapshots, durability, and automatic recovery. The metadata is where all database objects are stored, including roles, permissions, database and table structures, chunk mappings, and more.

2.1.6.2.1.3 temp

The temp directory is where SQream DB writes temporary data. This location can be changed to any other directory on the filesystem by setting the tempPath setting in the configuration file. SQream recommends remapping it to fast local storage, such as an SSD or NVMe drive in a mirrored RAID 1 configuration, to get better performance when executing intensive larger-than-RAM operations like sorting.

2.1.6.2.1.4 logs

The logs directory contains logs produced by SQream DB. See more about the logs in the Logging guide.

CHAPTER 3

Reference

This section provides reference for using SQream DB’s interfaces and SQL features.

3.1 Data Types

This topic describes the data types that SQream DB supports, and how to convert between them.

In this topic:

• Supported Types
• Converting and Casting Types
• Data Type Reference
  – Numeric (NUMERIC, DECIMAL)
  – Boolean (BOOL)
  – Integers (TINYINT, SMALLINT, INT, BIGINT)
  – Floating Point (REAL, DOUBLE)
  – String (TEXT, VARCHAR)
  – Date (DATE, DATETIME)

3.1.1 Supported Types

The following table shows the supported data types.

Name                     | Description                                                      | Data Size (Not Null, Uncompressed) | Example
-------------------------+------------------------------------------------------------------+------------------------------------+------------------------------------------
BOOL                     | Boolean values (true, false)                                     | 1 byte                             | true
TINYINT                  | Unsigned integer (0 - 255)                                       | 1 byte                             | 5
SMALLINT                 | Integer (-32,768 - 32,767)                                       | 2 bytes                            | -155
INT                      | Integer (-2,147,483,648 - 2,147,483,647)                         | 4 bytes                            | 1648813
BIGINT                   | Integer (-9,223,372,036,854,775,808 - 9,223,372,036,854,775,807) | 8 bytes                            | 36124441255243
REAL                     | Floating point (inexact)                                         | 4 bytes                            | 3.141
DOUBLE (FLOAT)           | Floating point (inexact)                                         | 8 bytes                            | 0.000003
TEXT [(n)], NVARCHAR (n) | Variable length string - UTF-8 unicode                           | Up to 4*n bytes                    | 'キウイは楽しい鳥です'
NUMERIC, DECIMAL         | 38 digits                                                        | 16 bytes                           | 0.123245678901234567890123456789012345678
VARCHAR (n)              | Variable length string - ASCII only                              | n bytes                            | 'Kiwis have tiny wings, but cannot fly.'
DATE                     | Date                                                             | 4 bytes                            | '1955-11-05'
DATETIME (TIMESTAMP)     | Date and time pairing in UTC                                     | 8 bytes                            | '1955-11-05 01:24:00.000'

Note: SQream DB compresses all columns and types. The data size noted is the maximum data size allocation for uncompressed data.
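The sizes in the table can be combined into a rough upper bound for one uncompressed row. The following Python sketch is only a back-of-the-envelope helper: the function name max_row_bytes and the sample table are hypothetical, and the estimate ignores NULL flags and compression, so real storage will usually be smaller.

```python
# Uncompressed, NOT NULL sizes in bytes, taken from the table above.
FIXED_SIZES = {
    "BOOL": 1, "TINYINT": 1, "SMALLINT": 2, "INT": 4, "BIGINT": 8,
    "REAL": 4, "DOUBLE": 8, "NUMERIC": 16, "DATE": 4, "DATETIME": 8,
}

def max_row_bytes(columns):
    """Upper bound on uncompressed bytes per row.

    `columns` is a list of (type_name, n) pairs; n is only used for
    VARCHAR(n) (n bytes) and TEXT(n) (up to 4*n bytes).
    """
    total = 0
    for type_name, n in columns:
        if type_name == "VARCHAR":
            total += n
        elif type_name == "TEXT":
            total += 4 * n
        else:
            total += FIXED_SIZES[type_name]
    return total

# Hypothetical table: (id BIGINT, name TEXT(20), signup DATE, active BOOL)
print(max_row_bytes([("BIGINT", None), ("TEXT", 20), ("DATE", None), ("BOOL", None)]))  # 93
```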

3.1.2 Converting and Casting Types

SQream DB supports explicit and implicit casting and type conversion. The system may automatically add implicit casts when combining different data types in the same expression. In many cases the details are not important, but they can affect the results of a query. When necessary, an explicit cast can be used to override the automatic cast added by SQream DB. For example, the ANSI standard defines a SUM() aggregation over an INT column as an INT. However, when dealing with large amounts of data this could cause an overflow. You can rectify this by casting the value to a larger data type:

SUM(some_int_column :: BIGINT)
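To see why the wider type matters, the ranges involved can be checked in a few lines of Python. This is only an arithmetic illustration with a made-up column; it does not call SQream DB, and Python integers never overflow, so the sketch compares the true sum against the INT and BIGINT ranges instead.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1
BIGINT_MIN, BIGINT_MAX = -2**63, 2**63 - 1

def fits(value, lo, hi):
    """Return True if value lies inside the inclusive range [lo, hi]."""
    return lo <= value <= hi

# Hypothetical column: three rows, each of which fits in INT on its own
some_int_column = [2_000_000_000, 2_000_000_000, 2_000_000_000]
total = sum(some_int_column)

print(fits(total, INT_MIN, INT_MAX))        # False: the sum exceeds the INT range
print(fits(total, BIGINT_MIN, BIGINT_MAX))  # True: the sum fits in BIGINT
```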

SQream DB supports the following three data conversion types:

• CAST(value AS type), to convert a value from one type to another. For example, CAST('1997-01-01' AS DATE), CAST(3.45 AS SMALLINT), CAST(some_column AS VARCHAR(30)).
• value :: type, a shorthand for the CAST syntax. For example, '1997-01-01' :: DATE, 3.45 :: SMALLINT, (3+5) :: BIGINT.
• See the SQL functions reference for additional functions that convert from a specific value which is not an SQL type, such as FROM_UNIXTS, FROM_UNIXTSMS, etc.

The Data Type Reference section below provides more details about the supported casts for each type.

3.1.3 Data Type Reference

3.1.3.1 Numeric (NUMERIC, DECIMAL)

The Numeric data type (also known as Decimal) is recommended for values that tend to occur as exact decimals, such as in finance. While Numeric has a fixed precision of 38, higher than REAL (9) or DOUBLE (17), it runs calculations more slowly. For operations that require faster performance, using Floating Point is recommended. The correct syntax for Numeric is numeric(p, s), where p is the total number of digits (38 maximum), and s is the number of digits after the decimal point.

3.1.3.1.1 Numeric Examples

The following is an example of the Numeric syntax:

create or replace table t(x numeric(20, 10), y numeric(38, 38));
insert into t values(1234567890.1234567890, 0.123245678901234567890123456789012345678);
select x + y from t;

The following table shows information relevant to the Numeric data type:

Description | Data Size (Not Null, Uncompressed) | Example
------------+------------------------------------+------------------------------------------
38 digits   | 16 bytes                           | 0.123245678901234567890123456789012345678

Numeric supports the following operations:

• All join types.
• All aggregation types (not including Window functions).
• Scalar functions (not including some trigonometric and logarithmic functions).
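The difference between exact decimals and floating point can be reproduced outside the database. The sketch below uses Python's standard decimal module as a stand-in for an exact decimal type with 38 digits of precision; it is an analogy and not SQream DB's Numeric implementation.

```python
from decimal import Decimal, getcontext

getcontext().prec = 38  # mirror Numeric's 38 digits of precision

exact = Decimal("0.1") + Decimal("0.2")  # exact decimal arithmetic
inexact = 0.1 + 0.2                      # binary floating point (like DOUBLE)

print(exact == Decimal("0.3"))  # True
print(inexact == 0.3)           # False
```

This is why exact decimal types are usually preferred for monetary amounts, at the cost of slower arithmetic.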

3.1.3.2 Boolean (BOOL)

The following table describes the Boolean data type.

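Values                       | Syntax                                                                   | Data Size (Not Null, Uncompressed)
-----------------------------+--------------------------------------------------------------------------+------------------------------------------------------------------------
true, false (case sensitive) | When loading from CSV, BOOL columns can accept 0 as false and 1 as true. | 1 byte, but resulting average data sizes may be lower after compression.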

3.1.3.2.1 Boolean Examples

The following is an example of the Boolean syntax:

CREATE TABLE animals (name TEXT, is_angry BOOL);

INSERT INTO animals VALUES ('fox',true), ('cat',true), ('kiwi',false);

SELECT name, CASE WHEN is_angry THEN 'Is really angry!' else 'Is not angry' END FROM animals;

The following is an example of the correct output:

"fox","Is really angry!"
"cat","Is really angry!"
"kiwi","Is not angry"

3.1.3.2.2 Boolean Casts and Conversions

The following table shows the possible Boolean value conversions:

Type                           | Details
-------------------------------+---------------------------
TINYINT, SMALLINT, INT, BIGINT | true → 1, false → 0
REAL, DOUBLE                   | true → 1.0, false → 0.0

3.1.3.3 Integers (TINYINT, SMALLINT, INT, BIGINT)

Integer data types are designed to store whole numbers. For more information about identity sequences (sometimes called auto-increment or auto-numbers), see Identity.

3.1.3.3.1 Integer Types

The following table describes the Integer types.

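Name     | Details                                                          | Data Size (Not Null, Uncompressed) | Example
---------+------------------------------------------------------------------+------------------------------------+---------------
TINYINT  | Unsigned integer (0 - 255)                                       | 1 byte                             | 5
SMALLINT | Integer (-32,768 - 32,767)                                       | 2 bytes                            | -155
INT      | Integer (-2,147,483,648 - 2,147,483,647)                         | 4 bytes                            | 1648813
BIGINT   | Integer (-9,223,372,036,854,775,808 - 9,223,372,036,854,775,807) | 8 bytes                            | 36124441255243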

The following table describes the Integer data type.

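Syntax                                                             | Data Size (Not Null, Uncompressed)
-------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------
An integer can be entered as a regular literal, such as 12, -365. | Integer types range between 1, 2, 4, and 8 bytes - but resulting average data sizes could be lower after compression.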
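When choosing a column type, the ranges above can be checked mechanically. The helper below is a hypothetical Python sketch (smallest_integer_type is not a SQream DB function) that returns the narrowest listed type able to hold a given value.

```python
# Inclusive ranges from the Integer types table, narrowest first.
INTEGER_RANGES = [
    ("TINYINT", 0, 255),
    ("SMALLINT", -32_768, 32_767),
    ("INT", -2_147_483_648, 2_147_483_647),
    ("BIGINT", -9_223_372_036_854_775_808, 9_223_372_036_854_775_807),
]

def smallest_integer_type(value):
    """Return the name of the narrowest integer type that can hold value, or None."""
    for name, lo, hi in INTEGER_RANGES:
        if lo <= value <= hi:
            return name
    return None

print(smallest_integer_type(5))            # TINYINT
print(smallest_integer_type(-155))         # SMALLINT
print(smallest_integer_type(1648813))      # INT
print(smallest_integer_type(45000000000))  # BIGINT
```

Note that TINYINT is unsigned, so a small negative value such as -5 already needs SMALLINT.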

3.1.3.3.2 Integer Examples

The following is an example of the Integer syntax:

CREATE TABLE cool_numbers (a INT NOT NULL, b TINYINT, c SMALLINT, d BIGINT);

INSERT INTO cool_numbers VALUES (1,2,3,4), (-5, 127, 32000, 45000000000);

SELECT * FROM cool_numbers;

The following is an example of the correct output:

1,2,3,4
-5,127,32000,45000000000

3.1.3.3.3 Integer Casts and Conversions

The following table shows the possible Integer value conversions:

Type                                                          | Details
--------------------------------------------------------------+--------------------------
REAL, DOUBLE                                                  | 1 → 1.0, -32 → -32.0
VARCHAR(n) (All numeric values must fit in the string length) | 1 → '1', 2451 → '2451'

3.1.3.4 Floating Point (REAL, DOUBLE)

The Floating Point data types (REAL and DOUBLE) store extremely close value approximations, and are therefore recommended for values that tend to be inexact, such as Scientific Notation. While Floating Point generally runs faster than Numeric, it has a lower precision of 9 (REAL) or 17 (DOUBLE) compared to Numeric’s 38. For operations that require a higher level of precision, using Numeric is recommended. The floating point representation is based on IEEE 754.

3.1.3.4.1 Floating Point Types

The following table describes the Floating Point data types.

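Name   | Details                                   | Data Size (Not Null, Uncompressed) | Example
-------+-------------------------------------------+------------------------------------+---------
REAL   | Single precision floating point (inexact) | 4 bytes                            | 3.141
DOUBLE | Double precision floating point (inexact) | 8 bytes                            | 0.000003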

The following table shows information relevant to the Floating Point data types.

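Aliases                        | Syntax                                                                                                                                                                                           | Data Size (Not Null, Uncompressed)
-------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------
DOUBLE is also known as FLOAT. | A double precision floating point can be entered as a regular literal, such as 3.14, 2.718, .34, or 2.71e-45. To enter a REAL floating point number, cast the value. For example, (3.14 :: REAL). | Floating point types are either 4 or 8 bytes, but size could be lower after compression.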

3.1.3.4.2 Floating Point Examples

CREATE TABLE cool_numbers (a REAL NOT NULL, b DOUBLE);

INSERT INTO cool_numbers VALUES (1,2), (3.14159265358979, 2.718281828459);

SELECT * FROM cool_numbers;

1.0,2.0
3.1415927,2.718281828459

Note: Most SQL clients control display precision of floating point numbers, and values may appear differently in some clients.
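The loss of digits in the REAL column above can be reproduced with any IEEE 754 single precision value. The Python sketch below round-trips a double through 4 bytes using the standard struct module; the helper name to_real is hypothetical and the code only imitates what storing a value in a 4-byte REAL implies.

```python
import struct

def to_real(value):
    """Round a Python float (double precision) to single precision and back."""
    return struct.unpack("f", struct.pack("f", value))[0]

stored = to_real(3.14159265358979)

print(f"{stored:.7f}")             # 3.1415927
print(stored == 3.14159265358979)  # False: digits beyond single precision are lost
```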

3.1.3.4.3 Floating Point Casts and Conversions

The following table shows the possible Floating Point value conversions:

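Type                              | Details
----------------------------------+-------------------------------------------------------------------------------
BOOL                              | 1.0 → true, 0.0 → false
TINYINT, SMALLINT, INT, BIGINT    | 2.0 → 2, 3.14159265358979 → 3, 2.718281828459 → 2, 0.5 → 0, 1.5 → 1
VARCHAR(n) (n > 6 recommended)    | 1 → '1.0000', 3.14159265358979 → '3.1416'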

Note: As shown in the above examples, casting a floating point value to an integer rounds the value down.
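The conversions listed in the table can be mimicked in Python with math.floor. This is a sketch of the documented examples only: the helper name is hypothetical, and since the table shows no negative inputs, how SQream DB treats them is not covered here.

```python
import math

def cast_float_to_int(value):
    """Imitate the documented cast by rounding down (toward negative infinity)."""
    return math.floor(value)

for v in (2.0, 3.14159265358979, 2.718281828459, 0.5, 1.5):
    print(v, "->", cast_float_to_int(v))
```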

3.1.3.5 String (TEXT, VARCHAR)

TEXT and VARCHAR are types designed for storing text or strings of characters. SQream DB separates ASCII (VARCHAR) and UTF-8 representations (TEXT).

Note: The data type NVARCHAR has been deprecated in favor of TEXT as of version 2020.1.

3.1.3.5.1 String Types

The following table describes the String types.

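Name                     | Details                                                                         | Data Size (Not Null, Uncompressed) | Example
-------------------------+---------------------------------------------------------------------------------+------------------------------------+-----------------------------------------
TEXT [(n)], NVARCHAR (n) | Variable length string - UTF-8 unicode. NVARCHAR is synonymous with TEXT.       | Up to 4*n bytes                    | 'キウイは楽しい鳥です'
VARCHAR (n)              | Variable length string - ASCII only                                             | n bytes                            | 'Kiwis have tiny wings, but cannot fly.'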

3.1.3.5.2 Length

When using TEXT, specifying a size is optional. If not specified, the text field carries no constraints. To limit the size of the input, use VARCHAR(n) or TEXT(n), where n is the permitted number of characters. The following apply to setting the String type length:

• If the data exceeds the column length limit on INSERT or COPY operations, SQream DB will return an error.
• When casting or converting, the string has to fit in the target. For example, 'Kiwis are weird birds' :: VARCHAR(5) will return an error. Use SUBSTRING to truncate the length of the string.
• VARCHAR strings are padded with spaces.

3.1.3.5.3 Syntax

String types can be written with standard SQL string literals, which are enclosed in single quotes, such as 'Kiwi bird'. To include a single quote in the string, write two consecutive single quotes, such as 'Kiwi bird''s wings are tiny'. String literals can also be dollar-quoted with the dollar sign $. For example, $$Kiwi bird's wings are tiny$$ is the same as 'Kiwi bird''s wings are tiny'.

3.1.3.5.4 Size

VARCHAR(n) can occupy up to n bytes, whereas TEXT(n) can occupy up to 4*n bytes. However, the size of strings is variable and is compressed by SQream DB.
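The 4*n bound for TEXT(n) follows from UTF-8 itself, where one character takes between 1 and 4 bytes. A short Python check illustrates this; the helper name utf8_size and the sample characters are chosen for illustration only.

```python
def utf8_size(text):
    """Number of bytes needed to store text in UTF-8."""
    return len(text.encode("utf-8"))

# ASCII text uses one byte per character, like VARCHAR
ascii_text = "Kiwis"
print(len(ascii_text), utf8_size(ascii_text))  # 5 5

# Non-ASCII characters take 2 to 4 bytes each
samples = ["\u00e9", "\u9ce5", "\U0001F95D"]  # e-acute, a CJK character, an emoji
print([utf8_size(ch) for ch in samples])      # [2, 3, 4]
```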

3.1.3.5.5 String Examples

The following is an example of the String syntax:

CREATE TABLE cool_strings (a TEXT NOT NULL, b TEXT);

INSERT INTO cool_strings VALUES ('hello world', 'Hello to kiwi birds specifically');

INSERT INTO cool_strings VALUES ('This is ASCII only', 'But this column can contain 中文文字');

SELECT * FROM cool_strings;

The following is an example of the correct output:

hello world ,Hello to kiwi birds specifically
This is ASCII only,But this column can contain 中文文字

Note: Most clients control the display precision of floating point numbers, and values may appear differently in some clients.

3.1.3.5.6 String Casts and Conversions

The following table shows the possible String value conversions:

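Type                           | Details
-------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------
BOOL                           | 'true' → true, 'false' → false
TINYINT, SMALLINT, INT, BIGINT | '2' → 2, '-128' → -128
REAL, DOUBLE                   | '2.0' → 2.0, '3.141592' → 3.141592
DATE, DATETIME                 | Requires a supported format, such as '1955-11-05' → date '1955-11-05', '1955-11-05 01:24:00.000' → '1955-11-05 01:24:00.000'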

3.1.3.6 Date (DATE, DATETIME)

DATE is a type designed for storing year, month, and day. DATETIME is a type designed for storing year, month, day, hour, minute, seconds, and milliseconds in UTC with 1 millisecond precision.

3.1.3.6.1 Date Types

The following table describes the Date types.

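Table 1: Date types

Name     | Details                      | Data Size (Not Null, Uncompressed) | Example
---------+------------------------------+------------------------------------+--------------------------
DATE     | Date                         | 4 bytes                            | '1955-11-05'
DATETIME | Date and time pairing in UTC | 8 bytes                            | '1955-11-05 01:24:00.000'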

3.1.3.6.2 Aliases

DATETIME is also known as TIMESTAMP.

3.1.3.6.3 Syntax

DATE values are formatted as string literals. For example, '1955-11-05' or date '1955-11-05'. DATETIME values are formatted as string literals conforming to ISO 8601, for example '1955-11-05 01:26:00'. SQream DB attempts to guess if the string literal is a date or datetime based on context, for example when used in date-specific functions.

3.1.3.6.4 Size

A DATE column is 4 bytes in length, while a DATETIME column is 8 bytes in length. However, the size of these values is compressed by SQream DB.

3.1.3.6.5 Date Examples

The following is an example of the Date syntax:

CREATE TABLE important_dates (a DATE, b DATETIME);

INSERT INTO important_dates VALUES ('1997-01-01', '1955-11-05 01:24');

SELECT * FROM important_dates;

The following is an example of the correct output:

1997-01-01,1955-11-05 01:24:00.0

The following is an example of the Datetime syntax:

SELECT a :: DATETIME, b :: DATE FROM important_dates;

The following is an example of the correct output:

1997-01-01 00:00:00.0,1955-11-05

Warning: Some client applications may alter the DATETIME value by modifying the timezone.

3.1.3.6.6 Date Casts and Conversions

The following table shows the possible DATE and DATETIME value conversions:

Type       | Details
-----------+---------------------------------------------------------------------------------
VARCHAR(n) | '1997-01-01' → '1997-01-01', '1955-11-05 01:24' → '1955-11-05 01:24:00.000'
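The string forms in the table match the usual ISO-style rendering with millisecond precision. The Python sketch below reproduces the same text with the standard datetime module; it illustrates the format only and makes no claim about how SQream DB formats values internally.

```python
from datetime import date, datetime

d = date(1997, 1, 1)
# A DATETIME literal given without seconds, as in the table above
dt = datetime.strptime("1955-11-05 01:24", "%Y-%m-%d %H:%M")

print(d.isoformat())                                   # 1997-01-01
print(dt.isoformat(sep=" ", timespec="milliseconds"))  # 1955-11-05 01:24:00.000
```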

3.2 SQL Statements and Syntax

This section provides reference for using SQream DB’s SQL statements - DDL commands, DML commands and SQL query syntax.

3.2.1 SQL Syntax Features

SQream DB supports SQL from the ANSI 92 syntax.

3.2.1.1 Keywords and Identifiers

SQL statements are made up of the following components:

• Keywords: Keywords have specific meaning in SQL, like SELECT, CREATE, WHERE, etc.
• Identifiers: Names for things like tables, columns, databases, etc.

3.2.1.1.1 Keywords

Keywords are special words that make up SQream DB’s SQL syntax, and have a meaning in a statement. Keywords are not allowed as identifiers. A full list of reserved keywords can be found in the Reserved keywords section below.

3.2.1.1.2 Identifiers

Identifiers are typically used as database object names, such as databases, tables, views or columns. Because of their use, they are often also called “names”. Identifiers can also be used to change a column name in the result (column alias) in a SELECT statement.

3.2.1.1.2.1 Identifier rules

An identifier can be either quoted or unquoted, and can be at most 128 characters long. Regular identifiers follow these rules:

1. Does not contain any special characters, apart from an underscore (_)
2. Case insensitive. SQream DB converts all identifiers to lowercase unless quoted.
3. Does not equal any keyword, like SELECT, OR, AND, etc.

To bypass these limitations, use a quoted identifier by surrounding the identifier with double quotes ("). Quoted identifiers follow these rules:

• Surrounded with double quotes (").
• May contain any ASCII character, except @, $ or ".
• Quoted identifiers are stored case-sensitive and must be referenced with double quotes.
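The identifier rules above can be summarised as a small validation routine. The Python sketch below is a model of the rules as written, not SQream DB's parser: the function name, the tiny sample keyword set, and the assumption that regular identifiers consist only of letters, digits and underscores are all illustrative.

```python
import re

MAX_LENGTH = 128
REGULAR = re.compile(r"[A-Za-z0-9_]+")  # assumption: letters, digits, underscore only
SAMPLE_KEYWORDS = frozenset({"select", "or", "and"})  # small subset for illustration

def normalize_identifier(raw, keywords=SAMPLE_KEYWORDS):
    """Return the name as it would be stored, or raise ValueError."""
    if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
        name = raw[1:-1]
        if not name or not name.isascii() or any(ch in '@$"' for ch in name):
            raise ValueError("invalid quoted identifier")
        result = name  # quoted identifiers keep their case
    else:
        if not REGULAR.fullmatch(raw):
            raise ValueError("special characters require a quoted identifier")
        if raw.lower() in keywords:
            raise ValueError("keywords require a quoted identifier")
        result = raw.lower()  # unquoted identifiers are folded to lowercase
    if len(result) > MAX_LENGTH:
        raise ValueError("identifiers are limited to 128 characters")
    return result

print(normalize_identifier("Customers"))     # customers
print(normalize_identifier('"First name"'))  # First name
```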

3.2.1.1.3 Reserved keywords

Keyword

ALL ANALYSE ANALYZE AND ANY ARRAY AS ASC AUTHORIZATION
BINARY BOTH
CASE CAST CHECK COLLATE COLUMN CONCURRENTLY CONSTRAINT CREATE CROSS CURRENT_CATALOG CURRENT_ROLE CURRENT_TIME CURRENT_USER
DEFAULT DEFERRABLE DESC DISTINCT DO
ELSE END EXCEPT
FALSE FETCH FOR FREEZE FROM FULL
GRANT GROUP
HASH HAVING
ILIKE IN INITIALLY INNER INTERSECT INTO IS ISNULL
JOIN
LEADING LEFT LIKE LIMIT LOCALTIME LOCALTIMESTAMP LOOP
MERGE
NATURAL NOT NOTNULL NULL
OFFSET ON ONLY OPTION OR ORDER OUTER OVER OVERLAPS
PLACING PRIMARY
REFERENCES RETURNING RIGHT RLIKE
SELECT SESSION_USER SIMILAR SOME SYMMETRIC
TABLE THEN TO TRAILING TRUE
UNION UNIQUE USER USING
VARIADIC VERBOSE
WHEN WHERE WINDOW WITH

3.2.1.2 Literals

Literals represent constant values. SQream DB contains the following types of literals:

• Numeric literals - define numbers such as 1.3, -5
• String literals - define text values like 'Foxes are cool', '1997-01-01'
• Typed literals - define values with explicit types like (3.0 :: float)
• Boolean literals - define values that include true and false
• Other constants - predefined values like NULL or TRUE

3.2.1.2.1 Numeric Literals

Numeric literals can be expressed as follows:

number_literal ::=
    [+-] digits
    | digits . [ digits ] [ e [+-] digits ]
    | [ digits ] . digits [ e [+-] digits ]
    | digits e [+-] digits

3.2.1.2.1.1 Examples

1234

1234.56

12.

.34

123.56e-45

0.23

3.141

42

Note: The actual data type of the value changes based on context, the format used, and the value itself. For example, any number containing a decimal point is considered FLOAT by default. Any whole number is considered INT, unless the value is larger than the maximum INT value, in which case the type becomes BIGINT.

Note: A numeric literal that contains neither a decimal point nor an exponent is considered INT by default if its value fits in type INT (32 bits). If not, it is considered BIGINT by default if its value fits in type BIGINT (64 bits). If neither are true, it is considered FLOAT. Literals that contain decimal points and/or exponents are always considered FLOAT.
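The default typing rules in the notes above can be written down as a short decision procedure. The Python sketch below follows the rules as stated for non-negative literals; the function name literal_type is hypothetical and the code is a reading aid, not SQream DB's parser.

```python
INT_MAX = 2**31 - 1     # largest value of type INT (32 bits)
BIGINT_MAX = 2**63 - 1  # largest value of type BIGINT (64 bits)

def literal_type(text):
    """Infer the default type of a non-negative numeric literal."""
    if "." in text or "e" in text.lower():
        return "FLOAT"  # decimal point and/or exponent
    value = int(text)
    if value <= INT_MAX:
        return "INT"
    if value <= BIGINT_MAX:
        return "BIGINT"
    return "FLOAT"      # too large for any integer type

for lit in ("42", "3000000000", "99999999999999999999", "12.", "123.56e-45"):
    print(lit, literal_type(lit))
```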

3.2.1.2.2 String Literals

String literals are string (text) values, encoded either in ASCII or UTF-8. String literals are surrounded by single quotes (') or double dollar signs ($$).

Tip: To use a single quote in a string, use a repeated single quote.

3.2.1.2.2.1 Examples

'This is an example of a string'

'Hello? Is it me you''re looking for?' -- Repeated single quotes are treated as a single quote

$$That is my brother's company's CEO's son's dog's toy$$ -- Dollar-quoted

'1997-01-01' -- This is a string

The actual data type of the value changes based on context, the format used, and the value itself. In the example below, the first value is interpreted as a DATE, while the second is interpreted as a VARCHAR.

INSERT INTO cool_dates(date_col, reason) VALUES ('1955-11-05', 'Doc Brown discovers flux capacitor');

This section describes the following types of literals:

• Regular string literals
• Dollar-quoted string literals
• Escaped string literals

3.2.1.2.2.2 Regular String Literals

In SQL, a regular string literal is a sequence of zero or more characters bound by single quotes ('):

'This is a string'.

You can include a single-quote character in a string literal with two consecutive single quotes (''):

'Dianne''s horse'.

Note that two adjacent single quotes are not the same as a double-quote character (").
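The quote-doubling rule is easy to apply programmatically. The Python helper below is a minimal sketch (quote_literal is a hypothetical name) that turns arbitrary text into a regular string literal; where a client driver offers parameterised queries, those are generally the safer choice.

```python
def quote_literal(text):
    """Wrap text in single quotes, doubling any embedded single quote."""
    return "'" + text.replace("'", "''") + "'"

print(quote_literal("Dianne's horse"))  # 'Dianne''s horse'
print(quote_literal(""))                # ''
```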

3.2.1.2.2.3 Examples

The following are some examples of regular string literals:

'123'

'￿￿￿'

'a''b'

''

3.2.1.2.2.4 Dollar-Quoted String Literals

Dollar-quoted string literals consist of a dollar sign ($), an optional “tag” of zero or more characters, another dollar sign, an arbitrary sequence of characters that make up the string content, a dollar sign, the same tag at the beginning of the dollar quote, and another dollar sign.

3.2.1.2.2.5 Examples

For example, below are two different ways to specify the string Dianne's horse using dollar-quoted string literals:

$$Dianne's horse$$
$SomeTag$Dianne's horse$SomeTag$

Note that you can use single quotes inside the dollar-quoted string without an escape. Because the string is always written literally, you do not need to escape any characters inside a dollar-quoted string. Backslashes and dollar signs have no special meaning unless they are part of a sequence matching the opening tag. A tag in a dollar-quoted string follows the same rules as an unquoted identifier, except that it cannot contain a dollar sign. In addition, because tags are case sensitive, $tag$String content$tag$ is correct, but $TAG$String content$tag$ is incorrect.
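The structure of a dollar-quoted literal (opening delimiter, content, identical closing delimiter) can be captured with a regular expression. The Python sketch below is an illustration of the rules described above; the pattern, the function name and the tag names are assumptions, and SQream DB's own lexer may differ in edge cases.

```python
import re

# The tag is either empty or identifier-like; the closing delimiter must
# repeat it exactly (the backreference \1 is case sensitive).
DOLLAR_QUOTED = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*|)\$(.*?)\$\1\$", re.DOTALL)

def parse_dollar_quoted(literal):
    """Return the string content of a dollar-quoted literal."""
    match = DOLLAR_QUOTED.fullmatch(literal)
    if match is None:
        raise ValueError("not a valid dollar-quoted string")
    return match.group(2)

print(parse_dollar_quoted("$$Dianne's horse$$"))                # Dianne's horse
print(parse_dollar_quoted("$SomeTag$Dianne's horse$SomeTag$"))  # Dianne's horse
```

A literal whose opening and closing tags differ only in case is rejected by parse_dollar_quoted with a ValueError.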

A dollar-quoted string that follows a keyword or identifier must be separated from it by whitespace (such as spaces, tabs, or newlines). If you do not separate them with whitespace, the dollar-quoting delimiter is taken as part of the preceding identifier.

3.2.1.2.2.6 Escaped String Literals

Because regular string literals do not support inserting special characters (such as new lines), the escaped string literal syntax was added to support inserting special characters with an escaping syntax. In addition to being enclosed by single quotes (e.g. 'abc'), escaped string literals are preceded by a capital E.

E'abc'

The character sequence inside the single quotes can contain escaped characters in addition to regular characters, shown below:

Sequence                                  | Interpretation
------------------------------------------+----------------------------------------------------------------------------
\b                                        | Inserts a backspace.
\f                                        | Inserts a form feed.
\n                                        | Inserts a newline.
\r                                        | Inserts a carriage return.
\t                                        | Inserts a tab.
\o, \oo, \ooo (o = 0 - 7)                 | Inserts an octal byte value. This sequence is currently not supported.
\xh, \xhh (h = 0 - 9, A - F)              | Inserts a hexadecimal byte value. This sequence is currently not supported.
\uxxxx, \Uxxxxxxxx (x = 0 - 9, A - F)     | Inserts a 16 or 32-bit hexadecimal unicode character value.

Excluding the characters in the table above, escaped string literals take all other characters following a backslash literally. To include a backslash character, use two consecutive backslashes (\\). You can use a single quote in an escape string by writing \', in addition to the normal method ('').
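The escape rules can be tried out with a small decoder. The Python sketch below handles the supported sequences from the table, the doubled backslash, and the backslash-quote form; it works on the text between the quotes of an E'...' literal, does not handle the '' form, and treats the unsupported octal and hexadecimal byte sequences like any other unknown escape. The names are hypothetical and this is not SQream DB's implementation.

```python
import re

SIMPLE = {"b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t"}
# A backslash followed by \uxxxx, \Uxxxxxxxx, or any single character
ESCAPE = re.compile(r"\\(u[0-9A-Fa-f]{4}|U[0-9A-Fa-f]{8}|.)", re.DOTALL)

def decode_escaped(body):
    """Decode the text between the quotes of an escaped string literal."""
    def replace(match):
        seq = match.group(1)
        if seq[0] in "uU" and len(seq) > 1:
            return chr(int(seq[1:], 16))  # unicode code point
        # Known single-letter escapes; any other character is taken literally
        return SIMPLE.get(seq, seq)
    return ESCAPE.sub(replace, body)

print(decode_escaped(r"it\'s"))            # it's
print(decode_escaped(r"caf\u00e9"))        # café
print(decode_escaped(r"a\tb") == "a\tb")   # True
```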

3.2.1.2.3 Typed Literals

Typed literals allow you to create any data type using either of the following syntaxes:

CAST(literal AS type_name)
literal :: type_name

See also Converting and Casting Types for more information about supported casts.

3.2.1.2.3.1 Syntax Reference

The following is a syntax reference for typed literals:

typed_literal ::=
    cast(literal AS type_name)
    | literal :: type_name

literal ::=
    string_literal
    | number_literal
    | NULL | TRUE | FALSE

type_name ::=
    BOOL
    | TINYINT
    | SMALLINT
    | INT
    | BIGINT
    | FLOAT
    | REAL
    | DATE
    | DATETIME
    | VARCHAR ( digits )
    | TEXT ( digits )

3.2.1.2.3.2 Examples

'1955-11-05' :: date

'TRUE' :: BOOL

CAST('2300' as BIGINT)

CAST(42 AS FLOAT)

3.2.1.2.4 Boolean Literals

Boolean literals include the keywords true or false.

3.2.1.2.4.1 Example

INSERT INTO animals VALUES ('fox',true), ('cat',true), ('kiwi',false);

3.2.1.2.5 Other Constants

The following other constants can be used:

• TRUE and FALSE - interpreted as values of type BOOL.
• NULL - which has no type of its own. The type is inferred from context during query compilation.

3.2.1.3 Scalar expressions

Scalar expressions are functions that calculate a single (scalar) value, even if executed on an entire column. They can be stored as a row value, as opposed to a table value which is a result set consisting of more than one column and/or row. A scalar expression can be any one of these:

• Literals
• Column references
• Operators

3.2.1.3.1 Literals

Literals are constant values.

3.2.1.3.2 Column references

A column reference is a column name, alias, or ordinal. For example, in SELECT name FROM users, the column reference refers to the column titled name. Column names may be aliased. For example, in SELECT name as "First name" from users, the column reference is the alias "First name", which is quoted to maintain the case and use of space. A column may also be referenced using an ordinal, for example in a GROUP BY or ORDER BY. In the query SELECT AVG(Salary),Team FROM nba GROUP BY 2, the ordinal 2 refers to the second column in the select list, Team.

3.2.1.3.3 Operators

Operators, frequently used for comparison, usually come in three forms:

3.2.1.3.3.1 Unary operator

A prefix or postfix to an expression or literals. For example, -, which is used to negate numbers.

prefix_unary_operator ::= + | - | NOT

postfix_unary_operator ::= IS NULL | IS NOT NULL

3.2.1.3.3.2 Binary operator

Two expressions or literals separated by an operator. For example, + which is used to add two numbers.

binary_operator ::= . | + | ^ | * | / | % | + | - | >= | <= | != | <> | || | LIKE | NOT LIKE | RLIKE | NOT RLIKE | < | > | = | OR | AND

3.2.1.3.3.3 Special operators for set membership

special_operator ::=
    value_expr IN ( value_expr [, ... ])
    | value_expr NOT IN ( value_expr [, ... ])
    | value_expr BETWEEN value_expr AND value_expr
    | value_expr NOT BETWEEN value_expr AND value_expr

These operators return TRUE if the value_expr on the left matches the expression on the right for set membership or if the value is in-range.

Note: The data type of the left value_expr must match the type of the right side value_expr.

3.2.1.3.3.4 Comparisons

Binary operators are frequently used to compare values. Comparison operators (< > = <= >= <> !=, IS NULL, IS NOT NULL) always return BOOL. These operators are:

Operator | Name
---------+-------------------------
<        | Smaller than
<=       | Smaller than or equal to
>        | Greater than
>=       | Greater than or equal to
=        | Equals
<> or != | Not equal to
IS       | Identical to
IS NOT   | Not identical to

Note: NULL values are handled differently than other value expressions:

• NULL is always smaller than anything, including another NULL.
• NULL is never equal to anything, including another NULL (=). To check if a value is null, use IS NULL.
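The equality rule for NULL can be modelled with Python's None. The sketch below assumes the usual SQL convention that a comparison involving NULL yields an unknown result, represented here as None; the helper names are hypothetical and the code only illustrates why IS NULL is needed.

```python
def sql_equals(a, b):
    """SQL-style '=': the result is unknown (None) if either side is NULL."""
    if a is None or b is None:
        return None
    return a == b

def is_null(a):
    """SQL-style IS NULL: always returns a definite True or False."""
    return a is None

print(sql_equals(1, 1))        # True
print(sql_equals(None, None))  # None: NULL is never equal to NULL
print(is_null(None))           # True
```

A filter keeps a row only when its condition is definitely true, so a comparison such as column = NULL never selects any rows.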

3.2.1.3.3.5 Operator precedence

The table below lists the operators in decreasing order of precedence.

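Precedence | Operator                 | Associativity
-----------+--------------------------+--------------
Highest    | .                        | left
           | + - (unary)              |
           | ^                        | left
           | * / %                    | left
           | + - (binary)             | left
           | ||                       | right
           | BETWEEN, IN, LIKE, RLIKE |
           | < > = <= >= <> !=        |
           | IS NULL, IS NOT NULL     |
           | NOT                      |
           | AND                      | left
Lowest     | OR                       | left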

Tip: Use parentheses to avoid ambiguous situations when using binary operators.

Note: The NOT variations, such as NOT BETWEEN, NOT IN, NOT LIKE, NOT RLIKE have the same precedence as their non-NOT variations.

3.2.1.4 Joins

Joins combine results from two or more table expressions (tables, external tables, views) to form a new result set. The combination of these table expressions can be based on a set of conditions, such as equality of columns. Joins are useful when data in tables are related, for example when two tables contain one or more columns in common, so that rows can be matched or correlated with rows from the other table.

3.2.1.4.1 Syntax

table_ref ::=
    | left_side join_type right_side [ ON value_expr ]
    | table_alias

join_type ::=
    [ INNER ] [ join_hint ] JOIN
    | LEFT [ OUTER ] [ join_hint ] JOIN
    | RIGHT [ OUTER ] [ join_hint ] JOIN
    | CROSS [ join_hint ] JOIN

join_hint ::=
    MERGE | LOOP

3.2.1.4.1.1 Join types

3.2.1.4.1.2 Inner join

left_side [ INNER ] JOIN right_side ON value_expr
left_side [ INNER ] JOIN right_side USING ( join_column [, ... ])

This is the default join type. Rows from the left_side and right_side that match the condition are returned. An inner join can also be specified by listing several tables in the FROM clause. Omitting the ON or WHERE will result in a CROSS JOIN, where every row of left_side is matched with every row on right_side.

3.2.1.4.1.3 Left outer join

left_side LEFT [ OUTER ] JOIN right_side ON value_expr
left_side LEFT [ OUTER ] JOIN right_side USING ( join_column [, ... ])

Similar to the inner join, but for every row of left_side that does not match, NULL values are returned for the columns of right_side.

3.2.1.4.1.4 Right outer join

left_side RIGHT [ OUTER ] JOIN right_side ON value_expr
left_side RIGHT [ OUTER ] JOIN right_side USING ( join_column [, ... ])

Similar to the inner join, but for every row of right_side that does not match, NULL values are returned for the columns of left_side.

3.2.1.4.1.5 Cross join

left_side CROSS JOIN right_side

A cartesian product: all rows match on all sides. For every combination of a row from left_side and a row from right_side, the result set will contain a row with all columns from left_side and all columns from right_side. The CROSS JOIN can’t have an ON clause, but WHERE can be used to limit the result set.

3.2.1.4.1.6 Join conditions

3.2.1.4.1.7 ON value_expr

The ON condition is an expression that evaluates to a boolean to identify whether the rows match. For example, ON left_side.name = right_side.name matches when both name columns match. For LEFT and RIGHT joins, the ON clause is optional. However, if it is not specified, the result is a computationally intensive CROSS JOIN.


Tip: SQream DB does not support the USING syntax. However, queries can be easily rewritten. left_side JOIN right_side using (name) is equivalent to ON left_side.name = right_side.name
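The join types described above can be modelled outside the database. The following is a minimal Python sketch, not SQream DB code: each table is a plain list of values (the same values used by the left_side and right_side tables in the examples of this section), None stands in for NULL, and the function names inner_join, left_join, right_join and cross_join are made up for illustration. The row order produced by the sketch is not meant to match SQream DB's output order.

```python
# Illustrative model only: None plays the role of SQL NULL.
left_side = [1, 2, 4, 5]
right_side = [2, 3, 4, 5, 6]


def equal(l, r):
    """Stand-in for the condition ON l.x = r.x."""
    return l == r


def inner_join(left, right, on):
    """Keep only the pairs for which the ON predicate is true."""
    return [(l, r) for l in left for r in right if on(l, r)]


def left_join(left, right, on):
    """Inner join, plus every unmatched left row padded with None."""
    rows = []
    for l in left:
        matches = [(l, r) for r in right if on(l, r)]
        rows.extend(matches or [(l, None)])
    return rows


def right_join(left, right, on):
    """Mirror image of left_join: unmatched right rows are padded with None."""
    swapped = left_join(right, left, lambda r, l: on(l, r))
    return [(l, r) for r, l in swapped]


def cross_join(left, right):
    """Cartesian product: every left row paired with every right row."""
    return [(l, r) for l in left for r in right]


print(inner_join(left_side, right_side, equal))
print(left_join(left_side, right_side, equal))
print(right_join(left_side, right_side, equal))
print(len(cross_join(left_side, right_side)))
```

The only difference between the three conditional joins in this sketch is what happens to rows that find no partner, which is the distinction the descriptions above draw.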

3.2.1.4.2 Examples

Assume a pair of tables with the following structure and content:

CREATE TABLE left_side (x INT); INSERT INTO left_side VALUES (1), (2), (4), (5);

CREATE TABLE right_side (x INT); INSERT INTO right_side VALUES (2), (3), (4), (5), (6);

3.2.1.4.2.1 Inner join

In inner joins, values that are not matched do not appear in the result set.

t=> SELECT * FROM left_side AS l JOIN right_side AS r
.   ON l.x = r.x;
x | x0
--+---
2 | 2
4 | 4
5 | 5

3.2.1.4.2.2 Left join

Note: Note the null values for 1 which were not matched. SQream DB outputs nulls last.

t=> SELECT * FROM left_side AS l LEFT JOIN right_side AS r
.   ON l.x = r.x;
x | x0
--+---
2 | 2
4 | 4
5 | 5
1 | \N

3.2.1.4.2.3 Right join

Note: Note the null values for 3, 6 which were not matched. SQream DB outputs nulls last.

t=> SELECT * FROM left_side AS l RIGHT JOIN right_side AS r
.   ON l.x = r.x;
x  | x0
---+---
2  | 2
4  | 4
5  | 5
\N | 3
\N | 6

3.2.1.4.2.4 Cross join

t=> SELECT * FROM left_side AS l CROSS JOIN right_side AS r;
x | x0
--+---
1 | 2
1 | 3
1 | 4
1 | 5
1 | 6
2 | 2
2 | 3
2 | 4
2 | 5
2 | 6
4 | 2
4 | 3
4 | 4
4 | 5
4 | 6
5 | 2
5 | 3
5 | 4
5 | 5
5 | 6

Specifying multiple comma-separated tables is equivalent to a cross join, which can then be filtered with a WHERE clause.

t=> SELECT * FROM left_side l, right_side r;
x | x0
--+---
1 | 2
1 | 3
1 | 4
1 | 5
1 | 6
2 | 2
2 | 3
2 | 4
2 | 5
2 | 6
4 | 2
4 | 3
4 | 4
4 | 5
4 | 6
5 | 2
5 | 3
5 | 4
5 | 5
5 | 6

t=> SELECT * FROM left_side l, right_side r WHERE (r.x=l.x);
x | x0
--+---
2 | 2
4 | 4
5 | 5

3.2.1.4.2.5 Join hints

Join hints can be used to override the query compiler and choose a particular join algorithm. The available algorithms are LOOP (corresponding to non-indexed nested loop join algorithm), and MERGE (corresponding to sort merge join algorithm). If no algorithm is specified, a loop join is performed by default.

t=> SELECT * FROM left_side AS l INNER MERGE JOIN right_side AS r ON l.x = r.x;
x | x0
--+---
2 | 2
4 | 4
5 | 5

t=> SELECT * FROM left_side AS l INNER LOOP JOIN right_side AS r ON l.x = r.x;
x | x0
--+---
2 | 2
4 | 4
5 | 5
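To make the difference between the two hinted algorithms concrete, here is a rough Python sketch of a non-indexed nested loop join and a sort merge join over lists of join keys. It illustrates the textbook algorithms only and says nothing about how SQream DB implements them internally; loop_join and merge_join are hypothetical names chosen for this sketch.

```python
def loop_join(left, right):
    """Non-indexed nested loop join: compare every left key with every right key."""
    return [(l, r) for l in left for r in right if l == r]


def merge_join(left, right):
    """Sort merge join: sort both inputs, then advance two cursors in step."""
    left, right = sorted(left), sorted(right)
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            i += 1
        elif left[i] > right[j]:
            j += 1
        else:
            # Find the run of equal keys on each side and emit every pairing.
            key = left[i]
            i_end = i
            while i_end < len(left) and left[i_end] == key:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end] == key:
                j_end += 1
            out.extend((key, key) for _ in range((i_end - i) * (j_end - j)))
            i, j = i_end, j_end
    return out


left_side = [1, 2, 4, 5]
right_side = [2, 3, 4, 5, 6]
print(loop_join(left_side, right_side))
print(merge_join(left_side, right_side))
```

Both functions return the same matches; the loop join performs a comparison for every pair of rows, while the merge join pays for two sorts and then makes a single pass over each input.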

3.2.1.5 Common table expressions (CTEs)

Common table expressions (CTEs) let a complex subquery be defined once under a short name, which improves readability and allows the subquery to be referenced multiple times in a query. CTEs do not affect query performance.


3.2.1.5.1 Syntax

cte ::=
    [ WITH table_alias AS (query) [, ...] ]

query ::=
    query_term
    [ UNION ALL query_term [ ...]]
    [ ORDER BY order [, ... ]]
    [ LIMIT num_rows ]
    ;

query_term ::=
    SELECT
        [ TOP num_rows ]
        [ DISTINCT ]
        select_list
        [ FROM table_ref [, ... ]
            [ WHERE value_expr
            [ GROUP BY value_expr [, ... ]
            [ HAVING value_expr ]
        ]
    | ( VALUES ( value_expr [, ... ] ) [, ... ])

3.2.1.5.2 Examples

3.2.1.5.2.1 Simple CTE

nba=> WITH s AS (SELECT "Name" FROM nba WHERE "Salary" > 20000000)
.     SELECT * FROM nba AS n, s WHERE n."Name" = s."Name";
Name            | Team                  | Number | Position | Age | Height | Weight | College      | Salary   | name0
----------------+-----------------------+--------+----------+-----+--------+--------+--------------+----------+----------------
Carmelo Anthony | New York Knicks       | 7      | SF       | 32  | 6-8    | 240    | Syracuse     | 22875000 | Carmelo Anthony
Chris Bosh      | Miami Heat            | 1      | PF       | 32  | 6-11   | 235    | Georgia Tech | 22192730 | Chris Bosh
Chris Paul      | Los Angeles Clippers  | 3      | PG       | 31  | 6-0    | 175    | Wake Forest  | 21468695 | Chris Paul
Derrick Rose    | Chicago Bulls         | 1      | PG       | 27  | 6-3    | 190    | Memphis      | 20093064 | Derrick Rose
Dwight Howard   | Houston Rockets       | 12     | C        | 30  | 6-11   | 265    |              | 22359364 | Dwight Howard
Kevin Durant    | Oklahoma City Thunder | 35     | SF       | 27  | 6-9    | 240    | Texas        | 20158622 | Kevin Durant
Kobe Bryant     | Los Angeles Lakers    | 24     | SF       | 37  | 6-6    | 212    |              | 25000000 | Kobe Bryant
LeBron James    | Cleveland Cavaliers   | 23     | SF       | 31  | 6-8    | 250    |              | 22970500 | LeBron James

In this example, the WITH clause defines the temporary name s for the subquery that finds players with salaries over $20 million. The result set becomes a valid table reference in any table expression of the subsequent SELECT clause.
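As a small check of the idea that a CTE is a named subquery, the sketch below runs the same query twice, once with a WITH clause and once with the subquery written inline, and compares the results. It is a sketch under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in engine (SQLite is not SQream DB, and behavior may differ), and it loads only three rows and two columns taken from the nba sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced stand-in for the nba table: only the columns this query needs.
conn.execute('CREATE TABLE nba ("Name" TEXT, "Salary" REAL)')
conn.executemany(
    "INSERT INTO nba VALUES (?, ?)",
    [("Kobe Bryant", 25000000), ("LeBron James", 22970500), ("Terry Rozier", 1824360)],
)

# The subquery is given the name s and then used like a table.
with_cte = conn.execute('''
    WITH s AS (SELECT "Name" FROM nba WHERE "Salary" > 20000000)
    SELECT n."Name", n."Salary" FROM nba AS n, s
    WHERE n."Name" = s."Name"
    ORDER BY n."Name"
''').fetchall()

# The same subquery written inline as a derived table.
with_subquery = conn.execute('''
    SELECT n."Name", n."Salary"
    FROM nba AS n, (SELECT "Name" FROM nba WHERE "Salary" > 20000000) AS s
    WHERE n."Name" = s."Name"
    ORDER BY n."Name"
''').fetchall()

print(with_cte)
print(with_cte == with_subquery)
```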

3.2.1.5.2.2 Nested CTEs

SQream DB also supports any number of nested CTEs, such as this:

WITH w AS (SELECT * FROM (WITH x AS (SELECT * FROM nba) SELECT * FROM x ORDER BY "Salary" DESC)) SELECT * FROM w ORDER BY "Weight" DESC;

3.2.1.5.2.3 Reusing CTEs

SQream DB supports defining several CTEs in one query and referencing them several times. CTE definitions are separated with commas.

nba=> WITH
.       nba_ct AS (SELECT "Name", "Team" FROM nba WHERE "College"='Connecticut'),
.       nba_az AS (SELECT "Name", "Team" FROM nba WHERE "College"='Arizona')
.       SELECT * FROM nba_az JOIN nba_ct ON nba_ct."Team" = nba_az."Team";
Name            | Team            | name0          | team0
----------------+-----------------+----------------+----------------
Stanley Johnson | Detroit Pistons | Andre Drummond | Detroit Pistons
Aaron Gordon    | Orlando Magic   | Shabazz Napier | Orlando Magic

3.2.1.5.2.4 Using CTEs with CREATE TABLE AS

When used with CREATE TABLE AS, the CREATE TABLE statement should appear before WITH.

CREATE TABLE weights AS

WITH w AS (SELECT * FROM (WITH x AS (SELECT * FROM nba) SELECT * FROM x ORDER BY "Salary" DESC)) SELECT * FROM w ORDER BY "Weight" DESC;

3.2.1.6 Window Functions

Window functions are functions applied over a subset (known as a window) of the rows returned by a SELECT query.


3.2.1.6.1 Syntax

window_expr ::=
    window_fn ( [ value_expr [, ...] ] )
    OVER (
        [ PARTITION BY value_expression [, ...] ]
        [ ORDER BY value_expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] ]
        [ frame_clause ]
    )

window_fn ::=
    | AVG
    | COUNT
    | LAG
    | LEAD
    | MAX
    | MIN
    | RANK
    | ROW_NUMBER
    | SUM
    | FIRST_VALUE
    | LAST_VALUE
    | NTH_VALUE
    | DENSE_RANK
    | PERCENT_RANK
    | CUME_DIST
    | NTILE

frame_clause ::=
    { RANGE | ROWS } frame_start [ frame_exclusion ]
    | { RANGE | ROWS } BETWEEN frame_def AND frame_def [ frame_exclusion ]

frame_def ::=
    UNBOUNDED PRECEDING
    | offset PRECEDING
    | CURRENT ROW
    | offset FOLLOWING
    | UNBOUNDED FOLLOWING

frame_exclusion ::=
    EXCLUDE CURRENT ROW
    | EXCLUDE GROUP
    | EXCLUDE TIES
    | EXCLUDE NO OTHERS

offset ::=
    numeric literal


3.2.1.6.2 Arguments

Parameter                   | Description
----------------------------+--------------------------------------------------
window_fn                   | Window function, like AVG, MIN, etc.
PARTITION BY partition_expr | An expression or column reference to partition by
ORDER BY order              | An expression or column reference to order by

3.2.1.6.3 Supported Window Functions

Table 3: Window Aggregation Functions

Function | Description
---------+----------------------------------------------------------------
AVG      | Returns the average of numeric values.
COUNT    | Returns the count of numeric values, or only the distinct values.
MAX      | Returns the maximum values.
MIN      | Returns the minimum values.
SUM      | Returns the sum of numeric values, or only the distinct values.

Table 4: Ranking Functions

Function     | Description
-------------+---------------------------------------------------------------------------
LAG          | Returns a value from a previous row within the partition of a result set.
LEAD         | Returns a value from a subsequent row within the partition of a result set.
ROW_NUMBER   | Returns the row number of each row within the partition of a result set.
RANK         | Returns the rank of each row within the partition of a result set.
FIRST_VALUE  | Returns the value in the first row of a window.
LAST_VALUE   | Returns the value in the last row of a window.
NTH_VALUE    | Returns the value in a specified (n) row of a window.
DENSE_RANK   | Returns the rank of the current row with no gaps.
PERCENT_RANK | Returns the relative rank of the current row.
CUME_DIST    | Returns the cumulative distribution of rows.
NTILE        | Returns an integer ranging between 1 and the argument value, dividing the partitions as equally as possible.

3.2.1.6.4 How Window Functions Work

A window function operates on a subset (“window”) of rows. Each time a window function is called, it gets the current row for processing, as well as the window of rows that contains the current row. The window function returns one result row for each input row. The result depends on the individual row and the order of the rows. Some window functions, such as RANK, are order-sensitive.

Note: In general, a window frame will include all rows of a partition. If an ORDER BY clause is applied, the rows become ordered, which can change the order of the function calls. The function is then applied to the subset between the first row and the current row, instead of the whole frame.


Boundaries for the frames may need to be applied to get the correct results.

Window frame functions allow a user to perform rolling operations, such as calculating moving averages, finding the longest-standing customers, identifying churn, and finding movers and shakers.

3.2.1.6.4.1 PARTITION BY

The PARTITION BY clause groups the rows of the query into partitions, which are processed separately by the window function. PARTITION BY works similarly to a query-level GROUP BY clause, but expressions are always just expressions and cannot be output-column names or numbers. Without PARTITION BY, all rows produced by the query are treated as a single partition.

3.2.1.6.4.2 ORDER BY

The ORDER BY clause determines the order in which the rows of a partition are processed by the window function. It works similarly to a query-level ORDER BY clause, but cannot use output-column names or numbers. Without ORDER BY, rows are processed in an unspecified order.
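The effect of PARTITION BY and ORDER BY inside OVER can be seen on a tiny data set. This is a sketch only: it runs on Python's built-in sqlite3 module (SQLite 3.25 or later, which added window functions) rather than on SQream DB, and the table t with its team, age and salary columns is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: two teams, two players each.
conn.execute("CREATE TABLE t (team TEXT, age INT, salary INT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("A", 20, 100), ("A", 25, 200), ("B", 22, 50), ("B", 30, 70)],
)

rows = conn.execute("""
    SELECT team, age, salary,
           SUM(salary) OVER (PARTITION BY team)               AS team_total,
           SUM(salary) OVER ()                                AS grand_total,
           ROW_NUMBER() OVER (PARTITION BY team ORDER BY age) AS age_rank
    FROM t
    ORDER BY team, age
""").fetchall()

for row in rows:
    print(row)
```

Unlike GROUP BY, every input row is kept: team_total repeats the per-partition sum on each row, grand_total treats all rows as a single partition, and age_rank restarts at 1 in every partition.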

3.2.1.6.4.3 Frames

Note: Frames and frame exclusions have been tested extensively, but are a complex feature. They are released as a preview in v2020.1 pending longer-term testing.

The frame_clause specifies the set of rows constituting the window frame, which is a subset of the current partition, for those window functions that act on the frame instead of the whole partition. The set of rows in the frame can vary depending on which row is the current row.

The frame can be specified in RANGE or ROWS mode; in each case, it runs from the frame_start to the frame_end. If frame_end is omitted, the end defaults to CURRENT ROW.

A frame_start of UNBOUNDED PRECEDING means that the frame starts with the first row of the partition, and similarly a frame_end of UNBOUNDED FOLLOWING means that the frame ends with the last row of the partition.

In RANGE mode, a frame_start of CURRENT ROW means the frame starts with the current row’s first peer row (a row that the window’s ORDER BY clause sorts as equivalent to the current row), while a frame_end of CURRENT ROW means the frame ends with the current row’s last peer row. In ROWS mode, CURRENT ROW simply means the current row.

In the offset PRECEDING and offset FOLLOWING frame options, the offset must be an expression not containing any variables, aggregate functions, or window functions. The meaning of the offset depends on the frame mode:
• In ROWS mode, the offset must yield a non-null, non-negative integer, and the option means that the frame starts or ends the specified number of rows before or after the current row.
• In RANGE mode, these options require that the ORDER BY clause specify exactly one column. The offset specifies the maximum difference between the value of that column in the current row and its value in preceding or following rows of the frame. This option is restricted to integer types, date and datetime. The offset is required to be a non-null non-negative integer value.
• With a DATE or DATETIME column, the offset indicates a number of days.

In any case, the distance to the end of the frame is limited by the distance to the end of the partition, so that for rows near the partition ends the frame might contain fewer rows than elsewhere.

The default framing option is RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. With ORDER BY, this sets the frame to be all rows from the partition start up through the current row’s last ORDER BY peer. Without ORDER BY, this means all rows of the partition are included in the window frame, since all rows become peers of the current row.
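The difference between the default RANGE frame and a ROWS frame only shows up when the ORDER BY column contains peers. The sketch below illustrates this with a running sum. It is an illustration under assumptions: it runs on Python's built-in sqlite3 module (SQLite 3.25 or later) as a stand-in for SQream DB, and the table t with columns k and v is made up. The two peer rows (k = 2) deliberately carry the same v so the result does not depend on the order in which peers are processed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: the two rows with k = 2 are ORDER BY peers.
conn.execute("CREATE TABLE t (k INT, v INT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 10), (2, 20), (2, 20), (3, 40)])

rows = conn.execute("""
    SELECT k,
           -- Default frame: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW,
           -- which runs through the current row's last peer.
           SUM(v) OVER (ORDER BY k) AS range_sum,
           -- ROWS mode stops at the current row itself.
           SUM(v) OVER (ORDER BY k
                        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_sum
    FROM t
    ORDER BY k, rows_sum
""").fetchall()

for row in rows:
    print(row)
```

In RANGE mode both peer rows see the same sum (50), because each frame extends to the last peer; in ROWS mode the sum grows one row at a time (30, then 50).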

3.2.1.6.4.4 Restrictions

• frame_start cannot be UNBOUNDED FOLLOWING
• frame_end cannot be UNBOUNDED PRECEDING
• frame_end choice cannot appear earlier in the above list of frame_start and frame_end options than the frame_start choice does.

For example, RANGE BETWEEN CURRENT ROW AND 7 PRECEDING is not allowed. However, while ROWS BETWEEN 7 PRECEDING AND 8 PRECEDING is allowed, it would never select any rows.

3.2.1.6.4.5 Frame Exclusion

The frame_exclusion option allows rows around the current row to be excluded from the frame, even if they would be included according to the frame start and frame end options.
• EXCLUDE CURRENT ROW excludes the current row from the frame.
• EXCLUDE GROUP excludes the current row and its ordering peers from the frame.
• EXCLUDE TIES excludes any peers of the current row from the frame, but not the current row itself.
• EXCLUDE NO OTHERS simply specifies explicitly the default behavior of not excluding the current row or its peers.
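The four exclusion options can be compared side by side by summing over the whole partition and varying only the EXCLUDE clause. As with any example run outside the product, treat this as a sketch: it uses Python's built-in sqlite3 module (SQLite 3.28 or later, which added EXCLUDE) as a stand-in for SQream DB, and the table t with columns k and v is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: the two rows with k = 2 are ORDER BY peers; SUM(v) is 90.
conn.execute("CREATE TABLE t (k INT, v INT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 10), (2, 20), (2, 20), (3, 40)])

# Same whole-partition frame each time; only the exclusion option changes.
frame = "ORDER BY k RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING"
options = ["EXCLUDE NO OTHERS", "EXCLUDE CURRENT ROW", "EXCLUDE GROUP", "EXCLUDE TIES"]
columns = ", ".join(f"SUM(v) OVER ({frame} {opt})" for opt in options)

rows = conn.execute(f"SELECT k, {columns} FROM t ORDER BY k").fetchall()

print("k | no others | current row | group | ties")
for row in rows:
    print(row)
```

For the peer rows (k = 2), EXCLUDE CURRENT ROW and EXCLUDE TIES each drop exactly one row with v = 20 and so give the same sum here, while EXCLUDE GROUP drops both.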

3.2.1.6.5 Limitations

• At this phase, text columns are not supported in window function expressions.
• Window function calls are permitted only in the SELECT list.

3.2.1.6.6 Examples

For these examples, assume a table named nba, with the following structure:

CREATE TABLE nba
(
   "Name" varchar(40),
   "Team" varchar(40),
   "Number" tinyint,
   "Position" varchar(2),
   "Age" tinyint,
   "Height" varchar(4),
   "Weight" real,
   "College" varchar(40),
   "Salary" float
);

Here’s a peek at the table contents (Download nba.csv):

Table 5: nba.csv

Name          | Team           | Number | Position | Age  | Height | Weight | College           | Salary
--------------+----------------+--------+----------+------+--------+--------+-------------------+-----------
Avery Bradley | Boston Celtics | 0.0    | PG       | 25.0 | 6-2    | 180.0  | Texas             | 7730337.0
Jae Crowder   | Boston Celtics | 99.0   | SF       | 25.0 | 6-6    | 235.0  | Marquette         | 6796117.0
John Holland  | Boston Celtics | 30.0   | SG       | 27.0 | 6-5    | 205.0  | Boston University |
R.J. Hunter   | Boston Celtics | 28.0   | SG       | 22.0 | 6-5    | 185.0  | Georgia State     | 1148640.0
Jonas Jerebko | Boston Celtics | 8.0    | PF       | 29.0 | 6-10   | 231.0  |                   | 5000000.0
Amir Johnson  | Boston Celtics | 90.0   | PF       | 29.0 | 6-9    | 240.0  |                   | 12000000.0
Jordan Mickey | Boston Celtics | 55.0   | PF       | 21.0 | 6-8    | 235.0  | LSU               | 1170960.0
Kelly Olynyk  | Boston Celtics | 41.0   | C        | 25.0 | 7-0    | 238.0  | Gonzaga           | 2165160.0
Terry Rozier  | Boston Celtics | 12.0   | PG       | 22.0 | 6-2    | 190.0  | Louisville        | 1824360.0

3.2.1.6.6.1 Window Function Application

t=> SELECT SUM("Salary") OVER (PARTITION BY "Team" ORDER BY "Age") FROM nba;
sum
--------
1763400
5540289
5540289
5540289
5540289
7540289
18873622
18873622
30873622
60301531
60301531
60301531
64301531
72902950
72902950
[...]


3.2.1.6.6.2 Ranking Results

See RANK.

t=> SELECT n.Name, n.Age, n.Height ,RANK() OVER
.   (PARTITION BY n.Age ORDER BY n.Height DESC) AS Rank
.   FROM nba_2 n;
name               | age | height | rank
-------------------+-----+--------+-----
Devin Booker       | 19  | 6-6    | 1
Rashad Vaughn      | 19  | 6-6    | 1
Kristaps Porzingis | 20  | 7-3    | 1
Karl-Anthony Towns | 20  | 7-0    | 2
Bruno Caboclo      | 20  | 6-9    | 3
Kevon Looney       | 20  | 6-9    | 3
Aaron Gordon       | 20  | 6-9    | 3
Noah Vonleh        | 20  | 6-9    | 3
Cliff Alexander    | 20  | 6-8    | 7
Stanley Johnson    | 20  | 6-7    | 8
Justise Winslow    | 20  | 6-7    | 8
Kelly Oubre Jr.    | 20  | 6-7    | 8
James Young        | 20  | 6-6    | 11
Dante Exum         | 20  | 6-6    | 11
D'Angelo Russell   | 20  | 6-5    | 13
Emmanuel Mudiay    | 20  | 6-5    | 13
Tyus Jones         | 20  | 6-2    | 15
Jahlil Okafor      | 20  | 6-11   | 16
Christian Wood     | 20  | 6-11   | 16
Myles Turner       | 20  | 6-11   | 16
Trey Lyles         | 20  | 6-10   | 19
[...]

3.2.1.6.6.3 Using LEAD to Access Following Rows Without a Join

The LEAD function is used to return data from rows further down the result set. The LAG function returns data from rows further up the result set. This example calculates the salary difference between each player and the next one, starting from the highest salary.

t=> SELECT "Name",
.          "Salary",
.          LEAD("Salary", 1) OVER (ORDER BY "Salary" DESC) AS "Salary - next",
.          ABS(LEAD("Salary", 1) OVER (ORDER BY "Salary" DESC) - "Salary") AS "Salary - diff"
.   FROM nba
.   LIMIT 11;
Name            | Salary   | Salary - next | Salary - diff
----------------+----------+---------------+--------------
Kobe Bryant     | 25000000 | 22970500      | 2029500
LeBron James    | 22970500 | 22875000      | 95500
Carmelo Anthony | 22875000 | 22359364      | 515636
Dwight Howard   | 22359364 | 22192730      | 166634
Chris Bosh      | 22192730 | 21468695      | 724035
Chris Paul      | 21468695 | 20158622      | 1310073
Kevin Durant    | 20158622 | 20093064      | 65558
Derrick Rose    | 20093064 | 20000000      | 93064
Dwyane Wade     | 20000000 | 19689000      | 311000
Brook Lopez     | 19689000 | 19689000      | 0
DeAndre Jordan  | 19689000 | 19689000      | 0
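To see LEAD and LAG next to each other, including what happens at the edges of the result set, the sketch below uses the three highest salaries from the nba sample data. It runs on Python's built-in sqlite3 module (SQLite 3.25 or later) as a stand-in for SQream DB, so treat it as an illustration of the functions' general behavior rather than product output; the table name players is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical reduced table holding three rows from the nba sample data.
conn.execute("CREATE TABLE players (name TEXT, salary INT)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?)",
    [("Kobe Bryant", 25000000), ("LeBron James", 22970500), ("Carmelo Anthony", 22875000)],
)

rows = conn.execute("""
    SELECT name,
           salary,
           LEAD(salary, 1) OVER (ORDER BY salary DESC)               AS next_salary,
           LAG(salary, 1)  OVER (ORDER BY salary DESC)               AS prev_salary,
           ABS(LEAD(salary, 1) OVER (ORDER BY salary DESC) - salary) AS diff
    FROM players
    ORDER BY salary DESC
""").fetchall()

for row in rows:
    print(row)
```

The first row has no previous row and the last row has no following row, so LAG and LEAD return NULL (None in Python) there, and the difference computed from a NULL is NULL as well.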

3.2.1.7 Subqueries

Subqueries allow you to reuse the results of another query. SQream DB supports relational (also called derived table) subqueries, which appear as SELECT queries as part of a table expression. SQream DB also supports Common table expressions (CTEs), which are a form of subquery. With CTEs, a subquery can be named for reuse in a query.

Note:
• SQream DB does not currently support correlated subqueries or scalar subqueries.
• There is no limit on the number of subqueries in a statement or on how deeply they are nested.

3.2.1.7.1 Table Subqueries

The examples in this section use a table named nba with the following structure:

CREATE TABLE nba
(
   "Name" varchar(40),
   "Team" varchar(40),
   "Number" tinyint,
   "Position" varchar(2),
   "Age" tinyint,
   "Height" varchar(4),
   "Weight" real,
   "College" varchar(40),
   "Salary" float
);

To see the table contents, click Download nba.csv:


Table 6: nba.csv

Name          | Team           | Number | Position | Age  | Height | Weight | College           | Salary
--------------+----------------+--------+----------+------+--------+--------+-------------------+-----------
Avery Bradley | Boston Celtics | 0.0    | PG       | 25.0 | 6-2    | 180.0  | Texas             | 7730337.0
Jae Crowder   | Boston Celtics | 99.0   | SF       | 25.0 | 6-6    | 235.0  | Marquette         | 6796117.0
John Holland  | Boston Celtics | 30.0   | SG       | 27.0 | 6-5    | 205.0  | Boston University |
R.J. Hunter   | Boston Celtics | 28.0   | SG       | 22.0 | 6-5    | 185.0  | Georgia State     | 1148640.0
Jonas Jerebko | Boston Celtics | 8.0    | PF       | 29.0 | 6-10   | 231.0  |                   | 5000000.0
Amir Johnson  | Boston Celtics | 90.0   | PF       | 29.0 | 6-9    | 240.0  |                   | 12000000.0
Jordan Mickey | Boston Celtics | 55.0   | PF       | 21.0 | 6-8    | 235.0  | LSU               | 1170960.0
Kelly Olynyk  | Boston Celtics | 41.0   | C        | 25.0 | 7-0    | 238.0  | Gonzaga           | 2165160.0
Terry Rozier  | Boston Celtics | 12.0   | PG       | 22.0 | 6-2    | 190.0  | Louisville        | 1824360.0

3.2.1.7.2 Scalar Subqueries

The following are examples of scalar subqueries.

3.2.1.7.2.1 Simple Subquery

t=> SELECT AVG("Age") FROM
.   (SELECT "Name","Team","Age" FROM nba WHERE "Height" > '7-0');
avg
---
26

3.2.1.7.2.2 Combining a Subquery with a Join

t=> SELECT * FROM
.   (SELECT "Name" FROM nba WHERE "Height" > '7-0') AS t(name)
.   , nba AS n
.   WHERE n."Name"=t.name;
name               | Name               | Team                   | Number | Position | Age | Height | Weight | College    | Salary
-------------------+--------------------+------------------------+--------+----------+-----+--------+--------+------------+---------
Alex Len           | Alex Len           | Phoenix Suns           | 21     | C        | 22  | 7-1    | 260    | Maryland   | 3807120
Alexis Ajinca      | Alexis Ajinca      | New Orleans Pelicans   | 42     | C        | 28  | 7-2    | 248    | \N         | 4389607
Boban Marjanovic   | Boban Marjanovic   | San Antonio Spurs      | 40     | C        | 27  | 7-3    | 290    | \N         | 1200000
Kristaps Porzingis | Kristaps Porzingis | New York Knicks        | 6      | PF       | 20  | 7-3    | 240    | \N         | 4131720
Marc Gasol         | Marc Gasol         | Memphis Grizzlies      | 33     | C        | 31  | 7-1    | 255    | \N         | 19688000
Meyers Leonard     | Meyers Leonard     | Portland Trail Blazers | 11     | PF       | 24  | 7-1    | 245    | Illinois   | 3075880
Roy Hibbert        | Roy Hibbert        | Los Angeles Lakers     | 17     | C        | 29  | 7-2    | 270    | Georgetown | 15592217
Rudy Gobert        | Rudy Gobert        | Utah Jazz              | 27     | C        | 23  | 7-1    | 245    | \N         | 1175880
Salah Mejri        | Salah Mejri        | Dallas Mavericks       | 50     | C        | 29  | 7-2    | 245    | \N         | 525093
Spencer Hawes      | Spencer Hawes      | Charlotte Hornets      | 0      | PF       | 28  | 7-1    | 245    | Washington | 6110034
Tibor Pleiss       | Tibor Pleiss       | Utah Jazz              | 21     | C        | 26  | 7-3    | 256    | \N         | 2900000
Timofey Mozgov     | Timofey Mozgov     | Cleveland Cavaliers    | 20     | C        | 29  | 7-1    | 275    | \N         | 4950000
Tyson Chandler     | Tyson Chandler     | Phoenix Suns           | 4      | C        | 33  | 7-1    | 240    | \N         | 13000000
Walter Tavares     | Walter Tavares     | Atlanta Hawks          | 22     | C        | 24  | 7-3    | 260    | \N         | 1000000

3.2.1.7.2.3 WITH subqueries

See Common table expressions (CTEs) for more information.

nba=> WITH
.       nba_ct AS (SELECT "Name", "Team" FROM nba WHERE "College"='Connecticut'),
.       nba_az AS (SELECT "Name", "Team" FROM nba WHERE "College"='Arizona')
.       SELECT * FROM nba_az JOIN nba_ct ON nba_ct."Team" = nba_az."Team";
Name            | Team            | name0          | team0
----------------+-----------------+----------------+----------------
Stanley Johnson | Detroit Pistons | Andre Drummond | Detroit Pistons
Aaron Gordon    | Orlando Magic   | Shabazz Napier | Orlando Magic

3.2.1.8 Null handling

SQream DB handles NULL values similarly to other RDBMSs, with some minor differences.

Tip: When using sqream sql, NULL values are displayed as \N. Different clients may show other values, including an empty string.

3.2.1.8.1 Comparisons

Any comparison between a NULL and a value will result in NULL.


For example, WHERE a = NULL never evaluates to TRUE or FALSE, because the comparison results in NULL, so it never matches any rows. Use IS NULL to check for NULL values in result sets.
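The comparison rule is easy to verify on a three-row table. This sketch uses Python's built-in sqlite3 module as a stand-in engine, which follows the same three-valued comparison logic described here; it is not SQream DB, and the table t with column a is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with one NULL among three rows.
conn.execute("CREATE TABLE t (a INT)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (None,), (3,)])


def count_where(condition):
    """Count the rows of t for which the WHERE condition evaluates to TRUE."""
    return conn.execute(f"SELECT COUNT(*) FROM t WHERE {condition}").fetchone()[0]


# A comparison with NULL yields NULL (None in Python), never TRUE or FALSE.
raw = conn.execute("SELECT NULL = NULL, 1 = NULL, 1 <> NULL").fetchone()

eq_null = count_where("a = NULL")       # matches nothing
neq_null = count_where("a <> NULL")     # also matches nothing
is_null = count_where("a IS NULL")      # finds the NULL row
not_null = count_where("a IS NOT NULL")

print(raw, eq_null, neq_null, is_null, not_null)
```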

3.2.1.8.2 Conditionals

Functions like ISNULL and COALESCE evaluate arguments in their given order, so they may never evaluate some arguments. For example, COALESCE(a,b,c,d,NULL) will never evaluate b, c, d, or NULL if a is not NULL.
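The in-order, stop-early evaluation described above can be modelled in a few lines of Python. This is a conceptual sketch, not SQream DB's implementation: each argument is wrapped in a zero-argument function so that the sketch can record which arguments were actually evaluated, and the helper names coalesce and arg are hypothetical.

```python
def coalesce(*thunks):
    """Call each zero-argument function in order; return the first non-None value."""
    for thunk in thunks:
        value = thunk()
        if value is not None:
            return value  # Remaining arguments are never evaluated.
    return None


evaluated = []  # Records the names of the arguments that were evaluated.


def arg(name, value):
    """Wrap a value so that evaluating it leaves a trace in `evaluated`."""
    def thunk():
        evaluated.append(name)
        return value
    return thunk


# a is not NULL, so b, c and d are never looked at.
result = coalesce(arg("a", 1), arg("b", 2), arg("c", 3), arg("d", None))
first_trace = list(evaluated)

# When the leading arguments are NULL, evaluation continues until a value is found.
evaluated.clear()
fallback = coalesce(arg("a", None), arg("b", None), arg("c", 3), arg("d", 4))
second_trace = list(evaluated)

print(result, first_trace)
print(fallback, second_trace)
```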

3.2.1.8.3 Operations

3.2.1.8.3.1 Scalar functions

With all scalar functions, a NULL input to any one of the arguments means the result is NULL.

t=> SELECT x, y, z, x+y as "x+y", x*y as "x*y", y*z as "y*z", z-y as "z-y" FROM n;
x | y  | z  | x+y | x*y | y*z | z-y
--+----+----+-----+-----+-----+----
1 | 0  | 1  | 1   | 0   | 0   | 1
2 | 1  | 1  | 3   | 2   | 1   | 0
3 | \N | 4  | \N  | \N  | \N  | \N
4 | \N | \N | \N  | \N  | \N  | \N
5 | 6  | \N | 11  | 30  | \N  | \N

3.2.1.8.3.2 Aggregates

With aggregates, NULL values are ignored, so they do not affect the result set.

t=> SELECT SUM(x) AS "sum(x)", SUM(y) AS "sum(y)", SUM(z) AS "sum(z)" FROM n;
sum(x) | sum(y) | sum(z)
-------+--------+-------
15     | 7      | 6

t=> SELECT COUNT(x) AS "count(x)", COUNT(y) AS "count(y)", COUNT(z) AS "count(z)" FROM n;
count(x) | count(y) | count(z)
---------+----------+---------
5        | 3        | 3

• NULL values are not included in COUNT() of a column. COUNT(x) shows the full row-count because it’s a non-nullable column. COUNT(y) returns 3 - just the non-NULL values.
• With MIN, MAX, and AVG - NULL values are completely ignored.
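The scalar and aggregate rules above can be reproduced with a short script. Two assumptions apply: the contents of table n are reconstructed from the query output shown in this section (its definition is not given here), and the script runs on Python's built-in sqlite3 module as a stand-in for SQream DB.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Table n as it appears in the example output; the column types are an assumption.
conn.execute("CREATE TABLE n (x INT NOT NULL, y INT, z INT)")
conn.executemany(
    "INSERT INTO n VALUES (?, ?, ?)",
    [(1, 0, 1), (2, 1, 1), (3, None, 4), (4, None, None), (5, 6, None)],
)

# Scalar expressions: a NULL operand makes the whole result NULL.
scalar = conn.execute("SELECT x, x + y, y * z FROM n ORDER BY x").fetchall()

# Aggregates skip NULL inputs instead of propagating them.
sums = conn.execute("SELECT SUM(x), SUM(y), SUM(z) FROM n").fetchone()
counts = conn.execute("SELECT COUNT(x), COUNT(y), COUNT(z), COUNT(*) FROM n").fetchone()

print(scalar)
print(sums)
print(counts)
```

COUNT(*) counts rows rather than values, so it returns 5 even though y and z each hold only three non-NULL values.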

3.2.1.8.3.3 Distincts

NULL values are considered distinct, but only counted once.


t=> SELECT DISTINCT z FROM n;
z
--
1
4
\N

Running COUNT DISTINCT, however, ignores the NULL values:

t=> SELECT COUNT(DISTINCT z) FROM n;
count
-----
2

3.2.1.8.4 Sorting

When sorting a column containing NULL values, SQream DB sorts NULL values first with ASC and last with DESC. SQream DB does not implement NULLS FIRST or NULLS LAST, so the position of NULL values in the sort order cannot be changed.

t=> SELECT * FROM n ORDER BY z ASC;
x | y  | z
--+----+---
4 | \N | \N
5 | 6  | \N
1 | 0  | 1
2 | 1  | 1
3 | \N | 4

t=> SELECT * FROM n ORDER BY z DESC;
x | y  | z
--+----+---
3 | \N | 4
1 | 0  | 1
2 | 1  | 1
4 | \N | \N
5 | 6  | \N
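The DISTINCT and sorting behavior shown above can be replayed on the same data. The script below is a sketch with the usual caveats: table n is reconstructed from the example output, and Python's built-in sqlite3 module stands in for SQream DB. SQLite happens to place NULL values first with ASC and last with DESC as well, which is why the orderings agree with the output above; a secondary sort on x is added only to make the order of tied rows deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Table n as it appears in the example output; the column types are an assumption.
conn.execute("CREATE TABLE n (x INT NOT NULL, y INT, z INT)")
conn.executemany(
    "INSERT INTO n VALUES (?, ?, ?)",
    [(1, 0, 1), (2, 1, 1), (3, None, 4), (4, None, None), (5, 6, None)],
)

# DISTINCT keeps a single NULL; COUNT(DISTINCT ...) does not count it.
distinct_z = conn.execute("SELECT DISTINCT z FROM n ORDER BY z").fetchall()
count_distinct = conn.execute("SELECT COUNT(DISTINCT z) FROM n").fetchone()[0]

# NULL position under ascending and descending sorts.
asc = conn.execute("SELECT x, z FROM n ORDER BY z ASC, x").fetchall()
desc = conn.execute("SELECT x, z FROM n ORDER BY z DESC, x").fetchall()

print(distinct_z, count_distinct)
print(asc)
print(desc)
```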

3.2.2 SQL Statements

SQream DB supports commands from ANSI SQL.


3.2.2.1 Data Definition Commands (DDL)

Table 7: DDL Commands

Command               | Usage
----------------------+---------------------------------------------------------------------
ADD COLUMN            | Add a new column to a table
ALTER DEFAULT SCHEMA  | Change the default schema for a role
ALTER TABLE           | Change the schema of a table
CREATE DATABASE       | Create a new database
CREATE EXTERNAL TABLE | Create a new external table in the database (deprecated)
CREATE FOREIGN TABLE  | Create a new foreign table in the database
CREATE FUNCTION       | Create a new user defined function in the database
CREATE SCHEMA         | Create a new schema in the database
CREATE TABLE          | Create a new table in the database
CREATE TABLE AS       | Create a new table in the database using results from a select query
CREATE VIEW           | Create a new view in the database
DROP COLUMN           | Drop a column from a table
DROP DATABASE         | Drop a database and all of its objects
DROP FUNCTION         | Drop a function
DROP SCHEMA           | Drop a schema
DROP TABLE            | Drop a table and its contents from a database
DROP VIEW             | Drop a view
RENAME COLUMN         | Rename a column
RENAME TABLE          | Rename a table

3.2.2.2 Data Manipulation Commands (DML)

Table 8: DML Commands

Command         | Usage
----------------+---------------------------------------------------------------------
CREATE TABLE AS | Create a new table in the database using results from a select query
DELETE          | Delete specific rows from a table
COPY FROM       | Bulk load CSV data into an existing table
COPY TO         | Export a select query or entire table to CSV files
INSERT          | Insert rows into a table
SELECT          | Select rows and columns from a table
TRUNCATE        | Delete all rows from a table
VALUES          | Return rows containing literal values

3.2.2.3 Utility Commands

Table 9: Utility Commands

Command                  | Usage
-------------------------+---------------------------------------------------------
SELECT GET_LICENSE_INFO  | View a user’s license information
SELECT GET_DDL           | View the CREATE TABLE statement for a table
SELECT GET_FUNCTION_DDL  | View the CREATE FUNCTION statement for a UDF
SELECT GET_VIEW_DDL      | View the CREATE VIEW statement for a view
SELECT RECOMPILE_VIEW    | Recreate a view after schema changes
SELECT DUMP_DATABASE_DDL | View the CREATE TABLE statements for the current database


3.2.2.4 Saved Queries

Table 10: Saved Queries

Command                      | Usage
-----------------------------+-------------------------------------------------------------
SELECT DROP_SAVED_QUERY      | Drop a saved query
SELECT EXECUTE_SAVED_QUERY   | Executes a saved query
SELECT LIST_SAVED_QUERIES    | Returns a list of saved queries
SELECT RECOMPILE_SAVED_QUERY | Recompiles a query that has been invalidated by a schema change
SELECT SAVE_QUERY            | Compiles and saves a query for re-use and sharing
SELECT SHOW_SAVED_QUERY      | Shows query text for a saved query

For more information, see Saved queries

3.2.2.5 Monitoring

Monitoring statements allow a database administrator to execute actions in the system, such as aborting a query or getting information about system processes.

Table 11: Monitoring

Command            | Usage
-------------------+------------------------------------------------------------------------------
EXPLAIN            | Returns a static query plan for a statement
SHOW_CONNECTIONS   | Returns a list of jobs and statements on the current worker
SHOW_LOCKS         | Returns any existing locks in the database
SHOW_NODE_INFO     | Returns a query plan for an actively running statement with timing information
SHOW_SERVER_STATUS | Shows running statements across the cluster
SHOW_VERSION       | Returns the version of SQream DB
STOP_STATEMENT     | Stops a query (or statement) if it is currently running

3.2.2.6 Workload Management

Table 12: Workload Management

Command                   | Usage
--------------------------+-----------------------------------------------
SUBSCRIBE_SERVICE         | Add a SQream DB worker to a service queue
UNSUBSCRIBE_SERVICE       | Remove a SQream DB worker from a service queue
SHOW_SUBSCRIBED_INSTANCES | Return a list of service queues and workers


3.2.2.7 Access Control Commands

Table 13: Access Control Commands

Command                   | Usage
--------------------------+------------------------------------------------------------------------------------------
ALTER DEFAULT PERMISSIONS | Applies a change to defaults in the current schema
ALTER ROLE                | Applies a change to an existing role
CREATE ROLE               | Creates a role, which lets a database administrator control permissions on tables and databases
DROP ROLE                 | Removes roles
GET_STATEMENT_PERMISSIONS | Returns a list of permissions required to run a statement or query
GRANT                     | Grant permissions to a role
REVOKE                    | Revoke permissions from a role
RENAME ROLE               | Rename a role

3.2.2.7.1 ADD COLUMN

ADD COLUMN can be used to add columns to an existing table.

3.2.2.7.1.1 Permissions

The role must have the DDL permission at the database or table level.

3.2.2.7.1.2 Syntax

alter_table_add_column_statement ::=
    ALTER TABLE [schema_name.]table_name { ADD COLUMN column_def [, ...] }
    ;

table_name ::= identifier

schema_name ::= identifier

column_def ::= { column_name type_name [ default ] [ column_constraint ] }

column_name ::= identifier

column_constraint ::=
    { NOT NULL | NULL }

default ::=
    DEFAULT default_value


3.2.2.7.1.3 Parameters

Parameter             | Description
----------------------+------------------------------------------------------------------------------
schema_name           | The schema name for the table. Defaults to public if not specified.
table_name            | The table name to apply the change to.
ADD COLUMN column_def | A comma separated list of ADD COLUMN commands
column_def            | A column definition. A minimal column definition includes a name identifier and a datatype. Other column constraints and default values can be added optionally.

Note:
• When adding a new column to an existing table, a default (or null constraint) has to be specified, even if the table is empty.
• A new column added to the table cannot contain an IDENTITY or be of the TEXT type.

3.2.2.7.1.4 Examples

3.2.2.7.1.5 Adding a simple column with default value

ALTER TABLE cool_animals ADD COLUMN number_of_eyes INT DEFAULT 2 NOT NULL;

3.2.2.7.1.6 Adding several columns in one command

ALTER TABLE cool_animals ADD COLUMN number_of_eyes INT DEFAULT 2 NOT NULL, ADD COLUMN date_seen DATE DEFAULT '2019-08-01';
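The reason a default (or null constraint) is required becomes clear once the table already holds rows: every existing row needs some value for the new column. The sketch below shows this with the first example statement. It runs on Python's built-in sqlite3 module rather than SQream DB, the id and name columns and the two sample rows are hypothetical, and only the single-column form is used because SQLite does not accept several ADD COLUMN clauses in one statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical starting table with two existing rows.
conn.execute("CREATE TABLE cool_animals (id INT, name TEXT)")
conn.executemany("INSERT INTO cool_animals VALUES (?, ?)", [(1, "Dog"), (2, "Possum")])

# Same statement as the first example above.
conn.execute("ALTER TABLE cool_animals ADD COLUMN number_of_eyes INT DEFAULT 2 NOT NULL")

# Existing rows are back-filled with the default value.
rows = conn.execute("SELECT id, name, number_of_eyes FROM cool_animals ORDER BY id").fetchall()

# New rows that omit the column receive the default too.
conn.execute("INSERT INTO cool_animals (id, name) VALUES (3, 'Cat')")
new_row = conn.execute("SELECT number_of_eyes FROM cool_animals WHERE id = 3").fetchone()

print(rows, new_row)
```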

3.2.2.7.2 ALTER DEFAULT SCHEMA

ALTER DEFAULT SCHEMA can be used to change a role’s default schema. The default schema in SQream DB is public. See also: CREATE SCHEMA, DROP SCHEMA.

3.2.2.7.2.1 Permissions

No special permissions are required

3.2.2.7.2.2 Syntax


alter_default_schema_statement ::=
    ALTER DEFAULT SCHEMA FOR role_name TO schema_name
    ;

role_name ::= identifier

schema_name ::= identifier

3.2.2.7.2.3 Parameters

Parameter   | Description
------------+-----------------------------------------------
role_name   | The name of the role the change will apply to.
schema_name | The new default schema name.

3.2.2.7.2.4 Examples

3.2.2.7.2.5 Altering the default schema for a role

SELECT * FROM users; -- Refers to public.users

ALTER DEFAULT SCHEMA FOR bgilfoyle TO staging;

SELECT * FROM users; -- Now refers to staging.users, rather than public.users

3.2.2.7.3 ALTER TABLE

ALTER TABLE can be used to make schema changes to a table. It works in conjunction with several subcommands.

3.2.2.7.3.1 Locks

Schema changes take an exclusive lock on tables. While these operations are usually short, other statements may have to wait until the schema changes are completed.

3.2.2.7.3.2 Subcommands

Command | Usage
ADD COLUMN | Add a new column to a table
DROP COLUMN | Drop a column from a table
RENAME COLUMN | Rename a column
RENAME TABLE | Rename a table
CLUSTER BY | Modify (add or reorder) the clustering keys in a table
DROP CLUSTERING KEY | Drop all clustering keys

3.2.2.7.4 CLUSTER BY

CLUSTER BY can be used to change clustering keys in a table. Read our Data clustering guide for more information. See also: DROP CLUSTERING KEY, CREATE TABLE.

3.2.2.7.4.1 Permissions

The role must have the DDL permission at the database or table level.

3.2.2.7.4.2 Syntax

alter_table_rename_table_statement ::=
    ALTER TABLE [schema_name.]table_name CLUSTER BY column_name [, ...]
    ;

table_name ::= identifier

column_name ::= identifier

3.2.2.7.4.3 Parameters

Parameter | Description
schema_name | The schema name for the table. Defaults to public if not specified.
table_name | The table name to apply the change to.
column_name [, ... ] | Comma separated list of columns to create clustering keys for.

3.2.2.7.4.4 Usage notes

Removing clustering keys does not affect existing data. To force data to re-cluster, the table has to be recreated (i.e. with CREATE TABLE AS).

3.2.2.7.4.5 Examples

3.2.2.7.4.6 Reclustering a table

ALTER TABLE public.users CLUSTER BY start_date;

3.2.2.7.5 CREATE DATABASE

CREATE DATABASE creates a new database in SQream DB

3.2.2.7.5.1 Permissions

Only a superuser can create a new database

3.2.2.7.5.2 Syntax

create_database_statement ::=
    CREATE DATABASE database_name
    ;

database_name ::= identifier

3.2.2.7.5.3 Parameters

Parameter | Description
database_name | The name of the database. The database name must be unique and must follow the Identifier rules.

3.2.2.7.5.4 Examples

CREATE DATABASE raviga;

CREATE DATABASE my_db;

If the database already exists, an error will appear:

master=> CREATE DATABASE MY_DB;
Database 'my_db' already exists

Note: SQream DB identifiers are always converted to lowercase, so my_db is the same as MY_DB, unless explicitly quoted as "MY_DB".
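The folding rule in the note above can be modelled in a few lines of Python. This is only an illustrative sketch of the documented behaviour; `normalize_identifier` is a hypothetical helper, not part of SQream DB or any of its client libraries.

```python
def normalize_identifier(name: str) -> str:
    """Model of the documented rule: unquoted names fold to lowercase."""
    if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
        return name[1:-1]  # double-quoted: case is kept as written
    return name.lower()    # unquoted: converted to lowercase


print(normalize_identifier("MY_DB"))    # my_db
print(normalize_identifier("my_db"))    # my_db
print(normalize_identifier('"MY_DB"'))  # MY_DB
```

Under this model, MY_DB and my_db name the same database, while "MY_DB" names a different one.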

3.2.2.7.6 CREATE EXTERNAL TABLE

Warning: The CREATE EXTERNAL TABLE syntax is deprecated, and will be removed in future versions. Starting with SQream DB v2020.2, external tables have been renamed to foreign tables, and use a more flexible foreign data wrapper concept. See CREATE FOREIGN TABLE instead. Upgrading to a new version of SQream DB converts existing tables automatically. When creating new external tables, use the new foreign table syntax.

CREATE EXTERNAL TABLE creates a new external table in an existing database. See more in the External tables guide.

Tip:
• Data in an external table can change if the sources change, and frequent access to remote files may harm performance.
• To create a regular table, see CREATE TABLE.

3.2.2.7.6.1 Permissions

The role must have the CREATE permission at the database level.

3.2.2.7.6.2 Syntax

create_table_statement ::=
    CREATE [ OR REPLACE ] EXTERNAL TABLE [schema_name.]table_name (
        { column_def [, ...] }
    )
    USING FORMAT format_def
    WITH { external_table_option [ ...] }
    ;

schema_name ::= identifier

table_name ::= identifier

format_def ::= { PARQUET | ORC | CSV }

external_table_option ::= {
    PATH '{ path_spec }'
    | FIELD DELIMITER '{ field_delimiter }'
    | RECORD DELIMITER '{ record_delimiter }'
    | AWS_ID '{ AWS ID }'
    | AWS_SECRET '{ AWS SECRET }'
}

path_spec ::= { local filepath | S3 URI | HDFS URI }

field_delimiter ::= delimiter_character

record_delimiter ::= delimiter_character

column_def ::= { column_name type_name [ default ] [ column_constraint ] }

column_name ::= identifier

column_constraint ::= { NOT NULL | NULL }

default ::=
    DEFAULT default_value
    | IDENTITY [ ( start_with [ , increment_by ] ) ]

3.2.2.7.6.3 Parameters

Parameter | Description
OR REPLACE | Create a new table, and overwrite any existing table by the same name. Does not return an error if the table already exists. CREATE OR REPLACE does not check the table contents or structure, only the table name.
schema_name | The name of the schema in which to create the table.
table_name | The name of the table to create, which must be unique inside the schema.
column_def | A comma separated list of column definitions. A minimal column definition includes a name identifier and a datatype. Other column constraints and default values can be added optionally.
USING FORMAT ... | Specifies the format of the source files, such as PARQUET, ORC, or CSV.
WITH PATH ... | Specifies a path or URI of the source files, such as /path/to/*.parquet.
FIELD DELIMITER | Specifies the field delimiter for CSV files. Defaults to ,.
RECORD DELIMITER | Specifies the record delimiter for CSV files. Defaults to a newline, \n.
AWS_ID, AWS_SECRET | Credentials for authenticated S3 access.

3.2.2.7.6.4 Examples

3.2.2.7.6.5 A simple table from Tab-delimited file (TSV)

CREATE OR REPLACE EXTERNAL TABLE cool_animals (id INT NOT NULL, name VARCHAR(30) NOT NULL, weight FLOAT NOT NULL) USING FORMAT csv WITH PATH '/home/rhendricks/cool_animals.csv' FIELD DELIMITER '\t';

3.2.2.7.6.6 A table from a directory of Parquet files on HDFS

CREATE EXTERNAL TABLE users (id INT NOT NULL, name VARCHAR(30) NOT NULL, email VARCHAR(50) NOT NULL) USING FORMAT Parquet WITH PATH 'hdfs://hadoop-nn.piedpiper.com/rhendricks/users/*.parquet';

3.2.2.7.6.7 A table from a bucket of files on S3

CREATE EXTERNAL TABLE users (id INT NOT NULL, name VARCHAR(30) NOT NULL, email VARCHAR(50) NOT NULL) USING FORMAT Parquet WITH PATH 's3://pp-secret-bucket/users/*.parquet' AWS_ID 'our_aws_id' AWS_SECRET 'our_aws_secret';

3.2.2.7.6.8 Changing an external table to a regular table

Materializes an external table into a regular table.

CREATE TABLE real_table AS SELECT * FROM external_table;

3.2.2.7.7 CREATE FOREIGN TABLE

Note: Starting with SQream DB v2020.2, external tables have been renamed to foreign tables, and use a more flexible foreign data wrapper concept. Upgrading to a new version of SQream DB converts existing external tables automatically.

CREATE FOREIGN TABLE creates a new foreign table in an existing database. See more in the Foreign tables guide.

Tip: • Data in a foreign table can change if the sources change, and frequent access to remote files may harm performance. • To create a regular table, see CREATE TABLE

3.2.2.7.7.1 Permissions

The role must have the CREATE permission at the database level.

3.2.2.7.7.2 Syntax

create_table_statement ::=
    CREATE [ OR REPLACE ] FOREIGN TABLE [schema_name.]table_name (
        { column_def [, ...] }
    )
    [ FOREIGN DATA ] WRAPPER fdw_name
    [ OPTIONS ( option_def [, ... ] ) ]
    ;

schema_name ::= identifier

table_name ::= identifier

fdw_name ::= { csv_fdw | orc_fdw | parquet_fdw }

option_def ::= {
    LOCATION = '{ path_spec }'
    | DELIMITER = '{ field_delimiter }' -- for CSV only
    | RECORD_DELIMITER = '{ record_delimiter }' -- for CSV only
    | AWS_ID = '{ AWS ID }'
    | AWS_SECRET = '{ AWS SECRET }'
}

path_spec ::= { local filepath | S3 URI | HDFS URI }

field_delimiter ::= delimiter_character

record_delimiter ::= delimiter_character

column_def ::= { column_name type_name [ default ] [ column_constraint ] }

column_name ::= identifier

column_constraint ::= { NOT NULL | NULL }

default ::=
    DEFAULT default_value
    | IDENTITY [ ( start_with [ , increment_by ] ) ]

3.2.2.7.7.3 Parameters

Parameter | Description
OR REPLACE | Create a new table, and overwrite any existing table by the same name. Does not return an error if the table already exists. CREATE OR REPLACE does not check the table contents or structure, only the table name.
schema_name | The name of the schema in which to create the table.
table_name | The name of the table to create, which must be unique inside the schema.
column_def | A comma separated list of column definitions. A minimal column definition includes a name identifier and a datatype. Other column constraints and default values can be added optionally.
WRAPPER ... | Specifies the format of the source files, such as parquet_fdw, orc_fdw, or csv_fdw.
LOCATION = ... | Specifies a path or URI of the source files, such as /path/to/*.parquet.
DELIMITER = ... | Specifies the field delimiter for CSV files. Defaults to ,.
RECORD_DELIMITER = ... | Specifies the record delimiter for CSV files. Defaults to a newline, \n.
AWS_ID, AWS_SECRET | Credentials for authenticated S3 access.

3.2.2.7.7.4 Examples

3.2.2.7.7.5 A simple table from Tab-delimited file (TSV)

CREATE OR REPLACE FOREIGN TABLE cool_animals (id INT NOT NULL, name VARCHAR(30) NOT NULL, weight FLOAT NOT NULL) WRAPPER csv_fdw OPTIONS ( LOCATION = '/home/rhendricks/cool_animals.csv', DELIMITER = '\t' ) ;

3.2.2.7.7.6 A table from a directory of Parquet files on HDFS

CREATE FOREIGN TABLE users (id INT NOT NULL, name VARCHAR(30) NOT NULL, email VARCHAR(50) NOT NULL) WRAPPER parquet_fdw OPTIONS ( LOCATION = 'hdfs://hadoop-nn.piedpiper.com/rhendricks/users/*.parquet' );

3.2.2.7.7.7 A table from a bucket of ORC files on S3

CREATE FOREIGN TABLE users (id INT NOT NULL, name VARCHAR(30) NOT NULL, email VARCHAR(50) NOT NULL) WRAPPER orc_fdw OPTIONS ( LOCATION = 's3://pp-secret-bucket/users/*.orc', AWS_ID = 'our_aws_id', AWS_SECRET = 'our_aws_secret' );

3.2.2.7.7.8 Changing a foreign table to a regular table

Materializes a foreign table into a regular table.

CREATE TABLE real_table AS SELECT * FROM some_foreign_table;

3.2.2.7.8 CREATE FUNCTION

CREATE FUNCTION creates a new user-defined function (UDF) in an existing database. See more in our Python UDF (user-defined functions) guide.

3.2.2.7.8.1 Permissions

The role must have the CREATE FUNCTION permission at the database level.

3.2.2.7.8.2 Syntax

create_function_statement ::=
    CREATE [ OR REPLACE ] FUNCTION function_name (argument_list)
    RETURNS return_type
    AS $$
    { function_body }
    $$ LANGUAGE python
    ;

function_name ::= identifier

argument_list ::= { value_name type_name [, ...] }

value_name ::= identifier

return_type ::= type_name

function_body ::= Valid Python code

3.2.2.7.8.3 Parameters

Parameter | Description
OR REPLACE | Create a new function, and overwrite any existing function by the same name. Does not return an error if the function already exists. CREATE OR REPLACE does not check the function contents or structure, only the function name.
function_name | The name of the function to create, which must be unique inside the database.
argument_list | A comma separated list of argument definitions. Each definition includes a name identifier and a datatype.
return_type | The SQL datatype of the return value, such as INT, VARCHAR, etc.
function_body | Python code, dollar-quoted ($$).

3.2.2.7.8.4 Examples

3.2.2.7.8.5 Calculate distance between two points

CREATE OR REPLACE FUNCTION my_distance (x1 float, y1 float, x2 float, y2 float) RETURNS FLOAT AS $$
import math
if y1 < x1:
    return 0.0
else:
    return math.sqrt((y2 - y1) ** 2 + (x2 - x1) ** 2)
$$ LANGUAGE PYTHON;

-- Usage:
SELECT city, my_location, my_distance(x1,y1,x2,y2) from cities;
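Because the function body is plain Python, it can be useful to check it outside the database before creating the UDF. The sketch below wraps the same body in an ordinary Python function; the wrapper and the sample inputs are illustrative assumptions, not part of SQream DB.

```python
import math


def my_distance(x1, y1, x2, y2):
    # Same logic as the UDF body, including its early return when y1 < x1.
    if y1 < x1:
        return 0.0
    return math.sqrt((y2 - y1) ** 2 + (x2 - x1) ** 2)


print(my_distance(0.0, 0.0, 3.0, 4.0))  # 5.0
print(my_distance(2.0, 1.0, 5.0, 5.0))  # 0.0, because y1 < x1
```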

3.2.2.7.8.6 Calling files from other locations

-- Our script my_code.py is in ~/my_python_stuff

CREATE FUNCTION write_log() RETURNS INT AS $$
import sys
sys.path.append("/home/user/my_python_stuff")
import my_code as f
f.main()
return 1
$$ LANGUAGE PYTHON;

-- Usage:
SELECT write_log();

3.2.2.7.9 CREATE SCHEMA

The CREATE SCHEMA page describes the following:

• Overview
• Permissions
• Syntax
• Parameters
• Examples
  – Creating a Schema
  – Altering the Default Schema for a Role

3.2.2.7.9.1 Overview

CREATE SCHEMA creates a new schema in an existing database. A schema is a virtual space for storing tables. The default schema in SQream DB is public.

Tip: Use schemas to separate between use-cases, such as staging and production.

Tables in different schemas can be queried in the same statement by fully qualifying their names, without providing an alias, as in the following example:

select schema1.table1.column from schema1.table1 join schema2.table1 on schema1.table1.column1=schema2.table1.column1

See also: DROP SCHEMA, ALTER DEFAULT SCHEMA.

3.2.2.7.9.2 Permissions

The role must have the CREATE permission at the database level.

3.2.2.7.9.3 Syntax

The following example shows the correct syntax for creating a schema:

create_schema_statement ::=
    CREATE SCHEMA schema_name
    ;

schema_name ::= identifier

3.2.2.7.9.4 Parameters

The following table shows the schema_name parameters:

Parameter | Description
schema_name | The name of the schema to create.

3.2.2.7.9.5 Examples

This section includes the following examples:

• Creating a Schema
• Altering the Default Schema for a Role

3.2.2.7.9.6 Creating a Schema

The following example creates a schema, copies a table into it, and queries the copy:

CREATE SCHEMA staging;

CREATE TABLE staging.users AS SELECT * FROM public.users;

SELECT * FROM staging.users;

3.2.2.7.9.7 Altering the Default Schema for a Role

The following example changes the default schema for a role:

SELECT * FROM users; -- Refers to public.users

ALTER DEFAULT SCHEMA FOR bgilfoyle TO staging;

SELECT * FROM users; -- Now refers to staging.users, rather than public.users

3.2.2.7.10 CREATE TABLE

CREATE TABLE creates a new table in an existing database.

Tip:
• To create a table based on the result of a select query, see CREATE TABLE AS.
• To create a table based on files like Parquet and ORC, see CREATE FOREIGN TABLE.

3.2.2.7.10.1 Permissions

The role must have the CREATE permission at the schema level.

3.2.2.7.10.2 Syntax

The following is the correct syntax when creating a table:

create_table_statement ::=
    CREATE [ OR REPLACE ] TABLE [schema_name.]table_name (
        { column_def [, ...] }
    )
    [ CLUSTER BY { column_name [, ...] } ]
    ;

schema_name ::= identifier

table_name ::= identifier

column_def ::= { column_name type_name [ default ] [ column_constraint ] }

column_name ::= identifier

column_constraint ::= { NOT NULL | NULL }

default ::=
    DEFAULT default_value
    | IDENTITY [ ( start_with [ , increment_by ] ) ]

3.2.2.7.10.3 Parameters

The following parameters can be used when creating a table:

Parameter | Description
OR REPLACE | Creates a new table and overwrites any existing table by the same name. Does not return an error if the table already exists. CREATE OR REPLACE does not check the table contents or structure, only the table name.
schema_name | The name of the schema in which to create the table.
table_name | The name of the table to create, which must be unique inside the schema.
column_def | A comma separated list of column definitions. A minimal column definition includes a name identifier and a datatype. Other column constraints and default values can be added optionally.
CLUSTER BY column_name1 ... | A comma separated list of clustering column keys. See Data clustering for more information.
LIKE | Duplicates the column structure of an existing table.

3.2.2.7.10.4 Default Value Constraints

The DEFAULT value constraint specifies a value to use if a value isn’t defined in an INSERT or COPY FROM statement. The value may be either a literal or GETDATE(), which is evaluated at the time the row is created.

Note: The DEFAULT constraint only applies if the column does not have a value specified in the INSERT or COPY FROM statement. You can still insert a NULL into a nullable column by explicitly inserting NULL. For example, INSERT INTO cool_animals VALUES (1, 'Gnu', NULL).
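The difference between omitting a column and explicitly supplying NULL can be modelled as follows. This is a simplified sketch of the rule described in the note, not SQream DB's implementation; `resolve_value` and the `_OMITTED` sentinel are hypothetical names used only for illustration.

```python
_OMITTED = object()  # sentinel: the column was not mentioned in the statement


def resolve_value(supplied=_OMITTED, default=None):
    """Return the value stored for one column of one new row."""
    if supplied is _OMITTED:
        return default  # no value given, so the DEFAULT applies
    return supplied     # a value was given, even if it is None (NULL)


print(resolve_value(default=2))                 # 2
print(resolve_value(supplied=None, default=2))  # None
print(resolve_value(supplied=8, default=2))     # 8
```

The second call corresponds to an explicit NULL: the default is not used, so the column must be nullable for the insert to be accepted.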

3.2.2.7.10.5 Syntax

The following is the correct syntax when using the DEFAULT value constraints:

column_def ::= { column_name type_name [ default ] [ column_constraint ] }

column_constraint ::= { NOT NULL | NULL }

default ::=
    DEFAULT default_value
    | IDENTITY [ ( start_with [ , increment_by ] ) ]

check_specification ::=
    CHECK( 'CS compression_spec' )

compression_spec ::= { "default" | "p4d" | "dict" | "rle" | "sequence" | "flat" }

3.2.2.7.10.6 Identity

Identity (or sequence) columns can be used for generating key values. Some databases call this AUTOINCREMENT.

The identity property on a column guarantees that the value for each newly inserted row is generated from the current seed and increment.

Warning: The identity property on a column does not guarantee uniqueness. The identity value can be bypassed by specifying it in an INSERT command.

The following table describes the identity parameters:

Parameter | Description
start_with | A value that is used for the very first row loaded into the table.
increment_by | Incremental value that is added to the identity value of the previous row that was loaded.
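As a rough model of the two parameters above, the values generated for consecutive rows form an arithmetic sequence. The sketch below is an assumption-based illustration only (`identity_values` is a hypothetical helper, and real loads may not assign values in exactly this way); it also shows why the identity property alone does not give uniqueness.

```python
def identity_values(start_with, increment_by, row_count):
    """Return the identity values generated for row_count consecutive rows."""
    return [start_with + i * increment_by for i in range(row_count)]


generated = identity_values(0, 1, 3)
print(generated)  # [0, 1, 2]

# An INSERT that supplies its own value bypasses the generator,
# so a duplicate can appear in the column.
stored = generated + [1]
print(len(stored) != len(set(stored)))  # True
```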

3.2.2.7.10.7 Examples

3.2.2.7.10.8 Standard Table

The following is an example of the syntax used to create a standard table:

CREATE TABLE cool_animals ( id INT NOT NULL, name varchar(30) NOT NULL, weight FLOAT, is_agressive BOOL );

3.2.2.7.10.9 Table with Default Value Constraints for Some Columns

The following is an example of the syntax used to create a table with default value constraints for some columns:

CREATE TABLE cool_animals ( id INT NOT NULL, name varchar(30) NOT NULL, weight FLOAT, is_agressive BOOL DEFAULT false NOT NULL );

Note: The nullable/non-nullable constraint appears at the end, after the default option

3.2.2.7.10.10 Table with an Identity Column

The following is an example of the syntax used to create a table with an identity (auto-increment) column:

CREATE TABLE users (
    id BIGINT IDENTITY(0,1) NOT NULL, -- Start with 0, increment by 1
    name VARCHAR(30) NOT NULL,
    country VARCHAR(30) DEFAULT 'Unknown' NOT NULL
);

Note:
• Identity columns are supported on BIGINT columns.
• Identity does not enforce the uniqueness of values. The identity value can be bypassed by specifying it in an INSERT command.

3.2.2.7.10.11 Creating a Table from a SELECT Query

Use a CREATE TABLE AS statement to create a new table from the results of a SELECT query, as in the following example:

CREATE TABLE users_uk AS SELECT * FROM users WHERE country = 'United Kingdom';

3.2.2.7.10.12 Creating a Table with a Clustering Key

When data in a table is stored in a sorted order, the sorted columns are considered clustered. Good clustering can have a significant positive impact on performance.

In the following example, the start_date column is expected to be naturally clustered, as new users sign up and get a newer start date. When the clustering key is set, if the incoming data isn’t naturally clustered, it will be clustered by SQream DB during insert or bulk load. See Data clustering for more information.

The following example creates a table with a clustering key:

CREATE TABLE users ( name VARCHAR(30) NOT NULL, start_date datetime not null, country VARCHAR(30) DEFAULT 'Unknown' NOT NULL ) CLUSTER BY start_date;

3.2.2.7.11 CREATE TABLE AS

CREATE TABLE AS creates a new table from the result of a select query.

3.2.2.7.11.1 Permissions

The role must have the CREATE permission at the schema level, as well as SELECT permissions for any tables referenced by the statement.

3.2.2.7.11.2 Syntax

create_table_statement ::=
    CREATE [ OR REPLACE ] TABLE [schema_name.]table_name AS query
    ;

schema_name ::= identifier

table_name ::= identifier

3.2.2.7.11.3 Parameters

Parameter | Description
OR REPLACE | Create a new table, and overwrite any existing table by the same name. Does not return an error if the table already exists. CREATE OR REPLACE does not check the table contents or structure, only the table name.
schema_name | The name of the schema in which to create the table.
table_name | The name of the table to create, which must be unique inside the schema.
query | A select query that returns data.

3.2.2.7.11.4 Examples

3.2.2.7.11.5 Create a copy of a foreign table or view

CREATE TABLE users AS SELECT * FROM users_source;

3.2.2.7.11.6 Filtering

CREATE TABLE users_uk AS SELECT * FROM users WHERE country = 'United Kingdom';

3.2.2.7.11.7 Adding columns

CREATE TABLE users_uk_new AS SELECT GETDATE() as "Date",*,false as is_new FROM users_uk;

3.2.2.7.11.8 Creating a table from values

CREATE TABLE new_users AS VALUES(GETDATE(),'Richard','Foxworthy','1984-03-03',True);

3.2.2.7.12 CREATE VIEW

CREATE VIEW creates a new view in an existing database. A view is a virtual table.

Tip:
• Use views to simplify complex queries or present only partial data to specific roles.
• If an underlying table has changed (new columns, changed names, etc.), a view may be invalidated. To recompile the view, see SELECT RECOMPILE_VIEW().

3.2.2.7.12.1 Permissions

The role must have the CREATE permission at the database level, as well as SELECT permissions for any tables referenced by the view.

3.2.2.7.12.2 Syntax

create_view_statement ::=
    CREATE VIEW [schema_name.]view_name [ column_list ] AS query
    ;

schema_name ::= identifier

view_name ::= identifier

column_list ::= ( { column_name [, ...] } )

column_name ::= identifier

3.2.2.7.12.3 Parameters

Parameter | Description
schema_name | The name of the schema in which to create the view.
view_name | The name of the view to create, which must be unique inside the schema.
column_list | An optional comma separated list of column names for the view. If specified, these column names will override the column names in the response of the query in the AS query statement.
AS query | The select query to execute when the view is referenced.

3.2.2.7.12.4 Examples

3.2.2.7.12.5 A simple view

CREATE VIEW only_agressive_animals AS SELECT * FROM cool_animals WHERE is_agressive=true;

SELECT * FROM only_agressive_animals;

3.2.2.7.12.6 Overriding default column names

CREATE VIEW only_relaxed_animals (animal_id, animal_name, should_i_worry) AS SELECT id, name, is_agressive FROM cool_animals WHERE is_agressive=false;

3.2.2.7.13 DROP CLUSTERING KEY

DROP CLUSTERING KEY drops all clustering keys in a table. Read our Data clustering guide for more information. See also: CLUSTER BY, CREATE TABLE.

3.2.2.7.13.1 Permissions

The role must have the DDL permission at the database or table level.

3.2.2.7.13.2 Syntax

alter_table_rename_table_statement ::=
    ALTER TABLE [schema_name.]table_name DROP CLUSTERING KEY
    ;

table_name ::= identifier

3.2.2.7.13.3 Parameters

Parameter | Description
schema_name | The schema name for the table. Defaults to public if not specified.
table_name | The table name to apply the change to.

3.2.2.7.13.4 Usage notes

Removing clustering keys does not affect existing data. To force data to re-cluster, the table has to be recreated (i.e. with CREATE TABLE AS).

3.2.2.7.13.5 Examples

3.2.2.7.13.6 Dropping clustering keys in a table

ALTER TABLE public.users DROP CLUSTERING KEY;

3.2.2.7.14 DROP COLUMN

DROP COLUMN can be used to remove columns from a table.

3.2.2.7.14.1 Permissions

The role must have the DDL permission at the database or table level.

3.2.2.7.14.2 Syntax

alter_table_drop_column_statement ::=
    ALTER TABLE [schema_name.]table_name DROP COLUMN column_name
    ;

table_name ::= identifier

schema_name ::= identifier

column_name ::= identifier

3.2.2.7.14.3 Parameters

Parameter | Description
schema_name | The schema name for the table. Defaults to public if not specified.
table_name | The table name to apply the change to.
column_name | The column to remove.

3.2.2.7.14.4 Examples

3.2.2.7.14.5 Removing a column

-- Remove the 'weight' column
ALTER TABLE users DROP COLUMN weight;

3.2.2.7.14.6 Removing a column with a quoted identifier name

ALTER TABLE users DROP COLUMN "Weight in kilograms";

3.2.2.7.15 DROP DATABASE

DROP DATABASE can be used to remove a database and all of its objects.

3.2.2.7.15.1 Permissions

The role must have the DDL permission at the database level.

3.2.2.7.15.2 Syntax

drop_database_statement ::=
    DROP DATABASE database_name
    ;

database_name ::= identifier

3.2.2.7.15.3 Parameters

Parameter | Description
database_name | The name of the database to drop. This cannot be the current database in use.

3.2.2.7.15.4 Examples

3.2.2.7.15.5 Dropping a database and all of its objects

master=> DROP DATABASE raviga;
executed

3.2.2.7.15.6 Dropping the current database

The current database in use can’t be dropped. Switch to another database first.

raviga=> DROP DATABASE raviga;
Current open database 'raviga' cannot be dropped.
raviga=> \c master
master=> DROP DATABASE raviga;
executed

3.2.2.7.16 DROP FUNCTION

DROP FUNCTION can be used to remove a user-defined function (UDF).

3.2.2.7.16.1 Permissions

The role must have the DDL permission at the database level.

3.2.2.7.16.2 Syntax

drop_function_statement ::=
    DROP FUNCTION [ IF EXISTS ] function_name()
    ;

function_name ::= identifier

3.2.2.7.16.3 Parameters

Parameter | Description
IF EXISTS | Drop the function if it exists. Does not error if the function does not exist.
function_name() | The name of the function to drop.

3.2.2.7.16.4 Examples

3.2.2.7.16.5 Dropping a function

DROP FUNCTION my_distance();

3.2.2.7.16.6 Dropping a function (always succeeds)

farm=> DROP FUNCTION my_distance();
executed
farm=> DROP FUNCTION my_distance();
Function 'my_distance' not found

-- This will succeed, even though the function does not exist
farm=> DROP FUNCTION IF EXISTS my_distance();
executed

3.2.2.7.17 DROP SCHEMA

DROP SCHEMA can be used to remove a schema.

The schema has to be empty before removal. SQream DB does not support dropping a schema with objects. See also: CREATE SCHEMA, ALTER DEFAULT SCHEMA.

3.2.2.7.17.1 Permissions

The role must have the DDL permission at the database level.

3.2.2.7.17.2 Syntax

drop_schema_statement ::=
    DROP SCHEMA schema_name
    ;

schema_name ::= identifier

3.2.2.7.17.3 Parameters

Parameter | Description
schema_name | The name of the schema to drop.

3.2.2.7.17.4 Examples

3.2.2.7.17.5 Dropping a schema

DROP SCHEMA test;

3.2.2.7.17.6 Dropping a schema if it’s not empty

If a schema contains several tables, SQream DB will alert you that these tables need to be dropped first. This prevents accidental dropping of full schemas.

t=> DROP SCHEMA test;
Schema 'test' contains the following objects:
Tables - 'test.foo' , 'test.bar'
Please drop its content and then try again.

To drop the schema, drop the schema’s tables first, and then drop the schema:

t=> DROP TABLE test.foo;
executed
t=> DROP TABLE test.bar;
executed
t=> DROP SCHEMA test;
executed
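When a schema holds many tables, the statements can be generated in the required order with a small script. The following is a minimal sketch that only builds the statement text; `drop_schema_statements` is a hypothetical helper, the list of tables is assumed to be known in advance, and any other objects in the schema would need to be handled separately.

```python
def drop_schema_statements(schema, tables):
    """Return DROP statements in the required order: tables first, then the schema."""
    statements = [f"DROP TABLE {schema}.{table};" for table in tables]
    statements.append(f"DROP SCHEMA {schema};")
    return statements


for statement in drop_schema_statements("test", ["foo", "bar"]):
    print(statement)
```

Each generated statement can then be run through the client of your choice.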

3.2.2.7.18 DROP TABLE

DROP TABLE can be used to remove a table and all of its contents.

3.2.2.7.18.1 Permissions

The role must have the DDL permission at the database or table level.

3.2.2.7.18.2 Syntax

drop_table_statement ::=
    DROP TABLE [ IF EXISTS ] [schema_name.]table_name
    ;

table_name ::= identifier

schema_name ::= identifier

3.2.2.7.18.3 Parameters

Parameter | Description
IF EXISTS | Drop the table if it exists. Does not error if the table does not exist.
schema_name | The name of the schema from which to drop the table.
table_name | The name of the table to drop.

3.2.2.7.18.4 Examples

3.2.2.7.18.5 Dropping a table

DROP TABLE cool_animals;

3.2.2.7.18.6 Dropping a table (always succeeds)

farm=> DROP TABLE cool_animals;
executed
farm=> DROP TABLE cool_animals;
Table 'public.cool_animals' not found

-- This will succeed, even though the table does not exist
farm=> DROP TABLE IF EXISTS cool_animals;
executed

3.2.2.7.19 DROP VIEW

DROP VIEW can be used to remove a view. Because a view is logical, this does not affect any data in any of the referenced tables.

3.2.2.7.19.1 Permissions

The role must have the DDL permission at the database level.

3.2.2.7.19.2 Syntax

drop_view_statement ::=
    DROP VIEW [ IF EXISTS ] [schema_name.]view_name
    ;

view_name ::= identifier

schema_name ::= identifier

3.2.2.7.19.3 Parameters

Parameter | Description
IF EXISTS | Drop the view if it exists. Does not error if the view does not exist.
schema_name | The name of the schema from which to drop the view.
view_name | The name of the view to drop.

3.2.2.7.19.4 Examples

3.2.2.7.19.5 Dropping a view

DROP VIEW angry_animals;

3.2.2.7.19.6 Dropping a view (always succeeds)

farm=> DROP VIEW angry_animals;
executed
farm=> DROP VIEW angry_animals;
View 'public.angry_animals' not found

-- This will succeed, even though the view does not exist
farm=> DROP VIEW IF EXISTS angry_animals;
executed

3.2.2.7.20 RENAME COLUMN

RENAME COLUMN can be used to rename columns in a table.

3.2.2.7.20.1 Permissions

The role must have the DDL permission at the database or table level.

3.2.2.7.20.2 Syntax

alter_table_rename_column_statement ::=
    ALTER TABLE [schema_name.]table_name RENAME COLUMN current_name TO new_name
    ;

table_name ::= identifier

schema_name ::= identifier

current_name ::= identifier

new_name ::= identifier

3.2.2.7.20.3 Parameters

Parameter | Description
schema_name | The schema name for the table. Defaults to public if not specified.
table_name | The table name to apply the change to.
current_name | The column to rename.
new_name | The new column name.

3.2.2.7.20.4 Examples

3.2.2.7.20.5 Renaming a column

-- Rename the 'weight' column to 'mass'
ALTER TABLE users RENAME COLUMN weight TO mass;

3.2.2.7.20.6 Renaming a quoted name

ALTER TABLE users RENAME COLUMN "mass" TO "Mass (Kilograms)";

3.2.2.7.21 RENAME TABLE

RENAME TABLE can be used to rename a table.

Warning: Renaming a table can void existing views that use this table. See more about recompiling views.

3.2.2.7.21.1 Permissions

The role must have the DDL permission at the database or table level.

3.2.2.7.21.2 Syntax

alter_table_rename_table_statement ::=
    ALTER TABLE [schema_name.]current_name RENAME TO new_name
    ;

current_name ::= identifier

schema_name ::= identifier

new_name ::= identifier

3.2.2.7.21.3 Parameters

Parameter | Description
schema_name | The schema name for the table. Defaults to public if not specified.
current_name | The table name to apply the change to.
new_name | The new table name.

3.2.2.7.21.4 Examples

3.2.2.7.21.5 Renaming a table

ALTER TABLE public.users RENAME TO former_users;

3.2.2.7.22 COPY FROM

COPY ... FROM is a statement that allows loading data from files on the filesystem and importing them into SQream tables. This is the recommended way for bulk loading CSV files into SQream DB. In general, COPY moves data between filesystem files and SQream DB tables.

Note:
• Learn how to migrate from CSV files in the Insert from CSV guide.
• To copy data from a table to a file, see COPY TO.
• To load Parquet or ORC files, see CREATE FOREIGN TABLE.

3.2.2.7.22.1 Permissions

The role must have the INSERT permission to the destination table.

3.2.2.7.22.2 Syntax

COPY [schema_name.]table_name FROM WRAPPER fdw_name OPTIONS ( [ copy_from_option [, ...]] ) ;

schema_name ::= identifier

table_name ::= identifier

copy_from_option ::=

LOCATION = { filename | S3 URI | HDFS URI }

| OFFSET = { offset }

| LIMIT = { limit }

| DELIMITER = '{ delimiter }'

| RECORD_DELIMITER = '{ record delimiter }'

| ERROR_LOG = '{ local filepath }'

| REJECTED_DATA = '{ local filepath }'

| CONTINUE_ON_ERROR = { true | false }

| ERROR_COUNT = '{ error count }'

| DATETIME_FORMAT = '{ parser format }'

| AWS_ID = '{ AWS ID }'

| AWS_SECRET = '{ AWS Secret }'

offset ::= positive integer

limit ::= positive integer

delimiter ::= string

record delimiter ::= string

error count ::= integer

parser format ::= see the Supported Date Formats table below

AWS ID ::= string

AWS Secret ::= string

Note: Some options are applicable to CSVs only. These include: OFFSET, LIMIT, DELIMITER, RECORD_DELIMITER, REJECTED_DATA, DATETIME_FORMAT
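When COPY statements are generated by a script, the option list can be assembled programmatically. The sketch below is a hypothetical Python helper (build_copy_from and format_option_value are illustrative names, not part of SQream DB or any client library). It assumes that string values are single-quoted with embedded single quotes doubled, and that booleans and integers are written bare, as in the examples in this section.

```python
def format_option_value(value):
    """Render a Python value as a COPY option literal (assumed formatting)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    # Strings are single-quoted; embedded single quotes are doubled.
    return "'" + str(value).replace("'", "''") + "'"


def build_copy_from(table, wrapper, **options):
    """Build a COPY ... FROM WRAPPER statement string (hypothetical helper)."""
    rendered = ", ".join(
        f"{name} = {format_option_value(value)}" for name, value in options.items()
    )
    return f"COPY {table} FROM WRAPPER {wrapper} OPTIONS ({rendered});"


statement = build_copy_from(
    "table_name", "csv_fdw", location="/tmp/file.psv", delimiter="|", offset=2
)
print(statement)
```

The helper does not produce E'...' escape literals, so non-printable delimiters would still need to be written by hand.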


3.2.2.7.22.3 Elements

Parameter: [schema_name.]table_name
  Default value: None
  Description: Table to copy data into.

Parameter: fdw_name
  Value range: csv_fdw, orc_fdw, or parquet_fdw
  Description: The name of the Foreign Data Wrapper to use.

Parameter: LOCATION
  Default value: None
  Description: A path on the local filesystem, or an S3 or HDFS URI. For example, /tmp/foo.csv, s3://my-bucket/foo.csv, or hdfs://my-namenode:8020/foo.csv. The local path must be an absolute path that SQream DB can access. Wildcards are permitted in this field.

Parameter: OFFSET
  Default value: 1
  Value range: >1, but no more than the number of lines in the first file
  Description: The row number to start with. The first row is 1.

Parameter: LIMIT
  Default value: unlimited
  Value range: 1 to 2147483647
  Description: When specified, tells SQream DB to stop loading after the specified number of rows. Unlimited if unset.

Parameter: DELIMITER
  Default value: ','
  Value range: Almost any ASCII character. See the field delimiters section below.
  Description: Specifies the field terminator: the character (or characters) that separates fields or columns within each row of the file.

Parameter: RECORD_DELIMITER
  Default value: \n (UNIX style newline)
  Value range: \n, \r\n, \r
  Description: Specifies the row terminator: the character that separates lines or rows, also known as a new line separator.

Parameter: ERROR_LOG
  Default value: No error log
  Description: When used, the COPY process will write error information from unparsable rows to the file specified by this parameter.
    • If an existing file path is specified, it will be overwritten.
    • Specifying the same file for ERROR_LOG and REJECTED_DATA is not allowed and will result in an error.
    • Specifying an error log when creating a foreign table will write a new error log for every query on the foreign table.

Parameter: REJECTED_DATA
  Default value: Inactive
  Description: When used, the COPY process will write the rejected record lines to this file.
    • If an existing file path is specified, it will be overwritten.
    • Specifying the same file for ERROR_LOG and REJECTED_DATA is not allowed and will result in an error.
    • Specifying an error log when creating a foreign table will write a new error log for every query on the foreign table.

Parameter: CONTINUE_ON_ERROR
  Default value: false
  Value range: true, false
  Description: Specifies if errors should be ignored or skipped. When set to true, the transaction will continue despite rejected data. This parameter should be set together with ERROR_COUNT. When reading multiple files, if an entire file can’t be opened, it will be skipped.

Parameter: ERROR_COUNT
  Default value: unlimited
  Value range: 1 to 2147483647
  Description: Specifies the threshold for the maximum number of faulty records that will be ignored. This setting must be used in conjunction with CONTINUE_ON_ERROR.

Parameter: DATETIME_FORMAT
  Default value: ISO8601 for all columns
  Value range: See table below
  Description: Allows specifying a non-default date format for specific columns.

Parameter: AWS_ID, AWS_SECRET
  Default value: None
  Description: Specifies the authentication details for secured S3 buckets.

3.2.2.7.22.4 Supported Date Formats

Table 14: Supported date parsers

Name              Pattern                          Examples
ISO8601, DEFAULT  YYYY-MM-DD [hh:mm:ss[.SSS]]      2017-12-31 11:12:13.456, 2018-11-02 11:05:00, 2019-04-04
ISO8601C          YYYY-MM-DD [hh:mm:ss[:SSS]]      2017-12-31 11:12:13:456
DMY               DD/MM/YYYY [hh:mm:ss[.SSS]]      31/12/2017 11:12:13.123
YMD               YYYY/MM/DD [hh:mm:ss[.SSS]]      2017/12/31 11:12:13.678
MDY               MM/DD/YYYY [hh:mm:ss[.SSS]]      12/31/2017 11:12:13.456
YYYYMMDD          YYYYMMDD[hh[mm[ss[SSS]]]]        20171231111213456
YYYY-M-D          YYYY-M-D[ h:m[:s[.S]]]           2017-9-10 10:7:21.1 (optional leading zeroes)
YYYY/M/D          YYYY/M/D[ h:m[:s[.S]]]           2017/9/10 10:7:21.1 (optional leading zeroes)
DD-mon-YYYY       DD-mon-YYYY[ hh:mm[:ss[.SSS]]]   31-Dec-2017 11:12:13.456
YYYY-mon-DD       YYYY-mon-DD[ hh:mm[:ss[.SSS]]]   2017-Dec-31 11:12:13.456

Pattern  Description
YYYY     four digit year representation (0000-9999)
MM       two digit month representation (01-12)
DD       two digit day of month representation (01-31)
mon      short month representation (Jan-Dec)
a        short day of week representation (Sun-Sat)
hh       two digit 24 hour representation (00-23)
h        two digit 12 hour representation (00-12)
P        uppercase AM/PM representation
mm       two digit minute representation (00-59)
ss       two digit seconds representation (00-59)
SSS      3 digits fraction representation for milliseconds (000-999)

Note: These date patterns are not the same as date parts used in the DATEPART function.
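To get a feel for what a few of these parsers accept, the sketch below maps some parser names to roughly equivalent Python strptime directives and normalizes the result to the ISO8601 form. The mapping and the to_iso8601 helper are assumptions made for illustration; they approximate the patterns in the table and are not SQream DB's parser. Only the full timestamp form (with time and milliseconds) is handled here.

```python
from datetime import datetime

# Approximate strptime equivalents for a few date parsers (illustrative only).
PARSER_TO_STRPTIME = {
    "ISO8601": "%Y-%m-%d %H:%M:%S.%f",
    "DMY": "%d/%m/%Y %H:%M:%S.%f",
    "YMD": "%Y/%m/%d %H:%M:%S.%f",
    "MDY": "%m/%d/%Y %H:%M:%S.%f",
    "DD-mon-YYYY": "%d-%b-%Y %H:%M:%S.%f",
}


def to_iso8601(value, parser):
    """Parse value with the named parser and return it as an ISO8601 string."""
    parsed = datetime.strptime(value, PARSER_TO_STRPTIME[parser])
    milliseconds = parsed.microsecond // 1000  # keep millisecond precision
    return parsed.strftime("%Y-%m-%d %H:%M:%S.") + f"{milliseconds:03d}"


print(to_iso8601("31/12/2017 11:12:13.123", "DMY"))
print(to_iso8601("12/31/2017 11:12:13.456", "MDY"))
```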

3.2.2.7.22.5 Supported Field Delimiters

Field delimiters can be one or more characters.

3.2.2.7.22.6 Customizing Quotations Using Alternative Characters

3.2.2.7.22.7 Syntax Example 1 - Customizing Quotations Using Alternative Characters

The following is the correct syntax for customizing quotations using alternative characters:


copy t from wrapper csv_fdw options (location = '/tmp/source_file.csv', quote='@');
copy t to wrapper csv_fdw options (location = '/tmp/destination_file.csv', quote='@');

3.2.2.7.22.8 Usage Example 1 - Customizing Quotations Using Alternative Characters

The following is an example of a line taken from a CSV file when customizing quotations using a character:

Pepsi-"Cola",@Coca-"Cola"@,Sprite,Fanta
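The effect of an alternative quotation character on the line above can be previewed locally. The sketch below uses Python's standard csv module as a stand-in for the SQream DB CSV parser, which is an assumption made only for illustration; split_with_quote is a hypothetical helper name.

```python
import csv


def split_with_quote(line, quote_char):
    """Split one CSV line, treating quote_char as the quotation character."""
    return next(csv.reader([line], quotechar=quote_char))


fields = split_with_quote('Pepsi-"Cola",@Coca-"Cola"@,Sprite,Fanta', "@")
print(fields)
```

With @ as the quotation character, the double quotes are ordinary characters and the surrounding @ signs are stripped from the second field.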

3.2.2.7.22.9 Syntax Example 2 - Customizing Quotations Using ASCII Character Codes

The following is the correct syntax for customizing quotations using ASCII character codes:

copy t from wrapper csv_fdw options (location = '/tmp/source_file.csv', quote=E'\064');
copy t to wrapper csv_fdw options (location = '/tmp/destination_file.csv', quote=E'\064');

3.2.2.7.22.10 Usage Example 2 - Customizing Quotations Using ASCII Character Codes

The following is an example of a line taken from a CSV file when customizing quotations using an ASCII character code:

Pepsi-"Cola",@Coca-"Cola"@,Sprite,Fanta

3.2.2.7.22.11 Multi-Character Delimiters

SQream DB supports multi-character field delimiters, which are sometimes found in non-standard files. For example, DELIMITER = '%%' or DELIMITER = '{~}'.

3.2.2.7.22.12 Printable Characters

Any printable ASCII character (or characters) can be used as a delimiter without special syntax. The default CSV field delimiter is a comma (,). A printable character is any ASCII character in the range 32 - 126. Literal quoting rules apply with delimiters. For example, to use ' as a field delimiter, use DELIMITER ''''

3.2.2.7.22.13 Non-Printable Characters

A non-printable character (1 - 31, 127) can be used in its octal form. A tab can be specified by escaping it, for example \t. Other non-printable characters can be specified using their octal representations, by using the E'\000' format, where 000 is the octal value of the character. For example, ASCII character 15, known as “shift in”, can be specified using E'\017'.
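Converting a character to the octal escape form described above is easy to get wrong by hand. The following sketch is a hypothetical Python helper (octal_delimiter_literal is not part of any SQream DB tooling) that formats a single ASCII character as an E'\ooo' literal.

```python
def octal_delimiter_literal(char):
    # Return the E'\ooo' literal for one ASCII character (codes 1-127).
    if len(char) != 1 or not 1 <= ord(char) <= 127:
        raise ValueError("expected a single ASCII character with code 1-127")
    # {:03o} formats the character code as a zero-padded, three-digit octal number.
    return "E'\\{:03o}'".format(ord(char))


print(octal_delimiter_literal(chr(15)))  # "shift in"
print(octal_delimiter_literal(chr(17)))  # DC1
print(octal_delimiter_literal("\t"))     # tab
```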


3.2.2.7.22.14 Unsupported Field Delimiters

The following ASCII field delimiters (octal range 001 - 176) are not supported:

Character  Decimal  Octal    Character  Decimal  Octal    Character  Decimal  Octal
"          34       42       a          97       141      p          112      160
-          45       55       b          98       142      q          113      161
.          46       56       c          99       143      r          114      162
:          58       72       d          100      144      s          115      163
\          92       134      e          101      145      t          116      164
0          48       60       f          102      146      u          117      165
1          49       61       g          103      147      v          118      166
2          50       62       h          104      150      w          119      167
3          51       63       i          105      151      x          120      170
4          52       64       j          106      152      y          121      171
5          53       65       k          107      153      z          122      172
6          54       66       l          108      154      N          78       116
7          55       67       m          109      155      \n         10       12
8          56       70       n          110      156      \r         13       15
9          57       71       o          111      157

3.2.2.7.22.15 Capturing Rejected Rows

Prior to the column process and storage, the COPY command parses the data. Whenever the data can’t be parsed because it is improperly formatted or doesn’t match the data structure, the entire record (or row) will be rejected.

1. When ERROR_LOG is not used, the COPY command will stop and roll back the transaction upon the first error.
2. When ERROR_LOG is set and ERROR_VERBOSITY is set to 1 (default), all errors and rejected rows are saved to the file path specified.
3. When ERROR_LOG is set and ERROR_VERBOSITY is set to 0, rejected rows are saved to the file path specified, but errors are not logged. This is useful for replaying the file later.


3.2.2.7.22.16 CSV Support

By default, SQream DB’s CSV parser handles RFC 4180 standard CSVs, but it can also be configured to support non-standard CSVs (with multi-character delimiters, unquoted fields, etc.). All CSV files should be prepared according to these recommendations:

• Files are UTF-8 or ASCII encoded.
• The field delimiter is an ASCII character or characters.
• The record delimiter, also known as a new line separator, is a Unix-style newline (\n), DOS-style newline (\r\n), or Mac-style newline (\r).
• Fields are optionally enclosed by double quotes, and must be quoted if they contain one of the following characters:
  – The record delimiter or field delimiter
  – A double quote character
  – A newline
• If a field is quoted, any double quote that appears in it must be doubled (similar to the string literals quoting rules). For example, to encode What are "birds"?, the field should appear as "What are ""birds""?". Other modes of escaping are not supported (e.g. 1,"What are \"birds\"?" is not a valid way of escaping CSV values).
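Files that follow the quoting recommendations above can be produced with most CSV libraries. As a minimal sketch, Python's standard csv module doubles embedded double quotes by default; to_csv_line is a hypothetical helper name, and the Unix-style line terminator is chosen here only to match the default record delimiter.

```python
import csv
import io


def to_csv_line(fields):
    """Serialize one record, quoting only the fields that need it."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerow(fields)
    return buffer.getvalue()


line = to_csv_line(["1", 'What are "birds"?'])
print(line, end="")
```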

3.2.2.7.22.17 Marking Null Markers

NULL values can be marked in two ways in the CSV:

• An explicit null marker. For example, col1,\N,col3
• An empty field delimited by the field delimiter. For example, col1,,col3

Note: If a text field is quoted but contains no content ("") it is considered an empty text field. It is not considered NULL.
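The three cases above (explicit marker, empty field, quoted empty field) can be summarized in a short sketch. The interpret_field function below is a hypothetical illustration of the rules as stated in this section, written in Python; it operates on one raw field and is not SQream DB's parser.

```python
from typing import Optional


def interpret_field(raw: str) -> Optional[str]:
    """Map one raw CSV field to None (NULL) or to its text value."""
    if raw == "" or raw == r"\N":
        return None  # empty unquoted field or explicit null marker
    if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
        # Quoted field: strip the quotes and un-double embedded quotes.
        return raw[1:-1].replace('""', '"')
    return raw


# A naive split is enough here because none of the fields contain a comma.
print([interpret_field(f) for f in r"col1,\N,col3".split(",")])
print([interpret_field(f) for f in 'col1,,""'.split(",")])
```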

3.2.2.7.22.18 Examples

3.2.2.7.22.19 Loading a Standard CSV File

COPY table_name FROM WRAPPER csv_fdw OPTIONS (location = '/tmp/file.csv');

3.2.2.7.22.20 Skipping Faulty Rows

COPY table_name FROM WRAPPER csv_fdw OPTIONS (location = '/tmp/file.csv', continue_on_error = true);


3.2.2.7.22.21 Skipping a Maximum of 100 Faulty Rows

COPY table_name FROM WRAPPER csv_fdw OPTIONS (location = '/tmp/file.csv', continue_on_error = true, error_count = 100);

3.2.2.7.22.22 Loading a Pipe Separated Value (PSV) File

COPY table_name FROM WRAPPER csv_fdw OPTIONS (location = '/tmp/file.psv', delimiter = '|');

3.2.2.7.22.23 Loading a Tab Separated Value (TSV) File

COPY table_name FROM WRAPPER csv_fdw OPTIONS (location = '/tmp/file.tsv', delimiter = '\t');

3.2.2.7.22.24 Loading an ORC File

COPY table_name FROM WRAPPER orc_fdw OPTIONS (location = '/tmp/file.orc');

3.2.2.7.22.25 Loading a Parquet File

COPY table_name FROM WRAPPER parquet_fdw OPTIONS (location = '/tmp/file.parquet');

3.2.2.7.22.26 Loading a Text File with Non-Printable Delimiters

In the file below, the separator is DC1, which is represented by ASCII 17 decimal or 021 octal.

COPY table_name FROM WRAPPER csv_fdw OPTIONS (location = '/tmp/file.txt', delimiter = E'\021');

3.2.2.7.22.27 Loading a Text File with Multi-Character Delimiters

In the file below, the separator is ^|.

COPY table_name FROM WRAPPER csv_fdw OPTIONS (location = '/tmp/file.txt', delimiter = '^|');

In the file below, the separator is '|. The quote character has to be repeated, as per the literal quoting rules.

COPY table_name FROM WRAPPER csv_fdw OPTIONS (location = '/tmp/file.txt', delimiter = '''|');


3.2.2.7.22.28 Loading Files with a Header Row

Use OFFSET to skip rows.

Note: When loading multiple files (e.g. with wildcards), this setting affects each file separately.

COPY table_name FROM WRAPPER csv_fdw OPTIONS (location = '/tmp/file.psv', delimiter = '|', offset = 2);

3.2.2.7.22.29 Loading Files Formatted for Windows (\r\n)

COPY table_name FROM WRAPPER csv_fdw OPTIONS (location = '/tmp/file.psv', record_delimiter = '\r\n');

3.2.2.7.22.30 Loading a File from a Public S3 Bucket

Note: The bucket must be publicly available, and its objects must be listable.

COPY table_name FROM WRAPPER csv_fdw OPTIONS (location = 's3://sqream-demo-data/file.csv', record_delimiter = '\r\n', offset = 2);

3.2.2.7.22.31 Loading Files from an Authenticated S3 Bucket

COPY table_name FROM WRAPPER csv_fdw OPTIONS (location = 's3://secret-bucket/*.csv', RECORD_DELIMITER = '\r\n', OFFSET = 2, AWS_ID = '12345678', AWS_SECRET = 'super_secretive_secret');

3.2.2.7.22.32 Saving Rejected Rows to a File

Note: When loading multiple files (e.g. with wildcards), this error threshold is for the entire transaction.

COPY table_name FROM WRAPPER csv_fdw OPTIONS (location = '/tmp/file.csv'
    ,continue_on_error = true
    ,error_log = '/temp/load_error.log'
    );

COPY table_name FROM WRAPPER csv_fdw OPTIONS (location = '/tmp/file.psv'
    ,delimiter = '|'
    ,error_log = '/temp/load_error.log'          -- Save error log
    ,rejected_data = '/temp/load_rejected.log'   -- Only save rejected rows
    ,limit = 100                                 -- Only load 100 rows
    ,error_count = 5                             -- Stop the load if 5 errors reached
    );

3.2.2.7.22.33 Loading CSV Files from a Set of Directories

COPY table_name FROM WRAPPER csv_fdw OPTIONS (location = '/tmp/2019_08_*/*.csv');

3.2.2.7.22.34 Rearranging Destination Columns

When the structure of the source files does not match the table structure, tell the COPY command what the order of columns should be:

COPY table_name (fifth, first, third) FROM WRAPPER csv_fdw OPTIONS (location = '/tmp/*.csv');

Note: Any column not specified will revert to its default value, or to NULL if it is nullable.

3.2.2.7.22.35 Loading Non-Standard Dates

If files contain dates that are not formatted as ISO8601, tell COPY how to parse them by specifying a date parser. After parsing, the dates appear as ISO8601 inside SQream DB. The supported date parsers are listed in the ‘Supported date parsers’ table above. When DATETIME_FORMAT is not specified, all date columns are assumed to be ISO8601. In this example, the dates in the source files use the DMY format.

COPY table_name FROM WRAPPER csv_fdw OPTIONS (location = '/tmp/*.csv', datetime_format = 'DMY');

3.2.2.7.23 COPY TO

COPY ... TO is a statement that can be used to export data from a SQream database table or query to a file on the filesystem. In general, COPY moves data between filesystem files and SQream DB tables.

Note: To copy data from a file to a table, see COPY FROM.


3.2.2.7.23.1 Permissions

The role must have the SELECT permission on every table or schema that is referenced by the statement.

3.2.2.7.23.2 Syntax

COPY { [schema_name.]table_name [ ( column_name [, ... ] ) ] | query } TO [FOREIGN DATA] WRAPPER fdw_name

OPTIONS ( [ copy_to_option [, ...]] ) ;

fdw_name ::= csv_fdw | parquet_fdw | orc_fdw

schema_name ::= identifier

table_name ::= identifier

copy_to_option ::=

LOCATION = { filename | S3 URI | HDFS URI }

| DELIMITER = '{ delimiter }'

| RECORD_DELIMITER = '{ record delimiter }'

| HEADER = { true | false }

| AWS_ID = '{ AWS ID }'

| AWS_SECRET = '{ AWS Secret }'

delimiter ::= string

record delimiter ::= string

AWS ID ::= string

AWS Secret ::= string


3.2.2.7.23.3 Elements

Parameter                  Description
[schema_name.]table_name   Name of the table to be exported.
query                      An SQL query that returns a table result, or a table name.
fdw_name                   The name of the Foreign Data Wrapper to use. Supported FDWs are csv_fdw, orc_fdw, or parquet_fdw.
LOCATION                   A path on the local filesystem, or an S3 or HDFS URI. For example, /tmp/foo.csv, s3://my-bucket/foo.csv, or hdfs://my-namenode:8020/foo.csv. The local path must be an absolute path that SQream DB can access.
HEADER                     The CSV file will contain a header line with the names of each column in the file. This option is allowed only when using CSV format.
DELIMITER                  Specifies the character that separates fields (columns) within each row of the file. The default is a comma character (,).
AWS_ID, AWS_SECRET         Specifies the authentication details for secured S3 buckets.

3.2.2.7.23.4 Usage notes

3.2.2.7.23.5 Supported field delimiters

3.2.2.7.23.6 Printable characters

Any printable ASCII character can be used as a delimiter without special syntax. The default CSV field delimiter is a comma (,). A printable character is any ASCII character in the range 32 - 126.

3.2.2.7.23.7 Non-printable characters

A non-printable character (1 - 31, 127) can be used in its octal form. A tab can be specified by escaping it, for example \t. Other non-printable characters can be specified using their octal representations, by using the E'\000' format, where 000 is the octal value of the character. For example, ASCII character 15, known as “shift in”, can be specified using E'\017'.

3.2.2.7.23.8 Date format

The date format in the output CSV is formatted as ISO 8601 (2019-12-31 20:30:55.123), regardless of how it was parsed initially with COPY FROM date parsers.

3.2.2.7.23.9 Examples

3.2.2.7.23.10 Export table to a CSV without HEADER

COPY nba TO WRAPPER csv_fdw OPTIONS (LOCATION = '/tmp/nba_export.csv', DELIMITER = ',', HEADER = false);

$ head -n6 /tmp/nba_export.csv
Avery Bradley,Boston Celtics,0,PG,25,6-2,180,Texas,7730337
Jae Crowder,Boston Celtics,99,SF,25,6-6,235,Marquette,6796117
John Holland,Boston Celtics,30,SG,27,6-5,205,Boston University,\N
R.J. Hunter,Boston Celtics,28,SG,22,6-5,185,Georgia State,1148640
Jonas Jerebko,Boston Celtics,8,PF,29,6-10,231,\N,5000000
Amir Johnson,Boston Celtics,90,PF,29,6-9,240,\N,12000000

3.2.2.7.23.11 Export table to a CSV with a HEADER row

COPY nba TO WRAPPER csv_fdw OPTIONS (LOCATION = '/tmp/nba_export.csv', DELIMITER = ',', HEADER = true);

$ head -n6 /tmp/nba_export.csv
Name,Team,Number,Position,Age,Height,Weight,College,Salary
Avery Bradley,Boston Celtics,0,PG,25,6-2,180,Texas,7730337
Jae Crowder,Boston Celtics,99,SF,25,6-6,235,Marquette,6796117
John Holland,Boston Celtics,30,SG,27,6-5,205,Boston University,\N
R.J. Hunter,Boston Celtics,28,SG,22,6-5,185,Georgia State,1148640
Jonas Jerebko,Boston Celtics,8,PF,29,6-10,231,\N,5000000

3.2.2.7.23.12 Export table to a TSV with a header row

COPY nba TO WRAPPER csv_fdw OPTIONS (LOCATION = '/tmp/nba_export.csv', DELIMITER = '\t', HEADER = true);

$ head -n6 /tmp/nba_export.csv
Name	Team	Number	Position	Age	Height	Weight	College	Salary
Avery Bradley	Boston Celtics	0	PG	25	6-2	180	Texas	7730337
Jae Crowder	Boston Celtics	99	SF	25	6-6	235	Marquette	6796117
John Holland	Boston Celtics	30	SG	27	6-5	205	Boston University	\N
R.J. Hunter	Boston Celtics	28	SG	22	6-5	185	Georgia State	1148640
Jonas Jerebko	Boston Celtics	8	PF	29	6-10	231	\N	5000000

3.2.2.7.23.13 Use non-printable ASCII characters as delimiter

Non-printable characters can be specified using their octal representations, by using the E'\000' format, where 000 is the octal value of the character. For example, ASCII character 15, known as “shift in”, can be specified using E'\017'.

COPY nba TO WRAPPER csv_fdw OPTIONS (LOCATION = '/tmp/nba_export.csv', DELIMITER = E'\017');

COPY nba TO WRAPPER csv_fdw OPTIONS (LOCATION = '/tmp/nba_export.csv', DELIMITER = E'\011'); -- 011 is a tab character

3.2.2.7.23.14 Exporting the result of a query to a CSV

COPY (SELECT "Team", AVG("Salary") FROM nba GROUP BY 1) TO WRAPPER csv_fdw OPTIONS (LOCATION = '/tmp/nba_export.csv');

$ head -n5 /tmp/nba_export.csv
Atlanta Hawks,4860196
Boston Celtics,4181504
Brooklyn Nets,3501898
Charlotte Hornets,5222728
Chicago Bulls,5785558

3.2.2.7.23.15 Saving files to an authenticated S3 bucket

COPY (SELECT "Team", AVG("Salary") FROM nba GROUP BY 1) TO WRAPPER csv_fdw OPTIONS (LOCATION = 's3://my_bucket/salaries/nba_export.csv', AWS_ID = 'my_aws_id', AWS_SECRET = 'my_aws_secret');

3.2.2.7.23.16 Saving files to an HDFS path

COPY (SELECT "Team", AVG("Salary") FROM nba GROUP BY 1) TO WRAPPER csv_fdw OPTIONS (LOCATION = 'hdfs://pp_namenode:8020/nba_export.csv');

3.2.2.7.23.17 Export table to a parquet file

COPY nba TO WRAPPER parquet_fdw OPTIONS (LOCATION = '/tmp/nba_export.parquet');

3.2.2.7.23.18 Export a query to a parquet file

COPY (select x,y from t where z=0) TO WRAPPER parquet_fdw OPTIONS (LOCATION = '/tmp/file.parquet');

3.2.2.7.23.19 Export table to an ORC file

COPY nba TO WRAPPER orc_fdw OPTIONS (LOCATION = '/tmp/nba_export.orc');


3.2.2.7.24 DELETE

DELETE removes specific rows from a table. See more information about how SQream DB deletes data in the Deleting Data guide.

Tip:

• To delete all rows from a table, see TRUNCATE.
• To delete columns, see DROP COLUMN.

3.2.2.7.24.1 Permissions

The role must have the DELETE and SELECT permissions at the table level.

3.2.2.7.24.2 Syntax

delete_table_statement ::=
    DELETE FROM [schema_name.]table_name [ WHERE value_expr ]
    ;

chunk_cleanup_statement ::=
    SELECT CLEANUP_CHUNKS ( 'schema_name', 'table_name' )
    ;

extent_cleanup_statement ::=
    SELECT CLEANUP_EXTENTS ( 'schema_name', 'table_name' )
    ;

table_name ::= identifier

schema_name ::= identifier

3.2.2.7.24.3 Parameters

Parameter    Description
schema_name  The name of the schema for the table.
table_name   The name of the table to delete rows from.
value_expr   An expression that returns Boolean values using columns, such as <column> = <value>. Rows that match the expression will be deleted.

3.2.2.7.24.4 How SQream DB deletes data

Deleting data in SQream DB is a two-step process. First, SQream DB marks rows as deleted. The marked rows remain on disk until the second step, a cleanup process, is initiated.

See more information about how SQream DB deletes data in the Deleting Data guide.

3.2.2.7.24.5 Notes

• ALTER TABLE and other DDL operations are blocked on tables that require clean-up.
• The value expression for deletion can’t be the result of a subquery or a join.
• SQream DB may prevent a very long delete process. If the estimated time is beyond the threshold, the error message will explain how to override this limitation and continue the process.

3.2.2.7.24.6 Examples

3.2.2.7.24.7 Deleting values from a table

farm=> SELECT * FROM cool_animals;
1,Dog ,7
2,Possum ,3
3,Cat ,5
4,Elephant ,6500
5,Rhinoceros ,2100
6,\N,\N

6 rows

farm=> DELETE FROM cool_animals WHERE weight > 1000;
executed

farm=> SELECT * FROM cool_animals;
1,Dog ,7
2,Possum ,3
3,Cat ,5
6,\N,\N

4 rows

3.2.2.7.24.8 Deleting values based on more complex predicates

farm=> SELECT * FROM cool_animals;
1,Dog ,7
2,Possum ,3
3,Cat ,5
4,Elephant ,6500
5,Rhinoceros ,2100
6,\N,\N

6 rows

farm=> DELETE FROM cool_animals WHERE weight > 1000;
executed

farm=> SELECT * FROM cool_animals;
1,Dog ,7
2,Possum ,3
3,Cat ,5
6,\N,\N

4 rows

3.2.2.7.24.9 Identifying and cleaning up tables

3.2.2.7.24.10 List tables that haven’t been cleaned up

farm=> SELECT t.table_name FROM sqream_catalog.delete_predicates dp
       JOIN sqream_catalog.tables t ON dp.table_id = t.table_id
       GROUP BY 1;
cool_animals

1 row

3.2.2.7.24.11 Identify predicates for clean-up

farm=> SELECT delete_predicate FROM sqream_catalog.delete_predicates dp
       JOIN sqream_catalog.tables t ON dp.table_id = t.table_id
       WHERE t.table_name = 'cool_animals';
weight > 1000

1 row

3.2.2.7.24.12 Triggering a cleanup

-- Chunk reorganization (SWEEP)
farm=> SELECT CLEANUP_CHUNKS('public','cool_animals');
executed

-- Delete leftover files (VACUUM)
farm=> SELECT CLEANUP_EXTENTS('public','cool_animals');
executed

farm=> SELECT delete_predicate FROM sqream_catalog.delete_predicates dp
       JOIN sqream_catalog.tables t ON dp.table_id = t.table_id
       WHERE t.table_name = 'cool_animals';

0 rows

3.2.2.7.25 INSERT

INSERT inserts one or more rows into a table.

Tip:

• To bulk load data into existing tables, the COPY FROM command performs better than INSERT.
• To load Parquet or ORC files, see CREATE FOREIGN TABLE.

3.2.2.7.25.1 Permissions

The role must have the INSERT permission to the destination table.

3.2.2.7.25.2 Syntax

insert_statement ::=
    INSERT INTO [schema_name.]table_name
        [ ( column_name [, ... ] ) ]
        query
    ;

schema_name ::= identifier

table_name ::= identifier

column_name ::= identifier

3.2.2.7.25.3 Elements

Parameter                  Description
[schema_name.]table_name   Table to insert data into.
( column_name [, ...] )    A comma separated list of column names that specifies the destination columns of the insert.
query                      A SELECT or VALUES statement. Each value must match the data type of its destination column. The values must also match the order of the table or columns if specified with (column_name, …).

3.2.2.7.25.4 Examples

3.2.2.7.25.5 Inserting a single row

INSERT INTO cool_animals VALUES (5, 'fox', 15);

3.2.2.7.25.6 Inserting into rows with default values

The id column is an IDENTITY column, and has a default, so it can be omitted.

INSERT INTO cool_animals(name, weight) VALUES ('fox', 15);

3.2.2.7.25.7 Changing column order

INSERT INTO cool_animals(name, weight, id) VALUES ('possum', 7, 6);

3.2.2.7.25.8 Inserting multiple rows

INSERT INTO cool_animals(name, weight) VALUES ('koala', 20), ('lemur', 6), ('kiwi', 3);

3.2.2.7.25.9 Import data from other tables

INSERT can be used to insert data obtained from queries on other tables, including foreign tables. For example,

farm=> SELECT name, weight FROM all_animals
.      WHERE region = 'Australia';
name     | weight
---------+-------
Kangaroo |    120
Koala    |     20
Wombat   |     60
Platypus |      5
Wallaby  |     35
Echidna  |      8
Dingo    |     25

INSERT INTO cool_animals(name,weight) SELECT name, weight FROM all_animals WHERE region = 'Australia';


3.2.2.7.25.10 Inserting data with positional placeholders

When preparing an INSERT statement for loading data over the network (for example, from a Python or Java application), use positional placeholders. Example using Python:

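# con is an existing, open SQream DB connection object
data = [["Kangaroo", 120], ["Koala", 20], ["Platypus", 5]]

# Each ? is a positional placeholder, filled in order from every row in data
insert_stmt = 'INSERT INTO cool_animals (name, weight) VALUES (?, ?)'
con.executemany(insert_stmt, data)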

Note: The executemany method is used only for parametrized statements like INSERT. Running multiple SELECT queries or other statements this way is not supported.
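The placeholder pattern can be tried without access to a SQream DB cluster. The sketch below uses Python's built-in sqlite3 module purely as a local stand-in, since it also accepts ? positional placeholders; the table definition is an assumption, and with SQream DB the con object would come from the SQream Python connector instead.

```python
import sqlite3

# sqlite3 is only a stand-in here; it is not SQream DB.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cool_animals (name TEXT, weight INTEGER)")  # assumed schema

data = [["Kangaroo", 120], ["Koala", 20], ["Platypus", 5]]

# One statement is prepared, then executed once per row in data.
insert_stmt = "INSERT INTO cool_animals (name, weight) VALUES (?, ?)"
con.executemany(insert_stmt, data)

loaded = con.execute(
    "SELECT name, weight FROM cool_animals ORDER BY weight"
).fetchall()
print(loaded)
```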

3.2.2.7.26 SELECT

SELECT is the main statement that allows reading and processing of data. It is used to retrieve rows and columns from one or more tables. When used alone, the statement is known as a “SELECT statement” or “SELECT query”, but it can also be combined to create more elaborate statements like CREATE TABLE AS.

Note: The SELECT keyword is also used to execute utility functions, such as SELECT get_ddl();. These are covered in their respective locations.

In this topic:

• Permissions
• Syntax
• Elements
• Notes
  – Query processing
  – Select lists
• Examples
  – Simple queries
  – Show a count of the rows
  – Get all columns
  – Filter on conditions
  – Filter based on a list
  – Select only distinct rows
  – Count distinct values
  – Rename columns with aliases
  – Searching with LIKE
  – Aggregate functions
  – Filtering on aggregates
  – Sorting results
  – Sorting with multiple columns
  – Combining two or more queries
  – Common table expressions (CTE)
    ∗ Nested CTEs
    ∗ Reusing CTEs

3.2.2.7.26.1 Permissions

The role must have the SELECT permission on every table or schema that is referenced by the SELECT query.

3.2.2.7.26.2 Syntax

query ::=
    query_term [ UNION ALL query_term [ ...] ]
    [ ORDER BY order [, ... ] ]
    [ LIMIT num_rows ]
    ;

query_term ::=
    [ WITH table_alias AS (query_term) [, ...] ]
    SELECT
        [ TOP num_rows ]
        [ DISTINCT ]
        select_list
        [ FROM table_ref [, ... ]
            [ WHERE value_expr ]
            [ GROUP BY value_expr [, ... ]
                [ HAVING value_expr ]
            ]
        ]
    | ( VALUES ( value_expr [, ... ] ) [, ... ] )

select_list ::=
    value_expr [ [ AS ] column_alias ] [, ... ]
    | window_expr [ [ AS ] column_alias ]

column_alias ::= identifier

table_alias ::= identifier

window_expr ::=
    window_fn ( [ value_expr [, ...] ] )
    OVER ( [ PARTITION BY value_expr [, ...] ]
           [ ORDER BY order [ ASC | DESC ] [, ...] ] )

table_ref ::=
    [schema_name.]table_name [ [ AS ] alias [ ( column_alias [, ... ] ) ] ]
    | ( query ) [ [ AS ] alias [ ( column_alias [, ... ] ) ] ]
    | table_ref join_type table_ref [ ON value_expr ]
    | table_alias

schema_name ::= identifier

table_name ::= identifier

alias ::= identifier

join_type ::=
    [ INNER ] [ join_hint ] JOIN
    | LEFT [ OUTER ] [ join_hint ] JOIN
    | RIGHT [ OUTER ] [ join_hint ] JOIN
    | CROSS [ join_hint ] JOIN

join_hint ::= MERGE | LOOP

order ::= value_expr [ ASC | DESC ] [, ...][NULLS FIRST | LAST ]


3.2.2.7.26.3 Elements

Parameter            Description
DISTINCT             Remove duplicates.
FROM table_ref       A table name or another sub-clause that creates a table, such as VALUES or a subquery.
WHERE value_expr     An expression that returns Boolean values using columns, such as <column> = <value>. Rows that do not match the expression will not show up in the result set.
GROUP BY value_expr  Aggregate on specific columns or values. Often used with aggregate functions.
HAVING value_expr    Only return values that match the expression. HAVING is like WHERE, but for results of the aggregate functions.
ORDER BY order       A comma separated list of ordering specifications, used to change the order of the results.
LIMIT num_rows       Restricts the operation to only retrieve the first num_rows rows.
UNION ALL            Concatenates the results of two queries together. UNION ALL does not remove duplicates.

3.2.2.7.26.4 Notes

3.2.2.7.26.5 Query processing

Queries are processed in a manner equivalent to the following order:

1. FROM, including nested queries in the FROM
2. WHERE
3. SELECT list row → value functions and these functions inside aggregates and window function calls in select list
4. GROUP BY and aggregates
5. HAVING
6. Window functions
7. SELECT list row → value functions on the outside of aggregates and window functions
8. DISTINCT
9. UNION ALL
10. ORDER BY
11. LIMIT / TOP

Inside the FROM clause, the processing occurs in the usual way, from the outside in.
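The ordering above can be traced on a small in-memory data set. The Python sketch below is a toy illustration only, with made-up rows and a hypothetical run_query function; it walks through the steps that apply to a query such as SELECT team, SUM(salary) FROM rows WHERE salary > 50 GROUP BY team HAVING SUM(salary) > 150 ORDER BY 2 DESC LIMIT 2. It says nothing about how SQream DB executes queries internally.

```python
from collections import defaultdict

# Made-up sample rows, used only for this illustration.
rows = [
    {"team": "A", "salary": 100}, {"team": "A", "salary": 120},
    {"team": "B", "salary": 40},  {"team": "B", "salary": 200},
    {"team": "C", "salary": 90},  {"team": "C", "salary": 30},
    {"team": "D", "salary": 300},
]


def run_query(rows):
    source = list(rows)                                    # 1. FROM
    filtered = [r for r in source if r["salary"] > 50]     # 2. WHERE
    totals = defaultdict(int)                              # 4. GROUP BY and aggregates
    for r in filtered:
        totals[r["team"]] += r["salary"]
    grouped = [(t, s) for t, s in totals.items() if s > 150]          # 5. HAVING
    ordered = sorted(grouped, key=lambda pair: pair[1], reverse=True)  # 10. ORDER BY
    return ordered[:2]                                     # 11. LIMIT


result = run_query(rows)
print(result)
```

Because WHERE runs before the aggregation, the rows with salaries 40 and 30 never contribute to a team total, while HAVING can only filter on the totals computed afterwards.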

3.2.2.7.26.6 Select lists

The select_list is a comma separated list of column names and value expressions.

• Use LIMIT num_rows to retrieve only the first num_rows results. SQream DB also supports the TOP num_rows syntax from SQL Server.
• DISTINCT can be used to remove duplicate rows.
• Value expressions in select lists support aggregate and window functions as well as normal value expressions.

Tip: Each expression in the select list is given an ordinal number, from 1 to the number of expressions. When using ORDER BY or GROUP BY, these ordinals are used as shorthand to refer to these expressions.

SELECT a, SUM(b) FROM t GROUP BY a ORDER BY SUM(b) DESC;
-- is equivalent to:
SELECT a, SUM(b) FROM t GROUP BY 1 ORDER BY 2 DESC;

3.2.2.7.26.7 Examples

Assume a table named nba, with the following structure:

CREATE TABLE nba
(
   Name varchar(40),
   Team varchar(40),
   Number tinyint,
   Position varchar(2),
   Age tinyint,
   Height varchar(4),
   Weight real,
   College varchar(40),
   Salary float
);

Here’s a peek at the table contents (Download nba.csv):

Table 15: nba.csv

Name          | Team           | Number | Position | Age  | Height | Weight | College           | Salary
Avery Bradley | Boston Celtics | 0.0    | PG       | 25.0 | 6-2    | 180.0  | Texas             | 7730337.0
Jae Crowder   | Boston Celtics | 99.0   | SF       | 25.0 | 6-6    | 235.0  | Marquette         | 6796117.0
John Holland  | Boston Celtics | 30.0   | SG       | 27.0 | 6-5    | 205.0  | Boston University |
R.J. Hunter   | Boston Celtics | 28.0   | SG       | 22.0 | 6-5    | 185.0  | Georgia State     | 1148640.0
Jonas Jerebko | Boston Celtics | 8.0    | PF       | 29.0 | 6-10   | 231.0  |                   | 5000000.0
Amir Johnson  | Boston Celtics | 90.0   | PF       | 29.0 | 6-9    | 240.0  |                   | 12000000.0
Jordan Mickey | Boston Celtics | 55.0   | PF       | 21.0 | 6-8    | 235.0  | LSU               | 1170960.0
Kelly Olynyk  | Boston Celtics | 41.0   | C        | 25.0 | 7-0    | 238.0  | Gonzaga           | 2165160.0
Terry Rozier  | Boston Celtics | 12.0   | PG       | 22.0 | 6-2    | 190.0  | Louisville        | 1824360.0


3.2.2.7.26.8 Simple queries

This query will get the Name, Team name, and Age from the NBA table, but only show the first 10 results.

nba=> SELECT Name, Team, Age FROM nba LIMIT 10;
Avery Bradley,Boston Celtics,25
Jae Crowder,Boston Celtics,25
John Holland,Boston Celtics,27
R.J. Hunter,Boston Celtics,22
Jonas Jerebko,Boston Celtics,29
Amir Johnson,Boston Celtics,29
Jordan Mickey,Boston Celtics,21
Kelly Olynyk,Boston Celtics,25
Terry Rozier,Boston Celtics,22
Marcus Smart,Boston Celtics,22

3.2.2.7.26.9 Show a count of the rows

Use COUNT(*) to retrieve the number of rows in a result.

nba=> SELECT COUNT(*) FROM nba;
457

3.2.2.7.26.10 Get all columns

* is used as shorthand for “all columns”.

Warning: Running a SELECT * query on very large tables can occupy the client for a long time if the result set is big.

nba=> SELECT * FROM nba;
Name          | Team           | Number | Position | Age | Height | Weight | College           | Salary
--------------+----------------+--------+----------+-----+--------+--------+-------------------+---------
Avery Bradley | Boston Celtics |      0 | PG       |  25 | 6-2    |    180 | Texas             |  7730337
Jae Crowder   | Boston Celtics |     99 | SF       |  25 | 6-6    |    235 | Marquette         |  6796117
John Holland  | Boston Celtics |     30 | SG       |  27 | 6-5    |    205 | Boston University |
R.J. Hunter   | Boston Celtics |     28 | SG       |  22 | 6-5    |    185 | Georgia State     |  1148640
Jonas Jerebko | Boston Celtics |      8 | PF       |  29 | 6-10   |    231 |                   |  5000000
Amir Johnson  | Boston Celtics |     90 | PF       |  29 | 6-9    |    240 |                   | 12000000
Jordan Mickey | Boston Celtics |     55 | PF       |  21 | 6-8    |    235 | LSU               |  1170960
Kelly Olynyk  | Boston Celtics |     41 | C        |  25 | 7-0    |    238 | Gonzaga           |  2165160
Terry Rozier  | Boston Celtics |     12 | PG       |  22 | 6-2    |    190 | Louisville        |  1824360
Marcus Smart  | Boston Celtics |     36 | PG       |  22 | 6-4    |    220 | Oklahoma State    |  3431040

3.2.2.7.26.11 Filter on conditions

Use the WHERE clause to filter results.

nba=> SELECT "Name","Age","Salary" FROM nba WHERE "Age" < 24 LIMIT 5;
R.J. Hunter,22,1148640
Jordan Mickey,21,1170960
Terry Rozier,22,1824360
Marcus Smart,22,3431040
James Young,20,1749840

nba=> SELECT "Name","Age","Salary" FROM nba WHERE "Age" < 24 AND "Salary" > 1800000 LIMIT 5;
Terry Rozier,22,1824360
Marcus Smart,22,3431040
Kristaps Porzingis,20,4131720
Joel Embiid,22,4626960
Nerlens Noel,22,3457800

3.2.2.7.26.12 Filter based on a list

WHERE column IN (value_expr in comma separated list) matches the column with any value in the list.

nba=> SELECT "Name","Age","Salary","Team" FROM nba WHERE "Team" IN ('Utah Jazz', 'Portland Trail Blazers');
Cliff Alexander,20,525093,Portland Trail Blazers
Al-Farouq Aminu,25,8042895,Portland Trail Blazers
Pat Connaughton,23,625093,Portland Trail Blazers
[...]
Shelvin Mack,26,2433333,Utah Jazz
Raul Neto,24,900000,Utah Jazz
Tibor Pleiss,26,2900000,Utah Jazz
Jeff Withey,26,947276,Utah Jazz

3.2.2.7.26.13 Select only distinct rows

nba=> SELECT DISTINCT "Team" FROM nba;
Atlanta Hawks
Boston Celtics
Brooklyn Nets
Charlotte Hornets
Chicago Bulls
Cleveland Cavaliers
Dallas Mavericks
Denver Nuggets
Detroit Pistons
Golden State Warriors
Houston Rockets
Indiana Pacers
Los Angeles Clippers
Los Angeles Lakers
Memphis Grizzlies
Miami Heat
Milwaukee Bucks
Minnesota Timberwolves
New Orleans Pelicans
New York Knicks
Oklahoma City Thunder
Orlando Magic
Philadelphia 76ers
Phoenix Suns
Portland Trail Blazers
Sacramento Kings
San Antonio Spurs
Toronto Raptors
Utah Jazz
Washington Wizards

3.2.2.7.26.14 Count distinct values

nba=> SELECT COUNT(DISTINCT "Team") FROM nba;
30

3.2.2.7.26.15 Rename columns with aliases

nba=> SELECT "Name" AS "Player", -- Note usage of AS
.            "Team",
.            "Salary" "Yearly salary" -- AS is optional.
.            -- This is identical to "Salary" AS "Yearly salary"
.     FROM nba LIMIT 5;
Player        | Team           | Yearly salary
--------------+----------------+--------------
Avery Bradley | Boston Celtics |       7730337
Jae Crowder   | Boston Celtics |       6796117
John Holland  | Boston Celtics |
R.J. Hunter   | Boston Celtics |       1148640
Jonas Jerebko | Boston Celtics |       5000000


3.2.2.7.26.16 Searching with LIKE

LIKE allows pattern matching text in the WHERE clause.

• % matches 0 or more characters
• _ matches exactly 1 character

nba=> SELECT "Name","Age","Salary","Team" FROM nba WHERE "Team" LIKE 'Portland%' LIMIT 5;
Cliff Alexander,20,525093,Portland Trail Blazers
Al-Farouq Aminu,25,8042895,Portland Trail Blazers
Pat Connaughton,23,625093,Portland Trail Blazers
Allen Crabbe,24,947276,Portland Trail Blazers
Ed Davis,27,6980802,Portland Trail Blazers
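The _ wildcard can be illustrated in the same way. This is a small sketch that assumes the "Position" column holds the abbreviations seen in the sample rows (PG, SG, SF, PF, C); the pattern is chosen only for illustration.

```sql
-- 'P_' matches values made of a 'P' followed by exactly one more character,
-- so 'PG' and 'PF' match, while 'C', 'SG' and 'SF' do not.
SELECT "Name", "Position"
FROM nba
WHERE "Position" LIKE 'P_'
LIMIT 5;
```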

3.2.2.7.26.17 Aggregate functions

Aggregate functions compute a single result from a column.

Tip: Aggregate functions can return NULL if no rows are selected or all input values are NULL. The notable exception to this rule is COUNT, which always returns an integer. Use COALESCE to substitute zero or another value for NULL when necessary.

nba=> SELECT max("Salary") FROM nba;
25000000
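As a sketch of the COALESCE substitution mentioned in the tip, the query below aggregates over a filter that is assumed to match no rows. The team name is made up for illustration, and 0.0 is used as the fallback so that it matches the floating-point type of "Salary".

```sql
-- Assumption: no row in nba has this team name, so MAX receives no input
-- rows and yields NULL; COALESCE then substitutes the fallback value.
SELECT COALESCE(MAX("Salary"), 0.0) AS "Max salary"
FROM nba
WHERE "Team" = 'No Such Team';
```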

Aggregate functions are often combined with GROUP BY.

nba=> SELECT "Team",max("Salary") FROM nba GROUP BY "Team";
Atlanta Hawks,18671659
Boston Celtics,12000000
Brooklyn Nets,19689000
Charlotte Hornets,13500000
[...]
Utah Jazz,15409570
Washington Wizards,15851950

Note: Unlike some other databases, when using an aggregate function, all other items in the select list must either be aggregated or be specified in a GROUP BY. A query like SELECT "Team",max("Salary") FROM nba is not valid, and will result in an error.

3.2.2.7.26.18 Filtering on aggregates

Filtering on aggregates is done with the HAVING clause, rather than the WHERE clause.

nba=> SELECT "Team",AVG("Salary") FROM nba GROUP BY "Team" HAVING AVG("Salary") BETWEEN 4477884 AND 5018868;
Atlanta Hawks,4860196
Dallas Mavericks,4746582
Detroit Pistons,4477884
Houston Rockets,5018868
Los Angeles Lakers,4784695
Minnesota Timberwolves,4593053
New York Knicks,4581493
Sacramento Kings,4778911
Toronto Raptors,4741174

3.2.2.7.26.19 Sorting results

ORDER BY takes a comma separated list of ordering specifications - a column followed by ASC for ascending or DESC for descending.

Note: When ORDER BY is not specified in a query, rows are returned based on the order in which they were read, not by any consistent criteria. Unlike some databases, NULL values are neither first nor last, but can appear anywhere in the result set.

Tip: SQream DB does not support functions and complex arguments in the ORDER BY clause. To work around this limitation, use ordinals or aliases, as with the examples below, which are functionally identical.

nba=> SELECT "Team",AVG("Salary") as "Average Salary" FROM nba GROUP BY "Team" ORDER BY "Average Salary" DESC;
Team                   | Average Salary
-----------------------+---------------
Cleveland Cavaliers    |        7642049
Miami Heat             |        6347359
Los Angeles Clippers   |        6323642
Oklahoma City Thunder  |        6251019
[...]
Brooklyn Nets          |        3501898
Portland Trail Blazers |        3220121
Philadelphia 76ers     |        2213778

nba=> SELECT "Team",AVG("Salary") as "Average Salary" FROM nba GROUP BY "Team" ORDER BY 2 DESC;
Team                   | Average Salary
-----------------------+---------------
Cleveland Cavaliers    |        7642049
Miami Heat             |        6347359
Los Angeles Clippers   |        6323642
Oklahoma City Thunder  |        6251019
[...]
Brooklyn Nets          |        3501898
Portland Trail Blazers |        3220121
Philadelphia 76ers     |        2213778


3.2.2.7.26.20 Sorting with multiple columns

Order retrieved rows by multiple columns:

nba=> SELECT "Name", "Position", "Weight", "Salary" FROM nba ORDER BY "Weight" DESC, "Salary" ASC;
Name             | Position | Weight | Salary
-----------------+----------+--------+---------
Nikola Pekovic   | C        |    307 | 12100000
Boban Marjanovic | C        |    290 |  1200000
Al Jefferson     | C        |    289 | 13500000
[...]
Tim Frazier      | PG       |    170 |   845059
Brandon Jennings | PG       |    169 |  8344497
Briante Weber    | PG       |    165 |
Bryce Cotton     | PG       |    165 |   700902
Aaron Brooks     | PG       |    161 |  2250000

3.2.2.7.26.21 Combining two or more queries

UNION ALL can be used to combine the results of two or more queries into one result set. UNION ALL does not remove duplicate results.

nba=> SELECT "Position" FROM nba WHERE "Weight" > 300
.     UNION ALL SELECT "Position" FROM nba WHERE "Weight" < 170;
C
PG
PG
PG
PG

3.2.2.7.26.22 Common table expressions (CTE)

Common table expressions (CTEs) allow a possibly complex subquery to be represented in a short way later on, for improved readability. Using a CTE does not affect query performance.

nba=> WITH s AS (SELECT "Name" FROM nba WHERE "Salary" > 20000000)
.     SELECT * FROM nba AS n, s WHERE n."Name" = s."Name";
Name            | Team                  | Number | Position | Age | Height | Weight | College      | Salary   | name0
----------------+-----------------------+--------+----------+-----+--------+--------+--------------+----------+----------------
Carmelo Anthony | New York Knicks       |      7 | SF       |  32 | 6-8    |    240 | Syracuse     | 22875000 | Carmelo Anthony
Chris Bosh      | Miami Heat            |      1 | PF       |  32 | 6-11   |    235 | Georgia Tech | 22192730 | Chris Bosh
Chris Paul      | Los Angeles Clippers  |      3 | PG       |  31 | 6-0    |    175 | Wake Forest  | 21468695 | Chris Paul
Derrick Rose    | Chicago Bulls         |      1 | PG       |  27 | 6-3    |    190 | Memphis      | 20093064 | Derrick Rose
Dwight Howard   | Houston Rockets       |     12 | C        |  30 | 6-11   |    265 |              | 22359364 | Dwight Howard
Kevin Durant    | Oklahoma City Thunder |     35 | SF       |  27 | 6-9    |    240 | Texas        | 20158622 | Kevin Durant
Kobe Bryant     | Los Angeles Lakers    |     24 | SF       |  37 | 6-6    |    212 |              | 25000000 | Kobe Bryant
LeBron James    | Cleveland Cavaliers   |     23 | SF       |  31 | 6-8    |    250 |              | 22970500 | LeBron James

In this example, the WITH clause defines the temporary name s for the subquery that finds players with salaries over $20 million. The result set becomes a valid table reference in any table expression of the subsequent SELECT clause.

3.2.2.7.26.23 Nested CTEs

SQream DB also supports any number of nested CTEs, such as this:

WITH w AS
   (SELECT * FROM
      (WITH x AS (SELECT * FROM nba) SELECT * FROM x ORDER BY "Salary" DESC))
SELECT * FROM w ORDER BY "Weight" DESC;

3.2.2.7.26.24 Reusing CTEs

SQream DB supports reusing CTEs several times in a query:

nba=> WITH
.     nba_ct AS (SELECT "Name", "Team" FROM nba WHERE "College"='Connecticut'),
.     nba_az AS (SELECT "Name", "Team" FROM nba WHERE "College"='Arizona')
.     SELECT * FROM nba_az JOIN nba_ct ON nba_ct."Team" = nba_az."Team";
Name            | Team            | name0          | team0
----------------+-----------------+----------------+----------------
Stanley Johnson | Detroit Pistons | Andre Drummond | Detroit Pistons
Aaron Gordon    | Orlando Magic   | Shabazz Napier | Orlando Magic
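A single CTE can also be referenced more than once in the same query. The following is a hedged sketch rather than an example from the SQream DB documentation: the CTE name high_paid and the output aliases are hypothetical, and the query pairs up players on the same team who both earn more than $20 million.

```sql
WITH high_paid AS (
    SELECT "Name", "Team" FROM nba WHERE "Salary" > 20000000
)
-- high_paid is referenced twice, once under each alias
SELECT a."Name" AS "Player 1",
       b."Name" AS "Player 2",
       a."Team"
FROM high_paid AS a
JOIN high_paid AS b
  ON a."Team" = b."Team"
WHERE a."Name" < b."Name";  -- list each pair once and skip self-pairs
```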

3.2.2.7.27 TRUNCATE

TRUNCATE removes all rows from a table. It is functionally identical to running a DELETE statement without a WHERE clause.

3.2.2.7.27.1 Permissions

The role must have the DELETE permission at the table level.


3.2.2.7.27.2 Syntax

truncate_table_statement ::=
    TRUNCATE [ TABLE ] [schema_name.]table_name
        [ RESTART IDENTITY | CONTINUE IDENTITY ]
    ;

table_name ::= identifier

schema_name ::= identifier

3.2.2.7.27.3 Parameters

Parameter          Description
schema_name        The name of the schema for the table.
table_name         The name of the table to truncate.
RESTART IDENTITY   Automatically restart sequences owned by columns of the truncated table.
CONTINUE IDENTITY  Do not change the values of sequences. This is the default.

3.2.2.7.27.4 Examples

3.2.2.7.27.5 Truncating a table

farm=> SELECT * FROM cool_animals;
1,Dog ,7
2,Possum ,3
3,Cat ,5
4,Elephant ,6500
5,Rhinoceros ,2100
6,\N,\N

6 rows

farm=> TRUNCATE TABLE cool_animals; executed

farm=> SELECT * FROM cool_animals;

0 rows
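The identity options from the syntax above can be sketched as follows. This assumes that cool_animals lives in the public schema; the statements are illustrative and their output is not shown.

```sql
-- Remove all rows and restart the sequences owned by the table's columns
TRUNCATE TABLE public.cool_animals RESTART IDENTITY;

-- Remove all rows but leave sequence values unchanged
-- (the default, so equivalent to a plain TRUNCATE)
TRUNCATE TABLE public.cool_animals CONTINUE IDENTITY;
```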

3.2.2.7.28 VALUES

VALUES is a table constructor - a clause that can be used to define tabular data.

Tip: • Use VALUES in conjunction with INSERT statements to insert a set of one or more rows.


3.2.2.7.28.1 Permissions

This clause requires no special permissions.

3.2.2.7.28.2 Syntax

values_expr ::= VALUES ( value_expr [, ... ] ) [, ... ] ;

3.2.2.7.28.3 Notes

Each set of comma-separated value_expr in parentheses represents a single row in the result set. Column names of the result table are auto-generated. To rename the column names, add an AS clause.

3.2.2.7.28.4 Examples

3.2.2.7.28.5 Tabular data with VALUES

master=> VALUES (1,2,3,4), (5,6,7,8), (9,10,11,12);
1,2,3,4
5,6,7,8
9,10,11,12

3 rows

3.2.2.7.28.6 Using VALUES with a SELECT query

To use VALUES in a select query, assign a name to the VALUES clause with AS:

master=> SELECT t.* FROM (VALUES (1,2,3,'a'), (5,6,7,'b'), (9,10,11,'c')) AS t;
1,2,3,a
5,6,7,b
9,10,11,c

3 rows

You can also use this to rename the columns:

SELECT t.* FROM (VALUES (1,2,3,'a'), (5,6,7,'b'), (9,10,11,'c')) AS t(a,b,c,d);

3.2.2.7.28.7 Creating a table with VALUES

Use AS to assign names to columns:

CREATE TABLE cool_animals AS
  (SELECT t.*
   FROM (VALUES (1, 'dog'),
                (2, 'cat'),
                (3, 'horse'),
                (4, 'hippopotamus')
        ) AS t(id, name)
  );
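As the tip at the start of this section notes, VALUES is most often paired with INSERT. A minimal sketch, assuming the two-column cool_animals table (id, name) created by the statement above; the inserted rows are made up for illustration.

```sql
-- Each parenthesized group becomes one row; values are listed in column order (id, name)
INSERT INTO cool_animals VALUES (5, 'rabbit'), (6, 'giraffe');
```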

3.2.2.7.29 DROP_SAVED_QUERY

DROP_SAVED_QUERY drops a previously saved query. Read more in the Saved queries guide. See also: SAVE_QUERY, EXECUTE_SAVED_QUERY, SHOW_SAVED_QUERY, LIST_SAVED_QUERIES.

3.2.2.7.29.1 Permissions

Dropping a saved query requires no special permissions.

3.2.2.7.29.2 Syntax

drop_saved_query_statement ::=
    SELECT DROP_SAVED_QUERY(saved_query_name)
    ;

saved_query_name ::= string_literal

3.2.2.7.29.3 Returns

If the saved query is dropped successfully, the statement returns nothing.

3.2.2.7.29.4 Parameters

Parameter         Description
saved_query_name  The name of the query to drop.

3.2.2.7.29.5 Examples

3.2.2.7.29.6 Dropping a previously saved query

t=> SELECT DROP_SAVED_QUERY('select_all');
executed


3.2.2.7.30 DUMP_DATABASE_DDL

DUMP_DATABASE_DDL() is a function that shows the CREATE statements for database objects, including views and tables. Beginning with 2020.3.1, DUMP_DATABASE_DDL includes foreign tables in the output.

Warning: This function does not currently show UDFs. To list available UDFs, use the catalog:

farm=> SELECT * FROM sqream_catalog.user_defined_functions;
farm,1,my_distance

Then, export UDFs one-by-one using GET_FUNCTION_DDL.

Tip:
• For just tables, see GET_DDL.
• For just views, see GET_VIEW_DDL.
• For UDFs, see GET_FUNCTION_DDL.

3.2.2.7.30.1 Permissions

The role must have the CONNECT permission at the database level.

3.2.2.7.30.2 Syntax

dump_database_ddl_statement ::=
    SELECT DUMP_DATABASE_DDL()
    ;

3.2.2.7.30.3 Parameters

This function accepts no parameters.

3.2.2.7.30.4 Examples

3.2.2.7.30.5 Getting the DDL for a database

farm=> SELECT DUMP_DATABASE_DDL();
create table "public"."cool_animals" (
  "id" int not null,
  "name" varchar(30) not null,
  "weight" double null,
  "is_agressive" bool default false not null
)
;

create view "public".angry_animals as
  select
    "cool_animals"."id" as "id",
    "cool_animals"."name" as "name",
    "cool_animals"."weight" as "weight",
    "cool_animals"."is_agressive" as "is_agressive"
  from
    "public".cool_animals as cool_animals
  where
    "cool_animals"."is_agressive" = false;

3.2.2.7.30.6 Exporting database DDL to a file

COPY (SELECT DUMP_DATABASE_DDL()) TO '/home/rhendricks/database.ddl';

3.2.2.7.31 EXECUTE_SAVED_QUERY

EXECUTE_SAVED_QUERY executes a previously saved query. Read more in the Saved queries guide. See also: SAVE_QUERY, DROP_SAVED_QUERY, SHOW_SAVED_QUERY, LIST_SAVED_QUERIES.

3.2.2.7.31.1 Permissions

Executing a saved query requires SELECT permissions to access the tables referenced in the query.

3.2.2.7.31.2 Syntax

execute_saved_query_statement ::=
    SELECT EXECUTE_SAVED_QUERY(saved_query_name [ , argument [ , ... ] ])
    ;

saved_query_name ::= string_literal

argument ::= string_literal | number_literal

3.2.2.7.31.3 Returns

Query execution results, based on the query saved.


3.2.2.7.31.4 Parameters

Parameter         Description
saved_query_name  The name of the query to execute.
argument          A comma separated list of argument literal values.

3.2.2.7.31.5 Notes

• Query parameters can be used as substitutes for literal expressions. Parameters cannot be used to substitute identifiers, column names, table names, or other parts of the query.
• Query parameters of a string datatype (like VARCHAR) must be of a fixed length, and can be used in equality checks, but not patterns (e.g. LIKE, RLIKE, etc.).
• Query parameters' types are inferred at compile time.

3.2.2.7.31.6 Examples

Assume a table named nba, with the following structure:

CREATE TABLE nba
(
   Name varchar(40),
   Team varchar(40),
   Number tinyint,
   Position varchar(2),
   Age tinyint,
   Height varchar(4),
   Weight real,
   College varchar(40),
   Salary float
);

Here’s a peek at the table contents (Download nba.csv):


Table 16: nba.csv

Name          | Team           | Number | Position | Age  | Height | Weight | College           | Salary
Avery Bradley | Boston Celtics | 0.0    | PG       | 25.0 | 6-2    | 180.0  | Texas             | 7730337.0
Jae Crowder   | Boston Celtics | 99.0   | SF       | 25.0 | 6-6    | 235.0  | Marquette         | 6796117.0
John Holland  | Boston Celtics | 30.0   | SG       | 27.0 | 6-5    | 205.0  | Boston University |
R.J. Hunter   | Boston Celtics | 28.0   | SG       | 22.0 | 6-5    | 185.0  | Georgia State     | 1148640.0
Jonas Jerebko | Boston Celtics | 8.0    | PF       | 29.0 | 6-10   | 231.0  |                   | 5000000.0
Amir Johnson  | Boston Celtics | 90.0   | PF       | 29.0 | 6-9    | 240.0  |                   | 12000000.0
Jordan Mickey | Boston Celtics | 55.0   | PF       | 21.0 | 6-8    | 235.0  | LSU               | 1170960.0
Kelly Olynyk  | Boston Celtics | 41.0   | C        | 25.0 | 7-0    | 238.0  | Gonzaga           | 2165160.0
Terry Rozier  | Boston Celtics | 12.0   | PG       | 22.0 | 6-2    | 190.0  | Louisville        | 1824360.0

3.2.2.7.31.7 Saving and executing a simple query

t=> SELECT SAVE_QUERY('select_all','SELECT * FROM nba');
executed

t=> SELECT EXECUTE_SAVED_QUERY('select_all');
Name          | Team           | Number | Position | Age | Height | Weight | College           | Salary
--------------+----------------+--------+----------+-----+--------+--------+-------------------+---------
Avery Bradley | Boston Celtics |      0 | PG       |  25 | 6-2    |    180 | Texas             |  7730337
Jae Crowder   | Boston Celtics |     99 | SF       |  25 | 6-6    |    235 | Marquette         |  6796117
John Holland  | Boston Celtics |     30 | SG       |  27 | 6-5    |    205 | Boston University |
R.J. Hunter   | Boston Celtics |     28 | SG       |  22 | 6-5    |    185 | Georgia State     |  1148640
[...]

3.2.2.7.31.8 Saving and executing parametrized query

Use parameters (?) as placeholders for values that are supplied later, at execution time.

Tip: Use dollar quoting ($$) to avoid escaping strings.


t=> SELECT SAVE_QUERY('select_by_weight_and_team',$$SELECT * FROM nba WHERE Weight > ? AND Team = ?$$);
executed

t=> SELECT EXECUTE_SAVED_QUERY('select_by_weight_and_team', 240, 'Toronto Raptors');
Name              | Team            | Number | Position | Age | Height | Weight | College     | Salary
------------------+-----------------+--------+----------+-----+--------+--------+-------------+--------
Bismack Biyombo   | Toronto Raptors |      8 | C        |  23 | 6-9    |    245 |             | 2814000
James Johnson     | Toronto Raptors |      3 | PF       |  29 | 6-9    |    250 | Wake Forest | 2500000
Jason Thompson    | Toronto Raptors |      1 | PF       |  29 | 6-11   |    250 | Rider       |  245177
Jonas Valanciunas | Toronto Raptors |     17 | C        |  24 | 7-0    |    255 |             | 4660482

3.2.2.7.32 GET_DDL

GET_DDL() is a function that shows the CREATE TABLE statement for a table.

Tip:
• For views, see GET_VIEW_DDL.
• For the entire database, see DUMP_DATABASE_DDL.
• For UDFs, see GET_FUNCTION_DDL.

3.2.2.7.32.1 Permissions

The role must have the CONNECT permission at the database level.

3.2.2.7.32.2 Syntax

get_ddl_statement ::=
    SELECT GET_DDL('[schema_name.]table_name')
    ;

schema_name ::= identifier

table_name ::= identifier

3.2.2.7.32.3 Parameters

Parameter    Description
schema_name  The name of the schema.
table_name   The name of the table.


3.2.2.7.32.4 Examples

3.2.2.7.32.5 Getting the DDL for a table

The result of the GET_DDL function is a verbose version of the CREATE TABLE syntax, which may include additional information that was added by SQream DB. For example, a NULL constraint may be specified explicitly.

farm=> CREATE TABLE cool_animals (
   id INT NOT NULL,
   name varchar(30) NOT NULL,
   weight FLOAT,
   is_agressive BOOL DEFAULT false NOT NULL
);
executed

farm=> SELECT GET_DDL('cool_animals');
create table "public"."cool_animals" (
  "id" int not null,
  "name" varchar(30) not null,
  "weight" double null,
  "is_agressive" bool default false not null
)
;

3.2.2.7.32.6 Exporting table DDL to a file

COPY (SELECT GET_DDL('cool_animals')) TO '/home/rhendricks/animals.ddl';

3.2.2.7.33 GET_FUNCTION_DDL

GET_FUNCTION_DDL() is a function that shows the CREATE FUNCTION statement for a function.

Tip:
• For tables, see GET_DDL.
• For views, see GET_VIEW_DDL.
• For the entire database, see DUMP_DATABASE_DDL.

3.2.2.7.33.1 Permissions

The role must have the CONNECT permission at the database level.


3.2.2.7.33.2 Syntax

get_function_ddl_statement ::=
    SELECT GET_FUNCTION_DDL('function_name')
    ;

function_name ::= identifier

3.2.2.7.33.3 Parameters

Parameter      Description
function_name  The name of the function.

3.2.2.7.33.4 Examples

3.2.2.7.33.5 Getting the DDL for a function

The result of the GET_FUNCTION_DDL function is a verbose version of the CREATE FUNCTION statement, which may include additional information that was added by SQream DB. For example, some type names and identifiers may be quoted or altered.

master=> CREATE OR REPLACE FUNCTION my_distance (x1 float, y1 float, x2 float, y2 float) RETURNS float as $$
import math
if y1 < x1:
    return 0.0
else:
    return math.sqrt((y2 - y1) ** 2 + (x2 - x1) ** 2)
$$ LANGUAGE PYTHON;
executed

master=> SELECT GET_FUNCTION_DDL('my_distance');
create function "my_distance" (x1 float,
                               y1 float,
                               x2 float,
                               y2 float) returns float as
$$
import math
if y1 < x1:
    return 0.0
else:
    return math.sqrt((y2 - y1) ** 2 + (x2 - x1) ** 2)
$$
language python volatile;


3.2.2.7.33.6 Exporting function DDL to a file

COPY (SELECT GET_FUNCTION_DDL('my_distance')) TO '/home/rhendricks/my_distance.sql';

3.2.2.7.34 GET_LICENSE_INFO

GET_LICENSE_INFO is a utility function that displays license information: data size limitations, expiration date, and license type. When invoked, get_license_info() returns the license information in the following order:

1. Compressed data size (GB)
2. Uncompressed data size (GB)
3. Compression type
4. Data size limit (GB)
5. Expiration date
6. Is date expired (0/1): 0 is no, and 1 is yes
7. Is size exceeded (0/1): 0 is no, and 1 is yes
8. Data size left (GB)

The following is an example of the above licensing information:

10,100,compressed,20,2045-03-18,0,0,10
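This section gives no syntax block, so the invocation below is an assumption: it follows the SELECT form used by the other utility functions in this reference. The comments read the sample row above against the documented field order.

```sql
-- Assumed invocation form (not stated in this section)
SELECT GET_LICENSE_INFO();

-- Reading the sample row 10,100,compressed,20,2045-03-18,0,0,10:
--   1. compressed data size    = 10 GB
--   2. uncompressed data size  = 100 GB
--   3. compression type        = compressed
--   4. data size limit         = 20 GB
--   5. expiration date         = 2045-03-18
--   6. is date expired         = 0 (no)
--   7. is size exceeded        = 0 (no)
--   8. data size left          = 10 GB
```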

3.2.2.7.35 GET_VIEW_DDL

GET_VIEW_DDL() is a function that shows the CREATE VIEW statement for a view.

Tip:
• For tables, see GET_DDL.
• For the entire database, see DUMP_DATABASE_DDL.
• For UDFs, see GET_FUNCTION_DDL.

3.2.2.7.35.1 Permissions

The role must have the CONNECT permission at the database level.

3.2.2.7.35.2 Syntax


get_view_ddl_statement ::=
    SELECT GET_VIEW_DDL('[schema_name.]view_name')
    ;

schema_name ::= identifier

view_name ::= identifier

3.2.2.7.35.3 Parameters

Parameter    Description
schema_name  The name of the schema.
view_name    The name of the view.

3.2.2.7.35.4 Examples

3.2.2.7.35.5 Getting the DDL for a view

The result of the GET_VIEW_DDL function is a verbose version of the CREATE VIEW statement, which may include additional information that was added by SQream DB. For example, schemas and column names will be specified explicitly.

farm=> CREATE VIEW angry_animals AS SELECT * FROM cool_animals WHERE is_agressive = false;
executed

farm=> SELECT GET_VIEW_DDL('angry_animals');
create view "public".angry_animals as
  select
    "cool_animals"."id" as "id",
    "cool_animals"."name" as "name",
    "cool_animals"."weight" as "weight",
    "cool_animals"."is_agressive" as "is_agressive"
  from
    "public".cool_animals as cool_animals
  where
    "cool_animals"."is_agressive" = false;

3.2.2.7.35.6 Exporting view DDL to a file

COPY (SELECT GET_VIEW_DDL('angry_animals')) TO '/home/rhendricks/angry_animals.sql';

3.2.2.7.36 LIST_SAVED_QUERIES

LIST_SAVED_QUERIES lists the available previously saved queries. This is an alternative to using the savedqueries catalog view.

Read more in the Saved queries guide. See also: SAVE_QUERY, EXECUTE_SAVED_QUERY, DROP_SAVED_QUERY, SHOW_SAVED_QUERY.

3.2.2.7.36.1 Permissions

Listing the saved queries requires no special permissions.

3.2.2.7.36.2 Syntax

list_saved_queries_statement ::=
    SELECT LIST_SAVED_QUERIES()
    ;

3.2.2.7.36.3 Returns

List of saved query names, one per row.

3.2.2.7.36.4 Parameters

None

3.2.2.7.36.5 Notes

This statement returns an empty result set if no saved queries exist in the current database.

3.2.2.7.36.6 Examples

3.2.2.7.36.7 Listing previously saved queries

t=> SELECT LIST_SAVED_QUERIES();
saved_query
-------------------------
select_all
select_by_weight
select_by_weight_and_team

t=> SELECT SHOW_SAVED_QUERY('select_by_weight_and_team');
saved_query
-----------------------------------------------
SELECT * FROM nba WHERE Weight > ? AND Team = ?

3.2.2.7.36.8 Listing saved queries with the catalog

Using the catalog is also possible:

t=> SELECT * FROM sqream_catalog.savedqueries;
name                      | num_parameters
--------------------------+---------------
select_all                |              0
select_by_weight          |              1
select_by_weight_and_team |              2

3.2.2.7.37 RECOMPILE_SAVED_QUERY

RECOMPILE_SAVED_QUERY recompiles a saved query that has been invalidated due to a schema change.

3.2.2.7.37.1 Permissions

Recompiling a saved query requires no special permissions.

3.2.2.7.37.2 Syntax

recompile_saved_query_statement ::=
    SELECT RECOMPILE_SAVED_QUERY(saved_query_name)
    ;

saved_query_name ::= string_literal

3.2.2.7.37.3 Returns

If the saved query is recompiled successfully, the statement returns nothing.

3.2.2.7.37.4 Parameters

Parameter         Description
saved_query_name  The name of the query to recompile.

3.2.2.7.37.5 Examples

3.2.2.7.37.6 Recreating a query that has been invalidated

t=> SELECT SAVE_QUERY('select_by_weight_and_team',$$SELECT * FROM nba WHERE Weight > ? AND Team = ?$$);
executed

t=> SELECT EXECUTE_SAVED_QUERY('select_by_weight_and_team', 240, 'Toronto Raptors');
Name              | Team            | Number | Position | Age | Height | Weight | College     | Salary
------------------+-----------------+--------+----------+-----+--------+--------+-------------+--------
Bismack Biyombo   | Toronto Raptors |      8 | C        |  23 | 6-9    |    245 |             | 2814000
James Johnson     | Toronto Raptors |      3 | PF       |  29 | 6-9    |    250 | Wake Forest | 2500000
Jason Thompson    | Toronto Raptors |      1 | PF       |  29 | 6-11   |    250 | Rider       |  245177
Jonas Valanciunas | Toronto Raptors |     17 | C        |  24 | 7-0    |    255 |             | 4660482

To invalidate the original saved query, we will change the schema without affecting our original query text:

t=> ALTER TABLE nba RENAME COLUMN age to "Age (as of 2015)";
executed

However, because the query was compiled previously, this change invalidates the query and causes it to fail:

t=> SELECT EXECUTE_SAVED_QUERY('select_by_weight_and_team', 240, 'Toronto Raptors');
Error: column not found {Age@null}
column not found {Age@null}

Recompiling the query will fix this issue:

t=> SELECT RECOMPILE_SAVED_QUERY('select_by_weight_and_team');
executed

t=> SELECT EXECUTE_SAVED_QUERY('select_by_weight_and_team', 240, 'Toronto Raptors');
Name              | Team            | Number | Position | Age (as of 2015) | Height | Weight | College     | Salary
------------------+-----------------+--------+----------+------------------+--------+--------+-------------+--------
Bismack Biyombo   | Toronto Raptors |      8 | C        |               23 | 6-9    |    245 |             | 2814000
James Johnson     | Toronto Raptors |      3 | PF       |               29 | 6-9    |    250 | Wake Forest | 2500000
Jason Thompson    | Toronto Raptors |      1 | PF       |               29 | 6-11   |    250 | Rider       |  245177
Jonas Valanciunas | Toronto Raptors |     17 | C        |               24 | 7-0    |    255 |             | 4660482

3.2.2.7.38 RECOMPILE_VIEW

RECOMPILE_VIEW() is a function that can recreate a view that has been invalidated due to a schema change.

3.2.2.7.38.1 Permissions

The role must have the DDL permission at the database level, as well as SELECT permissions for any tables referenced by the view.


3.2.2.7.38.2 Syntax

recompile_view_statement ::=
    SELECT RECOMPILE_VIEW('[schema_name.]view_name')
    ;

schema_name ::= identifier

view_name ::= identifier

3.2.2.7.38.3 Parameters

Parameter    Description
schema_name  The name of the schema the view is in.
view_name    The name of the view.

3.2.2.7.38.4 Examples

3.2.2.7.38.5 Recreating a view that has been invalidated

farm=> SELECT * FROM agressive_animals;
View 'public.agressive_animals' is invalid since it references a table that has been dropped/altered. The probable candidates are: [ "public.cool_animals" ]

farm=> SELECT recompile_view('agressive_animals');
executed

3.2.2.7.39 SAVE_QUERY

SAVE_QUERY saves a query execution plan. Read more in the Saved queries guide. See also: EXECUTE_SAVED_QUERY , DROP_SAVED_QUERY , SHOW_SAVED_QUERY , LIST_SAVED_QUERIES.

3.2.2.7.39.1 Permissions

No special permissions are needed to save a query.

3.2.2.7.39.2 Syntax

save_query_statement ::=
    SELECT SAVE_QUERY(saved_query_name, parameterized_query_string)
    ;

saved_query_name ::= string_literal

parameterized_query_string ::= string_literal

3.2.2.7.39.3 Returns

If the query is saved successfully, this statement returns nothing.

3.2.2.7.39.4 Parameters

Parameter                   Description
saved_query_name            The name of the query to save. This name will identify the query.
parameterized_query_string  The query to save. Can be dollar quoted and contain parameters (?).

3.2.2.7.39.5 Notes

• Query names are unique across the database and can't be repeated unless dropped.
• The query is compiled upon save, but not executed. It must be a valid query at compile time.
• Query parameters can be used as substitutes for literal expressions. Parameters cannot be used to substitute identifiers, column names, table names, or other parts of the query.
• Query parameters of a string datatype (like VARCHAR) must be of a fixed length, and can be used in equality checks, but not patterns (e.g. LIKE, RLIKE, etc.).
• Query parameters' types are inferred at compile time.
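The parameter rules above can be sketched as follows. The saved query names select_by_age and select_from_any are hypothetical, and the statements assume the nba table described in the examples of this section.

```sql
-- Valid: ? stands in for a literal value, here compared against a numeric column
SELECT SAVE_QUERY('select_by_age', $$SELECT * FROM nba WHERE Age > ?$$);
SELECT EXECUTE_SAVED_QUERY('select_by_age', 30);

-- Not valid according to the notes above: a parameter cannot replace
-- a table name or a column name, so the following is left commented out.
-- SELECT SAVE_QUERY('select_from_any', $$SELECT * FROM ?$$);
```

Because query names are unique across the database, saving select_by_age a second time would require dropping it first with DROP_SAVED_QUERY.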

3.2.2.7.39.6 Examples

Assume a table named nba, with the following structure:

CREATE TABLE nba
(
   Name varchar(40),
   Team varchar(40),
   Number tinyint,
   Position varchar(2),
   Age tinyint,
   Height varchar(4),
   Weight real,
   College varchar(40),
   Salary float
);

Here’s a peek at the table contents (Download nba.csv):


Table 17: nba.csv

Name          | Team           | Number | Position | Age  | Height | Weight | College           | Salary
--------------+----------------+--------+----------+------+--------+--------+-------------------+-----------
Avery Bradley | Boston Celtics | 0.0    | PG       | 25.0 | 6-2    | 180.0  | Texas             | 7730337.0
Jae Crowder   | Boston Celtics | 99.0   | SF       | 25.0 | 6-6    | 235.0  | Marquette         | 6796117.0
John Holland  | Boston Celtics | 30.0   | SG       | 27.0 | 6-5    | 205.0  | Boston University |
R.J. Hunter   | Boston Celtics | 28.0   | SG       | 22.0 | 6-5    | 185.0  | Georgia State     | 1148640.0
Jonas Jerebko | Boston Celtics | 8.0    | PF       | 29.0 | 6-10   | 231.0  |                   | 5000000.0
Amir Johnson  | Boston Celtics | 90.0   | PF       | 29.0 | 6-9    | 240.0  |                   | 12000000.0
Jordan Mickey | Boston Celtics | 55.0   | PF       | 21.0 | 6-8    | 235.0  | LSU               | 1170960.0
Kelly Olynyk  | Boston Celtics | 41.0   | C        | 25.0 | 7-0    | 238.0  | Gonzaga           | 2165160.0
Terry Rozier  | Boston Celtics | 12.0   | PG       | 22.0 | 6-2    | 190.0  | Louisville        | 1824360.0

3.2.2.7.39.7 Saving and executing a simple query

t=> SELECT SAVE_QUERY('select_all','SELECT * FROM nba');
executed
t=> SELECT EXECUTE_SAVED_QUERY('select_all');
Name          | Team           | Number | Position | Age | Height | Weight | College           | Salary
--------------+----------------+--------+----------+-----+--------+--------+-------------------+---------
Avery Bradley | Boston Celtics |      0 | PG       |  25 | 6-2    |    180 | Texas             | 7730337
Jae Crowder   | Boston Celtics |     99 | SF       |  25 | 6-6    |    235 | Marquette         | 6796117
John Holland  | Boston Celtics |     30 | SG       |  27 | 6-5    |    205 | Boston University |
R.J. Hunter   | Boston Celtics |     28 | SG       |  22 | 6-5    |    185 | Georgia State     | 1148640
[...]

3.2.2.7.39.8 Saving and executing a parametrized query

Use parameters (?) as placeholders for literal values that are supplied later, at execution time.

Tip: Use dollar quoting ($$) to avoid escaping strings.


t=> SELECT SAVE_QUERY('select_by_weight_and_team',$$SELECT * FROM nba WHERE Weight > ? AND Team = ?$$);
executed
t=> SELECT EXECUTE_SAVED_QUERY('select_by_weight_and_team', 240, 'Toronto Raptors');
Name              | Team            | Number | Position | Age | Height | Weight | College     | Salary
------------------+-----------------+--------+----------+-----+--------+--------+-------------+---------
Bismack Biyombo   | Toronto Raptors |      8 | C        |  23 | 6-9    |    245 |             | 2814000
James Johnson     | Toronto Raptors |      3 | PF       |  29 | 6-9    |    250 | Wake Forest | 2500000
Jason Thompson    | Toronto Raptors |      1 | PF       |  29 | 6-11   |    250 | Rider       |  245177
Jonas Valanciunas | Toronto Raptors |     17 | C        |  24 | 7-0    |    255 |             | 4660482
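The save-then-execute pattern above can also be driven from client code. The following is a minimal Python sketch that only builds the statement text. The helper names (count_parameters, save_query_sql, execute_saved_query_sql) are hypothetical and are not part of any SQream DB client. The sketch assumes that the query text does not contain $$, that string arguments can be quoted by doubling single quotes, and that arguments are either strings or numbers.

```python
def count_parameters(query: str) -> int:
    """Count ? placeholders that appear outside single-quoted literals."""
    count = 0
    in_string = False
    for ch in query:
        if ch == "'":
            in_string = not in_string
        elif ch == "?" and not in_string:
            count += 1
    return count


def sql_literal(value) -> str:
    """Render a Python str or number as a SQL literal (assumed quoting rules)."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def save_query_sql(name: str, query: str) -> str:
    """Build the SAVE_QUERY statement, dollar-quoting the query text."""
    if "$$" in query:
        raise ValueError("query text must not contain $$ in this sketch")
    return f"SELECT SAVE_QUERY({sql_literal(name)},$${query}$$);"


def execute_saved_query_sql(name: str, query: str, *args) -> str:
    """Build the EXECUTE_SAVED_QUERY statement after checking the argument count."""
    expected = count_parameters(query)
    if len(args) != expected:
        raise ValueError(f"expected {expected} arguments, got {len(args)}")
    parts = [sql_literal(name)] + [sql_literal(a) for a in args]
    return f"SELECT EXECUTE_SAVED_QUERY({', '.join(parts)});"


QUERY = "SELECT * FROM nba WHERE Weight > ? AND Team = ?"
print(save_query_sql("select_by_weight_and_team", QUERY))
print(execute_saved_query_sql("select_by_weight_and_team", QUERY, 240, "Toronto Raptors"))
```

Checking the argument count on the client side is only a convenience; the database still infers parameter types and validates the query when it is saved.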

3.2.2.7.40 SHOW_SAVED_QUERY

SHOW_SAVED_QUERY shows the query text for a previously saved query. Read more in the Saved queries guide. See also: SAVE_QUERY, EXECUTE_SAVED_QUERY, DROP_SAVED_QUERY, LIST_SAVED_QUERIES.

3.2.2.7.40.1 Permissions

Showing a saved query requires no special permissions.

3.2.2.7.40.2 Syntax

show_saved_query_statement ::=
    SELECT SHOW_SAVED_QUERY(saved_query_name)
    ;

saved_query_name ::= string_literal

3.2.2.7.40.3 Returns

A single row result containing the saved query string.

3.2.2.7.40.4 Parameters

Parameter        | Description
-----------------+------------------------------
saved_query_name | The name of the query to show

3.2.2.7.40.5 Examples

3.2.2.7.40.6 Showing a previously saved query

t=> SELECT SAVE_QUERY('select_by_weight_and_team',$$SELECT * FROM nba WHERE Weight > ? AND Team = ?$$);
executed
t=> SELECT SHOW_SAVED_QUERY('select_by_weight_and_team');
saved_query
-----------------------------------------------
SELECT * FROM nba WHERE Weight > ? AND Team = ?

3.2.2.7.41 EXPLAIN

EXPLAIN returns a static query plan, which can be used to debug query plans. To see an actively running query or statement, use SHOW_NODE_INFO instead. See also SHOW_NODE_INFO, SHOW_SERVER_STATUS.

3.2.2.7.41.1 Permissions

The role must have the SELECT permissions for any tables referenced by the query.

3.2.2.7.41.2 Syntax

explain_statement ::=
    SELECT EXPLAIN(query_stmt)
    ;

3.2.2.7.41.3 Parameters

Parameter  | Description
-----------+--------------------------------------------
query_stmt | The select query to generate the plan for.

3.2.2.7.41.4 Notes

Use dollar quoting ($$) around the query text to avoid escaping quotes inside it.

3.2.2.7.41.5 Examples

3.2.2.7.41.6 Generating a static query plan


t=> SELECT EXPLAIN($$SELECT DATEADD(hour,1,dt) FROM cool_dates$$);
Select Specific (TTNative Gpu) (dateadd@null := dt@null, dateadd@val := (pure_if dt@null "0" (addHoursdt2dt dt@val "1"))) Table Scan public.cool_dates ("dt@null", "dt@val") []

3.2.2.7.42 SHOW_CONNECTIONS

SHOW_CONNECTIONS returns a list of active sessions on the current worker. To see sessions across the cluster, see SHOW_SERVER_STATUS.

3.2.2.7.42.1 Permissions

The role must have the SUPERUSER permissions.

3.2.2.7.42.2 Syntax

show_connections_statement ::=
    SELECT SHOW_CONNECTIONS()
    ;

3.2.2.7.42.3 Parameters

None

3.2.2.7.42.4 Returns

This function returns a list of active sessions. If no sessions are active on the worker, the result set will be empty.

Table 18: Result columns

Column          | Description
----------------+----------------------------------------------------------------
ip              | The worker hostname or IP
conn_id         | Connection ID
conn_start_time | Connection start timestamp
stmt_id         | Statement ID. Connections with no active statement display -1.
stmt_start_time | Statement start timestamp
stmt            | Statement text

3.2.2.7.42.5 Notes

• This utility shows the active connections. Some sessions may be actively connected, but not currently running a statement.
• A connection is typically reused. There could be many statements under a single connection ID.

3.2.2.7.42.6 Examples

3.2.2.7.42.7 Using SHOW_CONNECTIONS to get statement IDs

t=> SELECT SHOW_CONNECTIONS();
ip           | conn_id | conn_start_time     | stmt_id | stmt_start_time     | stmt
-------------+---------+---------------------+---------+---------------------+--------------------------
192.168.1.91 |     103 | 2019-12-24 00:01:27 |     129 | 2019-12-24 00:38:18 | SELECT GET_DATE(), * F...
192.168.1.91 |      23 | 2019-12-24 00:01:27 |      -1 | 2019-12-24 00:01:27 |
192.168.1.91 |      22 | 2019-12-24 00:01:27 |      -1 | 2019-12-24 00:01:27 |
192.168.1.91 |      26 | 2019-12-24 00:01:28 |      -1 | 2019-12-24 00:01:28 |

The statement ID we’re interested in is 129. We can see the connection started at 00:01:27, while the statement started at 00:38:18.

3.2.2.7.43 SHOW_LOCKS

SHOW_LOCKS returns a list of locks from across the cluster. Read more about locks in Concurrency and locks.

3.2.2.7.43.1 Permissions

The role must have the SUPERUSER permissions.

3.2.2.7.43.2 Syntax

show_locks_statement ::= SELECT SHOW_LOCKS() ;

3.2.2.7.43.3 Parameters

None

3.2.2.7.43.4 Returns

This function returns a list of active locks. If no locks are active in the cluster, the result set will be empty.

Table 19: Result columns

Column               | Description
---------------------+----------------------------------------------------------------------------
stmt_id              | Statement ID that caused the lock.
stmt_string          | Statement text
username             | The role that executed the statement
server               | The worker node’s IP
port                 | The worker node’s port
locked_object        | The fully qualified name of the object being locked, separated with $ (e.g. table$t$public$nba2 for table nba2 in schema public, in database t)
lockmode             | The locking mode (inclusive or exclusive).
statement_start_time | Timestamp the statement started
lock_start_time      | Timestamp the lock was obtained

3.2.2.7.43.5 Examples

3.2.2.7.43.6 Using SHOW_LOCKS to see active locks

In this example, we create a table based on results (CREATE TABLE AS), but we are also effectively dropping the previous table (by using OR REPLACE). Thus, SQream DB applies locks during the table creation process to prevent the table from being altered during its creation.

t=> SELECT SHOW_LOCKS();
statement_id | statement_string | username | server | port | locked_object | lockmode | statement_start_time | lock_start_time
-------------+------------------+----------+--------+------+---------------+----------+----------------------+----------------
287 | CREATE OR REPLACE TABLE nba2 AS SELECT "Name" FROM nba WHERE REGEXP_COUNT("Name", '( )+', 8)>1; | sqream | 192.168.1.91 | 5000 | database$t | Inclusive | 2019-12-26 00:03:30 | 2019-12-26 00:03:30
287 | CREATE OR REPLACE TABLE nba2 AS SELECT "Name" FROM nba WHERE REGEXP_COUNT("Name", '( )+', 8)>1; | sqream | 192.168.1.91 | 5000 | globalpermission$ | Exclusive | 2019-12-26 00:03:30 | 2019-12-26 00:03:30
287 | CREATE OR REPLACE TABLE nba2 AS SELECT "Name" FROM nba WHERE REGEXP_COUNT("Name", '( )+', 8)>1; | sqream | 192.168.1.91 | 5000 | schema$t$public | Inclusive | 2019-12-26 00:03:30 | 2019-12-26 00:03:30
287 | CREATE OR REPLACE TABLE nba2 AS SELECT "Name" FROM nba WHERE REGEXP_COUNT("Name", '( )+', 8)>1; | sqream | 192.168.1.91 | 5000 | table$t$public$nba2$Insert | Exclusive | 2019-12-26 00:03:30 | 2019-12-26 00:03:30
287 | CREATE OR REPLACE TABLE nba2 AS SELECT "Name" FROM nba WHERE REGEXP_COUNT("Name", '( )+', 8)>1; | sqream | 192.168.1.91 | 5000 | table$t$public$nba2$Update | Exclusive | 2019-12-26 00:03:30 | 2019-12-26 00:03:30
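When post-processing SHOW_LOCKS output in a script, it can help to split the $-separated locked_object value into its parts. The following Python sketch does that. The field order (object kind, database, schema, object name, trailing detail such as Insert or Update) is an assumption inferred from the example output above, not a documented format, and the function name parse_locked_object is hypothetical. Identifiers that themselves contain $ are not handled, and any segments beyond the fifth are ignored.

```python
# Assumed order of the segments that follow the object kind.
LOCK_FIELDS = ("database", "schema", "object", "detail")


def parse_locked_object(name: str) -> dict:
    """Split a locked_object value such as table$t$public$nba2 into named parts."""
    kind, *rest = name.split("$")
    parsed = {"kind": kind}
    for field, value in zip(LOCK_FIELDS, rest):
        if value:  # skip empty segments, e.g. the trailing one in globalpermission$
            parsed[field] = value
    return parsed


for locked in ("database$t", "globalpermission$", "schema$t$public",
               "table$t$public$nba2$Insert"):
    print(locked, "->", parse_locked_object(locked))
```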

3.2.2.7.44 SHOW_NODE_INFO

SHOW_NODE_INFO returns a snapshot of the current query plan, similar to EXPLAIN ANALYZE from other databases. The snapshot provides execution information that can be used to monitor and troubleshoot slow-running statements, for example by identifying long-running execution nodes (components that process data). See also EXPLAIN, SHOW_SERVER_STATUS.

3.2.2.7.44.1 Permissions

The role must have the SUPERUSER permissions.

3.2.2.7.44.2 Syntax

show_node_info_statement ::=
    SELECT SHOW_NODE_INFO(stmt_id)
    ;

stmt_id ::= bigint

3.2.2.7.44.3 Parameters

Parameter | Description
----------+-----------------------------
stmt_id   | The statement ID to explore

3.2.2.7.44.4 Returns

This utility returns details of the execution nodes, if the statement is still running. If the statement has finished, or the statement ID does not exist, the utility returns an empty result set.

Table 20: Result columns

Column name       | Description
------------------+---------------------------------------------------------------------------------
stmt_id           | The ID for the statement
node_id           | This node’s ID in this execution plan
node_type         | The node type
rows              | Total number of rows this node has processed
chunks            | Number of chunks this node has processed
avg_rows_in_chunk | Average amount of rows that this node processes in each chunk (rows / chunks)
time              | Timestamp for this node’s creation
parent_node_id    | The node_id of this node’s parent
read              | Total data read from disk
write             | Total data written to disk
comment           | Additional information (e.g. table name for ReadTable)
timesum           | Total elapsed time for this execution node’s processing

3.2.2.7.44.5 Node types

This is a full list of node types:

Table 21: Node types

Node type           | Execution location | Description
--------------------+--------------------+------------------------------------------------------------
AddChunkId          |                    | Used during insert operations
AddSequences        |                    | Used during insert operations, with identity columns
AddMinMaxMetadata   |                    | Used to calculate ranges for the chunk metadata system
AddTopSortFilters   |                    | An operation to optimize LIMIT when used alongside ORDER BY
Compress            | CPU and GPU        | Compress data with both CPU and GPU schemes
CpuDecompress       | CPU                | Decompression operation, common for longer VARCHAR types
CpuLoopJoin         | CPU                | A non-indexed nested loop join, performed on the CPU
CpuReduce           | CPU                | A reduce process performed on the CPU, primarily with DISTINCT aggregates (e.g. COUNT(DISTINCT ...))
CpuToGpu, GpuToCpu  |                    | An operation that moves data to or from the GPU for processing
CpuTransform        | CPU                | A transform operation performed on the CPU, usually a scalar function
CrossJoin           |                    | A join without a join condition
DeferredGather      | CPU                | Merges the results of GPU operations with a result set [1]
Distinct            | GPU                | Removes duplicate rows (usually as part of the DISTINCT operation)
Distinct_Merge      | CPU                | The merge operation of the Distinct operation
Filter              | GPU                | A filtering operation, such as a WHERE or JOIN clause
GpuCopy             | GPU                | Copies data between GPUs or within a single GPU
GpuDecompress       | GPU                | Decompression operation
GpuReduceMerge      | GPU                | An operation to optimize part of the merger phases in the GPU
GpuTransform        | GPU                | A transformation operation such as a type cast or scalar function
Hash                | CPU                | Hashes the output result. Used internally.
JoinSideMarker      |                    | Used internally.
LiteralValues       | CPU                | Creates a virtual relation (table), when VALUES is used
LocateFiles         | CPU                | Validates external file paths for foreign data wrappers, expanding directories and GLOB patterns
LoopJoin            | GPU                | A non-indexed nested loop join, performed on the GPU
MarkMatchedJoinRows |                    | Used in outer joins, matches rows for larger join operations
NullifyDuplicates   |                    | Replaces duplicate values with NULL to calculate distinct aggregates
NullSink            | CPU                | Used internally
ParseCsv            | CPU                | A CSV parser, used after ReadFiles to convert the CSV into columnar data
PopNetworkQueue     | CPU                | Fetches data from the network queue (e.g. when used with INSERT)
PushToNetworkQueue  | CPU                | Sends result sets to a client connected over the network
ReadCatalog         | CPU                | Converts the catalog into a relation (table)
ReadFiles           | CPU                | Reads external flat-files
ReadOrc             | CPU                | Reads data from an ORC file
ReadParquet         | CPU                | Reads data from a Parquet file
ReadTable           | CPU                | Reads data from a standard table stored on disk
ReadTableMetadata   | CPU                | Reads only table metadata as an optimization
Rechunk             |                    | Reorganize multiple small chunks into a full chunk. Commonly found after joins and when HIGH_SELECTIVITY is used
Reduce              | GPU                | A reduction operation, such as a GROUP BY
ReduceMerge         | GPU                | A merge operation of a reduction operation, helps operate on larger-than-RAM data
ReorderInput        |                    | Change the order of arguments in preparation for the next operation
SeparatedGather     | GPU                | Gathers additional columns for the result
Sort                | GPU                | Sort operation [2]
SortMerge           | CPU                | A merge operation of a sort operation, helps operate on larger-than-RAM data
SortMergeJoin       | GPU                | A sort-merge join, performed on the GPU
TakeRowsFromChunk   |                    | Take the first N rows from each chunk, to optimize LIMIT when used alongside ORDER BY
Top                 |                    | Limits the input size, when used with LIMIT (or its alias TOP)
UdfTransform        | CPU                | Executes a user defined function
UnionAll            |                    | Combines two sources of data when UNION ALL is used
VarCharJoiner       |                    | Used internally
VarCharSplitter     |                    | Used internally
Window              | GPU                | Executes a non-ranking window function
WindowRanking       | GPU                | Executes a ranking window function
WriteTable          | CPU                | Writes the result set to a standard table stored on disk

3.2.2.7.44.6 Statement statuses

Table 22: Statement status values

Status       | Description
-------------+----------------------------------------
Preparing    | Statement is being prepared
In queue     | Statement is waiting for execution
Initializing | Statement has entered execution checks
Executing    | Statement is executing
Stopping     | Statement is in the process of stopping

3.2.2.7.44.7 Notes

• This utility shows the execution information for active statements. Once a query has finished execution, the information is no longer available using this utility. See Logging for more information about extracting the information from the logs.
• This utility is primarily intended for troubleshooting with SQream support.

3.2.2.7.44.8 Examples

3.2.2.7.44.9 Getting execution details for a statement

t=> SELECT SHOW_SERVER_STATUS();
service | instanceid | connection_id | serverip | serverport | database_name | user_name | clientip | statementid | statement | statementstarttime | statementstatus | statementstatusstart
--------+------------+---------------+----------+------------+---------------+-----------+----------+-------------+-----------+--------------------+-----------------+---------------------
sqream | | 152 | 192.168.1.91 | 5000 | t | sqream | 192.168.1.91 | 176 | SELECT "Name" FROM nba WHERE REGEXP_COUNT("Name", '( )+', 8)>1; | 25-12-2019 23:53:13 | Executing | 25-12-2019 23:53:13
sqream | | 151 | 192.168.1.91 | 5000 | t | sqream | 192.168.0.1 | 177 | SELECT show_server_status() | 25-12-2019 23:51:31 | Executing | 25-12-2019 23:53:13

The statement ID we want to research is 176, running on worker 192.168.1.91. The query text is SELECT "Name" FROM nba WHERE REGEXP_COUNT("Name", '( )+', 8)>1;

t=> SELECT SHOW_NODE_INFO(176);
stmt_id | node_id | node_type          | rows | chunks | avg_rows_in_chunk | time                | parent_node_id | read | write | comment    | timeSum
--------+---------+--------------------+------+--------+-------------------+---------------------+----------------+------+-------+------------+--------
    176 |       1 | PushToNetworkQueue |    1 |      1 |                 1 | 2019-12-25 23:53:13 |             -1 |      |       |            | 0.0025
    176 |       2 | Rechunk            |    1 |      1 |                 1 | 2019-12-25 23:53:13 |              1 |      |       |            | 0
    176 |       3 | GpuToCpu           |    1 |      1 |                 1 | 2019-12-25 23:53:13 |              2 |      |       |            | 0
    176 |       4 | ReorderInput       |    1 |      1 |                 1 | 2019-12-25 23:53:13 |              3 |      |       |            | 0
    176 |       5 | Filter             |    1 |      1 |                 1 | 2019-12-25 23:53:13 |              4 |      |       |            | 0.0002
    176 |       6 | GpuTransform       |  457 |      1 |               457 | 2019-12-25 23:53:13 |              5 |      |       |            | 0.0002
    176 |       7 | GpuDecompress      |  457 |      1 |               457 | 2019-12-25 23:53:13 |              6 |      |       |            | 0
    176 |       8 | CpuToGpu           |  457 |      1 |               457 | 2019-12-25 23:53:13 |              7 |      |       |            | 0.0003
    176 |       9 | Rechunk            |  457 |      1 |               457 | 2019-12-25 23:53:13 |              8 |      |       |            | 0
    176 |      10 | CpuDecompress      |  457 |      1 |               457 | 2019-12-25 23:53:13 |              9 |      |       |            | 0
    176 |      11 | ReadTable          |  457 |      1 |               457 | 2019-12-25 23:53:13 |             10 | 4MB  |       | public.nba | 0.0004

Footnotes to Table 21 (Node types):
[1] Gathers columns which should be returned. This node typically spends most of the time on decompressing additional columns.
[2] A GPU sort operation can be added by the statement compiler before GROUP BY or JOIN operations.
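To show how the SHOW_NODE_INFO columns relate to each other, the following Python sketch works on the example rows above, copied by hand into a list of tuples. It finds the node with the largest timeSum and follows parent_node_id links from a leaf node up to the root (the node whose parent is -1). The tuple layout and the function names are assumptions made for this illustration; they are not part of SQream DB.

```python
# (node_id, node_type, rows, chunks, parent_node_id, timesum), from the example output
NODES = [
    (1, "PushToNetworkQueue", 1, 1, -1, 0.0025),
    (2, "Rechunk", 1, 1, 1, 0.0),
    (3, "GpuToCpu", 1, 1, 2, 0.0),
    (4, "ReorderInput", 1, 1, 3, 0.0),
    (5, "Filter", 1, 1, 4, 0.0002),
    (6, "GpuTransform", 457, 1, 5, 0.0002),
    (7, "GpuDecompress", 457, 1, 6, 0.0),
    (8, "CpuToGpu", 457, 1, 7, 0.0003),
    (9, "Rechunk", 457, 1, 8, 0.0),
    (10, "CpuDecompress", 457, 1, 9, 0.0),
    (11, "ReadTable", 457, 1, 10, 0.0004),
]


def slowest_node(nodes):
    """Return the node tuple with the largest timesum."""
    return max(nodes, key=lambda node: node[5])


def avg_rows_in_chunk(node):
    """Recompute avg_rows_in_chunk as rows / chunks."""
    return node[2] / node[3]


def path_to_root(nodes, node_id):
    """Follow parent_node_id links from node_id until the root (parent -1)."""
    by_id = {node[0]: node for node in nodes}
    path = []
    while node_id != -1:
        node = by_id[node_id]
        path.append(node[1])
        node_id = node[4]
    return path


print(slowest_node(NODES)[1])
print(" -> ".join(path_to_root(NODES, 11)))
```

Reading the path from ReadTable upward shows the order in which data flows through the plan: the table is read and decompressed on the CPU, moved to the GPU for transformation and filtering, then moved back and sent to the client.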

3.2.2.7.45 SHOW_SERVER_STATUS

SHOW_SERVER_STATUS returns a list of active sessions across the cluster. To list active statements on the current worker only, see SHOW_CONNECTIONS.

3.2.2.7.45.1 Permissions

The role must have the SUPERUSER permissions.

3.2.2.7.45.2 Syntax


show_server_status_statement ::= SELECT SHOW_SERVER_STATUS() ;

3.2.2.7.45.3 Parameters

None

3.2.2.7.45.4 Returns

This function returns a list of active sessions. If no sessions are active across the cluster, the result set will be empty.

Table 23: Result columns

Column               | Description
---------------------+------------------------------------
service              | The service name for the statement
instance             | The worker ID
connection_id        | Connection ID
serverip             | Worker end-point IP
serverport           | Worker end-point port
database_name        | Database name for the statement
user_name            | Username running the statement
clientip             | Client IP
statementid          | Statement ID
statement            | Statement text
statementstarttime   | Statement start timestamp
statementstatus      | Statement status (see table below)
statementstatusstart | Last updated timestamp

Table 24: Statement status values

Status       | Description
-------------+----------------------------------------
Preparing    | Statement is being prepared
In queue     | Statement is waiting for execution
Initializing | Statement has entered execution checks
Executing    | Statement is executing
Stopping     | Statement is in the process of stopping

3.2.2.7.45.5 Notes

• This utility shows the active sessions. Some sessions may be actively connected, but not running any statements.


3.2.2.7.45.6 Examples

3.2.2.7.45.7 Using SHOW_SERVER_STATUS to get statement IDs

t=> SELECT SHOW_SERVER_STATUS();
service | instanceid | connection_id | serverip | serverport | database_name | user_name | clientip | statementid | statement | statementstarttime | statementstatus | statementstatusstart
--------+------------+---------------+----------+------------+---------------+-----------+----------+-------------+-----------+--------------------+-----------------+---------------------
sqream | | 102 | 192.168.1.91 | 5000 | t | rhendricks | 192.168.0.1 | 128 | SELECT SHOW_SERVER_STATUS() | 24-12-2019 00:14:53 | Executing | 24-12-2019 00:14:53

The statement ID is 128, running on worker 192.168.1.91.
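Because the monitoring call itself always appears in the result, a script that summarizes SHOW_SERVER_STATUS output usually wants to leave it out. The following Python sketch counts statements per status while skipping the SHOW_SERVER_STATUS call. The rows are a hand-copied subset of the two-statement example shown in the SHOW_NODE_INFO section, the dictionary keys follow the result columns in Table 23, and the function name status_summary is hypothetical.

```python
from collections import Counter

# Hand-copied subset of columns from an example SHOW_SERVER_STATUS result
ROWS = [
    {"statementid": 176,
     "statement": "SELECT \"Name\" FROM nba WHERE REGEXP_COUNT(\"Name\", '( )+', 8)>1;",
     "statementstatus": "Executing"},
    {"statementid": 177,
     "statement": "SELECT show_server_status()",
     "statementstatus": "Executing"},
]


def status_summary(rows):
    """Count statements per status, ignoring the monitoring call itself."""
    return Counter(
        row["statementstatus"]
        for row in rows
        if "show_server_status" not in row["statement"].lower()
    )


print(dict(status_summary(ROWS)))
```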

3.2.2.7.46 STOP_STATEMENT

STOP_STATEMENT stops or aborts an active statement. To find a statement by ID, see SHOW_SERVER_STATUS and SHOW_CONNECTIONS.

Tip: Some DBMSs call this process killing a session, terminating a job, or kill query

3.2.2.7.46.1 Permissions

The role must have the SUPERUSER permissions.

3.2.2.7.46.2 Syntax

stop_statement_statement ::=
    SELECT STOP_STATEMENT(stmt_id)
    ;

stmt_id ::= bigint

3.2.2.7.46.3 Parameters

Parameter | Description
----------+--------------------------
stmt_id   | The statement ID to stop

3.2.2.7.46.4 Returns

This utility does not return any value, and always succeeds even if the statement does not exist, or has already stopped.


3.2.2.7.46.5 Notes

• This utility always succeeds even if the statement does not exist, or has already stopped.

3.2.2.7.46.6 Examples

3.2.2.7.46.7 Using SHOW_CONNECTIONS to get statement IDs

Tip: Use SHOW_SERVER_STATUS to find statements from across the entire cluster, or SHOW_CONNECTIONS to show statements from the current worker the client is connected to.

t=> SELECT SHOW_CONNECTIONS();
ip           | conn_id | conn_start_time     | stmt_id | stmt_start_time     | stmt
-------------+---------+---------------------+---------+---------------------+--------------------------
192.168.1.91 |     103 | 2019-12-24 00:01:27 |     129 | 2019-12-24 00:38:18 | SELECT GET_DATE(), * F...
192.168.1.91 |      23 | 2019-12-24 00:01:27 |      -1 | 2019-12-24 00:01:27 |
192.168.1.91 |      22 | 2019-12-24 00:01:27 |      -1 | 2019-12-24 00:01:27 |
192.168.1.91 |      26 | 2019-12-24 00:01:28 |      -1 | 2019-12-24 00:01:28 |

The statement ID we’re interested in is 129. We can now stop this statement:

t=> SELECT STOP_STATEMENT(129);
executed
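The same lookup-then-stop flow can be scripted. The following Python sketch takes rows shaped like the SHOW_CONNECTIONS result, skips connections with no active statement (stmt_id of -1), and builds a STOP_STATEMENT call for every statement that has been running longer than a chosen limit. It only produces statement text and does not connect to a database. The function name, the runtime limit, and the "current time" used in the demonstration are assumptions made for this illustration.

```python
from datetime import datetime, timedelta

# Timestamp layout used in the SHOW_CONNECTIONS example output
TS_FORMAT = "%Y-%m-%d %H:%M:%S"

CONNECTIONS = [
    {"conn_id": 103, "stmt_id": 129, "stmt_start_time": "2019-12-24 00:38:18"},
    {"conn_id": 23, "stmt_id": -1, "stmt_start_time": "2019-12-24 00:01:27"},
    {"conn_id": 22, "stmt_id": -1, "stmt_start_time": "2019-12-24 00:01:27"},
    {"conn_id": 26, "stmt_id": -1, "stmt_start_time": "2019-12-24 00:01:28"},
]


def stop_statements_for(rows, now, max_runtime):
    """Build STOP_STATEMENT calls for statements running longer than max_runtime."""
    statements = []
    for row in rows:
        if row["stmt_id"] == -1:
            continue  # connection has no active statement
        started = datetime.strptime(row["stmt_start_time"], TS_FORMAT)
        if now - started > max_runtime:
            statements.append(f"SELECT STOP_STATEMENT({row['stmt_id']});")
    return statements


# Hypothetical "current time" for the demonstration
now = datetime(2019, 12, 24, 1, 0, 0)
print(stop_statements_for(CONNECTIONS, now, timedelta(minutes=10)))
```

Since STOP_STATEMENT succeeds even when the statement has already finished, a script like this does not need to handle the case where a statement ends between the lookup and the stop call.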

3.2.2.7.47 SHOW_SUBSCRIBED_INSTANCES

SHOW_SUBSCRIBED_INSTANCES lists the cluster workers and their service queues.

Note: If you haven’t already, read the Workload manager guide.

See also SUBSCRIBE_SERVICE, UNSUBSCRIBE_SERVICE.

3.2.2.7.47.1 Permissions

To list the service queues, the current role must have the SUPERUSER permission.

3.2.2.7.47.2 Syntax

show_subscribed_instances_statement ::=
    SELECT SHOW_SUBSCRIBED_INSTANCES()
    ;

3.2.2.7.47.3 Parameters

None

3.2.2.7.47.4 Notes

This function applies to clusters using the workload manager.

3.2.2.7.47.5 Examples

3.2.2.7.47.6 List service queues and workers

Each worker is assigned an automatic ID, and is subscribed to the 'sqream' queue upon start.

t=> SELECT SHOW_SUBSCRIBED_INSTANCES();
service | servernode | serverip      | serverport
--------+------------+---------------+-----------
sqream  | node_9383  | 192.168.0.111 |       5000
sqream  | node_9384  | 192.168.0.111 |       5001
sqream  | node_9385  | 192.168.0.111 |       5002
sqream  | node_9551  | 192.168.1.91  |       5000

This list shows all cluster nodes, their service names, IPs, and ports.

3.2.2.7.48 SUBSCRIBE_SERVICE

SUBSCRIBE_SERVICE subscribes a worker to a service queue for the duration of the connected session.

Note: If you haven’t already, read the Workload manager guide.

See also UNSUBSCRIBE_SERVICE, SHOW_SUBSCRIBED_INSTANCES.

3.2.2.7.48.1 Permissions

To subscribe a worker to a service, the current role must have the SUPERUSER permission.

3.2.2.7.48.2 Syntax

subscribe_service_statement ::=
    SELECT SUBSCRIBE_SERVICE( worker_id, service_name )
    ;

worker_id ::= text

service_name ::= text

3.2.2.7.48.3 Parameters

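Parameter    | Description
-------------+------------------------------------------------------------------------------------
worker_id    | The name of the worker to subscribe
service_name | The service name to subscribe to. If the service does not exist, it will be created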

3.2.2.7.48.4 Notes

• If the service name does not currently exist, it will be created

Warning: SUBSCRIBE_SERVICE applies the service subscription immediately, but the setting applies only for the duration of the session. To apply a persistent setting, use the initialSubscribedServices configuration setting. Read the Workload manager guide for more information.

3.2.2.7.48.5 Examples

3.2.2.7.48.6 Subscribing a worker to a service queue

A common use of the workload manager is to separate ETL tasks from BI tasks. Each worker is assigned an automatic ID, and is subscribed to the 'sqream' queue upon start.

t=> SELECT SHOW_SUBSCRIBED_INSTANCES();
service | servernode | serverip      | serverport
--------+------------+---------------+-----------
sqream  | node_9383  | 192.168.0.111 |       5000
sqream  | node_9384  | 192.168.0.111 |       5001
sqream  | node_9385  | 192.168.0.111 |       5002
sqream  | node_9551  | 192.168.1.91  |       5000

We want to modify node_9551 to join the ETL queue, which will be created when we subscribe it:

t=> SELECT SUBSCRIBE_SERVICE('node_9551','etl');
executed

t=> SELECT SHOW_SUBSCRIBED_INSTANCES();
service | servernode | serverip      | serverport
--------+------------+---------------+-----------
sqream  | node_9383  | 192.168.0.111 |       5000
sqream  | node_9384  | 192.168.0.111 |       5001
sqream  | node_9385  | 192.168.0.111 |       5002
sqream  | node_9551  | 192.168.1.91  |       5000
etl     | node_9551  | 192.168.1.91  |       5000

To connect to the new queue, use the service parameter in the connection string. A statement executed from a connection to the etl service will be routed to an available worker that is subscribed to the etl service queue.
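To see which workers can serve a given queue, the SHOW_SUBSCRIBED_INSTANCES rows can be grouped by service. The following Python sketch does this for the example result above, copied by hand into tuples. It is a toy model of the subscription table only; it does not reproduce how SQream DB actually picks a worker, and the function name workers_by_service is hypothetical.

```python
from collections import defaultdict

# (service, servernode, serverip, serverport), from the example output
SUBSCRIPTIONS = [
    ("sqream", "node_9383", "192.168.0.111", 5000),
    ("sqream", "node_9384", "192.168.0.111", 5001),
    ("sqream", "node_9385", "192.168.0.111", 5002),
    ("sqream", "node_9551", "192.168.1.91", 5000),
    ("etl", "node_9551", "192.168.1.91", 5000),
]


def workers_by_service(rows):
    """Group worker names by the service queue they are subscribed to."""
    grouped = defaultdict(list)
    for service, node, _ip, _port in rows:
        grouped[service].append(node)
    return dict(grouped)


queues = workers_by_service(SUBSCRIPTIONS)
print(queues["etl"])     # workers eligible for statements sent to the etl service
print(queues["sqream"])  # workers eligible for the default sqream service
```

In this example node_9551 appears in both lists, because subscribing it to etl did not remove its subscription to sqream.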

3.2.2.7.49 UNSUBSCRIBE_SERVICE

UNSUBSCRIBE_SERVICE unsubscribes a worker from a service queue for the duration of the connected session.

Note: If you haven’t already, read the Workload manager guide.

See also SUBSCRIBE_SERVICE, SHOW_SUBSCRIBED_INSTANCES.

3.2.2.7.49.1 Permissions

To unsubscribe a worker from a service, the current role must have the SUPERUSER permission.

3.2.2.7.49.2 Syntax

unsubscribe_service_statement ::=
    SELECT UNSUBSCRIBE_SERVICE( worker_id, service_name )
    ;

worker_id ::= text

service_name ::= text

3.2.2.7.49.3 Parameters

Parameter    | Description
-------------+---------------------------------------
worker_id    | The name of the worker to unsubscribe
service_name | The service name to unsubscribe from

3.2.2.7.49.4 Notes

• If the service name does not currently exist, it will be created

Warning: UNSUBSCRIBE_SERVICE removes the service subscription immediately, but the change applies only for the duration of the session. To apply a persistent setting, use the initialSubscribedServices configuration setting. Read the Workload manager guide for more information.

3.2.2.7.49.5 Examples

3.2.2.7.49.6 Unsubscribing a worker from a service queue

Each worker is assigned an automatic ID, and is subscribed to the 'sqream' queue upon start.

t=> SELECT SHOW_SUBSCRIBED_INSTANCES();
service | servernode | serverip      | serverport
--------+------------+---------------+-----------
sqream  | node_9383  | 192.168.0.111 |       5000
sqream  | node_9384  | 192.168.0.111 |       5001
sqream  | node_9385  | 192.168.0.111 |       5002
sqream  | node_9551  | 192.168.1.91  |       5000
etl     | node_9551  | 192.168.1.91  |       5000

We want to modify node_9551 to leave the ETL queue:

t=> SELECT UNSUBSCRIBE_SERVICE('node_9551','etl');
executed

t=> SELECT SHOW_SUBSCRIBED_INSTANCES();
service | servernode | serverip      | serverport
--------+------------+---------------+-----------
sqream  | node_9383  | 192.168.0.111 |       5000
sqream  | node_9384  | 192.168.0.111 |       5001
sqream  | node_9385  | 192.168.0.111 |       5002
sqream  | node_9551  | 192.168.1.91  |       5000

3.2.2.7.50 ALTER DEFAULT PERMISSIONS

ALTER DEFAULT PERMISSIONS allows granting automatic permissions to future objects. By default, if one user creates a table, another user will not have SELECT permissions on it. By modifying the target role’s default permissions, a database administrator can ensure that all objects created by that role will be accessible to others. Learn more about the permission system in the access control guide.

3.2.2.7.50.1 Permissions

To alter default permissions, the current role must have the SUPERUSER permission.

3.2.2.7.50.2 Syntax

alter_default_permissions_statement ::=
    ALTER DEFAULT PERMISSIONS FOR target_role_name
    [IN schema_name, ...]
    FOR { TABLES | SCHEMAS }
    { grant_clause | DROP grant_clause}
    TO ROLE { role_name | public };

grant_clause ::=
    GRANT
    { CREATE FUNCTION
    | SUPERUSER
    | CONNECT
    | CREATE
    | USAGE
    | SELECT
    | INSERT
    | DELETE
    | DDL
    | EXECUTE
    | ALL
    }

target_role_name ::= identifier

role_name ::= identifier

schema_name ::= identifier

3.2.2.7.50.3 Supported permissions

Permission      | Object                                     | Description
----------------+--------------------------------------------+------------------------------------------------------------
LOGIN           | Cluster                                    | Login permissions (with a password) allow a role to be a user and log in to a database
PASSWORD        | Cluster                                    | Sets the password for a user role
CREATE FUNCTION | Database                                   | Allows a user to create a Python UDF
SUPERUSER       | Cluster, Database, Schema                  | The most privileged role, with full control over a cluster, database, or schema
CONNECT         | Database                                   | Allows a user to connect and use a database
CREATE          | Database, Schema, Table                    | For a role to create and manage objects, it needs the CREATE and USAGE permissions at the respective level
USAGE           | Schema                                     | For a role to see tables in a schema, it needs the USAGE permissions
SELECT          | Table                                      | Allows a user to run SELECT queries on table contents
INSERT          | Table                                      | Allows a user to run COPY FROM and INSERT statements to load data into a table
DELETE          | Table                                      | Allows a user to run DELETE, TRUNCATE statements to delete data from a table
DDL             | Database, Schema, Table, Function          | Allows a user to alter tables, rename columns and tables, etc.
EXECUTE         | Function                                   | Allows a user to execute UDFs
ALL             | Cluster, Database, Schema, Table, Function | All of the above permissions at the respective level

3.2.2.7.50.4 Examples

3.2.2.7.50.5 Automatic permissions for newly created schemas

When role demo creates a new schema, roles u1,u2 will get USAGE and CREATE permissions in the new schema:


ALTER DEFAULT PERMISSIONS FOR demo FOR SCHEMAS GRANT USAGE, CREATE TO u1,u2;

3.2.2.7.50.6 Automatic permissions for newly created tables in a schema

When role demo creates a new table in schema s1, roles u1,u2 will be granted SELECT on it:

ALTER DEFAULT PERMISSIONS FOR demo IN s1 FOR TABLES GRANT SELECT TO u1,u2;

3.2.2.7.50.7 Revoke (DROP GRANT) permissions for newly created tables

ALTER DEFAULT PERMISSIONS FOR public FOR TABLES DROP GRANT SELECT,DDL,INSERT,DELETE TO public;

3.2.2.7.51 ALTER ROLE

ALTER ROLE can be used to make changes to a role. It works in conjunction with subcommands. Learn more about the permission system in the access control guide.

3.2.2.7.51.1 Subcommands

Command     | Usage
------------+---------------
RENAME ROLE | Rename a role

3.2.2.7.52 CREATE ROLE

CREATE ROLE creates roles which can be assigned permissions. A ROLE can be used as a USER if it has been granted a password and login permissions. Learn more about the permission system in the access control guide. See also DROP ROLE, ALTER ROLE, GRANT.

3.2.2.7.52.1 Permissions

To create a role, the current role must have the SUPERUSER permission.

3.2.2.7.52.2 Syntax

create_role_statement ::=
    CREATE ROLE role_name ;

role_name ::= identifier

3.2.2.7.52.3 Parameters

Parameter | Description
----------+-------------------------------------------------------------------------
role_name | The name of the role to create, which must be unique across the cluster

3.2.2.7.52.4 Notes

SQream recommends that role names:
• Are case-insensitive
• Start with either a letter or underscore
• Contain only ASCII letters, numbers, or underscores
• Follow rules for Identifiers

• A role has no permissions by default. Follow the examples below to see how to grant permissions to databases and tables.
• Roles can be members of other roles.
• Roles must be unique across the cluster.
• Roles cannot log in by default. They do not have a password or login permissions until granted.

3.2.2.7.52.5 Examples

3.2.2.7.52.6 Creating a role with no permissions

CREATE ROLE new_role;

3.2.2.7.52.7 Creating a user role

A user role has permissions to login, and has a password.

Tip: Some DBMSs call this CREATE USER or ADD USER

CREATE ROLE new_role;
GRANT LOGIN to new_role;
GRANT PASSWORD 'Tr0ub4dor&3' to new_role;
GRANT CONNECT ON DATABASE master to new_role; -- Repeat for other desired databases
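When many user roles need to be created, the four statements above can be generated from a script. The following Python sketch builds the statement text for one role and checks the role name against the recommended naming rules (starts with a letter or underscore, contains only ASCII letters, numbers, or underscores). The function name user_role_statements is hypothetical, the quote-doubling used for the password is an assumption about string escaping, and database names are passed through without validation.

```python
import re

# Recommended naming rule: a letter or underscore, then ASCII letters, digits, underscores
ROLE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def user_role_statements(role, password, databases):
    """Return the statements that create a role able to log in and connect."""
    if not ROLE_NAME.fullmatch(role):
        raise ValueError(f"role name {role!r} does not follow the recommended rules")
    quoted = password.replace("'", "''")  # assumed escaping for single quotes
    statements = [
        f"CREATE ROLE {role};",
        f"GRANT LOGIN TO {role};",
        f"GRANT PASSWORD '{quoted}' TO {role};",
    ]
    # One CONNECT grant per database the user should be able to use
    statements += [f"GRANT CONNECT ON DATABASE {db} TO {role};" for db in databases]
    return statements


for statement in user_role_statements("new_role", "Tr0ub4dor&3", ["master"]):
    print(statement)
```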

3.2.2.7.53 DROP ROLE

DROP ROLE removes a role. Learn more about the permission system in the access control guide. See also CREATE ROLE.

3.2.2.7.53.1 Permissions

To drop a role, the current role must have the SUPERUSER permission.

3.2.2.7.53.2 Syntax

drop_role_statement ::=
    DROP ROLE role_name ;

role_name ::= identifier

3.2.2.7.53.3 Parameters

Parameter | Description
----------+------------------------------
role_name | The name of the role to drop.

3.2.2.7.53.4 Examples

3.2.2.7.53.5 Dropping a role

DROP ROLE new_role;

3.2.2.7.54 GET_STATEMENT_PERMISSIONS

GET_STATEMENT_PERMISSIONS analyzes an SQL statement and returns a list of permissions required to execute it. Use this function to understand the permissions required, before granting them to a specific role. Learn more about the permission system in the access control guide. See also GRANT, CREATE ROLE.

3.2.2.7.54.1 Permissions

No special permissions are required to run GET_STATEMENT_PERMISSIONS.

3.2.2.7.54.2 Syntax

get_statement_permissions_statement ::=
    SELECT GET_STATEMENT_PERMISSIONS(query_stmt) ;

3.2.2.7.54.3 Parameters

Parameter  | Description
-----------+-------------------------
query_stmt | The statement to analyze

3.2.2.7.54.4 Returns

This utility returns details of the required permissions to run the statement. If the statement requires no special permissions, the utility returns an empty result set.

Table 25: Result columns

Column          | Description
----------------+-----------------------------------------------
permission_type | The permission type required
object_type     | The object on which the permission is required
object_name     | The object name

3.2.2.7.54.5 Supported permissions

Permission      | Object                                     | Description
----------------+--------------------------------------------+------------------------------------------------------------
LOGIN           | Cluster                                    | Login permissions (with a password) allow a role to be a user and log in to a database
PASSWORD        | Cluster                                    | Sets the password for a user role
CREATE FUNCTION | Database                                   | Allows a user to create a Python UDF
SUPERUSER       | Cluster, Database, Schema                  | The most privileged role, with full control over a cluster, database, or schema
CONNECT         | Database                                   | Allows a user to connect and use a database
CREATE          | Database, Schema, Table                    | For a role to create and manage objects, it needs the CREATE and USAGE permissions at the respective level
USAGE           | Schema                                     | For a role to see tables in a schema, it needs the USAGE permission
SELECT          | Table                                      | Allows a user to run SELECT queries on table contents
INSERT          | Table                                      | Allows a user to run COPY FROM and INSERT statements to load data into a table
DELETE          | Table                                      | Allows a user to run DELETE and TRUNCATE statements to delete data from a table
DDL             | Database, Schema, Table, Function          | Allows a user to alter tables, rename columns and tables, etc.
EXECUTE         | Function                                   | Allows a user to execute UDFs
ALL             | Cluster, Database, Schema, Table, Function | All of the above permissions at the respective level

3.2.2.7.54.6 Examples

3.2.2.7.54.7 Getting permission details for a simple statement

t=> SELECT GET_STATEMENT_PERMISSIONS('SELECT * from nba');
permission_type | object_type | object_name
----------------+-------------+------------------
SELECT          | table       | master.public.nba
USAGE           | schema      | master.public

3.2.2.7.54.8 Getting permission details for a DDL statement

Tip: Use dollar quoting ($$) to avoid escaping a statement

t=> SELECT GET_STATEMENT_PERMISSIONS($$ALTER TABLE nba RENAME COLUMN "Weight" TO "Mass"$$);
permission_type | object_type | object_name
----------------+-------------+------------------
DDL             | table       | master.public.nba
USAGE           | schema      | master.public
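
The "inspect first, then grant" workflow described at the top of this section can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the role name new_role is hypothetical, and the GRANT statements simply mirror the two rows reported for the simple SELECT example (SELECT on the table, USAGE on the schema).

```sql
-- Step 1: ask which permissions the statement requires
SELECT GET_STATEMENT_PERMISSIONS($$SELECT * FROM nba$$);

-- Step 2: grant the reported permissions to a (hypothetical) role
GRANT USAGE ON SCHEMA public TO new_role;
GRANT SELECT ON TABLE nba TO new_role;
```

The role may also need CONNECT on the database before it can run the statement at all; see GRANT.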

3.2.2.7.55 GRANT

The GRANT statement adds permissions for a role. It allows for setting permissions to databases, schemas, and tables. It also allows adding a role as a member of another role. When granting permissions to a role, the permission is added for existing objects only. To automatically add permissions to newly created objects, see ALTER DEFAULT PERMISSIONS. Learn more about the permission system in the access control guide. See also REVOKE, CREATE ROLE.

3.2.2.7.55.1 Permissions

To grant permissions, the current role must have the SUPERUSER permission, or have the ADMIN OPTION.

3.2.2.7.55.2 Syntax

grant_statement ::=
    {
    -- Grant permissions at the cluster level:
    GRANT
      { SUPERUSER
      | LOGIN
      | PASSWORD 'password'
      }
      TO role_name [, ...]

    -- Grant permissions at the database level:
    | GRANT
      {
        { CREATE | CONNECT | DDL | SUPERUSER | CREATE FUNCTION } [, ...]
        | ALL [PERMISSIONS]
      }
      ON DATABASE database_name [, ...]
      TO role_name [, ...]

    -- Grant permissions at the schema level:
    | GRANT
      {
        { CREATE | DDL | USAGE | SUPERUSER } [, ...]
        | ALL [PERMISSIONS]
      }
      ON SCHEMA schema_name [, ...]
      TO role_name [, ...]

    -- Grant permissions at the object level:
    | GRANT
      {
        { SELECT | INSERT | DELETE | DDL } [, ...]
        | ALL [PERMISSIONS]
      }
      ON { TABLE table_name [, ...] | ALL TABLES IN SCHEMA schema_name [, ...] }
      TO role_name [, ...]

    -- Grant execute function permission:
    | GRANT { ALL | EXECUTE | DDL } ON FUNCTION function_name
      TO role_name [, ...]

    -- Pass permissions between roles by granting one role to another:
    | GRANT role_name [, ...] TO role_name2 [ WITH ADMIN OPTION ]
    }
    ;

role_name ::= identifier
role_name2 ::= identifier
database_name ::= identifier
table_name ::= identifier
schema_name ::= identifier

3.2.2.7.55.3 Parameters

Parameter                                             | Description
------------------------------------------------------+------------------------------------------------------------
role_name                                             | The name of the role to grant permissions to
table_name, database_name, schema_name, function_name | Object to grant permissions on.
WITH ADMIN OPTION                                     | If WITH ADMIN OPTION is specified, the role that has the admin option can in turn grant membership in the role to others, and revoke membership in the role as well. Without the admin option, ordinary roles cannot grant or revoke membership. Roles with SUPERUSER can grant or revoke membership in any role to anyone.

3.2.2.7.55.4 Supported permissions

Permission      | Object                                     | Description
----------------+--------------------------------------------+------------------------------------------------------------
LOGIN           | Cluster                                    | Login permissions (with a password) allow a role to be a user and log in to a database
PASSWORD        | Cluster                                    | Sets the password for a user role
CREATE FUNCTION | Database                                   | Allows a user to create a Python UDF
SUPERUSER       | Cluster, Database, Schema                  | The most privileged role, with full control over a cluster, database, or schema
CONNECT         | Database                                   | Allows a user to connect and use a database
CREATE          | Database, Schema, Table                    | For a role to create and manage objects, it needs the CREATE and USAGE permissions at the respective level
USAGE           | Schema                                     | For a role to see tables in a schema, it needs the USAGE permission
SELECT          | Table                                      | Allows a user to run SELECT queries on table contents
INSERT          | Table                                      | Allows a user to run COPY FROM and INSERT statements to load data into a table
DELETE          | Table                                      | Allows a user to run DELETE and TRUNCATE statements to delete data from a table
DDL             | Database, Schema, Table, Function          | Allows a user to alter tables, rename columns and tables, etc.
EXECUTE         | Function                                   | Allows a user to execute UDFs
ALL             | Cluster, Database, Schema, Table, Function | All of the above permissions at the respective level

3.2.2.7.55.5 Examples

3.2.2.7.55.6 Creating a user role with login permissions

Convert a role to a user by granting a password and login permissions.

CREATE ROLE new_role;
GRANT LOGIN to new_role;
GRANT PASSWORD 'Tr0ub4dor&3' to new_role;
GRANT CONNECT ON DATABASE master to new_role; -- Repeat for other desired databases

3.2.2.7.55.7 Promoting a user to a superuser

-- On the entire cluster
GRANT SUPERUSER TO new_role;

-- For a specific database
GRANT SUPERUSER ON DATABASE my_database TO new_role;

3.2.2.7.55.8 Creating a new role for a group of users

-- Create new users (we will grant them passwords and logins later)
CREATE ROLE dba_user1;
CREATE ROLE dba_user2;
CREATE ROLE dba_user3;

-- Add new users to the existing r_database_architect role
GRANT r_database_architect TO dba_user1;
GRANT r_database_architect TO dba_user2;
GRANT r_database_architect TO dba_user3;
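
The example above assumes that r_database_architect already exists. As a rough sketch of how such a group role might be prepared beforehand, the statements below use only the forms listed in the syntax section. The choice of permissions, and the master database and public schema, are illustrative assumptions rather than a recommended policy.

```sql
-- Create the group role. It has no password or login, so it only carries permissions.
CREATE ROLE r_database_architect;

-- Database-level permissions (database name is illustrative)
GRANT CONNECT, CREATE, DDL ON DATABASE master TO r_database_architect;

-- Schema-level permissions (schema name is illustrative)
GRANT CREATE, DDL, USAGE ON SCHEMA public TO r_database_architect;

-- Table-level permissions on the tables that exist in the schema right now
GRANT SELECT, INSERT, DELETE, DDL ON ALL TABLES IN SCHEMA public TO r_database_architect;
```

Because GRANT applies to existing objects only, tables created later are not covered by the last statement; see ALTER DEFAULT PERMISSIONS.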

3.2.2.7.55.9 Granting with admin option

If WITH ADMIN OPTION is specified, the role that has the admin option can in turn grant membership in the role to others, and revoke membership in the role as well.

-- dba_user1 is our team lead, so he should be able to grant
-- permissions to other users.

GRANT r_database_architect TO dba_user1 WITH ADMIN OPTION;

3.2.2.7.55.10 Change password for user role

To change a user role’s password, grant the user a new password.

GRANT PASSWORD 'new_password' TO rhendricks;

Note: Granting a new password overrides any previous password. Changing the password while the role has an active running statement does not affect that statement, but will affect subsequent statements.

3.2.2.7.56 RENAME ROLE

RENAME ROLE can be used to rename a role. Learn more about the permission system in the access control guide.

3.2.2.7.56.1 Permissions

To rename a role, the current role must have the SUPERUSER permission.

3.2.2.7.56.2 Syntax

alter_role_rename_role_statement ::=
    ALTER ROLE role_name RENAME TO new_name ;

role_name ::= identifier
new_name ::= identifier

3.2.2.7.56.3 Parameters

Parameter | Description
----------+-------------------------------------
role_name | The role name to apply the change to
new_name  | The new role name

3.2.2.7.56.4 Notes

• Role names are unique across the cluster.
• SQream recommends that role names:
  – Are case-insensitive
  – Start with either a letter or an underscore
  – Contain only ASCII letters, numbers, or underscores
  – Follow the rules for Identifiers

3.2.2.7.56.5 Examples

3.2.2.7.56.6 Renaming a role

ALTER ROLE rhendricks RENAME TO patches;

3.2.2.7.57 REVOKE

The REVOKE statement removes permissions from a role. It allows for removing permissions to specific objects. Learn more about the permission system in the access control guide. See also GRANT, DROP ROLE.

3.2.2.7.57.1 Permissions

To revoke permissions, the current role must have the SUPERUSER permission, or have the ADMIN OPTION.

3.2.2.7.57.2 Syntax

revoke_statement ::=
    {
    -- Revoke permissions at the cluster level:
    REVOKE
      { SUPERUSER
      | LOGIN
      | PASSWORD 'password'
      }
      FROM role_name [, ...]

    -- Revoke permissions at the database level:
    | REVOKE
      {
        { CREATE | CONNECT | DDL | SUPERUSER | CREATE FUNCTION } [, ...]
        | ALL [PERMISSIONS]
      }
      ON DATABASE database_name [, ...]
      FROM role_name [, ...]

    -- Revoke permissions at the schema level:
    | REVOKE
      {
        { CREATE | DDL | USAGE | SUPERUSER } [, ...]
        | ALL [PERMISSIONS]
      }
      ON SCHEMA schema_name [, ...]
      FROM role_name [, ...]

    -- Revoke permissions at the object level:
    | REVOKE
      {
        { SELECT | INSERT | DELETE | DDL } [, ...]
        | ALL [PERMISSIONS]
      }
      ON { TABLE table_name [, ...] | ALL TABLES IN SCHEMA schema_name [, ...] }
      FROM role_name [, ...]

    -- Revoke execute function permission:
    | REVOKE { ALL | EXECUTE | DDL } ON FUNCTION function_name
      FROM role_name [, ...]

    -- Remove permissions between roles by revoking role membership:
    | REVOKE role_name [, ...] FROM role_name2 [ WITH ADMIN OPTION ]
    }
    ;

role_name ::= identifier
role_name2 ::= identifier
database_name ::= identifier
table_name ::= identifier
schema_name ::= identifier

3.2.2.7.57.3 Parameters

Parameter                                             | Description
------------------------------------------------------+------------------------------------------------------------
role_name                                             | The name of the role to revoke permissions from
table_name, database_name, schema_name, function_name | Object to revoke permissions on.
WITH ADMIN OPTION                                     | If WITH ADMIN OPTION is specified, the role that has the admin option can in turn grant membership in the role to others, and revoke membership in the role as well. Specifying WITH ADMIN OPTION for revocation will return the role to an ordinary role. An ordinary role cannot grant or revoke membership.

3.2.2.7.57.4 Supported permissions

Permission      | Object                                     | Description
----------------+--------------------------------------------+------------------------------------------------------------
LOGIN           | Cluster                                    | Login permissions (with a password) allow a role to be a user and log in to a database
PASSWORD        | Cluster                                    | Sets the password for a user role
CREATE FUNCTION | Database                                   | Allows a user to create a Python UDF
SUPERUSER       | Cluster, Database, Schema                  | The most privileged role, with full control over a cluster, database, or schema
CONNECT         | Database                                   | Allows a user to connect and use a database
CREATE          | Database, Schema, Table                    | For a role to create and manage objects, it needs the CREATE and USAGE permissions at the respective level
USAGE           | Schema                                     | For a role to see tables in a schema, it needs the USAGE permission
SELECT          | Table                                      | Allows a user to run SELECT queries on table contents
INSERT          | Table                                      | Allows a user to run COPY FROM and INSERT statements to load data into a table
DELETE          | Table                                      | Allows a user to run DELETE and TRUNCATE statements to delete data from a table
DDL             | Database, Schema, Table, Function          | Allows a user to alter tables, rename columns and tables, etc.
EXECUTE         | Function                                   | Allows a user to execute UDFs
ALL             | Cluster, Database, Schema, Table, Function | All of the above permissions at the respective level

3.2.2.7.57.5 Examples

3.2.2.7.57.6 Prevent a role from modifying table contents

If you don’t trust user shifty, revoke the DDL and INSERT permissions.

REVOKE INSERT ON TABLE important_table FROM shifty;
REVOKE DDL ON TABLE important_table FROM shifty;
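
According to the Supported permissions table, DELETE and TRUNCATE are governed by a separate DELETE permission, so a role that still holds it can remove rows. A hedged sketch of closing that gap is shown below; whether shifty holds DELETE in the first place is an assumption, and the combined form relies on the permission list ([, ...]) shown in the syntax section.

```sql
-- Also stop shifty from running DELETE and TRUNCATE on the table
REVOKE DELETE ON TABLE important_table FROM shifty;

-- Alternatively, revoke all three modifying permissions in one statement
REVOKE INSERT, DELETE, DDL ON TABLE important_table FROM shifty;
```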

3.2.2.7.57.7 Demoting a user from superuser

-- On the entire cluster
REVOKE SUPERUSER FROM new_role;

3.2.2.7.57.8 Revoking admin option

If WITH ADMIN OPTION is specified, the role that has the admin option can in turn grant membership in the role to others, and revoke membership in the role as well.

-- dba_user1 has been demoted from team lead, so he should not be able to grant
-- permissions to other users.

REVOKE r_database_architect FROM dba_user1 WITH ADMIN OPTION;

3.2.3 SQL Functions

SQream DB supports functions from ANSI SQL, as well as others for compatibility.

3.2.3.1 Summary of functions

• Built-In Scalar Functions
  – Bitwise Operations
  – Conditionals
  – Conversion
  – Date and Time
  – Numeric
  – Strings
• User-Defined Scalar Functions
• Aggregate Functions
• Window Functions
• System Functions
• Workload Management Functions

3.2.3.1.1 Built-In Scalar Functions

See more about Built-In Scalar functions

3.2.3.1.1.1 Bitwise Operations

Function                 | Description
-------------------------+--------------------
& (bitwise AND)          | Bitwise AND
~ (bitwise NOT)          | Bitwise NOT
| (bitwise OR)           | Bitwise OR
<< (bitwise shift left)  | Bitwise shift left
>> (bitwise shift right) | Bitwise shift right
XOR (bitwise XOR)        | Bitwise XOR

3.2.3.1.1.2 Conditionals

Function | Description
---------+------------------------------------------------------------
BETWEEN  | Value is in [ or not within ] the range
CASE     | Test a conditional expression, and depending on the result, evaluate additional expressions.
COALESCE | Evaluate first non-NULL expression
IN       | Value is in [ or not within ] a set of values
ISNULL   | Alias for COALESCE with two expressions
IS_ASCII | Test a TEXT for ASCII-only characters
IS NULL  | Check for NULL [ or non-NULL ] values

3.2.3.1.1.3 Conversion

Function                   | Description
---------------------------+---------------------------------------------------------
FROM_UNIXTS, FROM_UNIXTSMS | Converts a UNIX Timestamp to DATE or DATETIME
TO_HEX                     | Converts a number to a hexadecimal string representation
TO_UNIXTS, TO_UNIXTSMS     | Converts a DATE or DATETIME to a UNIX Timestamp

3.2.3.1.1.4 Date and Time

Function          | Description
------------------+------------------------------------------------------------
CURDATE           | Special syntax, equivalent to CURRENT_DATE
CURRENT_DATE      | Returns the current date as DATE
CURRENT_TIMESTAMP | Equivalent to GETDATE
DATEPART          | Extracts a date or time element from a date expression
DATEADD           | Adds an interval to a date expression
DATEDIFF          | Calculates the time difference between two date expressions
EOMONTH           | Calculates the last day of the month of a given date expression
EXTRACT           | ANSI syntax for extracting date or time element from a date expression
GETDATE           | Returns the current timestamp as DATETIME
SYSDATE           | Equivalent to GETDATE
TRUNC             | Truncates a date element down to a specified date or time element

3.2.3.1.1.5 Numeric

See more about Arithmetic operators

Table 26: Arithmetic operators

Operator  | Syntax | Description
----------+--------+---------------------------------------------------------------
+ (unary) | +a     | Converts a string to a numeric value. Identical to a :: double
+         | a + b  | Adds two expressions together
- (unary) | -a     | Negates a numeric expression
-         | a - b  | Subtracts b from a
*         | a * b  | Multiplies a by b
/         | a / b  | Divides a by b
%         | a % b  | Modulus of a by b. See also MOD, %

Table 27: Functions

Function       | Description
---------------+------------------------------------------------------------
ABS            | Calculates the absolute value of an argument
ACOS           | Calculates the inverse cosine of an argument
ASIN           | Calculates the inverse sine of an argument
ATAN           | Calculates the inverse tangent of an argument
ATN2           | Calculates the inverse tangent for a point (y, x)
CEILING / CEIL | Calculates the next integer for an argument
COS            | Calculates the cosine of an argument
COT            | Calculates the cotangent of an argument
CRC64          | Calculates a CRC-64 hash of an argument
DEGREES        | Converts a value from radian values to degrees
EXP            | Calculates the natural exponent for an argument (e^x)
FLOOR          | Calculates the largest integer smaller than or equal to the argument
LOG            | Calculates the natural log for an argument
LOG10          | Calculates the 10-based log for an argument
MOD, %         | Calculates the modulus (remainder) of two arguments
PI             | Returns the constant value for π
POWER          | Calculates x to the power of y (x^y)
RADIANS        | Converts a value from degree values to radians
ROUND          | Rounds an argument down to the nearest integer, or an arbitrary precision
SIN            | Calculates the sine of an argument
SQRT           | Calculates the square root of an argument (√x)
SQUARE         | Raises an argument to the power of 2 (x^2)
TAN            | Calculates the tangent of an argument
TRUNC          | Rounds a number to its integer representation towards 0

3.2.3.1.1.6 Strings

Function         | Description
-----------------+------------------------------------------------------------
CHAR_LENGTH      | Calculates number of characters in an argument
CHARINDEX        | Calculates the position where a string starts inside another string
|| (Concatenate) | Concatenates two strings
ISPREFIXOF       | Matches if a string is the prefix of another string
LEFT             | Returns the first number of characters from an argument
LEN              | Calculates the length of a string in characters
LIKE             | Tests if a string argument matches a pattern
LOWER            | Converts an argument to a lower-case equivalent
LTRIM            | Trims whitespace from the left side of an argument
OCTET_LENGTH     | Calculates the length of a string in bytes
PATINDEX         | Calculates the position where a pattern matches a string
REGEXP_COUNT     | Calculates the number of matches of a regular expression in an argument
REGEXP_INSTR     | Returns the start position of a regular expression match in an argument
REGEXP_SUBSTR    | Returns a substring of an argument that matches a regular expression
REPEAT           | Repeats a string as many times as specified
REPLACE          | Replaces characters in a string
REVERSE          | Reverses a string argument
RIGHT            | Returns the last number of characters from an argument
RLIKE            | Tests if a string argument matches a regular expression pattern
RTRIM            | Trims whitespace from the right side of an argument
SUBSTRING        | Returns a substring of an argument
TRIM             | Trims whitespace from an argument
UPPER            | Converts an argument to an upper-case equivalent

3.2.3.1.2 User-Defined Scalar Functions

For more information about user-defined scalar functions, see Scalar SQL UDF

3.2.3.1.3 Aggregate Functions

See more about Aggregate Functions

Function    | Aliases       | Description
------------+---------------+------------------------------------------------------------
AVG         |               | Calculates the average of all of the values
CORR        |               | Calculates the Pearson correlation coefficient
COUNT       |               | Calculates the count of all of the values or only distinct values
COVAR_POP   |               | Calculates population covariance of values
COVAR_SAMP  |               | Calculates sample covariance of values
MAX         |               | Returns maximum value of all values
MIN         |               | Returns minimum value of all values
SUM         |               | Calculates the sum of all of the values or only distinct values
STDDEV_SAMP | stdev, stddev | Calculates sample standard deviation of values
STDDEV_POP  | stdevp        | Calculates population standard deviation of values
VAR_SAMP    | var, variance | Calculates sample variance of values
VAR_POP     | varp          | Calculates population variance of values

3.2.3.1.4 Window Functions

See more about Window Functions

Function     | Description
-------------+------------------------------------------------------------
LAG          | Calculates the value evaluated at the row that is before the current row within the partition
LEAD         | Calculates the value evaluated at the row that is after the current row within the partition
MAX          | Calculates the maximum value
MIN          | Calculates the minimum value
SUM          | Calculates the sum of all of the values
RANK         | Calculates the rank of a row
FIRST_VALUE  | Returns the value in the first row of a window
LAST_VALUE   | Returns the value in the last row of a window
NTH_VALUE    | Returns the value in a specified (n) row of a window
DENSE_RANK   | Returns the rank of the current row with no gaps
PERCENT_RANK | Returns the relative rank of the current row
CUME_DIST    | Returns the cumulative distribution of rows
NTILE        | Returns an integer ranging between 1 and the argument value, dividing the partitions as equally as possible

3.2.3.1.5 System Functions

System functions allow you to execute actions in the system, such as aborting a query or getting information about system processes.

Function           | Description
-------------------+------------------------------------------------------------
EXPLAIN            | Returns a static query plan for a statement
SHOW_CONNECTIONS   | Returns a list of jobs and statements on the current worker
SHOW_LOCKS         | Returns any existing locks in the database
SHOW_NODE_INFO     | Returns a query plan for an actively running statement with timing information
SHOW_SERVER_STATUS | Shows running statements across the cluster
SHOW_VERSION       | Returns the version of SQream DB
STOP_STATEMENT     | Stops a query (or statement) if it is currently running

3.2.3.1.6 Workload Management Functions

Function                  | Description
--------------------------+----------------------------------------------
SUBSCRIBE_SERVICE         | Add a SQream DB worker to a service queue
UNSUBSCRIBE_SERVICE       | Remove a SQream DB worker from a service queue
SHOW_SUBSCRIBED_INSTANCES | Return a list of service queues and workers

3.2.3.1.6.1 Built-In Scalar functions

Built-in scalar functions return one value per call.

3.2.3.1.6.2 & (bitwise AND)

Returns the bitwise AND of two numeric expressions

3.2.3.1.6.3 Syntax

expr1 & expr2 --> integer

expr1 ::= integer
expr2 ::= integer

3.2.3.1.6.4 Arguments

Parameter    | Description
-------------+--------------------
expr1, expr2 | Integer expressions

3.2.3.1.6.5 Returns

Returns an integer that is the bitwise AND of the inputs.

3.2.3.1.6.6 Notes

• If either value is NULL, the result is NULL.

3.2.3.1.6.7 Examples

master=> SELECT 16 & 24;
16
master=> SELECT 101 & 110;
100
master=> SELECT 32 & 64;
0
master=> CREATE TABLE bit(b1 int, b2 int, b3 int);
executed
master=> INSERT INTO bit VALUES (1,2,3), (2, 4, 6), (4, 2, 6), (2, 8, 16), (null, null, 64), (5, 3, 1), (6, 1, 0);
executed

SELECT b1, b2, b3, b1 & b2, b2 & b3, b1 & b3 FROM bit;
b1 | b2 | b3 | ?column? | ?column?0 | ?column?1
---+----+----+----------+-----------+----------
1  | 2  | 3  |        0 |         2 |         1
2  | 4  | 6  |        0 |         4 |         2
4  | 2  | 6  |        0 |         2 |         4
2  | 8  | 16 |        0 |         0 |         0
   |    | 64 |          |           |
5  | 3  | 1  |        1 |         1 |         1
6  | 1  | 0  |        0 |         0 |         0
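
A common use of bitwise AND is masking, that is, keeping or testing selected bits. The sketch below reuses the bit table from the examples above. The column aliases are illustrative, and the parentheses around the AND are added defensively because operator precedence is not stated in this section.

```sql
-- Keep only the lowest bit of b1: 1 for odd values, 0 for even values
SELECT b1, b1 & 1 AS lowest_bit FROM bit;

-- Test whether the bit with value 4 is set in b3
SELECT b3,
       CASE WHEN (b3 & 4) = 4 THEN 'set' ELSE 'not set' END AS bit_4
  FROM bit;
```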

3.2.3.1.6.8 ~ (bitwise NOT)

Returns the bitwise NOT (negation) of a numeric expression. This is the bitwise complement.

3.2.3.1.6.9 Syntax

~ expr --> integer

expr ::= integer

3.2.3.1.6.10 Arguments

Parameter | Description
----------+-------------------
expr      | Integer expression

3.2.3.1.6.11 Returns

Returns an integer that is the bitwise NOT of the input.

3.2.3.1.6.12 Notes

• If the value is NULL, the result is NULL.

3.2.3.1.6.13 Examples

master=> SELECT ~16, ~24;
-17,-25
master=> SELECT ~101, ~110;
-102,-111
master=> SELECT ~32, ~64;
-33,-65
master=> CREATE TABLE bit(b1 int, b2 int, b3 int);
executed
master=> INSERT INTO bit VALUES (1,2,3), (2, 4, 6), (4, 2, 6), (2, 8, 16), (null, null, 64), (5, 3, 1), (6, 1, 0);
executed

SELECT b1, b2, b3, ~b1, ~b2, ~b3 FROM bit;
b1 | b2 | b3 | ?column? | ?column?0 | ?column?1
---+----+----+----------+-----------+----------
1  | 2  | 3  |       -2 |        -3 |        -4
2  | 4  | 6  |       -3 |        -5 |        -7
4  | 2  | 6  |       -5 |        -3 |        -7
2  | 8  | 16 |       -3 |        -9 |       -17
   |    | 64 |          |           |       -65
5  | 3  | 1  |       -6 |        -4 |        -2
6  | 1  | 0  |       -7 |        -2 |        -1

3.2.3.1.6.14 | (bitwise OR)

Returns the bitwise OR of two numeric expressions

3.2.3.1.6.15 Syntax

expr1 | expr2 --> integer

expr1 ::= integer
expr2 ::= integer

3.2.3.1.6.16 Arguments

Parameter    | Description
-------------+--------------------
expr1, expr2 | Integer expressions

3.2.3.1.6.17 Returns

Returns an integer that is the bitwise OR of the inputs.

3.2.3.1.6.18 Notes

• If either value is NULL, the result is NULL.

3.2.3.1.6.19 Examples

master=> SELECT 16 | 24;
24
master=> SELECT 101 | 110;
111
master=> SELECT 32 | 64;
96
master=> CREATE TABLE bit(b1 int, b2 int, b3 int);
executed
master=> INSERT INTO bit VALUES (1,2,3), (2, 4, 6), (4, 2, 6), (2, 8, 16), (null, null, 64), (5, 3, 1), (6, 1, 0);
executed

SELECT b1, b2, b3, b1 | b2, b2 | b3, b1 | b3 FROM bit;
b1 | b2 | b3 | ?column? | ?column?0 | ?column?1
---+----+----+----------+-----------+----------
1  | 2  | 3  |        3 |         3 |         3
2  | 4  | 6  |        6 |         6 |         6
4  | 2  | 6  |        6 |         6 |         6
2  | 8  | 16 |       10 |        24 |        18
   |    | 64 |          |           |
5  | 3  | 1  |        7 |         3 |         5
6  | 1  | 0  |        7 |         1 |         6

3.2.3.1.6.20 << (bitwise shift left)

Returns the bitwise shift left of a numeric expression

3.2.3.1.6.21 Syntax

expr << n --> integer

expr ::= integer
n ::= integer

3.2.3.1.6.22 Arguments

Parameter | Description
----------+---------------------------
expr      | Integer expression
n         | Number of bits to shift by

3.2.3.1.6.23 Returns

Returns an integer that is the input shifted left by n positions.

3.2.3.1.6.24 Notes

• If either value is NULL, the result is NULL.

3.2.3.1.6.25 Examples

master=> SELECT 16 << 1;
32
master=> SELECT 16 << 2;
64
master=> select 1 << 1;
2
master=> CREATE TABLE bit(b1 int, b2 int, b3 int);
executed
master=> INSERT INTO bit VALUES (1,2,3), (2, 4, 6), (4, 2, 6), (2, 8, 16), (null, null, 64), (5, 3, 1), (6, 1, 0);
executed

SELECT b1, b2, b3, b1 << b2, b2 << b3, b1 << 1 FROM bit;
b1 | b2 | b3 | ?column? | ?column?0 | ?column?1
---+----+----+----------+-----------+----------
1  | 2  | 3  |        4 |        16 |         2
2  | 4  | 6  |       32 |       256 |         4
4  | 2  | 6  |       16 |       128 |         8
2  | 8  | 16 |      512 |    524288 |         4
   |    | 64 |          |           |
5  | 3  | 1  |       40 |         6 |        10
6  | 1  | 0  |       12 |         1 |        12
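
The results above follow a simple rule: for small non-negative integers, shifting left by n positions gives the same value as multiplying by 2 to the power of n. A brief sketch against the same bit table makes the comparison explicit; the aliases are illustrative, and behavior on overflow is not covered here.

```sql
-- Shifting left by 3 positions versus multiplying by 2^3 = 8
SELECT b1,
       b1 << 3 AS shifted,
       b1 * 8  AS multiplied
  FROM bit;
```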

3.2.3.1.6.26 >> (bitwise shift right)

Returns the bitwise shift right of a numeric expression

3.2.3.1.6.27 Syntax

expr >> n --> integer

expr ::= integer
n ::= integer

3.2.3.1.6.28 Arguments

Parameter | Description
----------+---------------------------
expr      | Integer expression
n         | Number of bits to shift by

3.2.3.1.6.29 Returns

Returns an integer that is the input shifted right by n positions.

3.2.3.1.6.30 Notes

• If either value is NULL, the result is NULL.

3.2.3.1.6.31 Examples

master=> SELECT 16 >> 1;
8
master=> SELECT 16 >> 2;
4
master=> select 1 >> 1;
0
master=> CREATE TABLE bit(b1 int, b2 int, b3 int);
executed
master=> INSERT INTO bit VALUES (1,2,3), (2, 4, 6), (4, 2, 6), (2, 8, 16), (null, null, 64), (5, 3, 1), (6, 1, 0);
executed

SELECT b1, b2, b3, b1 >> b2, b2 >> b3, b1 >> 1 FROM bit;
b1 | b2 | b3 | ?column? | ?column?0 | ?column?1
---+----+----+----------+-----------+----------
1  | 2  | 3  |        0 |         0 |         0
2  | 4  | 6  |        0 |         0 |         1
4  | 2  | 6  |        1 |         0 |         2
2  | 8  | 16 |        0 |         0 |         1
   |    | 64 |          |           |
5  | 3  | 1  |        0 |         1 |         2
6  | 1  | 0  |        3 |         1 |         3

3.2.3.1.6.32 XOR (bitwise XOR)

Returns the bitwise XOR of two numeric expressions

3.2.3.1.6.33 Syntax

XOR(expr1, expr2) --> integer

expr1 ::= integer
expr2 ::= integer

3.2.3.1.6.34 Arguments

Parameter    | Description
-------------+--------------------
expr1, expr2 | Integer expressions

3.2.3.1.6.35 Returns

Returns an integer that is the bitwise XOR of the inputs.

3.2.3.1.6.36 Notes

• If either value is NULL, the result is NULL.

3.2.3.1.6.37 Examples

master=> SELECT XOR(16, 24);
8
master=> SELECT XOR(101, 110);
11
master=> SELECT XOR(32, 64);
96
master=> SELECT XOR(512, 512);
0
master=> CREATE TABLE bit(b1 int, b2 int, b3 int);
executed
master=> INSERT INTO bit VALUES (1,2,3), (2, 4, 6), (4, 2, 6), (2, 8, 16), (null, null, 64), (5, 3, 1), (6, 1, 0);
executed

SELECT b1, b2, b3, xor(b1, b2), xor(b2, b3), xor(b1, b3) FROM bit;
b1 | b2 | b3 | xor | xor0 | xor1
---+----+----+-----+------+-----
1  | 2  | 3  |   3 |    1 |    2
2  | 4  | 6  |   6 |    2 |    4
4  | 2  | 6  |   6 |    4 |    2
2  | 8  | 16 |  10 |   24 |   18
   |    | 64 |     |      |
5  | 3  | 1  |   6 |    2 |    4
6  | 1  | 0  |   7 |    1 |    6
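
XOR keeps the bits that are set in exactly one of its inputs, so it equals the bits set in either input (OR) minus the bits set in both (AND). The following sketch, which assumes the bit table from the examples above and uses illustrative aliases, places the two calculations side by side so they can be compared row by row.

```sql
-- XOR(a, b) compared with (a | b) - (a & b)
SELECT b1, b2,
       XOR(b1, b2)           AS xor_result,
       (b1 | b2) - (b1 & b2) AS or_minus_and
  FROM bit;
```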

3.2.3.1.6.38 BETWEEN

BETWEEN is used to simplify range tests, by returning TRUE when the input is within two boundaries.

3.2.3.1.6.39 Syntax

expr [ NOT ] BETWEEN lower_bound AND upper_bound

3.2.3.1.6.40 Arguments

Parameter                | Description
-------------------------+------------------------------------------------------
expr                     | A general value expression or a literal.
lower_bound, upper_bound | Lower and upper bounds, of the same data type as expr

3.2.3.1.6.41 Returns

Returns TRUE when expr is within the bounds, or FALSE otherwise.

3.2.3.1.6.42 Notes

• expr BETWEEN X AND Y is equivalent to expr >= X AND expr <= Y.
• The upper boundary must be greater than the lower boundary.

3.2.3.1.6.43 Examples

3.2.3.1.6.44 BETWEEN

farm=> SELECT name, num_eyes FROM cool_animals WHERE num_eyes BETWEEN 5 and 8;
name           | num_eyes
---------------+---------
Spider         |        8
Starfish       |        5
Praying mantis |        5

3.2.3.1.6.45 NOT BETWEEN

farm=> SELECT name, num_eyes FROM cool_animals WHERE num_eyes NOT BETWEEN 5 and 8;
name           | num_eyes
---------------+---------
Human          |        2
Horseshoe crab |       10
Box Jellyfish  |       24
Fox            |        2
Possum         |        2
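
The equivalence stated in the Notes can be checked with a self-contained query pair. This is a small sketch that builds an inline table with VALUES (the same pattern the CASE examples use) instead of relying on the cool_animals table; the rows are a hypothetical subset. Both bounds are inclusive, so both queries should keep the rows with 5 and 8 eyes.

```sql
-- Range test written with BETWEEN
SELECT name, num_eyes
  FROM (VALUES ('Spider', 8), ('Starfish', 5), ('Human', 2), ('Horseshoe crab', 10))
       AS cool_animals(name, num_eyes)
 WHERE num_eyes BETWEEN 5 AND 8;

-- The same test written with explicit comparisons
SELECT name, num_eyes
  FROM (VALUES ('Spider', 8), ('Starfish', 5), ('Human', 2), ('Horseshoe crab', 10))
       AS cool_animals(name, num_eyes)
 WHERE num_eyes >= 5 AND num_eyes <= 8;
```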

3.2.3.1.6.46 CASE

A CASE expression tests a conditional expression and, depending on the result, evaluates additional expressions. CASE works like nested IF ... THEN ... ELSE .... When one of the conditions evaluates to TRUE, the evaluation stops and the expression or operand associated with the result (THEN ...) is evaluated. Otherwise, the evaluation continues to the next condition. If no condition matches, the ELSE result is returned.

3.2.3.1.6.47 Syntax

case_expression ::= searched_case | simple_case

searched_case ::=
    CASE WHEN conditional_value_expr THEN result_value_expr
    [WHEN ...]
    [ELSE result_value_expr]
    END

simple_case ::=
    CASE value_expr
    WHEN value_expr THEN result_value_expr
    [WHEN ...]
    [ELSE result_value_expr]
    END

3.2.3.1.6.48 Arguments

Parameter              | Description
-----------------------+------------------------------------------------------------
conditional_value_expr | A condition that evaluates to a boolean (TRUE, FALSE, or NULL)
value_expr             | A general value expression or a literal.
result_value_expr      | A general value expression or a literal. All data types of result_value_expr must match within a CASE.

3.2.3.1.6.49 Returns

Returns an expression consistent with the type of result_value_expr.

3.2.3.1.6.50 Notes

• SQream DB does not support the IF .. THEN .. ELSE syntax. A simple case expression can replace it.
• If no ELSE is specified, the default result will be NULL.
• All WHEN ... expressions must have the same data type.
• All THEN ... and the ELSE results must have the same data type.

3.2.3.1.6.51 Simple case expression

simple_case ::=
    CASE value_expr
    WHEN value_expr THEN result_value_expr
    [WHEN ...]
    [ELSE result_value_expr]
    END

A simple case expression compares the value_expr that follows CASE with the value_expr of each WHEN, and then evaluates and returns the result from the corresponding THEN. If no matches are found, the ELSE result is returned, or NULL if no ELSE is specified.

3.2.3.1.6.52 Searched case expression

searched_case ::=
    CASE WHEN conditional_value_expr THEN result_value_expr
    [WHEN ...]
    [ELSE result_value_expr]
    END

A searched case expression evaluates each conditional_value_expr in the order listed. At the first conditional_value_expr that evaluates to TRUE, the result of the THEN will be returned. If no matches are found, the ELSE result is returned, or NULL if no ELSE is specified.

3.2.3.1.6.53 Examples

3.2.3.1.6.54 Simple case

master=> SELECT name, CASE num_eyes
.          WHEN 1 THEN 'Cyclops'
.          WHEN 2 THEN 'Binocular'
.          WHEN 5 THEN 'Pentocular'
.          WHEN 8 then 'Octocular'
.          ELSE 'Other'
.        END
.        FROM (VALUES ('Copepod',1), ('Spider',8), ('Starfish', 5), ('Praying mantis', 5), ('Human (average)', 2), ('Eagle', 2), ('Horseshoe crab', 10))
.        AS cool_animals(name, num_eyes);
name            | ?column?
----------------+-----------
Copepod         | Cyclops
Spider          | Octocular
Starfish        | Pentocular
Praying mantis  | Pentocular
Human (average) | Binocular
Eagle           | Binocular
Horseshoe crab  | Other

3.2.3.1.6.55 Searched case

SELECT age,
       CASE
         WHEN age < 3 THEN 'Toddler'
         WHEN age < 12 THEN 'Child'
         WHEN age < 20 THEN 'Teenager'
         WHEN age < 34 THEN 'Young adult'
         WHEN age < 65 THEN 'Adult'
         ELSE 'Senior'
       END AS "Age Group"
FROM (VALUES (2), (5), (15), (19), (32), (44), (87)) AS t(age);
age | Age Group
----+------------
2   | Toddler
5   | Child
15  | Teenager
19  | Teenager
32  | Young adult
44  | Adult
87  | Senior

3.2.3.1.6.56 Replacing IF with CASE

As SQream DB does not support the IF function found on some other DBMSs, use CASE instead.

-- MySQL syntax:
IF (age > 65, "Senior", "Other")

is functionally identical to:

CASE WHEN age > 65 THEN 'Senior' ELSE 'Other' END
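The note that a missing ELSE defaults to NULL can be illustrated with a short sketch. The inline VALUES list and the eye_class alias are illustrative assumptions, not part of the reference above.

```sql
-- Sketch: a searched CASE with no ELSE branch.
-- Per the notes above, a row that matches no WHEN (here 'Copepod',
-- with 1 eye) should come back with NULL in eye_class.
SELECT name,
       CASE
         WHEN num_eyes = 2 THEN 'Binocular'
         WHEN num_eyes > 2 THEN 'Many-eyed'
       END AS eye_class
FROM (VALUES ('Copepod', 1), ('Spider', 8), ('Human (average)', 2))
     AS cool_animals(name, num_eyes);
```

Adding an explicit ELSE branch is usually the clearer choice when NULL is not the intended fallback.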

3.2.3.1.6.57 COALESCE

Evaluates and returns the first non-null expression, or NULL if all expressions are NULL.

3.2.3.1.6.58 Syntax

COALESCE( expr [, ...] )

3.2.3.1.6.59 Arguments

Parameter       | Description
----------------+-----------------------------
expr1, expr2, … | Expressions of the same type

3.2.3.1.6.60 Returns

Returns the first non-null argument or NULL if all expressions are null.

3.2.3.1.6.61 Notes

• All expressions must have the same type, which is also the type of the result.
• See also ISNULL. ISNULL(x,y) is equivalent to COALESCE(x,y).

3.2.3.1.6.62 Examples

3.2.3.1.6.63 Coalesce

master=> SELECT COALESCE(NULL, NULL, NULL, 5);
5

3.2.3.1.6.64 Replacing NULL values in aggregations

In some cases, replacing NULL values with a default can affect results.

master=> SELECT
.          AVG(num_eyes),
.          AVG(COALESCE(num_eyes,2)) AS "Corrected average"
.        FROM (
.          VALUES ('Copepod',1),('Spider',8)
.          ,('Starfish',5),('Praying mantis',5)
.          ,('Human (average)',2),('Eagle',2)
.          ,('Horseshoe crab',10),('Kiwi',NULL)
.          ,('Fox',NULL),('Badger',NULL)
.        ) AS cool_animals (name , num_eyes);
avg | Corrected average
----+------------------
4   | 3
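Since the notes state that ISNULL(x,y) is equivalent to COALESCE(x,y), the following sketch places both in one query so the two columns can be compared row by row. The shortened VALUES list and the column aliases are assumptions made for illustration.

```sql
-- Sketch: ISNULL and COALESCE with the same default value.
-- Both columns are expected to match for every row, including
-- the rows where num_eyes is NULL.
SELECT name,
       ISNULL(num_eyes, 2)   AS via_isnull,
       COALESCE(num_eyes, 2) AS via_coalesce
FROM (VALUES ('Spider', 8), ('Kiwi', NULL), ('Fox', NULL))
     AS cool_animals(name, num_eyes);
```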

3.2.3.1.6.65 IN

IN is used to test if an expression is contained in a list of values. This is also known as set membership.

3.2.3.1.6.66 Syntax

expr [ NOT ] IN ( value_expr [, ...])

3.2.3.1.6.67 Arguments

Parameter         | Description
------------------+----------------------------------------------------------------------
expr              | A general value expression or a literal to test.
value_expr [, ...] | A comma separated list of literal values of the same data type as expr

3.2.3.1.6.68 Returns

Returns TRUE when expr is contained in the set, or FALSE otherwise.

3.2.3.1.6.69 Notes

• NULL is never equal to NULL, so to check if a value is NULL use IS NULL instead.
• Using IN with more than 100 literal values can slow down query compilation. If your query relies on more than 100 values, consider using a lookup table and an inner join.

3.2.3.1.6.70 Examples

3.2.3.1.6.71 IN

farm=> SELECT name, num_eyes FROM cool_animals WHERE num_eyes IN (8, 10);
name           | num_eyes
---------------+---------
Spider         | 8
Horseshoe crab | 10

3.2.3.1.6.72 NOT IN

farm=> SELECT name, num_eyes FROM cool_animals WHERE num_eyes NOT IN (8, 10);
name          | num_eyes
--------------+---------
Box Jellyfish | 24
Human         | 2
Fox           | 2
Possum        | 2
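The notes suggest a lookup table and an inner join when the list of values grows large. A minimal sketch of that pattern follows; the table name wanted_eye_counts is hypothetical, and cool_animals is assumed to exist as in the examples above.

```sql
-- Hypothetical lookup table holding the values that would
-- otherwise be written out inside IN ( ... ).
CREATE TABLE wanted_eye_counts (num_eyes INT NOT NULL);

INSERT INTO wanted_eye_counts VALUES (8), (10);

-- Same filter as num_eyes IN (8, 10), expressed as a join.
SELECT a.name, a.num_eyes
FROM cool_animals AS a
INNER JOIN wanted_eye_counts AS w
        ON a.num_eyes = w.num_eyes;
```

Unlike IN, a join returns one output row per matching lookup row, so the lookup table should hold each value only once.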

3.2.3.1.6.73 IS_ASCII

IS_ASCII is used to test if a string contains only ASCII characters.

3.2.3.1.6.74 Syntax

IS_ASCII( expr ) --> BOOL

3.2.3.1.6.75 Arguments

Parameter | Description
----------+-------------------------------------------------
expr      | A general value expression or a literal to test.

3.2.3.1.6.76 Returns

Returns TRUE when expr contains only ASCII characters (0-127) or FALSE otherwise.

3.2.3.1.6.77 Notes

• If the expression to test is NULL, IS_ASCII returns NULL.

3.2.3.1.6.78 Examples

For these examples, consider the following table and contents:

CREATE TABLE dictionary (id INT NOT NULL, fw TEXT(30), en VARCHAR(30));

INSERT INTO dictionary VALUES (1, '￿￿￿', 'Let''s go'), (2, '￿￿', 'Cheers'), (3, 'L''chaim', 'Cheers');

3.2.3.1.6.79 IS_ASCII

m=> SELECT id, en, fw, IS_ASCII(fw) FROM dictionary;
id | en       | fw      | is_ascii
---+----------+---------+---------
1  | Let's go | ￿￿￿     | false
2  | Cheers   | ￿￿      | false
3  | Cheers   | L'chaim | true

3.2.3.1.6.80 IS NULL

IS NULL is used to test if an expression is null.

3.2.3.1.6.81 Syntax

expr IS [ NOT ] NULL

3.2.3.1.6.82 Arguments

Parameter | Description
----------+-------------------------------------------------
expr      | A general value expression or a literal to test.

3.2.3.1.6.83 Returns

Returns TRUE when expr is NULL or FALSE otherwise.

3.2.3.1.6.84 Examples

For these examples, consider the following table and contents:

CREATE TABLE t (id INT NOT NULL, name VARCHAR(30), weight INT);

INSERT INTO t VALUES (1, 'Kangaroo', 120), (2, 'Koala', 20), (3, 'Wombat', 60) ,(4, 'Kappa', NULL),(5, 'Echidna', 8),(6, 'Chupacabra', NULL) ,(7, 'Kraken', NULL);

3.2.3.1.6.85 IS NULL

m=> SELECT * FROM t WHERE weight IS NULL;
id | name       | weight
---+------------+-------
4  | Kappa      |
6  | Chupacabra |
7  | Kraken     |

3.2.3.1.6.86 Using IS NOT NULL to filter unwanted results

m=> SELECT AVG(weight) FROM t WHERE weight IS NOT NULL;
avg
---
52
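Because IS NULL yields a boolean, it can also serve as the condition of a searched CASE. The sketch below labels each row of the table t defined above; the weight_status alias and the label strings are illustrative assumptions.

```sql
-- Sketch: use IS NULL as a CASE condition to label rows.
-- With the sample data, Kappa, Chupacabra and Kraken are the rows
-- with a NULL weight, so they should be labelled 'Unknown'.
SELECT name,
       CASE
         WHEN weight IS NULL THEN 'Unknown'
         ELSE 'Measured'
       END AS weight_status
FROM t;
```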

3.2.3.1.6.87 ISNULL

Evaluates an expression and returns a default value if the expression is NULL.

3.2.3.1.6.88 Syntax

ISNULL( expr, value_expr )

3.2.3.1.6.89 Arguments

Parameter  | Description
-----------+---------------------------------------
expr       | Expression to evaluate
value_expr | Default value to return if expr is null

3.2.3.1.6.90 Returns

Returns either expr or value_expr.

3.2.3.1.6.91 Notes

• All expressions must have the same type, which is also the type of the result.
• See also COALESCE. ISNULL(x,y) is equivalent to COALESCE(x,y).

3.2.3.1.6.92 Examples

3.2.3.1.6.93 ISNULL

master=> SELECT ISNULL(NULL, 5);
5

3.2.3.1.6.94 Replacing NULL values in aggregations

In some cases, replacing NULL values with a default can affect results.

master=> SELECT
.          AVG(num_eyes),
.          AVG(ISNULL(num_eyes,2)) AS "Corrected average"
.        FROM (
.          VALUES ('Copepod',1),('Spider',8)
.          ,('Starfish',5),('Praying mantis',5)
.          ,('Human (average)',2),('Eagle',2)
.          ,('Horseshoe crab',10),('Kiwi',NULL)
.          ,('Fox',NULL),('Badger',NULL)
.        ) AS cool_animals (name , num_eyes);
avg | Corrected average
----+------------------
4   | 3

3.2.3.1.6.95 FROM_UNIXTS, FROM_UNIXTSMS

Converts a BIGINT representing a UNIX timestamp to a DATETIME value.

3.2.3.1.6.96 Syntax

FROM_UNIXTS( expr ) --> DATETIME

FROM_UNIXTSMS( expr ) --> DATETIME

3.2.3.1.6.97 Arguments

Parameter | Description
----------+--------------------------------------------------------------------------
expr      | An integer containing a UNIX timestamp in seconds/milliseconds from EPOCH

3.2.3.1.6.98 Returns

• FROM_UNIXTS returns a DATETIME from a UNIX timestamp in seconds since EPOCH.
• FROM_UNIXTSMS returns a DATETIME from a UNIX timestamp in milliseconds since EPOCH.

3.2.3.1.6.99 Notes

To convert a DATETIME to a UNIX timestamp, see TO_UNIXTS.

3.2.3.1.6.100 Examples

3.2.3.1.6.101 Convert a UNIX timestamp to DATETIME

master=> SELECT FROM_UNIXTS(0), FROM_UNIXTS(1575636057);
from_unixts         | from_unixts0
--------------------+--------------------
1970-01-01 00:00:00 | 2019-12-06 12:40:57
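To show the seconds and milliseconds variants side by side, here is a minimal sketch. The column aliases are illustrative, and the round trip through TO_UNIXTS assumes that function behaves as described in its own section.

```sql
-- Sketch: the same instant expressed in seconds and in milliseconds.
-- Both DATETIME columns are expected to show the same value, and
-- round_trip should give back the original 1575636057.
SELECT FROM_UNIXTS(1575636057)            AS from_seconds,
       FROM_UNIXTSMS(1575636057000)       AS from_milliseconds,
       TO_UNIXTS(FROM_UNIXTS(1575636057)) AS round_trip;
```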

3.2.3.1.6.102 TO_HEX

Converts an integer to a hexadecimal representation.

3.2.3.1.6.103 Syntax

TO_HEX( expr ) --> VARCHAR

3.2.3.1.6.104 Arguments

Parameter | Description
----------+----------------------
expr      | An integer expression

3.2.3.1.6.105 Returns

• Representation of the hexadecimal number of type VARCHAR.

3.2.3.1.6.106 Examples

For these examples, consider the following table and contents:

CREATE TABLE cool_numbers(number BIGINT NOT NULL);

INSERT INTO cool_numbers VALUES (42), (3735928559), (666), (3135097598), (3221229823);

3.2.3.1.6.107 Convert numbers to hexadecimal

master=> SELECT TO_HEX(number) FROM cool_numbers;
to_hex
------------------
0x000000000000002a
0x00000000deadbeef
0x000000000000029a
0x00000000baddcafe
0x00000000c00010ff

3.2.3.1.6.108 TO_UNIXTS, TO_UNIXTSMS

Converts a DATETIME value to a BIGINT representing a UNIX timestamp.

3.2.3.1.6.109 Syntax

TO_UNIXTS( expr ) --> BIGINT

TO_UNIXTSMS( expr ) --> BIGINT

3.2.3.1.6.110 Arguments

Parameter | Description
----------+------------------------------
expr      | A DATE or DATETIME expression

3.2.3.1.6.111 Returns

• TO_UNIXTS returns the UNIX timestamp in seconds since EPOCH.
• TO_UNIXTSMS returns the UNIX timestamp in milliseconds since EPOCH.

3.2.3.1.6.112 Notes

To convert a UNIX timestamp to a DATE or DATETIME, see FROM_UNIXTS.

3.2.3.1.6.113 Examples

3.2.3.1.6.114 Get the current UNIX timestamp

master=> SELECT TO_UNIXTSMS(GETDATE()), TO_UNIXTS(GETDATE());
to_unixtsms   | to_unixts
--------------+-----------
1575642811562 | 1575642811

3.2.3.1.6.115 Filter on a range of UNIX timestamps

Get the number of users that signed up during 2019:

master=> SELECT COUNT(*) FROM users
.        WHERE signup_ts BETWEEN TO_UNIXTS('2019-01-01') AND TO_UNIXTS('2019-12-31');
count
------
523428
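BETWEEN is inclusive at both ends, and a date-only value such as '2019-12-31' presumably converts to midnight at the start of that day, so sign-ups later on December 31st may fall outside the range. The sketch below shows a half-open range as one possible alternative; it reuses the users table and signup_ts column from the example above and has not been verified against a SQream DB instance.

```sql
-- Sketch: half-open range covering all of 2019.
-- The lower bound is inclusive, the upper bound (start of 2020) is exclusive.
SELECT COUNT(*)
FROM users
WHERE signup_ts >= TO_UNIXTS('2019-01-01')
  AND signup_ts <  TO_UNIXTS('2020-01-01');
```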

3.2.3.1.6.116 CURDATE

Returns the current date of the system.

Note: This function is provided for compatibility with MySQL, IBM DB2 and ODBC.

3.2.3.1.6.117 Syntax

CURDATE() --> DATE

3.2.3.1.6.118 Arguments

None

3.2.3.1.6.119 Returns

The current system date, with type DATE.

3.2.3.1.6.120 Notes

• This function is equivalent to CURRENT_DATE.
• To get the date and time, see CURRENT_TIMESTAMP.

3.2.3.1.6.121 Examples

3.2.3.1.6.122 Get the current system date

master=> SELECT CURRENT_DATE, CURRENT_DATE(), CURDATE();
date       | date0      | date1
-----------+------------+-----------
2019-12-07 | 2019-12-07 | 2019-12-07

3.2.3.1.6.123 CURRENT_DATE

Returns the current date of the system.

Note: This function has a special ANSI SQL form and can be called without parentheses.

3.2.3.1.6.124 Syntax

CURRENT_DATE() --> DATE

CURRENT_DATE --> DATE

3.2.3.1.6.125 Arguments

None

3.2.3.1.6.126 Returns

The current system date, with type DATE.

3.2.3.1.6.127 Notes

• This function has a special ANSI SQL form and can be called without parentheses.
• This function is equivalent to CURDATE.
• To get the date and time, see CURRENT_TIMESTAMP.

3.2.3.1.6.128 Examples

3.2.3.1.6.129 Get the current system date

master=> SELECT CURRENT_DATE, CURRENT_DATE(), CURDATE();
date       | date0      | date1
-----------+------------+-----------
2019-12-07 | 2019-12-07 | 2019-12-07

3.2.3.1.6.130 CURRENT_TIMESTAMP

Returns the current date and time of the system.

Note: This function has a special ANSI SQL form and can be called without parentheses.

3.2.3.1.6.131 Syntax

CURRENT_TIMESTAMP() --> DATETIME

CURRENT_TIMESTAMP --> DATETIME

3.2.3.1.6.132 Arguments

None

3.2.3.1.6.133 Returns

The current system date and time, with type DATETIME.

3.2.3.1.6.134 Notes

• This function has a special ANSI SQL form and can be called without parentheses.
• Aliases to this function include SYSDATE and GETDATE.
• To get the date only, see CURRENT_DATE.

3.2.3.1.6.135 Examples

3.2.3.1.6.136 Get the current system date and time

master=> SELECT CURRENT_TIMESTAMP, CURRENT_TIMESTAMP(), SYSDATE, GETDATE();
getdate             | getdate0            | getdate1            | getdate2
--------------------+---------------------+---------------------+--------------------
2019-12-07 23:04:26 | 2019-12-07 23:04:26 | 2019-12-07 23:04:26 | 2019-12-07 23:04:26

3.2.3.1.6.137 Find events that happen before this month

We will use TRUNC to get the date at the beginning of this month, and then filter.

master=> SELECT COUNT(*) FROM cool_dates WHERE dt <= TRUNC(CURRENT_TIMESTAMP, month);
count
-----
5

3.2.3.1.6.138 DATEADD

Adds an interval to, or subtracts an interval from, a DATE or DATETIME value.

Note: SQream DB does not support the INTERVAL ANSI syntax. Use DATEADD to add or subtract date intervals.

3.2.3.1.6.139 Syntax

DATEADD( interval, number, date_expr )

interval ::=
      YEAR | YYYY | YY
    | QUARTER | QQ | Q
    | MONTH | MM | M
    | DAYOFYEAR | DOY | DY | Y
    | DAY | DD | D
    | WEEK | WK | WW
    | HOUR | HH
    | MINUTE | MI | N
    | SECOND | SS | S
    | MILLISECOND | MS

3.2.3.1.6.140 Arguments

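Parameter | Description
----------+------------------------------------------------------------------------------------------------------------
interval  | An interval representing a date part. See the table below or the syntax reference above for valid date parts
number    | An integer expression
date_expr | A DATE or DATETIME expression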

3.2.3.1.6.141 Valid date parts

Date part   | Shorthand                    | Definition
------------+------------------------------+-----------------------
YEAR        | YYYY, YY                     | Year (0 - 9999)
QUARTER     | QQ, Q                        | Quarter (1-4)
MONTH       | MM, M                        | Month (1-12)
DAY         | DD, D, DOY, DAYOFYEAR, DY, Y | Days (1-365)
WEEK        | WK, WW                       | Week of the year (1-52)
HOUR        | HH                           | Hour (0-23)
MINUTE      | MI, N                        | Minute (0-59)
SECOND      | SS, S                        | Seconds (0-59)
MILLISECOND | MS                           | Milliseconds (0-999)

Note: • The first day of the week is Sunday, when used with weekday.

3.2.3.1.6.142 Returns

• If HOUR, MINUTE, SECOND, or MILLISECOND are added to a DATE, the return type will be DATETIME.
• For all other date parts, the return type is the same as the argument supplied.

3.2.3.1.6.143 Notes

• Use negative numbers to subtract from a date.
• Use DATEADD instead of manually figuring out dates. For example, adding a day to February 28th, 2020 should result in February 29th, 2020. Adding another day should result in March 1st, not February 30th.

3.2.3.1.6.144 Examples

For these examples, consider the following table and contents:

CREATE TABLE cool_dates(name VARCHAR(40), d DATE, dt DATETIME);

INSERT INTO cool_dates VALUES ('Marty McFly goes back to this time','1955-11-05','1955-11-05 01:21:00.000')
,('Marty McFly came from this time', '1985-10-26', '1985-10-26 01:22:00.000')
,('Vesuvius erupts', '79-08-24', '79-08-24 13:00:00.000')
,('1997 begins', '1997-01-01', '1997-01-01')
,('1997 ends', '1997-12-31','1997-12-31 23:59:59.999');

3.2.3.1.6.145 Add a quarter to each date

master=> SELECT d as original_date, DATEADD(QUARTER, 1, d) as next_quarter FROM cool_dates;
original_date | next_quarter
--------------+-------------
1955-11-05    | 1956-02-05
1985-10-26    | 1986-01-26
0079-08-24    | 0079-11-24
1997-01-01    | 1997-04-01
1997-12-31    | 1998-03-31

3.2.3.1.6.146 Getting next month’s date

master=> SELECT CURRENT_DATE,DATEADD(MONTH, 1, CURRENT_DATE);
date       | dateadd
-----------+-----------
2019-12-07 | 2020-01-07

3.2.3.1.6.147 Filtering +- 50 years from a specific date

master=> SELECT name, dt as datetime FROM cool_dates
.        WHERE dt BETWEEN DATEADD(YEAR,-50,'1955-06-01') AND DATEADD(YEAR,50,'1955-06-01');
name                               | datetime
-----------------------------------+--------------------
Marty McFly goes back to this time | 1955-11-05 01:21:00
Marty McFly came from this time    | 1985-10-26 01:22:00
1997 begins                        | 1997-01-01 00:00:00
1997 ends                          | 1997-12-31 23:59:59

3.2.3.1.6.148 Check if a year is a leap year

The expression returns TRUE for a leap year, because adding one day to February 28th gives February 29th only in a leap year, so the month is still 2.

-- Should return true for 2020:
master=> SELECT DATEPART(MONTH, DATEADD(DAY,1,'2020-02-28')) = 2 AS "2020 is a leap year";
2020 is a leap year
-------------------
true

-- Should return false for 2021:
master=> SELECT DATEPART(MONTH, DATEADD(DAY,1,'2021-02-28')) = 2 AS "2021 is a leap year";
2021 is a leap year
-------------------
false
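Two points from the Returns and Notes sections, subtracting with a negative number and the change of return type when a time part is added to a DATE, can be sketched against the cool_dates table defined above. The column aliases are illustrative.

```sql
-- Sketch: negative numbers subtract, and adding a time part to a DATE
-- is documented to return a DATETIME.
SELECT d,
       DATEADD(DAY, -7, d) AS one_week_earlier,  -- expected to remain a DATE
       DATEADD(HOUR, 6, d) AS six_hours_later    -- expected to become a DATETIME
FROM cool_dates;
```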

3.2.3.1.6.149 DATEDIFF

Calculates the difference between two DATE or DATETIME expressions, in terms of a specific date part.

Note: Results are given in integers, rather than INTERVAL, which SQream DB does not support.

3.2.3.1.6.150 Syntax

DATEDIFF( interval, date_expr1, date_expr2 ) --> INT

interval ::=
      YEAR | YYYY | YY
    | QUARTER | QQ | Q
    | MONTH | MM | M
    | DAYOFYEAR | DY | Y
    | DAY | DD | D
    | WEEK | WK | WW
    | HOUR | HH
    | MINUTE | MI | N
    | SECOND | SS | S
    | MILLISECOND | MS

3.2.3.1.6.151 Arguments

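Parameter              | Description
-----------------------+------------------------------------------------------------------------------------------------------------
interval               | An interval representing a date part. See the table below or the syntax reference above for valid date parts
date_expr1, date_expr2 | A DATE or DATETIME expression. The function calculates date_expr2 - date_expr1.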

3.2.3.1.6.152 Valid date parts

Date part   | Shorthand               | Definition
------------+-------------------------+-----------------------
YEAR        | YYYY, YY                | Year (0 - 9999)
QUARTER     | QQ, Q                   | Quarter (1-4)
MONTH       | MM, M                   | Month (1-12)
DAY         | DD, D, DAYOFYEAR, DY, Y | Days (1-365)
WEEK        | WK, WW                  | Week of the year (1-52)
HOUR        | HH                      | Hour (0-23)
MINUTE      | MI, N                   | Minute (0-59)
SECOND      | SS, S                   | Seconds (0-59)
MILLISECOND | MS                      | Milliseconds (0-999)

3.2.3.1.6.153 Returns

An integer representing the number of date part units (e.g. years, days, months, hours, etc.) between date_expr2 and date_expr1.

3.2.3.1.6.154 Notes

• Only the selected date part is used to calculate the difference. For example, the difference between '1997-12-31' and '1997-01-01' in years is 0, even though they are 364 days apart.
• A negative value means that date_expr1 is after date_expr2.

3.2.3.1.6.155 Examples

For these examples, consider the following table and contents:

CREATE TABLE cool_dates(name VARCHAR(40), d DATE, dt DATETIME);

INSERT INTO cool_dates VALUES ('Marty McFly goes back to this time','1955-11-05','1955-11-05 01:21:00.000')
,('Marty McFly came from this time', '1985-10-26', '1985-10-26 01:22:00.000')
,('Vesuvius erupts', '79-08-24', '79-08-24 13:00:00.000')
,('1997 begins', '1997-01-01', '1997-01-01')
,('1997 ends', '1997-12-31','1997-12-31 23:59:59.999');

3.2.3.1.6.156 Find out how far past events are

3.2.3.1.6.157 In years

master=> SELECT d AS original_date, DATEDIFF(YEAR, CURRENT_DATE, d) AS "was ... years ago" FROM cool_dates;
original_date | was ... years ago
--------------+------------------
1955-11-05    | -64
1985-10-26    | -34
0079-08-24    | -1940
1997-01-01    | -22
1997-12-31    | -22

3.2.3.1.6.158 In days

master=> SELECT d AS original_date, DATEDIFF(DAY, CURRENT_DATE, d) AS "was ... days ago" FROM cool_dates;
original_date | was ... days ago
--------------+-----------------
1955-11-05    | -23408
1985-10-26    | -12460
0079-08-24    | -708675
1997-01-01    | -8375
1997-12-31    | -8011

3.2.3.1.6.159 In hours

Note:
• Use CURRENT_TIMESTAMP instead of CURRENT_DATE, to include the current time as well as date.
• In this example, we use dt which is a DATETIME column.

master=> SELECT CURRENT_TIMESTAMP as "Now", dt AS "Original datetime", DATEDIFF(HOUR, CURRENT_TIMESTAMP, dt) AS "was ... hours ago" FROM cool_dates;
Now                 | Original datetime   | was ... hours ago
--------------------+---------------------+------------------
2019-12-07 22:35:50 | 1955-11-05 01:21:00 | -561813
2019-12-07 22:35:50 | 1985-10-26 01:22:00 | -299061
2019-12-07 22:35:50 | 0079-08-24 13:00:00 | -17008209
2019-12-07 22:35:50 | 1997-01-01 00:00:00 | -201022
2019-12-07 22:35:50 | 1997-12-31 23:59:59 | -192263
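The note that only the selected date part is compared can be made concrete with literal dates. This is a sketch only: it assumes DATEDIFF accepts date literals the same way DATEADD does in its examples, and the column aliases are illustrative.

```sql
-- Sketch: only the chosen date part takes part in the calculation.
SELECT DATEDIFF(YEAR, '1997-01-01', '1997-12-31') AS same_year,       -- 0 per the notes above
       DATEDIFF(DAY,  '1997-01-01', '1997-12-31') AS days_apart,      -- 364 per the notes above
       DATEDIFF(YEAR, '1997-12-31', '1998-01-01') AS across_new_year; -- should be 1, although one day apart
```

When elapsed time matters more than calendar boundaries, a finer date part such as DAY or HOUR is usually the safer choice.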

3.2.3.1.6.160 DATEPART

Extracts a date or time part from a DATE or DATETIME value.

Note: SQream DB also supports the ANSI EXTRACT syntax.

3.2.3.1.6.161 Syntax

DATEPART( interval, date_expr ) --> INT

interval ::=
      YEAR | YYYY | YY
    | QUARTER | QQ | Q
    | MONTH | MM | M
    | DAYOFYEAR | DOY | DY | Y
    | DAY | DD | D
    | WEEK | WK | WW
    | WEEKDAY | DW
    | HOUR | HH
    | MINUTE | MI | N
    | SECOND | SS | S
    | MILLISECOND | MS

3.2.3.1.6.162 Arguments

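Parameter | Description
----------+------------------------------------------------------------------------------------------------------------
interval  | An interval representing a date part. See the table below or the syntax reference above for valid date parts
date_expr | A DATE or DATETIME expression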

3.2.3.1.6.163 Valid date parts

Date part   | Shorthand  | Definition
------------+------------+---------------------------
YEAR        | YYYY, YY   | Year (0 - 9999)
QUARTER     | QQ, Q      | Quarter (1-4)
MONTH       | MM, M      | Month (1-12)
DAYOFYEAR   | DOY, DY, Y | Day of the year (1-365)
DAY         | DD, D      | Day of the month (1-31)
WEEK        | WK, WW     | Week of the year (1-52)
WEEKDAY     | DW         | Weekday / Day of week (1-7)
HOUR        | HH         | Hour (0-23)
MINUTE      | MI, N      | Minute (0-59)
SECOND      | SS, S      | Seconds (0-59)
MILLISECOND | MS         | Milliseconds (0-999)

Note: • The first day of the week is Sunday, when used with WEEKDAY.

3.2.3.1.6.164 Returns

• An integer representing the date part value

3.2.3.1.6.165 Notes

• All date parts work on a DATETIME.
• The HOUR, MINUTE, SECOND, and MILLISECOND date parts work only on DATETIME. Using them on DATE will result in an error.

3.2.3.1.6.166 Examples

For these examples, consider the following table and contents:

CREATE TABLE cool_dates(name VARCHAR(40), d DATE, dt DATETIME);

INSERT INTO cool_dates VALUES ('Marty McFly goes back to this time','1955-11-05','1955-11-05 01:21:00.000')
,('Marty McFly came from this time', '1985-10-26', '1985-10-26 01:22:00.000')
,('Vesuvius erupts', '79-08-24', '79-08-24 13:00:00.000')
,('1997 begins', '1997-01-01', '1997-01-01')
,('1997 ends', '1997-12-31','1997-12-31 23:59:59.999');

3.2.3.1.6.167 Break up a DATE into components

master=> SELECT DATEPART(YEAR, d) AS year, DATEPART(MONTH, d) AS month, DATEPART(DAY,d) AS day,
.        DATEPART(Q,d) AS quarter FROM cool_dates;
year | month | day | quarter
-----+-------+-----+--------
1955 | 11    | 5   | 4
1985 | 10    | 26  | 4
79   | 8     | 24  | 3
1997 | 1     | 1   | 1
1997 | 12    | 31  | 4

3.2.3.1.6.168 Break up a DATETIME into time components

master=> SELECT DATEPART(HOUR, dt) AS hour, DATEPART(MINUTE, dt) AS minute,
.        DATEPART(SECOND,dt) AS seconds, DATEPART(MILLISECOND,dt) AS milliseconds
.        FROM cool_dates;
hour | minute | seconds | milliseconds
-----+--------+---------+-------------
1    | 21     | 0       | 0
1    | 22     | 0       | 0
13   | 0      | 0       | 0
0    | 0      | 0       | 0
23   | 59     | 59      | 999

3.2.3.1.6.169 Count number of rows grouped by quarter

Tip: Use ordinal aliases to avoid having to write complex functions in the GROUP BY clause. See Select lists for more information.

master=> SELECT COUNT(*), DATEPART(Q, dt) AS quarter FROM cool_dates GROUP BY 2;
count | quarter
------+--------
1     | 1
1     | 3
3     | 4
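The WEEKDAY and DAYOFYEAR date parts are listed in the table above but not shown in the examples. A minimal sketch using the same cool_dates table follows; the column aliases are illustrative.

```sql
-- Sketch: day-of-week and day-of-year for each date.
-- WEEKDAY is documented as 1-7 with Sunday as the first day of the week.
-- For the sample data, '1997-01-01' should give day_of_year = 1
-- and '1997-12-31' should give day_of_year = 365.
SELECT name, d,
       DATEPART(WEEKDAY, d)   AS weekday,
       DATEPART(DAYOFYEAR, d) AS day_of_year
FROM cool_dates;
```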

3.2.3.1.6.170 EOMONTH

Returns a DATE or DATETIME value, reset to midnight on the last day of the month.

Note: This function is provided for SQL Server compatibility.

3.2.3.1.6.171 Syntax

EOMONTH( date_expr )

3.2.3.1.6.172 Arguments

Parameter | Description
----------+------------------------------
date_expr | A DATE or DATETIME expression

3.2.3.1.6.173 Returns

The return type is the same as the argument supplied.

3.2.3.1.6.174 Notes

• The time value will be set to midnight on the last day of the month.
• See also TRUNC to find the first day of the month.

3.2.3.1.6.175 Examples

For these examples, consider the following table and contents:

CREATE TABLE cool_dates(name VARCHAR(40), d DATE, dt DATETIME);

INSERT INTO cool_dates VALUES ('Marty McFly goes back to this time','1955-11-05','1955-11-05 01:21:00.000')
,('Marty McFly came from this time', '1985-10-26', '1985-10-26 01:22:00.000')
,('Vesuvius erupts', '79-08-24', '79-08-24 13:00:00.000')
,('1997 begins', '1997-01-01', '1997-01-01')
,('1997 ends', '1997-12-31','1997-12-31 23:59:59.999');

3.2.3.1.6.176 Find last day of the month for a DATE

master=> SELECT name, d AS date, EOMONTH(d) FROM cool_dates;
name                               | date       | eomonth
-----------------------------------+------------+-----------
Marty McFly goes back to this time | 1955-11-05 | 1955-11-30
Marty McFly came from this time    | 1985-10-26 | 1985-10-31
Vesuvius erupts                    | 0079-08-24 | 0079-08-31
1997 begins                        | 1997-01-01 | 1997-01-31
1997 ends                          | 1997-12-31 | 1997-12-31

3.2.3.1.6.177 Find the last day of the month for a DATETIME

Note: The time value is reset to midnight, regardless of the original time value.

master=> SELECT name, dt AS datetime, EOMONTH(dt) FROM cool_dates;
name                               | datetime            | eomonth
-----------------------------------+---------------------+--------------------
Marty McFly goes back to this time | 1955-11-05 01:21:00 | 1955-11-30 00:00:00
Marty McFly came from this time    | 1985-10-26 01:22:00 | 1985-10-31 00:00:00
Vesuvius erupts                    | 0079-08-24 13:00:00 | 0079-08-31 00:00:00
1997 begins                        | 1997-01-01 00:00:00 | 1997-01-31 00:00:00
1997 ends                          | 1997-12-31 23:59:59 | 1997-12-31 00:00:00
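Since the notes point to TRUNC for the first day of the month, the two functions can be combined to bracket a month. The sketch below also uses DATEDIFF to count the days left in the month; the column aliases are illustrative.

```sql
-- Sketch: first and last day of each date's month, plus days remaining.
-- For '1955-11-05' the bounds should be 1955-11-01 and 1955-11-30,
-- leaving 25 days until the end of the month.
SELECT name, d,
       TRUNC(d, MONTH)              AS first_of_month,
       EOMONTH(d)                   AS last_of_month,
       DATEDIFF(DAY, d, EOMONTH(d)) AS days_left_in_month
FROM cool_dates;
```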

3.2.3.1.6.178 EXTRACT

Extracts a date or time part from a DATE or DATETIME value.

Note: SQream DB also supports the SQL Server DATEPART syntax, which contains more date parts for use.

3.2.3.1.6.179 Syntax

EXTRACT( interval FROM date_expr ) --> DOUBLE

interval ::=
      YEAR
    | MONTH
    | WEEK
    | DOY
    | DAY
    | HOUR
    | MINUTE
    | SECOND
    | MILLISECONDS

3.2.3.1.6.180 Arguments

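Parameter | Description
----------+------------------------------------------------------------------------------------------------------------
interval  | An interval representing a date part. See the table below or the syntax reference above for valid date parts
date_expr | A DATE or DATETIME expression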

3.2.3.1.6.181 Valid date parts

Date part    | Definition
-------------+----------------------------
YEAR         | Year (0.0 - 9999.0)
MONTH        | Month (1.0-12.0)
DOY          | Day of the year (1.0-365.0)
DAY          | Day of the month (1.0-31.0)
WEEK         | Week of the year (1.0-52.0)
HOUR         | Hour (0.0-23.0)
MINUTE       | Minute (0.0-59.0)
SECOND       | Seconds (0.0-59.0)
MILLISECONDS | Milliseconds (0.0-999.0)

3.2.3.1.6.182 Returns

• A floating point number representing the date part value

3.2.3.1.6.183 Notes

• The HOUR, MINUTE, SECOND, and MILLISECONDS date parts work only on DATETIME. Using them on DATE will result in an error.

3.2.3.1.6.184 Examples

For these examples, consider the following table and contents:

CREATE TABLE cool_dates(name VARCHAR(40), d DATE, dt DATETIME);

INSERT INTO cool_dates VALUES ('Marty McFly goes back to this time','1955-11-05','1955-11-05 01:21:00.000')
,('Marty McFly came from this time', '1985-10-26', '1985-10-26 01:22:00.000')
,('Vesuvius erupts', '79-08-24', '79-08-24 13:00:00.000')
,('1997 begins', '1997-01-01', '1997-01-01')
,('1997 ends', '1997-12-31','1997-12-31 23:59:59.999');

3.2.3.1.6.185 Break up a DATE into components

master=> SELECT EXTRACT(YEAR FROM d) AS year, EXTRACT(MONTH FROM d) AS month, EXTRACT(DAY FROM d) AS day
.        FROM cool_dates;
year   | month | day
-------+-------+-----
1955.0 | 11.0  | 5.0
1985.0 | 10.0  | 26.0
79.0   | 8.0   | 24.0
1997.0 | 1.0   | 1.0
1997.0 | 12.0  | 31.0
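The time parts apply only to DATETIME values, and EXTRACT returns a floating point number where DATEPART returns an integer. The following sketch puts both forms side by side on the dt column; the column aliases are illustrative.

```sql
-- Sketch: the same date part through EXTRACT (DOUBLE) and DATEPART (INT).
-- For '1997-12-31 23:59:59.999' the hour should appear as 23.0 and 23.
SELECT dt,
       EXTRACT(HOUR FROM dt) AS hour_as_double,
       DATEPART(HOUR, dt)    AS hour_as_int
FROM cool_dates;
```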

3.2.3.1.6.186 GETDATE

Returns the current date and time of the system.

Note: This function is provided for SQL Server compatibility.

3.2.3.1.6.187 Syntax

GETDATE() --> DATETIME

3.2.3.1.6.188 Arguments

None

3.2.3.1.6.189 Returns

The current system date and time, with type DATETIME.

3.2.3.1.6.190 Notes

• Aliases to this function include CURRENT_TIMESTAMP and SYSDATE.
• To get the date only, see CURRENT_DATE.

3.2.3.1.6.191 Examples

3.2.3.1.6.192 Get the current system date and time

master=> SELECT CURRENT_TIMESTAMP, CURRENT_TIMESTAMP(), SYSDATE, GETDATE();
getdate             | getdate0            | getdate1            | getdate2
--------------------+---------------------+---------------------+--------------------
2019-12-07 23:04:26 | 2019-12-07 23:04:26 | 2019-12-07 23:04:26 | 2019-12-07 23:04:26

3.2.3.1.6.193 Find events that happen before this month

We will use TRUNC to get the date at the beginning of this month, and then filter.

master=> SELECT COUNT(*) FROM cool_dates WHERE dt <= TRUNC(GETDATE(), month);
count
-----
5

3.2.3.1.6.194 SYSDATE

Returns the current date and time of the system.

Note: This function is provided for Oracle compatibility and is called without parentheses.

3.2.3.1.6.195 Syntax

SYSDATE --> DATETIME

3.2.3.1.6.196 Arguments

None

3.2.3.1.6.197 Returns

The current system date and time, with type DATETIME.

3.2.3.1.6.198 Notes

• This function has a special form and is called without parentheses.
• Aliases to this function include CURRENT_TIMESTAMP and GETDATE.
• To get the date only, see CURRENT_DATE.

3.2.3.1.6.199 Examples

3.2.3.1.6.200 Get the current system date and time

master=> SELECT CURRENT_TIMESTAMP, CURRENT_TIMESTAMP(), SYSDATE, GETDATE();
getdate             | getdate0            | getdate1            | getdate2
--------------------+---------------------+---------------------+--------------------
2019-12-07 23:04:26 | 2019-12-07 23:04:26 | 2019-12-07 23:04:26 | 2019-12-07 23:04:26

3.2.3.1.6.201 Find events that happen before this month

We will use TRUNC to get the date at the beginning of this month, and then filter.

master=> SELECT COUNT(*) FROM cool_dates WHERE dt <= TRUNC(SYSDATE, month);
count
-----
5

3.2.3.1.6.202 TRUNC

Truncates a DATE or DATETIME value to a specified resolution. For example, truncating a DATE down to the nearest month returns the date of the first day of the month.

Note: This function is overloaded. The function TRUNC can also round numbers towards zero.

3.2.3.1.6.203 Syntax

TRUNC( date_expr [ , interval ])

interval ::=
      YEAR | YYYY | YY
    | QUARTER | QQ | Q
    | MONTH | MM | M
    | DAY | DD | D
    | WEEK | WK | WW
    | HOUR | HH
    | MINUTE | MI | N
    | SECOND | SS | S
    | MILLISECOND | MS

3.2.3.1.6.204 Arguments

Parameter | Description
----------+------------------------------------------------------------------------------------------------------------
date_expr | A DATE or DATETIME expression
interval  | An interval representing a date part. See the table below or the syntax reference above for valid date parts. If not specified, sets the value to midnight and returns a DATETIME.

3.2.3.1.6.205 Valid date parts

Date part   | Shorthand | Definition
------------+-----------+-----------------------
YEAR        | YYYY, YY  | Year (0 - 9999)
QUARTER     | QQ, Q     | Quarter (1-4)
MONTH       | MM, M     | Month (1-12)
DAY         | DD, D     | Day of the month (1-31)
WEEK        | WK, WW    | Week of the year (1-52)
HOUR        | HH        | Hour (0-23)
MINUTE      | MI, N     | Minute (0-59)
SECOND      | SS, S     | Seconds (0-59)
MILLISECOND | MS        | Milliseconds (0-999)

3.2.3.1.6.206 Returns

If no date part is specified, the return type is DATETIME. Otherwise, the return type is the same as the argument supplied.

3.2.3.1.6.207 Notes

• All date parts work on a DATETIME.
• The HOUR, MINUTE, SECOND, and MILLISECOND date parts work only on DATETIME. Using them on DATE will result in an error.
• If no date part is specified, the DATE or DATETIME value will be set to midnight on the date value. See examples below.
• See also EOMONTH to find the last day of the month.

3.2.3.1.6.208 Examples

For these examples, consider the following table and contents:

CREATE TABLE cool_dates(name VARCHAR(40), d DATE, dt DATETIME);

INSERT INTO cool_dates VALUES ('Marty McFly goes back to this time','1955-11-05','1955-11-05 01:21:00.000')
,('Marty McFly came from this time', '1985-10-26', '1985-10-26 01:22:00.000')
,('Vesuvius erupts', '79-08-24', '79-08-24 13:00:00.000')
,('1997 begins', '1997-01-01', '1997-01-01')
,('1997 ends', '1997-12-31','1997-12-31 23:59:59.999');

3.2.3.1.6.209 Set all DATE columns to DATETIME at midnight

master=> SELECT name, d AS date, trunc(d) FROM cool_dates;
name                               | date       | trunc
-----------------------------------+------------+--------------------
Marty McFly goes back to this time | 1955-11-05 | 1955-11-05 00:00:00
Marty McFly came from this time    | 1985-10-26 | 1985-10-26 00:00:00
Vesuvius erupts                    | 0079-08-24 | 0079-08-24 00:00:00
1997 begins                        | 1997-01-01 | 1997-01-01 00:00:00
1997 ends                          | 1997-12-31 | 1997-12-31 00:00:00

3.2.3.1.6.210 Find the first day of the month for dates

master=> SELECT name, d AS date, trunc(d, MONTH) FROM cool_dates;
name                               | date       | trunc
-----------------------------------+------------+-----------
Marty McFly goes back to this time | 1955-11-05 | 1955-11-01
Marty McFly came from this time    | 1985-10-26 | 1985-10-01
Vesuvius erupts                    | 0079-08-24 | 0079-08-01
1997 begins                        | 1997-01-01 | 1997-01-01
1997 ends                          | 1997-12-31 | 1997-12-01

3.2.3.1.6.211 Calculate number of hours from New Years

Combine TRUNC with DATEDIFF to calculate the number of hours since New Years.

master=> SELECT name, dt AS datetime
.        , DATEDIFF(HOUR, trunc(dt, YEAR), dt) AS "Hours since New Years"
.        FROM cool_dates;
name                               | datetime            | Hours since New Years
-----------------------------------+---------------------+----------------------
Marty McFly goes back to this time | 1955-11-05 01:21:00 | 7393
Marty McFly came from this time    | 1985-10-26 01:22:00 | 7153
Vesuvius erupts                    | 0079-08-24 13:00:00 | 5653
1997 begins                        | 1997-01-01 00:00:00 | 0
1997 ends                          | 1997-12-31 23:59:59 | 8759
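Truncating to a coarser date part is also a convenient way to bucket rows for aggregation. The sketch below groups the sample rows by the start of their year, using the ordinal form of GROUP BY shown elsewhere in this reference; the column aliases are illustrative.

```sql
-- Sketch: count events per year by truncating each DATETIME to YEAR.
-- With the sample data, the bucket starting 1997-01-01 should hold
-- 2 events and every other bucket 1.
SELECT TRUNC(dt, YEAR) AS year_start,
       COUNT(*)        AS events
FROM cool_dates
GROUP BY 1;
```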

3.2.3.1.6.212 ABS

Returns the absolute (non-negative) value of a numeric expression.

3.2.3.1.6.213 Syntax

ABS( expr )

3.2.3.1.6.214 Arguments

Parameter | Description
----------+-------------------
expr      | Numeric expression

3.2.3.1.6.215 Returns

Returns the same type as the argument supplied.

3.2.3.1.6.216 Notes

• If the value is NULL, the result is NULL.

3.2.3.1.6.217 Examples

For these examples, consider the following table and contents:

CREATE TABLE cool_numbers(i INT, f DOUBLE);

INSERT INTO cool_numbers VALUES (1,1.618033), (-12,-34) ,(22, 3.141592), (-26538, 2.7182818284) ,(NULL, NULL), (NULL,1.4142135623) ,(42,NULL), (-42, NULL) , (-474, 365);

3.2.3.1.6.218 Absolute value on an integer

numbers=> SELECT ABS(-24);
24

3.2.3.1.6.219 Absolute value on integer and floating point numbers=> SELECT i, ABS(i), f, ABS(f) FROM cool_numbers; i | abs | f | abs0 ------+------+------+----- 1 | 1 | 1.62 | 1.62 -12 | 12 | -34 | 34 22 | 22 | 3.14 | 3.14 -26538 | 26538 | 2.72 | 2.72 | | | | | 1.41 | 1.41 42 | 42 | | -42 | 42 | | -474 | 474 | 365 | 365

3.2.3.1.6.220 ACOS

Returns the inverse cosine value of a numeric expression

3.2.3.1.6.221 Syntax

ACOS( expr )

3.2.3.1.6.222 Arguments

Parameter Description expr Numeric expression in range [-1.0, 1.0]

3.2.3.1.6.223 Returns

Always returns a floating point result of the inverse cosine, in radians.

3.2.3.1.6.224 Notes

• Valid inputs for an inverse cosine are in the range [-1.0, 1.0]. Inputs outside this range result in an error.
• The result is given in radians, in the range [0, π].
• If the input value is NULL, the result is NULL.

3.2.3.1.6.225 Examples

For these examples, consider the following table and contents:

CREATE TABLE trig(i INT, f DOUBLE);

INSERT INTO trig VALUES (0,0.0), (1,-0.866) ,(2,-0.707), (3,-0.5) ,(4,0.5), (5,0.707) ,(6, 0.866), (7, 1);

3.2.3.1.6.226 Inverse cosine of 0 (= π/2)

numbers=> SELECT ACOS(0);
acos
------------------
1.5707963267948966

3.2.3.1.6.227 Computing inverse cosine for a column

numbers=> SELECT f, ACOS(f) FROM trig;
f     | acos
------+-----
    0 | 1.57
-0.87 | 2.62
-0.71 | 2.36
 -0.5 | 2.09
  0.5 | 1.05
 0.71 | 0.79
 0.87 | 0.52
    1 |    0

3.2.3.1.6.228 Arithmetic operators

Arithmetic operators are functions that can be used as infix operators, or as prefix operators.

3.2.3.1.6.229 Syntax

arithmetic_expr ::=
    value_expr arithmetic_infix_operator value_expr
    | arithmetic_unary_operator value_expr

arithmetic_infix_operator ::=
    + | - | * | / | %

arithmetic_unary_operator ::=
    + | -

3.2.3.1.6.230 Arguments

Parameter Description value_expr Numeric expression, or expression that can be cast to a numeric expression

3.2.3.1.6.231 Arithmetic operators

Operator     Syntax   Description
+ (unary)    +a       Converts a string to a numeric value. Identical to a :: double
+            a + b    Adds two expressions together
- (unary)    -a       Negates a numeric expression
-            a - b    Subtracts b from a
*            a * b    Multiplies a by b
/            a / b    Divides a by b
%            a % b    Modulus (remainder) of a divided by b. See also MOD, %

3.2.3.1.6.232 Notes

• If any of the input values is NULL, the result is NULL.

3.2.3.1.6.233 Examples

3.2.3.1.6.234 Coercing with a unary +

numbers=> SELECT +'5';
5.00

3.2.3.1.6.235 Using infix operators

numbers=> SELECT 5*3 AS "5*3", 5+3 AS "5+3", 2-5 AS "2-5"
.         , 2.0-5 AS "2.0-5", 11 % 5 AS "11%5", 11/5 AS "11/5", 11.0 / 5 AS "11.0/5";
5*3 | 5+3 | 2-5 | 2.0-5 | 11%5 | 11/5 | 11.0/5
----+-----+-----+-------+------+------+-------
 15 |   8 |  -3 |    -3 |    1 |    2 |    2.2
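The difference between 11/5 and 11.0/5 in the query above is integer versus floating-point division. A rough Python analogue follows; it is limited to positive operands, because the two languages are not guaranteed to agree on negative ones, and the variable names are illustrative only.

```python
# Two integer operands: the quotient is an integer and the remainder is separate.
int_quotient = 11 // 5      # 2
int_remainder = 11 % 5      # 1

# One floating-point operand: the result keeps its fractional part.
float_quotient = 11.0 / 5   # 2.2
float_difference = 2.0 - 5  # -3.0

print(int_quotient, int_remainder, float_quotient, float_difference)
```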

3.2.3.1.6.236 ASIN

Returns the inverse sine value of a numeric expression

3.2.3.1.6.237 Syntax

ASIN( expr )

3.2.3.1.6.238 Arguments

Parameter Description expr Numeric expression in range [-1.0, 1.0]

3.2.3.1.6.239 Returns

Always returns a floating point result of the inverse sine, in radians.

3.2.3.1.6.240 Notes

• Valid inputs for an inverse sine are in the range [-1.0, 1.0]. Inputs outside this range result in an error.
• The result is given in radians, in the range [-π/2, π/2].
• If the input value is NULL, the result is NULL.

3.2.3.1.6.241 Examples

For these examples, consider the following table and contents:

CREATE TABLE trig(i INT, f DOUBLE);

INSERT INTO trig VALUES (0,0.0), (1,-0.866) ,(2,-0.707), (3,-0.5) ,(4,0.5), (5,0.707) ,(6, 0.866), (7, 1);

3.2.3.1.6.242 Inverse sine of 1 (= π/2)

numbers=> SELECT ASIN(1);
asin
------------------
1.5707963267948966

3.2.3.1.6.243 Computing inverse sine for a column

numbers=> SELECT f, ASIN(f) FROM trig;
f     | asin
------+------
    0 |     0
-0.87 | -1.05
-0.71 | -0.79
 -0.5 | -0.52
  0.5 |  0.52
 0.71 |  0.79
 0.87 |  1.05
    1 |  1.57

3.2.3.1.6.244 ATAN

Returns the inverse tangent value of a numeric expression

3.2.3.1.6.245 Syntax

ATAN( expr )

3.2.3.1.6.246 Arguments

Parameter Description expr Numeric expression

3.2.3.1.6.247 Returns

Always returns a floating point result of the inverse tangent, in radians.

3.2.3.1.6.248 Notes

• The result is given in radians, in the range [-π/2, π/2].
• If the input value is NULL, the result is NULL.

3.2.3.1.6.249 Examples

For these examples, consider the following table and contents:

CREATE TABLE trig(f DOUBLE);

INSERT INTO trig VALUES (-9999), (-3), (-2), (-1.732) , (-1), (-0.5), (0), (0.5) ,(1), (1.732), (2), (3), (9999);

3.2.3.1.6.250 Inverse tangent of 1 (= π/4)

numbers=> SELECT ATAN(1);
atan
------------------
0.7853981633974483

3.2.3.1.6.251 Computing inverse tangent for a column

numbers=> SELECT f, ATAN(f) FROM trig;
f      | atan
-------+------------
 -9999 | -1.57069632
    -3 | -1.24904577
    -2 | -1.10714872
-1.732 | -1.04718485
    -1 | -0.78539816
  -0.5 | -0.46364761
     0 |         0.0
   0.5 |  0.46364761
     1 |  0.78539816
 1.732 |  1.04718485
     2 |  1.10714872
     3 |  1.24904577
  9999 |  1.57069632

3.2.3.1.6.252 ATN2

Returns the inverse tangent of the ratio of y to x, taking the signs of both arguments into account. For x > 0, ATN2(y, x) = ATAN(y/x). The result is the angle, in radians, between the positive x-axis and the point (x, y), and it is positive for points in the upper half-plane.

3.2.3.1.6.253 Syntax

ATN2( expr1, expr2 )

3.2.3.1.6.254 Arguments

Parameter    Description
expr1        Numeric expression representing the y component
expr2        Numeric expression representing the x component

3.2.3.1.6.255 Returns

Always returns a floating point result of the inverse tangent, in radians.

3.2.3.1.6.256 Notes

• The result is given in radians, in the range (-π, π].
• If either input value is NULL, the result is NULL.

3.2.3.1.6.257 Examples

3.2.3.1.6.258 Inverse tangent of (1, 0) (= π/2)

numbers=> SELECT ATN2(1,0);
atn2
------------------
1.5707963267948966

3.2.3.1.6.259 Inverse tangent of x=0.5, y=SQRT(3)/2 (= π/3, or 60°)

Use SQRT to simplify input and DEGREES to convert the result from radians.

numbers=> SELECT ATN2(0.5*SQRT(3), 0.5), DEGREES(ATN2(0.5*SQRT(3), 0.5));
atn2   | degrees
-------+--------
1.0472 |      60
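The reason for a two-argument inverse tangent can be illustrated with Python's math.atan2, which takes the y component first and the x component second, like ATN2(expr1, expr2). This is a sketch of the underlying mathematics, not of SQream DB's implementation; the variable names are illustrative.

```python
import math

# x = 0 is fine for the two-argument form; ATAN(y/x) would have to divide by zero.
straight_up = math.atan2(1, 0)            # pi/2

# The ratio y/x is 1 in both cases, but the points lie in different quadrants.
first_quadrant = math.atan2(1, 1)         # pi/4
third_quadrant = math.atan2(-1, -1)       # -3*pi/4
plain_atan = math.atan(-1 / -1)           # pi/4: the quadrant is lost

# The 60 degree example from above.
angle_deg = math.degrees(math.atan2(0.5 * math.sqrt(3), 0.5))

print(straight_up, first_quadrant, third_quadrant, plain_atan, angle_deg)
```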

3.2.3.1.6.260 CEILING / CEIL

Calculates the smallest integer greater than or equal to the numeric expression given. See also ROUND, FLOOR.

3.2.3.1.6.261 Syntax

CEILING( expr )

CEIL ( expr ) --> DOUBLE

3.2.3.1.6.262 Arguments

Parameter Description expr Numeric expression

3.2.3.1.6.263 Returns

• CEIL always returns a floating point result.
• CEILING returns the same type as the argument supplied.

3.2.3.1.6.264 Notes

• If the input value is NULL, the result is NULL.

3.2.3.1.6.265 Examples

3.2.3.1.6.266 FLOOR vs. CEIL vs. ROUND

numbers=> SELECT FLOOR(x), CEIL(x), ROUND(x)
.         FROM (VALUES (0.0001), (-0.0001)
.         , (PI()), (-2.718281), (500.1234)) as t(x);
floor | ceil | round
------+------+------
    0 |    1 |     0
   -1 |    0 |     0
    3 |    4 |     3
   -3 |   -2 |    -3
  500 |  501 |   500

3.2.3.1.6.267 COS

Returns the cosine value of a numeric expression

3.2.3.1.6.268 Syntax

COS( expr )

3.2.3.1.6.269 Arguments

Parameter Description expr Numeric expression representing an angle in radians

3.2.3.1.6.270 Returns

Always returns a floating point result of the cosine.

3.2.3.1.6.271 Notes

• The result is in the range [-1, 1].
• If the input value is NULL, the result is NULL.

3.2.3.1.6.272 Examples

For these examples, consider the following table and contents:

CREATE TABLE trig(f DOUBLE);

INSERT INTO trig VALUES (0), (PI()/12), (PI()/8), (PI()/6) , (PI()/4), (PI()/3), (3*PI()/8), (5*PI()/12), (PI()/2);

3.2.3.1.6.273 Cosine of 0 (= 1)

numbers=> SELECT COS(0);
cos
---
1.0

3.2.3.1.6.274 Computing cosine for a column, in radians

numbers=> SELECT f, COS(f) FROM trig;
f      | cos
-------+-------
     0 |      1
0.2618 | 0.9659
0.3927 | 0.9239
0.5236 |  0.866
0.7854 | 0.7071
1.0472 |    0.5
1.1781 | 0.3827
 1.309 | 0.2588
1.5708 |      0

3.2.3.1.6.275 COT

Returns the cotangent value of a numeric expression

3.2.3.1.6.276 Syntax

COT( expr )

3.2.3.1.6.277 Arguments

Parameter Description expr Numeric expression representing an angle in radians

3.2.3.1.6.278 Returns

Always returns a floating point result of the cotangent.

3.2.3.1.6.279 Notes

• The result can be any real number. The cotangent is unbounded near integer multiples of π.
• If the input value is NULL, the result is NULL.

3.2.3.1.6.280 Examples

For these examples, consider the following table and contents:

CREATE TABLE trig(f DOUBLE);

INSERT INTO trig VALUES (0), (PI()/12), (PI()/8), (PI()/6) , (PI()/4), (PI()/3), (3*PI()/8), (5*PI()/12), (PI()/2);

3.2.3.1.6.281 Cotangent of π/4 (= 1)

Tip: Use RADIANS to convert degrees to radians

numbers=> SELECT COT(PI()/4), COT(RADIANS(45));
cot | cot0
----+-----
  1 |    1

3.2.3.1.6.282 Computing cotangents for a column, in radians

numbers=> SELECT f, COT(f) FROM trig;
f      | cot
-------+------------------
     0 | 16331239353195370
0.2618 |            3.7321
0.3927 |            2.4142
0.5236 |            1.7321
0.7854 |                 1
1.0472 |            0.5774
1.1781 |            0.4142
 1.309 |            0.2679
1.5708 |                 0

3.2.3.1.6.283 CRC64

Calculates the CRC-64 hash of a text expression

3.2.3.1.6.284 Syntax

CRC64( expr ) --> BIGINT

CRC64_JOIN( expr ) --> BIGINT

3.2.3.1.6.285 Arguments

Parameter Description expr Text expression (VARCHAR, TEXT)

3.2.3.1.6.286 Returns

Returns a CRC-64 hash of the text input, of type BIGINT.

3.2.3.1.6.287 Notes

• If the input value is NULL, the result is NULL.
• CRC64_JOIN can be used with VARCHAR only. It cannot be used with TEXT.
• The CRC64_JOIN variant ignores leading whitespace when used as a JOIN key.

3.2.3.1.6.288 Examples

3.2.3.1.6.289 Calculate a CRC-64 hash of a string

numbers=> SELECT CRC64(x) FROM
.         (VALUES ('This is a relatively long text string, that can be converted to a shorter hash' :: varchar(80)))
.         as t(x);
crc64
-------------------
9085161068710498500

3.2.3.1.6.290 DEGREES

Converts a numeric expression from radian values to degree values. See also RADIANS.

3.2.3.1.6.291 Syntax

DEGREES( expr )

3.2.3.1.6.292 Arguments

Parameter Description expr Numeric expression

3.2.3.1.6.293 Returns

Always returns a floating point result of the value in degrees.

3.2.3.1.6.294 Notes

• If the value is NULL, the result is NULL.

3.2.3.1.6.295 Examples

For these examples, consider the following table and contents:

CREATE TABLE trig(f DOUBLE);

INSERT INTO trig VALUES (0), (PI()/12), (PI()/8), (PI()/6) , (PI()/4), (PI()/3), (3*PI()/8), (5*PI()/12), (PI()/2);

numbers=> SELECT f, DEGREES(f) FROM trig;
f      | degrees
-------+--------
     0 |       0
0.2618 |      15
0.3927 |    22.5
0.5236 |      30
0.7854 |      45
1.0472 |      60
1.1781 |    67.5
 1.309 |      75
1.5708 |      90
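The conversion behind this table is degrees = radians × 180/π, and RADIANS applies the inverse. A minimal Python sketch of both directions, with hypothetical helper names, reproduces the values above:

```python
import math

def to_degrees(radians: float) -> float:
    """Radians to degrees: multiply by 180/pi."""
    return radians * 180.0 / math.pi

def to_radians(degrees: float) -> float:
    """Degrees to radians: multiply by pi/180."""
    return degrees * math.pi / 180.0

for angle in (0, math.pi / 12, math.pi / 8, math.pi / 6, math.pi / 2):
    print(round(angle, 4), round(to_degrees(angle), 1))
```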

3.2.3.1.6.296 EXP

Returns the natural exponent value of a numeric expression (e^x). See also LOG.

3.2.3.1.6.297 Syntax

EXP( expr )

3.2.3.1.6.298 Arguments

Parameter Description expr Positive numeric expression

3.2.3.1.6.299 Returns

Always returns a floating point result.

3.2.3.1.6.300 Notes

• If the input value is NULL, the result is NULL.

3.2.3.1.6.301 Examples

3.2.3.1.6.302 Natural exponent values

numbers=> SELECT EXP(x) FROM (VALUES (1.0), (2.0), (4.0)) as t(x);
exp
-------
 2.7183
 7.3891
54.5982

3.2.3.1.6.303 FLOOR

Calculates the largest integer less than or equal to the numeric expression given. See also ROUND, CEILING / CEIL.

3.2.3.1.6.304 Syntax

FLOOR ( expr )

3.2.3.1.6.305 Arguments

Parameter Description expr Numeric expression

3.2.3.1.6.306 Returns

Returns the same type as the argument supplied.

3.2.3.1.6.307 Notes

• If the input value is NULL, the result is NULL.

3.2.3.1.6.308 Examples

3.2.3.1.6.309 FLOOR vs. CEILING / CEIL vs. ROUND

numbers=> SELECT FLOOR(x), CEIL(x), ROUND(x)
.         FROM (VALUES (0.0001), (-0.0001)
.         , (PI()), (-2.718281), (500.1234)) as t(x);
floor | ceil | round
------+------+------
    0 |    1 |     0
   -1 |    0 |     0
    3 |    4 |     3
   -3 |   -2 |    -3
  500 |  501 |   500

3.2.3.1.6.310 LOG

Returns the natural logarithm of a numeric expression. The natural logarithm is also known as ln, the logarithm to the base of the mathematical constant e. See also LOG10.

3.2.3.1.6.311 Syntax

LOG( expr )

3.2.3.1.6.312 Arguments

Parameter Description expr Positive numeric expression

3.2.3.1.6.313 Returns

Always returns a floating point result.

3.2.3.1.6.314 Notes

• If the input value is NULL, the result is NULL.

3.2.3.1.6.315 Examples

3.2.3.1.6.316 Natural log values

numbers=> SELECT LOG(0.0001), LOG(1), LOG(2.718281), LOG(5000000000000000);
log     | log0 | log1 | log2
--------+------+------+--------
-9.2103 |    0 |    1 | 36.1482

3.2.3.1.6.317 LOG10

Returns the base-10 logarithm of a numeric expression. See also LOG (natural).

3.2.3.1.6.318 Syntax

LOG10( expr )

3.2.3.1.6.319 Arguments

Parameter Description expr Positive numeric expression

3.2.3.1.6.320 Returns

Always returns a floating point result.

3.2.3.1.6.321 Notes

• If the input value is NULL, the result is NULL.

3.2.3.1.6.322 Examples

3.2.3.1.6.323 Log (base 10) values

numbers=> SELECT LOG10(0.0001), LOG10(1), LOG10(100), LOG10(1000000);
log10 | log100 | log101 | log102
------+--------+--------+-------
   -4 |      0 |      2 |      6
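LOG and LOG10 are related by the change-of-base identity log10(x) = ln(x) / ln(10). The Python sketch below uses the standard math module to illustrate that relationship with the inputs from the examples; it shows the mathematics only and is not a statement about SQream DB internals.

```python
import math

natural = math.log(2.718281)                 # close to 1, because the input is close to e
base10_small = math.log10(0.0001)            # close to -4
base10_large = math.log10(1000000)           # close to 6
via_change_of_base = math.log(1000000) / math.log(10)   # also close to 6

print(natural, base10_small, base10_large, via_change_of_base)
```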

3.2.3.1.6.324 MOD, %

Calculates the modulus (remainder) of x divided by y. This function can also be called with an infix operator, x % y.

3.2.3.1.6.325 Syntax

modulu ::=
    MOD( expr1, expr2 )
    | expr1 % expr2

3.2.3.1.6.326 Arguments

Parameter    Description
expr1        Integer expression (dividend)
expr2        Integer expression (divisor)

3.2.3.1.6.327 Returns

Always returns an integer result.

3.2.3.1.6.328 Notes

• If the input value is NULL, the result is NULL.
• Both inputs are expected to be integers.

3.2.3.1.6.329 Examples

3.2.3.1.6.330 Using the MOD function

numbers=> SELECT MOD(11,-5), MOD(8,5), MOD(1,1);
mod | mod0 | mod1
----+------+-----
  1 |    3 |    0

3.2.3.1.6.331 Using the infix operator

numbers=> SELECT 11 % -5, 8 % 5, 1 % 1;
?column? | ?column?0 | ?column?1
---------+-----------+----------
       1 |         3 |         0
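The result MOD(11,-5) = 1 above is what a remainder whose sign follows the dividend produces. Languages differ here, so the Python sketch below contrasts that convention with Python's own % operator. The helper truncated_mod is hypothetical, and its behaviour for negative dividends is an assumption that the examples above do not cover.

```python
def truncated_mod(a: int, b: int) -> int:
    """Remainder of a / b whose sign follows the dividend a."""
    remainder = abs(a) % abs(b)
    return remainder if a >= 0 else -remainder

print(truncated_mod(11, -5))   # 1, matching MOD(11,-5) above
print(truncated_mod(8, 5))     # 3
print(truncated_mod(1, 1))     # 0
print(11 % -5)                 # -4: Python's % follows the sign of the divisor instead
```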

3.2.3.1.6.332 PI

Returns a constant value of π

3.2.3.1.6.333 Syntax

PI( )

3.2.3.1.6.334 Arguments

None

3.2.3.1.6.335 Returns

Always returns a floating point value of π, to 10 digits after the decimal.

3.2.3.1.6.336 Examples

3.2.3.1.6.337 Using PI() to represent degrees in radians

CREATE TABLE trig(f DOUBLE);

INSERT INTO trig VALUES (0), (PI()/12), (PI()/8), (PI()/6) , (PI()/4), (PI()/3), (3*PI()/8), (5*PI()/12), (PI()/2);

3.2.3.1.6.338 Tangent of π/4 (= 1)

Tip: Use RADIANS to convert degrees to radians

numbers=> SELECT TAN(PI()/4), TAN(RADIANS(45));
tan | tan0
----+-----
  1 |    1

3.2.3.1.6.339 Sine of π/2 (= 1)

numbers=> SELECT SIN(PI()/2), SIN(RADIANS(90));
sin | sin0
----+-----
  1 |    1

3.2.3.1.6.340 POWER

Returns the value of x to the power of y (x^y). POWER(x,2) is equivalent to SQUARE(x). Some DBMSs call this function POW. See also: SQUARE.

3.2.3.1.6.341 Syntax

POWER( base, exponent )

3.2.3.1.6.342 Arguments

Parameter    Description
base         Numeric expression (x)
exponent     Numeric expression (y)

3.2.3.1.6.343 Returns

Returns the same type as the argument supplied.

3.2.3.1.6.344 Notes

• If the value is NULL, the result is NULL.

3.2.3.1.6.345 Examples

3.2.3.1.6.346 On integers

numbers=> SELECT POWER(3,x) FROM (VALUES (1), (2), (3), (4), (5)) AS t(x);
power
-----
    3
    9
   27
   81
  243

3.2.3.1.6.347 On floating point

numbers=> SELECT POWER(3.0,x) FROM (VALUES (1), (2), (3), (4), (5)) AS t(x);
power
-----
  3.0
  9.0
 27.0
 81.0
243.0

3.2.3.1.6.348 RADIANS

Converts a numeric expression from degree values to radian values. See also DEGREES.

3.2.3.1.6.349 Syntax

RADIANS( expr )

3.2.3.1.6.350 Arguments

Parameter Description expr Numeric expression

3.2.3.1.6.351 Returns

Always returns a floating point result of the value in radians.

3.2.3.1.6.352 Notes

• If the value is NULL, the result is NULL.

3.2.3.1.6.353 Examples

For these examples, consider the following table and contents:

CREATE TABLE trig(f DOUBLE);

INSERT INTO trig VALUES (0), (15), (22.5) ,(30), (45), (60), (67.5), (75), (90);

-- Equivalent to these values in radians:
-- (0), (PI()/12), (PI()/8), (PI()/6)
-- ,(PI()/4), (PI()/3), (3*PI()/8), (5*PI()/12), (PI()/2);

numbers=> SELECT f, RADIANS(f) FROM trig;
f    | radians
-----+--------
   0 |       0
  15 |  0.2618
22.5 |  0.3927
  30 |  0.5236
  45 |  0.7854
  60 |  1.0472
67.5 |  1.1781
  75 |   1.309
  90 |  1.5708

3.2.3.1.6.354 ROUND

Rounds a numeric expression to the nearest precision. See also CEILING / CEIL, FLOOR.

3.2.3.1.6.355 Syntax

ROUND( expr [, scale ] )

3.2.3.1.6.356 Arguments

Parameter    Description
expr         Numeric expression to round
scale        Number of digits after the decimal point to round to. Defaults to 0 if not specified.

3.2.3.1.6.357 Returns

Always returns a floating point result.

3.2.3.1.6.358 Notes

• If the input value is NULL, the result is NULL.

3.2.3.1.6.359 Examples

3.2.3.1.6.360 Rounding to the nearest integer

numbers=> SELECT ROUND(x) FROM (VALUES (0.0001), (PI()), (-2.718281), (500.1234), (0.5), (1.5)) as t(x);
round
-----
    0
    3
   -3
  500
    1
    2

3.2.3.1.6.361 Rounding to 2 digits after the decimal point

numbers=> SELECT ROUND(x,2) FROM (VALUES (0.0001), (PI()), (-2.718281), (500.1234)) as t(x);
round
------
     0
  3.14
 -2.72
500.12

3.2.3.1.6.362 FLOOR vs. CEILING / CEIL vs. ROUND

numbers=> SELECT FLOOR(x), CEIL(x), ROUND(x)
.         FROM (VALUES (0.0001), (-0.0001)
.         , (PI()), (-2.718281), (500.1234)) as t(x);
floor | ceil | round
------+------+------
    0 |    1 |     0
   -1 |    0 |     0
    3 |    4 |     3
   -3 |   -2 |    -3
  500 |  501 |   500
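The three rounding directions compared above can be reproduced with a short Python sketch. Python's built-in round resolves ties to the nearest even integer, so the sketch defines its own helper. The name round_half_away is hypothetical, and the ties-away-from-zero rule is an assumption inferred from the ROUND(0.5) = 1 and ROUND(1.5) = 2 results shown earlier.

```python
import math

def round_half_away(x: float) -> int:
    """Round to the nearest integer; ties move away from zero."""
    magnitude = math.floor(abs(x) + 0.5)
    return magnitude if x >= 0 else -magnitude

rows = [(math.floor(x), math.ceil(x), round_half_away(x))
        for x in (0.0001, -0.0001, math.pi, -2.718281, 500.1234)]

for floor_value, ceil_value, round_value in rows:
    print(floor_value, ceil_value, round_value)
```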

3.2.3.1.6.363 SIN

Returns the sine value of a numeric expression

3.2.3.1.6.364 Syntax

SIN( expr )

3.2.3.1.6.365 Arguments

Parameter Description expr Numeric expression representing an angle in radians

3.2.3.1.6.366 Returns

Always returns a floating point result of the sine.

3.2.3.1.6.367 Notes

• The result is in the range [-1, 1].
• If the input value is NULL, the result is NULL.

3.2.3.1.6.368 Examples

For these examples, consider the following table and contents:

CREATE TABLE trig(f DOUBLE);

INSERT INTO trig VALUES (0), (PI()/12), (PI()/8), (PI()/6) , (PI()/4), (PI()/3), (3*PI()/8), (5*PI()/12), (PI()/2);

3.2.3.1.6.369 Sine of π/2 (= 1)

Tip: Use RADIANS to convert degrees to radians

numbers=> SELECT SIN(PI()/2), SIN(RADIANS(90));
sin | sin0
----+-----
  1 |    1

3.2.3.1.6.370 Computing sine for a column, in radians

numbers=> SELECT f, SIN(f) FROM trig;
f      | sin
-------+-------
     0 |      0
0.2618 | 0.2588
0.3927 | 0.3827
0.5236 |    0.5
0.7854 | 0.7071
1.0472 |  0.866
1.1781 | 0.9239
 1.309 | 0.9659
1.5708 |      1

3.2.3.1.6.371 SQRT

Returns the square root value of a non-negative numeric expression.

3.2.3.1.6.372 Syntax

SQRT( expr )

3.2.3.1.6.373 Arguments

Parameter Description expr Non-negative numeric expression

3.2.3.1.6.374 Returns

Always returns a floating point result of the square root

3.2.3.1.6.375 Notes

• If the value is NULL, the result is NULL.
• If the numeric expression is negative (<0), an error is returned.

3.2.3.1.6.376 Examples

For these examples, consider the following table and contents:

CREATE TABLE cool_numbers(i INT, f DOUBLE);

INSERT INTO cool_numbers VALUES (1,1.618033), (-12,-34) ,(22, 3.141592), (-26538, 2.7182818284) ,(NULL, NULL), (NULL,1.4142135623) ,(42,NULL), (-42, NULL) , (-474, 365);

numbers=> SELECT SQRT(2);
sqrt
------------------
1.4142135623730951

Note:
• SQRT always returns a floating point result.
• Some clients may show fewer digits. See your client settings to change the precision shown.

3.2.3.1.6.377 Replacing negative values with NULL

Combine with CASE to replace negative inputs with NULL:

numbers=> SELECT i, SQRT(CASE WHEN i>=0 THEN i ELSE NULL END) FROM cool_numbers;
i      | sqrt
-------+-----
     1 |    1
   -12 |
    22 | 4.69
-26538 |
       |
       |
    42 | 6.48
   -42 |
  -474 |

3.2.3.1.6.378 SQUARE

Returns the square value of a numeric expression (x^2). SQUARE(x) is equivalent to POWER(x,2). See POWER.

3.2.3.1.6.379 Syntax

SQUARE( expr )

3.2.3.1.6.380 Arguments

Parameter Description expr Numeric expression

3.2.3.1.6.381 Returns

Always returns a floating point result

3.2.3.1.6.382 Notes

• If the value is NULL, the result is NULL.

3.2.3.1.6.383 Examples

numbers=> SELECT SQUARE(3);
square
------
9.0

3.2.3.1.6.384 TAN

Returns the tangent value of a numeric expression

3.2.3.1.6.385 Syntax

TAN( expr )

3.2.3.1.6.386 Arguments

Parameter Description expr Numeric expression representing an angle in radians

3.2.3.1.6.387 Returns

Always returns a floating point result of the tangent.

3.2.3.1.6.388 Notes

• The result can be any real number. The tangent is unbounded near odd multiples of π/2.
• If the input value is NULL, the result is NULL.

3.2.3.1.6.389 Examples

For these examples, consider the following table and contents:

CREATE TABLE trig(f DOUBLE);

INSERT INTO trig VALUES (0), (PI()/12), (PI()/8), (PI()/6) , (PI()/4), (PI()/3), (3*PI()/8), (5*PI()/12), (PI()/2);

3.2.3.1.6.390 Tangent of π/4 (= 1)

Tip: Use RADIANS to convert degrees to radians

numbers=> SELECT TAN(PI()/4), TAN(RADIANS(45));
tan | tan0
----+-----
  1 |    1

3.2.3.1.6.391 Computing tangents for a column, in radians

numbers=> SELECT f, TAN(f) FROM trig;
f      | tan
-------+------------------
     0 |                 0
0.2618 |            0.2679
0.3927 |            0.4142
0.5236 |            0.5774
0.7854 |                 1
1.0472 |            1.7321
1.1781 |            2.4142
 1.309 |            3.7321
1.5708 | 16331239353195370

3.2.3.1.6.392 TRUNC

Rounds a number to its integer representation towards 0.

Note: This function is overloaded. The function TRUNC can also modify the precision of DATE and DATETIME values.

See also ROUND.

3.2.3.1.6.393 Syntax

TRUNC( expr )

3.2.3.1.6.394 Arguments

Parameter Description expr Numeric expression

3.2.3.1.6.395 Returns

Returns the same type as the argument supplied.

3.2.3.1.6.396 Notes

• If the input value is NULL, the result is NULL.

3.2.3.1.6.397 Examples

3.2.3.1.6.398 Truncating to an integer

numbers=> SELECT TRUNC(x) FROM (VALUES (0.0001), (PI()), (-2.718281), (500.1234)) as t(x);
trunc
-----
  0.0
  3.0
 -2.0
500.0

3.2.3.1.6.399 TRUNC vs. ROUND

numbers=> SELECT TRUNC(x), ROUND(x)
.         FROM (VALUES (0.0001), (-0.0001)
.         , (PI()), (-2.718281), (500.1234)) as t(x);
trunc | round
------+------
  0.0 |  -0.0
 -0.0 |  -0.0
  3.0 |   3.0
 -2.0 |  -3.0
500.0 | 500.0
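Truncation toward zero behaves like FLOOR for non-negative inputs and like CEILING for negative ones, which is why TRUNC(-2.718281) gives -2 while ROUND gives -3. A small Python sketch of that rule, with a hypothetical helper name, compared against the standard library's math.trunc:

```python
import math

def trunc_toward_zero(x: float) -> int:
    """Drop the fractional part: floor for x >= 0, ceiling for x < 0."""
    return math.floor(x) if x >= 0 else math.ceil(x)

truncated = [trunc_toward_zero(x) for x in (0.0001, math.pi, -2.718281, 500.1234)]
print(truncated)
```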

3.2.3.1.6.400 CHAR_LENGTH

Calculates the number of characters in a string.

Note:
• This function is supported on TEXT only.
• To get the length in bytes, see OCTET_LENGTH.
• For VARCHAR strings, the octet length is the number of characters. Use LEN instead.

3.2.3.1.6.401 Syntax

CHAR_LENGTH( text_expr ) --> INT

3.2.3.1.6.402 Arguments

Parameter Description text_expr TEXT expression

3.2.3.1.6.403 Returns

Returns an integer containing the number of characters in the string.

3.2.3.1.6.404 Notes

• To get the length in bytes, see OCTET_LENGTH.
• If the value is NULL, the result is NULL.

3.2.3.1.6.405 Examples

For these examples, consider the following table and contents:

CREATE TABLE alphabets(line TEXT(50));

INSERT INTO alphabets VALUES ('abcdefghijklmnopqrstuvwxyz'), ('￿￿￿￿￿￿￿￿￿￿￿￿￿￿￿￿￿￿￿￿￿￿￿￿￿￿￿￿￿￿￿￿￿￿￿￿￿￿￿￿￿￿￿￿￿￿￿') ,('￿￿￿￿￿￿￿￿￿￿￿￿￿￿￿￿￿￿￿￿￿￿');

3.2.3.1.6.406 Length in characters and bytes of strings

ASCII characters take up 1 byte per character, while Thai takes up 3 bytes and Hebrew takes up 2 bytes. Unlike LEN, CHAR_LENGTH preserves the trailing whitespaces.

t=> SELECT LEN(line), CHAR_LENGTH(line), OCTET_LENGTH(line) FROM alphabets;
len | char_length | octet_length
----+-------------+-------------
 26 |          26 |           26
 47 |          47 |          141
 22 |          22 |           44
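The gap between character count and byte count comes from UTF-8's variable-width encoding. The Python sketch below shows the same effect on three short sample strings. The samples are illustrative and are not the table contents above, and the helper name is hypothetical.

```python
def char_and_octet_length(text: str):
    """Return (number of characters, number of UTF-8 bytes)."""
    return len(text), len(text.encode("utf-8"))

ascii_sample = "abc"                   # 1 byte per character
hebrew_sample = "\u05d0\u05d1\u05d2"   # three Hebrew letters, 2 bytes each
thai_sample = "\u0e01\u0e02\u0e04"     # three Thai letters, 3 bytes each

for sample in (ascii_sample, hebrew_sample, thai_sample):
    print(char_and_octet_length(sample))
```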

3.2.3.1.6.407 CHARINDEX

Returns the starting position of a string inside another string. See also PATINDEX, REGEXP_INSTR.

3.2.3.1.6.408 Syntax

CHARINDEX ( needle_string_expr , haystack_string_expr [ , start_location ] )

3.2.3.1.6.409 Arguments

Parameter               Description
needle_string_expr      String to find
haystack_string_expr    String to search within
start_location          An integer position at which the search starts. This value is optional. When it is not supplied, the search starts at the beginning of haystack_string_expr.

3.2.3.1.6.410 Returns

Integer start position of a match, or 0 if no match was found.

3.2.3.1.6.411 Notes

• If the value is NULL, the result is NULL.

3.2.3.1.6.412 Examples

For these examples, consider the following table and contents:

CREATE TABLE jabberwocky(line VARCHAR(50));

INSERT INTO jabberwocky VALUES
  ('''Twas brillig, and the slithy toves '), ('      Did gyre and gimble in the wabe: ')
  ,('All mimsy were the borogoves, '), ('      And the mome raths outgrabe. ')
  ,('"Beware the Jabberwock, my son! '), ('      The jaws that bite, the claws that catch! ')
  ,('Beware the Jubjub bird, and shun '), ('      The frumious Bandersnatch!" ');

3.2.3.1.6.413 Using CHARINDEX

t=> SELECT line, CHARINDEX('the', line) FROM jabberwocky;
line                                            | charindex
------------------------------------------------+----------
'Twas brillig, and the slithy toves             |        20
      Did gyre and gimble in the wabe:          |        30
All mimsy were the borogoves,                   |        16
      And the mome raths outgrabe.              |        11
"Beware the Jabberwock, my son!                 |         9
      The jaws that bite, the claws that catch! |        27
Beware the Jubjub bird, and shun                |         8
      The frumious Bandersnatch!"               |         0
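The 1-based positions and the 0 for "no match" in the table above can be mimicked in Python with str.find. This is a sketch only: the function charindex below is a hypothetical stand-in, the search is case-sensitive as in the example, and the treatment of start_location as a 1-based offset into the haystack is an assumption.

```python
def charindex(needle: str, haystack: str, start_location: int = 1) -> int:
    """1-based position of needle in haystack, or 0 when it is absent."""
    zero_based_start = max(start_location, 1) - 1
    return haystack.find(needle, zero_based_start) + 1   # find() returns -1 when absent

print(charindex("the", "'Twas brillig, and the slithy toves"))       # 20
print(charindex("the", "      Did gyre and gimble in the wabe:"))    # 30
print(charindex("the", '      The frumious Bandersnatch!"'))         # 0
```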

3.2.3.1.6.414 || (Concatenate)

Concatenate two strings to create a longer string

3.2.3.1.6.415 Syntax expr1 || expr2

3.2.3.1.6.416 Arguments

Parameter Description expr1, expr2 String expressions

3.2.3.1.6.417 Returns

Returns the same type as the argument supplied.

3.2.3.1.6.418 Notes

• Both values must be strings, and can’t be NULL. If NULLs are expected, use COALESCE.
• SQream DB removes the trailing spaces from strings by default, which may lead to unexpected results. See the examples for more information.

3.2.3.1.6.419 Examples

For these examples, assume a table named nba, with the following structure:

CREATE TABLE nba ( Name varchar(40), Team varchar(40), Number tinyint, Position varchar(2), Age tinyint, Height varchar(4), Weight real, College varchar(40), Salary float );

Here’s a peek at the table contents (Download nba.csv):

Table 28: nba.csv

Name          | Team           | Number | Position | Age  | Height | Weight | College           | Salary
--------------+----------------+--------+----------+------+--------+--------+-------------------+-----------
Avery Bradley | Boston Celtics | 0.0    | PG       | 25.0 | 6-2    | 180.0  | Texas             | 7730337.0
Jae Crowder   | Boston Celtics | 99.0   | SF       | 25.0 | 6-6    | 235.0  | Marquette         | 6796117.0
John Holland  | Boston Celtics | 30.0   | SG       | 27.0 | 6-5    | 205.0  | Boston University |
R.J. Hunter   | Boston Celtics | 28.0   | SG       | 22.0 | 6-5    | 185.0  | Georgia State     | 1148640.0
Jonas Jerebko | Boston Celtics | 8.0    | PF       | 29.0 | 6-10   | 231.0  |                   | 5000000.0
Amir Johnson  | Boston Celtics | 90.0   | PF       | 29.0 | 6-9    | 240.0  |                   | 12000000.0
Jordan Mickey | Boston Celtics | 55.0   | PF       | 21.0 | 6-8    | 235.0  | LSU               | 1170960.0
Kelly Olynyk  | Boston Celtics | 41.0   | C        | 25.0 | 7-0    | 238.0  | Gonzaga           | 2165160.0
Terry Rozier  | Boston Celtics | 12.0   | PG       | 22.0 | 6-2    | 190.0  | Louisville        | 1824360.0

3.2.3.1.6.420 Concatenate two values

Convert values to string types before concatenation.

nba=> SELECT ("Age" :: VARCHAR(2)) || "Name" FROM nba ORDER BY 1 DESC LIMIT 5;
?column?
----------------
40Tim Duncan
40Kevin Garnett
40Andre Miller
39Vince Carter
39Pablo Prigioni

3.2.3.1.6.421 Concatenate two strings

t=> SELECT 'Hello, this is' || ' nice';
?column?
-------------------
Hello, this is nice

Warning: Trailing spaces are trimmed by default. For example,

t=> SELECT 'Hello, this is ' || 'nice';
?column?
------------------
Hello, this isnice

This may sometimes lead to an unexpected result. See the example below for a remedy.

3.2.3.1.6.422 Adding spaces

Add a space and concatenate it first to bypass the space trimming issue.

nba=> SELECT ("Age" :: VARCHAR(2) || (' ' || "Name")) FROM nba ORDER BY 1 DESC LIMIT 5;
?column?
-----------------
40 Tim Duncan
40 Kevin Garnett
40 Andre Miller
39 Vince Carter
39 Pablo Prigioni

3.2.3.1.6.423 ISPREFIXOF

Checks if one string is a prefix of the other. This is a more performant way to write y LIKE (x || '%'). See also: LIKE.

3.2.3.1.6.424 Syntax

ISPREFIXOF(needle_string_expr , haystack_string_expr) --> BOOL

3.2.3.1.6.425 Arguments

Parameter               Description
needle_string_expr      String to locate
haystack_string_expr    String to search within

3.2.3.1.6.426 Returns

TRUE if needle_string_expr is a prefix of haystack_string_expr, or FALSE otherwise.

3.2.3.1.6.427 Notes

• This function is supported on TEXT strings only.
• If the value is NULL, the result is NULL.

3.2.3.1.6.428 Examples

For these examples, consider the following table and contents:

CREATE TABLE jabberwocky(line VARCHAR(50));

INSERT INTO jabberwocky VALUES
  ('''Twas brillig, and the slithy toves '), ('      Did gyre and gimble in the wabe: ')
  ,('All mimsy were the borogoves, '), ('      And the mome raths outgrabe. ')
  ,('"Beware the Jabberwock, my son! '), ('      The jaws that bite, the claws that catch! ')
  ,('Beware the Jubjub bird, and shun '), ('      The frumious Bandersnatch!" ');

3.2.3.1.6.429 Filtering using ISPREFIXOF

t=> SELECT line FROM jabberwocky WHERE ISPREFIXOF('And',TRIM(line));
line
----------------------------------
      And the mome raths outgrabe.

Tip: Use TRIM to avoid leading and trailing whitespace issues
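The prefix test and the effect of TRIM can be sketched in Python with str.startswith and str.strip. The function name is_prefix_of is a hypothetical stand-in for ISPREFIXOF, and the list holds three of the lines from the example table.

```python
def is_prefix_of(needle: str, haystack: str) -> bool:
    """True when haystack begins with needle."""
    return haystack.startswith(needle)

lines = [
    "All mimsy were the borogoves, ",
    "      And the mome raths outgrabe. ",
    "Beware the Jubjub bird, and shun ",
]

# Without trimming, the leading spaces prevent the match.
untrimmed = [line for line in lines if is_prefix_of("And", line)]
# Trimming first, as the tip suggests, finds the indented line.
trimmed = [line for line in lines if is_prefix_of("And", line.strip())]

print(untrimmed)
print(trimmed)
```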

3.2.3.1.6.430 LEFT

Returns the left part of a character string with the specified number of characters. See also RIGHT.

3.2.3.1.6.431 Syntax

LEFT( expr , character_count )

3.2.3.1.6.432 Arguments

Parameter          Description
expr               String expression
character_count    A positive integer that specifies how many characters to return.

3.2.3.1.6.433 Returns

Returns the same type as the argument supplied.

3.2.3.1.6.434 Notes

• This function works on TEXT strings only.
• If the value is NULL, the result is NULL.

3.2.3.1.6.435 Examples

For these examples, consider the following table and contents:

CREATE TABLE jabberwocky(line TEXT(50));

INSERT INTO jabberwocky VALUES
  ('''Twas brillig, and the slithy toves '), ('      Did gyre and gimble in the wabe: ')
  ,('All mimsy were the borogoves, '), ('      And the mome raths outgrabe. ')
  ,('"Beware the Jabberwock, my son! '), ('      The jaws that bite, the claws that catch! ')
  ,('Beware the Jubjub bird, and shun '), ('      The frumious Bandersnatch!" ');

3.2.3.1.6.436 Using LEFT

t=> SELECT LEFT(line, 10), RIGHT(line, 10) FROM jabberwocky;
left       | right
-----------+-----------
'Twas bril | thy toves
      Did  | the wabe:
All mimsy  | orogoves,
      And  | outgrabe.
"Beware th | , my son!
      The  | at catch!
Beware the |  and shun
      The  | rsnatch!"

3.2.3.1.6.437 LEN

Calculates the number of characters in a string.

Note:
• This function is provided for SQL Server compatibility.
• For UTF-8 encoded TEXT strings, multi-byte characters are counted as a single character. To get the length in bytes, see OCTET_LENGTH. To get the length in characters, see CHAR_LENGTH.

3.2.3.1.6.438 Syntax

LEN( expr ) --> INT

3.2.3.1.6.439 Arguments

Parameter Description expr String expression

3.2.3.1.6.440 Returns

Returns an integer containing the number of characters in the string.

3.2.3.1.6.441 Notes

• If the value is NULL, the result is NULL.

3.2.3.1.6.442 Examples

For these examples, consider the following table and contents:

CREATE TABLE jabberwocky(line VARCHAR(50));

INSERT INTO jabberwocky VALUES
($$'Twas brillig, and the slithy toves$$), ('      Did gyre and gimble in the wabe:')
,('All mimsy were the borogoves,'), ('      And the mome raths outgrabe.')
,('"Beware the Jabberwock, my son!'), ('      The jaws that bite, the claws that catch!')
,('Beware the Jubjub bird, and shun'), ('      The frumious Bandersnatch!"');

CREATE TABLE alphabets(line TEXT(50));

INSERT INTO alphabets VALUES
('abcdefghijklmnopqrstuvwxyz')
,('กขฃคฅฆงจฉชซฌญฎฏฐฑฒณดตถทธนบปผฝพฟภมยรฤลฦวศษสหฬอฮฯ')
,('אבגדהוזחטיכלמנסעפצקרשת');


3.2.3.1.6.443 Length in characters and bytes of strings

ASCII characters take up 1 byte per character, while Thai takes up 3 bytes and Hebrew takes up 2 bytes.

t=> SELECT LEN(line), OCTET_LENGTH(line) FROM alphabets;
len | octet_length
----+-------------
 26 |           26
 47 |          141
 22 |           44

3.2.3.1.6.444 Length of an ASCII string

Note: SQream DB does not count trailing spaces, but does keep leading spaces.

t=> SELECT LEN('Trailing spaces are not counted     ');
len
---
31

t=> SELECT LEN('            Leading spaces are counted');
len
---
38
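The note above can be modelled by stripping trailing spaces before counting. This Python sketch is an approximation of the documented behavior only; the name `sql_len` is hypothetical, and treating only the space character as trailing padding is an assumption.

```python
def sql_len(expr):
    """Count characters, ignoring trailing spaces but keeping leading ones."""
    if expr is None:
        return None  # NULL in, NULL out
    return len(expr.rstrip(" "))


print(sql_len("Trailing spaces are not counted     "))
print(sql_len(" " * 12 + "Leading spaces are counted"))
```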

3.2.3.1.6.445 Length of strings in a column

t=> SELECT LEN(line) FROM jabberwocky;
len
---
35
38
29
34
31
47
32
33

3.2.3.1.6.446 LIKE

Tests if a string matches a given pattern. LIKE and RLIKE are similar. LIKE uses SQL patterns, whereas RLIKE uses POSIX regular expressions. See also: RLIKE, REGEXP_COUNT, REGEXP_INSTR, REGEXP_SUBSTR, ISPREFIXOF.


3.2.3.1.6.447 Syntax

string_expr [ NOT ] LIKE string_test_expr

3.2.3.1.6.448 Arguments

Parameter          Description
string_expr        String to test
string_test_expr   Test pattern

3.2.3.1.6.449 Test patterns

Pattern          Description
%                match zero or more characters
_ (underscore)   match exactly one character
[A-Z]            match any character between A and Z inclusive. The - character between two other characters forms a range that matches all characters from the first character to the second. For example, [A-Z] matches all ASCII capital letters.
[^A-Z]           match any character not between A and Z
[abcde]          match any one of a b c d and e
[^abcde]         match any character that is not one of a b c d or e
[abcC-F]         match a b c or any character between C and F

• \ (backslash) - escape character. A backslash (\) before a wildcard indicates that the wildcard is interpreted as a regular character and not as a wildcard.
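To make the wildcard rules above concrete, here is a minimal Python sketch that translates a LIKE-style pattern into a regular expression. It is an illustration under stated assumptions, not how SQream DB implements LIKE: the helpers `like_to_regex` and `like` are hypothetical, the whole string is assumed to have to match the pattern, and bracket expressions are passed through unchanged because their range and negation syntax coincides with Python's.

```python
import re


def like_to_regex(pattern):
    """Translate a LIKE-style pattern into a Python regex (illustrative only)."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # An escaped wildcard is matched as a literal character.
            out.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "%":
            out.append(".*")   # zero or more characters
        elif ch == "_":
            out.append(".")    # exactly one character
        elif ch == "[":
            end = pattern.find("]", i + 1)
            if end > i + 1:
                # Ranges and ^ negation use the same syntax in both dialects.
                out.append("[" + pattern[i + 1:end] + "]")
                i = end + 1
                continue
            out.append(re.escape(ch))  # unmatched or empty bracket: literal
        else:
            out.append(re.escape(ch))
        i += 1
    return "".join(out)


def like(value, pattern):
    """Return True/False, or None when either side is NULL."""
    if value is None or pattern is None:
        return None
    return re.fullmatch(like_to_regex(pattern), value, flags=re.DOTALL) is not None


print(like("Portland Trail Blazers", "Portland%"))
print(like("R.J. Hunter", "%\\_%"))
```

The first call matches a prefix; the second looks for a literal underscore, which the sample name does not contain.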

3.2.3.1.6.450 Returns

TRUE if the test string matches the pattern, or FALSE otherwise.

3.2.3.1.6.451 Notes

• Before version 2020.3.1, the test pattern had to be a literal string. Starting with version 2020.3.1, column references and complex expressions are also supported.
• If matching just the beginning of the string, consider ISPREFIXOF, which also supports column references.
• If the value is NULL, the result is NULL.

3.2.3.1.6.452 Examples

For these examples, assume a table named nba, with the following structure:


CREATE TABLE nba
(
   "Name" varchar(40),
   "Team" varchar(40),
   "Number" tinyint,
   "Position" varchar(2),
   "Age" tinyint,
   "Height" varchar(4),
   "Weight" real,
   "College" varchar(40),
   "Salary" float
);

Here’s a peek at the table contents (Download nba.csv):

Table 29: nba.csv

Name          | Team           | Number | Position | Age  | Height | Weight | College           | Salary
--------------+----------------+--------+----------+------+--------+--------+-------------------+-----------
Avery Bradley | Boston Celtics | 0.0    | PG       | 25.0 | 6-2    | 180.0  | Texas             | 7730337.0
Jae Crowder   | Boston Celtics | 99.0   | SF       | 25.0 | 6-6    | 235.0  | Marquette         | 6796117.0
John Holland  | Boston Celtics | 30.0   | SG       | 27.0 | 6-5    | 205.0  | Boston University |
R.J. Hunter   | Boston Celtics | 28.0   | SG       | 22.0 | 6-5    | 185.0  | Georgia State     | 1148640.0
Jonas Jerebko | Boston Celtics | 8.0    | PF       | 29.0 | 6-10   | 231.0  |                   | 5000000.0
Amir Johnson  | Boston Celtics | 90.0   | PF       | 29.0 | 6-9    | 240.0  |                   | 12000000.0
Jordan Mickey | Boston Celtics | 55.0   | PF       | 21.0 | 6-8    | 235.0  | LSU               | 1170960.0
Kelly Olynyk  | Boston Celtics | 41.0   | C        | 25.0 | 7-0    | 238.0  | Gonzaga           | 2165160.0
Terry Rozier  | Boston Celtics | 12.0   | PG       | 22.0 | 6-2    | 190.0  | Louisville        | 1824360.0

3.2.3.1.6.453 Match the beginning of a string

nba=> SELECT "Name","Age","Salary","Team" FROM nba WHERE "Team" LIKE 'Portland%' LIMIT 5;
Name            | Age | Salary  | Team
----------------+-----+---------+-----------------------
Cliff Alexander |  20 |  525093 | Portland Trail Blazers
Al-Farouq Aminu |  25 | 8042895 | Portland Trail Blazers
Pat Connaughton |  23 |  625093 | Portland Trail Blazers
Allen Crabbe    |  24 |  947276 | Portland Trail Blazers
Ed Davis        |  27 | 6980802 | Portland Trail Blazers

Tip: ISPREFIXOF is a more performant way to match the beginning of a string. This example can be written as:


SELECT "Name","Age","Salary","Team" FROM nba WHERE ISPREFIXOF('Portland',"Team") LIMIT 5;

3.2.3.1.6.454 Match a wildcard character by escaping

To match a wildcard, escape it with a backslash escape character:

nba=> SELECT "Name","Age","Salary","Team" FROM nba WHERE "Name" LIKE '%\_%';
Name        | Age | Salary  | Team
------------+-----+---------+---------------
R.J._Hunter |  22 | 1148640 | Boston Celtics

3.2.3.1.6.455 Negate with NOT

nba=> SELECT "Name","Age","Salary","Team" FROM nba WHERE "Team" NOT LIKE 'Portland%' LIMIT 5;
Name          | Age | Salary  | Team
--------------+-----+---------+---------------
Avery Bradley |  25 | 7730337 | Boston Celtics
Jae Crowder   |  25 | 6796117 | Boston Celtics
John Holland  |  27 |         | Boston Celtics
R.J. Hunter   |  22 | 1148640 | Boston Celtics
Jonas Jerebko |  29 | 5000000 | Boston Celtics

3.2.3.1.6.456 Match the middle of a string

nba=> SELECT "Name","Age","Salary","Team" FROM nba WHERE "Team" LIKE '%zz%' LIMIT 5;
Name           | Age | Salary  | Team
---------------+-----+---------+------------------
Jordan Adams   |  21 | 1404600 | Memphis Grizzlies
Tony Allen     |  34 | 5158539 | Memphis Grizzlies
Chris Andersen |  37 | 5000000 | Memphis Grizzlies
Matt Barnes    |  36 | 3542500 | Memphis Grizzlies
Vince Carter   |  39 | 4088019 | Memphis Grizzlies

3.2.3.1.6.457 Find players with a middle name or suffix

The pattern '% % %' matches any name that contains at least two spaces.

nba=> SELECT "Name","Age","Salary","Team" FROM nba WHERE "Name" LIKE '% % %';
Name                     | Age | Salary  | Team
-------------------------+-----+---------+----------------------
James Michael McAdoo     |  23 |  845059 | Golden State Warriors
Luc Richard Mbah a Moute |  29 |  947276 | Los Angeles Clippers
Larry Nance Jr.          |  23 | 1155600 | Los Angeles Lakers
Metta World Peace        |  36 |  947276 | Los Angeles Lakers
Glenn Robinson III       |  22 | 1100000 | Indiana Pacers
Johnny O'Bryant III      |  23 |  845059 | Milwaukee Bucks
Tim Hardaway Jr.         |  24 | 1304520 | Atlanta Hawks
Frank Kaminsky III       |  23 | 2612520 | Charlotte Hornets
Kelly Oubre Jr.          |  20 | 1920240 | Washington Wizards
Otto Porter Jr.          |  23 | 4662960 | Washington Wizards

3.2.3.1.6.458 Find NON-LITERAL patterns

nba=> CREATE TABLE t(x int not null, y text not null, z text not null);
nba=> INSERT INTO t VALUES (1,'abc','a'),(2,'abcd','bc');

3.2.3.1.6.459 Select rows in which z is a prefix of y:

nba=> SELECT * FROM t WHERE y LIKE z || '%';
x | y   | z
--+-----+---
1 | abc | a

3.2.3.1.6.460 Select rows in which y contains z as a substring:

nba=> SELECT * FROM t WHERE y LIKE '%' || z || '%';
x | y    | z
--+------+----
1 | abc  | a
2 | abcd | bc

3.2.3.1.6.461 Values that contain wildcards as well:

nba=> CREATE TABLE patterns(x text not null);
nba=> INSERT INTO patterns values ('%'),('a%'),('%a');
nba=> SELECT x, 'abc' LIKE x FROM patterns;
x  | ?column?
---+---------
%  | 1
a% | 1
%a | 0

3.2.3.1.6.462 LOWER

Converts characters in a string to lower case. See also: UPPER.


3.2.3.1.6.463 Syntax

LOWER( expr )

3.2.3.1.6.464 Arguments

Parameter   Description
expr        String expression

3.2.3.1.6.465 Returns

Returns the same type as the argument supplied.

3.2.3.1.6.466 Notes

• If the value is NULL, the result is NULL.

3.2.3.1.6.467 Examples

For these examples, consider the following table and contents:

CREATE TABLE jabberwocky(line VARCHAR(50));

INSERT INTO jabberwocky VALUES
('''Twas brillig, and the slithy toves'), ('      Did gyre and gimble in the wabe:')
,('All mimsy were the borogoves,'), ('      And the mome raths outgrabe.')
,('"Beware the Jabberwock, my son!'), ('      The jaws that bite, the claws that catch!')
,('Beware the Jubjub bird, and shun'), ('      The frumious Bandersnatch!"');

3.2.3.1.6.468 Lower-casing a literal value

t=> SELECT LOWER('SQream DB');
sqream db

3.2.3.1.6.469 Lower-casing a column of values

t=> SELECT LOWER(line) FROM jabberwocky;
lower
-----------------------------------------------
'twas brillig, and the slithy toves
      did gyre and gimble in the wabe:
all mimsy were the borogoves,
      and the mome raths outgrabe.
"beware the jabberwock, my son!
      the jaws that bite, the claws that catch!
beware the jubjub bird, and shun
      the frumious bandersnatch!"

3.2.3.1.6.470 LTRIM

Trims leading whitespace from a string. See also TRIM, RTRIM.

3.2.3.1.6.471 Syntax

LTRIM( expr )

3.2.3.1.6.472 Arguments

Parameter   Description
expr        String expression

3.2.3.1.6.473 Returns

Returns the same type as the argument supplied.

3.2.3.1.6.474 Notes

• This function is equivalent to the ANSI form TRIM( LEADING FROM expr ).
• If the value is NULL, the result is NULL.

3.2.3.1.6.475 Examples

For these examples, consider the following table and contents:

CREATE TABLE jabberwocky(line TEXT(50));

INSERT INTO jabberwocky VALUES
('''Twas brillig, and the slithy toves'), ('      Did gyre and gimble in the wabe:')
,('All mimsy were the borogoves,'), ('      And the mome raths outgrabe.')
,('"Beware the Jabberwock, my son!'), ('      The jaws that bite, the claws that catch!')
,('Beware the Jubjub bird, and shun'), ('      The frumious Bandersnatch!"');


3.2.3.1.6.476 Trimming a literal value

t=> SELECT LTRIM('      SQream DB');
ltrim
---------
SQream DB

3.2.3.1.6.477 Trimming a column of values

In this example we use || (Concatenate) to show the leading spaces.

t=> SELECT ('Line: ' || line) as untrimmed, ('Line: ' || LTRIM(line)) as trimmed FROM jabberwocky;
untrimmed                                             | trimmed
------------------------------------------------------+------------------------------------------------
Line: 'Twas brillig, and the slithy toves             | Line: 'Twas brillig, and the slithy toves
Line:       Did gyre and gimble in the wabe:          | Line: Did gyre and gimble in the wabe:
Line: All mimsy were the borogoves,                   | Line: All mimsy were the borogoves,
Line:       And the mome raths outgrabe.              | Line: And the mome raths outgrabe.
Line: "Beware the Jabberwock, my son!                 | Line: "Beware the Jabberwock, my son!
Line:       The jaws that bite, the claws that catch! | Line: The jaws that bite, the claws that catch!
Line: Beware the Jubjub bird, and shun                | Line: Beware the Jubjub bird, and shun
Line:       The frumious Bandersnatch!"               | Line: The frumious Bandersnatch!"

3.2.3.1.6.478 OCTET_LENGTH

Calculates the number of bytes in a string.

Note:
• This function is supported on TEXT strings only.
• To get the length in characters, see CHAR_LENGTH.
• For VARCHAR strings, the octet length is the number of characters. Use LEN instead.

3.2.3.1.6.479 Syntax

OCTET_LENGTH( text_expr ) --> INT


3.2.3.1.6.480 Arguments

Parameter   Description
text_expr   TEXT expression

3.2.3.1.6.481 Returns

Returns an integer containing the number of bytes in the string.

3.2.3.1.6.482 Notes

• To get the length in characters, see CHAR_LENGTH.
• If the value is NULL, the result is NULL.

3.2.3.1.6.483 Examples

For these examples, consider the following table and contents:

CREATE TABLE alphabets(line NVARCHAR(50));

INSERT INTO alphabets VALUES
('abcdefghijklmnopqrstuvwxyz')
,('กขฃคฅฆงจฉชซฌญฎฏฐฑฒณดตถทธนบปผฝพฟภมยรฤลฦวศษสหฬอฮฯ')
,('אבגדהוזחטיכלמנסעפצקרשת');

3.2.3.1.6.484 Length in characters and bytes of strings

ASCII characters take up 1 byte per character, while Thai takes up 3 bytes and Hebrew takes up 2 bytes.

t=> SELECT LEN(line), CHAR_LENGTH(line), OCTET_LENGTH(line) FROM alphabets;
len | char_length | octet_length
----+-------------+-------------
 26 |          26 |           26
 47 |          47 |          141
 22 |          22 |           44
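The difference between character length and byte length can be reproduced outside the database. The Python sketch below assumes UTF-8 encoding and builds stand-in alphabets from Unicode code point ranges (47 Thai characters, 22 Hebrew letters without final forms); these strings are assumptions chosen to match the counts in the example and are not taken from the sample table.

```python
def char_length(text):
    """Number of characters (code points) in the string."""
    return len(text)


def octet_length(text):
    """Number of bytes the string occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


ascii_letters = "abcdefghijklmnopqrstuvwxyz"
# Thai block U+0E01..U+0E2F: 47 code points, 3 bytes each in UTF-8.
thai = "".join(chr(cp) for cp in range(0x0E01, 0x0E30))
# Hebrew letters U+05D0..U+05EA minus the five final forms: 22 letters, 2 bytes each.
hebrew_finals = {0x05DA, 0x05DD, 0x05DF, 0x05E3, 0x05E5}
hebrew = "".join(chr(cp) for cp in range(0x05D0, 0x05EB) if cp not in hebrew_finals)

for text in (ascii_letters, thai, hebrew):
    print(char_length(text), octet_length(text))
```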

3.2.3.1.6.485 PATINDEX

Returns the starting position of a pattern inside a string. See also CHARINDEX, REGEXP_INSTR. PATINDEX works like LIKE, so the standard wildcards are supported. You do not have to enclose the pattern between percents.

Note: This function is provided for SQL Server compatibility.


3.2.3.1.6.486 Syntax

PATINDEX ( string_test_expr , string_expr ) --> INT

3.2.3.1.6.487 Arguments

Parameter          Description
string_test_expr   Pattern to find
string_expr        String to search within

3.2.3.1.6.488 Test patterns

Pattern     Description
%           match zero or more characters
_           match exactly one character
[A-Z]       match any character between A and Z inclusive. The - character between two other characters forms a range that matches all characters from the first character to the second. For example, [A-Z] matches all ASCII capital letters.
[^A-Z]      match any character not between A and Z
[abcde]     match any one of a b c d and e
[^abcde]    match any character that is not one of a b c d or e
[abcC-F]    match a b c or any character between C and F

3.2.3.1.6.489 Returns

Integer start position of a match, or 0 if no match was found.

3.2.3.1.6.490 Notes

• If the value is NULL, the result is NULL.
• PATINDEX works on VARCHAR text types only.
• PATINDEX does not work on all literal values - only on column values. (i.e. PATINDEX('%mimsy%', 'All mimsy were the borogoves') will not work, but PATINDEX('%mimsy%', line) will)

3.2.3.1.6.491 Examples

For these examples, consider the following table and contents:

CREATE TABLE jabberwocky(line VARCHAR(50));

INSERT INTO jabberwocky VALUES
('''Twas brillig, and the slithy toves '), ('      Did gyre and gimble in the wabe: ')
,('All mimsy were the borogoves, '), ('      And the mome raths outgrabe. ')
,('"Beware the Jabberwock, my son! '), ('      The jaws that bite, the claws that catch! ')
,('Beware the Jubjub bird, and shun '), ('      The frumious Bandersnatch!" ');

3.2.3.1.6.492 Simple patterns

t=> SELECT line, PATINDEX('%mimsy%', line) as "Position of mimsy" FROM jabberwocky;
line                                            | Position of mimsy
------------------------------------------------+------------------
'Twas brillig, and the slithy toves             | 0
      Did gyre and gimble in the wabe:          | 0
All mimsy were the borogoves,                   | 5
      And the mome raths outgrabe.              | 0
"Beware the Jabberwock, my son!                 | 0
      The jaws that bite, the claws that catch! | 0
Beware the Jubjub bird, and shun                | 0
      The frumious Bandersnatch!"               | 0

3.2.3.1.6.493 Complex wildcards expressions

The following example uses the ^ negation operator to find the position of the first character that is neither a number, a letter, nor a space.

t=> SELECT PATINDEX('%[^ 0-9A-z]%', line), line FROM jabberwocky;
patindex | line
---------+------------------------------------------------
       1 | 'Twas brillig, and the slithy toves
      38 |       Did gyre and gimble in the wabe:
      29 | All mimsy were the borogoves,
      34 |       And the mome raths outgrabe.
       1 | "Beware the Jabberwock, my son!
      25 |       The jaws that bite, the claws that catch!
      23 | Beware the Jubjub bird, and shun
      32 |       The frumious Bandersnatch!"

3.2.3.1.6.494 REGEXP_COUNT

Counts the number of times a given regular expression pattern matches in a string. See also: REGEXP_INSTR, REGEXP_SUBSTR.


3.2.3.1.6.495 Syntax

REGEXP_COUNT( string_expr, string_test_expr [, start_index ] ) --> INT

3.2.3.1.6.496 Arguments

Parameter          Description
string_expr        String to test
string_test_expr   Test pattern
start_index        The character index offset to start counting from. Defaults to 1

3.2.3.1.6.497 Test patterns

Pattern            Description
^                  Match the beginning of a string
$                  Match the end of a string
.                  Match any character (including whitespace such as carriage return and newline)
*                  Match the preceding pattern zero or more times
+                  Match the preceding pattern at least once
?                  Match the preceding pattern once at most
de|abc             Match either de or abc
(abc)*             Match zero or more instances of the sequence abc
{2}                Match the preceding pattern exactly two times
{2,4}              Match the preceding pattern between two and four times
[a-dX], [^a-dX]    Matches any character that is (or is not, when negated with ^) either a, b, c, d, or X. The - character between two other characters forms a range that matches all characters from the first character to the second. For example, [0-9] matches any decimal digit. To include a literal ] character, it must immediately follow the opening bracket [. To include a literal - character, it must be written first or last. Any character that does not have a defined special meaning inside a [] pair matches only itself.

3.2.3.1.6.498 Returns

Integer result of the number of times that a pattern occurs in a string.

3.2.3.1.6.499 Notes

• The test pattern must be a literal string. Column references or complex expressions are currently unsupported.
• If the value is NULL, the result is NULL.


3.2.3.1.6.500 Examples

For these examples, assume a table named nba, with the following structure:

CREATE TABLE nba
(
   "Name" varchar(40),
   "Team" varchar(40),
   "Number" tinyint,
   "Position" varchar(2),
   "Age" tinyint,
   "Height" varchar(4),
   "Weight" real,
   "College" varchar(40),
   "Salary" float
);

Here’s a peek at the table contents (Download nba.csv):

Table 30: nba.csv

Name          | Team           | Number | Position | Age  | Height | Weight | College           | Salary
--------------+----------------+--------+----------+------+--------+--------+-------------------+-----------
Avery Bradley | Boston Celtics | 0.0    | PG       | 25.0 | 6-2    | 180.0  | Texas             | 7730337.0
Jae Crowder   | Boston Celtics | 99.0   | SF       | 25.0 | 6-6    | 235.0  | Marquette         | 6796117.0
John Holland  | Boston Celtics | 30.0   | SG       | 27.0 | 6-5    | 205.0  | Boston University |
R.J. Hunter   | Boston Celtics | 28.0   | SG       | 22.0 | 6-5    | 185.0  | Georgia State     | 1148640.0
Jonas Jerebko | Boston Celtics | 8.0    | PF       | 29.0 | 6-10   | 231.0  |                   | 5000000.0
Amir Johnson  | Boston Celtics | 90.0   | PF       | 29.0 | 6-9    | 240.0  |                   | 12000000.0
Jordan Mickey | Boston Celtics | 55.0   | PF       | 21.0 | 6-8    | 235.0  | LSU               | 1170960.0
Kelly Olynyk  | Boston Celtics | 41.0   | C        | 25.0 | 7-0    | 238.0  | Gonzaga           | 2165160.0
Terry Rozier  | Boston Celtics | 12.0   | PG       | 22.0 | 6-2    | 190.0  | Louisville        | 1824360.0

3.2.3.1.6.501 Find players with at least 3 names (2 spaces)

nba=> SELECT "Name" FROM nba WHERE REGEXP_COUNT("Name", '( )+')>1;
Name
------------------------
James Michael McAdoo
Luc Richard Mbah a Moute
Larry Nance Jr.
Metta World Peace
Glenn Robinson III
Johnny O'Bryant III
Tim Hardaway Jr.
Frank Kaminsky III
Kelly Oubre Jr.
Otto Porter Jr.

3.2.3.1.6.502 Using the offset index

Start counting spaces from the 8th character onward:

nba=> SELECT "Name" FROM nba WHERE REGEXP_COUNT("Name", '( )+', 8)>1;
Name
------------------------
Luc Richard Mbah a Moute
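The effect of start_index can be sketched in Python. This is an approximation only: Python's `re` module is not a POSIX regex engine, and the helper `regexp_count` is a hypothetical stand-in for the SQL function, with start_index treated as 1-based as in the arguments table.

```python
import re


def regexp_count(string_expr, pattern, start_index=1):
    """Count non-overlapping matches at or after the 1-based start_index."""
    if string_expr is None:
        return None  # NULL in, NULL out
    compiled = re.compile(pattern)
    return sum(1 for _ in compiled.finditer(string_expr, start_index - 1))


name = "Luc Richard Mbah a Moute"
print(regexp_count(name, "( )+"))
print(regexp_count(name, "( )+", 8))
```

With the offset, only the spaces after "Luc Ric" are counted, which is why a three-part name such as "James Michael McAdoo" no longer passes the `> 1` filter.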

3.2.3.1.6.503 REGEXP_INSTR

Searches a string for a POSIX-style regular expression and returns the position within the string where the match was located. See also: REGEXP_COUNT, REGEXP_SUBSTR.

3.2.3.1.6.504 Syntax

REGEXP_INSTR( string_expr, string_test_expr [, start_index [, occurence [, return_position ]]] ) --> INT

3.2.3.1.6.505 Arguments

Parameter          Description
string_expr        String to test
string_test_expr   Test pattern
start_index        The character index offset to start counting from. Defaults to 1
occurence          Which occurrence to search for. Defaults to 1
return_position    Specifies the location within the string to return. Using 0, the function returns the string position of the first character of the substring that matches the pattern. A value greater than 0 returns the position of the first character following the end of the pattern. Defaults to 0


3.2.3.1.6.506 Test patterns

Pattern            Description
^                  Match the beginning of a string
$                  Match the end of a string
.                  Match any character (including whitespace such as carriage return and newline)
*                  Match the preceding pattern zero or more times
+                  Match the preceding pattern at least once
?                  Match the preceding pattern once at most
de|abc             Match either de or abc
(abc)*             Match zero or more instances of the sequence abc
{2}                Match the preceding pattern exactly two times
{2,4}              Match the preceding pattern between two and four times
[a-dX], [^a-dX]    Matches any character that is (or is not, when negated with ^) either a, b, c, d, or X. The - character between two other characters forms a range that matches all characters from the first character to the second. For example, [0-9] matches any decimal digit. To include a literal ] character, it must immediately follow the opening bracket [. To include a literal - character, it must be written first or last. Any character that does not have a defined special meaning inside a [] pair matches only itself.

3.2.3.1.6.507 Returns

Integer start position of a regex match, or 0 if no match was found.

3.2.3.1.6.508 Notes

• The test pattern must be a literal string. Column references or complex expressions are currently unsupported.
• If the value is NULL, the result is NULL.

3.2.3.1.6.509 Examples

For these examples, assume a table named nba, with the following structure:

CREATE TABLE nba
(
   "Name" varchar(40),
   "Team" varchar(40),
   "Number" tinyint,
   "Position" varchar(2),
   "Age" tinyint,
   "Height" varchar(4),
   "Weight" real,
   "College" varchar(40),
   "Salary" float
);


Here’s a peek at the table contents (Download nba.csv):

Table 31: nba.csv

Name          | Team           | Number | Position | Age  | Height | Weight | College           | Salary
--------------+----------------+--------+----------+------+--------+--------+-------------------+-----------
Avery Bradley | Boston Celtics | 0.0    | PG       | 25.0 | 6-2    | 180.0  | Texas             | 7730337.0
Jae Crowder   | Boston Celtics | 99.0   | SF       | 25.0 | 6-6    | 235.0  | Marquette         | 6796117.0
John Holland  | Boston Celtics | 30.0   | SG       | 27.0 | 6-5    | 205.0  | Boston University |
R.J. Hunter   | Boston Celtics | 28.0   | SG       | 22.0 | 6-5    | 185.0  | Georgia State     | 1148640.0
Jonas Jerebko | Boston Celtics | 8.0    | PF       | 29.0 | 6-10   | 231.0  |                   | 5000000.0
Amir Johnson  | Boston Celtics | 90.0   | PF       | 29.0 | 6-9    | 240.0  |                   | 12000000.0
Jordan Mickey | Boston Celtics | 55.0   | PF       | 21.0 | 6-8    | 235.0  | LSU               | 1170960.0
Kelly Olynyk  | Boston Celtics | 41.0   | C        | 25.0 | 7-0    | 238.0  | Gonzaga           | 2165160.0
Terry Rozier  | Boston Celtics | 12.0   | PG       | 22.0 | 6-2    | 190.0  | Louisville        | 1824360.0

3.2.3.1.6.510 Find players with ‘ow’ in their name

nba=> SELECT "Name", REGEXP_INSTR("Name", 'ow') FROM nba WHERE REGEXP_COUNT("Name", 'ow')>0;
Name               | regexp_instr
-------------------+-------------
Jae Crowder        |            7
Markel Brown       |           10
Langston Galloway  |           14
Kyle Lowry         |            7
Norman Powell      |            9
Anthony Brown      |           11
Cameron Bairstow   |           15
Lorenzo Brown      |           11
Dirk Nowitzki      |            7
Dwight Powell      |            9
Dwight Howard      |            9
Justise Winslow    |           14
Karl-Anthony Towns |           15
Anthony Morrow     |           13

3.2.3.1.6.511 Using the occurence argument

Get the second occurrence of the letter ‘k’ in a player’s name. We set start_index to 1 (the default) and occurence to 2.

nba=> SELECT "Name", REGEXP_INSTR("Name", 'k', 1, 2) FROM nba WHERE REGEXP_INSTR("Name", 'k', 1, 2)>0;
Name               | regexp_instr
-------------------+-------------
Nik Stauskas       |           10
Tarik Black        |           11
Dirk Nowitzki      |           12
Sam Dekker         |            8
Kendrick Perkins   |           13
Frank Kaminsky III |           13
Nikola Jokic       |           10
Nikola Pekovic     |           10
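The interplay of start_index, the occurrence argument, and return_position can be sketched in Python. This is a hedged approximation: `regexp_instr` is a hypothetical helper, Python's `re` engine stands in for POSIX regular expressions, and positions are assumed to be 1-based and relative to the whole string.

```python
import re


def regexp_instr(string_expr, pattern, start_index=1, occurrence=1, return_position=0):
    """Return the 1-based position of the requested match, or 0 if there is none."""
    if string_expr is None:
        return None  # NULL in, NULL out
    matches = list(re.compile(pattern).finditer(string_expr, start_index - 1))
    if len(matches) < occurrence:
        return 0
    match = matches[occurrence - 1]
    # return_position 0: first character of the match.
    # return_position > 0: first character after the end of the match.
    zero_based = match.end() if return_position > 0 else match.start()
    return zero_based + 1


print(regexp_instr("Jae Crowder", "ow"))
print(regexp_instr("Nik Stauskas", "k", 1, 2))
print(regexp_instr("Jae Crowder", "ow", 1, 1, 1))
```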

3.2.3.1.6.512 REGEXP_SUBSTR

Returns the substring that matches a regular expression. See also: REGEXP_INSTR, REGEXP_COUNT.

3.2.3.1.6.513 Syntax

REGEXP_SUBSTR( string_expr, string_test_expr [ , start_index [ , occurence ] ] )

3.2.3.1.6.514 Arguments

Parameter          Description
string_expr        String to test
string_test_expr   Test pattern
start_index        The character index offset to start counting from. Defaults to 1
occurence          Which occurrence to search for. Defaults to 1
return_position    Sets the position within the string to return. Using 0, the function returns the string position of the first character of the substring that matches the pattern. Defaults to 0


3.2.3.1.6.515 Test patterns

Pattern            Description
^                  Match the beginning of a string
$                  Match the end of a string
.                  Match any character (including whitespace such as carriage return and newline)
*                  Match the preceding pattern zero or more times
+                  Match the preceding pattern at least once
?                  Match the preceding pattern once at most
de|abc             Match either de or abc
(abc)*             Match zero or more instances of the sequence abc
{2}                Match the preceding pattern exactly two times
{2,4}              Match the preceding pattern between two and four times
[a-dX], [^a-dX]    Matches any character that is (or is not, when negated with ^) either a, b, c, d, or X. The - character between two other characters forms a range that matches all characters from the first character to the second. For example, [0-9] matches any decimal digit. To include a literal ] character, it must immediately follow the opening bracket [. To include a literal - character, it must be written first or last. Any character that does not have a defined special meaning inside a [] pair matches only itself.

3.2.3.1.6.516 Returns

Returns the matched substring, in the same type as the argument supplied, or an empty string ('') if no match was found.

3.2.3.1.6.517 Notes

• The test pattern must be a literal string. Column references or complex expressions are currently unsupported.
• If the value is NULL, the result is NULL.

3.2.3.1.6.518 Examples

For these examples, assume a table named nba, with the following structure:

CREATE TABLE nba
(
   "Name" varchar(40),
   "Team" varchar(40),
   "Number" tinyint,
   "Position" varchar(2),
   "Age" tinyint,
   "Height" varchar(4),
   "Weight" real,
   "College" varchar(40),
   "Salary" float
);


Here’s a peek at the table contents (Download nba.csv):

Table 32: nba.csv

Name          | Team           | Number | Position | Age  | Height | Weight | College           | Salary
--------------+----------------+--------+----------+------+--------+--------+-------------------+-----------
Avery Bradley | Boston Celtics | 0.0    | PG       | 25.0 | 6-2    | 180.0  | Texas             | 7730337.0
Jae Crowder   | Boston Celtics | 99.0   | SF       | 25.0 | 6-6    | 235.0  | Marquette         | 6796117.0
John Holland  | Boston Celtics | 30.0   | SG       | 27.0 | 6-5    | 205.0  | Boston University |
R.J. Hunter   | Boston Celtics | 28.0   | SG       | 22.0 | 6-5    | 185.0  | Georgia State     | 1148640.0
Jonas Jerebko | Boston Celtics | 8.0    | PF       | 29.0 | 6-10   | 231.0  |                   | 5000000.0
Amir Johnson  | Boston Celtics | 90.0   | PF       | 29.0 | 6-9    | 240.0  |                   | 12000000.0
Jordan Mickey | Boston Celtics | 55.0   | PF       | 21.0 | 6-8    | 235.0  | LSU               | 1170960.0
Kelly Olynyk  | Boston Celtics | 41.0   | C        | 25.0 | 7-0    | 238.0  | Gonzaga           | 2165160.0
Terry Rozier  | Boston Celtics | 12.0   | PG       | 22.0 | 6-2    | 190.0  | Louisville        | 1824360.0

3.2.3.1.6.519 Find players with ‘o’ in their name

nba=> SELECT "Name", REGEXP_SUBSTR("Name", '([a-zA-Z]+o[a-zA-Z]+)+') FROM nba ORDER BY 2 DESC LIMIT 10;
Name               | regexp_substr
-------------------+--------------
James Young        | Young
Thaddeus Young     | Young
Nick Young         | Young
Metta World Peace  | World
Christian Wood     | Wood
Justise Winslow    | Winslow
Wilson Chandler    | Wilson
C.J. Wilcox        | Wilcox
Shayne Whittington | Whittington
Russell Westbrook  | Westbrook

3.2.3.1.6.520 Using the occurence argument

Get the second matching name part (the last or middle name) for players who have ‘o’ in at least two parts of their name. We set start_index to 1 (the default) and occurence to 2.

nba=> SELECT "Name", REGEXP_SUBSTR("Name", '([a-zA-Z]+o[a-zA-Z]+)+', 1, 2) FROM nba ORDER BY 2 DESC LIMIT 10;
Name               | regexp_substr
-------------------+--------------
Joe Young          | Young
Tony Wroten        | Wroten
Noah Vonleh        | Vonleh
Karl-Anthony Towns | Towns
Anthony Tolliver   | Tolliver
Hollis Thompson    | Thompson
Jason Thompson     | Thompson
Donald Sloan       | Sloan
Jonathon Simmons   | Simmons
Ramon Sessions     | Sessions
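The two REGEXP_SUBSTR examples above can be mimicked with a short Python sketch. It is illustrative only: `regexp_substr` is a hypothetical helper, Python's `re` engine is used in place of POSIX regular expressions, and an empty string is returned when the requested occurrence does not exist, as described under Returns.

```python
import re


def regexp_substr(string_expr, pattern, start_index=1, occurrence=1):
    """Return the text of the requested match, or '' if there is no such match."""
    if string_expr is None:
        return None  # NULL in, NULL out
    matches = list(re.compile(pattern).finditer(string_expr, start_index - 1))
    if len(matches) < occurrence:
        return ""
    return matches[occurrence - 1].group(0)


pattern = "([a-zA-Z]+o[a-zA-Z]+)+"
print(regexp_substr("Metta World Peace", pattern))
print(regexp_substr("Karl-Anthony Towns", pattern, 1, 2))
```

In the second call the first match is "Anthony", so asking for the second occurrence yields the last name.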

3.2.3.1.6.521 REPEAT

Repeats a string as many times as specified.

Warning: This function works ONLY with TEXT data type.

3.2.3.1.6.522 Syntax

REPEAT(expr, character_count)

3.2.3.1.6.523 Arguments

Parameter         Description
expr              String expression to repeat
character_count   Number of times to repeat the string

3.2.3.1.6.524 Returns

Returns the same type as the argument supplied.

3.2.3.1.6.525 Notes

• When character_count <= 0, an empty string is returned.

3.2.3.1.6.526 Examples

For these examples, consider the following table and contents:

CREATE TABLE customer(customername TEXT);

INSERT INTO customer VALUES ('Alfreds Futterkiste'), ('Ana Trujillo Emparedados y helados'), ('Antonio Moreno Taquería'), ('Around the Horn');


3.2.3.1.6.527 Repeat the text in customername 2 times:

t=> SELECT REPEAT(customername, 2) FROM customer;
repeat
--------------------------------------------------------------------
Alfreds FutterkisteAlfreds Futterkiste
Ana Trujillo Emparedados y heladosAna Trujillo Emparedados y helados
Antonio Moreno TaqueríaAntonio Moreno Taquería
Around the HornAround the Horn

3.2.3.1.6.528 Repeat the string 0 times:

t=> SELECT REPEAT('abc', 0);
repeat
------
''

3.2.3.1.6.529 Repeat the string 1 time:

t=> SELECT REPEAT('abc', 1);
repeat
------
'abc'

3.2.3.1.6.530 Repeat the string 3 times:

t=> SELECT REPEAT('a', 3);
repeat
------
'aaa'

3.2.3.1.6.531 Repeat an empty string 10 times:

t=> SELECT REPEAT('', 10);
repeat
------
''

3.2.3.1.6.532 Repeat a string -3 times:

t=> SELECT REPEAT('abc',-3);
repeat
------
''
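The edge cases shown in these examples (zero, negative counts, and the empty string) can be summarized in a short Python sketch. The helper `repeat` is hypothetical and only mirrors the documented results; NULL handling is an assumption carried over from the other string functions.

```python
def repeat(expr, character_count):
    """Repeat expr character_count times; counts <= 0 give an empty string."""
    if expr is None:
        return None  # assumed: NULL in, NULL out
    return expr * max(character_count, 0)


for args in (("abc", 0), ("abc", 1), ("a", 3), ("", 10), ("abc", -3)):
    print(repr(repeat(*args)))
```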

3.2.3.1.6.533 REPLACE

Replaces all occurrences of a specified string value with another string value.

Warning: With VARCHAR, a substring can only be replaced with another substring of equal byte length. See OCTET_LENGTH.

3.2.3.1.6.534 Syntax

REPLACE( expr, source_expr, replacement_expr )

3.2.3.1.6.535 Arguments

Parameter          Description
expr               String expression
source_expr        String expression to be replaced
replacement_expr   String expression containing the replacement

3.2.3.1.6.536 Returns

Returns the same type as the argument supplied.

3.2.3.1.6.537 Notes

• In VARCHAR strings, the source_expr and replacement_expr must be the same byte length. See OCTET_LENGTH.
• If the value is NULL, the result is NULL.

3.2.3.1.6.538 Examples

For these examples, consider the following table and contents:


CREATE TABLE jabberwocky(line TEXT(50));

INSERT INTO jabberwocky VALUES
('''Twas brillig, and the slithy toves '), ('      Did gyre and gimble in the wabe: ')
,('All mimsy were the borogoves, '), ('      And the mome raths outgrabe. ')
,('"Beware the Jabberwock, my son! '), ('      The jaws that bite, the claws that catch! ')
,('Beware the Jubjub bird, and shun '), ('      The frumious Bandersnatch!" ');

3.2.3.1.6.539 Replacing single characters

t=> SELECT REPLACE('What is all of this about?', 'u', 'o');
replace
--------------------------
What is all of this aboot?

3.2.3.1.6.540 Replacing words

t=> SELECT REPLACE(line, 'the', 'þe') FROM jabberwocky;
replace
----------------------------------------------
'Twas brillig, and þe slithy toves
      Did gyre and gimble in þe wabe:
All mimsy were þe borogoves,
      And þe mome raths outgrabe.
"Beware þe Jabberwock, my son!
      The jaws that bite, þe claws that catch!
Beware þe Jubjub bird, and shun
      The frumious Bandersnatch!"

3.2.3.1.6.541 REVERSE

Returns a reversed order of a character string.

3.2.3.1.6.542 Syntax

REVERSE( expr )

3.2.3.1.6.543 Arguments

Parameter   Description
expr        String expression


3.2.3.1.6.544 Returns

Returns the same type as the argument supplied.

3.2.3.1.6.545 Notes

• If the value is NULL, the result is NULL.

3.2.3.1.6.546 Examples

For these examples, consider the following table and contents:

CREATE TABLE jabberwocky(line TEXT(50));

INSERT INTO jabberwocky VALUES
('''Twas brillig, and the slithy toves '), ('      Did gyre and gimble in the wabe: ')
,('All mimsy were the borogoves, '), ('      And the mome raths outgrabe. ')
,('"Beware the Jabberwock, my son! '), ('      The jaws that bite, the claws that catch! ')
,('Beware the Jubjub bird, and shun '), ('      The frumious Bandersnatch!" ');

3.2.3.1.6.547 Using REVERSE

t=> SELECT line, REVERSE(line) FROM jabberwocky;
line                                            | reverse
------------------------------------------------+------------------------------------------------
'Twas brillig, and the slithy toves             | sevot yhtils eht dna ,gillirb sawT'
      Did gyre and gimble in the wabe:          | :ebaw eht ni elbmig dna eryg diD
All mimsy were the borogoves,                   | ,sevogorob eht erew ysmim llA
      And the mome raths outgrabe.              | .ebargtuo shtar emom eht dnA
"Beware the Jabberwock, my son!                 | !nos ym ,kcowrebbaJ eht eraweB"
      The jaws that bite, the claws that catch! | !hctac taht swalc eht ,etib taht swaj ehT
Beware the Jubjub bird, and shun                | nuhs dna ,drib bujbuJ eht eraweB
      The frumious Bandersnatch!"               | "!hctansrednaB suoimurf ehT

3.2.3.1.6.548 RIGHT

Returns the right part of a character string with the specified number of characters. See also LEFT.

3.2.3.1.6.549 Syntax

RIGHT( expr , character_count )

3.2.3.1.6.550 Arguments

Parameter       | Description
----------------+-----------------------------------------------------------------
expr            | String expression
character_count | A positive integer that specifies how many characters to return.

3.2.3.1.6.551 Returns

Returns the same type as the argument supplied.

3.2.3.1.6.552 Notes

• This function works on TEXT strings only.
• If the value is NULL, the result is NULL.
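The relationship between LEFT and RIGHT can be sketched outside the database. The following Python model is only an illustration under stated assumptions, not SQream DB's implementation: the helper names sql_left and sql_right are hypothetical, and None stands in for NULL.

```python
from typing import Optional


def sql_left(expr: Optional[str], character_count: int) -> Optional[str]:
    """Hypothetical model of LEFT: the first character_count characters."""
    if expr is None:  # NULL in, NULL out
        return None
    return expr[:character_count]


def sql_right(expr: Optional[str], character_count: int) -> Optional[str]:
    """Hypothetical model of RIGHT: the last character_count characters."""
    if expr is None:  # NULL in, NULL out
        return None
    # Compute the start index explicitly; a negative slice would misbehave for 0.
    start = max(len(expr) - character_count, 0)
    return expr[start:]


line = "All mimsy were the borogoves,"
left_part = sql_left(line, 10)
right_part = sql_right(line, 10)
print(repr(left_part), repr(right_part))
```

When character_count is at least the length of the string, this model returns the whole string for both functions.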

3.2.3.1.6.553 Examples

For these examples, consider the following table and contents:

CREATE TABLE jabberwocky(line TEXT(50));

INSERT INTO jabberwocky VALUES
   ('''Twas brillig, and the slithy toves '), (' Did gyre and gimble in the wabe: ')
   ,('All mimsy were the borogoves, '), (' And the mome raths outgrabe. ')
   ,('"Beware the Jabberwock, my son! '), (' The jaws that bite, the claws that catch! ')
   ,('Beware the Jubjub bird, and shun '), (' The frumious Bandersnatch!" ');

3.2.3.1.6.554 Using RIGHT

t=> SELECT LEFT(line, 10), RIGHT(line, 10) FROM jabberwocky;
left       | right
-----------+-----------
'Twas bril | thy toves
Did        | the wabe:
All mimsy  | orogoves,
And        | outgrabe.
"Beware th | , my son!
The        | at catch!
Beware the | and shun
The        | rsnatch!"

3.2.3.1.6.555 RLIKE

Tests if a string matches a given regular expression pattern.

RLIKE and LIKE are similar, but RLIKE uses POSIX regular expressions instead of the SQL patterns. See also: LIKE, REGEXP_COUNT, REGEXP_INSTR, REGEXP_SUBSTR.

3.2.3.1.6.556 Syntax

string_expr [ NOT ] RLIKE string_test_expr

3.2.3.1.6.557 Arguments

Parameter        | Description
-----------------+---------------
string_expr      | String to test
string_test_expr | Test pattern

3.2.3.1.6.558 Test patterns

Pattern         | Description
----------------+----------------------------------------------------------------
^               | Match the beginning of a string
$               | Match the end of a string
.               | Match any character (including whitespace such as carriage return and newline)
*               | Match the preceding pattern zero or more times
+               | Match the preceding pattern at least once
?               | Match the preceding pattern once at most
de|abc          | Match either de or abc
(abc)*          | Match zero or more instances of the sequence abc
{2}             | Match the preceding pattern exactly two times
{2,4}           | Match the preceding pattern between two and four times
[a-dX], [^a-dX] | Match any character that is (or is not, when negated with ^) either a, b, c, d, or X. The - character between two other characters forms a range that matches all characters from the first character to the second. For example, [0-9] matches any decimal digit. To include a literal ] character, it must immediately follow the opening bracket [. To include a literal - character, it must be written first or last. Any character that does not have a defined special meaning inside a [] pair matches only itself.
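To experiment with these patterns before running them against a table, a rough stand-in can be built with Python's re module. This is a sketch under assumptions: Python's regular expressions are not POSIX regular expressions, although the constructs listed above behave the same way in both for simple patterns like these. The helper name rlike and the sample team list are hypothetical, and None stands in for NULL.

```python
import re

# Hypothetical sample values, taken from team names that appear in this page's examples.
teams = ["Portland Trail Blazers", "Boston Celtics", "Memphis Grizzlies"]


def rlike(string_expr, pattern):
    """Hypothetical model of RLIKE: True if the pattern matches anywhere in the string."""
    if string_expr is None:  # NULL in, NULL out
        return None
    # re.DOTALL makes '.' match newlines too, as the pattern table describes.
    return re.search(pattern, string_expr, flags=re.DOTALL) is not None


starts_with_portland = [t for t in teams if rlike(t, r"^(Portland)+")]
contains_zz = [t for t in teams if rlike(t, r"z{2}")]
ends_with_ics = [t for t in teams if rlike(t, r"ics$")]

print(starts_with_portland)
print(contains_zz)
print(ends_with_ics)
```

Because the pattern is searched for anywhere in the string, anchoring with ^ and $ is what restricts a match to the beginning or the end.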

3.2.3.1.6.559 Returns

TRUE if the test string matches the pattern, or FALSE otherwise.

3.2.3.1.6.560 Notes

• The test pattern must be a literal string. Column references or complex expressions are currently unsupported.
• If the value is NULL, the result is NULL.

3.2.3.1.6.561 Examples

For these examples, assume a table named nba, with the following structure:

CREATE TABLE nba
(
   Name varchar(40),
   Team varchar(40),
   Number tinyint,
   Position varchar(2),
   Age tinyint,
   Height varchar(4),
   Weight real,
   College varchar(40),
   Salary float
);

Here’s a peek at the table contents (Download nba.csv):

Table 33: nba.csv

Name          | Team           | Number | Position | Age  | Height | Weight | College           | Salary
--------------+----------------+--------+----------+------+--------+--------+-------------------+-----------
Avery Bradley | Boston Celtics | 0.0    | PG       | 25.0 | 6-2    | 180.0  | Texas             | 7730337.0
Jae Crowder   | Boston Celtics | 99.0   | SF       | 25.0 | 6-6    | 235.0  | Marquette         | 6796117.0
John Holland  | Boston Celtics | 30.0   | SG       | 27.0 | 6-5    | 205.0  | Boston University |
R.J. Hunter   | Boston Celtics | 28.0   | SG       | 22.0 | 6-5    | 185.0  | Georgia State     | 1148640.0
Jonas Jerebko | Boston Celtics | 8.0    | PF       | 29.0 | 6-10   | 231.0  |                   | 5000000.0
Amir Johnson  | Boston Celtics | 90.0   | PF       | 29.0 | 6-9    | 240.0  |                   | 12000000.0
Jordan Mickey | Boston Celtics | 55.0   | PF       | 21.0 | 6-8    | 235.0  | LSU               | 1170960.0
Kelly Olynyk  | Boston Celtics | 41.0   | C        | 25.0 | 7-0    | 238.0  | Gonzaga           | 2165160.0
Terry Rozier  | Boston Celtics | 12.0   | PG       | 22.0 | 6-2    | 190.0  | Louisville        | 1824360.0

3.2.3.1.6.562 Match the beginning of a string

This form is equivalent to ... LIKE 'Portland%'

nba=> SELECT "Name","Age","Salary","Team" FROM nba WHERE "Team" RLIKE '^(Portland)+' LIMIT 5;
Name            | Age | Salary  | Team
----------------+-----+---------+-----------------------
Cliff Alexander | 20  | 525093  | Portland Trail Blazers
Al-Farouq Aminu | 25  | 8042895 | Portland Trail Blazers
Pat Connaughton | 23  | 625093  | Portland Trail Blazers
Allen Crabbe    | 24  | 947276  | Portland Trail Blazers
Ed Davis        | 27  | 6980802 | Portland Trail Blazers

3.2.3.1.6.563 Negate with NOT

nba=> SELECT "Name","Age","Salary","Team" FROM nba WHERE "Team" NOT RLIKE '^(Portland)+' LIMIT 5;
Name          | Age | Salary  | Team
--------------+-----+---------+---------------
Avery Bradley | 25  | 7730337 | Boston Celtics
Jae Crowder   | 25  | 6796117 | Boston Celtics
John Holland  | 27  |         | Boston Celtics
R.J. Hunter   | 22  | 1148640 | Boston Celtics
Jonas Jerebko | 29  | 5000000 | Boston Celtics

3.2.3.1.6.564 Match the middle of a string

nba=> SELECT "Name","Age","Salary","Team" FROM nba WHERE "Team" RLIKE '(zz)' LIMIT 5;
Name           | Age | Salary  | Team
---------------+-----+---------+------------------
Jordan Adams   | 21  | 1404600 | Memphis Grizzlies
Tony Allen     | 34  | 5158539 | Memphis Grizzlies
Chris Andersen | 37  | 5000000 | Memphis Grizzlies
Matt Barnes    | 36  | 3542500 | Memphis Grizzlies
Vince Carter   | 39  | 4088019 | Memphis Grizzlies

3.2.3.1.6.565 Find players with a Roman numeral suffix

Use $ to match only the end of the string

nba=> SELECT "Name","Age","Salary","Team" FROM nba WHERE "Name" RLIKE '[XCLVMI]$';
Name                | Age | Salary  | Team
--------------------+-----+---------+------------------
Glenn Robinson III  | 22  | 1100000 | Indiana Pacers
Johnny O'Bryant III | 23  | 845059  | Milwaukee Bucks
Frank Kaminsky III  | 23  | 2612520 | Charlotte Hornets

3.2.3.1.6.566 Find players with just one middle name

nba=> SELECT "Name","Age","Salary","Team" FROM nba WHERE "Name" RLIKE '^[a-zA-Z]+ [a-zA-Z]+ [a-zA-Z]+$';
Name                 | Age | Salary  | Team
---------------------+-----+---------+----------------------
James Michael McAdoo | 23  | 845059  | Golden State Warriors
Metta World Peace    | 36  | 947276  | Los Angeles Lakers
Glenn Robinson III   | 22  | 1100000 | Indiana Pacers
Frank Kaminsky III   | 23  | 2612520 | Charlotte Hornets
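The two name patterns above overlap in ways that are easy to miss. The following Python sketch applies the same two patterns to a handful of names from the results above; it uses Python's re module as a stand-in for RLIKE, which is an assumption, since Python's regular expressions are not POSIX regular expressions.

```python
import re

# Sample names taken from the query results and table peek on this page.
names = [
    "Glenn Robinson III",
    "Metta World Peace",
    "Avery Bradley",
    "Johnny O'Bryant III",
    "James Michael McAdoo",
]

roman_suffix = re.compile(r"[XCLVMI]$")                        # ends in a Roman numeral letter
three_words = re.compile(r"^[a-zA-Z]+ [a-zA-Z]+ [a-zA-Z]+$")   # exactly three alphabetic words

with_suffix = [n for n in names if roman_suffix.search(n)]
with_three_words = [n for n in names if three_words.search(n)]

print(with_suffix)
print(with_three_words)
```

The three-word pattern treats a suffix such as III as a third word, so suffixed names are returned alongside names with a genuine middle name. It also skips names containing characters outside a-z and A-Z, such as the apostrophe in O'Bryant.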

3.2.3.1.6.567 RTRIM

Trims trailing whitespace from a string. See also TRIM, LTRIM.

3.2.3.1.6.568 Syntax

RTRIM( expr )

3.2.3.1.6.569 Arguments

Parameter | Description
----------+------------------
expr      | String expression

3.2.3.1.6.570 Returns

Returns the same type as the argument supplied.

3.2.3.1.6.571 Notes

• When using VARCHAR values, SQream DB automatically trims the trailing whitespace. Using RTRIM on VARCHAR does not affect the result.
• This function is equivalent to the ANSI form TRIM( TRAILING FROM expr ).
• If the value is NULL, the result is NULL.

3.2.3.1.6.572 Examples

For these examples, consider the following table and contents:

CREATE TABLE jabberwocky(line TEXT(50));

INSERT INTO jabberwocky VALUES
   ('''Twas brillig, and the slithy toves '), (' Did gyre and gimble in the wabe: ')
   ,('All mimsy were the borogoves, '), (' And the mome raths outgrabe. ')
   ,('"Beware the Jabberwock, my son! '), (' The jaws that bite, the claws that catch! ')
   ,('Beware the Jubjub bird, and shun '), (' The frumious Bandersnatch!" ');

3.2.3.1.6.573 Trimming a literal value

In this example we use || (Concatenate) to show the whitespace.

t=> SELECT '#' || RTRIM(' SQream DB ') || '#';
rtrim
------------
# SQream DB#

3.2.3.1.6.574 Trimming a column of values

t=> SELECT ('Line: ' || line || '#') as untrimmed, ('Line: ' || RTRIM(line) || '#') as trimmed FROM jabberwocky;
untrimmed                                         | trimmed
--------------------------------------------------+------------------------------------------------
Line: 'Twas brillig, and the slithy toves #       | Line: 'Twas brillig, and the slithy toves#
Line: Did gyre and gimble in the wabe: #          | Line: Did gyre and gimble in the wabe:#
Line: All mimsy were the borogoves, #             | Line: All mimsy were the borogoves,#
Line: And the mome raths outgrabe. #              | Line: And the mome raths outgrabe.#
Line: "Beware the Jabberwock, my son! #           | Line: "Beware the Jabberwock, my son!#
Line: The jaws that bite, the claws that catch! # | Line: The jaws that bite, the claws that catch!#
Line: Beware the Jubjub bird, and shun #          | Line: Beware the Jubjub bird, and shun#
Line: The frumious Bandersnatch!" #               | Line: The frumious Bandersnatch!"#

3.2.3.1.6.575 SUBSTRING

Returns a substring of the input starting at start_pos.

Note: Some systems call this function SUBSTR.

See also REGEXP_SUBSTR.

3.2.3.1.6.576 Syntax

SUBSTRING( expr, start_pos, length )

3.2.3.1.6.577 Arguments

Parameter | Description
----------+--------------------------------
expr      | String expression
start_pos | Starting position (starts at 1)
length    | Number of characters to extract

3.2.3.1.6.578 Returns

Returns the same type as the argument supplied.

3.2.3.1.6.579 Notes

• Character count starts at 1.
• If the value is NULL, the result is NULL.
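Because positions start at 1, SUBSTRING arguments are shifted by one relative to languages with zero-based indexing. The Python sketch below illustrates the arithmetic only; the helper name sql_substring is hypothetical, None stands in for NULL, and the model assumes start_pos is 1 or greater.

```python
from typing import Optional


def sql_substring(expr: Optional[str], start_pos: int, length: int) -> Optional[str]:
    """Hypothetical model of SUBSTRING with a 1-based start position."""
    if expr is None:  # NULL in, NULL out
        return None
    begin = start_pos - 1  # convert the 1-based position to a 0-based index
    return expr[begin:begin + length]


four_from_fourth = sql_substring("Avery Bradley", 4, 4)
first_ten = sql_substring("Avery Bradley", 1, 10)
print(repr(four_from_fourth), repr(first_ten))
```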

3.2.3.1.6.580 Examples

For these examples, assume a table named nba, with the following structure:

CREATE TABLE nba
(
   Name varchar(40),
   Team varchar(40),
   Number tinyint,
   Position varchar(2),
   Age tinyint,
   Height varchar(4),
   Weight real,
   College varchar(40),
   Salary float
);

Here’s a peek at the table contents (Download nba.csv):

Table 34: nba.csv

Name          | Team           | Number | Position | Age  | Height | Weight | College           | Salary
--------------+----------------+--------+----------+------+--------+--------+-------------------+-----------
Avery Bradley | Boston Celtics | 0.0    | PG       | 25.0 | 6-2    | 180.0  | Texas             | 7730337.0
Jae Crowder   | Boston Celtics | 99.0   | SF       | 25.0 | 6-6    | 235.0  | Marquette         | 6796117.0
John Holland  | Boston Celtics | 30.0   | SG       | 27.0 | 6-5    | 205.0  | Boston University |
R.J. Hunter   | Boston Celtics | 28.0   | SG       | 22.0 | 6-5    | 185.0  | Georgia State     | 1148640.0
Jonas Jerebko | Boston Celtics | 8.0    | PF       | 29.0 | 6-10   | 231.0  |                   | 5000000.0
Amir Johnson  | Boston Celtics | 90.0   | PF       | 29.0 | 6-9    | 240.0  |                   | 12000000.0
Jordan Mickey | Boston Celtics | 55.0   | PF       | 21.0 | 6-8    | 235.0  | LSU               | 1170960.0
Kelly Olynyk  | Boston Celtics | 41.0   | C        | 25.0 | 7-0    | 238.0  | Gonzaga           | 2165160.0
Terry Rozier  | Boston Celtics | 12.0   | PG       | 22.0 | 6-2    | 190.0  | Louisville        | 1824360.0

3.2.3.1.6.581 Substring using fixed offsets

Get 4 characters, starting from the 4th character

nba=> SELECT SUBSTRING("Name", 4, 4) FROM nba LIMIT 5;
substring
---------
ry B
 Cro
n Ho
. Hu
as J

3.2.3.1.6.582 Truncating strings

Trim a string to 10 characters

nba=> SELECT SUBSTRING("Name", 1, 10) FROM nba LIMIT 5;
substring
----------
Avery Brad
Jae Crowde
John Holla
R.J. Hunte
Jonas Jere

3.2.3.1.6.583 TRIM

Trims both leading and trailing whitespace from a string. See also LTRIM, RTRIM.

3.2.3.1.6.584 Syntax

TRIM( expr )

3.2.3.1.6.585 Arguments

Parameter | Description
----------+------------------
expr      | String expression

3.2.3.1.6.586 Returns

Returns the same type as the argument supplied.

3.2.3.1.6.587 Notes

• When using VARCHAR values, SQream DB automatically trims the trailing whitespace.
• This function is equivalent to the ANSI form TRIM( BOTH FROM expr ).
• If the value is NULL, the result is NULL.
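The difference between LTRIM, RTRIM, and TRIM can be sketched in a few lines of Python. This is an illustration under assumptions rather than SQream DB's implementation: the helper names are hypothetical, None stands in for NULL, and only space characters are stripped, because this page does not state which other whitespace characters the functions remove.

```python
from typing import Optional


def sql_ltrim(expr: Optional[str]) -> Optional[str]:
    """Hypothetical model of LTRIM: remove leading spaces."""
    return None if expr is None else expr.lstrip(" ")


def sql_rtrim(expr: Optional[str]) -> Optional[str]:
    """Hypothetical model of RTRIM: remove trailing spaces."""
    return None if expr is None else expr.rstrip(" ")


def sql_trim(expr: Optional[str]) -> Optional[str]:
    """Hypothetical model of TRIM: remove leading and trailing spaces."""
    return None if expr is None else expr.strip(" ")


value = " SQream DB "
rtrimmed = "#" + sql_rtrim(value) + "#"
trimmed = "#" + sql_trim(value) + "#"
print(rtrimmed)
print(trimmed)
```

As in the examples on this page, the # markers make the remaining whitespace visible.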

3.2.3.1.6.588 Examples

For these examples, consider the following table and contents:

CREATE TABLE jabberwocky(line TEXT(50));

INSERT INTO jabberwocky VALUES
   ('''Twas brillig, and the slithy toves '), (' Did gyre and gimble in the wabe: ')
   ,('All mimsy were the borogoves, '), (' And the mome raths outgrabe. ')
   ,('"Beware the Jabberwock, my son! '), (' The jaws that bite, the claws that catch! ')
   ,('Beware the Jubjub bird, and shun '), (' The frumious Bandersnatch!" ');

3.2.3.1.6.589 Trimming a literal value

In this example we use || (Concatenate) to show the whitespace.

t=> SELECT '#' || TRIM(' SQream DB ') || '#';
trim
-----------
#SQream DB#

3.2.3.1.6.590 Trimming a column of values

t=> SELECT ('Line: ' || line || '#') as untrimmed, ('Line: ' || TRIM(line) || '#') as trimmed FROM jabberwocky;
untrimmed                                         | trimmed
--------------------------------------------------+------------------------------------------------
Line: 'Twas brillig, and the slithy toves #       | Line: 'Twas brillig, and the slithy toves#
Line: Did gyre and gimble in the wabe: #          | Line: Did gyre and gimble in the wabe:#
Line: All mimsy were the borogoves, #             | Line: All mimsy were the borogoves,#
Line: And the mome raths outgrabe. #              | Line: And the mome raths outgrabe.#
Line: "Beware the Jabberwock, my son! #           | Line: "Beware the Jabberwock, my son!#
Line: The jaws that bite, the claws that catch! # | Line: The jaws that bite, the claws that catch!#
Line: Beware the Jubjub bird, and shun #          | Line: Beware the Jubjub bird, and shun#
Line: The frumious Bandersnatch!" #               | Line: The frumious Bandersnatch!"#

3.2.3.1.6.591 UPPER

Converts characters in a string to upper case. See also: LOWER.

3.2.3.1.6.592 Syntax

UPPER( expr )

3.2.3.1.6.593 Arguments

Parameter | Description
----------+------------------
expr      | String expression

3.2.3.1.6.594 Returns

Returns the same type as the argument supplied.

3.2.3.1.6.595 Notes

• If the value is NULL, the result is NULL.

3.2.3.1.6.596 Examples

For these examples, consider the following table and contents:

CREATE TABLE jabberwocky(line VARCHAR(50));

INSERT INTO jabberwocky VALUES
   ('''Twas brillig, and the slithy toves'), (' Did gyre and gimble in the wabe:')
   ,('All mimsy were the borogoves,'), (' And the mome raths outgrabe.')
   ,('"Beware the Jabberwock, my son!'), (' The jaws that bite, the claws that catch!')
   ,('Beware the Jubjub bird, and shun'), (' The frumious Bandersnatch!"');

3.2.3.1.6.597 Upper-casing a literal value

t=> SELECT UPPER('SQream DB');
SQREAM DB

3.2.3.1.6.598 Upper-casing a column of values

t=> SELECT UPPER(line) FROM jabberwocky;
upper
-----------------------------------------
'TWAS BRILLIG, AND THE SLITHY TOVES
DID GYRE AND GIMBLE IN THE WABE:
ALL MIMSY WERE THE BOROGOVES,
AND THE MOME RATHS OUTGRABE.
"BEWARE THE JABBERWOCK, MY SON!
THE JAWS THAT BITE, THE CLAWS THAT CATCH!
BEWARE THE JUBJUB BIRD, AND SHUN
THE FRUMIOUS BANDERSNATCH!"

3.2.3.1.6.599 User-Defined Functions

User-defined functions (UDFs) are functions that users can define and configure. The following types are available:
• Python user-defined functions.
• Scalar SQL user-defined functions.

3.2.3.1.6.600 Scalar SQL UDF

A scalar SQL UDF is a user-defined function that returns a single value, such as the sum of a group of values. Scalar UDFs differ from table-valued functions, which return a result set in the form of a table. This page describes the syntax for building simple scalar UDFs and provides three examples.

3.2.3.1.6.601 Syntax

The following shows the syntax for creating a simple scalar SQL UDF:

create_function_statement ::=
    CREATE [ OR REPLACE ] FUNCTION function_name (argument_list)
    RETURNS return_type
    AS $$
    { function_body }
    $$ LANGUAGE SQL
    ;

function_name ::= identifier
argument_list ::= { value_name type_name [, ...] }
value_name ::= identifier
return_type ::= type_name
function_body ::= A valid SQL statement

3.2.3.1.6.602 Examples

3.2.3.1.6.603 Example 1 – Support for Different Syntax

A scalar SQL UDF can expose existing functionality under a different name and argument order. In the example below, add_months is defined in terms of dateadd, so both perform the same operation. The wrapper works correctly even though add_months takes its arguments in the order (dt datetime, n int), whereas dateadd takes them in the order MONTH, n, dt.

CREATE or replace FUNCTION add_months(dt datetime,n int)
RETURNS datetime
AS $$
SELECT dateadd(MONTH ,n,dt)
$$ LANGUAGE SQL;

3.2.3.1.6.604 Example 2 – Manipulating Strings

Scalar SQL UDFs can be used to manipulate strings. The following example converts a TEXT value that holds a date as eight digits (four for the year, two for the month, two for the day) to the DATE type:

CREATE or replace FUNCTION STR_TO_DATE(f text)
RETURNS date
AS $$
select (substring(f,1,4)||'-'||substring(f,5,2)||'-'||substring(f,7,2))::date
$$ LANGUAGE SQL;
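The string handling inside STR_TO_DATE can be traced step by step outside the database. The Python sketch below mirrors the three SUBSTRING calls and the cast; it is only an illustration, and the function name str_to_date and the sample input are hypothetical.

```python
from datetime import date


def str_to_date(f: str) -> date:
    """Hypothetical Python counterpart of the STR_TO_DATE UDF above."""
    year = f[0:4]    # substring(f,1,4)
    month = f[4:6]   # substring(f,5,2)
    day = f[6:8]     # substring(f,7,2)
    iso_text = year + "-" + month + "-" + day
    return date.fromisoformat(iso_text)  # plays the role of the ::date cast


converted = str_to_date("20210927")
print(converted)
```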

3.2.3.1.6.605 Example 3 – Manually Building Functionality

You can use the Scalar SQL UDF to manually build functionality for otherwise unsupported operations.

CREATE OR REPLACE function "least_sq" (a float, b float) -- Replace the LEAST (from hql) function
returns float as
$$select case
    when a <= b then a
    when b < a then b
    when a is null then b
    when b is null then a
    else null
    end;
$$
language sql;
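The order of the CASE branches in least_sq matters because a comparison involving NULL is neither true nor false, so the first two branches are skipped whenever either argument is NULL. The Python sketch below models that evaluation order under the assumption of standard SQL three-valued logic; None stands in for NULL.

```python
from typing import Optional


def least_sq(a: Optional[float], b: Optional[float]) -> Optional[float]:
    """Python model of the least_sq UDF above, branch by branch."""
    both_known = a is not None and b is not None
    if both_known and a <= b:   # when a <= b then a
        return a
    if both_known and b < a:    # when b < a then b
        return b
    if a is None:               # when a is null then b
        return b
    if b is None:               # when b is null then a
        return a
    return None                 # else null


results = [least_sq(1.0, 2.0), least_sq(5.0, 3.0), least_sq(None, 3.0), least_sq(None, None)]
print(results)
```

With this branch order, a NULL argument is ignored when the other argument has a value, and the result is NULL only when both arguments are NULL.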

3.2.3.1.6.606 Usage Notes

The following usage notes apply when using simple scalar SQL UDFs:
• Currently, the SQL embedded in the function body must be of the type SELECT expr;. Creating a UDF with invalid SQL, or with valid SQL of any other type, results in an error.
• As with Python UDFs, the argument list can be left empty.
• SQL UDFs can reference other UDFs, including Python UDFs.

NOTICE: A function cannot (directly or indirectly) reference itself (such as by referencing another function that references it).

Because SQL UDFs are one type of supported UDF, the following Python UDF characteristics apply:
• UDF permission rules - see Access Control.
• The get_function_ddl utility function works on these functions - see Getting the DDL for a Function.
• SQL UDFs should appear in the catalog with Python UDFs - see Finding Existing UDFs in the Catalog.

3.2.3.1.6.607 Restrictions

The following restrictions apply to simple scalar SQL UDFs:
• Simple scalar SQL UDFs cannot currently reference other UDFs.
• As with Python UDFs, SQream DB does not support overloading.

3.2.3.1.6.608 Aggregate Functions

3.2.3.1.6.609 Overview

Aggregate functions perform calculations based on a set of values and return a single value. Most aggregate functions ignore null values. Aggregate functions are often used with the GROUP BY clause of the SELECT statement.

3.2.3.1.6.610 Available Aggregate Functions

The following list shows the available aggregate functions:

3.2.3.1.6.611 AVG

Returns the average of numeric values.

3.2.3.1.6.612 Syntax

-- As an aggregate
AVG( expr )

-- As a window function
AVG ( expr ) OVER (
    [ PARTITION BY value_expression [, ...] ]
    [ ORDER BY value_expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] ]
    [ frame_clause ]
)

3.2.3.1.6.613 Arguments

Parameter | Description
----------+-------------------
expr      | Numeric expression

3.2.3.1.6.614 Returns

Return type is dependent on the argument.
• For TINYINT, SMALLINT and INT, the return type is INT.
• For BIGINT, the return type is BIGINT.
• For REAL, the return type is REAL.
• For DOUBLE, the return type is DOUBLE.

3.2.3.1.6.615 Notes

• NULL values are ignored.
• AVG relies on calculating the sum, which can very quickly overflow on large data sets. If the sum is over 2^31, for example, up-cast to a larger type like BIGINT: AVG(expr :: BIGINT)
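The overflow risk described in the notes above comes down to simple arithmetic. The Python sketch below uses made-up salary values to show a sum that no longer fits a 32-bit INT but fits comfortably in a 64-bit BIGINT. It does not model what SQream DB does when an overflow occurs, since that behavior is not stated here.

```python
INT_MAX = 2**31 - 1      # largest value a signed 32-bit INT can hold
BIGINT_MAX = 2**63 - 1   # largest value a signed 64-bit BIGINT can hold

# Hypothetical data: 300 rows, each with a value of 10,000,000.
salaries = [10_000_000] * 300

total = sum(salaries)                  # the intermediate sum AVG has to compute
fits_in_int = total <= INT_MAX
fits_in_bigint = total <= BIGINT_MAX
average = total // len(salaries)       # the average itself is small

print(total, fits_in_int, fits_in_bigint, average)
```

The average is well within INT range even though the intermediate sum is not, which is why the cast is applied to the argument rather than to the result.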

3.2.3.1.6.616 Examples

For these examples, assume a table named nba, with the following structure:

CREATE TABLE nba
(
   "Name" varchar(40),
   "Team" varchar(40),
   "Number" tinyint,
   "Position" varchar(2),
   "Age" tinyint,
   "Height" varchar(4),
   "Weight" real,
   "College" varchar(40),
   "Salary" float
);

Here’s a peek at the table contents (Download nba.csv):

Table 35: nba.csv

Name          | Team           | Number | Position | Age  | Height | Weight | College           | Salary
--------------+----------------+--------+----------+------+--------+--------+-------------------+-----------
Avery Bradley | Boston Celtics | 0.0    | PG       | 25.0 | 6-2    | 180.0  | Texas             | 7730337.0
Jae Crowder   | Boston Celtics | 99.0   | SF       | 25.0 | 6-6    | 235.0  | Marquette         | 6796117.0
John Holland  | Boston Celtics | 30.0   | SG       | 27.0 | 6-5    | 205.0  | Boston University |
R.J. Hunter   | Boston Celtics | 28.0   | SG       | 22.0 | 6-5    | 185.0  | Georgia State     | 1148640.0
Jonas Jerebko | Boston Celtics | 8.0    | PF       | 29.0 | 6-10   | 231.0  |                   | 5000000.0
Amir Johnson  | Boston Celtics | 90.0   | PF       | 29.0 | 6-9    | 240.0  |                   | 12000000.0
Jordan Mickey | Boston Celtics | 55.0   | PF       | 21.0 | 6-8    | 235.0  | LSU               | 1170960.0
Kelly Olynyk  | Boston Celtics | 41.0   | C        | 25.0 | 7-0    | 238.0  | Gonzaga           | 2165160.0
Terry Rozier  | Boston Celtics | 12.0   | PG       | 22.0 | 6-2    | 190.0  | Louisville        | 1824360.0

3.2.3.1.6.617 Simple average

t=> SELECT AVG("Age") FROM nba;
avg
---
26

Note: For an integer argument the result is also an integer (see Returns), so the fractional part is lost. To get a fractional result, cast the argument:

t=> SELECT AVG("Age" :: REAL) FROM nba;
avg
-------
26.9387
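The effect of the cast can be reproduced with a few values. The Python sketch below contrasts an integer average with a fractional one, using five ages from the table peek above as sample data. It assumes the integer result simply drops the fractional part, which is consistent with the 26 versus 26.9387 results shown, but is not stated explicitly on this page.

```python
# Sample data: the first five ages from the table peek above.
ages = [25, 25, 27, 22, 29]

integer_average = sum(ages) // len(ages)     # integer arithmetic drops the fraction
fractional_average = sum(ages) / len(ages)   # floating-point arithmetic keeps it

print(integer_average, fractional_average)
```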

3.2.3.1.6.618 Combine AVG with other aggregates

t=> SELECT "Age", AVG("Salary") as "Average salary", COUNT(*) as "Number of players" FROM nba GROUP BY 1;
Age | Average salary | Number of players
----+----------------+------------------
19  | 1930440        | 2
20  | 2725790        | 19
21  | 2067379        | 19
22  | 2357963        | 26
23  | 2034746        | 41
24  | 3785300        | 47
25  | 3930867        | 45
26  | 6866566        | 36
27  | 6676741        | 41
28  | 5110188        | 31
29  | 6224177        | 28
30  | 7061858        | 31
31  | 8511396        | 22
32  | 7716958        | 13
33  | 3930739        | 14
34  | 7606030        | 10
35  | 3461739        | 9
36  | 2238119        | 10
37  | 12777778       | 4
38  | 1840041        | 4
39  | 2517872        | 2
40  | 4666916        | 3

3.2.3.1.6.619 CORR

Returns the Pearson correlation coefficient of value pairs.

3.2.3.1.6.620 Syntax

-- As an aggregate
CORR( expr1, expr2 )

-- As a window function
CORR ( expr1, expr2 ) OVER (
    [ PARTITION BY partition_expr [, ...] ]
    [ ORDER BY order [ ASC | DESC ] [, ...] ]
)

3.2.3.1.6.621 Arguments

Parameter    | Description
-------------+-------------------
expr1, expr2 | Numeric expression

3.2.3.1.6.622 Returns

Returns the Pearson correlation coefficient with type DOUBLE.

3.2.3.1.6.623 Notes

• When all rows contain NULL values, the function returns NULL.
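For readers who want to check a CORR result by hand, the Pearson coefficient is the covariance of the two columns divided by the product of their standard deviations. The Python sketch below computes it from scratch. It is an illustration, not SQream DB's implementation: the function name corr is hypothetical, None stands in for NULL, and skipping pairs in which either value is NULL is an assumption based on common SQL behavior (this page only states the all-NULL case).

```python
import math
from typing import Optional, Sequence, Tuple

Pair = Tuple[Optional[float], Optional[float]]


def corr(pairs: Sequence[Pair]) -> Optional[float]:
    """Hypothetical model of CORR over (expr1, expr2) value pairs."""
    # Assumption: a pair is used only if both values are non-NULL.
    complete = [(x, y) for x, y in pairs if x is not None and y is not None]
    n = len(complete)
    if n == 0:
        return None  # all rows NULL: result is NULL
    mean_x = sum(x for x, _ in complete) / n
    mean_y = sum(y for _, y in complete) / n
    co_moment = sum((x - mean_x) * (y - mean_y) for x, y in complete)
    spread_x = sum((x - mean_x) ** 2 for x, _ in complete)
    spread_y = sum((y - mean_y) ** 2 for _, y in complete)
    if spread_x == 0 or spread_y == 0:
        return None  # the coefficient is undefined when one column is constant
    return co_moment / math.sqrt(spread_x * spread_y)


rising = corr([(1, 2), (2, 4), (3, 6)])
falling = corr([(1, 3), (2, 2), (3, 1)])
print(rising, falling)
```

A result close to 1 means the two columns rise together, a result close to -1 means one falls as the other rises, and a result near 0 means there is little linear relationship.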

3.2.3.1.6.624 Examples

For these examples, assume a table named nba, with the following structure:

CREATE TABLE nba
(
   "Name" varchar(40),
   "Team" varchar(40),
   "Number" tinyint,
   "Position" varchar(2),
   "Age" tinyint,
   "Height" varchar(4),
   "Weight" real,
   "College" varchar(40),
   "Salary" float
);

Here’s a peek at the table contents (Download nba.csv):

Table 36: nba.csv

Name          | Team           | Number | Position | Age  | Height | Weight | College           | Salary
--------------+----------------+--------+----------+------+--------+--------+-------------------+-----------
Avery Bradley | Boston Celtics | 0.0    | PG       | 25.0 | 6-2    | 180.0  | Texas             | 7730337.0
Jae Crowder   | Boston Celtics | 99.0   | SF       | 25.0 | 6-6    | 235.0  | Marquette         | 6796117.0
John Holland  | Boston Celtics | 30.0   | SG       | 27.0 | 6-5    | 205.0  | Boston University |
R.J. Hunter   | Boston Celtics | 28.0   | SG       | 22.0 | 6-5    | 185.0  | Georgia State     | 1148640.0
Jonas Jerebko | Boston Celtics | 8.0    | PF       | 29.0 | 6-10   | 231.0  |                   | 5000000.0
Amir Johnson  | Boston Celtics | 90.0   | PF       | 29.0 | 6-9    | 240.0  |                   | 12000000.0
Jordan Mickey | Boston Celtics | 55.0   | PF       | 21.0 | 6-8    | 235.0  | LSU               | 1170960.0
Kelly Olynyk  | Boston Celtics | 41.0   | C        | 25.0 | 7-0    | 238.0  | Gonzaga           | 2165160.0
Terry Rozier  | Boston Celtics | 12.0   | PG       | 22.0 | 6-2    | 190.0  | Louisville        | 1824360.0

3.2.3.1.6.625 Simple correlation

t=> SELECT "Team", CORR("Age","Salary") FROM nba GROUP BY 1 ORDER BY 2 ASC;
Team                   | corr
-----------------------+--------
Cleveland Cavaliers    | -0.3219
San Antonio Spurs      | -0.2015
Oklahoma City Thunder  | -0.1236
Detroit Pistons        | -0.0678
New Orleans Pelicans   | -0.0459
Los Angeles Clippers   | -0.0279
Utah Jazz              | 0.0913
Washington Wizards     | 0.1217
Dallas Mavericks       | 0.1388
Sacramento Kings       | 0.1489
Milwaukee Bucks        | 0.1626
Golden State Warriors  | 0.1648
Minnesota Timberwolves | 0.1909
Denver Nuggets         | 0.2035
Houston Rockets        | 0.2051
Philadelphia 76ers     | 0.2645
Chicago Bulls          | 0.2663
Phoenix Suns           | 0.2808
Orlando Magic          | 0.2878
Toronto Raptors        | 0.2916
Memphis Grizzlies      | 0.3225
Miami Heat             | 0.3635
Charlotte Hornets      | 0.3779
Brooklyn Nets          | 0.4084
Indiana Pacers         | 0.4261
Atlanta Hawks          | 0.4321
New York Knicks        | 0.4401
Los Angeles Lakers     | 0.4563
Portland Trail Blazers | 0.4856
Boston Celtics         | 0.6904

3.2.3.1.6.626 COUNT

Returns the number of values, or the number of distinct values when DISTINCT is specified.

3.2.3.1.6.627 Syntax

-- As an aggregate
COUNT( { [ DISTINCT ] expr | * } ) --> BIGINT

-- As a window function
COUNT ( { [ DISTINCT ] expr | * } ) OVER (
    [ PARTITION BY value_expression [, ...] ]
    [ ORDER BY value_expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] ]
    [ frame_clause ]
)

3.2.3.1.6.628 Arguments

Parameter | Description
----------+----------------------------------------------------------------
expr      | Any value expression
*         | Specifies that COUNT should count all rows. It does not use information about any particular column, and preserves duplicate rows and NULL values.
DISTINCT  | Specifies that the operation should operate only on unique values

3.2.3.1.6.629 Returns

• Count returns BIGINT.

3.2.3.1.6.630 Notes

• NULL values are not ignored by COUNT.
• When all rows contain NULL values, the function returns NULL.

3.2.3.1.6.631 Examples

For these examples, assume a table named nba, with the following structure:

CREATE TABLE nba
(
   "Name" varchar(40),
   "Team" varchar(40),
   "Number" tinyint,
   "Position" varchar(2),
   "Age" tinyint,
   "Height" varchar(4),
   "Weight" real,
   "College" varchar(40),
   "Salary" float
);

Here’s a peek at the table contents (Download nba.csv):

Table 37: nba.csv

Name          | Team           | Number | Position | Age  | Height | Weight | College           | Salary
--------------+----------------+--------+----------+------+--------+--------+-------------------+-----------
Avery Bradley | Boston Celtics | 0.0    | PG       | 25.0 | 6-2    | 180.0  | Texas             | 7730337.0
Jae Crowder   | Boston Celtics | 99.0   | SF       | 25.0 | 6-6    | 235.0  | Marquette         | 6796117.0
John Holland  | Boston Celtics | 30.0   | SG       | 27.0 | 6-5    | 205.0  | Boston University |
R.J. Hunter   | Boston Celtics | 28.0   | SG       | 22.0 | 6-5    | 185.0  | Georgia State     | 1148640.0
Jonas Jerebko | Boston Celtics | 8.0    | PF       | 29.0 | 6-10   | 231.0  |                   | 5000000.0
Amir Johnson  | Boston Celtics | 90.0   | PF       | 29.0 | 6-9    | 240.0  |                   | 12000000.0
Jordan Mickey | Boston Celtics | 55.0   | PF       | 21.0 | 6-8    | 235.0  | LSU               | 1170960.0
Kelly Olynyk  | Boston Celtics | 41.0   | C        | 25.0 | 7-0    | 238.0  | Gonzaga           | 2165160.0
Terry Rozier  | Boston Celtics | 12.0   | PG       | 22.0 | 6-2    | 190.0  | Louisville        | 1824360.0

3.2.3.1.6.632 Count rows in a table

t=> SELECT COUNT(*) FROM nba;
count
-----
457

3.2.3.1.6.633 Count distinct values in a table

These two forms are equivalent:

t=> SELECT COUNT(distinct "Age") FROM nba;
count
-----
22

t=> SELECT COUNT(*) FROM (SELECT "Age" FROM nba GROUP BY 1);
count
-----
22
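The equivalence of the two forms can be illustrated with plain Python collections: counting distinct values directly, or grouping first and then counting the groups, gives the same number. The sample ages below come from the table peek above; the sketch is an illustration of the idea, not of how SQream DB executes either query.

```python
# Sample data: the nine ages from the table peek above.
ages = [25, 25, 27, 22, 29, 29, 21, 25, 22]

# COUNT(DISTINCT "Age"): count the unique values directly.
count_distinct = len(set(ages))

# SELECT COUNT(*) FROM (SELECT "Age" ... GROUP BY 1): build one group per value, then count groups.
groups = {}
for age in ages:
    groups[age] = groups.get(age, 0) + 1
count_groups = len(groups)

print(count_distinct, count_groups)
```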

3.2.3.1.6.634 Combine COUNT with other aggregates

t=> SELECT "Age", AVG("Salary") as "Average salary", COUNT(*) as "Number of players" FROM nba GROUP BY 1;
Age | Average salary | Number of players
----+----------------+------------------
19  | 1930440        | 2
20  | 2725790        | 19
21  | 2067379        | 19
22  | 2357963        | 26
23  | 2034746        | 41
24  | 3785300        | 47
25  | 3930867        | 45
26  | 6866566        | 36
27  | 6676741        | 41
28  | 5110188        | 31
29  | 6224177        | 28
30  | 7061858        | 31
31  | 8511396        | 22
32  | 7716958        | 13
33  | 3930739        | 14
34  | 7606030        | 10
35  | 3461739        | 9
36  | 2238119        | 10
37  | 12777778       | 4
38  | 1840041        | 4
39  | 2517872        | 2
40  | 4666916        | 3

3.2.3.1.6.635 COVAR_POP

Returns the population covariance of value pairs. See also: COVAR_SAMP

3.2.3.1.6.636 Syntax

-- As an aggregate
COVAR_POP( expr1, expr2 )

-- As a window function
COVAR_POP ( expr1, expr2 ) OVER (
    [ PARTITION BY value_expression [, ...] ]
    [ ORDER BY value_expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] ]
    [ frame_clause ]
)

3.2.3.1.6.637 Arguments

Parameter    | Description
-------------+-------------------
expr1, expr2 | Numeric expression

3.2.3.1.6.638 Returns

Returns the population covariance with type DOUBLE.

3.2.3.1.6.639 Notes

• When all rows contain NULL values, the function returns NULL.

3.2.3.1.6.640 Examples

For these examples, assume a table named nba, with the following structure:

CREATE TABLE nba
(
   "Name" varchar(40),
   "Team" varchar(40),
   "Number" tinyint,
   "Position" varchar(2),
   "Age" tinyint,
   "Height" varchar(4),
   "Weight" real,
   "College" varchar(40),
   "Salary" float
);

Here’s a peek at the table contents (Download nba.csv):

Table 38: nba.csv

Name          | Team           | Number | Position | Age  | Height | Weight | College           | Salary
--------------+----------------+--------+----------+------+--------+--------+-------------------+-----------
Avery Bradley | Boston Celtics | 0.0    | PG       | 25.0 | 6-2    | 180.0  | Texas             | 7730337.0
Jae Crowder   | Boston Celtics | 99.0   | SF       | 25.0 | 6-6    | 235.0  | Marquette         | 6796117.0
John Holland  | Boston Celtics | 30.0   | SG       | 27.0 | 6-5    | 205.0  | Boston University |
R.J. Hunter   | Boston Celtics | 28.0   | SG       | 22.0 | 6-5    | 185.0  | Georgia State     | 1148640.0
Jonas Jerebko | Boston Celtics | 8.0    | PF       | 29.0 | 6-10   | 231.0  |                   | 5000000.0
Amir Johnson  | Boston Celtics | 90.0   | PF       | 29.0 | 6-9    | 240.0  |                   | 12000000.0
Jordan Mickey | Boston Celtics | 55.0   | PF       | 21.0 | 6-8    | 235.0  | LSU               | 1170960.0
Kelly Olynyk  | Boston Celtics | 41.0   | C        | 25.0 | 7-0    | 238.0  | Gonzaga           | 2165160.0
Terry Rozier  | Boston Celtics | 12.0   | PG       | 22.0 | 6-2    | 190.0  | Louisville        | 1824360.0

3.2.3.1.6.641 Simple covariance

t=> SELECT "Team", COVAR_POP("Age","Salary") FROM nba GROUP BY 1 ORDER BY 2 ASC;
Team                   | covar_pop
-----------------------+---------------
Cleveland Cavaliers    | -9192828.8163
San Antonio Spurs      | -6938848.9867
Oklahoma City Thunder  | -3696440.1244
Detroit Pistons        | -1313564.5067
Los Angeles Clippers   | -968077.3778
New Orleans Pelicans   | -440360.5374
Utah Jazz              | 930824.7689
Philadelphia 76ers     | 1458110.3265
Sacramento Kings       | 2306106.08
Dallas Mavericks       | 2418733.4356
Washington Wizards     | 2427928.8978
Milwaukee Bucks        | 2616404.8555
Orlando Magic          | 2812867.8673
Golden State Warriors  | 3352356.3333
Portland Trail Blazers | 3941655.4533
Denver Nuggets         | 3966387.1122
Minnesota Timberwolves | 4492620.0237
Toronto Raptors        | 4524417.1244
Charlotte Hornets      | 5056864.8
Houston Rockets        | 5309246.2089
Phoenix Suns           | 5580976.6889
Indiana Pacers         | 5757986.9067
Boston Celtics         | 5797738.7245
Brooklyn Nets          | 6119732.0667
Chicago Bulls          | 6506357.92
Atlanta Hawks          | 8859452.0667
Memphis Grizzlies      | 9524269
New York Knicks        | 10264800.6875
Miami Heat             | 13009610.4734
Los Angeles Lakers     | 15400203.6533

3.2.3.1.6.642 COVAR_SAMP

Returns the sample covariance of value pairs. See also: COVAR_POP

3.2.3.1.6.643 Syntax

-- As an aggregate
COVAR_SAMP( expr1, expr2 )

-- As a window function
COVAR_SAMP ( expr1, expr2 ) OVER (
    [ PARTITION BY value_expression [, ...] ]
    [ ORDER BY value_expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] ]
    [ frame_clause ]
)

3.2.3.1.6.644 Arguments

Parameter    | Description
-------------+-------------------
expr1, expr2 | Numeric expression

3.2.3.1.6.645 Returns

Returns the sample covariance with type DOUBLE.

3.2.3.1.6.646 Notes

• When all rows contain NULL values, the function returns NULL.
• The function also returns NULL when only one value is non-NULL.
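The difference between COVAR_POP and COVAR_SAMP is only the divisor: the population form divides by the number of pairs n, the sample form by n - 1, which is why a single pair yields NULL for the sample form. The Python sketch below shows both; the function name covar is hypothetical, None stands in for NULL, and skipping pairs in which either value is NULL is an assumption based on common SQL behavior.

```python
from typing import Optional, Sequence, Tuple

Pair = Tuple[Optional[float], Optional[float]]


def covar(pairs: Sequence[Pair], sample: bool) -> Optional[float]:
    """Hypothetical model of COVAR_SAMP (sample=True) and COVAR_POP (sample=False)."""
    # Assumption: a pair is used only if both values are non-NULL.
    complete = [(x, y) for x, y in pairs if x is not None and y is not None]
    n = len(complete)
    if n == 0 or (sample and n == 1):
        return None  # NULL: nothing to aggregate, or n - 1 would be zero
    mean_x = sum(x for x, _ in complete) / n
    mean_y = sum(y for _, y in complete) / n
    co_moment = sum((x - mean_x) * (y - mean_y) for x, y in complete)
    return co_moment / (n - 1 if sample else n)


data = [(1, 2), (2, 4), (3, 6)]
population = covar(data, sample=False)
sample_value = covar(data, sample=True)
print(population, sample_value)
```

For the same group, the sample covariance equals the population covariance multiplied by n / (n - 1). The example outputs for these two functions are consistent with this: the Cleveland Cavaliers values, -9192828.8163 and -9899969.4945, differ by a factor of 14/13.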

3.2.3.1.6.647 Examples

For these examples, assume a table named nba, with the following structure:

CREATE TABLE nba
(
   "Name" varchar(40),
   "Team" varchar(40),
   "Number" tinyint,
   "Position" varchar(2),
   "Age" tinyint,
   "Height" varchar(4),
   "Weight" real,
   "College" varchar(40),
   "Salary" float
);

Here’s a peek at the table contents (Download nba.csv):

Table 39: nba.csv

Name          | Team           | Number | Position | Age  | Height | Weight | College           | Salary
--------------+----------------+--------+----------+------+--------+--------+-------------------+-----------
Avery Bradley | Boston Celtics | 0.0    | PG       | 25.0 | 6-2    | 180.0  | Texas             | 7730337.0
Jae Crowder   | Boston Celtics | 99.0   | SF       | 25.0 | 6-6    | 235.0  | Marquette         | 6796117.0
John Holland  | Boston Celtics | 30.0   | SG       | 27.0 | 6-5    | 205.0  | Boston University |
R.J. Hunter   | Boston Celtics | 28.0   | SG       | 22.0 | 6-5    | 185.0  | Georgia State     | 1148640.0
Jonas Jerebko | Boston Celtics | 8.0    | PF       | 29.0 | 6-10   | 231.0  |                   | 5000000.0
Amir Johnson  | Boston Celtics | 90.0   | PF       | 29.0 | 6-9    | 240.0  |                   | 12000000.0
Jordan Mickey | Boston Celtics | 55.0   | PF       | 21.0 | 6-8    | 235.0  | LSU               | 1170960.0
Kelly Olynyk  | Boston Celtics | 41.0   | C        | 25.0 | 7-0    | 238.0  | Gonzaga           | 2165160.0
Terry Rozier  | Boston Celtics | 12.0   | PG       | 22.0 | 6-2    | 190.0  | Louisville        | 1824360.0

3.2.3.1.6.648 Simple covariance

t=> SELECT "Team", COVAR_SAMP("Age","Salary") FROM nba GROUP BY 1 ORDER BY 2 ASC;
Team                   | covar_samp
-----------------------+---------------
Cleveland Cavaliers    | -9899969.4945
San Antonio Spurs      | -7434481.0571
Oklahoma City Thunder  | -3960471.5619
Detroit Pistons        | -1407390.5429
Los Angeles Clippers   | -1037225.7619
New Orleans Pelicans   | -464825.0117
Utah Jazz              | 997312.2524
Philadelphia 76ers     | 1570272.6593
Sacramento Kings       | 2470827.9429
Dallas Mavericks       | 2591500.1095
Washington Wizards     | 2601352.3905
Milwaukee Bucks        | 2790831.8458
Orlando Magic          | 3029242.3187
Golden State Warriors  | 3591810.3571
Portland Trail Blazers | 4223202.2714
Denver Nuggets         | 4271493.8132
Toronto Raptors        | 4847589.7762
Minnesota Timberwolves | 4867005.0256
Charlotte Hornets      | 5418069.4286
Houston Rockets        | 5688478.081
Phoenix Suns           | 5979617.881
Indiana Pacers         | 6169271.6857
Boston Celtics         | 6243718.6264
Brooklyn Nets          | 6556855.7857
Chicago Bulls          | 6971097.7714
Atlanta Hawks          | 9492270.0714
Memphis Grizzlies      | 10256905.0769
New York Knicks        | 10949120.7333
Miami Heat             | 14093744.6795
Los Angeles Lakers     | 16500218.2

3.2.3.1.6.649 MAX

Returns the maximum value.

3.2.3.1.6.650 Syntax

-- As an aggregate
MAX( expr )

-- As a window function
MAX ( expr ) OVER (
   [ PARTITION BY value_expression [, ...] ]
   [ ORDER BY value_expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] ]
   [ frame_clause ]
)

3.2.3.1.6.651 Arguments

Parameter | Description
----------+------------------
expr      | Value expression

3.2.3.1.6.652 Returns

The return type depends on the argument type.


3.2.3.1.6.653 Notes

• NULL values are ignored
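As a rough illustration of how NULLs are skipped by MAX and MIN, here is a small Python sketch. It is not SQream DB code: the helper names sql_max and sql_min are hypothetical, None stands in for NULL, and returning None when no non-NULL value exists is an assumption based on common SQL behaviour. Python compares strings by code point, which may differ from the database's text ordering.

```python
def sql_max(values):
    """Largest non-NULL value, or None when there is no non-NULL value."""
    present = [v for v in values if v is not None]
    return max(present) if present else None


def sql_min(values):
    """Smallest non-NULL value, or None when there is no non-NULL value."""
    present = [v for v in values if v is not None]
    return min(present) if present else None


ages = [25, None, 40, 19]        # hypothetical column with one NULL
print(sql_min(ages), sql_max(ages))  # 19 40

names = ["Zaza Pachulia", None, "Aaron Brooks"]
print(sql_min(names), "|", sql_max(names))  # Aaron Brooks | Zaza Pachulia
```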

3.2.3.1.6.654 Examples

For these examples, assume a table named nba, with the following structure:

CREATE TABLE nba
(
   "Name" varchar(40),
   "Team" varchar(40),
   "Number" tinyint,
   "Position" varchar(2),
   "Age" tinyint,
   "Height" varchar(4),
   "Weight" real,
   "College" varchar(40),
   "Salary" float
);

Here’s a peek at the table contents (Download nba.csv):

Table 40: nba.csv

Name          | Team           | Number | Position | Age  | Height | Weight | College           | Salary
--------------+----------------+--------+----------+------+--------+--------+-------------------+-----------
Avery Bradley | Boston Celtics | 0.0    | PG       | 25.0 | 6-2    | 180.0  | Texas             | 7730337.0
Jae Crowder   | Boston Celtics | 99.0   | SF       | 25.0 | 6-6    | 235.0  | Marquette         | 6796117.0
John Holland  | Boston Celtics | 30.0   | SG       | 27.0 | 6-5    | 205.0  | Boston University |
R.J. Hunter   | Boston Celtics | 28.0   | SG       | 22.0 | 6-5    | 185.0  | Georgia State     | 1148640.0
Jonas Jerebko | Boston Celtics | 8.0    | PF       | 29.0 | 6-10   | 231.0  |                   | 5000000.0
Amir Johnson  | Boston Celtics | 90.0   | PF       | 29.0 | 6-9    | 240.0  |                   | 12000000.0
Jordan Mickey | Boston Celtics | 55.0   | PF       | 21.0 | 6-8    | 235.0  | LSU               | 1170960.0
Kelly Olynyk  | Boston Celtics | 41.0   | C        | 25.0 | 7-0    | 238.0  | Gonzaga           | 2165160.0
Terry Rozier  | Boston Celtics | 12.0   | PG       | 22.0 | 6-2    | 190.0  | Louisville        | 1824360.0

3.2.3.1.6.655 Simple Minimum, Maximum on numeric columns

t=> SELECT MIN("Age"), MAX("Age") FROM nba;
min | max
----+----
19  | 40

3.2.3.1.6.656 Minimum and maximum on text columns

t=> SELECT MIN("Name"), MAX("Name") FROM nba;
min          | max
-------------+--------------
Aaron Brooks | Zaza Pachulia

3.2.3.1.6.657 Combine MAX with GROUP BY

t=> SELECT "Team", MAX("Salary") FROM nba GROUP BY 1 ORDER BY 2 DESC LIMIT 5;
Team                | max
--------------------+----------
Los Angeles Lakers  | 25000000
Cleveland Cavaliers | 22970500
New York Knicks     | 22875000
Houston Rockets     | 22359364
Miami Heat          | 22192730

3.2.3.1.6.658 MIN

Returns the minimum value.

3.2.3.1.6.659 Syntax

-- As an aggregate
MIN( expr )

-- As a window function
MIN ( expr ) OVER (
   [ PARTITION BY value_expression [, ...] ]
   [ ORDER BY value_expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] ]
   [ frame_clause ]
)

3.2.3.1.6.660 Arguments

Parameter | Description
----------+------------------
expr      | Value expression


3.2.3.1.6.661 Returns

The return type depends on the argument type.

3.2.3.1.6.662 Notes

• NULL values are ignored

3.2.3.1.6.663 Examples

For these examples, assume a table named nba, with the following structure:

CREATE TABLE nba
(
   "Name" varchar(40),
   "Team" varchar(40),
   "Number" tinyint,
   "Position" varchar(2),
   "Age" tinyint,
   "Height" varchar(4),
   "Weight" real,
   "College" varchar(40),
   "Salary" float
);

Here’s a peek at the table contents (Download nba.csv):

Table 41: nba.csv

Name          | Team           | Number | Position | Age  | Height | Weight | College           | Salary
--------------+----------------+--------+----------+------+--------+--------+-------------------+-----------
Avery Bradley | Boston Celtics | 0.0    | PG       | 25.0 | 6-2    | 180.0  | Texas             | 7730337.0
Jae Crowder   | Boston Celtics | 99.0   | SF       | 25.0 | 6-6    | 235.0  | Marquette         | 6796117.0
John Holland  | Boston Celtics | 30.0   | SG       | 27.0 | 6-5    | 205.0  | Boston University |
R.J. Hunter   | Boston Celtics | 28.0   | SG       | 22.0 | 6-5    | 185.0  | Georgia State     | 1148640.0
Jonas Jerebko | Boston Celtics | 8.0    | PF       | 29.0 | 6-10   | 231.0  |                   | 5000000.0
Amir Johnson  | Boston Celtics | 90.0   | PF       | 29.0 | 6-9    | 240.0  |                   | 12000000.0
Jordan Mickey | Boston Celtics | 55.0   | PF       | 21.0 | 6-8    | 235.0  | LSU               | 1170960.0
Kelly Olynyk  | Boston Celtics | 41.0   | C        | 25.0 | 7-0    | 238.0  | Gonzaga           | 2165160.0
Terry Rozier  | Boston Celtics | 12.0   | PG       | 22.0 | 6-2    | 190.0  | Louisville        | 1824360.0

3.2.3.1.6.664 Simple Minimum, Maximum on numeric columns

t=> SELECT MIN("Age"), MAX("Age") FROM nba;
min | max
----+----
19  | 40

3.2.3.1.6.665 Minimum and maximum on text columns

t=> SELECT MIN("Name"), MAX("Name") FROM nba;
min          | max
-------------+--------------
Aaron Brooks | Zaza Pachulia

3.2.3.1.6.666 Combine MIN with GROUP BY

t=> SELECT "Team", MIN("Salary") FROM nba GROUP BY 1 ORDER BY 2 DESC LIMIT 5;
Team                   | min
-----------------------+---------
Boston Celtics         | 1148640
Minnesota Timberwolves | 947276
Utah Jazz              | 900000
Orlando Magic          | 845059
Memphis Grizzlies      | 700902

3.2.3.1.6.667 STDDEV_POP

Returns the population standard deviation of values.

Note: STDEVP is an alias of this function, provided for compatibility.

See also: STDDEV_SAMP

3.2.3.1.6.668 Syntax

-- As an aggregate
STDDEV_POP( expr )

STDEVP( expr )

-- As a window function
STDDEV_POP ( expr ) OVER (
   [ PARTITION BY value_expression [, ...] ]
   [ ORDER BY value_expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] ]
   [ frame_clause ]
)


3.2.3.1.6.669 Arguments

Parameter | Description
----------+--------------------
expr      | Numeric expression

3.2.3.1.6.670 Returns

Returns the population standard deviation with type DOUBLE.

3.2.3.1.6.671 Notes

• When all rows contain NULL values, the function returns NULL.

3.2.3.1.6.672 Examples

For these examples, assume a table named nba, with the following structure:

CREATE TABLE nba
(
   "Name" varchar(40),
   "Team" varchar(40),
   "Number" tinyint,
   "Position" varchar(2),
   "Age" tinyint,
   "Height" varchar(4),
   "Weight" real,
   "College" varchar(40),
   "Salary" float
);

Here’s a peek at the table contents (Download nba.csv):

Table 42: nba.csv

Name          | Team           | Number | Position | Age  | Height | Weight | College           | Salary
--------------+----------------+--------+----------+------+--------+--------+-------------------+-----------
Avery Bradley | Boston Celtics | 0.0    | PG       | 25.0 | 6-2    | 180.0  | Texas             | 7730337.0
Jae Crowder   | Boston Celtics | 99.0   | SF       | 25.0 | 6-6    | 235.0  | Marquette         | 6796117.0
John Holland  | Boston Celtics | 30.0   | SG       | 27.0 | 6-5    | 205.0  | Boston University |
R.J. Hunter   | Boston Celtics | 28.0   | SG       | 22.0 | 6-5    | 185.0  | Georgia State     | 1148640.0
Jonas Jerebko | Boston Celtics | 8.0    | PF       | 29.0 | 6-10   | 231.0  |                   | 5000000.0
Amir Johnson  | Boston Celtics | 90.0   | PF       | 29.0 | 6-9    | 240.0  |                   | 12000000.0
Jordan Mickey | Boston Celtics | 55.0   | PF       | 21.0 | 6-8    | 235.0  | LSU               | 1170960.0
Kelly Olynyk  | Boston Celtics | 41.0   | C        | 25.0 | 7-0    | 238.0  | Gonzaga           | 2165160.0
Terry Rozier  | Boston Celtics | 12.0   | PG       | 22.0 | 6-2    | 190.0  | Louisville        | 1824360.0

3.2.3.1.6.673 Simple standard deviation

t=> SELECT "Team", STDDEV_POP("Age") FROM nba GROUP BY 1 LIMIT 10;
Team                  | stddev_pop
----------------------+-----------
Atlanta Hawks         | 4.0857
Boston Celtics        | 2.7439
Brooklyn Nets         | 2.9166
Charlotte Hornets     | 3.0521
Chicago Bulls         | 4.0464
Cleveland Cavaliers   | 3.9811
Dallas Mavericks      | 3.5864
Denver Nuggets        | 4.5821
Detroit Pistons       | 4.2926
Golden State Warriors | 3.7178

3.2.3.1.6.674 Combine STDEVP with other aggregates

t=> SELECT "Age", AVG("Salary"), STDDEV_SAMP("Salary"), STDEVP("Salary") FROM nba GROUP BY 1;
Age | avg      | stddev_samp  | stddev_pop
----+----------+--------------+--------------
19  | 1930440  | 279165.7572  | 197400
20  | 2725790  | 1510913.4308 | 1470615.1437
21  | 2067379  | 1412350.3607 | 1374680.8959
22  | 2357963  | 1517378.326  | 1487911.8642
23  | 2034746  | 2728292.0728 | 2693086.8292
24  | 3785300  | 4803383.8083 | 4749713.0309
25  | 3930867  | 4558462.9038 | 4506364.4739
26  | 6866566  | 6100470.7879 | 6015145.316
27  | 6676741  | 6831984.073  | 6746043.7454
28  | 5110188  | 4316626.5733 | 4244073.0603
29  | 6224177  | 4870705.8411 | 4779656.5821
30  | 7061858  | 5408669.6434 | 5317761.1581
31  | 8511396  | 7170163.4693 | 7005310.0889
32  | 7716958  | 7451335.8768 | 7159011.944
33  | 3930739  | 4354293.0145 | 4195901.738
34  | 7606030  | 5653035.0228 | 5362939.9094
35  | 3461739  | 2364690.175  | 2211965.1152
36  | 2238119  | 1550061.2451 | 1470517.2142
37  | 12777778 | 10715167.374 | 8748897.5249
38  | 1840041  | 1496660.6556 | 1296146.1486
39  | 2517872  | 2220522.4752 | 1570146.5
40  | 4666916  | 4155420.6792 | 3392886.7769

3.2.3.1.6.675 STDDEV_SAMP

Returns the sample standard deviation of values.

Note: STDDEV and STDEV are aliases of this function, provided for compatibility.

See also: STDDEV_POP

3.2.3.1.6.676 Syntax

-- As an aggregate
STDDEV_SAMP( expr )

STDDEV( expr )

STDEV( expr )

-- As a window function
STDDEV_SAMP ( expr ) OVER (
   [ PARTITION BY value_expression [, ...] ]
   [ ORDER BY value_expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] ]
   [ frame_clause ]
)

3.2.3.1.6.677 Arguments

Parameter | Description
----------+--------------------
expr      | Numeric expression


3.2.3.1.6.678 Returns

Returns the sample standard deviation with type DOUBLE.

3.2.3.1.6.679 Notes

• When all rows contain NULL values, the function returns NULL.
• The function also returns NULL when only one value is non-NULL.
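The difference between the sample and population forms is only the divisor (n - 1 versus n), which is also why a single value has no sample standard deviation. The Python sketch below uses the standard statistics module, not SQream DB. The two salary values are an assumption: they are derived from the age-19 row of the "Combine stddev with other aggregates" example in this section (mean 1930440 plus and minus the population deviation 197400, which is exact for a group of two).

```python
import statistics

# Assumed inputs: two salaries reconstructed from the age-19 row
# (avg 1930440, stddev_pop 197400); illustrative only.
salaries = [1733040.0, 2127840.0]

stddev_pop = statistics.pstdev(salaries)   # divides by n
stddev_samp = statistics.stdev(salaries)   # divides by n - 1

print(round(stddev_pop, 4))   # 197400.0
print(round(stddev_samp, 4))  # 279165.7572

# With one value the sample form is undefined; Python raises an error
# where the SQL function is documented to return NULL.
try:
    statistics.stdev([1733040.0])
except statistics.StatisticsError:
    print("undefined for a single value")
```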

3.2.3.1.6.680 Examples

For these examples, assume a table named nba, with the following structure:

CREATE TABLE nba
(
   "Name" varchar(40),
   "Team" varchar(40),
   "Number" tinyint,
   "Position" varchar(2),
   "Age" tinyint,
   "Height" varchar(4),
   "Weight" real,
   "College" varchar(40),
   "Salary" float
);

Here’s a peek at the table contents (Download nba.csv):

Table 43: nba.csv

Name          | Team           | Number | Position | Age  | Height | Weight | College           | Salary
--------------+----------------+--------+----------+------+--------+--------+-------------------+-----------
Avery Bradley | Boston Celtics | 0.0    | PG       | 25.0 | 6-2    | 180.0  | Texas             | 7730337.0
Jae Crowder   | Boston Celtics | 99.0   | SF       | 25.0 | 6-6    | 235.0  | Marquette         | 6796117.0
John Holland  | Boston Celtics | 30.0   | SG       | 27.0 | 6-5    | 205.0  | Boston University |
R.J. Hunter   | Boston Celtics | 28.0   | SG       | 22.0 | 6-5    | 185.0  | Georgia State     | 1148640.0
Jonas Jerebko | Boston Celtics | 8.0    | PF       | 29.0 | 6-10   | 231.0  |                   | 5000000.0
Amir Johnson  | Boston Celtics | 90.0   | PF       | 29.0 | 6-9    | 240.0  |                   | 12000000.0
Jordan Mickey | Boston Celtics | 55.0   | PF       | 21.0 | 6-8    | 235.0  | LSU               | 1170960.0
Kelly Olynyk  | Boston Celtics | 41.0   | C        | 25.0 | 7-0    | 238.0  | Gonzaga           | 2165160.0
Terry Rozier  | Boston Celtics | 12.0   | PG       | 22.0 | 6-2    | 190.0  | Louisville        | 1824360.0

3.2.3.1.6.681 Simple standard deviation

t=> SELECT AVG("Age"), STDDEV_SAMP("Age") FROM nba;
avg | stddev_samp
----+------------
26  | 4.404

3.2.3.1.6.682 Combine stddev with other aggregates

t=> SELECT "Age", AVG("Salary"), STDDEV_SAMP("Salary"), STDDEV_POP("Salary") FROM nba GROUP BY 1;
Age | avg      | stddev_samp  | stddev_pop
----+----------+--------------+--------------
19  | 1930440  | 279165.7572  | 197400
20  | 2725790  | 1510913.4308 | 1470615.1437
21  | 2067379  | 1412350.3607 | 1374680.8959
22  | 2357963  | 1517378.326  | 1487911.8642
23  | 2034746  | 2728292.0728 | 2693086.8292
24  | 3785300  | 4803383.8083 | 4749713.0309
25  | 3930867  | 4558462.9038 | 4506364.4739
26  | 6866566  | 6100470.7879 | 6015145.316
27  | 6676741  | 6831984.073  | 6746043.7454
28  | 5110188  | 4316626.5733 | 4244073.0603
29  | 6224177  | 4870705.8411 | 4779656.5821
30  | 7061858  | 5408669.6434 | 5317761.1581
31  | 8511396  | 7170163.4693 | 7005310.0889
32  | 7716958  | 7451335.8768 | 7159011.944
33  | 3930739  | 4354293.0145 | 4195901.738
34  | 7606030  | 5653035.0228 | 5362939.9094
35  | 3461739  | 2364690.175  | 2211965.1152
36  | 2238119  | 1550061.2451 | 1470517.2142
37  | 12777778 | 10715167.374 | 8748897.5249
38  | 1840041  | 1496660.6556 | 1296146.1486
39  | 2517872  | 2220522.4752 | 1570146.5
40  | 4666916  | 4155420.6792 | 3392886.7769

3.2.3.1.6.683 SUM

Returns the sum of numeric values, or the sum of only the distinct values when DISTINCT is specified.

3.2.3.1.6.684 Syntax

-- As an aggregate
SUM( [ DISTINCT ] expr )

-- As a window function
SUM ( expr ) OVER (
   [ PARTITION BY value_expression [, ...] ]
   [ ORDER BY value_expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] ]
   [ frame_clause ]
)

3.2.3.1.6.685 Arguments

Parameter | Description
----------+------------------------------------------------------------------
expr      | Numeric expression
DISTINCT  | Specifies that the operation should operate only on unique values

3.2.3.1.6.686 Returns

The return type depends on the argument type:
• For TINYINT, SMALLINT and INT, the return type is INT.
• For BIGINT, the return type is BIGINT.
• For REAL, the return type is REAL.
• For DOUBLE, the return type is DOUBLE.

3.2.3.1.6.687 Notes

• NULL values are ignored.
• Because the return type is tied to the argument type, SUM can quickly overflow on large data sets. If the sum may exceed 2^31, for example, up-cast the argument to a larger type such as BIGINT: SUM(expr :: BIGINT)
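The following Python sketch illustrates the notes above: NULLs are skipped, DISTINCT counts each value once, and a sum of INT values can fall outside the 32-bit INT range. It is not SQream DB code. The helper name sql_sum and the sample values are hypothetical, and returning None when every value is NULL is an assumption based on common SQL behaviour.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1  # range of a 32-bit signed INT


def sql_sum(values, distinct=False):
    """Sum the non-NULL values; with distinct=True, count each value once."""
    present = [v for v in values if v is not None]
    if distinct:
        present = set(present)
    return sum(present) if present else None


ages = [25, 25, 27, None, 22]            # hypothetical column values
print(sql_sum(ages))                     # 99
print(sql_sum(ages, distinct=True))      # 74

big = [1_500_000_000, 1_500_000_000]     # each value fits in INT
total = sql_sum(big)
print(INT_MIN <= total <= INT_MAX)       # False: the sum needs BIGINT
```

When a check like the last one can fail for real data, casting the argument to BIGINT before summing, as the note suggests, avoids the overflow.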

3.2.3.1.6.688 Examples

For these examples, assume a table named nba, with the following structure:

CREATE TABLE nba
(
   "Name" varchar(40),
   "Team" varchar(40),
   "Number" tinyint,
   "Position" varchar(2),
   "Age" tinyint,
   "Height" varchar(4),
   "Weight" real,
   "College" varchar(40),
   "Salary" float
);

Here’s a peek at the table contents (Download nba.csv):

Table 44: nba.csv

Name          | Team           | Number | Position | Age  | Height | Weight | College           | Salary
--------------+----------------+--------+----------+------+--------+--------+-------------------+-----------
Avery Bradley | Boston Celtics | 0.0    | PG       | 25.0 | 6-2    | 180.0  | Texas             | 7730337.0
Jae Crowder   | Boston Celtics | 99.0   | SF       | 25.0 | 6-6    | 235.0  | Marquette         | 6796117.0
John Holland  | Boston Celtics | 30.0   | SG       | 27.0 | 6-5    | 205.0  | Boston University |
R.J. Hunter   | Boston Celtics | 28.0   | SG       | 22.0 | 6-5    | 185.0  | Georgia State     | 1148640.0
Jonas Jerebko | Boston Celtics | 8.0    | PF       | 29.0 | 6-10   | 231.0  |                   | 5000000.0
Amir Johnson  | Boston Celtics | 90.0   | PF       | 29.0 | 6-9    | 240.0  |                   | 12000000.0
Jordan Mickey | Boston Celtics | 55.0   | PF       | 21.0 | 6-8    | 235.0  | LSU               | 1170960.0
Kelly Olynyk  | Boston Celtics | 41.0   | C        | 25.0 | 7-0    | 238.0  | Gonzaga           | 2165160.0
Terry Rozier  | Boston Celtics | 12.0   | PG       | 22.0 | 6-2    | 190.0  | Louisville        | 1824360.0

3.2.3.1.6.689 Simple sum

t=> SELECT SUM("Age") FROM nba;
sum
-----
12311

3.2.3.1.6.690 Sum only distinct values

t=> SELECT SUM(DISTINCT "Age") FROM nba;
sum
---
649

3.2.3.1.6.691 Combine sum with GROUP BY

t=> SELECT "Age", SUM("Salary") FROM nba GROUP BY 1;
Age | sum
----+----------
19  | 3860880
20  | 51790026
21  | 39280213
22  | 61307050
23  | 79355103
24  | 170338514
25  | 172958166
26  | 247196385
27  | 267069647
28  | 153305658
29  | 168052779
30  | 211855757
31  | 187250724
32  | 100320456
33  | 55030346
34  | 76060300
35  | 27693918
36  | 22381196
37  | 38333334
38  | 7360164
39  | 5035745
40  | 14000750

3.2.3.1.6.692 VAR_POP

Returns the population variance of values.

Note: VARP is an alias of this function, provided for compatibility.

See also: STDDEV_SAMP

3.2.3.1.6.693 Syntax

-- As an aggregate
VAR_POP( expr )

VARP( expr )

-- As a window function
VAR_POP ( expr ) OVER (
   [ PARTITION BY value_expression [, ...] ]
   [ ORDER BY value_expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] ]
   [ frame_clause ]
)

3.2.3.1.6.694 Arguments

Parameter | Description
----------+--------------------
expr      | Numeric expression


3.2.3.1.6.695 Returns

Returns the population variance with type DOUBLE.

3.2.3.1.6.696 Notes

• When all rows contain NULL values, the function returns NULL.

3.2.3.1.6.697 Examples

For these examples, assume a table named nba, with the following structure:

CREATE TABLE nba
(
   "Name" varchar(40),
   "Team" varchar(40),
   "Number" tinyint,
   "Position" varchar(2),
   "Age" tinyint,
   "Height" varchar(4),
   "Weight" real,
   "College" varchar(40),
   "Salary" float
);

Here’s a peek at the table contents (Download nba.csv):

Table 45: nba.csv

Name          | Team           | Number | Position | Age  | Height | Weight | College           | Salary
--------------+----------------+--------+----------+------+--------+--------+-------------------+-----------
Avery Bradley | Boston Celtics | 0.0    | PG       | 25.0 | 6-2    | 180.0  | Texas             | 7730337.0
Jae Crowder   | Boston Celtics | 99.0   | SF       | 25.0 | 6-6    | 235.0  | Marquette         | 6796117.0
John Holland  | Boston Celtics | 30.0   | SG       | 27.0 | 6-5    | 205.0  | Boston University |
R.J. Hunter   | Boston Celtics | 28.0   | SG       | 22.0 | 6-5    | 185.0  | Georgia State     | 1148640.0
Jonas Jerebko | Boston Celtics | 8.0    | PF       | 29.0 | 6-10   | 231.0  |                   | 5000000.0
Amir Johnson  | Boston Celtics | 90.0   | PF       | 29.0 | 6-9    | 240.0  |                   | 12000000.0
Jordan Mickey | Boston Celtics | 55.0   | PF       | 21.0 | 6-8    | 235.0  | LSU               | 1170960.0
Kelly Olynyk  | Boston Celtics | 41.0   | C        | 25.0 | 7-0    | 238.0  | Gonzaga           | 2165160.0
Terry Rozier  | Boston Celtics | 12.0   | PG       | 22.0 | 6-2    | 190.0  | Louisville        | 1824360.0

3.2.3.1.6.698 Simple variance

t=> SELECT "Team", VAR_POP("Age") FROM nba GROUP BY 1 LIMIT 10;
Team                  | var_pop
----------------------+--------
Atlanta Hawks         | 16.6933
Boston Celtics        | 7.5289
Brooklyn Nets         | 8.5067
Charlotte Hornets     | 9.3156
Chicago Bulls         | 16.3733
Cleveland Cavaliers   | 15.8489
Dallas Mavericks      | 12.8622
Denver Nuggets        | 20.9956
Detroit Pistons       | 18.4267
Golden State Warriors | 13.8222

3.2.3.1.6.699 Combine VARP with other aggregates

t=> SELECT "Age", AVG("Salary"), VARP("Salary"), STDDEV_POP("Salary") FROM nba GROUP BY 1;
Age | avg      | var_pop            | stddev_pop
----+----------+--------------------+--------------
19  | 1930440  | 38966760000        | 197400
20  | 2725790  | 2162708900784.4473 | 1470615.1437
21  | 2067379  | 1889747565514.0222 | 1374680.8959
22  | 2357963  | 2213881715652.018  | 1487911.8642
23  | 2034746  | 7252716669494.947  | 2693086.8292
24  | 3785300  | 22559773876347.457 | 4749713.0309
25  | 3930867  | 20307320771204.332 | 4506364.4739
26  | 6866566  | 36181973172363.8   | 6015145.316
27  | 6676741  | 45509106214871.39  | 6746043.7454
28  | 5110188  | 18012156141081.11  | 4244073.0603
29  | 6224177  | 22845117042669.63  | 4779656.5821
30  | 7061858  | 28278583734766.582 | 5317761.1581
31  | 8511396  | 49074369441838.43  | 7005310.0889
32  | 7716958  | 51251452013710.28  | 7159011.944
33  | 3930739  | 17605591394980.715 | 4195901.738
34  | 7606030  | 28761124471812.6   | 5362939.9094
35  | 3461739  | 4892789670765.1875 | 2211965.1152
36  | 2238119  | 2162420877245.64   | 1470517.2142
37  | 12777778 | 76543207901234.67  | 8748897.5249
38  | 1840041  | 1679994838499      | 1296146.1486
39  | 2517872  | 2465360031462.25   | 1570146.5
40  | 4666916  | 11511680680555.555 | 3392886.7769

3.2.3.1.6.700 VAR_SAMP

Returns the sample variance of values.


Note: VAR and VARIANCE are aliases of this function, provided for compatibility.

See also: VAR_POP

3.2.3.1.6.701 Syntax

-- As an aggregate
VAR_SAMP( expr )

VAR( expr )

VARIANCE( expr )

-- As a window function
VAR_SAMP ( expr ) OVER (
   [ PARTITION BY value_expression [, ...] ]
   [ ORDER BY value_expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] ]
   [ frame_clause ]
)

3.2.3.1.6.702 Arguments

Parameter | Description
----------+--------------------
expr      | Numeric expression

3.2.3.1.6.703 Returns

Returns the sample variance with type DOUBLE.

3.2.3.1.6.704 Notes

• When all rows contain NULL values, the function returns NULL.
• The function also returns NULL when only one value is non-NULL.
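A short Python sketch of the two variance forms and of the NULL rules above follows. It uses the standard statistics module rather than SQream DB, and the helper names var_pop and var_samp are hypothetical. The two salaries are an assumption: they are reconstructed from the age-19 row of the "Combine VAR with other aggregates" example in this section (avg 1930440, var_pop 38966760000, which for a group of two fixes both values).

```python
import statistics


def var_pop(values):
    """Population variance of the non-NULL values (divides by n)."""
    present = [v for v in values if v is not None]
    return statistics.pvariance(present) if present else None


def var_samp(values):
    """Sample variance of the non-NULL values (divides by n - 1)."""
    present = [v for v in values if v is not None]
    # Fewer than two non-NULL values: the result is undefined (NULL).
    return statistics.variance(present) if len(present) >= 2 else None


salaries = [1733040.0, 2127840.0, None]  # assumed values plus one NULL
print(var_pop(salaries))                 # 38966760000.0
print(var_samp(salaries))                # 77933520000.0
print(var_samp([1733040.0, None]))       # None
```

For n non-NULL values the two results differ by the factor n / (n - 1), which is why the sample variance is exactly double the population variance for this group of two.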

3.2.3.1.6.705 Examples

For these examples, assume a table named nba, with the following structure:

CREATE TABLE nba
(
   "Name" varchar(40),
   "Team" varchar(40),
   "Number" tinyint,
   "Position" varchar(2),
   "Age" tinyint,
   "Height" varchar(4),
   "Weight" real,
   "College" varchar(40),
   "Salary" float
);

Here’s a peek at the table contents (Download nba.csv):

Table 46: nba.csv

Name          | Team           | Number | Position | Age  | Height | Weight | College           | Salary
--------------+----------------+--------+----------+------+--------+--------+-------------------+-----------
Avery Bradley | Boston Celtics | 0.0    | PG       | 25.0 | 6-2    | 180.0  | Texas             | 7730337.0
Jae Crowder   | Boston Celtics | 99.0   | SF       | 25.0 | 6-6    | 235.0  | Marquette         | 6796117.0
John Holland  | Boston Celtics | 30.0   | SG       | 27.0 | 6-5    | 205.0  | Boston University |
R.J. Hunter   | Boston Celtics | 28.0   | SG       | 22.0 | 6-5    | 185.0  | Georgia State     | 1148640.0
Jonas Jerebko | Boston Celtics | 8.0    | PF       | 29.0 | 6-10   | 231.0  |                   | 5000000.0
Amir Johnson  | Boston Celtics | 90.0   | PF       | 29.0 | 6-9    | 240.0  |                   | 12000000.0
Jordan Mickey | Boston Celtics | 55.0   | PF       | 21.0 | 6-8    | 235.0  | LSU               | 1170960.0
Kelly Olynyk  | Boston Celtics | 41.0   | C        | 25.0 | 7-0    | 238.0  | Gonzaga           | 2165160.0
Terry Rozier  | Boston Celtics | 12.0   | PG       | 22.0 | 6-2    | 190.0  | Louisville        | 1824360.0

3.2.3.1.6.706 Simple variance

t=> SELECT "Team", VAR("Age") FROM nba GROUP BY 1 LIMIT 10;
Team                  | var_samp
----------------------+---------
Atlanta Hawks         | 17.8857
Boston Celtics        | 8.0667
Brooklyn Nets         | 9.1143
Charlotte Hornets     | 9.981
Chicago Bulls         | 17.5429
Cleveland Cavaliers   | 16.981
Dallas Mavericks      | 13.781
Denver Nuggets        | 22.4952
Detroit Pistons       | 19.7429
Golden State Warriors | 14.8095

3.2.3.1.6.707 Combine VAR with other aggregates

t=> SELECT "Age", AVG("Salary"), VAR("Salary"), VARP("Salary") FROM nba GROUP BY 1;
Age | avg      | var_samp           | var_pop
----+----------+--------------------+--------------------
19  | 1930440  | 77933520000        | 38966760000
20  | 2725790  | 2282859395272.472  | 2162708900784.4473
21  | 2067379  | 1994733541375.9124 | 1889747565514.0222
22  | 2357963  | 2302436984278.0986 | 2213881715652.018
23  | 2034746  | 7443577634481.656  | 7252716669494.947
24  | 3785300  | 23072496009900.81  | 22559773876347.457
25  | 3930867  | 20779584044953.27  | 20307320771204.332
26  | 6866566  | 37215743834431.336 | 36181973172363.8
27  | 6676741  | 46676006374227.07  | 45509106214871.39
28  | 5110188  | 18633264973532.18  | 18012156141081.11
29  | 6224177  | 23723775390464.617 | 22845117042669.63
30  | 7061858  | 29253707311827.5   | 28278583734766.582
31  | 8511396  | 51411244177164.07  | 49074369441838.43
32  | 7716958  | 55522406348186.13  | 51251452013710.28
33  | 3930739  | 18959867656133.08  | 17605591394980.715
34  | 7606030  | 31956804968680.668 | 28761124471812.6
35  | 3461739  | 5591759623731.643  | 4892789670765.1875
36  | 2238119  | 2402689863606.2666 | 2162420877245.64
37  | 12777778 | 114814811851852    | 76543207901234.67
38  | 1840041  | 2239993117998.6665 | 1679994838499
39  | 2517872  | 4930720062924.5    | 2465360031462.25
40  | 4666916  | 17267521020833.332 | 11511680680555.555

3.2.3.1.6.708 Window Functions

Window functions are functions applied over a subset (known as a window) of the rows returned by a SELECT query. Read more about Window Functions in the SQL Syntax Features section.

3.2.3.1.6.709 LAG

Returns a value from a previous row within the partition of a result set. See also: LEAD.

3.2.3.1.6.710 Syntax

LAG ( expr [, offset ] )
   OVER (
      [ PARTITION BY partition_expr [, ...] ]
      [ ORDER BY order [ ASC | DESC ] [, ...] ]
   )

offset ::= integer


3.2.3.1.6.711 Arguments

Parameter | Description
----------+-----------------------------------------------------------------------------
expr      | Any value expression
offset    | Specifies the offset of rows to scan back. If not specified, defaults to 1.

3.2.3.1.6.712 Returns

Same type as expr.

3.2.3.1.6.713 Notes

• offset is evaluated with respect to the current row. If not specified, offset defaults to 1.
• If there is no previous row (calculated against the current row), LAG returns NULL.
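The offset and NULL behaviour described above can be sketched in a few lines of Python. This is an illustration of the semantics, not SQream DB code: the helper name lag is hypothetical, None stands in for NULL, and the input list is assumed to be already ordered as the ORDER BY clause would order it. The salaries are the first four values from the example output in this section.

```python
def lag(values, offset=1):
    """For each row, return the value `offset` rows earlier, or None."""
    if offset < 0:
        raise ValueError("offset must not be negative in this sketch")
    return [values[i - offset] if i - offset >= 0 else None
            for i in range(len(values))]


# Salaries ordered from highest to lowest, taken from the example output.
salaries = [25000000, 22970500, 22875000, 22359364]

previous = lag(salaries)
diff = [p - s if p is not None else None for p, s in zip(previous, salaries)]

print(previous)  # [None, 25000000, 22970500, 22875000]
print(diff)      # [None, 2029500, 95500, 515636]
```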

3.2.3.1.6.714 Examples

For these examples, assume a table named nba, with the following structure:

CREATE TABLE nba
(
   "Name" varchar(40),
   "Team" varchar(40),
   "Number" tinyint,
   "Position" varchar(2),
   "Age" tinyint,
   "Height" varchar(4),
   "Weight" real,
   "College" varchar(40),
   "Salary" float
);

Here’s a peek at the table contents (Download nba.csv):

Table 47: nba.csv

Name          | Team           | Number | Position | Age  | Height | Weight | College           | Salary
--------------+----------------+--------+----------+------+--------+--------+-------------------+-----------
Avery Bradley | Boston Celtics | 0.0    | PG       | 25.0 | 6-2    | 180.0  | Texas             | 7730337.0
Jae Crowder   | Boston Celtics | 99.0   | SF       | 25.0 | 6-6    | 235.0  | Marquette         | 6796117.0
John Holland  | Boston Celtics | 30.0   | SG       | 27.0 | 6-5    | 205.0  | Boston University |
R.J. Hunter   | Boston Celtics | 28.0   | SG       | 22.0 | 6-5    | 185.0  | Georgia State     | 1148640.0
Jonas Jerebko | Boston Celtics | 8.0    | PF       | 29.0 | 6-10   | 231.0  |                   | 5000000.0
Amir Johnson  | Boston Celtics | 90.0   | PF       | 29.0 | 6-9    | 240.0  |                   | 12000000.0
Jordan Mickey | Boston Celtics | 55.0   | PF       | 21.0 | 6-8    | 235.0  | LSU               | 1170960.0
Kelly Olynyk  | Boston Celtics | 41.0   | C        | 25.0 | 7-0    | 238.0  | Gonzaga           | 2165160.0
Terry Rozier  | Boston Celtics | 12.0   | PG       | 22.0 | 6-2    | 190.0  | Louisville        | 1824360.0

3.2.3.1.6.715 Using LAG to access previous rows

Changed in version 2020.2: LAG and LEAD are supported from v2020.2.

This example calculates the salary difference between each player and the player ranked immediately above, starting from the highest salary.

t=> SELECT "Name",
.          "Salary",
.          LAG("Salary") OVER (ORDER BY "Salary" DESC) AS "Salary - previous",
.          LAG("Salary",1) OVER (ORDER BY "Salary" DESC) - "Salary" AS "Salary - diff"
.          -- LAG("Salary",1) is equivalent to LAG("Salary")
.          FROM nba
.          LIMIT 11 ;
Name            | Salary   | Salary - previous | Salary - diff
----------------+----------+-------------------+--------------
Kobe Bryant     | 25000000 |                   |
LeBron James    | 22970500 | 25000000          | 2029500
Carmelo Anthony | 22875000 | 22970500          | 95500
Dwight Howard   | 22359364 | 22875000          | 515636
Chris Bosh      | 22192730 | 22359364          | 166634
Chris Paul      | 21468695 | 22192730          | 724035
Kevin Durant    | 20158622 | 21468695          | 1310073
Derrick Rose    | 20093064 | 20158622          | 65558
Dwyane Wade     | 20000000 | 20093064          | 93064
Brook Lopez     | 19689000 | 20000000          | 311000
DeAndre Jordan  | 19689000 | 19689000          | 0


3.2.3.1.6.716 LEAD

Returns a value from a subsequent row within the partition of a result set. See also: LAG.

3.2.3.1.6.717 Syntax

LEAD ( expr [, offset ] )
   OVER (
      [ PARTITION BY partition_expr [, ...] ]
      [ ORDER BY order [ ASC | DESC ] [, ...] ]
   )

offset ::= integer

3.2.3.1.6.718 Arguments

Parameter | Description
----------+--------------------------------------------------------------------------------
expr      | Any value expression
offset    | Specifies the offset of rows to scan forward. If not specified, defaults to 1.

3.2.3.1.6.719 Returns

Same type as expr.

3.2.3.1.6.720 Notes

• offset is evaluated with respect to the current row. If not specified, offset defaults to 1.
• If there is no subsequent row (calculated against the current row), LEAD returns NULL.
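LEAD mirrors LAG but looks forward, so the trailing rows are the ones that receive NULL. The Python sketch below illustrates the notes above and is not SQream DB code: the helper name lead is hypothetical, None stands in for NULL, and the list is assumed to be already in ORDER BY order. The salaries are the first four values from the example output in this section.

```python
def lead(values, offset=1):
    """For each row, return the value `offset` rows later, or None."""
    if offset < 0:
        raise ValueError("offset must not be negative in this sketch")
    n = len(values)
    return [values[i + offset] if i + offset < n else None for i in range(n)]


# Salaries ordered from highest to lowest, taken from the example output.
salaries = [25000000, 22970500, 22875000, 22359364]

following = lead(salaries)
diff = [s - f if f is not None else None for s, f in zip(salaries, following)]

print(following)  # [22970500, 22875000, 22359364, None]
print(diff)       # [2029500, 95500, 515636, None]
```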

3.2.3.1.6.721 Examples

For these examples, assume a table named nba, with the following structure:

CREATE TABLE nba
(
   "Name" varchar(40),
   "Team" varchar(40),
   "Number" tinyint,
   "Position" varchar(2),
   "Age" tinyint,
   "Height" varchar(4),
   "Weight" real,
   "College" varchar(40),
   "Salary" float
);

Here’s a peek at the table contents (Download nba.csv):

Table 48: nba.csv

Name          | Team           | Number | Position | Age  | Height | Weight | College           | Salary
--------------+----------------+--------+----------+------+--------+--------+-------------------+-----------
Avery Bradley | Boston Celtics | 0.0    | PG       | 25.0 | 6-2    | 180.0  | Texas             | 7730337.0
Jae Crowder   | Boston Celtics | 99.0   | SF       | 25.0 | 6-6    | 235.0  | Marquette         | 6796117.0
John Holland  | Boston Celtics | 30.0   | SG       | 27.0 | 6-5    | 205.0  | Boston University |
R.J. Hunter   | Boston Celtics | 28.0   | SG       | 22.0 | 6-5    | 185.0  | Georgia State     | 1148640.0
Jonas Jerebko | Boston Celtics | 8.0    | PF       | 29.0 | 6-10   | 231.0  |                   | 5000000.0
Amir Johnson  | Boston Celtics | 90.0   | PF       | 29.0 | 6-9    | 240.0  |                   | 12000000.0
Jordan Mickey | Boston Celtics | 55.0   | PF       | 21.0 | 6-8    | 235.0  | LSU               | 1170960.0
Kelly Olynyk  | Boston Celtics | 41.0   | C        | 25.0 | 7-0    | 238.0  | Gonzaga           | 2165160.0
Terry Rozier  | Boston Celtics | 12.0   | PG       | 22.0 | 6-2    | 190.0  | Louisville        | 1824360.0

3.2.3.1.6.722 Using LEAD to access next rows

Changed in version 2020.2: LAG and LEAD are supported from v2020.2.

This example calculates the salary difference between each player and the player ranked immediately below, starting from the highest salary.

t=> SELECT "Name",
.          "Salary",
.          LEAD("Salary") OVER (ORDER BY "Salary" DESC) AS "Salary - next",
.          "Salary" - LEAD("Salary",1) OVER (ORDER BY "Salary" DESC) AS "Salary - diff"
.          -- LEAD("Salary",1) is equivalent to LEAD("Salary")
.          FROM nba
.          LIMIT 11 ;
Name            | Salary   | Salary - next | Salary - diff
----------------+----------+---------------+--------------
Kobe Bryant     | 25000000 | 22970500      | 2029500
LeBron James    | 22970500 | 22875000      | 95500
Carmelo Anthony | 22875000 | 22359364      | 515636
Dwight Howard   | 22359364 | 22192730      | 166634
Chris Bosh      | 22192730 | 21468695      | 724035
Chris Paul      | 21468695 | 20158622      | 1310073
Kevin Durant    | 20158622 | 20093064      | 65558
Derrick Rose    | 20093064 | 20000000      | 93064
Dwyane Wade     | 20000000 | 19689000      | 311000
Brook Lopez     | 19689000 | 19689000      | 0
DeAndre Jordan  | 19689000 | 19689000      | 0


3.2.3.1.6.723 ROW_NUMBER

Returns the row number of each row within the partition of a result set. ROW_NUMBER numbers all rows sequentially. See also RANK, which provides the same value for identical consecutive rows (e.g. 1, 2, 2, 4, 4, 6).

3.2.3.1.6.724 Syntax

ROW_NUMBER( )
   OVER (
      [ PARTITION BY partition_expr [, ...] ]
      [ ORDER BY order [ ASC | DESC ] [, ...] ]
   )

3.2.3.1.6.725 Arguments

None

3.2.3.1.6.726 Returns

The row number, with data type BIGINT.

3.2.3.1.6.727 Notes

• The ORDER BY clause determines the order in which the rows appear in a result set, and thus the row number each row receives.
• The row number is non-deterministic. It may return different results each time it is called with a specific set of input values, even if there has been no change in the table contents.

3.2.3.1.6.728 Examples

For these examples, assume a table named nba, with the following structure:

CREATE TABLE nba
(
   "Name" varchar(40),
   "Team" varchar(40),
   "Number" tinyint,
   "Position" varchar(2),
   "Age" tinyint,
   "Height" varchar(4),
   "Weight" real,
   "College" varchar(40),
   "Salary" float
);

Here’s a peek at the table contents (Download nba.csv):

Table 49: nba.csv

Name          | Team           | Number | Position | Age  | Height | Weight | College           | Salary
--------------+----------------+--------+----------+------+--------+--------+-------------------+-----------
Avery Bradley | Boston Celtics | 0.0    | PG       | 25.0 | 6-2    | 180.0  | Texas             | 7730337.0
Jae Crowder   | Boston Celtics | 99.0   | SF       | 25.0 | 6-6    | 235.0  | Marquette         | 6796117.0
John Holland  | Boston Celtics | 30.0   | SG       | 27.0 | 6-5    | 205.0  | Boston University |
R.J. Hunter   | Boston Celtics | 28.0   | SG       | 22.0 | 6-5    | 185.0  | Georgia State     | 1148640.0
Jonas Jerebko | Boston Celtics | 8.0    | PF       | 29.0 | 6-10   | 231.0  |                   | 5000000.0
Amir Johnson  | Boston Celtics | 90.0   | PF       | 29.0 | 6-9    | 240.0  |                   | 12000000.0
Jordan Mickey | Boston Celtics | 55.0   | PF       | 21.0 | 6-8    | 235.0  | LSU               | 1170960.0
Kelly Olynyk  | Boston Celtics | 41.0   | C        | 25.0 | 7-0    | 238.0  | Gonzaga           | 2165160.0
Terry Rozier  | Boston Celtics | 12.0   | PG       | 22.0 | 6-2    | 190.0  | Louisville        | 1824360.0

3.2.3.1.6.729 RANK vs. ROW_NUMBER

t=> SELECT ROW_NUMBER () OVER (PARTITION BY "Age" ORDER BY "Height") FROM nba;
row_number
----------
1
2
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
1
2
3
4
[...]

t=> SELECT RANK () OVER (PARTITION BY "Age" ORDER BY "Height") FROM nba;
rank
----
1
1
1
2
2
2
5
6
6
8
8
10
10
10
13
14
14
14
14
18
19
1
2
2
2
[...]

3.2.3.1.6.730 RANK

Returns the rank of each row within the partition of a result set. The rank of a row is defined as 1 + the number of ranks that come before the row. While ROW_NUMBER numbers all rows sequentially, RANK provides the same value for identical consecutive rows (e.g. 1, 2, 2, 4, 4, 6).

3.2.3.1.6.731 Syntax

RANK( )
   OVER (
      [ PARTITION BY partition_expr [, ...] ]
      [ ORDER BY order [ ASC | DESC ] [, ...] ]
   )

3.2.3.1.6.732 Arguments

None

3.2.3.1.6.733 Returns

The rank of the row, with data type BIGINT.

3.2.3.1.6.734 Notes

• The ORDER BY clause determines the order in which the rows appear in a result set, and thus the rank.
• Rank is non-deterministic. It may return different results each time it is called with a specific set of input values, even if there has been no change in the table contents.
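To make the difference between RANK and ROW_NUMBER concrete, the Python sketch below computes both over one partition that is already sorted by its ORDER BY key. It illustrates the definitions in this section and is not SQream DB code; the helper names and the height values are hypothetical.

```python
def row_number(sorted_keys):
    """Number rows 1, 2, 3, ... regardless of ties."""
    return list(range(1, len(sorted_keys) + 1))


def rank(sorted_keys):
    """Give tied keys the same rank: 1 + the number of rows that sort before them."""
    ranks = []
    for i, key in enumerate(sorted_keys):
        if i > 0 and key == sorted_keys[i - 1]:
            ranks.append(ranks[-1])   # same key as the previous row: reuse its rank
        else:
            ranks.append(i + 1)       # new key: rank is the row's position
    return ranks


# Hypothetical ORDER BY keys for a single partition, already sorted.
heights = ["6-3", "6-5", "6-5", "6-7", "6-7", "6-9"]

print(row_number(heights))  # [1, 2, 3, 4, 5, 6]
print(rank(heights))        # [1, 2, 2, 4, 4, 6]
```

The rows that tie on "6-5" or "6-7" share a rank, while their row numbers still differ; which of the tied rows gets the lower row number is not fixed by the ORDER BY key alone.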

3.2.3.1.6.735 Examples

For these examples, assume a table named nba, with the following structure:

CREATE TABLE nba
(
   "Name" varchar(40),
   "Team" varchar(40),
   "Number" tinyint,
   "Position" varchar(2),
   "Age" tinyint,
   "Height" varchar(4),
   "Weight" real,
   "College" varchar(40),
   "Salary" float
);

Here’s a peek at the table contents (Download nba.csv):

Table 50: nba.csv

Name          | Team           | Number | Position | Age  | Height | Weight | College           | Salary
--------------+----------------+--------+----------+------+--------+--------+-------------------+-----------
Avery Bradley | Boston Celtics | 0.0    | PG       | 25.0 | 6-2    | 180.0  | Texas             | 7730337.0
Jae Crowder   | Boston Celtics | 99.0   | SF       | 25.0 | 6-6    | 235.0  | Marquette         | 6796117.0
John Holland  | Boston Celtics | 30.0   | SG       | 27.0 | 6-5    | 205.0  | Boston University |
R.J. Hunter   | Boston Celtics | 28.0   | SG       | 22.0 | 6-5    | 185.0  | Georgia State     | 1148640.0
Jonas Jerebko | Boston Celtics | 8.0    | PF       | 29.0 | 6-10   | 231.0  |                   | 5000000.0
Amir Johnson  | Boston Celtics | 90.0   | PF       | 29.0 | 6-9    | 240.0  |                   | 12000000.0
Jordan Mickey | Boston Celtics | 55.0   | PF       | 21.0 | 6-8    | 235.0  | LSU               | 1170960.0
Kelly Olynyk  | Boston Celtics | 41.0   | C        | 25.0 | 7-0    | 238.0  | Gonzaga           | 2165160.0
Terry Rozier  | Boston Celtics | 12.0   | PG       | 22.0 | 6-2    | 190.0  | Louisville        | 1824360.0

3.2.3.1.6.736 RANK vs. ROW_NUMBER

t=> SELECT ROW_NUMBER () OVER (PARTITION BY "Age" ORDER BY "Height") FROM nba;
row_number
----------
1
2
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
1
2
3
4
[...]

t=> SELECT RANK () OVER (PARTITION BY "Age" ORDER BY "Height") FROM nba;
rank
----
1
1
1
2
2
2
5
6
6
8
8
10
10
10
13
14
14
14
14
18
19
1
2
2
2
[...]
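The difference between the two outputs above can be modeled outside the database. The following is a minimal Python sketch, not SQream DB code: the function names row_number and rank and the sample heights list are illustrative assumptions, and the input is assumed to be a single partition that is already sorted by the ORDER BY column.

```python
def row_number(sorted_values):
    """Number rows sequentially: 1, 2, 3, ..."""
    return list(range(1, len(sorted_values) + 1))


def rank(sorted_values):
    """Rank = 1 + the number of rows that sort strictly before the row."""
    ranks = []
    for i, value in enumerate(sorted_values):
        if i > 0 and value == sorted_values[i - 1]:
            ranks.append(ranks[-1])  # tie: reuse the previous row's rank
        else:
            ranks.append(i + 1)      # i rows come before this one
    return ranks


# Hypothetical partition, already ordered by "Height"
heights = ["6-2", "6-2", "6-5", "6-5", "6-6"]
print(row_number(heights))  # [1, 2, 3, 4, 5]
print(rank(heights))        # [1, 1, 3, 3, 5]
```

Tied rows share a rank, and the next distinct value resumes at its sequential position, which is why gaps appear in the RANK output.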

3.2.3.1.6.737 FIRST_VALUE

The FIRST_VALUE function returns the value located in the selected column of the first row of a segment. If the table is not segmented, FIRST_VALUE returns the value from the first row of the whole table. The return type matches the type of the selected column. For example, selecting an int ID column for the first employee in a list returns an int, while selecting a varchar name column returns a varchar.

3.2.3.1.6.738 Syntax

The following shows the correct syntax for the FIRST_VALUE function.

FIRST_VALUE(Selected_Column) OVER (...)


3.2.3.1.6.739 Arguments

Selected_Column: the column whose value is returned.

3.2.3.1.6.740 Returns

Returns the value located in the selected column of the first row of a segment.

3.2.3.1.6.741 LAST_VALUE

The LAST_VALUE function returns the value located in the selected column of the last row of a segment. If the table is not segmented, LAST_VALUE returns the value from the last row of the whole table. The return type matches the type of the selected column. For example, selecting an int ID column for the last employee in a list returns an int, while selecting a varchar name column returns a varchar.

3.2.3.1.6.742 Syntax

The following shows the correct syntax for the LAST_VALUE function.

LAST_VALUE(Selected_Column) OVER (...)

3.2.3.1.6.743 Arguments

Selected_Column: the column whose value is returned.

3.2.3.1.6.744 Returns

Returns the value located in the selected column of the last row of a segment.

3.2.3.1.6.745 NTH_VALUE

The NTH_VALUE function returns the value located in the selected column of a specified row of a segment. NTH_VALUE works like the FIRST_VALUE and LAST_VALUE functions, but additionally requires a literal, whole-number int value n that identifies the row containing the value that you want. If you select an n value larger than the number of rows in the table, the NTH_VALUE function returns NULL.

3.2.3.1.6.746 Syntax

The following shows the correct syntax for the NTH_VALUE function.

NTH_VALUE(Selected_Column, n) OVER (...)


3.2.3.1.6.747 Examples

The following example creates a table named superstore, used for tracking item sales in thousands:

CREATE TABLE superstore
(
   "Section" varchar(40),
   "Product_Name" varchar(40),
   "Sales_In_K" int
);

The table contains the following rows:

Section | Product_Name | Sales_In_K
--------+--------------+-----------
bed     | pillow_case  | 17
bath    | rubber_duck  | 22
beyond  | curtain      | 15
bed     | side_table   | 8
bath    | shampoo      | 9
bed     | blanket      | 7
bath    | bath_bomb    | 13
bath    | conditioner  | 17
beyond  | lamp         | 7
bath    | soap         | 13
beyond  | rug          | 12
bed     | pillow       | 17
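To illustrate what FIRST_VALUE, LAST_VALUE, and NTH_VALUE select, the following Python sketch models them over the superstore rows. It is not SQream DB code. It assumes that segments correspond to PARTITION BY "Section", that rows keep the table order shown above within each segment, and that the whole segment is visible to the function. The helper name nth_value is hypothetical.

```python
from collections import defaultdict

rows = [
    ("bed", "pillow_case", 17), ("bath", "rubber_duck", 22),
    ("beyond", "curtain", 15), ("bed", "side_table", 8),
    ("bath", "shampoo", 9), ("bed", "blanket", 7),
    ("bath", "bath_bomb", 13), ("bath", "conditioner", 17),
    ("beyond", "lamp", 7), ("bath", "soap", 13),
    ("beyond", "rug", 12), ("bed", "pillow", 17),
]


def nth_value(values, n):
    """Value of the n-th row (1-based) of a segment, or None (NULL) if n is out of range."""
    return values[n - 1] if 1 <= n <= len(values) else None


# Group product names by section, keeping table order within each segment
segments = defaultdict(list)
for section, product, _sales in rows:
    segments[section].append(product)

for section, products in segments.items():
    first, last = products[0], products[-1]
    print(section, first, last, nth_value(products, 5))
# bed pillow_case pillow None
# bath rubber_duck soap soap
# beyond curtain rug None
```

Only the bath segment has five rows, so the fifth value is NULL (None in the sketch) for the other two segments.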

3.2.3.1.6.748 DENSE_RANK

The DENSE_RANK function returns data ranking information and is similar to the RANK function. The two functions differ in how they number the rows that follow tied rows. After assigning the same value to tied rows, the RANK function skips the ranks those rows occupy (for example, 1, 1, 3), while the DENSE_RANK function continues with the next number in the sequence (1, 1, 2).

3.2.3.1.6.749 Syntax

The following shows the correct syntax for the DENSE_RANK function.

DENSE_RANK ( ) OVER (...)

3.2.3.1.6.750 Arguments

None

3.2.3.1.6.751 Returns

Returns data ranking information.
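As an illustration of the numbering described above, the following Python sketch models DENSE_RANK for one partition that is already sorted. It is a simplified model rather than SQream DB code, and the function name dense_rank and the sample values are assumptions made for the example.

```python
def dense_rank(sorted_values):
    """Tied rows share a rank; the next distinct value gets the next number."""
    ranks = []
    for i, value in enumerate(sorted_values):
        if i == 0:
            ranks.append(1)
        elif value == sorted_values[i - 1]:
            ranks.append(ranks[-1])      # tie: same rank as the previous row
        else:
            ranks.append(ranks[-1] + 1)  # no gap after a tie
    return ranks


print(dense_rank([7, 7, 8]))          # [1, 1, 2]
print(dense_rank([7, 7, 8, 12, 12]))  # [1, 1, 2, 3, 3]
```

For the same input [7, 7, 8], RANK would produce 1, 1, 3.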


3.2.3.1.6.752 PERCENT_RANK

The PERCENT_RANK function returns relative data ranking information and is similar to the RANK function. It differs from the RANK function in the following ways:
• The output type is double instead of bigint.
• It shows the ranking as a relative rank (a number between 0 and 1, inclusive) and not as an absolute rank number (for example, 1, 1, 3).

3.2.3.1.6.753 Syntax

The following shows the correct syntax for the PERCENT_RANK function.

PERCENT_RANK ( ) OVER (...)

3.2.3.1.6.754 Arguments

None

3.2.3.1.6.755 Returns

Returns relative data ranking information in the form of a number between (and including) 0 and 1.
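The relative rank can be sketched in Python. This model assumes the standard SQL definition, (rank - 1) / (rows in partition - 1), which the text above does not state explicitly, so it should be treated as an assumption about SQream DB's behavior. The function name percent_rank is illustrative, and the input is one partition that is already sorted.

```python
def percent_rank(sorted_values):
    """Relative rank in [0, 1], assuming (rank - 1) / (n - 1)."""
    n = len(sorted_values)
    if n <= 1:
        return [0.0] * n  # a single row is assumed to have relative rank 0
    result = []
    rank = 1
    for i, value in enumerate(sorted_values):
        if i > 0 and value != sorted_values[i - 1]:
            rank = i + 1  # same numbering as RANK, with gaps after ties
        result.append((rank - 1) / (n - 1))
    return result


print(percent_rank([7, 7, 8]))        # [0.0, 0.0, 1.0]
print(percent_rank([1, 2, 3, 4, 5]))  # [0.0, 0.25, 0.5, 0.75, 1.0]
```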

3.2.3.1.6.756 CUME_DIST

The CUME_DIST function returns the cumulative distribution of each row within its partition, as a value in the range from 1/n (with n being the total number of rows) to 1. For example, in a table with ten rows, the cumulative distribution of the first row is 0.1, of the second row 0.2, and so on. The cumulative distribution of the final row is always 1.

3.2.3.1.6.757 Syntax

The following shows the correct syntax for the CUME_DIST function.

CUME_DIST ( ) OVER (...)

3.2.3.1.6.758 Arguments

None

3.2.3.1.6.759 Returns

Returns the cumulative distribution for each of your partitioned rows.
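The ten-row example above can be reproduced with a short Python sketch. It assumes the standard SQL definition, in which the cumulative distribution of a row is the number of rows with a value less than or equal to the row's value, divided by the number of rows in the partition. The handling of ties is an assumption, since the text above only describes distinct rows. The function name cume_dist is illustrative.

```python
def cume_dist(values):
    """Fraction of rows in the partition whose value is <= the row's value."""
    n = len(values)
    return [sum(1 for v in values if v <= value) / n for value in values]


ten_rows = list(range(1, 11))
print(cume_dist(ten_rows))   # [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
print(cume_dist([7, 7, 8]))  # tied rows share a value; the last row is 1.0
```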


3.2.3.1.6.760 NTILE

The NTILE function divides a table’s rows into n ranked buckets and returns, for each row, an integer between 1 and n that identifies its bucket. The buckets are determined by:
• The selected column.
• The selected positive, literal n.

Selecting an n that exceeds the number of rows in a table outputs a result identical to the RANK function. In that case, each bucket is populated with a single value, and buckets numbered higher than the number of rows are not created. Additionally, if the number of rows is not exactly divisible by n, the function gives precedence to the buckets at the top. For example, if n is 6 for 63 values, the top three buckets each have 11 values, and the bottom three each have 10.

3.2.3.1.6.761 Syntax

The following shows the correct syntax for the NTILE function.

NTILE(n) OVER (...)
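The bucket sizes described above follow from integer division. The Python sketch below is a model of that arithmetic, not SQream DB code. The function name ntile is illustrative, and it takes a row count rather than actual rows because only the bucket assignment is being shown.

```python
def ntile(row_count, n):
    """Bucket number (1..n) for each of row_count ordered rows."""
    base, extra = divmod(row_count, n)  # the first `extra` buckets get one more row
    buckets = []
    for bucket in range(1, n + 1):
        size = base + (1 if bucket <= extra else 0)
        buckets.extend([bucket] * size)
    return buckets


assignment = ntile(63, 6)
print([assignment.count(b) for b in range(1, 7)])  # [11, 11, 11, 10, 10, 10]
print(ntile(3, 5))                                 # [1, 2, 3]
```

With 3 rows and n set to 5, buckets 4 and 5 stay empty, matching the case where n exceeds the number of rows.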

3.2.3.1.6.762 System Functions

System functions are used for working with database objects, settings, and values.

3.2.3.1.6.763 SHOW_VERSION

SHOW_VERSION() is a function that returns the system version for SQream DB.

3.2.3.1.6.764 Permissions

No special permissions are required.

3.2.3.1.6.765 Syntax

show_version_statement ::=
   SELECT SHOW_VERSION()
   ;

3.2.3.1.6.766 Parameters

None

3.2.3.1.6.767 Notes

To check the SQream DB version from the shell, run:

$ sqreamd --version


3.2.3.1.6.768 Examples

3.2.3.1.6.769 Getting the current SQream DB version

t=> SELECT SHOW_VERSION();
bytesread
---------
v2019.3

3.3 Catalog reference

SQream DB contains a schema called sqream_catalog that contains information about your database’s objects - tables, columns, views, permissions, and more. Some additional catalog tables are used primarily for internal introspection, which could change across SQream DB versions.

In this topic:

• Types of data exposed by sqream_catalog
• Tables in the catalog
  – clustering_keys
  – columns
  – external_tables
  – external_table_columns
  – databases
  – database_permissions
  – identity_key
  – permission_types
  – roles
  – roles_memberships
  – savedqueries
  – schemas
  – schema_permissions
  – tables
  – table_permissions
  – udf_permissions
  – user_defined_functions
  – views
• Additional tables
  – extents
  – chunk_columns
  – chunks
  – delete_predicates
• Examples
  – List all tables in the database
  – List all schemas in the database
  – List columns and their types for a specific table
  – List delete predicates
  – List Saved queries

3.3.1 Types of data exposed by sqream_catalog

Table 51: Database objects

Object          | Table
----------------+------------------------------------------------------------------------------------------------
Clustering keys | clustering_keys
Columns         | columns, external_table_columns
Databases       | databases
Permissions     | table_permissions, database_permissions, schema_permissions, permission_types, udf_permissions
Roles           | roles, roles_memberships
Schemas         | schemas
Sequences       | identity_key
Tables          | tables, external_tables
Views           | views
UDFs            | user_defined_functions

The catalog contains a few more tables, which hold storage details for internal use.

Table 52: Storage objects

Object            | Table
------------------+------------------
Extents           | extents
Chunks            | chunks
Delete predicates | delete_predicates

3.3.2 Tables in the catalog

3.3.2.1 clustering_keys

Explicit clustering keys for tables. When more than one clustering key is defined, each key is listed in a separate row.

Column         | Description
---------------+----------------------------------------------------------
database_name  | Name of the database containing the table
table_id       | ID of the table containing the column
schema_name    | Name of the schema containing the table
table_name     | Name of the table containing the column
clustering_key | Name of the column that is a clustering key for this table

3.3.2.2 columns

Column objects for standard tables

Column               | Description
---------------------+------------------------------------------------------------------------------------------------------------
database_name        | Name of the database containing the table
schema_name          | Name of the schema containing the table
table_id             | ID of the table containing the column
table_name           | Name of the table containing the column
column_id            | Ordinal of the column in the table (begins at 0)
column_name          | Name of the column
type_name            | Data type of the column
column_size          | The maximum length in bytes.
has_default          | NULL if the column has no default value. 1 if the default is a fixed value, or 2 if the default is an Identity
default_value        | Default value for the column
compression_strategy | User-overridden compression strategy
created              | Timestamp when the column was created
altered              | Timestamp when the column was last altered

3.3.2.3 external_tables

external_tables identifies external tables in the database. For TABLES see tables

Column        | Description
--------------+---------------------------------------------------------------------------------------------
database_name | Name of the database containing the table
table_id      | Database-unique ID for the table
schema_name   | Name of the schema containing the table
table_name    | Name of the table
format        | Identifies the foreign data wrapper used. 0 for csv_fdw, 1 for parquet_fdw, 2 for orc_fdw.
created       | Identifies the clause used to create the table

3.3.2.4 external_table_columns

Column objects for external tables


3.3.2.5 databases

Column                           | Description
---------------------------------+--------------------------
database_Id                      | Unique ID of the database
database_name                    | Name of the database
default_disk_chunk_size          | Internal use
default_process_chunk_size       | Internal use
rechunk_size                     | Internal use
storage_subchunk_size            | Internal use
compression_chunk_size_threshold | Internal use

3.3.2.6 database_permissions

database_permissions identifies all permissions granted to databases. There is one row for each combination of role (grantee) and permission granted to a database.

Column          | Description
----------------+----------------------------------------------
database_name   | Name of the database the permission applies to
role_id         | ID of the role granted permissions (grantee)
permission_type | Identifies the permission type

3.3.2.7 identity_key

3.3.2.8 permission_types

permission_types identifies the permission names that exist in the database.

Column             | Description
-------------------+----------------------------
permission_type_id | ID of the permission type
name               | Name of the permission type

3.3.2.9 roles

roles identifies the roles in the database.

Column              | Description
--------------------+--------------------------------------------------------------------------------------
role_id             | Database-unique ID of the role
name                | Name of the role
superuser           | Identifies if this role is a superuser. 1 for superuser or 0 otherwise.
login               | Identifies if this role can be used to log in to SQream DB. 1 for yes or 0 otherwise.
has_password        | Identifies if this role has a password. 1 for yes or 0 otherwise.
can_create_function | Identifies if this role can create UDFs. 1 for yes, 0 otherwise.

3.3.2.10 roles_memberships

roles_memberships identifies the role memberships in the database.

Column         | Description
---------------+------------------------------------------------------------------
role_id        | Role ID
member_role_id | ID of the parent role from which this role will inherit
inherit        | Identifies if permissions are inherited. 1 for yes or 0 otherwise.

3.3.2.11 savedqueries

savedqueries identifies the saved_queries in the database.

Column         | Description
---------------+------------------------------------------------
name           | Saved query name
num_parameters | Number of parameters to be replaced at run-time

3.3.2.12 schemas

schemas identifies all the database’s schemas.

Column           | Description
-----------------+--------------------------------------
schema_id        | Unique ID of the schema
schema_name      | Name of the schema
schema_owner     | Name of the role who owns this schema
rechunker_ignore | Internal use

3.3.2.13 schema_permissions

schema_permissions identifies all permissions granted to schemas. There is one row for each combination of role (grantee) and permission granted to a schema.

Column          | Description
----------------+---------------------------------------------
database_name   | Name of the database containing the schema
schema_id       | ID of the schema the permission applies to
role_id         | ID of the role granted permissions (grantee)
permission_type | Identifies the permission type

3.3.2.14 tables

tables identifies proper SQream tables in the database. For EXTERNAL TABLES see external_tables

Column           | Description
-----------------+------------------------------------------
database_name    | Name of the database containing the table
table_id         | Database-unique ID for the table
schema_name      | Name of the schema containing the table
table_name       | Name of the table
row_count_valid  | Identifies if the row_count can be used
row_count        | Number of rows in the table
rechunker_ignore | Internal use

3.3.2.15 table_permissions

table_permissions identifies all permissions granted to tables. There is one row for each combination of role (grantee) and permission granted to a table.

Column          | Description
----------------+---------------------------------------------
database_name   | Name of the database containing the table
table_id        | ID of the table the permission applies to
role_id         | ID of the role granted permissions (grantee)
permission_type | Identifies the permission type

3.3.2.16 udf_permissions

3.3.2.17 user_defined_functions

user_defined_functions identifies UDFs in the database.

Column        | Description
--------------+----------------------------------------
database_name | Name of the database containing the UDF
function_id   | Database-unique ID for the UDF
function_name | Name of the UDF

3.3.2.18 views

views identifies views in the database.

Column          | Description
----------------+------------------------------------------------
view_id         | Database-unique ID for the view
view_schema     | Name of the schema containing the view
view_name       | Name of the view
view_data       | Internal use
view_query_text | Identifies the AS clause used to create the view

3.3.3 Additional tables

There are additional tables in the catalog that can be used for performance monitoring and inspection. The definitions of these tables are provided below and could change across SQream DB versions.

3.3.3.1 extents

extents identifies storage extents. Each storage extent can contain several chunks.

Note: This is an internal table designed for low-level performance troubleshooting.

Column        | Description
--------------+-------------------------------------------
database_name | Name of the database containing the extent
table_id      | ID of the table containing the extent
column_id     | ID of the column containing the extent
extent_id     | ID for the extent
size          | Extent size in megabytes
path          | Full path to the extent on the file system

3.3.3.2 chunk_columns

chunk_columns lists chunk information by column.

Column            | Description
------------------+------------------------------------------------
database_name     | Name of the database containing the extent
table_id          | ID of the table containing the extent
column_id         | ID of the column containing the extent
chunk_id          | ID for the chunk
extent_id         | ID for the extent
compressed_size   | Actual chunk size in bytes
uncompressed_size | Uncompressed chunk size in bytes
compression_type  | Actual compression scheme for this chunk
long_min          | Minimum numeric value in this chunk (if exists)
long_max          | Maximum numeric value in this chunk (if exists)
string_min        | Minimum text value in this chunk (if exists)
string_max        | Maximum text value in this chunk (if exists)
offset_in_file    | Internal use

Note: This is an internal table designed for low-level performance troubleshooting.

3.3.3.3 chunks

chunks identifies storage chunks.

Note: This is an internal table designed for low-level performance troubleshooting.

Column          | Description
----------------+--------------------------------------------------------------------------------------------------------------
database_name   | Name of the database containing the chunk
table_id        | ID of the table containing the chunk
column_id       | ID of the column containing the chunk
rows_num        | Number of rows contained in the chunk
deletion_status | When data is deleted from the table, it is first deleted logically. This value identifies how much data is deleted from the chunk: 0 for no data, 1 for some data, 2 if the entire chunk is deleted.


3.3.3.4 delete_predicates

delete_predicates identifies the existing delete predicates that have not been cleaned up. Each DELETE command may result in several entries in this table.

Note: This is an internal table designed for low-level performance troubleshooting.

Column           | Description
-----------------+-------------------------------------------------------------------------------------------------
database_name    | Name of the database containing the predicate
table_id         | ID of the table containing the predicate
max_chunk_id     | Internal use. Placeholder marker for the highest chunk_id logged during the DELETE operation.
delete_predicate | Identifies the DELETE predicate

3.3.4 Examples

3.3.4.1 List all tables in the database

master=> SELECT * FROM sqream_catalog.tables;
database_name | table_id | schema_name | table_name   | row_count_valid | row_count | rechunker_ignore
--------------+----------+-------------+--------------+-----------------+-----------+-----------------
master        |        1 | public      | nba          | true            |       457 |                0
master        |       12 | public      | cool_dates   | true            |         5 |                0
master        |       13 | public      | cool_numbers | true            |         9 |                0
master        |       27 | public      | jabberwocky  | true            |         8 |                0

3.3.4.2 List all schemas in the database

master=> SELECT * FROM sqream_catalog.schemas;
schema_id | schema_name   | schema_owner | rechunker_ignore
----------+---------------+--------------+-----------------
0         | public        | sqream       | false
1         | secret_schema | mjordan      | false

3.3.4.3 List columns and their types for a specific table

SELECT column_name, type_name FROM sqream_catalog.columns WHERE table_name='cool_animals';


3.3.4.4 List delete predicates

SELECT t.table_name, d.* FROM sqream_catalog.delete_predicates AS d INNER JOIN sqream_catalog.tables AS t ON d.table_id=t.table_id;

3.3.4.5 List Saved queries

SELECT * FROM sqream_catalog.savedqueries;

3.4 Command line programs

SQream contains several command line programs for using, starting, managing, and configuring SQream DB clusters. This topic contains the reference for these programs, as well as flags and configuration settings.

Table 53: User CLIs

Command    | Usage
-----------+--------------------
sqream sql | Built-in SQL client

Table 54: SQream DB cluster components

Command         | Usage
----------------+-------------------------------------------------------------------
sqreamd         | Start a SQream DB worker
metadata_server | The cluster manager/coordinator that enables scaling SQream DB.
server_picker   | Load balancer end-point

Table 55: SQream DB utilities

Command         | Usage
----------------+----------------------------------------------------------------
SqreamStorage   | Initialize a cluster and set superusers
upgrade_storage | Upgrade metadata schemas when upgrading between major versions

Table 56: Docker utilities

Command          | Usage
-----------------+-------------------------------------------
sqream_console   | Dockerized convenience wrapper for operations
sqream_installer | Dockerized installer

3.4.1 metadata_server

SQream DB’s cluster manager/coordinator is called metadata_server. In general, you should not need to run metadata_server manually, but it is sometimes useful for testing. This page serves as a reference for the options and parameters.


3.4.1.1 Positional command line arguments

$ metadata_server [ [ ] ]

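Argument     | Default           | Description
-------------+-------------------+----------------------------------------------------------------
Logging path | Current directory | Path to store metadata logs into
Listen port  | 3105              | TCP listen port. If used, log path must be specified beforehand.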

3.4.1.2 Starting metadata server

3.4.1.2.1 Starting temporarily

$ nohup metadata_server &
$ MS_PID=$!

Using nohup and & sends metadata server to run in the background.

Note:
• Logs are saved to the current directory, under metadata_server_logs.
• The default listening port is 3105.

3.4.1.2.2 Starting temporarily with non-default port

To use a non-default port, specify the logging path as well.

$ nohup metadata_server /home/rhendricks/metadata_logs 9241 &
$ MS_PID=$!

Using nohup and & sends metadata server to run in the background.

Note:
• Logs are saved to the /home/rhendricks/metadata_logs directory.
• The listening port is 9241.

3.4.1.2.3 Stopping metadata server

To stop metadata server:

$ kill -9 $MS_PID

Tip: It is safe to stop any SQream DB component at any time using kill. No partial data or data corruption should occur when using this method to stop the process.


3.4.2 sqreamd

SQream DB’s main worker is called sqreamd. This page serves as a reference for the options and parameters.

3.4.2.1 Starting SQream DB

3.4.2.1.1 Start SQream DB temporarily

In general, you should not need to run sqreamd manually, but it is sometimes useful for testing.

$ nohup sqreamd -config ~/.sqream/sqream_config.json &
$ SQREAM_PID=$!

Using nohup and & sends SQream DB to run in the background. To stop the active worker:

$ kill -9 $SQREAM_PID

Tip: It is safe to stop SQream DB at any time using kill. No partial data or data corruption should occur when using this method to stop the process.

3.4.2.2 Command line arguments

sqreamd supports the following command line arguments:

Argument   | Default                          | Description
-----------+----------------------------------+----------------------------------------------------------------
--version  | None                             | Outputs the version of SQream DB and immediately exits.
-config    | $HOME/.sqream/sqream_config.json | Specifies the configuration file to use
--port_ssl | Don’t use SSL                    | When specified, tells SQream DB to listen for SSL connections

3.4.2.2.1 Positional command arguments

sqreamd also supports positional arguments, when not using a configuration file. This method can be used to temporarily start a SQream DB worker for testing.

$ sqreamd

Argument                    | Required | Description
----------------------------+----------+---------------------------------------------------------------------------
Storage path                | ✓        | Full path to a valid SQream DB persistent storage
GPU Ordinal                 | ✓        | Number representing the GPU to use. Check GPU ordinals with nvidia-smi -L
TCP listen port (unsecured) | ✓        | TCP port SQream DB should listen on. Recommended: 5000
License path                | ✓        | Full path to a SQream DB license file

3.4.3 sqream-console

sqream-console is an interactive shell designed to help manage a dockerized SQream DB installation. The console itself is a dockerized application. This page serves as a reference for the options and parameters.

In this topic:

• Starting the console
• Operations and flag reference
  – Commands
  – Master
    ∗ Syntax
    ∗ Common usage
      · Start master node
      · Start master node on different ports
      · Listing active master nodes and workers
      · Stopping all SQream DB workers and master
  – Workers
    ∗ Syntax
    ∗ Common usage
      · Start 2 workers
      · Stop a single worker
      · Start workers with a different spool size
      · Starting multiple workers on non-dedicated GPUs
      · Overriding default configuration files
  – Client
    ∗ Syntax
    ∗ Common usage
      · Start a client
      · Start a client to a specific worker
      · Start master node on different ports
      · Listing active master nodes and worker nodes
  – Editor
    ∗ Syntax
    ∗ Common usage
      · Start the editor UI
      · Stop the editor UI
• Using the console to start SQream DB
  – Starting a SQream DB cluster for the first time

3.4.3.1 Starting the console

sqream-console can be found in your SQream DB installation, under the name sqream-console. Start the console by executing it from the shell:

$ ./sqream-console

Welcome to SQream Console ver 1.7.6, type exit to log-out

usage: sqream [-h] [--settings] {master,worker,client,editor} ...

Run SQream Cluster

optional arguments:
  -h, --help            show this help message and exit
  --settings            sqream environment variables settings

subcommands:
  sqream services

  {master,worker,client,editor}
                        sub-command help
    master              start sqream master
    worker              start sqream worker
    client              operating sqream client
    editor              operating sqream statement editor
sqream-console>

The console is now waiting for commands. The console is a wrapper around a standard Linux shell and supports commands such as ls and cp. All SQream DB-specific commands start with the keyword sqream.

3.4.3.2 Operations and flag reference

3.4.3.2.1 Commands

Command       | Description
--------------+----------------------------------------------------
sqream --help | Shows the initial usage information
sqream master | Controls the master node’s operations
sqream worker | Controls workers’ operations
sqream client | Access to sqream sql
sqream editor | Controls the statement editor’s operations (web UI)

3.4.3.2.2 Master

The master node contains the metadata server and the load balancer.

3.4.3.2.2.1 Syntax

sqream master

Flag/command              | Description
--------------------------+--------------------------------------------------------------------------------------------------------------
--start [ --single-host ] | Starts the master node. The --single-host modifier sets the mode to allow all containers to run on the same server.
--stop [ --all ]          | Stops the master node and all connected workers. The --all modifier instructs the --stop command to stop all running services related to SQream DB
--list                    | Shows a list of all active master nodes and their workers
-p                        | Sets the port for the load balancer. Defaults to 3108
-m                        | Sets the port for the metadata server. Defaults to 3105

3.4.3.2.2.2 Common usage

3.4.3.2.2.3 Start master node

sqream-console> sqream master --start
starting master server in single_host mode ...
sqream_single_host_master is up and listening on ports: 3105,3108

3.4.3.2.2.4 Start master node on different ports

sqream-console> sqream master --start -p 4105 -m 4108
starting master server in single_host mode ...
sqream_single_host_master is up and listening on ports: 4105,4108

3.4.3.2.2.5 Listing active master nodes and workers

sqream-console> sqream master --list
container name: sqream_single_host_worker_1, container id: de9b8aff0a9c
container name: sqream_single_host_worker_0, container id: c919e8fb78c8
container name: sqream_single_host_master, container id: ea7eef80e038

3.4.3.2.2.6 Stopping all SQream DB workers and master

sqream-console> sqream master --stop --all
shutting down 2 sqream services ...
sqream_editor stopped
sqream_single_host_worker_1 stopped
sqream_single_host_worker_0 stopped
sqream_single_host_master stopped

3.4.3.2.3 Workers

Workers are SQream DB daemons that connect to the master node.

3.4.3.2.3.1 Syntax

sqream worker

Flag/command               | Description
---------------------------+----------------------------------------------------------------------------------------------------
--start [ options [ ...] ] | Starts worker nodes. See options table below.
--stop [ | --all ]         | Stops the specified worker. The --all modifier instructs the --stop command to stop all running workers.

Start options are specified consecutively, separated by spaces.

Table 57: Start options

Option        | Description
--------------+------------------------------------------------------------------------------------------------------------------------------
              | Specifies the number of workers to start
-j [ ...]     | Specifies the configuration files to use. When launching multiple workers, specify one file per worker, separated by spaces.
-p [ ...]     | Sets the ports to listen on. When launching multiple workers, specify one port per worker, separated by spaces. Defaults to 5000 - 5000+n.
-g [ ...]     | Sets the GPU ordinal to assign to each worker. When launching multiple workers, specify one GPU ordinal per worker, separated by spaces. Defaults to automatic allocation.
-m            | Sets the spool size per worker, in GB.
--master-host | Sets the hostname for the master node. Defaults to localhost.
--master-port | Sets the port for the master node. Defaults to 3105.
--stand-alone | For testing only: Starts a worker without connecting to the master node.

3.4.3.2.3.2 Common usage

3.4.3.2.3.3 Start 2 workers

After starting the master node, start workers:

sqream-console> sqream worker --start 2
started sqream_single_host_worker_0 on port 5000, allocated gpu: 0
started sqream_single_host_worker_1 on port 5001, allocated gpu: 1

3.4.3.2.3.4 Stop a single worker

To stop a single worker, find its name first:

sqream-console> sqream master --list
container name: sqream_single_host_worker_1, container id: de9b8aff0a9c
container name: sqream_single_host_worker_0, container id: c919e8fb78c8
container name: sqream_single_host_master, container id: ea7eef80e038

Then, issue a stop command:

sqream-console> sqream worker --stop sqream_single_host_worker_1
stopped sqream_single_host_worker_1

3.4.3.2.3.5 Start workers with a different spool size

If no spool size is specified, the RAM is equally distributed among workers. Sometimes a system engineer may wish to specify the spool size manually. This example starts two workers, with a spool size of 50GB per node:


sqream-console> sqream worker --start 2 -m 50

3.4.3.2.3.6 Starting multiple workers on non-dedicated GPUs

By default, SQream DB assigns one worker per GPU. However, a system engineer may wish to assign multiple workers per GPU, if the workload permits it. This example starts 4 workers on 2 GPUs, with 50GB spool each:

sqream-console> sqream worker --start 2 -g 0 -m 50
started sqream_single_host_worker_0 on port 5000, allocated gpu: 0
started sqream_single_host_worker_1 on port 5001, allocated gpu: 0
sqream-console> sqream worker --start 2 -g 1 -m 50
started sqream_single_host_worker_2 on port 5002, allocated gpu: 1
started sqream_single_host_worker_3 on port 5003, allocated gpu: 1

3.4.3.2.3.7 Overriding default configuration files

It is possible to override default configuration settings by listing a configuration file for every worker. This example starts 2 workers on the same GPU, with modified configuration files:

sqream-console> sqream worker --start 2 -g 0 -j /etc/sqream/configfile.json /etc/sqream/configfile2.json

3.4.3.2.4 Client

The client operation runs sqream sql in interactive mode.

Note: The dockerized client is useful for testing and experimentation. It is not the recommended method for executing analytic queries. See more about connecting a third party tool to SQream DB for data analysis.

3.4.3.2.4.1 Syntax

sqream client

Flag/command   | Description
---------------+---------------------------------------------------------------------------
--master       | Connects to the master node via the load balancer
--worker       | Connects to a worker directly
--host         | Specifies the hostname to connect to. Defaults to localhost.
--port, -p     | Specifies the port to connect to. Defaults to 3108 when used with --master.
--user, -u     | Specifies the role’s username to use
--password, -w | Specifies the password to use for the role
--database, -d | Specifies the database name for the connection. Defaults to master.

3.4.3.2.4.2 Common usage

3.4.3.2.4.3 Start a client

Connect to the default master database through the load balancer:

sqream-console> sqream client --master -u sqream -w sqream
Interactive client mode
To quit, use ^D or \q.
master=> _

3.4.3.2.4.4 Start a client to a specific worker

Connect to the database raviga directly through a worker on port 5000:

sqream-console> sqream client --worker -u sqream -w sqream -p 5000 -d raviga
Interactive client mode
To quit, use ^D or \q.
raviga=> _

3.4.3.2.4.5 Start master node on different ports

sqream-console> sqream master --start -p 4105 -m 4108
starting master server in single_host mode ...
sqream_single_host_master is up and listening on ports: 4105,4108

3.4.3.2.4.6 Listing active master nodes and worker nodes

sqream-console> sqream master --list
container name: sqream_single_host_worker_1, container id: de9b8aff0a9c
container name: sqream_single_host_worker_0, container id: c919e8fb78c8
container name: sqream_single_host_master, container id: ea7eef80e038

3.4.3.2.5 Editor

The editor operation runs the web UI for the SQream DB Statement Editor. The editor can be used to run queries from a browser.

3.4.3.2.5.1 Syntax

sqream editor

Flag/command    Description
--start         Start the statement editor
--stop          Shut down the statement editor
--port, -p      Specify a different port for the editor. Defaults to 3000.

3.4.3.2.5.2 Common usage

3.4.3.2.5.3 Start the editor UI

sqream-console> sqream editor --start
access sqream statement editor through Chrome http://192.168.0.100:3000

3.4.3.2.5.4 Stop the editor UI

sqream-console> sqream editor --stop
sqream_editor stopped

3.4.3.3 Using the console to start SQream DB

The console is used to start and stop SQream DB components in a dockerized environment.

3.4.3.3.1 Starting a SQream DB cluster for the first time

To start a SQream DB cluster, start the master node, followed by the workers. The example below starts 2 workers, running on 2 dedicated GPUs.

sqream-console> sqream master --start
starting master server in single_host mode ...
sqream_single_host_master is up and listening on ports: 3105,3108

sqream-console> sqream worker --start 2
started sqream_single_host_worker_0 on port 5000, allocated gpu: 0
started sqream_single_host_worker_1 on port 5001, allocated gpu: 1

sqream-console> sqream editor --start
access sqream statement editor through Chrome http://192.168.0.100:3000

SQream DB is now listening on port 3108 for any incoming statements. A user can also access the web editor (running on port 3000 on the SQream DB machine) to connect and run queries.

3.4.4 sqream-installer

sqream-installer is an application that prepares and configures a dockerized SQream DB installation. This page serves as a reference for the options and parameters.

In this topic:

• Operations and flag reference
  – Command line flags
• Usage
  – Install SQream DB for the first time
  – Modify exposed directories
  – Install a new license package
  – View system settings
  – Upgrading to a new version of SQream DB

3.4.4.1 Operations and flag reference

3.4.4.1.1 Command line flags

Flag    Description
-i      Loads the docker images for installation
-k      Load new licenses from the license subdirectory
-K      Validate licenses
-f      Force overwrite any existing installation and data directories currently in use
-c      Specifies the path of the configuration directory (SQREAM_CONFIG_DIR in the system settings), for example /etc/sqream.
-v      Specifies the path where SQream DB stores data (SQREAM_VOLUME in the system settings).
-l      Specifies a path to store system startup logs. Defaults to /var/log/sqream
-d      Specifies a path to expose to SQream DB workers. To expose several paths, repeat the usage of this flag.
-s      Shows system settings
-r      Reset the system configuration. This flag can’t be combined with other flags.

3.4.4.2 Usage

3.4.4.2.1 Install SQream DB for the first time

Assuming the license package tarball has been placed in the license subfolder:

• The path where SQream DB will store data is /home/rhendricks/sqream_storage.
• Logs will be stored in /var/log/sqream.
• Source CSV, Parquet, and ORC files can be accessed from /home/rhendricks/source_data. All other directory paths are hidden from the Docker container.

# ./sqream-install -i -k -v /home/rhendricks/sqream_storage -l /var/log/sqream -c /etc/sqream -d /home/rhendricks/source_data

Note: Installation commands should be run with sudo or root access.

3.4.4.2.2 Modify exposed directories

To expose more directory paths for SQream DB to read data from and write data to, re-run the installer with additional directory flags.

# ./sqream-install -d /home/rhendricks/more_source_data

There is no need to specify the initial installation flags - only the modified exposed directory paths flag.

3.4.4.2.3 Install a new license package

Assuming the license package tarball has been placed in the license subfolder:

# ./sqream-install -k

3.4.4.2.4 View system settings

This information may be useful to identify problems accessing directory paths, or locating where data is stored.

# ./sqream-install -s
SQREAM_CONSOLE_TAG=1.7.4
SQREAM_TAG=2020.1
SQREAM_EDITOR_TAG=3.1.0
license_worker_0=[...]
license_worker_1=[...]
license_worker_2=[...]
license_worker_3=[...]
SQREAM_VOLUME=/home/rhendricks/sqream_storage
SQREAM_DATA_INGEST=/home/rhendricks/source_data
SQREAM_CONFIG_DIR=/etc/sqream/
LICENSE_VALID=true
SQREAM_LOG_DIR=/var/log/sqream/
SQREAM_USER=sqream
SQREAM_HOME=/home/sqream
SQREAM_ENV_PATH=/home/sqream/.sqream/env_file
PROCESSOR=x86_64
METADATA_PORT=3105
PICKER_PORT=3108
NUM_OF_GPUS=8
CUDA_VERSION=10.1
NVIDIA_SMI_PATH=/usr/bin/nvidia-smi
DOCKER_PATH=/usr/bin/docker
NVIDIA_DRIVER=418
SQREAM_MODE=single_host
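Because the settings are printed as KEY=VALUE pairs, a script can pick out a single value. The following is a minimal sketch only: the helper name get_setting is hypothetical, it assumes one KEY=VALUE pair per line, and it reads sample text copied from the output above instead of calling the installer.

```shell
# get_setting KEY: read KEY=VALUE lines on stdin and print the value of KEY.
# Hypothetical helper; assumes one KEY=VALUE pair per line.
get_setting() {
    # Split on the first "=" only, so values that contain "=" stay intact.
    awk -v key="$1" '
        {
            pos = index($0, "=")
            if (pos > 0 && substr($0, 1, pos - 1) == key) {
                print substr($0, pos + 1)
                exit
            }
        }'
}

# Sample text copied from the settings shown above.
sample_settings='SQREAM_VOLUME=/home/rhendricks/sqream_storage
SQREAM_DATA_INGEST=/home/rhendricks/source_data
METADATA_PORT=3105
PICKER_PORT=3108'

printf '%s\n' "$sample_settings" | get_setting PICKER_PORT
```

On a real installation the same function could read the installer output directly, for example ./sqream-install -s | get_setting SQREAM_VOLUME.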

3.4.4.2.5 Upgrading to a new version of SQream DB

When upgrading to a new version with Docker, most settings don’t need to be modified. The upgrade process replaces the existing docker images with new ones.

1. Obtain the new tarball, and untar it to an accessible location. Enter the newly extracted directory.

2. Install the new images:

# ./sqream-install -i

3. The upgrade process checks for running SQream DB processes. If any are found running, the installer asks to stop them in order to continue the upgrade process. Once all services are stopped, the new version is loaded.

4. After the upgrade, open sqream-console and restart the desired services.

3.4.5 server_picker

SQream DB’s load balancer is called server_picker. This page serves as a reference for the options and parameters.

3.4.5.1 Positional command line arguments

$ server_picker [ <Metadata server address> <Metadata server port> [ <TCP listen port> [ <SSL listen port> ] ] ]

Argument                   Default    Description
Metadata server address               IP or hostname to an active metadata server
Metadata server port                  TCP port to an active metadata server
TCP listen port            3108       TCP port for server picker to listen on
SSL listen port            3109       SSL port for server picker to listen on

3.4.5.2 Starting server picker

3.4.5.2.1 Starting temporarily

In general, you should not need to run server_picker manually, but it is sometimes useful for testing. Assuming we have a metadata server listening on the localhost, on port 3105:

$ nohup server_picker 127.0.0.1 3105 &
$ SP_PID=$!

Using nohup and & sends server picker to run in the background.

3.4.5.2.2 Starting temporarily with non-default port

Tell server picker to listen on port 2255 for unsecured connections, and port 2266 for SSL connections.

$ nohup server_picker 127.0.0.1 3105 2255 2266 &
$ SP_PID=$!

Using nohup and & sends server picker to run in the background.

3.4.5.2.3 Stopping server picker

$ kill -9 $SP_PID

Tip: It is safe to stop any SQream DB component at any time using kill. No partial data or data corruption should occur when using this method to stop the process.
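The start and stop commands above rely on a common shell pattern: start a process in the background, keep its PID from $!, and use that PID later to check on the process or stop it. The following is a minimal sketch of that pattern. It uses sleep as a stand-in for server_picker so that it runs on any host, and the helper name is_running is hypothetical.

```shell
# is_running PID: succeed if a process with this PID exists and can be signalled.
# Hypothetical helper; kill -0 delivers no signal, it only checks the PID.
is_running() {
    kill -0 "$1" 2>/dev/null
}

# Stand-in for: nohup server_picker 127.0.0.1 3105 &
sleep 30 &
SP_PID=$!

if is_running "$SP_PID"; then
    echo "process $SP_PID is running"
fi

# Stop it the same way as above, then wait so the shell reaps the process.
kill -9 "$SP_PID"
wait "$SP_PID" 2>/dev/null || true

if ! is_running "$SP_PID"; then
    echo "process $SP_PID has stopped"
fi
```

The wait call matters only inside a script: it lets the shell collect the exit status of the stopped process, so that the following check no longer finds the PID.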

3.4.6 SqreamStorage

SqreamStorage allows the creation of a new storage cluster. This page serves as a reference for the options and parameters.

3.4.6.1 Running SqreamStorage

SqreamStorage can be found in the bin directory of your SQream DB installation.

3.4.6.2 Command line arguments

SqreamStorage supports the following command line arguments:

Argument            Shorthand    Description
--create-cluster    -C           Creates a storage cluster at a specified path
--cluster-root      -r           Specifies the cluster path. The path must not already exist.

3.4.6.3 Examples

3.4.6.3.1 Create a new storage cluster

Create a new cluster at /home/rhendricks/raviga_database:

$ SqreamStorage --create-cluster --cluster-root /home/rhendricks/raviga_database
Setting cluster version to: 26

This message confirms that the cluster was created successfully. The command can also be written in shorthand as SqreamStorage -C -r /home/rhendricks/raviga_database.
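Because --cluster-root must point to a path that does not already exist, a script can check the path before calling SqreamStorage. The following is a minimal sketch; the helper name cluster_root_is_free is hypothetical, and the SqreamStorage call itself is left as a comment.

```shell
# cluster_root_is_free PATH: succeed only if nothing exists at PATH yet.
# Hypothetical helper; -e covers files and directories, -L also catches dangling symlinks.
cluster_root_is_free() {
    [ ! -e "$1" ] && [ ! -L "$1" ]
}

cluster_root=/home/rhendricks/raviga_database

if cluster_root_is_free "$cluster_root"; then
    echo "ok to create a cluster at $cluster_root"
    # SqreamStorage --create-cluster --cluster-root "$cluster_root"
else
    echo "refusing to continue: $cluster_root already exists" >&2
fi
```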

3.4.7 Sqream SQL CLI Reference

SQream DB comes with a built-in client for executing SQL statements either interactively or from the command-line. This page serves as a reference for the options and parameters. Learn more about using SQream DB SQL with the CLI by visiting the First steps with SQream DB tutorial.

In this topic:

• Installing Sqream SQL
  – Troubleshooting Sqream SQL Installation
• Using Sqream SQL
  – Running Commands Interactively (SQL shell)
  – Executing Batch Scripts (-f)
  – Executing Commands Immediately (-c)
• Examples
  – Starting a Regular Interactive Shell
  – Executing Statements in an Interactive Shell
  – Executing SQL Statements from the Command Line
  – Controlling the Client Output
    ∗ Exporting SQL Query Results to CSV
    ∗ Changing a CSV to a TSV
  – Executing a Series of Statements From a File
  – Connecting Using Environment Variables
  – Connecting to a Specific Queue
• Operations and Flag References
  – Command Line Arguments
    ∗ Supported Record Delimiters
  – Meta-Commands
  – Basic Commands
  – Moving Around the Command Line
  – Searching

3.4.7.1 Installing Sqream SQL

If you have a SQream DB installation on your server, sqream sql can be found in the bin directory of your SQream DB installation, under the name sqream.

Note: If you installed SQream DB via Docker, the command is named sqream-client sql, and can be found in the same location as the console.

Changed in version 2020.1: As of version 2020.1, ClientCmd has been renamed to sqream sql.

To run sqream sql on any other Linux host:

1. Download the sqream sql tarball package from the Client drivers for v2021.1 page.

2. Untar the package: tar xf sqream-sql-v2020.1.1_stable.x86_64.tar.gz

3. Start the client:

$ cd sqream-sql-v2020.1.1_stable.x86_64
$ ./sqream sql --port=5000 --username=jdoe --databasename=master
Password:

Interactive client mode
To quit, use ^D or \q.

master=> _

3.4.7.1.1 Troubleshooting Sqream SQL Installation

Upon running sqream sql for the first time, you may get the following error:

error while loading shared libraries: libtinfo.so.5: cannot open shared object file: No such file or directory

Solving this error requires installing the ncurses or libtinfo libraries, depending on your operating system.

• Ubuntu:

  1. Install libtinfo:

     $ sudo apt-get install -y libtinfo

  2. Depending on your Ubuntu version, you may need to create a symbolic link to the newer libtinfo that was installed. For example, if libtinfo was installed as /lib/x86_64-linux-gnu/libtinfo.so.6.2:

     $ sudo ln -s /lib/x86_64-linux-gnu/libtinfo.so.6.2 /lib/x86_64-linux-gnu/libtinfo.so.5

• CentOS / RHEL:

  1. Install ncurses:

     $ sudo yum install -y ncurses-libs

  2. Depending on your RHEL version, you may need to create a symbolic link to the newer libtinfo that was installed. For example, if libtinfo was installed as /usr/lib64/libtinfo.so.6:

     $ sudo ln -s /usr/lib64/libtinfo.so.6 /usr/lib64/libtinfo.so.5
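Before creating a symbolic link, it can help to confirm that libtinfo.so.5 really is missing and that a newer libtinfo is present. The following is a minimal sketch of such a check; the helper name needs_libtinfo_link is hypothetical, and the two directories are the ones used in the examples above.

```shell
# needs_libtinfo_link DIR: succeed if DIR has no libtinfo.so.5 but does contain
# another libtinfo.so.* file, which is the case where a symbolic link can help.
# Hypothetical helper; DIR is a library directory such as /usr/lib64.
needs_libtinfo_link() {
    lib_dir=$1
    # Nothing to do if the library the client asks for is already present.
    [ -e "$lib_dir/libtinfo.so.5" ] && return 1
    # Look for any other installed version, for example libtinfo.so.6.
    for candidate in "$lib_dir"/libtinfo.so.*; do
        [ -e "$candidate" ] && return 0
    done
    return 1
}

for dir in /lib/x86_64-linux-gnu /usr/lib64; do
    if needs_libtinfo_link "$dir"; then
        echo "$dir: libtinfo.so.5 is missing, a newer libtinfo is present"
    fi
done
```

The check only reports; creating the link is left to the ln -s commands shown above.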

3.4.7.2 Using Sqream SQL

By default, sqream sql runs in interactive mode. You can issue commands or SQL statements.

3.4.7.2.1 Running Commands Interactively (SQL shell)

When starting sqream sql, after entering your password, you are presented with the SQL shell. To exit the shell, type \q or Ctrl-d.

$ sqream sql --port=5000 --username=jdoe --databasename=master
Password:

Interactive client mode
To quit, use ^D or \q.

master=> _

The database name shown means you are now ready to run statements and queries. Statements and queries are standard SQL, followed by a semicolon (;). Statement results are usually formatted as a valid CSV, followed by the number of rows and the elapsed time for that statement.

master=> SELECT TOP 5 * FROM nba;
Avery Bradley ,Boston Celtics ,0,PG,25,6-2 ,180,Texas ,7730337
Jae Crowder ,Boston Celtics ,99,SF,25,6-6 ,235,Marquette ,6796117
John Holland ,Boston Celtics ,30,SG,27,6-5 ,205,Boston University ,\N
R.J. Hunter ,Boston Celtics ,28,SG,22,6-5 ,185,Georgia State ,1148640
Jonas Jerebko ,Boston Celtics ,8,PF,29,6-10,231,\N,5000000
5 rows
time: 0.001185s

Note: Null values are represented as \N.

When writing long statements and queries, it may be beneficial to use line breaks. The prompt for a multi-line statement changes from => to . to alert users to the change. The statement will not execute until a semicolon is used.

$ sqream sql --port=5000 --username=mjordan -d master
Password:

Interactive client mode
To quit, use ^D or \q.

master=> SELECT "Age",
. AVG("Salary")
. FROM NBA
. GROUP BY 1
. ORDER BY 2 ASC
. LIMIT 5
. ;
38,1840041
19,1930440
23,2034746
21,2067379
36,2238119
5 rows
time: 0.009320s

3.4.7.2.2 Executing Batch Scripts (-f)

To run an SQL script, use the -f argument. For example,

$ sqream sql --port=5000 --username=jdoe -d master -f sql_script.sql --results-only

Tip: Output can be saved to a file by using redirection (>).

3.4.7.2.3 Executing Commands Immediately (-c)

To run a statement from the console, use the -c argument. For example,

$ sqream sql --port=5000 --username=jdoe -d nba -c "SELECT TOP 5 * FROM nba"
Avery Bradley ,Boston Celtics ,0,PG,25,6-2 ,180,Texas ,7730337
Jae Crowder ,Boston Celtics ,99,SF,25,6-6 ,235,Marquette ,6796117
John Holland ,Boston Celtics ,30,SG,27,6-5 ,205,Boston University ,\N
R.J. Hunter ,Boston Celtics ,28,SG,22,6-5 ,185,Georgia State ,1148640
Jonas Jerebko ,Boston Celtics ,8,PF,29,6-10,231,\N,5000000
5 rows
time: 0.202618s

Tip: Remove the timing and row count by passing the --results-only parameter

3.4.7.3 Examples

3.4.7.3.1 Starting a Regular Interactive Shell

Connect to local server 127.0.0.1 on port 5000, to the default built-in database, master:

$ sqream sql --port=5000 --username=mjordan -d master
Password:

Interactive client mode
To quit, use ^D or \q.

master=>_

Connect to local server 127.0.0.1 via the built-in load balancer on port 3108, to the default built-in database, master:

$ sqream sql --port=3108 --clustered --username=mjordan -d master
Password:

Interactive client mode
To quit, use ^D or \q.

master=>_

3.4.7.3.2 Executing Statements in an Interactive Shell

Note that all SQL commands end with a semicolon.

Creating a new database and switching over to it without reconnecting:

$ sqream sql --port=3105 --clustered --username=oldmcd -d master
Password:

Interactive client mode
To quit, use ^D or \q.

master=> create database farm;
executed
time: 0.003811s
master=> \c farm
farm=>
farm=> create table animals(id int not null, name varchar(30) not null, is_angry bool not null);
executed
time: 0.011940s
farm=> insert into animals values(1,'goat',false);
executed
time: 0.000405s
farm=> insert into animals values(4,'bull',true);
executed
time: 0.049338s
farm=> select * from animals;
1,goat ,0
4,bull ,1
2 rows
time: 0.029299s

3.4.7.3.3 Executing SQL Statements from the Command Line

$ sqream sql --port=3105 --clustered --username=oldmcd -d farm -c "SELECT * FROM animals WHERE is_angry = true"
4,bull ,1
1 row
time: 0.095941s

3.4.7.3.4 Controlling the Client Output

Two parameters control the display of results from the client:

• --results-only - removes row counts and timing information
• --delimiter - changes the record delimiter

3.4.7.3.4.1 Exporting SQL Query Results to CSV

Using the --results-only flag removes the row counts and timing.

$ sqream sql --port=3105 --clustered --username=oldmcd -d farm -c "SELECT * FROM animals" --results-only > file.csv
$ cat file.csv
1,goat ,0
2,sow ,0
3,chicken ,0
4,bull ,1

3.4.7.3.4.2 Changing a CSV to a TSV

The --delimiter parameter accepts any printable character.

Tip: To insert a tab in Bash, press Ctrl-V followed by Tab.

$ sqream sql --port=3105 --clustered --username=oldmcd -d farm -c "SELECT * FROM animals" --delimiter '	' > file.tsv
$ cat file.tsv
1	goat	0
2	sow	0
3	chicken	0
4	bull	1

3.4.7.3.5 Executing a Series of Statements From a File

Assuming a file containing SQL statements (separated by semicolons):

$ cat some_queries.sql
CREATE TABLE calm_farm_animals
( id INT IDENTITY(0, 1), name VARCHAR(30)
);

INSERT INTO calm_farm_animals (name)
SELECT name FROM animals WHERE is_angry = false;

$ sqream sql --port=3105 --clustered --username=oldmcd -d farm -f some_queries.sql
executed
time: 0.018289s
executed
time: 0.090697s

3.4.7.3.6 Connecting Using Environment Variables

You can save connection parameters as environment variables:

$ export SQREAM_USER=sqream;
$ export SQREAM_DATABASE=farm;
$ sqream sql --port=3105 --clustered --username=$SQREAM_USER -d $SQREAM_DATABASE
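When connection parameters come from the environment, it is often convenient to fall back to a default when a variable is not set. The following is a minimal sketch that only prints the resulting command line so that it can be inspected before it is run. The function name and the SQREAM_PORT variable are hypothetical, and the fallback values (port 3108 for the load balancer, user sqream, database master) are assumptions taken from the defaults mentioned elsewhere in this reference.

```shell
# build_sqream_sql_command: print the sqream sql command line that would be run.
# SQREAM_PORT is a hypothetical variable name; SQREAM_USER and SQREAM_DATABASE
# are the variables used above. A default applies when a variable is unset or empty.
build_sqream_sql_command() {
    printf 'sqream sql --port=%s --clustered --username=%s -d %s\n' \
        "${SQREAM_PORT:-3108}" \
        "${SQREAM_USER:-sqream}" \
        "${SQREAM_DATABASE:-master}"
}

SQREAM_USER=sqream
SQREAM_DATABASE=farm
build_sqream_sql_command
```

The ${VAR:-default} form leaves the variable itself untouched, so values exported in the calling shell always take precedence over the fallbacks.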

3.4.7.3.7 Connecting to a Specific Queue

When using the dynamic workload manager, connect to the etl queue instead of the default sqream queue:

$ sqream sql --port=3105 --clustered --username=mjordan -d master --service=etl
Password:

Interactive client mode
To quit, use ^D or \q.

master=>_

3.4.7.4 Operations and Flag References

3.4.7.4.1 Command Line Arguments

Sqream SQL supports the following command line arguments:

Argument                Default      Description
-c or --command         None         Changes the mode of operation to single-command, non-interactive. Use this argument to run a statement and immediately exit.
-f or --file            None         Changes the mode of operation to multi-command, non-interactive. Use this argument to run a sequence of statements from an external file and immediately exit.
--host                  127.0.0.1    Address of the SQream DB worker.
--port                  5000         Sets the connection port.
--databasename or -d    None         Specifies the database name for queries and statements in this session.
--username              None         Username to connect to the specified database.
--password              None         Specify the password using the command line argument. If not specified, the client will prompt the user for the password.
--clustered             False        When used, the client connects to the load balancer, usually on port 3108. If not set, the client assumes the connection is to a standalone SQream DB worker.
--service               sqream       Service name (queue) that statements will file into.
--results-only          False        Outputs results only, without timing information and row counts
--no-history            False        When set, prevents command history from being saved in ~/.sqream/clientcmdhist
--delimiter             ,            Specifies the field separator. By default, sqream sql outputs valid CSVs. Change the delimiter to modify the output to another delimited format (e.g. TSV, PSV). See the section supported record delimiters below for more information.

Tip: Run $ sqream sql --help to see a full list of arguments

3.4.7.4.1.1 Supported Record Delimiters

The supported record delimiters are printable ASCII values (32-126).

• Recommended delimiters for use are: ,, |, tab character.
• The following characters are not supported: \, N, -, :, ", \n, \r, ., lower-case latin letters, digits (0-9)
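The rules above can be checked in a script before a delimiter is passed to --delimiter. The following is a minimal sketch of such a check; the function name is_supported_delimiter is hypothetical. It accepts the tab character explicitly, because the list above recommends it even though it falls outside the 32-126 range.

```shell
# is_supported_delimiter CHAR: succeed if CHAR is allowed by the rules listed above.
# Hypothetical helper, not part of sqream sql.
is_supported_delimiter() {
    d=$1
    # Exactly one character is expected.
    [ "${#d}" -eq 1 ] || return 1
    # The tab character is listed as recommended, so accept it explicitly.
    [ "$d" = "$(printf '\t')" ] && return 0
    # printf with a leading quote prints the numeric code of the character.
    code=$(printf '%d' "'$d")
    [ "$code" -ge 32 ] && [ "$code" -le 126 ] || return 1
    # Characters that the list above names as unsupported.
    case $d in
        "\\" | N | - | : | '"' | .) return 1 ;;
        [abcdefghijklmnopqrstuvwxyz] | [0123456789]) return 1 ;;
    esac
    return 0
}

for candidate in ',' '|' ';' 'N' 'x' '5'; do
    if is_supported_delimiter "$candidate"; then
        echo "'$candidate' can be used as a delimiter"
    else
        echo "'$candidate' is not supported"
    fi
done
```

The letters and digits are spelled out in the patterns instead of using ranges such as a-z, so that the result does not depend on the locale of the shell.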

3.4.7.4.2 Meta-Commands

• Meta-commands in Sqream SQL start with a backslash (\)

Note: Meta commands do not end with a semicolon

Command           Example            Description
\q or \quit       master=> \q        Quit the client. (Same as Ctrl-d)
\c or \connect    master=> \c fox    Changes the current connection to an alternate database
                  fox=>

3.4.7.4.3 Basic Commands

3.4.7.4.4 Moving Around the Command Line

Command    Description
Ctrl-a     Goes to the beginning of the command line.
Ctrl-e     Goes to the end of the command line.
Ctrl-u     Deletes from the cursor to the beginning of the command line.
Ctrl-k     Deletes from the cursor to the end of the command line.
Ctrl-w     Deletes from the cursor to the beginning of a word.
Ctrl-y     Pastes a word or text that was cut using one of the deletion shortcuts (such as the one above) after the cursor.
Alt-b      Moves back one word (or goes to the beginning of the word where the cursor is).
Alt-f      Moves forward one word (or goes to the end of the word where the cursor is).
Alt-d      Deletes to the end of a word starting at the cursor. Deletes the whole word if the cursor is at the beginning of that word.
Alt-c      Capitalizes letters in a word starting at the cursor. Capitalizes the whole word if the cursor is at the beginning of that word.
Alt-u      Capitalizes from the cursor to the end of the word.
Alt-l      Makes lowercase from the cursor to the end of the word.
Ctrl-f     Moves forward one character.
Ctrl-b     Moves backward one character.
Ctrl-h     Deletes characters located before the cursor.
Ctrl-t     Swaps a character at the cursor with the previous character.

3.4.7.4.5 Searching

Command    Description
Ctrl-r     Searches the history backward.
Ctrl-g     Escapes from history-searching mode.
Ctrl-p     Searches the previous command in history.
Ctrl-n     Searches the next command in history.

3.4.8 upgrade_storage

upgrade_storage is used to upgrade metadata schemas when upgrading between major versions. This page serves as a reference for the options and parameters.

3.4.8.1 Running upgrade_storage

upgrade_storage can be found in the bin directory of your SQream DB installation.

3.4.8.2 Command line arguments

upgrade_storage takes one positional argument:

$ upgrade_storage <storage path>

Argument        Required    Description
Storage path    ✓           Full path to a valid storage cluster

3.4.8.3 Results and error codes

Result                         Message                                                 Description
Success                        storage has been upgraded successfully to version 26    Storage has been successfully upgraded
Success                        no need to upgrade                                      Storage doesn’t need an upgrade
Failure: can’t read storage    levelDB is in use by another application                Check permissions, and ensure no SQream DB workers or metadata_server are running when performing this operation.
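In an upgrade script, the messages in the table above can be mapped to a short status. The following is a minimal sketch; the function name classify_upgrade_message and the status words it prints are hypothetical, and it recognizes only the three messages listed in the table.

```shell
# classify_upgrade_message MESSAGE: map a message from the table above to a short status.
# Hypothetical helper; any other text is reported as "unknown".
classify_upgrade_message() {
    case $1 in
        *"storage has been upgraded successfully"*)   echo "upgraded" ;;
        *"no need to upgrade"*)                       echo "up-to-date" ;;
        *"levelDB is in use by another application"*) echo "failed" ;;
        *)                                            echo "unknown" ;;
    esac
}

classify_upgrade_message "storage has been upgraded successfully to version 26"
classify_upgrade_message "no need to upgrade"
classify_upgrade_message "levelDB is in use by another application"
```

Assuming the result message is the last line that upgrade_storage prints, a script could pass it in with classify_upgrade_message "$(./upgrade_storage /path/to/cluster | tail -n 1)".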

3.4.8.4 Examples

3.4.8.4.1 Upgrade SQream DB’s storage cluster

$ ./upgrade_storage /home/rhendricks/raviga_database
get_leveldb_version path{/home/rhendricks/raviga_database}
current storage version 23
upgrade_v24
upgrade_storage to 24
upgrade_storage to 24 - Done
upgrade_v25
upgrade_storage to 25
upgrade_storage to 25 - Done
upgrade_v26
upgrade_storage to 26
upgrade_storage to 26 - Done
validate_leveldb
storage has been upgraded successfully to version 26

This message confirms that the cluster has been upgraded correctly.

3.5 SQL Feature Checklist

To understand which ANSI SQL and other SQL features SQream DB supports, use the tables below.

In this topic:

• Data types and values
• Constraints
• Transactions
• Indexes
• Schema changes
• Statements
• Clauses
• Table expressions
• Scalar expressions
• Permissions
• Extra functionality

3.5.1 Data types and values

Read more about supported data types.

Table 58: Value

Item                   Supported    Further information
BOOL                   ✓            Boolean values
TINYINT                ✓            Unsigned 1 byte integer (0 - 255)
SMALLINT               ✓            2 byte integer (-32,768 - 32,767)
INT                    ✓            4 byte integer (-2,147,483,648 - 2,147,483,647)
BIGINT                 ✓            8 byte integer (-9,223,372,036,854,775,808 - 9,223,372,036,854,775,807)
REAL                   ✓            4 byte floating point
DOUBLE, FLOAT          ✓            8 byte floating point
DECIMAL, NUMERIC       ✓            Fixed-point numbers.
VARCHAR                ✓            Variable length string - ASCII only
TEXT, NVARCHAR         ✓            Variable length string - UTF-8 encoded
DATE                   ✓            Date
DATETIME, TIMESTAMP    ✓            Date and time
NULL                   ✓            NULL values
TIME                   ✗            Can be stored as a text string or as part of a DATETIME

3.5.2 Constraints

Table 59: Constraints

Item              Supported    Further information
Not null          ✓            NOT NULL
Default values    ✓            DEFAULT
AUTO INCREMENT    ✓            Different name: IDENTITY

3.5.3 Transactions

SQream DB treats each statement as an auto-commit transaction. Each transaction is isolated from other transactions with serializable isolation. If a statement fails, the entire transaction is cancelled and rolled back. The database is unchanged. Read more about transactions in SQream DB.

3.5.4 Indexes

SQream DB has a range-index collected on all columns as part of the metadata collection process. SQream DB does not support explicit indexing, but does support clustering keys. Read more about clustering keys and our metadata system.

3.5.5 Schema changes

Table 60: Schema changes

Item                             Supported    Further information
ALTER TABLE                      ✓            ALTER TABLE - Add column, alter column, drop column, rename column, rename table, modify clustering keys
Rename database                  ✗
Rename table                     ✓            RENAME TABLE
Rename column                    ✓            RENAME COLUMN
Add column                       ✓            ADD COLUMN
Remove column                    ✓            DROP COLUMN
Alter column data type           ✗
Add / modify clustering keys     ✓            CLUSTER BY
Drop clustering keys             ✓            DROP CLUSTERING KEY
Add / Remove constraints         ✗
Rename schema                    ✗
Drop schema                      ✓            DROP SCHEMA
Alter default schema per user    ✓            ALTER DEFAULT SCHEMA

3.5.6 Statements

Table 61: Statements

Item                               Supported    Further information
SELECT                             ✓            SELECT
CREATE TABLE                       ✓            CREATE TABLE
CREATE FOREIGN / EXTERNAL TABLE    ✓            CREATE FOREIGN TABLE
DELETE                             ✓            Deleting Data
INSERT                             ✓            INSERT, COPY FROM
TRUNCATE                           ✓            TRUNCATE
UPDATE                             ✗
VALUES                             ✓            VALUES

3.5.7 Clauses

Table 62: Clauses

Item                 Supported    Further information
LIMIT / TOP          ✓
LIMIT with OFFSET    ✗
WHERE                ✓
HAVING               ✓
OVER                 ✓

3.5.8 Table expressions

Table 63: Table expressions

Item                                                         Supported    Further information
Tables, Views                                                ✓
Aliases, AS                                                  ✓
JOIN - INNER, LEFT [ OUTER ], RIGHT [ OUTER ], CROSS         ✓
Table expression subqueries                                  ✓
Scalar subqueries                                            ✗

3.5.9 Scalar expressions

Read more about Scalar expressions.

Table 64: Scalar expressions

Item                            Supported    Further information
Common functions                ✓            CURRENT_TIMESTAMP, SUBSTRING, TRIM, EXTRACT, etc.
Comparison operators            ✓            <, <=, >, >=, =, <>, !=, IS, IS NOT
Boolean operators               ✓            AND, NOT, OR
Conditional expressions         ✓            CASE .. WHEN
Conditional functions           ✓            COALESCE
Pattern matching                ✓            LIKE, RLIKE, ISPREFIXOF, CHARINDEX, PATINDEX
REGEX POSIX pattern matching    ✓            RLIKE, REGEXP_COUNT, REGEXP_INSTR, REGEXP_SUBSTR
EXISTS                          ✗
IN, NOT IN                      Partial      Literal values only
Bitwise arithmetic              ✓            &, |, XOR, ~, >>, <<

3.5.10 Permissions

Read more about Access control in SQream DB.

Table 65: Permissions

Item                               Supported    Further information
Roles as users and groups          ✓
Object default permissions         ✓
Column / Row based permissions     ✗
Object ownership                   ✗

3.5.11 Extra functionality

Table 66: Extra functionality

Item                                             Supported    Further information
Information schema                               ✓            Catalog reference
Views                                            ✓            CREATE VIEW
Window functions                                 ✓            Window Functions
CTEs                                             ✓            Common table expressions (CTEs)
Saved queries, Saved queries with parameters     ✓            Saved queries
Sequences                                        ✓            Identity

CHAPTER 4

Releases

Version                 Release Date
What’s New in 2021.2    September 13, 2021
What’s New in 2021.1    June 13, 2021
What’s New in 2020.3    October 8, 2020
What’s New in 2020.2    July 22, 2020
What’s New in 2020.1    January 15, 2020

4.1 What’s New in 2021.2

The 2021.2 release notes were released on 9/13/2021.

4.1.1 New Features

The 2021.2 Release Notes include the following new features:

• New Driver Compatibility
• Centralized Configuration System
• Qualifying Schemas Without Providing an Alias
• Double-Quotations Supported When Importing and Exporting CSVs

4.1.1.1 New Driver Compatibility

The 2021.2 release supports the following drivers:

• JDBC - new driver version (JDBC 4.5) with important bug fixes.
• ODBC - ODBC 4.1.1, available on request.
• NodeJS - all versions starting with NodeJS 4.0. SQream recommends the latest version (NodeJS 4.2.4).
• Dot Net - SQream recommends version 3.02 (compatible with DotNet version 48).
• Pysqream - pysqream 3.1.2

4.1.1.2 Centralized Configuration System

SQream now uses a new configuration system based on centralized configuration accessible from SQream Studio. For more information, see the following:

• Configuration - describes how to configure your instance of SQream from a centralized location.
• SQream Studio 5.4.0 - configure your instance of SQream from Studio.

4.1.1.3 Qualifying Schemas Without Providing an Alias

When running queries, SQream now supports qualifying schemas without providing an alias. For more information, see CREATE SCHEMA.

4.1.1.4 Double-Quotations Supported When Importing and Exporting CSVs

When importing and exporting CSVs, SQream now supports using quotation characters other than double quotation marks ("). For more information, see the following:

• COPY FROM
• COPY TO

Note the following:

• Leaving the quotation character unspecified uses the default value, the standard double quotation mark (").

• The quotation character must be a single, 1-byte printable ASCII character. The same octal syntax of the copy command can be used.

• The quote character cannot be contained in the field delimiter, record delimiter, or null marker.

• Double-quotations can be customized when the csv_fdw value is used with the COPY FROM and CREATE FOREIGN TABLE statements.

• The default escape character always matches the quote character, and can be overridden by using the ESCAPE = {'\\' | E'\XXX'} syntax as shown in the following examples:

copy t from wrapper csv_fdw options (location = '/tmp/file.csv', escape='\\');

copy t from wrapper csv_fdw options (location = '/tmp/file.csv', escape=E'\017');

copy t to wrapper csv_fdw options (location = '/tmp/file.csv', escape='\\');

For more information, see the following statements:

• COPY FROM - Loading data from files on the filesystem and importing it into SQream tables.

• CREATE FOREIGN TABLE - Creating a new foreign table in an existing database.
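The constraints on the quotation character listed in this section can be checked before a COPY statement is generated. The following is a minimal sketch of such a check; the function name quote_char_is_valid is hypothetical, and the delimiter and null marker values passed to it are example values only.

```shell
# quote_char_is_valid QUOTE FIELD_DELIM RECORD_DELIM NULL_MARKER
# Hypothetical helper that applies the rules listed in this section:
#   - QUOTE is a single printable ASCII character (codes 32-126)
#   - QUOTE does not appear in the field delimiter, record delimiter, or null marker
quote_char_is_valid() {
    quote=$1
    [ "${#quote}" -eq 1 ] || return 1
    # printf with a leading quote prints the numeric code of the character.
    code=$(printf '%d' "'$quote")
    [ "$code" -ge 32 ] && [ "$code" -le 126 ] || return 1
    for other in "$2" "$3" "$4"; do
        case $other in
            *"$quote"*) return 1 ;;
        esac
    done
    return 0
}

# Example values: comma as field delimiter, newline as record delimiter, \N as null marker.
newline='
'
for candidate in '"' "'" ',' 'N'; do
    if quote_char_is_valid "$candidate" ',' "$newline" '\N'; then
        echo "[$candidate] can be used as the quotation character"
    else
        echo "[$candidate] conflicts with the other settings"
    fi
done
```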

4.2 What’s New in 2021.1

The 2021.1 Release Notes describe the following releases:

• Release Notes Version 2021.1.2
• Release Notes Version 2021.1.1
• Release Notes Version 2021.1

4.2.1 Release Notes Version 2021.1.2

The 2021.1.2 release notes were released on 8/9/2021.

4.2.1.1 New Features

The 2021.1.2 Release Notes include the following new features:

• Aliases Added to SUBSTRING Function and Length Argument
• Data Type Aliases Added
• String Literals Containing ASCII Characters Interpreted as TEXT
• Decimal Literals Interpreted as Numeric Columns
• Roles Area Added to Studio Version 5.3.3

4.2.1.1.1 Aliases Added to SUBSTRING Function and Length Argument

The following aliases have been added:

• length - len
• substring - substr

4.2.1.1.2 Data Type Aliases Added

The following data type aliases have been added:

• INTEGER - int
• DECIMAL - numeric
• DOUBLE PRECISION - double
• CHARACTER/CHAR - text
• NATIONAL CHARACTER/NATIONAL CHAR/NCHAR - text
• CHARACTER VARYING/CHAR VARYING - text
• NATIONAL CHARACTER VARYING/NATIONAL CHAR VARYING/NCHAR VARYING - text

4.2.1.1.3 String Literals Containing ASCII Characters Interpreted as TEXT

SQream now interprets all string literals, including those containing ASCII characters, as text. For more information, see String Types.

4.2.1.1.4 Decimal Literals Interpreted as Numeric Columns

SQream now interprets literals containing decimal points as numeric instead of as double. For more information, see Data Types.

4.2.1.1.5 Roles Area Added to Studio Version 5.3.3

The Roles area has been added to Studio version 5.3.3. From the Roles area users can create and assign roles and manage user permissions.

4.2.1.2 Resolved Issues

The following list describes the resolved issues:

• In Parquet files, float columns could not be mapped to SQream double columns. This was fixed.
• The REPLACE function only supported constant values as arguments. This was fixed.
• The LIKE function did not check for incorrect patterns or handle escape characters. This was fixed.

4.2.2 Release Notes Version 2021.1.1

The 2021.1.1 release notes were released on 7/27/2021.

4.2.2.1 New Features

The 2021.1.1 Release Notes include the following new features:

• Complete Ranking Function Support

4.2.2.1.1 Complete Ranking Function Support

SQream now supports the following new ranking functions:

FunctionReturn Type Description first_valueSame type as value Returns the value in the first row of a window. last_valueSame type as value Returns the value in the last row of a window. nth_valueSame type as value Returns the value in a specified (n) row of a window. if the specified row does not exist, this function returns NULL. dense_rankbigint Returns the rank of the current row with no gaps. percent_rankdouble Returns the relative rank of the current row. cume_distdouble Returns the cumulative distribution of rows. ntile(buckets)integer Returns an integer ranging between 1 and the argument value, dividing the partitions as equally as possible.

For more information, navigate to Window Functions and scroll to the Ranking Functions table.
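
To make the descriptions of dense_rank, percent_rank, cume_dist, and ntile concrete, the following sketch computes them in plain Python for a single window ordered by ascending value. It follows the standard SQL definitions, which are assumed (not verified here) to match SQream's behavior; the function names simply mirror the SQL names, and none of this code talks to a database.

```python
def rank(values):
    """RANK(): 1 + the number of rows ordered strictly before the current row."""
    ordered = sorted(values)
    return [1 + ordered.index(v) for v in values]


def dense_rank(values):
    """DENSE_RANK(): rank among distinct values, so ties leave no gaps."""
    position = {v: i + 1 for i, v in enumerate(sorted(set(values)))}
    return [position[v] for v in values]


def percent_rank(values):
    """PERCENT_RANK(): (rank - 1) / (rows - 1), or 0 for a single-row window."""
    n = len(values)
    if n <= 1:
        return [0.0] * n
    return [(r - 1) / (n - 1) for r in rank(values)]


def cume_dist(values):
    """CUME_DIST(): fraction of rows with a value <= the current row's value."""
    n = len(values)
    return [sum(1 for other in values if other <= v) / n for v in values]


def ntile(row_count, buckets):
    """NTILE(buckets): bucket number for each row, in window order.

    Rows are split as equally as possible; the first (row_count % buckets)
    buckets receive one extra row.
    """
    base, extra = divmod(row_count, buckets)
    result = []
    for bucket in range(1, buckets + 1):
        size = base + (1 if bucket <= extra else 0)
        result.extend([bucket] * size)
    return result


scores = [10, 20, 20, 30]
print(dense_rank(scores))     # [1, 2, 2, 3]
print(cume_dist(scores))      # [0.25, 0.75, 0.75, 1.0]
print(ntile(len(scores), 3))  # [1, 1, 2, 3]
```

Note how the tied value 20 shares rank 2 and the next row receives rank 3 under dense_rank, whereas a plain rank would skip to 4.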

4.2.2.2 Resolved Issues

The following list describes the resolved issues:
• SQream did not support exporting and reading Int64 columns as bigint in Parquet. This was fixed.
• The Decimal column was not supported when inserting data from Parquet files. This was fixed.
• Values in Parquet Numeric columns were not being converted correctly. This was fixed.
• Converting string data type to datetime was not working correctly. This was fixed.
• Casting datetime to text truncated the time. This was fixed.

4.2.3 Release Notes Version 2021.1

The 2021.1 release notes were released on 6/13/2021.

4.2.3.1 Version Content

The 2021.1 Release Notes describe the following:
• Major feature release targeted for all on-premises customers.
• Basic Cloud functionality.

4.2.3.2 New Features

The 2021.1 Release Notes include the following new features:

• SQream DB on Cloud
• Numeric Data Types
• Text Data Type
• Supports Scalar Subqueries
• Literal Arguments
• Simple Scalar SQL UDFs
• Logging Enhancements
• Improved Presented License Information
• Optimized Foreign Data Wrapper Export

4.2.3.2.1 SQream DB on Cloud

SQream DB can now be run on AWS, GCP, and Azure.

4.2.3.2.2 Numeric Data Types

SQream now supports Numeric Data types for the following operations:
• All join types.
• All aggregation types (not including Window functions).
• Scalar functions (not including some trigonometric and logarithmic functions).
For more information, see Numeric Data Types.

4.2.3.2.3 Text Data Type

SQream now supports TEXT data types in all operations. TEXT is the default string data type for new projects.
• SQream supports VARCHAR functionality, but recommends using TEXT.
• TEXT data enhancements introduced in Release Notes version 2020.3.1:
  – Support text columns in queries with multiple distinct aggregates.
  – Text literal support for all functions.
For more information, see String Types.

4.2.3.2.4 Supports Scalar Subqueries

SQream now supports running initial scalar subqueries. For more information, see Subqueries.

4.2.3.2.5 Literal Arguments

SQream now supports literal arguments for functions in all cases where column/scalar arguments are supported.

4.2.3.2.6 Simple Scalar SQL UDFs

SQream now supports simple scalar SQL UDFs. For more information, see Simple Scalar SQL UDF’s.

4.2.3.2.7 Logging Enhancements

Log information has been added for the following events:
• Compilation start time.
• When the first metadata callback in the compiler occurred (if relevant).
• When the last metadata callback in the compiler occurred (if relevant).
• When the log started attempting to apply locks.
• When a statement entered the queue.
• When a statement exited the queue.
• When a client has connected to an instance of sqreamd (if it reconnects).
• When the log started executing.

4.2.3.2.8 Improved Presented License Information

SQream now displays license information, including data size limitations, expiration date, and license type, through the new get_license_info() Utility Function (UF). For more information, see GET_LICENSE_INFO.

4.2.3.2.9 Optimized Foreign Data Wrapper Export

SQream now supports exporting to multiple files concurrently. This is useful when you need to reduce file size to more easily export multiple files. The following is the correct syntax for exporting multiple files concurrently:

COPY table_name TO fdw_name OPTIONS(max_file_size=size_in_bytes,enforce_single_file={TRUE|FALSE});

The following is an example of the correct syntax for exporting multiple files concurrently:

COPY my_table1 TO my_ext_table OPTIONS(max_file_size=500000,enforce_single_file=TRUE);

The following apply:
• Both of the parameters in the above example are optional.
• The max_file_size value is specified in bytes and can be any positive value. The default value is 16*2^20 (16MB).
• When the enforce_single_file value is set to TRUE, only one file is created, and its size is not limited by the max_file_size value. Its default value is TRUE.
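
As an illustration of how the two options combine, the following sketch builds the COPY statement text in Python using the syntax shown above. The helper build_copy_to and the constant DEFAULT_MAX_FILE_SIZE are hypothetical names introduced for this example, and the table and foreign data wrapper names are taken from the example above. The helper only formats a string; it does not connect to SQream, and it assumes the identifiers passed in are trusted.

```python
# Default stated in the release notes: 16*2^20 bytes (16MB).
DEFAULT_MAX_FILE_SIZE = 16 * 2**20


def build_copy_to(table_name, fdw_name,
                  max_file_size=DEFAULT_MAX_FILE_SIZE,
                  enforce_single_file=True):
    """Return a COPY ... TO statement with both options spelled out.

    Defaults mirror the documented defaults. Identifiers are inserted
    as-is, so callers must pass trusted names only.
    """
    if max_file_size <= 0:
        raise ValueError("max_file_size must be a positive number of bytes")
    flag = "TRUE" if enforce_single_file else "FALSE"
    return (
        f"COPY {table_name} TO {fdw_name} "
        f"OPTIONS(max_file_size={max_file_size},enforce_single_file={flag});"
    )


# Per the description above, enforce_single_file=TRUE produces one file whose
# size is not limited, so a size-limited multi-file export would use FALSE.
statement = build_copy_to("my_table1", "my_ext_table",
                          max_file_size=500000, enforce_single_file=False)
print(statement)
```

Spelling out both options, even when they match the defaults, keeps the intended export behavior visible in the statement itself.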

4.2.3.3 Main Features

The following list describes the main features:
• SQreamDB available on AWS.
• SQreamDB available on GCP.
• SQreamDB available on Azure.
• SQream uses storage located on Object Store (as opposed to local disks) for the above three cloud providers.
• SQream now supports MicroStrategy.
• Supports MVP licensing system.
• A new literal syntax containing character escape semantics for string literals has been added.
• Supports optimizing exporting foreign data wrappers.
• Supports truncating Numeric values when ingested from ORC and CSV files.
• Supports catalog Utility Function that accepts valid SQL patterns and escape characters.
• Supports creating a basic random data foreign data wrapper for non-text types.
• The new foreign data wrapper random_fdw has been introduced for non-text types.
• Supports simple scalar SQL UDFs.
• SQream parses its own logs as CSVs.

4.2.3.4 Resolved Issues

The following list describes the resolved issues:
• Copying text from a CSV file to the TEXT column without closing quotes caused SQream to crash. This was fixed.
• Using an unsupported function call generated an incorrect insert error. This was fixed.
• Using the insert into function from table_does_not_exist generated an incorrect error.
• SQream treated inserting * in select_distinct as one column. This was fixed.
• Using certain encodeKey functions generated errors. This was fixed.
• Compile errors occurred while running decimal datatype sets. This was fixed.
• Running the select table_name,row_count from sqream_catalog.tables order by row_count limit 5 query generated an internal runtime error.
• Using wildcards (such as *.x.y) did not work in Parquet files. This was fixed.
• Executing log*(x,y) generated an incorrect error message. This was fixed.
• An internal runtime error (“type doesn’t have a fixed size”) occurred when doing max on text on develop.
• The min and max on TEXT were significantly slower than varchar. This was fixed.
• Running regexp_instr generated an empty regular expression. This was fixed.
• Schemas with external tables could be dropped. This was fixed.

4.2.3.5 Operations and Configuration Changes

4.2.3.5.1 Recommended SQream Configuration on Cloud

For more information about AWS, see Amazon S3.

4.2.3.5.2 Optimized Foreign Data Wrapper Export Configuration Flag

SQream now has a new runtimeGlobalFlags flag called WriteToFileThreads. This flag configures the number of threads in the WriteToFile function. The default value is 16. For more information about the runtimeGlobalFlags flag, see the Runtime Global Flags table in Configuration.

4.2.3.6 Naming Changes

No relevant naming changes were made.

4.2.3.7 Deprecated Features

No features were deprecated.

4.2.3.8 Known Issues and Limitations

The following list describes the known issues and limitations:
• When selecting top 1 from an external table using the Parquet format with an HDFS path, SQream experienced an error.
• An internal runtime error occurred when SQream was unable to find a column in reorder columns.
• Casting datetime to text truncates the time segment.
• In the select list, the compiler generates an error when count is used as an alias.
• Performance degradation occurred when joins were made on small tables.
• SQream causes a logging error when using copy from logs.
• Deploying S3 requires setting the ObjectStoreClients parameter to 40.

4.2.3.9 Upgrading to v2021.1

Due to the known issue of a limitation on the amount of access requests that can be simultaneously sent to AWS, deploying S3 requires setting the ObjectStoreClients parameter to 40.

4.3 What’s New in 2020.3

SQream DB v2020.3 contains new features, performance enhancements, and resolved issues. The 2020.3 Release Notes describe the following:

• Release Notes Version 2020.3.2.1 • Release Notes Version 2020.3.1 • Release Notes Version 2020.3

4.3.1 Release Notes Version 2020.3.2.1

SQream DB v2020.3.2.1 contains major performance improvements and some bug fixes.

4.3.1.1 Performance Enhancements

• Metadata on Demand optimization resulting in reduced latency and improved overall performance.

4.3.1.2 Known Issues and Limitations

• Multiple COUNT(DISTINCT ...) operations are blocked for the TEXT data type.

4.3.1.3 Upgrading to v2020.3.2.1

Versions are available for IBM POWER9, RedHat (CentOS) 7, Ubuntu 18.04, and other OSs via Docker. Contact your account manager to get the latest release of SQream DB.

4.3.2 Release Notes Version 2020.3.1

4.3.2.1 New Features

The following list describes the new features:
• TEXT data type:
  – Full support for MIN and MAX aggregate functions on TEXT columns in GROUP BY queries.
  – Support Text-type as window partition keys (e.g., select distinct name, max(id) over (partition by name) from textTable;).
  – Support Text-type fields in windows order by keys.
  – Support join on TEXT columns (such as t1.x = t2.y where x and y are columns of type TEXT).
  – Complete the implementation of LIKE on TEXT columns (previously limited to prefix and suffix).
  – Support for cast from TEXT to REAL/FLOAT.
  – New string function - REPEAT for repeating a string value for a specified number of times.
• Support mapping DECIMAL ORC columns to SQream’s floating-point types.
• Support LIKE on non-literal patterns (such as columns and complex expressions).
• Catch OS signals and save the signal along with the stack trace in the SQream debug log.
• Support equijoin conditions on columns with different types (such as tinyint, smallint, int and bigint).
• DUMP_DATABASE_DDL now includes foreign tables in the output.
• New utility function - TRUNCATE_IF_EXISTS.

4.3.2.2 Performance Enhancements

The following list describes the performance enhancements:
• Introduced the “MetaData on Demand” feature, which results in significant performance improvements.
• Implemented regex functions (RLIKE, REGEXP_COUNT, REGEXP_INSTR, REGEXP_SUBSTR, PATINDEX) for TEXT columns on GPU.

4.3.2.3 Resolved Issues

The following list describes the resolved issues:
• Multiple distinct aggregates no longer need to be used with developerMode flag.
• In some scenarios, the statement_id and connection_id values are incorrectly recorded as -1 in the log.
• NOT RLIKE is not supported for TEXT in the compiler.
• Casting from TEXT to date/datetime returns an error when the TEXT column contains NULL.

4.3.2.4 Known Issues and Limitations

No known issues and limitations.

4.3.2.5 Upgrading to v2020.3.1

Versions are available for IBM POWER9, RedHat (CentOS) 7, Ubuntu 18.04, and other OSs via Docker. Contact your account manager to get the latest release of SQream DB.

4.3.3 Release Notes Version 2020.3

4.3.3.1 New Features

The following list describes the new features:
• Parquet and ORC files can now be exported to local storage, S3, and HDFS with COPY TO and foreign data wrappers.
• New error tolerance features when loading data with foreign data wrappers.
• TEXT is ramping up with new features (previously only available with VARCHARs):
  – SUBSTRING, LOWER, LTRIM, CHARINDEX, REPLACE, etc.
  – Binary operators - || (Concatenate), LIKE, etc.
  – Casts to and from TEXT
• sqream_studio v5.1
  – New log viewer helps you track and debug what’s going on in SQream DB.
  – Dashboard now also available for non-k8s deployments.
  – The editor contains a new query concurrency tool for date and numeric ranges.

4.3.3.2 Performance Enhancements

The following list describes the performance enhancements:
• Error handling for CSV FDW.
• Enable logging errors - ORC, Parquet, CSV.
• Add limit and offset options to csv_fdw import.
• Enable logging errors to an external file when skipping CSV, Parquet, and ORC errors.
• Option to specify date format to the CSV FDW.
• Support all existing VARCHAR functions with TEXT on GPU.
• Support INSERT INTO + ORDER BY optimization for non-clustered tables.
• Performance improvements with I/O.

4.3.3.3 Resolved Issues

The following list describes the resolved issues:
• Better error message when passing the max errors limit. This was fixed.
• showFullExceptionInfo is no longer restricted to Developer Mode. This was fixed.
• A StreamAggregateA reduction error occurred when performing aggregation on a NULL column. This was fixed.
• Insert into query fails with “Error at Sql phase during Stages "rewriteSqlQuery"”. This was fixed.
• Casting from VARCHAR to NVARCHAR does not remove the spaces. This was fixed.
• An Internal Runtime Error t1.size() == t2.size() occurs when querying the sqream_catalog.delete_predicates. This was fixed.
• spoolMemoryGB and limitQueryMemoryGB show incorrectly in the runtime global section of show_conf. This was fixed.
• Casting empty text to int causes illegal memory access. This was fixed.
• Copying from the TEXT field is 1.5x slower than the VARCHAR equivalent. This was fixed.
• TPCDS 10TB - Internal runtime error (std::bad_alloc: out of memory) occurs on 2020.1.0.2. This was fixed.
• An unequal join on non-existing TEXT caused a system crash. This was fixed.
• An internal runtime error occurred when using TEXT (tpcds). This was fixed.
• Copying CSV with a quote in the middle of a field to a TEXT field does not produce the required error. This was fixed.
• Cannot monitor long network insert loads with SQream. This was fixed.
• Upper and like performance on NVARCHAR. This was fixed.
• Insert into from 4 instances would get stuck (hanging). This was fixed.
• An invalid formatted CSV would cause an insufficient memory error on a COPY FROM statement if a quote was not closed and the file was much larger than system memory. This was fixed.
• TEXT columns cannot be used with an outer join together with an inequality check (!= , <>). This was fixed.

4.3.3.4 Known Issues and Limitations

The following list describes the known issues and limitations:
• Cast from TEXT to a DATE or DATETIME errors when the TEXT column contains NULL
• Casting an empty TEXT field to an INT type returns 0 instead of erroring
• Multiple COUNT( distinct ... ) operations on the TEXT data type are currently unsupported
• Multiple COUNT( distinct ... ) operations within the same query are limited to “developer mode” due to an instability that was identified. If you rely on this feature, contact your SQream account manager to enable this feature.

4.3.3.5 Upgrading to v2020.3

Versions are available for IBM POWER9, RedHat (CentOS) 7, Ubuntu 18.04, and other OSs via Docker. Contact your account manager to get the latest release of SQream DB.

4.4 What’s New in 2020.2

SQream DB v2020.2 contains some new features, improved performance, and bug fixes. This version has new window ranking functions and a new editor UI to empower data users to analyze more data with less friction. As always, the latest release improves reliability and performance, and makes getting more data into SQream DB easier than ever.

4.4.1 New features

4.4.1.1 UI

• New sqream_studio replaces the previous Statement Editor. Try it out today!

4.4.1.2 Integrations

• Our Python driver (pysqream) now has an SQLAlchemy dialect. Customers can write high-performance Python applications that make full use of SQream DB - connect, query, delete, and insert data. Data scientists can use pysqream with Pandas, Numpy, and AI/ML frameworks like TensorFlow for direct queries of huge datasets.

4.4.1.3 SQL support

• Added LAG/LEAD ranking functions to our Window Functions support. We will have more features coming in the next version.
• New syntax preview for Foreign Tables. Foreign tables replace external tables, with improved functionality. You can keep using the existing external table syntax for now, but it may be deprecated in the future.

CREATE FOREIGN TABLE orc_example
(
   name varchar(40),
   Age tinyint,
   Salary float
)
WRAPPER orc_fdw
OPTIONS
(
   LOCATION = 'hdfs://hadoop-nn.piedpiper.com:8020/demo-data/example.orc'
);

4.4.2 Improvements and fixes

SQream DB v2020.2 includes hundreds of small new features and tunable parameters that improve performance, reliability, and stability.
• ~100 bug fixes, including:
  – Fixed CSV handling for DOS newlines
  – Fixed “out of bounds” message when several layers of nested substring, cast, and to_hex were used to produce one value.
  – Fixed “Illegal memory access” that would occur in extremely rare situations on all-text tables
  – Window functions can now be used with all aggregations
  – Fixed situation where a single worker may use more than one GPU that isn’t allocated to it
  – Text columns can now be added to existing tables with ALTER TABLE
• New Data clustering syntax that can improve query performance for unsorted data

4.4.3 Operations

• When upgrading from a previous version of SQream DB (for example, v2019.2), the storage version must be upgraded using the upgrade_storage utility: ./bin/upgrade_storage /path/to/storage/sqreamdb/
• A change in memory allocation behaviour in this version sees the introduction of a new setting, limitQueryMemoryGB. This is an addition to the previous spoolMemoryGB setting. A good rule-of-thumb is to allow 5% system memory for other processes. The spool memory allocation should be around 90% of the total memory allocated.
  – limitQueryMemoryGB defines how much total system memory is used by the worker. The recommended setting is (total host memory - 5%) / sqreamd workers on host.
  – spoolMemoryGB defines how much memory is set aside for spooling, out of the total system memory allocated in limitQueryMemoryGB. The recommended setting is 90% of the limitQueryMemoryGB. This setting must be set lower than the limitQueryMemoryGB setting.
  For example, for a machine with 512GB of RAM and 4 workers, the recommended settings are:
  – limitQueryMemoryGB - ⌊(512 * 0.95 / 4)⌋ → ⌊486.4 / 4⌋ → 121.
  – spoolMemoryGB - ⌊(0.9 * limitQueryMemoryGB)⌋ → ⌊(0.9 * 121)⌋ → 108
  Example settings per-worker, for 512GB of RAM and 4 workers:

"runtimeFlags": {
   "limitQueryMemoryGB" : 121,
   "spoolMemoryGB" : 108
}

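The rule of thumb above can be sketched as a small calculation. The following Python helper is a hypothetical illustration (the function name and its parameters are not part of SQream DB); it applies the 5% reserve and 90% spool ratio from the text, rounding down to whole gigabytes, and uses integer arithmetic to avoid floating-point rounding at boundaries.

```python
def recommended_memory_settings(total_host_memory_gb, workers_on_host,
                                reserve_percent=5, spool_percent=90):
    """Return per-worker limitQueryMemoryGB and spoolMemoryGB values.

    limitQueryMemoryGB = floor((total memory - reserve) / workers)
    spoolMemoryGB      = floor(spool_percent% of limitQueryMemoryGB)
    """
    if workers_on_host <= 0:
        raise ValueError("workers_on_host must be positive")
    limit_query = (total_host_memory_gb * (100 - reserve_percent)) // (100 * workers_on_host)
    spool = (spool_percent * limit_query) // 100
    # The text requires spoolMemoryGB to be lower than limitQueryMemoryGB.
    if spool >= limit_query:
        raise ValueError("host memory is too small for the requested split")
    return {"limitQueryMemoryGB": limit_query, "spoolMemoryGB": spool}


# The worked example from the text: 512GB of RAM and 4 workers.
settings = recommended_memory_settings(512, 4)
print(settings)  # {'limitQueryMemoryGB': 121, 'spoolMemoryGB': 108}
```

The resulting values match the example settings shown above and can be copied into each worker's configuration.
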
4.4.4 Known Issues & Limitations

• An invalid formatted CSV can cause an insufficient memory error on a COPY FROM statement if a quote isn’t closed and the file is much larger than system memory.
• Multiple COUNT( distinct ... ) operations within the same query are limited to “developer mode” due to an instability that was identified. If you rely on this feature, contact your SQream account manager to enable this feature.
• TEXT columns can’t be used with an outer join together with an inequality check (!= , <>)

4.4.5 Upgrading to v2020.2

Versions are available for IBM POWER9, RedHat (CentOS) 7, Ubuntu 18.04, and other OSs via Docker. Contact your account manager to get the latest release of SQream DB.

4.5 What’s New in 2020.1

SQream DB v2020.1 contains lots of new features, improved performance, and bug fixes. This is the first release of 2020, with a strong focus on integration into existing environments. The release includes connectivity to Hadoop and other legacy data warehouse ecosystems. We’re also bringing lots of new capabilities to our analytics engine, to empower data users to analyze more data with less friction.

The latest release vastly improves reliability and performance, and makes getting more data into SQream DB easier than ever. The core of SQream DB v2020.1 contains new integration features, more analytics capabilities, and better drivers and connectors.

4.5.1 New features

4.5.1.1 Integrations

• Load files directly from S3 buckets. Customers with columnar data in S3 data lakes can now access the data directly. All that is needed is to simply point an external table to an S3 bucket with Parquet, ORC, or CSV objects. This feature is available on all deployments of SQream DB – in the cloud and on-prem.
• Load files directly from HDFS. SQream DB now comes with built-in, native HDFS support for directly loading data from Hadoop-based data lakes. Our focus on helping Hadoop customers do more with their data led us to develop this feature, which works out of the box. As a result, SQream DB can now not only read but also write data, and intermediate results back to HDFS for HIVE and other data consumers. SQream DB now fits seamlessly into a Hadoop data pipeline.
• Import ORC files, through Foreign Tables. ORC files join Parquet as files that can be natively accessed and inserted into SQream DB tables.
• Python driver (pysqream) is now DB-API v2.0 compliant. Customers can write high-performance Python applications that make full use of SQream DB - connect, query, delete, and insert data. Data scientists can use pysqream with Pandas, Numpy, and AI/ML frameworks like TensorFlow for direct queries of huge datasets.
• Certified Tableau JDBC connector (taco), now also supported on MacOS. Users are encouraged to install the new JDBC connector.
• All logs are now unified into one log, which can be analyzed with SQream DB directly. See Logging for more information.

4.5.1.2 SQL support

• Added frames and frame exclusions to Window Functions. This is available for preview, with more features coming in the next version. The new frames and frame exclusions feature adds complex analytics capabilities to the already powerful window functions.
• New datatype - TEXT, which replaces NVARCHAR directly with UTF-8 support and improved performance. Unlike VARCHAR, the new TEXT data type has no restrictions on size, and carries no performance overhead as the text sizes grow.
• TEXT join keys are now supported
• Added lots of new aggregate functions, including VAR_SAMP, VAR_POP, COVAR_POP, etc.

4.5.2 Improvements and fixes

SQream DB v2020.1 includes hundreds of small new features and tunable parameters that improve performance, reliability, and stability. Existing SQream DB users can expect to see a general speedup of around 10% on most statements and queries!

• 207 bug fixes, including:
  – Improved performance of both inner and outer joins
  – Fixed wrong results on STDDEV (0 instead of NULL)
  – Fixed wrong results on nested Parquet files
  – Fixed failing cast from VARCHAR to FLOAT
  – Fixed INSERT that would fail on nullable values and non-nullable columns in some scenarios
  – Improved memory consumption, so Out of GPU memory errors should not occur anymore
  – Reduced long compilation times for very complex queries
  – Improved ODBC reliability
  – Fixed situation where some logs would clip very long queries
  – Improved error messages when dropping a schema with many objects
  – Fixed situation where Spotfire would not show table names
  – Fixed situation where some queries with UTF-8 literals wouldn’t run through Tableau over ODBC
  – Significantly improved cache freeing and memory allocation
  – Fixed situation in which a malformed time (24:00:00) would get incorrectly inserted from a CSV
  – Fixed race condition in which loading thousands of small files from HDFS caused a memory leak
• The saved query feature can now be used with INSERT statements
• Faster “Deferred gather” algorithm for joins with text keys
• Faster filtering when using DATEPART
• Faster metadata tagging during load
• Fixed situation where some queries would get compiled twice
• highCardinalityColumns can be configured to tell the system about high selectivity columns
• sqream sql starts up faster, can run on any Linux machine
• Additional CSV date formats (date parsers) added for compatibility

4.5.3 Behaviour changes

• ClientCmd is now known as sqream sql
• NVARCHAR columns are now known as TEXT internally
• Deprecated the ability to run SELECT and COPY at the same time on the same worker. This change is designed to protect against out of GPU memory issues. This comes with a configuration change, namely the limitQueryMemoryGB setting. See the operations section for more information.
• All logs are now unified into one log. See Logging for more information
• Compression changes:
  – The latest version of SQream DB could select a different compression scheme if data is reloaded, compared to previous versions of SQream DB. This internal change improves performance.
  – With LZ4 compression, the maximum chunk size is limited to 2.1GB. If the chunk size is bigger, another compression may be selected - primarily SNAPPY.
• The following configuration flags have been deprecated:
  – addStatementRechunkerAfterGpuToHost
  – increasedChunkSizeFactor
  – gpuReduceMergeOutputFactor
  – fullSortInputMemFactor
  – reduceInputMemFactor
  – distinctInputMemFactor
  – useAutoMemFactors
  – autoMemFactorsVramFactor
  – catchNotEnoughVram
  – useNetworkRechunker
  – useMemFactorInJoinOutput

4.5.4 Operations

• The client-server protocol has been updated to support faster data flow, and more reliable memory allocations on the client side. End users are required to use only the latest sqream sql, JDBC, and ODBC drivers delivered with this version. See the client driver download page for the latest drivers and connectors.
• When upgrading from a previous version of SQream DB (for example, v2019.2), the storage version must be upgraded using the upgrade_storage utility: ./bin/upgrade_storage /path/to/storage/sqreamdb/
• A change in memory allocation behaviour in this version sees the introduction of a new setting, limitQueryMemoryGB. This is an addition to the previous spoolMemoryGB setting. A good rule-of-thumb is to allow 5% system memory for other processes. The spool memory allocation should be around 90% of the total memory allocated.
  – limitQueryMemoryGB defines how much total system memory is used by the worker. The recommended setting is (total host memory - 5%) / sqreamd workers on host.
  – spoolMemoryGB defines how much memory is set aside for spooling, out of the total system memory allocated in limitQueryMemoryGB. The recommended setting is 90% of the limitQueryMemoryGB. This setting must be set lower than the limitQueryMemoryGB setting.
  For example, for a machine with 512GB of RAM and 4 workers, the recommended settings are:
  – limitQueryMemoryGB - ⌊(512 * 0.95 / 4)⌋ → ⌊486.4 / 4⌋ → 121.
  – spoolMemoryGB - ⌊(0.9 * limitQueryMemoryGB)⌋ → ⌊(0.9 * 121)⌋ → 108
  Example settings per-worker, for 512GB of RAM and 4 workers:

"runtimeGlobalFlags": {
   "limitQueryMemoryGB" : 121,
   "spoolMemoryGB" : 108
}

4.5.5 Known Issues & Limitations

• An invalid formatted CSV can cause an insufficient memory error on a COPY FROM statement if a quote isn’t closed and the file is much larger than system memory.
• TEXT columns cannot be used in a window functions’ partition
• Parsing errors are sometimes hard to read - the location points to the wrong part of the statement
• LZ4 compression may not be applied correctly on very large VARCHAR columns, which decreases performance
• Using SUM on very large numbers in window functions can error (overflow) when not used with an ORDER BY clause
• Slight performance decrease with DATEADD in this version (<4%)
• Operations on Snappy-compressed ORC files are slower than their Parquet equivalents.

4.5.6 Upgrading to v2020.1

Versions are available for IBM POWER9, RedHat (CentOS) 7, Ubuntu 18.04, and other OSs via Docker. Contact your account manager to get the latest release of SQream DB.

CHAPTER 5

Glossary

Cluster
   A SQream DB deployment containing several workers running on one or more nodes.
Node
   A machine running SQream DB workers.
Worker
   A SQream DB application that can respond to statements. Several workers can run on one or more nodes to form a cluster.
Metadata
   SQream DB’s internal storage which contains details about database objects.
Authentication
   Authentication is the process that identifies a user or role to verify an identity - to make sure the user is who they say they are. This is done with a username and password.
Authorization
   Authorization defines the set of actions that an authenticated role can perform after gaining access to the system, protecting against threats that authentication controls alone cannot address.
Role
   A role is a group or a user, depending on the permissions granted. A role groups together a set of permissions.
Catalog
   A set of views that contains information (metadata) about the objects in a database (e.g. tables, columns, chunks, etc…).
Storage cluster
   The storage cluster is the directory in which SQream DB stores data, including database objects, metadata database, and logs.
UDF
User-defined function
   A feature that extends SQream DB’s built in SQL functionality with user-written Python code.
