SQream DB Release 2021.1
Sep 27, 2021
Contents:
1 First steps with SQream DB ...... 3
1.1 Preparing for this tutorial ...... 3
1.2 Create a new database for playing around in ...... 4
1.3 Creating your first table ...... 4
1.4 Listing tables ...... 5
1.5 Inserting rows ...... 5
1.6 Queries ...... 6
1.7 Deleting rows ...... 8
1.8 Saving query results to a CSV or PSV file ...... 8
2 Guides ...... 11
2.1 Full list of guides ...... 11
2.1.1 Inserting data ...... 11
2.1.1.1 Data loading overview ...... 12
2.1.1.2 Data loading considerations ...... 12
2.1.1.2.1 Verify data and performance after load ...... 12
2.1.1.2.2 File source location for loading ...... 13
2.1.1.2.3 Supported load methods ...... 13
2.1.1.2.4 Unsupported data types ...... 13
2.1.1.2.5 Extended error handling ...... 13
2.1.1.2.6 Best practices for CSV ...... 13
2.1.1.2.7 Best practices for Parquet ...... 14
2.1.1.2.7.1 Type support and behavior notes ...... 14
2.1.1.2.8 Best practices for ORC ...... 14
2.1.1.2.8.1 Type support and behavior notes ...... 14
2.1.1.3 Further reading and migration guides ...... 15
2.1.1.3.1 Insert from CSV ...... 15
2.1.1.3.1.1 1. Prepare CSVs ...... 16
2.1.1.3.1.2 2. Place CSVs where SQream DB workers can access ...... 16
2.1.1.3.1.3 3. Figure out the table structure ...... 17
2.1.1.3.1.4 4. Bulk load the data with COPY FROM ...... 18
2.1.1.3.1.5 Loading different types of CSV files ...... 18
2.1.1.3.1.6 Loading a standard CSV file from a local filesystem ...... 18
2.1.1.3.1.7 Loading a PSV (pipe separated value) file ...... 18
2.1.1.3.1.8 Loading a TSV (tab separated value) file ...... 18
2.1.1.3.1.9 Loading a text file with non-printable delimiter ...... 18
2.1.1.3.1.10 Loading a text file with multi-character delimiters ...... 18
2.1.1.3.1.11 Loading files with a header row ...... 19
2.1.1.3.1.12 Loading files formatted for Windows (\r\n) ...... 19
2.1.1.3.1.13 Loading a file from a public S3 bucket ...... 19
2.1.1.3.1.14 Loading files from an authenticated S3 bucket ...... 19
2.1.1.3.1.15 Loading files from an HDFS storage ...... 19
2.1.1.3.1.16 Saving rejected rows to a file ...... 19
2.1.1.3.1.17 Stopping the load if a certain amount of rows were rejected ...... 19
2.1.1.3.1.18 Load CSV files from a set of directories ...... 20
2.1.1.3.1.19 Rearrange destination columns ...... 20
2.1.1.3.1.20 Loading non-standard dates ...... 20
2.1.1.3.2 Insert from Parquet ...... 20
2.1.1.3.2.1 1. Prepare the files ...... 21
2.1.1.3.2.2 2. Place Parquet files where SQream DB workers can access them ...... 21
2.1.1.3.2.3 3. Figure out the table structure ...... 22
2.1.1.3.2.4 4. Verify table contents ...... 23
2.1.1.3.2.5 5. Copying data into SQream DB ...... 23
2.1.1.3.2.6 Working around unsupported column types ...... 23
2.1.1.3.2.7 Modifying data during the copy process ...... 24
2.1.1.3.2.8 Further Parquet loading examples ...... 24
2.1.1.3.2.9 Loading a table from a directory of Parquet files on HDFS ...... 24
2.1.1.3.2.10 Loading a table from a bucket of files on S3 ...... 24
2.1.1.3.3 Insert from ORC ...... 25
2.1.1.3.3.1 1. Prepare the files ...... 25
2.1.1.3.3.2 2. Place ORC files where SQream DB workers can access them ...... 25
2.1.1.3.3.3 3. Figure out the table structure ...... 26
2.1.1.3.3.4 4. Verify table contents ...... 27
2.1.1.3.3.5 5. Copying data into SQream DB ...... 27
2.1.1.3.3.6 Working around unsupported column types ...... 27
2.1.1.3.3.7 Modifying data during the copy process ...... 28
2.1.1.3.3.8 Further ORC loading examples ...... 28
2.1.1.3.3.9 Loading a table from a directory of ORC files on HDFS ...... 28
2.1.1.3.3.10 Loading a table from a bucket of files on S3 ...... 28
2.1.1.3.4 Migrating from Oracle ...... 29
2.1.1.3.4.1 1. Preparing the tools and login information ...... 29
2.1.1.3.4.2 2. Export the desired schema ...... 30
2.1.1.3.4.3 Examples ...... 30
2.1.1.3.4.4 Dump all tables ...... 30
2.1.1.3.4.5 Dump only specific tables ...... 30
2.1.1.3.4.6 3. Convert the Oracle dump to standard SQL ...... 30
2.1.1.3.4.7 Example ...... 30
2.1.1.3.4.8 4. Figure out the database structures ...... 30
2.1.1.3.4.9 Remove unsupported attributes ...... 31
2.1.1.3.4.10 Match data types ...... 31
2.1.1.3.4.11 Additional considerations ...... 31
2.1.1.3.4.12 5. Create the tables in SQream DB ...... 31
2.1.1.3.4.13 Example ...... 31
2.1.1.3.4.14 6. Export tables to CSVs ...... 33
2.1.1.3.4.15 Using SQL*Plus to export data lists ...... 33
2.1.1.3.4.16 Creating CSVs using stored procedures ...... 34
2.1.1.3.4.17 CSV generation considerations ...... 34
2.1.1.3.4.18 7. Place CSVs where SQream DB workers can access ...... 34
2.1.1.3.4.19 8. Bulk load the CSVs ...... 35
2.1.1.3.4.20 Example ...... 35
2.1.1.3.4.21 9. Rewrite Oracle queries ...... 35
2.1.2 Client drivers for v2021.1 ...... 35
2.1.2.1 Client driver downloads ...... 36
2.1.2.1.1 All operating systems ...... 36
2.1.2.1.2 Windows ...... 36
2.1.2.1.3 Linux ...... 36
2.1.2.1.3.1 Python (pysqream) ...... 36
2.1.2.1.3.2 Installing the Python connector ...... 37
2.1.2.1.3.3 Prerequisites ...... 37
2.1.2.1.3.4 1. Python ...... 37
2.1.2.1.3.5 2. PIP ...... 37
2.1.2.1.3.6 3. OpenSSL for Linux ...... 38
2.1.2.1.3.7 4. Cython (optional) ...... 38
2.1.2.1.3.8 Install via pip ...... 38
2.1.2.1.3.9 Upgrading an existing installation ...... 39
2.1.2.1.3.10 Validate the installation ...... 39
2.1.2.1.3.11 SQLAlchemy examples ...... 40
2.1.2.1.3.12 A simple connection example ...... 40
2.1.2.1.3.13 Pulling a table into Pandas ...... 40
2.1.2.1.3.14 API Examples ...... 41
2.1.2.1.3.15 Explaining the connection example ...... 41
2.1.2.1.3.16 Using the cursor ...... 42
2.1.2.1.3.17 Reading result metadata ...... 43
2.1.2.1.3.18 Loading data into a table ...... 44
2.1.2.1.3.19 Reading data from a CSV file for load into a table ...... 45
2.1.2.1.3.20 Using SQLAlchemy ORM to create tables and fill them with data ...... 46
2.1.2.1.3.21 pysqream API reference ...... 47
2.1.2.1.3.22 C++ Driver ...... 49
2.1.2.1.3.23 Installing the C++ driver ...... 49
2.1.2.1.3.24 Prerequisites ...... 49
2.1.2.1.3.25 Getting the library ...... 50
2.1.2.1.3.26 Extract the tarball archive ...... 50
2.1.2.1.3.27 Examples ...... 50
2.1.2.1.3.28 Testing the connection to SQream DB ...... 50
2.1.2.1.3.29 Compiling and running the application ...... 51
2.1.2.1.3.30 Creating a table and inserting values ...... 51
2.1.2.1.3.31 Compiling and running the application ...... 52
2.1.2.1.3.32 JDBC ...... 52
2.1.2.1.3.33 Installing the JDBC driver ...... 53
2.1.2.1.3.34 Prerequisites ...... 53
2.1.2.1.3.35 Getting the JAR file ...... 53
2.1.2.1.3.36 Extract the zip archive ...... 53
2.1.2.1.3.37 Setting up the Class Path ...... 53
2.1.2.1.3.38 Connect to SQream DB with a JDBC application ...... 53
2.1.2.1.3.39 Driver class ...... 53
2.1.2.1.3.40 Connection string ...... 54
2.1.2.1.3.41 Connection parameters ...... 54
2.1.2.1.3.42 Connection string examples ...... 54
2.1.2.1.3.43 Sample Java program ...... 54
2.1.2.1.3.44 ODBC ...... 56
2.1.2.1.3.45 Install and Configure ODBC on Windows ...... 56
2.1.2.1.3.46 Installing the ODBC Driver ...... 56
2.1.2.1.3.47 Prerequisites ...... 56
2.1.2.1.3.48 Visual Studio 2015 Redistributables ...... 56
2.1.2.1.3.49 Administrator Privileges ...... 57
2.1.2.1.3.50 1. Run the Windows installer ...... 57
2.1.2.1.3.51 2. Selecting Components ...... 57
2.1.2.1.3.52 3. Configuring the ODBC Driver DSN ...... 58
2.1.2.1.3.53 Connection Parameters ...... 61
2.1.2.1.3.54 Troubleshooting ...... 61
2.1.2.1.3.55 Solving “Code 126” ODBC errors ...... 61
2.1.2.1.3.56 Install and configure ODBC on Linux ...... 61
2.1.2.1.3.57 Prerequisites ...... 62
2.1.2.1.3.58 unixODBC ...... 62
2.1.2.1.3.59 Install unixODBC on RHEL 7 / CentOS 7 ...... 62
2.1.2.1.3.60 Install unixODBC on Ubuntu ...... 62
2.1.2.1.3.61 Install the ODBC driver with a script ...... 62
2.1.2.1.3.62 Install the ODBC driver manually ...... 63
2.1.2.1.3.63 Install the driver dependencies ...... 64
2.1.2.1.3.64 Testing the connection ...... 64
2.1.2.1.3.65 ODBC DSN Parameters ...... 66
2.1.2.1.3.66 Downloading the ODBC driver ...... 67
2.1.2.1.3.67 Install and configure the ODBC driver ...... 67
2.1.2.1.3.68 Node.JS ...... 67
2.1.2.1.3.69 Installing the Node.JS driver ...... 68
2.1.2.1.3.70 Prerequisites ...... 68
2.1.2.1.3.71 Install with NPM ...... 68
2.1.2.1.3.72 Install from an offline package ...... 68
2.1.2.1.3.73 Connect to SQream DB with a Node.JS application ...... 69
2.1.2.1.3.74 Create a simple test ...... 69
2.1.2.1.3.75 Run the test ...... 69
2.1.2.1.3.76 API reference ...... 70
2.1.2.1.3.77 Connection parameters ...... 70
2.1.2.1.3.78 Events ...... 70
2.1.2.1.3.79 Example ...... 70
2.1.2.1.3.80 Input placeholders ...... 70
2.1.2.1.3.81 Examples ...... 71
2.1.2.1.3.82 Setting configuration flags ...... 71
2.1.2.1.3.83 Lazyloading ...... 71
2.1.2.1.3.84 Reusing a connection ...... 72
2.1.2.1.3.85 Using placeholders in queries ...... 73
2.1.2.1.3.86 Troubleshooting and recommended configuration ...... 73
2.1.2.1.3.87 Preventing heap out of memory errors ...... 73
2.1.2.1.3.88 BIGINT support ...... 73
2.1.3 Third Party Tools ...... 74
2.1.3.1 Overview ...... 74
2.1.3.1.1 Connect to SQream Using SQL Workbench ...... 74
2.1.3.1.1.1 Installing SQL Workbench with the SQream DB installer (Windows only) ...... 75
2.1.3.1.1.2 Installing SQL Workbench manually (Linux, MacOS) ...... 76
2.1.3.1.1.3 Install Java Runtime ...... 76
2.1.3.1.1.4 Get the SQream DB JDBC driver ...... 77
2.1.3.1.1.5 Install SQL Workbench ...... 77
2.1.3.1.1.6 Setting up the SQream DB JDBC driver profile ...... 77
2.1.3.1.1.7 Create a new connection profile for your cluster ...... 80
2.1.3.1.1.8 Suggested optional configuration ...... 81
2.1.3.1.2 Connecting to SQream Using Tableau ...... 81
2.1.3.1.2.1 Overview ...... 81
2.1.3.1.2.2 Installing the JDBC Driver and Tableau Connector Plugin ...... 82
2.1.3.1.2.3 Installing the JDBC Driver Using the Windows Installer ...... 82
2.1.3.1.2.4 Installing the JDBC Driver Manually ...... 82
2.1.3.1.2.5 Installing the ODBC Driver for Tableau Versions 2019.3 and Earlier ...... 84
2.1.3.1.2.6 Automatically Reconfiguring the ODBC Driver After Initial Installation ...... 84
2.1.3.1.2.7 Manually Reconfiguring the ODBC Driver After Initial Installation ...... 85
2.1.3.1.2.8 Configuring the ODBC Connection ...... 86
2.1.3.1.2.9 Connecting Tableau to SQream ...... 88
2.1.3.1.2.10 Connecting to SQream ...... 89
2.1.3.1.2.11 Setting Up SQream Tables as Data Sources ...... 89
2.1.3.1.2.12 Tableau Best Practices and Troubleshooting ...... 90
2.1.3.1.2.13 Inserting Only Required Data ...... 90
2.1.3.1.2.14 Using Tableau’s Table Query Syntax ...... 90
2.1.3.1.2.15 Creating a Separate Service for Tableau ...... 91
2.1.3.1.2.16 Troubleshooting Workbook Performance Before Deploying to the Tableau Server ...... 91
2.1.3.1.2.17 Troubleshooting Error Codes ...... 91
2.1.3.1.3 Connect to SQream Using Pentaho Data Integration ...... 92
2.1.3.1.3.1 Overview ...... 92
2.1.3.1.3.2 Installing Pentaho ...... 92
2.1.3.1.3.3 Installing and Setting Up the JDBC Driver ...... 93
2.1.3.1.3.4 Creating a Transformation ...... 93
2.1.3.1.3.5 Defining Your Output ...... 100
2.1.3.1.3.6 Importing Data ...... 103
2.1.3.1.4 Connect to SQream Using MicroStrategy ...... 108
2.1.3.1.4.1 Overview ...... 108
2.1.3.1.4.2 What is MicroStrategy? ...... 108
2.1.3.1.4.3 Connecting a Data Source ...... 108
2.1.3.1.4.4 Supported SQream Drivers ...... 110
2.1.3.1.5 Connect to SQream Using R ...... 110
2.1.3.1.5.1 JDBC ...... 111
2.1.3.1.5.2 A full example ...... 112
2.1.3.1.5.3 ODBC ...... 112
2.1.3.1.5.4 A full example ...... 113
2.1.3.1.6 Connect to SQream Using PHP ...... 113
2.1.3.1.6.1 Prerequisites ...... 113
2.1.3.1.6.2 Testing the connection ...... 113
2.1.3.1.7 Connect to SQream Using SAS Viya ...... 114
2.1.3.1.7.1 Installing SAS Viya ...... 114
2.1.3.1.7.2 Installing the JDBC driver ...... 114
2.1.3.1.7.3 Configuring the JDBC driver in SAS Viya ...... 115
2.1.3.1.7.4 Browsing data and workbooks ...... 116
2.1.3.1.7.5 SAS Viya best practices and troubleshooting ...... 118
2.1.3.1.7.6 Cut out what you don’t need ...... 118
2.1.3.1.7.7 Create a separate service for SAS Viya ...... 118
2.1.3.1.7.8 Troubleshooting java.lang.ClassNotFoundException: com.sqream.jdbc.SQDriver exceptions ...... 118
2.1.4 Operations ...... 119
2.1.4.1 Optimization and Best Practices ...... 119
2.1.4.1.1 Table design ...... 119
2.1.4.1.1.1 Use date and datetime types for columns ...... 120
2.1.4.1.1.2 Reduce varchar length to a minimum ...... 120
2.1.4.1.1.3 Don’t flatten or denormalize data ...... 120
2.1.4.1.1.4 Convert foreign tables to native tables ...... 120
2.1.4.1.1.5 Use information about the column data to your advantage ...... 120
2.1.4.1.1.6 Set NULL or NOT NULL when relevant ...... 120
2.1.4.1.1.7 Keep VARCHAR lengths to a minimum ...... 121
2.1.4.1.2 Sorting ...... 121
2.1.4.1.3 Query best practices ...... 121
2.1.4.1.3.1 Reduce data sets before joining tables ...... 121
2.1.4.1.3.2 Prefer the ANSI JOIN ...... 121
2.1.4.1.3.3 Use the high selectivity hint ...... 122
2.1.4.1.3.4 Cast smaller types to avoid overflow in aggregates ...... 122
2.1.4.1.3.5 Prefer COUNT(*) and COUNT on non-nullable columns ...... 123
2.1.4.1.3.6 Return only required columns ...... 123
2.1.4.1.3.7 Use saved queries to reduce recurring compilation time ...... 123
2.1.4.1.3.8 Pre-filter to reduce JOIN complexity ...... 123
2.1.4.1.4 Data loading considerations ...... 124
2.1.4.1.4.1 Allow and use natural sorting on data ...... 124
2.1.4.1.5 Further reading and monitoring query performance ...... 124
2.1.4.2 Installing Monit ...... 124
2.1.4.2.1 Getting Started ...... 124
2.1.4.2.2 Overview ...... 124
2.1.4.2.2.1 Installing Monit on CentOS: ...... 124
2.1.4.2.2.2 Installing Monit on CentOS Offline: ...... 125
2.1.4.2.2.3 Building Monit from Source Code ...... 125
2.1.4.2.2.4 Building Monit from Pre-Built Binaries ...... 125
2.1.4.2.2.5 Installing Monit on Ubuntu: ...... 126
2.1.4.2.2.6 Installing Monit on Ubuntu Offline: ...... 126
2.1.4.2.3 Configuring Monit ...... 126
2.1.4.2.4 Starting Monit ...... 127
2.1.4.3 Launching SQream with Monit ...... 128
2.1.4.3.1 Launching SQream ...... 128
2.1.4.3.2 Monit Usage Examples ...... 131
2.1.4.3.2.1 Stopping Monit and SQream Separately ...... 131
2.1.4.3.2.2 Stopping SQream Using a Monit Command ...... 131
2.1.4.3.2.3 Monit Command Line Options ...... 131
2.1.4.3.3 Using Monit While Upgrading Your Version of SQream ...... 132
2.1.4.4 Installing SQream Using Binary Packages ...... 133
2.1.4.4.1 Upgrading SQream Version ...... 135
2.1.4.5 Installing SQream with Kubernetes ...... 137
2.1.4.5.1 Preparing the SQream Environment to Launch SQream Using Kubernetes ...... 137
2.1.4.5.1.1 Overview ...... 137
2.1.4.5.1.2 Operating System Requirements ...... 138
2.1.4.5.1.3 Compute Server Specifications ...... 138
2.1.4.5.2 Setting Up Your Hosts ...... 138
2.1.4.5.2.1 Configuring the Hosts File ...... 138
2.1.4.5.2.2 Installing the Required Packages ...... 139
2.1.4.5.2.3 Disabling the Linux UI ...... 139
2.1.4.5.2.4 Disabling SELinux ...... 140
2.1.4.5.2.5 Disabling Your Firewall ...... 140
2.1.4.5.2.6 Checking the CUDA Version ...... 140
2.1.4.5.3 Installing Your Kubernetes Cluster ...... 141
2.1.4.5.3.1 Generating and Sharing SSH Keypairs Across All Existing Nodes ...... 141
2.1.4.5.3.2 Installing and Deploying a Kubernetes Cluster Using Kubespray ...... 142
2.1.4.5.3.3 Adjusting Kubespray Deployment Values ...... 145
2.1.4.5.3.4 Checking Your Kubernetes Status ...... 146
2.1.4.5.3.5 Adding a SQream Label to Your Kubernetes Cluster Nodes ...... 147
2.1.4.5.3.6 Copying Your Kubernetes Configuration API File to the Master Cluster Nodes ...... 148
2.1.4.5.3.7 Creating an env_file in Your Home Directory ...... 149
2.1.4.5.3.8 Creating a Base Kubernetes Namespace ...... 150
2.1.4.5.3.9 Pushing the env_file File to the Kubernetes Configmap ...... 150
2.1.4.5.3.10 Installing the NVIDIA Docker2 Toolkit ...... 150
2.1.4.5.3.11 Installing the NVIDIA Docker2 Toolkit on an x86_64 Bit Processor on CentOS ...... 150
2.1.4.5.3.12 Installing the NVIDIA Docker2 Toolkit on an x86_64 Bit Processor on Ubuntu ...... 151
2.1.4.5.3.13 Modifying the Docker Daemon JSON File for GPU and Compute Nodes ...... 152
2.1.4.5.3.14 Modifying the Docker Daemon JSON File for GPU Nodes ...... 152
2.1.4.5.3.15 Modifying the Docker Daemon JSON File for Compute Nodes ...... 153
2.1.4.5.3.16 Installing the Nvidia-device-plugin Daemonset ...... 153
2.1.4.5.3.17 Creating an Nvidia Device Plugin ...... 154
2.1.4.5.3.18 Checking GPU Resources Allocatable to GPU Nodes ...... 154
2.1.4.5.3.19 Preparing the WatchDog Monitor ...... 155
2.1.4.5.4 Installing the SQream Software ...... 155
2.1.4.5.4.1 Getting the SQream Package ...... 155
2.1.4.5.4.2 Setting Up and Configuring Hadoop ...... 156
2.1.4.5.4.3 Starting a Local Docker Image Registry ...... 156
2.1.4.5.4.4 Installing the Kubernetes Dashboard ...... 157
2.1.4.5.4.5 Installing the SQream Prometheus Package ...... 159
2.1.4.5.4.6 Installing the Exporter Service ...... 159
2.1.4.5.4.7 Checking the Exporter Status ...... 160
2.1.4.5.5 Running the Sqream-install Service ...... 160
2.1.4.5.5.1 Installing Your License ...... 160
2.1.4.5.5.2 Changing Your Data Ingest Folder ...... 161
2.1.4.5.5.3 Checking Your System Settings ...... 161
2.1.4.5.5.4 SQream Installation Command Reference ...... 161
2.1.4.5.5.5 Controlling Your Kubernetes Cluster Using SQream Flags ...... 162
2.1.4.5.6 Using the sqream-start Commands ...... 163
2.1.4.5.6.1 Starting Your SQream Services ...... 163
2.1.4.5.6.2 Starting Your SQream Services in Split Mode ...... 164
2.1.4.5.6.3 Starting the Sqream Studio UI ...... 165
2.1.4.5.6.4 Stopping the SQream Services ...... 165
2.1.4.5.6.5 Advanced sqream-start Commands ...... 166
2.1.4.5.6.6 Controlling Your SQream Spool Size ...... 166
2.1.4.5.6.7 Using a Custom .json File ...... 166
2.1.4.5.6.8 Checking the Status of the SQream Services ...... 166
2.1.4.5.7 Upgrading Your SQream Version ...... 166
2.1.4.5.7.1 Before Upgrading Your System ...... 167
2.1.4.5.7.2 Upgrading Your System ...... 167
2.1.4.6 Running SQream in a Docker Container ...... 167
2.1.4.6.1 Setting Up a Host ...... 168
2.1.4.6.1.1 Operating System Requirements ...... 168
2.1.4.6.1.2 Creating a Local User ...... 168
2.1.4.6.1.3 Setting a Local Language ...... 169
2.1.4.6.1.4 Adding the EPEL Repository ...... 169
2.1.4.6.1.5 Installing the Required NTP Packages ...... 169
2.1.4.6.1.6 Installing the Recommended Tools ...... 169
2.1.4.6.1.7 Updating to the Current Version of the Operating System ...... 169
2.1.4.6.1.8 Configuring the NTP Package ...... 170
2.1.4.6.1.9 Configuring the Performance Profile ...... 170
2.1.4.6.1.10 Configuring Your Security Limits ...... 170
2.1.4.6.1.11 Disabling Automatic Bug-Reporting Tools ...... 171
2.1.4.6.1.12 Preparing the Nvidia CUDA Driver for Installation ...... 171
2.1.4.6.1.13 Installing the Nvidia CUDA Driver ...... 172
2.1.4.6.1.14 Installing the CUDA Driver Version 10.1 for x86_64 ...... 172
2.1.4.6.1.15 Installing the CUDA Driver Version 10.1 for IBM Power9 ...... 174
2.1.4.6.2 Installing the Docker Engine (Community Edition) ...... 175
2.1.4.6.2.1 Installing the Docker Engine Using an x86_64 Processor on CentOS ...... 176
2.1.4.6.2.2 Installing the Docker Engine Using an x86_64 Processor on Ubuntu ...... 176
2.1.4.6.2.3 Installing the Docker Engine on an IBM Power9 Processor ...... 176
2.1.4.6.3 Docker Post-Installation ...... 176
2.1.4.6.4 Installing the Nvidia Docker2 ToolKit ...... 177
2.1.4.6.4.1 Installing the NVIDIA Docker2 Toolkit on an x86_64 Processor ...... 177
2.1.4.6.4.2 Installing the NVIDIA Docker2 Toolkit on a CentOS Operating System ...... 177
2.1.4.6.4.3 Installing the NVIDIA Docker2 Toolkit on an Ubuntu Operating System ...... 178
2.1.4.6.4.4 Installing the NVIDIA Docker2 Toolkit on a PPC64le Processor ...... 179
2.1.4.6.4.5 Accessing the Hadoop and Kubernetes Configuration Files ...... 180
2.1.4.6.5 Installing the SQream Software ...... 180
2.1.4.6.5.1 Preparing Your Local Environment ...... 180
2.1.4.6.5.2 Deploying the SQream Software ...... 180
2.1.4.6.5.3 Configuring the Hadoop and Kubernetes Configuration Files ...... 181
2.1.4.6.5.4 Configuring the SQream Software ...... 182
2.1.4.6.5.5 Configuring Your Local Environment ...... 182
2.1.4.6.5.6 Installing Your License ...... 183
2.1.4.6.5.7 Validating Your License ...... 183
2.1.4.6.5.8 Setting the Hadoop and Kubernetes Connectivity Parameters ...... 184
2.1.4.6.5.9 Modifying Your Data Ingest Folder ...... 184
2.1.4.6.5.10 Configuring Your Network for Docker ...... 184
2.1.4.6.5.11 Checking and Verifying Your System Settings ...... 185
2.1.4.6.6 Using the SQream Console ...... 185
2.1.4.6.6.1 SQream Console - Basic Commands ...... 185
2.1.4.6.6.2 Starting Your SQream Console ...... 186
2.1.4.6.6.3 Starting the SQream Master ...... 186
2.1.4.6.6.4 Starting SQream Workers ...... 186
2.1.4.6.6.5 Listing the Running Services ...... 186
2.1.4.6.6.6 Stopping the Running Services ...... 187
2.1.4.6.6.7 Using SQream Studio ...... 187
2.1.4.6.6.8 Using the SQream Client ...... 188
2.1.4.6.6.9 Moving from Docker Installation to Standard On-Premises Installation ...... 188
2.1.4.6.6.10 SQream Console - Advanced Commands ...... 188
2.1.4.6.6.11 Controlling the Spool Size ...... 188
2.1.4.6.6.12 Splitting a GPU ...... 189
2.1.4.6.6.13 Splitting GPU and Setting the Spool Size ...... 189
2.1.4.6.6.14 Using a Custom Configuration File ...... 189
2.1.4.6.6.15 Clustering Your Docker Environment ...... 189
2.1.4.6.6.16 Checking the Status of SQream Services ...... 190
2.1.4.6.6.17 Checking the Status of SQream Services from the SQream Console ...... 190
2.1.4.6.6.18 Checking the Status of SQream Services from Outside the SQream Console ...... 190
2.1.4.6.6.19 Upgrading Your SQream System ...... 190
2.1.4.7 Monitoring query performance ...... 191
2.1.4.7.1 Setting up the system for monitoring ...... 192
2.1.4.7.1.1 Adjusting the logging frequency ...... 193
2.1.4.7.1.2 Reading execution plans with a foreign table ...... 193
2.1.4.7.2 The SHOW_NODE_INFO command ...... 194
2.1.4.7.3 Understanding the query execution plan output ...... 195
2.1.4.7.3.1 Information presented in the execution plan ...... 197
2.1.4.7.3.2 Commonly seen nodes ...... 197
2.1.4.7.4 Examples ...... 198
2.1.4.7.4.1 1. Spooling to disk ...... 199
2.1.4.7.4.2 Identifying the offending nodes ...... 199
2.1.4.7.4.3 Common solutions for reducing spool ...... 201
2.1.4.7.4.4 2. Queries with large result sets ...... 201
2.1.4.7.4.5 Identifying the offending nodes ...... 201
2.1.4.7.4.6 Common solutions for reducing gather time ...... 202
2.1.4.7.4.7 3. Inefficient filtering ...... 202
2.1.4.7.4.8 Identifying the situation ...... 202
2.1.4.7.4.9 Common solutions for improving filtering ...... 205
2.1.4.7.4.10 4. Joins with varchar keys ...... 206
2.1.4.7.4.11 Identifying the situation ...... 206
2.1.4.7.4.12 Improving query performance ...... 207
2.1.4.7.4.13 5. Sorting on big VARCHAR fields ...... 208
2.1.4.7.4.14 Identifying the situation ...... 208
2.1.4.7.4.15 Improving sort performance on text keys ...... 210
2.1.4.7.4.16 6. High selectivity data ...... 210
2.1.4.7.4.17 Identifying the situation ...... 211
2.1.4.7.4.18 Improving performance with high selectivity hints ...... 211
2.1.4.7.4.19 7. Performance of unsorted data in joins ...... 211
2.1.4.7.4.20 Identifying the situation ...... 211
2.1.4.7.4.21 Improving join performance when data is sparse ...... 212
2.1.4.7.4.22 8. Manual join reordering ...... 213
2.1.4.7.4.23 Identifying the situation ...... 213
2.1.4.7.4.24 Changing the join order ...... 213
2.1.4.7.5 Further reading ...... 214
2.1.4.8 Logging ...... 214
2.1.4.8.1 Locating the Log Files ...... 214
2.1.4.8.1.1 Log Structure and Contents ...... 215
2.1.4.8.1.2 Log-Naming ...... 217
2.1.4.8.2 Log Control and Maintenance ...... 217
2.1.4.8.2.1 Changing Log Verbosity ...... 217
2.1.4.8.2.2 Changing Log Rotation ...... 217
2.1.4.8.3 Collecting Logs from Your Cluster ...... 217
2.1.4.8.3.1 SQL Syntax ...... 218
2.1.4.8.3.2 Command Line Utility ...... 218
2.1.4.8.3.3 Parameters ...... 218
2.1.4.8.3.4 Example ...... 218
2.1.4.8.4 Troubleshooting with Logs ...... 219
2.1.4.8.4.1 Loading Logs with Foreign Tables ...... 219
2.1.4.8.4.2 Counting Message Types ...... 219
2.1.4.8.4.3 Finding Fatal Errors ...... 220
2.1.4.8.4.4 Counting Error Events Within a Certain Timeframe ...... 220
2.1.4.8.4.5 Tracing Errors to Find Offending Statements ...... 220
2.1.4.9 Configuration ...... 221
2.1.4.9.1 Overview ...... 221
2.1.4.9.2 Configuring Your Instance of SQream ...... 221
2.1.4.9.2.1 Configuring SQream Using the SQream Configuration File ...... 222
2.1.4.9.2.2 Configuring SQream Using a Legacy Configuration File ...... 222
2.1.4.9.3 Session-Based Configuration ...... 222
2.1.4.9.4 Cluster-Based Configuration ...... 223
2.1.4.9.5 Configuration Flag Types ...... 223
2.1.4.9.6 Configuration Commands ...... 223
2.1.4.9.7 Configuration Roles ...... 225
2.1.4.9.8 Showing All Flags in the Catalog Table ...... 225
2.1.4.10 Troubleshooting ...... 225
2.1.4.10.1 Troubleshooting common issues ...... 227
2.1.4.10.1.1 Troubleshoot cluster setup and configuration ...... 227
2.1.4.10.1.2 Troubleshoot connectivity issues ...... 227
2.1.4.10.1.3 Troubleshoot query performance ...... 227
2.1.4.10.1.4 Troubleshoot query behavior ...... 227
2.1.4.10.1.5 File an issue with SQream support ...... 227
2.1.4.10.2 Examining logs ...... 228
2.1.4.10.3 Start a temporary SQream DB for testing ...... 228
2.1.4.11 Gathering information for SQream support ...... 228
2.1.4.11.1 Getting support and reporting bugs ...... 228
2.1.4.11.2 How SQream debugs issues ...... 228
2.1.4.11.2.1 Reproduce ...... 228
2.1.4.11.2.2 Logs ...... 229
2.1.4.11.2.3 Fix ...... 229
2.1.4.11.3 Collecting a reproducible example of a problematic statement ...... 229
2.1.4.11.3.1 SQL Syntax ...... 229
2.1.4.11.3.2 Parameters ...... 229
2.1.4.11.3.3 Example ...... 230
2.1.4.11.4 Collecting logs and metadata database ...... 230
2.1.4.11.4.1 Examples ...... 230
2.1.4.12 Creating or cloning a storage cluster ...... 230
2.1.4.12.1 Creating a new storage cluster ...... 230
2.1.4.12.2 Tell SQream DB to use this storage cluster ...... 231
2.1.4.12.2.1 Permanently setting the storage cluster setting ...... 231
2.1.4.12.2.2 Start a temporary SQream DB worker with a storage cluster ...... 231
2.1.4.12.2.3 Using a configuration file (recommended) ...... 231
2.1.4.12.2.4 Using the command line parameters ...... 232
2.1.4.12.3 Copying an existing storage cluster ...... 232
2.1.4.13 Installing Studio on a Stand-Alone Server ...... 232
2.1.4.13.1 Installing NodeJS Version 12 on the Server ...... 233
2.1.4.13.2 Installing Studio ...... 234
2.1.4.13.3 Starting Studio Manually ...... 235
2.1.4.13.4 Starting Studio as a Service ...... 235
2.1.4.13.5 Accessing Studio ...... 236
2.1.4.13.6 Maintaining Studio with the Process Manager (PM2) ...... 237
2.1.4.13.7 Upgrading Studio ...... 237
2.1.4.13.7.1 Installing Studio in a Docker Container ...... 238
2.1.4.13.8 Installing SQream Studio in a Docker Container ...... 238
2.1.4.13.9 Accessing Studio ...... 239
2.1.4.13.10 Docker Container Commands ...... 239
2.1.4.13.11 Setup Argument Configurations ...... 239
2.1.4.14 SQream Acceleration Studio 5.4.0 ...... 240
2.1.4.14.1 Getting Started ...... 241
2.1.4.14.1.1 Setting Up and Starting Studio ...... 241
2.1.4.14.1.2 Logging In to Studio ...... 241
2.1.4.14.1.3 Navigating Studio’s Main Features ...... 242
2.1.4.14.2 Monitoring Workers and Services from the Dashboard ...... 242
2.1.4.14.2.1 Displaying System Disk Usage from the Data Storage Panel ...... 243
2.1.4.14.2.2 Subscribing to Workers from the Services Panel ...... 244
2.1.4.14.2.3 Adding A Service ...... 245
2.1.4.14.2.4 Managing Workers from the Workers Panel ...... 245
2.1.4.14.2.5 Viewing Workers ...... 245
2.1.4.14.2.6 Adding A Worker to A Service ...... 246
2.1.4.14.2.7 Viewing A Worker’s Active Query Information ...... 246
2.1.4.14.2.8 Viewing A Worker’s Host Utilization ...... 246
2.1.4.14.2.9 Viewing a Worker’s Execution Plan ...... 247
2.1.4.14.2.10 Managing Worker Status ...... 248
2.1.4.14.2.11 License Information ...... 248
2.1.4.14.3 Executing Statements and Running Queries from the Editor ...... 248
2.1.4.14.3.1 Executing Statements from the Toolbar ...... 249
2.1.4.14.3.2 Performing Statement-Related Operations from the Database Tree ...... 249
2.1.4.14.3.3 Optimizing Database Tables Using the DDL Optimizer ...... 251
2.1.4.14.3.4 Executing Pre-Defined Queries from the System Queries Panel ...... 251
2.1.4.14.3.5 Writing Statements and Queries from the Statement Panel ...... 252
2.1.4.14.3.6 Viewing Statement and Query Results from the Results Panel ...... 252
2.1.4.14.3.7 Searching Query Results in the Results View ...... 253
2.1.4.14.3.8 Running Parallel Statements ...... 253
2.1.4.14.3.9 Execution Details View ...... 254
2.1.4.14.3.10 Viewing Wrapped Strings in the SQL View ...... 254
2.1.4.14.3.11 Saving Results to the Clipboard ...... 255
2.1.4.14.3.12 Saving Results to a Local File ...... 255
2.1.4.14.3.13 Analyzing Results ...... 255
2.1.4.14.4 Viewing Logs ...... 255
2.1.4.14.4.1 Filtering Table Data ...... 256
2.1.4.14.4.2 Viewing Query Logs ...... 257
2.1.4.14.4.3 Viewing Session Logs ...... 257
2.1.4.14.4.4 Viewing System Logs ...... 257
2.1.4.14.4.5 Viewing All Log Lines ...... 258
2.1.4.14.5 Creating, Assigning, and Managing Roles and Permissions ...... 258
2.1.4.14.5.1 Overview ...... 258
2.1.4.14.5.2 Viewing Information About a Role ...... 259
2.1.4.14.5.3 Creating a New Role ...... 259
2.1.4.14.5.4 Editing a Role ...... 260
2.1.4.14.5.5 Deleting a Role ...... 260
2.1.4.14.6 Configuring Your Instance of SQream ...... 260
2.1.4.14.6.1 Editing Your Parameters ...... 260
2.1.4.14.6.2 Exporting and Importing Configuration Files ...... 261
2.1.4.15 Using the statement editor (deprecated) ...... 261
2.1.4.15.1 Setting up and starting the statement editor ...... 262
2.1.4.15.2 Logging in ...... 262
2.1.4.15.3 Familiarizing yourself with the editor ...... 263
2.1.4.15.3.1 Toolbar ...... 264
2.1.4.15.3.2 Statement area ...... 265
2.1.4.15.3.3 Keyboard shortcuts ...... 265
2.1.4.15.3.4 Results ...... 265
2.1.4.15.3.5 Sorting results ...... 266
2.1.4.15.3.6 Viewing execution information ...... 266
2.1.4.15.3.7 Saving results to a file or clipboard ...... 266
2.1.4.15.3.8 Database tree ...... 266
2.1.4.15.3.9 Database ...... 268
2.1.4.15.3.10 Schema ...... 268
2.1.4.15.3.11 Table ...... 268
2.1.4.15.3.12 Create a table ...... 268
2.1.4.15.3.13 Catalog views ...... 269
2.1.4.15.3.14 Predefined queries ...... 270
2.1.4.15.4 Notifications ...... 270
2.1.4.15.5 DDL optimizer ...... 270
2.1.4.15.5.1 Using the DDL optimizer ...... 270
2.1.4.15.5.2 Analyzing the results ...... 271
2.1.4.16 Hardware Guide ...... 272
2.1.4.16.1 A SQream DB Cluster ...... 272
2.1.4.16.1.1 Single-Node Cluster Example ...... 272
2.1.4.16.1.2 Multi-Node Cluster Example ...... 273
2.1.4.16.2 Cluster Design Considerations ...... 273
2.1.4.16.2.1 Balancing Cost and Performance ...... 274
2.1.4.16.2.2 CPU Compute ...... 274
2.1.4.16.2.3 GPU Compute and RAM ...... 274
2.1.4.16.2.4 RAM ...... 275
2.1.4.16.2.5 Operating System ...... 275
2.1.4.16.2.6 Storage ...... 275
2.1.4.16.3 Cluster Supporting 32 Concurrent Active User Example ...... 275
2.1.4.17 Security ...... 276
2.1.4.17.1 Overview ...... 276
2.1.4.17.2 Security best practices for SQream DB ...... 277
2.1.4.17.2.1 Secure OS access ...... 277
2.1.4.17.2.2 Change the default SUPERUSER ...... 277
2.1.4.17.2.3 Create distinct user roles ...... 277
2.1.4.17.2.4 Limit SUPERUSER access ...... 277
2.1.4.17.2.5 Password strength guidelines ...... 278
2.1.4.17.2.6 Use TLS/SSL when possible ...... 278
2.1.4.18 Setting up SQream DB ...... 278
2.1.4.18.1 Before you begin ...... 278
2.1.4.18.1.1 What does the installation process look like? ...... 279
2.1.4.18.2 Start a local SQream DB cluster with Docker ...... 279
2.1.4.18.2.1 Preparing your machine for NVIDIA Docker ...... 280
2.1.4.18.2.2 CentOS 7 / RHEL 7 / Amazon Linux ...... 280
2.1.4.18.2.3 Ubuntu 18.04 ...... 281
2.1.4.18.2.4 Install Docker CE and NVIDIA docker ...... 282
2.1.4.18.2.5 CentOS 7 / RHEL 7 / Amazon Linux (x64) ...... 282
2.1.4.18.2.6 CentOS 7.6 / RHEL 7.6 (IBM POWER9) ...... 284
2.1.4.18.2.7 Ubuntu 18.04 (x64) ...... 285
2.1.4.18.2.8 Preparing directories and mounts for SQream DB ...... 286
2.1.4.18.2.9 Install the SQream DB Docker container ...... 287
2.1.4.18.2.10 Starting your first local cluster ...... 288
2.1.4.18.3 Recommended Pre-Installation Configuration ...... 288
2.1.4.18.3.1 Recommended BIOS Settings ...... 289
2.1.4.18.3.2 Installing the Operating System ...... 290
2.1.4.18.3.3 Configuring the Operating System ...... 291
2.1.4.18.3.4 Logging In to the Server ...... 291
2.1.4.18.3.5 Automatically Creating a SQream User ...... 291
2.1.4.18.3.6 Manually Creating a SQream User ...... 292
2.1.4.18.3.7 Setting Up A Locale ...... 292
2.1.4.18.3.8 Installing the Required Packages ...... 292
2.1.4.18.3.9 Installing the Recommended Tools ...... 293
2.1.4.18.3.10 Installing Python 3.6.7 ...... 293
2.1.4.18.3.11 Installing NodeJS on CentOS ...... 293
2.1.4.18.3.12 Installing NodeJS on Ubuntu ...... 294
2.1.4.18.3.13 Configuring the Network Time Protocol (NTP) ...... 294
2.1.4.18.3.14 Configuring the Network Time Protocol Server ...... 294
2.1.4.18.3.15 Configuring the Server to Boot Without the UI ...... 295
2.1.4.18.3.16 Configuring the Security Limits ...... 295
2.1.4.18.3.17 Configuring the Kernel Parameters ...... 295
2.1.4.18.3.18 Configuring the Firewall ...... 296
2.1.4.18.3.19 Disabling selinux ...... 296
2.1.4.18.3.20 Configuring the /etc/hosts File ...... 297
2.1.4.18.3.21 Configuring the DNS ...... 297
2.1.4.18.3.22 Installing the Nvidia CUDA Driver ...... 297
2.1.4.18.3.23 CUDA Driver Prerequisites ...... 297
2.1.4.18.3.24 Updating the Kernel Headers ...... 298
2.1.4.18.3.25 Disabling Nouveau ...... 298
2.1.4.18.3.26 Installing the CUDA Driver ...... 299
2.1.4.18.3.27 Installing the CUDA Driver from the Repository ...... 299
2.1.4.18.3.28 Tuning Up NVIDIA Performance ...... 301
2.1.4.18.3.29 Disabling Automatic Bug Reporting Tools ...... 301
2.1.4.18.3.30 Enabling Core Dumps ...... 302
2.1.4.18.3.31 Checking the abrtd Status ...... 302
2.1.4.18.3.32 Setting the Limits ...... 303
2.1.4.18.3.33 Creating the Core Dumps Directory ...... 303
2.1.4.18.3.34 Setting the Output Directory of the /etc/sysctl.conf File ...... 303
2.1.4.18.3.35 Verifying that the Core Dumps Work ...... 304
2.1.4.18.3.36 Troubleshooting Core Dumping ...... 304
2.1.5 Features guides ...... 305
2.1.5.1 Access control ...... 305
2.1.5.1.1 Overview ...... 306
2.1.5.1.2 Terminology ...... 306
2.1.5.1.2.1 Roles ...... 306
2.1.5.1.2.2 Authentication ...... 306
2.1.5.1.2.3 Authorization ...... 306
2.1.5.1.3 Roles ...... 306
2.1.5.1.3.1 Creating new roles (users) ...... 306
2.1.5.1.3.2 Dropping a user ...... 307
2.1.5.1.3.3 Altering a user name ...... 307
2.1.5.1.3.4 Changing user passwords ...... 307
2.1.5.1.3.5 Public Role ...... 307
2.1.5.1.3.6 Role membership (groups) ...... 308 2.1.5.1.4 Permissions ...... 309 2.1.5.1.4.1 GRANT ...... 309 2.1.5.1.4.2 REVOKE ...... 310 2.1.5.1.4.3 Default permissions ...... 311 2.1.5.1.5 Departmental Example ...... 312 2.1.5.1.5.1 Setting up the department permissions ...... 312 2.1.5.1.5.2 Creating new users in the departments ...... 314 2.1.5.2 Concurrency and locks ...... 315 2.1.5.2.1 Locking modes ...... 316 2.1.5.2.2 When are locks obtained? ...... 316 2.1.5.2.2.1 Global locks ...... 316 2.1.5.2.3 Monitoring locks ...... 316 2.1.5.2.4 Troubleshooting locks ...... 317 2.1.5.2.4.1 Example ...... 317 2.1.5.3 Concurrency and scaling in SQream DB ...... 318 2.1.5.3.1 Scaling when data sizes grow ...... 318 2.1.5.3.2 Scaling when queries are queueing ...... 318 2.1.5.3.3 What to do when queries are slow ...... 318 2.1.5.4 Metadata system ...... 318 2.1.5.4.1 How is metadata collected? ...... 319 2.1.5.4.2 How is metadata used? ...... 319 2.1.5.4.3 Why is metadata always on? ...... 320 2.1.5.5 Chunks and extents ...... 320 2.1.5.5.1 What are chunks? What are extents? ...... 320 2.1.5.5.1.1 Chunks ...... 320 2.1.5.5.1.2 Extents ...... 321 2.1.5.5.2 Why was SQream DB built with chunks and extents? ...... 321 2.1.5.5.2.1 Benefits of chunking ...... 321 2.1.5.5.2.2 Storage reorganization ...... 322 2.1.5.5.2.3 Metadata ...... 322 2.1.5.6 Compression ...... 322 2.1.5.6.1 Encoding ...... 322 2.1.5.6.2 Compression ...... 323 2.1.5.6.2.1 Automatic compression ...... 323 2.1.5.6.2.2 Compression strategies ...... 324 2.1.5.6.2.3 Specifying compression strategies ...... 325 2.1.5.6.2.4 Explicitly specifying automatic compression ...... 325 2.1.5.6.2.5 Forcing no compression (flat) ...... 325 2.1.5.6.2.6 Forcing compressions ...... 325 2.1.5.6.2.7 Examining compression effectiveness ...... 326 2.1.5.6.2.8 Notes on reading this table: ...... 329 2.1.5.6.3 Compression best practices ...... 329 2.1.5.6.3.1 Let SQream DB decide on the compression strategy ..... 
329 2.1.5.6.3.2 Maximize the advantage of each compression scheme .... 329 2.1.5.6.3.3 Choose data types that fit the data ...... 330 2.1.5.7 Data clustering ...... 330 2.1.5.7.1 Clustered tables ...... 330 2.1.5.7.2 When does clustering help? ...... 330 2.1.5.7.3 Controlling data clustering ...... 331 2.1.5.7.4 Examples ...... 331 2.1.5.7.4.1 Creating a clustered table ...... 331 2.1.5.8 Deleting Data ...... 331 2.1.5.8.1 How does deleting in SQream DB work? ...... 331 2.1.5.8.1.1 Phase 1: Delete ...... 332 2.1.5.8.1.2 Phase 2: Clean-up ...... 332 2.1.5.8.2 Notes on data deletion ...... 332 2.1.5.8.2.1 Deleting data does not free up space ...... 332 2.1.5.8.2.2 SELECT performance on deleted rows ...... 333 2.1.5.8.2.3 Use TRUNCATE instead of DELETE ...... 333 2.1.5.8.2.4 Cleanup is I/O intensive ...... 333 2.1.5.8.3 Example ...... 333 2.1.5.8.3.1 Deleting values from a table ...... 333 2.1.5.8.3.2 Deleting values based on more complex predicates ...... 334 2.1.5.8.3.3 Identifying and cleaning up tables ...... 334 2.1.5.8.3.4 List tables that haven’t been cleaned up ...... 334 2.1.5.8.3.5 Identify predicates for clean-up ...... 334 2.1.5.8.3.6 Triggering a cleanup ...... 335 2.1.5.8.4 Best practices for data deletion ...... 335 2.1.5.9 Time based data management ...... 335 2.1.5.9.1 Timestamped data ...... 336 2.1.5.9.2 Chunking ...... 336 2.1.5.9.3 Use cases ...... 336 2.1.5.9.4 Best practices for time-based data ...... 337 2.1.5.9.4.1 Use date and datetime types for columns ...... 337 2.1.5.9.4.2 Ordering ...... 337 2.1.5.9.4.3 Limit workload by timestamp ...... 337 2.1.5.10 Foreign Tables ...... 337 2.1.5.10.1 What kind of data is supported? ...... 338 2.1.5.10.2 What kind of data staging is supported? ...... 338 2.1.5.10.3 Using foreign tables - a practical example ...... 338 2.1.5.10.3.1 Planning for data staging ...... 338 2.1.5.10.3.2 Creating the foreign table ...... 339 2.1.5.10.3.3 From S3 ...... 339 2.1.5.10.3.4 From HDFS ...... 
340 2.1.5.10.3.5 Querying foreign tables ...... 340 2.1.5.10.3.6 Modifying data from staging ...... 341 2.1.5.10.3.7 Converting a foreign table to a standard database table ... 341 2.1.5.10.4 Error handling and limitations ...... 342 2.1.5.11 Working with External Data ...... 342 2.1.5.11.1 Amazon S3 ...... 342 2.1.5.11.1.1 S3 Configuration ...... 343 2.1.5.11.1.2 S3 URI Format ...... 343 2.1.5.11.1.3 Authentication ...... 343 2.1.5.11.1.4 Examples ...... 343 2.1.5.11.1.5 Planning for Data Staging ...... 343 2.1.5.11.1.6 Creating the External Table ...... 344 2.1.5.11.1.7 Querying External Tables ...... 345 2.1.5.11.1.8 Bulk Loading a File from a Public S3 Bucket ...... 345 2.1.5.11.1.9 Loading Files from an Authenticated S3 Bucket ...... 345 2.1.5.11.2 Using SQream in an HDFS Environment ...... 346 2.1.5.11.2.1 Configuring an HDFS Environment for the User sqream .. 346 2.1.5.11.2.2 Authenticate Hadoop Servers that Require Kerberos ..... 346 2.1.5.12 Transactions ...... 349 2.1.5.13 Workload Manager ...... 349 2.1.5.13.1 Why use the workload manager? ...... 349 2.1.5.13.2 Setting up service queues ...... 350 2.1.5.13.3 Example - Allocating resources for ETL ...... 350
xv 2.1.5.13.3.1 Creating this configuration ...... 350 2.1.5.13.3.2 Verifying the configuration ...... 351 2.1.5.13.4 Configuring a client to connect to a specific service ...... 352 2.1.5.13.4.1 Using Sqream SQL CLI Reference ...... 352 2.1.5.13.4.2 Using JDBC ...... 352 2.1.5.13.4.3 Using ODBC ...... 352 2.1.5.13.4.4 Using Python (pysqream) ...... 352 2.1.5.13.4.5 Using Node.JS ...... 353 2.1.5.14 Python UDF (user-defined functions) ...... 353 2.1.5.14.1 A simple example ...... 354 2.1.5.14.2 Why use UDFs? ...... 355 2.1.5.14.3 SQream DB’s UDF support ...... 355 2.1.5.14.3.1 Scalar functions ...... 355 2.1.5.14.3.2 Python ...... 355 2.1.5.14.3.3 Using modules ...... 355 2.1.5.14.4 Finding existing UDFs in the catalog ...... 356 2.1.5.14.5 Getting the DDL for a function ...... 356 2.1.5.14.6 Error handling ...... 356 2.1.5.14.7 Permissions and sharing ...... 356 2.1.5.14.8 Best practices ...... 357 2.1.5.15 Saved queries ...... 357 2.1.5.15.1 How saved queries work ...... 357 2.1.5.15.2 Parameters support ...... 357 2.1.5.15.3 Creating a saved query ...... 357 2.1.5.15.3.1 Saving a simple query ...... 357 2.1.5.15.3.2 Saving a parametrized query ...... 357 2.1.5.15.4 Listing and executing saved queries ...... 358 2.1.5.15.5 Dropping a saved query ...... 359 2.1.5.16 Seeing system objects as DDL ...... 359 2.1.5.16.1 Dump specific objects ...... 359 2.1.5.16.1.1 Tables ...... 359 2.1.5.16.1.2 Getting the DDL for a table ...... 359 2.1.5.16.1.3 Exporting table DDL to a file ...... 359 2.1.5.16.1.4 Views ...... 359 2.1.5.16.1.5 Listing all views ...... 360 2.1.5.16.1.6 Getting the DDL for a view ...... 360 2.1.5.16.1.7 Exporting view DDL to a file ...... 360 2.1.5.16.1.8 User defined functions ...... 360 2.1.5.16.1.9 Listing all UDFs ...... 360 2.1.5.16.1.10 Getting the DDL for a function ...... 360 2.1.5.16.1.11 Exporting function DDL to a file ...... 361 2.1.5.16.1.12 Saved queries ...... 361 2.1.5.16.2 Dump entire database DDLs ...... 
361 2.1.5.16.2.1 Exporting database DDL to a client ...... 361 2.1.5.16.2.2 Exporting database DDL to a file ...... 362 2.1.6 Architecture ...... 362 2.1.6.1 Internals and architecture ...... 362 2.1.6.1.1 SQream DB internals ...... 362 2.1.6.1.1.1 Statement compiler ...... 363 2.1.6.1.1.2 Concurrency and concurrency control ...... 363 2.1.6.1.1.3 Transactions ...... 363 2.1.6.1.1.4 Storage ...... 363 2.1.6.1.1.5 Metadata layer ...... 363 2.1.6.1.1.6 Bulk data layer ...... 363 xvi 2.1.6.1.1.7 Building blocks ...... 364 2.1.6.1.2 Columnar ...... 364 2.1.6.1.3 GPU usage ...... 364 2.1.6.2 Filesystem and usage ...... 364 2.1.6.2.1 Directory organization ...... 364 2.1.6.2.1.1 databases ...... 365 2.1.6.2.1.2 metadata or leveldb ...... 366 2.1.6.2.1.3 temp ...... 366 2.1.6.2.1.4 logs ...... 367
3 Reference 369 3.1 Data Types ...... 369 3.1.1 Supported Types ...... 369 3.1.2 Converting and Casting Types ...... 371 3.1.3 Data Type Reference ...... 371 3.1.3.1 Numeric (NUMERIC, DECIMAL) ...... 371 3.1.3.1.1 Numeric Examples ...... 371 3.1.3.2 Boolean (BOOL) ...... 372 3.1.3.2.1 Boolean Examples ...... 372 3.1.3.2.2 Boolean Casts and Conversions ...... 372 3.1.3.3 Integers (TINYINT, SMALLINT, INT, BIGINT) ...... 372 3.1.3.3.1 Integer Types ...... 372 3.1.3.3.2 Integer Examples ...... 373 3.1.3.3.3 Integer Casts and Conversions ...... 373 3.1.3.4 Floating Point (REAL, DOUBLE) ...... 373 3.1.3.4.1 Floating Point Types ...... 374 3.1.3.4.2 Floating Point Examples ...... 374 3.1.3.4.3 Floating Point Casts and Conversions ...... 374 3.1.3.5 String (TEXT, VARCHAR) ...... 375 3.1.3.5.1 String Types ...... 375 3.1.3.5.2 Length ...... 375 3.1.3.5.3 Syntax ...... 375 3.1.3.5.4 Size ...... 375 3.1.3.5.5 String Examples ...... 376 3.1.3.5.6 String Casts and Conversions ...... 376 3.1.3.6 Date (DATE, DATETIME) ...... 376 3.1.3.6.1 Date Types ...... 376 3.1.3.6.2 Aliases ...... 377 3.1.3.6.3 Syntax ...... 377 3.1.3.6.4 Size ...... 377 3.1.3.6.5 Date Examples ...... 377 3.1.3.6.6 Date Casts and Conversions ...... 378 3.2 SQL Statements and Syntax ...... 378 3.2.1 SQL Syntax Features ...... 378 3.2.1.1 Keywords and Identifiers ...... 378 3.2.1.1.1 Keywords ...... 378 3.2.1.1.2 Identifiers ...... 378 3.2.1.1.2.1 Identifier rules ...... 378 3.2.1.1.3 Reserved keywords ...... 379 3.2.1.2 Literals ...... 381 3.2.1.2.1 Numeric Literals ...... 381 3.2.1.2.1.1 Examples ...... 381 3.2.1.2.2 String Literals ...... 382 3.2.1.2.2.1 Examples ...... 382
xvii 3.2.1.2.2.2 Regular String Literals ...... 383 3.2.1.2.2.3 Examples ...... 383 3.2.1.2.2.4 Dollar-Quoted String Literals ...... 383 3.2.1.2.2.5 Examples ...... 383 3.2.1.2.2.6 Escaped String Literals ...... 384 3.2.1.2.3 Typed Literals ...... 384 3.2.1.2.3.1 Syntax Reference ...... 384 3.2.1.2.3.2 Examples ...... 385 3.2.1.2.4 Boolean Literals ...... 385 3.2.1.2.4.1 Example ...... 385 3.2.1.2.5 Other Constants ...... 385 3.2.1.3 Scalar expressions ...... 386 3.2.1.3.1 Literals ...... 386 3.2.1.3.2 Column references ...... 386 3.2.1.3.3 Operators ...... 386 3.2.1.3.3.1 Unary operator ...... 386 3.2.1.3.3.2 Binary operator ...... 386 3.2.1.3.3.3 Special operators for set membership ...... 387 3.2.1.3.3.4 Comparisons ...... 387 3.2.1.3.3.5 Operator precedence ...... 387 3.2.1.4 Joins ...... 388 3.2.1.4.1 Syntax ...... 388 3.2.1.4.1.1 Join types ...... 389 3.2.1.4.1.2 Inner join ...... 389 3.2.1.4.1.3 Left outer join ...... 389 3.2.1.4.1.4 Right outer join ...... 389 3.2.1.4.1.5 Cross join ...... 389 3.2.1.4.1.6 Join conditions ...... 389 3.2.1.4.1.7 ON value_expr ...... 389 3.2.1.4.2 Examples ...... 390 3.2.1.4.2.1 Inner join ...... 390 3.2.1.4.2.2 Left join ...... 390 3.2.1.4.2.3 Right join ...... 390 3.2.1.4.2.4 Cross join ...... 391 3.2.1.4.2.5 Join hints ...... 392 3.2.1.5 Common table expressions (CTEs) ...... 392 3.2.1.5.1 Syntax ...... 393 3.2.1.5.2 Examples ...... 393 3.2.1.5.2.1 Simple CTE ...... 393 3.2.1.5.2.2 Nested CTEs ...... 394 3.2.1.5.2.3 Reusing CTEs ...... 394 3.2.1.5.2.4 Using CTEs with CREATE TABLE AS ...... 394 3.2.1.6 Window Functions ...... 394 3.2.1.6.1 Syntax ...... 395 3.2.1.6.2 Arguments ...... 396 3.2.1.6.3 Supported Window Functions ...... 396 3.2.1.6.4 How Window Functions Work ...... 396 3.2.1.6.4.1 PARTITION BY ...... 397 3.2.1.6.4.2 ORDER BY ...... 397 3.2.1.6.4.3 Frames ...... 397 3.2.1.6.4.4 Restrictions ...... 398 3.2.1.6.4.5 Frame Exclusion ...... 398 3.2.1.6.5 Limitations ...... 398 3.2.1.6.6 Examples ...... 
398 xviii 3.2.1.6.6.1 Window Function Application ...... 399 3.2.1.6.6.2 Ranking Results ...... 400 3.2.1.6.6.3 Using LEAD to Access Following Rows Without a Join .... 400 3.2.1.7 Subqueries ...... 401 3.2.1.7.1 Table Subqueries ...... 401 3.2.1.7.2 Scalar Subqueries ...... 402 3.2.1.7.2.1 Simple Subquery ...... 402 3.2.1.7.2.2 Combining a Subquery with a Join ...... 402 3.2.1.7.2.3 WITH subqueries ...... 403 3.2.1.8 Null handling ...... 403 3.2.1.8.1 Comparisons ...... 403 3.2.1.8.2 Conditionals ...... 404 3.2.1.8.3 Operations ...... 404 3.2.1.8.3.1 Scalar functions ...... 404 3.2.1.8.3.2 Aggregates ...... 404 3.2.1.8.3.3 Distincts ...... 404 3.2.1.8.4 Sorting ...... 405 3.2.2 SQL Statements ...... 405 3.2.2.1 Data Definition Commands (DDL) ...... 406 3.2.2.2 Data Manipulation Commands (DML) ...... 406 3.2.2.3 Utility Commands ...... 406 3.2.2.4 Saved Queries ...... 407 3.2.2.5 Monitoring ...... 407 3.2.2.6 Workload Management ...... 407 3.2.2.7 Access Control Commands ...... 408 3.2.2.7.1 ADD COLUMN ...... 408 3.2.2.7.1.1 Permissions ...... 408 3.2.2.7.1.2 Syntax ...... 408 3.2.2.7.1.3 Parameters ...... 409 3.2.2.7.1.4 Examples ...... 409 3.2.2.7.1.5 Adding a simple column with default value ...... 409 3.2.2.7.1.6 Adding several columns in one command ...... 409 3.2.2.7.2 ALTER DEFAULT SCHEMA ...... 409 3.2.2.7.2.1 Permissions ...... 409 3.2.2.7.2.2 Syntax ...... 409 3.2.2.7.2.3 Parameters ...... 410 3.2.2.7.2.4 Examples ...... 410 3.2.2.7.2.5 Altering the default schema for a role ...... 410 3.2.2.7.3 ALTER TABLE ...... 410 3.2.2.7.3.1 Locks ...... 410 3.2.2.7.3.2 Subcommands ...... 410 3.2.2.7.4 CLUSTER BY ...... 411 3.2.2.7.4.1 Permissions ...... 411 3.2.2.7.4.2 Syntax ...... 411 3.2.2.7.4.3 Parameters ...... 411 3.2.2.7.4.4 Usage notes ...... 411 3.2.2.7.4.5 Examples ...... 411 3.2.2.7.4.6 Reclustering a table ...... 411 3.2.2.7.5 CREATE DATABASE ...... 411 3.2.2.7.5.1 Permissions ...... 412 3.2.2.7.5.2 Syntax ...... 412 3.2.2.7.5.3 Parameters ...... 
412 3.2.2.7.5.4 Examples ...... 412 3.2.2.7.6 CREATE EXTERNAL TABLE ...... 412
3.2.2.7.6.1 Permissions ...... 413 3.2.2.7.6.2 Syntax ...... 413 3.2.2.7.6.3 Parameters ...... 414 3.2.2.7.6.4 Examples ...... 414 3.2.2.7.6.5 A simple table from Tab-delimited file (TSV) ...... 414 3.2.2.7.6.6 A table from a directory of Parquet files on HDFS ...... 414 3.2.2.7.6.7 A table from a bucket of files on S3 ...... 415 3.2.2.7.6.8 Changing an external table to a regular table ...... 415 3.2.2.7.7 CREATE FOREIGN TABLE ...... 415 3.2.2.7.7.1 Permissions ...... 415 3.2.2.7.7.2 Syntax ...... 415 3.2.2.7.7.3 Parameters ...... 417 3.2.2.7.7.4 Examples ...... 417 3.2.2.7.7.5 A simple table from Tab-delimited file (TSV) ...... 417 3.2.2.7.7.6 A table from a directory of Parquet files on HDFS ...... 417 3.2.2.7.7.7 A table from a bucket of ORC files on S3 ...... 418 3.2.2.7.7.8 Changing a foreign table to a regular table ...... 418 3.2.2.7.8 CREATE FUNCTION ...... 418 3.2.2.7.8.1 Permissions ...... 418 3.2.2.7.8.2 Syntax ...... 418 3.2.2.7.8.3 Parameters ...... 419 3.2.2.7.8.4 Examples ...... 419 3.2.2.7.8.5 Calculate distance between two points ...... 419 3.2.2.7.8.6 Calling files from other locations ...... 419 3.2.2.7.9 CREATE SCHEMA ...... 420 3.2.2.7.9.1 Overview ...... 420 3.2.2.7.9.2 Permissions ...... 420 3.2.2.7.9.3 Syntax ...... 420 3.2.2.7.9.4 Parameters ...... 421 3.2.2.7.9.5 Examples ...... 421 3.2.2.7.9.6 Creating a Schema ...... 421 3.2.2.7.9.7 Altering the Default Schema for a Role ...... 421 3.2.2.7.10 CREATE TABLE ...... 421 3.2.2.7.10.1 Permissions ...... 422 3.2.2.7.10.2 Syntax ...... 422 3.2.2.7.10.3 Parameters ...... 422 3.2.2.7.10.4 Default Value Constraints ...... 423 3.2.2.7.10.5 Syntax ...... 423 3.2.2.7.10.6 Identity ...... 423 3.2.2.7.10.7 Examples ...... 424 3.2.2.7.10.8 Standard Table ...... 424 3.2.2.7.10.9 Table with Default Value Constraints for Some Columns ... 424 3.2.2.7.10.10 Table with an Identity Column ...... 424 3.2.2.7.10.11 Creating a Table from a SELECT Query ...... 
425 3.2.2.7.10.12 Creating a Table with a Clustering Key ...... 425 3.2.2.7.11 CREATE TABLE AS ...... 425 3.2.2.7.11.1 Permissions ...... 425 3.2.2.7.11.2 Syntax ...... 426 3.2.2.7.11.3 Parameters ...... 426 3.2.2.7.11.4 Examples ...... 426 3.2.2.7.11.5 Create a copy of a foreign table or view ...... 426 3.2.2.7.11.6 Filtering ...... 426 3.2.2.7.11.7 Adding columns ...... 426 3.2.2.7.11.8 Creating a table from values ...... 426 3.2.2.7.12 CREATE VIEW ...... 427 3.2.2.7.12.1 Permissions ...... 427 3.2.2.7.12.2 Syntax ...... 427 3.2.2.7.12.3 Parameters ...... 427 3.2.2.7.12.4 Examples ...... 428 3.2.2.7.12.5 A simple view ...... 428 3.2.2.7.12.6 Overriding default column names ...... 428 3.2.2.7.13 DROP CLUSTERING KEY ...... 428 3.2.2.7.13.1 Permissions ...... 428 3.2.2.7.13.2 Syntax ...... 428 3.2.2.7.13.3 Parameters ...... 428 3.2.2.7.13.4 Usage notes ...... 428 3.2.2.7.13.5 Examples ...... 429 3.2.2.7.13.6 Dropping clustering keys in a table ...... 429 3.2.2.7.14 DROP COLUMN ...... 429 3.2.2.7.14.1 Permissions ...... 429 3.2.2.7.14.2 Syntax ...... 429 3.2.2.7.14.3 Parameters ...... 429 3.2.2.7.14.4 Examples ...... 429 3.2.2.7.14.5 Removing a column ...... 429 3.2.2.7.14.6 Removing a column with a quoted identifier name ...... 430 3.2.2.7.15 DROP DATABASE ...... 430 3.2.2.7.15.1 Permissions ...... 430 3.2.2.7.15.2 Syntax ...... 430 3.2.2.7.15.3 Parameters ...... 430 3.2.2.7.15.4 Examples ...... 430 3.2.2.7.15.5 Dropping a database and all of its objects ...... 430 3.2.2.7.15.6 Dropping the current database ...... 430 3.2.2.7.16 DROP FUNCTION ...... 431 3.2.2.7.16.1 Permissions ...... 431 3.2.2.7.16.2 Syntax ...... 431 3.2.2.7.16.3 Parameters ...... 431 3.2.2.7.16.4 Examples ...... 431 3.2.2.7.16.5 Dropping a function ...... 431 3.2.2.7.16.6 Dropping a function (always succeeds) ...... 431 3.2.2.7.17 DROP SCHEMA ...... 431 3.2.2.7.17.1 Permissions ...... 432 3.2.2.7.17.2 Syntax ...... 432 3.2.2.7.17.3 Parameters ...... 432 3.2.2.7.17.4 Examples ...... 
432 3.2.2.7.17.5 Dropping a schema ...... 432 3.2.2.7.17.6 Dropping a schema if it’s not empty ...... 432 3.2.2.7.18 DROP TABLE ...... 433 3.2.2.7.18.1 Permissions ...... 433 3.2.2.7.18.2 Syntax ...... 433 3.2.2.7.18.3 Parameters ...... 433 3.2.2.7.18.4 Examples ...... 433 3.2.2.7.18.5 Dropping a table ...... 433 3.2.2.7.18.6 Dropping a table (always succeeds) ...... 433 3.2.2.7.19 DROP VIEW ...... 434 3.2.2.7.19.1 Permissions ...... 434 3.2.2.7.19.2 Syntax ...... 434 3.2.2.7.19.3 Parameters ...... 434 3.2.2.7.19.4 Examples ...... 434
3.2.2.7.19.5 Dropping a table ...... 434 3.2.2.7.19.6 Dropping a view (always succeeds) ...... 434 3.2.2.7.20 RENAME COLUMN ...... 435 3.2.2.7.20.1 Permissions ...... 435 3.2.2.7.20.2 Syntax ...... 435 3.2.2.7.20.3 Parameters ...... 435 3.2.2.7.20.4 Examples ...... 435 3.2.2.7.20.5 Renaming a column ...... 435 3.2.2.7.20.6 Renaming a quoted name ...... 435 3.2.2.7.21 RENAME TABLE ...... 436 3.2.2.7.21.1 Permissions ...... 436 3.2.2.7.21.2 Syntax ...... 436 3.2.2.7.21.3 Parameters ...... 436 3.2.2.7.21.4 Examples ...... 436 3.2.2.7.21.5 Renaming a table ...... 436 3.2.2.7.22 COPY FROM ...... 436 3.2.2.7.22.1 Permissions ...... 437 3.2.2.7.22.2 Syntax ...... 437 3.2.2.7.22.3 Elements ...... 440 3.2.2.7.22.4 Supported Date Formats ...... 441 3.2.2.7.22.5 Supported Field Delimiters ...... 441 3.2.2.7.22.6 Customizing Quotations Using Alternative Characters .... 441 3.2.2.7.22.7 Syntax Example 1 - Customizing Quotations Using Alternative Characters ...... 441 3.2.2.7.22.8 Usage Example 1 - Customizing Quotations Using Alternative Characters ...... 442 3.2.2.7.22.9 Syntax Example 2 - Customizing Quotations Using ASCII Character Codes ...... 442 3.2.2.7.22.10 Usage Example 2 - Customizing Quotations Using ASCII Character Codes ...... 442 3.2.2.7.22.11 Multi-Character Delimiters ...... 442 3.2.2.7.22.12 Printable Characters ...... 442 3.2.2.7.22.13 Non-Printable Characters ...... 442 3.2.2.7.22.14 Unsupported Field Delimiters ...... 443 3.2.2.7.22.15 Capturing Rejected Rows ...... 443 3.2.2.7.22.16 CSV Support ...... 444 3.2.2.7.22.17 Marking Null Markers ...... 444 3.2.2.7.22.18 Examples ...... 444 3.2.2.7.22.19 Loading a Standard CSV File ...... 444 3.2.2.7.22.20 Skipping Faulty Rows ...... 444 3.2.2.7.22.21 Skipping a Maximum of 100 Faulty Rows ...... 445 3.2.2.7.22.22 Loading a Pipe Separated Value (PSV) File ...... 445 3.2.2.7.22.23 Loading a Tab Separated Value (TSV) File ...... 445 3.2.2.7.22.24 Loading an ORC File ...... 
445 3.2.2.7.22.25 Loading a Parquet File ...... 445 3.2.2.7.22.26 Loading a Text File with Non-Printable Delimiters ...... 445 3.2.2.7.22.27 Loading a Text File with Multi-Character Delimiters ..... 445 3.2.2.7.22.28 Loading Files with a Header Row ...... 446 3.2.2.7.22.29 Loading Files Formatted for Windows (\r\n) ...... 446 3.2.2.7.22.30 Loading a File from a Public S3 Bucket ...... 446 3.2.2.7.22.31 Loading Files from an Authenticated S3 Bucket ...... 446 3.2.2.7.22.32 Saving Rejected Rows to a File ...... 446 3.2.2.7.22.33 Loading CSV Files from a Set of Directories ...... 447 3.2.2.7.22.34 Rearranging Destination Columns ...... 447 xxii 3.2.2.7.22.35 Loading Non-Standard Dates ...... 447 3.2.2.7.23 COPY TO ...... 447 3.2.2.7.23.1 Permissions ...... 448 3.2.2.7.23.2 Syntax ...... 448 3.2.2.7.23.3 Elements ...... 449 3.2.2.7.23.4 Usage notes ...... 449 3.2.2.7.23.5 Supported field delimiters ...... 449 3.2.2.7.23.6 Printable characters ...... 449 3.2.2.7.23.7 Non-printable characters ...... 449 3.2.2.7.23.8 Date format ...... 449 3.2.2.7.23.9 Examples ...... 449 3.2.2.7.23.10 Export table to a CSV without HEADER ...... 449 3.2.2.7.23.11 Export table to a CSV with a HEADER row ...... 450 3.2.2.7.23.12 Export table to a TSV with a header row ...... 450 3.2.2.7.23.13 Use non-printable ASCII characters as delimiter ...... 450 3.2.2.7.23.14 Exporting the result of a query to a CSV ...... 451 3.2.2.7.23.15 Saving files to an authenticated S3 bucket ...... 451 3.2.2.7.23.16 Saving files to an HDFS path ...... 451 3.2.2.7.23.17 Export table to a parquet file ...... 451 3.2.2.7.23.18 Export a query to a parquet file ...... 451 3.2.2.7.23.19 Export table to a ORC file ...... 451 3.2.2.7.24 DELETE ...... 452 3.2.2.7.24.1 Permissions ...... 452 3.2.2.7.24.2 Syntax ...... 452 3.2.2.7.24.3 Parameters ...... 452 3.2.2.7.24.4 How SQream DB deletes data ...... 452 3.2.2.7.24.5 Notes ...... 453 3.2.2.7.24.6 Examples ...... 453 3.2.2.7.24.7 Deleting values from a table ...... 
453 3.2.2.7.24.8 Deleting values based on more complex predicates ...... 453 3.2.2.7.24.9 Identifying and cleaning up tables ...... 454 3.2.2.7.24.10 List tables that haven’t been cleaned up ...... 454 3.2.2.7.24.11 Identify predicates for clean-up ...... 454 3.2.2.7.24.12 Triggering a cleanup ...... 454 3.2.2.7.25 INSERT ...... 455 3.2.2.7.25.1 Permissions ...... 455 3.2.2.7.25.2 Syntax ...... 455 3.2.2.7.25.3 Elements ...... 455 3.2.2.7.25.4 Examples ...... 456 3.2.2.7.25.5 Inserting a single row ...... 456 3.2.2.7.25.6 Inserting into rows with default values ...... 456 3.2.2.7.25.7 Changing column order ...... 456 3.2.2.7.25.8 Inserting multiple rows ...... 456 3.2.2.7.25.9 Import data from other tables ...... 456 3.2.2.7.25.10 Inserting data with positional placeholders ...... 457 3.2.2.7.26 SELECT ...... 457 3.2.2.7.26.1 Permissions ...... 458 3.2.2.7.26.2 Syntax ...... 458 3.2.2.7.26.3 Elements ...... 460 3.2.2.7.26.4 Notes ...... 460 3.2.2.7.26.5 Query processing ...... 460 3.2.2.7.26.6 Select lists ...... 460 3.2.2.7.26.7 Examples ...... 461 3.2.2.7.26.8 Simple queries ...... 462
xxiii 3.2.2.7.26.9 Show a count of the rows ...... 462 3.2.2.7.26.10 Get all columns ...... 462 3.2.2.7.26.11 Filter on conditions ...... 463 3.2.2.7.26.12 Filter based on a list ...... 463 3.2.2.7.26.13 Select only distinct rows ...... 463 3.2.2.7.26.14 Count distinct values ...... 464 3.2.2.7.26.15 Rename columns with aliases ...... 464 3.2.2.7.26.16 Searching with LIKE ...... 465 3.2.2.7.26.17 Aggregate functions ...... 465 3.2.2.7.26.18 Filtering on aggregates ...... 465 3.2.2.7.26.19 Sorting results ...... 466 3.2.2.7.26.20 Sorting with multiple columns ...... 467 3.2.2.7.26.21 Combining two or more queries ...... 467 3.2.2.7.26.22 Common table expressions (CTE) ...... 467 3.2.2.7.26.23 Nested CTEs ...... 468 3.2.2.7.26.24 Reusing CTEs ...... 468 3.2.2.7.27 TRUNCATE ...... 468 3.2.2.7.27.1 Permissions ...... 468 3.2.2.7.27.2 Syntax ...... 469 3.2.2.7.27.3 Parameters ...... 469 3.2.2.7.27.4 Examples ...... 469 3.2.2.7.27.5 Truncating a table ...... 469 3.2.2.7.28 VALUES ...... 469 3.2.2.7.28.1 Permissions ...... 470 3.2.2.7.28.2 Syntax ...... 470 3.2.2.7.28.3 Notes ...... 470 3.2.2.7.28.4 Examples ...... 470 3.2.2.7.28.5 Tabular data with VALUES ...... 470 3.2.2.7.28.6 Using VALUES with a SELECT query ...... 470 3.2.2.7.28.7 Creating a table with VALUES ...... 470 3.2.2.7.29 DROP_SAVED_QUERY ...... 471 3.2.2.7.29.1 Permissions ...... 471 3.2.2.7.29.2 Syntax ...... 471 3.2.2.7.29.3 Returns ...... 471 3.2.2.7.29.4 Parameters ...... 471 3.2.2.7.29.5 Examples ...... 471 3.2.2.7.29.6 Dropping a previously saved query ...... 471 3.2.2.7.30 DUMP_DATABASE_DDL ...... 472 3.2.2.7.30.1 Permissions ...... 472 3.2.2.7.30.2 Syntax ...... 472 3.2.2.7.30.3 Parameters ...... 472 3.2.2.7.30.4 Examples ...... 472 3.2.2.7.30.5 Getting the DDL for a database ...... 472 3.2.2.7.30.6 Exporting database DDL to a file ...... 473 3.2.2.7.31 EXECUTE_SAVED_QUERY ...... 473 3.2.2.7.31.1 Permissions ...... 473 3.2.2.7.31.2 Syntax ...... 473 3.2.2.7.31.3 Returns ...... 
473 3.2.2.7.31.4 Parameters ...... 474 3.2.2.7.31.5 Notes ...... 474 3.2.2.7.31.6 Examples ...... 474 3.2.2.7.31.7 Saving and executing a simple query ...... 475 3.2.2.7.31.8 Saving and executing parametrized query ...... 475 3.2.2.7.32 GET_DDL ...... 476 xxiv 3.2.2.7.32.1 Permissions ...... 476 3.2.2.7.32.2 Syntax ...... 476 3.2.2.7.32.3 Parameters ...... 476 3.2.2.7.32.4 Examples ...... 477 3.2.2.7.32.5 Getting the DDL for a table ...... 477 3.2.2.7.32.6 Exporting table DDL to a file ...... 477 3.2.2.7.33 GET_FUNCTION_DDL ...... 477 3.2.2.7.33.1 Permissions ...... 477 3.2.2.7.33.2 Syntax ...... 478 3.2.2.7.33.3 Parameters ...... 478 3.2.2.7.33.4 Examples ...... 478 3.2.2.7.33.5 Getting the DDL for a function ...... 478 3.2.2.7.33.6 Exporting function DDL to a file ...... 479 3.2.2.7.34 GET_LICENSE_INFO ...... 479 3.2.2.7.35 GET_VIEW_DDL ...... 479 3.2.2.7.35.1 Permissions ...... 479 3.2.2.7.35.2 Syntax ...... 479 3.2.2.7.35.3 Parameters ...... 480 3.2.2.7.35.4 Examples ...... 480 3.2.2.7.35.5 Getting the DDL for a view ...... 480 3.2.2.7.35.6 Exporting view DDL to a file ...... 480 3.2.2.7.36 LIST_SAVED_QUERIES ...... 480 3.2.2.7.36.1 Permissions ...... 481 3.2.2.7.36.2 Syntax ...... 481 3.2.2.7.36.3 Returns ...... 481 3.2.2.7.36.4 Parameters ...... 481 3.2.2.7.36.5 Notes ...... 481 3.2.2.7.36.6 Examples ...... 481 3.2.2.7.36.7 Listing previously saved queries ...... 481 3.2.2.7.36.8 Listing saved queries with the catalog ...... 481 3.2.2.7.37 RECOMPILE_SAVED_QUERY ...... 482 3.2.2.7.37.1 Permissions ...... 482 3.2.2.7.37.2 Syntax ...... 482 3.2.2.7.37.3 Returns ...... 482 3.2.2.7.37.4 Parameters ...... 482 3.2.2.7.37.5 Examples ...... 482 3.2.2.7.37.6 Recreating a query that has been invalidated ...... 482 3.2.2.7.38 RECOMPILE_VIEW ...... 483 3.2.2.7.38.1 Permissions ...... 483 3.2.2.7.38.2 Syntax ...... 484 3.2.2.7.38.3 Parameters ...... 484 3.2.2.7.38.4 Examples ...... 484 3.2.2.7.38.5 Recreating a view that has been invalidated ...... 
484 3.2.2.7.39 SAVE_QUERY ...... 484 3.2.2.7.39.1 Permissions ...... 484 3.2.2.7.39.2 Syntax ...... 484 3.2.2.7.39.3 Returns ...... 485 3.2.2.7.39.4 Parameters ...... 485 3.2.2.7.39.5 Notes ...... 485 3.2.2.7.39.6 Examples ...... 485 3.2.2.7.39.7 Saving and executing a simple query ...... 486 3.2.2.7.39.8 Saving and executing parametrized query ...... 486 3.2.2.7.40 SHOW_SAVED_QUERY ...... 487 3.2.2.7.40.1 Permissions ...... 487
xxv 3.2.2.7.40.2 Syntax ...... 487 3.2.2.7.40.3 Returns ...... 487 3.2.2.7.40.4 Parameters ...... 487 3.2.2.7.40.5 Examples ...... 488 3.2.2.7.40.6 Showing a previously saved query ...... 488 3.2.2.7.41 EXPLAIN ...... 488 3.2.2.7.41.1 Permissions ...... 488 3.2.2.7.41.2 Syntax ...... 488 3.2.2.7.41.3 Parameters ...... 488 3.2.2.7.41.4 Notes ...... 488 3.2.2.7.41.5 Examples ...... 488 3.2.2.7.41.6 Generating a static query plan ...... 488 3.2.2.7.42 SHOW_CONNECTIONS ...... 489 3.2.2.7.42.1 Permissions ...... 489 3.2.2.7.42.2 Syntax ...... 489 3.2.2.7.42.3 Parameters ...... 489 3.2.2.7.42.4 Returns ...... 489 3.2.2.7.42.5 Notes ...... 489 3.2.2.7.42.6 Examples ...... 490 3.2.2.7.42.7 Using SHOW_CONNECTIONS to get statement IDs ...... 490 3.2.2.7.43 SHOW_LOCKS ...... 490 3.2.2.7.43.1 Permissions ...... 490 3.2.2.7.43.2 Syntax ...... 490 3.2.2.7.43.3 Parameters ...... 490 3.2.2.7.43.4 Returns ...... 490 3.2.2.7.43.5 Examples ...... 491 3.2.2.7.43.6 Using SHOW_LOCKS to see active locks ...... 491 3.2.2.7.44 SHOW_NODE_INFO ...... 491 3.2.2.7.44.1 Permissions ...... 492 3.2.2.7.44.2 Syntax ...... 492 3.2.2.7.44.3 Parameters ...... 492 3.2.2.7.44.4 Returns ...... 492 3.2.2.7.44.5 Node types ...... 492 3.2.2.7.44.6 Statement statuses ...... 495 3.2.2.7.44.7 Notes ...... 495 3.2.2.7.44.8 Examples ...... 495 3.2.2.7.44.9 Getting execution details for a statement ...... 495 3.2.2.7.45 SHOW_SERVER_STATUS ...... 496 3.2.2.7.45.1 Permissions ...... 496 3.2.2.7.45.2 Syntax ...... 496 3.2.2.7.45.3 Parameters ...... 497 3.2.2.7.45.4 Returns ...... 497 3.2.2.7.45.5 Notes ...... 497 3.2.2.7.45.6 Examples ...... 498 3.2.2.7.45.7 Using SHOW_SERVER_STATUS to get statement IDs ...... 498 3.2.2.7.46 STOP_STATEMENT ...... 498 3.2.2.7.46.1 Permissions ...... 498 3.2.2.7.46.2 Syntax ...... 498 3.2.2.7.46.3 Parameters ...... 498 3.2.2.7.46.4 Returns ...... 498 3.2.2.7.46.5 Notes ...... 499 3.2.2.7.46.6 Examples ...... 499 3.2.2.7.46.7 Using SHOW_CONNECTIONS to get statement IDs .... 
499 3.2.2.7.47 SHOW_SUBSCRIBED_INSTANCES ...... 499 xxvi 3.2.2.7.47.1 Permissions ...... 499 3.2.2.7.47.2 Syntax ...... 499 3.2.2.7.47.3 Parameters ...... 500 3.2.2.7.47.4 Notes ...... 500 3.2.2.7.47.5 Examples ...... 500 3.2.2.7.47.6 List service queues and workers ...... 500 3.2.2.7.48 SUBSCRIBE_SERVICE ...... 500 3.2.2.7.48.1 Permissions ...... 500 3.2.2.7.48.2 Syntax ...... 500 3.2.2.7.48.3 Parameters ...... 501 3.2.2.7.48.4 Notes ...... 501 3.2.2.7.48.5 Examples ...... 501 3.2.2.7.48.6 Subscribing a worker to a service queue ...... 501 3.2.2.7.49 UNSUBSCRIBE_SERVICE ...... 502 3.2.2.7.49.1 Permissions ...... 502 3.2.2.7.49.2 Syntax ...... 502 3.2.2.7.49.3 Parameters ...... 502 3.2.2.7.49.4 Notes ...... 502 3.2.2.7.49.5 Examples ...... 502 3.2.2.7.49.6 Unsubscribing a worker from a service queue ...... 502 3.2.2.7.50 ALTER DEFAULT PERMISSIONS ...... 503 3.2.2.7.50.1 Permissions ...... 503 3.2.2.7.50.2 Syntax ...... 503 3.2.2.7.50.3 Supported permissions ...... 504 3.2.2.7.50.4 Examples ...... 504 3.2.2.7.50.5 Automatic permissions for newly created schemas ...... 504 3.2.2.7.50.6 Automatic permissions for newly created tables in a schema . 505 3.2.2.7.50.7 Revoke (DROP GRANT) permissions for newly created tables . 505 3.2.2.7.51 ALTER ROLE ...... 505 3.2.2.7.51.1 Subcommands ...... 505 3.2.2.7.52 CREATE ROLE ...... 505 3.2.2.7.52.1 Permissions ...... 505 3.2.2.7.52.2 Syntax ...... 505 3.2.2.7.52.3 Parameters ...... 506 3.2.2.7.52.4 Notes ...... 506 3.2.2.7.52.5 Examples ...... 506 3.2.2.7.52.6 Creating a role with no permissions ...... 506 3.2.2.7.52.7 Creating a user role ...... 506 3.2.2.7.53 DROP ROLE ...... 506 3.2.2.7.53.1 Permissions ...... 507 3.2.2.7.53.2 Syntax ...... 507 3.2.2.7.53.3 Parameters ...... 507 3.2.2.7.53.4 Examples ...... 507 3.2.2.7.53.5 Dropping a role ...... 507 3.2.2.7.54 GET_STATEMENT_PERMISSIONS ...... 507 3.2.2.7.54.1 Permissions ...... 507 3.2.2.7.54.2 Syntax ...... 507 3.2.2.7.54.3 Parameters ...... 508 3.2.2.7.54.4 Returns ...... 
508 3.2.2.7.54.5 Supported permissions ...... 508 3.2.2.7.54.6 Examples ...... 509 3.2.2.7.54.7 Getting permission details for a simple statement ...... 509 3.2.2.7.54.8 Getting permission details for a DDL statement ...... 509 3.2.2.7.55 GRANT ...... 509
3.2.2.7.55.1 Permissions ...... 509 3.2.2.7.55.2 Syntax ...... 509 3.2.2.7.55.3 Parameters ...... 511 3.2.2.7.55.4 Supported permissions ...... 512 3.2.2.7.55.5 Examples ...... 512 3.2.2.7.55.6 Creating a user role with login permissions ...... 512 3.2.2.7.55.7 Promoting a user to a superuser ...... 512 3.2.2.7.55.8 Creating a new role for a group of users ...... 513 3.2.2.7.55.9 Granting with admin option ...... 513 3.2.2.7.55.10 Change password for user role ...... 513 3.2.2.7.56 RENAME ROLE ...... 513 3.2.2.7.56.1 Permissions ...... 513 3.2.2.7.56.2 Syntax ...... 513 3.2.2.7.56.3 Parameters ...... 514 3.2.2.7.56.4 Notes ...... 514 3.2.2.7.56.5 Examples ...... 514 3.2.2.7.56.6 Renaming a role ...... 514 3.2.2.7.57 REVOKE ...... 514 3.2.2.7.57.1 Permissions ...... 514 3.2.2.7.57.2 Syntax ...... 515 3.2.2.7.57.3 Parameters ...... 516 3.2.2.7.57.4 Supported permissions ...... 517 3.2.2.7.57.5 Examples ...... 517 3.2.2.7.57.6 Prevent a role from modifying table contents ...... 517 3.2.2.7.57.7 Demoting a user from superuser ...... 517 3.2.2.7.57.8 Revoking admin option ...... 517 3.2.3 SQL Functions ...... 518 3.2.3.1 Summary of functions ...... 518 3.2.3.1.1 Built-In Scalar Functions ...... 518 3.2.3.1.1.1 Bitwise Operations ...... 518 3.2.3.1.1.2 Conditionals ...... 519 3.2.3.1.1.3 Conversion ...... 519 3.2.3.1.1.4 Date and Time ...... 519 3.2.3.1.1.5 Numeric ...... 519 3.2.3.1.1.6 Strings ...... 521 3.2.3.1.2 User-Defined Scalar Functions ...... 521 3.2.3.1.3 Aggregate Functions ...... 521 3.2.3.1.4 Window Functions ...... 522 3.2.3.1.5 System Functions ...... 522 3.2.3.1.6 Workload Management Functions ...... 522 3.2.3.1.6.1 Built-In Scalar functions ...... 522 3.2.3.1.6.2 & (bitwise AND) ...... 523 3.2.3.1.6.3 Syntax ...... 523 3.2.3.1.6.4 Arguments ...... 523 3.2.3.1.6.5 Returns ...... 523 3.2.3.1.6.6 Notes ...... 523 3.2.3.1.6.7 Examples ...... 523 3.2.3.1.6.8 ~ (bitwise NOT) ...... 524 3.2.3.1.6.9 Syntax ...... 524 3.2.3.1.6.10 Arguments ......
524 3.2.3.1.6.11 Returns ...... 524 3.2.3.1.6.12 Notes ...... 524 3.2.3.1.6.13 Examples ...... 524 3.2.3.1.6.14 | (bitwise OR) ...... 525 xxviii 3.2.3.1.6.15 Syntax ...... 525 3.2.3.1.6.16 Arguments ...... 525 3.2.3.1.6.17 Returns ...... 525 3.2.3.1.6.18 Notes ...... 525 3.2.3.1.6.19 Examples ...... 526 3.2.3.1.6.20 << (bitwise shift left) ...... 526 3.2.3.1.6.21 Syntax ...... 526 3.2.3.1.6.22 Arguments ...... 526 3.2.3.1.6.23 Returns ...... 527 3.2.3.1.6.24 Notes ...... 527 3.2.3.1.6.25 Examples ...... 527 3.2.3.1.6.26 >> (bitwise shift right) ...... 527 3.2.3.1.6.27 Syntax ...... 527 3.2.3.1.6.28 Arguments ...... 528 3.2.3.1.6.29 Returns ...... 528 3.2.3.1.6.30 Notes ...... 528 3.2.3.1.6.31 Examples ...... 528 3.2.3.1.6.32 XOR (bitwise XOR) ...... 528 3.2.3.1.6.33 Syntax ...... 529 3.2.3.1.6.34 Arguments ...... 529 3.2.3.1.6.35 Returns ...... 529 3.2.3.1.6.36 Notes ...... 529 3.2.3.1.6.37 Examples ...... 529 3.2.3.1.6.38 BETWEEN ...... 530 3.2.3.1.6.39 Syntax ...... 530 3.2.3.1.6.40 Arguments ...... 530 3.2.3.1.6.41 Returns ...... 530 3.2.3.1.6.42 Notes ...... 530 3.2.3.1.6.43 Examples ...... 530 3.2.3.1.6.44 BETWEEN ...... 530 3.2.3.1.6.45 NOT BETWEEN ...... 531 3.2.3.1.6.46 CASE ...... 531 3.2.3.1.6.47 Syntax ...... 531 3.2.3.1.6.48 Arguments ...... 531 3.2.3.1.6.49 Returns ...... 532 3.2.3.1.6.50 Notes ...... 532 3.2.3.1.6.51 Simple case expression ...... 532 3.2.3.1.6.52 Searched case expression ...... 532 3.2.3.1.6.53 Examples ...... 532 3.2.3.1.6.54 Simple case ...... 532 3.2.3.1.6.55 Searched case ...... 533 3.2.3.1.6.56 Replacing IF with CASE ...... 533 3.2.3.1.6.57 COALESCE ...... 534 3.2.3.1.6.58 Syntax ...... 534 3.2.3.1.6.59 Arguments ...... 534 3.2.3.1.6.60 Returns ...... 534 3.2.3.1.6.61 Notes ...... 534 3.2.3.1.6.62 Examples ...... 534 3.2.3.1.6.63 Coalesce ...... 534 3.2.3.1.6.64 Replacing NULL values in aggregations ...... 534 3.2.3.1.6.65 IN ...... 535 3.2.3.1.6.66 Syntax ...... 535 3.2.3.1.6.67 Arguments ...... 535 3.2.3.1.6.68 Returns ...... 
535
3.2.3.1.6.69 Notes ...... 535 3.2.3.1.6.70 Examples ...... 535 3.2.3.1.6.71 IN ...... 535 3.2.3.1.6.72 NOT IN ...... 536 3.2.3.1.6.73 IS_ASCII ...... 536 3.2.3.1.6.74 Syntax ...... 536 3.2.3.1.6.75 Arguments ...... 536 3.2.3.1.6.76 Returns ...... 536 3.2.3.1.6.77 Notes ...... 536 3.2.3.1.6.78 Examples ...... 536 3.2.3.1.6.79 IS NULL ...... 537 3.2.3.1.6.80 IS NULL ...... 537 3.2.3.1.6.81 Syntax ...... 537 3.2.3.1.6.82 Arguments ...... 537 3.2.3.1.6.83 Returns ...... 537 3.2.3.1.6.84 Examples ...... 537 3.2.3.1.6.85 IS NULL ...... 538 3.2.3.1.6.86 Using IS NOT NULL to filter unwanted results ...... 538 3.2.3.1.6.87 ISNULL ...... 538 3.2.3.1.6.88 Syntax ...... 538 3.2.3.1.6.89 Arguments ...... 538 3.2.3.1.6.90 Returns ...... 538 3.2.3.1.6.91 Notes ...... 538 3.2.3.1.6.92 Examples ...... 539 3.2.3.1.6.93 ISNULL ...... 539 3.2.3.1.6.94 Replacing NULL values in aggregations ...... 539 3.2.3.1.6.95 FROM_UNIXTS, FROM_UNIXTSMS ...... 539 3.2.3.1.6.96 Syntax ...... 539 3.2.3.1.6.97 Arguments ...... 539 3.2.3.1.6.98 Returns ...... 539 3.2.3.1.6.99 Notes ...... 540 3.2.3.1.6.100 Examples ...... 540 3.2.3.1.6.101 Convert a UNIX timestamp to DATETIME ...... 540 3.2.3.1.6.102 TO_HEX ...... 540 3.2.3.1.6.103 Syntax ...... 540 3.2.3.1.6.104 Arguments ...... 540 3.2.3.1.6.105 Returns ...... 540 3.2.3.1.6.106 Examples ...... 540 3.2.3.1.6.107 Convert numbers to hexadecimal ...... 540 3.2.3.1.6.108 TO_UNIXTS, TO_UNIXTSMS ...... 541 3.2.3.1.6.109 Syntax ...... 541 3.2.3.1.6.110 Arguments ...... 541 3.2.3.1.6.111 Returns ...... 541 3.2.3.1.6.112 Notes ...... 541 3.2.3.1.6.113 Examples ...... 541 3.2.3.1.6.114 Get the current UNIX timestamp ...... 541 3.2.3.1.6.115 Filter on a range of UNIX timestamps ...... 542 3.2.3.1.6.116 CURDATE ...... 542 3.2.3.1.6.117 Syntax ...... 542 3.2.3.1.6.118 Arguments ...... 542 3.2.3.1.6.119 Returns ...... 542 3.2.3.1.6.120 Notes ...... 542 3.2.3.1.6.121 Examples ...... 542 3.2.3.1.6.122 Get the current system date ......
542 xxx 3.2.3.1.6.123 CURRENT_DATE ...... 543 3.2.3.1.6.124 Syntax ...... 543 3.2.3.1.6.125 Arguments ...... 543 3.2.3.1.6.126 Returns ...... 543 3.2.3.1.6.127 Notes ...... 543 3.2.3.1.6.128 Examples ...... 543 3.2.3.1.6.129 Get the current system date ...... 543 3.2.3.1.6.130 CURRENT_TIMESTAMP ...... 543 3.2.3.1.6.131 Syntax ...... 544 3.2.3.1.6.132 Arguments ...... 544 3.2.3.1.6.133 Returns ...... 544 3.2.3.1.6.134 Notes ...... 544 3.2.3.1.6.135 Examples ...... 544 3.2.3.1.6.136 Get the current system date and time ...... 544 3.2.3.1.6.137 Find events that happen before this month ...... 544 3.2.3.1.6.138 DATEADD ...... 544 3.2.3.1.6.139 Syntax ...... 545 3.2.3.1.6.140 Arguments ...... 545 3.2.3.1.6.141 Valid date parts ...... 545 3.2.3.1.6.142 Returns ...... 546 3.2.3.1.6.143 Notes ...... 546 3.2.3.1.6.144 Examples ...... 546 3.2.3.1.6.145 Add a quarter to each date ...... 546 3.2.3.1.6.146 Getting next month’s date ...... 546 3.2.3.1.6.147 Filtering +- 50 years from a specific date ...... 547 3.2.3.1.6.148 Check if a year is a leap year ...... 547 3.2.3.1.6.149 DATEDIFF ...... 547 3.2.3.1.6.150 Syntax ...... 547 3.2.3.1.6.151 Arguments ...... 548 3.2.3.1.6.152 Valid date parts ...... 548 3.2.3.1.6.153 Returns ...... 548 3.2.3.1.6.154 Notes ...... 548 3.2.3.1.6.155 Examples ...... 548 3.2.3.1.6.156 Find out how far past events are ...... 549 3.2.3.1.6.157 In years ...... 549 3.2.3.1.6.158 In days ...... 549 3.2.3.1.6.159 In hours ...... 549 3.2.3.1.6.160 DATEPART ...... 550 3.2.3.1.6.161 Syntax ...... 550 3.2.3.1.6.162 Arguments ...... 550 3.2.3.1.6.163 Valid date parts ...... 551 3.2.3.1.6.164 Returns ...... 551 3.2.3.1.6.165 Notes ...... 551 3.2.3.1.6.166 Examples ...... 551 3.2.3.1.6.167 Break up a DATE into components ...... 551 3.2.3.1.6.168 Break up a DATETIME into time components ...... 552 3.2.3.1.6.169 Count number of rows grouped by quarter ...... 552 3.2.3.1.6.170 EOMONTH ...... 552 3.2.3.1.6.171 Syntax ...... 553 3.2.3.1.6.172 Arguments ...... 
553 3.2.3.1.6.173 Returns ...... 553 3.2.3.1.6.174 Notes ...... 553 3.2.3.1.6.175 Examples ...... 553 3.2.3.1.6.176 Find last day of the month for a DATE ...... 553
3.2.3.1.6.177 Find the last day of the month for a DATETIME ...... 554 3.2.3.1.6.178 EXTRACT ...... 554 3.2.3.1.6.179 Syntax ...... 554 3.2.3.1.6.180 Arguments ...... 554 3.2.3.1.6.181 Valid date parts ...... 555 3.2.3.1.6.182 Returns ...... 555 3.2.3.1.6.183 Notes ...... 555 3.2.3.1.6.184 Examples ...... 555 3.2.3.1.6.185 Break up a DATE into components ...... 555 3.2.3.1.6.186 GETDATE ...... 556 3.2.3.1.6.187 Syntax ...... 556 3.2.3.1.6.188 Arguments ...... 556 3.2.3.1.6.189 Returns ...... 556 3.2.3.1.6.190 Notes ...... 556 3.2.3.1.6.191 Examples ...... 556 3.2.3.1.6.192 Get the current system date and time ...... 556 3.2.3.1.6.193 Find events that happen before this month ...... 556 3.2.3.1.6.194 SYSDATE ...... 557 3.2.3.1.6.195 Syntax ...... 557 3.2.3.1.6.196 Arguments ...... 557 3.2.3.1.6.197 Returns ...... 557 3.2.3.1.6.198 Notes ...... 557 3.2.3.1.6.199 Examples ...... 557 3.2.3.1.6.200 Get the current system date and time ...... 557 3.2.3.1.6.201 Find events that happen before this month ...... 557 3.2.3.1.6.202 TRUNC ...... 558 3.2.3.1.6.203 Syntax ...... 558 3.2.3.1.6.204 Arguments ...... 558 3.2.3.1.6.205 Valid date parts ...... 558 3.2.3.1.6.206 Returns ...... 559 3.2.3.1.6.207 Notes ...... 559 3.2.3.1.6.208 Examples ...... 559 3.2.3.1.6.209 Set all DATE columns to DATETIME at midnight ...... 559 3.2.3.1.6.210 Find the first day of the month for dates ...... 559 3.2.3.1.6.211 Calculate number of hours from New Years ...... 560 3.2.3.1.6.212 ABS ...... 560 3.2.3.1.6.213 Syntax ...... 560 3.2.3.1.6.214 Arguments ...... 560 3.2.3.1.6.215 Returns ...... 560 3.2.3.1.6.216 Notes ...... 560 3.2.3.1.6.217 Examples ...... 560 3.2.3.1.6.218 Absolute value on an integer ...... 561 3.2.3.1.6.219 Absolute value on integer and floating point ...... 561 3.2.3.1.6.220 ACOS ...... 561 3.2.3.1.6.221 Syntax ...... 561 3.2.3.1.6.222 Arguments ...... 561 3.2.3.1.6.223 Returns ...... 561 3.2.3.1.6.224 Notes ...... 562 3.2.3.1.6.225 Examples ......
562 3.2.3.1.6.226 Inverse cosine of 0 (= π/2) ...... 562 3.2.3.1.6.227 Computing inverse cosine for a column ...... 562 3.2.3.1.6.228 Arithmetic operators ...... 562 3.2.3.1.6.229 Syntax ...... 563 3.2.3.1.6.230 Arguments ...... 563 xxxii 3.2.3.1.6.231 Arithmetic operators ...... 563 3.2.3.1.6.232 Notes ...... 563 3.2.3.1.6.233 Examples ...... 563 3.2.3.1.6.234 Coercing with a unary + ...... 563 3.2.3.1.6.235 Using infix operators ...... 563 3.2.3.1.6.236 ASIN ...... 564 3.2.3.1.6.237 Syntax ...... 564 3.2.3.1.6.238 Arguments ...... 564 3.2.3.1.6.239 Returns ...... 564 3.2.3.1.6.240 Notes ...... 564 3.2.3.1.6.241 Examples ...... 564 3.2.3.1.6.242 Inverse sine of 1 (= π/2) ...... 565 3.2.3.1.6.243 Computing inverse sine for a column ...... 565 3.2.3.1.6.244 ATAN ...... 565 3.2.3.1.6.245 Syntax ...... 565 3.2.3.1.6.246 Arguments ...... 565 3.2.3.1.6.247 Returns ...... 565 3.2.3.1.6.248 Notes ...... 565 3.2.3.1.6.249 Examples ...... 566 3.2.3.1.6.250 Inverse tangent of 1 (= π/4) ...... 566 3.2.3.1.6.251 Computing inverse tangent for a column ...... 566 3.2.3.1.6.252 ATN2 ...... 566 3.2.3.1.6.253 Syntax ...... 566 3.2.3.1.6.254 Arguments ...... 567 3.2.3.1.6.255 Returns ...... 567 3.2.3.1.6.256 Notes ...... 567 3.2.3.1.6.257 Examples ...... 567 3.2.3.1.6.258 Inverse tangent of (1, 0) (= π/2) ...... 567 3.2.3.1.6.259 Inverse tangent of x=0.5, y=SQRT(3)/2 (= π/3, or 60°) ... 567 3.2.3.1.6.260 CEILING / CEIL ...... 567 3.2.3.1.6.261 Syntax ...... 568 3.2.3.1.6.262 Arguments ...... 568 3.2.3.1.6.263 Returns ...... 568 3.2.3.1.6.264 Notes ...... 568 3.2.3.1.6.265 Examples ...... 568 3.2.3.1.6.266 FLOOR vs. CEIL vs. ROUND ...... 568 3.2.3.1.6.267 COS ...... 568 3.2.3.1.6.268 Syntax ...... 568 3.2.3.1.6.269 Arguments ...... 569 3.2.3.1.6.270 Returns ...... 569 3.2.3.1.6.271 Notes ...... 569 3.2.3.1.6.272 Examples ...... 569 3.2.3.1.6.273 Cosine of 0 (= 1) ...... 569 3.2.3.1.6.274 Computing cosine for a column, in radians ...... 569 3.2.3.1.6.275 COT ...... 
570 3.2.3.1.6.276 Syntax ...... 570 3.2.3.1.6.277 Arguments ...... 570 3.2.3.1.6.278 Returns ...... 570 3.2.3.1.6.279 Notes ...... 570 3.2.3.1.6.280 Examples ...... 570 3.2.3.1.6.281 Cotangent of π/4 (= 1) ...... 570 3.2.3.1.6.282 Computing cotangents for a column, in radians ...... 571 3.2.3.1.6.283 CRC64 ...... 571 3.2.3.1.6.284 Syntax ...... 571
3.2.3.1.6.285 Arguments ...... 571 3.2.3.1.6.286 Returns ...... 571 3.2.3.1.6.287 Notes ...... 571 3.2.3.1.6.288 Examples ...... 572 3.2.3.1.6.289 Calculate a CRC-64 hash of a string ...... 572 3.2.3.1.6.290 DEGREES ...... 572 3.2.3.1.6.291 Syntax ...... 572 3.2.3.1.6.292 Arguments ...... 572 3.2.3.1.6.293 Returns ...... 572 3.2.3.1.6.294 Notes ...... 572 3.2.3.1.6.295 Examples ...... 572 3.2.3.1.6.296 EXP ...... 573 3.2.3.1.6.297 Syntax ...... 573 3.2.3.1.6.298 Arguments ...... 573 3.2.3.1.6.299 Returns ...... 573 3.2.3.1.6.300 Notes ...... 573 3.2.3.1.6.301 Examples ...... 573 3.2.3.1.6.302 Natural exponent values ...... 573 3.2.3.1.6.303 FLOOR ...... 574 3.2.3.1.6.304 Syntax ...... 574 3.2.3.1.6.305 Arguments ...... 574 3.2.3.1.6.306 Returns ...... 574 3.2.3.1.6.307 Notes ...... 574 3.2.3.1.6.308 Examples ...... 574 3.2.3.1.6.309 FLOOR vs. CEILING / CEIL vs. ROUND ...... 574 3.2.3.1.6.310 LOG ...... 575 3.2.3.1.6.311 Syntax ...... 575 3.2.3.1.6.312 Arguments ...... 575 3.2.3.1.6.313 Returns ...... 575 3.2.3.1.6.314 Notes ...... 575 3.2.3.1.6.315 Examples ...... 575 3.2.3.1.6.316 Natural log values ...... 575 3.2.3.1.6.317 LOG10 ...... 575 3.2.3.1.6.318 Syntax ...... 575 3.2.3.1.6.319 Arguments ...... 576 3.2.3.1.6.320 Returns ...... 576 3.2.3.1.6.321 Notes ...... 576 3.2.3.1.6.322 Examples ...... 576 3.2.3.1.6.323 Log (base 10) values ...... 576 3.2.3.1.6.324 MOD, % ...... 576 3.2.3.1.6.325 Syntax ...... 576 3.2.3.1.6.326 Arguments ...... 576 3.2.3.1.6.327 Returns ...... 576 3.2.3.1.6.328 Notes ...... 577 3.2.3.1.6.329 Examples ...... 577 3.2.3.1.6.330 Using the MOD function ...... 577 3.2.3.1.6.331 Using the infix operator ...... 577 3.2.3.1.6.332 PI ...... 577 3.2.3.1.6.333 Syntax ...... 577 3.2.3.1.6.334 Arguments ...... 577 3.2.3.1.6.335 Returns ...... 577 3.2.3.1.6.336 Examples ...... 577 3.2.3.1.6.337 Using PI() to represent degrees in radians ...... 577 3.2.3.1.6.338 Tangent of π/4 (= 1) ...... 578 3.2.3.1.6.339 Sine of π/2 (= 1) ......
578 3.2.3.1.6.340 POWER ...... 578 3.2.3.1.6.341 Syntax ...... 578 3.2.3.1.6.342 Arguments ...... 578 3.2.3.1.6.343 Returns ...... 578 3.2.3.1.6.344 Notes ...... 579 3.2.3.1.6.345 Examples ...... 579 3.2.3.1.6.346 On integers ...... 579 3.2.3.1.6.347 On floating point ...... 579 3.2.3.1.6.348 RADIANS ...... 579 3.2.3.1.6.349 Syntax ...... 579 3.2.3.1.6.350 Arguments ...... 579 3.2.3.1.6.351 Returns ...... 580 3.2.3.1.6.352 Notes ...... 580 3.2.3.1.6.353 Examples ...... 580 3.2.3.1.6.354 ROUND ...... 580 3.2.3.1.6.355 Syntax ...... 580 3.2.3.1.6.356 Arguments ...... 581 3.2.3.1.6.357 Returns ...... 581 3.2.3.1.6.358 Notes ...... 581 3.2.3.1.6.359 Examples ...... 581 3.2.3.1.6.360 Rounding to the nearest integer ...... 581 3.2.3.1.6.361 Rounding to 2 digits after the decimal point ...... 581 3.2.3.1.6.362 FLOOR vs. CEILING / CEIL vs. ROUND ...... 581 3.2.3.1.6.363 SIN ...... 582 3.2.3.1.6.364 Syntax ...... 582 3.2.3.1.6.365 Arguments ...... 582 3.2.3.1.6.366 Returns ...... 582 3.2.3.1.6.367 Notes ...... 582 3.2.3.1.6.368 Examples ...... 582 3.2.3.1.6.369 Sine of π/2 (= 1) ...... 583 3.2.3.1.6.370 Computing sine for a column, in radians ...... 583 3.2.3.1.6.371 SQRT ...... 583 3.2.3.1.6.372 Syntax ...... 583 3.2.3.1.6.373 Arguments ...... 583 3.2.3.1.6.374 Returns ...... 583 3.2.3.1.6.375 Notes ...... 584 3.2.3.1.6.376 Examples ...... 584 3.2.3.1.6.377 Replacing negative values with NULL ...... 584 3.2.3.1.6.378 SQUARE ...... 584 3.2.3.1.6.379 Syntax ...... 585 3.2.3.1.6.380 Arguments ...... 585 3.2.3.1.6.381 Returns ...... 585 3.2.3.1.6.382 Notes ...... 585 3.2.3.1.6.383 Examples ...... 585 3.2.3.1.6.384 TAN ...... 585 3.2.3.1.6.385 Syntax ...... 585 3.2.3.1.6.386 Arguments ...... 585 3.2.3.1.6.387 Returns ...... 585 3.2.3.1.6.388 Notes ...... 586 3.2.3.1.6.389 Examples ...... 586 3.2.3.1.6.390 Tangent of π/4 (= 1) ...... 586 3.2.3.1.6.391 Computing tangents for a column, in radians ...... 586 3.2.3.1.6.392 TRUNC ...... 586
3.2.3.1.6.393 Syntax ...... 587 3.2.3.1.6.394 Arguments ...... 587 3.2.3.1.6.395 Returns ...... 587 3.2.3.1.6.396 Notes ...... 587 3.2.3.1.6.397 Examples ...... 587 3.2.3.1.6.398 Rounding to the nearest integer ...... 587 3.2.3.1.6.399 TRUNC vs. ROUND ...... 587 3.2.3.1.6.400 CHAR_LENGTH ...... 588 3.2.3.1.6.401 Syntax ...... 588 3.2.3.1.6.402 Arguments ...... 588 3.2.3.1.6.403 Returns ...... 588 3.2.3.1.6.404 Notes ...... 588 3.2.3.1.6.405 Examples ...... 588 3.2.3.1.6.406 Length in characters and bytes of strings ...... 588 3.2.3.1.6.407 CHARINDEX ...... 589 3.2.3.1.6.408 Syntax ...... 589 3.2.3.1.6.409 Arguments ...... 589 3.2.3.1.6.410 Returns ...... 589 3.2.3.1.6.411 Notes ...... 589 3.2.3.1.6.412 Examples ...... 589 3.2.3.1.6.413 Using CHARINDEX ...... 590 3.2.3.1.6.414 || (Concatenate) ...... 590 3.2.3.1.6.415 Syntax ...... 590 3.2.3.1.6.416 Arguments ...... 590 3.2.3.1.6.417 Returns ...... 590 3.2.3.1.6.418 Notes ...... 590 3.2.3.1.6.419 Examples ...... 590 3.2.3.1.6.420 Concatenate two values ...... 591 3.2.3.1.6.421 Concatenate two strings ...... 592 3.2.3.1.6.422 Adding spaces ...... 592 3.2.3.1.6.423 ISPREFIXOF ...... 592 3.2.3.1.6.424 Syntax ...... 592 3.2.3.1.6.425 Arguments ...... 592 3.2.3.1.6.426 Returns ...... 593 3.2.3.1.6.427 Notes ...... 593 3.2.3.1.6.428 Examples ...... 593 3.2.3.1.6.429 Filtering using ISPREFIXOF ...... 593 3.2.3.1.6.430 LEFT ...... 593 3.2.3.1.6.431 Syntax ...... 593 3.2.3.1.6.432 Arguments ...... 594 3.2.3.1.6.433 Returns ...... 594 3.2.3.1.6.434 Notes ...... 594 3.2.3.1.6.435 Examples ...... 594 3.2.3.1.6.436 Using LEFT ...... 594 3.2.3.1.6.437 LEN ...... 594 3.2.3.1.6.438 Syntax ...... 595 3.2.3.1.6.439 Arguments ...... 595 3.2.3.1.6.440 Returns ...... 595 3.2.3.1.6.441 Notes ...... 595 3.2.3.1.6.442 Examples ...... 595 3.2.3.1.6.443 Length in characters and bytes of strings ...... 596 3.2.3.1.6.444 Length of an ASCII string ...... 596 3.2.3.1.6.445 Absolute value on integer and floating point ......
596 3.2.3.1.6.446 LIKE ...... 596 xxxvi 3.2.3.1.6.447 Syntax ...... 597 3.2.3.1.6.448 Arguments ...... 597 3.2.3.1.6.449 Test patterns ...... 597 3.2.3.1.6.450 Returns ...... 597 3.2.3.1.6.451 Notes ...... 597 3.2.3.1.6.452 Examples ...... 597 3.2.3.1.6.453 Match the beginning of a string ...... 598 3.2.3.1.6.454 Match a wildcard character by escaping ...... 599 3.2.3.1.6.455 Negate with NOT ...... 599 3.2.3.1.6.456 Match the middle of a string ...... 599 3.2.3.1.6.457 Find players with a middle name or suffix ...... 599 3.2.3.1.6.458 Find NON-LITERAL patterns ...... 600 3.2.3.1.6.459 Select rows in which z is a prefix of y: ...... 600 3.2.3.1.6.460 Select rows in which y contains z as a substring: ...... 600 3.2.3.1.6.461 Values that contain wildcards as well: ...... 600 3.2.3.1.6.462 LOWER ...... 600 3.2.3.1.6.463 Syntax ...... 601 3.2.3.1.6.464 Arguments ...... 601 3.2.3.1.6.465 Returns ...... 601 3.2.3.1.6.466 Notes ...... 601 3.2.3.1.6.467 Examples ...... 601 3.2.3.1.6.468 Lower-casing a literal value ...... 601 3.2.3.1.6.469 Lower-casing a column of values ...... 601 3.2.3.1.6.470 LTRIM ...... 602 3.2.3.1.6.471 Syntax ...... 602 3.2.3.1.6.472 Arguments ...... 602 3.2.3.1.6.473 Returns ...... 602 3.2.3.1.6.474 Notes ...... 602 3.2.3.1.6.475 Examples ...... 602 3.2.3.1.6.476 Trimming a literal value ...... 603 3.2.3.1.6.477 Trimming a column of values ...... 603 3.2.3.1.6.478 OCTET_LENGTH ...... 603 3.2.3.1.6.479 Syntax ...... 603 3.2.3.1.6.480 Arguments ...... 604 3.2.3.1.6.481 Returns ...... 604 3.2.3.1.6.482 Notes ...... 604 3.2.3.1.6.483 Examples ...... 604 3.2.3.1.6.484 Length in characters and bytes of strings ...... 604 3.2.3.1.6.485 PATINDEX ...... 604 3.2.3.1.6.486 Syntax ...... 605 3.2.3.1.6.487 Arguments ...... 605 3.2.3.1.6.488 Test patterns ...... 605 3.2.3.1.6.489 Returns ...... 605 3.2.3.1.6.490 Notes ...... 605 3.2.3.1.6.491 Examples ...... 605 3.2.3.1.6.492 Simple patterns ...... 606 3.2.3.1.6.493 Complex wildcards expressions ...... 
606 3.2.3.1.6.494 REGEXP_COUNT ...... 606 3.2.3.1.6.495 Syntax ...... 607 3.2.3.1.6.496 Arguments ...... 607 3.2.3.1.6.497 Test patterns ...... 607 3.2.3.1.6.498 Returns ...... 607 3.2.3.1.6.499 Notes ...... 607 3.2.3.1.6.500 Examples ...... 608
3.2.3.1.6.501 Find players with at least 3 names (2 spaces) ...... 608 3.2.3.1.6.502 Using the offset index ...... 609 3.2.3.1.6.503 REGEXP_INSTR ...... 609 3.2.3.1.6.504 Syntax ...... 609 3.2.3.1.6.505 Arguments ...... 609 3.2.3.1.6.506 Test patterns ...... 610 3.2.3.1.6.507 Returns ...... 610 3.2.3.1.6.508 Notes ...... 610 3.2.3.1.6.509 Examples ...... 610 3.2.3.1.6.510 Find players with ‘ow’ in their name ...... 611 3.2.3.1.6.511 Using the return_position argument ...... 611 3.2.3.1.6.512 REGEXP_SUBSTR ...... 612 3.2.3.1.6.513 Syntax ...... 612 3.2.3.1.6.514 Arguments ...... 612 3.2.3.1.6.515 Test patterns ...... 613 3.2.3.1.6.516 Returns ...... 613 3.2.3.1.6.517 Notes ...... 613 3.2.3.1.6.518 Examples ...... 613 3.2.3.1.6.519 Find players with ‘o’ in their name ...... 614 3.2.3.1.6.520 Using the return_position argument ...... 614 3.2.3.1.6.521 REPEAT ...... 615 3.2.3.1.6.522 Syntax ...... 615 3.2.3.1.6.523 Arguments ...... 615 3.2.3.1.6.524 Returns ...... 615 3.2.3.1.6.525 Notes ...... 615 3.2.3.1.6.526 Examples ...... 615 3.2.3.1.6.527 Repeat the text in customername 2 times: ...... 616 3.2.3.1.6.528 Repeat the string 0 times: ...... 616 3.2.3.1.6.529 Repeat the string 1 times: ...... 616 3.2.3.1.6.530 Repeat the string 3 times: ...... 616 3.2.3.1.6.531 Repeat an empty string 10 times: ...... 616 3.2.3.1.6.532 Repeat a string -3 times: ...... 617 3.2.3.1.6.533 REPLACE ...... 617 3.2.3.1.6.534 Syntax ...... 617 3.2.3.1.6.535 Arguments ...... 617 3.2.3.1.6.536 Returns ...... 617 3.2.3.1.6.537 Notes ...... 617 3.2.3.1.6.538 Examples ...... 617 3.2.3.1.6.539 Replacing single characters ...... 618 3.2.3.1.6.540 Replacing words ...... 618 3.2.3.1.6.541 REVERSE ...... 618 3.2.3.1.6.542 Syntax ...... 618 3.2.3.1.6.543 Arguments ...... 618 3.2.3.1.6.544 Returns ...... 619 3.2.3.1.6.545 Notes ...... 619 3.2.3.1.6.546 Examples ...... 619 3.2.3.1.6.547 Using REVERSE ...... 619 3.2.3.1.6.548 RIGHT ...... 619 3.2.3.1.6.549 Syntax ......
619 3.2.3.1.6.550 Arguments ...... 620 3.2.3.1.6.551 Returns ...... 620 3.2.3.1.6.552 Notes ...... 620 3.2.3.1.6.553 Examples ...... 620 3.2.3.1.6.554 Using RIGHT ...... 620 xxxviii 3.2.3.1.6.555 RLIKE ...... 620 3.2.3.1.6.556 Syntax ...... 621 3.2.3.1.6.557 Arguments ...... 621 3.2.3.1.6.558 Test patterns ...... 621 3.2.3.1.6.559 Returns ...... 621 3.2.3.1.6.560 Notes ...... 621 3.2.3.1.6.561 Examples ...... 622 3.2.3.1.6.562 Match the beginning of a string ...... 622 3.2.3.1.6.563 Negate with NOT ...... 623 3.2.3.1.6.564 Match the middle of a string ...... 623 3.2.3.1.6.565 Find players with a Roman numeral suffix ...... 623 3.2.3.1.6.566 Find players with just one middle name ...... 623 3.2.3.1.6.567 RTRIM ...... 624 3.2.3.1.6.568 Syntax ...... 624 3.2.3.1.6.569 Arguments ...... 624 3.2.3.1.6.570 Returns ...... 624 3.2.3.1.6.571 Notes ...... 624 3.2.3.1.6.572 Examples ...... 624 3.2.3.1.6.573 Trimming a literal value ...... 625 3.2.3.1.6.574 Trimming a column of values ...... 625 3.2.3.1.6.575 SUBSTRING ...... 625 3.2.3.1.6.576 Syntax ...... 625 3.2.3.1.6.577 Arguments ...... 626 3.2.3.1.6.578 Returns ...... 626 3.2.3.1.6.579 Notes ...... 626 3.2.3.1.6.580 Examples ...... 626 3.2.3.1.6.581 Substring using fixed offsets ...... 627 3.2.3.1.6.582 Truncating strings ...... 627 3.2.3.1.6.583 TRIM ...... 628 3.2.3.1.6.584 Syntax ...... 628 3.2.3.1.6.585 Arguments ...... 628 3.2.3.1.6.586 Returns ...... 628 3.2.3.1.6.587 Notes ...... 628 3.2.3.1.6.588 Examples ...... 628 3.2.3.1.6.589 Trimming a literal value ...... 628 3.2.3.1.6.590 Trimming a column of values ...... 629 3.2.3.1.6.591 UPPER ...... 629 3.2.3.1.6.592 Syntax ...... 629 3.2.3.1.6.593 Arguments ...... 629 3.2.3.1.6.594 Returns ...... 630 3.2.3.1.6.595 Notes ...... 630 3.2.3.1.6.596 Examples ...... 630 3.2.3.1.6.597 Upper-casing a literal value ...... 630 3.2.3.1.6.598 Upper-casing a column of values ...... 630 3.2.3.1.6.599 User-Defined Functions ...... 630 3.2.3.1.6.600 Scalar SQL UDF ...... 
631 3.2.3.1.6.601 Syntax ...... 631 3.2.3.1.6.602 Examples ...... 631 3.2.3.1.6.603 Example 1 – Support for Different Syntax ...... 631 3.2.3.1.6.604 Example 2 – Manipulating Strings ...... 631 3.2.3.1.6.605 Example 3 – Manually Building Functionality ...... 632 3.2.3.1.6.606 Usage Notes ...... 632 3.2.3.1.6.607 Restrictions ...... 632 3.2.3.1.6.608 Aggregate Functions ...... 633
3.2.3.1.6.609 Overview ...... 633 3.2.3.1.6.610 Available Aggregate Functions ...... 633 3.2.3.1.6.611 AVG ...... 633 3.2.3.1.6.612 Syntax ...... 633 3.2.3.1.6.613 Arguments ...... 633 3.2.3.1.6.614 Returns ...... 633 3.2.3.1.6.615 Notes ...... 634 3.2.3.1.6.616 Examples ...... 634 3.2.3.1.6.617 Simple average ...... 635 3.2.3.1.6.618 Combine AVG with other aggregates ...... 635 3.2.3.1.6.619 CORR ...... 635 3.2.3.1.6.620 Syntax ...... 636 3.2.3.1.6.621 Arguments ...... 636 3.2.3.1.6.622 Returns ...... 636 3.2.3.1.6.623 Notes ...... 636 3.2.3.1.6.624 Examples ...... 636 3.2.3.1.6.625 Simple correlation ...... 637 3.2.3.1.6.626 COUNT ...... 638 3.2.3.1.6.627 Syntax ...... 638 3.2.3.1.6.628 Arguments ...... 638 3.2.3.1.6.629 Returns ...... 638 3.2.3.1.6.630 Notes ...... 638 3.2.3.1.6.631 Examples ...... 639 3.2.3.1.6.632 Count rows in a table ...... 639 3.2.3.1.6.633 Count distinct values in a table ...... 640 3.2.3.1.6.634 Combine COUNT with other aggregates ...... 640 3.2.3.1.6.635 COVAR_POP ...... 640 3.2.3.1.6.636 Syntax ...... 641 3.2.3.1.6.637 Arguments ...... 641 3.2.3.1.6.638 Returns ...... 641 3.2.3.1.6.639 Notes ...... 641 3.2.3.1.6.640 Examples ...... 641 3.2.3.1.6.641 Simple covariance ...... 642 3.2.3.1.6.642 COVAR_SAMP ...... 643 3.2.3.1.6.643 Syntax ...... 643 3.2.3.1.6.644 Arguments ...... 643 3.2.3.1.6.645 Returns ...... 643 3.2.3.1.6.646 Notes ...... 643 3.2.3.1.6.647 Examples ...... 643 3.2.3.1.6.648 Simple covariance ...... 644 3.2.3.1.6.649 MAX ...... 645 3.2.3.1.6.650 Syntax ...... 645 3.2.3.1.6.651 Arguments ...... 645 3.2.3.1.6.652 Returns ...... 645 3.2.3.1.6.653 Notes ...... 646 3.2.3.1.6.654 Examples ...... 646 3.2.3.1.6.655 Simple Minimum, Maximum on numeric columns ...... 646 3.2.3.1.6.656 Minimum and maximum on text columns ...... 647 3.2.3.1.6.657 Combine MAX with GROUP BY ...... 647 3.2.3.1.6.658 MIN ...... 647 3.2.3.1.6.659 Syntax ...... 647 3.2.3.1.6.660 Arguments ...... 647 3.2.3.1.6.661 Returns ......
648 3.2.3.1.6.662 Notes ...... 648 xl 3.2.3.1.6.663 Examples ...... 648 3.2.3.1.6.664 Simple Minimum, Maximum on numeric columns ...... 649 3.2.3.1.6.665 Minimum and maximum on text columns ...... 649 3.2.3.1.6.666 Combine MIN with GROUP BY ...... 649 3.2.3.1.6.667 STDDEV_POP ...... 649 3.2.3.1.6.668 Syntax ...... 649 3.2.3.1.6.669 Arguments ...... 650 3.2.3.1.6.670 Returns ...... 650 3.2.3.1.6.671 Notes ...... 650 3.2.3.1.6.672 Examples ...... 650 3.2.3.1.6.673 Simple standard deviation ...... 651 3.2.3.1.6.674 Combine STDEVP with other aggregates ...... 651 3.2.3.1.6.675 STDDEV_SAMP ...... 652 3.2.3.1.6.676 Syntax ...... 652 3.2.3.1.6.677 Arguments ...... 652 3.2.3.1.6.678 Returns ...... 653 3.2.3.1.6.679 Notes ...... 653 3.2.3.1.6.680 Examples ...... 653 3.2.3.1.6.681 Simple standard deviation ...... 654 3.2.3.1.6.682 Combine stddev with other aggregates ...... 654 3.2.3.1.6.683 SUM ...... 654 3.2.3.1.6.684 Syntax ...... 654 3.2.3.1.6.685 Arguments ...... 655 3.2.3.1.6.686 Returns ...... 655 3.2.3.1.6.687 Notes ...... 655 3.2.3.1.6.688 Examples ...... 655 3.2.3.1.6.689 Simple sum ...... 656 3.2.3.1.6.690 Sum only distinct values ...... 656 3.2.3.1.6.691 Combine sum with GROUP BY ...... 656 3.2.3.1.6.692 VAR_POP ...... 657 3.2.3.1.6.693 Syntax ...... 657 3.2.3.1.6.694 Arguments ...... 657 3.2.3.1.6.695 Returns ...... 658 3.2.3.1.6.696 Notes ...... 658 3.2.3.1.6.697 Examples ...... 658 3.2.3.1.6.698 Simple variance ...... 659 3.2.3.1.6.699 Combine VARP with other aggregates ...... 659 3.2.3.1.6.700 VAR_SAMP ...... 659 3.2.3.1.6.701 Syntax ...... 660 3.2.3.1.6.702 Arguments ...... 660 3.2.3.1.6.703 Returns ...... 660 3.2.3.1.6.704 Notes ...... 660 3.2.3.1.6.705 Examples ...... 660 3.2.3.1.6.706 Simple variance ...... 661 3.2.3.1.6.707 Combine VAR with other aggregates ...... 662 3.2.3.1.6.708 Window Functions ...... 662 3.2.3.1.6.709 LAG ...... 662 3.2.3.1.6.710 Syntax ...... 662 3.2.3.1.6.711 Arguments ...... 663 3.2.3.1.6.712 Returns ...... 
663 3.2.3.1.6.713 Notes ...... 663 3.2.3.1.6.714 Examples ...... 663 3.2.3.1.6.715 Using LAG to access previous rows ...... 664 3.2.3.1.6.716 LEAD ...... 665
3.2.3.1.6.717 Syntax ...... 665 3.2.3.1.6.718 Arguments ...... 665 3.2.3.1.6.719 Returns ...... 665 3.2.3.1.6.720 Notes ...... 665 3.2.3.1.6.721 Examples ...... 665 3.2.3.1.6.722 Using LEAD to access next rows ...... 666 3.2.3.1.6.723 ROW_NUMBER ...... 667 3.2.3.1.6.724 Syntax ...... 667 3.2.3.1.6.725 Arguments ...... 667 3.2.3.1.6.726 Returns ...... 667 3.2.3.1.6.727 Notes ...... 667 3.2.3.1.6.728 Examples ...... 667 3.2.3.1.6.729 RANK vs. ROW_NUMBER ...... 668 3.2.3.1.6.730 RANK ...... 669 3.2.3.1.6.731 Syntax ...... 669 3.2.3.1.6.732 Arguments ...... 670 3.2.3.1.6.733 Returns ...... 670 3.2.3.1.6.734 Notes ...... 670 3.2.3.1.6.735 Examples ...... 670 3.2.3.1.6.736 RANK vs. ROW_NUMBER ...... 671 3.2.3.1.6.737 FIRST_VALUE ...... 672 3.2.3.1.6.738 Syntax ...... 672 3.2.3.1.6.739 Arguments ...... 673 3.2.3.1.6.740 Returns ...... 673 3.2.3.1.6.741 LAST_VALUE ...... 673 3.2.3.1.6.742 Syntax ...... 673 3.2.3.1.6.743 Arguments ...... 673 3.2.3.1.6.744 Returns ...... 673 3.2.3.1.6.745 NTH_VALUE ...... 673 3.2.3.1.6.746 Syntax ...... 673 3.2.3.1.6.747 Examples ...... 674 3.2.3.1.6.748 DENSE_RANK ...... 674 3.2.3.1.6.749 Syntax ...... 674 3.2.3.1.6.750 Arguments ...... 674 3.2.3.1.6.751 Returns ...... 674 3.2.3.1.6.752 PERCENT_RANK ...... 675 3.2.3.1.6.753 Syntax ...... 675 3.2.3.1.6.754 Arguments ...... 675 3.2.3.1.6.755 Returns ...... 675 3.2.3.1.6.756 CUME_DIST ...... 675 3.2.3.1.6.757 Syntax ...... 675 3.2.3.1.6.758 Arguments ...... 675 3.2.3.1.6.759 Returns ...... 675 3.2.3.1.6.760 NTILE ...... 676 3.2.3.1.6.761 Syntax ...... 676 3.2.3.1.6.762 System Functions ...... 676 3.2.3.1.6.763 SHOW_VERSION ...... 676 3.2.3.1.6.764 Permissions ...... 676 3.2.3.1.6.765 Syntax ...... 676 3.2.3.1.6.766 Parameters ...... 676 3.2.3.1.6.767 Notes ...... 676 3.2.3.1.6.768 Examples ...... 677 3.2.3.1.6.769 Getting the current SQream DB version ...... 677 3.3 Catalog reference ...... 677 3.3.1 Types of data exposed by sqream_catalog ...... 678 3.3.2 Tables in the catalog ......
678 3.3.2.1 clustering_keys ...... 678 3.3.2.2 columns ...... 679 3.3.2.3 external_tables ...... 679 3.3.2.4 external_table_columns ...... 679 3.3.2.5 databases ...... 680 3.3.2.6 database_permissions ...... 680 3.3.2.7 identity_key ...... 680 3.3.2.8 permission_types ...... 680 3.3.2.9 roles ...... 680 3.3.2.10 roles_memberships ...... 680 3.3.2.11 savedqueries ...... 681 3.3.2.12 schemas ...... 681 3.3.2.13 schema_permissions ...... 681 3.3.2.14 tables ...... 681 3.3.2.15 table_permissions ...... 682 3.3.2.16 udf_permissions ...... 682 3.3.2.17 user_defined_functions ...... 682 3.3.2.18 views ...... 682 3.3.3 Additional tables ...... 682 3.3.3.1 extents ...... 682 3.3.3.2 chunk_columns ...... 683 3.3.3.3 chunks ...... 683 3.3.3.4 delete_predicates ...... 684 3.3.4 Examples ...... 684 3.3.4.1 List all tables in the database ...... 684 3.3.4.2 List all schemas in the database ...... 684 3.3.4.3 List columns and their types for a specific table ...... 684 3.3.4.4 List delete predicates ...... 685 3.3.4.5 List Saved queries ...... 685 3.4 Command line programs ...... 685 3.4.1 metadata_server ...... 685 3.4.1.1 Positional command line arguments ...... 686 3.4.1.2 Starting metadata server ...... 686 3.4.1.2.1 Starting temporarily ...... 686 3.4.1.2.2 Starting temporarily with non-default port ...... 686 3.4.1.2.3 Stopping metadata server ...... 686 3.4.2 sqreamd ...... 687 3.4.2.1 Starting SQream DB ...... 687 3.4.2.1.1 Start SQream DB temporarily ...... 687 3.4.2.2 Command line arguments ...... 687 3.4.2.2.1 Positional command arguments ...... 687 3.4.3 sqream-console ...... 688 3.4.3.1 Starting the console ...... 689 3.4.3.2 Operations and flag reference ...... 690 3.4.3.2.1 Commands ...... 690 3.4.3.2.2 Master ...... 690 3.4.3.2.2.1 Syntax ...... 690 3.4.3.2.2.2 Common usage ...... 691 3.4.3.2.2.3 Start master node ...... 691 3.4.3.2.2.4 Start master node on different ports ...... 691 3.4.3.2.2.5 Listing active master nodes and workers ...... 
691 3.4.3.2.2.6 Stopping all SQream DB workers and master ...... 691
xliii 3.4.3.2.3 Workers ...... 691 3.4.3.2.3.1 Syntax ...... 691 3.4.3.2.3.2 Common usage ...... 692 3.4.3.2.3.3 Start 2 workers ...... 692 3.4.3.2.3.4 Stop a single worker ...... 692 3.4.3.2.3.5 Start workers with a different spool size ...... 692 3.4.3.2.3.6 Starting multiple workers on non-dedicated GPUs ...... 693 3.4.3.2.3.7 Overriding default configuration files ...... 693 3.4.3.2.4 Client ...... 693 3.4.3.2.4.1 Syntax ...... 693 3.4.3.2.4.2 Common usage ...... 694 3.4.3.2.4.3 Start a client ...... 694 3.4.3.2.4.4 Start a client to a specific worker ...... 694 3.4.3.2.4.5 Start master node on different ports ...... 694 3.4.3.2.4.6 Listing active master nodes and worker nodes ...... 694 3.4.3.2.5 Editor ...... 695 3.4.3.2.5.1 Syntax ...... 695 3.4.3.2.5.2 Common usage ...... 695 3.4.3.2.5.3 Start the editor UI ...... 695 3.4.3.2.5.4 Stop the editor UI ...... 695 3.4.3.3 Using the console to start SQream DB ...... 695 3.4.3.3.1 Starting a SQream DB cluster for the first time ...... 695 3.4.4 sqream-installer ...... 696 3.4.4.1 Operations and flag reference ...... 697 3.4.4.1.1 Command line flags ...... 697 3.4.4.2 Usage ...... 697 3.4.4.2.1 Install SQream DB for the first time ...... 697 3.4.4.2.2 Modify exposed directories ...... 697 3.4.4.2.3 Install a new license package ...... 698 3.4.4.2.4 View system settings ...... 698 3.4.4.2.5 Upgrading to a new version of SQream DB ...... 698 3.4.5 server_picker ...... 699 3.4.5.1 Positional command line arguments ...... 699 3.4.5.2 Starting server picker ...... 699 3.4.5.2.1 Starting temporarily ...... 699 3.4.5.2.2 Starting temporarily with non-default port ...... 699 3.4.5.2.3 Stopping server picker ...... 699 3.4.6 SqreamStorage ...... 700 3.4.6.1 Running SqreamStorage ...... 700 3.4.6.2 Command line arguments ...... 700 3.4.6.3 Examples ...... 700 3.4.6.3.1 Create a new storage cluster ...... 700 3.4.7 Sqream SQL CLI Reference ...... 700 3.4.7.1 Installing Sqream SQL ...... 
701 3.4.7.1.1 Troubleshooting Sqream SQL Installation ...... 702 3.4.7.2 Using Sqream SQL ...... 702 3.4.7.2.1 Running Commands Interactively (SQL shell) ...... 702 3.4.7.2.2 Executing Batch Scripts (-f) ...... 703 3.4.7.2.3 Executing Commands Immediately (-c) ...... 704 3.4.7.3 Examples ...... 704 3.4.7.3.1 Starting a Regular Interactive Shell ...... 704 3.4.7.3.2 Executing Statements in an Interactive Shell ...... 704 3.4.7.3.3 Executing SQL Statements from the Command Line ...... 705 3.4.7.3.4 Controlling the Client Output ...... 705 xliv 3.4.7.3.4.1 Exporting SQL Query Results to CSV ...... 705 3.4.7.3.4.2 Changing a CSV to a TSV ...... 706 3.4.7.3.5 Executing a Series of Statements From a File ...... 706 3.4.7.3.6 Connecting Using Environment Variables ...... 706 3.4.7.3.7 Connecting to a Specific Queue ...... 707 3.4.7.4 Operations and Flag References ...... 707 3.4.7.4.1 Command Line Arguments ...... 707 3.4.7.4.1.1 Supported Record Delimiters ...... 707 3.4.7.4.2 Meta-Commands ...... 708 3.4.7.4.3 Basic Commands ...... 708 3.4.7.4.4 Moving Around the Command Line ...... 708 3.4.7.4.5 Searching ...... 709 3.4.8 upgrade_storage ...... 709 3.4.8.1 Running upgrade_storage ...... 709 3.4.8.2 Command line arguments ...... 709 3.4.8.3 Results and error codes ...... 709 3.4.8.4 Examples ...... 709 3.4.8.4.1 Upgrade SQream DB’s storage cluster ...... 709 3.5 SQL Feature Checklist ...... 710 3.5.1 Data types and values ...... 710 3.5.2 Contraints ...... 711 3.5.3 Transactions ...... 711 3.5.4 Indexes ...... 711 3.5.5 Schema changes ...... 712 3.5.6 Statements ...... 712 3.5.7 Clauses ...... 712 3.5.8 Table expressions ...... 713 3.5.9 Scalar expressions ...... 713 3.5.10 Permissions ...... 713 3.5.11 Extra functionality ...... 714
4 Releases 715 4.1 What’s New in 2021.2 ...... 715 4.1.1 New Features ...... 715 4.1.1.1 New Driver Compatibility ...... 715 4.1.1.2 Centralized Configuration System ...... 716 4.1.1.3 Qualifying Schemas Without Providing an Alias ...... 716 4.1.1.4 Double-Quotations Supported When Importing and Exporting CSVs .... 716 4.2 What’s New in 2021.1 ...... 717 4.2.1 Release Notes Version 2021.1.2 ...... 717 4.2.1.1 New Features ...... 717 4.2.1.1.1 Aliases Added to SUBSTRING Function and Length Argument .. 717 4.2.1.1.2 Data Type Aliases Added ...... 718 4.2.1.1.3 String Literals Containing ASCII Characters Interpreted as TEXT 718 4.2.1.1.4 Decimal Literals Interpreted as Numeric Columns ...... 718 4.2.1.1.5 Roles Area Added to Studio Version 5.3.3 ...... 718 4.2.1.2 Resolved Issues ...... 718 4.2.2 Release Notes Version 2021.1.1 ...... 718 4.2.2.1 New Features ...... 719 4.2.2.1.1 Complete Ranking Function Support ...... 719 4.2.2.2 Resolved Issues ...... 719 4.2.3 Release Notes Version 2021.1 ...... 719 4.2.3.1 Version Content ...... 719 4.2.3.2 New Features ...... 720
xlv 4.2.3.2.1 SQream DB on Cloud ...... 720 4.2.3.2.2 Numeric Data Types ...... 720 4.2.3.2.3 Text Data Type ...... 720 4.2.3.2.4 Supports Scalar Subqueries ...... 720 4.2.3.2.5 Literal Arguments ...... 721 4.2.3.2.6 Simple Scalar SQL UDFs ...... 721 4.2.3.2.7 Logging Enhancements ...... 721 4.2.3.2.8 Improved Presented License Information ...... 721 4.2.3.2.9 Optimized Foreign Data Wrapper Export ...... 721 4.2.3.3 Main Features ...... 722 4.2.3.4 Resolved Issues ...... 722 4.2.3.5 Operations and Configuration Changes ...... 723 4.2.3.5.1 Recommended SQream Configuration on Cloud ...... 723 4.2.3.5.2 Optimized Foreign Data Wrapper Export Configuration Flag .... 723 4.2.3.6 Naming Changes ...... 723 4.2.3.7 Deprecated Features ...... 723 4.2.3.8 Known Issues and Limitations ...... 723 4.2.3.9 Upgrading to v2021.1 ...... 723 4.3 What’s New in 2020.3 ...... 724 4.3.1 Release Notes Version 2020.3.2.1 ...... 724 4.3.1.1 Performance Enhancements ...... 724 4.3.1.2 Known Issues and Limitations ...... 724 4.3.1.3 Upgrading to v2020.3.2.1 ...... 724 4.3.2 Release Notes Version 2020.3.1 ...... 724 4.3.2.1 New Features ...... 724 4.3.2.2 Performance Enhancements ...... 725 4.3.2.3 Resolved Issues ...... 725 4.3.2.4 Known Issues and Limitations ...... 725 4.3.2.5 Upgrading to v2020.3.1 ...... 725 4.3.3 Release Notes Version 2020.3 ...... 726 4.3.3.1 New Features ...... 726 4.3.3.2 Performance Enhancements ...... 726 4.3.3.3 Resolved Issues ...... 726 4.3.3.4 Known Issues and Limitations ...... 727 4.3.3.5 Upgrading to v2020.3 ...... 727 4.4 What’s New in 2020.2 ...... 727 4.4.1 New features ...... 728 4.4.1.1 UI ...... 728 4.4.1.2 Integrations ...... 728 4.4.1.3 SQL support ...... 728 4.4.2 Improvements and fixes ...... 728 4.4.3 Operations ...... 729 4.4.4 Known Issues & Limitations ...... 729 4.4.5 Upgrading to v2020.2 ...... 729 4.5 What’s New in 2020.1 ...... 729 4.5.1 New features ...... 730 4.5.1.1 Integrations ...... 730 4.5.1.2 SQL support ...... 
730 4.5.2 Improvements and fixes ...... 730 4.5.3 Behaviour changes ...... 731 4.5.4 Operations ...... 732 4.5.5 Known Issues & Limitations ...... 733 4.5.6 Upgrading to v2020.1 ...... 733
xlvi 5 Glossary 735
Index 737
For SQream DB v2021.1.
Tip: This documentation is available online at https://docs.sqream.com/
SQream DB is a columnar analytic SQL database management system. SQream DB supports regular SQL including a substantial amount of ANSI SQL, uses serializable transactions, and scales horizontally for concurrent statements. Even a basic SQream DB machine can support tens to hundreds of terabytes of data. SQream DB easily plugs in to third-party tools like Tableau and comes with standard SQL client drivers, including JDBC, ODBC, and Python DB-API.
Need help?
If you couldn’t find what you’re looking for, we’re always happy to help. Visit SQream’s support portal for additional support.
Looking for older versions?
This version of the documentation is for SQream DB v2021.1. If you’re looking for an older version of the documentation, versions 1.10 through 2019.2.1 are available at http://previous.sqream.com.
CHAPTER 1
First steps with SQream DB
This tutorial takes you through a few basic operations in SQream DB.
In this topic:
• Preparing for this tutorial
• Create a new database for playing around in
• Creating your first table
• Listing tables
• Inserting rows
• Queries
• Deleting rows
• Saving query results to a CSV or PSV file
1.1 Preparing for this tutorial
This tutorial assumes you already have a SQream DB cluster running and the SQream command line client installed on the machine you are on.
If you haven’t already:
• Set up a SQream DB cluster.
• Install SQream SQL CLI.
• See our Hardware guide.
Run the SQream SQL client like this. It will interactively ask for the password.
$ sqream sql --port=5000 --username=rhendricks -d master
Password:

Interactive client mode
To quit, use ^D or \q.

master=> _
You should use a username and password that you have set up or your DBA has given you.
Tip:
• To exit the shell, type \q or Ctrl-d.
• A new SQream DB cluster contains a database named master. We will start with this database.
1.2 Create a new database for playing around in
To create a database, we will use the CREATE DATABASE syntax.
master=> CREATE DATABASE test;
executed
Now, reconnect to the newly created database. First, exit the client by typing \q and hitting enter. From the Linux shell, restart the client with the new database name:
$ sqream sql --port=5000 --username=rhendricks -d test
Password:

Interactive client mode
To quit, use ^D or \q.

test=> _
The new database name appears in the prompt. This lets you know which database you’re connected to.
1.3 Creating your first table
To create a table, we will use the CREATE TABLE syntax, with a table name and some column specifications. For example,
CREATE TABLE cool_animals (
    id INT NOT NULL,
    name VARCHAR(20),
    weight INT
);
If the table already exists and you want to drop the current table and create a new one, you can add OR REPLACE after the CREATE keyword.
CREATE OR REPLACE TABLE cool_animals (
    id INT NOT NULL,
    name VARCHAR(20),
    weight INT
);
You can ask SQream DB to list the full, verbose CREATE TABLE statement for any table by using the GET_DDL function with the table name.

test=> SELECT GET_DDL('cool_animals');
create table "public"."cool_animals" (
"id" int not null,
"name" varchar(20),
"weight" int
);
If you are done with this table, you can use DROP TABLE to remove the table and all of its data.

test=> DROP TABLE cool_animals;
executed
1.4 Listing tables
To see the tables in the current database, we will query the catalog:

test=> SELECT table_name FROM sqream_catalog.tables;
cool_animals
1 rows
1.5 Inserting rows
Inserting rows into a table can be performed with the INSERT statement. The statement includes the table name, an optional list of column names, and column values listed in the same order as the column names:

test=> INSERT INTO cool_animals VALUES (1, 'Dog', 7);
executed

To change the order of values, specify the column order:

test=> INSERT INTO cool_animals(weight, id, name) VALUES (3, 2, 'Possum');
executed
You can use INSERT to insert multiple rows too. Here, you use sets of parentheses separated by commas:
test=> INSERT INTO cool_animals VALUES (3, 'Cat', 5), (4, 'Elephant', 6500), (5, 'Rhinoceros', 2100);
executed
Note: To load big data sets, use bulk loading methods instead. See our Inserting data guide for more information.
When you leave out columns that have a default value (including a default NULL value), the default value is used.

test=> INSERT INTO cool_animals (id) VALUES (6);
executed
test=> SELECT * FROM cool_animals;
1,Dog ,7
2,Possum ,3
3,Cat ,5
4,Elephant ,6500
5,Rhinoceros ,2100
6,\N,\N
6 rows
Note: Null row values are represented as \N
1.6 Queries
For querying, use the SELECT keyword, followed by a list of columns and values to be returned, and the table to get the data from.

test=> SELECT id, name, weight FROM cool_animals;
1,Dog ,7
2,Possum ,3
3,Cat ,5
4,Elephant ,6500
5,Rhinoceros ,2100
6,\N,\N
6 rows
To get all columns without specifying them, use the star operator *:
test=> SELECT * FROM cool_animals;
1,Dog ,7
2,Possum ,3
3,Cat ,5
4,Elephant ,6500
5,Rhinoceros ,2100
6,\N,\N
6 rows
To get the number of rows in a table without getting the full result set, use COUNT(*):

test=> SELECT COUNT(*) FROM cool_animals;
6
1 row
Filter results by adding a WHERE clause and specifying the filter condition:

test=> SELECT id, name, weight FROM cool_animals WHERE weight > 1000;
4,Elephant ,6500
5,Rhinoceros ,2100
2 rows
Sort the results by adding an ORDER BY clause, and specifying ascending (ASC) or descending (DESC) order:

test=> SELECT * FROM cool_animals ORDER BY weight DESC;
4,Elephant ,6500
5,Rhinoceros ,2100
1,Dog ,7
3,Cat ,5
2,Possum ,3
6,\N,\N
6 rows
Filter out rows that contain NULL values by adding an IS NOT NULL condition:

test=> SELECT * FROM cool_animals WHERE weight IS NOT NULL ORDER BY weight DESC;
4,Elephant ,6500
5,Rhinoceros ,2100
1,Dog ,7
3,Cat ,5
2,Possum ,3
5 rows
1.7 Deleting rows
To delete rows in a table selectively, use the DELETE command, with a table name and a WHERE clause to specify which rows are to be deleted:

test=> DELETE FROM cool_animals WHERE weight is null;
executed
test=> SELECT * FROM cool_animals;
1,Dog ,7
2,Possum ,3
3,Cat ,5
4,Elephant ,6500
5,Rhinoceros ,2100
5 rows
To delete all rows in a table, use the TRUNCATE command followed by the table name:

test=> TRUNCATE TABLE cool_animals;
executed
Note: While TRUNCATE deletes data from disk immediately, DELETE does not physically remove the deleted rows. For more information on removing the rows from disk, see DELETE.
1.8 Saving query results to a CSV or PSV file
The command line client sqream sql can be used to save query results to a CSV or other delimited file format.
$ sqream sql --username=mjordan --database=nba --host=localhost --port=5000 -c "SELECT * FROM nba LIMIT 5" --results-only --delimiter='|' > nba.psv
$ cat nba.psv
Avery Bradley |Boston Celtics |0|PG|25|6-2 |180|Texas |7730337
Jae Crowder |Boston Celtics |99|SF|25|6-6 |235|Marquette |6796117
John Holland |Boston Celtics |30|SG|27|6-5 |205|Boston University |\N
R.J. Hunter |Boston Celtics |28|SG|22|6-5 |185|Georgia State |1148640
Jonas Jerebko |Boston Celtics |8|PF|29|6-10|231|\N|5000000
See the Controlling the output of the client section of the reference for more options.
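A saved PSV file like the one shown in this section can be read back in a script. The following is a minimal Python sketch, assuming the file was written with --delimiter='|', that text fields may carry trailing padding, and that NULL is printed as \N. The helper name read_psv is hypothetical and is not part of any SQream tool.

```python
import csv
import io

NULL_MARKER = r"\N"  # how NULL values appear in the client output shown in this tutorial


def read_psv(text, delimiter="|"):
    """Parse delimited client output into rows, mapping the NULL marker to None."""
    rows = []
    for record in csv.reader(io.StringIO(text), delimiter=delimiter):
        # Strip padding around each field, then translate the NULL marker.
        cleaned = [field.strip() for field in record]
        rows.append([None if field == NULL_MARKER else field for field in cleaned])
    return rows


sample = (
    "John Holland |Boston Celtics |30|SG|27|6-5 |205|Boston University |\\N\n"
    "Jonas Jerebko |Boston Celtics |8|PF|29|6-10|231|\\N|5000000\n"
)
rows = read_psv(sample)
print(rows[0][-1])  # None (salary is NULL)
print(rows[1][7])   # None (college is NULL)
```

Stripping whitespace is a convenience for padded VARCHAR output; drop it if trailing spaces are significant in your data.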
What’s next?
• Explore all of SQream DB’s SQL Syntax
• See the full SQream SQL CLI reference
• Connect a third party tool to SQream DB and start analyzing data
CHAPTER 2
Guides
This section contains concept and feature guides as well as task-focused guides.
Recommended guides
• Optimization and best practices in SQream DB
• Using third party tools
• Client drivers for SQream DB
2.1 Full list of guides
2.1.1 Inserting data
This guide covers inserting data into SQream DB, with subguides on inserting data from a variety of sources and locations.
In this topic:
• Data loading overview
• Data loading considerations
  – Verify data and performance after load
  – File source location for loading
  – Supported load methods
  – Unsupported data types
  – Extended error handling
  – Best practices for CSV
  – Best practices for Parquet
    ∗ Type support and behavior notes
  – Best practices for ORC
    ∗ Type support and behavior notes
• Further reading and migration guides
2.1.1.1 Data loading overview
SQream DB supports importing data from the following sources:
• Using INSERT with a client driver
• Using COPY FROM:
  – Local filesystem and locally mounted network filesystems
  – Amazon S3
  – Using SQream in an HDFS Environment
• Using Foreign Tables:
  – Local filesystem and locally mounted network filesystems
  – Amazon S3
  – Using SQream in an HDFS Environment

SQream DB supports loading files in the following formats:
• Text - CSV, TSV, PSV
• Parquet
• ORC
2.1.1.2 Data loading considerations
2.1.1.2.1 Verify data and performance after load
Like other RDBMSs, SQream DB has its own set of best practices for table design and query optimization. After loading, SQream therefore recommends verifying that:
• The data is as you expect it (e.g. row counts, data types, formatting, content)
• The performance of your queries is adequate
• Best practices were followed for table design
• Applications such as Tableau and others have been tested, and work
• Data types were not over-provisioned (e.g. don’t use VARCHAR(2000) to store a short string)
2.1.1.2.2 File source location for loading
During loading using COPY FROM, the statement can run on any worker. If you are running multiple nodes, make sure that all nodes have the same view of the source files. If you load from a local file that exists on only one node and not on shared storage, the load will fail whenever the statement runs on a different node. (If you need to, you can also control which node a statement runs on using the Workload Manager.)
2.1.1.2.3 Supported load methods
SQream DB’s COPY FROM syntax can be used to load CSV files, but can’t be used for Parquet and ORC. FOREIGN TABLE can be used to load text files, Parquet, and ORC files, and can also transform the data prior to materialization as a full table.
Method / File type   Text (CSV)   Parquet   ORC   Streaming data
COPY FROM            ✓
Foreign Tables       ✓            ✓         ✓
INSERT                                            ✓ (Python, JDBC, Node.JS)
2.1.1.2.4 Unsupported data types
SQream DB doesn’t support every data type that some other database systems offer, such as ARRAY, BLOB, ENUM, SET, etc. These data types have to be converted before loading. For example, ENUM can often be stored as a VARCHAR.
2.1.1.2.5 Extended error handling
While external tables can be used to load CSVs, the COPY FROM statement provides more fine-grained error handling options, as well as extended support for non-standard CSVs with multi-character delimiters, alternate timestamp formats, and more.
2.1.1.2.6 Best practices for CSV
Text files like CSV rarely conform to RFC 4180, so alterations may be required:
• Use OFFSET 2 for files containing header rows.
• Failed rows can be captured in a log file for later analysis, or simply skipped. See Unsupported Field Delimiters for information on skipping rejected rows.
• Record delimiters (new lines) can be modified with the RECORD DELIMITER syntax.
• If the date formats differ from ISO 8601, refer to the Supported Date Formats section to see how to override default parsing.
• Fields in a CSV can be optionally quoted with double-quotes ("). However, any field containing a newline or another double-quote character must be quoted. If a field is quoted, any double quote that appears must be double-quoted (similar to the string literals quoting rules). For example, to encode What are "birds"?, the field should appear as "What are ""birds""?".
• Field delimiters don’t have to be a displayable ASCII character. See Supported Field Delimiters for all options.
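The double-quote rule described in this section is the same convention that Python's standard csv module applies by default, so it can be used to produce conforming files. The sketch below is only an illustration; the helper name to_csv_line is hypothetical, and the Unix line terminator is a choice made here, not a SQream requirement.

```python
import csv
import io


def to_csv_line(fields):
    """Serialize one record, quoting only when needed and doubling embedded quotes."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerow(fields)
    return buffer.getvalue()


line = to_csv_line([1, 'What are "birds"?', "plain"])
print(line, end="")  # 1,"What are ""birds""?",plain
```

Backslash escaping (for example \"birds\") is deliberately avoided here, because only doubled quotes are described as valid.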
2.1.1.2.7 Best practices for Parquet
• Parquet files are loaded through Foreign Tables. The destination table must have the same number of columns as the source files.
• Parquet files support predicate pushdown. When a query is issued over Parquet files, SQream DB uses row-group metadata to determine which row-groups in a file need to be read for a particular query, and the row indexes can narrow the search to a particular set of rows.
2.1.1.2.7.1 Type support and behavior notes
• Unlike ORC, the column types should match the data types exactly (see table below).
Parquet source    Supported SQream DB type
BOOLEAN           BOOL
INT16             SMALLINT
INT32             INT
INT64             BIGINT
FLOAT             REAL
DOUBLE            DOUBLE
BYTE_ARRAY [2]    Text [1]
INT96 [3]         DATETIME [4]

No Parquet source type maps to TINYINT or DATE.
• If a Parquet file has an unsupported type such as enum, uuid, time, json, bson, lists, or maps, but the data is not referenced in the table (it does not appear in the SELECT query), the statement will succeed. If the column is referenced, an error is returned to the user, explaining that the type is not supported, but the column may be omitted.
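Before writing a foreign table definition, it can help to check a file's column types against the type table in this section. The sketch below is a hedged illustration: the mapping is our reading of that table, the schema is passed in as plain (name, type) pairs rather than read from a real Parquet file, and plan_columns is a hypothetical helper name.

```python
# Parquet source type -> SQream DB column type, as read from the type table in this section.
PARQUET_TO_SQREAM = {
    "BOOLEAN": "BOOL",
    "INT16": "SMALLINT",
    "INT32": "INT",
    "INT64": "BIGINT",
    "FLOAT": "REAL",
    "DOUBLE": "DOUBLE",
    "BYTE_ARRAY": "TEXT",   # with UTF8 annotation
    "INT96": "DATETIME",    # sub-millisecond precision is rounded down
}


def plan_columns(parquet_schema):
    """Split (name, parquet_type) pairs into loadable columns and columns to omit."""
    loadable, unsupported = [], []
    for name, parquet_type in parquet_schema:
        sqream_type = PARQUET_TO_SQREAM.get(parquet_type.upper())
        if sqream_type is None:
            unsupported.append(name)  # must not be referenced in the SELECT
        else:
            loadable.append((name, sqream_type))
    return loadable, unsupported


loadable, unsupported = plan_columns(
    [("id", "INT64"), ("tags", "LIST"), ("name", "byte_array")]
)
print(loadable)     # [('id', 'BIGINT'), ('name', 'TEXT')]
print(unsupported)  # ['tags']
```

Columns reported as unsupported correspond to the case described above: the statement succeeds as long as they are left out of the query.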
2.1.1.2.8 Best practices for ORC
• ORC files are loaded through Foreign Tables. The destination table must have the same number of columns as the source files.
• ORC files support predicate pushdown. When a query is issued over ORC files, SQream DB uses ORC metadata to determine which stripes in a file need to be read for a particular query, and the row indexes can narrow the search to a particular set of 10,000 rows.
2.1.1.2.8.1 Type support and behavior notes
• The types should match to some extent within the same “class” (see table below).
[1] Text values include TEXT, VARCHAR, and NVARCHAR
[2] With UTF8 annotation
[3] With TIMESTAMP_NANOS or TIMESTAMP_MILLIS annotation
[4] Any microseconds will be rounded down to milliseconds.
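ORC source                          Supported SQream DB types
boolean                             BOOL; TINYINT [5], SMALLINT [5], INT [5], BIGINT [5]
tinyint                             BOOL [6]; TINYINT, SMALLINT, INT, BIGINT
smallint                            BOOL [6]; TINYINT [7]; SMALLINT, INT, BIGINT
int                                 BOOL [6]; TINYINT [7], SMALLINT [7]; INT, BIGINT
bigint                              BOOL [6]; TINYINT [7], SMALLINT [7], INT [7]; BIGINT
float                               REAL, DOUBLE
double                              REAL, DOUBLE
string / char / varchar             Text [1]
date                                DATE, DATETIME
timestamp, timestamp with timezone  DATETIME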
• If an ORC file has an unsupported type such as binary, list, map, or union, but the data is not referenced in the table (it does not appear in the SELECT query), the statement will succeed. If the column is referenced, an error is returned to the user, explaining that the type is not supported, but the column may be omitted.
2.1.1.3 Further reading and migration guides
2.1.1.3.1 Insert from CSV
This guide covers inserting data from CSV files into SQream DB using the COPY FROM method.
In this topic:
• 1. Prepare CSVs
• 2. Place CSVs where SQream DB workers can access
• 3. Figure out the table structure
• 4. Bulk load the data with COPY FROM
• Loading different types of CSV files
  – Loading a standard CSV file from a local filesystem
  – Loading a PSV (pipe separated value) file
  – Loading a TSV (tab separated value) file
  – Loading a text file with non-printable delimiter
  – Loading a text file with multi-character delimiters
  – Loading files with a header row
  – Loading files formatted for Windows (\r\n)
  – Loading a file from a public S3 bucket
  – Loading files from an authenticated S3 bucket
  – Loading files from an HDFS storage
  – Saving rejected rows to a file
  – Stopping the load if a certain amount of rows were rejected
  – Load CSV files from a set of directories
  – Rearrange destination columns
  – Loading non-standard dates

[5] Boolean values are cast to 0, 1
[6] Will succeed if all values are 0, 1
[7] Will succeed if all values fit the destination type
2.1.1.3.1.1 1. Prepare CSVs
Prepare the source CSVs, with the following requirements:
• Files should be valid CSV. By default, SQream DB’s CSV parser can handle RFC 4180 standard CSVs, but it can also be configured to support non-standard CSVs (with multi-character delimiters, unquoted fields, etc.).
• Files are UTF-8 or ASCII encoded.
• Field delimiter is an ASCII character or characters.
• Record delimiter, also known as a new line separator, is a Unix-style newline (\n), DOS-style newline (\r\n), or Mac-style newline (\r).
• Fields are optionally enclosed by double-quotes, and must be quoted if they contain one of the following characters:
  – The record delimiter or field delimiter
  – A double quote character
  – A newline
• If a field is quoted, any double quote that appears must be double-quoted (similar to the string literals quoting rules). For example, to encode What are "birds"?, the field should appear as "What are ""birds""?". Other modes of escaping are not supported (e.g. 1,"What are \"birds\"?" is not a valid way of escaping CSV values).
• NULL values can be marked in two ways in the CSV:
  – An explicit null marker. For example, col1,\N,col3
  – An empty field delimited by the field delimiter. For example, col1,,col3
Note: If a text field is quoted but contains no content ("") it is considered an empty text field. It is not considered NULL.
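The quoting and NULL rules in this step can be sketched as a small encoder. This is a minimal Python illustration, assuming None should become the explicit \N marker and an empty string should stay an empty text field; encode_field and encode_record are hypothetical helper names.

```python
NULL_MARKER = r"\N"


def encode_field(value, delimiter=","):
    """Encode one value following the quoting and NULL rules listed in this step."""
    if value is None:
        return NULL_MARKER  # explicit NULL marker (an unquoted empty field also means NULL)
    text = str(value)
    if text == "":
        return '""'  # quoted empty field: empty text, not NULL
    needs_quotes = any(token in text for token in (delimiter, '"', "\n", "\r"))
    if needs_quotes:
        # Quote the field and double every embedded double quote.
        return '"' + text.replace('"', '""') + '"'
    return text


def encode_record(values, delimiter=","):
    """Join encoded fields into one record, without the record delimiter."""
    return delimiter.join(encode_field(value, delimiter) for value in values)


print(encode_record([1, None, "", 'What are "birds"?']))
# 1,\N,"","What are ""birds""?"
```

Because the delimiter check is a substring test, the same sketch also works for multi-character delimiters.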
2.1.1.3.1.2 2. Place CSVs where SQream DB workers can access
During data load, the COPY FROM command can run on any worker (unless explicitly specified with the Workload Manager). It is important that every node has the same view of the storage being used - meaning, every SQream DB worker should have access to the files.
• For files hosted on NFS, ensure that the mount is accessible from all servers.
• For HDFS, ensure that SQream DB servers can access the HDFS name node with the correct user-id. See our Using SQream in an HDFS Environment guide for more information.
• For S3, ensure network access to the S3 endpoint. See our Amazon S3 guide for more information.
2.1.1.3.1.3 3. Figure out the table structure
Prior to loading data, you will need to write out the table structure, so that it matches the file structure. For example, to import the data from nba.csv, we will first look at the file:
Table 1: nba.csv

Name           Team            Number  Position  Age   Height  Weight  College            Salary
Avery Bradley  Boston Celtics  0.0     PG        25.0  6-2     180.0   Texas              7730337.0
Jae Crowder    Boston Celtics  99.0    SF        25.0  6-6     235.0   Marquette          6796117.0
John Holland   Boston Celtics  30.0    SG        27.0  6-5     205.0   Boston University
R.J. Hunter    Boston Celtics  28.0    SG        22.0  6-5     185.0   Georgia State      1148640.0
Jonas Jerebko  Boston Celtics  8.0     PF        29.0  6-10    231.0                      5000000.0
Amir Johnson   Boston Celtics  90.0    PF        29.0  6-9     240.0                      12000000.0
Jordan Mickey  Boston Celtics  55.0    PF        21.0  6-8     235.0   LSU                1170960.0
Kelly Olynyk   Boston Celtics  41.0    C         25.0  7-0     238.0   Gonzaga            2165160.0
Terry Rozier   Boston Celtics  12.0    PG        22.0  6-2     190.0   Louisville         1824360.0
• The file format in this case is CSV, and it is stored as an S3 object.
• The first row of the file is a header containing column names.
• The record delimiter is a DOS newline (\r\n).
• The file is stored on S3, at s3://sqream-demo-data/nba.csv.

We will make note of the file structure to create a matching CREATE TABLE statement.
CREATE TABLE nba (
    Name varchar(40),
    Team varchar(40),
    Number tinyint,
    Position varchar(2),
    Age tinyint,
    Height varchar(4),
    Weight real,
    College varchar(40),
    Salary float
);
2.1.1.3.1.4 4. Bulk load the data with COPY FROM
The CSV is a standard CSV, but with two differences from SQream DB defaults:
• The record delimiter is not a Unix newline (\n), but a Windows newline (\r\n).
• The first row of the file is a header containing column names, which we’ll want to skip.
COPY nba FROM 's3://sqream-demo-data/nba.csv' WITH RECORD DELIMITER '\r\n' OFFSET 2;
Repeat steps 3 and 4 for every CSV file you want to import.
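When repeating these steps for many files, the two checks above (newline style and header row) can be scripted. The following Python sketch is an assumption-laden illustration: it inspects a byte sample of the file, assumes no quoted field contains an embedded newline, and only emits the RECORD DELIMITER and OFFSET options shown in this guide. The helper names are hypothetical.

```python
def detect_record_delimiter(sample: bytes) -> str:
    """Return the escape sequence for the newline style found in a file sample."""
    if b"\r\n" in sample:
        return r"\r\n"  # DOS / Windows
    if b"\n" in sample:
        return r"\n"    # Unix
    if b"\r" in sample:
        return r"\r"    # classic Mac
    return r"\n"        # no newline in the sample; fall back to the Unix default


def build_copy(table: str, path: str, sample: bytes, has_header: bool) -> str:
    """Assemble a COPY FROM statement using only options shown in this guide."""
    options = []
    delimiter = detect_record_delimiter(sample)
    if delimiter != r"\n":
        options.append(f"RECORD DELIMITER '{delimiter}'")
    if has_header:
        options.append("OFFSET 2")  # skip the header row
    statement = f"COPY {table} FROM '{path}'"
    if options:
        statement += " WITH " + " ".join(options)
    return statement + ";"


sample = b"Name,Team\r\nAvery Bradley,Boston Celtics\r\n"
print(build_copy("nba", "s3://sqream-demo-data/nba.csv", sample, has_header=True))
```

For the nba.csv example this reproduces the statement used in step 4. Table names and paths are interpolated without escaping, so the sketch is suitable only for trusted input.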
2.1.1.3.1.5 Loading different types of CSV files
COPY FROM contains several configuration options. See more in the COPY FROM elements section.
2.1.1.3.1.6 Loading a standard CSV file from a local filesystem
COPY table_name FROM '/home/rhendricks/file.csv';
2.1.1.3.1.7 Loading a PSV (pipe separated value) file
COPY table_name FROM '/home/rhendricks/file.psv' WITH DELIMITER '|';
2.1.1.3.1.8 Loading a TSV (tab separated value) file
COPY table_name FROM '/home/rhendricks/file.tsv' WITH DELIMITER '\t';
2.1.1.3.1.9 Loading a text file with non-printable delimiter
In the file below, the separator is DC1, which is represented by ASCII 17 decimal or 021 octal.
COPY table_name FROM 'file.txt' WITH DELIMITER E'\021';
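To produce a test file for this case, the DC1 character can be written from a script. This is a small Python sketch; join_fields is a hypothetical helper, and the fields are assumed to be already encoded and free of the delimiter.

```python
DC1 = "\x11"  # ASCII 17 decimal, 021 octal

# The three notations refer to the same character.
assert ord(DC1) == 17 == 0o21


def join_fields(fields, delimiter=DC1):
    """Join already-encoded fields with a non-printable delimiter."""
    return delimiter.join(fields) + "\n"


line = join_fields(["1", "Dog", "7"])
print(repr(line))  # '1\x11Dog\x117\n'
```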
2.1.1.3.1.10 Loading a text file with multi-character delimiters
In the file below, the separator is '|.
COPY table_name FROM 'file.txt' WITH DELIMITER '''|';
2.1.1.3.1.11 Loading files with a header row
Use OFFSET to skip rows.
Note: When loading multiple files (e.g. with wildcards), this setting affects each file separately.
COPY table_name FROM 'filename.psv' WITH DELIMITER '|' OFFSET 2;
2.1.1.3.1.12 Loading files formatted for Windows (\r\n)
COPY table_name FROM 'filename.psv' WITH DELIMITER '|' RECORD DELIMITER '\r\n';
2.1.1.3.1.13 Loading a file from a public S3 bucket
Note: The bucket must be publicly available, and its objects must be listable.
COPY nba FROM 's3://sqream-demo-data/nba.csv' WITH OFFSET 2 RECORD DELIMITER '\r\n';
2.1.1.3.1.14 Loading files from an authenticated S3 bucket
COPY nba FROM 's3://secret-bucket/*.csv' WITH OFFSET 2 RECORD DELIMITER '\r\n' AWS_ID '12345678' AWS_SECRET 'super_secretive_secret';
2.1.1.3.1.15 Loading files from an HDFS storage
COPY nba FROM 'hdfs://hadoop-nn.piedpiper.com/rhendricks/*.csv' WITH OFFSET 2 RECORD DELIMITER '\r\n';
2.1.1.3.1.16 Saving rejected rows to a file
See Unsupported Field Delimiters for more information about the error handling capabilities of COPY FROM.
COPY table_name FROM 'filename.psv' WITH DELIMITER '|'
    ERROR_LOG '/temp/load_error.log' -- Save error log
    ERROR_VERBOSITY 0; -- Only save rejected rows
2.1.1.3.1.17 Stopping the load if a certain amount of rows were rejected
COPY table_name FROM 'filename.csv' WITH delimiter '|'
    ERROR_LOG '/temp/load_err.log' -- Save error log
    OFFSET 2 -- skip header row
    LIMIT 100 -- Only load 100 rows
    STOP AFTER 5 ERRORS; -- Stop the load if 5 errors reached
2.1.1.3.1.18 Load CSV files from a set of directories
Use glob patterns (wildcards) to load multiple files to one table.
COPY table_name from '/path/to/files/2019_08_*/*.csv';
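Before running a wildcard load, it can be useful to preview which local files a pattern would match. The sketch below uses Python's glob module as a stand-in; SQream DB's own pattern matching may differ in details, so treat the result as an approximation. The directory tree is created only to keep the example self-contained, and preview_matches is a hypothetical helper name.

```python
import glob
import os
import tempfile


def preview_matches(pattern):
    """Return the sorted list of local files a wildcard pattern expands to."""
    return sorted(glob.glob(pattern))


# Build a throwaway directory tree so the sketch is self-contained.
root = tempfile.mkdtemp()
for day, name in [("2019_08_01", "a.csv"), ("2019_08_02", "b.csv"), ("2019_09_01", "c.csv")]:
    os.makedirs(os.path.join(root, day), exist_ok=True)
    open(os.path.join(root, day, name), "w").close()

matches = preview_matches(os.path.join(root, "2019_08_*", "*.csv"))
print([os.path.basename(match) for match in matches])  # ['a.csv', 'b.csv']
```

The file under 2019_09_01 is excluded because its directory does not match the 2019_08_* component.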
2.1.1.3.1.19 Rearrange destination columns
When the column order in the source files does not match the table structure, tell the COPY command what the order of columns should be:
COPY table_name (fifth, first, third) FROM '/path/to/files/*.csv';
Note: Any column not specified will revert to its default value or NULL value if nullable
2.1.1.3.1.20 Loading non-standard dates
If files contain dates not formatted as ISO8601, tell COPY how to parse the column. After parsing, the date will appear as ISO8601 inside SQream DB. In this example, date_col1 and date_col2 in the table are non-standard. date_col3 is mentioned explicitly, but can be left out. Any column that is not specified is assumed to be ISO8601.
COPY table_name FROM '/path/to/files/*.csv' WITH PARSERS 'date_col1=YMD,date_col2=MDY,date_col3=default';
Tip: The full list of supported date formats can be found under the Supported date formats section of the COPY FROM reference.
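If a file uses a date layout that is not among the supported parsers, one alternative is to rewrite the column to ISO 8601 before loading. The Python sketch below shows that conversion; the strptime patterns are assumptions chosen for illustration and are not definitions of what the YMD or MDY parsers accept. The helper name to_iso8601 is hypothetical.

```python
from datetime import datetime


def to_iso8601(value, source_format):
    """Convert a date string to ISO 8601 (YYYY-MM-DD) given its strptime format."""
    return datetime.strptime(value, source_format).date().isoformat()


print(to_iso8601("12/31/2019", "%m/%d/%Y"))  # 2019-12-31
print(to_iso8601("2019/12/31", "%Y/%m/%d"))  # 2019-12-31
```

A value that does not match the given pattern raises ValueError, which makes malformed dates visible before the load rather than as rejected rows.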
2.1.1.3.2 Insert from Parquet
This guide covers inserting data from Parquet files into SQream DB using FOREIGN TABLE.
In this topic:
• 1. Prepare the files
• 2. Place Parquet files where SQream DB workers can access them
• 3. Figure out the table structure
• 4. Verify table contents
• 5. Copying data into SQream DB
  – Working around unsupported column types
  – Modifying data during the copy process
• Further Parquet loading examples
  – Loading a table from a directory of Parquet files on HDFS
  – Loading a table from a bucket of files on S3
2.1.1.3.2.1 1. Prepare the files
Prepare the source Parquet files, with the following requirements:
Parquet source                           Supported SQream DB type
BOOLEAN                                  BOOL
INT16                                    SMALLINT
INT32                                    INT
INT64                                    BIGINT
FLOAT                                    REAL
DOUBLE                                   DOUBLE
BYTE_ARRAY / FIXED_LEN_BYTE_ARRAY [2]    Text [1]
INT96 [3]                                DATETIME [4]

No Parquet source type maps to TINYINT or DATE.
• If a Parquet file contains a column of an unsupported type, such as enum, uuid, time, json, bson, list, or map, but that column is not referenced (it does not appear in the SELECT query), the statement will succeed. If the column is referenced, an error is returned explaining that the type is not supported and that the column may be omitted. This can be worked around; see the examples later in this guide.
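One way to apply these requirements is to sort a file's columns into supported and unsupported ones before writing the table definition. The sketch below assumes the type pairing read off the table above (one SQream DB type per Parquet source type); the mapping, the split_columns helper, and the sample schema are illustrative and not part of SQream DB.

```python
# Parquet source type -> SQream DB type, as read from the support table above.
PARQUET_TO_SQREAM = {
    "BOOLEAN": "BOOL",
    "INT16": "SMALLINT",
    "INT32": "INT",
    "INT64": "BIGINT",
    "FLOAT": "REAL",
    "DOUBLE": "DOUBLE",
    "BYTE_ARRAY": "TEXT",            # with UTF8 annotation
    "FIXED_LEN_BYTE_ARRAY": "TEXT",  # with UTF8 annotation
    "INT96": "DATETIME",             # timestamp annotation
}

def split_columns(schema):
    """Split {column: parquet_type} into supported and unsupported columns."""
    supported = {
        column: PARQUET_TO_SQREAM[ptype]
        for column, ptype in schema.items()
        if ptype in PARQUET_TO_SQREAM
    }
    unsupported = [c for c, ptype in schema.items() if ptype not in PARQUET_TO_SQREAM]
    return supported, unsupported

# Hypothetical schema: `tags` uses a list type, which is not supported
supported, unsupported = split_columns(
    {"id": "INT64", "name": "BYTE_ARRAY", "tags": "LIST"}
)
print(supported, unsupported)
```

Columns that land in the unsupported list are the ones to leave out of any SELECT against the foreign table.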
2.1.1.3.2.2 2. Place Parquet files where SQream DB workers can access them
Any worker may try to access files (unless explicitly specified with the Workload Manager). It is important that every node has the same view of the storage being used, meaning that every SQream DB worker should have access to the files.
• For files hosted on NFS, ensure that the mount is accessible from all servers.
• For HDFS, ensure that SQream DB servers can access the HDFS name node with the correct user-id. See our Using SQream in an HDFS Environment guide for more information.
• For S3, ensure network access to the S3 endpoint. See our Amazon S3 guide for more information.
¹ Text values include TEXT, VARCHAR, and NVARCHAR
² With UTF8 annotation
³ With TIMESTAMP_NANOS or TIMESTAMP_MILLIS annotation
⁴ Any microseconds will be rounded down to milliseconds.
2.1.1.3.2.3 3. Figure out the table structure
Prior to loading data, you will need to write out the table structure, so that it matches the file structure. For example, to import the data from nba.parquet, we will first look at the source table:
Table 2: nba.parquet
Name          | Team           | Number | Position | Age  | Height | Weight | College           | Salary
--------------+----------------+--------+----------+------+--------+--------+-------------------+-----------
Avery Bradley | Boston Celtics | 0.0    | PG       | 25.0 | 6-2    | 180.0  | Texas             | 7730337.0
Jae Crowder   | Boston Celtics | 99.0   | SF       | 25.0 | 6-6    | 235.0  | Marquette         | 6796117.0
John Holland  | Boston Celtics | 30.0   | SG       | 27.0 | 6-5    | 205.0  | Boston University |
R.J. Hunter   | Boston Celtics | 28.0   | SG       | 22.0 | 6-5    | 185.0  | Georgia State     | 1148640.0
Jonas Jerebko | Boston Celtics | 8.0    | PF       | 29.0 | 6-10   | 231.0  |                   | 5000000.0
Amir Johnson  | Boston Celtics | 90.0   | PF       | 29.0 | 6-9    | 240.0  |                   | 12000000.0
Jordan Mickey | Boston Celtics | 55.0   | PF       | 21.0 | 6-8    | 235.0  | LSU               | 1170960.0
Kelly Olynyk  | Boston Celtics | 41.0   | C        | 25.0 | 7-0    | 238.0  | Gonzaga           | 2165160.0
Terry Rozier  | Boston Celtics | 12.0   | PG       | 22.0 | 6-2    | 190.0  | Louisville        | 1824360.0
• The file is stored on S3, at s3://sqream-demo-data/nba.parquet. We will make note of the file structure to create a matching CREATE FOREIGN TABLE statement.
CREATE FOREIGN TABLE ext_nba
(
    Name VARCHAR(40),
    Team VARCHAR(40),
    Number BIGINT,
    Position VARCHAR(2),
    Age BIGINT,
    Height VARCHAR(4),
    Weight BIGINT,
    College VARCHAR(40),
    Salary FLOAT
)
WRAPPER parquet_fdw
OPTIONS
(
    LOCATION = 's3://sqream-demo-data/nba.parquet'
);
Tip: Types in SQream DB must match Parquet types exactly. If the column type isn’t supported, a possible workaround is to set it to any arbitrary type and then exclude it from subsequent queries.
2.1.1.3.2.4 4. Verify table contents
External tables do not verify file integrity or structure, so verify that the table definition matches up and contains the correct data.
t=> SELECT * FROM ext_nba LIMIT 10;
Name          | Team           | Number | Position | Age | Height | Weight | College        | Salary
--------------+----------------+--------+----------+-----+--------+--------+----------------+---------
Avery Bradley | Boston Celtics |      0 | PG       |  25 | 6-2    |    180 | Texas          |  7730337
Jae Crowder   | Boston Celtics |     99 | SF       |  25 | 6-6    |    235 | Marquette      |  6796117
John Holland  | Boston Celtics |     30 | SG       |  27 | 6-5    |    205 | Boston University |
R.J. Hunter   | Boston Celtics |     28 | SG       |  22 | 6-5    |    185 | Georgia State  |  1148640
Jonas Jerebko | Boston Celtics |      8 | PF       |  29 | 6-10   |    231 |                |  5000000
Amir Johnson  | Boston Celtics |     90 | PF       |  29 | 6-9    |    240 |                | 12000000
Jordan Mickey | Boston Celtics |     55 | PF       |  21 | 6-8    |    235 | LSU            |  1170960
Kelly Olynyk  | Boston Celtics |     41 | C        |  25 | 7-0    |    238 | Gonzaga        |  2165160
Terry Rozier  | Boston Celtics |     12 | PG       |  22 | 6-2    |    190 | Louisville     |  1824360
Marcus Smart  | Boston Celtics |     36 | PG       |  22 | 6-4    |    220 | Oklahoma State |  3431040
If any errors show up at this stage, verify the structure of the Parquet files and match them to the external table structure you created.
2.1.1.3.2.5 5. Copying data into SQream DB
To load the data into SQream DB, use the CREATE TABLE AS statement:
CREATE TABLE nba AS SELECT * FROM ext_nba;
2.1.1.3.2.6 Working around unsupported column types
Suppose you only want to load some of the columns, for example because one of the columns isn’t supported. If you omit unsupported columns from queries that access the foreign table, they are never read and will not cause a “type mismatch” error. For this example, assume that the Position column isn’t supported because of its type.
CREATE TABLE nba AS
    SELECT Name, Team, Number, NULL as Position, Age, Height, Weight, College, Salary FROM ext_nba;
-- We omitted the unsupported column `Position` from this query, and replaced it with a default ``NULL`` value, to maintain the same table structure.
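For wide tables, writing the NULL placeholders by hand is error-prone. As a rough sketch, the select list can be generated from the column names; select_with_nulls is a hypothetical helper that only builds the statement text and does not connect to SQream DB.

```python
def select_with_nulls(columns, unsupported, source):
    """Build a SELECT that swaps unsupported columns for NULL placeholders."""
    items = [f"NULL AS {c}" if c in unsupported else c for c in columns]
    return f"SELECT {', '.join(items)} FROM {source}"

columns = ["Name", "Team", "Number", "Position", "Age",
           "Height", "Weight", "College", "Salary"]

query = select_with_nulls(columns, {"Position"}, "ext_nba")
ctas = f"CREATE TABLE nba AS {query};"
print(ctas)
```

The output keeps every column in its original position, so the new table has the same structure as the foreign table.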
2.1.1.3.2.7 Modifying data during the copy process
One of the main reasons for staging data with a FOREIGN TABLE is to examine the contents and modify them before loading them. Assume we are unhappy with weight being in pounds, because we want to use kilograms instead. We can apply the transformation as part of the CREATE TABLE AS statement. Similar to the previous example, we will also set the Position column as a default NULL.
CREATE TABLE nba AS
    SELECT name, team, number, NULL as position, age, height, (weight / 2.205) as weight, college, salary
    FROM ext_nba
    ORDER BY weight;
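To sanity-check the loaded values, the same arithmetic can be reproduced outside the database. This small sketch applies the rounded factor from the query (2.205 pounds per kilogram) to the first three weights in the sample data; the function name is illustrative.

```python
POUNDS_PER_KILOGRAM = 2.205  # the rounded factor used in the query above

def pounds_to_kilograms(pounds):
    """Convert a weight in pounds to kilograms."""
    return pounds / POUNDS_PER_KILOGRAM

weights_lb = [180, 235, 205]  # first three players in the sample data
weights_kg = [round(pounds_to_kilograms(w), 2) for w in weights_lb]
print(weights_kg)
```

A 180 lb player comes out at roughly 81.63 kg, which is the order of magnitude to expect in the weight column after the load.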
2.1.1.3.2.8 Further Parquet loading examples
CREATE FOREIGN TABLE contains several configuration options. See more in the CREATE FOREIGN TABLE parameters section.
2.1.1.3.2.9 Loading a table from a directory of Parquet files on HDFS
CREATE FOREIGN TABLE ext_users
    (id INT NOT NULL, name VARCHAR(30) NOT NULL, email VARCHAR(50) NOT NULL)
WRAPPER parquet_fdw
OPTIONS
(
    LOCATION = 'hdfs://hadoop-nn.piedpiper.com/rhendricks/users/*.parquet'
);
CREATE TABLE users AS SELECT * FROM ext_users;
2.1.1.3.2.10 Loading a table from a bucket of files on S3
CREATE FOREIGN TABLE ext_users
    (id INT NOT NULL, name VARCHAR(30) NOT NULL, email VARCHAR(50) NOT NULL)
WRAPPER parquet_fdw
OPTIONS
(
    LOCATION = 's3://pp-secret-bucket/users/*.parquet',
    AWS_ID = 'our_aws_id',
    AWS_SECRET = 'our_aws_secret'
);
CREATE TABLE users AS SELECT * FROM ext_users;
2.1.1.3.3 Insert from ORC
This guide covers inserting data from ORC files into SQream DB using FOREIGN TABLE.
2.1.1.3.3.1 1. Prepare the files
Prepare the source ORC files, with the following requirements:
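ORC source \ SQream DB type        | BOOL | TINYINT | SMALLINT | INT | BIGINT | REAL | DOUBLE | Text¹ | DATE | DATETIME
-----------------------------------+------+---------+----------+-----+--------+------+--------+-------+------+---------
boolean                            | ✓    | ✓²      | ✓²       | ✓²  | ✓²     |      |        |       |      |
tinyint                            | ³    | ✓       | ✓        | ✓   | ✓      |      |        |       |      |
smallint                           | ³    | ⁴       | ✓        | ✓   | ✓      |      |        |       |      |
int                                | ³    | ⁴       | ⁴        | ✓   | ✓      |      |        |       |      |
bigint                             | ³    | ⁴       | ⁴        | ⁴   | ✓      |      |        |       |      |
float                              |      |         |          |     |        | ✓    | ✓      |       |      |
double                             |      |         |          |     |        | ✓    | ✓      |       |      |
string / char / varchar            |      |         |          |     |        |      |        | ✓     |      |
date                               |      |         |          |     |        |      |        |       | ✓    | ✓
timestamp, timestamp with timezone |      |         |          |     |        |      |        |       |      | ✓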
• If an ORC file contains a column of an unsupported type, such as binary, list, map, or union, but that column is not referenced (it does not appear in the SELECT query), the statement will succeed. If the column is referenced, an error is returned explaining that the type is not supported and that the column may be omitted. This can be worked around; see the examples later in this guide.
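Two notes attached to the table above say that a narrowing conversion succeeds only if all values are 0 or 1 (for BOOL) or fit the destination type. The sketch below shows one way to pre-check a sample of values. The ranges are assumptions: standard 16-, 32-, and 64-bit signed ranges, with TINYINT treated as an unsigned byte. Confirm them against the SQream DB data type reference before relying on them.

```python
# Assumed value ranges for the destination types (TINYINT taken as unsigned).
RANGES = {
    "BOOL": (0, 1),
    "TINYINT": (0, 255),
    "SMALLINT": (-2**15, 2**15 - 1),
    "INT": (-2**31, 2**31 - 1),
    "BIGINT": (-2**63, 2**63 - 1),
}

def fits(values, target_type):
    """Return True if every value is within the target type's range."""
    low, high = RANGES[target_type]
    return all(low <= v <= high for v in values)

sample = [100, 300]  # hypothetical values from an ORC int column
print(fits(sample, "TINYINT"), fits(sample, "SMALLINT"))
```

Here the sample would be rejected for TINYINT but accepted for SMALLINT, which mirrors the "will succeed if all values fit" rule.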
2.1.1.3.3.2 2. Place ORC files where SQream DB workers can access them
Any worker may try to access files (unless explicitly specified with the Workload Manager). It is important that every node has the same view of the storage being used, meaning that every SQream DB worker should have access to the files.
• For files hosted on NFS, ensure that the mount is accessible from all servers.
• For HDFS, ensure that SQream DB servers can access the HDFS name node with the correct user-id. See our Using SQream in an HDFS Environment guide for more information.
• For S3, ensure network access to the S3 endpoint. See our Amazon S3 guide for more information.
¹ Text values include TEXT, VARCHAR, and NVARCHAR
² Boolean values are cast to 0, 1
³ Will succeed if all values are 0, 1
⁴ Will succeed if all values fit the destination type
2.1.1.3.3.3 3. Figure out the table structure
Prior to loading data, you will need to write out the table structure, so that it matches the file structure. For example, to import the data from nba.orc, we will first look at the source table:
Table 3: nba.orc
Name          | Team           | Number | Position | Age  | Height | Weight | College           | Salary
--------------+----------------+--------+----------+------+--------+--------+-------------------+-----------
Avery Bradley | Boston Celtics | 0.0    | PG       | 25.0 | 6-2    | 180.0  | Texas             | 7730337.0
Jae Crowder   | Boston Celtics | 99.0   | SF       | 25.0 | 6-6    | 235.0  | Marquette         | 6796117.0
John Holland  | Boston Celtics | 30.0   | SG       | 27.0 | 6-5    | 205.0  | Boston University |
R.J. Hunter   | Boston Celtics | 28.0   | SG       | 22.0 | 6-5    | 185.0  | Georgia State     | 1148640.0
Jonas Jerebko | Boston Celtics | 8.0    | PF       | 29.0 | 6-10   | 231.0  |                   | 5000000.0
Amir Johnson  | Boston Celtics | 90.0   | PF       | 29.0 | 6-9    | 240.0  |                   | 12000000.0
Jordan Mickey | Boston Celtics | 55.0   | PF       | 21.0 | 6-8    | 235.0  | LSU               | 1170960.0
Kelly Olynyk  | Boston Celtics | 41.0   | C        | 25.0 | 7-0    | 238.0  | Gonzaga           | 2165160.0
Terry Rozier  | Boston Celtics | 12.0   | PG       | 22.0 | 6-2    | 190.0  | Louisville        | 1824360.0
• The file is stored on S3, at s3://sqream-demo-data/nba.orc. We will make note of the file structure to create a matching CREATE FOREIGN TABLE statement.
CREATE FOREIGN TABLE ext_nba
(
    Name VARCHAR(40),
    Team VARCHAR(40),
    Number BIGINT,
    Position VARCHAR(2),
    Age BIGINT,
    Height VARCHAR(4),
    Weight BIGINT,
    College VARCHAR(40),
    Salary FLOAT
)
WRAPPER orc_fdw
OPTIONS
(
    LOCATION = 's3://sqream-demo-data/nba.orc'
);
Tip: Types in SQream DB must match ORC types according to the table above. If the column type isn’t supported, a possible workaround is to set it to any arbitrary type and then exclude it from subsequent queries.
2.1.1.3.3.4 4. Verify table contents
External tables do not verify file integrity or structure, so verify that the table definition matches up and contains the correct data.
t=> SELECT * FROM ext_nba LIMIT 10;
Name          | Team           | Number | Position | Age | Height | Weight | College        | Salary
--------------+----------------+--------+----------+-----+--------+--------+----------------+---------
Avery Bradley | Boston Celtics |      0 | PG       |  25 | 6-2    |    180 | Texas          |  7730337
Jae Crowder   | Boston Celtics |     99 | SF       |  25 | 6-6    |    235 | Marquette      |  6796117
John Holland  | Boston Celtics |     30 | SG       |  27 | 6-5    |    205 | Boston University |
R.J. Hunter   | Boston Celtics |     28 | SG       |  22 | 6-5    |    185 | Georgia State  |  1148640
Jonas Jerebko | Boston Celtics |      8 | PF       |  29 | 6-10   |    231 |                |  5000000
Amir Johnson  | Boston Celtics |     90 | PF       |  29 | 6-9    |    240 |                | 12000000
Jordan Mickey | Boston Celtics |     55 | PF       |  21 | 6-8    |    235 | LSU            |  1170960
Kelly Olynyk  | Boston Celtics |     41 | C        |  25 | 7-0    |    238 | Gonzaga        |  2165160
Terry Rozier  | Boston Celtics |     12 | PG       |  22 | 6-2    |    190 | Louisville     |  1824360
Marcus Smart  | Boston Celtics |     36 | PG       |  22 | 6-4    |    220 | Oklahoma State |  3431040
If any errors show up at this stage, verify the structure of the ORC files and match them to the external table structure you created.
2.1.1.3.3.5 5. Copying data into SQream DB
To load the data into SQream DB, use the CREATE TABLE AS statement:
CREATE TABLE nba AS SELECT * FROM ext_nba;
2.1.1.3.3.6 Working around unsupported column types
Suppose you only want to load some of the columns, for example because one of the columns isn’t supported. If you omit unsupported columns from queries that access the foreign table, they are never read and will not cause a “type mismatch” error. For this example, assume that the Position column isn’t supported because of its type.
CREATE TABLE nba AS
    SELECT Name, Team, Number, NULL as Position, Age, Height, Weight, College, Salary FROM ext_nba;
-- We omitted the unsupported column `Position` from this query, and replaced it with a default ``NULL`` value, to maintain the same table structure.
2.1.1.3.3.7 Modifying data during the copy process
One of the main reasons for staging data with a FOREIGN TABLE is to examine the contents and modify them before loading them. Assume we are unhappy with weight being in pounds, because we want to use kilograms instead. We can apply the transformation as part of the CREATE TABLE AS statement. Similar to the previous example, we will also set the Position column as a default NULL.
CREATE TABLE nba AS
    SELECT name, team, number, NULL as position, age, height, (weight / 2.205) as weight, college, salary
    FROM ext_nba
    ORDER BY weight;
2.1.1.3.3.8 Further ORC loading examples
CREATE FOREIGN TABLE contains several configuration options. See more in the CREATE FOREIGN TABLE parameters section.
2.1.1.3.3.9 Loading a table from a directory of ORC files on HDFS
CREATE FOREIGN TABLE ext_users
    (id INT NOT NULL, name VARCHAR(30) NOT NULL, email VARCHAR(50) NOT NULL)
WRAPPER orc_fdw
OPTIONS
(
    LOCATION = 'hdfs://hadoop-nn.piedpiper.com/rhendricks/users/*.ORC'
);
CREATE TABLE users AS SELECT * FROM ext_users;
2.1.1.3.3.10 Loading a table from a bucket of files on S3
CREATE FOREIGN TABLE ext_users
    (id INT NOT NULL, name VARCHAR(30) NOT NULL, email VARCHAR(50) NOT NULL)
WRAPPER orc_fdw
OPTIONS
(
    LOCATION = 's3://pp-secret-bucket/users/*.ORC',
    AWS_ID = 'our_aws_id',
    AWS_SECRET = 'our_aws_secret'
);
CREATE TABLE users AS SELECT * FROM ext_users;
2.1.1.3.4 Migrating from Oracle
This guide covers actions required for migrating from Oracle to SQream DB with CSV files.
In this topic:
• 1. Preparing the tools and login information
• 2. Export the desired schema
  – Examples
    ∗ Dump all tables
    ∗ Dump only specific tables
• 3. Convert the Oracle dump to standard SQL
  – Example
• 4. Figure out the database structures
  – Remove unsupported attributes
  – Match data types
  – Additional considerations
• 5. Create the tables in SQream DB
  – Example
• 6. Export tables to CSVs
  – Using SQL*Plus to export data lists
  – Creating CSVs using stored procedures
    ∗ CSV generation considerations
• 7. Place CSVs where SQream DB workers can access
• 8. Bulk load the CSVs
  – Example
• 9. Rewrite Oracle queries
2.1.1.3.4.1 1. Preparing the tools and login information
• Migrating data from Oracle requires a username and password for your Oracle system.
• In this guide, we’ll use Oracle Data Pump, specifically the Data Pump Export utility.
2.1.1.3.4.2 2. Export the desired schema
Use the Data Pump Export utility to export the database schema. The format for using the Export utility is expdp
2.1.1.3.4.3 Examples
2.1.1.3.4.4 Dump all tables
$ expdp rhendricks/secretpassword DIRECTORY=dpumpdir DUMPFILE=tables.dmp CONTENT=metadata_only NOLOGFILE
2.1.1.3.4.5 Dump only specific tables
In this example, we specify two tables for dumping.
$ expdp rhendricks/secretpassword DIRECTORY=dpumpdir DUMPFILE=tables.dmp CONTENT=metadata_only TABLES=employees,jobs NOLOGFILE
2.1.1.3.4.6 3. Convert the Oracle dump to standard SQL
Oracle’s Data Pump Import utility will help us convert the dump from the previous step to standard SQL. The format for using the Import utility is impdp
2.1.1.3.4.7 Example
$ impdp rhendricks/secretpassword DIRECTORY=dpumpdir DUMPFILE=tables.dmp SQLFILE=sql_export.sql TRANSFORM=SEGMENT_ATTRIBUTES:N:table PARTITION_OPTIONS=MERGE
2.1.1.3.4.8 4. Figure out the database structures
Using the SQL file created in the previous step, write CREATE TABLE statements to match the schemas of the tables.
2.1.1.3.4.9 Remove unsupported attributes
Trim unsupported primary keys, indexes, constraints, and other unsupported Oracle attributes.
2.1.1.3.4.10 Match data types
Refer to the table below to match the Oracle source data type to a new SQream DB type:
Table 4: Data types
Oracle Data type         | Precision | SQream DB data type
-------------------------+-----------+-------------------------
CHAR(n), CHARACTER(n)    | Any n     | VARCHAR(n)
BLOB, CLOB, NCLOB, LONG  |           | TEXT
DATE                     |           | DATE
FLOAT(p)                 | p <= 63   | REAL
FLOAT(p)                 | p > 63    | FLOAT, DOUBLE
NCHAR(n), NVARCHAR2(n)   | Any n     | TEXT (alias of NVARCHAR)
NUMBER(p), NUMBER(p,0)   | p < 5     | SMALLINT
NUMBER(p), NUMBER(p,0)   | p < 9     | INT
NUMBER(p), NUMBER(p,0)   | p < 19    | BIGINT
NUMBER(p), NUMBER(p,0)   | p >= 20   | BIGINT
NUMBER(p,f), NUMBER(*,f) | f > 0     | FLOAT / DOUBLE
VARCHAR(n), VARCHAR2(n)  | Any n     | VARCHAR(n) or TEXT
TIMESTAMP                |           | DATETIME
Read more about supported data types in SQream DB.
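As a cross-check on the integer rows, the largest value a NUMBER(p) column can hold is 10^p - 1, which can be compared against the range of each integer type. The sketch below assumes 16-, 32-, and 64-bit signed ranges for SMALLINT, INT, and BIGINT; treat it as a sanity check under that assumption, not a replacement for the table, whose thresholds are slightly more conservative.

```python
# Assumed maximum values for signed integer types.
INTEGER_TYPES = [
    ("SMALLINT", 2**15 - 1),
    ("INT", 2**31 - 1),
    ("BIGINT", 2**63 - 1),
]

def smallest_integer_type(precision):
    """Smallest type that can hold any NUMBER(precision, 0) value, or None."""
    largest = 10**precision - 1
    for name, max_value in INTEGER_TYPES:
        if largest <= max_value:
            return name
    return None  # no listed integer type is guaranteed to hold every value

for p in (4, 6, 10, 18, 19):
    print(p, smallest_integer_type(p))
```

Precisions of 19 digits or more can exceed a signed 64-bit range, so such columns deserve a closer look at the actual data before a type is chosen.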
2.1.1.3.4.11 Additional considerations
• Understand how tables are created in SQream DB.
• Learn how SQream DB handles null values, particularly with regard to constraints.
• Oracle roles and user management commands need to be rewritten to SQream DB’s format. SQream DB supports full role-based access control (RBAC) similar to Oracle.
2.1.1.3.4.12 5. Create the tables in SQream DB
After rewriting the table structures, create them in SQream DB.
2.1.1.3.4.13 Example
Consider Oracle’s HR.EMPLOYEES sample table:
CREATE TABLE employees
    ( employee_id NUMBER(6)
    , first_name VARCHAR2(20)
    , last_name VARCHAR2(25)
      CONSTRAINT emp_last_name_nn NOT NULL
    , email VARCHAR2(25)
      CONSTRAINT emp_email_nn NOT NULL
    , phone_number VARCHAR2(20)
    , hire_date DATE
      CONSTRAINT emp_hire_date_nn NOT NULL
    , job_id VARCHAR2(10)
      CONSTRAINT emp_job_nn NOT NULL
    , salary NUMBER(8,2)
    , commission_pct NUMBER(2,2)
    , manager_id NUMBER(6)
    , department_id NUMBER(4)
    , CONSTRAINT emp_salary_min
      CHECK (salary > 0)
    , CONSTRAINT emp_email_uk
      UNIQUE (email)
    ) ;
CREATE UNIQUE INDEX emp_emp_id_pk
    ON employees (employee_id) ;
ALTER TABLE employees
    ADD ( CONSTRAINT emp_emp_id_pk
          PRIMARY KEY (employee_id)
        , CONSTRAINT emp_dept_fk
          FOREIGN KEY (department_id)
          REFERENCES departments
        , CONSTRAINT emp_job_fk
          FOREIGN KEY (job_id)
          REFERENCES jobs (job_id)
        , CONSTRAINT emp_manager_fk
          FOREIGN KEY (manager_id)
          REFERENCES employees
        ) ;
This table rewritten for SQream DB would be created like this:
CREATE TABLE employees
(
    employee_id INT NOT NULL,
    first_name VARCHAR(20),
    last_name VARCHAR(25) NOT NULL,
    email VARCHAR(25) NOT NULL,
    phone_number VARCHAR(20),
    hire_date DATE NOT NULL,
    job_id VARCHAR(10) NOT NULL,
    salary FLOAT,
    commission_pct REAL,
    manager_id INT,
    department_id SMALLINT
);
2.1.1.3.4.14 6. Export tables to CSVs
Exporting CSVs from Oracle servers is not a trivial task.
Options for exporting to CSVs
• Using SQL*Plus to export data lists
• Creating CSVs using stored procedures
  – CSV generation considerations
2.1.1.3.4.15 Using SQL*Plus to export data lists
Here’s a sample SQL*Plus script (to_csv.sql) that will export PSVs in a format that SQream DB can read:
Listing 1: Oracle SQL*Plus CSV export script
SET ECHO OFF
SET TERMOUT OFF
SET FEEDBACK OFF
SET WRAP OFF
SET PAGESIZE 0
SET LINESIZE 10000
SET ARRAYSIZE 5000
SET NUMW 30
SET TRIMSPOOL ON
SET UNDERLINE OFF
SET RECSEP OFF
SET VERIFY OFF
SET COLSEP '|'

SPOOL '&1'

ALTER SESSION SET nls_date_format = 'YYYY-MM-DD HH24:MI:SS';

SELECT * FROM &1;

SPOOL OFF
Enter SQL*Plus and export tables one-by-one interactively:
$ sqlplus rhendricks/secretpassword
@spool employees
@spool jobs
[...]
EXIT
Each table is exported as a data list file (.lst).
2.1.1.3.4.16 Creating CSVs using stored procedures
You can use stored procedures if you have them set up. See examples of stored procedures for generating CSVs.
2.1.1.3.4.17 CSV generation considerations
• Files should be valid CSVs. By default, SQream DB’s CSV parser can handle RFC 4180 standard CSVs, but it can also be configured to support non-standard CSVs (with multi-character delimiters, unquoted fields, etc.).
• Files are UTF-8 or ASCII encoded.
• The field delimiter is an ASCII character or characters.
• The record delimiter, also known as a new line separator, is a Unix-style newline (\n), DOS-style newline (\r\n), or Mac-style newline (\r).
• Fields are optionally enclosed by double quotes, and must be quoted if they contain one of the following characters:
  – The record delimiter or field delimiter
  – A double quote character
  – A newline
• If a field is quoted, any double quote that appears must be double-quoted (similar to the string literals quoting rules). For example, to encode What are "birds"?, the field should appear as "What are ""birds""?". Other modes of escaping are not supported (e.g. 1,"What are \"birds\"?" is not a valid way of escaping CSV values).
• NULL values can be marked in two ways in the CSV:
  – An explicit null marker. For example, col1,\N,col3
  – An empty field delimited by the field delimiter. For example, col1,,col3
Note: If a text field is quoted but contains no content ("") it is considered an empty text field. It is not considered NULL.
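If you generate the files with your own tooling, the quoting rules above can be captured in a few lines. The following is a minimal sketch, not SQream DB code: encode_field and encode_row are hypothetical helpers that write NULL as an empty unquoted field, write empty text as "", and double any embedded quotes.

```python
def encode_field(value, delimiter=","):
    """Encode one field following the quoting rules listed above."""
    if value is None:
        return ""            # NULL: empty, unquoted field
    text = str(value)
    if text == "":
        return '""'          # empty text: quoted so it is not read as NULL
    needs_quotes = any(token in text for token in (delimiter, '"', "\n", "\r"))
    if needs_quotes:
        # Embedded double quotes are escaped by doubling them
        return '"' + text.replace('"', '""') + '"'
    return text

def encode_row(values, delimiter=","):
    """Join encoded fields into a single record (without the newline)."""
    return delimiter.join(encode_field(v, delimiter) for v in values)

print(encode_row([1, 'What are "birds"?']))
print(encode_row(["col1", None, "col3"]))
```

Python's standard csv module follows the same doubling convention by default, but it writes both None and empty strings as empty fields, which is why this sketch handles those two cases explicitly.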
2.1.1.3.4.18 7. Place CSVs where SQream DB workers can access
During data load, the COPY FROM command can run on any worker (unless explicitly specified with the Workload Manager). It is important that every node has the same view of the storage being used, meaning that every SQream DB worker should have access to the files.
• For files hosted on NFS, ensure that the mount is accessible from all servers.
• For HDFS, ensure that SQream DB servers can access the HDFS name node with the correct user-id.
• For S3, ensure network access to the S3 endpoint.
2.1.1.3.4.19 8. Bulk load the CSVs
Issue the COPY FROM commands to SQream DB to insert a table from the CSVs created. Repeat the COPY FROM command for each table exported from Oracle.
2.1.1.3.4.20 Example
For the employees table, run the following command:
COPY employees FROM 'employees.lst' WITH DELIMITER '|';
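Since the command is repeated for every exported table, the statements can be generated from the list of table names. The sketch below only builds the statement text; the helper name and the /mnt/export directory are hypothetical placeholders for wherever the .lst files were placed.

```python
def copy_statements(tables, directory, delimiter="|"):
    """Build one COPY FROM statement per exported table."""
    return [
        f"COPY {table} FROM '{directory}/{table}.lst' WITH DELIMITER '{delimiter}';"
        for table in tables
    ]

statements = copy_statements(["employees", "jobs"], "/mnt/export")
for statement in statements:
    print(statement)
```

Each generated statement can then be run from any SQL client, or passed to a cursor's execute() method when using the Python connector.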
2.1.1.3.4.21 9. Rewrite Oracle queries
SQream DB supports a large subset of ANSI SQL. You will have to refactor much of Oracle’s SQL and functions that often are not ANSI SQL. We recommend the following resources:
• SQL Feature Checklist - to understand SQream DB’s SQL feature support.
• Optimization and Best Practices - to understand best practices for SQL queries and schema design.
• Common table expressions (CTEs) - CTEs can be used to rewrite complex queries in a compact form.
• Concurrency and locks - to understand the difference between Oracle’s transactions and SQream DB’s concurrency.
• Identity - SQream DB supports sequences, but no triggers for auto-increment.
• Joins - SQream DB supports ANSI join syntax. Oracle uses the + operator, which SQream DB doesn’t support.
• Saved queries - Saved queries can be used to emulate some stored procedures.
• Subqueries - SQream DB supports a limited set of subqueries.
• Python UDF (user-defined functions) - SQream DB supports Python User Defined Functions, which can be used to run complex operations in-line.
• Views - SQream DB supports logical views, but does not support materialized views.
• Window Functions - SQream DB supports a wide array of window functions.
See also:
• COPY FROM
• INSERT
• Foreign Tables
2.1.2 Client drivers for v2021.1
These guides explain how to use the SQream DB client drivers, and how to use client applications with SQream DB.
2.1.2.1 Client driver downloads
2.1.2.1.1 All operating systems
• JDBC - sqream-jdbc-4.4.0 (.jar)
  JDBC Driver (SQream recommends installing via mvn)
• Python - pysqream v3.1.3 (.tar.gz)
  Python (pysqream) - Python driver (SQream recommends installing via pip)
• Node.JS - sqream-v4.2.4 (.tar.gz)
  Node.JS - Node.JS driver (SQream recommends installing via npm)
2.1.2.1.2 Windows
• .Net driver - SQream .Net driver v3.0.2
2.1.2.1.3 Linux
• SQream SQL (x86) - sqream-sql-v2020.1.1_stable.x86_64.tar.gz
  sqream sql - Interactive command-line SQL client for Intel-based machines
• SQream SQL (IBM POWER9) - sqream-sql-v2020.1.1_stable.ppc64le.tar.gz
  sqream sql - Interactive command-line SQL client for IBM POWER9-based machines
• C++ connector - libsqream-4.1
  C++ shared object library
2.1.2.1.3.1 Python (pysqream)
The SQream Python connector is a set of packages that allows Python programs to connect to SQream DB.
• pysqream is a pure Python connector. It can be installed with pip on any operating system, including Linux, Windows, and macOS.
• pysqream-sqlalchemy is a SQLAlchemy dialect for pysqream.
The connector supports Python 3.6.5 and newer. The base pysqream package conforms to the Python DB-API specification, PEP-249.
In this topic:
• Installing the Python connector
  – Prerequisites
    ∗ 1. Python
    ∗ 2. PIP
    ∗ 3. OpenSSL for Linux
    ∗ 4. Cython (optional)
  – Install via pip
  – Upgrading an existing installation
  – Validate the installation
• SQLAlchemy examples
  – A simple connection example
  – Pulling a table into Pandas
• API Examples
  – Explaining the connection example
  – Using the cursor
  – Reading result metadata
  – Loading data into a table
  – Reading data from a CSV file for load into a table
  – Using SQLAlchemy ORM to create tables and fill them with data
2.1.2.1.3.2 Installing the Python connector
2.1.2.1.3.3 Prerequisites
2.1.2.1.3.4 1. Python
The connector requires Python 3.6.5 or newer. To verify your version of Python:
$ python --version
Python 3.7.3
Note: If both Python 2.x and 3.x are installed, you can run python3 and pip3 instead of python and pip respectively for the rest of this guide.
Warning: If you’re running on an older version, pip will fetch an older version of pysqream, with version <3.0.0. This version is currently not supported.
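The same check can be made from inside Python, which is useful in setup scripts. This is a small sketch that compares the running interpreter against the 3.6.5 minimum stated above; is_supported is an illustrative helper name.

```python
import sys

MINIMUM_VERSION = (3, 6, 5)  # minimum stated for the connector

def is_supported(version_info=None):
    """Return True if the given (or running) Python version is new enough."""
    current = tuple((version_info or sys.version_info)[:3])
    return current >= MINIMUM_VERSION

print(is_supported())           # checks the interpreter running this script
print(is_supported((3, 6, 4)))  # an older interpreter would be rejected
```

Tuples compare element by element, so (3, 7, 3) passes while (3, 6, 4) and any 2.x version do not.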
2.1.2.1.3.5 2. PIP
The Python connector is installed via pip, the Python package manager and installer.
We recommend upgrading to the latest version of pip before installing. To make sure that you are on the latest version, run the following command:
$ python -m pip install --upgrade pip
Collecting pip
  Downloading https://files.pythonhosted.org/packages/00/b6/9cfa56b4081ad13874b0c6f96af8ce16cfbc1cb06bedf8e9164ce5551ec1/pip-19.3.1-py2.py3-none-any.whl (1.4MB)
    || 1.4MB 1.6MB/s
Installing collected packages: pip
  Found existing installation: pip 19.1.1
    Uninstalling pip-19.1.1:
      Successfully uninstalled pip-19.1.1
Successfully installed pip-19.3.1
Note:
• On macOS, you may want to use virtualenv to install Python and the connector, to ensure compatibility with the built-in Python environment.
• If you encounter an error including SSLError or WARNING: pip is configured with locations that require TLS/SSL, however the ssl module in Python is not available. - please be sure to reinstall Python with SSL enabled, or use virtualenv or Anaconda.
2.1.2.1.3.6 3. OpenSSL for Linux
Some distributions of Python do not include OpenSSL. The Python connector relies on OpenSSL for secure connections to SQream DB.
• To install OpenSSL on RHEL/CentOS
$ sudo yum install -y libffi-devel openssl-devel
• To install OpenSSL on Ubuntu
$ sudo apt-get install libssl-dev libffi-dev -y
2.1.2.1.3.7 4. Cython (optional)
Cython is optional but highly recommended, because it improves the performance of Python applications.
$ pip install cython
2.1.2.1.3.8 Install via pip
The Python connector is available via PyPI. Install the connector with pip:
$ pip install pysqream pysqream-sqlalchemy
pip will automatically install all necessary libraries and modules.
2.1.2.1.3.9 Upgrading an existing installation
The Python drivers are updated periodically. To upgrade an existing pysqream installation, use pip’s -U flag.
$ pip install pysqream pysqream-sqlalchemy -U
2.1.2.1.3.10 Validate the installation
Create a file called test.py, containing the following:
Listing 2: pysqream Validation Script
#!/usr/bin/env python

import pysqream

"""
Connection parameters include:
* IP/Hostname
* Port
* database name
* username
* password
* Connect through load balancer, or direct to worker (Default: false - direct to worker)
* use SSL connection (default: false)
* Optional service queue (default: 'sqream')
"""

# Create a connection object

con = pysqream.connect(host='127.0.0.1', port=5000, database='master'
                   , username='sqream', password='sqream'
                   , clustered=False)

# Create a new cursor
cur = con.cursor()

# Prepare and execute a query
cur.execute('select show_version()')

result = cur.fetchall() # `fetchall` gets the entire data set

print (f"Version: {result[0][0]}")

# This should print the SQream DB version. For example ``Version: v2020.1``.

# Finally, close the connection

con.close()
Make sure to replace the parameters in the connection with the respective parameters for your SQream DB installation. Run the test file to verify that you can connect to SQream DB:
$ python test.py
Version: v2020.1
If all went well, you are now ready to build an application using the SQream DB Python connector! If any connection error appears, verify that you have access to a running SQream DB and that the connection parameters are correct.
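To avoid editing the script for every installation, the connection parameters can be read from the environment. The sketch below is one possible approach: the SQREAM_* variable names are hypothetical, while the keyword names match the arguments passed to pysqream.connect in the validation script above. It does not import pysqream, so it runs without a database.

```python
import os

def connection_settings(env=None):
    """Collect connection keyword arguments from environment variables.

    The SQREAM_* names are illustrative; the defaults mirror the values
    used in the validation script.
    """
    env = os.environ if env is None else env
    return {
        "host": env.get("SQREAM_HOST", "127.0.0.1"),
        "port": int(env.get("SQREAM_PORT", "5000")),
        "database": env.get("SQREAM_DATABASE", "master"),
        "username": env.get("SQREAM_USER", "sqream"),
        "password": env.get("SQREAM_PASSWORD", "sqream"),
        "clustered": env.get("SQREAM_CLUSTERED", "false").lower() == "true",
    }

settings = connection_settings({"SQREAM_PORT": "3108", "SQREAM_CLUSTERED": "True"})
print(settings)

# With the connector installed, the settings could be used as:
#   con = pysqream.connect(**connection_settings())
```

Keeping credentials out of the source file also makes it easier to share the script between environments.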
2.1.2.1.3.11 SQLAlchemy examples
SQLAlchemy is an ORM for Python. When you install the SQream DB dialect (pysqream-sqlalchemy) you can use frameworks like Pandas, TensorFlow, and Alembic to query SQream DB directly.
2.1.2.1.3.12 A simple connection example
import sqlalchemy as sa
from sqlalchemy.engine.url import URL
engine_url = URL('sqream'
                 , username='rhendricks'
                 , password='secret_password'
                 , host='localhost'
                 , port=5000
                 , database='raviga'
                 , query={'use_ssl': False})
engine = sa.create_engine(engine_url)
res = engine.execute('create table test (ints int)')
res = engine.execute('insert into test values (5), (6)')
res = engine.execute('select * from test')
2.1.2.1.3.13 Pulling a table into Pandas
In this example, we use the URL method to create the connection string.
import sqlalchemy as sa
import pandas as pd
from sqlalchemy.engine.url import URL
engine_url = URL('sqream'
                 , username='rhendricks'
                 , password='secret_password'
                 , host='localhost'
                 , port=5000
                 , database='raviga'
                 , query={'use_ssl': False})
engine = sa.create_engine(engine_url)
table_df = pd.read_sql("select * from nba", con=engine)
2.1.2.1.3.14 API Examples
2.1.2.1.3.15 Explaining the connection example
First, import the package and create a connection
# Import pysqream package
import pysqream

"""
Connection parameters include:
* IP/Hostname
* Port
* database name
* username
* password
* Connect through load balancer, or direct to worker (Default: false - direct to worker)
* use SSL connection (default: false)
* Optional service queue (default: 'sqream')
"""

# Create a connection object
con = pysqream.connect(host='127.0.0.1', port=3108, database='raviga'
                   , username='rhendricks', password='Tr0ub4dor&3'
                   , clustered=True)
Then, run a query and fetch the results:
cur = con.cursor() # Create a new cursor
# Prepare and execute a query
cur.execute('select show_version()')
result = cur.fetchall() # `fetchall` gets the entire data set
print (f"Version: {result[0][0]}")
This should print the SQream DB version, for example v2020.1. Finally, we will close the connection:
con.close()
2.1.2.1.3.16 Using the cursor
The DB-API specification includes several methods for fetching results from the cursor. We will use the nba example. Here’s a peek at the table contents:
Table 5: nba
Name          | Team           | Number | Position | Age  | Height | Weight | College           | Salary
--------------+----------------+--------+----------+------+--------+--------+-------------------+-----------
Avery Bradley | Boston Celtics | 0.0    | PG       | 25.0 | 6-2    | 180.0  | Texas             | 7730337.0
Jae Crowder   | Boston Celtics | 99.0   | SF       | 25.0 | 6-6    | 235.0  | Marquette         | 6796117.0
John Holland  | Boston Celtics | 30.0   | SG       | 27.0 | 6-5    | 205.0  | Boston University |
R.J. Hunter   | Boston Celtics | 28.0   | SG       | 22.0 | 6-5    | 185.0  | Georgia State     | 1148640.0
Jonas Jerebko | Boston Celtics | 8.0    | PF       | 29.0 | 6-10   | 231.0  |                   | 5000000.0
Amir Johnson  | Boston Celtics | 90.0   | PF       | 29.0 | 6-9    | 240.0  |                   | 12000000.0
Jordan Mickey | Boston Celtics | 55.0   | PF       | 21.0 | 6-8    | 235.0  | LSU               | 1170960.0
Kelly Olynyk  | Boston Celtics | 41.0   | C        | 25.0 | 7-0    | 238.0  | Gonzaga           | 2165160.0
Terry Rozier  | Boston Celtics | 12.0   | PG       | 22.0 | 6-2    | 190.0  | Louisville        | 1824360.0
Like before, we will import the library and create a Connection(), followed by execute() on a simple SELECT * query.

import pysqream
con = pysqream.connect(host='127.0.0.1', port=3108, database='master'
                   , username='rhendricks', password='Tr0ub4dor&3'
                   , clustered=True)

cur = con.cursor()  # Create a new cursor
# The select statement:
statement = 'SELECT * FROM nba'
cur.execute(statement)

After executing the statement, we have a Connection cursor object waiting. A cursor is iterable, meaning that every time we fetch, it advances the cursor to the next row.

Use fetchone() to get one record at a time:

first_row = cur.fetchone()   # Fetch one row at a time (first row)

second_row = cur.fetchone()  # Fetch one row at a time (second row)

To get several rows at a time, use fetchmany():

# executing `fetchone` twice is equivalent to this form:
third_and_fourth_rows = cur.fetchmany(2)

To get all rows at once, use fetchall():

# To get all rows at once, use `fetchall`
remaining_rows = cur.fetchall()

# Close the connection when done
con.close()
Here are the contents of the row variables we used:
>>> print(first_row)
('Avery Bradley', 'Boston Celtics', 0, 'PG', 25, '6-2', 180, 'Texas', 7730337)
>>> print(second_row)
('Jae Crowder', 'Boston Celtics', 99, 'SF', 25, '6-6', 235, 'Marquette', 6796117)
>>> print(third_and_fourth_rows)
[('John Holland', 'Boston Celtics', 30, 'SG', 27, '6-5', 205, 'Boston University', None), ('R.J. Hunter', 'Boston Celtics', 28, 'SG', 22, '6-5', 185, 'Georgia State', 1148640)]
>>> print(remaining_rows)
[('Jonas Jerebko', 'Boston Celtics', 8, 'PF', 29, '6-10', 231, None, 5000000), ('Amir Johnson', 'Boston Celtics', 90, 'PF', 29, '6-9', 240, None, 12000000), ('Jordan Mickey', 'Boston Celtics', 55, 'PF', 21, '6-8', 235, 'LSU', 1170960), ('Kelly Olynyk', 'Boston Celtics', 41, 'C', 25, '7-0', 238, 'Gonzaga', 2165160), [...]
Note: Calling a fetch command after all rows have been fetched will return an empty array ([]).
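The fetch methods can be combined into a small batching helper for result sets that are too large to hold in memory at once. The following is a minimal sketch rather than part of pysqream: iter_batches is a hypothetical helper name, and the standard-library sqlite3 module is used only as a stand-in DB-API connection so that the example runs without a SQream DB server. It relies only on fetchmany() returning an empty sequence once the rows are exhausted.

```python
import sqlite3


def iter_batches(cursor, batch_size=1000):
    """Yield lists of rows from a DB-API cursor until it is exhausted.

    Hypothetical helper (not part of pysqream).
    """
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:  # an empty sequence means no rows are left
            break
        yield batch


# Stand-in database (assumption: sqlite3 replaces pysqream for this demo only)
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE nba (name TEXT, number INTEGER)")
cur.executemany(
    "INSERT INTO nba VALUES (?, ?)",
    [("Avery Bradley", 0), ("Jae Crowder", 99), ("John Holland", 30)],
)

cur.execute("SELECT name, number FROM nba ORDER BY number")
batches = list(iter_batches(cur, batch_size=2))
print(batches)
con.close()
```

With a pysqream cursor, the same helper would presumably be called right after cur.execute(statement), in place of a single fetchall().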
2.1.2.1.3.17 Reading result metadata
When executing a statement, the connection object also contains metadata about the result set (e.g. column names, types, etc.).

The metadata is stored in the Connection.description object of the cursor.

>>> import pysqream
>>> con = pysqream.connect(host='127.0.0.1', port=3108, database='master'
...                , username='rhendricks', password='Tr0ub4dor&3'
...                , clustered=True)
>>> cur = con.cursor()
>>> statement = 'SELECT * FROM nba'
>>> cur.execute(statement)

To get a list of column names, iterate over the description list:

>>> [ i[0] for i in cur.description ]
['Name', 'Team', 'Number', 'Position', 'Age (as of 2018)', 'Height', 'Weight', 'College', 'Salary']
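The column names from description can also be paired with each fetched row. The sketch below is illustrative only: rows_as_dicts is a hypothetical helper (not a pysqream function), and sqlite3 again stands in for a SQream DB connection, on the assumption that both expose the DB-API description attribute with the column name in position 0.

```python
import sqlite3


def rows_as_dicts(cursor):
    """Return all remaining rows as dicts keyed by column name.

    Hypothetical helper; uses only cursor.description and fetchall().
    """
    names = [col[0] for col in cursor.description]
    return [dict(zip(names, row)) for row in cursor.fetchall()]


# Stand-in database (assumption: sqlite3 replaces pysqream for this demo only)
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE nba (Name TEXT, Team TEXT, Number INTEGER)")
cur.execute("INSERT INTO nba VALUES ('Avery Bradley', 'Boston Celtics', 0)")

cur.execute("SELECT * FROM nba")
records = rows_as_dicts(cur)
print(records)
con.close()
```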
2.1.2.1.3.18 Loading data into a table
This example loads 10,000 rows of dummy data to a SQream DB instance:

import pysqream
from datetime import date, datetime
from time import time

con = pysqream.connect(host='127.0.0.1', port=3108, database='master'
                   , username='rhendricks', password='Tr0ub4dor&3'
                   , clustered=True)

# Create a table for loading
create = 'create or replace table perf (b bool, t tinyint, sm smallint, i int, bi bigint, f real, d double, s varchar(12), ss text, dt date, dtt datetime)'
con.execute(create)

# After creating the table, we can load data into it with the INSERT command

# Create dummy data which matches the table we created
data = (False, 2, 12, 145, 84124234, 3.141, -4.3, "Marty McFly", u"", date(2019, 12, 17), datetime(1955, 11, 4, 1, 23, 0, 0))

row_count = 10**4

# Get a new cursor
cur = con.cursor()
insert = 'insert into perf values (?,?,?,?,?,?,?,?,?,?,?)'
start = time()
cur.executemany(insert, [data] * row_count)
print(f"Total insert time for {row_count} rows: {time() - start} seconds")

# Close this cursor
cur.close()

# Verify that the data was inserted correctly
# Get a new cursor
cur = con.cursor()
cur.execute('select count(*) from perf')
result = cur.fetchall()  # `fetchall` collects the entire data set
print(f"Count of inserted rows: {result[0][0]}")

# When done, close the cursor
cur.close()

# Close the connection
con.close()
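When the data to load is much larger than 10,000 rows, it may help to pass it to executemany() in bounded chunks instead of building one large list. This is a sketch under assumptions: chunked is a hypothetical helper, the chunk size of 1000 is arbitrary, and sqlite3 stands in for a SQream DB connection so the example is runnable; it is not a measured recommendation for pysqream.

```python
import sqlite3
from itertools import islice


def chunked(rows, size):
    """Yield successive lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


# Stand-in database (assumption: sqlite3 replaces pysqream for this demo only)
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE perf (i INTEGER)")

rows = ((i,) for i in range(2500))  # a generator, never fully materialized
chunk_sizes = []
for chunk in chunked(rows, 1000):
    cur.executemany("INSERT INTO perf VALUES (?)", chunk)
    chunk_sizes.append(len(chunk))

cur.execute("SELECT count(*) FROM perf")
inserted = cur.fetchall()[0][0]
print(chunk_sizes, inserted)
con.close()
```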
2.1.2.1.3.19 Reading data from a CSV file for load into a table
We will write a helper function to create an INSERT statement, by reading an existing table's metadata.

import csv
import datetime
import pysqream

def insert_from_csv(cur, table_name, csv_filename, field_delimiter = ',', null_markers = []):
    """
    We will first ask SQream DB for some table information.
    This is important for understanding the number of columns, and will help
    to create a matching INSERT statement
    """

    column_info = cur.execute(f"SELECT * FROM {table_name} LIMIT 0").description

    def parse_datetime(v):
        # Try the most specific format first, then fall back to date-only
        try:
            return datetime.datetime.strptime(v, '%Y-%m-%d %H:%M:%S.%f')
        except ValueError:
            try:
                return datetime.datetime.strptime(v, '%Y-%m-%d %H:%M:%S')
            except ValueError:
                return datetime.datetime.strptime(v, '%Y-%m-%d')

    # Create enough placeholders (`?`) for the INSERT query string
    qstring = ','.join(['?'] * len(column_info))
    insert_statement = f"insert into {table_name} values ({qstring})"

    # Open the CSV file
    with open(csv_filename, mode='r') as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=field_delimiter)

        # Execute the INSERT statement with the CSV data
        # (inside the `with` block, while the file is still open)
        cur.executemany(insert_statement, [row for row in csv_reader])


con = pysqream.connect(host='127.0.0.1', port=3108, database='master'
                   , username='rhendricks', password='Tr0ub4dor&3'
                   , clustered=True)

cur = con.cursor()

insert_from_csv(cur, 'nba', 'nba.csv', field_delimiter = ',', null_markers = [])

con.close()
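The helper accepts a null_markers argument. One way such markers could be applied before the rows reach executemany() is sketched below. clean_row is a hypothetical function, the marker values are examples, and the sketch assumes the connector accepts None for a NULL value; an in-memory CSV is used so the example is self-contained.

```python
import csv
import io


def clean_row(row, null_markers=("", "NULL")):
    """Replace any field that equals a null marker with None.

    Hypothetical helper; the marker values are illustrative assumptions.
    """
    return [None if field in null_markers else field for field in row]


# In-memory CSV standing in for a file such as nba.csv
sample = io.StringIO(
    "Avery Bradley,Boston Celtics,0\n"
    "John Holland,Boston Celtics,NULL\n"
)

rows = [clean_row(r) for r in csv.reader(sample, delimiter=",")]
print(rows)
```

The cleaned rows list could then be passed as the second argument to executemany().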
2.1.2.1.3.20 Using SQLAlchemy ORM to create tables and fill them with data
You can also use the ORM to create tables and insert data to them from Python objects.

For example:

import sqlalchemy as sa
import pandas as pd
from sqlalchemy.engine.url import URL

engine_url = URL('sqream'
              , username='rhendricks'
              , password='secret_password'
              , host='localhost'
              , port=5000
              , database='raviga'
              , query={'use_ssl': False})

engine = sa.create_engine(engine_url)

# Build a metadata object and bind it

metadata = sa.MetaData()
metadata.bind = engine

# Create a table in the local metadata

employees = sa.Table(
'employees'
, metadata
, sa.Column('id', sa.Integer)
, sa.Column('name', sa.VARCHAR(32))
, sa.Column('lastname', sa.VARCHAR(32))
, sa.Column('salary', sa.Float)
)

# The create_all() function uses the SQream DB engine object
# to create all the defined table objects.

metadata.create_all(engine)

# Now that the table exists, we can insert data into it.

# Build the data rows
insert_data = [ {'id': 1, 'name': 'Richard', 'lastname': 'Hendricks', 'salary': 12000.75}
               ,{'id': 3, 'name': 'Bertram', 'lastname': 'Gilfoyle', 'salary': 8400.0}
               ,{'id': 8, 'name': 'Donald', 'lastname': 'Dunn', 'salary': 6500.40}
              ]

# Build the insert command
ins = employees.insert(insert_data)

# Execute the command
result = engine.execute(ins)
2.1.2.1.3.21 pysqream API reference
The SQream Python connector allows Python programs to connect to SQream DB.

pysqream conforms to the Python DB-API specification, PEP-249.

The main module is pysqream, which contains the Connection() class.

connect(host, port, database, username, password, clustered=False, use_ssl=False, service='sqream', reconnect_attempts=3, reconnect_interval=10)
    Creates a new Connection() object and connects to SQream DB.

    host                 SQream DB hostname or IP
    port                 SQream DB port
    database             Database name
    username             Username to use for connection
    password             Password for username
    clustered            Connect through load balancer, or direct to worker (default: false - direct to worker)
    use_ssl              Use SSL connection (default: false)
    service              Optional service queue (default: 'sqream')
    reconnect_attempts   Number of reconnection attempts before closing the connection
    reconnect_interval   Time in seconds between each reconnection attempt

class Connection

    arraysize
        Specifies the number of rows to fetch at a time with fetchmany(). Defaults to 1 - one row at a time.

    rowcount
        Unused, always returns -1.

    description
        Read-only attribute that contains result set metadata. This attribute is populated after a statement is executed.

        Value           Description
        name            Column name
        type_code       Internal type code
        display_size    Not used - same as internal_size
        internal_size   Data size in bytes
        precision       Precision of numeric data (not used)
        scale           Scale for numeric data (not used)
        null_ok         Specifies if NULL values are allowed for this column

    execute(self, query, params=None)
        Execute a statement. Parameters are not supported.

        self     Connection()
        query    Statement or query text
        params   Unused

    executemany(self, query, rows_or_cols=None, data_as='rows', amount=None)
        Prepares a statement and executes it against all parameter sequences found in rows_or_cols.

        self           Connection()
        query          INSERT statement
        rows_or_cols   Data buffer to insert. This should be a sequence of lists or tuples.
        data_as        (Optional) Read data as rows or columns
        amount         (Optional) Count of rows to insert

    close(self)
        Close a statement and connection. After a statement is closed, it must be reopened by creating a new cursor.

        self   Connection()

    cursor(self)
        Create a new Connection() cursor. We recommend creating a new cursor for every statement.

        self   Connection()

    fetchall(self, data_as='rows')
        Fetch all remaining records from the result set. An empty sequence is returned when no more rows are available.

        self      Connection()
        data_as   (Optional) Read data as rows or columns

    fetchone(self, data_as='rows')
        Fetch one record from the result set. An empty sequence is returned when no more rows are available.

        self      Connection()
        data_as   (Optional) Read data as rows or columns

    fetchmany(self, size=[Connection.arraysize], data_as='rows')
        Fetches the next several rows of a query result set. An empty sequence is returned when no more rows are available.

        self      Connection()
        size      Number of records to fetch. If not set, fetches Connection.arraysize (1 by default) records
        data_as   (Optional) Read data as rows or columns

    __iter__()
        Makes the cursor iterable.

    apilevel = '2.0'
        String constant stating the supported API level. The connector supports API "2.0".

    threadsafety = 1
        Level of thread safety the interface supports. pysqream currently supports level 1, which states that threads can share the module, but not connections.

    paramstyle = 'qmark'
        The placeholder marker. Set to qmark, which is a question mark (?).
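Because threadsafety is 1, each thread needs its own connection. One common way to arrange this is a thread-local holder, sketched below. get_connection and make_connection are hypothetical names, and make_connection uses sqlite3 purely as a stand-in; with pysqream it would presumably wrap a pysqream.connect(...) call with the parameters listed above.

```python
import sqlite3
import threading

_local = threading.local()


def get_connection(connect):
    """Return the calling thread's own connection, creating it on first use."""
    if not hasattr(_local, "con"):
        _local.con = connect()
    return _local.con


def make_connection():
    # Stand-in factory (assumption); replace with a pysqream.connect(...) call
    return sqlite3.connect(":memory:")


seen = []
lock = threading.Lock()


def worker():
    con = get_connection(make_connection)
    again = get_connection(make_connection)  # same thread, same object
    with lock:
        seen.append((con, con is again))
    con.close()  # closed by the thread that opened it


threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len({id(con) for con, _ in seen}), "distinct connections")
```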
2.1.2.1.3.22 C++ Driver
The SQream DB C++ driver allows C++ programs and tools to connect to SQream DB. This tutorial shows how to write a C++ program that uses this driver.
In this topic:
• Installing the C++ driver
    – Prerequisites
    – Getting the library
    – Extract the tarball archive
• Examples
    – Testing the connection to SQream DB
    – Creating a table and inserting values
2.1.2.1.3.23 Installing the C++ driver
2.1.2.1.3.24 Prerequisites
The SQream DB C++ driver was built on 64-bit Linux, and is designed to work with RHEL 7 and Ubuntu 16.04 and newer.
2.1.2.1.3.25 Getting the library
The C++ driver is provided as a tarball containing the compiled libsqream.so file and a header sqream.h. Get the driver from the SQream Drivers page. The library can be integrated into your C++-based applications or projects.
2.1.2.1.3.26 Extract the tarball archive
Extract the library files from the tarball
$ tar xf libsqream-3.0.tar.gz
2.1.2.1.3.27 Examples
Assuming there is a SQream DB worker to connect to, we’ll connect to it using the application and run some statements.
2.1.2.1.3.28 Testing the connection to SQream DB
Download this file by right clicking and saving to your computer connect_test.cpp.
Listing 3: Connect to SQream DB
// Trivial example

#include <iostream>

#include "sqream.h"

int main () {

   sqream::driver sqc;

   // Connection parameters: Hostname, Port, Use SSL, Username, Password,
   // Database name, Service name
   sqc.connect("127.0.0.1", 5000, false, "rhendricks", "Tr0ub4dor&3",
               "raviga", "sqream");

   // create table with data
   run_direct_query(&sqc, "CREATE TABLE test_table (x int)");
   run_direct_query(&sqc, "INSERT INTO test_table VALUES (5), (6), (7), (8)");

   // query it
   sqc.new_query("SELECT * FROM test_table");
   sqc.execute_query();

   // See the results
   while (sqc.next_query_row()) {
       std::cout << "Received: " << sqc.get_int(0) << std::endl;
   }

   sqc.finish_query();

   // Close the connection completely
   sqc.disconnect();

}
2.1.2.1.3.29 Compiling and running the application
To build this code, place the library and header file in ./libsqream-3.0/ and run
$ g++ -Wall -Ilibsqream-3.0 -Llibsqream-3.0 -lsqream connect_test.cpp -o connect_test
$ ./connect_test
Modify the -I and -L arguments to match the .so library and .h file if they are in another directory.
2.1.2.1.3.30 Creating a table and inserting values
Download this file by right clicking and saving to your computer insert_test.cpp.
Listing 4: Inserting data to a SQream DB table
// Insert with parameterized statement example

#include <iostream>

#include "sqream.h"

int main () {

   sqream::driver sqc;

   // Connection parameters: Hostname, Port, Use SSL, Username, Password,
   // Database name, Service name
   sqc.connect("127.0.0.1", 5000, false, "rhendricks", "Tr0ub4dor&3",
               "raviga", "sqream");

   run_direct_query(&sqc,
       "CREATE TABLE animals (id INT NOT NULL, name VARCHAR(10) NOT NULL)");

   // prepare the statement
   sqc.new_query("INSERT INTO animals VALUES (?, ?)");
   sqc.execute_query();

   // Data to insert
   int row0[] = {1,2,3};
   std::string row1[] = {"Dog","Cat","Possum"};
   int len = sizeof(row0)/sizeof(row0[0]);

   for (int i = 0; i < len; ++i) {
       sqc.set_int(0, row0[i]);
       sqc.set_varchar(1, row1[i]);
       sqc.next_query_row();
   }

   // This commits the insert
   sqc.finish_query();

   sqc.disconnect();

}
2.1.2.1.3.31 Compiling and running the application
To build this code, use
$ g++ -Wall -Ilibsqream-3.0 -Llibsqream-3.0 -lsqream insert_test.cpp -o insert_test
$ ./insert_test
2.1.2.1.3.32 JDBC
The SQream DB JDBC driver allows many Java applications and tools to connect to SQream DB. This tutorial shows how to write a Java application using the JDBC interface. The JDBC driver requires Java 1.8 or newer.
In this topic:
• Installing the JDBC driver
    – Prerequisites
    – Getting the JAR file
    – Extract the zip archive
    – Setting up the Class Path
• Connect to SQream DB with a JDBC application
    – Driver class
    – Connection string
        ∗ Connection parameters
        ∗ Connection string examples
    – Sample Java program
2.1.2.1.3.33 Installing the JDBC driver
2.1.2.1.3.34 Prerequisites
The SQream DB JDBC driver requires Java 1.8 or newer. We recommend either Oracle Java or OpenJDK.

Oracle Java
    Download and install Java 8 from Oracle for your platform:
    https://www.java.com/en/download/manual.jsp

OpenJDK
    For Linux and BSD, see https://openjdk.java.net/install/
    For Windows, SQream recommends Zulu 8:
    https://www.azul.com/downloads/zulu-community/?&version=java-8-lts&architecture=x86-64-bit&package=jdk
2.1.2.1.3.35 Getting the JAR file
The JDBC driver is provided as a zipped JAR file, available for download from the client drivers download page. This JAR file can integrate into your Java-based applications or projects.
2.1.2.1.3.36 Extract the zip archive
Extract the JAR file from the zip archive
$ unzip sqream-jdbc-4.3.0.zip
2.1.2.1.3.37 Setting up the Class Path
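To use the driver, the JAR named sqream-jdbc-<version>.jar must be included in the class path, for example by adding it to the CLASSPATH environment variable:

$ export CLASSPATH=/home/sqream/sqream-jdbc-4.3.0.jar:$CLASSPATH
$ java my_java_app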
An alternative method is to pass -classpath to the Java executable:
$ java -classpath .:/home/sqream/sqream-jdbc-4.3.0.jar my_java_app
2.1.2.1.3.38 Connect to SQream DB with a JDBC application
2.1.2.1.3.39 Driver class
Use com.sqream.jdbc.SQDriver as the driver class in the JDBC application.
2.1.2.1.3.40 Connection string
JDBC drivers rely on a connection string. Use the following syntax for SQream DB
jdbc:Sqream://
2.1.2.1.3.41 Connection parameters
Item    Optional    Default    Description
2.1.2.1.3.42 Connection string examples
For a SQream DB cluster with load balancer and no service queues, with SSL
jdbc:Sqream://sqream.mynetwork.co:3108/master;user=rhendricks;password=Tr0ub4dor&3;ssl=true;cluster=true
Minimal example for a local, standalone SQream DB
jdbc:Sqream://127.0.0.1:5000/master;user=rhendricks;password=Tr0ub4dor&3
For a SQream DB cluster with load balancer and a specific service queue named etl, to the database named raviga
jdbc:Sqream://sqream.mynetwork.co:3108/raviga;user=rhendricks;password=Tr0ub4dor&3;cluster=true;service=etl
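The examples above all follow the same pattern: host, port and database, followed by semicolon-separated key=value pairs. As an illustration only, the pattern can be assembled programmatically. jdbc_url is a hypothetical helper (it is not part of any SQream driver), and the option names it is called with (ssl, cluster, service) are taken from the examples above.

```python
def jdbc_url(host, port, database, user, password, **options):
    """Build a SQream DB JDBC connection string from its parts.

    Hypothetical helper; booleans are written as lowercase true/false,
    matching the examples in this section.
    """
    parts = [
        f"jdbc:Sqream://{host}:{port}/{database}",
        f"user={user}",
        f"password={password}",
    ]
    for key, value in options.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        parts.append(f"{key}={value}")
    return ";".join(parts)


minimal = jdbc_url("127.0.0.1", 5000, "master", "rhendricks", "Tr0ub4dor&3")
etl = jdbc_url("sqream.mynetwork.co", 3108, "raviga", "rhendricks",
               "Tr0ub4dor&3", cluster=True, service="etl")
print(minimal)
print(etl)
```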
2.1.2.1.3.43 Sample Java program
Download this file by right clicking and saving to your computer sample.java.
Listing 5: JDBC application sample
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.Statement;
import java.sql.ResultSet;

import java.io.IOException;
import java.security.KeyManagementException;
import java.security.NoSuchAlgorithmException;
import java.sql.SQLException;

public class SampleTest {

    // Replace with your connection string
    static final String url = "jdbc:Sqream://sqream.mynetwork.co:3108/master;user=rhendricks;password=Tr0ub4dor&3;ssl=true;cluster=true";

    // Allocate objects for result set and metadata
    Connection conn = null;
    Statement stmt = null;
    ResultSet rs = null;
    DatabaseMetaData dbmeta = null;

    int res = 0;

    public void testJDBC() throws SQLException, IOException {

        // Create a connection
        conn = DriverManager.getConnection(url,"rhendricks","Tr0ub4dor&3");

        // Create a table with a single integer column
        String sql = "CREATE TABLE test (x INT)";
        stmt = conn.createStatement(); // Prepare the statement
        stmt.execute(sql); // Execute the statement
        stmt.close(); // Close the statement handle

        // Insert some values into the newly created table
        sql = "INSERT INTO test VALUES (5),(6)";
        stmt = conn.createStatement();
        stmt.execute(sql);
        stmt.close();

        // Get values from the table
        sql = "SELECT * FROM test";
        stmt = conn.createStatement();
        rs = stmt.executeQuery(sql);
        // Fetch all results one-by-one
        while(rs.next()) {
            res = rs.getInt(1);
            System.out.println(res); // Print results to screen
        }
        rs.close(); // Close the result set
        stmt.close(); // Close the statement handle
    }

    public static void main(String[] args) throws SQLException, KeyManagementException, NoSuchAlgorithmException, IOException, ClassNotFoundException{

        // Load SQream DB JDBC driver
        Class.forName("com.sqream.jdbc.SQDriver");

        // Create test object and run
        SampleTest test = new SampleTest();
        test.testJDBC();
    }
}
2.1.2.1.3.44 ODBC
2.1.2.1.3.45 Install and Configure ODBC on Windows
The ODBC driver for Windows is provided as a self-contained installer. This tutorial shows you how to install and configure ODBC on Windows.
In this topic:
• Installing the ODBC Driver
    – Prerequisites
    – 1. Run the Windows installer
• 3. Configuring the ODBC Driver DSN
    – Connection Parameters
• Troubleshooting
    – Solving “Code 126” ODBC errors
2.1.2.1.3.46 Installing the ODBC Driver
2.1.2.1.3.47 Prerequisites
2.1.2.1.3.48 Visual Studio 2015 Redistributables
To install the ODBC driver you must first install Microsoft’s Visual C++ Redistributable for Visual Studio 2015. To install Visual C++ Redistributable for Visual Studio 2015, see the Install Instructions.
2.1.2.1.3.49 Administrator Privileges
The SQream DB ODBC driver requires administrator privileges on your computer to add the DSNs (data source names).
2.1.2.1.3.50 1. Run the Windows installer
Install the driver by following the on-screen instructions in the easy-to-follow installer.
Note: The installer will install the driver in C:\Program Files\SQream Technologies\ODBC Driver by default. This path can be changed during the installation.
2.1.2.1.3.51 2. Selecting Components
The installer includes additional components, like JDBC and Tableau customizations.
You can deselect items you don’t want to install, but the items named ODBC Driver DLL and ODBC Driver Registry Keys must remain selected for a complete installation of the ODBC driver. Once the installer finishes, you will be ready to configure the DSN for connection.
2.1.2.1.3.52 3. Configuring the ODBC Driver DSN
ODBC driver configurations are done via DSNs. Each DSN represents one SQream DB database.

1. Open up the Windows menu by clicking the Windows button on your keyboard (Win) or pressing the Windows button with your mouse.
2. Type ODBC and select ODBC Data Sources (64-bit). Click the item to open up the setup window.
3. The installer has created a sample User DSN named SQreamDB. You can modify this DSN, or create a new one (Add → SQream ODBC Driver → Next).
4. Enter your connection parameters. See the reference below for a description of the parameters.
5. When completed, save the DSN by selecting OK
Tip: Test the connection by clicking Test before saving. A successful test looks like this:
6. You can now use this DSN in ODBC applications like Tableau.
2.1.2.1.3.53 Connection Parameters
Item                Description
Data Source Name    An easily recognizable name that you'll use to reference this DSN. Once you set this, it cannot be changed.
Description         A description of this DSN for your convenience. You can leave this blank.
User                Username of a role to use for connection. For example, rhendricks
Password            Specifies the password of the selected role. For example, Tr0ub4dor&3
Database            Specifies the database name to connect to. For example, master
Service             Specifies the service queue to use. For example, etl. Leave blank for the default service sqream.
Server              Hostname of the SQream DB worker. For example, 127.0.0.1 or sqream.mynetwork.co
Port                TCP port of the SQream DB worker. For example, 5000 or 3108
User server picker  Connect via load balancer (use only if exists, and check port)
SSL                 Specifies SSL for this connection
Logging options     Use this screen to alter logging options when tracing the ODBC connection for possible connection issues.
2.1.2.1.3.54 Troubleshooting
2.1.2.1.3.55 Solving “Code 126” ODBC errors
After installing the ODBC driver, you may experience the following error:
The setup routines for the SQreamDriver64 ODBC driver could not be loaded due to system error code 126: The specified module could not be found.
(c:\Program Files\SQream Technologies\ODBC Driver\sqreamOdbc64.dll)
This is an issue with the Visual Studio Redistributable packages. Verify you’ve correctly installed them, as described in the Visual Studio 2015 Redistributables section above.
2.1.2.1.3.56 Install and configure ODBC on Linux
The ODBC driver for Linux is provided as a shared library. This tutorial shows how to install and configure ODBC on Linux.
In this topic:
• Prerequisites
    – unixODBC
• Install the ODBC driver with a script
• Install the ODBC driver manually
• Install the driver dependencies
• Testing the connection
• ODBC DSN Parameters
2.1.2.1.3.57 Prerequisites
2.1.2.1.3.58 unixODBC
The ODBC driver requires a driver manager to manage the DSNs. SQream DB’s driver is built for unixODBC. Verify unixODBC is installed by running:
$ odbcinst -j
unixODBC 2.3.4
DRIVERS............: /etc/odbcinst.ini
SYSTEM DATA SOURCES: /etc/odbc.ini
FILE DATA SOURCES..: /etc/ODBCDataSources
USER DATA SOURCES..: /home/rhendricks/.odbc.ini
SQLULEN Size.......: 8
SQLLEN Size........: 8
SQLSETPOSIROW Size.: 8
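When scripting an installation, the file locations in this output can be extracted rather than read by eye. The sketch below is illustrative: parse_odbcinst is a hypothetical helper, and the sample text is copied from the output above instead of being captured from a live odbcinst -j call, so that the example runs on any machine.

```python
def parse_odbcinst(output):
    """Map each label in `odbcinst -j` output to its value.

    Hypothetical helper; lines without a colon (such as the version
    banner) are skipped, and the dotted padding is stripped from labels.
    """
    info = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip(" .")] = value.strip()
    return info


# Sample text taken from the listing above. On a real system it could be
# captured with subprocess.run(["odbcinst", "-j"], capture_output=True, text=True)
sample = """unixODBC 2.3.4
DRIVERS............: /etc/odbcinst.ini
SYSTEM DATA SOURCES: /etc/odbc.ini
FILE DATA SOURCES..: /etc/ODBCDataSources
USER DATA SOURCES..: /home/rhendricks/.odbc.ini
"""

paths = parse_odbcinst(sample)
print(paths["DRIVERS"], paths["SYSTEM DATA SOURCES"])
```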
Take note of the location of .odbc.ini and .odbcinst.ini. In this case, /etc. If odbcinst is not installed, follow the instructions for your platform below:
Install unixODBC on:
• Install unixODBC on RHEL 7 / CentOS 7 • Install unixODBC on Ubuntu
2.1.2.1.3.59 Install unixODBC on RHEL 7 / CentOS 7
$ yum install -y unixODBC unixODBC-devel
2.1.2.1.3.60 Install unixODBC on Ubuntu
$ sudo apt-get install unixodbc unixodbc-dev
2.1.2.1.3.61 Install the ODBC driver with a script
Use this method if you have never used ODBC on your machine before. If you have existing DSNs, see the manual install process below.

1. Unpack the tarball. Copy the downloaded file to any directory, and untar it to a new directory:

$ mkdir -p sqream_odbc64
$ tar xf sqream_2019.2.1_odbc_3.0.0_x86_64_linux.tar.gz -C sqream_odbc64

2. Run the first-time installer. The installer will create an editable DSN.

$ cd sqream_odbc64
$ ./odbc_install.sh --install

3. Edit the DSN created by editing /etc/.odbc.ini. See the parameter explanation in the section ODBC DSN Parameters.
2.1.2.1.3.62 Install the ODBC driver manually
Use this method when you have existing ODBC DSNs on your machine.

1. Unpack the tarball. Copy the file you downloaded to the directory where you want to install it, and untar it:

$ mkdir -p sqream_odbc64
$ tar xf sqream_2019.2.1_odbc_3.0.0_x86_64_linux.tar.gz -C sqream_odbc64

Take note of the directory where the driver was unpacked. For example, /home/rhendricks/sqream_odbc64

2. Locate the .odbc.ini and .odbcinst.ini files, using odbcinst -j.

1. In .odbcinst.ini, add the following lines to register the driver (change the highlighted paths to match your specific driver):
[ODBC Drivers]
SqreamODBCDriver=Installed

[SqreamODBCDriver]
Description=Driver DSII SqreamODBC 64bit
Driver=/home/rhendricks/sqream_odbc64/sqream_odbc64.so
Setup=/home/rhendricks/sqream_odbc64/sqream_odbc64.so
APILevel=1
ConnectFunctions=YYY
DriverODBCVer=03.80
SQLLevel=1
IconvEncoding=UCS-4LE
2. In .odbc.ini, add the following lines to configure the DSN (change the highlighted parameters to match your installation):
[ODBC Data Sources]
MyTest=SqreamODBCDriver

[MyTest]
Description=64-bit Sqream ODBC
Driver=/home/rhendricks/sqream_odbc64/sqream_odbc64.so
Server="127.0.0.1"
Port="5000"
Database="raviga"
Service=""
User="rhendricks"
Password="Tr0ub4dor&3"
Cluster=false
Ssl=false
Parameters are in the form of parameter = value. For details about the parameters that can be set for each DSN, see the section ODBC DSN Parameters. 3. Create a file called .sqream_odbc.ini for managing the driver settings and logging. This file should be created alongside the other files, and add the following lines (change the highlighted parameters to match your installation):
# Note that this default DriverManagerEncoding of UTF-32 is for iODBC. unixODBC uses UTF-16 by default.
# If unixODBC was compiled with -DSQL_WCHART_CONVERT, then UTF-32 is the correct value.
# Execute 'odbc_config --cflags' to determine if you need UTF-32 or UTF-16 on unixODBC
[Driver]
DriverManagerEncoding=UTF-16
DriverLocale=en-US
ErrorMessagesPath=/home/rhendricks/sqream_odbc64/ErrorMessages
LogLevel=0
LogNamespace=
LogPath=/tmp/
ODBCInstLib=libodbcinst.so
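If several DSNs have to be maintained, the [MyTest] style entry for .odbc.ini shown earlier can be generated instead of edited by hand. The following is a sketch under assumptions: dsn_section is a hypothetical helper built on Python's standard configparser module, and it writes keys in the "parameter = value" form; the quoted values mirror the example entry. It only produces text and does not touch any file.

```python
import configparser
import io


def dsn_section(name, driver_path, **params):
    """Render one DSN section in .odbc.ini syntax as a string.

    Hypothetical helper. Interpolation is disabled so characters such as
    '%' in a password are written literally, and key case is preserved.
    """
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.optionxform = str  # keep 'Server', 'Port', ... as written
    cfg[name] = {"Driver": driver_path, **{k: str(v) for k, v in params.items()}}
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


text = dsn_section(
    "MyTest",
    "/home/rhendricks/sqream_odbc64/sqream_odbc64.so",
    Server='"127.0.0.1"',
    Port='"5000"',
    Database='"raviga"',
    Cluster="false",
    Ssl="false",
)
print(text)

# Read the text back to confirm it is valid INI syntax
check = configparser.ConfigParser(interpolation=None)
check.optionxform = str
check.read_string(text)
```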
2.1.2.1.3.63 Install the driver dependencies
Add the ODBC driver path to LD_LIBRARY_PATH:
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/rhendricks/sqream_odbc64/lib
You can also add this command to your ~/.bashrc file, so that the installation keeps working between reboots without re-entering the command manually.
2.1.2.1.3.64 Testing the connection
Test the driver using isql. If the DSN created is called MyTest as the example, run isql in this format:
$ isql MyTest
2.1.2.1.3.65 ODBC DSN Parameters
Item                   Default  Description
Data Source Name       None     An easily recognizable name that you'll use to reference this DSN.
Description            None     A description of this DSN for your convenience. This field can be left blank.
User                   None     Username of a role to use for connection. For example, User="rhendricks"
Password               None     Specifies the password of the selected role. For example, Password="Tr0ub4dor&3"
Database               None     Specifies the database name to connect to. For example, Database="master"
Service                sqream   Specifies the service queue to use. For example, Service="etl". Leave blank (Service="") for the default service sqream.
Server                 None     Hostname of the SQream DB worker. For example, Server="127.0.0.1" or Server="sqream.mynetwork.co"
Port                   None     TCP port of the SQream DB worker. For example, Port="5000" or Port="3108" for the load balancer
Cluster                false    Connect via load balancer (use only if exists, and check port). For example, Cluster=true
Ssl                    false    Specifies SSL for this connection. For example, Ssl=true
DriverManagerEncoding  UTF-16   Depending on how unixODBC is installed, you may need to change this to UTF-32.
ErrorMessagesPath      None     Location where the driver was installed. For example, ErrorMessagesPath=/home/rhendricks/sqream_odbc64/ErrorMessages
LogLevel               0        Set to 0-6 for logging. Use this setting when instructed to by SQream Support. For example, LogLevel=1
                                • 0 = Disable tracing
                                • 1 = Fatal only error tracing
                                • 2 = Error tracing
                                • 3 = Warning tracing
                                • 4 = Info tracing
                                • 5 = Debug tracing
                                • 6 = Detailed tracing
SQream has an ODBC driver to connect to SQream DB. This tutorial shows how to install the ODBC driver for Linux or Windows for use with applications like Tableau, PHP, and others that use ODBC.
Platform  Versions supported
Windows   • Windows 7 (64 bit)
          • Windows 8 (64 bit)
          • Windows 10 (64 bit)
          • Windows Server 2008 R2 (64 bit)
          • Windows Server 2012
          • Windows Server 2016
          • Windows Server 2019
Linux     • Red Hat Enterprise Linux (RHEL) 7
          • CentOS 7
          • Ubuntu 16.04
          • Ubuntu 18.04
Other distributions may also work, but are not officially supported by SQream.
In this topic:
• Downloading the ODBC driver • Install and configure the ODBC driver
2.1.2.1.3.66 Downloading the ODBC driver
The SQream DB ODBC driver is distributed by your SQream account manager. Before contacting your account manager, verify which platform the ODBC driver will be used on. Go to SQream Support or contact your SQream account manager to get the driver. The driver is provided as an executable installer for Windows, or a compressed tarball for Linux platforms. After downloading the driver, follow the relevant instructions to install and configure the driver for your platform:
2.1.2.1.3.67 Install and configure the ODBC driver
Continue based on your platform:

• Install and Configure ODBC on Windows
• Install and configure ODBC on Linux
2.1.2.1.3.68 Node.JS
The SQream DB Node.JS driver allows JavaScript applications and tools to connect to SQream DB. This tutorial shows you how to write a Node application using the Node.JS interface. The driver requires Node 10 or newer.
In this topic:
• Installing the Node.JS driver
    – Prerequisites
    – Install with NPM
    – Install from an offline package
• Connect to SQream DB with a Node.JS application
    – Create a simple test
    – Run the test
• API reference
    – Connection parameters
    – Events
        ∗ Example
    – Input placeholders
• Examples
    – Setting configuration flags
    – Lazyloading
    – Reusing a connection
    – Using placeholders in queries
• Troubleshooting and recommended configuration
    – Preventing heap out of memory errors
    – BIGINT support
2.1.2.1.3.69 Installing the Node.JS driver
2.1.2.1.3.70 Prerequisites
• Node.JS 10 or newer. Follow the installation instructions at nodejs.org.
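The version requirement can be checked from a script before the driver is installed. The following is a minimal sketch, not part of the SQream driver; `meetsMinimumNode` is a hypothetical helper name introduced here for illustration.

```javascript
// Sketch: check that the running Node.js meets the driver's stated minimum (10).
// meetsMinimumNode is a hypothetical helper, not part of the SQream driver.
function meetsMinimumNode(version, minimumMajor = 10) {
  // process.versions.node looks like "14.17.0"; the major version is the first field.
  const major = Number.parseInt(version.split('.')[0], 10);
  return Number.isInteger(major) && major >= minimumMajor;
}

if (!meetsMinimumNode(process.versions.node)) {
  console.error(`Node.js ${process.versions.node} is older than the required version 10`);
  process.exitCode = 1;
}
```

Running `node --version` on the command line gives the same information interactively.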
2.1.2.1.3.71 Install with NPM
Installing with npm is the easiest and most reliable method. If you need to install the driver in an offline system, see the offline method below.
$ npm install @sqream/sqreamdb
2.1.2.1.3.72 Install from an offline package
The Node driver is provided as a tarball for download from the SQream Drivers page.
After downloading the tarball, use npm to install the offline package.
$ sudo npm install sqreamdb-4.0.0.tgz
2.1.2.1.3.73 Connect to SQream DB with a Node.JS application
2.1.2.1.3.74 Create a simple test
Replace the connection parameters with real parameters for a SQream DB installation.
Listing 6: sqreamdb-test.js

    const Connection = require('@sqream/sqreamdb');

    const config = {
      host: 'localhost',
      port: 3109,
      username: 'rhendricks',
      password: 'super_secret_password',
      connectDatabase: 'raviga',
      cluster: true,
      is_ssl: true,
      service: 'sqream'
    };

    const query1 = 'SELECT 1 AS test, 2*6 AS "dozen"';

    const sqream = new Connection(config);
    sqream.execute(query1).then((data) => {
      console.log(data);
    }, (err) => {
      console.error(err);
    });
2.1.2.1.3.75 Run the test
A successful run should look like this:
$ node sqreamdb-test.js
[ { test: 1, dozen: 12 } ]
2.1.2.1.3.76 API reference
2.1.2.1.3.77 Connection parameters
Item | Optional | Default | Description
host | | None | Hostname for SQream DB worker. For example, 127.0.0.1, sqream.mynetwork.co
port | | None | Port for SQream DB end-point. For example, 3108 for the load balancer, 5000 for a worker.
username | | None | Username of a role to use for connection. For example, rhendricks
password | | None | Specifies the password of the selected role. For example, Tr0ub4dor&3
connectDatabase | | None | Database name to connect to. For example, master
service | ✓ | sqream | Specifies the service queue to use. For example, etl
is_ssl | ✓ | false | Specifies SSL for this connection. For example, true
cluster | ✓ | false | Connect via load balancer (use only if exists, and check port). For example, true
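The table above can be turned into a small pre-flight check that fails early when a required parameter is missing and fills in the documented defaults. This is only a sketch: `normalizeConfig`, `REQUIRED`, and `DEFAULTS` are hypothetical names, and the driver itself may validate its configuration differently.

```javascript
// Sketch: validate a connection config against the parameter table above.
// normalizeConfig is a hypothetical helper; the driver may validate differently.
const REQUIRED = ['host', 'port', 'username', 'password', 'connectDatabase'];
const DEFAULTS = { service: 'sqream', is_ssl: false, cluster: false };

function normalizeConfig(config) {
  const missing = REQUIRED.filter((key) => config[key] === undefined || config[key] === null);
  if (missing.length > 0) {
    throw new Error(`Missing required connection parameters: ${missing.join(', ')}`);
  }
  // Optional parameters fall back to the defaults listed in the table.
  return { ...DEFAULTS, ...config };
}

const config = normalizeConfig({
  host: '127.0.0.1',
  port: 5000,
  username: 'rhendricks',
  password: 'Tr0ub4dor&3',
  connectDatabase: 'master',
});
console.log(config.service, config.is_ssl, config.cluster);
```

The resulting object has the same shape as the `config` object passed to `new Connection(config)` in the test listing.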
2.1.2.1.3.78 Events
The connector handles event returns with an event emitter.
getConnectionId
    The getConnectionId event returns the executing connection ID.
getStatementId
    The getStatementId event returns the executing statement ID.
getTypes
    The getTypes event returns the result column types.
2.1.2.1.3.79 Example

    const myConnection = new Connection(config);
    myConnection.runQuery(query1, function (err, data){
      myConnection.events.on('getConnectionId', function(data){
        console.log('getConnectionId', data);
      });

      myConnection.events.on('getStatementId', function(data){
        console.log('getStatementId', data);
      });

      myConnection.events.on('getTypes', function(data){
        console.log('getTypes', data);
      });
    });
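To see how the listener pattern behaves without a database, the sketch below uses Node's built-in `EventEmitter` as a stand-in for `myConnection.events`. The emitted payloads are invented for illustration, and it is an assumption that the driver's emitter follows the standard `EventEmitter` behavior shown here.

```javascript
const { EventEmitter } = require('events');

// Stand-in for myConnection.events; the real emitter is created by the driver.
const events = new EventEmitter();
const seen = {};

// Register one listener per event name documented above.
for (const name of ['getConnectionId', 'getStatementId', 'getTypes']) {
  events.on(name, (data) => {
    seen[name] = data;
  });
}

// Simulated emissions; the payload values are invented for illustration.
events.emit('getConnectionId', 42);
events.emit('getStatementId', 7);
events.emit('getTypes', ['INT', 'INT']);

console.log(seen);
```

With a standard `EventEmitter`, a listener only receives events emitted after it was registered, so listeners are normally attached before the work that triggers them.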
2.1.2.1.3.80 Input placeholders
The Node.JS driver can replace parameters in a statement.
Input placeholders allow values like user input to be passed as parameters into queries, with proper escaping. The valid placeholder formats are provided in the table below.
Placeholder | Type
%i | Identifier (e.g. table name, column name)
%s | A text string
%d | A number value
%b | A boolean value
See the input placeholders example below.
2.1.2.1.3.81 Examples
2.1.2.1.3.82 Setting configuration flags
SQream DB configuration flags can be set per statement, as a parameter to runQuery. For example:

    const setFlag = 'SET showfullexceptioninfo = true;';

    const query_string = 'SELECT 1';

    const myConnection = new Connection(config);
    myConnection.runQuery(query_string, function (err, data){
      console.log(err, data);
    }, setFlag);
2.1.2.1.3.83 Lazyloading
To process rows without keeping them all in memory, you can lazyload the rows with an async iterator:

    const Connection = require('@sqream/sqreamdb');

    const config = {
      host: 'localhost',
      port: 3109,
      username: 'rhendricks',
      password: 'super_secret_password',
      connectDatabase: 'raviga',
      cluster: true,
      is_ssl: true,
      service: 'sqream'
    };

    const sqream = new Connection(config);
    const query = "SELECT * FROM public.a_very_large_table";

    (async () => {
      const cursor = await sqream.executeCursor(query);
      let count = 0;
      for await (let rows of cursor.fetchIterator(100)) {
        // fetch rows in chunks of 100
        count += rows.length;
      }
      await cursor.close();
      return count;
    })().then((total) => {
      console.log('Total rows', total);
    }, (err) => {
      console.error(err);
    });
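The chunked loop above can be tried without a SQream DB instance. In the sketch below, `fakeFetchIterator` is a hypothetical async generator that only simulates what `cursor.fetchIterator(100)` is described as doing (yielding arrays of rows); it is not the driver's implementation.

```javascript
// Stand-in for cursor.fetchIterator(chunkSize): an async generator that yields
// arrays of rows. fakeFetchIterator is hypothetical and only simulates a cursor.
async function* fakeFetchIterator(totalRows, chunkSize) {
  for (let start = 0; start < totalRows; start += chunkSize) {
    const size = Math.min(chunkSize, totalRows - start);
    yield Array.from({ length: size }, (_, i) => ({ id: start + i }));
  }
}

// Consume chunks one at a time; only the current chunk is referenced by the loop.
async function countRows(iterator) {
  let count = 0;
  for await (const rows of iterator) {
    count += rows.length;
  }
  return count;
}

countRows(fakeFetchIterator(250, 100)).then((total) => {
  console.log('Total rows', total);
});
```

Because each chunk goes out of scope before the next one is requested, memory use stays proportional to the chunk size rather than to the size of the result set.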
2.1.2.1.3.84 Reusing a connection
It is possible to execute multiple queries with the same connection (although only one query can be executed at a time).

    const Connection = require('@sqream/sqreamdb');

    const config = {
      host: 'localhost',
      port: 3109,
      username: 'rhendricks',
      password: 'super_secret_password',
      connectDatabase: 'raviga',
      cluster: true,
      is_ssl: true,
      service: 'sqream'
    };

    const sqream = new Connection(config);

    (async () => {
      const conn = await sqream.connect();
      try {
        const res1 = await conn.execute("SELECT 1");
        const res2 = await conn.execute("SELECT 2");
        const res3 = await conn.execute("SELECT 3");
        conn.disconnect();
        return {res1, res2, res3};
      } catch (err) {
        conn.disconnect();
        throw err;
      }
    })().then((res) => {
      console.log('Results', res);
    }, (err) => {
      console.error(err);
    });
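The connect, execute, and disconnect sequence above can be wrapped so that the connection is always released. The following sketch assumes only the method names used in the example (`connect()`, `execute()`, `disconnect()`); `withConnection` and `fakeClient` are hypothetical and exist so the sketch runs without a database.

```javascript
// Sketch: run several queries on one connection and always disconnect afterwards.
// withConnection is a hypothetical helper, not part of the SQream driver.
async function withConnection(client, work) {
  const conn = await client.connect();
  try {
    return await work(conn);
  } finally {
    // Runs on success and on error, so the connection is never left open.
    conn.disconnect();
  }
}

// Fake client so the sketch runs without a database.
const log = [];
const fakeClient = {
  connect: async () => ({
    execute: async (sql) => { log.push(sql); return [{ sql }]; },
    disconnect: () => log.push('disconnect'),
  }),
};

const done = withConnection(fakeClient, async (conn) => {
  // Queries are awaited one after another: only one runs at a time.
  const res1 = await conn.execute('SELECT 1');
  const res2 = await conn.execute('SELECT 2');
  return { res1, res2 };
});
done.then(() => console.log(log));
```

Using `finally` replaces the duplicated `conn.disconnect()` calls in the success and error branches with a single call.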
2.1.2.1.3.85 Using placeholders in queries
Input placeholders allow values like user input to be passed as parameters into queries, with proper escaping.

    const Connection = require('@sqream/sqreamdb');

    const config = {
      host: 'localhost',
      port: 3109,
      username: 'rhendricks',
      password: 'super_secret_password',
      connectDatabase: 'raviga',
      cluster: true,
      is_ssl: true,
      service: 'sqream'
    };

    const sqream = new Connection(config);

    const sql = "SELECT %i FROM public.%i WHERE name = %s AND num > %d AND active = %b";

    sqream.execute(sql, "col1", "table2", "john's", 50, true);

The query that will run is:

    SELECT col1 FROM public.table2 WHERE name = 'john''s' AND num > 50 AND active = true
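To make the substitution concrete, the sketch below reproduces the documented result with a small standalone formatter. It is an illustration only and is not the driver's implementation: `formatQuery` is a hypothetical helper, and the identifier rule it applies (plain names only) is an assumption.

```javascript
// Illustrative formatter for the %i, %s, %d and %b placeholders.
// This is NOT the driver's implementation; formatQuery is a hypothetical helper
// that reproduces the substitution shown in the example above.
function formatQuery(sql, ...params) {
  let index = 0;
  return sql.replace(/%([isdb])/g, (match, kind) => {
    if (index >= params.length) {
      throw new Error('Not enough parameters for placeholders');
    }
    const value = params[index++];
    switch (kind) {
      case 'i': // identifier: allow only plain names (assumed rule)
        if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(value)) {
          throw new Error(`Invalid identifier: ${value}`);
        }
        return value;
      case 's': // string: wrap in quotes and double embedded single quotes
        return `'${String(value).replace(/'/g, "''")}'`;
      case 'd': // number: reject anything that is not a finite number
        if (typeof value !== 'number' || !Number.isFinite(value)) {
          throw new Error(`Invalid number: ${value}`);
        }
        return String(value);
      default: // 'b': boolean
        return value ? 'true' : 'false';
    }
  });
}

const sql = 'SELECT %i FROM public.%i WHERE name = %s AND num > %d AND active = %b';
console.log(formatQuery(sql, 'col1', 'table2', "john's", 50, true));
```

In application code, pass the values to `sqream.execute` as shown in the example and let the driver perform the escaping.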
2.1.2.1.3.86 Troubleshooting and recommended configuration
2.1.2.1.3.87 Preventing heap out of memory errors
Some workloads may cause Node.JS to fail with the error:
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
To prevent this error, modify the heap size configuration by setting the --max-old-space-size run flag. For example, set the space size to 2GB:
$ node --max-old-space-size=2048 my-application.js
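To confirm that the flag took effect, the heap limit can be read from inside the application with Node's built-in `v8` module. This is a minimal sketch; `heapLimitMb` is a hypothetical helper name. The reported limit covers the whole V8 heap, so expect a value somewhat above the number passed to `--max-old-space-size` rather than an exact match.

```javascript
const v8 = require('v8');

// heap_size_limit is reported in bytes; convert to MB for comparison with
// the value passed to --max-old-space-size.
function heapLimitMb() {
  return Math.round(v8.getHeapStatistics().heap_size_limit / (1024 * 1024));
}

console.log(`Heap size limit: ${heapLimitMb()} MB`);
```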
2.1.2.1.3.88 BIGINT support
The Node.JS connector supports fetching BIGINT values from SQream DB. However, some applications may encounter an error when trying to serialize those values. The error that appears is:

    TypeError: Do not know how to serialize a BigInt

This is because the JSON specification does not support BIGINT values, even when they are supported by JavaScript engines. To resolve this issue, objects with BIGINT values should be converted to strings before serializing, and converted back after deserializing. For example:

    const rows = [{test: 1n}];
    const json = JSON.stringify(rows, (key, value) =>
      typeof value === 'bigint'
        ? value.toString()
        : value // return everything else unchanged
    );
    console.log(json); // [{"test":"1"}]
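The conversion back after deserializing can be sketched with a `JSON.parse` reviver. This assumes the application knows which columns hold BIGINT values; the `bigintKeys` parameter and the `stringifyRows` and `parseRows` helpers are hypothetical names introduced for this sketch.

```javascript
// Sketch of a BIGINT round trip through JSON. The bigintKeys list is an
// assumption: the application must know which columns hold BIGINT values.
function stringifyRows(rows) {
  return JSON.stringify(rows, (key, value) =>
    typeof value === 'bigint' ? value.toString() : value);
}

function parseRows(json, bigintKeys) {
  return JSON.parse(json, (key, value) =>
    bigintKeys.includes(key) && typeof value === 'string' ? BigInt(value) : value);
}

const rows = [{ test: 1n, name: 'a' }];
const json = stringifyRows(rows);
const restored = parseRows(json, ['test']);

console.log(json);
console.log(typeof restored[0].test);
```

Listing the keys explicitly avoids converting ordinary text columns that happen to contain digits.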
Need help?
If you couldn’t find what you’re looking for, we’re always happy to help. Visit SQream’s support portal for additional support.
Looking for older drivers?
If you’re looking for an older version of SQream DB drivers, versions 1.10 through 2019.2.1 are available at https://sqream.com/product/client-drivers/ .
2.1.3 Third Party Tools
These topics explain how to install and connect a variety of third party tools. Browse the articles below, in the sidebar, or use the search to find the information you need.
2.1.3.1 Overview
SQream DB is designed to work with most common database tools and interfaces, allowing you direct access through a variety of drivers, connectors, tools, visualizers, and utilities. The tools listed have been tested and approved for use with SQream DB. Most third-party tools that work through JDBC, ODBC, and Python should work. If you are looking for a tool that is not listed, SQream and our partners can help. Go to SQream Support or contact your SQream account manager for more information.
2.1.3.1.1 Connect to SQream Using SQL Workbench
You can use SQL Workbench to interact with a SQream DB cluster. SQL Workbench/J is a free SQL query tool designed to run on any JRE-enabled environment. This tutorial shows how to connect SQL Workbench to SQream DB.
In this topic:
• Installing SQL Workbench with the SQream DB installer (Windows only) • Installing SQL Workbench manually (Linux, MacOS) – Install Java Runtime – Get the SQream DB JDBC driver – Install SQL Workbench – Setting up the SQream DB JDBC driver profile • Create a new connection profile for your cluster • Suggested optional configuration
2.1.3.1.1.1 Installing SQL Workbench with the SQream DB installer (Windows only)
SQream DB’s driver installer for Windows can install the Java prerequisites and SQL Workbench for you.
1. Get the JDBC driver installer available for download from the SQream Drivers page. The Windows installer takes care of the Java prerequisites and subsequent configuration.
2. Install the driver by following the on-screen instructions in the installer. By default, the installer does not install SQL Workbench. Make sure to select the item!

Note: The installer will install SQL Workbench in C:\Program Files\SQream Technologies\SQLWorkbench by default. You can change this path during the installation.

3. Once finished, SQL Workbench is installed and contains the necessary configuration for connecting to SQream DB clusters.
4. Start SQL Workbench from the Windows start menu. Be sure to select SQL Workbench (64) if you’re on 64-bit Windows.
You are now ready to create a profile for your cluster. Continue to Creating a new connection profile.
2.1.3.1.1.2 Installing SQL Workbench manually (Linux, MacOS)
2.1.3.1.1.3 Install Java Runtime
Both SQL Workbench and the SQream DB JDBC driver require Java 1.8 or newer. You can install either Oracle Java or OpenJDK.
Oracle Java
    Download and install Java 8 from Oracle for your platform - https://www.java.com/en/download/manual.jsp
OpenJDK
    For Linux and BSD, see https://openjdk.java.net/install/
    For Windows, SQream recommends Zulu 8 https://www.azul.com/downloads/zulu-community/?&version=java-8-lts&architecture=x86-64-bit&package=jdk
2.1.3.1.1.4 Get the SQream DB JDBC driver
SQream DB’s JDBC driver is provided as a zipped JAR file, available for download from the SQream Drivers page. Download and extract the JAR file from the zip archive.
2.1.3.1.1.5 Install SQL Workbench
1. Download the latest stable release from https://www.sql-workbench.eu/downloads.html. The Generic package for all systems is recommended.
2. Extract the downloaded ZIP archive into a directory of your choice.
3. Start SQL Workbench. If you are using 64-bit Windows, run SQLWorkbench64.exe instead of SQLWorkbench.exe.
2.1.3.1.1.6 Setting up the SQream DB JDBC driver profile
1. Define a connection profile - File → Connect window (Alt+C)
2. Open the drivers management window - Manage Drivers
3. Create the SQream DB driver profile
   1. Click on the Add new driver button (“New” icon).
   2. Name the driver as you see fit. We recommend calling it SQream DB.
2.1.3.1.1.7 Create a new connection profile for your cluster
1. Create a new connection by clicking the New icon (top left).
2. Give your connection a descriptive name.
3. Select the SQream driver that was created in the previous screen.
4. Type in your connection string. To find out more about your connection string (URL), see the Connection string documentation.
5. Test the connection details.
6. Click OK to save the connection profile and connect to SQream DB.
2.1.3.1.1.8 Suggested optional configuration
If you installed SQL Workbench manually, you can set a customization to help SQL Workbench show information correctly in the DB Explorer panel.
1. Locate your workbench.settings file. On Windows, typically: C:\Users\
2. Add the following line to the workbench.settings file:
workbench.db.sqreamdb.schema.retrieve.change.catalog=true
3. Save the file and restart SQL Workbench
2.1.3.1.2 Connecting to SQream Using Tableau
2.1.3.1.2.1 Overview
SQream’s Tableau connector plugin, based on standard JDBC, enables storing and fast querying of large volumes of data. The Connecting to SQream Using Tableau page is a Quick Start Guide that describes how to install Tableau and the JDBC and ODBC drivers, and how to connect to SQream using those drivers for data analysis. It also describes best practices and how to troubleshoot issues that may occur while installing Tableau. SQream supports both Tableau Desktop and Tableau Server on Windows, MacOS, and Linux distributions. For more information on SQream’s integration with Tableau, see Tableau’s Extension Gallery. The Connecting to SQream Using Tableau page describes the following:
• Installing the JDBC Driver and Tableau Connector Plugin – Installing the JDBC Driver Using the Windows Installer – Installing the JDBC Driver Manually • Installing the ODBC Driver for Tableau Versions 2019.3 and Earlier – Automatically Reconfiguring the ODBC Driver After Initial Installation – Manually Reconfiguring the ODBC Driver After Initial Installation – Configuring the ODBC Connection – Connecting Tableau to SQream • Connecting to SQream • Setting Up SQream Tables as Data Sources • Tableau Best Practices and Troubleshooting – Inserting Only Required Data – Using Tableau’s Table Query Syntax – Creating a Separate Service for Tableau – Troubleshooting Workbook Performance Before Deploying to the Tableau Server
– Troubleshooting Error Codes
2.1.3.1.2.2 Installing the JDBC Driver and Tableau Connector Plugin
This section describes how to install the JDBC driver using the fully-integrated Tableau connector plugin (Tableau Connector, or .taco file). SQream has been tested with Tableau versions 9.2 and newer.
To connect to SQream using Tableau:
1. Install the Tableau Desktop application. For more information about installing the Tableau Desktop application, see the Tableau products page and click Download Free Trial. Note that Tableau offers a 14-day trial version.
2. Do one of the following:
   • For Windows - See Installing the JDBC Driver Using the Windows Installer.
   • For MacOS or Linux - See Installing the JDBC Driver Manually.
Note: For Tableau 2019.4 versions and later, SQream recommends installing the JDBC driver instead of the previously recommended ODBC driver.
2.1.3.1.2.3 Installing the JDBC Driver Using the Windows Installer
If you are using Windows, after installing the Tableau Desktop application you can install the JDBC driver using the Windows installer. The Windows installer is an installation wizard that guides you through the JDBC driver installation steps. When the driver is installed, you can connect to SQream.
To install the JDBC driver using the Windows installer:
1. Close Tableau Desktop.
2. Download the most current version of the SQream JDBC driver.
3. Do the following:
   1. Start the installer.
   2. Verify that the Tableau Desktop connector item is selected.
   3. Follow the installation steps.
You can now restart Tableau Desktop or Server to begin using the SQream driver by connecting to SQream.
2.1.3.1.2.4 Installing the JDBC Driver Manually
If you are using MacOS, Linux, or the Tableau server, after installing the Tableau Desktop application you can install the JDBC driver manually. When the driver is installed, you can connect to SQream. To install the JDBC driver manually:
1. Download the JDBC installer and SQream Tableau connector (.taco) file from the client drivers page.
2. Install the JDBC driver by unzipping the JDBC driver into a Tableau driver directory. Based on the installation method that you used, your Tableau driver directory is located in one of the following places:
   • Tableau Desktop on Windows: C:\Program Files\Tableau\Drivers
   • Tableau Desktop on MacOS: ~/Library/Tableau/Drivers
   • Tableau on Linux: /opt/tableau/tableau_driver/jdbc
Note: If the driver includes only a single .jar file, copy it to C:\Program Files\Tableau\Drivers. If the driver includes multiple files, create a subfolder A in C:\Program Files\Tableau\Drivers and copy all files to folder A.
Note the following when installing the JDBC driver:
   • You must have read permissions on the .jar file.
   • Tableau requires a JDBC 4.0 or later driver.
   • Tableau requires a Type 4 JDBC driver.
   • The latest 64-bit version of Java 8 must be installed.
3. Install the SQreamDB.taco file by moving the SQreamDB.taco file into the Tableau connectors directory. Based on the installation method that you used, your Tableau connectors directory is located in one of the following places:
   • Tableau Desktop on Windows: C:\Users\
4. Optional - If you are using the Tableau Server, do the following:
   1. Create a directory for Tableau connectors and give it a descriptive name, such as C:\tableau_connectors. This directory needs to exist on all Tableau servers.
2. Copy the SQreamDB.taco file into the new directory.
3. Set the native_api.connect_plugins_path option using tsm, as shown in the following example:
$ tsm configuration set -k native_api.connect_plugins_path -v C:/tableau_connectors
If a configuration error is displayed, add --force-keys to the end of the command as shown in the following example:
$ tsm configuration set -k native_api.connect_plugins_path -v C:/tableau_connectors --force-keys
4. To apply the pending configuration changes, run the following command:
$ tsm pending-changes apply
Warning: This restarts the server.
You can now restart Tableau Desktop or Server to begin using the SQream driver by connecting to SQream as described in the section below.
2.1.3.1.2.5 Installing the ODBC Driver for Tableau Versions 2019.3 and Earlier
This section describes the installation method for Tableau version 2019.3 or earlier and describes the following:
• Automatically Reconfiguring the ODBC Driver After Initial Installation • Manually Reconfiguring the ODBC Driver After Initial Installation • Configuring the ODBC Connection • Connecting Tableau to SQream
Note: SQream recommends installing the JDBC driver to provide improved connectivity.
2.1.3.1.2.6 Automatically Reconfiguring the ODBC Driver After Initial Installation
If you’ve already installed the SQream ODBC driver and installed Tableau, SQream recommends reinstalling the ODBC driver with the .TDC Tableau Settings for SQream DB configuration option selected.
SQream recommends this configuration because Tableau creates temporary tables and runs several discovery queries that may impact performance. The ODBC driver installer avoids this by automatically reconfiguring Tableau. For more information about reinstalling the ODBC driver installer, see Install and Configure ODBC on Windows. If you want to manually reconfigure the ODBC driver, see Manually Reconfiguring the ODBC Driver After Initial Installation below.
2.1.3.1.2.7 Manually Reconfiguring the ODBC Driver After Initial Installation
The Tableau Datasource Customization (TDC) file lets Tableau make full use of SQream DB’s features and capabilities.
To manually reconfigure the ODBC driver after initial installation:
1. Do one of the following:
   1. Download the odbc-sqream.tdc file to your machine and open it in a text editor.
2. Copy the text below into a text editor:
Listing 7: SQream ODBC TDC File
2. Check which version of Tableau you are using.
3. In the text of the file shown above, in the highlighted line, replace the version number with the major version of Tableau that you are using. For example, if you are using Tableau version 2019.2.1, replace it with 2019.2.
4. Do one of the following: • If you are using Tableau Desktop - save the TDC file to C:\Users\
• If you are using the Tableau Server - save the TDC file to C:\ProgramData\Tableau\Tableau Server\data\tabsvc\vizqlserver\Datasources.
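The major version used in step 3 is the first two fields of the full Tableau version string. A minimal sketch of that rule follows; `tableauMajorVersion` is a hypothetical helper name and is not part of any SQream or Tableau tooling.

```javascript
// Sketch: derive the major Tableau version ("2019.2") from a full version
// string ("2019.2.1"). tableauMajorVersion is a hypothetical helper.
function tableauMajorVersion(fullVersion) {
  const parts = String(fullVersion).trim().split('.');
  if (parts.length < 2 || parts.slice(0, 2).some((p) => !/^\d+$/.test(p))) {
    throw new Error(`Unrecognized Tableau version: ${fullVersion}`);
  }
  return `${parts[0]}.${parts[1]}`;
}

console.log(tableauMajorVersion('2019.2.1'));
```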
2.1.3.1.2.8 Configuring the ODBC Connection
The ODBC connection uses a DSN when connecting to ODBC data sources, and each DSN represents one SQream database.
To configure the ODBC connection:
1. Create an ODBC DSN.
2. Open the Windows menu by pressing the Windows button (Win) or clicking the Windows menu button.
3. Type ODBC and select ODBC Data Sources (64-bit). During installation, the installer created a sample user DSN named SQreamDB.
4. Optional - Do one or both of the following: • Modify the DSN name.
• Create a new DSN name by clicking Add and selecting SQream ODBC Driver.
5. Click Finish.
6. Enter your connection parameters. The following table describes the connection parameters:
Item | Description | Example
Data Source Name | The Data Source Name. SQream recommends using a descriptive and easily recognizable name for referencing your DSN. Once set, the Data Source Name cannot be changed. |
Description | The description of your DSN. This field is optional. |
User | The username of a role to use for establishing the connection. | rhendricks
Password | The password of the selected role. | Tr0ub4dor
Database | The database name to connect to. For example, master | master
Service | The service queue to use. For example, etl. For the default service sqream, leave blank. |
Server | The hostname of the SQream worker. | 127.0.0.1 or sqream.mynetwork.co
Port | The TCP port of the SQream worker. | 5000 or 3108
User Server Picker | Uses the load balancer when establishing a connection. Use only if exists, and check port. |
SSL | Uses SSL when establishing a connection. |
Logging Options | Lets you modify your logging options when tracking the ODBC connection for connection issues. |
Tip: Test the connection by clicking Test before saving your DSN.
7. Save the DSN by clicking OK.
2.1.3.1.2.9 Connecting Tableau to SQream
To connect Tableau to SQream:
1. Start Tableau Desktop.
2. In the Connect menu, in the To a server sub-menu, click More Servers and select Other Databases (ODBC). The Other Databases (ODBC) window is displayed.
3. In the Other Databases (ODBC) window, select the DSN that you created in Configuring the ODBC Connection. Tableau may display the Sqream ODBC Driver Connection Dialog window and prompt you to provide your username and password.
4. Provide your username and password and click OK.
2.1.3.1.2.10 Connecting to SQream
After installing the JDBC driver you can connect to SQream.
To connect to SQream:
1. Start Tableau Desktop.
2. In the Connect menu, in the To a Server sub-menu, click More…. More connection options are displayed.
3. Select SQream DB by SQream Technologies. The New Connection dialog box is displayed.
4. In the New Connection dialog box, fill in the fields and click Sign In. The following table describes the fields:
Item | Description | Example
Server | Defines the server of the SQream worker. | 127.0.0.1 or sqream.mynetwork.co
Port | Defines the TCP port of the SQream worker. | 3108 when using a load balancer, or 5100 when connecting directly to a worker with SSL.
Database | Defines the database to establish a connection with. | master
Cluster | Enables (true) or disables (false) the load balancer. After enabling or disabling the load balancer, verify the connection. |
Username | Specifies the username of a role to use when connecting. | rhendricks
Password | Specifies the password of the selected role. | Tr0ub4dor&3
Require SSL (recommended) | Sets SSL as a requirement for establishing this connection. |
The connection is established and the data source page is displayed.
Tip: Tableau automatically assigns your connection a default name based on the DSN and table. SQream recommends giving the connection a more descriptive name.
2.1.3.1.2.11 Setting Up SQream Tables as Data Sources
After connecting to SQream you must set up the SQream tables as data sources. To set up SQream tables as data sources:
1. From the Table menu, select the desired database and schema. SQream’s default schema is public.
2. Drag the desired tables into the main area (labeled Drag tables here). This area is also used for specifying joins and data source filters.
3. Open a new sheet to analyze data.
Tip: For more information about configuring data sources, joining, and filtering, see Tableau’s Set Up Data Sources tutorials.
2.1.3.1.2.12 Tableau Best Practices and Troubleshooting
This section describes the following best practices and troubleshooting procedures when connecting to SQream using Tableau:
• Inserting Only Required Data • Using Tableau’s Table Query Syntax • Creating a Separate Service for Tableau • Troubleshooting Workbook Performance Before Deploying to the Tableau Server • Troubleshooting Error Codes
2.1.3.1.2.13 Inserting Only Required Data
When using Tableau, SQream recommends using only data that you need, as described below: • Insert only the data sources you need into Tableau, excluding tables that don’t require analysis.
• To increase query performance, add filters before analyzing. Every modification you make while analyzing data queries the SQream database, sometimes several times. Adding filters to the data source before exploring limits the amount of data analyzed and increases query performance.
2.1.3.1.2.14 Using Tableau’s Table Query Syntax
Dragging your desired tables into the main area in Tableau builds queries based on its own syntax. This helps ensure increased performance, while using views or custom SQL may degrade performance. In addition, SQream recommends using CREATE VIEW to create pre-optimized views, which your data sources point to.
2.1.3.1.2.15 Creating a Separate Service for Tableau
SQream recommends creating a separate service for Tableau with the DWLM. This reduces the impact that Tableau has on other applications and processes, such as ETL. In addition, this works in conjunction with the load balancer to ensure good performance.
2.1.3.1.2.16 Troubleshooting Workbook Performance Before Deploying to the Tableau Server
Tableau has a built-in performance recorder that shows how time is being spent. If you’re seeing slow performance, this could be the result of a misconfiguration such as setting concurrency too low. Use the Tableau Performance Recorder for viewing the performance of queries run by Tableau. You can use this information to identify queries that can be optimized by using views.
2.1.3.1.2.17 Troubleshooting Error Codes
Tableau may be unable to locate the SQream JDBC driver. The following message is displayed when Tableau cannot locate the driver:
Error Code: 37CE01A3, No suitable driver installed or the URL is incorrect
To troubleshoot error codes:
If Tableau cannot locate the SQream JDBC driver, do the following:
1. Verify that the JDBC driver is located in the correct directory:
   • Tableau Desktop on Windows: C:\Program Files\Tableau\Drivers
   • Tableau Desktop on MacOS: ~/Library/Tableau/Drivers
   • Tableau on Linux: /opt/tableau/tableau_driver/jdbc
2. Find the file path for the JDBC driver and add it to the Java classpath:
   • For Linux - export CLASSPATH=
• For Windows - add an environment variable for the classpath:
If you experience issues after restarting Tableau, see the SQream support portal.
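The directory check in the troubleshooting steps above can be scripted. The following sketch maps the platform to the Tableau driver directories listed in this guide and looks for a .jar file there; `tableauDriverDir` and `hasJdbcJar` are hypothetical helpers, and the check does not verify that the file is actually the SQream driver.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Driver directories as listed in this guide, keyed by process.platform.
// tableauDriverDir and hasJdbcJar are hypothetical helpers.
function tableauDriverDir(platform, homeDir) {
  switch (platform) {
    case 'win32':
      return 'C:\\Program Files\\Tableau\\Drivers';
    case 'darwin':
      return path.posix.join(homeDir, 'Library', 'Tableau', 'Drivers');
    case 'linux':
      return '/opt/tableau/tableau_driver/jdbc';
    default:
      return null; // platform not covered by this guide
  }
}

// True if the directory exists and contains at least one .jar file.
function hasJdbcJar(dir) {
  if (!dir || !fs.existsSync(dir)) return false;
  return fs.readdirSync(dir).some((name) => name.toLowerCase().endsWith('.jar'));
}

const dir = tableauDriverDir(process.platform, os.homedir());
console.log(dir, hasJdbcJar(dir) ? 'contains a .jar file' : 'has no .jar file');
```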
2.1.3.1.3 Connect to SQream Using Pentaho Data Integration
2.1.3.1.3.1 Overview
This document is a Quick Start Guide that describes how to install Pentaho, create a transformation, and define your output. The Connecting to SQream Using Pentaho page describes the following: • Installing Pentaho • Installing and setting up the JDBC driver • Creating a transformation • Defining your output • Importing your data
2.1.3.1.3.2 Installing Pentaho
To install Pentaho Data Integration (PDI), see the Pentaho Community Edition (CE) Installation Guide.
The Pentaho Community Edition (CE) Installation Guide describes how to do the following:
• Download the PDI software.
• Install the JRE (Java Runtime Environment) and JDK (Java Development Kit).
• Set up the JRE and JDK environment variables for PDI.
2.1.3.1.3.3 Installing and Setting Up the JDBC Driver
After installing Pentaho you must install and set up the JDBC driver. This section explains how to set up the JDBC driver using Pentaho. These instructions use Spoon, the graphical transformation and job designer associated with the PDI suite. You can install the driver by copying and pasting the SQream JDBC .jar file into your
2.1.3.1.3.4 Creating a Transformation
After installing Pentaho you can create a transformation. To create a transformation: 1. Use the CLI to open the PDI client for your operating system (Windows):
$ spoon.bat
2. Open the spoon.bat file from its folder location.
3. In the View tab, right-click Transformations and click New.
A new transformation tab is created.
4. In the Design tab, click Input to show its file contents.
5. Drag and drop the CSV file input item to the new transformation tab that you created.
6. Double-click CSV file input. The CSV file input panel is displayed.
7. In the Step name field, type a name.
8. To the right of the Filename field, click Browse.
9. Select the file that you want to read from and click OK.
10. In the CSV file input window, click Get Fields.
11. In the Sample data window, enter the number of lines you want to sample and click OK. The default setting is 100.
The tool reads the file and suggests the field name and type.
12. In the CSV file input window, click Preview.
13. In the Preview size window, enter the number of rows you want to preview and click OK. The default setting is 1000.
14. Verify that the preview data is correct and click Close.
15. Click OK in the CSV file input window.
2.1.3.1.3.5 Defining Your Output
After creating your transformation you must define your output.
To define your output:
1. In the Design tab, click Output.
2. Drag and drop the Table output item to the Transformation window.
3. Double-click Table output to open the Table output dialog box.
4. From the Table output dialog box, type a Step name and click New to create a new connection. Your steps are the building blocks of a transformation, such as file input or a table output.
The Database Connection window is displayed with the General tab selected by default.
5. Enter or select the following information in the Database Connection window and click Test.
The following table shows and describes the information that you need to fill out in the Database Connection window:
No. | Element Name | Description
1 | Connection name | Enter a name that uniquely describes your connection, such as sampledata.
2 | Connection type | Select Generic database.
3 | Access | Select Native (JDBC).
4 | Custom connection URL | Insert jdbc:Sqream://
The following message is displayed:
6. Click OK in the window above, in the Database Connection window, and in the Table output window.
2.1.3.1.3.6 Importing Data
After defining your output, you can begin importing your data. For more information about backing up users, permissions, or schedules, see Backup and Restore Pentaho Repositories.
To import data:
1. Double-click the Table output connection that you just created.
2. To the right of the Target schema field, click Browse and select a schema name.
3. Click OK. The selected schema name is displayed in the Target schema field.
4. Create a new hop connection between the CSV file input and Table output steps:
   1. On the CSV file input step item, click the new hop connection icon.
   2. Drag an arrow from the CSV file input step item to the Table output step item.
   3. Release the mouse button. The following options are displayed.
   4. Select Main output of step.
5. Double-click Table output to open the Table output dialog box.
6. In the Target table field, define a target table name.
7. Click SQL to open the Simple SQL editor.
8. In the Simple SQL editor, click Execute.
The system processes and displays the results of the SQL statements.
9. Close all open dialog boxes.
10. Click the play button to execute the transformation.
The Run Options dialog box is displayed.
11. Click Run. The Execution Results are displayed.
2.1.3.1.4 Connect to SQream Using MicroStrategy
2.1.3.1.4.1 Overview
This document is a Quick Start Guide that describes how to install MicroStrategy and connect a data source to the MicroStrategy dashboard for analysis. The Connecting to SQream Using MicroStrategy page describes the following:
• What is MicroStrategy?
• Connecting a Data Source
• Supported SQream Drivers
2.1.3.1.4.2 What is MicroStrategy?
MicroStrategy is business intelligence software that offers a wide variety of data analytics capabilities. SQream uses the MicroStrategy connector for reading and loading data into SQream.
MicroStrategy provides the following:
• Data discovery
• Advanced analytics
• Data visualization
• Embedded BI
• Banded reports and statements
For more information about MicroStrategy, see MicroStrategy.
2.1.3.1.4.3 Connecting a Data Source
1. Activate the MicroStrategy Desktop app. The app displays the Dossiers panel to the right.
2. Download the most current version of the SQream JDBC driver.
3. Click Dossiers and New Dossier. The Untitled Dossier panel is displayed.
4. Click New Data.
5. From the Data Sources panel, select Databases to access data from tables. The Select Import Options panel is displayed.
6. Select one of the following:
   • Build a Query
   • Type a Query
   • Select Tables
7. Click Next.
8. In the Data Source panel, do the following:
   1. From the Database dropdown menu, select Generic. The Host Name, Port Number, and Database Name fields are removed from the panel.
   2. In the Version dropdown menu, verify that Generic DBMS is selected.
   3. Click Show Connection String.
   4. Select the Edit connection string checkbox.
   5. From the Driver dropdown menu, select a driver for one of the following connectors:
      • JDBC - The SQream driver is not integrated with MicroStrategy and does not appear in the dropdown menu. However, to proceed, you must select an item, and in the next step you must specify the path to the SQream driver that you installed on your machine.
      • ODBC - SQreamDB ODBC
   6. In the Connection String text box, type the relevant connection string and path to the JDBC jar file using the following syntax:
$ jdbc:Sqream://
The following example shows the correct syntax for the JDBC connector:
jdbc;MSTR_JDBC_JAR_FOLDER=C:\path\to\jdbc\folder;DRIVER=
The following example shows the correct syntax for the ODBC connector:
odbc:Driver={SqreamODBCDriver};DSN={SQreamDB ODBC};Server=
For more information about the available connection parameters and other examples, see Connection Parameters.
   7. In the User and Password fields, fill out your user name and password.
   8. In the Data Source Name field, type SQreamDB.
   9. Click Save. The SQreamDB that you picked in the Data Source panel is displayed.
9. In the Namespace menu, select a namespace. The table files are displayed.
10. Drag and drop the tables into the panel on the right in your required order.
11. Recommended - Click Prepare Data to customize your data for analysis.
12. Click Finish.
13. From the Data Access Mode dialog box, select one of the following:
    • Connect Live
    • Import as an In-memory Dataset
Your populated dashboard is displayed and is ready for data discovery and analytics.
2.1.3.1.4.4 Supported SQream Drivers
The following list shows the supported SQream drivers and versions:
• JDBC - Version 4.3.3 and higher.
• ODBC - Version 4.0.0.
2.1.3.1.5 Connect to SQream Using R
You can use R to interact with a SQream DB cluster. This tutorial is a guide that will show you how to connect R to SQream DB.
In this topic:
• JDBC
  – A full example
• ODBC
  – A full example
2.1.3.1.5.1 JDBC
1. Get the SQream DB JDBC driver.
2. In R, install RJDBC

> install.packages("RJDBC")
Installing package into 'C:/Users/r/...'
(as 'lib' is unspecified)

package 'RJDBC' successfully unpacked and MD5 sums checked
3. Import the RJDBC library
> library(RJDBC)
4. Set the classpath and initialize the JDBC driver which was previously installed. For example, on Windows:
> cp = c("C:\\Program Files\\SQream Technologies\\JDBC Driver\\2020.1-3.2.0\\sqream-jdbc-3.2.jar")
> .jinit(classpath=cp)
> drv <- JDBC("com.sqream.jdbc.SQDriver","C:\\Program Files\\SQream Technologies\\JDBC Driver\\2020.1-3.2.0\\sqream-jdbc-3.2.jar")
5. Open a connection with a JDBC connection string and run your first statement
> con <- dbConnect(drv,"jdbc:Sqream://127.0.0.1:3108/master;user=rhendricks;password=Tr0ub4dor&3;cluster=true")
> dbGetQuery(con,"select top 5 * from t")
  xint xtinyint xsmallint xbigint
1    1       82      5067       1
2    2       14      1756       2
3    3       91     22356       3
4    4       84     17232       4
5    5       13     14315       5
6. Close the connection
> dbDisconnect(con)
2.1.3.1.5.2 A full example
> library(RJDBC)
> cp = c("C:\\Program Files\\SQream Technologies\\JDBC Driver\\2020.1-3.2.0\\sqream-jdbc-3.2.jar")
> .jinit(classpath=cp)
> drv <- JDBC("com.sqream.jdbc.SQDriver","C:\\Program Files\\SQream Technologies\\JDBC Driver\\2020.1-3.2.0\\sqream-jdbc-3.2.jar")
> con <- dbConnect(drv,"jdbc:Sqream://127.0.0.1:3108/master;user=rhendricks;password=Tr0ub4dor&3;cluster=true")
> dbGetQuery(con,"select top 5 * from t")
  xint xtinyint xsmallint xbigint
1    1       82      5067       1
2    2       14      1756       2
3    3       91     22356       3
4    4       84     17232       4
5    5       13     14315       5
> dbDisconnect(con)
2.1.3.1.5.3 ODBC
1. Install the SQream DB ODBC driver for your operating system, and create a DSN.
2. In R, install RODBC

> install.packages("RODBC")
Installing package into 'C:/Users/r/...'
(as 'lib' is unspecified)

package 'RODBC' successfully unpacked and MD5 sums checked
3. Import the RODBC library
> library(RODBC)
4. Open a connection handle to an existing DSN (my_cool_dsn in this example)
> ch <- odbcConnect("my_cool_dsn",believeNRows=F)
5. Run your first statement
> sqlQuery(ch,"select top 5 * from t")
  xint xtinyint xsmallint xbigint
1    1       82      5067       1
2    2       14      1756       2
3    3       91     22356       3
4    4       84     17232       4
5    5       13     14315       5
6. Close the connection
> close(ch)
2.1.3.1.5.4 A full example
> library(RODBC)
> ch <- odbcConnect("my_cool_dsn",believeNRows=F)
> sqlQuery(ch,"select top 5 * from t")
  xint xtinyint xsmallint xbigint
1    1       82      5067       1
2    2       14      1756       2
3    3       91     22356       3
4    4       84     17232       4
5    5       13     14315       5
> close(ch)
2.1.3.1.6 Connect to SQream Using PHP
You can use PHP to interact with a SQream DB cluster. This tutorial is a guide that will show you how to connect a PHP application to SQream DB.
In this topic:
• Prerequisites
• Testing the connection
2.1.3.1.6.1 Prerequisites
1. Install the SQream DB ODBC driver for Linux and create a DSN.
2. Install the uODBC extension for your PHP installation. To enable uODBC, either configure PHP with ./configure --with-pdo-odbc=unixODBC,/usr/local when compiling it, or install php-odbc and php-pdo along with php (version 7.1 minimum for best results) using your distribution's package manager.
2.1.3.1.6.2 Testing the connection
1. Create a test connection file. Be sure to use the correct parameters for your SQream DB installation. Download this PHP example connection file.
            echo "Result is " . odbc_result($rs, $i);
        }
    }
    echo "\n";
    odbc_close($conn); // Finally, close the connection
?>
Tip: An example of a valid DSN line is:
$dsn = "odbc:Driver={SqreamODBCDriver};Server=192.168.0.5;Port=5000;Database=master;User=rhendricks;Password=super_secret;Service=sqream";
For more information about supported DSN parameters, see ODBC DSN Parameters.
2. Run the PHP file either directly with PHP (php test.php) or through a browser.
2.1.3.1.7 Connect to SQream Using SAS Viya
You can use SAS Viya to connect to a SQream DB cluster. This tutorial shows you how to connect SAS Viya to SQream DB.
In this topic:
• Installing SAS Viya
• Installing the JDBC driver
• Configuring the JDBC driver in SAS Viya
• Browsing data and workbooks
• SAS Viya best practices and troubleshooting
  – Cut out what you don’t need
  – Create a separate service for SAS Viya
  – Troubleshooting java.lang.ClassNotFoundException: com.sqream.jdbc.SQDriver exceptions
2.1.3.1.7.1 Installing SAS Viya
SQream DB has been tested with SAS Viya v.03.05 and newer. If you do not already have SAS Viya installed, refer to SAS’s website https://www.sas.com/en_us/software/viya.html.
2.1.3.1.7.2 Installing the JDBC driver
1. Download the JDBC driver from the client drivers page.
2. Unzip the JDBC driver into a location on the SAS Viya server. SQream recommends creating the directory /opt/sqream on the SAS Viya server.
2.1.3.1.7.3 Configuring the JDBC driver in SAS Viya
1. Sign in to SAS Studio
2. Create a new SAS program
3. Create a sample program to explore data
Listing 8: Sample SAS program
 1 options sastrace='d,d,d,d'
 2 sastraceloc=saslog
 3 nostsuffix
 4 msglevel=i
 5 sql_ip_trace=(note,source)
 6 DEBUG=DBMS_SELECT;
 7
 8 options validvarname=any;
 9
10 libname sqlib jdbc driver="com.sqream.jdbc.SQDriver"
11 classpath="/opt/sqream/sqream-jdbc-4.0.0.jar"
12 URL="jdbc:Sqream://sqream-cluster.piedpiper.com:3108/raviga;cluster=true"
13 user="rhendricks"
14 password="Tr0ub4dor3"
15 schema="public"
16 PRESERVE_TAB_NAMES=YES
17 PRESERVE_COL_NAMES=YES;
18
19 proc sql;
20 title 'Customers table';
21 select *
22 from sqlib.customers;
23 quit;
24
25 data sqlib.customers;
26 set sqlib.customers;
27 run;
This sample program does several things:
• Line 10: Start a JDBC session named sqlib, associated with the SQream DB driver. This statement extends the SAS global LIBNAME statement so that you can assign a libref to your data source. This feature lets you reference a table in SQream DB directly in a DATA step or SAS procedure.
• Line 11: Instruct SAS Viya where to find the SQream DB JDBC driver. This step is required because SAS Viya will not honor the SAS_ACCESS_CLASSPATH environment variable for this connection.
• Lines 12-15: Associate the libref to be able to use it in the program as sqlib.tablename. The libref will be sqlib and it will use the JDBC engine and connect to the sqream-cluster.piedpiper.com SQream DB cluster. The database name is raviga and the schema is public.
See our JDBC guide for connection string documentation.
2.1.3.1.7.4 Browsing data and workbooks
1. From the panel on the left, navigate to Libraries to open the navigation tree.
2. Our previously created library named SQLIB will populate, and show the table customers. Double-clicking the table name will expand it and show the columns.
3. Find the workbook you created in the DATA step. It should appear under WORK. The workbook will be named sqlib.customers. Double-click it to expand the table tree.
Fig. 1: The Run button runs the current SAS program
Fig. 2: The results tab contains query results
2.1.3.1.7.5 SAS Viya best practices and troubleshooting
2.1.3.1.7.6 Cut out what you don’t need
• Bring only the data sources you need into SAS Viya. As a best practice, do not bring in tables that you don’t intend to explore.
• Add filters before the DATA step to reduce in-memory size. Add filters to the data source before exploring, so that the queries sent to SQream DB run faster.
2.1.3.1.7.7 Create a separate service for SAS Viya
SQream recommends that SAS Viya get a separate service with the DWLM. This will reduce the impact of SAS Viya on other applications and processes, such as ETL. This works in conjunction with the load balancer to ensure good performance.
2.1.3.1.7.8 Troubleshooting java.lang.ClassNotFoundException: com.sqream.jdbc.SQDriver exceptions
In some cases, SAS Viya may have trouble finding the SQream DB JDBC driver. This message explains that the driver can’t be found. To solve this issue, try two things:
1. Verify that the JDBC driver was placed in a directory that SAS Viya can access.
2. Verify the classpath in your SAS program. Make sure that the classpath is correct, and that the file it references can be accessed by SAS Viya.
If you’re still experiencing issues after restarting SAS Viya, we’re always happy to help. Visit SQream’s support portal for additional support.
2.1.4 Operations
The guides in this section include information about installing Monit, best practices, monitoring, logging, troubleshooting, and maintaining a SQream DB cluster.
2.1.4.1 Optimization and Best Practices
This topic explains some best practices of working with SQream DB. See also our Monitoring query performance guide for more information.
In this topic:
• Table design
  – Use date and datetime types for columns
  – Reduce varchar length to a minimum
  – Don’t flatten or denormalize data
  – Convert foreign tables to native tables
  – Use information about the column data to your advantage
    ∗ Set NULL or NOT NULL when relevant
    ∗ Keep VARCHAR lengths to a minimum
• Sorting
• Query best practices
  – Reduce data sets before joining tables
  – Prefer the ANSI JOIN
  – Use the high selectivity hint
  – Cast smaller types to avoid overflow in aggregates
  – Prefer COUNT(*) and COUNT on non-nullable columns
  – Return only required columns
  – Use saved queries to reduce recurring compilation time
  – Pre-filter to reduce JOIN complexity
• Data loading considerations
  – Allow and use natural sorting on data
• Further reading and monitoring query performance
2.1.4.1.1 Table design
This section describes best practices and guidelines for designing tables.
2.1.4.1.1.1 Use date and datetime types for columns
When creating tables with dates or timestamps, use the purpose-built DATE and DATETIME types rather than integer types or VARCHAR. Doing so reduces the storage footprint, improves data integrity, and in many cases brings large performance improvements. SQream DB stores dates and datetimes very efficiently and can strongly optimize queries that use these types.
2.1.4.1.1.2 Reduce varchar length to a minimum
With the VARCHAR type, the length has a direct effect on query performance. If the size of your column is predictable, defining an appropriate column length (no longer than the maximum actual value) gives you the following benefits:
• Data loading issues can be identified more quickly
• SQream DB can reserve less memory for decompression operations
• Third-party tools that expect a data size are less likely to over-allocate memory
2.1.4.1.1.3 Don’t flatten or denormalize data
SQream DB executes JOIN operations very effectively. It is almost always better to JOIN tables at query time rather than flatten or denormalize your tables. This also reduces storage size and row lengths. We strongly recommend using INT or BIGINT as join keys, rather than a text/string type.
2.1.4.1.1.4 Convert foreign tables to native tables
SQream DB’s native storage is heavily optimized for analytic workloads. It is always faster for querying than other formats, even columnar ones such as Parquet. It also enables the use of additional metadata to help speed up queries, in some cases by many orders of magnitude. You can improve the performance of all operations by converting foreign tables into native tables by using the CREATE TABLE AS syntax. For example,
CREATE TABLE native_table AS SELECT * FROM external_table
The one situation where this is less useful is when the data will be queried only once.
2.1.4.1.1.5 Use information about the column data to your advantage
Knowing the data types and their ranges can help design a better table.
2.1.4.1.1.6 Set NULL or NOT NULL when relevant
For example, if a value can’t be missing (or NULL), specify a NOT NULL constraint on the columns. Not only does specifying NOT NULL save on data storage, it lets the query compiler know that a column cannot have a NULL value, which can improve query performance.
2.1.4.1.1.7 Keep VARCHAR lengths to a minimum
While it won’t make a big difference in storage, large strings allocate a lot of memory at query time. If a column’s string length never exceeds 50 characters, specify VARCHAR(50) rather than an arbitrarily large number.
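One way to choose a VARCHAR length is to measure the longest value in a representative sample before creating the table. The following is a minimal sketch under stated assumptions: the sample data, the column names, and the helper max_lengths are hypothetical and are not part of SQream DB, and lengths are counted in characters.

```python
import csv
import io

# Hypothetical sample data; in practice this would be a slice of the real source file.
SAMPLE_CSV = """store_name,city
Hooli Market,Palo Alto
Pied Piper Outlet,San Francisco
Raviga,Menlo Park
"""

def max_lengths(csv_text):
    """Return the longest value length (in characters) seen for each column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    longest = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            longest[name] = max(longest[name], len(value))
    return longest

lengths = max_lengths(SAMPLE_CSV)
for column, length in lengths.items():
    # Suggest a declaration no longer than the longest observed value.
    print(f"{column} VARCHAR({length})")
```

A sample only gives a lower bound, so it may be reasonable to leave some headroom if longer values are expected later.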
2.1.4.1.2 Sorting
Data sorting is an important factor in minimizing storage size and improving query performance.
• Minimizing storage saves on physical resources and increases performance by reducing overall disk I/O. Prioritize the sorting of low-cardinality columns. This reduces the number of chunks and extents that SQream DB reads during query execution.
• Where possible, sort columns with the lowest cardinality first. Avoid sorting VARCHAR and TEXT/NVARCHAR columns with lengths exceeding 50 characters.
• For longer-running queries that run on a regular basis, performance can be improved by sorting data based on the WHERE and GROUP BY parameters. Data can be sorted during insert by using Foreign Tables or by using CREATE TABLE AS.
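The advice to sort the lowest-cardinality columns first can be sketched as a small helper that counts distinct values per column and orders the columns accordingly. This is only an illustration: the rows, the column names, and the functions cardinalities and sort_order are hypothetical, and real cardinalities would be measured on the actual data.

```python
# Hypothetical rows: (country, store_id, customer_id)
rows = [
    ("US", 1, 1001),
    ("US", 2, 1002),
    ("US", 1, 1003),
    ("DE", 3, 1004),
    ("DE", 3, 1005),
    ("US", 2, 1006),
]
columns = ["country", "store_id", "customer_id"]

def cardinalities(rows, columns):
    """Count the distinct values in each column."""
    return {name: len({row[i] for row in rows}) for i, name in enumerate(columns)}

def sort_order(rows, columns):
    """Return column names ordered from lowest to highest cardinality."""
    counts = cardinalities(rows, columns)
    return sorted(columns, key=lambda name: counts[name])

order = sort_order(rows, columns)
print("ORDER BY " + ", ".join(order))
```

The resulting column order could then be used in the ORDER BY clause of a CREATE TABLE AS statement when loading the data.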
2.1.4.1.3 Query best practices
This section describes best practices for writing SQL queries.
2.1.4.1.3.1 Reduce data sets before joining tables
Reducing the input to a JOIN clause can increase performance. Some queries benefit from retrieving a reduced dataset as a subquery prior to a join. For example,

SELECT store_name, SUM(amount)
FROM store_dim AS dim
INNER JOIN store_fact AS fact ON dim.store_id=fact.store_id
WHERE p_date BETWEEN '2018-07-01' AND '2018-07-31'
GROUP BY 1;

Can be rewritten as

SELECT store_name, sum_amount
FROM store_dim AS dim
INNER JOIN (SELECT SUM(amount) AS sum_amount, store_id
            FROM store_fact
            WHERE p_date BETWEEN '2018-07-01' AND '2018-07-31'
            GROUP BY 2) AS fact
ON dim.store_id=fact.store_id;
2.1.4.1.3.2 Prefer the ANSI JOIN
SQream DB prefers the ANSI JOIN syntax. In some cases, the ANSI JOIN performs better than the non-ANSI variety.
For example, this ANSI JOIN example will perform better:
Listing 9: ANSI JOIN will perform better

SELECT p.name, s.name, c.name
FROM "Products" AS p
JOIN "Sales" AS s ON p.product_id = s.sale_id
JOIN "Customers" AS c ON s.c_id = c.id AND c.id = 20301125;
This non-ANSI JOIN is supported, but not recommended:
Listing 10: Non-ANSI JOIN may not perform well

SELECT p.name, s.name, c.name
FROM "Products" AS p, "Sales" AS s, "Customers" AS c
WHERE p.product_id = s.sale_id
  AND s.c_id = c.id
  AND c.id = 20301125;
2.1.4.1.3.3 Use the high selectivity hint
Selectivity is the ratio of cardinality to the number of records in a chunk. We define selectivity as

\[ \text{selectivity} = \frac{\text{distinct values}}{\text{total number of records in a chunk}} \]

SQream DB has a hint function called HIGH_SELECTIVITY, which you can wrap a condition in. The hint signals to SQream DB that the result of the condition will be very sparse, and that it should attempt to rechunk the results into fewer, fuller chunks. Use the high selectivity hint when you expect a predicate to filter out most values, for example when the data is dispersed over lots of chunks (meaning that the data is not well-clustered). For example,

SELECT store_name, SUM(amount)
FROM store_dim
WHERE HIGH_SELECTIVITY(p_date = '2018-07-01')
GROUP BY 1;
This hint tells the query compiler that the WHERE condition is expected to filter out more than 60% of values. It never affects the query results, but when used correctly can improve query performance.
Tip: The HIGH_SELECTIVITY() hint function can only be used as part of the WHERE clause. It can’t be used in equijoin conditions, cases, or in the select list.
Read more about identifying the scenarios for the high selectivity hint in our Monitoring query performance guide.
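The selectivity ratio defined above, and the share of values that a predicate filters out, can be computed on a sample as in the following sketch. The chunks and the helpers selectivity and fraction_filtered_out are hypothetical illustrations and do not reflect how SQream DB computes these figures internally.

```python
def selectivity(chunk_values):
    """Distinct values divided by the total number of records in a chunk."""
    if not chunk_values:
        raise ValueError("chunk is empty")
    return len(set(chunk_values)) / len(chunk_values)

def fraction_filtered_out(values, keep):
    """Share of values that a predicate removes (1.0 means everything is removed)."""
    kept = sum(1 for value in values if keep(value))
    return 1 - kept / len(values)

# Hypothetical chunks of a date column.
clustered_chunk = ["2018-07-01"] * 8
dispersed_chunk = ["2018-07-01", "2018-07-02", "2018-07-03", "2018-07-04",
                   "2018-07-05", "2018-07-06", "2018-07-07", "2018-07-08"]

print(selectivity(clustered_chunk))   # 1 distinct value over 8 records
print(selectivity(dispersed_chunk))   # 8 distinct values over 8 records

# Share of the dispersed chunk removed by p_date = '2018-07-01'.
removed = fraction_filtered_out(dispersed_chunk, lambda d: d == "2018-07-01")
print(removed)
```

In this sample the predicate removes seven of the eight values in the dispersed chunk, which is the kind of situation the hint is described as targeting.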
2.1.4.1.3.4 Cast smaller types to avoid overflow in aggregates
When using an INT or smaller type, the SUM and COUNT operations return a value of the same type. To avoid overflow on large results, cast the column up to a larger type.
For example:

SELECT store_name, SUM(amount :: BIGINT)
FROM store_dim
GROUP BY 1;
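A quick range check shows why the cast matters. The sketch below assumes that INT is a signed 32-bit integer and BIGINT a signed 64-bit integer; the row count and amount are hypothetical, and the sketch says nothing about how SQream DB reports an overflow.

```python
INT_MAX = 2**31 - 1      # largest value of a signed 32-bit INT (assumed width)
BIGINT_MAX = 2**63 - 1   # largest value of a signed 64-bit BIGINT (assumed width)

# Hypothetical data: three million rows with an amount of 1,000 each.
row_count = 3_000_000
amount = 1_000
total = row_count * amount

overflows_int = total > INT_MAX       # the sum does not fit in an INT
fits_bigint = total <= BIGINT_MAX     # the same sum fits in a BIGINT
print(total, overflows_int, fits_bigint)
```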
2.1.4.1.3.5 Prefer COUNT(*) and COUNT on non-nullable columns
SQream DB optimizes COUNT(*) queries very strongly. This also applies to COUNT(column_name) on non-nullable columns. Using COUNT(column_name) on a nullable column is still fast, but much slower than the previous variations.
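Besides performance, the three COUNT forms can return different results, because COUNT(column_name) skips NULL values. The following sketch uses Python's built-in sqlite3 module as a stand-in engine to show the standard SQL semantics; the table and column names are hypothetical, and it does not measure SQream DB performance.

```python
import sqlite3

# In-memory SQLite database used only to illustrate COUNT semantics.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_id INTEGER NOT NULL, coupon TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, "SUMMER"), (2, None), (3, None), (4, "WINTER")],
)

# COUNT(*) and COUNT on a NOT NULL column count every row;
# COUNT on a nullable column counts only non-NULL values.
count_all, count_id, count_coupon = conn.execute(
    "SELECT COUNT(*), COUNT(sale_id), COUNT(coupon) FROM sales"
).fetchone()
print(count_all, count_id, count_coupon)
conn.close()
```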
2.1.4.1.3.6 Return only required columns
Returning only the columns you need to client programs can improve overall query performance. This also reduces the overall result set, which can improve performance in third-party tools. SQream is able to optimize out unneeded columns very strongly due to its columnar storage.
2.1.4.1.3.7 Use saved queries to reduce recurring compilation time
Saved queries are compiled when they are created. The query plan is saved in SQream DB’s metadata for later re-use. Because the query plan is saved, they can be used to reduce compilation overhead, especially with very complex queries, such as queries with lots of values in an IN predicate. When executed, the saved query plan is recalled and executed on the up-to-date data stored on disk. See how to use saved queries in the saved queries guide.
2.1.4.1.3.8 Pre-filter to reduce JOIN complexity
Filter and reduce table sizes before joining them. For example, the following query:

SELECT store_name, SUM(amount)
FROM dimention dim
JOIN fact ON dim.store_id = fact.store_id
WHERE p_date BETWEEN '2019-07-01' AND '2019-07-31'
GROUP BY store_name;

Can be rewritten as:

SELECT store_name, sum_amount
FROM dimention AS dim
INNER JOIN (SELECT SUM(amount) AS sum_amount, store_id
            FROM fact
            WHERE p_date BETWEEN '2019-07-01' AND '2019-07-31'
            GROUP BY store_id) AS fact
ON dim.store_id = fact.store_id;
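To check that a pre-filtered rewrite returns the same totals as the original query, both forms can be run against a small test data set. The sketch below uses Python's built-in sqlite3 module as a stand-in engine; the sample rows are hypothetical, and an ORDER BY is added to both queries only to make the comparison deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dimention (store_id INTEGER, store_name TEXT);
CREATE TABLE fact (store_id INTEGER, amount INTEGER, p_date TEXT);
INSERT INTO dimention VALUES (1, 'North'), (2, 'South');
INSERT INTO fact VALUES
    (1, 10, '2019-07-03'),
    (1, 15, '2019-07-20'),
    (2, 40, '2019-07-11'),
    (2, 99, '2019-08-01');
""")

# Original form: join first, then filter and aggregate.
join_then_filter = """
SELECT store_name, SUM(amount)
FROM dimention dim JOIN fact ON dim.store_id = fact.store_id
WHERE p_date BETWEEN '2019-07-01' AND '2019-07-31'
GROUP BY store_name
ORDER BY store_name
"""

# Rewritten form: filter and aggregate the fact table, then join.
filter_then_join = """
SELECT store_name, sum_amount
FROM dimention AS dim
INNER JOIN (SELECT SUM(amount) AS sum_amount, store_id
            FROM fact
            WHERE p_date BETWEEN '2019-07-01' AND '2019-07-31'
            GROUP BY store_id) AS fact
ON dim.store_id = fact.store_id
ORDER BY store_name
"""

original_rows = conn.execute(join_then_filter).fetchall()
rewritten_rows = conn.execute(filter_then_join).fetchall()
print(original_rows)
print(rewritten_rows)
conn.close()
```

Note that the two forms match only when each store_name belongs to a single store_id; if two stores share a name, the original query merges them into one row while the rewrite returns one row per store.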
2.1.4.1.4 Data loading considerations
2.1.4.1.4.1 Allow and use natural sorting on data
Very often, tabular data is already naturally ordered along a dimension such as a timestamp or area. This natural order is a major factor for query performance later on, as data that is naturally sorted can be more easily compressed and analyzed with SQream DB’s metadata collection. For example, when data is sorted by timestamp, filtering on this timestamp is more effective than filtering on an unordered column. Natural ordering can also be used for effective DELETE operations.
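The benefit of natural ordering can be sketched with a toy model in which each chunk records the minimum and maximum of its values, and a range filter only reads chunks whose range overlaps the filter. This per-chunk min/max model is an assumption made for illustration; the chunk size and the helpers build_chunks and chunks_to_scan are hypothetical and do not describe SQream DB's actual metadata format.

```python
import random

CHUNK_SIZE = 4  # hypothetical, tiny for illustration

def build_chunks(values, chunk_size=CHUNK_SIZE):
    """Split values into chunks and record each chunk's min and max."""
    chunks = []
    for start in range(0, len(values), chunk_size):
        part = values[start:start + chunk_size]
        chunks.append({"min": min(part), "max": max(part), "rows": part})
    return chunks

def chunks_to_scan(chunks, low, high):
    """Count chunks whose [min, max] range overlaps the filter range."""
    return sum(1 for c in chunks if c["max"] >= low and c["min"] <= high)

timestamps = list(range(1, 33))        # 32 naturally ordered values
shuffled = timestamps[:]
random.Random(0).shuffle(shuffled)     # same values, arrival order lost

sorted_scan = chunks_to_scan(build_chunks(timestamps), 5, 8)
shuffled_scan = chunks_to_scan(build_chunks(shuffled), 5, 8)
print(sorted_scan, shuffled_scan)
```

With ordered data the filter range 5 to 8 falls entirely inside one chunk, while shuffled data typically spreads those values over several chunks that all have to be read.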
2.1.4.1.5 Further reading and monitoring query performance
Read our Monitoring query performance guide to learn how to use the built-in monitoring utilities. The guide also gives concrete examples for improving query performance.
2.1.4.2 Installing Monit
2.1.4.2.1 Getting Started
Before installing SQream with Monit, verify that you have followed the required recommended pre-installation configurations. The procedures in the Installing Monit guide must be performed on each SQream cluster node.
2.1.4.2.2 Overview
Monit is a free, open source supervision utility for managing and monitoring Unix and Linux systems. Monit lets you view system status directly from the command line or from a native HTTP web server. Monit can be used to conduct automatic maintenance and repair, such as executing meaningful causal actions in error situations. SQream uses Monit as a watchdog utility, but you can use any other utility that provides the same or similar functionality. The Installing Monit procedures describe how to install, configure, and start Monit. You can install Monit in one of the following ways:
• Installing Monit on CentOS
• Installing Monit on CentOS offline
• Installing Monit on Ubuntu
• Installing Monit on Ubuntu offline
2.1.4.2.2.1 Installing Monit on CentOS:
To install Monit on CentOS: 1. Install Monit as a superuser on CentOS:
$ sudo yum install monit
2.1.4.2.2.2 Installing Monit on CentOS Offline:
Installing Monit on CentOS offline can be done in either of the following ways:
• Building Monit from Source Code
• Building Monit from Pre-Built Binaries
2.1.4.2.2.3 Building Monit from Source Code
To build Monit from source code:
1. Extract the Monit package for the current version:
$ tar zxvf monit-
The value x.y.z denotes the version numbers.
2. Navigate to the extracted package directory:
$ cd monit-x.y.z
3. Configure the files in the package:
$ ./configure   # use ./configure --help to view available options
4. Build and install the package:
$ make && make install
The following are the default storage directories:
• The Monit package: /usr/local/bin/
• The monit.1 man-file: /usr/local/man/man1/
5. Optional - To change the above default location(s), use the --prefix option to ./configure.
6. Optional - Create an RPM package for CentOS directly from the source code:
$ rpmbuild -tb monit-x.y.z.tar.gz
2.1.4.2.2.4 Building Monit from Pre-Built Binaries
To build Monit from pre-built binaries:
1. Extract the Monit package for the current version:
$ tar zxvf monit-x.y.z-linux-x64.tar.gz
The value x.y.z denotes the version numbers.
2. Navigate to the extracted package directory:
3. Copy the bin/monit file into the /usr/local/bin/ directory:
$ cp bin/monit /usr/local/bin/
4. Copy the conf/monitrc file into the /etc/ directory:
$ cp conf/monitrc /etc/
For examples of pre-built Monit binaries, see Download Precompiled Binaries.
2.1.4.2.2.5 Installing Monit on Ubuntu:
To install Monit on Ubuntu: 1. Install Monit as a superuser on Ubuntu:
$ sudo apt-get install monit
2.1.4.2.2.6 Installing Monit on Ubuntu Offline:
You can install Monit on Ubuntu when you do not have an internet connection.
To install Monit on Ubuntu offline:
1. Extract the required file:
$ tar zxvf monit-
NOTICE:
$ cd monit-x.y.z
3. Copy the bin/monit file into the /usr/local/bin/ directory:
$ cp bin/monit /usr/local/bin/
4. Copy the conf/monitrc file into the /etc/ directory:
$ cp conf/monitrc /etc/
2.1.4.2.3 Configuring Monit
When the installation is complete, you can configure Monit. You configure Monit by modifying the Monit configuration file, called monitrc. This file contains blocks for each service that you want to monitor. The following is an example of a service block:
#SQREAM1-START
check process sqream1 with pidfile /var/run/sqream1.pid
start program = "/usr/bin/systemctl start sqream1"
stop program = "/usr/bin/systemctl stop sqream1"
#SQREAM1-END
For example, if you have 16 services, you can configure this block by copying the entire block 15 times and modifying all service names as required, as shown below:
#SQREAM2-START
check process sqream2 with pidfile /var/run/sqream2.pid
start program = "/usr/bin/systemctl start sqream2"
stop program = "/usr/bin/systemctl stop sqream2"
#SQREAM2-END
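Copying and renaming the block by hand for many services is error-prone, so the repetition can be scripted. The following is a minimal sketch that generates the service blocks from a template; the template mirrors the block shown above, while the function name service_blocks and the count of 16 are illustrative assumptions.

```python
# Template for one service block; {n} is replaced by the service number.
BLOCK_TEMPLATE = """#SQREAM{n}-START
check process sqream{n} with pidfile /var/run/sqream{n}.pid
start program = "/usr/bin/systemctl start sqream{n}"
stop program = "/usr/bin/systemctl stop sqream{n}"
#SQREAM{n}-END
"""

def service_blocks(count):
    """Return monitrc service blocks for sqream1 .. sqream<count>."""
    return "\n".join(BLOCK_TEMPLATE.format(n=n) for n in range(1, count + 1))

blocks = service_blocks(16)
print(blocks)
```

The printed text can be reviewed and then pasted into the monitrc file alongside the metadataserver and serverpicker blocks.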
For servers that don’t run the metadataserver and serverpicker commands, you can use the block example above, but comment out the related commands, as shown below:
#METADATASERVER-START
#check process metadataserver with pidfile /var/run/metadataserver.pid
#start program = "/usr/bin/systemctl start metadataserver"
#stop program = "/usr/bin/systemctl stop metadataserver"
#METADATASERVER-END

To configure Monit:
1. Copy the required block for each required service.
2. Modify all service names in the block.
3. Copy the configured monitrc file to the /etc/monit.d/ directory:
$ cp monitrc /etc/monit.d/
4. Set file permissions to 600 (read and write access for the file owner only):
$ sudo chmod 600 /etc/monit.d/monitrc
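To see what mode 600 means in practice, the following sketch applies it to a throwaway file and prints the resulting permission bits. The temporary file stands in for /etc/monit.d/monitrc and is an assumption of this example; the sketch is intended for Linux or another POSIX system.

```python
import os
import stat
import tempfile

# Create a throwaway file standing in for /etc/monit.d/monitrc.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    path = handle.name

os.chmod(path, 0o644)   # start from a more permissive mode
os.chmod(path, 0o600)   # same effect as: chmod 600 <file>

file_mode = os.stat(path).st_mode
mode = stat.S_IMODE(file_mode)          # keep only the permission bits
mode_string = stat.filemode(file_mode)  # ls-style representation
print(oct(mode), mode_string)
os.remove(path)
```

The output shows read and write permission for the owner and no permissions for the group or other users.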
5. Reload the systemd manager configuration to activate the current configurations:
$ sudo systemctl daemon-reload
6. Optional - Navigate to the /etc/sqream directory and create a symbolic link to the monitrc file:
$ cd /etc/sqream
$ sudo ln -s /etc/monit.d/monitrc monitrc
2.1.4.2.4 Starting Monit
After configuring Monit, you can start it.
To start Monit:
1. Start Monit as a superuser:
$ sudo systemctl start monit
2. View Monit’s service status:
$ sudo systemctl status monit
3. If Monit is functioning correctly, enable the Monit service to start on boot:
$ sudo systemctl enable monit
2.1.4.3 Launching SQream with Monit
This procedure describes how to launch SQream using Monit.
2.1.4.3.1 Launching SQream
After doing the following, you can launch SQream according to the instructions on this page.
1. Installing Monit
2. Installing SQream with Binary
The following is an example of a working monitrc file configured to monitor the metadataserver and serverpicker commands, and four sqreamd services. The monitrc configuration file is located in the conf/monitrc directory. Note that the monitrc in the example is configured for eight sqreamd services, but that only the first four are enabled:
$ set daemon 5 # check services at 30 seconds intervals $ set logfile syslog $ $ set httpd port 2812 and $ use address localhost # only accept connection from localhost $ allow localhost # allow localhost to connect to the server and $ allow admin:monit # require user 'admin' with password 'monit' $ $ ##set mailserver smtp.gmail.com port 587 $ ## using tlsv12 $ #METADATASERVER-START $ check process metadataserver with pidfile /var/run/metadataserver.pid $ start program = "/usr/bin/systemctl start metadataserver" $ stop program = "/usr/bin/systemctl stop metadataserver" $ #METADATASERVER-END $ # alert [email protected] on {nonexist, timeout} $ # with mail-format { $ # from: Monit@$HOST $ # subject: metadataserver $EVENT - $ACTION $ # message: This is an automate mail, sent from␣ ,→monit. $ # } $ #SERVERPICKER-START $ check process serverpicker with pidfile /var/run/serverpicker.pid (continues on next page)
128 Chapter 2. Guides SQream DB, Release 2021.1
(continued from previous page) $ start program = "/usr/bin/systemctl start serverpicker" $ stop program = "/usr/bin/systemctl stop serverpicker" $ #SERVERPICKER-END $ # alert [email protected] on {nonexist, timeout} $ # with mail-format { $ # from: Monit@$HOST $ # subject: serverpicker $EVENT - ,→$ACTION $ # message: This is an automate mail, ,→ sent from monit. $ # $ # $ #SQREAM1-START $ check process sqream1 with pidfile /var/run/sqream1.pid $ start program = "/usr/bin/systemctl start sqream1" $ stop program = "/usr/bin/systemctl stop sqream1" $ #SQREAM1-END $ # alert [email protected] on {nonexist, timeout} $ # with mail-format { $ # from: Monit@$HOST $ # subject: sqream1 $EVENT - $ACTION $ # message: This is an automate mail, sent from monit. $ # } $ #SQREAM2-START $ check process sqream2 with pidfile /var/run/sqream2.pid $ start program = "/usr/bin/systemctl start sqream2" $ #SQREAM2-END $ # alert [email protected] on {nonexist, timeout} $ # with mail-format { $ # from: Monit@$HOST $ # subject: sqream1 $EVENT - $ACTION $ # message: This is an automate mail, sent from monit. $ # } $ #SQREAM3-START $ check process sqream3 with pidfile /var/run/sqream3.pid $ start program = "/usr/bin/systemctl start sqream3" $ stop program = "/usr/bin/systemctl stop sqream3" $ #SQREAM3-END $ # alert [email protected] on {nonexist, timeout} $ # with mail-format { $ # from: Monit@$HOST $ # subject: sqream2 $EVENT - $ACTION $ # message: This is an automate mail, sent from monit. $ # } $ #SQREAM4-START $ check process sqream4 with pidfile /var/run/sqream4.pid $ start program = "/usr/bin/systemctl start sqream4" $ stop program = "/usr/bin/systemctl stop sqream4" $ #SQREAM4-END $ # alert [email protected] on {nonexist, timeout} $ # with mail-format { $ # from: Monit@$HOST (continues on next page)
$ # subject: sqream2 $EVENT - $ACTION
$ # message: This is an automate mail, sent from monit.
$ # }
$ #
$ #SQREAM5-START
$ #check process sqream5 with pidfile /var/run/sqream5.pid
$ #start program = "/usr/bin/systemctl start sqream5"
$ #stop program = "/usr/bin/systemctl stop sqream5"
$ #SQREAM5-END
$ # alert [email protected] on {nonexist, timeout}
$ # with mail-format {
$ # from: Monit@$HOST
$ # subject: sqream2 $EVENT - $ACTION
$ # message: This is an automate mail, sent from monit.
$ # }
$ #
$ #SQREAM6-START
$ #check process sqream6 with pidfile /var/run/sqream6.pid
$ #start program = "/usr/bin/systemctl start sqream6"
$ #stop program = "/usr/bin/systemctl stop sqream6"
$ #SQREAM6-END
$ # alert [email protected] on {nonexist, timeout}
$ # with mail-format {
$ # from: Monit@$HOST
$ # subject: sqream2 $EVENT - $ACTION
$ # message: This is an automate mail, sent from monit.
$ # }
$ #
$ #SQREAM7-START
$ #check process sqream7 with pidfile /var/run/sqream7.pid
$ #start program = "/usr/bin/systemctl start sqream7"
$ #stop program = "/usr/bin/systemctl stop sqream7"
$ #SQREAM7-END
$ # with mail-format {
$ # from: Monit@$HOST
$ # subject: sqream2 $EVENT - $ACTION
$ # message: This is an automate mail, sent from monit.
$ # }
$ #
$ #SQREAM8-START
$ #check process sqream8 with pidfile /var/run/sqream8.pid
$ #start program = "/usr/bin/systemctl start sqream8"
$ #stop program = "/usr/bin/systemctl stop sqream8"
$ #SQREAM8-END
$ # alert [email protected] on {nonexist, timeout}
$ # with mail-format {
$ # from: Monit@$HOST
$ # subject: sqream2 $EVENT - $ACTION
$ # message: This is an automate mail, sent from monit.
$ # }
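Each service in the configuration above follows the same three-line pattern (a `check process` line plus `start program` and `stop program` lines). As a minimal sketch, assuming the pidfile and systemctl paths used above, a small helper can print that stanza for any service name. The function name `monit_check_stanza` is hypothetical and is not part of Monit or SQream.

```shell
# Hypothetical helper: print a Monit "check process" stanza for service $1,
# following the pattern used in the configuration above.
monit_check_stanza() {
    printf 'check process %s with pidfile /var/run/%s.pid\n' "$1" "$1"
    printf 'start program = "/usr/bin/systemctl start %s"\n' "$1"
    printf 'stop program = "/usr/bin/systemctl stop %s"\n' "$1"
}

# Example: print the stanza for a fifth worker.
#   monit_check_stanza sqream5
```

The output is only text; review it before adding it to your Monit configuration file.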
2.1.4.3.2 Monit Usage Examples
This section shows examples of two methods for stopping the sqream3 service using Monit’s command syntax:
• Stopping Monit and SQream separately
• Stopping SQream using a Monit command
2.1.4.3.2.1 Stopping Monit and SQream Separately
You can stop the Monit service and SQream separately as follows:
$ sudo systemctl stop monit
$ sudo systemctl stop sqream3
You can restart Monit as follows:
$ sudo systemctl start monit
Restarting Monit automatically restarts the SQream services.
2.1.4.3.2.2 Stopping SQream Using a Monit Command
You can stop SQream using a Monit command as follows:
$ sudo monit stop sqream3
This command stops SQream only (and not Monit). You can restart SQream as follows:
$ sudo monit start sqream3
2.1.4.3.2.3 Monit Command Line Options
This section describes some of the most commonly used Monit command options. You can show the command line options by running:
$ monit --help
$ start all - Start all services
$ start
$ restart all - Stop and start all services
$ restart
2.1.4.3.3 Using Monit While Upgrading Your Version of SQream
While upgrading your version of SQream, you can use Monit to avoid conflicts (such as a service starting during the upgrade). This is done by pausing or stopping all running services while you manually upgrade SQream. When you finish successfully upgrading SQream, you can use Monit to restart all SQream services.
To use Monit while upgrading your version of SQream:
1. Stop all actively running SQream services:
$ sudo monit stop all
2. Verify that SQream has stopped listening on ports 500X, 510X, and 310X:
$ sudo netstat -nltp # to make sure sqream stopped listening on 500X, 510X and 310X ports.
3. Replace the old version with the new version. The example below shows the old version sqream-db-v2020.2 being replaced with the new version sqream-db-v2025.200:
$ cd /home/sqream
$ mkdir tempfolder
$ mv sqream-db-v2025.200.tar.gz tempfolder/
$ cd tempfolder
$ tar -xf sqream-db-v2025.200.tar.gz
$ sudo mv sqream /usr/local/sqream-db-v2025.200
$ cd /usr/local
$ sudo chown -R sqream:sqream sqream-db-v2025.200
$ sudo rm sqream # this should remove only the symlink
$ sudo ln -s sqream-db-v2025.200 sqream # this creates a new symlink named "sqream" pointing to the new version
$ ls -l
The sqream symbolic link should point to the real folder:
$ sqream -> sqream-db-v2025.200
4. Restart the SQream services:
$ sudo monit start all
5. Verify that the latest version has been installed:
$ SELECT SHOW_VERSION();
The correct version is displayed.
6. Restart the UI:
$ pm2 start all
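Step 2 of the procedure above relies on reading the `netstat -nltp` output by eye. As a minimal sketch, assuming that "500X" means a four-digit port with any final digit (and likewise for 510X and 310X), the output can be filtered so that only SQream-style ports are shown. The helper name `sqream_ports_listening` is hypothetical.

```shell
# Hypothetical helper: read `netstat -nltp`-style text on stdin and print the
# local address of every TCP listener on a 500X, 510X or 310X port.
# Field 4 of netstat's output is the local address (ip:port).
sqream_ports_listening() {
    awk '$1 ~ /^tcp/ && $4 ~ /:(500|510|310)[0-9]$/ { print $4 }'
}

# Example on a live host (requires netstat from net-tools):
#   sudo netstat -nltp | sqream_ports_listening
# No output means that nothing is listening on those ports.
```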
2.1.4.4 Installing SQream Using Binary Packages
This procedure describes how to install SQream using binary packages and must be performed on all servers.
To install SQream using binary packages:
1. Copy the SQream package for the current version to the /home/sqream directory and extract it:
$ tar -xf sqream-db-v<2020.2>.tar.gz
2. Append the version number to the name of the SQream folder. The version number in the following example is v2020.2:
$ mv sqream sqream-db-v<2020.2>
3. Move the new version of the SQream folder to the /usr/local/ directory:
$ sudo mv sqream-db-v<2020.2> /usr/local/
4. Change the ownership of the folder to the sqream user:
$ sudo chown -R sqream:sqream /usr/local/sqream-db-v<2020.2>
5. Navigate to the /usr/local/ directory and create a symbolic link to SQream:
$ cd /usr/local
$ sudo ln -s sqream-db-v<2020.2> sqream
6. Verify that the symbolic link that you created points to the folder that you created:
$ ls -l
7. Confirm that the output shows the link pointing to the new folder, as in the following example:
$ sqream -> sqream-db-v<2020.2>
8. Create the SQream configuration file destination folders and set their ownership to sqream:
$ sudo mkdir /etc/sqream
$ sudo chown -R sqream:sqream /etc/sqream
9. Create the SQream service log destination folders and set their ownership to sqream:
$ sudo mkdir /var/log/sqream
$ sudo chown -R sqream:sqream /var/log/sqream
10. Navigate to the /usr/local/sqream/etc/ directory and copy the SQream configuration files from it:
$ cd /usr/local/sqream/etc/
$ cp * /etc/sqream
The .conf files are service configuration files, and the JSON files are SQream configuration files, for a total of four files. The number of service configuration files and JSON files must be identical.
NOTICE - Verify that the JSON files have been configured correctly and that all required flags have been set to the correct values.
In each JSON file, the following parameters must be updated:
• instanceId
• machineIP
• metadataServerIp
• spoolMemoryGB
• limitQueryMemoryGB
• gpu
• port
• ssl_port
Note the following:
• The value of the metadataServerIp parameter must point to the IP of the server that the metadata server is running on.
• The value of the machineIP parameter must point to the IP of your local machine. It is the same as metadataServerIp on the server running metadataserver, and different on the other server nodes.
11. Optional - To run additional SQream services, copy the required configuration files and create additional JSON files:
$ cp sqream2_config.json sqream3_config.json
$ vim sqream3_config.json
NOTICE: A unique instanceId must be used in each JSON file. In the example above, the instanceId sqream_2 is changed to sqream_3.
12. Optional - If you created additional services in Step 11, verify that you have also created their additional configuration files:
$ cp sqream2-service.conf sqream3-service.conf
$ vim sqream3-service.conf
13. For each SQream service configuration file, do the following:
   1. Change the SERVICE_NAME=sqream2 value to SERVICE_NAME=sqream3.
   2. Change LOGFILE=/var/log/sqream/sqream2.log to LOGFILE=/var/log/sqream/sqream3.log.
NOTE: If you are running SQream on more than one server, you must configure the serverpicker and metadataserver services to start on only one of the servers. If metadataserver is running on the first server, the metadataServerIp value in the second server’s /etc/sqream/sqream1_config.json file must point to the IP of the server on which the metadataserver service is running.
14. Set up serverpicker:
   1. Open the server_picker.conf file:
$ vim /etc/sqream/server_picker.conf
   2. Change the IP 127.0.0.1 to the IP of the server that the metadataserver service is running on.
   3. Change the CLUSTER to the value of the cluster path.
15. Set up your service files:
$ cd /usr/local/sqream/service/
$ cp sqream2.service sqream3.service
$ vim sqream3.service
16. In each new SQream service file, increment the number in the EnvironmentFile=/etc/sqream/sqream2-service.conf entry so that it points to the matching service configuration file, as shown below:
$ EnvironmentFile=/etc/sqream/sqream<3>-service.conf
17. Copy and register your service files into systemd:
$ sudo cp metadataserver.service /usr/lib/systemd/system/
$ sudo cp serverpicker.service /usr/lib/systemd/system/
$ sudo cp sqream*.service /usr/lib/systemd/system/
18. Verify that your service files have been copied into systemd:
$ ls -l /usr/lib/systemd/system/sqream*
$ ls -l /usr/lib/systemd/system/metadataserver.service
$ ls -l /usr/lib/systemd/system/serverpicker.service
$ sudo systemctl daemon-reload
19. Copy the license into the /etc/sqream directory:
$ cp license.enc /etc/sqream/
If you have an HDFS environment, see Configuring an HDFS Environment for the User sqream.
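Step 10 above lists the parameters that must be updated in each JSON file. As a rough sketch, assuming each parameter appears as a plain `"name":` key in the file, the following helper prints the required parameters that a given file does not mention. It is a text search and not a JSON parser, and the function name `missing_config_keys` is hypothetical.

```shell
# Hypothetical helper: print every required parameter name that does not
# appear as a key in the SQream JSON configuration file given as $1.
# Assumption: keys are written as "name": in the file (plain text match).
missing_config_keys() {
    for key in instanceId machineIP metadataServerIp spoolMemoryGB \
               limitQueryMemoryGB gpu port ssl_port; do
        grep -q "\"$key\"[[:space:]]*:" "$1" || printf '%s\n' "$key"
    done
}

# Example: check every worker configuration file copied to /etc/sqream.
#   for f in /etc/sqream/sqream*_config.json; do
#       echo "== $f"; missing_config_keys "$f"
#   done
```

An empty result only shows that the keys are present. The values (for example, a unique instanceId per file) still need to be reviewed manually.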
2.1.4.4.1 Upgrading SQream Version
Upgrading your SQream version requires stopping all running services while you manually upgrade SQream.
To upgrade your version of SQream:
1. Stop all actively running SQream services.
NOTICE - All SQream services must remain stopped while the upgrade is in progress. Ensuring that SQream services remain stopped depends on the tool being used. For an example of stopping actively running SQream services, see Launching SQream with Monit.
2. Verify that SQream has stopped listening on ports 500X, 510X, and 310X:
$ sudo netstat -nltp # to make sure sqream stopped listening on 500X, 510X and 310X ports.
3. Replace the old version sqream-db-v2020.2, with the new version sqream-db-v2021.1:
$ cd /home/sqream
$ mkdir tempfolder
$ mv sqream-db-v2021.1.tar.gz tempfolder/
$ cd tempfolder
$ tar -xf sqream-db-v2021.1.tar.gz
$ sudo mv sqream /usr/local/sqream-db-v2021.1
$ cd /usr/local
$ sudo chown -R sqream:sqream sqream-db-v2021.1
4. Remove the symbolic link:
$ sudo rm sqream
5. Create a new symbolic link named “sqream” pointing to the new version:
$ sudo ln -s sqream-db-v2021.1 sqream
6. Verify that the symbolic SQream link points to the real folder:
$ ls -l
The following is an example of the correct output:
$ sqream -> sqream-db-v2021.1
7. Optional - For major versions, upgrade your SQream storage cluster, as shown in the following example:
$ cat /etc/sqream/sqream1_config.json |grep cluster
$ ./upgrade_storage
The following is an example of the correct output:
get_leveldb_version path{
8. Verify that the latest version has been installed:
$ ./sqream sql --username sqream --password sqream --host localhost --databasename master -c "SELECT SHOW_VERSION();"
The following is an example of the correct output:
v2021.1
1 row
time: 0.050603s
For more information, see the upgrade_storage command line program.
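Steps 4 to 6 above replace the sqream symbolic link and then check it by reading the `ls -l` output. As a minimal sketch, the same check can be scripted with `readlink`. The helper name `symlink_points_to` is hypothetical, and the paths in the example are the ones used in this procedure.

```shell
# Hypothetical helper: succeed only if $1 is a symbolic link whose target
# is exactly $2 (for example: sqream -> sqream-db-v2021.1).
symlink_points_to() {
    [ -L "$1" ] && [ "$(readlink "$1")" = "$2" ]
}

# Example, using the paths from the procedure above:
#   symlink_points_to /usr/local/sqream sqream-db-v2021.1 \
#       && echo "link OK" || echo "link is missing or points elsewhere"
```

The comparison is against the link text itself, so a link created with an absolute target would need the absolute path as the second argument.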
2.1.4.5 Installing SQream with Kubernetes
Kubernetes, also known as k8s, is a portable open source platform that automates Linux container operations. Kubernetes supports outsourcing data centers to public cloud service providers or can be scaled for web hosting. SQream uses Kubernetes as an orchestration and recovery solution. The Installing SQream with Kubernetes guide describes the following:
• Preparing the SQream Environment to Launch SQream Using Kubernetes
• Setting Up Your Hosts
• Installing Your Kubernetes Cluster
• Installing the SQream Software
• Running the Sqream-install Service
• Using the sqream-start Commands
• Upgrading Your SQream Version
2.1.4.5.1 Preparing the SQream Environment to Launch SQream Using Kubernetes
The Preparing the SQream environment to Launch SQream Using Kubernetes section describes the following:
• Overview
• Operating System Requirements
• Compute Server Specifications
2.1.4.5.1.1 Overview
A minimum of three servers is required for preparing the SQream environment using Kubernetes. Kubernetes uses clusters, which are sets of nodes running containerized applications. A cluster consists of at least two GPU nodes and one additional server without GPU to act as the quorum manager.
Each server must have the following IP addresses:
• An IP address located in the management network.
• An additional IP address from the same subnet to function as a floating IP.
All servers must mount the same shared storage folder.
The following list shows the server host name format requirements:
• A maximum of 253 characters.
• Only lowercase alphanumeric characters, hyphens (-), or periods (.).
• Starts and ends with alphanumeric characters.
Go back to Preparing the SQream Environment to Launch SQream Using Kubernetes
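The host name format requirements above can be checked before installation. The following is a minimal sketch of such a check; the function name `valid_hostname` is hypothetical, and the pattern simply encodes the three listed rules.

```shell
# Hypothetical helper: succeed if $1 satisfies the host name rules above:
# 1 to 253 characters, only lowercase alphanumerics, "-" or ".",
# and an alphanumeric first and last character.
valid_hostname() {
    [ "${#1}" -ge 1 ] && [ "${#1}" -le 253 ] || return 1
    printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z0-9]([a-z0-9.-]*[a-z0-9])?$'
}

# Example: check the name of the current server.
#   valid_hostname "$(hostname)" && echo "host name OK" || echo "rename this host"
```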
2.1.4.5.1.2 Operating System Requirements
The required operating system is a version of x86 CentOS/RHEL between 7.6 and 7.9. For PPC64le, the required version is RHEL 7.6.
Go back to Preparing the SQream Environment to Launch SQream Using Kubernetes
2.1.4.5.1.3 Compute Server Specifications
Installing SQream with Kubernetes includes the following compute server specifications:
• CPU: 4 cores
• RAM: 16GB
• HD: 500GB
Go back to Preparing the SQream Environment to Launch SQream Using Kubernetes
2.1.4.5.2 Setting Up Your Hosts
SQream requires you to set up your hosts. Setting up your hosts requires the following:
• Configuring the Hosts File
• Installing the Required Packages
• Disabling the Linux UI
• Disabling SELinux
• Disabling Your Firewall
• Checking the CUDA Version
2.1.4.5.2.1 Configuring the Hosts File
To configure the /etc/hosts file:
1. Edit the /etc/hosts file:
$ sudo vim /etc/hosts
2. Call your local host:
$ 127.0.0.1 localhost $
2.1.4.5.2.2 Installing the Required Packages
The first step in setting up your hosts is to install the required packages. To install the required packages: 1. Run the following command based on your operating system: • RHEL:
$ sudo yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
• CentOS:
$ sudo yum install epel-release
$ sudo yum install pciutils openssl-devel python36 python36-pip kernel-devel-$(uname -r) kernel-headers-$(uname -r) gcc jq net-tools ntp
2. Verify that the required packages were successfully installed by running the following commands:
ntpq --version
jq --version
python3 --version
pip3 --version
rpm -qa |grep kernel-devel-$(uname -r)
rpm -qa |grep kernel-headers-$(uname -r)
gcc --version
3. Enable the ntpd (Network Time Protocol daemon) program on all servers:
$ sudo systemctl start ntpd
$ sudo systemctl enable ntpd
$ sudo systemctl status ntpd
$ sudo ntpq -p
Go back to Setting Up Your Hosts
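The verification in Step 2 above runs each tool in turn. As a small sketch of the same idea, the following helper reports which of a list of commands cannot be found on the PATH. The function name `missing_commands` is hypothetical; it does not check the kernel-devel or kernel-headers packages, which still need the `rpm -qa` commands shown above.

```shell
# Hypothetical helper: print each command name from the arguments
# that cannot be found on the PATH.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Example, using the tools checked in Step 2:
#   missing_commands ntpq jq python3 pip3 gcc
# No output means that every listed command was found.
```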
2.1.4.5.2.3 Disabling the Linux UI
After installing the required packages, you must disable the Linux UI if it has been installed. You can disable the Linux UI by running the following command:
$ sudo systemctl set-default multi-user.target
Go back to Setting Up Your Hosts
2.1.4.5.2.4 Disabling SELinux
After disabling the Linux UI you must disable SELinux. To disable SELinux: 1. Run the following command:
$ sudo sed -i -e s/enforcing/disabled/g /etc/selinux/config
2. Reboot the system as a root user:
$ sudo reboot
Go back to Setting Up Your Hosts
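The sed command above replaces every occurrence of the word "enforcing" in the file, including occurrences inside comments. As an alternative sketch (not the method used in this guide), the substitution can be limited to the line that starts with `SELINUX=`. The function name `set_selinux_disabled` is hypothetical, and the file path is passed as an argument so that the helper can be tried on a copy first.

```shell
# Hypothetical helper: set SELINUX=disabled in the config file given as $1,
# changing only the line that starts with "SELINUX=" and leaving comments
# and the SELINUXTYPE= line untouched.
set_selinux_disabled() {
    sed 's/^SELINUX=.*/SELINUX=disabled/' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Example on a copy of the real file:
#   cp /etc/selinux/config /tmp/selinux-config
#   set_selinux_disabled /tmp/selinux-config && cat /tmp/selinux-config
```

Editing /etc/selinux/config itself requires root privileges, and a reboot is still needed for the change to take effect.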
2.1.4.5.2.5 Disabling Your Firewall
After disabling SELinux, you must disable your firewall by running the following commands:
$ sudo systemctl stop firewalld
$ sudo systemctl disable firewalld
Go back to Setting Up Your Hosts
2.1.4.5.2.6 Checking the CUDA Version
After completing all of the steps above, you must check the CUDA version. To check the CUDA version: 1. Check the CUDA version:
$ nvidia-smi
The following is an example of the correct output:
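$ +-----------------------------------------------------------------------------+
$ | NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
$ |-------------------------------+----------------------+----------------------+
$ | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
$ | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
$ |===============================+======================+======================|
$ |   0  GeForce GTX 105...  Off  | 00000000:01:00.0 Off |                  N/A |
$ | 32%   38C    P0    N/A /  75W |      0MiB /  4039MiB |      0%      Default |
$ +-------------------------------+----------------------+----------------------+
$
$ +-----------------------------------------------------------------------------+
$ | Processes:                                                       GPU Memory |
$ |  GPU       PID   Type   Process name                             Usage      |
$ |=============================================================================|
$ |  No running processes found                                                 |
$ +-----------------------------------------------------------------------------+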
In the above output, the CUDA version is 10.1. If the above output is not generated, CUDA has not been installed. To install CUDA, see Installing the CUDA driver. Go back to Setting Up Your Hosts
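The CUDA version appears in the header line of the nvidia-smi output. As a minimal sketch, assuming the header contains the text "CUDA Version:" followed by the version number as in the example above, the value can be extracted as follows. The function name `cuda_version_from_smi` is hypothetical.

```shell
# Hypothetical helper: read nvidia-smi output on stdin and print the value
# that follows "CUDA Version:" in the header line (empty if not present).
cuda_version_from_smi() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Example on a GPU host:
#   nvidia-smi | cuda_version_from_smi
```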
2.1.4.5.3 Installing Your Kubernetes Cluster
After setting up your hosts, you must install your Kubernetes cluster. The Kubernetes and SQream software must be installed from the management host, and can be installed on any server in the cluster. Installing your Kubernetes cluster requires the following:
• Generating and Sharing SSH Keypairs Across All Existing Nodes
• Installing and Deploying a Kubernetes Cluster Using Kubespray
• Adjusting Kubespray Deployment Values
• Checking Your Kubernetes Status
• Adding a SQream Label to Your Kubernetes Cluster Nodes
• Copying Your Kubernetes Configuration API File to the Master Cluster Nodes
• Creating an env_file in Your Home Directory
• Creating a Base Kubernetes Namespace
• Pushing the env_file File to the Kubernetes Configmap
• Installing the NVIDIA Docker2 Toolkit
• Modifying the Docker Daemon JSON File for GPU and Compute Nodes
• Installing the Nvidia-device-plugin Daemonset
• Creating an Nvidia Device Plugin
• Checking GPU Resources Allocatable to GPU Nodes
• Preparing the WatchDog Monitor
2.1.4.5.3.1 Generating and Sharing SSH Keypairs Across All Existing Nodes
You can generate and share SSH keypairs across all existing nodes. Sharing SSH keypairs across all nodes enables passwordless access from the management server to all nodes in the cluster. All nodes in the cluster require passwordless access.
Note: You must generate and share an SSH keypair across all nodes even if you are installing the Kubernetes cluster on a single host.
To generate and share an SSH keypair: 1. Switch to root user access:
$ sudo su -
2. Generate an RSA key pair:
$ ssh-keygen
The following is an example of the correct output:
$ ssh-keygen
$ Generating public/private rsa key pair.
$ Enter file in which to save the key (/root/.ssh/id_rsa):
$ Created directory '/root/.ssh'.
$ Enter passphrase (empty for no passphrase):
$ Enter same passphrase again:
$ Your identification has been saved in /root/.ssh/id_rsa.
$ Your public key has been saved in /root/.ssh/id_rsa.pub.
$ The key fingerprint is:
$ SHA256:xxxxxxxxxxxxxxdsdsdffggtt66gfgfg [email protected]
$ The key's randomart image is:
$ +---[RSA 2048]----+
$ | =*. |
$ | .o |
$ | ..o o|
$ | . .oo +.|
$ | = S =...o o|
$ | B + *..o+.|
$ | o * *..o .+|
$ | o * oo.E.o|
$ | . ..+..B.+o|
$ +----[SHA256]-----+
The generated file is /root/.ssh/id_rsa.pub. 3. Copy the public key to all servers in the cluster, including the one that you are running on.
$ ssh-copy-id -i ~/.ssh/id_rsa.pub root@remote-host
4. Replace the remote host with your host IP address. Go back to Installing Your Kubernetes Cluster
2.1.4.5.3.2 Installing and Deploying a Kubernetes Cluster Using Kubespray
SQream uses the Kubespray software package to install and deploy Kubernetes clusters.
To install and deploy a Kubernetes cluster using Kubespray:
1. Clone Kubespray:
   1. Clone the kubespray.git repository:
$ git clone https://github.com/kubernetes-incubator/kubespray.git
2. Navigate to the kubespray directory:
$ cd kubespray
3. Install the requirements listed in the requirements.txt file:
$ pip3 install -r requirements.txt
2. Create your SQream inventory directory: 1. Run the following command:
$ cp -rp inventory/sample inventory/sqream
2. Replace the
$ declare -a IPS=(
For example, the following replaces 192.168.0.93 with 192.168.0.92:
$ declare -a IPS=(host-93,192.168.0.93 host-92,192.168.0.92)
Note the following:
• Running a declare requires defining a pair (host name and cluster node IP address), as shown in the above example.
• You can define more than one pair.
3. When the reboot is complete, switch back to the root user:
$ sudo su -
4. Navigate to /root/kubespray:
$ cd /root/kubespray
5. Copy inventory/sample as inventory/sqream:
$ cp -rfp inventory/sample inventory/sqream
6. Update the Ansible inventory file with the inventory builder:
$ declare -a IPS=(
7. In the kubespray hosts.yml file, set the node IPs:
$ CONFIG_FILE=inventory/sqream/hosts.yml python3 contrib/inventory_builder/inventory.py ${IPS[@]}
If you do not set a specific hostname in declare, the server hostnames will change to node1, node2, etc. To maintain specific hostnames, run declare as in the following example:
$ declare -a IPS=(eks-rhl-1,192.168.5.81 eks-rhl-2,192.168.5.82 eks-rhl-3,192.168.5.83)
Note that the declare must contain pairs (hostname,ip). 8. Verify that the following have been done:
• That the hosts.yml file is configured correctly. • That all children are included with their relevant nodes. You can save your current server hostname by replacing
$ cat inventory/sqream/hosts.yml
The hostname can be lowercase and contain - or . only, and must be aligned with the server’s hostname. The following is an example of the correct output. Each host and IP address that you provided in Step 2 should be displayed once:
$ all:
$   hosts:
$     node1:
$       ansible_host: 192.168.5.81
$       ip: 192.168.5.81
$       access_ip: 192.168.5.81
$     node2:
$       ansible_host: 192.168.5.82
$       ip: 192.168.5.82
$       access_ip: 192.168.5.82
$     node3:
$       ansible_host: 192.168.5.83
$       ip: 192.168.5.83
$       access_ip: 192.168.5.83
$   children:
$     kube-master:
$       hosts:
$         node1:
$         node2:
$         node3:
$     kube-node:
$       hosts:
$         node1:
$         node2:
$         node3:
$     etcd:
$       hosts:
$         node1:
$         node2:
$         node3:
$     k8s-cluster:
$       children:
$         kube-master:
$         kube-node:
$     calico-rr:
$       hosts: {}
Go back to Installing Your Kubernetes Cluster
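The procedure above states that the declare must contain hostname,ip pairs. As a rough sketch, the pairs can be checked before they are passed to the inventory builder. The function name `check_host_ip_pairs` is hypothetical; it checks only the shape of each entry (a lowercase host name, one comma, and a dotted IPv4 address) and does not verify that each octet is in the 0 to 255 range.

```shell
# Hypothetical helper: succeed only if every argument looks like
# "hostname,ip" (for example: eks-rhl-1,192.168.5.81).
# The first malformed entry is reported on stderr.
check_host_ip_pairs() {
    for pair in "$@"; do
        printf '%s\n' "$pair" |
            LC_ALL=C grep -Eq '^[a-z0-9]([a-z0-9.-]*[a-z0-9])?,([0-9]{1,3}\.){3}[0-9]{1,3}$' ||
            { printf 'bad entry: %s\n' "$pair" >&2; return 1; }
    done
}

# Example, using the values from the declare example above:
#   check_host_ip_pairs eks-rhl-1,192.168.5.81 eks-rhl-2,192.168.5.82 \
#       eks-rhl-3,192.168.5.83 && echo "pairs look well formed"
```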
2.1.4.5.3.3 Adjusting Kubespray Deployment Values
After downloading and configuring Kubespray, you can adjust your Kubespray deployment values. A script is used to modify how the Kubernetes cluster is deployed, and you must set the cluster name variable before running this script.
Note: The script must be run from the kubespray folder.
To adjust Kubespray deployment values: 1. Add the following export to the local user’s ~/.bashrc file by replacing the
$ export VIP_IP=
2. Log out, log back in, and verify the following:
$ echo $VIP_IP
3. Make the following replacements to the kubespray_settings.sh file:
$ cat <
Note: In most cases, the Docker data resides on the system disk. Because Docker requires a high volume of data (images, containers, volumes, etc.), you can change the default Docker data location to prevent the system disk from running out of space.
4. Optional - Change the default Docker data location:
$ sed -i "/docker_daemon_graph/c \docker_daemon_graph\: """ inventory/sqream/group_vars/all/docker.yml
5. Make the kubespray_settings.sh file executable for your user:
$ chmod u+x kubespray_settings.sh
6. Run the following script:
$ ./kubespray_settings.sh
7. Run the cluster.yml playbook against the inventory/sqream/hosts.yml inventory file:
$ ansible-playbook -i inventory/sqream/hosts.yml cluster.yml -v
The Kubespray installation takes approximately 10 - 15 minutes. The following is an example of the correct output:
$ PLAY RECAP *********************************************************************************************
$ node-1 : ok=680 changed=133 unreachable=0 failed=0
$ node-2 : ok=583 changed=113 unreachable=0 failed=0
$ node-3 : ok=586 changed=115 unreachable=0 failed=0
$ localhost : ok=1 changed=0 unreachable=0 failed=0
In the event that the output is incorrect, or a failure occurred during the installation, please contact a SQream customer support representative. Go back to Installing Your Kubernetes Cluster.
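A successful run shows unreachable=0 and failed=0 for every host in the PLAY RECAP, as in the example above. As a minimal sketch, assuming the recap lines have the `host : ok=N changed=N unreachable=N failed=N` shape shown above, the following helper lists the hosts whose counters are not zero. The function name `recap_failed_hosts` is hypothetical.

```shell
# Hypothetical helper: read ansible-playbook output on stdin and print the
# name of every host whose recap line has unreachable > 0 or failed > 0.
recap_failed_hosts() {
    awk '/unreachable=[0-9]+/ && /failed=[0-9]+/ {
        u = $0; sub(/.*unreachable=/, "", u); sub(/[^0-9].*/, "", u)
        f = $0; sub(/.*failed=/, "", f); sub(/[^0-9].*/, "", f)
        if (u + 0 > 0 || f + 0 > 0) print $1
    }'
}

# Example: keep the playbook output in a log file and inspect it afterwards.
#   ansible-playbook -i inventory/sqream/hosts.yml cluster.yml -v | tee kubespray.log
#   recap_failed_hosts < kubespray.log
```

No output means that no host reported a failure in the recap.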
2.1.4.5.3.4 Checking Your Kubernetes Status
After adjusting your Kubespray deployment values, you must check your Kubernetes status.
To check your Kubernetes status:
1. Check the status of the node:
$ kubectl get nodes
The following is an example of the correct output:
$ NAME        STATUS   ROLES                  AGE   VERSION
$ eks-rhl-1   Ready    control-plane,master   29m   v1.21.1
$ eks-rhl-2   Ready    control-plane,master   29m   v1.21.1
$ eks-rhl-3   Ready
2. Check the status of the pod:
$ kubectl get pods --all-namespaces
The following is an example of the correct output:
$ NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE
$ kube-system   calico-kube-controllers-68dc8bf4d5-n9pbp   1/1     Running   0          160m
$ kube-system   calico-node-26cn9                          1/1     Running   1          160m
$ kube-system   calico-node-kjsgw                          1/1     Running   1          160m
$ kube-system   calico-node-vqvc5                          1/1     Running   1          160m
$ kube-system   coredns-58687784f9-54xsp                   1/1     Running   0          160m
$ kube-system   coredns-58687784f9-g94xb                   1/1     Running   0          159m
$ kube-system   dns-autoscaler-79599df498-hlw8k            1/1     Running   0          159m
$ kube-system   kube-apiserver-k8s-host-1-134              1/1     Running   0          162m
$ kube-system   kube-apiserver-k8s-host-194                1/1     Running   0          161m
$ kube-system   kube-apiserver-k8s-host-68                 1/1     Running   0          161m
$ kube-system   kube-controller-manager-k8s-host-1-134     1/1     Running   0          162m
$ kube-system   kube-controller-manager-k8s-host-194       1/1     Running   0          161m
$ kube-system   kube-controller-manager-k8s-host-68        1/1     Running   0          161m
$ kube-system   kube-proxy-5f42q                           1/1     Running   0          161m
$ kube-system   kube-proxy-bbwvk                           1/1     Running   0          161m
$ kube-system   kube-proxy-fgcfb                           1/1     Running   0          161m
$ kube-system   kube-scheduler-k8s-host-1-134              1/1     Running   0          161m
$ kube-system   kube-scheduler-k8s-host-194                1/1     Running   0          161m
Go back to Installing Your Kubernetes Cluster
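The node check above is passed when every node reports a STATUS of Ready. As a minimal sketch, assuming the default `kubectl get nodes` column layout shown above (NAME first, STATUS second), the following helper prints the nodes that are in any other state. The function name `not_ready_nodes` is hypothetical.

```shell
# Hypothetical helper: read `kubectl get nodes` output on stdin, skip the
# header row, and print the name of every node whose STATUS is not "Ready".
# A combined status such as "Ready,SchedulingDisabled" is also reported.
not_ready_nodes() {
    awk 'NR > 1 && $2 != "Ready" { print $1 }'
}

# Example:
#   kubectl get nodes | not_ready_nodes
# No output means that all nodes are Ready.
```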
2.1.4.5.3.5 Adding a SQream Label to Your Kubernetes Cluster Nodes
After checking your Kubernetes status, you must add a SQream label on your Kubernetes cluster nodes. To add a SQream label on your Kubernetes cluster nodes: 1. Get the cluster node list:
$ kubectl get nodes
The following is an example of the correct output:
$ NAME        STATUS   ROLES                  AGE   VERSION
$ eks-rhl-1   Ready    control-plane,master   29m   v1.21.1
$ eks-rhl-2   Ready    control-plane,master   29m   v1.21.1
$ eks-rhl-3   Ready
2. Set the node label, replacing the node name with each NAME value shown in the above example:
$ kubectl label nodes
The following is an example of the correct output:
$ [root@edk-rhl-1 kubespray]# kubectl label nodes eks-rhl-1 cluster=sqream
$ node/eks-rhl-1 labeled
$ [root@edk-rhl-1 kubespray]# kubectl label nodes eks-rhl-2 cluster=sqream
$ node/eks-rhl-2 labeled
$ [root@edk-rhl-1 kubespray]# kubectl label nodes eks-rhl-3 cluster=sqream
$ node/eks-rhl-3 labeled
Go back to Installing Your Kubernetes Cluster
2.1.4.5.3.6 Copying Your Kubernetes Configuration API File to the Master Cluster Nodes
After adding a SQream label on your Kubernetes cluster nodes, you must copy your Kubernetes configuration API file to your Master cluster nodes. When the Kubernetes cluster installation is complete, an API configuration file is automatically created in the .kube folder of the root user. This file enables the kubectl command to access Kubernetes’ internal API service. Following this step lets you run kubectl commands from any node in the cluster.
Warning: You must perform this on the management server only!
To copy your Kubernetes configuration API file to your Master cluster nodes: 1. Create the .kube folder in the local user directory:
$ mkdir /home/
2. Copy the configuration file from the root user directory to the
$ sudo cp /root/.kube/config /home/
3. Change the file owner from root user to the
$ sudo chown
4. Create the .kube folder in the other nodes located in the
$ ssh
5. Copy the configuration file from the management node to the other nodes:
$ scp /home/
6. Under local user on each server you copied .kube to, run the following command:
$ sudo usermod -aG docker $USER
This grants the local user the necessary permissions to run Docker commands. Go back to Installing Your Kubernetes Cluster
2.1.4.5.3.7 Creating an env_file in Your Home Directory
After copying your Kubernetes configuration API file to your Master cluster nodes, you must create an env_file in your home directory, and must set the VIP address as a variable.
Warning: You must perform this on the management server only!
To create an env_file for local users in the user’s home directory: 1. Set a variable that includes the VIP IP address:
$ export VIP_IP=
Note: If you use Kerberos, replace the KRB5_SERVER value with the IP address of your Kerberos server.
2. Do one of the following: • For local users:
$ mkdir /home/$USER/.sqream
3. Make the following replacements to the kubespray.settings.sh file, verifying that the KRB5_SERVER parameter is set to your server IP:
$ cat <
Go back to Installing Your Kubernetes Cluster
2.1.4.5.3.8 Creating a Base Kubernetes Namespace
After creating an env_file in the user’s home directory, you must create a base Kubernetes namespace. You can create a Kubernetes namespace by running the following command:
$ kubectl create namespace sqream-init
The following is an example of the correct output:
$ namespace/sqream-init created
Go back to Installing Your Kubernetes Cluster
2.1.4.5.3.9 Pushing the env_file File to the Kubernetes Configmap
After creating a base Kubernetes namespace, you must push the env_file to the Kubernetes configmap. You must push the env_file file to the Kubernetes configmap in the sqream-init namespace. This is done by running the following command:
$ kubectl create configmap sqream-init -n sqream-init --from-env-file=/home/$USER/.sqream/env_file
The following is an example of the correct output:
$ configmap/sqream-init created
Go back to Installing Your Kubernetes Cluster
2.1.4.5.3.10 Installing the NVIDIA Docker2 Toolkit
After pushing the env_file file to the Kubernetes configmap, you must install the NVIDIA Docker2 Toolkit. The NVIDIA Docker2 Toolkit lets users build and run GPU-accelerated Docker containers, and must be run only on GPU servers. The NVIDIA Docker2 Toolkit includes a container runtime library and utilities that automatically configure containers to leverage NVIDIA GPUs.
2.1.4.5.3.11 Installing the NVIDIA Docker2 Toolkit on an x86_64 Bit Processor on CentOS
To install the NVIDIA Docker2 Toolkit on an x86_64 bit processor on CentOS: 1. Add the repository for your distribution:
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | \
$ sudo tee /etc/yum.repos.d/nvidia-docker.repo
2. Install the nvidia-docker2 package and reload the Docker daemon configuration:
$ sudo yum install nvidia-docker2
$ sudo pkill -SIGHUP dockerd
3. Verify that the nvidia-docker2 package has been installed correctly:
$ docker run --runtime=nvidia --rm nvidia/cuda:10.1-base nvidia-smi
The following is an example of the correct output:
$ docker run --runtime=nvidia --rm nvidia/cuda:10.1-base nvidia-smi
$ Unable to find image 'nvidia/cuda:10.1-base' locally
$ 10.1-base: Pulling from nvidia/cuda
$ d519e2592276: Pull complete
$ d22d2dfcfa9c: Pull complete
$ b3afe92c540b: Pull complete
$ 13a10df09dc1: Pull complete
$ 4f0bc36a7e1d: Pull complete
$ cd710321007d: Pull complete
$ Digest: sha256:635629544b2a2be3781246fdddc55cc1a7d8b352e2ef205ba6122b8404a52123
$ Status: Downloaded newer image for nvidia/cuda:10.1-base
$ Sun Feb 14 13:27:58 2021
$ +-----------------------------------------------------------------------------+
$ | NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
$ |-------------------------------+----------------------+----------------------+
$ | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
$ | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
$ |===============================+======================+======================|
$ |   0  GeForce GTX 105...  Off  | 00000000:01:00.0 Off |                  N/A |
$ | 32%   37C    P0    N/A /  75W |      0MiB /  4039MiB |      0%      Default |
$ +-------------------------------+----------------------+----------------------+
$
$ +-----------------------------------------------------------------------------+
$ | Processes:                                                       GPU Memory |
$ |  GPU       PID   Type   Process name                             Usage      |
$ |=============================================================================|
$ |  No running processes found                                                 |
$ +-----------------------------------------------------------------------------+
For more information on installing the NVIDIA Docker2 Toolkit on an x86_64 Bit Processor on CentOS, see NVIDIA Docker Installation - CentOS distributions
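The repository URL in step 1 is built from the `ID` and `VERSION_ID` fields of `/etc/os-release`. The following is a minimal sketch of that composition, not part of the SQream or NVIDIA tooling. The function names and the optional file argument are hypothetical and exist only so the logic can be tried without touching the system:

```shell
#!/bin/sh
# Print the distribution string used in the repository URL (for example "centos7").
# Assumption: the optional argument is a hypothetical override for experimenting;
# by default the function reads /etc/os-release, as the command in step 1 does.
distribution_id() {
    os_release_file="${1:-/etc/os-release}"
    # Source the file in a subshell so that its variables do not leak into the caller.
    ( . "$os_release_file" && printf '%s%s\n' "$ID" "$VERSION_ID" )
}

# Build the repository URL for a distribution string and a file extension
# ("repo" for yum-based systems, "list" for apt-based systems).
nvidia_docker_repo_url() {
    printf 'https://nvidia.github.io/nvidia-docker/%s/nvidia-docker.%s\n' "$1" "$2"
}

# Example usage (prints the URL without downloading anything):
#   nvidia_docker_repo_url "$(distribution_id)" repo
```

Printing the URL before piping it to `curl` is a simple way to confirm that the distribution string was detected as expected.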
2.1.4.5.3.12 Installing the NVIDIA Docker2 Toolkit on an x86_64 Bit Processor on Ubuntu
To install the NVIDIA Docker2 Toolkit on an x86_64 bit processor on Ubuntu: 1. Add the repository for your distribution:
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
  sudo apt-key add -
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list
$ sudo apt-get update
2. Install the nvidia-docker2 package and reload the Docker daemon configuration:
$ sudo apt-get install nvidia-docker2
$ sudo pkill -SIGHUP dockerd
3. Verify that the nvidia-docker2 package has been installed correctly:
$ docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
For more information on installing the NVIDIA Docker2 Toolkit on an x86_64 Bit Processor on Ubuntu, see NVIDIA Docker Installation - Ubuntu distributions.
Go back to Installing Your Kubernetes Cluster
2.1.4.5.3.13 Modifying the Docker Daemon JSON File for GPU and Compute Nodes
After installing the NVIDIA Docker2 toolkit, you must modify the Docker daemon JSON file for GPU and Compute nodes.
2.1.4.5.3.14 Modifying the Docker Daemon JSON File for GPU Nodes
To modify the Docker daemon JSON file for GPU nodes: 1. Enable GPU and set HTTP access to the local Kubernetes Docker registry.
Note: The Docker daemon JSON file must be modified on all GPU nodes.
Note: Contact your IT department for a virtual IP.
2. Replace the VIP address with your assigned VIP address. 3. Connect as a root user:
$ sudo -i
4. Set a variable that includes the VIP address:
$ export VIP_IP=
5. Replace the
$ cat <
6. Apply the changes and restart Docker:
$ systemctl daemon-reload && systemctl restart docker
7. Exit the root user:
$ exit
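The contents of the Docker daemon JSON file are not reproduced in this procedure. As a rough illustration only, a file that registers the NVIDIA runtime as the default and permits plain-HTTP access to a registry at a virtual IP might be generated as sketched below. The function name, the registry port, and the exact set of keys are assumptions and are not taken from the SQream package. `default-runtime`, `runtimes`, and `insecure-registries` are standard Docker daemon options:

```shell
#!/bin/sh
# Print a candidate Docker daemon configuration to standard output.
# Assumptions: the registry is reached over HTTP at <vip>:<port>, and the
# NVIDIA runtime binary is "nvidia-container-runtime" as installed by nvidia-docker2.
# Both arguments are hypothetical placeholders supplied by the caller.
render_gpu_daemon_json() {
    vip="$1"
    port="$2"
    cat <<EOF
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "insecure-registries": ["${vip}:${port}"]
}
EOF
}

# Example usage (review the output before writing it anywhere):
#   render_gpu_daemon_json "$VIP_IP" 5000
```

Rendering to standard output first makes it possible to review the result before it replaces a live configuration file and Docker is restarted.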
Go back to Installing Your Kubernetes Cluster
2.1.4.5.3.15 Modifying the Docker Daemon JSON File for Compute Nodes
You must follow this procedure only if you have a Compute node. To modify the Docker daemon JSON file for Compute nodes: 1. Switch to a root user:
$ sudo -i
2. Set a variable that includes a VIP address.
Note: Contact your IT department for a virtual IP.
3. Replace the VIP address with your assigned VIP address.
$ cat <
4. Restart the services:
$ systemctl daemon-reload && systemctl restart docker
5. Exit the root user:
$ exit
Go back to Installing Your Kubernetes Cluster
2.1.4.5.3.16 Installing the Nvidia-device-plugin Daemonset
After modifying the Docker daemon JSON file for GPU or Compute Nodes, you must install the Nvidia-device-plugin daemonset. The Nvidia-device-plugin daemonset is only relevant to GPU nodes. To install the Nvidia-device-plugin daemonset: 1. Set nvidia.com/gpu to true on all GPU nodes:
$ kubectl label nodes
2. Replace the
[root@eks-rhl-1 ~]# kubectl label nodes eks-rhl-1 nvidia.com/gpu=true
node/eks-rhl-1 labeled
[root@eks-rhl-1 ~]# kubectl label nodes eks-rhl-2 nvidia.com/gpu=true
node/eks-rhl-2 labeled
[root@eks-rhl-1 ~]# kubectl label nodes eks-rhl-3 nvidia.com/gpu=true
node/eks-rhl-3 labeled
Go back to Installing Your Kubernetes Cluster
2.1.4.5.3.17 Creating an Nvidia Device Plugin
After installing the Nvidia-device-plugin daemonset, you must create an Nvidia-device-plugin. You can create an Nvidia-device-plugin by running the following command:
$ kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/1.0.0-beta6/nvidia-device-plugin.yml
If needed, you can check the status of the Nvidia-device-plugin-daemonset pods:
$ kubectl get pods -n kube-system -o wide | grep nvidia-device-plugin
The following is an example of the correct output:
NAME                                   READY   STATUS    RESTARTS   AGE
nvidia-device-plugin-daemonset-fxfct   1/1     Running   0          6h1m
nvidia-device-plugin-daemonset-jdvxs   1/1     Running   0          6h1m
nvidia-device-plugin-daemonset-xpmsv   1/1     Running   0          6h1m
Go back to Installing Your Kubernetes Cluster
2.1.4.5.3.18 Checking GPU Resources Allocatable to GPU Nodes
After creating an Nvidia Device Plugin, you must check the GPU resources allocatable to the GPU nodes. Each GPU node has records, such as nvidia.com/gpu: <#>. The # indicates the number of allocatable, or available, GPUs in each node. You can output a description of allocatable resources by running the following command:
$ kubectl describe node | grep -i -A 7 -B 2 allocatable:
The following is an example of the correct output:
Allocatable:
  cpu: 3800m
  ephemeral-storage: 94999346224
  hugepages-1Gi: 0
  hugepages-2Mi: 0
  memory: 15605496Ki
  nvidia.com/gpu: 1
  pods: 110
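On a cluster with several nodes, reading each nvidia.com/gpu record by eye can be tedious. The following is a small sketch, assuming the usual `kubectl describe node` layout (an unindented `Allocatable:` header followed by indented records), that adds up the allocatable GPUs across all nodes. The function name is hypothetical:

```shell
#!/bin/sh
# Sum the nvidia.com/gpu values that appear inside "Allocatable:" sections.
# Reads `kubectl describe node` output (filtered or unfiltered) from standard input.
count_allocatable_gpus() {
    awk '
        /^Allocatable:/ { in_alloc = 1; next }   # start of an Allocatable section
        /^[^[:space:]]/ { in_alloc = 0 }         # any other unindented line ends it
        in_alloc && $1 == "nvidia.com/gpu:" { total += $2 }
        END { print total + 0 }
    '
}

# Example usage:
#   kubectl describe node | count_allocatable_gpus
```

Only records under `Allocatable:` are counted, so the matching `Capacity:` records for the same node are not added a second time.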
Go back to Installing Your Kubernetes Cluster
2.1.4.5.3.19 Preparing the WatchDog Monitor
SQream’s deployment includes installing two watchdog services. These services monitor Kubernetes management and the server’s storage network. You can enable the storage watchdogs by adding entries in the /etc/hosts file on each server:
k8s-node1.storage
k8s-node2.storage
k8s-node3.storage
The following is an example of the correct syntax:
10.0.0.1 k8s-node1.storage
10.0.0.2 k8s-node2.storage
10.0.0.3 k8s-node3.storage
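The entries follow a regular pattern, so they can be generated rather than typed on every server. The sketch below assumes the `k8s-node<N>.storage` naming shown in the example above and numbers the nodes in the order the addresses are given. The function name is hypothetical:

```shell
#!/bin/sh
# Print one /etc/hosts entry per storage IP address, numbering nodes from 1.
storage_hosts_entries() {
    n=1
    for ip in "$@"; do
        printf '%s k8s-node%d.storage\n' "$ip" "$n"
        n=$((n + 1))
    done
}

# Example usage (prints the entries; append them to /etc/hosts only after review):
#   storage_hosts_entries 10.0.0.1 10.0.0.2 10.0.0.3
```

Using the same argument order on every server keeps the node numbering consistent across the cluster.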
Go back to Installing Your Kubernetes Cluster
2.1.4.5.4 Installing the SQream Software
Once you’ve prepared the SQream environment for launching it using Kubernetes, you can begin installing the SQream software. The Installing the SQream Software section describes the following:
• Getting the SQream Package
• Setting Up and Configuring Hadoop
• Starting a Local Docker Image Registry
• Installing the Kubernetes Dashboard
• Installing the SQream Prometheus Package
2.1.4.5.4.1 Getting the SQream Package
The first step in installing the SQream software is getting the SQream package. Please contact the SQream Support team to get the sqream_k8s-nnn-DBnnn-COnnn-SDnnn-
• SD
$ tar -xvf sqream_k8s-1.0.15-DB2020.1.0.2-SD0.7.3-x86_64.tar.gz
$ cd sqream_k8s-1.0.15-DB2020.1.0.2-SD0.7.3-x86_64
$ ls
Extracting the contents of the tarball file generates a new folder with the same name as the tarball file. The following shows the output of the extracted file:
drwxrwxr-x. 2 sqream sqream 22 Jan 27 11:39 license
lrwxrwxrwx. 1 sqream sqream 49 Jan 27 11:39 sqream -> .sqream/sqream-sql-v2020.3.1_stable.x86_64/sqream
-rwxrwxr-x. 1 sqream sqream 9465 Jan 27 11:39 sqream-install
-rwxrwxr-x. 1 sqream sqream 12444 Jan 27 11:39 sqream-start
Go back to Installing Your SQream Software
2.1.4.5.4.2 Setting Up and Configuring Hadoop
After getting the SQream package, you can set up and configure Hadoop by configuring the keytab and krb5.conf files.
Note: You only need to configure the keytab and krb5.conf files if you use Hadoop with Kerberos authentication.
To set up and configure Hadoop: 1. Contact IT for the keytab and krb5.conf files. 2. Copy both files into the respective empty .hadoop/ and .krb5/ directories:
$ cp hdfs.keytab krb5.conf .krb5/
$ cp core-site.xml hdfs-site.xml .hadoop/
The SQream installer automatically copies the above files during the installation process. Go back to Installing Your SQream Software
2.1.4.5.4.3 Starting a Local Docker Image Registry
After getting the SQream package, or (optionally) setting up and configuring Hadoop, you must start a local Docker image registry. Because Kubernetes is based on Docker, you must start the local Docker image registry on the host’s shared folder. This allows all hosts to pull the SQream Docker images. To start a local Docker image registry: 1. Create a Docker registry folder:
$ mkdir
2. Set the docker_path for the Docker registry folder:
$ export docker_path=
3. Apply the docker-registry service to the cluster:
$ cat .k8s/admin/docker_registry.yaml | envsubst | kubectl create -f -
The following is an example of the correct output:
namespace/sqream-docker-registry created
configmap/sqream-docker-registry-config created
deployment.apps/sqream-docker-registry created
service/sqream-docker-registry created
4. Check the pod status of the docker-registry service:
$ kubectl get pods -n sqream-docker-registry
The following is an example of the correct output:
NAME                                      READY   STATUS    RESTARTS   AGE
sqream-docker-registry-655889fc57-hmg7h   1/1     Running   0          6h40m
Go back to Installing Your SQream Software
2.1.4.5.4.4 Installing the Kubernetes Dashboard
After starting a local Docker image registry, you must install the Kubernetes dashboard. The Kubernetes dashboard lets you see the Kubernetes cluster, nodes, services, and pod status. To install the Kubernetes dashboard: 1. Apply the k8s-dashboard service to the cluster:
$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.0.0/aio/deploy/recommended.yaml
The following is an example of the correct output:
namespace/kubernetes-dashboard created
serviceaccount/kubernetes-dashboard created
service/kubernetes-dashboard created
secret/kubernetes-dashboard-certs created
secret/kubernetes-dashboard-csrf created
secret/kubernetes-dashboard-key-holder created
configmap/kubernetes-dashboard-settings created
role.rbac.authorization.k8s.io/kubernetes-dashboard created
clusterrole.rbac.authorization.k8s.io/kubernetes-dashboard created
rolebinding.rbac.authorization.k8s.io/kubernetes-dashboard created
clusterrolebinding.rbac.authorization.k8s.io/kubernetes-dashboard created
deployment.apps/kubernetes-dashboard created
service/dashboard-metrics-scraper created
deployment.apps/dashboard-metrics-scraper created
2. Grant the user external access to the Kubernetes dashboard:
$ cat .k8s/admin/kubernetes-dashboard-svc-metallb.yaml | envsubst | kubectl create -f -
The following is an example of the correct output:
service/kubernetes-dashboard-nodeport created
3. Apply the cluster-admin-sa.yaml file to the cluster:
$ kubectl create -f .k8s/admin/cluster-admin-sa.yaml
The following is an example of the correct output:
clusterrolebinding.rbac.authorization.k8s.io/cluster-admin-sa-cluster-admin created
4. Check the pod status of the K8s-dashboard service:
$ kubectl get pods -n kubernetes-dashboard
The following is an example of the correct output:
NAME                                         READY   STATUS    RESTARTS   AGE
dashboard-metrics-scraper-6b4884c9d5-n8p57   1/1     Running   0          4m32s
kubernetes-dashboard-7b544877d5-qc8b4        1/1     Running   0          4m32s
5. Obtain the k8s-dashboard access token:
$ kubectl -n kube-system describe secrets cluster-admin-sa-token
The following is an example of the correct output:
Name: cluster-admin-sa-token-rbl9p
Namespace: kube-system
Labels:

Type: kubernetes.io/service-account-token

Data
====
ca.crt: 1025 bytes
namespace: 11 bytes
token: eyJhbGciOiJSUzI1NiIsImtpZCI6IjRMV09qVzFabjhId09oamQzZGFFNmZBeEFzOHp3SlJOZWdtVm5lVTdtSW8ifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJjbHVzdGVyLWFkbWluLXNhLXRva2VuLXJibDlwIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6ImNsdXN0ZXItYWRtaW4tc2EiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiI4MTg2NmQ2ZC04ZWYzLTQ4MDUtODQwZC01ODYxODIzNWY2OGQiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6a3ViZS1zeXN0ZW06Y2x1c3Rlci1hZG1pbi1zYSJ9.mNhp8JMr5y3hQ44QrvRDCMueyjSHSrmqZcoV00ZC7iBzNUqh3n-fB99CvC_GR15ys43jnfsz0tdsTy7VtSc9hm5ENBI-tQ_mwT1Zc7zJrEtgFiA0o_eyfYZOARdhdyFEJg84bzkIxJFPKkBWb4iPWU1Xb7RibuMCjNTarZMZbqzKYfQEcMZWJ5UmfUqp-HahZZR4BNbjSWybs7t6RWdcQZt6sO_rRCDrOeEJlqKKjx4-5jFZB8Du_0kKmnw2YJmmSCEOXrpQCyXIiZJpX08HyDDYfFp8IGzm61arB8HDA9dN_xoWvuz4Cj8klUtTzL9effJJPjHJlZXcEqQc9hE3jw
6. Navigate to https://
7. Select the Token radio button, paste the token from the previous command output, and click Sign in. The Kubernetes dashboard is displayed. Go back to Installing Your SQream Software
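Because the token is long, copying it by hand from the command output in step 5 is error-prone. The following sketch, which assumes the `token:` record layout shown in that output, prints only the token value so that it can be pasted into the sign-in form. The function name is hypothetical:

```shell
#!/bin/sh
# Print the value of the first "token:" record read from standard input.
extract_token() {
    awk '$1 == "token:" { print $2; exit }'
}

# Example usage:
#   kubectl -n kube-system describe secrets cluster-admin-sa-token | extract_token
```

The token grants cluster-admin access, so avoid writing it to shared logs or shell history files.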
2.1.4.5.4.5 Installing the SQream Prometheus Package
After installing the Kubernetes dashboard, you must install the SQream Prometheus package. To properly monitor host and GPU statistics, the exporter service must be installed on each Kubernetes cluster node. This section describes how to install the following:
• node_exporter - collects host data, such as CPU and memory usage.
• nvidia_exporter - collects GPU utilization data.
Note: The steps in this section must be done on all cluster nodes.
To install the sqream-prometheus package, you must do the following:
1. Install the exporter service
2. Check the exporter service
Go back to Installing Your SQream Software
2.1.4.5.4.6 Installing the Exporter Service
To install the exporter service: 1. Create a user and group that will be used to run the exporter services:
$ sudo groupadd --system prometheus && sudo useradd -s /sbin/nologin --system -g prometheus prometheus
2. Extract the sqream_exporters_prometheus.0.1.tar.gz file:
$ cd .prometheus
$ tar -xf sqream_exporters_prometheus.0.1.tar.gz
3. Copy the exporter software files to the /usr/bin directory:
$ cd sqream_exporters_prometheus.0.1
$ sudo cp node_exporter/node_exporter /usr/bin/
$ sudo cp nvidia_exporter/nvidia_exporter /usr/bin/
4. Copy the exporters service file to the /etc/systemd/system/ directory:
$ sudo cp services/node_exporter.service /etc/systemd/system/
$ sudo cp services/nvidia_exporter.service /etc/systemd/system/
5. Set the ownership and permissions of the exporter binaries:
$ sudo chown prometheus:prometheus /usr/bin/node_exporter
$ sudo chmod u+x /usr/bin/node_exporter
$ sudo chown prometheus:prometheus /usr/bin/nvidia_exporter
$ sudo chmod u+x /usr/bin/nvidia_exporter
6. Reload the services:
$ sudo systemctl daemon-reload
7. Start both services and set them to start when the server is booted up: • Node_exporter:
$ sudo systemctl start node_exporter && sudo systemctl enable node_exporter
• Nvidia_exporter:
$ sudo systemctl start nvidia_exporter && sudo systemctl enable nvidia_exporter
2.1.4.5.4.7 Checking the Exporter Status
After installing the exporter service, you must check its status. You can check the exporter status by running the following command:
$ sudo systemctl status node_exporter && sudo systemctl status nvidia_exporter
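The `systemctl status` output is meant for reading. For a scripted check on every node, `systemctl is-active` prints one state per service, which is easier to test. The sketch below wraps that check in a hypothetical helper that succeeds only when every state it reads is `active`:

```shell
#!/bin/sh
# Succeed only if at least one state was read from standard input
# and every state is exactly "active".
all_active() {
    seen=0
    while IFS= read -r state; do
        seen=1
        [ "$state" = "active" ] || return 1
    done
    [ "$seen" -eq 1 ]
}

# Example usage:
#   systemctl is-active node_exporter nvidia_exporter | all_active \
#       && echo "exporters are running" \
#       || echo "at least one exporter is not running"
```

Treating empty input as a failure avoids reporting success when the check itself could not be run.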
Go back to Installing Your SQream Software
2.1.4.5.5 Running the Sqream-install Service
The Running the Sqream-install Service section describes the following:
• Installing Your License
• Changing Your Data Ingest Folder
• Checking Your System Settings
• SQream Installation Command Reference
• Controlling Your Kubernetes Cluster Using SQream Flags
2.1.4.5.5.1 Installing Your License
After installing the SQream Prometheus package, you must install your license. To install your license: 1. Copy your license package to the sqream /license folder.
Note: You do not need to untar the license package after copying it to the /license folder because the installer script does it automatically.
The following flags are mandatory during your first run:
$ sudo ./sqream-install -i -k -m
Note: If you cannot run the script with sudo, verify that you have the right permission (rwx for the user) on the relevant directories (config, log, volume, and data-in directories).
Go back to Running the SQream_install Service.
2.1.4.5.5.2 Changing Your Data Ingest Folder
After installing your license, you must change your data ingest folder. You can change your data ingest folder by running the following command:
$ sudo ./sqream-install -d /media/nfs/sqream/data_in
Go back to Running the SQream_install Service.
2.1.4.5.5.3 Checking Your System Settings
After changing your data ingest folder, you can check your system settings. The following command shows you all the variables that your SQream system is running with:
$ ./sqream-install -s
After optionally checking your system settings, you can use the sqream-start application to control your Kubernetes cluster. Go back to Running the SQream_install Service.
2.1.4.5.5.4 SQream Installation Command Reference
If needed, you can display the sqream-install flag reference by running the following command:
$ ./sqream-install --help
The following shows the sqream-install flag descriptions:
Flag | Function | Note
-i | Loads all the software from the hidden .docker folder. | Mandatory
-k | Loads the license package from the /license directory. | Mandatory
-m | Sets the relative path for all SQream folders under the shared filesystem available from all nodes (sqreamdb, config, logs and data_in). No other flags are required if you use this flag (such as c, v, l or d). | Mandatory
-c | Sets the path where to write/read SQream configuration files from. The default is /etc/sqream/. | Optional
-v | Shows the location of the SQream cluster. v creates a cluster if none exists, and mounts it if one does. | Optional
-l | Shows the location of the SQream system startup logs. The logs contain startup and Docker logs. The default is /var/log/sqream/. | Optional
-d | Shows the folder containing data that you want to import into or copy from SQream. | Optional
-n
Go back to Running the SQream_install Service.
2.1.4.5.5.5 Controlling Your Kubernetes Cluster Using SQream Flags
You can control your Kubernetes cluster using SQream flags. The following command shows you the available Kubernetes cluster control options:
$ ./sqream-start -h
The following describes the sqream-start flags:
Flag | Function | Note
-s | Starts the sqream services, starting metadata, server picker, and workers. The number of workers started is based on the number of available GPUs. | Mandatory
-p | Sets specific ports to the workers services. You must enter the starting port for the sqream-start application to allocate it based on the number of workers. |
-j | Uses an external .json configuration file. The file must be located in the configuration directory. | The workers must each be started individually.
-m | Allocates worker spool memory. | The workers must each be started individually.
-a | Starts the SQream Administration dashboard and specifies the listening port. |
-d | Deletes all running SQream services. |
-h | Shows all available flags. | Help
Go back to Running the SQream_install Service.
2.1.4.5.6 Using the sqream-start Commands
In addition to controlling your Kubernetes cluster using SQream flags, you can control it using sqream-start commands. The Using the sqream-start Commands section describes the following:
• Starting Your SQream Services
• Starting Your SQream Services in Split Mode
• Starting the Sqream Studio UI
• Stopping the SQream Services
• Advanced sqream-start Commands
2.1.4.5.6.1 Starting Your SQream Services
You can run the sqream-start command with the -s flag to start SQream services on all available GPUs:
$ sudo ./sqream-start -s
This command starts the SQream metadata, server picker, and sqream workers on all available GPUs in the cluster. The following is an example of the correct output:
$ ./sqream-start -s
Initializing network watchdogs on 3 hosts...
Network watchdogs are up and running

Initializing 3 worker data collectors ...
Worker data collectors are up and running

Starting Prometheus ...
Prometheus is available at 192.168.5.100:9090

Starting SQream master ...
SQream master is up and running

Starting up 3 SQream workers ...
All SQream workers are up and running, SQream-DB is available at 192.168.5.100:3108
Go back to Using the SQream-start Commands.
2.1.4.5.6.2 Starting Your SQream Services in Split Mode
Starting SQream services in split mode refers to running multiple SQream workers on a single GPU. You can do this by running the sqream-start command with the -s and -z flags. In addition, you can define the number of hosts on which to run the multiple workers. In the example below, the command runs the multiple workers on three hosts. To start SQream services in split mode: 1. Run the following command:
$ ./sqream-start -s -z 3
This command starts the SQream metadata, server picker, and sqream workers on a single GPU for three hosts. The following is an example of the correct output:
Initializing network watchdogs on 3 hosts...
Network watchdogs are up and running

Initializing 3 worker data collectors ...
Worker data collectors are up and running

Starting Prometheus ...
Prometheus is available at 192.168.5.101:9090

Starting SQream master ...
SQream master is up and running

Starting up 9 SQream workers over <#> available GPUs ...
All SQream workers are up and running, SQream-DB is available at 192.168.5.101:3108
2. Verify that all pods are running properly in the k8s cluster (STATUS column):
$ kubectl -n sqream get pods
NAME   READY   STATUS   RESTARTS   AGE
prometheus-bcf877867-kxhld   1/1   Running   0   106s
sqream-metadata-fbcbc989f-6zlkx   1/1   Running   0   103s
sqream-picker-64b8c57ff5-ndfr9   1/1   Running   2   102s
sqream-split-workers-0-1-2-6bdbfbbb86-ml7kn   1/1   Running   0   57s
sqream-split-workers-3-4-5-5cb49d49d7-596n4   1/1   Running   0   57s
sqream-split-workers-6-7-8-6d598f4b68-2n9z5   1/1   Running   0   56s
sqream-workers-start-xj75g   1/1   Running   0   58s
watchdog-network-management-6dnfh   1/1   Running   0   115s
watchdog-network-management-tfd46   1/1   Running   0   115s
watchdog-network-management-xct4d   1/1   Running   0   115s
watchdog-network-storage-lr6v4   1/1   Running   0   116s
watchdog-network-storage-s29h7   1/1   Running   0   116s
watchdog-network-storage-sx9mw   1/1   Running   0   116s
worker-data-collector-62rxs   0/1   Init:0/1   0   54s
worker-data-collector-n8jsv   0/1   Init:0/1   0   55s
worker-data-collector-zp8vf   0/1   Init:0/1   0   54s
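With this many pods, it can be easier to list only the ones whose STATUS is not Running. The following sketch filters the `kubectl get pods` table on its third column. The function name is hypothetical, and it assumes the default column order shown in the output above:

```shell
#!/bin/sh
# Print the names of pods whose STATUS column is anything other than "Running".
# Reads a `kubectl get pods` table, including its header row, from standard input.
pods_not_running() {
    awk 'NR > 1 && $3 != "Running" { print $1 }'
}

# Example usage:
#   kubectl -n sqream get pods | pods_not_running
```

Pods that are still initializing, such as the worker-data-collector pods in the output above, are listed until they reach the Running state. Pods that finished successfully would also be listed, because their status is Completed rather than Running.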
Go back to Using the SQream-start Commands.
2.1.4.5.6.3 Starting the Sqream Studio UI
You can run the following command to start the SQream Studio UI (Editor and Dashboard):
$ ./sqream-start -a
The following is an example of the correct output:
$ ./sqream-start -a
Please enter USERNAME: sqream
Please enter PASSWORD: ******
Please enter port value or press ENTER to keep 8080:

Starting up SQream Admin UI...
SQream admin ui is available at 192.168.5.100:8080
Go back to Using the SQream-start Commands.
2.1.4.5.6.4 Stopping the SQream Services
You can run the following command to stop all SQream services:
$ ./sqream-start -d
The following is an example of the correct output:
$ ./sqream-start -d
Cleaning all SQream services in sqream namespace ...
All SQream service removed from sqream namespace
Go back to Using the SQream-start Commands.
2.1.4.5.6.5 Advanced sqream-start Commands
2.1.4.5.6.6 Controlling Your SQream Spool Size
If you do not specify the SQream spool size, the console automatically distributes the available RAM between all running workers. You can define a specific spool size by running the following command:
$ ./sqream-start -s -m 4
2.1.4.5.6.7 Using a Custom .json File
You have the option of using your own .json file for your own custom configurations. Your .json file must be placed within the path mounted in the installation. SQream recommends placing your .json file in the configuration folder. The SQream console does not validate the integrity of external .json files. You can use the following command (using the j flag) to set the full path of your .json file to the configuration file:
$ ./sqream-start -s -f
This command starts one worker with an external configuration file.
Note: The configuration file must be available in the shared configuration folder.
2.1.4.5.6.8 Checking the Status of the SQream Services
You can show all running SQream services by running the following command:
$ kubectl get pods -n
This command shows all running services in the cluster and which nodes they are running in. Go back to Using the SQream-start Commands.
2.1.4.5.7 Upgrading Your SQream Version
The Upgrading Your SQream Version section describes the following:
• Before Upgrading Your System • Upgrading Your System
2.1.4.5.7.1 Before Upgrading Your System
Before upgrading your system you must do the following: 1. Contact SQream support for a new SQream package tarball file. 2. Set a maintenance window.
Note: You must stop the system while upgrading it.
2.1.4.5.7.2 Upgrading Your System
After completing the steps in Before Upgrading Your System above, you can upgrade your system. To upgrade your system: 1. Extract the contents of the tarball file that you received from SQream support. Make sure to extract the contents to the same directory as in Getting the SQream Package and for the same user:
$ tar -xvf sqream_installer-2.0.5-DB2019.2.1-CO1.6.3-ED3.0.0-x86_64.tar.gz
$ cd sqream_installer-2.0.5-DB2019.2.1-CO1.6.3-ED3.0.0-x86_64/
2. To start the upgrade process, run the following command:
$ ./sqream-install -i
The upgrade process checks whether the SQream services are running and prompts you to stop them.
3. Do one of the following:
• Stop the upgrade by writing No.
• Continue the upgrade by writing Yes.
If you continue upgrading, all running SQream workers (master and editor) are stopped. When all services have been stopped, the new version is loaded.
Note: SQream periodically upgrades its metadata structure. If an upgrade version includes an upgraded metadata service, an approval request message is displayed. This approval is required to finish the upgrade process. Because SQream supports only specific metadata versions, all SQream services must be upgraded at the same time.
4. When SQream has successfully upgraded, load the SQream console and restart your services. For questions, contact SQream Support.
2.1.4.6 Running SQream in a Docker Container
This document describes how to prepare your machine’s environment for installing and running SQream in a Docker container.
This page describes the following:
• Running SQream in a Docker Container
  – Setting Up a Host
  – Installing the Docker Engine (Community Edition)
  – Docker Post-Installation
  – Installing the Nvidia Docker2 ToolKit
  – Installing the SQream Software
  – Using the SQream Console
2.1.4.6.1 Setting Up a Host
2.1.4.6.1.1 Operating System Requirements
SQream was tested and verified on the following versions of Linux:
• x86 CentOS/RHEL 7.6 - 7.9
• IBM RHEL 7.6
SQream recommends installing a clean OS on the host to avoid any installation issues.
Warning: Docker-based installation supports only single host deployment and cannot be used on a multi-node cluster. If you install SQream with Docker on a single host, you will not be able to scale it to a multi-node cluster.
2.1.4.6.1.2 Creating a Local User
To run SQream in a Docker container you must create a local user. To create a local user: 1. Add a local user:
$ useradd -m -U
2. Set the local user’s password:
$ passwd
3. Add the local user to the wheel group:
$ usermod -aG wheel
You can remove the local user from the wheel group when you have completed the installation. 4. Log out and log back in as the local user.
2.1.4.6.1.3 Setting a Local Language
After creating a local user you must set a local language. To set a local language: 1. Set the local language:
$ sudo localectl set-locale LANG=en_US.UTF-8
2. Set the time zone, which determines the local time and date:
$ sudo timedatectl set-timezone Asia/Jerusalem
You can run the timedatectl list-timezones command to see your timezone.
2.1.4.6.1.4 Adding the EPEL Repository
After setting a local language you must add the EPEL repository. To add the EPEL repository: 1. As a root user, upgrade the epel-release-latest-7.noarch.rpm repository:
$ sudo rpm -Uvh http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
2.1.4.6.1.5 Installing the Required NTP Packages
After adding the EPEL repository, you must install the required NTP packages. You can install the required NTP packages by running the following command:
$ sudo yum install ntp pciutils python36 kernel-devel-$(uname -r) kernel-headers-$(uname -r) gcc
2.1.4.6.1.6 Installing the Recommended Tools
After installing the required NTP packages you must install the recommended tools. SQream recommends installing the following tools:
$ sudo yum install bash-completion.noarch vim-enhanced.x86_64 vim-common.x86_64 net-tools iotop htop psmisc screen xfsprogs wget yum-utils deltarpm dos2unix
2.1.4.6.1.7 Updating to the Current Version of the Operating System
After installing the recommended tools you must update to the current version of the operating system. SQream recommends updating to the current version of the operating system. This is not recommended if the nvidia driver has not been installed.
2.1.4.6.1.8 Configuring the NTP Package
After updating to the current version of the operating system you must configure the NTP package. To configure the NTP package: 1. Add your local servers to the NTP configuration.
2. Configure the ntpd service to begin running when your machine is started:
$ sudo systemctl enable ntpd
$ sudo systemctl start ntpd
$ sudo ntpq -p
2.1.4.6.1.9 Configuring the Performance Profile
After configuring the NTP package you must configure the performance profile. To configure the performance profile: 1. Switch the active profile:
$ sudo tuned-adm profile throughput-performance
2. Set the default run level to multi-user:
$ sudo systemctl set-default multi-user.target
2.1.4.6.1.10 Configuring Your Security Limits
After configuring the performance profile you must configure your security limits. Configuring your security limits refers to configuring the number of open files, processes, etc. To configure your security limits: 1. Run the bash shell as a super-user:
$ sudo bash
2. Run the following command:
$ echo -e "sqream soft nproc 500000\nsqream hard nproc 500000\nsqream soft nofile 500000\nsqream hard nofile 500000\nsqream soft core unlimited\nsqream hard core unlimited" >> /etc/security/limits.conf
3. Run the following command:
$ echo -e "vm.dirty_background_ratio = 5 \n vm.dirty_ratio = 10 \n vm.swappiness = 10 \n vm.zone_reclaim_mode = 0 \n vm.vfs_cache_pressure = 200 \n" >> /etc/sysctl.conf
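Because both commands append with `>>`, running them a second time writes every entry again. The following sketch shows one way to make the limits step repeatable: it renders the same six entries as step 2 and appends only the lines that the target file does not already contain. The function names are hypothetical, and the target path is passed as an argument so that the logic can be tried on a scratch file first:

```shell
#!/bin/sh
# Print the limits.conf entries from step 2 for the given user, in the same order.
render_limits() {
    user="$1"
    for item in "nproc 500000" "nofile 500000" "core unlimited"; do
        for kind in soft hard; do
            printf '%s %s %s\n' "$user" "$kind" "$item"
        done
    done
}

# Append each line read from standard input to the target file,
# skipping lines that are already present verbatim.
append_missing_lines() {
    target="$1"
    touch "$target"
    while IFS= read -r line; do
        grep -qxF -- "$line" "$target" || printf '%s\n' "$line" >> "$target"
    done
}

# Example usage (as root, once the output has been reviewed on a scratch file):
#   render_limits sqream | append_missing_lines /etc/security/limits.conf
```

The comparison is an exact whole-line match, so an existing entry with different spacing or a different value is left in place and the new line is added after it.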
2.1.4.6.1.11 Disabling Automatic Bug-Reporting Tools
After configuring your security limits you must disable the following automatic bug-reporting tools:
• abrt-ccpp.service
• abrtd.service
• abrt-oops.service
• abrt-pstoreoops.service
• abrt-vmcore.service
• abrt-xorg.service
You can disable and stop the above bug-reporting tools by running the following command:
$ for i in abrt-ccpp.service abrtd.service abrt-oops.service abrt-pstoreoops.service abrt-vmcore.service abrt-xorg.service ; do sudo systemctl disable $i; sudo systemctl stop $i; done
2.1.4.6.1.12 Preparing the Nvidia CUDA Driver for Installation
After disabling the automatic bug-reporting tools you must prepare the Nvidia CUDA driver for installation. To prepare the Nvidia CUDA driver for installation: 1. Reboot all servers.
2. Verify that the Tesla NVIDIA card has been installed and is detected by the system:
$ lspci | grep -i nvidia
The correct output is a list of Nvidia graphic cards. If you do not receive this output, verify that an NVIDIA GPU card has been installed.
3. Verify that the open-source upstream Nvidia driver (nouveau) is not running:
$ lsmod | grep nouveau
No output should be generated.
4. If you receive any output, do the following:
1. Disable the open-source upstream Nvidia driver:
$ sudo bash
$ echo "blacklist nouveau" > /etc/modprobe.d/blacklist-nouveau.conf
$ echo "options nouveau modeset=0" >> /etc/modprobe.d/blacklist-nouveau.conf
$ dracut --force
$ modprobe --showconfig | grep nouveau
2. Reboot the server and verify that the Nouveau module has not been loaded:
$ lsmod | grep nouveau
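The check above can also be scripted. The following sketch assumes lsmod-style text on standard input, with the module name in the first column. The nouveau_loaded helper is a hypothetical name. Unlike a plain grep, it ignores lines where nouveau only appears in the "Used by" column:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit status 0) if a module named "nouveau"
# appears in the first column of lsmod-style input read from stdin.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# Example call, commented out so that sourcing this file has no side effects:
# if lsmod | nouveau_loaded; then
#     echo "nouveau is still loaded"
# else
#     echo "nouveau is not loaded"
# fi
```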
2.1.4.6.1.13 Installing the Nvidia CUDA Driver
After preparing the Nvidia CUDA driver for installation, you must install it.
To install the Nvidia CUDA driver:
1. Check if the Nvidia CUDA driver has already been installed:
$ nvidia-smi
The following is an example of the correct output:
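nvidia-smi
Wed Oct 30 14:05:42 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000004:04:00.0 Off |                    0 |
| N/A   32C    P0    37W / 300W |      0MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  On   | 00000035:03:00.0 Off |                    0 |
| N/A   33C    P0    37W / 300W |      0MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+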
2. Verify that the installed CUDA version shown in the output above is 10.1.
3. Do one of the following:
   1. If CUDA version 10.1 has already been installed, skip to Installing the Docker Engine (Community Edition).
   2. If CUDA version 10.1 has not been installed yet, continue with Step 4 below.
4. Do one of the following:
• Install CUDA driver version 10.1 for x86_64.
• Install CUDA driver version 10.1 for IBM Power9.
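The version check in step 2 can be scripted as well. The following sketch assumes the nvidia-smi header format shown in the example output above, where the header line contains the text "CUDA Version:". The cuda_version_from_smi helper is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the CUDA version found in nvidia-smi output
# read from stdin. Prints nothing if no "CUDA Version:" field is present.
cuda_version_from_smi() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Example call, commented out so that sourcing this file has no side effects:
# if [ "$(nvidia-smi | cuda_version_from_smi)" = "10.1" ]; then
#     echo "CUDA 10.1 is already installed"
# fi
```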
2.1.4.6.1.14 Installing the CUDA Driver Version 10.1 for x86_64
To install the CUDA driver version 10.1 for x86_64:
1. Make the following target platform selections:
• Operating system: Linux
• Architecture: x86_64
• Distribution: CentOS
• Version: 7
• Installer type: the relevant installer type
For installer type, SQream recommends selecting runfile (local). The available selections show only the supported platforms.
2. Download the base installer for Linux CentOS 7 x86_64.
3. Install the base installer for Linux CentOS 7 x86_64:
   1. Run the following command:
$ sudo sh cuda_10.1.105_418.39_linux.run
   2. Follow the command line prompts.
4. Enable the Nvidia service to start at boot and start it:
$ sudo systemctl enable nvidia-persistenced.service && sudo systemctl start nvidia-persistenced.service
5. Create a symbolic link from the /etc/systemd/system/multi-user.target.wants/nvidia-persistenced.service file to the /usr/lib/systemd/system/nvidia-persistenced.service file.
6. Reboot the server.
7. Verify that the Nvidia driver has been installed and shows all available GPUs:
$ nvidia-smi
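nvidia-smi
Wed Oct 30 14:05:42 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000004:04:00.0 Off |                    0 |
| N/A   32C    P0    37W / 300W |      0MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  On   | 00000035:03:00.0 Off |                    0 |
| N/A   33C    P0    37W / 300W |      0MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+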
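Step 5 in the procedure above can be sketched as follows. The link_unit helper is a hypothetical name that only wraps ln -s: the first argument is the existing unit file, and the link is created inside the wants directory given as the second argument. On many systems, systemctl enable already creates this link, in which case the step changes nothing:

```shell
#!/usr/bin/env bash
# Hypothetical helper: create (or refresh) a symbolic link to a systemd
# unit file inside a "wants" directory.
# Usage: link_unit UNIT_FILE WANTS_DIR
link_unit() {
    local unit_file=$1 wants_dir=$2
    # -s: symbolic link, -f: replace an existing link, -n: do not follow it
    ln -sfn "$unit_file" "$wants_dir/$(basename "$unit_file")"
}

# Example call for the paths named in step 5, commented out so that
# sourcing this file has no side effects:
# sudo ln -sfn /usr/lib/systemd/system/nvidia-persistenced.service \
#     /etc/systemd/system/multi-user.target.wants/nvidia-persistenced.service
```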
2.1.4.6.1.15 Installing the CUDA Driver Version 10.1 for IBM Power9
To install the CUDA driver version 10.1 for IBM Power9:
1. Make the following target platform selections:
• Operating system: Linux
• Architecture: ppc64le
• Distribution: RHEL
• Version: 7
• Installer type: the relevant installer type
For installer type, SQream recommends selecting runfile (local). The available selections show only the supported platforms.
To disable the udev rule:
1. Copy the file to the /etc/udev/rules.d directory.
2. Comment out, remove, or change the hot-pluggable memory rule located in the file copied to the /etc/udev/rules.d directory. This prevents it from affecting the Power9 Nvidia systems:
• Run the following on RHEL version 7.6 or later:
$ sudo cp /lib/udev/rules.d/40-redhat.rules /etc/udev/rules.d
$ sudo sed -i 's/SUBSYSTEM!="memory",.*GOTO="memory_hotplug_end"/SUBSYSTEM=="*", GOTO="memory_hotplug_end"/' /etc/udev/rules.d/40-redhat.rules
4. Enable the nvidia-persistenced.service file:
$ sudo systemctl enable nvidia-persistenced.service
5. Create a symbolic link from the /etc/systemd/system/multi-user.target.wants/nvidia-persistenced.service file to the /usr/lib/systemd/system/nvidia-persistenced.service file.
6. Reboot your system to initialize the above modifications.
7. Verify that the Nvidia driver and the nvidia-persistenced.service files are running:
$ nvidia-smi
The following is the correct output:
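nvidia-smi
Wed Oct 30 14:05:42 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000004:04:00.0 Off |                    0 |
| N/A   32C    P0    37W / 300W |      0MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  On   | 00000035:03:00.0 Off |                    0 |
| N/A   33C    P0    37W / 300W |      0MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+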
8. Verify that the nvidia-persistenced service is running:
$ systemctl status nvidia-persistenced
The following is the correct output:
[root@gpudb ~]# systemctl status nvidia-persistenced
nvidia-persistenced.service - NVIDIA Persistence Daemon
   Loaded: loaded (/usr/lib/systemd/system/nvidia-persistenced.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2019-10-15 21:43:19 KST; 11min ago
  Process: 8257 ExecStart=/usr/bin/nvidia-persistenced --verbose (code=exited, status=0/SUCCESS)
 Main PID: 8265 (nvidia-persiste)
    Tasks: 1
   Memory: 21.0M
   CGroup: /system.slice/nvidia-persistenced.service
           8265 /usr/bin/nvidia-persistenced --verbose
2.1.4.6.2 Installing the Docker Engine (Community Edition)
After installing the Nvidia CUDA driver, you must install the Docker engine.
This section describes how to install the Docker engine on the following platforms:
• x86_64 processor on CentOS
• x86_64 processor on Ubuntu
• IBM Power9 (PPC64le) processor
2.1.4.6.2.1 Installing the Docker Engine Using an x86_64 Processor on CentOS
The x86_64 processor supports installing the Docker Community Edition (CE) versions 18.03 and higher. For more information on installing the Docker Engine CE on an x86_64 processor, see Install Docker Engine on CentOS
2.1.4.6.2.2 Installing the Docker Engine Using an x86_64 Processor on Ubuntu
The x86_64 processor supports installing the Docker Community Edition (CE) versions 18.03 and higher. For more information on installing the Docker Engine CE on an x86_64 processor, see Install Docker Engine on Ubuntu
2.1.4.6.2.3 Installing the Docker Engine on an IBM Power9 Processor
The IBM Power9 processor only supports installing the Docker Community Edition (CE) version 18.03.
You can install the Docker Engine on an IBM Power9 processor by running the following commands:
$ wget http://ftp.unicamp.br/pub/ppc64el/rhel/7_1/docker-ppc64el/container-selinux-2.9-4.el7.noarch.rpm
$ wget http://ftp.unicamp.br/pub/ppc64el/rhel/7_1/docker-ppc64el/docker-ce-18.03.1.ce-1.el7.centos.ppc64le.rpm
$ yum install -y container-selinux-2.9-4.el7.noarch.rpm docker-ce-18.03.1.ce-1.el7.centos.ppc64le.rpm
For more information on installing the Docker Engine CE on an IBM Power9 processor, see Install Docker Engine on Ubuntu.
2.1.4.6.3 Docker Post-Installation
After installing the Docker engine, you must configure Docker on your local machine.
To configure Docker on your local machine:
1. Enable Docker to start on boot:
$ sudo systemctl enable docker && sudo systemctl start docker
2. Enable managing Docker as a non-root user:
$ sudo usermod -aG docker $USER
3. Log out and log back in via SSH. This causes Docker to re-evaluate your group membership.
4. Verify that you can run the following Docker command as a non-root user (without sudo):
$ docker run hello-world
If you can run the above Docker command as a non-root user, the following occurs:
• Docker downloads a test image and runs it in a container.
• When the container runs, it prints an informational message and exits.
For more information on Docker post-installation steps, see Docker Post-Installation.
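Before running the hello-world test, you can confirm that the group change from step 2 is visible in your current session. The following is a minimal sketch. The has_docker_group helper is a hypothetical name, and it expects a space-separated list of group names such as the one printed by id -nG:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the space-separated group list in $1
# contains a group named exactly "docker".
has_docker_group() {
    case " $1 " in
        *" docker "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example call, commented out so that sourcing this file has no side effects:
# if has_docker_group "$(id -nG)"; then
#     echo "docker group is active in this session"
# else
#     echo "log out and log back in, then check again"
# fi
```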
2.1.4.6.4 Installing the Nvidia Docker2 ToolKit
After configuring Docker on your local machine, you must install the NVIDIA Docker2 Toolkit. The NVIDIA Docker2 Toolkit lets you build and run GPU-accelerated Docker containers. The Toolkit includes a container runtime library and related utilities for automatically configuring containers to leverage NVIDIA GPUs.
This section describes the following:
• Installing the NVIDIA Docker2 Toolkit on an x86_64 processor.
• Installing the NVIDIA Docker2 Toolkit on a PPC64le processor.
2.1.4.6.4.1 Installing the NVIDIA Docker2 Toolkit on an x86_64 Processor
This section describes the following: • Installing the NVIDIA Docker2 Toolkit on a CentOS operating system • Installing the NVIDIA Docker2 Toolkit on an Ubuntu operating system
2.1.4.6.4.2 Installing the NVIDIA Docker2 Toolkit on a CentOS Operating System
To install the NVIDIA Docker2 Toolkit on a CentOS operating system:
1. Install the repository for your distribution:
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | \
    sudo tee /etc/yum.repos.d/nvidia-docker.repo
2. Install the nvidia-docker2 package and reload the Docker daemon configuration:
$ sudo yum install nvidia-docker2
$ sudo pkill -SIGHUP dockerd
3. Do one of the following:
• If you received an error when installing the nvidia-docker2 package, skip to Step 4.
• If you successfully installed the nvidia-docker2 package, skip to Step 5.
4. Do the following:
   1. Run the sudo vi /etc/yum.repos.d/nvidia-docker.repo command if the following error is displayed when installing the nvidia-docker2 package:
https://nvidia.github.io/nvidia-docker/centos7/ppc64le/repodata/repomd.xml: [Errno -1] repomd.xml signature could not be verified for nvidia-docker
   2. Change repo_gpgcheck=1 to repo_gpgcheck=0.
5. Verify that the NVIDIA-Docker run has been installed correctly:
$ docker run --runtime=nvidia --rm nvidia/cuda:10.1-base nvidia-smi
For more information on installing the NVIDIA Docker2 Toolkit on a CentOS operating system, see Installing the NVIDIA Docker2 Toolkit on a CentOS operating system
2.1.4.6.4.3 Installing the NVIDIA Docker2 Toolkit on an Ubuntu Operating System
To install the NVIDIA Docker2 Toolkit on an Ubuntu operating system:
1. Install the repository for your distribution:
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
    sudo apt-key add -
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
    sudo tee /etc/apt/sources.list.d/nvidia-docker.list
$ sudo apt-get update
2. Install the nvidia-docker2 package and reload the Docker daemon configuration:
$ sudo apt-get install nvidia-docker2
$ sudo pkill -SIGHUP dockerd
3. Do one of the following:
• If you received an error when installing the nvidia-docker2 package, skip to Step 4.
• If you successfully installed the nvidia-docker2 package, skip to Step 5.
4. Do the following:
   1. Run the sudo vi /etc/yum.repos.d/nvidia-docker.repo command if the following error is displayed when installing the nvidia-docker2 package:
https://nvidia.github.io/nvidia-docker/centos7/ppc64le/repodata/repomd.xml: [Errno -1] repomd.xml signature could not be verified for nvidia-docker
   2. Change repo_gpgcheck=1 to repo_gpgcheck=0.
5. Verify that the NVIDIA-Docker run has been installed correctly:
$ docker run --runtime=nvidia --rm nvidia/cuda:10.1-base nvidia-smi
For more information on installing the NVIDIA Docker2 Toolkit on an Ubuntu operating system, see Installing the NVIDIA Docker2 Toolkit on an Ubuntu operating system
2.1.4.6.4.4 Installing the NVIDIA Docker2 Toolkit on a PPC64le Processor
This section describes how to install the NVIDIA Docker2 Toolkit on an IBM RHEL operating system.
To install the NVIDIA Docker2 Toolkit on an IBM RHEL operating system:
1. Import the repository and install the libnvidia-container and the nvidia-container-runtime containers.
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | \
    sudo tee /etc/yum.repos.d/nvidia-docker.repo
$ sudo yum install -y libnvidia-container*
2. Do one of the following:
• If you received an error when installing the containers, skip to Step 3.
• If you successfully installed the containers, skip to Step 4.
3. Do the following:
   1. Run the sudo vi /etc/yum.repos.d/nvidia-docker.repo command if the following error is displayed when installing the containers:
https://nvidia.github.io/nvidia-docker/centos7/ppc64le/repodata/repomd.xml: [Errno -1] repomd.xml signature could not be verified for nvidia-docker
2. Change repo_gpgcheck=1 to repo_gpgcheck=0.
3. Install the libnvidia-container container.
$ sudo yum install -y libnvidia-container*
4. Install the nvidia-container-runtime container:
$ sudo yum install -y nvidia-container-runtime*
5. Add nvidia runtime to the Docker daemon:
$ sudo mkdir -p /etc/systemd/system/docker.service.d/
$ sudo vi /etc/systemd/system/docker.service.d/override.conf

[Service]
ExecStart=
ExecStart=/usr/bin/dockerd
6. Restart Docker:
$ sudo systemctl daemon-reload
$ sudo systemctl restart docker
7. Verify that the NVIDIA-Docker run has been installed correctly:
$ docker run --runtime=nvidia --rm nvidia/cuda-ppc64le nvidia-smi
2.1.4.6.4.5 Accessing the Hadoop and Kubernetes Configuration Files
The information in this section is optional and is only relevant for Hadoop users. If you require Hadoop and Kerberos (Krb5) connectivity, contact your IT department for access to the following configuration files:
• Hadoop configuration files:
  – core-site.xml
  – hdfs-site.xml
• Kerberos files:
  – Configuration file - krb5.conf
  – Kerberos Hadoop client keytab file - hdfs.keytab
Once you have the above files, you must copy them into the correct folders in your working directory.
For more information about the correct directory to copy the above files into, see the Installing the SQream Software section below.
For related information, see the following sections:
• Configuring the Hadoop and Kubernetes Configuration Files.
• Setting the Hadoop and Kubernetes Configuration Parameters.
2.1.4.6.5 Installing the SQream Software
2.1.4.6.5.1 Preparing Your Local Environment
After installing the NVIDIA Docker2 Toolkit, you must prepare your local environment.
The Linux user preparing the local environment must have read/write access to the following directories for the SQream software to correctly read and write the required resources:
• Log directory - default: /var/log/sqream/
• Configuration directory - default: /etc/sqream/
• Cluster directory - the location where SQream writes its DB system, such as /mnt/sqreamdb
• Ingest directory - the location where the required data is loaded, such as /mnt/data_source/
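The access requirement above can be checked before installation. The following is a minimal sketch, to be run as the Linux user that prepares the environment. The check_rw_dirs helper is a hypothetical name, and the directories in the commented example are the defaults and sample paths listed above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report every argument that is not an existing
# directory with read, write, and search permission for the current user.
# Returns 1 if at least one directory failed the check.
check_rw_dirs() {
    local dir status=0
    for dir in "$@"; do
        if [ ! -d "$dir" ] || [ ! -r "$dir" ] || [ ! -w "$dir" ] || [ ! -x "$dir" ]; then
            printf 'no read/write access: %s\n' "$dir"
            status=1
        fi
    done
    return "$status"
}

# Example call, commented out so that sourcing this file has no side effects:
# check_rw_dirs /var/log/sqream/ /etc/sqream/ /mnt/sqreamdb /mnt/data_source/
```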
2.1.4.6.5.2 Deploying the SQream Software
After preparing your local environment you must deploy the SQream software. Deploying the SQream software requires you to access and extract the required files and to place them in the correct directory. To deploy the SQream software: 1. Contact the SQream Support team for access to the sqream_installer-nnn-DBnnn-COnnn- EDnnn-
The sqream_installer-nnn-DBnnn-COnnn-EDnnn-
$ tar -xvf sqream_installer-1.1.5-DB2019.2.1-CO1.5.4-ED3.0.0-x86_64.tar.gz
When the tarball file has been extracted, a new folder will be created. The new folder is automatically given the name of the tarball file:
drwxrwxr-x 9 sqream sqream       4096 Aug 11 11:51 sqream_installer-1.1.5-DB2019.2.1-CO1.5.4-ED3.0.0-x86_64/
-rw-rw-r-- 1 sqream sqream 3130398797 Aug 11 11:20 sqream_installer-1.1.5-DB2019.2.1-CO1.5.4-ED3.0.0-x86_64.tar.gz
3. Change the directory to the new folder that you created in the previous step.
4. Verify that the folder you just created contains all of the required files.
$ ls -la
The following is an example of the files included in the new folder:
drwxrwxr-x. 10 sqream sqream  198 Jun  3 17:57 .
drwx------. 25 sqream sqream 4096 Jun  7 18:11 ..
drwxrwxr-x.  2 sqream sqream  226 Jun  7 18:09 .docker
drwxrwxr-x.  2 sqream sqream   64 Jun  3 12:55 .hadoop
drwxrwxr-x.  2 sqream sqream 4096 May 31 14:18 .install
drwxrwxr-x.  2 sqream sqream   39 Jun  3 12:53 .krb5
drwxrwxr-x.  2 sqream sqream   22 May 31 14:18 license
drwxrwxr-x.  2 sqream sqream   82 May 31 14:18 .sqream
-rwxrwxr-x.  1 sqream sqream 1712 May 31 14:18 sqream-console
-rwxrwxr-x.  1 sqream sqream 4608 May 31 14:18 sqream-install
For information relevant to Hadoop users, see the following sections:
• Accessing the Hadoop and Kubernetes Configuration Files.
• Configuring the Hadoop and Kubernetes Configuration Files.
• Setting the Hadoop and Kubernetes Configuration Parameters.
2.1.4.6.5.3 Configuring the Hadoop and Kubernetes Configuration Files
The information in this section is optional and is only relevant for Hadoop users. If you require Hadoop and Kerberos (Krb5) connectivity, you must copy the Hadoop and Kerberos files into the correct folders in your working directory as shown below:
• .hadoop/core-site.xml
• .hadoop/hdfs-site.xml
• .krb5/krb5.conf
• .krb5/hdfs.keytab
For related information, see the following sections:
• Accessing the Hadoop and Kubernetes Configuration Files.
• Setting the Hadoop and Kubernetes Configuration Parameters.
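The copy step above can be sketched as follows. The stage_hadoop_files helper is a hypothetical name. It assumes that the four files received from your IT department sit together in one source folder, and that the destination is the extracted installer folder that contains the .hadoop and .krb5 folders:

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy the Hadoop and Krb5 files from SRC into the
# hidden .hadoop and .krb5 folders under DEST.
# Usage: stage_hadoop_files SRC DEST
stage_hadoop_files() {
    local src=$1 dest=$2
    # The folders normally exist already; -p makes this a no-op in that case.
    mkdir -p "$dest/.hadoop" "$dest/.krb5" &&
    cp "$src/core-site.xml" "$src/hdfs-site.xml" "$dest/.hadoop/" &&
    cp "$src/krb5.conf" "$src/hdfs.keytab" "$dest/.krb5/"
}

# Example call with placeholder paths, commented out so that sourcing this
# file has no side effects:
# stage_hadoop_files /path/to/received/files /path/to/extracted/installer
```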
2.1.4.6.5.4 Configuring the SQream Software
After deploying the SQream software, and optionally configuring the Hadoop and Kubernetes configuration files, you must configure the SQream software.
Configuring the SQream software requires you to do the following:
• Configure your local environment
• Understand the sqream-install flags
• Install your SQream license
• Validate your SQream license
• Change your data ingest folder
2.1.4.6.5.5 Configuring Your Local Environment
Once you’ve downloaded the SQream software, you can begin configuring your local environment. The following commands must be run (as sudo) from the directory in which you placed your packages. For example, you may have saved your packages in /home/sqream/sqream-console-package/.
The following table shows the flags that you can use to configure your local directory:
Flag | Function | Note
-i | Loads all software from the hidden folder .docker. | Mandatory
-k | Loads all license packages from the /license directory. | Mandatory
-f | Overwrites existing folders. Note: Using -f overwrites all files located in mounted directories. | Mandatory
-c | Defines the origin path for writing/reading SQream configuration files. The default location is /etc/sqream/. | If you are installing the Docker version on a server that already works with SQream, do not use the default path.
-v | The SQream cluster location. If a cluster does not exist yet, -v creates one. If a cluster already exists, -v mounts it. | Mandatory
-l | SQream system startup logs location, including startup logs and docker logs. The default location is /var/log/sqream/. |
-d | The directory containing customer data to be imported and/or copied to SQream. |
-s | Shows system settings. |
-r | Resets the system configuration. This value is run without any other variables. | Mandatory
-h | Help. Shows the available flags. | Mandatory
-K | Runs license validation. |
-e | Used for inserting your Krb5 server DNS name. For more information on setting your Kerberos configuration parameters, see Setting the Hadoop and Kubernetes Configuration Parameters. |
-p | Used for inserting your Kerberos user name. For more information on setting your Kerberos configuration parameters, see Setting the Hadoop and Kubernetes Configuration Parameters. |
2.1.4.6.5.6 Installing Your License
Once you’ve configured your local environment, you must install your license. Copy the license package into the ./license folder of the SQream installation package folder, and then run the following command:
$ sudo ./sqream-install -k
You do not need to extract the license package after copying it into the ./license folder.
2.1.4.6.5.7 Validating Your License
You can validate your license by running the following command:
$ sudo ./sqream-install -K
The following mandatory flags must be used in the first run:
$ sudo ./sqream-install -i -k -v
The following is an example of the correct command syntax:
$ sudo ./sqream-install -i -k -c /etc/sqream -v /home/sqream/sqreamdb -l /var/log/sqream -d /home/sqream/data_ingest
2.1.4.6.5.8 Setting the Hadoop and Kubernetes Connectivity Parameters
The information in this section is optional, and is only relevant for Hadoop users. If you require Hadoop and Kerberos (Krb5) connectivity, you must set their connectivity parameters.
The following is the correct syntax when setting the Hadoop and Kerberos connectivity parameters:
$ sudo ./sqream-install -p
The following is an example of setting the Hadoop and Kubernetes connectivity parameters:
$ sudo ./sqream-install -p
For related information, see the following sections: • Accessing the Hadoop and Kubernetes Configuration Files. • Configuring the Hadoop and Kubernetes Configuration Files.
2.1.4.6.5.9 Modifying Your Data Ingest Folder
Once you’ve validated your license, you can modify your data ingest folder after the first run by running the following command:
$ sudo ./sqream-install -d /home/sqream/data_in
2.1.4.6.5.10 Configuring Your Network for Docker
Once you’ve modified your data ingest folder (if needed), you must validate that the server network and Docker network that you are setting up do not overlap.
To configure your network for Docker:
1. To verify that your server network and Docker network do not overlap, run the following command:
$ ifconfig | grep 172.
2. Do one of the following:
• If the above command returns no results, continue the installation process.
• If the above command returns results, run the following command:
$ ifconfig | grep 192.168.
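In the grep commands above, the dot is a regular-expression wildcard, so they can also match text that is not an address in the intended range. The following is a stricter sketch that prints only IPv4 addresses beginning with a given prefix. The addrs_with_prefix helper is a hypothetical name, and it assumes ifconfig-style text on standard input:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every IPv4 address in the input that starts
# with the literal prefix given as $1 (for example "172." or "192.168.").
addrs_with_prefix() {
    awk -v p="$1" '{
        for (i = 1; i <= NF; i++) {
            f = $i
            sub(/^addr:/, "", f)   # older ifconfig prints "inet addr:1.2.3.4"
            if (f ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ && index(f, p) == 1)
                print f
        }
    }'
}

# Example calls, commented out so that sourcing this file has no side effects:
# ifconfig | addrs_with_prefix 172.
# ifconfig | addrs_with_prefix 192.168.
```

Note that broadcast addresses printed by ifconfig are reported too if they start with the prefix.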
2.1.4.6.5.11 Checking and Verifying Your System Settings
Once you’ve configured your network for Docker, you can check and verify your system settings. Running the following command shows you all the variables used by your SQream system:
$ ./sqream-install -s
The following is an example of the correct output:
SQREAM_CONSOLE_TAG=1.5.4
SQREAM_TAG=2019.2.1
SQREAM_EDITOR_TAG=3.0.0
license_worker_0=f0:cc:
license_worker_1=26:91:
license_worker_2=20:26:
license_worker_3=00:36:
SQREAM_VOLUME=/media/sqreamdb
SQREAM_DATA_INGEST=/media/sqreamdb/data_in
SQREAM_CONFIG_DIR=/etc/sqream/
LICENSE_VALID=true
SQREAM_LOG_DIR=/var/log/sqream/
SQREAM_USER=sqream
SQREAM_HOME=/home/sqream
SQREAM_ENV_PATH=/home/sqream/.sqream/env_file
PROCESSOR=x86_64
METADATA_PORT=3105
PICKER_PORT=3108
NUM_OF_GPUS=2
CUDA_VERSION=10.1
NVIDIA_SMI_PATH=/usr/bin/nvidia-smi
DOCKER_PATH=/usr/bin/docker
NVIDIA_DRIVER=418
SQREAM_MODE=single_host
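Individual values can be pulled out of this output in a script. The following sketch assumes that the settings are printed one KEY=VALUE pair per line, as in the example above. The setting_value helper is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of the setting named $1 from
# KEY=VALUE lines read from stdin. Prints nothing if the key is absent.
setting_value() {
    awk -F= -v k="$1" '$1 == k { print substr($0, length(k) + 2) }'
}

# Example call, commented out so that sourcing this file has no side effects:
# if [ "$(./sqream-install -s | setting_value LICENSE_VALID)" = "true" ]; then
#     echo "license is valid"
# fi
```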
2.1.4.6.6 Using the SQream Console
After configuring the SQream software and verifying your system settings, you can begin using the SQream console.
2.1.4.6.6.1 SQream Console - Basic Commands
The SQream console offers the following basic commands:
• Starting your SQream console
• Starting Metadata and Picker
• Starting the running services
• Listing the running services
• Stopping the running services
• Using the SQream editor
• Using the SQream Client
2.1.4.6.6.2 Starting Your SQream Console
You can start your SQream console by running the following command:
$ ./sqream-console
2.1.4.6.6.3 Starting the SQream Master
To listen to metadata and picker:
1. Start the metadata server (default port 3105) and picker (default port 3108) by running the following command:
$ sqream master --start
The following is the correct output:
sqream-console> sqream master --start
starting master server in single_host mode ...
sqream_single_host_master is up and listening on ports: 3105,3108
2. Optional - Change the metadata and server picker ports by adding -p
$ sqream-console>sqream master --start -p 4105 -m 4108
starting master server in single_host mode ...
sqream_single_host_master is up and listening on ports: 4105,4108
2.1.4.6.6.4 Starting SQream Workers
When starting SQream workers, setting the
$ sqream worker --start
The following is an example of the expected output when starting two workers:
sqream-console>sqream worker --start 2
started sqream_single_host_worker_0 on port 5000, allocated gpu: 0
started sqream_single_host_worker_1 on port 5001, allocated gpu: 1
2.1.4.6.6.5 Listing the Running Services
You can list running SQream services to look for container names and IDs by running the following command:
$ sqream master --list
The following is an example of the expected output:
sqream-console>sqream master --list
container name: sqream_single_host_worker_0, container id: c919e8fb78c8
container name: sqream_single_host_master, container id: ea7eef80e038
2.1.4.6.6.6 Stopping the Running Services
You can stop running services either for a single SQream worker, or all SQream services for both master and worker. The following is the command for stopping a running service for a single SQream worker:
$ sqream worker --stop
The following is an example of expected output when stopping a running service for a single SQream worker: sqream worker stop
You can stop all running SQream services (both master and worker) by running the following command:
$ sqream-console>sqream master --stop --all
The following is an example of expected output when stopping all running services:
sqream-console>sqream master --stop --all
stopped container sqream_single_host_worker_0, id: 892a8f1a58c5
stopped container sqream_single_host_master, id: 55cb7e38eb22
2.1.4.6.6.7 Using SQream Studio
SQream Studio is an SQL statement editor.
To start SQream Studio:
1. Run the following command:
$ sqream studio --start
The following is an example of the expected output:
SQream Acceleration Studio is available at http://192.168.1.62:8080
2. Click the http://192.168.1.62:8080 link shown in the CLI.
To stop SQream Studio:
You can stop your SQream Studio by running the following command:
$ sqream studio --stop
The following is an example of the expected output:
sqream_admin stopped
2.1.4.6.6.8 Using the SQream Client
You can use the embedded SQream Client on the following nodes:
• Master node
• Worker node
When using the SQream Client on the Master node, the following default settings are used:
• Default port: 3108. You can change the default port using the -p variable.
• Default database: master. You can change the default database using the -d variable.
The following is an example:
$ sqream client --master -u sqream -w sqream
When using the SQream Client on a Worker node (or nodes), you should use the -p variable for Worker ports. The default database is master, but you can use the -d variable to change databases. The following is an example:
$ sqream client --worker -p 5000 -u sqream -w sqream
2.1.4.6.6.9 Moving from Docker Installation to Standard On-Premises Installation
Because Docker creates all files and directories on the host at the root level, you must grant ownership of the SQream storage folder to the working directory user.
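The ownership change described above can be sketched with chown, followed by a check that nothing was missed. The user name sqream and the path /home/sqream/sqreamdb are taken from the examples in this guide and may differ on your system. The not_owned_by helper is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every path under DIR that is not owned by USER.
# Empty output means that ownership is consistent.
# Usage: not_owned_by USER DIR
not_owned_by() {
    find "$2" ! -user "$1"
}

# Example calls, commented out so that sourcing this file has no side effects:
# sudo chown -R sqream:sqream /home/sqream/sqreamdb
# not_owned_by sqream /home/sqream/sqreamdb
```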
2.1.4.6.6.10 SQream Console - Advanced Commands
The SQream console offers the following advanced commands:
• Controlling the spool size
• Splitting a GPU
• Splitting a GPU and setting the spool size
• Using a custom configuration file
• Clustering your Docker environment
2.1.4.6.6.11 Controlling the Spool Size
From the console you can define a spool size value. The following example shows the spool size being set to 50:
$ sqream-console>sqream worker --start 2 -m 50
If you don’t define the SQream spool size, the SQream console automatically distributes the available RAM between all running workers.
2.1.4.6.6.12 Splitting a GPU
You can start more than one sqreamd on a single GPU by splitting it. The following example shows the GPU being split into two sqreamd’s on the GPU in slot 0:
$ sqream-console>sqream worker --start 2 -g 0
2.1.4.6.6.13 Splitting GPU and Setting the Spool Size
You can simultaneously split a GPU and set the spool size by appending the -m flag:
$ sqream-console>sqream worker --start 2 -g 0 -m 50
Note: The console does not validate whether the user-defined spool size is available. Before setting the spool size, verify that the requested resources are available.
2.1.4.6.6.14 Using a Custom Configuration File
SQream lets you use your own external custom configuration JSON files. You must place these JSON files in the path mounted in the installation. SQream recommends placing the JSON file in the Configuration folder.
The SQream console does not validate the integrity of your external configuration files.
When using your custom configuration file, you can use the -j flag to define the full path to the Configuration file, as in the example below:
$ sqream-console>sqream worker --start 1 -j /etc/sqream/configfile.json
Note: To start more than one sqream daemon, you must provide files for each daemon, as in the example below:
$ sqream worker --start 2 -j /etc/sqream/configfile.json /etc/sqream/configfile2.json
Note: To split a specific GPU, you must also list the GPU flag, as in the example below:
$ sqream worker --start 2 -g 0 -j /etc/sqream/configfile.json /etc/sqream/configfile2.json
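Because the console does not validate external configuration files, a quick pre-check can catch a mismatch between the number of workers and the number of files. The following is a minimal sketch. The check_worker_configs helper is a hypothetical name, and it only checks the count and the existence of the files, not their JSON content:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if exactly COUNT configuration files
# are given and each of them exists as a regular file.
# Usage: check_worker_configs COUNT FILE...
check_worker_configs() {
    local count=$1 file
    shift
    if [ "$#" -ne "$count" ]; then
        printf 'expected %s configuration files, got %s\n' "$count" "$#" >&2
        return 1
    fi
    for file in "$@"; do
        [ -f "$file" ] || { printf 'missing: %s\n' "$file" >&2; return 1; }
    done
}

# Example call, commented out so that sourcing this file has no side effects:
# check_worker_configs 2 /etc/sqream/configfile.json /etc/sqream/configfile2.json
```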
2.1.4.6.6.15 Clustering Your Docker Environment
SQream lets you connect to a remote Master node to start Docker in Distributed mode. If you have already connected to a Slave node server in Distributed mode, the sqream Master and Client commands are only available on the Master node.
$ --master-host
$ sqream-console>sqream worker --start 1 --master-host 192.168.0.1020
2.1.4.6.6.16 Checking the Status of SQream Services
SQream lets you check the status of SQream services from the following locations:
• From the SQream console
• From outside the SQream console
2.1.4.6.6.17 Checking the Status of SQream Services from the SQream Console
From the SQream console, you can check the status of SQream services by running the following command:
$ sqream-console>sqream master --list
The following is an example of the expected output:
$ sqream-console>sqream master --list
checking 3 sqream services:
sqream_single_host_worker_1 up, listens on port: 5001 allocated gpu: 1
sqream_single_host_worker_0 up, listens on port: 5000 allocated gpu: 1
sqream_single_host_master up listens on ports: 3105,3108
2.1.4.6.6.18 Checking the Status of SQream Services from Outside the SQream Console
From outside the Sqream Console, you can check the status of SQream services by running the following commands:
$ sqream-status
$ NAMES                        STATUS                 PORTS
$ sqream_single_host_worker_1  Up 3 minutes           0.0.0.0:5001->5001/tcp
$ sqream_single_host_worker_0  Up 3 minutes           0.0.0.0:5000->5000/tcp
$ sqream_single_host_master    Up 3 minutes           0.0.0.0:3105->3105/tcp, 0.0.0.0:3108->3108/tcp
$ sqream_editor_3.0.0          Up 3 hours (healthy)   0.0.0.0:3000->3000/tcp
2.1.4.6.6.19 Upgrading Your SQream System
This section describes how to upgrade your SQream system.
To upgrade your SQream system:
1. Contact the SQream Support team for access to the new SQream package tarball file.
2. Set a maintenance window to enable stopping the system while upgrading it.
3. Extract the following tarball file received from the SQream Support team, using the same user and the same folder that you used while Downloading the SQream Software.
$ tar -xvf sqream_installer-2.0.5-DB2019.2.1-CO1.6.3-ED3.0.0-x86_64/
4. Navigate to the new folder created as a result of extracting the tarball file:
$ cd sqream_installer-2.0.5-DB2019.2.1-CO1.6.3-ED3.0.0-x86_64/
5. Initiate the upgrade process:
$ ./sqream-install -i
Initiating the upgrade process checks if any SQream services are running. If any services are running, you will be prompted to stop them.
6. Do one of the following:
   • Select Yes to stop all running SQream workers (Master and Editor) and continue the upgrade process.
   • Select No to stop the upgrade process.
SQream periodically upgrades the metadata structure. If an upgrade version includes a change to the metadata structure, you will be prompted with an approval request message. Your approval is required to finish the upgrade process. Because SQream supports only certain metadata versions, all SQream services must be upgraded at the same time.
7. When the upgrade is complete, load the SQream console and restart your services.
For assistance, contact SQream Support.
2.1.4.7 Monitoring query performance
When analyzing options for query tuning, the first step is to analyze the query plan and execution. The query plan and execution details explain how SQream DB processes a query and where time is spent.
This document details how to analyze query performance with execution plans. This guide focuses specifically on identifying bottlenecks and possible optimization techniques to improve query performance.
Performance tuning options for each query are different. You should adapt the recommendations and tips to your own workloads.
See also our Optimization and Best Practices guide for more information about data loading considerations and other best practices.
In this section:
• Setting up the system for monitoring
  – Adjusting the logging frequency
  – Reading execution plans with a foreign table
• The SHOW_NODE_INFO command
• Understanding the query execution plan output
  – Information presented in the execution plan
  – Commonly seen nodes
• Examples
  – 1. Spooling to disk
    ∗ Identifying the offending nodes
    ∗ Common solutions for reducing spool
  – 2. Queries with large result sets
    ∗ Identifying the offending nodes
    ∗ Common solutions for reducing gather time
  – 3. Inefficient filtering
    ∗ Identifying the situation
    ∗ Common solutions for improving filtering
  – 4. Joins with varchar keys
    ∗ Identifying the situation
    ∗ Improving query performance
  – 5. Sorting on big VARCHAR fields
    ∗ Identifying the situation
    ∗ Improving sort performance on text keys
  – 6. High selectivity data
    ∗ Identifying the situation
    ∗ Improving performance with high selectivity hints
  – 7. Performance of unsorted data in joins
    ∗ Identifying the situation
    ∗ Improving join performance when data is sparse
  – 8. Manual join reordering
    ∗ Identifying the situation
    ∗ Changing the join order
• Further reading
2.1.4.7.1 Setting up the system for monitoring
By default, SQream DB logs execution details for every statement that runs for more than 60 seconds. If you want to see the execution details for a currently running statement, see The SHOW_NODE_INFO command below.
2.1.4.7.1.1 Adjusting the logging frequency
To adjust the frequency of logging for statements, you may want to reduce the interval from 60 seconds down to, say, 5 or 10 seconds. Modify the configuration files and set the nodeInfoLoggingSec parameter as you see fit:
{
    "compileFlags": {
    },
    "runtimeFlags": {
    },
    "runtimeGlobalFlags": {
        "nodeInfoLoggingSec": 5
    },
    "server": {
    }
}
After restarting the SQream DB cluster, the execution plan details will be logged to the standard SQream DB logs directory, as a message of type 200. You can see these messages with a text viewer or with queries on the log Foreign Tables.
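The configuration change described above can also be scripted. The following is a minimal sketch in Python, assuming the configuration file is strict JSON with a runtimeGlobalFlags section as in the example (json.loads rejects trailing commas). The function name set_node_info_logging is hypothetical and is not a SQream utility.

```python
import json


def set_node_info_logging(config_text, seconds):
    """Return the config as JSON text with nodeInfoLoggingSec set to `seconds`.

    Hypothetical helper: it edits only runtimeGlobalFlags.nodeInfoLoggingSec
    and leaves every other key untouched.
    """
    config = json.loads(config_text)
    # Create the section if the file does not have one yet.
    config.setdefault("runtimeGlobalFlags", {})["nodeInfoLoggingSec"] = seconds
    return json.dumps(config, indent=4)
```

Writing the returned text back to the file and restarting the cluster remain manual steps in this sketch.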
2.1.4.7.1.2 Reading execution plans with a foreign table
First, create a foreign table for the logs:
CREATE FOREIGN TABLE logs
(
  start_marker      VARCHAR(4),
  row_id            BIGINT,
  timestamp         DATETIME,
  message_level     TEXT,
  thread_id         TEXT,
  worker_hostname   TEXT,
  worker_port       INT,
  connection_id     INT,
  database_name     TEXT,
  user_name         TEXT,
  statement_id      INT,
  service_name      TEXT,
  message_type_id   INT,
  message           TEXT,
  end_message       VARCHAR(5)
)
WRAPPER cdv_fdw
OPTIONS
  (
    LOCATION = '/home/rhendricks/sqream_storage/logs/**/sqream*.log',
    DELIMITER = '|'
  )
;
Once you’ve defined the foreign table, you can run queries to observe the previously logged execution plans. This is recommended over looking at the raw logs.

t=> SELECT message
.     FROM logs
.     WHERE message_type_id = 200
.     AND timestamp BETWEEN '2020-06-11' AND '2020-06-13';
message
---------------------------------------------------------------------------------------------
SELECT *,coalesce((depdelay > 15),false) AS isdepdelayed FROM ontime WHERE year IN (2005, 2006, 2007, 2008, 2009, 2010)
:
: 1,PushToNetworkQueue ,10354468,10,1035446,2020-06-12 20:41:42,-1,,,,13.55
: 2,Rechunk ,10354468,10,1035446,2020-06-12 20:41:42,1,,,,0.10
: 3,ReorderInput ,10354468,10,1035446,2020-06-12 20:41:42,2,,,,0.00
: 4,DeferredGather ,10354468,10,1035446,2020-06-12 20:41:42,3,,,,1.23
: 5,ReorderInput ,10354468,10,1035446,2020-06-12 20:41:41,4,,,,0.01
: 6,GpuToCpu ,10354468,10,1035446,2020-06-12 20:41:41,5,,,,0.07
: 7,GpuTransform ,10354468,10,1035446,2020-06-12 20:41:41,6,,,,0.02
: 8,ReorderInput ,10354468,10,1035446,2020-06-12 20:41:41,7,,,,0.00
: 9,Filter ,10354468,10,1035446,2020-06-12 20:41:41,8,,,,0.07
: 10,GpuTransform ,10485760,10,1048576,2020-06-12 20:41:41,9,,,,0.07
: 11,GpuDecompress ,10485760,10,1048576,2020-06-12 20:41:41,10,,,,0.03
: 12,GpuTransform ,10485760,10,1048576,2020-06-12 20:41:41,11,,,,0.22
: 13,CpuToGpu ,10485760,10,1048576,2020-06-12 20:41:41,12,,,,0.76
: 14,ReorderInput ,10485760,10,1048576,2020-06-12 20:41:40,13,,,,0.11
: 15,Rechunk ,10485760,10,1048576,2020-06-12 20:41:40,14,,,,5.58
: 16,CpuDecompress ,10485760,10,1048576,2020-06-12 20:41:34,15,,,,0.04
: 17,ReadTable ,10485760,10,1048576,2020-06-12 20:41:34,16,832MB,,public.ontime,0.55
2.1.4.7.2 The SHOW_NODE_INFO command
The SHOW_NODE_INFO command returns a snapshot of the current query plan, similar to EXPLAIN ANALYZE from other databases.
The SHOW_NODE_INFO result, just like the periodically-logged execution plans described above, is an at-the-moment view of the compiler’s execution plan and runtime statistics for the specified statement.
To inspect a currently running statement, execute the show_node_info utility function in a SQL client like sqream sql, the SQream Studio Editor, or any other third party SQL terminal.
In this example, we inspect a statement with statement ID of 176. The command looks like this:

t=> SELECT SHOW_NODE_INFO(176);
stmt_id | node_id | node_type          | rows | chunks | avg_rows_in_chunk | time                | parent_node_id | read | write | comment    | timeSum
--------+---------+--------------------+------+--------+-------------------+---------------------+----------------+------+-------+------------+--------
176     | 1       | PushToNetworkQueue | 1    | 1      | 1                 | 2019-12-25 23:53:13 | -1             |      |       |            | 0.0025
176     | 2       | Rechunk            | 1    | 1      | 1                 | 2019-12-25 23:53:13 | 1              |      |       |            | 0
176     | 3       | GpuToCpu           | 1    | 1      | 1                 | 2019-12-25 23:53:13 | 2              |      |       |            | 0
176     | 4       | ReorderInput       | 1    | 1      | 1                 | 2019-12-25 23:53:13 | 3              |      |       |            | 0
176     | 5       | Filter             | 1    | 1      | 1                 | 2019-12-25 23:53:13 | 4              |      |       |            | 0.0002
176     | 6       | GpuTransform       | 457  | 1      | 457               | 2019-12-25 23:53:13 | 5              |      |       |            | 0.0002
176     | 7       | GpuDecompress      | 457  | 1      | 457               | 2019-12-25 23:53:13 | 6              |      |       |            | 0
176     | 8       | CpuToGpu           | 457  | 1      | 457               | 2019-12-25 23:53:13 | 7              |      |       |            | 0.0003
176     | 9       | Rechunk            | 457  | 1      | 457               | 2019-12-25 23:53:13 | 8              |      |       |            | 0
176     | 10      | CpuDecompress      | 457  | 1      | 457               | 2019-12-25 23:53:13 | 9              |      |       |            | 0
176     | 11      | ReadTable          | 457  | 1      | 457               | 2019-12-25 23:53:13 | 10             | 4MB  |       | public.nba | 0.0004
2.1.4.7.3 Understanding the query execution plan output
Both SHOW_NODE_INFO and the logged execution plans represent the query plan as a graph hierarchy, with data separated into different columns.
Each row represents a single logical database operation, which is also called a node or chunk producer. A node reports several metrics during query execution, such as how much data it has read and written, how many chunks and rows, and how much time has elapsed.
Consider the example show_node_info presented above. The source node with ID #11 (ReadTable) has a parent node ID #10 (CpuDecompress). If we were to draw this out in a graph, it’d look like this:
The last node, also called the sink, has a parent node ID of -1, meaning it has no parent. This is typically a node that sends data over the network or into a table.
When using SHOW_NODE_INFO, a tabular representation of the currently running statement execution is presented. See the examples below to understand how the query execution plan is instrumental in identifying bottlenecks and optimizing long-running statements.
Fig. 3: This graph explains how the query execution details are arranged in a logical order, from the bottom up.
2.1.4.7.3.1 Information presented in the execution plan
Table 6: Result columns

Column name       | Description
------------------+-------------------------------------------------------------------------------
stmt_id           | The ID for the statement
node_id           | This node’s ID in this execution plan
node_type         | The node type
rows              | Total number of rows this node has processed
chunks            | Number of chunks this node has processed
avg_rows_in_chunk | Average number of rows that this node processes in each chunk (rows / chunks)
time              | Timestamp for this node’s creation
parent_node_id    | The node_id of this node’s parent
read              | Total data read from disk
write             | Total data written to disk
comment           | Additional information (e.g. table name for ReadTable)
timeSum           | Total elapsed time for this execution node’s processing
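To work with these columns programmatically, the logged plan lines can be split into named fields. The following is a minimal sketch in Python. It assumes the comma-separated layout seen in the logged plans earlier in this guide (the same columns as above, without stmt_id) and that no field contains a comma. The names parse_plan_line and path_to_sink are hypothetical.

```python
# Field order inferred from the sample logged plan lines (an assumption):
# the SHOW_NODE_INFO columns without stmt_id.
FIELDS = ["node_id", "node_type", "rows", "chunks", "avg_rows_in_chunk",
          "time", "parent_node_id", "read", "write", "comment", "timeSum"]


def parse_plan_line(line):
    """Parse one logged node line, e.g. ': 2,Rechunk ,...,0.10', into a dict."""
    parts = [p.strip() for p in line.lstrip(": ").split(",")]
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    node = dict(zip(FIELDS, parts))
    for key in ("node_id", "rows", "chunks", "avg_rows_in_chunk", "parent_node_id"):
        node[key] = int(node[key])
    node["timeSum"] = float(node["timeSum"])
    return node


def path_to_sink(nodes, node_id):
    """Follow parent_node_id links from node_id up to the sink (parent -1)."""
    by_id = {n["node_id"]: n for n in nodes}
    path = []
    current = node_id
    while current != -1:
        path.append(by_id[current]["node_type"])
        current = by_id[current]["parent_node_id"]
    return path
```

Calling path_to_sink on a source node such as ReadTable lists the node types in the order data flows through them, ending at the sink.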
2.1.4.7.3.2 Commonly seen nodes
Table 7: Node types

Node type          | Execution location | Description
-------------------+--------------------+--------------------------------------------------------------------------------------------------------
CpuDecompress      | CPU                | Decompression operation, common for longer VARCHAR types
CpuLoopJoin        | CPU                | A non-indexed nested loop join, performed on the CPU
CpuReduce          | CPU                | A reduce process performed on the CPU, primarily with DISTINCT aggregates (e.g. COUNT(DISTINCT ...))
CpuToGpu, GpuToCpu |                    | An operation that moves data to or from the GPU for processing
CpuTransform       | CPU                | A transform operation performed on the CPU, usually a scalar function
DeferredGather     | CPU                | Merges the results of GPU operations with a result set
Distinct           | GPU                | Removes duplicate rows (usually as part of the DISTINCT operation)
Distinct_Merge     | CPU                | The merge operation of the Distinct operation
Filter             | GPU                | A filtering operation, such as a WHERE or JOIN clause
GpuDecompress      | GPU                | Decompression operation
GpuReduceMerge     | GPU                | An operation to optimize part of the merger phases in the GPU
GpuTransform       | GPU                | A transformation operation such as a type cast or scalar function
LocateFiles        | CPU                | Validates external file paths for foreign data wrappers, expanding directories and GLOB patterns
LoopJoin           | GPU                | A non-indexed nested loop join, performed on the GPU
ParseCsv           | CPU                | A CSV parser, used after ReadFiles to convert the CSV into columnar data
PushToNetworkQueue | CPU                | Sends result sets to a client connected over the network
ReadFiles          | CPU                | Reads external flat-files
ReadTable          | CPU                | Reads data from a standard table stored on disk
Rechunk            |                    | Reorganize multiple small chunks into a full chunk. Commonly found after joins and when HIGH_SELECTIVITY is used
Reduce             | GPU                | A reduction operation, such as a GROUP BY
ReduceMerge        | GPU                | A merge operation of a reduction operation, helps operate on larger-than-RAM data
ReorderInput       |                    | Change the order of arguments in preparation for the next operation
SeparatedGather    | GPU                | Gathers additional columns for the result
Sort               | GPU                | Sort operation
TakeRowsFromChunk  |                    | Take the first N rows from each chunk, to optimize LIMIT when used alongside ORDER BY
Top                |                    | Limits the input size, when used with LIMIT (or its alias TOP)
UdfTransform       | CPU                | Executes a user defined function
UnionAll           |                    | Combines two sources of data when UNION ALL is used
Window             | GPU                | Executes a non-ranking window function
WindowRanking      | GPU                | Executes a ranking window function
WriteTable         | CPU                | Writes the result set to a standard table stored on disk
Tip: The full list of nodes appears in the Node types table, as part of the SHOW_NODE_INFO reference.
2.1.4.7.4 Examples
In general, looking at the top three longest running nodes (as is detailed in the timeSum column) can indicate the biggest bottlenecks. In the following examples you will learn how to identify and solve some common issues.
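The top-three heuristic can be sketched in a few lines of Python. This is an illustrative helper, not a SQream feature: the function name slowest_nodes is hypothetical, and the sample values are a subset of the timeSum figures from the logged plan shown earlier in this guide.

```python
import heapq


def slowest_nodes(nodes, n=3):
    """Return the n nodes with the largest timeSum, slowest first."""
    return heapq.nlargest(n, nodes, key=lambda node: node["timeSum"])


# A few nodes taken from the logged plan for the `ontime` query.
sample = [
    {"node_id": 1, "node_type": "PushToNetworkQueue", "timeSum": 13.55},
    {"node_id": 4, "node_type": "DeferredGather", "timeSum": 1.23},
    {"node_id": 13, "node_type": "CpuToGpu", "timeSum": 0.76},
    {"node_id": 15, "node_type": "Rechunk", "timeSum": 5.58},
    {"node_id": 17, "node_type": "ReadTable", "timeSum": 0.55},
]

for node in slowest_nodes(sample):
    print(node["node_id"], node["node_type"], node["timeSum"])
```

For this sample, the three slowest nodes are PushToNetworkQueue, Rechunk, and DeferredGather, which points at result delivery rather than at reading the table.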
In this section:
• 1. Spooling to disk
  – Identifying the offending nodes
  – Common solutions for reducing spool
• 2. Queries with large result sets
  – Identifying the offending nodes
  – Common solutions for reducing gather time
• 3. Inefficient filtering
  – Identifying the situation
  – Common solutions for improving filtering
• 4. Joins with varchar keys
  – Identifying the situation
  – Improving query performance
• 5. Sorting on big VARCHAR fields
  – Identifying the situation
  – Improving sort performance on text keys
• 6. High selectivity data
  – Identifying the situation
  – Improving performance with high selectivity hints
• 7. Performance of unsorted data in joins
  – Identifying the situation
  – Improving join performance when data is sparse
• 8. Manual join reordering
  – Identifying the situation
  – Changing the join order
2.1.4.7.4.1 1. Spooling to disk
When there is not enough RAM to process a statement, SQream DB will spill over data to the temp folder in the storage disk. While this ensures that a statement can always finish processing, it can slow down the processing significantly. It’s worth identifying these statements, to figure out if the cluster is configured correctly, as well as potentially reduce the statement size. You can identify a statement that spools to disk by looking at the write column in the execution details. A node that spools will have a value, shown in megabytes in the write column. Common nodes that write spools include Join or LoopJoin.
2.1.4.7.4.2 Identifying the offending nodes
1. Run a query. For example, a query from the TPC-H benchmark:
SELECT o_year,
       SUM(CASE WHEN nation = 'BRAZIL' THEN volume ELSE 0 END) / SUM(volume) AS mkt_share
FROM (SELECT datepart(YEAR,o_orderdate) AS o_year,
             l_extendedprice*(1 - l_discount / 100.0) AS volume,
             n2.n_name AS nation
      FROM lineitem
        JOIN part ON p_partkey = CAST (l_partkey AS INT)
        JOIN orders ON l_orderkey = o_orderkey
        JOIN customer ON o_custkey = c_custkey
        JOIN nation n1 ON c_nationkey = n1.n_nationkey
        JOIN region ON n1.n_regionkey = r_regionkey
        JOIN supplier ON s_suppkey = l_suppkey
        JOIN nation n2 ON s_nationkey = n2.n_nationkey
      WHERE o_orderdate BETWEEN '1995-01-01' AND '1996-12-31') AS all_nations
GROUP BY o_year
ORDER BY o_year;
2. Observe the execution information by using the foreign table, or use show_node_info.
This statement is made up of 199 nodes, starting from a ReadTable, and finishes by returning only 2 results to the client.
The execution below has been shortened, but note the rows for LoopJoin:

t=> SELECT message FROM logs WHERE message_type_id = 200 LIMIT 1;
message
---------------------------------------------------------------------------------------------
SELECT o_year, SUM(CASE WHEN nation = 'BRAZIL' THEN volume ELSE 0 END) / SUM(volume) AS mkt_share
: FROM (SELECT datepart(YEAR,o_orderdate) AS o_year,
: l_extendedprice*(1 - l_discount / 100.0) AS volume,
: n2.n_name AS nation
: FROM lineitem
: JOIN part ON p_partkey = CAST (l_partkey AS INT)
: JOIN orders ON l_orderkey = o_orderkey
: JOIN customer ON o_custkey = c_custkey
: JOIN nation n1 ON c_nationkey = n1.n_nationkey
: JOIN region ON n1.n_regionkey = r_regionkey
: JOIN supplier ON s_suppkey = l_suppkey
: JOIN nation n2 ON s_nationkey = n2.n_nationkey
: WHERE o_orderdate BETWEEN '1995-01-01' AND '1996-12-31') AS all_nations
: GROUP BY o_year
: ORDER BY o_year
: 1,PushToNetworkQueue ,2,1,2,2020-09-04 18:32:50,-1,,,,0.27
: 2,Rechunk ,2,1,2,2020-09-04 18:32:50,1,,,,0.00
: 3,SortMerge ,2,1,2,2020-09-04 18:32:49,2,,,,0.00
: 4,GpuToCpu ,2,1,2,2020-09-04 18:32:49,3,,,,0.00
: 5,Sort ,2,1,2,2020-09-04 18:32:49,4,,,,0.00
: 6,ReorderInput ,2,1,2,2020-09-04 18:32:49,5,,,,0.00
: 7,GpuTransform ,2,1,2,2020-09-04 18:32:49,6,,,,0.00
: 8,CpuToGpu ,2,1,2,2020-09-04 18:32:49,7,,,,0.00
: 9,Rechunk ,2,1,2,2020-09-04 18:32:49,8,,,,0.00
: 10,ReduceMerge ,2,1,2,2020-09-04 18:32:49,9,,,,0.03
: 11,GpuToCpu ,6,3,2,2020-09-04 18:32:49,10,,,,0.00
: 12,Reduce ,6,3,2,2020-09-04 18:32:49,11,,,,0.64
[...]
: 49,LoopJoin ,182369485,7,26052783,2020-09-04 18:32:36,48,1915MB,1915MB,inner,4.94
[...]
: 98,LoopJoin ,182369485,12,15197457,2020-09-04 18:32:16,97,2191MB,2191MB,inner,5.01
[...]
: 124,LoopJoin ,182369485,8,22796185,2020-09-04 18:32:03,123,3064MB,3064MB,inner,6.73
[...]
: 150,LoopJoin ,182369485,10,18236948,2020-09-04 18:31:47,149,12860MB,12860MB,inner,23.62
[...]
: 199,ReadTable ,20000000,1,20000000,2020-09-04 18:30:33,198,0MB,,public.part,0.83
Because of the relatively low amount of RAM in the machine and because the data set is rather large at around 10TB, SQream DB needs to spool. The total spool used by this query is around 20GB (1915MB + 2191MB + 3064MB + 12860MB).
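The spool total quoted above is the sum of the write column across the nodes that spooled. The following is a minimal sketch in Python, assuming the write values carry an MB suffix as in the plan shown. The function name total_spool_mb is hypothetical.

```python
def total_spool_mb(nodes):
    """Sum the write column (e.g. '1915MB') over all nodes that spooled."""
    total = 0
    for node in nodes:
        write = node.get("write", "")
        if write.endswith("MB"):
            total += int(write[:-2])
    return total


# Write values of the four LoopJoin nodes from the plan, plus one
# node that did not spool (empty write column).
loop_joins = [
    {"node_id": 49, "write": "1915MB"},
    {"node_id": 98, "write": "2191MB"},
    {"node_id": 124, "write": "3064MB"},
    {"node_id": 150, "write": "12860MB"},
    {"node_id": 199, "write": ""},
]

print(total_spool_mb(loop_joins))  # 20030 (MB), roughly 20GB
```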
2.1.4.7.4.3 Common solutions for reducing spool
• Increase the amount of spool memory available for the workers, as a proportion of the maximum statement memory. When the amount of spool memory is increased, SQream DB may not need to write to disk. This setting is called spoolMemoryGB. Refer to the Configuration guide.
• Reduce the number of workers per host, and increase the amount of spool available to the (now reduced number of) active workers. This may reduce the number of concurrent statements, but will improve performance for heavy statements.
2.1.4.7.4.4 2. Queries with large result sets
When queries have large result sets, you may see a node called DeferredGather. This gathering occurs when the result set is assembled, in preparation for sending it to the client.
2.1.4.7.4.5 Identifying the offending nodes
1. Run a query. For example, a modified query from the TPC-H benchmark:
SELECT s.*,
       l.*,
       r.*,
       n1.*,
       n2.*,
       p.*,
       o.*,
       c.*
FROM lineitem l
  JOIN part p ON p_partkey = CAST (l_partkey AS INT)
  JOIN orders o ON l_orderkey = o_orderkey
  JOIN customer c ON o_custkey = c_custkey
  JOIN nation n1 ON c_nationkey = n1.n_nationkey
  JOIN region r ON n1.n_regionkey = r_regionkey
  JOIN supplier s ON s_suppkey = l_suppkey
  JOIN nation n2 ON s_nationkey = n2.n_nationkey
WHERE r_name = 'AMERICA'
AND   o_orderdate BETWEEN '1995-01-01' AND '1996-12-31'
AND   high_selectivity(p_type = 'ECONOMY BURNISHED NICKEL');
2. Observe the execution information by using the foreign table, or use show_node_info.
This statement is made up of 221 nodes, containing 8 ReadTable nodes, and finishes by returning billions of results to the client.
The execution below has been shortened, but note the rows for DeferredGather:
t=> SELECT show_node_info(494);
stmt_id | node_id | node_type            | rows       | chunks | avg_rows_in_chunk | time                | parent_node_id | read   | write | comment         | timeSum
--------+---------+----------------------+------------+--------+-------------------+---------------------+----------------+--------+-------+-----------------+--------
494     | 1       | PushToNetworkQueue   | 242615     | 1      | 242615            | 2020-09-04 19:07:55 | -1             |        |       |                 | 0.36
494     | 2       | Rechunk              | 242615     | 1      | 242615            | 2020-09-04 19:07:55 | 1              |        |       |                 | 0
494     | 3       | ReorderInput         | 242615     | 1      | 242615            | 2020-09-04 19:07:55 | 2              |        |       |                 | 0
494     | 4       | DeferredGather       | 242615     | 1      | 242615            | 2020-09-04 19:07:55 | 3              |        |       |                 | 0.16
[...]
494     | 166     | DeferredGather       | 3998730    | 39     | 102531            | 2020-09-04 19:07:47 | 165            |        |       |                 | 21.75
[...]
494     | 194     | DeferredGather       | 133241     | 20     | 6662              | 2020-09-04 19:07:03 | 193            |        |       |                 | 0.41
[...]
494     | 221     | ReadTable            | 20000000   | 20     | 1000000           | 2020-09-04 19:07:01 | 220            | 20MB   |       | public.part     | 0.1
When you see DeferredGather operations taking more than a few seconds, that’s a sign that you’re selecting too much data. In this case, the DeferredGather with node ID 166 took over 21 seconds.
3. Modify the statement to see the difference
Altering the select clause to be more restrictive will reduce the deferred gather time back to a few milliseconds.
SELECT DATEPART(year, o_orderdate) AS o_year,
       l_extendedprice * (1 - l_discount / 100.0) as volume,
       n2.n_name as nation
FROM ...
2.1.4.7.4.6 Common solutions for reducing gather time
• Reduce the effect of the preparation time. Avoid selecting unnecessary columns (SELECT * FROM...), or reduce the result set size by using more filters.
2.1.4.7.4.7 3. Inefficient filtering
When running statements, SQream DB tries to avoid reading data that is not needed for the statement by skipping chunks. If statements do not include efficient filtering, SQream DB will read a lot of data off disk. In some cases, you need the data and there’s nothing to do about it. However, if most of it gets pruned further down the line, it may be efficient to skip reading the data altogether by using the metadata.
2.1.4.7.4.8 Identifying the situation
We consider the filtering to be inefficient when the Filter node shows that the number of rows processed is less than a third of the rows passed into it by the ReadTable node.
For example: 1. Run a query. In this example, we execute a modified query from the TPC-H benchmark. Our lineitem table contains 600,037,902 rows.
SELECT o_year,
       SUM(CASE WHEN nation = 'BRAZIL' THEN volume ELSE 0 END) / SUM(volume) AS mkt_share
FROM (SELECT datepart(YEAR,o_orderdate) AS o_year,
             l_extendedprice*(1 - l_discount / 100.0) AS volume,
             n2.n_name AS nation
      FROM lineitem
        JOIN part ON p_partkey = CAST (l_partkey AS INT)
        JOIN orders ON l_orderkey = o_orderkey
        JOIN customer ON o_custkey = c_custkey
        JOIN nation n1 ON c_nationkey = n1.n_nationkey
        JOIN region ON n1.n_regionkey = r_regionkey
        JOIN supplier ON s_suppkey = l_suppkey
        JOIN nation n2 ON s_nationkey = n2.n_nationkey
      WHERE r_name = 'AMERICA'
      AND   lineitem.l_quantity = 3
      AND   o_orderdate BETWEEN '1995-01-01' AND '1996-12-31'
      AND   high_selectivity(p_type = 'ECONOMY BURNISHED NICKEL')) AS all_nations
GROUP BY o_year
ORDER BY o_year;
2. Observe the execution information by using the foreign table, or use show_node_info.
The execution below has been shortened, but note the rows for ReadTable and Filter:
1  t=> SELECT show_node_info(559);
2  stmt_id | node_id | node_type            | rows       | chunks | avg_rows_in_chunk | time                | parent_node_id | read   | write | comment         | timeSum
3  --------+---------+----------------------+------------+--------+-------------------+---------------------+----------------+--------+-------+-----------------+--------
4  559     | 1       | PushToNetworkQueue   | 2          | 1      | 2                 | 2020-09-07 11:12:01 | -1             |        |       |                 | 0.28
5  559     | 2       | Rechunk              | 2          | 1      | 2                 | 2020-09-07 11:12:01 | 1              |        |       |                 | 0
6  559     | 3       | SortMerge            | 2          | 1      | 2                 | 2020-09-07 11:12:01 | 2              |        |       |                 | 0
7  559     | 4       | GpuToCpu             | 2          | 1      | 2                 | 2020-09-07 11:12:01 | 3              |        |       |                 | 0
8  [...]
9  559     | 189     | Filter               | 12007447   | 12     | 1000620           | 2020-09-07 11:12:00 | 188            |        |       |                 | 0.3
10 559     | 190     | GpuTransform         | 600037902  | 12     | 50003158          | 2020-09-07 11:12:00 | 189            |        |       |                 | 0.02
11 559     | 191     | GpuDecompress        | 600037902  | 12     | 50003158          | 2020-09-07 11:12:00 | 190            |        |       |                 | 0.16
12 559     | 192     | GpuTransform         | 600037902  | 12     | 50003158          | 2020-09-07 11:12:00 | 191            |        |       |                 | 0.02
13 559     | 193     | CpuToGpu             | 600037902  | 12     | 50003158          | 2020-09-07 11:12:00 | 192            |        |       |                 | 1.47
14 559     | 194     | ReorderInput         | 600037902  | 12     | 50003158          | 2020-09-07 11:12:00 | 193            |        |       |                 | 0
15 559     | 195     | Rechunk              | 600037902  | 12     | 50003158          | 2020-09-07 11:12:00 | 194            |        |       |                 | 0
16 559     | 196     | CpuDecompress        | 600037902  | 12     | 50003158          | 2020-09-07 11:12:00 | 195            |        |       |                 | 0
17 559     | 197     | ReadTable            | 600037902  | 12     | 50003158          | 2020-09-07 11:12:00 | 196            | 7587MB |       | public.lineitem | 0.1
18 [...]
19 559     | 208     | Filter               | 133241     | 20     | 6662              | 2020-09-07 11:11:57 | 207            |        |       |                 | 0.01
20 559     | 209     | GpuTransform         | 20000000   | 20     | 1000000           | 2020-09-07 11:11:57 | 208            |        |       |                 | 0.02
21 559     | 210     | GpuDecompress        | 20000000   | 20     | 1000000           | 2020-09-07 11:11:57 | 209            |        |       |                 | 0.03
22 559     | 211     | GpuTransform         | 20000000   | 20     | 1000000           | 2020-09-07 11:11:57 | 210            |        |       |                 | 0
23 559     | 212     | CpuToGpu             | 20000000   | 20     | 1000000           | 2020-09-07 11:11:57 | 211            |        |       |                 | 0.01
24 559     | 213     | ReorderInput         | 20000000   | 20     | 1000000           | 2020-09-07 11:11:57 | 212            |        |       |                 | 0
25 559     | 214     | Rechunk              | 20000000   | 20     | 1000000           | 2020-09-07 11:11:57 | 213            |        |       |                 | 0
26 559     | 215     | CpuDecompress        | 20000000   | 20     | 1000000           | 2020-09-07 11:11:57 | 214            |        |       |                 | 0
27 559     | 216     | ReadTable            | 20000000   | 20     | 1000000           | 2020-09-07 11:11:57 | 215            | 20MB   |       | public.part     | 0
• The Filter on line 9 has processed 12,007,447 rows, but the output of ReadTable on public.lineitem on line 17 was 600,037,902 rows. This means that it has filtered out 98% (1 − 12007447 / 600037902 ≈ 98%) of the data, but the entire table was read.
• The Filter on line 19 has processed 133,241 rows, but the output of ReadTable on public.part on line 27 was 20,000,000 rows. This means that it has filtered out more than 99% (1 − 133241 / 20000000 ≈ 99.3%) of the data, but the entire table was read. However, this table is small enough that we can ignore it.
3. Modify the statement to see the difference
Altering the statement to have a WHERE condition on the clustered l_orderkey column of the lineitem table will help SQream DB skip reading the data.
SELECT o_year,
       SUM(CASE WHEN nation = 'BRAZIL' THEN volume ELSE 0 END) / SUM(volume) AS mkt_share
FROM (SELECT datepart(YEAR,o_orderdate) AS o_year,
             l_extendedprice*(1 - l_discount / 100.0) AS volume,
             n2.n_name AS nation
      FROM lineitem
        JOIN part ON p_partkey = CAST (l_partkey AS INT)
        JOIN orders ON l_orderkey = o_orderkey
        JOIN customer ON o_custkey = c_custkey
        JOIN nation n1 ON c_nationkey = n1.n_nationkey
        JOIN region ON n1.n_regionkey = r_regionkey
        JOIN supplier ON s_suppkey = l_suppkey
        JOIN nation n2 ON s_nationkey = n2.n_nationkey
      WHERE r_name = 'AMERICA'
      AND   lineitem.l_orderkey > 4500000
      AND   o_orderdate BETWEEN '1995-01-01' AND '1996-12-31'
      AND   high_selectivity(p_type = 'ECONOMY BURNISHED NICKEL')) AS all_nations
GROUP BY o_year
ORDER BY o_year;
t=> SELECT show_node_info(586);
stmt_id | node_id | node_type            | rows       | chunks | avg_rows_in_chunk | time                | parent_node_id | read   | write | comment         | timeSum
--------+---------+----------------------+------------+--------+-------------------+---------------------+----------------+--------+-------+-----------------+--------
[...]
586     | 190     | Filter               | 494621593  | 8      | 61827699          | 2020-09-07 13:20:45 | 189            |        |       |                 | 0.39
586     | 191     | GpuTransform         | 494927872  | 8      | 61865984          | 2020-09-07 13:20:44 | 190            |        |       |                 | 0.03
586     | 192     | GpuDecompress        | 494927872  | 8      | 61865984          | 2020-09-07 13:20:44 | 191            |        |       |                 | 0.26
586     | 193     | GpuTransform         | 494927872  | 8      | 61865984          | 2020-09-07 13:20:44 | 192            |        |       |                 | 0.01
586     | 194     | CpuToGpu             | 494927872  | 8      | 61865984          | 2020-09-07 13:20:44 | 193            |        |       |                 | 1.86
586     | 195     | ReorderInput         | 494927872  | 8      | 61865984          | 2020-09-07 13:20:44 | 194            |        |       |                 | 0
586     | 196     | Rechunk              | 494927872  | 8      | 61865984          | 2020-09-07 13:20:44 | 195            |        |       |                 | 0
586     | 197     | CpuDecompress        | 494927872  | 8      | 61865984          | 2020-09-07 13:20:44 | 196            |        |       |                 | 0
586     | 198     | ReadTable            | 494927872  | 8      | 61865984          | 2020-09-07 13:20:44 | 197            | 6595MB |       | public.lineitem | 0.09
[...]
In this example, the filter processed 494,621,593 rows, while the output of ReadTable on public.lineitem was 494,927,872 rows. This means that the Filter removed only about 0.06% (1 − 494621593 / 494927872 ≈ 0.06%) of the rows that were read. The metadata skipping has performed very well, and has pre-filtered the data for us by pruning unnecessary chunks.
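The comparison between the Filter and ReadTable row counts can be sketched in Python. This is an illustrative calculation using the one-third rule of thumb from this section and the row counts from the two plans above. The function names are hypothetical.

```python
def filtered_out_fraction(rows_after_filter, rows_read):
    """Fraction of the rows read by ReadTable that the Filter node discarded."""
    if rows_read <= 0:
        raise ValueError("rows_read must be positive")
    return 1 - rows_after_filter / rows_read


def is_filter_inefficient(rows_after_filter, rows_read):
    """Rule of thumb: the Filter keeps less than a third of the rows read."""
    return rows_after_filter * 3 < rows_read


for label, kept, read in [
    ("statement 559, lineitem", 12007447, 600037902),
    ("statement 586, lineitem", 494621593, 494927872),
]:
    share = filtered_out_fraction(kept, read)
    print(label, f"{share:.2%}", is_filter_inefficient(kept, read))
```

The first statement is flagged as inefficient because almost everything it reads is discarded. The second is not, because nearly every row that was read survives the filter.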
2.1.4.7.4.9 Common solutions for improving filtering
• Use clustering keys and naturally ordered data in your filters.
• Avoid full table scans when possible.
2.1.4.7.4.10 4. Joins with varchar keys
Joins on long text keys, such as varchar(100), do not perform as well as joins on numeric data types or very short text keys.
2.1.4.7.4.11 Identifying the situation
When a join is inefficient, you may note that a query spends a lot of time on the Join node. For example, consider these two table structures:
CREATE TABLE t_a
(
  amt            FLOAT NOT NULL,
  i              INT NOT NULL,
  ts             DATETIME NOT NULL,
  country_code   VARCHAR(3) NOT NULL,
  flag           VARCHAR(10) NOT NULL,
  fk             VARCHAR(50) NOT NULL
);
CREATE TABLE t_b
(
  id     VARCHAR(50) NOT NULL,
  prob   FLOAT NOT NULL,
  j      INT NOT NULL
);
1. Run a query. In this example, we will join t_a.fk with t_b.id, both of which are VARCHAR(50).
SELECT AVG(t_b.j :: BIGINT),
       t_a.country_code
FROM t_a
  JOIN t_b ON (t_a.fk = t_b.id)
GROUP BY t_a.country_code
2. Observe the execution information by using the foreign table, or use show_node_info.
The execution below has been shortened, but note the row for Join. The Join node is by far the most time-consuming part of this statement, clocking in at 69.7 seconds joining 1.5 billion records.
t=> SELECT show_node_info(5);
stmt_id | node_id | node_type            | rows       | chunks | avg_rows_in_chunk | time                | parent_node_id | read   | write | comment         | timeSum
--------+---------+----------------------+------------+--------+-------------------+---------------------+----------------+--------+-------+-----------------+--------
[...]
5       | 19      | GpuTransform         | 1497366528 | 204    | 7340032           | 2020-09-08 18:29:03 | 18             |        |       |                 | 1.46
5       | 20      | ReorderInput         | 1497366528 | 204    | 7340032           | 2020-09-08 18:29:03 | 19             |        |       |                 | 0
5       | 21      | ReorderInput         | 1497366528 | 204    | 7340032           | 2020-09-08 18:29:03 | 20             |        |       |                 | 0
5       | 22      | Join                 | 1497366528 | 204    | 7340032           | 2020-09-08 18:29:03 | 21             |        |       | inner           | 69.7
5       | 24      | AddSortedMinMaxMet.. | 6291456    | 1      | 6291456           | 2020-09-08 18:26:05 | 22             |        |       |                 | 0
5       | 25      | Sort                 | 6291456    | 1      | 6291456           | 2020-09-08 18:26:05 | 24             |        |       |                 | 2.06
[...]
5       | 31      | ReadTable            | 6291456    | 1      | 6291456           | 2020-09-08 18:26:03 | 30             | 235MB  |       | public.t_b      | 0.02
[...]
5       | 41      | CpuDecompress        | 10000000   | 2      | 5000000           | 2020-09-08 18:26:09 | 40             |        |       |                 | 0
5       | 42      | ReadTable            | 10000000   | 2      | 5000000           | 2020-09-08 18:26:09 | 41             | 14MB   |       | public.t_a      | 0
2.1.4.7.4.12 Improving query performance
• In general, try to avoid VARCHAR as a join key. As a rule of thumb, BIGINT works best as a join key.
• Convert text values on-the-fly before running the query. The CRC64 function takes a text input and returns a BIGINT hash. For example:
SELECT AVG(t_b.j :: BIGINT), t_a.country_code FROM t_a JOIN t_b ON (crc64_join(t_a.fk) = crc64_join(t_b.id)) GROUP BY t_a.country_code
The execution below has been shortened, but note the highlighted rows for Join. The Join node went from taking nearly 70 seconds, to just 6.67 seconds for joining 1.5 billion records.
t=> SELECT show_node_info(6);
stmt_id | node_id | node_type            | rows       | chunks | avg_rows_in_chunk | time                | parent_node_id | read  | write | comment    | timeSum
--------+---------+----------------------+------------+--------+-------------------+---------------------+----------------+-------+-------+------------+--------
[...]
6       | 19      | GpuTransform         | 1497366528 | 85     | 17825792          | 2020-09-08 18:57:04 | 18             |       |       |            | 1.48
6       | 20      | ReorderInput         | 1497366528 | 85     | 17825792          | 2020-09-08 18:57:04 | 19             |       |       |            | 0
6       | 21      | ReorderInput         | 1497366528 | 85     | 17825792          | 2020-09-08 18:57:04 | 20             |       |       |            | 0
6       | 22      | Join                 | 1497366528 | 85     | 17825792          | 2020-09-08 18:57:04 | 21             |       |       | inner      | 6.67
6       | 24      | AddSortedMinMaxMet.. | 6291456    | 1      | 6291456           | 2020-09-08 18:55:12 | 22             |       |       |            | 0
[...]
6       | 32      | ReadTable            | 6291456    | 1      | 6291456           | 2020-09-08 18:55:12 | 31             | 235MB |       | public.t_b | 0.02
[...]
6       | 43      | CpuDecompress        | 10000000   | 2      | 5000000           | 2020-09-08 18:55:13 | 42             |       |       |            | 0
6       | 44      | ReadTable            | 10000000   | 2      | 5000000           | 2020-09-08 18:55:13 | 43             | 14MB  |       | public.t_a | 0
• You can map some text values to numeric types by using a dimension table. Then, reconcile the values when you need them by joining the dimension table.
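The dimension-table approach in the last bullet can be sketched as follows. This is a minimal illustration under stated assumptions, not a prescribed schema: the table dim_fk and the columns fk_id, fk_text and id_num are hypothetical names, and the sketch assumes t_a and t_b have been reloaded so that they carry a BIGINT surrogate key in place of the VARCHAR(50) value.

```sql
-- Hypothetical dimension table: one row per distinct text key.
CREATE TABLE dim_fk (
  fk_id   BIGINT      NOT NULL,  -- numeric surrogate key (assumed name)
  fk_text VARCHAR(50) NOT NULL   -- original text value
);

-- The heavy join now runs on BIGINT keys.
-- t_a.fk_id and t_b.id_num are assumed columns that replace the text keys.
SELECT AVG(t_b.j :: BIGINT),
       t_a.country_code
FROM t_a
JOIN t_b ON (t_a.fk_id = t_b.id_num)
GROUP BY t_a.country_code;

-- Reconcile the original text value only when it is actually needed.
SELECT dim_fk.fk_text,
       t_a.country_code
FROM t_a
JOIN dim_fk ON (t_a.fk_id = dim_fk.fk_id);
```

The trade-off is an extra table to maintain at load time in exchange for numeric join keys at query time.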
2.1.4.7.4.13 5. Sorting on big VARCHAR fields
In general, SQream DB automatically inserts a Sort node which arranges the data prior to reductions and aggregations. When running a GROUP BY on large VARCHAR fields, you may see nodes for Sort and Reduce taking a long time.
2.1.4.7.4.14 Identifying the situation
When running a statement, inspect it with SHOW_NODE_INFO. If you see Sort and Reduce among your top five longest running nodes, there is a potential issue. For example:

1. Run a query to test it out. Our t_inefficient table contains 60,000,000 rows, and the structure is simple, but with an oversized country_code column:

CREATE TABLE t_inefficient (
  i INT NOT NULL,
  amt DOUBLE NOT NULL,
  ts DATETIME NOT NULL,
  country_code VARCHAR(100) NOT NULL,
  flag VARCHAR(10) NOT NULL,
  string_fk VARCHAR(50) NOT NULL
);

We will run a query, and inspect its execution details:
t=> SELECT country_code,
.          SUM(amt)
.   FROM t_inefficient
.   GROUP BY country_code;
executed time: 47.55s

country_code | sum
-------------+-----------
VUT          | 1195416012
GIB          | 1195710372
TUR          | 1195946178
[...]
t=> select show_node_info(30);
stmt_id | node_id | node_type          | rows     | chunks | avg_rows_in_chunk | time                | parent_node_id | read  | write | comment              | timeSum
--------+---------+--------------------+----------+--------+-------------------+---------------------+----------------+-------+-------+----------------------+--------
30      | 1       | PushToNetworkQueue | 249      | 1      | 249               | 2020-09-10 16:17:10 | -1             |       |       |                      | 0.25
30      | 2       | Rechunk            | 249      | 1      | 249               | 2020-09-10 16:17:10 | 1              |       |       |                      | 0
30      | 3       | ReduceMerge        | 249      | 1      | 249               | 2020-09-10 16:17:10 | 2              |       |       |                      | 0.01
30      | 4       | GpuToCpu           | 1508     | 15     | 100               | 2020-09-10 16:17:10 | 3              |       |       |                      | 0
30      | 5       | Reduce             | 1508     | 15     | 100               | 2020-09-10 16:17:10 | 4              |       |       |                      | 7.23
30      | 6       | Sort               | 60000000 | 15     | 4000000           | 2020-09-10 16:17:10 | 5              |       |       |                      | 36.8
30      | 7       | GpuTransform       | 60000000 | 15     | 4000000           | 2020-09-10 16:17:10 | 6              |       |       |                      | 0.08
30      | 8       | GpuDecompress      | 60000000 | 15     | 4000000           | 2020-09-10 16:17:10 | 7              |       |       |                      | 2.01
30      | 9       | CpuToGpu           | 60000000 | 15     | 4000000           | 2020-09-10 16:17:10 | 8              |       |       |                      | 0.16
30      | 10      | Rechunk            | 60000000 | 15     | 4000000           | 2020-09-10 16:17:10 | 9              |       |       |                      | 0
30      | 11      | CpuDecompress      | 60000000 | 15     | 4000000           | 2020-09-10 16:17:10 | 10             |       |       |                      | 0
30      | 12      | ReadTable          | 60000000 | 15     | 4000000           | 2020-09-10 16:17:10 | 11             | 520MB |       | public.t_inefficient | 0.05
2. We can check whether the GROUP BY key can be shrunk:

t=> SELECT MAX(LEN(country_code)) FROM t_inefficient;
max
---
3

With a maximum string length of just 3 characters, our VARCHAR(100) is far larger than it needs to be.

3. We can recreate the table with a more restrictive VARCHAR(3), and examine the difference in performance:

t=> CREATE TABLE t_efficient
.   AS SELECT i,
.             amt,
.             ts,
.             country_code::VARCHAR(3) AS country_code,
.             flag
.   FROM t_inefficient;
executed time: 16.03s

t=> SELECT country_code,
.          SUM(amt::bigint)
.   FROM t_efficient
.   GROUP BY country_code;
executed time: 4.75s
country_code | sum
-------------+-----------
VUT          | 1195416012
GIB          | 1195710372
TUR          | 1195946178
[...]
This time, the entire query took just 4.75 seconds instead of 47.55 seconds, a reduction of about 90% in runtime.
2.1.4.7.4.15 Improving sort performance on text keys
When using VARCHAR, ensure that the maximum length defined in the table structure is as small as necessary. For example, if you’re storing phone numbers, don’t define the field as VARCHAR(255), as that affects sort performance. You can run a query to get the maximum column length (e.g. MAX(LEN(a_column))), and potentially modify the table structure.
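As a sketch of that check, the query below measures every VARCHAR column of the t_inefficient table from the example above in a single pass. It uses only MAX and LEN as shown earlier; the column aliases are illustrative.

```sql
-- Longest stored value per VARCHAR column of t_inefficient
SELECT MAX(LEN(country_code)) AS max_country_code,  -- declared VARCHAR(100)
       MAX(LEN(flag))         AS max_flag,          -- declared VARCHAR(10)
       MAX(LEN(string_fk))    AS max_string_fk      -- declared VARCHAR(50)
FROM t_inefficient;
```

Any column whose result is well below its declared length is a candidate for a narrower VARCHAR when the table is recreated.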
2.1.4.7.4.16 6. High selectivity data
Selectivity is the ratio of cardinality to the number of records of a chunk. We define selectivity as:

selectivity = (distinct values) / (total number of records in a chunk)
SQream DB has a hint called HIGH_SELECTIVITY, which is a function you can wrap a condition in. The hint signals to SQream DB that the result of the condition will be very sparse, and that it should attempt to rechunk the results into fewer, fuller chunks.
Note: SQream DB doesn’t do this automatically because it adds a significant overhead on naturally ordered and well-clustered data, which is the more common scenario.
2.1.4.7.4.17 Identifying the situation
This situation is easy to identify: the average number of rows in a chunk is small following a Filter operation. Consider this execution plan:

t=> select show_node_info(30);
stmt_id | node_id | node_type | rows     | chunks | avg_rows_in_chunk | time                | parent_node_id | read  | write | comment    | timeSum
--------+---------+-----------+----------+--------+-------------------+---------------------+----------------+-------+-------+------------+--------
[...]
30      | 38      | Filter    | 18160    | 74     | 245               | 2020-09-10 12:17:09 | 37             |       |       |            | 0.012
[...]
30      | 44      | ReadTable | 77000000 | 74     | 1040540           | 2020-09-10 12:17:09 | 43             | 277MB |       | public.dim | 0.058

The table was read entirely - 77 million rows into 74 chunks. The filter node reduced the output to just 18,160 relevant rows (about 245 rows per chunk on average), but they’re distributed across the original 74 chunks. All of these rows could fit in one single chunk, instead of spanning 74 rather sparse chunks.
2.1.4.7.4.18 Improving performance with high selectivity hints
• Use when there’s a WHERE condition on an unclustered column, and when you expect the filter to cut out more than 60% of the result set.
• Use when the data is uniformly distributed or random.
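One way to check the 60% rule of thumb before adding the hint is to compare row counts with and without the candidate condition. The sketch below assumes the cdrs table and the RequestReceiveTime condition used in the join example later in this guide; substitute your own table and condition.

```sql
-- Total rows in the table
SELECT COUNT(*) FROM cdrs;

-- Rows that survive the candidate condition
SELECT COUNT(*)
FROM cdrs
WHERE RequestReceiveTime BETWEEN '2018-01-01 00:00:00.000' AND '2018-08-31 23:59:59.999';
```

If the second count is less than 40% of the first, the condition removes more than 60% of the rows and is a reasonable candidate for the hint.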
2.1.4.7.4.19 7. Performance of unsorted data in joins
When data is not well-clustered or naturally ordered, a join operation can take a long time.
2.1.4.7.4.20 Identifying the situation
When running a statement, inspect it with SHOW_NODE_INFO. If you see Join and DeferredGather among your top five longest running nodes, there is a potential issue. In this case, we’re also interested in the number of chunks produced by these nodes. Consider this execution plan:
t=> select show_node_info(30);
stmt_id | node_id | node_type      | rows      | chunks | avg_rows_in_chunk | time                | parent_node_id | read  | write | comment    | timeSum
--------+---------+----------------+-----------+--------+-------------------+---------------------+----------------+-------+-------+------------+--------
[...]
30      | 13      | ReorderInput   | 181582598 | 70596  | 2572              | 2020-09-10 12:17:10 | 12             |       |       |            | 4.681
30      | 14      | DeferredGather | 181582598 | 70596  | 2572              | 2020-09-10 12:17:10 | 13             |       |       |            | 29.901
30      | 15      | ReorderInput   | 181582598 | 70596  | 2572              | 2020-09-10 12:17:10 | 14             |       |       |            | 3.053
30      | 16      | GpuToCpu       | 181582598 | 70596  | 2572              | 2020-09-10 12:17:10 | 15             |       |       |            | 5.798
30      | 17      | ReorderInput   | 181582598 | 70596  | 2572              | 2020-09-10 12:17:10 | 16             |       |       |            | 2.899
30      | 18      | ReorderInput   | 181582598 | 70596  | 2572              | 2020-09-10 12:17:10 | 17             |       |       |            | 3.695
30      | 19      | Join           | 181582598 | 70596  | 2572              | 2020-09-10 12:17:10 | 18             |       |       | inner      | 22.745
[...]
30      | 38      | Filter         | 18160     | 74     | 245               | 2020-09-10 12:17:09 | 37             |       |       |            | 0.012
[...]
30      | 44      | ReadTable      | 77000000  | 74     | 1040540           | 2020-09-10 12:17:09 | 43             | 277MB |       | public.dim | 0.058

• Join is the node that matches rows from both table relations.
• DeferredGather gathers the required column chunks to decompress.

Pay special attention to the volume of data removed by the Filter node. The table was read entirely - 77 million rows into 74 chunks. The filter node reduced the output to just 18,160 relevant rows, but they’re distributed across the original 74 chunks. All of these rows could fit in one single chunk, instead of spanning 74 rather sparse chunks.
2.1.4.7.4.21 Improving join performance when data is sparse
If you know that the filter is going to be quite aggressive, you can tell SQream DB to reduce the number of chunks involved by using the HIGH_SELECTIVITY hint described above. This forces the compiler to rechunk the data into fewer chunks. To tell SQream DB to rechunk the data, wrap a condition (or several) in the HIGH_SELECTIVITY hint:

-- Without the hint
SELECT *
FROM cdrs
WHERE RequestReceiveTime BETWEEN '2018-01-01 00:00:00.000' AND '2018-08-31 23:59:59.999'
  AND EnterpriseID=1150
  AND MSISDN='9724871140341';

-- With the hint
SELECT *
FROM cdrs
WHERE HIGH_SELECTIVITY(RequestReceiveTime BETWEEN '2018-01-01 00:00:00.000' AND '2018-08-31 23:59:59.999')
  AND EnterpriseID=1150
  AND MSISDN='9724871140341';
2.1.4.7.4.22 8. Manual join reordering
When joining multiple tables, you may wish to change the join order to join the smallest tables first.
2.1.4.7.4.23 Identifying the situation
When joining more than two tables, the Join nodes will be the most time-consuming nodes.
2.1.4.7.4.24 Changing the join order
Always prefer to join the smallest tables first.
Note: We consider small tables to be tables that only retain a small amount of rows after conditions are applied. This bears no direct relation to the amount of total rows in the table.
Changing the join order can reduce the query runtime significantly. In the examples below, we reduce the time from 27.3 seconds to just 6.4 seconds.
Listing 11: Original query

-- This variant runs in 27.3 seconds
SELECT SUM(l_extendedprice / 100.0*(1 - l_discount / 100.0)) AS revenue,
       c_nationkey
FROM lineitem --6B Rows, ~183GB
  JOIN orders --1.5B Rows, ~55GB
    ON l_orderkey = o_orderkey
  JOIN customer --150M Rows, ~12GB
    ON c_custkey = o_custkey
WHERE c_nationkey = 1
  AND o_orderdate >= DATE '1993-01-01'
  AND o_orderdate < '1994-01-01'
  AND l_shipdate >= '1993-01-01'
  AND l_shipdate <= dateadd(DAY,122,'1994-01-01')
GROUP BY c_nationkey

Listing 12: Modified query with improved join order

-- This variant runs in 6.4 seconds
SELECT SUM(l_extendedprice / 100.0*(1 - l_discount / 100.0)) AS revenue,
       c_nationkey
FROM orders --1.5B Rows, ~55GB
  JOIN customer --150M Rows, ~12GB
    ON c_custkey = o_custkey
  JOIN lineitem --6B Rows, ~183GB
    ON l_orderkey = o_orderkey
WHERE c_nationkey = 1
  AND o_orderdate >= DATE '1993-01-01'
  AND o_orderdate < '1994-01-01'
  AND l_shipdate >= '1993-01-01'
  AND l_shipdate <= dateadd(DAY,122,'1994-01-01')
GROUP BY c_nationkey
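Because "small" here refers to the rows a table retains after its conditions are applied, one way to choose a join order is to count those rows per table first. The sketch below reuses the tables and conditions from the listings above; it is an illustrative check, not a required step.

```sql
-- Rows each table retains once its own conditions are applied
SELECT COUNT(*)
FROM customer
WHERE c_nationkey = 1;

SELECT COUNT(*)
FROM orders
WHERE o_orderdate >= DATE '1993-01-01'
  AND o_orderdate < '1994-01-01';

SELECT COUNT(*)
FROM lineitem
WHERE l_shipdate >= '1993-01-01'
  AND l_shipdate <= dateadd(DAY,122,'1994-01-01');
```

Tables with the lowest counts are the candidates to join first.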
2.1.4.7.5 Further reading
See our Optimization and Best Practices guide for more information about query optimization and data loading considerations.
2.1.4.8 Logging
2.1.4.8.1 Locating the Log Files
The storage cluster contains a logs directory. Each worker produces a log file in its own directory, which can be identified by the worker’s hostname and port.
Note: Additional internal debug logs may reside in the main logs directory.
The worker logs contain information messages, warnings, and errors pertaining to SQream DB’s operation, including:

• Server start-up and shutdown
• Configuration changes
• Exceptions and errors
• User login events
• Session events
• Statement execution success / failure
• Statement execution statistics
2.1.4.8.1.1 Log Structure and Contents
The log is a CSV, with several fields.
Table 8: Log fields

Field             | Description
------------------+-------------------------------------------------------------
#SQ#              | Start delimiter. When used with the end of line delimiter can be used to parse multi-line statements correctly
Row Id            | Unique identifier for the row
Timestamp         | Timestamp for the message (ISO 8601 date format)
Information Level | Information level of the message. See information level table below
Thread Id         | System thread identifier (internal use)
Worker hostname   | Hostname of the worker that generated the message
Worker port       | Port of the worker that generated the message
Connection Id     | Connection Id for the message. Defaults to -1 if no connection
Database name     | Database name that generated the message. Can be empty for no database
User Id           | User role that was connected during the message. Can be empty if no user caused the message
Statement Id      | Statement Id for the message. Defaults to -1 if no statement
Service name      | Service name for the connection. Can be empty.
Message type Id   | Message type Id. See message type table below
Message           | Content for the message
#EOM#             | End of line delimiter

Table 9: Information Level

Level   | Description
--------+----------------------------------------------------------------
SYSTEM  | System information like start up, shutdown, configuration change
FATAL   | Fatal errors that may cause outage
ERROR   | Errors encountered during statement execution
WARNING | Warnings
INFO    | Information and statistics
Table 10: Message Type

Type | Level  | Description                                        | Example message content
-----+--------+----------------------------------------------------+------------------------
1    | INFO   | Statement start information                        | "Query before parsing" (statement handle opened); "SELECT * FROM nba WHERE ""Team"" NOT LIKE ""Portland%%""" (statement preparing)
2    | INFO   | Statement passed to another worker for execution   | "Reconstruct query before parsing"; "SELECT * FROM nba WHERE ""Team"" NOT LIKE ""Portland%%""" (statement preparing on node)
4    | INFO   | Statement has entered execution                    | "Statement execution"
10   | INFO   | Statement execution completed                      | "Success" / "Failed"
20   | INFO   | Compilation error, with accompanying error message | "Could not find function dateplart in catalog."
21   | INFO   | Execution error, with accompanying error message   | Error text
30   | INFO   | Size of data read from disk in megabytes           | 18
31   | INFO   | Row count of result set                            | 45
32   | INFO   | Processed Rows                                     | 450134749978
100  | INFO   | Session start - Client IP address                  | "192.168.5.5"
101  | INFO   | Login                                              | "Login Success" / "Login Failed"
110  | INFO   | Session end                                        | "Session ended"
200  | INFO   | SHOW_NODE_INFO periodic output                     |
500  | ERROR  | Exception occurred in a statement                  | "Cannot return the inverse cosine of a number not in [-1,1] range"
1000 | SYSTEM | Worker startup message                             | "Server Start Time - 2019-12-30 21:18:31, SQream ver{v2020.1}"
1002 | SYSTEM | Metadata                                           | Metadata server location
1003 | SYSTEM | Show all configuration values                      | "Flags configuration: compileFlags, extendedAssertions, false, true; compileFlags, useSortMergeJoin, false, false; compileFlags, distinctAggregatesOnHost, true, false; [...]"
1004 | SYSTEM | SQream DB metadata version                         | "23"
1010 | FATAL  | Fatal server error                                 | "Mismatch in storage version, upgrade is needed, Storage version: 22, Server version is: 23"
1090 | INFO   | Configuration change                               | Successful set config useSortMergeJoin to value: true
1100 | SYSTEM | Worker shutdown                                    | "Server shutdown"
2.1.4.8.1.2 Log-Naming
Log file name syntax sqream_
2.1.4.8.2 Log Control and Maintenance
2.1.4.8.2.1 Changing Log Verbosity
A few configuration settings alter the verbosity of the logs:
Table 11: Log verbosity configuration

Flag               | Description | Default | Values
-------------------+-------------+---------+-------
logClientLevel     | Used to control which log level should appear in the logs | 4 (INFO) | 0 SYSTEM (lowest) - 4 INFO (highest). See information level table above.
nodeInfoLoggingSec | Sets an interval for automatically logging long-running statements’ SHOW_NODE_INFO output. Output is written as a message type 200. | 60 (every minute) | Positive whole number >=1.
2.1.4.8.2.2 Changing Log Rotation
A few configuration settings alter the log rotation policy:
Table 12: Log rotation configuration

Flag                       | Description | Default | Values
---------------------------+-------------+---------+-------
useLogMaxFileSize          | Rotate log files once they reach a certain file size. When true, set the logMaxFileSizeMB accordingly. | false | false or true.
logMaxFileSizeMB           | Sets the size threshold in megabytes after which a new log file will be opened. | 20 | 1 to 1024 (1MB to 1GB)
logFileRotateTimeFrequency | Frequency of log rotation | never | daily, weekly, monthly, never
2.1.4.8.3 Collecting Logs from Your Cluster
Collecting logs from your cluster can be as simple as creating an archive from the logs subdirectory: tar -czvf logs.tgz *.log.
However, SQream DB comes bundled with a data collection utility and an SQL utility intended for collecting logs and additional information that can help SQream support drill down into possible issues.
2.1.4.8.3.1 SQL Syntax
SELECT REPORT_COLLECTION(output_path, mode) ;

output_path ::= filepath

mode ::= log | db | db_and_log
2.1.4.8.3.2 Command Line Utility
If you cannot access SQream DB for any reason, you can also use a command line tool to collect the same information:
$ ./bin/report_collection
2.1.4.8.3.3 Parameters
Parameter   | Description
------------+------------
output_path | Path for the output archive. The output file will be named report_
2.1.4.8.3.4 Example
Write an archive to /home/rhendricks, containing log files:
SELECT REPORT_COLLECTION('/home/rhendricks', 'log') ;
Write an archive to /home/rhendricks, containing log files and metadata database:
SELECT REPORT_COLLECTION('/home/rhendricks', 'db_and_log') ;
Using the command line utility:
$ ./bin/report_collection /home/rhendricks/sqream_storage /home/rhendricks db_and_log
2.1.4.8.4 Troubleshooting with Logs
2.1.4.8.4.1 Loading Logs with Foreign Tables
Assuming logs are stored at /home/rhendricks/sqream_storage/logs/, a database administrator can access the logs using the Foreign Tables concept through SQream DB.

CREATE FOREIGN TABLE logs
(
  start_marker      VARCHAR(4),
  row_id            BIGINT,
  timestamp         DATETIME,
  message_level     TEXT,
  thread_id         TEXT,
  worker_hostname   TEXT,
  worker_port       INT,
  connection_id     INT,
  database_name     TEXT,
  user_name         TEXT,
  statement_id      INT,
  service_name      TEXT,
  message_type_id   INT,
  message           TEXT,
  end_message       VARCHAR(5)
)
WRAPPER csv_fdw
OPTIONS
  (
    LOCATION = '/home/rhendricks/sqream_storage/logs/**/sqream*.log',
    DELIMITER = '|',
    CONTINUE_ON_ERROR = true
  )
;
For more information, see Loading Logs with Foreign Tables.
2.1.4.8.4.2 Counting Message Types

t=> SELECT message_type_id, COUNT(*) FROM logs GROUP BY 1;
message_type_id | count
----------------+------
0               | 9
1               | 5578
4               | 2319
10              | 2788
20              | 549
30              | 411
31              | 1720
32              | 1720
100             | 2592
101             | 2598
110             | 2571
200             | 11
500             | 136
1000            | 19
1003            | 19
1004            | 19
1010            | 5
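The same grouping can be applied to the information level instead of the message type. A minimal sketch, assuming the logs foreign table defined above, whose message_level column carries the levels listed in the Information Level table:

```sql
-- Count log messages per information level (SYSTEM, FATAL, ERROR, WARNING, INFO)
SELECT message_level, COUNT(*)
FROM logs
GROUP BY 1;
```

A high share of ERROR or FATAL rows is a cue to drill down with the queries in the following sections.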
2.1.4.8.4.3 Finding Fatal Errors

t=> SELECT message FROM logs WHERE message_type_id=1010;
Internal Runtime Error,open cluster metadata database:IO error: lock /home/rhendricks/sqream_storage/leveldb/LOCK: Resource temporarily unavailable
Internal Runtime Error,open cluster metadata database:IO error: lock /home/rhendricks/sqream_storage/leveldb/LOCK: Resource temporarily unavailable
Mismatch in storage version, upgrade is needed,Storage version: 25, Server version is: 26
Mismatch in storage version, upgrade is needed,Storage version: 25, Server version is: 26
Internal Runtime Error,open cluster metadata database:IO error: lock /home/rhendricks/sqream_storage/LOCK: Resource temporarily unavailable
2.1.4.8.4.4 Counting Error Events Within a Certain Timeframe

t=> SELECT message_type_id,
.          COUNT(*)
.   FROM logs
.   WHERE message_type_id IN (1010,500)
.   AND timestamp BETWEEN '2019-12-20' AND '2020-01-01'
.   GROUP BY 1;
message_type_id | count
----------------+------
500             | 18
1010            | 3
2.1.4.8.4.5 Tracing Errors to Find Offending Statements
If we know an error occurred, but don’t know which statement caused it, we can find it using the connection ID and statement ID.

t=> SELECT connection_id, statement_id, message
.   FROM logs
.   WHERE message_level = 'ERROR'
.   AND timestamp BETWEEN '2020-01-01' AND '2020-01-06';
connection_id | statement_id | message
--------------+--------------+--------
79            | 67           | Column type mismatch, expected UByte, got INT64 on column Number, file name: /home/sqream/nba.parquet

Use the connection_id and statement_id to narrow down the results.

t=> SELECT database_name, message FROM logs
.   WHERE connection_id=79 AND statement_id=67 AND message_type_id=1;
database_name | message
--------------+--------------------------
master        | Query before parsing
master        | SELECT * FROM nba_parquet
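Once the statement is identified, the other message types recorded for it can be pulled the same way. The sketch below assumes the logs foreign table defined above and reuses the connection and statement IDs from this example; the message type IDs come from the Message Type table (4 execution start, 10 completion, 20 and 21 errors, 30 to 32 execution statistics).

```sql
-- Lifecycle and statistics messages for one statement
SELECT timestamp, message_type_id, message
FROM logs
WHERE connection_id=79
  AND statement_id=67
  AND message_type_id IN (4, 10, 20, 21, 30, 31, 32);
```

Reading these rows in timestamp order shows how far the statement progressed before it failed.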
2.1.4.9 Configuration
The Configuration page describes SQream’s method for configuring your instance of SQream and includes the following topics:
• Overview
• Configuring Your Instance of SQream
• Session-Based Configuration
• Cluster-Based Configuration
• Configuration Flag Types
• Configuration Commands
• Configuration Roles
• Showing All Flags in the Catalog Table
2.1.4.9.1 Overview
Modifications that you make to your configurations are persistent based on whether they are made at the session or cluster level. Persistent configurations are modifications made to attributes that are retained after shutting down your system.
2.1.4.9.2 Configuring Your Instance of SQream
The Configuring Your Instance of SQream section describes the following:
• Configuring SQream Using the SQream Configuration File
• Configuring SQream Using a Legacy Configuration File
2.1.4.9.2.1 Configuring SQream Using the SQream Configuration File
You can configure your instance of SQream using the SQream configuration file (config.json). This file lists all the parameters required for configuring your instance of SQream, and these parameters are read-only. The parameter settings in this file are stored at the metadata level and are applied globally to all workers connected to it. The following is an example of a worker configuration file:

{
  "cluster": "/home/test_user/sqream_testing_temp/sqreamdb",
  "gpu": 0,
  "licensePath": "home/test_user/SQream/tests/license.enc",
  "machineIP": "127.0.0.1",
  "metadataServerIp": "127.0.0.1",
  "metadataServerPort": 3105,
  "port": 5000,
  "useConfigIP": true,
  "legacyConfigFilePath": "home/SQream_develop/SqrmRT/utils/json/legacy_congif.json"
}
You can access the legacy configuration file from the legacyConfigFilePath parameter shown above.
2.1.4.9.2.2 Configuring SQream Using a Legacy Configuration File
The legacy configuration file provides access to the read/write flags used in SQream’s previous configuration method. A link to this file is provided in the legacyConfigFilePath parameter in the worker configuration file. The following is an example of the legacy configuration file:
{
  "developerMode": true,
  "reextentUse": false,
  "useClientLog": true,
  "useMetadataServer": false
}
2.1.4.9.3 Session-Based Configuration
Session-based configurations are not persistent and are deleted when your session ends. This method enables you to modify all required configurations while avoiding conflicts between flag attributes modified on different devices at different points in time. The SET flag_name command is used to modify flag attributes. Any modifications you make with the SET flag_name command apply only to the open session, and are not saved when the session ends.

For more information, see the following:

• Using SQream SQL - modifying flag attributes from the CLI.
• SQream Acceleration Studio - modifying flag attributes from Studio.
2.1.4.9.4 Cluster-Based Configuration
SQream uses cluster-based configuration, enabling you to centralize configurations for all workers on the cluster. Only flags set to the regular or cluster flag type have access to cluster-based configuration. Configurations made on the cluster level are persistent.

For more information, see the following:

• Using SQream SQL - modifying flag attributes from the CLI.
• SQream Acceleration Studio - modifying flag attributes from Studio.

For more information on flag-based access to cluster-based configuration, see Configuration Flag Types below.
2.1.4.9.5 Configuration Flag Types
The flag type attribute can be set for each flag and determines its write access as follows:

• Regular: session-based read/write flags that can be stored in the metadata file
• Cluster: global cluster-based read/write flags that can be stored in the metadata file
• Worker: single worker-based read-only flags that can be stored in the worker configuration file

The flag type determines which files can be accessed and which commands or command sets users can run. The following table describes the file or command modification rights for each flag type:

Flag Type | Legacy Configuration File | ALTER SYSTEM SET | Worker Config File
----------+---------------------------+------------------+-------------------
Regular   | Can modify                | Can modify       | Cannot modify
Cluster   | Cannot modify             | Can modify       | Cannot modify
Worker    | Cannot modify             | Cannot modify    | Can modify
2.1.4.9.6 Configuration Commands
The configuration commands are associated with particular flag types based on permissions. The following table describes the commands or command sets that can be run based on their flag type:
Flag Type                | Command               | Description | Example
-------------------------+-----------------------+-------------+--------
Regular                  | SET                   |             |
Regular, Cluster, Worker | show_conf_extended UF | Used to print all information output by the show_conf UF command, in addition to description, usage, data type, default value and range. | spoolMemoryGB,15,false,generic,regular,Amount of memory (GB) the server can use for spooling,"Statement that perform ""group by"", ""order by"" or ""join"" operation(s) on large set of data will run much faster if given enough spool memory, otherwise disk spooling will be used resulting in performance hit.",uint,,0-5000
Regular, Cluster, Worker | show_md_flag UF       | Used to show a specific flag/all flags stored in the metadata file. | Example 1: master=> ALTER SYSTEM SET heartbeatTimeout=111; Example 2: master=> select show_md_flag('all'); heartbeatTimeout,111 Example 3: master=> select show_md_flag('heartbeatTimeout'); heartbeatTimeout,111
2.1.4.9.7 Configuration Roles
SQream divides flags into the following roles, each with their own set of permissions:

• Generic – Flags that can be modified by standard users on a session basis.
• Admin – Flags that can be modified by administrators on a session and cluster basis using the ALTER SYSTEM SET command.

The following table describes the Generic and Admin configuration flags:
2.1.4.9.8 Showing All Flags in the Catalog Table
SQream uses the sqream_catalog.parameters catalog table for showing all flags, providing the scope (default, cluster and session), description, default value and actual value. The following is the correct syntax for a catalog table query:
SELECT * FROM sqream_catalog.settings
The following is an example of the output of a catalog table query:

externalTableBlobEstimate, 100, 100, default,
varcharEncoding, ascii, ascii, default, Changes the expected encoding for Varchar columns
useCrcForTextJoinKeys, true, true, default,
hiveStyleImplicitStringCasts, false, false, default,
2.1.4.10 Troubleshooting
In this topic:
• Troubleshooting common issues
  – Troubleshoot cluster setup and configuration
  – Troubleshoot connectivity issues
  – Troubleshoot query performance
  – Troubleshoot query behavior
  – File an issue with SQream support
• Examining logs
• Start a temporary SQream DB for testing
Follow this checklist if you find that the performance is slower than you expect.
Table 13: Troubleshooting checklist

Step 1. A single query is slow
  If a query isn’t performing as you expect, follow the Query best practices part of the Optimization and Best Practices guide.
  If all queries are slow, continue to step 2.

Step 2. All queries on a specific table are slow
  1. If all queries on a specific table aren’t performing as you expect, follow the Table design best practices part of the Optimization and Best Practices guide.
  2. Check for active delete predicates in the table. Consult the Deleting Data guide for more information.
  If the problem spans all tables, continue to step 3.

Step 3. Check that all workers are up
  Use SELECT show_cluster_nodes(); to list the active cluster workers.
  If the worker list is incomplete, follow the cluster troubleshooting section below.
  If all workers are up, continue to step 4.

Step 4. Check that all workers are performing well
  1. Identify if a specific worker is slower than others by running the same query on different workers (e.g. by connecting directly to the worker or through a service queue).
  2. If a specific worker is slower than others, investigate performance issues on the host using standard monitoring tools (e.g. top).
  3. Restart SQream DB workers on the problematic host.
  If all workers are performing well, continue to step 5.

Step 5. Check if the workload is balanced across all workers
  1. Run the same query several times and check that it appears across multiple workers (use SELECT show_server_status() to monitor).
  2. If some workers have a heavier workload, check the service queue usage. Refer to the Workload Manager guide.
  If the workload is balanced, continue to step 6.

Step 6. Check if there are long running statements
  1. Identify any currently running statements (use SELECT show_server_status() to monitor).
  2. If there are more statements than available resources, some statements may be in an In queue mode.
  3. If there is a statement that has been running for too long and is blocking the queue, consider stopping it (use SELECT stop_statement(
2.1.4.10.1 Troubleshooting common issues
2.1.4.10.1.1 Troubleshoot cluster setup and configuration
1. Note any errors - Make a note of any error you see, or check the logs for errors you might have missed.
2. If SQream DB can’t start, start SQream DB on a new storage cluster, with default settings. If it still can’t start, there could be a driver or hardware issue. Contact SQream support.
3. Reproduce the issue with a standalone SQream DB - starting up a temporary, standalone SQream DB can isolate the issue to a configuration issue, network issue, or similar.
4. Reproduce on a minimal example - Start a standalone SQream DB on a clean storage cluster and try to replicate the issue if possible.
2.1.4.10.1.2 Troubleshoot connectivity issues
1. Verify the correct login credentials - username, password, and database name.
2. Verify the host name and port.
3. Try connecting directly to a SQream DB worker, rather than via the load balancer.
4. Verify that the driver version you’re using is supported by the SQream DB version. Driver versions often get updated together with major SQream DB releases.
5. Try connecting directly with the built in SQL client. If you can connect with the local SQL client, check network availability and firewall settings.
2.1.4.10.1.3 Troubleshoot query performance
1. Use SHOW_NODE_INFO to examine which building blocks consume time in a statement. If the query has finished, but the results are not yet materialized in the client, it could point to a problem in the application’s data buffering or a network throughput issue.
2. If a problem occurs through a 3rd party client, try reproducing it directly with the built in SQL client. If the performance is better in the local client, it could point to a problem in the application or network connection.
3. Consult the Optimization and Best Practices guide to learn how to optimize queries and table structures.
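Step 1 can be sketched with the two utility functions already used in this guide. The statement ID 30 below is a placeholder; take the real ID from the output of show_server_status(), whose exact columns are not described here.

```sql
-- List currently running statements to find the statement ID
SELECT show_server_status();

-- Inspect the execution plan of that statement (30 is a placeholder ID)
SELECT show_node_info(30);
```

In the show_node_info output, the timeSum column shows which nodes account for most of the runtime.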
2.1.4.10.1.4 Troubleshoot query behavior
1. Consult the SQL Statements and Syntax reference to verify if a statement or syntax behaves correctly. SQream DB may have some differences in behavior when compared to other databases.
2. If a problem occurs through a 3rd party client, try reproducing it directly with the built-in SQL client. If the problem still occurs, file an issue with SQream support.
2.1.4.10.1.5 File an issue with SQream support
To file an issue, follow our Gathering information for SQream support guide.
2.1.4.10.2 Examining logs
See the Collecting logs and metadata database section of the Gathering information for SQream support guide for information about collecting logs for support.
2.1.4.10.3 Start a temporary SQream DB for testing
Starting a SQream DB temporarily (not as part of a cluster, with default settings) can be helpful in identifying configuration issues. Example:
$ sqreamd /home/rhendricks/raviga_database 0 5000 /home/sqream/.sqream/license.enc
Tip: • Using nohup and & sends SQream DB to run in the background. • It is safe to stop SQream DB at any time using kill. No partial data or data corruption should occur when using this method to stop the process.
$ kill -9 $SQREAM_PID
2.1.4.11 Gathering information for SQream support
SQream Support is ready to answer any questions, and help solve any issues with SQream DB.
2.1.4.11.1 Getting support and reporting bugs
When contacting SQream Support, we recommend reporting the following information:
• What is the problem encountered?
• What was the expected outcome?
• How can SQream reproduce the issue?
When possible, please attach as many of the following:
• Error messages or result outputs
• DDL and queries that reproduce the issue
• Log files
• Screen captures if relevant
2.1.4.11.2 How SQream debugs issues
2.1.4.11.2.1 Reproduce
If we are able to easily reproduce your issue in our testing lab, this greatly improves the speed at which we can fix it.
Reproducing an issue consists of understanding:
1. What was SQream DB doing at the time?
2. How is the SQream DB cluster configured?
3. How does the schema look?
4. What is the query or statement that exposed the problem?
5. Were there any external factors? (e.g. Network disconnection, hardware failure, etc.)
See the Collecting a reproducible example of a problematic statement section ahead for information about collecting a full reproducible example.
2.1.4.11.2.2 Logs
The logs produced by SQream DB contain a lot of information that may be useful for debugging. Look for error messages in the log and the offending statements. SQream’s support staff are experienced in correlating logs to workloads, and finding possible problems. See the Collecting logs and metadata database section ahead for information about collecting a set of logs that can be analyzed by SQream support.
2.1.4.11.2.3 Fix
Once we have a fix, it can be issued as a hotfix to an existing version, or as part of a bigger major release. Your SQream account manager will keep you up-to-date about the status of the issue.
2.1.4.11.3 Collecting a reproducible example of a problematic statement
SQream DB contains an SQL utility that can help SQream support reproduce a problem with a query or statement. This utility compiles and executes a statement, and collects the relevant data in a small database which can be used to recreate and investigate the issue.
2.1.4.11.3.1 SQL Syntax
SELECT EXPORT_REPRODUCIBLE_SAMPLE(output_path, query_stmt [, ... ]) ;
output_path ::= filepath
2.1.4.11.3.2 Parameters
Parameter           Description
output_path         Path for the output archive. The output file will be a tarball.
query_stmt [, ...]  Statements to analyze.
2.1.4.11.3.3 Example
SELECT EXPORT_REPRODUCIBLE_SAMPLE('/home/rhendricks', 'SELECT * FROM t', $$SELECT "Name", "Team" FROM nba$$);
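When several statements need to be collected, the call can be assembled programmatically. The following is a minimal sketch under stated assumptions: the helper names quote_sql_text and export_sample_sql are hypothetical, the columns x and y in the usage line are made up, and the sketch assumes that a $$...$$ string may contain single quotes (as in PostgreSQL-style dollar quoting). It only builds the statement text; running it is left to your SQL client.

```python
def quote_sql_text(text: str) -> str:
    """Quote text with '...' or, if it contains a single quote, with $$...$$."""
    if "'" not in text:
        return "'" + text + "'"
    if "$$" in text:
        # Neither quoting style shown in this guide can hold such a string.
        raise ValueError("text contains both a single quote and $$")
    return "$$" + text + "$$"


def export_sample_sql(output_path: str, *statements: str) -> str:
    """Build a SELECT EXPORT_REPRODUCIBLE_SAMPLE(...) call for one or more statements."""
    if not statements:
        raise ValueError("at least one statement is required")
    args = [quote_sql_text(output_path)] + [quote_sql_text(s) for s in statements]
    return "SELECT EXPORT_REPRODUCIBLE_SAMPLE(" + ", ".join(args) + ");"


if __name__ == "__main__":
    # Hypothetical statements used only to show both quoting styles.
    print(export_sample_sql("/home/rhendricks",
                            "SELECT * FROM t",
                            "SELECT x FROM t WHERE y = 'a'"))
```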
2.1.4.11.4 Collecting logs and metadata database
SQream DB comes bundled with a data collection utility and an SQL utility intended for collecting logs and additional information that can help SQream support drill down into possible issues. See more information in the Collect logs from your cluster section of the Logging guide.
2.1.4.11.4.1 Examples
Write an archive to /home/rhendricks, containing log files:
SELECT REPORT_COLLECTION('/home/rhendricks', 'log') ;
Write an archive to /home/rhendricks, containing log files and metadata database:
SELECT REPORT_COLLECTION('/home/rhendricks', 'db_and_log') ;
Using the command line utility:
$ ./bin/report_collection /home/rhendricks/sqream_storage /home/rhendricks db_and_log
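The two SQL examples above differ only in the mode argument. A minimal sketch of a helper that builds the statement and rejects any other mode follows. The helper name report_collection_sql is hypothetical, and the sketch assumes that 'log' and 'db_and_log' (the only modes shown in this guide) are the ones you need; other modes may exist.

```python
VALID_MODES = ("log", "db_and_log")  # the two modes shown in this guide


def report_collection_sql(output_dir: str, mode: str = "log") -> str:
    """Build a SELECT REPORT_COLLECTION(...) statement for the given mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    if "'" in output_dir:
        # Keep the sketch simple: refuse paths that would need escaping.
        raise ValueError("output_dir must not contain a single quote")
    return f"SELECT REPORT_COLLECTION('{output_dir}', '{mode}');"


if __name__ == "__main__":
    print(report_collection_sql("/home/rhendricks", "db_and_log"))
```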
2.1.4.12 Creating or cloning a storage cluster
When SQream DB is installed, it comes with a default storage cluster. This guide will help if you need a fresh storage cluster or a separate copy of an existing storage cluster.
2.1.4.12.1 Creating a new storage cluster
SQream DB comes with a CLI tool, SqreamStorage. This tool can be used to create a new empty storage cluster. In this example, we will create a new cluster at /home/rhendricks/raviga_database:
$ SqreamStorage --create-cluster --cluster-root /home/rhendricks/raviga_database
Setting cluster version to: 26
This can also be written shorthand as SqreamStorage -C -r /home/rhendricks/raviga_database. The Setting cluster version... message confirms that the cluster was created successfully.
2.1.4.12.2 Tell SQream DB to use this storage cluster
2.1.4.12.2.1 Permanently setting the storage cluster setting
To permanently set the new cluster location, change the "cluster" path listed in the configuration file. For example:
{
    "compileFlags": {},
    "runtimeFlags": {},
    "runtimeGlobalFlags": {},
    "server": {
        "gpu": 0,
        "port": 5000,
        "cluster": "/home/sqream/my_old_cluster",
        "licensePath": "/home/sqream/.sqream/license.enc"
    }
}

should be changed to

{
    "compileFlags": {},
    "runtimeFlags": {},
    "runtimeGlobalFlags": {},
    "server": {
        "gpu": 0,
        "port": 5000,
        "cluster": "/home/rhendricks/raviga_database",
        "licensePath": "/home/sqream/.sqream/license.enc"
    }
}
Now, the cluster should be restarted for the changes to take effect.
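The configuration change described above can also be scripted. The following is a minimal sketch, assuming the configuration file is plain JSON with the "server" and "cluster" keys shown in the example; the function name set_cluster_path and the file name in the usage line are hypothetical. It rewrites only the "cluster" value and returns the previous path so that it can be noted for rollback.

```python
import json
from pathlib import Path


def set_cluster_path(config_file: str, new_cluster: str) -> str:
    """Set server.cluster in a JSON config file and return the previous value."""
    path = Path(config_file)
    config = json.loads(path.read_text())
    previous = config["server"]["cluster"]
    config["server"]["cluster"] = new_cluster
    # Write the whole document back; all other keys are preserved as loaded.
    path.write_text(json.dumps(config, indent=4) + "\n")
    return previous


if __name__ == "__main__":
    import sys
    # Usage (hypothetical file name): python set_cluster.py config_file.json /new/cluster/path
    if len(sys.argv) == 3:
        print("previous cluster:", set_cluster_path(sys.argv[1], sys.argv[2]))
```

Back up the configuration file before modifying it, since the sketch overwrites the file in place.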
2.1.4.12.2.2 Start a temporary SQream DB worker with a storage cluster
Starting a SQream DB worker with a custom cluster path can be done in two ways:
2.1.4.12.2.3 Using a configuration file (recommended)
Similar to the technique above, create a configuration file with the correct cluster path. Then, start sqreamd using the -config flag:
$ sqreamd -config config_file.json
2.1.4.12.2.4 Using the command line parameters
Use sqreamd’s command line parameters to override the default storage cluster path:
$ sqreamd /home/rhendricks/raviga_database 0 5000 /home/sqream/.sqream/license.enc
Note: sqreamd’s command line parameters’ order is sqreamd
2.1.4.12.3 Copying an existing storage cluster
Copying an existing storage cluster to another path may be useful for testing or troubleshooting purposes. 1. Identify the location of the active storage cluster. This path can be found in the configuration file, under the "cluster" parameter. 2. Shut down the SQream DB cluster. This prevents very large storage directories from being modified during the copy process. 3. (optional) Create a tarball of the storage cluster, with tar -zcvf sqream_cluster_`date +"%Y-%m-%d-%H-%M"`.tgz
2.1.4.13 Installing Studio on a Stand-Alone Server
This guide explains how to install SQream Studio on a stand-alone server. A stand-alone server is a server that does not run SQreamd based on binary, Docker, or Kubernetes. This guide describes how to do the following:
• Install NodeJS version 12 on the server
• Install Studio
• Start Studio manually
• Start Studio as a service
• Access Studio
• Maintain Studio using the Process Manager (PM2) service
• Upgrade to the new Studio version
• Use the configuration setup arguments
2.1.4.13.1 Installing NodeJS Version 12 on the Server
Before installing Studio, you must install NodeJS version 12 on the server. To install NodeJS version 12 on the server:
1. Check whether a version of NodeJS older than version 12 is installed:
$ node -v
The following is the output if no version of NodeJS has been installed on the target server:
bash: /usr/bin/node: No such file or directory
2. If a version of NodeJS older than 12 has been installed, remove it:
• On CentOS:
$ sudo yum remove -y nodejs
• On Ubuntu:
$ sudo apt remove -y nodejs
3. If you have not installed NodeJS version 12, run the following commands: • On CentOS:
$ curl -sL https://rpm.nodesource.com/setup_12.x | sudo bash -
$ sudo yum clean all && sudo yum makecache fast
$ sudo yum install -y nodejs
• On Ubuntu:
$ curl -sL https://deb.nodesource.com/setup_12.x | sudo -E bash -
$ sudo apt-get install -y nodejs
The following output is displayed if your installation has completed successfully:
Transaction Summary
======
Install 1 Package

Total download size: 22 M
Installed size: 67 M
Downloading packages:
warning: /var/cache/yum/x86_64/7/nodesource/packages/nodejs-12.22.1-1nodesource.x86_64.rpm: Header V4 RSA/SHA512 Signature, key ID 34fa74dd: NOKEY
Public key for nodejs-12.22.1-1nodesource.x86_64.rpm is not installed
nodejs-12.22.1-1nodesource.x86_64.rpm | 22 MB 00:00:02
Retrieving key from file:///etc/pki/rpm-gpg/NODESOURCE-GPG-SIGNING-KEY-EL
Importing GPG key 0x34FA74DD:
 Userid : "NodeSource
 Package : nodesource-release-el7-1.noarch (installed)
 From : /etc/pki/rpm-gpg/NODESOURCE-GPG-SIGNING-KEY-EL
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Warning: RPMDB altered outside of yum.
 Installing : 2:nodejs-12.22.1-1nodesource.x86_64 1/1
 Verifying : 2:nodejs-12.22.1-1nodesource.x86_64 1/1
Installed: nodejs.x86_64 2:12.22.1-1nodesource
Complete!
4. Confirm the Node version.
$ node -v
The following is an example of the correct output:
v12.22.1
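The version confirmation can be automated by parsing the output of node -v. The following is a minimal sketch, not part of the Studio installer: the function names are hypothetical, and it assumes the output has the vMAJOR.MINOR.PATCH form shown above.

```python
import re


def node_major_version(version_output: str) -> int:
    """Extract the major version from `node -v` output such as 'v12.22.1'."""
    match = re.fullmatch(r"v(\d+)\.(\d+)\.(\d+)", version_output.strip())
    if match is None:
        raise ValueError(f"unexpected `node -v` output: {version_output!r}")
    return int(match.group(1))


def is_node_12(version_output: str) -> bool:
    """Return True if the reported NodeJS major version is 12."""
    return node_major_version(version_output) == 12


if __name__ == "__main__":
    print(is_node_12("v12.22.1"))
```

Output that does not look like a version string, such as the "No such file or directory" message from the first step, raises ValueError instead of being treated as an old version.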
Back to Installing Studio on a Stand-Alone Server
2.1.4.13.2 Installing Studio
To install Studio:
1. Copy the SQream Studio package from SQream Artifactory to the target server. For access to the SQream Studio package, contact SQream Support.
2. Extract the package:
$ tar -xvf sqream-acceleration-studio-
3. Navigate to the new package folder.
$ cd sqream-admin
4. Build the configuration file to set up SQream Studio. You can use IP address 127.0.0.1 on a single server.
$ npm run setup -- -y --host=
The above command creates the sqream-admin-config.json configuration file in the sqream-admin folder and shows the following output:
Config generated successfully. Run `npm start` to start the app.
For more information about the available set-up arguments, see Set-Up Arguments.
5. If you have installed Studio on a server where SQream is already installed, move the sqream-admin-config.json file to /etc/sqream/:
$ mv sqream-admin-config.json /etc/sqream
Back to Installing Studio on a Stand-Alone Server
2.1.4.13.3 Starting Studio Manually
You can start Studio manually by running the following command:
$ cd /home/sqream/sqream-admin
$ NODE_ENV=production pm2 start ./server/build/main.js --name=sqream-studio -- start
The following output is displayed:
[PM2] Starting /home/sqream/sqream-admin/server/build/main.js in fork_mode (1 instance)
[PM2] Done.
id  name           namespace  version  mode  pid    uptime  ↺  status  cpu  mem     user    watching
0   sqream-studio  default    0.1.0    fork  11540  0s      0  online  0%   15.6mb  sqream  disabled
2.1.4.13.4 Starting Studio as a Service
SQream uses the Process Manager (PM2) to maintain Studio. To start Studio as a service:
1. Install PM2 by running the following command:
$ sudo npm install -g pm2
2. Verify that PM2 has been installed successfully.
$ pm2 list
The following is the output:
id  name           namespace  version  mode  pid    uptime  ↺  status  cpu  mem     user    watching
0   sqream-studio  default    0.1.0    fork  11540  2m      0  online  0%   31.5mb  sqream  disabled
2. Start the service with PM2: • If the sqream-admin-config.json file is located in /etc/sqream/, run the following command:
$ cd /home/sqream/sqream-admin
$ NODE_ENV=production pm2 start ./server/build/main.js --name=sqream-studio -- start --config-location=/etc/sqream/sqream-admin-config.json
• If the sqream-admin-config.json file is not located in /etc/sqream/, run the following command:
$ cd /home/sqream/sqream-admin
$ NODE_ENV=production pm2 start ./server/build/main.js --name=sqream-studio -- start
3. Verify that Studio is running.
$ netstat -nltp
4. Verify that sqream-studio is listening on port 8080, as shown below:
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address    Foreign Address  State   PID/Program name
tcp        0      0 0.0.0.0:22       0.0.0.0:*        LISTEN  -
tcp        0      0 127.0.0.1:25     0.0.0.0:*        LISTEN  -
tcp6       0      0 :::8080          :::*             LISTEN  11540/sqream-studio
tcp6       0      0 :::22            :::*             LISTEN  -
tcp6       0      0 ::1:25           :::*             LISTEN  -
5. Verify the following: 1. That you can access Studio from your browser (http://
$ pm2 startup
The following is an example of the output:
$ sudo env PATH=$PATH:/usr/bin /usr/lib/node_modules/pm2/bin/pm2 startup systemd -u sqream --hp /home/sqream
7. Copy and paste the output above and run it. 8. Save the configuration.
$ pm2 save
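The port check in the steps above (looking for a listener on port 8080 in the netstat -nltp output) can be sketched as a small parser. This is an illustration only: the function name listeners_on_port is hypothetical, and it assumes the column layout shown in the sample output (protocol, two queue counters, local address, foreign address, state, PID/program).

```python
from typing import List


def listeners_on_port(netstat_output: str, port: int) -> List[str]:
    """Return the netstat lines in LISTEN state whose local address ends with :port."""
    matches = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected fields: Proto Recv-Q Send-Q Local-Address Foreign-Address State PID/Program
        if len(fields) >= 6 and fields[0].startswith("tcp") and fields[5] == "LISTEN":
            # The port is whatever follows the last colon (works for IPv4 and IPv6).
            if fields[3].rsplit(":", 1)[-1] == str(port):
                matches.append(line.strip())
    return matches


if __name__ == "__main__":
    sample = "tcp6 0 0 :::8080 :::* LISTEN 11540/sqream-studio"
    print(listeners_on_port(sample, 8080))
```

An empty result for port 8080 suggests that Studio is not running or was started on a different port.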
Back to Installing Studio on a Stand-Alone Server
2.1.4.13.5 Accessing Studio
The Studio page is available on port 8080: http://
If port 8080 is blocked by the server firewall, you can unblock it by running the following command:
$ firewall-cmd --zone=public --add-port=8080/tcp --permanent
$ firewall-cmd --reload
Back to Installing Studio
2.1.4.13.6 Maintaining Studio with the Process Manager (PM2)
SQream uses the Process Manager (PM2) to maintain Studio. You can use PM2 to do any of the following:
• To check the PM2 service status:
$ pm2 list
• To restart the PM2 service:
$ pm2 reload sqream-studio
• To see the PM2 service logs:
$ pm2 logs sqream-studio
Back to Installing Studio on a Stand-Alone Server
2.1.4.13.7 Upgrading Studio
To upgrade Studio you need to stop the version that you currently have. To stop the current version of Studio: 1. List the process name:
$ pm2 list
The process name is displayed.
2. Run the following command with the process name:
$ pm2 stop
3. If only one process is running, run the following command:
$ pm2 stop all
4. Change the name of the current sqream-admin folder to the old version.
$ mv sqream-admin sqream-admin-
5. Extract the new Studio version.
$ tar -xf sqream-acceleration-studio-
6. Rebuild the configuration file. You can use IP address 127.0.0.1 on a single server.
$ npm run setup -- -y --host=
The above command creates the sqream-admin-config.json configuration file in the sqream-admin folder.
7. Copy the sqream-admin-config.json configuration file to /etc/sqream/ to overwrite the old configuration file.
8. Start PM2.
$ pm2 start all
Back to Installing Studio on a Stand-Alone Server
2.1.4.13.7.1 Installing Studio in a Docker Container
This guide explains how to install SQream Studio in a Docker container. This guide describes how to do the following:
• Install SQream Studio in a Docker container
• Access Studio
• Use Docker container commands
2.1.4.13.8 Installing SQream Studio in a Docker Container
If you have already installed Docker, you can install SQream Studio in a Docker container. To install SQream Studio in a Docker container:
1. Copy the downloaded image onto the target server.
2. Load the Docker image.
$ docker load -i
3. If the downloaded image is called sqream-acceleration-studio-5.1.3.x86_64.docker18.0.3.tar, run the following command:
$ docker load -i sqream-acceleration-studio-5.1.3.x86_64.docker18.0.3.tar
4. Start the Docker container.
$ docker run -d --restart=unless-stopped -p
The following is an example of the command above:
$ docker run -d --name sqream-studio -p 8080:8080 -e runtime=docker -e SQREAM_K8S_PICKER=192.168.0.183 -e SQREAM_PICKER_PORT=3108 -e SQREAM_DATABASE_NAME=master -e SQREAM_ADMIN_UI_PORT=8080 sqream-acceleration-studio:5.1.3
Back to Installing Studio in a Docker Container
2.1.4.13.9 Accessing Studio
You can access Studio from Port 8080: http://
Parameter            Default Value  Description
--web-ssl-port       8443
--web-ssl-key-path   None           The path of SSL key PEM file for enabling https. Leave empty to disable.
--web-ssl-cert-path  None           The path of SSL certificate PEM file for enabling https. Leave empty to disable.
You can configure the above parameters using the following syntax:
$ npm run setup -- -y --host=127.0.0.1 --port=3108
Back to Installing Studio in a Docker Container
2.1.4.13.10 Docker Container Commands
When installing Studio in Docker, you can run the following commands: • View Docker container logs:
$ docker logs -f sqream-admin-ui
• Restart the Docker container:
$ docker restart sqream-admin-ui
• Kill the Docker container:
$ docker rm -f sqream-admin-ui
Back to Installing Studio in a Docker Container
2.1.4.13.11 Setup Argument Configurations
When creating the sqream-admin-config.json configuration file, you can add -y to create the configuration file in non-interactive mode. Configuration files created in non-interactive mode use all the parameter defaults not provided in the command. The following table shows the available arguments:
Parameter             Default Value                      Description
--web--host           8443
--web-port            8080
--web-ssl-port        8443
--web-ssl-key-path    None                               The path of the SSL Key PEM file for enabling https. Leave empty to disable.
--web-ssl-cert-path   None                               The path of the SSL Certificate PEM file for enabling https. Leave empty to disable.
--debug-sqream        false                              (flag)
--host                127.0.0.1
--port                3108
is-cluster            true                               (flag)
--service             sqream
--ssl                 false                              Enables the SQream SSL connection. (flag)
--name                default
--data-collector-url  localhost:8100/api/dashboard/data  Enables the Dashboard. Leaving this blank disables the Dashboard. Using a mock URL uses mock data.
--cluster-type        standalone                         (standalone or k8s)
--config-location     ./sqream-admin-config.json
--network-timeout     60000                              (60 seconds)
--access-key          None                               If defined, UI access is blocked unless ?ui-access=
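When the same setup has to be repeated on several servers, the setup command can be generated from a set of values. The following is a minimal sketch: the function name setup_command is hypothetical, and it only handles arguments passed in the --name=value form used by the --host and --port examples in this guide. Arguments marked (flag) in the table are not covered.

```python
import shlex


def setup_command(**options) -> str:
    """Build an `npm run setup` command line in non-interactive mode (-y)."""
    parts = ["npm", "run", "setup", "--", "-y"]
    for name, value in options.items():
        # Python keyword web_port becomes the --web-port argument.
        flag = "--" + name.replace("_", "-")
        parts.append(f"{flag}={value}")
    # shlex.quote leaves simple tokens unchanged and quotes anything unsafe.
    return " ".join(shlex.quote(part) for part in parts)


if __name__ == "__main__":
    print(setup_command(host="127.0.0.1", port=3108))
```

Arguments that are not passed keep their defaults, which matches the behavior described for non-interactive mode.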
2.1.4.14 SQream Acceleration Studio 5.4.0
The SQream Studio is a web-based client for use with SQream DB. The Studio provides users with all functionality available from the command line in an intuitive and easy-to-use format. This includes running statements, managing roles and permissions, and managing SQream DB clusters.
This page describes the following:
• SQream Acceleration Studio 5.4.0
  – Getting Started
    ∗ Setting Up and Starting Studio
    ∗ Logging In to Studio
    ∗ Navigating Studio’s Main Features
  – Monitoring Workers and Services from the Dashboard
    ∗ Displaying System Disk Usage from the Data Storage Panel
    ∗ Subscribing to Workers from the Services Panel
    ∗ Managing Workers from the Workers Panel
    ∗ License Information
  – Executing Statements and Running Queries from the Editor
    ∗ Executing Statements from the Toolbar
    ∗ Performing Statement-Related Operations from the Database Tree
    ∗ Executing Pre-Defined Queries from the System Queries Panel
    ∗ Writing Statements and Queries from the Statement Panel
    ∗ Viewing Statement and Query Results from the Results Panel
    ∗ Analyzing Results
  – Viewing Logs
    ∗ Filtering Table Data
    ∗ Viewing Query Logs
    ∗ Viewing Session Logs
    ∗ Viewing System Logs
    ∗ Viewing All Log Lines
  – Creating, Assigning, and Managing Roles and Permissions
    ∗ Overview
    ∗ Viewing Information About a Role
    ∗ Creating a New Role
    ∗ Editing a Role
    ∗ Deleting a Role
  – Configuring Your Instance of SQream
    ∗ Editing Your Parameters
    ∗ Exporting and Importing Configuration Files
2.1.4.14.1 Getting Started
2.1.4.14.1.1 Setting Up and Starting Studio
Studio is included with all dockerized installations of SQream DB. When Studio starts, it listens on port 8080 of the local machine.
2.1.4.14.1.2 Logging In to Studio
To log in to SQream Studio: 1. Open a browser to the host on port 8080. For example, if your machine IP address is 192.168.0.100, insert the IP address into the browser as shown below:
http://192.168.0.100:8080
2. Fill in your SQream DB login credentials. These are the same credentials used for sqream sql or JDBC. When you sign in, the License Warning is displayed.
2.1.4.14.1.3 Navigating Studio’s Main Features
When you log in, you are automatically taken to the Editor screen. The Studio’s main functions are displayed in the Navigation pane on the left side of the screen. From here you can navigate between the main areas of the Studio:
Element        Description
Dashboard      Lets you monitor cluster storage and system health and manage queues and workers.
Editor         Lets you select databases, perform statement operations, and write and execute queries.
Logs           Lets you view usage logs.
Roles          Lets you create users and manage user permissions.
Configuration  Lets you configure your instance of SQream.
Clicking the user icon displays the following:
• User information
• Connection type
• SQream version
• SQream Studio version
• Data size limitations
• Log out
2.1.4.14.2 Monitoring Workers and Services from the Dashboard
The Dashboard is used for the following:
• Monitoring cluster storage and system health.
• Viewing, monitoring, and adding defined service queues.
• Viewing and managing worker status and adding workers.
You can only access the Dashboard if you signed in with a SUPERUSER role. The following is a brief description of the Dashboard panels:
No.  Element              Description
1    Data Storage panel   Used to monitor cluster storage.
2    Services panel       Used for viewing and monitoring the defined service queues.
3    Workers panel        Monitors system health and shows each sqreamd worker running in the cluster.
4    License information  Shows the number of days remaining on your license.
2.1.4.14.2.1 Displaying System Disk Usage from the Data Storage Panel
The Data Storage area displays your system’s total disk usage (percentage) and data storage (donut graph). Your data storage is shown as the following four components:
• Data – Storage occupied by databases in SQream DB.
• Free – Free storage space.
• Deleted – Storage that is temporarily occupied, but has not been reclaimed. For more information, see Deleting Data. The deleted value is estimated and may not be accurate.
• Other – Storage used by other applications. On a dedicated SQream DB cluster this should be near zero.
You can show more information by clicking the expand arrow on the Data Storage panel.
The expanded Data Storage panel can be used to drill down into each database’s storage footprint, and displays a breakdown of how much storage each database in the cluster uses. The database information is displayed in a table and shows the following:
• Database name – the name of the database.
• Raw Size – the estimated size of (uncompressed) raw data in the database.
• Storage Size – the physical size of the compressed data.
• Ratio – the effective compression ratio.
• Deleted Data – Storage that is temporarily occupied, but has not been reclaimed.
Below the table, an interactive line graph displays the database storage trends. By default, the line graph displays the total storage for all databases. You can show a particular database’s graph by clicking it in the table. You can also change the line graph’s timespan by selecting one from the timespan dropdown menu.
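As a worked illustration of how the Raw Size, Storage Size, and Ratio columns relate, the sketch below assumes that the ratio is the raw size divided by the storage size. This guide does not state the exact formula Studio uses, so treat the calculation and the function name compression_ratio as assumptions.

```python
def compression_ratio(raw_size_bytes: int, storage_size_bytes: int) -> float:
    """Return raw size divided by storage size (assumed definition of Ratio)."""
    if storage_size_bytes <= 0:
        raise ValueError("storage size must be positive")
    return raw_size_bytes / storage_size_bytes


if __name__ == "__main__":
    # Made-up sizes: 100 GB of raw data stored in 25 GB gives a ratio of 4.0.
    print(compression_ratio(100 * 1024**3, 25 * 1024**3))
```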
Back to Monitoring Workers and Services from the Dashboard
2.1.4.14.2.2 Subscribing to Workers from the Services Panel
Services are used to categorize and associate (also known as subscribing) workers to particular services. The Service panel is used for viewing, monitoring, and adding defined service queues.
The Dashboard includes the four panes shown below:
The following is a brief description of each pane:
No.  Description
1    Adds a worker to the selected service.
2    Shows the service name.
3    Shows a trend graph of queued statements loaded over time.
4    Adds a service.
5    Shows the currently processed queries belonging to the service/total queries for that service in the system (including queued queries).
2.1.4.14.2.3 Adding A Service
You can add a service by clicking + Add and defining the service name.
Note: If you do not associate a worker with the new service, it will not be created.
You can manage workers from the Workers panel. For more information on managing workers, see Workers. Back to Monitoring Workers and Services from the Dashboard
2.1.4.14.2.4 Managing Workers from the Workers Panel
From the Workers panel you can do the following: • View workers • Add a worker to a service • View a worker’s active query information • View a worker’s execution plan
2.1.4.14.2.5 Viewing Workers
The Worker panel shows each worker (sqreamd) running in the cluster. Each worker has a status bar that represents the status over time. The status bar is divided into 20 equal segments, showing the most dominant activity in that segment. From the Scale dropdown menu you can set the time scale of the displayed information. You can hover over segments in the status bar to see the date and time corresponding to each activity type:
• Idle – the worker is idle and available for statements.
• Compiling – the worker is compiling a statement and is preparing for execution.
• Executing – the worker is executing a statement after compilation.
• Stopped – the worker was stopped (either deliberately or due to an error).
• Waiting – the worker was waiting on an object locked by another worker.
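To make the idea of a "most dominant activity" per segment concrete, the sketch below splits a series of activity samples into 20 equal segments and picks the most frequent activity in each. This is an illustration of the concept only, not Studio's implementation: the function name, the sampling model, and the tie-breaking behavior are all assumptions.

```python
from collections import Counter
from typing import List, Optional

SEGMENTS = 20  # the status bar is divided into 20 equal segments


def dominant_activities(samples: List[str], segments: int = SEGMENTS) -> List[Optional[str]]:
    """Split samples (oldest first) into equal segments and return the most
    common activity in each segment, or None for a segment with no samples."""
    result: List[Optional[str]] = []
    total = len(samples)
    for index in range(segments):
        # Integer arithmetic keeps segment boundaries evenly spread.
        start = index * total // segments
        end = (index + 1) * total // segments
        chunk = samples[start:end]
        result.append(Counter(chunk).most_common(1)[0][0] if chunk else None)
    return result


if __name__ == "__main__":
    # Made-up samples: 30 idle readings followed by 10 executing readings.
    print(dominant_activities(["Idle"] * 30 + ["Executing"] * 10))
```

Changing the time scale corresponds to changing which samples are passed in; the number of segments stays the same.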
2.1.4.14.2.6 Adding A Worker to A Service
You can add a worker to a service by clicking the add button.
Clicking the add button shows the selected service’s workers. You can add the selected worker to the service by clicking Add Worker. Adding a worker to a service does not break associations already made between that worker and other services.
2.1.4.14.2.7 Viewing A Worker’s Active Query Information
You can view a worker’s active query information by clicking Queries, which displays them in the selected service. Each statement shows the query ID, status, service queue, elapsed time, execution time, and estimated completion status. In addition, each statement can be stopped or expanded to show its execution plan and progress. For more information on viewing a statement’s execution plan and progress, see Viewing a Worker’s Execution Plan below.
2.1.4.14.2.8 Viewing A Worker’s Host Utilization
While viewing a worker’s query information, clicking the down arrow expands to show the host resource utilization.
The graphs show the resource utilization trends over time, with the CPU memory and utilization values and the GPU utilization values on the right. You can hover over the graph to see more information about the activity at any point on the graph. Error notifications related to statements are displayed as shown in the figure below, and you can hover over them for more information about the error.
2.1.4.14.2.9 Viewing a Worker’s Execution Plan
Clicking the ellipsis in a service shows the following additional options:
• Stop Query - stops the query.
• Show Execution Plan - shows the execution plan as a table. The columns in the Show Execution Plan table can be sorted.
For more information on the current query plan, see SHOW_NODE_INFO. For more information on checking active sessions across the cluster, see SHOW_SERVER_STATUS.
Table 14: Statement status values
Status        Description
Preparing     Statement is being prepared
In queue      Statement is waiting for execution
Initializing  Statement has entered execution checks
Executing     Statement is executing
Stopping      Statement is in the process of stopping
2.1.4.14.2.10 Managing Worker Status
In some cases you may want to stop or restart workers for maintenance purposes. Each Worker line has a menu used for stopping, starting, or restarting workers.
Starting or restarting workers terminates all queries related to that worker. When you stop a worker, its background turns gray. Back to Monitoring Workers and Services from the Dashboard
2.1.4.14.2.11 License Information
The license information is a counter showing the amount of time in days remaining on the license. Back to Monitoring Workers and Services from the Dashboard Back to Navigating Studio’s Main Features
2.1.4.14.3 Executing Statements and Running Queries from the Editor
The Editor is used for the following: • Selecting an active database and executing queries. • Performing statement-related operations and showing metadata. • Executing pre-defined queries. • Writing queries and statements and viewing query results.
The following is a brief description of the Editor panels:
No.  Element                                  Description
1    Toolbar                                  Used to select the active database you want to work on, limit the number of rows, save query, etc.
2    Database Tree and System Queries panel   Shows a hierarchy tree of databases, views, tables, and columns
3    Statement panel                          Used for writing queries and statements
4    Results panel                            Shows query results and execution information.
2.1.4.14.3.1 Executing Statements from the Toolbar
The following figure shows the Toolbar pane:
You can access the following from the Toolbar pane:
• Database dropdown list - select a database that you want to run statements on.
• Service dropdown list - select a service that you want to run statements on. The options in the service dropdown menu depend on the database you select from the Database dropdown list.
• Execute - lets you set which statements to execute. The Execute button toggles between Execute and Stop, and can be used to stop an active statement before it completes:
  – Statements - executes the statement at the location of the cursor.
  – Selected - executes only the highlighted text. This mode should be used when executing subqueries or sections of large queries (as long as they are valid SQL).
  – All - executes all statements in a selected tab.
  For more information on stopping active statements, see the STOP_STATEMENT command.
• Format SQL - Lets you reformat and reindent statements.
• Download query - Lets you download query text to your computer.
• Open query - Lets you upload query text from your computer.
• Max Rows - By default, the Editor fetches only the first 10,000 rows. You can modify this number by selecting an option from the Max Rows dropdown list. Note that setting a higher number may slow down your browser if the result is very large. This number is limited to 100,000 results. To see a higher number, you can save the results in a file or a table using the CREATE TABLE AS command.
Back to Executing Statements and Running Queries from the Editor
2.1.4.14.3.2 Performing Statement-Related Operations from the Database Tree
From the Database Tree you can perform statement-related operations and show metadata (such as the number of rows in the table). The following figure shows the Database Tree and System Queries panel, with the Database Tree tab selected.
The following figure shows the database object functions in the calcs table object.
The database object functions are used to perform the following:
• The SELECT statement - copies the selected table’s columns into the Statement panel as SELECT parameters.
• The copy feature - copies the selected table’s name into the Statement panel.
• The additional operations - displays the following additional options:

Function | Description
Insert statement | Generates an INSERT statement for the selected table in the editing area.
Delete statement | Generates a DELETE statement for the selected table in the editing area.
Create Table As statement | Generates a CREATE TABLE AS statement for the selected table in the editing area.
Rename statement | Generates a RENAME TABLE AS statement for renaming the selected table in the editing area.
Adding column statement | Generates an ADD COLUMN statement for adding columns to the selected table in the editing area.
Truncate table statement | Generates a TRUNCATE_IF_EXISTS statement for the selected table in the editing area.
Drop table statement | Generates a DROP statement for the selected object in the editing area.
Table DDL | Generates a DDL statement for the selected object in the editing area. To get the entire database DDL, click the icon next to the database name in the tree root. See also Seeing System Objects as DDL.
DDL Optimizer | Lets you analyze database tables and recommends possible optimizations.
2.1.4.14.3.3 Optimizing Database Tables Using the DDL Optimizer
The DDL Optimizer tab analyzes database tables and recommends possible optimizations according to SQream’s best practices. As described in the previous table, you can access the DDL Optimizer by clicking the additional options icon and selecting DDL Optimizer. The following table describes the DDL Optimizer screen:
Element | Description
Column area | Shows the column names and column types from the selected table. You can scroll down or to the right/left for long column lists.
Optimization area | Shows the number of rows to sample as the basis for running an optimization (default: 1,000,000) and the percent buffer to add to VARCHAR lengths (default: 10%), which is the overhead threshold used when analyzing VARCHAR fields. Also attempts to determine field nullability.
Run Optimizer | Starts the optimization process.
Clicking Run Optimizer adds a tab to the Statement panel showing the optimized results of the selected object. The figure below shows the calcs Optimized tab for the optimized calcs table. For more information, see Optimization and Best Practices.
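The two defaults described above (a 1,000,000-row sample and a 10% buffer on VARCHAR lengths) can be illustrated with a short Python sketch. This is not the optimizer's actual implementation: the function names, the rounding rule (round up), and the use of character counts rather than byte counts are all assumptions made for illustration.

```python
def recommended_varchar_length(sampled_values, buffer_percent=10):
    """Suggest a VARCHAR length: longest sampled value plus a percent buffer.

    Assumption: the buffer is applied to the longest non-NULL value and the
    result is rounded up. Integer arithmetic avoids float rounding surprises.
    """
    longest = max((len(v) for v in sampled_values if v is not None), default=0)
    padded = (longest * (100 + buffer_percent) + 99) // 100  # ceiling division
    return max(1, padded)


def is_nullable(sampled_values):
    """A column can only be declared NOT NULL if the sample contains no NULLs."""
    return any(v is None for v in sampled_values)


# Hypothetical sample of a VARCHAR column (None stands for NULL).
sample = ["abc", "abcdefghij", None]
print(recommended_varchar_length(sample), is_nullable(sample))
```

Because the recommendation is based on a sample, a value outside the sample may still exceed the suggested length, which is the reason for adding a buffer at all.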
2.1.4.14.3.4 Executing Pre-Defined Queries from the System Queries Panel
The System Queries panel lets you execute pre-defined queries and includes the following system query types:
• Catalog queries - used for analyzing table compression rates, users and permissions, etc.
• Admin queries - queries that are useful for SQream database management.
Clicking an item pastes the query into the Statement pane, and you can undo a previous operation by pressing Ctrl + Z.
2.1.4.14.3.5 Writing Statements and Queries from the Statement Panel
The multi-tabbed statement area is used for writing queries and statements, and is used in tandem with the toolbar. When writing and executing statements, you must first select a database from the Database dropdown menu in the toolbar. When you execute a statement, it passes through a series of statuses until completing. Knowing the status helps you with statement maintenance, and the statuses are shown in the Results panel. The following table shows the statement statuses:
Status | Description
Pending | The statement is pending.
In queue | The statement is waiting for execution.
Initializing | The statement has entered execution checks.
Executing | The statement is executing.
Statement stopped | The statement has been stopped.
You can add and name new tabs for each statement that you need to execute, and Studio preserves your created tabs when you switch between databases. You can add a new tab by clicking the new tab icon, which creates a new tab to the right with a default name of SQL and an increasing number. This helps you keep track of your statements.
You can also rename a tab by double-clicking its default name and typing a new name, and you can write multiple statements in the same tab by separating them with semicolons (;). If more tabs are open than fit into the Statement panel, the tab arrows are displayed. You can scroll through the tabs by clicking the left or right arrow, and close a tab by clicking its close icon. You can also close all tabs at once by clicking Close all, located to the right of the tabs.
Tip: If this is your first time using SQream, see First steps with SQream DB.
2.1.4.14.3.6 Viewing Statement and Query Results from the Results Panel
The Results panel shows statement and query results. By default, only the first 10,000 results are returned, although you can modify this number as described in Executing Statements from the Toolbar above.
By default, executing several statements together opens a separate results tab for each statement. Executing statements together executes them serially, and any failed statement cancels all subsequent executions. The following is a brief description of the elements on the Results panel views:
Element | Description
Results view | Lets you view search query results.
Execution Details view | Lets you view execution details, such as statement ID, number of rows, and average number of rows in chunk.
SQL view | Lets you see the SQL view.
Save results to clipboard | Lets you save your search results to the clipboard to paste into another text editor.
Save results to local file | Lets you save your search query results to a local file.
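The serial behavior described above (statements run one after another, and a failed statement cancels everything after it) can be pictured with the following Python sketch. It is a model of the described behavior only, not Studio code: `run_serially` and `fake_execute` are hypothetical names, and the status strings are chosen for illustration.

```python
def run_serially(statements, execute):
    """Run statements in order; the first failure cancels all later ones."""
    outcomes = []
    for position, statement in enumerate(statements):
        try:
            execute(statement)
        except Exception as error:
            outcomes.append((statement, "failed: " + str(error)))
            # Everything after the failed statement is never executed.
            for skipped in statements[position + 1:]:
                outcomes.append((skipped, "cancelled"))
            break
        else:
            outcomes.append((statement, "completed"))
    return outcomes


def fake_execute(statement):
    # Stand-in for a real database call: fails on a deliberately bad statement.
    if "bad_table" in statement:
        raise RuntimeError("table not found")


batch = ["select 1", "select * from bad_table", "select 3"]
for statement, outcome in run_serially(batch, fake_execute):
    print(statement, "->", outcome)
```

In practice this means that the order of statements in a tab matters: place statements that later ones depend on first, and expect one results tab per statement that actually ran.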
2.1.4.14.3.7 Searching Query Results in the Results View
The Results view lets you view search query results. From this view you can also do the following:
• View the amount of time (in seconds) taken for a query to finish executing.
• Switch and scroll between tabs.
• Close all tabs at once.
• Enable keeping tabs by selecting Keep tabs.
• Sort column results.
In the Results view you can also run parallel statements, as described in Running Parallel Statements below.
2.1.4.14.3.8 Running Parallel Statements
While Studio’s default functionality is to open a new tab for each executed statement, Studio supports running parallel statements in one statement tab. Running parallel statements requires using macros and is useful for advanced users. The following shows the syntax for running parallel statements:
    $ @@ parallel
    $ $$
    $ select 1;
    $ select 2;
    $ select 3;
    $ $$
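The macro layout above can also be generated programmatically, for example when assembling a batch of independent statements to paste into one tab. The following Python sketch is illustrative only: `build_parallel_block` is a hypothetical helper, not part of Studio, and it assumes that the leading `$` on each line of the listing above is a console prompt from the documentation rather than part of the macro itself.

```python
def build_parallel_block(statements):
    """Wrap SQL statements in the parallel macro layout shown above."""
    cleaned = []
    for statement in statements:
        text = statement.strip().rstrip(";").strip()
        if text:
            cleaned.append(text + ";")  # exactly one trailing semicolon each
    if not cleaned:
        raise ValueError("at least one statement is required")
    return "\n".join(["@@ parallel", "$$", *cleaned, "$$"])


print(build_parallel_block(["select 1", "select 2;", "select 3"]))
```

The helper normalizes trailing semicolons so that each statement ends with exactly one, matching the listing.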
The following figure shows the parallel statement syntax in the Editor:
2.1.4.14.3.9 Execution Details View
The Execution Details view lets you view a query’s execution plan for monitoring purposes. Most importantly, the Execution Details view highlights rows based on how long they ran relative to the entire query. This can be seen in the timeSum column as follows:
• Rows highlighted red - longest runtime
• Rows highlighted orange - medium runtime
• Rows highlighted yellow - shortest runtime
2.1.4.14.3.10 Viewing Wrapped Strings in the SQL View
The SQL View panel makes certain queries easier to read, such as a long string that appears on one line. The SQL View wraps the string so that you can see all of it at once. It also reformats and organizes query syntax entered in the Statement panel, making it easier to locate particular segments of your queries. The SQL View formats queries in the same way as the Format SQL feature in the Toolbar, but lets you retain your originally constructed query while viewing a more intuitively structured snapshot of it. The following figure shows the SQL view:
2.1.4.14.3.11 Saving Results to the Clipboard
The Save results to clipboard function lets you save your results to the clipboard to paste into another text editor or into Excel for further analysis.
2.1.4.14.3.12 Saving Results to a Local File
The Save results to local file function lets you save your search query results to a local file. Clicking Save results to local file downloads the contents of the Results panel to an Excel sheet. You can then copy and paste this content into other editors as needed.
2.1.4.14.3.13 Analyzing Results
When results are produced, a Generate CREATE statement button is displayed. Clicking this button creates a new tab with an optimized CREATE TABLE statement, and an INSERT statement to copy the data to the new table.
2.1.4.14.4 Viewing Logs
The Logs screen is used for viewing logs and includes the following elements:
Element | Description
Filter area | Lets you filter the data shown in the table.
Query tab | Shows basic query information logs, such as query number and the time the query was run.
Session tab | Shows basic session information logs, such as session ID and user name.
System tab | Shows all system logs.
Log lines tab | Shows the total number of log lines.
2.1.4.14.4.1 Filtering Table Data
In the FILTERS area of the Logs screen, you can apply the TIMESPAN and ONLY ERRORS filters, as well as additional filters (Add). The Timespan filter lets you select a timespan. The Only Errors toggle button lets you show all queries, or only queries that generated errors. The Add button lets you add additional filters to the data shown in the table. The Filter button applies the selected filter(s). Some filters require you to type text to define the filter.
Other filters require you to select an item from a dropdown menu:
• INFO
• WARNING
• ERROR
• FATAL
• SYSTEM
You can also export a record of all of your currently filtered logs in Excel format by clicking Download, located above the Filter area.
2.1.4.14.4.2 Viewing Query Logs
The QUERIES log area shows basic query information, such as query number and the time the query was run. The number next to the title indicates the number of queries that have been run. From the Queries area you can see and sort by the following:
• Query ID
• Start time
• Query
• Compilation duration
• Execution duration
• Total duration
• Details (execution details, error details, successful query details)
In the Queries table, you can click the Statement ID and Query items to set them as your filters. In the Details column, you can click one of the Details options for a more detailed explanation of the query.
2.1.4.14.4.3 Viewing Session Logs
The SESSIONS tab shows the sessions log table and is used for viewing activity that has occurred during your sessions. The number at the top indicates the number of sessions that have occurred. From here you can see and sort by the following:
• Timestamp
• Connection ID
• Username
• Client IP
• Login (Success or Failed)
• Duration (of session)
• Configuration Changes
In the Sessions table, you can click the Timestamp, Connection ID, and Username items to set them as your filters.
2.1.4.14.4.4 Viewing System Logs
The SYSTEM tab shows the system log table and is used for viewing all system logs. The number at the top indicates the number of system logs that have been recorded. Because system logs occur less frequently than queries and sessions, you may need to increase the filter timespan for the table to display any system logs. From here you can see and sort by the following:
• Timestamp
• Log type
• Message
In the System table, you can click the Timestamp and Log type items to set them as your filters. In the Message column, you can also click an item to show more information about the message.
2.1.4.14.4.5 Viewing All Log Lines
The LOG LINES tab shows all log lines in a table. From here you can view a more granular breakdown of the log information collected by Studio. The other tabs (QUERIES, SESSIONS, and SYSTEM) show a filtered form of the raw log lines. For example, the QUERIES tab shows an aggregation of several log lines. From here you can see and sort by the following:
• Timestamp
• Message level
• Worker hostname
• Worker port
• Connection ID
• Database name
• User name
• Statement ID
In the LOG LINES table, you can click any of the items to set them as your filters.
2.1.4.14.5 Creating, Assigning, and Managing Roles and Permissions
2.1.4.14.5.1 Overview
In the Roles area you can create and assign roles and manage user permissions. The Type column displays one of the following assigned role types:
Role Type | Description
Groups | Roles with no users.
Enabled users | Users with log-in permissions and a password.
Disabled users | Users with log-in permissions and with a disabled password. An admin may disable a user’s password permissions to temporarily disable access to the system.
Note: If you disable a password, when you enable it you have to create a new one.
2.1.4.14.5.2 Viewing Information About a Role
Clicking a role in the roles table displays the following information:
• Parent Roles - displays the parent roles of the selected role. Roles inherit all roles assigned to the parent.
• Members - displays all members that the role has been assigned to. The arrow indicates the roles that the role has inherited. Hovering over a member displays the roles that the role is inherited from.
• Permissions - displays the role’s permissions. The arrow indicates the permissions that the role has inherited. Hovering over a permission displays the roles that the permission is inherited from.
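The inheritance described above, where a role receives everything granted to its parent roles, can be modeled with a short Python sketch. The role names, permission strings, and the `effective_permissions` function are hypothetical and exist only to illustrate the idea; they do not reflect how SQream stores roles internally.

```python
def effective_permissions(role, parents, direct):
    """Collect a role's own permissions plus those inherited from all ancestors.

    parents: maps a role name to the list of its parent roles.
    direct:  maps a role name to the set of permissions granted directly.
    """
    collected = set()
    visited = set()
    pending = [role]
    while pending:
        current = pending.pop()
        if current in visited:
            continue  # guards against cycles and roles reachable twice
        visited.add(current)
        collected |= direct.get(current, set())
        pending.extend(parents.get(current, []))
    return collected


# Hypothetical hierarchy: alice -> analysts -> readers
parents = {"alice": ["analysts"], "analysts": ["readers"]}
direct = {
    "alice": {"LOGIN"},
    "analysts": {"SELECT on sales"},
    "readers": {"CONNECT on master"},
}
print(sorted(effective_permissions("alice", parents, direct)))
```

In this model, the permissions that Studio marks with an arrow correspond to the entries that come from `analysts` and `readers` rather than from `alice` directly.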
2.1.4.14.5.3 Creating a New Role
You can create a new role by clicking New Role.
An admin creates a user by granting login permissions and a password to a role. Each role is defined by a set of permissions. An admin can also group several roles together to form a group to manage them simultaneously. For example, permissions can be granted or revoked at the group level. Clicking New Role lets you do the following:
• Add and assign a role name (required).
• Enable or disable log-in permissions for the role.
• Set a password.
• Assign or delete parent roles.
• Add or delete permissions.
• Grant superuser permissions to the selected user.
From the New Role panel you can view directly and indirectly (inherited) granted permissions. Disabled permissions have no connect permissions for the referenced database and are displayed in gray text. You can add or remove permissions from the Add permissions field. From the New Role panel you can also search and scroll through the permissions. In the Search field you can use the and operator to search for strings that fulfill multiple criteria. When adding a new role, you must select the Enable login for this role and Has password check boxes.
2.1.4.14.5.4 Editing a Role
Once you’ve created a role, clicking the Edit Role button lets you do the following:
• Edit the role name.
• Enable or disable log-in permissions.
• Set a password.
• Assign or delete parent roles.
• Assign role administrator permissions.
• Add or delete permissions.
• Grant superuser permissions to the selected user.
From the Edit Role panel you can view directly and indirectly (inherited) granted permissions. Disabled permissions have no connect permissions for the referenced database and are displayed in gray text. You can add or remove permissions from the Add permissions field. From the Edit Role panel you can also search and scroll through the permissions. In the Search field you can use the and operator to search for strings that fulfill multiple criteria.
2.1.4.14.5.5 Deleting a Role
Clicking the delete icon displays a confirmation message with the number of users and groups that will be impacted by deleting the role.
2.1.4.14.6 Configuring Your Instance of SQream
The Configuration section lets you edit parameters from one centralized location. While you can edit these parameters from the worker configuration file (config.json) or from your CLI, you can also modify them in Studio in an easy-to-use format.
2.1.4.14.6.1 Editing Your Parameters
When configuring your instance of SQream in Studio, you can edit the Generic and Admin parameters only. Studio includes two types of parameters: toggle switches, such as flipJoinOrder, and text fields, such as logSysLevel. After editing a parameter, you can reset it to its previous value or to its default value, or revert all parameters to their default settings simultaneously. Note that you must click Save to save your configuration. You can hover over the information icon located on each parameter to read a short description of its behavior.
2.1.4.14.6.2 Exporting and Importing Configuration Files
You can also export your configuration settings to a .json file and import settings from one. This allows you to easily edit your parameters and to share the file with other users if required. For more information about configuring your instance of SQream, see Configuration.
2.1.4.15 Using the statement editor (deprecated)
The statement editor is a web-based client for use with SQream DB. It can be used to run statements, and manage roles and permissions.
Note: The statement editor is deprecated as of SQream DB v2020.2. It is replaced by SQream Studio.
In this topic:
• Setting up and starting the statement editor
• Logging in
• Familiarizing yourself with the editor
  – Toolbar
  – Statement area
    ∗ Keyboard shortcuts
  – Results
    ∗ Sorting results
    ∗ Viewing execution information
    ∗ Saving results to a file or clipboard
  – Database tree
    ∗ Database
    ∗ Schema
    ∗ Table
    ∗ Create a table
    ∗ Catalog views
    ∗ Predefined queries
• Notifications
• DDL optimizer
  – Using the DDL optimizer
  – Analyzing the results
2.1.4.15.1 Setting up and starting the statement editor
Note: We recommend using Google Chrome or Chrome-based browsers to access the statement editor.
The statement editor is included with all dockerized installations of SQream DB. You can start the editor using sqream-console:
    $ ./sqream-console
    sqream-console> sqream editor --start
    access sqream statement editor through Chrome http://192.168.0.100:3000
Once started, the editor listens on the local machine, on port 3000.
2.1.4.15.2 Logging in
Open a browser to the host, on port 3000. If the machine is 192.168.0.100, navigate to http://192.168.0.100:3000. Fill in your login details for SQream DB. These are the same credentials you use with sqream sql or JDBC.
Tip:
• If using a load balancer, select the Server picker box and make sure to use the correct port (usually 3108). If connecting directly to a worker, make sure to untick the box and use the correct port, usually 5000 and up.
• If this is your first time using SQream DB, use database name master.
• When using SQream DB on AWS, the initial password is the EC2 instance ID.
2.1.4.15.3 Familiarizing yourself with the editor
The editor is made up of four main panes:
• Toolbar - used to select the active database you want to work on, limit the number of rows, save query results, control tab behavior, and more.
• Statement area - a multi-tab text editor where you write SQL statements. Each tab can connect to a different database.
• Results - results from a query populate here.
• Database tree - contains a hierarchy tree of databases, views, tables, and columns. Can be used to navigate and perform some table operations.
See more about each pane below:
2.1.4.15.3.1 Toolbar
In the toolbar, you can perform the following operations (from left to right):
• Toggle Database Tree - Click ••• to show or hide the Database Tree pane.
• Database dropdown - Select the database you want the statements to run on.
• RUN / STOP - Use the RUN button to execute the statement in the Editor pane. When a statement is running, the button changes to STOP, and can be used to stop the active statement.
• SQL - Reformats and reindents the statement.
• Max. Rows - By default, the editor fetches only the first 1000 rows. Click the number to edit. Click outside the number area to save. Setting a higher limit can slow down your browser if the result set is very large.
• (Save) - Save the query text to a file.
• (Load) - Load query text from a file.
• (more):
  – Append new results - When checked, every statement executed opens a new Results tab. If unchecked, the Results tab is reused and overwritten with every new statement.
2.1.4.15.3.2 Statement area
The multi-tabbed statement area is where you write queries and statements. Select the database you wish to use in the toolbar, and then write and execute statements.
• A new tab can be opened for each statement. Tabs can be used to separate statements to different databases.
• Multiple statements can be written in the same tab, separated by semicolons.
• When multiple statements exist in the tab, clicking Run executes all statements in the tab, or only the selected statements.
Tip: If this is your first time with SQream DB, see our first steps guide.
2.1.4.15.3.3 Keyboard shortcuts
Ctrl + Enter - Execute all queries in the statement area, or just the highlighted part of the query.
Ctrl + Space - Auto-complete the current keyword.
Ctrl + ↑ - Switch to the next tab.
Ctrl + ↓ - Switch to the previous tab.
2.1.4.15.3.4 Results
The results pane shows query results and execution information. By default, only the first 1000 results are returned (modify via the Toolbar). A context menu, accessible via a right-click on the results tab, lets you:
• Rename the tab
• Show the SQL query text
• Reload results
• Close the current tab
• Close all result tabs
2.1.4.15.3.5 Sorting results
After the results have appeared, sort them by clicking the column name.
2.1.4.15.3.6 Viewing execution information
During query execution the time elapsed is tracked in seconds.
The SHOW STATISTICS button opens the query’s execution plan, for monitoring purposes.
2.1.4.15.3.7 Saving results to a file or clipboard
Query results can be saved to the clipboard (for pasting into another text editor) or to a local file.
2.1.4.15.3.8 Database tree
The database tree shows the database objects (e.g. tables, columns, views), as well as some metadata like row counts. It also contains a few predefined catalog queries for execution.
Each level contains a context menu relevant to that object, accessible via a right-click.
2.1.4.15.3.9 Database
• Copy the database DDL to the clipboard
2.1.4.15.3.10 Schema
• Drop the schema (copies statement to the clipboard)
2.1.4.15.3.11 Table
• Show row count in the database tree
• Copy the create table script to the clipboard
• Copy SELECT to clipboard
• Copy INSERT to clipboard
• Copy DELETE to clipboard
• Rename table - Copy RENAME TABLE to clipboard
• Create table LIKE - Copy CREATE TABLE AS to clipboard
• Add column - Copy ADD COLUMN to clipboard
• Truncate table - Copy TRUNCATE to clipboard
• Drop table - Copy DROP TABLE to clipboard
• Create a table - Add a new table by running a statement, or alternatively use the Add new link near the TABLES group.
2.1.4.15.3.12 Create a table
You can also create a new table using the wizard, which guides you through the process. Refer to the CREATE TABLE reference for information about creating a table (e.g. table parameters like default values, identity, etc.).
Fill in the table name, and add a new row for each table column. If a table by the same name exists, check Create or Replace table to overwrite it. Click EXEC to create the table.
2.1.4.15.3.13 Catalog views
To see catalog views, click the catalog view name in the tree. The editor will run a query on that view.
2.1.4.15.3.14 Predefined queries
The editor comes with several predefined catalog queries that are useful for analysis of table compression rates, users and permissions, etc.
2.1.4.15.4 Notifications
Desktop notifications let you receive a notification when a statement is completed. You can minimize the browser or switch to other tabs, and still receive a notification when the query is done.
Enable desktop notifications by selecting Allow Desktop Notification from the menu options.
2.1.4.15.5 DDL optimizer
The DDL optimizer tab analyzes database tables and recommends possible optimizations, per the Optimization and Best Practices guide.
2.1.4.15.5.1 Using the DDL optimizer
Navigate to the DDL optimizer module by selecting it from the (“More”) menu.
• Database and Table - select the database and the table to optimize.
• Rows - the number of rows to scan for analysis. Defaults to 1,000,000.
• Buffer Size - overhead threshold to use when analyzing VARCHAR fields. Defaults to 10%.
• Optimize NULLs - attempt to figure out field nullability.
Click EXECUTE to start the optimization process.
2.1.4.15.5.2 Analyzing the results
The analysis process shows results for each row.
The results are displayed in two tabs:
• OPTIMIZED COLUMNS - review the system recommendation to:
  1. decrease the length of VARCHAR fields
  2. remove the NULL option
• OPTIMIZED DDL - the recommended CREATE TABLE statement
Analyzing the DDL culminates in four possible actions:
• COPY DDL TO CLIPBOARD - copies the optimized CREATE TABLE statement to the clipboard.
• CREATE A NEW TABLE - creates the new table structure with _new appended to the table name. No data is populated.
• CREATE AND INSERT INTO EXISTING DATA - creates a new table in the same database and schema as the original table and populates it with the data.
• Back - goes back to the statement editor and abandons any recommendations.
2.1.4.16 Hardware Guide
This guide describes the SQream DB reference architecture, emphasizing the benefits to the technical audience, and provides guidance for end-users on selecting the right configuration for a SQream DB installation.
Need help?
This page is intended as a “reference” to suggested hardware. However, different workloads require different solution sizes. SQream’s customer support team has the experience to advise on these matters to ensure the best results. Visit SQream’s support portal for additional support.
2.1.4.16.1 A SQream DB Cluster
SQream recommends rackmount servers by server manufacturers Dell, Lenovo, HP, Cisco, Supermicro, IBM, and others. A typical SQream DB cluster includes one or more nodes, consisting of:
• Two-socket enterprise processors, like the Intel® Xeon® Gold processor family or IBM® POWER9 processors, providing the high performance required for compute-bound database workloads.
• NVIDIA Tesla GPU accelerators, with up to 5,120 CUDA and Tensor cores, running on PCIe or fast NVLINK buses, delivering high core count and high-throughput performance on massive datasets.
• High density chassis design, offering between 2 and 4 GPUs in a 1U, 2U, or 3U package, for best-in-class performance per cm².
2.1.4.16.1.1 Single-Node Cluster Example
A single-node SQream DB cluster can handle between 1 and 8 concurrent users, with up to 1PB of data storage (when connected via NAS). An average single-node cluster can be a rackmount server or workstation, containing the following components:

Component | Type
Server | Dell T640, Dell R740, Dell R940xa, HP ProLiant DL380 Gen10 or similar
Processor | 2x Intel Xeon Gold 6240 (18C/36HT) 2.6GHz
RAM | 512 GB
Onboard storage | 2x 960GB SSD 2.5in hot-plug for OS, RAID1; 14x 3.84TB SSD 2.5in hot-plug for storage, RAID6
GPU | 2x or 4x NVIDIA Tesla T4, V100 or A100
In this system configuration, SQream DB can store about 200TB of raw data (assuming average compression ratio and ~50TB of usable raw storage). If a NAS is used, the 14x SSD drives can be omitted, but SQream recommends 2TB of local spool space on SSD or NVMe drives.
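The figures above can be checked with simple arithmetic. The Python sketch below uses the standard RAID 6 rule that two drives' worth of capacity goes to parity. The 4:1 compression ratio is not stated by SQream; it is only what the quoted 200TB and ~50TB figures imply, and the helper name `raid6_usable_tb` is made up for this example. Filesystem and formatting overhead are ignored.

```python
def raid6_usable_tb(drive_count, drive_tb):
    """Usable capacity of a RAID 6 array: two drives' worth is used for parity."""
    if drive_count < 4:
        raise ValueError("RAID 6 needs at least four drives")
    return (drive_count - 2) * drive_tb


# 14 x 3.84TB SSDs in RAID 6, as in the component table above.
usable_tb = raid6_usable_tb(14, 3.84)

# Ratio implied by "about 200TB of raw data" on "~50TB of usable raw storage".
implied_ratio = 200 / 50

print(f"RAID 6 usable capacity: {usable_tb:.2f} TB")
print(f"Implied compression ratio: {implied_ratio:.0f}:1")
```

The computed capacity of about 46TB is in line with the ~50TB quoted above. Actual compression depends heavily on the data, so treat the ratio as a planning estimate only.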
2.1.4.16.1.2 Multi-Node Cluster Example
Multi-node clusters can handle any number of concurrent users. A typical SQream DB cluster relies on shared storage connected over a network fabric like InfiniBand EDR, 40GbE, or 100GbE. An example of a cluster node providing the best performance:
Component | Type
Server | High-density GPU-capable rackmount server, like Dell C4140, IBM AC922, Lenovo SR650.
Processor | 2x Intel Xeon Gold 6240 (18C/36HT) 2.6GHz or 2x IBM POWER9
RAM | 1024 GB
Onboard storage | 2x 960GB SSD 2.5in, for OS, RAID1; 2x 2TB SSD or NVMe, for temporary spooling, RAID1
Networking | Mellanox ConnectX-5 100 Gbps for storage fabric.
GPU | 4x NVIDIA Tesla V100 32GB or A100
Note: With a NAS connected over GPFS, Lustre, or NFS, each SQream DB worker can read data at up to 5GB/s.
2.1.4.16.2 Cluster Design Considerations
• In a SQream DB installation, the storage and compute are logically separated. While they may reside on the same machine in a standalone installation, they may also reside on different hosts, providing additional flexibility and scalability.
• SQream DB uses all resources in a machine, including CPU, RAM, and GPU, to deliver the best performance. 256GB of RAM per physical GPU is recommended, but not required.
• Local disk space is required for good temporary spooling performance, particularly when performing intensive larger-than-RAM operations like sorting. SQream recommends an SSD or NVMe drive, in a mirrored RAID 1 configuration, with about 2x the RAM size available for temporary storage. This can be shared with the operating system drive if necessary.
• When using SAN or NAS devices, SQream recommends around 5GB/s of burst throughput from storage, per GPU.
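The rules of thumb in the list above (256GB of RAM per GPU, spool space of about 2x RAM, and around 5GB/s of storage burst throughput per GPU) can be combined into a small sizing helper. The Python sketch below is a planning aid under those stated guidelines, not an official SQream sizing tool; `size_node` and its parameter names are made up for this example.

```python
def size_node(gpu_count, ram_per_gpu_gb=256, spool_factor=2, storage_gb_per_s_per_gpu=5):
    """Apply the per-GPU guidelines above to a single node."""
    if gpu_count < 1:
        raise ValueError("a node needs at least one GPU")
    ram_gb = gpu_count * ram_per_gpu_gb
    return {
        "ram_gb": ram_gb,                      # 256GB of RAM per physical GPU
        "spool_gb": spool_factor * ram_gb,     # about 2x RAM for temporary spooling
        "storage_burst_gb_per_s": gpu_count * storage_gb_per_s_per_gpu,
    }


# A 4-GPU node, as in the multi-node cluster example.
print(size_node(4))
```

For a 4-GPU node this gives 1024GB of RAM, which matches the multi-node example above. Because these are recommendations rather than requirements, a smaller configuration can still work at reduced performance or concurrency.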
2.1.4.16.2.1 Balancing Cost and Performance
Prior to designing and deploying a SQream DB cluster, there are a number of important factors to consider. This section provides a breakdown of deployment details intended to help ensure that this installation exceeds or meets the stated requirements. The rationale provided includes the necessary information for modifying configurations to suit the customer use-case scenario.
Component | Value
Compute - CPU | Balance price and performance
Compute - GPU | Balance price with performance and concurrency
Memory - GPU RAM | Balance price with concurrency and performance
Memory - RAM | Balance price and performance
Operating System | Availability, reliability, and familiarity
Storage | Balance price with capacity and performance
Network | Balance price and performance
2.1.4.16.2.2 CPU Compute
SQream DB relies on multi-core Intel® Xeon® processors or IBM® POWER9 processors. SQream recommends a dual-socket machine populated with CPUs with 18C/36HT or better. While a higher core count may not necessarily affect query performance, more cores will enable higher concurrency and better load performance.
2.1.4.16.2.3 GPU Compute and RAM
The NVIDIA Tesla range of high-throughput GPU accelerators provides the best performance for enterprise environments. Most cards have ECC memory, which is crucial for delivering correct results every time. SQream recommends the NVIDIA Tesla V100 32GB GPU for best performance and highest concurrent user support.
GPU RAM, sometimes called GRAM or VRAM, is used for processing queries. It is possible to select GPUs with less RAM, like the NVIDIA Tesla V100 16GB or P100 16GB. However, the smaller GPU RAM will result in reduced concurrency, as the GPU RAM is used extensively in operations like JOINs, ORDER BY, GROUP BY, and all SQL transforms.
2.1.4.16.2.4 RAM
Use of error-correcting code memory (ECC) is a practical requirement for SQream DB and is standard on most enterprise servers. SQream DB benefits from having large amounts of memory for improved performance on large ‘external’ operations like sorting and joining. Although SQream DB can function with less, we recommend 256GB of RAM per GPU in the machine.
2.1.4.16.2.5 Operating System
SQream DB can run on 64-bit Linux operating systems:
• Red Hat Enterprise Linux (RHEL) v7
• CentOS v7
• Amazon Linux 2018.03
• Ubuntu v16.04 LTS, v18.04 LTS
• Other Linux distributions may be supported via nvidia-docker
2.1.4.16.2.6 Storage
For clustered scale-out installations, SQream DB relies on NAS/SAN storage. These devices have extremely high reliability and durability, with five 9s of up-time.
For stand-alone installations, SQream DB relies on redundant disk configurations, like RAID 10/50. These RAID configurations ensure that blocks of data are replicated between disks, so that the failure of a number of disks will not result in data loss or loss of system availability.
Because storage reliability is important, SQream recommends enterprise-grade SAS SSD drives. However, as with other components, there is a tradeoff between cost and performance. When performance and reliability are important, SQream recommends SAS SSD or NVMe drives. SQream DB also functions well with more cost-effective SATA drives and even large spinning-disk arrays.
2.1.4.16.3 Cluster Supporting 32 Concurrent Active User Example
For a 32-user configuration, the number of GPUs should roughly match the number of users. SQream recommends 1 Tesla V100 GPU per 2 users, for full, uninterrupted dedicated access. Each of the servers described below can support about 8 users on average. The actual number of concurrent users can be higher, depending on the workload.
A SQream DB cluster for 32 users consists of the following components:
1. 4 high-density GPU-enabled servers, like the Dell C4140 (Configuration C) with 4x NVIDIA Tesla V100 32GB PCIe GPUs. Each server is equipped with dual Intel® Xeon® Gold 6240 CPUs and 1,024GB of RAM.
2. NAS/SAN storage, capable of delivering 1 GB/s per GPU. For the system above, with 4x4 NVIDIA Tesla V100 GPUs, this results in 16GB/s, over multiple bonded 40GigE or InfiniBand links via a fabric switch.
3. Top-of-Rack (ToR) 10GigE ethernet switch for the BI fabric
4. 40GigE or InfiniBand switches for the storage fabric
5. At least 1 PDU
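The arithmetic behind this example can be sketched as follows. This is only an illustration, assuming the ratios stated above (1 GPU per 2 users, 4 GPUs per server, 1 GB/s of storage throughput per GPU); size_cluster is a hypothetical name, not a SQream tool.

```shell
#!/bin/sh
# Illustrative sizing sketch (hypothetical helper, not SQream tooling).
# Assumes the ratios from the example above.
size_cluster() {
    users=$1
    gpus=$(( (users + 1) / 2 ))      # 1 GPU per 2 users, rounded up
    servers=$(( (gpus + 3) / 4 ))    # 4 GPUs per server, rounded up
    storage_gbps=$gpus               # 1 GB/s of storage throughput per GPU
    echo "users=$users gpus=$gpus servers=$servers storage_gbps=$storage_gbps"
}

size_cluster 32
```

For 32 users this gives 16 GPUs, 4 servers, and 16GB/s of storage throughput, matching the component list above. Real workloads may support more concurrent users per GPU, so treat the result as a conservative estimate.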
Read more
Download the full SQream DB Reference Architecture document.
2.1.4.17 Security
SQream DB has some security features that you should be aware of to increase the security of your data.
In this topic:
• Overview
• Security best practices for SQream DB
  – Secure OS access
  – Change the default SUPERUSER
  – Create distinct user roles
  – Limit SUPERUSER access
  – Password strength guidelines
  – Use TLS/SSL when possible
2.1.4.17.1 Overview
An initial, unsecured installation of SQream DB can carry some risks:
• Your data is open to any client that can access an open node through an IP and port combination.
• The initial administrator username and password, when unchanged, can let anyone log in.
• Network connections to SQream DB aren’t encrypted.
To avoid these security risks, SQream DB provides authentication, authorization, logging, and network encryption. Read through the best practices guide to understand more.
2.1.4.17.2 Security best practices for SQream DB
2.1.4.17.2.1 Secure OS access
SQream DB often runs as a dedicated user on the host OS. This user is the file system owner of SQream DB data files. Anyone who logs in to the OS as this user can read or delete data from outside of SQream DB. This user can also read any logs, which may contain user login attempts. Therefore, it is very important to secure the host OS and prevent unauthorized access. System administrators should only log in to the host OS to perform maintenance tasks like upgrades. A database user should not log in using the same username in production environments.
2.1.4.17.2.2 Change the default SUPERUSER
To bootstrap SQream DB, a new install will always have one SUPERUSER role, typically named sqream. After creating a second SUPERUSER role, remove the default sqream user or change its default credentials. No database user should ever use the default SUPERUSER role in a production environment. If you don’t change the user role itself, change the password of the default SUPERUSER. See the change password section of our access control guide.
2.1.4.17.2.3 Create distinct user roles
Each user that signs in to a SQream DB cluster should have a distinct user role for several reasons:
• For logging and auditing purposes. Each user that logs in to SQream DB can be identified.
• For limiting permissions. Use groups and permissions to manage access.
See our Access control guide for more information.
2.1.4.17.2.4 Limit SUPERUSER access
Limit users who have the SUPERUSER role. A superuser role bypasses all permissions checks. Only system administrators should have SUPERUSER roles. See our Access control guide for more information.
2.1.4.17.2.5 Password strength guidelines
System administrators should verify the passwords used are strong ones. SQream DB stores passwords as salted SHA1 hashes in the system catalog so they are obscured and can’t be recovered. However, passwords may appear in server logs. Prevent access to server logs by securing OS access as described above.
Follow these recommendations to strengthen passwords:
• Pick a password that’s easy to remember
• At least 8 characters
• Mix upper and lower case letters
• Mix letters and numbers
• Include non-alphanumeric characters (except " and ')
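The mechanical parts of these recommendations can be checked with a small script. The sketch below is an illustration only: check_password is a hypothetical helper, not a SQream DB feature, and it cannot judge whether a password is easy to remember or hard to guess.

```shell
#!/bin/sh
# Illustrative check of the password recommendations listed above.
# check_password is a hypothetical helper, not a SQream DB feature.
# Returns 0 if the password satisfies every rule, 1 otherwise.
LC_ALL=C
export LC_ALL

check_password() {
    pw=$1
    [ "${#pw}" -ge 8 ] || return 1                         # at least 8 characters
    case $pw in *[[:lower:]]*) ;; *) return 1 ;; esac      # a lower case letter
    case $pw in *[[:upper:]]*) ;; *) return 1 ;; esac      # an upper case letter
    case $pw in *[[:digit:]]*) ;; *) return 1 ;; esac      # a number
    case $pw in *[![:alnum:]]*) ;; *) return 1 ;; esac     # a non-alphanumeric character
    case $pw in *\"*|*\'*) return 1 ;; esac                # but neither " nor '
    return 0
}

check_password 'Tr1cky#Pass' && echo "acceptable"
```

Avoid passing real passwords on a command line in production, since they may end up in shell history; the example value is a placeholder.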
2.1.4.17.2.6 Use TLS/SSL when possible
SQream DB’s protocol implements client/server TLS security (even though it is called SSL). All SQream DB connectors and drivers support transport encryption. Ensure that each connection uses SSL and the correct access port for the SQream DB cluster:
• The load balancer (server_picker) is often started with the secure port at an offset of 1 from the original port (e.g. port 3108 for the unsecured connection and port 3109 for the secured connection).
• A SQream DB worker is often started with the secure port enabled at an offset of 100 from the original port (e.g. port 5000 for the unsecured connection and port 5100 for the secured connection).
Refer to each client driver for instructions on enabling TLS/SSL.
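The port offsets described above can be expressed as a small helper. This is a sketch under the assumption that your cluster uses the common offsets (1 for server_picker, 100 for workers); secure_port is a hypothetical name, and the actual ports depend on how the cluster was started.

```shell
#!/bin/sh
# Illustrative helper: derives the secured port from the unsecured port,
# assuming the offsets that are often used. Verify against your own
# cluster configuration before relying on the result.
secure_port() {
    component=$1    # "server_picker" or "worker"
    port=$2
    case $component in
        server_picker) echo $((port + 1)) ;;      # e.g. 3108 -> 3109
        worker)        echo $((port + 100)) ;;    # e.g. 5000 -> 5100
        *) echo "unknown component: $component" >&2; return 1 ;;
    esac
}

secure_port server_picker 3108
secure_port worker 5000
```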
2.1.4.18 Setting up SQream DB
The guides below cover installing SQream DB.
2.1.4.18.1 Before you begin
Before you install or deploy SQream DB, please verify you have the following:
• An NVIDIA-capable server, either on-premise or on supported cloud platforms.
  – Supported operating systems: Red Hat Enterprise Linux v7.x / CentOS v7.x / Ubuntu 18.04 / Amazon Linux
  – NVIDIA GPU. A Tesla GPU is highly recommended. See more information in our Hardware Guide.
  – A privileged SSH connection to the server (sudo)
• A SQream DB license (contact [email protected] or your SQream account manager for your license key)
• A 3rd party tool that can connect to SQream DB via JDBC, ODBC, or Python
If you do not have a SQream DB license, visit our website and sign up for SQream DB today!
Refer to our Hardware Guide for more information about supported and recommended configurations.
2.1.4.18.1.1 What does the installation process look like?
To install SQream DB, we will go through the following steps:
1. Prepare the host OS for NVIDIA driver installation
2. Install the NVIDIA driver
3. Install Docker CE
4. Install nvidia-docker2
5. Prepare disk space for SQream DB
6. Install SQream DB
7. Additional system configuration for performance and stability
What’s next?
• When ready, start installing SQream DB
2.1.4.18.2 Start a local SQream DB cluster with Docker
See Release Notes to learn about what’s new in the latest release of SQream DB. To upgrade to this release, see Upgrading SQream DB with Docker.
SQream DB is installed on your hosts with NVIDIA Docker. Several preparation steps must be completed before installing SQream DB, so follow these instructions carefully.
Note: Installing SQream DB requires a license key. Go to SQream Support or contact your SQream account manager for your license key.
In this topic:
• Preparing your machine for NVIDIA Docker
  – CentOS 7 / RHEL 7 / Amazon Linux
  – Ubuntu 18.04
• Install Docker CE and NVIDIA docker
  – CentOS 7 / RHEL 7 / Amazon Linux (x64)
  – CentOS 7.6 / RHEL 7.6 (IBM POWER9)
  – Ubuntu 18.04 (x64)
• Preparing directories and mounts for SQream DB
• Install the SQream DB Docker container
• Starting your first local cluster
2.1.4.18.2.1 Preparing your machine for NVIDIA Docker
To install NVIDIA Docker, we must first install the NVIDIA driver.
Note: SQream DB works best on NVIDIA Tesla series GPUs, which provide better reliability, performance, and stability. The instructions below are written for NVIDIA Tesla GPUs, but other NVIDIA GPUs may work.
Follow the instructions for your OS and architecture:
• CentOS 7 / RHEL 7 / Amazon Linux
• Ubuntu 18.04
2.1.4.18.2.2 CentOS 7 / RHEL 7 / Amazon Linux
Recommended: Follow the installation instructions in NVIDIA’s CUDA Installation Guide for full instructions suitable for your platform. The information listed below is a summary of the necessary steps, and does not cover the full range of options available.
1. Enable EPEL. EPEL provides additional open-source and free packages from the RHEL ecosystem. The NVIDIA driver depends on packages such as DKMS and libvdpau, which are only available in third-party repositories, such as EPEL.
$ sudo rpm -Uvh http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
2. Install the kernel headers and development tools necessary to compile the NVIDIA driver
$ sudo yum -y install kernel-devel-$(uname -r) kernel-headers-$(uname -r) gcc
3. Install the CUDA repository and install the latest display driver
$ sudo rpm -Uvh https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-repo-rhel7-10.1.243-1.x86_64.rpm
$ sudo yum install -y nvidia-driver-latest
Note: If Linux is running with X, switch to text-only mode before installing the display driver.
$ sudo systemctl set-default multi-user.target
This change permanently disables X. If you need X, change set-default to isolate. This will re-enable X on the next reboot.
4. Restart your machine
$ sudo reboot
5. Verify the installation completed correctly, by asking nvidia-smi, NVIDIA’s system management interface application, to list the available GPUs.
$ nvidia-smi -L
GPU 0: Tesla V100-PCIE-16GB (UUID: GPU-...)
GPU 1: Tesla V100-PCIE-16GB (UUID: GPU-...)
6. Enable NVIDIA’s persistence daemon. This is mandatory for IBM POWER9, but is recommended for other platforms as well.
$ sudo systemctl enable nvidia-persistenced && sudo systemctl start nvidia-persistenced
Important: On POWER9 systems only, disable the udev rule for hot-pluggable memory probing.
For Red Hat 7, this rule can be found in /lib/udev/rules.d/40-redhat.rules
For Ubuntu, this rule can be found in /lib/udev/rules.d/40-vm-hotadd.rules
The rule generally takes a form where it detects the addition of a memory block and changes the ‘state’ attribute to online. For example, in RHEL7, the rule looks like this:
SUBSYSTEM=="memory", ACTION=="add", PROGRAM="/bin/uname -p", RESULT!="s390*", ATTR{state}=="offline", ATTR{state}="online"
This rule must be disabled by copying the file to /etc/udev/rules.d and commenting out, removing, or changing the hot-pluggable memory rule in the /etc copy so that it does not apply to NVIDIA devices on POWER9.
• On RHEL 7.5 or earlier versions:
$ sudo cp /lib/udev/rules.d/40-redhat.rules /etc/udev/rules.d
$ sudo sed -i '/SUBSYSTEM=="memory", ACTION=="add"/d' /etc/udev/rules.d/40-redhat.rules
• On RHEL 7.6 and later versions:
$ sudo cp /lib/udev/rules.d/40-redhat.rules /etc/udev/rules.d
$ sudo sed -i 's/SUBSYSTEM!="memory", ACTION!="add", GOTO="memory_hotplug_end"/SUBSYSTEM=="*", GOTO="memory_hotplug_end"/' /etc/udev/rules.d/40-redhat.rules
Reboot the system to apply the above changes.
7. Continue to installing NVIDIA Docker for RHEL
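On POWER9, the udev change described in the Important note above can be double-checked before rebooting. The sketch below is an illustration only: hotplug_rule_active is a hypothetical helper, and it looks only for the RHEL 7.5-style rule text shown above, so it does not cover the RHEL 7.6 form.

```shell
#!/bin/sh
# Illustrative check (POWER9 only): reports whether a udev rules file still
# contains an active (uncommented) rule that brings hot-added memory online.
# Only the RHEL 7.5-style rule shown above is recognized.
hotplug_rule_active() {
    rules_file=$1
    # Drop comment lines, then look for the memory "add" rule.
    grep -v '^[[:space:]]*#' "$rules_file" |
        grep -q 'SUBSYSTEM=="memory", ACTION=="add"'
}

# Demonstration on a temporary file instead of a real system path.
tmp=$(mktemp)
printf '%s\n' 'SUBSYSTEM=="memory", ACTION=="add", PROGRAM="/bin/uname -p", RESULT!="s390*", ATTR{state}=="offline", ATTR{state}="online"' > "$tmp"
if hotplug_rule_active "$tmp"; then
    echo "rule still active"
fi
rm -f "$tmp"
```

On a real host, you would point the function at /etc/udev/rules.d/40-redhat.rules after applying the sed command for your RHEL version.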
2.1.4.18.2.3 Ubuntu 18.04
Recommended: Follow the installation instructions in NVIDIA’s CUDA Installation Guide for full instructions suitable for your platform. The information listed below is a summary of the necessary steps, and does not cover the full range of options available.
1. Install the kernel headers and development tools necessary
$ sudo apt-get update
$ sudo apt-get install linux-headers-$(uname -r) gcc
2. Install the CUDA repository and driver on Ubuntu
$ curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.1.243-1_amd64.deb
$ sudo dpkg -i cuda-repo-ubuntu1804_10.1.243-1_amd64.deb
$ sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
$ sudo apt-get update && sudo apt-get install -y nvidia-driver-418
3. Restart your machine
$ sudo reboot
4. Verify the installation completed correctly, by asking nvidia-smi, NVIDIA’s system management interface application, to list the available GPUs.
$ nvidia-smi -L
GPU 0: Tesla V100-PCIE-16GB (UUID: GPU-...)
GPU 1: Tesla V100-PCIE-16GB (UUID: GPU-...)
5. Enable NVIDIA’s persistence daemon. This is mandatory for IBM POWER9, but is recommended for other platforms as well.
$ sudo systemctl enable nvidia-persistenced
6. Continue to installing NVIDIA Docker for Ubuntu
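The verification in step 4 can be scripted so that a missing GPU is caught early. This is a minimal sketch: expect_gpus is a hypothetical helper that reads nvidia-smi -L style output from standard input, and the sample text below stands in for real output.

```shell
#!/bin/sh
# Illustrative check: counts the GPUs in `nvidia-smi -L` style output read
# from stdin and compares the count with the number you expect.
expect_gpus() {
    expected=$1
    found=$(grep -c '^GPU [0-9][0-9]*:' || true)
    if [ "$found" -eq "$expected" ]; then
        echo "OK: $found GPU(s) detected"
    else
        echo "Mismatch: expected $expected GPU(s), found $found" >&2
        return 1
    fi
}

# Sample input; on a real host use: nvidia-smi -L | expect_gpus 2
printf 'GPU 0: Tesla V100-PCIE-16GB (UUID: GPU-...)\nGPU 1: Tesla V100-PCIE-16GB (UUID: GPU-...)\n' | expect_gpus 2
```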
2.1.4.18.2.4 Install Docker CE and NVIDIA docker
Follow the instructions for your OS and architecture:
• CentOS 7 / RHEL 7 / Amazon Linux (x64)
• CentOS 7.6 / RHEL 7.6 (IBM POWER9)
• Ubuntu 18.04 (x64)
2.1.4.18.2.5 CentOS 7 / RHEL 7 / Amazon Linux (x64)
Note: For IBM POWER9, see the next section installing NVIDIA Docker for IBM POWER9
1. Follow the instructions for Docker CE for your platform at Get Docker Engine - Community for CentOS
2. Tell Docker to start after a reboot
$ sudo systemctl enable docker && sudo systemctl start docker
3. Verify that docker is running
$ sudo systemctl status docker
docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2019-08-12 08:22:30 IDT; 1 months 27 days ago
     Docs: https://docs.docker.com
 Main PID: 65794 (dockerd)
    Tasks: 76
   Memory: 124.5M
   CGroup: /system.slice/docker.service
           65794 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
4. Let your current user manage Docker, without requiring sudo
$ sudo usermod -aG docker $USER
Then, log out and log back in:
$ exit
5. Install nvidia-docker
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | sudo tee /etc/yum.repos.d/nvidia-docker.repo
$ sudo yum install -y nvidia-container-toolkit
$ sudo yum install nvidia-docker2
$ sudo pkill -SIGHUP dockerd
$ sudo systemctl restart docker
Note: Occasionally, there may be a signature verification error while obtaining nvidia-docker. The error looks something like this:
[Errno -1] repomd.xml signature could not be verified for nvidia-docker
Run the following commands to update the repository keys:
$ DIST=$(sed -n 's/releasever=//p' /etc/yum.conf)
$ DIST=${DIST:-$(. /etc/os-release; echo $VERSION_ID)}
$ sudo rpm -e gpg-pubkey-f796ecb0
$ sudo gpg --homedir /var/lib/yum/repos/$(uname -m)/$DIST/*/gpgdir --delete-key f796ecb0
$ sudo gpg --homedir /var/lib/yum/repos/$(uname -m)/latest/nvidia-docker/gpgdir --delete-key f796ecb0
$ sudo gpg --homedir /var/lib/yum/repos/$(uname -m)/latest/nvidia-container-runtime/gpgdir --delete-key f796ecb0
$ sudo gpg --homedir /var/lib/yum/repos/$(uname -m)/latest/libnvidia-container/gpgdir --delete-key f796ecb0
6. Verify the NVIDIA docker installation
$ sudo docker run --runtime=nvidia --rm nvidia/cuda:10.1-base nvidia-smi -L
GPU 0: Tesla V100-PCIE-16GB (UUID: GPU-...)
GPU 1: Tesla V100-PCIE-16GB (UUID: GPU-...)
7. Continue to Installing the SQream DB Docker container
2.1.4.18.2.6 CentOS 7.6 / RHEL 7.6 (IBM POWER9)
On POWER9, SQream DB is supported only on RHEL 7.6.
1. Install Docker for IBM POWER9:
$ wget http://ftp.unicamp.br/pub/ppc64el/rhel/7_1/docker-ppc64el/container-selinux-2.9-4.el7.noarch.rpm
$ wget http://ftp.unicamp.br/pub/ppc64el/rhel/7_1/docker-ppc64el/docker-ce-18.03.1.ce-1.el7.centos.ppc64le.rpm
$ sudo yum install -y container-selinux-2.9-4.el7.noarch.rpm docker-ce-18.03.1.ce-1.el7.centos.ppc64le.rpm
2. Tell Docker to start after a reboot:
$ sudo systemctl enable docker && sudo systemctl start docker
3. Verify that docker is running:
$ sudo systemctl status docker
docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2019-08-12 08:22:30 IDT; 1 months 27 days ago
     Docs: https://docs.docker.com
 Main PID: 65794 (dockerd)
    Tasks: 76
   Memory: 124.5M
   CGroup: /system.slice/docker.service
           65794 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
4. Let your current user manage Docker, without requiring sudo
$ sudo usermod -aG docker $USER
Note: Log out and log back in again after this action
5. Install nvidia-docker
• Install the NVIDIA container and container runtime packages from NVIDIA’s repository:
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | sudo tee /etc/yum.repos.d/nvidia-docker.repo
$ sudo yum install -y libnvidia-container* nvidia-container-runtime*
• Add the NVIDIA runtime to the Docker daemon and restart docker. The empty ExecStart= line clears the default start command before the new one is set:
$ sudo mkdir -p /etc/systemd/system/docker.service.d/
$ echo -e "[Service]\nExecStart=\nExecStart=/usr/bin/dockerd --add-runtime=nvidia=/usr/bin/nvidia-container-runtime" | sudo tee /etc/systemd/system/docker.service.d/override.conf
$ sudo systemctl daemon-reload && sudo systemctl restart docker
Note: Occasionally, there may be a signature verification error while obtaining nvidia-docker. The error looks something like this:
[Errno -1] repomd.xml signature could not be verified for nvidia-docker
Run the following commands to update the repository keys:
$ DIST=$(sed -n 's/releasever=//p' /etc/yum.conf)
$ DIST=${DIST:-$(. /etc/os-release; echo $VERSION_ID)}
$ sudo rpm -e gpg-pubkey-f796ecb0
$ sudo gpg --homedir /var/lib/yum/repos/$(uname -m)/$DIST/*/gpgdir --delete-key f796ecb0
$ sudo gpg --homedir /var/lib/yum/repos/$(uname -m)/latest/nvidia-docker/gpgdir --delete-key f796ecb0
$ sudo gpg --homedir /var/lib/yum/repos/$(uname -m)/latest/nvidia-container-runtime/gpgdir --delete-key f796ecb0
$ sudo gpg --homedir /var/lib/yum/repos/$(uname -m)/latest/libnvidia-container/gpgdir --delete-key f796ecb0
6. Verify the NVIDIA docker installation succeeded
$ docker run --runtime=nvidia --rm nvidia/cuda-ppc64le:10.1-base nvidia-smi -L
GPU 0: Tesla V100-SXM2-16GB (UUID: GPU-...)
GPU 1: Tesla V100-SXM2-16GB (UUID: GPU-...)
7. Continue to Installing the SQream DB Docker container
2.1.4.18.2.7 Ubuntu 18.04 (x64)
1. Follow the instructions for Docker CE for your platform at Get Docker Engine - Community for Ubuntu
2. Tell Docker to start after a reboot
$ sudo systemctl enable docker && sudo systemctl start docker
3. Verify that docker is running
$ sudo systemctl status docker
docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2019-08-12 08:22:30 IDT; 1 months 27 days ago
     Docs: https://docs.docker.com
 Main PID: 65794 (dockerd)
    Tasks: 76
   Memory: 124.5M
   CGroup: /system.slice/docker.service
           65794 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
4. Let your current user manage Docker, without requiring sudo
$ sudo usermod -aG docker $USER
Note: Log out and log back in again after this action
5. Install nvidia-docker
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
$ sudo apt-get update
$ sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit nvidia-docker2
$ sudo pkill -SIGHUP dockerd
$ sudo systemctl restart docker
6. Verify the NVIDIA docker installation
$ sudo docker run --runtime=nvidia --rm nvidia/cuda:10.1-base nvidia-smi -L
GPU 0: Tesla V100-PCIE-16GB (UUID: GPU-...)
GPU 1: Tesla V100-PCIE-16GB (UUID: GPU-...)
7. Continue to Installing the SQream DB Docker container
2.1.4.18.2.8 Preparing directories and mounts for SQream DB
SQream DB uses several directories that need to be defined:
Table 15: Directories and paths
Path name       Definition
storage         The location where SQream DB stores data, metadata, and logs
exposed path    A location that SQream DB can read and write to. Used for allowing access to shared raw files like CSVs on local or NFS drives
logs            Optional location for debug logs
Note: By default, SQream DB can’t access any OS path. You must explicitly allow it.
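Before installing, it can help to confirm that the paths you plan to use exist and are accessible. The following is a minimal sketch: check_dir is a hypothetical helper (not part of the SQream installer), and it tests access for the user running the script, which may differ from the user that SQream DB runs as.

```shell
#!/bin/sh
# Illustrative pre-flight check (hypothetical helper, not part of the
# SQream installer): verifies that a path exists, is a directory, and is
# readable and writable by the current user.
check_dir() {
    name=$1
    path=$2
    if [ -d "$path" ] && [ -r "$path" ] && [ -w "$path" ]; then
        echo "$name: $path OK"
    else
        echo "$name: $path is missing or not readable/writable" >&2
        return 1
    fi
}

# Example paths only; substitute your own storage and exposed paths:
#   check_dir storage /mnt/largedrive
#   check_dir "exposed path" /mnt/nfs/source_files
```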
2.1.4.18.2.9 Install the SQream DB Docker container
1. Download the SQream DB tarball and license package
In the e-mail from your account manager at SQream, you have received a download link for the SQream DB installer and a license package. Download the SQream DB tarball to the user home directory. For example:
$ cd ~
$ curl -O {download URL}
2. Extract the tarball into your home directory
$ tar xf sqream_installer-2.0.7-DB2019.2.1.4-CO1.7.5-ED3.0.1-x86_64.tar.gz
3. Copy the license package
Copy the license package from your home directory to the license subdirectory, which is located in the newly created SQream installer directory. For example, if the license package is titled license_package.tar.gz:
$ cp ~/license_package.tar.gz sqream_installer-2.0.7-DB2019.2.1.4-CO1.7.5-ED3.0.1-x86_64/license
4. Enter the installer directory
$ cd sqream_installer-2.0.7-DB2019.2.1.4-CO1.7.5-ED3.0.1-x86_64
5. Install SQream DB
In most cases, the installation command will look like this:
$ ./sqream-install -i -k -v
For example, if the main storage path for SQream DB is /mnt/largedrive and the desired shared access path is /mnt/nfs/source_files, the command will look like:
$ ./sqream-install -i -k -v /mnt/largedrive -d /mnt/nfs/source_files
For a full list of options and commands, see the sqream-installer reference
6. SQream DB is now successfully installed, but not yet running.
2.1.4.18.2.10 Starting your first local cluster
1. Enter the sqream console utility, which helps coordinate SQream DB components
$ ./sqream-console
2. Start the master components:
sqream-console>sqream master --start
starting master server in single_host mode ...
sqream_single_host_master is up and listening on ports: 3105,3108
3. Start workers to join the cluster:
sqream-console>sqream worker --start 2
started sqream_single_host_worker_0 on port 5000, allocated gpu: 0
started sqream_single_host_worker_1 on port 5001, allocated gpu: 1
Note: By default, each worker is allocated a full GPU. To launch more workers than available GPUs, see the sqream console reference
4. SQream DB is now running! (Exit the console by typing exit when done.)
What’s next?
• Test your SQream DB installation by creating your first table
• Connect an external tool to SQream DB
• Additional system configuration for performance and stability
2.1.4.18.3 Recommended Pre-Installation Configuration
Once you’ve installed SQream DB, you can and should tune your system for better performance and stability. This page provides recommendations for production deployments of SQream DB.
In this topic:
• Recommended BIOS Settings
• Installing the Operating System
• Configuring the Operating System
  – Logging In to the Server
  – Automatically Creating a SQream User
  – Manually Creating a SQream User
  – Setting Up A Locale
  – Installing the Required Packages
  – Installing the Recommended Tools
  – Installing Python 3.6.7
  – Installing NodeJS on CentOS
  – Installing NodeJS on Ubuntu
  – Configuring the Network Time Protocol (NTP)
  – Configuring the Network Time Protocol Server
  – Configuring the Server to Boot Without the UI
  – Configuring the Security Limits
  – Configuring the Kernel Parameters
  – Configuring the Firewall
  – Disabling selinux
  – Configuring the /etc/hosts File
  – Configuring the DNS
• Installing the Nvidia CUDA Driver
  – CUDA Driver Prerequisites
  – Updating the Kernel Headers
  – Disabling Nouveau
  – Installing the CUDA Driver
    ∗ Installing the CUDA Driver from the Repository
    ∗ Tuning Up NVIDIA Performance
    ∗ Disabling Automatic Bug Reporting Tools
• Enabling Core Dumps
  – Checking the abrtd Status
  – Setting the Limits
  – Creating the Core Dumps Directory
  – Setting the Output Directory of the /etc/sysctl.conf File
  – Verifying that the Core Dumps Work
  – Troubleshooting Core Dumping
2.1.4.18.3.1 Recommended BIOS Settings
The BIOS settings may have a variety of names, or may not exist on your system. Each system vendor has a different set of settings and variables. It is safe to skip any and all of the configuration steps, but this may impact performance. If any doubt arises, consult the documentation for your server or your hardware vendor for the correct way to apply the settings.
Item: Management console access
Setting: Connected
Rationale: Connection to OOB required to preserve continuous network uptime.

Item: All drives
Setting: Connected and displayed on RAID interface
Rationale: Prerequisite for cluster or OS installation.

Item: RAID volumes
Setting: Configured according to project guidelines. Must be rebooted to take effect.
Rationale: Clustered to increase logical volume and provide redundancy.

Item: Fan speed / Thermal Configuration
Setting: Dell fan speed: High Maximum. Specified minimum setting: 60. HPe thermal configuration: Increased cooling.
Rationale: NVIDIA Tesla GPUs are passively cooled and require high airflow to operate at full performance.

Item: Power regulator or iDRAC power unit policy
Setting: HPe: HP static high performance mode enabled. Dell: iDRAC power unit policy (power cap policy) disabled.
Rationale: Other power profiles (such as “balanced”) throttle the CPU and diminish performance. Throttling may also cause GPU failure.

Item: System Profile, Power Profile, or Performance Profile
Setting: High Performance
Rationale: The Performance profile provides potentially increased performance by maximizing processor frequency and disabling certain power saving features such as C-states. Use this setting for environments that are not sensitive to power consumption.

Item: Power Cap Policy or Dynamic power capping
Setting: Disabled
Rationale: Other power profiles (like “balanced”) throttle the CPU and may diminish performance or cause GPU failure. This setting may appear together with the above (Power profile or Power regulator). This setting allows disabling system ROM power calibration during the boot process. Power regulator settings are named differently in BIOS and iLO/iDRAC.
2.1.4.18.3.2 Installing the Operating System
CentOS (versions 7.6-7.9) or RHEL (versions 7.6-7.9) must be installed before installing the SQream database. Either the customer or a SQream representative can perform the installation.
To install the operating system:
1. Select a language (English recommended).
2. From Software Selection, select Minimal.
3. Select the Development Tools group checkbox.
4. Continue the installation.
5. Set up the necessary drives and users as per the installation process.
Using Debugging Tools is recommended for future problem-solving if necessary.
Selecting the Development Tools group installs the following tools:
• autoconf
• automake
• binutils
• bison
• flex
• gcc
• gcc-c++
• gettext
• libtool
• make
• patch
• pkgconfig
• redhat-rpm-config
• rpm-build
• rpm-sign
The root user is created and the OS shell is booted up.
2.1.4.18.3.3 Configuring the Operating System
When configuring the operating system, several basic settings related to creating a new server are required. Configuring these as part of your basic set-up increases your server’s security and usability.
2.1.4.18.3.4 Logging In to the Server
You can log in to the server using the server’s IP address and password for the root user. The server’s IP address and root user were created while installing the operating system above.
2.1.4.18.3.5 Automatically Creating a SQream User
To automatically create a SQream user: 1. If a SQream user was created during installation, verify that the same ID is used on every server:
$ sudo id sqream
The ID 1000 is used on each server in the following example:
uid=1000(sqream) gid=1000(sqream) groups=1000(sqream)
2. If the IDs are different, delete the SQream user and SQream group from both servers:
$ sudo userdel sqream
3. Remove the leftover mail spool of the deleted user, then recreate the user with the same ID on every server (see Manually Creating a SQream User):
$ sudo rm /var/spool/mail/sqream
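The ID comparison in step 1 can be automated once you have collected the output of id sqream from each server. This is a sketch under that assumption: uid_of and same_uid are hypothetical helper names, and how you gather the output (for example over SSH) is left to you.

```shell
#!/bin/sh
# Illustrative helper: extracts the numeric UID from a line of `id` output,
# such as "uid=1000(sqream) gid=1000(sqream) groups=1000(sqream)".
uid_of() {
    echo "$1" | sed -n 's/^uid=\([0-9][0-9]*\).*/\1/p'
}

# Succeeds only if every `id` output line passed as an argument
# reports the same UID as the first one.
same_uid() {
    first=$(uid_of "$1")
    [ -n "$first" ] || return 1
    for line in "$@"; do
        [ "$(uid_of "$line")" = "$first" ] || return 1
    done
}

# Example with two collected outputs (placeholder values):
if same_uid "uid=1000(sqream) gid=1000(sqream) groups=1000(sqream)" \
            "uid=1000(sqream) gid=1000(sqream) groups=1000(sqream)"; then
    echo "UIDs match"
fi
```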
2.1.4.18.3.6 Manually Creating a SQream User
SQream enables you to manually create users. This section shows you how to manually create a user with the UID 1111. You cannot manually create a user during the operating system installation procedure.
To manually create a SQream user:
1. Add a user with an identical UID on all cluster nodes:
$ sudo useradd -u 1111 sqream
2. Add the user sqream to the wheel group.
$ sudo usermod -aG wheel sqream
You can remove the sqream user from the wheel group when the installation and configuration are complete. Next, set a password for the sqream user:
$ passwd sqream
3. Log out and log back in as sqream.
Note: If you deleted the sqream user and recreated it with a different ID, you must change the ownership of /home/sqream to avoid permission errors.
4. Change the ownership of /home/sqream to the sqream user:
$ sudo chown -R sqream:sqream /home/sqream
2.1.4.18.3.7 Setting Up A Locale
SQream enables you to set up a locale. The example below uses the en_US.UTF-8 language setting and the Asia/Jerusalem time zone; substitute the values for your own location.
To set up a locale:
1. Set the language of the locale:
$ sudo localectl set-locale LANG=en_US.UTF-8
2. Set the time zone of the locale:
$ sudo timedatectl set-timezone Asia/Jerusalem
If needed, you can run the timedatectl list-timezones command to list the available time zones.
2.1.4.18.3.8 Installing the Required Packages
You can install the required packages by running the following command:
$ sudo yum install ntp pciutils monit zlib-devel openssl-devel kernel-devel-$(uname -r)␣ ,→kernel-headers-$(uname -r) gcc net-tools wget jq
2.1.4.18.3.9 Installing the Recommended Tools
You can install the recommended tools by running the following command:
$ sudo yum install bash-completion.noarch vim-enhanced vim-common net-tools iotop htop psmisc screen xfsprogs wget yum-utils deltarpm dos2unix
2.1.4.18.3.10 Installing Python 3.6.7
1. Download the Python 3.6.7 source code tarball file from the following URL into the /home/sqream directory:
$ wget https://www.python.org/ftp/python/3.6.7/Python-3.6.7.tar.xz
2. Extract the Python 3.6.7 source code into your current directory:
$ tar -xf Python-3.6.7.tar.xz
3. Navigate to the Python 3.6.7 directory:
$ cd Python-3.6.7
4. Run the ./configure script:
$ ./configure
5. Build the software:
$ make -j30
6. Install the software:
$ sudo make install
7. Verify that Python 3.6.7 has been installed:
$ python3.6 --version
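The verification in step 7 can be turned into a scripted check. The following is a minimal sketch under stated assumptions: version_ge and python_version_of are hypothetical helper names, and the comparison relies on `sort -V` from GNU coreutils.

```shell
#!/bin/sh
# Succeed if version $1 is greater than or equal to version $2.
# Uses version sort, so 3.10.0 is treated as newer than 3.6.7.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print the version number from a banner such as "Python 3.6.7".
python_version_of() {
    printf '%s\n' "$1" | sed -n 's/^Python \([0-9][0-9.]*\).*/\1/p'
}

# Hypothetical usage, assuming the interpreter was installed as python3.6:
#   version_ge "$(python_version_of "$(python3.6 --version 2>&1)")" 3.6.7 \
#       || echo "Python 3.6.7 or later was not found" >&2
```

Comparing with version sort rather than plain string comparison avoids misordering releases such as 3.10 and 3.6.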
2.1.4.18.3.11 Installing NodeJS on CentOS
To install Node.js on CentOS:
1. Download and run the setup_12.x script, which configures the NodeSource repository:
$ curl -sL https://rpm.nodesource.com/setup_12.x | sudo bash -
2. Clear the YUM cache and update the local metadata:
$ sudo yum clean all && sudo yum makecache fast
3. Install Node.js:
$ sudo yum install -y nodejs
2.1.4.18.3.12 Installing NodeJS on Ubuntu
To install Node.js on Ubuntu:
1. Download and run the setup_12.x script, which configures the NodeSource repository:
$ curl -sL https://deb.nodesource.com/setup_12.x | sudo bash -
2. Install Node.js:
$ sudo apt-get install -y nodejs
3. Verify the node version:
$ node -v
2.1.4.18.3.13 Configuring the Network Time Protocol (NTP)
This section describes how to configure NTP. If you don’t have internet access, see Configure NTP Client to Synchronize with NTP Server.
To configure NTP:
1. Install the NTP package:
$ sudo yum install ntp
2. Enable the ntpd service:
$ sudo systemctl enable ntpd
3. Start the ntpd service:
$ sudo systemctl start ntpd
4. Print a list of peers known to the server and a summary of their states.
$ sudo ntpq -p
2.1.4.18.3.14 Configuring the Network Time Protocol Server
If your organization has an NTP server, you can configure your servers to synchronize with it.
To configure your NTP server:
1. Append your NTP server address to the /etc/ntp.conf file:
$ echo -e "\nserver
2. Restart the service.
$ sudo systemctl restart ntpd
3. Check that synchronization is enabled:
$ sudo timedatectl
Checking that synchronization is enabled generates the following output:
      Local time: Sat 2019-10-12 17:26:13 EDT
  Universal time: Sat 2019-10-12 21:26:13 UTC
        RTC time: Sat 2019-10-12 21:26:13
       Time zone: America/New_York (EDT, -0400)
     NTP enabled: yes
NTP synchronized: yes
 RTC in local TZ: no
      DST active: yes
 Last DST change: DST began at
                  Sun 2019-03-10 01:59:59 EST
                  Sun 2019-03-10 03:00:00 EDT
 Next DST change: DST ends (the clock jumps one hour backwards) at
                  Sun 2019-11-03 01:59:59 EDT
                  Sun 2019-11-03 01:00:00 EST
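The synchronization check can also be done without reading the output by eye. The following is a minimal sketch; ntp_synchronized is a hypothetical helper name, and it assumes the field is labeled "NTP synchronized" as in the output above. Other systemd versions may label this field differently, in which case the pattern needs adjusting.

```shell
#!/bin/sh
# Succeed if the given timedatectl output contains "NTP synchronized: yes".
ntp_synchronized() {
    printf '%s\n' "$1" \
        | grep -Eq '^[[:space:]]*NTP synchronized:[[:space:]]*yes[[:space:]]*$'
}

# Hypothetical usage:
#   ntp_synchronized "$(timedatectl)" || echo "clock is not synchronized" >&2
```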
2.1.4.18.3.15 Configuring the Server to Boot Without the UI
If a UI is not required on the server, you can configure the server to boot without one (recommended) by running the following command:
$ sudo systemctl set-default multi-user.target
Running this command activates the NO-UI server mode.
2.1.4.18.3.16 Configuring the Security Limits
The security limits refer to the maximum number of open files, processes, and similar resources. You can configure the security limits by running the echo -e command in a root shell:
$ sudo bash
$ echo -e "sqream soft nproc 1000000\nsqream hard nproc 1000000\nsqream soft nofile 1000000\nsqream hard nofile 1000000\nsqream soft core unlimited\nsqream hard core unlimited" >> /etc/security/limits.conf
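Because the command above appends with >>, running it twice leaves duplicate entries in limits.conf. The following is a minimal sketch of an idempotent alternative; append_once is a hypothetical helper name, not a SQream or system utility.

```shell
#!/bin/sh
# Append a line to a file only if that exact line is not already present.
append_once() {
    local line file
    line=$1
    file=$2
    # -x matches whole lines, -F treats the line as a fixed string
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Hypothetical usage, run in a root shell:
#   append_once "sqream soft nproc 1000000" /etc/security/limits.conf
#   append_once "sqream hard nproc 1000000" /etc/security/limits.conf
```

The same helper works for any of the configuration files in this guide that are extended by appending lines.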
2.1.4.18.3.17 Configuring the Kernel Parameters
To configure the kernel parameters:
1. Append the following kernel parameters to the /etc/sysctl.conf file, one per line:
$ echo -e "vm.dirty_background_ratio = 5 \n vm.dirty_ratio = 10 \n vm.swappiness = 10 \n vm.vfs_cache_pressure = 200 \n vm.zone_reclaim_mode = 0 \n" >> /etc/sysctl.conf
Notice: In the past, the vm.zone_reclaim_mode parameter was set to 7. In the latest SQream version, the vm.zone_reclaim_mode parameter must be set to 0. If it is not set to 0, when a NUMA node runs out of memory, the system will get stuck and will be unable to pull memory from other NUMA nodes.
2. Check the maximum value of fs.file-max:
$ sysctl -n fs.file-max
3. Optional - If the value of fs.file-max is smaller than 2097152, run the following command:
$ echo "fs.file-max=2097152" >> /etc/sysctl.conf
IPv4 forwarding must be enabled only for Docker and Kubernetes (K8s) installations.
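After editing /etc/sysctl.conf, it can be useful to confirm what the file actually contains. The following is a minimal sketch; sysctl_conf_value and file_max_ok are hypothetical helper names, and the threshold 2097152 is the one given in step 3.

```shell
#!/bin/sh
# Print the last value assigned to a key in a sysctl.conf-style file.
# Leading and trailing whitespace around the key and value is ignored.
sysctl_conf_value() {
    local key file key_re
    key=$1
    file=$2
    # Escape dots so that "vm.dirty_ratio" is matched literally
    key_re=$(printf '%s' "$key" | sed 's/\./\\./g')
    sed -n "s/^[[:space:]]*${key_re}[[:space:]]*=[[:space:]]*//p" "$file" \
        | sed 's/[[:space:]]*$//' | tail -n 1
}

# Succeed if the given fs.file-max value meets the required minimum.
file_max_ok() {
    [ "$1" -ge 2097152 ]
}

# Hypothetical usage:
#   sysctl_conf_value vm.zone_reclaim_mode /etc/sysctl.conf
#   file_max_ok "$(sysctl -n fs.file-max)" || echo "fs.file-max is too low" >&2
```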
2.1.4.18.3.18 Configuring the Firewall
The example in this section shows the open ports for four sqreamd sessions. If more than four are required, open the additional ports as needed. Port 8080 in the example below is a new UI port.
To configure the firewall:
1. Start the firewalld service:
$ systemctl start firewalld
2. Add the following ports to the permanent firewall:
$ firewall-cmd --zone=public --permanent --add-port=8080/tcp
$ firewall-cmd --zone=public --permanent --add-port=3105/tcp
$ firewall-cmd --zone=public --permanent --add-port=3108/tcp
$ firewall-cmd --zone=public --permanent --add-port=5000-5003/tcp
$ firewall-cmd --zone=public --permanent --add-port=5100-5103/tcp
$ firewall-cmd --permanent --list-all
3. Reload the firewall:
$ firewall-cmd --reload
4. Enable firewalld on boot:
$ systemctl enable firewalld
If you do not need the firewall, you can disable it:
$ sudo systemctl disable firewalld
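When more than four sqreamd sessions are needed, the port list can be generated instead of typed. The following is a minimal sketch; firewall_commands is a hypothetical helper name, and it assumes that the two per-session ranges grow contiguously from 5000 and 5100, as the four-session example suggests. Confirm the actual ports of your deployment before applying the output.

```shell
#!/bin/sh
# Print the firewall-cmd invocations for a given number of sqreamd sessions.
# Assumption: session ports are contiguous, starting at 5000 and 5100.
firewall_commands() {
    local sessions last port
    sessions=$1
    last=$((sessions - 1))
    for port in 8080 3105 3108 "5000-$((5000 + last))" "5100-$((5100 + last))"; do
        printf 'firewall-cmd --zone=public --permanent --add-port=%s/tcp\n' "$port"
    done
}

# Hypothetical usage: review the commands, then run them in a root shell.
#   firewall_commands 8
```

Printing the commands rather than executing them leaves room to review the list before changing the firewall.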
2.1.4.18.3.19 Disabling selinux
To disable selinux: 1. Show the status of selinux:
$ sudo sestatus
2. If the output is not disabled, edit the /etc/selinux/config file:
$ sudo vim /etc/selinux/config
3. Change SELINUX=enforcing to SELINUX=disabled.
The above change only takes effect after rebooting the server. To stop SELinux from enforcing its policy immediately, without waiting for the reboot, run the following command:
$ sudo setenforce 0
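To confirm that the configuration file was edited as intended before rebooting, the configured mode can be read back. The following is a minimal sketch; selinux_config_mode is a hypothetical helper name.

```shell
#!/bin/sh
# Print the mode assigned by the SELINUX= line of an SELinux config file.
# Commented lines and the SELINUXTYPE= line are ignored.
selinux_config_mode() {
    sed -n 's/^SELINUX=\([a-z]*\).*/\1/p' "$1" | tail -n 1
}

# Hypothetical usage:
#   [ "$(selinux_config_mode /etc/selinux/config)" = "disabled" ] \
#       || echo "SELinux is still configured as enabled" >&2
```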
2.1.4.18.3.20 Configuring the /etc/hosts File
To configure the /etc/hosts file: 1. Edit the /etc/hosts file:
$ sudo vim /etc/hosts
2. Add an entry for your local host:
$ 127.0.0.1 localhost $
2.1.4.18.3.21 Configuring the DNS
To configure the DNS:
1. Run the ifconfig command to check your NIC name, then edit the configuration file of that NIC. In the following example, eth0 is the NIC name:
$ sudo vim /etc/sysconfig/network-scripts/ifcfg-eth0
2. In the file, replace the DNS lines with your own DNS addresses. In the following example, 4.4.4.4 and 8.8.8.8 are the DNS addresses:
DNS1="4.4.4.4"
DNS2="8.8.8.8"
2.1.4.18.3.22 Installing the Nvidia CUDA Driver
Warning: If your UI runs on the server, the server must be stopped before installing the CUDA drivers.
2.1.4.18.3.23 CUDA Driver Prerequisites
1. Verify that the NVIDIA card has been installed and is detected by the system:
$ lspci | grep -i nvidia
2. Check which version of gcc has been installed:
$ gcc --version
3. If gcc has not been installed, install it for one of the following operating systems: • On RHEL/CentOS:
$ sudo yum install -y gcc
• On Ubuntu:
$ sudo apt-get install gcc
2.1.4.18.3.24 Updating the Kernel Headers
To update the kernel headers: 1. Update the kernel headers on one of the following operating systems: • On RHEL/CentOS:
$ sudo yum install kernel-devel-$(uname -r) kernel-headers-$(uname -r)
• On Ubuntu:
$ sudo apt-get install linux-headers-$(uname -r)
2. Install wget on one of the following operating systems: • On RHEL/CentOS:
$ sudo yum install wget
• On Ubuntu:
$ sudo apt-get install wget
2.1.4.18.3.25 Disabling Nouveau
You can disable Nouveau, which is the default driver. To disable Nouveau: 1. Check if the Nouveau driver has been loaded:
$ lsmod | grep nouveau
If the Nouveau driver has been loaded, the command above generates output. 2. Blacklist the Nouveau drivers to disable them:
$ cat <
3. Regenerate the kernel initramfs directory set:
a. Modify the initramfs directory set:
$ sudo dracut --force
b. Reboot the server:
$ sudo reboot
2.1.4.18.3.26 Installing the CUDA Driver
This section describes how to install the CUDA driver.
Notice: The version of the driver installed on the customer’s server must be equal to or higher than the version of the driver included in the SQream release package. Contact a SQream customer service representative to identify the correct version to install.
2.1.4.18.3.27 Installing the CUDA Driver from the Repository
Installing the CUDA driver from the repository is the recommended installation method.
To install the CUDA driver from the repository:
1. Install the CUDA dependencies for one of the following operating systems:
• For RHEL:
$ sudo rpm -Uvh http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
• For CentOS:
$ sudo yum install epel-release
2. Install the CUDA dependencies from the epel repository:
$ sudo yum install dkms libvdpau
Installing the CUDA dependencies from the epel repository is only required for installing the runfile.
3. Download the required local repository:
$ wget http://developer.download.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda-repo-rhel7-10-1-local-10.1.243-418.87.00-1.0-1.x86_64.rpm
4. Install the required local repository:
$ sudo yum localinstall cuda-repo-rhel7-10-1-local-10.1.243-418.87.00-1.0-1.x86_64.rpm
The example above uses the RHEL7 repository for CUDA 10.1.
5. Install the CUDA drivers:
a. Clear the YUM cache:
$ sudo yum clean all
b. Install the most current DKMS (Dynamic Kernel Module Support) NVIDIA driver:
$ sudo yum -y install nvidia-driver-latest-dkms
6. Verify that the installation was successful:
$ nvidia-smi
You can prepare the CUDA driver offline from a server connected to the CUDA repo by running the following commands as a root user:
7.
Query all the packages installed in your system, and verify that cuda-repo has been installed:
$ rpm -qa | grep cuda-repo
8. Navigate to the correct repository:
$ cd /etc/yum.repos.d/
9. List the directory contents in long format and filter for the cuda file:
$ ls -l | grep cuda
The following is an example of the correct output:
cuda-10-1-local.repo
10. Edit the /etc/yum.repos.d/cuda-10-1-local.repo file:
$ vim /etc/yum.repos.d/cuda-10-1-local.repo
The file should contain a line such as the following:
name=cuda-10-1-local
11. Clone the repository to a location where it can be copied from:
$ reposync -g -l -m --repoid=cuda-10-1-local --download_path=/var/cuda-repo-10.1-local
12. Copy the repository to the installation server and create the repository:
$ createrepo -g comps.xml /var/cuda-repo-10.1-local
13. Add a repo configuration file in /etc/yum.repos.d/ by editing the /etc/yum.repos.d/cuda-10.1-local.repo file:
[cuda-10.1-local]
name=cuda-10.1-local
baseurl=file:///var/cuda-repo-10.1-local
enabled=1
gpgcheck=1
gpgkey=file:///var/cuda-repo-10-1-local/7fa2af80.pub
14. Install the CUDA drivers by installing the most current DKMS (Dynamic Kernel Module Support) NVIDIA driver as a root user logged in shell:
$ sudo yum -y install nvidia-driver-latest-dkms
2.1.4.18.3.28 Tuning Up NVIDIA Performance
This section describes how to tune up NVIDIA performance. The procedures in this section are relevant to Intel only.
To tune up NVIDIA performance:
1. Change the permissions on the rc.local file to executable:
$ sudo chmod +x /etc/rc.local
2. Edit the /etc/rc.local file:
$ sudo vim /etc/rc.local
3. Add the following lines:
a. For V100:
nvidia-persistenced
b. For IBM (mandatory):
$ sudo systemctl enable nvidia-persistenced
Notice: Setting up the NVIDIA POWER9 CUDA driver includes additional set-up requirements.
The NVIDIA POWER9 CUDA driver will not function properly if the additional set-up requirements are not followed. See POWER9 Setup for the additional set-up requirements. c. For K80: $ nvidia-persistenced $ nvidia-smi -pm 1 $ nvidia-smi -acp 0 $ nvidia-smi --auto-boost-permission=0 $ nvidia-smi --auto-boost-default=0 4. Reboot the server and run the NVIDIA System Management Interface (NVIDIA SMI): $ nvidia-smi 2.1.4.18.3.29 Disabling Automatic Bug Reporting Tools To disable automatic bug reporting tools: 1. Run the following abort commands: $ for i in abrt-ccpp.service abrtd.service abrt-oops.service abrt-pstoreoops. ,→service abrt-vmcore.service abrt-xorg.service ; do sudo systemctl disable $i;␣ ,→sudo systemctl stop $i; done The server is ready for the SQream software installation. 2.1. Full list of guides 301 SQream DB, Release 2021.1 2. Run the following checks: a. Check the OS release: $ cat /etc/os-release b. Verify that a SQream user exists and has the same ID on all cluster member services: $ id sqream c. Verify that the storage is mounted: $ mount d. Verify that the driver has been installed correctly: $ nvidia-smi e. Check the maximum value of the fs.file: $ sysctl -n fs.file-max The desired output when checking the maximum value of the fs.file is greater or equal to 2097152. f. Run the following command as a SQream user: 3. Configure the security limits by running the echo -e command as a root user logged in shell: $ sudo bash $ echo -e "sqream soft nproc 1000000\nsqream hard nproc 1000000\nsqream soft nofile␣ ,→1000000\nsqream hard nofile 1000000\nsqream soft core unlimited\nsqream hard core␣ ,→unlimited" >> /etc/security/limits.conf 2.1.4.18.3.30 Enabling Core Dumps Enabling core dumps is recommended, but optional. To enable core dumps: 1. Check the abrtd Status 2. Set the limits 3. Create the core dumps directory. 2.1.4.18.3.31 Checking the abrtd Status To check the abrtd status: 1. Check if abrtd is running: $ sudo ps -ef |grep abrt 2. 
If abrtd is running, stop it: 302 Chapter 2. Guides SQream DB, Release 2021.1 $ sudo service abrtd stop $ sudo chkconfig abrt-ccpp off $ sudo chkconfig abrt-oops off $ sudo chkconfig abrt-vmcore off $ sudo chkconfig abrt-xorg off $ sudo chkconfig abrtd off 2.1.4.18.3.32 Setting the Limits To set the limits: 1. Set the limits: $ ulimit -c 2. If the output is 0, add the following lines to the limits.conf file (/etc/security): $ * soft core unlimited $ * hard core unlimited 3. Log out and log in to apply the limit changes. 2.1.4.18.3.33 Creating the Core Dumps Directory To set the core dumps directory: 1. Make the /tmp/core_dumps directory: $ mkdir /tmp/core_dumps 2. Set the ownership of the /tmp/core_dumps directory: $ sudo chown sqream.sqream /tmp/core_dumps 3. Grant read, write, and execute permissions to all users: $ sudo chmod -R 777 /tmp/core_dumps 2.1.4.18.3.34 Setting the Output Directory of the /etc/sysctl.conf File To set the output directory of the /etc/sysctl.conf file: 1. Edit the /etc/sysctl.conf file: $ sudo vim /etc/sysctl.conf 2. Add the following to the bottom of the file: $ kernel.core_uses_pid = 1 $ kernel.core_pattern = / 2.1. Full list of guides 303 SQream DB, Release 2021.1 3. To apply the changes without rebooting the server, run: $ sudo sysctl -p 4. Check that the core output directory points to the following: $ sudo cat /proc/sys/kernel/core_pattern The following shows the correct generated output: $ /tmp/core_dumps/core-%e-%s-%u-%g-%p-%t 5. Verify that the core dumping works: $ select abort_server(); 2.1.4.18.3.35 Verifying that the Core Dumps Work You can verify that the core dumps work only after installing and running SQream. This causes the server to crash and a new core.xxx file to be included in the folder that is written in /etc/sysctl.conf To verify that the core dumps work: 1. Stop and restart all SQream services. 2. 
Connect to SQream with ClientCmd and run the following command: $ select abort_server(); 2.1.4.18.3.36 Troubleshooting Core Dumping This section describes the troubleshooting procedure to be followed if all parameters have been configured correctly, but the cores have not been created. To troubleshoot core dumping: 1. Reboot the server. 2. Verify that you have folder permissions: $ sudo chmod -R 777 /tmp/core_dumps 3. Verify that the limits have been set correctly: $ ulimit -c If all parameters have been configured correctly, the correct output is: $ unlimited 4. If all parameters have been configured correctly, but running ulimit -c outputs 0, run the following: $ sudo vim /etc/profile 5. Search for line and tag it with the hash symbol: 304 Chapter 2. Guides SQream DB, Release 2021.1 $ ulimit -S -c 0 > /dev/null 2>&1 6. If the line is not found in /etc/profile directory, do the following: a. Run the following command: $ sudo vim /etc/init.d/functions b. Search for the following: $ ulimit -S -c ${DAEMON_COREFILE_LIMIT:-0} >/dev/null 2>&1 c. If the line is found, tag it with the hash symbol and reboot the server. 2.1.5 Features guides 2.1.5.1 Access control In this topic: • Overview • Terminology – Roles – Authentication – Authorization • Roles – Creating new roles (users) – Dropping a user – Altering a user name – Changing user passwords – Public Role – Role membership (groups) • Permissions – GRANT – REVOKE – Default permissions • Departmental Example – Setting up the department permissions – Creating new users in the departments 2.1. Full list of guides 305 SQream DB, Release 2021.1 2.1.5.1.1 Overview Access control provides authentication and authorization in SQream DB. SQream DB manages authentication and authorization using a role-based access control system (RBAC), like ANSI SQL and other SQL products. SQream DB has a default permissions system which is inspired by Postgres, but with more power. 
In most cases, this allows an administrator to set things up so that every object gets permissions set automatically. In SQream DB, users log in from any worker which verifies their roles and permissions from the metadata server. Each statement issues commands as the currently logged in role. Roles are defined at the cluster level, meaning they are valid for all databases in thecluster. To bootstrap SQream DB, a new install will always have one SUPERUSER role, typically named sqream. To create more roles, you should first connect as this role. 2.1.5.1.2 Terminology 2.1.5.1.2.1 Roles Role : a role can be a user, a group, or both. Roles can own database objects (e.g. tables), and can assign permissions on those objects to other roles. Roles can be members of other roles, meaning a user role can inherit permissions from its parent role. 2.1.5.1.2.2 Authentication Authentication : verifying the identity of the role. User roles have usernames (role names) and passwords. 2.1.5.1.2.3 Authorization Authorization : checking the role has permissions to do a particular thing. The GRANT command is used for this. 2.1.5.1.3 Roles Roles are used for both users and groups. Roles are global across all databases in the SQream DB cluster. To use a ROLE as a user, it should have a password, the login permission, and connect permissions to the relevant databases. 2.1.5.1.3.1 Creating new roles (users) A user role can log in to the database, so it should have LOGIN permissions, as well as a password. For example: 306 Chapter 2. Guides SQream DB, Release 2021.1 CREATE ROLE role_name ; GRANT LOGIN to role_name ; GRANT PASSWORD 'new_password' to role_name ; GRANT CONNECT ON DATABASE database_name to role_name ; Examples: CREATE ROLE new_role_name ; GRANT LOGIN TO new_role_name; GRANT PASSWORD 'my_password' TO new_role_name; GRANT CONNECT ON DATABASE master TO new_role_name; A database role may have a number of permissions that define what tasks it can perform. 
These are assigned using the GRANT command. 2.1.5.1.3.2 Dropping a user DROP ROLE role_name ; Examples: DROP ROLE admin_role ; 2.1.5.1.3.3 Altering a user name Renaming a user’s role: ALTER ROLE role_name RENAME TO new_role_name ; Examples: ALTER ROLE admin_role RENAME TO copy_role ; 2.1.5.1.3.4 Changing user passwords To change a user role’s password, grant the user a new password. GRANT PASSWORD 'new_password' TO rhendricks; Note: Granting a new password overrides any previous password. Changing the password while the role has an active running statement does not affect that statement, but will affect subsequent statements. 2.1.5.1.3.5 Public Role There is a public role which always exists. Each role is granted to the PUBLIC role (i.e. is a member of the public group), and this cannot be revoked. You can alter the permissions granted to the public role. 2.1. Full list of guides 307 SQream DB, Release 2021.1 The PUBLIC role has USAGE and CREATE permissions on PUBLIC schema by default, therefore, new users can create, INSERT, DELETE, and SELECT from objects in the PUBLIC schema. 2.1.5.1.3.6 Role membership (groups) Many database administrators find it useful to group user roles together. By grouping users, permissions can be granted to, or revoked from a group with one command. In SQream DB, this is done by creating a group role, granting permissions to it, and then assigning users to that group role. To use a role purely as a group, omit granting it LOGIN and PASSWORD permissions. The CONNECT permission can be given directly to user roles, and/or to the groups they are part of. CREATE ROLE my_group; Once the group role exists, you can add user roles (members) using the GRANT command. For example: -- Add my_user to this group GRANT my_group TO my_user; To manage object permissions like databases and tables, you would then grant permissions to the group-level role (see the permissions table below. All member roles then inherit the permissions from the group. 
For example: -- Grant all group users connect permissions GRANT CONNECT ON DATABASE a_database TO my_group; -- Grant all permissions on tables in public schema GRANT ALL ON all tables IN schema public TO my_group; Removing users and permissions can be done with the REVOKE command: -- remove my_other_user from this group REVOKE my_group FROM my_other_user; 308 Chapter 2. Guides SQream DB, Release 2021.1 2.1.5.1.4 Permissions Ob- Permission Description ject/layer all LOGIN use role to log into the system (the role also needs connect permission on the databases database it is connecting to) all PASSWORD the password used for logging into the system databases all SUPERUSER no permission restrictions on any activity databases database SUPERUSER no permission restrictions on any activity within that database (this does not include modifying roles or permissions) database CONNECT connect to the database database CREATE create schemas in the database database CREATE create and drop functions FUNCTION schema USAGE allows additional permissions within the schema schema CREATE create tables in the schema table SELECT SELECT from the table table INSERT INSERT into the table table DELETE DELETE and TRUNCATE on the table table DDL drop and alter on the table table ALL all the table permissions function EXECUTE use the function function DDL drop and alter on the function function ALL all function permissions 2.1.5.1.4.1 GRANT GRANT gives permissions to a role. -- Grant permissions at the instance/ storage cluster level: GRANT { SUPERUSER | LOGIN | PASSWORD ' -- Grant permissions at the database level: GRANT {{CREATE | CONNECT| DDL | SUPERUSER | CREATE FUNCTION} [, ...] | ALL␣ ,→[PERMISSIONS]} ON DATABASE -- Grant permissions at the schema level: GRANT {{ CREATE | DDL | USAGE | SUPERUSER } [, ...] | ALL [ PERMISSIONS ]} (continues on next page) 2.1. 
Full list of guides 309 SQream DB, Release 2021.1 (continued from previous page) ON SCHEMA -- Grant permissions at the object level: GRANT {{SELECT | INSERT | DELETE | DDL } [, ...] | ALL [PERMISSIONS]} ON { TABLE -- Grant execute function permission: GRANT {ALL | EXECUTE | DDL} ON FUNCTION function_name TO role; -- Allows role2 to use permissions granted to role1 GRANT -- Also allows the role2 to grant role1 to other roles: GRANT GRANT examples: GRANT LOGIN,superuser TO admin; GRANT CREATE FUNCTION ON database master TO admin; GRANT SELECT ON TABLE admin.table1 TO userA; GRANT EXECUTE ON FUNCTION my_function TO userA; GRANT ALL ON FUNCTION my_function TO userA; GRANT DDL ON admin.main_table TO userB; GRANT ALL ON all tables IN schema public TO userB; GRANT admin TO userC; GRANT superuser ON schema demo TO userA GRANT admin_role TO userB; 2.1.5.1.4.2 REVOKE REVOKE removes permissions from a role. -- Revoke permissions at the instance/ storage cluster level: REVOKE { SUPERUSER | LOGIN (continues on next page) 310 Chapter 2. Guides SQream DB, Release 2021.1 (continued from previous page) | PASSWORD } FROM -- Revoke permissions at the database level: REVOKE {{CREATE | CONNECT | DDL | SUPERUSER | CREATE FUNCTION}[, ...] |ALL [PERMISSIONS]} ON DATABASE -- Revoke permissions at the schema level: REVOKE {{ CREATE | DDL | USAGE | SUPERUSER } [, ...] | ALL [PERMISSIONS]} ON SCHEMA -- Revoke permissions at the object level: REVOKE {{ SELECT | INSERT | DELETE | DDL } [, ...] 
| ALL } ON {[ TABLE ] -- Removes access to permissions in role1 by role 2 REVOKE -- Removes permissions to grant role1 to additional roles from role2 REVOKE Examples: REVOKE superuser on schema demo from userA; REVOKE delete on admin.table1 from userB; REVOKE login from role_test; REVOKE CREATE FUNCTION FROM admin; 2.1.5.1.4.3 Default permissions The default permissions system (See ALTER DEFAULT PERMISSIONS) can be used to automatically grant permissions to newly created objects (See the departmental example below for one way it can be used). A default permissions rule looks for a schema being created, or a table (possibly by schema), and is table to grant any permission to that object to any role. This happens when the create table or create schema statement is run. ALTER DEFAULT PERMISSIONS FOR target_role_name [IN schema_name, ...] FOR { TABLES | SCHEMAS } { grant_clause | DROP grant_clause} TO ROLE { role_name | public }; (continues on next page) 2.1. Full list of guides 311 SQream DB, Release 2021.1 (continued from previous page) grant_clause ::= GRANT { CREATE FUNCTION | SUPERUSER | CONNECT | CREATE | USAGE | SELECT | INSERT | DELETE | DDL | EXECUTE | ALL } 2.1.5.1.5 Departmental Example You work in a company with several departments. The example below shows you how to manage permissions in a database shared by multiple departments, where each department has different roles for the tables by schema. It walks you through how tosetthe permissions up for existing objects and how to set up default permissions rules to cover newly created objects. The concept is that you set up roles for each new schema with the correct permissions, then the existing users can use these roles. A superuser must do new setup for each new schema which is a limitation, but superuser permissions are not needed at any other time, and neither are explicit grant statements or object ownership changes. 
In the example, the database is called my_database, and the new or existing schema being set up to be managed in this way is called my_schema. There will be a group for this schema for each of the following: Group Activities database designers create, alter and drop tables updaters insert and delete data readers read data security officers add and remove users from these groups 2.1.5.1.5.1 Setting up the department permissions As a superuser, you connect to the system and run the following: -- create the groups CREATE ROLE my_schema_security_officers; CREATE ROLE my_schema_database_designers; CREATE ROLE my_schema_updaters; CREATE ROLE my_schema_readers; (continues on next page) 312 Chapter 2. Guides SQream DB, Release 2021.1 Fig. 4: Our departmental example has four user group roles and seven users roles (continued from previous page) -- grant permissions for each role -- we grant permissions for existing objects here too, -- so you don't have to start with an empty schema -- security officers GRANT connect ON DATABASE my_database TO my_schema_security_officers; GRANT usage ON SCHEMA my_schema TO my_schema_security_officers; GRANT my_schema_database_designers TO my_schema_security_officers WITH ADMIN OPTION; GRANT my_schema_updaters TO my_schema_security_officers WITH ADMIN OPTION; GRANT my_schema_readers TO my_schema_security_officers WITH ADMIN OPTION; -- database designers GRANT connect ON DATABASE my_database TO my_schema_database_designers; GRANT usage ON SCHEMA my_schema TO my_schema_database_designers; GRANT create,ddl ON SCHEMA my_schema TO my_schema_database_designers; -- updaters GRANT connect ON DATABASE my_database TO my_schema_updaters; GRANT usage ON SCHEMA my_schema TO my_schema_updaters; GRANT SELECT,INSERT,DELETE ON ALL TABLES IN SCHEMA my_schema TO my_schema_updaters; -- readers (continues on next page) 2.1. 
Full list of guides 313 SQream DB, Release 2021.1 (continued from previous page) GRANT connect ON DATABASE my_database TO my_schema_readers; GRANT usage ON SCHEMA my_schema TO my_schema_readers; GRANT SELECT ON ALL TABLES IN SCHEMA my_schema TO my_schema_readers; GRANT EXECUTE ON ALL FUNCTIONS TO my_schema_readers; -- create the default permissions for new objects ALTER DEFAULT PERMISSIONS FOR my_schema_database_designers IN my_schema FOR TABLES GRANT SELECT,INSERT,DELETE TO my_schema_updaters; -- For every table created by my_schema_database_designers, give access to my_schema_ ,→readers: ALTER DEFAULT PERMISSIONS FOR my_schema_database_designers IN my_schema FOR TABLES GRANT SELECT TO my_schema_readers; Note: • This process needs to be repeated by a user with SUPERUSER permissions each time a new schema is brought into this permissions management approach. • By default, any new object created will not be accessible by our new my_schema_readers group. Running a GRANT SELECT ... only affects objects that already exist in the schema or database. If you’re getting a Missing the following permissions: SELECT on table 'database.public. tablename' error, make sure that you’ve altered the default permissions with the ALTER DEFAULT PERMISSIONS statement. 2.1.5.1.5.2 Creating new users in the departments After the group roles have been created, you can now create user roles for each of your users. -- create the new database designer users CREATE ROLE ecodd; GRANT LOGIN TO ecodd; GRANT PASSWORD 'ecodds_secret_password' TO ecodd; GRANT CONNECT ON DATABASE my_database TO ecodd; GRANT my_schema_database_designers TO ecodd; CREATE ROLE ebachmann; GRANT LOGIN TO ebachmann; GRANT PASSWORD 'another_secret_password' TO ebachmann; GRANT CONNECT ON DATABASE my_database TO ebachmann; GRANT my_database_designers TO ebachmann; -- If a user already exists, we can assign that user directly to the group (continues on next page) 314 Chapter 2. 
Guides SQream DB, Release 2021.1 (continued from previous page) GRANT my_schema_updaters TO rhendricks; -- Create users in the readers group CREATE ROLE jbarker; GRANT LOGIN TO jbarker; GRANT PASSWORD 'action_jack' TO jbarker; GRANT CONNECT ON DATABASE my_database TO jbarker; GRANT my_schema_readers TO jbarker; CREATE ROLE lbream; GRANT LOGIN TO lbream; GRANT PASSWORD 'artichoke123' TO lbream; GRANT CONNECT ON DATABASE my_database TO lbream; GRANT my_schema_readers TO lbream; CREATE ROLE pgregory; GRANT LOGIN TO pgregory; GRANT PASSWORD 'c1ca6a' TO pgregory; GRANT CONNECT ON DATABASE my_database TO pgregory; GRANT my_schema_readers TO pgregory; -- Create users in the security officers group CREATE ROLE hoover; GRANT LOGIN TO hoover; GRANT PASSWORD 'mintchip' TO hoover; GRANT CONNECT ON DATABASE my_database TO hoover; GRANT my_schema_security_officers TO hoover; After this setup: • Database designers will be able to run any ddl on objects in the schema and create new objects, including ones created by other database designers • Updaters will be able to insert and delete to existing and new tables • Readers will be able to read from existing and new tables All this will happen without having to run any more GRANT statements. Any security officer will be able to add and remove users from these groups. Creating and dropping login users themselves must be done by a superuser. 2.1.5.2 Concurrency and locks Locks are used in SQream DB to provide consistency when there are multiple concurrent transactions updating the database. Read only transactions are never blocked, and never block anything. Even if you drop a database while concurrently running a query on it, both will succeed correctly (as long as the query starts running before the drop database commits). 2.1. 

2.1.5.2.1 Locking modes

SQream DB has two kinds of locks:
• exclusive - this lock mode prevents the resource from being modified by other statements. This lock tells other statements that they'll have to wait in order to change an object. DDL operations are always exclusive. They block other DDL operations, and update DML operations (insert and delete).
• inclusive - For insert operations, an inclusive lock is obtained on a specific object. This prevents other statements from obtaining an exclusive lock on the object. This lock allows other statements to insert or delete data from a table, but they'll have to wait in order to run DDL.

2.1.5.2.2 When are locks obtained?

    Operation        | SELECT     | INSERT     | DELETE, TRUNCATE | DDL
    -----------------+------------+------------+------------------+-----------
    SELECT           | Concurrent | Concurrent | Concurrent       | Concurrent
    INSERT           | Concurrent | Concurrent | Concurrent       | Wait
    DELETE, TRUNCATE | Concurrent | Concurrent | Wait             | Wait
    DDL              | Concurrent | Wait       | Wait             | Wait

Statements that wait will exit with an error if they hit the lock timeout. The default timeout is 3 seconds, see statementLockTimeout.

2.1.5.2.2.1 Global locks

Some operations require exclusive global locks at the cluster level. These usually short-lived locks will be obtained for the following operations:
• CREATE DATABASE
• CREATE ROLE
• CREATE TABLE
• ALTER ROLE
• ALTER TABLE
• DROP DATABASE
• DROP ROLE
• DROP TABLE
• GRANT
• REVOKE

2.1.5.2.3 Monitoring locks

Monitoring locks across the cluster can be useful when transaction contention takes place, and statements appear "stuck" while waiting for a previous statement to release locks.

The utility SHOW_LOCKS can be used to see the active locks.

In this example, we create a table based on results (CREATE TABLE AS), but we are also effectively dropping the previous table (by using OR REPLACE which also drops the table).
Thus, SQream DB applies locks during the table creation process to prevent the table from being altered during its creation.

    t=> SELECT SHOW_LOCKS();
    statement_id | statement_string                                                                                | username | server       | port | locked_object              | lockmode  | statement_start_time | lock_start_time
    -------------+-------------------------------------------------------------------------------------------------+----------+--------------+------+----------------------------+-----------+----------------------+--------------------
    287          | CREATE OR REPLACE TABLE nba2 AS SELECT "Name" FROM nba WHERE REGEXP_COUNT("Name", '( )+', 8)>1; | sqream   | 192.168.1.91 | 5000 | database$t                 | Inclusive | 2019-12-26 00:03:30  | 2019-12-26 00:03:30
    287          | CREATE OR REPLACE TABLE nba2 AS SELECT "Name" FROM nba WHERE REGEXP_COUNT("Name", '( )+', 8)>1; | sqream   | 192.168.1.91 | 5000 | globalpermission$          | Exclusive | 2019-12-26 00:03:30  | 2019-12-26 00:03:30
    287          | CREATE OR REPLACE TABLE nba2 AS SELECT "Name" FROM nba WHERE REGEXP_COUNT("Name", '( )+', 8)>1; | sqream   | 192.168.1.91 | 5000 | schema$t$public            | Inclusive | 2019-12-26 00:03:30  | 2019-12-26 00:03:30
    287          | CREATE OR REPLACE TABLE nba2 AS SELECT "Name" FROM nba WHERE REGEXP_COUNT("Name", '( )+', 8)>1; | sqream   | 192.168.1.91 | 5000 | table$t$public$nba2$Insert | Exclusive | 2019-12-26 00:03:30  | 2019-12-26 00:03:30
    287          | CREATE OR REPLACE TABLE nba2 AS SELECT "Name" FROM nba WHERE REGEXP_COUNT("Name", '( )+', 8)>1; | sqream   | 192.168.1.91 | 5000 | table$t$public$nba2$Update | Exclusive | 2019-12-26 00:03:30  | 2019-12-26 00:03:30

2.1.5.2.4 Troubleshooting locks

In rare situations, a lock is never freed.

The workflow for troubleshooting locks is:
1. Identify which statement has obtained locks
2. Understand if the statement is itself stuck, or waiting for another statement
3. Try to abort the offending statement
4. Force the stale locks to be removed

2.1.5.2.4.1 Example

We will assume that the statement from the previous example is stuck (statement #287).
We can attempt to abort it using STOP_STATEMENT:

    t=> SELECT STOP_STATEMENT(287);
    executed

If the locks still appear in the SHOW_LOCKS utility, we can force remove the stale locks:

    t=> SELECT RELEASE_DEFUNCT_LOCKS();
    executed

Warning: This operation can cause some statements to fail on the specific worker on which they are queued. This is intended as a "last resort" to solve stale locks.

2.1.5.3 Concurrency and scaling in SQream DB

A SQream DB cluster can concurrently run one regular statement per worker process. A number of small statements will execute alongside these statements without waiting or blocking anything.

SQream DB supports n concurrent statements by having n workers in a cluster. Each worker uses a fixed slice of a GPU's memory, typically around 8-16GB of GPU memory per worker. This size is ideal for queries running on large data with potentially large row sizes.

2.1.5.3.1 Scaling when data sizes grow

For many statements, SQream DB scales linearly when adding more storage and querying on large data sets. It uses very optimised 'brute force' algorithms and implementations, which don't suffer from sudden performance cliffs at larger data sizes.

2.1.5.3.2 Scaling when queries are queueing

SQream DB scales well by adding more workers, GPUs, and nodes to support more concurrent statements.

2.1.5.3.3 What to do when queries are slow

Adding more workers or GPUs does not boost the performance of a single statement or query.

To boost the performance of a single statement, start by examining the best practices and ensure the guidelines are followed.

Adding additional RAM to nodes, using more GPU memory, and faster CPUs or storage can also sometimes help.

Need help?

Analyzing complex workloads can be challenging. SQream's customer support team has the experience to advise on these matters. Visit SQream's support portal for additional support.
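
As a rough illustration of the one-statement-per-worker model described in this section, the following toy calculation estimates the wall time needed to drain a queue of statements. This is a simplified sketch, not a description of SQream DB's scheduler: the function name batch_wall_time and the assumption that every statement takes the same time are ours.

```python
import math


def batch_wall_time(num_statements: int, num_workers: int,
                    seconds_per_statement: float) -> float:
    """Estimate wall time when each worker runs one statement at a time.

    Toy model (an assumption for illustration only): all statements take
    the same time, and a queued statement starts as soon as a worker is free.
    """
    if num_workers < 1:
        raise ValueError("at least one worker is required")
    # Statements run in "waves" of at most num_workers concurrent statements.
    waves = math.ceil(num_statements / num_workers)
    return waves * seconds_per_statement


# A single statement is not sped up by adding workers...
single_on_1 = batch_wall_time(1, 1, 60.0)
single_on_4 = batch_wall_time(1, 4, 60.0)
# ...but a queue of statements drains faster.
queue_on_1 = batch_wall_time(8, 1, 60.0)
queue_on_4 = batch_wall_time(8, 4, 60.0)
print(single_on_1, single_on_4, queue_on_1, queue_on_4)  # 60.0 60.0 480.0 120.0
```

In this model, going from one worker to four leaves the single statement at 60 seconds, while the queue of eight statements drops from 480 to 120 seconds. This matches the guidance above: add workers for concurrency, and tune the statement itself for single-query speed.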

2.1.5.4 Metadata system

SQream DB contains a transparent and automatic system that collects metadata describing each chunk.

The collected metadata enables effective skipping of chunks and extents when queries are executed.

2.1.5.4.1 How is metadata collected?

When data is inserted into SQream DB, the load process splits data into chunks.

Several parameters are collected and stored for later use, including:
• Range of values for each column chunk (minimum, maximum)
• The number of values
• Additional information for query optimization

Data is collected automatically and transparently on every column type.

Fig. 5: Chunks are collections of rows from a column

Fig. 6: Metadata is automatically added to each chunk

2.1.5.4.2 How is metadata used?

Chunk metadata is used to identify which column values a chunk can contain, so that SQream DB can skip chunks that are not relevant and reduce unnecessary I/O operations.

For example, when a query specifies a filter (e.g. WHERE or JOIN condition) on a range of values that spans a fraction of the table values, SQream DB will optimally scan only that fraction of the table chunks.

Queries that filter on fine-grained date and time ranges will be the most effective, particularly when data is timestamped, and when tables contain a large amount of historical data.

2.1.5.4.3 Why is metadata always on?

Metadata collection adds very little overhead to data load. When possible, most metadata collection is performed in the GPU.

Metadata is collected for every chunk, and adds a handful of kilobytes at most per million values, and very few compute cycles.

At scale, metadata collection is often negligible, resulting in a 0.005% overhead. For a 10TB dataset, the metadata storage overhead is estimated at 0.5GB.

Because SQream DB's metadata collection is so light-weight and often results in effective data skipping, it is always-on.
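
The chunk skipping described above can be sketched with a small model. This is a minimal illustration under stated assumptions, not SQream DB's implementation: the ChunkMeta class, the chunks_to_scan function, and the sample data are hypothetical, and only the minimum/maximum range from the collected metadata is modelled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChunkMeta:
    """Hypothetical per-chunk metadata for one column: value range and count."""
    min_value: int
    max_value: int
    num_values: int


def chunks_to_scan(chunks: List[ChunkMeta], low: int, high: int) -> List[int]:
    """Return indexes of chunks whose [min, max] range overlaps [low, high].

    Chunks that cannot contain a matching value are skipped without being read.
    """
    return [
        i for i, c in enumerate(chunks)
        if c.max_value >= low and c.min_value <= high
    ]


# A column of day numbers loaded in natural (time) order:
# each chunk covers 10 consecutive days, days 0 to 99 in total.
chunks = [ChunkMeta(min_value=d, max_value=d + 9, num_values=1_000_000)
          for d in range(0, 100, 10)]

# Equivalent of a filter such as: WHERE day BETWEEN 25 AND 44
selected = chunks_to_scan(chunks, 25, 44)
print(selected)                     # [2, 3, 4]
print(len(chunks) - len(selected))  # 7 chunks skipped
```

Because the data is ordered, each chunk's range is narrow and only three of the ten chunks need to be read. If the same values were spread randomly across chunks, every chunk's range would overlap the filter and nothing could be skipped.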

2.1.5.5 Chunks and extents

All data in SQream DB is stored in logical tables. Each table is made up of rows, spanning one or more columns.

Internally however, SQream DB stores data partitioned vertically by column, and horizontally by chunks.

This topic describes the chunk concept, which can be helpful for tuning SQream DB workloads and table structures.

2.1.5.5.1 What are chunks? What are extents?

2.1.5.5.1.1 Chunks

All data in a table is automatically partitioned into columns, and each column is divided up further into chunks.

A chunk is a contiguous number of rows from a specific column. It can be thought of as an automatic partition that spans several millions of records of one column. A chunk is often between 1MB and a couple hundred megabytes uncompressed, depending on the data type (however, all data in SQream DB is stored compressed).

This chunk size is suitable for filtering and deleting data from large tables, which can contain hundreds of millions or even billions of chunks.

SQream DB adds metadata to chunks automatically.

Note: Chunking is automatic and transparent for all tables in SQream DB.

Fig. 7: Chunks are collections of rows from a column

2.1.5.5.1.2 Extents

The next step up from the chunk, an extent is a specific number of contiguous chunks.

Extents are designed to optimize disk access patterns, at around 20MB compressed, on-disk.

An extent will therefore include between 1 and 25 chunks, based on the actual compressed chunk size.

Fig. 8: Extents are a collection of several contiguous chunks

2.1.5.5.2 Why was SQream DB built with chunks and extents?

2.1.5.5.2.1 Benefits of chunking

Unlike node-partitioning (or sharding), chunking carries several benefits:
• Chunks are small enough to allow multiple workers to read them concurrently
• Chunks are optimized for fast insertion of data
• Chunks carry metadata, which narrows down their contents for the optimizer
• Chunks are ideal for data retention as they can be deleted en-masse
• Chunks are optimized for reading into RAM and the GPU
• Chunks are compressed individually, which improves compression and data locality

2.1.5.5.2.2 Storage reorganization

SQream DB performs some background storage reorganization to optimize I/O and read patterns.

For example, when data is inserted in small batches, SQream DB will run two background processes called rechunk and reextent to reorganize the data into larger contiguous chunks and extents. This is also what happens when data is deleted.

Data is never overwritten in SQream DB. Instead, new optimized chunks and extents are written to replace the old chunks and extents. Once all data has been rewritten, SQream DB will swap over to the new optimized chunks and extents and remove the old, unoptimized data.

2.1.5.5.2.3 Metadata

The chunk metadata that SQream DB collects enables effective skipping of chunks and extents when queries are executed. When a query specifies a filter (e.g. WHERE or JOIN condition) on a range of values that spans a fraction of the table values, SQream DB will optimally scan only that fraction of the table chunks.

Queries that filter on fine-grained date and time ranges will be the most effective, particularly when data is timestamped, and when tables contain a large amount of historical data.

See more in our Time based data management guide and our Metadata system guide.

2.1.5.6 Compression

SQream DB uses compression and encoding techniques to optimize query performance and save on disk space.

2.1.5.6.1 Encoding

Encoding converts data into a common format.

When data is stored in a columnar format, it is often in a common format.
This is in contrast with data stored in CSVs for example, where everything is stored in a text format.

Because encoding uses specific data formats and encodings, it increases performance and reduces data size.

SQream DB encodes data in several ways depending on the data type. For example, a date is stored as an integer, with March 1st 1CE as the start. This is a lot more efficient than encoding the date as a string, and offers a wider range than storing it relative to the Unix Epoch.

2.1.5.6.2 Compression

Compression transforms data into a smaller format without losing accuracy (lossless).

After encoding a set of column values, SQream DB packs the data and compresses it.

Before data can be accessed, SQream DB needs to decompress it.

Depending on the compression scheme, the operations can be performed on the CPU or the GPU. Some users find that GPU compressions perform better for their data.

2.1.5.6.2.1 Automatic compression

By default, SQream DB automatically compresses every column (see Specifying compressions below for overriding default compressions). This feature is called automatic adaptive compression strategy.

When loading data, SQream DB automatically decides on the compression schemes for specific chunks of data by trying several compression schemes and selecting the one that performs best. SQream DB tries to balance more aggressive compressions with the time and CPU/GPU time required to compress and decompress the data.

2.1.5.6.2.2 Compression strategies

    Compression name | Supported data types | Description | Location
    -----------------+----------------------+-------------+---------
    FLAT | All types | No compression (forced) | -
    DEFAULT | All types | Automatic scheme selection | -
    DICT | Integer types, dates and timestamps, short texts | Dictionary compression with RLE. For each chunk, SQream DB creates a dictionary of distinct values and stores only their indexes. Works best for integers and texts shorter than 120 characters, with <10% unique values. Useful for storing ENUMs or keys, stock tickers, and dimensions. If the data is also sorted, this compression will perform even better. | GPU
    P4D | Integer types, dates and timestamps | Patched frame-of-reference + Delta. Based on the delta between consecutive values. Works best for monotonically increasing or decreasing numbers and timestamps. | GPU
    LZ4 | Text types | Lempel-Ziv general purpose compression, used for texts | CPU
    SNAPPY | Text types | General purpose compression, used for texts | CPU
    RLE | Integer types, dates and timestamps | Run-length encoding. This replaces sequences of values with a single pair. It is best for low cardinality columns that are used to sort data (ORDER BY). | GPU
    SEQUENCE | Integer types | Optimized RLE + Delta type for built-in identity columns. | GPU

2.1.5.6.2.3 Specifying compression strategies

When creating a table without any compression specifications, SQream DB defaults to automatic adaptive compression ("default").

However, this can be overridden by specifying a compression strategy when creating a table.

2.1.5.6.2.4 Explicitly specifying automatic compression

The following two are equivalent:

    CREATE TABLE t (
       x INT,
       y VARCHAR(50)
    );

In this version, the default compression is specified explicitly:

    CREATE TABLE t (
       x INT CHECK('CS "default"'),
       y VARCHAR(50) CHECK('CS "default"')
    );

2.1.5.6.2.5 Forcing no compression (flat)

In some cases, you may wish to remove compression entirely on some columns, in order to reduce CPU or GPU resource utilization at the expense of increased I/O.

    CREATE TABLE t (
       x INT NOT NULL CHECK('CS "flat"'), -- This column won't be compressed
       y VARCHAR(50) -- This column will still be compressed automatically
    );

2.1.5.6.2.6 Forcing compressions

In some cases, you may wish to force SQream DB to use a specific compression scheme based on your knowledge of the data.
For example:

    CREATE TABLE t (
       id BIGINT NOT NULL CHECK('CS "sequence"'),
       y VARCHAR(110) CHECK('CS "lz4"'), -- General purpose text compression
       z VARCHAR(80) CHECK('CS "dict"')  -- Low cardinality column
    );

2.1.5.6.2.7 Examining compression effectiveness

Queries to the internal metadata catalog can expose how effective the compression is, as well as what compression schemes were selected.

Here is a sample query we can use to query the catalog:

    SELECT c.column_name AS "Column",
           cc.compression_type AS "Actual compression",
           AVG(cc.compressed_size) "Compressed",
           AVG(cc.uncompressed_size) "Uncompressed",
           AVG(cc.uncompressed_size::FLOAT/ cc.compressed_size) -1 AS "Compression effectiveness",
           MIN(c.compression_strategy) AS "Compression strategy"
     FROM sqream_catalog.chunk_columns cc
      INNER JOIN sqream_catalog.columns c
              ON cc.table_id = c.table_id
              AND cc.database_name = c.database_name
              AND cc.column_id = c.column_id

      WHERE c.table_name = 'some_table' -- This is the table name which we want to inspect

      GROUP BY 1,
               2;

Example (subset) from the ontime table:

    stats=> SELECT c.column_name AS "Column",
    .          cc.compression_type AS "Actual compression",
    .          AVG(cc.compressed_size) "Compressed",
    .          AVG(cc.uncompressed_size) "Uncompressed",
    .          AVG(cc.uncompressed_size::FLOAT/ cc.compressed_size) -1 AS "Compression effectiveness",
    .          MIN(c.compression_strategy) AS "Compression strategy"
    .    FROM sqream_catalog.chunk_columns cc
    .     INNER JOIN sqream_catalog.columns c
    .             ON cc.table_id = c.table_id
    .             AND cc.database_name = c.database_name
    .             AND cc.column_id = c.column_id
    .
    .     WHERE c.table_name = 'ontime'
    .
    .     GROUP BY 1,
    .              2;

    Column                    | Actual compression | Compressed | Uncompressed | Compression effectiveness | Compression strategy
    --------------------------+--------------------+------------+--------------+---------------------------+---------------------
    actualelapsedtime@null    | dict               | 129177     | 1032957      | 7                         | default
    actualelapsedtime@val     | dict               | 1379797    | 4131831      | 2                         | default
    airlineid                 | dict               | 578150     | 2065915      | 2.7                       | default
    airtime@null              | dict               | 130011     | 1039625      | 7                         | default
    airtime@null              | rle                | 93404      | 1019833      | 116575.61                 | default
    airtime@val               | dict               | 1142045    | 4131831      | 7.57                      | default
    arrdel15@null             | dict               | 129177     | 1032957      | 7                         | default
    arrdel15@val              | dict               | 129183     | 4131831      | 30.98                     | default
    arrdelay@null             | dict               | 129177     | 1032957      | 7                         | default
    arrdelay@val              | dict               | 1389660    | 4131831      | 2                         | default
    arrdelayminutes@null      | dict               | 129177     | 1032957      | 7                         | default
    arrdelayminutes@val       | dict               | 1356034    | 4131831      | 2.08                      | default
    arrivaldelaygroups@null   | dict               | 129177     | 1032957      | 7                         | default
    arrivaldelaygroups@val    | p4d                | 516539     | 2065915      | 3                         | default
    arrtime@null              | dict               | 129177     | 1032957      | 7                         | default
    arrtime@val               | p4d                | 1652799    | 2065915      | 0.25                      | default
    arrtimeblk                | dict               | 688870     | 9296621      | 12.49                     | default
    cancellationcode@null     | dict               | 129516     | 1035666      | 7                         | default
    cancellationcode@null     | rle                | 54392      | 1031646      | 131944.62                 | default
    cancellationcode@val      | dict               | 263149     | 1032957      | 4.12                      | default
    cancelled                 | dict               | 129183     | 4131831      | 30.98                     | default
    carrier                   | dict               | 578150     | 2065915      | 2.7                       | default
    carrierdelay@null         | dict               | 129516     | 1035666      | 7                         | default
    carrierdelay@null         | flat               | 1041250    | 1041250      | 0                         | default
    carrierdelay@null         | rle                | 4869       | 1026493      | 202740.2                  | default
    carrierdelay@val          | dict               | 834559     | 4131831      | 14.57                     | default
    crsarrtime                | p4d                | 1652799    | 2065915      | 0.25                      | default
    crsdeptime                | p4d                | 1652799    | 2065915      | 0.25                      | default
    crselapsedtime@null       | dict               | 130449     | 1043140      | 7                         | default
    crselapsedtime@null       | rle                | 3200       | 1013388      | 118975.75                 | default
    crselapsedtime@val        | dict               | 1182286    | 4131831      | 2.5                       | default
    dayofmonth                | dict               | 688730     | 1032957      | 0.5                       | default
    dayofweek                 | dict               | 393577     | 1032957      | 1.62                      | default
    departuredelaygroups@null | dict               | 129177     | 1032957      | 7                         | default
    departuredelaygroups@val  | p4d                | 516539     | 2065915      | 3                         | default
    depdel15@null             | dict               | 129177     | 1032957      | 7                         | default
    depdel15@val              | dict               | 129183     | 4131831      | 30.98                     | default
    depdelay@null             | dict               | 129177     | 1032957      | 7                         | default
    depdelay@val              | dict               | 1384453    | 4131831      | 2.01                      | default
    depdelayminutes@null      | dict               | 129177     | 1032957      | 7                         | default
    depdelayminutes@val       | dict               | 1362893    | 4131831      | 2.06                      | default
    deptime@null              | dict               | 129177     | 1032957      | 7                         | default
    deptime@val               | p4d                | 1652799    | 2065915      | 0.25                      | default
    deptimeblk                | dict               | 688870     | 9296621      | 12.49                     | default
    month                     | dict               | 247852     | 1035246      | 3.38                      | default
    month                     | rle                | 5          | 607346       | 121468.2                  | default
    origin                    | dict               | 1119457    | 3098873      | 1.78                      | default
    quarter                   | rle                | 8          | 1032957      | 136498.61                 | default
    securitydelay@null        | dict               | 129516     | 1035666      | 7                         | default
    securitydelay@null        | flat               | 1041250    | 1041250      | 0                         | default
    securitydelay@null        | rle                | 4869       | 1026493      | 202740.2                  | default
    securitydelay@val         | dict               | 581893     | 4131831      | 15.39                     | default
    tailnum@null              | dict               | 129516     | 1035666      | 7                         | default
    tailnum@null              | rle                | 38643      | 1031646      | 121128.68                 | default
    tailnum@val               | dict               | 1659918    | 12395495     | 22.46                     | default
    taxiin@null               | dict               | 130011     | 1039625      | 7                         | default
    taxiin@null               | rle                | 93404      | 1019833      | 116575.61                 | default
    taxiin@val                | dict               | 839917     | 4131831      | 8.49                      | default
    taxiout@null              | dict               | 130011     | 1039625      | 7                         | default
    taxiout@null              | rle                | 84327      | 1019833      | 116575.86                 | default
    taxiout@val               | dict               | 891539     | 4131831      | 8.28                      | default
    totaladdgtime@null        | dict               | 129516     | 1035666      | 7                         | default
    totaladdgtime@null        | rle                | 3308       | 1031646      | 191894.18                 | default
    totaladdgtime@val         | dict               | 465839     | 4131831      | 20.51                     | default
    uniquecarrier             | dict               | 578221     | 7230705      | 11.96                     | default
    year                      | rle                | 6          | 2065915      | 317216.08                 | default

2.1.5.6.2.8 Notes on reading this table:

1. Higher numbers in the effectiveness column represent better compressions. 0 represents a column that wasn't compressed at all.
2. Column names are the internal representation. Names with @null and @val suffixes represent a nullable column's null (boolean) and values respectively, but are treated as one logical column.
3. The query lists all actual compressions for a column, so a column may appear several times if the compression has changed mid-way through the loading (as with the carrierdelay column).
4. When default is the compression strategy, the system automatically selects the best compression. This can also mean no compression at all (flat).

2.1.5.6.3 Compression best practices

2.1.5.6.3.1 Let SQream DB decide on the compression strategy

In most cases, SQream DB will decide on the best compression strategy by itself.

When overriding compression strategies, we recommend benchmarking not just storage size but also query and load performance.

2.1.5.6.3.2 Maximize the advantage of each compression scheme

Some compression schemes perform better when data is organized in a specific way.
For example, to take advantage of RLE, sorting a column may result in better performance and reduced disk-space and I/O usage. Sorting a column partially may also be beneficial. As a rule of thumb, aim for run-lengths of more than 10 consecutive values.

2.1.5.6.3.3 Choose data types that fit the data

Adapting to the narrowest data type will improve query performance and also reduce disk space usage. In addition, smaller data types may compress better than larger types.

For example, use the smallest numeric data type that will accommodate your data. Using BIGINT for data that fits in INT or SMALLINT can use more disk space and memory for query execution.

Using FLOAT to store integers will reduce compression's effectiveness significantly.

2.1.5.7 Data clustering

SQream DB uses the information collected by chunking and the Metadata system to execute queries efficiently.

SQream DB automatically collects metadata on incoming data. This works well when the data is naturally ordered (e.g. with time-based data).

There are situations where you know more about the incoming data than SQream DB. If you help by defining clustering keys, SQream DB can automatically improve query processing. SQream DB's query optimizer typically selects the most efficient method when executing queries. If no clustering keys are available, it may have to scan tables physically.

2.1.5.7.1 Clustered tables

A table is considered "clustered" by one or more clustering keys if rows containing similar values with regard to these expressions are more likely to be located together on disk.

For example, a table containing a date column whose values cover a whole month, but where each chunk on disk covers less than a single day, is considered clustered by this column.

Good clustering has a significant positive impact on query performance.

2.1.5.7.2 When does clustering help?

When a table is well-clustered, the metadata collected for each chunk is much more effective (the ranges are more localized). In turn, SQream DB's query engine reads fewer irrelevant chunks (or avoids processing them).

Here are some common scenarios in which data clustering is beneficial:
• When a query contains a WHERE predicate of the form column COMPARISON value (for example, date_column > '2019-01-01' or id = 107) and the columns referenced are clustering keys, SQream DB will only read the portion of the data containing values matching these predicates.
• When two clustered tables are joined on their respective clustering keys, SQream DB will utilize the metadata to identify the matching chunks more easily.

2.1.5.7.3 Controlling data clustering

Some tables are naturally clustered. For example, a call log table containing CDRs can be naturally clustered by call date if data is inserted as it is generated, or bulk loaded in batches. Data can also be clustered by a region ID, per city, or customer type, depending on the source.

If the incoming data is not well-clustered (by the desired key), it is possible to tell SQream DB which keys it should cluster by.

This can be done upon table creation (CREATE TABLE), or retroactively (CLUSTER BY). New data will be clustered upon insert.

When data is loaded to an explicitly clustered table, SQream DB partially sorts it. While this slows down the insert time, it is often beneficial for subsequent queries.

Note: Some queries significantly benefit from the decision to use clustering. For example, queries that filter or join extensively on clustered columns will benefit. However, clustering can slow down data insertion. Some insert workloads can be up to 75% slower.
If you are not sure whether or not a specific scenario will benefit from clustering, we recommend testing end-to-end (both insert and query performance) on a small subset of the data before committing to permanent clustering keys.

2.1.5.7.4 Examples

2.1.5.7.4.1 Creating a clustered table

Even when the table is naturally ordered by start_date, we can specify a cluster key that is different. This will likely improve performance for queries that order by or rely on users' country.

    CREATE TABLE users (
       name VARCHAR(30) NOT NULL,
       start_date datetime not null,
       country VARCHAR(30) DEFAULT 'Unknown' NOT NULL
    ) CLUSTER BY country;

2.1.5.8 Deleting Data

SQream DB supports deleting data, but it's important to understand how this works and how to maintain deleted data.

2.1.5.8.1 How does deleting in SQream DB work?

In SQream DB, when you run a delete statement, any rows that match the delete predicate will no longer be returned when running subsequent queries.

Deleted rows are tracked in a separate location, in delete predicates.

After the delete statement, a separate process can be used to reclaim the space occupied by these rows, and to remove the small overhead that queries will have until this is done.

Some benefits to this design are:
1. Delete transactions complete quickly
2. The total disk footprint overhead at any time for a delete transaction or cleanup process is small and bounded (while the system still supports low overhead commit, rollback and recovery for delete transactions).

2.1.5.8.1.1 Phase 1: Delete

When a DELETE statement is run, SQream DB records the delete predicates used. These predicates will be used to filter future statements on this table until all rows matching the delete predicate have been physically cleaned up.

This filtering process takes full advantage of SQream's zone map feature.

2.1.5.8.1.2 Phase 2: Clean-up

The cleanup process is not automatic.
This gives control to the user or DBA, and gives flexibility on when to run the clean up.

Files marked for deletion during the logical deletion stage are removed from disk. This is achieved by calling the two utility functions CLEANUP_CHUNKS and CLEANUP_EXTENTS, in that order.

Note:
• ALTER TABLE and other DDL operations are blocked on tables that require clean-up. See more in the Concurrency and locks guide.
• If the estimated time for a cleanup process is beyond a threshold, you will get an error message about it. The message will explain how to override this limitation and run the process anyway.

2.1.5.8.2 Notes on data deletion

Note:
• If the number of deleted records crosses the threshold defined by the mixedColumnChunksThreshold parameter, the delete operation will be aborted.
• This is intended to alert the user that the large number of deleted records may result in a large number of mixed chunks.
• To circumvent this alert, replace XXX with the desired number of records before running the delete operation:

    set mixedColumnChunksThreshold=XXX;

2.1.5.8.2.1 Deleting data does not free up space

With the exception of a full table delete (TRUNCATE), deleting data does not free up disk space. To free up disk space, trigger the cleanup process.

2.1.5.8.2.2 SELECT performance on deleted rows

Queries on tables that have deleted rows may have to scan data that hasn't been cleaned up. In some cases, this can cause queries to take longer than expected. To solve this issue, trigger the cleanup process.

2.1.5.8.2.3 Use TRUNCATE instead of DELETE

For tables that are frequently emptied entirely, consider using TRUNCATE rather than DELETE. TRUNCATE removes the entire content of the table immediately, without requiring a subsequent cleanup to free up disk space.

2.1.5.8.2.4 Cleanup is I/O intensive

The cleanup process actively compacts tables by writing a complete new version of column chunks with no dead space.
This minimizes the size of the table, but can take a long time. It also requires extra disk space for the new copy of the table, until the operation completes.

Cleanup operations can create significant I/O load on the database. Consider this when planning the best time for the cleanup process.

If this is an issue with your environment, consider using CREATE TABLE AS to create a new table and then rename and drop the old table.

2.1.5.8.3 Example

2.1.5.8.3.1 Deleting values from a table

    farm=> SELECT * FROM cool_animals;
    1,Dog ,7
    2,Possum ,3
    3,Cat ,5
    4,Elephant ,6500
    5,Rhinoceros ,2100
    6,\N,\N

    6 rows

    farm=> DELETE FROM cool_animals WHERE weight > 1000;
    executed

    farm=> SELECT * FROM cool_animals;
    1,Dog ,7
    2,Possum ,3
    3,Cat ,5
    6,\N,\N

    4 rows

2.1.5.8.3.2 Deleting values based on more complex predicates

    farm=> SELECT * FROM cool_animals;
    1,Dog ,7
    2,Possum ,3
    3,Cat ,5
    4,Elephant ,6500
    5,Rhinoceros ,2100
    6,\N,\N

    6 rows

    farm=> DELETE FROM cool_animals WHERE weight > 1000;
    executed

    farm=> SELECT * FROM cool_animals;
    1,Dog ,7
    2,Possum ,3
    3,Cat ,5
    6,\N,\N

    4 rows

2.1.5.8.3.3 Identifying and cleaning up tables

2.1.5.8.3.4 List tables that haven't been cleaned up

    farm=> SELECT t.table_name FROM sqream_catalog.delete_predicates dp
       JOIN sqream_catalog.tables t
       ON dp.table_id = t.table_id
       GROUP BY 1;
    cool_animals

    1 row

2.1.5.8.3.5 Identify predicates for clean-up

    farm=> SELECT delete_predicate FROM sqream_catalog.delete_predicates dp
       JOIN sqream_catalog.tables t
       ON dp.table_id = t.table_id
       WHERE t.table_name = 'cool_animals';
    weight > 1000

    1 row
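
The two-phase behaviour described in this guide (record a delete predicate first, physically rewrite the data later) can be illustrated with a toy model. This is only a sketch under stated assumptions: the ToyTable class and its methods are hypothetical and do not reflect SQream DB's storage engine or catalog.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Predicate = Callable[[Row], bool]


class ToyTable:
    """Toy model of two-phase deletion; not SQream DB's actual implementation."""

    def __init__(self, rows: List[Row]) -> None:
        self.stored_rows: List[Row] = list(rows)      # what is physically stored
        self.delete_predicates: List[Predicate] = []  # recorded by DELETE

    def delete(self, predicate: Predicate) -> None:
        # Phase 1: only record the predicate; no stored data is rewritten.
        self.delete_predicates.append(predicate)

    def select_all(self) -> List[Row]:
        # Queries filter out rows matching any recorded delete predicate.
        return [r for r in self.stored_rows
                if not any(p(r) for p in self.delete_predicates)]

    def cleanup(self) -> None:
        # Phase 2: rewrite the data without the deleted rows,
        # then drop the predicates that are no longer needed.
        self.stored_rows = self.select_all()
        self.delete_predicates.clear()


table = ToyTable([
    {"id": 1, "name": "Dog", "weight": 7},
    {"id": 4, "name": "Elephant", "weight": 6500},
    {"id": 5, "name": "Rhinoceros", "weight": 2100},
])
table.delete(lambda r: r["weight"] > 1000)

visible_before = len(table.select_all())  # deleted rows are already hidden
stored_before = len(table.stored_rows)    # but still occupy space
table.cleanup()
stored_after = len(table.stored_rows)     # space is reclaimed after cleanup
print(visible_before, stored_before, stored_after)  # 1 3 1
```

The model shows why a delete completes quickly but does not free space by itself: between the delete and the cleanup, queries see one row while three rows are still stored.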
2.1.5.8.3.6 Triggering a cleanup

    -- Chunk reorganization (aka SWEEP)
    farm=> SELECT CLEANUP_CHUNKS('public','cool_animals');
    executed
    -- Delete leftover files (aka VACUUM)
    farm=> SELECT CLEANUP_EXTENTS('public','cool_animals');
    executed
    farm=> SELECT delete_predicate FROM sqream_catalog.delete_predicates dp
       JOIN sqream_catalog.tables t
       ON dp.table_id = t.table_id
       WHERE t.table_name = 'cool_animals';
    0 rows

2.1.5.8.4 Best practices for data deletion
• Run CLEANUP_CHUNKS and CLEANUP_EXTENTS after large DELETE operations.
• When deleting large proportions of data from very large tables, consider running a CREATE TABLE AS operation instead, then rename and drop the original table.
• Avoid killing CLEANUP_EXTENTS operations after they’ve started.
• SQream DB is optimized for time-based data. When data is naturally ordered by a date or timestamp, deleting based on those columns will perform best. For more information, see our time based data management guide.

2.1.5.9 Time based data management
SQream DB’s columnar-storage system is well adapted to timestamped data. When loading data with natural ordering (sorted by a timestamp), SQream DB organizes and collects metadata in chunks of time. Natural ordering allows for fast retrieval when performing range queries.

In this topic:
• Timestamped data
• Chunking
• Use cases
• Best practices for time-based data
  – Use date and datetime types for columns
  – Ordering
  – Limit workload by timestamp

2.1.5.9.1 Timestamped data
Timestamped data usually has some interesting attributes:
• Data is loaded in a natural order, as it is created
• Updates are infrequent or non-existent.
Any updates are done by inserting a new row relating to a new timestamp
• Queries on timestamped data are typically on continuous time ranges
• Inserting and reading data are performed independently, not in the same operation or transaction
• Timestamped data has a high data volume and accumulates faster than typical OLTP workloads

2.1.5.9.2 Chunking
Core to handling timestamped data is SQream DB’s chunking and metadata system. When data is inserted, it is automatically partitioned vertically by column, and horizontally by chunk. A chunk can be thought of as an automatic partition that spans several million records of one column. Unlike node-partitioning (or sharding), chunking carries several benefits:
• Chunks are small enough to allow multiple workers to read them concurrently
• Chunks are optimized for fast insertion of data
• Chunks carry metadata, which narrows down their contents for the optimizer
• Chunks are ideal for data retention as they can be deleted en masse
• Chunks are optimized for reading into RAM and the GPU
• Chunks are compressed individually, which improves compression and data locality

2.1.5.9.3 Use cases
Consider a set of data with a timestamp column. The timestamp order matches the order of data insertion (i.e. newer data is loaded after older data). This is common when you insert data in small bulks - every 15 minutes, every hour, every day, etc. SQream DB’s storage works by appending new data, partitioned into chunks containing millions of values. As new data is loaded, it is chunked and appended to a table. This is particularly useful in many scenarios:
• You run analytical queries spanning specific date ranges (e.g. the sum of transactions during the summer in 2020 vs. the summer in 2019)
• You delete data when it is more than X months old
• Regulations instruct you to keep several years’ worth of data, but you’re not interested in querying this data all the time
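To see why naturally ordered data suits range queries, it may help to picture chunk metadata as a minimum and maximum timestamp per chunk. The sketch below is an illustration only and rests on that assumption: ChunkMeta and chunks_to_read are hypothetical names, and SQream DB's actual metadata and optimizer are not described in this guide at this level of detail. It shows that when data arrives in timestamp order, a date-range filter overlaps only a few chunks and the rest can be skipped.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ChunkMeta:
    """Hypothetical per-chunk metadata: smallest and largest date in the chunk."""
    chunk_id: int
    min_date: date
    max_date: date


def chunks_to_read(chunks, start, end):
    """Return ids of chunks whose [min_date, max_date] range overlaps [start, end]."""
    return [c.chunk_id for c in chunks if c.max_date >= start and c.min_date <= end]


# Data loaded in timestamp order: each chunk covers one quarter of 2020.
chunks = [
    ChunkMeta(0, date(2020, 1, 1), date(2020, 3, 31)),
    ChunkMeta(1, date(2020, 4, 1), date(2020, 6, 30)),
    ChunkMeta(2, date(2020, 7, 1), date(2020, 9, 30)),
    ChunkMeta(3, date(2020, 10, 1), date(2020, 12, 31)),
]

# A query over the summer of 2020 only overlaps two of the four chunks.
summer = chunks_to_read(chunks, date(2020, 6, 21), date(2020, 9, 22))
print(summer)  # [1, 2]
```

If the same rows had been loaded in random order, every chunk would span most of the year and the filter could not exclude any of them, which is the motivation for loading data in timestamp order.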
2.1.5.9.4 Best practices for time-based data

2.1.5.9.4.1 Use date and datetime types for columns
When creating tables with dates or timestamps, using the purpose-built DATE and DATETIME types over integer types or VARCHAR will reduce the storage footprint and in many cases bring huge performance improvements (as well as data integrity benefits). SQream DB stores dates and datetimes very efficiently and can strongly optimize queries using these specific types.

2.1.5.9.4.2 Ordering
Data ordering is an important factor in minimizing storage size and improving query performance. Prioritize inserting data based on timestamps. This will likely reduce the number of chunks that SQream DB reads during query execution. See our Data clustering guide to see how clustering keys can be defined for optimizing data order.

2.1.5.9.4.3 Limit workload by timestamp
Grouping by and filtering data based on timestamps will improve performance. For example,

    SELECT DATEPART(HOUR, timestamp),
           MIN(transaction_amount),
           MAX(transaction_amount),
           AVG(transaction_amount)
    FROM transactions
    WHERE timestamp BETWEEN DATEADD(MONTH,-3,CURRENT_TIMESTAMP) AND CURRENT_TIMESTAMP
    GROUP BY 1;

2.1.5.10 Foreign Tables
Foreign tables can be used to run queries directly on data without inserting it into SQream DB first. SQream DB supports read-only foreign tables, so you can query from foreign tables, but you cannot insert to them, or run deletes or updates on them.

Running queries directly on foreign (external) data is most effective for one-off querying. If you will be repeatedly querying data, the performance will usually be better if you insert the data into SQream DB first.

Although foreign tables can be used without inserting data into SQream DB, one of their main use cases is to help with the insertion process.
An INSERT ... SELECT statement on a foreign table can be used to insert data into SQream DB, using the full power of the query engine to perform ETL.

In this topic:
• What kind of data is supported?
• What kind of data staging is supported?
• Using foreign tables - a practical example
  – Planning for data staging
  – Creating the foreign table
    ∗ From S3
    ∗ From HDFS
  – Querying foreign tables
  – Modifying data from staging
  – Converting a foreign table to a standard database table
• Error handling and limitations

2.1.5.10.1 What kind of data is supported?
SQream DB uses foreign data wrappers (FDW) to abstract external sources. SQream DB supports these FDWs:
• text files (e.g. CSV, PSV, TSV) via the csv_fdw
• ORC via the orc_fdw
• Parquet via the parquet_fdw

2.1.5.10.2 What kind of data staging is supported?
SQream DB can stage data from:
• a local filesystem (e.g. /mnt/storage/....)
• Amazon S3 buckets (e.g. s3://pp-secret-bucket/users/*.parquet)
• HDFS, as described in Using SQream in an HDFS Environment (e.g. hdfs://hadoop-nn.piedpiper.com/rhendricks/*.csv)

2.1.5.10.3 Using foreign tables - a practical example
Use a foreign table to stage data before loading from CSV, Parquet or ORC files.

2.1.5.10.3.1 Planning for data staging
For the following examples, we will want to interact with a CSV file. Here’s a peek at the table contents:

Table 16: nba.csv

    Name          | Team           | Number | Position | Age  | Height | Weight | College           | Salary
    --------------+----------------+--------+----------+------+--------+--------+-------------------+-----------
    Avery Bradley | Boston Celtics | 0.0    | PG       | 25.0 | 6-2    | 180.0  | Texas             | 7730337.0
    Jae Crowder   | Boston Celtics | 99.0   | SF       | 25.0 | 6-6    | 235.0  | Marquette         | 6796117.0
    John Holland  | Boston Celtics | 30.0   | SG       | 27.0 | 6-5    | 205.0  | Boston University |
    R.J. Hunter   | Boston Celtics | 28.0   | SG       | 22.0 | 6-5    | 185.0  | Georgia State     | 1148640.0
    Jonas Jerebko | Boston Celtics | 8.0    | PF       | 29.0 | 6-10   | 231.0  |                   | 5000000.0
    Amir Johnson  | Boston Celtics | 90.0   | PF       | 29.0 | 6-9    | 240.0  |                   | 12000000.0
    Jordan Mickey | Boston Celtics | 55.0   | PF       | 21.0 | 6-8    | 235.0  | LSU               | 1170960.0
    Kelly Olynyk  | Boston Celtics | 41.0   | C        | 25.0 | 7-0    | 238.0  | Gonzaga           | 2165160.0
    Terry Rozier  | Boston Celtics | 12.0   | PG       | 22.0 | 6-2    | 190.0  | Louisville        | 1824360.0

The file is stored on Amazon S3, at s3://sqream-demo-data/nba_players.csv. We will make note of the file structure, to create a matching CREATE FOREIGN TABLE statement.

2.1.5.10.3.2 Creating the foreign table
Based on the source file structure, we create a foreign table with the appropriate structure, and point it to the file. The file format in this case is CSV, with a DOS newline (\r\n). Here’s how the table would be created from S3 and HDFS:

2.1.5.10.3.3 From S3

    CREATE FOREIGN TABLE nba
    (
       Name varchar(40),
       Team varchar(40),
       Number tinyint,
       Position varchar(2),
       Age tinyint,
       Height varchar(4),
       Weight real,
       College varchar(40),
       Salary float
    )
    WRAPPER csv_fdw -- Text file
    OPTIONS
    (
       LOCATION = 's3://sqream-demo-data/nba_players.csv',
       RECORD_DELIMITER = '\r\n' -- DOS delimited file
    )
    ;

2.1.5.10.3.4 From HDFS

    CREATE FOREIGN TABLE nba
    (
       Name varchar(40),
       Team varchar(40),
       Number tinyint,
       Position varchar(2),
       Age tinyint,
       Height varchar(4),
       Weight real,
       College varchar(40),
       Salary float
    )
    WRAPPER csv_fdw -- Text file
    OPTIONS
    (
       LOCATION = 'hdfs://hadoop-nn.piedpiper.com:8020/demo-data/nba_players.csv',
       RECORD_DELIMITER = '\r\n' -- DOS delimited file
    )
    ;

2.1.5.10.3.5 Querying foreign tables
Let’s peek at the data from the foreign table:

    t=> SELECT * FROM nba LIMIT 10;
    name          | team           | number | position | age | height | weight | college           | salary
    --------------+----------------+--------+----------+-----+--------+--------+-------------------+---------
    Avery Bradley | Boston Celtics | 0      | PG       | 25  | 6-2    | 180    | Texas             | 7730337
    Jae Crowder   | Boston Celtics | 99     | SF       | 25  | 6-6    | 235    | Marquette         | 6796117
    John Holland  | Boston Celtics | 30     | SG       | 27  | 6-5    | 205    | Boston University |
    R.J. Hunter   | Boston Celtics | 28     | SG       | 22  | 6-5    | 185    | Georgia State     | 1148640
    Jonas Jerebko | Boston Celtics | 8      | PF       | 29  | 6-10   | 231    |                   | 5000000
    Amir Johnson  | Boston Celtics | 90     | PF       | 29  | 6-9    | 240    |                   | 12000000
    Jordan Mickey | Boston Celtics | 55     | PF       | 21  | 6-8    | 235    | LSU               | 1170960
    Kelly Olynyk  | Boston Celtics | 41     | C        | 25  | 7-0    | 238    | Gonzaga           | 2165160
    Terry Rozier  | Boston Celtics | 12     | PG       | 22  | 6-2    | 190    | Louisville        | 1824360
    Marcus Smart  | Boston Celtics | 36     | PG       | 22  | 6-4    | 220    | Oklahoma State    | 3431040

2.1.5.10.3.6 Modifying data from staging
One of the main reasons for staging data is to examine the contents and modify them before loading them. Assume we are unhappy with weight being in pounds, because we want to use kilograms instead.
We can apply the transformation as part of a query:

    t=> SELECT name, team, number, position, age, height, (weight / 2.205) as weight, college, salary
    .   FROM nba
    .   ORDER BY weight DESC;
    name              | team                   | number | position | age | height | weight   | college     | salary
    ------------------+------------------------+--------+----------+-----+--------+----------+-------------+---------
    Nikola Pekovic    | Minnesota Timberwolves | 14     | C        | 30  | 6-11   | 139.229  |             | 12100000
    Boban Marjanovic  | San Antonio Spurs      | 40     | C        | 27  | 7-3    | 131.5193 |             | 1200000
    Al Jefferson      | Charlotte Hornets      | 25     | C        | 31  | 6-10   | 131.0658 |             | 13500000
    Jusuf Nurkic      | Denver Nuggets         | 23     | C        | 21  | 7-0    | 126.9841 |             | 1842000
    Andre Drummond    | Detroit Pistons        | 0      | C        | 22  | 6-11   | 126.5306 | Connecticut | 3272091
    Kevin Seraphin    | New York Knicks        | 1      | C        | 26  | 6-10   | 126.0771 |             | 2814000
    Brook Lopez       | Brooklyn Nets          | 11     | C        | 28  | 7-0    | 124.7166 | Stanford    | 19689000
    Jahlil Okafor     | Philadelphia 76ers     | 8      | C        | 20  | 6-11   | 124.7166 | Duke        | 4582680
    Cristiano Felicio | Chicago Bulls          | 6      | PF       | 23  | 6-10   | 124.7166 |             | 525093
    [...]

Now, if we’re happy with the results, we can convert the staged foreign table to a standard table.

2.1.5.10.3.7 Converting a foreign table to a standard database table
CREATE TABLE AS can be used to materialize a foreign table into a regular table.

Tip: If you intend to use the table multiple times, convert the foreign table to a standard table.

    t=> CREATE TABLE real_nba AS
    .   SELECT name, team, number, position, age, height, (weight / 2.205) as weight, college, salary
    .   FROM nba
    .   ORDER BY weight DESC;
    executed
    t=> SELECT * FROM real_nba LIMIT 5;
    name             | team                   | number | position | age | height | weight   | college     | salary
    -----------------+------------------------+--------+----------+-----+--------+----------+-------------+---------
    Nikola Pekovic   | Minnesota Timberwolves | 14     | C        | 30  | 6-11   | 139.229  |             | 12100000
    Boban Marjanovic | San Antonio Spurs      | 40     | C        | 27  | 7-3    | 131.5193 |             | 1200000
    Al Jefferson     | Charlotte Hornets      | 25     | C        | 31  | 6-10   | 131.0658 |             | 13500000
    Jusuf Nurkic     | Denver Nuggets         | 23     | C        | 21  | 7-0    | 126.9841 |             | 1842000
    Andre Drummond   | Detroit Pistons        | 0      | C        | 22  | 6-11   | 126.5306 | Connecticut | 3272091

2.1.5.10.4 Error handling and limitations
• Error handling in foreign tables is limited. Any error that occurs during source data parsing will result in the statement aborting.
• Foreign tables are logical and do not contain any data. Their structure is not verified or enforced until a query uses the table. For example, a CSV with the wrong delimiter may cause a query to fail, even though the table has been created successfully:

    t=> SELECT * FROM nba;
    Record delimiter mismatch during CSV parsing. User defined line delimiter \n does not match the first delimiter \r\n found in s3://sqream-demo-data/nba.csv

• Since the data for a foreign table is not stored in SQream DB, it can be changed or removed at any time by an external process. As a result, the same query can return different results each time it runs against a foreign table. Similarly, a query might fail if the external data is moved, removed, or has changed structure.

2.1.5.11 Working with External Data
SQream DB supports external data sources for use with Foreign Tables, COPY FROM, and COPY TO.

2.1.5.11.1 Amazon S3
SQream DB has a native S3 connector for inserting data. The s3:// URI specifies an external file path on an S3 bucket.
File names may contain wildcard characters, and the files can be CSV or a columnar format like Parquet and ORC.

In this topic:
• S3 Configuration
• S3 URI Format
• Authentication
• Examples
  – Planning for Data Staging
  – Creating the External Table
  – Querying External Tables
  – Bulk Loading a File from a Public S3 Bucket
  – Loading Files from an Authenticated S3 Bucket

2.1.5.11.1.1 S3 Configuration
Any SQream DB host that has access to S3 endpoints can access S3 without any configuration. To read files from an S3 bucket, the files in that bucket must be listable.

2.1.5.11.1.2 S3 URI Format
With S3, specify a location for a file (or files) when using COPY FROM or Foreign Tables. The general S3 syntax is:

    s3://bucket_name/path

2.1.5.11.1.3 Authentication
SQream DB supports AWS ID and AWS SECRET authentication. These should be specified when executing a statement.

2.1.5.11.1.4 Examples
Use an external table to stage data from S3 before loading from CSV, Parquet or ORC files.

2.1.5.11.1.5 Planning for Data Staging
For the following examples, we will want to interact with a CSV file. Here’s a peek at the table contents:

Table 17: nba.csv

    Name          | Team           | Number | Position | Age  | Height | Weight | College           | Salary
    --------------+----------------+--------+----------+------+--------+--------+-------------------+-----------
    Avery Bradley | Boston Celtics | 0.0    | PG       | 25.0 | 6-2    | 180.0  | Texas             | 7730337.0
    Jae Crowder   | Boston Celtics | 99.0   | SF       | 25.0 | 6-6    | 235.0  | Marquette         | 6796117.0
    John Holland  | Boston Celtics | 30.0   | SG       | 27.0 | 6-5    | 205.0  | Boston University |
    R.J. Hunter   | Boston Celtics | 28.0   | SG       | 22.0 | 6-5    | 185.0  | Georgia State     | 1148640.0
    Jonas Jerebko | Boston Celtics | 8.0    | PF       | 29.0 | 6-10   | 231.0  |                   | 5000000.0
    Amir Johnson  | Boston Celtics | 90.0   | PF       | 29.0 | 6-9    | 240.0  |                   | 12000000.0
    Jordan Mickey | Boston Celtics | 55.0   | PF       | 21.0 | 6-8    | 235.0  | LSU               | 1170960.0
    Kelly Olynyk  | Boston Celtics | 41.0   | C        | 25.0 | 7-0    | 238.0  | Gonzaga           | 2165160.0
    Terry Rozier  | Boston Celtics | 12.0   | PG       | 22.0 | 6-2    | 190.0  | Louisville        | 1824360.0

The file is stored on Amazon S3, and this bucket is public and listable.
We will make note of the file structure, to create a matching CREATE FOREIGN TABLE statement.

2.1.5.11.1.6 Creating the External Table
Based on the source file structure, we create a foreign table with the appropriate structure, and point it to the file.

    CREATE FOREIGN TABLE nba
    (
       Name varchar(40),
       Team varchar(40),
       Number tinyint,
       Position varchar(2),
       Age tinyint,
       Height varchar(4),
       Weight real,
       College varchar(40),
       Salary float
    )
    WRAPPER csv_fdw
    OPTIONS
    (
       LOCATION = 's3://sqream-demo-data/nba_players.csv',
       RECORD_DELIMITER = '\r\n' -- DOS delimited file
    )
    ;

The file format in this case is CSV, and it is stored as an S3 object (if the path is on HDFS, change the URI accordingly; see Using SQream in an HDFS Environment). We also took note that the record delimiter was a DOS newline (\r\n).

2.1.5.11.1.7 Querying External Tables
Let’s peek at the data from the external table:

    t=> SELECT * FROM nba LIMIT 10;
    name          | team           | number | position | age | height | weight | college           | salary
    --------------+----------------+--------+----------+-----+--------+--------+-------------------+---------
    Avery Bradley | Boston Celtics | 0      | PG       | 25  | 6-2    | 180    | Texas             | 7730337
    Jae Crowder   | Boston Celtics | 99     | SF       | 25  | 6-6    | 235    | Marquette         | 6796117
    John Holland  | Boston Celtics | 30     | SG       | 27  | 6-5    | 205    | Boston University |
    R.J. Hunter   | Boston Celtics | 28     | SG       | 22  | 6-5    | 185    | Georgia State     | 1148640
    Jonas Jerebko | Boston Celtics | 8      | PF       | 29  | 6-10   | 231    |                   | 5000000
    Amir Johnson  | Boston Celtics | 90     | PF       | 29  | 6-9    | 240    |                   | 12000000
    Jordan Mickey | Boston Celtics | 55     | PF       | 21  | 6-8    | 235    | LSU               | 1170960
    Kelly Olynyk  | Boston Celtics | 41     | C        | 25  | 7-0    | 238    | Gonzaga           | 2165160
    Terry Rozier  | Boston Celtics | 12     | PG       | 22  | 6-2    | 190    | Louisville        | 1824360
    Marcus Smart  | Boston Celtics | 36     | PG       | 22  | 6-4    | 220    | Oklahoma State    | 3431040

2.1.5.11.1.8 Bulk Loading a File from a Public S3 Bucket
The COPY FROM command can also be used to load data without staging it first.

Note: The bucket must be publicly available and its objects must be listable.

    COPY nba FROM 's3://sqream-demo-data/nba.csv' WITH OFFSET 2 RECORD DELIMITER '\r\n';

2.1.5.11.1.9 Loading Files from an Authenticated S3 Bucket

    COPY nba FROM 's3://secret-bucket/*.csv' WITH OFFSET 2 RECORD DELIMITER '\r\n'
    AWS_ID '12345678'
    AWS_SECRET 'super_secretive_secret';

2.1.5.11.2 Using SQream in an HDFS Environment

2.1.5.11.2.1 Configuring an HDFS Environment for the User sqream
This section describes how to configure an HDFS environment for the user sqream and is only relevant for users with an HDFS environment.

To configure an HDFS environment for the user sqream:
1. Open your bash_profile configuration file for editing:

    $ vim /home/sqream/.bash_profile

3. Verify that the edits have been made:

    $ source /home/sqream/.bash_profile

4. Check if you can access Hadoop from your machine:

    $ hadoop fs -ls hdfs://

5. Verify that an HDFS environment exists for SQream services:

    $ ls -l /etc/sqream/sqream_env.sh

6.
If an HDFS environment does not exist for SQream services, create one (sqream_env.sh):

    #!/bin/bash
    SQREAM_HOME=/usr/local/sqream
    export SQREAM_HOME
    export JAVA_HOME=${SQREAM_HOME}/hdfs/jdk
    export HADOOP_INSTALL=${SQREAM_HOME}/hdfs/hadoop
    export CLASSPATH=`${HADOOP_INSTALL}/bin/hadoop classpath --glob`
    export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_INSTALL}/lib/native
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${SQREAM_HOME}/lib:$HADOOP_COMMON_LIB_NATIVE_DIR
    PATH=$PATH:$HOME/.local/bin:$HOME/bin:${SQREAM_HOME}/bin/:${JAVA_HOME}/bin:$HADOOP_INSTALL/bin
    export PATH

2.1.5.11.2.2 Authenticate Hadoop Servers that Require Kerberos
If your Hadoop server requires Kerberos authentication, do the following:
1. Create a principal for the user sqream.

    $ kadmin -p root/[email protected]
    $ addprinc [email protected]

2. If you do not know your Kerberos root credentials, connect to the Kerberos server as a root user with ssh and run kadmin.local:

    $ kadmin.local

Running kadmin.local does not require a password.
3. If a password is not required, change your password to [email protected].

    $ change_password [email protected]

4. Connect to the Hadoop name node using ssh, then change to the following directory:

    $ cd /var/run/cloudera-scm-agent/process

5. Check the most recently modified content of the directory above:

    $ ls -lrt

5. Look for a recently updated folder containing the text hdfs. The following is an example of the correct folder name:

    cd

This folder should contain a file named hdfs.keytab or another similar .keytab file.
6. Copy the .keytab file to user sqream’s Home directory on the remote machines that you are planning to use Hadoop on.
7. Copy the following files to the sqream sqream@server:

    $ sudo chown sqream:sqream /home/sqream/hdfs.keytab
    $ sudo chmod 600 /home/sqream/hdfs.keytab

9. Log into the sqream server.
10. Log in as the user sqream.
11.
Navigate to the Home directory and check the name of a Kerberos principal represented by the following .keytab file:

    $ klist -kt hdfs.keytab

The following is an example of the correct output:

    sqream@Host-121 ~ $ klist -kt hdfs.keytab
    Keytab name: FILE:hdfs.keytab
    KVNO Timestamp           Principal
    ---- ------------------- ------------------------------------------------------
       5 09/15/2020 18:03:05 HTTP/[email protected]
       5 09/15/2020 18:03:05 HTTP/[email protected]
       5 09/15/2020 18:03:05 HTTP/[email protected]
       5 09/15/2020 18:03:05 HTTP/[email protected]
       5 09/15/2020 18:03:05 HTTP/[email protected]
       5 09/15/2020 18:03:05 HTTP/[email protected]
       5 09/15/2020 18:03:05 HTTP/[email protected]
       5 09/15/2020 18:03:05 HTTP/[email protected]
       5 09/15/2020 18:03:05 hdfs/[email protected]
       5 09/15/2020 18:03:05 hdfs/[email protected]
       5 09/15/2020 18:03:05 hdfs/[email protected]
       5 09/15/2020 18:03:05 hdfs/[email protected]
       5 09/15/2020 18:03:05 hdfs/[email protected]
       5 09/15/2020 18:03:05 hdfs/[email protected]
       5 09/15/2020 18:03:05 hdfs/[email protected]
       5 09/15/2020 18:03:05 hdfs/[email protected]

12. Verify that the hdfs service named hdfs/[email protected] is shown in the generated output above.
13. Run the following:

    $ kinit -kt hdfs.keytab hdfs/[email protected]

13. Check the output:

    $ klist

The following is an example of the correct output:

    Ticket cache: FILE:/tmp/krb5cc_1000
    Default principal: [email protected]

    Valid starting       Expires              Service principal
    09/16/2020 13:44:18  09/17/2020 13:44:18  krbtgt/[email protected]

14. List the files located at the defined server name or IP address:

    $ hadoop fs -ls hdfs://

15. Do one of the following:
• If the list below is output, continue with Step 16.
• If the list is not output, verify that your environment has been set up correctly.
If any of the following are empty, verify that you followed Step 6 in the Configuring an HDFS Environment for the User sqream section above correctly:

    $ echo $JAVA_HOME
    $ echo $SQREAM_HOME
    $ echo $CLASSPATH
    $ echo $HADOOP_COMMON_LIB_NATIVE_DIR
    $ echo $LD_LIBRARY_PATH
    $ echo $PATH

16. Verify that you copied the correct keytab file.
17. Review this procedure to verify that you have followed each step.

NOTE: If you are using more than one server, you must start the metadataserver and serverpicker services on one node only.

18. Start the following SQream services manually and verify that they are running correctly:

    $ sudo systemctl start metadataserver
    $ sudo systemctl start serverpicker
    $ sudo systemctl start sqream1
    $ sudo systemctl start sqream2
    $ sudo systemctl start sqream3
    $ sudo systemctl start sqream4

19. Verify that all SQream processes are running and listening:

    $ sudo systemctl status metadataserver
    $ sudo systemctl status serverpicker
    $ sudo systemctl status sqream1

NOTE: Run sudo systemctl status on all servers.

20. Verify that SQream is listening on all ports:

    $ sudo netstat -nltp

NOTE: Depending on the GPU optimization of the package build, SQream may take several minutes to start listening on its ports. Check the service logs in /var/log/sqream to see whether SQream has started listening on its ports.

2.1.5.12 Transactions
SQream DB supports serializable transactions. This is also called ‘ACID compliance’.

The implementation of transactions means that commit, rollback and recovery are all extremely fast.

SQream DB has extremely fast bulk insert speed, with minimal slowdown when running concurrent inserts. There is no performance reason to break large inserts up into multiple transactions.
The phrase “supporting transactions” for a database system sometimes means having good performance for OLTP workloads. SQream DB’s transaction system does not have high performance for high-concurrency OLTP workloads.

SQream DB also supports transactional DDL.

2.1.5.13 Workload Manager
The workload manager (WLM) allows SQream DB workers to identify their availability to clients with specific service names. The load balancer will then use that information to route statements to specific workers.

2.1.5.13.1 Why use the workload manager?
The workload manager allows a system engineer or database administrator to allocate specific workers and compute resources for various tasks.

For example:
1. Creating a service queue named ETL and allocating two workers exclusively to this service prevents non-ETL statements from utilizing these compute resources.
2. Creating a service for the company’s leadership during working hours for dedicated access, and disabling this service at night to allow maintenance operations to use the available compute.

2.1.5.13.2 Setting up service queues
By default, every worker subscribes to the sqream service queue. Additional service names are configured in the configuration file for every worker, but can also be set on a per-session basis.

2.1.5.13.3 Example - Allocating resources for ETL
We want to allocate resources for ETL to ensure a good quality of service, but we also have management users who don’t like waiting. The configuration in this example allocates resources as follows:
• 1 worker for ETL work
• 3 workers for general queries
• All workers assigned to queries from management

    Service / Worker | Worker #1 | Worker #2 | Worker #3 | Worker #4
    -----------------+-----------+-----------+-----------+-----------
    ETL              | ✓         |           |           |
    Query service    |           | ✓         | ✓         | ✓
    Management       | ✓         | ✓         | ✓         | ✓

This configuration gives the ETL queue dedicated access to one worker, which can’t be used by regular queries. Queries from management will use any available worker.
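The allocation above can be pictured as a mapping from each worker to the set of service queues it subscribes to. The sketch below is a toy model for reasoning about such a layout, not SQream DB's load balancer: the worker labels, the lowercase service names and the eligible_workers helper are all assumptions made for illustration. It answers one question: which workers may receive a statement sent to a given service?

```python
# Hypothetical model of the allocation: worker label -> subscribed service queues.
SUBSCRIPTIONS = {
    "worker_1": {"etl", "management"},
    "worker_2": {"query", "management"},
    "worker_3": {"query", "management"},
    "worker_4": {"query", "management"},
}


def eligible_workers(service, subscriptions=SUBSCRIPTIONS):
    """Return the workers subscribed to `service`, sorted by label."""
    return sorted(w for w, services in subscriptions.items() if service in services)


# Print the candidate workers for each service queue in the example.
for service in ("etl", "query", "management"):
    print(service, eligible_workers(service))
```

Checking a planned layout this way makes it easy to spot a service with no subscribers, or a worker that was meant to be dedicated but is also reachable through a general-purpose queue.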
2.1.5.13.3.1 Creating this configuration
The persistent configuration for this setup is listed in these four configuration files, one per worker. Each worker gets a comma-separated list of service queues that it subscribes to. These services are specified in the initialSubscribedServices attribute.

Listing 13: Worker #1

    {
        "compileFlags":{
        },
        "runtimeFlags":{
        },
        "runtimeGlobalFlags":{
            "initialSubscribedServices" : "etl,management"
        },
        "server":{
            "gpu": 0,
            "port": 5000,
            "cluster": "/home/rhendricks/raviga_database",
            "licensePath": "/home/sqream/.sqream/license.enc"
        }
    }

Listing 14: Workers #2, #3, #4

    {
        "compileFlags":{
        },
        "runtimeFlags":{
        },
        "runtimeGlobalFlags":{
            "initialSubscribedServices" : "query,management"
        },
        "server":{
            "gpu": 1,
            "port": 5001,
            "cluster": "/home/rhendricks/raviga_database",
            "licensePath": "/home/sqream/.sqream/license.enc"
        }
    }

Tip: This configuration can be created temporarily (for the current session only) by using the SUBSCRIBE_SERVICE and UNSUBSCRIBE_SERVICE statements.

2.1.5.13.3.2 Verifying the configuration
Use SHOW_SUBSCRIBED_INSTANCES to view service subscriptions for each worker. Use SHOW_SERVER_STATUS to see the statement queues.

    t=> SELECT SHOW_SUBSCRIBED_INSTANCES();
    service    | servernode | serverip      | serverport
    -----------+------------+---------------+-----------
    management | node_9383  | 192.168.0.111 | 5000
    etl        | node_9383  | 192.168.0.111 | 5000
    query      | node_9384  | 192.168.0.111 | 5001
    management | node_9384  | 192.168.0.111 | 5001
    query      | node_9385  | 192.168.0.111 | 5002
    management | node_9385  | 192.168.0.111 | 5002
    query      | node_9551  | 192.168.1.91  | 5000
    management | node_9551  | 192.168.1.91  | 5000
2.1.5.13.4 Configuring a client to connect to a specific service

2.1.5.13.4.1 Using Sqream SQL CLI
Add the --service parameter to the command line:

    $ sqream sql --port=3108 --clustered --username=mjordan --databasename=master --service=etl
    Password:
    Interactive client mode
    To quit, use ^D or \q.
    master=>_

2.1.5.13.4.2 Using JDBC
Add the service parameter to the connection string:

Listing 15: JDBC connection string

    jdbc:Sqream://127.0.0.1:3108/raviga;user=rhendricks;password=Tr0ub4dor&3;service=etl;cluster=true;ssl=false;

2.1.5.13.4.3 Using ODBC
On Linux, modify the DSN parameters in odbc.ini. For example, Service="etl":

Listing 16: odbc.ini

    [sqreamdb]
    Description=64-bit Sqream ODBC
    Driver=/home/rhendricks/sqream_odbc64/sqream_odbc64.so
    Server="127.0.0.1"
    Port="3108"
    Database="raviga"
    Service="etl"
    User="rhendricks"
    Password="Tr0ub4dor&3"
    Cluster=true
    Ssl=false

On Windows, change the parameter in the DSN editing window.

2.1.5.13.4.4 Using Python (pysqream)
In Python, set the service parameter in the connection command:

Listing 17: Python

    import pysqream

    con = pysqream.connect(host='127.0.0.1', port=3108, database='raviga'
                           , username='rhendricks', password='Tr0ub4dor&3'
                           , clustered=True, use_ssl = False, service='etl')

2.1.5.13.4.5 Using Node.JS
Add the service to the connection settings:

Listing 18: Node.JS

    const Connection = require('sqreamdb');
    const config = {
       host: '127.0.0.1',
       port: 3108,
       username: 'rhendricks',
       password: 'Tr0ub4dor&3',
       connectDatabase: 'raviga',
       cluster: 'true',
       service: 'etl'
    };

2.1.5.14 Python UDF (user-defined functions)
User-defined functions (UDFs) are a feature that extends SQream DB’s built-in SQL functionality. SQream DB’s Python UDFs allow developers to create new functionality in SQL by writing the lower-level language implementation in Python.

In this topic:
• A simple example
• Why use UDFs?
• SQream DB’s UDF support
  – Scalar functions
  – Python
  – Using modules
• Finding existing UDFs in the catalog
• Getting the DDL for a function
• Error handling
• Permissions and sharing
• Best practices

2.1.5.14.1 A simple example
Most databases have an UPPER function, including SQream DB. However, assume that this function is missing for the sake of this example. You can write a function in Python to uppercase a text value using the CREATE FUNCTION syntax.

    CREATE FUNCTION my_upper (x1 text)
       RETURNS text
       AS $$
    return x1.upper()
    $$ LANGUAGE PYTHON;

Let’s break down this example:
• CREATE FUNCTION my_upper - Create a function called my_upper. This name must be unique in the current database.
• (x1 text) - the function accepts one argument named x1 which is of the SQL type TEXT. All data types are supported.
• RETURNS text - the function returns the same type - TEXT. All data types are supported.
• AS $$ - what follows is some code that we don’t want to quote, so we use dollar-quoting ($$) instead of single quotes (').
• return x1.upper() - the Python function’s body returns the argument named x1, uppercased.
• $$ LANGUAGE PYTHON - this is the end of the function, and it’s in the Python language.

Running this example
After creating the function, you can use it in any SQL query. For example:

    master=> CREATE TABLE jabberwocky(line text);
    executed
    master=> INSERT INTO jabberwocky VALUES
    .   ('''Twas brillig, and the slithy toves '), (' Did gyre and gimble in the wabe: ')
    .   ,('All mimsy were the borogoves, '), (' And the mome raths outgrabe. ')
    .   ,('"Beware the Jabberwock, my son! '), (' The jaws that bite, the claws that catch! ')
    .   ,('Beware the Jubjub bird, and shun '), (' The frumious Bandersnatch!" ');
    executed
    master=> SELECT line, my_upper(line) FROM jabberwocky;
    line                                      | my_upper
    ------------------------------------------+------------------------------------------
    'Twas brillig, and the slithy toves       | 'TWAS BRILLIG, AND THE SLITHY TOVES
    Did gyre and gimble in the wabe:          | DID GYRE AND GIMBLE IN THE WABE:
    All mimsy were the borogoves,             | ALL MIMSY WERE THE BOROGOVES,
    And the mome raths outgrabe.              | AND THE MOME RATHS OUTGRABE.
    "Beware the Jabberwock, my son!           | "BEWARE THE JABBERWOCK, MY SON!
    The jaws that bite, the claws that catch! | THE JAWS THAT BITE, THE CLAWS THAT CATCH!
    Beware the Jubjub bird, and shun          | BEWARE THE JUBJUB BIRD, AND SHUN
    The frumious Bandersnatch!"               | THE FRUMIOUS BANDERSNATCH!"

2.1.5.14.2 Why use UDFs?
• They allow simpler statements - You can create the function once, store it in the database, and call it any number of times in a statement.
• They can be shared - UDFs can be created by a database administrator, and then used by other roles.
• They can simplify downstream code - UDFs can be modified in SQream DB independently of program source code.

2.1.5.14.3 SQream DB’s UDF support

2.1.5.14.3.1 Scalar functions
SQream DB’s UDFs are scalar functions. This means that the UDF returns a single data value of the type defined in the RETURNS clause. For an inline scalar function, the returned scalar value is the result of a single statement.

2.1.5.14.3.2 Python
At this time, SQream DB’s UDFs are supported for Python. Python 3.6.7 is installed alongside SQream DB, for use exclusively by SQream DB. You may have a different version of Python installed on your server. To find which version of Python is installed for use by SQream DB, create and run this UDF:

    master=> CREATE OR REPLACE FUNCTION py_version()
    .   RETURNS text
    .   AS $$
    .   import sys
    .   return ("Python version: " + sys.version + ". Path: " + sys.base_exec_prefix)
    .   $$ LANGUAGE PYTHON;
    executed
    master=> SELECT py_version();
    py_version
    -------------------------------------------------------------------------------------
    Python version: 3.6.7 (default, Jul 22 2019, 11:03:54)
    [GCC 5.4.0]. Path: /opt/sqream/python-3.6.7-5.4.0

2.1.5.14.3.3 Using modules
To import a Python module, use the standard import syntax in the first lines of the user-defined function.

2.1.5.14.4 Finding existing UDFs in the catalog
The user_defined_functions catalog view contains function information. Here’s how you’d list all UDFs in the system:

    master=> SELECT * FROM sqream_catalog.user_defined_functions;
    database_name | function_id | function_name
    --------------+-------------+--------------
    master        | 1           | my_upper

2.1.5.14.5 Getting the DDL for a function

    master=> SELECT GET_FUNCTION_DDL('my_upper');
    ddl
    ---------------------------------------------------
    create function "my_upper" (x1 text) returns text as
    $$
    return x1.upper()
    $$
    language python volatile;

See GET_FUNCTION_DDL for more information.

2.1.5.14.6 Error handling
In UDFs, any error that occurs causes the execution of the function to stop. This in turn causes the statement that invoked the function to be canceled.

2.1.5.14.7 Permissions and sharing
To create a UDF, the creator needs the CREATE FUNCTION permission at the database level. For example, to grant CREATE FUNCTION to a non-superuser role:

    GRANT CREATE FUNCTION ON DATABASE master TO mjordan;

To execute a UDF, the role needs the EXECUTE FUNCTION permission for every function. For example, to grant the permission to the r_bi_users role group, run:

    GRANT EXECUTE ON FUNCTION my_upper TO r_bi_users;

Note: Functions are stored for each database, outside of any schema.

See more information about permissions in the Access control guide.

2.1.5.14.8 Best practices
Although user-defined functions add flexibility, they may have some performance drawbacks. They are not usually a replacement for subqueries or views.
In some cases, a user-defined function provides benefits, such as sharing extended functionality, that make it very appealing.

Use user-defined functions sparingly in the WHERE clause. SQream DB can't optimize the function's usage, and it will be called once for every value. If possible, narrow down the number of results with a subquery before the UDF is called.

2.1.5.15 Saved queries

Saved queries let you reuse the query plan of a query, which eliminates compilation time for repeated queries. They also provide a way to implement 'parameterized views'.

2.1.5.15.1 How saved queries work

Saved queries are compiled when they are created. When a saved query is run, the stored query plan is used instead of compiling a query plan at query time.

2.1.5.15.2 Parameters support

Query parameters can be used as substitutes for literal expressions in queries.

• Parameters cannot be used to substitute things like column names and table names.
• Query parameters of a string datatype (like VARCHAR) must be of a fixed length, and can be used in equality checks, but not in patterns (e.g. LIKE, RLIKE, etc.).

2.1.5.15.3 Creating a saved query

A saved query is created using the SAVE_QUERY utility command.

2.1.5.15.3.1 Saving a simple query

    t=> SELECT SAVE_QUERY('select_all','SELECT * FROM nba');
    executed

2.1.5.15.3.2 Saving a parametrized query

Use ? placeholders for values that will be supplied later, at execution time.

    t=> SELECT SAVE_QUERY('select_by_weight_and_team','SELECT * FROM nba WHERE Weight > ? AND Team = ?');
    executed

2.1.5.15.4 Listing and executing saved queries

Saved queries are saved as database objects.
They can be listed in one of two ways:

Using the catalog:

    t=> SELECT * FROM sqream_catalog.savedqueries;
    name                      | num_parameters
    --------------------------+---------------
    select_all                |              0
    select_by_weight          |              1
    select_by_weight_and_team |              2

Using the LIST_SAVED_QUERIES utility function:

    t=> SELECT LIST_SAVED_QUERIES();
    saved_query
    -------------------------
    select_all
    select_by_weight
    select_by_weight_and_team

To execute a saved query, call it by its name in an EXECUTE_SAVED_QUERY statement. A saved query with no parameters is called without parameters.

    t=> SELECT EXECUTE_SAVED_QUERY('select_all');
    Name          | Team           | Number | Position | Age | Height | Weight | College           | Salary
    --------------+----------------+--------+----------+-----+--------+--------+-------------------+--------
    Avery Bradley | Boston Celtics |      0 | PG       |  25 | 6-2    |    180 | Texas             | 7730337
    Jae Crowder   | Boston Celtics |     99 | SF       |  25 | 6-6    |    235 | Marquette         | 6796117
    John Holland  | Boston Celtics |     30 | SG       |  27 | 6-5    |    205 | Boston University |
    R.J. Hunter   | Boston Celtics |     28 | SG       |  22 | 6-5    |    185 | Georgia State     | 1148640
    [...]

Executing a saved query with parameters requires specifying the parameters in the order they appear in the query:

    t=> SELECT EXECUTE_SAVED_QUERY('select_by_weight_and_team', 240, 'Toronto Raptors');
    Name              | Team            | Number | Position | Age | Height | Weight | College     | Salary
    ------------------+-----------------+--------+----------+-----+--------+--------+-------------+--------
    Bismack Biyombo   | Toronto Raptors |      8 | C        |  23 | 6-9    |    245 |             | 2814000
    James Johnson     | Toronto Raptors |      3 | PF       |  29 | 6-9    |    250 | Wake Forest | 2500000
    Jason Thompson    | Toronto Raptors |      1 | PF       |  29 | 6-11   |    250 | Rider       | 245177
    Jonas Valanciunas | Toronto Raptors |     17 | C        |  24 | 7-0    |    255 |             | 4660482

2.1.5.15.5 Dropping a saved query

When you're done with a saved query, or would like to replace it with another, you can drop it with DROP_SAVED_QUERY:

    t=> SELECT DROP_SAVED_QUERY('select_all');
    executed
    t=> SELECT DROP_SAVED_QUERY('select_by_weight_and_team');
    executed
    t=> SELECT LIST_SAVED_QUERIES();
    saved_query
    ----------------
    select_by_weight

2.1.5.16 Seeing system objects as DDL

2.1.5.16.1 Dump specific objects

2.1.5.16.1.1 Tables

See GET_DDL for more information.

Examples

2.1.5.16.1.2 Getting the DDL for a table

    farm=> SELECT GET_DDL('cool_animals');
    create table "public"."cool_animals" (
      "id" int not null,
      "name" varchar(30) not null,
      "weight" double null,
      "is_agressive" bool default false not null
      )
    ;

2.1.5.16.1.3 Exporting table DDL to a file

    COPY (SELECT GET_DDL('cool_animals')) TO '/home/rhendricks/animals.ddl';

2.1.5.16.1.4 Views

See GET_VIEW_DDL for more information.

Examples

2.1.5.16.1.5 Listing all views

    farm=> SELECT view_name FROM sqream_catalog.views;
    view_name
    ----------------------
    angry_animals
    only_agressive_animals

2.1.5.16.1.6 Getting the DDL for a view

    farm=> SELECT GET_VIEW_DDL('angry_animals');
    create view "public".angry_animals as
      select
          "cool_animals"."id" as "id",
          "cool_animals"."name" as "name",
          "cool_animals"."weight" as "weight",
          "cool_animals"."is_agressive" as "is_agressive"
        from
          "public".cool_animals as cool_animals
        where
          "cool_animals"."is_agressive" = false;

2.1.5.16.1.7 Exporting view DDL to a file

    COPY (SELECT GET_VIEW_DDL('angry_animals')) TO '/home/rhendricks/angry_animals.sql';

2.1.5.16.1.8 User defined functions

See GET_FUNCTION_DDL for more information.
Examples

2.1.5.16.1.9 Listing all UDFs

    master=> SELECT * FROM sqream_catalog.user_defined_functions;
    database_name | function_id | function_name
    --------------+-------------+--------------
    master        |           1 | my_distance

2.1.5.16.1.10 Getting the DDL for a function

    master=> SELECT GET_FUNCTION_DDL('my_distance');
    create function "my_distance" (x1 float, y1 float, x2 float, y2 float) returns float as $$
    import math
    if y1 < x1:
        return 0.0
    else:
        return math.sqrt((y2 - y1) ** 2 + (x2 - x1) ** 2)
    $$
    language python volatile;

2.1.5.16.1.11 Exporting function DDL to a file

    COPY (SELECT GET_FUNCTION_DDL('my_distance')) TO '/home/rhendricks/my_distance.sql';

2.1.5.16.1.12 Saved queries

See LIST_SAVED_QUERIES, SHOW_SAVED_QUERY for more information.

2.1.5.16.2 Dump entire database DDLs

Dumping the database DDL includes tables and views, but not UDFs and saved queries.

See DUMP_DATABASE_DDL for more information.

Examples

2.1.5.16.2.1 Exporting database DDL to a client

    farm=> SELECT DUMP_DATABASE_DDL();
    create table "public"."cool_animals" (
      "id" int not null,
      "name" varchar(30) not null,
      "weight" double null,
      "is_agressive" bool default false not null
      )
    ;
    create view "public".angry_animals as
      select
          "cool_animals"."id" as "id",
          "cool_animals"."name" as "name",
          "cool_animals"."weight" as "weight",
          "cool_animals"."is_agressive" as "is_agressive"
        from
          "public".cool_animals as cool_animals
        where
          "cool_animals"."is_agressive" = false;

2.1.5.16.2.2 Exporting database DDL to a file

    COPY (SELECT DUMP_DATABASE_DDL()) TO '/home/rhendricks/database.ddl';

Note: To export data in tables, see COPY TO.

2.1.6 Architecture

This topic includes guides that walk an end-user, database administrator, or system architect through the main ideas behind SQream DB.
While SQream DB has many similarities to other database management systems, it has some unique and additional capabilities.

Explore the guides below for information about SQream DB's architecture.

2.1.6.1 Internals and architecture

2.1.6.1.1 SQream DB internals

Here is a high level architecture diagram of SQream DB's internals.

2.1.6.1.1.1 Statement compiler

The statement compiler is written in Haskell. It takes SQL text and produces an optimised statement plan.

2.1.6.1.1.2 Concurrency and concurrency control

The execution engine in SQream DB is built around thread workers with message passing. It uses threads to overlap different kinds of operations (including IO and GPU operations with CPU operations), and to accelerate CPU intensive operations.

2.1.6.1.1.3 Transactions

SQream DB has serializable transactions, with these features:

• Serializable, with any kind of statement
• Run multiple SELECT queries concurrently with anything
• Run multiple inserts to the same table at the same time
• Cannot run multiple statements in a single transaction
• Other operations such as DELETE, TRUNCATE, and DDL use coarse-grained exclusive locking.

2.1.6.1.1.4 Storage

The storage is split into the metadata layer and an append-only, garbage-collected bulk data layer.

2.1.6.1.1.5 Metadata layer

The metadata layer uses LevelDB, and uses LevelDB's snapshot and atomic write features as part of the transaction system.

The metadata layer, together with the append-only bulk data layer, helps ensure consistency.

2.1.6.1.1.6 Bulk data layer

The bulk data layer is composed of extents, which are optimised for IO performance as much as possible. Inside the extents are chunks, which are optimised for processing in the CPU and GPU. Compression is used in the extents and chunks.
When you run small inserts, you get less optimised chunks and extents. The system is designed both to run efficiently on these and to reorganise them transactionally in the background, without blocking DML operations. By writing small chunks in small inserts and reorganising them later, it supports both fast medium-sized insert transactions and fast querying.

2.1.6.1.1.7 Building blocks

The heavy lifting in SQream DB is done by single purpose C++/CUDA building blocks.

These are purposely designed to not be smart - they have to be instructed exactly what to do. Most of the intelligence in piecing things together is in the statement compiler.

2.1.6.1.2 Columnar

Like many other analytical database management systems, SQream DB uses a column store for tables.

Column stores offer better I/O and performance with analytic workloads. Columns also compress much better, and lend themselves well to bulk data.

2.1.6.1.3 GPU usage

SQream DB uses GPUs for accelerating database operations. This acceleration brings additional benefit to columnar data processing.

SQream DB's GPU acceleration is integral to database operations. It is not an additional feature, but rather core to most data operations, e.g. GROUP BY, scalar functions, JOIN, ORDER BY, and more.

Using a GPU is an extended form of SIMD (single instruction, multiple data) intended for high throughput operations. When GPU acceleration is used, SQream DB uses special building blocks to take advantage of the high degree of parallelism of the GPU. This means that GPU operations use a single instruction that runs on multiple values.

2.1.6.2 Filesystem and usage

SQream DB writes and reads data from disk.

The SQream DB storage directory, sometimes referred to as a storage cluster, is a collection of database objects, metadata database, and logs.
Each SQream DB worker and the metadata server must have access to the storage cluster in order to function properly.

2.1.6.2.1 Directory organization

The cluster root is the directory in which all data for SQream DB is stored.

SQream DB storage cluster directories:

• databases
• metadata or leveldb
• temp
• logs

2.1.6.2.1.1 databases

The databases directory houses all of the actual data in tables and columns.

Each database is stored as its own directory. Each table is stored under its respective database, and columns are stored in their respective table.

In the example above, the database named retail contains a table directory with a directory named 23.

Tip: To find table IDs, use a catalog query:

    master=> SELECT table_name, table_id FROM sqream_catalog.tables WHERE table_name = 'customers';
    table_name | table_id
    -----------+---------
    customers  |       23

Each table directory contains a directory for each physical column. An SQL column may be built up of several physical columns (e.g. if the data type is nullable).

Tip: To find column IDs, use a catalog query:

    master=> SELECT column_id, column_name FROM sqream_catalog.columns WHERE table_id=23;
    column_id | column_name
    ----------+------------
            0 | name@null
            1 | name@val
            2 | age@null
            3 | age@val
            4 | email@null
            5 | email@val

Each column directory will contain extents, which are collections of chunks.

2.1.6.2.1.2 metadata or leveldb

SQream DB's metadata is an embedded key-value store, based on LevelDB. LevelDB helps SQream DB ensure efficient storage for keys, handle atomic writes, snapshots, durability, and automatic recovery.

The metadata is where all database objects are stored, including roles, permissions, database and table structures, chunk mappings, and more.

2.1.6.2.1.3 temp

The temp directory is where SQream DB writes temporary data.
The directory to which SQream DB writes temporary data can be changed to any other directory on the filesystem. SQream recommends remapping this directory to fast local storage to get better performance when executing intensive larger-than-RAM operations like sorting. SQream recommends an SSD or NVMe drive, in a mirrored RAID 1 configuration.

To redirect the temp folder, set the tempPath setting in the configuration file.

2.1.6.2.1.4 logs

The logs directory contains logs produced by SQream DB.

See more about the logs in the Logging guide.

CHAPTER 3 Reference

This section provides reference for using SQream DB's interfaces and SQL features.

3.1 Data Types

This topic describes the data types that SQream DB supports, and how to convert between them.

In this topic:

• Supported Types
• Converting and Casting Types
• Data Type Reference
  – Numeric (NUMERIC, DECIMAL)
  – Boolean (BOOL)
  – Integers (TINYINT, SMALLINT, INT, BIGINT)
  – Floating Point (REAL, DOUBLE)
  – String (TEXT, VARCHAR)
  – Date (DATE, DATETIME)

3.1.1 Supported Types

The following table shows the supported data types.
Name                     | Description                                                      | Data Size (Not Null, Uncompressed) | Example
-------------------------+------------------------------------------------------------------+------------------------------------+------------------------------------------
BOOL                     | Boolean values (true, false)                                     | 1 byte                             | true
TINYINT                  | Unsigned integer (0 - 255)                                       | 1 byte                             | 5
SMALLINT                 | Integer (-32,768 - 32,767)                                       | 2 bytes                            | -155
INT                      | Integer (-2,147,483,648 - 2,147,483,647)                         | 4 bytes                            | 1648813
BIGINT                   | Integer (-9,223,372,036,854,775,808 - 9,223,372,036,854,775,807) | 8 bytes                            | 36124441255243
REAL                     | Floating point (inexact)                                         | 4 bytes                            | 3.141
DOUBLE (FLOAT)           | Floating point (inexact)                                         | 8 bytes                            | 0.000003
TEXT [(n)], NVARCHAR (n) | Variable length string - UTF-8 unicode                           | Up to 4*n bytes                    | ''
NUMERIC, DECIMAL         | 38 digits                                                        | 16 bytes                           | 0.123245678901234567890123456789012345678
VARCHAR (n)              | Variable length string - ASCII only                              | n bytes                            | 'Kiwis have tiny wings, but cannot fly.'
DATE                     | Date                                                             | 4 bytes                            | '1955-11-05'
DATETIME (TIMESTAMP)     | Date and time pairing in UTC                                     | 8 bytes                            | '1955-11-05 01:24:00.000'

Note: SQream DB compresses all columns and types. The data size noted is the maximum data size allocation for uncompressed data.

3.1.2 Converting and Casting Types

SQream DB supports explicit and implicit casting and type conversion. The system may automatically add implicit casts when combining different data types in the same expression. In many cases the details of these casts are unimportant, but they can affect the results of a query. When necessary, an explicit cast can be used to override the automatic cast added by SQream DB.

For example, the ANSI standard defines a SUM() aggregation over an INT column as an INT. However, when dealing with large amounts of data this could cause an overflow.

You can rectify this by casting the value to a larger data type:

    SUM(some_int_column :: BIGINT)

SQream DB supports the following three data conversion types:

• CAST(

3.1.3 Data Type Reference

3.1.3.1 Numeric (NUMERIC, DECIMAL)

The Numeric data type (also known as Decimal) is recommended for values that tend to occur as exact decimals, such as in finance.
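As a rough illustration of why exact decimals matter for values such as currency, the sketch below uses Python's standard decimal module as a stand-in for NUMERIC and Python's float as a stand-in for DOUBLE. It does not run against SQream DB, and the helper name sum_prices is made up for this example.

```python
from decimal import Decimal

def sum_prices(prices):
    """Add up prices given as strings, once as binary floats and once as exact decimals."""
    as_float = sum(float(p) for p in prices)      # DOUBLE-like: binary approximation
    as_decimal = sum(Decimal(p) for p in prices)  # NUMERIC-like: exact decimal digits
    return as_float, as_decimal

float_total, decimal_total = sum_prices(["0.10", "0.20"])
print(float_total == 0.3)                # False: 0.1 + 0.2 is not exactly 0.3 in binary
print(decimal_total == Decimal("0.30"))  # True: the decimal sum is exact
```

The same effect is what makes an exact type preferable when sums must match to the last digit, at the cost of slower arithmetic.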
While Numeric has a fixed precision of 38, higher than REAL (9) or DOUBLE (17), it runs calculations more slowly. For operations that require faster performance, using Floating Point is recommended.

The correct syntax for Numeric is numeric(p, s), where p is the total number of digits (38 maximum), and s is the number of digits after the decimal point.

3.1.3.1.1 Numeric Examples

The following is an example of the Numeric syntax:

    $ create or replace table t(x numeric(20, 10), y numeric(38, 38));
    $ insert into t values(1234567890.1234567890, 0.123245678901234567890123456789012345678);
    $ select x + y from t;

The following table shows information relevant to the Numeric data type:

Description | Data Size (Not Null, Uncompressed) | Example
------------+------------------------------------+------------------------------------------
38 digits   | 16 bytes                           | 0.123245678901234567890123456789012345678

Numeric supports the following operations:

• All join types.
• All aggregation types (not including Window functions).
• Scalar functions (not including some trigonometric and logarithmic functions).

3.1.3.2 Boolean (BOOL)

The following table describes the Boolean data type.

Values                       | Syntax                                                                          | Data Size (Not Null, Uncompressed)
-----------------------------+---------------------------------------------------------------------------------+-----------------------------------------------------------------------------
true, false (case sensitive) | When loading from CSV, BOOL columns can accept 0 as false and 1 as true.        | 1 byte, but resulting average data sizes may be lower after compression.

3.1.3.2.1 Boolean Examples

The following is an example of the Boolean syntax:

    CREATE TABLE animals (name TEXT, is_angry BOOL);
    INSERT INTO animals VALUES ('fox',true), ('cat',true), ('kiwi',false);
    SELECT name, CASE WHEN is_angry THEN 'Is really angry!' else 'Is not angry' END FROM animals;

The following is an example of the correct output:

    "fox","Is really angry!"
    "cat","Is really angry!"
    "kiwi","Is not angry"

3.1.3.2.2 Boolean Casts and Conversions

The following table shows the possible Boolean value conversions:

Type                           | Details
-------------------------------+------------------------
TINYINT, SMALLINT, INT, BIGINT | true → 1, false → 0
REAL, DOUBLE                   | true → 1.0, false → 0.0

3.1.3.3 Integers (TINYINT, SMALLINT, INT, BIGINT)

Integer data types are designed to store whole numbers.

For more information about identity sequences (sometimes called auto-increment or auto-numbers), see Identity.

3.1.3.3.1 Integer Types

The following table describes the Integer types.

Name     | Details                                                          | Data Size (Not Null, Uncompressed) | Example
---------+------------------------------------------------------------------+------------------------------------+---------------
TINYINT  | Unsigned integer (0 - 255)                                       | 1 byte                             | 5
SMALLINT | Integer (-32,768 - 32,767)                                       | 2 bytes                            | -155
INT      | Integer (-2,147,483,648 - 2,147,483,647)                         | 4 bytes                            | 1648813
BIGINT   | Integer (-9,223,372,036,854,775,808 - 9,223,372,036,854,775,807) | 8 bytes                            | 36124441255243

The following table describes the Integer data type.

Syntax                                                             | Data Size (Not Null, Uncompressed)
-------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------
An integer can be entered as a regular literal, such as 12, -365.  | Integer types occupy 1, 2, 4, or 8 bytes, but resulting average data sizes could be lower after compression.

3.1.3.3.2 Integer Examples

The following is an example of the Integer syntax:

    CREATE TABLE cool_numbers (a INT NOT NULL, b TINYINT, c SMALLINT, d BIGINT);
    INSERT INTO cool_numbers VALUES (1,2,3,4), (-5, 127, 32000, 45000000000);
    SELECT * FROM cool_numbers;

The following is an example of the correct output:

    1,2,3,4
    -5,127,32000,45000000000

3.1.3.3.3 Integer Casts and Conversions

The following table shows the possible Integer value conversions:

Type                                                           | Details
---------------------------------------------------------------+--------------------------
REAL, DOUBLE                                                   | 1 → 1.0, -32 → -32.0
VARCHAR(n) (All numeric values must fit in the string length)  | 1 → '1', 2451 → '2451'

3.1.3.4 Floating Point (REAL, DOUBLE)

The Floating Point data types (REAL and DOUBLE) store extremely close value approximations, and are therefore recommended for values that tend to be inexact, such as those written in scientific notation.
While Floating Point generally runs faster than Numeric, it has a lower precision of 9 (REAL) or 17 (DOUBLE) compared to Numeric's 38. For operations that require a higher level of precision, using Numeric is recommended.

The floating point representation is based on IEEE 754.

3.1.3.4.1 Floating Point Types

The following table describes the Floating Point data types.

Name   | Details                                  | Data Size (Not Null, Uncompressed) | Example
-------+------------------------------------------+------------------------------------+---------
REAL   | Single precision floating point (inexact) | 4 bytes                            | 3.141
DOUBLE | Double precision floating point (inexact) | 8 bytes                            | 0.000003

The following table shows information relevant to the Floating Point data types.

Aliases: DOUBLE is also known as FLOAT.

Syntax: A double precision floating point can be entered as a regular literal, such as 3.14, 2.718, .34, or 2.71e-45. To enter a REAL floating point number, cast the value. For example, (3.14 :: REAL).

Data Size (Not Null, Uncompressed): Floating point types are either 4 or 8 bytes, but size could be lower after compression.

3.1.3.4.2 Floating Point Examples

    CREATE TABLE cool_numbers (a REAL NOT NULL, b DOUBLE);
    INSERT INTO cool_numbers VALUES (1,2), (3.14159265358979, 2.718281828459);
    SELECT * FROM cool_numbers;

    1.0,2.0
    3.1415927,2.718281828459

Note: Most SQL clients control display precision of floating point numbers, and values may appear differently in some clients.

3.1.3.4.3 Floating Point Casts and Conversions

The following table shows the possible Floating Point value conversions:

Type                           | Details
-------------------------------+------------------------------------------------------------------------------
BOOL                           | 1.0 → true, 0.0 → false
TINYINT, SMALLINT, INT, BIGINT | 2.0 → 2, 3.14159265358979 → 3, 2.718281828459 → 2, 0.5 → 0, 1.5 → 1
VARCHAR(n) (n > 6 recommended) | 1 → '1.0000', 3.14159265358979 → '3.1416'

Note: As shown in the above examples, casting a floating point value to an integer rounds down.

3.1.3.5 String (TEXT, VARCHAR)

TEXT and VARCHAR are types designed for storing text or strings of characters.
SQream DB separates ASCII (VARCHAR) and UTF-8 representations (TEXT).

Note: The data type NVARCHAR has been deprecated in favor of TEXT as of version 2020.1.

3.1.3.5.1 String Types

The following table describes the String types.

Name                     | Details                                                                         | Data Size (Not Null, Uncompressed) | Example
-------------------------+---------------------------------------------------------------------------------+------------------------------------+-----------------------------------------
TEXT [(n)], NVARCHAR (n) | Variable length string - UTF-8 unicode. NVARCHAR is synonymous with TEXT.       | Up to 4*n bytes                    | ''
VARCHAR (n)              | Variable length string - ASCII only                                             | n bytes                            | 'Kiwis have tiny wings, but cannot fly.'

3.1.3.5.2 Length

When using TEXT, specifying a size is optional. If not specified, the text field carries no constraints. To limit the size of the input, use VARCHAR(n) or TEXT(n), where n is the permitted number of characters.

The following apply to setting the String type length:

• If the data exceeds the column length limit on INSERT or COPY operations, SQream DB will return an error.
• When casting or converting, the string has to fit in the target. For example, 'Kiwis are weird birds' :: VARCHAR(5) will return an error. Use SUBSTRING to truncate the length of the string.
• VARCHAR strings are padded with spaces.

3.1.3.5.3 Syntax

String types can be written with standard SQL string literals, which are enclosed with single quotes, such as 'Kiwi bird'. To include a single quote in the string, write two consecutive single quotes, such as 'Kiwi bird''s wings are tiny'.

String literals can also be dollar-quoted with the dollar sign $. For example, $$Kiwi bird's wings are tiny$$ is the same as 'Kiwi bird''s wings are tiny'.

3.1.3.5.4 Size

VARCHAR(n) can occupy up to n bytes, whereas TEXT(n) can occupy up to 4*n bytes. However, the size of strings is variable and is compressed by SQream DB.
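The size bounds above follow from how UTF-8 encodes characters: an ASCII character takes one byte, and any Unicode character takes at most four. The short Python sketch below makes this concrete. It only measures encoded strings in Python; utf8_size is a made-up helper, not a SQream DB function.

```python
def utf8_size(value):
    """Return (character count, UTF-8 byte count) for a string."""
    return len(value), len(value.encode("utf-8"))

for sample in ["Kiwi bird", "Köln", "日本語", "🥝"]:
    chars, size = utf8_size(sample)
    # Each character needs between 1 and 4 bytes, so n characters need at most 4*n bytes
    assert chars <= size <= 4 * chars
    print(f"{sample!r}: {chars} characters, {size} bytes")
```

For pure ASCII input the two counts are equal, which is why an ASCII-only VARCHAR(n) needs no more than n bytes.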
3.1.3.5.5 String Examples

The following is an example of the String syntax:

    CREATE TABLE cool_strings (a TEXT NOT NULL, b TEXT);
    INSERT INTO cool_strings VALUES ('hello world', 'Hello to kiwi birds specifically');
    INSERT INTO cool_strings VALUES ('This is ASCII only', 'But this column can contain ');
    SELECT * FROM cool_strings;

The following is an example of the correct output:

    hello world ,Hello to kiwi birds specifically
    This is ASCII only,But this column can contain

Note: Most clients control the display precision of floating point numbers, and values may appear differently in some clients.

3.1.3.5.6 String Casts and Conversions

The following table shows the possible String value conversions:

Type                           | Details
-------------------------------+------------------------------------------------------------------------------------------------------------------------------
BOOL                           | 'true' → true, 'false' → false
TINYINT, SMALLINT, INT, BIGINT | '2' → 2, '-128' → -128
REAL, DOUBLE                   | '2.0' → 2.0, '3.141592' → 3.141592
DATE, DATETIME                 | Requires a supported format, such as '1955-11-05' → date '1955-11-05', '1955-11-05 01:24:00.000' → '1955-11-05 01:24:00.000'

3.1.3.6 Date (DATE, DATETIME)

DATE is a type designed for storing year, month, and day. DATETIME is a type designed for storing year, month, day, hour, minute, seconds, and milliseconds in UTC with 1 millisecond precision.

3.1.3.6.1 Date Types

The following table describes the Date types.

Table 1: Date types

Name     | Details                      | Data Size (Not Null, Uncompressed) | Example
---------+------------------------------+------------------------------------+--------------------------
DATE     | Date                         | 4 bytes                            | '1955-11-05'
DATETIME | Date and time pairing in UTC | 8 bytes                            | '1955-11-05 01:24:00.000'

3.1.3.6.2 Aliases

DATETIME is also known as TIMESTAMP.

3.1.3.6.3 Syntax

DATE values are formatted as string literals. For example, '1955-11-05' or date '1955-11-05'.

DATETIME values are formatted as string literals conforming to ISO 8601, for example '1955-11-05 01:26:00'.

SQream DB attempts to guess if the string literal is a date or datetime based on context, for example when used in date-specific functions.
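As an illustration of telling the two literal forms apart, the following sketch parses a string as a datetime when it has a time part and as a date otherwise, and widens a date to a datetime at midnight. It uses Python's datetime module with hypothetical helper names (parse_literal, date_to_datetime); SQream DB's own context-based inference is not shown here and may behave differently.

```python
from datetime import date, datetime

def parse_literal(text):
    """Parse 'YYYY-MM-DD' as a date and 'YYYY-MM-DD hh:mm[:ss]' as a datetime."""
    text = text.strip()
    if " " in text:
        return datetime.fromisoformat(text)  # a time part is present
    return date.fromisoformat(text)          # date only

def date_to_datetime(value):
    """Widen a date to a datetime at midnight (an assumption mirroring a DATE to DATETIME cast)."""
    return datetime.combine(value, datetime.min.time())

print(parse_literal("1955-11-05"))                    # 1955-11-05
print(parse_literal("1955-11-05 01:26:00"))           # 1955-11-05 01:26:00
print(date_to_datetime(parse_literal("1997-01-01")))  # 1997-01-01 00:00:00
```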
3.1.3.6.4 Size

A DATE column is 4 bytes in length, while a DATETIME column is 8 bytes in length. However, the size of these values is compressed by SQream DB.

3.1.3.6.5 Date Examples

The following is an example of the Date syntax:

    CREATE TABLE important_dates (a DATE, b DATETIME);
    INSERT INTO important_dates VALUES ('1997-01-01', '1955-11-05 01:24');
    SELECT * FROM important_dates;

The following is an example of the correct output:

    1997-01-01,1955-11-05 01:24:00.0

The following is an example of the Datetime syntax:

    SELECT a :: DATETIME, b :: DATE FROM important_dates;

The following is an example of the correct output:

    1997-01-01 00:00:00.0,1955-11-05

Warning: Some client applications may alter the DATETIME value by modifying the timezone.

3.1.3.6.6 Date Casts and Conversions

The following table shows the possible DATE and DATETIME value conversions:

Type       | Details
-----------+---------------------------------------------------------------------------------
VARCHAR(n) | '1997-01-01' → '1997-01-01', '1955-11-05 01:24' → '1955-11-05 01:24:00.000'

3.2 SQL Statements and Syntax

This section provides reference for using SQream DB's SQL statements - DDL commands, DML commands and SQL query syntax.

3.2.1 SQL Syntax Features

SQream DB supports SQL from the ANSI 92 syntax.

3.2.1.1 Keywords and Identifiers

SQL statements are made up of three components:

• Keywords: Keywords have specific meaning in SQL, like SELECT, CREATE, WHERE, etc.
• Identifiers: Names for things like tables, columns, databases, etc.

3.2.1.1.1 Keywords

Keywords are special words that make up SQream DB's SQL syntax, and have a meaning in a statement.

Keywords are not allowed as identifiers. A full list of reserved keywords can be found at the end of this document.

3.2.1.1.2 Identifiers

Identifiers are typically used as database object names, such as databases, tables, views or columns. Because of their use, they are often also called "names".

Identifiers can also be used to change a column name in the result (column alias) in a SELECT statement.
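The case-handling rules in the next subsection can be pictured with a small sketch: unquoted identifiers are lowercased, while quoted identifiers keep their exact spelling. This is only a Python model of the documented behavior; normalize_identifier is a hypothetical helper, and it deliberately ignores the other rules (length limit, forbidden characters, keywords).

```python
def normalize_identifier(name):
    """Model how a name is stored: quoted names keep their case, unquoted names are lowercased."""
    if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
        return name[1:-1]  # quoted: stored case-sensitively, without the quotes
    return name.lower()    # unquoted: converted to lowercase

print(normalize_identifier('Cool_Animals'))    # cool_animals
print(normalize_identifier('"Cool_Animals"'))  # Cool_Animals
```

Under this model, Cool_Animals and cool_animals name the same object, while "Cool_Animals" names a different one.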
3.2.1.1.2.1 Identifier rules

An identifier can be either quoted or unquoted, and can be at most 128 characters long.

Regular (unquoted) identifiers follow these rules:

1. Do not contain any special characters, apart from an underscore (_).
2. Are case insensitive. SQream DB converts all identifiers to lowercase unless quoted.
3. Do not equal any keyword, like SELECT, OR, AND, etc.

To bypass these limitations, use a quoted identifier by surrounding the identifier with double quotes (").

Quoted identifiers follow these rules:

• Surrounded with double quotes (").
• May contain any ASCII character, except @, $ or ".
• Quoted identifiers are stored case-sensitive and must be referenced with double quotes.

3.2.1.1.3 Reserved keywords

Keyword
-------
ALL               ANALYSE           ANALYZE           AND               ANY               ARRAY
AS                ASC               AUTHORIZATION     BINARY            BOTH              CASE
CAST              CHECK             COLLATE           COLUMN            CONCURRENTLY      CONSTRAINT
CREATE            CROSS             CURRENT_CATALOG   CURRENT_ROLE      CURRENT_TIME      CURRENT_USER
DEFAULT           DEFERRABLE        DESC              DISTINCT          DO                ELSE
END               EXCEPT            FALSE             FETCH             FOR               FREEZE
FROM              FULL              GRANT             GROUP             HASH              HAVING
ILIKE             IN                INITIALLY         INNER             INTERSECT         INTO
IS                ISNULL            JOIN              LEADING           LEFT              LIKE
LIMIT             LOCALTIME         LOCALTIMESTAMP    LOOP              MERGE             NATURAL
NOT               NOTNULL           NULL              OFFSET            ON                ONLY
OPTION            OR                ORDER             OUTER             OVER              OVERLAPS
PLACING           PRIMARY           REFERENCES        RETURNING         RIGHT             RLIKE
SELECT            SESSION_USER      SIMILAR           SOME              SYMMETRIC         TABLE
THEN              TO                TRAILING          TRUE              UNION             UNIQUE
USER              USING             VARIADIC          VERBOSE           WHEN              WHERE
WINDOW            WITH

3.2.1.2 Literals

Literals represent constant values.
SQream DB contains the following types of literals: • Numeric literals - define numbers such as 1.3, -5 • String literals - define text values like 'Foxes are cool', '1997-01-01' • Typed literals - define values with explicit types like (3.0 :: float) • Boolean literals - define values that include true and false • Other constants - predefined values like NULL or TRUE 3.2.1.2.1 Numeric Literals Numeric literals can be expressed as follows: number_literal ::= [+-] digits | digits . [ digits ] [ e [+-] digits ] | [ digits ] . digits [ e [+-] digits ] | digits e[+-]digits 3.2.1.2.1.1 Examples 1234 1234.56 12. .34 (continues on next page) 3.2. SQL Statements and Syntax 381 SQream DB, Release 2021.1 (continued from previous page) 123.56e-45 0.23 3.141 42 Note: The actual data type of the value changes based on context, the format used, and the value itself. For example, any number containing the decimal point will be considered FLOAT by default. Any whole number will considered INT, unless the value is larger than the maximum value, in which case the type will become a BIGINT. Note: A numeric literal that contains neither a decimal point nor an exponent is considered INT by default if its value fits in type INT (32 bits). If not, it is considered BIGINT by default if its value fits in type BIGINT (64 bits). If neither are true, it is considered FLOAT. Literals that contain decimal points and/or exponents are always considered FLOAT. 3.2.1.2.2 String Literals String literals are string (text) values, encoded either in ASCII or UTF-8. String literals are surrounded by single quotes (') or dollars ($$) Tip: To use a single quote in a string, use a repeated single quote. 3.2.1.2.2.1 Examples 'This is an example of a string' 'Hello? Is it me you''re looking for?' 
    -- Repeated single quotes are treated as a single quote
    $$That is my brother's company's CEO's son's dog's toy$$ -- Dollar-quoted
    '1997-01-01' -- This is a string

The actual data type of the value changes based on context, the format used, and the value itself. In the example below, the first value is interpreted as a DATE, while the second is interpreted as a VARCHAR.

    INSERT INTO cool_dates(date_col, reason) VALUES ('1955-11-05', 'Doc Brown discovers flux capacitor');

This section describes the following types of literals:

• Regular string literals
• Dollar-quoted string literals
• Escaped string literals

3.2.1.2.2.2 Regular String Literals

In SQL, a regular string literal is a sequence of zero or more characters bound by single quotes ('):

    'This is a string'

You can include a single-quote character in a string literal by writing two consecutive single quotes (''):

    'Dianne''s horse'

Note that two adjacent single quotes are not the same as a double-quote character (").

3.2.1.2.2.3 Examples

The following are some examples of regular string literals:

    '123'
    ''
    'a''b'
    ''

3.2.1.2.2.4 Dollar-Quoted String Literals

A dollar-quoted string literal consists of a dollar sign ($), an optional "tag" of zero or more characters, another dollar sign, an arbitrary sequence of characters that make up the string content, a dollar sign, the same tag that began the dollar quote, and another dollar sign.

3.2.1.2.2.5 Examples

For example, below are two different ways to specify the string Dianne's horse using dollar-quoted string literals:

    $$Dianne's horse$$
    $

Note that you can use single quotes inside the dollar-quoted string without an escape. Because the string is always written literally, you do not need to escape any characters inside a dollar-quoted string. Backslashes and dollar signs have no special meaning unless they are part of a sequence matching the opening tag.
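The two quoting styles can be compared side by side. The following is a minimal sketch only: it reuses the cool_dates table from the example above, whose column types (a DATE column and a text column) are assumed here, and the stored text is illustrative. Both statements are intended to store the same string value.

```sql
-- Regular string literal: the embedded single quote is doubled
INSERT INTO cool_dates (date_col, reason)
VALUES ('1955-11-05', 'Doc Brown''s flux capacitor idea');

-- Dollar-quoted string literal: the single quote is written as-is
INSERT INTO cool_dates (date_col, reason)
VALUES ('1955-11-05', $$Doc Brown's flux capacitor idea$$);
```

Dollar quoting tends to be easier to read when the text contains many single quotes, because nothing inside the literal needs to be doubled.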
Any tags used in a dollar-quoted string follow the same rules as unquoted identifiers, except that they cannot contain a dollar sign. In addition, because tags are case sensitive, $

A dollar-quoted string that follows a keyword or identifier must be separated from it by whitespace (such as spaces, tabs, or newlines). If you do not separate them with whitespace, the dollar-quoting delimiter is taken as part of the preceding identifier.

3.2.1.2.2.6 Escaped String Literals

Because regular string literals do not support inserting special characters (such as new lines), the escaped string literal syntax was added to support inserting special characters with an escaping syntax.

In addition to being enclosed by single quotes (e.g. 'abc'), escaped string literals are preceded by a capital E.

    E'abc'

The character sequence inside the single quotes can contain escaped characters in addition to regular characters, as shown below:

    Sequence                        Interpretation
    \b                              Inserts a backspace.
    \f                              Inserts a form feed.
    \n                              Inserts a newline.
    \r                              Inserts a carriage return.
    \t                              Inserts a tab.
    \o, \oo, \ooo (o = 0 - 7)       Inserts an octal byte value. This sequence is currently not supported.
    \xh, \xhh (h = 0 - 9, A - F)    Inserts a hexadecimal byte value. This sequence is currently not supported.
    \uxxxx, \Uxxxxxxxx              Inserts a 16 or 32-bit hexadecimal unicode character value (x = 0 - 9, A - F).

Excluding the characters in the table above, escaped string literals take all other characters following a backslash literally. To include a backslash character, use two consecutive backslashes (\\). You can use a single quote in an escape string by writing \', in addition to the normal method ('').

3.2.1.2.3 Typed Literals

Typed literals allow you to create any data type using either of the following syntaxes:

    CAST(literal AS type_name)
    literal :: type_name

See also Converting and Casting Types for more information about supported casts.
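The two typed-literal syntaxes are interchangeable. The sketch below assumes the cool_dates table used earlier in this section, with date_col declared as DATE; the reason texts are hypothetical illustration values.

```sql
-- CAST form
INSERT INTO cool_dates (date_col, reason)
VALUES (CAST('1985-10-26' AS DATE), 'Written with CAST');

-- :: operator form, intended to produce the same DATE value
INSERT INTO cool_dates (date_col, reason)
VALUES ('1985-10-26' :: DATE, 'Written with the :: operator');

-- A typed literal can also be used in a predicate
SELECT date_col, reason
FROM cool_dates
WHERE date_col = '1985-10-26' :: DATE;
```

Making the type explicit avoids relying on context-based inference, which matters when the same string could be read either as text or as a date.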
3.2.1.2.3.1 Syntax Reference

The following is a syntax reference for typed literals:

    typed_literal ::=
        cast(literal AS type_name)
        | literal :: type_name

    literal ::=
        string_literal
        | number_literal
        | NULL | TRUE | FALSE

    type_name ::=
        BOOL
        | TINYINT
        | SMALLINT
        | INT
        | BIGINT
        | FLOAT
        | REAL
        | DATE
        | DATETIME
        | VARCHAR ( digits )
        | TEXT ( digits )

3.2.1.2.3.2 Examples

    '1955-11-05' :: date
    'TRUE' :: BOOL
    CAST('2300' as BIGINT)
    CAST(42 AS FLOAT)

3.2.1.2.4 Boolean Literals

Boolean literals include the keywords true or false.

3.2.1.2.4.1 Example

    INSERT INTO animals VALUES ('fox',true), ('cat',true), ('kiwi',false);

3.2.1.2.5 Other Constants

The following other constants can be used:

• TRUE and FALSE - interpreted as values of type BOOL.
• NULL - which has no type of its own. The type is inferred from context during query compilation.

3.2.1.3 Scalar expressions

Scalar expressions are functions that calculate a single (scalar) value, even if executed on an entire column. They can be stored as a row value, as opposed to a table value, which is a result set consisting of more than one column and/or row.

A scalar expression can be any one of these:

• Literals
• Column references
• Operators

3.2.1.3.1 Literals

Literals are constant values.

3.2.1.3.2 Column references

A column reference is a column name, alias, or ordinal.

For example, in SELECT name FROM users, the column reference refers to the column titled name.

Column names may be aliased. For example, in SELECT name as "First name" from users, the column reference is the alias "First name", which is quoted to preserve its case and the space.

A column may also be referenced using an ordinal, for example in a GROUP BY or ORDER BY. In the query SELECT AVG(Salary),Team FROM nba GROUP BY 2, the ordinal 2 refers to the second column in the select list, Team.
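The three kinds of column reference can be sketched together as follows. This assumes the nba sample table used in the examples later in this reference, whose column names are quoted and case-sensitive; the alias "Average salary" is an illustrative name.

```sql
-- Reference by column name
SELECT "Name" FROM nba;

-- Reference by alias; the alias is quoted to keep its case and the space
SELECT "Team", AVG("Salary") AS "Average salary"
FROM nba
GROUP BY "Team";

-- Reference by ordinal: 2 is the second select-list column ("Team"),
-- and 1 is the first (the average)
SELECT AVG("Salary"), "Team"
FROM nba
GROUP BY 2
ORDER BY 1 DESC;
```

Ordinals keep a query short, but they depend on select-list order, so reordering the select list silently changes which column is grouped or sorted.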
3.2.1.3.3 Operators

Operators, frequently used for comparison, usually come in three forms:

3.2.1.3.3.1 Unary operator

A prefix or postfix to an expression or literal. For example, -, which is used to negate numbers.

    prefix_unary_operator ::=
        + | - | NOT

    postfix_unary_operator ::=
        IS NULL | IS NOT NULL

3.2.1.3.3.2 Binary operator

Two expressions or literals separated by an operator. For example, +, which is used to add two numbers.

    binary_operator ::=
        . | ^ | * | / | % | + | - | >= | <= | != | <> | ||
        | LIKE | NOT LIKE | RLIKE | NOT RLIKE | < | > | = | OR | AND

3.2.1.3.3.3 Special operators for set membership

    special_operator ::=
        value_expr IN ( value_expr [, ... ])
        | value_expr NOT IN ( value_expr [, ... ])
        | value_expr BETWEEN value_expr AND value_expr
        | value_expr NOT BETWEEN value_expr AND value_expr

These operators return TRUE if the value_expr on the left is a member of the set on the right, or if the value is within the given range.

Note: The data type of the left value_expr must match the type of the right side value_expr.

3.2.1.3.3.4 Comparisons

Binary operators are frequently used to compare values. Comparison operators (< > = <= >= <> !=, IS NULL, IS NOT NULL) always return BOOL.

These operators are:

    Operator    Name
    <           Smaller than
    <=          Smaller than or equal to
    >           Greater than
    >=          Greater than or equal to
    =           Equals
    <> or !=    Not equal to
    IS          Identical to
    IS NOT      Not identical to

Note: NULL values are handled differently than other value expressions:

• NULL is always smaller than anything, including another NULL.
• NULL is never equal to anything, including another NULL (=). To check if a value is null, use IS NULL.

3.2.1.3.3.5 Operator precedence

The table below lists the operators in decreasing order of precedence.

    Precedence    Operator                    Associativity
    Highest       .                           left
                  + - (unary)
                  ^                           left
                  * / %                       left
                  + - (binary)                left
                  ||                          right
                  BETWEEN, IN, LIKE, RLIKE
                  < > = <= >= <> !=
                  IS NULL, IS NOT NULL
                  NOT
                  AND                         left
    Lowest        OR                          left

Tip: Use parentheses to avoid ambiguous situations when using binary operators.

Note: The NOT variations, such as NOT BETWEEN, NOT IN, NOT LIKE, NOT RLIKE, have the same precedence as their non-NOT variations.

3.2.1.4 Joins

Joins combine results from two or more table expressions (tables, external tables, views) to form a new result set.

The combination of these table expressions can be based on a set of conditions, such as equality of columns.

Joins are useful when data in tables are related. For example, when two tables contain one or more columns in common, rows can be matched or correlated with rows from the other table.

3.2.1.4.1 Syntax

    table_ref ::=
        | left_side join_type right_side [ ON value_expr ]
        | table_alias

    join_type ::=
        [ INNER ] [ join_hint ] JOIN
        | LEFT [ OUTER ] [ join_hint ] JOIN
        | RIGHT [ OUTER ] [ join_hint ] JOIN
        | CROSS [ join_hint ] JOIN

    join_hint ::=
        MERGE | LOOP

3.2.1.4.1.1 Join types

3.2.1.4.1.2 Inner join

    left_side [ INNER ] JOIN right_side ON value_expr
    left_side [ INNER ] JOIN right_side USING ( join_column [, ... ])

This is the default join type. Rows from the left_side and right_side that match the condition are returned.

An inner join can also be specified by listing several tables in the FROM clause. Omitting the ON or WHERE will result in a CROSS JOIN, where every row of left_side is matched with every row of right_side.

3.2.1.4.1.3 Left outer join

    left_side LEFT [ OUTER ] JOIN right_side ON value_expr
    left_side LEFT [ OUTER ] JOIN right_side USING ( join_column [, ... ])

Similar to the inner join, but for every row of left_side that does not match, NULL values are returned for the columns of right_side.

3.2.1.4.1.4 Right outer join

    left_side RIGHT [ OUTER ] JOIN right_side ON value_expr
    left_side RIGHT [ OUTER ] JOIN right_side USING ( join_column [, ... ])

Similar to the inner join, but for every row of right_side that does not match, NULL values are returned for the columns of left_side.

3.2.1.4.1.5 Cross join

    left_side CROSS JOIN right_side

A cartesian product: all rows match on all sides. For every combination of a row from left_side and a row from right_side, the result set will contain a row with all columns from left_side and all columns from right_side.

The CROSS JOIN can't have an ON clause, but WHERE can be used to limit the result set.

3.2.1.4.1.6 Join conditions

3.2.1.4.1.7 ON value_expr

The ON condition is an expression that evaluates to a boolean to identify whether the rows match.

For example, ON left_side.name = right_side.name matches when both name columns match.

For LEFT and RIGHT joins, the ON clause is optional. However, if it is not specified, the result is a computationally intensive CROSS JOIN.

Tip: SQream DB does not support the USING syntax. However, queries can be easily rewritten. left_side JOIN right_side using (name) is equivalent to ON left_side.name = right_side.name

3.2.1.4.2 Examples

Assume a pair of tables with the following structure and content:

    CREATE TABLE left_side (x INT);
    INSERT INTO left_side VALUES (1), (2), (4), (5);

    CREATE TABLE right_side (x INT);
    INSERT INTO right_side VALUES (2), (3), (4), (5), (6);

3.2.1.4.2.1 Inner join

In inner joins, values that are not matched do not appear in the result set.

    t=> SELECT * FROM left_side AS l JOIN right_side AS r
    .   ON l.x = r.x;
    x | x0
    --+---
    2 |  2
    4 |  4
    5 |  5

3.2.1.4.2.2 Left join

Note: Note the null values for 1, which were not matched. SQream DB outputs nulls last.
    t=> SELECT * FROM left_side AS l LEFT JOIN right_side AS r
    .   ON l.x = r.x;
    x | x0
    --+---
    2 |  2
    4 |  4
    5 |  5
    1 | \N

3.2.1.4.2.3 Right join

Note: Note the null values for 3 and 6, which were not matched. SQream DB outputs nulls last.

    t=> SELECT * FROM left_side AS l RIGHT JOIN right_side AS r
    .   ON l.x = r.x;
    x  | x0
    ---+---
    2  |  2
    4  |  4
    5  |  5
    \N |  3
    \N |  6

3.2.1.4.2.4 Cross join

    t=> SELECT * FROM left_side AS l CROSS JOIN right_side AS r;
    x | x0
    --+---
    1 |  2
    1 |  3
    1 |  4
    1 |  5
    1 |  6
    2 |  2
    2 |  3
    2 |  4
    2 |  5
    2 |  6
    4 |  2
    4 |  3
    4 |  4
    4 |  5
    4 |  6
    5 |  2
    5 |  3
    5 |  4
    5 |  5
    5 |  6

Specifying multiple comma-separated tables is equivalent to a cross join, which can then be filtered with a WHERE clause.

    t=> SELECT * FROM left_side l, right_side r;
    x | x0
    --+---
    1 |  2
    1 |  3
    1 |  4
    1 |  5
    1 |  6
    2 |  2
    2 |  3
    2 |  4
    2 |  5
    2 |  6
    4 |  2
    4 |  3
    4 |  4
    4 |  5
    4 |  6
    5 |  2
    5 |  3
    5 |  4
    5 |  5
    5 |  6

    t=> SELECT * FROM left_side l, right_side r WHERE (r.x=l.x);
    x | x0
    --+---
    2 |  2
    4 |  4
    5 |  5

3.2.1.4.2.5 Join hints

Join hints can be used to override the query compiler and choose a particular join algorithm.

The available algorithms are LOOP (corresponding to the non-indexed nested loop join algorithm) and MERGE (corresponding to the sort merge join algorithm).

If no algorithm is specified, a loop join is performed by default.

    t=> SELECT * FROM left_side AS l INNER MERGE JOIN right_side AS r ON l.x = r.x;
    x | x0
    --+---
    2 |  2
    4 |  4
    5 |  5

    t=> SELECT * FROM left_side AS l INNER LOOP JOIN right_side AS r ON l.x = r.x;
    x | x0
    --+---
    2 |  2
    4 |  4
    5 |  5

3.2.1.5 Common table expressions (CTEs)

Common table expressions, or CTEs, allow a complex subquery to be given a short name, which improves readability and lets the subquery be reused multiple times in a query.

CTEs do not affect query performance.
3.2.1.5.1 Syntax

    cte ::=
        [ WITH table_alias AS (query) [, ...] ]

    query ::=
        query_term
        [ UNION ALL query_term [ ...] ]
        [ ORDER BY order [, ... ] ]
        [ LIMIT num_rows ]
        ;

    query_term ::=
        SELECT
            [ TOP num_rows ]
            [ DISTINCT ]
            select_list
            [ FROM table_ref [, ... ]
                [ WHERE value_expr ]
                [ GROUP BY value_expr [, ... ] ]
                [ HAVING value_expr ]
            ]
        | ( VALUES ( value_expr [, ... ] ) [, ... ] )

3.2.1.5.2 Examples

3.2.1.5.2.1 Simple CTE

    nba=> WITH s AS (SELECT "Name" FROM nba WHERE "Salary" > 20000000)
    .     SELECT * FROM nba AS n, s WHERE n."Name" = s."Name";
    Name            | Team                  | Number | Position | Age | Height | Weight | College      | Salary   | name0
    ----------------+-----------------------+--------+----------+-----+--------+--------+--------------+----------+----------------
    Carmelo Anthony | New York Knicks       | 7      | SF       | 32  | 6-8    | 240    | Syracuse     | 22875000 | Carmelo Anthony
    Chris Bosh      | Miami Heat            | 1      | PF       | 32  | 6-11   | 235    | Georgia Tech | 22192730 | Chris Bosh
    Chris Paul      | Los Angeles Clippers  | 3      | PG       | 31  | 6-0    | 175    | Wake Forest  | 21468695 | Chris Paul
    Derrick Rose    | Chicago Bulls         | 1      | PG       | 27  | 6-3    | 190    | Memphis      | 20093064 | Derrick Rose
    Dwight Howard   | Houston Rockets       | 12     | C        | 30  | 6-11   | 265    |              | 22359364 | Dwight Howard
    Kevin Durant    | Oklahoma City Thunder | 35     | SF       | 27  | 6-9    | 240    | Texas        | 20158622 | Kevin Durant
    Kobe Bryant     | Los Angeles Lakers    | 24     | SF       | 37  | 6-6    | 212    |              | 25000000 | Kobe Bryant
    LeBron James    | Cleveland Cavaliers   | 23     | SF       | 31  | 6-8    | 250    |              | 22970500 | LeBron James

In this example, the WITH clause defines the temporary name s for the subquery that finds players with salaries over $20 million. The result set becomes a valid table reference in any table expression of the subsequent SELECT clause.
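A CTE can also hold an aggregated result that is then joined back to the base table. The following is a sketch under assumptions rather than an example from the SQream DB documentation: the CTE name team_avg and the alias avg_salary are hypothetical, and the nba table is the sample table used above.

```sql
-- Name the per-team average once, then compare each player against it
WITH team_avg AS (
    SELECT "Team", AVG("Salary") AS avg_salary
    FROM nba
    GROUP BY "Team"
)
SELECT n."Name", n."Team", n."Salary", t.avg_salary
FROM nba AS n
JOIN team_avg AS t ON n."Team" = t."Team"
WHERE n."Salary" > t.avg_salary;
```

Naming the aggregate keeps the outer query flat, which is the readability benefit this section describes.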
3.2.1.5.2.2 Nested CTEs

SQream DB also supports any number of nested CTEs, such as this:

    WITH w AS
        (SELECT * FROM
            (WITH x AS (SELECT * FROM nba) SELECT * FROM x ORDER BY "Salary" DESC))
    SELECT * FROM w ORDER BY "Weight" DESC;

3.2.1.5.2.3 Reusing CTEs

SQream DB supports reusing CTEs several times in a query.

CTEs are separated with commas.

    nba=> WITH
    .        nba_ct AS (SELECT "Name", "Team" FROM nba WHERE "College"='Connecticut'),
    .        nba_az AS (SELECT "Name", "Team" FROM nba WHERE "College"='Arizona')
    .     SELECT * FROM nba_az JOIN nba_ct ON nba_ct."Team" = nba_az."Team";
    Name            | Team            | name0          | team0
    ----------------+-----------------+----------------+----------------
    Stanley Johnson | Detroit Pistons | Andre Drummond | Detroit Pistons
    Aaron Gordon    | Orlando Magic   | Shabazz Napier | Orlando Magic

3.2.1.5.2.4 Using CTEs with CREATE TABLE AS

When used with CREATE TABLE AS, the CREATE TABLE statement should appear before WITH.

    CREATE TABLE weights AS
        WITH w AS
            (SELECT * FROM
                (WITH x AS (SELECT * FROM nba) SELECT * FROM x ORDER BY "Salary" DESC))
        SELECT * FROM w ORDER BY "Weight" DESC;

3.2.1.6 Window Functions

Window functions are functions applied over a subset (known as a window) of the rows returned by a SELECT query.

3.2.1.6.1 Syntax

    window_expr ::=
        window_fn ( [ value_expr [, ...] ] )
            OVER (
                [ PARTITION BY value_expression [, ...] ]
                [ ORDER BY value_expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] ]
                [ frame_clause ]
            )

    window_fn ::=
        | AVG
        | COUNT
        | LAG
        | LEAD
        | MAX
        | MIN
        | RANK
        | ROW_NUMBER
        | SUM
        | FIRST_VALUE
        | LAST_VALUE
        | NTH_VALUE
        | DENSE_RANK
        | PERCENT_RANK
        | CUME_DIST
        | NTILE

    frame_clause ::=
        { RANGE | ROWS } frame_start [ frame_exclusion ]
        | { RANGE | ROWS } BETWEEN frame_def AND frame_def [ frame_exclusion ]

    frame_def ::=
        UNBOUNDED PRECEDING
        | offset PRECEDING
        | CURRENT ROW
        | offset FOLLOWING
        | UNBOUNDED FOLLOWING

    frame_exclusion ::=
        EXCLUDE CURRENT ROW
        | EXCLUDE GROUP
        | EXCLUDE TIES
        | EXCLUDE NO OTHERS

    row_offset ::=
        numeric literal
3.2.1.6.2 Arguments

    Parameter                       Description
    window_fn                       Window function, like AVG, MIN, etc.
    PARTITION BY partition_expr     An expression or column reference to partition by
    ORDER BY order                  An expression or column reference to order by

3.2.1.6.3 Supported Window Functions

Table 3: Window Aggregation Functions

    Function    Description
    AVG         Returns the average of numeric values.
    COUNT       Returns the count of numeric values, or only the distinct values.
    MAX         Returns the maximum values.
    MIN         Returns the minimum values.
    SUM         Returns the sum of numeric values, or only the distinct values.

Table 4: Ranking Functions

    Function        Description
    LAG             Returns a value from a previous row within the partition of a result set.
    LEAD            Returns a value from a subsequent row within the partition of a result set.
    ROW_NUMBER      Returns the row number of each row within the partition of a result set.
    RANK            Returns the rank of each row within the partition of a result set.
    FIRST_VALUE     Returns the value in the first row of a window.
    LAST_VALUE      Returns the value in the last row of a window.
    NTH_VALUE       Returns the value in a specified (n) row of a window.
    DENSE_RANK      Returns the rank of the current row with no gaps.
    PERCENT_RANK    Returns the relative rank of the current row.
    CUME_DIST       Returns the cumulative distribution of rows.
    NTILE           Returns an integer ranging between 1 and the argument value, dividing the partitions as equally as possible.

3.2.1.6.4 How Window Functions Work

A window function operates on a subset ("window") of rows.

Each time a window function is called, it gets the current row for processing, as well as the window of rows that contains the current row.

The window function returns one result row for each input.

The result depends on the individual row and the order of the rows. Some window functions are order-sensitive, such as RANK.

Note: In general, a window frame will include all rows of a partition.
If an ORDER BY clause is applied, the rows become ordered, which can change the order of the function calls. The function is then applied to the subset between the first row and the current row, instead of the whole frame. Boundaries for the frames may need to be applied to get the correct results.

Window frame functions allow a user to perform rolling operations, such as calculating moving averages, finding the longest-standing customers, identifying churn, and finding movers and shakers.

3.2.1.6.4.1 PARTITION BY

The PARTITION BY clause groups the rows of the query into partitions, which are processed separately by the window function.

PARTITION BY works similarly to a query-level GROUP BY clause, but expressions are always just expressions and cannot be output-column names or numbers.

Without PARTITION BY, all rows produced by the query are treated as a single partition.

3.2.1.6.4.2 ORDER BY

The ORDER BY clause determines the order in which the rows of a partition are processed by the window function. It works similarly to a query-level ORDER BY clause, but cannot use output-column names or numbers.

Without ORDER BY, rows are processed in an unspecified order.

3.2.1.6.4.3 Frames

Note: Frames and frame exclusions have been tested extensively, but are a complex feature. They are released as a preview in v2020.1 pending longer-term testing.

The frame_clause specifies the set of rows constituting the window frame, which is a subset of the current partition, for those window functions that act on the frame instead of the whole partition.

The set of rows in the frame can vary depending on which row is the current row. The frame can be specified in RANGE or ROWS mode; in each case, it runs from the frame_start to the frame_end. If frame_end is omitted, the end defaults to CURRENT ROW.
A frame_start of UNBOUNDED PRECEDING means that the frame starts with the first row of the partition, and similarly a frame_end of UNBOUNDED FOLLOWING means that the frame ends with the last row of the partition.

In RANGE mode, a frame_start of CURRENT ROW means the frame starts with the current row's first peer row (a row that the window's ORDER BY clause sorts as equivalent to the current row), while a frame_end of CURRENT ROW means the frame ends with the current row's last peer row. In ROWS mode, CURRENT ROW simply means the current row.

In the offset PRECEDING and offset FOLLOWING frame options, the offset must be an expression not containing any variables, aggregate functions, or window functions. The meaning of the offset depends on the frame mode:

• In ROWS mode, the offset must yield a non-null, non-negative integer, and the option means that the frame starts or ends the specified number of rows before or after the current row.
• In RANGE mode, these options require that the ORDER BY clause specify exactly one column. The offset specifies the maximum difference between the value of that column in the current row and its value in preceding or following rows of the frame. This option is restricted to integer types, date and datetime. The offset is required to be a non-null non-negative integer value.
• With a DATE or DATETIME column, the offset indicates a number of days.

In any case, the distance to the end of the frame is limited by the distance to the end of the partition, so that for rows near the partition ends the frame might contain fewer rows than elsewhere.

The default framing option is RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. With ORDER BY, this sets the frame to be all rows from the partition start up through the current row's last ORDER BY peer.
Without ORDER BY, this means all rows of the partition are included in the window frame, since all rows become peers of the current row.

3.2.1.6.4.4 Restrictions

• frame_start cannot be UNBOUNDED FOLLOWING
• frame_end cannot be UNBOUNDED PRECEDING
• The frame_end choice cannot appear earlier in the above list of frame_start and frame_end options than the frame_start choice does.

For example, RANGE BETWEEN CURRENT ROW AND 7 PRECEDING is not allowed. However, while ROWS BETWEEN 7 PRECEDING AND 8 PRECEDING is allowed, it would never select any rows.

3.2.1.6.4.5 Frame Exclusion

The frame_exclusion option allows rows around the current row to be excluded from the frame, even if they would be included according to the frame start and frame end options.

EXCLUDE CURRENT ROW excludes the current row from the frame.

EXCLUDE GROUP excludes the current row and its ordering peers from the frame.

EXCLUDE TIES excludes any peers of the current row from the frame, but not the current row itself.

EXCLUDE NO OTHERS simply specifies explicitly the default behavior of not excluding the current row or its peers.

3.2.1.6.5 Limitations

• At this phase, text columns are not supported in window function expressions.
• Window function calls are permitted only in the SELECT list.

3.2.1.6.6 Examples

For these examples, assume a table named nba, with the following structure:

    CREATE TABLE nba
    (
        "Name" varchar(40),
        "Team" varchar(40),
        "Number" tinyint,
        "Position" varchar(2),
        "Age" tinyint,
        "Height" varchar(4),
        "Weight" real,
        "College" varchar(40),
        "Salary" float
    );

Here's a peek at the table contents (Download nba.csv):

Table 5: nba.csv

    Name          | Team           | Number | Position | Age  | Height | Weight | College           | Salary
    Avery Bradley | Boston Celtics | 0.0    | PG       | 25.0 | 6-2    | 180.0  | Texas             | 7730337.0
    Jae Crowder   | Boston Celtics | 99.0   | SF       | 25.0 | 6-6    | 235.0  | Marquette         | 6796117.0
    John Holland  | Boston Celtics | 30.0   | SG       | 27.0 | 6-5    | 205.0  | Boston University |
    R.J. Hunter   | Boston Celtics | 28.0   | SG       | 22.0 | 6-5    | 185.0  | Georgia State     | 1148640.0
    Jonas Jerebko | Boston Celtics | 8.0    | PF       | 29.0 | 6-10   | 231.0  |                   | 5000000.0
    Amir Johnson  | Boston Celtics | 90.0   | PF       | 29.0 | 6-9    | 240.0  |                   | 12000000.0
    Jordan Mickey | Boston Celtics | 55.0   | PF       | 21.0 | 6-8    | 235.0  | LSU               | 1170960.0
    Kelly Olynyk  | Boston Celtics | 41.0   | C        | 25.0 | 7-0    | 238.0  | Gonzaga           | 2165160.0
    Terry Rozier  | Boston Celtics | 12.0   | PG       | 22.0 | 6-2    | 190.0  | Louisville        | 1824360.0

3.2.1.6.6.1 Window Function Application

    t=> SELECT SUM("Salary") OVER (PARTITION BY "Team" ORDER BY "Age") FROM nba;
    sum
    ---------
    1763400
    5540289
    5540289
    5540289
    5540289
    7540289
    18873622
    18873622
    30873622
    60301531
    60301531
    60301531
    64301531
    72902950
    72902950
    [...]

3.2.1.6.6.2 Ranking Results

See RANK.

    t=> SELECT n.Name, n.Age, n.Height, RANK() OVER
    .      (PARTITION BY n.Age ORDER BY n.Height DESC) AS Rank
    .   FROM nba_2 n;
    name               | age | height | rank
    -------------------+-----+--------+-----
    Devin Booker       | 19  | 6-6    | 1
    Rashad Vaughn      | 19  | 6-6    | 1
    Kristaps Porzingis | 20  | 7-3    | 1
    Karl-Anthony Towns | 20  | 7-0    | 2
    Bruno Caboclo      | 20  | 6-9    | 3
    Kevon Looney       | 20  | 6-9    | 3
    Aaron Gordon       | 20  | 6-9    | 3
    Noah Vonleh        | 20  | 6-9    | 3
    Cliff Alexander    | 20  | 6-8    | 7
    Stanley Johnson    | 20  | 6-7    | 8
    Justise Winslow    | 20  | 6-7    | 8
    Kelly Oubre Jr.    | 20  | 6-7    | 8
    James Young        | 20  | 6-6    | 11
    Dante Exum         | 20  | 6-6    | 11
    D'Angelo Russell   | 20  | 6-5    | 13
    Emmanuel Mudiay    | 20  | 6-5    | 13
    Tyus Jones         | 20  | 6-2    | 15
    Jahlil Okafor      | 20  | 6-11   | 16
    Christian Wood     | 20  | 6-11   | 16
    Myles Turner       | 20  | 6-11   | 16
    Trey Lyles         | 20  | 6-10   | 19
    [...]

3.2.1.6.6.3 Using LEAD to Access Following Rows Without a Join

The LEAD function is used to return data from rows further down the result set. The LAG function returns data from rows further up the result set.

This example calculates the salary difference between each player and the next-highest-paid player, starting from the highest salary.

    t=> SELECT "Name",
    .          "Salary",
    .          LEAD("Salary", 1) OVER (ORDER BY "Salary" DESC) AS "Salary - next",
    .          ABS(LEAD("Salary", 1) OVER (ORDER BY "Salary" DESC) - "Salary") AS "Salary - diff"
    .   FROM nba
    .   LIMIT 11;
    Name            | Salary   | Salary - next | Salary - diff
    ----------------+----------+---------------+--------------
    Kobe Bryant     | 25000000 | 22970500      | 2029500
    LeBron James    | 22970500 | 22875000      | 95500
    Carmelo Anthony | 22875000 | 22359364      | 515636
    Dwight Howard   | 22359364 | 22192730      | 166634
    Chris Bosh      | 22192730 | 21468695      | 724035
    Chris Paul      | 21468695 | 20158622      | 1310073
    Kevin Durant    | 20158622 | 20093064      | 65558
    Derrick Rose    | 20093064 | 20000000      | 93064
    Dwyane Wade     | 20000000 | 19689000      | 311000
    Brook Lopez     | 19689000 | 19689000      | 0
    DeAndre Jordan  | 19689000 | 19689000      | 0

3.2.1.7 Subqueries

Subqueries allow you to reuse the results of another query.

SQream DB supports relational (also called derived table) subqueries, which appear as SELECT queries as part of a table expression.

SQream DB also supports Common table expressions (CTEs), which are a form of subquery. With CTEs, a subquery can be named for reuse in a query.

Note:

• SQream DB does not currently support correlated subqueries or scalar subqueries.
• There is no limit on the number of subqueries or on the nesting depth in a statement.

3.2.1.7.1 Table Subqueries

The following examples use a table named nba with the following structure:

    CREATE TABLE nba
    (
        "Name" varchar(40),
        "Team" varchar(40),
        "Number" tinyint,
        "Position" varchar(2),
        "Age" tinyint,
        "Height" varchar(4),
        "Weight" real,
        "College" varchar(40),
        "Salary" float
    );

To see the table contents, download nba.csv:

Table 6: nba.csv

    Name          | Team           | Number | Position | Age  | Height | Weight | College           | Salary
    Avery Bradley | Boston Celtics | 0.0    | PG       | 25.0 | 6-2    | 180.0  | Texas             | 7730337.0
    Jae Crowder   | Boston Celtics | 99.0   | SF       | 25.0 | 6-6    | 235.0  | Marquette         | 6796117.0
    John Holland  | Boston Celtics | 30.0   | SG       | 27.0 | 6-5    | 205.0  | Boston University |
    R.J. Hunter   | Boston Celtics | 28.0   | SG       | 22.0 | 6-5    | 185.0  | Georgia State     | 1148640.0
    Jonas Jerebko | Boston Celtics | 8.0    | PF       | 29.0 | 6-10   | 231.0  |                   | 5000000.0
    Amir Johnson  | Boston Celtics | 90.0   | PF       | 29.0 | 6-9    | 240.0  |                   | 12000000.0
    Jordan Mickey | Boston Celtics | 55.0   | PF       | 21.0 | 6-8    | 235.0  | LSU               | 1170960.0
    Kelly Olynyk  | Boston Celtics | 41.0   | C        | 25.0 | 7-0    | 238.0  | Gonzaga           | 2165160.0
    Terry Rozier  | Boston Celtics | 12.0   | PG       | 22.0 | 6-2    | 190.0  | Louisville        | 1824360.0

3.2.1.7.2 Scalar Subqueries

The following are examples of scalar subqueries.

3.2.1.7.2.1 Simple Subquery

    t=> SELECT AVG("Age") FROM
    .      (SELECT "Name","Team","Age" FROM nba WHERE "Height" > '7-0');
    avg
    ---
    26

3.2.1.7.2.2 Combining a Subquery with a Join

    t=> SELECT * FROM
    .      (SELECT "Name" FROM nba WHERE "Height" > '7-0') AS t(name)
    .      , nba AS n
    .   WHERE n."Name"=t.name;
    name               | Name               | Team                   | Number | Position | Age | Height | Weight | College    | Salary
    -------------------+--------------------+------------------------+--------+----------+-----+--------+--------+------------+---------
    Alex Len           | Alex Len           | Phoenix Suns           | 21     | C        | 22  | 7-1    | 260    | Maryland   | 3807120
    Alexis Ajinca      | Alexis Ajinca      | New Orleans Pelicans   | 42     | C        | 28  | 7-2    | 248    | \N         | 4389607
    Boban Marjanovic   | Boban Marjanovic   | San Antonio Spurs      | 40     | C        | 27  | 7-3    | 290    | \N         | 1200000
    Kristaps Porzingis | Kristaps Porzingis | New York Knicks        | 6      | PF       | 20  | 7-3    | 240    | \N         | 4131720
    Marc Gasol         | Marc Gasol         | Memphis Grizzlies      | 33     | C        | 31  | 7-1    | 255    | \N         | 19688000
    Meyers Leonard     | Meyers Leonard     | Portland Trail Blazers | 11     | PF       | 24  | 7-1    | 245    | Illinois   | 3075880
    Roy Hibbert        | Roy Hibbert        | Los Angeles Lakers     | 17     | C        | 29  | 7-2    | 270    | Georgetown | 15592217
    Rudy Gobert        | Rudy Gobert        | Utah Jazz              | 27     | C        | 23  | 7-1    | 245    | \N         | 1175880
    Salah Mejri        | Salah Mejri        | Dallas Mavericks       | 50     | C        | 29  | 7-2    | 245    | \N         | 525093
    Spencer Hawes      | Spencer Hawes      | Charlotte Hornets      | 0      | PF       | 28  | 7-1    | 245    | Washington | 6110034
    Tibor Pleiss       | Tibor Pleiss       | Utah Jazz              | 21     | C        | 26  | 7-3    | 256    | \N         | 2900000
    Timofey Mozgov     | Timofey Mozgov     | Cleveland Cavaliers    | 20     | C        | 29  | 7-1    | 275    | \N         | 4950000
    Tyson Chandler     | Tyson Chandler     | Phoenix Suns           | 4      | C        | 33  | 7-1    | 240    | \N         | 13000000
    Walter Tavares     | Walter Tavares     | Atlanta Hawks          | 22     | C        | 24  | 7-3    | 260    | \N         | 1000000

3.2.1.7.2.3 WITH subqueries

See Common table expressions (CTEs) for more information.

    nba=> WITH
    .        nba_ct AS (SELECT "Name", "Team" FROM nba WHERE "College"='Connecticut'),
    .        nba_az AS (SELECT "Name", "Team" FROM nba WHERE "College"='Arizona')
    .     SELECT * FROM nba_az JOIN nba_ct ON nba_ct."Team" = nba_az."Team";
    Name            | Team            | name0          | team0
    ----------------+-----------------+----------------+----------------
    Stanley Johnson | Detroit Pistons | Andre Drummond | Detroit Pistons
    Aaron Gordon    | Orlando Magic   | Shabazz Napier | Orlando Magic

3.2.1.8 Null handling

SQream DB handles NULL values similarly to other RDBMSs, with some minor differences.

Tip: When using sqream sql, NULL values are displayed as \N. Different clients may show other values, including an empty string.
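As a quick orientation for the subsections that follow, the sketch below contrasts the two ways a NULL check is commonly written. It assumes the nba sample table used earlier in this reference, in which some "College" values are NULL (displayed as \N in sqream sql).

```sql
-- Does not match any rows: comparing with = yields NULL, never TRUE
SELECT "Name" FROM nba WHERE "College" = NULL;

-- Matches the rows in which "College" has no value
SELECT "Name" FROM nba WHERE "College" IS NULL;
```

The same applies to the negated form: IS NOT NULL, rather than <> NULL, is the way to select rows that do have a value.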
3.2.1.8.1 Comparisons

Any comparison between a NULL and a value results in NULL.

For example, WHERE a = NULL never evaluates to TRUE or FALSE, because the comparison results in NULL, so it never matches a row. Use IS NULL to check for NULL values.

3.2.1.8.2 Conditionals

Functions like ISNULL and COALESCE evaluate arguments in their given order, so they may never evaluate some arguments.

For example, COALESCE(a,b,c,d,NULL) will never evaluate b, c, d, or NULL if a is not NULL.

3.2.1.8.3 Operations

3.2.1.8.3.1 Scalar functions

With all scalar functions, a NULL input to any one of the arguments means the result is NULL.

    t=> SELECT x, y, z, x+y as "x+y", x*y as "x*y", y*z as "y*z", z-y as "z-y" FROM n;
    x | y  | z  | x+y | x*y | y*z | z-y
    --+----+----+-----+-----+-----+----
    1 | 0  | 1  | 1   | 0   | 0   | 1
    2 | 1  | 1  | 3   | 2   | 1   | 0
    3 | \N | 4  | \N  | \N  | \N  | \N
    4 | \N | \N | \N  | \N  | \N  | \N
    5 | 6  | \N | 11  | 30  | \N  | \N

3.2.1.8.3.2 Aggregates

With aggregates, NULL values are ignored, so they do not affect the result.

    t=> SELECT SUM(x) AS "sum(x)", SUM(y) AS "sum(y)", SUM(z) AS "sum(z)" FROM n;
    sum(x) | sum(y) | sum(z)
    -------+--------+-------
    15     | 7      | 6

    t=> SELECT COUNT(x) AS "count(x)", COUNT(y) AS "count(y)", COUNT(z) AS "count(z)" FROM n;
    count(x) | count(y) | count(z)
    ---------+----------+---------
    5        | 3        | 3

• NULL values are not included in COUNT() of a column. COUNT(x) shows the full row count because it is a non-nullable column. COUNT(y) returns 3, counting only the non-NULL values.
• With MIN, MAX, and AVG, NULL values are completely ignored.

3.2.1.8.3.3 Distincts

With DISTINCT, NULL is treated as a distinct value, but it is only counted once.

    t=> SELECT DISTINCT z FROM n;
    z
    --
    1
    4
    \N

Running COUNT DISTINCT, however, ignores the NULL values:
    t=> SELECT COUNT(DISTINCT z) FROM n;
    count
    -----
    2

3.2.1.8.4 Sorting

When sorting a column containing NULL values, SQream DB sorts NULL values first with ASC and last with DESC.

SQream DB does not implement NULLS FIRST or NULLS LAST, so the position of NULL values in the sort order cannot be changed.

    t=> SELECT * FROM n ORDER BY z ASC;
    x | y  | z
    --+----+---
    4 | \N | \N
    5 | 6  | \N
    1 | 0  | 1
    2 | 1  | 1
    3 | \N | 4

    t=> SELECT * FROM n ORDER BY z DESC;
    x | y  | z
    --+----+---
    3 | \N | 4
    1 | 0  | 1
    2 | 1  | 1
    4 | \N | \N
    5 | 6  | \N

3.2.2 SQL Statements

SQream DB supports commands from ANSI SQL.

3.2.2.1 Data Definition Commands (DDL)

Table 7: DDL Commands

Command                 Usage
ADD COLUMN              Add a new column to a table
ALTER DEFAULT SCHEMA    Change the default schema for a role
ALTER TABLE             Change the schema of a table
CREATE DATABASE         Create a new database
CREATE EXTERNAL TABLE   Create a new external table in the database (deprecated)
CREATE FOREIGN TABLE    Create a new foreign table in the database
CREATE FUNCTION         Create a new user defined function in the database
CREATE SCHEMA           Create a new schema in the database
CREATE TABLE            Create a new table in the database
CREATE TABLE AS         Create a new table in the database using results from a select query
CREATE VIEW             Create a new view in the database
DROP COLUMN             Drop a column from a table
DROP DATABASE           Drop a database and all of its objects
DROP FUNCTION           Drop a function
DROP SCHEMA             Drop a schema
DROP TABLE              Drop a table and its contents from a database
DROP VIEW               Drop a view
RENAME COLUMN           Rename a column
RENAME TABLE            Rename a table

3.2.2.2 Data Manipulation Commands (DML)

Table 8: DML Commands

Command           Usage
CREATE TABLE AS   Create a new table in the database using results from a select query
DELETE            Delete specific rows from a table
COPY FROM         Bulk load CSV data into an existing table
COPY TO           Export a select query or entire table to CSV files
INSERT            Insert rows into a table
SELECT            Select
                  rows and columns from a table
TRUNCATE          Delete all rows from a table
VALUES            Return rows containing literal values

3.2.2.3 Utility Commands

Table 9: Utility Commands

Command                    Usage
SELECT GET_LICENSE_INFO    View a user's license information
SELECT GET_DDL             View the CREATE TABLE statement for a table
SELECT GET_FUNCTION_DDL    View the CREATE FUNCTION statement for a UDF
SELECT GET_VIEW_DDL        View the CREATE VIEW statement for a view
SELECT RECOMPILE_VIEW      Recreate a view after schema changes
SELECT DUMP_DATABASE_DDL   View the CREATE TABLE statements for the current database

3.2.2.4 Saved Queries

Table 10: Saved Queries

Command                        Usage
SELECT DROP_SAVED_QUERY        Drops a saved query
SELECT EXECUTE_SAVED_QUERY     Executes a saved query
SELECT LIST_SAVED_QUERIES      Returns a list of saved queries
SELECT RECOMPILE_SAVED_QUERY   Recompiles a query that has been invalidated by a schema change
SELECT SAVE_QUERY              Compiles and saves a query for re-use and sharing
SELECT SHOW_SAVED_QUERY        Shows query text for a saved query

For more information, see Saved queries.

3.2.2.5 Monitoring

Monitoring statements allow a database administrator to execute actions in the system, such as aborting a query or getting information about system processes.
Table 11: Monitoring

Command              Usage
EXPLAIN              Returns a static query plan for a statement
SHOW_CONNECTIONS     Returns a list of jobs and statements on the current worker
SHOW_LOCKS           Returns any existing locks in the database
SHOW_NODE_INFO       Returns a query plan for an actively running statement with timing information
SHOW_SERVER_STATUS   Shows running statements across the cluster
SHOW_VERSION         Returns the version of SQream DB
STOP_STATEMENT       Stops a query (or statement) if it is currently running

3.2.2.6 Workload Management

Table 12: Workload Management

Command                     Usage
SUBSCRIBE_SERVICE           Add a SQream DB worker to a service queue
UNSUBSCRIBE_SERVICE         Remove a SQream DB worker from a service queue
SHOW_SUBSCRIBED_INSTANCES   Return a list of service queues and workers

3.2.2.7 Access Control Commands

Table 13: Access Control Commands

Command                     Usage
ALTER DEFAULT PERMISSIONS   Applies a change to defaults in the current schema
ALTER ROLE                  Applies a change to an existing role
CREATE ROLE                 Creates a role, which lets a database administrator control permissions on
                            tables and databases
DROP ROLE                   Removes roles
GET_STATEMENT_PERMISSIONS   Returns a list of permissions required to run a statement or query
GRANT                       Grant permissions to a role
REVOKE                      Revoke permissions from a role
RENAME ROLE                 Rename a role

3.2.2.7.1 ADD COLUMN

ADD COLUMN can be used to add columns to an existing table.

3.2.2.7.1.1 Permissions

The role must have the DDL permission at the database or table level.

3.2.2.7.1.2 Syntax

    alter_table_add_column_statement ::=
        ALTER TABLE [schema_name.]table_name { ADD COLUMN column_def [, ...] }
        ;

    table_name ::= identifier

    schema_name ::= identifier

    column_def ::= { column_name type_name [ default ] [ column_constraint ] }

    column_name ::= identifier

    column_constraint ::=
        { NOT NULL | NULL }

    default ::=
        DEFAULT default_value
3.2.2.7.1.3 Parameters

Parameter               Description
schema_name             The schema name for the table. Defaults to public if not specified.
table_name              The table name to apply the change to.
ADD COLUMN column_def   A comma separated list of ADD COLUMN commands.
column_def              A column definition. A minimal column definition includes a name identifier and a
                        datatype. Other column constraints and default values can be added optionally.

Note:
• When adding a new column to an existing table, a default (or null constraint) has to be specified, even if the table is empty.
• A new column added to the table cannot contain an IDENTITY or be of the TEXT type.

3.2.2.7.1.4 Examples

3.2.2.7.1.5 Adding a simple column with default value

    ALTER TABLE cool_animals
      ADD COLUMN number_of_eyes INT DEFAULT 2 NOT NULL;

3.2.2.7.1.6 Adding several columns in one command

    ALTER TABLE cool_animals
      ADD COLUMN number_of_eyes INT DEFAULT 2 NOT NULL,
      ADD COLUMN date_seen DATE DEFAULT '2019-08-01';

3.2.2.7.2 ALTER DEFAULT SCHEMA

ALTER DEFAULT SCHEMA can be used to change a role's default schema.

The default schema in SQream DB is public.

See also: CREATE SCHEMA, DROP SCHEMA.

3.2.2.7.2.1 Permissions

No special permissions are required.

3.2.2.7.2.2 Syntax

    alter_default_schema_statement ::=
        ALTER DEFAULT SCHEMA FOR role_name TO schema_name
        ;

    role_name ::= identifier

    schema_name ::= identifier

3.2.2.7.2.3 Parameters

Parameter     Description
role_name     The name of the role the change will apply to.
schema_name   The new default schema name.

3.2.2.7.2.4 Examples

3.2.2.7.2.5 Altering the default schema for a role

    SELECT * FROM users; -- Refers to public.users

    ALTER DEFAULT SCHEMA FOR bgilfoyle TO staging;

    SELECT * FROM users; -- Now refers to staging.users, rather than public.users

3.2.2.7.3 ALTER TABLE

ALTER TABLE can be used to make schema changes to a table. It works in conjunction with several subcommands.
3.2.2.7.3.1 Locks

Schema changes take an exclusive lock on tables. While these operations are usually short, other statements may have to wait until the schema changes are completed.

3.2.2.7.3.2 Subcommands

Command               Usage
ADD COLUMN            Add a new column to a table
DROP COLUMN           Drop a column from a table
RENAME COLUMN         Rename a column
RENAME TABLE          Rename a table
CLUSTER BY            Modify (add or reorder) the clustering keys in a table
DROP CLUSTERING KEY   Drop all clustering keys

3.2.2.7.4 CLUSTER BY

CLUSTER BY can be used to change clustering keys in a table.

Read our Data clustering guide for more information.

See also: DROP CLUSTERING KEY, CREATE TABLE.

3.2.2.7.4.1 Permissions

The role must have the DDL permission at the database or table level.

3.2.2.7.4.2 Syntax

    alter_table_rename_table_statement ::=
        ALTER TABLE [schema_name.]table_name CLUSTER BY column_name [, ...]
        ;

    table_name ::= identifier

    column_name ::= identifier

3.2.2.7.4.3 Parameters

Parameter              Description
schema_name            The schema name for the table. Defaults to public if not specified.
table_name             The table name to apply the change to.
column_name [, ... ]   Comma separated list of columns to create clustering keys for.

3.2.2.7.4.4 Usage notes

Removing clustering keys does not affect existing data.

To force data to re-cluster, the table has to be recreated (for example, with CREATE TABLE AS).

3.2.2.7.4.5 Examples

3.2.2.7.4.6 Reclustering a table

    ALTER TABLE public.users CLUSTER BY start_date;

3.2.2.7.5 CREATE DATABASE

CREATE DATABASE creates a new database in SQream DB.

3.2.2.7.5.1 Permissions

Only a superuser can create a new database.

3.2.2.7.5.2 Syntax

    create_database_statement ::=
        CREATE DATABASE database_name
        ;

    database_name ::= identifier

3.2.2.7.5.3 Parameters

Parameter       Description
database_name   The name of the database.
                The database name must be unique and follow the Identifier rules.

3.2.2.7.5.4 Examples

    CREATE DATABASE raviga;

    CREATE DATABASE my_db;

If the database already exists, an error will appear:

    master=> CREATE DATABASE MY_DB;
    Database 'my_db' already exists

Note: SQream DB identifiers are always converted to lowercase, so my_db is the same as MY_DB, unless explicitly quoted as "MY_DB".

3.2.2.7.6 CREATE EXTERNAL TABLE

Warning: The CREATE EXTERNAL TABLE syntax is deprecated, and will be removed in future versions.

Starting with SQream DB v2020.2, external tables have been renamed to foreign tables, and use a more flexible foreign data wrapper concept. See CREATE FOREIGN TABLE instead.

Upgrading to a new version of SQream DB converts existing external tables automatically. When creating new external tables, use the new foreign table syntax.

CREATE EXTERNAL TABLE creates a new external table in an existing database.

See more in the External tables guide.

Tip:
• Data in an external table can change if the sources change, and frequent access to remote files may harm performance.
• To create a regular table, see CREATE TABLE.

3.2.2.7.6.1 Permissions

The role must have the CREATE permission at the database level.
3.2.2.7.6.2 Syntax

    create_table_statement ::=
        CREATE [ OR REPLACE ] EXTERNAL TABLE [schema_name.]table_name (
            { column_def [, ...] }
        )
        USING FORMAT format_def
        WITH { external_table_option [ ...] }
        ;

    schema_name ::= identifier

    table_name ::= identifier

    format_def ::= { PARQUET | ORC | CSV }

    external_table_option ::= {
        PATH '{ path_spec }'
        | FIELD DELIMITER '{ field_delimiter }'
        | RECORD DELIMITER '{ record_delimiter }'
        | AWS_ID '{ AWS ID }'
        | AWS_SECRET '{ AWS SECRET }'
    }

    path_spec ::= { local filepath | S3 URI | HDFS URI }

    field_delimiter ::= delimiter_character

    record_delimiter ::= delimiter_character

    column_def ::= { column_name type_name [ default ] [ column_constraint ] }

    column_name ::= identifier

    column_constraint ::=
        { NOT NULL | NULL }

    default ::=
        DEFAULT default_value
        | IDENTITY [ ( start_with [ , increment_by ] ) ]

3.2.2.7.6.3 Parameters

Parameter            Description
OR REPLACE           Create a new table, and overwrite any existing table by the same name. Does not
                     return an error if the table already exists. CREATE OR REPLACE does not check the
                     table contents or structure, only the table name.
schema_name          The name of the schema in which to create the table.
table_name           The name of the table to create, which must be unique inside the schema.
column_def           A comma separated list of column definitions. A minimal column definition includes
                     a name identifier and a datatype. Other column constraints and default values can
                     be added optionally.
USING FORMAT ...     Specifies the format of the source files, such as PARQUET, ORC, or CSV.
WITH PATH ...        Specifies a path or URI of the source files, such as /path/to/*.parquet.
FIELD DELIMITER      Specifies the field delimiter for CSV files. Defaults to ,.
RECORD DELIMITER     Specifies the record delimiter for CSV files. Defaults to a newline, \n
AWS_ID, AWS_SECRET   Credentials for authenticated S3 access

3.2.2.7.6.4 Examples

3.2.2.7.6.5 A simple table from Tab-delimited file (TSV)

    CREATE OR REPLACE EXTERNAL TABLE cool_animals
      (id INT NOT NULL, name VARCHAR(30) NOT NULL, weight FLOAT NOT NULL)
    USING FORMAT csv
    WITH PATH '/home/rhendricks/cool_animals.csv'
         FIELD DELIMITER '\t';

3.2.2.7.6.6 A table from a directory of Parquet files on HDFS

    CREATE EXTERNAL TABLE users
      (id INT NOT NULL, name VARCHAR(30) NOT NULL, email VARCHAR(50) NOT NULL)
    USING FORMAT Parquet
    WITH PATH 'hdfs://hadoop-nn.piedpiper.com/rhendricks/users/*.parquet';

3.2.2.7.6.7 A table from a bucket of files on S3

    CREATE EXTERNAL TABLE users
      (id INT NOT NULL, name VARCHAR(30) NOT NULL, email VARCHAR(50) NOT NULL)
    USING FORMAT Parquet
    WITH PATH 's3://pp-secret-bucket/users/*.parquet'
         AWS_ID 'our_aws_id'
         AWS_SECRET 'our_aws_secret';

3.2.2.7.6.8 Changing an external table to a regular table

Materializes an external table into a regular table.

    CREATE TABLE real_table
     AS SELECT * FROM external_table;

3.2.2.7.7 CREATE FOREIGN TABLE

Note: Starting with SQream DB v2020.2, external tables have been renamed to foreign tables, and use a more flexible foreign data wrapper concept.

Upgrading to a new version of SQream DB converts existing external tables automatically.

CREATE FOREIGN TABLE creates a new foreign table in an existing database.

See more in the Foreign tables guide.

Tip:
• Data in a foreign table can change if the sources change, and frequent access to remote files may harm performance.
• To create a regular table, see CREATE TABLE.

3.2.2.7.7.1 Permissions

The role must have the CREATE permission at the database level.

3.2.2.7.7.2 Syntax

    create_table_statement ::=
        CREATE [ OR REPLACE ] FOREIGN TABLE [schema_name.]table_name (
            { column_def [, ...] }
        )
        [ FOREIGN DATA ] WRAPPER fdw_name
        [ OPTIONS ( option_def [, ...] ) ]
        ;

    schema_name ::= identifier

    table_name ::= identifier

    fdw_name ::=
        { csv_fdw | orc_fdw | parquet_fdw }

    option_def ::=
    {
       LOCATION = '{ path_spec }'
       | DELIMITER = '{ field_delimiter }' -- for CSV only
       | RECORD_DELIMITER = '{ record_delimiter }' -- for CSV only
       | AWS_ID = '{ AWS ID }'
       | AWS_SECRET = '{ AWS SECRET }'
    }

    path_spec ::= { local filepath | S3 URI | HDFS URI }

    field_delimiter ::= delimiter_character

    record_delimiter ::= delimiter_character

    column_def ::= { column_name type_name [ default ] [ column_constraint ] }

    column_name ::= identifier

    column_constraint ::=
        { NOT NULL | NULL }

    default ::=
        DEFAULT default_value
        | IDENTITY [ ( start_with [ , increment_by ] ) ]

3.2.2.7.7.3 Parameters

Parameter                Description
OR REPLACE               Create a new table, and overwrite any existing table by the same name. Does not
                         return an error if the table already exists. CREATE OR REPLACE does not check
                         the table contents or structure, only the table name.
schema_name              The name of the schema in which to create the table.
table_name               The name of the table to create, which must be unique inside the schema.
column_def               A comma separated list of column definitions. A minimal column definition
                         includes a name identifier and a datatype. Other column constraints and default
                         values can be added optionally.
WRAPPER ...              Specifies the format of the source files, such as parquet_fdw, orc_fdw, or csv_fdw.
LOCATION = ...           Specifies a path or URI of the source files, such as /path/to/*.parquet.
DELIMITER = ...          Specifies the field delimiter for CSV files. Defaults to ,.
RECORD_DELIMITER = ...   Specifies the record delimiter for CSV files. Defaults to a newline, \n
AWS_ID, AWS_SECRET       Credentials for authenticated S3 access

3.2.2.7.7.4 Examples

3.2.2.7.7.5 A simple table from Tab-delimited file (TSV)

    CREATE OR REPLACE FOREIGN TABLE cool_animals
      (id INT NOT NULL, name VARCHAR(30) NOT NULL, weight FLOAT NOT NULL)
    WRAPPER csv_fdw
    OPTIONS
      ( LOCATION = '/home/rhendricks/cool_animals.csv',
        DELIMITER = '\t'
      )
    ;

3.2.2.7.7.6 A table from a directory of Parquet files on HDFS

    CREATE FOREIGN TABLE users
      (id INT NOT NULL, name VARCHAR(30) NOT NULL, email VARCHAR(50) NOT NULL)
    WRAPPER parquet_fdw
    OPTIONS
      (
        LOCATION = 'hdfs://hadoop-nn.piedpiper.com/rhendricks/users/*.parquet'
      );

3.2.2.7.7.7 A table from a bucket of ORC files on S3

    CREATE FOREIGN TABLE users
      (id INT NOT NULL, name VARCHAR(30) NOT NULL, email VARCHAR(50) NOT NULL)
    WRAPPER orc_fdw
    OPTIONS
      (
        LOCATION = 's3://pp-secret-bucket/users/*.orc',
        AWS_ID = 'our_aws_id',
        AWS_SECRET = 'our_aws_secret'
      );

3.2.2.7.7.8 Changing a foreign table to a regular table

Materializes a foreign table into a regular table.

    CREATE TABLE real_table
     AS SELECT * FROM some_foreign_table;

3.2.2.7.8 CREATE FUNCTION

CREATE FUNCTION creates a new user-defined function (UDF) in an existing database.

See more in our Python UDF (user-defined functions) guide.

3.2.2.7.8.1 Permissions

The role must have the CREATE FUNCTION permission at the database level.

3.2.2.7.8.2 Syntax

    create_function_statement ::=
        CREATE [ OR REPLACE ] FUNCTION function_name (argument_list)
        RETURNS return_type
        AS $$
        { function_body }
        $$ LANGUAGE python
        ;

    function_name ::= identifier

    argument_list ::= { value_name type_name [, ...] }

    value_name ::= identifier

    return_type ::= type_name

    function_body ::= Valid Python code

3.2.2.7.8.3 Parameters

Parameter       Description
OR REPLACE      Create a new function, and overwrite any existing function by the same name. Does not
                return an error if the function already exists.
                CREATE OR REPLACE does not check the function contents or structure, only the
                function name.
function_name   The name of the function to create, which must be unique inside the database.
argument_list   A comma separated list of argument definitions. Each definition includes a name
                identifier and a datatype.
return_type     The SQL datatype of the return value, such as INT, VARCHAR, etc.
function_body   Python code, dollar-quoted ($$).

3.2.2.7.8.4 Examples

3.2.2.7.8.5 Calculate distance between two points

    CREATE OR REPLACE FUNCTION my_distance (x1 float, y1 float, x2 float, y2 float) RETURNS FLOAT AS $$
    import math
    if y1 < x1:
        return 0.0
    else:
        return math.sqrt((y2 - y1) ** 2 + (x2 - x1) ** 2)
    $$ LANGUAGE PYTHON;

    -- Usage:
    SELECT city, my_location, my_distance(x1,y1,x2,y2) from cities;

3.2.2.7.8.6 Calling files from other locations

    -- Our script my_code.py is in ~/my_python_stuff
    CREATE FUNCTION write_log() RETURNS INT AS $$
    import sys
    sys.path.append("/home/user/my_python_stuff")
    import my_code as f

    f.main()

    return 1
    $$ LANGUAGE PYTHON;

    -- Usage:
    SELECT write_log();

3.2.2.7.9 CREATE SCHEMA

The CREATE SCHEMA page describes the following:

• Overview
• Permissions
• Syntax
• Parameters
• Examples
  – Creating a Schema
  – Altering the Default Schema for a Role

3.2.2.7.9.1 Overview

CREATE SCHEMA creates a new schema in an existing database. A schema is a virtual space for storing tables.

The default schema in SQream DB is public.

Tip: Use schemas to separate use cases, such as staging and production.

Tables in different schemas can be queried in the same statement by qualifying them with their schema name, without providing an alias, as in the following example:

    select schema1.table1.column from schema1.table1 join schema2.table1 on schema1.table1.column1=schema2.table1.column1

See also: DROP SCHEMA, ALTER DEFAULT SCHEMA.
3.2.2.7.9.2 Permissions

The role must have the CREATE permission at the database level.

3.2.2.7.9.3 Syntax

The following is the syntax for creating a schema:

    create_schema_statement ::=
        CREATE SCHEMA schema_name
        ;

    schema_name ::= identifier

3.2.2.7.9.4 Parameters

The following table describes the schema_name parameter:

Parameter     Description
schema_name   The name of the schema to create.

3.2.2.7.9.5 Examples

This section includes the following examples:

• Creating a Schema
• Altering the Default Schema for a Role

3.2.2.7.9.6 Creating a Schema

The following example creates a schema, copies a table into it, and queries the copy:

    CREATE SCHEMA staging;

    CREATE TABLE staging.users AS SELECT * FROM public.users;

    SELECT * FROM staging.users;

3.2.2.7.9.7 Altering the Default Schema for a Role

The following example changes the default schema for a role:

    SELECT * FROM users; -- Refers to public.users

    ALTER DEFAULT SCHEMA FOR bgilfoyle TO staging;

    SELECT * FROM users; -- Now refers to staging.users, rather than public.users

3.2.2.7.10 CREATE TABLE

CREATE TABLE creates a new table in an existing database.

Tip:
• To create a table based on the result of a select query, see CREATE TABLE AS.
• To create a table based on files like Parquet and ORC, see CREATE FOREIGN TABLE.

3.2.2.7.10.1 Permissions

The role must have the CREATE permission at the schema level.
3.2.2.7.10.2 Syntax

The following is the syntax for creating a table:

    create_table_statement ::=
        CREATE [ OR REPLACE ] TABLE [schema_name.]table_name (
            { column_def [, ...] }
        )
        [ CLUSTER BY { column_name [, ...] } ]
        ;

    schema_name ::= identifier

    table_name ::= identifier

    column_def ::= { column_name type_name [ default ] [ column_constraint ] }

    column_name ::= identifier

    column_constraint ::=
        { NOT NULL | NULL }

    default ::=
        DEFAULT default_value
        | IDENTITY [ ( start_with [ , increment_by ] ) ]

3.2.2.7.10.3 Parameters

The following parameters can be used when creating a table:

Parameter                     Description
OR REPLACE                    Creates a new table and overwrites any existing table by the same name.
                              Does not return an error if the table already exists. CREATE OR REPLACE
                              does not check the table contents or structure, only the table name.
schema_name                   The name of the schema in which to create the table.
table_name                    The name of the table to create, which must be unique inside the schema.
column_def                    A comma separated list of column definitions. A minimal column definition
                              includes a name identifier and a datatype. Other column constraints and
                              default values can be added optionally.
CLUSTER BY column_name1 ...   A comma separated list of clustering column keys. See Data clustering for
                              more information.
LIKE                          Duplicates the column structure of an existing table.

3.2.2.7.10.4 Default Value Constraints

The DEFAULT value constraint specifies a value to use if a value isn't defined in an INSERT or COPY FROM statement.

The value may be either a literal or GETDATE(), which is evaluated at the time the row is created.

Note: The DEFAULT constraint only applies if the column does not have a value specified in the INSERT or COPY FROM statement. You can still insert a NULL into a nullable column by explicitly inserting NULL. For example, INSERT INTO cool_animals VALUES (1, 'Gnu', NULL).
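The difference between omitting a column and inserting NULL explicitly can be sketched as follows. This is a minimal illustration, not output from a real system: the table name animal_weights and its columns are hypothetical, and the INSERT with an explicit column list is assumed to behave as it does in standard SQL.

```sql
-- Hypothetical table, for illustration only: weight is nullable and has a literal default
CREATE TABLE animal_weights (
   id INT NOT NULL,
   name VARCHAR(30) NOT NULL,
   weight FLOAT DEFAULT 0.0
);

-- weight is not listed, so the DEFAULT value is used for this row
INSERT INTO animal_weights (id, name) VALUES (1, 'Dog');

-- weight is given explicitly as NULL, so NULL is stored and the DEFAULT is not applied
INSERT INTO animal_weights VALUES (2, 'Gnu', NULL);

SELECT id, name, weight FROM animal_weights;
```

Under these assumptions, the first row should carry the default weight and the second row a NULL (displayed as \N in sqream sql).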
3.2.2.7.10.5 Syntax

The following is the syntax for the DEFAULT value constraint:

    column_def ::= { column_name type_name [ default ] [ column_constraint ] }

    column_constraint ::=
        { NOT NULL | NULL }

    default ::=
        DEFAULT default_value
        | IDENTITY [ ( start_with [ , increment_by ] ) ]

    check_specification ::=
        CHECK( 'CS compression_spec' )

    compression_spec ::=
        { "default" | "p4d" | "dict" | "rle" | "sequence" | "flat" }

3.2.2.7.10.6 Identity

Identity (or sequence) columns can be used for generating key values. Some databases call this AUTOINCREMENT.

The identity property on a column guarantees that the value for each newly inserted row is generated from the current seed and increment.

Warning: The identity property on a column does not guarantee uniqueness. The identity value can be bypassed by specifying it in an INSERT command.

The following table describes the identity parameters:

Parameter      Description
start_with     A value that is used for the very first row loaded into the table.
increment_by   Incremental value that is added to the identity value of the previous row that was loaded.
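The identity parameters and the warning above can be sketched as follows. The table name event_log, its columns, and the chosen seed and increment are hypothetical, and the INSERT with an explicit column list is assumed to behave as it does in standard SQL; treat this as an illustration rather than a definitive recipe.

```sql
-- Hypothetical table, for illustration only
CREATE TABLE event_log (
   id BIGINT IDENTITY(100, 10) NOT NULL,  -- start_with = 100, increment_by = 10
   label VARCHAR(30) NOT NULL
);

-- id is not listed, so its value is generated from start_with and increment_by
INSERT INTO event_log (label) VALUES ('first');
INSERT INTO event_log (label) VALUES ('second');

-- id is given explicitly, which bypasses the identity property;
-- nothing prevents this value from duplicating a generated one
INSERT INTO event_log (id, label) VALUES (100, 'manual');

SELECT id, label FROM event_log;
```

Because the identity property does not enforce uniqueness, a load process that mixes generated and explicit values needs its own duplicate check if unique keys matter.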
3.2.2.7.10.7 Examples

3.2.2.7.10.8 Standard Table

The following example creates a standard table:

    CREATE TABLE cool_animals (
       id INT NOT NULL,
       name varchar(30) NOT NULL,
       weight FLOAT,
       is_agressive BOOL
    );

3.2.2.7.10.9 Table with Default Value Constraints for Some Columns

The following example creates a table with default value constraints for some columns:

    CREATE TABLE cool_animals (
       id INT NOT NULL,
       name varchar(30) NOT NULL,
       weight FLOAT,
       is_agressive BOOL DEFAULT false NOT NULL
    );

Note: The nullable/non-nullable constraint appears at the end, after the default option.

3.2.2.7.10.10 Table with an Identity Column

The following example creates a table with an identity (auto-increment) column:

    CREATE TABLE users (
       id BIGINT IDENTITY(0,1) NOT NULL , -- Start with 0, increment by 1
       name VARCHAR(30) NOT NULL,
       country VARCHAR(30) DEFAULT 'Unknown' NOT NULL
    );

Note:
• Identity columns are supported on BIGINT columns.
• Identity does not enforce the uniqueness of values. The identity value can be bypassed by specifying it in an INSERT command.

3.2.2.7.10.11 Creating a Table from a SELECT Query

Use a CREATE TABLE AS statement to create a new table from the results of a SELECT query, as in the following example:

    CREATE TABLE users_uk AS SELECT * FROM users WHERE country = 'United Kingdom';

3.2.2.7.10.12 Creating a Table with a Clustering Key

When data in a table is stored in a sorted order, the sorted columns are considered clustered. Good clustering can have a significant positive impact on performance.

In the following example, we expect the start_date column to be naturally clustered, as new users sign up and get a newer start date.
When the clustering key is set, if the incoming data isn't naturally clustered, it will be clustered by SQream DB during insert or bulk load.

See Data clustering for more information.

    CREATE TABLE users (
       name VARCHAR(30) NOT NULL,
       start_date datetime not null,
       country VARCHAR(30) DEFAULT 'Unknown' NOT NULL
    ) CLUSTER BY start_date;

3.2.2.7.11 CREATE TABLE AS

CREATE TABLE AS creates a new table from the result of a select query.

3.2.2.7.11.1 Permissions

The role must have the CREATE permission at the schema level, as well as SELECT permissions for any tables referenced by the statement.

3.2.2.7.11.2 Syntax

    create_table_statement ::=
        CREATE [ OR REPLACE ] TABLE [schema_name.]table_name
        AS query
        ;

    schema_name ::= identifier

    table_name ::= identifier

3.2.2.7.11.3 Parameters

Parameter     Description
OR REPLACE    Create a new table, and overwrite any existing table by the same name. Does not return
              an error if the table already exists. CREATE OR REPLACE does not check the table
              contents or structure, only the table name.
schema_name   The name of the schema in which to create the table.
table_name    The name of the table to create, which must be unique inside the schema.
query         A select query that returns data.

3.2.2.7.11.4 Examples

3.2.2.7.11.5 Create a copy of a foreign table or view

    CREATE TABLE users AS SELECT * FROM users_source;

3.2.2.7.11.6 Filtering

    CREATE TABLE users_uk AS SELECT * FROM users WHERE country = 'United Kingdom';

3.2.2.7.11.7 Adding columns

    CREATE TABLE users_uk_new AS SELECT GETDATE() as "Date",*,false as is_new FROM users_uk;

3.2.2.7.11.8 Creating a table from values

    CREATE TABLE new_users
    AS VALUES(GETDATE(),'Richard','Foxworthy','1984-03-03',True);

3.2.2.7.12 CREATE VIEW

CREATE VIEW creates a new view in an existing database. A view is a virtual table.

Tip:
• Use views to simplify complex queries or present only partial data to specific roles.
• If an underlying table has changed (new columns, changed names, etc.), a view may be invalidated. To recompile the view, see SELECT RECOMPILE_VIEW.

3.2.2.7.12.1 Permissions

The role must have the CREATE permission at the database level, as well as SELECT permissions for any tables referenced by the view.

3.2.2.7.12.2 Syntax

    create_view_statement ::=
        CREATE VIEW [schema_name.]view_name [ column_list ]
        AS query
        ;

    schema_name ::= identifier

    view_name ::= identifier

    column_list ::= ( { column_name [, ...] } )

    column_name ::= identifier

3.2.2.7.12.3 Parameters

Parameter     Description
schema_name   The name of the schema in which to create the view.
view_name     The name of the view to create, which must be unique inside the schema.
column_list   An optional comma separated list of column names for the view. If specified, these
              column names override the column names returned by the query in the AS query clause.
AS query      The select query to execute when the view is referenced.

3.2.2.7.12.4 Examples

3.2.2.7.12.5 A simple view

    CREATE VIEW only_agressive_animals AS
      SELECT * FROM cool_animals WHERE is_agressive=true;

    SELECT * FROM only_agressive_animals;

3.2.2.7.12.6 Overriding default column names

    CREATE VIEW only_relaxed_animals (animal_id, animal_name, should_i_worry) AS
      SELECT id, name, is_agressive FROM cool_animals WHERE is_agressive=false;

3.2.2.7.13 DROP CLUSTERING KEY

DROP CLUSTERING KEY drops all clustering keys in a table.

Read our Data clustering guide for more information.

See also: CLUSTER BY, CREATE TABLE.

3.2.2.7.13.1 Permissions

The role must have the DDL permission at the database or table level.

3.2.2.7.13.2 Syntax

    alter_table_rename_table_statement ::=
        ALTER TABLE [schema_name.]table_name DROP CLUSTERING KEY
        ;

    table_name ::= identifier

3.2.2.7.13.3 Parameters

Parameter     Description
schema_name   The schema name for the table. Defaults to public if not specified.
table_name     The table name to apply the change to.

3.2.2.7.13.4 Usage notes

Removing clustering keys does not affect existing data. To force data to re-cluster, the table has to be recreated (e.g. with CREATE TABLE AS).

3.2.2.7.13.5 Examples

3.2.2.7.13.6 Dropping clustering keys in a table

    ALTER TABLE public.users DROP CLUSTERING KEY;

3.2.2.7.14 DROP COLUMN

DROP COLUMN can be used to remove columns from a table.

3.2.2.7.14.1 Permissions

The role must have the DDL permission at the database or table level.

3.2.2.7.14.2 Syntax

    alter_table_drop_column_statement ::=
        ALTER TABLE [schema_name.]table_name DROP COLUMN column_name
        ;

    table_name ::= identifier

    schema_name ::= identifier

    column_name ::= identifier

3.2.2.7.14.3 Parameters

Parameter      Description
schema_name    The schema name for the table. Defaults to public if not specified.
table_name     The table name to apply the change to.
column_name    The column to remove.

3.2.2.7.14.4 Examples

3.2.2.7.14.5 Removing a column

    -- Remove the 'weight' column
    ALTER TABLE users DROP COLUMN weight;

3.2.2.7.14.6 Removing a column with a quoted identifier name

    ALTER TABLE users DROP COLUMN "Weight in kilograms";

3.2.2.7.15 DROP DATABASE

DROP DATABASE can be used to remove a database and all of its objects.

3.2.2.7.15.1 Permissions

The role must have the DDL permission at the database level.

3.2.2.7.15.2 Syntax

    drop_database_statement ::=
        DROP DATABASE database_name
        ;

    database_name ::= identifier

3.2.2.7.15.3 Parameters

Parameter        Description
database_name    The name of the database to drop. This cannot be the current database in use.

3.2.2.7.15.4 Examples

3.2.2.7.15.5 Dropping a database and all of its objects

    master=> DROP DATABASE raviga;
    executed

3.2.2.7.15.6 Dropping the current database

The current database in use can’t be dropped. Switch to another database first.
    raviga=> DROP DATABASE raviga;
    Current open database 'raviga' cannot be dropped.
    raviga=> \c master
    master=> DROP DATABASE raviga;
    executed

3.2.2.7.16 DROP FUNCTION

DROP FUNCTION can be used to remove a user defined function.

3.2.2.7.16.1 Permissions

The role must have the DDL permission at the database level.

3.2.2.7.16.2 Syntax

    drop_function_statement ::=
        DROP FUNCTION [ IF EXISTS ] function_name()
        ;

    function_name ::= identifier

3.2.2.7.16.3 Parameters

Parameter          Description
IF EXISTS          Drop the function if it exists. Does not error if the function does not exist.
function_name()    The name of the function to drop.

3.2.2.7.16.4 Examples

3.2.2.7.16.5 Dropping a function

    DROP FUNCTION my_distance();

3.2.2.7.16.6 Dropping a function (always succeeds)

    farm=> DROP FUNCTION my_distance();
    executed
    farm=> DROP FUNCTION my_distance();
    Function 'my_distance' not found
    -- This will succeed, even though the function does not exist
    farm=> DROP FUNCTION IF EXISTS my_distance();
    executed

3.2.2.7.17 DROP SCHEMA

DROP SCHEMA can be used to remove a schema.

The schema has to be empty before removal. SQream DB does not support dropping a schema that still contains objects.

See also: CREATE SCHEMA, ALTER DEFAULT SCHEMA.

3.2.2.7.17.1 Permissions

The role must have the DDL permission at the database level.

3.2.2.7.17.2 Syntax

    drop_schema_statement ::=
        DROP SCHEMA schema_name
        ;

    schema_name ::= identifier

3.2.2.7.17.3 Parameters

Parameter      Description
schema_name    The name of the schema to drop.

3.2.2.7.17.4 Examples

3.2.2.7.17.5 Dropping a schema

    DROP SCHEMA test;

3.2.2.7.17.6 Dropping a schema if it’s not empty

If a schema still contains tables, SQream DB reports that these tables need to be dropped first. This prevents accidental dropping of full schemas.
    t=> DROP SCHEMA test;
    Schema 'test' contains the following objects:
    Tables - 'test.foo' , 'test.bar'
    Please drop its content and then try again.

To drop the schema, drop the schema’s tables first, and then drop the schema:

    t=> DROP TABLE test.foo;
    executed
    t=> DROP TABLE test.bar;
    executed
    t=> DROP SCHEMA test;
    executed

3.2.2.7.18 DROP TABLE

DROP TABLE can be used to remove a table and all of its contents.

3.2.2.7.18.1 Permissions

The role must have the DDL permission at the database or table level.

3.2.2.7.18.2 Syntax

    drop_table_statement ::=
        DROP TABLE [ IF EXISTS ] [schema_name.]table_name
        ;

    table_name ::= identifier

    schema_name ::= identifier

3.2.2.7.18.3 Parameters

Parameter      Description
IF EXISTS      Drop the table if it exists. Does not error if the table does not exist.
schema_name    The name of the schema from which to drop the table.
table_name     The name of the table to drop.

3.2.2.7.18.4 Examples

3.2.2.7.18.5 Dropping a table

    DROP TABLE cool_animals;

3.2.2.7.18.6 Dropping a table (always succeeds)

    farm=> DROP TABLE cool_animals;
    executed
    farm=> DROP TABLE cool_animals;
    Table 'public.cool_animals' not found
    -- This will succeed, even though the table does not exist
    farm=> DROP TABLE IF EXISTS cool_animals;
    executed

3.2.2.7.19 DROP VIEW

DROP VIEW can be used to remove a view.

Because a view is logical, this does not affect any data in any of the referenced tables.

3.2.2.7.19.1 Permissions

The role must have the DDL permission at the database level.

3.2.2.7.19.2 Syntax

    drop_view_statement ::=
        DROP VIEW [ IF EXISTS ] [schema_name.]view_name
        ;

    view_name ::= identifier

    schema_name ::= identifier

3.2.2.7.19.3 Parameters

Parameter      Description
IF EXISTS      Drop the view if it exists. Does not error if the view does not exist.
schema_name    The name of the schema from which to drop the view.
view_name      The name of the view to drop.
3.2.2.7.19.4 Examples

3.2.2.7.19.5 Dropping a view

    DROP VIEW angry_animals;

3.2.2.7.19.6 Dropping a view (always succeeds)

    farm=> DROP VIEW angry_animals;
    executed
    farm=> DROP VIEW angry_animals;
    View 'public.angry_animals' not found
    -- This will succeed, even though the view does not exist
    farm=> DROP VIEW IF EXISTS angry_animals;
    executed

3.2.2.7.20 RENAME COLUMN

RENAME COLUMN can be used to rename columns in a table.

3.2.2.7.20.1 Permissions

The role must have the DDL permission at the database or table level.

3.2.2.7.20.2 Syntax

    alter_table_rename_column_statement ::=
        ALTER TABLE [schema_name.]table_name RENAME COLUMN current_name TO new_name
        ;

    table_name ::= identifier

    schema_name ::= identifier

    current_name ::= identifier

    new_name ::= identifier

3.2.2.7.20.3 Parameters

Parameter       Description
schema_name     The schema name for the table. Defaults to public if not specified.
table_name      The table name to apply the change to.
current_name    The column to rename.
new_name        The new column name.

3.2.2.7.20.4 Examples

3.2.2.7.20.5 Renaming a column

    -- Rename the 'weight' column to 'mass'
    ALTER TABLE users RENAME COLUMN weight TO mass;

3.2.2.7.20.6 Renaming a quoted name

    ALTER TABLE users RENAME COLUMN "mass" TO "Mass (Kilograms)";

3.2.2.7.21 RENAME TABLE

RENAME TABLE can be used to rename a table.

Warning: Renaming a table can invalidate existing views that use this table. See more about recompiling views.

3.2.2.7.21.1 Permissions

The role must have the DDL permission at the database or table level.

3.2.2.7.21.2 Syntax

    alter_table_rename_table_statement ::=
        ALTER TABLE [schema_name.]current_name RENAME TO new_name
        ;

    current_name ::= identifier

    schema_name ::= identifier

    new_name ::= identifier

3.2.2.7.21.3 Parameters

Parameter       Description
schema_name     The schema name for the table. Defaults to public if not specified.
current_name    The table name to apply the change to.
new_name        The new table name.

3.2.2.7.21.4 Examples

3.2.2.7.21.5 Renaming a table

    ALTER TABLE public.users RENAME TO former_users;

3.2.2.7.22 COPY FROM

COPY ... FROM loads data from files on the filesystem into SQream DB tables. This is the recommended way to bulk load CSV files into SQream DB.

In general, COPY moves data between filesystem files and SQream DB tables.

Note:
• Learn how to migrate from CSV files in the Insert from CSV guide.
• To copy data from a table to a file, see COPY TO.
• To load Parquet or ORC files, see CREATE FOREIGN TABLE.

3.2.2.7.22.1 Permissions

The role must have the INSERT permission on the destination table.

3.2.2.7.22.2 Syntax

    COPY [schema_name.]table_name
      FROM WRAPPER fdw_name
      OPTIONS
      (
        [ copy_from_option [, ...] ]
      )
      ;

    schema_name ::= identifier

    table_name ::= identifier

    copy_from_option ::=
        LOCATION = { filename | S3 URI | HDFS URI }
        | OFFSET = { offset }
        | LIMIT = { limit }
        | DELIMITER = '{ delimiter }'
        | RECORD_DELIMITER = '{ record delimiter }'
        | ERROR_LOG = '{ local filepath }'
        | REJECTED_DATA = '{ local filepath }'
        | CONTINUE_ON_ERROR = { true | false }
        | ERROR_COUNT = '{ error count }'
        | DATETIME_FORMAT = '{ parser format }'
        | AWS_ID = '{ AWS ID }'
        | AWS_SECRET = '{ AWS Secret }'

    offset ::= positive integer

    limit ::= positive integer

    delimiter ::= string

    record delimiter ::= string

    error count ::= integer

    parser format ::= see supported parser table below

    AWS ID ::= string

    AWS Secret ::= string

Note: Some options are applicable to CSVs only. These include: OFFSET, LIMIT, DELIMITER, RECORD_DELIMITER, REJECTED_DATA, DATETIME_FORMAT.
3.2.2.7.22.3 Elements

[schema_name.]table_name
    Default value: None
    Description: Table to copy data into.

fdw_name
    Value range: csv_fdw, orc_fdw, or parquet_fdw
    Description: The name of the Foreign Data Wrapper to use.

LOCATION
    Default value: None
    Description: A path on the local filesystem, S3, or HDFS URI. For example, /tmp/foo.csv, s3://my-bucket/foo.csv, or hdfs://my-namenode:8020/foo.csv. The local path must be an absolute path that SQream DB can access. Wildcards are permitted in this field.

OFFSET
    Default value: 1
    Value range: >1, but no more than the number of lines in the first file
    Description: The row number to start with. The first row is 1.

LIMIT
    Default value: unlimited
    Value range: 1 to 2147483647
    Description: When specified, tells SQream DB to stop loading after the specified number of rows. Unlimited if unset.

DELIMITER
    Default value: ','
    Value range: Almost any ASCII character. See the field delimiters section below.
    Description: Specifies the field terminator: the character (or characters) that separates fields or columns within each row of the file.

RECORD_DELIMITER
    Default value: \n (UNIX style newline)
    Value range: \n, \r\n, \r
    Description: Specifies the row terminator: the character that separates lines or rows, also known as a new line separator.

ERROR_LOG
    Default value: No error log
    Description: When used, the COPY process will write error information from unparsable rows to the file specified by this parameter.
    • If an existing file path is specified, it will be overwritten.
    • Specifying the same file for ERROR_LOG and REJECTED_DATA is not allowed and will result in an error.
    • Specifying an error log when creating a foreign table will write a new error log for every query on the foreign table.

REJECTED_DATA
    Default value: Inactive
    Description: When used, the COPY process will write the rejected record lines to this file.
    • If an existing file path is specified, it will be overwritten.
    • Specifying the same file for ERROR_LOG and REJECTED_DATA is not allowed and will result in an error.
    • Specifying an error log when creating a foreign table will write a new error log for every query on the foreign table.

CONTINUE_ON_ERROR
    Default value: false
    Value range: true, false
    Description: Specifies if errors should be ignored or skipped. When set to true, the transaction will continue despite rejected data. This parameter should be set together with ERROR_COUNT. When reading multiple files, if an entire file can’t be opened it will be skipped.

ERROR_COUNT
    Default value: unlimited
    Value range: 1 to 2147483647
    Description: Specifies the threshold for the maximum number of faulty records that will be ignored. This setting must be used in conjunction with CONTINUE_ON_ERROR.

DATETIME_FORMAT
    Default value: ISO8601 for all columns
    Value range: See table below
    Description: Allows specifying non-default date formats for specific columns.

AWS_ID, AWS_SECRET
    Default value: None
    Description: Specifies the authentication details for secured S3 buckets.

3.2.2.7.22.4 Supported Date Formats

Table 14: Supported date parsers

Name                Pattern                           Examples
ISO8601, DEFAULT    YYYY-MM-DD [hh:mm:ss[.SSS]]       2017-12-31 11:12:13.456, 2018-11-02 11:05:00, 2019-04-04
ISO8601C            YYYY-MM-DD [hh:mm:ss[:SSS]]       2017-12-31 11:12:13:456
DMY                 DD/MM/YYYY [hh:mm:ss[.SSS]]       31/12/2017 11:12:13.123
YMD                 YYYY/MM/DD [hh:mm:ss[.SSS]]       2017/12/31 11:12:13.678
MDY                 MM/DD/YYYY [hh:mm:ss[.SSS]]       12/31/2017 11:12:13.456
YYYYMMDD            YYYYMMDD[hh[mm[ss[SSS]]]]         20171231111213456
YYYY-M-D            YYYY-M-D[ h:m[:s[.S]]]            2017-9-10 10:7:21.1 (optional leading zeroes)
YYYY/M/D            YYYY/M/D[ h:m[:s[.S]]]            2017/9/10 10:7:21.1 (optional leading zeroes)
DD-mon-YYYY         DD-mon-YYYY[ hh:mm[:ss[.SSS]]]    31-Dec-2017 11:12:13.456
YYYY-mon-DD         YYYY-mon-DD[ hh:mm[:ss[.SSS]]]    2017-Dec-31 11:12:13.456

Pattern    Description
YYYY       four digit year representation (0000-9999)
MM         two digit month representation (01-12)
DD         two digit day of month representation (01-31)
m          short month representation (Jan-Dec)
a          short day of week representation (Sun-Sat).
hh         two digit 24 hour representation (00-23)
h          two digit 12 hour representation (00-12)
P          uppercase AM/PM representation
mm         two digit minute representation (00-59)
ss         two digit seconds representation (00-59)
SSS        3 digits fraction representation for milliseconds (000-999)

Note: These date patterns are not the same as date parts used in the DATEPART function.

3.2.2.7.22.5 Supported Field Delimiters

Field delimiters can be one or more characters.

3.2.2.7.22.6 Customizing Quotations Using Alternative Characters

3.2.2.7.22.7 Syntax Example 1 - Customizing Quotations Using Alternative Characters

The following is the correct syntax for customizing quotations using alternative characters:

    copy t from wrapper csv_fdw options (location = '/tmp/source_file.csv', quote='@');
    copy t to wrapper csv_fdw options (location = '/tmp/destination_file.csv', quote='@');

3.2.2.7.22.8 Usage Example 1 - Customizing Quotations Using Alternative Characters

The following is an example of a line taken from a CSV file when customizing quotations using a character:

    Pepsi-"Cola",@Coca-"Cola"@,Sprite,Fanta

3.2.2.7.22.9 Syntax Example 2 - Customizing Quotations Using ASCII Character Codes

The following is the correct syntax for customizing quotations using ASCII character codes:

    copy t from wrapper csv_fdw options (location = '/tmp/source_file.csv', quote=E'\064');
    copy t to wrapper csv_fdw options (location = '/tmp/destination_file.csv', quote=E'\064');

3.2.2.7.22.10 Usage Example 2 - Customizing Quotations Using ASCII Character Codes

The following is an example of a line taken from a CSV file when customizing quotations using an ASCII character code:

    Pepsi-"Cola",@Coca-"Cola"@,Sprite,Fanta

3.2.2.7.22.11 Multi-Character Delimiters

SQream DB supports multi-character field delimiters, sometimes found in non-standard files.

A multi-character delimiter can be specified. For example, DELIMITER '%%', DELIMITER '{~}', etc.
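To get a feel for how an alternative quote character and a multi-character delimiter change the way a record is split, the following is a minimal sketch using only the Python standard library. It is not SQream DB's parser and may differ from it in edge cases; the helper names split_with_quote and split_multichar, and the sample record 1%%Dog%%7, are hypothetical and used for illustration only.

```python
import csv


def split_with_quote(line, quote="@", delimiter=","):
    """Split one record using a custom quote character (stdlib csv module)."""
    return next(csv.reader([line], delimiter=delimiter, quotechar=quote))


def split_multichar(line, delimiter):
    """Split one unquoted record on a multi-character delimiter such as '%%'."""
    return line.split(delimiter)


# With '@' as the quote character, double quotes are ordinary characters.
print(split_with_quote('Pepsi-"Cola",@Coca-"Cola"@,Sprite,Fanta'))
# ['Pepsi-"Cola"', 'Coca-"Cola"', 'Sprite', 'Fanta']

# A hypothetical record that uses '%%' as its field delimiter.
print(split_multichar("1%%Dog%%7", "%%"))
# ['1', 'Dog', '7']
```

Running a few sample lines through a check like this before a load can help confirm that the chosen quote character and delimiter do not collide with characters inside the data.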
3.2.2.7.22.12 Printable Characters

Any printable ASCII character (or characters) can be used as a delimiter without special syntax. The default CSV field delimiter is a comma (,).

A printable character is any ASCII character in the range 32 - 126.

Literal quoting rules apply to delimiters. For example, to use ' as a field delimiter, use DELIMITER ''''

3.2.2.7.22.13 Non-Printable Characters

A non-printable character (1 - 31, 127) can be used in its octal form.

A tab can be specified by escaping it, for example \t. Other non-printable characters can be specified using their octal representations, by using the E'\000' format, where 000 is the octal value of the character.

For example, ASCII character 15, known as “shift in”, can be specified using E'\017'.

3.2.2.7.22.14 Unsupported Field Delimiters

The following ASCII field delimiters (octal range 001 - 176) are not supported:

Character  Decimal  Octal    Character  Decimal  Octal    Character  Decimal  Octal
"          34       42       a          97       141      p          112      160
-          45       55       b          98       142      q          113      161
.          46       56       c          99       143      r          114      162
:          58       72       d          100      144      s          115      163
\          92       134      e          101      145      t          116      164
0          48       60       f          102      146      u          117      165
1          49       61       g          103      147      v          118      166
2          50       62       h          104      150      w          119      167
3          51       63       i          105      151      x          120      170
4          52       64       j          106      152      y          121      171
5          53       65       k          107      153      z          122      172
6          54       66       l          108      154      N          78       116
7          55       67       m          109      155      10         49       12
8          56       70       n          110      156      13         49       13
9          57       71       o          111      157

3.2.2.7.22.15 Capturing Rejected Rows

Before processing and storing the columns, the COPY command parses the data. Whenever the data can’t be parsed because it is improperly formatted or doesn’t match the data structure, the entire record (or row) will be rejected.

1. When ERROR_LOG is not used, the COPY command will stop and roll back the transaction upon the first error.
2. When ERROR_LOG is set and ERROR_VERBOSITY is set to 1 (default), all errors and rejected rows are saved to the file path specified.
3. When ERROR_LOG is set and ERROR_VERBOSITY is set to 0, rejected rows are saved to the file path specified, but errors are not logged. This is useful for replaying the file later.

3.2.2.7.22.16 CSV Support

By default, SQream DB’s CSV parser can handle RFC 4180 standard CSVs, but can also be modified to support non-standard CSVs (with multi-character delimiters, unquoted fields, etc).

All CSV files should be prepared according to these recommendations:

• Files are UTF-8 or ASCII encoded.
• Field delimiter is an ASCII character or characters.
• Record delimiter, also known as a new line separator, is a Unix-style newline (\n), DOS-style newline (\r\n), or Mac style newline (\r).
• Fields are optionally enclosed by double-quotes, and must be quoted if they contain one of the following characters:
  – The record delimiter or field delimiter
  – A double quote character
  – A newline
• If a field is quoted, any double quote that appears must be double-quoted (similar to the string literal quoting rules). For example, to encode What are "birds"?, the field should appear as "What are ""birds""?".

Other modes of escaping are not supported (e.g. 1,"What are \"birds\"?" is not a valid way of escaping CSV values).

3.2.2.7.22.17 Marking Null Markers

NULL values can be marked in two ways in the CSV:

• An explicit null marker. For example, col1,\N,col3
• An empty field delimited by the field delimiter. For example, col1,,col3

Note: If a text field is quoted but contains no content ("") it is considered an empty text field. It is not considered NULL.

3.2.2.7.22.18 Examples

3.2.2.7.22.19 Loading a Standard CSV File

    COPY table_name FROM WRAPPER csv_fdw OPTIONS (location = '/tmp/file.csv');

3.2.2.7.22.20 Skipping Faulty Rows

    COPY table_name FROM WRAPPER csv_fdw OPTIONS (location = '/tmp/file.csv', continue_on_error = true);
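When CSV files are generated by a script before loading, the recommendations in the CSV Support and Marking Null Markers sections can be applied while writing the file. The following is a minimal Python sketch of those rules, not an official SQream DB utility: encode_field and encode_row are hypothetical helper names, and the sketch assumes Python None should be written as the explicit \N marker and an empty string as a quoted empty field.

```python
def encode_field(value, delimiter=",", null_marker="\\N"):
    """Encode one value following the CSV recommendations described above."""
    if value is None:
        return null_marker                      # explicit null marker
    text = str(value)
    if text == "":
        return '""'                             # quoted empty field: empty text, not NULL
    # Quote only when the field contains the delimiter, a double quote, or a newline.
    if any(token in text for token in (delimiter, '"', "\n", "\r")):
        return '"' + text.replace('"', '""') + '"'   # double any embedded double quote
    return text


def encode_row(values, delimiter=","):
    """Encode one record, terminated by a Unix-style newline."""
    return delimiter.join(encode_field(v, delimiter) for v in values) + "\n"


print(encode_row([1, 'What are "birds"?', None, ""]), end="")
# 1,"What are ""birds""?",\N,""
```

Writing NULL as \N rather than as an empty field keeps NULL values visually distinct from empty text in the generated file.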
3.2.2.7.22.21 Skipping a Maximum of 100 Faulty Rows

    COPY table_name FROM WRAPPER csv_fdw OPTIONS (location = '/tmp/file.csv', continue_on_error = true, error_count = 100);

3.2.2.7.22.22 Loading a Pipe Separated Value (PSV) File

    COPY table_name FROM WRAPPER csv_fdw OPTIONS (location = '/tmp/file.psv', delimiter = '|');

3.2.2.7.22.23 Loading a Tab Separated Value (TSV) File

    COPY table_name FROM WRAPPER csv_fdw OPTIONS (location = '/tmp/file.tsv', delimiter = '\t');

3.2.2.7.22.24 Loading an ORC File

    COPY table_name FROM WRAPPER orc_fdw OPTIONS (location = '/tmp/file.orc');

3.2.2.7.22.25 Loading a Parquet File

    COPY table_name FROM WRAPPER parquet_fdw OPTIONS (location = '/tmp/file.parquet');

3.2.2.7.22.26 Loading a Text File with Non-Printable Delimiters

In the file below, the separator is DC1, which is represented by ASCII 17 decimal or 021 octal.

    COPY table_name FROM WRAPPER csv_fdw OPTIONS (location = '/tmp/file.txt', delimiter = E'\021');

3.2.2.7.22.27 Loading a Text File with Multi-Character Delimiters

In the file below, the separator is ^|.

    COPY table_name FROM WRAPPER csv_fdw OPTIONS (location = '/tmp/file.txt', delimiter = '^|');

In the file below, the separator is '|. The quote character has to be repeated, as per the literal quoting rules.

    COPY table_name FROM WRAPPER csv_fdw OPTIONS (location = '/tmp/file.txt', delimiter = '''|');

3.2.2.7.22.28 Loading Files with a Header Row

Use OFFSET to skip rows.

Note: When loading multiple files (e.g. with wildcards), this setting affects each file separately.
    COPY table_name FROM WRAPPER csv_fdw OPTIONS (location = '/tmp/file.psv', delimiter = '|', offset = 2);

3.2.2.7.22.29 Loading Files Formatted for Windows (\r\n)

    COPY table_name FROM WRAPPER csv_fdw OPTIONS (location = '/tmp/file.psv', record_delimiter = '\r\n');

3.2.2.7.22.30 Loading a File from a Public S3 Bucket

Note: The bucket must be publicly available, and its objects must be listable.

    COPY table_name FROM WRAPPER csv_fdw OPTIONS (location = 's3://sqream-demo-data/file.csv', record_delimiter = '\r\n', offset = 2);

3.2.2.7.22.31 Loading Files from an Authenticated S3 Bucket

    COPY table_name FROM WRAPPER csv_fdw OPTIONS (location = 's3://secret-bucket/*.csv', RECORD_DELIMITER = '\r\n', OFFSET = 2, AWS_ID = '12345678', AWS_SECRET = 'super_secretive_secret');

3.2.2.7.22.32 Saving Rejected Rows to a File

Note: When loading multiple files (e.g. with wildcards), this error threshold is for the entire transaction.

    COPY table_name FROM WRAPPER csv_fdw
      OPTIONS (
        location = '/tmp/file.csv'
        ,continue_on_error = true
        ,error_log = '/temp/load_error.log'
      );

    COPY table_name FROM WRAPPER csv_fdw
      OPTIONS (
        location = '/tmp/file.psv'
        ,delimiter = '|'
        ,error_log = '/temp/load_error.log'          -- Save error log
        ,rejected_data = '/temp/load_rejected.log'   -- Only save rejected rows
        ,limit = 100                                 -- Only load 100 rows
        ,error_count = 5                             -- Stop the load if 5 errors reached
      );

3.2.2.7.22.33 Loading CSV Files from a Set of Directories

    COPY table_name FROM WRAPPER csv_fdw OPTIONS (location = '/tmp/2019_08_*/*.csv');

3.2.2.7.22.34 Rearranging Destination Columns

When the column order in the source files does not match the table structure, tell the COPY command what the order of columns should be:

    COPY table_name (fifth, first, third) FROM WRAPPER csv_fdw OPTIONS (location = '/tmp/*.csv');

Note: Any column not specified will revert to its default value, or to NULL if it is nullable.

3.2.2.7.22.35 Loading Non-Standard Dates

If files contain dates not formatted as ISO8601, tell COPY how to parse the column. After parsing, the date will appear as ISO8601 inside SQream DB.

These are called date parsers. You can find the supported dates in the ‘Supported date parsers’ table above.

In this example, date_col1 and date_col2 in the table are non-standard. date_col3 is mentioned explicitly, but can be left out. Any column that is not specified is assumed to be ISO8601.

    COPY table_name FROM WRAPPER csv_fdw OPTIONS (location = '/tmp/*.csv', datetime_format = 'DMY');

3.2.2.7.23 COPY TO

COPY ... TO exports data from a SQream DB table or query to a file on the filesystem.

In general, COPY moves data between filesystem files and SQream DB tables.

Note: To copy data from a file to a table, see COPY FROM.

3.2.2.7.23.1 Permissions

The role must have the SELECT permission on every table or schema that is referenced by the statement.

3.2.2.7.23.2 Syntax

    COPY { [schema_name.]table_name [ ( column_name [, ... ] ) ] | query }
      TO [FOREIGN DATA] WRAPPER fdw_name
      OPTIONS
      (
        [ copy_to_option [, ...] ]
      )
      ;

    fdw_name ::= csv_fdw | parquet_fdw | orc_fdw

    schema_name ::= identifier

    table_name ::= identifier

    copy_to_option ::=
        LOCATION = { filename | S3 URI | HDFS URI }
        | DELIMITER = '{ delimiter }'
        | RECORD_DELIMITER = '{ record delimiter }'
        | HEADER = { true | false }
        | AWS_ID = '{ AWS ID }'
        | AWS_SECRET = '{ AWS Secret }'

    delimiter ::= string

    record delimiter ::= string

    AWS ID ::= string

    AWS Secret ::= string

3.2.2.7.23.3 Elements

Parameter                   Description
[schema_name.]table_name    Name of the table to be exported.
query                       An SQL query that returns a table result, or a table name.
fdw_name                    The name of the Foreign Data Wrapper to use.
                            Supported FDWs are csv_fdw, orc_fdw, or parquet_fdw.
LOCATION                    A path on the local filesystem, S3, or HDFS URI. For example, /tmp/foo.csv, s3://my-bucket/foo.csv, or hdfs://my-namenode:8020/foo.csv. The local path must be an absolute path that SQream DB can access.
HEADER                      The CSV file will contain a header line with the names of each column in the file. This option is allowed only when using CSV format.
DELIMITER                   Specifies the character that separates fields (columns) within each row of the file. The default is a comma character (,).
AWS_ID, AWS_SECRET          Specifies the authentication details for secured S3 buckets.

3.2.2.7.23.4 Usage notes

3.2.2.7.23.5 Supported field delimiters

3.2.2.7.23.6 Printable characters

Any printable ASCII character can be used as a delimiter without special syntax. The default CSV field delimiter is a comma (,).

A printable character is any ASCII character in the range 32 - 126.

3.2.2.7.23.7 Non-printable characters

A non-printable character (1 - 31, 127) can be used in its octal form.

A tab can be specified by escaping it, for example \t. Other non-printable characters can be specified using their octal representations, by using the E'\000' format, where 000 is the octal value of the character.

For example, ASCII character 15, known as “shift in”, can be specified using E'\017'.

3.2.2.7.23.8 Date format

Dates in the output CSV are formatted as ISO 8601 (2019-12-31 20:30:55.123), regardless of how they were parsed initially with COPY FROM date parsers.

3.2.2.7.23.9 Examples

3.2.2.7.23.10 Export table to a CSV without HEADER

    COPY nba TO WRAPPER csv_fdw OPTIONS (LOCATION = '/tmp/nba_export.csv', DELIMITER = ',', HEADER = false);
    $ head -n6 nba.csv
    Avery Bradley,Boston Celtics,0,PG,25,6-2,180,Texas,7730337
    Jae Crowder,Boston Celtics,99,SF,25,6-6,235,Marquette,6796117
    John Holland,Boston Celtics,30,SG,27,6-5,205,Boston University,\N
    R.J. Hunter,Boston Celtics,28,SG,22,6-5,185,Georgia State,1148640
    Jonas Jerebko,Boston Celtics,8,PF,29,6-10,231,\N,5000000
    Amir Johnson,Boston Celtics,90,PF,29,6-9,240,\N,12000000

3.2.2.7.23.11 Export table to a CSV with a HEADER row

    COPY nba TO WRAPPER csv_fdw OPTIONS (LOCATION = '/tmp/nba_export.csv', DELIMITER = ',', HEADER = true);

    $ head -n6 nba_h.csv
    Name,Team,Number,Position,Age,Height,Weight,College,Salary
    Avery Bradley,Boston Celtics,0,PG,25,6-2,180,Texas,7730337
    Jae Crowder,Boston Celtics,99,SF,25,6-6,235,Marquette,6796117
    John Holland,Boston Celtics,30,SG,27,6-5,205,Boston University,\N
    R.J. Hunter,Boston Celtics,28,SG,22,6-5,185,Georgia State,1148640
    Jonas Jerebko,Boston Celtics,8,PF,29,6-10,231,\N,5000000

3.2.2.7.23.12 Export table to a TSV with a header row

    COPY nba TO WRAPPER csv_fdw OPTIONS (LOCATION = '/tmp/nba_export.csv', DELIMITER = '\t', HEADER = true);

    $ head -n6 nba_h.tsv
    Name	Team	Number	Position	Age	Height	Weight	College	Salary
    Avery Bradley	Boston Celtics	0	PG	25	6-2	180	Texas	7730337
    Jae Crowder	Boston Celtics	99	SF	25	6-6	235	Marquette	6796117
    John Holland	Boston Celtics	30	SG	27	6-5	205	Boston University	\N
    R.J. Hunter	Boston Celtics	28	SG	22	6-5	185	Georgia State	1148640
    Jonas Jerebko	Boston Celtics	8	PF	29	6-10	231	\N	5000000

3.2.2.7.23.13 Use non-printable ASCII characters as delimiter

Non-printable characters can be specified using their octal representations, by using the E'\000' format, where 000 is the octal value of the character.

For example, ASCII character 15, known as “shift in”, can be specified using E'\017'.

    COPY nba TO WRAPPER csv_fdw OPTIONS (LOCATION = '/tmp/nba_export.csv', DELIMITER = E'\017');
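The octal form of a delimiter can be derived mechanically from the character's ASCII code. The following is a small Python sketch of that conversion; octal_delimiter_literal is a hypothetical helper name, not part of any SQream DB client library, and the range check simply mirrors the non-printable range (1 - 31, 127) given above.

```python
def octal_delimiter_literal(char):
    """Return the E'\\ooo' literal for a single non-printable ASCII character."""
    code = ord(char)
    if not (1 <= code <= 31 or code == 127):
        raise ValueError("expected a non-printable ASCII character (1-31 or 127)")
    # Three octal digits, zero-padded, as in the E'\000' format.
    return "E'\\{:03o}'".format(code)


print(octal_delimiter_literal("\x0f"))  # E'\017'  (shift in, decimal 15)
print(octal_delimiter_literal("\t"))    # E'\011'  (tab, decimal 9)
```

The returned string can be placed directly after DELIMITER = when building a COPY statement.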
    COPY nba TO WRAPPER csv_fdw OPTIONS (LOCATION = '/tmp/nba_export.csv', DELIMITER = E'\011'); -- 011 is a tab character

3.2.2.7.23.14 Exporting the result of a query to a CSV

    COPY (SELECT "Team", AVG("Salary") FROM nba GROUP BY 1) TO WRAPPER csv_fdw OPTIONS (LOCATION = '/tmp/nba_export.csv');

    $ head -n5 nba_salaries.csv
    Atlanta Hawks,4860196
    Boston Celtics,4181504
    Brooklyn Nets,3501898
    Charlotte Hornets,5222728
    Chicago Bulls,5785558

3.2.2.7.23.15 Saving files to an authenticated S3 bucket

    COPY (SELECT "Team", AVG("Salary") FROM nba GROUP BY 1) TO WRAPPER csv_fdw OPTIONS (LOCATION = 's3://my_bucket/salaries/nba_export.csv', AWS_ID = 'my_aws_id', AWS_SECRET = 'my_aws_secret');

3.2.2.7.23.16 Saving files to an HDFS path

    COPY (SELECT "Team", AVG("Salary") FROM nba GROUP BY 1) TO WRAPPER csv_fdw OPTIONS (LOCATION = 'hdfs://pp_namenode:8020/nba_export.csv');

3.2.2.7.23.17 Export table to a parquet file

    COPY nba TO WRAPPER parquet_fdw OPTIONS (LOCATION = '/tmp/nba_export.parquet');

3.2.2.7.23.18 Export a query to a parquet file

    COPY (select x,y from t where z=0) TO WRAPPER parquet_fdw OPTIONS (LOCATION = '/tmp/file.parquet');

3.2.2.7.23.19 Export table to an ORC file

    COPY nba TO WRAPPER orc_fdw OPTIONS (LOCATION = '/tmp/nba_export.orc');

3.2.2.7.24 DELETE

DELETE removes specific rows from a table.

See more information about how SQream DB deletes data in the Deleting Data guide.

Tip:
• To delete all rows from a table, see TRUNCATE
• To delete columns, see DROP COLUMN

3.2.2.7.24.1 Permissions

The role must have the DELETE and SELECT permissions at the table level.
3.2.2.7.24.2 Syntax

    delete_table_statement ::=
        DELETE FROM [schema_name.]table_name [ WHERE value_expr ]
        ;

    chunk_cleanup_statement ::=
        SELECT CLEANUP_CHUNKS ( 'schema_name', 'table_name' )
        ;

    extent_cleanup_statement ::=
        SELECT CLEANUP_EXTENTS ( 'schema_name', 'table_name' )
        ;

    table_name ::= identifier

    schema_name ::= identifier

3.2.2.7.24.3 Parameters

Parameter      Description
schema_name    The name of the schema for the table.
table_name     The name of the table to delete rows from.
value_expr     An expression that returns Boolean values using columns, such as

3.2.2.7.24.4 How SQream DB deletes data

Deleting data in SQream DB is a two-step process. First, SQream DB marks rows as deleted, but they remain on-disk until a cleanup process is initiated.

See more information about how SQream DB deletes data in the Deleting Data guide.

3.2.2.7.24.5 Notes

• ALTER TABLE and other DDL operations are blocked on tables that require clean-up.
• The value expression for deletion can’t be the result of a subquery or a join.
• SQream DB may prevent a very long delete process. If the estimated time is beyond the threshold, the error message will explain how to override this limitation and continue the process.

3.2.2.7.24.6 Examples

3.2.2.7.24.7 Deleting values from a table

    farm=> SELECT * FROM cool_animals;
    1,Dog ,7
    2,Possum ,3
    3,Cat ,5
    4,Elephant ,6500
    5,Rhinoceros ,2100
    6,\N,\N
    6 rows

    farm=> DELETE FROM cool_animals WHERE weight > 1000;
    executed

    farm=> SELECT * FROM cool_animals;
    1,Dog ,7
    2,Possum ,3
    3,Cat ,5
    6,\N,\N
    4 rows

3.2.2.7.24.8 Deleting values based on more complex predicates

    farm=> SELECT * FROM cool_animals;
    1,Dog ,7
    2,Possum ,3
    3,Cat ,5
    4,Elephant ,6500
    5,Rhinoceros ,2100
    6,\N,\N
    6 rows

    farm=> DELETE FROM cool_animals WHERE weight > 1000;
    executed
    farm=> SELECT * FROM cool_animals;
    1,Dog ,7
    2,Possum ,3
    3,Cat ,5
    6,\N,\N
    4 rows

3.2.2.7.24.9 Identifying and cleaning up tables

3.2.2.7.24.10 List tables that haven’t been cleaned up

    farm=> SELECT t.table_name FROM sqream_catalog.delete_predicates dp
           JOIN sqream_catalog.tables t
           ON dp.table_id = t.table_id
           GROUP BY 1;
    cool_animals
    1 row

3.2.2.7.24.11 Identify predicates for clean-up

    farm=> SELECT delete_predicate FROM sqream_catalog.delete_predicates dp
           JOIN sqream_catalog.tables t
           ON dp.table_id = t.table_id
           WHERE t.table_name = 'cool_animals';
    weight > 1000
    1 row

3.2.2.7.24.12 Triggering a cleanup

    -- Chunk reorganization (SWEEP)
    farm=> SELECT CLEANUP_CHUNKS('public','cool_animals');
    executed

    -- Delete leftover files (VACUUM)
    farm=> SELECT CLEANUP_EXTENTS('public','cool_animals');
    executed

    farm=> SELECT delete_predicate FROM sqream_catalog.delete_predicates dp
           JOIN sqream_catalog.tables t
           ON dp.table_id = t.table_id
           WHERE t.table_name = 'cool_animals';
    0 rows

3.2.2.7.25 INSERT

INSERT inserts one or more rows into a table.

Tip:
• To bulk load data into existing tables, the COPY FROM command performs better than INSERT.
• To load Parquet or ORC files, see CREATE FOREIGN TABLE

3.2.2.7.25.1 Permissions

The role must have the INSERT permission to the destination table.

3.2.2.7.25.2 Syntax

    insert_statement ::=
        INSERT INTO [schema_name.]table_name
        [ ( column_name [, ... ] ) ]
        query
        ;

    schema_name ::= identifier

    table_name ::= identifier

    column_name ::= identifier

3.2.2.7.25.3 Elements

Parameter                   Description
[schema_name.]table_name    Table to insert data into
( column_name [, ...] )     A comma separated list of column names that specifies the destination columns of the insert.
query                       A SELECT or VALUES statement. Each value must match the data type of its destination column.
The values must also match the order of the table or columns if specified with ‘ (column_name, …)‘. 3.2. SQL Statements and Syntax 455 SQream DB, Release 2021.1 3.2.2.7.25.4 Examples 3.2.2.7.25.5 Inserting a single row INSERT INTO cool_animals VALUES (5, 'fox', 15); 3.2.2.7.25.6 Inserting into rows with default values The id column is an IDENTITY column, and has a default, so it can be omitted. INSERT INTO cool_animals(name, weight) VALUES ('fox', 15); 3.2.2.7.25.7 Changing column order INSERT INTO cool_animals(name, weight, id) VALUES ('possum', 7, 6); 3.2.2.7.25.8 Inserting multiple rows INSERT INTO cool_animals(name, weight) VALUES ('koala', 20), ('lemur', 6), ('kiwi', 3); 3.2.2.7.25.9 Import data from other tables INSERT can be used to insert data obtained from queries on other tables, including foreign tables. For example, farm=> SELECT name, weight FROM all_animals . WHERE region = 'Australia'; name | weight ------+------Kangaroo | 120 Koala | 20 Wombat | 60 Platypus | 5 Wallaby | 35 Echidna | 8 Dingo | 25 INSERT INTO cool_animals(name,weight) SELECT name, weight FROM all_animals WHERE region = 'Australia'; 456 Chapter 3. Reference SQream DB, Release 2021.1 3.2.2.7.25.10 Inserting data with positional placeholders When preparing an INSERT statement for loading data over the network (for example, from a Python or Java application, use positional placeholders. Example using Python: data = [["Kangaroo", 120], ["Koala", 20], ["Platypus", 5]] data_len = len(data) insert_stmt = 'INSERT INTO cool_animals (name, weight) VALUES (?, ?)' con.executemany(insert_stmt, data) Note: The executemany method is used only for parametrized statements like INSERT. Running multiple SELECT queries or other statements this way is not supported. 3.2.2.7.26 SELECT SELECT is the main statement that allows reading and processing of data. It is used to retrieve rows and columns from one or more tables. 
When used alone, the statement is known as a "SELECT statement" or "SELECT query", but it can also be combined to create more elaborate statements like CREATE TABLE AS.

Note: The SELECT keyword is also used to execute utility functions, such as SELECT get_ddl();. These are covered in their respective locations.

In this topic:
• Permissions
• Syntax
• Elements
• Notes
  – Query processing
  – Select lists
• Examples
  – Simple queries
  – Show a count of the rows
  – Get all columns
  – Filter on conditions
  – Filter based on a list
  – Select only distinct rows
  – Count distinct values
  – Rename columns with aliases
  – Searching with LIKE
  – Aggregate functions
  – Filtering on aggregates
  – Sorting results
  – Sorting with multiple columns
  – Combining two or more queries
  – Common table expressions (CTE)
    ∗ Nested CTEs
    ∗ Reusing CTEs

3.2.2.7.26.1 Permissions

The role must have the SELECT permission on every table or schema that is referenced by the SELECT query.

3.2.2.7.26.2 Syntax

    query ::=
        query_term [ UNION ALL query_term [ ...] ]
        [ ORDER BY order [, ... ] ]
        [ LIMIT num_rows ]
        ;

    query_term ::=
        [ WITH table_alias AS (query_term) [, ...] ]
        SELECT
            [ TOP num_rows ]
            [ DISTINCT ]
            select_list
            [ FROM table_ref [, ... ]
                [ WHERE value_expr ]
                [ GROUP BY value_expr [, ... ]
                    [ HAVING value_expr ]
                ]
            ]
        | ( VALUES ( value_expr [, ... ] ) [, ... ] )

    select_list ::=
        value_expr [ [ AS ] column_alias ] [, ... ]
        | window_expr [ [ AS ] column_alias ]

    column_alias ::= identifier

    table_alias ::= identifier

    window_expr ::=
        window_fn ( [ value_expr [, ...] ] )
            OVER ( [ PARTITION BY value_expr [, ...] ]
                   [ ORDER BY order [ ASC | DESC ] [, ...] ] )

    table_ref ::=
        [schema_name.]table_name [ [ AS ] alias [ ( column_alias [, ... ] ) ] ]
        | ( query ) [ [ AS ] alias [ ( column_alias [, ... ] ) ] ]
        | table_ref join_type table_ref [ ON value_expr ]
        | table_alias

    schema_name ::= identifier

    table_name ::= identifier

    alias ::= identifier

    join_type ::=
        [ INNER ] [ join_hint ] JOIN
        | LEFT [ OUTER ] [ join_hint ] JOIN
        | RIGHT [ OUTER ] [ join_hint ] JOIN
        | CROSS [ join_hint ] JOIN

    join_hint ::= MERGE | LOOP

    order ::= value_expr [ ASC | DESC ] [, ...] [ NULLS FIRST | LAST ]

3.2.2.7.26.3 Elements

    Parameter      | Description
    ---------------+--------------------------------------------------------------
    DISTINCT       | Remove duplicates.
    FROM table_ref | A table name or another sub-clause that creates a table,
                   | such as VALUES or a subquery.
    WHERE          | An expression that returns Boolean values using columns,
                   | for example "Age" < 24.

3.2.2.7.26.4 Notes

3.2.2.7.26.5 Query processing

Queries are processed in a manner equivalent to the following order:

1. FROM, including nested queries in the FROM
2. WHERE
3. Row-level value functions in the SELECT list, including those that appear inside aggregate and window function calls
4. GROUP BY and aggregates
5. HAVING
6. Window functions
7. Row-level value functions in the SELECT list that are applied outside aggregates and window functions
8. DISTINCT
9. UNION ALL
10. ORDER BY
11. LIMIT / TOP

Inside the FROM clause, the processing occurs in the usual way, from the outside in.

3.2.2.7.26.6 Select lists

The select_list is a comma separated list of column names and value expressions.

• Use LIMIT num_rows to retrieve only the first num_rows results. SQream DB also supports the TOP num_rows syntax from SQL Server.
• DISTINCT can be used to remove duplicate rows.
• Value expressions in select lists support aggregate and window functions as well as normal value expressions.

Tip: Each expression in the select list is given an ordinal number, from 1 to the number of expressions. In ORDER BY and GROUP BY, these ordinals can be used as shorthand to refer to the expressions.
    SELECT a, SUM(b) FROM t GROUP BY a ORDER BY SUM(b) DESC;
    -- is equivalent to:
    SELECT a, SUM(b) FROM t GROUP BY 1 ORDER BY 2 DESC;

3.2.2.7.26.7 Examples

Assume a table named nba, with the following structure:

    CREATE TABLE nba
    (
       Name varchar(40),
       Team varchar(40),
       Number tinyint,
       Position varchar(2),
       Age tinyint,
       Height varchar(4),
       Weight real,
       College varchar(40),
       Salary float
    );

Here's a peek at the table contents (nba.csv):

Table 15: nba.csv

    Name          | Team           | Number | Position | Age  | Height | Weight | College           | Salary
    --------------+----------------+--------+----------+------+--------+--------+-------------------+-----------
    Avery Bradley | Boston Celtics | 0.0    | PG       | 25.0 | 6-2    | 180.0  | Texas             | 7730337.0
    Jae Crowder   | Boston Celtics | 99.0   | SF       | 25.0 | 6-6    | 235.0  | Marquette         | 6796117.0
    John Holland  | Boston Celtics | 30.0   | SG       | 27.0 | 6-5    | 205.0  | Boston University |
    R.J. Hunter   | Boston Celtics | 28.0   | SG       | 22.0 | 6-5    | 185.0  | Georgia State     | 1148640.0
    Jonas Jerebko | Boston Celtics | 8.0    | PF       | 29.0 | 6-10   | 231.0  |                   | 5000000.0
    Amir Johnson  | Boston Celtics | 90.0   | PF       | 29.0 | 6-9    | 240.0  |                   | 12000000.0
    Jordan Mickey | Boston Celtics | 55.0   | PF       | 21.0 | 6-8    | 235.0  | LSU               | 1170960.0
    Kelly Olynyk  | Boston Celtics | 41.0   | C        | 25.0 | 7-0    | 238.0  | Gonzaga           | 2165160.0
    Terry Rozier  | Boston Celtics | 12.0   | PG       | 22.0 | 6-2    | 190.0  | Louisville        | 1824360.0

3.2.2.7.26.8 Simple queries

This query gets the Name, Team, and Age columns from the nba table, but only shows the first 10 results.

    nba=> SELECT Name, Team, Age FROM nba LIMIT 10;
    Avery Bradley,Boston Celtics,25
    Jae Crowder,Boston Celtics,25
    John Holland,Boston Celtics,27
    R.J. Hunter,Boston Celtics,22
    Jonas Jerebko,Boston Celtics,29
    Amir Johnson,Boston Celtics,29
    Jordan Mickey,Boston Celtics,21
    Kelly Olynyk,Boston Celtics,25
    Terry Rozier,Boston Celtics,22
    Marcus Smart,Boston Celtics,22

3.2.2.7.26.9 Show a count of the rows

Use COUNT(*) to retrieve the number of rows in a result.

    nba=> SELECT COUNT(*) FROM nba;
    457

3.2.2.7.26.10 Get all columns

* is used as shorthand for "all columns".
Warning: Running a SELECT * query on a very large table can occupy the client for a long time if the result set is big.

    nba=> SELECT * FROM nba;
    Name          | Team           | Number | Position | Age | Height | Weight | College           | Salary
    --------------+----------------+--------+----------+-----+--------+--------+-------------------+---------
    Avery Bradley | Boston Celtics | 0      | PG       | 25  | 6-2    | 180    | Texas             | 7730337
    Jae Crowder   | Boston Celtics | 99     | SF       | 25  | 6-6    | 235    | Marquette         | 6796117
    John Holland  | Boston Celtics | 30     | SG       | 27  | 6-5    | 205    | Boston University |
    R.J. Hunter   | Boston Celtics | 28     | SG       | 22  | 6-5    | 185    | Georgia State     | 1148640
    Jonas Jerebko | Boston Celtics | 8      | PF       | 29  | 6-10   | 231    |                   | 5000000
    Amir Johnson  | Boston Celtics | 90     | PF       | 29  | 6-9    | 240    |                   | 12000000
    Jordan Mickey | Boston Celtics | 55     | PF       | 21  | 6-8    | 235    | LSU               | 1170960
    Kelly Olynyk  | Boston Celtics | 41     | C        | 25  | 7-0    | 238    | Gonzaga           | 2165160
    Terry Rozier  | Boston Celtics | 12     | PG       | 22  | 6-2    | 190    | Louisville        | 1824360
    Marcus Smart  | Boston Celtics | 36     | PG       | 22  | 6-4    | 220    | Oklahoma State    | 3431040

3.2.2.7.26.11 Filter on conditions

Use the WHERE clause to filter results.

    nba=> SELECT "Name","Age","Salary" FROM nba WHERE "Age" < 24 LIMIT 5;
    R.J. Hunter,22,1148640
    Jordan Mickey,21,1170960
    Terry Rozier,22,1824360
    Marcus Smart,22,3431040
    James Young,20,1749840

    nba=> SELECT "Name","Age","Salary" FROM nba WHERE "Age" < 24 AND "Salary" > 1800000 LIMIT 5;
    Terry Rozier,22,1824360
    Marcus Smart,22,3431040
    Kristaps Porzingis,20,4131720
    Joel Embiid,22,4626960
    Nerlens Noel,22,3457800

3.2.2.7.26.12 Filter based on a list

WHERE column IN (value_expr in comma separated list) matches the column with any value in the list.
    nba=> SELECT "Name","Age","Salary","Team" FROM nba WHERE "Team" IN ('Utah Jazz', 'Portland Trail Blazers');
    Cliff Alexander,20,525093,Portland Trail Blazers
    Al-Farouq Aminu,25,8042895,Portland Trail Blazers
    Pat Connaughton,23,625093,Portland Trail Blazers
    [...]
    Shelvin Mack,26,2433333,Utah Jazz
    Raul Neto,24,900000,Utah Jazz
    Tibor Pleiss,26,2900000,Utah Jazz
    Jeff Withey,26,947276,Utah Jazz

3.2.2.7.26.13 Select only distinct rows

    nba=> SELECT DISTINCT "Team" FROM nba;
    Atlanta Hawks
    Boston Celtics
    Brooklyn Nets
    Charlotte Hornets
    Chicago Bulls
    Cleveland Cavaliers
    Dallas Mavericks
    Denver Nuggets
    Detroit Pistons
    Golden State Warriors
    Houston Rockets
    Indiana Pacers
    Los Angeles Clippers
    Los Angeles Lakers
    Memphis Grizzlies
    Miami Heat
    Milwaukee Bucks
    Minnesota Timberwolves
    New Orleans Pelicans
    New York Knicks
    Oklahoma City Thunder
    Orlando Magic
    Philadelphia 76ers
    Phoenix Suns
    Portland Trail Blazers
    Sacramento Kings
    San Antonio Spurs
    Toronto Raptors
    Utah Jazz
    Washington Wizards

3.2.2.7.26.14 Count distinct values

    nba=> SELECT COUNT(DISTINCT "Team") FROM nba;
    30

3.2.2.7.26.15 Rename columns with aliases

    nba=> SELECT "Name" AS "Player", -- Note usage of AS
    .     "Team",
    .     "Salary" "Yearly salary" -- AS is optional.
    .     -- This is identical to "Salary" AS "Yearly salary"
    .     FROM nba LIMIT 5;
    Player        | Team           | Yearly salary
    --------------+----------------+--------------
    Avery Bradley | Boston Celtics | 7730337
    Jae Crowder   | Boston Celtics | 6796117
    John Holland  | Boston Celtics |
    R.J. Hunter   | Boston Celtics | 1148640
    Jonas Jerebko | Boston Celtics | 5000000

3.2.2.7.26.16 Searching with LIKE

LIKE allows pattern matching text in the WHERE clause.
• % matches 0 or more characters
• _ matches exactly 1 character

    nba=> SELECT "Name","Age","Salary","Team" FROM nba WHERE "Team" LIKE 'Portland%' LIMIT 5;
    Cliff Alexander,20,525093,Portland Trail Blazers
    Al-Farouq Aminu,25,8042895,Portland Trail Blazers
    Pat Connaughton,23,625093,Portland Trail Blazers
    Allen Crabbe,24,947276,Portland Trail Blazers
    Ed Davis,27,6980802,Portland Trail Blazers

3.2.2.7.26.17 Aggregate functions

Aggregate functions compute a single result from a column.

Tip: Aggregate functions can return NULL if no rows are selected or all input values are NULL. The notable exception to this rule is COUNT, which always returns an integer. Use COALESCE to substitute zero or another value for NULL when necessary.

    nba=> SELECT max("Salary") FROM nba;
    25000000

Aggregate functions are often combined with GROUP BY.

    nba=> SELECT "Team",max("Salary") FROM nba GROUP BY "Team";
    Atlanta Hawks,18671659
    Boston Celtics,12000000
    Brooklyn Nets,19689000
    Charlotte Hornets,13500000
    [...]
    Utah Jazz,15409570
    Washington Wizards,15851950

Note: Unlike some other databases, when using an aggregate function, all other items in the select list must either be aggregated or be specified in a GROUP BY. A query like SELECT "Team",max("Salary") FROM nba is not valid, and will result in an error.

3.2.2.7.26.18 Filtering on aggregates

Filtering on aggregates is done with the HAVING clause, rather than the WHERE clause.

    nba=> SELECT "Team",AVG("Salary") FROM nba GROUP BY "Team" HAVING AVG("Salary") BETWEEN 4477884 AND 5018868;
    Atlanta Hawks,4860196
    Dallas Mavericks,4746582
    Detroit Pistons,4477884
    Houston Rockets,5018868
    Los Angeles Lakers,4784695
    Minnesota Timberwolves,4593053
    New York Knicks,4581493
    Sacramento Kings,4778911
    Toronto Raptors,4741174

3.2.2.7.26.19 Sorting results

ORDER BY takes a comma separated list of ordering specifications: a column followed by ASC for ascending or DESC for descending.

Note: When ORDER BY is not specified in a query, rows are returned based on the order in which they were read, not by any consistent criteria. Unlike some databases, NULL values are neither first nor last, but can appear anywhere in the result set.

Tip: SQream DB does not support functions and complex arguments in the ORDER BY clause. To work around this limitation, use ordinals or aliases, as in the examples below, which are functionally identical.

    nba=> SELECT "Team",AVG("Salary") as "Average Salary" FROM nba GROUP BY "Team" ORDER BY "Average Salary" DESC;
    Team                   | Average Salary
    -----------------------+---------------
    Cleveland Cavaliers    | 7642049
    Miami Heat             | 6347359
    Los Angeles Clippers   | 6323642
    Oklahoma City Thunder  | 6251019
    [...]
    Brooklyn Nets          | 3501898
    Portland Trail Blazers | 3220121
    Philadelphia 76ers     | 2213778

    nba=> SELECT "Team",AVG("Salary") as "Average Salary" FROM nba GROUP BY "Team" ORDER BY 2 DESC;
    Team                   | Average Salary
    -----------------------+---------------
    Cleveland Cavaliers    | 7642049
    Miami Heat             | 6347359
    Los Angeles Clippers   | 6323642
    Oklahoma City Thunder  | 6251019
    [...]
    Brooklyn Nets          | 3501898
    Portland Trail Blazers | 3220121
    Philadelphia 76ers     | 2213778
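The equivalence between ordering by an alias and ordering by an ordinal can be checked with a small script. The following is only a sketch: it assumes Python's built-in sqlite3 module as a stand-in engine (it is not SQream DB, and SQream DB behavior may differ in other respects), and it uses a cut-down two-column nba table with four salary values taken from the sample output in this reference.

```python
import sqlite3

# Assumption: sqlite3 is used purely as a convenient local stand-in;
# it is not SQream DB and is not part of the SQream DB tooling.
con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE nba ("Team" TEXT, "Salary" REAL)')
con.executemany(
    "INSERT INTO nba VALUES (?, ?)",
    [
        ("Boston Celtics", 7730337.0),
        ("Boston Celtics", 6796117.0),
        ("Utah Jazz", 2433333.0),
        ("Utah Jazz", 900000.0),
    ],
)

# Order by the column alias.
by_alias = con.execute(
    'SELECT "Team", AVG("Salary") AS "Average Salary" FROM nba '
    'GROUP BY "Team" ORDER BY "Average Salary" DESC'
).fetchall()

# Order by the ordinal position of the same select-list expression.
by_ordinal = con.execute(
    'SELECT "Team", AVG("Salary") AS "Average Salary" FROM nba '
    'GROUP BY "Team" ORDER BY 2 DESC'
).fetchall()

for team, average in by_alias:
    print(team, average)
```

Both result lists hold the same rows in the same order, which is what "functionally identical" means for the two forms.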
3.2.2.7.26.20 Sorting with multiple columns

Order retrieved rows by multiple columns:

    nba=> SELECT "Name", "Position", "Weight", "Salary" FROM nba ORDER BY "Weight" DESC, "Salary" ASC;
    Name             | Position | Weight | Salary
    -----------------+----------+--------+---------
    Nikola Pekovic   | C        | 307    | 12100000
    Boban Marjanovic | C        | 290    | 1200000
    Al Jefferson     | C        | 289    | 13500000
    [...]
    Tim Frazier      | PG       | 170    | 845059
    Brandon Jennings | PG       | 169    | 8344497
    Briante Weber    | PG       | 165    |
    Bryce Cotton     | PG       | 165    | 700902
    Aaron Brooks     | PG       | 161    | 2250000

3.2.2.7.26.21 Combining two or more queries

UNION ALL can be used to combine the results of two or more queries into one result set. UNION ALL does not remove duplicate results.

    nba=> SELECT "Position" FROM nba WHERE "Weight" > 300
    .     UNION ALL SELECT "Position" FROM nba WHERE "Weight" < 170;
    C
    PG
    PG
    PG
    PG

3.2.2.7.26.22 Common table expressions (CTE)

Common table expressions (CTEs) let a possibly complex subquery be given a short name and referenced later in the query, for improved readability. Using a CTE does not affect query performance.

    nba=> WITH s AS (SELECT "Name" FROM nba WHERE "Salary" > 20000000)
    .     SELECT * FROM nba AS n, s WHERE n."Name" = s."Name";
    Name            | Team                  | Number | Position | Age | Height | Weight | College      | Salary   | name0
    ----------------+-----------------------+--------+----------+-----+--------+--------+--------------+----------+----------------
    Carmelo Anthony | New York Knicks       | 7      | SF       | 32  | 6-8    | 240    | Syracuse     | 22875000 | Carmelo Anthony
    Chris Bosh      | Miami Heat            | 1      | PF       | 32  | 6-11   | 235    | Georgia Tech | 22192730 | Chris Bosh
    Chris Paul      | Los Angeles Clippers  | 3      | PG       | 31  | 6-0    | 175    | Wake Forest  | 21468695 | Chris Paul
    Derrick Rose    | Chicago Bulls         | 1      | PG       | 27  | 6-3    | 190    | Memphis      | 20093064 | Derrick Rose
    Dwight Howard   | Houston Rockets       | 12     | C        | 30  | 6-11   | 265    |              | 22359364 | Dwight Howard
    Kevin Durant    | Oklahoma City Thunder | 35     | SF       | 27  | 6-9    | 240    | Texas        | 20158622 | Kevin Durant
    Kobe Bryant     | Los Angeles Lakers    | 24     | SF       | 37  | 6-6    | 212    |              | 25000000 | Kobe Bryant
    LeBron James    | Cleveland Cavaliers   | 23     | SF       | 31  | 6-8    | 250    |              | 22970500 | LeBron James

In this example, the WITH clause defines the temporary name s for the subquery that finds players with salaries over $20 million. The result set becomes a valid table reference in any table expression of the subsequent SELECT clause.

3.2.2.7.26.23 Nested CTEs

SQream DB also supports any number of nested CTEs, such as this:

    WITH w AS
        (SELECT * FROM
            (WITH x AS (SELECT * FROM nba) SELECT * FROM x ORDER BY "Salary" DESC))
    SELECT * FROM w ORDER BY "Weight" DESC;

3.2.2.7.26.24 Reusing CTEs

SQream DB supports reusing CTEs several times in a query:

    nba=> WITH
    .     nba_ct AS (SELECT "Name", "Team" FROM nba WHERE "College"='Connecticut'),
    .     nba_az AS (SELECT "Name", "Team" FROM nba WHERE "College"='Arizona')
    .     SELECT * FROM nba_az JOIN nba_ct ON nba_ct."Team" = nba_az."Team";
    Name            | Team            | name0          | team0
    ----------------+-----------------+----------------+----------------
    Stanley Johnson | Detroit Pistons | Andre Drummond | Detroit Pistons
    Aaron Gordon    | Orlando Magic   | Shabazz Napier | Orlando Magic

3.2.2.7.27 TRUNCATE

TRUNCATE removes all rows from a table. It is functionally identical to running a DELETE statement without a WHERE clause.

3.2.2.7.27.1 Permissions

The role must have the DELETE permission at the table level.
3.2.2.7.27.2 Syntax

    truncate_table_statement ::=
        TRUNCATE [ TABLE ] [schema_name.]table_name
            [ RESTART IDENTITY | CONTINUE IDENTITY ]
        ;

    table_name ::= identifier

    schema_name ::= identifier

3.2.2.7.27.3 Parameters

    Parameter         | Description
    ------------------+-------------------------------------------------------------------------
    schema_name       | The name of the schema for the table.
    table_name        | The name of the table to truncate.
    RESTART IDENTITY  | Automatically restart sequences owned by columns of the truncated table.
    CONTINUE IDENTITY | Do not change the values of sequences. This is the default.

3.2.2.7.27.4 Examples

3.2.2.7.27.5 Truncating a table

    farm=> SELECT * FROM cool_animals;
    1,Dog ,7
    2,Possum ,3
    3,Cat ,5
    4,Elephant ,6500
    5,Rhinoceros ,2100
    6,\N,\N

    6 rows

    farm=> TRUNCATE TABLE cool_animals;
    executed

    farm=> SELECT * FROM cool_animals;

    0 rows

3.2.2.7.28 VALUES

VALUES is a table constructor: a clause that can be used to define tabular data.

Tip:
• Use VALUES in conjunction with INSERT statements to insert a set of one or more rows.

3.2.2.7.28.1 Permissions

This clause requires no special permissions.

3.2.2.7.28.2 Syntax

    values_expr ::=
        VALUES ( value_expr [, ... ] ) [, ... ]
        ;

3.2.2.7.28.3 Notes

Each set of comma-separated value_expr in parentheses represents a single row in the result set.

Column names of the result table are auto-generated. To rename the columns, add an AS clause.
3.2.2.7.28.4 Examples

3.2.2.7.28.5 Tabular data with VALUES

    master=> VALUES (1,2,3,4), (5,6,7,8), (9,10,11,12);
    1,2,3,4
    5,6,7,8
    9,10,11,12

    3 rows

3.2.2.7.28.6 Using VALUES with a SELECT query

To use VALUES in a select query, assign a name to the VALUES clause with AS.

    master=> SELECT t.* FROM (VALUES (1,2,3,'a'), (5,6,7,'b'), (9,10,11,'c')) AS t;
    1,2,3,a
    5,6,7,b
    9,10,11,c

    3 rows

The same AS clause can also rename the columns:

    SELECT t.* FROM (VALUES (1,2,3,'a'), (5,6,7,'b'), (9,10,11,'c')) AS t(a,b,c,d);

3.2.2.7.28.7 Creating a table with VALUES

Use AS to assign names to columns:

    CREATE TABLE cool_animals AS
       (SELECT t.*
        FROM (VALUES (1, 'dog'),
                     (2, 'cat'),
                     (3, 'horse'),
                     (4, 'hippopotamus')
             ) AS t(id, name)
       );

3.2.2.7.29 DROP_SAVED_QUERY

DROP_SAVED_QUERY drops a previously saved query.

Read more in the Saved queries guide.

See also: SAVE_QUERY, EXECUTE_SAVED_QUERY, SHOW_SAVED_QUERY, LIST_SAVED_QUERIES.

3.2.2.7.29.1 Permissions

Dropping a saved query requires no special permissions.

3.2.2.7.29.2 Syntax

    drop_saved_query_statement ::=
        SELECT DROP_SAVED_QUERY(saved_query_name)
        ;

    saved_query_name ::= string_literal

3.2.2.7.29.3 Returns

If the saved query is dropped successfully, returns nothing.

3.2.2.7.29.4 Parameters

    Parameter        | Description
    -----------------+------------------------------
    saved_query_name | The name of the query to drop

3.2.2.7.29.5 Examples

3.2.2.7.29.6 Dropping a previously saved query

    t=> SELECT DROP_SAVED_QUERY('select_all');
    executed

3.2.2.7.30 DUMP_DATABASE_DDL

DUMP_DATABASE_DDL() is a function that shows the CREATE statements for database objects including views and tables.

Beginning with 2020.3.1, DUMP_DATABASE_DDL includes foreign tables in the output.

Warning: This function does not currently show UDFs. To list available UDFs, use the catalog:

    farm=> SELECT * FROM sqream_catalog.user_defined_functions;
    farm,1,my_distance

Then, export UDFs one-by-one using GET_FUNCTION_DDL.
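The one-by-one UDF export can be scripted. The sketch below only builds the SQL text, one COPY statement per function, in the same COPY (SELECT GET_FUNCTION_DDL(...)) TO ... form used in the GET_FUNCTION_DDL examples of this reference. The helper name build_udf_export_statements and the target directory are hypothetical, and the function names are assumed to have been read from sqream_catalog.user_defined_functions beforehand; running the statements against a database is left to the client of your choice.

```python
def build_udf_export_statements(function_names, target_dir):
    """Return one COPY statement per UDF, each writing that UDF's DDL to its own file.

    Hypothetical helper for illustration: it generates SQL text only and
    does not connect to a database.
    """
    statements = []
    for name in function_names:
        # Accept only plain identifiers (letters, digits, underscores),
        # because the name is spliced into the SQL text and a file path.
        if not name.replace("_", "").isalnum():
            raise ValueError(f"unexpected function name: {name!r}")
        statements.append(
            f"COPY (SELECT GET_FUNCTION_DDL('{name}')) TO '{target_dir}/{name}.sql';"
        )
    return statements


# Example: the single UDF listed in the catalog output above.
statements = build_udf_export_statements(["my_distance"], "/home/rhendricks")
for statement in statements:
    print(statement)
```

The identifier check is deliberately strict; relax it only if your UDF names are known to contain other characters.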
Tip:
• For just tables, see GET_DDL.
• For just views, see GET_VIEW_DDL.
• For UDFs, see GET_FUNCTION_DDL.

3.2.2.7.30.1 Permissions

The role must have the CONNECT permission at the database level.

3.2.2.7.30.2 Syntax

    dump_database_ddl_statement ::=
        SELECT DUMP_DATABASE_DDL()
        ;

3.2.2.7.30.3 Parameters

This function accepts no parameters.

3.2.2.7.30.4 Examples

3.2.2.7.30.5 Getting the DDL for a database

    farm=> SELECT DUMP_DATABASE_DDL();
    create table "public"."cool_animals" (
      "id" int not null,
      "name" varchar(30) not null,
      "weight" double null,
      "is_agressive" bool default false not null
    )
    ;

    create view "public".angry_animals as
      select
        "cool_animals"."id" as "id",
        "cool_animals"."name" as "name",
        "cool_animals"."weight" as "weight",
        "cool_animals"."is_agressive" as "is_agressive"
      from
        "public".cool_animals as cool_animals
      where
        "cool_animals"."is_agressive" = false;

3.2.2.7.30.6 Exporting database DDL to a file

    COPY (SELECT DUMP_DATABASE_DDL()) TO '/home/rhendricks/database.ddl';

3.2.2.7.31 EXECUTE_SAVED_QUERY

EXECUTE_SAVED_QUERY executes a previously saved query.

Read more in the Saved queries guide.

See also: SAVE_QUERY, DROP_SAVED_QUERY, SHOW_SAVED_QUERY, LIST_SAVED_QUERIES.

3.2.2.7.31.1 Permissions

Executing a saved query requires SELECT permissions to access the tables referenced in the query.

3.2.2.7.31.2 Syntax

    execute_saved_query_statement ::=
        SELECT EXECUTE_SAVED_QUERY(saved_query_name [ , argument [ , ... ] ])
        ;

    saved_query_name ::= string_literal

    argument ::= string_literal | number_literal

3.2.2.7.31.3 Returns

Query execution results, based on the query saved.
3.2.2.7.31.4 Parameters

    Parameter        | Description
    -----------------+--------------------------------------------------
    saved_query_name | The name of the query to execute
    argument         | A comma separated list of argument literal values

3.2.2.7.31.5 Notes

• Query parameters can be used as substitutes for literal expressions. Parameters cannot be used to substitute identifiers, column names, table names, or other parts of the query.
• Query parameters of a string datatype (like VARCHAR) must be of a fixed length, and can be used in equality checks, but not in patterns (e.g. LIKE, RLIKE, etc.).
• Query parameters' types are inferred at compile time.

3.2.2.7.31.6 Examples

Assume a table named nba, with the following structure:

    CREATE TABLE nba
    (
       Name varchar(40),
       Team varchar(40),
       Number tinyint,
       Position varchar(2),
       Age tinyint,
       Height varchar(4),
       Weight real,
       College varchar(40),
       Salary float
    );

Here's a peek at the table contents (nba.csv):

Table 16: nba.csv

    Name          | Team           | Number | Position | Age  | Height | Weight | College           | Salary
    --------------+----------------+--------+----------+------+--------+--------+-------------------+-----------
    Avery Bradley | Boston Celtics | 0.0    | PG       | 25.0 | 6-2    | 180.0  | Texas             | 7730337.0
    Jae Crowder   | Boston Celtics | 99.0   | SF       | 25.0 | 6-6    | 235.0  | Marquette         | 6796117.0
    John Holland  | Boston Celtics | 30.0   | SG       | 27.0 | 6-5    | 205.0  | Boston University |
    R.J. Hunter   | Boston Celtics | 28.0   | SG       | 22.0 | 6-5    | 185.0  | Georgia State     | 1148640.0
    Jonas Jerebko | Boston Celtics | 8.0    | PF       | 29.0 | 6-10   | 231.0  |                   | 5000000.0
    Amir Johnson  | Boston Celtics | 90.0   | PF       | 29.0 | 6-9    | 240.0  |                   | 12000000.0
    Jordan Mickey | Boston Celtics | 55.0   | PF       | 21.0 | 6-8    | 235.0  | LSU               | 1170960.0
    Kelly Olynyk  | Boston Celtics | 41.0   | C        | 25.0 | 7-0    | 238.0  | Gonzaga           | 2165160.0
    Terry Rozier  | Boston Celtics | 12.0   | PG       | 22.0 | 6-2    | 190.0  | Louisville        | 1824360.0

3.2.2.7.31.7 Saving and executing a simple query

    t=> SELECT SAVE_QUERY('select_all','SELECT * FROM nba');
    executed

    t=> SELECT EXECUTE_SAVED_QUERY('select_all');
    Name          | Team           | Number | Position | Age | Height | Weight | College           | Salary
    --------------+----------------+--------+----------+-----+--------+--------+-------------------+---------
    Avery Bradley | Boston Celtics | 0      | PG       | 25  | 6-2    | 180    | Texas             | 7730337
    Jae Crowder   | Boston Celtics | 99     | SF       | 25  | 6-6    | 235    | Marquette         | 6796117
    John Holland  | Boston Celtics | 30     | SG       | 27  | 6-5    | 205    | Boston University |
    R.J. Hunter   | Boston Celtics | 28     | SG       | 22  | 6-5    | 185    | Georgia State     | 1148640
    [...]

3.2.2.7.31.8 Saving and executing parametrized query

Use ? placeholders in the saved query text, and supply their values later at execution time.

Tip: Use dollar quoting ($$) to avoid escaping strings.

    t=> SELECT SAVE_QUERY('select_by_weight_and_team',$$SELECT * FROM nba WHERE Weight > ? AND Team = ?$$);
    executed

    t=> SELECT EXECUTE_SAVED_QUERY('select_by_weight_and_team', 240, 'Toronto Raptors');
    Name              | Team            | Number | Position | Age | Height | Weight | College     | Salary
    ------------------+-----------------+--------+----------+-----+--------+--------+-------------+--------
    Bismack Biyombo   | Toronto Raptors | 8      | C        | 23  | 6-9    | 245    |             | 2814000
    James Johnson     | Toronto Raptors | 3      | PF       | 29  | 6-9    | 250    | Wake Forest | 2500000
    Jason Thompson    | Toronto Raptors | 1      | PF       | 29  | 6-11   | 250    | Rider       | 245177
    Jonas Valanciunas | Toronto Raptors | 17     | C        | 24  | 7-0    | 255    |             | 4660482

3.2.2.7.32 GET_DDL

GET_DDL is a function that shows the CREATE TABLE statement for a table.

Tip:
• For views, see GET_VIEW_DDL.
• For the entire database, see DUMP_DATABASE_DDL.
• For UDFs, see GET_FUNCTION_DDL.

3.2.2.7.32.1 Permissions

The role must have the CONNECT permission at the database level.

3.2.2.7.32.2 Syntax

    get_ddl_statement ::=
        SELECT GET_DDL('[schema_name.]table_name')
        ;

    schema_name ::= identifier

    table_name ::= identifier

3.2.2.7.32.3 Parameters

    Parameter   | Description
    ------------+------------------------
    schema_name | The name of the schema.
    table_name  | The name of the table.

3.2.2.7.32.4 Examples

3.2.2.7.32.5 Getting the DDL for a table

The result of the GET_DDL function is a verbose version of the CREATE TABLE syntax, which may include additional information that was added by SQream DB. For example, a NULL constraint may be specified explicitly.

    farm=> CREATE TABLE cool_animals (
       id INT NOT NULL,
       name varchar(30) NOT NULL,
       weight FLOAT,
       is_agressive BOOL DEFAULT false NOT NULL
    );
    executed

    farm=> SELECT GET_DDL('cool_animals');
    create table "public"."cool_animals" (
      "id" int not null,
      "name" varchar(30) not null,
      "weight" double null,
      "is_agressive" bool default false not null
    )
    ;

3.2.2.7.32.6 Exporting table DDL to a file

    COPY (SELECT GET_DDL('cool_animals')) TO '/home/rhendricks/animals.ddl';

3.2.2.7.33 GET_FUNCTION_DDL

GET_FUNCTION_DDL is a function that shows the CREATE FUNCTION statement for a UDF.

Tip:
• For tables, see GET_DDL.
• For views, see GET_VIEW_DDL.
• For the entire database, see DUMP_DATABASE_DDL.

3.2.2.7.33.1 Permissions

The role must have the CONNECT permission at the database level.

3.2.2.7.33.2 Syntax

    get_function_ddl_statement ::=
        SELECT GET_FUNCTION_DDL('function_name')
        ;

    function_name ::= identifier

3.2.2.7.33.3 Parameters

    Parameter     | Description
    --------------+--------------------------
    function_name | The name of the function.

3.2.2.7.33.4 Examples

3.2.2.7.33.5 Getting the DDL for a function

The result of the GET_FUNCTION_DDL function is a verbose version of the CREATE FUNCTION statement, which may include additional information that was added by SQream DB.
For example, some type names and identifiers may be quoted or altered.

    master=> CREATE OR REPLACE FUNCTION my_distance (x1 float, y1 float, x2 float, y2 float) RETURNS float as $$
    import math
    if y1 < x1:
        return 0.0
    else:
        return math.sqrt((y2 - y1) ** 2 + (x2 - x1) ** 2)
    $$ LANGUAGE PYTHON;
    executed

    master=> SELECT GET_FUNCTION_DDL('my_distance');
    create function "my_distance" (x1 float,
                                   y1 float,
                                   x2 float,
                                   y2 float) returns float as
    $$
    import math
    if y1 < x1:
        return 0.0
    else:
        return math.sqrt((y2 - y1) ** 2 + (x2 - x1) ** 2)
    $$
    language python volatile;

3.2.2.7.33.6 Exporting function DDL to a file

    COPY (SELECT GET_FUNCTION_DDL('my_distance')) TO '/home/rhendricks/my_distance.sql';

3.2.2.7.34 GET_LICENSE_INFO

GET_LICENSE_INFO displays license information: data size limitations, expiration date, and license type.

When invoked, get_license_info() returns the license information in the following order:

1. Compressed data size (GB)
2. Uncompressed data size (GB)
3. Compression type
4. Data size limit (GB)
5. Expiration date
6. Is date expired (0/1): 0 is no, and 1 is yes
7. Is size exceeded (0/1): 0 is no, and 1 is yes
8. Data size left (GB)

The following is an example of the above licensing information:

    10,100,compressed,20,2045-03-18,0,0,10

3.2.2.7.35 GET_VIEW_DDL

GET_VIEW_DDL is a function that shows the CREATE VIEW statement for a view.

Tip:
• For tables, see GET_DDL.
• For the entire database, see DUMP_DATABASE_DDL.
• For UDFs, see GET_FUNCTION_DDL.

3.2.2.7.35.1 Permissions

The role must have the CONNECT permission at the database level.

3.2.2.7.35.2 Syntax

    get_view_ddl_statement ::=
        SELECT GET_VIEW_DDL('[schema_name.]view_name')
        ;

    schema_name ::= identifier

    view_name ::= identifier

3.2.2.7.35.3 Parameters

    Parameter   | Description
    ------------+------------------------
    schema_name | The name of the schema.
    view_name   | The name of the view.
3.2.2.7.35.4 Examples

3.2.2.7.35.5 Getting the DDL for a view

The result of the GET_VIEW_DDL function is a verbose version of the CREATE VIEW statement, which may include additional information that was added by SQream DB. For example, schemas and column names will be specified explicitly.

    farm=> CREATE VIEW angry_animals AS SELECT * FROM cool_animals WHERE is_agressive = false;
    executed

    farm=> SELECT GET_VIEW_DDL('angry_animals');
    create view "public".angry_animals as
      select
        "cool_animals"."id" as "id",
        "cool_animals"."name" as "name",
        "cool_animals"."weight" as "weight",
        "cool_animals"."is_agressive" as "is_agressive"
      from
        "public".cool_animals as cool_animals
      where
        "cool_animals"."is_agressive" = false;

3.2.2.7.35.6 Exporting view DDL to a file

    COPY (SELECT GET_VIEW_DDL('angry_animals')) TO '/home/rhendricks/angry_animals.sql';

3.2.2.7.36 LIST_SAVED_QUERIES

LIST_SAVED_QUERIES lists the available previously saved queries.

This is an alternative to querying the savedqueries catalog view.

Read more in the Saved queries guide.

See also: SAVE_QUERY, EXECUTE_SAVED_QUERY, DROP_SAVED_QUERY, SHOW_SAVED_QUERY.

3.2.2.7.36.1 Permissions

Listing the saved queries requires no special permissions.

3.2.2.7.36.2 Syntax

    list_saved_queries_statement ::=
        SELECT LIST_SAVED_QUERIES()
        ;

3.2.2.7.36.3 Returns

List of saved query names, one per row.

3.2.2.7.36.4 Parameters

None

3.2.2.7.36.5 Notes

This statement returns an empty result set if no saved queries exist in the current database.

3.2.2.7.36.6 Examples

3.2.2.7.36.7 Listing previously saved queries

    t=> SELECT LIST_SAVED_QUERIES();
    saved_query
    -------------------------
    select_all
    select_by_weight
    select_by_weight_and_team

    t=> SELECT SHOW_SAVED_QUERY('select_by_weight_and_team');
    saved_query
    -----------------------------------------------
    SELECT * FROM nba WHERE Weight > ? AND Team = ?

3.2.2.7.36.8 Listing saved queries with the catalog

Using the catalog is also possible:
    t=> SELECT * FROM sqream_catalog.savedqueries;
    name                      | num_parameters
    --------------------------+---------------
    select_all                |              0
    select_by_weight          |              1
    select_by_weight_and_team |              2

3.2.2.7.37 RECOMPILE_SAVED_QUERY

RECOMPILE_SAVED_QUERY recompiles a saved query that has been invalidated due to a schema change.

3.2.2.7.37.1 Permissions

Recompiling a saved query requires no special permissions.

3.2.2.7.37.2 Syntax

    recompile_saved_query_statement ::=
        SELECT RECOMPILE_SAVED_QUERY(saved_query_name)
        ;

    saved_query_name ::= string_literal

3.2.2.7.37.3 Returns

If the saved query is recompiled successfully, this statement returns nothing.

3.2.2.7.37.4 Parameters

Parameter        | Description
-----------------+------------------------------------
saved_query_name | The name of the query to recompile

3.2.2.7.37.5 Examples

3.2.2.7.37.6 Recreating a query that has been invalidated

    t=> SELECT SAVE_QUERY('select_by_weight_and_team',$$SELECT * FROM nba WHERE Weight > ? AND Team = ?$$);
    executed

    t=> SELECT EXECUTE_SAVED_QUERY('select_by_weight_and_team', 240, 'Toronto Raptors');
    Name              | Team            | Number | Position | Age | Height | Weight | College     | Salary
    ------------------+-----------------+--------+----------+-----+--------+--------+-------------+---------
    Bismack Biyombo   | Toronto Raptors |      8 | C        |  23 | 6-9    |    245 |             | 2814000
    James Johnson     | Toronto Raptors |      3 | PF       |  29 | 6-9    |    250 | Wake Forest | 2500000
    Jason Thompson    | Toronto Raptors |      1 | PF       |  29 | 6-11   |    250 | Rider       |  245177
    Jonas Valanciunas | Toronto Raptors |     17 | C        |  24 | 7-0    |    255 |             | 4660482

To invalidate the saved query, we will change the schema without changing the original query text:

    t=> ALTER TABLE nba RENAME COLUMN age to "Age (as of 2015)";
    executed

Because the query was compiled before the change, the saved query is now invalid and fails when executed:

    t=> SELECT EXECUTE_SAVED_QUERY('select_by_weight_and_team', 240, 'Toronto Raptors');
    Error: column not found {Age@null}
    column not found {Age@null}

Recompiling the query fixes this issue:

    t=> SELECT RECOMPILE_SAVED_QUERY('select_by_weight_and_team');
    executed

    t=> SELECT EXECUTE_SAVED_QUERY('select_by_weight_and_team', 240, 'Toronto Raptors');
    Name              | Team            | Number | Position | Age (as of 2015) | Height | Weight | College     | Salary
    ------------------+-----------------+--------+----------+------------------+--------+--------+-------------+---------
    Bismack Biyombo   | Toronto Raptors |      8 | C        |               23 | 6-9    |    245 |             | 2814000
    James Johnson     | Toronto Raptors |      3 | PF       |               29 | 6-9    |    250 | Wake Forest | 2500000
    Jason Thompson    | Toronto Raptors |      1 | PF       |               29 | 6-11   |    250 | Rider       |  245177
    Jonas Valanciunas | Toronto Raptors |     17 | C        |               24 | 7-0    |    255 |             | 4660482

3.2.2.7.38 RECOMPILE_VIEW

RECOMPILE_VIEW recompiles a view that has been invalidated, for example because a table it references was dropped or altered.

3.2.2.7.38.1 Permissions

The role must have the DDL permission at the database level, as well as SELECT permissions for any tables referenced by the view.
3.2.2.7.38.2 Syntax

    recompile_view_statement ::=
        SELECT RECOMPILE_VIEW('[schema_name.]view_name')
        ;

    schema_name ::= identifier
    view_name ::= identifier

3.2.2.7.38.3 Parameters

Parameter   | Description
------------+----------------------------------------
schema_name | The name of the schema the view is in.
view_name   | The name of the view.

3.2.2.7.38.4 Examples

3.2.2.7.38.5 Recreating a view that has been invalidated

    farm=> SELECT * FROM agressive_animals;
    View 'public.agressive_animals' is invalid since it references a table that has been dropped/altered. The probable candidates are: [ "public.cool_animals" ]

    farm=> SELECT recompile_view('agressive_animals');
    executed

3.2.2.7.39 SAVE_QUERY

SAVE_QUERY saves a query execution plan.

Read more in the Saved queries guide.

See also: EXECUTE_SAVED_QUERY, DROP_SAVED_QUERY, SHOW_SAVED_QUERY, LIST_SAVED_QUERIES.

3.2.2.7.39.1 Permissions

No special permissions are needed to save a query.

3.2.2.7.39.2 Syntax

    save_query_statement ::=
        SELECT SAVE_QUERY(saved_query_name, parameterized_query_string)
        ;

    saved_query_name ::= string_literal
    parameterized_query_string ::= string_literal

3.2.2.7.39.3 Returns

If the query is saved correctly, this statement does not return anything.

3.2.2.7.39.4 Parameters

Parameter                  | Description
---------------------------+-------------------------------------------------------------------
saved_query_name           | The name of the query to save. This name will identify the query
parameterized_query_string | The query to save. Can be dollar quoted and contain parameters (?)

3.2.2.7.39.5 Notes

• Query names are unique across the database and can’t be repeated unless dropped.
• The query is compiled upon save, but not executed. It must be a valid query at compile time.
• Query parameters can be used as substitutes for literal expressions. Parameters cannot be used to substitute identifiers, column names, table names, or other parts of the query.
• Query parameters of a string datatype (like VARCHAR) must be of a fixed length, and can be used in equality checks, but not patterns (e.g. LIKE, RLIKE, etc.)
• Query parameters’ types are inferred at compile time.

3.2.2.7.39.6 Examples

Assume a table named nba, with the following structure:

    CREATE TABLE nba
    (
       Name varchar(40),
       Team varchar(40),
       Number tinyint,
       Position varchar(2),
       Age tinyint,
       Height varchar(4),
       Weight real,
       College varchar(40),
       Salary float
    );

Here’s a peek at the table contents (from nba.csv):

Table 17: nba.csv

Name          | Team           | Number | Position | Age  | Height | Weight | College           | Salary
--------------+----------------+--------+----------+------+--------+--------+-------------------+------------
Avery Bradley | Boston Celtics | 0.0    | PG       | 25.0 | 6-2    | 180.0  | Texas             | 7730337.0
Jae Crowder   | Boston Celtics | 99.0   | SF       | 25.0 | 6-6    | 235.0  | Marquette         | 6796117.0
John Holland  | Boston Celtics | 30.0   | SG       | 27.0 | 6-5    | 205.0  | Boston University |
R.J. Hunter   | Boston Celtics | 28.0   | SG       | 22.0 | 6-5    | 185.0  | Georgia State     | 1148640.0
Jonas Jerebko | Boston Celtics | 8.0    | PF       | 29.0 | 6-10   | 231.0  |                   | 5000000.0
Amir Johnson  | Boston Celtics | 90.0   | PF       | 29.0 | 6-9    | 240.0  |                   | 12000000.0
Jordan Mickey | Boston Celtics | 55.0   | PF       | 21.0 | 6-8    | 235.0  | LSU               | 1170960.0
Kelly Olynyk  | Boston Celtics | 41.0   | C        | 25.0 | 7-0    | 238.0  | Gonzaga           | 2165160.0
Terry Rozier  | Boston Celtics | 12.0   | PG       | 22.0 | 6-2    | 190.0  | Louisville        | 1824360.0

3.2.2.7.39.7 Saving and executing a simple query

    t=> SELECT SAVE_QUERY('select_all','SELECT * FROM nba');
    executed

    t=> SELECT EXECUTE_SAVED_QUERY('select_all');
    Name          | Team           | Number | Position | Age | Height | Weight | College           | Salary
    --------------+----------------+--------+----------+-----+--------+--------+-------------------+---------
    Avery Bradley | Boston Celtics |      0 | PG       |  25 | 6-2    |    180 | Texas             | 7730337
    Jae Crowder   | Boston Celtics |     99 | SF       |  25 | 6-6    |    235 | Marquette         | 6796117
    John Holland  | Boston Celtics |     30 | SG       |  27 | 6-5    |    205 | Boston University |
    R.J. Hunter   | Boston Celtics |     28 | SG       |  22 | 6-5    |    185 | Georgia State     | 1148640
    [...]
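The notes in this section say that each ? stands in for one literal value. As a rough client-side illustration only, the sketch below counts ? placeholders in a query string so a caller can check how many values to pass to EXECUTE_SAVED_QUERY. The helper name count_placeholders is hypothetical and is not part of SQream DB; it assumes single-quoted string literals and does not handle comments or dollar-quoted text inside the query. The num_parameters column of the savedqueries catalog view remains the authoritative count.

```python
def count_placeholders(query: str) -> int:
    """Count '?' placeholders that appear outside single-quoted literals.

    Hypothetical helper for illustration; not a SQream DB API.
    """
    count = 0
    in_string = False
    for ch in query:
        if ch == "'":
            # Toggle on every quote; a doubled quote ('') toggles twice,
            # so the state is unchanged after an escaped quote.
            in_string = not in_string
        elif ch == "?" and not in_string:
            count += 1
    return count


if __name__ == "__main__":
    saved = "SELECT * FROM nba WHERE Weight > ? AND Team = ?"
    print(count_placeholders(saved))
```

For the select_by_weight_and_team query used in this section, the helper reports two placeholders, which matches the two values passed in the EXECUTE_SAVED_QUERY examples.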
3.2.2.7.39.8 Saving and executing a parameterized query

Use parameters (?) as placeholders for values that are supplied later, at execution time.

Tip: Use dollar quoting ($$) to avoid escaping strings.

    t=> SELECT SAVE_QUERY('select_by_weight_and_team',$$SELECT * FROM nba WHERE Weight > ? AND Team = ?$$);
    executed

    t=> SELECT EXECUTE_SAVED_QUERY('select_by_weight_and_team', 240, 'Toronto Raptors');
    Name              | Team            | Number | Position | Age | Height | Weight | College     | Salary
    ------------------+-----------------+--------+----------+-----+--------+--------+-------------+---------
    Bismack Biyombo   | Toronto Raptors |      8 | C        |  23 | 6-9    |    245 |             | 2814000
    James Johnson     | Toronto Raptors |      3 | PF       |  29 | 6-9    |    250 | Wake Forest | 2500000
    Jason Thompson    | Toronto Raptors |      1 | PF       |  29 | 6-11   |    250 | Rider       |  245177
    Jonas Valanciunas | Toronto Raptors |     17 | C        |  24 | 7-0    |    255 |             | 4660482

3.2.2.7.40 SHOW_SAVED_QUERY

SHOW_SAVED_QUERY shows the query text for a previously saved query.

Read more in the Saved queries guide.

See also: SAVE_QUERY, EXECUTE_SAVED_QUERY, DROP_SAVED_QUERY, LIST_SAVED_QUERIES.

3.2.2.7.40.1 Permissions

Showing a saved query requires no special permissions.

3.2.2.7.40.2 Syntax

    show_saved_query_statement ::=
        SELECT SHOW_SAVED_QUERY(saved_query_name)
        ;

    saved_query_name ::= string_literal

3.2.2.7.40.3 Returns

A single row result containing the saved query string.

3.2.2.7.40.4 Parameters

Parameter        | Description
-----------------+-------------------------------
saved_query_name | The name of the query to show

3.2.2.7.40.5 Examples

3.2.2.7.40.6 Showing a previously saved query

    t=> SELECT SAVE_QUERY('select_by_weight_and_team',$$SELECT * FROM nba WHERE Weight > ? AND Team = ?$$);
    executed

    t=> SELECT SHOW_SAVED_QUERY('select_by_weight_and_team');
    saved_query
    -----------------------------------------------
    SELECT * FROM nba WHERE Weight > ? AND Team = ?

3.2.2.7.41 EXPLAIN

EXPLAIN returns a static query plan, which can be used to debug query plans.
To see an actively running query or statement, use SHOW_NODE_INFO instead.

See also SHOW_NODE_INFO, SHOW_SERVER_STATUS.

3.2.2.7.41.1 Permissions

The role must have the SELECT permissions for any tables referenced by the query.

3.2.2.7.41.2 Syntax

    explain_statement ::=
        SELECT EXPLAIN(query_stmt)
        ;

3.2.2.7.41.3 Parameters

Parameter  | Description
-----------+--------------------------------------------
query_stmt | The select query to generate the plan for.

3.2.2.7.41.4 Notes

Use dollar-quoting to escape the query text.

3.2.2.7.41.5 Examples

3.2.2.7.41.6 Generating a static query plan

    t=> SELECT EXPLAIN($$SELECT DATEADD(hour,1,dt) FROM cool_dates$$);
    Select
      Specific (TTNative Gpu) (dateadd@null := dt@null, dateadd@val := (pure_if dt@null "0" (addHoursdt2dt dt@val "1")))
        Table Scan
          public.cool_dates ("dt@null", "dt@val") []

3.2.2.7.42 SHOW_CONNECTIONS

SHOW_CONNECTIONS returns a list of active sessions on the current worker.

To see sessions across the cluster, see SHOW_SERVER_STATUS.

3.2.2.7.42.1 Permissions

The role must have the SUPERUSER permissions.

3.2.2.7.42.2 Syntax

    show_connections_statement ::=
        SELECT SHOW_CONNECTIONS()
        ;

3.2.2.7.42.3 Parameters

None

3.2.2.7.42.4 Returns

This function returns a list of active sessions. If no sessions are active on the worker, the result set will be empty.

Table 18: Result columns

Column          | Description
----------------+---------------------------------------------------------------
ip              | The worker hostname or IP
conn_id         | Connection ID
conn_start_time | Connection start timestamp
stmt_id         | Statement ID. Connections with no active statement display -1.
stmt_start_time | Statement start timestamp
stmt            | Statement text

3.2.2.7.42.5 Notes

• This utility shows the active connections. Some sessions may be actively connected, but not currently running a statement.
• A connection is typically reused. There could be many statements under a single connection ID.
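Because idle connections report a stmt_id of -1, a client that has already fetched the SHOW_CONNECTIONS result may want to keep only the connections that are running a statement. The following is a minimal sketch under that assumption: the rows are plain tuples in the result-column order listed in this section, the sample values mirror the output shown in the documentation's example, and the function name active_statements is hypothetical rather than part of any SQream DB client.

```python
# Each row follows the SHOW_CONNECTIONS result columns:
# (ip, conn_id, conn_start_time, stmt_id, stmt_start_time, stmt)
sample_rows = [
    ("192.168.1.91", 103, "2019-12-24 00:01:27", 129, "2019-12-24 00:38:18", "SELECT GET_DATE(), * F..."),
    ("192.168.1.91", 23, "2019-12-24 00:01:27", -1, "2019-12-24 00:01:27", ""),
    ("192.168.1.91", 22, "2019-12-24 00:01:27", -1, "2019-12-24 00:01:27", ""),
    ("192.168.1.91", 26, "2019-12-24 00:01:28", -1, "2019-12-24 00:01:28", ""),
]


def active_statements(rows):
    """Return only the rows whose stmt_id is not -1 (hypothetical helper)."""
    return [row for row in rows if row[3] != -1]


if __name__ == "__main__":
    for ip, conn_id, _, stmt_id, started, stmt in active_statements(sample_rows):
        print(f"statement {stmt_id} on connection {conn_id} ({ip}) since {started}: {stmt}")
```

The statement IDs collected this way are the values that SHOW_NODE_INFO and STOP_STATEMENT expect as input.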
3.2.2.7.42.6 Examples

3.2.2.7.42.7 Using SHOW_CONNECTIONS to get statement IDs

    t=> SELECT SHOW_CONNECTIONS();
    ip           | conn_id | conn_start_time     | stmt_id | stmt_start_time     | stmt
    -------------+---------+---------------------+---------+---------------------+--------------------------
    192.168.1.91 |     103 | 2019-12-24 00:01:27 |     129 | 2019-12-24 00:38:18 | SELECT GET_DATE(), * F...
    192.168.1.91 |      23 | 2019-12-24 00:01:27 |      -1 | 2019-12-24 00:01:27 |
    192.168.1.91 |      22 | 2019-12-24 00:01:27 |      -1 | 2019-12-24 00:01:27 |
    192.168.1.91 |      26 | 2019-12-24 00:01:28 |      -1 | 2019-12-24 00:01:28 |

The statement ID we’re interested in is 129. We can see the connection started at 00:01:27, while the statement started at 00:38:18.

3.2.2.7.43 SHOW_LOCKS

SHOW_LOCKS returns a list of locks from across the cluster.

Read more about locks in Concurrency and locks.

3.2.2.7.43.1 Permissions

The role must have the SUPERUSER permissions.

3.2.2.7.43.2 Syntax

    show_locks_statement ::=
        SELECT SHOW_LOCKS()
        ;

3.2.2.7.43.3 Parameters

None

3.2.2.7.43.4 Returns

This function returns a list of active locks. If no locks are active in the cluster, the result set will be empty.

Table 19: Result columns

Column               | Description
---------------------+----------------------------------------------------------------------------
stmt_id              | Statement ID that caused the lock.
stmt_string          | Statement text
username             | The role that executed the statement
server               | The worker node’s IP
port                 | The worker node’s port
locked_object        | The fully qualified name of the object being locked, separated with $ (e.g. table$t$public$nba2 for table nba2 in schema public, in database t)
lockmode             | The locking mode (inclusive or exclusive).
statement_start_time | Timestamp the statement started
lock_start_time      | Timestamp the lock was obtained

3.2.2.7.43.5 Examples

3.2.2.7.43.6 Using SHOW_LOCKS to see active locks

In this example, we create a table based on results (CREATE TABLE AS), but we are also effectively dropping the previous table (by using OR REPLACE).
Thus, SQream DB applies locks during the table creation process to prevent the table from being altered during its creation.

    t=> SELECT SHOW_LOCKS();
    statement_id | statement_string                                                                                | username | server       | port | locked_object              | lockmode  | statement_start_time | lock_start_time
    -------------+-------------------------------------------------------------------------------------------------+----------+--------------+------+----------------------------+-----------+----------------------+--------------------
             287 | CREATE OR REPLACE TABLE nba2 AS SELECT "Name" FROM nba WHERE REGEXP_COUNT("Name", '( )+', 8)>1; | sqream   | 192.168.1.91 | 5000 | database$t                 | Inclusive | 2019-12-26 00:03:30  | 2019-12-26 00:03:30
             287 | CREATE OR REPLACE TABLE nba2 AS SELECT "Name" FROM nba WHERE REGEXP_COUNT("Name", '( )+', 8)>1; | sqream   | 192.168.1.91 | 5000 | globalpermission$          | Exclusive | 2019-12-26 00:03:30  | 2019-12-26 00:03:30
             287 | CREATE OR REPLACE TABLE nba2 AS SELECT "Name" FROM nba WHERE REGEXP_COUNT("Name", '( )+', 8)>1; | sqream   | 192.168.1.91 | 5000 | schema$t$public            | Inclusive | 2019-12-26 00:03:30  | 2019-12-26 00:03:30
             287 | CREATE OR REPLACE TABLE nba2 AS SELECT "Name" FROM nba WHERE REGEXP_COUNT("Name", '( )+', 8)>1; | sqream   | 192.168.1.91 | 5000 | table$t$public$nba2$Insert | Exclusive | 2019-12-26 00:03:30  | 2019-12-26 00:03:30
             287 | CREATE OR REPLACE TABLE nba2 AS SELECT "Name" FROM nba WHERE REGEXP_COUNT("Name", '( )+', 8)>1; | sqream   | 192.168.1.91 | 5000 | table$t$public$nba2$Update | Exclusive | 2019-12-26 00:03:30  | 2019-12-26 00:03:30

3.2.2.7.44 SHOW_NODE_INFO

SHOW_NODE_INFO returns a snapshot of the current query plan, similar to EXPLAIN ANALYZE from other databases.

The snapshot provides information about execution that can be used for monitoring and troubleshooting slow-running statements, for example by helping identify long-running execution nodes (components that process data).

See also EXPLAIN, SHOW_SERVER_STATUS.

3.2.2.7.44.1 Permissions

The role must have the SUPERUSER permissions.
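Since the snapshot is meant to help identify long-running execution nodes, one simple way to read it is to rank the nodes by their total elapsed time. The sketch below assumes the SHOW_NODE_INFO rows have already been fetched by a client and reduced to the node_id, node_type, rows and timesum result columns described in this section; the NodeInfo type and the slowest_nodes function are hypothetical names used for illustration, and the sample values are taken from the example output in this section.

```python
from typing import NamedTuple


class NodeInfo(NamedTuple):
    """A subset of the SHOW_NODE_INFO result columns (hypothetical container)."""
    node_id: int
    node_type: str
    rows: int
    timesum: float  # total elapsed time reported for this execution node


def slowest_nodes(nodes, top=3):
    """Return up to `top` nodes with the largest timesum, slowest first."""
    return sorted(nodes, key=lambda n: n.timesum, reverse=True)[:top]


sample = [
    NodeInfo(1, "PushToNetworkQueue", 1, 0.0025),
    NodeInfo(5, "Filter", 1, 0.0002),
    NodeInfo(6, "GpuTransform", 457, 0.0002),
    NodeInfo(8, "CpuToGpu", 457, 0.0003),
    NodeInfo(11, "ReadTable", 457, 0.0004),
]

if __name__ == "__main__":
    for node in slowest_nodes(sample):
        print(node.node_id, node.node_type, node.timesum)
```

Sorting on timesum is only one possible heuristic; comparing rows and chunks between a node and its parent can also show where data volume changes sharply.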
3.2.2.7.44.2 Syntax

    show_node_info_statement ::=
        SELECT SHOW_NODE_INFO(stmt_id)
        ;

    stmt_id ::= bigint

3.2.2.7.44.3 Parameters

Parameter | Description
----------+-----------------------------
stmt_id   | The statement ID to explore

3.2.2.7.44.4 Returns

This utility returns details of the execution nodes, if the statement is still running.

If the statement has finished, or the statement ID does not exist, the utility returns an empty result set.

Table 20: Result columns

Column name       | Description
------------------+-------------------------------------------------------------------------------
stmt_id           | The ID for the statement
node_id           | This node’s ID in this execution plan
node_type         | The node type
rows              | Total number of rows this node has processed
chunks            | Number of chunks this node has processed
avg_rows_in_chunk | Average number of rows that this node processes in each chunk (rows / chunks)
time              | Timestamp for this node’s creation
parent_node_id    | The node_id of this node’s parent
read              | Total data read from disk
write             | Total data written to disk
comment           | Additional information (e.g. table name for ReadTable)
timesum           | Total elapsed time for this execution node’s processing

3.2.2.7.44.5 Node types

This is a full list of node types:

Table 21: Node types

Node type           | Execution location | Description
--------------------+--------------------+------------------------------------------------------------------------------------------------
AddChunkId          |                    | Used during insert operations
AddSequences        |                    | Used during insert operations, with identity columns
AddMinMaxMetadata   |                    | Used to calculate ranges for the chunk metadata system
AddTopSortFilters   |                    | An operation to optimize LIMIT when used alongside ORDER BY
Compress            | CPU and GPU        | Compress data with both CPU and GPU schemes
CpuDecompress       | CPU                | Decompression operation, common for longer VARCHAR types
CpuLoopJoin         | CPU                | A non-indexed nested loop join, performed on the CPU
CpuReduce           | CPU                | A reduce process performed on the CPU, primarily with DISTINCT aggregates (e.g. COUNT(DISTINCT ...))
CpuToGpu, GpuToCpu  |                    | An operation that moves data to or from the GPU for processing
CpuTransform        | CPU                | A transform operation performed on the CPU, usually a scalar function
CrossJoin           |                    | A join without a join condition
DeferredGather      | CPU                | Merges the results of GPU operations with a result set [1]
Distinct            | GPU                | Removes duplicate rows (usually as part of the DISTINCT operation)
Distinct_Merge      | CPU                | The merge operation of the Distinct operation
Filter              | GPU                | A filtering operation, such as a WHERE or JOIN clause
GpuCopy             | GPU                | Copies data between GPUs or within a single GPU
GpuDecompress       | GPU                | Decompression operation
GpuReduceMerge      | GPU                | An operation to optimize part of the merger phases in the GPU
GpuTransform        | GPU                | A transformation operation such as a type cast or scalar function
Hash                | CPU                | Hashes the output result. Used internally.
JoinSideMarker      |                    | Used internally.
LiteralValues       | CPU                | Creates a virtual relation (table), when VALUES is used
LocateFiles         | CPU                | Validates external file paths for foreign data wrappers, expanding directories and GLOB patterns
LoopJoin            | GPU                | A non-indexed nested loop join, performed on the GPU
MarkMatchedJoinRows |                    | Used in outer joins, matches rows for larger join operations
NullifyDuplicates   |                    | Replaces duplicate values with NULL to calculate distinct aggregates
NullSink            | CPU                | Used internally
ParseCsv            | CPU                | A CSV parser, used after ReadFiles to convert the CSV into columnar data
PopNetworkQueue     | CPU                | Fetches data from the network queue (e.g. when used with INSERT)
PushToNetworkQueue  | CPU                | Sends result sets to a client connected over the network
ReadCatalog         | CPU                | Converts the catalog into a relation (table)
ReadFiles           | CPU                | Reads external flat-files
ReadOrc             | CPU                | Reads data from an ORC file
ReadParquet         | CPU                | Reads data from a Parquet file
ReadTable           | CPU                | Reads data from a standard table stored on disk
ReadTableMetadata   | CPU                | Reads only table metadata as an optimization
Rechunk             |                    | Reorganize multiple small chunks into a full chunk. Commonly found after joins and when HIGH_SELECTIVITY is used
Reduce              | GPU                | A reduction operation, such as a GROUP BY
ReduceMerge         | GPU                | A merge operation of a reduction operation, helps operate on larger-than-RAM data
ReorderInput        |                    | Change the order of arguments in preparation for the next operation
SeparatedGather     | GPU                | Gathers additional columns for the result
Sort                | GPU                | Sort operation [2]
SortMerge           | CPU                | A merge operation of a sort operation, helps operate on larger-than-RAM data
SortMergeJoin       | GPU                | A sort-merge join, performed on the GPU
TakeRowsFromChunk   |                    | Take the first N rows from each chunk, to optimize LIMIT when used alongside ORDER BY
Top                 |                    | Limits the input size, when used with LIMIT (or its alias TOP)
UdfTransform        | CPU                | Executes a user defined function
UnionAll            |                    | Combines two sources of data when UNION ALL is used
VarCharJoiner       |                    | Used internally
VarCharSplitter     |                    | Used internally
Window              | GPU                | Executes a non-ranking window function
WindowRanking       | GPU                | Executes a ranking window function
WriteTable          | CPU                | Writes the result set to a standard table stored on disk

[1] Gathers columns which should be returned. This node typically spends most of the time on decompressing additional columns.
[2] A GPU sort operation can be added by the statement compiler before GROUP BY or JOIN operations.

3.2.2.7.44.6 Statement statuses

Table 22: Statement status values

Status       | Description
-------------+-----------------------------------------
Preparing    | Statement is being prepared
In queue     | Statement is waiting for execution
Initializing | Statement has entered execution checks
Executing    | Statement is executing
Stopping     | Statement is in the process of stopping

3.2.2.7.44.7 Notes

• This utility shows the execution information for active statements. Once a query has finished execution, the information is no longer available using this utility. See Logging for more information about extracting the information from the logs.
• This utility is primarily intended for troubleshooting with SQream support.

3.2.2.7.44.8 Examples

3.2.2.7.44.9 Getting execution details for a statement

    t=> SELECT SHOW_SERVER_STATUS();
    service | instanceid | connection_id | serverip     | serverport | database_name | user_name | clientip     | statementid | statement                                                       | statementstarttime  | statementstatus | statementstatusstart
    --------+------------+---------------+--------------+------------+---------------+-----------+--------------+-------------+-----------------------------------------------------------------+---------------------+-----------------+---------------------
    sqream  |            |           152 | 192.168.1.91 |       5000 | t             | sqream    | 192.168.1.91 |         176 | SELECT "Name" FROM nba WHERE REGEXP_COUNT("Name", '( )+', 8)>1; | 25-12-2019 23:53:13 | Executing       | 25-12-2019 23:53:13
    sqream  |            |           151 | 192.168.1.91 |       5000 | t             | sqream    | 192.168.0.1  |         177 | SELECT show_server_status()                                     | 25-12-2019 23:51:31 | Executing       | 25-12-2019 23:53:13

The statement ID we want to research is 176, running on worker 192.168.1.91.

The query text is SELECT "Name" FROM nba WHERE REGEXP_COUNT("Name", '( )+', 8)>1;

    t=> SELECT SHOW_NODE_INFO(176);
    stmt_id | node_id | node_type          | rows | chunks | avg_rows_in_chunk | time                | parent_node_id | read | write | comment    | timeSum
    --------+---------+--------------------+------+--------+-------------------+---------------------+----------------+------+-------+------------+--------
        176 |       1 | PushToNetworkQueue |    1 |      1 |                 1 | 2019-12-25 23:53:13 |             -1 |      |       |            |  0.0025
        176 |       2 | Rechunk            |    1 |      1 |                 1 | 2019-12-25 23:53:13 |              1 |      |       |            |       0
        176 |       3 | GpuToCpu           |    1 |      1 |                 1 | 2019-12-25 23:53:13 |              2 |      |       |            |       0
        176 |       4 | ReorderInput       |    1 |      1 |                 1 | 2019-12-25 23:53:13 |              3 |      |       |            |       0
        176 |       5 | Filter             |    1 |      1 |                 1 | 2019-12-25 23:53:13 |              4 |      |       |            |  0.0002
        176 |       6 | GpuTransform       |  457 |      1 |               457 | 2019-12-25 23:53:13 |              5 |      |       |            |  0.0002
        176 |       7 | GpuDecompress      |  457 |      1 |               457 | 2019-12-25 23:53:13 |              6 |      |       |            |       0
        176 |       8 | CpuToGpu           |  457 |      1 |               457 | 2019-12-25 23:53:13 |              7 |      |       |            |  0.0003
        176 |       9 | Rechunk            |  457 |      1 |               457 | 2019-12-25 23:53:13 |              8 |      |       |            |       0
        176 |      10 | CpuDecompress      |  457 |      1 |               457 | 2019-12-25 23:53:13 |              9 |      |       |            |       0
        176 |      11 | ReadTable          |  457 |      1 |               457 | 2019-12-25 23:53:13 |             10 | 4MB  |       | public.nba |  0.0004

3.2.2.7.45 SHOW_SERVER_STATUS

SHOW_SERVER_STATUS returns a list of active sessions across the cluster.

To list active statements on the current worker only, see SHOW_CONNECTIONS.

3.2.2.7.45.1 Permissions

The role must have the SUPERUSER permissions.

3.2.2.7.45.2 Syntax
    show_server_status_statement ::=
        SELECT SHOW_SERVER_STATUS()
        ;

3.2.2.7.45.3 Parameters

None

3.2.2.7.45.4 Returns

This function returns a list of active sessions. If no sessions are active across the cluster, the result set will be empty.

Table 23: Result columns

Column               | Description
---------------------+------------------------------------
service              | The service name for the statement
instance             | The worker ID
connection_id        | Connection ID
serverip             | Worker end-point IP
serverport           | Worker end-point port
database_name        | Database name for the statement
user_name            | Username running the statement
clientip             | Client IP
statementid          | Statement ID
statement            | Statement text
statementstarttime   | Statement start timestamp
statementstatus      | Statement status (see table below)
statementstatusstart | Last updated timestamp

Table 24: Statement status values

Status       | Description
-------------+-----------------------------------------
Preparing    | Statement is being prepared
In queue     | Statement is waiting for execution
Initializing | Statement has entered execution checks
Executing    | Statement is executing
Stopping     | Statement is in the process of stopping

3.2.2.7.45.5 Notes

• This utility shows the active sessions. Some sessions may be actively connected, but not running any statements.

3.2.2.7.45.6 Examples

3.2.2.7.45.7 Using SHOW_SERVER_STATUS to get statement IDs

    t=> SELECT SHOW_SERVER_STATUS();
    service | instanceid | connection_id | serverip     | serverport | database_name | user_name  | clientip    | statementid | statement                   | statementstarttime  | statementstatus | statementstatusstart
    --------+------------+---------------+--------------+------------+---------------+------------+-------------+-------------+-----------------------------+---------------------+-----------------+---------------------
    sqream  |            |           102 | 192.168.1.91 |       5000 | t             | rhendricks | 192.168.0.1 |         128 | SELECT SHOW_SERVER_STATUS() | 24-12-2019 00:14:53 | Executing       | 24-12-2019 00:14:53

The statement ID is 128, running on worker 192.168.1.91.

3.2.2.7.46 STOP_STATEMENT

STOP_STATEMENT stops or aborts an active statement.

To find a statement by ID, see SHOW_SERVER_STATUS and SHOW_CONNECTIONS.
Tip: Some DBMSs call this process killing a session, terminating a job, or killing a query.

3.2.2.7.46.1 Permissions

The role must have the SUPERUSER permissions.

3.2.2.7.46.2 Syntax

    stop_statement_statement ::=
        SELECT STOP_STATEMENT(stmt_id)
        ;

    stmt_id ::= bigint

3.2.2.7.46.3 Parameters

Parameter | Description
----------+--------------------------
stmt_id   | The statement ID to stop

3.2.2.7.46.4 Returns

This utility does not return any value, and always succeeds even if the statement does not exist, or has already stopped.

3.2.2.7.46.5 Notes

• This utility always succeeds even if the statement does not exist, or has already stopped.

3.2.2.7.46.6 Examples

3.2.2.7.46.7 Using SHOW_CONNECTIONS to get statement IDs

Tip: Use SHOW_SERVER_STATUS to find statements from across the entire cluster, or SHOW_CONNECTIONS to show statements from the current worker the client is connected to.

    t=> SELECT SHOW_CONNECTIONS();
    ip           | conn_id | conn_start_time     | stmt_id | stmt_start_time     | stmt
    -------------+---------+---------------------+---------+---------------------+--------------------------
    192.168.1.91 |     103 | 2019-12-24 00:01:27 |     129 | 2019-12-24 00:38:18 | SELECT GET_DATE(), * F...
    192.168.1.91 |      23 | 2019-12-24 00:01:27 |      -1 | 2019-12-24 00:01:27 |
    192.168.1.91 |      22 | 2019-12-24 00:01:27 |      -1 | 2019-12-24 00:01:27 |
    192.168.1.91 |      26 | 2019-12-24 00:01:28 |      -1 | 2019-12-24 00:01:28 |

The statement ID we’re interested in is 129. We can now stop this statement:

    t=> SELECT STOP_STATEMENT(129);
    executed

3.2.2.7.47 SHOW_SUBSCRIBED_INSTANCES

SHOW_SUBSCRIBED_INSTANCES lists the cluster workers and their service queues.

Note: If you haven’t already, read the Workload manager guide.

See also SUBSCRIBE_SERVICE, UNSUBSCRIBE_SERVICE.

3.2.2.7.47.1 Permissions

To list the service queues, the current role must have the SUPERUSER permission.

3.2.2.7.47.2 Syntax

    show_subscribed_instances_statement ::=
        SELECT SHOW_SUBSCRIBED_INSTANCES();
3.2.2.7.47.3 Parameters

None

3.2.2.7.47.4 Notes

This function applies to clusters using the workload manager.

3.2.2.7.47.5 Examples

3.2.2.7.47.6 List service queues and workers

Each worker is assigned an automatic ID, and is subscribed to the 'sqream' queue upon start.

    t=> SELECT SHOW_SUBSCRIBED_INSTANCES();
    service    | servernode | serverip      | serverport
    -----------+------------+---------------+------------
    sqream     | node_9383  | 192.168.0.111 |       5000
    sqream     | node_9384  | 192.168.0.111 |       5001
    sqream     | node_9385  | 192.168.0.111 |       5002
    sqream     | node_9551  | 192.168.1.91  |       5000

This list shows all cluster nodes, their service names, IPs, and ports.

3.2.2.7.48 SUBSCRIBE_SERVICE

SUBSCRIBE_SERVICE subscribes a worker to a service queue for the duration of the connected session.

Note: If you haven’t already, read the Workload manager guide.

See also UNSUBSCRIBE_SERVICE, SHOW_SUBSCRIBED_INSTANCES.

3.2.2.7.48.1 Permissions

To subscribe a worker to a service, the current role must have the SUPERUSER permission.

3.2.2.7.48.2 Syntax

    subscribe_service_statement ::=
        SELECT SUBSCRIBE_SERVICE( worker_id, service_name );

    worker_id ::= text
    service_name ::= text

3.2.2.7.48.3 Parameters

Parameter    | Description
-------------+------------------------------------------------------------------------------------
worker_id    | The name of the worker to subscribe
service_name | The service name to subscribe to. If the service does not exist, it will be created

3.2.2.7.48.4 Notes

• If the service name does not currently exist, it will be created

Warning: SUBSCRIBE_SERVICE applies the service subscription immediately, but the setting applies only for the duration of the session. To apply a persistent setting, use the initialSubscribedServices configuration setting. Read the Workload manager guide for more information.

3.2.2.7.48.5 Examples

3.2.2.7.48.6 Subscribing a worker to a service queue

A common use of the workload manager is to separate ETL tasks from BI tasks.
Each worker is assigned an automatic ID, and is subscribed to the 'sqream' queue upon start.

    t=> SELECT SHOW_SUBSCRIBED_INSTANCES();
    service    | servernode | serverip      | serverport
    -----------+------------+---------------+------------
    sqream     | node_9383  | 192.168.0.111 |       5000
    sqream     | node_9384  | 192.168.0.111 |       5001
    sqream     | node_9385  | 192.168.0.111 |       5002
    sqream     | node_9551  | 192.168.1.91  |       5000

We want to modify node_9551 to join the ETL queue, which will be created when we subscribe it:

    t=> SELECT SUBSCRIBE_SERVICE('node_9551','etl');
    executed

    t=> SELECT SHOW_SUBSCRIBED_INSTANCES();
    service    | servernode | serverip      | serverport
    -----------+------------+---------------+------------
    sqream     | node_9383  | 192.168.0.111 |       5000
    sqream     | node_9384  | 192.168.0.111 |       5001
    sqream     | node_9385  | 192.168.0.111 |       5002
    sqream     | node_9551  | 192.168.1.91  |       5000
    etl        | node_9551  | 192.168.1.91  |       5000

To connect to the new queue, use the service parameter in the connection string. A statement executed from a connection to the etl service will be routed to an available worker that is subscribed to the etl service queue.

3.2.2.7.49 UNSUBSCRIBE_SERVICE

UNSUBSCRIBE_SERVICE unsubscribes a worker from a service queue for the duration of the connected session.

Note: If you haven’t already, read the Workload manager guide.

See also SUBSCRIBE_SERVICE, SHOW_SUBSCRIBED_INSTANCES.

3.2.2.7.49.1 Permissions

To unsubscribe a worker from a service, the current role must have the SUPERUSER permission.
3.2.2.7.49.2 Syntax

    unsubscribe_service_statement ::=
        SELECT UNSUBSCRIBE_SERVICE( worker_id, service_name );

    worker_id ::= text
    service_name ::= text

3.2.2.7.49.3 Parameters

Parameter    | Description
-------------+---------------------------------------
worker_id    | The name of the worker to unsubscribe
service_name | The service name to unsubscribe from

3.2.2.7.49.4 Notes

• If the service name does not currently exist, it will be created

Warning: UNSUBSCRIBE_SERVICE applies the change to the service subscription immediately, but the setting applies only for the duration of the session. To apply a persistent setting, use the initialSubscribedServices configuration setting. Read the Workload manager guide for more information.

3.2.2.7.49.5 Examples

3.2.2.7.49.6 Unsubscribing a worker from a service queue

Each worker is assigned an automatic ID, and is subscribed to the 'sqream' queue upon start.

    t=> SELECT SHOW_SUBSCRIBED_INSTANCES();
    service    | servernode | serverip      | serverport
    -----------+------------+---------------+------------
    sqream     | node_9383  | 192.168.0.111 |       5000
    sqream     | node_9384  | 192.168.0.111 |       5001
    sqream     | node_9385  | 192.168.0.111 |       5002
    sqream     | node_9551  | 192.168.1.91  |       5000
    etl        | node_9551  | 192.168.1.91  |       5000

We want to modify node_9551 to leave the ETL queue:

    t=> SELECT UNSUBSCRIBE_SERVICE('node_9551','etl');
    executed

    t=> SELECT SHOW_SUBSCRIBED_INSTANCES();
    service    | servernode | serverip      | serverport
    -----------+------------+---------------+------------
    sqream     | node_9383  | 192.168.0.111 |       5000
    sqream     | node_9384  | 192.168.0.111 |       5001
    sqream     | node_9385  | 192.168.0.111 |       5002
    sqream     | node_9551  | 192.168.1.91  |       5000

3.2.2.7.50 ALTER DEFAULT PERMISSIONS

ALTER DEFAULT PERMISSIONS grants permissions automatically on objects that are created in the future.

By default, if one user creates a table, another user will not have SELECT permissions on it. By modifying the target role’s default permissions, a database administrator can ensure that all objects created by that role will be accessible to others.
Learn more about the permission system in the access control guide.

3.2.2.7.50.1 Permissions

To alter default permissions, the current role must have the SUPERUSER permission.

3.2.2.7.50.2 Syntax

    alter_default_permissions_statement ::=
        ALTER DEFAULT PERMISSIONS FOR target_role_name
            [IN schema_name, ...]
            FOR { TABLES | SCHEMAS }
            { grant_clause | DROP grant_clause}
            TO ROLE { role_name | public };

    grant_clause ::=
        GRANT
            { CREATE FUNCTION
            | SUPERUSER
            | CONNECT
            | CREATE
            | USAGE
            | SELECT
            | INSERT
            | DELETE
            | DDL
            | EXECUTE
            | ALL
            }

    target_role_name ::= identifier
    role_name ::= identifier
    schema_name ::= identifier

3.2.2.7.50.3 Supported permissions

Permission       Object                                      Description
---------------  ------------------------------------------  -----------
LOGIN            Cluster                                     Login permissions (with a password) allow a role to be a user and log in to a database
PASSWORD         Cluster                                     Sets the password for a user role
CREATE FUNCTION  Database                                    Allows a user to create a Python UDF
SUPERUSER        Cluster, Database, Schema                   The most privileged role, with full control over a cluster, database, or schema
CONNECT          Database                                    Allows a user to connect and use a database
CREATE           Database, Schema, Table                     For a role to create and manage objects, it needs the CREATE and USAGE permissions at the respective level
USAGE            Schema                                      For a role to see tables in a schema, it needs the USAGE permissions
SELECT           Table                                       Allows a user to run SELECT queries on table contents
INSERT           Table                                       Allows a user to run COPY FROM and INSERT statements to load data into a table
DELETE           Table                                       Allows a user to run DELETE, TRUNCATE statements to delete data from a table
DDL              Database, Schema, Table, Function           Allows a user to alter tables, rename columns and tables, etc.
EXECUTE          Function                                    Allows a user to execute UDFs
ALL              Cluster, Database, Schema, Table, Function  All of the above permissions at the respective level

3.2.2.7.50.4 Examples

3.2.2.7.50.5 Automatic permissions for newly created schemas

When role demo creates a new schema, roles u1,u2 will get USAGE and CREATE permissions in the new schema:

    ALTER DEFAULT PERMISSIONS FOR demo FOR SCHEMAS GRANT USAGE, CREATE TO u1,u2;

3.2.2.7.50.6 Automatic permissions for newly created tables in a schema

When role demo creates a new table in schema s1, roles u1,u2 will be granted SELECT on it:

    ALTER DEFAULT PERMISSIONS FOR demo IN s1 FOR TABLES GRANT SELECT TO u1,u2;

3.2.2.7.50.7 Revoke (DROP GRANT) permissions for newly created tables

    ALTER DEFAULT PERMISSIONS FOR public FOR TABLES DROP GRANT SELECT,DDL,INSERT,DELETE TO public;

3.2.2.7.51 ALTER ROLE

ALTER ROLE can be used to make changes to a role. It works in conjunction with subcommands.

Learn more about the permission system in the access control guide.

3.2.2.7.51.1 Subcommands

Command      Usage
-----------  -----
RENAME ROLE  Rename a role

3.2.2.7.52 CREATE ROLE

CREATE ROLE creates roles which can be assigned permissions. A ROLE can be used as a USER if it has been granted a password and login permissions.

Learn more about the permission system in the access control guide.

See also DROP ROLE, ALTER ROLE, GRANT.

3.2.2.7.52.1 Permissions

To create a role, the current role must have the SUPERUSER permission.

3.2.2.7.52.2 Syntax

    create_role_statement ::=
        CREATE ROLE role_name ;

    role_name ::= identifier
3.2.2.7.52.3 Parameters

Parameter  Description
---------  -----------
role_name  The name of the role to create, which must be unique across the cluster

3.2.2.7.52.4 Notes

• SQream recommends that role names follow these rules:
  – Are case-insensitive
  – Start with either a letter or underscore
  – Contain only ASCII letters, numbers, or underscores
  – Follow rules for Identifiers
• A role has no permissions by default. Follow the examples below to see how to grant permissions to databases and tables.
• Roles can be members of other roles.
• Roles must be unique across the cluster.
• Roles cannot log in by default. They do not have a password or login permissions until granted.

3.2.2.7.52.5 Examples

3.2.2.7.52.6 Creating a role with no permissions

    CREATE ROLE new_role;

3.2.2.7.52.7 Creating a user role

A user role has permission to log in, and has a password.

Tip: Some DBMSs call this CREATE USER or ADD USER

    CREATE ROLE new_role;
    GRANT LOGIN to new_role;
    GRANT PASSWORD 'Tr0ub4dor&3' to new_role;
    GRANT CONNECT ON DATABASE master to new_role; -- Repeat for other desired databases

3.2.2.7.53 DROP ROLE

DROP ROLE removes roles.

Learn more about the permission system in the access control guide.

See also CREATE ROLE.

3.2.2.7.53.1 Permissions

To drop a role, the current role must have the SUPERUSER permission.

3.2.2.7.53.2 Syntax

    drop_role_statement ::=
        DROP ROLE role_name ;

    role_name ::= identifier

3.2.2.7.53.3 Parameters

Parameter  Description
---------  -----------
role_name  The name of the role to drop.

3.2.2.7.53.4 Examples

3.2.2.7.53.5 Dropping a role

    DROP ROLE new_role;

3.2.2.7.54 GET_STATEMENT_PERMISSIONS

GET_STATEMENT_PERMISSIONS analyzes an SQL statement and returns a list of permissions required to execute it.

Use this function to understand the permissions required, before granting them to a specific role.

Learn more about the permission system in the access control guide.

See also GRANT, CREATE ROLE.
3.2.2.7.54.1 Permissions

No special permissions are required to run GET_STATEMENT_PERMISSIONS.

3.2.2.7.54.2 Syntax

    get_statement_permissions_statement ::=
        SELECT GET_STATEMENT_PERMISSIONS(query_stmt)
        ;

3.2.2.7.54.3 Parameters

Parameter   Description
----------  -----------
query_stmt  The statement to analyze

3.2.2.7.54.4 Returns

This utility returns details of the required permissions to run the statement.

If the statement requires no special permissions, the utility returns an empty result set.

Table 25: Result columns

Column           Description
---------------  -----------
permission_type  The permission type required
object_type      The object on which the permission is required
object_name      The object name

3.2.2.7.54.5 Supported permissions

Permission       Object                                      Description
---------------  ------------------------------------------  -----------
LOGIN            Cluster                                     Login permissions (with a password) allows a role to be a user and login to a database
PASSWORD         Cluster                                     Sets the password for a user role
CREATE FUNCTION  Database                                    Allows a user to create a Python UDF
SUPERUSER        Cluster, Database, Schema                   The most privileged role, with full control over a cluster, database, or schema
CONNECT          Database                                    Allows a user to connect and use a database
CREATE           Database, Schema, Table                     For a role to create and manage objects, it needs the CREATE and USAGE permissions at the respective level
USAGE            Schema                                      For a role to see tables in a schema, it needs the USAGE permissions
SELECT           Table                                       Allows a user to run SELECT queries on table contents
INSERT           Table                                       Allows a user to run COPY FROM and INSERT statements to load data into a table
DELETE           Table                                       Allows a user to run DELETE, TRUNCATE statements to delete data from a table
DDL              Database, Schema, Table, Function           Allows a user to alter tables, rename columns and tables, etc.
EXECUTE          Function                                    Allows a user to execute UDFs
ALL              Cluster, Database, Schema, Table, Function  All of the above permissions at the respective level
3.2.2.7.54.6 Examples

3.2.2.7.54.7 Getting permission details for a simple statement

    t=> SELECT GET_STATEMENT_PERMISSIONS('SELECT * from nba');
    permission_type | object_type | object_name
    ----------------+-------------+------------------
    SELECT          | table       | master.public.nba
    USAGE           | schema      | master.public

3.2.2.7.54.8 Getting permission details for a DDL statement

Tip: Use dollar quoting ($$) to avoid escaping a statement

    t=> SELECT GET_STATEMENT_PERMISSIONS($$ALTER TABLE nba RENAME COLUMN "Weight" TO "Mass"$$);
    permission_type | object_type | object_name
    ----------------+-------------+------------------
    DDL             | table       | master.public.nba
    USAGE           | schema      | master.public

3.2.2.7.55 GRANT

The GRANT statement adds permissions for a role. It allows for setting permissions on databases, schemas, and tables. It also allows adding a role as a member of another role.

When granting permissions to a role, the permission is added for existing objects only. To automatically add permissions to newly created objects, see ALTER DEFAULT PERMISSIONS.

Learn more about the permission system in the access control guide.

See also REVOKE, CREATE ROLE.

3.2.2.7.55.1 Permissions

To grant permissions, the current role must have the SUPERUSER permission, or have the ADMIN OPTION.

3.2.2.7.55.2 Syntax

    grant_statement ::=
        {
          -- Grant permissions at the cluster level:
          GRANT
              { SUPERUSER
              | LOGIN
              | PASSWORD 'password'
              }
              TO role_name [, ...]

          -- Grant permissions at the database level:
          | GRANT { { CREATE | CONNECT | DDL | SUPERUSER | CREATE FUNCTION } [, ...] | ALL [PERMISSIONS] }
              ON DATABASE database_name [, ...]
              TO role_name [, ...]

          -- Grant permissions at the schema level:
          | GRANT { { CREATE | DDL | USAGE | SUPERUSER } [, ...] | ALL [PERMISSIONS ] }
              ON SCHEMA schema_name [, ...]
              TO role_name [, ...]

          -- Grant permissions at the object level:
          | GRANT { { SELECT | INSERT | DELETE | DDL } [, ...] | ALL [PERMISSIONS] }
              ON { TABLE table_name [, ...] | ALL TABLES IN SCHEMA schema_name [, ...]}
              TO role_name [, ...]

          -- Grant execute function permission:
          | GRANT { ALL | EXECUTE | DDL }
              ON FUNCTION function_name
              TO role_name [, ...]

          -- Pass permissions between roles by granting one role to another:
          | GRANT role_name [, ...]
              TO role_name_2
              [ WITH ADMIN OPTION ]
        }
        ;

    role_name ::= identifier
    role_name_2 ::= identifier
    database_name ::= identifier
    table_name ::= identifier
    schema_name ::= identifier

3.2.2.7.55.3 Parameters

Parameter                                             Description
----------------------------------------------------  -----------
role_name                                             The name of the role to grant permissions to
table_name, database_name, schema_name, function_name  Object to grant permissions on.
WITH ADMIN OPTION                                     If WITH ADMIN OPTION is specified, the role that has the admin option can in turn grant membership in the role to others, and revoke membership in the role as well. Without the admin option, ordinary roles cannot grant or revoke membership. Roles with SUPERUSER can grant or revoke membership in any role to anyone.
3.2.2.7.55.4 Supported permissions

Permission       Object                                      Description
---------------  ------------------------------------------  -----------
LOGIN            Cluster                                     Login permissions (with a password) allows a role to be a user and login to a database
PASSWORD         Cluster                                     Sets the password for a user role
CREATE FUNCTION  Database                                    Allows a user to create a Python UDF
SUPERUSER        Cluster, Database, Schema                   The most privileged role, with full control over a cluster, database, or schema
CONNECT          Database                                    Allows a user to connect and use a database
CREATE           Database, Schema, Table                     For a role to create and manage objects, it needs the CREATE and USAGE permissions at the respective level
USAGE            Schema                                      For a role to see tables in a schema, it needs the USAGE permissions
SELECT           Table                                       Allows a user to run SELECT queries on table contents
INSERT           Table                                       Allows a user to run COPY FROM and INSERT statements to load data into a table
DELETE           Table                                       Allows a user to run DELETE, TRUNCATE statements to delete data from a table
DDL              Database, Schema, Table, Function           Allows a user to alter tables, rename columns and tables, etc.
EXECUTE          Function                                    Allows a user to execute UDFs
ALL              Cluster, Database, Schema, Table, Function  All of the above permissions at the respective level

3.2.2.7.55.5 Examples

3.2.2.7.55.6 Creating a user role with login permissions

Convert a role to a user by granting a password and login permissions

    CREATE ROLE new_role;
    GRANT LOGIN to new_role;
    GRANT PASSWORD 'Tr0ub4dor&3' to new_role;
    GRANT CONNECT ON DATABASE master to new_role; -- Repeat for other desired databases

3.2.2.7.55.7 Promoting a user to a superuser

    -- On the entire cluster
    GRANT SUPERUSER TO new_role;

    -- For a specific database
    GRANT SUPERUSER ON DATABASE my_database TO new_role;
3.2.2.7.55.8 Creating a new role for a group of users

    -- Create new users (we will grant them passwords and logins later)
    CREATE ROLE dba_user1;
    CREATE ROLE dba_user2;
    CREATE ROLE dba_user3;

    -- Add new users to the existing r_database_architect role
    GRANT r_database_architect TO dba_user1;
    GRANT r_database_architect TO dba_user2;
    GRANT r_database_architect TO dba_user3;

3.2.2.7.55.9 Granting with admin option

If WITH ADMIN OPTION is specified, the role that has the admin option can in turn grant membership in the role to others, and revoke membership in the role as well.

    -- dba_user1 is our team lead, so he should be able to grant
    -- permissions to other users.
    GRANT r_database_architect TO dba_user1 WITH ADMIN OPTION;

3.2.2.7.55.10 Change password for user role

To change a user role’s password, grant the user a new password.

    GRANT PASSWORD 'new_password' TO rhendricks;

Note: Granting a new password overrides any previous password. Changing the password while the role has an active running statement does not affect that statement, but will affect subsequent statements.

3.2.2.7.56 RENAME ROLE

RENAME ROLE can be used to rename a role.

Learn more about the permission system in the access control guide.

3.2.2.7.56.1 Permissions

The role must have the SUPERUSER permission.

3.2.2.7.56.2 Syntax

    alter_role_rename_role_statement ::=
        ALTER ROLE role_name RENAME TO new_name ;

    role_name ::= identifier
    new_name ::= identifier

3.2.2.7.56.3 Parameters

Parameter  Description
---------  -----------
role_name  The role name to apply the change to
new_name   The new role name

3.2.2.7.56.4 Notes

• Role names are unique across the cluster.
• SQream recommends that role names follow these rules:
  – Are case-insensitive
  – Start with either a letter or underscore
  – Contain only ASCII letters, numbers, or underscores
  – Follow rules for Identifiers

3.2.2.7.56.5 Examples

3.2.2.7.56.6 Renaming a role

    ALTER ROLE rhendricks RENAME TO patches;

3.2.2.7.57 REVOKE

The REVOKE statement removes permissions from a role. It allows for removing permissions on specific objects.

Learn more about the permission system in the access control guide.

See also GRANT, DROP ROLE.

3.2.2.7.57.1 Permissions

To revoke permissions, the current role must have the SUPERUSER permission, or have the ADMIN OPTION.

3.2.2.7.57.2 Syntax

    revoke_statement ::=
        {
          -- Revoke permissions at the cluster level:
          REVOKE
              { SUPERUSER
              | LOGIN
              | PASSWORD 'password'
              }
              FROM role_name [, ...]

          -- Revoke permissions at the database level:
          | REVOKE { { CREATE | CONNECT | DDL | SUPERUSER | CREATE FUNCTION } [, ...] | ALL [PERMISSIONS] }
              ON DATABASE database_name [, ...]
              FROM role_name [, ...]

          -- Revoke permissions at the schema level:
          | REVOKE { { CREATE | DDL | USAGE | SUPERUSER } [, ...] | ALL [PERMISSIONS ] }
              ON SCHEMA schema_name [, ...]
              FROM role_name [, ...]

          -- Revoke permissions at the object level:
          | REVOKE { { SELECT | INSERT | DELETE | DDL } [, ...] | ALL [PERMISSIONS] }
              ON { TABLE table_name [, ...] | ALL TABLES IN SCHEMA schema_name [, ...]}
              FROM role_name [, ...]

          -- Revoke execute function permission:
          | REVOKE { ALL | EXECUTE | DDL }
              ON FUNCTION function_name
              FROM role_name [, ...]

          -- Remove permissions between roles by revoking role membership:
          | REVOKE role_name [, ...]
              FROM role_name_2
              [ WITH ADMIN OPTION ]
        }
        ;

    role_name ::= identifier
    role_name_2 ::= identifier
    database_name ::= identifier
    table_name ::= identifier
    schema_name ::= identifier

3.2.2.7.57.3 Parameters

Parameter                                             Description
----------------------------------------------------  -----------
role_name                                             The name of the role to revoke permissions from
table_name, database_name, schema_name, function_name  Object to revoke permissions on.
WITH ADMIN OPTION                                     If WITH ADMIN OPTION is specified, the role that has the admin option can in turn grant membership in the role to others, and revoke membership in the role as well. Specifying WITH ADMIN OPTION for revocation will return the role to an ordinary role. An ordinary role cannot grant or revoke membership.

3.2.2.7.57.4 Supported permissions

Permission       Object                                      Description
---------------  ------------------------------------------  -----------
LOGIN            Cluster                                     Login permissions (with a password) allow a role to be a user and log in to a database
PASSWORD         Cluster                                     Sets the password for a user role
CREATE FUNCTION  Database                                    Allows a user to create a Python UDF
SUPERUSER        Cluster, Database, Schema                   The most privileged role, with full control over a cluster, database, or schema
CONNECT          Database                                    Allows a user to connect and use a database
CREATE           Database, Schema, Table                     For a role to create and manage objects, it needs the CREATE and USAGE permissions at the respective level
USAGE            Schema                                      For a role to see tables in a schema, it needs the USAGE permissions
SELECT           Table                                       Allows a user to run SELECT queries on table contents
INSERT           Table                                       Allows a user to run COPY FROM and INSERT statements to load data into a table
DELETE           Table                                       Allows a user to run DELETE, TRUNCATE statements to delete data from a table
DDL              Database, Schema, Table, Function           Allows a user to alter tables, rename columns and tables, etc.
EXECUTE          Function                                    Allows a user to execute UDFs
ALL              Cluster, Database, Schema, Table, Function  All of the above permissions at the respective level

3.2.2.7.57.5 Examples

3.2.2.7.57.6 Prevent a role from modifying table contents

If you don’t trust user shifty, revoke DDL and INSERT permissions.

    REVOKE INSERT ON TABLE important_table FROM shifty;
    REVOKE DDL ON TABLE important_table FROM shifty;

3.2.2.7.57.7 Demoting a user from superuser

    -- On the entire cluster
    REVOKE SUPERUSER FROM new_role;

3.2.2.7.57.8 Revoking admin option

If WITH ADMIN OPTION is specified, the role that has the admin option can in turn grant membership in the role to others, and revoke membership in the role as well.

    -- dba_user1 has been demoted from team lead, so he should not be able to grant
    -- permissions to other users.
    REVOKE r_database_architect FROM dba_user1 WITH ADMIN OPTION;

3.2.3 SQL Functions

SQream DB supports functions from ANSI SQL, as well as others for compatibility.

3.2.3.1 Summary of functions

• Built-In Scalar Functions
  – Bitwise Operations
  – Conditionals
  – Conversion
  – Date and Time
  – Numeric
  – Strings
• User-Defined Scalar Functions
• Aggregate Functions
• Window Functions
• System Functions
• Workload Management Functions

3.2.3.1.1 Built-In Scalar Functions

See more about Built-In Scalar functions

3.2.3.1.1.1 Bitwise Operations

Function                  Description
------------------------  -----------
& (bitwise AND)           Bitwise AND
~ (bitwise NOT)           Bitwise NOT
| (bitwise OR)            Bitwise OR
<< (bitwise shift left)   Bitwise shift left
>> (bitwise shift right)  Bitwise shift right
XOR (bitwise XOR)         Bitwise XOR

3.2.3.1.1.2 Conditionals

Function  Description
--------  -----------
BETWEEN   Value is in [ or not within ] the range
CASE      Test a conditional expression, and depending on the result, evaluate additional expressions.
COALESCE  Evaluate first non-NULL expression
IN        Value is in [ or not within ] a set of values
ISNULL    Alias for COALESCE with two expressions
IS_ASCII  Test a TEXT for ASCII-only characters
IS NULL   Check for NULL [ or non-NULL ] values

3.2.3.1.1.3 Conversion

Function                   Description
-------------------------  -----------
FROM_UNIXTS, FROM_UNIXTSMS  Converts a UNIX Timestamp to DATE or DATETIME
TO_HEX                     Converts a number to a hexadecimal string representation
TO_UNIXTS, TO_UNIXTSMS     Converts a DATE or DATETIME to a UNIX Timestamp

3.2.3.1.1.4 Date and Time

Function           Description
-----------------  -----------
CURDATE            Special syntax, equivalent to CURRENT_DATE
CURRENT_DATE       Returns the current date as DATE
CURRENT_TIMESTAMP  Equivalent to GETDATE
DATEPART           Extracts a date or time element from a date expression
DATEADD            Adds an interval to a date expression
DATEDIFF           Calculates the time difference between two date expressions
EOMONTH            Calculates the last day of the month of a given date expression
EXTRACT            ANSI syntax for extracting date or time element from a date expression
GETDATE            Returns the current timestamp as DATETIME
SYSDATE            Equivalent to GETDATE
TRUNC              Truncates a date element down to a specified date or time element

3.2.3.1.1.5 Numeric

See more about Arithmetic operators

Table 26: Arithmetic operators

Operator   Syntax  Description
---------  ------  -----------
+ (unary)  +a      Converts a string to a numeric value. Identical to a :: double
+          a + b   Adds two expressions together
- (unary)  -a      Negates a numeric expression
-          a - b   Subtracts b from a
*          a * b   Multiplies a by b
/          a / b   Divides a by b
%          a % b   Modulus of a by b. See also MOD, %

Table 27: Functions

Function        Description
--------------  -----------
ABS             Calculates the absolute value of an argument
ACOS            Calculates the inverse cosine of an argument
ASIN            Calculates the inverse sine of an argument
ATAN            Calculates the inverse tangent of an argument
ATN2            Calculates the inverse tangent for a point (y, x)
CEILING / CEIL  Calculates the next integer for an argument
COS             Calculates the cosine of an argument
COT             Calculates the cotangent of an argument
CRC64           Calculates a CRC-64 hash of an argument
DEGREES         Converts a value from radian values to degrees
EXP             Calculates the natural exponent for an argument (e^x)
FLOOR           Calculates the largest integer smaller than or equal to the argument
LOG             Calculates the natural log for an argument
LOG10           Calculates the 10-based log for an argument
MOD, %          Calculates the modulus (remainder) of two arguments
PI              Returns the constant value for π
POWER           Calculates x to the power of y (x^y)
RADIANS         Converts a value from degree values to radians
ROUND           Rounds an argument down to the nearest integer, or an arbitrary precision
SIN             Calculates the sine of an argument
SQRT            Calculates the square root of an argument (√x)
SQUARE          Raises an argument to the power of 2 (x^2)
TAN             Calculates the tangent of an argument
TRUNC           Rounds a number to its integer representation towards 0
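FLOOR, CEILING and TRUNC in the table above differ only in how they treat fractional and negative inputs. The following is a minimal Python sketch that uses the standard `math` module as a stand-in for the mathematical definitions; it is not SQream DB code, the helper name `rounding_family` is hypothetical, and it makes no claim about SQream DB's return types or about how ROUND breaks ties.

```python
import math
from typing import Dict


def rounding_family(x: float) -> Dict[str, int]:
    """Apply the three integer-rounding definitions to one value."""
    return {
        "FLOOR": math.floor(x),    # largest integer <= x
        "CEILING": math.ceil(x),   # smallest integer >= x
        "TRUNC": math.trunc(x),    # drop the fraction, rounding towards 0
    }


for value in (2.7, -2.7, 3.0):
    print(value, rounding_family(value))
```

For positive inputs TRUNC agrees with FLOOR, and for negative inputs it agrees with CEILING, which is the practical meaning of "towards 0".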
3.2.3.1.1.6 Strings

Function          Description
----------------  -----------
CHAR_LENGTH       Calculates number of characters in an argument
CHARINDEX         Calculates the position where a string starts inside another string
|| (Concatenate)  Concatenates two strings
ISPREFIXOF        Matches if a string is the prefix of another string
LEFT              Returns the first number of characters from an argument
LEN               Calculates the length of a string in characters
LIKE              Tests if a string argument matches a pattern
LOWER             Converts an argument to a lower-case equivalent
LTRIM             Trims whitespace from the left side of an argument
OCTET_LENGTH      Calculates the length of a string in bytes
PATINDEX          Calculates the position where a pattern matches a string
REGEXP_COUNT      Calculates the number of matches of a regular expression in an argument
REGEXP_INSTR      Returns the start position of a regular expression match in an argument
REGEXP_SUBSTR     Returns a substring of an argument that matches a regular expression
REPEAT            Repeats a string as many times as specified
REPLACE           Replaces characters in a string
REVERSE           Reverses a string argument
RIGHT             Returns the last number of characters from an argument
RLIKE             Tests if a string argument matches a regular expression pattern
RTRIM             Trims whitespace from the right side of an argument
SUBSTRING         Returns a substring of an argument
TRIM              Trims whitespace from an argument
UPPER             Converts an argument to an upper-case equivalent

3.2.3.1.2 User-Defined Scalar Functions

For more information about user-defined scalar functions, see Scalar SQL UDF

3.2.3.1.3 Aggregate Functions

See more about Aggregate Functions

Function     Aliases        Description
-----------  -------------  -----------
AVG                         Calculates the average of all of the values
CORR                        Calculates the Pearson correlation coefficient
COUNT                       Calculates the count of all of the values or only distinct values
COVAR_POP                   Calculates population covariance of values
COVAR_SAMP                  Calculates sample covariance of values
MAX                         Returns maximum value of all values
MIN                         Returns minimum value of all values
SUM                         Calculates the sum of all of the values or only distinct values
STDDEV_SAMP  stdev, stddev  Calculates sample standard deviation of values
STDDEV_POP   stdevp         Calculates population standard deviation of values
VAR_SAMP     var, variance  Calculates sample variance of values
VAR_POP      varp           Calculates population variance of values

3.2.3.1.4 Window Functions

See more about Window Functions

Function      Description
------------  -----------
LAG           Calculates the value evaluated at the row that is before the current row within the partition
LEAD          Calculates the value evaluated at the row that is after the current row within the partition
MAX           Calculates the maximum value
MIN           Calculates the minimum value
SUM           Calculates the sum of all of the values
RANK          Calculates the rank of a row
FIRST_VALUE   Returns the value in the first row of a window
LAST_VALUE    Returns the value in the last row of a window
NTH_VALUE     Returns the value in a specified (n) row of a window
DENSE_RANK    Returns the rank of the current row with no gaps
PERCENT_RANK  Returns the relative rank of the current row
CUME_DIST     Returns the cumulative distribution of rows
NTILE         Returns an integer ranging between 1 and the argument value, dividing the partitions as equally as possible

3.2.3.1.5 System Functions

System functions allow you to execute actions in the system, such as aborting a query or getting information about system processes.
Function            Description
------------------  -----------
EXPLAIN             Returns a static query plan for a statement
SHOW_CONNECTIONS    Returns a list of jobs and statements on the current worker
SHOW_LOCKS          Returns any existing locks in the database
SHOW_NODE_INFO      Returns a query plan for an actively running statement with timing information
SHOW_SERVER_STATUS  Shows running statements across the cluster
SHOW_VERSION        Returns the version of SQream DB
STOP_STATEMENT      Stops a query (or statement) if it is currently running

3.2.3.1.6 Workload Management Functions

Function                   Description
-------------------------  -----------
SUBSCRIBE_SERVICE          Add a SQream DB worker to a service queue
UNSUBSCRIBE_SERVICE        Remove a SQream DB worker from a service queue
SHOW_SUBSCRIBED_INSTANCES  Return a list of service queues and workers

3.2.3.1.6.1 Built-In Scalar functions

Built-in scalar functions return one value per call.

3.2.3.1.6.2 & (bitwise AND)

Returns the bitwise AND of two numeric expressions

3.2.3.1.6.3 Syntax

    expr1 & expr2 --> integer

    expr1 ::= integer
    expr2 ::= integer

3.2.3.1.6.4 Arguments

Parameter     Description
------------  -----------
expr1, expr2  Integer expressions

3.2.3.1.6.5 Returns

Returns an integer that is the bitwise AND of the inputs.

3.2.3.1.6.6 Notes

• If either value is NULL, the result is NULL.

3.2.3.1.6.7 Examples

    master=> SELECT 16 & 24;
    16
    master=> SELECT 101 & 110;
    100
    master=> SELECT 32 & 64;
    0

    master=> CREATE TABLE bit(b1 int, b2 int, b3 int);
    executed
    master=> INSERT INTO bit VALUES (1,2,3), (2, 4, 6), (4, 2, 6), (2, 8, 16), (null, null, 64), (5, 3, 1), (6, 1, 0);
    executed
    master=> SELECT b1, b2, b3, b1 & b2, b2 & b3, b1 & b3 FROM bit;
    b1 | b2 | b3 | ?column? | ?column?0 | ?column?1
    ---+----+----+----------+-----------+----------
     1 |  2 |  3 |        0 |         2 |         1
     2 |  4 |  6 |        0 |         4 |         2
     4 |  2 |  6 |        0 |         2 |         4
     2 |  8 | 16 |        0 |         0 |         0
       |    | 64 |          |           |
     5 |  3 |  1 |        1 |         1 |         1
     6 |  1 |  0 |        0 |         0 |         0

3.2.3.1.6.8 ~ (bitwise NOT)

Returns the bitwise NOT (negation) of a numeric expression. This is the bitwise complement.

3.2.3.1.6.9 Syntax

    ~ expr --> integer

    expr ::= integer

3.2.3.1.6.10 Arguments

Parameter  Description
---------  -----------
expr       Integer expression

3.2.3.1.6.11 Returns

Returns an integer that is the bitwise NOT of the input.

3.2.3.1.6.12 Notes

• If the value is NULL, the result is NULL.

3.2.3.1.6.13 Examples

    master=> SELECT ~16, ~24;
    -17,-25
    master=> SELECT ~101, ~110;
    -102,-111
    master=> SELECT ~32, ~64;
    -33,-65

    master=> CREATE TABLE bit(b1 int, b2 int, b3 int);
    executed
    master=> INSERT INTO bit VALUES (1,2,3), (2, 4, 6), (4, 2, 6), (2, 8, 16), (null, null, 64), (5, 3, 1), (6, 1, 0);
    executed
    master=> SELECT b1, b2, b3, ~b1, ~b2, ~b3 FROM bit;
    b1 | b2 | b3 | ?column? | ?column?0 | ?column?1
    ---+----+----+----------+-----------+----------
     1 |  2 |  3 |       -2 |        -3 |        -4
     2 |  4 |  6 |       -3 |        -5 |        -7
     4 |  2 |  6 |       -5 |        -3 |        -7
     2 |  8 | 16 |       -3 |        -9 |       -17
       |    | 64 |          |           |       -65
     5 |  3 |  1 |       -6 |        -4 |        -2
     6 |  1 |  0 |       -7 |        -2 |        -1

3.2.3.1.6.14 | (bitwise OR)

Returns the bitwise OR of two numeric expressions

3.2.3.1.6.15 Syntax

    expr1 | expr2 --> integer

    expr1 ::= integer
    expr2 ::= integer

3.2.3.1.6.16 Arguments

Parameter     Description
------------  -----------
expr1, expr2  Integer expressions

3.2.3.1.6.17 Returns

Returns an integer that is the bitwise OR of the inputs.

3.2.3.1.6.18 Notes

• If either value is NULL, the result is NULL.
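The NULL rule shared by the bitwise AND, NOT and OR operators can be checked outside the database. The Python sketch below is an illustration only, not SQream DB code: it models SQL NULL as `None`, and the helper names (`null_safe`, `bit_and`, `bit_or`, `bit_not`) are hypothetical. Python's `&`, `|` and `~` on integers give the same values as the scalar examples in this section.

```python
from typing import Callable, Optional


def null_safe(op: Callable[..., int]) -> Callable[..., Optional[int]]:
    """Wrap an integer operation so that any None (SQL NULL) input yields None."""
    def wrapped(*args: Optional[int]) -> Optional[int]:
        if any(a is None for a in args):
            return None
        return op(*args)
    return wrapped


bit_and = null_safe(lambda a, b: a & b)
bit_or = null_safe(lambda a, b: a | b)
bit_not = null_safe(lambda a: ~a)

# Same rows as the `bit` table used in the examples of this section.
rows = [(1, 2, 3), (2, 4, 6), (4, 2, 6), (2, 8, 16),
        (None, None, 64), (5, 3, 1), (6, 1, 0)]

for b1, b2, b3 in rows:
    # Columns: b1 & b2, b1 | b2, ~b3
    print(b1, b2, b3, bit_and(b1, b2), bit_or(b1, b2), bit_not(b3))
```

The row with NULL inputs prints `None` for the two binary operators and a value only for `~b3`, matching the blank cells in the example output tables.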
3.2.3.1.6.19 Examples

    master=> SELECT 16 | 24;
    24
    master=> SELECT 101 | 110;
    111
    master=> SELECT 32 | 64;
    96

    master=> CREATE TABLE bit(b1 int, b2 int, b3 int);
    executed
    master=> INSERT INTO bit VALUES (1,2,3), (2, 4, 6), (4, 2, 6), (2, 8, 16), (null, null, 64), (5, 3, 1), (6, 1, 0);
    executed
    master=> SELECT b1, b2, b3, b1 | b2, b2 | b3, b1 | b3 FROM bit;
    b1 | b2 | b3 | ?column? | ?column?0 | ?column?1
    ---+----+----+----------+-----------+----------
     1 |  2 |  3 |        3 |         3 |         3
     2 |  4 |  6 |        6 |         6 |         6
     4 |  2 |  6 |        6 |         6 |         6
     2 |  8 | 16 |       10 |        24 |        18
       |    | 64 |          |           |
     5 |  3 |  1 |        7 |         3 |         5
     6 |  1 |  0 |        7 |         1 |         6

3.2.3.1.6.20 << (bitwise shift left)

Returns the bitwise shift left of a numeric expression

3.2.3.1.6.21 Syntax

    expr << n --> integer

    expr ::= integer
    n ::= integer

3.2.3.1.6.22 Arguments

Parameter  Description
---------  -----------
expr       Integer expressions
n          Number of bits to shift by

3.2.3.1.6.23 Returns

Returns an integer that is the input shifted left by n positions.

3.2.3.1.6.24 Notes

• If either value is NULL, the result is NULL.

3.2.3.1.6.25 Examples

    master=> SELECT 16 << 1;
    32
    master=> SELECT 16 << 2;
    64
    master=> select 1 << 1;
    2

    master=> CREATE TABLE bit(b1 int, b2 int, b3 int);
    executed
    master=> INSERT INTO bit VALUES (1,2,3), (2, 4, 6), (4, 2, 6), (2, 8, 16), (null, null, 64), (5, 3, 1), (6, 1, 0);
    executed
    master=> SELECT b1, b2, b3, b1 << b2, b2 << b3, b1 << 1 FROM bit;
    b1 | b2 | b3 | ?column? | ?column?0 | ?column?1
    ---+----+----+----------+-----------+----------
     1 |  2 |  3 |        4 |        16 |         2
     2 |  4 |  6 |       32 |       256 |         4
     4 |  2 |  6 |       16 |       128 |         8
     2 |  8 | 16 |      512 |    524288 |         4
       |    | 64 |          |           |
     5 |  3 |  1 |       40 |         6 |        10
     6 |  1 |  0 |       12 |         1 |        12

3.2.3.1.6.26 >> (bitwise shift right)

Returns the bitwise shift right of a numeric expression

3.2.3.1.6.27 Syntax

    expr >> n --> integer

    expr ::= integer
    n ::= integer
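For non-negative operands, shifting left by n multiplies by 2^n and shifting right by n divides by 2^n and discards the remainder. The Python sketch below illustrates this for both shift operators. It is an assumption-laden stand-in rather than SQream DB code: the function names are hypothetical, SQL NULL is modeled as `None`, and Python integers are unbounded, so the sketch does not model overflow of fixed-width SQL integer types.

```python
from typing import Optional


def shift_left(expr: Optional[int], n: Optional[int]) -> Optional[int]:
    """expr << n, with None (SQL NULL) propagating to the result."""
    if expr is None or n is None:
        return None
    return expr << n


def shift_right(expr: Optional[int], n: Optional[int]) -> Optional[int]:
    """expr >> n, with None (SQL NULL) propagating to the result."""
    if expr is None or n is None:
        return None
    return expr >> n


print(shift_left(16, 1), shift_left(16, 2), shift_left(8, 16))
print(shift_right(16, 1), shift_right(16, 2), shift_right(1, 1))
print(shift_left(None, 64), shift_right(None, 1))
```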
3.2.3.1.6.28 Arguments

Parameter  Description
---------  -----------
expr       Integer expressions
n          Number of bits to shift by

3.2.3.1.6.29 Returns

Returns an integer that is the input shifted right by n positions.

3.2.3.1.6.30 Notes

• If either value is NULL, the result is NULL.

3.2.3.1.6.31 Examples

    master=> SELECT 16 >> 1;
    8
    master=> SELECT 16 >> 2;
    4
    master=> select 1 >> 1;
    0

    master=> CREATE TABLE bit(b1 int, b2 int, b3 int);
    executed
    master=> INSERT INTO bit VALUES (1,2,3), (2, 4, 6), (4, 2, 6), (2, 8, 16), (null, null, 64), (5, 3, 1), (6, 1, 0);
    executed
    master=> SELECT b1, b2, b3, b1 >> b2, b2 >> b3, b1 >> 1 FROM bit;
    b1 | b2 | b3 | ?column? | ?column?0 | ?column?1
    ---+----+----+----------+-----------+----------
     1 |  2 |  3 |        0 |         0 |         0
     2 |  4 |  6 |        0 |         0 |         1
     4 |  2 |  6 |        1 |         0 |         2
     2 |  8 | 16 |        0 |         0 |         1
       |    | 64 |          |           |
     5 |  3 |  1 |        0 |         1 |         2
     6 |  1 |  0 |        3 |         1 |         3

3.2.3.1.6.32 XOR (bitwise XOR)

Returns the bitwise XOR of two numeric expressions

3.2.3.1.6.33 Syntax

    XOR(expr1, expr2) --> integer

    expr1 ::= integer
    expr2 ::= integer

3.2.3.1.6.34 Arguments

Parameter     Description
------------  -----------
expr1, expr2  Integer expressions

3.2.3.1.6.35 Returns

Returns an integer that is the bitwise XOR of the inputs.

3.2.3.1.6.36 Notes

• If either value is NULL, the result is NULL.

3.2.3.1.6.37 Examples

    master=> SELECT XOR(16, 24);
    8
    master=> SELECT XOR(101, 110);
    11
    master=> SELECT XOR(32, 64);
    96
    master=> SELECT XOR(512, 512);
    0

    master=> CREATE TABLE bit(b1 int, b2 int, b3 int);
    executed
    master=> INSERT INTO bit VALUES (1,2,3), (2, 4, 6), (4, 2, 6), (2, 8, 16), (null, null, 64), (5, 3, 1), (6, 1, 0);
    executed
    master=> SELECT b1, b2, b3, xor(b1, b2), xor(b2, b3), xor(b1, b3) FROM bit;
    b1 | b2 | b3 | xor | xor0 | xor1
    ---+----+----+-----+------+-----
     1 |  2 |  3 |   3 |    1 |    2
     2 |  4 |  6 |   6 |    2 |    4
     4 |  2 |  6 |   6 |    4 |    2
     2 |  8 | 16 |  10 |   24 |   18
       |    | 64 |     |      |
     5 |  3 |  1 |   6 |    2 |    4
     6 |  1 |  0 |   7 |    1 |    6

3.2.3.1.6.38 BETWEEN

BETWEEN is used to simplify range tests, by returning TRUE when the input is within two boundaries.

3.2.3.1.6.39 Syntax

    expr [ NOT ] BETWEEN lower_bound AND upper_bound

3.2.3.1.6.40 Arguments

Parameter                  Description
---------                  -----------
expr                       A general value expression or a literal.
lower_bound, upper_bound   Lower and upper bounds, of the same data type as expr

3.2.3.1.6.41 Returns

Returns TRUE when expr is within the bounds, or FALSE otherwise.

3.2.3.1.6.42 Notes

• expr BETWEEN X AND Y is equivalent to expr >= X AND expr <= Y.
• The upper boundary must be greater than the lower boundary.

3.2.3.1.6.43 Examples

3.2.3.1.6.44 BETWEEN

    farm=> SELECT name, num_eyes FROM cool_animals WHERE num_eyes BETWEEN 5 and 8;
    name           | num_eyes
    ---------------+---------
    Spider         |        8
    Starfish       |        5
    Praying mantis |        5

3.2.3.1.6.45 NOT BETWEEN

    farm=> SELECT name, num_eyes FROM cool_animals WHERE num_eyes NOT BETWEEN 5 and 8;
    name           | num_eyes
    ---------------+---------
    Human          |        2
    Horseshoe crab |       10
    Box Jellyfish  |       24
    Fox            |        2
    Possum         |        2

3.2.3.1.6.46 CASE

A case expression can test a conditional expression, and depending on the result, evaluate additional expressions.

CASE works like nested IF ... THEN ... ELSE .... When one of the conditions evaluates to TRUE, the evaluation stops and the expression or operand associated with the result (THEN ...) is evaluated. Otherwise, the evaluation continues to the next condition. If no conditional matches, the ELSE result is returned.

3.2.3.1.6.47 Syntax

    case_expression ::= searched_case | simple_case

    searched_case ::=
        CASE
            WHEN conditional_value_expr THEN result_value_expr
            [WHEN ...]
            [ELSE result_value_expr]
        END

    simple_case ::=
        CASE value_expr
            WHEN value_expr THEN result_value_expr
            [WHEN ...]
            [ELSE result_value_expr]
        END

3.2.3.1.6.48 Arguments

Parameter                Description
---------                -----------
conditional_value_expr   A condition that evaluates to a boolean (TRUE, FALSE, or NULL)
value_expr               A general value expression or a literal.
result_value_expr        A general value expression or a literal. All data types of result_value_expr must match within a CASE.

3.2.3.1.6.49 Returns

Returns a value of the same type as result_value_expr.

3.2.3.1.6.50 Notes

• SQream DB does not support the IF .. THEN .. ELSE syntax. A CASE expression can replace it.
• If no ELSE is specified, the default result will be NULL.
• All WHEN ... expressions must have the same data type.
• All THEN ... and the ELSE results must have the same data type.

3.2.3.1.6.51 Simple case expression

    simple_case ::=
        CASE value_expr
            WHEN value_expr THEN result_value_expr
            [WHEN ...]
            [ELSE result_value_expr]
        END

A simple case expression evaluates the value_expr that follows CASE and compares it, in order, to the value_expr of each WHEN. It then evaluates and returns the result from the THEN of the first match. If no matches are found, the ELSE result is returned, or NULL if no ELSE is specified.

3.2.3.1.6.52 Searched case expression

    searched_case ::=
        CASE
            WHEN conditional_value_expr THEN result_value_expr
            [WHEN ...]
            [ELSE result_value_expr]
        END

A searched case expression evaluates each conditional_value_expr in the order listed. At the first conditional_value_expr that evaluates to TRUE, the result of the THEN will be returned. If no matches are found, the ELSE result is returned, or NULL if no ELSE is specified.

3.2.3.1.6.53 Examples

3.2.3.1.6.54 Simple case

    master=> SELECT name, CASE num_eyes
                     WHEN 1 THEN 'Cyclops'
                     WHEN 2 THEN 'Binocular'
                     WHEN 5 THEN 'Pentocular'
                     WHEN 8 THEN 'Octocular'
                     ELSE 'Other'
                 END
             FROM (VALUES ('Copepod',1), ('Spider',8), ('Starfish', 5), ('Praying mantis', 5), ('Human (average)', 2), ('Eagle', 2), ('Horseshoe crab', 10))
             AS cool_animals(name, num_eyes);
    name            | ?column?
    ----------------+-----------
    Copepod         | Cyclops
    Spider          | Octocular
    Starfish        | Pentocular
    Praying mantis  | Pentocular
    Human (average) | Binocular
    Eagle           | Binocular
    Horseshoe crab  | Other

3.2.3.1.6.55 Searched case

    SELECT age, CASE
            WHEN age < 3 THEN 'Toddler'
            WHEN age < 12 THEN 'Child'
            WHEN age < 20 THEN 'Teenager'
            WHEN age < 34 THEN 'Young adult'
            WHEN age < 65 THEN 'Adult'
            ELSE 'Senior'
        END AS "Age Group"
    FROM (VALUES (2), (5), (15), (19), (32), (44), (87)) AS t(age);
    age | Age Group
    ----+------------
      2 | Toddler
      5 | Child
     15 | Teenager
     19 | Teenager
     32 | Young adult
     44 | Adult
     87 | Senior

3.2.3.1.6.56 Replacing IF with CASE

As SQream DB does not support the IF function found on some other DBMSs, use CASE instead.

    -- MySQL syntax:
    IF (age > 65, "Senior", "Other")

is functionally identical to:

    CASE WHEN age > 65 THEN 'Senior' ELSE 'Other' END

3.2.3.1.6.57 COALESCE

Evaluates and returns the first non-null expression, or NULL if all expressions are NULL.

3.2.3.1.6.58 Syntax

    COALESCE( expr [, ...] )

3.2.3.1.6.59 Arguments

Parameter        Description
---------        -----------
expr1, expr2,…   Expressions of the same type

3.2.3.1.6.60 Returns

Returns the first non-null argument or NULL if all expressions are null.

3.2.3.1.6.61 Notes

• All expressions must have the same type, which is also the type of the result.
• See also ISNULL. ISNULL(x,y) is equivalent to COALESCE(x,y).

3.2.3.1.6.62 Examples

3.2.3.1.6.63 Coalesce

    master=> SELECT COALESCE(NULL, NULL, NULL, 5);
    5

3.2.3.1.6.64 Replacing NULL values in aggregations

In some cases, replacing NULL values with a default can affect results.

    master=> SELECT
                 AVG(num_eyes),
                 AVG(COALESCE(num_eyes,2)) AS "Corrected average"
             FROM (
                 VALUES ('Copepod',1),('Spider',8)
                 ,('Starfish',5),('Praying mantis',5)
                 ,('Human (average)',2),('Eagle',2)
                 ,('Horseshoe crab',10),('Kiwi',NULL)
                 ,('Fox',NULL),('Badger',NULL)
             ) AS cool_animals (name , num_eyes);
    avg | Corrected average
    ----+------------------
      4 |                 3

3.2.3.1.6.65 IN

IN is used to test if an expression is contained in a list of values. This is also known as set membership.

3.2.3.1.6.66 Syntax

    expr [ NOT ] IN ( value_expr [, ...])

3.2.3.1.6.67 Arguments

Parameter            Description
---------            -----------
expr                 A general value expression or a literal to test.
value_expr [, ...]   A comma separated list of literal values of the same data type as expr

3.2.3.1.6.68 Returns

Returns TRUE when expr is contained in the set, or FALSE otherwise.

3.2.3.1.6.69 Notes

• NULL is never equal to NULL, so to check if a value is NULL use IS NULL instead.
• Using IN with more than 100 literal values can slow down query compilation. If your query relies on more than 100 values, consider using a lookup table and an inner join.

3.2.3.1.6.70 Examples

3.2.3.1.6.71 IN

    farm=> SELECT name, num_eyes FROM cool_animals WHERE num_eyes IN (8, 10);
    name           | num_eyes
    ---------------+---------
    Spider         |        8
    Horseshoe crab |       10

3.2.3.1.6.72 NOT IN

    farm=> SELECT name, num_eyes FROM cool_animals WHERE num_eyes NOT IN (8, 10);
    name          | num_eyes
    --------------+---------
    Box Jellyfish |       24
    Human         |        2
    Fox           |        2
    Possum        |        2

3.2.3.1.6.73 IS_ASCII

IS_ASCII is used to test if a string contains only ASCII characters.

3.2.3.1.6.74 Syntax

    IS_ASCII( expr ) --> BOOL

3.2.3.1.6.75 Arguments

Parameter   Description
---------   -----------
expr        A general value expression or a literal to test.

3.2.3.1.6.76 Returns

Returns TRUE when expr contains only ASCII characters (0-127) or FALSE otherwise.

3.2.3.1.6.77 Notes

• If the expression to test is NULL, IS_ASCII returns NULL.

3.2.3.1.6.78 Examples

For these examples, consider the following table and contents:

    CREATE TABLE dictionary (id INT NOT NULL, fw TEXT(30), en VARCHAR(30));

    INSERT INTO dictionary VALUES (1, '', 'Let''s go'), (2, '', 'Cheers'), (3, 'L''chaim', 'Cheers');

3.2.3.1.6.79 IS_ASCII

    m=> SELECT id, en, fw, IS_ASCII(fw) FROM dictionary;
    id | en       | fw      | is_ascii
    ---+----------+---------+---------
     1 | Let's go |         | false
     2 | Cheers   |         | false
     3 | Cheers   | L'chaim | true

3.2.3.1.6.80 IS NULL

IS NULL is used to test if an expression is null.

3.2.3.1.6.81 Syntax

    expr IS [ NOT ] NULL

3.2.3.1.6.82 Arguments

Parameter   Description
---------   -----------
expr        A general value expression or a literal to test.

3.2.3.1.6.83 Returns

Returns TRUE when expr is NULL or FALSE otherwise.

3.2.3.1.6.84 Examples

For these examples, consider the following table and contents:

    CREATE TABLE t (id INT NOT NULL, name VARCHAR(30), weight INT);

    INSERT INTO t VALUES (1, 'Kangaroo', 120), (2, 'Koala', 20), (3, 'Wombat', 60)
        ,(4, 'Kappa', NULL),(5, 'Echidna', 8),(6, 'Chupacabra', NULL)
        ,(7, 'Kraken', NULL);

3.2.3.1.6.85 IS NULL

    m=> SELECT * FROM t WHERE weight IS NULL;
    id | name       | weight
    ---+------------+-------
     4 | Kappa      |
     6 | Chupacabra |
     7 | Kraken     |

3.2.3.1.6.86 Using IS NOT NULL to filter unwanted results

    m=> SELECT AVG(weight) FROM t WHERE weight IS NOT NULL;
    avg
    ---
     52

3.2.3.1.6.87 ISNULL

Evaluates an expression and returns a default value if the expression is NULL.

3.2.3.1.6.88 Syntax

    ISNULL( expr, value_expr )

3.2.3.1.6.89 Arguments

Parameter    Description
---------    -----------
expr         Expression to evaluate
value_expr   Default value to return if expr is null

3.2.3.1.6.90 Returns

Returns either expr or value_expr.

3.2.3.1.6.91 Notes

• All expressions must have the same type, which is also the type of the result.
• See also COALESCE. ISNULL(x,y) is equivalent to COALESCE(x,y).
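
The equivalence stated in the notes can be sketched by applying both functions to the same column. This is a minimal illustration, assuming the table t from the IS NULL examples above; the column aliases are made up for this sketch. By the documented semantics, both result columns should hold the original weight where one is set and 0 where it is NULL.

```sql
-- Assumes table t (id, name, weight) from the IS NULL examples.
-- Both expressions substitute 0 for a NULL weight.
SELECT name,
       ISNULL(weight, 0)   AS weight_isnull,
       COALESCE(weight, 0) AS weight_coalesce
FROM t;
```

Because COALESCE accepts more than two arguments, a nested call such as ISNULL(a, ISNULL(b, c)) can be written more compactly as COALESCE(a, b, c).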

3.2.3.1.6.92 Examples

3.2.3.1.6.93 ISNULL

    master=> SELECT ISNULL(NULL, 5);
    5

3.2.3.1.6.94 Replacing NULL values in aggregations

In some cases, replacing NULL values with a default can affect results.

    master=> SELECT
                 AVG(num_eyes),
                 AVG(ISNULL(num_eyes,2)) AS "Corrected average"
             FROM (
                 VALUES ('Copepod',1),('Spider',8)
                 ,('Starfish',5),('Praying mantis',5)
                 ,('Human (average)',2),('Eagle',2)
                 ,('Horseshoe crab',10),('Kiwi',NULL)
                 ,('Fox',NULL),('Badger',NULL)
             ) AS cool_animals (name , num_eyes);
    avg | Corrected average
    ----+------------------
      4 |                 3

3.2.3.1.6.95 FROM_UNIXTS, FROM_UNIXTSMS

Converts a BIGINT representing a UNIX timestamp to a DATETIME value.

3.2.3.1.6.96 Syntax

    FROM_UNIXTS( expr ) --> DATETIME

    FROM_UNIXTSMS( expr ) --> DATETIME

3.2.3.1.6.97 Arguments

Parameter   Description
---------   -----------
expr        An integer containing a UNIX timestamp in seconds/milliseconds from EPOCH

3.2.3.1.6.98 Returns

• FROM_UNIXTS returns a DATETIME from a UNIX timestamp in seconds since EPOCH.
• FROM_UNIXTSMS returns a DATETIME from a UNIX timestamp in milliseconds since EPOCH.

3.2.3.1.6.99 Notes

To convert a DATETIME to a UNIX timestamp, see TO_UNIXTS.

3.2.3.1.6.100 Examples

3.2.3.1.6.101 Convert a UNIX timestamp to DATETIME

    master=> SELECT FROM_UNIXTS(0), FROM_UNIXTS(1575636057);
    from_unixts         | from_unixts0
    --------------------+--------------------
    1970-01-01 00:00:00 | 2019-12-06 12:40:57

3.2.3.1.6.102 TO_HEX

Converts an integer to a hexadecimal representation.

3.2.3.1.6.103 Syntax

    TO_HEX( expr ) --> VARCHAR

3.2.3.1.6.104 Arguments

Parameter   Description
---------   -----------
expr        An integer expression

3.2.3.1.6.105 Returns

• The hexadecimal representation of the number, of type VARCHAR.

3.2.3.1.6.106 Examples

For these examples, consider the following table and contents:

    CREATE TABLE cool_numbers(number BIGINT NOT NULL);

    INSERT INTO cool_numbers VALUES (42), (3735928559), (666), (3135097598), (3221229823);

3.2.3.1.6.107 Convert numbers to hexadecimal

    master=> SELECT TO_HEX(number) FROM cool_numbers;
    to_hex
    ------------------
    0x000000000000002a
    0x00000000deadbeef
    0x000000000000029a
    0x00000000baddcafe
    0x00000000c00010ff

3.2.3.1.6.108 TO_UNIXTS, TO_UNIXTSMS

Converts a DATETIME value to a BIGINT representing a UNIX timestamp.

3.2.3.1.6.109 Syntax

    TO_UNIXTS( expr ) --> BIGINT

    TO_UNIXTSMS( expr ) --> BIGINT

3.2.3.1.6.110 Arguments

Parameter   Description
---------   -----------
expr        A DATE or DATETIME expression

3.2.3.1.6.111 Returns

• TO_UNIXTS returns the UNIX timestamp in seconds since EPOCH.
• TO_UNIXTSMS returns the UNIX timestamp in milliseconds since EPOCH.

3.2.3.1.6.112 Notes

To convert a UNIX timestamp to a DATE or DATETIME, see FROM_UNIXTS.

3.2.3.1.6.113 Examples

3.2.3.1.6.114 Get the current UNIX timestamp

    master=> SELECT TO_UNIXTSMS(GETDATE()), TO_UNIXTS(GETDATE());
    to_unixtsms   | to_unixts
    --------------+-----------
    1575642811562 | 1575642811

3.2.3.1.6.115 Filter on a range of UNIX timestamps

Get the number of users that signed up during 2019:

    master=> SELECT COUNT(*) FROM users
             WHERE signup_ts BETWEEN TO_UNIXTS('2019-01-01') AND TO_UNIXTS('2019-12-31');
    count
    ------
    523428

3.2.3.1.6.116 CURDATE

Returns the current date of the system.

Note: This function is provided for compatibility with MySQL, IBM DB2 and ODBC.

3.2.3.1.6.117 Syntax

    CURDATE() --> DATE

3.2.3.1.6.118 Arguments

None

3.2.3.1.6.119 Returns

The current system date, with type DATE.

3.2.3.1.6.120 Notes

• This function is equivalent to CURRENT_DATE.
• To get the date and time, see CURRENT_TIMESTAMP.

3.2.3.1.6.121 Examples

3.2.3.1.6.122 Get the current system date

    master=> SELECT CURRENT_DATE, CURRENT_DATE(), CURDATE();
    date       | date0      | date1
    -----------+------------+-----------
    2019-12-07 | 2019-12-07 | 2019-12-07

3.2.3.1.6.123 CURRENT_DATE

Returns the current date of the system.

Note: This function has a special ANSI SQL form and can be called without parentheses.

3.2.3.1.6.124 Syntax

    CURRENT_DATE() --> DATE

    CURRENT_DATE --> DATE

3.2.3.1.6.125 Arguments

None

3.2.3.1.6.126 Returns

The current system date, with type DATE.

3.2.3.1.6.127 Notes

• This function has a special ANSI SQL form and can be called without parentheses.
• This function is equivalent to CURDATE.
• To get the date and time, see CURRENT_TIMESTAMP.

3.2.3.1.6.128 Examples

3.2.3.1.6.129 Get the current system date

    master=> SELECT CURRENT_DATE, CURRENT_DATE(), CURDATE();
    date       | date0      | date1
    -----------+------------+-----------
    2019-12-07 | 2019-12-07 | 2019-12-07

3.2.3.1.6.130 CURRENT_TIMESTAMP

Returns the current date and time of the system.

Note: This function has a special ANSI SQL form and can be called without parentheses.

3.2.3.1.6.131 Syntax

    CURRENT_TIMESTAMP() --> DATETIME

    CURRENT_TIMESTAMP --> DATETIME

3.2.3.1.6.132 Arguments

None

3.2.3.1.6.133 Returns

The current system date and time, with type DATETIME.

3.2.3.1.6.134 Notes

• This function has a special ANSI SQL form and can be called without parentheses.
• Aliases to this function include SYSDATE and GETDATE.
• To get the date only, see CURRENT_DATE.

3.2.3.1.6.135 Examples

3.2.3.1.6.136 Get the current system date and time

    master=> SELECT CURRENT_TIMESTAMP, CURRENT_TIMESTAMP(), SYSDATE, GETDATE();
    getdate             | getdate0            | getdate1            | getdate2
    --------------------+---------------------+---------------------+--------------------
    2019-12-07 23:04:26 | 2019-12-07 23:04:26 | 2019-12-07 23:04:26 | 2019-12-07 23:04:26

3.2.3.1.6.137 Find events that happen before this month

We will use TRUNC to get the date at the beginning of this month, and then filter.

    master=> SELECT COUNT(*) FROM cool_dates WHERE dt <= TRUNC(CURRENT_TIMESTAMP, month);
    count
    -----
        5

3.2.3.1.6.138 DATEADD

Adds or subtracts an interval to a DATE or DATETIME value.

Note: SQream DB does not support the INTERVAL ANSI syntax. Use DATEADD to add or subtract date intervals.
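
As a rough sketch of what the note means in practice, date arithmetic that would use INTERVAL in ANSI SQL can be expressed with DATEADD instead. The queries assume the cool_dates table created in the Examples for this function; the column aliases are illustrative, and the ANSI form appears only as a comment for comparison.

```sql
-- ANSI form, shown for comparison only (not supported by SQream DB):
--   SELECT d + INTERVAL '1' MONTH FROM cool_dates;

-- DATEADD equivalent: add one month to each date.
SELECT d, DATEADD(MONTH, 1, d) AS one_month_later FROM cool_dates;

-- A negative number subtracts: one week before each date.
SELECT d, DATEADD(WEEK, -1, d) AS one_week_earlier FROM cool_dates;
```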

3.2.3.1.6.139 Syntax

    DATEADD( interval, number, date_expr )

    interval ::=
        YEAR | YYYY | YY
        | QUARTER | QQ | Q
        | MONTH | MM | M
        | DAYOFYEAR | DOY | DY | Y
        | DAY | DD | D
        | WEEK | WK | WW
        | HOUR | HH
        | MINUTE | MI | N
        | SECOND | SS | S
        | MILLISECOND | MS

3.2.3.1.6.140 Arguments

Parameter   Description
---------   -----------
interval    An interval representing a date part. See the table below or the syntax reference above for valid date parts
number      An integer expression
date_expr   A DATE or DATETIME expression

3.2.3.1.6.141 Valid date parts

Date part     Shorthand                      Definition
---------     ---------                      ----------
YEAR          YYYY, YY                       Year (0 - 9999)
QUARTER       QQ, Q                          Quarter (1-4)
MONTH         MM, M                          Month (1-12)
DAY           DD, D, DOY, DAYOFYEAR, DY, Y   Days (1-365)
WEEK          WK, WW                         Week of the year (1-52)
HOUR          HH                             Hour (0-23)
MINUTE        MI, N                          Minute (0-59)
SECOND        SS, S                          Seconds (0-59)
MILLISECOND   MS                             Milliseconds (0-999)

Note:
• The first day of the week is Sunday, when used with weekday.

3.2.3.1.6.142 Returns

• If HOUR, MINUTE, SECOND, or MILLISECOND are added to a DATE, the return type will be DATETIME.
• For all other date parts, the return type is the same as the argument supplied.

3.2.3.1.6.143 Notes

• Use negative numbers to subtract from a date.
• Use DATEADD instead of manually figuring out dates. For example, adding a day to February 28th, 2020 should result in February 29th, 2020. Adding another day should result in March 1st, not February 30th.
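
The leap-day case from the note can be sketched as a single query. The column aliases are made up for this illustration. Since 2020 is a leap year, the first expression should land on February 29th, 2020 and the second on March 1st, 2020.

```sql
-- Adding one day to February 28th, 2020 reaches the leap day;
-- adding two days rolls over into March.
SELECT DATEADD(DAY, 1, '2020-02-28') AS leap_day,
       DATEADD(DAY, 2, '2020-02-28') AS first_of_march;
```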

3.2.3.1.6.144 Examples

For these examples, consider the following table and contents:

    CREATE TABLE cool_dates(name VARCHAR(40), d DATE, dt DATETIME);

    INSERT INTO cool_dates VALUES ('Marty McFly goes back to this time','1955-11-05','1955-11-05 01:21:00.000')
        ,('Marty McFly came from this time', '1985-10-26', '1985-10-26 01:22:00.000')
        ,('Vesuvius erupts', '79-08-24', '79-08-24 13:00:00.000')
        ,('1997 begins', '1997-01-01', '1997-01-01')
        ,('1997 ends', '1997-12-31','1997-12-31 23:59:59.999');

3.2.3.1.6.145 Add a quarter to each date

    master=> SELECT d as original_date, DATEADD(QUARTER, 1, d) as next_quarter FROM cool_dates;
    original_date | next_quarter
    --------------+-------------
    1955-11-05    | 1956-02-05
    1985-10-26    | 1986-01-26
    0079-08-24    | 0079-11-24
    1997-01-01    | 1997-04-01
    1997-12-31    | 1998-03-31

3.2.3.1.6.146 Getting next month’s date

    master=> SELECT CURRENT_DATE,DATEADD(MONTH, 1, CURRENT_DATE);
    date       | dateadd
    -----------+-----------
    2019-12-07 | 2020-01-07

3.2.3.1.6.147 Filtering +- 50 years from a specific date

    master=> SELECT name, dt as datetime FROM cool_dates
             WHERE dt BETWEEN DATEADD(YEAR,-50,'1955-06-01') AND DATEADD(YEAR,50,'1955-06-01');
    name                               | datetime
    -----------------------------------+--------------------
    Marty McFly goes back to this time | 1955-11-05 01:21:00
    Marty McFly came from this time    | 1985-10-26 01:22:00
    1997 begins                        | 1997-01-01 00:00:00
    1997 ends                          | 1997-12-31 23:59:59

3.2.3.1.6.148 Check if a year is a leap year

The expression returns TRUE for a leap year, because adding a day to February 28th gives February 29th only in a leap year.

    -- Should return true for 2020:
    master=> SELECT DATEPART(MONTH, DATEADD(DAY,1,'2020-02-28')) = 2 AS "2020 is a leap year";
    2020 is a leap year
    -------------------
    true

    -- Should return false for 2021:
    master=> SELECT DATEPART(MONTH, DATEADD(DAY,1,'2021-02-28')) = 2 AS "2021 is a leap year";
    2021 is a leap year
    -------------------
    false

3.2.3.1.6.149 DATEDIFF

Calculates the difference between two DATE or DATETIME expressions, in terms of a specific date part.

Note: Results are given in integers, rather than INTERVAL, which SQream DB does not support.

3.2.3.1.6.150 Syntax

    DATEDIFF( interval, date_expr1, date_expr2 ) --> INT

    interval ::=
        YEAR | YYYY | YY
        | QUARTER | QQ | Q
        | MONTH | MM | M
        | DAYOFYEAR | DY | Y
        | DAY | DD | D
        | WEEK | WK | WW
        | HOUR | HH
        | MINUTE | MI | N
        | SECOND | SS | S
        | MILLISECOND | MS

3.2.3.1.6.151 Arguments

Parameter                Description
---------                -----------
interval                 An interval representing a date part. See the table below or the syntax reference above for valid date parts
date_expr1, date_expr2   A DATE or DATETIME expression. The function calculates date_expr2 - date_expr1.

3.2.3.1.6.152 Valid date parts

Date part     Shorthand                 Definition
---------     ---------                 ----------
YEAR          YYYY, YY                  Year (0 - 9999)
QUARTER       QQ, Q                     Quarter (1-4)
MONTH         MM, M                     Month (1-12)
DAY           DD, D, DAYOFYEAR, DY, Y   Days (1-365)
WEEK          WK, WW                    Week of the year (1-52)
HOUR          HH                        Hour (0-23)
MINUTE        MI, N                     Minute (0-59)
SECOND        SS, S                     Seconds (0-59)
MILLISECOND   MS                        Milliseconds (0-999)

3.2.3.1.6.153 Returns

An integer representing the number of date part units (e.g. years, days, months, hours, etc.) between date_expr2 and date_expr1.

3.2.3.1.6.154 Notes

• Only the selected date part is used to calculate the difference. For example, the difference between '1997-12-31' and '1997-01-01' in years is 0, even though they are 364 days apart.
• A negative value means that date_expr1 is after date_expr2.

3.2.3.1.6.155 Examples

For these examples, consider the following table and contents:

    CREATE TABLE cool_dates(name VARCHAR(40), d DATE, dt DATETIME);

    INSERT INTO cool_dates VALUES ('Marty McFly goes back to this time','1955-11-05','1955-11-05 01:21:00.000')
        ,('Marty McFly came from this time', '1985-10-26', '1985-10-26 01:22:00.000')
        ,('Vesuvius erupts', '79-08-24', '79-08-24 13:00:00.000')
        ,('1997 begins', '1997-01-01', '1997-01-01')
        ,('1997 ends', '1997-12-31','1997-12-31 23:59:59.999');

3.2.3.1.6.156 Find out how far past events are

3.2.3.1.6.157 In years

    master=> SELECT d AS original_date, DATEDIFF(YEAR, CURRENT_DATE, d) AS "was ... years ago" FROM cool_dates;
    original_date | was ... years ago
    --------------+------------------
    1955-11-05    |               -64
    1985-10-26    |               -34
    0079-08-24    |             -1940
    1997-01-01    |               -22
    1997-12-31    |               -22

3.2.3.1.6.158 In days

    master=> SELECT d AS original_date, DATEDIFF(DAY, CURRENT_DATE, d) AS "was ... days ago" FROM cool_dates;
    original_date | was ... days ago
    --------------+-----------------
    1955-11-05    |           -23408
    1985-10-26    |           -12460
    0079-08-24    |          -708675
    1997-01-01    |            -8375
    1997-12-31    |            -8011

3.2.3.1.6.159 In hours

Note:
• Use CURRENT_TIMESTAMP instead of CURRENT_DATE, to include the current time as well as date.
• In this example, we use dt which is a DATETIME column.

    master=> SELECT CURRENT_TIMESTAMP as "Now", dt AS "Original datetime", DATEDIFF(HOUR, CURRENT_TIMESTAMP, dt) AS "was ... hours ago" FROM cool_dates;
    Now                 | Original datetime   | was ... hours ago
    --------------------+---------------------+------------------
    2019-12-07 22:35:50 | 1955-11-05 01:21:00 |           -561813
    2019-12-07 22:35:50 | 1985-10-26 01:22:00 |           -299061
    2019-12-07 22:35:50 | 0079-08-24 13:00:00 |         -17008209
    2019-12-07 22:35:50 | 1997-01-01 00:00:00 |           -201022
    2019-12-07 22:35:50 | 1997-12-31 23:59:59 |           -192263

3.2.3.1.6.160 DATEPART

Extracts a date or time part from a DATE or DATETIME value.

Note: SQream DB also supports the ANSI EXTRACT syntax.
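
A short sketch comparing the two syntaxes side by side. It assumes the cool_dates table used throughout these examples; the column aliases are illustrative. As documented in the respective sections, DATEPART returns an INT while EXTRACT returns a DOUBLE, so the same year should appear as 1955 in one column and 1955.0 in the other.

```sql
-- DATEPART returns an INT; EXTRACT returns a DOUBLE.
SELECT d,
       DATEPART(YEAR, d)    AS year_int,
       EXTRACT(YEAR FROM d) AS year_double
FROM cool_dates;
```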

3.2.3.1.6.161 Syntax

    DATEPART( interval, date_expr ) --> INT

    interval ::=
        YEAR | YYYY | YY
        | QUARTER | QQ | Q
        | MONTH | MM | M
        | DAYOFYEAR | DOY | DY | Y
        | DAY | DD | D
        | WEEK | WK | WW
        | WEEKDAY | DW
        | HOUR | HH
        | MINUTE | MI | N
        | SECOND | SS | S
        | MILLISECOND | MS

3.2.3.1.6.162 Arguments

Parameter   Description
---------   -----------
interval    An interval representing a date part. See the table below or the syntax reference above for valid date parts
date_expr   A DATE or DATETIME expression

3.2.3.1.6.163 Valid date parts

Date part     Shorthand    Definition
---------     ---------    ----------
YEAR          YYYY, YY     Year (0 - 9999)
QUARTER       QQ, Q        Quarter (1-4)
MONTH         MM, M        Month (1-12)
DAYOFYEAR     DOY, DY, Y   Day of the year (1-365)
DAY           DD, D        Day of the month (1-31)
WEEK          WK, WW       Week of the year (1-52)
WEEKDAY       DW           Weekday / Day of week (1-7)
HOUR          HH           Hour (0-23)
MINUTE        MI, N        Minute (0-59)
SECOND        SS, S        Seconds (0-59)
MILLISECOND   MS           Milliseconds (0-999)

Note:
• The first day of the week is Sunday, when used with WEEKDAY.

3.2.3.1.6.164 Returns

• An integer representing the date part value

3.2.3.1.6.165 Notes

• All date parts work on a DATETIME.
• The HOUR, MINUTE, SECOND, and MILLISECOND date parts work only on DATETIME. Using them on DATE will result in an error.

3.2.3.1.6.166 Examples

For these examples, consider the following table and contents:

    CREATE TABLE cool_dates(name VARCHAR(40), d DATE, dt DATETIME);

    INSERT INTO cool_dates VALUES ('Marty McFly goes back to this time','1955-11-05','1955-11-05 01:21:00.000')
        ,('Marty McFly came from this time', '1985-10-26', '1985-10-26 01:22:00.000')
        ,('Vesuvius erupts', '79-08-24', '79-08-24 13:00:00.000')
        ,('1997 begins', '1997-01-01', '1997-01-01')
        ,('1997 ends', '1997-12-31','1997-12-31 23:59:59.999');

3.2.3.1.6.167 Break up a DATE into components

    master=> SELECT DATEPART(YEAR, d) AS year, DATEPART(MONTH, d) AS month, DATEPART(DAY,d) AS day,
             DATEPART(Q,d) AS quarter FROM cool_dates;
    year | month | day | quarter
    -----+-------+-----+--------
    1955 |    11 |   5 |       4
    1985 |    10 |  26 |       4
      79 |     8 |  24 |       3
    1997 |     1 |   1 |       1
    1997 |    12 |  31 |       4

3.2.3.1.6.168 Break up a DATETIME into time components

    master=> SELECT DATEPART(HOUR, dt) AS hour, DATEPART(MINUTE, dt) AS minute,
             DATEPART(SECOND,dt) AS seconds, DATEPART(MILLISECOND,dt) AS milliseconds
             FROM cool_dates;
    hour | minute | seconds | milliseconds
    -----+--------+---------+-------------
       1 |     21 |       0 |            0
       1 |     22 |       0 |            0
      13 |      0 |       0 |            0
       0 |      0 |       0 |            0
      23 |     59 |      59 |          999

3.2.3.1.6.169 Count number of rows grouped by quarter

Tip: Use ordinal aliases to avoid having to write complex functions in the GROUP BY clause. See Select lists for more information.

    master=> SELECT COUNT(*), DATEPART(Q, dt) AS quarter FROM cool_dates GROUP BY 2;
    count | quarter
    ------+--------
        1 |       1
        1 |       3
        3 |       4

3.2.3.1.6.170 EOMONTH

Returns a DATE or DATETIME value, reset to midnight on the last day of the month.

Note: This function is provided for SQL Server compatibility.

3.2.3.1.6.171 Syntax

    EOMONTH( date_expr )

3.2.3.1.6.172 Arguments

Parameter   Description
---------   -----------
date_expr   A DATE or DATETIME expression

3.2.3.1.6.173 Returns

The return type is the same as the argument supplied.

3.2.3.1.6.174 Notes

• The time value will be set to midnight on the last day of the month.
• See also TRUNC to find the first day of the month.
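
The pairing with TRUNC mentioned in the notes can be sketched as follows. The query assumes the cool_dates table from the Examples for this function; the aliases are illustrative. For d = '1955-11-05', the documented behavior of each function gives 1955-11-01 as the first day, 1955-11-30 as the last day, and therefore 30 days in the month.

```sql
-- First and last day of each date's month, plus the month length in days.
-- DATEDIFF counts the days between the two dates, so 1 is added to
-- include both endpoints.
SELECT d,
       TRUNC(d, MONTH) AS first_day,
       EOMONTH(d)      AS last_day,
       DATEDIFF(DAY, TRUNC(d, MONTH), EOMONTH(d)) + 1 AS days_in_month
FROM cool_dates;
```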

3.2.3.1.6.175 Examples

For these examples, consider the following table and contents:

    CREATE TABLE cool_dates(name VARCHAR(40), d DATE, dt DATETIME);

    INSERT INTO cool_dates VALUES ('Marty McFly goes back to this time','1955-11-05','1955-11-05 01:21:00.000')
        ,('Marty McFly came from this time', '1985-10-26', '1985-10-26 01:22:00.000')
        ,('Vesuvius erupts', '79-08-24', '79-08-24 13:00:00.000')
        ,('1997 begins', '1997-01-01', '1997-01-01')
        ,('1997 ends', '1997-12-31','1997-12-31 23:59:59.999');

3.2.3.1.6.176 Find last day of the month for a DATE

    master=> SELECT name, d AS date, EOMONTH(d) FROM cool_dates;
    name                               | date       | eomonth
    -----------------------------------+------------+-----------
    Marty McFly goes back to this time | 1955-11-05 | 1955-11-30
    Marty McFly came from this time    | 1985-10-26 | 1985-10-31
    Vesuvius erupts                    | 0079-08-24 | 0079-08-31
    1997 begins                        | 1997-01-01 | 1997-01-31
    1997 ends                          | 1997-12-31 | 1997-12-31

3.2.3.1.6.177 Find the last day of the month for a DATETIME

Note: The time value is reset to midnight, regardless of the original time value.

    master=> SELECT name, dt AS datetime, EOMONTH(dt) FROM cool_dates;
    name                               | datetime            | eomonth
    -----------------------------------+---------------------+--------------------
    Marty McFly goes back to this time | 1955-11-05 01:21:00 | 1955-11-30 00:00:00
    Marty McFly came from this time    | 1985-10-26 01:22:00 | 1985-10-31 00:00:00
    Vesuvius erupts                    | 0079-08-24 13:00:00 | 0079-08-31 00:00:00
    1997 begins                        | 1997-01-01 00:00:00 | 1997-01-31 00:00:00
    1997 ends                          | 1997-12-31 23:59:59 | 1997-12-31 00:00:00

3.2.3.1.6.178 EXTRACT

Extracts a date or time part from a DATE or DATETIME value.

Note: SQream DB also supports the SQL Server DATEPART syntax, which contains more date parts for use.

3.2.3.1.6.179 Syntax

    EXTRACT( interval FROM date_expr ) --> DOUBLE

    interval ::=
        YEAR
        | MONTH
        | WEEK
        | DOY
        | DAY
        | HOUR
        | MINUTE
        | SECOND
        | MILLISECONDS

3.2.3.1.6.180 Arguments

Parameter   Description
---------   -----------
interval    An interval representing a date part.
            See the table below or the syntax reference above for valid date parts
date_expr   A DATE or DATETIME expression

3.2.3.1.6.181 Valid date parts

Date part      Definition
---------      ----------
YEAR           Year (0.0 - 9999.0)
MONTH          Month (1.0-12.0)
DOY            Day of the year (1.0-365.0)
DAY            Day of the month (1.0-31.0)
WEEK           Week of the year (1.0-52.0)
HOUR           Hour (0.0-23.0)
MINUTE         Minute (0.0-59.0)
SECOND         Seconds (0.0-59.0)
MILLISECONDS   Milliseconds (0.0-999.0)

3.2.3.1.6.182 Returns

• A floating point number representing the date part value

3.2.3.1.6.183 Notes

• The HOUR, MINUTE, SECOND, and MILLISECONDS date parts work only on DATETIME. Using them on DATE will result in an error.

3.2.3.1.6.184 Examples

For these examples, consider the following table and contents:

    CREATE TABLE cool_dates(name VARCHAR(40), d DATE, dt DATETIME);

    INSERT INTO cool_dates VALUES ('Marty McFly goes back to this time','1955-11-05','1955-11-05 01:21:00.000')
        ,('Marty McFly came from this time', '1985-10-26', '1985-10-26 01:22:00.000')
        ,('Vesuvius erupts', '79-08-24', '79-08-24 13:00:00.000')
        ,('1997 begins', '1997-01-01', '1997-01-01')
        ,('1997 ends', '1997-12-31','1997-12-31 23:59:59.999');

3.2.3.1.6.185 Break up a DATE into components

    master=> SELECT EXTRACT(YEAR FROM d) AS year, EXTRACT(MONTH FROM d) AS month, EXTRACT(DAY FROM d) AS day
             FROM cool_dates;
    year   | month | day
    -------+-------+-----
    1955.0 |  11.0 |  5.0
    1985.0 |  10.0 | 26.0
      79.0 |   8.0 | 24.0
    1997.0 |   1.0 |  1.0
    1997.0 |  12.0 | 31.0

3.2.3.1.6.186 GETDATE

Returns the current date and time of the system.

Note: This function is provided for SQL Server compatibility.

3.2.3.1.6.187 Syntax

    GETDATE() --> DATETIME

3.2.3.1.6.188 Arguments

None

3.2.3.1.6.189 Returns

The current system date and time, with type DATETIME.

3.2.3.1.6.190 Notes

• Aliases to this function include CURRENT_TIMESTAMP and SYSDATE.
• To get the date only, see CURRENT_DATE.

3.2.3.1.6.191 Examples

3.2.3.1.6.192 Get the current system date and time

    master=> SELECT CURRENT_TIMESTAMP, CURRENT_TIMESTAMP(), SYSDATE, GETDATE();
    getdate             | getdate0            | getdate1            | getdate2
    --------------------+---------------------+---------------------+--------------------
    2019-12-07 23:04:26 | 2019-12-07 23:04:26 | 2019-12-07 23:04:26 | 2019-12-07 23:04:26

3.2.3.1.6.193 Find events that happen before this month

We will use TRUNC to get the date at the beginning of this month, and then filter.

    master=> SELECT COUNT(*) FROM cool_dates WHERE dt <= TRUNC(GETDATE(), month);
    count
    -----
        5

3.2.3.1.6.194 SYSDATE

Returns the current date and time of the system.

Note: This function is provided for Oracle compatibility and is called without parentheses.

3.2.3.1.6.195 Syntax

    SYSDATE --> DATETIME

3.2.3.1.6.196 Arguments

None

3.2.3.1.6.197 Returns

The current system date and time, with type DATETIME.

3.2.3.1.6.198 Notes

• This function has a special form and is called without parentheses.
• Aliases to this function include CURRENT_TIMESTAMP and GETDATE.
• To get the date only, see CURRENT_DATE.

3.2.3.1.6.199 Examples

3.2.3.1.6.200 Get the current system date and time

    master=> SELECT CURRENT_TIMESTAMP, CURRENT_TIMESTAMP(), SYSDATE, GETDATE();
    getdate             | getdate0            | getdate1            | getdate2
    --------------------+---------------------+---------------------+--------------------
    2019-12-07 23:04:26 | 2019-12-07 23:04:26 | 2019-12-07 23:04:26 | 2019-12-07 23:04:26

3.2.3.1.6.201 Find events that happen before this month

We will use TRUNC to get the date at the beginning of this month, and then filter.

    master=> SELECT COUNT(*) FROM cool_dates WHERE dt <= TRUNC(SYSDATE, month);
    count
    -----
        5

3.2.3.1.6.202 TRUNC

Truncates a DATE or DATETIME value to a specified resolution.

For example, truncating a DATE down to the nearest month returns the date of the first day of the month.

Note: This function is overloaded. The function TRUNC can also round numbers towards zero.

3.2.3.1.6.203 Syntax

    TRUNC( date_expr [ , interval ] )

    interval ::=
        YEAR | YYYY | YY
        | QUARTER | QQ | Q
        | MONTH | MM | M
        | DAY | DD | D
        | WEEK | WK | WW
        | HOUR | HH
        | MINUTE | MI | N
        | SECOND | SS | S
        | MILLISECOND | MS

3.2.3.1.6.204 Arguments

Parameter   Description
---------   -----------
date_expr   A DATE or DATETIME expression
interval    An interval representing a date part. See the table below or the syntax reference above for valid date parts. If not specified, sets the value to midnight and returns a DATETIME.

3.2.3.1.6.205 Valid date parts

Date part     Shorthand   Definition
---------     ---------   ----------
YEAR          YYYY, YY    Year (0 - 9999)
QUARTER       QQ, Q       Quarter (1-4)
MONTH         MM, M       Month (1-12)
DAY           DD, D       Day of the month (1-31)
WEEK          WK, WW      Week of the year (1-52)
HOUR          HH          Hour (0-23)
MINUTE        MI, N       Minute (0-59)
SECOND        SS, S       Seconds (0-59)
MILLISECOND   MS          Milliseconds (0-999)

3.2.3.1.6.206 Returns

If no date part is specified, the return type is DATETIME. Otherwise, the return type is the same as the argument supplied.

3.2.3.1.6.207 Notes

• All date parts work on a DATETIME.
• The HOUR, MINUTE, SECOND, and MILLISECOND date parts work only on DATETIME. Using them on DATE will result in an error.
• If no date part is specified, the DATE or DATETIME value will be set to midnight on the date value. See examples below.
• See also EOMONTH to find the last day of the month.
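
To illustrate how the chosen date part controls the resolution, the sketch below truncates the same DATETIME column at three levels. It assumes the cool_dates table from the Examples for this function; the aliases are illustrative. For dt = '1955-11-05 01:21:00', truncation should give the start of 1955, the start of November 1955, and 01:00:00 on November 5th, respectively.

```sql
-- The same value truncated to progressively finer resolutions.
-- Everything below the chosen date part is reset to its lowest value.
SELECT dt,
       TRUNC(dt, YEAR)  AS year_start,
       TRUNC(dt, MONTH) AS month_start,
       TRUNC(dt, HOUR)  AS hour_start
FROM cool_dates;
```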
3.2.3.1.6.208 Examples

For these examples, consider the following table and contents:

    CREATE TABLE cool_dates(name VARCHAR(40), d DATE, dt DATETIME);
    INSERT INTO cool_dates VALUES ('Marty McFly goes back to this time','1955-11-05','1955-11-05 01:21:00.000')
       ,('Marty McFly came from this time', '1985-10-26', '1985-10-26 01:22:00.000')
       ,('Vesuvius erupts', '79-08-24', '79-08-24 13:00:00.000')
       ,('1997 begins', '1997-01-01', '1997-01-01')
       ,('1997 ends', '1997-12-31','1997-12-31 23:59:59.999');

3.2.3.1.6.209 Set all DATE columns to DATETIME at midnight

    master=> SELECT name, d AS date, trunc(d) FROM cool_dates;
    name                               | date       | trunc
    -----------------------------------+------------+--------------------
    Marty McFly goes back to this time | 1955-11-05 | 1955-11-05 00:00:00
    Marty McFly came from this time    | 1985-10-26 | 1985-10-26 00:00:00
    Vesuvius erupts                    | 0079-08-24 | 0079-08-24 00:00:00
    1997 begins                        | 1997-01-01 | 1997-01-01 00:00:00
    1997 ends                          | 1997-12-31 | 1997-12-31 00:00:00

3.2.3.1.6.210 Find the first day of the month for dates

    master=> SELECT name, d AS date, trunc(d, MONTH) FROM cool_dates;
    name                               | date       | trunc
    -----------------------------------+------------+-----------
    Marty McFly goes back to this time | 1955-11-05 | 1955-11-01
    Marty McFly came from this time    | 1985-10-26 | 1985-10-01
    Vesuvius erupts                    | 0079-08-24 | 0079-08-01
    1997 begins                        | 1997-01-01 | 1997-01-01
    1997 ends                          | 1997-12-31 | 1997-12-01

3.2.3.1.6.211 Calculate number of hours from New Year's

Combine TRUNC with DATEDIFF to calculate the number of hours since the start of the year.

    master=> SELECT name, dt AS datetime
    .         , DATEDIFF(HOUR, trunc(dt, YEAR), dt) AS "Hours since New Years"
    .         FROM cool_dates;
    name                               | datetime            | Hours since New Years
    -----------------------------------+---------------------+----------------------
    Marty McFly goes back to this time | 1955-11-05 01:21:00 | 7393
    Marty McFly came from this time    | 1985-10-26 01:22:00 | 7153
    Vesuvius erupts                    | 0079-08-24 13:00:00 | 5653
    1997 begins                        | 1997-01-01 00:00:00 | 0
    1997 ends                          | 1997-12-31 23:59:59 | 8759

3.2.3.1.6.212 ABS

Returns the absolute (non-negative) value of a numeric expression.

3.2.3.1.6.213 Syntax

    ABS( expr )

3.2.3.1.6.214 Arguments

Parameter | Description
----------+-------------------
expr      | Numeric expression

3.2.3.1.6.215 Returns

Returns the same type as the argument supplied.

3.2.3.1.6.216 Notes

• If the value is NULL, the result is NULL.

3.2.3.1.6.217 Examples

For these examples, consider the following table and contents:

    CREATE TABLE cool_numbers(i INT, f DOUBLE);
    INSERT INTO cool_numbers VALUES (1,1.618033), (-12,-34)
       ,(22, 3.141592), (-26538, 2.7182818284)
       ,(NULL, NULL), (NULL,1.4142135623)
       ,(42,NULL), (-42, NULL)
       , (-474, 365);

3.2.3.1.6.218 Absolute value on an integer

    numbers=> SELECT ABS(-24);
    24

3.2.3.1.6.219 Absolute value on integer and floating point

    numbers=> SELECT i, ABS(i), f, ABS(f) FROM cool_numbers;
    i      | abs   | f    | abs0
    -------+-------+------+-----
    1      | 1     | 1.62 | 1.62
    -12    | 12    | -34  | 34
    22     | 22    | 3.14 | 3.14
    -26538 | 26538 | 2.72 | 2.72
           |       |      |
           |       | 1.41 | 1.41
    42     | 42    |      |
    -42    | 42    |      |
    -474   | 474   | 365  | 365

3.2.3.1.6.220 ACOS

Returns the inverse cosine value of a numeric expression.

3.2.3.1.6.221 Syntax

    ACOS( expr )

3.2.3.1.6.222 Arguments

Parameter | Description
----------+----------------------------------------
expr      | Numeric expression in range [-1.0, 1.0]

3.2.3.1.6.223 Returns

Always returns a floating point result of the inverse cosine, in radians.

3.2.3.1.6.224 Notes

• Valid inputs for an inverse cosine are in the range [-1.0, 1.0]. Inputs outside this range will result in an error.
• The result is given in radians, in the range [0, π].
• If the input value is NULL, the result is NULL.
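The domain and range rules for the inverse cosine can be illustrated with Python's math module. This is a sketch of the rules as stated, not SQream DB code; the helper name acos_checked is an assumption for this example, and None stands in for NULL.

```python
import math


def acos_checked(x):
    """Hypothetical helper: None (NULL) in, None out; error outside [-1, 1]."""
    if x is None:
        return None
    if not -1.0 <= x <= 1.0:
        raise ValueError(f"ACOS input {x} is outside [-1.0, 1.0]")
    return math.acos(x)


print(acos_checked(0))     # pi/2
print(acos_checked(1.0))   # lower end of the result range
print(acos_checked(-1.0))  # upper end of the result range, pi
```

Checking the domain before calling the function gives a clearer error message than letting the underlying math library reject the value.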
3.2.3.1.6.225 Examples

For these examples, consider the following table and contents:

    CREATE TABLE trig(i INT, f DOUBLE);
    INSERT INTO trig VALUES (0,0.0), (1,-0.866)
       ,(2,-0.707), (3,-0.5)
       ,(4,0.5), (5,0.707)
       ,(6, 0.866), (7, 1);

3.2.3.1.6.226 Inverse cosine of 0 (= π/2)

    numbers=> SELECT ACOS(0);
    acos
    ------------------
    1.5707963267948966

3.2.3.1.6.227 Computing inverse cosine for a column

    numbers=> SELECT f, ACOS(f) FROM trig;
    f     | acos
    ------+-----
    0     | 1.57
    -0.87 | 2.62
    -0.71 | 2.36
    -0.5  | 2.09
    0.5   | 1.05
    0.71  | 0.79
    0.87  | 0.52
    1     | 0

3.2.3.1.6.228 Arithmetic operators

Arithmetic operators are functions that can be used as infix operators or as prefix operators.

3.2.3.1.6.229 Syntax

    arithmetic_expr ::=
        value_expr arithmetic_infix_operator value_expr
        | arithmetic_unary_operator value_expr

    arithmetic_infix_operator ::=
        + | - | * | / | %

    arithmetic_unary_operator ::=
        + | -

3.2.3.1.6.230 Arguments

Parameter  | Description
-----------+-----------------------------------------------------------------------
value_expr | Numeric expression, or expression that can be cast to a numeric expression

3.2.3.1.6.231 Arithmetic operators

Operator  | Syntax | Description
----------+--------+---------------------------------------------------------------
+ (unary) | +a     | Converts a string to a numeric value. Identical to a :: double
+         | a + b  | Adds two expressions together
- (unary) | -a     | Negates a numeric expression
-         | a - b  | Subtracts b from a
*         | a * b  | Multiplies a by b
/         | a / b  | Divides a by b
%         | a % b  | Modulo of a by b. See also MOD, %

3.2.3.1.6.232 Notes

• If any input value is NULL, the result is NULL.

3.2.3.1.6.233 Examples

3.2.3.1.6.234 Coercing with a unary +

    numbers=> SELECT +'5';
    5.00

3.2.3.1.6.235 Using infix operators

    numbers=> SELECT 5*3 AS "5*3", 5+3 AS "5+3", 2-5 AS "2-5"
    .         , 2.0-5 AS "2.0-5", 11 % 5 AS "11%5", 11/5 AS "11/5", 11.0 / 5 AS "11.0/5";
    5*3 | 5+3 | 2-5 | 2.0-5 | 11%5 | 11/5 | 11.0/5
    ----+-----+-----+-------+------+------+-------
    15  | 8   | -3  | -3    | 1    | 2    | 2.2

3.2.3.1.6.236 ASIN

Returns the inverse sine value of a numeric expression.

3.2.3.1.6.237 Syntax

    ASIN( expr )

3.2.3.1.6.238 Arguments

Parameter | Description
----------+----------------------------------------
expr      | Numeric expression in range [-1.0, 1.0]

3.2.3.1.6.239 Returns

Always returns a floating point result of the inverse sine, in radians.

3.2.3.1.6.240 Notes

• Valid inputs for an inverse sine are in the range [-1.0, 1.0]. Inputs outside this range will result in an error.
• The result is given in radians, in the range [-π/2, π/2].
• If the input value is NULL, the result is NULL.

3.2.3.1.6.241 Examples

For these examples, consider the following table and contents:

    CREATE TABLE trig(i INT, f DOUBLE);
    INSERT INTO trig VALUES (0,0.0), (1,-0.866)
       ,(2,-0.707), (3,-0.5)
       ,(4,0.5), (5,0.707)
       ,(6, 0.866), (7, 1);

3.2.3.1.6.242 Inverse sine of 1 (= π/2)

    numbers=> SELECT ASIN(1);
    asin
    ------------------
    1.5707963267948966

3.2.3.1.6.243 Computing inverse sine for a column

    numbers=> SELECT f, ASIN(f) FROM trig;
    f     | asin
    ------+------
    0     | 0
    -0.87 | -1.05
    -0.71 | -0.79
    -0.5  | -0.52
    0.5   | 0.52
    0.71  | 0.79
    0.87  | 1.05
    1     | 1.57

3.2.3.1.6.244 ATAN

Returns the inverse tangent value of a numeric expression.

3.2.3.1.6.245 Syntax

    ATAN( expr )

3.2.3.1.6.246 Arguments

Parameter | Description
----------+-------------------
expr      | Numeric expression

3.2.3.1.6.247 Returns

Always returns a floating point result of the inverse tangent, in radians.

3.2.3.1.6.248 Notes

• The result is given in radians, in the range [-π/2, π/2].
• If the input value is NULL, the result is NULL.
3.2.3.1.6.249 Examples

For these examples, consider the following table and contents:

    CREATE TABLE trig(f DOUBLE);
    INSERT INTO trig VALUES (-9999), (-3), (-2), (-1.732)
       , (-1), (-0.5), (0), (0.5)
       ,(1), (1.732), (2), (3), (9999);

3.2.3.1.6.250 Inverse tangent of 1 (= π/4)

    numbers=> SELECT ATAN(1);
    atan
    ------------------
    0.7853981633974483

3.2.3.1.6.251 Computing inverse tangent for a column

    numbers=> SELECT f, ATAN(f) FROM trig;
    f      | atan
    -------+------------
    -9999  | -1.57069632
    -3     | -1.24904577
    -2     | -1.10714872
    -1.732 | -1.04718485
    -1     | -0.78539816
    -0.5   | -0.46364761
    0      | 0.0
    0.5    | 0.46364761
    1      | 0.78539816
    1.732  | 1.04718485
    2      | 1.10714872
    3      | 1.24904577
    9999   | 1.57069632

3.2.3.1.6.252 ATN2

Returns the inverse tangent of the ratio of y to x. For x > 0, ATN2(y,x) = ATAN(y/x).

This represents the angle, in radians, between the positive x-axis of a plane and the ray from the origin to the point (x, y), with positive sign for the upper half-plane.

3.2.3.1.6.253 Syntax

    ATN2( expr1, expr2 )

3.2.3.1.6.254 Arguments

Parameter | Description
----------+-------------------------------------------------
expr1     | Numeric expression representing the y component
expr2     | Numeric expression representing the x component

3.2.3.1.6.255 Returns

Always returns a floating point result of the inverse tangent, in radians.

3.2.3.1.6.256 Notes

• The result is given in radians, in the range (-π, π].
• If the input value is NULL, the result is NULL.

3.2.3.1.6.257 Examples

3.2.3.1.6.258 Inverse tangent of (1, 0) (= π/2)

    numbers=> SELECT ATN2(1,0);
    atn2
    ------------------
    1.5707963267948966

3.2.3.1.6.259 Inverse tangent of x=0.5, y=SQRT(3)/2 (= π/3, or 60°)

Use SQRT to simplify input and DEGREES to convert the result from radians.

    numbers=> SELECT ATN2(0.5*SQRT(3), 0.5), DEGREES(ATN2(0.5*SQRT(3), 0.5));
    atn2   | degrees
    -------+--------
    1.0472 | 60

3.2.3.1.6.260 CEILING / CEIL

Calculates the smallest integer greater than or equal to the numeric expression given.

See also ROUND, FLOOR.
3.2.3.1.6.261 Syntax

    CEILING( expr )
    CEIL ( expr ) --> DOUBLE

3.2.3.1.6.262 Arguments

Parameter | Description
----------+-------------------
expr      | Numeric expression

3.2.3.1.6.263 Returns

• CEIL always returns a floating point result.
• CEILING returns the same type as the argument supplied.

3.2.3.1.6.264 Notes

• If the input value is NULL, the result is NULL.

3.2.3.1.6.265 Examples

3.2.3.1.6.266 FLOOR vs. CEIL vs. ROUND

    numbers=> SELECT FLOOR(x), CEIL(x), ROUND(x)
    .         FROM (VALUES (0.0001), (-0.0001)
    .         , (PI()), (-2.718281), (500.1234)) as t(x);
    floor | ceil | round
    ------+------+------
    0     | 1    | 0
    -1    | 0    | 0
    3     | 4    | 3
    -3    | -2   | -3
    500   | 501  | 500

3.2.3.1.6.267 COS

Returns the cosine value of a numeric expression.

3.2.3.1.6.268 Syntax

    COS( expr )

3.2.3.1.6.269 Arguments

Parameter | Description
----------+-----------------------------------------------------
expr      | Numeric expression representing an angle in radians

3.2.3.1.6.270 Returns

Always returns a floating point result of the cosine.

3.2.3.1.6.271 Notes

• The result is in the range [-1, 1].
• If the input value is NULL, the result is NULL.

3.2.3.1.6.272 Examples

For these examples, consider the following table and contents:

    CREATE TABLE trig(f DOUBLE);
    INSERT INTO trig VALUES (0), (PI()/12), (PI()/8), (PI()/6)
       , (PI()/4), (PI()/3), (3*PI()/8), (5*PI()/12), (PI()/2);

3.2.3.1.6.273 Cosine of 0 (= 1)

    numbers=> SELECT COS(0);
    cos
    ---
    1.0

3.2.3.1.6.274 Computing cosine for a column, in radians

    numbers=> SELECT f, COS(f) FROM trig;
    f      | cos
    -------+-------
    0      | 1
    0.2618 | 0.9659
    0.3927 | 0.9239
    0.5236 | 0.866
    0.7854 | 0.7071
    1.0472 | 0.5
    1.1781 | 0.3827
    1.309  | 0.2588
    1.5708 | 0

3.2.3.1.6.275 COT

Returns the cotangent value of a numeric expression.

3.2.3.1.6.276 Syntax

    COT( expr )

3.2.3.1.6.277 Arguments

Parameter | Description
----------+-----------------------------------------------------
expr      | Numeric expression representing an angle in radians

3.2.3.1.6.278 Returns

Always returns a floating point result of the cotangent.
3.2.3.1.6.279 Notes

• The result is not bounded: the cotangent can take any real value, positive or negative.
• If the input value is NULL, the result is NULL.

3.2.3.1.6.280 Examples

For these examples, consider the following table and contents:

    CREATE TABLE trig(f DOUBLE);
    INSERT INTO trig VALUES (0), (PI()/12), (PI()/8), (PI()/6)
       , (PI()/4), (PI()/3), (3*PI()/8), (5*PI()/12), (PI()/2);

3.2.3.1.6.281 Cotangent of π/4 (= 1)

Tip: Use RADIANS to convert degrees to radians.

    numbers=> SELECT COT(PI()/4), COT(RADIANS(45));
    cot | cot0
    ----+-----
    1   | 1

3.2.3.1.6.282 Computing cotangents for a column, in radians

    numbers=> SELECT f, COT(f) FROM trig;
    f      | cot
    -------+------------------
    0      | 16331239353195370
    0.2618 | 3.7321
    0.3927 | 2.4142
    0.5236 | 1.7321
    0.7854 | 1
    1.0472 | 0.5774
    1.1781 | 0.4142
    1.309  | 0.2679
    1.5708 | 0

3.2.3.1.6.283 CRC64

Calculates the CRC-64 hash of a text expression.

3.2.3.1.6.284 Syntax

    CRC64( expr ) --> BIGINT
    CRC64_JOIN( expr ) --> BIGINT

3.2.3.1.6.285 Arguments

Parameter | Description
----------+--------------------------------
expr      | Text expression (VARCHAR, TEXT)

3.2.3.1.6.286 Returns

Returns a CRC-64 hash of the text input, of type BIGINT.

3.2.3.1.6.287 Notes

• If the input value is NULL, the result is NULL.
• CRC64_JOIN can be used with VARCHAR only. It cannot be used with TEXT.
• The CRC64_JOIN variant ignores leading whitespace when used as a JOIN key.

3.2.3.1.6.288 Examples

3.2.3.1.6.289 Calculate a CRC-64 hash of a string

    numbers=> SELECT CRC64(x) FROM
    .         (VALUES ('This is a relatively long text string, that can be converted to a shorter hash' :: varchar(80)))
    .         as t(x);
    crc64
    -------------------
    9085161068710498500

3.2.3.1.6.290 DEGREES

Converts a numeric expression from radian values to degree values.

See also RADIANS.

3.2.3.1.6.291 Syntax

    DEGREES( expr )

3.2.3.1.6.292 Arguments

Parameter | Description
----------+-------------------
expr      | Numeric expression

3.2.3.1.6.293 Returns

Always returns a floating point result of the value in degrees.
3.2.3.1.6.294 Notes

• If the value is NULL, the result is NULL.

3.2.3.1.6.295 Examples

For these examples, consider the following table and contents:

    CREATE TABLE trig(f DOUBLE);
    INSERT INTO trig VALUES (0), (PI()/12), (PI()/8), (PI()/6)
       , (PI()/4), (PI()/3), (3*PI()/8), (5*PI()/12), (PI()/2);

    numbers=> SELECT f, DEGREES(f) FROM trig;
    f      | degrees
    -------+--------
    0      | 0
    0.2618 | 15
    0.3927 | 22.5
    0.5236 | 30
    0.7854 | 45
    1.0472 | 60
    1.1781 | 67.5
    1.309  | 75
    1.5708 | 90

3.2.3.1.6.296 EXP

Returns the natural exponent value of a numeric expression (e^x).

See also LOG.

3.2.3.1.6.297 Syntax

    EXP( expr )

3.2.3.1.6.298 Arguments

Parameter | Description
----------+----------------------------
expr      | Positive numeric expression

3.2.3.1.6.299 Returns

Always returns a floating point result.

3.2.3.1.6.300 Notes

• If the input value is NULL, the result is NULL.

3.2.3.1.6.301 Examples

3.2.3.1.6.302 Natural exponent values

    numbers=> SELECT EXP(x) FROM (VALUES (1.0), (2.0), (4.0)) as t(x);
    exp
    -------
    2.7183
    7.3891
    54.5982

3.2.3.1.6.303 FLOOR

Calculates the largest integer less than or equal to the numeric expression given.

See also ROUND, CEILING / CEIL.

3.2.3.1.6.304 Syntax

    FLOOR ( expr )

3.2.3.1.6.305 Arguments

Parameter | Description
----------+-------------------
expr      | Numeric expression

3.2.3.1.6.306 Returns

Returns the same type as the argument supplied.

3.2.3.1.6.307 Notes

• If the input value is NULL, the result is NULL.

3.2.3.1.6.308 Examples

3.2.3.1.6.309 FLOOR vs. CEILING / CEIL vs. ROUND

    numbers=> SELECT FLOOR(x), CEIL(x), ROUND(x)
    .         FROM (VALUES (0.0001), (-0.0001)
    .         , (PI()), (-2.718281), (500.1234)) as t(x);
    floor | ceil | round
    ------+------+------
    0     | 1    | 0
    -1    | 0    | 0
    3     | 4    | 3
    -3    | -2   | -3
    500   | 501  | 500
3.2.3.1.6.310 LOG

Returns the natural logarithm value of a numeric expression.

The natural logarithm is also known as ln, or the logarithm to the base of the mathematical constant e.

See also LOG10.

3.2.3.1.6.311 Syntax

    LOG( expr )

3.2.3.1.6.312 Arguments

Parameter | Description
----------+----------------------------
expr      | Positive numeric expression

3.2.3.1.6.313 Returns

Always returns a floating point result.

3.2.3.1.6.314 Notes

• If the input value is NULL, the result is NULL.

3.2.3.1.6.315 Examples

3.2.3.1.6.316 Natural log values

    numbers=> SELECT LOG(0.0001), LOG(1), LOG(2.718281), LOG(5000000000000000);
    log     | log0 | log1 | log2
    --------+------+------+--------
    -9.2103 | 0    | 1    | 36.1482

3.2.3.1.6.317 LOG10

Returns the base-10 logarithm value of a numeric expression.

See also LOG (natural).

3.2.3.1.6.318 Syntax

    LOG10( expr )

3.2.3.1.6.319 Arguments

Parameter | Description
----------+----------------------------
expr      | Positive numeric expression

3.2.3.1.6.320 Returns

Always returns a floating point result.

3.2.3.1.6.321 Notes

• If the input value is NULL, the result is NULL.

3.2.3.1.6.322 Examples

3.2.3.1.6.323 Log (base 10) values

    numbers=> SELECT LOG10(0.0001), LOG10(1), LOG10(100), LOG10(1000000);
    log10 | log100 | log101 | log102
    ------+--------+--------+-------
    -4    | 0      | 2      | 6

3.2.3.1.6.324 MOD, %

Calculates the modulo (remainder) of x divided by y.

This function can also be called with an infix operator, x % y.

3.2.3.1.6.325 Syntax

    modulo ::=
        MOD( expr1, expr2 )
        | expr1 % expr2

3.2.3.1.6.326 Arguments

Parameter | Description
----------+------------------------------
expr1     | Integer expression (dividend)
expr2     | Integer expression (divisor)

3.2.3.1.6.327 Returns

Always returns an integer result.

3.2.3.1.6.328 Notes

• If the input value is NULL, the result is NULL.
• Both inputs are expected to be integers.

3.2.3.1.6.329 Examples

3.2.3.1.6.330 Using the MOD function

    numbers=> SELECT MOD(11,-5), MOD(8,5), MOD(1,1);
    mod | mod0 | mod1
    ----+------+-----
    1   | 3    | 0

3.2.3.1.6.331 Using the infix operator

    numbers=> SELECT 11 %-5, 8 % 5, 1 % 1;
    ?column? | ?column?0 | ?column?1
    ---------+-----------+----------
    1        | 3         | 0

3.2.3.1.6.332 PI

Returns a constant value of π.

3.2.3.1.6.333 Syntax

    PI( )

3.2.3.1.6.334 Arguments

None

3.2.3.1.6.335 Returns

Always returns a floating point value of π, to 10 digits after the decimal.

3.2.3.1.6.336 Examples

3.2.3.1.6.337 Using PI() to represent degrees in radians

    CREATE TABLE trig(f DOUBLE);
    INSERT INTO trig VALUES (0), (PI()/12), (PI()/8), (PI()/6)
       , (PI()/4), (PI()/3), (3*PI()/8), (5*PI()/12), (PI()/2);

3.2.3.1.6.338 Tangent of π/4 (= 1)

Tip: Use RADIANS to convert degrees to radians.

    numbers=> SELECT TAN(PI()/4), TAN(RADIANS(45));
    tan | tan0
    ----+-----
    1   | 1

3.2.3.1.6.339 Sine of π/2 (= 1)

    numbers=> SELECT SIN(PI()/2), SIN(RADIANS(90));
    sin | sin0
    ----+-----
    1   | 1

3.2.3.1.6.340 POWER

Returns the value of x to the power of y (x^y).

POWER(x,2) is equivalent to SQUARE(x).

Some DBMSs call this feature POW.

See also: SQUARE.

3.2.3.1.6.341 Syntax

    POWER( base, exponent )

3.2.3.1.6.342 Arguments

Parameter | Description
----------+-----------------------
base      | Numeric expression (x)
exponent  | Numeric expression (y)

3.2.3.1.6.343 Returns

Returns the same type as the argument supplied.

3.2.3.1.6.344 Notes

• If the value is NULL, the result is NULL.

3.2.3.1.6.345 Examples

3.2.3.1.6.346 On integers

    numbers=> SELECT POWER(3,x) FROM (VALUES (1), (2), (3), (4), (5)) AS t(x);
    power
    -----
    3
    9
    27
    81
    243

3.2.3.1.6.347 On floating point

    numbers=> SELECT POWER(3.0,x) FROM (VALUES (1), (2), (3), (4), (5)) AS t(x);
    power
    -----
    3.0
    9.0
    27.0
    81.0
    243.0

3.2.3.1.6.348 RADIANS

Converts a numeric expression from degree values to radian values.

See also DEGREES.
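The degree and radian conversions are plain scalings by π/180 and 180/π. The Python sketch below illustrates that arithmetic; the helper names to_radians and to_degrees are assumptions for this example (Python's own math.radians and math.degrees behave the same way for numbers), and None stands in for NULL.

```python
import math


def to_radians(degrees):
    """Hypothetical helper mirroring RADIANS: degrees * pi / 180."""
    return None if degrees is None else degrees * math.pi / 180.0


def to_degrees(radians):
    """Hypothetical helper mirroring DEGREES: radians * 180 / pi."""
    return None if radians is None else radians * 180.0 / math.pi


# Print each angle in degrees next to its value in radians, to 4 places.
for angle in (0, 15, 22.5, 30, 45, 60, 67.5, 75, 90):
    print(angle, round(to_radians(angle), 4))
```

Converting to radians and back returns the original angle up to floating point rounding.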
3.2.3.1.6.349 Syntax

    RADIANS( expr )

3.2.3.1.6.350 Arguments

Parameter | Description
----------+-------------------
expr      | Numeric expression

3.2.3.1.6.351 Returns

Always returns a floating point result of the value in radians.

3.2.3.1.6.352 Notes

• If the value is NULL, the result is NULL.

3.2.3.1.6.353 Examples

For these examples, consider the following table and contents:

    CREATE TABLE trig(f DOUBLE);
    INSERT INTO trig VALUES (0), (15), (22.5)
       ,(30), (45), (60), (67.5), (75), (90);
    -- Equivalent to these values in radians:
    -- (0), (PI()/12), (PI()/8), (PI()/6)
    -- ,(PI()/4), (PI()/3), (3*PI()/8), (5*PI()/12), (PI()/2);

    numbers=> SELECT f, RADIANS(f) FROM trig;
    f    | radians
    -----+--------
    0    | 0
    15   | 0.2618
    22.5 | 0.3927
    30   | 0.5236
    45   | 0.7854
    60   | 1.0472
    67.5 | 1.1781
    75   | 1.309
    90   | 1.5708

3.2.3.1.6.354 ROUND

Rounds a numeric expression to the nearest value at the given precision.

See also CEILING / CEIL, FLOOR.

3.2.3.1.6.355 Syntax

    ROUND( expr [, scale ] )

3.2.3.1.6.356 Arguments

Parameter | Description
----------+------------------------------------------------------------------
expr      | Numeric expression to round
scale     | Number of digits after the decimal point to round to. Defaults to
          | 0 if not specified.

3.2.3.1.6.357 Returns

Always returns a floating point result.

3.2.3.1.6.358 Notes

• If the input value is NULL, the result is NULL.

3.2.3.1.6.359 Examples

3.2.3.1.6.360 Rounding to the nearest integer

    numbers=> SELECT ROUND(x) FROM (VALUES (0.0001), (PI()), (-2.718281), (500.1234), (0.5), (1.5)) as t(x);
    round
    -----
    0
    3
    -3
    500
    1
    2

3.2.3.1.6.361 Rounding to 2 digits after the decimal point

    numbers=> SELECT ROUND(x,2) FROM (VALUES (0.0001), (PI()), (-2.718281), (500.1234)) as t(x);
    round
    ------
    0
    3.14
    -2.72
    500.12

3.2.3.1.6.362 FLOOR vs. CEILING / CEIL vs. ROUND

    numbers=> SELECT FLOOR(x), CEIL(x), ROUND(x)
    .         FROM (VALUES (0.0001), (-0.0001)
    .         , (PI()), (-2.718281), (500.1234)) as t(x);
    floor | ceil | round
    ------+------+------
    0     | 1    | 0
    -1    | 0    | 0
    3     | 4    | 3
    -3    | -2   | -3
    500   | 501  | 500

3.2.3.1.6.363 SIN

Returns the sine value of a numeric expression.

3.2.3.1.6.364 Syntax

    SIN( expr )

3.2.3.1.6.365 Arguments

Parameter | Description
----------+-----------------------------------------------------
expr      | Numeric expression representing an angle in radians

3.2.3.1.6.366 Returns

Always returns a floating point result of the sine.

3.2.3.1.6.367 Notes

• The result is in the range [-1, 1].
• If the input value is NULL, the result is NULL.

3.2.3.1.6.368 Examples

For these examples, consider the following table and contents:

    CREATE TABLE trig(f DOUBLE);
    INSERT INTO trig VALUES (0), (PI()/12), (PI()/8), (PI()/6)
       , (PI()/4), (PI()/3), (3*PI()/8), (5*PI()/12), (PI()/2);

3.2.3.1.6.369 Sine of π/2 (= 1)

Tip: Use RADIANS to convert degrees to radians.

    numbers=> SELECT SIN(PI()/2), SIN(RADIANS(90));
    sin | sin0
    ----+-----
    1   | 1

3.2.3.1.6.370 Computing sine for a column, in radians

    numbers=> SELECT f, SIN(f) FROM trig;
    f      | sin
    -------+-------
    0      | 0
    0.2618 | 0.2588
    0.3927 | 0.3827
    0.5236 | 0.5
    0.7854 | 0.7071
    1.0472 | 0.866
    1.1781 | 0.9239
    1.309  | 0.9659
    1.5708 | 1

3.2.3.1.6.371 SQRT

Returns the square root value of a non-negative numeric expression.

3.2.3.1.6.372 Syntax

    SQRT( expr )

3.2.3.1.6.373 Arguments

Parameter | Description
----------+--------------------------------
expr      | Non-negative numeric expression

3.2.3.1.6.374 Returns

Always returns a floating point result of the square root.

3.2.3.1.6.375 Notes

• If the value is NULL, the result is NULL.
• If the numeric expression is negative (<0), an error will be shown.
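Because a negative input raises an error, it is common to map negative values to NULL before taking the root. The Python sketch below mirrors that guard outside the database; the helper name sqrt_or_null is an assumption for this example, and None stands in for NULL.

```python
import math


def sqrt_or_null(x):
    """Hypothetical helper: None for NULL or negative input, else the square root."""
    if x is None or x < 0:
        return None
    return math.sqrt(x)


values = [1, -12, 22, -26538, None, 42]
roots = [sqrt_or_null(v) for v in values]
print(roots)
```

Without the guard, math.sqrt raises ValueError for a negative argument, which is the Python counterpart of the error described in the notes.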
3.2.3.1.6.376 Examples

For these examples, consider the following table and contents:

    CREATE TABLE cool_numbers(i INT, f DOUBLE);
    INSERT INTO cool_numbers VALUES (1,1.618033), (-12,-34)
       ,(22, 3.141592), (-26538, 2.7182818284)
       ,(NULL, NULL), (NULL,1.4142135623)
       ,(42,NULL), (-42, NULL)
       , (-474, 365);

    numbers=> SELECT SQRT(2);
    sqrt
    ------------------
    1.4142135623730951

Note:
• SQRT always returns a floating point result.
• Some clients may show fewer digits. See your client settings to change the precision shown.

3.2.3.1.6.377 Replacing negative values with NULL

Combine with CASE to replace negative inputs with NULL before taking the square root:

    numbers=> SELECT i, SQRT(CASE WHEN i>=0 THEN i ELSE NULL END) FROM cool_numbers;
    i      | sqrt
    -------+-----
    1      | 1
    -12    |
    22     | 4.69
    -26538 |
           |
           |
    42     | 6.48
    -42    |
    -474   |

3.2.3.1.6.378 SQUARE

Returns the square value of a numeric expression (x^2).

SQUARE(x) is equivalent to POWER(x,2).

See POWER.

3.2.3.1.6.379 Syntax

    SQUARE( expr )

3.2.3.1.6.380 Arguments

Parameter | Description
----------+-------------------
expr      | Numeric expression

3.2.3.1.6.381 Returns

Always returns a floating point result.

3.2.3.1.6.382 Notes

• If the value is NULL, the result is NULL.

3.2.3.1.6.383 Examples

    numbers=> SELECT SQUARE(3);
    square
    ------
    9.0

3.2.3.1.6.384 TAN

Returns the tangent value of a numeric expression.

3.2.3.1.6.385 Syntax

    TAN( expr )

3.2.3.1.6.386 Arguments

Parameter | Description
----------+-----------------------------------------------------
expr      | Numeric expression representing an angle in radians

3.2.3.1.6.387 Returns

Always returns a floating point result of the tangent.

3.2.3.1.6.388 Notes

• The result is not bounded: the tangent can take any real value, positive or negative, and grows very large near odd multiples of π/2.
• If the input value is NULL, the result is NULL.
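The behavior of the tangent near π/2 and in the second quadrant can be checked with Python's math module. This sketch only illustrates the mathematics in double precision; it makes no claim about SQream DB internals, and the dictionary names are assumptions for this example.

```python
import math

# Angles in radians, keyed by a readable label.
angles = {"pi/4": math.pi / 4, "3*pi/4": 3 * math.pi / 4, "pi/2": math.pi / 2}

# The floating point value of pi/2 is not exactly pi/2, so the tangent there
# is a very large finite number rather than an error or infinity.
tangents = {label: math.tan(angle) for label, angle in angles.items()}

for label, value in tangents.items():
    print(label, value)
```

The large finite value at π/2 is a consequence of representing π in double precision.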
3.2.3.1.6.389 Examples

For these examples, consider the following table and contents:

    CREATE TABLE trig(f DOUBLE);
    INSERT INTO trig VALUES (0), (PI()/12), (PI()/8), (PI()/6)
       , (PI()/4), (PI()/3), (3*PI()/8), (5*PI()/12), (PI()/2);

3.2.3.1.6.390 Tangent of π/4 (= 1)

Tip: Use RADIANS to convert degrees to radians.

    numbers=> SELECT TAN(PI()/4), TAN(RADIANS(45));
    tan | tan0
    ----+-----
    1   | 1

3.2.3.1.6.391 Computing tangents for a column, in radians

    numbers=> SELECT f, TAN(f) FROM trig;
    f      | tan
    -------+------------------
    0      | 0
    0.2618 | 0.2679
    0.3927 | 0.4142
    0.5236 | 0.5774
    0.7854 | 1
    1.0472 | 1.7321
    1.1781 | 2.4142
    1.309  | 3.7321
    1.5708 | 16331239353195370

3.2.3.1.6.392 TRUNC

Rounds a number to its integer representation towards 0.

Note: This function is overloaded. The function TRUNC can also modify the precision of DATE and DATETIME values.

See also ROUND.

3.2.3.1.6.393 Syntax

    TRUNC( expr )

3.2.3.1.6.394 Arguments

Parameter | Description
----------+-------------------
expr      | Numeric expression

3.2.3.1.6.395 Returns

Returns the same type as the argument supplied.

3.2.3.1.6.396 Notes

• If the input value is NULL, the result is NULL.

3.2.3.1.6.397 Examples

3.2.3.1.6.398 Truncating to an integer

    numbers=> SELECT TRUNC(x) FROM (VALUES (0.0001), (PI()), (-2.718281), (500.1234)) as t(x);
    trunc
    -----
    0.0
    3.0
    -2.0
    500.0

3.2.3.1.6.399 TRUNC vs. ROUND

    numbers=> SELECT TRUNC(x), ROUND(x)
    .         FROM (VALUES (0.0001), (-0.0001)
    .         , (PI()), (-2.718281), (500.1234)) as t(x);
    trunc | round
    ------+------
    0.0   | -0.0
    -0.0  | -0.0
    3.0   | 3.0
    -2.0  | -3.0
    500.0 | 500.0

3.2.3.1.6.400 CHAR_LENGTH

Calculates the number of characters in a string.

Note:
• This function is supported on TEXT only.
• To get the length in bytes, see OCTET_LENGTH.
• For VARCHAR strings, the octet length is the number of characters. Use LEN instead.
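The difference between a length in characters and a length in bytes comes from the UTF-8 encoding, where a character can occupy more than one byte. The Python sketch below shows the distinction for an ASCII string, a single Thai letter, and a single Hebrew letter; the sample strings are chosen for this illustration and are not taken from the tables in this reference.

```python
# One ASCII string, one Thai letter (U+0E01), one Hebrew letter (U+05D0).
samples = ["abc", "ก", "א"]

# Pair each string's character count with its UTF-8 byte count.
lengths = [(len(s), len(s.encode("utf-8"))) for s in samples]

for s, (chars, octets) in zip(samples, lengths):
    print(repr(s), chars, octets)
```

The character count is what a character-length function reports, while the byte count corresponds to an octet length.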
3.2.3.1.6.401 Syntax

    CHAR_LENGTH( text_expr ) --> INT

3.2.3.1.6.402 Arguments

Parameter | Description
----------+----------------
text_expr | TEXT expression

3.2.3.1.6.403 Returns

Returns an integer containing the number of characters in the string.

3.2.3.1.6.404 Notes

• To get the length in bytes, see OCTET_LENGTH.
• If the value is NULL, the result is NULL.

3.2.3.1.6.405 Examples

For these examples, consider the following table and contents:

    CREATE TABLE alphabets(line TEXT(50));
    INSERT INTO alphabets VALUES
       ('abcdefghijklmnopqrstuvwxyz'), ('')
       ,('');

3.2.3.1.6.406 Length in characters and bytes of strings

ASCII characters take up 1 byte per character, while Thai takes up 3 bytes and Hebrew takes up 2 bytes.

Unlike LEN, CHAR_LENGTH counts trailing whitespace.

    t=> SELECT LEN(line), CHAR_LENGTH(line), OCTET_LENGTH(line) FROM alphabets;
    len | char_length | octet_length
    ----+-------------+-------------
    26  | 26          | 26
    47  | 47          | 141
    22  | 22          | 44

3.2.3.1.6.407 CHARINDEX

Returns the starting position of a string inside another string.

See also PATINDEX, REGEXP_INSTR.

3.2.3.1.6.408 Syntax

    CHARINDEX ( needle_string_expr , haystack_string_expr [ , start_location ] )

3.2.3.1.6.409 Arguments

Parameter            | Description
---------------------+-----------------------------------------------------------
needle_string_expr   | String to find
haystack_string_expr | String to search within
start_location       | An integer at which the search starts. This value is
                     | optional. When not supplied, the search starts at the
                     | beginning of haystack_string_expr.

3.2.3.1.6.410 Returns

Integer start position of a match, or 0 if no match was found.

3.2.3.1.6.411 Notes

• If the value is NULL, the result is NULL.

3.2.3.1.6.412 Examples

For these examples, consider the following table and contents:

    CREATE TABLE jabberwocky(line VARCHAR(50));
    INSERT INTO jabberwocky VALUES
       ('''Twas brillig, and the slithy toves '), ('      Did gyre and gimble in the wabe: ')
       ,('All mimsy were the borogoves, '), ('      And the mome raths outgrabe. ')
       ,('"Beware the Jabberwock, my son! '), ('      The jaws that bite, the claws that catch! ')
       ,('Beware the Jubjub bird, and shun '), ('      The frumious Bandersnatch!" ');

3.2.3.1.6.413 Using CHARINDEX

    t=> SELECT line, CHARINDEX('the', line) FROM jabberwocky;
    line                                            | charindex
    ------------------------------------------------+----------
    'Twas brillig, and the slithy toves             | 20
          Did gyre and gimble in the wabe:          | 30
    All mimsy were the borogoves,                   | 16
          And the mome raths outgrabe.              | 11
    "Beware the Jabberwock, my son!                 | 9
          The jaws that bite, the claws that catch! | 27
    Beware the Jubjub bird, and shun                | 8
          The frumious Bandersnatch!"               | 0

3.2.3.1.6.414 || (Concatenate)

Concatenates two strings to create a longer string.

3.2.3.1.6.415 Syntax

    expr1 || expr2

3.2.3.1.6.416 Arguments

Parameter    | Description
-------------+-------------------
expr1, expr2 | String expressions

3.2.3.1.6.417 Returns

Returns the same type as the argument supplied.

3.2.3.1.6.418 Notes

• Both values must be strings, and can't be NULL. If NULLs are expected, use COALESCE.
• SQream DB removes the trailing spaces from strings by default, which may lead to unexpected results. See the examples for more information.

3.2.3.1.6.419 Examples

For these examples, assume a table named nba, with the following structure:

    CREATE TABLE nba
    (
       Name varchar(40),
       Team varchar(40),
       Number tinyint,
       Position varchar(2),
       Age tinyint,
       Height varchar(4),
       Weight real,
       College varchar(40),
       Salary float
    );

Here's a peek at the table contents (Download nba.csv):

Table 28: nba.csv

Name          | Team           | Number | Position | Age  | Height | Weight | College           | Salary
--------------+----------------+--------+----------+------+--------+--------+-------------------+-----------
Avery Bradley | Boston Celtics | 0.0    | PG       | 25.0 | 6-2    | 180.0  | Texas             | 7730337.0
Jae Crowder   | Boston Celtics | 99.0   | SF       | 25.0 | 6-6    | 235.0  | Marquette         | 6796117.0
John Holland  | Boston Celtics | 30.0   | SG       | 27.0 | 6-5    | 205.0  | Boston University |
R.J. Hunter   | Boston Celtics | 28.0   | SG       | 22.0 | 6-5    | 185.0  | Georgia State     | 1148640.0
Jonas Jerebko | Boston Celtics | 8.0    | PF       | 29.0 | 6-10   | 231.0  |                   | 5000000.0
Amir Johnson  | Boston Celtics | 90.0   | PF       | 29.0 | 6-9    | 240.0  |                   | 12000000.0
Jordan Mickey | Boston Celtics | 55.0   | PF       | 21.0 | 6-8    | 235.0  | LSU               | 1170960.0
Kelly Olynyk  | Boston Celtics | 41.0   | C        | 25.0 | 7-0    | 238.0  | Gonzaga           | 2165160.0
Terry Rozier  | Boston Celtics | 12.0   | PG       | 22.0 | 6-2    | 190.0  | Louisville        | 1824360.0

3.2.3.1.6.420 Concatenate two values

Convert values to string types before concatenation.

    nba=> SELECT ("Age" :: VARCHAR(2)) || "Name" FROM nba ORDER BY 1 DESC LIMIT 5;
    ?column?
    ----------------
    40Tim Duncan
    40Kevin Garnett
    40Andre Miller
    39Vince Carter
    39Pablo Prigioni

3.2.3.1.6.421 Concatenate two strings

    t=> SELECT 'Hello, this is' || ' nice';
    ?column?
    -------------------
    Hello, this is nice

Warning: Trailing spaces are trimmed by default. For example,

    t=> SELECT 'Hello, this is ' || 'nice';
    ?column?
    ------------------
    Hello, this isnice

This may sometimes lead to an unexpected result. See the example below for a remedy.

3.2.3.1.6.422 Adding spaces

Concatenate a space with the second value first, to bypass the space trimming issue.

    nba=> SELECT ("Age" :: VARCHAR(2) || (' ' || "Name")) FROM nba ORDER BY 1 DESC LIMIT 5;
    ?column?
    -----------------
    40 Tim Duncan
    40 Kevin Garnett
    40 Andre Miller
    39 Vince Carter
    39 Pablo Prigioni

3.2.3.1.6.423 ISPREFIXOF

Checks if one string is a prefix of the other.

ISPREFIXOF(x, y) is a more performant way to write y LIKE (x || '%').

See also: LIKE.

3.2.3.1.6.424 Syntax

    ISPREFIXOF( needle_string_expr , haystack_string_expr ) --> BOOL

3.2.3.1.6.425 Arguments

Parameter            | Description
---------------------+------------------------
needle_string_expr   | String to locate
haystack_string_expr | String to search within

3.2.3.1.6.426 Returns

TRUE if needle_string_expr is a prefix of haystack_string_expr, or FALSE otherwise.

3.2.3.1.6.427 Notes

• This function is supported on TEXT strings only.
• If the value is NULL, the result is NULL.
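A prefix test is sensitive to leading whitespace in the string being searched, which is why trimming is often applied first. The Python sketch below illustrates that behavior; the helper name is_prefix_of is an assumption for this example, and None stands in for NULL.

```python
def is_prefix_of(needle, haystack):
    """Hypothetical helper: True if haystack starts with needle; None if either is NULL."""
    if needle is None or haystack is None:
        return None
    return haystack.startswith(needle)


line = "      And the mome raths outgrabe."

# Leading spaces are part of the string, so the untrimmed test fails.
print(is_prefix_of("And", line))
# After trimming, the prefix matches.
print(is_prefix_of("And", line.strip()))
```

The same idea applies in SQL by wrapping the searched string in TRIM before testing the prefix.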
3.2.3.1.6.428 Examples

For these examples, consider the following table and contents:

    CREATE TABLE jabberwocky(line VARCHAR(50));
    INSERT INTO jabberwocky VALUES
       ('''Twas brillig, and the slithy toves '), ('      Did gyre and gimble in the wabe: ')
       ,('All mimsy were the borogoves, '), ('      And the mome raths outgrabe. ')
       ,('"Beware the Jabberwock, my son! '), ('      The jaws that bite, the claws that catch! ')
       ,('Beware the Jubjub bird, and shun '), ('      The frumious Bandersnatch!" ');

3.2.3.1.6.429 Filtering using ISPREFIXOF

    t=> SELECT line FROM jabberwocky WHERE ISPREFIXOF('And',TRIM(line));
    line
    ----------------------------------
          And the mome raths outgrabe.

Tip: Use TRIM to avoid leading and trailing whitespace issues.

3.2.3.1.6.430 LEFT

Returns the left part of a character string with the specified number of characters.

See also RIGHT.

3.2.3.1.6.431 Syntax

    LEFT( expr , character_count )

3.2.3.1.6.432 Arguments

Parameter       | Description
----------------+-----------------------------------------------------------------
expr            | String expression
character_count | A positive integer that specifies how many characters to return.

3.2.3.1.6.433 Returns

Returns the same type as the argument supplied.

3.2.3.1.6.434 Notes

• This function works on TEXT strings only.
• If the value is NULL, the result is NULL.

3.2.3.1.6.435 Examples

For these examples, consider the following table and contents:

    CREATE TABLE jabberwocky(line TEXT(50));
    INSERT INTO jabberwocky VALUES
       ('''Twas brillig, and the slithy toves '), ('      Did gyre and gimble in the wabe: ')
       ,('All mimsy were the borogoves, '), ('      And the mome raths outgrabe. ')
       ,('"Beware the Jabberwock, my son! '), ('      The jaws that bite, the claws that catch! ')
       ,('Beware the Jubjub bird, and shun '), ('      The frumious Bandersnatch!" ');

3.2.3.1.6.436 Using LEFT

    t=> SELECT LEFT(line, 10), RIGHT(line, 10) FROM jabberwocky;
    left       | right
    -----------+-----------
    'Twas bril | thy toves
          Did  | the wabe:
    All mimsy  | orogoves,
          And  | outgrabe.
    "Beware th | , my son!
          The  | at catch!
    Beware the | and shun
          The  | rsnatch!"
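Taking the left or right part of a string corresponds to slicing from the start or from the end. The Python sketch below illustrates the idea with one line of the poem, written here without trailing spaces; the helper names left and right are assumptions for this example, and None stands in for NULL.

```python
def left(s, n):
    """Hypothetical helper: first n characters, or None for NULL input."""
    return None if s is None else s[:n]


def right(s, n):
    """Hypothetical helper: last n characters, or None for NULL input."""
    if s is None:
        return None
    # s[-0:] would return the whole string, so handle n == 0 explicitly.
    return s[-n:] if n > 0 else ""


line = "All mimsy were the borogoves,"
print(repr(left(line, 10)), repr(right(line, 10)))
```

Any trailing spaces stored in the value count towards the rightmost characters, so the result of a right-hand slice depends on whether the string was trimmed first.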
3.2.3.1.6.437 LEN

Calculates the number of characters in a string.

Note:
• This function is provided for SQL Server compatibility.
• For UTF-8 encoded TEXT strings, multi-byte characters are counted as a single character. To get the length in bytes, see OCTET_LENGTH. To get the length in characters, see CHAR_LENGTH.

3.2.3.1.6.438 Syntax

LEN( expr ) --> INT

3.2.3.1.6.439 Arguments

Parameter    Description
expr         String expression

3.2.3.1.6.440 Returns

Returns an integer containing the number of characters in the string.

3.2.3.1.6.441 Notes

• If the value is NULL, the result is NULL.

3.2.3.1.6.442 Examples

For these examples, consider the following table and contents:

CREATE TABLE jabberwocky(line VARCHAR(50));
INSERT INTO jabberwocky VALUES
  ($$'Twas brillig, and the slithy toves$$), ('      Did gyre and gimble in the wabe:')
  ,('All mimsy were the borogoves,'), ('      And the mome raths outgrabe.')
  ,('"Beware the Jabberwock, my son!'), ('      The jaws that bite, the claws that catch!')
  ,('Beware the Jubjub bird, and shun'), ('      The frumious Bandersnatch!"');

CREATE TABLE alphabets(line TEXT(50));
INSERT INTO alphabets VALUES
  ('abcdefghijklmnopqrstuvwxyz'), ('')
  ,('');

3.2.3.1.6.443 Length in characters and bytes of strings

ASCII characters take up 1 byte per character, while Thai takes up 3 bytes and Hebrew takes up 2 bytes.

t=> SELECT LEN(line), OCTET_LENGTH(line) FROM alphabets;
len | octet_length
----+-------------
 26 |           26
 47 |          141
 22 |           44

3.2.3.1.6.444 Length of an ASCII string

Note: SQream DB does not count trailing spaces, but does keep leading spaces.

t=> SELECT LEN('Trailing spaces are not counted ');
len
---
31

t=> SELECT LEN('            Leading spaces are counted');
len
---
38

3.2.3.1.6.445 Length of each value in a column

t=> SELECT LEN(line) FROM jabberwocky;
len
---
35
38
29
34
31
47
32
33

3.2.3.1.6.446 LIKE

Tests if a string matches a given pattern.
LIKE and RLIKE are similar. LIKE uses SQL patterns, whereas RLIKE uses POSIX regular expressions.

See also: RLIKE, REGEXP_COUNT, REGEXP_INSTR, REGEXP_SUBSTR, ISPREFIXOF.

3.2.3.1.6.447 Syntax

string_expr [ NOT ] LIKE string_test_expr

3.2.3.1.6.448 Arguments

Parameter           Description
string_expr         String to test
string_test_expr    Test pattern

3.2.3.1.6.449 Test patterns

Pattern           Description
%                 Match zero or more characters
_ (underscore)    Match exactly one character
[A-Z]             Match any character between A and Z inclusive. The - character between two other characters forms a range that matches all characters from the first character to the second. For example, [A-Z] matches all ASCII capital letters.
[^A-Z]            Match any character not between A and Z
[abcde]           Match any one of a, b, c, d, and e
[^abcde]          Match any character that is not one of a, b, c, d, or e
[abcC-F]          Match a, b, c, or any character between C and F
\ (backslash)     Escape character. A backslash (\) before a wildcard indicates that the wildcard is interpreted as a regular character and not as a wildcard.

3.2.3.1.6.450 Returns

TRUE if the test string matches the pattern, or FALSE otherwise.

3.2.3.1.6.451 Notes

• Before version 2020.3.1, the test pattern must be a literal string. If matching just the beginning of the string, use ISPREFIXOF, which supports column references.
• Starting with version 2020.3.1, column references and complex expressions are also supported as the test pattern.
• If the value is NULL, the result is NULL.

3.2.3.1.6.452 Examples

For these examples, assume a table named nba, with the following structure:

CREATE TABLE nba
(
  "Name" varchar(40),
  "Team" varchar(40),
  "Number" tinyint,
  "Position" varchar(2),
  "Age" tinyint,
  "Height" varchar(4),
  "Weight" real,
  "College" varchar(40),
  "Salary" float
);

Here’s a peek at the table contents (Download nba.csv):

Table 29: nba.csv

Name | Team | Number | Position | Age | Height | Weight | College | Salary
Avery Bradley | Boston Celtics | 0.0 | PG | 25.0 | 6-2 | 180.0 | Texas | 7730337.0
Jae Crowder | Boston Celtics | 99.0 | SF | 25.0 | 6-6 | 235.0 | Marquette | 6796117.0
John Holland | Boston Celtics | 30.0 | SG | 27.0 | 6-5 | 205.0 | Boston University |
R.J. Hunter | Boston Celtics | 28.0 | SG | 22.0 | 6-5 | 185.0 | Georgia State | 1148640.0
Jonas Jerebko | Boston Celtics | 8.0 | PF | 29.0 | 6-10 | 231.0 | | 5000000.0
Amir Johnson | Boston Celtics | 90.0 | PF | 29.0 | 6-9 | 240.0 | | 12000000.0
Jordan Mickey | Boston Celtics | 55.0 | PF | 21.0 | 6-8 | 235.0 | LSU | 1170960.0
Kelly Olynyk | Boston Celtics | 41.0 | C | 25.0 | 7-0 | 238.0 | Gonzaga | 2165160.0
Terry Rozier | Boston Celtics | 12.0 | PG | 22.0 | 6-2 | 190.0 | Louisville | 1824360.0

3.2.3.1.6.453 Match the beginning of a string

nba=> SELECT "Name","Age","Salary","Team" FROM nba WHERE "Team" LIKE 'Portland%' LIMIT 5;
Name            | Age | Salary  | Team
----------------+-----+---------+-----------------------
Cliff Alexander |  20 |  525093 | Portland Trail Blazers
Al-Farouq Aminu |  25 | 8042895 | Portland Trail Blazers
Pat Connaughton |  23 |  625093 | Portland Trail Blazers
Allen Crabbe    |  24 |  947276 | Portland Trail Blazers
Ed Davis        |  27 | 6980802 | Portland Trail Blazers

Tip: ISPREFIXOF is a more performant way to match the beginning of a string. This example can be written as:
SELECT "Name","Age","Salary","Team" FROM nba WHERE ISPREFIXOF('Portland',"Team") LIMIT 5;

3.2.3.1.6.454 Match a wildcard character by escaping

To match a wildcard character literally, escape it with a backslash:

nba=> SELECT "Name","Age","Salary","Team" FROM nba WHERE "Name" LIKE '%\_%';
Name        | Age | Salary  | Team
------------+-----+---------+---------------
R.J._Hunter |  22 | 1148640 | Boston Celtics

3.2.3.1.6.455 Negate with NOT

nba=> SELECT "Name","Age","Salary","Team" FROM nba WHERE "Team" NOT LIKE 'Portland%' LIMIT 5;
Name          | Age | Salary  | Team
--------------+-----+---------+---------------
Avery Bradley |  25 | 7730337 | Boston Celtics
Jae Crowder   |  25 | 6796117 | Boston Celtics
John Holland  |  27 |         | Boston Celtics
R.J. Hunter   |  22 | 1148640 | Boston Celtics
Jonas Jerebko |  29 | 5000000 | Boston Celtics

3.2.3.1.6.456 Match the middle of a string

nba=> SELECT "Name","Age","Salary","Team" FROM nba WHERE "Team" LIKE '%zz%' LIMIT 5;
Name           | Age | Salary  | Team
---------------+-----+---------+------------------
Jordan Adams   |  21 | 1404600 | Memphis Grizzlies
Tony Allen     |  34 | 5158539 | Memphis Grizzlies
Chris Andersen |  37 | 5000000 | Memphis Grizzlies
Matt Barnes    |  36 | 3542500 | Memphis Grizzlies
Vince Carter   |  39 | 4088019 | Memphis Grizzlies

3.2.3.1.6.457 Find players with a middle name or suffix

A name with a middle name or suffix contains at least two spaces.

nba=> SELECT "Name","Age","Salary","Team" FROM nba WHERE "Name" LIKE '% % %';
Name                     | Age | Salary  | Team
-------------------------+-----+---------+----------------------
James Michael McAdoo     |  23 |  845059 | Golden State Warriors
Luc Richard Mbah a Moute |  29 |  947276 | Los Angeles Clippers
Larry Nance Jr.          |  23 | 1155600 | Los Angeles Lakers
Metta World Peace        |  36 |  947276 | Los Angeles Lakers
Glenn Robinson III       |  22 | 1100000 | Indiana Pacers
Johnny O'Bryant III      |  23 |  845059 | Milwaukee Bucks
Tim Hardaway Jr.         |  24 | 1304520 | Atlanta Hawks
Frank Kaminsky III       |  23 | 2612520 | Charlotte Hornets
Kelly Oubre Jr.          |  20 | 1920240 | Washington Wizards
Otto Porter Jr.          |  23 | 4662960 | Washington Wizards

3.2.3.1.6.458 Find non-literal patterns

nba=> CREATE TABLE t(x int not null, y text not null, z text not null);
nba=> INSERT INTO t VALUES (1,'abc','a'),(2,'abcd','bc');

3.2.3.1.6.459 Select rows in which z is a prefix of y:

nba=> SELECT * FROM t WHERE y LIKE z || '%';
x | y   | z
--+-----+--
1 | abc | a

3.2.3.1.6.460 Select rows in which y contains z as a substring:

nba=> SELECT * FROM t WHERE y LIKE '%' || z || '%';
x | y    | z
--+------+---
1 | abc  | a
2 | abcd | bc

3.2.3.1.6.461 Values that contain wildcards as well:

nba=> CREATE TABLE patterns(x text not null);
nba=> INSERT INTO patterns values ('%'),('a%'),('%a');
nba=> SELECT x, 'abc' LIKE x FROM patterns;
x  | ?column?
---+---------
%  | 1
a% | 1
%a | 0

3.2.3.1.6.462 LOWER

Converts characters in a string to lower case.

See also: UPPER

3.2.3.1.6.463 Syntax

LOWER( expr )

3.2.3.1.6.464 Arguments

Parameter    Description
expr         String expression

3.2.3.1.6.465 Returns

Returns the same type as the argument supplied.

3.2.3.1.6.466 Notes

• If the value is NULL, the result is NULL.

3.2.3.1.6.467 Examples

For these examples, consider the following table and contents:

CREATE TABLE jabberwocky(line VARCHAR(50));
INSERT INTO jabberwocky VALUES
  ('''Twas brillig, and the slithy toves'), ('      Did gyre and gimble in the wabe:')
  ,('All mimsy were the borogoves,'), ('      And the mome raths outgrabe.')
  ,('"Beware the Jabberwock, my son!'), ('      The jaws that bite, the claws that catch!')
  ,('Beware the Jubjub bird, and shun'), ('      The frumious Bandersnatch!"');

3.2.3.1.6.468 Lower-casing a literal value

t=> SELECT LOWER('SQream DB');
sqream db

3.2.3.1.6.469 Lower-casing a column of values

t=> SELECT LOWER(line) FROM jabberwocky;
lower
-----------------------------------------------
'twas brillig, and the slithy toves
      did gyre and gimble in the wabe:
all mimsy were the borogoves,
      and the mome raths outgrabe.
"beware the jabberwock, my son!
      the jaws that bite, the claws that catch!
beware the jubjub bird, and shun
      the frumious bandersnatch!"

3.2.3.1.6.470 LTRIM

Trims leading whitespace from a string.

See also TRIM, RTRIM.

3.2.3.1.6.471 Syntax

LTRIM( expr )

3.2.3.1.6.472 Arguments

Parameter    Description
expr         String expression

3.2.3.1.6.473 Returns

Returns the same type as the argument supplied.

3.2.3.1.6.474 Notes

• This function is equivalent to the ANSI form TRIM( LEADING FROM expr )
• If the value is NULL, the result is NULL.

3.2.3.1.6.475 Examples

For these examples, consider the following table and contents:

CREATE TABLE jabberwocky(line TEXT(50));
INSERT INTO jabberwocky VALUES
  ('''Twas brillig, and the slithy toves'), ('      Did gyre and gimble in the wabe:')
  ,('All mimsy were the borogoves,'), ('      And the mome raths outgrabe.')
  ,('"Beware the Jabberwock, my son!'), ('      The jaws that bite, the claws that catch!')
  ,('Beware the Jubjub bird, and shun'), ('      The frumious Bandersnatch!"');

3.2.3.1.6.476 Trimming a literal value

t=> SELECT LTRIM(' SQream DB');
ltrim
---------
SQream DB

3.2.3.1.6.477 Trimming a column of values

In this example we use || (Concatenate) to show the leading spaces.

t=> SELECT ('Line: ' || line) as untrimmed, ('Line: ' || LTRIM(line)) as trimmed FROM jabberwocky;
untrimmed                                             | trimmed
------------------------------------------------------+------------------------------------------------
Line: 'Twas brillig, and the slithy toves             | Line: 'Twas brillig, and the slithy toves
Line:       Did gyre and gimble in the wabe:          | Line: Did gyre and gimble in the wabe:
Line: All mimsy were the borogoves,                   | Line: All mimsy were the borogoves,
Line:       And the mome raths outgrabe.              | Line: And the mome raths outgrabe.
Line: "Beware the Jabberwock, my son!                 | Line: "Beware the Jabberwock, my son!
Line:       The jaws that bite, the claws that catch! | Line: The jaws that bite, the claws that catch!
Line: Beware the Jubjub bird, and shun                | Line: Beware the Jubjub bird, and shun
Line:       The frumious Bandersnatch!"               | Line: The frumious Bandersnatch!"

3.2.3.1.6.478 OCTET_LENGTH

Calculates the number of bytes in a string.

Note:
• This function is supported on TEXT strings only.
• To get the length in characters, see CHAR_LENGTH.
• For VARCHAR strings, the octet length is the number of characters. Use LEN instead.

3.2.3.1.6.479 Syntax

OCTET_LENGTH( text_expr ) --> INT

3.2.3.1.6.480 Arguments

Parameter    Description
text_expr    TEXT expression

3.2.3.1.6.481 Returns

Returns an integer containing the number of bytes in the string.

3.2.3.1.6.482 Notes

• To get the length in characters, see CHAR_LENGTH.
• If the value is NULL, the result is NULL.

3.2.3.1.6.483 Examples

For these examples, consider the following table and contents:

CREATE TABLE alphabets(line NVARCHAR(50));
INSERT INTO alphabets VALUES
  ('abcdefghijklmnopqrstuvwxyz'), ('')
  ,('');

3.2.3.1.6.484 Length in characters and bytes of strings

ASCII characters take up 1 byte per character, while Thai takes up 3 bytes and Hebrew takes up 2 bytes.

t=> SELECT LEN(line), CHAR_LENGTH(line), OCTET_LENGTH(line) FROM alphabets;
len | char_length | octet_length
----+-------------+-------------
 26 |          26 |           26
 47 |          47 |          141
 22 |          22 |           44

3.2.3.1.6.485 PATINDEX

Returns the starting position of a pattern inside a string.

See also CHARINDEX, REGEXP_INSTR.

PATINDEX works like LIKE, so the standard wildcards are supported. You do not have to enclose the pattern between percent signs.

Note: This function is provided for SQL Server compatibility.
Reference SQream DB, Release 2021.1 3.2.3.1.6.486 Syntax PATINDEX ( string_test_expr , string_expr ) --> INT 3.2.3.1.6.487 Arguments Parameter Description string_test_expr Pattern to find string_expr String to search within 3.2.3.1.6.488 Test patterns Pat- Description tern % match zero or more characters _ match exactly one character [A-Z] match any character between A and Z inclusive. The - character between two other characters forms a range that matches all characters from the first character to the second. For example, [A-Z] matches all ASCII capital letters. [^A-Z]match any character not between A and Z [abcde]match any one of a b c d and e [^abcde]match any character that is not one of a b c d or e [abcC-F]match a b c or any character between C and F 3.2.3.1.6.489 Returns Integer start position of a match, or 0 if no match was found. 3.2.3.1.6.490 Notes • If the value is NULL, the result is NULL. • PATINDEX works on VARCHAR text types only. • PATINDEX does not work on all literal values - only on column values. (i.e. PATINDEX('%mimsy%', 'All mimsy were the borogoves') will not work, but PATINDEX('%mimsy%', line) will) 3.2.3.1.6.491 Examples For these examples, consider the following table and contents: CREATE TABLE jabberwocky(line VARCHAR(50)); INSERT INTO jabberwocky VALUES (continues on next page) 3.2. SQL Statements and Syntax 605 SQream DB, Release 2021.1 (continued from previous page) ('''Twas brillig, and the slithy toves '), (' Did gyre and gimble in the wabe: ') ,('All mimsy were the borogoves, '), (' And the mome raths outgrabe. ') ,('"Beware the Jabberwock, my son! '), (' The jaws that bite, the claws that␣ ,→catch! ') ,('Beware the Jubjub bird, and shun '), (' The frumious Bandersnatch!" 
'); 3.2.3.1.6.492 Simple patterns t=> SELECT line, PATINDEX('%mimsy%', line) as "Position of mimsy" FROM jabberwocky; line | Position of mimsy ------+------'Twas brillig, and the slithy toves | 0 Did gyre and gimble in the wabe: | 0 All mimsy were the borogoves, | 5 And the mome raths outgrabe. | 0 "Beware the Jabberwock, my son! | 0 The jaws that bite, the claws that catch! | 0 Beware the Jubjub bird, and shun | 0 The frumious Bandersnatch!" | 0 3.2.3.1.6.493 Complex wildcards expressions The following example uses the ^ negation operator to find the position of a character that is neither a number, a letter, or a space. t=> SELECT PATINDEX('%[^ 0-9A-z]%', line), line FROM jabberwocky; patindex | line ------+------1 | 'Twas brillig, and the slithy toves 38 | Did gyre and gimble in the wabe: 29 | All mimsy were the borogoves, 34 | And the mome raths outgrabe. 1 | "Beware the Jabberwock, my son! 25 | The jaws that bite, the claws that catch! 23 | Beware the Jubjub bird, and shun 32 | The frumious Bandersnatch!" 3.2.3.1.6.494 REGEXP_COUNT Counts the number of occurences that a string matches a given regular expression pattern. See also: REGEXP_INSTR, REGEXP_SUBSTR. 606 Chapter 3. Reference SQream DB, Release 2021.1 3.2.3.1.6.495 Syntax REGEXP_COUNT( string_expr, string_test_expr [, start_index ] ) --> INT 3.2.3.1.6.496 Arguments Parameter Description string_expr String to test string_test_expr Test pattern start_index The character index offset to start counting from. Defaults to1 3.2.3.1.6.497 Test patterns Pat- Description tern ^ Match the beginning of a string $ Match the end of a string . Match any character (including whitespace such as carriage return and newline) * Match the preceding pattern zero or more times + Match the preceding pattern at least once ? 
Match the preceding pattern once at most de|abcMatch either de or abc (abc)*Match zero or more instances of the sequence abc {2} Match the preceding pattern exactly two times {2, Match the preceding pattern between two and four times 4} [a-dX]Matches, any character that is (or is not when negated with ^) either a, b, c, d, or X. The - character [^a-dX]between two other characters forms a range that matches all characters from the first character to the second. For example, [0-9] matches any decimal digit. To include a literal ] character, it must immediately follow the opening bracket [. To include a literal - character, it must be written first or last. Any character that does not have a defined special meaning inside a [] pair matches only itself. 3.2.3.1.6.498 Returns Integer result of the number of times that a pattern occurs in a string. 3.2.3.1.6.499 Notes • The test pattern must be literal string. Column references or complex expressions are currently un- supported. • If the value is NULL, the result is NULL. 3.2. SQL Statements and Syntax 607 SQream DB, Release 2021.1 3.2.3.1.6.500 Examples For these examples, assume a table named nba, with the following structure: CREATE TABLE nba ( "Name" varchar(40), "Team" varchar(40), "Number" tinyint, "Position" varchar(2), "Age" tinyint, "Height" varchar(4), "Weight" real, "College" varchar(40), "Salary" float ); Here’s a peek at the table contents (Download nba.csv): Table 30: nba.csv Name Team Num- Posi- Age Height Weight College Salary ber tion Avery Boston 0.0 PG 25.0 6-2 180.0 Texas 7730337.0 Bradley Celtics Jae Crowder Boston 99.0 SF 25.0 6-6 235.0 Marquette 6796117.0 Celtics John Holland Boston 30.0 SG 27.0 6-5 205.0 Boston Univer- Celtics sity R.J. 
Hunter | Boston Celtics | 28.0 | SG | 22.0 | 6-5 | 185.0 | Georgia State | 1148640.0
Jonas Jerebko | Boston Celtics | 8.0 | PF | 29.0 | 6-10 | 231.0 | | 5000000.0
Amir Johnson | Boston Celtics | 90.0 | PF | 29.0 | 6-9 | 240.0 | | 12000000.0
Jordan Mickey | Boston Celtics | 55.0 | PF | 21.0 | 6-8 | 235.0 | LSU | 1170960.0
Kelly Olynyk | Boston Celtics | 41.0 | C | 25.0 | 7-0 | 238.0 | Gonzaga | 2165160.0
Terry Rozier | Boston Celtics | 12.0 | PG | 22.0 | 6-2 | 190.0 | Louisville | 1824360.0

3.2.3.1.6.501 Find players with at least 3 names (2 spaces)

nba=> SELECT "Name" FROM nba WHERE REGEXP_COUNT("Name", '( )+')>1;
Name
------------------------
James Michael McAdoo
Luc Richard Mbah a Moute
Larry Nance Jr.
Metta World Peace
Glenn Robinson III
Johnny O'Bryant III
Tim Hardaway Jr.
Frank Kaminsky III
Kelly Oubre Jr.
Otto Porter Jr.

3.2.3.1.6.502 Using the offset index

Count only the spaces that appear from the 8th character onwards.

nba=> SELECT "Name" FROM nba WHERE REGEXP_COUNT("Name", '( )+', 8)>1;
Name
------------------------
Luc Richard Mbah a Moute

3.2.3.1.6.503 REGEXP_INSTR

Returns the start position of a regex match. The function searches a string for a POSIX-style regular expression and returns the position within the string where the match was located.

See also: REGEXP_COUNT, REGEXP_SUBSTR.

3.2.3.1.6.504 Syntax

REGEXP_INSTR( string_expr, string_test_expr [, start_index [, occurence [, return_position ]]] ) --> INT

3.2.3.1.6.505 Arguments

Parameter           Description
string_expr         String to test
string_test_expr    Test pattern
start_index         The character index offset to start counting from. Defaults to 1
occurence           Which occurrence to search for. Defaults to 1
return_position     Specifies the location within the string to return. Using 0, the function returns the string position of the first character of the substring that matches the pattern. A value greater than 0 returns the position of the first character following the end of the pattern. Defaults to 0
SQL Statements and Syntax 609 SQream DB, Release 2021.1 3.2.3.1.6.506 Test patterns Pat- Description tern ^ Match the beginning of a string $ Match the end of a string . Match any character (including whitespace such as carriage return and newline) * Match the preceding pattern zero or more times + Match the preceding pattern at least once ? Match the preceding pattern once at most de|abcMatch either de or abc (abc)*Match zero or more instances of the sequence abc {2} Match the preceding pattern exactly two times {2, Match the preceding pattern between two and four times 4} [a-dX]Matches, any character that is (or is not when negated with ^) either a, b, c, d, or X. The - character [^a-dX]between two other characters forms a range that matches all characters from the first character to the second. For example, [0-9] matches any decimal digit. To include a literal ] character, it must immediately follow the opening bracket [. To include a literal - character, it must be written first or last. Any character that does not have a defined special meaning inside a [] pair matches only itself. 3.2.3.1.6.507 Returns Integer start position of a regex match, or 0 if no match was found. 3.2.3.1.6.508 Notes • The test pattern must be literal string. Column references or complex expressions are currently un- supported. • If the value is NULL, the result is NULL. 3.2.3.1.6.509 Examples For these examples, assume a table named nba, with the following structure: CREATE TABLE nba ( "Name" varchar(40), "Team" varchar(40), "Number" tinyint, "Position" varchar(2), "Age" tinyint, "Height" varchar(4), "Weight" real, "College" varchar(40), "Salary" float ); 610 Chapter 3. 
Reference SQream DB, Release 2021.1 Here’s a peek at the table contents (Download nba.csv): Table 31: nba.csv Name Team Num- Posi- Age Height Weight College Salary ber tion Avery Boston 0.0 PG 25.0 6-2 180.0 Texas 7730337.0 Bradley Celtics Jae Crowder Boston 99.0 SF 25.0 6-6 235.0 Marquette 6796117.0 Celtics John Holland Boston 30.0 SG 27.0 6-5 205.0 Boston Univer- Celtics sity R.J. Hunter Boston 28.0 SG 22.0 6-5 185.0 Georgia State 1148640.0 Celtics Jonas Jere- Boston 8.0 PF 29.0 6-10 231.0 5000000.0 bko Celtics Amir John- Boston 90.0 PF 29.0 6-9 240.0 12000000.0 son Celtics Jordan Boston 55.0 PF 21.0 6-8 235.0 LSU 1170960.0 Mickey Celtics Kelly Olynyk Boston 41.0 C 25.0 7-0 238.0 Gonzaga 2165160.0 Celtics Terry Rozier Boston 12.0 PG 22.0 6-2 190.0 Louisville 1824360.0 Celtics 3.2.3.1.6.510 Find players with ‘ow’ in their name nba=> SELECT "Name", REGEXP_INSTR("Name", 'ow') FROM nba WHERE REGEXP_COUNT("Name", 'ow ,→')>0; Name | regexp_instr ------+------Jae Crowder | 7 Markel Brown | 10 Langston Galloway | 14 Kyle Lowry | 7 Norman Powell | 9 Anthony Brown | 11 Cameron Bairstow | 15 Lorenzo Brown | 11 Dirk Nowitzki | 7 Dwight Powell | 9 Dwight Howard | 9 Justise Winslow | 14 Karl-Anthony Towns | 15 Anthony Morrow | 13 3.2.3.1.6.511 Using the return_position argument Get the second occurence of the letter ‘k’ in a player’s name. We set start_index to 1 (the default) 3.2. SQL Statements and Syntax 611 SQream DB, Release 2021.1 nba=> SELECT "Name", REGEXP_INSTR("Name", 'k', 1, 2) FROM nba WHERE REGEXP_INSTR("Name", ,→ 'k', 1, 2)>0; Name | regexp_instr ------+------Nik Stauskas | 10 Tarik Black | 11 Dirk Nowitzki | 12 Sam Dekker | 8 Kendrick Perkins | 13 Frank Kaminsky III | 13 Nikola Jokic | 10 Nikola Pekovic | 10 3.2.3.1.6.512 REGEXP_SUBSTR Returns the occurence of a regex match. See also: REGEXP_INSTR, REGEXP_COUNT. 
3.2.3.1.6.513 Syntax

REGEXP_SUBSTR( string_expr, string_test_expr [ , start_index [ , occurence ] ] )

3.2.3.1.6.514 Arguments

Parameter           Description
string_expr         String to test
string_test_expr    Test pattern
start_index         The character index offset to start counting from. Defaults to 1
occurence           Which occurrence to search for. Defaults to 1
return_position     Sets the position within the string to return. Using 0, the function returns the string position of the first character of the substring that matches the pattern. Defaults to 0

3.2.3.1.6.515 Test patterns

Pattern            Description
^                  Match the beginning of a string
$                  Match the end of a string
.                  Match any character (including whitespace such as carriage return and newline)
*                  Match the preceding pattern zero or more times
+                  Match the preceding pattern at least once
?                  Match the preceding pattern once at most
de|abc             Match either de or abc
(abc)*             Match zero or more instances of the sequence abc
{2}                Match the preceding pattern exactly two times
{2,4}              Match the preceding pattern between two and four times
[a-dX], [^a-dX]    Match any character that is (or is not, when negated with ^) either a, b, c, d, or X. The - character between two other characters forms a range that matches all characters from the first character to the second. For example, [0-9] matches any decimal digit. To include a literal ] character, it must immediately follow the opening bracket [. To include a literal - character, it must be written first or last. Any character that does not have a defined special meaning inside a [] pair matches only itself.

3.2.3.1.6.516 Returns

Returns the matched substring, with the same type as the argument supplied, or an empty string ('') if no match was found.

3.2.3.1.6.517 Notes

• The test pattern must be a literal string. Column references or complex expressions are currently unsupported.
• If the value is NULL, the result is NULL.
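The way start_index and occurence select a match can be sketched with Python's re module. This is a model under stated assumptions, not SQream DB's implementation: the helper name regexp_substr is hypothetical, Python's regex engine is not a POSIX engine (results may differ for some patterns, such as alternations), and matches are assumed to be counted left to right without overlapping.

```python
import re
from typing import Optional


def regexp_substr(string_expr: Optional[str], pattern: str,
                  start_index: int = 1, occurrence: int = 1) -> Optional[str]:
    """Model of REGEXP_SUBSTR: return the n-th match at or after start_index (1-based)."""
    if string_expr is None:
        return None  # NULL in, NULL out
    # Assumption: searching starts at the 1-based character offset start_index.
    matches = list(re.finditer(pattern, string_expr[start_index - 1:]))
    if occurrence > len(matches):
        return ""  # documented result when no match is found
    return matches[occurrence - 1].group(0)


# A word that contains an 'o' with at least one letter on each side.
WORD_WITH_O = r"([a-zA-Z]+o[a-zA-Z]+)+"

print(regexp_substr("Metta World Peace", WORD_WITH_O))
print(regexp_substr("Joe Young", WORD_WITH_O, 1, 2))
```

With the default occurence of 1 the first qualifying word is returned. Passing 2 skips to the second qualifying word.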
3.2.3.1.6.518 Examples For these examples, assume a table named nba, with the following structure: CREATE TABLE nba ( "Name" varchar(40), "Team" varchar(40), "Number" tinyint, "Position" varchar(2), "Age" tinyint, "Height" varchar(4), "Weight" real, "College" varchar(40), "Salary" float ); 3.2. SQL Statements and Syntax 613 SQream DB, Release 2021.1 Here’s a peek at the table contents (Download nba.csv): Table 32: nba.csv Name Team Num- Posi- Age Height Weight College Salary ber tion Avery Boston 0.0 PG 25.0 6-2 180.0 Texas 7730337.0 Bradley Celtics Jae Crowder Boston 99.0 SF 25.0 6-6 235.0 Marquette 6796117.0 Celtics John Holland Boston 30.0 SG 27.0 6-5 205.0 Boston Univer- Celtics sity R.J. Hunter Boston 28.0 SG 22.0 6-5 185.0 Georgia State 1148640.0 Celtics Jonas Jere- Boston 8.0 PF 29.0 6-10 231.0 5000000.0 bko Celtics Amir John- Boston 90.0 PF 29.0 6-9 240.0 12000000.0 son Celtics Jordan Boston 55.0 PF 21.0 6-8 235.0 LSU 1170960.0 Mickey Celtics Kelly Olynyk Boston 41.0 C 25.0 7-0 238.0 Gonzaga 2165160.0 Celtics Terry Rozier Boston 12.0 PG 22.0 6-2 190.0 Louisville 1824360.0 Celtics 3.2.3.1.6.519 Find players with ‘o’ in their name nba=> SELECT "Name", REGEXP_SUBSTR("Name", '([a-zA-Z]+o[a-zA-Z]+)+') FROM nba ORDER BY 2␣ ,→DESC LIMIT 10; Name | regexp_substr ------+------James Young | Young Thaddeus Young | Young Nick Young | Young Metta World Peace | World Christian Wood | Wood Justise Winslow | Winslow Wilson Chandler | Wilson C.J. Wilcox | Wilcox Shayne Whittington | Whittington Russell Westbrook | Westbrook 3.2.3.1.6.520 Using the return_position argument Get the last name (or middle name) for players with ‘o’ in their first and last name. We set start_index to 1 (the default) nba=> SELECT "Name", REGEXP_SUBSTR("Name", '([a-zA-Z]+o[a-zA-Z]+)+', 1, 2) FROM nba␣ ,→ORDER BY 2 DESC LIMIT 10; Name | regexp_substr (continues on next page) 614 Chapter 3. 
-------------------+--------------
Joe Young          | Young
Tony Wroten        | Wroten
Noah Vonleh        | Vonleh
Karl-Anthony Towns | Towns
Anthony Tolliver   | Tolliver
Hollis Thompson    | Thompson
Jason Thompson     | Thompson
Donald Sloan       | Sloan
Jonathon Simmons   | Simmons
Ramon Sessions     | Sessions

3.2.3.1.6.521 REPEAT

Repeats a string as many times as specified.

Warning: This function works ONLY with the TEXT data type.

3.2.3.1.6.522 Syntax

REPEAT(expr, character_count)

3.2.3.1.6.523 Arguments

3.2.3.1.6.524 Returns

Returns the same type as the argument supplied.

3.2.3.1.6.525 Notes

• When character_count <= 0, an empty string is returned.

3.2.3.1.6.526 Examples

For these examples, consider the following table and contents:

CREATE TABLE customer(customername TEXT);
INSERT INTO customer VALUES ('Alfreds Futterkiste'), ('Ana Trujillo Emparedados y helados'), ('Antonio Moreno Taquería'), ('Around the Horn');

3.2.3.1.6.527 Repeat the text in customername 2 times:

t=> SELECT REPEAT(customername, 2) FROM customer;
repeat
--------------------------------------------------------------------
Alfreds FutterkisteAlfreds Futterkiste
Ana Trujillo Emparedados y heladosAna Trujillo Emparedados y helados
Antonio Moreno TaqueríaAntonio Moreno Taquería
Around the HornAround the Horn

3.2.3.1.6.528 Repeat the string 0 times:

t=> SELECT REPEAT('abc', 0);
repeat
------
''

3.2.3.1.6.529 Repeat the string 1 time:

t=> SELECT REPEAT('abc', 1);
repeat
------
'abc'

3.2.3.1.6.530 Repeat the string 3 times:

t=> SELECT REPEAT('a', 3);
repeat
------
'aaa'

3.2.3.1.6.531 Repeat an empty string 10 times:

t=> SELECT REPEAT('', 10);
repeat
------
''

3.2.3.1.6.532 Repeat a string -3 times:

t=> SELECT REPEAT('abc',-3);
repeat
------
''

3.2.3.1.6.533 REPLACE

Replaces all occurrences of a specified string value with another string value.

Warning: With VARCHAR, a substring can only be replaced with another substring of equal byte length.
See OCTET_LENGTH. 3.2.3.1.6.534 Syntax REPLACE( expr, source_expr, replacement_expr ) 3.2.3.1.6.535 Arguments Parameter Description expr String expression source_expr String expression to be replaced replacement_expr String expression containing the replacement 3.2.3.1.6.536 Returns Returns the same type as the argument supplied. 3.2.3.1.6.537 Notes • In VARCHAR strings, the source_expr and replacement_expr must be the same byte length. See OCTET_LENGTH. • If the value is NULL, the result is NULL. 3.2.3.1.6.538 Examples For these examples, consider the following table and contents: 3.2. SQL Statements and Syntax 617 SQream DB, Release 2021.1 CREATE TABLE jabberwocky(line TEXT(50)); INSERT INTO jabberwocky VALUES ('''Twas brillig, and the slithy toves '), (' Did gyre and gimble in the wabe: ') ,('All mimsy were the borogoves, '), (' And the mome raths outgrabe. ') ,('"Beware the Jabberwock, my son! '), (' The jaws that bite, the claws that␣ ,→catch! ') ,('Beware the Jubjub bird, and shun '), (' The frumious Bandersnatch!" '); 3.2.3.1.6.539 Replacing single characters t=> SELECT REPLACE('What is all of this about?', 'u', 'o'); replace ------What is all of this aboot? 3.2.3.1.6.540 Replacing words t=> SELECT REPLACE(line, 'the', 'þe') FROM jabberwocky; replace ------'Twas brillig, and þe slithy toves Did gyre and gimble in þe wabe: All mimsy were þe borogoves, And þe mome raths outgrabe. "Beware þe Jabberwock, my son! The jaws that bite, þe claws that catch! Beware þe Jubjub bird, and shun The frumious Bandersnatch!" 3.2.3.1.6.541 REVERSE Returns a reversed order of a character string. 3.2.3.1.6.542 Syntax REVERSE( expr ) 3.2.3.1.6.543 Arguments Parameter Description expr String expression 618 Chapter 3. Reference SQream DB, Release 2021.1 3.2.3.1.6.544 Returns Returns the same type as the argument supplied. 3.2.3.1.6.545 Notes • If the value is NULL, the result is NULL. 
3.2.3.1.6.546 Examples For these examples, consider the following table and contents: CREATE TABLE jabberwocky(line TEXT(50)); INSERT INTO jabberwocky VALUES ('''Twas brillig, and the slithy toves '), (' Did gyre and gimble in the wabe: ') ,('All mimsy were the borogoves, '), (' And the mome raths outgrabe. ') ,('"Beware the Jabberwock, my son! '), (' The jaws that bite, the claws that␣ ,→catch! ') ,('Beware the Jubjub bird, and shun '), (' The frumious Bandersnatch!" '); 3.2.3.1.6.547 Using REVERSE t=> SELECT line, REVERSE(line) FROM jabberwocky; line | reverse ------+------,→------'Twas brillig, and the slithy toves | sevot yhtils eht dna ,gillirb sawT' Did gyre and gimble in the wabe: | :ebaw eht ni elbmig dna eryg diD All mimsy were the borogoves, | ,sevogorob eht erew ysmim llA And the mome raths outgrabe. | .ebargtuo shtar emom eht dnA "Beware the Jabberwock, my son! | !nos ym ,kcowrebbaJ eht eraweB" The jaws that bite, the claws that catch! | !hctac taht swalc eht ,etib taht␣ ,→swaj ehT Beware the Jubjub bird, and shun | nuhs dna ,drib bujbuJ eht eraweB The frumious Bandersnatch!" | "!hctansrednaB suoimurf ehT 3.2.3.1.6.548 RIGHT Returns the right part of a character string with the specified number of characters. See also LEFT. 3.2.3.1.6.549 Syntax RIGHT( expr , character_count ) 3.2. SQL Statements and Syntax 619 SQream DB, Release 2021.1 3.2.3.1.6.550 Arguments Parameter Description expr String expression character_count A positive integer that specifies how many characters to return. 3.2.3.1.6.551 Returns Returns the same type as the argument supplied. 3.2.3.1.6.552 Notes • This function works on TEXT strings only. • If the value is NULL, the result is NULL. 
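The REVERSE output above is a character-by-character reversal, so any trailing spaces in the stored value become leading spaces in the result. The short Python sketch below models that behaviour only. The helper name reverse is hypothetical, and reversing by Unicode code point (so that multi-byte characters stay intact) is an assumption about how TEXT values are handled.

```python
from typing import Optional


def reverse(expr: Optional[str]) -> Optional[str]:
    """Model of REVERSE: reverse the characters of a string, NULL in, NULL out."""
    if expr is None:
        return None
    # Slicing with a step of -1 walks the string from the last character to the first.
    return expr[::-1]


print(reverse("'Twas brillig, and the slithy toves"))
print(repr(reverse("All mimsy were the borogoves, ")))  # trailing space moves to the front
```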
3.2.3.1.6.553 Examples

For these examples, consider the following table and contents:

    CREATE TABLE jabberwocky(line TEXT(50));
    INSERT INTO jabberwocky VALUES
        ('''Twas brillig, and the slithy toves '), (' Did gyre and gimble in the wabe: ')
        ,('All mimsy were the borogoves, '), (' And the mome raths outgrabe. ')
        ,('"Beware the Jabberwock, my son! '), (' The jaws that bite, the claws that catch! ')
        ,('Beware the Jubjub bird, and shun '), (' The frumious Bandersnatch!" ');

3.2.3.1.6.554 Using RIGHT

    t=> SELECT LEFT(line, 10), RIGHT(line, 10) FROM jabberwocky;
    left | right
    ------+------
    'Twas bril | thy toves
    Did | the wabe:
    All mimsy | orogoves,
    And | outgrabe.
    "Beware th | , my son!
    The | at catch!
    Beware the | and shun
    The | rsnatch!"

3.2.3.1.6.555 RLIKE

Tests if a string matches a given regular expression pattern.

RLIKE and LIKE are similar, but RLIKE uses POSIX regular expressions instead of the SQL patterns.

See also: LIKE, REGEXP_COUNT, REGEXP_INSTR, REGEXP_SUBSTR.

3.2.3.1.6.556 Syntax

    string_expr [ NOT ] RLIKE string_test_expr

3.2.3.1.6.557 Arguments

    Parameter         Description
    string_expr       String to test
    string_test_expr  Test pattern

3.2.3.1.6.558 Test patterns

    Pattern          Description
    ^                Match the beginning of a string
    $                Match the end of a string
    .                Match any character (including whitespace such as carriage return and newline)
    *                Match the preceding pattern zero or more times
    +                Match the preceding pattern at least once
    ?                Match the preceding pattern once at most
    de|abc           Match either de or abc
    (abc)*           Match zero or more instances of the sequence abc
    {2}              Match the preceding pattern exactly two times
    {2,4}            Match the preceding pattern between two and four times
    [a-dX], [^a-dX]  Matches any character that is (or is not, when negated with ^) either a, b, c, d, or X. The - character between two other characters forms a range that matches all characters from the first character to the second. For example, [0-9] matches any decimal digit. To include a literal ] character, it must immediately follow the opening bracket [. To include a literal - character, it must be written first or last. Any character that does not have a defined special meaning inside a [] pair matches only itself.

3.2.3.1.6.559 Returns

TRUE if the test string matches the pattern, or FALSE otherwise.

3.2.3.1.6.560 Notes

• The test pattern must be a literal string. Column references or complex expressions are currently unsupported.
• If the value is NULL, the result is NULL.

3.2.3.1.6.561 Examples

For these examples, assume a table named nba, with the following structure:

    CREATE TABLE nba
    (
        Name varchar(40),
        Team varchar(40),
        Number tinyint,
        Position varchar(2),
        Age tinyint,
        Height varchar(4),
        Weight real,
        College varchar(40),
        Salary float
    );

Here’s a peek at the table contents (Download nba.csv):

Table 33: nba.csv

    Name | Team | Number | Position | Age | Height | Weight | College | Salary
    Avery Bradley | Boston Celtics | 0.0 | PG | 25.0 | 6-2 | 180.0 | Texas | 7730337.0
    Jae Crowder | Boston Celtics | 99.0 | SF | 25.0 | 6-6 | 235.0 | Marquette | 6796117.0
    John Holland | Boston Celtics | 30.0 | SG | 27.0 | 6-5 | 205.0 | Boston University |
    R.J. Hunter | Boston Celtics | 28.0 | SG | 22.0 | 6-5 | 185.0 | Georgia State | 1148640.0
    Jonas Jerebko | Boston Celtics | 8.0 | PF | 29.0 | 6-10 | 231.0 | | 5000000.0
    Amir Johnson | Boston Celtics | 90.0 | PF | 29.0 | 6-9 | 240.0 | | 12000000.0
    Jordan Mickey | Boston Celtics | 55.0 | PF | 21.0 | 6-8 | 235.0 | LSU | 1170960.0
    Kelly Olynyk | Boston Celtics | 41.0 | C | 25.0 | 7-0 | 238.0 | Gonzaga | 2165160.0
    Terry Rozier | Boston Celtics | 12.0 | PG | 22.0 | 6-2 | 190.0 | Louisville | 1824360.0

3.2.3.1.6.562 Match the beginning of a string

This form is equivalent to ... LIKE 'Portland%'

    nba=> SELECT "Name","Age","Salary","Team" FROM nba WHERE "Team" RLIKE '^(Portland)+' LIMIT 5;
    Name | Age | Salary | Team
    ------+-----+------+------
    Cliff Alexander | 20 | 525093 | Portland Trail Blazers
    Al-Farouq Aminu | 25 | 8042895 | Portland Trail Blazers
    Pat Connaughton | 23 | 625093 | Portland Trail Blazers
    Allen Crabbe | 24 | 947276 | Portland Trail Blazers
    Ed Davis | 27 | 6980802 | Portland Trail Blazers

3.2.3.1.6.563 Negate with NOT

    nba=> SELECT "Name","Age","Salary","Team" FROM nba WHERE "Team" NOT RLIKE '^(Portland)+' LIMIT 5;
    Name | Age | Salary | Team
    ------+-----+------+------
    Avery Bradley | 25 | 7730337 | Boston Celtics
    Jae Crowder | 25 | 6796117 | Boston Celtics
    John Holland | 27 | | Boston Celtics
    R.J. Hunter | 22 | 1148640 | Boston Celtics
    Jonas Jerebko | 29 | 5000000 | Boston Celtics

3.2.3.1.6.564 Match the middle of a string

    nba=> SELECT "Name","Age","Salary","Team" FROM nba WHERE "Team" RLIKE '(zz)' LIMIT 5;
    Name | Age | Salary | Team
    ------+-----+------+------
    Jordan Adams | 21 | 1404600 | Memphis Grizzlies
    Tony Allen | 34 | 5158539 | Memphis Grizzlies
    Chris Andersen | 37 | 5000000 | Memphis Grizzlies
    Matt Barnes | 36 | 3542500 | Memphis Grizzlies
    Vince Carter | 39 | 4088019 | Memphis Grizzlies

3.2.3.1.6.565 Find players with a Roman numeral suffix

Use $ to match only the end of the string.

    nba=> SELECT "Name","Age","Salary","Team" FROM nba WHERE "Name" RLIKE '[XCLVMI]$';
    Name | Age | Salary | Team
    ------+-----+------+------
    Glenn Robinson III | 22 | 1100000 | Indiana Pacers
    Johnny O'Bryant III | 23 | 845059 | Milwaukee Bucks
    Frank Kaminsky III | 23 | 2612520 | Charlotte Hornets

3.2.3.1.6.566 Find players with just one middle name

    nba=> SELECT "Name","Age","Salary","Team" FROM nba WHERE "Name" RLIKE '^[a-zA-Z]+ [a-zA-Z]+ [a-zA-Z]+$';
    Name | Age | Salary | Team
    ------+-----+------+------
    James Michael McAdoo | 23 | 845059 | Golden State Warriors
    Metta World Peace | 36 | 947276 | Los Angeles Lakers
    Glenn Robinson III | 22 | 1100000 | Indiana Pacers
    Frank Kaminsky III | 23 | 2612520 | Charlotte Hornets

3.2.3.1.6.567 RTRIM

Trims trailing whitespace from a string.

See also TRIM, LTRIM.

3.2.3.1.6.568 Syntax

    RTRIM( expr )

3.2.3.1.6.569 Arguments

    Parameter  Description
    expr       String expression

3.2.3.1.6.570 Returns

Returns the same type as the argument supplied.

3.2.3.1.6.571 Notes

• When using VARCHAR values, SQream DB automatically trims the trailing whitespace. Using RTRIM on VARCHAR does not affect the result.
• This function is equivalent to the ANSI form TRIM( TRAILING FROM expr )
• If the value is NULL, the result is NULL.

3.2.3.1.6.572 Examples

For these examples, consider the following table and contents:

    CREATE TABLE jabberwocky(line TEXT(50));
    INSERT INTO jabberwocky VALUES
        ('''Twas brillig, and the slithy toves '), (' Did gyre and gimble in the wabe: ')
        ,('All mimsy were the borogoves, '), (' And the mome raths outgrabe. ')
        ,('"Beware the Jabberwock, my son! '), (' The jaws that bite, the claws that catch! ')
        ,('Beware the Jubjub bird, and shun '), (' The frumious Bandersnatch!" ');

3.2.3.1.6.573 Trimming a literal value

In this example we use || (Concatenate) to show the whitespace.

    t=> SELECT '#' || RTRIM(' SQream DB ') || '#';
    rtrim
    ------
    # SQream DB#

3.2.3.1.6.574 Trimming a column of values

    t=> SELECT ('Line: ' || line || '#') as untrimmed, ('Line: ' || RTRIM(line) || '#') as trimmed FROM jabberwocky;
    untrimmed | trimmed
    ------+------
    Line: 'Twas brillig, and the slithy toves # | Line: 'Twas brillig, and the slithy toves#
    Line: Did gyre and gimble in the wabe: # | Line: Did gyre and gimble in the wabe:#
    Line: All mimsy were the borogoves, # | Line: All mimsy were the borogoves,#
    Line: And the mome raths outgrabe. # | Line: And the mome raths outgrabe.#
    Line: "Beware the Jabberwock, my son! # | Line: "Beware the Jabberwock, my son!#
    Line: The jaws that bite, the claws that catch! # | Line: The jaws that bite, the claws that catch!#
    Line: Beware the Jubjub bird, and shun # | Line: Beware the Jubjub bird, and shun#
    Line: The frumious Bandersnatch!" # | Line: The frumious Bandersnatch!"#

3.2.3.1.6.575 SUBSTRING

Returns a substring of the input starting at start_pos.

Note: Some systems call this function SUBSTR.

See also REGEXP_SUBSTR.

3.2.3.1.6.576 Syntax

    SUBSTRING( expr, start_pos, length )

3.2.3.1.6.577 Arguments

    Parameter  Description
    expr       String expression
    start_pos  Starting position (starts at 1)
    length     Number of characters to extract

3.2.3.1.6.578 Returns

Returns the same type as the argument supplied.

3.2.3.1.6.579 Notes

• Character count starts at 1.
• If the value is NULL, the result is NULL.

3.2.3.1.6.580 Examples

For these examples, assume a table named nba, with the following structure:

    CREATE TABLE nba
    (
        Name varchar(40),
        Team varchar(40),
        Number tinyint,
        Position varchar(2),
        Age tinyint,
        Height varchar(4),
        Weight real,
        College varchar(40),
        Salary float
    );

Here’s a peek at the table contents (Download nba.csv):

Table 34: nba.csv

    Name | Team | Number | Position | Age | Height | Weight | College | Salary
    Avery Bradley | Boston Celtics | 0.0 | PG | 25.0 | 6-2 | 180.0 | Texas | 7730337.0
    Jae Crowder | Boston Celtics | 99.0 | SF | 25.0 | 6-6 | 235.0 | Marquette | 6796117.0
    John Holland | Boston Celtics | 30.0 | SG | 27.0 | 6-5 | 205.0 | Boston University |
    R.J. Hunter | Boston Celtics | 28.0 | SG | 22.0 | 6-5 | 185.0 | Georgia State | 1148640.0
    Jonas Jerebko | Boston Celtics | 8.0 | PF | 29.0 | 6-10 | 231.0 | | 5000000.0
    Amir Johnson | Boston Celtics | 90.0 | PF | 29.0 | 6-9 | 240.0 | | 12000000.0
    Jordan Mickey | Boston Celtics | 55.0 | PF | 21.0 | 6-8 | 235.0 | LSU | 1170960.0
    Kelly Olynyk | Boston Celtics | 41.0 | C | 25.0 | 7-0 | 238.0 | Gonzaga | 2165160.0
    Terry Rozier | Boston Celtics | 12.0 | PG | 22.0 | 6-2 | 190.0 | Louisville | 1824360.0

3.2.3.1.6.581 Substring using fixed offsets

Get 4 characters, starting from the 4th character.

    nba=> SELECT SUBSTRING("Name", 4, 4) FROM nba LIMIT 5;
    substring
    ------
    ry B
     Cro
    n Ho
    . Hu
    as J

3.2.3.1.6.582 Truncating strings

Trim a string to 10 characters.

    nba=> SELECT SUBSTRING("Name", 1, 10) FROM nba LIMIT 5;
    substring
    ------
    Avery Brad
    Jae Crowde
    John Holla
    R.J. Hunte
    Jonas Jere

3.2.3.1.6.583 TRIM

Trims both leading and trailing whitespace from a string.

See also LTRIM, RTRIM.

3.2.3.1.6.584 Syntax

    TRIM( expr )

3.2.3.1.6.585 Arguments

    Parameter  Description
    expr       String expression

3.2.3.1.6.586 Returns

Returns the same type as the argument supplied.

3.2.3.1.6.587 Notes

• When using VARCHAR values, SQream DB automatically trims the trailing whitespace.
• This function is equivalent to the ANSI form TRIM( BOTH FROM expr )
• If the value is NULL, the result is NULL.

3.2.3.1.6.588 Examples

For these examples, consider the following table and contents:

    CREATE TABLE jabberwocky(line TEXT(50));
    INSERT INTO jabberwocky VALUES
        ('''Twas brillig, and the slithy toves '), (' Did gyre and gimble in the wabe: ')
        ,('All mimsy were the borogoves, '), (' And the mome raths outgrabe. ')
        ,('"Beware the Jabberwock, my son! '), (' The jaws that bite, the claws that catch! ')
        ,('Beware the Jubjub bird, and shun '), (' The frumious Bandersnatch!" ');

3.2.3.1.6.589 Trimming a literal value

In this example we use || (Concatenate) to show the whitespace.

    t=> SELECT '#' || TRIM(' SQream DB ') || '#';
    trim
    ------
    #SQream DB#

3.2.3.1.6.590 Trimming a column of values

    t=> SELECT ('Line: ' || line || '#') as untrimmed, ('Line: ' || TRIM(line) || '#') as trimmed FROM jabberwocky;
    untrimmed | trimmed
    ------+------
    Line: 'Twas brillig, and the slithy toves # | Line: 'Twas brillig, and the slithy toves#
    Line: Did gyre and gimble in the wabe: # | Line: Did gyre and gimble in the wabe:#
    Line: All mimsy were the borogoves, # | Line: All mimsy were the borogoves,#
    Line: And the mome raths outgrabe. # | Line: And the mome raths outgrabe.#
    Line: "Beware the Jabberwock, my son! # | Line: "Beware the Jabberwock, my son!#
    Line: The jaws that bite, the claws that catch! # | Line: The jaws that bite, the claws that catch!#
    Line: Beware the Jubjub bird, and shun # | Line: Beware the Jubjub bird, and shun#
    Line: The frumious Bandersnatch!" # | Line: The frumious Bandersnatch!"#

3.2.3.1.6.591 UPPER

Converts characters in a string to upper case.

See also: LOWER

3.2.3.1.6.592 Syntax

    UPPER( expr )

3.2.3.1.6.593 Arguments

    Parameter  Description
    expr       String expression

3.2.3.1.6.594 Returns

Returns the same type as the argument supplied.

3.2.3.1.6.595 Notes

• If the value is NULL, the result is NULL.
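One common use of UPPER is a comparison that does not depend on letter case. The following is a minimal sketch, assuming the nba sample table used elsewhere in this reference; the literal on the right-hand side is written in upper case so that it can match the converted column value.

```sql
-- Compare against an upper-case literal so that letter case in "Team" does not matter
SELECT "Name", "Team"
FROM nba
WHERE UPPER("Team") = 'BOSTON CELTICS';
```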
3.2.3.1.6.596 Examples

For these examples, consider the following table and contents:

    CREATE TABLE jabberwocky(line VARCHAR(50));
    INSERT INTO jabberwocky VALUES
        ('''Twas brillig, and the slithy toves'), (' Did gyre and gimble in the wabe:')
        ,('All mimsy were the borogoves,'), (' And the mome raths outgrabe.')
        ,('"Beware the Jabberwock, my son!'), (' The jaws that bite, the claws that catch!')
        ,('Beware the Jubjub bird, and shun'), (' The frumious Bandersnatch!"');

3.2.3.1.6.597 Upper-casing a literal value

    t=> SELECT UPPER('SQream DB');
    SQREAM DB

3.2.3.1.6.598 Upper-casing a column of values

    t=> SELECT UPPER(line) FROM jabberwocky;
    upper
    ------
    'TWAS BRILLIG, AND THE SLITHY TOVES
    DID GYRE AND GIMBLE IN THE WABE:
    ALL MIMSY WERE THE BOROGOVES,
    AND THE MOME RATHS OUTGRABE.
    "BEWARE THE JABBERWOCK, MY SON!
    THE JAWS THAT BITE, THE CLAWS THAT CATCH!
    BEWARE THE JUBJUB BIRD, AND SHUN
    THE FRUMIOUS BANDERSNATCH!"

3.2.3.1.6.599 User-Defined Functions

The following types of user-defined functions (UDFs) can be defined and configured by users:

• Python user-defined functions.
• Scalar SQL user-defined functions.

3.2.3.1.6.600 Scalar SQL UDF

A scalar SQL UDF is a user-defined function that returns a single value, such as the sum of a group of values. Scalar UDFs are different from table-value functions, which return a result set in the form of a table.

This page describes the correct syntax when building simple scalar UDFs and provides three examples.

3.2.3.1.6.601 Syntax

The following example shows the correct syntax for simple scalar SQL UDFs returning the type name:

    create_function_statement ::=
        CREATE [ OR REPLACE ] FUNCTION function_name (argument_list)
        RETURNS return_type
        AS $$
        { function_body }
        $$ LANGUAGE SQL
        ;

    function_name ::= identifier
    argument_list ::= { value_name type_name [, ...] }
    value_name ::= identifier
    return_type ::= type_name
    function_body ::= A valid SQL statement

3.2.3.1.6.602 Examples

3.2.3.1.6.603 Example 1 – Support for Different Syntax

Scalar SQL UDF supports standard functionality even when different syntax is used.

In the example below, the syntax dateadd is used instead of add_months, although the function of each is identical. In addition, the operation works correctly even though the order of the arguments in add_months (dt datetime, then n int) is different from the order MONTH, n, dt in dateadd.

    CREATE or replace FUNCTION add_months(dt datetime,n int)
    RETURNS datetime
    AS $$
    SELECT dateadd(MONTH ,n,dt)
    $$ LANGUAGE SQL;

3.2.3.1.6.604 Example 2 – Manipulating Strings

The Scalar SQL UDF can be used to manipulate strings.

The following example shows the correct syntax for converting a TEXT date to the DATE type:

    CREATE or replace FUNCTION STR_TO_DATE(f text)
    RETURNS date
    AS $$
    select (substring(f,1,4)||'-'||substring(f,5,2)||'-'||substring(f,7,2))::date
    $$ LANGUAGE SQL;

3.2.3.1.6.605 Example 3 – Manually Building Functionality

You can use the Scalar SQL UDF to manually build functionality for otherwise unsupported operations.

    CREATE OR REPLACE function "least_sq" (a float, b float) -- Replace the LEAST(from hql) function
    returns float as
    $$select case
    when a <= b then a
    when b < a then b
    when a is null then b
    when b is null then a
    else null
    end;
    $$
    language sql;

3.2.3.1.6.606 Usage Notes

The following usage notes apply when using simple scalar SQL UDFs:

• During this stage, the SQL embedded in the function body must be of the type SELECT expr;. Creating a UDF with invalid SQL, or with valid SQL of any other type, results in an error.
• As with Python UDFs, the argument list can be left empty.
• SQL UDFs can reference other UDFs, including Python UDFs.
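As a minimal sketch of how the functions from Examples 2 and 3 might be called once created, the statements below invoke them like built-in scalar functions. The argument values are illustrative only, and the sketch assumes both CREATE FUNCTION statements above have already been run.

```sql
-- Assumes STR_TO_DATE and least_sq were created as in Examples 2 and 3.
-- The function body assembles '2021-09-27' from the input and casts it to DATE.
SELECT STR_TO_DATE('20210927');

-- With a <= b, the first CASE branch applies, so the smaller argument is expected back.
SELECT least_sq(1.5, 2.5);
```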
NOTICE: A function cannot (directly or indirectly) reference itself (such as by referencing another function that references it).

Because SQL UDFs are one type of supported UDF, the following Python UDF characteristics apply:

• UDF permission rules - see Access Control.
• The get_function_ddl utility function works on these functions - see Getting the DDL for a Function.
• SQL UDFs should appear in the catalog with Python UDFs - see Finding Existing UDFs in the Catalog.

3.2.3.1.6.607 Restrictions

The following restrictions apply to simple scalar SQL UDFs:

• Simple scalar SQL UDFs cannot currently reference other UDFs.
• As with Python UDFs, SQream DB does not support overloading.

3.2.3.1.6.608 Aggregate Functions

3.2.3.1.6.609 Overview

Aggregate functions perform calculations based on a set of values and return a single value. Most aggregate functions ignore null values. Aggregate functions are often used with the GROUP BY clause of the SELECT statement.

3.2.3.1.6.610 Available Aggregate Functions

The following list shows the available aggregate functions:

3.2.3.1.6.611 AVG

Returns the average of numeric values.

3.2.3.1.6.612 Syntax

    -- As an aggregate
    AVG( expr )

    -- As a window function
    AVG ( expr ) OVER (
        [ PARTITION BY value_expression [, ...] ]
        [ ORDER BY value_expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] ]
        [ frame_clause ]
    )

3.2.3.1.6.613 Arguments

    Parameter  Description
    expr       Numeric expression

3.2.3.1.6.614 Returns

Return type is dependent on the argument.

• For TINYINT, SMALLINT and INT, the return type is INT.
• For BIGINT, the return type is BIGINT.
• For REAL, the return type is REAL.
• For DOUBLE, the return type is DOUBLE.

3.2.3.1.6.615 Notes

• NULL values are ignored.
• AVG relies on calculating the sum, which can very quickly overflow on large data sets. If the sum exceeds 2^31, for example, up-cast to a larger type like BIGINT: AVG(expr :: BIGINT)

3.2.3.1.6.616 Examples

For these examples, assume a table named nba, with the following structure:

    CREATE TABLE nba
    (
        "Name" varchar(40),
        "Team" varchar(40),
        "Number" tinyint,
        "Position" varchar(2),
        "Age" tinyint,
        "Height" varchar(4),
        "Weight" real,
        "College" varchar(40),
        "Salary" float
    );

Here’s a peek at the table contents (Download nba.csv):

Table 35: nba.csv

    Name | Team | Number | Position | Age | Height | Weight | College | Salary
    Avery Bradley | Boston Celtics | 0.0 | PG | 25.0 | 6-2 | 180.0 | Texas | 7730337.0
    Jae Crowder | Boston Celtics | 99.0 | SF | 25.0 | 6-6 | 235.0 | Marquette | 6796117.0
    John Holland | Boston Celtics | 30.0 | SG | 27.0 | 6-5 | 205.0 | Boston University |
    R.J. Hunter | Boston Celtics | 28.0 | SG | 22.0 | 6-5 | 185.0 | Georgia State | 1148640.0
    Jonas Jerebko | Boston Celtics | 8.0 | PF | 29.0 | 6-10 | 231.0 | | 5000000.0
    Amir Johnson | Boston Celtics | 90.0 | PF | 29.0 | 6-9 | 240.0 | | 12000000.0
    Jordan Mickey | Boston Celtics | 55.0 | PF | 21.0 | 6-8 | 235.0 | LSU | 1170960.0
    Kelly Olynyk | Boston Celtics | 41.0 | C | 25.0 | 7-0 | 238.0 | Gonzaga | 2165160.0
    Terry Rozier | Boston Celtics | 12.0 | PG | 22.0 | 6-2 | 190.0 | Louisville | 1824360.0

3.2.3.1.6.617 Simple average

    t=> SELECT AVG("Age") FROM nba;
    avg
    ---
    26

Note: The return type is the same as the input type.
To get a fractional result, cast the argument:

    t=> SELECT AVG("Age" :: REAL) FROM nba;
    avg
    -------
    26.9387

3.2.3.1.6.618 Combine AVG with other aggregates

    t=> SELECT "Age", AVG("Salary") as "Average salary", COUNT(*) as "Number of players" FROM nba GROUP BY 1;
    Age | Average salary | Number of players
    ----+----------------+------------------
    19 | 1930440 | 2
    20 | 2725790 | 19
    21 | 2067379 | 19
    22 | 2357963 | 26
    23 | 2034746 | 41
    24 | 3785300 | 47
    25 | 3930867 | 45
    26 | 6866566 | 36
    27 | 6676741 | 41
    28 | 5110188 | 31
    29 | 6224177 | 28
    30 | 7061858 | 31
    31 | 8511396 | 22
    32 | 7716958 | 13
    33 | 3930739 | 14
    34 | 7606030 | 10
    35 | 3461739 | 9
    36 | 2238119 | 10
    37 | 12777778 | 4
    38 | 1840041 | 4
    39 | 2517872 | 2
    40 | 4666916 | 3

3.2.3.1.6.619 CORR

Returns the Pearson correlation coefficient of value pairs.

3.2.3.1.6.620 Syntax

    -- As an aggregate
    CORR( expr1, expr2 )

    -- As a window function
    CORR ( expr1, expr2 ) OVER (
        [ PARTITION BY partition_expr [, ...] ]
        [ ORDER BY order [ ASC | DESC ] [, ...] ]
    )

3.2.3.1.6.621 Arguments

    Parameter     Description
    expr1, expr2  Numeric expression

3.2.3.1.6.622 Returns

Returns the Pearson correlation coefficient with type DOUBLE.

3.2.3.1.6.623 Notes

• When all rows contain NULL values, the function returns NULL.

3.2.3.1.6.624 Examples

For these examples, assume a table named nba, with the following structure:

    CREATE TABLE nba
    (
        "Name" varchar(40),
        "Team" varchar(40),
        "Number" tinyint,
        "Position" varchar(2),
        "Age" tinyint,
        "Height" varchar(4),
        "Weight" real,
        "College" varchar(40),
        "Salary" float
    );

Here’s a peek at the table contents (Download nba.csv):

Table 36: nba.csv

    Name | Team | Number | Position | Age | Height | Weight | College | Salary
    Avery Bradley | Boston Celtics | 0.0 | PG | 25.0 | 6-2 | 180.0 | Texas | 7730337.0
    Jae Crowder | Boston Celtics | 99.0 | SF | 25.0 | 6-6 | 235.0 | Marquette | 6796117.0
    John Holland | Boston Celtics | 30.0 | SG | 27.0 | 6-5 | 205.0 | Boston University |
    R.J. Hunter | Boston Celtics | 28.0 | SG | 22.0 | 6-5 | 185.0 | Georgia State | 1148640.0
    Jonas Jerebko | Boston Celtics | 8.0 | PF | 29.0 | 6-10 | 231.0 | | 5000000.0
    Amir Johnson | Boston Celtics | 90.0 | PF | 29.0 | 6-9 | 240.0 | | 12000000.0
    Jordan Mickey | Boston Celtics | 55.0 | PF | 21.0 | 6-8 | 235.0 | LSU | 1170960.0
    Kelly Olynyk | Boston Celtics | 41.0 | C | 25.0 | 7-0 | 238.0 | Gonzaga | 2165160.0
    Terry Rozier | Boston Celtics | 12.0 | PG | 22.0 | 6-2 | 190.0 | Louisville | 1824360.0

3.2.3.1.6.625 Simple correlation

    t=> SELECT "Team", CORR("Age","Salary") FROM nba GROUP BY 1 ORDER BY 2 ASC;
    Team | corr
    ------+------
    Cleveland Cavaliers | -0.3219
    San Antonio Spurs | -0.2015
    Oklahoma City Thunder | -0.1236
    Detroit Pistons | -0.0678
    New Orleans Pelicans | -0.0459
    Los Angeles Clippers | -0.0279
    Utah Jazz | 0.0913
    Washington Wizards | 0.1217
    Dallas Mavericks | 0.1388
    Sacramento Kings | 0.1489
    Milwaukee Bucks | 0.1626
    Golden State Warriors | 0.1648
    Minnesota Timberwolves | 0.1909
    Denver Nuggets | 0.2035
    Houston Rockets | 0.2051
    Philadelphia 76ers | 0.2645
    Chicago Bulls | 0.2663
    Phoenix Suns | 0.2808
    Orlando Magic | 0.2878
    Toronto Raptors | 0.2916
    Memphis Grizzlies | 0.3225
    Miami Heat | 0.3635
    Charlotte Hornets | 0.3779
    Brooklyn Nets | 0.4084
    Indiana Pacers | 0.4261
    Atlanta Hawks | 0.4321
    New York Knicks | 0.4401
    Los Angeles Lakers | 0.4563
    Portland Trail Blazers | 0.4856
    Boston Celtics | 0.6904

3.2.3.1.6.626 COUNT

Returns the count of numeric values, or only the distinct values.

3.2.3.1.6.627 Syntax

    -- As an aggregate
    COUNT( { [ DISTINCT ] expr | * } ) --> BIGINT

    -- As a window function
    COUNT ( { [ DISTINCT ] expr | * } ) OVER (
        [ PARTITION BY value_expression [, ...] ]
        [ ORDER BY value_expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] ]
        [ frame_clause ]
    )

3.2.3.1.6.628 Arguments

    Parameter  Description
    expr       Any value expression
    *          Specifies that COUNT should count all rows. It does not use information about any particular column, and preserves duplicate rows and NULL values.
    DISTINCT   Specifies that the operation should operate only on unique values

3.2.3.1.6.629 Returns

• COUNT returns BIGINT.

3.2.3.1.6.630 Notes

• NULL values are not ignored by COUNT
• When all rows contain NULL values, the function returns NULL.

3.2.3.1.6.631 Examples

For these examples, assume a table named nba, with the following structure:

    CREATE TABLE nba
    (
        "Name" varchar(40),
        "Team" varchar(40),
        "Number" tinyint,
        "Position" varchar(2),
        "Age" tinyint,
        "Height" varchar(4),
        "Weight" real,
        "College" varchar(40),
        "Salary" float
    );

Here’s a peek at the table contents (Download nba.csv):

Table 37: nba.csv

    Name | Team | Number | Position | Age | Height | Weight | College | Salary
    Avery Bradley | Boston Celtics | 0.0 | PG | 25.0 | 6-2 | 180.0 | Texas | 7730337.0
    Jae Crowder | Boston Celtics | 99.0 | SF | 25.0 | 6-6 | 235.0 | Marquette | 6796117.0
    John Holland | Boston Celtics | 30.0 | SG | 27.0 | 6-5 | 205.0 | Boston University |
    R.J. Hunter | Boston Celtics | 28.0 | SG | 22.0 | 6-5 | 185.0 | Georgia State | 1148640.0
    Jonas Jerebko | Boston Celtics | 8.0 | PF | 29.0 | 6-10 | 231.0 | | 5000000.0
    Amir Johnson | Boston Celtics | 90.0 | PF | 29.0 | 6-9 | 240.0 | | 12000000.0
    Jordan Mickey | Boston Celtics | 55.0 | PF | 21.0 | 6-8 | 235.0 | LSU | 1170960.0
    Kelly Olynyk | Boston Celtics | 41.0 | C | 25.0 | 7-0 | 238.0 | Gonzaga | 2165160.0
    Terry Rozier | Boston Celtics | 12.0 | PG | 22.0 | 6-2 | 190.0 | Louisville | 1824360.0

3.2.3.1.6.632 Count rows in a table

    t=> SELECT COUNT(*) FROM nba;
    count
    -----
    457

3.2.3.1.6.633 Count distinct values in a table

These two forms are equivalent:

    t=> SELECT COUNT(distinct "Age") FROM nba;
    count
    -----
    22

    t=> SELECT COUNT(*) FROM (SELECT "Age" FROM nba GROUP BY 1);
    count
    -----
    22

3.2.3.1.6.634 Combine COUNT with other aggregates

    t=> SELECT "Age", AVG("Salary") as "Average salary", COUNT(*) as "Number of players" FROM nba GROUP BY 1;
    Age | Average salary | Number of players
    ----+----------------+------------------
    19 | 1930440 | 2
    20 | 2725790 | 19
    21 | 2067379 | 19
    22 | 2357963 | 26
    23 | 2034746 | 41
    24 | 3785300 | 47
    25 | 3930867 | 45
    26 | 6866566 | 36
    27 | 6676741 | 41
    28 | 5110188 | 31
    29 | 6224177 | 28
    30 | 7061858 | 31
    31 | 8511396 | 22
    32 | 7716958 | 13
    33 | 3930739 | 14
    34 | 7606030 | 10
    35 | 3461739 | 9
    36 | 2238119 | 10
    37 | 12777778 | 4
    38 | 1840041 | 4
    39 | 2517872 | 2
    40 | 4666916 | 3

3.2.3.1.6.635 COVAR_POP

Returns the population covariance of value pairs.

See also: COVAR_SAMP

3.2.3.1.6.636 Syntax

    -- As an aggregate
    COVAR_POP( expr1, expr2 )

    -- As a window function
    COVAR_POP ( expr1, expr2 ) OVER (
        [ PARTITION BY value_expression [, ...] ]
        [ ORDER BY value_expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] ]
        [ frame_clause ]
    )

3.2.3.1.6.637 Arguments

    Parameter     Description
    expr1, expr2  Numeric expression

3.2.3.1.6.638 Returns

Returns the population covariance with type DOUBLE.

3.2.3.1.6.639 Notes

• When all rows contain NULL values, the function returns NULL.

3.2.3.1.6.640 Examples

For these examples, assume a table named nba, with the following structure:

    CREATE TABLE nba
    (
        "Name" varchar(40),
        "Team" varchar(40),
        "Number" tinyint,
        "Position" varchar(2),
        "Age" tinyint,
        "Height" varchar(4),
        "Weight" real,
        "College" varchar(40),
        "Salary" float
    );

Here’s a peek at the table contents (Download nba.csv):

Table 38: nba.csv

    Name | Team | Number | Position | Age | Height | Weight | College | Salary
    Avery Bradley | Boston Celtics | 0.0 | PG | 25.0 | 6-2 | 180.0 | Texas | 7730337.0
    Jae Crowder | Boston Celtics | 99.0 | SF | 25.0 | 6-6 | 235.0 | Marquette | 6796117.0
    John Holland | Boston Celtics | 30.0 | SG | 27.0 | 6-5 | 205.0 | Boston University |
    R.J. Hunter | Boston Celtics | 28.0 | SG | 22.0 | 6-5 | 185.0 | Georgia State | 1148640.0
    Jonas Jerebko | Boston Celtics | 8.0 | PF | 29.0 | 6-10 | 231.0 | | 5000000.0
    Amir Johnson | Boston Celtics | 90.0 | PF | 29.0 | 6-9 | 240.0 | | 12000000.0
    Jordan Mickey | Boston Celtics | 55.0 | PF | 21.0 | 6-8 | 235.0 | LSU | 1170960.0
    Kelly Olynyk | Boston Celtics | 41.0 | C | 25.0 | 7-0 | 238.0 | Gonzaga | 2165160.0
    Terry Rozier | Boston Celtics | 12.0 | PG | 22.0 | 6-2 | 190.0 | Louisville | 1824360.0

3.2.3.1.6.641 Simple covariance

    t=> SELECT "Team", COVAR_POP("Age","Salary") FROM nba GROUP BY 1 ORDER BY 2 ASC;
    Team | covar_pop
    ------+------
    Cleveland Cavaliers | -9192828.8163
    San Antonio Spurs | -6938848.9867
    Oklahoma City Thunder | -3696440.1244
    Detroit Pistons | -1313564.5067
    Los Angeles Clippers | -968077.3778
    New Orleans Pelicans | -440360.5374
    Utah Jazz | 930824.7689
    Philadelphia 76ers | 1458110.3265
    Sacramento Kings | 2306106.08
    Dallas Mavericks | 2418733.4356
    Washington Wizards | 2427928.8978
    Milwaukee Bucks | 2616404.8555
    Orlando Magic | 2812867.8673
    Golden State Warriors | 3352356.3333
    Portland Trail Blazers | 3941655.4533
    Denver Nuggets | 3966387.1122
    Minnesota Timberwolves | 4492620.0237
    Toronto Raptors | 4524417.1244
    Charlotte Hornets | 5056864.8
    Houston Rockets | 5309246.2089
    Phoenix Suns | 5580976.6889
    Indiana Pacers | 5757986.9067
    Boston Celtics | 5797738.7245
    Brooklyn Nets | 6119732.0667
    Chicago Bulls | 6506357.92
    Atlanta Hawks | 8859452.0667
    Memphis Grizzlies | 9524269
    New York Knicks | 10264800.6875
    Miami Heat | 13009610.4734
    Los Angeles Lakers | 15400203.6533

3.2.3.1.6.642 COVAR_SAMP

Returns the sample covariance of value pairs.

See also: COVAR_POP

3.2.3.1.6.643 Syntax

    -- As an aggregate
    COVAR_SAMP( expr1, expr2 )

    -- As a window function
    COVAR_SAMP ( expr1, expr2 ) OVER (
        [ PARTITION BY value_expression [, ...] ]
        [ ORDER BY value_expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] ]
        [ frame_clause ]
    )

3.2.3.1.6.644 Arguments

    Parameter     Description
    expr1, expr2  Numeric expression

3.2.3.1.6.645 Returns

Returns the sample covariance with type DOUBLE.

3.2.3.1.6.646 Notes

• When all rows contain NULL values, the function returns NULL.
• The function also returns NULL when only one value is non-NULL.

3.2.3.1.6.647 Examples

For these examples, assume a table named nba, with the following structure:

    CREATE TABLE nba
    (
        "Name" varchar(40),
        "Team" varchar(40),
        "Number" tinyint,
        "Position" varchar(2),
        "Age" tinyint,
        "Height" varchar(4),
        "Weight" real,
        "College" varchar(40),
        "Salary" float
    );

Here’s a peek at the table contents (Download nba.csv):

Table 39: nba.csv

    Name | Team | Number | Position | Age | Height | Weight | College | Salary
    Avery Bradley | Boston Celtics | 0.0 | PG | 25.0 | 6-2 | 180.0 | Texas | 7730337.0
    Jae Crowder | Boston Celtics | 99.0 | SF | 25.0 | 6-6 | 235.0 | Marquette | 6796117.0
    John Holland | Boston Celtics | 30.0 | SG | 27.0 | 6-5 | 205.0 | Boston University |
    R.J. Hunter | Boston Celtics | 28.0 | SG | 22.0 | 6-5 | 185.0 | Georgia State | 1148640.0
    Jonas Jerebko | Boston Celtics | 8.0 | PF | 29.0 | 6-10 | 231.0 | | 5000000.0
    Amir Johnson | Boston Celtics | 90.0 | PF | 29.0 | 6-9 | 240.0 | | 12000000.0
    Jordan Mickey | Boston Celtics | 55.0 | PF | 21.0 | 6-8 | 235.0 | LSU | 1170960.0
    Kelly Olynyk | Boston Celtics | 41.0 | C | 25.0 | 7-0 | 238.0 | Gonzaga | 2165160.0
    Terry Rozier | Boston Celtics | 12.0 | PG | 22.0 | 6-2 | 190.0 | Louisville | 1824360.0

3.2.3.1.6.648 Simple covariance

    t=> SELECT "Team", COVAR_SAMP("Age","Salary") FROM nba GROUP BY 1 ORDER BY 2 ASC;
    Team | covar_samp
    ------+------
    Cleveland Cavaliers | -9899969.4945
    San Antonio Spurs | -7434481.0571
    Oklahoma City Thunder | -3960471.5619
    Detroit Pistons | -1407390.5429
    Los Angeles Clippers | -1037225.7619
    New Orleans Pelicans | -464825.0117
    Utah Jazz | 997312.2524
    Philadelphia 76ers | 1570272.6593
    Sacramento Kings | 2470827.9429
    Dallas Mavericks | 2591500.1095
    Washington Wizards | 2601352.3905
    Milwaukee Bucks | 2790831.8458
    Orlando Magic | 3029242.3187
    Golden State Warriors | 3591810.3571
    Portland Trail Blazers | 4223202.2714
    Denver Nuggets | 4271493.8132
    Toronto Raptors | 4847589.7762
    Minnesota Timberwolves | 4867005.0256
    Charlotte Hornets | 5418069.4286
    Houston Rockets | 5688478.081
    Phoenix Suns | 5979617.881
    Indiana Pacers | 6169271.6857
    Boston Celtics | 6243718.6264
    Brooklyn Nets | 6556855.7857
    Chicago Bulls | 6971097.7714
    Atlanta Hawks | 9492270.0714
    Memphis Grizzlies | 10256905.0769
    New York Knicks | 10949120.7333
    Miami Heat | 14093744.6795
    Los Angeles Lakers | 16500218.2

3.2.3.1.6.649 MAX

Returns the maximum value.

3.2.3.1.6.650 Syntax

    -- As an aggregate
    MAX( expr )

    -- As a window function
    MAX ( expr ) OVER (
        [ PARTITION BY value_expression [, ...] ]
        [ ORDER BY value_expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] ]
        [ frame_clause ]
    )

3.2.3.1.6.651 Arguments

    Parameter  Description
    expr       Value expression

3.2.3.1.6.652 Returns

Return type is dependent on the argument.

3.2.3.1.6.653 Notes

• NULL values are ignored.

3.2.3.1.6.654 Examples

For these examples, assume a table named nba, with the following structure:

    CREATE TABLE nba
    (
        "Name" varchar(40),
        "Team" varchar(40),
        "Number" tinyint,
        "Position" varchar(2),
        "Age" tinyint,
        "Height" varchar(4),
        "Weight" real,
        "College" varchar(40),
        "Salary" float
    );

Here’s a peek at the table contents (Download nba.csv):

Table 40: nba.csv

    Name | Team | Number | Position | Age | Height | Weight | College | Salary
    Avery Bradley | Boston Celtics | 0.0 | PG | 25.0 | 6-2 | 180.0 | Texas | 7730337.0
    Jae Crowder | Boston Celtics | 99.0 | SF | 25.0 | 6-6 | 235.0 | Marquette | 6796117.0
    John Holland | Boston Celtics | 30.0 | SG | 27.0 | 6-5 | 205.0 | Boston University |
    R.J. Hunter | Boston Celtics | 28.0 | SG | 22.0 | 6-5 | 185.0 | Georgia State | 1148640.0
    Jonas Jerebko | Boston Celtics | 8.0 | PF | 29.0 | 6-10 | 231.0 | | 5000000.0
    Amir Johnson | Boston Celtics | 90.0 | PF | 29.0 | 6-9 | 240.0 | | 12000000.0
    Jordan Mickey | Boston Celtics | 55.0 | PF | 21.0 | 6-8 | 235.0 | LSU | 1170960.0
    Kelly Olynyk | Boston Celtics | 41.0 | C | 25.0 | 7-0 | 238.0 | Gonzaga | 2165160.0
    Terry Rozier | Boston Celtics | 12.0 | PG | 22.0 | 6-2 | 190.0 | Louisville | 1824360.0

3.2.3.1.6.655 Simple Minimum, Maximum on numeric columns

    t=> SELECT MIN("Age"), MAX("Age") FROM nba;
    min | max
    ----+----
    19 | 40

3.2.3.1.6.656 Minimum and maximum on text columns

    t=> SELECT MIN("Name"), MAX("Name") FROM nba;
    min | max
    ------+------
    Aaron Brooks | Zaza Pachulia

3.2.3.1.6.657 Combine MAX with GROUP BY

    t=> SELECT "Team", MAX("Salary") FROM nba GROUP BY 1 ORDER BY 2 DESC LIMIT 5;
    Team | max
    ------+------
    Los Angeles Lakers | 25000000
    Cleveland Cavaliers | 22970500
    New York Knicks | 22875000
    Houston Rockets | 22359364
    Miami Heat | 22192730

3.2.3.1.6.658 MIN

Returns the minimum value.

3.2.3.1.6.659 Syntax

    -- As an aggregate
    MIN( expr )

    -- As a window function
    MIN ( expr ) OVER (
        [ PARTITION BY value_expression [, ...] ]
        [ ORDER BY value_expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] ]
        [ frame_clause ]
    )

3.2.3.1.6.660 Arguments

    Parameter  Description
    expr       Value expression

3.2.3.1.6.661 Returns

Return type is dependent on the argument.

3.2.3.1.6.662 Notes

• NULL values are ignored.

3.2.3.1.6.663 Examples

For these examples, assume a table named nba, with the following structure:

    CREATE TABLE nba
    (
        "Name" varchar(40),
        "Team" varchar(40),
        "Number" tinyint,
        "Position" varchar(2),
        "Age" tinyint,
        "Height" varchar(4),
        "Weight" real,
        "College" varchar(40),
        "Salary" float
    );

Here’s a peek at the table contents (Download nba.csv):

Table 41: nba.csv

    Name | Team | Number | Position | Age | Height | Weight | College | Salary
    Avery Bradley | Boston Celtics | 0.0 | PG | 25.0 | 6-2 | 180.0 | Texas | 7730337.0
    Jae Crowder | Boston Celtics | 99.0 | SF | 25.0 | 6-6 | 235.0 | Marquette | 6796117.0
    John Holland | Boston Celtics | 30.0 | SG | 27.0 | 6-5 | 205.0 | Boston University |
    R.J.
Hunter Boston 28.0 SG 22.0 6-5 185.0 Georgia State 1148640.0 Celtics Jonas Jere- Boston 8.0 PF 29.0 6-10 231.0 5000000.0 bko Celtics Amir John- Boston 90.0 PF 29.0 6-9 240.0 12000000.0 son Celtics Jordan Boston 55.0 PF 21.0 6-8 235.0 LSU 1170960.0 Mickey Celtics Kelly Olynyk Boston 41.0 C 25.0 7-0 238.0 Gonzaga 2165160.0 Celtics Terry Rozier Boston 12.0 PG 22.0 6-2 190.0 Louisville 1824360.0 Celtics 648 Chapter 3. Reference SQream DB, Release 2021.1 3.2.3.1.6.664 Simple Minimum, Maximum on numeric columns t=> SELECT MIN("Age"), MAX("Age") FROM nba; min | max ----+---- 19 | 40 3.2.3.1.6.665 Minimum and maximum on text columns t=> SELECT MIN("Name"), MAX("Name") FROM nba; min | max ------+------Aaron Brooks | Zaza Pachulia 3.2.3.1.6.666 Combine MIN with GROUP BY t=> SELECT "Team", MIN("Salary") FROM nba GROUP BY 1 ORDER BY 2 DESC LIMIT 5; Team | min ------+------Boston Celtics | 1148640 Minnesota Timberwolves | 947276 Utah Jazz | 900000 Orlando Magic | 845059 Memphis Grizzlies | 700902 3.2.3.1.6.667 STDDEV_POP Returns the population standard deviation of values. Note: Aliases to this function include STDEVP for compatibility. See also: STDDEV_SAMP 3.2.3.1.6.668 Syntax -- As an aggregate STDDEV_POP( expr ) STDEVP( expr ) -- As a window function STDDEVP_POP ( expr ) OVER ( [ PARTITION BY value_expression [, ...]] [ ORDER BY value_expression [ ASC | DESC ][ NULLS { FIRST | LAST } ] [, ...]] [ frame_clause ] ) 3.2. SQL Statements and Syntax 649 SQream DB, Release 2021.1 3.2.3.1.6.669 Arguments Parameter Description expr Numeric expression 3.2.3.1.6.670 Returns Returns the population deviation with type DOUBLE. 3.2.3.1.6.671 Notes • When all rows contain NULL values, the function returns NULL. 
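To make the arithmetic concrete, here is a minimal Python sketch (not SQream DB code) of what a population standard deviation computes. The helper name stddev_pop is hypothetical, and skipping None values is an assumption made here to mirror how SQL aggregates usually treat NULL. The sample ages are the Age values of the nine rows in the nba.csv peek.

```python
import math


def stddev_pop(values):
    """Population standard deviation of the non-None values (hypothetical helper)."""
    data = [v for v in values if v is not None]  # assumption: NULLs are skipped
    if not data:
        return None  # mirrors the documented NULL result when every row is NULL
    mean = sum(data) / len(data)
    # Population variance divides by n, the number of values.
    variance = sum((v - mean) ** 2 for v in data) / len(data)
    return math.sqrt(variance)


ages = [25, 25, 27, 22, 29, 29, 21, 25, 22]  # "Age" column of the nine sample rows
print(stddev_pop(ages))
print(stddev_pop([None, None]))  # None
```

The division by n (rather than n - 1) is what separates the population form from the sample form computed by STDDEV_SAMP.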
3.2.3.1.6.672 Examples

For these examples, assume a table named nba, with the following structure:

    CREATE TABLE nba
    (
        "Name" varchar(40),
        "Team" varchar(40),
        "Number" tinyint,
        "Position" varchar(2),
        "Age" tinyint,
        "Height" varchar(4),
        "Weight" real,
        "College" varchar(40),
        "Salary" float
    );

Here’s a peek at the table contents (Download nba.csv):

Table 42: nba.csv

Name          | Team           | Number | Position | Age  | Height | Weight | College           | Salary
--------------+----------------+--------+----------+------+--------+--------+-------------------+-----------
Avery Bradley | Boston Celtics | 0.0    | PG       | 25.0 | 6-2    | 180.0  | Texas             | 7730337.0
Jae Crowder   | Boston Celtics | 99.0   | SF       | 25.0 | 6-6    | 235.0  | Marquette         | 6796117.0
John Holland  | Boston Celtics | 30.0   | SG       | 27.0 | 6-5    | 205.0  | Boston University |
R.J. Hunter   | Boston Celtics | 28.0   | SG       | 22.0 | 6-5    | 185.0  | Georgia State     | 1148640.0
Jonas Jerebko | Boston Celtics | 8.0    | PF       | 29.0 | 6-10   | 231.0  |                   | 5000000.0
Amir Johnson  | Boston Celtics | 90.0   | PF       | 29.0 | 6-9    | 240.0  |                   | 12000000.0
Jordan Mickey | Boston Celtics | 55.0   | PF       | 21.0 | 6-8    | 235.0  | LSU               | 1170960.0
Kelly Olynyk  | Boston Celtics | 41.0   | C        | 25.0 | 7-0    | 238.0  | Gonzaga           | 2165160.0
Terry Rozier  | Boston Celtics | 12.0   | PG       | 22.0 | 6-2    | 190.0  | Louisville        | 1824360.0

3.2.3.1.6.673 Simple standard deviation

    t=> SELECT "Team", STDDEV_POP("Age") FROM nba GROUP BY 1 LIMIT 10;
    Team                  | stddev_pop
    ----------------------+-----------
    Atlanta Hawks         | 4.0857
    Boston Celtics        | 2.7439
    Brooklyn Nets         | 2.9166
    Charlotte Hornets     | 3.0521
    Chicago Bulls         | 4.0464
    Cleveland Cavaliers   | 3.9811
    Dallas Mavericks      | 3.5864
    Denver Nuggets        | 4.5821
    Detroit Pistons       | 4.2926
    Golden State Warriors | 3.7178

3.2.3.1.6.674 Combine STDEVP with other aggregates

    t=> SELECT "Age", AVG("Salary"), STDDEV_SAMP("Salary"), STDEVP("Salary") FROM nba GROUP BY 1;
    Age | avg      | stddev_samp  | stddev_pop
    ----+----------+--------------+-------------
    19  | 1930440  | 279165.7572  | 197400
    20  | 2725790  | 1510913.4308 | 1470615.1437
    21  | 2067379  | 1412350.3607 | 1374680.8959
    22  | 2357963  | 1517378.326  | 1487911.8642
    23  | 2034746  | 2728292.0728 | 2693086.8292
    24  | 3785300  | 4803383.8083 | 4749713.0309
    25  | 3930867  | 4558462.9038 | 4506364.4739
    26  | 6866566  | 6100470.7879 | 6015145.316
    27  | 6676741  | 6831984.073  | 6746043.7454
    28  | 5110188  | 4316626.5733 | 4244073.0603
    29  | 6224177  | 4870705.8411 | 4779656.5821
    30  | 7061858  | 5408669.6434 | 5317761.1581
    31  | 8511396  | 7170163.4693 | 7005310.0889
    32  | 7716958  | 7451335.8768 | 7159011.944
    33  | 3930739  | 4354293.0145 | 4195901.738
    34  | 7606030  | 5653035.0228 | 5362939.9094
    35  | 3461739  | 2364690.175  | 2211965.1152
    36  | 2238119  | 1550061.2451 | 1470517.2142
    37  | 12777778 | 10715167.374 | 8748897.5249
    38  | 1840041  | 1496660.6556 | 1296146.1486
    39  | 2517872  | 2220522.4752 | 1570146.5
    40  | 4666916  | 4155420.6792 | 3392886.7769

3.2.3.1.6.675 STDDEV_SAMP

Returns the sample standard deviation of values.

Note: Aliases to this function include STDDEV and STDEV for compatibility.

See also: STDDEV_POP

3.2.3.1.6.676 Syntax

    -- As an aggregate
    STDDEV_SAMP( expr )
    STDDEV( expr )
    STDEV( expr )

    -- As a window function
    STDDEV_SAMP ( expr ) OVER (
        [ PARTITION BY value_expression [, ...] ]
        [ ORDER BY value_expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] ]
        [ frame_clause ]
    )

3.2.3.1.6.677 Arguments

Parameter | Description
----------+-------------------
expr      | Numeric expression

3.2.3.1.6.678 Returns

Returns the sample standard deviation with type DOUBLE.

3.2.3.1.6.679 Notes

• When all rows contain NULL values, the function returns NULL.
• The function also returns NULL when only one value is non-NULL.
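The second note follows from the sample formula, which divides by n - 1 and is therefore undefined for a single value. The following is a minimal Python sketch (not SQream DB code) of that arithmetic. The helper name stddev_samp is hypothetical and skipping None values is an assumption. The figures in the check come from the Age 19 row of the examples in this reference, where n = 2 is inferred from the SUM and AVG examples (3860880 / 1930440).

```python
import math


def stddev_samp(values):
    """Sample standard deviation of the non-None values (hypothetical helper)."""
    data = [v for v in values if v is not None]  # assumption: NULLs are skipped
    n = len(data)
    if n < 2:
        return None  # no values, or a single value: n - 1 would be zero
    mean = sum(data) / n
    # Sample variance divides by n - 1 instead of n.
    return math.sqrt(sum((v - mean) ** 2 for v in data) / (n - 1))


# Relation between the two forms: stddev_samp = stddev_pop * sqrt(n / (n - 1)).
n = 2                 # players aged 19, inferred from SUM / AVG
pop = 197400          # stddev_pop reported for Age 19
samp = pop * math.sqrt(n / (n - 1))
print(round(samp, 4))       # 279165.7572, the stddev_samp reported for Age 19
print(stddev_samp([42]))    # None
```

The gap between the two results shrinks as n grows, which is why the stddev_samp and stddev_pop columns in the examples are closest for the age groups with the most players.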
3.2.3.1.6.680 Examples

For these examples, assume a table named nba, with the following structure:

    CREATE TABLE nba
    (
        "Name" varchar(40),
        "Team" varchar(40),
        "Number" tinyint,
        "Position" varchar(2),
        "Age" tinyint,
        "Height" varchar(4),
        "Weight" real,
        "College" varchar(40),
        "Salary" float
    );

Here’s a peek at the table contents (Download nba.csv):

Table 43: nba.csv

Name          | Team           | Number | Position | Age  | Height | Weight | College           | Salary
--------------+----------------+--------+----------+------+--------+--------+-------------------+-----------
Avery Bradley | Boston Celtics | 0.0    | PG       | 25.0 | 6-2    | 180.0  | Texas             | 7730337.0
Jae Crowder   | Boston Celtics | 99.0   | SF       | 25.0 | 6-6    | 235.0  | Marquette         | 6796117.0
John Holland  | Boston Celtics | 30.0   | SG       | 27.0 | 6-5    | 205.0  | Boston University |
R.J. Hunter   | Boston Celtics | 28.0   | SG       | 22.0 | 6-5    | 185.0  | Georgia State     | 1148640.0
Jonas Jerebko | Boston Celtics | 8.0    | PF       | 29.0 | 6-10   | 231.0  |                   | 5000000.0
Amir Johnson  | Boston Celtics | 90.0   | PF       | 29.0 | 6-9    | 240.0  |                   | 12000000.0
Jordan Mickey | Boston Celtics | 55.0   | PF       | 21.0 | 6-8    | 235.0  | LSU               | 1170960.0
Kelly Olynyk  | Boston Celtics | 41.0   | C        | 25.0 | 7-0    | 238.0  | Gonzaga           | 2165160.0
Terry Rozier  | Boston Celtics | 12.0   | PG       | 22.0 | 6-2    | 190.0  | Louisville        | 1824360.0
3.2.3.1.6.681 Simple standard deviation

    t=> SELECT AVG("Age"), STDDEV_SAMP("Age") FROM nba;
    avg | stddev_samp
    ----+------------
    26  | 4.404

3.2.3.1.6.682 Combine stddev with other aggregates

    t=> SELECT "Age", AVG("Salary"), STDDEV_SAMP("Salary"), STDDEV_POP("Salary") FROM nba GROUP BY 1;
    Age | avg      | stddev_samp  | stddev_pop
    ----+----------+--------------+-------------
    19  | 1930440  | 279165.7572  | 197400
    20  | 2725790  | 1510913.4308 | 1470615.1437
    21  | 2067379  | 1412350.3607 | 1374680.8959
    22  | 2357963  | 1517378.326  | 1487911.8642
    23  | 2034746  | 2728292.0728 | 2693086.8292
    24  | 3785300  | 4803383.8083 | 4749713.0309
    25  | 3930867  | 4558462.9038 | 4506364.4739
    26  | 6866566  | 6100470.7879 | 6015145.316
    27  | 6676741  | 6831984.073  | 6746043.7454
    28  | 5110188  | 4316626.5733 | 4244073.0603
    29  | 6224177  | 4870705.8411 | 4779656.5821
    30  | 7061858  | 5408669.6434 | 5317761.1581
    31  | 8511396  | 7170163.4693 | 7005310.0889
    32  | 7716958  | 7451335.8768 | 7159011.944
    33  | 3930739  | 4354293.0145 | 4195901.738
    34  | 7606030  | 5653035.0228 | 5362939.9094
    35  | 3461739  | 2364690.175  | 2211965.1152
    36  | 2238119  | 1550061.2451 | 1470517.2142
    37  | 12777778 | 10715167.374 | 8748897.5249
    38  | 1840041  | 1496660.6556 | 1296146.1486
    39  | 2517872  | 2220522.4752 | 1570146.5
    40  | 4666916  | 4155420.6792 | 3392886.7769

3.2.3.1.6.683 SUM

Returns the sum of numeric values, or only the distinct values.

3.2.3.1.6.684 Syntax

    -- As an aggregate
    SUM( [ DISTINCT ] expr )

    -- As a window function
    SUM ( expr ) OVER (
        [ PARTITION BY value_expression [, ...] ]
        [ ORDER BY value_expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] ]
        [ frame_clause ]
    )

3.2.3.1.6.685 Arguments

Parameter | Description
----------+------------------------------------------------------------------
expr      | Numeric expression
DISTINCT  | Specifies that the operation should operate only on unique values

3.2.3.1.6.686 Returns

Return type is dependent on the argument.
• For TINYINT, SMALLINT and INT, the return type is INT.
• For BIGINT, the return type is BIGINT.
• For REAL, the return type is REAL.
• For DOUBLE, the return type is DOUBLE.

3.2.3.1.6.687 Notes

• NULL values are ignored
• Because SUM returns a type of limited width (for example, INT for TINYINT, SMALLINT and INT arguments), it can overflow very quickly on large data sets. If the sum may exceed 2^31, for example, up-cast the argument to a larger type like BIGINT: SUM(expr :: BIGINT)

3.2.3.1.6.688 Examples

For these examples, assume a table named nba, with the following structure:

    CREATE TABLE nba
    (
        "Name" varchar(40),
        "Team" varchar(40),
        "Number" tinyint,
        "Position" varchar(2),
        "Age" tinyint,
        "Height" varchar(4),
        "Weight" real,
        "College" varchar(40),
        "Salary" float
    );

Here’s a peek at the table contents (Download nba.csv):

Table 44: nba.csv

Name          | Team           | Number | Position | Age  | Height | Weight | College           | Salary
--------------+----------------+--------+----------+------+--------+--------+-------------------+-----------
Avery Bradley | Boston Celtics | 0.0    | PG       | 25.0 | 6-2    | 180.0  | Texas             | 7730337.0
Jae Crowder   | Boston Celtics | 99.0   | SF       | 25.0 | 6-6    | 235.0  | Marquette         | 6796117.0
John Holland  | Boston Celtics | 30.0   | SG       | 27.0 | 6-5    | 205.0  | Boston University |
R.J. Hunter   | Boston Celtics | 28.0   | SG       | 22.0 | 6-5    | 185.0  | Georgia State     | 1148640.0
Jonas Jerebko | Boston Celtics | 8.0    | PF       | 29.0 | 6-10   | 231.0  |                   | 5000000.0
Amir Johnson  | Boston Celtics | 90.0   | PF       | 29.0 | 6-9    | 240.0  |                   | 12000000.0
Jordan Mickey | Boston Celtics | 55.0   | PF       | 21.0 | 6-8    | 235.0  | LSU               | 1170960.0
Kelly Olynyk  | Boston Celtics | 41.0   | C        | 25.0 | 7-0    | 238.0  | Gonzaga           | 2165160.0
Terry Rozier  | Boston Celtics | 12.0   | PG       | 22.0 | 6-2    | 190.0  | Louisville        | 1824360.0

3.2.3.1.6.689 Simple sum

    t=> SELECT SUM("Age") FROM nba;
    sum
    -----
    12311

3.2.3.1.6.690 Sum only distinct values

    t=> SELECT SUM(DISTINCT "Age") FROM nba;
    sum
    ---
    649

3.2.3.1.6.691 Combine sum with GROUP BY

    t=> SELECT "Age", SUM("Salary") FROM nba GROUP BY 1;
    Age | sum
    ----+----------
    19  | 3860880
    20  | 51790026
    21  | 39280213
    22  | 61307050
    23  | 79355103
    24  | 170338514
    25  | 172958166
    26  | 247196385
    27  | 267069647
    28  | 153305658
    29  | 168052779
    30  | 211855757
    31  | 187250724
    32  | 100320456
    33  | 55030346
    34  | 76060300
    35  | 27693918
    36  | 22381196
    37  | 38333334
    38  | 7360164
    39  | 5035745
    40  | 14000750

3.2.3.1.6.692 VAR_POP

Returns the population variance of values.

Note: Aliases to this function include VARP for compatibility.

See also: STDDEV_SAMP

3.2.3.1.6.693 Syntax

    -- As an aggregate
    VAR_POP( expr )
    VARP( expr )

    -- As a window function
    VAR_POP ( expr ) OVER (
        [ PARTITION BY value_expression [, ...] ]
        [ ORDER BY value_expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] ]
        [ frame_clause ]
    )

3.2.3.1.6.694 Arguments

Parameter | Description
----------+-------------------
expr      | Numeric expression

3.2.3.1.6.695 Returns

Returns the population variance with type DOUBLE.

3.2.3.1.6.696 Notes

• When all rows contain NULL values, the function returns NULL.

3.2.3.1.6.697 Examples

For these examples, assume a table named nba, with the following structure:

    CREATE TABLE nba
    (
        "Name" varchar(40),
        "Team" varchar(40),
        "Number" tinyint,
        "Position" varchar(2),
        "Age" tinyint,
        "Height" varchar(4),
        "Weight" real,
        "College" varchar(40),
        "Salary" float
    );

Here’s a peek at the table contents (Download nba.csv):

Table 45: nba.csv

Name          | Team           | Number | Position | Age  | Height | Weight | College           | Salary
--------------+----------------+--------+----------+------+--------+--------+-------------------+-----------
Avery Bradley | Boston Celtics | 0.0    | PG       | 25.0 | 6-2    | 180.0  | Texas             | 7730337.0
Jae Crowder   | Boston Celtics | 99.0   | SF       | 25.0 | 6-6    | 235.0  | Marquette         | 6796117.0
John Holland  | Boston Celtics | 30.0   | SG       | 27.0 | 6-5    | 205.0  | Boston University |
R.J. Hunter   | Boston Celtics | 28.0   | SG       | 22.0 | 6-5    | 185.0  | Georgia State     | 1148640.0
Jonas Jerebko | Boston Celtics | 8.0    | PF       | 29.0 | 6-10   | 231.0  |                   | 5000000.0
Amir Johnson  | Boston Celtics | 90.0   | PF       | 29.0 | 6-9    | 240.0  |                   | 12000000.0
Jordan Mickey | Boston Celtics | 55.0   | PF       | 21.0 | 6-8    | 235.0  | LSU               | 1170960.0
Kelly Olynyk  | Boston Celtics | 41.0   | C        | 25.0 | 7-0    | 238.0  | Gonzaga           | 2165160.0
Terry Rozier  | Boston Celtics | 12.0   | PG       | 22.0 | 6-2    | 190.0  | Louisville        | 1824360.0

3.2.3.1.6.698 Simple variance

    t=> SELECT "Team", VAR_POP("Age") FROM nba GROUP BY 1 LIMIT 10;
    Team                  | var_pop
    ----------------------+--------
    Atlanta Hawks         | 16.6933
    Boston Celtics        | 7.5289
    Brooklyn Nets         | 8.5067
    Charlotte Hornets     | 9.3156
    Chicago Bulls         | 16.3733
    Cleveland Cavaliers   | 15.8489
    Dallas Mavericks      | 12.8622
    Denver Nuggets        | 20.9956
    Detroit Pistons       | 18.4267
    Golden State Warriors | 13.8222

3.2.3.1.6.699 Combine VARP with other aggregates

    t=> SELECT "Age", AVG("Salary"), VARP("Salary"), STDDEV_POP("Salary") FROM nba GROUP BY 1;
    Age | avg      | var_pop            | stddev_pop
    ----+----------+--------------------+-------------
    19  | 1930440  | 38966760000        | 197400
    20  | 2725790  | 2162708900784.4473 | 1470615.1437
    21  | 2067379  | 1889747565514.0222 | 1374680.8959
    22  | 2357963  | 2213881715652.018  | 1487911.8642
    23  | 2034746  | 7252716669494.947  | 2693086.8292
    24  | 3785300  | 22559773876347.457 | 4749713.0309
    25  | 3930867  | 20307320771204.332 | 4506364.4739
    26  | 6866566  | 36181973172363.8   | 6015145.316
    27  | 6676741  | 45509106214871.39  | 6746043.7454
    28  | 5110188  | 18012156141081.11  | 4244073.0603
    29  | 6224177  | 22845117042669.63  | 4779656.5821
    30  | 7061858  | 28278583734766.582 | 5317761.1581
    31  | 8511396  | 49074369441838.43  | 7005310.0889
    32  | 7716958  | 51251452013710.28  | 7159011.944
    33  | 3930739  | 17605591394980.715 | 4195901.738
    34  | 7606030  | 28761124471812.6   | 5362939.9094
    35  | 3461739  | 4892789670765.1875 | 2211965.1152
    36  | 2238119  | 2162420877245.64   | 1470517.2142
    37  | 12777778 | 76543207901234.67  | 8748897.5249
    38  | 1840041  | 1679994838499      | 1296146.1486
    39  | 2517872  | 2465360031462.25   | 1570146.5
    40  | 4666916  | 11511680680555.555 | 3392886.7769

3.2.3.1.6.700 VAR_SAMP

Returns the sample variance of values.

Note: Aliases to this function include VAR and VARIANCE for compatibility.

See also: VAR_POP

3.2.3.1.6.701 Syntax

    -- As an aggregate
    VAR_SAMP( expr )
    VAR( expr )
    VARIANCE( expr )

    -- As a window function
    VAR_SAMP ( expr ) OVER (
        [ PARTITION BY value_expression [, ...] ]
        [ ORDER BY value_expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] ]
        [ frame_clause ]
    )

3.2.3.1.6.702 Arguments

Parameter | Description
----------+-------------------
expr      | Numeric expression

3.2.3.1.6.703 Returns

Returns the sample variance with type DOUBLE.

3.2.3.1.6.704 Notes

• When all rows contain NULL values, the function returns NULL.
• The function also returns NULL when only one value is non-NULL.

3.2.3.1.6.705 Examples

For these examples, assume a table named nba, with the following structure:

    CREATE TABLE nba
    (
        "Name" varchar(40),
        "Team" varchar(40),
        "Number" tinyint,
        "Position" varchar(2),
        "Age" tinyint,
        "Height" varchar(4),
        "Weight" real,
        "College" varchar(40),
        "Salary" float
    );

Here’s a peek at the table contents (Download nba.csv):

Table 46: nba.csv

Name          | Team           | Number | Position | Age  | Height | Weight | College           | Salary
--------------+----------------+--------+----------+------+--------+--------+-------------------+-----------
Avery Bradley | Boston Celtics | 0.0    | PG       | 25.0 | 6-2    | 180.0  | Texas             | 7730337.0
Jae Crowder   | Boston Celtics | 99.0   | SF       | 25.0 | 6-6    | 235.0  | Marquette         | 6796117.0
John Holland  | Boston Celtics | 30.0   | SG       | 27.0 | 6-5    | 205.0  | Boston University |
R.J. Hunter   | Boston Celtics | 28.0   | SG       | 22.0 | 6-5    | 185.0  | Georgia State     | 1148640.0
Jonas Jerebko | Boston Celtics | 8.0    | PF       | 29.0 | 6-10   | 231.0  |                   | 5000000.0
Amir Johnson  | Boston Celtics | 90.0   | PF       | 29.0 | 6-9    | 240.0  |                   | 12000000.0
Jordan Mickey | Boston Celtics | 55.0   | PF       | 21.0 | 6-8    | 235.0  | LSU               | 1170960.0
Kelly Olynyk  | Boston Celtics | 41.0   | C        | 25.0 | 7-0    | 238.0  | Gonzaga           | 2165160.0
Terry Rozier  | Boston Celtics | 12.0   | PG       | 22.0 | 6-2    | 190.0  | Louisville        | 1824360.0

3.2.3.1.6.706 Simple variance

    t=> SELECT "Team", VAR("Age") FROM nba GROUP BY 1 LIMIT 10;
    Team                  | var_samp
    ----------------------+---------
    Atlanta Hawks         | 17.8857
    Boston Celtics        | 8.0667
    Brooklyn Nets         | 9.1143
    Charlotte Hornets     | 9.981
    Chicago Bulls         | 17.5429
    Cleveland Cavaliers   | 16.981
    Dallas Mavericks      | 13.781
    Denver Nuggets        | 22.4952
    Detroit Pistons       | 19.7429
    Golden State Warriors | 14.8095

3.2.3.1.6.707 Combine VAR with other aggregates

    t=> SELECT "Age", AVG("Salary"), VAR("Salary"), VARP("Salary") FROM nba GROUP BY 1;
    Age | avg      | var_samp           | var_pop
    ----+----------+--------------------+-------------------
    19  | 1930440  | 77933520000        | 38966760000
    20  | 2725790  | 2282859395272.472  | 2162708900784.4473
    21  | 2067379  | 1994733541375.9124 | 1889747565514.0222
    22  | 2357963  | 2302436984278.0986 | 2213881715652.018
    23  | 2034746  | 7443577634481.656  | 7252716669494.947
    24  | 3785300  | 23072496009900.81  | 22559773876347.457
    25  | 3930867  | 20779584044953.27  | 20307320771204.332
    26  | 6866566  | 37215743834431.336 | 36181973172363.8
    27  | 6676741  | 46676006374227.07  | 45509106214871.39
    28  | 5110188  | 18633264973532.18  | 18012156141081.11
    29  | 6224177  | 23723775390464.617 | 22845117042669.63
    30  | 7061858  | 29253707311827.5   | 28278583734766.582
    31  | 8511396  | 51411244177164.07  | 49074369441838.43
    32  | 7716958  | 55522406348186.13  | 51251452013710.28
    33  | 3930739  | 18959867656133.08  | 17605591394980.715
    34  | 7606030  | 31956804968680.668 | 28761124471812.6
    35  | 3461739  | 5591759623731.643  | 4892789670765.1875
    36  | 2238119  | 2402689863606.2666 | 2162420877245.64
    37  | 12777778 | 114814811851852    | 76543207901234.67
    38  | 1840041  | 2239993117998.6665 | 1679994838499
    39  | 2517872  | 4930720062924.5    | 2465360031462.25
    40  | 4666916  | 17267521020833.332 | 11511680680555.555

3.2.3.1.6.708 Window Functions

Window functions are functions applied over a subset (known as a window) of the rows returned by a SELECT query.

Read more about Window Functions in the SQL Syntax Features section.

3.2.3.1.6.709 LAG

Returns a value from a previous row within the partition of a result set.

See also: LEAD.

3.2.3.1.6.710 Syntax

    LAG ( expr [, offset ] )
        OVER (
            [ PARTITION BY partition_expr [, ...] ]
            [ ORDER BY order [ ASC | DESC ] [, ...] ]
        )

    offset ::= integer

3.2.3.1.6.711 Arguments

Parameter | Description
----------+-----------------------------------------------------------------------------
expr      | Any value expression
offset    | Specifies the offset of rows to scan back. If not specified, defaults to 1.

3.2.3.1.6.712 Returns

Same type as expr.

3.2.3.1.6.713 Notes

• offset is evaluated with respect to the current row. If not specified, offset defaults to 1.
• If there is no previous row (calculated against the current row), LAG returns NULL.

3.2.3.1.6.714 Examples

For these examples, assume a table named nba, with the following structure:

    CREATE TABLE nba
    (
        "Name" varchar(40),
        "Team" varchar(40),
        "Number" tinyint,
        "Position" varchar(2),
        "Age" tinyint,
        "Height" varchar(4),
        "Weight" real,
        "College" varchar(40),
        "Salary" float
    );

Here’s a peek at the table contents (Download nba.csv):

Table 47: nba.csv

Name          | Team           | Number | Position | Age  | Height | Weight | College           | Salary
--------------+----------------+--------+----------+------+--------+--------+-------------------+-----------
Avery Bradley | Boston Celtics | 0.0    | PG       | 25.0 | 6-2    | 180.0  | Texas             | 7730337.0
Jae Crowder   | Boston Celtics | 99.0   | SF       | 25.0 | 6-6    | 235.0  | Marquette         | 6796117.0
John Holland  | Boston Celtics | 30.0   | SG       | 27.0 | 6-5    | 205.0  | Boston University |
R.J. Hunter   | Boston Celtics | 28.0   | SG       | 22.0 | 6-5    | 185.0  | Georgia State     | 1148640.0
Jonas Jerebko | Boston Celtics | 8.0    | PF       | 29.0 | 6-10   | 231.0  |                   | 5000000.0
Amir Johnson  | Boston Celtics | 90.0   | PF       | 29.0 | 6-9    | 240.0  |                   | 12000000.0
Jordan Mickey | Boston Celtics | 55.0   | PF       | 21.0 | 6-8    | 235.0  | LSU               | 1170960.0
Kelly Olynyk  | Boston Celtics | 41.0   | C        | 25.0 | 7-0    | 238.0  | Gonzaga           | 2165160.0
Terry Rozier  | Boston Celtics | 12.0   | PG       | 22.0 | 6-2    | 190.0  | Louisville        | 1824360.0

3.2.3.1.6.715 Using LAG to access previous rows

Changed in version 2020.2: LAG and LEAD are supported from v2020.2.

This example calculates the salary difference between each player and the player ranked immediately above, starting from the highest salary.

    t=> SELECT "Name",
    .          "Salary",
    .          LAG("Salary") OVER (ORDER BY "Salary" DESC) AS "Salary - previous",
    .          LAG("Salary",1) OVER (ORDER BY "Salary" DESC) - "Salary" AS "Salary - diff"
    .          -- LAG("Salary",1) is equivalent to LAG("Salary")
    .   FROM nba
    .   LIMIT 11 ;
    Name            | Salary   | Salary - previous | Salary - diff
    ----------------+----------+-------------------+--------------
    Kobe Bryant     | 25000000 |                   |
    LeBron James    | 22970500 | 25000000          | 2029500
    Carmelo Anthony | 22875000 | 22970500          | 95500
    Dwight Howard   | 22359364 | 22875000          | 515636
    Chris Bosh      | 22192730 | 22359364          | 166634
    Chris Paul      | 21468695 | 22192730          | 724035
    Kevin Durant    | 20158622 | 21468695          | 1310073
    Derrick Rose    | 20093064 | 20158622          | 65558
    Dwyane Wade     | 20000000 | 20093064          | 93064
    Brook Lopez     | 19689000 | 20000000          | 311000
    DeAndre Jordan  | 19689000 | 19689000          | 0

3.2.3.1.6.716 LEAD

Returns a value from a subsequent row within the partition of a result set.

See also: LAG.

3.2.3.1.6.717 Syntax

    LEAD ( expr [, offset ] )
        OVER (
            [ PARTITION BY partition_expr [, ...] ]
            [ ORDER BY order [ ASC | DESC ] [, ...] ]
        )

    offset ::= integer

3.2.3.1.6.718 Arguments

Parameter | Description
----------+--------------------------------------------------------------------------------
expr      | Any value expression
offset    | Specifies the offset of rows to scan forward. If not specified, defaults to 1.

3.2.3.1.6.719 Returns

Same type as expr.

3.2.3.1.6.720 Notes

• offset is evaluated with respect to the current row.
  If not specified, offset defaults to 1.
• If there is no subsequent row (calculated against the current row), LEAD returns NULL.

3.2.3.1.6.721 Examples

For these examples, assume a table named nba, with the following structure:

    CREATE TABLE nba
    (
        "Name" varchar(40),
        "Team" varchar(40),
        "Number" tinyint,
        "Position" varchar(2),
        "Age" tinyint,
        "Height" varchar(4),
        "Weight" real,
        "College" varchar(40),
        "Salary" float
    );

Here’s a peek at the table contents (Download nba.csv):

Table 48: nba.csv

Name          | Team           | Number | Position | Age  | Height | Weight | College           | Salary
--------------+----------------+--------+----------+------+--------+--------+-------------------+-----------
Avery Bradley | Boston Celtics | 0.0    | PG       | 25.0 | 6-2    | 180.0  | Texas             | 7730337.0
Jae Crowder   | Boston Celtics | 99.0   | SF       | 25.0 | 6-6    | 235.0  | Marquette         | 6796117.0
John Holland  | Boston Celtics | 30.0   | SG       | 27.0 | 6-5    | 205.0  | Boston University |
R.J. Hunter   | Boston Celtics | 28.0   | SG       | 22.0 | 6-5    | 185.0  | Georgia State     | 1148640.0
Jonas Jerebko | Boston Celtics | 8.0    | PF       | 29.0 | 6-10   | 231.0  |                   | 5000000.0
Amir Johnson  | Boston Celtics | 90.0   | PF       | 29.0 | 6-9    | 240.0  |                   | 12000000.0
Jordan Mickey | Boston Celtics | 55.0   | PF       | 21.0 | 6-8    | 235.0  | LSU               | 1170960.0
Kelly Olynyk  | Boston Celtics | 41.0   | C        | 25.0 | 7-0    | 238.0  | Gonzaga           | 2165160.0
Terry Rozier  | Boston Celtics | 12.0   | PG       | 22.0 | 6-2    | 190.0  | Louisville        | 1824360.0

3.2.3.1.6.722 Using LEAD to access next rows

Changed in version 2020.2: LAG and LEAD are supported from v2020.2.

This example calculates the salary difference between each player and the player ranked immediately below, starting from the highest salary.

    t=> SELECT "Name",
    .          "Salary",
    .          LEAD("Salary") OVER (ORDER BY "Salary" DESC) AS "Salary - next",
    .          "Salary" - LEAD("Salary",1) OVER (ORDER BY "Salary" DESC) AS "Salary - diff"
    .          -- LEAD("Salary",1) is equivalent to LEAD("Salary")
    .   FROM nba
    .   LIMIT 11 ;
    Name            | Salary   | Salary - next | Salary - diff
    ----------------+----------+---------------+--------------
    Kobe Bryant     | 25000000 | 22970500      | 2029500
    LeBron James    | 22970500 | 22875000      | 95500
    Carmelo Anthony | 22875000 | 22359364      | 515636
    Dwight Howard   | 22359364 | 22192730      | 166634
    Chris Bosh      | 22192730 | 21468695      | 724035
    Chris Paul      | 21468695 | 20158622      | 1310073
    Kevin Durant    | 20158622 | 20093064      | 65558
    Derrick Rose    | 20093064 | 20000000      | 93064
    Dwyane Wade     | 20000000 | 19689000      | 311000
    Brook Lopez     | 19689000 | 19689000      | 0
    DeAndre Jordan  | 19689000 | 19689000      | 0

3.2.3.1.6.723 ROW_NUMBER

Returns the row number of each row within the partition of a result set.

ROW_NUMBER numbers all rows sequentially. See also RANK, which provides the same value for identical consecutive rows (e.g. 1, 2, 2, 4, 4, 6).

3.2.3.1.6.724 Syntax

    ROW_NUMBER( )
        OVER (
            [ PARTITION BY partition_expr [, ...] ]
            [ ORDER BY order [ ASC | DESC ] [, ...] ]
        )

3.2.3.1.6.725 Arguments

None

3.2.3.1.6.726 Returns

The row number, with data type BIGINT.

3.2.3.1.6.727 Notes

• The ORDER BY clause that is used determines the order in which the rows appear in a result set, and thus the row number.
• The row number is non-deterministic. It may return different results each time it is called with a specific set of input values even if there has been no change in the table contents.

3.2.3.1.6.728 Examples

For these examples, assume a table named nba, with the following structure:

    CREATE TABLE nba
    (
        "Name" varchar(40),
        "Team" varchar(40),
        "Number" tinyint,
        "Position" varchar(2),
        "Age" tinyint,
        "Height" varchar(4),
        "Weight" real,
        "College" varchar(40),
        "Salary" float
    );
Here’s a peek at the table contents (Download nba.csv):

Table 49: nba.csv

Name          | Team           | Number | Position | Age  | Height | Weight | College           | Salary
--------------+----------------+--------+----------+------+--------+--------+-------------------+-----------
Avery Bradley | Boston Celtics | 0.0    | PG       | 25.0 | 6-2    | 180.0  | Texas             | 7730337.0
Jae Crowder   | Boston Celtics | 99.0   | SF       | 25.0 | 6-6    | 235.0  | Marquette         | 6796117.0
John Holland  | Boston Celtics | 30.0   | SG       | 27.0 | 6-5    | 205.0  | Boston University |
R.J. Hunter   | Boston Celtics | 28.0   | SG       | 22.0 | 6-5    | 185.0  | Georgia State     | 1148640.0
Jonas Jerebko | Boston Celtics | 8.0    | PF       | 29.0 | 6-10   | 231.0  |                   | 5000000.0
Amir Johnson  | Boston Celtics | 90.0   | PF       | 29.0 | 6-9    | 240.0  |                   | 12000000.0
Jordan Mickey | Boston Celtics | 55.0   | PF       | 21.0 | 6-8    | 235.0  | LSU               | 1170960.0
Kelly Olynyk  | Boston Celtics | 41.0   | C        | 25.0 | 7-0    | 238.0  | Gonzaga           | 2165160.0
Terry Rozier  | Boston Celtics | 12.0   | PG       | 22.0 | 6-2    | 190.0  | Louisville        | 1824360.0

3.2.3.1.6.729 RANK vs. ROW_NUMBER

    t=> SELECT ROW_NUMBER () OVER (PARTITION BY "Age" ORDER BY "Height") FROM nba;
    row_number
    ----------
    1
    2
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    1
    2
    3
    4
    [...]

    t=> SELECT RANK () OVER (PARTITION BY "Age" ORDER BY "Height") FROM nba;
    rank
    ----
    1
    1
    1
    2
    2
    2
    5
    6
    6
    8
    8
    10
    10
    10
    13
    14
    14
    14
    14
    18
    19
    1
    2
    2
    2
    [...]

3.2.3.1.6.730 RANK

Returns the rank of each row within the partition of a result set.

The rank of a row is defined as 1 + the number of ranks that come before the row.

While ROW_NUMBER numbers all rows sequentially, RANK provides the same value for identical consecutive rows (e.g. 1, 2, 2, 4, 4, 6).

3.2.3.1.6.731 Syntax

    RANK( )
        OVER (
            [ PARTITION BY partition_expr [, ...] ]
            [ ORDER BY order [ ASC | DESC ] [, ...] ]
        )

3.2.3.1.6.732 Arguments

None

3.2.3.1.6.733 Returns

The rank of the row, with data type BIGINT.
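The difference between sequential numbering and ranking with ties can be sketched in a few lines of Python (not SQream DB code). The helper names row_numbers and ranks and the sample heights are hypothetical; the input is assumed to be one partition that is already sorted by the ORDER BY key.

```python
def row_numbers(sorted_keys):
    """Sequential numbering: every row gets the next integer."""
    return list(range(1, len(sorted_keys) + 1))


def ranks(sorted_keys):
    """Tied rows share a rank; the next distinct key jumps to its row position."""
    result = []
    for position, key in enumerate(sorted_keys, start=1):
        if result and key == sorted_keys[position - 2]:
            result.append(result[-1])  # tie with the previous row: reuse its rank
        else:
            result.append(position)    # 1 + the number of rows before this one
    return result


heights = ["6-2", "6-5", "6-5", "6-8", "6-8", "6-9"]  # hypothetical sorted partition
print(row_numbers(heights))  # [1, 2, 3, 4, 5, 6]
print(ranks(heights))        # [1, 2, 2, 4, 4, 6]
```

The second result reproduces the 1, 2, 2, 4, 4, 6 pattern quoted in the RANK description.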
3.2.3.1.6.734 Notes

• The ORDER BY clause that is used determines the order in which the rows appear in a result set, and thus the rank.
• Rank is non-deterministic. It may return different results each time it is called with a specific set of input values even if there has been no change in the table contents.

3.2.3.1.6.735 Examples

For these examples, assume a table named nba, with the following structure:

    CREATE TABLE nba
    (
        "Name" varchar(40),
        "Team" varchar(40),
        "Number" tinyint,
        "Position" varchar(2),
        "Age" tinyint,
        "Height" varchar(4),
        "Weight" real,
        "College" varchar(40),
        "Salary" float
    );

Here’s a peek at the table contents (Download nba.csv):

Table 50: nba.csv

Name          | Team           | Number | Position | Age  | Height | Weight | College           | Salary
--------------+----------------+--------+----------+------+--------+--------+-------------------+-----------
Avery Bradley | Boston Celtics | 0.0    | PG       | 25.0 | 6-2    | 180.0  | Texas             | 7730337.0
Jae Crowder   | Boston Celtics | 99.0   | SF       | 25.0 | 6-6    | 235.0  | Marquette         | 6796117.0
John Holland  | Boston Celtics | 30.0   | SG       | 27.0 | 6-5    | 205.0  | Boston University |
R.J. Hunter   | Boston Celtics | 28.0   | SG       | 22.0 | 6-5    | 185.0  | Georgia State     | 1148640.0
Jonas Jerebko | Boston Celtics | 8.0    | PF       | 29.0 | 6-10   | 231.0  |                   | 5000000.0
Amir Johnson  | Boston Celtics | 90.0   | PF       | 29.0 | 6-9    | 240.0  |                   | 12000000.0
Jordan Mickey | Boston Celtics | 55.0   | PF       | 21.0 | 6-8    | 235.0  | LSU               | 1170960.0
Kelly Olynyk  | Boston Celtics | 41.0   | C        | 25.0 | 7-0    | 238.0  | Gonzaga           | 2165160.0
Terry Rozier  | Boston Celtics | 12.0   | PG       | 22.0 | 6-2    | 190.0  | Louisville        | 1824360.0

3.2.3.1.6.736 RANK vs. ROW_NUMBER

    t=> SELECT ROW_NUMBER () OVER (PARTITION BY "Age" ORDER BY "Height") FROM nba;
    row_number
    ----------
    1
    2
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    1
    2
    3
    4
    [...]

    t=> SELECT RANK () OVER (PARTITION BY "Age" ORDER BY "Height") FROM nba;
    rank
    ----
    1
    1
    1
    2
    2
    2
    5
    6
    6
    8
    8
    10
    10
    10
    13
    14
    14
    14
    14
    18
    19
    1
    2
    2
    2
    [...]

3.2.3.1.6.737 FIRST_VALUE

The FIRST_VALUE function returns the value located in the selected column of the first row of a segment.
If the table is not segmented, the FIRST_VALUE function returns the value from the first row of the whole table.

This function returns a value of the same type as the selected column. For example, selecting an int type ID column returns the ID of the first employee in a list as an int. Selecting a varchar type name column returns the name as a varchar.

3.2.3.1.6.738 Syntax

The following shows the correct syntax for the FIRST_VALUE function.

    FIRST_VALUE(Selected_Column) OVER (...)

3.2.3.1.6.739 Arguments

None

3.2.3.1.6.740 Returns

Returns the value located in the selected column of the first row of a segment.

3.2.3.1.6.741 LAST_VALUE

The LAST_VALUE function returns the value located in the selected column of the last row of a segment. If the table is not segmented, the LAST_VALUE function returns the value from the last row of the whole table.

This function returns a value of the same type as the selected column. For example, selecting an int type ID column returns the ID of the last employee in a list as an int. Selecting a varchar type name column returns the name as a varchar.

3.2.3.1.6.742 Syntax

The following shows the correct syntax for the LAST_VALUE function.

    LAST_VALUE(Selected_Column) OVER (...)

3.2.3.1.6.743 Arguments

None

3.2.3.1.6.744 Returns

Returns the value located in the selected column of the last row of a segment.

3.2.3.1.6.745 NTH_VALUE

The NTH_VALUE function returns the value located in the selected column of a specified row of a segment.

The NTH_VALUE function is similar to the FIRST_VALUE and LAST_VALUE functions, but it also requires a literal, whole int type n value representing the row number containing the value that you want. If you select an n value larger than the number of rows in the table, the NTH_VALUE function returns NULL.
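The three functions can be pictured as simple position lookups on an ordered segment. The following Python sketch (not SQream DB code) illustrates the behavior as described above. The helper names and the sample product names are hypothetical, and the sketch assumes the window covers the whole segment; in many SQL engines the window frame can narrow what LAST_VALUE and NTH_VALUE see.

```python
def first_value(segment):
    """Value of the first row, or None for an empty segment."""
    return segment[0] if segment else None


def last_value(segment):
    """Value of the last row, or None for an empty segment."""
    return segment[-1] if segment else None


def nth_value(segment, n):
    """Value of the n-th row (1-based), or None when n exceeds the row count."""
    if n < 1:
        raise ValueError("n must be a positive whole number")
    return segment[n - 1] if n <= len(segment) else None


# Hypothetical ordered segment of product names.
names = ["blanket", "pillow", "pillow_case", "side_table"]
print(first_value(names))   # blanket
print(last_value(names))    # side_table
print(nth_value(names, 2))  # pillow
print(nth_value(names, 5))  # None, because the segment has only four rows
```

Seen this way, FIRST_VALUE corresponds to the n = 1 case of NTH_VALUE, and the NULL result for an oversized n is the out-of-range lookup.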
3.2.3.1.6.746 Syntax

The following shows the correct syntax for the NTH_VALUE function.

NTH_VALUE(Selected_Column, n) OVER (...)

3.2.3.1.6.747 Examples

The following example creates a table named superstore, used for tracking item sales in thousands:

CREATE TABLE superstore
(
   "Section" varchar(40),
   "Product_Name" varchar(40),
   "Sales_In_K" int
);

The table contains the following rows:

Section | Product_Name | Sales_In_K
--------+--------------+-----------
bed     | pillow_case  | 17
bath    | rubber_duck  | 22
beyond  | curtain      | 15
bed     | side_table   | 8
bath    | shampoo      | 9
bed     | blanket      | 7
bath    | bath_bomb    | 13
bath    | conditioner  | 17
beyond  | lamp         | 7
bath    | soap         | 13
beyond  | rug          | 12
bed     | pillow       | 17

3.2.3.1.6.748 DENSE_RANK

The DENSE_RANK function returns data ranking information and is similar to the RANK function. The two differ in how they continue after rows with identical values. After assigning the same rank to identical consecutive rows, the RANK function skips ahead to the position of the next row (for example, 1,1,3), while the DENSE_RANK function continues with the next number in the sequence (1,1,2).

3.2.3.1.6.749 Syntax

The following shows the correct syntax for the DENSE_RANK function.

DENSE_RANK ( ) OVER (...)

3.2.3.1.6.750 Arguments

None

3.2.3.1.6.751 Returns

Returns data ranking information.

3.2.3.1.6.752 PERCENT_RANK

The PERCENT_RANK function returns relative data ranking information and is similar to the RANK function. It differs from the RANK function in the following ways:

• The output type is double instead of bigint.
• It shows ranking as a relative rank (a number between 0 and 1, inclusive) and not as an absolute rank (for example, 1, 1, 3).

3.2.3.1.6.753 Syntax

The following shows the correct syntax for the PERCENT_RANK function.

PERCENT_RANK ( ) OVER (...)
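The difference between RANK, DENSE_RANK, and PERCENT_RANK can be illustrated with a short plain-Python sketch. It is not SQream DB code: the function names are hypothetical, and PERCENT_RANK is computed with the standard SQL definition (rank - 1) / (rows - 1), which is assumed here to match SQream DB's behavior. The input is the Sales_In_K column of the bath section from the superstore table above.

```python
def rank(values):
    """RANK: ties share a rank, and the next rank skips ahead (1, 1, 3)."""
    ordered = sorted(values)
    # index() returns the first position of a value, so ties share it.
    return [ordered.index(v) + 1 for v in ordered]


def dense_rank(values):
    """DENSE_RANK: ties share a rank, and the next rank follows on (1, 1, 2)."""
    ordered = sorted(values)
    distinct = sorted(set(ordered))
    return [distinct.index(v) + 1 for v in ordered]


def percent_rank(values):
    """PERCENT_RANK: (rank - 1) / (rows - 1), a float between 0 and 1."""
    n = len(values)
    if n <= 1:
        return [0.0] * n
    return [(r - 1) / (n - 1) for r in rank(values)]


# Sales_In_K for the "bath" section of the superstore table.
bath_sales = [22, 9, 13, 17, 13]

print(sorted(bath_sales))        # [9, 13, 13, 17, 22]
print(rank(bath_sales))          # [1, 2, 2, 4, 5]
print(dense_rank(bath_sales))    # [1, 2, 2, 3, 4]
print(percent_rank(bath_sales))  # [0.0, 0.25, 0.25, 0.75, 1.0]
```

The two rows with a value of 13 tie in all three functions; only the rank assigned to the row that follows them differs.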
3.2.3.1.6.754 Arguments

None

3.2.3.1.6.755 Returns

Returns relative data ranking information in the form of a number between (and including) 0 and 1.

3.2.3.1.6.756 CUME_DIST

The CUME_DIST function returns the cumulative distribution for each of your partitioned rows, in the range from 1/n (with n being the total number of rows) to 1. For example, in a table with ten rows, the cumulative distribution for the first row is 0.1, for the second row 0.2, and so on. The cumulative distribution of the final row is always 1.

3.2.3.1.6.757 Syntax

The following shows the correct syntax for the CUME_DIST function.

CUME_DIST ( ) OVER (...)

3.2.3.1.6.758 Arguments

None

3.2.3.1.6.759 Returns

Returns the cumulative distribution for each of your partitioned rows.

3.2.3.1.6.760 NTILE

The NTILE function divides a table’s rows into n ranked buckets and returns an integer between 1 and n identifying the bucket of each row. The buckets are determined by the following:

• The selected column.
• The selected positive, literal n.

Selecting an n that exceeds the number of rows in a table outputs a result identical to the RANK function. When this happens, each ranked bucket is populated with a single value, and buckets numbered higher than the number of rows are not created.

Additionally, if the number of rows is not exactly divisible by n, the function gives precedence to the buckets at the top. For example, if n is 6 for 63 values, then 63 = 6 × 10 + 3, so the three remaining values are assigned one each to the top three buckets: the top three buckets each have 11 values, and the bottom three each have 10.

3.2.3.1.6.761 Syntax

The following shows the correct syntax for the NTILE function.

NTILE(n) OVER (...)

3.2.3.1.6.762 System Functions

System functions are used for working with database objects, settings, and values.

3.2.3.1.6.763 SHOW_VERSION

SHOW_VERSION() is a function that returns the system version for SQream DB.

3.2.3.1.6.764 Permissions

No special permissions are required.
3.2.3.1.6.765 Syntax

show_version_statement ::=
    SELECT SHOW_VERSION()
    ;

3.2.3.1.6.766 Parameters

None

3.2.3.1.6.767 Notes

To check the SQream DB version from the shell, run:

$ sqreamd --version

3.2.3.1.6.768 Examples

3.2.3.1.6.769 Getting the current SQream DB version

t=> SELECT SHOW_VERSION();
bytesread
---------
v2019.3

3.3 Catalog reference

SQream DB contains a schema called sqream_catalog that contains information about your database’s objects: tables, columns, views, permissions, and more.

Some additional catalog tables are used primarily for internal introspection, and could change across SQream DB versions.

In this topic:

• Types of data exposed by sqream_catalog
• Tables in the catalog
   – clustering_keys
   – columns
   – external_tables
   – external_table_columns
   – databases
   – database_permissions
   – identity_key
   – permission_types
   – roles
   – roles_memberships
   – savedqueries
   – schemas
   – schema_permissions
   – tables
   – table_permissions
   – udf_permissions
   – user_defined_functions
   – views
• Additional tables
   – extents
   – chunk_columns
   – chunks
   – delete_predicates
• Examples
   – List all tables in the database
   – List all schemas in the database
   – List columns and their types for a specific table
   – List delete predicates
   – List Saved queries

3.3.1 Types of data exposed by sqream_catalog

Table 51: Database objects

Object          | Table
----------------+----------------------------------------------------------------------------------------------
Clustering keys | clustering_keys
Columns         | columns, external_table_columns
Databases       | databases
Permissions     | table_permissions, database_permissions, schema_permissions, permission_types, udf_permissions
Roles           | roles, roles_memberships
Schemas         | schemas
Sequences       | identity_key
Tables          | tables, external_tables
Views           | views
UDFs            | user_defined_functions

The catalog contains a few more tables, which hold storage details for internal use.

Table 52: Storage objects

Object            | Table
------------------+------------------
Extents           | extents
Chunks            | chunks
Delete predicates | delete_predicates

3.3.2 Tables in the catalog

3.3.2.1 clustering_keys

Explicit clustering keys for tables. When more than one clustering key is defined, each key is listed in a separate row.

Column         | Description
---------------+----------------------------------------------------------
database_name  | Name of the database containing the table
table_id       | ID of the table containing the column
schema_name    | Name of the schema containing the table
table_name     | Name of the table containing the column
clustering_key | Name of the column that is a clustering key for this table

3.3.2.2 columns

Column objects for standard tables.

Column               | Description
---------------------+----------------------------------------------------------
database_name        | Name of the database containing the table
schema_name          | Name of the schema containing the table
table_id             | ID of the table containing the column
table_name           | Name of the table containing the column
column_id            | Ordinal of the column in the table (begins at 0)
column_name          | Name of the column
type_name            | Data type of the column
column_size          | The maximum length in bytes
has_default          | NULL if the column has no default value.
                     | 1 if the default is a fixed value, or 2 if the default is an identity
default_value        | Default value for the column
compression_strategy | User-overridden compression strategy
created              | Timestamp when the column was created
altered              | Timestamp when the column was last altered

3.3.2.3 external_tables

external_tables identifies external tables in the database. For TABLES see tables.

Column        | Description
--------------+--------------------------------------------------------------------------------------------
database_name | Name of the database containing the table
table_id      | Database-unique ID for the table
schema_name   | Name of the schema containing the table
table_name    | Name of the table
format        | Identifies the foreign data wrapper used. 0 for csv_fdw, 1 for parquet_fdw, 2 for orc_fdw.
created       | Identifies the clause used to create the table

3.3.2.4 external_table_columns

Column objects for external tables.

3.3.2.5 databases

Column                           | Description
---------------------------------+-----------------------
database_Id                      | Unique ID of the database
database_name                    | Name of the database
default_disk_chunk_size          | Internal use
default_process_chunk_size       | Internal use
rechunk_size                     | Internal use
storage_subchunk_size            | Internal use
compression_chunk_size_threshold | Internal use

3.3.2.6 database_permissions

database_permissions identifies all permissions granted to databases. There is one row for each combination of role (grantee) and permission granted to a database.

Column          | Description
----------------+----------------------------------------------
database_name   | Name of the database the permission applies to
role_id         | ID of the role granted permissions (grantee)
permission_type | Identifies the permission type

3.3.2.7 identity_key

3.3.2.8 permission_types

permission_types identifies the permission names that exist in the database.

Column             | Description
-------------------+---------------------------
permission_type_id | ID of the permission type
name               | Name of the permission type

3.3.2.9 roles

roles identifies the roles in the database.

Column              | Description
--------------------+---------------------------------------------------------------------------------
role_id             | Database-unique ID of the role
name                | Name of the role
superuser           | Identifies if this role is a superuser. 1 for superuser or 0 otherwise.
login               | Identifies if this role can be used to log in to SQream DB. 1 for yes or 0 otherwise.
has_password        | Identifies if this role has a password. 1 for yes or 0 otherwise.
can_create_function | Identifies if this role can create UDFs. 1 for yes, 0 otherwise.

3.3.2.10 roles_memberships

roles_memberships identifies the role memberships in the database.

Column         | Description
---------------+-----------------------------------------------------------------
role_id        | Role ID
member_role_id | ID of the parent role from which this role will inherit
inherit        | Identifies if permissions are inherited. 1 for yes or 0 otherwise.

3.3.2.11 savedqueries

savedqueries identifies the saved_queries in the database.

Column         | Description
---------------+-----------------------------------------------
name           | Saved query name
num_parameters | Number of parameters to be replaced at run-time

3.3.2.12 schemas

schemas identifies all the database’s schemas.

Column           | Description
-----------------+-------------------------------------
schema_id        | Unique ID of the schema
schema_name      | Name of the schema
schema_owner     | Name of the role who owns this schema
rechunker_ignore | Internal use

3.3.2.13 schema_permissions

schema_permissions identifies all permissions granted to schemas. There is one row for each combination of role (grantee) and permission granted to a schema.

Column          | Description
----------------+--------------------------------------------
database_name   | Name of the database containing the schema
schema_id       | ID of the schema the permission applies to
role_id         | ID of the role granted permissions (grantee)
permission_type | Identifies the permission type

3.3.2.14 tables

tables identifies proper SQream tables in the database. For EXTERNAL TABLES see external_tables.

Column           | Description
-----------------+------------------------------------------
database_name    | Name of the database containing the table
table_id         | Database-unique ID for the table
schema_name      | Name of the schema containing the table
table_name       | Name of the table
row_count_valid  | Identifies if the row_count can be used
row_count        | Number of rows in the table
rechunker_ignore | Internal use

3.3.2.15 table_permissions

table_permissions identifies all permissions granted to tables.
There is one row for each combination of role (grantee) and permission granted to a table.

Column          | Description
----------------+--------------------------------------------
database_name   | Name of the database containing the table
table_id        | ID of the table the permission applies to
role_id         | ID of the role granted permissions (grantee)
permission_type | Identifies the permission type

3.3.2.16 udf_permissions

3.3.2.17 user_defined_functions

user_defined_functions identifies UDFs in the database.

Column        | Description
--------------+----------------------------------------
database_name | Name of the database containing the UDF
function_id   | Database-unique ID for the UDF
function_name | Name of the UDF

3.3.2.18 views

views identifies views in the database.

Column          | Description
----------------+-------------------------------------------------
view_id         | Database-unique ID for the view
view_schema     | Name of the schema containing the view
view_name       | Name of the view
view_data       | Internal use
view_query_text | Identifies the AS clause used to create the view

3.3.3 Additional tables

There are additional tables in the catalog that can be used for performance monitoring and inspection. The definitions of these tables are provided below and could change across SQream DB versions.

3.3.3.1 extents

extents identifies storage extents. Each storage extent can contain several chunks.

Note: This is an internal table designed for low-level performance troubleshooting.

Column        | Description
--------------+-------------------------------------------
database_name | Name of the database containing the extent
table_id      | ID of the table containing the extent
column_id     | ID of the column containing the extent
extent_id     | ID for the extent
size          | Extent size in megabytes
path          | Full path to the extent on the file system

3.3.3.2 chunk_columns

chunk_columns lists chunk information by column.
Column            | Description
------------------+------------------------------------------------
database_name     | Name of the database containing the extent
table_id          | ID of the table containing the extent
column_id         | ID of the column containing the extent
chunk_id          | ID for the chunk
extent_id         | ID for the extent
compressed_size   | Actual chunk size in bytes
uncompressed_size | Uncompressed chunk size in bytes
compression_type  | Actual compression scheme for this chunk
long_min          | Minimum numeric value in this chunk (if exists)
long_max          | Maximum numeric value in this chunk (if exists)
string_min        | Minimum text value in this chunk (if exists)
string_max        | Maximum text value in this chunk (if exists)
offset_in_file    | Internal use

Note: This is an internal table designed for low-level performance troubleshooting.

3.3.3.3 chunks

chunks identifies storage chunks.

Note: This is an internal table designed for low-level performance troubleshooting.

Column          | Description
----------------+------------------------------------------
database_name   | Name of the database containing the chunk
table_id        | ID of the table containing the chunk
column_id       | ID of the column containing the chunk
rows_num        | Number of rows contained in the chunk
deletion_status | When data is deleted from the table, it is first deleted logically. This value identifies how much data is deleted from the chunk. 0 for no data, 1 for some data, 2 to specify the entire chunk is deleted.

3.3.3.4 delete_predicates

delete_predicates identifies the existing delete predicates that have not been cleaned up. Each DELETE command may result in several entries in this table.

Note: This is an internal table designed for low-level performance troubleshooting.

Column           | Description
-----------------+----------------------------------------------
database_name    | Name of the database containing the predicate
table_id         | ID of the table containing the predicate
max_chunk_id     | Internal use. Placeholder marker for the highest chunk_id logged during the DELETE operation.
delete_predicate | Identifies the DELETE predicate

3.3.4 Examples

3.3.4.1 List all tables in the database

master=> SELECT * FROM sqream_catalog.tables;
database_name | table_id | schema_name | table_name   | row_count_valid | row_count | rechunker_ignore
--------------+----------+-------------+--------------+-----------------+-----------+-----------------
master        |        1 | public      | nba          | true            |       457 |                0
master        |       12 | public      | cool_dates   | true            |         5 |                0
master        |       13 | public      | cool_numbers | true            |         9 |                0
master        |       27 | public      | jabberwocky  | true            |         8 |                0

3.3.4.2 List all schemas in the database

master=> SELECT * FROM sqream_catalog.schemas;
schema_id | schema_name   | schema_owner | rechunker_ignore
----------+---------------+--------------+-----------------
        0 | public        | sqream       | false
        1 | secret_schema | mjordan      | false

3.3.4.3 List columns and their types for a specific table

SELECT column_name, type_name
FROM sqream_catalog.columns
WHERE table_name='cool_animals';

3.3.4.4 List delete predicates

SELECT t.table_name, d.*
FROM sqream_catalog.delete_predicates AS d
INNER JOIN sqream_catalog.tables AS t
ON d.table_id=t.table_id;

3.3.4.5 List Saved queries

SELECT * FROM sqream_catalog.savedqueries;

3.4 Command line programs

SQream contains several command line programs for using, starting, managing, and configuring SQream DB clusters. This topic contains the reference for these programs, as well as flags and configuration settings.

Table 53: User CLIs

Command    | Usage
-----------+--------------------
sqream sql | Built-in SQL client

Table 54: SQream DB cluster components

Command         | Usage
----------------+----------------------------------------------------------------
sqreamd         | Start a SQream DB worker
metadata_server | The cluster manager/coordinator that enables scaling SQream DB.
server_picker   | Load balancer end-point

Table 55: SQream DB utilities

Command         | Usage
----------------+-----------------------------------------------------------------
SqreamStorage   | Initialize a cluster and set superusers
upgrade_storage | Upgrade metadata schemas when upgrading between major versions

Table 56: Docker utilities

Command          | Usage
-----------------+-------------------------------------------
sqream_console   | Dockerized convenience wrapper for operations
sqream_installer | Dockerized installer

3.4.1 metadata_server

SQream DB’s cluster manager/coordinator is called metadata_server.

In general, you should not need to run metadata_server manually, but it is sometimes useful for testing.

This page serves as a reference for the options and parameters.

3.4.1.1 Positional command line arguments

$ metadata_server [

Argument     | Default           | Description
-------------+-------------------+----------------------------------------------------------------
Logging path | Current directory | Path to store metadata logs into
Listen port  | 3105              | TCP listen port. If used, log path must be specified beforehand.

3.4.1.2 Starting metadata server

3.4.1.2.1 Starting temporarily

$ nohup metadata_server &
$ MS_PID=$!

Using nohup and & sends metadata server to run in the background.

Note:
• Logs are saved to the current directory, under metadata_server_logs.
• The default listening port is 3105.

3.4.1.2.2 Starting temporarily with non-default port

To use a non-default port, specify the logging path as well.

$ nohup metadata_server /home/rhendricks/metadata_logs 9241 &
$ MS_PID=$!

Using nohup and & sends metadata server to run in the background.

Note:
• Logs are saved to the /home/rhendricks/metadata_logs directory.
• The listening port is 9241.

3.4.1.2.3 Stopping metadata server

To stop metadata server:

$ kill -9 $MS_PID

Tip: It is safe to stop any SQream DB component at any time using kill. No partial data or data corruption should occur when using this method to stop the process.

3.4.2 sqreamd

SQream DB’s main worker is called sqreamd.

This page serves as a reference for the options and parameters.
3.4.2.1 Starting SQream DB

3.4.2.1.1 Start SQream DB temporarily

In general, you should not need to run sqreamd manually, but it is sometimes useful for testing.

$ nohup sqreamd -config ~/.sqream/sqream_config.json &
$ SQREAM_PID=$!

Using nohup and & sends SQream DB to run in the background.

To stop the active worker:

$ kill -9 $SQREAM_PID

Tip: It is safe to stop SQream DB at any time using kill. No partial data or data corruption should occur when using this method to stop the process.

3.4.2.2 Command line arguments

sqreamd supports the following command line arguments:

Argument   | Default                          | Description
-----------+----------------------------------+-------------------------------------------------------------
--version  | None                             | Outputs the version of SQream DB and immediately exits.
-config    | $HOME/.sqream/sqream_config.json | Specifies the configuration file to use
--port_ssl | Don’t use SSL                    | When specified, tells SQream DB to listen for SSL connections

3.4.2.2.1 Positional command arguments

sqreamd also supports positional arguments, when not using a configuration file.

This method can be used to temporarily start a SQream DB worker for testing.

$ sqreamd

Argument                    | Required | Description
----------------------------+----------+--------------------------------------------------------------------------
Storage path                | ✓        | Full path to a valid SQream DB persistent storage
GPU Ordinal                 | ✓        | Number representing the GPU to use. Check GPU ordinals with nvidia-smi -L
TCP listen port (unsecured) | ✓        | TCP port SQream DB should listen on. Recommended: 5000
License path                | ✓        | Full path to a SQream DB license file

3.4.3 sqream-console

sqream-console is an interactive shell designed to help manage a dockerized SQream DB installation.

The console itself is a dockerized application.

This page serves as a reference for the options and parameters.
In this topic:

• Starting the console
• Operations and flag reference
   – Commands
   – Master
      ∗ Syntax
      ∗ Common usage
         · Start master node
         · Start master node on different ports
         · Listing active master nodes and workers
         · Stopping all SQream DB workers and master
   – Workers
      ∗ Syntax
      ∗ Common usage
         · Start 2 workers
         · Stop a single worker
         · Start workers with a different spool size
         · Starting multiple workers on non-dedicated GPUs
         · Overriding default configuration files
   – Client
      ∗ Syntax
      ∗ Common usage
         · Start a client
         · Start a client to a specific worker
         · Start master node on different ports
         · Listing active master nodes and worker nodes
   – Editor
      ∗ Syntax
      ∗ Common usage
         · Start the editor UI
         · Stop the editor UI
• Using the console to start SQream DB
   – Starting a SQream DB cluster for the first time

3.4.3.1 Starting the console

sqream-console can be found in your SQream DB installation, under the name sqream-console.

Start the console by executing it from the shell:

$ ./sqream-console
Welcome to SQream Console ver 1.7.6, type exit to log-out

usage: sqream [-h] [--settings] {master,worker,client,editor} ...

Run SQream Cluster

optional arguments:
  -h, --help            show this help message and exit
  --settings            sqream environment variables settings

subcommands:
  sqream services

  {master,worker,client,editor}
                        sub-command help
    master              start sqream master
    worker              start sqream worker
    client              operating sqream client
    editor              operating sqream statement editor

sqream-console>

The console is now waiting for commands.

The console is a wrapper around a standard Linux shell. It supports commands like ls, cp, etc.

All SQream DB-specific commands start with the keyword sqream.
3.4.3.2 Operations and flag reference

3.4.3.2.1 Commands

Command       | Description
--------------+----------------------------------------------------
sqream --help | Shows the initial usage information
sqream master | Controls the master node’s operations
sqream worker | Controls workers’ operations
sqream client | Access to sqream sql
sqream editor | Controls the statement editor’s operations (web UI)

3.4.3.2.2 Master

The master node contains the metadata server and the load balancer.

3.4.3.2.2.1 Syntax

sqream master

Flag/command              | Description
--------------------------+------------------------------------------------------------------------------------------------------------------
--start [ --single-host ] | Starts the master node. The --single-host modifier sets the mode to allow all containers to run on the same server.
--stop [ --all ]          | Stops the master node and all connected workers. The --all modifier instructs the --stop command to stop all running services related to SQream DB.
--list                    | Shows a list of all active master nodes and their workers
-p                        |

3.4.3.2.2.2 Common usage

3.4.3.2.2.3 Start master node

sqream-console> sqream master --start
starting master server in single_host mode ...
sqream_single_host_master is up and listening on ports: 3105,3108

3.4.3.2.2.4 Start master node on different ports

sqream-console> sqream master --start -p 4105 -m 4108
starting master server in single_host mode ...
sqream_single_host_master is up and listening on ports: 4105,4108

3.4.3.2.2.5 Listing active master nodes and workers

sqream-console> sqream master --list
container name: sqream_single_host_worker_1, container id: de9b8aff0a9c
container name: sqream_single_host_worker_0, container id: c919e8fb78c8
container name: sqream_single_host_master, container id: ea7eef80e038

3.4.3.2.2.6 Stopping all SQream DB workers and master

sqream-console> sqream master --stop --all
shutting down 2 sqream services ...
sqream_editor stopped
sqream_single_host_worker_1 stopped
sqream_single_host_worker_0 stopped
sqream_single_host_master stopped

3.4.3.2.3 Workers

Workers are SQream DB daemons that connect to the master node.
3.4.3.2.3.1 Syntax

sqream worker

Flag/command                  | Description
------------------------------+---------------------------------------------
--start [ options [ ...] ]    | Starts worker nodes. See options table below.
--stop [                      |

Start options are specified consecutively, separated by spaces.

Table 57: Start options

Option | Description

3.4.3.2.3.2 Common usage

3.4.3.2.3.3 Start 2 workers

After starting the master node, start workers:

sqream-console> sqream worker --start 2
started sqream_single_host_worker_0 on port 5000, allocated gpu: 0
started sqream_single_host_worker_1 on port 5001, allocated gpu: 1

3.4.3.2.3.4 Stop a single worker

To stop a single worker, find its name first:

sqream-console> sqream master --list
container name: sqream_single_host_worker_1, container id: de9b8aff0a9c
container name: sqream_single_host_worker_0, container id: c919e8fb78c8
container name: sqream_single_host_master, container id: ea7eef80e038

Then, issue a stop command:

sqream-console> sqream worker --stop sqream_single_host_worker_1
stopped sqream_single_host_worker_1

3.4.3.2.3.5 Start workers with a different spool size

If no spool size is specified, the RAM is equally distributed among workers. Sometimes a system engineer may wish to specify the spool size manually.

This example starts two workers, with a spool size of 50GB per node:

sqream-console> sqream worker --start 2 -m 50

3.4.3.2.3.6 Starting multiple workers on non-dedicated GPUs

By default, SQream DB workers assign one worker per GPU. However, a system engineer may wish to assign multiple workers per GPU, if the workload permits it.
This example starts 4 workers on 2 GPUs, with 50GB spool each:

sqream-console> sqream worker --start 2 -g 0 -m 50
started sqream_single_host_worker_0 on port 5000, allocated gpu: 0
started sqream_single_host_worker_1 on port 5001, allocated gpu: 0
sqream-console> sqream worker --start 2 -g 1 -m 50
started sqream_single_host_worker_2 on port 5002, allocated gpu: 1
started sqream_single_host_worker_3 on port 5003, allocated gpu: 1

3.4.3.2.3.7 Overriding default configuration files

It is possible to override default configuration settings by listing a configuration file for every worker.

This example starts 2 workers on the same GPU, with modified configuration files:

sqream-console> sqream worker --start 2 -g 0 -j /etc/sqream/configfile.json /etc/sqream/configfile2.json

3.4.3.2.4 Client

The client operation runs sqream sql in interactive mode.

Note: The dockerized client is useful for testing and experimentation. It is not the recommended method for executing analytic queries. See more about connecting a third party tool to SQream DB for data analysis.

3.4.3.2.4.1 Syntax

sqream client

Flag/command | Description
-------------+-------------------------------------------------
--master     | Connects to the master node via the load balancer
--worker     | Connects to a worker directly
--host       |

3.4.3.2.4.2 Common usage

3.4.3.2.4.3 Start a client

Connect to the default master database through the load balancer:

sqream-console> sqream client --master -u sqream -w sqream
Interactive client mode
To quit, use ^D or \q.

master=> _

3.4.3.2.4.4 Start a client to a specific worker

Connect to database raviga directly through a worker on port 5000:

sqream-console> sqream client --worker -u sqream -w sqream -p 5000 -d raviga
Interactive client mode
To quit, use ^D or \q.

raviga=> _

3.4.3.2.4.5 Start master node on different ports

sqream-console> sqream master --start -p 4105 -m 4108
starting master server in single_host mode ...
sqream_single_host_master is up and listening on ports: 4105,4108

3.4.3.2.4.6 Listing active master nodes and worker nodes

sqream-console> sqream master --list
container name: sqream_single_host_worker_1, container id: de9b8aff0a9c
container name: sqream_single_host_worker_0, container id: c919e8fb78c8
container name: sqream_single_host_master, container id: ea7eef80e038

3.4.3.2.5 Editor

The editor operation runs the web UI for the SQream DB Statement Editor.

The editor can be used to run queries from a browser.

3.4.3.2.5.1 Syntax

sqream editor

Flag/command | Description
-------------+-------------------------------
--start      | Start the statement editor
--stop       | Shut down the statement editor
--port       |

3.4.3.2.5.2 Common usage

3.4.3.2.5.3 Start the editor UI

sqream-console> sqream editor --start
access sqream statement editor through Chrome http://192.168.0.100:3000

3.4.3.2.5.4 Stop the editor UI

sqream-console> sqream editor --stop
sqream_editor stopped

3.4.3.3 Using the console to start SQream DB

The console is used to start and stop SQream DB components in a dockerized environment.

3.4.3.3.1 Starting a SQream DB cluster for the first time

To start a SQream DB cluster, start the master node, followed by workers.

The example below starts 2 workers, running on 2 dedicated GPUs.

sqream-console> sqream master --start
starting master server in single_host mode ...
sqream_single_host_master is up and listening on ports: 3105,3108

sqream-console> sqream worker --start 2
started sqream_single_host_worker_0 on port 5000, allocated gpu: 0
started sqream_single_host_worker_1 on port 5001, allocated gpu: 1

sqream-console> sqream editor --start
access sqream statement editor through Chrome http://192.168.0.100:3000

SQream DB is now listening on port 3108 for any incoming statements.
A user can also access the web editor (running on port 3000 on the SQream DB machine) to connect and run queries.

3.4.4 sqream-installer

sqream-installer is an application that prepares and configures a dockerized SQream DB installation.

This page serves as a reference for the options and parameters.

In this topic:

• Operations and flag reference
   – Command line flags
• Usage
   – Install SQream DB for the first time
   – Modify exposed directories
   – Install a new license package
   – View system settings
   – Upgrading to a new version of SQream DB

3.4.4.1 Operations and flag reference

3.4.4.1.1 Command line flags

Flag | Description
-----+-----------------------------------------------------------------------------------
-i   | Loads the docker images for installation
-k   | Load new licenses from the license subdirectory
-K   | Validate licenses
-f   | Force overwrite any existing installation and data directories currently in use
-c   |

3.4.4.2 Usage

3.4.4.2.1 Install SQream DB for the first time

This example assumes the license package tarball has been placed in the license subfolder.

• The path where SQream DB will store data is /home/rhendricks/sqream_storage.
• Logs will be stored in /var/log/sqream.
• Source CSV, Parquet, and ORC files can be accessed from /home/rhendricks/source_data. All other directory paths are hidden from the Docker container.

# ./sqream-install -i -k -v /home/rhendricks/sqream_storage -l /var/log/sqream -c /etc/sqream -d /home/rhendricks/source_data

Note: Installation commands should be run with sudo or root access.

3.4.4.2.2 Modify exposed directories

To expose more directory paths for SQream DB to read and write data from, re-run the installer with additional directory flags.

# ./sqream-install -d /home/rhendricks/more_source_data

There is no need to specify the initial installation flags; only the modified exposed directory paths flag is required.

3.4.4.2.3 Install a new license package

This example assumes the license package tarball has been placed in the license subfolder.
# ./sqream-install -k

3.4.4.2.4 View system settings

This information may be useful to identify problems accessing directory paths, or locating where data is stored.

# ./sqream-install -s
SQREAM_CONSOLE_TAG=1.7.4
SQREAM_TAG=2020.1
SQREAM_EDITOR_TAG=3.1.0
license_worker_0=[...]
license_worker_1=[...]
license_worker_2=[...]
license_worker_3=[...]
SQREAM_VOLUME=/home/rhendricks/sqream_storage
SQREAM_DATA_INGEST=/home/rhendricks/source_data
SQREAM_CONFIG_DIR=/etc/sqream/
LICENSE_VALID=true
SQREAM_LOG_DIR=/var/log/sqream/
SQREAM_USER=sqream
SQREAM_HOME=/home/sqream
SQREAM_ENV_PATH=/home/sqream/.sqream/env_file
PROCESSOR=x86_64
METADATA_PORT=3105
PICKER_PORT=3108
NUM_OF_GPUS=8
CUDA_VERSION=10.1
NVIDIA_SMI_PATH=/usr/bin/nvidia-smi
DOCKER_PATH=/usr/bin/docker
NVIDIA_DRIVER=418
SQREAM_MODE=single_host

3.4.4.2.5 Upgrading to a new version of SQream DB

When upgrading to a new version with Docker, most settings don’t need to be modified.

The upgrade process replaces the existing docker images with new ones.

1. Obtain the new tarball, and untar it to an accessible location. Enter the newly extracted directory.

2. Install the new images:

   # ./sqream-install -i

3. The upgrade process will check for running SQream DB processes. If any are found running, the installer will ask to stop them in order to continue the upgrade process. Once all services are stopped, the new version will be loaded.

4. After the upgrade, open sqream-console and restart the desired services.

3.4.5 server_picker

SQream DB’s load balancer is called server_picker.

This page serves as a reference for the options and parameters.
3.4.5.1 Positional command line arguments

$ server_picker [

Argument                 Default  Description
Metadata server address           IP or hostname to an active metadata server
Metadata server port              TCP port to an active metadata server
TCP listen port          3108     TCP port for server picker to listen on
SSL listen port          3109     SSL port for server picker to listen on

3.4.5.2 Starting server picker

3.4.5.2.1 Starting temporarily

In general, you should not need to run server_picker manually, but it is sometimes useful for testing.

Assuming we have a metadata server listening on the localhost, on port 3105:

$ nohup server_picker 127.0.0.1 3105 &
$ SP_PID=$!

Using nohup and & sends server picker to run in the background.

3.4.5.2.2 Starting temporarily with non-default port

Tell server picker to listen on port 2255 for unsecured connections, and port 2266 for SSL connections.

$ nohup server_picker 127.0.0.1 3105 2255 2266 &
$ SP_PID=$!

Using nohup and & sends server picker to run in the background.

3.4.5.2.3 Stopping server picker

$ kill -9 $SP_PID

Tip: It is safe to stop any SQream DB component at any time using kill. No partial data or data corruption should occur when using this method to stop the process.

3.4.6 SqreamStorage

SqreamStorage allows the creation of a new storage cluster.

This page serves as a reference for the options and parameters.

3.4.6.1 Running SqreamStorage

SqreamStorage can be found in the bin directory of your SQream DB installation.

3.4.6.2 Command line arguments

SqreamStorage supports the following command line arguments:

Argument          Shorthand  Description
--create-cluster  -C         Creates a storage cluster at a specified path
--cluster-root    -r         Specifies the cluster path. The path must not already exist.
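
Because --cluster-root must point to a path that does not already exist, a wrapper script can check this before calling SqreamStorage. The following is a minimal sketch and is not part of SQream DB: the function names cluster_root_is_free and create_cluster are hypothetical, and the sketch assumes that SqreamStorage is on the PATH.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with SQream DB):
# succeeds only if a non-empty path was given and nothing exists there yet.
cluster_root_is_free() {
    [ -n "$1" ] && [ ! -e "$1" ]
}

# Hypothetical wrapper: run SqreamStorage only when the target path is free.
# Assumes SqreamStorage is on the PATH.
create_cluster() {
    if cluster_root_is_free "$1"; then
        SqreamStorage --create-cluster --cluster-root "$1"
    else
        echo "refusing to create cluster: '$1' is empty or already exists" >&2
        return 1
    fi
}
```

With these definitions loaded, create_cluster /home/rhendricks/raviga_database would run the same command as the example in this section, but only if that path does not exist yet.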
3.4.6.3 Examples

3.4.6.3.1 Create a new storage cluster

Create a new cluster at /home/rhendricks/raviga_database:

$ SqreamStorage --create-cluster --cluster-root /home/rhendricks/raviga_database
Setting cluster version to: 26

This message confirms that the cluster was created successfully.

The command can also be written in shorthand as SqreamStorage -C -r /home/rhendricks/raviga_database.

3.4.7 Sqream SQL CLI Reference

SQream DB comes with a built-in client for executing SQL statements either interactively or from the command-line.

This page serves as a reference for the options and parameters. Learn more about using SQream DB SQL with the CLI by visiting the First steps with SQream DB tutorial.

In this topic:
• Installing Sqream SQL
  – Troubleshooting Sqream SQL Installation
• Using Sqream SQL
  – Running Commands Interactively (SQL shell)
  – Executing Batch Scripts (-f)
  – Executing Commands Immediately (-c)
• Examples
  – Starting a Regular Interactive Shell
  – Executing Statements in an Interactive Shell
  – Executing SQL Statements from the Command Line
  – Controlling the Client Output
    ∗ Exporting SQL Query Results to CSV
    ∗ Changing a CSV to a TSV
  – Executing a Series of Statements From a File
  – Connecting Using Environment Variables
  – Connecting to a Specific Queue
• Operations and Flag References
  – Command Line Arguments
    ∗ Supported Record Delimiters
  – Meta-Commands
  – Basic Commands
  – Moving Around the Command Line
  – Searching

3.4.7.1 Installing Sqream SQL

If you have a SQream DB installation on your server, sqream sql can be found in the bin directory of your SQream DB installation, under the name sqream.

Note: If you installed SQream DB via Docker, the command is named sqream-client sql, and can be found in the same location as the console.

Changed in version 2020.1: As of version 2020.1, ClientCmd has been renamed to sqream sql.

To run sqream sql on any other Linux host:

1. Download the sqream sql tarball package from the Client drivers for v2021.1 page.
2. Untar the package:

   tar xf sqream-sql-v2020.1.1_stable.x86_64.tar.gz

3. Start the client:

   $ cd sqream-sql-v2020.1.1_stable.x86_64
   $ ./sqream sql --port=5000 --username=jdoe --databasename=master
   Password:

   Interactive client mode
   To quit, use ^D or \q.

   master=> _

3.4.7.1.1 Troubleshooting Sqream SQL Installation

Upon running sqream sql for the first time, you may get the following error:

error while loading shared libraries: libtinfo.so.5: cannot open shared object file: No such file or directory

Solving this error requires installing the ncurses or libtinfo libraries, depending on your operating system.

• Ubuntu:
  1. Install libtinfo:

     $ sudo apt-get install -y libtinfo

  2. Depending on your Ubuntu version, you may need to create a symbolic link to the newer libtinfo that was installed. For example, if libtinfo was installed as /lib/x86_64-linux-gnu/libtinfo.so.6.2:

     $ sudo ln -s /lib/x86_64-linux-gnu/libtinfo.so.6.2 /lib/x86_64-linux-gnu/libtinfo.so.5

• CentOS / RHEL:
  1. Install ncurses:

     $ sudo yum install -y ncurses-libs

  2. Depending on your RHEL version, you may need to create a symbolic link to the newer libtinfo that was installed. For example, if libtinfo was installed as /usr/lib64/libtinfo.so.6:

     $ sudo ln -s /usr/lib64/libtinfo.so.6 /usr/lib64/libtinfo.so.5

3.4.7.2 Using Sqream SQL

By default, sqream sql runs in interactive mode. You can issue commands or SQL statements.

3.4.7.2.1 Running Commands Interactively (SQL shell)

When starting sqream sql, after entering your password, you are presented with the SQL shell.

To exit the shell, type \q or Ctrl-d.

$ sqream sql --port=5000 --username=jdoe --databasename=master
Password:

Interactive client mode
To quit, use ^D or \q.

master=> _

The database name shown means you are now ready to run statements and queries.

Statements and queries are standard SQL, followed by a semicolon (;).
Statement results are usually formatted as a valid CSV, followed by the number of rows and the elapsed time for that statement.

master=> SELECT TOP 5 * FROM nba;
Avery Bradley ,Boston Celtics ,0,PG,25,6-2 ,180,Texas ,7730337
Jae Crowder ,Boston Celtics ,99,SF,25,6-6 ,235,Marquette ,6796117
John Holland ,Boston Celtics ,30,SG,27,6-5 ,205,Boston University ,\N
R.J. Hunter ,Boston Celtics ,28,SG,22,6-5 ,185,Georgia State ,1148640
Jonas Jerebko ,Boston Celtics ,8,PF,29,6-10,231,\N,5000000
5 rows
time: 0.001185s

Note: Null values are represented as \N.

When writing long statements and queries, it may be beneficial to use line breaks. The prompt for a multi-line statement changes from => to . to alert users to the change. The statement will not execute until a semicolon is used.

$ sqream sql --port=5000 --username=mjordan -d master
Password:

Interactive client mode
To quit, use ^D or \q.

master=> SELECT "Age",
. AVG("Salary")
. FROM NBA
. GROUP BY 1
. ORDER BY 2 ASC
. LIMIT 5
. ;
38,1840041
19,1930440
23,2034746
21,2067379
36,2238119
5 rows
time: 0.009320s

3.4.7.2.2 Executing Batch Scripts (-f)

To run an SQL script, use the -f argument:

$ sqream sql --port=5000 --username=jdoe -d master -f sql_script.sql --results-only

Tip: Output can be saved to a file by using redirection (>).

3.4.7.2.3 Executing Commands Immediately (-c)

To run a statement from the console, use the -c argument:

$ sqream sql --port=5000 --username=jdoe -d nba -c "SELECT TOP 5 * FROM nba"
Avery Bradley ,Boston Celtics ,0,PG,25,6-2 ,180,Texas ,7730337
Jae Crowder ,Boston Celtics ,99,SF,25,6-6 ,235,Marquette ,6796117
John Holland ,Boston Celtics ,30,SG,27,6-5 ,205,Boston University ,\N
R.J. Hunter ,Boston Celtics ,28,SG,22,6-5 ,185,Georgia State ,1148640
Jonas Jerebko ,Boston Celtics ,8,PF,29,6-10,231,\N,5000000
5 rows
time: 0.202618s

Tip: Remove the timing and row count by passing the --results-only parameter.

3.4.7.3 Examples

3.4.7.3.1 Starting a Regular Interactive Shell

Connect to local server 127.0.0.1 on port 5000, to the default built-in database, master:

$ sqream sql --port=5000 --username=mjordan -d master
Password:

Interactive client mode
To quit, use ^D or \q.

master=>_

Connect to local server 127.0.0.1 via the built-in load balancer on port 3108, to the default built-in database, master:

$ sqream sql --port=3108 --clustered --username=mjordan -d master
Password:

Interactive client mode
To quit, use ^D or \q.

master=>_

3.4.7.3.2 Executing Statements in an Interactive Shell

Note that all SQL commands end with a semicolon.

Creating a new database and switching over to it without reconnecting:

$ sqream sql --port=3105 --clustered --username=oldmcd -d master
Password:

Interactive client mode
To quit, use ^D or \q.
master=> create database farm;
executed
time: 0.003811s
master=> \c farm
farm=>
farm=> create table animals(id int not null, name varchar(30) not null, is_angry bool not null);
executed
time: 0.011940s
farm=> insert into animals values(1,'goat',false);
executed
time: 0.000405s
farm=> insert into animals values(4,'bull',true);
executed
time: 0.049338s
farm=> select * from animals;
1,goat ,0
4,bull ,1
2 rows
time: 0.029299s

3.4.7.3.3 Executing SQL Statements from the Command Line

$ sqream sql --port=3105 --clustered --username=oldmcd -d farm -c "SELECT * FROM animals WHERE is_angry = true"
4,bull ,1
1 row
time: 0.095941s

3.4.7.3.4 Controlling the Client Output

Two parameters control the display of results from the client:
• --results-only - removes row counts and timing information
• --delimiter - changes the record delimiter

3.4.7.3.4.1 Exporting SQL Query Results to CSV

Using the --results-only flag removes the row counts and timing.

$ sqream sql --port=3105 --clustered --username=oldmcd -d farm -c "SELECT * FROM animals" --results-only > file.csv
$ cat file.csv
1,goat ,0
2,sow ,0
3,chicken ,0
4,bull ,1

3.4.7.3.4.2 Changing a CSV to a TSV

The --delimiter parameter accepts any printable character.

Tip: To insert a tab, use Ctrl-V followed by Tab in Bash.
$ sqream sql --port=3105 --clustered --username=oldmcd -d farm -c "SELECT * FROM animals" --delimiter '	' > file.tsv
$ cat file.tsv
1	goat	0
2	sow	0
3	chicken	0
4	bull	1

3.4.7.3.5 Executing a Series of Statements From a File

Assuming a file containing SQL statements (separated by semicolons):

$ cat some_queries.sql
CREATE TABLE calm_farm_animals
  ( id INT IDENTITY(0, 1), name VARCHAR(30)
  );

INSERT INTO calm_farm_animals (name)
  SELECT name FROM animals WHERE is_angry = false;

$ sqream sql --port=3105 --clustered --username=oldmcd -d farm -f some_queries.sql
executed
time: 0.018289s
executed
time: 0.090697s

3.4.7.3.6 Connecting Using Environment Variables

You can save connection parameters as environment variables:

$ export SQREAM_USER=sqream;
$ export SQREAM_DATABASE=farm;
$ sqream sql --port=3105 --clustered --username=$SQREAM_USER -d $SQREAM_DATABASE

3.4.7.3.7 Connecting to a Specific Queue

When using the dynamic workload manager, connect to the etl queue instead of the default sqream queue.

$ sqream sql --port=3105 --clustered --username=mjordan -d master --service=etl
Password:

Interactive client mode
To quit, use ^D or \q.

master=>_

3.4.7.4 Operations and Flag References

3.4.7.4.1 Command Line Arguments

Sqream SQL supports the following command line arguments:

Argument              Default    Description
-c or --command       None       Changes the mode of operation to single-command, non-interactive. Use this argument to run a statement and immediately exit.
-f or --file          None       Changes the mode of operation to multi-command, non-interactive. Use this argument to run a sequence of statements from an external file and immediately exit.
--host                127.0.0.1  Address of the SQream DB worker.
--port                5000       Sets the connection port.
--databasename or -d  None       Specifies the database name for queries and statements in this session.
--username            None       Username to connect to the specified database.
--password            None       Specify the password using the command line argument. If not specified, the client will prompt the user for the password.
--clustered           False      When used, the client connects to the load balancer, usually on port 3108. If not set, the client assumes the connection is to a standalone SQream DB worker.
--service             sqream     Service name (queue) that statements will file into.
--results-only        False      Outputs results only, without timing information and row counts
--no-history          False      When set, prevents command history from being saved in ~/.sqream/clientcmdhist
--delimiter           ,          Specifies the field separator. By default, sqream sql outputs valid CSVs. Change the delimiter to modify the output to another delimited format (e.g. TSV, PSV). See the section supported record delimiters below for more information.

Tip: Run $ sqream sql --help to see a full list of arguments

3.4.7.4.1.1 Supported Record Delimiters

The supported record delimiters are printable ASCII values (32-126).
• Recommended delimiters for use are: ,, |, tab character.
• The following characters are not supported: \, N, -, :, ", \n, \r, ., lower-case latin letters, digits (0-9)

3.4.7.4.2 Meta-Commands

• Meta-commands in Sqream SQL start with a backslash (\)

Note: Meta commands do not end with a semicolon

Command      Example      Description
\q or \quit  master=> \q  Quit the client. (Same as Ctrl-d)
\c

3.4.7.4.3 Basic Commands

3.4.7.4.4 Moving Around the Command Line

Command  Description
Ctrl-a   Goes to the beginning of the command line.
Ctrl-e   Goes to the end of the command line.
Ctrl-u   Deletes from cursor to the beginning of the command line.
Ctrl-k   Deletes from the cursor to the end of the command line.
Ctrl-w   Delete from cursor to beginning of a word.
Ctrl-y   Pastes a word or text that was cut using one of the deletion shortcuts (such as the one above) after the cursor.
Alt-b    Moves back one word (or goes to the beginning of the word where the cursor is).
Alt-f    Moves forward one word (or goes to the end of the word where the cursor is).
Alt-d    Deletes to the end of a word starting at the cursor. Deletes the whole word if the cursor is at the beginning of that word.
Alt-c    Capitalizes letters in a word starting at the cursor. Capitalizes the whole word if the cursor is at the beginning of that word.
Alt-u    Capitalizes from the cursor to the end of the word.
Alt-l    Makes lowercase from the cursor to the end of the word.
Ctrl-f   Moves forward one character.
Ctrl-b   Moves backward one character.
Ctrl-h   Deletes characters located before the cursor.
Ctrl-t   Swaps a character at the cursor with the previous character.

3.4.7.4.5 Searching

Command  Description
Ctrl-r   Searches the history backward.
Ctrl-g   Escapes from history-searching mode.
Ctrl-p   Searches the previous command in history.
Ctrl-n   Searches the next command in history.

3.4.8 upgrade_storage

upgrade_storage is used to upgrade metadata schemas when upgrading between major versions.

This page serves as a reference for the options and parameters.

3.4.8.1 Running upgrade_storage

upgrade_storage can be found in the bin directory of your SQream DB installation.

3.4.8.2 Command line arguments

upgrade_storage takes one positional argument:

$ upgrade_storage

Argument      Required  Description
Storage path  ✓         Full path to a valid storage cluster

3.4.8.3 Results and error codes

Result                       Message                                                Description
Success                      storage has been upgraded successfully to version 26   Storage has been successfully upgraded
Success                      no need to upgrade                                     Storage doesn’t need an upgrade
Failure: can’t read storage  levelDB is in use by another application               Check permissions, and ensure no SQream DB workers or metadata_server are running when performing this operation.
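
A script that automates upgrades can match the messages in the table above to decide what to do next. The following is a minimal sketch under stated assumptions: the function name classify_upgrade_output and the labels it prints are hypothetical, and it assumes that upgrade_storage prints the message texts exactly as they are listed in the table.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with SQream DB): reads upgrade_storage
# output on stdin and prints a short label for the outcome.
# The matched strings are the messages listed in the results table.
classify_upgrade_output() {
    output=$(cat)
    case "$output" in
        *"storage has been upgraded successfully"*)
            echo "upgraded" ;;
        *"no need to upgrade"*)
            echo "up-to-date" ;;
        *"levelDB is in use by another application"*)
            echo "storage-in-use"
            return 1 ;;
        *)
            echo "unknown"
            return 2 ;;
    esac
}
```

For example, ./upgrade_storage /home/rhendricks/raviga_database 2>&1 | classify_upgrade_output would print one label per run. Redirecting stderr is a precaution, because this page does not state which stream the messages are written to.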
3.4.8.4 Examples

3.4.8.4.1 Upgrade SQream DB’s storage cluster

$ ./upgrade_storage /home/rhendricks/raviga_database
get_leveldb_version path{/home/rhendricks/raviga_database}
current storage version 23
upgrade_v24
upgrade_storage to 24
upgrade_storage to 24 - Done
upgrade_v25
upgrade_storage to 25
upgrade_storage to 25 - Done
upgrade_v26
upgrade_storage to 26
upgrade_storage to 26 - Done
validate_leveldb
storage has been upgraded successfully to version 26

This message confirms that the cluster was upgraded correctly.

3.5 SQL Feature Checklist

To understand which ANSI SQL and other SQL features SQream DB supports, use the tables below.

In this topic:
• Data types and values
• Constraints
• Transactions
• Indexes
• Schema changes
• Statements
• Clauses
• Table expressions
• Scalar expressions
• Permissions
• Extra functionality

3.5.1 Data types and values

Read more about supported data types.

Table 58: Value

Item                 Supported  Further information
BOOL                 ✓          Boolean values
TINYINT              ✓          Unsigned 1 byte integer (0 - 255)
SMALLINT             ✓          2 byte integer (-32,768 - 32,767)
INT                  ✓          4 byte integer (-2,147,483,648 - 2,147,483,647)
BIGINT               ✓          8 byte integer (-9,223,372,036,854,775,808 - 9,223,372,036,854,775,807)
REAL                 ✓          4 byte floating point
DOUBLE, FLOAT        ✓          8 byte floating point
DECIMAL, NUMERIC     ✓          Fixed-point numbers.
VARCHAR              ✓          Variable length string - ASCII only
TEXT, NVARCHAR       ✓          Variable length string - UTF-8 encoded
DATE                 ✓          Date
DATETIME, TIMESTAMP  ✓          Date and time
NULL                 ✓          NULL values
TIME                            Can be stored as a text string or as part of a DATETIME

3.5.2 Constraints

Table 59: Constraints

Item            Supported  Further information
Not null        ✓          NOT NULL
Default values  ✓          DEFAULT
AUTO INCREMENT  ✓          Different name: IDENTITY

3.5.3 Transactions

SQream DB treats each statement as an auto-commit transaction.
Each transaction is isolated from other transactions with serializable isolation.

If a statement fails, the entire transaction is cancelled and rolled back. The database is unchanged.

Read more about transactions in SQream DB.

3.5.4 Indexes

SQream DB has a range-index collected on all columns as part of the metadata collection process.

SQream DB does not support explicit indexing, but does support clustering keys.

Read more about clustering keys and our metadata system.

3.5.5 Schema changes

Table 60: Schema changes

Item                           Supported  Further information
ALTER TABLE                    ✓          ALTER TABLE - Add column, alter column, drop column, rename column, rename table, modify clustering keys
Rename database
Rename table                   ✓          RENAME TABLE
Rename column                  ✓          RENAME COLUMN
Add column                     ✓          ADD COLUMN
Remove column                  ✓          DROP COLUMN
Alter column data type
Add / modify clustering keys   ✓          CLUSTER BY
Drop clustering keys           ✓          DROP CLUSTERING KEY
Add / Remove constraints
Rename schema
Drop schema                    ✓          DROP SCHEMA
Alter default schema per user  ✓          ALTER DEFAULT SCHEMA

3.5.6 Statements

Table 61: Statements

Item                             Supported  Further information
SELECT                           ✓          SELECT
CREATE TABLE                     ✓          CREATE TABLE
CREATE FOREIGN / EXTERNAL TABLE  ✓          CREATE FOREIGN TABLE
DELETE                           ✓          Deleting Data
INSERT                           ✓          INSERT, COPY FROM
TRUNCATE                         ✓          TRUNCATE
UPDATE
VALUES                           ✓          VALUES

3.5.7 Clauses

Table 62: Clauses

Item               Supported  Further information
LIMIT / TOP        ✓
LIMIT with OFFSET
WHERE              ✓
HAVING             ✓
OVER               ✓

3.5.8 Table expressions

Table 63: Table expressions

Item                                                         Supported  Further information
Tables, Views                                                ✓
Aliases, AS                                                  ✓
JOIN - INNER, LEFT [ OUTER ], RIGHT [ OUTER ], CROSS         ✓
Table expression subqueries                                  ✓
Scalar subqueries

3.5.9 Scalar expressions

Read more about Scalar expressions.

Table 64: Scalar expressions

Item                          Supported  Further information
Common functions              ✓          CURRENT_TIMESTAMP, SUBSTRING, TRIM, EXTRACT, etc.
Comparison operators          ✓          <, <=, >, >=, =, <>, !=, IS, IS NOT
Boolean operators             ✓          AND, NOT, OR
Conditional expressions       ✓          CASE .. WHEN
Conditional functions         ✓          COALESCE
Pattern matching              ✓          LIKE, RLIKE, ISPREFIXOF, CHARINDEX, PATINDEX
REGEX POSIX pattern matching  ✓          RLIKE, REGEXP_COUNT, REGEXP_INSTR, REGEXP_SUBSTR
EXISTS
IN, NOT IN                    Partial    Literal values only
Bitwise arithmetic            ✓          &, |, XOR, ~, >>, <<

3.5.10 Permissions

Read more about Access control in SQream DB.

Table 65: Permissions

Item                            Supported  Further information
Roles as users and groups       ✓
Object default permissions      ✓
Column / Row based permissions
Object ownership

3.5.11 Extra functionality

Table 66: Extra functionality

Item                                          Supported  Further information
Information schema                            ✓          Catalog reference
Views                                         ✓          CREATE VIEW
Window functions                              ✓          Window Functions
CTEs                                          ✓          Common table expressions (CTEs)
Saved queries, Saved queries with parameters  ✓          Saved queries
Sequences                                     ✓          Identity

CHAPTER 4

Releases

Version               Release Date
What’s New in 2021.2  September 13, 2021
What’s New in 2021.1  June 13, 2021
What’s New in 2020.3  October 8, 2020
What’s New in 2020.2  July 22, 2020
What’s New in 2020.1  January 15, 2020

4.1 What’s New in 2021.2

The 2021.2 release notes were released on September 13, 2021.

4.1.1 New Features

The 2021.2 Release Notes include the following new features:
• New Driver Compatibility
• Centralized Configuration System
• Qualifying Schemas Without Providing an Alias
• Double-Quotations Supported When Importing and Exporting CSVs

4.1.1.1 New Driver Compatibility

The 2021.2 release supports the following drivers:
• JDBC - new driver version (JDBC 4.5) with important bug fixes.
• ODBC - ODBC 4.1.1, available on request.
• NodeJS - all versions starting with NodeJS 4.0. SQream recommends the latest version (NodeJS 4.2.4).
• Dot Net - SQream recommends version 3.02 (compatible with DotNet version 48).
• Pysqream - pysqream 3.1.2

4.1.1.2 Centralized Configuration System

SQream now uses a new configuration system based on centralized configuration accessible from SQream Studio.

For more information, see the following:
• Configuration - describes how to configure your instance of SQream from a centralized location.
• SQream Studio 5.4.0 - configure your instance of SQream from Studio.

4.1.1.3 Qualifying Schemas Without Providing an Alias

When running queries, SQream now supports qualifying schemas without providing an alias.

For more information, see CREATE SCHEMA.

4.1.1.4 Double-Quotations Supported When Importing and Exporting CSVs

When importing and exporting CSVs, SQream now supports using quotation characters other than double quotation marks (").

For more information, see the following:
• COPY FROM
• COPY TO

Note the following:
• Leaving
• The quotation character must be a single, 1-byte printable ASCII character. The same octal syntax of the copy command can be used.
• The quote character cannot be contained in the field delimiter, record delimiter, or null marker.
• Double-quotations can be customized when the csv_fdw value is used with the COPY FROM and CREATE FOREIGN TABLE statements.
• The default escape character always matches the quote character, and can be overridden by using the ESCAPE = {'\\' | E'\XXX'} syntax as shown in the following examples:

copy t from wrapper csv_fdw options (location = '/tmp/file.csv', escape='\\');

copy t from wrapper csv_fdw options (location = '/tmp/file.csv', escape=E'\017');

copy t to wrapper csv_fdw options (location = '/tmp/file.csv', escape='\\');

For more information, see the following statements:
• COPY FROM - Loading data from files on the filesystem and importing it into SQream tables.
• CREATE FOREIGN TABLE - Creating a new foreign table in an existing database.
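
When a statement such as the escape examples above is run from a shell script, the backslashes are easy to lose to shell quoting. The following is a minimal sketch of one way to avoid that: it writes the statement to a file through a quoted here-document, which the shell copies verbatim, so the file can then be passed to the client with -f. The function name write_copy_script and the file name are hypothetical, and the statement itself is copied from the first example above.

```shell
#!/bin/sh
# Hypothetical helper: writes the COPY statement to the file named in $1.
# The quoted delimiter ('SQL') stops the shell from interpreting the
# backslashes, so escape='\\' reaches the file unchanged.
write_copy_script() {
    cat > "$1" <<'SQL'
copy t from wrapper csv_fdw options (location = '/tmp/file.csv', escape='\\');
SQL
}
```

After calling write_copy_script load_escape.sql, the script can be run in the same way as the batch example in the Sqream SQL CLI reference, for instance with sqream sql --port=5000 --username=jdoe -d master -f load_escape.sql.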
4.2 What’s New in 2021.1

The 2021.1 Release Notes describe the following releases:
• Release Notes Version 2021.1.2
• Release Notes Version 2021.1.1
• Release Notes Version 2021.1

4.2.1 Release Notes Version 2021.1.2

The 2021.1.2 release notes were released on 8/9/2021.

4.2.1.1 New Features

The 2021.1.2 Release Notes include the following new features:
• Aliases Added to SUBSTRING Function and Length Argument
• Data Type Aliases Added
• String Literals Containing ASCII Characters Interpreted as TEXT
• Decimal Literals Interpreted as Numeric Columns
• Roles Area Added to Studio Version 5.3.3

4.2.1.1.1 Aliases Added to SUBSTRING Function and Length Argument

The following aliases have been added:
• length - len
• substring - substr

4.2.1.1.2 Data Type Aliases Added

The following data type aliases have been added:
• INTEGER - int
• DECIMAL - numeric
• DOUBLE PRECISION - double
• CHARACTER/CHAR - text
• NATIONAL CHARACTER/NATIONAL CHAR/NCHAR - text
• CHARACTER VARYING/CHAR VARYING - text
• NATIONAL CHARACTER VARYING/NATIONAL CHAR VARYING/NCHAR VARYING - text

4.2.1.1.3 String Literals Containing ASCII Characters Interpreted as TEXT

SQream now interprets all string literals, including those containing ASCII characters, as text.

For more information, see String Types.

4.2.1.1.4 Decimal Literals Interpreted as Numeric Columns

SQream now interprets literals containing decimal points as numeric instead of as double.

For more information, see Data Types.

4.2.1.1.5 Roles Area Added to Studio Version 5.3.3

The Roles area has been added to Studio version 5.3.3. From the Roles area, users can create and assign roles and manage user permissions.

4.2.1.2 Resolved Issues

The following list describes the resolved issues:
• In Parquet files, float columns could not be mapped to SQream double columns. This was fixed.
• The REPLACE function only supported constant values as arguments. This was fixed.
• The LIKE function did not check for incorrect patterns or handle escape characters. This was fixed.

4.2.2 Release Notes Version 2021.1.1

The 2021.1.1 release notes were released on 7/27/2021.

4.2.2.1 New Features

The 2021.1.1 Release Notes include the following new features:
• Complete Ranking Function Support

4.2.2.1.1 Complete Ranking Function Support

SQream now supports the following new ranking functions:

Function        Return Type         Description
first_value     Same type as value  Returns the value in the first row of a window.
last_value      Same type as value  Returns the value in the last row of a window.
nth_value       Same type as value  Returns the value in a specified (n) row of a window. If the specified row does not exist, this function returns NULL.
dense_rank      bigint              Returns the rank of the current row with no gaps.
percent_rank    double              Returns the relative rank of the current row.
cume_dist       double              Returns the cumulative distribution of rows.
ntile(buckets)  integer             Returns an integer ranging between 1 and the argument value, dividing the partitions as equally as possible.

For more information, navigate to Window Functions and scroll to the Ranking Functions table.

4.2.2.2 Resolved Issues

The following list describes the resolved issues:
• SQream did not support exporting and reading Int64 columns as bigint in Parquet. This was fixed.
• The Decimal column was not supported when inserting data from Parquet files. This was fixed.
• Values in Parquet Numeric columns were not being converted correctly. This was fixed.
• Converting string data type to datetime was not working correctly. This was fixed.
• Casting datetime to text truncated the time. This was fixed.

4.2.3 Release Notes Version 2021.1

The 2021.1 release notes were released on 6/13/2021.

4.2.3.1 Version Content

The 2021.1 Release Notes describe the following:
• Major feature release targeted for all on-premises customers.
• Basic Cloud functionality.

4.2.3.2 New Features

The 2021.1 Release Notes include the following new features:
• SQream DB on Cloud
• Numeric Data Types
• Text Data Type
• Supports Scalar Subqueries
• Literal Arguments
• Simple Scalar SQL UDFs
• Logging Enhancements
• Improved Presented License Information
• Optimized Foreign Data Wrapper Export

4.2.3.2.1 SQream DB on Cloud

SQream DB can now be run on AWS, GCP, and Azure.

4.2.3.2.2 Numeric Data Types

SQream now supports Numeric Data types for the following operations:
• All join types.
• All aggregation types (not including Window functions).
• Scalar functions (not including some trigonometric and logarithmic functions).

For more information, see Numeric Data Types.

4.2.3.2.3 Text Data Type

SQream now supports the TEXT data type in all operations. TEXT is the default string data type for new projects.
• SQream supports VARCHAR functionality, but recommends using TEXT.
• TEXT data enhancements introduced in Release Notes version 2020.3.1:
  – Support text columns in queries with multiple distinct aggregates.
  – Text literal support for all functions.

For more information, see String Types.

4.2.3.2.4 Supports Scalar Subqueries

SQream now supports running initial scalar subqueries.

For more information, see Subqueries.

4.2.3.2.5 Literal Arguments

SQream now supports literal arguments for functions in all cases where column/scalar arguments are supported.

4.2.3.2.6 Simple Scalar SQL UDFs

SQream now supports simple scalar SQL UDF’s.

For more information, see Simple Scalar SQL UDF’s.

4.2.3.2.7 Logging Enhancements

Log information has been added for the following events:
• Compilation start time.
• When the first metadata callback in the compiler occurred (if relevant).
• When the last metadata callback in the compiler occurred (if relevant).
• When the log started attempting to apply locks.
• When a statement entered the queue.
• When a statement exited the queue.
• When a client has connected to an instance of sqreamd (if it reconnects).
• When the log started executing.

4.2.3.2.8 Improved Presented License Information

SQream now displays license information, including data size limitations, expiration date, and license type, through a new Utility Function (UF) named get_license_info().

For more information, see GET_LICENSE_INFO.

4.2.3.2.9 Optimized Foreign Data Wrapper Export

SQream now supports exporting to multiple files concurrently. This is useful when you need to reduce file size, making it easier to export multiple files.

The following is the correct syntax for exporting multiple files concurrently:

COPY table_name TO fdw_name OPTIONS(max_file_size=size_in_bytes,enforce_single_file={TRUE|FALSE});

The following is an example of the correct syntax for exporting multiple files concurrently:

COPY my_table1 TO my_ext_table OPTIONS(max_file_size=500000,enforce_single_file=TRUE);

The following apply:
• Both of the parameters in the above example are optional.
• The max_file_size value is specified in bytes and can be any positive value. The default value is 16*2^20 (16MB).
• When the enforce_single_file value is set to TRUE, only one file is created, and its size is not limited by the max_file_size value. Its default value is TRUE.

4.2.3.3 Main Features

The following list describes the main features:
• SQreamDB available on AWS.
• SQreamDB available on GCP.
• SQreamDB available on Azure.
• SQream uses storage located on Object Store (as opposed to local disks) for the above three cloud providers.
• SQream now supports Microstrategy.
• Supports MVP licensing system.
• A new literal syntax containing character escape semantics for string literals has been added.
• Supports optimizing exporting foreign data wrappers.
• Supports truncating Numeric values when ingested from ORC and CSV files.
• Supports catalog Utility Function that accepts valid SQL patterns and escape characters.
• Supports creating a basic random data foreign data wrapper for non-text types.
• The new foreign data wrapper random_fdw has been introduced for non-text types.
• Supports simple scalar SQL UDF’s.
• SQream parses its own logs as CSV’s.

4.2.3.4 Resolved Issues

The following list describes the resolved issues:
• Copying text from a CSV file to the TEXT column without closing quotes caused SQream to crash. This was fixed.
• Using an unsupported function call generated an incorrect insert error. This was fixed.
• Using the insert into function from table_does_not_exist generated an incorrect error.
• SQream treated inserting * in select_distinct as one column. This was fixed.
• Using certain encodeKey functions generated errors. This was fixed.
• Compile errors occurred while running decimal datatype sets. This was fixed.
• Running the select table_name,row_count from sqream_catalog.tables order by row_count limit 5 query generated an internal runtime error.
• Using wildcards (such as *.x.y) did not work in parquet files. This was fixed.
• Executing log*(x,y) generated an incorrect error message. This was fixed.
• The internal runtime error type doesn’t have a fixed size when doing max on text on develop.
• The min and max on TEXT were significantly slower than varchar. This was fixed.
• Running regexp_instr generated an empty regular expression. This was fixed.
• Schemas with external tables could be dropped. This was fixed.

4.2.3.5 Operations and Configuration Changes

4.2.3.5.1 Recommended SQream Configuration on Cloud

For more information about AWS, see Amazon S3.

4.2.3.5.2 Optimized Foreign Data Wrapper Export Configuration Flag

SQream now has a new runtimeGlobalFlags flag called WriteToFileThreads.

This flag configures the number of threads in the WriteToFile function. The default value is 16.
For more information about the runtimeGlobalFlags flag, see the Runtime Global Flags table in Configuration.

4.2.3.6 Naming Changes
No relevant naming changes were made.

4.2.3.7 Deprecated Features
No features were deprecated.

4.2.3.8 Known Issues and Limitations
The following list describes the known issues and limitations:
• When selecting top 1 from an external table using the Parquet format with an HDFS path, SQream experienced an error.
• An internal runtime error occurred when SQream was unable to find a column in reorder columns.
• Casting datetime to text truncates the time segment.
• In the select list, the compiler generates an error when a count is used as an alias.
• Performance degradation occurred when joins were made on small tables.
• SQream causes a logging error when using copy from logs.
• Deploying S3 requires setting the ObjectStoreClients parameter to 40.

4.2.3.9 Upgrading to v2021.1
Due to a known limitation on the number of access requests that can be sent to AWS simultaneously, deploying S3 requires setting the ObjectStoreClients parameter to 40.

4.3 What’s New in 2020.3
SQream DB v2020.3 contains new features, performance enhancements, and resolved issues.
The 2020.3 Release Notes describe the following:
• Release Notes Version 2020.3.2.1
• Release Notes Version 2020.3.1
• Release Notes Version 2020.3

4.3.1 Release Notes Version 2020.3.2.1
SQream DB v2020.3.2.1 contains major performance improvements and some bug fixes.

4.3.1.1 Performance Enhancements
• Metadata on Demand optimization resulting in reduced latency and improved overall performance.

4.3.1.2 Known Issues and Limitations
• Multiple COUNT DISTINCT operations are blocked for the TEXT data type.

4.3.1.3 Upgrading to v2020.3.2.1
Versions are available for IBM POWER9, RedHat (CentOS) 7, Ubuntu 18.04, and other OSs via Docker.
Contact your account manager to get the latest release of SQream DB.
4.3.2 Release Notes Version 2020.3.1

4.3.2.1 New Features
The following list describes the new features:
• TEXT data type:
  – Full support for MIN and MAX aggregate functions on TEXT columns in GROUP BY queries.
  – Support Text-type as window partition keys (e.g., select distinct name, max(id) over (partition by name) from textTable;).
  – Support Text-type fields in windows order by keys.
  – Support join on TEXT columns (such as t1.x = t2.y where x and y are columns of type TEXT).
  – Complete the implementation of LIKE on TEXT columns (previously limited to prefix and suffix).
  – Support for cast from TEXT to REAL/FLOAT.
  – New string function - REPEAT for repeating a string value for a specified number of times.
• Support mapping DECIMAL ORC columns to SQream’s floating-point types.
• Support LIKE on non-literal patterns (such as columns and complex expressions).
• Catch OS signals and save the signal along with the stack trace in the SQream debug log.
• Support equijoin conditions on columns with different types (such as tinyint, smallint, int and bigint).
• DUMP_DATABASE_DDL now includes foreign tables in the output.
• New utility function - TRUNCATE_IF_EXISTS.

4.3.2.2 Performance Enhancements
The following list describes the performance enhancements:
• Introduced the “MetaData on Demand” feature, which results in significant performance improvements.
• Implemented regex functions (RLIKE, REGEXP_COUNT, REGEXP_INSTR, REGEXP_SUBSTR, PATINDEX) for TEXT columns on GPU.

4.3.2.3 Resolved Issues
The following list describes the resolved issues:
• Multiple distinct aggregates no longer need to be used with developerMode flag.
• In some scenarios, the statement_id and connection_id values are incorrectly recorded as -1 in the log.
• NOT RLIKE is not supported for TEXT in the compiler.
• Casting from TEXT to date/datetime returns an error when the TEXT column contains NULL.
4.3.2.4 Known Issues and Limitations
No known issues and limitations.

4.3.2.5 Upgrading to v2020.3.1
Versions are available for IBM POWER9, RedHat (CentOS) 7, Ubuntu 18.04, and other OSs via Docker.
Contact your account manager to get the latest release of SQream DB.

4.3.3 Release Notes Version 2020.3

4.3.3.1 New Features
The following list describes the new features:
• Parquet and ORC files can now be exported to local storage, S3, and HDFS with COPY TO and foreign data wrappers.
• New error tolerance features when loading data with foreign data wrappers.
• TEXT is ramping up with new features (previously only available with VARCHARs):
  – SUBSTRING, LOWER, LTRIM, CHARINDEX, REPLACE, etc.
  – Binary operators - || (Concatenate), LIKE, etc.
  – Casts to and from TEXT
• sqream_studio v5.1
  – New log viewer helps you track and debug what’s going on in SQream DB.
  – Dashboard now also available for non-k8s deployments.
  – The editor contains a new query concurrency tool for date and numeric ranges.

4.3.3.2 Performance Enhancements
The following list describes the performance enhancements:
• Error handling for CSV FDW.
• Enable logging errors - ORC, Parquet, CSV.
• Add limit and offset options to csv_fdw import.
• Enable logging errors to an external file when skipping CSV, Parquet, and ORC errors.
• Option to specify date format to the CSV FDW.
• Support all existing VARCHAR functions with TEXT on GPU.
• Support INSERT INTO + ORDER BY optimization for non-clustered tables.
• Performance improvements with I/O.

4.3.3.3 Resolved Issues
The following list describes the resolved issues:
• Better error message when passing the max errors limit. This was fixed.
• showFullExceptionInfo is no longer restricted to Developer Mode. This was fixed.
• A StreamAggregateA reduction error occurred when performing aggregation on a NULL column. This was fixed.
• Insert into query fails with “Error at Sql phase during Stages "rewriteSqlQuery"”. This was fixed.
• Casting from VARCHAR to NVARCHAR does not remove the spaces. This was fixed.
• An Internal Runtime Error t1.size() == t2.size() occurs when querying the sqream_catalog.delete_predicates. This was fixed.
• spoolMemoryGB and limitQueryMemoryGB show incorrectly in the runtime global section of show_conf. This was fixed.
• Casting empty text to int causes illegal memory access. This was fixed.
• Copying from the TEXT field is 1.5x slower than the VARCHAR equivalent. This was fixed.
• TPCDS 10TB - Internal runtime error (std::bad_alloc: out of memory) occurs on 2020.1.0.2. This was fixed.
• An unequal join on non-existing TEXT caused a system crash. This was fixed.
• An internal runtime error occurred when using TEXT (tpcds). This was fixed.
• Copying CSV with a quote in the middle of a field to a TEXT field does not produce the required error. This was fixed.
• Cannot monitor long network insert loads with SQream. This was fixed.
• Upper and like performance on NVARCHAR. This was fixed.
• Insert into from 4 instances would get stuck (hanging). This was fixed.
• An invalid formatted CSV would cause an insufficient memory error on a COPY FROM statement if a quote was not closed and the file was much larger than system memory. This was fixed.
• TEXT columns cannot be used with an outer join together with an inequality check (!= , <>). This was fixed.

4.3.3.4 Known Issues and Limitations
The following list describes the known issues and limitations:
• Cast from TEXT to a DATE or DATETIME errors when the TEXT column contains NULL
• Casting an empty TEXT field to an INT type returns 0 instead of erroring
• Multiple COUNT( distinct ... ) operations on the TEXT data type are currently unsupported
• Multiple COUNT( distinct ... ) operations within the same query are limited to “developer mode” due to an instability that was identified. If you rely on this feature, contact your SQream account manager to enable this feature.

4.3.3.5 Upgrading to v2020.3
Versions are available for IBM POWER9, RedHat (CentOS) 7, Ubuntu 18.04, and other OSs via Docker.
Contact your account manager to get the latest release of SQream DB.

4.4 What’s New in 2020.2
SQream DB v2020.2 contains some new features, improved performance, and bug fixes.
This version has new window ranking functions and a new editor UI to empower data users to analyze more data with less friction.
As always, the latest release improves reliability and performance, and makes getting more data into SQream DB easier than ever.

4.4.1 New features

4.4.1.1 UI
• New sqream_studio replaces the previous Statement Editor. Try it out today!

4.4.1.2 Integrations
• Our Python driver (pysqream) now has an SQLAlchemy dialect. Customers can write high-performance Python applications that make full use of SQream DB - connect, query, delete, and insert data. Data scientists can use pysqream with Pandas, Numpy, and AI/ML frameworks like TensorFlow for direct queries of huge datasets.

4.4.1.3 SQL support
• Added LAG/LEAD ranking functions to our Window Functions support. We will have more features coming in the next version.
• New syntax preview for Foreign Tables. Foreign tables replace external tables, with improved functionality. You can keep using the existing external table syntax for now, but it may be deprecated in the future.

CREATE FOREIGN TABLE orc_example
(
    name varchar(40),
    Age tinyint,
    Salary float
)
WRAPPER orc_fdw
OPTIONS
(
    LOCATION = 'hdfs://hadoop-nn.piedpiper.com:8020/demo-data/example.orc'
);

4.4.2 Improvements and fixes
SQream DB v2020.2 includes hundreds of small new features and tunable parameters that improve performance, reliability, and stability.
• ~100 bug fixes, including:
  – Fixed CSV handling for DOS newlines
  – Fixed “out of bounds” message when several layers of nested substring, cast, and to_hex were used to produce one value.
  – Fixed “Illegal memory access” that would occur in extremely rare situations on all-text tables
  – Window functions can now be used with all aggregations
  – Fixed situation where a single worker may use more than one GPU that isn’t allocated to it
  – Text columns can now be added to existing tables with ALTER TABLE
• New Data clustering syntax that can improve query performance for unsorted data

4.4.3 Operations
• When upgrading from a previous version of SQream DB (for example, v2019.2), the storage version must be upgraded using the upgrade_storage utility: ./bin/upgrade_storage /path/to/storage/sqreamdb/
• A change in memory allocation behaviour in this version sees the introduction of a new setting, limitQueryMemoryGB. This is an addition to the previous spoolMemoryGB setting. A good rule of thumb is to leave 5% of system memory for other processes. The spool memory allocation should be around 90% of the total memory allocated.
  – limitQueryMemoryGB defines how much total system memory is used by the worker. The recommended setting is (total host memory - 5%) / sqreamd workers on host.
  – spoolMemoryGB defines how much memory is set aside for spooling, out of the total system memory allocated in limitQueryMemoryGB. The recommended setting is 90% of the limitQueryMemoryGB. This setting must be set lower than the limitQueryMemoryGB setting.
For example, for a machine with 512GB of RAM and 4 workers, the recommended settings are:
  – limitQueryMemoryGB - (512 * 0.95 / 4) → ~ 486 / 4 → 121.
  – spoolMemoryGB - ( 0.9 * limitQueryMemoryGB ) → ( 0.9 * 121 ) → 108
Example settings per-worker, for 512GB of RAM and 4 workers:
"runtimeFlags": {
    "limitQueryMemoryGB" : 121,
    "spoolMemoryGB" : 108
}

4.4.4 Known Issues & Limitations
• An invalid formatted CSV can cause an insufficient memory error on a COPY FROM statement if a quote isn’t closed and the file is much larger than system memory.
• Multiple COUNT( distinct ... ) operations within the same query are limited to “developer mode” due to an instability that was identified. If you rely on this feature, contact your SQream account manager to enable this feature.
• TEXT columns can’t be used with an outer join together with an inequality check (!= , <>)

4.4.5 Upgrading to v2020.2
Versions are available for IBM POWER9, RedHat (CentOS) 7, Ubuntu 18.04, and other OSs via Docker.
Contact your account manager to get the latest release of SQream DB.

4.5 What’s New in 2020.1
SQream DB v2020.1 contains lots of new features, improved performance, and bug fixes.
This is the first release of 2020, with a strong focus on integration into existing environments. The release includes connectivity to Hadoop and other legacy data warehouse ecosystems. We’re also bringing lots of new capabilities to our analytics engine, to empower data users to analyze more data with less friction.
The latest release vastly improves reliability and performance, and makes getting more data into SQream DB easier than ever.
The core of SQream DB v2020.1 contains new integration features, more analytics capabilities, and better drivers and connectors.

4.5.1 New features

4.5.1.1 Integrations
• Load files directly from S3 buckets. Customers with columnar data in S3 data lakes can now access the data directly. All that is needed is to simply point an external table to an S3 bucket with Parquet, ORC, or CSV objects.
This feature is available on all deployments of SQream DB – in the cloud and on-prem.
• Load files directly from HDFS. SQream DB now comes with built-in, native HDFS support for directly loading data from Hadoop-based data lakes. Our focus on helping Hadoop customers do more with their data led us to develop this feature, which works out of the box. As a result, SQream DB can now not only read data, but also write data and intermediate results back to HDFS for Hive and other data consumers. SQream DB now fits seamlessly into a Hadoop data pipeline.
• Import ORC files, through Foreign Tables. ORC files join Parquet as files that can be natively accessed and inserted into SQream DB tables.
• Python driver (pysqream) is now DB-API v2.0 compliant. Customers can write high-performance Python applications that make full use of SQream DB - connect, query, delete, and insert data. Data scientists can use pysqream with Pandas, Numpy, and AI/ML frameworks like TensorFlow for direct queries of huge datasets.
• Certified Tableau JDBC connector (taco), now also supported on MacOS. Users are encouraged to install the new JDBC connector.
• All logs are now unified into one log, which can be analyzed with SQream DB directly. See Logging for more information.

4.5.1.2 SQL support
• Added frames and frame exclusions to Window Functions. This is available for preview, with more features coming in the next version. The new frames and frame exclusions feature adds complex analytics capabilities to the already powerful window functions.
• New datatype - TEXT, which replaces NVARCHAR directly with UTF-8 support and improved performance. Unlike VARCHAR, the new TEXT data type has no restrictions on size, and carries no performance overhead as the text sizes grow.
• TEXT join keys are now supported
• Added lots of new aggregate functions, including VAR_SAMP, VAR_POP, COVAR_POP, etc.
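As a rough illustration of what the new aggregates compute, the sketch below reproduces the textbook definitions of sample variance (VAR_SAMP), population variance (VAR_POP), and population covariance (COVAR_POP) in plain Python. It is not SQream DB code: the sample values and the helper name covar_pop are made up for illustration, and it assumes SQream DB follows the standard SQL definitions of these functions.

```python
import statistics


def covar_pop(xs, ys):
    """Population covariance: mean of (x - mean_x) * (y - mean_y).

    Hypothetical helper that mirrors the standard SQL definition of
    COVAR_POP; it is not part of any SQream DB API.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)


# Made-up sample columns, standing in for two numeric table columns.
salaries = [10.0, 20.0, 30.0, 40.0]
ages = [1.0, 2.0, 3.0, 4.0]

var_samp = statistics.variance(salaries)   # VAR_SAMP: divides by n - 1
var_pop = statistics.pvariance(salaries)   # VAR_POP: divides by n
cov_pop = covar_pop(salaries, ages)        # COVAR_POP: divides by n

print(var_samp, var_pop, cov_pop)
```

The sample variant divides by n - 1 and is the usual choice when the rows are a sample of a larger population; the population variants divide by n and suit cases where the table holds every value of interest.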
4.5.2 Improvements and fixes
SQream DB v2020.1 includes hundreds of small new features and tunable parameters that improve performance, reliability, and stability. Existing SQream DB users can expect to see a general speedup of around 10% on most statements and queries!
• 207 bug fixes, including:
  – Improved performance of both inner and outer joins
  – Fixed wrong results on STDDEV (0 instead of NULL)
  – Fixed wrong results on nested Parquet files
  – Fixed failing cast from VARCHAR to FLOAT
  – Fixed INSERT that would fail on nullable values and non-nullable columns in some scenarios
  – Improved memory consumption, so Out of GPU memory errors should not occur anymore
  – Reduced long compilation times for very complex queries
  – Improved ODBC reliability
  – Fixed situation where some logs would clip very long queries
  – Improved error messages when dropping a schema with many objects
  – Fixed situation where Spotfire would not show table names
  – Fixed situation where some queries with UTF-8 literals wouldn’t run through Tableau over ODBC
  – Significantly improved cache freeing and memory allocation
  – Fixed situation in which a malformed time (24:00:00) would get incorrectly inserted from a CSV
  – Fixed race condition in which loading thousands of small files from HDFS caused a memory leak
• The saved query feature can now be used with INSERT statements
• Faster “Deferred gather” algorithm for joins with text keys
• Faster filtering when using DATEPART
• Faster metadata tagging during load
• Fixed situation where some queries would get compiled twice
• Saved queries now support INSERT statements
• highCardinalityColumns can be configured to tell the system about high selectivity columns
• sqream sql starts up faster, can run on any Linux machine
• Additional CSV date formats (date parsers) added for compatibility

4.5.3 Behaviour changes
• ClientCmd is now known as sqream sql
• NVARCHAR columns are now known as TEXT internally
• Deprecated the ability to run SELECT and COPY at the same time on the same worker. This change is designed to protect against out of GPU memory issues. This comes with a configuration change, namely the limitQueryMemoryGB setting. See the operations section for more information.
• All logs are now unified into one log. See Logging for more information
• Compression changes:
  – The latest version of SQream DB could select a different compression scheme if data is reloaded, compared to previous versions of SQream DB. This internal change improves performance.
  – With LZ4 compression, the maximum chunk size is limited to 2.1GB. If the chunk size is bigger, another compression may be selected - primarily SNAPPY.
• The following configuration flags have been deprecated:
  – addStatementRechunkerAfterGpuToHost
  – increasedChunkSizeFactor
  – gpuReduceMergeOutputFactor
  – fullSortInputMemFactor
  – reduceInputMemFactor
  – distinctInputMemFactor
  – useAutoMemFactors
  – autoMemFactorsVramFactor
  – catchNotEnoughVram
  – useNetworkRechunker
  – useMemFactorInJoinOutput

4.5.4 Operations
• The client-server protocol has been updated to support faster data flow, and more reliable memory allocations on the client side. End users are required to use only the latest sqream sql, JDBC, and ODBC drivers delivered with this version. See the client driver download page for the latest drivers and connectors.
• When upgrading from a previous version of SQream DB (for example, v2019.2), the storage version must be upgraded using the upgrade_storage utility: ./bin/upgrade_storage /path/to/storage/sqreamdb/
• A change in memory allocation behaviour in this version sees the introduction of a new setting, limitQueryMemoryGB. This is an addition to the previous spoolMemoryGB setting. A good rule of thumb is to leave 5% of system memory for other processes. The spool memory allocation should be around 90% of the total memory allocated.
  – limitQueryMemoryGB defines how much total system memory is used by the worker. The recommended setting is (total host memory - 5%) / sqreamd workers on host.
  – spoolMemoryGB defines how much memory is set aside for spooling, out of the total system memory allocated in limitQueryMemoryGB. The recommended setting is 90% of the limitQueryMemoryGB. This setting must be set lower than the limitQueryMemoryGB setting.
For example, for a machine with 512GB of RAM and 4 workers, the recommended settings are:
  – limitQueryMemoryGB - (512 * 0.95 / 4) → ~ 486 / 4 → 121.
  – spoolMemoryGB - ( 0.9 * limitQueryMemoryGB ) → ( 0.9 * 121 ) → 108
Example settings per-worker, for 512GB of RAM and 4 workers:
"runtimeGlobalFlags": {
    "limitQueryMemoryGB" : 121,
    "spoolMemoryGB" : 108
}

4.5.5 Known Issues & Limitations
• An invalid formatted CSV can cause an insufficient memory error on a COPY FROM statement if a quote isn’t closed and the file is much larger than system memory.
• TEXT columns cannot be used in a window function’s partition
• Parsing errors are sometimes hard to read - the location points to the wrong part of the statement
• LZ4 compression may not be applied correctly on very large VARCHAR columns, which decreases performance
• Using SUM on very large numbers in window functions can error (overflow) when not used with an ORDER BY clause
• Slight performance decrease with DATEADD in this version (<4%)
• Operations on Snappy-compressed ORC files are slower than their Parquet equivalents.

4.5.6 Upgrading to v2020.1
Versions are available for IBM POWER9, RedHat (CentOS) 7, Ubuntu 18.04, and other OSs via Docker.
Contact your account manager to get the latest release of SQream DB.

CHAPTER 5
Glossary

Cluster
    A SQream DB deployment containing several workers running on one or more nodes.
Node
    A machine running SQream DB workers.
Worker
    A SQream DB application that can respond to statements. Several workers can run on one or more nodes to form a cluster.
Metadata
    SQream DB’s internal storage which contains details about database objects.
Authentication
    Authentication is the process that identifies a user or role to verify an identity - to make sure the user is who they say they are. This is done with a username and password.
Authorization
    Authorization defines the set of actions that an authenticated role can perform after gaining access to the system, protecting against threats that authentication controls alone cannot address.
Role
    A role is a group or a user, depending on the permissions granted. A role groups together a set of permissions.
Catalog
    A set of views that contains information (metadata) about the objects in a database (e.g. tables, columns, chunks, etc.).
Storage cluster
    The storage cluster is the directory in which SQream DB stores data, including database objects, metadata database, and logs.
UDF
User-defined function
    A feature that extends SQream DB’s built-in SQL functionality with user-written Python code.