Install Guide for Data Preparation for Amazon Redshift and S3

Version: 7.6.1 Doc Build Date: 01/29/2021 Copyright © Trifacta Inc. 2021 - All Rights Reserved. CONFIDENTIAL

These materials (the “Documentation”) are the confidential and proprietary information of Trifacta Inc. and may not be reproduced, modified, or distributed without the prior written permission of Trifacta Inc.

EXCEPT AS OTHERWISE PROVIDED IN AN EXPRESS WRITTEN AGREEMENT, TRIFACTA INC. PROVIDES THIS DOCUMENTATION AS-IS AND WITHOUT WARRANTY AND TRIFACTA INC. DISCLAIMS ALL EXPRESS AND IMPLIED WARRANTIES TO THE EXTENT PERMITTED, INCLUDING WITHOUT LIMITATION THE IMPLIED WARRANTIES OF MERCHANTABILITY, NON-INFRINGEMENT AND FITNESS FOR A PARTICULAR PURPOSE AND UNDER NO CIRCUMSTANCES WILL TRIFACTA INC. BE LIABLE FOR ANY AMOUNT GREATER THAN ONE HUNDRED DOLLARS ($100) BASED ON ANY USE OF THE DOCUMENTATION.

For third-party license information, please select About Trifacta from the Help menu.

Contents:

1. Quick Start
  1.1 Install from AWS Marketplace
  1.2 Install from AWS Marketplace with EMR
  1.3 Upgrade for AWS Marketplace
2. Configure
  2.1 Configure for AWS
    2.1.1 Enable S3 Access
      2.1.1.1 Create Redshift Connections
    2.1.2 Configure for EC2 Role-Based Authentication
3. Legal
  3.1 Third-Party License Information
  3.2 Contact Support

Quick Start

Install from AWS Marketplace

Contents:

Product Limitations
Internet access
Install Desktop Requirements
Pre-requisites
Install Steps - CloudFormation template
SSH Access
Troubleshooting
  SELinux
Upgrade
Documentation
Related Topics

This guide steps through the requirements and process for installing Trifacta® Data Preparation for Amazon Redshift and S3 through the AWS Marketplace.

Product Limitations

Connectivity to sources other than S3 and Redshift is not supported.
Jobs must be executed in the Trifacta Photon running environment. No other running environment integrations are supported.
Anomaly and stratified sampling are not supported in this deployment.
When publishing single files to S3, you cannot apply an append publishing action.
Trifacta Data Preparation for Amazon Redshift and S3 must be deployed into an existing Virtual Private Cloud (VPC).
The EC2 instance, S3 buckets, and any connected Redshift databases must be located in the same Amazon region. Cross-region integrations are not supported at this time.

NOTE: HDFS integration is not supported for Amazon Marketplace installations.

The S3 bucket automatically created by the Marketplace CloudFormation template is not automatically deleted when you delete the stack in CloudFormation. You must empty the bucket and delete it, which can be done through the AWS Console.
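If you prefer to script the cleanup, the same steps can be performed with the AWS CLI. This is a sketch only; the bucket name below is a placeholder for the bucket that your stack created:

# Empty the leftover bucket, then delete the bucket itself.
aws s3 rm s3://my-trifacta-stack-bucket --recursive
aws s3 rb s3://my-trifacta-stack-bucket

If versioning was enabled on the bucket, any remaining object versions must also be removed before the bucket can be deleted.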

Internet access

From AWS, the Trifacta platform requires Internet access for the following services:

NOTE: Depending on your AWS deployment, some of these services may not be required.

AWS S3
Key Management System [KMS] (if SSE-KMS server-side encryption is enabled)
Secure Token Service [STS] (if the temporary credential provider is used)
EMR (if integration with an EMR cluster is enabled)

NOTE: If the Trifacta platform is hosted in a VPC where Internet access is restricted, access to S3, KMS and STS services must be provided by creating a VPC endpoint. If the platform is accessing an EMR cluster, a proxy server can be configured to provide access to the AWS ElasticMapReduce regional endpoint.
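As an illustration, a gateway endpoint for S3 can be created with the AWS CLI. All identifiers below are placeholders for your own VPC, region, route table, and subnet:

# Gateway endpoint for S3; traffic to S3 then stays within the VPC.
aws ec2 create-vpc-endpoint --vpc-id vpc-0abc123 \
  --service-name com.amazonaws.us-east-1.s3 \
  --route-table-ids rtb-0def456

# KMS and STS use interface endpoints instead of gateway endpoints.
aws ec2 create-vpc-endpoint --vpc-id vpc-0abc123 \
  --vpc-endpoint-type Interface \
  --service-name com.amazonaws.us-east-1.kms \
  --subnet-ids subnet-0aaa111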

Install Desktop Requirements

All desktop users of the platform must have the latest stable version of Chrome installed on their desktops. All desktop users must be able to connect to the EC2 instance over the port on which Trifacta Data Preparation for Amazon Redshift and S3 is listening. The default port is 3005.
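A quick way to confirm that a desktop can reach the instance on that port is shown below. The hostname is a placeholder for your instance's DNS name or IP address:

# Test raw TCP connectivity to the application port.
nc -vz my-trifacta-host.example.com 3005

# Confirm that the web application responds over HTTP.
curl -I http://my-trifacta-host.example.com:3005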

NOTE: Trifacta Data Preparation for Amazon Redshift and S3 enforces a maximum limit of 30 users.

Pre-requisites

Before you install the platform, please verify that the following steps have been completed.

1. EULA. Before you begin, please review the End-User License Agreement. See End-User License Agreement.
2. SSH key pair. Please verify that there is an SSH key pair available to assign to the Trifacta node.

Install Steps - CloudFormation template

This install process creates the following:

Trifacta node on an EC2 instance
An S3 bucket to store data
IAM roles and policies to access the S3 bucket from the Trifacta node

Steps:

1. In the Marketplace listing, click Deploy into an existing VPC.
2. Select Template: The template path is automatically populated for you.
3. Specify Details:
   a. Stack Name: Display name of the application.

NOTE: Each instance of the Trifacta platform should have a separate name.

   b. Instance Type: Only one instance type is enabled.
   c. Key Pair: Select the SSH key pair to use for access to the Trifacta instance.
   d. Allowed HTTP Source: Specify the IP address or range of addresses from which HTTP/HTTPS connections to the application are permitted.
   e. Allowed SSH Source: Specify the IP address or range of addresses from which SSH connections to the EC2 instance are permitted.
4. Options: None of these is required for installation. Specify your options as needed for your environment.
5. Review: Review your installation and configured options.
   a. Select the checkbox at the end of the page.
   b. To launch the configured instance, click Create.
6. In the Stacks list, select the name of your application. Click the Outputs tab and collect the following information. Instructions on how to use this information are provided later.
   a. No outputs appear until the stack has been created successfully.


Parameter: TrifactaUrl
Description: URL and port number at which to connect to the Trifacta application.
Use: Users must connect to this address and port number to access the application.

Parameter: TrifactaBucket
Description: The address of the default S3 bucket.
Use: This value must be applied through the application.

Parameter: TrifactaInstanceId
Description: The identifier for the instance of the platform.
Use: This value is the default password for the admin account.

NOTE: This password must be changed immediately.

7. When the instance is spinning up for the first time, performance may be slow. When the instance is up, please navigate to the TrifactaUrl location:

http://<dns-or-ip-address>:3005

8. When the login screen appears, enter the following:
   a. Username: [email protected]
   b. Password: (the TrifactaInstanceId value)

NOTE: As soon as you login as an admin for the first time, you should change the password.

9. Enable read access to S3:
   a. From the menu bar, select User menu > Admin console > Workspace Settings.
   b. Set the following property to Enabled:

Enable S3 Connectivity

10. From the menu bar, select User menu > Admin console > Admin settings.
11. In the Admin Settings page, you can configure many aspects of the platform, including user management tasks, and perform restarts to apply the changes.
    a. In the Search bar, enter the following:

aws.s3.bucket.name

    b. Set the value of this setting to be the TrifactaBucket value that you collected from the Outputs tab.
12. Enable the platform to generate job output manifest files. Locate the following parameter and set it to true:

"feature.enableJobOutputManifest": true,

13. Click Save.
14. When the platform restarts, you can begin using the product.

SSH Access

If you need to SSH to the Trifacta node, you can use the following command:

ssh -i <path-to-key-file> <userId>@<host>

Parameter: <path-to-key-file>
Description: Path to the key file stored on your local computer.

Parameter: <userId>
Description: The user ID is always centos.

Parameter: <host>
Description: DNS or IP address of the Trifacta node.
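For example, assuming the key file was saved to ~/.ssh/trifacta.pem and using a placeholder public DNS name for the node:

# Connect to the Trifacta node as the centos user.
ssh -i ~/.ssh/trifacta.pem centos@ec2-203-0-113-10.compute-1.amazonaws.com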

Troubleshooting

SELinux

By default, Trifacta Data Preparation for Amazon Redshift and S3 is installed on a server with SELinux enabled. Security-enhanced Linux (SELinux) provides a set of security features for, among other things, managing access controls.

Tip: The following may be applied to other deployments of the Trifacta platform on servers where SELinux has been enabled.

In some cases, SELinux can interfere with normal operations of platform software. If you are experiencing connectivity problems related to SELinux, you can do either one of the following:

1. Disable SELinux on the server. For more information, please see the CentOS documentation.
2. Apply the following commands on the server, as root:
   a. Open ports on the server for listening.
      i. By default, the Trifacta application listens on port 3005. The following command opens that port when SELinux is enabled:

semanage port -a -t http_port_t -p tcp 3005

      ii. Repeat the above step for any other ports that you wish to open on the server.
   b. Permit nginx, the proxy on the Trifacta node, to open websockets:

setsebool -P httpd_can_network_connect 1
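To verify that the changes took effect, you can list the ports assigned to the http_port_t type and check the boolean state:

# Port 3005 should now appear in the http_port_t list.
semanage port -l | grep http_port_t

# The boolean should report "on".
getsebool httpd_can_network_connect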

Upgrade

For more information, see Upgrade for AWS Marketplace.

Documentation

You can access complete product documentation online and in PDF format. From within the product, select Help menu > Documentation.

Install from AWS Marketplace with EMR

Contents:

Scenario Description
Install
  Pre-requisites
  Internet access
  Product Limitations
  Desktop Requirements
Note about deleting the CloudFormation stack
Verify
  Start and Stop the Platform
  Verify Operations
Troubleshooting
  SELinux
  CloudFormation stack fails to deploy with EMR error
Upgrade
Documentation
Related Topics

This documentation applies to installation from a supported Marketplace. Please use the installation instructions provided with your deployment.

If you are installing or upgrading a Marketplace deployment, please use the available PDF content. You must use the install and configuration PDF available through the Marketplace listing.

This step-by-step guide walks through the process of deploying an instance of Trifacta Wrangler Enterprise through the AWS Marketplace, using a CloudFormation template. CloudFormation templates are designed to configure and launch the product as well as the related AWS compute, network, storage, and related services to support the platform. The size of these resources can be configured based on the expected workload of the platform.

Scenario Description

CloudFormation templates enable you to install Trifacta® Wrangler Enterprise with a minimal amount of effort.

After install, customizations can be applied by tweaking the resources that were created by the CloudFormation process. If you have additional requirements or a complex environment, please contact Trifacta Support for assistance with your solution.

Install

The CloudFormation template creates a complete working instance of Trifacta Wrangler Enterprise, including the following:

VPC and all required networking infrastructure (if deploying to a new VPC)
EC2 instance with all supporting policies/roles
S3 bucket
EMR cluster
Configurable autoscaling instance groups
All supporting policies/roles

Pre-requisites

If you are integrating the Trifacta platform with an EMR cluster, you must acquire a Trifacta license first. Additional configuration is required. For more information, please contact [email protected].

Before you begin:

1. Read: Please read this entire document before you begin.
2. EULA: Before you begin, please review the End-User License Agreement. See https://docs.trifacta.com/display/PUB/End-User+License+Agreement+-+Trifacta+Wrangler+Enterprise.
3. Trifacta license file: If you have not done so already, please acquire a Trifacta license file from your Trifacta representative.

Internet access

From AWS, the Trifacta platform requires Internet access for the following services:

NOTE: Depending on your AWS deployment, some of these services may not be required.

AWS S3
Key Management System [KMS] (if SSE-KMS server-side encryption is enabled)
Secure Token Service [STS] (if the temporary credential provider is used)
EMR (if integration with an EMR cluster is enabled)

NOTE: If the Trifacta platform is hosted in a VPC where Internet access is restricted, access to S3, KMS and STS services must be provided by creating a VPC endpoint. If the platform is accessing an EMR cluster, a proxy server can be configured to provide access to the AWS ElasticMapReduce regional endpoint.

Product Limitations

The EC2 instance, S3 buckets, and any connected Redshift databases must be located in the same Amazon region. Cross-region integrations are not supported at this time.
No support for Hive integration.
No support for secure impersonation or Kerberos.
No support for high availability and failover.
Job cancellation is not supported on EMR.
When publishing single files to S3, you cannot apply an append publishing action.
Spark 2.4 only.

Install

Desktop Requirements

All desktop users of the platform must have the latest stable version of Chrome installed on their desktops. All desktop users must be able to connect to the EC2 instance through the enterprise infrastructure.

Steps:

1. In the Marketplace listing, select your deployment option:
   a. Deploy into a new VPC: Creates a new VPC and all the required networking infrastructure to run the Trifacta instance and the EMR cluster.
   b. Deploy into an existing VPC: Creates the Trifacta instance and EMR cluster in a VPC of your choice.
2. Choose a Template: The template path is automatically populated for you.
3. These parameters are common to both deployment options:
   a. Stack Name: The display name of the stack is used in the names of resources created by the stack and as an identifier for the stack.

NOTE: Each instance of the Trifacta platform must have a separate name.

   b. Trifacta Server: Please select the appropriate instance type depending on the number of users and data volumes of your environment.
   c. Key Pair: This SSH key pair is used to access the Trifacta instance and the EMR cluster instances.
   d. Allowed HTTP Source: This range of addresses is permitted access to the Trifacta instance on ports 80, 443, and 3005.
      i. Ports 80 and 443 do not have any services behind them by default, but you may modify the Trifacta configuration to enable access via these ports.
   e. Allowed SSH Source: This range of addresses is permitted access to port 22 on the Trifacta instance.
   f. EMR Cluster Node Configuration: Allows you to customize the configuration of the deployed EMR nodes.
      i. Reasonable values are used as defaults.
      ii. If you do customize these values, upsize them. Avoid downsizing these values.
   g. EMR Cluster Autoscaling Configuration: Allows you to customize the autoscaling settings used by the EMR cluster.
      i. Reasonable values are used as defaults.
4. If you are launching into a new VPC, skip this step. If you are launching into an existing VPC, you must enter these additional parameters:
   a. VPC: The existing VPC into which you want to launch the resources.
   b. Trifacta Subnet: The subnet into which you want to launch your Trifacta instance.
      i. Please verify that this subnet is part of the VPC you selected.
      ii. This subnet must be able to communicate with the EMR Subnet (see below).
   c. EMR Subnet: The private subnet into which you want to launch your EMR cluster.
      i. Please verify that this subnet is part of the VPC you selected.
      ii. Amazon EMR only supports launching clusters in private subnets.
      iii. This subnet must be reachable from your Trifacta instance.
      iv. The EMR cluster requires an S3 endpoint available in this subnet.
5. Click Next.
6. Options: Specify any options as needed for your environment. None of these is required for installation.
7. Review: Review your installation and configured options.
   a. Select the checkbox at the end of the page.
   b. To launch the stack, click Create Stack.
8. Please wait while the stack creates all required resources.
9. In the Stacks list, select the name of your application. Click the Outputs tab and collect the following information. Instructions on how to use this information are provided later.

Parameter: Trifacta URL
Description: URL and port number at which to connect to the Trifacta application.
Use: Users must connect to this address and port number to access the application. By default, the port is set to 3005. The access port can be moved to 80 or 443 if desired. Please contact us for more details.

Parameter: Trifacta Bucket
Description: The address of the default S3 bucket.
Use: This value must be applied through the application after it has been deployed.

Parameter: Trifacta Instance Id
Description: The identifier for the instance of the platform.
Use: This value is the default password for the admin account.

NOTE: You must change this password on the first login to the application.

10. After the Trifacta instance has been created, it is licensed for use for a couple of days only. This temporary license permits you to finish configuration from within the Trifacta application, where you can also upload your permanent license.
11. Please wait while the instance finishes initialization and becomes available for use. When it is ready, navigate to the Trifacta application.
12. When the login screen appears, enter the following:
    a. Username: [email protected]
    b. Password: (the TrifactaInstanceId value)

NOTE: If the password does not work, please verify that you have not copied extra spaces in the password.

13. Enable read access to S3:
    a. From the menu bar, select User menu > Admin console > Workspace Settings.
    b. Set the following property to Enabled:

Enable S3 Connectivity

14. From the menu bar, select User menu > Admin console > Admin settings.
15. In the Admin Settings page, you can configure many aspects of the platform, including user management tasks, and perform restarts to apply the changes.
16. Add the S3 bucket that was automatically created to store Trifacta metadata and EMR content. Search for:

"aws.s3.bucket.name"

    a. Update the value with the Trifacta Bucket value provided when you created the stack in AWS.
17. Enable the Run in EMR option within the platform. Search for:

"webapp.runinEMR"

    a. Select the checkbox to enable it.
18. Enable the platform to generate job output manifest files. Locate the following parameter and set it to true:

"feature.enableJobOutputManifest": true,

19. Click Save underneath the Platform Settings section.
20. The platform restarts, which can take several minutes.
21. In the Admin Settings page, locate the External Service Settings section.
    a. AWS EMR Cluster ID: Paste the value of the EMR cluster ID for the cluster to which the platform is connecting.
       i. Verify that there are no extra spaces in any copied value.
    b. AWS Region: Enter the region code where you have deployed the CloudFormation stack.
       i. Example: us-east-1
       ii. For a list of available regions, see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html#concepts-available-regions.
    c. Resource Bucket: You may use the already-created Trifacta Bucket.
       i. Verify that there are no extra spaces in any copied value.
    d. Resource Path: You should use something like EMRLOGS.
22. Click Save underneath the External Service Settings section.
23. In the Admin Settings page, scroll down to the Upload License section.
24. Click Upload License and upload your license.
    a. When the server comes online for the first time, it is assigned a temporary license. Please be sure to upload your license within 24 hours.
25. Updating the license does not require a restart.

Note about deleting the CloudFormation stack

If you must delete the CloudFormation stack, please be aware of the following.

1. The S3 bucket that was created for the stack is not removed. If you want to delete it, you must empty it first and then delete it.
2. Any EMR security groups created for the stack cannot be deleted automatically, due to circular references. The stack deletion process informs you of the security groups that it failed to delete. To complete the deletion (a command-line sketch follows this list):
   a. Remove all rules from the security groups.
   b. Delete the security groups manually.
   c. Re-run the stack deletion, which should complete successfully.
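The cleanup can also be scripted with the AWS CLI. This is a sketch only; the security group IDs, protocol, and port below are placeholders for the rules actually attached to the groups reported by the failed deletion:

# Inspect the rules attached to an EMR-managed security group.
aws ec2 describe-security-groups --group-ids sg-0123456789abcdef0

# Revoke each ingress rule (repeat per rule), then delete the group.
aws ec2 revoke-security-group-ingress --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 8443 --source-group sg-0fedcba9876543210
aws ec2 delete-security-group --group-id sg-0123456789abcdef0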

Verify

Start and Stop the Platform

Use the following command line commands to start, stop, and restart the platform.

Start:

sudo service trifacta start

Stop:

sudo service trifacta stop

Restart:

sudo service trifacta restart
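After a restart, you can confirm that the platform is back up before returning to the browser. This sketch assumes the default port of 3005 and that the service script supports a status sub-command:

# Check service status and confirm that the web application responds locally.
sudo service trifacta status
curl -I http://localhost:3005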

Verify Operations

After you have installed or made changes to the platform, you should verify end-to-end operations.

NOTE: The Trifacta® platform is not operational until it is connected to a supported backend datastore.

Steps:

1. Login to the application as an administrator. See Login.

2. Through the Admin Settings page, run Tricheck, which performs tests on the Trifacta node and any connected cluster. See Admin Settings Page.
3. In the application menu bar, click Library. Click Import Dataset. Select your backend datastore.

4. Navigate your datastore directory structure to locate a small CSV or JSON file.

5. Select the file. In the right panel, click Create and Transform.
   a. Troubleshooting: If the steps so far work, then you have read access to the datastore from the platform. If not, please check permissions for the Trifacta user and its access to the appropriate directories.
   b. See Import Data Page.
6. In the Transformer page, some steps have already been added to your recipe, so you can run the job right away. Click Run Job.
   a. See Transformer Page.

7. In the Run Job page:
   a. For Running Environment, some of these options may not be available. Choose according to the running environment you wish to test.
      i. Photon: Runs the job on the Photon running environment hosted on the Trifacta node. This method of job execution does not utilize any integrated cluster.
      ii. Spark: Runs the job on Spark on the integrated cluster.
      iii. Databricks: If the platform is integrated with an Azure Databricks cluster, you can test job execution on the cluster.
   b. Select CSV and JSON output.
   c. Select the Profile Results checkbox.
   d. Troubleshooting: At this point, you are able to initiate a job for execution on the selected running environment. Later, you can verify operations by running the same job on other available environments.
   e. See Run Job Page.

8. When the job completes, you should see a success message in the Jobs tab of the Flow View page.
   a. Troubleshooting: Either the Transform job or the Profiling job may break. To localize the problem, mouse over the job listing in the Jobs page. Try re-running the job by deselecting the broken job type or running the job in a different environment. You can also download the log files to try to identify the problem. See Jobs Page.

9. Click View Results in the Jobs page. In the Profile tab of the Job Details page, you can see a visual profile of the generated results.
   a. See Job Details Page.
10. In the Output Destinations tab, click the CSV and JSON links to download the results to your local desktop. See Import Data Page.
11. Load these results into a local application to verify that the content looks correct.

Troubleshooting

SELinux

By default, Trifacta Wrangler Enterprise is installed on a server with SELinux enabled. Security-enhanced Linux (SELinux) provides a set of security features for, among other things, managing access controls.

Tip: The following may be applied to other deployments of the Trifacta platform on servers where SELinux has been enabled.

In some cases, SELinux can interfere with normal operations of platform software. If you are experiencing connectivity problems related to SELinux, you can do either one of the following:

1. Disable SELinux on the server. For more information, please see the CentOS documentation.
2. Apply the following commands on the server, as root:
   a. Open ports on the server for listening.
      i. By default, the Trifacta application listens on port 3005. The following command opens that port when SELinux is enabled:

semanage port -a -t http_port_t -p tcp 3005

      ii. Repeat the above step for any other ports that you wish to open on the server.
   b. Permit nginx, the proxy on the Trifacta node, to open websockets:

setsebool -P httpd_can_network_connect 1

CloudFormation stack fails to deploy with EMR error

After the stack fails to deploy, you may see an error like the following:

ElasticMapReduce Cluster with Id j-2CFQJ4K8HABCD, is in state TERMINATED_WITH_ERRORS and failed to stabilize due to the following reason: {Code: VALIDATION_ERROR,Message: You cannot specify a ServiceAccessSecurityGroup for a cluster launched in public subnet.}

To solve this issue, select a private subnet into which to launch the EMR cluster.

For more information, see https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-vpc-subnet.html
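Before re-deploying, you can check a candidate subnet with the AWS CLI. The subnet ID is a placeholder; note that this is only a quick heuristic, since strictly speaking a subnet is private when its route table has no route to an internet gateway:

# If MapPublicIpOnLaunch is true, the subnet is public and unsuitable for the EMR cluster.
aws ec2 describe-subnets --subnet-ids subnet-0abc123 \
  --query 'Subnets[0].MapPublicIpOnLaunch'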

Upgrade

For more information, see Upgrade for AWS Marketplace with EMR.

Documentation

You can access complete product documentation online and in PDF format. After the platform has been installed, log in to the Trifacta application and select Help menu > Documentation.

Upgrade for AWS Marketplace

For more information on upgrading Trifacta® Data Preparation for Amazon Redshift and S3, please contact Trifacta Support.

Configure

The following topics describe how to configure Trifacta® Wrangler Enterprise for initial deployment and continued use.

For more information on administration tasks and admin resources, see Admin.

Configure for AWS

Contents:

Internet Access
Database Installation
Base AWS Configuration
  Base Storage Layer
  Configure AWS Region
AWS Authentication
  AWS Auth Mode
  AWS Credential Provider
AWS Storage
  S3 Sources
  Redshift Connections
  Snowflake Connections
AWS Clusters
  EMR
  Hadoop

This documentation applies to installation from a supported Marketplace. Please use the installation instructions provided with your deployment.

If you are installing or upgrading a Marketplace deployment, please use the available PDF content. You must use the install and configuration PDF available through the Marketplace listing.

The Trifacta® platform can be hosted within Amazon and supports integrations with multiple services from Amazon Web Services, including combinations of services for hybrid deployments. This section provides an overview of the integration options, as well as links to related configuration topics.

For an overview of AWS deployment scenarios, see Supported Deployment Scenarios for AWS.

Internet Access

From AWS, the Trifacta platform requires Internet access for the following services:

NOTE: Depending on your AWS deployment, some of these services may not be required.

AWS S3
Key Management System [KMS] (if SSE-KMS server-side encryption is enabled)
Secure Token Service [STS] (if the temporary credential provider is used)
EMR (if integration with an EMR cluster is enabled)

NOTE: If the Trifacta platform is hosted in a VPC where Internet access is restricted, access to S3, KMS and STS services must be provided by creating a VPC endpoint. If the platform is accessing an EMR cluster, a proxy server can be configured to provide access to the AWS ElasticMapReduce regional endpoint.

Database Installation

The following database scenarios are supported.

Database Host: Cluster node
Description: By default, the Trifacta databases are installed on PostgreSQL instances on the Trifacta node or on another accessible node in the enterprise environment. For more information, see Install Databases.

Database Host: Amazon RDS
Description: For Amazon-based installations, you can install the Trifacta databases on PostgreSQL instances on Amazon RDS.

Base AWS Configuration

The following configuration topics apply to AWS in general.

You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.

Base Storage Layer

NOTE: The base storage layer must be set during initial configuration and cannot be modified after it is set.

S3: Most of these integrations require use of S3 as the base storage layer, which means that data uploads, default location of writing results, and sample generation all occur on S3. When base storage layer is set to S3, the Trifacta platform can:

Read from and write to S3
Read from and write to Redshift
Connect to an EMR cluster

HDFS: In on-premises installations, it is possible to use S3 as a read-only option for a Hadoop-based cluster when the base storage layer is HDFS. You can configure the platform to read from and write to S3 buckets during job execution and sampling. For more information, see Enable S3 Access.

For more information on setting the base storage layer, see Set Base Storage Layer.

For more information, see Storage Deployment Options.

Configure AWS Region

For Amazon integrations, you can configure the Trifacta node to connect to Amazon datastores located in different regions.

NOTE: This configuration is required under any of the following deployment conditions:

1. The Trifacta node is installed on-premises, and you are integrating with Amazon resources.
2. The EC2 instance hosting the Trifacta node is located in a different AWS region than your Amazon datastores.
3. The Trifacta node or the EC2 instance does not have access to s3.amazonaws.com.

1. In the AWS console, please identify the location of your datastores in other regions. For more information, see the Amazon documentation.
2. Login to the Trifacta application.
3. You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.
4. Set the value of the following property to the region where your S3 datastores are located:

aws.s3.region

If the above value is not set, then the Trifacta platform attempts to infer the region from the default S3 bucket location.

5. Save your changes.

Configure region for AWS GovCloud

If your instance of the Trifacta platform is deployed in the AWS GovCloud, additional configuration is required.

NOTE: GovCloud authentication is completely isolated from Amazon.com. Key-secret combinations for AWS do not apply to AWS GovCloud.

For more information, see https://docs.aws.amazon.com/govcloud-us/latest/UserGuide/govcloud-differences.html.

NOTE: In AWS GovCloud, the AWS S3 endpoint must be configured in the private subnet or VPC to be able to make outbound requests to S3 resources.

You must configure the region and endpoint settings to communicate with GovCloud S3 resources.

Steps:

1. You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.
2. In the Admin Settings page, specify the following parameters as follows:

Parameter: aws.region
Description: Specify the AWS GovCloud region where your S3 resources are located.

Parameter: aws.s3.endpoint
Description: Specify the private subnet or VPC endpoint through which the Trifacta platform can reach S3 resources.

3. Save your changes.
4. Login to the Trifacta node as an administrator.
5. Edit the following file:

/opt/trifacta/conf/env.sh

6. Add the following line:

AWS_REGION=<aws_region>

where: <aws_region> is the value you inserted for the AWS GovCloud region in the Admin Settings page.

7. Save the file.
8. If the following section does not apply to you, restart the platform now.

Accessing AWS secret region using an emulation platform:

1. If you are connecting to the AWS secret region using an emulation platform with custom TLS certificates, you must install and reference a valid TLS/SSL certificate for the emulation platform.
   a. To the env.sh file, add the following line:

NODE_EXTRA_CA_CERTS=/home/trifacta/ca-chain.cert.pem

NOTE: The above certificate must be a valid TLS/SSL certificate for the emulation platform.

   b. For AWS requests from Java services, you must add the custom CA certificate to the CA certificates for the system. For more information, please see the documentation for your emulation platform.
2. Save your changes.
3. Restart the platform.
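On many Linux systems, the custom CA certificate can be imported into the Java truststore with keytool. This is a sketch only; the truststore path and password depend on your JDK installation, and the documentation for your emulation platform takes precedence:

# Import the CA chain into the default Java truststore (default password: changeit).
sudo keytool -importcert -trustcacerts -alias emulation-ca \
  -file /home/trifacta/ca-chain.cert.pem \
  -keystore $JAVA_HOME/jre/lib/security/cacerts -storepass changeit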

AWS Authentication

The following table illustrates the methods of managing authentication between the platform and AWS. The matrix of options is determined by the settings for two key parameters:

credential provider: the source of credentials, which can be platform (default), instance (EC2 instance only), or temporary.
AWS mode: the method of authentication from the platform to AWS, either system-wide or by user.

Credential provider: Default
  System mode: One system-wide key/secret combination is inserted in the platform for use.
  Config:
    "aws.credentialProvider": "default",
    "aws.mode": "system",
    "aws.s3.key": <key>,
    "aws.s3.secret": <secret>,
  User mode: Each user provides a key/secret combination. See Configure Your Access to S3.
  Config:
    "aws.credentialProvider": "default",
    "aws.mode": "user",

Credential provider: Instance
  System mode: The platform uses EC2 instance roles.
  Config:
    "aws.credentialProvider": "instance",
    "aws.mode": "system",
  User mode: Users provide EC2 instance roles.
  Config:
    "aws.credentialProvider": "instance",
    "aws.mode": "user",

Credential provider: Temporary
  System mode: Temporary credentials are issued based on per-user IAM roles.
  Config:
    "aws.credentialProvider": "temporary",
    "aws.mode": "system",
    "aws.systemIAMRole": <role>,
  User mode: Per-user authentication when using an IAM role.
  Config:
    "aws.credentialProvider": "temporary",
    "aws.mode": "user",

AWS Auth Mode

When connecting to AWS, the platform supports the following basic authentication modes:

Mode: system
Configuration: "aws.mode": "system",
Description: Access to AWS resources is managed through a single, system account. The account that you specify is based on the credential provider selected below. The instance credential provider ignores this setting. See below.

Mode: user
Configuration: "aws.mode": "user",
Description: Authentication must be specified for individual users.

NOTE: Creation and use of custom dictionaries is not supported in user mode.

Tip: In AWS user mode, Trifacta administrators can manage S3 access for users through the Admin Settings page. See Manage Users.

AWS Credential Provider

The Trifacta platform supports the following methods of providing credentialed access to AWS and S3 resources.

Type: default
Configuration: "aws.credentialProvider": "default",
Description: This method uses the provided AWS key and secret values to access resources. See below.

Type: instance
Configuration: "aws.credentialProvider": "instance",
Description: When you are running the Trifacta platform on an EC2 instance, you can leverage your enterprise IAM roles to manage permissions on the instance for the Trifacta platform. See below.

Type: temporary
Configuration: "aws.credentialProvider": "temporary",
Description: Details are below.

Default credential provider

Whether the AWS access mode is set to system or user, the default credential provider for AWS and S3 resources is the Trifacta platform.

Mode: "aws.mode": "system",
Description: A single AWS key and secret is inserted into platform configuration. This account is used to access all resources and must have the appropriate permissions to do so.
Configuration:
  "aws.s3.key": "<key>",
  "aws.s3.secret": "<secret>",

Mode: "aws.mode": "user",
Description: Each user must specify an AWS key and secret in the account to access resources.
Configuration: For more information on configuring individual user accounts, see Configure Your Access to S3.

Default credential provider with EMR:

If you are using this method and integrating with an EMR cluster:

Copying the custom credential JAR file must be added as a bootstrap action to the EMR cluster definition. See Configure for EMR.
As an alternative to copying the JAR file, you can use the EMR EC2 instance-based roles to govern access. In this case, you must set the following parameter:

"aws.emr.forceInstanceRole": true,

For more information, see Configure for EC2 Role-Based Authentication.

Instance credential provider

When the platform is running on an EC2 instance, you can manage permissions through pre-defined IAM roles.

NOTE: If the Trifacta platform is connected to an EMR cluster, you can force authentication to the EMR cluster to use the specified IAM instance role. See Configure for EMR.

For more information, see Configure for EC2 Role-Based Authentication.

Temporary credential provider

For better security, you can enable the use of temporary credentials, provided from your AWS resources based on an IAM role specified per user.

Tip: This method is recommended by AWS.

Set the following properties.

Property: "aws.credentialProvider"
Description: If aws.mode = system, set this value to temporary. If aws.mode = user and you are using per-user authentication, then this setting is ignored and should stay as default.

Per-user authentication

Individual users can be configured to provide temporary credentials for access to AWS resources, which is a more secure authentication solution. For more information, see Configure AWS Per-User Auth for Temporary Credentials.

AWS Storage

S3 Sources

To integrate with S3, additional configuration is required. See Enable S3 Access.

Redshift Connections

You can create connections to one or more Redshift databases, from which you can read database sources and to which you can write job results. Samples are still generated on S3.

NOTE: Relational connections require installation of an encryption key file on the Trifacta node. For more information, see Create Encryption Key File.

For more information, see Create Redshift Connections.

Snowflake Connections

Through your AWS deployment, you can access your Snowflake databases. For more information, see Create Snowflake Connections.

AWS Clusters

Trifacta Wrangler Enterprise can integrate with one instance of either of the following.

NOTE: If Trifacta Wrangler Enterprise is installed through the Amazon Marketplace, only the EMR integration is supported.

EMR

When Trifacta Wrangler Enterprise is installed through AWS, you can integrate with an EMR cluster for Spark-based job execution. For more information, see Configure for EMR.

Hadoop

If you have installed Trifacta Wrangler Enterprise on-premises or directly into an EC2 instance, you can integrate with a Hadoop cluster for Spark-based job execution. See Configure for Hadoop.

Enable S3 Access

Contents:

Base Storage Layer
Limitations
Pre-requisites
Required AWS Account Permissions
Configuration
  Define base storage layer
  Enable job output manifest
  Enable read access to S3
  Configure file storage protocols and locations
  Java VFS service
S3 access modes
  System mode - additional configuration
S3 Configuration
  Configuration reference
  Enable use of server-side encryption
  Configure S3 filewriter
Create Redshift Connection
Hadoop distribution-specific configuration
  Hortonworks
Additional Configuration for S3
Testing
Troubleshooting
  Profiling consistently fails for S3 sources of data
  Spark local directory has no space

Below are instructions on how to configure Trifacta® Wrangler Enterprise to point to S3.

Simple Storage Service (S3) is an online data storage service provided by Amazon, which provides low- latency access through web services. For more information, see https://aws.amazon.com/s3/.

NOTE: Please review the limitations before enabling. See the Limitations section below.

Base Storage Layer

If the base storage layer is S3: you can enable read/write access to S3.
If the base storage layer is not S3: you can enable read-only access to S3.

Limitations

The Trifacta platform only supports running S3-enabled instances over AWS.
Access to AWS S3 regional endpoints through Internet protocol is required. If the machine hosting the Trifacta platform is in a VPC with no Internet access, a VPC endpoint enabled for S3 services is required.
The Trifacta platform does not support access to S3 through a proxy server.
Write access requires using S3 as the base storage layer. See Set Base Storage Layer.

NOTE: Spark 2.3.0 jobs may fail on S3-based datasets due to a known incompatibility. For details, see https://github.com/apache/incubator-druid/issues/4456.

If you encounter this issue, please set spark.version to 2.1.0 in platform configuration. For more information, see Admin Settings Page.

Pre-requisites

On the Trifacta node, you must install the Oracle Java Runtime Environment for Java 1.8. Other versions of the JRE are not supported. For more information on the JRE, see http://www.oracle.com/technetwork/java/javase/downloads/index.html.
If an IAM instance role is used for S3 access, it must have access to resources at the bucket level.
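You can confirm the installed JRE version on the node before proceeding:

# The reported version should begin with 1.8.
java -version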

Required AWS Account Permissions

For more information, see Required AWS Account Permissions.

Configuration

Depending on your S3 environment, you can define:

Read access to S3
Access to additional S3 buckets
S3 as the base storage layer
Write access to S3
The S3 bucket that is the default write destination

Define base storage layer

The base storage layer is the default platform for storing results.

Required for:

Write access to S3
Connectivity to Redshift

The base storage layer for your Trifacta instance is defined during initial installation and cannot be changed afterward.

If S3 is the base storage layer, you must also define the default storage bucket to use during initial installation, which cannot be changed at a later time. See Define default S3 write bucket below.

For more information on the various options for storage, see Storage Deployment Options.

For more information on setting the base storage layer, see Set Base Storage Layer.

Enable job output manifest

When the base storage layer is set to S3, you must enable the platform to generate job output manifest files.

During job execution, the platform creates a manifest file of all files generated during job execution. This manifest file is used as a reference to indicate the files that eventually appear, as there can be consistency issues between the files that S3 reports as present and the files that are actually written. The manifest assists the platform in determining when the files are actually ready for use.

Manifest filenames begin with an underscore.
Manifest files are small.
Manifest files may be written even for single-file outputs.

NOTE: Avoid deleting manifest files. After results have been generated, the manifest file may be used in the future for additional writing and publishing operations.

When the job results are published, this manifest file ensures proper publication.

NOTE: This feature must be enabled when using S3 as the base storage layer.

Steps:

1. You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.
2. Locate the following parameter and set it to true:

"feature.enableJobOutputManifest": true,

3. Save your changes and restart the platform.

Enable read access to S3

When read access is enabled, Trifacta users can explore S3 buckets for creating datasets.

NOTE: When read access is enabled, Trifacta users have automatic access to all buckets to which the specified S3 user has access. You may want to create a specific user account for S3 access.

NOTE: Data that is mirrored from one S3 bucket to another might inherit the permissions from the bucket where it is owned.

Steps:

1. You apply this change through the Workspace Settings Page. For more information, see Platform Configuration Methods.
2. Set the following property to enabled:

Enable S3 Connectivity

3. Save your changes.
4. In the S3 configuration section, set enabled=true, which allows Trifacta users to browse S3 buckets through the Trifacta application.
5. Specify the AWS key and secret values for the user to access S3 storage.

Configure file storage protocols and locations

The Trifacta platform must be provided the list of protocols and locations for accessing S3.

Steps:

1. You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.
2. Locate the following parameters and set their values according to the table below:

"fileStorage.whitelist": ["s3"], "fileStorage.defaultBaseUris": ["s3:///"],

Parameter: fileStorage.whitelist
Description: A comma-separated list of protocols that are permitted to access S3.

NOTE: The protocol identifier "s3" must be included in this list.

Parameter: fileStorage.defaultBaseUris
Description: For each supported protocol, this parameter must contain a top-level path to the location where Trifacta platform files can be stored. These files include uploads, samples, and temporary storage used during job execution.

NOTE: A separate base URI is required for each supported protocol. You may only have one base URI for each protocol.

NOTE: For S3, three slashes at the end are required, as the third one marks the end of the path value. This value is used as the base URI for all S3 connections created in Trifacta Wrangler Enterprise.

Example:

s3:///

The above is the most common example, as it is used as the base URI for all S3 connections that you create. Do not add a bucket value to the above URI.

3. Save your changes and restart the platform.

Java VFS service

Use of SFTP connections requires the Java VFS service in the Trifacta platform.

NOTE: This service is enabled by default.

For more information on configuring this service, see Configure Java VFS Service.

S3 access modes

The Trifacta platform supports the following modes for accessing S3. You must choose one access mode and then complete the related configuration steps.

NOTE: Avoid switching between user mode and system mode, which can disable user access to data. At install time, you should choose your preferred mode.

System mode

(default) Access to S3 buckets is enabled and defined for all users of the platform. All users use the same AWS access key, secret, and default bucket.

System mode - read-only access

For read-only access, the key, secret, and default bucket must be specified in configuration.

NOTE: Please verify that the AWS account has all required permissions to access the S3 buckets in use. The account must have the ListAllMyBuckets ACL among its permissions.
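One quick way to confirm that the credentials can list buckets is with the AWS CLI, using a profile configured with the same key and secret. The profile name here is a placeholder:

# Returns an AccessDenied error if the ListAllMyBuckets permission is missing.
aws s3api list-buckets --profile trifacta-s3-user --query 'Buckets[].Name'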

Steps:

1. You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.
2. Locate the following parameters:

Parameter: aws.s3.key
Description: Set this value to the AWS key to use to access S3.

Parameter: aws.s3.secret
Description: Set this value to the secret corresponding to the AWS key provided.

Parameter: aws.s3.bucket.name
Description: Set this value to the name of the S3 bucket from which users may read data.

NOTE: Bucket names are not validated.

NOTE: Additional buckets may be specified. See below.

3. Save your changes.

User mode

Optionally, access to S3 can be defined on a per-user basis. This mode allows administrators to define access to specific buckets using various key/secret combinations as a means of controlling permissions.

NOTE: When this mode is enabled, individual users must have AWS configuration settings applied to their account, either by an administrator or by themselves. The global settings in this section do not apply in this mode.

To enable:

1. You apply this change through the Workspace Settings Page. For more information, see Platform Configuration Methods.
2. Verify that the following setting has been set to enabled:

Enable S3 Connectivity

3. You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.

4. Please verify that the setting below has been configured:

"aws.mode": "user",

5. Additional configuration is required for per-user authentication. For more information, see Configure AWS Per-User Authentication.

User mode - Create encryption key file

If you have enabled user mode for S3 access, you must create and deploy an encryption key file. For more information, see Create Encryption Key File.

NOTE: If you have enabled user access mode, you can skip the following sections, which pertain to system access mode, and jump to the Create Redshift Connection section below.

System mode - additional configuration

The following sections apply only to system access mode.

Define default S3 write bucket

When S3 is defined as the base storage layer, write access to S3 is enabled. The Trifacta platform attempts to store outputs in the designated default S3 bucket.

NOTE: This bucket must be set during initial installation. Modifying it at a later time is not recommended and can result in inaccessible data in the platform.

NOTE: Bucket names cannot have underscores in them. See http://docs.aws.amazon.com/AmazonS3/latest/dev/BucketRestrictions.html.

Steps:

1. Define S3 to be the base storage layer. See Set Base Storage Layer.
2. Enable read access. See Enable read access to S3 above.
3. Specify a value for aws.s3.bucket.name, which defines the S3 bucket where data is written. Do not include a protocol identifier. For example, if your bucket address is s3://MyOutputBucket, the value to specify is the following:

MyOutputBucket

NOTE: Bucket names are not validated.

NOTE: Specify the top-level bucket name only. There should not be any backslashes in your entry.

NOTE: This bucket also appears as a read-access bucket if the specified S3 user has access.

Adding additional S3 buckets

When read access is enabled, all S3 buckets of which the specified user is the owner appear in the Trifacta application. You can also add additional S3 buckets from which to read.

NOTE: Additional buckets are accessible only if the specified S3 user has read privileges.

NOTE: Bucket names cannot have underscores in them.

Steps:

1. You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json

. For more information, see Platform Configuration Methods.

2. Locate the following parameter: aws.s3.extraBuckets:
   a. In the Admin Settings page, specify the extra buckets as a comma-separated string of additional S3 buckets that are available for storage. Do not put any quotes around the string. Whitespace between string values is ignored.
   b. In trifacta-conf.json, specify the extraBuckets array as a comma-separated list of buckets, as in the following:

"extraBuckets": ["MyExtraBucket01","MyExtraBucket02","MyExtraBucket03"]

NOTE: Specify the top-level bucket name only. There should not be any backslashes in your entry.

NOTE: Bucket names are not validated.

3. These values are mapped to the following bucket addresses:

s3://MyExtraBucket01
s3://MyExtraBucket02
s3://MyExtraBucket03

S3 Configuration

Configuration reference

You apply this change through the Workspace Settings Page. For more information, see Platform Configuration Methods.

Setting: Enable S3 Connectivity
Description: When set to enabled, the S3 file browser is displayed in the GUI for locating files. For more information, see S3 Browser.

You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.

"aws.s3.bucket.name": "" "aws.s3.key": "", "aws.s3.secret": "", "aws.s3.extraBuckets": [""]

Setting: bucket.name
Description: Set this value to the name of the S3 bucket to which you are writing. When webapp.storageProtocol is set to s3, the output is delivered to aws.s3.bucket.name.

Setting: key
Description: Access Key ID for the AWS account to use.

NOTE: This value cannot contain a slash (/).

Setting: secret
Description: Secret Access Key for the AWS account.

Setting: extraBuckets
Description: Add references to any additional S3 buckets to this comma-separated array of values. The S3 user must have read access to these buckets.

Enable use of server-side encryption

You can configure the Trifacta platform to publish data on S3 when a server-side encryption policy is enabled. SSE-S3 and SSE-KMS methods are supported. For more information, see http://docs.aws.amazon.com/AmazonS3/latest/dev/serv-side-encryption.html.

Notes:

When encryption is enabled, all buckets to which you are writing must share the same encryption policy.
Read operations are unaffected.

To enable, please specify the following parameters.

You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.

Server-side encryption method

"aws.s3.serverSideEncryption": "none",

Set this value to the method of encryption used by the S3 server. Supported values:

NOTE: Lowercase values are required.

sse-s3
sse-kms
none

Server-side KMS key identifier

When KMS encryption is enabled, you must specify the AWS KMS key ID to use for the server-side encryption.

"aws.s3.serverSideKmsKeyId": "",

Notes:

Access to the key: Access must be provided to the authenticating user. The AWS IAM role must be assigned to this key.
Encrypt/Decrypt permissions for the specified KMS key ID: Permissions must be assigned to the authenticating user. The AWS IAM role must be given these permissions. For more information, see https://docs.aws.amazon.com/kms/latest/developerguide/key-policy-modifying.html.

The format for referencing this key is the following:

"arn:aws:kms:::key/"

You can use an AWS alias in the following formats. The format of the AWS-managed alias is the following:

"alias/aws/s3"

The format for a custom alias is the following:

"alias/"

where:

is the name of the alias for the entire key.

Save your changes and restart the platform.
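Before restarting, you can confirm that the node's credentials can reach the KMS key. The key alias below is a placeholder:

# Describe the KMS key to confirm that it exists and is accessible.
aws kms describe-key --key-id alias/my-s3-key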

Configure S3 filewriter

The following configuration can be applied through the Hadoop site-config.xml file. If your installation does not have a copy of this file, you can insert the properties listed in the steps below into trifacta-conf.json to configure the behavior of the S3 filewriter.

Steps:

1. You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.

2. Locate the filewriter.hadoopConfig block, where you can insert the following Hadoop configuration properties:

"filewriter": { max: 16, "hadoopConfig": { "fs.s3a.buffer.dir": "/tmp", "fs.s3a.fast.upload": "true" }, ... }

Property: fs.s3a.buffer.dir
Description: Specifies the temporary directory on the Trifacta node to use for buffering when uploading to S3. If fs.s3a.fast.upload is set to false, this parameter is unused.

NOTE: This directory must be accessible to the Batch Job Runner process during job execution.

Property: fs.s3a.fast.upload
Description: Set to true to enable buffering in blocks. When set to false, buffering in blocks is disabled. For a given file, the entire object is buffered to the disk of the Trifacta node. Depending on the size and volume of your datasets, the node can run out of disk space.

3. Save your changes and restart the platform.

Create Redshift Connection

For more information, see Create Redshift Connections.

Hadoop distribution-specific configuration

Hortonworks

NOTE: If you are using Spark profiling through Hortonworks HDP on data stored in S3, additional configuration is required. See Configure for Hortonworks.

Additional Configuration for S3

The following parameters can be configured through the Trifacta platform to affect the integration with S3. You may or may not need to modify these values for your deployment.

You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.

Parameter Description

aws.s3. S3 does not guarantee that at any time the files that have been written to a directory will be consistent with the files consistencyTim available for reading. S3 does guarantee that eventually the files are in sync. eout This guarantee is important for some platform jobs that write data to S3 and then immediately attempt to read from the written data.

This timeout defines how long the platform waits for this guarantee. If the timeout is exceeded, the job is failed. The default value is 120 .

Depending on your environment, you may need to modify this value.

aws.s3.endpoint This value should be the S3 endpoint DNS name value.

NOTE: Do not include the protocol identifier.

Example value:

s3.us-east-1.amazonaws.com

If your S3 deployment meets either of the following conditions:

located in a region that does not support the default endpoint, or
v4-only signature is enabled in the region

then you can specify this setting to point to the S3 endpoint for Java/Spark services.

For more information on this location, see https://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region.
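
For reference, these two settings might appear together in trifacta-conf.json as follows. This is a sketch that assumes the dotted parameter names above map to nested JSON objects; verify the actual structure in your configuration file before editing:

"aws": {
  "s3": {
    "consistencyTimeout": 120,
    "endpoint": "s3.us-east-1.amazonaws.com"
  }
}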

Testing

Restart services. See Start and Stop the Platform.

Try running a simple job from the Trifacta application. For more information, see Verify Operations.

Troubleshooting

Profiling consistently fails for S3 sources of data

If you are executing visual profiles of datasets sourced from S3, you may see an error similar to the following in the batch-job-runner.log file:

01:19:52.297 [Job 3] ERROR com.trifacta.hadoopdata.joblaunch.server.BatchFileWriterWorker - BatchFileWriterException: Batch File Writer unknown error: {jobId=3, why=bound must be positive}
01:19:52.298 [Job 3] INFO com.trifacta.hadoopdata.joblaunch.server.BatchFileWriterWorker - Notifying monitor for job 3 with status code FAILURE

This issue is caused by improperly configured buffering for jobs that write to S3. The specified local buffer directory cannot be accessed by the batch job runner process, and the job fails to write results to S3.

Solution:

You may do one of the following:

Use a valid temp directory when buffering to S3.
Disable buffering to a local directory completely.

Steps:

You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.

Locate the filewriter.hadoopConfig block, where you can insert either of the following Hadoop configuration properties:

"filewriter": { max: 16, "hadoopConfig": { "fs.s3a.buffer.dir": "/tmp", "fs.s3a.fast.upload": false }, ... }

fs.s3a.buffer.dir: Specifies the temporary directory on the Trifacta node to use for buffering when uploading to S3. If fs.s3a.fast.upload is set to false, this parameter is unused.

fs.s3a.fast.upload: When set to false, buffering is disabled.

Save your changes and restart the platform.

Spark local directory has no space

During execution of a Spark job, you may encounter the following error:

org.apache.hadoop.util.DiskChecker$DiskErrorException: No space available in any of the local directories.

Solution:

Restart Trifacta services, which may free up some temporary space.
Use the steps in the preceding solution to reassign a temporary directory for Spark to use (fs.s3a.buffer.dir).

Create Redshift Connections

Contents:

Pre-requisites
Permissions
Limitations
Create Connection
Create through application
Reference
Connection URL
Driver Information
Troubleshooting
Use
Testing

This section provides information on how to enable Redshift connectivity and create one or more connections to Redshift sources.

Amazon Redshift is a hosted data warehouse available through Amazon Web Services. It is frequently used for hosting datasets used by downstream analytic tools such as Tableau and Qlik. For more information, see https://aws.amazon.com/redshift/.

Pre-requisites

Before you begin, please verify that your Trifacta® environment meets the following requirements:

Tip: If the credentials used to connect to S3 do not provide access to Redshift, you can create an independent IAM role to provide access from Redshift to S3. If this separate role is available, the Redshift connection uses it instead. There may be security considerations.

1. S3 base storage layer: Redshift access requires use of S3 as the base storage layer, which must be enabled. See Set Base Storage Layer.
2. Same region: The Redshift cluster must be in the same region as the default S3 bucket.
3. Integration: Your Trifacta instance is connected to a running environment supported by your product edition.
4. Deployment: Trifacta platform is deployed either on-premises or in EC2.

Permissions

Access to Redshift requires:

Each user is able to access S3
S3 is the base storage layer

If the credentials used to connect to S3 do not provide access to Redshift, you can create an independent IAM role to provide access from Redshift to S3. If this separate role is available, the Redshift connection uses it instead.

NOTE: There may be security considerations with using an independent role to govern this capability.

Steps:

1. The IAM role must contain the required S3 permissions. See Required AWS Account Permissions. (A sample policy sketch follows these steps.)
2. The Redshift cluster should be assigned this IAM role. For more information, see https://docs.aws.amazon.com/redshift/latest/mgmt/authorizing-redshift-service.html.
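
For illustration, the following is a minimal sketch of the S3 permissions that such a role might carry. The bucket name is a hypothetical placeholder, and the definitive list of permissions is in Required AWS Account Permissions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::my-trifacta-bucket",
        "arn:aws:s3:::my-trifacta-bucket/*"
      ]
    }
  ]
}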

Limitations

You can publish any specific job to Redshift only once through the export window. See Publishing Dialog.

When publishing to Redshift through the Publishing dialog, output must be in Avro or JSON format. This limitation does not apply to direct writing to Redshift.
Management of nulls: Nulls are displayed as expected in the Trifacta application. When Redshift jobs are run, the UNLOAD SQL command in Redshift converts all nulls to empty strings. Null values appear as empty strings in generated results, which can be confusing. This is a known issue with Redshift.

Create Connection

You can create Redshift connections through the following methods.

Tip: SSL connections are recommended. Details are below.

Create through application

Any user can create a Redshift connection through the application.

Steps:

1. Log in to the application.
2. In the menu, click User menu > Preferences > Connections.
3. In the Create Connection page, click the Redshift connection card.
4. Specify the properties for your Redshift database connection. The following parameters are specific to Redshift connections:

IAM Role ARN for Redshift-S3 Connectivity: (Optional) You can specify an IAM role ARN that enables role-based connectivity between Redshift and the S3 bucket that is used as intermediate storage during Redshift bulk COPY/UNLOAD operations. Example: arn:aws:iam::1234567890:role/MyRedshiftRole

For more information, see Create Connection Window.

5. Click Save.

Enable SSL connections

To enable SSL connections to Redshift, you must enable them first on your Redshift cluster. For more information, see https://docs.aws.amazon.com/redshift/latest/mgmt/connecting-ssl-support.html.

In your connection to Redshift, please add the following string to your Connect String Options:

;ssl=true

Save your changes.

Reference

Connection URL

The properties that you provide are inserted into the following URL, which connects Trifacta Wrangler Enterprise to the Redshift database:

jdbc:redshift://<hostname>:<port>/<database>
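
For example, a populated connection URL might look like the following. The cluster endpoint shown is a hypothetical placeholder; 5439 is the default Redshift port:

jdbc:redshift://redshift-cluster-1.abc123xyz789.us-east-1.redshift.amazonaws.com:5439/dev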

Connect string options

The connect string options are optional. If you are passing additional properties and values to complete the connection, the connect string options must be structured in the following manner:

;<prop1>=<value1>;<prop2>=<value2>;...

where:

<prop>: the name of the property
<value>: the value for the property

Delimiters:

; : any set of connect string options must begin and end with a semi-colon.
; : all additional property names must be prefixed with a semi-colon.
= : property names and values must be separated with an equal sign (=).
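
For example, a connect string that enables SSL and TCP keep-alives might look like the following. The ssl option is described above; TCPKeepAlive is an option documented for the Amazon Redshift JDBC driver, and any option you pass should be confirmed against the driver documentation referenced below:

;ssl=true;TCPKeepAlive=true;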

Driver Information

This connection uses the following driver:

Driver name: com.amazon.redshift.jdbc41.Driver
Driver version: com.amazon.redshift:redshift-jdbc41-no-awssdk:1.2.41.1065
Driver documentation: https://docs.aws.amazon.com/redshift/latest/mgmt/configure-jdbc-connection.html

Troubleshooting

For more information, see https://docs.aws.amazon.com/redshift/latest/mgmt/troubleshooting-connections.html.

Use

For more information, see Redshift Browser.

For more information about interacting with data on Redshift, see Using Redshift.

Testing

Import a dataset from Redshift. Add it to a flow, and specify a publishing action. Run a job.

NOTE: When publishing to Redshift through the Publishing dialog, output must be in Avro or JSON format. This limitation does not apply to direct writing to Redshift.

Configure for EC2 Role-Based Authentication

Contents:

IAM roles
AWS System Mode
Additional AWS Configuration
Use of S3 Sources

When you are running the Trifacta platform on an EC2 instance, you can leverage your enterprise IAM roles to manage permissions on the instance for the Trifacta platform. When this type of authentication is enabled, Trifacta administrators can apply a role to the EC2 instance where the platform is running. That role's permissions apply to all users of the platform.

IAM roles

Before you begin, your IAM roles should be defined and attached to the associated EC2 instance.

NOTE: The IAM instance role used for S3 access should have access to resources at the bucket level.

For more information, see http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html.

AWS System Mode

To enable role-based instance authentication, the following parameter must be set.

"aws.mode": "system",

Additional AWS Configuration

The following additional parameters must be specified:

aws.credentialProvider: Set this value to instance. The IAM instance role is used for providing access.

aws.hadoopFsUseSharedInstanceProvider: Set this value to true for CDH. The class information is provided below.

Shared instance provider class information

Hortonworks:

"com.amazonaws.auth.InstanceProfileCredentialsProvider",

Pre-Cloudera 6.0.0:

"org.apache.hadoop.fs.s3a.SharedInstanceProfileCredentialsProvider"

Cloudera 6.0.0 and later:

Set the above parameters as follows:

"aws.credentialProvider": "instance", "aws.hadoopFSUseSharedInstanceProvider": false,

Use of S3 Sources

To access S3 for storage, additional S3 configuration may be required.

NOTE: Do not configure the properties that apply to user mode.

Output sizing recommendations:

Single-file output: If you are generating a single file, you should try to keep its size under 1 GB.
Multi-part output: For multiple-file outputs, each part file should be under 1 GB in size. For more information, see https://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-use-multiple-files.html.

For more information, see Enable S3 Access.

Legal

Third-Party License Information

Copyright © 2021 Trifacta Inc.

This product also includes the following libraries which are covered by The (MIT AND BSD-3-Clause):

sha.js

This product also includes the following libraries which are covered by The 2-clause BSD License:

double_metaphone

This product also includes the following libraries which are covered by The 3-Clause BSD License:

com.google.protobuf.protobuf-java com.google.protobuf.protobuf-java-util

This product also includes the following libraries which are covered by The ASL:

funcsigs org.json4s.json4s-ast_2.10 org.json4s.json4s-core_2.10 org.json4s.json4s-native_2.10

This product also includes the following libraries which are covered by The ASL 2.0:

pykerberos

This product also includes the following libraries which are covered by The Amazon Redshift ODBC and JDBC Driver License Agreement:

RedshiftJDBC41-1.1.7.1007 com.amazon.redshift.RedshiftJDBC41

This product also includes the following libraries which are covered by The Apache 2.0 License:

com.uber.jaeger.jaeger-b3 com.uber.jaeger.jaeger-core com.uber.jaeger.jaeger-thrift com.uber.jaeger.jaeger-zipkin org.apache.spark.spark-catalyst_2.11 org.apache.spark.spark-core_2.11 org.apache.spark.spark-hive_2.11 org.apache.spark.spark-kvstore_2.11 org.apache.spark.spark-launcher_2.11 org.apache.spark.spark-network-common_2.11 org.apache.spark.spark-network-shuffle_2.11 org.apache.spark.spark-sketch_2.11 org.apache.spark.spark-sql_2.11 org.apache.spark.spark-tags_2.11 org.apache.spark.spark-unsafe_2.11 org.apache.spark.spark-yarn_2.11

This product also includes the following libraries which are covered by The :

com.chuusai.shapeless_2.11 commons-httpclient.commons-httpclient org.apache.httpcomponents.httpclient org.apache.httpcomponents.httpcore

This product also includes the following libraries which are covered by The Apache License (v2.0):

com.vlkan.

This product also includes the following libraries which are covered by The Apache License 2.0:

@google-cloud/pubsub @google-cloud/resource @google-cloud/storage arrow avro aws-sdk azure-storage bootstrap browser-request bytebuffer cglib.cglib-nodep com.amazonaws.aws-java-sdk com.amazonaws.aws-java-sdk-bundle com.amazonaws.aws-java-sdk-core com.amazonaws.aws-java-sdk-dynamodb com.amazonaws.aws-java-sdk-emr com.amazonaws.aws-java-sdk-iam com.amazonaws.aws-java-sdk-kms com.amazonaws.aws-java-sdk-s3 com.amazonaws.aws-java-sdk-sts com.amazonaws.jmespath-java com.carrotsearch.hppc com.clearspring.analytics.stream com.cloudera.navigator.navigator-sdk com.cloudera.navigator.navigator-sdk-client com.cloudera.navigator.navigator-sdk-model com.codahale.metrics.metrics-core com.cronutils.cron-utils com.databricks.spark-avro_2.11 com.fasterxml.classmate com.fasterxml.jackson.dataformat.jackson-dataformat-cbor com.fasterxml.jackson.datatype.jackson-datatype-joda com.fasterxml.jackson.jaxrs.jackson-jaxrs-base com.fasterxml.jackson.jaxrs.jackson-jaxrs-json-provider com.fasterxml.jackson.module.jackson-module-jaxb-annotations com.fasterxml.jackson.module.jackson-module-paranamer com.fasterxml.jackson.module.jackson-module-scala_2.11 com.fasterxml.uuid.java-uuid-generator com..nscala-time.nscala-time_2.10 com.github.stephenc.jcip.jcip-annotations com.google.api-client.google-api-client com.google.api-client.google-api-client-jackson2 com.google.api-client.google-api-client-java6 com.google.api..grpc-google-cloud-bigquerystorage-v1beta1 com.google.api.grpc.grpc-google-cloud--admin-v2 com.google.api.grpc.grpc-google-cloud-bigtable-v2 com.google.api.grpc.grpc-google-cloud-pubsub-v1 com.google.api.grpc.grpc-google-cloud--admin-database-v1 com.google.api.grpc.grpc-google-cloud-spanner-admin-instance-v1 com.google.api.grpc.grpc-google-cloud-spanner-v1 com.google.api.grpc.grpc-google-common-protos com.google.api.grpc.proto-google-cloud-bigquerystorage-v1beta1 com.google.api.grpc.proto-google-cloud-bigtable-admin-v2

com.google.api.grpc.proto-google-cloud-bigtable-v2 com.google.api.grpc.proto-google-cloud-datastore-v1 com.google.api.grpc.proto-google-cloud-monitoring-v3 com.google.api.grpc.proto-google-cloud-pubsub-v1 com.google.api.grpc.proto-google-cloud-spanner-admin-database-v1 com.google.api.grpc.proto-google-cloud-spanner-admin-instance-v1 com.google.api.grpc.proto-google-cloud-spanner-v1 com.google.api.grpc.proto-google-common-protos com.google.api.grpc.proto-google-iam-v1 com.google.apis.google-api-services- com.google.apis.google-api-services-clouddebugger com.google.apis.google-api-services-cloudresourcemanager com.google.apis.google-api-services-dataflow com.google.apis.google-api-services-iam com.google.apis.google-api-services-oauth2 com.google.apis.google-api-services-pubsub com.google.apis.google-api-services-storage com.google.auto.service.auto-service com.google.auto.value.auto-value-annotations com.google.cloud.bigdataoss.gcsio com.google.cloud.bigdataoss.util com.google.cloud.google-cloud-bigquery com.google.cloud.google-cloud-bigquerystorage com.google.cloud.google-cloud-bigtable com.google.cloud.google-cloud-bigtable-admin com.google.cloud.google-cloud-core com.google.cloud.google-cloud-core-grpc com.google.cloud.google-cloud-core-http com.google.cloud.google-cloud-monitoring com.google.cloud.google-cloud-spanner com.google.code.findbugs.jsr305 com.google.code..gson com.google.errorprone.error_prone_annotations com.google.flogger.flogger com.google.flogger.flogger-system-backend com.google.flogger.google-extensions com.google.guava.failureaccess com.google.guava.guava com.google.guava.guava-jdk5 com.google.guava.listenablefuture com.google.http-client.google-http-client com.google.http-client.google-http-client-apache com.google.http-client.google-http-client-appengine com.google.http-client.google-http-client-jackson com.google.http-client.google-http-client-jackson2 com.google.http-client.google-http-client-protobuf com.google.inject.extensions.guice-servlet com.google.inject.guice com.google.j2objc.j2objc-annotations com.google.oauth-client.google-oauth-client com.google.oauth-client.google-oauth-client-java6 com.googlecode.javaewah.JavaEWAH com.googlecode.libphonenumber.libphonenumber com.hadoop.gplcompression.hadoop-lzo com.jakewharton.threetenabp.threetenabp com.jamesmurty.utils.java-xmlbuilder com.jolbox.bonecp com.mapr.mapr-root com.microsoft.azure.adal4j com.microsoft.azure.azure-core

com.microsoft.azure.azure-storage com.microsoft.windowsazure.storage.microsoft-windowsazure-storage-sdk com.nimbusds.lang-tag com.nimbusds.nimbus-jose-jwt com.ning.compress-lzf com.opencsv.opencsv com.squareup.okhttp.okhttp com.squareup.okhttp3.logging-interceptor com.squareup.okhttp3.okhttp com.squareup.okhttp3.okhttp-urlconnection com.squareup.okio.okio com.squareup.retrofit2.adapter-rxjava com.squareup.retrofit2.converter-jackson com.squareup.retrofit2.retrofit com.trifacta.hadoop.cloudera4 com.twitter.chill-java com.twitter.chill_2.11 com.twitter.parquet-hadoop-bundle com.typesafe.akka.akka-actor_2.11 com.typesafe.akka.akka-cluster_2.11 com.typesafe.akka.akka-remote_2.11 com.typesafe.akka.akka-slf4j_2.11 com.typesafe.config com.univocity.univocity-parsers com.zaxxer.HikariCP com.zaxxer.HikariCP-java7 commons-beanutils.commons-beanutils commons-beanutils.commons-beanutils-core commons-cli.commons-cli commons-codec.commons-codec commons-collections.commons-collections commons-configuration.commons-configuration commons-dbcp.commons-dbcp commons-dbutils.commons-dbutils commons-digester.commons-digester commons-el.commons-el commons-fileupload.commons-fileupload commons-io.commons-io commons-lang.commons-lang commons-logging.commons-logging commons-net.commons-net commons-pool.commons-pool dateinfer de.odysseus.juel.juel-api de.odysseus.juel.juel-impl de.odysseus.juel.juel-spi express-opentracing google-benchmark googleapis io.airlift.aircompressor io.dropwizard.metrics.metrics-core io.dropwizard.metrics.metrics-graphite io.dropwizard.metrics.metrics-json io.dropwizard.metrics.metrics-jvm io.grpc.grpc-all io.grpc.grpc-alts io.grpc.grpc-auth io.grpc.grpc-context io.grpc.grpc-core io.grpc.grpc-grpclb

io.grpc.grpc-netty io.grpc.grpc-netty-shaded io.grpc.grpc-okhttp io.grpc.grpc-protobuf io.grpc.grpc-protobuf-lite io.grpc.grpc-protobuf-nano io.grpc.grpc-stub io.grpc.grpc-testing io.netty.netty io.netty.netty-all io.netty.netty-buffer io.netty.netty-codec io.netty.netty-codec-http io.netty.netty-codec-http2 io.netty.netty-codec-socks io.netty.netty-common io.netty.netty-handler io.netty.netty-handler-proxy io.netty.netty-resolver io.netty.netty-tcnative-boringssl-static io.netty.netty-transport io.opentracing.contrib.opentracing-concurrent io.opentracing.contrib.opentracing-globaltracer io.opentracing.contrib.opentracing-web-servlet-filter io.opentracing.opentracing-api io.opentracing.opentracing-noop io.opentracing.opentracing-util io.prometheus.simpleclient io.prometheus.simpleclient_common io.prometheus.simpleclient_servlet io.reactivex.rxjava io.spray.spray-can_2.11 io.spray.spray-http_2.11 io.spray.spray-httpx_2.11 io.spray.spray-io_2.11 io.spray.spray-json_2.11 io.spray.spray-routing_2.11 io.spray.spray-util_2.11 io.springfox.springfox-core io.springfox.springfox-schema io.springfox.springfox-spi io.springfox.springfox-spring-web io.springfox.springfox-swagger-common io.springfox.springfox-swagger-ui io.springfox.springfox-swagger2 io.swagger.swagger-annotations io.swagger.swagger-models io.undertow.undertow-core io.undertow.undertow-servlet io.undertow.undertow-websockets-jsr io.zipkin.java.zipkin io.zipkin.reporter.zipkin-reporter io.zipkin.reporter.zipkin-sender-urlconnection io.zipkin.zipkin2.zipkin it.unimi.dsi.fastutil jaeger-client javax.inject.javax.inject javax.jdo.jdo-api javax.validation.validation-api joda-time.joda-time

kerberos less libcuckoo .apache-log4j-extras log4j.log4j long mathjs mx4j.mx4j net.bytebuddy.byte-buddy net.hydromatic.eigenbase-properties net.java.dev.eval.eval net.java.dev.jets3t.jets3t net.jcip.jcip-annotations net.jpountz.lz4.lz4 net.minidev.accessors-smart net.minidev.json-smart net.sf.opencsv.opencsv net.snowflake.snowflake-jdbc opentracing org.activiti.activiti-bpmn-converter org.activiti.activiti-bpmn-layout org.activiti.activiti-bpmn-model org.activiti.activiti-common-rest org.activiti.activiti-dmn-api org.activiti.activiti-dmn-model org.activiti.activiti-engine org.activiti.activiti-form-api org.activiti.activiti-form-model org.activiti.activiti-image-generator org.activiti.activiti-process-validation org.activiti.activiti-rest org.activiti.activiti-spring org.activiti.activiti-spring-boot-starter-basic org.activiti.activiti-spring-boot-starter-rest-api org.activiti.activiti5-compatibility org.activiti.activiti5-engine org.activiti.activiti5-spring org.activiti.activiti5-spring-compatibility org.apache.arrow.arrow-format org.apache.arrow.arrow-memory org.apache.arrow.arrow-vector org.apache.atlas.atlas-client org.apache.atlas.atlas-typesystem org.apache.avro.avro org.apache.avro.avro-ipc org.apache.avro.avro-mapred org.apache.beam.beam-model-job-management org.apache.beam.beam-model-pipeline org.apache.beam.beam-runners-core-construction-java org.apache.beam.beam-runners-direct-java org.apache.beam.beam-runners-google-cloud-dataflow-java org.apache.beam.beam-sdks-java-core org.apache.beam.beam-sdks-java-extensions-google-cloud-platform-core org.apache.beam.beam-sdks-java-extensions-protobuf org.apache.beam.beam-sdks-java-extensions-sorter org.apache.beam.beam-sdks-java-io-google-cloud-platform org.apache.beam.beam-sdks-java-io-parquet org.apache.beam.beam-vendor-grpc-1_13_1 org.apache.beam.beam-vendor-guava-20_0 org.apache.calcite.calcite-avatica

org.apache.calcite.calcite-core org.apache.calcite.calcite-linq4j org.apache.commons.codec org.apache.commons.commons-collections4 org.apache.commons.commons-compress org.apache.commons.commons-configuration2 org.apache.commons.commons-crypto org.apache.commons.commons-csv org.apache.commons.commons-dbcp2 org.apache.commons.commons-email org.apache.commons.commons-exec org.apache.commons.commons-lang3 org.apache.commons.commons-math org.apache.commons.commons-math3 org.apache.commons.commons-pool2 org.apache.curator.curator-client org.apache.curator.curator-framework org.apache.curator.curator-recipes org.apache.derby.derby org.apache.directory.api.api-asn1-api org.apache.directory.api.api-util org.apache.directory.server.apacheds-i18n org.apache.directory.server.apacheds-kerberos-codec org.apache.hadoop.avro org.apache.hadoop.hadoop-annotations org.apache.hadoop.hadoop-auth org.apache.hadoop.hadoop-aws org.apache.hadoop.hadoop-azure org.apache.hadoop.hadoop-azure-datalake org.apache.hadoop.hadoop-client org.apache.hadoop.hadoop-common org.apache.hadoop.hadoop-hdfs org.apache.hadoop.hadoop-hdfs-client org.apache.hadoop.hadoop--client-app org.apache.hadoop.hadoop-mapreduce-client-common org.apache.hadoop.hadoop-mapreduce-client-core org.apache.hadoop.hadoop-mapreduce-client-jobclient org.apache.hadoop.hadoop-mapreduce-client-shuffle org.apache.hadoop.hadoop-yarn-api org.apache.hadoop.hadoop-yarn-client org.apache.hadoop.hadoop-yarn-common org.apache.hadoop.hadoop-yarn-registry org.apache.hadoop.hadoop-yarn-server-common org.apache.hadoop.hadoop-yarn-server-nodemanager org.apache.hadoop.hadoop-yarn-server-web-proxy org.apache.htrace.htrace-core org.apache.htrace.htrace-core4 org.apache.httpcomponents.httpmime org.apache.ivy.ivy org.apache.kerby.kerb-admin org.apache.kerby.kerb-client org.apache.kerby.kerb-common org.apache.kerby.kerb-core org.apache.kerby.kerb-crypto org.apache.kerby.kerb-identity org.apache.kerby.kerb-server org.apache.kerby.kerb-simplekdc org.apache.kerby.kerb-util org.apache.kerby.kerby-asn1 org.apache.kerby.kerby-config

org.apache.kerby.kerby-pkix org.apache.kerby.kerby-util org.apache.kerby.kerby-xdr org.apache.kerby.token-provider org.apache.logging.log4j.log4j-1.2-api org.apache.logging.log4j.log4j-api org.apache.logging.log4j.log4j-api-scala_2.11 org.apache.logging.log4j.log4j-core org.apache.logging.log4j.log4j-jcl org.apache.logging.log4j.log4j-jul org.apache.logging.log4j.log4j-slf4j-impl org.apache.logging.log4j.log4j-web org.apache.orc.orc-core org.apache.orc.orc-mapreduce org.apache.orc.orc-shims org.apache.parquet.parquet-avro org.apache.parquet.parquet-column org.apache.parquet.parquet-common org.apache.parquet.parquet-encoding org.apache.parquet.parquet-format org.apache.parquet.parquet-hadoop org.apache.parquet.parquet-jackson org.apache.parquet.parquet-tools org.apache.pig.pig org.apache.pig.pig-core-spork org.apache.thrift.libfb303 org.apache.thrift.libthrift org.apache.tomcat.embed.tomcat-embed-core org.apache.tomcat.embed.tomcat-embed-el org.apache.tomcat.embed.tomcat-embed-websocket org.apache.tomcat.tomcat-annotations-api org.apache.tomcat.tomcat-jdbc org.apache.tomcat.tomcat-juli org.apache.xbean.xbean-asm5-shaded org.apache.xbean.xbean-asm6-shaded org.apache.zookeeper.zookeeper org.codehaus.jackson.jackson-core-asl org.codehaus.jackson.jackson-mapper-asl org.codehaus.jettison.jettison org.datanucleus.datanucleus-api-jdo org.datanucleus.datanucleus-core org.datanucleus.datanucleus-rdbms org.eclipse.jetty.jetty-client org.eclipse.jetty.jetty-http org.eclipse.jetty.jetty-io org.eclipse.jetty.jetty-security org.eclipse.jetty.jetty-server org.eclipse.jetty.jetty-servlet org.eclipse.jetty.jetty-util org.eclipse.jetty.jetty-util-ajax org.eclipse.jetty.jetty-webapp org.eclipse.jetty.jetty-xml org.fusesource.jansi.jansi org.hibernate.hibernate-validator org.htrace.htrace-core org.iq80.snappy.snappy org.jboss.jandex org.joda.joda-convert org.json4s.json4s-ast_2.11 org.json4s.json4s-core_2.11

org.json4s.json4s-jackson_2.11 org.json4s.json4s-scalap_2.11 org.liquibase.liquibase-core org.lz4.lz4-java org.mapstruct.mapstruct org.mortbay.jetty.jetty org.mortbay.jetty.jetty-util org.mybatis.mybatis org.objenesis.objenesis org.osgi.org.osgi.core org.parboiled.parboiled-core org.parboiled.parboiled-scala_2.11 org.quartz-scheduler.quartz org.roaringbitmap.RoaringBitmap org.sonatype.oss.oss-parent org.sonatype.sisu.inject.cglib org.spark-project.hive.hive-exec org.spark-project.hive.hive-metastore org.springframework.boot.spring-boot org.springframework.boot.spring-boot-autoconfigure org.springframework.boot.spring-boot-starter org.springframework.boot.spring-boot-starter-aop org.springframework.boot.spring-boot-starter-data-jpa org.springframework.boot.spring-boot-starter-jdbc org.springframework.boot.spring-boot-starter-log4j2 org.springframework.boot.spring-boot-starter-logging org.springframework.boot.spring-boot-starter-tomcat org.springframework.boot.spring-boot-starter-undertow org.springframework.boot.spring-boot-starter-web org.springframework.data.spring-data-commons org.springframework.data.spring-data-jpa org.springframework.plugin.spring-plugin-core org.springframework.plugin.spring-plugin-metadata org.springframework.retry.spring-retry org.springframework.security.spring-security-config org.springframework.security.spring-security-core org.springframework.security.spring-security-crypto org.springframework.security.spring-security-web org.springframework.spring-aop org.springframework.spring-aspects org.springframework.spring-beans org.springframework.spring-context org.springframework.spring-context-support org.springframework.spring-core org.springframework.spring-expression org.springframework.spring-jdbc org.springframework.spring-orm org.springframework.spring-tx org.springframework.spring-web org.springframework.spring-webmvc org.springframework.spring-websocket org.uncommons.maths.uncommons-maths org.wildfly.openssl.wildfly-openssl org.xerial.snappy.snappy-java org.yaml.snakeyaml oro.oro parquet pbr pig-0.11.1-withouthadoop-23 pig-0.12.1-mapr-noversion-withouthadoop

Copyright © 2021 Trifacta Inc. Page #52 pig-0.14.0-core-spork piggybank-amzn-0.3 piggybank-cdh5.0.0-beta-2-0.12.0 python-iptables request requests stax.stax-api thrift tomcat.jasper-compiler tomcat.jasper-runtime xerces.xercesImpl xml-apis.xml-apis zipkin zipkin-transport-http

This product also includes the following libraries which are covered by The Apache License 2.0 + Eclipse Public License 1.0:

spark-assembly-thinner

This product also includes the following libraries which are covered by The Apache License Version 2:

org.mortbay.jetty.jetty-sslengine

This product also includes the following libraries which are covered by The Apache License v2.0:

net.java.dev.jna.jna net.java.dev.jna.jna-platform

This product also includes the following libraries which are covered by The Apache License, version 2.0:

com.nimbusds.oauth2-oidc-sdk org.jboss.logging.jboss-logging

This product also includes the following libraries which are covered by The Apache Software Licenses:

org.slf4j.log4j-over-slf4j

This product also includes the following libraries which are covered by The BSD license:

alabaster antlr.antlr asm.asm-parent babel click com.google.api.api-common com.google.api.gax com.google.api.gax-grpc com.google.api.gax-httpjson com.jcraft.jsch com.thoughtworks.paranamer.paranamer dk.brics.automaton.automaton dom4j.dom4j enum34 flask itsdangerous javolution.javolution jinja2 jline.jline microee

mock networkx numpy org.antlr.ST4 org.antlr.antlr-runtime org.antlr.antlr4-runtime org.antlr.stringtemplate org.codehaus.woodstox.stax2-api org.ow2.asm.asm org.scala-lang.jline pandas pluginbase psutil pygments python-enum34 python-json-logger scipy snowballstemmer sphinx sphinxcontrib-websupport strptime websocket-stream xlrd xmlenc.xmlenc xss-filters

This product also includes the following libraries which are covered by The BSD 2-Clause License:

com.github.luben.zstd-jni

This product also includes the following libraries which are covered by The BSD 3-Clause:

org.scala-lang.scala-compiler org.scala-lang.scala-library org.scala-lang.scala-reflect org.scala-lang.scalap

This product also includes the following libraries which are covered by The BSD 3-Clause "New" or "Revised" License (BSD-3-Clause):

org.abego.treelayout.org.abego.treelayout.core

This product also includes the following libraries which are covered by The BSD 3-Clause License:

org.antlr.antlr4

This product also includes the following libraries which are covered by The BSD 3-clause:

org.scala-lang.modules.scala-parser-combinators_2.11 org.scala-lang.modules.scala-xml_2.11 org.scala-lang.plugins.scala-continuations-library_2.11 org.threeten.threetenbp

This product also includes the following libraries which are covered by The BSD New license:

com.google.auth.google-auth-library-credentials com.google.auth.google-auth-library-oauth2-http

This product also includes the following libraries which are covered by The BSD-2-Clause:

cls-bluebird node-polyglot org.postgresql.postgresql terser uglify-js

This product also includes the following libraries which are covered by The BSD-3-Clause:

@sentry/node d3-dsv datalib datalib-sketch markupsafe md5 node-forge protobufjs qs queue-async sqlite3 werkzeug

This product also includes the following libraries which are covered by The BSD-Style:

com.jsuereth.scala-arm_2.11

This product also includes the following libraries which are covered by The BSD-derived ( http://www.repoze.org/LICENSE.txt):

meld3 supervisor

This product also includes the following libraries which are covered by The BSD-like:

dnspython idna org.scala-lang.scala-actors org.scalamacros.quasiquotes_2.10

This product also includes the following libraries which are covered by The BSD-style license:

bzip2

This product also includes the following libraries which are covered by The Boost Software License:

asio boost cpp-netlib-uri expected jsbind poco

This product also includes the following libraries which are covered by The Bouncy Castle Licence:

org.bouncycastle.bcprov-jdk15on

This product also includes the following libraries which are covered by The CDDL:

javax.mail.mail javax.mail.mailapi javax.servlet.jsp-api

javax.servlet.jsp.jsp-api javax.servlet.servlet-api javax.transaction.jta javax.xml.stream.stax-api org.glassfish.external.management-api org.glassfish.gmbal.gmbal-api-only org.jboss.spec.javax.annotation.jboss-annotations-api_1.2_spec

This product also includes the following libraries which are covered by The CDDL + GPLv2 with classpath exception:

javax.annotation.javax.annotation-api javax.jms.jms javax.servlet.javax.servlet-api javax.transaction.javax.transaction-api javax.transaction.transaction-api org.glassfish.grizzly.grizzly-framework org.glassfish.grizzly.grizzly-http org.glassfish.grizzly.grizzly-http-server org.glassfish.grizzly.grizzly-http-servlet org.glassfish.grizzly.grizzly-rcm org.glassfish.hk2.external.aopalliance-repackaged org.glassfish.hk2.external.javax.inject org.glassfish.hk2.hk2-api org.glassfish.hk2.hk2-locator org.glassfish.hk2.hk2-utils org.glassfish.hk2.osgi-resource-locator org.glassfish.javax.el org.glassfish.jersey.core.jersey-common

This product also includes the following libraries which are covered by The CDDL 1.1:

com.sun.jersey.contribs.jersey-guice com.sun.jersey.jersey-client com.sun.jersey.jersey-core com.sun.jersey.jersey-json com.sun.jersey.jersey-server com.sun.jersey.jersey-servlet com.sun.xml.bind.jaxb-impl javax.ws.rs.javax.ws.rs-api javax.xml.bind.jaxb-api org.jvnet.mimepull.mimepull

This product also includes the following libraries which are covered by The CDDL License:

javax.ws.rs.jsr311-api

This product also includes the following libraries which are covered by The CDDL/GPLv2+CE:

com.sun.mail.javax.mail

This product also includes the following libraries which are covered by The CERN:

colt.colt

This product also includes the following libraries which are covered by The COMMON DEVELOPMENT AND DISTRIBUTION LICENSE (CDDL) Version 1.0:

javax.activation.activation javax.annotation.jsr250-api

This product also includes the following libraries which are covered by The Doug Crockford's license that allows this module to be used for Good but not for Evil:

jsmin

This product also includes the following libraries which are covered by The Dual License:

python-dateutil

This product also includes the following libraries which are covered by The Eclipse Distribution License (EDL), Version 1.0:

org.hibernate.javax.persistence.hibernate-jpa-2.1-api

This product also includes the following libraries which are covered by The Eclipse Public License:

com.github.oshi.oshi-core

This product also includes the following libraries which are covered by The Eclipse Public License - v 1.0:

org.aspectj.aspectjweaver

This product also includes the following libraries which are covered by The Eclipse Public License, Version 1.0:

com.mchange.mchange-commons-java

This product also includes the following libraries which are covered by The GNU General Public License v2.0 only, with Classpath exception:

org.jboss.spec.javax.servlet.jboss-servlet-api_3.1_spec org.jboss.spec.javax.websocket.jboss-websocket-api_1.1_spec

This product also includes the following libraries which are covered by The GNU LGPL:

nose

This product also includes the following libraries which are covered by The GNU Lesser General Public License:

org.hibernate.common.hibernate-commons-annotations org.hibernate.hibernate-core org.hibernate.hibernate-entitymanager

This product also includes the following libraries which are covered by The GNU Lesser General Public License Version 2.1, February 1999:

org.jgrapht.jgrapht-core

This product also includes the following libraries which are covered by The GNU Lesser General Public License, Version 2.1:

com.fasterxml.jackson.core.jackson-annotations com.fasterxml.jackson.core.jackson-core com.fasterxml.jackson.core.jackson-databind com.mchange.c3p0

This product also includes the following libraries which are covered by The GNU Lesser Public License:

com.google.code.findbugs.annotations

This product also includes the following libraries which are covered by The GPLv3:

yamllint

This product also includes the following libraries which are covered by The Google Cloud Software License:

com.google.cloud.google-cloud-storage

This product also includes the following libraries which are covered by The ICU License:

icu

This product also includes the following libraries which are covered by The ISC:

browserify-sign iconify inherits lru-cache request-promise request-promise-native requests-kerberos rimraf sax semver split-ca

This product also includes the following libraries which are covered by The JGraph Ltd - 3 clause BSD license:

org.tinyjee.jgraphx.jgraphx

This product also includes the following libraries which are covered by The LGPL:

chardet com.sun.jna.jna

This product also includes the following libraries which are covered by The LGPL 2.1:

org.codehaus.jackson.jackson-jaxrs org.codehaus.jackson.jackson-xc org.javassist.javassist xmlhttprequest

This product also includes the following libraries which are covered by The LGPLv3 or later:

com.github.fge.json-schema-core com.github.fge.json-schema-validator

This product also includes the following libraries which are covered by The MIT license:

amplitude-js analytics-node args4j.args4j async avsc backbone backbone-forms basic-auth bcrypt bluebird

body-parser browser-filesaver buffer-crc32 bufferedstream busboy byline bytes cachetools chai chalk cli-table clipboard codemirror colors com.github.tommyettinger.blazingchain com.microsoft.azure.azure-data-lake-store-sdk com.microsoft.sqlserver.mssql-jdbc commander common-tags compression console.table cookie cookie-parser cookie-session cronstrue crypto-browserify csrf csurf definitely eventEmitter express express-http-context express-params express-zip forever form-data fs-extra function-rate-limit future fuzzy generic-pool google-auto-auth iconv-lite imagesize int24 is-my-json-valid jade jq jquery jquery-serializeobject jquery.ba-serializeobject jquery.event.drag jquery.form jquery.ui json-stable-stringify jsonfile jsonschema jsonwebtoken jszip keygrip

keysim knex less-middleware lodash lunr matic memcached method-override mini-css-extract-plugin minilog mkdirp moment moment-jdateformatparser moment-timezone mongoose morgan morgan-json mysql net.razorvine.pyrolite netifaces nock nodemailer org.checkerframework.checker-compat-qual org.checkerframework.checker-qual org.mockito.mockito-core org.slf4j.jcl-over-slf4j org.slf4j.jul-to-slf4j org.slf4j.slf4j-api org.slf4j.slf4j-log4j12 pace passport passport-azure-ad passport-http passport-http-bearer passport-ldapauth passport-local passport-saml passport-saml-metadata passport-strategy password-validator pegjs pg pg-hstore pip png-img promise-retry prop-types punycode py-cpuinfo python-crfsuite python-six pytz pyyaml query-string querystring randexp rapidjson react react-day-picker react-dom

react-hot-loader react-modal react-router-dom react-select react-switch react-table react-virtualized recursive-readdir redefine requestretry require-jade require-json retry retry-as-promised rotating-file-stream safe-json-stringify sequelize setuptools simple-ldap-search simplejson singledispatch six slick.core slick.grid slick.headerbuttons slick.headerbuttons.css slick.rowselectionmodel snappy sphinx-rtd-theme split-pane sql stream-meter supertest tar-fs temp to-case tv4 ua-parser-js umzug underscore.string unicode-length universal-analytics urijs uritools url urllib3 user-agent-parser uuid validator wheel winston winston-daily-rotate-file ws yargs zxcvbn

This product also includes the following libraries which are covered by The MIT License; BSD 3-clause License:

antlr4-cpp-runtime

This product also includes the following libraries which are covered by The MIT license:

org.codehaus.mojo.animal-sniffer-annotations

This product also includes the following libraries which are covered by The MIT/X:

vnc2flv

This product also includes the following libraries which are covered by The MIT/X11 license:

optimist

This product also includes the following libraries which are covered by The MPL 2.0:

pathspec

This product also includes the following libraries which are covered by The MPL 2.0 or EPL 1.0:

com.h2database.h2

This product also includes the following libraries which are covered by The MPL-2.0:

certifi

This product also includes the following libraries which are covered by The Microsoft Software License:

com.microsoft.sqlserver.sqljdbc4

This product also includes the following libraries which are covered by The Mozilla Public License, Version 2.0:

org.mozilla.rhino

This product also includes the following libraries which are covered by The New BSD License:

asm.asm backbone-queryparams bloomfilter com.esotericsoftware.kryo-shaded com.esotericsoftware.minlog com.googlecode.protobuf-java-format.protobuf-java-format d3 domReady double-conversion gflags glog gperftools gtest org.antlr.antlr-master org.codehaus.janino.commons-compiler org.codehaus.janino.janino org.hamcrest.hamcrest-core pcre protobuf re2 require-text sha1 termcolor topojson triflow

vega websocketpp

This product also includes the following libraries which are covered by The New BSD license:

com.google.protobuf.nano.protobuf-javanano

This product also includes the following libraries which are covered by The Oracle Technology Network License:

com.oracle.ojdbc6

This product also includes the following libraries which are covered by The PSF:

futures heap typing

This product also includes the following libraries which are covered by The PSF license:

functools32 python

This product also includes the following libraries which are covered by The PSF or ZPL:

wsgiref

This product also includes the following libraries which are covered by The Proprietary:

com.trifacta.connect.trifacta-TYcassandra com.trifacta.connect.trifacta-TYdb2 com.trifacta.connect.trifacta-TYgreenplum com.trifacta.connect.trifacta-TYhive com.trifacta.connect.trifacta-TYinformix com.trifacta.connect.trifacta-TYmongodb com.trifacta.connect.trifacta-TYmysql com.trifacta.connect.trifacta-TYopenedgewp com.trifacta.connect.trifacta-TYoracle com.trifacta.connect.trifacta-TYoraclesalescloud com.trifacta.connect.trifacta-TYpostgresql com.trifacta.connect.trifacta-TYredshift com.trifacta.connect.trifacta-TYrightnow com.trifacta.connect.trifacta-TYsforce com.trifacta.connect.trifacta-TYsparksql com.trifacta.connect.trifacta-TYsqlserver com.trifacta.connect.trifacta-TYsybase jsdata

This product also includes the following libraries which are covered by The Public Domain:

aopalliance.aopalliance org.jboss.xnio.xnio-api org.jboss.xnio.xnio-nio org.tukaani.xz protobuf-to-dict pycrypto simple-xml-writer

This product also includes the following libraries which are covered by The Public domain:

net.iharder.base64

This product also includes the following libraries which are covered by The Python Software Foundation License:

argparse backports-abc ipaddress python-setuptools regex

This product also includes the following libraries which are covered by The Tableau Binary Code License Agreement:

com.tableausoftware.common.tableaucommon com.tableausoftware.extract.tableauextract

This product also includes the following libraries which are covered by The Apache License, Version 2.0:

com.fasterxml.woodstox.woodstox-core com.google.cloud.bigtable.bigtable-client-core com.google.cloud.datastore.datastore-v1-proto-client io.opencensus.opencensus-api io.opencensus.opencensus-contrib-grpc-metrics io.opencensus.opencensus-contrib-grpc-util io.opencensus.opencensus-contrib-http-util org.spark-project.spark.unused software.amazon.ion.ion-java

This product also includes the following libraries which are covered by The BSD 3-Clause License:

org.fusesource.leveldbjni.leveldbjni-all

This product also includes the following libraries which are covered by The GNU General Public License, Version 2:

mysql.mysql-connector-java

This product also includes the following libraries which are covered by The Go license:

com.google.re2j.re2j

This product also includes the following libraries which are covered by The MIT License (MIT):

com.microsoft.azure.azure com.microsoft.azure.azure-annotations com.microsoft.azure.azure-client-authentication com.microsoft.azure.azure-client-runtime com.microsoft.azure.azure-keyvault com.microsoft.azure.azure-keyvault-core com.microsoft.azure.azure-keyvault-webkey com.microsoft.azure.azure-mgmt-appservice com.microsoft.azure.azure-mgmt-batch com.microsoft.azure.azure-mgmt-cdn com.microsoft.azure.azure-mgmt-compute com.microsoft.azure.azure-mgmt-containerinstance com.microsoft.azure.azure-mgmt-containerregistry com.microsoft.azure.azure-mgmt-containerservice com.microsoft.azure.azure-mgmt-cosmosdb com.microsoft.azure.azure-mgmt-dns com.microsoft.azure.azure-mgmt-graph-rbac com.microsoft.azure.azure-mgmt-keyvault

com.microsoft.azure.azure-mgmt-locks com.microsoft.azure.azure-mgmt-network com.microsoft.azure.azure-mgmt-redis com.microsoft.azure.azure-mgmt-resources com.microsoft.azure.azure-mgmt-search com.microsoft.azure.azure-mgmt-servicebus com.microsoft.azure.azure-mgmt-sql com.microsoft.azure.azure-mgmt-storage com.microsoft.azure.azure-mgmt-trafficmanager com.microsoft.rest.client-runtime

This product also includes the following libraries which are covered by The MIT License: http://www.opensource.org/licenses/mit-license.php:

probableparsing usaddress

This product also includes the following libraries which are covered by The New BSD License:

net.sf.py4j.py4j org.jodd.jodd-core

This product also includes the following libraries which are covered by The Unicode/ICU License:

com.ibm.icu.icu4j

This product also includes the following libraries which are covered by The Unlicense:

moment-range stream-buffers

This product also includes the following libraries which are covered by The WTFPL:

org.reflections.reflections

This product also includes the following libraries which are covered by The http://www.apache.org/licenses/LICENSE-2.0:

tornado

This product also includes the following libraries which are covered by The new BSD:

scikit-learn

This product also includes the following libraries which are covered by The new BSD License:

decorator

This product also includes the following libraries which are covered by The provided without support or warranty:

org.json.json

This product also includes the following libraries which are covered by The public domain, Python, 2-Clause BSD, GPL 3 (see COPYING.txt):

docutils

This product also includes the following libraries which are covered by The zlib License:

zlib

Contact Support

Do you need further assistance? Check out the resources below.

Resources for Support

Search Support: In Trifacta® Wrangler Enterprise, click the Help icon and select Search Help to search our help content. If your question is not answered through search, you can file a support ticket through the Support Portal (see below).

Download logs: When you report your issue, please acquire the relevant logs available in the Trifacta application. Select Help menu > Download logs. See Download Logs Dialog.

Email: [email protected]

Trifacta Community and Support Portal: The Trifacta Community and Support Portal can be reached at: https://community.trifacta.com

Within our Community, you can:

Manage your support cases
Get free Wrangler certifications
Post questions to the community
Search our AI-driven knowledgebase
Answer questions to earn points and get on the leaderboard
Watch tutorials, access documentation, learn about new features, and more

For more information about other products from Trifacta®, please visit www.trifacta.com.

Copyright © 2021 - Trifacta, Inc. All rights reserved.