Install Guide for

Data Preparation for Amazon Redshift and S3

Version: 6.4.2 Doc Build Date: 11/26/2019 Copyright © Trifacta Inc. 2019 - All Rights Reserved. CONFIDENTIAL

These materials (the “Documentation”) are the confidential and proprietary information of Trifacta Inc. and may not be reproduced, modified, or distributed without the prior written permission of Trifacta Inc.

EXCEPT AS OTHERWISE PROVIDED IN AN EXPRESS WRITTEN AGREEMENT, TRIFACTA INC. PROVIDES THIS DOCUMENTATION AS-IS AND WITHOUT WARRANTY AND TRIFACTA INC. DISCLAIMS ALL EXPRESS AND IMPLIED WARRANTIES TO THE EXTENT PERMITTED, INCLUDING WITHOUT LIMITATION THE IMPLIED WARRANTIES OF MERCHANTABILITY, NON-INFRINGEMENT AND FITNESS FOR A PARTICULAR PURPOSE AND UNDER NO CIRCUMSTANCES WILL TRIFACTA INC. BE LIABLE FOR ANY AMOUNT GREATER THAN ONE HUNDRED DOLLARS ($100) BASED ON ANY USE OF THE DOCUMENTATION.

For third-party license information, please select About Trifacta from the User menu.

1. Release Notes  4
1.1 Changes to System Behavior  4
1.1.1 Changes to the Language  4
1.1.2 Changes to the APIs  16
1.1.3 Changes to Configuration  21
1.1.4 Changes to the Command Line Interface  23
1.1.5 Changes to the Object Model  25
1.2 Release Notes 6.4  29
1.3 Release Notes 6.0  35
1.4 Release Notes 5.1  42
2. Install Overview  48
2.1 Install from AWS Marketplace  49
3. Install Software  53
3.1 Install Dependencies without Internet Access  53
3.2 Install for Docker  55
3.3 Install on CentOS and RHEL  64
3.4 Install on Ubuntu  67
3.5 License Key  71
3.6 Install Desktop Application  73
3.7 Start and Stop the Platform  76
3.8 Login  78
4. Install Reference  79
4.1 Install SSL Certificate  79
4.2 Change Listening Port  83
4.3 Supported Deployment Scenarios for AWS  84
4.4 Uninstall  88
5. Configure for AWS  89
5.1 Configure for EC2 Role-Based Authentication  95
6. Enable S3 Access  96
6.1 Create Redshift Connections  109
7. Upgrade Overview  111
7.1 Upgrade for AWS Marketplace  111
8. Third-Party License Information  118

Release Notes

This section contains release notes for published versions of Trifacta® Wrangler Enterprise.

Topics:

Changes to System Behavior
  Changes to the Language
  Changes to the APIs
  Changes to Configuration
  Changes to the Command Line Interface
  Changes to the Object Model
Release Notes 6.4
Release Notes 6.0
Release Notes 5.1
Release Notes 5.0
Release Notes 4.2
Release Notes 4.1
Release Notes 4.0

Changes to System Behavior

The following pages contain information about changes to system features, capabilities, and behaviors in this release.

Topics:

Changes to the Language
Changes to the APIs
Changes to Configuration
Changes to the Command Line Interface
Changes to the Object Model

Changes to the Language

Contents:

Release 6.4
  Improvements to metadata references
Release 6.3
  New Functions
  Optional input formats for DateFormat task
Release 6.2
  New Functions
  ARRAYELEMENTAT function accepts new inputs
Release 6.1
Release 6.0
  New Functions
  Changes to LIST* inputs
  Renamed functions
  FILL Function has new before and after parameters
Release 5.9
  New functions
Release 5.8
  File lineage information using source metadata references
  New math and statistical functions for arrays
Release 5.7
  WEEKNUM function now behaves consistently across running environments
Release 5.6
  URLPARAMS function returns null values
Release 5.1
  Wrangle now supports nested expressions
  SOURCEROWNUMBER function generates null values consistently
  New Functions
Release 5.0.1
  RAND function generates true random numbers
Release 5.0
  Required type parameter
  Deprecated aggregate transform
  New search terms
  Support for <> operator
  ROUND function takes optional number of digits
  New Functions
Release 4.2.1
Release 4.2
  New Filter transform
  New Case transform
  Rename transform now supports multi-column rename
  Drop specified columns or drop the others
  New string comparison functions
  NOW function returns 24-hour time values
  New Transforms
  New Functions

The following changes have been applied to Wrangle in this release of Trifacta® Wrangler Enterprise.

Release 6.4

Improvements to metadata references

Broader support for metadata references: For Excel files, $filepath references now return the location of the source Excel file. Sheet names are appended to the end of the reference. See Source Metadata References.

Release 6.3

New Functions

Function Name Description

PARSEDATE Function Evaluates an input against an array of Datetime format strings in their listed order. If the input matches one of the listed formats, the function outputs a Datetime value.
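For illustration only, a recipe step using this function might look like the following. The column name and format strings are hypothetical, and the exact argument syntax is documented in the PARSEDATE Function page.

derive type: single value: PARSEDATE(orderDate, ['yyyy-MM-dd', 'MM/dd/yyyy']) as: 'orderDate_parsed'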

Optional input formats for DateFormat task

The DateFormat task now supports a new parameter: Input Formats. This parameter specifies the date format to use when attempting to parse the input column.

If the parameter is specified, then its value is used to parse the inputs. If the parameter is not specified (the default), then the following common formats are used for parsing the input:

'M/d/yy', 'MM/dd/yy', 'MM-dd-yy', 'M-d-yy', 'MMM d, yyyy', 'MMMM d, yyyy', 'EEEE, MMMM d, yyyy', 'MMM d yyyy', 'MMMM d yyyy', 'MM-dd-yyyy', 'M-d-yyyy', 'yyyy-MM-ddXXX', 'dd/MM/yyyy', 'd/M/yyyy', 'MM/dd/yyyy', 'M/d/yyyy', 'yyyy/M/d', 'M/d/yy h:mm a', 'MM/dd/yy h:mm a', 'MM-dd-yy h:mm a', 'MMM dd yyyy HH.MM.SS xxx', 'M-d-yy h:mm a', 'MMM d, yyyy h:mm:ss a', 'EEEE, MMMM d, yyyy h:mm:ss a X', 'EEE MMM dd HH:mm:ss X yyyy', 'EEE, d MMM yyyy HH:mm:ss X', 'd MMM yyyy HH:mm:ss X', 'MM-dd-yyyy h:mm:ss a', 'M-d-yyyy h:mm:ss a', 'yyyy-MM-dd h:mm:ss a', 'yyyy-M-d h:mm:ss a', 'yyyy-MM-dd HH:mm:ss.S', 'dd/MM/yyyy h:mm:ss a', 'd/M/yyyy h:mm:ss a', 'MM/dd/yyyy h:mm:ss a', 'M/d/yyyy h:mm:ss a', 'MM/dd/yy h:mm:ss a', 'MM/dd/yy H:mm:ss', 'M/d/yy H:mm:ss', 'dd/MM/yyyy h:mm a', 'd/M/yyyy h:mm a', 'MM/dd/yyyy h:mm a', 'M/d/yyyy h:mm a', 'MM-dd-yy h:mm:ss a', 'M-d-yy h:mm:ss a', 'MM-dd-yyyy h:mm a', 'M-d-yyyy h:mm a', 'yyyy-MM-dd h:mm a', 'yyyy-M-d h:mm a', 'MMM.dd.yyyy', 'd/MMM/yyyy H:mm:ss X', 'dd/MMM/yy h:mm a',

These formats are a subset of the date formatting strings supported by the product. For more information, see Datetime Data Type.

Release 6.2

New Functions

Function Name Description

RANK Function Computes the rank of an ordered set of values within groups. Tie values are assigned the same rank, and the next ranking is incremented by the number of tie values.

DENSERANK Function Computes the rank of an ordered set of values within groups. Tie values are assigned the same rank, and the next ranking is incremented by 1.

ARRAYELEMENTAT function accepts new inputs

In previous releases, the ARRAYELEMENTAT function accepted a second input parameter to specify the index value of the element to retrieve. This "at" parameter had to be an Integer literal.

Beginning in this release, the function also accepts the following for this second "at" parameter:

Names of columns containing Integer values Functions that return Integer values

For more information, see ARRAYELEMENTAT Function.
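As a sketch with hypothetical column names, the second parameter can now be a column of Integer values or a function that returns an Integer (the use of ARRAYLEN here is purely illustrative):

derive type: single value: ARRAYELEMENTAT(myArray, idxColumn) as: 'elementAtIdx'
derive type: single value: ARRAYELEMENTAT(myArray, ARRAYLEN(myArray) - 1) as: 'lastElement'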

Release 6.1

None.

Release 6.0

New Functions

Function Name Description

ARRAYINDEXOF Function Computes the index at which a specified element is first found within an array. Indexing is left to right.

ARRAYRIGHTINDEXOF Function Computes the index at which a specified element is first found within an array, when searching right to left. Returned value is based on left-to-right indexing.

ARRAYSLICE Function Returns an array containing a slice of the input array, as determined by starting and ending index parameters.

ARRAYMERGEELEMENTS Function Merges the elements of an array in left to right order into a string. Values are optionally delimited by a provided delimiter.

Changes to LIST* inputs

The following LIST-based functions have been changed to narrow the accepted input data types. In previous releases, any data type was accepted for input, which was not valid for most data types.

In Release 6.0 and later, these functions accept only Array inputs. Inputs can be Array literals, a column of Arrays, or a function returning Arrays.
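For example (the column name is hypothetical), a valid call in Release 6.0 and later passes an Array column rather than a value of another type:

derive type: single value: LISTSUM(scoresArray) as: 'scoresTotal'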

NOTE: You should review any references to these functions in your recipes.

LIST* Functions

LISTAVERAGE Function

LISTMIN Function

LISTMAX Function

LISTMODE Function

LISTSTDEV Function

LISTSUM Function

LISTVAR Function

Renamed functions

The following functions have been renamed in Release 6.0.

Release 5.9 and earlier Release 6.0 and later

LISTUNIQUE Function UNIQUE Function

FILL Function has new before and after parameters

Prior to Release 6.0, the FILL function replaced empty cells with the most recent non-empty value.

In Release 6.0, before and after function parameters have been added. These parameters define the window of rows before and after the row being tested to search for non-empty values. Within this window, the most recent non-empty value is used.

The default values for these parameters are -1 and 0 respectively, which performs a search of an unlimited number of preceding rows for a non-empty value.

NOTE: Upon upgrade, the FILL function retains its preceding behavior, as the default values for the new parameters perform the same unlimited row search for non-empty values.

For more information, see FILL Function.
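As a minimal sketch with a hypothetical column name: the first expression below searches up to three rows before the current row and none after it for the most recent non-empty value, while the second uses the upgrade defaults and preserves the pre-6.0 unlimited backward search. The transform in which the function is placed, and its exact syntax, are described in the FILL Function page.

FILL(reading, 3, 0)
FILL(reading, -1, 0)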

Release 5.9

New functions

The following functions have been added in this release.

Function Description

ARRAYSORT Function Sorts array values in the specified column, array literal, or function that returns an array in ascending or descending order.

TRANSLITERATE Function Transliterates Asian script characters from one script form to another. The string can be specified as a column reference or a string literal.

Release 5.8

File lineage information using source metadata references

Beginning in Release 5.8, you can insert the following references into the formulas of your transformations. These source metadata references enable you to continue to track file lineage information from within your datasets as part of your wrangling project.

NOTE: These references apply only to file-based sources. Some additional limitations may apply.

Reference Description

$filepath Returns the full path and filename of the source of the dataset.

$sourcerownumber Returns the row number for the current row from the original source of the dataset.

NOTE: This reference is equivalent to the SOURCEROWNUMBER function, which is likely to be deprecated in a future release. You should begin using this reference in your recipes.

For more information, see Source Metadata References.

New math and statistical functions for arrays

The following functions can now be applied directly to arrays to derive meaningful statistics about them.

Function Description

LISTSUM Function Computes the sum of all numeric values found in input array. Input can be an array literal, a column of arrays, or a function returning an array. Input values must be of Integer or Decimal type.

LISTMAX Function Computes the maximum of all numeric values found in input array. Input can be an array literal, a column of arrays, or a function returning an array. Input values must be of Integer or Decimal type.

LISTMIN Function Computes the minimum of all numeric values found in input array. Input can be an array literal, a column of arrays, or a function returning an array. Input values must be of Integer or Decimal type.

LISTAVERAGE Function Computes the average of all numeric values found in input array. Input can be an array literal, a column of arrays, or a function returning an array. Input values must be of Integer or Decimal type.

LISTVAR Function Computes the variance of all numeric values found in input array. Input can be an array literal, a column of arrays, or a function returning an array. Input values must be of Integer or Decimal type.

LISTSTDEV Function Computes the standard deviation of all numeric values found in input array. Input can be an array literal, a column of arrays, or a function returning an array. Input values must be of Integer or Decimal type.

LISTMODE Function Computes the most common value of all numeric values found in input array. Input can be an array literal, a column of arrays, or a function returning an array. Input values must be of Integer or Decimal type.

Release 5.7

WEEKNUM function now behaves consistently across running environments

In Release 5.6 and earlier, the WEEKNUM function treated the first week of the year differently between the Trifacta Photon and Spark running environments:

Trifacta Photon week 1 of the year: The week that contains January 1. Spark week 1 of the year: The week that contains at least four days in the specified year.

This issue was caused by Spark following the ISO-8601 standard and relying on the Joda-Time DateTimeFormatter.

Beginning in Release 5.7, the WEEKNUM function behaves consistently for both Trifacta Photon and Spark:

Week 1 of the year: The week that contains January 1.

For more information, see WEEKNUM Function.

Release 5.6

URLPARAMS function returns null values

In Release 5.1 and earlier, the URLPARAMS function returned empty Objects when no answer was computed for the function.

In Release 5.6 and later, this function returns null values in the above case.

See URLPARAMS Function.
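For example (the column name is hypothetical), the following step extracts the query parameters of a URL column into an Object and, in Release 5.6 and later, yields a null value when nothing can be computed:

derive type: single value: URLPARAMS(sourceUrl) as: 'urlParams'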

Release 5.1

Wrangle now supports nested expressions

Beginning in Release 5.1, all Wrangle functions support nested expressions, which can be arithmetic calculations, column references, or other function calls.

NOTE: This feature is enabled by default, as this change does not break any steps created in previous versions of the product. It can be disabled if needed. See Miscellaneous Configuration.

NOTE: This capability represents a powerful enhancement to the language, as you can now use dynamic inputs for all functions.

The following expression is a valid transform in Wrangle. It locates the substring in myString that begins with the @ sign until the end of the string, inclusive:

derive value: substring(myString, find(myString, '@', true, 0), length(myString))

Nested arithmetic expressions:

Suppose you wanted just the value after the @ sign until the end of the string. Prior to Release 5.1, the following generated a validation error:

derive value: substring(myString, find(myString, '@', true, 0) + 1, length(myString))

In the above, the addition of +1 to the second parameter is a nested expression and was not supported. Instead, you had to use multiple steps to generate the string value.

Beginning in Release 5.1, the above single-step transform is supported.

Nested column references:

In addition to arithmetic expressions, you can nest column references. In the following example, the previous step has been modified to replace the static +1 with a reference to a column containing the appropriate value (at_sign_offset):

derive value: substring(myString, find(myString, '@', true, 0) + at_sign_offset, length(myString))

Nested function references:

Now, you can combine multiple function references into a single computation. The following computes the volume of a cube with side length cube_side and then multiplies that volume by the number of cubes (cube_count) to compute the total cube_volume:

derive type: single value: MULTIPLY(POW(cube_side,3),cube_count) as: 'cube_volume'

For more information, see Wrangle Language.

SOURCEROWNUMBER function generates null values consistently

The SOURCEROWNUMBER function returns the row number of the row as it appears in the original dataset. After some operations, such as unions, joins, and aggregations, this row information is no longer available.

In Release 5.0.1 and earlier, the results were confusing. When source row information was not available, the function was simply not available for use.

In Release 5.1 and later, the behavior of the SOURCEROWNUMBER function is more consistent:

If the source row information is available, it is returned. If it is not available, the function can still be used, and it returns null values in all cases.

For more information, see SOURCEROWNUMBER Function.

New Functions

Function Name Description

ARRAYELEMENTAT Function Returns element value of input array for the provided index value.

DOUBLEMETAPHONE Function Returns primary and secondary phonetic spellings of an input string using the Double Metaphone algorithm.

DOUBLEMETAPHONEEQUALS Function Returns true if two strings match phonetic spellings using Double Metaphone algorithm. Tolerance threshold can be adjusted.

UNIQUE Function Generates a new column containing an array of the unique values from a source column.

Release 5.0.1

RAND function generates true random numbers

In Release 5.0 and earlier, the RAND function produced the same set of random numbers within the browser, after browser refresh, and over subsequent runs of a job.

During job execution, a default seed value was inserted as the basis for the function during the execution of the job. In some cases, this behavior is desired.

In Release 5.0.1 and later, the RAND function accepts an optional integer as a parameter. When this new seed value is inserted, the function generates deterministic, pseudo-random values.

This version matches the behavior of the old function.

NOTE: On all upgraded instances of the platform, references to the RAND function have been converted to use a default seed value, so that previous behavior is maintained in the upgraded version.

If no seed value is inserted as a parameter, the RAND function generates true random values within the browser, after browser refresh, and over subsequent job runs.

NOTE: Be aware that modifying your dataset based on the generated values of RAND() may have unpredictable effects later in your recipe and downstream of it.

For more information, see RAND Function.
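A short sketch of both usages (the column names and the seed value are arbitrary):

derive type: single value: RAND() as: 'rand_true'
derive type: single value: RAND(2) as: 'rand_seeded'

The first step generates true random values on each evaluation; the second generates deterministic, pseudo-random values tied to the seed.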

Release 5.0

Required type parameter

Prior to Release 5.0, the following was a valid Wrangle step:

derive value:colA + colB as:'colC'

Beginning in Release 5.0, the type parameter is required. This parameter defines whether the transform is a single or multi-row formula. In the Transform Builder, this value must be specified.

The following is valid in Release 5.0:

derive type:single value:colA + colB as:'colC'

See Derive Transform.

See Transform Builder.

Deprecated aggregate transform

In Release 4.2.1 and earlier, the aggregate transform could be used to aggregate your datasets using aggregation functions and groupings.

In Release 5.0 and later, this transform has been merged into the pivot transform. The aggregate transform has been deprecated and is no longer available.

NOTE: During upgrade to Release 5.0 and later, recipes that had previously used the aggregate transform are automatically migrated to use the pivot equivalent.

Example 1

Release 4.2.1 and earlier Aggregate:

aggregate value:AVERAGE(Scores)

Release 5.0 and later Pivot:

pivot value: AVERAGE(Scores) limit: 1

The limit parameter defines the maximum number of columns that can be generated by the pivot.

Example 2

Aggregate:

aggregate value:AVERAGE(Scores) group:studentId

Pivot:

pivot group: studentId value: AVERAGE(Scores) limit: 1

For more information, see Pivot Transform.

New search terms

In the new Search panel, you can search for terms that can be used to select transformations for quick population of parameters. In the following table, you can see how Wrangle terminology has changed in Release 5.0 for some common transforms from earlier releases.

Tip: You can paste the Release 5.0 terms in the Search panel to locate the same transformations used in earlier releases.

Release 4.2.1 and earlier transforms Release 5.0 and later search terms

aggregate pivot

keep filter

delete filter

extract on: extractpatterns

extract at: extractpositions

extract before: extractbetweendelimiters

extract after: extractbetweendelimiters

replace on: replacepatterns

replace at: replacepositions

replace before: replacebetweenpatterns

replace after: replacebetweenpatterns

replace from: replacebetweenpatterns

replace to: replacebetweenpatterns

split on: splitpatterns

split delimiters: splitpositions

split every: splitpositions

split positions: splitpositions

split after: splitpatterns

split before: splitpatterns

split from: splitpatterns

split to: splitpatterns

Support for <> operator

Prior to Release 5.0, the following operator was used to test "not equal" comparisons:

!=

Beginning in Release 5.0, the following operator is also supported:

<>

Example:

derive value:IF ((col1 <> col2), 'different','equal') as:'testNotEqual'

Tip: Both of the above operators are supported, although the <> operator is preferred.

For more information, see Comparison Operators.

ROUND function takes optional number of digits

The ROUND function now supports rounding to a specified number of digits. By default, values are rounded to the nearest integer, as before. See ROUND Function.
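For example, the following rounds to two digits (the values shown are illustrative):

derive type: single value: ROUND(3.14159, 2) as: 'roundedTwoDigits'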

New Functions

Function Name Description

DEGREES Function Generates the value in degrees for an input radians value.

EXACT Function Compares two strings to see if they are exact matches.

FILTEROBJECT Function Filters the keys and values from an Object based on specified keys.

HOST Function Returns the host value from a URL.

ISEVEN Function Returns true if an Integer, function returning an Integer, or a column contains an even value.

ISODD Function Returns true if an Integer, function returning an Integer, or a column contains an odd value.

KTHLARGESTUNIQUE Function Computes the kth-ranked unique value in a set of values.

LCM Function Returns the least common multiple between two input values.

MODE Function Computes the mode (most common) value for a set of values.

MODEIF Function Computes the mode based on a conditional test.

PAD Function Pads the left or right side of a value with a specified character string.

PI Function Generates the value for pi to 15 decimal places.

RADIANS Function Generates the value in radians for an input degrees value.

RANDBETWEEN Function Generates a random Integer in a range between two specified values.

RIGHTFIND Function Locates a substring by searching from the right side of an input value.

ROLLINGCOUNTA Function Computes count of non-null values across a rolling window within a column.

ROLLINGKTHLARGEST Function Computes the kth largest value across a rolling window within a column.

ROLLINGKTHLARGESTUNIQUE Function Computes the kth largest unique value across a rolling window within a column.

ROLLINGLIST Function Computes list of all values across a rolling window within a column.

ROLLINGMAX Function Computes maximum value across a rolling window within a column.

ROLLINGMIN Function Computes minimum value across a rolling window within a column.

ROLLINGMODE Function Computes mode (most common) value across a rolling window within a column.

ROLLINGSTDEV Function Computes standard deviation across a rolling window within a column.

ROLLINGVAR Function Computes variance across a rolling window within a column.

SIGN Function Computes the positive or negative sign of an input value.

TRUNC Function Truncates a value to the nearest integer or a specified number of digits.

URLPARAMS Function Extracts any query parameters from a URL into an Object.

WEEKNUM Function Calculates the week that the date appears during the year (1-52).

Release 4.2.1

None.

Release 4.2

New Filter transform

Perform a variety of predefined row filtrations using the new filter transform, or apply your own custom formula to keep or delete rows from your dataset.

See Remove Data. See Filter Transform.

New Case transform

Beginning in Release 4.2, you can use the Transform Builder to simplify the construction of CASE statements. For each case, specify the conditional and resulting expression in separate textboxes.

See Apply Conditional Transformations. See Case Transform.

Rename transform now supports multi-column rename

Use the rename transform to rename multiple columns in a single transform.

See Rename Columns. See Rename Transform.

Drop specified columns or drop the others

The drop transform now supports the option of dropping all columns except the ones specified in the transform. See Drop Transform.

New string comparison functions

Compare two strings using Latin collation settings. See below.

NOW function returns 24-hour time values

In Release 4.1.1 and earlier, the NOW function returned time values for the specified time zone in 12-hour time, which was confusing.

In Release 4.2 and later, this function returns values in 24-hour time.

New Transforms

Transform Name Documentation

case Case Transform

filter Filter Transform

New Functions

Function Name Documentation

STRINGGREATERTHAN STRINGGREATERTHAN Function

STRINGGREATERTHANEQUAL STRINGGREATERTHANEQUAL Function

STRINGLESSTHAN STRINGLESSTHAN Function

STRINGLESSTHANEQUAL STRINGLESSTHANEQUAL Function

SUBSTITUTE SUBSTITUTE Function

Changes to the APIs

Contents:

Changes for Release 6.4.2
  Overrides to relational sources and targets
Changes for Release 6.4
  v4 version of password reset request endpoint
  Changes to awsConfig object
Changes for Release 6.3
  Assign AWSConfigs to a user at create time
Changes for Release 6.0
  Error in Release 6.0.x API docs
  Planned End of Life of v3 API endpoints
Changes for Release 5.9
  Introducing Access Tokens
Changes for Release 5.1
Changes for Release 5.0
  Introducing v4 APIs
Changes for Release 4.2
  Create Hive and Redshift connections via API
  WrangledDataset endpoints are still valid

Review the changes to the publicly available REST APIs for the Trifacta® platform for the current release and past releases.

Changes for Release 6.4.2

Overrides to relational sources and targets

Through the APIs, you can now apply overrides to relational sources and targets during job execution or deployment import.

jobGroups

When you are running a job, you can override the default publication settings for the job using overrides in the request. For more information, see API Workflow - Run Job.
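As a rough sketch only: the request below runs a job with a write-settings override. The dataset id, output path, and override keys shown here are placeholders and assumptions; the authoritative request format is described in API Workflow - Run Job.

POST /v4/jobGroups

{
  "wrangledDataset": { "id": 28629 },
  "overrides": {
    "writesettings": [
      { "path": "s3://example-bucket/output/POS.csv", "format": "csv", "action": "create" }
    ]
  }
}

Relational sources and targets are overridden through the same overrides mechanism in the request.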

Deployments

When you import a flow package into a deployment, you may need to remap the source and output of the flow to use production versions of your data. This capability has been present in the product for file-based sources and targets. Now, it's available for relational sources and targets. For more information, see Define Import Mapping Rules.

Changes for Release 6.4

v4 version of password reset request endpoint

To assist in migration from the command-line interface to using the APIs, a v4 version of an API endpoint has been made available to allow for administrators to generate password reset codes. For more information, see CLI Migration to APIs.

Changes to awsConfig object

NOTE: No action is required.

In Release 6.0, the awsConfig object was introduced to enable the assignment of AWS configurations to individual users (per-user auth) via API. This version of the awsConfig object supported a mapping of a single IAM role to an awsConfig object.

Beginning in Release 6.4, per-user authentication now supports mapping of multiple possible IAM roles to an individual user's configuration. To enable this one-to-many mapping, the awsRoles object was introduced.

An awsRoles object creates a one-to-one mapping between an IAM role and an awsConfig object. An awsConfig object can have multiple awsRoles assigned to it.

Changes to awsConfig object:

The role field in the object has been replaced by activeRoleId, which maps to the active role for the configuration object. For each role reference in the awsConfig objects, a corresponding awsRole object has been created and mapped to it.

Beginning in Release 6.4, you can create, edit, and delete awsRoles objects, which can be used to map an AWS IAM role ARN to a specified AWSConfig object. You can map multiple awsRoles to a single awsConfig.

For more information, see API Workflow - Manage AWS Configurations.

Changes for Release 6.3

Assign AWSConfigs to a user at create time

Beginning in Release 6.3, you can assign an AWSConfig object to a user when you create the object. This shortcut reduces the number of REST calls that you need to make.

NOTE: For security reasons, AWSConfig objects must be assigned to users at the time of creation. Admin users can assign to other users. Non-admin users are automatically assigned the AWSConfig objects that they create.

Prior to Release 6.3, AWSConfig objects were assigned through the following endpoint. Example:

/v4/people/2/awsConfigs/6

NOTE: This endpoint has been removed from the platform. Please update any scripts that reference the above endpoint to manage AWS configuration assignments through the new method described in the following link.

See API Workflow - Manage AWS Configurations.
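A sketch, with assumed field names and placeholder values (the supported fields are listed in API Workflow - Manage AWS Configurations): creating a configuration that is assigned to a user in the same call might look like the following.

POST /v4/awsConfigs

{
  "credentialProvider": "temporary",
  "role": "arn:aws:iam::123456789012:role/example-role",
  "personId": 2
}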

Changes for Release 6.0

Error in Release 6.0.x API docs

In Release 6.0 - Release 6.0.2, the online and PDF versions of the documentation referenced the following endpoint: API JobGroups Get Status v4. According to the docs, this endpoint was triggered in this manner:

Method GET

Endpoint /v4/jobGroups/<id>/status

This endpoint exists in v3 of the API endpoints. It does not exist in v4.

Instead, you should monitor the status field for the base GET endpoint for jobGroups. For more information, see API JobGroups Get v4.
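For example, a sketch of the replacement call, where <id> is a placeholder for the jobGroup identifier:

GET /v4/jobGroups/<id>

The status field of the returned jobGroup object reflects the job's progress and can be polled until the job completes.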

Planned End of Life of v3 API endpoints

In Release 6.0, the v3 API endpoints are supported.

In the next release of Trifacta Wrangler Enterprise after Release 6.0, the v3 API endpoints will be removed from the product (End of Life).

You must migrate to using the v4 API endpoints before upgrading to the next release after Release 6.0. For more information, see API Migration to v4.

Changes for Release 5.9

Introducing Access Tokens

Each request to the API endpoints of the Trifacta platform requires submission of authentication information. In Release 5.1 and earlier:

A request could include clear-text username/password combinations. This method is not secure. A request could include a browser cookie. This method does not work well for use cases outside of the browser (e.g., scripts).

Beginning in Release 5.9, API users can manage authentication using access tokens. These tokens obscure any personally identifiable information and represent a standards-based method of secure authentication.

NOTE: All previous methods of API authentication are supported in this release. Access tokens are the preferred method of authentication.

The basic process works in the following manner:

1. The API user requests generation of a new token.
   a. This initial request must contain a valid username and password.
   b. The request includes an expiration.
   c. The token value is returned in the response.
2. The token value is inserted into the Authorization header of each request to the platform.
3. The user monitors the current time and the expiration time of the token. At any time, the user can request a new token to be generated using the same endpoint used in the initial request.

Access tokens can be generated via API or the Trifacta application.

NOTE: This feature must be enabled in your instance of the platform. See Enable API Access Tokens.

API: For more information, see API AccessTokens Create v4. Trifacta application: For more information, see Access Tokens Page.

For more information on API authentication methods, see API Authentication.
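A rough sketch of the token workflow follows. The endpoint path, request fields, and header usage shown here are assumptions based on the references above; the exact contract is defined in API AccessTokens Create v4 and Enable API Access Tokens.

POST /v4/apiAccessTokens
Authorization: Basic <base64-encoded username:password>

{ "lifetimeSeconds": 86400, "description": "automation token" }

GET /v4/jobGroups/<id>
Authorization: Bearer <tokenValue from the create response>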

Changes for Release 5.1

None.

Changes for Release 5.0

Introducing v4 APIs

NOTE: This feature is in Beta release.

Release 5.0 signals the introduction of version 4 of the REST APIs.

NOTE: At this time, a very limited number of v4 REST APIs are publicly available. Where possible, you should continue to use the v3 endpoints. For more information, see v4 Endpoints.

NOTE: v3 of the APIs are still supported as of the generally available date for Release 5.0. For more information, see v3 Endpoints.

v4 conventions

The following conventions apply to v4 and later versions of the APIs:

Parameter lists are consistently enveloped in the following manner:

{ "data": [ { ... } ] }

Field names are in camelCase and are consistent with the resource name in the URL or with the embed URL parameter. Compared to earlier API versions, foreign keys have been replaced with references like the following:

v3 and earlier v4 and later

"createdBy": 1, "creator": { "id": 1 },

"updatedBy": 2, "updater": { "id": 2 },

Publication endpoint references database differently. This change is to make the publishing endpoint for relational targets more flexible in the future.

v3 and earlier v4 and later

"database": "dbName", "path": ["dbName"],

Changes for Release 4.2

Create Hive and Redshift connections via API

You can create connections of these types via API:

Only one global Hive connection is still supported. You can create multiple Redshift connections.

See API Connections Create v3.

WrangledDataset endpoints are still valid

In Release 4.1.1 and earlier, the WrangledDataset endpoints enabled creation, modification, and deletion of a wrangled dataset object, which also created the associated recipe object.

In Release 4.2, wrangled datasets have been removed from the application interface. However, the WrangledDataset API endpoints remain, although they now apply to the recipe directly.

The following endpoints and methods are still available:

NOTE: In a future release, these endpoints may be migrated to recipe-based endpoints. API users should review this page for each release.

API WrangledDatasets Create v3 API WrangledDatasets Get List v3 API WrangledDatasets Get v3 API WrangledDatasets Delete v3 API WrangledDatasets Get PrimaryInputDataset v3 API WrangledDatasets Put PrimaryInputDataset v3

For more information, see Changes to the Object Model.

Changes to Configuration

Contents:

Release Updates
  Release 6.4.1
  Release 6.4
  Release 6.0
Configuration Mapping

To centralize common enablement configuration on a per-workspace basis, a number of configuration properties are being migrated from trifacta-conf.json into the Trifacta® application.

In prior releases, some of these settings may have been surfaced in the Admin Settings page. See Admin Settings Page. For more information on configuration in general, see Platform Configuration Methods.

To assist administrators in managing these settings, this section provides a per-release set of updates and a map of old properties to new settings.

These settings now appear in the Workspace Admin page. To access, from the left menu bar select Settings menu > Settings > Workspace Admin. For more information, see Workspace Admin Page.

Release Updates

Release 6.4.1

The following parameters were moved in this release:

Admin Settings or trifacta-conf.json setting | Workspace Admin setting | Update

webapp.connectivity.customSQLQuery.enabled | Enable custom SQL query | Feature is now enabled at the workspace or tier level. For more information on this feature, see Enable Custom SQL Query.

webapp.enableTypecastOutput | Schematized output | Feature is now enabled at the workspace or tier level. For more information, see Miscellaneous Configuration.

Release 6.4

Parameters from Release 6.0 that are no longer available in the Workspace Admin page are now enabled by default for all users.

Some new features for this release may be enabled or disabled through the Workspace Admin page.

See Workspace Admin Page.

Release 6.0

Initial release of the Workspace Admin page. See below for configuration mapping.

Configuration Mapping

The following mapping between old trifacta-conf.json settings and new Workspace Admin settings is accurate for the current release.

Admin Settings or trifacta-conf.json setting | Workspace Admin setting | Update

webapp.walkthrough.enabled Product walkthroughs

webapp.session.durationInMins Session duration

webapp.enableDataDownload Sample downloads

webapp.enableUserPathModification Allow the user to modify their paths

webapp.enableSelfServicePasswordReset Enable self service password reset

outputFormats.Parquet Parquet output format

outputFormats.JSON JSON output format

outputFormats.CSV CSV output format

outputFormats.Avro Avro output format

outputFormats.TDE TDE output format

feature.scheduling.enabled Enable Scheduling feature

feature.scheduling.schedulingManagementUI | Scheduling management UI (experimental) | Now hidden.

feature.scheduling.upgradeUI | Show users a modal to upgrade to a plan with Scheduling | Now hidden.

feature.sendFlowToOtherPeople.enabled Allow users to send copies of flows to other users

feature.flowSharing.enabled Enable Flow Sharing feature

feature.flowSharing.upgradeUI | Show users a modal to upgrade to a plan with Flow Sharing | Now hidden.

feature.enableFlowExport Allow users to export their flows

feature.enableFlowImport Allow users to import flows into the platform

feature.hideAddPublishingActionOptions Forbid users to add non-default publishing actions

feature.hideUnderlyingFileSystem Hide underlying file system to users

feature.publishEnabled Enable publishing

feature.rangeJoin.enabled Enable UI for range join

feature.showDatasourceTabs Show datasource tabs in the application

feature.showFileLocation Show file location

feature.showOutputHomeDirectory Show output directory in profile view

feature.showUploadDirectory Show upload directory in profile view

metadata.branding | Branding to use in-product for this deployment | Now hidden.

webapp.connectivity.enabled Enable Connectivity feature

webapp.connectivity.upgradeUI Show users a modal to upgrade to a plan with Connectivity

feature.apiAccessTokens.enabled API Access Token

Changes to the Command Line Interface

Contents:

Changes for Release 6.4
Changes for Release 6.0
Changes for Release 5.1
Changes for Release 5.0
  CLI for Connections does not support Redshift and SQL DW connections
Changes for Release 4.2
  All CLI scripts with relational connections must be redownloaded
  Redshift credentials format has changed

Changes for Release 6.4

The command line interface (CLI) is now removed from the platform. CLI content is included in this release to assist with migration to the v4 APIs. For more information, see CLI Migration to APIs.

Legacy versions or instances of the CLI on the Trifacta® node are not supported for use with Release 6.4 and later of the Trifacta platform.

Changes for Release 6.0

In the next release of Trifacta Wrangler Enterprise after Release 6.0, the Command Line Interface (CLI) will reach its End of Life (EOL). The tools will no longer be included in the software distribution and will not be supported for use against the software platform. You should transition away from using the CLI as soon as possible. For more information, see CLI Migration to APIs.

Changes for Release 5.1

None.

Changes for Release 5.0

CLI for Connections does not support Redshift and SQL DW connections

In Release 5.0, the management of Redshift and SQL DW connections through the CLI for Connections is not supported.

NOTE: Please create Redshift and SQL DW connections through the application. See Connections Page.

Changes for Release 4.2

All CLI scripts with relational connections must be redownloaded

Each CLI script that references a dataset through a connection to run a job must be re-downloaded from the application in Release 4.2.

Scripts from Release 4.1 that utilize the run_job command do not work in Release 4.2.

Requirements for Release 4.2 and later:

1. In the executing environment for the CLI script, the relational (JDBC) connection must exist and must be accessible to the user running the job.
2. When the CLI script is downloaded from the application, the connection ID in the datasources.tsv file must be replaced by a corresponding connection ID from the new environment.
   a. Connection identifiers can be retrieved using the list_connections command from the CLI. See CLI for Connections.

After the above changes have been applied to the CLI script, it should work as expected in Release 4.2. For more information, see Run job in CLI for Jobs.

Redshift credentials format has changed

In Release 4.1.1 and earlier, the credentials file used for Redshift connection was similar to the following:

{ "awsAccessKeyId": "", "awsSecretAccessKey": "", "user": "", "password": "" }

In Release 4.2:

The AWS key and secret, which were stored in trifacta-conf.json, do not need to be replicated in the Redshift credentials file. The Trifacta platform now supports EC2 role-based instance authentication. This configuration can be optionally included in the credentials file.

The credentials file format looks like the following:

{ "user": "", "password": "", "iamRoleArn": "" }

NOTE: For security purposes, you may wish to remove the AWS key/secret information from the Redshift credentials file.

NOTE: iamRoleArn is optional. For more information on using IAM roles, see Configure for EC2 Role-Based Authentication.

Changes to the Object Model

Contents:

Release 6.4
  Macros
Release 6.0
Release 5.1
Release 5.0
  Datasets with parameters
Release 4.2
  Wrangled datasets are removed
  Recipes can be reused and chained
  Introducing References
  Introducing Outputs
  Flow View Differences
  Connections as a first-class object

For more information on the objects available in the platform, see Object Overview.

Release 6.4

Macros

This release introduces macros, which are reusable sequences of parameterized steps. These sequences can be saved independently and referenced in recipes in other flows. See Overview of Macros.

Release 6.0

None.

Release 5.1

None.

Release 5.0

Datasets with parameters

Beginning in Release 5.0, imported datasets can be augmented with parameters, which enables operationalizing sampling and jobs based on date ranges, wildcards, or variables applied to the input path. For more information, see Overview of Parameterization.

Release 4.2

In Release 4.2, the object model has undergone the following revisions to improve flexibility and control over the objects you create in the platform.

Wrangled datasets are removed

In Release 3.2, the object model introduced the concepts of imported datasets, recipes, and wrangled datasets. These objects represented data that you imported, steps that were applied to that data, and data that was modified by those steps.

In Release 4.2, the wrangled dataset object has been removed and replaced by the objects listed below. All of the functionality associated with a wrangled dataset remains, including the following actions. Next to each action is the new object with which it is now associated.

Wrangled Dataset action Release 4.2 object

Run or schedule a job Output object

Preview data Recipe object

Reference to the dataset Reference object

NOTE: At the API level, the wrangledDataset endpoint continues to be in use. In a future release, separate endpoints will be available for recipes, outputs, and references. For more information, see API Reference.

These objects are described below.

Recipes can be reused and chained

Since recipes are no longer tied to a specific wrangled dataset, you can now reuse recipes in your flow. Create a copy with or without inputs and move it to a new flow if needed. Some cleanup may be required.

This flexibility allows you to create, for example, recipes that are applicable to all of your datasets for initial cleanup or other common wrangling tasks.

Additionally, recipes can be created from recipes, which allows you to create chains of recipes. This sequencing allows for more effective management of common steps within a flow.

Introducing References

Before Release 4.2, reference datasets existed and were represented in the user interface. However, these objects existed in the downstream flow that consumes the source. If you had adequate permissions to reference a dataset from outside of your flow, you could pull it in as a reference dataset for use.

In Release 4.2, a reference is a link from a recipe in your flow to other flows. This object allows you to expose your flow's recipe for use outside of the flow. So, from the source flow, you can control whether your recipe is available for use.

This object allows you to have finer-grained control over the availability of data in other flows. It is a dependent object of a recipe.

NOTE: For multi-dataset operations such as union or join, you must now explicitly create a reference from the source flow and then union or join to that object. In previous releases, you could directly join or union to any object to which you had access.

Introducing Outputs

In Release 4.1, outputs became a configurable object that was part of the wrangled dataset. For each wrangled dataset, you could define one or more publishing actions, each with its own output types, locations, and other parameters. For scheduled executions, you defined a separate set of publishing actions. These publishing actions were attached to the wrangled dataset.

In Release 4.2, an output is a defined set of scheduled or ad-hoc publishing actions. With the removal of the wrangled dataset object, outputs are now top-level objects attached to recipes. Each output is a dependent object of a recipe.

Flow View Differences

Below, you can see the same flow as it appears in Release 4.1 and Release 4.2. In each Flow View:

The same datasets have been imported. POS-r01 has been unioned to POS-r02 and POS-r03.

POS-r01 has been joined to REF-PROD, and the column containing the duplicate join key in the result has been dropped. In addition to the default CSV publishing action (output), a scheduled one has been created in JSON format and scheduled for weekly execution.

Release 4.1 Flow View

Release 4.2 Flow View

Flow View differences

The wrangled dataset no longer exists.
In Release 4.1, scheduling is managed off of the wrangled dataset. In Release 4.2, it is managed through the new output object.
Outputs are configured in a very similar manner, although in Release 4.2 the tab is labeled "Destinations."
There are no changes to the scheduling UI.

Like the output object, the reference object is an externally visible link to a recipe in Flow View. This object just enables referencing the recipe object in other flows. See Flow View Page.

Other differences

In application pages where you can select tabs to view object types, the available selections are typically: All, Imported Dataset, Recipe, and Reference.
Wrangled datasets have been removed from the Dataset Details page, which means that the job cards for your dataset runs have been removed. These cards are still available in the Jobs page when you click the drop-down next to the job entry.
The list of jobs for a recipe is now available through the output object in Flow View. Select the object and review the job details through the right panel.
In Flow View and the Transformer page, context menu items have changed.

Connections as a first-class object

In Release 4.1.1 and earlier, connections appeared as objects to be created or explored in the Import Data page. Through the left navigation bar, you could create or edit connections to which you had permission to do so. Connections were also selections in the Run Job page.

Only administrators could create public connections. End-users could create private connections.

In Release 4.2, the Connections Manager enables you to manage your personal connections and (if you're an administrator) global connections. Key features:

Connections can be managed like other objects. Connections can be shared, much like flows. When a flow with a connection is shared, its connection is automatically shared. For more information, see Overview of Sharing. Release 4.2 introduces a much wider range of connectivity options. Multiple Redshift connections can be created through this interface. In prior releases, you could only create a single Redshift connection, and it had to be created through the command line interface (CLI).

NOTE: Beginning in Release 4.2, all connections are initially created as private connections, accessible only to the user who created them. Connections that are available to all users of the platform are called public connections. You can make connections public through the Connections page.

For more information, see Connections Page.

Release Notes 6.4

Contents:

Release 6.4.2
  What's New
  Changes in System Behavior
  Key Bug Fixes
  New Known Issues
Release 6.4.1
  What's New
  Changes in System Behavior
  Key Bug Fixes
  New Known Issues
Release 6.4
  What's New
  Changes in System Behavior
  Key Bug Fixes
  New Known Issues

Release 6.4.2

November 15, 2019

This release is primarily a bug fix release with the following new features.

What's New

API:

Apply overrides at time of job execution via API. Define import mapping rules for your deployments that use relational sources or publish to relational targets. See Changes to the APIs.

Job execution:

By default, the Trifacta application permits up to four jobs from the same flow to be executed at the same time. If needed, you can configure the application to execute jobs from the same flow one at a time. See Configure Application Limits.

Changes in System Behavior

None.

Key Bug Fixes

Ticket Description

TD-44548 RANGE function returns null values if more than 1000 values in output.

TD-44494 Lists are not correctly updated in Deployment mode

TD-44311 Out of memory error when running a flow with many output objects

TD-44188 Performance is poor for SQL DW connection

TD-43877 Preview after a DATEFORMAT step does not agree with results or profile values

TD-44035 Spark job failure from Excel source

TD-43849 Export flows are broken when recipe includes Standardization or Transform by Example tasks.

NOTE: This Advanced Feature is available in Trifacta Wrangler Enterprise under a separate, additional license. If it is not available under your current license, do not enable it for use. Please feel free to contact your representative.

New Known Issues

None.

Release 6.4.1

August 30, 2019

This release includes bug fixes and introduces SSO connections for Azure relational sources.

What's New

Connectivity:

You can now leverage your Azure AD SSO infrastructure to create SSO connections to Azure relational databases. For more information, see Enable SSO for Azure Relational Connections.

Changes in System Behavior

Configuration changes:

The parameter to enable custom SQL query has been moved to the Workspace Admin page. The parameter to disable schematized output has been moved to the Workspace Admin page. For more information, see Changes to Configuration.

Key Bug Fixes

Ticket Description

TD-39086 Hive ingest job fails on Microsoft Azure.

New Known Issues

None.

Release 6.4

August 1, 2019

This release of Trifacta® Wrangler Enterprise features broad improvements to the recipe development experience, including multi-step operations and improved copy and paste within the Recipe panel. As a result of the panel's redesign, you can now create user-defined macros, which are sets of sequenced and parameterized steps for easy reuse and adaptation for other recipes. When jobs are executed, detailed monitoring provides enhanced information on progress of the job through each phase of the process. You can also connect to a broader ecosystem of sources and targets, including enhancements to the integration with Tableau Server and AWS Glue. New for this release: read from your Snowflake sources. Read on for additional details on new features and enhancements.

What's New

Transformer Page:

The redesigned Recipe panel enables multi-step operations and more robust copy and paste actions. See Recipe Panel.
Introducing user-defined macros, which enable saving and reusing sequences of steps. For more information, see Overview of Macros.
Transform by example output values for a column of values. See Transformation by Example Page. For an overview of this feature, see Overview of TBE.
Browse the current flow for datasets or recipes to join into the current recipe. See Join Panel.
Replace specific cell values. See Replace Cell Values.

Job Execution:

Detailed job monitoring for ingest and publishing jobs. See Overview of Job Monitoring. Parameterize output paths and table and file names. See Run Job Page.

Install:

Support for RHEL/CentOS 7.5 and 7.6 for the Trifacta node. See System Requirements.

Support for deployment of Trifacta platform via Docker image. See Install for Docker.

Connectivity:

Support for integration with Cloudera 6.2.x. See System Requirements.

NOTE: Support for integration with Cloudera 5.15.x and earlier has been deprecated. See End of Life and Deprecated Features.

NOTE: Support for integration with HDP 2.5.x and earlier has been deprecated. See End of Life and Deprecated Features.

Support for Snowflake database connections.

NOTE: This feature is supported only when Trifacta Wrangler Enterprise is installed on customer-managed AWS infrastructure.

For more information, see Enable Snowflake Connections.

Support for direct publishing to Tableau Server. For more information, see Run Job Page. Support for MySQL database timezones. See Install Databases for MySQL.

Enhanced support for AWS Glue integration:

Metadata catalog browsing through the application. See AWS Glue Browser. Per-user authentication to Glue. See Configure AWS Per-User Authentication. See Enable AWS Glue Access.

Import:

Add timestamp parameters to your custom SQL statements to enable data import relative to the job execution time. See Create Dataset with SQL.

Authentication:

Leverage your enterprise's SAML identity provider to pass through a set of IAM roles that Trifacta users can select for access to AWS resources.

NOTE: This authentication method is supported only if SSO authentication has been enabled using the platform-native SAML authentication method. For more information, see Configure SSO for SAML.

For more information, see Configure for AWS SAML Passthrough Authentication.

Support for Azure Managed Identities with Azure Databricks. See Configure for Azure Databricks.

Admin:

Administrators can review, enable, disable, and delete schedules through the application. See Schedules Page.

Sharing:

Share flows and connections with groups of users imported from your LDAP identity provider.

NOTE: This feature is in Beta release.

See Configure Users and Groups.

Logging:

Tracing user information across services for logging purposes. See Configure Logging for Services.

Language:

New functions. See Changes to the Language. Broader support for metadata references. For Excel files, $filepath references now return the location of the source Excel file. Sheet names are appended to the end of the reference. See Source Metadata References.

APIs:

Admins can now generate password reset requests via API. See Changes to the APIs.

Databases:

New databases:

Job Metadata Service database

Changes in System Behavior

NOTE: The Trifacta software must now be installed on an edge node of the cluster. Existing customers who cannot migrate to an edge node will be supported. You will be required to update cluster files on the Trifacta node whenever they change, and cluster upgrades may be more complicated. You should migrate your installation to an edge node if possible. For more information, see System Requirements.

NOTE: The v3 APIs are no longer supported. Please migrate immediately to using the v4 APIs. For more information, see API Migration to v4.

NOTE: The command line interface (CLI) is no longer available. Please migrate immediately to using the v4 APIs. For more information, see CLI Migration to APIs.

NOTE: The PNaCl browser client extension is no longer supported. Please verify that all users of Trifacta Wrangler Enterprise are using a supported version of Google Chrome, which automatically enables use of WebAssembly. For more information, see Desktop Requirements.

NOTE: Support for Java 7 has been deprecated in the platform. Please upgrade to Java 8 on the Trifacta node and any connected cluster. Some versions of Cloudera may install Java 7 by default.

NOTE: The Chat with us feature is no longer available. For Trifacta Wrangler Enterprise customers, this feature had to be enabled in the product. For more information, see Trifacta Support.

NOTE: The desktop version of Trifacta Wrangler will cease operations on August 31, 2019. If you are still using the product at that time, your data will be lost. Please transition to using the free Cloud version of Trifacta® Wrangler. Automated migration is not available. To register for a free account, please visit https://cloud.trifacta.com.

Workspace:

Configuration for AWS authentication for platform users has been migrated to a new location. See Configure Your Access to S3.

API:

The endpoint used to assign an AWSConfig object to a user has been replaced.

NOTE: If you used the APIs to assign AWSConfig objects in a previous release, you must update your scripts to assign AWS configurations. For more information, see Changes to the APIs.

Documentation:

In prior releases, the documentation listed UTF32-BE and UTF32-LE as supported file formats. These formats are not supported. Documentation has been updated to correct this error. See Supported File Encoding Types.

Key Bug Fixes

Ticket Description

TD-41260 Unable to append Trifacta Decimal type into table with Hive Float type. See Hive Data Type Conversions.

TD-40424 UTF-32BE and UTF-32LE are available as supported file encoding options. They do not work.

NOTE: Although these options are available in the application, they have never been supported in the underlying platform. They have been removed from the interface.

TD-40299 Cloudera Navigator integration cannot locate the database name for JDBC sources on Hive.

TD-40243 API access tokens don't work with native SAML SSO authentication

TD-39513 Import of folder of Excel files as parameterized dataset only imports the first file, and sampling may fail.

TD-39455 HDI 3.6 is not compatible with Guava 26.

TD-39092 $filepath and $sourcerownumber references are not supported for Parquet file inputs.

For more information, see Source Metadata References.

TD-31354 When creating Tableau Server connections, the Test Connection button is missing. See Create Tableau Server Connections.

TD-36145 Spark running environment recognizes numeric values preceded by + as Integer or Decimal data type. Photon running environment does not and types these values as strings.

New Known Issues

Ticket Description

TD-42638 Publishing and ingest jobs that are short in duration cannot be canceled.

Workaround: Allow the job to complete. You can track the progress through these phases of the jobs through the application. See Job Details Page.

TD-39052 Changes to signout on reverse proxy method of SSO do not take effect after upgrade.

Release Notes 6.0

Contents:

Release 6.0.2
    What's New
    Changes to System Behavior
    Key Bug Fixes
    New Known Issues
Release 6.0.1
    What's New
    Changes to System Behavior
    Key Bug Fixes
    New Known Issues
Release 6.0
    What's New
    Changes to System Behavior
    Key Bug Fixes
    New Known Issues

Release 6.0.2

This release addresses several bug fixes.

What's New

Support for Cloudera 6.2. For more information, see System Requirements.

Changes to System Behavior

NOTE: As of Release 6.0, all new and existing customers must license, download, and install the latest version of the Tableau SDK onto the Trifacta node. For more information, see Create Tableau Server Connections.

Upload:

In previous releases, files that were uploaded to the Trifacta platform that had an unsupported filename extension received a warning before upload. Beginning in this release, files with unsupported extensions are blocked from upload. You can change the list of supported file extensions. For more information, see Miscellaneous Configuration.

Documentation:

In Release 6.0.x documentation, documentation for the API JobGroups Get Status v4 endpoint was mistakenly published. This endpoint does not exist. For more information on the v4 equivalent, see Changes to the APIs.

Key Bug Fixes

Ticket Description

TD-40471 SAML auth: Logout functionality not working

TD-39318 Spark job fails with parameterized datasets sourced from Parquet files

TD-39213 Publishing to Hive table fails

New Known Issues

None.

Release 6.0.1

This release features support for several new Hadoop distributions and numerous bug fixes.

What's New

Connectivity:

Support for integration with CDH 5.16.

Support for integration with CDH 6.1. Version-specific configuration is required. See Supported Deployment Scenarios for Cloudera.
Support for integration with HDP 3.1. Version-specific configuration is required. See Supported Deployment Scenarios for Hortonworks.
Support for Hive 3.0 on HDP 3.0 or HDP 3.1. Version-specific configuration is required. See Configure for Hive.
Support for Spark 2.4.0.

NOTE: There are some restrictions around which running environment distributions support and do not support Spark 2.4.0.

For more information, see Configure for Spark. Support for integration with high availability for Hive.

NOTE: High availability for Hive is supported on HDP 2.6 and HDP 3.0 with Hive 2.x enabled. Other configurations are not currently supported.

For more information, see Create Hive Connections.

Publishing:

Support for automatic publishing of job metadata to Cloudera Navigator.

NOTE: For this release, Cloudera 5.16 only is supported.

For more information, see Configure Publishing to Cloudera Navigator.

Changes to System Behavior

Key Bug Fixes

Ticket Description

TD-39779 MySQL JARs must be downloaded by user.

NOTE: If you are installing the databases in MySQL, you must download a set of JARs and install them on the Trifacta node. For more information, see Install Databases for MySQL.

TD-39694 Tricheck returns status code 200, but there is no response. It does not work through Admin Settings page.

TD-39455 HDI 3.6 is not compatible with Guava 26.

TD-39086 Hive ingest job fails on Microsoft Azure.

New Known Issues

Ticket Description

TD-40299 Cloudera Navigator integration cannot locate the database name for JDBC sources on Hive.

TD-40348 When loading a recipe in imported flow that references an imported Excel dataset, Transformer page displays Input validation failed: (Cannot read property 'filter' of undefined) error, and the screen is blank.

Workaround: In Flow View, select an output object, and run a job. Then, load the recipe in the Transformer page and generate a new sample. For more information, see Import Flow.

TD-39969 On import, some Parquet files cannot be previewed and result in a blank screen in the Transformer page.

Workaround: Parquet format supports row groups, which define the size of data chunks that can be ingested. If row group size is greater than 10 MB in a Parquet source, preview and initial sampling does not work. To workaround this issue, import the dataset and create a recipe for it. In the Transformer page, generate a new sample for it. For more information, see Parquet Data Type Conversions.

Release 6.0

This release of Trifacta® Wrangler Enterprise introduces key features around column management, including multi-select and copy and paste of columns and column values. A new Job Details page captures more detailed information about job execution and enables more detailed monitoring of in-progress jobs. Some relational connections now support publishing to connected databases. This is our largest release yet. Enjoy!

NOTE: This release also announces the deprecation of several features, versions, and supported extensions. Please be sure to review Changes to System Behavior below.

What's New

NOTE: Beginning in this release, the Wrangler Enterprise desktop application requires a 64-bit version of . For more information, see Install Desktop Application.

Wrangling:

In the data grid, you can select multiple columns before receiving suggestions and performing transformations on them. For more information, see Data Grid Panel.
New Selection Details panel enables selection of values and groups of values within a selected column. See Selection Details Panel.
Copy and paste columns and column values through the column menus. See Copy and Paste Columns.
Support for importing files in Parquet format. See Supported File Formats.
Specify ranges of key values in your joins. See Configure Range Join.

Jobs:

Review details and monitor the status of in-progress jobs through the new Job Details page. See Job Details Page.
Filter list of jobs by source of job execution or by date range. See Jobs Page.

Connectivity:

Publishing (writeback) is now supported for relational connections. This feature is enabled by default.

NOTE: After a connection has been enabled for publishing, you cannot disable publishing for that connection. Before you enable, please verify that all user accounts accessing databases of these types have appropriate permissions.

See Enable Relational Connections. The following connection types are natively supported for publishing to relational systems:
    Oracle Data Type Conversions
    Postgres Data Type Conversions
    SQL Server Data Type Conversions
    Teradata Data Type Conversions
Import folders of Microsoft Excel workbooks. See Import Excel Data.
Support for integration with CDH 6.0. Version-specific configuration is required. See Supported Deployment Scenarios for Cloudera.
Support for integration with HDP 3.0. Version-specific configuration is required. See Supported Deployment Scenarios for Hortonworks.
Support for Hive 3.0 on HDP 3.0 only. Version-specific configuration is required. See Configure for Hive.

Language:

Track file-based lineage using $filepath and $sourcerownumber references. See Source Metadata References.
In addition to directly imported files, the $sourcerownumber reference now works for converted files (such as Microsoft Excel workbooks) and for datasets with parameters. See Source Metadata References.

Workspace:

Organize your flows into folders. See Flows Page.

Publishing:

Users can be permitted to append to Hive tables when they do not have CREATE or DROP permissions on the schema.

NOTE: This feature must be enabled. See Configure for Hive.

Administration:

New Workspace Admin page centralizes many of the most common admin settings. See Changes to System Behavior below.
Download system logs through the Trifacta application. See Admin Settings Page.

Supportability:

High availability for the Trifacta node is now generally available. See Install for High Availability.

Authentication:

Integrate SSO authentication with enterprise LDAP-AD using platform-native LDAP support.

NOTE: This feature is in Beta release.

NOTE: In previous releases, LDAP-AD SSO utilized an Apache reverse proxy. While this method is still supported, it is likely to be deprecated in a future release. Please migrate to using the above SSO method. See Configure SSO for AD-LDAP.

Support for SAML SSO authentication. See Configure SSO for SAML.

Changes to System Behavior

NOTE: The Trifacta node requires NodeJS 10.13.0. See System Requirements.

Configuration:

To simplify configuration of the most common feature enablement settings, some settings have been migrated to the new Workspace Admin page. For more information, see Workspace Admin Page.

NOTE: Over subsequent releases, more settings will be migrated to the Workspace Admin page from the Admin Settings page and from trifacta-conf.json. For more information, see Changes to Configuration.

See Platform Configuration Methods.

See Admin Settings Page.

Java 7:

NOTE: In the next release of Trifacta Wrangler Enterprise, support for Java 7 will be end of life. The product will no longer be able to use Java 7 at all. Please upgrade to Java 8 on the Trifacta node and your Hadoop cluster.

Key Bug Fixes

Ticket Description

TD-36332 Data grid can display wrong results if a sample is collected and dataset is unioned.

TD-36192 Canceling a step in recipe panel can result in column menus disappearing in the data grid.

TD-35916 Cannot logout via SSO

TD-35899 A deployment user can see all deployments in the instance.

TD-35780 Upgrade: Duplicate metadata in separate publications causes DB migration failure.

TD-35644 Extractpatterns with "HTTP Query strings" option doesn't work.

TD-35504 Cancel job throws 405 status code error. Clicking Yes repeatedly pops up Cancel Job dialog.

TD-35486 Spark jobs fail on LCM function that uses negative numbers as inputs.

TD-35483 Differences in how WEEKNUM function is calculated in the Trifacta Photon and Spark running environments, due to the underlying frameworks on which the environments are created.

NOTE: Trifacta Photon and Spark jobs now behave consistently. Week 1 of the year is the week that contains January 1.

TD-35481 Upgrade Script is malformed due to SplitRows not having a Load parent transform.

TD-35177 Login screen pops up repeatedly when access permission is denied for a connection.

TD-27933 For multi-file imports lacking a newline in the final record of a file, this final record may be merged with the first one in the next file and then dropped in the Trifacta Photon running environment.

New Known Issues

Ticket Description

TD-39513 Import of folder of Excel files as parameterized dataset only imports the first file, and sampling may fail.

Workaround: Import as separate datasets and union together.

TD-39455 HDI 3.6 is not compatible with Guava 26.

TD-39092 $filepath and $sourcerownumber references are not supported for Parquet file inputs.

Workaround: Upload your Parquet files. Create an empty recipe and run a job to generate an output in a different file format, such as CSV or JSON. Use that output as a new dataset. See Build Sequence of Datasets.

For more information on these references, see Source Metadata References.

TD-39086 Hive ingest job fails on Microsoft Azure.

TD-39053 Cannot read datasets from Parquet files generated by Spark containing nested values.

Workaround: In the source for the job, change the data types of the affected columns to String and re-run the job on Spark.

TD-39052 Signout using reverse proxy method of SSO is not working after upgrade.

TD-38869 Upload of Parquet files does not support nested values, which appear as null values in the Transformer page.

Workaround: Unnest the values before importing into the platform.

TD-37683 Send a copy does not create independent sets of recipes and datasets in new flow. If imported datasets are removed in the source flow, they disappear from the sent version.

Workaround: Create new versions of the imported datasets in the sent flow.

TD-36145 Spark running environment recognizes numeric values preceded by + as Integer or Decimal data type. Trifacta Photon running environment does not and types these values as strings.

TD-35867 v3 publishing API fails when publishing to alternate S3 buckets

Workaround: You can use the corresponding v4 API to perform these publication tasks. For more information on a workflow, see API Workflow - Manage Outputs.

Release Notes 5.1

Contents:

What's New
Changes to System Behavior
Key Bug Fixes
New Known Issues

Welcome to Release 5.1 of Trifacta® Wrangler Enterprise! This release includes a significant expansion in database support and connectivity with more running environment versions, such as Azure Databricks. High availability is now available on the Trifacta platform node itself.

Within the Transformer page, you should see a number of enhancements, including improved toolbars and column menus. Samples can be named.

Regarding operationalization of the platform, datasets with parameters can now accept Trifacta patterns for parameter specification, which simplifies the process of creating complex matching patterns. Additionally, you can swap out a static imported dataset for a dataset with parameters, which enables development on a simpler dataset before expanding to a more complex set of sources. Variable overrides can now be applied to scheduled job executions, and you can specify multiple variable overrides in Flow View.

The underlying language has been improved with a number of transformations and functions, including a set of transformations designed around preparing data for machine processing. Details are below.

Tip: For a general overview of the product and its capabilities, see Product Overview.

What's New

Install:

Support for PostgreSQL 9.6 for Trifacta databases.

NOTE: PostgreSQL 9.3 is no longer supported. PostgreSQL 9.3 is scheduled for end of life (EOL) in September 2018. For more information on upgrading, see Upgrade Databases for PostgreSQL.

Partial support for MySQL 5.7 for hosting the Trifacta databases.

NOTE: MySQL 5.7 is not supported for installation on Amazon RDS. See System Requirements.

Support for high availability on the Trifacta node. See Configure for High Availability.

NOTE: High availability on the platform is in Beta release.

Support for CDH 5.15.

NOTE: Support for CDH 5.12 has been deprecated. See End of Life and Deprecated Features.

Support for Spark 2.3.0 on the Hadoop cluster. See System Requirements.
Support for integration with EMR 5.13, EMR 5.14, and EMR 5.15. See Configure for EMR.

NOTE: EMR 5.13 - 5.15 require Spark 2.3.0. See Configure for Spark.

Support for integration with Azure Databricks. See Configure for Azure Databricks.
Support for WebAssembly, Google Chrome's standards-compliant native client.

NOTE: This feature is in Beta release.

NOTE: In a future release, use of PNaCl native client is likely to be deprecated.

Use of WebAssembly requires Google Chrome 68+. No additional installation is required. In this release, this feature must be enabled. For more information, see Miscellaneous Configuration.

The Trifacta® platform defaults to using Spark 2.3.0 for Hadoop job execution. See Configure for Spark.

Connectivity:

Enhanced import process for Excel files, including support for import from backend file systems. See Import Excel Data.
Support for DB2 connections. See Connection Types.
Support for HiveServer2 Interactive (Hive 2.x) on HDP 2.6. See Configure for Hive.
Support for Kerberos-delegated relational connections. See Enable SSO for Relational Connections.

NOTE: In this release, only SQL Server connections can use SSO. See Create SQL Server Connections.

Performance caching for JDBC ingestion. See Configure JDBC Ingestion.
Enable access to multiple WASB datastores. See Enable WASB Access.

Import:

Support for use of Trifacta patterns in creating datasets with parameters. See Create Dataset with Parameters.

Swap a static imported dataset with a dataset with parameters in Flow View. See Flow View Page.

Flow View:

Specify overrides for multiple variables through Flow View. See Flow View Page.
Variable overrides can also be applied to scheduled job executions. See Add Schedule Dialog.

Transformer Page:

Join tool is now integrated into the context panel in the Transformer page. See Join Panel.
Improved join inference key model. See Join Panel.
Patterns are available for review and selection, prompting suggestions, in the context panel.
Updated toolbar. See Transformer Toolbar.
Enhanced options in the column menu. See Column Menus.
Support for a broader range of characters in column names. See Rename Columns.

Sampling:

Samples can be named. See Samples Panel.
Variable overrides can now be applied to samples taken from your datasets with parameters. See Samples Panel.

Jobs:

Filter list of jobs by date. See Jobs Page.

Language:

Rename columns using values across multiple rows. See Rename Columns.

Transformation Name Description

Bin column Place numeric values into bins of equal or custom size for the range of values.

Scale column Scale a column's values to a fixed range or to zero mean, unit variance distributions.

One-hot encoding Encode values from one column into separate columns containing 0 or 1, depending on the absence or presence of the value in the corresponding row.

Group By Generate new columns or replacement tables from aggregate functions applied to grouped values from one or more columns.

Publishing:

Export dependencies of a job as a flow. See Flow View Page.
Add quotes as CSV file publishing options. See Run Job Page.
Specify CSV field delimiters for publishing. See Miscellaneous Configuration.
Support for publishing Datetime values to Redshift as timestamps. See Redshift Data Type Conversions.

Execution:

UDFs are now supported for execution on HDInsight clusters. See Java UDFs.

Admin:

Enable deletion of jobs. See Miscellaneous Configuration.
Upload an updated license file through the application. See Admin Settings Page.

Changes to System Behavior

Diagnostic Server removed from product

The Diagnostic Server and its application page have been removed from the product. This feature has been superseded by Tricheck, which is available to administrators through the application. For more information, see Admin Settings Page.

Wrangle now supports nested expressions

Wrangle now supports nested expressions within expressions. For more information, see Changes to the Language.

Language changes

The RAND function without parameters now generates true random numbers.
When the source information is not available, the SOURCEROWNUMBER function can still be used. It returns null values in all cases.
New functions. See Changes to the Language.

Key Bug Fixes

Ticket Description

TD-36332 Data grid can display wrong results if a sample is collected and dataset is unioned.

TD-36192 Canceling a step in recipe panel can result in column menus disappearing in the data grid.

TD-36011 User can import modified exports or exports from a different version, which do not work.

TD-35916 Cannot logout via SSO

TD-35899 A deployment user can see all deployments in the instance.

TD-35780 Upgrade: Duplicate metadata in separate publications causes DB migration failure.

TD-35746 /v4/importedDatasets GET method is failing.

TD-35644 Extractpatterns with "HTTP Query strings" option doesn't work

TD-35504 Cancel job throws 405 status code error. Clicking Yes repeatedly pops up Cancel Job dialog.

TD-35481 After upgrade, recipe is malformed at splitrows step.

TD-35177 Login screen pops up repeatedly when access permission is denied for a connection.

TD-34822 Case-sensitive variations in date range values are not matched when creating a dataset with parameters.

NOTE: Date range parameters are now case-insensitive.

TD-33428 Job execution fails on recipe with high limit in split transformation due to Java Null Pointer Error during profiling.

NOTE: Avoid creating datasets that are wider than 2500 columns. Performance can degrade significantly on very wide datasets.

TD-31327 Unable to save dataset sourced from multi-line custom SQL on dataset with parameters.

TD-31252 Assigning a target schema through the Column Browser does not refresh the page.

TD-31165 Job results are incorrect when a sample is collected and then the last transform step is undone.

TD-30979 Transformation job on wide dataset fails on Spark 2.2 and earlier due to exceeding Java JVM limit. For details, see https://issues.apache.org/jira/browse/SPARK-18016.

TD-30857 Matching file path patterns in a large directory can be very slow, especially if using multiple patterns in a single dataset with parameters.

NOTE: To increase matching speed, avoid wildcards in top-level directories and be as specific as possible with your wildcards and patterns.

TD-30854 When creating a new dataset from the Export Results window from a CSV dataset with Snappy compression, the resulting dataset is empty when loaded in the Transformer page.

TD-30820 Some string comparison functions process leading spaces differently when executed on the Trifacta Photon or the Spark running environment.

TD-30717 No validation is performed for Redshift or SQL DW connections or permissions prior to job execution. Jobs are queued and then fail.

TD-27860 When the platform is restarted or an HA failover state is reached, any running jobs are stuck in the In Progress state indefinitely.

New Known Issues

Ticket Component Description

TD-35714 Installer/Upgrader/Utilities After installing on Ubuntu 16.04 (Xenial), platform may fail to start with "ImportError: No module named pkg_resources" error.

Workaround: Verify installation of python-setuptools package. Install if missing.

TD-35644 Compilation/Execution Extractpatterns for "HTTP Query strings" option doesn't work.

TD-35562 Compilation/Execution When executing Spark 2.3.0 jobs on S3-based datasets, jobs may fail due to a known incompatibility between HTTPClient 4.5.x and aws-java-sdk 1.10.xx. For details, see https://github.com/apache/incubator-druid/issues/4456.

Workaround: Use Spark 2.1.0 instead. In Admin Settings page, configure the spark.version property to 2.1.0. For more information, see Admin Settings Page.

For additional details on Spark versions, see Configure for Spark.

TD-35504 Compilation/Execution Clicking the Cancel Job button generates a 405 status code error. Clicking the Yes button fails to close the dialog.

Workaround: After you have clicked the Yes button once, you can click the No button. The job is removed from the page.

TD-35486 Compilation/Execution Spark jobs fail on LCM function that uses negative numbers as inputs.

Workaround: If you wrap the negative number input in the ABS function, the LCM function may be computed. You may have to manually check if a negative value for the LCM output is applicable.

TD-35483 Compilation/Execution Differences in how WEEKNUM function is calculated in Trifacta Photon and Spark running environments, due to the underlying frameworks on which the environments are created.

Trifacta Photon week 1 of the year: The week that contains January 1.
Spark week 1 of the year: The week that contains at least four days in the specified year.

For more information, see WEEKNUM Function.

TD-35478 Compilation/Execution The Spark running environment does not support use of multi-character delimiters for CSV outputs. For more information on this issue, see https://issues.apache.org/jira/browse/SPARK-24540.

Workaround: You can switch your job to a different running environment or use single-character delimiters.

TD-34840 Transformer Page Platform fails to provide suggestions for transformations when selecting keys from an object with many of them.

TD-34119 Compilation/Execution WASB job fails when publishing two successive appends.

TD-30855 Publish Creating dataset from Parquet-only output results in "Dataset creation failed" error.

NOTE: If you generate results in Parquet format only, you cannot create a dataset from it, even if the Create button is present.

TD-30828 Publish You cannot publish ad-hoc results for a job when another publishing job is in progress for the same job.

Workaround: Please wait until the previous job has been published before retrying to publish the failing job.

TD-27933 Connectivity For multi-file imports lacking a newline in the final record of a file, this final record may be merged with the first one in the next file and then dropped in the Trifacta Photon running environment.

Workaround: Verify that you have inserted a new line at the end of every file-based source.

Install Overview

Contents:

Basic Install Workflow
Installation Scenarios
Install On-Premises
Install for AWS
Install for Azure
Install from AWS Marketplace
Install from AWS with EMR
Install for Azure Marketplace
Install Desktop Application
Notation

Basic Install Workflow

1. Review the pre-installation checklist and other system requirements. See Install Preparation.
2. Review the requirements for your specific installation scenario in the following sections.
3. Install the software. See Install Software.
4. Install the databases. See Install Databases.
5. Configure your installation.
6. Verify operations.

Notation

In this guide, JSON settings are provided in dot notation. For example, webapp.selfRegistration refers to a JSON block selfRegistration under webapp:

{
  ...
  "webapp": {
    "selfRegistration": true,
    ...
  }
  ...
}

Install from AWS Marketplace

Contents:

Product Limitations
Internet access
SELinux
Install
Desktop Requirements
Pre-requisites
Install Steps - CloudFormation template
SSH Access
Upgrade
Documentation
Related Topics

This guide steps through the requirements and process for installing Trifacta® Data Preparation for Amazon Redshift and S3 through the AWS Marketplace.

Product Limitations

Connectivity to sources other than S3 and Redshift is not supported.
Jobs must be executed in the Trifacta Photon running environment. No other running environment integrations are supported.
Anomaly and stratified sampling are not supported in this deployment.
When publishing single files to S3, you cannot apply an append publishing action.
Trifacta Data Preparation for Amazon Redshift and S3 must be deployed into an existing Virtual Private Cloud (VPC).
The EC2 instance, S3 buckets, and any connected Redshift databases must be located in the same Amazon region. Cross-region integrations are not supported at this time.

NOTE: HDFS integration is not supported for Amazon Marketplace installations.

The S3 bucket automatically created by the Marketplace CloudFormation template is not automatically deleted when you delete the stack in CloudFormation. You must empty the bucket and delete it, which can be done through the AWS Console.
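If you prefer the command line, a minimal sketch using the AWS CLI (the bucket name below is a hypothetical example; substitute the bucket created by your stack):

# Empty the bucket created by the CloudFormation template, then delete it
aws s3 rm s3://trifacta-marketplace-example-bucket --recursive
aws s3 rb s3://trifacta-marketplace-example-bucket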

Internet access

From AWS, the Trifacta platform requires Internet access for the following services:

NOTE: Depending on your AWS deployment, some of these services may not be required.

AWS S3
Key Management System [KMS] (if sse-kms server side encryption is enabled)
Secure Token Service [STS] (if temporary credential provider is used)
EMR (if integration with EMR cluster is enabled)

NOTE: If the Trifacta platform is hosted in a VPC where Internet access is restricted, access to S3, KMS and STS services must be provided by creating a VPC endpoint. If the platform is accessing an EMR cluster, a proxy server can be configured to provide access to the AWS ElasticMapReduce regional endpoint.
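If you need to create a gateway VPC endpoint for S3, the following AWS CLI sketch illustrates the idea; the VPC ID, route table ID, and region below are placeholders, and the exact endpoint configuration depends on your environment:

# Create a gateway VPC endpoint so the platform can reach S3 without Internet access
aws ec2 create-vpc-endpoint \
    --vpc-id vpc-0123456789abcdef0 \
    --service-name com.amazonaws.us-east-1.s3 \
    --route-table-ids rtb-0123456789abcdef0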

SELinux

By default, Trifacta Data Preparation for Amazon Redshift and S3 is installed on a server with SELinux enabled. Security-Enhanced Linux (SELinux) provides a set of security features for, among other things, managing access controls.

Tip: The following may be applied to other deployments of the Trifacta platform on servers where SELinux has been enabled.

In some cases, SELinux can interfere with normal operations of platform software. If you are experiencing connectivity problems related to SELinux, you can do either one of the following:

1. Disable SELinux on the server. For more information, please see the CentOS documentation.
2. Apply the following commands on the server, as root:
   1. Open ports on the server for listening.
      1. By default, the Trifacta application listens on port 3005. The following opens that port when SELinux is enabled:

semanage port -a -t http_port_t -p tcp 3005

      2. Repeat the above step for any other ports that you wish to open on the server.
   2. Permit nginx, the proxy on the Trifacta node, to open websockets:

setsebool -P httpd_can_network_connect 1
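To confirm the SELinux mode and that the changes above took effect, you can run the following checks (a sketch; exact output varies by distribution):

# Show the current SELinux mode (Enforcing, Permissive, or Disabled)
getenforce
# Confirm that port 3005 is now associated with http_port_t
semanage port -l | grep http_port_t
# Confirm that nginx is permitted to open network connections
getsebool httpd_can_network_connect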

Install

Desktop Requirements

All desktop users of the platform must have the latest version of Google Chrome installed on their desktops. All desktop users must be able to connect to the EC2 instance over the port on which Trifacta Data Preparation for Amazon Redshift and S3 is listening. Default is 3005.

NOTE: Trifacta Data Preparation for Amazon Redshift and S3 enforces a maximum limit of 30 users.

Pre-requisites

Before you install the platform, please verify that the following steps have been completed.

1. EULA. Before you begin, please review the End-User License Agreement. See End-User License Agreement.
2. SSH Key-pair. Please verify that there is an SSH key pair available to assign to the Trifacta node.

Install Steps - CloudFormation template

This install process creates the following:

Trifacta node on an EC2 instance
An S3 bucket to store data
IAM roles and policies to access the S3 bucket from the Trifacta node

Steps:

1. In the Marketplace listing, click Deploy into an existing VPC.
2. Select Template: The template path is automatically populated for you.
3. Specify Details:
   1. Stack Name: Display name of the application

NOTE: Each instance of the Trifacta platform should have a separate name.

   2. Instance Type: Only one instance type is enabled.
   3. Key Pair: Select the SSH pair to use for Trifacta Instance access.
   4. Allowed HTTP Source: Please specify the IP address or range of addresses from which HTTP/HTTPS connections to the application are permitted.
   5. Allowed SSH Source: Please specify the IP address or range of addresses from which SSH connections to the EC2 instance are permitted.
4. Options: None of these is required for installation. Specify your options as needed for your environment.
5. Review: Review your installation and configured options.
   1. Select the checkbox at the end of the page.
   2. To launch the configured instance, click Create.
6. In the Stacks list, select the name of your application. Click the Outputs tab and collect the following information. Instructions on how to use this information are provided later.
   1. No outputs appear until the stack has been created successfully.

Parameter: TrifactaUrl
Description: URL and port number to which to connect to the Trifacta application.
Use: Users must connect to this IP address and port number to access the application.

Parameter: TrifactaBucket
Description: The address of the default S3 bucket.
Use: This value must be applied through the application.

Parameter: TrifactaInstanceId
Description: The identifier for the instance of the platform.
Use: This value is the default password for the admin account.

NOTE: This password must be changed immediately.
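If you prefer to collect these values from the command line, a sketch using the AWS CLI (the stack name is a placeholder):

# List the Outputs of the stack after creation completes
aws cloudformation describe-stacks \
    --stack-name my-trifacta-stack \
    --query 'Stacks[0].Outputs'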

7. When the instance is spinning up for the first time, performance may be slow. When the instance is up, please navigate to the TrifactaUrl location:

http://<hostname>:3005

8. When the login screen appears, enter the following:
   1. Username: [email protected]
   2. Password: (the TrifactaInstanceId value)

NOTE: As soon as you login as an admin for the first time, you should change the password.

9. From the application menu, select the Settings menu. Then, click Settings > Admin Settings.
10. In the Admin Settings page, you can configure many aspects of the platform, including user management tasks, and perform restarts to apply the changes.
    1. In the Search bar, enter the following:

aws.s3.bucket.name

    2. Set the value of this setting to be the TrifactaBucket value that you collected from the Outputs tab.
11. Click Save.
12. When the platform restarts, you can begin using the product.

SSH Access

If you need to SSH to the Trifacta node, you can use the following command:

ssh -i <path-to-key-file> <userId>@<host>

Parameter: <path-to-key-file>
Description: Path to the key file stored on your local computer.

Parameter: <userId>
Description: The user ID is always centos.

Parameter: <host>
Description: DNS or IP address of the Trifacta node.
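For example, assuming a key file named trifacta-key.pem in your home directory and a hypothetical public DNS name for the EC2 instance:

ssh -i ~/trifacta-key.pem centos@ec2-203-0-113-10.compute-1.amazonaws.com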

Upgrade

For more information, see Upgrade for AWS Marketplace.

Documentation

You can access complete product documentation online and in PDF format. From within the product, select Help menu > Product Docs.

Install Software

To install Trifacta® Wrangler Enterprise, please review and complete the following sections in the order listed below.

Topics:

Install Dependencies without Internet Access
Install for Docker
Install on CentOS and RHEL
Install on Ubuntu
License Key
Install Desktop Application
Start and Stop the Platform
Login

Install Dependencies without Internet Access

Offline dependencies should be included in the URL location that Trifacta® provided to you. Please use the *deps* file.

NOTE: If your installation server is connected to the Internet, the required dependencies are automatically downloaded and installed for you. You may skip this section.

Use the steps below to acquire and install dependencies required by the Trifacta platform. If you need further assistance, please contact Trifacta Support.

Install software dependencies without Internet access for CentOS or RHEL:

1. In a CentOS or RHEL environment, the dependencies repository must be installed into the following directory:

/var/local/trifacta

2. The following commands configure Yum to point to the repository in /var/local/trifacta, which yum knows as local. Repo permissions are set appropriately. Commands:

tar xvzf <dependencies-bundle>.tar.gz
mv local.repo /etc/yum.repos.d
mv trifacta /var/local
chown -R root:root /var/local/trifacta
chmod -R o-w+r /var/local/trifacta
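Before installing, you can confirm that yum can see the local repository (a quick sanity check; the repo id local comes from local.repo):

yum --disablerepo=* --enablerepo=local repolist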

3. The following command installs the RPM while disabling all repos other than local, which prevents the installer from reaching out to the Internet for package updates:

NOTE: The disabling of repositories only applies to this command.

sudo yum --disablerepo=* --enablerepo=local install <rpm-file-name>.rpm

4. If the above command fails and complains about a missing repo, you can add the missing repo to the enablerepo list. For example, if the centos-base repo is reported as missing, then the command would be the following:

sudo yum --disablerepo=* --enablerepo=local,centos-base install <rpm-file-name>.rpm

5. If you do not have a supported version of a Java Developer Kit installed on the Trifacta node, you can use the following command to install OpenJDK, which is included in the offline dependencies:

sudo yum --disablerepo=* --enablerepo=local,centos-base install java-1.8.0-openjdk-1.8.0 java-1.8.0-openjdk-devel

Install database dependencies without Internet access:

If you are installing the databases on a node without Internet access, you can install the dependencies using either of the following commands:

NOTE: This step is only required if you are installing the databases on the same node where the software is installed.

For PostgreSQL:

sudo yum --disablerepo=* --enablerepo=local install postgresql96-server

For MySQL:

sudo yum --disablerepo=* --enablerepo=local install mysql-community-server

NOTE: You must also install the MySQL JARs on the Trifacta node. These instructions are provided later.

Databases are installed after the software is installed. For more information, see Install Databases.

Install dependencies without Internet access in Ubuntu:

If you are trying to perform a manual installation of dependencies in Ubuntu, please contact Trifacta Support.

Install for Docker

Contents:

Deployment Scenario
Limitations
Requirements
Docker Daemon
Preparation
Acquire Image
Acquire from FTP site
Build your own Docker image
Configure Docker Image
Start Server Container
Import Additional Configuration Files
Import license key file
Import Hadoop distribution libraries
Import Hadoop cluster configuration files
Install Kerberos client
Perform configuration changes as necessary
Start and Stop the Container
Stop container
Restart container
Recreate container
Stop and destroy the container
Verify Deployment
Configuration

This guide steps through the process of acquiring and deploying a Docker image of the Trifacta® platform in your Docker environment. Optionally, you can build the Docker image locally, which enables further configuration options.

Deployment Scenario

Trifacta Wrangler Enterprise deployed into a customer-managed environment: On-premises, AWS, or Azure.
PostgreSQL 9.6 installed either:
    Locally
    On a remote server

Limitations

You cannot upgrade to a Docker image from a non-Docker deployment. You cannot switch an existing installation to a Docker image.
Supported distributions of Cloudera or Hortonworks:
    Supported Deployment Scenarios for Cloudera
    Supported Deployment Scenarios for Hortonworks
The base storage layer of the platform must be HDFS. Base storage of S3 is not supported.
High availability for the Trifacta platform in Docker is not supported.
SSO integration is not supported.

Requirements

Support for orchestration through Docker Compose only

Docker version 17.12 or later
Docker-Compose 1.11.2 or later. Version must be compatible with your version of Docker.

Docker Daemon

CPU Cores: 2 CPU minimum, 4 CPU recommended

Available RAM: 8 GB RAM minimum, 10+ GB RAM recommended

Preparation

1. Review the Desktop Requirements.

NOTE: Trifacta Wrangler Enterprise requires the installation of Google Chrome on each desktop. For more information, see Desktop Requirements.

2. Acquire your License Key.

Acquire Image

You can acquire the latest Docker image using one of the following methods:

1. Acquire from FTP site. 2. Build your own Docker image.

Acquire from FTP site

Steps:

1. Download the following files from the FTP site: 1. trifacta-docker-setup-bundle-x.y.z.tar 2. trifacta-docker-image-x.y.z.tar

NOTE: x.y.z refers to the version number (e.g. 6.4.0).

2. Untar the setup-bundle file:

tar xvf trifacta-docker-setup-bundle-x.y.z.tar

3. Files are extracted into a docker folder. Key files:

File Description

docker-compose-local-postgres.yaml Runtime configuration file for the Docker image when PostgreSQL is to be running on the same machine. More information is provided below.

docker-compose-local-mysql.yaml Runtime configuration file for the Docker image when MySQL is to be running on the same machine. More information is provided below.

docker-compose-remote-db.yaml Runtime configuration file for the Docker image when the database is to be accessed from a remote server.

NOTE: You must manage this instance of the database.

More information is provided below.

README-running-trifacta-container.md Instructions for running the Trifacta container

NOTE: These instructions are referenced later in this workflow.

README-building-trifacta-container.md Instructions for building the Trifacta container

NOTE: This file does not apply if you are using the provided Docker image.

4. Load the Docker image into your local Docker environment:

docker load < trifacta-docker-image-x.y.z.tar

5. Confirm that the image has been loaded. Execute the following command, which should list the Docker image:

docker images

6. You can now configure the Docker image. Please skip that section.

Build your own Docker image

As needed, you can build your own Docker image.

Requirements

Docker version 17.12 or later
Docker Compose 1.11.2 or newer. It must be compatible with your version of Docker.

Build steps

1. Acquire the RPM file from the FTP site:

NOTE: You must acquire the el7 RPM file for this release.

2. In your Docker environment, copy the trifacta-server*.rpm file to the same level as the Dockerfile.
3. Verify that the docker-files folder and its contents are present.
4. Use the following command to build the image:

docker build -t trifacta/server-enterprise:latest .

5. This process could take about 10 minutes. When it is completed, you should see the built image in the Docker list of local images.

NOTE: To reduce the size of the Docker image, the Dockerfile installs the trifacta-server RPM file in one stage and then copies over the results to the final stage. The RPM is not actually installed in the final stage. All of the files are properly located.

6. You can now configure the Docker image.

Configure Docker Image

Before you start the Docker container, you should review the properties for the Docker image. In the provided image, please open the appropriate docker-compose file:

File Description

docker-compose-local-postgres.yaml Database properties in this file are pre-configured to work with the installed instance of PostgreSQL, although you may wish to change some of the properties for security reasons.

docker-compose-local-mysql.yaml Database properties in this file are pre-configured to work with the installed instance of MySQL, although you may wish to change some of the properties for security reasons.

docker-compose-remote-db.yaml The Trifacta databases are to be installed on a remote server that you manage.

NOTE: Additional configuration is required.

NOTE: You may want to create a backup of this file first.

Key general properties:

NOTE: Avoid modifying properties that are not listed below.

Property Description

image This reference must match the name of the image that you have acquired.

container_name Name of container in your Docker environment.

ports Defines the listening port for the Trifacta application. Default is 3005.

NOTE: If you must change the listening port, additional configuration is required after the image is deployed. See Change Listening Port

For more information, see System Ports.

Database properties:

These properties pertain to the installation of the database to which the Trifacta application connects.

Property Description

DB_INIT If set to true, database initialization steps are performed at startup.

NOTE: This step applies only if you are starting the container for the first time, and PostgreSQL databases will be installed locally.

DB_TYPE Set this value to postgresql or mysql.

DB_HOST_NAME Hostname of the machine hosting the databases. Leave value as localhost for local installation.

DB_HOST_PORT (Remote only) Port number to use to connect to the databases. Default is 5432.

NOTE: If you are modifying, additional configuration is required after installation is complete. See Change Database Port.

DB_ADMIN_USERNAME Admin username to be used to create DB roles/databases. Modify this value for remote installation.

NOTE: If you are modifying this value, additional configuration is required. Please see the documentation for your database version.

DB_ADMIN_PASSWORD Admin password to be used to create DB roles/databases. Modify this value for remote installation.

Kerberos properties:

If your Hadoop cluster is protected by Kerberos, please review the following properties.

Property Description

KERBEROS_KEYTAB_FILE Full path inside of the container where the Kerberos keytab file is located. Default value:

/opt/trifacta/conf/trifacta.keytab

NOTE: The keytab file must be imported and mounted to this location. Configuration details are provided later.

KERBEROS_KRB5_CONF Full path inside of the container where the Kerberos krb5.conf file is located. Default:

/opt/krb-config/krb5.conf

Hadoop distribution client JARs:

Please enable the appropriate path to the client JAR files for your Hadoop distribution. In the following example, the Cloudera path has been enabled, and the Hortonworks path has been disabled:

# Mount folder from outside for necessary hadoop client jars
# For CDH
- /opt/cloudera:/opt/cloudera
# For HDP
#- /usr/hdp:/usr/hdp

Please modify these lines if you are using Hortonworks.

Volume properties:

These properties govern where volumes are mounted in the container.

NOTE: These values should not be modified unless necessary.

Property Description

volumes.conf Full path in container to the Trifacta configuration directory. Default:

/opt/trifacta/conf

volumes.logs Full path in container to the Trifacta logs directory. Default:

/opt/trifacta/logs

volumes.license Full path in container to the Trifacta license directory. Default:

/trifacta-license

Start Server Container

After you have performed the above configuration, execute the following to initialize the Docker container:

docker-compose -f <docker-compose-filename>.yaml run trifacta initfiles

When the above is started for the first time, the following directories are created on the localhost:

Directory Description

./trifacta-data Used by the Trifacta container to expose the conf and logs directories.

Import Additional Configuration Files

After you have started the new container, additional configuration files must be imported.

Import license key file

The Trifacta license file must be staged for use by the platform. Stage the file in the following location in the container:

NOTE: If you are using a non-default path or filename, you must update the .yaml file.

trifacta-license/license.json
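For example, assuming the license volume is mounted from a trifacta-license directory next to your docker-compose file (an assumption; check the volumes section of your .yaml file) and the license key file you received is in /tmp:

# Stage the license key file where the container expects it
mkdir -p ./trifacta-license
cp /tmp/license.json ./trifacta-license/license.json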

Import Hadoop distribution libraries

If the container you are creating is on the edge node of your Hadoop cluster, you must provide the Hadoop libraries.

1. You must mount the Hadoop distribution libraries into the container. For more information on the libraries, see the documentation for your Hadoop distribution. 2. The Docker Compose file must be made aware of these libraries. Details are below.

Copyright © 2019 Trifacta Inc. Page #61 Import Hadoop cluster configuration files

Some core cluster configuration files from your Hadoop distribution must be provided to the container. These files must be copied into the following directory within the container:

./trifacta-data/conf/hadoop-site

For more information, see Configure for Hadoop in the Configuration Guide.
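A sketch of copying the cluster client configuration files from an edge node, assuming the conventional /etc/hadoop/conf location for your distribution (file names and paths may differ in your environment):

# Copy core cluster configuration files into the directory used by the container
mkdir -p ./trifacta-data/conf/hadoop-site
cp /etc/hadoop/conf/core-site.xml \
   /etc/hadoop/conf/hdfs-site.xml \
   /etc/hadoop/conf/yarn-site.xml \
   /etc/hadoop/conf/mapred-site.xml \
   ./trifacta-data/conf/hadoop-site/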

Install Kerberos client

If Kerberos is enabled, you must install the Kerberos client and keytab on the node container. Copy the keytab file to the following stage location:

/trifacta-data/conf/trifacta.keytab

See Configure for Kerberos Integration in the Configuration Guide.
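For example, assuming the keytab was generated for the Trifacta service principal and currently resides in /etc/security/keytabs (the source path is an assumption):

# Stage the keytab where the container expects it and restrict its permissions
cp /etc/security/keytabs/trifacta.keytab ./trifacta-data/conf/trifacta.keytab
chmod 600 ./trifacta-data/conf/trifacta.keytab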

Perform configuration changes as necessary

The primary configuration file for the platform is in the following location in the launched container:

/opt/trifacta/conf/trifacta-conf.json

NOTE: Unless you are comfortable working with this file, you should avoid direct edits to it. All subsequent configuration can be applied from within the application, which supports some forms of data validation. It is possible to corrupt the file using direct edits.

Configuration topics are covered later.
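If you do make direct edits, one low-risk sanity check is to confirm that the file is still valid JSON before restarting, assuming the conf directory is exposed at ./trifacta-data/conf on the host:

# Exits non-zero and prints an error if the JSON is malformed
python -m json.tool ./trifacta-data/conf/trifacta-conf.json > /dev/null && echo "trifacta-conf.json is valid JSON"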

Start and Stop the Container

Stop container

Stops the container but does not destroy it.

NOTE: Application and local database data is not destroyed. As long as the .yaml properties point to the correct location of the *-data files, data should be preserved. You can start new containers to use this data, too. Do not change ownership on these directories.

docker-compose -f <docker-compose-filename>.yaml stop

Copyright © 2019 Trifacta Inc. Page #62 Restart container

Restarts an existing container.

docker-compose -f <docker-compose-filename>.yaml start

Recreate container

Recreates a container using existing local data.

docker-compose -f <docker-compose-filename>.yaml up --force-recreate -d

Stop and destroy the container

Stops the container and destroys it.

The following also destroys all application configuration, logs, and database data. You may want to back up these directories first.

docker-compose -f <docker-compose-filename>.yaml down

Local PostgreSQL:

sudo rm -rf trifacta-data/ postgres-data/

Local MySQL or remote database:

sudo rm -rf trifacta-data/

Verify Deployment

1. Verify access to the server where the Trifacta platform is to be installed.
2. Cluster Configuration: Additional steps are required to integrate the Trifacta platform with the cluster. See Prepare Hadoop for Integration with the Platform.
3. Start the platform within the container. See Start and Stop the Platform.

Configuration

After installation is complete, additional configuration is required. You can complete this configuration from within the application.

Steps:

1. Login to the application. See Login.
2. The primary configuration interface is the Admin Settings page. From the left menu, select Settings menu > Settings > Admin Settings. For more information, see Admin Settings Page in the Admin Guide.
3. Workspace-level configuration can also be applied. From the left menu, select Settings menu > Settings > Workspace Admin. For more information, see Workspace Admin Page in the Admin Guide.

The Trifacta platform requires additional configuration for a successful integration with the datastore. Please review and complete the necessary configuration steps. For more information, see Configure in the Configuration Guide.

Install on CentOS and RHEL

Contents:

Preparation
Installation
    1. Install Dependencies
    2. Install JDK
    3. Install Trifacta package
    4. Verify Install
    5. Install License Key
    6. Store install packages
    7. Install and configure Trifacta databases
Configuration

This guide takes you through the steps for installing Trifacta® Wrangler Enterprise software on CentOS or Red Hat.

For more information on supported versions, see System Requirements.

Preparation

Before you begin, please complete the following.

NOTE: Except for database installation and configuration, all install commands should be run as the root user or a user with similar privileges. For database installation, you will be asked to switch the database user account.

Steps:

1. Set the node where Trifacta Wrangler Enterprise is to be installed.
   1. Review the System Requirements and verify that all required components have been installed.
   2. Verify that all required system ports are opened on the node. See System Ports.
2. Review the Desktop Requirements.

NOTE: Trifacta Wrangler Enterprise requires the installation of Google Chrome on each desktop. For more information, see Desktop Requirements.

3. Review the System Dependencies.

NOTE: If you are installing on a node without access to the Internet, you must download the offline dependencies before you begin. See Install Dependencies without Internet Access.

4. Acquire your License Key.
5. Install and verify operations of the datastore, if used.

NOTE: Access to the Spark cluster is required.

6. Verify access to the server where the Trifacta platform is to be installed.
7. Cluster Configuration: Additional steps are required to integrate the Trifacta platform with the cluster. See Prepare Hadoop for Integration with the Platform.

Installation

1. Install Dependencies

Without Internet access

If you have not done so already, you may download the dependency bundle with your release directly from Trifacta. For more information, see Install Dependencies without Internet Access.

With Internet access

Use the following to add the hosted package repository for CentOS/RHEL, which will automatically install the proper packages for your environment.

# If the client has curl installed ...
curl https://packagecloud.io/install/repositories/trifacta/dependencies/script.rpm.sh | sudo bash

# Otherwise, you can also use wget ...
wget -qO- https://packagecloud.io/install/repositories/trifacta/dependencies/script.rpm.sh | sudo bash

2. Install JDK

By default, the Trifacta node uses OpenJDK for accessing Java libraries and components. In some environments, basic setup of the node may include installation of a JDK. Please review your environment to verify that an appropriate JDK version has been installed on the node.

NOTE: Use of Java Development Kits other than OpenJDK is not currently supported. However, the platform may work with the Java Development Kit of your choice, as long as it is compatible with the supported version(s) of Java. See System Requirements.

NOTE: OpenJDK is included in the offline dependencies, which can be used to install the platform without Internet access. For more information, see Install Dependencies without Internet Access.

The following commands can be used to install OpenJDK. These commands can be modified to install a separate compatible version of the JDK.

sudo yum install java-1.8.0-openjdk-1.8.0 java-1.8.0-openjdk-devel

NOTE: If java-1.8.0-openjdk-devel is not included, the batch job runner service, which is required, fails to start.
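To confirm that both the Java runtime and the development package are installed, you can check the reported versions:

java -version
javac -version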

JAVA_HOME:

By default, the JAVA_HOME environment variable is configured to point to a default install location for the OpenJDK package.

NOTE: If you have installed a JDK other than the OpenJDK version provided with the software, you must set the JAVA_HOME environment variable on the Trifacta node to point to the correct install location.

The property value must be updated in the following locations:

1. Edit the following file: /opt/trifacta/conf/env.sh
2. Save changes.
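For example, you can confirm the actual JDK install location and then set JAVA_HOME in env.sh accordingly; the path below is a hypothetical example, and env.sh is assumed to export the variable as a shell assignment:

# Resolve the real path of the java binary to find the install location
readlink -f $(which java)

# Example line in /opt/trifacta/conf/env.sh
export JAVA_HOME="/usr/java/jdk1.8.0_202"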

3. Install Trifacta package

NOTE: If you are installing without Internet access, you must reference the local repository. The command to execute the installer is slightly different. See Install Dependencies without Internet Access.

NOTE: Installing the Trifacta platform in a directory other than the default one is not supported or recommended.

Install the package with yum, using root:

sudo yum install
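For illustration only, if the Trifacta RPM had been downloaded to the current directory as trifacta-server.rpm (a hypothetical filename), the command might look like the following; substitute the actual package provided with your release:

# hypothetical filename; replace with your actual Trifacta RPM
sudo yum install ./trifacta-server.rpm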

4. Verify Install

The product is installed in the following directory:

/opt/trifacta

JAVA_HOME:

The platform must be made aware of the location of Java.

Steps:

1. Edit the following file: /opt/trifacta/conf/trifacta-conf.json 2. Update the following parameter value:

"env": { "JAVA_HOME": "/usr/lib/jvm/java-1.8.0-openjdk.x86_64" },

3. Save changes.
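If you are unsure of the correct JAVA_HOME path on your node, the following quick check may help (a sketch; paths vary by distribution and JDK package):

# Resolve the java binary to its install location
readlink -f "$(which java)"
# List OpenJDK install directories, if any
ls -d /usr/lib/jvm/java-1.8.0-openjdk*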

5. Install License Key

Please install the license key provided to you by Trifacta. See License Key.

6. Store install packages

For safekeeping, you should retain all install packages that have been installed with this Trifacta deployment.

7. Install and configure Trifacta databases

The Trifacta platform requires installation of several databases. If you have not done so already, you must install and configure the databases used to store Trifacta metadata. See Install Databases.

Configuration

After installation is complete, additional configuration is required.

The Trifacta platform requires additional configuration for a successful integration with the datastore. Please review and complete the necessary configuration steps. For more information, see Configure.

Install on Ubuntu

Contents:

Preparation
Installation
  1. Install Dependencies
  2. Install JDK
  3. Install Trifacta package
  4. Verify Install
  5. Install License Key
  6. Store install packages
  7. Install and configure Trifacta databases
Configuration

This guide takes you through the steps for installing Trifacta® Wrangler Enterprise software on Ubuntu.

For more information on supported operating system versions, see System Requirements.

Preparation

Before you begin, please complete the following.

NOTE: Except for database installation and configuration, all install commands should be run as the root user or a user with similar privileges. For database installation, you will be asked to switch the database user account.

Steps:

1. Set up the node where Trifacta Wrangler Enterprise is to be installed.
   1. Review the System Requirements and verify that all required components have been installed.
   2. Verify that all required system ports are opened on the node. See System Ports.
2. Review the Desktop Requirements.

NOTE: Trifacta Wrangler Enterprise requires the installation of Google Chrome on each desktop.

3. Review the System Dependencies.

NOTE: If you are installing on a node without access to the Internet, you must download the offline dependencies before you begin. See Install Dependencies without Internet Access.

4. Acquire your License Key. 5. Install and verify operations of the datastore, if used.

NOTE: Access to the cluster may be required.

6. Verify access to the server where the Trifacta platform is to be installed. 7. Cluster configuration: Additional steps are required to integrate the Trifacta platform with the cluster. See Prepare Hadoop for Integration with the Platform.

Installation

1. Install Dependencies

Without Internet access

If you have not done so already, you may download the dependency bundle for your release directly from Trifacta. For more information, see Install Dependencies without Internet Access.

With Internet access

Use the following to add the hosted package repository for Ubuntu, which will automatically install the proper packages for your environment.

NOTE: Install curl if not present on your system.

Then, execute the following command:

NOTE: Run the following command as the root user. In proxied environments, the script may encounter issues with detecting proxy settings.

curl https://packagecloud.io/install/repositories/trifacta/dependencies/script.deb.sh | sudo bash

Special instructions for Ubuntu installs

These steps manually install the correct and supported version of the following:

NodeJS
nginx

Due to a known issue resolving package dependencies on Ubuntu, please complete the following steps prior to installation of other dependencies or software.

1. Login to the Trifacta node as an administrator.
2. Execute the following command to install the appropriate versions of NodeJS and nginx.

1. Ubuntu 14.04:

sudo apt-get install nginx=1.12.2-1~trusty nodejs=10.13.0-1nodesource1

2. Ubuntu 16.04

sudo apt-get install nginx=1.12.2-1~xenial nodejs=10.13.0-1nodesource1

3. Continue with the installation process.
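To confirm that the pinned versions were installed, you can check the package status (a quick sanity check; output varies by environment):

apt-cache policy nginx nodejs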

2. Install JDK

By default, the Trifacta node uses OpenJDK for accessing Java libraries and components. In some environments, basic setup of the node may include installation of a JDK. Please review your environment to verify that an appropriate JDK version has been installed on the node.

NOTE: Use of Java Development Kits other than OpenJDK is not currently supported. However, the platform may work with the Java Development Kit of your choice, as long as it is compatible with the supported version(s) of Java. See System Requirements.

NOTE: OpenJDK is included in the offline dependencies, which can be used to install the platform without Internet access. For more information, see Install Dependencies without Internet Access.

The following commands can be used to install OpenJDK. These commands can be modified to install a separate compatible version of the JDK.

sudo apt-get install openjdk-8-jre-headless

JAVA_HOME:

By default, the JAVA_HOME environment variable is configured to point to a default install location for the OpenJDK package.

NOTE: If you have installed a JDK other than the OpenJDK version provided with the software, you must set the JAVA_HOME environment variable on the Trifacta node to point to the correct install location.

The property value must be updated in the following location:

1. Edit the following file: /opt/trifacta/conf/env.sh
2. Set JAVA_HOME to the correct install location.
3. Save changes.

3. Install Trifacta package

NOTE: If you are installing without Internet access, you must reference the local repository. The command to execute the installer is slightly different. See Install Dependencies without Internet Access.

NOTE: Installing the Trifacta platform in a directory other than the default one is not supported or recommended.

Install the package with apt, using root:

sudo dpkg -i

The previous line may return an error message, which you may ignore. Continue with the following command:

sudo apt-get -f -y install
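For illustration only, if the installer package were named trifacta-server.deb (a hypothetical filename), the sequence would be:

# hypothetical filename; replace with your actual Trifacta .deb package
sudo dpkg -i trifacta-server.deb
sudo apt-get -f -y install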

4. Verify Install

The product is installed in the following directory:

/opt/trifacta

JAVA_HOME:

The platform must be made aware of the location of Java.

Steps:

1. Edit the following file: /opt/trifacta/conf/trifacta-conf.json 2. Update the following parameter value:

"env": { "JAVA_HOME": "/usr/lib/jvm/java-1.8.0-openjdk.x86_64" },

3. Save changes.

5. Install License Key

Please install the license key provided to you by Trifacta. See License Key.

6. Store install packages

For safekeeping, you should retain all install packages that have been installed with this Trifacta deployment.

7. Install and configure Trifacta databases

The Trifacta platform requires installation of several databases. If you have not done so already, you must install and configure the databases used to store Trifacta metadata. See Install Databases.

Configuration

After installation is complete, additional configuration is required.

The Trifacta platform requires additional configuration for a successful integration with the datastore. Please review and complete the necessary configuration steps. For more information, see Configure.

License Key

Contents:

Download license key file
Acquire license key
Install your license key
Update your license key
Changing the license key location
Expired license
Invalid license key file

Download license key file

If you have not done so already, download the license key file (license.json) from the location where you acquired the installation package.

Acquire license key

A valid license key (license.json) is provided to each customer prior to installation. Your license key file is a JSON file that contains important information on your license.

NOTE: If your license key has expired, please contact Trifacta Support.

Install your license key

If you are updating your license, you may want to save your previous license key to a new location before overwriting.

NOTE: Do not maintain multiple license key files in this directory.

To apply your license key, copy the key file to the following location in the Trifacta® deployment:

/opt/trifacta/license
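A minimal sketch of applying the key, assuming license.json is in your current working directory and /opt/trifacta/license is the license directory:

sudo cp license.json /opt/trifacta/license/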

Update your license key

After you have installed your license key, you can update your license with a new one through the Admin Settings page. See Admin Settings Page.

Changing the license key location

By default, the license key file in use must be named: license.json.

If needed, you can change the path and filename of the license key. The property is the following:

"license.location"

See Admin Settings Page.

Expired license

NOTE: If your license expires, you cannot use the product until a new and valid license key file has been applied. When administrators attempt to login to the application, they are automatically redirected to a location from which they can upload a new license key file.

Invalid license key file

When you start the Trifacta platform, you may see the following:

Your license key is missing or has expired. Please contact Trifacta Support.

Install Desktop Application

Contents:

Install Process
  Download
  Setup
  Install for Windows
  Windows Command Line Installation and Configuration
Launch the Application
Documentation Note
Troubleshooting
  Cannot connect to server
  "Does Not Support Your Browser" error

If your environment does not support the use of Chrome, you can install the Wrangler Enterprise desktop application to provide the same access and functionality as the Trifacta® application. This desktop application connects to the enterprise Trifacta instance and provides the same capabilities without requiring a locally installed version of Chrome browser.

Wrangler Enterprise desktop application is a hybrid desktop application. Your local application instance accesses registered data files located in the datastore to which the Trifacta node is connected.

Install Process

NOTE: The Wrangler Enterprise desktop application is a 64-bit Microsoft Windows application. It requires a 64-bit version of Windows to execute. The application also supports Single Sign On (SSO), if it is enabled.

Download

To begin, you must download the following Windows MSI file (TrifactaEnterpriseSetup.msi) from the location where your software was provided.

If you are planning to automate installation to desktops in your environment, please also download setTrifactaServer.ps1.

Setup

Before you begin, you should perform any necessary configuration of the Trifacta node before deploying the instances of the application. See Configure for Desktop Application.

Install for Windows

Steps:

1. On your Windows desktop, double-click the MSI file. 2. Follow the on-screen instructions to install the software.

Windows Command Line Installation and Configuration

As an alternative, you can perform installation and initial configuration from the command line. Download the MSI and the PS1 files to a local directory that is accessible.

NOTE: For command line install, you must also download the setTrifactaServer.ps1 file from the download location.

Install software:

msiexec /i /passive

Configure URL of Trifacta node:

setTrifactaServer.ps1 -trifactaServer -installDir

Parameter: trifactaServer
Description: (Required) URL of the server hosting the Trifacta platform. Format: <protocol>://<host>:<port>

Parameter: installDir
Description: (Optional) Specifies the installation directory in the local environment. If not specified, the installation directory defaults to the same path as the installer.

Parameter: common installer parameters
Description: This command supports the following common Windows installer parameters: Verbose, Debug, ErrorAction, ErrorVariable, WarningAction, WarningVariable, OutBuffer, PipelineVariable, and OutVariable. For more information, see about_CommonParameters here: http://go.microsoft.com/fwlink/?LinkID=113216.

After this install is completed, desktop users should be able to use the application normally.

Launch the Application

Steps:

1. When installation is complete, double-click the application icon. 2. For the server, please enter the full URL including port number of the Trifacta instance to which you are connecting.

1. By default, the server is available over port 3005. For more information, please contact your IT administrator. 2. If you connect to the Internet through a proxy, additional configuration is required. See Configure Server Access through Proxy.

NOTE: If you make a mistake in specifying the URL to the server, please uninstall and reinstall the MSI. This step clears the local application cache, and you can enter the appropriate path through the application. See Uninstall below.

3. When the proper URL and port number are provided, you may launch the application. 4. If your environment contains multiple server deployments, you can select the one to which to connect:

Figure: Choose Server 5. Login with your Trifacta account. See Login.

Documentation Note

Unless specifically noted, all features described for Trifacta Wrangler Enterprise or the Trifacta application apply to the Wrangler Enterprise desktop application.

Uninstall

To uninstall from your Windows machine, use the Add or Remove Programs control panel.

Troubleshooting

Cannot connect to server

If you are unable to connect to the server, please do the following:

1. Verify that you are connecting to the appropriate URL. 1. If you are connecting to the incorrect URL, please uninstall the application and re-install using the MSI file. See Uninstall above. 2. Verify if you need to connect to the server through a proxy server. If so, additional configuration is required. See Configure Server Access through Proxy. 3. Check your firewall settings.

"Does Not Support Your Browser" error

This error message indicates that you are trying to connect to an instance of the server that does not support the Wrangler Enterprise desktop application. Please verify that your connection URL is pointed to a supported instance of the server.

Start and Stop the Platform

Contents:

Start
Verify operations
Restart
Stop
Debugging
Troubleshooting
  Error - SequelizeConnectionRefusedError: connect ECONNREFUSED

Tip: The Restart Trifacta button in the Admin Settings page is the preferred method for restarting the platform.

NOTE: The restart button is not available when high availability is enabled for the Trifacta® node.

See Admin Settings Page.

Start

NOTE: These operations must be executed under the root user.

Command:

service trifacta start

Verify operations

Steps:

1. Check logs for errors:

/opt/trifacta/logs/*.log

   1. You can also access logs through the Trifacta® application for each service. See System Services and Logs.
2. Login to the Trifacta application. If available, perform a simple transformation operation. See Login.
3. Run a simple job. See Verify Operations.
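A quick way to scan the logs for obvious problems (a sketch; adjust the pattern and paths to your environment):

grep -iE "error|exception" /opt/trifacta/logs/*.log | tail -n 50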

Restart

Command:

service trifacta restart

When the login page is available, the system has been restarted. See Login.

Tip: If you have made any configuration changes, you should verify operations. See Verify Operations.

Stop

Command:

service trifacta stop

Debugging

You can verify operations of WebHDFS. Command:

curl -i "http://<namenode_host>:<port>/webhdfs/v1/?op=LISTSTATUS&user.name=trifacta"

Troubleshooting

Error - SequelizeConnectionRefusedError: connect ECONNREFUSED

If you have attempted to start the platform after an operating system reboot, you may receive the following error message, and the platform start fails to complete:

2016-10-04T14:03:17.883Z - error: [ENVIRONMENT] Environment Sanity Test Failed
2016-10-04T14:03:17.883Z - error: [ENVIRONMENT] Exception Type: Error
2016-10-04T14:03:17.883Z - error: [ENVIRONMENT] Exception Message: SequelizeConnectionRefusedError: connect ECONNREFUSED

Solution:

NOTE: This solution applies to PostgreSQL 9.6 only. Please modify for your installed database version.

This error can occur when the operating system is restarted. Please execute the following commands to check the PostgreSQL configuration and restart the databases.

chkconfig postgresql-9.6 on
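On systems that use systemd, an equivalent approach is the following (a sketch, assuming the service is named postgresql-9.6; adjust for your installed database version):

sudo systemctl enable postgresql-9.6
sudo systemctl start postgresql-9.6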

Then, restart the platform as normal.

service trifacta restart

Login

NOTE: Administrators of the platform should change the default password for the admin account. See Change Admin Password.

To login to the Trifacta® application, navigate to the following in your browser:

http://<hostname>:<port_number>

where:

<hostname> is the host of the Trifacta application.
<port_number> is the port number to use. Default is 3005.

If you do not have an account, click Register.

If self-registration is enabled, you may be able to login immediately after registering. If Kerberos or secure impersonation is enabled, an administrator must apply a Hadoop principal value to the account before you can login. Please contact your Trifacta administrator. System administrators can enable self-registration. See Configure User Self-Registration.

After you login, you are placed in the Flows page, where you can create and manage your datasets and flows. See Flows Page.

If you are using S3 as your base storage layer and per-user authentication has been enabled, you must provide the AWS credentials to connect to your storage. From the left navigation bar, select Settings > Storage and then select the AWS option. See Configure Your Access to S3.

For a basic walkthrough of the Trifacta application, see Workflow Basics.

To logout:

From the Settings menu, select Logout.

Install Reference

These appendices provide additional information during installation of Trifacta® Wrangler Enterprise.

Topics:

Install SSL Certificate
Change Listening Port
Supported Deployment Scenarios for Cloudera
Supported Deployment Scenarios for Hortonworks
Supported Deployment Scenarios for AWS
Supported Deployment Scenarios for Azure
Uninstall

Install SSL Certificate

Contents:

Pre-requisites
Configure nginx
Modify listening port for Trifacta platform
Add secure HTTP headers
Enable secure cookies
Troubleshooting

You may optionally configure an SSL certificate to secure connections to the web application of the Trifacta® platform.

Pre-requisites

1. A valid SSL certificate for the FQDN where the Trifacta application is hosted
2. Root access to the Trifacta server
3. Trifacta platform is up and running

Configure nginx

There are two separate Nginx services on the server: one service for internal application use, and one service that functions as a proxy between users and the Trifacta application. To install the SSL certificate, all configuration is applied to the proxy process only.

Steps:

1. Log into the Trifacta server as the centos user. Switch to the root user:

sudo su

2. Enable the proxy nginx service so that it starts on boot:

systemctl enable nginx

3. Create a folder for the private key and limit access to it:

sudo mkdir /etc/ssl/private/ && sudo chmod 700 /etc/ssl/private

4. Copy the following files to the server. If you copy and paste the content, please ensure that you do not miss characters or insert unwanted characters.

1. The .key file should go into the /etc/ssl/private/ directory.
2. The .crt file and the CA bundle/intermediate certificate bundle should go into the /etc/ssl/certs/ directory.

NOTE: The delivery name and format of these files varies by provider. Please verify with your provider's documentation if this is unclear.

3. Your certificate and the intermediate/authority certificate must be combined into one file for nginx. Here is an example of how to combine them together:

cat example_com.crt bundle.crt >> ssl-bundle.crt

5. Update the permissions on these files. Modify the following filenames as necessary:

sudo chmod 600 /etc/ssl/certs/ssl-bundle.crt sudo chmod 600 /etc/ssl/private/your-private-cert.key

6. Use the following commands to deploy the example SSL configuration file provided on the server:

NOTE: Below, some values are too long for a single line. Single lines that overflow to additional lines are marked with a \ . The backslash should not be included if the line is used as input.

cp /opt/trifacta/conf/ssl-nginx.conf.sample /etc/nginx/conf.d /trifacta.conf && \ rm /etc/nginx/conf.d/default.conf

7. Edit the following file:

/etc/nginx/conf.d/trifacta.conf

8. Please modify the following key directives at least:

Directive Description

server_name FQDN of the host, which must match the SSL certificate's Common Name

ssl_certificate Path to the file of the certificate bundle that you created on the server. This value may not require modification.

ssl_certificate_key Path to the .key file on the server.

Example file:

server {
    listen 443;
    ssl on;
    server_name EXAMPLE.CUSTOMER.COM;
    # Don't limit the size of client uploads.
    client_max_body_size 0;
    access_log /var/log/nginx/ssl-access.log;
    error_log /var/log/nginx/ssl-error.log;
    ssl_certificate /etc/ssl/certs/ssl-bundle.crt;
    ssl_certificate_key /etc/ssl/certs/EXAMPLE-NAME.key;
    ssl_protocols SSLv3 TLSv1 TLSv1.1 TLSv1.2;
    ssl_ciphers RC4:HIGH:!aNULL:!MD5;
    ssl_prefer_server_ciphers on;
    keepalive_timeout 60;
    ssl_session_cache shared:SSL:10m;
    ssl_session_timeout 10m;
    location / {
        proxy_pass http://localhost:3005;
        proxy_next_upstream error timeout invalid_header http_500 http_502 http_503 http_504;
        proxy_set_header Accept-Encoding "";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        add_header Front-End-Https on;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_redirect off;
    }
    proxy_connect_timeout 6000;
    proxy_send_timeout 6000;
    proxy_read_timeout 6000;
    send_timeout 6000;
}
server {
    listen 80;
    return 301 https://$host$request_uri;
}

9. Save the file. 10. To apply the new configuration, start or restart the nginx service:

service nginx restart
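Before or after restarting, it can be helpful to validate the configuration and confirm that the certificate and private key match (a sketch; substitute your actual filenames):

# Validate nginx configuration syntax
sudo nginx -t
# The two digests below should be identical if the certificate and key match
openssl x509 -noout -modulus -in /etc/ssl/certs/ssl-bundle.crt | openssl md5
openssl rsa -noout -modulus -in /etc/ssl/private/your-private-cert.key | openssl md5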

Modify listening port for Trifacta platform

If you have changed the listening port as part of the above configuration change, then the proxy.port setting in Trifacta platform configuration must be updated. See Change Listening Port.

Add secure HTTP headers

If you have enabled SSL on the platform, you can optionally insert the following additional headers to all requests to the Trifacta node:

Header: X-XSS-Protection
Protocol: HTTP and HTTPS
Required Parameters: proxy.securityHeaders.enabled=true

Header: X-Frame-Options
Protocol: HTTP and HTTPS
Required Parameters: proxy.securityHeaders.enabled=true

Header: Strict-Transport-Security
Protocol: HTTPS
Required Parameters: proxy.securityHeaders.enabled=true and proxy.securityHeaders.httpsHeaders=true

NOTE: SSL must be enabled to apply these security headers.

Steps:

To add these headers to all requests, please apply the following change:

1. You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods. 2. Locate the following setting and change its value to true:

"proxy.securityHeaders.httpsHeaders": false,

3. Save your changes and restart the platform.
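After the restart, you can spot-check that the headers are returned (a sketch; trifacta.example.com is a hypothetical hostname):

curl -skI https://trifacta.example.com | grep -iE 'x-xss-protection|x-frame-options|strict-transport-security'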

Enable secure cookies

If you have enabled SSL on the platform, you can optionally enable the use of secure cookies.

NOTE: SSL must be enabled.

Steps:

1. You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods. 2. Locate the following setting and change its value to true:

"webapp.session.cookieSecureFlag": false,

3. Save your changes and restart the platform.

Troubleshooting

Problem - SELinux blocks proxy service from communicating with internal app service

If the Trifacta platform is installed on SELinux, the operating system blocks communications between the service that manages the proxy between users and the application and the service that manages internal application communications.

To determine if this problem is present, execute the following command:

sudo cat /var/log/audit/audit.log | grep nginx | grep denied

The problem is present if an error similar to the following is returned:

type=AVC msg=audit(1555533990.045:1826142): avc: denied { name_connect } for pid=25516 comm="nginx" dest=3005 scontext=system_u:system_r:httpd_t:s0

For more information on this issue, see https://www.nginx.com/blog/using-nginx-plus-with-selinux.

Solution:

The solution is to enable the following network connection through the operating system:

sudo setsebool -P httpd_can_network_connect 1
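To confirm that the boolean is now enabled (a quick check):

getsebool httpd_can_network_connect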

Restart the platform.

Change Listening Port

If you need to change the listening port for the Trifacta® platform, please complete the following instructions.

Tip: This change most typically applies if you are enabling use of SSL. For more information, see Install SSL Certificate.

NOTE: By default, the platform listens on port 3005 . All client browsing devices must be configured to enable use of this port or any port number that you choose to use.

Steps:

1. Login to the Trifacta node as an admin. 2. Edit the following file:

/opt/trifacta/conf/nginx.conf

3. Edit the following setting:

server { listen 3005; ...

4. Save the file. 5. You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods. 6. Locate the following setting:

"proxy.port": 3005,

7. Set this value to the same value you applied in nginx.conf. 8. Save your changes and restart the platform.
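To confirm that both values agree after your edits, you can inspect the two files (a sketch; adjust paths if your install location differs):

grep -n "listen" /opt/trifacta/conf/nginx.conf
grep -n "proxy.port" /opt/trifacta/conf/trifacta-conf.json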

Supported Deployment Scenarios for AWS

Contents:

AWS Deployment Scenarios
AWS Installations
  Trifacta Data Preparation for Amazon Redshift and S3 on AWS Marketplace (AMI)
  Trifacta Wrangler Enterprise on AWS Marketplace with EMR
  Trifacta Wrangler Enterprise on EC2 Instance
AWS Integrations

AWS Deployment Scenarios

The following are the basic AWS deployment scenarios.

Trifacta platform deployed through AWS Marketplace:

Deployment Scenario: Trifacta Data Preparation for Amazon Redshift and S3 through AWS Marketplace CloudFormation template
Trifacta node installation: EC2
Base Storage Layer: S3
Storage - S3: read/write
Storage - Redshift: read/write
Cluster: None
Notes: Trifacta Data Preparation for Amazon Redshift and S3 does not support integration with any running environment clusters. All job execution occurs in the Trifacta Photon running environment on the Trifacta node. This scenario is suitable for smaller user groups and data volumes.

Deployment Scenario: Trifacta Wrangler Enterprise AWS install through AWS Marketplace CloudFormation template - with integration to EMR cluster
Trifacta node installation: EC2
Base Storage Layer: S3
Storage - S3: read/write
Storage - Redshift: read/write
Cluster: EMR
Notes: This deployment scenario integrates by default with an EMR cluster, which is created as part of the process. It does not support integration with a Hadoop cluster.

Deployment Scenario: Trifacta Wrangler Enterprise AWS install through AWS Marketplace - without integration to EMR cluster
Trifacta node installation: EC2
Base Storage Layer: S3
Storage - S3: read/write
Storage - Redshift: read/write
Cluster: EMR
Notes: This deployment scenario assumes that the platform is to be integrated at a later time with a pre-existing EMR cluster.

Trifacta platform installed on AWS:

Deployment Scenario: Trifacta Wrangler Enterprise AWS install with S3 read access
Trifacta node installation: EC2
Base Storage Layer: HDFS
Storage - S3: read only
Storage - Redshift: Not supported
Cluster: EMR
Notes: When HDFS is the base storage layer, the only accessible AWS resource is read-only access to S3.

Deployment Scenario: Trifacta Wrangler Enterprise AWS install with S3 read/write access
Trifacta node installation: EC2
Base Storage Layer: S3
Storage - S3: read/write
Storage - Redshift: read/write
Cluster: EMR

Trifacta platform installed on-premises and integrated with AWS resources:

Deployment Scenario: Trifacta Wrangler Enterprise on-premises install with S3 read access
Trifacta node installation: On-premises
Base Storage Layer: HDFS
Storage - S3: read only
Storage - Redshift: Not supported
Cluster: Hadoop
Notes: When HDFS is the base storage layer, the only accessible AWS resource is read-only access to S3.

For more information, see Install Software.

Microsoft Azure integration with AWS-based resources is not supported. See Install for Azure.

Legend and Notes:

Column Notes

Deployment Scenario Description of the AWS-connected deployment

Trifacta node installation Location where the Trifacta node is installed in this scenario.

All AWS installations are installed on EC2 instances.

Base Storage Layer When the Trifacta platform is first installed, the base storage layer must be set.

NOTE: After you have begun using the product, you cannot change the base storage layer.

NOTE: Read/write access to AWS-based resources requires that S3 be set as the base storage layer.

Storage - S3 Trifacta Wrangler Enterprise supports read access to S3 when the base storage layer is set to HDFS.

For read/write access to S3, the base storage layer must be set to S3.

Storage - Redshift For access to Redshift, the base storage layer must be set to S3.

Cluster List of cluster types that are supported for integration and job execution at scale.

The Trifacta platform can integrate with at most one cluster. It cannot integrate with two different clusters at the same time. Access to an EMR cluster requires S3 to be the base storage layer. Smaller jobs can be executed on the Trifacta Photon running environment, which is hosted on the Trifacta node itself. For more information, see Running Environment Options.

Notes Any additional notes

AWS Installations

Trifacta Data Preparation for Amazon Redshift and S3 on AWS Marketplace (AMI)

Through the Amazon Marketplace, you can license and deploy an AMI of Trifacta Data Preparation for Amazon Redshift and S3, which does not require integration with a clustered running environment. All job execution happens within the AMI on the EC2 instance that you deploy. For more information, see the Trifacta Data Preparation for Amazon Redshift and S3 listing for AWS Marketplace.

For install and configuration instructions, see Install from AWS Marketplace.

Trifacta Wrangler Enterprise on AWS Marketplace with EMR

You can deploy an AMI of the Trifacta platform onto an EC2 instance. For more information, see the Trifacta Wrangler Enterprise listing for AWS Marketplace.

You can deploy it in either of the following ways:

1. Auto-create a 3-node EMR cluster. For more information on installation, see Install from AWS Marketplace with EMR. 2. Integrate it later with your pre-existing EMR cluster. 1. For more information on base AWS configuration, see Configure for AWS. 2. For more information on configuring integration with EMR, see Configure for EMR.

Trifacta Wrangler Enterprise on EC2 Instance

When the Trifacta platform is installed on AWS, it is deployed on an EC2 instance. Through the EC2 console, there are a few key parameters that must be specified.

NOTE: After you have created the instance, you should retain the instanceId from the console, which must be applied to the configuration in the Trifacta platform.

For more information, see Install.

For more information on base AWS configuration, see Configure for AWS.

For more information on configuring EC2, see Configure for EC2 Role-Based Authentication.

AWS Integrations

The following table describes the different AWS components that can host or integrate with the Trifacta platform. Combinations of one or more of these items constitute one of the deployment scenarios listed in the following section.

AWS Service: EC2
Description: Amazon Elastic Compute Cloud (EC2) can be used to host the Trifacta node in a scalable cloud-based environment. The following deployments are supported: Trifacta Wrangler Enterprise with or without access to an EMR cluster, and Trifacta Data Preparation for Amazon Redshift and S3 on an AMI.
Base Storage Layer: Base storage layer can be S3 or HDFS. If set to HDFS, only read access to S3 is permitted.

AWS Service: S3
Description: Amazon Simple Storage Service (S3) can be used for reading data sources, writing job results, and hosting the Trifacta databases.
Base Storage Layer: Base storage layer can be S3 or HDFS. If set to HDFS, only read access to S3 is permitted.

AWS Service: Redshift
Description: Amazon Redshift provides a scalable data warehouse platform, designed for big data analytics applications. The Trifacta platform can be configured to read and write from Amazon Redshift database tables.
Base Storage Layer: S3
Other Required AWS Services: S3

AWS Service: EMR
Description: For more information on supported versions of EMR, see Configure for EMR.
Base Storage Layer: S3
Other Required AWS Services: EC2 instance

AWS Service: Amazon RDS
Description: Optionally, the Trifacta databases can be installed on Amazon RDS. For more information, see Install Databases on Amazon RDS.
Base Storage Layer: S3

AWS Marketplace integrations:

AWS Service: AMI
Description: Through the AWS Marketplace, you can license and install an Amazon Machine Image (AMI) instance of Trifacta Data Preparation for Amazon Redshift and S3. This product is intended for smaller user groups that do not need large-scale processing on Hadoop-based clusters.
Base Storage Layer: S3 (NOTE: HDFS is not supported.)
Other Required AWS Services: EC2 instance

AWS Service: EMR
Description: Through the AWS Marketplace, you can license and install an AMI specifically configured to work with Amazon Elastic Map Reduce (EMR), a Hadoop-based data processing platform.
Base Storage Layer: S3
Other Required AWS Services: AMI

Uninstall

To remove Trifacta® Wrangler Enterprise, execute as root user one of the following commands on the Trifacta node.

NOTE: All platform and cluster configuration files are preserved. User metadata is preserved in the Trifacta database.

CentOS/RHEL:

sudo rpm -e trifacta

Ubuntu:

sudo apt-get remove trifacta

Configure for AWS

Contents:

Internet Access
Database Installation
Base AWS Configuration
  Base Storage Layer
  Configure AWS Region
AWS Authentication
  AWS Auth Mode
  AWS Credential Provider
AWS Storage
  S3 Sources
  Redshift Connections
AWS Clusters
  EMR
  Hadoop

This documentation applies to installation from a supported Marketplace. Please use the installation instructions provided with your deployment.

If you are installing or upgrading a Marketplace deployment, please use the available PDF content. You must use the install and configuration PDF available through the Marketplace listing.

The Trifacta® platform can be hosted within Amazon and supports integrations with multiple services from Amazon Web Services, including combinations of services for hybrid deployments. This section provides an overview of the integration options, as well as links to related configuration topics.

For an overview of AWS deployment scenarios, see Supported Deployment Scenarios for AWS.

Internet Access

From AWS, the Trifacta platform requires Internet access for the following services:

NOTE: Depending on your AWS deployment, some of these services may not be required.

AWS S3
Key Management System [KMS] (if SSE-KMS server-side encryption is enabled)
Secure Token Service [STS] (if temporary credential provider is used)
EMR (if integration with an EMR cluster is enabled)

NOTE: If the Trifacta platform is hosted in a VPC where Internet access is restricted, access to S3, KMS and STS services must be provided by creating a VPC endpoint. If the platform is accessing an EMR cluster, a proxy server can be configured to provide access to the AWS ElasticMapReduce regional endpoint.

Database Installation

The following database scenarios are supported.

Database Host Description

Cluster node By default, the Trifacta databases are installed on PostgreSQL instances in the Trifacta node or another accessible node in the enterprise environment. For more information, see Install Databases.

Amazon RDS For Amazon-based installations, you can install the Trifacta databases on PostgreSQL instances on Amazon RDS. For more information, see Install Databases on Amazon RDS.

Base AWS Configuration

The following configuration topics apply to AWS in general.

You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.

Base Storage Layer

NOTE: The base storage layer must be set during initial configuration and cannot be modified after it is set.

S3: Most of these integrations require use of S3 as the base storage layer, which means that data uploads, default location of writing results, and sample generation all occur on S3. When base storage layer is set to S3, the Trifacta platform can:

read and write to S3 read and write to Redshift connect to an EMR cluster

HDFS: In on-premises installations, it is possible to use S3 as a read-only option for a Hadoop-based cluster when the base storage layer is HDFS. You can configure the platform to read from and write to S3 buckets during job execution and sampling. For more information, see Enable S3 Access.

For more information on setting the base storage layer, see Set Base Storage Layer.

For more information, see Storage Deployment Options.

Configure AWS Region

For Amazon integrations, you can configure the Trifacta node to connect to Amazon datastores located in different regions.

NOTE: This configuration is required under any of the following deployment conditions:

1. The Trifacta node is installed on-premises, and you are integrating with Amazon resources.
2. The EC2 instance hosting the Trifacta node is located in a different AWS region than your Amazon datastores.
3. The Trifacta node or the EC2 instance does not have access to s3.amazonaws.com.

1. In the AWS console, please identify the location of your datastores in other regions. For more information, see the Amazon documentation. 2. Login to the Trifacta application. 3. You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods. 4. Set the value of the following property to the region where your S3 datastores are located:

aws.s3.region

If the above value is not set, then the Trifacta platform attempts to infer the region based on default S3 bucket location. 5. Save your changes.

AWS Authentication

The following table illustrates the various methods of managing authentication between the platform and AWS. The matrix of options is basically determined by the settings for two key parameters.

credential provider - source of credentials: platform (default), instance (EC2 instance only), or temporary
AWS mode - the method of authentication from platform to AWS: system-wide or by-user

Credential Provider: Default
  AWS Mode = System: One system-wide key/secret combo is inserted in the platform for use.
    Config:
    "aws.credentialProvider": "default",
    "aws.mode": "system",
    "aws.s3.key": "<key>",
    "aws.s3.secret": "<secret>",
  AWS Mode = User: Each user provides a key/secret combo.
    Config:
    "aws.credentialProvider": "default",
    "aws.mode": "user",
    User: Configure Your Access to S3

Credential Provider: Instance
  AWS Mode = System: Platform uses EC2 instance roles.
    Config:
    "aws.credentialProvider": "instance",
    "aws.mode": "system",
  AWS Mode = User: Users provide EC2 instance roles.
    Config:
    "aws.credentialProvider": "instance",
    "aws.mode": "user",

Credential Provider: Temporary
  AWS Mode = System: Temporary credentials are issued based on per-user IAM roles.
    Config:
    "aws.credentialProvider": "temporary",
    "aws.mode": "system",
  AWS Mode = User: Per-user authentication when using IAM role.
    Config:
    "aws.credentialProvider": "instance",
    "aws.mode": "user",

AWS Auth Mode

When connecting to AWS, the platform supports the following basic authentication modes:

Mode: system
Configuration: "aws.mode": "system",
Description: Access to AWS resources is managed through a single, system account. The account that you specify is based on the credential provider selected below. The instance credential provider ignores this setting. See below.

Mode: user
Configuration: "aws.mode": "user",
Description: Authentication must be specified for individual users.
Tip: In AWS user mode, Trifacta administrators can manage S3 access for users through the Admin Settings page. See Manage Users.

AWS Credential Provider

The Trifacta platform supports the following methods of providing credentialed access to AWS and S3 resources.

Type: default
Configuration: "aws.credentialProvider": "default",
Description: This method uses the provided AWS Key and Secret values to access resources. See below.

Type: instance
Configuration: "aws.credentialProvider": "instance",
Description: When you are running the Trifacta platform on an EC2 instance, you can leverage your enterprise IAM roles to manage permissions on the instance for the Trifacta platform. See below.

Type: temporary
Description: Details are below.

Default credential provider

Whether the AWS access mode is set to system or user, the default credential provider for AWS and S3 resources is the Trifacta platform.

Mode: system
Description: A single AWS Key and Secret is inserted into platform configuration. This account is used to access all resources and must have the appropriate permissions to do so.
Configuration: "aws.mode": "system", "aws.s3.key": "<key>", "aws.s3.secret": "<secret>",

Mode: user
Description: Each user must specify an AWS Key and Secret in the account to access resources. For more information on configuring individual user accounts, see Configure Your Access to S3.
Configuration: "aws.mode": "user",

Default credential provider with EMR:

If you are using this method and integrating with an EMR cluster:

Copying the custom credential JAR file must be added as a bootstrap action to the EMR cluster definition. See Configure for EMR. As an alternative to copying the JAR file, you can use the EMR EC2 instance-based roles to govern access. In this case, you must set the following parameter:

"aws.emr.forceInstanceRole": true,

For more information, see Configure for EC2 Role-Based Authentication.

Instance credential provider

When the platform is running on an EC2 instance, you can manage permissions through pre-defined IAM roles.

NOTE: If the Trifacta platform is connected to an EMR cluster, you can force authentication to the EMR cluster to use the specified IAM instance role. See Configure for EMR.

For more information, see Configure for EC2 Role-Based Authentication.

Temporary credential provider

For even better security, you can enable the use of temporary credentials provided from your AWS resources, based on an IAM role specified per user.

Tip: This method is recommended by AWS.

Set the following properties.

Property: "aws.credentialProvider"
Description: If aws.mode = system, set this value to temporary. If aws.mode = user and you are using per-user authentication, then this setting is ignored and should stay as default.

Per-user authentication

Individual users can be configured to provide temporary credentials for access to AWS resources, which is a more secure authentication solution. For more information, see Configure AWS Per-User Authentication.

AWS Storage

S3 Sources

To integrate with S3, additional configuration is required. See Enable S3 Access.

Redshift Connections

You can create connections to one or more Redshift databases, from which you can read database sources and to which you can write job results. Samples are still generated on S3.

NOTE: Relational connections require installation of an encryption key file on the Trifacta node. For more information, see Create Encryption Key File.

For more information, see Create Redshift Connections.

AWS Clusters

Trifacta Wrangler Enterprise can integrate with one instance of either of the following.

NOTE: If Trifacta Wrangler Enterprise is installed through the Amazon Marketplace, only the EMR integration is supported.

EMR

When Trifacta Wrangler Enterprise is installed through AWS, you can integrate with an EMR cluster for Spark-based job execution. For more information, see Configure for EMR.

Hadoop

If you have installed Trifacta Wrangler Enterprise on-premises or directly into an EC2 instance, you can integrate with a Hadoop cluster for Spark-based job execution. See Configure for Hadoop.

Configure for EC2 Role-Based Authentication

Contents:

IAM roles
AWS System Mode
Additional AWS Configuration
Use of S3 Sources

When you are running the Trifacta platform on an EC2 instance, you can leverage your enterprise IAM roles to manage permissions on the instance for the Trifacta platform. When this type of authentication is enabled, Trifacta administrators can apply a role to the EC2 instance where the platform is running. That role's permissions apply to all users of the platform.

IAM roles

Before you begin, your IAM roles should be defined and attached to the associated EC2 instance.

NOTE: The IAM instance role used for S3 access should have access to resources at the bucket level.

For more information, see http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html.

AWS System Mode

To enable role-based instance authentication, the following parameter must be enabled.

"aws.mode": "system",

Additional AWS Configuration

The following additional parameters must be specified:

Parameter Description

aws.credentialProvider Set this value to instance. IAM instance role is used for providing access.

aws.hadoopFsUseSharedInstanceProvider Set this value to true for CDH. The class information is provided below.

Shared instance provider class information

Hortonworks:

"com.amazonaws.auth.InstanceProfileCredentialsProvider",

Pre-Cloudera 6.0.0:

"org.apache.hadoop.fs.s3a.SharedInstanceProfileCredentialsProvider"

Cloudera 6.0.0 and later:

Set the above parameters as follows:

"aws.credentialProvider": "instance", "aws.hadoopFSUseSharedInstanceProvider": false,

Use of S3 Sources

To access S3 for storage, additional configuration for S3 may be required.

NOTE: Do not configure the properties that apply to user mode.

Output sizing recommendations:

Single-file output: If you are generating a single file, you should try to keep its size under 1 GB.
Multi-part output: For multiple-file outputs, each part file should be under 1 GB in size. For more information, see https://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-use-multiple-files.html

For more information, see Enable S3 Access.

Enable S3 Access

Contents:

Base Storage Layer
Limitations
Pre-requisites
  Required AWS Account Permissions
  Read-only access policies
  Write access policies
  Other AWS policies for S3
Configuration
  Define base storage layer
  Enable read access to S3
  S3 access modes
  System mode - additional configuration
  S3 Configuration
  Configuration reference
  Enable use of server-side encryption
  Configure S3 filewriter
Create Redshift Connection
Hadoop distribution-specific configuration
  Hortonworks
Additional Configuration for S3
Testing
Troubleshooting
  Profiling consistently fails for S3 sources of data
  Spark local directory has no space

Below are instructions on how to configure Trifacta® Wrangler Enterprise to point to S3.

Simple Storage Service (S3) is an online data storage service provided by Amazon, which provides low- latency access through web services. For more information, see https://aws.amazon.com/s3/.

NOTE: Please review the limitations before enabling. See Limitations of S3 Integration below.

Base Storage Layer

If base storage layer is S3: you can enable read/write access to S3. If base storage layer is not S3: you can enable read only access to S3.

Limitations

The Trifacta platform supports S3 connectivity for the following distributions:

NOTE: You must use a Hadoop version that is supported for your release of the product.

Cloudera 5.10 and later. See Supported Deployment Scenarios for Cloudera.
Hortonworks 2.5 and later. See Supported Deployment Scenarios for Hortonworks.
The Trifacta platform only supports running S3-enabled instances over AWS.
Access to AWS S3 Regional Endpoints through internet protocol is required. If the machine hosting the Trifacta platform is in a VPC with no internet access, a VPC endpoint enabled for S3 services is required.
The Trifacta platform does not support access to S3 through a proxy server.

Write access requires using S3 as the base storage layer. See Set Base Storage Layer.

NOTE: Spark 2.3.0 jobs may fail on S3-based datasets due to a known incompatibility. For details, see https://github.com/apache/incubator-druid/issues/4456.

If you encounter this issue, please set spark.version to 2.1.0 in platform configuration. For more information, see Admin Settings Page.

Pre-requisites

On the Trifacta node, you must install the Oracle Java Runtime Environment for Java 1.8. Other versions of the JRE are not supported. For more information on the JRE, see http://www.oracle.com/technetwork/java/javase/downloads/index.html.

If IAM instance role is used for S3 access, it must have access to resources at the bucket level.

Required AWS Account Permissions

All access to S3 sources occurs through a single AWS account (system mode) or through an individual user's account (user mode). For either mode, the AWS access key and secret combination must provide access to the default bucket associated with the account.

NOTE: These permissions should be set up by your AWS administrator.

Read-only access policies

NOTE: To enable viewing and browsing of all folders within a bucket, the following permissions are required:

The system account or individual user accounts must have the ListAllMyBuckets access permission for the bucket. All objects to be browsed within the bucket must have Get access enabled.

The policy statement to enable read-only access to your default S3 bucket should look similar to the following. Replace 3c-my-s3-bucket with the name of your bucket:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::3c-my-s3-bucket",
        "arn:aws:s3:::3c-my-s3-bucket/*"
      ]
    }
  ]
}

Write access policies

Write access is enabled by adding the PutObject and DeleteObject actions to the above. Replace 3c-my-s3-bucket with the name of your bucket:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::3c-my-s3-bucket",
        "arn:aws:s3:::3c-my-s3-bucket/*"
      ]
    }
  ]
}

Other AWS policies for S3

Policy for access to Trifacta public buckets

NOTE: This feature must be enabled. For more information, see Enable Onboarding Tour.

To access S3 assets that are managed by Trifacta, you must apply the following policy definition to any IAM role that is used to access Trifacta Wrangler Pro. This bucket contains demo assets for the onboarding tour:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor1",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::trifacta-public-datasets/*",
        "arn:aws:s3:::trifacta-public-datasets"
      ]
    }
  ]
}

For more information on creating policies, see https://console.aws.amazon.com/iam/home#/policies.

KMS policy

If any accessible bucket is encrypted with KMS-SSE, another policy must be deployed. For more information, see https://docs.aws.amazon.com/kms/latest/developerguide/iam-policies.html.

Configuration

Depending on your S3 environment, you can define:

read access to S3
access to additional S3 buckets
S3 as base storage layer
write access to S3
S3 bucket that is the default write destination

Define base storage layer

The base storage layer is the default platform for storing results.

Required for:

Write access to S3 Connectivity to Redshift

The base storage layer for your Trifacta instance is defined during initial installation and cannot be changed afterward.

If S3 is the base storage layer, you must also define the default storage bucket to use during initial installation, which cannot be changed at a later time. See Define default S3 write bucket below.

For more information on the various options for storage, see Storage Deployment Options.

For more information on setting the base storage layer, see Set Base Storage Layer.

Enable read access to S3

When read access is enabled, Trifacta users can explore S3 buckets for creating datasets.

NOTE: When read access is enabled, Trifacta users have automatic access to all buckets to which the specified S3 user has access. You may want to create a specific user account for S3 access.

NOTE: Data that is mirrored from one S3 bucket to another might inherit the permissions from the bucket where it is owned.

Steps:

1. You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods. 2. Set the following property to true:

"aws.s3.enabled": true,

3. Save your changes. 4. In the S3 configuration section, set enabled=true, which allows Trifacta users to browse S3 buckets through the Trifacta application. 5. Specify the AWS key and secret values for the user to access S3 storage.

S3 access modes

The Trifacta platform supports the following modes for accessing S3. You must choose one access mode and then complete the related configuration steps.

NOTE: Avoid switching between user mode and system mode, which can disable user access to data. At install time, you should choose your preferred mode.

System mode

(default) Access to S3 buckets is enabled and defined for all users of the platform. All users use the same AWS access key, secret, and default bucket.

System mode - read-only access

For read-only access, the key, secret, and default bucket must be specified in configuration.

NOTE: Please verify that the AWS account has all required permissions to access the S3 buckets in use. The account must have the ListAllMyBuckets ACL among its permissions.

Steps:

1. You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods. 2. Locate the following parameters:

Parameters Description

aws.s3.key Set this value to the AWS key to use to access S3.

aws.s3.secret Set this value to the secret corresponding to the AWS key provided.

aws.s3.bucket.name Set this value to the name of the S3 bucket from which users may read data.

NOTE: Additional buckets may be specified. See below.

3. Save your changes.

User mode

Optionally, access to S3 can be defined on a per-user basis. This mode allows administrators to define access to specific buckets using various key/secret combinations as a means of controlling permissions.

NOTE: When this mode is enabled, individual users must have AWS configuration settings applied to their account, either by an administrator or by themselves. The global settings in this section do not apply in this mode.

To enable:

1. You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods. 2. Please verify that the settings below have been configured:

"aws.s3.enabled": true, "aws.mode": "user",

3. Additional configuration is required for per-user authentication. For more information, see Configure AWS Per-User Authentication.

User mode - Create encryption key file

If you have enabled user mode for S3 access, you must create and deploy an encryption key file. For more information, see Create Encryption Key File.

NOTE: If you have enabled user access mode, you can skip the following sections, which pertain to the system access mode, and jump to the Enable Redshift Connection section below.

System mode - additional configuration

The following sections apply only to system access mode.

Define default S3 write bucket

When S3 is defined as the base storage layer, write access to S3 is enabled. The Trifacta platform attempts to store outputs in the designated default S3 bucket.

NOTE: This bucket must be set during initial installation. Modifying it at a later time is not recommended and can result in inaccessible data in the platform.

NOTE: Bucket names cannot have underscores in them. See http://docs.aws.amazon.com/AmazonS3/latest/dev/BucketRestrictions.html.

Steps:

1. Define S3 to be the base storage layer. See Set Base Storage Layer.
2. Enable read access. See Enable read access.
3. Specify a value for aws.s3.bucket.name, which defines the S3 bucket where data is written. Do not include a protocol identifier. For example, if your bucket address is s3://MyOutputBucket, the value to specify is the following:

MyOutputBucket

NOTE: Specify the top-level bucket name only. There should not be any backslashes in your entry.

NOTE: This bucket also appears as a read-access bucket if the specified S3 user has access.
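In trifacta-conf.json, this setting corresponds to the following entry, shown here as a sketch using the example bucket name from above:

"aws.s3.bucket.name": "MyOutputBucket",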

Adding additional S3 buckets

When read access is enabled, all S3 buckets of which the specified user is the owner appear in the Trifacta application. You can also add additional S3 buckets from which to read.

NOTE: Additional buckets are accessible only if the specified S3 user has read privileges.

NOTE: Bucket names cannot have underscores in them.

Steps:

1. You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.
2. Locate the following parameter: aws.s3.extraBuckets:

1. In the Admin Settings page, specify the extra buckets as a comma-separated string of additional S3 buckets that are available for storage. Do not put any quotes around the string. Whitespace between string values is ignored.
2. In trifacta-conf.json, specify the extraBuckets array as a comma-separated list of buckets as in the following:

"extraBuckets": ["MyExtraBucket01","MyExtraBucket02"," MyExtraBucket03"]

NOTE: Specify the top-level bucket name only. There should not be any backslashes in your entry.

3. Specify the extraBuckets array as a comma-separated list of buckets as in the following:

"extraBuckets": ["MyExtraBucket01","MyExtraBucket02"," MyExtraBucket03"]

4. These values are mapped to the following bucket addresses:

s3://MyExtraBucket01
s3://MyExtraBucket02
s3://MyExtraBucket03
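Putting these together, a sketch of how the default write bucket and the extra buckets from the examples above might appear in trifacta-conf.json:

"aws.s3.bucket.name": "MyOutputBucket",
"aws.s3.extraBuckets": ["MyExtraBucket01", "MyExtraBucket02", "MyExtraBucket03"]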

S3 Configuration

Configuration reference

You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.

"aws.s3.enabled": true, "aws.s3.bucket.name": "" "aws.s3.key": "", "aws.s3.secret": "", "aws.s3.extraBuckets": [""]

Setting Description

enabled When set to true, the S3 file browser is displayed in the GUI for locating files. For more information, see S3 Browser.

bucket.name Set this value to the name of the S3 bucket to which you are writing.

When webapp.storageProtocol is set to s3, the output is delivered to aws.s3.bucket.name.

key Access Key ID for the AWS account to use.

NOTE: This value cannot contain a slash (/).

secret Secret Access Key for the AWS account.

extraBuckets Add references to any additional S3 buckets to this comma-separated array of values.

The S3 user must have read access to these buckets.

Enable use of server-side encryption

You can configure the Trifacta platform to publish data on S3 when a server-side encryption policy is enabled. SSE-S3 and SSE-KMS methods are supported. For more information, see http://docs.aws.amazon.com/AmazonS3/latest/dev/serv-side-encryption.html.

Notes:

When encryption is enabled, all buckets to which you are writing must share the same encryption policy. Read operations are unaffected.

To enable, please specify the following parameters.

You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.

Server-side encryption method

"aws.s3.serverSideEncryption": "none",

Set this value to the method of encryption used by the S3 server. Supported values:

NOTE: Lower case values are required.

sse-s3
sse-kms
none

Server-side KMS key identifier

When KMS encryption is enabled, you must specify the AWS KMS key ID to use for the server-side encryption.

"aws.s3.serverSideKmsKeyId": "",

Notes:

Access to the key: Access must be provided to the authenticating user. The AWS IAM role must be assigned to this key.
Encrypt/Decrypt permissions for the specified KMS key ID: Permissions must be assigned to the authenticating user. The AWS IAM role must be given these permissions. For more information, see https://docs.aws.amazon.com/kms/latest/developerguide/key-policy-modifying.html.

The format for referencing this key is the following:

"arn:aws:kms:<region>:<account-id>:key/<key-id>"

You can use an AWS alias in the following formats. The format of the AWS-managed alias is the following:

"alias/aws/s3"

The format for a custom alias is the following:

"alias/"

where:

is the name of the alias for the entire key.
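For example, to enable SSE-KMS using the AWS-managed alias shown above, the settings might look like the following sketch; substitute your own key ARN or custom alias as needed:

"aws.s3.serverSideEncryption": "sse-kms",
"aws.s3.serverSideKmsKeyId": "alias/aws/s3",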

Save your changes and restart the platform.

Configure S3 filewriter

The following configuration can be applied through the Hadoop site-config.xml file. If your installation does not have a copy of this file, you can insert the properties listed in the steps below into trifacta-conf.json to configure the behavior of the S3 filewriter.

Steps:

1. You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.
2. Locate the filewriter.hadoopConfig block, where you can insert the following Hadoop configuration properties:

"filewriter": { max: 16, "hadoopConfig": { "fs.s3a.buffer.dir": "/tmp", "fs.s3a.fast.upload": "true" }, ... }

Property Description

fs.s3a.buffer.dir Specifies the temporary directory on the Trifacta node to use for buffering when uploading to S3. If fs.s3a.fast.upload is set to false, this parameter is unused.

NOTE: This directory must be accessible to the Batch Job Runner process during job execution.

fs.s3a.fast.upload Set to true to enable buffering in blocks.

When set to false, buffering in blocks is disabled. For a given file, the entire object is buffered to the disk of the Trifacta node. Depending on the size and volume of your datasets, the node can run out of disk space.

3. Save your changes and restart the platform.

Create Redshift Connection

For more information, see Create Redshift Connections.

Hadoop distribution-specific configuration

Hortonworks

NOTE: If you are using Spark profiling through Hortonworks HDP on data stored in S3, additional configuration is required. See Configure for Hortonworks.

Additional Configuration for S3

The following parameters can be configured through the Trifacta platform to affect the integration with S3. You may or may not need to modify these values for your deployment.

You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.

Parameter Description

aws.s3.consistencyTimeout S3 does not guarantee that files written to a directory are immediately consistent with the files available for reading. S3 does guarantee that the files are eventually in sync.

This guarantee is important for some platform jobs that write data to S3 and then immediately attempt to read from the written data.

This timeout defines how long the platform waits for this guarantee. If the timeout is exceeded, the job fails. The default value is 120.

Depending on your environment, you may need to modify this value.

aws.s3.endpoint If your S3 deployment is either of the following:

located in a region that does not support the default endpoint, or v4-only signature is enabled in the region

Then, you can specify this setting to point to the S3 endpoint for Java/Spark services. This value should be the S3 endpoint DNS name value.

For more information on this location, see https://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region.
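As a sketch, these two settings might appear in trifacta-conf.json as follows. The timeout is shown at its stated default, and the endpoint value is a hypothetical example (the DNS name of the S3 endpoint for the eu-central-1 region):

"aws.s3.consistencyTimeout": 120,
"aws.s3.endpoint": "s3.eu-central-1.amazonaws.com",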

Testing

Restart services. See Start and Stop the Platform.

Try running a simple job from the Trifacta application. For more information, see Verify Operations.

Troubleshooting

Profiling consistently fails for S3 sources of data

If you are executing visual profiles of datasets sourced from S3, you may see an error similar to the following in the batch-job-runner.log file:

01:19:52.297 [Job 3] ERROR com.trifacta.hadoopdata.joblaunch.server. BatchFileWriterWorker - BatchFileWriterException: Batch File Writer unknown error: {jobId=3, why=bound must be positive} 01:19:52.298 [Job 3] INFO com.trifacta.hadoopdata.joblaunch.server. BatchFileWriterWorker - Notifying monitor for job 3 with status code FAILURE

This issue is caused by improperly configured buffering when writing job results to S3. The specified local buffer directory cannot be accessed as part of the batch job running process, and the job fails to write results to S3.

Solution:

You may do one of the following:

Use a valid temp directory when buffering to S3.
Disable buffering to a directory completely.

Steps:

You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.
Locate the filewriter.hadoopConfig block, where you can insert either of the following Hadoop configuration properties:

"filewriter": { max: 16, "hadoopConfig": { "fs.s3a.buffer.dir": "/tmp", "fs.s3a.fast.upload": false }, ... }

Property Description

fs.s3a.buffer.dir Specifies the temporary directory on the Trifacta node to use for buffering when uploading to S3. If fs.s3a.fast.upload is set to false, this parameter is unused.

fs.s3a.fast.upload When set to false, buffering is disabled.
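If you instead choose the first option above and keep block buffering enabled, the block might look like the following sketch. The directory shown is a hypothetical example; it must exist on the Trifacta node and be accessible to the Batch Job Runner process:

"filewriter": {
  "max": 16,
  "hadoopConfig": {
    "fs.s3a.buffer.dir": "/opt/trifacta/tmp",
    "fs.s3a.fast.upload": "true"
  },
  ...
}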

Save your changes and restart the platform.

Spark local directory has no space

During execution of a Spark job, you may encounter the following error:

org.apache.hadoop.util.DiskChecker$DiskErrorException: No space available in any of the local directories.

Solution:

Restart Trifacta services, which may free up some temporary space.
Use the steps in the preceding solution to reassign a temporary directory for Spark to use (fs.s3a.buffer.dir).

Create Redshift Connections

Contents:

Pre-requisites Limitations Create Connection Create through application Testing

This section provides information on how to enable Redshift connectivity and create one or more connections to Redshift sources.

Amazon Redshift is a hosted data warehouse available through Amazon Web Services. It is frequently used for hosting of datasets used by downstream analytic tools such as Tableau and Qlik. For more information, see https://aws.amazon.com/redshift/.

Pre-requisites

Before you begin, please verify that your Trifacta® environment meets the following requirements:

NOTE: In the Admin Settings page are some deprecated parameters pertaining to Redshift. Please ignore these parameters and their settings. They do not apply to this release.

1. S3 base storage layer: Redshift access requires use of S3 as the base storage layer, which must be enabled. See Set Base Storage Layer.
2. Same region: The Redshift cluster must be in the same region as the default S3 bucket.
3. Integration: Your Trifacta instance is connected to a running environment supported by your product edition.
4. Deployment: Trifacta platform is deployed either on-premises or in EC2.

Limitations

1. When publishing to Redshift through the Publishing dialog, output must be in Avro or JSON format. This limitation does not apply to direct writing to Redshift.
2. You can publish any specific job once to Redshift through the export window. See Publishing Dialog.
3. Management of nulls:
   1. Nulls are displayed as expected in the Trifacta application.
   2. When Redshift jobs are run, the UNLOAD SQL command in Redshift converts all nulls to empty strings. Null values appear as empty strings in generated results, which can be confusing. This is a known issue with Redshift.

Create Connection

You can create Redshift connections through the following methods.

Tip: SSL connections are recommended. Details are below.

Create through application

Any user can create a Redshift connection through the application.

Steps:

1. Log in to the application.
2. In the menu, click Settings menu > Connections.
3. In the Create Connection page, click the Redshift connection card.
4. Specify the properties for your Redshift database connection. The following parameters are specific to Redshift connections:

Property Description

IAM Role ARN for Redshift-S3 Connectivity (Optional) You can specify an IAM role ARN that enables role-based connectivity between Redshift and the S3 bucket that is used as intermediate storage during Redshift bulk COPY/UNLOAD operations. Example:

arn:aws:iam::1234567890:role/MyRedshiftRole

For more information, see Create Connection Window.
5. Click Save.

Enable SSL connections

To enable SSL connections to Redshift, you must enable them first on your Redshift cluster. For more information, see https://docs.aws.amazon.com/redshift/latest/mgmt/connecting-ssl-support.html.

In your connection to Redshift, please add the following string to your Connect String Options:

;ssl=true

Save your changes.

Testing

Import a dataset from Redshift. Add it to a flow, and specify a publishing action. Run a job.

NOTE: When publishing to Redshift through the Publishing dialog, output must be in Avro or JSON format. This limitation does not apply to direct writing to Redshift.

For more information, see Verify Operations.

After you have run your job, you can publish the results to Redshift through the Job Details page. See Publishing Dialog.

Upgrade Overview

This section provides an overview of how Trifacta® Wrangler Enterprise is upgraded, including upgrade prep, software and database upgrade, and post-upgrade cleanup.

Topics:

Upgrade for AWS Marketplace Upgrade for AWS Marketplace with EMR Upgrade for Azure Marketplace

Upgrade for AWS Marketplace

Contents:

Upgrade Process preview Supported paths Upgrade Prep Backup Perform the Cloudformation Stack upgrade Restore your backup onto the new instance Appendix - Post-Upgrade Fixups Upgrade from Release 5.1 or earlier Documentation Related Topics

This documentation applies to installation from a supported Marketplace. Please use the installation instructions provided with your deployment.

If you are installing or upgrading a Marketplace deployment, please use the available PDF content. You must use the install and configuration PDF available through the Marketplace listing.

This guide steps through the requirements and process for upgrading Trifacta® Data Preparation for Amazon Redshift and S3 through the AWS Marketplace.

Upgrade

Process preview

Upgrading to the latest version of Trifacta Data Preparation for Amazon Redshift and S3 via the AWS Marketplace will terminate your existing instance and bring up a new instance with the latest software in its place. Please follow the instructions here carefully.

Your Trifacta instance is deployed as part of a Cloudformation Stack. When you upgrade this stack, Cloudformation informs you of the resources it plans to modify and then manages the modifications. If there is any problem applying these modifications, Cloudformation rolls back its changes. Your existing instance is not deleted until the new one has been brought up successfully.

Upgrade flow:

1. Upgrade prep: Review some items that you should know before upgrading Trifacta Data Preparation for Amazon Redshift and S3.
2. Backup: Before performing the Cloudformation Stack upgrade, you perform two different backups.
3. Upgrade: You perform the Cloudformation Stack upgrade, which also makes the required changes to your environment, including bringing up a new instance which has the latest version of the software.
4. Restore: You restore your backups onto the new version of the software.
5. Post-Upgrade Fixes: You make any changes necessary to get up and running again.

Supported paths

This upgrade process supports upgrade for the following versions:

Source Version: Trifacta Data Preparation for Amazon Redshift and S3 6.0

Target Upgrade Version: Trifacta Data Preparation for Amazon Redshift and S3 6.4

Before You Upgrade

You must upgrade the product one version at a time. If you are upgrading from a version that is earlier than the supported Source Version listed above for this upgrade process, please use the links below to acquire the AMI(s) and documentation to upgrade to the earliest supported Source Version above. Then, return to these instructions to complete the process.

Your Version: Trifacta Data Preparation for Amazon Redshift and S3 5.1
Target Version: Trifacta Data Preparation for Amazon Redshift and S3 6.0.2
AMI: Please see the AWS Marketplace listing for the v6.0 product. The AMI is accessible from there.
Documentation: Trifacta Install Guide for AWS Marketplace

Upgrade Prep

This process requires broad permissions on your AWS account. If you do not have Administrator access, you may encounter errors when Cloudformation tries to modify AWS objects, like IAM or Security Groups.

Copyright © 2019 Trifacta Inc. Page #112 If you have made additional tweaks to the default installation, these changes are likely to be lost. Please review and note any changes, so you can replicate them if necessary. IAM role or policy changes Changes on the Trifacta Server OS itself SSL certificates and configuration The IP address of your Trifacta Server is changing. This change requires a DNS update after the upgrade is complete.

Backup

We'll make two backups: one of the Trifacta software and one of the entire EC2 Instance.

1. SSH to your current Marketplace instance. Example:

ssh -i MyKey.pem centos@<instance-address>

2. Switch to Root user on the server:

sudo su

3. Stop the Trifacta platform on your current Marketplace instance:

service trifacta stop

4. Run the Trifacta backup script:

/opt/trifacta/bin/setup-utils/trifacta-backup-config-and-db.sh

1. When the script is complete, the output identifies the location of the backup. Example:

/opt/trifacta-backups/trifacta-backup-6.0.2+329.20190628114005.900f3da-20191126003834.tgz

5. Store the backup in a safe location. You should copy it to either S3 or to your local computer via SCP.

1. To copy the backup to the S3 bucket used by your installation, you can use this example:

aws s3 cp /opt/trifacta-backups/trifacta-backup-6.0.2+329.20190628114005.900f3da-20191126003834.tgz s3:///trifacta-backups/

2. If you choose to use SCP, please note that the AMI does not allow root login. You must copy the files to the CentOS user's home directory and modify any permissions to allow the CentOS user to read them. Then, you can use SCP to copy the files. Example:

scp -i <key-file> centos@<instance-address>:./example-file.txt ./

6. You should also take a snapshot of the EBS volume backing your EC2 instance.
   1. This step is not needed to restore the product, but it can be useful if you have additional files or configurations to replicate to your new Trifacta instance.

Perform the Cloudformation Stack upgrade

In this section, you upgrade the Cloudformation Stack. This upgrade launches the latest version of the Trifacta software and performs any necessary adjustments to your existing resources.

Steps:

1. Upgrade the Cloudformation Stack.
2. Visit the Marketplace listing page for your product.
3. Under View Usage Instructions, expand the View CloudFormation Template section.
4. Right-click the Download CloudFormation Template link and copy its URL.
5. In your AWS Console, go to CloudFormation.
6. Select your Trifacta Stack.
7. Click Update.
8. Select Replace Current Template.
9. Select Amazon S3 URL, and paste the link to the template in the textbox.
10. Click Next and review the parameters, which should be inherited from your existing Stack.

1. Please review the "Change Set Preview" list, so you are aware of the resources that are changing. 11. Wait until the CloudFormation Stack indicates that the upgrade is finished. 12. Access the Trifacta software via the URL in the Outputs tab of CloudFormation. 13. When you reach the the Login page, the software has started, and you can continue with the following sections.

Restore your backup onto the new instance

1. Connect to your new Trifacta instance via SSH.

NOTE: The IP address has changed.

2. Switch to Root user on the Trifacta instance.

sudo su
cd

3. The Trifacta software starts automatically on boot. Stop it before continuing:

service trifacta stop

4. Download the backup from your storage location and extract its contents. Example:

mkdir -p /root/trifacta-restore-files
cd /root/trifacta-restore-files

aws s3 cp s3:///trifacta-backups/ .
tar xzf

5. Execute the restore script. Pass in the path to your unzipped backup as a parameter. Example:

/opt/trifacta/bin/setup-utils/trifacta-restore-from-backup.sh -r /root/trifacta-restore-files/trifacta-backup-6.0.2+329.20190628114005.900f3da-20191126003834

6. Start up the product:

service trifacta start

7. Verify that the product is working as expected by running jobs.

Appendix - Post-Upgrade Fixups

Upgrade from Release 5.1 or earlier

Photon scaling factor has been removed

Applies if: You modified the Photon scaling properties in your pre-upgrade environment.

In Release 6.0, the Photon scaling factor parameter (photon.loadScalingFactor) has been removed. As part of this change, the following parameters are automatically set to new values as part of the upgrade. The new values are listed below:

"webapp.client.loadLimit": 10485760, "webapp.client.maxResultsBytes": 41943040,

NOTE: If you had not modified either of the above values previously, then no action is required. If you had changed these values before upgrading, the settings are set to the new default values above.

Update Data Service properties

Applies if: You modified the Data Service classpath in your pre-upgrade environment.

After you have upgraded, the Data Service fails to start.

In Release 6.0.0, some configuration files related to the Data Service were relocated, so the classpath values pointing to these files need to be updated.

Steps:

1. To apply this configuration change, log in as an administrator to the Trifacta node. Then, edit trifacta-conf.json. Some of these settings may not be available through the Admin Settings Page. For more information, see Platform Configuration Methods.
2. Locate the data-service.classpath setting. Change the classpath value to point to the correct directory:

/opt/trifacta/conf/data-service

3. Locate the webapp.connectivity.kerberosDelegateConfigPath setting. If you are enabling Kerberos-based SSO for relational connections, please add the following value to the path:

"%(topOfTree)s/services/data-service/build/conf/kerberosdelegate. config"

For more information, see Enable SSO for Relational Connections.
4. Save the file.
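Taken together, the setting from step 3 might appear in trifacta-conf.json as the following sketch; add this value only if you are enabling Kerberos-based SSO for relational connections:

"webapp.connectivity.kerberosDelegateConfigPath": "%(topOfTree)s/services/data-service/build/conf/kerberosdelegate.config",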

SSO signout updates

Applies if: You have enabled SSO for AD/LDAP and have noticed that logout is not working.

After upgrading to this release, signing out of the Trifacta application may not work when SSO is enabled. This issue applies to the reverse proxy method of SSO for AD/LDAP.

NOTE: Beginning in Release 6.0, a platform-native method of SSO is available. This new method is recommended.

Some properties related to the reverse proxy must be updated. Please complete the following:

Steps:

1. Log in to the Trifacta node.
2. Edit the following file:

/opt/trifacta/pkg3p/src/tripache/conf/conf.d/trifacta.conf

3. Add the following rule for the /unauthorized path:

4. Modify the redirection for /sign-out from / to /unauthorized. Remove the rewrite rule:

5. Save the file and restart the platform.

Troubleshooting - disambiguating email accounts

Applies if: You are performing an upgrade to Release 6.0.1 or later.

In Release 6.0.1, the Trifacta platform permitted case-sensitive email addresses. So for purposes of creating user accounts, the following could be different userIds in the platform. Pre-upgrade, the People table might look like the following:

| id | email | other columns |
| 1 | foobar@example.com | * |
| 2 | FOOBAR@example.com | * |
| 3 | FooBar@example.com | * |

Beginning in Release 6.0.2, all email addresses (userIds) are case-insensitive, so the above distinctions are no longer permitted in the platform.

As of Release 6.0.2, all email addresses are converted to lower-case. As part of the upgrade to Release 6.0.2, any email addresses that are case-insensitive matches (foobar and FOOBAR) are disambiguated. After upgrade the People table might look like the following:

| id | email | other columns |
| 1 | foobar@example.com_duplicate_1 | * |
| 2 | foobar@example.com_duplicate_2 | * |
| 3 | foobar@example.com | * |

Notes:

The email address with the highest Id value in the People table is assumed to be the original user account. The format for email addresses is:

<email>_duplicate_<id>

where <id> is the row in the table where the duplicate was detected.

After all migrations have completed, you should review the migration logs. Search the logs for the following: all-emails-for-people-to-lower-case.

A set of users without duplicates has the following entry:

== 20181107131647-all-emails-for-people-to-lower-case: migrating =======
== 20181107131647-all-emails-for-people-to-lower-case: migrated (0.201s)

Entries like the following indicate that duplicate addresses were found for separate accounts. The new duplicate Ids are listed as part of the message:

== 20181107131647-all-emails-for-people-to-lower-case: migrating =======
warn: List of duplicated emails: foobar@example.com_duplicate_1, foobar@example.com_duplicate_2
== 20181107131647-all-emails-for-people-to-lower-case: migrated (0.201s)

NOTE: The above log entry indicates that there are duplicated user accounts.

Suggested approach:

1. Change ownership on all flows in secondary accounts to the primary account.
2. Delete secondary accounts.

Documentation

You can access complete product documentation online and in PDF format. From within the product, select Help menu > Product Docs.

Third-Party License Information

Copyright © 2019 Trifacta Inc.

This product also includes the following libraries which are covered by The (MIT AND BSD-3-Clause):

sha.js

This product also includes the following libraries which are covered by The 2-clause BSD License:

double_metaphone

This product also includes the following libraries which are covered by The 3-Clause BSD License:

com.google.protobuf.protobuf-java com.google.protobuf.protobuf-java-util

This product also includes the following libraries which are covered by The ASL:

funcsigs org.json4s.json4s-ast_2.10 org.json4s.json4s-core_2.10 org.json4s.json4s-native_2.10

This product also includes the following libraries which are covered by The ASL 2.0:

pykerberos

This product also includes the following libraries which are covered by The Amazon Redshift ODBC and JDBC Driver License Agreement:

RedshiftJDBC41-1.1.7.1007 com.amazon.redshift.RedshiftJDBC41

This product also includes the following libraries which are covered by The Apache 2.0 License:

com.uber.jaeger.jaeger-b3 com.uber.jaeger.jaeger-core com.uber.jaeger.jaeger-thrift com.uber.jaeger.jaeger-zipkin org.apache.spark.spark-catalyst_2.11 org.apache.spark.spark-core_2.11 org.apache.spark.spark-hive_2.11 org.apache.spark.spark-kvstore_2.11 org.apache.spark.spark-launcher_2.11 org.apache.spark.spark-network-common_2.11 org.apache.spark.spark-network-shuffle_2.11 org.apache.spark.spark-sketch_2.11 org.apache.spark.spark-sql_2.11 org.apache.spark.spark-tags_2.11 org.apache.spark.spark-unsafe_2.11 org.apache.spark.spark-yarn_2.11

This product also includes the following libraries which are covered by The Apache License:

com.chuusai.shapeless_2.11 commons-httpclient.commons-httpclient org.apache.httpcomponents.httpclient org.apache.httpcomponents.httpcore

This product also includes the following libraries which are covered by The Apache License (v2.0):

com.vlkan.flatbuffers

This product also includes the following libraries which are covered by The Apache License 2.0:

@google-cloud/pubsub @google-cloud/resource @google-cloud/storage arrow avro aws-sdk azure-storage bootstrap browser-request bytebuffer cglib.cglib-nodep com.amazonaws.aws-java-sdk com.amazonaws.aws-java-sdk-bundle com.amazonaws.aws-java-sdk-core com.amazonaws.aws-java-sdk-dynamodb com.amazonaws.aws-java-sdk-emr com.amazonaws.aws-java-sdk-iam com.amazonaws.aws-java-sdk-kms com.amazonaws.aws-java-sdk-s3 com.amazonaws.aws-java-sdk-sts com.amazonaws.jmespath-java

Copyright © 2019 Trifacta Inc. Page #119 com.carrotsearch.hppc com.clearspring.analytics.stream com.cloudera.navigator.navigator-sdk com.cloudera.navigator.navigator-sdk-client com.cloudera.navigator.navigator-sdk-model com.codahale.metrics.metrics-core com.cronutils.cron-utils com.databricks.spark-avro_2.11 com.fasterxml.classmate com.fasterxml.jackson.dataformat.jackson-dataformat-cbor com.fasterxml.jackson.datatype.jackson-datatype-joda com.fasterxml.jackson.jaxrs.jackson-jaxrs-base com.fasterxml.jackson.jaxrs.jackson-jaxrs-json-provider com.fasterxml.jackson.module.jackson-module-jaxb-annotations com.fasterxml.jackson.module.jackson-module-paranamer com.fasterxml.jackson.module.jackson-module-scala_2.11 com.fasterxml.uuid.java-uuid-generator com.github.nscala-time.nscala-time_2.10 com.github.stephenc.jcip.jcip-annotations com.google.api-client.google-api-client com.google.api-client.google-api-client-jackson2 com.google.api-client.google-api-client-java6 com.google.api.grpc.grpc-google-cloud-bigquerystorage-v1beta1 com.google.api.grpc.grpc-google-cloud-bigtable-admin-v2 com.google.api.grpc.grpc-google-cloud-bigtable-v2 com.google.api.grpc.grpc-google-cloud-pubsub-v1 com.google.api.grpc.grpc-google-cloud-spanner-admin-database-v1 com.google.api.grpc.grpc-google-cloud-spanner-admin-instance-v1 com.google.api.grpc.grpc-google-cloud-spanner-v1 com.google.api.grpc.grpc-google-common-protos com.google.api.grpc.proto-google-cloud-bigquerystorage-v1beta1 com.google.api.grpc.proto-google-cloud-bigtable-admin-v2 com.google.api.grpc.proto-google-cloud-bigtable-v2 com.google.api.grpc.proto-google-cloud-datastore-v1 com.google.api.grpc.proto-google-cloud-monitoring-v3 com.google.api.grpc.proto-google-cloud-pubsub-v1 com.google.api.grpc.proto-google-cloud-spanner-admin-database-v1 com.google.api.grpc.proto-google-cloud-spanner-admin-instance-v1 com.google.api.grpc.proto-google-cloud-spanner-v1 com.google.api.grpc.proto-google-common-protos com.google.api.grpc.proto-google-iam-v1 com.google.apis.google-api-services-bigquery com.google.apis.google-api-services-clouddebugger com.google.apis.google-api-services-cloudresourcemanager com.google.apis.google-api-services-dataflow com.google.apis.google-api-services-iam com.google.apis.google-api-services-oauth2 com.google.apis.google-api-services-pubsub com.google.apis.google-api-services-storage com.google.auto.service.auto-service com.google.auto.value.auto-value-annotations com.google.cloud.bigdataoss.gcsio com.google.cloud.bigdataoss.util com.google.cloud.google-cloud-bigquery com.google.cloud.google-cloud-bigquerystorage com.google.cloud.google-cloud-bigtable com.google.cloud.google-cloud-bigtable-admin com.google.cloud.google-cloud-core com.google.cloud.google-cloud-core-grpc com.google.cloud.google-cloud-core-http

Copyright © 2019 Trifacta Inc. Page #120 com.google.cloud.google-cloud-monitoring com.google.cloud.google-cloud-spanner com.google.code.findbugs.jsr305 com.google.code.gson.gson com.google.errorprone.error_prone_annotations com.google.flogger.flogger com.google.flogger.flogger-system-backend com.google.flogger.google-extensions com.google.guava.failureaccess com.google.guava.guava com.google.guava.guava-jdk5 com.google.guava.listenablefuture com.google.http-client.google-http-client com.google.http-client.google-http-client-apache com.google.http-client.google-http-client-appengine com.google.http-client.google-http-client-jackson com.google.http-client.google-http-client-jackson2 com.google.http-client.google-http-client-protobuf com.google.inject.extensions.guice-servlet com.google.inject.guice com.google.j2objc.j2objc-annotations com.google.oauth-client.google-oauth-client com.google.oauth-client.google-oauth-client-java6 com.googlecode.javaewah.JavaEWAH com.googlecode.libphonenumber.libphonenumber com.hadoop.gplcompression.hadoop-lzo com.jakewharton.threetenabp.threetenabp com.jamesmurty.utils.java-xmlbuilder com.jolbox.bonecp com.mapr.mapr-root com.microsoft.azure.adal4j com.microsoft.azure.azure-core com.microsoft.azure.azure-storage com.microsoft.windowsazure.storage.microsoft-windowsazure-storage-sdk com.nimbusds.lang-tag com.nimbusds.nimbus-jose-jwt com.ning.compress-lzf com.opencsv.opencsv com.squareup.okhttp.okhttp com.squareup.okhttp3.logging-interceptor com.squareup.okhttp3.okhttp com.squareup.okhttp3.okhttp-urlconnection com.squareup.okio.okio com.squareup.retrofit2.adapter-rxjava com.squareup.retrofit2.converter-jackson com.squareup.retrofit2.retrofit com.trifacta.hadoop.cloudera4 com.twitter.chill-java com.twitter.chill_2.11 com.twitter.parquet-hadoop-bundle com.typesafe.akka.akka-actor_2.11 com.typesafe.akka.akka-cluster_2.11 com.typesafe.akka.akka-remote_2.11 com.typesafe.akka.akka-slf4j_2.11 com.typesafe.config com.univocity.univocity-parsers com.zaxxer.HikariCP com.zaxxer.HikariCP-java7 commons-beanutils.commons-beanutils commons-beanutils.commons-beanutils-core

Copyright © 2019 Trifacta Inc. Page #121 commons-cli.commons-cli commons-codec.commons-codec commons-collections.commons-collections commons-configuration.commons-configuration commons-dbcp.commons-dbcp commons-dbutils.commons-dbutils commons-digester.commons-digester commons-el.commons-el commons-fileupload.commons-fileupload commons-io.commons-io commons-lang.commons-lang commons-logging.commons-logging commons-net.commons-net commons-pool.commons-pool dateinfer de.odysseus.juel.juel-api de.odysseus.juel.juel-impl de.odysseus.juel.juel-spi express-opentracing google-benchmark googleapis io.airlift.aircompressor io.dropwizard.metrics.metrics-core io.dropwizard.metrics.metrics-graphite io.dropwizard.metrics.metrics-json io.dropwizard.metrics.metrics-jvm io.grpc.grpc-all io.grpc.grpc-alts io.grpc.grpc-auth io.grpc.grpc-context io.grpc.grpc-core io.grpc.grpc-grpclb io.grpc.grpc-netty io.grpc.grpc-netty-shaded io.grpc.grpc-okhttp io.grpc.grpc-protobuf io.grpc.grpc-protobuf-lite io.grpc.grpc-protobuf-nano io.grpc.grpc-stub io.grpc.grpc-testing io.netty.netty io.netty.netty-all io.netty.netty-buffer io.netty.netty-codec io.netty.netty-codec-http io.netty.netty-codec-http2 io.netty.netty-codec-socks io.netty.netty-common io.netty.netty-handler io.netty.netty-handler-proxy io.netty.netty-resolver io.netty.netty-tcnative-boringssl-static io.netty.netty-transport io.opentracing.contrib.opentracing-concurrent io.opentracing.contrib.opentracing-globaltracer io.opentracing.contrib.opentracing-web-servlet-filter io.opentracing.opentracing-api io.opentracing.opentracing-noop io.opentracing.opentracing-util io.prometheus.simpleclient

Copyright © 2019 Trifacta Inc. Page #122 io.prometheus.simpleclient_common io.prometheus.simpleclient_servlet io.reactivex.rxjava io.spray.spray-can_2.11 io.spray.spray-http_2.11 io.spray.spray-httpx_2.11 io.spray.spray-io_2.11 io.spray.spray-json_2.11 io.spray.spray-routing_2.11 io.spray.spray-util_2.11 io.springfox.springfox-core io.springfox.springfox-schema io.springfox.springfox-spi io.springfox.springfox-spring-web io.springfox.springfox-swagger-common io.springfox.springfox-swagger-ui io.springfox.springfox-swagger2 io.swagger.swagger-annotations io.swagger.swagger-models io.undertow.undertow-core io.undertow.undertow-servlet io.undertow.undertow-websockets-jsr io.zipkin.java.zipkin io.zipkin.reporter.zipkin-reporter io.zipkin.reporter.zipkin-sender-urlconnection io.zipkin.zipkin2.zipkin it.unimi.dsi.fastutil jaeger-client javax.inject.javax.inject javax.jdo.jdo-api javax.validation.validation-api joda-time.joda-time kerberos less libcuckoo log4j.apache-log4j-extras log4j.log4j long mathjs mx4j.mx4j net.bytebuddy.byte-buddy net.hydromatic.eigenbase-properties net.java.dev.eval.eval net.java.dev.jets3t.jets3t net.jcip.jcip-annotations net.jpountz.lz4.lz4 net.minidev.accessors-smart net.minidev.json-smart net.sf.opencsv.opencsv net.snowflake.snowflake-jdbc opentracing org.activiti.activiti-bpmn-converter org.activiti.activiti-bpmn-layout org.activiti.activiti-bpmn-model org.activiti.activiti-common-rest org.activiti.activiti-dmn-api org.activiti.activiti-dmn-model org.activiti.activiti-engine org.activiti.activiti-form-api org.activiti.activiti-form-model

Copyright © 2019 Trifacta Inc. Page #123 org.activiti.activiti-image-generator org.activiti.activiti-process-validation org.activiti.activiti-rest org.activiti.activiti-spring org.activiti.activiti-spring-boot-starter-basic org.activiti.activiti-spring-boot-starter-rest-api org.activiti.activiti5-compatibility org.activiti.activiti5-engine org.activiti.activiti5-spring org.activiti.activiti5-spring-compatibility org.apache.arrow.arrow-format org.apache.arrow.arrow-memory org.apache.arrow.arrow-vector org.apache.atlas.atlas-client org.apache.atlas.atlas-typesystem org.apache.avro.avro org.apache.avro.avro-ipc org.apache.avro.avro-mapred org.apache.beam.beam-model-job-management org.apache.beam.beam-model-pipeline org.apache.beam.beam-runners-core-construction-java org.apache.beam.beam-runners-direct-java org.apache.beam.beam-runners-google-cloud-dataflow-java org.apache.beam.beam-sdks-java-core org.apache.beam.beam-sdks-java-extensions-google-cloud-platform-core org.apache.beam.beam-sdks-java-extensions-protobuf org.apache.beam.beam-sdks-java-extensions-sorter org.apache.beam.beam-sdks-java-io-google-cloud-platform org.apache.beam.beam-sdks-java-io-parquet org.apache.beam.beam-vendor-grpc-1_13_1 org.apache.beam.beam-vendor-guava-20_0 org.apache.calcite.calcite-avatica org.apache.calcite.calcite-core org.apache.calcite.calcite-linq4j org.apache.commons.codec org.apache.commons.commons-collections4 org.apache.commons.commons-compress org.apache.commons.commons-configuration2 org.apache.commons.commons-crypto org.apache.commons.commons-csv org.apache.commons.commons-dbcp2 org.apache.commons.commons-email org.apache.commons.commons-exec org.apache.commons.commons-lang3 org.apache.commons.commons-math org.apache.commons.commons-math3 org.apache.commons.commons-pool2 org.apache.curator.curator-client org.apache.curator.curator-framework org.apache.curator.curator-recipes org.apache.derby.derby org.apache.directory.api.api-asn1-api org.apache.directory.api.api-util org.apache.directory.server.apacheds-i18n org.apache.directory.server.apacheds-kerberos-codec org.apache.hadoop.avro org.apache.hadoop.hadoop-annotations org.apache.hadoop.hadoop-auth org.apache.hadoop.hadoop-aws org.apache.hadoop.hadoop-azure

Copyright © 2019 Trifacta Inc. Page #124 org.apache.hadoop.hadoop-azure-datalake org.apache.hadoop.hadoop-client org.apache.hadoop.hadoop-common org.apache.hadoop.hadoop-hdfs org.apache.hadoop.hadoop-hdfs-client org.apache.hadoop.hadoop-mapreduce-client-app org.apache.hadoop.hadoop-mapreduce-client-common org.apache.hadoop.hadoop-mapreduce-client-core org.apache.hadoop.hadoop-mapreduce-client-jobclient org.apache.hadoop.hadoop-mapreduce-client-shuffle org.apache.hadoop.hadoop-yarn-api org.apache.hadoop.hadoop-yarn-client org.apache.hadoop.hadoop-yarn-common org.apache.hadoop.hadoop-yarn-registry org.apache.hadoop.hadoop-yarn-server-common org.apache.hadoop.hadoop-yarn-server-nodemanager org.apache.hadoop.hadoop-yarn-server-web-proxy org.apache.htrace.htrace-core org.apache.htrace.htrace-core4 org.apache.httpcomponents.httpmime org.apache.ivy.ivy org.apache.kerby.kerb-admin org.apache.kerby.kerb-client org.apache.kerby.kerb-common org.apache.kerby.kerb-core org.apache.kerby.kerb-crypto org.apache.kerby.kerb-identity org.apache.kerby.kerb-server org.apache.kerby.kerb-simplekdc org.apache.kerby.kerb-util org.apache.kerby.kerby-asn1 org.apache.kerby.kerby-config org.apache.kerby.kerby-pkix org.apache.kerby.kerby-util org.apache.kerby.kerby-xdr org.apache.kerby.token-provider org.apache.logging.log4j.log4j-1.2-api org.apache.logging.log4j.log4j-api org.apache.logging.log4j.log4j-api-scala_2.11 org.apache.logging.log4j.log4j-core org.apache.logging.log4j.log4j-jcl org.apache.logging.log4j.log4j-jul org.apache.logging.log4j.log4j-slf4j-impl org.apache.logging.log4j.log4j-web org.apache.orc.orc-core org.apache.orc.orc-mapreduce org.apache.orc.orc-shims org.apache.parquet.parquet-avro org.apache.parquet.parquet-column org.apache.parquet.parquet-common org.apache.parquet.parquet-encoding org.apache.parquet.parquet-format org.apache.parquet.parquet-hadoop org.apache.parquet.parquet-jackson org.apache.parquet.parquet-tools org.apache.pig.pig org.apache.pig.pig-core-spork org.apache.thrift.libfb303 org.apache.thrift.libthrift org.apache.tomcat.embed.tomcat-embed-core

Copyright © 2019 Trifacta Inc. Page #125 org.apache.tomcat.embed.tomcat-embed-el org.apache.tomcat.embed.tomcat-embed-websocket org.apache.tomcat.tomcat-annotations-api org.apache.tomcat.tomcat-jdbc org.apache.tomcat.tomcat-juli org.apache.xbean.xbean-asm5-shaded org.apache.xbean.xbean-asm6-shaded org.apache.zookeeper.zookeeper org.codehaus.jackson.jackson-core-asl org.codehaus.jackson.jackson-mapper-asl org.codehaus.jettison.jettison org.datanucleus.datanucleus-api-jdo org.datanucleus.datanucleus-core org.datanucleus.datanucleus-rdbms org.eclipse.jetty.jetty-client org.eclipse.jetty.jetty-http org.eclipse.jetty.jetty-io org.eclipse.jetty.jetty-security org.eclipse.jetty.jetty-server org.eclipse.jetty.jetty-servlet org.eclipse.jetty.jetty-util org.eclipse.jetty.jetty-util-ajax org.eclipse.jetty.jetty-webapp org.eclipse.jetty.jetty-xml org.fusesource.jansi.jansi org.hibernate.hibernate-validator org.htrace.htrace-core org.iq80.snappy.snappy org.jboss.jandex org.joda.joda-convert org.json4s.json4s-ast_2.11 org.json4s.json4s-core_2.11 org.json4s.json4s-jackson_2.11 org.json4s.json4s-scalap_2.11 org.liquibase.liquibase-core org.lz4.lz4-java org.mapstruct.mapstruct org.mortbay.jetty.jetty org.mortbay.jetty.jetty-util org.mybatis.mybatis org.objenesis.objenesis org.osgi.org.osgi.core org.parboiled.parboiled-core org.parboiled.parboiled-scala_2.11 org.quartz-scheduler.quartz org.roaringbitmap.RoaringBitmap org.sonatype.oss.oss-parent org.sonatype.sisu.inject.cglib org.spark-project.hive.hive-exec org.spark-project.hive.hive-metastore org.springframework.boot.spring-boot org.springframework.boot.spring-boot-autoconfigure org.springframework.boot.spring-boot-starter org.springframework.boot.spring-boot-starter-aop org.springframework.boot.spring-boot-starter-data-jpa org.springframework.boot.spring-boot-starter-jdbc org.springframework.boot.spring-boot-starter-log4j2 org.springframework.boot.spring-boot-starter-logging org.springframework.boot.spring-boot-starter-tomcat org.springframework.boot.spring-boot-starter-undertow

Copyright © 2019 Trifacta Inc. Page #126 org.springframework.boot.spring-boot-starter-web org.springframework.data.spring-data-commons org.springframework.data.spring-data-jpa org.springframework.plugin.spring-plugin-core org.springframework.plugin.spring-plugin-metadata org.springframework.retry.spring-retry org.springframework.security.spring-security-config org.springframework.security.spring-security-core org.springframework.security.spring-security-crypto org.springframework.security.spring-security-web org.springframework.spring-aop org.springframework.spring-aspects org.springframework.spring-beans org.springframework.spring-context org.springframework.spring-context-support org.springframework.spring-core org.springframework.spring-expression org.springframework.spring-jdbc org.springframework.spring-orm org.springframework.spring-tx org.springframework.spring-web org.springframework.spring-webmvc org.springframework.spring-websocket org.uncommons.maths.uncommons-maths org.wildfly.openssl.wildfly-openssl org.xerial.snappy.snappy-java org.yaml.snakeyaml oro.oro parquet pbr pig-0.11.1-withouthadoop-23 pig-0.12.1-mapr-noversion-withouthadoop pig-0.14.0-core-spork piggybank-amzn-0.3 piggybank-cdh5.0.0-beta-2-0.12.0 python-iptables request requests stax.stax-api thrift tomcat.jasper-compiler tomcat.jasper-runtime xerces.xercesImpl xml-apis.xml-apis zipkin zipkin-transport-http

This product also includes the following libraries which are covered by The Apache License 2.0 + Eclipse Public License 1.0:

spark-assembly-thinner

This product also includes the following libraries which are covered by The Apache License Version 2:

org.mortbay.jetty.jetty-sslengine

This product also includes the following libraries which are covered by The Apache License v2.0:

net.java.dev.jna.jna net.java.dev.jna.jna-platform

This product also includes the following libraries which are covered by The Apache License, version 2.0:

com.nimbusds.oauth2-oidc-sdk org.jboss.logging.jboss-logging

This product also includes the following libraries which are covered by The Apache Software Licenses:

org.slf4j.log4j-over-slf4j

This product also includes the following libraries which are covered by The BSD license:

alabaster antlr.antlr asm.asm-parent babel click com.google.api.api-common com.google.api.gax com.google.api.gax-grpc com.google.api.gax-httpjson com.jcraft.jsch com.thoughtworks.paranamer.paranamer dk.brics.automaton.automaton dom4j.dom4j enum34 flask itsdangerous javolution.javolution jinja2 jline.jline microee mock networkx numpy org.antlr.ST4 org.antlr.antlr-runtime org.antlr.antlr4-runtime org.antlr.stringtemplate org.codehaus.woodstox.stax2-api org.ow2.asm.asm org.scala-lang.jline pandas pluginbase psutil pygments python-enum34 python-json-logger scipy snowballstemmer sphinx sphinxcontrib-websupport strptime websocket-stream xlrd xmlenc.xmlenc xss-filters

This product also includes the following libraries which are covered by The BSD 2-Clause License:

com.github.luben.zstd-jni

This product also includes the following libraries which are covered by The BSD 3-Clause:

org.scala-lang.scala-compiler org.scala-lang.scala-library org.scala-lang.scala-reflect org.scala-lang.scalap

This product also includes the following libraries which are covered by The BSD 3-Clause "New" or "Revised" License (BSD-3-Clause):

org.abego.treelayout.org.abego.treelayout.core

This product also includes the following libraries which are covered by The BSD 3-Clause License:

org.antlr.antlr4

This product also includes the following libraries which are covered by The BSD 3-clause:

org.scala-lang.modules.scala-parser-combinators_2.11 org.scala-lang.modules.scala-xml_2.11 org.scala-lang.plugins.scala-continuations-library_2.11 org.threeten.threetenbp

This product also includes the following libraries which are covered by The BSD New license:

com.google.auth.google-auth-library-credentials com.google.auth.google-auth-library-oauth2-http

This product also includes the following libraries which are covered by The BSD-2-Clause:

cls-bluebird node-polyglot org.postgresql.postgresql terser uglify-js

This product also includes the following libraries which are covered by The BSD-3-Clause:

@sentry/node d3-dsv datalib datalib-sketch markupsafe md5 node-forge protobufjs qs queue-async sqlite3 werkzeug

This product also includes the following libraries which are covered by The BSD-Style:

com.jsuereth.scala-arm_2.11

This product also includes the following libraries which are covered by The BSD-derived ( http://www.repoze.org/LICENSE.txt):

meld3 supervisor

This product also includes the following libraries which are covered by The BSD-like:

dnspython idna org.scala-lang.scala-actors org.scalamacros.quasiquotes_2.10

This product also includes the following libraries which are covered by The BSD-style license:

bzip2

This product also includes the following libraries which are covered by The Boost :

asio boost cpp-netlib-uri expected jsbind poco

This product also includes the following libraries which are covered by The Bouncy Castle Licence:

org.bouncycastle.bcprov-jdk15on

This product also includes the following libraries which are covered by The CDDL:

javax.mail.mail javax.mail.mailapi javax.servlet.jsp-api javax.servlet.jsp.jsp-api javax.servlet.servlet-api javax.transaction.jta javax.xml.stream.stax-api org.glassfish.external.management-api org.glassfish.gmbal.gmbal-api-only org.jboss.spec.javax.annotation.jboss-annotations-api_1.2_spec

This product also includes the following libraries which are covered by The CDDL + GPLv2 with classpath exception:

javax.annotation.javax.annotation-api javax.jms.jms javax.servlet.javax.servlet-api javax.transaction.javax.transaction-api javax.transaction.transaction-api org.glassfish.grizzly.grizzly-framework org.glassfish.grizzly.grizzly-http org.glassfish.grizzly.grizzly-http-server org.glassfish.grizzly.grizzly-http-servlet org.glassfish.grizzly.grizzly-rcm org.glassfish.hk2.external.aopalliance-repackaged org.glassfish.hk2.external.javax.inject org.glassfish.hk2.hk2-api org.glassfish.hk2.hk2-locator org.glassfish.hk2.hk2-utils org.glassfish.hk2.osgi-resource-locator org.glassfish.javax.el

org.glassfish.jersey.core.jersey-common

This product also includes the following libraries which are covered by The CDDL 1.1:

com.sun.jersey.contribs.jersey-guice com.sun.jersey.jersey-client com.sun.jersey.jersey-core com.sun.jersey.jersey-json com.sun.jersey.jersey-server com.sun.jersey.jersey-servlet com.sun.xml.bind.jaxb-impl javax.ws.rs.javax.ws.rs-api javax.xml.bind.jaxb-api org.jvnet.mimepull.mimepull

This product also includes the following libraries which are covered by The CDDL License:

javax.ws.rs.jsr311-api

This product also includes the following libraries which are covered by The CDDL/GPLv2+CE:

com.sun.mail.javax.mail

This product also includes the following libraries which are covered by The CERN:

colt.colt

This product also includes the following libraries which are covered by The COMMON DEVELOPMENT AND DISTRIBUTION LICENSE (CDDL) Version 1.0:

javax.activation.activation javax.annotation.jsr250-api

This product also includes the following libraries which are covered by The Doug Crockford's license that allows this module to be used for Good but not for Evil:

jsmin

This product also includes the following libraries which are covered by The Dual License:

python-dateutil

This product also includes the following libraries which are covered by The Eclipse Distribution License (EDL), Version 1.0:

org.hibernate.javax.persistence.hibernate-jpa-2.1-api

This product also includes the following libraries which are covered by The Eclipse Public License:

com.github.oshi.oshi-core

This product also includes the following libraries which are covered by The Eclipse Public License - v 1.0:

org.aspectj.aspectjweaver

This product also includes the following libraries which are covered by The Eclipse Public License, Version 1.0:

com.mchange.mchange-commons-java

This product also includes the following libraries which are covered by The GNU General Public License v2.0 only, with Classpath exception:

org.jboss.spec.javax.servlet.jboss-servlet-api_3.1_spec org.jboss.spec.javax.websocket.jboss-websocket-api_1.1_spec

This product also includes the following libraries which are covered by The GNU LGPL:

nose

This product also includes the following libraries which are covered by The GNU Lesser General Public License:

org.hibernate.common.hibernate-commons-annotations org.hibernate.hibernate-core org.hibernate.hibernate-entitymanager

This product also includes the following libraries which are covered by The GNU Lesser General Public License Version 2.1, February 1999:

org.jgrapht.jgrapht-core

This product also includes the following libraries which are covered by The GNU Lesser General Public License, Version 2.1:

com.fasterxml.jackson.core.jackson-annotations com.fasterxml.jackson.core.jackson-core com.fasterxml.jackson.core.jackson-databind com.mchange.c3p0

This product also includes the following libraries which are covered by The GNU Lesser Public License:

com.google.code.findbugs.annotations

This product also includes the following libraries which are covered by The GPLv3:

yamllint

This product also includes the following libraries which are covered by The Google Cloud Software License:

com.google.cloud.google-cloud-storage

This product also includes the following libraries which are covered by The ICU License:

icu

This product also includes the following libraries which are covered by The ISC:

browserify-sign iconify inherits lru-cache request-promise request-promise-native requests-kerberos rimraf sax semver split-ca

This product also includes the following libraries which are covered by The JGraph Ltd - 3 clause BSD license:

org.tinyjee.jgraphx.jgraphx

This product also includes the following libraries which are covered by The LGPL:

chardet com.sun.jna.jna

This product also includes the following libraries which are covered by The LGPL 2.1:

org.codehaus.jackson.jackson-jaxrs org.codehaus.jackson.jackson-xc org.javassist.javassist xmlhttprequest

This product also includes the following libraries which are covered by The LGPLv3 or later:

com.github.fge.json-schema-core com.github.fge.json-schema-validator

This product also includes the following libraries which are covered by The MIT license:

amplitude-js analytics-node args4j.args4j async avsc backbone backbone-forms basic-auth bcrypt bluebird body-parser browser-filesaver buffer-crc32 bufferedstream busboy byline bytes cachetools chai chalk cli-table clipboard codemirror colors com.github.tommyettinger.blazingchain com.microsoft.azure.azure-data-lake-store-sdk com.microsoft.sqlserver.mssql-jdbc commander common-tags compression console.table cookie cookie-parser cookie-session cronstrue crypto-browserify csrf

Copyright © 2019 Trifacta Inc. Page #133 csurf definitely eventEmitter express express-http-context express-params express-zip forever form-data fs-extra function-rate-limit future fuzzy generic-pool google-auto-auth iconv-lite imagesize int24 is-my-json-valid jade jq jquery jquery-serializeobject jquery.ba-serializeobject jquery.event.drag jquery.form jquery.ui json-stable-stringify jsonfile jsonschema jsonwebtoken jszip keygrip keysim knex less-middleware lodash lunr matic memcached method-override mini-css-extract-plugin minilog mkdirp moment moment-jdateformatparser moment-timezone mongoose morgan morgan-json mysql net.razorvine.pyrolite netifaces nock nodemailer org.checkerframework.checker-compat-qual org.checkerframework.checker-qual org.mockito.mockito-core org.slf4j.jcl-over-slf4j org.slf4j.jul-to-slf4j

Copyright © 2019 Trifacta Inc. Page #134 org.slf4j.slf4j-api org.slf4j.slf4j-log4j12 pace passport passport-azure-ad passport-http passport-http-bearer passport-ldapauth passport-local passport-saml passport-saml-metadata passport-strategy password-validator pegjs pg pg-hstore pip png-img promise-retry prop-types punycode py-cpuinfo python-crfsuite python-six pytz pyyaml query-string querystring randexp rapidjson react react-day-picker react-dom react-hot-loader react-modal react-router-dom react-select react-switch react-table react-virtualized recursive-readdir redefine requestretry require-jade require-json retry retry-as-promised rotating-file-stream safe-json-stringify sequelize setuptools simple-ldap-search simplejson singledispatch six slick.core slick.grid slick.headerbuttons slick.headerbuttons.css slick.rowselectionmodel

Copyright © 2019 Trifacta Inc. Page #135 snappy sphinx-rtd-theme split-pane sql stream-meter supertest tar-fs temp to-case tv4 ua-parser-js umzug underscore.string unicode-length universal-analytics urijs uritools url urllib3 user-agent-parser uuid validator wheel winston winston-daily-rotate-file ws yargs zxcvbn

This product also includes the following libraries which are covered by The MIT License; BSD 3-clause License:

antlr4-cpp-runtime

This product also includes the following libraries which are covered by The MIT license:

org.codehaus.mojo.animal-sniffer-annotations

This product also includes the following libraries which are covered by The MIT/X:

vnc2flv

This product also includes the following libraries which are covered by The MIT/X11 license:

optimist

This product also includes the following libraries which are covered by The MPL 2.0:

pathspec

This product also includes the following libraries which are covered by The MPL 2.0 or EPL 1.0:

com.h2database.h2

This product also includes the following libraries which are covered by The MPL-2.0:

certifi

This product also includes the following libraries which are covered by The Microsoft Software License:

com.microsoft.sqlserver.sqljdbc4

This product also includes the following libraries which are covered by The Mozilla Public License, Version 2.0:

org.mozilla.rhino

This product also includes the following libraries which are covered by The New BSD License:

asm.asm backbone-queryparams bloomfilter com.esotericsoftware.kryo-shaded com.esotericsoftware.minlog com.googlecode.protobuf-java-format.protobuf-java-format d3 domReady double-conversion gflags glog gperftools gtest org.antlr.antlr-master org.codehaus.janino.commons-compiler org.codehaus.janino.janino org.hamcrest.hamcrest-core pcre protobuf re2 require-text sha1 termcolor topojson triflow vega websocketpp

This product also includes the following libraries which are covered by The New BSD license:

com.google.protobuf.nano.protobuf-javanano

This product also includes the following libraries which are covered by The Oracle Technology Network License:

com.oracle.ojdbc6

This product also includes the following libraries which are covered by The PSF:

futures heap typing

This product also includes the following libraries which are covered by The PSF license:

functools32 python

This product also includes the following libraries which are covered by The PSF or ZPL:

wsgiref

This product also includes the following libraries which are covered by The Proprietary:

com.trifacta.connect.trifacta-TYcassandra com.trifacta.connect.trifacta-TYdb2 com.trifacta.connect.trifacta-TYgreenplum com.trifacta.connect.trifacta-TYhive com.trifacta.connect.trifacta-TYinformix com.trifacta.connect.trifacta-TYmongodb com.trifacta.connect.trifacta-TYmysql com.trifacta.connect.trifacta-TYopenedgewp com.trifacta.connect.trifacta-TYoracle com.trifacta.connect.trifacta-TYoraclesalescloud com.trifacta.connect.trifacta-TYpostgresql com.trifacta.connect.trifacta-TYredshift com.trifacta.connect.trifacta-TYrightnow com.trifacta.connect.trifacta-TYsforce com.trifacta.connect.trifacta-TYsparksql com.trifacta.connect.trifacta-TYsqlserver com.trifacta.connect.trifacta-TYsybase jsdata

This product also includes the following libraries which are covered by The Public Domain:

aopalliance.aopalliance org.jboss.xnio.xnio-api org.jboss.xnio.xnio-nio org.tukaani.xz protobuf-to-dict pycrypto simple-xml-writer

This product also includes the following libraries which are covered by The Public domain:

net.iharder.base64

This product also includes the following libraries which are covered by The Python Software Foundation License:

argparse backports-abc ipaddress python-setuptools regex

This product also includes the following libraries which are covered by The Tableau Binary Code License Agreement:

com.tableausoftware.common.tableaucommon com.tableausoftware.extract.tableauextract

This product also includes the following libraries which are covered by The Apache License, Version 2.0:

com.fasterxml.woodstox.woodstox-core com.google.cloud.bigtable.bigtable-client-core com.google.cloud.datastore.datastore-v1-proto-client io.opencensus.opencensus-api io.opencensus.opencensus-contrib-grpc-metrics io.opencensus.opencensus-contrib-grpc-util io.opencensus.opencensus-contrib-http-util org.spark-project.spark.unused software.amazon.ion.ion-java

This product also includes the following libraries which are covered by The BSD 3-Clause License:

org.fusesource.leveldbjni.leveldbjni-all

This product also includes the following libraries which are covered by The GNU General Public License, Version 2:

mysql.mysql-connector-java

This product also includes the following libraries which are covered by The Go license:

com.google.re2j.re2j

This product also includes the following libraries which are covered by The MIT License (MIT):

com.microsoft.azure.azure com.microsoft.azure.azure-annotations com.microsoft.azure.azure-client-authentication com.microsoft.azure.azure-client-runtime com.microsoft.azure.azure-keyvault com.microsoft.azure.azure-keyvault-core com.microsoft.azure.azure-keyvault-webkey com.microsoft.azure.azure-mgmt-appservice com.microsoft.azure.azure-mgmt-batch com.microsoft.azure.azure-mgmt-cdn com.microsoft.azure.azure-mgmt-compute com.microsoft.azure.azure-mgmt-containerinstance com.microsoft.azure.azure-mgmt-containerregistry com.microsoft.azure.azure-mgmt-containerservice com.microsoft.azure.azure-mgmt-cosmosdb com.microsoft.azure.azure-mgmt-dns com.microsoft.azure.azure-mgmt-graph-rbac com.microsoft.azure.azure-mgmt-keyvault com.microsoft.azure.azure-mgmt-locks com.microsoft.azure.azure-mgmt-network com.microsoft.azure.azure-mgmt-redis com.microsoft.azure.azure-mgmt-resources com.microsoft.azure.azure-mgmt-search com.microsoft.azure.azure-mgmt-servicebus com.microsoft.azure.azure-mgmt-sql com.microsoft.azure.azure-mgmt-storage com.microsoft.azure.azure-mgmt-trafficmanager com.microsoft.rest.client-runtime

This product also includes the following libraries which are covered by The MIT License: http://www.opensource.org/licenses/mit-license.php:

probableparsing usaddress

This product also includes the following libraries which are covered by The New BSD License:

net.sf.py4j.py4j org.jodd.jodd-core

This product also includes the following libraries which are covered by The Unicode/ICU License:

com.ibm.icu.icu4j

This product also includes the following libraries which are covered by The Unlicense:

moment-range stream-buffers

This product also includes the following libraries which are covered by The WTFPL:

org.reflections.reflections

This product also includes the following libraries which are covered by The http://www.apache.org/licenses/LICENSE-2.0:

tornado

This product also includes the following libraries which are covered by The new BSD:

scikit-learn

This product also includes the following libraries which are covered by The new BSD License:

decorator

This product also includes the following libraries which are covered by The provided without support or warranty:

org.json.json

This product also includes the following libraries which are covered by The public domain, Python, 2-Clause BSD, GPL 3 (see COPYING.txt):

docutils

This product also includes the following libraries which are covered by The zlib License:

zlib

Copyright © 2019 - Trifacta, Inc. All rights reserved.