Amazon Glue Studio User Guide


What is Amazon Glue Studio?

Amazon Glue Studio is a new graphical interface that makes it easy to create, run, and monitor extract, transform, and load (ETL) jobs in Amazon Glue. You can visually compose data transformation workflows and seamlessly run them on Amazon Glue’s Apache Spark-based serverless ETL engine. You can inspect the schema and data results in each step of the job.

Amazon Glue Studio is designed not only for tabular data, but also for semi-structured data, which is difficult to render in spreadsheet-like data preparation interfaces. Examples of semi-structured data include application logs, mobile events, Internet of Things (IoT) event streams, and social feeds.

When creating a job in Amazon Glue Studio, you can choose from a variety of data sources that are stored in Amazon services. You can quickly prepare that data for analysis in data warehouses and data lakes. Amazon Glue Studio also offers tools to monitor ETL workflows and validate that they are operating as intended. You can preview the dataset for each node. This helps you to debug your ETL jobs by displaying a sample of the data at each step of the job.

Amazon Glue Studio provides a visual interface that makes it easy to:

• Pull data from an Amazon S3, Amazon Kinesis, or JDBC source.
• Configure a transformation that joins, samples, or transforms the data.
• Specify a target location for the transformed data.
• View the schema or a sample of the dataset at each point in the job.
• Run, monitor, and manage the jobs created in Amazon Glue Studio.

Features of Amazon Glue Studio

Amazon Glue Studio helps you to create and manage jobs that gather, transform, and clean data. Advanced users can use Amazon Glue Studio to troubleshoot and edit job scripts.

Visual job editor

You can perform the following actions when creating and editing jobs in Amazon Glue Studio:

• Add additional nodes to the job to implement:
  • Multiple data sources.
  • Multiple data targets.
  • Data sources and targets that use connectors for external data stores that were not previously supported.
• View a sample of the data at each node in the job diagram.
• Change the parent nodes for an existing node.
• Add transforms that:
  • Join data sources.
  • Select specific fields from the data.
  • Drop fields.
  • Rename fields.
  • Change the data type of fields.
  • Write selected fields from the data into a JSON file in an Amazon S3 bucket (spigot).
  • Filter out data from a dataset.
  • Split a dataset into two datasets.
  • Find missing values in a dataset and provide the missing value in a separate column.
  • Use SQL to query and transform the data.
  • Use custom code.

Job script code editor

Amazon Glue Studio also has a script editor for writing or customizing the extract-transform-and-load (ETL) code for your jobs. You can use the visual editor in Amazon Glue Studio to quickly design your ETL job and then edit the generated script to write code for the unique components of your job.

When creating a new job, you can choose to write scripts for Spark jobs or Python shell jobs. You can code the job ETL script for Spark jobs using either Python or Scala. If you create a Python shell job, the job ETL script uses Python 3.6.
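For Spark jobs, the generated Python script always follows the same boilerplate pattern. The following is a minimal sketch of that pattern; the per-node source, transform, and target code that Amazon Glue Studio generates is elided:

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Parse the arguments that Amazon Glue passes to the script at run time
args = getResolvedOptions(sys.argv, ["JOB_NAME"])

# Set up the Spark and Glue contexts used by every generated script
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session

# Initialize the job so that run state and bookmarks are tracked
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# ... generated source, transform, and target code goes here ...

job.commit()
```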

The script editor interface in Amazon Glue Studio offers the following features:

• Insert, modify, and delete sources, targets, and transforms in your script.
• Add or modify arguments for data sources, targets, and transforms.
• Syntax and keyword highlighting.
• Auto-completion suggestions for local words, Python keywords, and code snippets.

Job performance dashboard

Amazon Glue Studio provides a comprehensive run dashboard for your ETL jobs. The dashboard displays information about job runs from a specific time frame. The information displayed on the dashboard includes:

• Jobs overview summary – A high-level overview showing total jobs, current runs, completed runs, and failed jobs.
• Status summaries – Provides high-level job metrics based on job properties, such as worker type and job type.
• Job runs timeline – A bar graph summary of successful, failed, and total runs for the currently selected time frame.
• Job run breakdown – A detailed list of job runs from the selected time frame.

Support for dataset partitioning

You can use Amazon Glue Studio to efficiently process partitioned datasets. You can load, filter, transform, and save your partitioned data by using SQL expressions or user-defined functions, which avoids listing and reading unnecessary data from Amazon S3.
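In the generated job script, this partition pruning is expressed as a pushdown predicate on the read from the Data Catalog. A minimal sketch, assuming a hypothetical partitioned sales table in an analytics database:

```python
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glueContext = GlueContext(SparkContext.getOrCreate())

# Read only the April 2020 partitions. The predicate is evaluated against
# the partition columns, so non-matching S3 prefixes are never listed or read.
sales_dyf = glueContext.create_dynamic_frame.from_catalog(
    database="analytics",  # hypothetical database name
    table_name="sales",    # hypothetical partitioned table
    push_down_predicate="(year == '2020' and month == '04')",
)
```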

When should I use Amazon Glue Studio?

Use Amazon Glue Studio for a simple visual interface to create ETL workflows for data cleaning and transformation, and run them on Amazon Glue.

Amazon Glue Studio makes it easy for ETL developers to create repeatable processes to move and transform large-scale, semi-structured datasets, and load them into data lakes and data warehouses. It provides a boxes-and-arrows style visual interface for developing and managing Amazon Glue ETL workflows that you can optionally customize with code. Amazon Glue Studio combines the ease of use of traditional ETL tools with the power and flexibility of Amazon Glue’s big data processing engine.

Amazon Glue Studio provides multiple ways to customize your ETL scripts, including adding nodes that represent code snippets in the visual editor.

Use Amazon Glue Studio for easier job management. Amazon Glue Studio provides you with job and job run management interfaces that make it clear how jobs relate to each other, and give an overall picture of your job runs. The job management page makes it easy to do bulk operations on jobs (previously difficult to do in the Amazon Glue console). All job runs are available in a single interface where you can search and filter. This gives you a constantly updated view of your ETL operations and the resources you use. You can use the real-time dashboard in Amazon Glue Studio to monitor your job runs and validate that they are operating as intended.

Accessing Amazon Glue Studio

To access Amazon Glue Studio, sign in to Amazon as a user that has the required permissions, as described in Set up IAM permissions for Amazon Glue Studio (p. 8). Then you can sign in to the Amazon Web Services Management Console and open the Amazon Glue console at https://console.amazonaws.cn/glue/. Choose the Amazon Glue Studio link in the navigation pane.


Pricing for Amazon Glue Studio

When using Amazon Glue Studio, you are charged for data previews. After you specify an IAM role for the job, the visual editor starts an Apache Spark session for sampling your source data and executing transformations. This session runs for 30 minutes and then turns off automatically. Amazon charges you for 2 DPUs at the development endpoint rate (DEVED-DPU-Hour). At a typical rate of $0.44 per DPU-hour, each 30-minute session costs about $0.44 (2 DPUs × 0.5 hours); the rate might vary by region. At the end of the 30-minute session, you can choose Retry on the Data preview tab for any node, or reload the visual editor page, to start a new 30-minute session at the same rate.

You also pay for the underlying Amazon services that your jobs use or interact with, for example, Amazon Glue, your data sources, and your data targets. For pricing information, see Amazon Glue Pricing.


Setting up for Amazon Glue Studio

Complete the tasks in this section when you're using Amazon Glue Studio for the first time:

Topics
• Sign up for Amazon (p. 5)
• Create an IAM administrator user (p. 5)
• Signing in as an IAM user (p. 6)
• IAM permissions needed for the Amazon Glue Studio user (p. 6)
• Job-related permissions (p. 7)
• Set up IAM permissions for Amazon Glue Studio (p. 8)
• Configuring a VPC for your ETL job (p. 9)
• Populate the Amazon Glue Data Catalog (p. 9)

Sign up for Amazon

If you do not have an Amazon Web Services account, use the following procedure to create one.

To sign up for Amazon Web Services

1. Open http://www.amazonaws.cn/ and choose Sign Up.
2. Follow the on-screen instructions.

Create an IAM administrator user

If your account already includes an IAM user with full Amazon administrative permissions, you can skip this section.

To create an administrator user for yourself and add the user to an administrators group (console)

1. Sign in to the IAM console as the account owner by choosing Root user and entering your Amazon Web Services account email address. On the next page, enter your password.

   Note
   We strongly recommend that you adhere to the best practice of using the Administrator IAM user that follows and securely lock away the root user credentials. Sign in as the root user only to perform a few account and service management tasks.

2. In the navigation pane, choose Users and then choose Add user.
3. For User name, enter Administrator.
4. Select the check box next to Amazon Web Services Management Console access. Then select Custom password, and then enter your new password in the text box.
5. (Optional) By default, Amazon requires the new user to create a new password when first signing in. You can clear the check box next to User must create a new password at next sign-in to allow the new user to reset their password after they sign in.


6. Choose Next: Permissions.
7. Under Set permissions, choose Add user to group.
8. Choose Create group.
9. In the Create group dialog box, for Group name enter Administrators.
10. Choose Filter policies, and then select Amazon managed - job function to filter the table contents.
11. In the policy list, select the check box for AdministratorAccess. Then choose Create group.

    Note
    You must activate IAM user and role access to Billing before you can use the AdministratorAccess permissions to access the Amazon Billing and Cost Management console. To do this, follow the instructions in step 1 of the tutorial about delegating access to the billing console.

12. Back in the list of groups, select the check box for your new group. Choose Refresh if necessary to see the group in the list.
13. Choose Next: Tags.
14. (Optional) Add metadata to the user by attaching tags as key-value pairs. For more information about using tags in IAM, see Tagging IAM entities in the IAM User Guide.
15. Choose Next: Review to see the list of group memberships to be added to the new user. When you are ready to proceed, choose Create user.

You can use this same process to create more groups and users and to give your users access to your Amazon Web Services account resources. To learn about using policies that restrict user permissions to specific Amazon resources, see Access management and Example policies.

Signing in as an IAM user

Sign in to the IAM console by choosing IAM user and entering your Amazon Web Services account ID or account alias. On the next page, enter your IAM user name and your password.

Note
For your convenience, the Amazon sign-in page uses a browser cookie to remember your IAM user name and account information. If you previously signed in as a different user, choose the sign-in link beneath the button to return to the main sign-in page. From there, you can enter your Amazon Web Services account ID or account alias to be redirected to the IAM user sign-in page for your account.

IAM permissions needed for the Amazon Glue Studio user

To use Amazon Glue Studio, the user must have access to various Amazon resources. The user must be able to view and select Amazon S3 buckets, IAM policies and roles, and Amazon Glue Data Catalog objects.

Amazon Glue service permissions

Amazon Glue Studio uses the actions and resources of the Amazon Glue service. Your user needs permissions on these actions and resources to effectively use Amazon Glue Studio. You can grant the Amazon Glue Studio user the AWSGlueConsoleFullAccess managed policy, or create a custom policy with a smaller set of permissions.


Important
Per security best practices, it is recommended that you tighten policies to further restrict access to the Amazon S3 buckets and Amazon CloudWatch log groups that the job uses. For an example Amazon S3 policy, see Writing IAM Policies: How to Grant Access to an Amazon S3 Bucket.

Amazon CloudWatch permissions

You can monitor your Amazon Glue Studio jobs using Amazon CloudWatch, which collects and processes raw data from Amazon Glue into readable, near-real-time metrics. By default, Amazon Glue metrics data is sent to CloudWatch automatically. For more information, see What Is Amazon CloudWatch? in the Amazon CloudWatch User Guide, and Amazon Glue Metrics in the Amazon Glue Developer Guide.

To access CloudWatch dashboards, the user accessing Amazon Glue Studio needs one of the following:

• The AdministratorAccess policy
• The CloudWatchFullAccess policy
• A custom policy that includes one or more of these specific permissions:
  • cloudwatch:GetDashboard and cloudwatch:ListDashboards to view dashboards
  • cloudwatch:PutDashboard to create or modify dashboards
  • cloudwatch:DeleteDashboards to delete dashboards

For more information about changing permissions for an IAM user using policies, see Changing Permissions for an IAM User in the IAM User Guide.

Job-related permissions

When you create a job using Amazon Glue Studio, the job assumes the permissions of the IAM role that you specify when you create it. This IAM role must have permission to extract data from your data source, write data to your target, and access Amazon Glue resources.

The name of the role that you create for the job must start with the string AWSGlueServiceRole for it to be used correctly by Amazon Glue Studio. For example, you might name your role AWSGlueServiceRole-FlightDataJob.

Data source and data target permissions

An Amazon Glue Studio job must have access to Amazon S3 for any sources, targets, scripts, and temporary directories that you use in your job. You can create a policy to provide fine-grained access to specific Amazon S3 resources.

• Data sources require s3:ListBucket and s3:GetObject permissions.
• Data targets require s3:ListBucket, s3:PutObject, and s3:DeleteObject permissions.
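As one way to grant these permissions, the following sketch attaches a scoped-down inline policy to the job role with boto3. The bucket names, role name, and policy name are placeholders, and the aws-cn partition is shown because this guide uses China Region endpoints:

```python
import json

import boto3

# Scoped-down Amazon S3 permissions for a job's source and target buckets.
# All names below are placeholders.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket", "s3:GetObject"],
            "Resource": [
                "arn:aws-cn:s3:::example-source-bucket",
                "arn:aws-cn:s3:::example-source-bucket/*",
            ],
        },
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket", "s3:PutObject", "s3:DeleteObject"],
            "Resource": [
                "arn:aws-cn:s3:::example-target-bucket",
                "arn:aws-cn:s3:::example-target-bucket/*",
            ],
        },
    ],
}

iam = boto3.client("iam")
iam.put_role_policy(
    RoleName="AWSGlueServiceRole-ExampleJob",  # hypothetical job role
    PolicyName="GlueStudioS3Access",
    PolicyDocument=json.dumps(policy),
)
```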

If you choose Amazon Redshift as your data source, you can provide a role for cluster permissions. Jobs that run against an Amazon Redshift cluster issue commands that access Amazon S3 for temporary storage using temporary credentials. If your job runs for more than an hour, these credentials expire, causing the job to fail. To avoid this problem, you can assign a role to the Amazon Redshift cluster itself that grants the necessary permissions to jobs using temporary credentials. For more information, see Moving Data to and from Amazon Redshift in the Amazon Glue Developer Guide.

If the job uses data sources or targets other than Amazon S3, then you must attach the necessary permissions to the IAM role used by the job to access these data sources and targets. For more information, see Setting Up Your Environment to Access Data Stores in the Amazon Glue Developer Guide.


If you're using connectors and connections for your data store, you need additional permissions, as described in the section called “Additional permissions when using connectors” (p. 8).

Permissions required for deleting jobs

In Amazon Glue Studio you can select multiple jobs in the console to delete. To perform this action, you must have the glue:BatchDeleteJob permission. This is different from the Amazon Glue console, which requires the glue:DeleteJob permission for deleting jobs.

Amazon Key Management Service permissions

If you plan to access Amazon S3 sources and targets that use server-side encryption with Amazon Key Management Service (Amazon KMS), then attach a policy to the Amazon Glue Studio role used by the job that enables the job to decrypt the data. The job role needs the kms:ReEncrypt, kms:GenerateDataKey, and kms:DescribeKey permissions. Additionally, the job role needs the kms:Decrypt permission to upload or download an Amazon S3 object that is encrypted with an Amazon KMS customer master key (CMK).

There are additional charges for using Amazon KMS CMKs. For more information, see Amazon Key Management Service Concepts - Customer Master Keys (CMKs) and Amazon Key Management Service Pricing in the Amazon Key Management Service Developer Guide.

Additional permissions when using connectors

If you're using an Amazon Glue Custom Connector and connection to access a data store, the role used to run the Amazon Glue ETL job needs additional permissions attached:

• The Amazon managed policy AmazonEC2ContainerRegistryReadOnly for accessing connectors purchased from Amazon Web Services Marketplace.
• The glue:GetJob and glue:GetJobs permissions.
• Amazon Secrets Manager permissions for accessing secrets that are used with connections. Refer to IAM policy examples for secrets in Amazon Secrets Manager for example IAM policies.

If your Amazon Glue ETL job runs within an Amazon VPC, then the VPC must be configured as described in the section called “Configuring a VPC for your ETL job” (p. 9).

Set up IAM permissions for Amazon Glue Studio

You can create the roles and assign policies to users and job roles by using the Amazon administrator user.

To create an IAM policy and role for use with Amazon Glue Studio

1. Create an IAM policy for the Amazon Glue service.

   You can use the AWSGlueConsoleFullAccess Amazon managed policy.

   To create your own policy, follow the steps documented in Create an IAM Policy for the Amazon Glue Service in the Amazon Glue Developer Guide.

2. Create an IAM role for Amazon Glue and attach the IAM policy to this role.

   Follow the steps documented in Create an IAM Role for Amazon Glue in the Amazon Glue Developer Guide.


3. Create a user for Amazon Glue or Amazon Glue Studio.

You can either use the administrator user for configuring Amazon Glue resources, or you can create a separate user for accessing Amazon Glue Studio.

To create additional users for Amazon Glue and Amazon Glue Studio, follow the steps in Creating Your First IAM Delegated User and Group in the IAM User Guide.

Configuring a VPC for your ETL job

You can use Amazon Virtual Private Cloud (Amazon VPC) to define a virtual network in your own logically isolated area within the Amazon Cloud, known as a virtual private cloud (VPC). You can launch your Amazon resources, such as instances, into your VPC. Your VPC closely resembles a traditional network that you might operate in your own data center, with the benefits of using the scalable infrastructure of Amazon. You can configure your VPC; you can select its IP address range, create subnets, and configure route tables, network gateways, and security settings. You can connect instances in your VPC to the internet. You can connect your VPC to your own corporate data center, making the Amazon Cloud an extension of your data center. To protect the resources in each subnet, you can use multiple layers of security, including security groups and network access control lists. For more information, see the Amazon VPC User Guide.

You can configure your Amazon Glue ETL jobs to run within a VPC when using connectors. You must configure your VPC for the following, as needed:

• Public network access for data stores not in Amazon. All data stores that are accessed by the job must be available from the VPC subnet.
• If your job needs to access both VPC resources and the public internet, the VPC needs to have a network address translation (NAT) gateway inside the VPC.

For more information, see Setting Up Your Environment to Access Data Stores in the Amazon Glue Developer Guide.

Populate the Amazon Glue Data Catalog

Amazon Glue Studio uses datasets that are defined in the Amazon Glue Data Catalog. These datasets are used as sources and targets for ETL workflows in Amazon Glue Studio. If you choose the Data Catalog for your data source or target, then the Data Catalog tables related to your data source or data target must exist prior to creating a job.

When reading from or writing to a data source, your ETL job needs to know the schema of the data. The ETL job can get this information from a table in the Amazon Glue Data Catalog. You can use a crawler, the Amazon Glue console, Amazon CLI, or an Amazon CloudFormation template file to add databases and tables to the Data Catalog. For more information about populating the Data Catalog, see Data Catalog in the Amazon Glue Developer Guide.
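For example, a crawler that populates the Data Catalog from an Amazon S3 path can be created and started with the SDK for Python (boto3). All names and paths in this sketch are placeholders:

```python
import boto3

glue = boto3.client("glue")

# Create a crawler that adds a table for an S3 dataset to the Data Catalog.
glue.create_crawler(
    Name="example-flights-crawler",
    Role="AWSGlueServiceRole-CrawlerExample",  # hypothetical IAM role
    DatabaseName="example-db",
    Targets={"S3Targets": [{"Path": "s3://example-bucket/flights/"}]},
)

# Run the crawler; the table appears in the Data Catalog when it finishes.
glue.start_crawler(Name="example-flights-crawler")
```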

When using connectors, you can use the schema builder to enter the schema information when you configure the data source node of your ETL job in Amazon Glue Studio. For more information, see the section called “Authoring jobs with custom connectors” (p. 48).

If you choose an Amazon S3 location as your data source, Amazon Glue Studio can automatically infer the schema of the data it reads from the files at the specified location. For more information, see Using files in Amazon S3 for the data source (p. 21).

If you choose a streaming data source, Amazon Glue Studio can automatically infer the schema of the data it reads from the data stream. For more information, see Using a streaming data source (p. 22).


Tutorial: Getting started with Amazon Glue Studio

You can use Amazon Glue Studio to create jobs that extract structured or semi-structured data from a data source, perform a transformation of that data, and save the result set in a data target.

Topics
• Prerequisites (p. 10)
• Step 1: Start the job creation process (p. 10)
• Step 2: Edit the data source node in the job diagram (p. 11)
• Step 3: Edit the transform node of the job (p. 12)
• Step 4: Edit the data target node of the job (p. 12)
• Step 5: View the job script (p. 13)
• Step 6: Specify the job details and save the job (p. 13)
• Step 7: Run the job (p. 14)
• Next steps (p. 14)

Prerequisites

This tutorial has the following prerequisites:

• You have an Amazon account.
• You have access to Amazon Glue Studio.
• Your account has all the necessary permissions for creating and running a job for an Amazon S3 data source and data target.
• You have created an Amazon Identity and Access Management role for the job to use. You can also choose an IAM role for the job that includes permissions for all your data sources, data targets, temporary directory, scripts, and any libraries used by the job.
• The following components exist in Amazon:
  • The Flights Data Crawler crawler
  • The flights-db database
  • The flightscsv table
  • The IAM role AWSGlueServiceRole-CrawlerTutorial

To create these components, you can complete the service tutorial Add a crawler, which populates the Amazon Glue Data Catalog with the necessary objects. This tutorial also creates an IAM role with the necessary permissions. You can find the tutorial on the Amazon Glue service page at https://console.amazonaws.cn/glue/, in the left-side navigation under Tutorials. Alternatively, you can use the documentation version of this tutorial, Tutorial: Adding an Amazon Glue crawler (p. 77).

Step 1: Start the job creation process

In this task, you choose to start the job creation by using a template.


To create a job, starting with a template

1. Sign in to the Amazon Web Services Management Console and open the Amazon Glue Studio console at https://console.amazonaws.cn/gluestudio/.
2. On the Amazon Glue Studio landing page, choose View jobs under the heading Create and manage jobs.
3. On the Jobs page, under the heading Create job, choose the Source and target added to the graph option. Then, choose S3 for the Source and S3 for the Target.
4. Choose the Create button to start the job creation process.

The job editing page opens with a simple three-node job diagram displayed.

Step 2: Edit the data source node in the job diagram

Choose the Data source - S3 bucket node in the job diagram to edit the data source properties.

To edit the data source node

1. On the Node properties tab in the node details pane, for Name, enter a name that is unique for this job.

   The value you enter is used as the label for the data source node in the job diagram. If you use unique names for the nodes in your job, then it's easier to identify each node in the job diagram, and also to select parent nodes. For this tutorial, enter the name S3 Flight Data.

2. Choose the Data source properties - S3 tab in the node details panel.
3. Choose the Data Catalog table option for the S3 source type.
4. For Database, choose the flights-db database from the list of available databases in your Amazon Glue Data Catalog.
5. For Table, enter flight in the search field, and then choose the flightscsv table from your Amazon Glue Data Catalog.
6. (Optional) Choose the Output schema tab in the node details panel to view the data schema.
7. (Optional) After configuring the node properties and data source properties, you can preview the dataset from your data source by choosing the Data preview tab in the node details panel. The first time you choose this tab for any node in your job, you are prompted to provide an IAM role to access the data. There is a cost associated with using this feature, and billing starts as soon as you provide an IAM role.

   By default, the first 5 columns are selected for viewing in the Data preview tab. To view other columns, choose Previewing 5 of 65 fields. For example, you can deselect the first 5 columns and select fl_date, airline_id, fl_num, tail_num, and origin_airport_id. Scroll to the end of the column list and choose Confirm to save your choices.

After you have provided the required information for the data source node, a green check mark appears on the node in the job diagram.


Step 3: Edit the transform node of the job

The transform node is where you specify how you want to modify the data from its original format. An ApplyMapping transform enables you to rename data property keys, change the data types, and drop columns from the dataset.

When you edit the Transform - ApplyMapping node, the original schema for your data is shown in the Source key column in the node details panel. This is the data property key name (column name) that is obtained from the source data and stored in the table in the Amazon Glue Data Catalog.

The Target key column shows the key name that will appear in the data target. You can use this field to change the data property key name in the output. The Data type column shows the data type of the key and allows you to change it to a different data type for the target. The Drop column contains a check box; select this box to drop the field from the target schema.
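In the generated script, this node becomes an ApplyMapping.apply call. The following sketch shows how the edits in this step might look; the source column types are assumptions, since they depend on how the crawler typed the flights table:

```python
from awsglue.context import GlueContext
from awsglue.transforms import ApplyMapping
from pyspark.context import SparkContext

glueContext = GlueContext(SparkContext.getOrCreate())

# The flights table from this tutorial's Data Catalog
source_dyf = glueContext.create_dynamic_frame.from_catalog(
    database="flights-db", table_name="flightscsv"
)

# Each mapping tuple is (source key, source type, target key, target type).
# Keys left out of the list (such as quarter and day_of_week) are dropped.
mapped_dyf = ApplyMapping.apply(
    frame=source_dyf,
    mappings=[
        ("fl_date", "string", "fl_date", "string"),
        ("day_of_month", "long", "day", "tinyint"),  # renamed and retyped
        ("month", "long", "month", "tinyint"),       # retyped to tinyint
    ],
)
```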

To edit the transform node

1. Choose the Transform - ApplyMapping node in the job diagram to edit the data transformation properties.
2. In the node details panel, on the Node properties tab, review the information.

   You can change the name of this node if you want.

3. Choose the Transform tab in the node details panel.
4. Choose to drop the keys quarter and day_of_week by selecting the check box in the Drop column for each key.
5. For the key that shows day_of_month in the Source key column, change the Target key value to day.

   Change the data type for the month and day keys to tinyint. The tinyint data type stores integers using 1 byte of storage, with a range of values from 0 to 255. When changing the data type, you must verify that the data type is supported by your target.

6. (Optional) Choose the Output schema tab in the node details panel to view the modified schema.
7. (Optional) After configuring the node properties and transform properties, you can preview the modified dataset by choosing the Data preview tab in the node details panel. The first time you choose this tab for any node in your job, you are prompted to provide an IAM role to access the data. There is a cost associated with using this feature, and billing starts as soon as you provide an IAM role.

   By default, the first 5 columns are selected for the data preview, but the columns are no longer the same as the columns viewed on the data source node because we dropped two of the columns and renamed a third column.

Notice that the Transform - ApplyMapping node in the job diagram now has a green check mark, indicating that the node has been edited and has all the required information.

Step 4: Edit the data target node of the job

A data target node determines where the transformed output is sent. The location can be an Amazon S3 bucket, a Data Catalog table, or a connector and connection. If you choose a Data Catalog table, the data is written to the location associated with that table. For example, if you use a crawler to create a table in the Data Catalog for a JDBC target, the data is written to that JDBC table.


To edit the data target node

1. Choose the Data target - S3 bucket node in the job diagram to edit the data target properties.
2. In the node details panel on the right, choose the Node properties tab. For Name, enter a unique name for the node, such as Revised Flight Data.
3. Choose the Data target properties - S3 tab.
4. For Format, choose JSON.

   For Compression Type, keep the default value of None.

   For the S3 Target Location, choose the Browse S3 button to see the Amazon S3 buckets that you have access to, and choose one as the target destination.

   For the Data Catalog update options, keep the default setting of Do not update the Data Catalog.

   For more information about the available options, see Overview of data target options (p. 36).
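In the generated job script, this target node becomes a write_dynamic_frame call. A minimal sketch, continuing the mapped_dyf frame from the ApplyMapping sketch in Step 3; the output path is a placeholder:

```python
# Write the transformed DynamicFrame to Amazon S3 as JSON, matching the
# target node configured above.
glueContext.write_dynamic_frame.from_options(
    frame=mapped_dyf,
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/flight-data-output/"},
    format="json",
)
```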

Step 5: View the job script

After you configure all the nodes in the job, Amazon Glue Studio generates a script that is used by the job to read, transform, and write the data in the target location.

To view the script, choose the Script tab at the top of the job editing pane. Don't choose the Edit script button, because that takes you out of visual editor mode.

If you chose the Edit script button and confirmed your choice, you can reload the page (without saving the job first) to reset the Script tab.

Step 6: Specify the job details and save the job

Before you can save and run your extract, transform, and load (ETL) job, you must first enter additional information about the job itself.

To specify the job details and save the job

1. Choose the Job details tab.
2. Enter a name for the job, for example, FlightDataETL. Provide a UTF-8 string with a maximum length of 255 characters.

   You can optionally enter a description of the job.

3. For the IAM role, choose AWSGlueServiceRole-CrawlerTutorial from the list of available roles. You might have to add access to the target Amazon S3 bucket to this role.

   If you have many roles to choose from, you can start entering part of the role name in the IAM role search field, and the roles with the matching text string will be displayed. For example, you can enter tutorial in the search field to find all roles with tutorial (case-insensitive) in the name.

   The Amazon Identity and Access Management (IAM) role is used to authorize access to resources that are used to run the job. You can only choose roles that already exist in your account. The role you choose must have permission to access your Amazon S3 sources, targets, temporary directory, scripts, and any libraries used by the job, as well as access to Amazon Glue service resources.

   For the steps to create a role, see Create an IAM Role for Amazon Glue in the Amazon Glue Developer Guide.


4. For the remaining fields, use the default values.
5. Choose Save in the top-right corner of the page.

You should see a notification at the top of the page that the job was successfully saved.

If you don't see a notification that your job was successfully saved, then there is most likely information missing that prevents the job from being saved.

• Review the job in the visual editor, and choose any node that doesn't have a green check mark.
• If any of the tabs above the visual editor pane have a callout, choose that tab and look for any fields that are highlighted in red.

Step 7: Run the job

Now that the job has been saved, you can run the job. Choose the Run button at the top of the page. You should then see a notification that the job was successfully started.

You can choose either the link in the notification for Run Details, or choose the Runs tab to view the run status of the job.

On the Runs tab, there is a card for each recent run of the job with information about that job run.

For more information about the job run information, see the section called “View information for recent job runs” (p. 66).

Next steps

After you start the job run, you might want to try some of the following tasks:

• View the job monitoring dashboard – Accessing the job monitoring dashboard (p. 72).
• Try a different transform on the data – Overview of mappings and transforms (p. 23).
• View the jobs that exist in your account – View your jobs (p. 66).
• Run the job using a time-based schedule – Schedule job runs (p. 64).


Creating ETL jobs with Amazon Glue Studio

You can use the simple visual interface in Amazon Glue Studio to create your ETL jobs. You use the Jobs page to create new jobs. You can also use a script editor to work directly with code in the Amazon Glue Studio ETL job script.

On the Jobs page, you can see all the jobs that you have created either with Amazon Glue Studio or Amazon Glue. You can view, manage, and run your jobs on this page.

Topics
• Start the job creation process (p. 15)
• Create jobs that use a connector (p. 16)
• Next steps for creating a job in Amazon Glue Studio (p. 16)

Start the job creation process

You use the visual editor to create and customize your jobs. When you create a new job, you have the option of starting with an empty canvas, a job with a data source, transform, and data target node, or writing an ETL script.

To create a job in Amazon Glue Studio

1. Sign in to the Amazon Web Services Management Console and open the Amazon Glue Studio console at https://console.amazonaws.cn/gluestudio/.
2. You can either choose Create and manage jobs from the Amazon Glue Studio landing page, or you can choose Jobs from the navigation pane.

   The Jobs page appears.

3. In the Create job section, choose a configuration option for your job.

   • To create a job starting with an empty canvas, choose Visual with a blank canvas.
   • To create a job starting with a source node, or with a source, transform, and target node, choose Visual with a source and target.

     You then choose the data source type. You can also choose the data target type, or you can choose the Choose later option to start with only a data source node in the graph.

   • For those familiar with programming and writing ETL scripts, you can choose Spark script editor to create a new Spark ETL job. You then have the option of writing Python or Scala code in a script editor window, or uploading an existing script from a local file. If you choose to use the script editor, then you can't use the visual job editor.

     By default, new scripts are coded in Python. To write a new Scala script, see Creating and editing Scala scripts in Amazon Glue Studio (p. 40).

   • You can alternatively create a new Python shell job with the Python Shell script editor option. You then have the option of writing code in a script editor window starting with a template (boilerplate), or uploading an existing script from a local file. If you choose to use the Python shell editor, then you can't use the visual job editor.


4. Choose Create to open the visual job editor.

Create jobs that use a connector

After you have added a connector to Amazon Glue Studio and created a connection for that connector, you can create a job that uses the connection for the data source.

For detailed instructions, see the section called “Authoring jobs with custom connectors” (p. 48).

Next steps for creating a job in Amazon Glue Studio

You use the visual job editor to configure nodes for your job. Each node represents an action, such as reading data from the source location or applying a transform to the data. Each node you add to your job has properties that provide information about either the data location or the transform.

The next steps for creating and managing your jobs are:

• Editing ETL jobs in Amazon Glue Studio (p. 17)
• Adding connectors to Amazon Glue Studio (p. 45)
• View the job script (p. 67)
• Modify the job properties (p. 67)
• Save the job (p. 69)
• Start a job run (p. 64)
• View information for recent job runs (p. 66)
• Accessing the job monitoring dashboard (p. 72)


Editing ETL jobs in Amazon Glue Studio

While creating a new job, or after you have saved your job, you can use Amazon Glue Studio to modify your ETL jobs. You can do this by editing the nodes in the visual editor or by editing the job script in developer mode. You can also add and remove nodes in the visual editor to create more complicated ETL jobs.

Topics
• Accessing the job diagram editor (p. 17)
• Job editor features (p. 17)
• Editing the data source node (p. 19)
• Editing the data transform node (p. 23)
• Configuring data target nodes (p. 36)
• Editing or uploading a job script (p. 39)
• Adding nodes to the job diagram (p. 42)
• Changing the parent nodes for a node in the job diagram (p. 42)
• Deleting nodes from the job diagram (p. 43)

Accessing the job diagram editor

Use the Amazon Glue Studio job editor to edit your ETL jobs.

You can access the job editor in the following ways:

• Choose Jobs in the console navigation pane. On the Jobs page, locate the job in the Your jobs list. You can then either:
  • Choose the name of the job in the Name column to open the job editor for that job.
  • Choose the job, and then choose Edit job from the Actions list.
• Choose Monitoring in the console navigation pane. On the Monitoring page, locate the job in the Job runs list. You can filter the rows in the Job runs list, as described in Job runs view (p. 72). Choose the job you want to edit, and then choose View job from the Actions menu.

Job editor features

The job editor provides the following features for creating and editing jobs.

• A visual diagram of your job, with a node for each job task: Data source nodes for reading the data; transform nodes for modifying the data; data target nodes for writing the data.

You can view and configure the properties of each node in the job diagram. You can also view the schema and sample data for each node in the job diagram. These features help you to verify that your job is modifying and transforming the data in the right way, without having to run the job.


• A Script viewing and editing tab, where you can modify the code generated for your job.
• A Job details tab, where you can configure a variety of settings to customize the environment in which your Amazon Glue ETL job runs.
• A Runs tab, where you can view the current and previous runs of the job, view the status of the job run, and access the logs for the job run.
• A Schedules tab, where you can configure the start time for your job, or set up recurring job runs.

Using schema previews in the visual job editor

While creating or editing your job, you can use the Output schema tab to view the schema for your data.

Before you can see the schema, the job editor needs permissions to access the data source. You can specify an IAM role on the Job details tab of the editor or on the Output schema tab for a node. If the IAM role has all the necessary permissions to access the data source, you can then view the schema on the Output schema tab for a node.

Using data previews in the visual job editor

While creating or editing your job, you can use the Data preview tab to view a sample of your data.

Before you can see the data sample, the job editor needs permissions to access the data source. The first time you choose the Data preview tab, you are prompted to choose an IAM role to use. This can be the same role that you plan to use for your job, or it can be a different role. The IAM role you choose must have the necessary permissions to create the data previews.

After you choose an IAM role, it takes about 20 to 30 seconds before the data appears. You are charged for data preview usage as soon as you choose the IAM role. The following features help you when viewing the data.

• Choose the settings icon (a gear symbol) to configure your preferences for data previews. You can change the sample size or you can choose to wrap the text from one line to the next. These settings apply to all nodes in the job diagram.
• Choose the Previewing x of y fields button to select which columns (fields) to preview. When you preview your data using the default settings, the job editor shows the first 5 columns of your dataset. You can change this to show all or none (not recommended).
• You can scroll through the data preview window both horizontally and vertically.
• Use the split/whole screen button to expand the Data preview tab to the entire screen (overlaying the job graph), to better view the data and data structures.

Data previews help you create and test your job, without having to repeatedly run the job.

• You can test an IAM role to make sure you have access to your data sources or data targets.
• You can check that the transform is modifying the data in the intended way. For example, if you use a Filter transform, you can make sure that the filter is selecting the right subset of data.
• If your dataset contains columns with values of multiple types, the data preview shows a list of tuples for these columns. Each tuple contains the data type and its value.


Restrictions when using data previews

When using data previews, you might encounter the following restrictions or limitations.

• The first time you choose the Data preview tab, you must choose an IAM role. This role must have the necessary permissions to access the data and other resources needed to create the data previews.
• After you provide an IAM role, it takes a while before the data is available for viewing. For datasets with less than 1 GB of data, it can take up to one minute. If you have a large dataset, you should use partitions to improve the loading time. Loading data directly from Amazon S3 has the best performance.
• If you have a very large dataset, and it takes more than 30 minutes to query the data for the data preview, the request times out. You can reduce the dataset size to use data previews.
• By default, you see the first 5 columns in the Data preview tab. If the columns have no data values, you will get a message that there is no data to display. You can increase the number of rows sampled, or select different columns to see data values.
• Data previews are currently not supported for streaming data sources, or for data sources that use custom connectors.
• Errors on one node affect the entire job. If any one node has an error with data previews, the error shows up on all nodes until you correct it.
• If you change a data source for the job, then the child nodes of that data source might need to be updated to match the new schema. For example, if you have an ApplyMapping node that modifies a column, and the column does not exist in the replacement data source, you need to update the ApplyMapping transform node.
• If you view the Data preview tab for a SQL query transform node, and the SQL query uses an incorrect field name, the Data preview tab shows an error.

Editing the data source node

To specify the data source properties, you first choose a data source node in the job diagram. Then, on the right side in the node details panel, you configure the node properties.


To modify the properties of a data source node

1. Go to the visual editor for a new or saved job.
2. Choose a data source node in the job diagram.
3. Choose the Node properties tab in the node details panel, and then enter the following information:

   • Name: (Optional) Enter a name to associate with the node in the job diagram. This name should be unique among all the nodes for this job.
   • Node type: The node type determines the action that is performed by the node. In the list of options for Node type, choose one of the values listed under the heading Data source.

4. Configure the Data source properties information. For more information, see the following sections:

   • Using Data Catalog tables for the data source (p. 20)
   • Using a connector for the data source (p. 21)
   • Using files in Amazon S3 for the data source (p. 21)
   • Using a streaming data source (p. 22)

5. (Optional) After configuring the node properties and data source properties, you can view the schema for your data source by choosing the Output schema tab in the node details panel. The first time you choose this tab for any node in your job, you are prompted to provide an IAM role to access the data. If you have not specified an IAM role on the Job details tab, you are prompted to enter an IAM role here.
6. (Optional) After configuring the node properties and data source properties, you can preview the dataset from your data source by choosing the Data preview tab in the node details panel. The first time you choose this tab for any node in your job, you are prompted to provide an IAM role to access the data. There is a cost associated with using this feature, and billing starts as soon as you provide an IAM role.

Using Data Catalog tables for the data source

For all data sources except Amazon S3 and connectors, a table must exist in the Amazon Glue Data Catalog for the source type that you choose. Amazon Glue Studio does not create the Data Catalog table.

To configure a data source node based on a Data Catalog table

1. Go to the visual editor for a new or saved job.
2. Choose a data source node in the job diagram.
3. Choose the Data source properties tab, and then enter the following information:

   • S3 source type: (For Amazon S3 data sources only) Choose the option Select a Catalog table to use an existing Amazon Glue Data Catalog table.
   • Database: Choose the database in the Data Catalog that contains the source table you want to use for this job. You can use the search field to search for a database by its name.
   • Table: Choose the table associated with the source data from the list. This table must already exist in the Amazon Glue Data Catalog. You can use the search field to search for a table by its name.
   • Partition predicate: (For Amazon S3 data sources only) Enter a Boolean expression based on Spark SQL that includes only the partitioning columns. For example: "(year=='2020' and month=='04')"
   • Temporary directory: (For Amazon Redshift data sources only) Enter a path for the location of a working directory in Amazon S3 where your ETL job can write temporary intermediate results.
   • Role associated with the cluster: (For Amazon Redshift data sources only) Enter a role for your ETL job to use that contains permissions for Amazon Redshift clusters. For more information, see the section called “Data source and data target permissions” (p. 7).


Using a connector for the data source

If you select a connector for the Node type, follow the instructions at Authoring jobs with custom connectors (p. 48) to finish configuring the data source properties.

Using files in Amazon S3 for the data source

If you choose Amazon S3 as your data source, then you can choose either:

• A Data Catalog database and table.
• A bucket, folder, or file in Amazon S3.

If you use an Amazon S3 bucket as your data source, Amazon Glue Studio detects the schema of the data at the specified location from one of the files, or by using the file you specify as a sample file. Schema detection occurs when you use the Infer schema button. If you change the Amazon S3 location or the sample file, then you must choose Infer schema again to perform the schema detection using the new information.

To configure a data source node that reads directly from files in Amazon S3

1. Go to the visual editor for a new or saved job. 2. Choose a data source node in the job diagram for an Amazon S3 source. 3. Choose the Data source properties tab, and then enter the following information:

• S3 source type: (For Amazon S3 data sources only) Choose the option S3 location. • S3 URL: Enter the path to the Amazon S3 bucket, folder, or file that contains the data for your job. You can choose Browse S3 to select the path from the locations available to your account. • Recursive: Choose this option if you want Amazon Glue Studio to read data from files in child folders at the S3 location.

If the child folders contain partitioned data, Amazon Glue Studio doesn't add any partition information that's specified in the folder names to the Data Catalog. For example, consider the following folders in Amazon S3:

S3://sales/year=2019/month=Jan/day=1 S3://sales/year=2019/month=Jan/day=2

If you choose Recursive and select the sales folder as your S3 location, then Amazon Glue Studio reads the data in all the child folders, but doesn't create partitions for year, month or day. • Data format: Choose the format that the data is stored in. You can choose JSON, CSV, or Parquet. The value you select tells the Amazon Glue job how to read the data from the source file. Note If you don't select the correct format for your data, Amazon Glue Studio might infer the schema correctly, but the job won't be able to correctly parse the data from the source file.

You can enter additional configuration options, depending on the format you choose. • JSON (JavaScript Object Notation) • JsonPath: Enter a JSON path that points to an object that is used to define a table schema. JSON path expressions always refer to a JSON structure in the same way as XPath expression are used in combination with an XML document. The "root member object" in the JSON path is always referred to as $, even if it's an object or array. The JSON path can be written in dot notation or bracket notation.


For more information about the JSON path, see JsonPath on the GitHub website.
• Records in source files can span multiple lines: Choose this option if a single record can span multiple lines in the JSON file.
• CSV (comma-separated values)
• Delimiter: Enter a character to denote what separates each column entry in the row, for example, ; or ,.
• Escape character: Enter a character that is used as an escape character. This character indicates that the character that immediately follows the escape character should be taken literally, and should not be interpreted as a delimiter.
• Quote character: Enter the character that is used to group separate strings into a single value. For example, you would choose Double quote (") if you have values such as "This is a single value" in your CSV file.
• Records in source files can span multiple lines: Choose this option if a single record can span multiple lines in the CSV file.
• First line of source file contains column headers: Choose this option if the first row in the CSV file contains column headers instead of data.
• Parquet (Apache Parquet columnar storage)

There are no additional settings to configure for data stored in Parquet format.
• Partition predicate: To partition the data that is read from the data source, enter a Boolean expression based on Spark SQL that includes only the partitioning columns. For example: "(year=='2020' and month=='04')"
• Advanced options: Expand this section if you want Amazon Glue Studio to detect the schema of your data based on a specific file.
• Schema inference: Choose the option Choose a sample file from S3 if you want to use a specific file instead of letting Amazon Glue Studio choose a file.
• Auto-sampled file: Enter the path to the file in Amazon S3 to use for inferring the schema.

If you're editing a data source node and change the selected sample file, choose Reload schema to detect the schema by using the new sample file.
4. Choose the Infer schema button to detect the schema from the source files in Amazon S3. If you change the Amazon S3 location or the sample file, you must choose Infer schema again to infer the schema using the new information.
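As a rough sketch of the equivalent API call (the bucket path and options are hypothetical), reading CSV files directly from Amazon S3 with recursive traversal might look like the following:

datasource0 = glueContext.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={
        "paths": ["s3://example-bucket/sales/"],  # hypothetical location
        "recurse": True,                          # read files in child folders
    },
    format="csv",
    format_options={"withHeader": True, "separator": ","},
    transformation_ctx="datasource0",
)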

Using a streaming data source

You can create streaming extract, transform, and load (ETL) jobs that run continuously and consume data from streaming sources in Amazon Kinesis Data Streams, Apache Kafka, and Amazon Managed Streaming for Apache Kafka (Amazon MSK).

To configure properties for a streaming data source

1. Go to the visual graph editor for a new or saved job.
2. Choose a data source node in the graph for Kafka or Kinesis Data Streams.
3. Choose the Data source properties tab, and then enter the following information:

• Database: (Optional) Choose the database in the Amazon Glue Data Catalog that contains the table associated with your streaming data source. You can use the search field to search for a database by its name. If you choose the option Detect schema, you don't need to populate this field.


• Table: (Optional) Choose the table associated with the source data from the list. This table must already exist in the Amazon Glue Data Catalog. You can use the search field to search for a table by its name. If you choose the option Detect schema, you don't need to populate this field.
• Detect schema: Choose this option to have Amazon Glue Studio detect the schema from the streaming data, rather than storing the schema information in a Data Catalog table.
• Window size: By default, your ETL job processes and writes out data in 100-second windows. This allows data to be processed efficiently and permits aggregations to be performed on data that arrives later than expected. You can modify this window size to increase timeliness or aggregation accuracy. (A scripted example appears at the end of this section.)

Amazon Glue streaming jobs use checkpoints rather than job bookmarks to track the data that has been read.
• Advanced connection options: Expand this section to add key-value pairs to specify additional connection options. For information about what options you can specify here, see "connectionType": "kafka" or "connectionType": "kinesis" in the Amazon Glue Developer Guide.

Note
Data previews are not currently supported for streaming data sources.
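A streaming job script processes the stream in micro-batches. The following simplified Python sketch, with hypothetical names and locations, illustrates the general shape of such a script using the windowSize option described above.

from awsglue.dynamicframe import DynamicFrame

# Read the stream through a Data Catalog table that describes the source.
data_frame = glueContext.create_data_frame.from_catalog(
    database="streaming_db",        # hypothetical database name
    table_name="kinesis_events",    # hypothetical table name
    additional_options={"startingPosition": "TRIM_HORIZON"},
)

def process_batch(batch_df, batch_id):
    # Convert each micro-batch to a DynamicFrame, then transform and write it.
    dyf = DynamicFrame.fromDF(batch_df, glueContext, "batch")
    # ... apply transforms and write to the target here ...

glueContext.forEachBatch(
    frame=data_frame,
    batch_function=process_batch,
    options={
        "windowSize": "100 seconds",  # the default window size
        "checkpointLocation": "s3://example-bucket/checkpoints/",  # hypothetical
    },
)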

Editing the data transform node

Amazon Glue Studio provides a set of built-in transforms that you can use to process your data. Your data passes from one node in the job diagram to another in a data structure called a DynamicFrame, which is an extension to an Apache Spark SQL DataFrame.

In the pre-populated diagram for a job, between the data source and data target nodes is the Transform - ApplyMapping node. You can configure this transform node to modify your data, or you can use additional transforms.

Topics
• Overview of mappings and transforms (p. 23)
• Using ApplyMapping to remap data property keys (p. 24)
• Using SelectFields to remove most data property keys (p. 25)
• Using DropFields to keep most data property keys (p. 25)
• Renaming a field in the dataset (p. 26)
• Using Spigot to sample your dataset (p. 27)
• Joining datasets (p. 27)
• Using SplitFields to split a dataset into two (p. 29)
• Overview of SelectFromCollection transform (p. 29)
• Using SelectFromCollection to choose which dataset to keep (p. 30)
• Filtering keys within a dataset (p. 30)
• Find and fill missing values in a dataset (p. 31)
• Using a SQL query to transform data (p. 32)
• Creating a custom transformation (p. 33)

Overview of mappings and transforms

The following built-in transforms are available with Amazon Glue Studio:


• ApplyMapping: Map data property keys in the data source to data property keys in the data target. You can rename keys, modify the data types for keys, and choose which keys to drop from the dataset.
• SelectFields: Choose the data property keys that you want to keep.
• DropFields: Choose the data property keys that you want to drop.
• RenameField: Rename a single data property key.
• Spigot: Write samples of the data to an Amazon S3 bucket.
• Join: Join two datasets into one dataset using a comparison phrase on the specified data property keys. You can use inner, outer, left, right, left semi, and left anti joins.
• SplitFields: Split data property keys into two DynamicFrames. Output is a collection of DynamicFrames: one with selected data property keys, and one with the remaining data property keys.
• SelectFromCollection: Choose one DynamicFrame from a collection of DynamicFrames. The output is the selected DynamicFrame.
• Filter: Split a dataset into two, based on a filter condition.
• Custom transform: Enter code into a text entry field to use custom transforms. The output is a collection of DynamicFrames.
• SQL: Enter SparkSQL code into a text entry field to use a SQL query to transform the data. The output is a single DynamicFrame.

Using ApplyMapping to remap data property keys

An ApplyMapping transform remaps the source data property keys into the desired configuration for the target data. In an ApplyMapping transform node, you can:

• Change the name of multiple data property keys.
• Change the data type of the data property keys, if the new data type is supported and there is a transformation path between the two data types.
• Choose a subset of data property keys by indicating which data property keys you want to drop.

You can add additional ApplyMapping nodes to the job diagram as needed, for example, to modify additional data sources or following a Join transform.
Note
The ApplyMapping transform is not case-sensitive.

To add an ApplyMapping transform node to your job diagram

1. (Optional) Choose Transform in the toolbar at the top of the visual editor, and then choose ApplyMapping to add a new transform to your job diagram, if needed.
2. On the Node properties tab, enter a name for the node in the job diagram. If a node parent isn't already selected, choose a node from the Node parents list to use as the input source for the transform.
3. Choose the Transform tab in the node details panel.
4. Modify the input schema:

• To rename a data property key, enter the new name of the key in the Target key field.
• To change the data type for a data property key, choose the new data type for the key from the Data type list.
• To remove a data property key from the target schema, choose the Drop check box for that key.
5. (Optional) After configuring the transform node properties, you can view the modified schema for your data by choosing the Output schema tab in the node details panel. The first time you choose this tab for any node in your job, you are prompted to provide an IAM role to access the data. If you have not specified an IAM role on the Job details tab, you are prompted to enter an IAM role here.
6. (Optional) After configuring the node properties and transform properties, you can preview the modified dataset by choosing the Data preview tab in the node details panel. The first time you choose this tab for any node in your job, you are prompted to provide an IAM role to access the data. There is a cost associated with using this feature, and billing starts as soon as you provide an IAM role.
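Behind the visual editor, this node corresponds to the ApplyMapping transform in the generated script. A minimal Python sketch with hypothetical field names and types follows; each mapping tuple is (source key, source type, target key, target type).

from awsglue.transforms import ApplyMapping

mapped = ApplyMapping.apply(
    frame=datasource0,  # the DynamicFrame from the parent node
    mappings=[
        ("id", "long", "id", "long"),
        ("emp_name", "string", "employee_name", "string"),  # rename a key
        ("salary", "string", "salary", "double"),           # change a data type
        # any keys not listed here are dropped from the output
    ],
    transformation_ctx="applymapping1",
)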

Using SelectFields to remove most data property keys

You can create a subset of data property keys from the dataset using the SelectFields transform. You indicate which data property keys you want to keep and the rest are removed from the dataset.
Note
The SelectFields transform is case sensitive. Use ApplyMapping if you need a case-insensitive way to select fields.

To add a SelectFields transform node to your job diagram

1. (Optional) Choose Transform in the toolbar at the top of the visual editor, and then choose SelectFields to add a new transform to your job diagram, if needed.
2. On the Node properties tab, enter a name for the node in the job diagram. If a node parent is not already selected, choose a node from the Node parents list to use as the input source for the transform.
3. Choose the Transform tab in the node details panel.
4. Under the heading SelectFields, choose the data property keys in the dataset that you want to keep. Any data property keys not selected are dropped from the dataset.

You can also choose the check box next to the column heading Field to automatically choose all the data property keys in the dataset. Then you can deselect individual data property keys to remove them from the dataset.
5. (Optional) After configuring the transform node properties, you can view the modified schema for your data by choosing the Output schema tab in the node details panel. The first time you choose this tab for any node in your job, you are prompted to provide an IAM role to access the data. If you have not specified an IAM role on the Job details tab, you are prompted to enter an IAM role here.
6. (Optional) After configuring the node properties and transform properties, you can preview the modified dataset by choosing the Data preview tab in the node details panel. The first time you choose this tab for any node in your job, you are prompted to provide an IAM role to access the data. There is a cost associated with using this feature, and billing starts as soon as you provide an IAM role.
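In the generated script, this node corresponds to the SelectFields transform. A minimal sketch with hypothetical key names:

from awsglue.transforms import SelectFields

# Keep only the listed property keys; all other keys are dropped.
selected = SelectFields.apply(
    frame=datasource0,
    paths=["id", "employee_name"],
    transformation_ctx="selectfields1",
)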

Using DropFields to keep most data property keys

You can create a subset of data property keys from the dataset using the DropFields transform. You indicate which data property keys you want to remove from the dataset and the rest of the keys are retained.
Note
The DropFields transform is case sensitive. Use ApplyMapping if you need a case-insensitive way to select fields.


To add a DropFields transform node to your job diagram

1. (Optional) Choose Transform in the toolbar at the top of the visual editor, and then choose DropFields to add a new transform to your job diagram, if needed.
2. On the Node properties tab, enter a name for the node in the job diagram. If a node parent is not already selected, then choose a node from the Node parents list to use as the input source for the transform.
3. Choose the Transform tab in the node details panel.
4. Under the heading DropFields, choose the data property keys to drop from the data source.

You can also choose the check box next to the column heading Field to automatically choose all the data property keys in the dataset. Then you can deselect individual data property keys so they are retained in the dataset.
5. (Optional) After configuring the transform node properties, you can view the modified schema for your data by choosing the Output schema tab in the node details panel. The first time you choose this tab for any node in your job, you are prompted to provide an IAM role to access the data. If you have not specified an IAM role on the Job details tab, you are prompted to enter an IAM role here.
6. (Optional) After configuring the node properties and transform properties, you can preview the modified dataset by choosing the Data preview tab in the node details panel. The first time you choose this tab for any node in your job, you are prompted to provide an IAM role to access the data. There is a cost associated with using this feature, and billing starts as soon as you provide an IAM role.
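The corresponding call in the generated script is the DropFields transform; a minimal sketch with a hypothetical key name follows.

from awsglue.transforms import DropFields

# Drop only the listed property keys; all other keys are retained.
trimmed = DropFields.apply(
    frame=datasource0,
    paths=["salary"],
    transformation_ctx="dropfields1",
)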

Renaming a field in the dataset

You can use the RenameField transform to change the name for an individual property key in the dataset.
Note
The RenameField transform is case sensitive. Use ApplyMapping if you need a case-insensitive transform.
Tip
If you use the ApplyMapping transform, you can rename multiple data property keys in the dataset with a single transform.

To add a RenameField transform node to your job diagram

1. (Optional) Choose Transform in the toolbar at the top of the visual editor, and then choose RenameField to add a new transform to your job diagram, if needed.
2. On the Node properties tab, enter a name for the node in the job diagram. If a node parent is not already selected, then choose a node from the Node parents list to use as the input source for the transform.
3. Choose the Transform tab.
4. Under the heading Data field, choose a property key from the source data and then enter a new name in the New field name field.
5. (Optional) After configuring the transform node properties, you can view the modified schema for your data by choosing the Output schema tab in the node details panel. The first time you choose this tab for any node in your job, you are prompted to provide an IAM role to access the data. If you have not specified an IAM role on the Job details tab, you are prompted to enter an IAM role here.
6. (Optional) After configuring the node properties and transform properties, you can preview the modified dataset by choosing the Data preview tab in the node details panel. The first time you choose this tab for any node in your job, you are prompted to provide an IAM role to access the data. There is a cost associated with using this feature, and billing starts as soon as you provide an IAM role.
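In script form, this node maps to the RenameField transform. A minimal sketch with hypothetical key names:

from awsglue.transforms import RenameField

# Rename a single property key; all other keys pass through unchanged.
renamed = RenameField.apply(
    frame=datasource0,
    old_name="dept",
    new_name="department",
    transformation_ctx="renamefield1",
)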


Using Spigot to sample your dataset

To test the transformations performed by your job, you might want to get a sample of the data to check that the transformation works as intended. The Spigot transform writes a subset of records from the dataset to a JSON file in an Amazon S3 bucket. The data sampling method can be either a specific number of records from the beginning of the file or a probability factor used to pick records.

To add a Spigot transform node to your job diagram

1. (Optional) Choose Transform in the toolbar at the top of the visual editor, and then choose Spigot to add a new transform to your job diagram, if needed.
2. On the Node properties tab, enter a name for the node in the job diagram. If a node parent is not already selected, then choose a node from the Node parents list to use as the input source for the transform.
3. Choose the Transform tab in the node details panel.
4. Enter an Amazon S3 path or choose Browse S3 to choose a location in Amazon S3. This is the location where the job writes the JSON file that contains the data sample.
5. Enter information for the sampling method. You can specify a value for Number of records to write starting from the beginning of the dataset and a Probability threshold (entered as a decimal value with a maximum value of 1) of picking any given record.

For example, to write the first 50 records from the dataset, you would set Number of records to 50 and Probability threshold to 1 (100%).
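In the generated script, this node maps to the Spigot transform. The following sketch assumes the topk and prob option keys and a hypothetical S3 path; topk writes records from the beginning of the dataset, and prob sets the probability of picking any given record.

from awsglue.transforms import Spigot

sampled = Spigot.apply(
    frame=datasource0,
    path="s3://example-bucket/samples/",  # hypothetical sample location
    options={"topk": 50, "prob": 1.0},    # first 50 records, 100% probability
    transformation_ctx="spigot1",
)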

Joining datasets

The Join transform allows you to combine two datasets into one. You specify the key names in the schema of each dataset to compare. The output DynamicFrame contains rows where keys meet the join condition. The rows in each dataset that meet the join condition are combined into a single row in the output DynamicFrame that contains all the columns found in either dataset.

To add a Join transform node to your job diagram

1. If there is only one data source available, you must add a new data source node to the job diagram. See Adding nodes to the job diagram (p. 42) for details.
2. Choose one of the source nodes for the join. Choose Transform in the toolbar at the top of the visual editor, and then choose Join to add a new transform to your job diagram.
3. On the Node properties tab, enter a name for the node in the job diagram.
4. In the Node properties tab, under the heading Node parents, add a parent node so that there are two datasets providing inputs for the join. The parent can be a data source node or a transform node.
Note
A join can have only two parent nodes.
5. Choose the Transform tab.

If you see a message indicating that there are conflicting key names, you can either:

• Choose Resolve it to automatically add an ApplyMapping transform node to your job diagram. The ApplyMapping node adds a prefix to any keys in the dataset that have the same name as a key in the other dataset. For example, if you use the default value of right, then any keys in the right dataset that have the same name as a key in the left dataset will be renamed to (right)key name.


• Manually add a transform node earlier in the job diagram to remove or rename the conflicting keys.
6. Choose the type of join in the Join type list.

• Inner join: Returns a row with columns from both datasets for every match based on the join condition. Rows that don't satisfy the join condition aren't returned.
• Left join: All rows from the left dataset and only the rows from the right dataset that satisfy the join condition.
• Right join: All rows from the right dataset and only the rows from the left dataset that satisfy the join condition.
• Outer join: All rows from both datasets.
• Left semi join: All rows from the left dataset that have a match in the right dataset based on the join condition.
• Left anti join: All rows in the left dataset that don't have a match in the right dataset based on the join condition.
7. On the Transform tab, under the heading Join conditions, choose Add condition. Choose a property key from each dataset to compare. Property keys on the left side of the comparison operator are referred to as the left dataset and property keys on the right are referred to as the right dataset.

For more complex join conditions, you can add additional matching keys by choosing Add condition more than once. If you accidentally add a condition, you can choose the delete icon to remove it.
8. (Optional) After configuring the transform node properties, you can view the modified schema for your data by choosing the Output schema tab in the node details panel. The first time you choose this tab for any node in your job, you are prompted to provide an IAM role to access the data. If you have not specified an IAM role on the Job details tab, you are prompted to enter an IAM role here.
9. (Optional) After configuring the node properties and transform properties, you can preview the modified dataset by choosing the Data preview tab in the node details panel. The first time you choose this tab for any node in your job, you are prompted to provide an IAM role to access the data. There is a cost associated with using this feature, and billing starts as soon as you provide an IAM role.

For an example of the join output schema, consider a join between two datasets with the following property keys:

Left: {id, dept, hire_date, salary, employment_status}
Right: {id, first_name, last_name, hire_date, title}

The join is configured to match on the id and hire_date keys using the = comparison operator.

Because both datasets contain id and hire_date keys, you chose Resolve it to automatically add the prefix right to the keys in the right dataset.

The keys in the output schema would be:

{id, dept, hire_date, salary, employment_status, (right)id, first_name, last_name, (right)hire_date, title}
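For the inner-join case, the generated script uses the Join transform, which performs an equality join on the listed key pairs. A minimal sketch, with hypothetical frame names matching the example above:

from awsglue.transforms import Join

joined = Join.apply(
    frame1=left_frame,    # hypothetical left parent DynamicFrame
    frame2=right_frame,   # hypothetical right parent DynamicFrame
    keys1=["id", "hire_date"],
    keys2=["id", "hire_date"],
    transformation_ctx="join1",
)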


Using SplitFields to split a dataset into two

The SplitFields transform allows you to choose some of the data property keys in the input dataset and put them into one dataset and the unselected keys into a separate dataset. The output from this transform is a collection of DynamicFrames.
Note
You must use a SelectFromCollection transform to convert the collection of DynamicFrames into a single DynamicFrame before you can send the output to a target location.

The SplitFields transform is case sensitive. Add an ApplyMapping transform as a parent node if you need case-insensitive property key names.

To add a SplitFields transform node to your job diagram

1. (Optional) Choose Transform in the toolbar at the top of the visual editor, and then choose SplitFields to add a new transform to your job diagram, if needed.
2. On the Node properties tab, enter a name for the node in the job diagram. If a node parent is not already selected, then choose a node from the Node parents list to use as the input source for the transform.
3. Choose the Transform tab.
4. Choose which property keys you want to put into the first dataset. The keys that you do not choose are placed in the second dataset.
5. (Optional) After configuring the transform node properties, you can view the modified schema for your data by choosing the Output schema tab in the node details panel. The first time you choose this tab for any node in your job, you are prompted to provide an IAM role to access the data. If you have not specified an IAM role on the Job details tab, you are prompted to enter an IAM role here.
6. (Optional) After configuring the node properties and transform properties, you can preview the modified dataset by choosing the Data preview tab in the node details panel. The first time you choose this tab for any node in your job, you are prompted to provide an IAM role to access the data. There is a cost associated with using this feature, and billing starts as soon as you provide an IAM role.
7. Configure a SelectFromCollection transform node to process the resulting datasets.
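The corresponding script call is the SplitFields transform, which returns a DynamicFrameCollection. A minimal sketch with hypothetical key and frame names:

from awsglue.transforms import SplitFields

# The listed keys go into the first frame; all other keys go into the second.
split_collection = SplitFields.apply(
    frame=datasource0,
    paths=["id", "employee_name"],
    name1="selected",
    name2="remaining",
    transformation_ctx="splitfields1",
)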

Overview of SelectFromCollection transform

Certain transforms have multiple datasets as their output instead of a single dataset, for example, SplitFields. The SelectFromCollection transform selects one dataset (DynamicFrame) from a collection of datasets (an array of DynamicFrames). The output for the transform is the selected DynamicFrame.

You must use this transform after you use a transform that creates a collection of DynamicFrames, such as:

• Custom code transforms
• SplitFields

If you don't add a SelectFromCollection transform node to your job diagram after any of these transforms, you will get an error for your job.

The parent node for this transform must be a node that returns a collection of DynamicFrames. If you choose a parent for this transform node that returns a single DynamicFrame, such as a Join transform, your job returns an error.


Similarly, if you use a SelectFromCollection node in your job diagram as the parent for a transform that expects a single DynamicFrame as input, your job returns an error.

Using SelectFromCollection to choose which dataset to keep

Use the SelectFromCollection transform to convert a collection of DynamicFrames into a single DynamicFrame.

To add a SelectFromCollection transform node to your job diagram

1. (Optional) Choose Transform in the toolbar at the top of the visual editor, and then choose SelectFromCollection to add a new transform to your job diagram, if needed.
2. On the Node properties tab, enter a name for the node in the job diagram. If a node parent is not already selected, then choose a node from the Node parents list to use as the input source for the transform.
3. Choose the Transform tab.
4. Under the heading Frame index, choose the array index number that corresponds to the DynamicFrame you want to select from the collection of DynamicFrames.

For example, if the parent node for this transform is a SplitFields transform, on the Output schema tab of that node you can see the schema for each DynamicFrame. If you want to keep the DynamicFrame associated with the schema for Output 2, you would select 1 for the value of Frame index, which is the second value in the list.

Only the DynamicFrame that you choose is included in the output.
5. (Optional) After configuring the transform node properties, you can view the modified schema for your data by choosing the Output schema tab in the node details panel. The first time you choose this tab for any node in your job, you are prompted to provide an IAM role to access the data. If you have not specified an IAM role on the Job details tab, you are prompted to enter an IAM role here.
6. (Optional) After configuring the node properties and transform properties, you can preview the modified dataset by choosing the Data preview tab in the node details panel. The first time you choose this tab for any node in your job, you are prompted to provide an IAM role to access the data. There is a cost associated with using this feature, and billing starts as soon as you provide an IAM role.
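In script form, this node maps to the SelectFromCollection transform. The following sketch assumes a parent SplitFields collection like the one above and selects the second frame (frame index 1):

from awsglue.transforms import SelectFromCollection

second_frame = SelectFromCollection.apply(
    dfc=split_collection,                  # collection from the parent node
    key=list(split_collection.keys())[1],  # frame index 1, the second frame
    transformation_ctx="selectfromcollection1",
)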

Filtering keys within a dataset

Use the Filter transform to create a new dataset by filtering records from the input dataset based on a regular expression. Rows that don't satisfy the filter condition are removed from the output.

• For string data types, you can filter rows where the key value matches a specified string.


• For numeric data types, you can filter rows by comparing the key value to a specified value using the comparison operators <, >, =, !=, <=, and >=.

If you specify multiple filter conditions, the results are combined using an AND operator by default, but you can choose OR instead.
The Filter transform is case sensitive. Add an ApplyMapping transform as a parent node if you need case-insensitive property key names.

To add a Filter transform node to your job diagram

1. (Optional) Choose Transform in the toolbar at the top of the visual editor, and then choose Filter to add a new transform to your job diagram, if needed.
2. On the Node properties tab, enter a name for the node in the job diagram. If a node parent isn't already selected, then choose a node from the Node parents list to use as the input source for the transform.
3. Choose the Transform tab.
4. Choose either Global AND or Global OR. This determines how multiple filter conditions are combined. All conditions are combined using either AND or OR operations. If you have only a single filter condition, then you can choose either one.
5. Choose the Add condition button in the Filter condition section to add a filter condition.

In the Key field, choose a property key name from the dataset. In the Operation field, choose the comparison operator. In the Value field, enter the comparison value. Here are some examples of filter conditions:

• year >= 2018
• State matches 'CA*'

When you filter on string values, make sure that the comparison value uses a regular expression format that matches the script language selected in the job properties (Python or Scala).
6. Add additional filter conditions, as needed.
7. (Optional) After configuring the transform node properties, you can view the modified schema for your data by choosing the Output schema tab in the node details panel. The first time you choose this tab for any node in your job, you are prompted to provide an IAM role to access the data. If you have not specified an IAM role on the Job details tab, you are prompted to enter an IAM role here.
8. (Optional) After configuring the node properties and transform properties, you can preview the modified dataset by choosing the Data preview tab in the node details panel. The first time you choose this tab for any node in your job, you are prompted to provide an IAM role to access the data. There is a cost associated with using this feature, and billing starts as soon as you provide an IAM role.
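In the generated script, this node maps to the Filter transform, whose predicate function expresses the combined conditions. A minimal sketch with a hypothetical condition:

from awsglue.transforms import Filter

# Keep only the records that satisfy the predicate; combine multiple
# conditions with "and" or "or" to mirror Global AND / Global OR.
filtered = Filter.apply(
    frame=datasource0,
    f=lambda row: row["year"] >= 2018,
    transformation_ctx="filter1",
)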

Find and fill missing values in a dataset

You can use the FillMissingValues transform to locate records in the dataset that have missing values and add a new field with a value determined by imputation. The input dataset is used to train the machine learning (ML) model that determines what the missing value should be. If you use incremental datasets, then each incremental set is used as the training data for the ML model, so the results might not be as accurate.

To use a FillMissingValues transform node in your job diagram

1. (Optional) Choose Transform in the toolbar at the top of the visual editor, and then choose FillMissingValues to add a new transform to your job diagram, if needed.


2. On the Node properties tab, enter a name for the node in the job diagram. If a node parent isn't already selected, choose a node from the Node parents list to use as the input source for the transform.
3. Choose the Transform tab.
4. For Data field, choose the column or field name from the source data that you want to analyze for missing values.
5. (Optional) In the New field name field, enter a name for the field added to each record that will hold the estimated replacement value for the analyzed field. If the analyzed field doesn't have a missing value, the value in the analyzed field is copied into the new field.

If you don't specify a name for the new field, the default name is the name of the analyzed column with _filled appended. For example, if you enter Age for Data field and don't specify a value for New field name, a new field named Age_filled is added to each record.
6. (Optional) After configuring the transform node properties, you can view the modified schema for your data by choosing the Output schema tab in the node details panel. The first time you choose this tab for any node in your job, you are prompted to provide an IAM role to access the data. If you have not specified an IAM role on the Job details tab, you are prompted to enter an IAM role here.
7. (Optional) After configuring the node properties and transform properties, you can preview the modified dataset by choosing the Data preview tab in the node details panel. The first time you choose this tab for any node in your job, you are prompted to provide an IAM role to access the data. There is a cost associated with using this feature, and billing starts as soon as you provide an IAM role.
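In script form, this node maps to the FillMissingValues transform, which ships in the separate awsglueml module. A minimal sketch, assuming that module is available and using the Age example above:

from awsglueml.transforms import FillMissingValues

# Train on the input dataset and add a new field holding the actual or
# imputed value for the analyzed column.
filled = FillMissingValues.apply(
    frame=datasource0,
    missing_values_column="Age",
    output_column="Age_filled",
    transformation_ctx="fillmissingvalues1",
)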

Using a SQL query to transform data

You can use a SQL transform to write your own transform in the form of a SQL query.

A SQL transform node can have multiple datasets as inputs, but produces only a single dataset as output. It contains a text field where you enter the Apache SparkSQL query. You can assign aliases to each dataset used as input, to help simplify the SQL query. For more information about the SQL syntax, see the Spark SQL documentation.
Note
If you use a Spark SQL transform with a data source located in a VPC, add an Amazon Glue VPC endpoint to the VPC that contains the data source. For more information about configuring development endpoints, see Adding a Development Endpoint, Setting Up Your Environment for Development Endpoints, and Accessing Your Development Endpoint in the Amazon Glue Developer Guide.

To use a SQL transform node in your job diagram

1. (Optional) Add a transform node to the job diagram, if needed. Choose Spark SQL for the node type.
2. On the Node properties tab, enter a name for the node in the job diagram. If a node parent is not already selected, or if you want multiple inputs for the SQL transform, choose a node from the Node parents list to use as the input source for the transform. Add additional parent nodes as needed.
3. Choose the Transform tab in the node details panel.
4. The source datasets for the SQL query are identified by the names you specified in the Name field for each node. If you do not want to use these names, or if the names are not suitable for a SQL query, you can associate a name to each dataset. The console provides default aliases, such as MyDataSource.


For example, if a parent node for the SQL transform node is named Rename Org PK field, you might associate the name org_table with this dataset. This alias can then be used in the SQL query in place of the node name.
5. In the text entry field under the heading Code block, paste or enter the SQL query. The text field displays SQL syntax highlighting and keyword suggestions.
6. With the SQL transform node selected, choose the Output schema tab, and then choose Edit. Provide the columns and data types that describe the output fields of the SQL query.

Specify the schema using the following actions in the Output schema section of the page:

• To rename a column, place the cursor in the Key text box for the column (also referred to as a field or property key) and enter the new name.
• To change the data type for a column, select the new data type for the column from the drop-down list.
• To add a new top-level column to the schema, choose the Overflow button, and then choose Add root key. New columns are added at the top of the schema.
• To remove a column from the schema, choose the delete icon to the far right of the Key name.
7. When you finish specifying the output schema, choose Apply to save your changes and exit the schema editor. If you do not want to save your changes, choose Cancel to exit the schema editor.
8. (Optional) After configuring the node properties and transform properties, you can preview the modified dataset by choosing the Data preview tab in the node details panel. The first time you choose this tab for any node in your job, you are prompted to provide an IAM role to access the data. There is a cost associated with using this feature, and billing starts as soon as you provide an IAM role.
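Conceptually, the generated script registers each input DynamicFrame as a temporary view under its alias, runs the query with Spark SQL, and wraps the result back into a DynamicFrame. A simplified sketch, with hypothetical names:

from awsglue.dynamicframe import DynamicFrame

spark = glueContext.spark_session

# Register the parent node's output under the alias used in the query.
parent_frame.toDF().createOrReplaceTempView("org_table")
result_df = spark.sql("SELECT org_id, org_name FROM org_table WHERE active = true")
sql_node = DynamicFrame.fromDF(result_df, glueContext, "sqltransform1")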

Creating a custom transformation

If you need to perform more complicated transformations on your data, or want to add data property keys to the dataset, you can add a Custom code transform to your job diagram. The Custom code node allows you to enter a script that performs the transformation.

When using custom code, you must use a schema editor to indicate the changes made to the output through the custom code. When editing the schema, you can perform the following actions:

• Add or remove data property keys


• Change the data type of data property keys
• Change the name of data property keys
• Restructure a nested property key

You must use a SelectFromCollection transform to choose a single DynamicFrame from the result of your Custom transform node before you can send the output to a target location.

Use the following tasks to add a custom transform node to your job diagram.

Adding a custom code transform node to the job diagram

To add a custom transform node to your job diagram

1. (Optional) Choose Transform in the toolbar at the top of the visual editor, and then choose Custom transform to add a custom transform to your job diagram.
2. On the Node properties tab, enter a name for the node in the job diagram. If a node parent is not already selected, or if you want multiple inputs for the custom transform, then choose a node from the Node parents list to use as the input source for the transform.

Entering code for the custom transform node

You can type or copy code into an input field. The job uses this code to perform the data transformation. You can provide a code snippet in either Python or Scala. The code should take one or more DynamicFrames as input and return a collection of DynamicFrames.

To enter the script for a custom transform node

1. With the custom transform node selected in the job diagram, choose the Transform tab.
2. In the text entry field under the heading Code block, paste or enter the code for the transformation. The code that you use must match the language specified for the job on the Job details tab.

When referring to the input nodes in your code, Amazon Glue Studio names the DynamicFrames returned by the job diagram nodes sequentially based on the order of creation. For example:

• Data source nodes: DataSource0, DataSource1, DataSource2, and so on.
• Transform nodes: Transform0, Transform1, Transform2, and so on.

The following examples show the format of the code to enter in the code box:

Python

The following example takes the first DynamicFrame received, converts it to a DataFrame to apply the native filter method (keeping only records that have over 1000 votes), then converts it back to a DynamicFrame before returning it.

def FilterHighVoteCounts(glueContext, dfc) -> DynamicFrameCollection:
    # Select the first DynamicFrame in the collection and convert it to a DataFrame.
    df = dfc.select(list(dfc.keys())[0]).toDF()
    # Keep only records that have over 1000 votes.
    df_filtered = df.filter(df["vote_count"] > 1000)
    # Convert back to a DynamicFrame and return it as a collection.
    dyf_filtered = DynamicFrame.fromDF(df_filtered, glueContext, "filter_votes")
    return DynamicFrameCollection({"CustomTransform0": dyf_filtered}, glueContext)

Scala

The following example takes the first DynamicFrame received, converts it to a DataFrame to apply the native filter method (keeping only records that have over 1000 votes), then converts it back to a DynamicFrame before returning it.


object FilterHighVoteCounts {
  def execute(glueContext: GlueContext, input: Seq[DynamicFrame]): Seq[DynamicFrame] = {
    // Convert the first DynamicFrame to a DataFrame, filter it, and wrap the result.
    val frame = input(0).toDF()
    val filtered = DynamicFrame(frame.filter(frame("vote_count") > 1000), glueContext)
    Seq(filtered)
  }
}

Editing the schema in a custom transform node

When you use a custom transform node, Amazon Glue Studio cannot automatically infer the output schemas created by the transform. You use the schema editor to describe the schema changes implemented by the custom transform code.

A custom code node can have any number of parent nodes, each providing a DynamicFrame as input for your custom code. A custom code node returns a collection of DynamicFrames. Each DynamicFrame that is used as input has an associated schema. You must add a schema that describes each DynamicFrame returned by the custom code node.

To edit the output schemas for a custom transform node

1. With the custom transform node selected in the job diagram, in the node details panel, choose the Output schema tab.
2. Choose Edit to make changes to the schema.

If you have nested data property keys, such as an array or object, you can choose the Expand-Rows icon on the top right of each schema panel to expand the list of child data property keys. After you choose this icon, it changes to the Collapse-Rows icon, which you can choose to collapse the list of child property keys.
3. Modify the schema using the following actions in the section on the right side of the page:

• To rename a property key, place the cursor in the Key text box for the property key, then enter the new name.
• To change the data type for a property key, use the list to choose the new data type for the property key.
• To add a new top-level property key to the schema, choose the Overflow icon to the left of the Cancel button, and then choose Add root key.
• To add a child property key to the schema, choose the Add-Key icon associated with the parent key. Enter a name for the child key and choose the data type.

• To remove a property key from the schema, choose the Remove icon to the far right of the key name.
4. If your custom transform code uses multiple DynamicFrames, you can add additional output schemas.

• To add a new, empty schema, choose the Overflow icon, and then choose Add output schema.
• To copy an existing schema to a new output schema, make sure the schema you want to copy is displayed in the schema selector. Choose the Overflow icon, and then choose Duplicate.

If you want to remove an output schema, make sure the schema you want to remove is displayed in the schema selector. Choose the Overflow icon, and then choose Delete.


5. Add new root keys to the new schema or edit the duplicated keys.
6. When you finish modifying the output schemas, choose the Apply button to save your changes and exit the schema editor.

If you do not want to save your changes, choose the Cancel button.

Configure the custom transform output

A custom code transform returns a collection of DynamicFrames, even if there is only one DynamicFrame in the result set.

To process the output from a custom transform node

1. Add a SelectFromCollection transform node, which has the custom transform node as its parent node. Update this transform to indicate which dataset you want to use. See Using SelectFromCollection to choose which dataset to keep (p. 30) for more information.
2. Add additional SelectFromCollection transforms to the job diagram if you want to use additional DynamicFrames produced by the custom transform node.

Consider a scenario in which you add a custom transform node to split a flight dataset into multiple datasets, but duplicate some of the identifying property keys in each output schema, such as the flight date or flight number. You add a SelectFromCollection transform node for each output schema, with the custom transform node as its parent.
3. (Optional) You can then use each SelectFromCollection transform node as input for other nodes in the job, or as a parent for a data target node.

Configuring data target nodes

The data target is where the job writes the transformed data.

Overview of data target options

Your data target (also called a data sink) can be:

• S3 – The job writes the data in a file in the Amazon S3 location you choose and in the format you specify.

If you configure partition columns for the data target, then the job writes the dataset to Amazon S3 into directories based on the partition key.
• Amazon Glue Data Catalog – The job uses the information associated with the table in the Data Catalog to write the output data to a target location.

You can create the table manually or with the crawler. You can also use Amazon CloudFormation templates to create tables in the Data Catalog.
• A connector – A connector is a piece of code that facilitates communication between your data store and Amazon Glue. The job uses the connector and associated connection to write the output data to a target location. You can either subscribe to a connector offered in Amazon Web Services Marketplace, or you can create your own custom connector. For more information, see Adding connectors to Amazon Glue Studio (p. 45).

You can choose to update the Data Catalog when your job writes to an Amazon S3 data target. Instead of requiring a crawler to update the Data Catalog when the schema or partitions change, this option makes it easy to keep your tables up to date. This option simplifies the process of making your data available for analytics by optionally adding new tables to the Data Catalog, updating table partitions, and updating the schema of your tables directly from the job.

Editing the data target node

The data target is where the job writes the transformed data.

To add or configure a data target node in your job diagram

1. (Optional) If you need to add a target node, choose Target in the toolbar at the top of the visual editor, and then choose either S3 or Glue Data Catalog.

• If you choose S3 for the target, then the job writes the dataset to one or more files in the Amazon S3 location you specify.
• If you choose Amazon Glue Data Catalog for the target, then the job writes to a location described by the table selected from the Data Catalog.
2. Choose a data target node in the job diagram. When you choose a node, the node details panel appears on the right side of the page.
3. Choose the Node properties tab, and then enter the following information:

• Name: Enter a name to associate with the node in the job diagram.
• Node type: A value should already be selected, but you can change it as needed.
• Node parents: The parent node is the node in the job diagram that provides the output data you want to write to the target location. For a pre-populated job diagram, the target node should already have the parent node selected. If there is no parent node displayed, then choose a parent node from the list.

A target node has a single parent node.
4. Configure the Data target properties information. For more information, see the following sections:

• Using Amazon S3 for the data target (p. 37)
• Using Data Catalog tables for the data target (p. 38)
• Using a connector for the data target (p. 39)
5. (Optional) After configuring the data target node properties, you can view the output schema for your data by choosing the Output schema tab in the node details panel. The first time you choose this tab for any node in your job, you are prompted to provide an IAM role to access the data. If you have not specified an IAM role on the Job details tab, you are prompted to enter an IAM role here.

Using Amazon S3 for the data target

For all data targets except Amazon S3 and connectors, a table must exist in the Amazon Glue Data Catalog for the target type that you choose. Amazon Glue Studio does not create the Data Catalog table.

To configure a data target node that writes to Amazon S3

1. Go to the visual editor for a new or saved job.
2. Choose a data target node in the job diagram.
3. Choose the Data target properties tab, and then enter the following information:

• Format: Choose a format from the list. The available format types for the data results are:
• JSON: JavaScript Object Notation.
• CSV: Comma-separated values.


• Avro: Apache Avro JSON binary.
• Parquet: Apache Parquet columnar storage.
• Glue Parquet: A custom Parquet writer type that is optimized for DynamicFrames as the data format. Instead of requiring a precomputed schema for the data, it computes and modifies the schema dynamically.
• ORC: Apache Optimized Row Columnar (ORC) format.

To learn more about these format options, see Format Options for ETL Inputs and Outputs in Amazon Glue in the Amazon Glue Developer Guide.
• Compression Type: You can choose to optionally compress the data using either the gzip or bzip2 format. The default is no compression, or None.
• S3 Target Location: The Amazon S3 bucket and location for the data output. You can choose the Browse S3 button to see the Amazon S3 buckets that you have access to and choose one as the target destination.
• Data catalog update options
• Do not update the Data Catalog: (Default) Choose this option if you don't want the job to update the Data Catalog, even if the schema changes or new partitions are added.
• Create a table in the Data Catalog and on subsequent runs, update the schema and add new partitions: If you choose this option, the job creates the table in the Data Catalog on the first run of the job. On subsequent job runs, the job updates the Data Catalog table if the schema changes or new partitions are added.

You must also select a database from the Data Catalog and enter a table name.
• Create a table in the Data Catalog and on subsequent runs, keep existing schema and add new partitions: If you choose this option, the job creates the table in the Data Catalog on the first run of the job. On subsequent job runs, the job updates the Data Catalog table only to add new partitions.

You must also select a database from the Data Catalog and enter a table name.
• Partition keys: Choose which columns to use as partitioning keys in the output. To add more partition keys, choose Add a partition key.
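The options above correspond roughly to the following write call in the generated script; the path, partition key, and format shown here are hypothetical choices.

# Write the output as Glue Parquet files, partitioned by year.
glueContext.write_dynamic_frame.from_options(
    frame=final_frame,                          # hypothetical: output of the parent node
    connection_type="s3",
    connection_options={
        "path": "s3://example-bucket/output/",  # hypothetical target location
        "partitionKeys": ["year"],
    },
    format="glueparquet",
    transformation_ctx="datasink1",
)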

Using Data Catalog tables for the data target

For all data sources except Amazon S3 and connectors, a table must exist in the Amazon Glue Data Catalog for the target type that you choose. Amazon Glue Studio does not create the Data Catalog table.

To configure the data properties for a target that uses a Data Catalog table

1. Go to the visual editor for a new or saved job.
2. Choose a data target node in the job diagram.
3. Choose the Data target properties tab, and then enter the following information:

• Database: Choose the database that contains the table you want to use as the target from the list. This database must already exist in the Data Catalog.
• Table: Choose the table that defines the schema of your output data from the list. This table must already exist in the Data Catalog.

A table in the Data Catalog consists of the names of columns, data type definitions, partition information, and other metadata about the target dataset. Your job writes to a location described by this table in the Data Catalog.

For more information about creating tables in the Data Catalog, see Defining Tables in the Data Catalog in the Amazon Glue Developer Guide.


• Data catalog update options
• Do not change table definition: (Default) Choose this option if you don't want the job to update the Data Catalog, even if the schema changes or new partitions are added.
• Update schema and add new partitions: If you choose this option, the job updates the Data Catalog table if the schema changes or new partitions are added.
• Keep existing schema and add new partitions: If you choose this option, the job updates the Data Catalog table only to add new partitions.
• Partition keys: Choose which columns to use as partitioning keys in the output. To add more partition keys, choose Add a partition key.
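In script form, a Data Catalog target corresponds to a write through the catalog table. A minimal sketch with hypothetical names; the additional options shown reflect the catalog update behavior and are an assumption about how the choices above translate to code.

glueContext.write_dynamic_frame.from_catalog(
    frame=final_frame,              # hypothetical: output of the parent node
    database="sales_db",            # hypothetical database name
    table_name="orders_output",     # hypothetical table name
    additional_options={
        "enableUpdateCatalog": True,            # update schema and partitions
        "updateBehavior": "UPDATE_IN_DATABASE",
    },
    transformation_ctx="datasink2",
)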

Using a connector for the data target

If you select a connector for the Node type, follow the instructions at Authoring jobs with custom connectors (p. 48) to finish configuring the data target properties.

Editing or uploading a job script

Use the Amazon Glue Studio visual editor to edit the job script or upload your own script.

You can use the visual editor to edit job nodes only if the jobs were created with Amazon Glue Studio. If the job was created using the Amazon Glue console, through API commands, or with the command line interface (CLI), you can use the script editor in Amazon Glue Studio to edit the job script, parameters, and schedule. You can also edit the script for a job created in Amazon Glue Studio by converting the job to script-only mode.

To edit the job script or upload your own script

1. If creating a new job, on the Jobs page, choose the Spark script editor option to create a Spark job or choose the Python Shell script editor to create a Python shell job. You can either write a new script, or upload an existing script. If you choose Spark script editor, you can write or upload either a Scala or Python script. If you choose Python Shell script editor, you can only write or upload a Python script.

After choosing the option to create a new job, in the Options section that appears, you can choose to either start with a starter script (Create a new script with boilerplate code), or you can upload a local file to use as the job script.

If you chose Spark script editor, you can upload either Python or Scala script files. Scala scripts must have the file extension .scala. Python scripts must be recognized as files of type Python. If you chose Python Shell script editor, you can upload only Python script files.

When you are finished making your choices, choose Create to create the job and open the visual editor.
2. Go to the visual job editor for the new or saved job, and then choose the Script tab.
3. If you didn't create a new job using one of the script editor options, and you have never edited the script for an existing job, the Script tab displays the heading Script (Locked). This means the script editor is in read-only mode. Choose Edit script to unlock the script for editing.

To make the script editable, Amazon Glue Studio converts your job from a visual job to a script-only job. If you unlock the script for editing, you can't use the visual editor anymore for this job after you save it.

In the confirmation window, choose Confirm to continue or Cancel to keep the job available for visual editing.


If you choose Confirm, the Visual tab no longer appears in the editor. You can use Amazon Glue Studio to modify the script using the script editor, modify the job details or schedule, or view job runs.
Note
Until you save the job, the conversion to a script-only job is not permanent. If you refresh the console web page, or close the job before saving it and reopen it in the visual editor, you will still be able to edit the individual nodes in the visual editor.
4. Edit the script as needed.

When you are done editing the script, choose Save to save the job and permanently convert the job from visual to script-only.
5. (Optional) You can download the script from the Amazon Glue Studio console by choosing the Download button on the Script tab. When you choose this button, a new browser window opens, displaying the script from its location in Amazon S3. The Script filename and Script path parameters in the Job details tab of the job determine the name and location of the script file in Amazon S3.

When you save the job, Amazon Glue saves the job script at the location specified by these fields. If you modify the script file at this location within Amazon S3, Amazon Glue Studio will load the modified script the next time you edit the job.

Creating and editing Scala scripts in Amazon Glue Studio

When you choose the script editor for creating a job, by default, the job programming language is set to Python 3. If you choose to write a new script instead of uploading a script, Amazon Glue Studio starts a new script with boilerplate text written in Python. If you want to write a Scala script instead, you must first configure the script editor to use Scala.


Note
If you choose Scala as the programming language for the job and use the visual editor to design your job, the generated job script is written in Scala, and no further actions are needed.

To write a new Scala script in Amazon Glue Studio

1. Create a new job by choosing the Spark script editor option.
2. Under Options, choose Create a new script with boilerplate code.
3. Choose the Job details tab and set Language to Scala (instead of Python 3).
Note
The Type property for the job is automatically set to Spark when you choose the Spark script editor option to create a job.
4. Choose the Script tab.
5. Remove the Python boilerplate text. You can replace it with the following Scala boilerplate text.

import com.amazonaws.services.glue.{DynamicRecord, GlueContext}
import org.apache.spark.SparkContext
import com.amazonaws.services.glue.util.JsonOptions
import com.amazonaws.services.glue.util.GlueArgParser
import com.amazonaws.services.glue.util.Job

object MyScript {
  def main(args: Array[String]): Unit = {
    val sc: SparkContext = new SparkContext()
    val glueContext: GlueContext = new GlueContext(sc)
  }
}

6. Write your Scala job script in the editor. Add additional import statements as needed.

Creating and editing Python shell jobs in Amazon Glue Studio

When you choose the Python shell script editor for creating a job, you can upload an existing Python script, or write a new one. If you choose to write a new script, boilerplate code is added to the new Python job script.

To create a new Python shell job

Refer to the instructions at Start the job creation process (p. 15).

The job properties that are supported for Python shell jobs are not the same as those supported for Spark jobs. The following list describes the changes to the available job parameters for Python shell jobs on the Job details tab.

• The Type property for the job is automatically set to Python Shell and can't be changed.
• Instead of Language, there is a Python version property for the job. Currently, Python shell jobs created in Amazon Glue Studio use Python 3.6.
• The Glue version property is not available, because it does not apply to Python shell jobs.
• Instead of Worker type and Number of workers, a Data processing units property is shown. This job property determines how many data processing units (DPUs) are consumed by the Python shell when running the job.
• The Job bookmark property is not available, because it is not supported for Python shell jobs.


• Under Advanced properties, the following properties are not available for Python shell jobs:
  • Job metrics
  • Continuous logging
  • Spark UI and Spark UI logs path
  • Dependent jars path, under the heading Libraries
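Apart from these property differences, a Python shell job script is ordinary Python that runs without Spark. As a rough illustration, a minimal script might look like the following sketch; the bucket name is a placeholder, and calling Amazon S3 with boto3 is just one example of what such a script can do:

# A minimal sketch of a Python shell job script: plain Python, no Spark or
# GlueContext required. The bucket name below is a placeholder.
import boto3

s3 = boto3.client("s3")
response = s3.list_objects_v2(Bucket="my-example-bucket")
for obj in response.get("Contents", []):
    print(obj["Key"])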

Adding nodes to the job diagram

You can add additional data sources, transforms, and data targets to your job to support more complex ETL actions.

To add nodes to a job diagram

1. Go to the visual editor for a new or saved job and choose the Visual tab.
2. Use the toolbar buttons to add a node of a specific type: Source, Transform, or Target.

3. Edit the node, as described in the following sections:

• For a source node, see Editing the data source node (p. 19).
• For a transform node, see Editing the data transform node (p. 23).
• For a data target node, see Editing the data target node (p. 37).

4. If you're inserting a node in between two nodes in the job diagram, then perform the following actions:

a. Choose the node that will be the parent for the new node.
b. Choose one of the toolbar buttons to add a new node to the job diagram. The new node is added as a child of the currently selected node.
c. Choose the node that will be the child of the newly added node and change its parent node to point to the newly added node.

If you added a node by mistake, you can use the Undo button on the toolbar to reverse the action.

Changing the parent nodes for a node in the job diagram

You can change a node's parents to move nodes within the job diagram or to change a data source for a node.

To change the parent node

1. Choose the node in the job diagram that you want to modify.


2. In the node details panel, on the Node properties tab, under the heading Node parents, remove the current parent for the node.
3. Choose a new parent node from the list.
4. Modify the other properties of the node as needed to match the newly selected parent node.

If you modified a node by mistake, you can use the Undo button on the toolbar to reverse the action.

Deleting nodes from the job diagram

You can remove nodes from the job diagram.

To remove a node

1. Go to the visual editor for a new or saved job and choose the Visual tab.
2. Choose the node you want to remove.
3. In the toolbar at the top of the visual editing pane, choose the Remove button.
4. If the node you removed had child nodes, modify the parents for those nodes as needed.

If you removed a node by mistake, you can use the Undo button on the toolbar to reverse the action.


Using connectors and connections with Amazon Glue Studio

Amazon Glue provides built-in support for the most commonly used data stores (such as Amazon Redshift, Amazon Aurora, Microsoft SQL Server, MySQL, MongoDB, and PostgreSQL) using JDBC connections. Amazon Glue also allows you to use custom JDBC drivers in your extract, transform, and load (ETL) jobs. For data stores that are not natively supported, such as SaaS applications, you can use connectors.

A connector is an optional code package that assists with accessing data stores in Amazon Glue Studio. You can subscribe to several connectors offered in Amazon Web Services Marketplace.

When creating ETL jobs, you can use a natively supported data store, a connector from Amazon Web Services Marketplace, or your own custom connectors. If you use a connector, you must first create a connection for the connector. A connection contains the properties that are required to connect to a particular data store. You use the connection with your data sources and data targets in the ETL job. Connectors and connections work together to facilitate access to the data stores.

Topics
• Overview of using connectors and connections (p. 44)
• Adding connectors to Amazon Glue Studio (p. 45)
• Creating connections for connectors (p. 48)
• Authoring jobs with custom connectors (p. 48)
• Managing connectors and connections (p. 53)
• Developing custom connectors (p. 55)
• Restrictions for using connectors and connections in Amazon Glue Studio (p. 57)

Overview of using connectors and connections

A connection contains the properties that are required to connect to a particular data store. When you create a connection, it is stored in the Amazon Glue Data Catalog. You choose a connector, and then create a connection based on that connector.

You can subscribe to connectors for non-natively supported data stores in Amazon Web Services Marketplace, and then use those connectors when you're creating connections. Developers can also create their own connectors, and you can use them when creating connections.

Note
Connections created using the Amazon Glue console do not appear in Amazon Glue Studio. Connections created using custom or Amazon Web Services Marketplace connectors in Amazon Glue Studio appear in the Amazon Glue console with type set to UNKNOWN.

The following steps describe the overall process of using connectors in Amazon Glue Studio:

1. Subscribe to a connector in Amazon Web Services Marketplace, or develop your own connector and upload it to Amazon Glue Studio. For more information, see Adding connectors to Amazon Glue Studio (p. 45).
2. Review the connector usage information. You can find this information on the Usage tab on the connector product page. For example, if you click the Usage tab on the product page for Amazon Glue Connector for Google BigQuery, you can see in the Additional Resources section a link to a blog about using this connector. Other connectors might contain links to the instructions in the Overview section, as shown on the connector product page for Cloudwatch Logs connector for Amazon Glue.
3. Create a connection. You choose which connector to use and provide additional information for the connection, such as login credentials, URI strings, and virtual private cloud (VPC) information. For more information, see Creating connections for connectors (p. 48).
4. Create an IAM role for your job. The job assumes the permissions of the IAM role that you specify when you create it. This IAM role must have the necessary permissions to authenticate with, extract data from, and write data to your data stores. For more information, see Job-related permissions (p. 7) and Additional permissions when using connectors (p. 8).
5. Create an ETL job and configure the data source properties for your ETL job. Provide the connection options and authentication information as instructed by the custom connector provider. For more information, see Authoring jobs with custom connectors (p. 48).
6. Customize your ETL job by adding transforms or additional data stores, as described in Editing ETL jobs in Amazon Glue Studio (p. 17).
7. If using a connector for the data target, configure the data target properties for your ETL job. Provide the connection options and authentication information as instructed by the custom connector provider. For more information, see the section called “Authoring jobs with custom connectors” (p. 48).
8. Customize the job run environment by configuring job properties, as described in Modify the job properties (p. 67).
9. Run the job.

Adding connectors to Amazon Glue Studio

A connector is a piece of code that facilitates communication between your data store and Amazon Glue. You can either subscribe to a connector offered in Amazon Web Services Marketplace, or you can create your own custom connector.

Topics
• Subscribing to Amazon Web Services Marketplace connectors (p. 45)
• Creating custom connectors (p. 46)

Subscribing to Amazon Web Services Marketplace connectors

Amazon Glue Studio makes it easy to add connectors from Amazon Web Services Marketplace.

To add a connector from Amazon Web Services Marketplace to Amazon Glue Studio

1. In the Amazon Glue Studio console, choose Connectors in the console navigation pane.
2. On the Connectors page, choose Go to Amazon Web Services Marketplace.
3. In Amazon Web Services Marketplace, in Featured products, choose the connector you want to use. You can choose one of the featured connectors, or use search. You can search on the name or type of connector, and you can use options to refine the search results.

If you want to use one of the featured connectors, choose View product. If you used search to locate a connector, then choose the name of the connector.

4. On the product page for the connector, use the tabs to view information about the connector. If you decide to purchase this connector, choose Continue to Subscribe.


5. Provide the payment information, and then choose Continue to Configure.
6. On the Configure this software page, choose the method of deployment and the version of the connector to use. Then choose Continue to Launch.
7. On the Launch this software page, you can review the Usage Instructions provided by the connector provider. When you're ready to continue, choose Activate connection in Amazon Glue Studio.

After a short time, the console displays the Create marketplace connection page in Amazon Glue Studio.

8. Create a connection that uses this connector, as described in Creating connections for connectors (p. 48).

Alternatively, you can choose Activate connector only to skip creating a connection at this time. You must create a connection at a later date before you can use the connector.

Creating custom connectors

You can also build your own connector and then upload the connector code to Amazon Glue Studio.

Custom connectors are integrated into Amazon Glue Studio through the Amazon Glue Spark runtime API. The Amazon Glue Spark runtime allows you to plug in any connector that is compliant with the Spark, Athena, or JDBC interface. It allows you to pass in any connection option that is available with the custom connector.

You can encapsulate all your connection properties with Amazon Glue Connections and supply the connection name to your ETL job. Integration with Data Catalog connections allows you to use the same connection properties across multiple calls in a single Spark application or across different applications.

You can specify additional options for the connection. The job script that Amazon Glue Studio generates contains a Datasource entry that uses the connection to plug in your connector with the specified connection options. For example:

Datasource = glueContext.create_dynamic_frame.from_options(
    connection_type = "custom.jdbc",
    connection_options = {"dbTable":"Account", "connectionName":"my-custom-jdbc-connection"},
    transformation_ctx = "DataSource0")

To add a custom connector to Amazon Glue Studio

1. Create the code for your custom connector. For more information, see Developing custom connectors (p. 55).
2. Add support for Amazon Glue features to your connector. Here are some examples of these features and how they are used within the job script generated by Amazon Glue Studio:

• Data type mapping – Your connector can typecast the columns while reading them from the underlying data store. For example, a dataTypeMapping of {"INTEGER":"STRING"} converts all columns of type Integer to columns of type String when parsing the records and constructing the DynamicFrame. This helps users to cast columns to types of their choice.

DataSource0 = glueContext.create_dynamic_frame.from_options(
    connection_type = "custom.jdbc",
    connection_options = {"dataTypeMapping":{"INTEGER":"STRING"},
        "connectionName":"test-connection-jdbc"},
    transformation_ctx = "DataSource0")

• Partitioning for parallel reads – Amazon Glue allows parallel data reads from the data store by partitioning the data on a column. You must specify the partition column, the lower partition bound, the upper partition bound, and the number of partitions. This feature enables you to make use of data parallelism and multiple Spark executors allocated for the Spark application.


DataSource0 = glueContext.create_dynamic_frame.from_options(
    connection_type = "custom.jdbc",
    connection_options = {"upperBound":"200", "numPartitions":"4", "partitionColumn":"id",
        "lowerBound":"0", "connectionName":"test-connection-jdbc"},
    transformation_ctx = "DataSource0")

• Use Amazon Secrets Manager for storing credentials – The Data Catalog connection can also contain a secretId for a secret stored in Amazon Secrets Manager. The Amazon secret can securely store authentication and credentials information and provide it to your ETL job at runtime. Alternatively, you can specify the secretId from the Spark script as follows:

DataSource = glueContext.create_dynamic_frame.from_options(
    connection_type = "custom.jdbc",
    connection_options = {"connectionName":"test-connection-jdbc", "secretId":"my-secret-id"},
    transformation_ctx = "DataSource0")

• Filtering the source data with row predicates and column projections – The Amazon Glue Spark runtime also allows users to push down SQL queries to filter data at the source with row predicates and column projections. This allows your ETL job to load filtered data faster from data stores that support push-downs. An example SQL query pushed down to a JDBC data source is: SELECT id, name, department FROM department WHERE id < 200.

DataSource = glueContext.create_dynamic_frame.from_options(
    connection_type = "custom.jdbc",
    connection_options = {"query":"SELECT id, name, department FROM department WHERE id < 200",
        "connectionName":"test-connection-jdbc"},
    transformation_ctx = "DataSource0")

• Job bookmarks – Amazon Glue supports incremental loading of data from JDBC sources. Amazon Glue keeps track of the last processed record from the data store, and processes new data records in the subsequent ETL job runs. Job bookmarks use the primary key as the default column for the bookmark key, provided that this column increases or decreases sequentially. For more information about job bookmarks, see Job Bookmarks in the Amazon Glue Developer Guide.

DataSource0 = glueContext.create_dynamic_frame.from_options(
    connection_type = "custom.jdbc",
    connection_options = {"jobBookmarkKeys":["empno"], "jobBookmarkKeysSortOrder":"asc",
        "connectionName":"test-connection-jdbc"},
    transformation_ctx = "DataSource0")

3. Package the custom connector as a JAR file and upload the file to Amazon S3.
4. Test your custom connector. For more information, see the instructions on GitHub at Glue Custom Connectors: Local Validation Tests Guide.
5. In the Amazon Glue Studio console, choose Connectors in the console navigation pane.
6. On the Connectors page, choose Create custom connector.
7. On the Create custom connector page, enter the following information:

• The path to the location of the custom code JAR file in Amazon S3.
• A name for the connector that will be used by Amazon Glue Studio.
• Your connector type, which can be one of JDBC, Spark, or Athena.
• The name of the entry point within your custom code that Amazon Glue Studio calls to use the connector.
  • For JDBC connectors, this field should be the class name of your JDBC driver.
  • For Spark connectors, this field should be the fully qualified data source class name, or its alias, that you use when loading the Spark data source with the format operator.
• (JDBC only) The base URL used by the JDBC connection for the data store.
• (Optional) A description of the custom connector.

8. Choose Create connector.


9. From the Connectors page, create a connection that uses this connector, as described in Creating connections for connectors (p. 48).

Creating connections for connectors

An Amazon Glue connection is a Data Catalog object that stores connection information for a particular data store. Connections store login credentials, URI strings, virtual private cloud (VPC) information, and more. Creating connections in the Data Catalog saves the effort of having to specify all connection details every time you create a job.

Note
Connections created using the Amazon Glue console do not appear in Amazon Glue Studio.

To create a connection for a connector

1. In the Amazon Glue Studio console, choose Connectors in the console navigation pane.
2. Choose the connector you want to create a connection for, and then choose Create connection.
3. On the Create connection page, enter a name for your connection, and optionally a description.
4. Enter the connection details. Depending on the type of connector you selected, you're prompted to enter additional information:

• Enter the requested authentication information, such as a user name and password, or choose an Amazon secret.
• For connectors that use JDBC, enter the information required to create the JDBC URL for the data store.
• If you use a virtual private cloud (VPC), then enter the network information for your VPC.

5. Choose Create connection.

You are returned to the Connectors page, and the informational banner indicates the connection that was created. You can now use the connection in your Amazon Glue Studio jobs, as described in the section called “Create jobs that use a connector” (p. 16).

Authoring jobs with custom connectors

You can use connectors and connections for both data source nodes and data target nodes in Amazon Glue Studio.

Topics
• Create jobs that use a connector for the data source (p. 48)
• Configure source properties for nodes that use connectors (p. 49)
• Configure target properties for nodes that use connectors (p. 52)

Create jobs that use a connector for the data source

When you create a new job, you can choose a connector for the data source and data targets.

To create a job that uses connectors for the data source or data target

1. Sign in to the Amazon Web Services Management Console and open the Amazon Glue Studio console at https://console.amazonaws.cn/gluestudio/.


2. On the Connectors page, in the Your connections resource list, choose the connection you want to use in your job, and then choose Create job.

Alternatively, on the Amazon Glue Studio Jobs page, under Create job, choose Source and target added to the graph. In the Source drop-down list, choose the custom connector that you want to use in your job. You can also choose a connector for Target.

3. Choose Create to open the visual job editor.
4. Configure the data source node, as described in Configure source properties for nodes that use connectors (p. 49).
5. Continue creating your ETL job by adding transforms, additional data stores, and data targets, as described in Editing ETL jobs in Amazon Glue Studio (p. 17).
6. Customize the job run environment by configuring job properties as described in Modify the job properties (p. 67).
7. Save and run the job.

Configure source properties for nodes that use connectors

After you create a job that uses a connector for the data source, the visual job editor displays a job graph with a data source node configured for the connector. You must configure the data source properties for that node.

To configure the properties for a data source node that uses a connector

1. Choose the connector data source node in the job graph, or add a new node and choose the connector for the Node type. Then, on the right side, in the node details panel, choose the Data source properties tab, if it's not already selected.


2. In the Data source properties tab, choose the connection that you want to use for this job.

Enter the additional information required for each connection type:

JDBC

• Data source input type: Choose to provide either a table name or a SQL query as the data source. Depending on your choice, you then need to provide the following additional information:
  • Table name: The name of the table in the data source. If the data source does not use the term table, then supply the name of an appropriate data structure, as indicated by the custom connector usage information (which is available in Amazon Web Services Marketplace).
  • Filter predicate: A condition clause to use when reading the data source, similar to a WHERE clause, which is used to retrieve a subset of the data.
  • Query code: Enter a SQL query to use to retrieve a specific dataset from the data source. An example of a basic SQL query is:

SELECT column_list FROM table_name WHERE where_clause

• Schema: Because Amazon Glue Studio is using information stored in the connection to access the data source instead of retrieving metadata information from a Data Catalog table, you must provide the schema metadata for the data source. Choose Add schema to open the schema editor.

For instructions on how to use the schema editor, see Editing the schema in a custom transform node (p. 35).

• Partition column: (Optional) You can choose to partition the data reads by providing values for Partition column, Lower bound, Upper bound, and Number of partitions.

The lowerBound and upperBound values are used to decide the partition stride, not for filtering the rows in the table. All rows in the table are partitioned and returned.


Note
Column partitioning adds an extra partitioning condition to the query used to read the data. When using a query instead of a table name, you should validate that the query works with the specified partitioning condition. For example:
• If your query format is "SELECT col1 FROM table1", then test the query by appending a WHERE clause at the end of the query that uses the partition column.
• If your query format is "SELECT col1 FROM table1 WHERE col2=val", then test the query by extending the WHERE clause with AND and an expression that uses the partition column.

• Data type casting: If the data source uses data types that are not available in JDBC, use this section to specify how a data type from the data source should be converted into JDBC data types. You can specify up to 50 different data type conversions. All columns in the data source that use the same data type are converted in the same way.

For example, if you have three columns in the data source that use the Float data type, and you indicate that the Float data type should be converted to the JDBC String data type, then all three columns that use the Float data type are converted to String data types.

• Job bookmark keys: Job bookmarks help Amazon Glue maintain state information and prevent the reprocessing of old data. Specify one or more columns as bookmark keys. Amazon Glue Studio uses bookmark keys to track data that has already been processed during a previous run of the ETL job. Any columns you use for custom bookmark keys must be strictly monotonically increasing or decreasing, but gaps are permitted.

If you enter multiple bookmark keys, they're combined to form a single compound key. A compound job bookmark key should not contain duplicate columns. If you don't specify bookmark keys, Amazon Glue Studio by default uses the primary key as the bookmark key, provided that the primary key is sequentially increasing or decreasing (with no gaps). If the table doesn't have a primary key, but the job bookmark property is enabled, you must provide custom job bookmark keys. Otherwise, the search for primary keys to use as the default will fail and the job run will fail.

• Job bookmark keys sorting order: Choose whether the key values are sequentially increasing or decreasing.

Spark

• Schema: Because Amazon Glue Studio is using information stored in the connection to access the data source instead of retrieving metadata information from a Data Catalog table, you must provide the schema metadata for the data source. Choose Add schema to open the schema editor.

For instructions on how to use the schema editor, see Editing the schema in a custom transform node (p. 35).

• Connection options: Enter additional key-value pairs as needed to provide additional connection information or options. For example, you might enter a database name, table name, a user name, and password.

For example, for OpenSearch, you enter the following key-value pairs, as described in Tutorial: Using the open-source Elasticsearch Spark Connector (p. 58):
• es.net.http.auth.user : username
• es.net.http.auth.pass : password
• es.nodes : https://
• es.port : 443
• path:
• es.nodes.wan.only : true
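To make this concrete, the following is a rough sketch of the from_options call that might be generated for a Spark connector data source. The connection name and the es.* values are assumptions for illustration, not output from the console:

# A sketch only: reading through a custom Spark connector connection.
# The connection name and endpoint values below are assumptions.
DataSource0 = glueContext.create_dynamic_frame.from_options(
    connection_type = "custom.spark",
    connection_options = {"connectionName":"my-es-connection",
        "es.nodes":"https://search-domain.example.com", "es.port":"443",
        "path":"test", "es.nodes.wan.only":"true"},
    transformation_ctx = "DataSource0")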

For an example of the minimum connection options to use, see the sample test script MinimalSparkConnectorTest.scala on GitHub, which shows the connection options you would normally provide in a connection.

Athena

• Table name: The name of the table in the data source. If you're using a connector for reading from Athena-CloudWatch logs, you would enter the table name all_log_streams.
• Athena schema name: Choose the schema in your Athena data source that corresponds to the database that contains the table. If you're using a connector for reading from Athena-CloudWatch logs, you would enter a schema name similar to /aws/glue/name.
• Schema: Because Amazon Glue Studio is using information stored in the connection to access the data source instead of retrieving metadata information from a Data Catalog table, you must provide the schema metadata for the data source. Choose Add schema to open the schema editor. For instructions on how to use the schema editor, see Editing the schema in a custom transform node (p. 35).
• Additional connection options: Enter additional key-value pairs as needed to provide additional connection information or options.
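As a rough sketch, a generated read for an Athena connector follows the same pattern, here using the Athena-CloudWatch example values from the list above (the connection name is hypothetical):

# A sketch only: reading Athena-CloudWatch logs through a custom Athena
# connector. The connection name is an assumption for illustration.
DataSource0 = glueContext.create_dynamic_frame.from_options(
    connection_type = "custom.athena",
    connection_options = {"connectionName":"my-athena-cloudwatch-connection",
        "tableName":"all_log_streams", "schemaName":"/aws/glue/name"},
    transformation_ctx = "DataSource0")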

For an example, see the README.md file at aws-glue-samples/tree/master/GlueCustomConnectors/development/Athena. In the steps in this document, the sample code shows the minimal required connection options, which are tableName, schemaName, and className. The code example specifies these options as part of the optionsMap variable, but you can specify them for your connection and then use the connection.

3. (Optional) After providing the required information, you can view the resulting data schema for your data source by choosing the Output schema tab in the node details panel. The schema displayed on this tab is used by any child nodes that you add to the job graph.

Configure target properties for nodes that use connectors

If you use a connector for the data target type, you must configure the properties of the data target node.

To configure the properties for a data target node that uses a connector

1. Choose the connector data target node in the job graph. Then, on the right side, in the node details panel, choose the Data target properties tab, if it's not already selected.
2. In the Data target properties tab, choose the connection to use for writing to the target. Enter the additional information required for each connection type:

JDBC

• Connection: Choose the connection to use with your connector. For information about how to create a connection, see Creating connections for connectors (p. 48).
• Table name: The name of the table in the data target. If the data target does not use the term table, then supply the name of an appropriate data structure, as indicated by the custom connector usage information (which is available in Amazon Web Services Marketplace).
• Batch size (Optional): Enter the number of rows or records to insert in the target table in a single operation. The default value is 1000 rows.
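For orientation, writing through a custom JDBC connector mirrors the read examples shown earlier, using write_dynamic_frame instead of create_dynamic_frame. The following is a sketch under assumed names; check your connector's usage information for the exact option keys it supports:

# A sketch only: writing a DynamicFrame through a custom JDBC connector.
# The table and connection names are assumptions for illustration.
DataTarget0 = glueContext.write_dynamic_frame.from_options(
    frame = Transform0,
    connection_type = "custom.jdbc",
    connection_options = {"dbTable":"Account", "connectionName":"my-custom-jdbc-connection"},
    transformation_ctx = "DataTarget0")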


Spark

• Connection: Choose the connection to use with your connector. If you did not create a connection previously, choose Create connection to create one. For information about how to create a connection, see Creating connections for connectors (p. 48).
• Connection options: Enter additional key-value pairs as needed to provide additional connection information or options. You might enter a database name, table name, a user name, and password.

For example, for OpenSearch, you enter the following key-value pairs, as described in Tutorial: Using the open-source Elasticsearch Spark Connector (p. 58):
• es.net.http.auth.user : username
• es.net.http.auth.pass : password
• es.nodes : https://
• es.port : 443
• path:
• es.nodes.wan.only : true

For an example of the minimum connection options to use, see the sample test script MinimalSparkConnectorTest.scala on GitHub, which shows the connection options you would normally provide in a connection.

3. After providing the required information, you can view the resulting data schema for your data target by choosing the Output schema tab in the node details panel.

Managing connectors and connections

You use the Connectors page in Amazon Glue Studio to manage your connectors and connections.

Topics
• Viewing connector and connection details (p. 53)
• Editing connectors and connections (p. 54)
• Deleting connectors and connections (p. 54)
• Cancel a subscription for a connector (p. 54)

Viewing connector and connection details

You can view summary information about your connectors and connections in the Your connectors and Your connections resource tables on the Connectors page. To view detailed information, perform the following steps.

Note
Connections created using the Amazon Glue console do not appear in Amazon Glue Studio.

To view connector or connection details

1. In the Amazon Glue Studio console, choose Connectors in the console navigation pane.
2. Choose the connector or connection that you want to view detailed information for.
3. Choose Actions, and then choose View details to open the detail page for that connector or connection.


4. On the detail page, you can choose to Edit or Delete the connector or connection.

• For connectors, you can choose Create connection to create a new connection that uses the connector.
• For connections, you can choose Create job to create a job that uses the connection.

Editing connectors and connections

You use the Connectors page to change the information stored in your connectors and connections.

To modify a connector or connection

1. In the Amazon Glue Studio console, choose Connectors in the console navigation pane.
2. Choose the connector or connection that you want to change.
3. Choose Actions, and then choose Edit.

You can also choose View details, and on the connector or connection detail page, choose Edit.

4. On the Edit connector or Edit connection page, update the information, and then choose Save.

Deleting connectors and connections

You use the Connectors page to delete connectors and connections. If you delete a connector, any connections that were created for that connector are also deleted.

To remove connectors from Amazon Glue Studio

1. In the Amazon Glue Studio console, choose Connectors in the console navigation pane.
2. Choose the connector or connection you want to delete.
3. Choose Actions, and then choose Delete.

You can also choose View details, and on the connector or connection detail page, choose Delete.

4. Verify that you want to remove the connector or connection by entering Delete, and then choose Delete.

When deleting a connector, any connections that were created for that connector are also deleted.

Any jobs that use a deleted connection will no longer work. You can either edit the jobs to use a different data store, or remove the jobs. For information about how to delete a job, see Delete jobs (p. 71).

If you delete a connector, this doesn't cancel the subscription for the connector in Amazon Web Services Marketplace. To remove a subscription for a deleted connector, follow the instructions in Cancel a subscription for a connector (p. 54).

Cancel a subscription for a connector

After you delete the connections and connector from Amazon Glue Studio, you can cancel your subscription in Amazon Web Services Marketplace if you no longer need the connector.

Note
If you cancel your subscription to a connector, this does not remove the connector or connection from your account. Any jobs that use the connector and related connections will no longer be able to use the connector and will fail.


Before you unsubscribe or re-subscribe to a connector from Amazon Web Services Marketplace, you should delete existing connections and connectors associated with that Amazon Web Services Marketplace product.

To unsubscribe from a connector in Amazon Web Services Marketplace

1. Sign in to the Amazon Web Services Marketplace console at https://console.amazonaws.cn/marketplace.
2. Choose Manage subscriptions.
3. On the Manage subscriptions page, choose Manage next to the connector subscription that you want to cancel.
4. Choose Actions and then choose Cancel subscription.
5. Select the check box to acknowledge that running instances are charged to your account, and then choose Yes, cancel subscription.

Developing custom connectors

You can write the code that reads data from or writes data to your data store and formats the data for use with Amazon Glue Studio jobs. You can create connectors for Spark, Athena, and JDBC data stores. Sample code posted on GitHub provides an overview of the basic interfaces you need to implement.

You will need a local development environment for creating your connector code. You can use any IDE or even just a command line editor to write your connector. Examples of development environments include:

• A local Scala environment with a local Amazon Glue ETL Maven library, as described in Developing Locally with Scala in the Amazon Glue Developer Guide.
• IntelliJ IDE, by downloading the IDE from https://www.jetbrains.com/idea/.

Topics
• Developing Spark connectors (p. 55)
• Developing Athena connectors (p. 55)
• Developing JDBC connectors (p. 56)
• Examples of using custom connectors with Amazon Glue Studio (p. 56)
• Developing Amazon Glue connectors for Amazon Web Services Marketplace (p. 56)

Developing Spark connectors

You can create a Spark connector with Spark DataSource API V2 (Spark 2.4) to read data.

To create a custom Spark connector

Follow the steps in the Amazon Glue GitHub sample library for developing Spark connectors, which is located at aws-glue-samples/tree/master/GlueCustomConnectors/development/Spark/README.md.

Developing Athena connectors

You can create an Athena connector to be used by Amazon Glue and Amazon Glue Studio to query a custom data source.


To create a custom Athena connector

Follow the steps in the Amazon Glue GitHub sample library for developing Athena connectors, which is located at aws-glue-samples/tree/master/GlueCustomConnectors/development/Athena.

Developing JDBC connectors

You can create a connector that uses JDBC to access your data stores.

To create a custom JDBC connector

1. Install the Amazon Glue Spark runtime libraries in your local development environment. Refer to the instructions in the Amazon Glue GitHub sample library at aws-glue-samples/tree/master/GlueCustomConnectors/development/GlueSparkRuntime/README.md.
2. Implement the JDBC driver that is responsible for retrieving the data from the data source. Refer to the Java Documentation for Java SE 8.

Create an entry point within your code that Amazon Glue Studio uses to locate your connector. The Class name field should be the full path of your JDBC driver.

3. Use the GlueContext API to read data with the connector. Users can add more input options in the Amazon Glue Studio console to configure the connection to the data source, if necessary. For a code example that shows how to read from and write to a JDBC database with a custom JDBC connector, see Custom and Amazon Web Services Marketplace connectionType values.

Examples of using custom connectors with Amazon Glue Studio

You can refer to the following blogs for examples of using custom connectors:

• Developing, testing, and deploying custom connectors for your data stores with Amazon Glue
• Apache Hudi: Writing to Apache Hudi tables using Amazon Glue Custom Connector
• Google BigQuery: Migrating data from Google BigQuery to Amazon S3 using Amazon Glue custom connectors
• Snowflake (JDBC): Performing data transformations using Snowflake and Amazon Glue
• SingleStore: Building fast ETL using SingleStore and Amazon Glue
• Salesforce: Ingest Salesforce data into Amazon S3 using the CData JDBC custom connector with Amazon Glue
• MongoDB: Building Amazon Glue Spark ETL jobs using Amazon DocumentDB (with MongoDB compatibility) and MongoDB
• Amazon Relational Database Service (Amazon RDS): Building Amazon Glue Spark ETL jobs by bringing your own JDBC drivers for Amazon RDS
• MySQL (JDBC): https://github.com/aws-samples/aws-glue-samples/blob/master/GlueCustomConnectors/development/Spark/SparkConnectorMySQL.scala

Developing Amazon Glue connectors for Amazon Web Services Marketplace

As an Amazon partner, you can create custom connectors and upload them to Amazon Web Services Marketplace to sell to Amazon Glue customers.


The process for developing the connector code is the same as for custom connectors, but the process of uploading and verifying the connector code is more detailed. Refer to the instructions in Creating Connectors for Amazon Web Services Marketplace on the GitHub website.

Restrictions for using connectors and connections in Amazon Glue Studio

When you're using custom connectors or connectors from Amazon Web Services Marketplace, take note of the following restrictions:

• The testConnection API isn't supported with connections created for custom connectors.
• Data Catalog connection password encryption isn't supported with custom connectors.
• You can't use job bookmarks if you specify a filter predicate for a data source node that uses a JDBC connector.
• Currently, you can't use custom connectors with data previews.


Tutorial: Using the open-source Elasticsearch Spark Connector

Elasticsearch is a popular open-source search and analytics engine for use cases such as log analytics, real-time application monitoring, and clickstream analysis. You can use OpenSearch as a data store for your extract, transform, and load (ETL) jobs by configuring the Elasticsearch Spark Connector in Amazon Glue Studio. This connector is available for free from Amazon Web Services Marketplace.

In this tutorial, we will show how to connect to your Amazon OpenSearch Service nodes with a minimal number of steps.

Topics
• Prerequisites (p. 58)
• Step 1: (Optional) Create an Amazon secret for your OpenSearch cluster information (p. 58)
• Step 2: Subscribe to the connector (p. 59)
• Step 3: Activate the connector in Amazon Glue Studio and create a connection (p. 60)
• Step 4: Configure an IAM role for your ETL job (p. 60)
• Step 5: Create a job that uses the OpenSearch connection (p. 61)
• Step 6: Run the job (p. 63)

Prerequisites

To use this tutorial, you must have the following:

• Access to Amazon Glue Studio
• Access to an OpenSearch cluster in the Amazon Cloud
• Configured access to the Amazon VPC that contains your data store, as described in Configuring a VPC for your ETL job (p. 9)
• Configured permissions according to Job-related permissions (p. 7)
• (Optional) Access to Amazon Secrets Manager

Step 1: (Optional) Create an Amazon secret for your OpenSearch cluster information

To safely store and use your connection credential, save your credential in Amazon Secrets Manager. The secret you create will be used later in the tutorial by the connection. The credential key-value pairs will be fed into the Elasticsearch Spark Connector as normal connection options.

For more information about creating secrets, see Creating and Managing Secrets with Amazon Secrets Manager in the Amazon Secrets Manager User Guide.


To create an Amazon secret

1. Sign in to the Amazon Secrets Manager console.
2. On either the service introduction page or the Secrets list page, choose Store a new secret.
3. On the Store a new secret page, choose Other type of secret. This option means that you must supply the structure and details of your secret.
4. Add a Key and Value pair for the OpenSearch cluster user name. For example:

es.net.http.auth.user: username

5. Choose + Add row, and enter another key-value pair for the password. For example:

es.net.http.auth.pass: password

6. Choose Next.
7. Enter a secret name. For example: my-es-secret. You can optionally include a description.

Record the secret name, which is used later in this tutorial, and then choose Next.

8. Choose Next again, and then choose Store to create the secret.
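If you prefer to script this step, you can create an equivalent secret programmatically. The following is a minimal sketch using boto3, with the same secret name and placeholder credentials as the console steps above:

# A sketch only: creating the tutorial secret with boto3 instead of the
# console. Replace the placeholder values with your cluster credentials.
import json
import boto3

client = boto3.client("secretsmanager")
client.create_secret(
    Name="my-es-secret",
    SecretString=json.dumps({
        "es.net.http.auth.user": "username",
        "es.net.http.auth.pass": "password",
    }),
)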

Next step

Step 2: Subscribe to the connector (p. 59)

Step 2: Subscribe to the connector

The Elasticsearch Spark Connector is available for free from Amazon Web Services Marketplace.

To subscribe to the Elasticsearch Spark Connector on Amazon Web Services Marketplace

1. If you have not already configured your Amazon account to use License Manager, do the following:

a. Open the Amazon License Manager console at https://console.amazonaws.cn/license-manager.
b. Choose Create customer managed license.
c. In the IAM permissions (one-time setup) window, choose I grant Amazon License Manager the required permissions, and then choose Grant permissions.

If you do not see this window, then you have already configured the necessary permissions.

2. Open the Amazon Glue Studio console at https://console.amazonaws.cn/gluestudio/.

3. In the Amazon Glue Studio console, expand the menu icon, and then choose Connectors in the navigation pane.
4. On the Connectors page, choose Go to Amazon Web Services Marketplace.
5. In Amazon Web Services Marketplace, in the Search Amazon Glue Studio products section, enter elasticsearch connector in the search field, and then press Enter.
6. Choose the name of the connector, ElasticSearch Spark connector for Amazon Glue.
7. On the product page for the connector, use the tabs to view information about the connector. When you're ready to continue, choose Continue to Subscribe.
8. Review and accept the terms of use for the software.
9. When the subscription process completes, choose Continue to Configuration.
10. Keep the default choices on the Configure this software page, and choose Continue to Launch.


Next step

Step 3: Activate the connector in Amazon Glue Studio and create a connection (p. 60)

Step 3: Activate the connector in Amazon Glue Studio and create a connection

After you choose Continue to Launch, you see the Launch this software page in Amazon Web Services Marketplace. After you use the link to activate the connector in Amazon Glue Studio, you create a connection.

To deploy the connector and create a connection in Amazon Glue Studio

1. On the Launch this software page in the Amazon Web Services Marketplace console, choose Usage Instructions, and then choose the link in the window that appears.

Your browser is redirected to the Amazon Glue Studio console Create marketplace connection page.

2. Enter a name for the connection. For example: my-es-connection.
3. In the Connection access section, for Connection credential type, choose User name and password.
4. For the Amazon secret, enter the name of your secret. For example: my-es-secret.
5. In the Network options section, enter the VPC information to connect to your Elasticsearch cluster.
6. Choose Create connection and activate connector.

Next step

Step 4: Configure an IAM role for your ETL job (p. 60)

Step 4: Configure an IAM role for your ETL job

When you create the Amazon Glue ETL job, you specify an Amazon Identity and Access Management (IAM) role for the job to use. The role must grant access to all resources used by the job, including Amazon S3 (for any sources, targets, scripts, driver files, and temporary directories), and also Amazon Glue Data Catalog objects.

The assumed IAM role for the Amazon Glue ETL job must also have access to the secret that was created in the previous section. By default, the AWS managed role AWSGlueServiceRole does not have access to the secret. To set up access control for your secrets, see Authentication and Access Control for Amazon Secrets Manager and Limiting Access to Specific Secrets.

To configure an IAM role for your ETL job

1. Configure the permissions described in the section called “Job-related permissions” (p. 7).
2. Configure the additional permissions needed when using connectors with Amazon Glue Studio, as described in the section called “Additional permissions when using connectors” (p. 8).

Next step

Step 5: Create a job that uses the OpenSearch connection (p. 61)


Step 5: Create a job that uses the OpenSearch connection

After creating a role for your ETL job, you can create a job in Amazon Glue Studio that uses the connection you created for the Elasticsearch Spark Connector.

If your job runs within an Amazon Virtual Private Cloud (Amazon VPC), make sure the VPC is configured correctly. For more information, see the section called “Configuring a VPC for your ETL job” (p. 9).

To create a job that uses the Elasticsearch Spark Connector

1. In Amazon Glue Studio, choose Connectors.
2. In the Your connections list, select the connection you just created and choose Create job.
3. In the visual job editor, choose the Data source node. On the right, on the Data source properties - Connector tab, configure additional information for the connector.

a. Choose Add schema and enter the schema of the data set in the data source. Connections do not use tables stored in the Data Catalog, which means that Amazon Glue Studio doesn't know the schema of the data. You must manually provide this schema information. For instructions on how to use the schema editor, see the section called “Editing the schema in a custom transform node” (p. 35).
b. Expand Connection options.
c. Choose Add new option and enter the information needed for the connector that was not entered in the Amazon secret:

• es.nodes : https://
• es.port : 443
• path : test
• es.nodes.wan.only : true

For an explanation of these connection options, refer to https://www.elastic.co/guide/en/elasticsearch/hadoop/current/configuration.html.


4. Add a target node to the graph as described in the section called “Adding nodes to the job diagram” (p. 42) and the section called “Editing the data target node” (p. 37).

Your data target can be Amazon S3, or it can use information from an Amazon Glue Data Catalog or a connector to write data in a different location. For example, you can use a Data Catalog table to write to a database in Amazon RDS, or you can use a connector as your data target to write to data stores that are not natively supported in Amazon Glue.

If you choose a connector for your data target, you must choose a connection created for that connector. Also, if required by the connector provider, you must add options to provide additional information to the connector. If you use a connection that contains information for an Amazon secret, then you don’t need to provide the user name and password authentication in the connection options.

5. Optionally, add additional data sources and one or more transform nodes as described in the section called “Editing the data transform node” (p. 23).
6. Configure the job properties as described in the section called “Modify the job properties” (p. 67), starting with step 3, and save the job.


Next step

Step 6: Run the job (p. 63)

Step 6: Run the job

After you save your job, you can run the job to perform the ETL operations.

To run the job you created for the Elasticsearch Spark Connector

1. Using the Amazon Glue Studio console, on the visual editor page, choose Run.
2. In the success banner, choose Run Details, or you can choose the Runs tab of the visual editor to view information about the job run.


Managing ETL jobs with Amazon Glue Studio

You can use the simple graphical interface in Amazon Glue Studio to manage your ETL jobs. Using the navigation menu, choose Jobs to view the Jobs page. On this page, you can see all the jobs that you have created either with Amazon Glue Studio or the Amazon Glue console. You can view, manage, and run your jobs on this page.

You can also perform the following tasks on this page:
• Start a job run (p. 64)
• Schedule job runs (p. 64)
• Manage job schedules (p. 65)
• Stop job runs (p. 66)
• View your jobs (p. 66)
• View information for recent job runs (p. 66)
• View the job script (p. 67)
• Modify the job properties (p. 67)
• Save the job (p. 69)
• Clone a job (p. 71)
• Delete jobs (p. 71)

Start a job run

In Amazon Glue Studio, you can run your jobs on demand. A job can run multiple times, and each time you run the job, Amazon Glue collects information about the job activities and performance. This information is referred to as a job run and is identified by a job run ID.

You can initiate a job run in the following ways in Amazon Glue Studio:

• On the Jobs page, choose the job you want to start, and then choose the Run job button.
• If you're viewing a job in the visual editor and the job has been saved, you can choose the Run button to start a job run.

For more information about job runs, see Working with Jobs on the Amazon Glue Console in the Amazon Glue Developer Guide.

Schedule job runs

In Amazon Glue Studio, you can create a schedule to have your jobs run at specific times. You can specify constraints, such as the number of times that the jobs run, which days of the week they run, and at what time. These constraints are based on cron and have the same limitations as cron. For example, if you choose to run your job on day 31 of each month, keep in mind that some months don't have 31 days. For more information about cron, see Cron Expressions in the Amazon Glue Developer Guide.

To run jobs according to a schedule

1. Create a job schedule using one of the following methods:


• On the Jobs page, choose the job you want to create a schedule for, choose Actions, and then choose Schedule job.
• If you're viewing a job in the visual editor and the job has been saved, choose the Schedules tab. Then choose Create Schedule.

2. On the Schedule job run page, enter the following information:

• Name: Enter a name for your job schedule.
• Frequency: Enter the frequency for the job schedule. You can choose the following:
  • Hourly: The job will run every hour, starting at a specific minute. You can specify the Minute of the hour that the job should run. By default, when you choose hourly, the job runs at the beginning of the hour (minute 0).
  • Daily: The job will run every day, starting at a time. You can specify the Minute of the hour that the job should run and the Start hour for the job. Hours are specified using a 24-hour clock, where you use the numbers 13 to 23 for the afternoon hours. The default values for minute and hour are 0, which means that if you select Daily, the job by default will run at midnight.
  • Weekly: The job will run every week on one or more days. In addition to the same settings described previously for Daily, you can choose the days of the week on which the job will run. You can choose one or more days.
  • Monthly: The job will run every month on a specific day. In addition to the same settings described previously for Daily, you can choose the day of the month on which the job will run. Specify the day as a numeric value from 1 to 31. If you select a day that does not exist in a month, for example the 30th day of February, then the job does not run that month.
  • Custom: Enter an expression for your job schedule using the cron syntax. Cron expressions allow you to create more complicated schedules, such as the last day of the month (instead of a specific day of the month) or every third month on the 7th and 21st days of the month.

See Cron Expressions in the Amazon Glue Developer Guide, and the example expressions after this procedure.
• Description: You can optionally enter a description for your job schedule. If you plan to use the same schedule for multiple jobs, having a description makes it easier to determine what the job schedule does.

3. Choose Create schedule to save the job schedule.
4. After you create the schedule, a success message appears at the top of the console page. You can choose Job details in this banner to view the job details. This opens the visual job editor page, with the Schedules tab selected.
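For reference, here are a few example custom expressions in the cron format that Amazon Glue uses, cron(Minutes Hours Day-of-month Month Day-of-week Year), with all times in UTC. These are illustrations only; see Cron Expressions in the Amazon Glue Developer Guide for the full syntax.

cron(15 12 * * ? *) runs every day at 12:15.
cron(0 10 ? * MON-FRI *) runs Monday through Friday at 10:00.
cron(0 0 L * ? *) runs at midnight on the last day of every month.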

Manage job schedules

After you have created schedules for a job, you can open the job in the visual editor and choose the Schedules tab to manage the schedules.

On the Schedules tab of the visual editor, you can perform the following tasks:

• Create a new schedule.

Choose Create schedule, then enter the information for your schedule as described in the section called “Schedule job runs” (p. 64).

• Edit an existing schedule.

Choose the schedule you want to edit, then choose Action followed by Edit schedule. When you choose to edit an existing schedule, the Frequency shows as Custom, and the schedule is displayed as a cron expression. You can either modify the cron expression, or specify a new schedule using the Frequency button. When you finish with your changes, choose Update schedule.


• Pause an active schedule.

Choose an active schedule, and then choose Action followed by Pause schedule. The schedule is instantly deactivated. Choose the refresh (reload) button to see the updated job schedule status.

• Resume a paused schedule.

Choose a deactivated schedule, and then choose Action followed by Resume schedule. The schedule is instantly activated. Choose the refresh (reload) button to see the updated job schedule status.

• Delete a schedule.

Choose the schedule you want to remove, and then choose Action followed by Delete schedule. The schedule is instantly deleted. Choose the refresh (reload) button to see the updated job schedule list. The schedule will show a status of Deleting until it has been completely removed.

Stop job runs

You can stop a job before it has completed its job run. You might choose this option if you know that the job isn't configured correctly, or if the job is taking too long to complete.

On the Monitoring page, in the Job runs list, choose the job that you want to stop, choose Actions, and then choose Stop run.

View your jobs

You can view all your jobs on the Jobs page. You can access this page by choosing Jobs in the navigation pane.

On the Jobs page, you can see all the jobs that were created in your account. The Your jobs list shows the job name, its type, the status of the last run of that job, and the dates on which the job was created and last modified. You can choose the name of a job to see detailed information for that job.

You can also use the Monitoring dashboard to view all your jobs. You can access the dashboard by choosing Monitoring in the navigation pane. For more information about using the dashboard, see Monitoring ETL jobs in Amazon Glue Studio (p. 72).

Customize the job display

You can customize how the jobs are displayed in the Your jobs section of the Jobs page. Also, you can enter text in the search text field to display only jobs with a name that contains that text.

If you choose the settings icon in the Your jobs section, you can customize how Amazon Glue Studio displays the information in the table. You can choose to wrap the lines of text in the display, change the number of jobs displayed on the page, and specify which columns to display.

View information for recent job runs

A job can run multiple times as new data is added at the source location. Each time a job runs, the job run is assigned a unique ID, and information about that job run is collected. You can view this information by using the following methods:

• Choose the Runs tab of the visual editor to view the job run information for the currently displayed job.

On the Runs tab (the Recent job runs page), there is a card for each job run. The information displayed on the Runs tab includes:
  • Job run ID
  • Number of attempts to run this job
  • Status of the job run
  • Start and end time for the job run
  • The runtime for the job run
  • A link to the job log files
  • A link to the job error log files
  • The error returned for failed jobs
• In the navigation pane, choose Monitoring. Scroll down to the Job runs list. Choose the job and then choose View run details.

The information displayed on the job run detail page accessed from the Monitoring page is more comprehensive. The contents are described in Viewing the details of a job run (p. 74).

For more information about the job logs, see Viewing the job run logs (p. 74).
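The same job run information is available programmatically. The following is a minimal sketch using the Amazon Glue GetJobRuns API; the job name is a hypothetical placeholder.

import boto3

glue = boto3.client("glue")

# List the most recent runs of a job, newest first.
runs = glue.get_job_runs(JobName="my-etl-job", MaxResults=10)["JobRuns"]
for run in runs:
    print(run["Id"], run["JobRunState"], run.get("StartedOn"), run.get("ExecutionTime"))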

View the job script

After you provide information for all the nodes in the job, Amazon Glue Studio generates a script that is used by the job to read the data from the source, transform the data, and write the data in the target location. If you save the job, you can view this script at any time.

To view the generated script for your job

1. Choose Jobs in the navigation pane.
2. On the Jobs page, in the Your jobs list, choose the name of the job you want to review. Alternatively, you can select a job in the list, choose the Actions menu, and then choose Edit job.
3. On the visual editor page, choose the Script tab at the top to view the job script.

If you want to edit the job script, see Editing or uploading a job script (p. 39).
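You can also retrieve the generated script outside the console. The following is a minimal sketch using the Amazon Glue GetJob API and Amazon S3; the job name is a hypothetical placeholder.

import boto3

glue = boto3.client("glue")
s3 = boto3.client("s3")

# Look up where Amazon Glue stores the generated script for the job.
job = glue.get_job(JobName="my-etl-job")["Job"]
script_location = job["Command"]["ScriptLocation"]  # e.g. s3://bucket/prefix/script.py

# Download the script text from Amazon S3.
bucket, key = script_location.removeprefix("s3://").split("/", 1)
script = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
print(script)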

Modify the job properties

The nodes in the job diagram define the actions performed by the job, but there are several properties that you can configure for the job as well. These properties determine the environment that the job runs in, the resources it uses, the threshold settings, the security settings, and more.

To customize the job run environment

1. Choose Jobs in the navigation pane.
2. On the Jobs page, in the Your jobs list, choose the name of the job you want to review.
3. On the visual editor page, choose the Job details tab at the top of the job editing pane.
4. Modify the job properties, as needed.

For more information about the job properties, see Defining Job Properties in the Amazon Glue Developer Guide.
5. Expand the Advanced properties section if you need to specify these additional job properties:


• Script filename – The name of the file that stores the job script in Amazon S3.
• Script path – The Amazon S3 location where the job script is stored.
• Job metrics – (Not available for Python shell jobs) Turns on the creation of Amazon CloudWatch metrics when this job runs.
• Continuous logging – (Not available for Python shell jobs) Turns on continuous logging to CloudWatch, so the logs are available for viewing before the job completes.
• Spark UI and Spark UI logs path – (Not available for Python shell jobs) Turns on the use of Spark UI for monitoring this job and specifies the location for the Spark UI logs.
• Maximum concurrency – Sets the maximum number of concurrent runs that are allowed for this job.
• Temporary path – The location of a working directory in Amazon S3 where temporary intermediate results are written when Amazon Glue runs the job script.
• Delay notification threshold (minutes) – Specify a delay threshold for the job. If the job runs longer than the time specified by the threshold, then Amazon Glue sends a delay notification for the job to CloudWatch.
• Security configuration and Server-side encryption – Use these fields to choose the encryption options for the job.
• Use Glue Data Catalog as the Hive metastore – Choose this option if you want to use the Amazon Glue Data Catalog as an alternative to the Apache Hive metastore.
• Additional network connection – For a data source in a VPC, you can specify a connection of type Network to ensure your job accesses your data through the VPC.
• Python library path, Dependent jars path (Not available for Python shell jobs), or Referenced files path – Use these fields to specify the location of additional files used by the job when it runs the script.
• Job parameters – You can add a set of key-value pairs that are passed as named parameters to the job script. In Python calls to Amazon Glue APIs, it's best to pass parameters explicitly by name. For more information about using parameters in a job script, see Passing and Accessing Python Parameters in Amazon Glue in the Amazon Glue Developer Guide.
• Tags – You can add tags to the job to help you organize and identify it.
6. After you have modified the job properties, save the job.

Store Spark shuffle files on Amazon S3

Some ETL jobs require reading and combining information from multiple partitions, for example, when using a join transform. This operation is referred to as shuffling. During a shuffle, data is written to disk and transferred across the network. With Amazon Glue version 3.0, you can configure Amazon S3 as a storage location for these files. Amazon Glue provides a shuffle manager that writes and reads shuffle files to and from Amazon S3. Writing and reading shuffle files from Amazon S3 is slower (by 5%-20%) compared to local disk (or Amazon EBS, which is heavily optimized for Amazon EC2). However, Amazon S3 provides unlimited storage capacity, so you don't have to worry about "No space left on device" errors when running your job.

To configure your job to use Amazon S3 for shuffle files

1. On the Jobs page, in the Your jobs list, choose the name of the job you want to modify.
2. On the visual editor page, choose the Job details tab at the top of the job editing pane. Scroll down to the Job parameters section.
3. Specify the following key-value pairs.

• --write-shuffle-files-to-s3 – true

This is the main parameter that configures the shuffle manager in Amazon Glue to use Amazon S3 buckets for writing and reading shuffle data. By default, this parameter has a value of false.

• (Optional) --write-shuffle-spills-to-s3 – true

This parameter allows you to offload spill files to Amazon S3 buckets, which provides additional resiliency to your Spark job in Amazon Glue. This is only required for large workloads that spill a lot of data to disk. By default, this parameter has a value of false.

• (Optional) --conf spark.shuffle.glue.s3ShuffleBucket – S3://

This parameter specifies the Amazon S3 bucket to use when writing the shuffle files. If you do not set this parameter, the location is the shuffle-data folder in the location specified for Temporary path (--TempDir).

Note
Make sure the location of the shuffle bucket is in the same Amazon Web Services Region in which the job runs. Also, the shuffle service does not clean up the files after the job finishes running, so you should configure Amazon S3 storage lifecycle policies on the shuffle bucket location. For more information, see Managing your storage lifecycle in the Amazon S3 User Guide.
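When starting a run outside the console, the same key-value pairs can be passed as run arguments. The following is a minimal sketch using the Amazon Glue StartJobRun API; the job name and bucket are hypothetical placeholders.

import boto3

glue = boto3.client("glue")

# Start a run that writes shuffle data to Amazon S3 instead of local disk.
response = glue.start_job_run(
    JobName="my-etl-job",
    Arguments={
        "--write-shuffle-files-to-s3": "true",
        "--write-shuffle-spills-to-s3": "true",   # optional, for jobs that spill heavily
        "--conf": "spark.shuffle.glue.s3ShuffleBucket=s3://my-shuffle-bucket/prefix/",
    },
)
print(response["JobRunId"])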

Save the job

A red Job has not been saved callout is displayed to the left of the Save button until you save the job.

To save your job

1. Provide all the required information in the Visual and Job details tabs.
2. Choose the Save button.

After you save the job, the 'not saved' callout changes to display the time and date that the job was last saved.

If you exit Amazon Glue Studio before saving your job, the next time you sign in to Amazon Glue Studio, a notification appears. The notification indicates that there is an unsaved job, and asks if you want to restore it. If you choose to restore the job, you can continue to edit it.

Troubleshooting errors when saving a job

If you choose the Save button, but your job is missing some required information, then a red callout appears on the tab where the information is missing. The number in the callout indicates how many missing fields were detected.

• If a node in the visual editor isn't configured correctly, the Visual tab shows a red callout, and the node with the error displays a warning symbol.
1. Choose the node. In the node details panel, a red callout appears on the tab where the missing or incorrect information is located.


2. Choose the tab in the node details panel that shows a red callout, and then locate the problem fields, which are highlighted. An error message below the fields provides additional information about the problem.

• If there is a problem with the job properties, the Job details tab shows a red callout. Choose that tab and locate the problem fields, which are highlighted. The error messages below the fields provide additional information about the problem.


Clone a job

You can use the Clone job action to copy an existing job into a new job.

To create a new job by copying an existing job

1. On the Jobs page, in the Your jobs list, choose the job that you want to duplicate.
2. From the Actions menu, choose Clone job.
3. Enter a name for the new job. You can then save or edit the job.

Delete jobs

You can remove jobs that are no longer needed. You can delete one or more jobs in a single operation.

To remove jobs from Amazon Glue Studio

1. On the Jobs page, in the Your jobs list, choose the jobs that you want to delete.
2. From the Actions menu, choose Delete job.
3. Verify that you want to delete the job by entering delete.

You can also delete a saved job when you're viewing the Job details tab for that job in the visual editor.
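Deletion can also be scripted. The following is a minimal sketch using the Amazon Glue ListJobs and DeleteJob APIs; the tmp- name filter is a hypothetical naming convention, not part of Amazon Glue.

import boto3

glue = boto3.client("glue")

# List job names, then delete the ones that are no longer needed.
job_names = glue.list_jobs(MaxResults=100)["JobNames"]
for name in job_names:
    if name.startswith("tmp-"):   # hypothetical naming convention for scratch jobs
        glue.delete_job(JobName=name)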


Monitoring ETL jobs in Amazon Glue Studio

Monitoring is an important part of maintaining the reliability, availability, and performance of ETL jobs used in Amazon Glue and Amazon Glue Studio. You should collect monitoring data from all of the parts of your Amazon solution so that you can more easily debug a multipoint failure if one occurs.

Topics • Accessing the job monitoring dashboard (p. 72) • Overview of the job monitoring dashboard (p. 72) • Job runs view (p. 72) • Viewing the job run logs (p. 74) • Viewing the details of a job run (p. 74) • Viewing Amazon CloudWatch metrics for a job run (p. 75)

Accessing the job monitoring dashboard

You access the job monitoring dashboard by choosing the Monitoring link in the Amazon Glue Studio navigation pane.

Overview of the job monitoring dashboard

The job monitoring dashboard provides an overall summary of the job runs, with totals for the jobs with a status of Running, Canceled, Success, or Failed. Additional tiles provide the overall job run success rate, the estimated DPU usage for jobs, and breakdowns of the job status counts by job type, worker type, and day.

The graphs in the tiles are interactive. You can choose any block in a graph to run a filter that displays only those jobs in the Job runs table at the bottom of the page.

You can change the date range for the information displayed on this page by using the Date range selector. When you change the date range, the information tiles adjust to show the values for the specified number of days before the current date. You can also use a specific date range if you choose Custom from the date range selector.

Job runs view

The Job runs resource list shows the jobs for the specified date range and filters.

You can filter the jobs on additional criteria, such as status, worker type, job type, and the job name. In the filter box at the top of the table, you can enter the text to use as a filter. The table results are updated with rows that contain matching text as you enter the text.

You can view a subset of the jobs by choosing elements from the graphs on the job monitoring dashboard. For example, if you choose the number of running jobs in the Job runs summary tile, then the Job runs list displays only the jobs that currently have a status of Running. If you choose one of the bars in the Worker type breakdown bar chart, then only job runs with the matching worker type and status are shown in the Job runs list.

The Job runs resource list displays the details for the job runs. You can sort the rows in the table by choosing a column heading. The table contains the following information:

Job name – The name of the job.

Type – The type of job environment:
• Glue ETL: Runs in an Apache Spark environment managed by Amazon Glue.
• Glue Streaming: Runs in an Apache Spark environment and performs ETL on data streams.
• Python shell: Runs Python scripts as a shell.

Start time – The date and time at which this job run was started.

End time – The date and time that this job run completed.

Run status – The current state of the job run. Values can be STARTING, RUNNING, STOPPING, STOPPED, SUCCEEDED, FAILED, or TIMEOUT.

Run time – The amount of time that the job run consumed resources.

Capacity – The number of Amazon Glue data processing units (DPUs) that were allocated for this job run. For more information about capacity planning, see Monitoring for DPU Capacity Planning in the Amazon Glue Developer Guide.

Worker type – The type of predefined worker that was allocated when the job ran. Values can be Standard, G.1X, or G.2X.

DPU hours – The estimated number of DPUs used for the job run. A DPU is a relative measure of processing power. DPUs are used to determine the cost of running your job. For more information, see the Amazon Glue pricing page.

You can choose any job run in the list and view additional information. Choose a job run, and then do one of the following:


• Choose the Actions menu and the View job option to view the job in the visual editor.
• Choose the Actions menu and the Stop run option to stop the current run of the job.
• Choose the View CloudWatch logs button to view the job run logs for that job.
• Choose View run details to view the job run details page.

Viewing the job run logs

You can view the job logs in a variety of ways:

• On the Monitoring page, in the Job runs table, choose a job run, and then choose View CloudWatch logs.
• In the visual job editor, on the Runs tab for a job, choose the hyperlinks to view the logs:
  • Logs – Links to the Apache Spark job logs written when continuous logging is enabled for a job run. When you choose this link, it takes you to the Amazon CloudWatch logs in the /aws-glue/jobs/logs-v2 log group. By default, the logs exclude non-useful Apache Hadoop YARN heartbeat and Apache Spark driver or executor log messages. For more information about continuous logging, see Continuous Logging for Amazon Glue Jobs in the Amazon Glue Developer Guide.
  • Error logs – Links to the logs written to stderr for this job run. When you choose this link, it takes you to the Amazon CloudWatch logs in the /aws-glue/jobs/error log group. You can use these logs to view details about any errors that were encountered during the job run.
  • Output logs – Links to the logs written to stdout for this job run. When you choose this link, it takes you to the Amazon CloudWatch logs in the /aws-glue/jobs/output log group. You can use these logs to see all the details about the tables that were created in the Amazon Glue Data Catalog and any errors that were encountered.
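The same log groups can be read with the Amazon CloudWatch Logs API. The following is a minimal sketch, assuming (as is conventional for Amazon Glue) that log stream names begin with the job run ID; the run ID shown is a hypothetical placeholder.

import boto3

logs = boto3.client("logs")

# Fetch stdout output for a specific job run from the Glue output log group.
events = logs.filter_log_events(
    logGroupName="/aws-glue/jobs/output",
    logStreamNamePrefix="jr_0123456789abcdef",  # hypothetical job run ID
)
for event in events["events"]:
    print(event["message"])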

Viewing the details of a job run

You can choose a job in the Job runs list on the Monitoring page, and then choose View run details to see detailed information for that run of the job.

The information displayed on the job run detail page includes:

Job name – The name of the job.

Run status – The current state of the job run. Values can be STARTING, RUNNING, STOPPING, STOPPED, SUCCEEDED, FAILED, or TIMEOUT.

Glue version – The Amazon Glue version used by the job run.

Recent attempt – The number of automatic retry attempts for this job run.


Start time – The date and time at which this job run was started.

End time – The date and time that this job run completed.

Start-up time – The amount of time spent preparing to run the job.

Execution time – The amount of time spent running the job script.

Trigger name – The name of the trigger associated with the job.

Last modified on – The date when the job was last modified.

Security configuration – The security configuration for the job, which includes Amazon S3 encryption, CloudWatch encryption, and job bookmarks encryption settings.

Timeout – The job run timeout threshold value.

Allocated capacity – The number of Amazon Glue data processing units (DPUs) that were allocated for this job run. For more information about capacity planning, see Monitoring for DPU Capacity Planning in the Amazon Glue Developer Guide.

Max capacity – The maximum capacity available to the job run.

Number of workers – The number of workers used for the job run.

Worker type – The type of predefined workers allocated for the job run. Values can be Standard, G.1X, or G.2X.

Logs – A link to the job logs for continuous logging (/aws-glue/jobs/logs-v2).

Output logs – A link to the job output log files (/aws-glue/jobs/output).

Error logs – A link to the job error log files (/aws-glue/jobs/error).
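These details can also be retrieved programmatically. The following is a minimal sketch using the Amazon Glue GetJobRun API; the job name and run ID are hypothetical placeholders.

import boto3

glue = boto3.client("glue")

run = glue.get_job_run(
    JobName="my-etl-job",
    RunId="jr_0123456789abcdef",
)["JobRun"]

# A few of the fields described above.
print(run["JobRunState"], run.get("StartedOn"), run.get("CompletedOn"))
print(run.get("ExecutionTime"), run.get("WorkerType"), run.get("NumberOfWorkers"))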

Viewing Amazon CloudWatch metrics for a job run

On the details page for a job run, below the Run details section, you can view the job metrics. Amazon Glue Studio sends job metrics to Amazon CloudWatch for every job run.

Amazon Glue reports metrics to Amazon CloudWatch every 30 seconds. The Amazon Glue metrics represent delta values from the previously reported values. Where appropriate, metrics dashboards aggregate (sum) the 30-second values to obtain a value for the entire last minute. However, the Apache Spark metrics that Amazon Glue passes on to Amazon CloudWatch are generally absolute values that represent the current state at the time they are reported.

Note
You must configure your account to access Amazon CloudWatch, as described in Amazon CloudWatch permissions (p. 7).


The metrics provide information about your job run, such as:

• ETL Data Movement – The number of bytes read from or written to Amazon S3.
• Memory Profile: Heap used – The number of memory bytes used by the Java virtual machine (JVM) heap.
• Memory Profile: Heap usage – The fraction of memory (scale: 0–1), shown as a percentage, used by the JVM heap.
• CPU Load – The fraction of CPU system load used (scale: 0–1), shown as a percentage.
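The following is a minimal sketch for pulling one of these metrics from Amazon CloudWatch. It assumes the Glue namespace and the glue.driver.aggregate.bytesRead metric documented for Amazon Glue job metrics; the job name is a hypothetical placeholder.

from datetime import datetime, timedelta

import boto3

cloudwatch = boto3.client("cloudwatch")

# Sum the bytes read by a job over the last hour (ETL Data Movement).
stats = cloudwatch.get_metric_statistics(
    Namespace="Glue",
    MetricName="glue.driver.aggregate.bytesRead",
    Dimensions=[
        {"Name": "JobName", "Value": "my-etl-job"},
        {"Name": "JobRunId", "Value": "ALL"},
        {"Name": "Type", "Value": "count"},
    ],
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=300,
    Statistics=["Sum"],
)
for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Sum"])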


Tutorial: Adding an Amazon Glue crawler

For this Amazon Glue scenario, you're asked to analyze arrival data for major air carriers to calculate the popularity of departure airports month over month. You have flights data for the year 2016 in CSV format stored in Amazon S3. Before you transform and analyze your data, you catalog its metadata in the Amazon Glue Data Catalog.

In this tutorial, let’s add a crawler that infers metadata from these flight logs in Amazon S3 and creates a table in your Data Catalog.

Topics • Prerequisites (p. 77) • Step 1: Add a crawler (p. 77) • Step 2: Run the crawler (p. 78) • Step 3: View Amazon Glue Data Catalog objects (p. 78)

Prerequisites

This tutorial assumes that you have an Amazon account and access to Amazon Glue.

Step 1: Add a crawler

Use these steps to configure and run a crawler that extracts the metadata from a CSV file stored in Amazon S3.

To create a crawler that reads files stored on Amazon S3

1. On the Amazon Glue service console, on the left-side menu, choose Crawlers.
2. On the Crawlers page, choose Add crawler. This starts a series of pages that prompt you for the crawler details.
3. In the Crawler name field, enter Flights Data Crawler, and choose Next.

Crawlers invoke classifiers to infer the schema of your data. This tutorial uses the built-in classifier for CSV by default.
4. For the crawler source type, choose Data stores and choose Next.
5. Now point the crawler to your data. On the Add a data store page, choose the Amazon S3 data store. This tutorial doesn't use a connection, so leave the Connection field blank if it's visible.

For the option Crawl data in, choose Specified path in another account. Then, for the Include path, enter the path where the crawler can find the flights data, which is s3://crawler-public-us-east-1/flight/2016/csv. After you enter the path, the title of this field changes to Include path. Choose Next.
6. You can crawl multiple data stores with a single crawler. However, this tutorial uses only a single data store, so choose No, and then choose Next.


7. The crawler needs permissions to access the data store and create objects in the Amazon Glue Data Catalog. To configure these permissions, choose Create an IAM role. The IAM role name starts with AWSGlueServiceRole-, and in the field, you enter the last part of the role name. Enter CrawlerTutorial, and then choose Next.

Note
To create an IAM role, your user must have CreateRole, CreatePolicy, and AttachRolePolicy permissions.

The wizard creates an IAM role named AWSGlueServiceRole-CrawlerTutorial, attaches the Amazon managed policy AWSGlueServiceRole to this role, and adds an inline policy that allows read access to the Amazon S3 location s3://crawler-public-us-east-1/flight/2016/csv.
8. Create a schedule for the crawler. For Frequency, choose Run on demand, and then choose Next.
9. Crawlers create tables in your Data Catalog. Tables are contained in a database in the Data Catalog. First, choose Add database to create a database. In the pop-up window, enter test-flights-db for the database name, and then choose Create.

Next, enter flights for Prefix added to tables. Use the default values for the rest of the options, and choose Next.
10. Verify the choices you made in the Add crawler wizard. If you see any mistakes, you can choose Back to return to previous pages and make changes.

After you have reviewed the information, choose Finish to create the crawler.
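For reference, an equivalent crawler can be created with the Amazon Glue CreateCrawler API. The following is a minimal sketch, assuming the IAM role from step 7 already exists.

import boto3

glue = boto3.client("glue")

# Create a crawler equivalent to the one configured in the wizard above.
glue.create_crawler(
    Name="Flights Data Crawler",
    Role="AWSGlueServiceRole-CrawlerTutorial",
    DatabaseName="test-flights-db",
    TablePrefix="flights",
    Targets={
        "S3Targets": [{"Path": "s3://crawler-public-us-east-1/flight/2016/csv"}]
    },
)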

Step 2: Run the crawler

After creating a crawler, the wizard sends you to the Crawlers view page. Because you created the crawler with an on-demand schedule, you're given the option to run the crawler.

To run the crawler

1. The banner near the top of this page lets you know that the crawler was created, and asks if you want to run it now. Choose Run it now? to run the crawler.

The banner changes to show "Attempting to run" and "Running" messages for your crawler. After the crawler starts running, the banner disappears, and the crawler display is updated to show a status of Starting for your crawler. After a minute, you can choose the Refresh icon to update the status of the crawler that is displayed in the table.
2. When the crawler completes, a new banner appears that describes the changes made by the crawler. You can choose the test-flights-db link to view the Data Catalog objects.
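Equivalently, the crawler can be started and polled with the StartCrawler and GetCrawler APIs; a minimal sketch:

import time

import boto3

glue = boto3.client("glue")

glue.start_crawler(Name="Flights Data Crawler")

# Poll until the crawler finishes its run (RUNNING -> STOPPING -> READY).
while True:
    state = glue.get_crawler(Name="Flights Data Crawler")["Crawler"]["State"]
    print("Crawler state:", state)
    if state == "READY":
        break
    time.sleep(30)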

Step 3: View Amazon Glue Data Catalog objects

The crawler reads data at the source location and creates tables in the Data Catalog. A table is the metadata definition that represents your data, including its schema. The tables in the Data Catalog do not contain data. Instead, you use these tables as a source or target in a job definition.

To view the Data Catalog objects created by the crawler

1. In the left-side navigation, under Data catalog, choose Databases. Here you can view the test-flights-db database that was created by the crawler.
2. In the left-side navigation, under Data catalog and below Databases, choose Tables. Here you can view the flightscsv table created by the crawler. If you choose the table name, then you can view the table settings, parameters, and properties. Scrolling down in this view, you can view the schema, which is information about the columns and data types of the table.
3. If you choose View partitions on the table view page, you can see the partitions created for the data. The first column is the partition key.
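The same catalog objects can be inspected with the Amazon Glue GetTables API; a minimal sketch:

import boto3

glue = boto3.client("glue")

# List the tables that the crawler created in the tutorial database.
tables = glue.get_tables(DatabaseName="test-flights-db")["TableList"]
for table in tables:
    print(table["Name"])
    for column in table["StorageDescriptor"]["Columns"]:
        print("  ", column["Name"], column["Type"])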


Document history for Amazon Glue Studio User Guide

Latest documentation update: August 18, 2021

The following entries describe the important changes in each revision of the Amazon Glue Studio User Guide. For notification about updates to this documentation, you can subscribe to an RSS feed.

Amazon Glue Studio supports Amazon Glue version 3.0 (August 18, 2021)
When creating jobs in Amazon Glue Studio, you can choose Glue 3.0 as the version for your job in the Job details tab. If you do not choose a version for your ETL job, Glue 2.0 is used by default.

Amazon GovCloud (US) Region (August 18, 2021)
Amazon Glue Studio is now available in the Amazon GovCloud (US) Region.

Python shell authoring available in Amazon Glue Studio (August 13, 2021)
When creating a new job, you can now choose to create a Python shell job. For more information, see Start the job creation process and Editing Python shell jobs in Amazon Glue Studio.

Upload scripts to Amazon Glue Studio (June 14, 2021)
In conjunction with the script editor feature, you can upload job scripts to Amazon Glue Studio. For more information, see Start the job creation process and Editing or uploading a job script.

View your job's dataset while creating and editing jobs (June 7, 2021)
You can use the new Data preview tab for a node in your job diagram to see a sample of the data processed by that node. For more information, see Using data previews in the visual job editor.

Specify settings for your streaming ETL job in the visual job editor (June 4, 2021)
You can configure additional connection settings for streaming data sources in the visual job editor to optimize your streaming ETL jobs. For more information, see Using a streaming data source.

Network connection support added (May 24, 2021)
If you want to access a data source located in your VPC, you can specify a network connection for the job. For more information, see Modify the job properties.

Edit job scripts (May 24, 2021)
You can now edit scripts in the job editor. For more information, see Editing a job script.

Delete jobs using the Amazon Glue Studio console (May 24, 2021)
You can now delete jobs in Amazon Glue Studio. To learn how, see Delete jobs.

Read data from files in child folders in Amazon S3 (April 30, 2021)
You can specify a single folder in Amazon S3 as your data source and use the Recursive option to include all the child folders as part of the data source. For more information, see Using files in Amazon S3 for the data source.

Delete connectors and connections functionality added (April 30, 2021)
You can now delete connectors and connections in Amazon Glue Studio. For more information, see Deleting connectors and connections.

Fill missing values transform added (March 29, 2021)
You can use the FillMissingValues transform in Amazon Glue Studio to locate records in the dataset that have missing values and add a new field with an estimated value. For more information, see Editing the data transform node.

SQL transform available (March 23, 2021)
You can use a SQL transform node to write your own transform in the form of a SQL query. For more information, see Using a SQL query to transform data.

JDBC source nodes now support job bookmark keys (March 15, 2021)
Job bookmarks help Amazon Glue maintain state information and prevent the reprocessing of old data. For more information, see Authoring jobs with custom connectors.

Connectors can be used for data targets (March 15, 2021)
Using a custom or Amazon Web Services Marketplace connector for your data target is now supported. For more information, see Authoring jobs with custom connectors.

A new toolbar is available for the visual job editor (March 8, 2021)
A more streamlined and functional toolbar is available for the visual job editor of Amazon Glue Studio. This feature makes it easier to add nodes to your graph.

Read data from Amazon S3 without creating Data Catalog tables (February 5, 2021)
Amazon Glue Studio now allows you to read data directly from Amazon S3 without first creating a table in the Amazon Glue Data Catalog. For more information, see Editing the data source node.

Amazon Glue Studio jobs can now update Data Catalog tables (February 5, 2021)
Amazon Glue Studio now supports updating the Amazon Glue Data Catalog during job runs. This feature makes it easy to keep your tables up to date as your jobs write new data into Amazon S3. This makes the data immediately available for query from any analytics service that is compatible with the Amazon Glue Data Catalog. For more information, see Configuring data target nodes.

Job scheduling now available in Amazon Glue Studio (December 21, 2020)
You can define a time-based schedule for your job runs in Amazon Glue Studio. You can use the console to create a basic schedule, or define a more complex schedule using the Unix-like cron syntax. For more information, see Schedule job runs.

Amazon Glue Custom Connectors released (December 21, 2020)
Amazon Glue Custom Connectors allow you to discover and subscribe to connectors in Amazon Web Services Marketplace. We also released Amazon Glue Spark runtime interfaces to plug in connectors built for Apache Spark Datasource, Athena federated query, and JDBC APIs. For more information, see Using connectors and connections with Amazon Glue Studio.

Support for running streaming ETL jobs in Amazon Glue version 2.0 (November 11, 2020)
Amazon Glue Studio now supports running streaming ETL jobs using Amazon Glue version 2.0. For more information, see Adding Streaming ETL Jobs in Amazon Glue in the Amazon Glue Developer Guide.

Availability of Amazon Glue Studio announced (September 23, 2020)
Amazon Glue Studio provides a visual interface that simplifies the creation of jobs that prepare the data for analysis. The initial version of this guide was published on the same day Amazon Glue Studio launched.


Amazon glossary

For the latest Amazon terminology, see the Amazon glossary in the Amazon General Reference.
