Amazon Glue Studio User Guide Amazon Glue Studio User Guide
Total Page:16
File Type:pdf, Size:1020Kb
Amazon Glue Studio User Guide Amazon Glue Studio User Guide Amazon Glue Studio: User Guide Amazon Glue Studio User Guide Table of Contents What is Amazon Glue Studio? .............................................................................................................. 1 Features of Amazon Glue Studio .................................................................................................. 2 Visual job editor ................................................................................................................ 2 Job script code editor ......................................................................................................... 2 Job performance dashboard ................................................................................................ 3 Support for dataset partitioning .......................................................................................... 3 When should I use Amazon Glue Studio? ...................................................................................... 3 Accessing Amazon Glue Studio .................................................................................................... 3 Pricing for Amazon Glue Studio ................................................................................................... 4 Setting up ......................................................................................................................................... 5 Sign up for Amazon ................................................................................................................... 5 Create an IAM administrator user ................................................................................................. 5 Signing in as an IAM user ........................................................................................................... 6 IAM permissions needed for the Amazon Glue Studio user .............................................................. 6 Amazon Glue service permissions ......................................................................................... 6 Amazon CloudWatch permissions ......................................................................................... 7 Job-related permissions .............................................................................................................. 7 Data source and data target permissions ............................................................................... 7 Permissions required for deleting jobs .................................................................................. 8 Amazon Key Management Service permissions ....................................................................... 8 Additional permissions when using connectors ....................................................................... 8 Set up IAM permissions for Amazon Glue Studio ............................................................................ 8 Configuring a VPC for your ETL job .............................................................................................. 9 Populate the Amazon Glue Data Catalog ...................................................................................... 9 Tutorial: Getting started .................................................................................................................... 10 Prerequisites ............................................................................................................................ 10 Step 1: Start the job creation process ......................................................................................... 10 Step 2: Edit the data source node in the job diagram .................................................................... 11 Step 3: Edit the transform node of the job .................................................................................. 12 Step 4: Edit the data target node of the job ................................................................................ 12 Step 5: View the job script ........................................................................................................ 13 Step 6: Specify the job details and save the job ........................................................................... 13 Step 7: Run the job .................................................................................................................. 14 Next steps ............................................................................................................................... 14 Creating jobs ................................................................................................................................... 15 Start the job creation process .................................................................................................... 15 Create jobs that use a connector ................................................................................................ 16 Next steps for creating a job in Amazon Glue Studio .................................................................... 16 Editing jobs ..................................................................................................................................... 17 Accessing the job diagram editor ................................................................................................ 17 Job editor features ................................................................................................................... 17 Using schema previews in the visual job editor .................................................................... 18 Using data previews in the visual job editor ......................................................................... 18 Restrictions when using data previews ................................................................................ 19 Editing the data source node ..................................................................................................... 19 Using Data Catalog tables for the data source ..................................................................... 20 Using a connector for the data source ................................................................................ 21 Using files in Amazon S3 for the data source ....................................................................... 21 Using a streaming data source ........................................................................................... 22 Editing the data transform node ................................................................................................ 23 Overview of mappings and transforms ................................................................................ 23 Using ApplyMapping to remap data property keys ................................................................ 24 Using SelectFields to remove most data property keys .......................................................... 25 Using DropFields to keep most data property keys ............................................................... 25 iii Amazon Glue Studio User Guide Renaming a field in the dataset ......................................................................................... 26 Using Spigot to sample your dataset .................................................................................. 27 Joining datasets ............................................................................................................... 27 Using SplitFields to split a dataset into two ......................................................................... 29 Overview of SelectFromCollection transform ........................................................................ 29 Using SelectFromCollection to choose which dataset to keep ................................................. 30 Filtering keys within a dataset ........................................................................................... 30 Find and fill missing values in a dataset .............................................................................. 31 Using a SQL query to transform data ................................................................................. 32 Creating a custom transformation ...................................................................................... 33 Configuring data target nodes ................................................................................................... 36 Overview of data target options ........................................................................................ 36 Editing the data target node ............................................................................................. 37 Editing or uploading a job script ................................................................................................ 39 Creating and editing Scala scripts in Amazon Glue Studio ...................................................... 40 Creating and editing Python shell jobs in Amazon Glue Studio ............................................... 41 Adding nodes to the job diagram ............................................................................................... 42 Changing the parent nodes for a node in the job diagram ............................................................. 42 Deleting nodes from the job diagram ......................................................................................... 43 Using connectors and connections .....................................................................................................