AWS Glue Studio User Guide

AWS Glue Studio: User Guide

Copyright © Amazon Web Services, Inc. and/or its affiliates. All rights reserved.

Amazon's trademarks and trade dress may not be used in connection with any product or service that is not Amazon's, in any manner that is likely to cause confusion among customers, or in any manner that disparages or discredits Amazon. All other trademarks not owned by Amazon are the property of their respective owners, who may or may not be affiliated with, connected to, or sponsored by Amazon.

Table of Contents

What is AWS Glue Studio?
    Features of AWS Glue Studio
        Visual job editor
        Job script code editor
        Job performance dashboard
        Support for dataset partitioning
    When should I use AWS Glue Studio?
    Accessing AWS Glue Studio
    Pricing for AWS Glue Studio
Setting up
    Sign up for AWS
    Create an IAM administrator user
    Signing in as an IAM user
    IAM permissions needed for the AWS Glue Studio user
        AWS Glue service permissions
        Amazon CloudWatch permissions
    Job-related permissions
        Data source and data target permissions
        Permissions required for deleting jobs
        AWS Key Management Service permissions
        Additional permissions when using connectors
    Set up IAM permissions for AWS Glue Studio
    Configuring a VPC for your ETL job
    Populate the AWS Glue Data Catalog
Tutorial: Getting started
    Prerequisites
    Step 1: Start the job creation process
    Step 2: Edit the data source node in the job diagram
    Step 3: Edit the transform node of the job
    Step 4: Edit the data target node of the job
    Step 5: View the job script
    Step 6: Specify the job details and save the job
    Step 7: Run the job
    Next steps
Creating jobs
    Start the job creation process
    Create jobs that use a connector
    Next steps for creating a job in AWS Glue Studio
Editing jobs
    Accessing the job diagram editor
    Job editor features
        Using schema previews in the visual job editor
        Using data previews in the visual job editor
        Restrictions when using data previews
    Editing the data source node
        Using Data Catalog tables for the data source
        Using a connector for the data source
        Using files in Amazon S3 for the data source
        Using a streaming data source
    Editing the data transform node
        Overview of mappings and transforms
        Using ApplyMapping to remap data property keys
        Using SelectFields to remove most data property keys
        Using DropFields to keep most data property keys
        Renaming a field in the dataset
        Using Spigot to sample your dataset
        Joining datasets
        Using SplitFields to split a dataset into two
        Overview of SelectFromCollection transform
        Using SelectFromCollection to choose which dataset to keep
        Filtering keys within a dataset
        Find and fill missing values in a dataset
        Using a SQL query to transform data
        Creating a custom transformation
    Configuring data target nodes
        Overview of data target options
        Editing the data target node
    Editing or uploading a job script
        Creating and editing Scala scripts in AWS Glue Studio
        Creating and editing Python shell jobs in AWS Glue Studio

