AWS Glue Studio User Guide

AWS Glue Studio: User Guide

Copyright © Amazon Web Services, Inc. and/or its affiliates. All rights reserved.

Amazon's trademarks and trade dress may not be used in connection with any product or service that is not Amazon's, in any manner that is likely to cause confusion among customers, or in any manner that disparages or discredits Amazon. All other trademarks not owned by Amazon are the property of their respective owners, who may or may not be affiliated with, connected to, or sponsored by Amazon.

Table of Contents

What is AWS Glue Studio?
    Features of AWS Glue Studio
        Visual job editor
        Job script code editor
        Job performance dashboard
        Support for dataset partitioning
    When should I use AWS Glue Studio?
    Accessing AWS Glue Studio
    Pricing for AWS Glue Studio
Setting up
    Sign up for AWS
    Create an IAM administrator user
    Signing in as an IAM user
    IAM permissions needed for the AWS Glue Studio user
        AWS Glue service permissions
        Amazon CloudWatch permissions
    Job-related permissions
        Data source and data target permissions
        Permissions required for deleting jobs
        AWS Key Management Service permissions
        Additional permissions when using connectors
    Set up IAM permissions for AWS Glue Studio
    Configuring a VPC for your ETL job
    Populate the AWS Glue Data Catalog
Tutorial: Getting started
    Prerequisites
    Step 1: Start the job creation process
    Step 2: Edit the data source node in the job diagram
    Step 3: Edit the transform node of the job
    Step 4: Edit the data target node of the job
    Step 5: View the job script
    Step 6: Specify the job details and save the job
    Step 7: Run the job
    Next steps
Creating jobs
    Start the job creation process
    Create jobs that use a connector
    Next steps for creating a job in AWS Glue Studio
Editing jobs
    Accessing the job diagram editor
    Job editor features
        Using schema previews in the visual job editor
        Using data previews in the visual job editor
        Restrictions when using data previews
    Editing the data source node
        Using Data Catalog tables for the data source
        Using a connector for the data source
        Using files in Amazon S3 for the data source
        Using a streaming data source
    Editing the data transform node
        Overview of mappings and transforms
        Using ApplyMapping to remap data property keys
        Using SelectFields to remove most data property keys
        Using DropFields to keep most data property keys
        Renaming a field in the dataset
        Using Spigot to sample your dataset
        Joining datasets
        Using SplitFields to split a dataset into two
        Overview of SelectFromCollection transform
        Using SelectFromCollection to choose which dataset to keep
        Filtering keys within a dataset
        Find and fill missing values in a dataset
        Using a SQL query to transform data
        Creating a custom transformation
    Configuring data target nodes
        Overview of data target options
        Editing the data target node
    Editing or uploading a job script
        Creating and editing Scala scripts in AWS Glue Studio
        Creating and editing Python shell jobs in AWS Glue Studio
