Amazon EMR on EKS Development Guide Amazon EMR Amazon EMR on EKS Development Guide
Total Page:16
File Type:pdf, Size:1020Kb
Amazon EMR Amazon EMR on EKS Development Guide Amazon EMR Amazon EMR on EKS Development Guide Amazon EMR: Amazon EMR on EKS Development Guide Copyright © Amazon Web Services, Inc. and/or its affiliates. All rights reserved. Amazon's trademarks and trade dress may not be used in connection with any product or service that is not Amazon's, in any manner that is likely to cause confusion among customers, or in any manner that disparages or discredits Amazon. All other trademarks not owned by Amazon are the property of their respective owners, who may or may not be affiliated with, connected to, or sponsored by Amazon. Amazon EMR Amazon EMR on EKS Development Guide Table of Contents What is Amazon EMR on EKS .............................................................................................................. 1 Architecture ............................................................................................................................... 2 Concepts ................................................................................................................................... 2 Kubernetes namespace ....................................................................................................... 2 Virtual cluster .................................................................................................................... 2 Job run ............................................................................................................................. 3 Amazon EMR containers ...................................................................................................... 3 How the components work together ............................................................................................ 3 Setting up ......................................................................................................................................... 5 Install the AWS CLI .................................................................................................................... 5 To install or update the AWS CLI for macOS .......................................................................... 5 To install or update the AWS CLI for Linux ............................................................................ 5 To install or update the AWS CLI for Windows ....................................................................... 6 Configure your AWS CLI credentials ...................................................................................... 6 Install eksctl ........................................................................................................................... 7 To install or upgrade eksctl on macOS using Homebrew ......................................................... 7 To install or upgrade eksctl on Linux using curl ................................................................. 7 To install or upgrade eksctl on Windows using Chocolatey .................................................... 8 Set up an Amazon EKS cluster ..................................................................................................... 8 Prerequisites ...................................................................................................................... 8 Create an Amazon EKS cluster using eksctl ......................................................................... 8 Create an EKS cluster using AWS Management Console and AWS CLI ...................................... 10 Create an EKS cluster with AWS Fargate .............................................................................. 10 Enable cluster access for Amazon EMR on EKS ............................................................................. 11 Manual steps to enable cluster access for Amazon EMR on EKS .............................................. 12 Enable IAM Roles for Service Accounts (IRSA) on the EKS cluster .................................................... 14 To create an IAM OIDC identity provider for your cluster with eksctl ..................................... 14 To create an IAM OIDC identity provider for your cluster with the AWS Management Console ...... 14 Create a job execution role ........................................................................................................ 15 Update the trust policy of the job execution role ......................................................................... 15 Grant users access to Amazon EMR on EKS .................................................................................. 16 Creating a new IAM policy and attaching it to an IAM user in the IAM console ........................... 16 Permissions for managing virtual clusters ............................................................................ 17 Permissions for submitting jobs ......................................................................................... 17 Permissions for debugging and monitoring .......................................................................... 18 Register the Amazon EKS cluster with Amazon EMR ..................................................................... 18 Getting started ................................................................................................................................ 20 Run a Spark Python application ................................................................................................. 20 Customizing Docker images ............................................................................................................... 23 How to customize Docker images ............................................................................................... 23 Prerequisites .................................................................................................................... 23 Step 1: Retrieve a base image from Amazon Elastic Container Registry (Amazon ECR) ................ 23 Step 2: Customize a base image ......................................................................................... 24 Step 3: (Optional but recommended) Validate a custom image ............................................... 24 Step 4: Publish a custom image ......................................................................................... 25 Step 5: Submit a Spark workload in Amazon EMR using a custom image .................................. 26 How to select a base image URI ................................................................................................. 28 Considerations ......................................................................................................................... 29 Job runs .......................................................................................................................................... 30 Managing job runs using the AWS CLI ......................................................................................... 30 Submit a job run .............................................................................................................. 30 Options for configuring a job run ....................................................................................... 31 Configure a job run to use S3 logs ..................................................................................... 33 Configure a job run to use Amazon CloudWatch Logs ........................................................... 33 iii Amazon EMR Amazon EMR on EKS Development Guide List job runs .................................................................................................................... 34 Describe a job run ............................................................................................................ 35 Cancel a job run ............................................................................................................... 35 Job run states .......................................................................................................................... 35 Viewing jobs in the Amazon EMR console .................................................................................... 35 Common errors when running jobs ............................................................................................. 36 Using pod templates ................................................................................................................ 40 Common scenarios ........................................................................................................... 40 Enabling pod templates with Amazon EMR on EKS ............................................................... 41 Pod template fields .......................................................................................................... 43 Sidecar container considerations ........................................................................................ 45 Using Spark event log rotation .................................................................................................. 46 Monitoring jobs ................................................................................................................................ 47 Monitor jobs with Amazon CloudWatch Events ............................................................................. 47 Automate Amazon EMR on EKS with CloudWatch Events ............................................................... 48 Example: Set up a rule that invokes Lambda ................................................................................ 48 Managing