
AWS Prescriptive Guidance

Cross-account full table copy options for Amazon DynamoDB

AWS Prescriptive Guidance: Cross-account full table copy options for Amazon DynamoDB Copyright © Amazon Web Services, Inc. and/or its affiliates. All rights reserved.

Amazon's trademarks and trade dress may not be used in connection with any product or service that is not Amazon's, in any manner that is likely to cause confusion among customers, or in any manner that disparages or discredits Amazon. All other trademarks not owned by Amazon are the property of their respective owners, who may or may not be affiliated with, connected to, or sponsored by Amazon.

Table of Contents

Home
Overview
Using AWS Data Pipeline
  Advantages
  Drawbacks
Using AWS Glue and Amazon DynamoDB export
  Advantages
  Drawbacks
Using Amazon EMR
  Advantages
  Drawbacks
Using a custom implementation
  Advantages
  Drawbacks
Using AWS Lambda and Python
  Advantages
  Drawbacks
Using AWS Glue with Amazon DynamoDB as source and sink
  Advantages
  Drawbacks
Next steps
Resources
Document history


Cross-account full table copy options for Amazon DynamoDB

Ramkumar Ramanujam, Consultant, and Sravan Velagandula, Consultant, Amazon Web Services

August 2021

This guide covers different ways to perform a full table copy of Amazon DynamoDB tables across multiple Amazon Web Services (AWS) accounts. It also lists the advantages and drawbacks of each solution and the scenarios for which each solution can be considered. It does not cover streaming-replication solutions.

This guide is intended for architects, managers, and technical leads who have a basic understanding of DynamoDB.

Overview

To improve application performance and to reduce operational costs and burdens, many organizations are switching over to DynamoDB.

A common requirement when working with DynamoDB tables is copying full table data across multiple environments. Usually, each environment is owned by a different team and uses a different AWS account. A typical example is the promotion of code from development to staging and then to production: the staging environment is refreshed with production data so that pre-promotion tests run against data that closely matches production.

The built-in DynamoDB backup and restore feature might seem like a straightforward way to perform a full table copy, but it works only within the same AWS account. Backups created in Account-A are not available for use in Account-B.

This guide gives a high-level overview of several approaches for copying a full refresh of a DynamoDB table from one account to another.

The best way to ensure that the target table has the same data as the source table is to delete and then recreate the table. This approach avoids the costs associated with the write capacity units (WCUs) required to delete individual items from the table. Each of the solutions discussed in this guide assumes that the target table is recreated before the data refresh.
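For illustration, here is a minimal boto3 sketch of that delete-and-recreate step. The table name, key schema, and Region are placeholders; adapt them to your table.

import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")  # assumed Region

# Delete the stale target table and wait until the deletion completes.
dynamodb.delete_table(TableName="TargetTable")  # placeholder name
dynamodb.get_waiter("table_not_exists").wait(TableName="TargetTable")

# Recreate the table with the same key schema as the source
# (a single partition key is assumed here for illustration).
dynamodb.create_table(
    TableName="TargetTable",
    KeySchema=[{"AttributeName": "pk", "KeyType": "HASH"}],
    AttributeDefinitions=[{"AttributeName": "pk", "AttributeType": "S"}],
    BillingMode="PAY_PER_REQUEST",
)
dynamodb.get_waiter("table_exists").wait(TableName="TargetTable")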


Using AWS Data Pipeline

AWS Data Pipeline is a web service that you can use to automate the movement and transformation of data. Using Data Pipeline, you can create a pipeline to export table data from the source account (Account-A). The exported data is stored in an Amazon Simple Storage Service (Amazon S3) bucket in the target account (Account-B). The S3 bucket in the target account must be accessible from the source account. To allow this cross-account access, update the access control list (ACL) in the target S3 bucket.
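As a sketch, one way to grant that cross-account access is a bucket policy rather than an ACL. The account ID and bucket name below are placeholders, and this assumes a bucket policy is acceptable in your environment.

import json

import boto3

bucket = "target-export-bucket"  # placeholder bucket in Account-B
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowAccountAExportWrites",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111111111111:root"},  # Account-A (placeholder)
            "Action": ["s3:PutObject", "s3:GetObject", "s3:ListBucket"],
            "Resource": [
                f"arn:aws:s3:::{bucket}",
                f"arn:aws:s3:::{bucket}/*",
            ],
        }
    ],
}
boto3.client("s3").put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))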

Create another pipeline in the target account (Account-B) to import data from the S3 bucket into the table in the target account.

This was the traditional way to back up DynamoDB tables to Amazon S3 and restore them from Amazon S3, until AWS Glue introduced native support for reading from DynamoDB tables.

Advantages

• It's a serverless solution.
• No new code is required.
• AWS Data Pipeline uses Amazon EMR clusters behind the scenes for the job, so this approach is efficient and can handle large datasets.

Drawbacks

• Additional AWS services (Data Pipeline and Amazon S3) are required.
• The process consumes provisioned throughput on both the source and target tables, so it can affect their performance and availability.
• This approach incurs additional costs beyond the cost of DynamoDB read capacity units (RCUs) and write capacity units (WCUs).


Using AWS Glue and Amazon DynamoDB export

AWS Glue is a fully managed ETL (extract, transform, and load) service that makes it simple and cost-effective to categorize your data, clean it, enrich it, and move it reliably between various data stores and data streams. Using AWS Glue with the native export functionality in Amazon DynamoDB works well for large datasets. The DynamoDB export feature relies on the DynamoDB point-in-time recovery feature, so it can quickly export large datasets without consuming any DynamoDB read capacity units (RCUs).

The DynamoDB export feature allows exporting table data to Amazon S3 across AWS accounts and AWS Regions. After the data is uploaded to Amazon S3, AWS Glue can read this data and write it to the target table.
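For example, a minimal boto3 sketch of starting such an export from the source account might look like the following. The table ARN, bucket name, and account ID are placeholders, and point-in-time recovery must already be enabled on the source table.

import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")  # assumed Region

dynamodb.export_table_to_point_in_time(
    TableArn="arn:aws:dynamodb:us-east-1:111111111111:table/SourceTable",  # placeholder
    S3Bucket="target-export-bucket",  # bucket owned by the target account (placeholder)
    S3BucketOwner="222222222222",     # target account ID (placeholder)
    ExportFormat="DYNAMODB_JSON",
)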

After the data is exported to an S3 bucket in the target account, you must do the following in the target account:

1. Run an AWS Glue crawler on the data in Amazon S3. The crawler infers the schema and creates an AWS Glue Data Catalog table with that schema definition.
2. Use AWS Glue Studio to create an ETL job. AWS Glue Studio is a graphical interface for creating, running, and monitoring ETL workflows. After you specify a source, a transformation, and a target, AWS Glue Studio automatically generates PySpark code based on these inputs. For this job, specify the AWS Glue Data Catalog table as the source and ApplyMapping as the transformation. Because DynamoDB is not listed as a target, don't specify a target.
3. Ensure that the key name and datatype mappings of the AWS Glue Studio generated code are correct. If the mappings aren't correct, modify the code and correct the mappings. Because the target wasn't specified when you created the AWS Glue job, add a sink operation that writes directly to the target DynamoDB table:

glueContext.write_dynamic_frame_from_options(
    frame=Mapped,
    connection_type="dynamodb",
    connection_options={
        "dynamodb.region": "",
        "dynamodb.output.tableName": "",
        "dynamodb.throughput.write.percent": "1.0",
    },
)

4. To load the data to the target table, run the job from AWS Glue Studio or from the Jobs page on the AWS Glue console.
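If you prefer to start the job programmatically rather than from the console, a minimal boto3 sketch might look like this; the job name is a hypothetical placeholder.

import boto3

glue = boto3.client("glue", region_name="us-east-1")  # assumed Region

run = glue.start_job_run(JobName="copy-dynamodb-table")  # hypothetical job name
state = glue.get_job_run(JobName="copy-dynamodb-table", RunId=run["JobRunId"])
print(state["JobRun"]["JobRunState"])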

Advantages

• It's a serverless solution.
• The solution is efficient for large datasets because the export feature uses the DynamoDB backup functionality, so it does not do a scan on the source table.
• It does not consume any provisioned capacity on the source table.
• There's no impact on the performance or availability of the source table.

Drawbacks

• Additional AWS services, such as Amazon S3 and AWS Glue, are required.
• Schema changes affect repeated use of this solution. If the schema changes, you must run the AWS Glue crawler again to incorporate the changes in the Data Catalog table, and you must recreate the AWS Glue Studio job to regenerate the key name and datatype mappings.


Using Amazon EMR

This solution is similar to the Data Pipeline solution, in that Data Pipeline uses Amazon EMR clusters behind the scenes for the job. An EMR cluster in the source account reads from the source Amazon DynamoDB table and writes to a destination S3 bucket. An EMR cluster in the target account reads from the destination S3 bucket and writes to the target DynamoDB table.

To replicate DynamoDB tables using this approach, EMR clusters configured with Apache Hive must be launched in both the source and target accounts. Both EMR clusters must be configured with read/write permissions for the destination S3 bucket.
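As a rough sketch, the source-side cluster could be launched and handed a Hive script with boto3 as follows. The release label, instance types, roles, and script location are all assumptions, and the Hive script itself (which creates the external table over the S3 location and copies the data) is not shown.

import boto3

emr = boto3.client("emr", region_name="us-east-1")  # assumed Region

emr.run_job_flow(
    Name="dynamodb-table-copy",  # hypothetical cluster name
    ReleaseLabel="emr-6.3.0",    # assumed release
    Applications=[{"Name": "Hive"}],
    Instances={
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 3,
        "KeepJobFlowAliveWhenNoSteps": False,  # terminate when the step finishes
    },
    Steps=[
        {
            "Name": "copy-dynamodb-to-s3",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                # Runs a Hive script stored in S3 (placeholder path).
                "Args": ["hive-script", "--run-hive-script", "--args",
                         "-f", "s3://my-scripts/copy-dynamodb.q"],
            },
        }
    ],
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)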

Advantages

• The solution provides more options for customization and more control over the data migration process.

Drawbacks

• The process is more involved, because it requires running Hive queries on the source and the target and creating an external table on the S3 location to contain the data.
• It requires setting up the clusters and terminating them after the job completes.


Using a custom implementation in .NET or Java with AWS SDKs

Instead of relying on other AWS services to perform a cross-account table copy, you can build a custom solution in .NET, Java, Python, or another supported programming language. AWS provides SDKs in multiple languages that allow programmatic access to AWS services. This solution requires hands-on development experience in the language that you use.

You can create a console app (or a new API endpoint, if you are working on a web API) that can be invoked to perform full table copy. The custom solution should perform the following steps:

1. Delete the DynamoDB table in the target account.
2. Create the DynamoDB table (with on-demand capacity) and its indexes in the target account. Alternatively, you can use provisioned capacity mode and set the RCUs and WCUs according to your needs.
3. Copy data from the source account to the target account, using the DynamoDB batch write operation in the AWS SDK to reduce the number of service calls to DynamoDB. A sketch of this step follows the list.
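Here is a minimal Python sketch of step 3. The profile names and table names are placeholders, and error handling and resume logic are omitted.

import boto3

# Hypothetical named profiles for the two accounts; any credential
# mechanism (for example, STS AssumeRole) would work equally well.
source = boto3.Session(profile_name="account-a").resource("dynamodb")
target = boto3.Session(profile_name="account-b").resource("dynamodb")

source_table = source.Table("SourceTable")  # placeholder names
target_table = target.Table("TargetTable")

# Scan the source table page by page and batch-write each page to the
# target; batch_writer batches the writes and retries unprocessed items.
scan_kwargs = {}
with target_table.batch_writer() as writer:
    while True:
        page = source_table.scan(**scan_kwargs)
        for item in page["Items"]:
            writer.put_item(Item=item)
        if "LastEvaluatedKey" not in page:
            break
        scan_kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]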

This solution best suits DynamoDB tables that are small in size (less than 500 MB).

For a DynamoDB table with 200 K items (average item size 30 KB and table size of 250 MB), this solution, including table creation and data population, takes about 5 minutes:

• Capacity mode – Provisioned, with 4000 RCUs and 4000 WCUs
• Capacity units consumed – 30 K RCUs and approximately 400 K WCUs

Advantages

• The solution doesn’t depend on any AWS service other than DynamoDB, so there is less maintenance overhead.


• The solution can be made serverless by using an AWS Lambda function to run it. However, the runtime must be 15 minutes or less.

Drawbacks

• The solution consumes more RCUs and WCUs.
• It might not be a good solution for large datasets, because it requires active connections to two different DynamoDB tables in two different accounts (using two different security tokens). If the table copy for a large dataset takes a long time, connection disruptions or security token expiry can occur, so you must implement logic to handle those possibilities and to resume the copy from the point of failure.

For more information, see the Copy Amazon DynamoDB tables across accounts using a custom implementation pattern.


Using AWS Lambda and Python

This solution is similar to the .NET custom implementation solution. However, because this approach uses AWS Lambda, it's a serverless solution. The solution can read directly from the source DynamoDB table and write directly to the target DynamoDB table, or it can use the DynamoDB export feature. Using the export feature requires additional logic to convert data in a compressed file format to JSON items before the data can be added to the target table using the DynamoDB BatchWriteItem operation.

This solution works best for DynamoDB tables that are smaller than 500 MB.
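To illustrate the export-based variant, here is a minimal Lambda handler sketch that converts one gzipped export file into items and batch-writes them. The event shape and table name are hypothetical, and the DynamoDB JSON export format is assumed to contain one {"Item": ...} object per line.

import gzip
import json

import boto3
from boto3.dynamodb.types import TypeDeserializer

s3 = boto3.client("s3")
table = boto3.resource("dynamodb").Table("TargetTable")  # placeholder name
deserializer = TypeDeserializer()

def handler(event, context):
    # Hypothetical event shape: the bucket and key of one export data file.
    obj = s3.get_object(Bucket=event["bucket"], Key=event["key"])
    lines = gzip.decompress(obj["Body"].read()).splitlines()
    with table.batch_writer() as writer:
        for line in lines:
            # Each line holds one item in DynamoDB JSON (typed) form;
            # convert it to native Python types before writing.
            typed_item = json.loads(line)["Item"]
            item = {k: deserializer.deserialize(v) for k, v in typed_item.items()}
            writer.put_item(Item=item)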

Advantages

• It’s a serverless solution. • When the export feature is used, the solution does not consume any provisioned throughput on the source table.

Drawbacks

• When reading and writing directly, the solution consumes provisioned throughput on both the source and target tables, so it can affect performance and availability.
• An additional AWS service, Lambda, is required, and there is additional code to manage.
• Lambda has a runtime limit of 15 minutes.


Using AWS Glue with Amazon DynamoDB as source and sink

This solution is more basic than the one that uses the Amazon DynamoDB export feature, and it can be used for smaller datasets. It reads directly from the source table and writes directly to the target table, so it doesn't require intermediate storage in Amazon S3 and doesn't need to infer the source schema.

The solution requires creating an AWS Glue job with the source DynamoDB table as the source and the target DynamoDB table as the sink.
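A minimal PySpark sketch of such a job follows. The table names are placeholders, and the dynamodb.sts.roleArn option, shown as one way to reach the table in the other account, is an assumption to adapt to your cross-account setup.

from awsglue.context import GlueContext
from pyspark.context import SparkContext

glueContext = GlueContext(SparkContext.getOrCreate())

# Read directly from the source table in the other account.
source = glueContext.create_dynamic_frame_from_options(
    connection_type="dynamodb",
    connection_options={
        "dynamodb.input.tableName": "SourceTable",  # placeholder
        "dynamodb.throughput.read.percent": "0.5",
        # Assumed cross-account role that grants read access to the source table.
        "dynamodb.sts.roleArn": "arn:aws:iam::111111111111:role/CrossAccountDynamoDBRead",
    },
)

# Write directly to the target table in this account.
glueContext.write_dynamic_frame_from_options(
    frame=source,
    connection_type="dynamodb",
    connection_options={
        "dynamodb.output.tableName": "TargetTable",  # placeholder
        "dynamodb.throughput.write.percent": "1.0",
    },
)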

For more information, see Cross-account replication with Amazon DynamoDB.

Advantages

• It's a serverless solution.
• AWS Glue is the only additional AWS service required, and AWS Glue supports scheduling the ETL jobs.
• Unlike the export solution, this solution does not require keeping up with schema changes.

Drawbacks

• The solution consumes provisioned throughput on the source and the target tables, which can affect performance and availability.


Next steps

Now that you have a high-level view of the different options for copying full tables into different AWS accounts, you can evaluate your data and choose the option that best meets your needs. There are multiple factors to consider when comparing costs, including the following:

• Table size
• Number of RCUs and WCUs required
• Frequency of data copy
• AWS service used
• Duration of AWS service usage

You can use the AWS Pricing Calculator to estimate costs for each option.


Resources

• Exporting and importing DynamoDB data using AWS Data Pipeline
• Cross-account replication with Amazon DynamoDB
• How can I migrate my DynamoDB tables from one AWS account to another?
• Reading and writing in batches in DynamoDB
• Copy Amazon DynamoDB tables across accounts using a custom implementation (AWS Prescriptive Guidance pattern)


Document history

The following table describes significant changes to this guide. If you want to be notified about future updates, you can subscribe to an RSS feed.

Change | Description | Date
– | Initial publication | August 25, 2021
