Top 10 Data Migration Best Practices
Chris Rogers, Global Storage Business Development
Jeff Bartley, Storage Specialist SA
© 2020, Amazon Web Services, Inc. or its Affiliates.

Agenda
• Migration basics
• Top 10 best practices
• The right tool for the job
• Planning your migration
• Transferring data
• Wrap up / Q&A
Stages of Cloud Adoption for Enterprise Transformation
[Chart: value delivered over time, rising through the stages Project, Foundation, Migration (legacy applications and data), and Modernization (cloud native), toward continuous reinvention]
Common Migration Drivers
• Agility/developer productivity
• Data center consolidation
• Digital transformation
• Cost reduction
• Acquisitions or divestitures
• Large-scale compute-intensive workloads
• Facility or real estate decisions
• Colocation or outsourcing contract changes
Migration Business Outcomes
• Agility: build and operate your foundation for innovation
• Operational efficiency: obtain substantial cost savings, freeing up resources to focus on what differentiates your business
• Reduced risk: migrate through a secure, proven approach that reduces IT risk by moving to a more resilient IT model
Top 10 Data Migration Best Practices

Section 1: The right tool for the job
Best practice #1: Know your data
Choose the right tool for the job:
• Virtual machines (VMs) → CloudEndure
• Databases → AWS Database Migration Service
• Unstructured data/file data → AWS DataSync, the AWS Snow Family, etc.
Best practice #2: Migrate virtual machines with CloudEndure
CloudEndure continuously replicates any application or database from any source into AWS.
Business outcome: self-service, rapid, reliable migrations with minimal business disruption.
[Diagram: the CloudEndure agent on source machines (e.g., Oracle databases, SQL Server, Linux workloads) in a corporate data center or any cloud handshakes with the CloudEndure user console, which calls APIs to create the staging area and launch target machines. Continuous, real-time replication traffic (compressed and encrypted) flows to lightweight EC2 replication servers and staging EBS volumes in a staging-area subnet of the target region. Orchestration and system conversion then launch target EC2 instances and EBS volumes in the target subnet, ready to run in minutes.]
Best practice #3: Migrate databases with AWS Database Migration Service (DMS)
Migrating databases to AWS:
• Migrate between on-premises and AWS
• Migrate between databases
• Automated schema conversion
• Data replication for migration with zero downtime
100,000+ databases migrated
Section 2: Planning your migration

Best practice #4: Understand available bandwidth
Usable network bandwidth (assumes ~25% network overhead):

Data size   100 Mbps    1 Gbps      10 Gbps
1 TB        30 hours    3 hours     18 minutes
10 TB       12 days     30 hours    3 hours
100 TB      124 days    12 days     30 hours
1 PB        3 years     124 days    12 days
10 PB       34 years    3 years     124 days
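The table values follow from simple arithmetic. A minimal sketch that reproduces them, assuming the same ~25% network overhead (so ~75% of the link speed is usable):

```python
# Back-of-the-envelope transfer-time estimate, assuming ~25% network
# overhead as in the table above (75% of link speed is usable).
def transfer_hours(data_tb: float, link_mbps: float, efficiency: float = 0.75) -> float:
    """Hours to move data_tb terabytes over a link_mbps link."""
    bits = data_tb * 1e12 * 8                  # decimal TB -> bits
    usable_bps = link_mbps * 1e6 * efficiency  # usable bits per second
    return bits / usable_bps / 3600

print(round(transfer_hours(1, 100)))    # ~30 hours for 1 TB at 100 Mbps
print(round(transfer_hours(10, 1000)))  # ~30 hours for 10 TB at 1 Gbps
```

Real-world throughput also depends on latency, protocol overhead, and source read speed, so treat these numbers as a lower bound on planning time.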
Best practice #5: Assess operational impact of migration
AWS Snowball Edge (data migration and edge compute)
Run a proof of concept (POC):
• Early discovery and remediation of environmental issues
• Sets more realistic migration timelines
• Deploy staging workstations
• Ensure low network latencies (<1 ms)
• Favor larger files (>5 MB)
• Benchmark and optimize data transfer (target 300-500 MBps)
Plan devices and scheduling with your account team/TAM before ordering jobs.
Resources:
• Whitepaper: AWS Snowball Edge data migration guide
• Blog: Data migration best practices with Snowball Edge
Best practice #5: Assess operational impact of migration
AWS DataSync
• Every part of the network is critical
• Bottlenecks are a moving target
• The WAN might not be the biggest bottleneck
• Source system configuration dictates read performance
Best practice #6: Know your data profile
How much data? How many files?
AWS Snowball
[Diagram: files from the data source are grouped into batches at a workstation, then copied batch by batch onto the AWS Snowball Edge device]
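Batching many small files into archives before copying to the device reduces per-file overhead. A minimal sketch of the batching step using only the standard library (the ~1 GiB batch size and the naming scheme are illustrative assumptions, not a prescribed value):

```python
import tarfile
from pathlib import Path

def batch_files(src_dir: str, out_dir: str, batch_bytes: int = 1 << 30):
    """Pack files from src_dir into roughly batch_bytes-sized tar batches."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    batch, size, n = [], 0, 0
    for f in sorted(Path(src_dir).rglob("*")):
        if not f.is_file():
            continue
        batch.append(f)
        size += f.stat().st_size
        if size >= batch_bytes:
            _write_batch(batch, Path(out_dir) / f"batch-{n:05d}.tar", src_dir)
            batch, size, n = [], 0, n + 1
    if batch:  # flush the final partial batch
        _write_batch(batch, Path(out_dir) / f"batch-{n:05d}.tar", src_dir)

def _write_batch(files, tar_path, src_dir):
    # Uncompressed tar: the goal is fewer objects to copy, not smaller ones.
    with tarfile.open(tar_path, "w") as tar:
        for f in files:
            tar.add(f, arcname=str(f.relative_to(src_dir)))
```

The resulting batches can then be copied to the device from the staging workstation, trading many small transfers for a few large sequential ones.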
Best practice #6: Know your data profile
How much data? How many files?
AWS DataSync
[Diagram: files flow from the data source through the DataSync agent to AWS over the network]
Best practice #6: Know your data profile
Partitioning large data sources:
• Snowball Edge device: ~80 TiB usable capacity per device
• DataSync task: 50 million files per task, 1 mount point per task
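These per-unit limits translate directly into a device/task count for a given dataset. A sketch of the arithmetic, using the figures from this slide (the example dataset sizes are illustrative):

```python
import math

# Per-unit limits from the slide: ~80 TiB usable per Snowball Edge
# device, 50 million files per DataSync task.
SNOWBALL_USABLE_TIB = 80
DATASYNC_FILES_PER_TASK = 50_000_000

def snowball_devices_needed(total_tib: float) -> int:
    """Snowball Edge devices needed for total_tib of data."""
    return math.ceil(total_tib / SNOWBALL_USABLE_TIB)

def datasync_tasks_needed(total_files: int) -> int:
    """DataSync tasks needed for total_files files."""
    return math.ceil(total_files / DATASYNC_FILES_PER_TASK)

print(snowball_devices_needed(300))        # 300 TiB -> 4 devices
print(datasync_tasks_needed(120_000_000))  # 120M files -> 3 tasks
```

Each partition also needs its own mount point for DataSync, so the directory layout of the source should be considered alongside the raw counts.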
Best practice #7: Scale out as needed
Scale resources to handle large datasets with multiple Snowball Edge devices; plan for:
• Device availability
• Operational overhead
• Infrastructure requirements
Best practice #7: Scale out as needed
Scale resources to handle large datasets with multiple DataSync agents; plan for:
• One task per agent
• Source storage impact
• Bandwidth throttling
Best practice #8: Consider the data source
How will a data transfer impact source storage?
• Is the storage system healthy?
• Are permissions available to access all data?
• Can the storage support scale-out access?
• What is the rate of change of the data?
• Are there sufficient source resources to maintain production workloads?
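The rate-of-change question can be answered with arithmetic: if the data churns faster than incremental syncs can move it, the final cutover never converges. A rough feasibility check (the example numbers are illustrative assumptions):

```python
# Does incremental sync capacity exceed daily data churn?
# Assumes sustained throughput at the given usable link speed.
def sync_converges(change_gb_per_day: float, usable_mbps: float) -> bool:
    """True if one day of sync capacity exceeds one day of data change."""
    daily_capacity_gb = usable_mbps * 1e6 / 8 * 86400 / 1e9  # Mbps -> GB/day
    return daily_capacity_gb > change_gb_per_day

print(sync_converges(500, 100))   # 100 Mbps moves ~1,080 GB/day -> True
print(sync_converges(2000, 100))  # 2 TB/day of churn -> False
```

When the check fails, the options are more bandwidth, partitioning the churning data separately, or an offline transfer followed by a shorter incremental catch-up.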
Section 3: Transferring data (finally)

Best practice #9: Preserving metadata
What is metadata?
• File ownership
• Permissions
• Time stamps
• File system attributes
Workloads that need metadata:
• Data protection
• Migration to cloud file systems
• On-premises access with File Gateway
• Access from Amazon FSx for Lustre
Best practice #10: Validate your assumptions
Do a test run to validate your plan:
• Verify you can read and batch data (if needed)
• Verify source performance
• Verify your network works as expected
• Verify service configuration and other settings
• Validate timeframe expectations
Best practice #11 (bonus): Verify data transfer
• Verification ensures migrated data matches the source
• Critical for medical records, financial transactions, analytics datasets, etc.
• Plan time for verifying data
Wrap Up

Best practices for data migration
The right tool for the job:
✓ #1: Know your data
✓ #2: Migrate VMs with CloudEndure
✓ #3: Migrate databases with DMS
Planning your migration:
✓ #4: Understand available bandwidth
✓ #5: Assess operational impact of migration
✓ #6: Know your data profile
✓ #7: Scale out as needed
✓ #8: Consider the data source
Transferring data:
✓ #9: Preserving metadata
✓ #10: Validate your assumptions
✓ Bonus #11: Verify data transfer
AWS services covered
• AWS DataSync: move data over the network between on-premises storage and AWS. Learn more: https://aws.amazon.com/datasync
• CloudEndure: migrate live applications and databases to AWS. Learn more: https://aws.amazon.com/cloudendure-migration
• AWS Snow Family: offline transfer of large amounts of data into and out of AWS. Learn more: https://aws.amazon.com/snow
• AWS Database Migration Service: migrate your databases to AWS with minimal downtime. Learn more: https://aws.amazon.com/dms
Q&A
Chris Rogers, Global Storage Business Development
Jeff Bartley, Storage Specialist SA

Thank you!