STG302-R Best Practices for Amazon S3

Rob Wilson, Senior Product Manager – Technical, Amazon S3, Amazon Web Services
Shikha Sukumaran, Software Development Manager, Amazon S3, Amazon Web Services
Matt Wheeler, Site Reliability Engineer, Data & Analytics, Instructure

© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Agenda
Amazon Simple Storage Service (S3) overview
Storage classes, cost optimization, and performance
Security
Managing objects at scale
Customer speaker
Data protection
Monitoring and visibility

Related breakouts
[STG203] What's new with Amazon S3 and Amazon S3 Glacier
[STG212] Managing your data at scale with Amazon S3 storage management tools
[STG301] Deep dive on Amazon S3 security and management
[STG331] Beyond eleven nines: Lessons from the Amazon S3 culture of durability
[STG332] Guidelines and design patterns for optimizing cost in Amazon S3
[STG343] Optimize your storage performance with Amazon S3

Benefits of Amazon S3
[Figure: word cloud of Amazon S3 use cases, including data lakes, analytics, durable backups, origin storage for CDN, website hosting, log files, media assets, media master files, Internet of Things (IoT) sensor data, ML training data, medical imagery, geospatial or lunar imagery, DNA sequences, pharmaceutical study data, financial transaction records, seismic and reservoir simulation data, meteorological and environmental research, autonomous vehicle data, user-generated content, mobile sync and storage, home-recording video, and surveillance video/closed-circuit television]

Amazon S3 has more options for data transfer
• AWS Direct Connect
• Amazon Kinesis Data Firehose
• Amazon Kinesis Data Streams
• Amazon Kinesis Video Streams
• Amazon S3 Transfer Acceleration
• AWS Storage Gateway
• AWS SFTP
• AWS Snowball
• AWS Snowball Edge
• AWS Snowmobile
• AWS DataSync

Amazon S3 storage classes
Optimize your storage cost by utilizing all Amazon S3 storage classes
Decreasing storage prices
[Chart, 2006 to 2019: >80% lower price per GB]

Accelerating innovation
[Timeline, 2006 to 2019]
• S3 Standard (2006)
• S3 Glacier (2012)
• S3 Standard-IA (2015)
• S3 One Zone-IA (1H-2018)
• S3 Intelligent-Tiering (2H-2018)
• S3 Glacier Deep Archive (2019)

Your choice of Amazon S3 storage classes
Access frequency, from frequent to infrequent:
• S3 Standard: active, frequently accessed data; milliseconds access; ≥ 3 AZ; $0.0210/GB
• S3 Intelligent-Tiering: data with changing access patterns; milliseconds access; ≥ 3 AZ; $0.0210 to $0.0125/GB
• S3 Standard-IA: infrequently accessed data; milliseconds access; ≥ 3 AZ; $0.0125/GB
• S3 One Zone-IA: re-creatable, less accessed data; milliseconds access; 1 AZ; $0.0100/GB
• S3 Glacier: archive data; minutes or hours access; ≥ 3 AZ; $0.0040/GB
• S3 Glacier Deep Archive: archive data; hours to access; ≥ 3 AZ; $0.00099/GB

Amazon S3 Intelligent-Tiering
Automatic cost optimization with no performance impact and no operational overhead

Amazon S3 Intelligent-Tiering automates cost savings
• Automatically optimizes storage costs for data with changing access patterns
• Stores objects in two access tiers, optimized for frequent and infrequent access
• Monitors access patterns and optimizes cost on granular object level
• No performance impact, no operational overhead, no retrieval fees
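As a rough worked example of the two access tiers, the sketch below estimates a blended storage price for S3 Intelligent-Tiering from the $0.0210/GB and $0.0125/GB figures shown in the storage class comparison. The function names and the idea of a fixed "infrequent fraction" are assumptions made for illustration, and the calculation covers storage only; any other charges are deliberately left out.

```python
# Per-GB storage prices taken from the storage class comparison in this deck.
FREQUENT_TIER_PRICE = 0.0210    # $/GB, upper end of the Intelligent-Tiering range
INFREQUENT_TIER_PRICE = 0.0125  # $/GB, lower end of the Intelligent-Tiering range


def blended_price_per_gb(infrequent_fraction: float) -> float:
    """Storage price per GB when `infrequent_fraction` of the data sits in
    the infrequent access tier (hypothetical helper, storage charges only)."""
    if not 0.0 <= infrequent_fraction <= 1.0:
        raise ValueError("infrequent_fraction must be between 0 and 1")
    frequent_fraction = 1.0 - infrequent_fraction
    return (frequent_fraction * FREQUENT_TIER_PRICE
            + infrequent_fraction * INFREQUENT_TIER_PRICE)


def storage_cost(total_gb: float, infrequent_fraction: float) -> float:
    """Total storage charge in dollars for `total_gb` of data."""
    return total_gb * blended_price_per_gb(infrequent_fraction)


if __name__ == "__main__":
    for fraction in (0.0, 0.5, 1.0):
        cost = storage_cost(1000, fraction)
        print(f"{fraction:.0%} infrequent: ${cost:.2f} per 1,000 GB")
```

The more of the data that goes unaccessed, the closer the blended price moves toward the infrequent-tier figure, which is the saving the slide describes as automatic.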
Customers of all sizes and in virtually every industry use S3 Intelligent-Tiering and save automatically

S3 Storage Class Analysis
Provides lifecycle policy recommendations based on access patterns
• Monitors access patterns
• Classifies data as frequently or infrequently accessed
• Can be filtered by bucket, prefix, or object tag

Lifecycle policies use rules to manage your storage
Use lifecycle policies to transition objects to another storage class
Lifecycle rules take action based on object age. Here’s an example:
1. Move objects older than 30 days to S3 Standard-Infrequent Access
2. Move objects older than 365 days to S3 Glacier Deep Archive
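The two-step example above could be written as a lifecycle configuration roughly like the following. This is a minimal sketch: the dictionary follows the shape accepted by the S3 lifecycle API, but the rule ID, the empty-prefix filter (meaning the whole bucket), and the `expected_storage_class` helper are assumptions added here to make the rule's effect easy to check locally.

```python
from typing import Dict, List


def build_lifecycle_configuration() -> Dict[str, List[dict]]:
    """Lifecycle configuration mirroring the 30-day / 365-day example."""
    return {
        "Rules": [
            {
                "ID": "tier-down-by-age",    # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},    # empty prefix: every object in the bucket
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    }


def expected_storage_class(config: dict, age_days: int) -> str:
    """Local illustration only: the class an object of this age should have
    reached under a single-rule configuration. S3 applies rules asynchronously."""
    storage_class = "STANDARD"
    for rule in config["Rules"]:
        if rule["Status"] != "Enabled":
            continue
        for transition in sorted(rule["Transitions"], key=lambda t: t["Days"]):
            if age_days >= transition["Days"]:
                storage_class = transition["StorageClass"]
    return storage_class


if __name__ == "__main__":
    config = build_lifecycle_configuration()
    for age in (10, 45, 400):
        print(age, "days ->", expected_storage_class(config, age))
    # With boto3 installed and credentials configured, the same dict could be
    # passed as LifecycleConfiguration to
    # boto3.client("s3").put_bucket_lifecycle_configuration(Bucket=..., ...).
```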
[Diagram: S3 Standard, then S3 Standard-Infrequent Access, then S3 Glacier Deep Archive]

Object tags work with lifecycle policies
Perform automated actions on a subset of your data with object tags
• Specify a tag filter to transition or expire objects
• Use S3 batch operations to apply object tags at scale
• Example: transition all objects tagged “Project : Delta” to S3 Glacier

Use lifecycle policies with object tag filters
Object tag filters simplify lifecycle policies when the same action needs to be performed across multiple prefixes in the bucket
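A tag-filtered rule can be sketched as below. The `Filter`/`Tag` structure follows the S3 lifecycle rule format, while the helper names, the generated rule ID, and the 30-day delay in the demo are assumptions; the slides give the "Project : Delta" tag and the S3 Glacier target but no number of days.

```python
def tag_filtered_rule(tag_key: str, tag_value: str, days: int,
                      storage_class: str = "GLACIER") -> dict:
    """Build one lifecycle rule that applies to objects carrying a given tag,
    wherever they sit in the bucket (hypothetical helper)."""
    return {
        "ID": f"transition-{tag_key}-{tag_value}".lower(),  # hypothetical naming
        "Status": "Enabled",
        "Filter": {"Tag": {"Key": tag_key, "Value": tag_value}},
        "Transitions": [{"Days": days, "StorageClass": storage_class}],
    }


def rule_matches(rule: dict, object_tags: dict) -> bool:
    """Local check of whether an object's tag set satisfies the rule's filter."""
    tag = rule["Filter"]["Tag"]
    return object_tags.get(tag["Key"]) == tag["Value"]


if __name__ == "__main__":
    rule = tag_filtered_rule("Project", "Delta", days=30)  # 30 is an arbitrary choice
    print(rule)
    print(rule_matches(rule, {"Project": "Delta", "Owner": "team-a"}))
    print(rule_matches(rule, {"Project": "Gamma"}))
```

Because the filter is keyed on the tag rather than on a key prefix, a single rule covers matching objects under any prefix, which is the simplification the slide points to.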
Use the latest version of the AWS SDKs to automatically see performance improvements from:
• Automatic retries
• Handling timeouts
• Parallelized uploads and downloads with TransferManager

See the Optimizing Amazon S3 Performance whitepaper to learn more about:
• Scaling horizontally for more throughput
• Caching data
• Using Amazon S3 Transfer Acceleration for faster data transfer
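One way to picture parallelized downloads is to split an object into byte ranges and fetch the ranges concurrently. The sketch below is not the SDK's TransferManager; it is a standard-library illustration in which `fetch_range` stands in for a ranged GET request, and all function names and the in-memory "object" are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def byte_ranges(object_size: int, part_size: int) -> List[Tuple[int, int]]:
    """Split [0, object_size) into inclusive (start, end) byte ranges."""
    if object_size < 0 or part_size <= 0:
        raise ValueError("object_size must be >= 0 and part_size must be > 0")
    return [(start, min(start + part_size, object_size) - 1)
            for start in range(0, object_size, part_size)]


def range_header(start: int, end: int) -> str:
    """HTTP Range header value for one part."""
    return f"bytes={start}-{end}"


def parallel_download(fetch_range: Callable[[int, int], bytes],
                      object_size: int, part_size: int,
                      max_workers: int = 4) -> bytes:
    """Fetch all parts concurrently and reassemble them in order."""
    ranges = byte_ranges(object_size, part_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the input ranges.
        parts = list(pool.map(lambda r: fetch_range(r[0], r[1]), ranges))
    return b"".join(parts)


if __name__ == "__main__":
    data = bytes(range(256)) * 10  # stand-in for an object stored in S3

    def fake_fetch(start: int, end: int) -> bytes:
        # A real implementation would issue a GET with range_header(start, end).
        return data[start:end + 1]

    result = parallel_download(fake_fetch, len(data), part_size=1000)
    print(len(result), result == data)
```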
Learn more about how to optimize performance in S3 this week: STG343, STG229, STG320

Security is at the core of everything we do

Data stored in Amazon S3 is secure by default
• Amazon S3 Block Public Access
• Encrypt data by default in Amazon S3
• Encryption status in Amazon S3 inventory
• Bucket permission checks
• Free checks with AWS Trusted Advisor

Layers of access control
Resource-based
• Object Access Control Lists (ACLs)
• Bucket Access Control Lists (ACLs)
• Bucket policies

User-based
• Identity and Access Management (IAM) policies

Amazon S3 recommends using bucket policies and IAM policies

Amazon S3 Block Public Access
• Can be applied to accounts or buckets
• Four security settings to deny public access
• Use AWS Organizations Service Control Policies (SCPs) to prevent settings changes

[Slide: Amazon S3 Block Public Access settings]

Amazon S3 default encryption
• One-time bucket-level setup
• Automatically encrypts all new objects
• Simplified compliance
• Supports SSE-S3 and SSE-KMS

Provides Amazon S3 encryption-at-rest support for applications that do not otherwise support encrypting data in Amazon S3

Access Analyzer for Amazon S3 buckets new!
Continuous analysis
• Continuously monitors and automatically analyzes resources
• Surfaces buckets with public & shared access in the S3 Management Console

Provides insights
• Drilldown into source and level of public and shared access
• Know exactly where and what remediation actions to apply

Swift remediation
• Lock public buckets down with a single click
• Acknowledge shared access as intended

Introducing Amazon S3 Access Points new!
Amazon S3 Access Points simplify access control for large, shared buckets such as data lakes
Every application that interacts with a multi-tenant bucket can have a dedicated access point with custom permissions
Amazon S3 Access Points can be set to only allow access from a Virtual Private Cloud (VPC)
VPC access points do not allow requests from the Internet. S3 restricts request traffic to the specified VPC

What is an Amazon S3 Access Point?
A new S3 resource with a hostname, an ARN, and an IAM resource policy
▪ Applications use Access Points to access objects in a bucket
▪ Access Points can be limited to a specified VPC
▪ Access Points have an Access Point-specific Block Public Access setting
▪ Access Point names live in a private namespace that is unique to an account and the region
▪ Access Point ARNs and hostnames have the account ID and region embedded in them
Bucket (mycomdata), with a Bucket Policy
Bucket hostname: mycomdata.s3.us-west-1.amazonaws.com

Access Point ap1, with an Access Point policy
AP hostname: ap1-123456789012.s3-accesspoint.us-west-1.amazonaws.com
ARN: arn:aws:s3:us-west-1:123456789012:accesspoint/ap1

Access Point ap2, with an Access Point policy
AP hostname: ap2-123456789012.s3-accesspoint.us-west-1.amazonaws.com
ARN: arn:aws:s3:us-west-1:123456789012:accesspoint/ap2

Accessing objects in Amazon S3: previously
All users would access objects directly through the bucket using the bucket hostname
[Diagram: Administrator, Role1, and Role2 each reach Bucket (mycomdata) through the bucket hostname mycomdata.s3.us-west-1.amazonaws.com]

Use case: simplify access control for shared buckets
Now, we can grant custom access to multiple teams using Access Points
Access Point policies can establish granular control within limits enforced by the bucket policy
• Finance AP: policy grants Finance read/write access to Finance data
• Sales AP: policy grants Sales read/write access to Sales data
• Supply AP: policy grants Supply read/write access to Supply data
• Data Science AP: policy grants read access to data tagged in bucket
• Bucket: Bucket Policy

Use case: enforce VPC only data access for a bucket
Access Points can be configured to limit access to a specified VPC only
▪ Create an AWS Organizations Service Control Policy to enforce VPC-only access points for applications using the bucket
▪ Data access through the bucket directly is disabled (enforced through bucket policy)
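The two controls in this use case can be sketched as policy documents built in Python. This is an illustration under assumptions: the helper names are hypothetical, and the condition keys used (`s3:AccessPointNetworkOrigin` and `s3:DataAccessPointAccount`) are the ones I understand S3 to provide for Access Points, so they should be checked against current AWS documentation before use. The hostname and ARN helpers simply reproduce the formats shown on the earlier Access Point slides.

```python
import json


def access_point_arn(region: str, account_id: str, name: str) -> str:
    """ARN format as shown for ap1/ap2 in this deck."""
    return f"arn:aws:s3:{region}:{account_id}:accesspoint/{name}"


def access_point_hostname(name: str, account_id: str, region: str) -> str:
    """Hostname format as shown for ap1/ap2 in this deck."""
    return f"{name}-{account_id}.s3-accesspoint.{region}.amazonaws.com"


def vpc_only_scp() -> dict:
    """Service Control Policy sketch: deny creating any Access Point whose
    network origin is not a VPC."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Deny",
            "Action": "s3:CreateAccessPoint",
            "Resource": "*",
            "Condition": {
                "StringNotEquals": {"s3:AccessPointNetworkOrigin": "VPC"}
            },
        }],
    }


def delegate_to_access_points(bucket: str, account_id: str) -> dict:
    """Bucket policy sketch: allow requests only when they arrive through an
    Access Point owned by `account_id`, leaving fine-grained control to the
    Access Point policies."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": "*"},
            "Action": "*",
            "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            "Condition": {
                "StringEquals": {"s3:DataAccessPointAccount": account_id}
            },
        }],
    }


if __name__ == "__main__":
    print(access_point_hostname("ap1", "123456789012", "us-west-1"))
    print(access_point_arn("us-west-1", "123456789012", "ap1"))
    print(json.dumps(vpc_only_scp(), indent=2))
    print(json.dumps(delegate_to_access_points("mycomdata", "123456789012"), indent=2))
```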
Bucket (mycomdata) and its Access Points:
Internet: bucket hostname mycomdata.s3.us-west-1.amazonaws.com
VPC 67890: Access Point hostname ap1-123456789012.s3-accesspoint.us-west-1.amazonaws.com
VPC 12345: Access Point hostname ap2-123456789012.s3-accesspoint.us-west-1.amazonaws.com

Amazon S3 inventory
A managed alternative to using the LIST API
Regularly generates a list of objects for analytics and auditing.
• Storage class
• Creation date
• Encryption status
• Replication status
• Object size, and more
• S3 Intelligent-Tiering access tier new!

Use Amazon Athena to filter S3 inventory reports
This query selects bucket, object key, version id for unencrypted objects
select s._1, s._2, s._3 from s3object s where s._6 = 'NOT-SSE'
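For a local illustration of what this filter does, the same selection can be applied to an inventory report in CSV form with Python's standard library. This is a sketch, not a replacement for the query: the function name and sample rows are hypothetical, and it assumes, like the query, that the encryption status is the sixth column of the report.

```python
import csv
import io
from typing import Iterable, List, Tuple


def unencrypted_objects(inventory_lines: Iterable[str],
                        encryption_column: int = 5) -> List[Tuple[str, str, str]]:
    """Return (bucket, key, version_id) for rows whose encryption status is
    NOT-SSE. Column positions are zero-based and depend on how the inventory
    report was configured."""
    selected = []
    for row in csv.reader(inventory_lines):
        if len(row) > encryption_column and row[encryption_column] == "NOT-SSE":
            selected.append((row[0], row[1], row[2]))
    return selected


if __name__ == "__main__":
    # Hypothetical report rows; columns 4 and 5 are placeholders.
    sample = io.StringIO(
        '"example-bucket","photos/a.jpg","v1","true","false","NOT-SSE"\n'
        '"example-bucket","photos/b.jpg","v2","true","false","SSE-S3"\n'
        '"example-bucket","photos/c.jpg","v3","true","false","NOT-SSE"\n'
    )
    for bucket, key, version_id in unencrypted_objects(sample):
        print(f"{bucket},{key},{version_id}")
```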
Example results:
batchoperationsdemo,0100059%7Ethumb.jpg,lsrtIxksLu0R0ZkYPL.LhgD5caTYn6vu
batchoperationsdemo,0100074%7Ethumb.jpg,sd2M60g6Fdazoi6D5kNARIE7KzUibmHR
batchoperationsdemo,0100075%7Ethumb.jpg,TLYESLnl1mXD5c4BwiOIinqFrktddkoL

Amazon S3 batch operations new!
Save time when performing one-time or recurring actions at scale
• Manage millions or billions of objects with a single request
• Automatically handles retries, displays progress and generates reports

Operations:
• Replace object tag sets
• Change object ACLs
• Restore objects from Amazon S3 Glacier
• Copy objects
• Run AWS Lambda functions

Amazon S3 batch operations
Choose objects
• S3 Inventory report
• CSV list

Select an operation
• Copy
• Restore from S3 Glacier
• Put Access Control List (ACL)
• Replace object tag sets
• Run AWS Lambda functions

View progress
• Object level progress
• Completion report

Use Amazon S3 batch operations to encrypt objects
Choose objects: S3 Inventory report
• Filter S3 Inventory report with Amazon Athena
• Identify all unencrypted objects

Select an operation: Copy
• Copy objects to the same bucket
• Specify desired encryption type

View progress: Completion report
• Retain completion report of all tasks for object-level visibility

Amazon S3 batch operations and AWS Lambda
Run your custom code across billions of objects in Amazon S3
Manifest selection:
• Specify existing Amazon S3 objects
• Use URL-encoded JSON to pass object-level parameters
• Invoke general purpose AWS Lambda functions

AWS Lambda function:
• Invoke AWS services like Amazon Rekognition
• Use Amazon S3 operations like copy with parameters
• Run your own custom code
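A Lambda function invoked by S3 batch operations receives a list of tasks and returns one result per task. The handler below is a minimal sketch assuming the version 1.0 invocation schema as I understand it (`tasks` entries with `taskId`, `s3Key`, `s3VersionId`, and `s3BucketArn`); `process_object` is a hypothetical placeholder for your own code, and the field names should be confirmed against the current S3 documentation.

```python
from urllib.parse import unquote_plus


def process_object(bucket_arn: str, key: str, version_id) -> str:
    """Placeholder for custom per-object work (hypothetical)."""
    return f"processed {key}"


def lambda_handler(event, context):
    """Handle one S3 batch operations invocation and report per-task results."""
    results = []
    for task in event["tasks"]:
        # Keys arrive URL-encoded in the task, so decode before use.
        key = unquote_plus(task["s3Key"])
        try:
            message = process_object(task["s3BucketArn"], key,
                                     task.get("s3VersionId"))
            code = "Succeeded"
        except Exception as exc:  # report the failure instead of crashing the job
            message = str(exc)
            code = "PermanentFailure"
        results.append({
            "taskId": task["taskId"],
            "resultCode": code,
            "resultString": message,
        })
    return {
        "invocationSchemaVersion": event["invocationSchemaVersion"],
        "treatMissingKeysAs": "PermanentFailure",
        "invocationId": event["invocationId"],
        "results": results,
    }


if __name__ == "__main__":
    sample_event = {  # hypothetical event for local testing
        "invocationSchemaVersion": "1.0",
        "invocationId": "example-invocation",
        "job": {"id": "example-job"},
        "tasks": [{
            "taskId": "example-task",
            "s3Key": "0100059%7Ethumb.jpg",
            "s3VersionId": None,
            "s3BucketArn": "arn:aws:s3:::batchoperationsdemo",
        }],
    }
    print(lambda_handler(sample_event, None))
```

Returning a result for every task lets the job's completion report show object-level outcomes, including failures.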
Common standards, Open APIs, largest and most active community in education
• 2008 – Founded
• 2011 – Production
• 30 million users
• 70 countries
• >10,000 EC2 instances
• Petabytes on S3
• 99.9% uptime
• 1 million concurrent

[Architecture diagram: AWS Cloud, 7+ Regions. Labeled components: The Monolith (x100 clusters), Service X (x50 services), Transports, Customer-facing analytics, Elastic Load Balancer, stateless compute, HTTPS APIs, S3 buckets, PostgreSQL on EC2, Amazon S3, Amazon SQS, Amazon Kinesis, Amazon EMR, Amazon DynamoDB, Amazon RDS]

Canvas is the world’s #1 Learning Management Platform
Open source
Software as a Service
• 50 services
• 40 teams of about 7
• Languages: Ruby, Node, Golang, Scala, Java, Python
• 40+ AWS Accounts

Shard data dumps
• 1 Amazon S3 bucket dedicated to this purpose
• 100s of PostgreSQL clusters
• 1000s of shards (PostgreSQL schemas)
• Each dump of all clusters is >1 million objects and 20 TB
• ~1 PB across all retained dumps

Shard data dumps - Cost optimization
• Lifecycle policies to expire objects after 28 days
• Lifecycle policies to expire incomplete multipart uploads
• Other buckets use lifecycle policies to tier objects down to S3 Standard-Infrequent Access and S3 Glacier

Shard data dumps - Cost optimization
Intentional bucket structure supports surgical access
Prefix | Description
s3://my-bucket/* | All shard dumps
s3://my-bucket/shard_1/* | All dumps for a shard (~tenant)
s3://my-bucket/shard_1/ |

[Diagram: AWS Cloud with AWS Accounts A, X, Y, and Z. Labeled components: Amazon EMR, Amazon S3, Amazon DynamoDB, stateless compute, PostgreSQL on EC2, Amazon Kinesis Data Analytics, and the shard data dumps bucket]

{
  "Effect": "Allow",
  "Principal": {
    "AWS": [
      "arn:aws:iam::111111111111:root",
      "arn:aws:iam::222222222222:root",
      "arn:aws:iam::333333333333:root"
    ]
  },
  "Action": [
    "s3:List*",
    "s3:Get*"
  ],
  "Resource": "arn:aws:s3:::my-bucket/*"
}

Durability
Instructure internal wiki:

Summary
• Lifecycle policies take the toil out of managing storage
• Cost effective storage has enabled easy-to-build architectures
• S3 is a great team boundary

Increasing cost savings with shard data dumps using Amazon S3
Born in the cloud, Instructure builds the Canvas Learning Management System (LMS) for kindergarten through university-level higher education, as well as the Bridge employee development suite for the corporate space

S3 Data Protection capabilities
GOAL | AMAZON S3 AND S3 GLACIER FEATURES
Replicate data for compliance and bad actor protection | Use S3 Replication with Replication Time Control and ownership override
Protect data from accidental deletes | Use bucket versioning while reducing cost with Lifecycle policies
Protect data for governance and compliance purposes | Use S3 Object Lock to store objects as write-once-read-many (WORM)

Amazon S3 Replication
Amazon S3 Replication automatically copies your data to the same or different AWS region NEW!
• Same-Region Replication (SRR)
• Cross-Region Replication (CRR)
[Diagram: Source bucket to Destination bucket]

Amazon S3 Replication
• Select data
• Select a region: satisfy distance and residency requirements
• Change ownership: protect against bad actors & IAM account compromise
• Cross account: protect against AWS root account compromise
• Set storage class: .. or replicate straight to Amazon S3 Glacier

Amazon S3 Replication time control new!
Designed to replicate 99.99% of objects within 15 minutes
• 15 minute replication backed by an AWS Service Level Agreement (SLA)
• Monitor replication time using Amazon CloudWatch metrics and event notifications

Monitor your replication with 3 new CloudWatch metrics
Optional: Set up alarms on your metrics
[Charts, 9:00 to 9:15: Replication Latency (seconds), Bytes Pending Replication (bytes), Operations Pending Replication (count)]
• Replication latency: the maximum number of seconds by which the destination region is behind the source region for a given replication rule
• Bytes pending replication: the total number of bytes of objects pending replication for a given replication rule
• Operations pending replication: the number of operations pending replication for a given replication rule

Enable Amazon S3 bucket versioning
Use versioning to protect your data from accidental deletion
• Create a new version with every upload
• Previous versions are retained, not overwritten
• Making delete requests without a version ID removes access to objects, but keeps the data

Manage previous versions with lifecycle
Transition or expire objects a specified number of days after they are no longer the current version

Use lifecycle policies to expire object versions
Set lifecycle policies to control the cost of noncurrent versions

Use Object Lock to store objects as write-once-read-many (WORM)
• Compliance mode: store compliant data
• Governance mode: store data in WORM format; privileged users can modify retention controls
• Legal hold: if you’re unsure how long you want your objects to stay immutable

Daily storage metrics
Provides storage bytes by storage class and object count

Amazon S3 CloudWatch request metrics
Can be filtered by bucket, prefix, or tagged objects
Metric Name | Value
AllRequests | Count
PutRequests | Count
GetRequests | Count
ListRequests | Count
DeleteRequests | Count
HeadRequests | Count
PostRequests | Count
BytesDownloaded | MB
BytesUploaded | MB
4xxErrors | Count
5xxErrors | Count
FirstByteLatency | ms
TotalRequestLatency | ms

Amazon S3 CloudWatch percentiles metrics new!
Amazon S3 request metrics on any percentile (e.g., p90, p99, p99.9, p100)
• Understand the distribution of Amazon S3 request metrics
• Visualize and alarm on any percentile to identify outliers or unusual application behavior
• Avoid false alarms and save time spent monitoring and tracking requests

Learn storage with AWS Training and Certification
Resources created by the experts at AWS to help you build cloud storage skills
45+ free digital courses cover topics related to cloud storage, including:
• Amazon S3
• AWS Storage Gateway
• Amazon S3 Glacier
• Amazon Elastic File System (Amazon EFS)
• Amazon Elastic Block Store (Amazon EBS)
Classroom offerings, like Architecting on AWS, feature AWS expert instructors and hands-on activities
Visit aws.amazon.com/training/path-storage/

Thank you!
Rob Wilson, Shikha Sukumaran, and Matt Wheeler