
STG302-R Best Practices for S3

Rob Wilson, Senior Product Manager – Technical, Amazon S3, Amazon Web Services
Shikha Sukumaran, Software Development Manager, Amazon S3, Amazon Web Services
Matt Wheeler, Site Reliability Engineer, Data & Analytics, Instructure

© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Agenda

Amazon Simple Storage Service (S3) overview

Storage classes, cost optimization, and performance

Security

Managing objects at scale

Customer speaker

Data protection

Monitoring and visibility

Related breakouts

• [STG203] What's new with Amazon S3 and Amazon S3 Glacier
• [STG212] Managing your data at scale with Amazon S3 storage management tools
• [STG301] Deep dive on Amazon S3 security and management
• [STG331] Beyond eleven nines: Lessons from the Amazon S3 culture of durability
• [STG332] Guidelines and design patterns for optimizing cost in Amazon S3
• [STG343] Optimize your storage performance with Amazon S3

Benefits of Amazon S3

Use cases shown for Amazon S3: geospatial or lunar imagery, medical imagery, compliance records, analytics, Internet of Things (IoT) sensor data, media master files, customer call-center records, digital record preservation, mobile sync and storage, home-recording video, data lakes, seismic and reservoir simulation data, origin storage for CDN, pharmaceutical study data, durable backups, DNA sequences, surveillance video/closed-circuit television, ML training data, financial transaction records, media assets, website hosting, meteorological and environmental research, user-generated content, autonomous vehicle data, log files, mapping, and oil and gas topography data.

Amazon S3 has more options for data transfer

• AWS Direct Connect
• Amazon Kinesis Data Firehose
• Amazon Kinesis Data Streams
• Amazon Kinesis Video Streams
• Amazon S3 Transfer Acceleration
• AWS Storage Gateway
• AWS SFTP
• AWS Snowball
• AWS Snowball Edge
• AWS Snowmobile
• AWS DataSync

Amazon S3 storage classes

Optimize your storage cost by utilizing all Amazon S3 storage classes

Decreasing storage prices (2006 to 2019): >80% lower price per GB

Accelerating innovation (2006 to 2019):
• S3 Standard (2006)
• S3 Glacier (2012)
• S3 Standard-IA (2015)
• S3 One Zone-IA (1H 2018)
• S3 Intelligent-Tiering (2H 2018)
• S3 Glacier Deep Archive (2019)

Your choice of Amazon S3 storage classes

Storage classes, ordered by access frequency from frequent to infrequent:

• S3 Standard: active, frequently accessed data; milliseconds access; ≥ 3 AZ; $0.0210/GB
• S3 Intelligent-Tiering: data with changing access patterns; milliseconds access; ≥ 3 AZ; $0.0210 to $0.0125/GB
• S3 Standard-IA: infrequently accessed data; milliseconds access; ≥ 3 AZ; $0.0125/GB
• S3 One Zone-IA: re-creatable, less accessed data; milliseconds access; 1 AZ; $0.0100/GB
• S3 Glacier: archive data; minutes or hours access; ≥ 3 AZ; $0.0040/GB
• S3 Glacier Deep Archive: archive data; hours to access; ≥ 3 AZ; $0.00099/GB

Amazon S3 Intelligent-Tiering

Automatic cost optimization with no performance impact and no operational overhead

Amazon S3 Intelligent-Tiering automates cost savings

• Automatically optimizes storage costs for data with changing access patterns
• Stores objects in two access tiers, optimized for frequent and infrequent access
• Monitors access patterns and optimizes cost at a granular object level
• No performance impact, no operational overhead, no retrieval fees

Customers of all sizes and in virtually every industry use S3 Intelligent-Tiering and save automatically

S3 Storage Class Analysis

Provides lifecycle policy recommendations based on access patterns

• Monitors access patterns
• Classifies data as frequently or infrequently accessed
• Can be filtered by bucket, prefix, or object tag

Lifecycle policies use rules to manage your storage

Use lifecycle policies to transition objects to another storage class

Lifecycle rules take action based on object age. Here’s an example:
1. Move objects older than 30 days to S3 Standard-Infrequent Access
2. Move objects older than 365 days to S3 Glacier Deep Archive
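The two-step rule above could be written as a lifecycle configuration along these lines. This is a minimal sketch, not the presenters' code: the rule ID and the empty prefix are assumptions, and the dictionary follows the shape that boto3's `put_bucket_lifecycle_configuration` accepts as `LifecycleConfiguration`.

```python
import json


def build_transition_rule(rule_id, prefix=""):
    """Build one lifecycle rule with the two age-based transitions."""
    return {
        "ID": rule_id,                  # hypothetical rule name
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},   # empty prefix: rule covers the whole bucket
        "Transitions": [
            {"Days": 30, "StorageClass": "STANDARD_IA"},
            {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
        ],
    }


lifecycle_configuration = {"Rules": [build_transition_rule("tier-down-by-age")]}
print(json.dumps(lifecycle_configuration, indent=2))
```

Keeping the configuration as plain data makes it easy to review or store in version control before it is applied to a bucket.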

S3 Standard → S3 Standard-Infrequent Access → S3 Glacier Deep Archive

Object tags work with lifecycle policies

Perform automated actions on a subset of your data with object tags

• Specify a tag filter to transition or expire objects
• Use S3 batch operations to apply object tags at scale
• Example: transition all objects tagged “Project : Delta” to S3 Glacier

Use lifecycle policies with object tag filters

Object tag filters simplify lifecycle policies when the same action needs to be performed across multiple prefixes in the bucket
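A tag-filtered rule for the "Project : Delta" example might look like the sketch below. The rule ID and the 30-day threshold are assumptions, since the slides do not state an age for this transition. The structure follows the lifecycle rule shape used by the S3 API.

```python
def build_tag_transition_rule(rule_id, tag_key, tag_value, days, storage_class):
    """Build a lifecycle rule that applies to every object carrying one tag,
    regardless of which prefix the object lives under."""
    return {
        "ID": rule_id,
        "Status": "Enabled",
        "Filter": {"Tag": {"Key": tag_key, "Value": tag_value}},
        "Transitions": [{"Days": days, "StorageClass": storage_class}],
    }


# Assumed values: rule name and 30-day age are illustrative only.
delta_rule = build_tag_transition_rule(
    "project-delta-to-glacier", "Project", "Delta", 30, "GLACIER"
)
print(delta_rule["Filter"])
```

Because the filter names a tag rather than a prefix, one rule covers tagged objects across all prefixes in the bucket.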

Project Delta

Performance best practices on Amazon S3

Use the latest version of the AWS SDKs to automatically see performance improvements from:
• Automatic retries
• Handling timeouts
• Parallelized uploads and downloads with TransferManager

See the Optimizing Amazon S3 Performance whitepaper to learn more about:
• Scaling horizontally for more throughput
• Caching data
• Using Amazon S3 Transfer Acceleration for faster data transfer
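To illustrate what parallelized downloads involve, the sketch below splits an object into byte ranges and fetches them concurrently. It is not the SDK's TransferManager. The `fetch_range` callable is a hypothetical stand-in for a ranged GET request, and the part size is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor


def byte_ranges(object_size, part_size):
    """Split [0, object_size) into inclusive (start, end) ranges."""
    if object_size < 0 or part_size <= 0:
        raise ValueError("object_size must be >= 0 and part_size > 0")
    return [
        (start, min(start + part_size, object_size) - 1)
        for start in range(0, object_size, part_size)
    ]


def range_header(start, end):
    """Format an HTTP Range header value for one part."""
    return f"bytes={start}-{end}"


def parallel_fetch(fetch_range, object_size, part_size, max_workers=8):
    """Fetch all parts concurrently and reassemble them in order.

    fetch_range(start, end) -> bytes is supplied by the caller.
    """
    ranges = byte_ranges(object_size, part_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so parts stay sorted.
        parts = pool.map(lambda r: fetch_range(*r), ranges)
        return b"".join(parts)


# Local stand-in for a remote object, used only to exercise the logic.
data = bytes(range(10))


def fake_fetch(start, end):
    return data[start:end + 1]


assert parallel_fetch(fake_fetch, len(data), 4) == data
```

In practice the SDK handles this splitting, plus retries and timeouts, which is why the slide recommends staying on the latest SDK version.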

Learn more about how to optimize performance in S3 this week: STG 343, STG 229, STG 320

Security is at the core of everything we do

Data stored in Amazon S3 is secure by default

• Amazon S3 Block Public Access
• Encrypt data by default in Amazon S3
• Encryption status in Amazon S3 inventory
• Bucket permission checks
• Free checks with AWS Trusted Advisor

Layers of access control

Resource-based
• Object Access Control Lists (ACLs)
• Bucket Access Control Lists (ACLs)
• Bucket policies

User-based
• Identity and Access Management (IAM) policies

Amazon S3 recommends using bucket policies and IAM policies

Amazon S3 Block Public Access

• Can be applied to accounts or buckets
• Four security settings to deny public access
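The four settings correspond to four boolean flags in the S3 API. The sketch below shows a configuration with all of them enabled. The helper function is an illustrative assumption, not part of any SDK.

```python
# The four Block Public Access flags, all enabled.
PUBLIC_ACCESS_BLOCK_ALL = {
    "BlockPublicAcls": True,        # reject new public ACLs
    "IgnorePublicAcls": True,       # ignore existing public ACLs
    "BlockPublicPolicy": True,      # reject new public bucket policies
    "RestrictPublicBuckets": True,  # restrict access to buckets with public policies
}


def is_fully_blocked(config):
    """Return True only if every one of the four flags is set."""
    return all(config.get(flag, False) for flag in PUBLIC_ACCESS_BLOCK_ALL)


print(is_fully_blocked(PUBLIC_ACCESS_BLOCK_ALL))
```

With boto3, a dictionary of this shape is passed as `PublicAccessBlockConfiguration` to `put_public_access_block`, on the `s3` client for a bucket or the `s3control` client for an account.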

Use AWS Organizations Service Control Policies (SCPs) to prevent settings changes

Amazon S3 Block Public Access settings

Amazon S3 default encryption

• One-time bucket-level setup
• Automatically encrypts all new objects
• Simplified compliance
• Supports SSE-S3 and SSE-KMS
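As a rough sketch of the one-time bucket-level setup, the function below builds a default encryption configuration for either SSE-S3 or SSE-KMS. The function name and the example key alias are assumptions. The dictionary matches the shape boto3's `put_bucket_encryption` takes as `ServerSideEncryptionConfiguration`.

```python
def default_encryption_config(kms_key_id=None):
    """Build a default-encryption configuration for a bucket.

    Without a key ID the rule uses SSE-S3 (AES256); with one it uses SSE-KMS.
    """
    if kms_key_id is None:
        default = {"SSEAlgorithm": "AES256"}
    else:
        default = {"SSEAlgorithm": "aws:kms", "KMSMasterKeyID": kms_key_id}
    return {"Rules": [{"ApplyServerSideEncryptionByDefault": default}]}


sse_s3 = default_encryption_config()
sse_kms = default_encryption_config("alias/example-key")  # hypothetical key alias
print(sse_s3)
print(sse_kms)
```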

Provides Amazon S3 encryption-at-rest support for applications that do not otherwise support encrypting data in Amazon S3

Access Analyzer for Amazon S3 buckets (new!)

Continuous analysis
• Continuously monitors and automatically analyzes resources
• Surfaces buckets with public & shared access in the S3 Management Console

Provides insights
• Drill down into source and level of public and shared access
• Know exactly where and what remediation actions to apply

Swift remediation
• Lock public buckets down with a single click
• Acknowledge shared access as intended

Introducing Amazon S3 Access Points (new!)

Amazon S3 Access Points simplify access control for large, shared buckets such as data lakes.

Every application that interacts with a multi-tenant bucket can have a dedicated access point with custom permissions.

Amazon S3 Access Points can be set to only allow access from a virtual private cloud (VPC). VPC access points do not allow requests from the Internet; S3 restricts request traffic to the specified VPC.

What is an Amazon S3 Access Point?

A new S3 resource with a hostname, an ARN, and an IAM resource policy
▪ Applications use Access Points to access objects in a bucket
▪ Access Points can be limited to a specified VPC
▪ Access Points have an Access Point-specific Block Public Access setting
▪ Access Point names live in a private namespace that is unique to an account and the Region
▪ Access Point ARNs and hostnames have the account ID and Region embedded in them
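The way the account ID and Region are embedded can be shown with two small formatting helpers. The function names are assumptions. The formats follow the example access point `ap1` in account `123456789012` and Region `us-west-1` used in this deck.

```python
def access_point_arn(name, account_id, region):
    """Format an access point ARN with the Region and account ID embedded."""
    return f"arn:aws:s3:{region}:{account_id}:accesspoint/{name}"


def access_point_hostname(name, account_id, region):
    """Format an access point hostname with the account ID and Region embedded."""
    return f"{name}-{account_id}.s3-accesspoint.{region}.amazonaws.com"


print(access_point_arn("ap1", "123456789012", "us-west-1"))
print(access_point_hostname("ap1", "123456789012", "us-west-1"))
```

Because the name is scoped to an account and Region, two accounts can both create an access point called `ap1` without colliding.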

Bucket (mycomdata), with a bucket policy
Bucket hostname: mycomdata.s3.us-west-1.amazonaws.com

Access Point ap1, with an Access Point policy
AP hostname: ap1-123456789012.s3-accesspoint.us-west-1.amazonaws.com
ARN: arn:aws:s3:us-west-1:123456789012:accesspoint/ap1

Access Point ap2, with an Access Point policy
AP hostname: ap2-123456789012.s3-accesspoint.us-west-1.amazonaws.com
ARN: arn:aws:s3:us-west-1:123456789012:accesspoint/ap2

Accessing objects in Amazon S3: previously

All users would access objects directly through the bucket using the bucket hostname

Administrator, Role1, and Role2 all access the bucket (mycomdata) through the bucket hostname: mycomdata.s3.us-west-1.amazonaws.com

Use case: simplify access control for shared buckets

Now, we can grant custom access to multiple teams using Access Points.

Access Point policies can establish granular control within limits enforced by the bucket policy.

• Finance AP: policy grants Finance read/write access to Finance data
• Sales AP: policy grants Sales read/write access to Sales data
• Supply AP: policy grants Supply read/write access to Supply data
• Data Science AP: policy grants read access to data tagged in bucket

Use case: enforce VPC only data access for a bucket

Access Points can be configured to limit access to a specified VPC only
▪ Create an AWS Organizations Service Control Policy (SCP) to enforce VPC-only access points for applications using the bucket
▪ Direct data access through the bucket is disabled (enforced through the bucket policy)
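A service control policy for this use case might resemble the sketch below. It denies creating any access point whose network origin is not a VPC. The condition key `s3:AccessPointNetworkOrigin` is taken from the AWS documentation as best recalled, so treat it as an assumption and check it against the current S3 condition key reference.

```python
import json


def vpc_only_access_point_scp():
    """Build an SCP document that denies creation of non-VPC access points."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Deny",
                "Action": "s3:CreateAccessPoint",
                "Resource": "*",
                "Condition": {
                    # Assumed condition key; denies Internet-origin access points.
                    "StringNotEquals": {"s3:AccessPointNetworkOrigin": "VPC"}
                },
            }
        ],
    }


scp = vpc_only_access_point_scp()
print(json.dumps(scp, indent=2))
```

The SCP only constrains how access points are created. Disabling direct access through the bucket hostname is a separate step in the bucket policy.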

Diagram labels:
• Internet: bucket hostname mycomdata.s3.us-west-1.amazonaws.com
• VPC 67890: hostname ap1-123456789012.s3-accesspoint.us-west-1.amazonaws.com
• VPC 12345: hostname ap2-123456789012.s3-accesspoint.us-west-1.amazonaws.com
• Bucket (mycomdata) with Access Points

Amazon S3 inventory

A managed alternative to using the LIST API

Regularly generates a list of objects for analytics and auditing.

• Storage class
• Creation date
• Encryption status
• Replication status
• Object size, and more
• S3 Intelligent-Tiering access tier (new!)

Use Amazon Athena to filter S3 inventory reports

This query selects bucket, object key, and version ID for unencrypted objects:

    select s._1, s._2, s._3 from s3object s where s._6 = 'NOT-SSE'
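The same filter can be sketched locally over an inventory CSV with Python's standard library. The column layout is an assumption that mirrors the query: bucket, key, and version ID in columns 1 to 3 and encryption status in column 6. The values in columns 4 and 5 of the sample rows are made up for illustration.

```python
import csv
import io


def unencrypted_objects(inventory_csv, encryption_column=6):
    """Return (bucket, key, version_id) for rows whose encryption status is NOT-SSE.

    encryption_column is 1-based, like s._6 in the query.
    """
    reader = csv.reader(io.StringIO(inventory_csv))
    return [
        (row[0], row[1], row[2])
        for row in reader
        if len(row) >= encryption_column and row[encryption_column - 1] == "NOT-SSE"
    ]


# Sample rows: columns 4 and 5 are hypothetical placeholders.
sample = (
    "batchoperationsdemo,0100059%7Ethumb.jpg,lsrtIxksLu0R0ZkYPL.LhgD5caTYn6vu,true,1024,NOT-SSE\n"
    "batchoperationsdemo,encrypted.jpg,exampleVersionId,true,2048,SSE-S3\n"
)
for bucket, key, version_id in unencrypted_objects(sample):
    print(bucket, key, version_id)
```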

Example results:
batchoperationsdemo,0100059%7Ethumb.jpg,lsrtIxksLu0R0ZkYPL.LhgD5caTYn6vu
batchoperationsdemo,0100074%7Ethumb.jpg,sd2M60g6Fdazoi6D5kNARIE7KzUibmHR
batchoperationsdemo,0100075%7Ethumb.jpg,TLYESLnl1mXD5c4BwiOIinqFrktddkoL

Amazon S3 batch operations (new!)

Save time when performing one-time or recurring actions at scale

Manage millions or billions of objects with a single request. Batch operations automatically handles retries, displays progress, and generates reports.

Supported operations:
• Replace object tag sets
• Change object ACLs
• Restore objects from Amazon S3 Glacier
• Copy objects
• Run AWS Lambda functions

Amazon S3 batch operations

Choose objects
• S3 Inventory report
• CSV list

Select an operation
• Copy
• Restore from S3 Glacier
• Put Access Control List (ACL)
• Replace object tag sets
• Run AWS Lambda functions

View progress
• Object level progress
• Completion report

Use Amazon S3 batch operations to encrypt objects

Choose objects: S3 Inventory report
• Filter S3 Inventory report with Amazon Athena
• Identify all unencrypted objects

Select an operation: Copy
• Copy objects to the same bucket
• Specify desired encryption type

View progress: Completion report
• Retain completion report of all tasks for object-level visibility

Amazon S3 batch operations and AWS Lambda

Run your custom code across billions of objects in Amazon S3

Manifest selection:
• Specify existing Amazon S3 objects
• Use URL-encoded JSON to pass object-level parameters
• Invoke general purpose AWS Lambda functions

AWS Lambda function:
• Invoke AWS services
• Use Amazon S3 operations like copy with parameters
• Run your own custom code
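A Lambda function invoked by a batch operations job could be sketched as follows. The event and response fields follow the batch operations invocation schema (version 1.0) as documented by AWS. The per-object work is a placeholder, and treating the manifest key as URL-encoded JSON is the parameter-passing convention named above, with the JSON field names chosen by the job author.

```python
import json
from urllib.parse import unquote_plus


def decode_task_key(s3_key):
    """Decode the URL-encoded key; if it holds a JSON object, return it as parameters."""
    decoded = unquote_plus(s3_key)
    try:
        params = json.loads(decoded)
    except ValueError:
        params = None
    if isinstance(params, dict):
        return params
    return {"key": decoded}  # plain object key, no extra parameters


def lambda_handler(event, context):
    results = []
    for task in event["tasks"]:
        try:
            params = decode_task_key(task["s3Key"])
            bucket = task["s3BucketArn"].split(":::")[-1]
            # Placeholder for custom work on the object (copy, tag, call a service).
            result_code = "Succeeded"
            result_string = json.dumps({"bucket": bucket, **params}, sort_keys=True)
        except Exception as exc:  # report failures per task instead of crashing
            result_code, result_string = "PermanentFailure", str(exc)
        results.append(
            {"taskId": task["taskId"], "resultCode": result_code, "resultString": result_string}
        )
    return {
        "invocationSchemaVersion": "1.0",
        "treatMissingKeysAs": "PermanentFailure",
        "invocationId": event["invocationId"],
        "results": results,
    }


# Hypothetical test event with a URL-encoded JSON key: {"key": "a.jpg"}
example_event = {
    "invocationSchemaVersion": "1.0",
    "invocationId": "example-invocation",
    "job": {"id": "example-job"},
    "tasks": [
        {
            "taskId": "task-1",
            "s3Key": "%7B%22key%22%3A%20%22a.jpg%22%7D",
            "s3VersionId": None,
            "s3BucketArn": "arn:aws:s3:::batchoperationsdemo",
        }
    ],
}
response = lambda_handler(example_event, None)
print(response["results"][0]["resultCode"])
```

Returning a result per task lets the job's completion report show which objects succeeded and which failed.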

Common standards, Open , largest and most active community in education

2008 – Founded
2011 – Production

• 30 million users
• 70 countries
• >10,000 EC2 instances
• Petabytes on S3
• 99.9% uptime
• 1 million concurrent

Architecture diagram (AWS Cloud, 7+ Regions). Labels: The Monolith (x100 clusters), Service X (x50 services), Transports, Customer-facing analytics, Elastic Load Balancer, stateless compute, Amazon S3, HTTPS APIs, Amazon SQS, Amazon EMR, Amazon Kinesis, Amazon DynamoDB, S3 bucket, PostgreSQL on EC2, Amazon RDS.

Canvas is the world’s #1 Learning Management Platform

Open source software
• 50 services
• 40 teams of about 7
• Languages: Ruby, Node, Golang, Scala, Java, Python
• 40+ AWS accounts

Shard data dumps

• 1 Amazon S3 bucket dedicated to this purpose

• 100s of PostgreSQL clusters

• 1000s of shards (PostgreSQL schemas)

• Each dump of all clusters is >1 million objects and 20 TB

• ~1 PB across all retained dumps

Shard data dumps – Cost optimization

• Lifecycle policies to expire objects after 28 days

• Lifecycle policies to expire incomplete multipart uploads

• Other buckets use lifecycle policies to tier objects down to S3 Standard-Infrequent Access and S3 Glacier

Shard data dumps – Cost optimization

Intentional bucket structure supports surgical access

Prefix: Description
• s3://my-bucket/*: All shard dumps
• s3://my-bucket/shard_1/*: All dumps for a shard (~tenant)
• s3://my-bucket/shard_1//*: All objects in a dump
• s3://my-bucket/shard_1//-.gz: Single object

Fan out to many accounts/teams

Diagram labels: AWS Cloud; AWS Account A (PostgreSQL on EC2, shard data dumps); AWS Accounts X, Y, and Z (Amazon EMR, Amazon S3, Amazon DynamoDB, stateless compute).

Bucket policy statement:

    {
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::111111111111:root",
          "arn:aws:iam::222222222222:root",
          "arn:aws:iam::333333333333:root"
        ]
      },
      "Action": [
        "s3:List*",
        "s3:Get*"
      ],
      "Resource": "arn:aws:s3:::my-bucket/*"
    }
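When the list of consuming accounts grows, a statement like the one on this slide can be generated rather than hand-edited. The helper below is a hypothetical sketch that reproduces the slide's statement from a bucket name and a list of account IDs.

```python
import json


def read_only_statement(bucket, account_ids):
    """Build a bucket policy statement granting list/get access to several accounts."""
    return {
        "Effect": "Allow",
        "Principal": {"AWS": [f"arn:aws:iam::{account}:root" for account in account_ids]},
        "Action": ["s3:List*", "s3:Get*"],
        "Resource": f"arn:aws:s3:::{bucket}/*",
    }


statement = read_only_statement(
    "my-bucket", ["111111111111", "222222222222", "333333333333"]
)
print(json.dumps(statement, indent=2))
```

Note that bucket-level list actions such as `s3:ListBucket` are evaluated against the bucket ARN rather than object ARNs, so a working policy may also need `arn:aws:s3:::my-bucket` as a resource.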

(Further diagram labels: Amazon Kinesis Data Analytics, Amazon S3, Amazon DynamoDB.)

Durability

Instructure internal wiki (screenshot)

Summary

• Lifecycle policies take the toil out of managing storage

• Cost effective storage has enabled easy-to-build architectures

• S3 is a great team boundary

Increasing cost savings with shard data dumps using Amazon S3

Born in the cloud, Instructure builds the Canvas Learning Management System (LMS) for kindergarten through university-level higher education, as well as the Bridge employee development suite for the corporate space.

S3 Data Protection capabilities

Goal: Amazon S3 and S3 Glacier features
• Replicate data for compliance and bad actor protection: use S3 Replication with Replication Time Control and ownership override
• Protect data from accidental deletes: use bucket versioning while reducing cost with Lifecycle policies
• Protect data for governance and compliance purposes: use S3 Object Lock to store objects as write-once-read-many (WORM)

Amazon S3 Replication

Amazon S3 Replication automatically copies your data from a source bucket to a destination bucket in the same or a different AWS Region

• Same-Region Replication (SRR) (new!)
• Cross-Region Replication (CRR)

Amazon S3 Replication

• Select data
• Select a region: satisfy distance and residency requirements
• Change ownership: protect against bad actors & IAM account compromise
• Cross account: protect against AWS root account compromise
• Set storage class: or replicate straight to Amazon S3 Glacier

Amazon S3 Replication time control (new!)

Designed to replicate 99.99% of objects within 15 minutes

• 15-minute replication backed by an AWS Service Level Agreement (SLA)
• Monitor replication time using Amazon CloudWatch metrics and event notifications

Monitor your replication with 3 new CloudWatch metrics

Optional: Set up alarms on your metrics
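A replication rule with replication time control and metrics enabled might be built as in this sketch. The rule ID, priority, and destination bucket ARN are assumptions. The nesting follows the replication configuration shape that boto3's `put_bucket_replication` accepts inside `ReplicationConfiguration["Rules"]`.

```python
def rtc_replication_rule(rule_id, destination_bucket_arn, priority=1):
    """Build a replication rule with a 15-minute replication time target and metrics."""
    return {
        "ID": rule_id,
        "Status": "Enabled",
        "Priority": priority,
        "Filter": {"Prefix": ""},  # empty prefix: replicate the whole bucket
        "DeleteMarkerReplication": {"Status": "Disabled"},
        "Destination": {
            "Bucket": destination_bucket_arn,
            # Replication time control: 15-minute target.
            "ReplicationTime": {"Status": "Enabled", "Time": {"Minutes": 15}},
            # Enables the replication metrics and the missed-threshold events.
            "Metrics": {"Status": "Enabled", "EventThreshold": {"Minutes": 15}},
        },
    }


# Hypothetical rule name and destination bucket.
rule = rtc_replication_rule("replicate-with-rtc", "arn:aws:s3:::example-destination-bucket")
print(rule["Destination"]["ReplicationTime"])
```

The full configuration also needs an IAM role ARN under `Role`, which is omitted here because it is account-specific.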

Charts (9:00 to 9:15), each with an alarm threshold: Replication Latency (seconds), Bytes Pending Replication (bytes), Operations Pending Replication (count)

• Replication latency: the maximum number of seconds by which the destination region is behind the source region for a given replication rule
• Bytes pending replication: the total number of bytes of objects pending replication for a given replication rule
• Operations pending replication: the number of operations pending replication for a given replication rule

Enable Amazon S3 bucket versioning

Use versioning to protect your data from accidental deletion

• Create a new version with every upload
• Previous versions are retained, not overwritten
• Making delete requests without a version ID removes access to objects, but keeps the data
• Manage previous versions with lifecycle: transition or expire objects a specified number of days after they are no longer the current version

Use lifecycle policies to expire object versions

Set lifecycle policies to control the cost of noncurrent versions

Example rule: ID objectversionexpiration, status Enabled, 7 days

Amazon S3 Object Lock

Use Object Lock to store objects as write-once-read-many (WORM)
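As a small illustration of the retention settings behind Object Lock, the helper below builds a retention value for governance or compliance mode. The function name and the 30-day period are assumptions. The `Mode` and `RetainUntilDate` fields match what boto3's `put_object_retention` takes as `Retention`.

```python
from datetime import datetime, timedelta, timezone

VALID_MODES = ("GOVERNANCE", "COMPLIANCE")


def retention(mode, days, now=None):
    """Build an Object Lock retention setting that expires `days` from `now`."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}")
    if days <= 0:
        raise ValueError("days must be positive")
    now = now or datetime.now(timezone.utc)
    return {"Mode": mode, "RetainUntilDate": now + timedelta(days=days)}


# Fixed start date so the example is reproducible.
start = datetime(2019, 12, 1, tzinfo=timezone.utc)
setting = retention("GOVERNANCE", 30, now=start)
print(setting["Mode"], setting["RetainUntilDate"].isoformat())
```

A legal hold is set separately from retention and has no expiry date, which fits cases where the required immutability period is not yet known.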

• Compliance mode: store compliant data
• Governance mode: store data in WORM format; privileged users can modify retention controls
• Legal hold: if you’re unsure how long you want your objects to stay immutable

Daily storage metrics

Provides storage bytes by storage class and object count

Amazon S3 CloudWatch request metrics

Can be filtered by bucket, prefix, or tagged objects

Metric name: value
• AllRequests: count
• PutRequests: count
• GetRequests: count
• ListRequests: count
• DeleteRequests: count
• HeadRequests: count
• PostRequests: count
• BytesDownloaded: MB
• BytesUploaded: MB
• 4xxErrors: count
• 5xxErrors: count
• FirstByteLatency: ms
• TotalRequestLatency: ms

Amazon S3 CloudWatch percentiles metrics (new!)

Amazon S3 request metrics on any percentile (e.g., p90, p99, p99.9, p100)

• Understand the distribution of Amazon S3 request metrics
• Visualize and alarm on any percentile to identify outliers or unusual application behavior
• Avoid false alarms and save time spent monitoring and tracking requests

Learn storage with AWS Training and Certification

Resources created by the experts at AWS to help you build skills

45+ free digital courses cover topics related to cloud storage, including:

• Amazon S3
• Amazon S3 Glacier
• Amazon Elastic File System (Amazon EFS)
• Amazon Elastic Block Store (Amazon EBS)
• AWS Storage Gateway

Classroom offerings, like Architecting on AWS, feature AWS expert instructors and hands-on activities

Visit aws.amazon.com/training/path-storage/

Thank you!

Rob Wilson, Shikha Sukumaran, and Matt Wheeler
