AWS Research Cloud Program
Accelerating Science and Innovation

Researcher’s Handbook (Version 2.0, Released 2017-10-01)

Edited by Kevin Jorissen and Brendan Bouffler, with contributions from the Research & Technical Computing Group, Amazon Web Services

© 2017, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Notices

This document is provided for informational purposes only. It represents AWS's current product offerings and practices as of the date of issue of this document, which are subject to change without notice. Users are responsible for making their own independent assessment of the information in this document and any use of AWS's products or services, each of which is provided "as is" without warranty of any kind, whether express or implied. This document does not create any warranties, representations, contractual commitments, conditions or assurances from AWS, its affiliates, suppliers or licensors. The responsibilities and liabilities of AWS to its customers are controlled by AWS agreements. This handbook is not part of, and does not modify any agreement between AWS and its customers. This handbook may include a set of suggested solutions but should not be construed as a legally-binding offer from AWS. For current prices of AWS services, please refer to the AWS website at www.aws.amazon.com.


FOREWORD

When I was asked to write the foreword for a book that explains how easy it is to use cloud computing for research, I jumped at the opportunity. Technology should never be an obstacle to researchers and their ability to gain new insights into our world.

For the past 20 years, I've run several research departments at the Fraunhofer Institute for Industrial Mathematics (ITWM), as well as leading our IT department for more than 10 of those years. It always seemed obvious to me to look for solutions that make life easier for a researcher developing and using simulation software - and to transform IT to support this. In 2001, I wrote a proposal called "I-LAB" (Internet-Lab), taking inspiration from the MOSAIC project that led to the first web browser. "The browser was the basis for the initial success of the internet, and with I-LAB, the internet should become a 'problem solving environment', first for researchers, then for large companies and finally for everybody", I wrote at the time. However, we understood then that the protocols and concepts of Grid Computing were too complex to become a real success.

At Fraunhofer, we started to use web services intensively to automate fault-tolerant execution, provisioning and large-scale parallelization. We started to develop what is today known as BeeGFS - a parallel file system - and we developed industry-grade applications based on asynchronous communication protocols. After Amazon Web Services launched Amazon S3 and Amazon EC2, we were ready to go, and within a few weeks, I ran several large parallel finance applications in the cloud directly from my laptop. I was even able to do this in real time - in fact, live during lectures I was giving on the subject! Since then AWS has become the most flexible and versatile web services company and has set the standards for cloud computing.

This book is an easy-to-read, step-by-step guide to AWS and its services. It describes the application ecosystem that is helping researchers around the globe to provision software components and set up their research IT infrastructure with just a few mouse clicks.

Today I head up the HPC Department at Fraunhofer ITWM, where we develop applications that rely on highly specialized HPC platforms. We now use the cloud to run many of these applications at scale, and today cloud and HPC coexist happily, because for many problems the limits of a supercomputer are where we begin our work. In most universities and research institutes, it is considered a waste of valuable resources to run early-stage developments, simple farming jobs or weakly coupled applications on highly specialized systems when there is a readily available cloud nearby that can scale up to whatever extent your science requires.

I'm still fascinated by the ongoing revolution that's come with the availability of cloud infrastructures and what this means for research and startups. For those like me who are interested in this broader view, I still recommend picking up Nicholas Carr's book from 2008, "The Big Switch", where he describes the changes the use of cloud infrastructures will bring. Along with machine intelligence, these topics will have a large impact on science and society.

I recommend this current document to get answers on why all this is important to you as a researcher, as well as practical knowledge on how to get started. This is a book written not for the IT guy, but for the researcher who's seeking a better way to get their work done.

Dr. Franz-Josef Pfreundt
Division Director, Fraunhofer Institute for Industrial Mathematics
Kaiserslautern, 8 December 2016


Contents

1 Introduction to the AWS Researcher's Handbook
   1.1 What is Cloud Computing?
   1.2 What is the AWS Cloud?
   1.3 Why AWS for Science and Research?
   1.4 AWS Compute and Storage
   1.5 AWS Marketplace
   1.6 Science as a Service
   1.7 Security
   1.8 Data Privacy
   1.9 The Handbook and You
2 Setting Up Your AWS Account
   2.1 Creating an AWS Account and Registering for RCP Benefits
   2.2 Creating a New AWS Account through a Partner Portal (Internet2, Arcus, COMPAREX, SPARKLE)
   2.3 Getting a New AWS Account Directly with AWS
   2.4 The Root Account
   2.5 Creating IAM Logins
   2.6 AWS Organizations
   2.7 Setting Up SSH Access Keys
3 Budgeting
   3.1 Setting Budgets in Your AWS Account
   3.2 Optional: The Budget Safety Switch
   3.3 Amazon EC2 Launch Limits
   3.4 Next Steps
4 Working with Data
   4.1 Range of Storage Types
   4.2 Storing Your Research Data
   4.3 Sharing the Data in Your S3 Bucket
   4.4 Global Data Egress Waiver for Research
   4.5 Very Large Data Transfers to S3
   4.6 Data Stream Ingestion
   4.7 Means More Scientific Impact
5 Working with Sensitive and Controlled-Access Data
   5.1 The Shared Responsibility Security Model
   5.2 Meeting Security and Compliance Requirements with AWS Quick Starts
6 Working with Compute
   6.1 The Amazon Elastic Compute Cloud (Amazon EC2)
   6.2 Amazon EC2 Compute Instance Types
   6.3 How Are EC2 Compute Instances Priced?
   6.4 Other Compute Services
   6.5 Database Services
   6.6 Tutorials
7 HPC and Clusters
   7.1 Easy-Launch Template-Based HPC Clusters
   7.2 Marketplace HPC Solutions
   7.3 Partner and SaaS HPC Solutions
   7.4 Containers, Microservices, and AWS Batch
   7.5 Serverless Compute Functions: AWS Lambda
   7.6 Elastic MapReduce
   7.7 Tutorials
8 Machine Learning and Other Advanced Services
   8.1 Machine Learning and Predictive Analytics
   8.2 Amazon Machine Learning (AML)
   8.3 Deep Learning on AWS
   8.4 Artificial Intelligence as a Service (Amazon Rekognition, Amazon Lex and Amazon Polly)
   8.5 Amazon Athena
   8.6 AWS Internet of Things (IoT)
   8.7 Amazon WorkSpaces
9 Jupyter and Zeppelin Notebooks on AWS
   9.1 Jupyter on AWS
   9.2 JupyterHub and Zeppelin with Amazon EMR
   9.3 Train a Machine Learning Model on AWS through a Jupyter Notebook
10 Learning More about AWS
   10.1 Hands-On and Face-to-Face
   10.2 Online
   10.3 AWS Global Summit Series
   10.4 Publications on Research Done in the AWS Cloud
11 Finding and Building Solutions
   11.1 Build It Yourself
   11.2 The Cloud Credits for Research Program
   11.3 Third-Party Solutions
12 APN Technology Partners Listing
   12.1 AEWACS B.V.
   12.2 AceCloud, by Acellera
   12.3 Alces Flight
   12.4 AWS Deep Learning
   12.5 BeeGFS from Fraunhofer ITWM
   12.6 CFD Direct Limited
   12.7 DNAnexus
   12.8 Edico Genome
   12.9 Figshare
   12.10 Illumina, Inc. – BaseSpace® Sequence Hub
   12.11 Intel Cloud Edition for Lustre
   12.12 MathWorks Has Multiple Offerings for AWS
   12.13 Overleaf
   12.14 RONIN
   12.15 Seven Bridges
   12.16 Sinergise – Sentinel Hub
   12.17 Techila Distributed Computing Engine
   12.18 Zenotech
13 APN Consulting Partners Listing
   13.1 Acellera Ltd
   13.2 Alces Software Ltd
   13.3 Arcus Global
   13.4 inQdo B.V.
   13.5 Pironet / CANCOM
   13.6 Sterling Geo
   13.7 The Server Labs Ltd
   13.8 Zenotech
14 Glossary


1 Introduction to the AWS Researcher’s Handbook

As the authors and Amazon Web Services Scientific Computing experts, we're delighted to welcome you to the cloud with this handbook. We are ourselves life scientists, physicists and other researchers and PhDs who started using AWS for our scientific research during our academic careers. We grew to love AWS because of the flexibility it gave us to use computing resources right when we needed them, and to solve computational problems bigger than the constraints of a fixed 16-node department cluster. We appreciated no longer spending our time maintaining those aging clusters. Through experience, we found AWS to be a great place for sharing data and tools with our collaborators.

This Handbook is the "missing manual" for researchers that we wish we'd had at the time. It collects information on the services you're likely to need as a researcher, and it's written in a way that we think you'll appreciate - with the right amount of seriousness and technical depth. The Handbook allows researchers to "get to the science" as soon as possible, with the confidence that their data and budgets are safe in the cloud. It provides key information to use AWS for research, so that researchers who are not IT professionals can benefit from the AWS Cloud right away.

In this first chapter of the handbook we outline the reasons why the cloud is fundamental to changing the way computing is used for research. These reasons span from being able to access the right scale you need—large or small—to the agility you gain from being able to access resources you never planned for, just to find out if they solve a problem for you. The AWS Cloud allows you to speed up the cycles of modelling, data analysis and experimentation—a "failing fast" approach that ultimately leads to research success and a better understanding of the world. In the Cloud you are not required to preconceive all of the answers in advance.

In subsequent chapters we show you—step by step—how to create an AWS account and configure it so you can establish a secure and compliant place for your data. You'll also have confidence that you can predict, monitor and govern your expenditure accurately and transparently to help keep you on budget. We provide deeper dives on AWS services that are commonly used by the research community, including computing, storage, analytics, and machine learning services.

The second half of this handbook introduces you to the rich research community thriving in the AWS Cloud. We show you what's possible, with an array of solution outlines and offerings from third party companies and AWS Partners that can immediately get you working with real, scientifically relevant solutions. For example, you can launch a large High-Performance-Computing (HPC) cluster with hundreds of day-to-day science applications pre-installed. You can create more complex projects building upon existing solutions, or create and share your own bespoke scientific architecture. We have also included a glossary in the back of this Researcher's Handbook.

This Handbook is a part of the AWS Research Cloud Program, a program that provides researchers in the global scientific community with access to easy-to-use cloud resources. The benefits of the program include discounts on certain services, a fast track to invoice-backed billing, and access to up-to-date information about techniques relevant to research (see chapter 2.1). Registering for this program also keeps you in the loop on new developments. We expect to update this Researcher's Handbook several times a year, and we may share new technical solutions that we believe raise the bar on how research computing can be done, and will save you time and effort.

It is NOT necessary to have an AWS account to register for the program and receive this Researcher’s Handbook.

Ultimately, our aim is to make the cloud more accessible for researchers, thereby creating a greater impact on the research world. If there is anything you think of that will help us improve the program, this handbook, or any of the services you use, please contact us at any time at [email protected].

1.1 What is Cloud Computing?

Before digging into the science, it is important to understand why cloud computing provides a different model of IT delivery compared to the traditional methods used for research. Cloud computing is quite a simple idea. The primary difference between cloud computing and "traditional" computing or IT is that in a cloud model you are not buying physical assets.

Before cloud computing, if you wanted compute, storage and other IT services for research you first needed to buy physical servers, network equipment, racks, cabling, etc. Then, in a secure, air-conditioned and climate-controlled room (something you had to organize yourself), you would take your recently purchased IT equipment and unpack it, install it, connect it, configure it, assign it, manage it, and monitor it. You paid the bills to power all of this infrastructure, and every few years you had to replace your servers and infrastructure (assuming they lasted a few years), along with occasionally purchasing additional servers to meet projected increases in demand. This wasn't an ideal situation, but if you wanted IT services it simply had to be done.

Amazon did this for many years, and in that time developed unique skills in operating the massive-scale technology infrastructure and datacenters supporting Amazon.com. Then came the idea that other organizations could benefit from Amazon's experience and investment in running a hyper-scale, distributed, transactional IT infrastructure. This idea was cloud computing—powerful compute, storage, and other IT services running in Amazon's data centers. This allowed a new model of IT infrastructure delivery whereby customers would pay to tap into these services in a utility-style model—only paying for the resources they need.

1.2 What is the AWS Cloud?

The AWS Cloud is a broad set of global compute, storage, networking, database, analytics, application, deployment, management, developer, Internet-of-Things (IoT), Artificial Intelligence (AI), and security services, all of which are listed at http://aws.amazon.com/products/. Figure 1 below is a simple view of AWS Cloud services. AWS Cloud services are provided with a range of supporting components such as management tools, networking services, and application augmentation services, with multiple interfaces to AWS Application Programming Interface (API)-based services, including Software Development Kits (SDKs), Integrated Development Environment (IDE) toolkits, and Command Line Tools: http://aws.amazon.com/tools/.


Figure 1 – High-Level View of the AWS Cloud.

In the cloud you are not required to preconceive all of the answers in advance, or be constrained by what you guessed might have been the right solution a year ago when you wrote your grant application. New techniques come along, and new services arise regularly. When advances in compute infrastructure better enable your research, the cloud allows you to take advantage of them immediately—rather than wait several years until your next on-premises hardware refresh. As an example of such innovation, AWS launched over 1,000 new features and services in 2016.


1.2.1 Global Footprint

AWS's Cloud services are hosted within its global data center footprint, allowing customers to consume services without having to build or manage facilities or equipment. AWS Cloud services are offered in multiple Regions, each located in a separate geographic area. Each Region contains multiple, isolated locations known as Availability Zones (AZs) that are engineered to be isolated from failures in other AZs. Each AZ, in turn, consists of one or more data centers.

Figure 2 – AWS AZs consist of 1+ data centers, each with redundant power, networking, and connectivity—housed in separate facilities. Every AWS Region contains 2+ AZs. Some Regions have as many as 5 AZs.

Figure 3 displays AWS’s 16 global regions and 44 Availability Zones.

Figure 3 – AWS’s global infrastructure consists of 16 regions and 44 Availability Zones (as of October 2017). At least 6 more AWS Regions (and 17 Availability Zones) in France, Sweden, Hong Kong, China, Bahrain, and a second AWS GovCloud Region in the US are coming online in the next year.
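If you are curious which Regions and Availability Zones your account can see, you can query them through any of the AWS SDKs. The short Python (boto3) sketch below is purely illustrative; it assumes the boto3 package is installed and that AWS credentials have been configured (account setup is covered in chapter 2).

    # List the Regions visible to your account, and the Availability Zones in one of them.
    # Illustrative sketch: assumes boto3 is installed and AWS credentials are configured.
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")
    regions = [r["RegionName"] for r in ec2.describe_regions()["Regions"]]
    print("Regions visible to this account:", ", ".join(sorted(regions)))

    # Availability Zones are listed per Region; here we ask about eu-west-1 (Ireland).
    ec2_ireland = boto3.client("ec2", region_name="eu-west-1")
    for zone in ec2_ireland.describe_availability_zones()["AvailabilityZones"]:
        print(zone["ZoneName"], "-", zone["State"])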


1.3 Why AWS for Science and Research?

The AWS Cloud allows researchers to quickly (and affordably) access the latest versions of many resources that may otherwise be difficult to access. This level of availability promotes deployment of new ideas or services that in other circumstances might have taken months or years to achieve. As a result, you can expedite prototyping an idea in order to discover its feasibility without incurring lasting costs or irreversible investment. Likewise, other stages of the scientific research cycle such as analysis and validation are expedited. By reducing the time to deploy and run your experiment, we increase the time you can spend developing new ideas or completing analysis, which allows scientific goals to be reached rapidly.

Figure 4 - Benefits of deploying research workloads in the cloud.

"Pay as you go" also means you can better match your compute capacity to your needs, in real time. Historically, many researchers used an expensive workstation for real-time work, but even if you use it 9-5, Monday to Friday, you're only using it about one-fifth of the time. In the AWS Cloud, you could have five times the compute performance (i.e., more cores, more memory) for that same amount of time at the same cost, so when you do compute you get your results five times sooner (see the short worked example after the list below).

Some of the key benefits of using AWS for research are:

• Access to Innovation – AWS constantly adds new services and new functionality, which you can use immediately, rather than wait for a 5-yearly on-premises hardware refresh. You can also tap into the innovation of our partner network, which further extends the power and breadth of AWS services.
• Saves Time – You can create an AWS account and your first computing machine—or cluster—in minutes.
• Low Cost – With pay-as-you-go pricing, there is no need to purchase and maintain expensive, space-consuming hardware. Researchers pay only for what they use and can lower costs further, for example with Amazon Elastic Compute Cloud (Amazon EC2) Spot Instances.
• Flexible – AWS has a large variety of compute, storage, database, and network offerings, meaning that you can match the right hardware to the job at hand.
• Elastic and Scalable – With AWS you scale up or down, paying only for what you use, when you need it.
• A Space for Collaboration – Collaborate securely with your colleagues around the corner or around the world. Our global footprint of cloud regions matches the global nature of research. You can safely share data, machine images, or workflows with just a few clicks.
• Reproducible Science – Snapshot your entire working environment any time. You or your peers can spin up a live copy of the environment five years from now to verify your published results, or to revisit your data with the latest simulation or analysis tools.
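As a rough illustration of the utilization argument above (the figures below are illustrative only, not AWS prices):

    # Illustrative utilization arithmetic only - example figures, not AWS prices.
    hours_in_week = 24 * 7              # 168 hours
    workstation_hours_used = 8 * 5      # 9-5, Monday to Friday = 40 hours
    utilization = workstation_hours_used / hours_in_week
    print(f"Workstation utilization: {utilization:.0%}")   # ~24%, before holidays and idle time

    # Paying only for the hours you actually compute, the same weekly budget buys
    # roughly 1/utilization times the capacity during those hours:
    print(f"Capacity multiple for the same spend: {1 / utilization:.1f}x")   # roughly 4-5x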

Let’s look at a few success stories of research organizations using AWS. You can find about a hundred more listed in chapter 10.4.

1.3.1 Astronomy: ICRAR / CHILES discovering neutral hydrogen galaxies

Higher performance at lower cost. A global radio astronomy consortium (led by a team at Columbia University) needed to process data from the Very Large Array (VLA) telescope in New Mexico. They required a 12-hour processing SLA, stemming from an opportunistic observing schedule at the observatory. Their partners at the International Centre for Radio Astronomy Research (ICRAR) calculated that this meant they needed around $2 million of HPC hardware – money that simply wasn't available. Working with AWS to exploit the EC2 Spot market in AWS's northern Virginia region, they were able to deploy their HPC workload at a much larger scale – meaning that they always beat their SLA (frequently by many hours), whilst averaging only $1,200 per month of EC2 compute resources. The project went on to smash the previous record for identifying a neutral hydrogen galaxy, at nearly twice the redshift of its predecessor.

• https://arxiv.org/pdf/1511.00401.pdf
• http://www.icrar.org/astronomers-smash-cosmic-records-see-hydrogen-distant-galaxy/

1.3.2 High-Energy Physics: Fermilab in Chicago

50,000 cores for the weekend. AWS has collaborated with the U.S. Department of Energy's Fermilab in Chicago and the software engineering community that creates and maintains HTCondor, an HPC/HTC job scheduler widely used in the global particle physics community. The result has been the "HTCondor Annex" – a set of extensions that have allowed Fermilab to demonstrate High-Energy Physics accelerator data analysis workloads running across several AWS regions and using tens of thousands of cores at a time. Fermilab relied on the Amazon EC2 Spot market for access to potentially massive fleets of cores at very low cost, without any long-term commitment.

• https://aws.amazon.com/blogs/aws/experiment-that-discovered-the-higgs-boson-uses-aws-to-probe-nature/


Figure 5 - High-Energy Physics: Fermilab in Chicago.

1.3.3 Natural Language Processing: Clemson University

Researchers at Clemson University took this approach even further and ran a massive Natural Language Processing study on 1.1 million vCPUs using Amazon EC2 Spot Instances – a compute fleet comparable in core count to the largest supercomputers in the world. They conducted nearly half a million topic modeling experiments to study how human language is processed by computers.

• https://aws.amazon.com/blogs/aws/natural-language-processing-at-clemson-university-1-1-million-vcpus-ec2-spot-instances/

1.3.4 Machine Learning: RISELab at Berkeley Collaboration

Machine Learning of the future. Amazon is a founding sponsor of the National Science Foundation-backed RISELab at U.C. Berkeley. Its predecessor, AMPLab, published cutting-edge research on machine learning (ML) and built an open software stack for data analytics, including Apache Spark, which is now used all over the world to analyze massive amounts of research data. RISELab is building a next-generation ML/AI stack for real-time analytics on live datasets – relevant for a broad range of today's and tomorrow's challenges, such as smart cities or fleet management of autonomous vehicles. RISELab builds on AWS compute and data analytics services.

New serverless HPC paradigm gives instant scalability. In one of their areas of work, RISELab and the Berkeley Center for Computational Imaging are making cloud infrastructure easier to use for scientists and engineers. The Pywren project provides a simple map-and-reduce-like interface, callable from Python and running on top of AWS Lambda. Lambda functions are snippets of code that AWS executes for you on managed hardware. Pywren harnesses thousands of these workers in parallel, achieving a peak of 40 TFLOPS [1] using Pywren on AWS Lambda, and over 60 GB/sec read/write performance to Amazon Simple Storage Service (Amazon S3) for data processing. You can run this from a Jupyter Notebook without ever explicitly touching a server, instead just providing the Python code to be executed. This is truly democratizing parallel scaling capabilities for nearly everyone - capabilities which used to be the sole preserve of large supercomputing centers.

[1] A teraflop (TFLOP) is a measure of computing speed equal to one trillion floating-point operations per second.
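To give a flavour of how little code this involves, here is a minimal Pywren sketch adapted from the examples published at pywren.io. The function and its inputs are placeholders, and the sketch assumes the pywren package has been installed and configured against your own AWS account.

    # Minimal Pywren sketch (adapted from the pywren.io examples); my_analysis is a placeholder.
    # Assumes the pywren package is installed and configured for your AWS account.
    import pywren

    def my_analysis(x):
        # Stand-in for real work, e.g. processing one file or one parameter set.
        return x * x

    pwex = pywren.default_executor()             # uses AWS Lambda as the execution backend
    futures = pwex.map(my_analysis, range(100))  # each call runs as its own Lambda worker
    results = [f.result() for f in futures]      # gather the results back to your notebook
    print(results[:10])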

Figure 6 - Machine Learning: RISELab at Berkeley Collaboration.

• http://pywren.io/pywren.html
• https://arxiv.org/abs/1702.04024
• https://rise.cs.berkeley.edu

1.3.5 Genomics: Unilever

Faster results. Unilever augmented their existing HPC capacity with Amazon EC2, enabling the company to process genetic sequences twenty times faster than with their in-house system. "The key advantage that AWS has over running this workflow on Unilever's existing cluster is the ability to scale up to a much larger number of parallel compute nodes on demand," said Pete Keeley, eScience Technical Lead for R&D IT at Unilever.

1.3.6 Quantum Chemistry: Novartis

AWS enabled Novartis to massively accelerate pre-clinical R&D focused in the area of computational chemistry. "We completed the equivalent of thirty-nine years of computational chemistry in just under nine hours for a cost of around $4,200," noted Steve Litster, Global Head of Scientific Computing at Novartis.

• https://www.top500.org/news/sponsored/why-customers-are-moving-high-performance-computing-workloads-to-amazon-web-services-1/


1.3.7 Computational Fluid Dynamics: TLG Aerospace

Solve larger problems. Prior to moving computational fluid dynamics (CFD) simulations to AWS, TLG Aerospace couldn't run jobs of more than 1,000 nodes, resulting in lost opportunities. AWS afforded them the scalability to transcend that limitation. "We are definitely saving money by actively monitoring jobs to catch problems early and reduce rework," explains Andrew McComas, Engineering Manager at TLG. "We can also use it to reduce unnecessary cost in larger jobs that may otherwise run longer than required."

• https://www.top500.org/news/sponsored/why-customers-are-moving-high-performance-computing-workloads-to-amazon-web-services-1/

1.3.8 Medical Imaging: National Database for Autism Research (NDAR)

Collaborative Big Data research. The National Institute of Mental Health Data Archive (NDA) makes research data available for reuse. Data collected across projects can be aggregated and made available, including clinical data and the results of imaging, genomic, and other experimental data collected from the same participants. In this way, separate experiments on genotypes and brain volumes can inform the research community on the more than one hundred thousand subjects now in the NDA. The NDA holds rich datasets (fastq, brain imaging) in object-based storage (Amazon S3). It supports the deployment of packages (created through the NDA Query tools) to an Oracle database hosted on AWS. The NDA envisions real-time computation against rich datasets that can be initiated without the need to download full packages. Furthermore, a new category of data structure has been created called "evaluated data." This allows researchers using NDA cloud capabilities and computational pipelines to write any analyzed data directly back to the miNDAR database. Databases can also be populated with your own raw or evaluated data and uploaded directly back into the NDA for a streamlined data submission directly from a hosted database.

• https://ndar.nih.gov/cloud_overview.html

1.3.9 Genomics: GT-Scan2 from CSIRO in Australia

New HPC paradigms. In 2016, the Commonwealth Scientific and Industrial Research Organisation (CSIRO - a federal government agency for scientific research in Australia) used AWS Lambda functions to completely re-engineer an HPC workload called GT-Scan2, which had been developed in a traditional cluster setting and identifies optimal CRISPR gene editing sites through simulation. The re-casting of the code to use AWS Lambda and other "serverless" functions in AWS took only a few weeks for the developers at CSIRO.

AWS Lambda (see chapter 7.5) is a service that deploys software functions in a variety of languages into the cloud natively, triggered directly or driven by events in the cloud. The infrastructure (hardware, operating system and software environment) for AWS Lambda is managed by AWS and scales rapidly. This is crucial for using GT-Scan2 for personalized treatment, because the complexity of the targeted gene varies dramatically. A typical GT-Scan2 job takes less than a minute, but the variation between jobs ranges from 1 second to 5 minutes. This fast fluctuation in load over minutes rather than hours, and the need for rapid turn-around times, meant that large amounts of server hardware could end up idle simply waiting for a job to arrive. A naïve EC2-based solution would also be limited, since new instances – which may take minutes to deploy – would come online too slowly to keep the runtime stable. But with AWS Lambda, the GT-Scan2 runtime is stable at a few minutes per complete job, regardless of how many jobs (i.e. genetic samples) are sent to it, as new Lambda workers scale up promptly.
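To make the "serverless" idea concrete, an AWS Lambda function is just a small handler that AWS invokes for each incoming event; you never manage the server it runs on. The Python sketch below is a generic, hypothetical example (it is not the GT-Scan2 code).

    # A generic AWS Lambda handler in Python - a hypothetical sketch, not the GT-Scan2 code.
    # AWS runs this function once per incoming event and scales out as events arrive.
    import json

    def lambda_handler(event, context):
        # 'event' carries the input, e.g. a gene sequence or a pointer to a file in S3.
        sequence = event.get("sequence", "")

        # Placeholder for the real analysis; here we simply report the sequence length.
        result = {"length": len(sequence)}

        return {"statusCode": 200, "body": json.dumps(result)}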

We see a future where the traditional model of HPC frequently breaks down in the face of highly optimized services like AWS Lambda, which allow developers to find a more optimized environment for their code to run – at the scale and immediacy that suits their users – compared to traditional settings with fixed hardware capacities that are rationed by schedulers and governed by strict, one-size-fits-all software and hardware limits.

• https://aws.amazon.com/blogs/aws/genome-engineering-applications-early-adopters-of-the-cloud/
• https://blog.csiro.au/cloud-technology-giving-us-a-crispr-view-to-help-cure-disease/

1.4 AWS Compute and Storage

You can create compute solutions quickly by starting virtual servers in Amazon EC2, attaching some elastic disk storage to them (Amazon Elastic Block Store [Amazon EBS]), and enclosing them within custom-designed networks called Virtual Private Clouds (Amazon VPCs) that include built-in firewalls to add layers of protection. Figure 7 shows a 4-core compute instance ready to launch CentOS 7 and get to work after a small number of clicks. In a few minutes this user will be able to access the machine via Secure Shell (SSH) and begin running applications.


Figure 7 - Launching an EC2 instance only takes a few minutes and is easily customizable.
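The same launch can also be scripted. The Python (boto3) sketch below is illustrative only: the AMI ID, key pair and security group are placeholders you would replace with your own values, and running it starts a real instance that is billed until you terminate it.

    # Launch a single EC2 instance programmatically - an illustrative sketch.
    # The AMI ID, key pair and security group below are placeholders for your own values.
    import boto3

    ec2 = boto3.resource("ec2", region_name="us-east-1")

    instances = ec2.create_instances(
        ImageId="ami-0123456789abcdef0",   # placeholder: a CentOS 7 (or other) AMI of your choice
        InstanceType="c4.xlarge",          # 4 vCPUs; swap in c4.2xlarge etc. for more cores
        MinCount=1,
        MaxCount=1,
        KeyName="my-ssh-keypair",          # placeholder: the SSH key pair from chapter 2.7
        SecurityGroupIds=["sg-0123456789abcdef0"],  # placeholder: a group allowing SSH from your network
    )

    instance = instances[0]
    instance.wait_until_running()
    instance.reload()
    print("Connect with: ssh centos@" + instance.public_ip_address)

    # Remember to stop or terminate the instance when you are done, to stop paying for it:
    # instance.terminate()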

Launching an EC2 instance can be done in minutes, and happens so often inside the AWS Cloud that we've built many helpful tools that allow you to "freeze-dry" configurations and packages so you can reconstitute them in a flash. [2] Automating your activity is a good idea so you won't forget important steps like making sure the firewalls are on or attaching your dataset to your computer. Automation also makes it easier to share your scientific pipeline with collaborators. While this can be done conveniently through the AWS Management Console, nearly everything can also be done via a command line interface or one of several Application Programming Interfaces (APIs; choose your favorite language from a long list). This means that repetitive tasks or complex architectures can be turned into shell scripts. There are more advanced levels of automation, but by now you'll get the idea that you can go from simple ideas to complex creations quickly.

[2] See information on AMIs at http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html and AWS CloudFormation templates at https://aws.amazon.com/cloudformation/

1.4.1 Amazon EC2 – Compute Instances

Amazon EC2 instances are virtual servers built from virtual machine images called Amazon Machine Images (AMIs) that you can run in minutes on a wide range of "instance types". Instance types represent differently scoped and sized computer servers, with capabilities that you can select from a menu. [3] They range from tiny, single-core versions that might be great for running a wiki through to large instance types with terabytes of RAM or specialist GPUs. The AWS website lists each instance family and type that you will become familiar with. Within each family, you choose at boot time what size instance you want to run to suit your application's needs. The size of each instance within a capability or instance family follows a familiar t-shirt size-naming scheme. For example, if you need 4 cores of our high-performance "C4" instance, after some studying of the Amazon EC2 instance family list, you might choose a "c4.xlarge." Later, if you wanted more cores for a larger problem, you'd reboot into a c4.2xlarge (8 cores) or a c4.8xlarge (36 cores). The same applies if you wanted to try running the same code with more memory—you can give yourself an upgrade in minutes. If it doesn't positively impact your throughput the way you'd hoped, you can always go back to the smaller instance or lower-memory instance and save your money.

[3] Beyond just using AMIs, some researchers use the Amazon EC2 Container Service (Amazon ECS) to deploy and manage "Docker images", which package an application's code, configurations, and dependencies into simple building blocks. This helps applications to be deployed reliably regardless of the environment.

1.4.2 Range of Storage Types

AWS storage is designed to scale in size, so if your prototype experiment proves to be a success, you can quickly scale it up without reformulating your scientific pipeline. AWS storage offerings are summarized in Table 1 below.

Table 1 - AWS Storage Options.
• Amazon Simple Storage Service (Amazon S3) – a scalable, durable service to make data accessible from any Internet location, for user-generated content, active archive, serverless computing, Big Data storage or backup and recovery.
• Amazon Glacier – highly affordable long-term storage that can replace tape for archive and regulatory compliance.
• Amazon Elastic Block Store (Amazon EBS) – persistent local storage for Amazon EC2 instances, for relational and NoSQL databases, data warehousing, enterprise applications, Big Data processing, or backup and recovery.
• Amazon Elastic File System (Amazon EFS) – a file system interface and file system access semantics to make data available to one or more EC2 instances, for content serving, enterprise applications, media processing workflows, Big Data storage or backup and recovery.
• AWS Storage Gateway – a hybrid storage cloud augmenting your on-premises environment with Amazon cloud storage, for bursting, tiering or migration.
• Cloud Data Migration Services – a portfolio of services to help simplify and accelerate moving data of all types and sizes into and out of the AWS Cloud.
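As a small taste of the object storage model, the Python (boto3) sketch below creates an S3 bucket in the London Region and uploads a file. The bucket name and file path are placeholders (S3 bucket names must be globally unique), and the sketch assumes boto3 and working credentials.

    # Create an S3 bucket in a Region of your choice and upload a file - illustrative sketch.
    # The bucket name and file path are placeholders; S3 bucket names must be globally unique.
    import boto3

    s3 = boto3.client("s3", region_name="eu-west-2")   # eu-west-2 is the London Region

    bucket = "my-research-data-example-12345"
    s3.create_bucket(
        Bucket=bucket,
        CreateBucketConfiguration={"LocationConstraint": "eu-west-2"},
    )

    # Upload a local file as an object, asking S3 to encrypt it at rest.
    s3.upload_file(
        "results/run-001.csv", bucket, "run-001.csv",
        ExtraArgs={"ServerSideEncryption": "AES256"},
    )

    # List what is now in the bucket.
    for obj in s3.list_objects_v2(Bucket=bucket).get("Contents", []):
        print(obj["Key"], obj["Size"], "bytes")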

1.4.3 Services Built on Amazon EC2 and Storage

AWS offers many additional services beyond the core EC2 compute and storage services we've visited so far: for example, data analytics, data streaming, visualization, machine learning and deep learning. Many of these services integrate with the basic storage services, so you can build a complex architecture around your data stored in Amazon S3. The services are designed to manage the "undifferentiated heavy lifting" (i.e. drudgery) for you, freeing you to focus on the more scientifically relevant parts of your work. It also lets us deliver services that are run according to best practice, are extremely secure, and are operated and supported by a global team of professionals.

For example, Amazon Relational Database Service (Amazon RDS) offers multiple different types of databases running as a service. You choose which database type you want to use—we provide it almost instantly. This leaves you ready to import data and run queries. You never need to patch the database again or update the Operating System—we take care of that for you, and we'll even handle synchronous replication between Availability Zones if you need a highly available service.

1.5 AWS Marketplace

While AWS offers many services ourselves, the real power of community is delivered through AWS Marketplace. AWS Marketplace is an application store for the cloud. It allows third-party groups (like companies or research organizations) to offer specialized solutions optimized for specific communities (see Figure 8 below). Some are virtual machines that others create and share in the cloud, like the CentOS Linux distribution, one of our most popular community AMIs.

Figure 8 - AWS Marketplace makes it easy to find and deploy solution stacks built to specification and according to best practices, including a range of research-related workloads.


Another example is Alces Flight, which builds an HPC cluster in a few minutes. It contains hundreds of common scientific applications [4] (like Amber, NAMD, BLAST, etc.) that are ready to run immediately. The environment will be very familiar to you if you've ever used a central supercomputing or campus facility before. Alces Flight makes use of the Amazon EC2 Spot market [5], where $50 goes a long way. This is your own supercomputer, but with no RFP required, and with no queues.

Intel's Lustre [6] and the Fraunhofer Institute's BeeGFS are also both in AWS Marketplace and extend the storage possibilities in AWS by building complex parallel file system solutions.

Finally, you pay for AWS Marketplace products on your AWS bill, which means you only have one procurement relationship to take care of, and all your costs can be monitored and metered through our budget management facilities, ensuring that you only spend what you intend.

[4] You can see the Alces catalog of applications at http://docs.alces-flight.com/en/latest/apps/gridware.html
[5] The Amazon EC2 Spot market is an auction market for all our spare Amazon EC2 cores. We'll discuss it later in the handbook.
[6] See Intel's Lustre, BeeGFS, and other offerings in AWS Marketplace here: https://aws.amazon.com/marketplace/

1.6 Science as a Service

Some of our partners go one step further and provide managed services for science based on AWS, but with their own portal access and user environments, completely abstracting the cloud management activities. Usually they offer pricing for services that are expressed in terms that are meaningful to the science, like € or $ per genome. This is referred to as Software as a Service (SaaS). For example:

• DNAnexus (www.dnanexus.com) is a company that offers genomic processing solutions via AWS. Their web portal allows you to easily upload your data and execute customized pipelines for data analytics, while ensuring that your data management meets standards for encryption, privacy protection, and auditability that might be required by ethics boards or during compliance audits. They achieve this by working in partnership with AWS to use our core services in specific ways. In doing so, they relieve users of the burden of worrying about IT and compliance, leaving them to focus on just the science.

• Figshare (www.figshare.com) offers solutions to manage all your research outputs and make them available in a citable, shareable, and discoverable manner, including long-term storage (and archiving). Most journals (and funding bodies) now mandate that, wherever ethically possible, data and methods are shared in an open way to provide repeatability or falsifiability. Figshare's service—running on AWS—enables academics, publishers, and institutions to easily adhere to these principles in the most intuitive and efficient manner.

1.7 Security

Cloud security at AWS is the highest priority. Helping to protect the confidentiality, integrity, and availability of our customers' systems and data is of the utmost importance to AWS, as is maintaining customer trust and confidence. All AWS users benefit from data center architecture and network architecture built to satisfy the requirements of the most security-sensitive organizations.

AWS and its partners offer hundreds of tools and features to help you meet your security objectives for visibility, auditability, controllability, and agility. This means that you can have the security you need, but without the capital outlay and with much lower operational overhead than in an on-premises environment.

Security is, nonetheless, a shared responsibility. AWS takes responsibility for securing the underlying infrastructure that supports the cloud, and you're responsible for anything you put in the cloud or connect to the cloud. This shared responsibility model can reduce your operational burden in many ways, and it may even improve your default security posture without additional action on your part. As an AWS user you inherit all the best practices of AWS policies, architecture, and operational processes built to satisfy the requirements of our most security-sensitive customers. [9]

AWS provides you with guidance and expertise through online resources, personnel, and partners. AWS also provides you with advisories for current issues, plus you have the opportunity to work with AWS when you encounter security issues.

AWS provides security-specific tools and features for network security, configuration management, access control, and data encryption. As an example of such tools and features, AWS Quick Starts are built by AWS solutions architects and partners to help you deploy popular solutions on AWS, based on AWS best practices for security and high availability. These reference deployments implement key technologies automatically on the AWS Cloud using AWS CloudFormation templates, often with a single click and in less than an hour. You can build your environment in a few simple steps, and start using it immediately.

AWS also offers access to additional third-party security tools to complement and enhance our customers' operations in the AWS Cloud. APN partners offer hundreds of familiar and industry-leading products that are equivalent to, identical to, or integrate with existing controls in a customer's on-premises environments.
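Quick Starts are delivered as AWS CloudFormation templates, which you can launch from the console or programmatically. The Python (boto3) sketch below is a generic illustration only: the stack name, template URL and parameter are hypothetical placeholders rather than a real Quick Start.

    # Launch a CloudFormation stack from a template - a generic, hypothetical sketch.
    # The stack name, template URL and parameter below are placeholders, not a real Quick Start.
    import boto3

    cfn = boto3.client("cloudformation", region_name="us-east-1")

    cfn.create_stack(
        StackName="my-research-environment",
        TemplateURL="https://s3.amazonaws.com/my-bucket/my-template.yaml",   # placeholder
        Parameters=[
            {"ParameterKey": "KeyPairName", "ParameterValue": "my-ssh-keypair"},   # placeholder
        ],
        Capabilities=["CAPABILITY_IAM"],   # needed when a template creates IAM resources
    )

    # Wait until the stack, and everything it defines, has been created.
    cfn.get_waiter("stack_create_complete").wait(StackName="my-research-environment")
    print("Stack created")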

[9] See AWS Security resources at https://aws.amazon.com/security/


Finally, AWS environments are continuously audited and have received certifications from accreditation bodies across the globe. In the AWS environment, you can take advantage of automated tools for asset inventory and privileged access reporting. Security is our highest priority and we carefully document our approach—along with pointers to tools and security measures we recommend—in the AWS Overview of Security Processes whitepaper. [10]

[10] Which you can find here: https://d0.awsstatic.com/whitepapers/aws-security-whitepaper.pdf

1.8 Data Privacy

We deliver services to millions of active customers including financial services providers, healthcare providers, and governmental agencies, all of whom trust us with some of their most sensitive information. We know users care deeply about privacy and data security. That's why AWS gives users ownership and control over their user content by design, through simple but powerful tools that allow users to determine where their user content will be stored (for example, you can choose to deploy your AWS services and data exclusively in the London region, and AWS will not move customer content outside of London), to encrypt their user content in transit or at rest, and to manage access to AWS services and resources for their users. We also implement responsible and sophisticated technical and physical controls designed to prevent unauthorized access to or disclosure of user content.

Customers maintain ownership of their customer content and select which AWS services process, store and host their customer content. We do not access or use customer content for any purpose other than as legally required and for maintaining the AWS services and providing them to our customers and their end users. We never use customer content or derive information from it for marketing or advertising. [11]

[11] More information on data privacy is found at https://aws.amazon.com/compliance/data-privacy-faq/

1.9 The Handbook and You

If someone else set up an AWS account for you, then you can probably skip ahead to Chapter 4. You'll learn about working with data on AWS, then go on to learn about computing, HPC, Jupyter notebooks, machine learning, and more. You'll skip chapters 2 and 3, where we talk about how to create an AWS account, how to configure it securely, and how to budget and set up billing alerts. If your lab director or IT administrator already did those things for you, then you can go straight to the good stuff.

If you need to set up an AWS account for yourself, your students, or the researchers you're supporting, then please do continue with chapters 2 and 3. They'll walk you through the process step by step.

If you are a PI or scientist at a research institution and you're about to create an AWS account for yourself, now is a good time to check whether your institution already has a centrally administered AWS account. If so, they may require that you sign up through them rather than go it alone. This can have many advantages for you:

• The central IT organization may provide support for your research work on AWS
• You'll be in compliance with your institution's cloud policy
• Your institution may receive a volume discount on AWS services, stretching your research budget further


2 Setting up your AWS Account

The following sections detail the step-by-step processes through which you can move quickly to create, configure and use your AWS account.

Your first step is to create an AWS account and complete your registration for invoice-backed billing (no credit card required!) and the data egress waiver discount (no data download charges!). Sections 2.1-2.3 explain those steps. You can skip this if you've already completed registration for an AWS account.

What comes next? Starting with Section 2.4, we first want to guide you through some foundational things that we think you should do before you begin. These address basic security requirements to protect your data and your budget. Other steps cover installing tools that will generally make your life easier. There are checkpoints at the end of each step to make sure you are ready to proceed to the next section. We recommend following all of these steps. At the end, you'll have an environment in the AWS Cloud that's capable of safely supporting some serious science.

2.1 Creating an AWS account and Registering for RCP Benefits

If you are a member of a public sector research or educational institution, you can receive additional program benefits:

1. Invoice-backed billing – you won't need a credit card to pay for and manage your usage;
2. Global Data Egress Waiver – removes "data out" charges from your bill. Whilst these charges are usually very small (typically 3-5% of most users' bill), your bill is more predictable without them. There are additional qualification criteria (see chapter 4.4).

Which of the following describes you?

i. If your AWS account was obtained via a partner portal such as the Internet2 Net+ portal or a GÉANT portal, then your account has automatically inherited the above benefits and there's nothing more to do – congratulations, and please continue on to Section 2.4.

ii. If your account was created directly with AWS, then you'll need to go to https://aws.amazon.com/rcpstep2 to complete the registration process. You'll have to look up your 12-digit AWS Account ID first. Make sure the account's email address is not a private address, but your university or work address. [12] Once complete, please continue on to chapter 2.4.

iii. If you don't have an AWS account yet, please follow the steps in Section 2.2 below to create an account now. The same distinction applies: if you create your AWS account via a partner portal, you will automatically obtain all the benefits. But if you create an account directly with AWS, the steps below will show you how to receive invoicing without using a credit card, and you will conclude by going to https://aws.amazon.com/rcpstep2 to apply for the data egress waiver discount program.

iv. If you are not a member of a public sector research or educational institution, then these additional RCP program benefits are not available to you. If you need to create an AWS account, please go to https://aws.amazon.com and click the yellow "Create an Account" button. Once you've completed the process, proceed to Section 2.4 to configure your new AWS account.

2.2 Creating a new AWS Account through a Partner portal (Internet2, Arcus, COMPAREX, SPARKLE)

Consult the table below to create an active account with invoice billing and the Data Egress Waiver benefit.

Table 2 - Process pathways for creating an AWS Account.

Option A – Register with a partner (Partner Portal Account):
• US: Internet2 Net+ (operated by DLT) – see 2.2.1 below.
• UK: Jisc Portal (operated by Arcus Global) – see 2.2.2.
• EU: GÉANT portals (COMPAREX, SPARKLE, Arcus) – see 2.2.3 below.
• Invoice-backed billing included.
• Egress Waiver included.
• Potentially other additional terms, conditions and facilities.

Option B – Register with AWS (Direct Account with AWS):
1. Create a new account at aws.amazon.com. Stop when you get to the stage for entering billing information. See section 2.3 below.
2. Discover your 12-digit Account ID by looking at the top right corner of the screen in the Support Center Dashboard of your console.
3. Record your 12-digit ID, and use it to request invoice-backed billing and the Data Egress Waiver at https://aws.amazon.com/rcpstep2
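If you prefer not to click around the console, you can also read your 12-digit Account ID programmatically. The short Python (boto3) sketch below assumes boto3 is installed and credentials are configured for the account in question.

    # Look up the 12-digit AWS Account ID for the credentials you are currently using.
    import boto3

    account_id = boto3.client("sts").get_caller_identity()["Account"]
    print("Your AWS Account ID is:", account_id)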

[12] This is required for invoice-backed billing or the Data Egress Waiver. If you need to change an existing account's registered email address, you can follow the procedure here: https://aws.amazon.com/premiumsupport/knowledge-center/change-email-address/


2.2.1 Internet2 Net+ Operated by DLT (US users only)

The Internet2 NET+ Amazon Web Services offering was developed by a group of five universities through the NET+ Service Validation process and has additional enhancements to support enterprise usage and broad adoption across campus. This NET+ program provides significant technical and procurement benefits, and enables campuses to leverage AWS using a best-in-class offering. Features include:

• A community-negotiated Business Associate Agreement (BAA) for HIPAA workloads containing patient health information (PHI);
• Increasing discounts of 3-5%, based on community usage;
• Detailed and granular billing with a variety of payment options, providing visibility into campus AWS usage;
• 100+ Gb/s of privately peered capacity to the Internet2 Network;
• Use of InCommon credentials with the DLT Portal to request and transfer AWS accounts.

More information about getting an account via this method is available from the Internet2 Net+ page.

2.2.2 Jisc's Portal Operated by Arcus (UK Academic Users Only)

Jisc's web portal—operated by Arcus Global for academic customers in the UK—enables customers to procure AWS Cloud services with enhanced benefits (like the Global Data Egress Waiver) and to manage user access to AWS in one central location. It was developed specifically for research and education in collaboration with AWS. The portal provides institutions with core administrative facilities and offers additional benefits to help manage budgeting and control costs:

• Capability to set budget limits for individual user accounts or departments.
• View and authorize all your institution's user accounts through one central portal.
• Monthly invoicing means you no longer need to use credit cards for payment.
• Administrators can retrieve service usage information.
• Volume discounts through aggregation across multiple educational institutions.


• It's possible to prepay via the portal, which may be needed in order to meet your grant's fiscal requirement that spending is complete by a set deadline. You can pay in advance and draw down against the balance as you go.

You can apply for an account here: https://www.jisc.ac.uk/amazon-web-services.

2.2.3 GÉANT Portals for European Researchers

GÉANT - Europe's leading collaboration on e-infrastructure and services for research and education - will soon enable thousands of academic institutions in 36 European countries to use AWS through a procurement contract vehicle. The offering will be available in 2017 through Europe's National Research and Education Networks (NRENs) and selected AWS Resellers. The procurement contract vehicle streamlines the procurement process for institutions, eliminating the need to conduct individual public procurement tenders. NRENs and AWS Resellers can offer customized research and education solutions that include: purchase order and prepaid billing, federated identity management, and a portal with administrative and reporting features especially designed for this community. All of this is built on underlying AWS infrastructure services.

Through COMPAREX:

1. Contact your local COMPAREX Account Manager. Please see 'Contact Us' via comparex.co.uk/geant
2. The institution and COMPAREX develop and complete a GÉANT call-off agreement.
3. The flow of the initial order that comes afterwards will depend on whether the customer already has an AWS account. In order to register a customer account or migrate it, COMPAREX needs some basic user information.
• Option 1: The customer already has an AWS account. This account is already populated with the relevant information. The AWS account information is transferred to COMPAREX to enable it to manage and offer the AWS services. This is done by email invitation and customer acceptance.
• Option 2: The customer does not have an AWS account. COMPAREX will initiate the customer enrolment and run a full onboarding process, gathering any necessary data and providing it to the nominated AWS Account Administrator. Upon creation, COMPAREX will forward the login credentials to this email address.

COMPAREX will allocate dedicated resources to support every customer, including:
• Key Account Manager
• Cloud Technical Specialist
• Amazon Web Services Specialist Team

This team will guide institutions through the onboarding process, including guidance on AWS licensing structures, in order to identify the best value configuration for the institution’s circumstances.


Through Arcus: The Arcus AWS Management Portal (AAMP) provides an easy way to sign up for a new account, or bring in an existing account, under the GÉANT IaaS framework. Sign up today at https://aamp.arcusglobal.com or contact us at [email protected] or +44 (0)1223 911 841.

• If you go to https://aamp.arcusglobal.com, a simple registration process will guide you through sign-up.
• The process even offers you an easy way to enter your existing 12-digit AWS account ID to speed up bringing an existing account on board without downtime or impacting your current configurations.
• Don't worry if you don't have an account, as we will create one for you if you leave this blank!
• When your accounts are live and in the system, you will be able to see how your account spending tracks against the budgets that you set.
• You will also be able to look at the daily spend across your accounts to get a clearer picture of usage and identify any spikes or unexpected activity.
• The forecast function provides a view of potential future costs.
• Combined with simple, per portal user, configurable spend threshold alarms and notifications, you can easily keep track of costs and make sure that you do not go over budget!


Through Sparkle: In order to use Sparkle's AWS services, your organization (NRENs, education and research institutions) must have a call-off agreement, as defined within the GÉANT framework agreement. Please contact our commercial representative at GEANT-[email protected] to register your organization. The steps necessary to provision user access to AWS cloud services are:

1. The organization's designated representative requests the creation of one or more AWS accounts via email to [email protected]. In the near future, account registration and provisioning will be performed through Sparkle's cloud automation portal, which is under development.
2. Your Identity Provider (IdP) will be enabled for your organization to allow eduGAIN federated access for your designated users. The IdP information (metadata) for your organization will be configured in Sparkle's authentication servers.
3. The administrator at the user's organization must authorize users to access the IaaS resources. The administrator first needs to consider the structure of the different roles within the AWS accounts (administrator, budget holders, project head, users, etc.).
4. Users can now access Sparkle's authentication portal for AWS at http://aws.cloud.tisparkle.com/.
   a. First, the user selects her Identity Provider and clicks Login.
   b. The Login page appears. She uses her organization's authorized login and password.
   c. She is logged into the proper AWS Console with the associated IAM roles. (A request is submitted to her administrator if she is not yet authorized.)


2.3 Getting a new AWS Account directly with AWS Follow the steps below to create a new account. At the last step, you can stop before supplying a credit card, then register with the AWS Research Cloud Program and switch to invoice-backed billing. See Table 2 above.

To get started, follow the link below to the AWS registration page: [AWS Registration]

Enter your work email address. It's important to note the following:
1. You must use your institutional email address (@*.ac.uk or @*.edu, for example). Please don't use a personal email address that fails to identify your institution or employer.
2. You cannot use the same address twice for two different AWS accounts. However, a new AWS account is completely separate from any Amazon online shopping account with the same email address: the passwords, multi-factor authentication settings, and payment methods are all separate.
Finally, be sure to select "I am a new user."


Next, you'll need to provide your name and your contact details at your institution. This information will be important later on, when you register for the Research Cloud Program and request invoice-backed billing.

When you've finished reviewing the terms of our customer agreement, check the box to accept and click "Create Account and Continue." Your account will be created using the email address you typed in. To avoid using a credit card entirely, you can stop the account creation process at this point.

Next, to find your account ID, close the sign-up page and log in to the AWS console at https://aws.amazon.com/console with the account information you just provided:
1. Click on your name in the menu bar (top right).
2. Select "My Account."
3. You'll find your 12-digit account ID in the first section of the dashboard. Record this number below.


Finally, go to https://aws.amazon.com/rcpstep2 and complete the registration.

You will be contacted shortly to confirm invoice billing and data egress waiver discounts.

Checkpoint
• Have you created an AWS account using your institutional email address and contact details?
• Have you found your 12-digit root account ID number? If so, record it here:
• Have you registered for invoice-backed billing and the Data Egress Waiver?

2.4 The Root Account
The first AWS account you create (either with your credit card or invoice-backed) is what we refer to as your root account. We recommend that the root account's registered email address be a shared mailbox, if possible: having more than one person in a position to react to unforeseen events is good practice. We also strongly recommend that you not use this master account for anything other than billing. Instead, we recommend that you set up individual user logins for day-to-day usage of AWS. These are known as AWS Identity and Access Management (IAM) logins (see below). IAM logins are useful since they protect your billing information, and you can create (and delete) new ones at will. They also allow you to place restrictions on the AWS services that can be launched. It's best to get into the practice of creating IAM logins for everyone (including yourself) and preserve the root account console for budgeting and billing activity only.
As an example, suppose that you have a lab with three postdocs: one computational biologist who needs access to robust computing resources, and two bench scientists who only require access to storage and a few web apps. In this case, you can create:
• One IAM login for yourself that is able to view billing data and usage reports.
• One IAM login for your computational biologist that is able to launch a compute cluster via a command line interface or the AWS Web Console. It also has read, write, and delete access to all your data resources on our object storage service, Amazon S3 (we will cover Amazon S3 later).
• Two IAM logins that are only able to read and write (but not delete) data on S3. Since these users only need to use the web applications, and not manage any underlying computational resources, they don't need IAM access to compute resources.
In the following section we will walk through an example of setting up IAM logins for yourself and for others.

2.5 Creating IAM Logins
You create IAM logins and set the permissions for your AWS account using the IAM console. We have a whitepaper that describes several scenarios in detail13, but for now, we'll guide you through creating an Administrator IAM login for yourself, as well as any additional user logins for people who need to access resources that you control, such as researchers in your group or collaborators.
First, open your browser, navigate to the AWS Management Console (https://aws.amazon.com/console), and log in using your root account credentials. Under the AWS Services menu, look for "Identity and Access Management." In the navigation pane, choose "Users." Click on the "Create New Users" button at the top and begin entering login names for your users (including a login for you).

13 http://aws.amazon.com/whitepapers/setting-up-multiuser-environments


Figure 9 – Creating Users with IAM.

Once you've created these users, you can (and should) do a few key things for each of the people you're giving access.
1. Download their credentials and pass these on to them. The credentials are important digital keys that allow your users to use client-side scripts and API tools (scripts on their laptops, for example) to create or remove resources in the cloud. They're also essential for accessing AWS resources via the AWS CLI, such as managing EC2 instances, using the Amazon S3 management tools, and more. Each user is identified by their own credentials, so treat them as secrets and share them only with their owner. Also, be aware that you are only able to download these credentials once (i.e., if you lose them for some reason, you'll need to generate a completely new pair for your forgetful IAM user).

We recommend that you create a file ~/.aws/credentials on your local computer (this works on Mac and Linux) containing the Access Key and matching Secret Key as follows:

[default]
aws_access_key_id = SDFxxxxxxxxxxxxxF89
aws_secret_access_key = 345xxxxxxxxxxxxxxxxxx3454/sdfsxxxxxxxx/8

This file will be used by the AWS command line tools (see below for more on that) and by tools like CfnCluster (see Chapter 7) that use API requests to act on your behalf. Placing your access and secret keys in a credentials file, and/or assigning them to variables ($ACCESS_KEY_ID, $SECRET_ACCESS_KEY) in your .bashrc or .bash_profile environment configuration, is recommended best practice, rather than placing them directly in scripts. Also, we recommend not disclosing these keys in any shared documents, scripts (e.g., on GitHub), or presentations. (We have even x'd out the real keys in this example.)

2. Individually allocate some permissions to each IAM user by clicking on the user name in the list under the "Users" pane and selecting the "Permissions" tab.
   a. You add permissions by "attaching policies" to a user.
   b. While you can create a custom policy, you probably want to use existing policies from the (quite exhaustive) catalog available, since these cover most scenarios you're likely to face while you're getting started.
   c. Almost all your team will probably need an Amazon EC2 policy (for starting instances) and an Amazon S3 policy (for creating buckets and storing data).
   d. For your own account, you should add the "AdministratorAccess" policy to make sure you retain full control.
   e. For others' accounts, we suggest following a principle of least privilege, only adding policies for resources on an as-needed basis. Once users have earned your trust, you can extend their privileges if you need to.
   f. For multiple users with the same access permissions, you can assign them to Groups, which can make managing the permissions for those individuals much easier.

3. Generate passwords for some users. If an IAM user requires access to the AWS Management Console (you'll definitely need this for your own IAM login), then you'll need to generate them a password. This is done under the "Security Credentials" tab when you click on an individual user login in the IAM console and is in the section called "Sign-In Credentials." (If you prefer to script steps 1-3, a sketch using the AWS SDK for Python appears just before the next checkpoint.)

2.5.1 Naming Your Console
In the IAM console, you'll notice that you have the opportunity to customize the URL or link that your IAM users will use to sign in to the AWS Management Console. It's a good idea to do this now so you can circulate something to your team that is easy to remember. It needs to be unique, so common words like "lab" are probably already taken. You'll need a little imagination (e.g., try something like "Einstein-Lab").
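For those who prefer to script these steps instead of clicking through the console, here is a minimal sketch using the AWS SDK for Python (boto3). It assumes boto3 is installed and that the keys in your ~/.aws/credentials file belong to a login allowed to administer IAM; the user name, password, and policy shown are placeholders for illustration, not values from this handbook.

    import boto3

    iam = boto3.client("iam")  # picks up keys from ~/.aws/credentials automatically

    # Step 1 equivalent: create the login and its API credentials.
    iam.create_user(UserName="alice")                          # hypothetical user name
    keys = iam.create_access_key(UserName="alice")["AccessKey"]
    print("Access key:", keys["AccessKeyId"])                  # hand these over once, securely
    print("Secret key:", keys["SecretAccessKey"])

    # Step 2 equivalent: attach an existing AWS-managed policy.
    iam.attach_user_policy(
        UserName="alice",
        PolicyArn="arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
    )

    # Step 3 equivalent: give the user a console password, forcing a change at first login.
    iam.create_login_profile(
        UserName="alice",
        Password="ChangeMe-Initial-Password-2017",             # placeholder; rotate immediately
        PasswordResetRequired=True,
    )

The same operations are available from the AWS CLI (aws iam create-user, aws iam attach-user-policy, and so on) if you prefer shell scripts.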

Checkpoint
• Have you created an IAM login for yourself with Administrator privileges?
• Have you downloaded your IAM user's security credentials (Secret Key and Access Key ID) and stored them safely?
• Have you customized the name of your console login address?


• Have you created a password so you can log in to the console?

2.5.2 Billing and Cost Permissions for Your IAM Login
In order to be able to manage billing and budgets from the IAM login you just created for yourself, a few final steps are necessary. If you don't carry out these steps, you'll only have access to your billing and budgeting from the root account. We don't recommend this, since as a best practice you should not be logging in to the root account regularly.

2.5.2.1 Enabling IAM Control over Billing Features
To activate IAM user access to the Billing and Cost Management console:
1. Sign in to the AWS Management Console with your root account credentials (the email address and password that you used to create your AWS account). Don't sign in with your IAM user credentials.
2. On the navigation bar, choose your account name, and then choose My Account.
3. Next to "IAM User Access to Billing Information," choose "Edit."
4. Select the "Activate IAM Access" check box to activate access to the Billing and Cost Management pages.
5. You can now use IAM policies to control which pages a user can access.
By default, only IAM logins with either full access (Administrator privileges) or specifically itemized access to the billing functions will be able to access the Billing and Cost Management features. Since your own IAM login (which you created above) has Administrator privileges, that account will automatically have permission to access the billing and cost management functions.

2.5.2.2 Defining a Lab Policy to Control IAM Access to Billing Features
It's likely that other people in your lab or research group will need access to Administrator-like functionality in order to do the day-to-day work of supervising others, but you'd like to restrict their access to billing details. To cover a number of permutations around these preferences, we've placed some detailed sample IAM access control policies on the web here: http://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/billing-permissions-ref.html. You can apply these to IAM users in your group. To do this, we'll show you how to create your own managed policy so you can create and manage it centrally and apply it to several users, as you choose.
1. Navigate to the page linked above and find the policy that best matches your needs. There are nine examples to pick from, and they're described on the page.


2. Open your AWS Management Console (http://aws.amazon.com/console) and go to the IAM console (under Services).
3. Select "Policies," choose "Create Policy," and select the "Create Your Own Policy" option.
4. Paste the policy text into the large editor window.
5. Give your policy a name and enter some description text that's meaningful to you and others in the lab, like in the example below. The name must only be text, with no spaces or special characters. Putting the name of your lab or group at the front of the policy name can be helpful for finding it later.
6. Once you create this policy, you have a privately managed policy you can reuse each time you need to apply it to a user as people come and go from your research group.

Figure 10 - Creating your own managed policy.

2.5.2.3 Applying Your Lab Policy to Specific Users
Now, you just need to apply this policy to a specific user (one or more):
7. In the IAM console, choose "Users" from the left-hand menu.
8. Select the user you want to apply the new permissions to, and click on their "Permissions" tab.
9. Click on "Attach Policy."
10. Find your policy in the (very long) list by searching for your lab name in the search box.
11. Select the policy and click "Attach Policy."
If you ever need to modify this policy, you can do so from the "Policies" area of the IAM console, and this means you'll only have to modify it once to have the changes apply everywhere.
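For labs that manage this with scripts rather than the console, the sketch below shows the same idea in boto3: it creates a customer-managed policy from a JSON document and attaches it to a user. The deny-billing statement, policy name, and user name are illustrative placeholders, not one of the nine documented examples verbatim.

    import json
    import boto3

    iam = boto3.client("iam")

    # A simple policy document that denies access to the billing console pages.
    policy_document = {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Deny", "Action": "aws-portal:*", "Resource": "*"}
        ],
    }

    # Create the managed policy once...
    response = iam.create_policy(
        PolicyName="EinsteinLab-DenyBilling",        # prefix with your lab name to find it later
        Description="Blocks access to Billing and Cost Management pages",
        PolicyDocument=json.dumps(policy_document),
    )
    policy_arn = response["Policy"]["Arn"]

    # ...then attach it to as many users as you need.
    iam.attach_user_policy(UserName="bob", PolicyArn=policy_arn)   # hypothetical user name

Because the policy is managed centrally, editing it once changes the effective permissions for everyone it is attached to, exactly as with the console workflow above.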


Checkpoint
• Have you enabled IAM access to billing so you (and potentially other Administrators you permit) can monitor billing and manage budget features?
• Have you created a lab policy to restrict access to billing features for specific users?

2.5.3 Permissions Structure for a Research Team
The diagram below illustrates an example permissions structure for a research group. The green person is the Principal Investigator (PI) or administrator, and she has full access to the AWS account. She creates the three grey users, each of whom has access to different resources. For example, the teaching assistant can fully use all the services, such as compute and machine learning, but he can't see the billing information. The grad students can create compute instances (which incur costs), but they can't access data buckets that belong to other group members. Finally, undergrad students—who only need to do some course work on AWS—can access only the compute instances that were created by the teaching assistant and are tagged with the "student" label, but they cannot create new instances (which would incur extra costs). It's possible to create quite complicated permissions structures if you need to, but for now, we'll settle for the structure you've just created.


Figure 11 - Example of a permissions structure for a research team. Administrator (PI): can do anything. User A (Teaching Assistant): can use any AWS services, but cannot access billing information. User B (grad student): can create EC2 instances, can access only the data in his personal S3 buckets. User C (undergrad students): can access (but not stop or create) EC2 instances tagged as “student”.

Additional tutorials and documentation are available online:  Introduction to IAM: http://docs.aws.amazon.com/IAM/latest/UserGuide/introduction.html  Our recommended best practices: http://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html 2.5.4 IAM groups User privileges can also be managed by creating IAM user groups and attaching policies to the group. Each user who is a member of the group then automatically inherits the associated policies. If you edit or remove the group policies in the future, the changes will apply to all group members. You can also add or remove IAM users to and from groups when their job roles change. For example, in the previous example we could have created groups named “TAs”, “grad students”, and “undergrad students”, and then added each user to the relevant group(s).
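As a rough sketch of the same idea in boto3 (the group name, policy, and user names are illustrative only), creating a group, attaching a policy to it, and adding members looks like this:

    import boto3

    iam = boto3.client("iam")

    # Create a group and give it EC2 permissions via an AWS-managed policy.
    iam.create_group(GroupName="grad-students")
    iam.attach_group_policy(
        GroupName="grad-students",
        PolicyArn="arn:aws:iam::aws:policy/AmazonEC2FullAccess",
    )

    # Every member of the group inherits the attached policies.
    for user in ("bob", "carol"):                    # hypothetical user names
        iam.add_user_to_group(GroupName="grad-students", UserName=user)

Removing a user from the group later (remove_user_from_group) immediately withdraws the inherited permissions, which is why groups are often easier to manage than per-user policies.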


Figure 12 - Creating a New IAM Group.

2.5.5 Multi-Factor Authentication
Now that you've created an IAM login for yourself that can manage all your cloud permissions (thanks to its Administrator privileges), it's time to lock down your root account just a little more. We recommend that you configure Multi-Factor Authentication (MFA) to protect your AWS resources. MFA adds extra security because it requires someone logging in to enter a unique authentication code from an approved authentication device or SMS text message when they access the AWS Management Console or services. You can add MFA to any or all of your IAM logins, but at the very least, it's important that you enable MFA for your root account, since this is your last resort if anything goes wrong and, as you know, your root account has complete privileges to everything in your cloud account (including your billing details). To set up MFA, follow the steps in the MFA guide, which will also help you choose which secondary method you'll use for your MFA: http://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_mfa.html.

Checkpoint
• Have you configured MFA for (at least) your root account?
• Have you logged out from the root account in the console? From now on, you should use your IAM login only.



2.6 AWS Organizations
So far we have set up your AWS account so that multiple people can use it, each with their own set of permissions. There is a higher-level framework to control access policies and billing across multiple AWS accounts: AWS Organizations. This is useful at the departmental or campus level. If your campus uses AWS Organizations, they may request that you "link" your AWS account to the Organization. You can find more information at https://aws.amazon.com/organizations/. For a regular research group, it's probably easier to keep track of multiple research projects using IAM users, "tags", and separate "budgets" (see Chapter 3) within your AWS account, rather than using multiple accounts and an Organization.

2.7 Setting up SSH Access Keys
It's likely that you'll soon be creating Amazon EC2 compute instances to run things on, and also very likely that you'll want to log in to these instances using SSH. This requires registering an "SSH key pair" with the Amazon EC2 service, which we explain below. There are a few things to know before you start:
• Keys belong to AWS Regions – So carefully select your favorite region from the pull-down menu in the top right corner of the console. We won't distribute your keys to the other regions (remember, we never copy your data without you actively being involved).
• You can use an existing SSH key from your personal desktop – This is probably the easiest choice, since it'll save you a lot of typing as time goes by. If you use a Mac or Linux desktop, check in your ~/.ssh directory to see if you have an existing key pair. If you don't, you can create one by running the command ssh-keygen -t rsa. Doing this just once will create a key pair you can use all the time.
• Once generated, we only store the PUBLIC key in the cloud – This is the nature of public key encryption. Only you have the private key, and so only you (or those with access to your account on your desktop) can access the resources that are secured with these keys. So again, be careful with these keys.
• You may want to use separate SSH keys for each IAM user with access to your EC2 service. This will prevent users from interfering with each other's EC2 instances, and helps control who owns what.

2.7.1 Generating or Importing Your SSH Keys for Use in AWS
To get set up with your Amazon EC2 SSH keys, log in to the AWS Management Console (https://aws.amazon.com/console) using your IAM login credentials and navigate to the Amazon EC2 console, which looks like this:


Figure 13 - EC2 Dashboard in the AWS Management Console.

Inside the Amazon EC2 console, click on “Key Pairs” under “Network & Security.” You’ll find something like this:

Figure 14 - Import an existing key or create a new one.

Now you need to choose between importing an existing key from your day-to-day desktop or creating a new key that you’ll use for AWS separately. To import an existing key from your desktop (Mac or Linux):


1. If you use an existing key pair, click on "Import Key Pair" and paste in the content of your PUBLIC key, which you'll find in your ~/.ssh directory on your desktop in a file probably named something similar to id_rsa.pub.
2. You'll need to name your key—something meaningful will help a lot.
3. When you use SSH to connect to your instances that are secured with these credentials, you don't need to do anything special: you'll be securely granted entry to your systems, since the Amazon EC2 instance will recognize your key. For example: ssh [email protected]

To generate a new, separate key (Mac, Linux or Windows users):
1. Click on "Create Key Pair."
2. Give it a name—again, it's best to use something meaningful.
3. AWS will send a download file with a *.pem extension to your browser. You'll need to store this file somewhere safe, since it's not only your credentials and your private encryption key, it's also irreplaceable—if you lose it, AWS cannot give you another copy, since we don't retain your private key.14
4. From now on, when you use SSH to connect to instances that are authenticated using this key, you'll need to tell SSH which *.pem file to use by adding "-i /path/to/my/keyfile.pem" to your SSH command line, for example: ssh -i ~/.ssh/my-ec2-private-key.pem [email protected]
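Both operations can also be scripted. The sketch below uses boto3; the key names, public key path, and region are placeholders you would change, and it simply illustrates the same import-or-create choice described above. Remember that key pairs are stored per region, so the region you pass matters.

    import boto3

    ec2 = boto3.client("ec2", region_name="eu-west-1")    # pick your favorite region

    # Option 1: import the public half of an existing desktop key pair.
    with open("/home/me/.ssh/id_rsa.pub", "rb") as f:     # placeholder path
        ec2.import_key_pair(KeyName="my-desktop-key", PublicKeyMaterial=f.read())

    # Option 2: have AWS generate a new key pair and save the private key locally.
    new_key = ec2.create_key_pair(KeyName="my-ec2-key")
    with open("/home/me/.ssh/my-ec2-key.pem", "w") as f:
        f.write(new_key["KeyMaterial"])                   # keep this safe; AWS keeps no copy

On Mac or Linux, also run chmod 400 on the saved .pem file before using it with ssh -i.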

The full details about your keys are in the Amazon EC2 User Guide15.

2.7.2 Managing SSH Access for Windows Users
Since Windows versions prior to Windows 10 do not natively include an SSH client16, the following instructions explain how to connect to your instance using PuTTY, a free SSH client for Windows. You can download PuTTY at http://www.chiark.greenend.org.uk/~sgtatham/putty/. As you generated a key pair in the previous section, you'll have a ".pem" file stored on your desktop (in a safe place). PuTTY does not natively support this file format; however, PuTTY has a tool named PuTTYgen, which can convert keys to the required PuTTY format (.ppk). You must convert your private key into this format (.ppk) before attempting to connect to your instance using PuTTY. To convert your private key:
• Start PuTTYgen (for example, from the Start menu, click All Programs > PuTTY > PuTTYgen).

14 If you lose a key, you can of course create a new one. But you will lose access to any EC2 instances associated with the lost key. 15 http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html 16 If you have Windows 10, you’re in luck. This version of Windows includes a BASH shell, allowing you to use the Linux SSH instructions and forget all about PuTTY.


• Under “Type of key to generate,” select “SSH-2 RSA.”

• Click Load. By default, PuTTYgen displays only files with the extension .ppk. To locate your .pem file, select the option to display files of all types.

• Select your .pem file for the key pair that you specified when you launched your instance, and then click Open. Click OK to dismiss the confirmation dialog box.
• Click Save private key to save the key in the format that PuTTY can use. PuTTYgen displays a warning about saving the key without a passphrase. Click Yes. Note: A passphrase on a private key is an extra layer of protection, so even if your private key is discovered, it can't be used without the passphrase. The downside to using a passphrase is that it makes automation harder, because human intervention is needed to log in to an instance or copy files to an instance.
• Specify the same name for the key that you used for the key pair (for example, my-key-pair). PuTTY automatically adds the .ppk file extension.
Your private key is now in the correct format for use with PuTTY. You can now connect to your instance using PuTTY's SSH client. For more on how to use PuTTY with AWS, see http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/putty.html.

2.7.3 Default Usernames on Amazon EC2 Instances
When a new Amazon EC2 instance is spun up from standard AMIs, which you'll find in AWS Marketplace or our own repositories, you'll need to log in initially using standard, predefined login names specific to the AMI you used. Some of the most common login names:
• For an Amazon Linux AMI, the user name is ec2-user.
• For a RHEL5 AMI, the user name is either root or ec2-user.
• For an Ubuntu AMI, the user name is ubuntu.
• For a Fedora AMI, the user name is either fedora or ec2-user.


• For SUSE Linux, the user name is either root or ec2-user.
Otherwise, if ec2-user and root don't work, check with the AMI provider. Once you are logged in to the EC2 instance, you can create additional Linux users (if it's a Linux EC2 instance) or Windows users (if it's a Windows instance). But the default ec2-user account suffices for most straightforward use cases.

Checkpoint
• Have you created an SSH key pair for your IAM login?
• Have you saved your *.pem file to a known and safe place on your desktop OR imported an existing SSH public key into AWS?


3 Budgeting

There are a few key tools to help with budgeting and controlling your costs on AWS.

3.1 Setting Budgets in your AWS Account
Working in the cloud with a pool of variable-cost infrastructure isn't hard if you keep tabs on your usage with the AWS Budget tools—preferably ahead of time. In your AWS Management Console, you'll find a menu at the top right that contains an item called "Billing & Cost Management."

Figure 15 - AWS Billing and Cost Management.

In the AWS Billing and Cost Management dashboard are a number of tools that give you a great degree of control of your spending at a very granular level, but there are three specific panels that we think you need to pay attention to:
1. Main Billing Dashboard – Shows the total spend in the previous month, in the current month to-date, and projected for the end of the current month. There's also a high-level breakdown of services you're using, based on the service type (i.e., for storage, for Amazon EC2 compute, etc.).
2. Cost Explorer – A graphical tool to help you dig deep into the finer details of where you're spending your money.
3. Budgets – A way to set alerts that will raise alarms (for example, via email) when your actual spending (or projected spending) exceeds metrics you set.
In the following sections, we'll drill into these a little, so you know where to find key details, and we'll guide you through setting up your first budget so you can make sure your funds are safe. Before we get too far into the details, let's discuss how to establish a budget baseline.

3.1.1 Establishing a Budget Figure
Imagine you have a two-year grant for $10,000 to apply to computing resources. You could buy a workstation, but given that you are human and you're only likely to spend at most eight or so hours a day in front of it, you realize that the workstation will spend 80% of its life running a screensaver or turned off.


So you decide to use the cloud, and you know that some days your use will be minimal, other days it'll be more serious, and sometimes you'll try different types of resources. How do you come up with a budget figure? The simplest approach, while you're getting used to your own workflows, is simply to divide the $10k by 24 months. This gets you a consumption figure of a little over $400 per month. You'll very likely discover that there are some months when you'll use a lot less than this (teaching duties, conferences, and meetings). That's okay, because unspent funds remain in reserve for next month, when you're back and you want to pick up the pace or take advantage of this savings windfall and double the size of the simulation you run.

3.1.2 Billing & Cost Management Dashboard
The Billing & Cost Management Dashboard shows the total spend in the previous month, in the current month to-date, and projected for the end of the current month. There's also a breakdown of services you're using, split by service type. The dashboard screen below depicts a fairly typical example of a researcher who's busy developing a suite of HPC software. Her main activity is in Amazon EC2 compute instances, but she has also chosen to subscribe to Business Support and Developer Support. Her data transfer costs are minimal, and her storage charges are so small that they disappear into "other." This researcher spent $551.80 in her last month's billing, and is on track to spend $771.90 this month, based on her current usage and projected activity.

Figure 16 - AWS Billing and Cost Management Dashboard.

The “Bill Details” button in the top right corner shows you the breakdown by service.


Figure 17 - AWS Bill Details.

Opening a specific service, e.g. Amazon EC2 (“Elastic Compute Cloud”), shows detailed charges and which instances in which regions generated them.

Figure 18 - Granular AWS Bill Details.


3.1.3 Cost Explorer
For a more visual way to understand your AWS usage, go to the Cost Explorer from the main Billing & Cost Management Dashboard. There are predefined reports that should prove useful right away and options to create new ones specific to your needs.

The "Monthly costs by service" view shows you a breakdown of your last few months' spend, highlighting the services that generated the most cost. As you'll see in most of these views, you can download the detailed data as a Comma-Separated Values (CSV) file in case you want to explore the data, compare usage growth in specific areas, or perform other analysis (or even reporting). "Daily costs" will give you your daily total consumption across all services going back several months. In the example shown here, you can clearly see the effect of a summer recess on usage, as well as particular days when activity picked up.

By adjusting the controls to the right side of the graphs, we can ask the expenditure forecast to estimate more than a month ahead. You'll notice that these estimates are provided with 80% and 95% confidence intervals.

There are several more features that you might want to explore, particularly in a busy lab or group with multiple research projects underway.
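If you would rather pull these numbers into your own analysis scripts than download CSV files, the Cost Explorer API (the "ce" client in boto3), where it is available on your account, exposes the same data. The sketch below uses illustrative dates and asks for daily unblended costs grouped by service; treat it as a hedged example rather than a prescribed workflow.

    import boto3

    ce = boto3.client("ce", region_name="us-east-1")   # Cost Explorer is served from us-east-1

    response = ce.get_cost_and_usage(
        TimePeriod={"Start": "2017-09-01", "End": "2017-10-01"},   # illustrative date range
        Granularity="DAILY",
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
    )

    # Print one line per day and per service, e.g. "2017-09-04 Amazon Elastic Compute Cloud 12.34".
    for day in response["ResultsByTime"]:
        for group in day["Groups"]:
            service = group["Keys"][0]
            amount = group["Metrics"]["UnblendedCost"]["Amount"]
            print(day["TimePeriod"]["Start"], service, amount)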


Tagging is a powerful AWS feature. You can assign a tag to almost anything, e.g., a specific Amazon EC2 instance, an Amazon S3 bucket, or a collection of Amazon EBS volumes. A tag has a name and a value. With Cost Explorer (and also with budgets—see below), you can create reports that single out the cost associated with a specific tag, which could be useful for telling the difference in Amazon EC2 usage between, say, Bob's project and Alice's, or between single-instance workloads and clusters.
• You can find more about tagging here: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Using_Tags.html.
• You can find out how to customize your Cost Explorer views (including using tags) here: http://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/cost-explorer-tasks.html.

3.1.4 AWS Budgets
A budget is a way to plan your IT usage and your costs and to track how close you are to exceeding your budgeted amount. AWS Budgets can send you notifications if your spend—or your projected spend—goes outside your budgeted amount. Budgets use data from Cost Explorer. Budgets is accessible via the left-hand menu in the Billing & Cost Management Dashboard.

3.1.4.1 Creating a Budget
Creating a budget is easy; you just need to know what you want to measure. This can refer to a specific service or to total spend. We recommend first creating a budget that measures all services. The example to the right sets a monthly budget of $400. Be sure to set the end date of the budget far into the future.

Notifications are important. You can set notifications to alert you when either actual spend or projected spend breaches the limits you set as a percentage of your budget. This example sets an email alert when forecasted costs exceed 90% of the budget. Make sure the notifications go to an email address you'll actually check!


In the same budget, we create another alert (you can and should have several). This alert triggers an email when actual costs exceed 75% of the budget. Again, in the same budget, we create a third alert that triggers when we’ve actually spent all of our budget. This alert triggers an email when actual costs exceed 100% of our budget.

Budgets and costs get updated by our billing systems several times a day, so you can be reasonably fine-grained about the alerts you put in place. You can (and should) set quite a few alerts initially. A practice we've seen is to set up email notification alerts for when actual costs exceed 10%, 20%, 30% … 90%, 100%. In the first few months, you may get too much email (in which case, you can remove some of the alerts to make the granularity coarser), but you will gain some familiarity with and understanding of what your usage pattern is. Once you become familiar with your usage, you can slow down the rate of alert triggering and only watch for significant jumps, like forecasted costs exceeding 90% or more of your budget.
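Budgets and their alerts can also be created programmatically. Below is a minimal, hedged sketch using the boto3 Budgets API; the $400 monthly figure and the thresholds mirror the example above, while the budget name and email address are placeholders.

    import boto3

    account_id = boto3.client("sts").get_caller_identity()["Account"]
    budgets = boto3.client("budgets", region_name="us-east-1")

    def alert(notification_type, threshold):
        # One email alert, e.g. ACTUAL costs above 75% of the budget.
        return {
            "Notification": {
                "NotificationType": notification_type,   # "ACTUAL" or "FORECASTED"
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": threshold,                   # percentage of the budget
            },
            "Subscribers": [{"SubscriptionType": "EMAIL", "Address": "[email protected]"}],
        }

    budgets.create_budget(
        AccountId=account_id,
        Budget={
            "BudgetName": "monthly-research-budget",      # placeholder name
            "BudgetLimit": {"Amount": "400", "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        NotificationsWithSubscribers=[
            alert("FORECASTED", 90.0),
            alert("ACTUAL", 75.0),
            alert("ACTUAL", 100.0),
        ],
    )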

Checkpoint
• Have you created a monthly budget?
• Have you created a budget alert that notifies you that your forecasted cost will exceed your budget?
• Have you created a budget alert that notifies you that your actual cost will exceed your budget?

3.2 Optional: the Budget Safety Switch
Email alerts from AWS Budgets are a great start to managing your spending in the cloud, since you'll become familiar with the pace of your activities and how it impacts your budget. It's possible to go one step further and put in place an extra layer of safety: if AWS Budgets detects your 100% budget being breached, it can trigger a script that will shut down all of your Amazon EC2 server instances, and it's these instances that are generally responsible for most of your cost. (In technical AWS terminology, the EC2 instances are "Stopped", not "Terminated".) In the sections below, we'll show you how to configure this arrangement. We're aware that a lot of our research customers would prefer to have this mechanism in place and face the chance of losing some data, rather than risk losing their grant in the case that they fired up more resources than they thought and left them running. It's up to you to decide if you want to implement this Budget Safety Switch.

3.2.1 A Word of Caution
First, some words of caution to make sure you understand the nature of these alerts and the potential impact of this automated action.
• Automatically shutting down a group of Amazon EC2 instances will likely cause your applications to crash and potentially lead to data loss. It's hard to predict what your local file system might look like upon recovery.
• While the sample scripts we outline below will shut down the instances themselves, they won't delete any attached storage volumes (like Amazon EBS). The cost of Amazon EBS storage is low, but it's non-zero, so you'll need to decide what to do with these volumes. While you're deciding, they will still cost you some money. You might want to snapshot them into Amazon S3 or copy their data to Amazon Glacier before deleting them. Amazon S3 is a lot cheaper than Amazon EBS and is a great staging area for this data while you figure out what to do next. Glacier is even cheaper for long-term archiving of data.
• We recommend setting the alert to be triggered when actual costs exceed 100% of your budget. If you followed our advice above and set up several budget alerts, then by the time your actual costs are approaching 100%, you'll have been told so by several email alerts and you'll have had time to take preventative action before this failsafe gets triggered.
Finally, to reiterate: setting up an unattended script to turn off your compute instances comes with risks, the most significant one being the risk that you might lose some data. AWS will not be responsible for any data loss as a result of this configuration, nor can we be responsible for any residual overspend that stems from the fact that you may be using other services (such as storage or databases) that will continue to generate legitimate charges after your Amazon EC2 instances have been stopped. Nonetheless, we hope that by providing this method you'll at least be able to react sensibly should you unintentionally overspend and need to scale back your service consumption.

3.2.2 Download the Sample AWS CloudFormation Template
We've written a sample AWS CloudFormation template that you'll need to get a copy of. You can think of AWS CloudFormation as a language for describing collections of cloud resources and how they relate to each other. You can download the sample template from the "Safety Switch" folder here: https://github.com/awslabs/aws-research-cloud
This sample template has a few moving parts:
1. A single parameter called "DebugMode," which can be LIVE or DEBUG. We'll leave it in DEBUG until we're happy with everything.
2. An Amazon Simple Notification Service (Amazon SNS) topic – This refers to a topic that is managed by Amazon SNS, which is a push messaging service connecting services in the cloud with other services, or sometimes with mobile phones or people (via email).
3. An AWS Lambda function – A short piece of Python code that runs when needed, identifies Amazon EC2 instances that are running in any global AWS Region, and then stops them.
When you download the sample AWS CloudFormation template, save it to your desktop and have a look using a text editor to see what it does. The template and the embedded Lambda function (written in Python) have a very straightforward specification. When activated (in our case above, by an SNS notification from an AWS Budget alert), the function will:
• Stop every EC2 instance in every region that does NOT have a tag called "KeepRunning".
• Reduce the target to zero EC2 instances for every auto scaling group17 in every region that does NOT have a tag called "KeepRunning".
You'll note the use of tags here to protect certain instances or auto scaling groups. This means you can protect, say, a single t2 instance running your lab's wiki or website; since a t2 really has a very low cost, it isn't something that will hurt your budget in the short term while you figure out why you overspent.

3.2.3 Uploading to AWS CloudFormation
The next step is to open your AWS Management Console and find the AWS CloudFormation dashboard under "Services" in the Management Tools area.

17 Auto-Scaling Groups (or ASGs) are often used by cluster products to create a number of instances of the same type. They auto-scale in that they have a rule that governs under what conditions the group expands or contracts – usually based on CPU load or the number of jobs in a batch scheduler's queue (in the case of Alces or CfnCluster, for example).


Change your console's region (for now) to "US East (N. Virginia)" – this is the global AWS Region that performs our billing operations. The region picker is at the top right of the console screen. Next, you'll need to create a stack by clicking on the blue button.

Click on “Upload a template” and use the “Choose File” finder to locate the sample template you downloaded earlier to your local desktop.

You’ll be asked to give your stack a name. “EC2BudgetAlarmAction” expresses the seriousness of its role quite well.

Leave DebugMode set to “DEBUG” for now.


We don’t need to tag anything here, so you can just click “Next.”

Finally, you’ll need to click to acknowledge that AWS CloudFormation will create an IAM resource on your behalf.

Once this is done, click “Create.” Once AWS CloudFormation has finished deploying all these resources, its status will change to green and say “CREATE COMPLETE.”
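If you prefer to perform this upload from a script rather than the console, the boto3 sketch below is a hedged equivalent. The file name safety_switch_template.yaml is a placeholder for wherever you saved the downloaded template; the stack name matches the example above, and Capabilities plays the same role as the IAM acknowledgement check box.

    import boto3

    cfn = boto3.client("cloudformation", region_name="us-east-1")   # billing alerts live in us-east-1

    with open("safety_switch_template.yaml") as f:                  # placeholder file name
        template_body = f.read()

    cfn.create_stack(
        StackName="EC2BudgetAlarmAction",
        TemplateBody=template_body,
        Parameters=[{"ParameterKey": "DebugMode", "ParameterValue": "DEBUG"}],
        Capabilities=["CAPABILITY_IAM"],        # acknowledge that IAM resources will be created
    )

    # Wait for CREATE_COMPLETE, then read the SNS topic ARN from the stack outputs.
    cfn.get_waiter("stack_create_complete").wait(StackName="EC2BudgetAlarmAction")
    outputs = cfn.describe_stacks(StackName="EC2BudgetAlarmAction")["Stacks"][0]["Outputs"]
    print(outputs)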

Click on the "Outputs" tab and you'll find a Key/Value pair. The value will have a long string starting with "arn:aws:sns:us-east-1." This is your Amazon SNS topic. When we link this with a budget alert, it'll be able to trigger your new AWS Lambda function to shut down your Amazon EC2 instances. Copy this entire string—you're going to need it in a few minutes.

Checkpoint
• Have you obtained the sample AWS CloudFormation template for the budget safety switch?
• Have you uploaded the template to the AWS CloudFormation console?

3.2.4 Linking Amazon SNS with AWS Budgets Alerts
You now have an Amazon SNS topic that will trigger an AWS Lambda function that shuts down your Amazon EC2 instances and disables Auto Scaling groups. It's time to connect this topic to your budget alert so that any actual costs (as opposed to projected costs) that exceed your monthly budget will cause this chain to kick off. Go back to AWS Budgets (remember, it's in the "Billing & Cost Management" section of your AWS Management Console).

If you defined a budget in the previous sections, then you’ll have created some alerts, which send email to you when your expenditure exceeds various percentages of your budget.


Make sure you have at least one email alert that triggers when actual costs exceed 75% (say) of your budget. Also, it's a good idea to have an alert that emails you when your projected costs exceed 100% of your budget. Both of these are early warnings. Set as many as you like. You can always remove some of them later once you're comfortable with your usage. For your rule that triggers when actual costs exceed 100% of your budget, you'll now add one more thing: your Amazon SNS topic. Paste the long Amazon SNS string that you copied from your AWS CloudFormation console into the "SNS topic ARN" field and click "Verify." This connects this alert to your Amazon EC2 safety switch function. Once you've finished setting up budget alerts, make sure (again) that you have one budget alert that triggers when 100% of actual costs is exceeded and is successfully linked to your Amazon SNS topic. Note: the safety switch function is still in DEBUG mode, so make sure you complete the steps below.

Checkpoint


• Have you linked your Amazon SNS topic to your budget alert that triggers when 100% of actual costs is exceeded?

3.2.5 Testing Your Budget Safety Switch
We now need to go to the AWS Lambda service dashboard, which you'll find under the "Services" menu. In the next couple of steps, we'll have you check your AWS Lambda function so that it performs as expected. When that's complete, you can enable it in LIVE mode. In the AWS Lambda console, you'll see your AWS Lambda function (most likely, it's the only function listed at this stage). Click on the name of the function to enter its test console. Click on the blue "test" button and click "Save and Test" at the bottom of the popup window—you don't need to set any values. After 10 seconds or so, the AWS Lambda function will finish searching all the AWS Regions for Amazon EC2 instances and will tell you which instances it would have stopped if it were running in LIVE mode.
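You can run the same test from a script. The function name below is a placeholder for whatever name CloudFormation gave your function (you can read the real name from the Lambda console or the stack's resources), and an empty test event is enough, just as in the console.

    import json
    import boto3

    lam = boto3.client("lambda", region_name="us-east-1")

    response = lam.invoke(
        FunctionName="EC2BudgetAlarmAction-SafetySwitch",   # placeholder name
        InvocationType="RequestResponse",
        Payload=json.dumps({}).encode(),                    # the test event can be empty
    )
    print(response["Payload"].read().decode())
    # Check the function's CloudWatch logs for the list of instances it would have stopped.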

Checkpoint
• Have you tested your AWS Lambda function?

3.2.6 Enabling the Budget Safety Switch
Now that we've tested the AWS Lambda function and verified our budget is in place, it's time to set the safety switch AWS CloudFormation template to "LIVE" mode so that any future budget alert for exceeding 100% of actual costs will really trigger the shutdown we're after.


Under “Services” in the AWS Management Console, go back to the AWS CloudFormation dashboard. Locate your AWS CloudFormation template for the budget safety switch.

Select your stack by clicking in the check box next to its name, and under the “Actions” menu, select “Update Stack.”

Follow through the screens the same as before. Most things will be greyed out, since you can’t change them.

When you get to the Parameters section, select "LIVE." Once AWS CloudFormation is finished updating the stack, check under "Outputs" to be sure it says "LIVE."

Your safety switch is now in place.
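For those managing the stack with scripts, a hedged equivalent of this console update with boto3 is sketched below (the stack name matches the earlier example; only the DebugMode parameter changes, the template itself is reused):

    import boto3

    cfn = boto3.client("cloudformation", region_name="us-east-1")

    cfn.update_stack(
        StackName="EC2BudgetAlarmAction",
        UsePreviousTemplate=True,                    # keep the template, change only the parameter
        Parameters=[{"ParameterKey": "DebugMode", "ParameterValue": "LIVE"}],
        Capabilities=["CAPABILITY_IAM"],
    )
    cfn.get_waiter("stack_update_complete").wait(StackName="EC2BudgetAlarmAction")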

3.2.7 Protecting Specific Instances from Shutdown
To keep a specific instance running and prevent the Safety Switch from stopping it, give it a tag called "KeepRunning". For example, you may want to keep a super-low-cost t2 instance running a web server regardless, while a bigger and more expensive cluster running CFD simulations would be torn down. To keep an auto scaling group from being reduced to zero instances, likewise give it a tag called "KeepRunning".


In both cases, the actual value of the tag isn't important; only its existence is. You can view and edit tags in the "Tags" tab of the specific instance page, which you'll find through the EC2 console, such as in the example shown below.

Figure 19 - The EC2 console showing the right place to set and edit metadata tags on your instances.
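Tags can also be applied from a script. In the sketch below, the instance ID, Auto Scaling group name, and region are placeholders; note that tagging an Auto Scaling group uses the Auto Scaling API rather than the EC2 API.

    import boto3

    # Protect a single EC2 instance from the safety switch.
    ec2 = boto3.client("ec2", region_name="eu-west-1")
    ec2.create_tags(
        Resources=["i-0123456789abcdef0"],                  # placeholder instance ID
        Tags=[{"Key": "KeepRunning", "Value": "true"}],     # the value is ignored; only the key matters
    )

    # Protect an Auto Scaling group in the same way.
    asg = boto3.client("autoscaling", region_name="eu-west-1")
    asg.create_or_update_tags(
        Tags=[{
            "ResourceId": "my-cluster-asg",                 # placeholder group name
            "ResourceType": "auto-scaling-group",
            "Key": "KeepRunning",
            "Value": "true",
            "PropagateAtLaunch": False,
        }]
    )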

Checkpoint
• Have you changed the AWS CloudFormation template to "LIVE" mode?

As a reminder, this won't stop all costs from accruing, just the most significant ones caused by Amazon EC2, giving you time to inspect the state of your account and make changes.

3.2.8 Recovering from a Budget Safety Switch Shutdown
In any situation where the AWS Budget alert has activated the Safety Switch Lambda function, your first steps should be to look at the CloudWatch logs from the Lambda function itself, to understand what remedial actions were taken, and to check your expenditure (through Cost Explorer and other sources in the Billing Dashboard), so you can understand what costs drove AWS Budgets to activate the function. As we mentioned above, the Lambda function, when activated:
• Stops every instance in every region that does NOT have a tag associated with it called "KeepRunning".
• Reduces the target to zero for every auto-scaling group in every region that does NOT have a tag associated with it called "KeepRunning".
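To make that behaviour concrete, here is a simplified sketch in the same spirit as the Lambda function (it is not the actual code from the template). It stops running instances in every region unless they carry the "KeepRunning" tag; for brevity it omits the Auto Scaling part.

    import boto3

    def stop_unprotected_instances():
        # Stop running EC2 instances in every region unless tagged "KeepRunning".
        regions = [r["RegionName"]
                   for r in boto3.client("ec2", region_name="us-east-1").describe_regions()["Regions"]]
        for region in regions:
            ec2 = boto3.client("ec2", region_name=region)
            to_stop = []
            paginator = ec2.get_paginator("describe_instances")
            for page in paginator.paginate(
                Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
            ):
                for reservation in page["Reservations"]:
                    for instance in reservation["Instances"]:
                        tags = {t["Key"] for t in instance.get("Tags", [])}
                        if "KeepRunning" not in tags:
                            to_stop.append(instance["InstanceId"])
            if to_stop:
                print(region, "stopping", to_stop)      # the real function logs this to CloudWatch
                ec2.stop_instances(InstanceIds=to_stop)

    stop_unprotected_instances()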


You can review the list of instances and services that were stopped as a result of the Safety Switch by viewing the CloudWatch Log events associated with the Lambda function. A link to these logs can be found in the Outputs tab of the CloudFormation stack (which you'll find in the CloudFormation console).

To restore services that were stopped by the Safety Switch Lambda function:
• Switch to the relevant AWS region in the console using the region picker in the top right corner of your AWS Console screen.
• Go to the EC2 services window.
• Within the Auto Scaling Groups section, find the Auto Scaling Groups that were stopped, edit their settings, and change the Min, Desired and Max sizes back to their original values (shown in the CloudWatch Logs output).
• Within the Instances section, find the instances that were stopped, and start them from the console.
• Repeat these steps for each region where you had running instances.

3.3 Amazon EC2 Launch Limits
Your AWS account has limits for how many resources it may use at any given time. For example, there is a limit on how many c4.2xlarge instances you are allowed to have running in the Frankfurt Region. These limits protect you as a new user: e.g., they prevent you from accidentally running up an unexpected bill by launching many instances and not being aware that you need to terminate them when finished. Similar limits exist for all AWS Cloud services.


Figure 20 - Amazon EC2 Launch Limits.

Once you move into the production phase of your research, you may need more instances than this, especially if you have a large workload. This is accomplished by clicking the "Request Limit Increase" link next to the appropriate resource, which raises a ticket with our support team. The account in Figure 20 was originally allowed to create only 20 x c4.8xlarge instances (20 x 18 = 360 physical cores) but later received permission to create up to 150 x c4.8xlarge instances (150 x 18 = 2,700 physical cores, enough for some serious simulations). You have to request limit increases separately for each AWS Region you work in.

3.4 Next Steps
If you've faithfully followed the steps outlined in this section of the book, then you've now established an AWS account with Budget Safety Limits, and you're employing our best practices for instance and data security. So, where to next? We suggest you look at our "Getting Started with AWS" tutorials to begin your journey exploring and experimenting with the kinds of things you can do. The following chapters will also introduce you to additional AWS services and point you to tutorials. For the first 12 months of a new AWS account, you have access to the Free Tier, which helps you to get hands-on experience with our services. After creating your AWS account you can use any of the products and services in the tier, for free within certain usage limits.18 For example, when you launch EC2 instances, you will see whether the chosen instance type qualifies for the free tier before confirming.

18 AWS Free Tier includes offers that expire 12 months following sign up and others that never expire. See https://aws.amazon.com/free/ and AWS Free Tier information for usage and other limits.


4 Working with Data

Your research endeavors will likely collect or create data that needs to be processed, analyzed, or shared. AWS offers a strong solution for data storage – at any scale, whether your needs are very modest or you burn through petabytes faster than you can count them. The durability and availability offered by the standard storage offering, Amazon S3, are outstanding19. Data security and privacy are provided to the highest standards, and AWS is certified for many compliance programs for sensitive data, including HIPAA, ITAR, FISMA, and many more (see https://aws.amazon.com/compliance/). Data locality and sovereignty are the default.
On the other hand, the collaborative nature of research means that you frequently do want to expose data to your peers or to the public. The global footprint of AWS's worldwide regions and network matches the global nature of science, and sophisticated security and access controls allow you to control exactly how and with whom you share data. While your data is only visible to you by default, the AWS Cloud is excellent for collaboration.
A variety of data storage services support the data lifecycle from hot storage to long-term archiving. The cloud paradigm of elasticity means that you never have to provision storage limits, but can rely on AWS to scale the storage hardware to your in-the-moment requirements. The pay-as-you-go model means you don't have to pay for storage that will sit empty until two-thirds of the way into your research project. But above all, AWS offers you a vast toolset of more than 90 services that integrate with storage: services for data analytics, visualization, machine learning, and more help you extract the most value from your data; all the compute power of the Elastic Compute Cloud (EC2) and associated workflow services are at hand to design analysis and compute workflows in a data-centric paradigm. This lets you build things you couldn't build anywhere else.

4.1 Range of Storage Types
As mentioned in section 1.4.2, AWS offers a number of storage services. These storage services are designed to scale in size, so if your prototype experiment proves to be a success, you can quickly scale it up without reformulating your scientific pipeline.

Amazon Simple Storage Service (Amazon S3): Amazon S3 operates like a large and high-performance HTTP (web) server that you can put and get data objects to or from. Amazon S3 is the lowest-cost online storage service AWS offers. Hundreds of third-party tools are designed to work with Amazon S3, and a large amount of the world's Internet activity is backed by it. Like everything in AWS, it was designed with strict security and access controls in mind. For example, Amazon S3 hosts some of the research world's truly large datasets (like the 1000 Genomes dataset or the Sentinel-2 archive), and also some of the most sensitive, such as the restricted-access Cancer Genome Atlas (TCGA) used by researchers working with human genetic data.

19 “11 9’s of durability” means, statistically, that if you store 10,000 objects, you can on average expect to incur a loss of a single object once every 10,000,000 years.


Amazon Glacier: Amazon Glacier is a secure, durable, and extremely low-cost cloud storage service for data archiving and long-term backup. To keep costs low, Amazon Glacier is optimized for infrequently accessed data where a retrieval time of several hours is suitable. Amazon Glacier is an appropriate place to archive research data for the long term once it has finished its immediate usefulness to your work.

Amazon Elastic Block Store (Amazon EBS): Amazon EBS is the most common type of storage for an Amazon EC2 instance. It appears to your instance like a hard disk, though it performs better than an HDD (in fact, you can specify the performance you want) and is more flexible.

Amazon Elastic File System (Amazon EFS): Amazon EFS provides a file system interface and looks like a remote directory and file server to your instances, similar to the shared home directory you find on HPC clusters or in shared research computing facilities. Amazon EFS automatically scales to however much data you pour into it. It's in its element when making massive file systems available to thousands of Amazon EC2 instances at a time. To access your file system, you mount the file system on an Amazon EC2 Linux-based instance using the standard Linux mount command after provisioning it through the Amazon EFS console. Then you can work with the files and directories in your file system just like you would with a local file system.
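As a rough illustration of that mount step (a minimal sketch: the file-system ID, region, and mount point below are placeholders; the exact DNS name for your file system is shown in the Amazon EFS console):

# Install the NFS client (Amazon Linux / RHEL-style; use apt-get on Debian or Ubuntu)
$ sudo yum install -y nfs-utils

# Create a mount point and mount the EFS file system over NFSv4.1.
# fs-12345678 and us-east-1 are placeholders -- copy the DNS name from the EFS console.
$ sudo mkdir -p /mnt/efs
$ sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 \
    fs-12345678.efs.us-east-1.amazonaws.com:/ /mnt/efs

# The shared directory now behaves like a local file system
$ df -h /mnt/efs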

4.2 Storing your Research Data
In Chapter 2 you created an IAM login that can launch and log into resources. Next you need to think about where to put data for input and output of analyses. AWS has lots of storage options, two of which are Amazon EBS and Amazon S3.
Amazon S3 is an object storage service, meaning that it is not a traditional filesystem. Files are considered "objects" in S3, and the full directory path to the object is considered a "key". S3 provides a simple HTTP interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web. This means you can access data outside of EC2, even from your laptop or desktop. Every object in Amazon S3 is stored in a bucket. Before you can store data in Amazon S3 you must create a bucket. We suggest creating different buckets for different categories of data to help you keep organized. You do not get charged for creating a bucket. You are only charged for storing objects in the bucket, paying for the total number of GB you store, for exactly as long as you store it. You have complete, granular control over access permission to your S3 buckets and the data stored therein.
Amazon EBS is like a regular disk drive (or volume) that is attached to your computer, but with a couple of differences. First is that EBS volumes are accessed over the network. You can attach and detach the volume from one EC2 instance and reattach it to another.


The volume will only be directly accessible to a single EC2 instance at any time20. Second, you can take a snapshot, or backup, of the EBS volume and create a new duplicate volume from the snapshot to attach to other instances. EBS snapshots are useful if you have a set of applications or reference data that are needed on more than one running instance. They can also be shared with collaborators using different AWS accounts. The data on an EBS volume are not accessible outside of an EC2 instance. It's common to copy data from S3 to an EBS volume to make it available to the EC2 instance for processing. Any valuable output data can be copied back to S3 for long-term storage, while the EC2 instance and the EBS volume are generally deleted when they no longer serve a purpose.
4.2.1 Creating Your First Amazon S3 Bucket
In the AWS Management Console (https://aws.amazon.com/console), open the Amazon S3 console and click Create Bucket. The "Create Bucket" dialog box appears. Enter a bucket name in the Bucket Name field. The bucket name you choose must be unique across all existing bucket names in Amazon S3. One way to achieve that is to prefix your bucket names with your research group's name. Take some care when naming your bucket: once you create a bucket, you cannot change its name, and the bucket name is visible in the URL that points to the objects stored in the bucket, so make sure the name you choose is appropriate. Bucket names:
- Can contain lowercase letters, numbers, periods (.) and dashes (-).
- Must start with a number or letter.
- Must be between 3 and 63 characters long.
- Must not be formatted as an IP address (e.g., 192.168.5.4).

20 The instance can export it to other instances using NFS, see Chapter 7.


Note: There might be additional restrictions on bucket names based on the region your bucket is in or how you intend to access the object. In the Region drop-down list box, select a region. Click Create. When Amazon S3 successfully creates your bucket, the console displays your empty bucket in the Buckets panel. You can modify your bucket’s default permissions by clicking on the properties of your bucket in the console.
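If you prefer the command line, a minimal sketch of the same steps with the AWS CLI (introduced in section 4.2.2.2 below) might look like this; the bucket name and region are placeholders:

# Create a bucket in a specific region (the bucket name must be globally unique)
$ aws s3 mb s3://mylab-project-rawdata --region eu-west-1

# Inspect the default access control list -- a new bucket grants access only to its owner
$ aws s3api get-bucket-acl --bucket mylab-project-rawdata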

Checkpoint
- Have you created your first Amazon S3 bucket?
- Have you checked the default permissions on your bucket to make sure they match your intended use?

4.2.2 Uploading and downloading data to and from Amazon S3
Now that you've created an Amazon S3 bucket, it's useful to have some tools that you can use to easily push files back and forth.
4.2.2.1 Using a GUI app, for example: Cyberduck
Cyberduck is an example of a popular desktop application that puts a friendly interface on Amazon S3. It's available for Mac and Windows. There are many popular alternatives, e.g. Cloudberry or Transmit. When you created your IAM login earlier, you were required to download a file containing your credentials in the form of an "Access Key ID" and a "Secret Access Key." We recommended that you store them securely in ~/.aws/credentials on your local computer and, as a reminder, they look something like this:
aws_access_key_id = SDFxxxxxxxxxxxxxF89
aws_secret_access_key = 345xxxxxxxxxxxxxxxxxx3454/sdfsxxxxxxxx/8

To use Cyberduck or another client, you’ll need these keys, since they identify you and thus permit you to operate on your bucket with all the privileges you have as its owner.


After entering your Access Key ID and Secret Access Key into Cyberduck, you'll be able to see the contents of your (probably empty) Amazon S3 bucket and easily move files back and forth. Unless you change the bucket's default posture to allow "Everyone" to have view permissions (through the Amazon S3 console), your files can only be read by you; anyone else will get a "permission denied" error. If you intend to use Amazon S3 to share data with collaborators or to make some content public, you'll need to configure Amazon S3 permissions and, potentially, bucket policies. This is explained in section 4.3 below and in the Amazon S3 online documentation21.
Checkpoint
- Have you downloaded and installed an Amazon S3 GUI tool?
- Have you configured your Access Key ID and Secret Access Key in the tool and uploaded your first file?

4.2.2.2 AWS Command Line Interface (CLI)
While GUI-based apps like Cyberduck and Transmit can make life easier, sooner or later you're going to want to install the AWS CLI, a utility that lets you make API calls against your AWS account from the command line terminal. The real power of this is that you can then build scripts or tools to automate your AWS workloads. There are even libraries for many common programming languages22. Before you can use the AWS CLI, you must have Access Keys configured in your environment or in your ~/.aws/credentials file, as we showed you earlier. Installing the AWS CLI on your Mac or Linux machine is as simple as running the following in a terminal23:
$ pip install awscli
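If you haven't already created the credentials file by hand, one way to do it (a minimal sketch; the key values shown are placeholders) is to let the CLI write it for you:

# Interactively writes ~/.aws/credentials and ~/.aws/config
$ aws configure
AWS Access Key ID [None]: SDFxxxxxxxxxxxxxF89
AWS Secret Access Key [None]: 345xxxxxxxxxxxxxxxxxx3454/sdfsxxxxxxxx/8
Default region name [None]: eu-west-1
Default output format [None]: json

# Quick check that the credentials work: list the buckets in your account
$ aws s3 ls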

21 Which is available here: http://docs.aws.amazon.com/AmazonS3/latest/dev/s3-access-control.html 22 See https://aws.amazon.com/tools/ in the "SDKs" section.


The AWS CLI comes preinstalled on most (if not all) Linux instances you might spin up in the cloud when picking from the AWS Marketplace. The online documentation explains how you can use the CLI to control practically anything in AWS that you can do with the console. For example, the Amazon S3 section of the CLI user guide shows how we can create new buckets with a single command:
$ aws s3 mb s3://bucket-name
Here are some examples of uploading files to buckets, as well as a few other file manipulations, so you get the idea of what this command line can do.
# Copy MyFile.txt in current directory to s3://my-bucket/path
$ aws s3 cp MyFile.txt s3://my-bucket/path/

# Move all .jpg files in s3://my-bucket/path to ./MyDirectory
$ aws s3 mv s3://my-bucket/path ./MyDirectory --exclude '*' --include '*.jpg' --recursive

# List the contents of my-bucket
$ aws s3 ls s3://my-bucket

# List the contents of path in my-bucket
$ aws s3 ls s3://my-bucket/path

# Delete s3://my-bucket/path/MyFile.txt
$ aws s3 rm s3://my-bucket/path/MyFile.txt

# Delete s3://my-bucket/path and all of its contents
$ aws s3 rm s3://my-bucket/path --recursive

There is also a “sync” command, which allows you to synchronize a local folder with a remote Amazon S3 bucket, only copying files that are missing or have changed.

$ aws s3 sync . s3://my-bucket/path
upload: MySubdirectory\MyFile3.txt to s3://my-bucket/path/MySubdirectory/MyFile3.txt
upload: MyFile2.txt to s3://my-bucket/path/MyFile2.txt
upload: MyFile1.txt to s3://my-bucket/path/MyFile1.txt

The AWS CLI uses multipart file transfers by default, which generally provide very good performance for copying data in bulk into AWS. Try it out: AWS has 10-minute tutorials on Store and Retrieve a File and on Store Multiple Files using the AWS CLI.

23 See https://aws.amazon.com/cli/ for more details, including the separate installer required if you're running Windows.


4.3 Sharing the data in your S3 bucket
In Chapter 2, you created an Administrator IAM user for yourself, and additional IAM users for the other users of your AWS account. You gave your Administrator IAM user full access to your account, including all your S3 buckets. Sooner or later, you will need to give the other IAM users access to some of your S3 buckets. You may also give access to people working in another AWS account, for example, a collaborator at another university. You may even make some data public to the whole world. If you don't need to share your data yet, just come back to this section when you reach that point. Otherwise, read on.
You already got a first taste of access policies in Chapter 2 when creating IAM users. Here we encounter them again. In a small research group, it could be fine to give all IAM users full S3 access, but if needed, policies can be incredibly granular. You can attach policies to IAM users ("Let this user do anything to any of my S3 buckets" or "Let this user list but not read files in bucket A, let her read and write to bucket B, and don't give her any access to bucket C"). Alternatively, you can attach policies to S3 buckets ("Let accounts x and y write to this bucket", "Requests from this IP range can read only '*.cif' objects from this bucket, while requests from this other IP range can write files to the bucket", or "Make this bucket a static website for the whole world to see"). If no policy is attached to a bucket, it's private by default.

- For a detailed tutorial on granting your IAM users access to S3 buckets, follow http://docs.aws.amazon.com/AmazonS3/latest/dev/example-walkthroughs-managing-access-example1.html.24
- For a detailed tutorial on granting cross-account bucket permissions, follow http://docs.aws.amazon.com/AmazonS3/latest/dev/example-walkthroughs-managing-access-example2.html.

You can skip the first steps in these tutorials since you already created buckets and IAM users. Finally, AWS has a "policy generator" where you can specify which actions you want to allow or disallow, and receive back a matching policy that you can apply to your buckets or users. See http://awspolicygen.s3.amazonaws.com/policygen.html.
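To make the idea concrete, here is a minimal, hypothetical bucket policy (bucket name and account ID are placeholders) that grants read access to a collaborator's AWS account, applied from the AWS CLI; the tutorials and policy generator above produce policies of the same shape:

# Write a hypothetical policy granting account 111122223333 read access to the bucket
$ cat > collaborator-read.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "CollaboratorReadOnly",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111122223333:root" },
      "Action": [ "s3:ListBucket", "s3:GetObject" ],
      "Resource": [
        "arn:aws:s3:::mylab-project-rawdata",
        "arn:aws:s3:::mylab-project-rawdata/*"
      ]
    }
  ]
}
EOF

# Attach the policy to the bucket
$ aws s3api put-bucket-policy --bucket mylab-project-rawdata --policy file://collaborator-read.json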

24 http://docs.aws.amazon.com/AmazonS3/latest/dev/walkthrough1.html provides even more detail. http://docs.aws.amazon.com/AmazonS3/latest/dev/example-policies-s3.html provides even more examples of bucket policies.


4.4 Global Data Egress Waiver for Research
The AWS Global Data Egress Waiver program waives most or all of the "data out" charges on your monthly bill.25 These are standard AWS charges for moving data out of the cloud over the public Internet. "Data out" charges exist so that video streaming and other bandwidth-intensive users pay their way and we can keep wide-open Internet pipes available at all times. But data transfer charges can seem unpredictable to anyone not familiar with estimating them - hence the Global Data Egress Waiver program. The waiver works by removing any "data out" charges from your bill so long as they constitute less than 15% of your total charges. Any excess above 15% will still be charged at the normal (per GB) rate, but it's worth noting that "data out" charges rarely exceed this amount. The result is a bill composed of more familiar charges: computing, storage, database usage, and so on. There is never any cost to uploading data to AWS or to moving data between Amazon S3 storage and Amazon EC2 compute instances housed in the same AWS region.
To qualify, you must work in an academic or public-sector research institution (such as a university or national research lab). You must use an educational email address for your AWS account. You must conduct at least 80% of your traffic through your university's National Research & Education Network (NREN) connection, which is the case for most universities. Finally, you must not be running an application whose primary purpose is to stream data and which is not part of the normal working of a research institution (like an external or commercial video service or a MOOC - this keeps you from building the next commercial video-on-demand business under this discount program). When you sign up for the AWS Research Cloud Program through the website, we'll verify your eligibility for the Global Data Egress Waiver program. If you qualify, we'll contact you shortly.
4.5 Very large data transfers to S3
Most users upload their data to AWS over the internet. AWS maintains a very performant network, and the limiting factor in upload performance is normally somewhere downstream, closer to your location. Smart tools, such as the AWS CLI, automatically make the best of your connection when uploading a large number of files. To support research, AWS is peered with major National Research and Education Networks (NRENs) worldwide. For example, in the US AWS peers with ESnet, Internet2, and several regional NRENs; in Europe, with GEANT and its member networks; in Japan, with SINET; and so on. There's a good chance that your university or research institute has access to one of these dedicated research networks, which typically give you better bandwidth, performance, and stability than you would have on a public internet connection. It also makes you eligible for the cost savings of the Global Data Egress Waiver program (chapter 4.4).

25 https://aws.amazon.com/blogs/publicsector/aws-offers-data-egress-discount-to-researchers/


For some users, uploading over the internet does not work. Maybe they work at a remote research station where connectivity is limited, or the volume of data that needs importing is so gargantuan that it would take too long or cost too much. AWS Snowball is one answer to the challenge.26 You can request a "snowball" from the AWS console, and AWS will ship you a rugged and portable device that you can plug in and load up with 80TB of data. Then you ship it right back (your data is safely encrypted) and we load it into S3 for you. You can request multiple snowballs for larger datasets. You can also look into Snowball Edge for an "edge device" that has compute capabilities, e.g. for work in remote locations where you need to preprocess the data on site before shipping it back to AWS. At the largest end of the spectrum, you can request a visit from a Snowmobile, which is pretty much a truck pulling a shipping container full of storage.27 Each truck can quickly and securely move 100PB of data to S3.
4.6 Data stream ingestion
Data streams refer to a continuous ingestion process, with new data being generated and pushed to the cloud on an ongoing basis, often originating from multiple sources. For example, it could be a stream of radar observation data from weather stations all across the country, or a subscription to all tweets containing certain key phrases. Such data streams could go straight into S3 storage, but often you first want to perform some processing steps on them as they come in. For example, you might scan for abnormal weather observations and have them kick off an alert or a weather forecast simulation; tweets might be parsed for content triggers to build heat maps. Amazon Kinesis is the foundational service for ingesting, combining, buffering, and emitting data streams. In combination with services such as Amazon Simple Notification Service (SNS), AWS Lambda, and others, you can build sophisticated pipelines. For the case of a very large number of sources and smaller message sizes, there is the AWS IoT service, which lets you build a powerful managed cloud environment for your Internet of Things devices. (See chapter 8.6.)
4.7 Open Data means more scientific impact
In many fields of science, lack of access to valuable datasets can prevent you from quickly getting to the data you need for your research and keep others from using data you've collected for their own research. As a solution for collaboration, AWS can be used to share any volume of data with anyone in the world.

26 https://aws.amazon.com/snowball/ ; https://aws.amazon.com/snowball-edge/ 27 https://aws.amazon.com/snowmobile/


According to IDC, the volume of data being produced each day is growing at an explosive rate, and is now estimated at 2.5 exabytes. Much of it is "open data," which means the data can be used by anyone for any purpose without needing to pay a licensing fee. This is a boon for scientists, entrepreneurs and public servants, who can use the data to create new products, accelerate scientific discovery and provide better services to the public. Traditional computing infrastructure is not suitable for sharing large volumes of data. Government data is still being provided with the assumption that users will download and store their own copies of the data. That's fine when a few gigabytes of data are being shared, but as data volumes increase, this approach simply doesn't work. For example, the National Oceanic and Atmospheric Administration (NOAA)'s new weather satellite, GOES-16, is estimated to produce one terabyte of data per day. Downloading one terabyte of data over a 50Mbps connection would take two days. Very few people have the hard disks and patience to download a terabyte of data.

This is why the cloud is emerging as the center of gravity for big data analysis. Once data is made available in the cloud, you no longer need to buy hard drives and spend months downloading the data. You can instead use on-demand computing resources in the cloud to query as much, or as little, of the data as you need. When the analysis is done, you can save the results, turn off the virtual servers and not have to worry about paying to store an individual copy of the original data. Through the Amazon Web Services Public Datasets program, we host some of the world's most valuable open datasets and show what’s possible when data is made available in the cloud. For an example of data flourishing on AWS, let’s take a look at Earth on AWS.

4.7.1 A new look at Landsat
Since 1972, Landsat satellites have produced the longest continuous record of Earth's land surface as seen from space. These images have been available at no cost directly from the United States Geological Survey since 2010. However, many people were limited by their ability to download and store significant quantities of the imagery. We talked to

many end-users and learned that they had big ideas of what could be done with Landsat data, but couldn't get it fast enough or couldn't afford to store their own copies. Amazon started hosting imagery from the Landsat 8 satellite in 2015. Within the first year, over 1 billion requests for Landsat imagery and metadata were logged from 147 countries. Businesses like Esri, Mapbox and MathWorks immediately created tools to take advantage of the new easy-to-access Landsat archive. One of the most interesting developments has been how novices and amateurs have been able to create entirely new interfaces and tools to explore and analyze the data. A group of students from Code Fellows created Snapsat – a fast and novel web-based service to browse and interact with Landsat imagery. An independent developer in Melbourne, Australia, even created an iPhone app called Observed Earth, giving people the ability to access tremendous amounts of data on how the Earth has changed over time by simply reaching into their pocket.
4.7.2 NEXRAD Opening New Research Frontiers
NOAA recognized early on that the cloud would be essential to fulfill their mission, and in 2015, they entered into a research agreement with several cloud service providers to explore ways to drive usage of their data. Through that agreement, we have made several hundred terabytes of high-resolution NEXRAD radar data available in the cloud. Similar to the response we saw with Landsat, the usage of NEXRAD data has been impressive. After making NEXRAD data available in the cloud, NOAA recorded a 130 percent spike in usage of the data, while simultaneously seeing a 50 percent decrease in the usage of their own servers. This open data initiative has also made the full NEXRAD archive available on demand, creating new analysis and discovery possibilities. For example, Dr. Eli Bridge at the University of Oklahoma has used this public dataset to compile radar data to estimate the size of Purple Martin bird roosts. These birds form large, dense aggregations that appear as ring-shaped patterns on the radar images. Now that the researchers no longer have to make requests for individual scans and receive chunks of data at a time, the University of Oklahoma team is able to learn how the birds are responding to droughts, environmental change, and seasonal cues. This is an example of "latent research" – that is, research that has existed in the minds of researchers, but hasn't been possible because of restricted data access.
4.7.3 Life Sciences and other datasets
Outside of Earth science, the Open Data program hosts The Cancer Genome Atlas, Common Crawl, and more. You can see the full list at https://aws.amazon.com/public-datasets/. These datasets provide good examples of how to share your own research data with the world.
4.7.4 Your research data
When data is shared in the cloud, anyone can analyze any volume of data without needing to download or store it themselves, which lowers the cost of research (for everyone) and reduces the time to scientific discovery. Landsat was accessed more than a billion times

in its first year on AWS. Usage of NEXRAD tripled once hosted on AWS. How much impact could your data create if you staged it for analysis on AWS? If you have a dataset that is useful to a broad community of users, and you would like to encourage usage of it in the cloud, you may be a candidate for participating in the Public Datasets program. If you have any questions or would like us to consider your dataset for the program, please email us at [email protected].
4.7.5 How to use AWS Public Datasets
Most AWS Public Datasets are made available through Amazon S3. To access a public dataset hosted in Amazon S3, you can make simple HTTP requests, use the AWS command line tools and SDKs (Ruby, Java, Python, .NET, PHP, etc.), download the data using Amazon EC2, or use Hadoop to process the data with Amazon EMR. When you've found the landing page on the AWS website for your chosen dataset, you'll notice that the URL structure for its Amazon S3 repository is described in detail, along with a link to a file that lists all the objects or another method for discovering them.
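As a rough sketch, here is how you might browse and pull a file from one of these public buckets with the AWS CLI; the bucket and key names below are illustrative, so check the dataset's landing page for the actual paths:

# Public buckets can often be read anonymously with --no-sign-request.
# The Landsat archive on AWS, for example, is hosted in the landsat-pds bucket.
$ aws s3 ls s3://landsat-pds/ --no-sign-request

# Copy a single object down to the local machine (the key shown is a placeholder path)
$ aws s3 cp s3://landsat-pds/scene_list.gz . --no-sign-request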

4.7.6 Tutorials: Big (Census) data
Most recently, the Census Bureau discovered that increasing access to big data can lead to increased usage. Previously, the agency's American Community Survey data was only available in tabular file formats like CSV, which took days to access and then required a separate reference document to make any sense of. Now that it has been uploaded in bulk to the cloud, anyone can access and analyze the entire dataset for about 40¢ an hour. The National Science Foundation provides a tutorial on how to analyze the data using a graph database engine. To access the tutorial, first follow section 5 of https://docs.data.world/uscensus/#54---loading-the-data-into-blazegraph, then proceed to https://docs.data.world/tutorials/sparql/.


The UK Met Office has made 80TB of meteorological data available in Amazon S3. See chapter 9.3.4 for tutorials on working with this weather data. Another tutorial guides you through a survival analysis against The Cancer Genome Atlas on AWS, using the RNA-seq expression counts data from the breast cancer cohort of TCGA and looking at the MKI67 and APOE genes. It uses Station X's GenePool platform, and you'll need to work with AWS Lambda and API Gateway.


5 Working with sensitive and controlled-access data

Researchers sometimes need to work with data from human subjects, which may contain Personally Identifiable Information (PII), such as participant details from surveys or clinical phenotypes taken from electronic health records. Researchers need to treat such data with respect for participants' privacy. Sometimes the data may even fall under a regulatory requirement, as is the case for clinically derived data sources. There are also some types of data that do not strictly fall under formal regulations, but are treated by the research community as controlled-access data. An example of controlled-access data is genomic sequence data derived from cancer patients who have donated their biological specimens and data for use in research studies.

Security is the number one priority at AWS. Helping to protect the confidentiality, integrity, and availability of your systems and data is of the utmost importance to AWS, as is maintaining your trust and confidence. We build and operate our infrastructure according to security best practices and standards, and with the unique needs of the cloud in mind. AWS uses redundant and layered controls, continuous validation and testing, and a substantial amount of automation to provide confidence that the underlying infrastructure is monitored and protected 24x7. AWS replicates these controls in new data centers and services.

As a researcher working with sensitive data, you directly benefit from the security of our data centers, network infrastructure, and services, which were built to satisfy the requirements of our most security-sensitive customers. In the following sections, we will walk through how security and compliance works on AWS, using the example of working with controlled-access genomic sequence data.
5.1 The Shared Responsibility security model
AWS operates under a shared security responsibility model, where AWS is responsible for the security of the underlying cloud infrastructure and you are responsible for securing the workloads you deploy in AWS (see Figure 21 below).


Figure 21: The AWS Shared Responsibility model.

The implication of the Shared Responsibility model is that you, as the researcher working with the data, will need to do two things: (1) identify which data needs to be protected and why; and (2) implement a set of security procedures and controls, built on top of the AWS security features, to protect that sensitive data.

Practically speaking, research data originating from human subjects tends to span a spectrum of security needs, from the Public Domain (no security needed) through to Protected Health Information (which may fall under a geography's regulations). To identify which data needs to be secured, you will need to categorize your data within that spectrum. Figure 22 illustrates an example of placing some well-known public archives within the spectrum.

Figure 22: The spectrum of data classification for security and compliance. Genome-in-a-Bottle data are in the public domain; gnomAD, ERA, and SRA release some data within the public domain, but restrict access to individual genomes; all Framingham data is restricted access for research use; finally, a cancer gene panel produced in service of making treatment decisions typically falls under regulatory requirements for Protected Health Information (PHI).

For anything that you determine falls outside of the Public Domain, we recommend a minimum set of security controls, such as encrypting data at rest, following the principle of "least privilege" for network-accessible resources, and robust logging of

operations made within the system. We recommend that researchers involve their institutional IT group to help with implementing these tools and procedures. We also recommend that researchers align and document the security procedures for managing data and compute resources against a recognized security and compliance framework, such as NIST 800-171. This provides the necessary controls for protecting the data, and gives you a template for future applications for access to other sensitive data sources.
5.2 Meeting security and compliance requirements with AWS Quick Starts
AWS manages dozens of compliance programs in its infrastructure. This means that segments of your compliance needs have already been met. Our compliance website28 helps you understand the robust controls in place at AWS to maintain security and data protection in the cloud. That understanding will allow you to determine your responsibilities in the Shared Responsibility model, but there is still the work of identifying which specific services and service features you can use to meet your needs. To help, AWS has published security- and compliance-focused Quick Starts with reference architectures aligned to compliance frameworks such as NIST 800-171.
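As one concrete illustration of the "encrypt data at rest" control mentioned above, server-side encryption can be requested when copying objects into Amazon S3 with the AWS CLI (a minimal sketch; the bucket and file names are placeholders):

# Upload an object with S3-managed server-side encryption (AES-256)
$ aws s3 cp phenotypes.tsv s3://mylab-controlled-data/phenotypes.tsv --sse AES256

# Or use a key managed in AWS KMS so that access to the key itself can be audited and restricted
$ aws s3 cp phenotypes.tsv s3://mylab-controlled-data/phenotypes.tsv --sse aws:kms

# Confirm the encryption setting recorded on the stored object
$ aws s3api head-object --bucket mylab-controlled-data --key phenotypes.tsv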

By tying together our governance-focused, audit-friendly service features with applicable security compliance regulations and audit standards, these AWS compliance enablers will help you to use AWS resources securely.

In addition to the reference architecture and best-practice documentation, the security and compliance Quick Starts also provide you with a Security Control Matrix (SCM) to bootstrap documentation of your systems, data, and operating procedures. The SCM Excel document contains a table listing the security controls that apply to a given compliance framework and (where applicable to the control) maps what the reference architecture does to meet that control. Researchers (or their institutional IT department) can use the SCM to fill out the rest of the worksheet with their added controls and procedures. Figure 23 illustrates an excerpt from an SCM.

The NIST 800-171 AWS Quick Start package of templated architecture, together with the documentation and the SCM, can be generalized to other compliance frameworks relevant to other industries and to geographies outside of the US. In addition to NIST-based controls, AWS has provided Quick Starts for UK Official workloads, as well as some that leverage commercially licensed solutions to add more layers and implement secure systems faster. The HIPAA Quick Start tutorial walks you through setting up a model environment that can help organizations with workloads that fall within the scope of the U.S. Health Insurance Portability and Accountability Act (HIPAA). The Quick Start includes AWS CloudFormation templates, which

28 https://aws.amazon.com/compliance/

automatically configure the AWS resources and deploy an example multi-tier, Linux-based web application in a few simple steps, in about 30 minutes.

Figure 23 - Excerpt from the UK Official Security Control Matrix, which supports NCSC and CIS security controls for the UK.

Figure 24 - General architecture used in the HIPAA, NIST, and UK Official Quick Start tutorials.


6 Working with Compute

Most scientific work includes computation. You may need to run a small analysis, a data reduction job that has to be repeated a million times on every experimental measurement coming in from your lab, or a handful of cutting-edge climate or quantum chemistry simulations that each demand a huge number of compute cores. AWS has a variety of compute services for each of those use cases. The cloud paradigm works as well for compute as it does for storage. Cloud elasticity means there's always room for N+1: your simulations are not constrained to the fixed size of a cluster in your department's basement, but can scale to however many cores are needed for the question that's on your mind. And if you're running ensembles or parameter sweeps, you can run them all at the same time (horizontal scaling) instead of one at a time. There's no queue in the cloud: your AWS compute instances are all yours, and a job scheduler is optional. Pay-as-you-go means you can stop paying for idle servers, and avoid the painful choice between idling (typical of underutilized department clusters) and long wait times (typical of large compute centers). And you can choose the best compute tool for each application: CPU today, GPU tomorrow.
6.1 The Amazon Elastic Compute Cloud (Amazon EC2)
The foundational AWS compute service is the Amazon Elastic Compute Cloud (Amazon EC2), where you can create virtual servers on demand. You "spin up instances" (start virtual servers in EC2), attaching some elastic disk storage (Amazon EBS) to them, and enclosing them within custom-designed networks called Virtual Private Clouds (VPCs) that include built-in firewalls to add layers of protection. Figure 25 shows the launch of a 4-core CentOS 7 compute instance with a few clicks. A few clicks and minutes later you will be able to access the machine via Secure Shell (SSH) and begin running software or importing data. Creating instances happens so often inside the cloud that we've built a lot of helpful tools to "freeze-dry" configurations so you can reconstitute them in a flash. Automating this activity will help you not forget important steps like making sure the firewalls are on or attaching your dataset to your computer. Automation also makes it easier to share your scientific pipeline with collaborators. While all of this can be done conveniently through the AWS Management Console, nearly everything can also be done via a command line interface or one of several Application Programming Interfaces (APIs; choose your favorite language from a long list). This means that repetitive tasks or complex architectures can be turned into shell scripts. There are more advanced levels of automation, e.g. AWS CloudFormation, but by now you'll get the idea that you can go from simple ideas to complex creations easily.
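For example, a console launch like the one in Figure 25 could be scripted roughly like this with the AWS CLI (a sketch only: the AMI, key pair, security group, subnet, and instance IDs are placeholders you would substitute from your own account):

# Launch one c4.xlarge instance from a chosen machine image
$ aws ec2 run-instances \
    --image-id ami-0123456789abcdef0 \
    --instance-type c4.xlarge \
    --key-name my-ssh-key \
    --security-group-ids sg-0123456789abcdef0 \
    --subnet-id subnet-0123456789abcdef0 \
    --count 1

# Look up the instance's public IP address once it is running...
$ aws ec2 describe-instances --instance-ids i-0123456789abcdef0 \
    --query 'Reservations[0].Instances[0].PublicIpAddress' --output text

# ...and log in over SSH (the default user name depends on the AMI; "centos" for CentOS images)
$ ssh -i my-ssh-key.pem centos@203.0.113.10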


Figure 25 - Launching an EC2 instance only takes a minute or two and is very customizable.
6.2 Amazon EC2 Compute Instance Types
Amazon EC2 instances are virtual servers built from virtual machine images called Amazon Machine Images (AMIs) that you can run on a wide range of "instance types". Instance types represent differently scoped and sized computer servers. They range from tiny, single-core versions that might be great for running a wiki through to large instance types with terabytes of RAM or specialist GPUs. The table below outlines the most common instance families you will encounter. Within each family, you choose at boot time what size instance you want to run to suit your application's needs. The sizes within an instance family follow a familiar t-shirt-style naming scheme. For example, if you need 4 cores of our high-performance "C4" instance, after some study of the Amazon EC2 instance family list, you might choose a "c4.xlarge". Later, if you wanted more cores for a larger problem, you'd reboot into a c4.2xlarge (8 cores) or a c4.8xlarge (36 cores). The same applies if you wanted to try running the same code with more memory: you can give yourself an upgrade in minutes. If it doesn't positively impact your throughput the way you'd hoped, you can always go back to the smaller or lower-memory instance and save money. If your application doesn't have specific requirements like GPUs, and you're not sure where to start, the C4, C3, and M4 families are generally good starting points.


General Purpose: T2 instances are Burstable Performance Instances that provide a baseline level of CPU performance with the ability to burst above the baseline. They are great for simple, bounded applications like web servers for workgroup activities. M3 and M4 instances are more compute-intensive, general purpose instances and provide a balance of compute, memory, and network resources that is a good choice for many applications. We often suggest starting with M4 and working up or down from there, gaining insight from experience.

Compute Optimized: C3 and C4 instances are ideal for compute-bound or floating-point-intensive applications that benefit from high-performance processors. C4 instances use Intel Xeon CPUs specially designed for AWS that offer extra performance beyond what's commonly available in a standard server. They are the best choice for many High-Performance Computing (HPC) applications.

Memory Optimized: R3 instances are optimized for memory-intensive applications and offer a lower price per GB of RAM, scaling up to 244 GB of RAM in a single instance. Our X1 instance family provides up to 2 TB of RAM for applications that require large memory footprints.

Storage Optimized: I2 and D2 instances are intended for applications that have extreme I/O requirements. I2 instances deliver tens of thousands of low-latency, random I/O Operations Per Second (IOPS), while the D2 family is designed for workloads that require high sequential read and write access to very large datasets on local storage.

Accelerated Computing: G2 and P2, the accelerated computing instance families, use GPU accelerators to perform some functions, such as floating-point calculations or graphics processing, more efficiently than in software running on CPUs. P2 instance types use NVIDIA K80 GPUs, offer up to 16 GPUs per compute instance, and are great for numerical simulation or deep-learning applications. The F1 instance type offers FPGA accelerators.

There are more instance types than listed here, including some older instances that still provide good processing power (and are frequently available at very low prices in the spot market). The complete guide, with a lot more detail, is available on our website at https://aws.amazon.com/ec2/instance-types/. None of our instances, except the t2 series, are oversubscribed. That is, the specified hardware resources ("2 cores, 24GB of RAM, 10Gbps of network bandwidth") are fully and reliably yours.
6.3 How are EC2 compute instances priced?
AWS pricing generally follows the pay-as-you-go model, where you pay exactly for what you use. For example, if you store 100GB in S3 for 3 months and then delete the data, your total cost is the unit price in $ per GB per month, multiplied by 100 (GB) and by 3 (months). You can find prices for all services on the AWS website. AWS pricing is tiered for most services, including storage and data transfer, so the more you use, the lower the unit price per gigabyte.


For EC229 this "on-demand" price model is accompanied by additional, discounted pricing models:30
- On Demand: Pay as you go. No minimum commitment or long-term contract is required. Users can turn off cloud resources and stop paying for them when they are not needed. Amazon EC2 instances paid for in this way are called "On Demand." This is the easiest way to use EC2 instances, and the best choice for work with fluctuating compute needs that has to happen right on time.
- Reserved Instances (RIs): Pay less when you reserve capacity. You can invest in reserved capacity and receive a significant discount in return for your commitment to using a number of compute instances for a long period, e.g. 1 year. This results in overall savings of up to 60% (depending on the type of instance reserved) compared to equivalent on-demand capacity. RIs are the best choice for long-term work that needs a fixed amount of compute power.
- Spot Instances: Bargain prices on Amazon's unused capacity. AWS operates a bidding market for leftover Amazon EC2 capacity, which allows you to run workloads at a fraction of the on-demand rate. Savings are often 50% to 90%. Spot instances are the right choice for work that is not time-critical (because the compute capacity may not be available immediately), where cost is very important (because it's very cheap), and where the compute tasks can tolerate an interruption (because the compute instance will become unavailable occasionally, requiring you to checkpoint and restart some jobs). It's a very popular option in the science community, and some application stacks use Spot Instances natively.

For example, a weather scientist wants to run thousands of iterations of some prediction models to analyze the influence of a certain parameter. This work just needs to get done sometime next week, so she runs it as cheaply as possible using Spot instances. It saves her 75% off the on-demand cost.

She also has to deliver up-to-the-minute wind forecasts that are used to decide whether firefighters need to be evacuated from a forest fire, and she provides a daily weather forecast to the local TV station. These jobs cannot tolerate any delay and require on-demand EC2 instances.31

Finally, she runs a web server that makes some of her datasets available to her colleagues. Since that server is always on, she hosts it on a Reserved Instance and pockets the 50% savings.

29 And a few additional AWS services, e.g. Relational Database Service (RDS). 30 We do not discuss Dedicated hosts and Scheduled Reserved Instances here. 31 “Scheduled Reserved Instances” might also be appropriate for the daily weather forecast.


6.3.1 An EC2 cost scenario
On-demand. Let's say you've decided you need a 4-core server to test your molecular dynamics code, which is floating-point intensive and benefits from Intel's most recent Xeon CPUs. A c4.xlarge in our N. Virginia region is (at the time of writing) 19.9c32 per hour in the On-Demand market. For many instance types, particularly Linux instances, your AWS account is billed per second at an hourly rate of 19.9 cents. So if you terminate the instance after 10 minutes, you pay about 3 cents. The minimum charge is 1 minute (1/3 cent). For some instance types, particularly Windows instances, your account is billed in hourly increments, i.e. 19.9 cents whether you use the instance for 10 or 60 minutes.33 After a few hours of working on your simulation and tweaking some variables, you're ready to run it at a larger scale, for which you'll need double the cores. So you switch to a c4.2xlarge instance, and your cost per hour is now 39.8 cents. At the end of this run, you back up the data you worked on to S3 at a cost of 2.3 cents per GB per month. Or you may decide to snapshot a copy of the entire instance (5 GB will cost $1.38 per year), so you can start from where you left off when you come back to this task in a few days. Then you terminate the EC2 instance to stop paying for it.
Reserved Instances. Success happens: after some weeks of on-off work on this project, you're now running workloads almost around the clock. Your intuition tells you (perhaps with some help from the AWS Trusted Advisor) that you can get a better deal for your frequent usage. If this usage is going to extend for a year or more, you can purchase a Reserved Instance (RI). Since you're committing up front to your future usage, prices for RIs come at a discount. You can commit for one year or three years, and pay all the charges up front or hold a little back to allow room for some periods of low usage (sabbaticals, or heavy teaching loads during semester). You choose a one-year RI for our c4.2xlarge, paid 100% up front (for example), which costs you 23.7 cents per hour, 40% off the on-demand rate. A three-year commitment would have saved 61%.
Spot market. The following year, you return to running the MD code intermittently and at a low priority: you can now tolerate your simulation taking a few hours longer than normal on occasion, and you can cope with an occasional job failure. This makes you a strong candidate for the Amazon EC2 Spot market. In the Spot market, you launch your Amazon EC2 instances by offering a bid, saying what is the maximum amount you're willing to pay. Your bid is compared to the Spot market price for your instance type, which fluctuates with demand over time. If the Spot price isn't higher than your Spot bid, your

32 Pricing depends on the AWS region you work in. See https://aws.amazon.com/ec2/pricing/ . All AWS charges are calculated in US Dollars. We can, however, bill you in one of a number of currencies. See https://aws.amazon.com/blogs/aws/new-preferred-payment- currency-for-aws-canadian-dollars-cad/ for details. 33 See details here: https://aws.amazon.com/blogs/aws/new-per-second-billing-for-ec2-instances-and-ebs-volumes

instances will start and you can run your simulation, paying only the market Spot price for that hour, which will often be even less than what you offered. If your price is too low, or if our capacity is constrained by a lot of demand from other users (using RIs or On-Demand Instances), we'll decline your offer and your bid will sit waiting for capacity to become available at your price point. As we write this, the Spot price for a c4.2xlarge in N. Virginia is 9.6 cents, 76% off the on-demand price, and the probability of being interrupted or outbid is low34. If you run a Linux Spot instance for less than an hour, you'll be billed per second for the exact time you used the instance. The other caveat with the Spot market is that you may be happily (and cheaply) running your analytical cluster and be interrupted: this means your Spot instance is stopped or terminated and taken away, and the job that was running will stop. When this happens, there are options. The first option is to do nothing: the interruptions may be so infrequent (one in a hundred, say) that getting 80% off the list price for 99 successful runs is a great cost reduction and totally worth it in return for occasionally having to restart a failed job. The second option is to teach your code to poll the "instance metadata" to watch for a change in the termination time. If you're "marked for termination," you have two full minutes to react, and how you react is entirely up to you. You might snapshot your current application state, e.g. by checkpointing, or send a signal to your process to dump its memory image to disk. As soon as the Spot price drops to your Spot bid again, the instance can start again, and your application can pick up the job from the checkpoint file.35 But before you get busy pondering all those details, do seriously consider deploying the first option (i.e., doing nothing). Consider the aggregate savings before you plan for the one that got away. Many people successfully run large workloads in the Spot market, sometimes harnessing more than 100,000 compute cores at a time. Researchers at Clemson University recently used more than 1 million concurrent vCPUs for Natural Language Processing – that's comparable in core count to the largest supercomputers in the world.36 AWS offers tools to help you exploit the Spot market. The Spot Bid Advisor scores the probability of getting resources with different, simple bidding strategies. Defined Duration Spot instances run continuously for up to six hours, in exchange for a slightly reduced discount. This enables you to reduce costs when running finite-duration tasks such as batch processing, encoding and rendering, modeling and analysis, and continuous integration jobs. A Spot Fleet request lets you combine several instance types and AWS Availability Zones, so you can say, in effect, "give me 500 cores using any of these instance types, as cheaply as possible". We encourage you to follow best practices. Some application stacks make it even easier. For example, Alces Flight (see chapter 6) can build you a compute cluster using Spot instances. As another example, AWS Batch can run your containerized (Docker) jobs on Spot instances and automatically retries jobs that fail due to Spot instance interruption.
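Both pieces of that workflow can be sketched from the command line. In the sketch below, the checkpoint script name is hypothetical; the metadata endpoint shown is the standard one documented for Spot termination notices:

# Check recent Spot prices for a c4.2xlarge before deciding on a bid
$ aws ec2 describe-spot-price-history --instance-types c4.2xlarge \
    --product-descriptions "Linux/UNIX" --max-items 5

# On the instance itself: poll the instance metadata every few seconds and
# checkpoint as soon as a two-minute termination notice appears.
while true; do
  if curl -sf http://169.254.169.254/latest/meta-data/spot/termination-time > /dev/null; then
    echo "Spot instance marked for termination, checkpointing..."
    ./checkpoint-and-upload-to-s3.sh   # hypothetical script: dump state and copy it to S3
    break
  fi
  sleep 5
done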

34 See e.g. https://aws.amazon.com/ec2/spot/bid-advisor/ 35 https://aws.amazon.com/blogs/aws/new-stop-resume-workloads-on-ec2-spot-instances/ 36 https://aws.amazon.com/blogs/aws/natural-language-processing-at-clemson-university-1-1-million-vcpus-ec2-spot-instances


6.3.1 The AWS Free Tier
Most AWS services offer a "Free Tier". This lets you try out the service without spending any money. It's only when you exceed the designated free quota that you will start to be billed for your usage.37

The EC2 Free Tier gives you free instance hours on t2.micro instances in the first year after you create your AWS account. These t2.micro instances are great for light workloads (like hosting small websites), for trying out EC2 features for the first time, and for testing any scripts you write to automate launching EC2 instances. Even if you exhaust your Free Tier allowance, the low cost of t2 instances means you can keep doing all this literally for pennies. Once you have everything figured out, you can switch over to the right production instance type. T2 instances are not suitable for High Performance Computing (HPC) or for running performance benchmarks.

Figure 26 - See all of the free tier services here: https://aws.amazon.com/s/dm/optimization/server-side-test/free-tier/free_np/.
6.3.2 Price Drops
We continually focus on reducing our data center hardware costs, improving our operational efficiencies and lowering our power consumption. Each time we do, it allows us to pass savings back to our users through price reductions (62 times since we launched AWS in 2006). This means that the compute instances you're using today could be even cheaper tomorrow. Revisiting our example above, the c4.2xlarge instance type saw its on-demand and reserved instance prices in N. Virginia drop by 5% in June 2015, again by 5% in January 2016, again by 5% in November 2016, and again by 11% (for reserved instances only) in May 2017. The cost of S3 storage has been cut many times over the years, most recently by 23% in December 2016.
6.3.3 Avoiding Lock-in
To make Reserved Instances more flexible, we've added a feature called "Convertible Reserved Instances." This allows you to trade an RI for one instance type for another

37 Full information on AWS Free Tier services is available at https://aws.amazon.com/free/. Also see the Free Tier Terms at https://aws.amazon.com/s/dm/optimization/server-side-test/free-tier/free_o/terms/.

instance type in the future. The details are on our website under Amazon EC2 pricing, but our Chief Evangelist, Jeff Barr, explains them nicely in a blog post.38
6.3.4 Easy hardware upgrades
AWS adds newer, faster instance types much more frequently than the 3-5 year on-premises hardware lifecycle. This gives a big boost to the value of your multi-year research budget: each year, your money will buy more performance on AWS, compared to locking in the performance of today's hardware for 3-5 years. Moreover, a hardware refresh on AWS can be as easy as editing the instance type in a configuration file and rebooting your AWS compute cluster – but on-premises hardware refreshes are often expensive and time-consuming.
6.3.5 Launching many AWS instances (AWS account limits)
In a new AWS account you can launch only a total of 20 EC2 instances. This default limit exists to prevent beginner mistakes in your early days. When you need more, you can request a higher limit in the EC2 console, as in the example shown below. This may take 1 or 2 days to be approved, so please plan ahead. There are similar limits for other services, but the limit on instances is the one you're most likely to encounter.
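Both of those points can be scripted. The sketch below (instance ID and target type are placeholders, and we assume the CLI shorthand shown for the instance-type attribute) stops an instance, changes its type, restarts it, and then checks the default per-region instance limit:

# A "hardware upgrade": stop the instance, change its type, start it again
$ aws ec2 stop-instances --instance-ids i-0123456789abcdef0
$ aws ec2 wait instance-stopped --instance-ids i-0123456789abcdef0
$ aws ec2 modify-instance-attribute --instance-id i-0123456789abcdef0 \
    --instance-type Value=c4.4xlarge
$ aws ec2 start-instances --instance-ids i-0123456789abcdef0

# Check the default instance limit for the region you are working in
$ aws ec2 describe-account-attributes --attribute-names max-instances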

Figure 27 - EC2 Instance Limits.
6.4 Other compute services
The Amazon EC2 Container Service (ECS) lets you deploy and manage "Docker images" on your EC2 instances. Containers package an application's code, configurations, and dependencies into simple building blocks, which helps applications be deployed quickly, reliably and consistently regardless of the environment.

38 Read Jeff’s post here: http://tinyurl.com/h2udzgr.


AWS Batch goes a step further and completely manages the underlying compute infrastructure for you. You merely need to define your jobs by pointing at a container and specifying the accompanying ‘docker run’ command parameters. You can submit hundreds, thousands, or more of these jobs into your job queue, and instruct the queue manager how to deploy or prioritize them.

AWS Lambda is a truly serverless compute service. It lets you run code39 without provisioning or managing servers. Just upload your code and Lambda takes care of everything required to run and scale it with high availability. You can set up your code to be triggered automatically from other AWS services, or call it directly from any web or mobile app. For example, you can define a Lambda function so that, every time you upload an image file to an S3 bucket, it automatically creates a thumbnail file, or triggers an associated Amazon Rekognition or AWS Batch job for more advanced image processing. You can have thousands of Lambda workers running at the same time, scaling up instantly and effortlessly, and billed in 100-millisecond increments. For a Python interface to AWS Lambda, check out pywren (pywren.io), developed by the RISELab at U.C. Berkeley.

Figure 28 - Basic AWS Lambda usage: automatically process each image uploaded to your S3 bucket.
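As a hedged illustration of the thumbnail example above (a minimal sketch, not a production function: the destination bucket name is a placeholder and the Pillow imaging library would need to be bundled with the deployment package), an S3-triggered Lambda handler in Python might look like this:

# Minimal sketch of an S3-triggered Lambda handler (Python) that creates a
# thumbnail for each uploaded image. Assumes the function is subscribed to
# S3 "object created" events; the destination bucket name is a placeholder,
# and the Pillow library must be bundled in the deployment package.
import os
import boto3
from PIL import Image

s3 = boto3.client("s3")
DEST_BUCKET = "my-thumbnails-bucket"  # placeholder

def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        local_path = os.path.join("/tmp", os.path.basename(key))
        s3.download_file(bucket, key, local_path)

        # Resize to a small thumbnail and upload the result.
        thumb_path = local_path + ".thumb.jpg"
        with Image.open(local_path) as img:
            img = img.convert("RGB")
            img.thumbnail((128, 128))
            img.save(thumb_path, "JPEG")
        s3.upload_file(thumb_path, DEST_BUCKET, "thumbs/" + key)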

Serverless High-Performance Data Analysis (HPDA) in Neuroscience: The Intelligence Advanced Research Projects Activity (IARPA) Machine Intelligence from Cortical Networks (MICrONS) program seeks to revolutionize machine learning by better understanding the representations, transformations, and learning rules employed by the

39 AWS Lambda supports Java, Node.js, C#, and Python code, with support for other languages coming in the future.

brain. Therefore, they are measuring the brain at the “mesoscale,” the scale at which the hypothesized unit of computation, the cortical column, should exist. This yields some of the largest datasets ever collected – roughly 2-3PB of image data per sample, for about 50k-100k neurons and over 100 million synapses. The Boss is a multi-dimensional spatial database provided as a managed service on AWS to hold all this data. Users can choose to perform different high-bandwidth operations, like data ingest or image downsampling, with data sizes ranging from 2GB to 2PB per request and sustained data ingest rates of over 4Gbps. The Boss needs to scale quickly to meet each user’s needs while remaining affordable and operating within a fixed budget. It does so by leveraging serverless AWS Lambda components40 to provide on-demand capacity – it would be time- or cost-prohibitive to meet such demands with a server-based architecture. Read and watch more.

6.5 Database services
AWS provides fully managed relational and NoSQL database services, as well as fully managed in-memory caching as a service and a fully managed petabyte-scale data-warehouse service. Or, you can operate your own database in the cloud on Amazon EC2 and Amazon EBS. Below are some of AWS’s database options. Also refer to section 8.5 for information on the Amazon Athena serverless interactive query service.

6.5.1 Amazon Relational Database Service (RDS)
Amazon Relational Database Service (RDS) is a web service that makes it easy to set up, operate, and scale a relational database in the cloud. It provides cost-efficient and resizable capacity while managing time-consuming database administration tasks, freeing you up to focus on science. Amazon RDS gives you access to the capabilities of a familiar MySQL, Oracle, Amazon Aurora, SQL Server, MariaDB or PostgreSQL database. This means that the code, applications, and tools you already use today with your existing databases can be used with Amazon RDS. Amazon RDS automatically patches the database software and backs up your database, storing the backups for a retention period that you define and enabling point-in-time recovery. You benefit from the flexibility of being able to scale the compute resources or storage capacity associated with your relational database instance with a single API call. In addition, Amazon RDS makes it easy to use replication to enhance availability and reliability for production databases and to scale out beyond the capacity of a single database deployment for read-heavy database workloads.

40 It also uses Amazon S3, DynamoDB, SQS, and Step Functions. The project created an open source Python package called Heaviside to manage Step Function development and use.
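To illustrate the single-API-call point for RDS, here is a minimal boto3 sketch; the instance identifier, user name, password, and sizes are placeholder assumptions, not recommendations:

# Minimal sketch: create a small managed PostgreSQL instance with Amazon RDS.
# Assumes boto3 credentials are configured; the identifier and password are
# placeholders you should replace (and keep out of source control).
import boto3

rds = boto3.client("rds")
rds.create_db_instance(
    DBInstanceIdentifier="research-db",      # placeholder name
    DBInstanceClass="db.t2.micro",
    Engine="postgres",
    MasterUsername="researcher",
    MasterUserPassword="change-me-please",   # placeholder secret
    AllocatedStorage=20,                     # GiB
)

# Wait until the instance is available before connecting to it.
rds.get_waiter("db_instance_available").wait(DBInstanceIdentifier="research-db")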


6.5.1.1 Amazon Aurora
Amazon Aurora is a MySQL- and PostgreSQL-compatible relational database engine that combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases. It provides up to five times better performance than MySQL with the security, availability, and reliability of a commercial database at one tenth the cost.

Amazon Aurora is a managed database service, built on a fully distributed and self-healing storage system that keeps your data safe. It provides enterprise-level capabilities including database monitoring, database cloning, cross-region copying and replication, AWS Identity and Access Management integration, and much more.

6.5.2 Amazon DynamoDB
Amazon DynamoDB is a fast, fully managed NoSQL database service that makes it simple and cost-effective to store and retrieve any amount of data, and serve any level of request traffic. All data items are stored on Solid State Drives (SSDs) and are automatically replicated across multiple Availability Zones in a region for high availability and durability. With DynamoDB, you can offload the administrative burden of operating and scaling a highly available distributed database cluster, while paying a low price for only what you use. Amazon DynamoDB is designed to address the core problems of database management, performance, scalability, and reliability. Developers can create a database table that can store and retrieve any amount of data, and serve any level of request traffic. DynamoDB automatically spreads the data and traffic for the table over a sufficient number of servers to handle the request capacity specified by the customer and the amount of data stored, while maintaining consistent, fast performance.

6.5.3 Amazon Redshift
Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service that makes it simple and cost-effective to efficiently analyze all your data using your existing business intelligence tools. It is optimized for datasets ranging from a few hundred gigabytes to a petabyte or more and costs less than $1,000 per terabyte per year, a tenth the cost of most traditional data warehousing solutions. Amazon Redshift delivers fast query and I/O performance for virtually any size dataset by using columnar storage technology and parallelizing and distributing queries across multiple nodes. We’ve made Amazon Redshift easy to use by automating most of the common administrative tasks associated with provisioning, configuring, monitoring, backing up, and securing a data warehouse.

6.5.4 Amazon EC2 Relational Database AMIs
An Amazon EC2 instance can be used to run a database, and the data can be stored within an Amazon EBS volume. Amazon EBS is a fast and reliable persistent storage feature of Amazon EC2. With Amazon EC2 Relational Database AMIs, you avoid the friction of infrastructure provisioning while gaining access to a variety of standard database engines.

Amazon EC2 Relational Database AMIs enable you to skip the infrastructure and hardware provisioning typically associated with installing a new database server, while still enabling you to exert complete control over the administrative and tuning tasks associated with running a database server.

6.5.5 Amazon ElastiCache
Amazon ElastiCache is a web service that makes it easy to deploy, operate, and scale an in-memory cache in the cloud. The service improves the performance of web applications by allowing you to retrieve information from a fast, managed, in-memory caching system, instead of relying entirely on slower disk-based databases. ElastiCache supports two open-source caching engines:
- Memcached – a widely adopted memory object caching system. ElastiCache is protocol compliant with Memcached, so popular tools that you use today with existing Memcached environments will work seamlessly with the service.
- Redis – a popular open-source in-memory key-value store that supports data structures such as sorted sets and lists. ElastiCache supports Redis master/slave replication, which can be used to achieve cross-AZ redundancy.

Amazon ElastiCache automatically detects and replaces failed nodes, reducing the overhead associated with self-managed infrastructures, and provides a resilient system that mitigates the risk of overloaded databases, which slow website and application load times. Through integration with Amazon CloudWatch, Amazon ElastiCache provides enhanced visibility into key performance metrics associated with your Memcached or Redis nodes.

6.6 Tutorials
- Launching a virtual machine with Amazon EC2: https://aws.amazon.com/getting-started/tutorials/launch-a-virtual-machine/
- Run a serverless “Hello World” with AWS Lambda: https://aws.amazon.com/getting-started/tutorials/run-serverless-code/
- Deploy Docker containers on Amazon ECS: https://aws.amazon.com/getting-started/tutorials/deploy-docker-containers/
- Package and deploy a bioinformatics workflow on AWS using AWS Step Functions and AWS Batch: https://github.com/awslabs/aws-batch-genomics/tree/develop

For more sophisticated examples, see chapters 7 and 9, where we’ll stand up HPC clusters and use Jupyter notebooks on AWS.


7 HPC and Clusters

The flexibility of the cloud lets you do HPC in a more dynamic way than traditional, “static” HPC infrastructure. Cloud HPC compute clusters can be tailored to your application, scaled up to a larger core count, shrunk down, or switched from a CPU to a GPU system in a matter of minutes. New paradigms like serverless computing allow you to rethink your HPC workloads for maximum scalability, ease of maintenance, and cost efficiency. You can also use the tools that 3rd parties have created within the vibrant and growing AWS HPC solutions community, encompassing HPC cluster tools, parallel file systems, and HPC codes from all walks of life: genomics, bioinformatics, quantum chemistry, computational fluid dynamics, weather forecasting, manufacturing, and more.

7.1 Easy-launch template-based HPC clusters
You could, in principle, already create an HPC cluster with what you’ve learnt about EC2 instances in Chapter 6: you would launch a number of instances, plus associated VPC and storage, and you’d configure everything by hand to work together nicely. But in the cloud you want to script everything: creating an HPC cluster should be a simple, automated action, based on a template and some customization for your specific tasks. There are loads of benefits to this approach:
- Your cluster is ephemeral – tear it down anytime and start another one.
- Your cluster is personal: tailor it exactly the way you like, and there are no queues.
- You can easily change specifications – say you needed a CPU cluster yesterday but you want a GPU cluster today. Sit back, give it a few minutes, and your hardware has magically transformed to follow your needs.
- Hardware upgrades are now as good as free: when AWS launches the c5 instance type, boasting Intel Skylake CPUs, just edit the configuration file for your cluster, changing “c4.8xlarge” into “c5.8xlarge”, and refresh the cluster. A few minutes later, your hardware is renewed. The next upgrade will come way sooner than the typical 3-5 year on-premises refresh rate.
- Share your configuration with your colleagues by just sending them the cluster configuration text file. They can spin up a replica of your cluster in their own account. Their jobs don’t crowd your cluster, and they pay for their own usage.
- Run multiple clusters at the same time, e.g. for different steps in an analysis pipeline (where each step may have its own hardware requirements); or for each member of ensemble calculations; or a cluster for each person in your team; …

Let’s look at a few popular offerings. All create a cluster suitable for HPC with a familiar feel: a head node that you SSH into, a fleet of compute nodes, shared storage, and your choice of several popular job schedulers. (See diagram below.) Because you’re in the cloud, you gain some additional powers: the compute fleet can automatically scale the available compute cores up and down depending on your needs, and you can use regular “on-demand” compute nodes or heavily discounted Spot instances. You pay for the AWS services the cluster uses, mainly EC2 instances and EBS storage. Some offerings have paid-support options separate from the free option we describe here.
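Under the hood, several of the offerings described below drive AWS CloudFormation from a template (Alces Flight and CfnCluster both do). As a hedged sketch of the “everything from a template” idea (the template URL and parameter names here are hypothetical placeholders, not those of any specific product), launching a whole stack programmatically can look like this:

# Minimal sketch: stand up an infrastructure stack from a template with
# AWS CloudFormation. The template URL and parameter names are hypothetical
# placeholders, not those of Alces Flight, CloudyCluster, or CfnCluster.
import boto3

cfn = boto3.client("cloudformation")
cfn.create_stack(
    StackName="my-hpc-cluster",
    TemplateURL="https://s3.amazonaws.com/my-bucket/cluster-template.yaml",
    Parameters=[
        {"ParameterKey": "ComputeInstanceType", "ParameterValue": "c4.8xlarge"},
        {"ParameterKey": "MaxComputeNodes", "ParameterValue": "16"},
    ],
    Capabilities=["CAPABILITY_IAM"],  # needed if the template creates IAM roles
)
cfn.get_waiter("stack_create_complete").wait(StackName="my-hpc-cluster")

Refreshing the hardware is then a matter of changing a parameter value and updating the stack, which is exactly the workflow the cluster tools below automate for you.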


Figure 29 - Sample Cluster architecture on AWS.

Alces Flight creates a researcher-oriented HPC cluster. That cluster can be accessed via SSH, or you can open a (VNC) graphical desktop session. It comes with more than 1,300 preinstalled scientific applications, including genomics codes, libraries, compilers, and more. Cluster management tools help you sync the cluster’s shared disks with your long-term storage in Amazon S3, turn off hyperthreading on your cluster, and so on. You can launch the Alces Flight cluster from the AWS Marketplace (think of it like a free app in your phone’s app store), where you confirm a few configuration settings. These settings are added to your personal cluster template, before the AWS CloudFormation service is used to stand up a cluster according to your specifications.

CloudyCluster also launches from the AWS Marketplace. It provides a “self-service HPC cluster” with popular open HPC software, multiple storage technologies (OrangeFS, EFS, and S3), and a web interface that allows you to access your cluster from mobile devices. CloudyCluster Queue lets you submit jobs without explicitly creating a cluster first (similar to AWS Batch). CloudyCluster provides a detailed quickstart tutorial at http://www.cloudycluster.com/quickstart/ and has the distinction of having supported the largest cloud computing cluster to date, with over 1 million concurrent vCPUs on AWS.

CfnCluster (“CloudFormation Cluster”) is an open-source framework shared by team members at AWS. It shows how it’s possible to build HPC-style clusters in the AWS Cloud, including integration with schedulers. It has a more “system administrator look-and-feel”. CfnCluster facilitates Proof of Concepts (POCs) that help you understand auto-scaling in an HPC context and identify the path to production deployments. The Caltech Guttman Lab Case Study is a research example of CfnCluster used for lncRNA analysis: https://aws.amazon.com/solutions/case-studies/caltech-guttman-lab/. (StarCluster, an older framework that used to be popular in academia, is no longer recommended as it has not been updated for a long time.)


7.1.1 Example of a template-based HPC cluster

Figure 30 - Let’s launch an Alces Flight cluster in AWS Marketplace.

Figure 32 - After confirming, Alces Flight spins up the cluster as a “CloudFormation stack”, which involves many little steps happening automatically while you get a cup of coffee.

Figure 31 - We can specify a handful of configuration options. We choose Spot instances with a spot bid price of $0.766. The cluster will start with a head node and 0 compute instances, and is allowed to scale up to 16 instances.

Figure 33 - About 5 minutes later, we can SSH into the head node of the new HPC cluster.


Figure 36 - It will keep automatically adding groups of 4 compute instances until it reaches the maximum of 16. Sure enough, eventually we have a fleet of 16 workers HPC’ing away.

Figure 34 - We submit a large array of several hundred jobs to the SGE scheduler. Each job is a simulation of the X-ray absorption spectrum of a material specified in a .cif file. Each job requires 4 cores, or 1 compute node, as our compute nodes are defined to be 4-core “c4.2xlarge” compute instances.

Figure 35 - The scheduler notices that it needs additional compute nodes to be able to handle this work (as it currently has none). It triggers ‘autoscaling’ of the cluster by a predefined step of 4 instances.

Figure 37 - Once the work is done and the queue is empty, the scheduler will start automatically tearing down compute nodes again so you don’t waste money paying for idle instances. (To turn off even the head node and completely stop incurring costs, go back to the CloudFormation console, select the stack, and delete it. After you’ve transferred any data you want to keep to Amazon S3 or your home computer, of course.)


Figure 38 - At this point we can also take a look at some of the output data we created, e.g. this XANES spectrum of a BN crystal.

Figure 40 - The "Alces Launch" interface is an alternative to the Marketplace launch. It's based on a two-level approach. An administrator defines even more strongly templated clusters represented by tiles (e.g. "WRF short-term forecast cluster, 1000 core hours, GPU powered."). An end user clicks the tile, enters an authorization code, and gets access to that pre-defined cluster type (after a few minutes) without any further fuss.

Other cluster technologies like CloudyCluster or CfnCluster differ in the way they’re launched, and in some features (e.g. preinstalled software or VNC sessions), but are similar when it comes to the overall workflow, autoscaling of the cluster, or the ability to use Spot EC2 instances.

Figure 39 - How did we do on cost? The spot price was $0.095 per hour, or 87% off the “on-demand” full price of $0.766! So, we ran our 16-node (64-core) compute fleet + head node for 16 x $0.095 + 1 x $0.766 = a grand total of $2.29 per hour! (Plus a few cents for EBS storage.) Not bad!
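If you want to check current Spot prices yourself before choosing a bid, a small boto3 sketch like the following (the instance type and region are examples) queries the recent Spot price history:

# Minimal sketch: look up recent Spot prices for an instance type.
# Assumes boto3 credentials are configured; instance type and region are examples.
import boto3
from datetime import datetime, timedelta

ec2 = boto3.client("ec2", region_name="us-east-1")
history = ec2.describe_spot_price_history(
    InstanceTypes=["c4.2xlarge"],
    ProductDescriptions=["Linux/UNIX"],
    StartTime=datetime.utcnow() - timedelta(hours=1),
)
for record in history["SpotPriceHistory"]:
    print(record["AvailabilityZone"], record["SpotPrice"])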


7.1.2 High-Throughput or High-Performance?
High-Performance Computing (HPC) jobs feature many worker threads with intensive communication, requiring high-bandwidth and low-latency network connections. High-Throughput Computing (HTC) jobs also feature many worker threads but don’t require intensive communication or highly performant network connections. Both HPC and HTC workloads run very well on AWS. HPC workloads require some additional “best practices”41: To keep the latency low, always place all cluster instances in a “placement group”. A placement group is a logical rule ensuring that all instances are in close proximity on the network. Alces Flight always uses a placement group, while CfnCluster requires that you specify it in the configuration file. You should also choose instance types with 10Gbps or 20Gbps Enhanced Networking, such as c4 or m4 instances. “General purpose” instance types, such as the t2 family, are not suitable for demanding HPC tasks. While you can use a t2 cluster in the free tier (with no charge) to debug your job scripts and configuration, you should switch to a compute-optimized instance type for any benchmarking or production work.

7.2 Marketplace HPC Solutions
AWS Marketplace is an online store for applications and services that build on top of AWS to provide functionality such as HPC clusters42, fluid dynamics simulators43, or cluster file systems44. Each of these solutions bottles up a large amount of computational and orchestration complexity, which means when you click to launch, you’re getting the built-in optimizations and best practices of the organizations supplying them and leveraging what they’ve learned from building complex services on AWS. Researchers can find, license, and immediately start using this software and these services. See also section 1.5. The nature of AWS Marketplace also means that an innovative idea in one research lab in one corner of the world can become a popularly used service in hundreds of locations, with each user group facilitating their own usage of the tool with their own budget and resources – a very sustainable path to sharing content and tools with a whole community. This means that your group might be both a consumer of applications in AWS Marketplace and a vendor providing solutions. AWS handles billing and payments for both the underlying computing as well as any licensing charges. Applications on the Marketplace may also offer “Bring Your Own License” alongside the pay-per-use applications. You can also distribute your own application for free through the Marketplace45 – indeed, there are many free applications available today.

41 If you are compiling applications, stay tuned for a forthcoming white paper detailing best practices for optimal performance. 42 Like Alces Flight, CloudyCluster, and others. 43 OpenFOAM, for example. 44 Intel’s Lustre, or the Fraunhofer Institute’s BeeGFS. 45 What does “free” mean? Say you launch a free quantum chemistry code from Marketplace. This creates an EC2 instance preconfigured with that chemistry code. You’ll pay AWS the regular cost of running an EC2 instance, but you don’t pay any Marketplace fee to the organization who published that code on the Marketplace.


7.3 Partner and SaaS HPC solutions
Moving higher up the stack to increasingly user-friendly and managed HPC paradigms, we encounter AWS Partners – 3rd parties who have built a business selling specialized services built on top of the AWS Cloud. In this model, you typically interact with the Partner and not with AWS directly. Along with the AWS Marketplace, our Partners demonstrate that the AWS Cloud is much more than a room full of servers: it’s an entire network of parties building HPC solutions for many years to come. We give illustrative examples here; please see the Partner Listing in the second half of this handbook for a tantalizing portfolio of AWS Partners relevant to your work.

Rescale is an example of a Partner who offers a very user-friendly, “turn-key” platform for running HPC simulations in the cloud. HPC jobs are submitted and managed in a web-based workflow manager that abstracts much of the underlying cloud technology. Rescale currently offers 160+ software packages in their SaaS platform. It operates in AWS regions in multiple countries. A powerful administrative portal helps manage resources, licenses, and budgets. Rescale complies with the strictest industry standards for security, and implements data protection along every step of the way. (Below: Choosing an HPC application in the Rescale platform; defining a new simulation job and choosing the hardware specifications in the Rescale platform.)

EnginFrame is an HPC portal where users and administrators can easily submit and control HPC applications, as well as monitor workloads, data, and licenses from within the same user dashboard, hiding the heterogeneity and complexity of the native interfaces. HPC clusters, data, licenses, and batch and interactive applications can be accessed by any client using a standard browser.

7.4 Containers, Microservices, and AWS Batch
We’re seeing a shift from traditional, monolithic architectures that deploy a comprehensive code base all at once, to a microservices architecture that modularizes the pipeline into subprocesses that can be deployed separately from one another on separate hosts. This modularized architecture is less costly and more agile. Docker containers are the most ubiquitous technology for packaging and running the individual processes of a pipeline. Together, the Amazon EC2 Container Service (ECS), the Amazon EC2 Container Registry (ECR), and AWS Batch can be used to orchestrate container deployment across an EC2 instance cluster, and coordinate job execution in managed or unmanaged compute environments.

You can use AWS Step Functions to implement complex serial or parallel job dependencies in your workflow. With AWS Batch, you package the code for your batch jobs, specify their dependencies, and submit your batch jobs using the AWS Management Console, CLIs, or SDKs. AWS Batch executes your jobs as containerized applications (e.g. Docker containers) running on Amazon ECS. You specify execution parameters, such as vCPU and memory, and job dependencies, and you can integrate with popular batch computing workflow engines and languages such as Pegasus WMS, Cromwell, and Luigi. Default job queues and compute environment definitions help you get started quickly. The ‘Task Placement Engine’ allows you to use placement strategies including instance type affinity management, bin packing based on memory, or task spreading based on Availability Zone. AWS Batch dynamically provisions and scales Amazon EC2 and Spot Instances based on the requirements of your jobs.
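As a hedged sketch of that flow (the container image, job queue name, and command are placeholders, and the job queue and compute environment are assumed to exist already), registering a job definition and submitting an array of jobs with boto3 looks roughly like this:

# Minimal sketch: register a container-based job definition and submit jobs
# to an existing AWS Batch job queue. Image, queue name, and command are
# placeholders; the queue and compute environment must already exist.
import boto3

batch = boto3.client("batch")

batch.register_job_definition(
    jobDefinitionName="align-reads",
    type="container",
    containerProperties={
        "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/aligner:latest",
        "vcpus": 4,
        "memory": 8000,   # MiB
        "command": ["run_alignment.sh", "Ref::sample"],
    },
)

# Submit one job per sample; Batch scales the compute environment as needed.
for sample in ["sample1", "sample2", "sample3"]:
    batch.submit_job(
        jobName="align-" + sample,
        jobQueue="my-job-queue",          # placeholder queue
        jobDefinition="align-reads",
        parameters={"sample": sample},
    )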

Our research users use Amazon ECS and ECR for machine learning applications; see, for example, https://aws.amazon.com/blogs/aws/aws-enables-consortium-science-to-accelerate-discovery/. For a demo combining Step Functions, AWS Batch, and ECS/ECR, check out https://gist.github.com/miachamp/416836576e5012f4c9dd92661d0bc231.

7.5 Serverless compute functions: AWS Lambda
Even further along the abstraction path from servers to containers are the AWS Lambda functions we encountered earlier in this handbook. In Chapter 1.3 we saw examples of AWS users building HPC-scale applications with high scalability and throughput using legions of these managed, serverless little workers, for example using PyWren. We expect to see more and more of these disruptive paradigms in HPC land, leading to a much more diversified portfolio of HPC technologies than the supercomputer-dominated world of yesteryear.

7.6 Elastic MapReduce
Amazon Elastic MapReduce (EMR) is a managed cluster solution natively supporting a family of framework-associated applications (Apache Hadoop, Spark, HBase, Hive, Pig and Presto) commonly used to process and analyze high-throughput genomics datasets. EMR is a managed service that enables fast and cost-effective automatic scaling of compute resources for your cluster based on your workloads. Amazon EMR has the added advantage of integrating well with other AWS services (e.g. databases, Amazon S3) to enable optimized data flow throughout the stages of analytical processing. For example, Presto capabilities in EMR facilitate data import from Presto-based Amazon Athena queries on S3 data objects. Users can also take advantage of the EMR File System (EMRFS) to use S3 as a data layer for running EMR cluster applications that require data persistence. For processing data on the master and core nodes of your cluster, EMR supports the Hadoop Distributed File System (HDFS). Application outputs should be backed up to S3 prior to cluster termination. Amazon EMR is used by users all over the world for machine learning, economic data analysis, scientific simulations, bioinformatics workloads, and log analysis.
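As a minimal sketch of launching such a cluster programmatically (the cluster name, key pair, log bucket, and instance counts are placeholder assumptions), a small Spark-ready EMR cluster can be created with boto3 like this:

# Minimal sketch: launch a small EMR cluster with Spark installed.
# Cluster name, log bucket, key pair, and sizes are placeholder assumptions.
import boto3

emr = boto3.client("emr")
response = emr.run_job_flow(
    Name="research-spark-cluster",
    ReleaseLabel="emr-5.8.0",
    Applications=[{"Name": "Spark"}],
    LogUri="s3://my-bucket/emr-logs/",          # placeholder bucket
    Instances={
        "MasterInstanceType": "m4.large",
        "SlaveInstanceType": "m4.large",
        "InstanceCount": 3,
        "Ec2KeyName": "my-key-pair",            # placeholder key pair
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print("Cluster ID:", response["JobFlowId"])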


Amazon EMR is well documented46 and frequently cited by users who have built widely available tools using Amazon EMR. The AWS Management Console EMR entry portal provides options to quickly start an EMR cluster. EMR is integrated with many of the other AWS services so that you can build customized workflows to fit your use cases. For example, using the Lambda service, you can set up an event trigger when data is uploaded to your S3 bucket that launches an EMR cluster via AWS Data Pipeline for analysis of your data – fully automated!

7.7 Tutorials
- Package and deploy a bioinformatics workflow on AWS using AWS Step Functions and AWS Batch: https://github.com/awslabs/aws-batch-genomics/tree/develop

- Use CfnCluster to set up a cluster and run an MPI hello world example: https://aws.amazon.com/getting-started/projects/deploy-elastic-hpc-cluster/
- Use Alces Flight for an OpenFOAM (computational fluid dynamics or CFD) simulation: http://docs.alces-flight.com/en/stable/getting-started/environment-usage/using-openfoam-with-alces-flight-compute.html
- Use Alces Flight for a NAMD (molecular dynamics or MD) simulation: http://docs.alces-flight.com/en/stable/getting-started/environment-usage/namd_on_flight.html
- For both Alces Flight tutorials, first launch an Alces Flight cluster as explained at http://docs.alces-flight.com/en/stable/launch-aws/launching_on_aws.html.

46 https://aws.amazon.com/emr/


8 Machine Learning and other Advanced Services

Researchers are increasingly challenged with managing and analyzing ever-growing data volumes. Whether you’re working directly with new instruments generating massive data volumes or you’re collaborating on datasets created by other researchers, the tools to curate, triage, and work with that data are changing. For example, machine learning is now used to automate the process of gaining original insights from large datasets. These challenges are shared widely outside the world of research. AWS has developed advanced services not offered by traditional computing centers, in order to meet these modern needs. In this chapter we look at some of these advanced services, with a particular focus on machine learning and artificial intelligence.

8.1 Machine Learning and Predictive Analytics
Amazon Web Services provides powerful tools and libraries for applying machine learning for feature detection and predictive analytics to datasets. AWS has several fully-managed artificial intelligence services, including natural language understanding, text-to-speech, automatic speech recognition, and image recognition as a service. These services provide APIs for you to interact with. By making these pre-trained ML models available as scalable web services, AWS lets you quickly use AI capabilities that would otherwise require you to build and manage your own complex machine learning solutions. However, if you wish to build your own solutions and services, you can use individual AI engines such as Apache MXNet to do so. Many research groups bring their own machine learning algorithms, tools, and libraries, and these can be run effectively at scale on AWS as well. Figure 41 (AI Services, Solutions and Engines) below shows how the suite of machine learning engines, solutions and services on AWS fit together. You can choose the layer and AWS service that best fits the level of abstraction and the capabilities you require.

Figure 41 - AI Services, Solutions and Engines.


8.2 Amazon Machine Learning (AML)
Amazon Machine Learning (AML) is a scalable, supervised machine learning service. AML uses industry-standard logistic regression algorithms to automatically create models. Users can modify and update those models with a simple web interface. AML handles several types of ML tasks:
- Binary classification problems
- Multiclass classification problems
- Regression problems

To use AML you define a data source and then let AML split that into training and evaluation datasets, or you can control what part of the data source is used for training and what part is used for evaluation. AML will also shuffle your data for you, and then process features to build a default model. You can modify the default model, and then AML will use the training data to train the model. AML provides a wizard for you to evaluate the model performance, and at this point you can set different scoring thresholds to control how the model turns scores into predictions. Once you’re happy with the model, you can start using it to make batch predictions, or use the AML API to make real-time predictions on data. While you can’t change the algorithm used to define the model, AML does let you tune the model and scoring thresholds to control how predictions are made. AML focuses on ease of use and is an excellent first choice for basic predictive analytics over small batch or real-time datasets. Take a quick AML tutorial or try these samples.

8.3 Deep Learning on AWS
Machine learning with deep neural network approaches is called deep learning. The availability of capable and powerful GPUs on AWS makes building, training and evaluating large deep neural networks possible. AWS provides the AWS Deep Learning AMIs to quickly launch an Amazon EC2 instance with popular deep learning libraries and frameworks like Apache MXNet, TensorFlow, the Microsoft Cognitive Toolkit (CNTK), Caffe, Caffe2, Theano, Torch and Keras. Stay tuned for a hands-on tutorial in Chapter 9. In some cases, you’ll need to train complex deep learning models using more than a single EC2 instance. Training sophisticated deep learning models can take a lot of time, in some cases many hours or days. Distributing this work across a cluster and doing training in a distributed fashion can dramatically speed up the training process. Faster training has multiple benefits, including:
- Iterative model design and improvement
- Getting research results faster
- Adapting more quickly to new data being acquired

The AWS Deep Learning Cluster simplifies running a distributed deep learning cluster on Amazon EC2, with the AWS Deep Learning AMI running on multiple CPU or GPU instances.

It uses AWS CloudFormation to automate the creation of the cluster. Shared storage is provided using the Amazon Elastic File System (EFS) service, and you can select the type of instance for your cluster, e.g. the P2 instance type with NVIDIA K80 GPUs. The workers are deployed into an auto-scaling group so more can be added later without affecting the existing cluster nodes. Once you have trained your deep neural network, you can often persist these trained models and evaluate them independently on less powerful instance types.

8.3.1 Apache MXNet
Apache MXNet is a deep learning library focusing on flexibility, portability and performance. It provides both imperative and symbolic programming idioms, supports multiple languages like C, C++, Python, R, Julia, Scala etc., and can be run on embedded hardware through to desktop and server-class hardware. It can take advantage of CPUs and GPUs and scales well for distributed training on AWS. The scalability that can be achieved during the training phase with MXNet is part of the reason the AWS Deep Learning Cluster makes MXNet available by default. To explore MXNet further in a distributed learning cluster, have a look at Distributed Deep Learning Made Easy on the AWS Compute Blog. This blog post will step you through creating an AWS Deep Learning Cluster and running the classic MNIST image recognition task using MXNet.

8.4 Artificial Intelligence as a service (Amazon Rekognition, Amazon Lex and Amazon Polly)
Figure 41 shows the higher-level AI services: Amazon Rekognition, Lex and Polly.

8.4.1 Amazon Rekognition
Amazon Rekognition provides a service and API you can use to add deep learning-based image recognition to your applications. It’s a service in front of a pre-trained deep neural network that is really good at feature detection in images. Common use cases for Rekognition include:
- Facial recognition and face-based user verification
- Sentiment and demographic analysis
- Searchable image libraries

Rekognition provides all these features in a highly scalable and low-cost way so you can use it to analyze millions of images. It integrates natively with other AWS services such as Amazon S3 and AWS Lambda. By using integrations with these other services you can build scalable workflows to do feature detection on very large numbers of images.

8.4.2 Amazon Lex
Amazon Lex provides services to do automatic speech recognition (ASR) and natural language understanding (NLU). This lets you build applications to convert speech to text, and to recognize the intent of text.


Lex also integrates natively with AWS Lambda, so you can build software to run in response to specific triggers. Some applications for Lex include:
- Bots to control hardware devices
- Transactional bots to perform application actions
- Bots that can access and integrate research data stores

8.4.3 Amazon Polly
Amazon Polly lets you convert text into natural speech. It supports multiple languages, and is fast, so you can implement real-time interactive dialog in your applications. Applications built using Polly can help people with reading disabilities or help visually impaired people use applications more easily. By using Polly and cloud-based text-to-speech (TTS), you’re able to achieve higher fidelity conversions than on small hardware solutions.

8.5 Amazon Athena
Amazon Athena lets you quickly explore and interrogate data stored in Amazon S3 using ANSI Structured Query Language (ANSI SQL). Athena is powered by Apache Presto, so you can define your schema dynamically for a given dataset in S3, and then use that schema to query those datasets. Athena supports a variety of well-known data formats like CSV, JSON, ORC, Parquet and Avro. It’s fairly common in research to ingest large amounts of data that you want to capture and store durably, and then explore later in a flexible, ad hoc fashion. Athena supports this kind of workflow by integrating natively with S3, where you may already be storing research data. Also, because Amazon Athena is “serverless”, there’s no infrastructure for you to spin up or set up, and you pay only for the queries you run. For example, Athena has been used for interactive analysis of genomics datasets, deriving insights from IoT heartrate sensors47, running R, querying OpenStreetMap, analyzing OpenFDA public food safety data, analyzing New York City taxi records, and more. Figure 42 shows this in action. Source data is stored in the S3 bucket. The Athena service stores the catalog and the schema definition. No data transformation or processing work has to happen on the raw data. Athena provides your schema as an overlay interface to your source data.

47 This fun tutorial combines Amazon Athena, IoT, Kinesis Firehose, and QuickSight.


Figure 42 - Athena - Schema and Query workflow.
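A minimal boto3 sketch of that workflow (the database, table, and S3 output location are placeholders for your own setup, and the table’s schema is assumed to be defined in Athena already) might look like this:

# Minimal sketch: run an ad hoc SQL query against data in S3 with Amazon Athena.
# The database, table, and S3 output location are placeholders; the table's
# schema must already be defined in Athena's catalog.
import time
import boto3

athena = boto3.client("athena")
query = athena.start_query_execution(
    QueryString="SELECT sample_id, COUNT(*) FROM readings GROUP BY sample_id",
    QueryExecutionContext={"Database": "research_db"},        # placeholder
    ResultConfiguration={"OutputLocation": "s3://my-bucket/athena-results/"},
)
execution_id = query["QueryExecutionId"]

# Poll until the query finishes, then fetch the results.
while True:
    status = athena.get_query_execution(QueryExecutionId=execution_id)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    results = athena.get_query_results(QueryExecutionId=execution_id)
    for row in results["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])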

Figure 43 - Querying healthcare centers in West Africa using Athena on OpenStreetMap.

8.6 AWS Internet of Things (IoT)
“Internet of Things” (IoT) refers to the emergence of enormous numbers of small, connected devices. Equipped with rich datasets and using advanced analytics, IoT can give us enormous insight into our world:

Measuring vibrations from wind turbine blades and performing real-time analysis to determine maintenance needs before the blades fail. Reducing energy consumption in buildings by controlling lighting on floors where no one is present. Or creating self-driving vehicles that process environmental information to make split-second decisions to stop and avoid accidents. The collective knowledge about the physical world, gained through IoT, becomes the input for more efficiency, new business models, lower pollution, and better health. As an example of how AWS IoT is already transforming the use of data, NASA Jet Propulsion Laboratory (JPL) processes data in the AWS Cloud collected from instruments all over the solar system. Testing with AWS IoT allows NASA to integrate and process the data provided by sensors in mobile devices, smart devices, conference rooms, clean rooms, and beyond. AWS IoT is a managed cloud environment that lets connected devices easily and securely interact with cloud applications and other devices. AWS IoT can support billions of devices and trillions of messages, and can process and route those messages to AWS endpoints and to other devices reliably and securely. With AWS IoT, your applications can keep track of and communicate with all your devices, all the time, even when they aren’t connected. Get started with a 60-minute IoT application tutorial.

8.7 Amazon WorkSpaces
Amazon WorkSpaces offers an alternative to the expensive “workstation class” desktop machines many researchers have traditionally bought for computing or visualization work – typically, a machine with many CPUs, a large amount of RAM, perhaps one or more powerful GPUs, and excellent network connectivity to your research data stores. As noted by Jeff Christen, Information Science Instructor at Cornell University: “we can rethink how we teach with Amazon WorkSpaces. We can offer more to students – more interesting class content and more interactivity – without adding complexity for instructors. The sky is the limit."48 Amazon WorkSpaces provides a virtual desktop (or “Desktop-as-a-Service”) option for researchers. The Amazon WorkSpaces client can be run on Windows, Mac, Chromebooks, iPad, Fire tablets, and Android tablets. Amazon WorkSpaces provides support for Windows 7 and Windows 10 operating systems. You can provision a “Performance” bundle, giving you 2 vCPUs and 7.5 GiB of memory, to power that virtual desktop for standard tasks. For more demanding visualization tasks requiring GPU capability, you can choose the “Graphics” bundle, which gives you 8 vCPUs and 15 GiB of memory with an NVIDIA GRID GPU49. Persistent storage is provided to each desktop using SSD storage. The Amazon WorkSpaces Sync client lets you synchronize folders to Amazon S3, where you can store your research data long-term. This also lets you access data stored in this folder on any device where you use your WorkSpaces.

48 https://aws.amazon.com/education/workspaces-in-edu/ 49 If you need even larger machines than those provided by Amazon Workspaces, you can use e.g. Citrix XenDesktop running on Amazon EC2.


9 Jupyter and Zeppelin Notebooks on AWS

Jupyter is a Notebook technology. It is a user-friendly and easy-to-use interactive programming environment in a web browser. It allows interactive data science and scientific computing across 40 different programming languages. Jupyter runs well just about anywhere: on your laptop, your desktop machine, or in the AWS Cloud. It allows researchers to share and exchange live code, datasets, and visualizations so that they can collaborate more efficiently.

9.1 Jupyter on AWS
By running Jupyter Notebooks on AWS, you can take advantage of infrastructure web services you might not normally have access to in your Notebooks. With Jupyter on AWS, a scientific researcher, engineer, or technical user can run applications, write code, or post-process data using:
- Large memory instances, e.g. up to 2TB RAM instances (x1 or r4 families)
- Compute-optimized instances, e.g. up to 36 vCPUs (c3 or c4 families)
- NVIDIA GRID GPUs (g2 family) or NVIDIA K80 GPUs (p2 family)
- Spark and Hadoop on Amazon Elastic MapReduce (EMR)
- HPC clusters (AWS Batch and CfnCluster)
- AWS Open Datasets like Landsat, NEXRAD, or The Cancer Genome Atlas

Essentially you can take advantage of any of the powerful capabilities made available to you on AWS – while enjoying the flexible Jupyter interface. The Deep Learning AMIs available in the AWS Marketplace make it relatively easy to get started using Jupyter on EC2. These AMIs also bundle a number of languages and frameworks you might want to use with your Jupyter notebooks, including Python, Anaconda, and a number of machine learning and deep learning libraries. Alternatively, if you want to build your own Jupyter installation using a custom configuration on EC2, you can follow the Creating and Using a Jupyter Instance on AWS whitepaper, which describes NVIDIA GPU capabilities like CUDA, to build your own custom Jupyter AMIs.

9.2 JupyterHub and Zeppelin with Amazon EMR
Apache Zeppelin is an open source GUI that creates interactive and collaborative notebooks for data exploration using Spark. You can use Scala, Python, SQL (using Spark SQL), or HiveQL to manipulate data and visualize results. Zeppelin uses the Spark settings on your cluster and can utilize Spark’s dynamic allocation of executors to let YARN estimate the optimal resource consumption. Like Jupyter notebooks, Zeppelin notebooks can be shared among several users, and visualizations can be published to external dashboards. Amazon EMR supports installing both JupyterHub and Zeppelin as an application on an EMR cluster during initial cluster setup. In the case of Zeppelin, a Spark Context is created to run interactive Spark jobs on the EMR cluster where it’s installed.

By default, Jupyter and Zeppelin notebooks are stored on the master node. For more information, see Run Jupyter Notebook and JupyterHub on Amazon EMR and Running an External Zeppelin Instance using S3 Backed Notebooks with Spark on Amazon EMR.

9.3 Train a Machine Learning model on AWS through a Jupyter Notebook
In this tutorial, you will use Jupyter notebooks on AWS and Apache MXNet to build, train and evaluate a couple of different deep learning models using EC2 GPU instances. An updated version of this tutorial is available at https://github.com/scicolabs/data-science-ml/blob/master/Jupyter.md. One of the big advantages of running Jupyter notebooks on AWS is that it allows you to take your code and analysis to the data, rather than having to move large datasets to your tools (an increasingly difficult task for large and growing research data volumes). This means you can be as proximate as possible to some of the most interesting datasets in the world, like Earth on AWS and the Cancer Genome Atlas. This combination means you can use some of the newest and most powerful analysis tools and techniques right next to large datasets to do some pretty interesting things. To successfully complete this tutorial, you should be familiar with:
- Basic Amazon EC2 concepts
- The basics of the Python programming language
- The basics of git and Github.com

9.3.1 Step 1 – Creating your Jupyter Notebook environment
We'll use the Deep Learning AMI (Amazon Machine Image) provided by AWS to run Jupyter and interact with common deep learning frameworks and libraries.

9.3.1.1 Creating your Notebook Environment
The first thing you’ll need to do is log into the AWS Console using your web browser. You’ll also need a valid Amazon EC2 Key Pair for SSH access to your EC2 instance. If you have one, you can happily continue with this tutorial. If not, refer to chapter 2.6 of this handbook for instructions. The AWS Deep Learning AMIs are available in the AWS Marketplace or via the Amazon EC2 Console. We'll launch the Ubuntu-based Deep Learning AMI via the EC2 Console. To do this, first make sure you've logged into the AWS Console and browse to the EC2 Console. Then, from the EC2 Dashboard:
1. Click on the Launch Instance button
2. On the left-hand navigation, click AWS Marketplace
3. In the search box, type 'Ubuntu Deep Learning', and hit Enter.


4. Click the Select button
5. From the Instance Type screen, choose the 'p2.xlarge' instance type. The P2 instance family has powerful NVIDIA K80 GPUs appropriate for training deep learning models.
6. Click the Next: Configure Instance Details button, and leave all the default settings in place
7. Click the Next: Add Storage button, and leave the default storage settings in place
8. Click the Next: Add Tags button. This screen lets us tag our Deep Learning instance with metadata so we can find it again. In this case, click the Add Tag button and add a Name key, and a descriptive name in the Value field.


9. Click the Next: Configure Security Group button, and leave the default security group settings in place
10. Finally, click the Review and Launch button.
11. If you're happy with your configuration (you might want to check over it again just to be sure), click the Launch button
12. Here you'll be prompted to select your key pair. Carefully select a key pair, and click the Launch Instances button.

Your new Deep Learning EC2 instance is now launching. It'll take a couple of minutes for this to complete. This is also how you generally create a new EC2 instance on AWS. There are quite a few customization steps, but you can safely accept many of the defaults.

9.3.1.2 Using your Notebook environment
Jupyter provides a web interface for us to interact with. We’ll use this and become familiar with some of the functions it provides in this section. To do this securely, we'll set up an SSH tunnel to our EC2 instance running Jupyter, and use that tunnel to connect to our remote server. This means all traffic will be encrypted and we won't be exposing any unsecured public ports on the internet. This makes our connection to the Jupyter instance considerably more secure. But first, we need to know the public DNS name for our new Jupyter instance. Browse back to the EC2 console, and select your new instance. The instance details pane at the bottom of the screen shows you the Public DNS name for the instance.


Copy this DNS name into your clipboard. We'll refer to it below. If you're using OS X or Linux, open a terminal window and type the following:

ssh -i private_key.pem -L 8888:localhost:8888 ubuntu@publicdns_name

where publicdns_name is the DNS name for your EC2 instance, and private_key.pem is the path to the private key you downloaded earlier.50 If you're using Windows, you can use the PuTTY program to set up the SSH tunnel: follow Setting up an SSH tunnel with PuTTY. If you were able to open a tunnel and SSH to your EC2 instance, you should now see an Ubuntu Linux terminal. To start Jupyter on our EC2 instance, type the following at a command prompt:

jupyter notebook --no-browser

You should see something like the following output:

$ jupyter notebook --no-browser
[I 11:18:53.314 NotebookApp] Writing notebook server cookie secret to /run/user/1000/jupyter/notebook_cookie_secret
[I 11:18:53.468 NotebookApp] Serving notebooks from local directory: /home/ubuntu
[I 11:18:53.468 NotebookApp] 0 active kernels
[I 11:18:53.468 NotebookApp] The Jupyter Notebook is running at: http://localhost:8888/?token=f7979ebd4677c66a08591d30719681e6ecafd0ea86437739
[I 11:18:53.468 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 11:18:53.468 NotebookApp]

    Copy/paste this URL into your browser when you connect for the first time, to login with a token:
        http://localhost:8888/?token=f7979ebd4677c66a08591d30719681e6ecafd0ea86437739

If you now open a browser on your local machine (e.g. your laptop) and browse to the URL shown in your own terminal output (of the form http://localhost:8888/?token=<your token>),

50 If your key is rejected, make sure its file permissions are correct (you should have done this when you created your key): chmod 400 private_key.pem

you should see the Jupyter Notebook environment, like:

9.3.2 Step 2 – A HelloWorld Fractal Jupyter example
At the top right of the Jupyter application we can see a New dropdown. Click on that. You should see a number of options to create a new text file, a new terminal, and so on. We also have the option of creating two different types of notebooks: Python 2 and Python 3. These environments and programming languages have been installed for you by default. It's easy to install other languages too. Jupyter supports being extended this way, using different Jupyter kernels. For example, you might want to install R and Julia kernels to run notebooks using those languages.

The other main navigational element we want to learn about is the set of tabs across the top of the Jupyter application: the Files, Running, and IPython Clusters tabs. The Files tab is the default view, and shows us the files and folders within our Jupyter environment. Think of this as your Jupyter home directory. As we create new notebooks and modify their configurations, new files will appear here. The Running tab shows you the Terminals and Notebooks you are currently running. You can have multiple terminal sessions and notebooks running at the same time. The IPython Clusters tab allows you to configure your notebook environment to do parallel computing using Python. IPython Clusters is an example of a Jupyter Notebook extension. Notebooks have been created and made available on Github.com at https://github.com/scicolabs/data-science-ml. We are going to use this as a baseline to start exploring running code interactively in our new Notebook environment.


To do this we’ll open a new Terminal session, and use git to clone the examples above to our Jupyter environment.
1. Click on the New dropdown menu at the top right of your Jupyter environment
2. Select Terminal
3. This will open a Linux Bash shell in your web browser. From here you can run command line tools to do whatever you’d like within your Jupyter environment.
4. We’ve already installed the git tools for you, so let’s clone the GitHub repository. Run the following command in the Terminal:

git clone https://github.com/scicolabs/data-science-ml.git

You will see some status information, looking something like:

This command has cloned the data-science-ml Github repository and made the contents available in our Jupyter environment. Browse back to the Jupyter Files tab. You should see a new folder called data-science-ml. Click on that folder, and then click on the labs folder within it. The labs folder contains a number of notebooks. For example, let’s interactively run the 'Building fractals with x86.ipynb' notebook. Clicking on that file will load the notebook. You should see something like:


Scroll down and have a look at the notebook. We’ll dissect it into pieces later. First, let’s run the entire notebook. To do this, use the menu across the top: File, Edit, View, Insert, Cell, Kernel and Help.

1. Click on Cell
2. In the dropdown, click on Run All

You’ll notice that some cells have a number in square brackets next to them. These are the cells that have completed execution. Some cells will have an asterisk in square brackets. These cells are still running.

Here you can see that the third cell to execute has finished. This cell defined a function mandel in Python. The cell below it, which defines the function create_fractal, is still running.

The result of this simple little Python notebook is a Mandelbrot fractal that is displayed back to you within the notebook.
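The notebook’s own code is more elaborate, but the core idea can be sketched in a few lines of plain Python. The function names below mirror the notebook’s mandel and create_fractal; the bodies are an illustrative simplification, not the notebook’s exact code.

# Illustrative simplification: compute how quickly each point of the complex
# plane escapes, then display the result as an image. Function names mirror
# the notebook's mandel/create_fractal; the bodies are a minimal sketch.
import numpy as np
import matplotlib.pyplot as plt

def mandel(c, max_iters=20):
    """Return the iteration count at which |z| exceeds 2 (or max_iters)."""
    z = 0j
    for i in range(max_iters):
        z = z * z + c
        if abs(z) > 2:
            return i
    return max_iters

def create_fractal(xmin=-2.0, xmax=1.0, ymin=-1.0, ymax=1.0, width=300, height=200):
    image = np.zeros((height, width))
    for row, y in enumerate(np.linspace(ymin, ymax, height)):
        for col, x in enumerate(np.linspace(xmin, xmax, width)):
            image[row, col] = mandel(complex(x, y))
    return image

plt.imshow(create_fractal(), cmap="viridis")
plt.show()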


Jupyter is quite powerful in this way. You can return results, such as plotted data, rendered images, or raw datasets, directly within the notebook you're running. You can also use HTML5 elements like canvas to capture input and pass it to the code you run in the notebook; we'll see an example of this when we explore a handwriting-recognition deep learning model. All of these tools make interactively building and running your code much easier, because you can see the results of your code almost immediately. Now that we've done something useful in our Jupyter environment, let's go back to the original Jupyter tab in our browser and click on the Running tab at the top of the page. Here you'll notice that we have a running Terminal session as well as an active Notebook (the Python notebook we were interactively drawing fractals in).

You can control the lifecycle of both of these by clicking the Shutdown button.

Saving your Notebook. One important thing you'll want to do often is to save your notebook. Use the File menu and click Save and Checkpoint. As you save and checkpoint regularly, you'll be able to revert to previous versions of your notebook by using the File menu and clicking Revert to Checkpoint. You'll be presented with a list of all the checkpoints for the current notebook.

Sharing your Notebook. Jupyter notebooks make it simple to collaborate with other researchers. You might work hard on building a new data-mining program and want to share it with a collaborator. One way to do this would be to give them access to your Jupyter environment. A better, and perhaps more secure, way is to download your notebook and share it with them via GitHub.com or even email. They can then run a copy of your notebook in their own Jupyter environment, and you can continue working in yours uninterrupted. To download the current notebook, use the File menu and select Download as. You will see a number of options, including Notebook (.ipynb), Python (.py), HTML (.html), and so on. For a collaborator to use your notebook easily, select the first option, Notebook (.ipynb). Selecting this option will prompt you to save a new file to your local computer. This file is a self-contained copy of the current notebook that you can then send to whoever you wish. If you have finished the introduction above, feel free to experiment with the other notebooks in the labs folder. You'll notice some require you to fix missing dependencies. Feel free to start playing around with your new Jupyter notebook environment on AWS!

9.3.3 Step 3 – Machine Learning and AI with AWS

9.3.3.1 Apache MXNet
MXNet is a deep learning library focused on flexibility, portability, and performance. It provides both imperative and symbolic programming idioms; supports multiple languages such as C, C++, Python, R, Julia, and Scala; and runs on everything from embedded hardware through to desktop and server-class hardware. It can take advantage of both CPUs and GPUs and scales well for distributed training on AWS. The scalability that can be achieved during the training phase with MXNet is part of the reason the AWS Deep Learning Cluster makes MXNet available by default. In this tutorial, we're going to use some pre-built MXNet Jupyter notebooks and explore different networks, models, and training and evaluation approaches. Enough talking, let's get started!

9.3.3.2 Your first MXNet Notebook
To get started, we'll clone a GitHub repository containing a number of interesting MXNet notebooks and step through some of them. These notebooks are maintained as examples for you to learn about MXNet: https://github.com/dmlc/mxnet-notebooks. Let's clone the repository into our running Jupyter environment:
git clone https://github.com/dmlc/mxnet-notebooks.git


This will create a new folder mxnet-notebooks on your Jupyter instance. Now we explore some of the notebooks.

1. Click on the mxnet-notebooks folder.
2. Click on the python folder.
3. Open outline.ipynb.

This loads a new Python notebook in a new tab. You'll notice that it's a simple index to other notebooks and documentation on MXNet. This notebook will be the launch pad for exploring the mxnet-notebooks repository we've cloned into our Jupyter environment. Leave it open in a tab; we'll refer to it throughout the rest of the tutorial. Feel free to browse the other notebook examples. There are many types of networks presented and explained thoroughly in this repository, and you're free to use and modify them as you wish.

9.3.3.3 MNIST and Handwritten Digit Recognition
MNIST is a classic computer vision application: identifying handwritten digits with neural networks. Click on the MNIST link in the outline notebook. This opens a step-by-step guide to creating and training a simple multi-layer perceptron to recognize handwritten digits, as well as a more sophisticated convolutional neural network that does better recognition (a minimal sketch of this kind of network appears after the next subsection). This example also shows a nice feature of Jupyter notebooks: you can capture input within the notebook and use it in your code. In this case, the user draws a digit with the HTML5 canvas element, and that drawing is passed as input to the model for recognition.

9.3.3.4 Character-based language model using LSTM
Recurrent neural networks (RNNs) are a powerful and fairly special type of network that can operate over sequences of input and potentially generate sequences of output. This means RNNs can be used to recognize patterns in sequences of data, e.g. text, genomes, or spoken words. Long short-term memory (LSTM) networks are a type of RNN that have been used with great success for tasks like machine translation and language modeling. More generally, RNNs can effectively describe programs; in fact, RNNs are Turing complete. Click on the Char-LSTM link in the outline notebook. This opens a guide for building and using an LSTM network in MXNet to generate text based on a set of training data. This is a pretty fun example; most people who run through it try to generate speeches for U.S. Presidents or famous authors.
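The MNIST notebook itself walks through data loading and training in detail. Purely as an illustration of what the classic MXNet symbolic API looks like (a minimal sketch, not the notebook's code; the layer sizes, random data, and hyperparameters below are placeholders), a small multi-layer perceptron can be defined and trained roughly like this:

import numpy as np
import mxnet as mx

# Define a small multi-layer perceptron symbolically.
data = mx.sym.Variable('data')
fc1  = mx.sym.FullyConnected(data=data, num_hidden=128)
act1 = mx.sym.Activation(data=fc1, act_type='relu')
fc2  = mx.sym.FullyConnected(data=act1, num_hidden=10)
mlp  = mx.sym.SoftmaxOutput(data=fc2, name='softmax')

# Wrap some example data in an iterator (random data here, purely for
# illustration; the real notebook loads the MNIST images instead).
X = np.random.rand(1000, 784).astype('float32')
y = np.random.randint(0, 10, size=(1000,))
train_iter = mx.io.NDArrayIter(X, y, batch_size=100, shuffle=True)

# Bind the symbol to a device and train. Use mx.gpu() on a GPU instance,
# or mx.cpu() if no GPU is available.
mod = mx.mod.Module(symbol=mlp, context=mx.cpu())
mod.fit(train_iter, num_epoch=2,
        optimizer='sgd', optimizer_params={'learning_rate': 0.1})

On one of the GPU instance types, changing the context to mx.gpu() is typically all that is needed to train on the GPU.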

9.3.3.5 Wrapping up
Congratulations! In this tutorial, you created and used your first Jupyter Notebook environment on AWS. You then explored MXNet, interactively worked with some non-trivial deep learning examples, and accelerated the training and evaluation of these models on GPU instances in the cloud. This is just the beginning! You can run Jupyter notebooks on any instance type available in EC2. You can use Jupyter to run big data analytics using Amazon EMR (a managed Hadoop platform on AWS), and you can even control HPC clusters from the comfort of your Jupyter notebook. We'd also love to hear about your use case and what you're looking to do with AWS. Don't forget to turn your AWS resources off when you are finished!

9.3.4 Further tutorials
Run Jupyter Notebook and JupyterHub with an Amazon EMR cluster: https://aws.amazon.com/blogs/big-data/running-jupyter-notebook-and-jupyterhub-on-amazon-emr/
The UK Met Office has made 80 TB of the MOGREPS meteorological (weather) dataset available on S3 through the AWS Open Data program. They've published two tutorial notebooks showing you how to pull data from the dataset, manipulate it, and visualize it, using the Iris Python library for much of the work (a short sketch of pulling data from an open S3 bucket with boto3 appears at the end of this section). See http://data.informaticslab.co.uk/mogreps_data_basics.html and http://data.informaticslab.co.uk/mogreps_data_intermediate.html
The Awesome Data Science video tutorial series on YouTube is an excellent set of tutorials on using Jupyter for the basics of data science in Python. All the tutorials mentioned in the video series are also available in their GitHub repository.
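As a sketch of what "pulling data from the dataset" can look like programmatically (the bucket name below, mogreps-uk, is an assumption based on the Met Office tutorials and may differ; anonymous access is used because the data is public), a few lines of boto3 are enough to list and download objects:

import boto3
from botocore import UNSIGNED
from botocore.config import Config

# Public open-data buckets can be read without AWS credentials.
s3 = boto3.client('s3', config=Config(signature_version=UNSIGNED))

# NOTE: 'mogreps-uk' is an assumed bucket name -- check the Met Office
# tutorials linked above for the exact bucket and key naming scheme.
bucket = 'mogreps-uk'

# List a handful of objects to see how the keys are organised.
response = s3.list_objects_v2(Bucket=bucket, MaxKeys=5)
for obj in response.get('Contents', []):
    print(obj['Key'], obj['Size'])

# Download one object to the local Jupyter environment for analysis with Iris.
first_key = response['Contents'][0]['Key']
s3.download_file(bucket, first_key, 'sample_forecast_file')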


10 Learning more about AWS

Now that you've configured your account safely, set up budgets, and possibly poked around the AWS Console, you are ready for "real work", such as launching an Amazon EC2 instance or uploading and sharing your data with a collaborator. But what if there's more you need to learn? The Getting Started page51 is your jumping-off point for learning AWS. You can watch tutorial videos, take 10-minute tutorials (or tackle larger projects), and even sign up for online classes or in-person training at the AWS Training and Certification site52.

In addition, as a researcher, you may have access to resources that are not available to the general public. The first is AWS Educate53, a program for educational institutions (teachers and students) that includes an environment for curriculum sharing. If your campus is working with an AWS account manager, this AWS representative is your lifeline into the AWS organization. They can connect you to AWS specialists to help you choose the right AWS Cloud services and use them correctly. Your account manager can also help you locate even more learning and skill-building resources; see the Contact Us form at https://aws.amazon.com/contact-us/ to reach your AWS account manager. AWS employees and users have also produced a rich library of web-based resources, including blog articles, GitHub example code, YouTube videos, and more.

10.1 Hands-on and Face-to-Face
AWS has business operations in many countries and (together with partners) offers a variety of hands-on training in classroom formats. Our customer-facing teams also frequently deliver more context-specific hands-on events.

51 https://aws.amazon.com/getting-started/ 52 https://aws.amazon.com/training/ 53 https://aws.amazon.com/educate/


10.1.1 AWS Immersion Days
AWS's customer teams frequently run "Immersion Days": AWS workshops delivered by our solutions architect community that give you hands-on time with AWS Cloud services, with expert guidance in the room. We run Immersion Days on university campuses and often customize them for scientists with the help of our Research & Technical Computing Team. They're provided at no cost, and you'll come away with hands-on skills for spinning up instances, managing data, or even creating HPC clusters. It's a great chance to have some interactive time with key AWS technical staff and get a lot of your questions answered. To find out whether there is an Immersion Day near you, or to discuss hosting one at your site, contact us at https://pages.awscloud.com/Public-Sector-Contact-Us.html.

10.1.2 AWS Technical Essentials
AWS Technical Essentials is a face-to-face class held regularly in multiple locations around the world. It will teach you to recognize terminology and concepts as they relate to the AWS Cloud, and how to navigate the AWS Management Console. Participants gain hands-on experience with the foundational AWS services, including Amazon EC2, Amazon VPC, Amazon S3, and Amazon EBS. You'll also come away knowing about the security measures AWS provides and the key concepts of AWS Identity and Access Management (IAM). The class also covers AWS database services, including Amazon DynamoDB and Amazon RDS, and you'll get to use AWS management tools, including Auto Scaling, Amazon CloudWatch, Elastic Load Balancing, and AWS Trusted Advisor. To find an AWS Technical Essentials class near you, visit https://aws.amazon.com/training/course-descriptions/essentials/.

10.1.3 Further Training and Professional Certification
If you're ready to dive deeper and would like some formal classroom training to develop your AWS skills, check out the classes and courses offered by AWS's Training and Certification program. The AWS Training Portal, https://www.aws.training/, gives you access to your training and certification activities, progress, and benefits. It is a central place where you can enroll in AWS Training, register for AWS Certification exams, track your learning progress, and access benefits based on the AWS Certifications you have achieved. This makes it easier to build your AWS Cloud skills and advance toward earning AWS Certification.

10.2 Online
AWS has a lot of self-help training materials available online, which means you can learn new skills without waiting for a classroom session to become available.


10.2.1 Introductory Skills Training
If you're new to AWS and looking to gain foundational knowledge about key AWS Cloud services, our "Introduction to AWS" series includes instructional videos and labs. You'll find it here: https://aws.amazon.com/training/intro_series/.

10.2.2 Self-Paced Labs
AWS self-paced labs (https://aws.amazon.com/training/self-paced-labs/) provide hands-on practice in a live AWS environment with AWS services and real-world cloud scenarios. Follow step-by-step instructions to learn a service, practice a use case, or prepare for AWS Certification.

10.2.3 YouTube
The AWS channel on YouTube (https://www.youtube.com/user/AmazonWebServices) contains a lot of content, such as short tutorials and informative talks by Amazonians, all aimed at helping you learn something new. We also post the content from our global AWS Summits.

10.2.4 AWS website
The AWS website (aws.amazon.com) contains extensive documentation for each of our services. If you have specific questions or problems to solve, searching our website is often the best way to find a solution.

10.2.5 Community websites
It's likely that others in your community have already used AWS to do similar work; see about a hundred examples in 10.4 below. User forums or wiki sites for specific applications that you use may contain very specific "recipes" for deployment on AWS. It's always a good idea to check the date on any third-party guidelines, as technology changes fast: we have likely released much faster EC2 instance types or more advanced data analytics services that will serve you better than those used in a five-year-old research paper. Similarly, performance benchmarks that are several years old may not tell you much about performance today.

10.3 AWS Global Summit Series
Whether you are new to the cloud or an experienced user, you will learn something new at an AWS Summit. These events, held around the world, are designed to educate new users about the AWS Cloud and offer existing users deep technical content to be more successful with AWS. In 2016, we held summits in 38 different locations, all of which provided customers with invaluable direct access to AWS technologists. You can check the schedule here: https://aws.amazon.com/summits/.

10.4 Publications on research done in the AWS cloud
Please send us your research and papers at [email protected]!

10.4.1 Astronomy
Astronomy Case Studies:


 AWS Case Study: NASA/JPL's Desert Research and Training Studies  NASA/JPL's Mars Curiosity Mission Case Study  AWS Case Study: International Centre for Radio Astronomy Research (ICRAR)  Case Study: NASA JPL and Amazon SWF

Astronomy Articles:  NVIDIA - Exploring the SpaceNet Dataset Using DIGITS

Astronomy Blog Posts:  Looking Deep into our Universe with AWS  A Minimalistic Way to Tackle Big Data Produced by Earth Observation Satellites  Q&A with DigitalGlobe and HOT

Astronomy Videos:  How NASA's Mars Rover and Earth Analytics Use the Cloud  To Mars and Back: How NASA Uses the AWS Cloud

Astronomy SlideShare:  Bringing Governance to an Existing Cloud at NASA's Jet Propulsion Laboratory

10.4.2 Big Data
Big Data Case Studies:  AWS Big Data Customer Testimonials  AWS Case Study: National Taiwan University

Big Data Blog Posts:  A Minimalistic Way to Tackle Big Data Produced by Earth Observation Satellites

Big Data Videos:  AWS re:Invent 2016: Apache Spark on EC2 History, Best Practices with Customer Use Cases  AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake  AWS re:Invent 2016: Case Study: Data-Heavy Healthcare: UPMCe's Approach to Healthcare

10.4.3 Data Sharing and Transfer
Data Sharing and Transfer Case Studies:  Stanford Archaeology Center  Harris Geospatial Solutions - Geospatial Analytics in the Cloud  International Rice Research Institute Case Study

Data Sharing and Transfer Articles:  Grid - ATLAS and BNL Bring Amazon EC2 Online

Data Sharing and Transfer Blog Posts:


 Call for Computer Vision Research Proposals with New Amazon Bin Image Dataset  Announcing Terrain Tiles on AWS: A Q&A with Mapzen  Zooniverse's Open Source Answer to Disaster Relief

Data Sharing and Transfer Videos:  Research Collaboration Panel: Science Without Boundaries - Global Solutions to Scientific Problems  AWS re:Invent 2016: Earth on AWS - Next-Generation Open Data Platforms  Oregon State University Hatfield Marine Science Center Uses AWS Snowball Edge  AWS Public Sector Summit 2016 - Smart Cities: Open Grid

Data Sharing and Transfer Webinars:  Harris Geospatial Solutions - Geospatial Analytics in the Cloud with ENVI and Amazon Web Services

10.4.4 Environmental Science
Environmental Science Case Studies:  AWS Case Study: National Renewable Energy Laboratory's OpenEI.org

Environmental Science Blog Posts:  A Minimalistic Way to Tackle Big Data Produced by Earth Observation Satellites  Solving Problems with Open Data Imagery: Q&A with DigitalGlobe and HOT  NREL and AWS Bring Energy Data to Analysts and Researchers  Open Earth Observation Data for a Changing Planet  Q&A with Planet OS: Learn about the OpenNEX Climate Data Access Tool  Exploring the Possibilities of Earth Observation Data

Environmental Science Videos:  Oregon State University Hatfield Marine Science Center Uses AWS Snowball Edge  AWS Agriculture Analysis in the Cloud Day at Ohio State University

Environmental Science SlideShare:  Observation in the Cloud SlideShare

10.4.5 General Research
General Research Blog Posts:  What Data Egress Means for Higher Education: A Q&A with Internet2  AWS Offers Data Egress Discount to Researchers  Time to Science, Time to Results: Transforming Research in the Cloud

General Research SlideShare:  Accelerating Time to Science: Transforming Research in the Cloud


10.4.6 High Performance Computing (HPC)
HPC Case Studies:  AWS Case Study: NASA/JPL's Desert Research and Training Studies  AWS Case Study: National Taiwan University  New York University Langone Medical Center Case Study

HPC Blog Posts:  Alces Flight: Build Your Self-Service Supercomputer in Minutes  The Evolution of High Performance Computing: Architectures and the Cloud  High Performance Cloud Computing Supports Disease Prevention

HPC Videos:  AWS re:Invent 2016: High Performance Computing on AWS  AWS re:Invent 2016: Building HPC Clusters as Code in the (Almost) Infinite Cloud  AWS Public Sector Summit 2016: Building HPC Clusters as Code in the (Almost) Infinite Cloud  University of Heidelberg/HITS: Innovation for Education and Research in the Cloud  Launching an Alces Flight Compute Cluster from AWS Marketplace

10.4.7 Internet of Things (IoT)
IoT Case Studies:  AWS IoT Customer Testimonials

IoT Blog Posts:  University of Muenster Creates openSenseMap to Engage with Citizens and Students

IoT Videos:  AWS Agriculture Analysis in the Cloud Day at Ohio State University

10.4.8 Life Sciences
Life Sciences Case Studies:  AWS Life Sciences Customer Testimonials  AWS Case Study: Penn State  AWS Case Study: University of California Berkeley AMP Lab's Carat Project  University of California Berkeley AMP Lab's Genomics Research Project Case Study  Baylor College of Medicine Case Study  Caltech Guttman Lab Case Study  Harvard Medical School Case Study  Icahn School of Medicine at Mount Sinai Case Study  Murdoch University Case Study


 UC Santa Cruz Genomics Institute Case Study  New York University Langone Medical Center Case Study  University of Chicago Case Study  San Francisco State University Case Study  American Heart Association Builds Precision Medicine Platform on AWS  Benchling Case Study

Life Sciences Blog Posts:  MalariaSpot: Diagnose Diseases with Video Games  Calling All Data Scientists to Help Improve Cancer Screening Technology  Exatype: Cloud for HIV Drug Resistance Testing  Cloud-Enabled Innovation in Personalized Medical Treatment  Building Bridges for Better Cancer Treatment with the Fred Hutchinson Cancer Research Center  How the Healthcare of Tomorrow is Being Delivered Today  New AWS Public Dataset - 3000 Rice Genome  New AWS Public Datasets - TCGA and ICGC  Human Longevity, Inc. - Changing Medicine Through Genomics Research  Frequently Asked Questions About HIPAA Compliance in the AWS Cloud  Genome Engineering Applications: Early Adopters of the Cloud  Vodafone DreamLab - Accelerating Cancer Research  An Eye on Science: How Stanford Students Turned Classwork into Their Life's Work  High Performance Cloud Computing Supports Disease Prevention  Analyzing Genomics Data at Scale using R, AWS Lambda, and Amazon API Gateway

Life Sciences Videos:  AWS re:Invent 2016: Large-Scale, Cloud-Based Analysis of Cancer Genomes  Genomics at Scale: Using the AWS Cloud for Population-Scale Analysis of Genomics and Life Science  AWS re:Invent 2015: Large-Scale Genomics Analysis with Amazon Redshift  AWS Symposium - Washington DC: Genomic Data Privacy and Security in Human Research and the NIH  Analyzing Genomic Data for Whole Populations: How AWS Enables Analysis of Large Cohorts  AWS Case Study: US Centers for Disease Control and Prevention (CDC)  Claritas Genomics Delivers Services 75% Faster by running on AWS  This Is My Architecture: Analyzing Genomic Data at Scale on AWS with Station X's GenePool  This Is My Architecture: UC Santa Cruz: Using Mesos & Amazon EC2 Spot to Enable Low-Cost Cancer Research


10.4.9 Machine Learning
Machine Learning Case Studies:  San Francisco State University Case Study  Black Dog Institute and CSIRO Case Study  University of California Berkeley AMP Lab's Genomics Research Project Case Study

Machine Learning Articles:  GeekWire - Amazon Hires Carnegie Mellon Machine Learning Expert as Google Expands its own AI Initiatives  GeekWire - Amazon Beefs Up Machine Learning Presence in UK with New Team of Researchers  ZDNet - Should Amazon be Your AI and Machine Learning Platform?  What do Birds and Social Media Have in Common? AWS Machine Learning Support

Machine Learning Blog Posts:  An Eye on Science: How Stanford Students Turned Classwork into Their Life's Work  Call for Computer Vision Research Proposals with New Amazon Bin Image Dataset  AWS Enables Consortium Science to Accelerate Discovery

Machine Learning Videos:  Amazon Core Machine Learning Team YouTube Video

10.4.10 Physics
Physics Articles:  Inside Big Data - Interview: Dr. Michael Ernst from Brookhaven National Laboratory  Information Week - Brookhaven Lab Finds AWS Spot Instances Hit Sweet Spot  Information Week - AWS Helps FermiLab Researchers 'See' Neutrino Particles  U.S. Department of Energy - Office of Science - ASCR Discovery - HEPCloud Formation

Physics Blog Posts:  Experiment that Discovered the Higgs Boson Uses AWS to Probe Nature  NOvA Uses AWS to Shed Light on Neutrino Mysteries


11 Finding and Building Solutions

Now that you're safely and securely established in the AWS Cloud, it's time to think about what you can do with the facilities at your disposal.

11.1 Build it yourself
With all of the services and features available in the AWS Cloud, starting out doesn't need to be an overwhelming process. Here is a suggested flow to help you quickly get up to speed so you can build the solution needed for your research:
1. Search AWS Blogs for similar applications or use cases.
2. Dive deeper.
3. Develop and test your proof-of-concept project.
4. Scale up to a production workload.

Step 1: Start by identifying a couple of blog articles with a use case similar to yours.
Step 2: Dive deeper into each of the services discussed in order to quickly get up to speed. There are monthly AWS webinars as well as online YouTube videos. In addition, there are online self-paced labs and a rich library of written content for each service available through the AWS web portal, including white papers, user guides, and developer forums. There are also GitHub repositories, Gist resources, and AWS Gitter channels.
Step 3: Develop a small proof-of-concept (POC) test case for your AWS solution. For example, genomics customers building a pipeline for sequence analysis may downsample their read dataset, or take reads from only a single chromosome, when testing a development POC pipeline. It is important to design for scale-up and to implement the features you will need at production scale (i.e. use a shared file system that will accommodate your full production workload, architect using load balancing, and select the right database or data warehouse for the dataset you will use at scale, even if you are testing with a much smaller file).
Step 4: Scale up to a production workload and check that your solution is working as expected.

11.2 The Cloud Credits for Research program

The AWS Cloud Credits for Research Program provides AWS promotional credits to support academic researchers who seek to:
 Build cloud-hosted, publicly available science-as-a-service applications, software, or tools to facilitate their future research and the research of their community.
 "Kick the tires" of the cloud to find out how their workloads will scale.
 Train a broader community on the use of the cloud for research workloads via workshops or tutorials.

This program isn't aimed at providing support for operations, ongoing or established research projects, or research projects that are limited in their potential scope for future AWS usage. It's not a substitute for your research funding body, but is designed to help lower the risk for you and your community of adopting the cloud. You may apply to the program and consult the FAQ at https://aws.amazon.com/research-credits/. Applications are reviewed quarterly, and decisions are typically communicated 2-3 months after the respective quarterly deadline. Award amounts vary depending on the research proposal and the usage requirements documented in it, and are made in the form of promotional credits applicable to AWS services.

11.3 Third Party Solutions
You may be able to save a lot of time by relying on a third party to build the AWS infrastructure you need. The AWS Partner Network (APN) is the global partner program for AWS. It has thousands of members, companies from all over the globe that provide services and technology platforms built on AWS. We break the APN down into two groups: Technology Partners and Consulting Partners.


APN Technology Partners provide software solutions that are either hosted on, or integrated with, the AWS Cloud. They include Independent Software Vendors (ISVs) and SaaS, PaaS, developer tools, and management and security vendors. Some of our Technology Partners make their offerings available in AWS Marketplace, which is like an application store for the cloud. The advantage of doing so is that the solutions you find there are sophisticated yet simple to start, are built to best practices, and any charges incurred are billed directly to your AWS account, which saves establishing vendor relationships with third parties. AWS Marketplace vendors frequently offer limited free trials so you can figure out whether their offering is right for you without incurring any software charges. APN Consulting Partners are professional services firms that help users design, architect, build, migrate, and manage their workloads and applications on AWS. Consulting Partners can help you if you have an unusual need or just need hands-on help getting started. The remainder of this handbook is dedicated to a presentation of the best research-oriented APN Technology and Consulting Partners. We hope you'll find inspiration while browsing the following chapters.


12 APN Technology Partners Listing

On the following pages, we introduce some of our APN Technology Partners, selected for what they offer to the global research community. Most of these partners offer products that are changing the way the world approaches scientific computing. We're delighted that these partners choose AWS as their platform. We continually work hard to understand customer needs and make sure that ideas for new features and services are fed back into our development cycle, so that the cloud becomes an ever-improving experience for everyone. The content of the partner pages that follow has been provided by third parties (AWS Partners). AWS cannot confirm whether the information presented is correct and shall bear no liability for the reliance of any customers on the information provided; AWS recommends that customers engage with the AWS Partner directly to clarify any information.

Contents
12 APN Technology Partners Listing
12.1 aewacs B.V.
12.2 AceCloud, by Acellera
12.3 Alces Flight
12.4 AWS Deep Learning
12.5 BeeGFS from Fraunhofer ITWM
12.6 CFD Direct Limited
12.7 DNAnexus
12.8 Edico Genome
12.9 figshare
12.10 Illumina, Inc. – BaseSpace® Sequence Hub
12.11 Intel Cloud Edition for Lustre
12.12 MathWorks has multiple offerings for AWS
12.13 Overleaf
12.14 RONIN
12.15 Seven Bridges
12.16 Sinergise - Sentinel Hub
12.17 Techila Distributed Computing Engine
12.18 Zenotech


Product or Company: aewacs (aewacs B.V.)
Home Page: http://www.aewacs.com/
Vendor Country of Origin: Netherlands
Delivery method: SaaS Portal
Domains: Chemistry, Physics, Astronomy, Mathematics, Geography, Geospatial, Graphics, Machine Learning, Bioinformatics, Biochem, Statistics, … (more information available at https://www.aewacs.com)

Regional availability – All AWS Regions are sovereign. (The per-region availability table from the original layout is not reproduced here.)

12.1 aewacs B.V.
Aewacs is a multi-tenant portal that automates scheduled activity for your various Amazon Web Services (AWS) cloud resources. With aewacs, you can easily create on/off time schedules for your systems, backups, auto scaling groups, and environments in AWS. As a result, your cloud resources are turned on only when you want them to be. Can you imagine the costs you'll save? Because all schedules are automated, you'll also save time and always have a clear overview of your resources' activities. Aewacs comes in different shapes and sizes: you can choose from 5 plans, which you can both upgrade and downgrade, so you'll always be on the plan that's right for you. Needless to say, there is a free plan to get you started.

12.1.1 How is it accessed?
SaaS (a web portal): the user connects to a public portal, registers for an account, and connects their AWS accounts.

12.1.2 User requirements
Users need minimal AWS skills in order to create a user with some policies and set up the account connection between aewacs and AWS. We have an extensive manual that describes in detail how to do this, as well as instruction videos covering every set of functionality we provide.

12.1.3 Trial
aewacs offers a free plan, not limited in functionality but limited in how many AWS resources can be controlled. The aewacs free plan can be used indefinitely.


12.1.4 Billing
Direct relationship with the vendor: aewacs offers a monthly subscription with different plans (https://www.aewacs.com/plans). We accept all major credit cards, and our users have the option to be billed in € or $.

12.1.5 Pricing & Licenses
aewacs is offered as a monthly subscription; see https://www.aewacs.com/plans for plan details.

12.1.6 Support
Users can access our support portal at https://www.aewacs.com/support. Support offerings vary per plan; see the details at https://www.aewacs.com/plans.

12.1.7 Training and Reference Sources
We provide online training in the form of free instruction videos.


Additional Information from aewacs
Why aewacs? aewacs is developed to deliver added value on top of the existing Amazon Web Services console, providing extra services not (yet) available in the AWS console. Currently aewacs enables you to create schedules for Amazon EC2 instances, Amazon RDS, and Auto Scaling groups. These schedules can be used to fully automate:
 on/off times for your EC2 instances;
 auto scaling actions (up / down / suspend);
 EC2 (AMI and volume) and RDS backups.
Using these aewacs features can easily reduce your monthly cloud cost by 25% or more with just 10 minutes of setup. aewacs comes in different shapes and sizes: you can choose from 5 plans, which you can both upgrade and downgrade, so you'll always be on the plan that's right for you. Needless to say, there is a free plan to get you started. Take a look at our website https://www.aewacs.com to learn more.
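For readers curious about what an on/off schedule does at the AWS API level, the snippet below is a purely illustrative boto3 sketch of stopping and later restarting an instance; it is not aewacs's own API, and the instance ID is a made-up placeholder.

import boto3

ec2 = boto3.client('ec2')

# Hypothetical instance ID -- replace with one of your own.
instance_ids = ['i-0123456789abcdef0']

# Stop the instance outside working hours...
ec2.stop_instances(InstanceIds=instance_ids)

# ...and start it again when the schedule says so.
ec2.start_instances(InstanceIds=instance_ids)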


Product or Company: AceCloud (by Acellera)
Home Page: https://www.acellera.com/acecloud
Partner type: Technology Partner
Vendor Country of Origin: United Kingdom
Delivery method: AWS Marketplace and custom client software
Domains: Computational Chemistry, Medicinal Chemistry, Drug Discovery, Drug Design, Machine Learning

Regional availability – All AWS Regions are sovereign. (The per-region availability table from the original layout is not reproduced here.)

12.2 AceCloud, by Acellera
Acellera's AceCloud enables anyone to perform high-throughput molecular dynamics simulations on AWS in a timely and secure way. By taking advantage of AWS's high-performance GPU instance types and EC2 Spot pricing, AceCloud delivers timely access to simulation, making it an ideal platform for production-quality MD-based investigation. Being secure by design and fully auditable in operation, AceCloud is a natural fit for companies wishing to undertake commercially confidential computational work on AWS. AceCloud's intuitive command-line interface abstracts all interactions with AWS and seamlessly provides the experience of running simulations directly on the user's own computer. Furthermore, the AceCloud client software is integrated into Acellera's ACEMD platform and the HTMD (High Throughput Molecular Dynamics) software, allowing customers to immediately take advantage of HTMD's sophisticated protocols for simulation-based drug design.

12.2.1 How is it accessed?
AceCloud is available on the AWS Marketplace at http://aws.amazon.com/marketplace/pp/B01N3SBK3Z. This AMI provides a standalone instance configured to run ACEMD and other Acellera software. The intuitive AceCloud client abstracts all interactions with AWS, managing the transfer of data to and from S3 and the creation of EC2 virtual machines, making the execution of work through AceCloud as straightforward as running on a local machine.


12.2.2 User requirements
To access AceCloud, the user need only subscribe to the AceCloud AWS Marketplace product and install the AceCloud client software (supported platforms: Linux, OS X, and Windows). The only direct interaction with AWS required is to configure IAM access tokens (full instructions are provided). Thereafter, the user runs all their computational work directly from the command line of their workstation. Acellera will provide support for all of these operations until everything is functional on the user's infrastructure.

12.2.3 Trial
AceCloud is included in our ACEMD platform software, which requires a license; AceCloud itself is then a pay-as-you-go product with no minimum fees.

12.2.4 Billing
Billing for AceCloud computation is directly via the AWS Marketplace.

12.2.5 Pricing & Licenses
AceCloud benefits from pay-as-you-go pricing, with hourly pricing as low as 40¢/hr including EC2 fees. No additional software licenses are required beyond the license for the ACEMD platform (End User License Agreement terms apply). AceCloud minimizes costs by automatically using Spot Instance pricing and intelligently placing jobs in the lowest-cost AWS Regions. With AceCloud, costs are only incurred for the duration of running computations.

12.2.6 Support
Marketplace customers may get support for AceCloud issues by opening an issue on Acellera's HTMD Issue Tracker at https://www.acellera.com/issues. Technical support covering installation and updates is included in the license fee.

12.2.7 Training and Reference Sources
Full instructions for using AceCloud are available online. Further training and consultancy services are available from Acellera for an additional fee. Users may also wish to attend Acellera's annual High Throughput Molecular Dynamics Workshop in Barcelona to receive training on modern ensemble-based molecular simulation methods and their execution on AceCloud. This work has been published in several articles:
M. J. Harvey and G. De Fabritiis. AceCloud: Molecular Dynamics Simulations in the Cloud. J. Chem. Inf. Model. (2015) 55 (5), pp 909–914.
S. Doerr, M. J. Harvey, F. Noé, and G. De Fabritiis. HTMD: High-Throughput Molecular Dynamics for Molecular Discovery. J. Chem. Theory Comput. (2016) 12 (4), pp 1845–1852.


Additional information from AceCloud by Acellera
How It Works
AceCloud provides the perfect way to run hundreds of molecular dynamics simulations only when required, supporting Amber and Charmm force fields and metadynamics with PLUMED. By exploiting GPU-based computation, you are able to achieve in excess of 200 ns/day on the DHFR benchmark when running on EC2.

AceCloud command line example
Step 1: Prepare an ACEMD simulation input.
$ ls .
input  structure.pdb  structure.psf  parameters.inp
Step 2: Submit the input directory of files to AceCloud, giving the job a meaningful identifier.
$ acecloud --submit . examplerun/3
AceCloud copies the files to AWS and instantiates a new EC2 virtual machine that begins to run the simulation.
Step 3: Monitor the progress of jobs.
$ acecloud --status --group examplerun
Group       Name  Status
examplerun  1     RUNNING
examplerun  2     COMPLETED
examplerun  3     PENDING
Step 4: Retrieve the output of jobs.
$ acecloud --retrieve examplerun/*
$ ls .
input  structure.pdb  structure.psf  parameters.inp  log.txt  trajectory.xtc
AceCloud copies back the output of any running or completed jobs, putting the files in the directories from which the jobs were originally submitted.


Product or Company: Alces Flight Ltd.
Home Page: http://www.alces-flight.com
Partner type: Technology Partner
Vendor Country of Origin: United Kingdom
Delivery method: AWS Marketplace or HPCaaS
Domains: Benchmarks, Biochemistry, Bioinformatics, Biophysics, Chemistry, Compilers, Databases, Electronics, Engineering, Geography, Graphics and Imaging, Languages, Libraries, Mathematics, Medicine, MPIs, Physics, Statistics, Tools, Visualization (more information available at http://docs.alces-flight.com)

Regional availability – All AWS Regions are sovereign. (The per-region availability table from the original layout is not reproduced here.)

12.3 Alces Flight
Alces Flight Compute provides a fully-featured, scalable High Performance Computing (HPC) environment for research and scientific computing. Compatible with both on-demand and Spot instances, Flight rapidly delivers a whole HPC cluster, ready to go and complete with a job scheduler and applications. Clusters are deployed in a Virtual Private Cloud (VPC) environment for security, with SSH and graphical-desktop connectivity for users. Data management tools for POSIX and S3 object storage are also included to help users transfer files and manage storage resources.

12.3.1 How is it accessed?
Alces Flight Compute is available as a Solo user experience in the AWS Marketplace, with multi-user and companion appliances available by contacting your AWS Account Manager or Alces Flight directly. Users are given the opportunity to configure the number and size of compute nodes, and to specify the characteristics of the cluster login node and shared filesystem, at launch time. Once launched, your personal Linux cluster is available to connect to via SSH or via collaborative remote desktop sharing. Each user can install any applications, libraries, MPIs, and compilers they need to run their jobs onto the cluster via the centrally managed Alces Gridware software repository. Applications are installed with module files for quick access and automatic dependency resolution. View the available applications at http://gridware.alces-flight.com.


12.3.2 User requirements
Alces Flight Compute has been purposely designed to give you the same user experience that comes with traditional HPC clusters. Those with some experience in HPC should be able to get started quickly, even if they haven't used AWS before.

12.3.3 Trial/Free Version
Alces Flight Solo has a freely available version for AWS users called the "Community Edition". This single-user version comes with community support as well as full access to documentation to get you working right away. There is also an HPC-as-a-service option available for users who need a one-click cluster launch; contact the partner for details and a free trial.

12.3.4 Billing
Users selecting Alces Flight Solo Community Edition do not pay any software charges; you are responsible for the AWS charges incurred while running clusters, which are billed to your account in the normal way. For customers selecting the Solo Professional Edition, billing for software is performed via AWS Marketplace in US dollars. Billing is charged at a pro-rata rate, e.g. subscribing for 1 day costs 1/30th of the monthly fee. Alces Flight Enterprise customers are typically larger sites or shared user groups, who often have specialized configuration requirements; please contact the Alces team directly or your AWS Account Manager to discuss a custom HPC cluster deployment.

12.3.5 Licensing
Alces Flight is made available for use under an End-User License Agreement (EULA) that is provided when users subscribe to the product on AWS Marketplace. The product contains both proprietary and open-source software, which users are granted a license to use for research purposes; please see the EULA text for full details.

12.3.6 Support
Alces Flight Compute includes access to support services based on your subscription level. Customers selecting Solo Community Edition can access the public support portal at https://community.alces-flight.com/. Customers selecting the Solo Professional and Enterprise Editions have a guaranteed SLA for queries and engage the Alces Flight Crew directly via a private area on the support site. Details of how to access commercial support services are provided when a paid subscription commences.

12.3.7 Training and Reference Sources
The Alces Flight team has put together comprehensive documentation at http://docs.alces-flight.com. We also have:
 Training videos on YouTube: http://tiny.cc/alcesflightvids
 Solution whitepapers: http://alces-flight.com (under the Launch section)
 A searchable list of available Gridware: https://gridware.alces-flight.com/


Additional Information from Alces Flight
What is Alces Flight Solo?
Alces Flight Solo is a software appliance designed to help researchers and scientists build their own high-performance compute cluster quickly and easily via AWS. The basic structure provided for users is as follows:
 One login node, plus a configurable number of compute nodes
 An Enterprise Linux operating system
 A shared filesystem, mounted across all nodes
 A batch job scheduler
 Access to a library of software applications and tools
Flight provides a preconfigured environment that is ready for work. The cluster you build is personal to you: users have root access to the environment and can set up and configure the system to their needs. Your cluster is designed to be ephemeral, i.e. you run it for as long as you need it, then shut it down. Although there is no built-in time limit for Flight clusters, the most effective way of sharing compute resources in the cloud is to book them out only when you need them. Contrary to popular belief, you can achieve huge cost savings over purchasing server hardware if you learn to work effectively in this way.
Who is it for?
Alces Flight Solo is designed for use by end-users: the scientists, researchers, engineers, and software developers who actively run compute workloads and process data. Flight provides tools that enable self-service; it's very configurable and can be expanded by individual users to deliver a scalable platform for computational workloads.
Prerequisites
To get started you need two things:
 A client device (e.g. a desktop or laptop computer) with an SSH client
 Access to cloud resources (an AWS account)
Check out the full prerequisites list here: http://docs.alces-flight.com/en/stable/overview/whatisit.html#prerequisites
Where can I get help?
The online documentation (http://docs.alces-flight.com) is designed to walk users through the first stages of creating their clusters and getting started in the environment. Capable users with some experience can be up and running in a handful of minutes; don't panic if it takes you a little more time, especially if you've not used Linux or HPC clusters before. We encourage new users to run through a few tutorials in the online documentation, even if you have plenty of HPC experience. It can also help to work collaboratively with other researchers running similar jobs, as often two sets of eyes are better than one.


There is a community site for supporting the Flight software at http://community.alces-flight.com. This website is designed to help users share their experiences of running Flight clusters, report any bugs with the software, and share knowledge to help everyone work more effectively. There is no payment for using this service except for the general requirement to be nice to each other.
Creating your Cluster
You can launch a cluster in the AWS Marketplace by following these steps:

1. Create (if you haven't already) and sign in to your AWS account and navigate to the AWS Marketplace.
2. Search for Alces Flight in the search box provided to find Alces Flight Solo Community Edition (or Solo Professional if you need a guaranteed response time for support queries) and click to select it.
3. Read the product information and click on the Continue button to view launch details. After clicking the Continue button from the main product page, select the Custom Launch tab in your browser.
4. Scroll down the page and select your local AWS region in the Select a Region section.
5. Choose Personal HPC compute cluster from the Deployment Options section.
6. Under the Launch section, click on the Launch with CloudFormation Console button to start deploying your cluster.

When you choose to start a Flight Compute cluster from AWS Marketplace, you will be prompted to answer a number of simple questions about what you want the environment to look like. Flight will automatically include your answers in a CloudFormation template to launch your desired configuration. See our guide to working with AWS CloudFormation: http://docs.alces-flight.com/en/stable/launch-aws/launching_on_aws.html#how-to-answer-cloudformation-questions
Using your Cluster
Once your cluster is launched, users have full access to install applications, run jobs via the cluster job scheduler, copy data to and from the cluster, and collaborate with other users in graphical desktop sessions. Use the links below to find more information on the facilities available on your personal HPC cluster:
 Using Linux on your cluster: http://docs.alces-flight.com/en/stable/basics/basic_cluster_operation.html
 Storing file and object data: http://docs.alces-flight.com/en/stable/databasics/data_basics.html
 Graphical desktop sessions: http://docs.alces-flight.com/en/stable/graphicaldesktop/graphicaldesktop.html
 Installing software applications: http://docs.alces-flight.com/en/stable/apps/apps.html
 Submitting work to the cluster job scheduler: http://docs.alces-flight.com/en/stable/jobschedulers/jobschedulers.html


Product or Company: Amazon Deep Learning (AWS Deep Learning Group)
Home Page: http://bit.ly/deepami & http://bit.ly/deepcfn
Vendor Country of Origin: USA
Delivery method: AWS Marketplace & GitHub
Domains: Deep Learning | Applicable to all domains

Regional availability – All AWS Regions are sovereign. (The per-region availability table from the original layout is not reproduced here.)

12.4 AWS Deep Learning
Machine learning is playing an increasingly important role in many areas of our work and our lives and is being employed in a range of computing tasks where programming explicit algorithms is infeasible. At Amazon, machine learning has been key to many of our work processes, from recommendations to fraud detection, from inventory levels to book classification to abusive-review detection. And there are many more application areas where we use machine learning extensively: search, autonomous drones, robotics in fulfillment centers, text and speech recognition, and so on. Among machine learning algorithms, a class called deep learning has come to represent those algorithms that can absorb huge volumes of data and learn elegant and useful patterns within that data: faces inside photos, the meaning of a text, or the intent of a spoken word. A set of programming models has emerged to help developers define and train AI models with deep learning, along with open source frameworks that put deep learning in the hands of mere mortals. Some examples of popular deep learning frameworks include Caffe, CNTK, MXNet, TensorFlow, Theano, and Torch. Amazon EC2, with its broad set of instance types and GPUs with large amounts of memory, has become the center of gravity for deep learning training. To that end, we recently made a set of tools available to make it as easy as possible to get started: a Deep Learning AMI, which comes preinstalled with popular open source deep learning frameworks (MXNet, TensorFlow, Theano, Torch, CNTK, and Caffe), GPU acceleration through CUDA drivers that are already installed, preconfigured, and ready to rock, and supporting tools such as Anaconda and Jupyter. Developers can also use the distributed Deep Learning CloudFormation template to spin up a scale-out, elastic cluster of P2 instances using this AMI for even larger training runs.
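Once an instance is launched from the Deep Learning AMI, a quick way to confirm that a framework can see the GPU is to run a tiny computation on it. The snippet below is a minimal, hedged check using MXNet (one of the preinstalled frameworks) and assumes at least one GPU is present on the instance.

import mxnet as mx

# Allocate a small array directly on the first GPU and do a trivial computation.
# If the CUDA drivers and the MXNet GPU build are working, this prints a GPU context.
a = mx.nd.ones((2, 3), ctx=mx.gpu(0))
b = (a * 2).asnumpy()   # copy the result back to the CPU as a NumPy array

print(a.context)   # expected: gpu(0)
print(b)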


12.4.1 How is it accessed?
 The Deep Learning AMI, supporting 6 different preinstalled frameworks and associated tools, is available HERE
 The CloudFormation template is available on GitHub HERE or via an EC2 Compute Blog post HERE
12.4.2 User requirements
Users should have experience with open source components, data manipulation and transformations, and general software engineering expertise. Machine and deep learning experience is highly recommended.
12.4.3 Trial
There is no trial period.
12.4.4 Billing
Via AWS Marketplace.
12.4.5 Pricing & Licenses
There is no additional charge for any of these tools. You pay for the AWS resources (e.g. EC2 instances or EBS volumes) you create to store models and data and to run training. You only pay for what you use, as you use it; there are no minimum fees and no upfront commitments.
12.4.6 Support
Support is available through forums, technical FAQs and the Service Help Dashboard.
12.4.7 Training and Reference Sources
Training is delivered on an ongoing basis, including at events like re:Invent, developer days and other conferences. Here is a link to some recent tutorials for MXNet (other frameworks have training available online):
 Slides: https://www.dropbox.com/s/08t3h5yezyejgwf/mxnet.zip?dl=0
 NVIDIA GTC2016: https://github.com/dmlc/mxnet-gtc-tutorial
 KDD2016: https://github.com/dmlc/mxnet-notebooks/blob/master/python/outline.ipynb


Product or Company: BeeGFS from Fraunhofer (AWS Technology Partner)
Home Page: http://itwm.fraunhofer.de/
Vendor Country of Origin: Germany
Delivery method: AWS Marketplace
Domains: Technologies for High Performance Computing and High Performance Data Analysis, Mathematics, Machine Learning (more information available at www.gpi-space.com, www.gpi-site.com, www.beegfs.com)
Regional availability – All AWS Regions are sovereign. [Per-region availability table: Dublin, Frankfurt, London, Paris, Montreal, N. Virginia, Ohio, N. California, Oregon, Sao Paulo, Singapore, Tokyo, Sydney, Seoul, Mumbai, Beijing, Ningxia]

12.5 BeeGFS from Fraunhofer ITWM
BeeGFS (formerly FhGFS) is the leading parallel cluster file system, developed with a strong focus on performance and designed for very easy installation and management. If I/O intensive workloads are your problem, BeeGFS is the solution.
12.5.1 How is it accessed?
BeeGFS is available in AWS Marketplace.
12.5.2 User requirements
When launched through AWS Marketplace, the user does not need any special skills to set up a BeeGFS shared storage cluster. The user simply needs to make some trivial decisions, such as how much disk space should be available, and everything gets set up automatically, leaving the user with a ready-to-use BeeGFS file system mount point. Data stored in BeeGFS can be kept persistent on EBS volumes even when the AWS machine instances are shut down.
12.5.3 Trial
A license-free version of BeeGFS is available in AWS Marketplace.
12.5.4 Billing
BeeGFS with professional support can be launched and billed through AWS Marketplace. Alternatively, it is also possible to use the free AWS Marketplace product (or install BeeGFS on a custom machine image) and contact [email protected] to make an individual contract for professional support.
12.5.5 Pricing and Licenses
BeeGFS does not require a license key.


BeeGFS clusters without professional support can be launched from AWS Marketplace and are free of charge (except for AWS instance and EBS usage fees). BeeGFS clusters with professional support are available for $350.00/mo + $0.08 to $1.326/hr, depending on the AWS server instance type.
12.5.6 Support
The community support mailing list and professional support email address are available here: www.beegfs.com/support
12.5.7 Training and Reference Sources
Training is not generally required to use BeeGFS. General documentation, whitepapers and performance benchmarks are available on the BeeGFS website: www.beegfs.com/documentation
To get a general overview, the BeeGFS PDF brochure is available here: www.beegfs.com/docs/BeeGFS_Flyer.pdf
An Alces Flight whitepaper, which describes how to use BeeGFS from an Alces Flight cluster, is available here: www.beegfs.com/docs/whitepapers/Enabling%20a%20parallel%20shared%20filesystem%20on%20an%20Alces%20Flight%20cluster.pdf
Additional Information from BeeGFS by Fraunhofer
BeeGFS - The Leading Parallel Cluster File System

What is BeeGFS
BeeGFS (formerly FhGFS) is the leading parallel cluster file system, developed with a strong focus on performance and designed for very easy installation and management. If I/O intensive workloads are your problem, BeeGFS is the solution.


Why use BeeGFS
BeeGFS transparently spreads user data across multiple servers. By increasing the number of servers and disks in the system, you can simply scale performance and capacity of the file system to the level that you need, seamlessly from small clusters up to enterprise-class systems with thousands of nodes.

Get The Most Out Of Your Data
The flexibility, robustness, and outstanding performance of BeeGFS help our customers to increase productivity by delivering results faster and by enabling new data analysis methods that were not possible without the advantages of BeeGFS.

Proven By Customers World-Wide
BeeGFS is used all around the globe to provide extremely fast access to storage systems of all kinds and sizes, including some of the fastest supercomputers in the world. Users come from very different sectors and industries, ranging from University HPC centers to finance, media, automotive, energy companies and a lot more.
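After launching the Marketplace product described in section 12.5.2, a quick health check from a client node might look like the sketch below. It assumes the deployment's default client mount point is /mnt/beegfs; adjust the path to whatever your stack reports.

# Confirm that metadata and storage services are registered
beegfs-ctl --listnodes --nodetype=meta
beegfs-ctl --listnodes --nodetype=storage

# Show capacity and free space across metadata and storage targets
beegfs-df

# Verify the client mount and write a 1 GB test file
df -h /mnt/beegfs
dd if=/dev/zero of=/mnt/beegfs/testfile bs=1M count=1024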


Product or Company: CFD Direct From the Cloud (AWS Technology Partner)
Home Page: http://cfd.direct/cloud
Vendor Country of Origin: United Kingdom
Delivery method: AWS Marketplace
Domains: Computational Fluid Dynamics, Engineering, Physics, Mathematics
Regional availability – All AWS Regions are sovereign. [Per-region availability table: Dublin, Frankfurt, London, Paris, Montreal, N. Virginia, Ohio, N. California, Oregon, Sao Paulo, Singapore, Tokyo, Sydney, Seoul, Mumbai, Beijing, Ningxia]

12.6 CFD Direct Limited
CFD Direct From the Cloud™ (CFDDFC) is a complete platform, deployed on AWS EC2, providing the OpenFOAM software for computational fluid dynamics (CFD) and other supporting software, running on the latest long-term support version of Ubuntu Linux. Users connect to an instance using SSH or, optionally, a remote desktop. CFD Direct From the Cloud is supplied by CFD Direct Ltd (http://cfd.direct), who manage, develop and distribute OpenFOAM free and open source to the public on behalf of the OpenFOAM Foundation, the owner and licensor of OpenFOAM. CFD Direct includes co-founders of OpenFOAM, Chris Greenshields and Henry Weller (its creator and architect), see http://cfd.direct/openfoam/about.
OpenFOAM has a large user base across most areas of engineering and science, from both commercial and academic organizations. It has an extensive range of features to solve anything from complex fluid flows involving chemical reactions, turbulence and heat transfer, to solid dynamics and electromagnetics. It includes tools for meshing, notably snappyHexMesh, a parallelized mesher for complex CAD geometries, and for pre- and post-processing. Almost everything (including meshing, and pre- and post-processing) runs in parallel as standard, enabling users to take full advantage of the computer hardware at their disposal.
12.6.1 How is it accessed?
CFD Direct From the Cloud is a product available in AWS Marketplace, see http://cfd.direct/cloud/aws.
12.6.2 User requirements
OpenFOAM is free, open source software, available from the OpenFOAM Foundation (http://openfoam.org), accompanied by resources, including a User Guide and example cases, to assist its users. Its effective use requires some familiarity and understanding of


CFD, including its key components: geometry and meshing, fluid dynamics and modeling, numerical methods, data analysis and general computing. OpenFOAM has always been developed for Linux/UNIX operating systems as a collection of 200+ applications that are executed via a command line interface. Commands can be collected to form effective, customized CFD workflows, with further automation through short scripts which capture procedures for future use, in a way that documenting GUI clicks cannot. Because OpenFOAM is command-line driven, it transitions seamlessly to the cloud with AWS EC2. As a preconfigured system, with updates, CFDDFC takes away the cost of system administration: work that includes builds, installation, tuning and maintenance of OpenFOAM and supporting software. That includes the ParaView visualization toolkit for post-processing, parallelization libraries, and video and graph-drawing tools.
CFD simulations can be deployed by an SSH connection to an EC2 instance: a secure login procedure from Linux, Windows and Mac OSX operating systems, see http://cfd.direct/cloud/aws/connect. For users who optionally wish to visualize the results of their simulations via the cloud, CFDDFC is configured to enable users to connect using the X2Go remote desktop client, also for Linux, Windows and Mac OSX, see http://cfd.direct/cloud/aws/remote-desktop. CFDDFC therefore provides OpenFOAM users with a familiar experience compared to running OpenFOAM on-premises, without the overhead of system administration. Users can choose which EC2 instance to run from those available for use with CFDDFC, e.g. C3 and C4 instances. We also provide instructions on clustering instances for more advanced users wishing to scale to larger parallel computations, see http://cfd.direct/cloud/aws/cluster.
12.6.3 Trial
There is no software charge for running CFDDFC on the t2.micro instance, so a free trial is effectively available for 12 months to new AWS subscribers within the AWS Free Tier, see https://aws.amazon.com/free.
12.6.4 Billing
CFD Direct From the Cloud is a pay-per-use product, billed directly through AWS Marketplace. There are no authorized resellers for the product. All billing is in US Dollars (USD).
12.6.5 Pricing & Licenses
CFD Direct From the Cloud is charged as an hourly software fee, alongside the hourly fees for EC2 and other fees for data storage and transfer. The software costs are typically a small fraction of the On-Demand charges for EC2 and are listed on the product page on AWS. All the software provided as part of CFDDFC is free and open source, requiring no additional fees for its use. Details of the open source licences of the software, including OpenFOAM, ParaView, Scotch/PTScotch, OpenMPI, CGAL, X2Go and the numerous programs contained in Ubuntu Linux, are provided on the product page.

There is no separate End User License Agreement (EULA) for CFDDFC, although the trademark and copyright of the CFD Direct and CFDDFC logos must be respected.
We have priced a typical CFD workload of external aerodynamics around a vehicle, see http://cfd.direct/cloud/cost. Using all 18 physical cores of a c4.8xlarge instance, we could generate a mesh of ~20m cells, run a steady-state simulation to convergence, calculate force coefficients and visualize pressure distribution, velocity profiles and streamlines, all in parallel, within 8 hours. The total cost of this workload, including data storage and transfer, was $17 using On-Demand instances and $6 using Spot instances.
12.6.6 Support
CFD Direct will provide support on the configuration and launch of an instance. Users must follow the instructions beginning from http://cfd.direct/cloud/aws/setup, and CFD Direct will answer questions regarding these instructions submitted through the "Installation Queries" web form at the designated cloud support web page, see http://cfd.direct/cloud/support.
12.6.7 Training and Reference Sources
CFD Direct provides training for the OpenFOAM software through its portfolio of OpenFOAM Training courses, including Essential CFD, Applied CFD and Programming CFD, see http://cfd.direct/openfoam-training. The training addresses the challenges of CFD analysis through a modular curriculum that builds competence across the core components of CFD. Training is hands-on, presenting participants with representative cases spanning a range of scientific disciplines and industries, e.g. external aerodynamics of a car, wind flow around buildings, static mixer, nozzle jet, cyclone, water channel, exhaust system, propeller, etc. We teach design of CFD solutions, starting with a prototype case that is then enhanced in small, digestible steps with periods of reflection to reinforce new concepts. Training is delivered globally through public scheduled classes, live virtual classes, and on-site courses. The training is updated for the latest developments of OpenFOAM, which specifically includes new features introduced by CFD Direct to make it easier to use, leaving more time for participants to build and practice their CFD skills.
Further references:
 OpenFOAM documentation: http://cfd.direct/openfoam/documentation
 OpenFOAM videos: http://cfd.direct/openfoam/videos
 Getting Started with CFDDFC video: http://cfd.tips/c001
 About CFD Direct Ltd: http://cfd.direct/about
 CFD Direct news: http://cfd.direct/news
Additional Information from CFD Direct
Use Case: External Aerodynamics around a Vehicle
OpenFOAM is free software for computational fluid dynamics (CFD). It offers an alternative to proprietary CFD software, which incurs large upfront costs in the form of licence fees and support and maintenance contracts.

Cloud computing extends this idea to hardware. It avoids minimum spend commitments and long-term contracts, replacing large upfront expenses with low, variable payments that only apply to what you use. CFD simulations cover a range of sizes and complexity. Activity is generally interrupted by quiet periods when simulations are halted, e.g. when a simulation ends during non-working hours, or during analysis of results, preparation of a new simulation, etc. The fluctuating demand makes CFD well suited to a pay-per-use model.
Cost Breakdown
Users need to understand the costs involved to get the best value from cloud and be confident that they fall within budget. The charges associated with EC2 are: EC2 instance costs; data storage and transfer to and from the EC2 instances; and software costs.
There are 4 ways to pay for EC2 instances, with On-Demand and Spot instances being the 2 options offering a 'pure' pay-per-use model. Charges for a running EC2 instance (a virtual server on AWS) are incurred hourly, with a fraction of an hour billed as a whole hour. On-Demand instances are sold at a fixed price where availability is guaranteed (within the limits of the SLA). Spot instances are spare EC2 instances that users can bid for, whose price fluctuates based on the supply and demand of available EC2 capacity. When a user has a bid accepted for a Spot instance, they are charged the Spot market price (not the bid price) while the instance runs. Spot instances can offer substantial savings over On-Demand instances, as shown in the AWS Spot Bid Advisor. For example, a c4.8xlarge instance that includes 18 physical cores is priced at $1.675 per hour in the US-Ohio (us-east-2) region. The Spot price history for the past 24 hours (2016-11-11) on that instance, in the same region, fluctuates between $0.20 and $0.40 per hour, with an average price of approximately $0.30 per hour, saving 82% of the On-Demand price. Savings are high because AWS must retain a significant amount of unused capacity to ensure availability of On-Demand instances. CFD is perfectly suited to take advantage of Spot instances because it can accommodate a delay on the rare occasion a bid is rejected.
Data is stored using Amazon Elastic Block Store (EBS) attached to an EC2 instance. It is charged by the amount of storage provisioned in GB per month, pro-rated to the hour, until the storage is released. When running CFD with EC2, the storage can be provisioned for the duration of the simulation, with the necessary data transferred out of the instance before it is terminated and the storage is released. The cost of EBS is typically $0.10 per GB-month. Amazon Simple Storage Service (S3) offers better value for longer term data storage. S3 pricing varies by region and frequency of access, but between $0.0125 and $0.03 per GB-month is typical. Amazon Glacier provides storage at an even lower cost of $0.007 per GB-month for data archiving.
Data transfer charges are listed as part of the On-Demand EC2 pricing. Transfer is charged at $0.09 per GB beyond the first 1 GB of data and up to the first 1 TB of a given month. This can be a significant cost which users can minimize with careful OpenFOAM configuration, e.g. setting up run-time post-processing of their simulations rather than downloading raw case data, writing data in binary or gzip-compressed format, etc.
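Recent Spot prices for a given instance type can be checked directly before bidding; the query below is a sketch that lists the last few c4.8xlarge Linux prices in us-east-2 (any Region, time window or instance type can be substituted; the date expression assumes GNU date).

# Show recent Spot price history for c4.8xlarge Linux/UNIX instances in US East (Ohio)
aws ec2 describe-spot-price-history \
    --region us-east-2 \
    --instance-types c4.8xlarge \
    --product-descriptions "Linux/UNIX" \
    --start-time "$(date -u -d '-1 day' +%Y-%m-%dT%H:%M:%S)" \
    --query 'SpotPriceHistory[0:5].[Timestamp,AvailabilityZone,SpotPrice]' \
    --output table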

The content of this page has been provided by a third party (an AWS Partner). AWS cannot confirm whether the information presented is correct and shall bear no liability for the reliance of any customers on the information provided. AWS recommends that that the customer engages with the AWS Partner directly to clarify any information. Page 144 AWS Research Cloud Program Accelerating Science and Innovation

CFD Direct From the Cloud is charged as an hourly software fee, alongside the hourly fees for EC2. The software costs are a small fraction of the On-Demand charges for EC2 and are listed on the CFDDFC product page on AWS.
Simulation of External Aerodynamics around a Vehicle
We set up a simulation in OpenFOAM to run a steady-state, turbulent, incompressible flow simulation of external aerodynamics around a car with:
 a vehicle surface geometry, OBJ format, consisting of ∼3 m triangles, ∼30 MB gzip compressed;
 meshing configuration with snappyHexMesh to generate a mesh of ∼18 m cells;
 initial and boundary conditions, turbulence models and wall functions, and schemes and solver settings;
 configuration to decompose onto 18 domains for running in parallel on 18 physical cores using the c4.8xlarge instance;
 configuration to calculate force coefficients, and VTK files of vehicle, cutting plane and streamlines for visualization.

Charge              On-Demand Instance      Spot Instance
EC2 Instance        $1.675/hr → $13.40      ~$0.300/hr → $2.40
Data Storage        20 GB → $0.02           20 GB → $0.02
Data Transfer       10 GB → $0.90           10 GB → $0.90
Software Charges    $0.335/hr → $2.68       $0.335/hr → $2.68
TOTAL               $17.00                  $6.00

The table above shows the costs of running the simulation, priced for both On-Demand and Spot instances. A c4.8xlarge instance was used for 8 hours, running 45 minutes of mesh generation followed by 410 minutes of simulation with the simpleFoam solver, running for 2000 iterations. The case was configured to purge all except 2 time directories (with purgeWrite 2; in the controlDict file) and to gzip-compress the field data files. Field data fitted comfortably within 5 GB of storage and, with run-time processed data such as VTK files for visualization (written every 50 iterations), system software and some spare capacity, the provisioned data storage was 20 GB. Although only a fraction of that data needed to be downloaded at the end of the simulation, we allowed for 10 GB in the calculation above.
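For orientation, the general shape of such a run, together with the two controlDict settings mentioned above, is sketched below. It assumes a prepared OpenFOAM case whose system/decomposeParDict requests 18 subdomains, and that the foamDictionary utility (available in recent OpenFOAM.org releases) is on the PATH; the same two entries can simply be edited into system/controlDict by hand.

# Keep only the two most recent time directories and compress written field data
foamDictionary -entry purgeWrite -set 2 system/controlDict
foamDictionary -entry writeCompression -set on system/controlDict

# Mesh, decompose across 18 domains and run the solver in parallel on the instance
blockMesh
snappyHexMesh -overwrite
decomposePar
mpirun -np 18 simpleFoam -parallel
reconstructPar    # reassemble the decomposed results before visualization or download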


Cloud vs On-Premises Hardware
To compare these cloud costs with the cost of on-premises hardware, we can make some very approximate cost estimates and assumptions. We can start with $5,000 for a computer with a comparable specification to c4.8xlarge (2× Intel Xeon E5-2666 v3 Haswell processors, 2.9-3.4 GHz, 64 GB RAM). We add 100% overhead costs, covering system administration, electricity and rental space, and assume a 3-year lifetime of the computer, giving a total cost of $10,000. Let us assume a utilization of 8 hours per day for 220 days per year. Over 3 years, the cost of running in the cloud with On-Demand instances would be ~$11,200, comparable to on-premises costs, especially factoring in other effects such as cloud hardware upgrades during the period. The cost of using Spot instances, however, would be ~40% of the on-premises cost. On the basis of cost alone, the case for using cloud for CFD is compelling, given its suitability for running on Spot instances.
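The figures above can be reproduced with back-of-envelope arithmetic from the rates quoted earlier, assuming one 8-hour run per utilized day and the per-run storage and transfer charges from the cost table (a rough sketch, not a quotation):

# 8 hours/day x 220 days/year x 3 years of EC2 time plus the CFDDFC software fee
hours=$((8 * 220 * 3))          # 5280 compute hours
runs=$((220 * 3))               # 660 workloads, each with ~$0.92 of storage and transfer
echo "On-Demand: $(echo "$hours * (1.675 + 0.335) + $runs * 0.92" | bc) USD"   # ~11,220 USD
echo "Spot:      $(echo "$hours * (0.300 + 0.335) + $runs * 0.92" | bc) USD"   # ~3,960 USD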


Product or Company: DNAnexus (AWS Technology Partner)
Home Page: http://www.dnanexus.com/
Vendor Country of Origin: USA
Delivery method: PaaS Portal
Domains: Bioinformatics, Life Sciences, Health Care
Regional availability – All AWS Regions are sovereign. [Per-region availability table: Dublin, Frankfurt, London, Paris, Montreal, N. Virginia, Ohio, N. California, Oregon, Sao Paulo, Singapore, Tokyo, Sydney, Seoul, Mumbai, Beijing, Ningxia]

12.7 DNAnexus
DNAnexus provides a secure and compliant scientific stack and data management services, enabling collaborative research to rapidly deliver insights from high resolution 'omics, molecular, imaging, and clinical data. Genomics, proteomics, metabolomics, and other 'omics data, in combination with molecular, imaging, and phenotypic data, offer enormous potential to transform healthcare, but these are such large, complex, and sensitive datasets that gaining insights from them poses a number of immediate and significant challenges around scalability, sharing, and security. DNAnexus provides an infrastructure and toolset that is optimized for addressing these challenges for people who are pursuing high resolution data analysis approaches to health, in the clinic and in the research lab. DNAnexus works with organizations to tackle some of the most challenging and exciting opportunities in human health, by making it easier, and in many cases feasible, to work with this rich data.
12.7.1 How is it accessed?
 PaaS – the user connects to the public portal, registers for an account, and obtains services via the web interface and/or API.
 Customers create SaaS portals on top of the DNAnexus APIs (e.g. precisionFDA.gov), and their customers access DNAnexus services indirectly via the portal front-end which is built on the API.
12.7.2 User requirements
DNAnexus addresses multiple types of users. Expert users are generally bioinformaticians with experience creating genomic analysis pipelines, statistical analysis, visualization and interactive exploration. Expert users generally interact with the platform using a CLI.

Clinical users generally run prescribed pipelines through the web interface, using hardened and locked-down pipelines developed by expert users. Administrative users use the platform to provision and manage users, groups, and cost. The platform makes it easy for all types of users to perform their jobs.
12.7.3 Trial
New users who self-signup receive $50 in service credits, which can be used at any time.
12.7.4 Billing
DNAnexus bills for services in US dollars and operates as a US corporation. Individual users can procure services directly from DNAnexus by providing billing information via the web interface or contacting DNAnexus directly. Individual users are invoiced for services actually used and there is no up-front payment component. Commercial and academic groups can establish an enterprise contract with DNAnexus, which can provide volume discounts and other benefits not generally available to individual users. Enterprise contracts can include one or more of the following payment options: annual license, volume discount prepayment, per-test pricing, and pay-as-you-go based upon actual resource consumption. All commercial relationships are established directly with DNAnexus.
12.7.5 Pricing & Licenses
DNAnexus provides a variety of purchase and pricing options. Individual users pay for compute, storage, and egress services by unit of usage, similar to purchasing AWS services, without any upfront fee. DNAnexus provides two service tiers: normal priority, which may incur some delay in launching a workflow as cost-optimized resources are marshalled and is priced at a reduced rate for compute, and high priority, for when the user requires analysis pipelines to launch immediately. Given the breadth of workloads that can be deployed on the platform, and the resource consumption variability of genomic workloads, it is not particularly meaningful to present example pricing; however, mapping and alignment of a whole exome is < $10, and < $100 for a whole genome. Again, there are many variables which can impact the actual price.
12.7.6 Support
All DNAnexus users receive basic support. Support requests can be submitted via the company website or from the platform itself while logged in. Basic support is included at no extra charge. Commercial and academic groups can opt in to enterprise or premium support in order to obtain the response and resolution time commitments that their business requires. Pricing for enterprise and premium support is determined on a per-contract basis. The support tiers are described at https://www.dnanexus.com/customer-support.


12.7.7 Training and Reference Sources
Customers can request scientific services for training and onboarding from DNAnexus. These services are charged on a case-by-case basis, although DNAnexus generally provides sufficient training and consultation without charge for customers to onboard and successfully utilize the platform.
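For the command-line route mentioned under user requirements, expert users drive the platform through the open-source dx-toolkit. The session below is a sketch only, with the input files and app identifier as placeholders rather than a prescribed DNAnexus workflow.

# Authenticate and select a project (interactive prompts)
dx login

# Upload paired-end reads into the current project
dx upload sample_R1.fastq.gz sample_R2.fastq.gz

# Launch an app by its identifier and watch the job's progress
dx run app-example-mapper -y --watch

# Download results once the job finishes
dx download -r results/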


Product or Company: DRAGEN Genome Pipeline (Germline) (AWS Technology Partner)
Home Page: http://www.edicogenome.com/
Vendor Country of Origin: United States
Delivery method: AWS Marketplace
Domains: Bioinformatics, genetic analysis, whole genome sequencing, whole exome sequencing, etc.
Regional availability – All AWS Regions are sovereign. [Per-region availability table: Dublin, Frankfurt, London, Paris, Montreal, N. Virginia, Ohio, N. California, Oregon, Sao Paulo, Singapore, Tokyo, Sydney, Seoul, Mumbai, Beijing, Ningxia]

12.8 Edico Genome
The DRAGEN (Dynamic Read Analysis for Genomics) Platform is based on the highly reconfigurable DRAGEN Bio-IT Platform. The DRAGEN platform uses a field-programmable gate array (FPGA) to provide hardware-accelerated implementations of genome pipeline algorithms, such as BCL conversion, compression, mapping, alignment, sorting, duplicate marking and haplotype variant calling. The highly flexible DRAGEN platform allows Edico Genome to develop custom algorithms as well as refine and improve existing pipelines.
12.8.1 How is it accessed?
The DRAGEN Genome Pipeline is accessible through AWS Marketplace. It is a pay-per-use AMI.
12.8.2 User requirements
Users should be familiar with setting up instances on AWS Marketplace as well as setting up the relevant storage devices with those instances. Users should be familiar with working with and viewing genomic data and file formats (i.e. FASTQ, BAM, VCF, gVCF, etc.). For tips on how to use the DRAGEN Genome Pipeline and DRAGEN Exome Pipeline, User and Quick Start Guides are available through our app on AWS Marketplace.
12.8.3 Trial
A one-day free trial is available for all of our DRAGEN apps on AWS Marketplace.
12.8.4 Billing
Our DRAGEN apps on AWS Marketplace are pay-per-use.


12.8.5 Pricing & Licenses The price of our pipelines is on a pay-per-hour basis with the detailed pricing below. DRAGEN Genome Pipeline:  F1.2xlarge: $18/hr  F1.16xlarge: $50/hr DRAGEN Exome Pipeline  F1.2xlarge: $40/hr 12.8.6 Support We support questions and issues pertaining to the DRAGEN apps. All issues can be reported by entering a claim here and someone from Edico Genome will follow up with you. 12.8.7 Training and Reference Sources Training is not provided at this time, but we have a number of User and Quick Start Guides available through our app on AWS Marketplace. Edico Genome provides support for users that have specific questions regarding our apps. Additional information from DRAGEN Genome Pipeline Available Apps on AWS Marketplace DRAGEN Genome Pipeline (Germline) The DRAGEN Genome Pipeline enables ultra-rapid analysis of Next Generation Sequencing (NGS) data, reducing the time required for analyzing a whole genome at 30x coverage from ~20-30 hours using the current industry standard, BWA-MEM+GATK-HC software, to ~25 minutes (f1.16xlarge instance) and ~60 minutes (f1.2xlarge instance) while also improving accuracy for both SNPs and INDELs. This pipeline harnesses the tremendous power of the DRAGEN Platform and includes highly optimized algorithms for mapping, aligning, sorting, duplicate marking, haplotype variant calling, compression and decompression. DRAGEN Exome Pipeline (Germline) The DRAGEN Exome Pipeline enables ultra-rapid analysis of Next Generation Sequencing (NGS) data, reducing the time required for analyzing a whole exome at 200X coverage from ~4 hours using the current industry standard, BWA-MEM+GATK-HC software, to 8 minutes (f1.2xlarge instance) while also improving accuracy for both SNPs and INDELs. This pipeline harnesses the tremendous power of the DRAGEN Platform and includes highly optimized algorithms for mapping, aligning, sorting, duplicate marking, haplotype variant calling, compression and decompression.


Getting Started with DRAGEN on AWS Marketplace
Before you can use DRAGEN on the AWS instance for aligning reads and variant calling, you must first create a hash table using the reference genome of your choice. The hash table can be created using the command below:

dragen --build-hash-table true --ht-reference <reference.fasta> --output-dir <hash_table_directory>

The ‘dragen --build-hash-table’ command is multithreaded and defaults to 8 threads. You can use ‘--ht-num-threads’ with a value of up to 32. For a given reference FASTA, the hash table only needs to be built once. Once the hash table is available, DRAGEN is ready to run the end-to-end pipeline to map-align reads and perform variant calling. An example command line with the minimum parameters is provided below:

dragen -f -r <hash_table_directory> -1 <reads_1.fastq.gz> -2 <reads_2.fastq.gz> --enable-variant-caller true --vc-sample-name <RGSM> --output-directory <output_directory> --output-file-prefix <output_prefix> --enable-duplicate-marking true --enable-map-align-output true

The above command will load the reference hash table, run map-align and the variant caller, sort, dedup, and save BAM and VCF files to the specified output directory. For an exome, it is possible to provide a target BED file using the command line option ‘--vc-target-bed <target.bed>’. For more details, please refer to the DRAGEN Quick Start and User Guide available on our app on AWS Marketplace.


Product or Company: figshare (AWS Technology Partner)
Home Page: https://figshare.com/
Vendor Country of Origin: United Kingdom
Delivery method: AWS Marketplace
Domains: All research domains
Regional availability – All AWS Regions are sovereign. [Per-region availability table: Dublin, Frankfurt, London, Paris, Montreal, N. Virginia, Ohio, N. California, Oregon, Sao Paulo, Singapore, Tokyo, Sydney, Seoul, Mumbai, Beijing, Ningxia]

12.9 figshare
Figshare is a web-based platform to help academic institutions, research organizations, publishers and individual researchers to manage, disseminate and measure the attention and impact of their research outputs. The user-friendly approach allows research outputs to be made available in a citable, shareable and discoverable manner. The platform can also be used to help research organizations to manage digital research outputs associated with project-based collaborations.
The platform facilitates the visualization and exchange of diverse research outputs, from compound structures to datasets, in a wide variety of file formats. The platform can store any file type and preview over 630 file formats from within the web browser, helping the user to more quickly evaluate the relevance and usefulness of a data file. Figshare works with academic institutions and research organizations globally to help them meet key funder recommendations and mandates, and to provide world-leading tools to support an open culture of data sharing and collaboration. Figshare for institutions focuses on four key areas: research data management, reporting and statistics, research data dissemination and administrative control.
Altmetric Badges are integrated into the figshare platform so that news, policy document and social media attention to published outputs can be tracked and reported on through the Altmetric Explorer tool (https://www.altmetric.com/audience/institutions/). This is important for researchers looking for a tracking and monitoring mechanism to be able to demonstrate the broader social, health and economic impact of their research to their funders and other key stakeholders, after their outputs have been published and made openly available.
Research organizations may also use figshare to set up wholly private cloud instances restricted to use by authenticated users only. This approach supports internal collaboration around the digital research outputs that are generated from projects.


In a second important use case, figshare also provides data infrastructure and visualization services to major academic publishers, including The Royal Society, Springer Nature, PLOS, Taylor & Francis, PNAS, Wiley and the American Chemical Society. This helps these publishers store, visualize and obtain usage statistics for their supplemental data files related to journal articles and book chapters, and other miscellaneous data files supplied by their editors and authors.
12.9.1 Access
Figshare currently works at the organizational level to set up custom branded portals and repositories. These portals, under the figshare for institutions product offering, are freely accessible on the Web; however, administrators and authors have the power to control access to certain restricted content. Installation requires plugging into the institutional single sign-on system and into the storage system of choice. Currently figshare can plug into any storage system with a RESTful API. The figshare infrastructure is hosted on AWS and regionally distributed depending on the data sovereignty rules of the institution. Figshare is plugged into AWS S3 storage by default.
Figshare for groups is set to launch on AWS Marketplace in 2017, allowing smaller groups and labs to create their own secure repositories for the digital products of their research. The figshare interface ensures that all content is persistently available with citable DOIs. The impact and metrics associated with the research outputs are tracked and available to query. Figshare's open APIs allow for both upload and transfer of files and metadata to other systems to ensure funder mandate compliance.
12.9.2 User requirements
Figshare is designed to be as simple as possible for researchers to get started with, based on the premise that providing a low barrier to entry is key to getting researchers to engage in data management and data sharing. Figshare for groups will offer a self-serve setup through AWS Marketplace; technical expertise on behalf of the user will not be required to do the setup. A simple web form will launch the instance and plug into a user's S3 account. Much like other file sharing platforms, once logged into the figshare platform, users are taken to the "My data" section where they can drag and drop files into the browser to upload, add a basic number of metadata fields to describe their research, and share privately or publish on the Web. Figshare also offers a powerful open API which enables users to push and pull data, so that figshare can be easily integrated into existing information systems and research workflows. For example, the technical team at St Edward's University Library in Texas built this data showcase with interactive visualizations on weather patterns, using the figshare API: http://sites.stedwards.edu/ozone/site/u-s-data/se-texas/se-texas-2015/#3003750.
Researchers can set up projects for sharing research with collaborators, who can be members of their institution or from external organizations, with the project owner controlling who has read and write access to the content.

Like everything on figshare, projects can be kept private or made public. Research outputs can also be grouped together using collections. All published outputs are displayed in the custom branded portal of the lab or group, providing a single destination to showcase all their research outputs.
12.9.3 Trial
Figshare offers openly available free accounts at figshare.com and all content is available to view and download without the need to log in. The free accounts are limited to 20GB of private storage space per user and do not have custom branded portals for groups or labs. Groups and academic institutions can sign up for 6-month paid pilots, which include 1TB of AWS S3 storage. The figshare application can also be plugged into the institution's AWS storage if that is required by the customer.
12.9.4 Billing
Figshare bills customers directly for the software as a service. Institutional customers can pay for the storage used through their own AWS account, or use their own storage setup.
12.9.5 Pricing & Licenses
Figshare offers institutions an annual license fee for the software, based on the research intensity of the institution. Storage costs can be quoted as annual AWS S3 charges, or the institution can choose to provide their own storage, including via the AWS PAYG (pay as you go) model.
12.9.6 Support
Figshare offers a customer support portal at https://support.figshare.com/ with FAQs, videos and explainer sheets on the platform's key functionality. Support queries can be submitted via the support portal, by e-mail to [email protected] or on Twitter to @figshare. All support requests are responded to within 24 hours and usually resolved within this timeframe as well.

12.9.7 Training and Reference Sources
Figshare provides monthly "getting started" webinars for new labs, groups or institutions, which cover the basic functionality of figshare. Figshare also conducts monthly webinars on more advanced functionality, and on topics such as research data management best practices, policy and developments, which can be found on their blog: https://figshare.com/blog. In-person training is available for GBP £2,000, which includes multiple tailored researcher training sessions, advanced API training and support staff training. The same training is also available remotely at GBP £500 per session, covering getting started, advanced functionality and new developments in data policy.
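As an illustration of the open API mentioned under user requirements, the calls below talk to the public v2 endpoint with curl; the personal token is a placeholder, and the minimal metadata fields follow the public figshare API documentation rather than anything specific to this handbook.

# List the items in your own figshare account (placeholder personal token)
curl -s -H "Authorization: token YOUR_PERSONAL_TOKEN" \
     https://api.figshare.com/v2/account/articles

# Create a new private item with minimal metadata
curl -s -X POST \
     -H "Authorization: token YOUR_PERSONAL_TOKEN" \
     -H "Content-Type: application/json" \
     -d '{"title": "Example dataset", "description": "Created via the figshare API"}' \
     https://api.figshare.com/v2/account/articles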


Additional Information from Figshare
How does figshare fit into the broader context of Open Science?
Figshare was born out of the open research movement, with the mission to make all academic data as open as possible. The company heavily promotes the ethos that underlying data must also be published in order to validate and reproduce results. With a rich corpus of openly available data, new avenues of inquiry can be unlocked by combining datasets or approaching them from a different angle or discipline. There is also a belief that if research is funded by the public purse then all products of the funding should be made openly available. Not limiting research outcomes to summary research papers increases the return on investment from all research funding.
Not only does figshare believe in open data, but the company also believes in open interoperability with other academic research information systems. The figshare API is open for any researcher or programmer to build upon, thus not limiting the scope of functionality to what figshare deems necessary. The automated flow of information, with minimal administrative burden on the researcher, is paramount to the figshare mission: the less time researchers spend keying and rekeying data, the more time can be spent on research. Ultimately, figshare believes that opening up data and research in general will increase the pace and breadth of discovery. Through the work of figshare with its partner institutions and organizations, over 3 million research objects are now openly available for discovery and reuse by the research community.


How does it work?
Researchers can create an account for free, with 20 gigabytes of private storage space and unlimited public storage space: as long as data is made openly available, figshare will pay the cost of hosting. The simple interface allows users to drag and drop files into the browser, add some contextual metadata to increase discoverability, share privately with collaborators and/or publish. The process of uploading and making data publicly available takes minutes, in contrast to some of the more onerous academic systems, which have a higher barrier to entry for end-users.
Figshare for institutions uses a similar front end to maintain the simple user experience for researchers, but introduces a number of features to give institutions greater visibility and control. With enhanced metadata options, curation workflow and integration with single sign-on, storage and other university systems, the institutional product is more suitable for enterprise-scale research.
Figshare for publishers integrates with existing publisher platforms which, depending on the data publication policy, encourage researchers to publish the underlying data to back up their findings. The seamless workflow allows data to be displayed in the browser, alongside the traditional research article, allowing for a richer storytelling experience. All data from publishers is also available through a custom branded, searchable portal and incorporated into the wider figshare corpus.


Product or Company: BaseSpace® Sequence Hub (AWS Technology Partner)
Vendor: Illumina, Inc.
Home Page: https://www.basespace.illumina.com/
Vendor Country of Origin: United States
Delivery method: SaaS Portal
Domains: Bioinformatics, Genomics, Transcriptomics, Microbiology, Oncology, Proteomics
Regional availability – All AWS Regions are sovereign. [Per-region availability table: Dublin, Frankfurt, London, Paris, Montreal, N. Virginia, Ohio, N. California, Oregon, Sao Paulo, Singapore, Tokyo, Sydney, Seoul, Mumbai, Beijing, Ningxia]

12.10 Illumina, Inc. – BaseSpace® Sequence Hub
Next-generation sequencing (NGS) has revolutionized the way and rate at which biomedical research is conducted. As the cost of sequencing decreases, the volume of NGS-generated data increases, presenting researchers with progress-hindering problems. Secure data storage and management, complex data analysis, and sharing results with collaborators are challenges that can result in nonuniform methods within institutions and labs, conflicting results, and increased operational overhead.
BaseSpace Sequence Hub is a genomics cloud computing platform, designed to bring simplified data management and analytical sequencing tools directly to investigators in a user-friendly format. BaseSpace Sequence Hub provides flexibility and convenience with an array of bioinformatics tools, significantly expanding the possibilities of yielding meaningful results from NGS data. Labs pursuing next-generation sequencing traditionally required the services of a highly trained bioinformatician and dedicated infrastructure to perform data management, analysis, and storage. BaseSpace Sequence Hub helps automate bioinformatic analysis using cloud-based software applications, and scalable storage grows with your research needs. Designed with the biologist in mind, BaseSpace Sequence Hub push-button bioinformatics applications are simple to use and produce biologically relevant results from raw data. BaseSpace Sequence Hub output files are industry standards that use open formats. These results can be imported into downstream scientific software tools for further analysis. All necessary operations occur in one place.
12.10.1 How is it accessed?
SaaS Portal – users connect to the BaseSpace Sequence Hub public portal and register for an account. Enterprise-tier users have access to their own private BaseSpace Sequence Hub domain.


12.10.2 User requirements
No IT skill is needed to operate the BaseSpace Sequence Hub product: being comfortable with a web browser is enough for basic operation. Of course, science skills in next-generation sequencing and NGS bioinformatics analysis are necessary, but a user of BaseSpace Sequence Hub does not have to program anything or learn how to code. This opens up NGS analysis to molecular biologists who do not have specific IT training. That said, advanced users who possess IT skills can do a lot more with BaseSpace Sequence Hub than basic users who do not have such skills; for example, they can create their own apps to run on BaseSpace Sequence Hub as a platform, or use command-line access to run the Sequence Hub without its web GUI (a sketch of this route is given at the end of this section).
12.10.3 Trial
BaseSpace Sequence Hub is free to sign up for and use. New users receive a free 30-day trial in which they have access to all features and analysis applications, as well as 250 iCredits to pay for computation. After the iCredits are used or the 30-day trial ends, they are still able to perform many functions, including sequence run upload, run monitoring, data sharing and transferring, download, and launching a limited number of free apps. Basic users are limited to 1TB of storage.
12.10.4 Billing
Users who want to upgrade to a Professional or Enterprise account may contact Illumina directly. The billing relationship is between the end user (institution) and Illumina, and occurs in the local currency. BaseSpace Sequence Hub offers a pay-as-you-go feature for the use of paid Apps and storage, allowing customers to only use and pay for what they need. All BaseSpace Sequence Hub accounts come with 1 TB of free storage and access to a limited number of free Apps. Subscription accounts can, optionally, purchase storage subscriptions and use the pay-as-you-go feature for paid Apps and any additional storage beyond the baseline storage, which is either the free 1TB or the storage subscription. See the table below for a summary of the billing features.

                               Basic     Professional      Enterprise
Included Storage               1TB       1TB               1TB
Free Apps                      Yes       Yes               Yes
Purchase Storage Subscription  No        Yes               Yes
Additional Storage             No        Pay as you go     Pay as you go
Paid Apps                      No        Pay as you go     Pay as you go


12.10.5 Pricing & Licenses
There are three tiers of BaseSpace Sequence Hub. The Basic tier is free for all users; the Professional and Enterprise tiers are annual subscriptions.
12.10.6 Support
BaseSpace Sequence Hub users are fully supported by Illumina Technical Support and a global network of application specialists, who can be reached via Illumina Technical Support ([email protected]) or through the user's account manager.
12.10.7 Training and Reference Sources
Help and reference documentation is posted to the BaseSpace Sequence Hub Help site. In-depth training and consultations are provided by Illumina Professional Services. Professional and Enterprise subscriptions include 8 and 24 total hours, respectively, of annual Bioinformatics Professional Services support.


Product or Technology: Intel® Cloud Edition for Lustre*
Company Home Page: http://www.intel.com/content/www/us/en/lustre/intel-cloud-edition-for-lustre-software.html
Partner: Technology Partner
Vendor Country of Origin: USA
Delivery method: AWS Marketplace
Domains: Financial Services, Manufacturing, Health and Life Sciences, Weather Forecasting, Oil & Gas, Government Research
Regional availability – All AWS Regions are sovereign.

12.11 Intel Cloud Edition for Lustre
Lustre* is an open source parallel file system for high performance and technical computing. It is used in 70% of the Top 100 supercomputers worldwide1 and is best known for its speed and scale. Intel is the lead contributor to the Lustre community and is widely considered the thought leader in the space. In addition to the contributions Intel makes to the Lustre community, Intel also builds packaged and supported versions of the Lustre file system. This includes the Intel® Cloud Edition for Lustre* software, available on Amazon Web Services. The Intel Cloud Edition for Lustre software enables data scientists to run their HPC applications in the public cloud by providing the performance required by I/O-thirsty applications.
12.11.1 How is it accessed?
Marketplace Launch.
12.11.2 User requirements
Users should have a basic understanding of Linux, as well as a high-level understanding of parallel file systems. Intel offers free, self-paced Lustre training at http://www.intel.com/lustre-training. Users should also have a basic understanding of Amazon Machine Images.
12.11.3 Trial
Intel offers an evaluation version of Intel® Cloud Edition for Lustre* software, which is intended to help users become familiar with how the file system works and how it is created within the AWS Marketplace. Performance of the evaluation version is limited, as it is not intended for production workloads. A link to the evaluation version is here:

https://aws.amazon.com/marketplace/pp/B01344V0C0?qid=1478275114917&sr=0-4&ref_=srh_res_product_title
12.11.4 Billing
Transacting Intel® Cloud Edition for Lustre* software is easy: it is transacted directly through the AWS Marketplace.
12.11.5 Pricing & Licenses
Pricing is based on the number of Object Storage Servers and the instance type in a cluster. Support is available for a fixed monthly fee. A full list of options, as well as pricing for each instance type, can be found here: https://aws.amazon.com/marketplace/seller-profile?id=d1c6e336-5f6f-4234-82a8-a57463081a35
12.11.6 Support
Users who purchase support are entitled to unlimited file system support via email or phone. Intel also offers paid professional services to set up and configure a file system. Details on the support can be found here: https://aws.amazon.com/marketplace/pp/B01MAWHQTD?qid=1478275114917&sr=0-3&ref_=srh_res_product_title. To discuss paid professional services, please email [email protected].
12.11.7 Training and Reference Sources
Intel offers free, web-based, self-led Lustre training at http://www.intel.com/lustre-training. Live onsite or remote training can also be purchased. To discuss paid live training, please email [email protected].
1 Based on Intel analysis of the November 2015 Top500: www.top500.org
* Other names and brands may be claimed as the property of others.


Product or Technology: MathWorks
Company Home Page: http://www.mathworks.com/
Partner: Technology Partner
Vendor Country of Origin: USA
Delivery method: Multiple
Domains: Broad range of science, engineering, mathematics and statistics.
Regional availability – All AWS Regions are sovereign.

12.12 MathWorks has Multiple Offerings for AWS
MathWorks is the leading developer of mathematical computing software. MATLAB, the language of technical computing, is a programming environment for algorithm development, data analysis, visualization, and numeric computation. Universities and learning institutions worldwide use it as a fundamental teaching and research tool. When you use MATLAB on AWS you can take advantage of specific hardware configurations or use resources (like GPUs, large memory, and additional compute cores) to accelerate performance. Additional MathWorks offerings leverage the elastic nature of AWS to enable parallel computing across multiple instances and scalable application deployment. These MathWorks products are available in AWS to end users with an appropriate license:
• MATLAB, Simulink and other add-on products
• MATLAB Distributed Computing Server
• MATLAB Production Server
• MATLAB Runtime
12.12.1 How is it accessed?
You can provision AWS resources and install MATLAB, MATLAB Production Server and MATLAB Runtime directly. Launching these products on Amazon EC2 involves a series of steps, the majority of which are one-time setup procedures. Thereafter, you can access them on your EC2 instance within a matter of minutes. MATLAB and MATLAB Runtime require you to start an EC2 instance and install the software. For MATLAB Production Server, contact MathWorks at [email protected] for help with configuration. You can also contact MathWorks if you need help with the other products.
For MATLAB Distributed Computing Server, MathWorks Cloud Center is the recommended approach to starting clusters on EC2. There are AMIs preconfigured for multiple releases of MathWorks products, which are accessed via Cloud Center. To get started with Cloud Center, visit this page first. Full documentation is available here.
12.12.2 User requirements
Getting up and running with MATLAB and MATLAB Runtime on AWS requires the ability to start an EC2 instance and install your software. Cloud Center simplifies the use of MATLAB Distributed Computing Server on EC2 so that you can access AWS resources from your client MATLAB® session like any other cluster in your own onsite network (a minimal usage sketch appears after section 12.12.5 below). Setting up MATLAB Production Server is a bit more advanced, but a MathWorks engineer is available to help you through the configuration. You can contact MathWorks for help configuring any product on AWS.
12.12.3 Trial
Instructions for obtaining a trial are linked below:
• MATLAB and other add-on products: Go here.
• MATLAB Distributed Computing Server: Go here.
• MATLAB Production Server: Contact MathWorks.
• MATLAB Runtime: There are no trials because it is available for free.
12.12.4 Billing
Usage of MATLAB, MATLAB Distributed Computing Server, and MATLAB Production Server on AWS requires a direct relationship with MathWorks. If you already have a license or trial, there is no fee for using these products in AWS. Otherwise you can purchase a term or perpetual license to these offerings. In addition, MATLAB Distributed Computing Server is available with on-demand pricing. MATLAB Runtime is available for free and may be used with any compiled MATLAB application on AWS in accordance with the MathWorks and MATLAB Runtime SLA. MathWorks products can be procured using various currencies and from many jurisdictions. You will also need to pay AWS directly for use of EC2 instances and any other AWS services.
12.12.5 Pricing & Licenses
In general, all MathWorks products are available for use on AWS. However, products that require special drivers or specific hardware may not work. Go here for pricing information and to purchase MathWorks products. For on-demand pricing information associated with MATLAB Distributed Computing Server, go here. Please note that licensing in the cloud can be done in multiple ways. For MATLAB, we recommend use of Login Named User. For MATLAB Distributed Computing Server, we recommend use of the MathWorks Hosted License Manager as described here. For MATLAB Production Server or to use FlexNet licensing, please contact MathWorks.
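As referenced in 12.12.2 above, the following is a minimal sketch of what using a Cloud Center cluster from a client MATLAB session can look like. The profile name 'MyCloudCenterCluster' and the function monteCarloStep are placeholders rather than names defined by MathWorks; parcluster, parpool, parfor and batch are Parallel Computing Toolbox functions.

    % Look up the Cloud Center cluster by its profile name (placeholder).
    c = parcluster('MyCloudCenterCluster');

    % Option 1: open a parallel pool on the EC2 workers and use parfor.
    pool = parpool(c, 16);                 % request 16 workers
    results = zeros(1, 1000);
    parfor k = 1:1000
        results(k) = monteCarloStep(k);    % placeholder user function
    end
    delete(pool);

    % Option 2: submit a batch job and collect the output later.
    job = batch(c, @monteCarloStep, 1, {42});
    wait(job);
    out = fetchOutputs(job);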


12.12.6 Support
Visit the MathWorks support page for support resources. Access to Technical Support engineers is available if your license is active on MathWorks Software Maintenance Service.
12.12.7 Training and Reference Sources
Visit MATLAB Academy for free online MATLAB training. For other training resources, visit this page.
Additional Information from MathWorks
Getting Started with MATLAB on AWS
Launching MATLAB on AWS involves a series of steps, the majority of which are one-time setup procedures. Thereafter, you will be able to access MATLAB on your EC2 instances within a matter of minutes. To use MATLAB on AWS, you will need the following:
1. A MATLAB license that has the Login Named User feature enabled. Your MathWorks Account must be associated with this license.
2. An AWS account. For billing purposes, you need to provide credit card information to Amazon when you create your account.
3. An NX Client installed on your computer. The NX Client is the required remote display software for accessing MATLAB on the EC2 instance.
An overview of the process involved is shown below:
Step 1: (One-time) Configure MathWorks Account
a. Create a MathWorks Account. Go here.
b. Associate your license to your MathWorks Account. Go here.
c. Enable Login Named User for your license. Go here.

Step 2: (One-time) Configure AWS Environment
a. Create an AWS account and create a key pair.

Step 3: Launch EC2 Instance
a. Create and configure an AWS CloudFormation stack.

Step 4: (Once per machine) Set up Instance Access
a. Install and configure the NX client.

Step 5: Start MATLAB
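For the MATLAB Runtime route mentioned in sections 12.12 and 12.12.4, note that the Runtime only executes applications that have already been compiled with MATLAB Compiler. A minimal sketch, assuming a placeholder entry-point function mysolver.m on your development machine:

    % Compile a standalone application from your own entry-point function
    % (mysolver.m is a placeholder); requires MATLAB Compiler.
    mcc -m mysolver.m
    % Copy the generated executable to the EC2 instance and run it there
    % against the MATLAB Runtime of the matching release.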


Product or Technology: Overleaf
Company Home Page: http://www.overleaf.com/
Partner: Technology Partner
Vendor Country of Origin: United Kingdom
Delivery method: SaaS Portal
Domains: All areas of scholarly writing, editing and publishing, especially Physics, Astronomy, Mathematics, Statistics, Computer Science, Machine Learning, Geography, Geospatial, Graphics, Bioinformatics, Chemistry.
Regional availability – All AWS Regions are sovereign.

12.13 Overleaf
Science is a global enterprise, and over two million scientific papers are written every year in collaborations that span the globe. The Overleaf platform brings state-of-the-art, cloud-based collaborative authoring technology to the world of science, to make it faster and more open. Our solutions are designed to make the whole process of writing, editing and publishing scientific papers much quicker and easier for all parties involved.
Overleaf enterprise solutions simplify and accelerate the scientific publishing process by keeping the paper in a single central place through its entire lifecycle. With Overleaf, the paper is stored securely in the cloud, so authors, editors, reviewers and readers can each read, edit or comment on the paper when it is their turn, using only a web browser. For our organizational partners, this means that they are able to provide an innovative and collaborative solution to their users – proactively providing them with new technology tools to help them succeed.
12.13.1 How is it accessed?
You can get started for free at www.overleaf.com. Fully managed and self-hosted private cloud installations are available for institutional and enterprise customers.
12.13.2 User requirements
Overleaf's main overleaf.com service and managed private cloud services are turn-key solutions that require no user setup. If you opt for a self-hosted private cloud option, we can provide you with a set of AMIs and CloudFormation templates, and we will work with you to determine the best way to get you up and running in your Amazon VPC.


12.13.3 Free Trial
Overleaf provides free access to our authoring platform at https://www.overleaf.com/signup. Try it out! Write using your own documents or choose from over 3,000 authoring templates; collaborate with other authors and colleagues via our shareable links or through our protected project option; integrate with other author services, such as reference management tools and graphics services; and share your finished document with colleagues or via our multiple submission links into repositories and publishers. If you would like to further test using group or team accounts, just let us know and we can set up a trial. If you're interested in a private cloud option, you can easily start with a lower-cost 'starter' package. We're happy to provide more information for your specific requirements – just reach out.
12.13.4 Billing
Billing is through a direct relationship with the vendor. Overleaf can provide quotes and billing in multiple currencies – USD, GBP, EUR and AUD.
12.13.5 Pricing & Licenses
You can find the latest pricing for accounts on overleaf.com at www.overleaf.com/plans. Accounts start at $8/month, and a 30% discount is available for group licenses. Managed Private Cloud licenses start at $2,500/month and are tiered based on soft limits for monthly active users and total storage space required. Please contact us for a quote, and please also contact us if you are interested in a self-hosted Private Cloud license.
12.13.6 Support
Overleaf provides expert LaTeX support for all customers on overleaf.com and private clouds (subject to your access control requirements). We also provide optional additional support and maintenance contracts for managed and self-hosted private clouds. The pricing structure for the support contract depends on the additional levels of user and technical support required. Data and analytics dashboards are available to administrators to easily find and export usage and collaboration trends for their users and authors.
12.13.7 Training and Reference Sources
Overleaf provides a robust help database, a large amount of training material, and both written and video tutorials, free of charge, at overleaf.com/help. This includes extensive information about the Overleaf platform, templates, and LaTeX itself. For users new to LaTeX, Overleaf provides a free interactive introduction to LaTeX course.


Additional Information from Overleaf
Overview
Overleaf is a scientific authorship tool that allows seamless collaboration and effortless manuscript submission, all underpinned by cloud technology. By providing an intuitive online collaborative writing and publishing platform, Overleaf is making the process of writing, editing and publishing scientific documents quicker and easier. It brings the scientific documentation process into a Google Docs-like environment, developed to seamlessly connect the academic and publishing workflows from writing to review to publication. By investing in Overleaf, your organization will be able to:
• Allow your users to benefit from our online, collaborative scientific writing and publishing platform with real-time preview.
• Ensure authors follow the correct format and guidelines by using custom templates specific to your organization.
• Support author collaboration in your community by providing an innovative writing tool that simplifies their writing process, expands and facilitates their ability to collaborate – both online and offline – and streamlines their workflow.
• Ease user collaboration by having both Rich Text and LaTeX writing modes available.
• Ensure the authoring, reviewing and editing experience is smooth, simple and painless, by having a trusted tool that will support your aims and goals.
• Gain insight into collaboration at your organization with Overleaf's analytics dashboard, which shows usage and collaboration trends for your authors.
• Outsource LaTeX support by having the Overleaf 'TeX'perts handle all LaTeX and technical questions directly (subject to your access control).
Overleaf is:
• User friendly: Overleaf offers authors an easy way to write and collaborate on their scientific documents through a user-friendly interface that automatically typesets the paper, in real time in the browser. Since 2012, more than 500,000 registered users from 180 countries worldwide have created in excess of 6 million documents using the Overleaf platform.
• Effective: projects are compiled automatically, with the source and output visible side by side. With real-time preview and error handling, Overleaf helps authors catch errors early, to create a smooth writing and submission process.
• Efficient: with Overleaf, you can say goodbye to long email chains, tedious reformatting and slow, costly manual conversions between storage formats.
• Innovative: Overleaf projects can be edited offline and kept in sync through our integration of Git version control.
• Secure: with Overleaf, industry-standard data security and persistence are provided and data are backed up continuously.


• Forward thinking: developing authoring and collaboration tools for future needs and applications.
The founders of Overleaf, John Hammersley and John Lees-Miller, are two mathematicians who worked together on the pioneering Ultra PRT Project (http://www.ultraglobalprt.com/) and who were inspired by their own experiences in academia to create a better solution for collaborative scientific writing. Passionate about science and making a difference in the world, John and John realized just how much time and energy was spent not only writing and reviewing papers but also on the vast amount of administrative and paperwork tasks associated with publishing. They set out to find a way to help save time on these tasks, with the goal of making the whole process of academic collaboration easier and more effective.

“A couple of days ago I needed to whip up a brief (5-6 page) technical report for a client. I knew it was going to be laden with equations, so LaTeX was my only choice. No problem… I just fired up Overleaf and cranked it out.

Thanks to Overleaf for putting together such a great service.” - Michael Grant, CVX Research

Overleaf Enterprise – Managed Private Cloud
Overleaf Enterprise is a private cloud option that allows organizations to have their own private servers while using the Overleaf platform. Overleaf will host and manage the servers, ensuring that they stay up to date with the latest Overleaf updates. A wide array of customizations is available on an Overleaf private cloud – Overleaf will work with you to determine what is required and the necessary steps to develop the system to fit those requirements. Overleaf Enterprise – Managed Private Cloud includes:
• Editor Branding: your logo will be added to the platform and editing page.
• Private Domain and SSL: private cloud set up on your domain, fully protected by SSL.
• Secure project access control: only users authorized by the project owner can access the project.
• Effective collaboration: the project owner can add and remove users at any time, giving them access to 'read-only' or 'read-and-write' files.
• Secure storage and backup: for private cloud installations, industry-standard data security and persistence are provided. Data are backed up off-site daily.
• Version History: full history of changes recorded for every project. History includes details of which user made the changes.
• Record Milestones: specific versions can be saved manually, with a note to describe the milestone (e.g. first draft pre-review).


• Company Templates: your template files are uploaded, and new projects can be created from your templates with a single click. We can help you set up company templates that include any necessary company information. When users start a new document, they can start from these company templates, ensuring that the necessary company information is included.
• Redirect from main Overleaf site: ensure that your staff who go to the main Overleaf.com platform are aware that they can use your private cloud.
• Real-time preview and error handling: projects are compiled automatically, with the source and output visible side by side. Overleaf shows you errors & warnings as you go, so you can catch them early; you don't have to find them in the LaTeX log.
• Server maintenance and updates: Overleaf will ensure that your private cloud server is running the latest Overleaf updates and that those updates do not interfere with any customizations added to your service.
• Admin Dashboard: Overleaf provides an administrative dashboard with metrics and analytics on platform use, users, and projects.
Overleaf Enterprise – Self-Hosted Private Cloud
Overleaf Enterprise's Self-Hosted Private Cloud option includes all of the same functionality as our managed private cloud, but you can install it on your own VPC or data center. We'll work with you to package Overleaf to meet the requirements of your operations staff. Training and documentation will be provided to allow them to scale your Overleaf installation to meet your needs and ensure that your security, persistence and compliance requirements are met.


Product or Technology: RONIN
Company Home Page: https://ronin.cloud/
Partner: Technology Partner
Vendor Country of Origin: Australia
Delivery method: Supported Platform as a Service
Domains: All research domains.
Regional availability – All AWS Regions are sovereign.

12.14 RONIN
We developed the RONIN platform to lower the barrier to entry for scientific and cloud computing by simplifying the way cloud resources are procured, configured and consumed. By simplifying complex IT services for the research community, RONIN empowers researchers and research organisations to get on with the task at hand: research.

By developing an intuitive and simplified user interface, deeply integrated with a selected set of industry-leading tools, we offer a powerful new method for consuming and managing cloud services for research organisations of any scale. RONIN puts your projects first. Compute and storage are only enablers for your next big idea, so we put the focus back on the business of research.
12.14.1 How is it accessed?
RONIN is an enterprise-level toolset deployed within your organisation's AWS Virtual Private Cloud. Researchers access the platform from their web browser using their organisational credentials. RONIN provides a simple, user-friendly interface to a number of AWS cloud services and third-party collaboration tools.


The RONIN platform also provides the ability to manage access for thousands of users within a single AWS account through integration with your organisation's Active Directory.

Platform features: self-service, storage management, machine scheduling, machine monitoring, task management, wiki & project, budget monitoring, community driven.

12.14.2 User requirements
Our goal was to give researchers all of the power available within the AWS platform without the usual entry requirement of becoming a part-time sysops/devops expert to get started. Once installed, the RONIN interface affords your researchers the ability to instantiate compute and storage, as well as enterprise-strength project management and collaboration tools, at the click of a button. Researchers are presented with a simple web user interface where they can manage their own projects. RONIN makes cloud easy.

12.14.3 Trial
AWS enterprise-level customers can request a free trial of the RONIN platform. There are a number of prerequisites, which will be discussed with you prior to the installation of the trial platform. Functionality is not limited during the trial, and scaling to a production licence is relatively straightforward. The RONIN team can integrate RONIN with your AWS account within a few days; installation, integration and testing are included in the RONIN licensing model. Whilst the RONIN team takes care of the installation, there are some pre-install requirements that must be met by the organisation.


12.14.4 Billing
RONIN bills customers directly for the software. RONIN licensing is in addition to your AWS consumption charges. We do not add a click-charge to your resource consumption, so you are not penalised for being successful in adopting the cloud. RONIN automatically tags compute and storage elements within your AWS account, providing greater transparency and accountability.

12.14.5 Pricing & Licenses
RONIN offers institutions an annual license fee for the software, based on the number of researchers accessing it. Licensing is fixed to the number of active users you have, and does not scale with usage of AWS resources. We want you to use RONIN to scale up your research without the fear of scaling up license costs. In other words, you will not be penalized for successful use of our platform.
12.14.6 Support
RONIN licensing includes our enterprise-level support package. RONIN offers a customer support portal with FAQs, videos and explainer sheets on the software's capabilities. All support requests are responded to within 24 hours and are usually resolved within this timeframe as well. Platinum support packages are also available, which include on-site support and additional support hours. RONIN comes with collaboration and support interfaces so that researchers can learn to support themselves and each other.
12.14.7 Training and Reference Sources
RONIN provides "getting started" webinars for new customers. Regular webinars are conducted when new features are added, or on request. On-site training can be purchased directly from RONIN.


Additional information from RONIN
RONIN is our delivery of an incredibly simple user interface that facilitates the instantiation of complex compute resources leveraging the AWS cloud-compute platform. Of course, with such grand pursuits, it's critical we have the tools and controls in place to ensure we're not creating more problems than we solve. This is why RONIN has enterprise-grade management tools and best practices baked into all services and resources it makes available. We give organisations the ability to set and enforce both security and user policies across multiple cloud resources from a central console. The RONIN platform also deploys detailed reporting, business intelligence and management information capabilities that go well beyond the tools provided by most cloud service providers. Through this information we give organisations the ability to derive otherwise invisible insights from their organisation and research disciplines to better optimise performance and costs.
Our previous experience showed that the stresses and concerns of scientists and researchers centered around technical hurdles and assumed knowledge of the tools available. This reality costs scientists and researchers serious time and money. Not only do they have the responsibility of expanding our understanding of the physical and natural world, they are often thrust into roles such as system administrator, network engineer and storage manager in order to enter the world of cloud computing and e-research.
Deploy your own HPC in minutes
Through our integration of Alces Flight, RONIN provides easy access to a scalable High Performance Computing (HPC) environment for research and scientific computing. This includes over 1,000 Linux applications, accelerated libraries and compilers.
Need deep learning? We have you covered
By adding the AWS Deep Learning AMI to the RONIN suite, researchers can launch feature-rich deep learning environments without a single install script or line of code. Included in this package are popular deep learning frameworks, including MXNet, Caffe, Caffe2, TensorFlow, Theano, CNTK, Torch, Keras and more. It also includes the Anaconda Data Science Platform for Python 2 and Python 3.


No Nerds Needed
Technical hurdles. Assumed knowledge. These are the stresses faced by your researchers. RONIN removes the stress and enables access to vast amounts of compute and storage in seconds, not months. No IT degree required. No helpdesk calls. No more late nights reading about ports and mounts.
Never sleeps, never forgets
Are you human? Us too! One of the real drawbacks to being a human is that we have the capacity to forget things, like turning things off when we don't need them. The RONIN Smart Schedule remembers for you, so your budget doesn't run away while you aren't looking.
Cost Management and Visibility
When moving to the cloud, we understand that cost management becomes... well... cloudy. We felt this was a problem worth meeting head on. With a single click, you can see a detailed breakdown of cost for individual machines or across your entire project.
Collaborate With Ease
As part of all RONIN installations, we include an integrated Atlassian™ suite of tools. With the Atlassian™ suite at your fingertips you can manage a project's documentation, run sprints, manage tasks and build community support directly in the RONIN interface.



Product or Technology: Seven Bridges
Company Home Page: http://www.sevenbridges.com/
Partner: Technology Partner
Vendor Country of Origin: UK, Serbia, USA
Delivery method: SaaS Portal
Domains: Genomic discovery, next-generation sequencing data analysis and informatics.
Regional availability – All AWS Regions are sovereign.

12.15 Seven Bridges
Seven Bridges is the biomedical data analysis company accelerating breakthroughs in genomics research for cancer, drug development and precision medicine. The scalable, cloud-based Seven Bridges Platform empowers rapid, collaborative analysis of millions of genomes in concert with other forms of biomedical data. Thousands of researchers in government, biotech, pharmaceutical and academic labs use Seven Bridges, including three of the largest genomics projects in the world: the U.S. National Cancer Institute's Cancer Genomics Cloud pilot, the Million Veteran Program, and Genomics England's 100,000 Genomes Project. As the NIH's only commercial Trusted Partner, Seven Bridges authenticates and authorizes access to one of the world's largest cancer genomics datasets. Named one of the world's smartest companies by MIT Technology Review, Seven Bridges has the majority of its staff in Europe and offices in Cambridge, Mass.; Belgrade; London; Istanbul; and San Francisco.
12.15.1 How is it accessed?
The Seven Bridges platform is accessed as Software-as-a-Service at www.sevenbridges.com, where users can create a free account, access the world's largest genomic datasets, use the most popular bioinformatic tools (or add their own tools), and run analyses in real time on seamless computational infrastructure at their fingertips.
12.15.2 User requirements
The Seven Bridges platform is intuitive and easy to use. Users bring scientific curiosity – the biological questions driven by next-generation sequencing (NGS) – and Seven Bridges provides best-in-class bioinformatic pipelines and industrial scalability. Each new account comes equipped with starter guides and tutorials as well as 24/7 support services from Seven Bridges. Users only need an internet connection to access the platform.


12.15.3 Trial
Please contact Seven Bridges for a free trial at [email protected].
12.15.4 Billing
Billing is through a direct relationship with Seven Bridges.
12.15.5 Pricing & Licenses
Seven Bridges provides value in industrial-scale biomedical analyses. The platform is available as a Software-as-a-Service package including, for enterprise-level customers, a dedicated scientific project manager and genomics scientist to help obtain the most value from NGS data. The pricing tiers are tailored to user requirements, ranging from small laboratory research to large national and multinational projects with millions of genomes.
12.15.6 Support
Seven Bridges provides 24/7 online support for the platform. The support features are included in the licensing price tiers. Additional professional service hours can also be obtained at enterprise tiers for specific feature development and support in use.
12.15.7 Training and Reference Sources
Seven Bridges provides onboarding training and continuous updates to enterprise-tier customers. Training can be delivered in person (e.g. at major R&D hubs) or online (e.g. for distributed R&D collaborators or Horizon 2020 consortia spread across Europe).
Additional Information from Seven Bridges
INDUSTRIAL-SCALE BIOMEDICAL DATA ANALYSIS
Seven Bridges is the biomedical data analysis company specialising in national-scale genomics projects. We do so by empowering researchers with the tools for large-scale collaborative data analysis. Seven Bridges combines expertise in computing, bioinformatics, and population-scale genomics to enable faster discoveries and help support precision medicine.

Figure 44 - Seven Bridges builds the functional graph-based suite of genomic tools. Graph population references contain the genetic variation of an entire population, and become more accurate with each new individual added.


The Seven Bridges Platform ("Platform") is a Platform-as-a-Service (PaaS) software system that enables scalable and secure bioinformatics research on public clouds and on-premise compute infrastructures. It acts as a central hub for teams to store, analyze and jointly interpret bioinformatics data. The Platform co-locates analytical tools and pipelines with petabyte-scale genomics datasets and manages compute resources on demand to meet the diverse needs of the research community flexibly and efficiently. It also supports perfect reproducibility by way of the Common Workflow Language.

In addition, the Platform is home to the most advanced population genomics tools in the world: the graph genome suite. Graph-based genomic data structures represent a fundamental re-thinking of what population genetics – understanding what the variations between people are and what impact they have on health – means. They do so because they learn from every new person sequenced. Moreover, graphs contain information on an entire population instead of just one person. And they allow analysis at previously impossible scales by offering significantly more compression than traditional, linear bioinformatics.
The combination of the Seven Bridges Platform and graph technology is powering the largest national projects in the world today. These include:
• Genomics England's 100,000 Genomes Project
• U.S. National Cancer Institute's Cancer Genomics Cloud
• U.S. Department of Veterans Affairs' Million Veteran Program
• Children's Hospital of Philadelphia's Cavatica, a key part of President Obama's Precision Medicine Initiative.

GRAPH GENOME SUITE
Seven Bridges is the biomedical data analysis company accelerating breakthroughs in genomics research for cancer, drug development and precision medicine. We build self-improving systems to analyze millions of genomes, including the Graph Genome Suite – the most advanced population genomics tools in the world. The Graph Genome Suite provides a clear value: better variant calling driving better discovery.


A Powerful Genomic Data Structure
The Graph Genome Suite is powered by a directed acyclic graph-based data structure which represents a fundamental rethinking of the genomic variations that impact health. Unlike standard linear references, this structure makes use of information from an entire population to characterize genetic variants with unprecedented accuracy. Our tools learn from every new person sequenced, meaning that the graph-based reference improves with each additional genome. Better still, this improvement happens with only minimal increases in file size, allowing analysis at a previously impossible scale.

Figure 45 - Building a population graph. Unlike a standard linear reference genome, our tools construct reference graphs based on the genomes of a whole population.
Better variant calling for better discovery
The Graph Genome Suite has been developed in conjunction with Genomics England to support the UK Government's pioneering 100,000 Genomes Project. It enables the most accurate variant calling in the world today, across all classes of variation, including large structural variants that are difficult to identify using other aligners. These highly accurate variants enable the best possible basic and clinical genomics research in humans and other organisms, including personalized cancer analyses, family trio studies, and sublineage mapping of important viruses.


Figure 46 - Tracing an individual on a population graph. The genome of any individual can be represented as a path through the graph, which allows the identification of variants not captured by a linear reference, based on comparison against all the variants.
Driving innovative genomics research
Seven Bridges granted early access to the Graph Genome Suite to three of the world's leading research institutions: the US National Cancer Institute's Cancer Genomics Research Laboratory, Canada's Michael Smith Genome Sciences Center, and the Sidra Medical and Research Center of Qatar. Our scientists work with researchers at these institutions to deliver insights into infectious disease characterization, personalized cancer medicine, and human population genomics. Seven Bridges Platform licensees can deploy our Graph Genome aligner as part of our whole-genome analysis workflow and will have priority access to our graph-based variant caller, genotyper, and visualizations, which represent the most advanced population genomics tools available today.


Product or Technology: Sentinel Hub
Company Home Page: http://sentinel-hub.com/
Partner: Technology Partner
Vendor Country of Origin: Slovenia
Delivery method: SaaS Portal
Domains: Earth Observation, Geography, Geospatial
Regional availability – All AWS Regions are sovereign.

12.16 Sinergise – Sentinel Hub
Sentinel Hub exploits advanced AWS technology to process and distribute satellite imagery data (Sentinel, Landsat, etc.) and provide it to the user (or a machine) in an easy-to-integrate way – in the form of OGC standard services such as WCS, WMS and WMTS. It removes the major hassle (and cost) of downloading, archiving and processing petabytes of data and simply makes the full, global archive accessible via web services. Users and developers can integrate these services in their processing engine, web, mobile or desktop application, focusing their time on developing added-value services.
12.16.1 How is it accessed?
SaaS – the user creates an account and gets access to a set of web services and configuration utilities to access the data in the most appropriate way.
12.16.2 User requirements
Users should be familiar with the general characteristics of earth observation data (sensor types, resolution, indices, etc.) and basic OGC services.
12.16.3 Trial
A one-month trial account is available upon request.
12.16.4 Billing
Direct relationship with the vendor.
12.16.5 Pricing & Licenses
Sentinel Hub's pricing is subscription based, with several possible models:
• consumer/research named user – monthly/annual/multi-annual subscription


• commercial named user – monthly/annual subscription
• enterprise volume-based package (e.g. per 0.5 million monthly requests)
12.16.6 Support
Support services via e-mail, with a 24-hour response time, are included.
Additional Information from Sinergise
Description
Sentinel Hub provides unprecedented access to earth observation data, focused on the Sentinel and Landsat satellites but also supporting other sources such as Planet, RapidEye and more. It uses advanced AWS Cloud technology and innovative methods to efficiently process and distribute data in a matter of seconds, without compute-intensive pre-processing, resulting in an easy-to-use and cost-efficient way to exploit the data in any GIS application or to integrate it in a web application. It therefore removes the major hassle of downloading, archiving and processing petabytes of data and simply makes the full, global archive available via web services. Application developers can therefore focus on added-value services and end-user applications rather than dealing with the complexity of remote sensing data.

Figure 47 - Sentinel Hub Concept.


Supported services
• Web Mapping Service (WMS) for easy integration in proprietary GIS applications (ArcGIS, QGIS, etc.).
• Web Mapping Tiling Service (WMTS) and Tiled Map Services (TMS) in several projections and imagery options for integration in web and mobile applications (e.g. OpenLayers frameworks and similar).
• Web Coverage Service (WCS) for processing workflows, where users want to control each aspect of data distribution.
• Index service to provide meta-data about available imagery in a specific area.
Customer benefits
• No need to keep and maintain an archive of the satellite imagery data.
• No need to process imagery using classic, compute-intensive and time-consuming processes.
• Flexibility – change the configuration of data processing in a matter of seconds, with no reprocessing time.
• Easy to use – no need to dive into the technical details of each sensor, satellite providers' distribution methods, meta-data, etc.
Features
• Configuration utility for easy set-up of the services.
• Web application to easily configure the service for maximum integration capabilities.
• On-the-fly band math providing the option to calculate and visualize indices.
• Advanced filters such as contrast stretching, DOS1, atmospheric correction and others.
• Different output options – PNG, JPG, GeoTIFF, raw values.
• Upsampling and downsampling options to fine-tune results at different scales.
• Open-source libraries available on GitHub with integration examples.
Test it yourself
• Try the showcase web application Sentinel Playground (http://apps.sentinel-hub.com/sentinel-playground/).

• Send us an e-mail to [email protected] with a request for a trial.
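As an illustration of the OGC services listed above, the sketch below issues a standard WMS GetMap request from MATLAB using websave. This is a minimal sketch only: the instance ID in the endpoint URL, the layer name and the bounding box are placeholders that depend on your own Sentinel Hub configuration.

    % Placeholder WMS endpoint - substitute the instance ID generated by
    % the Sentinel Hub configuration utility for YOUR-INSTANCE-ID.
    wmsUrl = 'https://services.sentinel-hub.com/ogc/wms/YOUR-INSTANCE-ID';

    % Standard WMS 1.3.0 GetMap request; layer name and BBOX are examples.
    outFile = websave('truecolor.png', wmsUrl, ...
        'SERVICE', 'WMS', 'REQUEST', 'GetMap', 'VERSION', '1.3.0', ...
        'LAYERS', 'TRUE_COLOR', 'FORMAT', 'image/png', ...
        'CRS', 'EPSG:4326', 'BBOX', '46.0,14.0,46.5,14.5', ...
        'WIDTH', '512', 'HEIGHT', '512');

    % Display the downloaded tile.
    image(imread(outFile));
    axis image off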


Product or Technology: Techila Distributed Computing Engine
Company Home Page: http://www.techilatechnologies.com/
Partner: Technology Partner
Vendor Country of Origin: Finland
Delivery method: Direct Licensing
Domains: MATLAB, Simulation, Optimization, Modeling: Financial Engineering, Applied Mathematics, Materials Sciences, Bioinformatics, Stochastic Modeling, Monte Carlo simulation, Machine Learning, Bayesian Modeling, Statistics, and more (more information at http://www.techilatechnologies.com/fast-matlab-on-demand/).
Regional availability – All AWS Regions are sovereign.

12.17 Techila Distributed Computing Engine
Techila Distributed Computing Engine enables 1000x faster simulation, 180x easier than traditional high-performance computing. Watch the MATLAB® demo on YouTube: http://www.techilatechnologies.com/matlab-demo
12.17.1 How is it accessed?
Self-launch on demand. A couple of mouse clicks in Techila's fully automated tools will start the Techila Distributed Computing Engine on your own AWS account.
12.17.2 User requirements
Techila Distributed Computing Engine enables fast simulation and analysis, without the complexity of traditional high-performance computing. Using the Techila Distributed Computing Engine with the AWS Cloud does not require more skills than installing a new MATLAB toolbox. All you need is your client MATLAB and MATLAB Compiler on your PC; Techila Distributed Computing Engine will do the rest. Techila Distributed Computing Engine is a full-stack solution that includes everything you need for fast and scalable simulations in the AWS Cloud. The Techila Distributed Computing Engine license includes fully automated tools that enable starting and stopping of the computing environment on demand on your own AWS account, with just a couple of mouse clicks. Video tutorials and getting-started guides walk you step by step through the installation and running your first simulation in the AWS Cloud.


12.17.3 Billing
Techila Distributed Computing Engine can be used with an enterprise license, which you can order directly from Techila Technologies and pay for by invoice.
12.17.4 Pricing & Licenses
The price of a Techila Distributed Computing Engine enterprise license is 20 EUR per CPU core per month, with a 12-month license term. The cost of Techila Distributed Computing Engine depends on how much computing throughput you would like to have: 100x, 500x, 1000x, 2000x, … A Techila Distributed Computing Engine enterprise license can support many users. If several users are running simulations at the same time, the resource manager of the Techila Distributed Computing Engine will share the available throughput between them based on customizable user priorities.
12.17.5 Support
Techila Technologies offers a variety of free and paid support services, including Premium support and consultancy services. For more information and contact details, please visit the Techila Technologies support site: http://www.techilatechnologies.com/support/
12.17.6 Training and Reference Sources
Techila Technologies offers a rich range of technical documentation, and a variety of free and paid training services. The documentation is free to all users of the Techila Distributed Computing Engine at Techila Documentation Home: http://www.techilatechnologies.com/help/techila-distributed-computing-engine
Free trainings are available as recorded courses on Techila Technologies' YouTube channel: https://www.youtube.com/TechilaTech
Paid trainings can have tailored content that meets the specific requirements of your organization and can be organized at the location that is most convenient for you. For more information and contact details, please visit the Techila Technologies support site: http://www.techilatechnologies.com/support
Additional Information from Techila
Techila Distributed Computing Engine enables fast simulation and analysis, without the complexity of traditional high-performance computing. No need to wait for computations to finish. No need to cut corners in accuracy. Techila APIs integrate scalable computing power seamlessly into MATLAB and other popular research and development tools. The applications can also be deployed and run as a service in a SOA environment.


Modern research relies on advanced simulation, modeling, and optimization, and the computational tasks can be time-consuming. Monte Carlo simulations, parametric sweeps, and model calibration can take hours or weeks, which slows research productivity and scientific innovation. In many cases, solving the problem requires running the same operation with different parameters or data, or performing a complex algorithmic operation implemented using for-loops. The Techila toolbox for MATLAB includes a cloudfor construct that is easy to adopt and can cut run times from weeks to hours and from hours to minutes. You simply replace the local for command with the cloudfor command, and cloudfor automatically offloads the processing to the resources on your own AWS account. How to change code with a local for loop to use cloudfor:
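The snippet below is an illustrative sketch only, not Techila's own example from this handbook: heavy_simulation stands in for whatever function your loop body calls, and the cloudfor/cloudend keywords follow the pattern described in the Techila Distributed Computing Engine with MATLAB user guide linked below, which should be consulted for the exact syntax and any control directives your workload needs.

    % Local version: the parameter sweep runs serially on your own PC.
    results = zeros(1, 100);
    for k = 1:100
        results(k) = heavy_simulation(k);   % placeholder for your own computational function
    end

    % Techila version (sketch): replacing for/end with cloudfor/cloudend lets the
    % Techila Distributed Computing Engine distribute the iterations to workers
    % started in your own AWS account; the results are gathered back into the array.
    results = zeros(1, 100);
    cloudfor k = 1:100
        results(k) = heavy_simulation(k);
    cloudend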

For more information about using TDCE with MATLAB, and for code examples, please visit the Techila Distributed Computing Engine with MATLAB user guide: http://www.techilatechnologies.com/help/techila-distributed-computing-engine/matlab-techila-distributed-computing-engine.html


Product or Company: EPIC
Home Page: http://epic.zenotech.com/
Partner type: Technology Partner
Vendor Country of Origin: UK
Delivery method: SaaS Portal

Domains: High Performance Computing, Fluid Dynamics, Supercomputing

Regional availability – All AWS Regions are sovereign. [Regional availability table: the AWS Regions listed were Dublin (IRE), Frankfurt (DE), London (UK), Paris (FR, 2017), Montreal (CA), N. Virginia, Ohio, Oregon and N. California (US), São Paulo (BR), Singapore (SG), Tokyo (JP), Sydney (AU), Seoul (KR), Mumbai (IN), Beijing and Ningxia (CN, 2017).]

12.18 Zenotech EPIC is a portal to cloud HPC. It allows users to easily start and manage cloud based HPC clusters and services. It also provides a simple submission interface for several popular simulation tools to allow EPIC users to submit HPC jobs to a variety of resources. 12.18.1 How is it accessed? EPIC is accessed via a portal. Users can register at epic.zenotech.com. 12.18.2 User requirements We have tried to make using EPIC as simple as possible, no HPC expertise is required to start using applications at scale via EPIC. We do all of the configuration for you. If you start a cluster via EPIC you will have a fully functional HPC cluster in a few clicks. We have simple billing and account management views to allow you to control your HPC spend and share resources with your colleagues. 12.18.3 Trial All new users to EPIC get £10 of trial credit to spend trying out the service. 12.18.4 Billing All of the billing is done via the EPIC portal. You have an EPIC account and all the transactions are done via this account. No need for additional AWS accounts or accounts on the 3rd party HPC services. All billing is pay-per-use and there are no subscription charges. 12.18.5 Pricing & Licenses There are no license costs for using EPIC itself. There may be costs for the software used via EPIC and if so we support a bring your own license model or, where supported by the vendor, a pay-per-use model.


The latest pricing of HPC resources used via EPIC is available on the EPIC home page (epic.zenotech.com). We try to be as transparent as possible – you see the price we get charged by the provider and then an EPIC service charge is added to this cost. 12.18.6 Support There is a help section and knowledge base within the portal. The support team can be emailed directly at [email protected]. 12.18.7 Training and Reference Sources When you first log in to the portal you will be presented with a getting started tutorial. If you have any further questions please contact support. Additional information provided by Zenotech: EPIC aims to be your gateway to accessing a range of HPC resources. EPIC was developed to meet our own requirement for computing resource but we frequently came across people with the same issues as us. We commonly heard comments such as...

 “We need access to more resources than we have.”
 “We could do so much more with a bigger cluster.”
 “I could really do with my own cluster for the weekend.”

And so we continued to develop EPIC to meet this need. If you just want to access more resources for a specific application, then EPIC has the Job interface - a simple way to get your jobs running on large-scale HPC systems with just a few clicks. But if you need a bit more control and want your own private cluster, then EPIC will allow you to start and manage your own HPC cluster in the cloud. Two ways to work via the same platform: you can submit jobs or launch private clusters from within EPIC, with the benefit of a single billing and team management interface.
Jobs
If you are running an application on an internal resource and just want to get your job running on a bigger machine, then EPIC jobs may be for you! We present a simple interface to a number of applications that lets you submit your job onto a range of backend HPC clusters. You just provide your data and the options for the application and we take care of everything else.
 Simple submission interface
 Submit jobs onto dedicated HPC hardware
 Optimised applications
 Job monitoring and notification
Clusters


If you need your own HPC cluster for a period of time then EPIC can help you get one up and running. Choose a hardware configuration to match your requirements and then EPIC will start a cloud based HPC cluster for you and provide you SSH access. Monitor and manage your HPC cluster via EPIC. We take care of configuring the cluster to get the best HPC performance from the cloud.  Simple cluster configuration  Direct SSH or VNC access  HPC ready  Range of pre-installed applications

For more information please see epic.zenotech.com.


13 APN Consulting Partners Listing

On the following pages, we introduce some of our APN Consulting Partners, selected for what they offer to the global Research Community. These partners provide the skills necessary to stand up some of the most serious workloads possible. They’ve all worked in scientific research environments and have deep technical abilities that you can trust to solve the problems you need help to address. For a full listing of APN Consulting Partners – specific to skill set or geography – you can always consult https://aws.amazon.com/partners/.
Contents
13 APN CONSULTING PARTNERS LISTING
13.1 ACELLERA LTD
13.2 ALCES SOFTWARE LTD.
13.3 ARCUS GLOBAL
13.4 INQDO B.V.
13.5 PIRONET/CANCOM
13.6 STERLING GEO
13.7 THE SERVER LABS LTD.
13.8 ZENOTECH


Company Name: Acellera Ltd
Home Page: https://www.acellera.com/
Consulting Partner tier: Registered
Company HQ City & Country: Stanmore, UK
Services offered in: Europe, North America
Domains and/or Technologies: A high-technology company focused on developing high-throughput molecular dynamics techniques based on GPUs, clusters of GPUs and the Cloud. We deliver solutions for estimating common physico-chemical properties such as binding affinities, kinetics, poses and pathways with validated accuracy.

13.1 Acellera Ltd
Acellera's mission is to accelerate the use of simulations and computational methods in drug discovery. Acellera's ACEMD platform lowers the barrier to performing complex statistical analysis of conformational changes and binding events between biomolecules. These simulations offer atomistic resolution for the understanding of macroscopic events and represent a powerful technique that is orthogonal to the most common biophysics approaches, such as nuclear magnetic resonance, crystallography, surface plasmon resonance and isothermal calorimetry. Acellera offers products (software and hardware) and services to computational chemistry, medicinal chemistry and biophysics groups worldwide, in the pharmaceutical industry, public research institutions and academia.
13.1.1 Contact Info
Commercial and technical support are provided through several channels. Customers can contact us:
Phone: +34 674 210 151
Email: [email protected]
13.1.2 Competencies
Software, hardware and services for the early phases of drug discovery and drug design, based on molecular dynamics simulations.
13.1.3 Office Location(s)
Headquarters: Acellera Ltd, Devonshire House, 582 Honeypot Lane, Stanmore, HA7 1JS, UK
Subsidiary: Acellera Labs SL, PRBB, C/ Dr. Aiguader 88, 08003, Barcelona, Spain
13.1.4 Geographic Coverage
Acellera can operate worldwide: several projects have been led in Asia, South America and Oceania.


Preferred geographic areas are Europe and North America.
13.1.5 Solutions & Services offered
Acellera is a high-technology company focused on developing high-throughput molecular dynamics techniques, including software, hardware, protocols and services. We deliver solutions for estimating common physico-chemical properties such as binding affinities, kinetics, poses and pathways with validated accuracy. These techniques are offered primarily, but not exclusively, to life sciences companies, including the pharmaceutical and biotechnology industries, as well as to governmental institutions and academia. We mainly offer our customers:
 Customized software application development
 Strategic R&D projects based on molecular dynamics
 Training in high-throughput molecular dynamics and its analysis
13.1.6 Certifications
Acellera has published more than 27 peer-reviewed articles over the last nine years. Our work has been presented at numerous international congresses, symposia and meetings.
Additional Information from Acellera
Acellera started to develop its own code for accelerated molecular dynamics (MD) simulations ten years ago. This code has been constantly improved and adapted to the most recent technology available, and it is now distributed and used worldwide. Our team, composed of mathematicians, statisticians, physicists, chemists and computational chemists, has designed an exclusive platform, the ACEMD platform, that enables research teams to access fast GPU-based simulations and automated trajectory analysis. The achievement of long production runs allowed the application of statistical analyses such as Markov models. These models are widely used in other important domains such as economics, genetics and chemistry, and they make complex analysis of the data produced possible. For MD trajectory simulations, they allow common physico-chemical properties such as binding affinities, kinetics, poses and pathways to be determined with validated accuracy. During the last decade, Acellera has participated in several European-funded projects and collaborated with major pharmaceutical companies in early drug discovery programs and drug design projects. We have also supported numerous computational, medicinal and biophysics groups in implementing, using and developing GPU-based simulation techniques. These interactions with the scientific community allowed us to face and overcome major technical and theoretical challenges, and we have acquired deep knowledge of the fundamental aspects of molecular dynamics analysis and its outcomes. The evolution of the technology, and especially the advent of GPU and Cloud computing, now makes it possible to simulate on the same time scales as experiments. This achievement triggered interest in the possibilities offered by molecular dynamics: it can now be used not only for validation of experimental data such as crystallographic poses, binding kinetics or affinities. Since 2009, Acellera's research and development (R&D) group has optimized techniques and protocols and demonstrated the robustness of molecular dynamics simulations and the accuracy of their analysis. More than 27 peer-reviewed articles have been published based on our own R&D and within the framework of collaborations, and our work has been presented at numerous congresses, conferences and meetings. This work has been possible thanks to the diverse and extensive experience of our team in scripting, programming and automating the MD simulation process. It includes the development of specific code for GPUs, and the creation of specific software designed to prepare, run and analyze MD simulations as well as to handle efficiently the data generated throughout the MD process. Besides this knowledge of molecular dynamics, we have gained unique know-how in implementing MD simulation solutions, performing their analysis and adapting their installation to our customers' facilities. Convinced of the usefulness of MD simulations, we wanted to make this technique accessible to all, and three years ago we started to offer free training to the scientific community. A dedicated workshop on high-throughput molecular dynamics has been organized in Barcelona, offering attendees theoretical and hands-on sessions. We offer the same training to our customers, focusing on their needs and giving unique, personalized solutions. Acellera has developed unique tools and protocols that can be implemented rapidly in small entities or large facilities. Our knowledge of conformational studies and biomolecular interaction analysis is unique, and Acellera is a strategic partner for computation-based drug discovery programs.


Company Name: Alces Software Ltd.
Home Page: http://alces-software.com/
Consulting Partner tier: Registered
Company HQ City & Country: United Kingdom
Services offered in: Europe, Africa, India (English Language)
Domains and/or Technologies: Benchmarks, Biochemistry, Bioinformatics, Bio-physics, Chemistry, Compilers, Databases, Electronics, Engineering, Geography, Graphics and Imaging, Languages, Libraries, Mathematics, Medicine, MPIs, Physics, Statistics, Tools, Visualization. Specialising in HPC and highly scalable cluster workloads.

13.2 Alces Software Ltd. Alces Software is a team of individuals with over fifteen years’ experience in putting together High Performance Computing (HPC) clusters on a wide range of different platforms. If you are looking to put together a custom configuration for running HPC in the cloud, or are looking to pull together a hybrid solution that uses your on-premises cluster with AWS, look no further than us. The Alces Team are the creators of Gridware, putting over 1,300 applications and libraries at your fingertips – with more additions every day. This means that Alces is uniquely placed to not only assist with constructing your HPC cluster, but also to ensure that your cloud or hybrid solution is easily optimized for your end user ensuring your investment is a continued success. 13.2.1 Contact Info General information and sales inquiries can be made at [email protected] or +44 (0) 1869 249 065. 13.2.2 Competencies We have extensive HPC knowledge and experience, including:  High-throughput CPU technologies  Large memory servers  Remote visualization technologies  GPGPU and special-purpose devices  High performance, low-latency interconnects and tuning  Visualization, 3D-rendering and offload technologies  Virtualized and containerized solutions to maximize portability and security  Parallel filesystems (Lustre, GPFS, Gluster, pNFS, BeeGFS)  High-availability services and no-single-point-of-failure designs  Scientific software domains (e.g. Life Sciences, CFD, Engineering, Physics, Mathematics)


13.2.3 Office Location(s)
Alces Software has offices in Bicester, England, with a distributed development team available throughout the UK and Northern Europe.
13.2.4 Geographic Coverage
Alces Software can support customers in Europe, India and Africa (UK time zone +/- 3 hours). Services and/or documentation deliverables are provided in the English language. Alces can also offer global next-business-day coverage for our Alces Flight software for end users seeking to run HPC clusters in the cloud.
13.2.5 Solutions & Services offered
Alces Software provides consulting and custom application development for those looking to put together HPC clusters in the cloud or within a hybrid environment. We are thoroughly versed in on-premises, hybrid, and AWS environments, having designed hundreds of solutions that suit the needs of our clients. We engage heavily with the open-source HPC ecosystem as well as working with commercial applications in order to pull together a system that meets the goals of your company or institution. Why not contact us for help with:
 Assessments / Solution Build Requirements
 Custom Application Development
 Managed Service Provider
 Strategic / IT Consulting
13.2.6 Certifications
Our support and development team includes contributors with a wide range of certifications including:
 AWS Architect and Security
 Hardware system certifications from major tier-1 suppliers
 Network and InfiniBand solutions architecture
 Lustre parallel file system Elite certified professionals
 PRINCE2 project management certification
Additional information from Alces Software
As the world creates more Linux applications and distributions, the variety of system and software configurations increases. Our software repositories, web-accessible interfaces and command-line tool suites provide an intuitive, consistent and predictable environment, removing unnecessary complexity and enabling you to get the important work done. We favor convention over configuration - getting you going with your cluster from day one. With a multitude of Linux distributions, requirements for different compilers, and multiple versions of libraries, message passing interfaces and software applications, it can be a daunting prospect to get up and running in your high performance environment. During our years of experience in configuring and using HPC clusters it has become apparent that some order needed to be brought to the chaos. Configuring all the component parts of a cluster is just the first step – making use of the cluster is where the important stuff happens. By following consistent conventions for compiling, installing and using HPC software components, we are able to provide you with an environment ready for getting your job done from day one. And when you want to add new software libraries and applications at a later date, we've got you covered with our tool suites and package repository.
Get comfortable
Wrestling with an unfamiliar interface can only reduce your efficiency - there is a better way. While some people are entirely at home with a command-line interface we know that, sometimes, getting familiar with a high technology environment can be a daunting prospect. With an easy-to-use web-accessible interface for your cluster we mitigate the need to learn how to use the command line. Transferring data between your workstation and your cluster, creating job submission scripts, submitting your HPC jobs and monitoring their progress can all be managed from the comfort of your desktop browser. Alces Gridware provides an ever-expanding repository of tools, utilities, libraries and applications, configured for the hardware in your cluster. Easy to use and convention-focused, Gridware eases the installation and operation of the wide range of available HPC applications and environments.
Harness the power while keeping a lid on your costs
High performance computing often leverages specialized technologies long before they are accepted into mainstream IT. Deploying such solutions requires knowledgeable engineers to achieve the best performance and advanced software to deliver optimum stability and usability for your customers. We maintain a high level of expertise in key technologies including:
 Scientific software packages and applications
 System availability and performance management solutions
 Web services software and remote access systems
 Batch scheduler configuration and optimization
 High-availability hardware and software solutions
 Linux operating systems and package management
 High performance parallel filesystem technologies
 Fault-tolerant and business continuity solutions
 Complex project management
Support how you want it, when you need it
We offer customizable support agreements to suit your requirements. Our support agreements can include 2-day, next-business-day, 4-hour or 24/7 response to support tickets for critical issues, and named primary engineering contacts for continuity of support requests.


Company Name: Arcus Global
Home Page: http://www.arcusglobal.com
Consulting Partner tier: Advanced
Company HQ City & Country: Cambridge, UK
Services offered in: UK, mainland Europe
Domains and/or Technologies: Management portal for education, architecture, security, managed services (more information available at http://www.arcusglobal.com/aws-infrastructure-design/).

13.3 Arcus Global
We started implementing AWS for public sector customers in 2009 and, in 2013, provided a portal that allows higher education institutions to easily buy and track AWS services. We are experts in solving difficult problems and implementing cost-effective, secure, high-performance solutions.
13.3.1 Contact Info
Please contact us via [email protected] or +44 (0)1223 911 841.
13.3.2 Competencies
Arcus provides an easy way to procure AWS services via our educational management portal (built in conjunction with JISC). It provides the following benefits:
 Buy without a credit card – we invoice your institution using 30-day terms
 Currency exchange – Arcus invoices you in GBP/Euro
 Aggregation benefits – pooled usage reduces costs for all
 Data egress waiver – we provide automatic enrollment (https://aws.amazon.com/blogs/publicsector/aws-offers-data-egress-discount-to-researchers/)
We were the first AWS Public Sector Partner in the UK; we have migrated local authorities to the cloud and enabled central government to host sensitive databases. We can help you get started with AWS or enable you to utilize big data technologies including Redshift and EMR (managed Hadoop) for analyzing large datasets.
13.3.3 Office Location(s)
Arcus Global, Future Business Centre, Cambridge, CB4 2HY, UK
13.3.4 Geographic Coverage
We are a UK business that has been awarded a framework agreement in the pan-European GÉANT IaaS tender, which will enable us to offer our higher education portal services across Europe.


13.3.5 Solutions & Services offered
Our management portal makes it easy for HEIs to buy AWS services. Our consultancy can help you design, build and secure your AWS infrastructure. Our managed services deliver ongoing monitoring and support. More information can be found here: http://www.arcusglobal.com/aws-infrastructure-design/


Company Name: inQdo B.V.
Home Page: https://www.inqdo.com
Consulting Partner tier: Advanced
Company HQ City & Country: Utrecht, Netherlands
Services offered in: Netherlands, Belgium, Germany, Sweden, Finland, Norway, Denmark, Iceland, Spain, Italy, Switzerland, Monaco, Austria, France & UK
Domains and/or Technologies: Chemistry, Physics, Astronomy, Mathematics, Geography, Geospatial, Graphics, Machine Learning, Bioinformatics, Biochem, Statistics, … (more information available at https://www.inqdo.com). HPC & highly scalable cluster workloads – consulting to the stars.
13.4 inQdo B.V.
Is it the first time your organization wants to explore Amazon Web Services (AWS), and do you need a fast and flexible implementation with no up-front investments? inQdo can help your organization with:
 Making the right choices on AWS architecture
 Implementing your AWS environment
 Optimizing your running AWS environment
 Managing your AWS account
 Advanced billing (no credit card needed)
 Proof of concepts
 ROI calculations
Our experience is bundled in best practices. That's why we can quickly select the right components for your AWS environment.
13.4.1 Contact Info
Call Peter Perebooms at +31 64 53 44 046 or email [email protected] for more information.
13.4.2 Competencies
 inQdo Connect - our iPaaS Enterprise Integration Platform
 Effective exchange of data - ensures the exchange of data within your organization and with information systems and individuals outside your organization, from creating connections through to ensuring the secured access of apps and mobile websites.
 Flexibility - dynamic scaling to respond to peaks in usage.
 No surprises - adaptors and all technical connections to your applications or communication channels, like mobile and portals, included.


13.4.3 Office Location(s)
inQdo B.V. (Netherlands): Everard Meijsterlaan 1C, 3533 CK Utrecht, The Netherlands
inQdo B.V. (Manila): One World Place, 32nd St., Bonifacio Global City, Manila
13.4.4 Geographic Coverage
inQdo has two office locations (Netherlands and Manila); from these locations we cover the following countries: Netherlands, Belgium, Germany, Sweden, Finland, Norway, Denmark, Iceland, Spain, Italy, Switzerland, Monaco, Austria, France & UK.
13.4.5 Solutions & Services offered
inQdo offers two primary services for AWS. On the one hand, we are an AWS consultancy company that offers the following services:
 Making the right choices on AWS architecture (Assessments)
 Implementing your AWS environment
 Optimizing your running AWS environment
 Managing your AWS account
 Advanced billing (no credit card needed)
 Proof of concepts
 ROI calculations
 Cloud Migration Services
 Managed Service Provider
 Strategic / IT Consulting
On the other hand, we have built a specific iPaaS solution, called inQdo Connect (https://www.inqdo.com/connect), that can be used in the AWS science cloud. inQdo Connect can be used in various scenarios.
13.4.6 Certifications
ISAE 3402
Additional Information from inQdo
inQdo Connect - iPaaS
Definition of the problem
Organizations are being confronted with more and more systems which have to exchange data. This is an ongoing process, especially when there is a multiplicity of channels (including apps). The complexity of IT is increasing, while the security of supply must still be guaranteed. Because alterations are built on alterations, spaghetti-style links have been created over the years. New requirements and insights bring new IT challenges. On top of that, the specialist knowledge needed to cope with these dynamics is becoming increasingly scarce, and the budgets required are not becoming more generous.


Solutions for integration challenges
Nowadays, business processes are supported by ever more applications. Applications are frequently adjusted, replaced or added, but at the same time they share information which is used in processes that transcend the applications themselves. In order to optimize these processing times, the demand for 'real-time' information exchange between applications is increasing. Furthermore, more and more constituent processes are supported by information technology. At the same time, the differences in the life cycles of the applications are increasing: some applications are specially made for one event, while other applications last for five years without significant changes being necessary. Because of this there is a growing demand for flexible, reliable and controllable integration between applications. An Enterprise Service Bus (ESB) could contribute to this.
ESB
An ESB is an architectural software construction that adds an abstraction layer to the communication between the applications of applicants and providers of information. The ESB offers an interface at the applicant's end, for example a web service or a file at a certain location. At the provider's end the ESB will also communicate via an agreed interface. Without an ESB, there are direct 'point-to-point' connections between the applications of the applicants and the providers. With an ESB placed between the applications, the ESB is able to communicate with the application of the applicant in a completely different way than with the provider's application. Moreover, these applications no longer need knowledge of each other to transfer information. Adding the ESB within the software architecture makes it possible to decouple the protocol and the format. Both the applications of the applicant and those of the provider communicate with the ESB. The ESB transforms the incoming information in a request into the protocol and format expected by the provider. An ESB is the perfect tool to introduce a canonical data model: an intermediate format that is not bound to an application-specific format. Furthermore, this decoupling enables standardization, and centralization of monitoring and control. The reuse of existing information services (interfaces) makes it possible to introduce new or modified applications with minimal impact on the rest of the application landscape. This does require an adjustment to the ESB, but the ESB is able to process such modifications efficiently. In summary, an ESB offers a central switching point where information is converted into the format(s) suitable for the applicant(s) and communicated in an appropriate way, even if the provider sends the information in a different way. An ESB has the following functionalities: protocol conversion (adapters); temporary storage (buffering); extensive transformation possibilities; orchestration/composition to eliminate differences in granularity between interfaces; logging for the purpose of monitoring; security; and routing.

inQdo Connect
inQdo Connect is the integration platform – the ESB – offered by inQdo. inQdo Connect uses the product suite of Software AG, which is positioned as a leader in Gartner's Magic Quadrant and The Forrester Wave for, among others, SOA governance and Enterprise Service Bus. inQdo Connect consists of:
 Enterprise Service Bus;
 SOA Infrastructure and Governance;
 BPMS.
inQdo Connect is based on cloud technology and has a number of advantages over traditional implementations of, and support for, integration platforms. The solution enables the organization to (among other things) take advantage of the fact that:
 tailor-made security is offered;
 complexity is more predictable;
 no substantial investments and obligations have to be made or entered into;
 operating costs are predictable;
 growth and the counterbalancing of peak loads are catered for;
 monitoring of the process is available;
 implementation costs are lower compared to traditional projects;
 standards can be used;
 speed of implementation (accelerators) is better than in a regular project;
 project risks can be limited;
 simple connection to other cloud solutions is possible;
 product development by market leaders can be used;
 faster (controlled) anticipation of future mobile functionality is possible.


Company Name: Pironet NDH / CANCOM SE
Home Page: www.pironet.com ; www.cancom.com
Consulting Partner tier: Registered
Company HQ City & Country: Cologne, Munich, Germany
Services offered in: Europe
Domains and/or Technologies: AWS Consulting & Integration, Brokerage IaaS platform, Health Care

13.5 Pironet/CANCOM
CANCOM/PIRONET is a leading provider in the German cloud computing market. For more than two decades, medium-sized and enterprise companies that want to stay one step ahead of their competitors through IT innovation have trusted CANCOM. The focus of CANCOM/PIRONET is supporting its customers with a Multi Cloud strategy. Summarized under its BusinessCloud brand, CANCOM offers customers a variety of cloud-based deployment models. Customers thus get a mix of on-premises, hosted, and public cloud services from a single source. This Multi Cloud approach provides companies with a high degree of agility in IT delivery and in the implementation of new digital strategies. Flexibility in the adoption, use and operation of modern cloud computing solutions can be combined with the highest degree of IT stability. Companies can free themselves of unnecessary IT tasks and at the same time push forward their digital agenda. The service offering and IT infrastructure as well as the internal organization of CANCOM/PIRONET are certified to strict, internationally recognized guidelines (DIN ISO/IEC 27001). This proves operational excellence in all process sequences as well as compliance with high technical and safety-related standards.
13.5.1 Contact Info
Simon Russin
[email protected]
Pironet NDH Datacenter AG & Co. KG, Von-der-Wettern-Straße 27, 51149 Köln, Germany
13.5.2 Competencies
Core expertise in remote application and data management including SLA services for:
 SAP, Oracle, Microsoft, Progress Software, IBM, Citrix, leading Open-Source Applications
 General consulting capabilities for AWS


 Specific solution template for Backup and Disaster Recovery as a Service
13.5.3 Office Location(s)
PIRONET NDH Datacenter AG & Co. KG, Von-der-Wettern-Straße 27, 51149 Köln, Germany
Overview of CANCOM office locations: http://www.cancom.de/unternehmen/standorte/
13.5.4 Geographic Coverage
Europe.
13.5.5 Solutions & Services offered
 Cloud on-boarding
 Cloud migration
 Cloud service & support
13.5.6 Certifications
CANCOM/PIRONET has been awarded the most important IT security certification, the ISO norm 27001. DIN ISO/IEC 27001 is an international certification standard for information security management systems (ISMS), mostly used in business.


Company Name: Sterling Geo
Home Page: http://www.sterlinggeo.com/
Consulting Partner tier: Registered
Company HQ City & Country: Loughborough, UK
Services offered in: Globally
Domains and/or Technologies: Geography, Machine Learning, Statistics, Land Management, Agriculture, Asset Management, Deforestation, Natural Resource Exploration / Management, Risk Assessment, Flooding, Smart City Development, Migration, Utilities Management, GIS Remote Sensing, Geospatial. HPC & highly scalable cluster workloads – consulting to the stars.

13.6 Sterling Geo
Sterling Geo engineers solutions that transform spaceborne, airborne and terrestrial data into accurate, actionable information. We combine the power of the cloud with earth observation technology to create new lightweight business applications that meet specific current and future business needs across multiple UK and global markets. We have a reputation for capability, quality and service, emanating from a combination of having the right people and our specialist technical knowledge.
13.6.1 Contact Info
Phone: 0800 912 0988
[email protected]
13.6.2 Competencies
Sterling Geo is a strategic partner to two key software developers, Hexagon Geospatial and Safe Software. Sterling Geo is the leading UK distributor of the Hexagon Geospatial software portfolio; we provide technology for imagery intelligence and terrain analysis to over 200 government, academic and commercial organizations in the UK. We became a Top 25 European Partner of Safe Software within two operational years and have been Gold Partners since 2014. We deliver and support Safe Software's entire FME product suite including FME Desktop, FME Server and FME Cloud. Our technical team comprises Certified Hexagon Geospatial Trainers, Certified FME Trainers and Certified FME Professionals. In addition, we are experts in managing EO data, especially satellite imagery, particularly on AWS.
13.6.3 Office Location(s)
Sterling Geo, 14 Charnwood Office Village, Loughborough, Leicestershire, UK LE12 5NL
Sterling Geo, Satellite Applications Catapult, Electron Building, Fermi Ave, Harwell OX11 0QR


13.6.4 Geographic Coverage
Sterling Geo have traditionally traded in the UK market for the delivery of desktop software solutions. However, with the emerging technologies of our lightweight applications, our market has extended to the global arena.
13.6.5 Solutions & Services offered
 Assessments
 Cloud Migration Services
 Custom Application Development
 Strategic / IT Consulting
13.6.6 Certifications
 ISO accredited 19001, 14001 and 2008
 Certified ERDAS IMAGINE Trainer
 Certified FME Trainer
 Certified FME Professional
 FME Cloud development and service provision


Company Name: The Server Labs Ltd.
Home Page: http://www.theserverlabs.com/
Consulting Partner tier: Standard
Company HQ City & Country: London, United Kingdom
Services offered in: UK, Germany, France, Switzerland & Spain
Domains and/or Technologies: AWS compute services, Big Data, HPC & highly scalable cluster workloads; noSQL, NewSQL, Hadoop, Twitter Storm, Apache Spark; Application Security, Cloud Security, Encryption, Tokenization; Maven, Git, Jenkins, SonarQube, Docker, Puppet, Chef, Ansible, Salt, Kubernetes; MongoDB, Cassandra, Couchbase; Scalable and elastic architectures using REST

13.7 The Server Labs Ltd.
The Server Labs is a specialist IT consultancy and systems integration company and a leading authority in the architecture and development of Cloud solutions, DevOps, Big Data and HPC (High Performance Computing) in the Cloud. European based, with a varied group of customers in the UK, Spain and Germany, in different verticals, The Server Labs focuses on the design, implementation and management of scalable IT architecture solutions based on Cloud and agile software engineering. We believe that the Cloud and DevOps together will enable our customers to attain the goal of Continuous Delivery. We collaborate with our customers to achieve success, committed to innovation, believing in what we do every day and growing with every challenge. We work with the most advanced technologies, methodologies and computing paradigms to offer our customers cost-effective, robust, scalable and high-performance solutions.
13.7.1 Contact Info
Dolores Saiz
[email protected]
13.7.2 Competencies
 HPC, Big Data
The Server Labs were one of the first AWS partners to run HPC workloads in the cloud. See https://aws.amazon.com/blogs/aws/scaling-to-the-stars/ for a description of the work we have done with the European Space Agency. We have successfully worked on HPC and Big Data projects for ESA, Bayer, CNIO, CERN and EMBL, to name a few.
 Software Development & Architecture


We have many years of experience in developing full-stack, scalable HPC and cluster-based systems, and for the last few years we have been developing cloud-native applications, both front end and back end.
 DevOps
We have many years of experience in DevOps, including Agile Methodology Adoption, Development Process Improvement, and Continuous Integration all the way through to Continuous Delivery. We have successfully implemented our Continuous Deployment Platform at customers such as ESA, BBVA, BNP Paribas, Amadeus, Madrid Underground, etc.
 Cloud Services
We help our customers in their Cloud transformation initiatives, from inception to implementation and management, end to end. We are cloud architects and we design and implement bespoke Cloud solutions. Our services include System & Infrastructure Assessment, Cloud Security Assessment, Cloud Architecture design, Cloud Adoption Strategy, Cloud Infrastructure configuration and deployment, Private Cloud Deployment, Cloud Security implementation, Consultancy for architecture evolutions, Cloud Interoperation, and Application refactoring for the Cloud.
13.7.3 Office Location(s)
UK: Regent's Place, 338 Euston Road, London NW1 3BT
Germany: Hanauer Landstraße 126-128, 60314 Frankfurt am Main
Spain: Maria de Molina 39, 8th floor, 28006 Madrid

Registered Office: 21-27 Lamb's Conduit St, London, WC1N 3GS 13.7.4 Geographic Coverage UK, Germany, France, Switzerland & Spain. Our research customers include ESA (European Space Agency), Eumetsat, CERN, EMBL, CNIO, Bayer 13.7.5 Solutions & Services offered The Server Labs is an IT Consultancy focused in four main areas.  Software Development & Architecture  DevOps  Cloud services  HPC, Big Data The Server Labs are your Cloud Architects that will help you deploy cost-effective and successful solutions in the Cloud for your business. The Server Labs’ Cloud Services include:  Cloud Assessment & Adoption Strategy


 Cloud Security Assessment  Cloud Architecture design  Application Migration to the Cloud  Cloud Infrastructure configuration and deployment  Private Cloud Deployment  Application refactoring for the Cloud  Development of applications in the Cloud  Cloud Managed Services Additional Information from The Server Labs Big Data & High Performance Computing The Server Labs has many years of experience in both traditional and cloud-based HPC and Big Data. We have successfully completed Big Data projects with organizations such as ESA, CNIO, Eumetsat and we are currently working with CERN and EMBL within the pan-European Helix Nebula initiative. Gaia Data Processing in the European Space Agency The Server Labs (TSL) is currently part of the frame contract for the development of the ESA’s Gaia mission data processing pipeline. The Server Labs has offered its experience in the setup and deployment of a High Performance Computing framework in Amazon Cloud for the processing of Telemetry Test Data. Currently, TSL is testing further the capacity of its data processing, testing the horizontal scalability of Gaia’s data processing grid to limits impossible with the current in-house cluster. In this new science project TSL is running ESA’s Gaia data processing in Amazon's Cloud.


Helix Nebula The Server Labs is part of the Helix Nebula Consortium, building a Science Cloud for Europe. Within the project we have been defining the core architecture and working with CERN for the HPC Processing of LHC data, ESA, deploying their Supersites for Earthquake data and EMBL deploying and architecting their Genomic processing chain. The Server Labs has also contributed to Helix Nebula with our EC2 Bridge that allows EC2 based HPC workloads to run on other public or hybrid clouds.

Eumetsat (The European Organisation for the Exploitation of Meteorological Satellites) The Server Labs undertook an in-depth market study to assess the overall feasibility and technical and financial suitability of using cloud, grid, supercomputing or other type of computing services and solutions for EUMETSAT’s MTG (METEOSAT Third Generation) program. Throughout the project The Server Labs has identified the best solution(s) for long-term reduction of costs. The Server Labs is currently working with EUMETSAT on Private Cloud and HPC processing initiatives. CNIO (Spanish National Cancer Research) The Server Labs worked with CNIO to enable the Spanish national cancer research organization to overcome the bottleneck of data processing in its research programs. In an initial feasibility study, The Server Labs assessed how to transfer the vast computing requirements of CNIO’s research projects to the Cloud, aiming to make the processing

more agile while bringing down cost and reducing the need for in-house infrastructure. More information can be found on: http://www.theserverlabs.com/blog/2010/12/14/genomic-processing-in-the-cloud/.
Big Data/HPC in the Cloud (on demand)
The Server Labs "Big Data/HPC in the Cloud" Solution is a cloud-based environment for your big data and HPC projects that incorporates technologies such as Hadoop, Spark and Storm. We provide all the necessary utilities and tools to effectively manage the vast streams of data and information generated by science projects in a pay-as-you-go model. We also integrate and partner with data analytics platforms to enable the visualization and analysis of science data within our cloud-based solution.

By combining the scale and power of the Cloud Infrastructure with the data management capabilities of these technologies, organizations and enterprises are able to utilize advanced data management services without the need to invest in and manage their own server environment. Our on-demand Big Data Solution provides you with:
 Data storage services to capture and access data in any format
 Data modeling support to help you analyze, transform and extract useful information from your data
 Data management services to process, monitor and operate Hadoop
 Data platform services to secure, archive and scale for consistent availability
For companies that desire faster query response times, more economical scale and greater stability of their Hadoop systems, our "Big Data in the Cloud" Solution is an ideal choice.


Company Name: Zenotech Ltd
Home Page: http://www.zenotech.com/
Consulting Partner tier: Std/Adv/Premier
Company HQ City & Country: Bristol, UK
Services offered in: UK, Europe
Domains and/or Technologies: Fluid Dynamics for Aerospace, Renewables, Civil Engineering and Automotive. Specialists in High Performance Computing, multi-core and accelerated computing and the use of cloud computing for HPC.

13.8 Zenotech
Zenotech are specialists in Computational Fluid Dynamics and High-Performance Computing. We develop and apply technology across a range of industrial domains and work closely with research and development organisations to develop the next generation of tools. Our typical customers are companies looking to increase their simulation capability either via the use of new software tools or via access to the latest HPC resources.
13.8.1 Contact Info
We can be contacted via our website www.zenotech.com/contact-us or call us directly on +44 (0)117 906 1100.
13.8.2 Competencies
Research into Computational Fluid Dynamics, advanced CFD algorithms and the industrialization/application of research output. We use many different programming languages but have expertise in C++, Python, CUDA and Java. We develop for Linux, Windows and Mac. We also provide consulting support to organizations looking to exploit GPU acceleration for their own problems.
13.8.3 Office Location(s)
Our main office is: Zenotech Ltd, Bristol and Bath Science Park, Dirac Crescent, Emersons Green, Bristol BS16 7FR, UK
13.8.4 Geographic Coverage
We work predominantly in the UK but work with organizations with a global reach. We are open to working with partners from anywhere in the world.


13.8.5 Solutions & Services offered
We have two primary offerings: we develop Computational Fluid Dynamics software that is designed to make the most of the latest hardware platforms and algorithm developments, and we offer HPC services that enable customers to run at scale. Our software is developed to run efficiently at very large scale and leverages the latest GPU technology to accelerate simulations. We can work with organisations to advise on how best to take advantage of GPU/multi-core architectures and how best to develop code for those platforms. As a consulting partner we can work with research organisations to bring the latest CFD developments to industry. We can collaborate on research programmes in the UK or Europe. We can help organisations migrate HPC workloads to the cloud or to third-party HPC sites.

13.8.6 Certifications
Affiliations:
• Institute of Engineering and Technology (IET)
• Institute for Mathematics and Its Applications (IMA)
• Australian Mathematics Society (AustMS)

13.8.7 Case Studies
Zenotech have successfully led and partnered in several Innovate UK projects. Two current ones are:

Hyperflux++ builds on the successful Innovate UK project 101890 "Hyperflux", developing next-generation CFD technology for the civil, automotive, renewable and aerospace sectors using the cutting-edge high-order flux reconstruction technique from Peter Vincent and his team at ICL. Hyperflux++ brings Bombardier, CFMS, Aircraft Research Association Ltd and Zenotech together to further develop the capability and address timely challenges in the aerodynamic modelling of undercarriages and nacelles. We will include localized transition modelling, better and more robust high-order mesh generation, and high-fidelity acoustic source modelling. The capability will be available to all UK organisations via cloud access at the CFMS supercomputer, and will leverage the latest in many-core hardware for fast, efficient computation. Workflow integration with existing tool chains will be via support for most mesh formats, with automated upgrade to high-order elements. Hyperflux is a UK-based software programme, underpinned by expertise within the UK. This is in line with government strategies for HVM and ICT, and forms a cornerstone for the UK aerodynamics ATI.

SWEPT2 follows the successful "Simulated Wake Effects Platform for Turbines" project to establish the viability of GPU-based fluid dynamics simulation of turbine array wake interaction effects. With a view to growing a UK-based technology supply chain, the original partners (DNV GL, Zenotech and the University of Bristol) will be joined by the Offshore Renewable Energy Catapult (providing access to LIDAR data for validation), STFC Daresbury (applying the latest in big-data analytics to the challenge of comparing CFD and experimental data), CFMS (cloud computing integration and optimisation), and the universities of Surrey, Strathclyde and Imperial College, providing expertise in wake turbulence and wind tunnel data. SSE Energy Supply Limited has agreed to independently assess the functionality and value of the service during the project. SWEPT2 addresses the energy trilemma by (i) reducing emissions through the use of renewable energy sources, (ii) improving security of supply by supporting UK turbine array installations, and (iii) reducing cost by improving the accuracy with which turbine array performance can be predicted and further optimizing wind farm layout and control strategies.

Additional information provided by Zenotech:

Zenotech is a high-tech simulation specialist. We develop high performance computing and computational fluid dynamics technology for the aerospace, automotive, civil and renewable energy sectors. We deliver computational fluid dynamics (zCFD) simulation at scale for organisations of all sizes needing faster, more accurate and cost-effective simulation. High performance computing (HPC) is at the heart of our business: our tools exploit the latest many-core hardware systems. We work with leading academic researchers to develop cutting-edge technology; with consultants and capability providers who can use and add value to our tools; and with end users looking for a performance edge. Industrial organisations and technology partners gain competitive advantage and add value to their own products and expertise by incorporating our tools into their own product range. We can provide consulting services for technology strategy, computational engineering and high-performance computing.

Testimonials:
"The EPIC on-demand high performance computing service by Zenotech has been for us something of a game changer: the straightforward, secure access to scalable HPC resources has allowed us to deliver high value engineering services significantly faster and better than before..." [Steve Walker, Arup]
"We are delighted to be partnering with Zenotech to develop advanced fluid dynamics simulation software. This brings our leading fundamental research programme closer to industry, delivering high fidelity engineering technology." [Peter Vincent, Imperial College London]

Reviews of previous project management: “I have been extremely pleased with the professional way this project has been run. Documentation has always been first class and provided on time.” [Michael Catania, Monitoring Officer - Innovate UK]


14 Glossary

Access Key
The combination of an Access Key ID (like AKIAIOSFODNN7EXAMPLE) and a Secret Access Key (like wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY). You use Access Keys to sign API requests that you make to AWS. Not to be confused with an SSH key pair.

AMI
Amazon Machine Image (AMI). An encrypted machine image stored in Amazon Elastic Block Store (Amazon EBS) or Amazon Simple Storage Service (Amazon S3). AMIs are like a template of a computer's root drive. They contain the operating system and can also include software and layers of your application, such as database servers, middleware, web servers, and so on.

Budgets
AWS Budgets is a tool to help you plan your usage and your costs (also known as your spend data) and to track how close your usage and costs are to your budgeted amount. You can set up notifications that warn you if you go over your budgeted amount or are forecast to go over your budgeted amount. Notifications can be sent to an email address or to an Amazon Simple Notification Service (Amazon SNS) topic, where they can be handled programmatically to take specific actions.

AWS CloudFormation
A service for writing templates that create and delete related AWS resources together as a unit. See also http://aws.amazon.com/cloudformation.

Amazon CloudWatch
Amazon CloudWatch is a web service that enables you to monitor and manage various metrics and configure alarm actions based on data from those metrics. See also http://aws.amazon.com/cloudwatch.

Amazon EBS
Amazon Elastic Block Store (Amazon EBS) is a service that provides block-level storage volumes for use with Amazon EC2 instances. Block storage is similar to raw, unformatted hard drive space and needs to be partitioned and/or formatted to contain data in a structured way (for example, by a file system like FAT, NTFS, or ext4). See also http://aws.amazon.com/ebs.

Amazon EC2
Amazon Elastic Compute Cloud (Amazon EC2) is a web service that enables you to launch and manage Linux/UNIX and Windows server instances in Amazon's data centers. See also http://aws.amazon.com/ec2.

Amazon EFS
Amazon Elastic File System (Amazon EFS) is a file storage service for Amazon EC2 instances. Amazon EFS provides a simple interface to create and configure file systems with traditional directory structures, similar to a large file server. Amazon EFS storage capacity and performance grow and shrink automatically as you add and remove files. See also http://aws.amazon.com/efs/.
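Several of the entries above (Access Key, AMI, Amazon EBS, Amazon EC2) come together whenever you launch an instance. The following minimal Python (boto3) sketch is offered purely as an illustration, not as a prescribed workflow from this handbook: it launches one EC2 instance from an AMI and attaches an extra EBS data volume at launch time. The AMI ID, key pair name, region and instance type are placeholders you would replace with your own values, and it assumes boto3 is installed and your Access Keys have already been configured (for example with the "aws configure" command).

# Illustrative sketch only: launch one EC2 instance from an AMI with an
# additional EBS data volume. All identifiers below are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")  # assumes Access Keys are already configured

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder AMI ID for your region
    InstanceType="t2.micro",           # modest instance type for testing
    KeyName="my-research-keypair",     # placeholder SSH key pair name
    MinCount=1,
    MaxCount=1,
    BlockDeviceMappings=[{
        "DeviceName": "/dev/sdf",
        "Ebs": {"VolumeSize": 100, "VolumeType": "gp2"},  # extra 100 GiB EBS volume
    }],
)
print("Launched instance:", response["Instances"][0]["InstanceId"])

When the work is finished, terminating the instance (ec2.terminate_instances) stops the compute charges; whether the extra volume is deleted with the instance depends on its DeleteOnTermination setting.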


Amazon EMR
Amazon Elastic MapReduce (Amazon EMR) is a web service that makes it easy to process large amounts of data efficiently. Amazon EMR uses Hadoop processing combined with several AWS products to do web indexing, data mining, log file analysis, machine learning, scientific simulation, and data warehousing. See also http://aws.amazon.com/elasticmapreduce.

IAM
AWS Identity and Access Management (IAM) is a web service that enables AWS customers to manage users and user permissions within AWS. See also http://aws.amazon.com/iam.

Instance Storage
Storage that comes built into an Amazon EC2 instance type, in contrast to Amazon EBS storage, which is provisioned separately and subsequently "attached" to an instance.

Amazon Machine Learning
A cloud-based service that creates Machine Learning (ML) models by finding patterns in your data and uses these models to process new data and generate predictions. See also http://aws.amazon.com/machine-learning/.

AWS Marketplace
AWS Marketplace is a web portal where qualified partners market and sell their software to AWS customers. It helps customers find, buy, and immediately start using the software and services that run on AWS. See also http://aws.amazon.com/partners/aws-marketplace/.

MFA
Multi-Factor Authentication (MFA) is an optional AWS account security feature. Once you enable AWS MFA, you must provide a six-digit, single-use code in addition to your sign-in credentials whenever you access secure AWS webpages or the AWS Management Console. You get this single-use code from an authentication device that you keep in your physical possession (like your smart phone or a separate dongle). See also http://aws.amazon.com/mfa/.

Amazon RDS
Amazon Relational Database Service (Amazon RDS) is a web service that makes it easy to set up, operate, and scale a relational database in the cloud. It provides cost-efficient, resizable capacity for an industry-standard relational database and manages common database administration tasks. See also http://aws.amazon.com/rds.

Amazon S3
Amazon Simple Storage Service (Amazon S3) is storage for the Internet. You can use it to store and retrieve any amount of data, at any time, from anywhere on the web, using HTTP GETs and PUTs. Amazon S3 is very low cost and extremely large scale and is typically used to house working data when it is "at rest." It's designed for write-once, read-many-times duty cycles. Often referred to as "object storage," Amazon S3 can hold individual objects up to 5 TB in size. See also http://aws.amazon.com/s3.

Amazon SNS
Amazon Simple Notification Service (Amazon SNS) is a web service that enables applications, end users, and devices to instantly send and receive notifications from the cloud.
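The Amazon S3 entry above describes storage driven by HTTP GETs and PUTs. As a purely illustrative sketch of what that looks like in practice, the short Python (boto3) example below uploads a results file to a bucket and downloads it again; the bucket name, object key and file names are placeholders, and it assumes boto3 is installed and your credentials are configured.

# Illustrative sketch only: store and retrieve an object in Amazon S3.
# The bucket name, object key and file names are placeholders.
import boto3

s3 = boto3.client("s3")

# PUT: upload a local results file as an S3 object
s3.upload_file("results.csv", "my-research-results-bucket", "experiment-42/results.csv")

# GET: retrieve the same object back to local disk
s3.download_file("my-research-results-bucket", "experiment-42/results.csv", "results_copy.csv")

print("Round trip complete")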


SSH
Secure Shell (SSH) is a shell or terminal program for accessing remote Linux servers that has encrypted data transport and authentication enabled by the use of digital cryptographic keys.

SSH Key Pair
A public and private key pair for use by SSH, which can either be generated by AWS's Amazon EC2 console or imported into AWS from your local desktop. You must specify a key pair when you launch and connect to an Amazon EC2 instance. See also http://docs.aws.amazon.com/cli/latest/userguide/cli-ec2-keypairs.html.

Amazon VPC
Amazon Virtual Private Cloud (Amazon VPC) is a web service for provisioning a logically isolated section of the AWS Cloud where you can launch AWS resources in a virtual network that you define. You control your virtual networking environment, including selection of your own IP address range, creation of subnets, and configuration of route tables and network gateways.
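Returning to the SSH Key Pair entry above, here is a minimal, assumption-laden Python (boto3) sketch of the key pair workflow: it creates a new EC2 key pair, saves the private key locally with the permissions ssh expects, and shows (as a comment) the kind of ssh command you would then use to reach a running instance. The key pair name, file path, username and host address are all placeholders, not values defined elsewhere in this handbook.

# Illustrative sketch only: create an EC2 SSH key pair and save the private key.
# The key name, file path and the example host/username are placeholders.
import os
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

key = ec2.create_key_pair(KeyName="my-research-keypair")

# Write the private key to disk with owner-only permissions, as ssh requires.
pem_path = "my-research-keypair.pem"
with open(pem_path, "w") as f:
    f.write(key["KeyMaterial"])
os.chmod(pem_path, 0o400)

# You would then connect to a running instance with something like:
#   ssh -i my-research-keypair.pem ec2-user@ec2-203-0-113-10.eu-west-1.compute.amazonaws.com
print("Private key saved to", pem_path)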
