Building Secure and Reliable Systems
Total Page:16
File Type:pdf, Size:1020Kb
Building Secure & Reliable Systems Best Practices for Designing, Implementing and Maintaining Systems Compliments of Heather Adkins, Betsy Beyer, Paul Blankinship, Piotr Lewandowski, Ana Oprea & Adam Stubblefi eld Praise for Building Secure and Reliable Systems It is very hard to get practical advice on how to build and operate trustworthy infrastructure at the scale of billions of users. This book is the first to really capture the knowledge of some of the best security and reliability teams in the world, and while very few companies will need to operate at Google’s scale many engineers and operators can benefit from some of the hard-earned lessons on securing wide-flung distributed systems. This book is full of useful insights from cover to cover, and each example and anecdote is heavy with authenticity and the wisdom that comes from experimenting, failing and measuring real outcomes at scale. It is a must for anybody looking to build their systems the correct way from day one. —Alex Stamos, Director of the Stanford Internet Observatory and former CISO of Facebook and Yahoo This book is a rare treat for industry veterans and novices alike: instead of teaching information security as a discipline of its own, the authors offer hard-wrought and richly illustrated advice for building software and operations that actually stood the test of time. In doing so, they make a compelling case for reliability, usability, and security going hand-in-hand as the entirely inseparable underpinnings of good system design. —Michał Zalewski, VP of Security Engineering at Snap, Inc. and author of The Tangled Web and Silence on the Wire This is the “real world” that researchers talk about in their papers. —JP Aumasson, CEO at Teserakt and author of Serious Cryptography Google faces some of the toughest security challenges of any company, and they’re revealing their guiding security principles in this book. If you’re in SRE or security and curious as to how a hyperscaler builds security into their systems from design through operation, this book is worth studying. —Kelly Shortridge, VP of Product Strategy at Capsule8 If you’re responsible for operating or securing an internet service: caution! Google and others have made it look too easy. It’s not. I had the privilege of working with these book authors for many years and was constantly amazed at what they uncovered and their extreme measures to protect our users’ data. If you have such responsibilities yourself, or if you’re just trying to understand what it takes to protect services at scale in the modern world, study this book. Nothing is covered in detail—there are other references for that—but I don’t know anywhere else that you’ll find the breadth of pragmatic tips and frank discussion of tradeoffs. —Eric Grosse, former VP of Security Engineering at Google Building Secure and Reliable Systems Best Practices for Designing, Implementing, and Maintaining Systems Heather Adkins, Betsy Beyer, Paul Blankinship, Piotr Lewandowski, Ana Oprea, and Adam Stubblefield Beijing Boston Farnham Sebastopol Tokyo Building Secure and Reliable Systems by Heather Adkins, Betsy Beyer, Paul Blankinship, Piotr Lewandowski, Ana Oprea, and Adam Stubblefield Copyright © 2020 Google LLC. All rights reserved. Printed in the United States of America. Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472. O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (https://oreilly.com). For more information, contact our corporate/institu‐ tional sales department: 800-998-9938 or [email protected]. Acquisitions Editor: John Devins Indexer: WordCo, Inc. Development Editor: Virginia Wilson Interior Designer: David Futato Production Editor: Kristen Brown Cover Designer: Karen Montgomery Copyeditor: Rachel Head Illustrators: Jenny Bergman and Rebecca Demarest Proofreader: Sharon Wilkey March 2020: First Edition Revision History for the First Edition 2020-03-11: First Release See https://oreilly.com/catalog/errata.csp?isbn=9781492083122 for release details. The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Building Secure and Reliable Systems, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc. The views expressed in this work are those of the authors, and do not represent the publisher’s views or the views of the authors’ employer (Google). While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher, the authors, and Google disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights. This work is part of a collaboration between O’Reilly and Google. See our statement of editorial inde‐ pendence. 978-1-492-08313-9 [LSI] To Susanne, whose strategic project management and passion for reliability and security kept this book on track! Table of Contents Foreword by Royal Hansen. xix Foreword by Michael Wildpaner. xxiii Preface. xxv Part I. Introductory Material 1. The Intersection of Security and Reliability. 3 On Passwords and Power Drills 3 Reliability Versus Security: Design Considerations 4 Confidentiality, Integrity, Availability 5 Confidentiality 6 Integrity 6 Availability 6 Reliability and Security: Commonalities 7 Invisibility 7 Assessment 8 Simplicity 8 Evolution 9 Resilience 9 From Design to Production 10 Investigating Systems and Logging 11 Crisis Response 11 Recovery 12 Conclusion 13 vii 2. Understanding Adversaries. 15 Attacker Motivations 16 Attacker Profiles 18 Hobbyists 18 Vulnerability Researchers 18 Governments and Law Enforcement 19 Activists 21 Criminal Actors 22 Automation and Artificial Intelligence 24 Insiders 24 Attacker Methods 30 Threat Intelligence 30 Cyber Kill Chains™ 31 Tactics, Techniques, and Procedures 32 Risk Assessment Considerations 32 Conclusion 34 Part II. Designing Systems 3. Case Study: Safe Proxies. 37 Safe Proxies in Production Environments 37 Google Tool Proxy 40 Conclusion 42 4. Design Tradeoffs. 43 Design Objectives and Requirements 44 Feature Requirements 44 Nonfunctional Requirements 45 Features Versus Emergent Properties 45 Example: Google Design Document 47 Balancing Requirements 49 Example: Payment Processing 50 Managing Tensions and Aligning Goals 54 Example: Microservices and the Google Web Application Framework 54 Aligning Emergent-Property Requirements 55 Initial Velocity Versus Sustained Velocity 56 Conclusion 59 5. Design for Least Privilege. 61 Concepts and Terminology 62 Least Privilege 62 viii | Table of Contents Zero Trust Networking 62 Zero Touch 63 Classifying Access Based on Risk 63 Best Practices 65 Small Functional APIs 65 Breakglass 67 Auditing 68 Testing and Least Privilege 71 Diagnosing Access Denials 73 Graceful Failure and Breakglass Mechanisms 74 Worked Example: Configuration Distribution 74 POSIX API via OpenSSH 75 Software Update API 76 Custom OpenSSH ForceCommand 76 Custom HTTP Receiver (Sidecar) 77 Custom HTTP Receiver (In-Process) 77 Tradeoffs 77 A Policy Framework for Authentication and Authorization Decisions 78 Using Advanced Authorization Controls 79 Investing in a Widely Used Authorization Framework 80 Avoiding Potential Pitfalls 80 Advanced Controls 81 Multi-Party Authorization (MPA) 81 Three-Factor Authorization (3FA) 82 Business Justifications 84 Temporary Access 85 Proxies 85 Tradeoffs and Tensions 86 Increased Security Complexity 86 Impact on Collaboration and Company Culture 86 Quality Data and Systems That Impact Security 87 Impact on User Productivity 87 Impact on Developer Complexity 87 Conclusion 87 6. Design for Understandability. 89 Why Is Understandability Important? 90 System Invariants 91 Analyzing Invariants 92 Mental Models 93 Designing Understandable Systems 94 Complexity Versus Understandability 94 Table of Contents | ix Breaking Down Complexity 95 Centralized Responsibility for Security and Reliability Requirements 96 System Architecture 97 Understandable Interface Specifications 98 Understandable Identities, Authentication, and Access Control 100 Security Boundaries 105 Software Design 111 Using Application Frameworks for Service-Wide Requirements 112 Understanding Complex Data Flows 113 Considering API Usability 116 Conclusion 119 7. Design for a Changing Landscape. 121 Types of Security Changes 122 Designing Your Change 122 Architecture Decisions to Make Changes Easier 123 Keep Dependencies Up to Date and Rebuild Frequently 123 Release Frequently Using Automated Testing 124 Use Containers 124 Use Microservices 125 Different Changes: Different Speeds, Different Timelines 127 Short-Term Change: Zero-Day Vulnerability 129 Medium-Term Change: Improvement to Security Posture 132 Long-Term Change: External Demand 136 Complications: When Plans Change 138 Example: Growing Scope—Heartbleed 140 Conclusion 141 8. Design for Resilience. 143 Design Principles for Resilience 144 Defense in Depth 145 The Trojan Horse 145 Google App Engine Analysis 147 Controlling Degradation 150 Differentiate Costs of Failures 152 Deploy Response Mechanisms 154 Automate