Ultra-Large-Scale Sites

Ultra-large-scale Sites <working title> – Scalability, Availability and Performance in Social Media Sites
(picture from social network visualization?)
Walter Kriha
With a foreword by << >>

Walter Kriha, Scalability and Availability Aspects…, V.1.9.1 page 1 03/12/2010

Copyright
<<ISBN Number, Copyright, open access>>
©2010 Walter Kriha
This selection and arrangement of content is licensed under the Creative Commons Attribution License: http://creativecommons.org/licenses/by/3.0/
online: www.kriha.de/krihaorg/ ...
Building Scalable Social Media Sites by Walter Kriha is licensed under a Creative Commons Attribution 3.0 Germany License (http://creativecommons.org/licenses/by/3.0/de/).
Permissions beyond the scope of this license may be available at www.kriha.org.

Acknowledgements
<<master course, Todd Hoff/highscalability.com..>>

ToDo’s
- The role of ultra-fast networks (InfiniBand) in distributed algorithms and their behaviour with respect to failure models
- More on group behaviour (Clay Shirky etc.) for the first part, including modelling of social groups and data
- OpenSocial as a modelling example. Does it scale?
- Finish the chapter on popular sites and their architecture
- Explain alternative architectures better (spaces, queues)
- Cloud APIs (coming)
- Explain consensus algorithms for the lowest parts
- Failure models (empirical and theoretical, in connection with consensus algorithms)
- Practical part: ideas for monitoring, experiments, extending a site into a community site as an example, Darkstar/Wonderland scalability
- Feature management as a core technique (example: MMOGs)
- …and so on…
- Time in virtual machines
- The effect of virtual machines on distributed algorithms, e.g. consensus
- Modelling performance with Palladio
- Space-based architecture alternative
- Event-based frameworks (node.js / eventmachine) in I/O
- Client-side optimization hints
- Queuing with databases (http://www.slideshare.net/postwait/postgresql-meet-your-queue)
- Spanner: Google's next infrastructure, http://www.royans.net/arch/spanner-googles-next-massive-storage-and-computation-infrastructure
- CAP explanation: http://www.instapaper.com/text?u=http%3A%2F%2Fwww.julianbrowne.com%2Farticle%2Fviewer%2Fbrewers-cap-theorem
- Puppet config management: http://bitfieldconsulting.com/puppet-vs-chef
- Agile but extremely large systems configuration problems!

Foreword
<<by ?>>

Contents
Copyright
Acknowledgements
ToDo’s
Foreword
Introduction

Part I: Media, People and Distributed Systems
Media
Meaning across Space and Time
Partitioning
Social Media
Being digital, distributed and social
Short Digression: The fragile concept of ownership in digital times
Superstructures
Social Media and their Price
People – communicating, participating, collaborating
Coordination
Where is the Money?
Findability
Epidemics
Group Behavior
Social Graphs
Superstructures
The API Web – the Sensor Web – the Open Web?
Supersize Me – on network effects and endless growth
Security
Federated Access Control to Private Data
De-Anonymization of Private Data
Identity Spoofing in Social Networks
Scams
Bootstrapping a large community

Part II: Distributed Systems
Basics of Distributed Computing Systems
Remoteness, Concurrency and Interactions
Functions of distributed systems
Manifestation: Middleware and Programming Models
Theoretical Underpinnings
Topologies and Communication Styles
Classic Client/Server Computing
The Web Success Model
REST Architecture of the Web
Web2.0 and beyond
Web-Services and SOA
Peer networks
Distributed Hashtable Approaches
Bittorrent Example
Special Hierarchies
Compute Grids
Event-Driven Application Architectures and Systems
Reliability, Availability, Scalability, Performance (RASP)
Resilience and Dependability
Scalability
Availability
Concepts and Replication Topologies
Failure Modes and Detection
J2EE Clustering for Scalability and Availability
Reliability
Deployment
Reliability and Scalability Tradeoff in Replication Groups
Performance
Monitoring and Logging
Distribution in Media Applications
Storage Subsystems for HDTV media
Audio Server for Interactive Rooms
Distributed Rendering in 3DSMAX
Understanding the Rendering Network Components of 3dsMax
Using partitioning to speed things up

Part III: Ultra Large Scale Distributed Media Architectures
Analysis Framework
Examples of Large Scale Social Sites
Wikipedia
Myspace
Flickr
Facebook
PlentyOfFish
Twitter – “A short messaging layer for the internet (A.Payne)”
Digg
Google
YouTube
Amazon
LiveJournal Architecture
LavaBit E-mail Provider
Stack Overflow
Massively Multiplayer Online Games (MMOGs)
On Shards, Shattering and Parallel Worlds
Shard Architecture and visible partitioning
Shardless Architecture and Dynamic Reconfiguration
Feature and Social Management
Security in MMOGs
Methodologies in Building Large-Scale Sites
Limits in Hardware and Software – on prices, performance etc.
A History of Large Scale Site Technology
Growing Pains – How to start small and grow big
Feature Management
Patterns and Anti-Patterns of Scalability
Test and Deployment Methodology
Client-Side Optimizations
A Model for RASP in Large Scale Distribution
Canonical or Classic Site Architecture
Classic Document-Oriented Large Site Architecture (Wikipedia)
Message Queuing System (Twitter)
Social Data Distributor (Facebook)
Space-Based Programming
Queuing theory, OR
Basic Concepts
Applications of QT concepts in multi-tier Systems
Service Demand Reduction: Batching and Caching
Service Demand Reduction: Data-in-Index
Service Demand Measurements
The n-tier funnel architecture
Cost of slow machines in mid- or end-tier
Queue length and Residence Time
Output traffic shaping
The realism of Queuing Theory based Models for distributed systems
Request Processing: Asynchronous and/or fixed service time
Heterogeneous hardware and self-balancing algorithms
Dispatch in Multi-Queue Servers
Unfair Dispatch: Shortest Remaining Processing Time First
Request Design Alternatives
Heijunka
Tools for QT-Analysis
Applicability of QT in large-scale multi-tier architectures
Combinatorial Reliability and Availability Analysis
Stochastic Availability Analysis
Guerrilla Capacity Planning
Concurrency and Coherence
Calculation of contention and coherence parameters
Client Distribution over Day/Week/Year
Simulation
Tools for statistical analysis, queuing models and simulation
Architectural Principles and Metrics
Architectural Principles
Metrics
Changes in Perspective

Part IV: System Components
System Components for Distributed Media
Component Interaction and Hierarchy
Latency, Responsiveness and Distribution Architecture
Adaptations to media
Content Delivery Networks (CDN)
HA-Service Distributor
Distributed Load Balancers
Distributed Caching – not an Optimization
Caching and Application Architecture
Caching Strategies
When not to cache
Invalidation Events vs. Timeout
Operational Criticality
Pre-Loading of Caches
Local or distributed caches
Partitioning Schemes
Memory or Disk
Distribution of values
Granularity
Statistics
Size and Replacement Algorithms
Cache Hierarchies
Memcached
Fragment Architecture and Processor
Compression
Local or predictive processing
Search Engine Architecture and Integration
Special Web Servers (light-weight)
A pull based Web Server Design?
Scheduler and parallel Processor
High-availability failure detector and lock service
Buffering and compensation for networked audio
Data Center Architecture
Geographically Dispersed Data Centers and Topology
Scale-out vs. Scale-up
Data Stores
Requirements and Criteria
Virtualized storage
External Storage Sub-Systems
Grid-Storage/Distributed File Systems
Distributed Clustered Storage
ZFS
Database Partitioning and Sharding
Cache concepts with shards and partitions
Why Sharding is Bad
Social data examples and modeling
Partitioning concepts and consequences
