
Scalability and Replication

Marco Serafini

COMPSCI 532 Lecture 13

Scalability

• Ideal world
  • Linear scalability
• Reality
  • Bottlenecks
  • For example: a central coordinator
  • When do we stop scaling?

[Figure: speedup vs. parallelism, contrasting the ideal linear-scalability curve with the sub-linear curve seen in reality.]

Scalability

• Capacity of a system to improve performance by increasing the amount of resources available
• Typically, resources = processors
• Strong scaling
  • Fixed total problem size, more processors
• Weak scaling
  • Fixed per-processor problem size, more processors
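As a rough illustration (a toy model I am assuming for this sketch, not taken from the lecture), the following Python snippet contrasts strong and weak scaling for a hypothetical job with a perfectly parallel part and a central coordinator whose work grows with the number of processors, i.e., the kind of bottleneck mentioned above. Function names and constants are mine.

# Toy runtime model: parallel work divided across procs, plus a central
# coordinator that does a little work per processor (the bottleneck).
def runtime(total_work, procs, coord_cost=0.1):
    return total_work / procs + coord_cost * procs

def strong_scaling_speedup(total_work, procs):
    # Strong scaling: fixed total problem size, more processors.
    return runtime(total_work, 1) / runtime(total_work, procs)

def weak_scaling_efficiency(work_per_proc, procs):
    # Weak scaling: fixed per-processor problem size; ideal efficiency is 1.0.
    return runtime(work_per_proc, 1) / runtime(work_per_proc * procs, procs)

if __name__ == "__main__":
    for p in (1, 2, 4, 8, 16, 32, 64):
        print(p,
              round(strong_scaling_speedup(100.0, p), 2),
              round(weak_scaling_efficiency(10.0, p), 2))

With these made-up numbers, strong-scaling speedup flattens well below the processor count and weak-scaling efficiency drops as processors are added: the coordinator term is where we "stop scaling".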

Scaling Up and Out

• Scaling up
  • More powerful server (more cores, memory, disk)
  • Single server (or fixed number of servers)
• Scaling out
  • Larger number of servers
  • Constant resources per server

Scalability! But at what COST?

[Slide shows the first page of "Scalability! But at what COST?" by Frank McSherry (Unaffiliated), Michael Isard (Microsoft Research), and Derek G. Murray (Unaffiliated, later Google). The paper proposes COST, the Configuration that Outperforms a Single Thread: the hardware configuration a platform needs before it outperforms a competent single-threaded implementation. Its Figure 1 plots scaling and performance measurements for a data-parallel algorithm before (system A) and after (system B) a simple performance optimization: the unoptimized implementation "scales" far better, despite (or rather, because of) its poor absolute performance. The paper surveys systems from SOSP and OSDI and finds that many have a surprisingly large COST, often hundreds of cores, or never outperform one thread at all.]

What Does This Plot Tell You?

[Slide repeats the first page of the COST paper, drawing attention to Figure 1: speed-up and running time (seconds) vs. number of cores for systems A and B.]

How About Now?

[Slide shows the same page again; the point of Figure 1 is that system A's better-looking speed-up curve hides its worse absolute running time.]

COST

• Configuration that Outperforms a Single Thread (COST)
• The number of cores a scalable system needs before it outperforms a competent single-threaded implementation

[Slide shows material from the COST paper: Figure 5, the published scaling measurements for PageRank on the twitter_rv graph for GraphLab, GraphX, and Naiad, with single-threaded measurements drawn as horizontal lines. Read off these curves, Naiad has a COST of 16 cores, GraphLab a COST of 512 cores, and GraphX an unbounded COST (it never intersects the single-threaded measurement). The slide also includes Table 5, times for various connectivity algorithms:

  System                 cores   twitter   uk-2007-05
  GraphLab                128     242s       714s
  GraphX                  128     251s       800s
  Single thread (SSD)       1     153s       417s
  Union-Find (SSD)          1      15s        30s
]
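To make the metric concrete, here is a minimal sketch of how one could compute COST from scaling measurements: given (cores, seconds) pairs for the scalable system and the runtime of a single-threaded baseline, COST is the smallest configuration that beats the baseline, and it is unbounded if no configuration does. The function name and the numbers below are made up for illustration; they are not the paper's data.

def cost(system_measurements, single_thread_seconds):
    # system_measurements: list of (cores, seconds) for the scalable system.
    # Returns the smallest core count that outperforms the single-threaded
    # baseline, or None if the COST is unbounded.
    beating = [cores for cores, secs in sorted(system_measurements)
               if secs < single_thread_seconds]
    return beating[0] if beating else None

if __name__ == "__main__":
    measurements = [(1, 1200.0), (16, 140.0), (64, 60.0), (128, 35.0)]
    print(cost(measurements, single_thread_seconds=150.0))  # -> 16
    print(cost(measurements, single_thread_seconds=20.0))   # -> None (unbounded)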

Possible Reasons for High COST

• Restricted API
  • Limits algorithmic choice
  • Makes assumptions
    • MapReduce: no memory-resident state
    • Pregel: the program must be expressible as "think like a vertex"
  • BUT it also simplifies programming
• Lower-end nodes than the laptop used as the baseline
• Implementation adds overhead
  • Coordination
  • Cannot use application-specific optimizations

Why Not Just a Laptop?

• Capacity
  • Large datasets and complex computations don't fit on a laptop
• Simplicity, convenience
  • Nobody ever got fired for using Hadoop on a cluster
• Integration with the toolchain
  • Example: ETL → SQL → graph computation, all on Spark

Disclaimers

• Graph computation is peculiar
  • Some algorithms are computationally complex… even for small datasets
  • Good use case for single-server implementations
• Similar observations apply to machine learning

Replication

Replication

• Pros
  • Good for reads: can read any replica (if consistent)
  • Fault tolerance
• Cons
  • Bad for writes: must update multiple replicas
  • Coordination needed for consistency

Replication Protocol

• Mediates client-server communication
• Ideally, clients cannot "see" replication

[Diagram: a client talks to a replication agent; the replication agents of the different replicas run the replication protocol among themselves, and each agent sits in front of its replica.]

Consistency Properties

• Strong consistency
  • All operations take effect in some total order in every possible execution of the system
  • Linearizability: the total order respects real-time ordering
  • Sequential consistency: a total order is sufficient
• Weak consistency
  • We will talk about it in another lecture
• Many other semantics exist

What to Replicate?

• Read-only objects: trivial
• Read-write objects: harder
  • Need to deal with concurrent writes
  • Only the last write matters: previous writes are overwritten
• Read-modify-write objects: very hard
  • Current state is a function of the history of previous requests
• We consider deterministic objects
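A small illustration of why the last two cases differ (class names are mine, purely for illustration): for a read-write register, the latest write alone determines the state, so replicas only need to agree on which write is last; for a read-modify-write object such as an account with a conditional withdrawal, the final state depends on the order of the whole request history, so replicas must apply the same requests in the same order.

# Read-write register: only the last write matters (last-writer-wins).
class Register:
    def __init__(self):
        self.value, self.ts = None, 0
    def write(self, value, ts):
        if ts > self.ts:
            self.value, self.ts = value, ts

# Read-modify-write object: deterministic, but sensitive to request order.
class Account:
    def __init__(self, balance=0):
        self.balance = balance
    def withdraw(self, amount):
        if self.balance >= amount:
            self.balance -= amount
            return True
        return False

if __name__ == "__main__":
    a, b = Account(100), Account(100)
    for amount in (80, 50):   # order 1: withdraw 80, then 50
        a.withdraw(amount)
    for amount in (50, 80):   # order 2: withdraw 50, then 80
        b.withdraw(amount)
    # Different orders give different final states (20 vs. 50), which is why
    # read-modify-write replicas need a consistent sequence of inputs.
    print(a.balance, b.balance)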

Fault Assumptions

• Every fault-tolerant system is based on a fault assumption
• We assume that up to f replicas can fail (crash)
• The total number of replicas is determined based on f
• If the system has more than f failures, there are no guarantees

Synchrony Assumptions

• Consider the following scenario
  • Process s sends a message to process r and waits for a reply
  • The reply from r does not arrive at s before a timeout
  • Can s assume that r has crashed?
• We call a system asynchronous if we do not make this assumption
• Otherwise we call it (partially) synchronous
  • This is because we are making additional assumptions on the speed of round trips

Distributed Shared Memory (R/W)

• Simple case
  • 1 writer client, m reader clients
  • n replicas, up to f faulty ones
  • Asynchronous system
• Clients send messages to all n replicas and wait for n-f replies (otherwise they may hang forever waiting for crashed replicas)
• Q: How many replicas do we need to tolerate 1 fault?
• A: 2 replicas are not enough
  • The writer and the readers can only wait for 1 reply each (otherwise they block forever if a replica crashes)
  • The writer and a reader may therefore contact disjoint sets of replicas

Quorum Intersection

• To tolerate f faults, use n = 2f+1 replicas
• Writes and reads wait for replies from a set of n-f = f+1 replicas (i.e., a majority), called a majority quorum
• Two majority quorums always intersect (in at least one replica)!

[Diagram: the writer sends w(v) to all replicas and waits for n-f acks; a reader sends a read request to all replicas and waits for n-f replies, each carrying a replica's value v.]

Consistency Is Expensive

• Q: How to get linearizability?
• A: The reader needs to write back to a quorum

[Diagram of the protocol:
  Writer: (1) send w(v, t) to all replicas; (2) wait for n-f acks.
  Reader: (1) send a read request to all replicas; (2) wait for n-f replies (vi, ti) and pick the value vi with the maximum ti; (3) write back w(vi, ti) to all replicas; (4) wait for n-f acks, then return vi.
  Replicas set their value to v only if the timestamp t is greater than their current ti.]
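A minimal sketch of this quorum read/write scheme, with the n = 2f+1 replicas simulated in one process (class and function names are mine; a real implementation sends messages and waits asynchronously for n-f responses). The write tags the value with a timestamp; the read picks the highest-timestamped value among n-f replies and writes it back to a quorum before returning, which is what makes reads linearizable.

import random

F = 1
N = 2 * F + 1                                  # n = 2f + 1 replicas

class Replica:
    # In-memory stand-in for one replica of a single register.
    def __init__(self):
        self.value, self.ts = None, 0
    def store(self, value, ts):
        if ts > self.ts:                       # accept only newer timestamps
            self.value, self.ts = value, ts
        return "ack"
    def load(self):
        return (self.value, self.ts)

replicas = [Replica() for _ in range(N)]

def quorum():
    # Simulates "wait for n-f replies": any majority may be the one that answers.
    return random.sample(replicas, N - F)

def write(value, ts):
    for r in quorum():                         # send w(v, t), wait for n-f acks
        r.store(value, ts)

def read():
    replies = [r.load() for r in quorum()]     # read from n-f replicas
    value, ts = max(replies, key=lambda vt: vt[1])
    for r in quorum():                         # write back before returning, so
        r.store(value, ts)                     # later reads cannot see an older value
    return value

if __name__ == "__main__":
    write("v = 5", ts=1)
    print(read())                              # returns the latest written value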

Reference: Attiya, Bar-Noy, Dolev. "Sharing memory robustly in message-passing systems"

Why Write Back?

• We want to avoid the following scenario
• Assume the initial value is v = 4
• No valid total order that respects real-time order exists in this execution

[Diagram of the execution: while the writer's write(v = 5) is still in progress, Reader 1 reads v and obtains 5; afterwards Reader 2 reads v and obtains the old value 4, even though its read starts after Reader 1's read has completed.]

State Machine Replication (SMR)

• Handles read-modify-write objects
• Assumes a deterministic state machine
• Requires a consistent sequence of inputs (consensus)

[Diagram: concurrent client requests R1, R2, R3 go through consensus, which produces a consistent decision on a sequential execution order; every replica's state machine (SM) applies the same sequence and produces consistent outputs.]
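A small sketch of the idea (the state machine and request names are hypothetical): if every replica runs the same deterministic state machine and applies the same sequence of requests, as decided by consensus, all replicas produce identical outputs and end in identical states.

class BankStateMachine:
    # A deterministic read-modify-write state machine.
    def __init__(self):
        self.balance = 0
    def apply(self, request):
        op, amount = request
        if op == "deposit":
            self.balance += amount
        elif op == "withdraw" and self.balance >= amount:
            self.balance -= amount
        return self.balance                    # output returned to the client

# Concurrent client requests may arrive in different orders at different
# replicas, but consensus fixes one sequential execution order for everyone.
decided_order = [("deposit", 100), ("withdraw", 80), ("withdraw", 50)]

replicas = [BankStateMachine() for _ in range(3)]
outputs = [[sm.apply(req) for req in decided_order] for sm in replicas]

print(outputs)                                 # every replica: [100, 20, 20]
assert all(o == outputs[0] for o in outputs)   # consistent outputs and states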

Impossibility Result

• Fischer, Lynch, Paterson (FLP) result
  • "It is impossible to reach distributed consensus in an asynchronous system with one faulty process" (because fault detection cannot be accurate)
• Implication: practical consensus protocols are
  • Always safe: they never allow an inconsistent decision
  • Live (terminating) only in periods when additional synchrony assumptions hold; when these assumptions do not hold, the protocol may stall and make no progress

Leader Election

• Consider the following scenario
  • There are n replicas, of which up to f can fail
  • Each replica has a pre-defined unique ID
• Simple leader election protocol
  • Periodically, every T seconds, each replica sends a heartbeat to all other replicas
  • If a replica p does not receive a heartbeat from a replica r within T + D seconds of the last heartbeat from r, then p considers r faulty (D = maximum assumed message delay)
  • Each replica considers the non-faulty replica with the lowest ID to be the leader
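A rough sketch of this failure-detection rule (a local simulation, not a networked implementation; the class and method names are mine): each replica remembers when it last heard from every peer, marks a peer faulty once more than T + D seconds have passed, and picks the lowest-ID replica it still considers alive as leader.

import time

T = 1.0        # heartbeat period (seconds)
D = 0.5        # maximum assumed message delay (seconds)

class Replica:
    def __init__(self, replica_id, peer_ids):
        self.id = replica_id
        # Last time a heartbeat was received from each peer (initialized to "now").
        self.last_heartbeat = {p: time.monotonic() for p in peer_ids}

    def on_heartbeat(self, sender_id):
        self.last_heartbeat[sender_id] = time.monotonic()

    def suspected_faulty(self):
        now = time.monotonic()
        return {p for p, t in self.last_heartbeat.items() if now - t > T + D}

    def leader(self):
        alive = ({self.id} | set(self.last_heartbeat)) - self.suspected_faulty()
        return min(alive)          # non-faulty replica with the lowest ID

if __name__ == "__main__":
    r = Replica(2, peer_ids=[1, 3])
    print(r.leader())              # 1, while replica 1's heartbeats are fresh
    # If replica 1's heartbeats stop arriving for more than T + D seconds,
    # r.suspected_faulty() will contain 1 and r.leader() becomes 2.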

Eventual Single Leader Assumption

• Typically, the system respects the synchrony assumption
  • All heartbeats take at most D to arrive
  • All replicas elect the same leader
• In the remaining, asynchronous periods
  • Some heartbeats might take more than D to arrive
  • Replicas might disagree over who is faulty and who is not
  • Different replicas might see different leaders
• Eventually, all replicas see a single leader
  • Asynchronous periods are glitches that are limited in time

The Paxos Protocol

• Paxos is a consensus protocol
• All replicas start with their own proposal
  • In SMR, a proposal is a batch of requests the replica has received from clients, ordered according to the order in which the replica received them
• Eventually, all replicas decide on the same proposal
  • In SMR, this is the batch of requests to be executed next
• Paxos terminates when there is a single leader
  • The assumption is that eventually there will be a single leader
• Paxos potentially stalls when there are multiple leaders
  • But it prevents divergent decisions during these asynchronous periods

Paxos (Simplified)

• A newly elected leader picks a unique ballot number b; it has its own proposed value v
• Leader → replicas: read(b)
• Each replica
  • Promises not to accept messages with ballot < b
  • If it has previously accepted a proposal (vi, bi), it replies with (vi, bi)
  • If it has no prior accepted proposal, it replies with an ack
• Leader: waits for n-f replies
  • If some reply carries an accepted (vi, bi), it sets v to the vi with the highest bi
  • Sends proposal (v, b)
• Each replica: accepts (v, b) unless this breaks its promise; if it accepts, it replies with an ack
• Leader: waits for n-f acks, then decides on v and broadcasts the decision
• If progress gets stuck (not enough replies), the leader picks a larger ballot number and restarts the protocol
• Eventually, there will be a single leader with a large enough ballot number, which completes all the steps

Reference: L. Lamport. "Paxos made simple"

Properties

• Definition of a chosen proposal (v, b)
  • Accepted by a majority of replicas at a given point in time
• Proposal (v, b) decided by one replica → (v, b) was chosen at some point in time
• Invariant
  • Once (v, b) is chosen, future proposals (v', b') from different leaders such that b' > b have v' = v
  • Note that proposals from old leaders cannot overwrite the ones from newer leaders
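A compact sketch of this single-decree protocol, with all replicas simulated in one process (names such as Acceptor and propose are mine; leader election, networking, and ballot retries are omitted, so this is only an illustration of the two phases and of the invariant above, not a faithful Paxos implementation).

class Acceptor:
    def __init__(self):
        self.promised = -1          # highest ballot promised so far
        self.accepted = None        # previously accepted (value, ballot), or None

    def prepare(self, ballot):                       # "read(b)"
        if ballot > self.promised:
            self.promised = ballot                   # promise: ignore ballots < b
            return ("promise", self.accepted)
        return ("nack", None)

    def accept(self, value, ballot):                 # "proposal (v, b)"
        if ballot >= self.promised:                  # accept unless it breaks a promise
            self.promised = ballot
            self.accepted = (value, ballot)
            return "ack"
        return "nack"

def propose(acceptors, my_value, ballot, f):
    quorum = len(acceptors) - f                      # n-f = majority when n = 2f+1
    # Phase 1: read(b), collect promises and previously accepted proposals.
    replies = [a.prepare(ballot) for a in acceptors]
    promises = [acc for kind, acc in replies if kind == "promise"]
    if len(promises) < quorum:
        return None                                  # stuck: retry with a larger ballot
    accepted = [acc for acc in promises if acc is not None]
    # If some reply carries an accepted (vi, bi), adopt the vi with the highest bi.
    value = max(accepted, key=lambda vb: vb[1])[0] if accepted else my_value
    # Phase 2: send proposal (v, b), wait for n-f acks, then decide.
    acks = [a.accept(value, ballot) for a in acceptors].count("ack")
    return value if acks >= quorum else None

if __name__ == "__main__":
    acceptors = [Acceptor() for _ in range(3)]       # n = 3, f = 1
    print(propose(acceptors, "A", ballot=1, f=1))    # first leader decides "A"
    # A later leader with a higher ballot cannot overwrite the chosen value:
    print(propose(acceptors, "B", ballot=2, f=1))    # still decides "A"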

Typical Applications of Paxos

• State machine replication is hard
  • Hard to implement: consensus is only one of the problems
  • Writing deterministic applications on top of SMR is hard
• Typical approach: use a system that uses consensus
  • Storage systems
  • Coordination services that keep system metadata
    • The Google Chubby lock server uses Paxos
    • Apache ZooKeeper uses a variant of Paxos
    • ZooKeeper is used by Apache HBase, Kafka, …

Transactions

How About Multiple Objects?

• Transaction: read and modify multiple objects

begin txn
  write z = 2
  read x
  read y
  if x > y
    write y = x
    commit
  else
    abort
end txn

ACID Properties

• Guarantees of a storage system / DBMS
  • Atomicity: all or nothing
  • Consistency: respect application invariants (e.g., balance > 0)
  • Isolation: transactions run as if there were no concurrency
  • Durability: committed transactions are persisted
• "Consistency" here has a different meaning!
  • Consistency for single objects, as discussed earlier, corresponds to Isolation in the transactional setting

Isolation Levels

• Serializability: total order of transactions
• Strict serializability: total + real-time order
• Snapshot isolation
  • Reads are from consistent snapshots
  • Writes are only visible inside the transaction until commit
  • Abort if writes conflict
• Many others

Distributed Transactions

• Transactions on objects that live on different nodes
• Typically expensive
• Two-phase commit protocol
  • Voting (prepare) phase
    • The coordinator sends the query
    • Participants execute it and send their vote (commit or abort) back to the coordinator
  • Commit phase
    • The coordinator waits for replies from all participants
    • If all participants voted commit, the coordinator sends a commit request to the participants; otherwise it sends an abort request
    • Participants send an acknowledgement; the coordinator terminates the transaction
• Comments
  • Simplified description: logging to disk is abstracted away
  • Q: Is this fault tolerant?
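A minimal sketch of the two-phase commit message flow (an in-process simulation; logging to disk and timeouts are omitted, just as in the simplified description above, and the class and method names are mine).

class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):                       # voting phase: execute and vote
        self.state = "prepared" if self.can_commit else "aborted"
        return "commit" if self.can_commit else "abort"

    def finish(self, decision):              # commit phase: apply the coordinator's decision
        self.state = decision
        return "ack"

def two_phase_commit(participants):
    # Phase 1 (voting): the coordinator sends the query and collects votes.
    votes = [p.prepare() for p in participants]
    # Phase 2 (commit): commit only if *all* participants voted commit.
    decision = "commit" if all(v == "commit" for v in votes) else "abort"
    acks = [p.finish(decision) for p in participants]
    assert all(a == "ack" for a in acks)     # coordinator terminates the transaction
    return decision

if __name__ == "__main__":
    print(two_phase_commit([Participant("A"), Participant("B")]))                    # commit
    print(two_phase_commit([Participant("A"), Participant("B", can_commit=False)]))  # abort
    # Not fault tolerant as written: if the coordinator crashes between the two
    # phases, prepared participants block waiting for its decision.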