Advanced Architectures

• 15A. Distributed Computing
• 15B. Multi-Processor Systems
• 15C. Tightly Coupled Distributed Systems
• 15D. Loosely Coupled Distributed Systems
• 15E. Cloud Models
• 15F. Distributed Middle-Ware

Goals of Distributed Computing

• better services
  – scalability
    • apps too big to run on a single computer
    • grow system capacity to meet growing demand
  – improved reliability and availability
  – improved ease of use, reduced CapEx/OpEx
• new services
  – applications that span multiple system boundaries
  – global resource domains, services (vs. systems)
  – complete location transparency

Major Classes of Distributed Systems

• Symmetric Multi-Processors (SMP)
  – multiple CPUs, sharing memory and I/O devices
• Single-System Image (SSI) & Cluster Computing
  – a group of computers, acting like a single computer
• loosely coupled, horizontally scalable systems
  – coordinated, but relatively independent systems
• application level distributed computing
  – peer-to-peer, application level protocols
  – distributed middle-ware platforms

Evaluating Distributed Systems

• Performance
  – overhead, scalability, availability
• Functionality
  – adequacy and abstraction for target applications
• Transparency
  – compatibility with previous platforms
  – scope and degree of location independence
• Degree of Coupling
  – on how many things do distinct systems agree
  – how is that agreement achieved

SMP systems and goals

• Characterization:
  – multiple CPUs sharing memory and devices
• Motivations:
  – price performance (lower price per MIP)
  – scalability (economical way to build huge systems)
  – perfect application transparency
• Examples:
  – multi-core Intel CPUs
  – multi-socket mother boards

Symmetric Multi-Processors

[Figure: four CPUs (CPU 1-4), each with its own cache, plus an interrupt controller, sharing memory and several device controllers over common memory & device busses]

SMP Price/Performance

• a computer is much more than a CPU
  – mother-board, disks, controllers, power supplies, case
  – the CPU might be only 10-15% of the cost of the computer
• adding CPUs to a computer is very cost-effective
  – a second CPU yields cost of 1.1x, performance 1.9x
  – a third CPU yields cost of 1.2x, performance 2.7x
• the same argument also applies at the chip level
  – making a machine twice as fast is ever more difficult
  – adding more cores to the chip gets ever easier
• massive multi-processors are the obvious direction

SMP Operating System Design

• one processor boots with power on
  – it controls the starting of all other processors
• the same OS code runs in all processors
  – one physical copy in memory, shared by all CPUs
• each CPU has its own registers, cache, MMU
  – they must cooperatively share memory and devices
• ALL kernel operations must be Multi-Thread-Safe
  – protected by appropriate locks/semaphores
  – very fine grained locking to avoid contention (see the sketch below)
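The slides above say that every kernel operation must be multi-thread-safe, that locks can be maintained with atomic instructions, and that spin locks are acceptable only for very short critical sections. The fragment below is a minimal C11 sketch of that idea, not code from the course materials; the names spinlock_t, spin_lock, and run_queue are illustrative only.

```c
/* Minimal illustrative sketch: a spin lock built on an atomic test-and-set,
 * protecting one small, per-object critical section (fine-grained locking). */
#include <stdatomic.h>
#include <stdio.h>

typedef struct {
    atomic_flag locked;                 /* one word of shared memory           */
} spinlock_t;

typedef struct {
    spinlock_t lock;                    /* fine-grained: one lock per queue    */
    int        ready;                   /* e.g. count of ready processes       */
} run_queue;

static void spin_init(spinlock_t *l) {
    atomic_flag_clear(&l->locked);      /* start out unlocked                  */
}

static void spin_lock(spinlock_t *l) {
    /* Atomic test-and-set: keep spinning until the previous value was clear.
     * Acceptable only for VERY short critical sections, since a waiting CPU
     * burns cycles instead of doing useful work. */
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
        ;                               /* spin */
}

static void spin_unlock(spinlock_t *l) {
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

int main(void) {
    run_queue rq = { .ready = 0 };
    spin_init(&rq.lock);

    spin_lock(&rq.lock);                /* very short critical section         */
    rq.ready++;
    spin_unlock(&rq.lock);

    printf("ready processes: %d\n", rq.ready);
    return 0;
}
```

Because each run_queue carries its own lock, two CPUs only contend when they touch the same queue; that is what the slides mean by very fine grained locking.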
SMP Parallelism

• scheduling and load sharing
  – each CPU can be running a different process
  – just take the next ready process off the run-queue
  – processes run in parallel
  – most processes don't interact (other than in kernel)
• serialization
  – mutual exclusion achieved by locks in shared memory
  – locks can be maintained with atomic instructions
  – spin locks acceptable for VERY short critical sections
  – if a process blocks, that CPU finds the next ready process

The Challenge of SMP Performance

• scalability depends on memory contention
  – memory bandwidth is limited, can't handle all CPUs
  – most references are satisfied from the per-core cache
  – if too many requests go to memory, CPUs slow down
• scalability depends on lock contention
  – waiting for spin-locks wastes time
  – context switches waiting for kernel locks waste time
• contention wastes cycles, reduces throughput
  – 2 CPUs might deliver only 1.9x performance
  – 3 CPUs might deliver only 2.7x performance

Managing Memory Contention

• fast n-way memory is very expensive
  – without it, memory contention taxes performance
  – cost/complexity limits how many CPUs we can add
• Non-Uniform Memory Architectures (NUMA)
  – each CPU has its own memory
    • each CPU has a fast path to its own memory
  – connected by a Scalable Coherent Interconnect
    • a very fast, very local network between memories
    • accessing memory over the SCI may be 3-20x slower
  – these interconnects can be highly scalable

Non-Uniform Memory Architecture

[Figure: two (or more) symmetric multi-processor nodes (CPU n, CPU n+1), each with its own cache, local memory, PCI bridge/bus, and device controllers, joined by CC-NUMA interfaces over a Scalable Coherent Interconnect (e.g. Intel QuickPath Interconnect)]

OS design for NUMA systems

• it is all about local memory hit rates
  – every outside reference costs us 3-20x performance
  – we need a 75-95% hit rate just to break even
• how can the OS ensure high hit rates?
  – replicate shared code pages in each CPU's memory
  – assign processes to CPUs, allocate all memory there
  – migrate processes to achieve load balancing
  – spread kernel resources among all the CPUs
  – attempt to preferentially allocate local resources (see the sketch below)
  – migrate resource ownership to the CPU that is using it
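As a concrete illustration of "assign processes to CPUs, allocate all memory there", the sketch below uses Linux's libnuma from user space (link with -lnuma). It is an assumption-laden example rather than anything from the course materials: a real kernel applies this policy internally, not through this API, and the buffer size and variable names are arbitrary.

```c
/* A minimal user-level sketch (assuming Linux + libnuma): allocate a buffer on
 * the NUMA node of the CPU we are currently running on, so that references to
 * it stay local instead of crossing the interconnect. */
#define _GNU_SOURCE
#include <numa.h>      /* numa_available, numa_node_of_cpu, numa_alloc_onnode */
#include <sched.h>     /* sched_getcpu                                        */
#include <stdio.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support on this system\n");
        return 1;
    }

    int cpu  = sched_getcpu();          /* CPU this process is running on      */
    int node = numa_node_of_cpu(cpu);   /* that CPU's local memory node        */

    size_t len = 64 * 1024 * 1024;
    void *buf = numa_alloc_onnode(len, node);   /* local allocation: fast path */
    if (buf == NULL) {
        perror("numa_alloc_onnode");
        return 1;
    }
    printf("CPU %d: allocated %zu bytes on local node %d\n", cpu, len, node);

    /* ... use buf: these references hit local memory and avoid the 3-20x
     *     penalty of going across the Scalable Coherent Interconnect ...     */
    numa_free(buf, len);
    return 0;
}
```

Every reference to buf stays on the local node, which is exactly the high local hit rate the slide says the OS must engineer for.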
Single System Image (SSI) Clusters

• Characterization:
  – a group of seemingly independent computers collaborating to provide SMP-like transparency
• Motivation:
  – higher reliability and availability than SMP/NUMA
  – more scalable than SMP/NUMA
  – excellent application transparency
• Examples:
  – Locus, Microsoft Wolfpack, OpenSSI
  – Oracle Parallel Server

Modern Clustered Architecture

[Figure: redundant switches distribute incoming requests over an ethernet to four SMP systems (#1-#4) that share dual-ported RAID at the primary site; synchronous FC replication copies state to an optional back-up site for geographic fail-over]
Active systems service independent requests in parallel. They cooperate to maintain shared global locks, and are prepared to take over a partner's work in case of failure. State replication to a back-up site is handled by external mechanisms.

OS design for SSI clustering

• all nodes agree on the state of all OS resources
  – file systems, processes, devices, locks, IPC ports
  – any process can operate on any object, transparently
• they achieve this by exchanging messages
  – advising one another of all changes to resources
  – requesting execution of node-specific requests
• each OS's internal state mirrors the global state
  – node-specific requests are forwarded to the owning node
• the exchange of messages can be very expensive
• the implementation is large, complex, and difficult

SSI Clustered Performance

• clever implementation can minimize overhead
  – 10-20% overall is not uncommon, and it can be much worse
• complete transparency
  – even very complex applications "just work"
  – they do not have to be made "network aware"
• good robustness
  – when one node fails, the others notice and take over
  – often, applications won't even notice the failure
• nice for application developers and customers
  – but they are complex, and not particularly scalable

Lessons Learned

• consensus protocols are expensive
  – they converge slowly and scale poorly
• systems have a great many resources
  – resource change notifications are expensive
• location transparency encouraged non-locality
  – remote resource use is much more expensive
• a greatly complicated operating system
  – distributed objects are more complex to manage
  – complex optimizations to reduce the added overheads
  – new modes of failure w/complex recovery procedures
• Bottom Line: Deutsch was right!

Loosely Coupled Systems

• Characterization:
  – a parallel group of independent computers
  – serving similar but independent requests
  – minimal coordination and cooperation required
• Motivation:
  – scalability and price performance
  – availability (if the protocol permits stateless servers)
  – ease of management, reconfigurable capacity
• Examples:
  – web servers, the Google search farm, Hadoop

Horizontal Scalability w/HA

[Figure: a WAN-facing load-balancing switch w/fail-over distributes client requests across a farm of web servers and app servers, which share a content distribution tier and an HA database pair (D1, D2)]

Horizontal Scalability (elements of architecture)

• farm of independent servers
  – servers run the same software, serve different requests
  – may share a common back-end database
• front-ending switch
  – distributes incoming requests among available servers
  – can do both load balancing and fail-over
• service protocol
  – stateless servers and idempotent operations (see the sketch at the end of these notes)
  – successive requests may be sent to different servers

Horizontally scaled performance

• individual servers are very inexpensive
  – blade servers may be only $100-$200 each
• scalability is excellent
  – 100 servers deliver approximately 100x performance
• service availability is excellent
  – the front-end automatically bypasses failed servers
  – stateless servers and client retries make fail-over easy
• the challenge is managing thousands of servers
  – automated installation, global configuration services
  – self-monitoring, self-healing systems

Limited Transparency Clusters

Limited Location Transparency
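The horizontal-scalability slides above lean on two properties: servers are stateless and operations are idempotent, so a front-end can spread requests across the farm and simply retry on another server when one fails. The sketch below is a hypothetical illustration of that dispatch-and-retry logic (invented names, no real networking, not from the course materials).

```c
/* Minimal sketch of round-robin load balancing with fail-over. Because the
 * servers are stateless and the operation is idempotent, retrying the same
 * request on a different server is always safe. */
#include <stdbool.h>
#include <stdio.h>

#define NSERVERS 4

/* hypothetical health table: pretend server 2 has crashed */
static bool healthy[NSERVERS] = { true, true, false, true };

/* hypothetical back-end call: true on success, false on timeout/refusal */
static bool send_request(int server, const char *req) {
    if (!healthy[server])
        return false;
    printf("server %d handled \"%s\"\n", server, req);
    return true;
}

static bool dispatch(const char *req) {
    static int next = 0;
    for (int tries = 0; tries < NSERVERS; tries++) {
        int s = next;
        next = (next + 1) % NSERVERS;   /* spread load across the farm        */
        if (send_request(s, req))
            return true;                /* done                               */
        healthy[s] = false;             /* bypass the failed server from now on */
    }
    return false;                       /* the whole farm is unavailable      */
}

int main(void) {
    dispatch("GET /index.html");        /* successive calls may land on       */
    dispatch("GET /index.html");        /* different servers; the client      */
    dispatch("GET /index.html");        /* cannot tell the difference         */
    return 0;
}
```

Since a retried request is idempotent, it does not matter that the second attempt lands on a different server than the first; this is what lets the front-end bypass failed servers transparently.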
