HIGH-PERFORMANCE PDC (PARALLEL AND DISTRIBUTED COMPUTING)

Prasun Dewan Department of Computer Science University of North Carolina at Chapel Hill [email protected] GOAL

 High Performance PDC (Parallel and Distributed Computing)  Important for traditional problems when computers were slow  Back to the future with emergence of data science  Modern popular software abstractions:  OpenMP ( and Distributed Shared Memory) Late nineties-  MapReduce (Distributed Computing) 2004-  OpenMPI (~ Sockets, not covered)  Prins 633 course  Covers HPC PDC algorithms and OpenMP in detail.  Don Smith 590 Course  Covers MapReduce uses in detail  Connect OpenMP and MapReduce to  Each other (reduction)  Non HPC PDC  Research (with which I am familiar)  Derive the design and implementation of both kinds of abstractions  From similar but different performance issues that motivate them. 2 A TALE OF TWO DISTRIBUTION KINDS

Remotely Accessible Services (Printers, Desktops)

Replicated Repositories (Files, Databases) Differences between the Collaborative Applications two groups? (Games, Shared Desktops)

Distributed Sensing (Disaster Prediction)

Computation Distribution (Number, ) 3 PRIMARY/SECONDARY DISTRIBUTION REASON Primary Secondary Remotely Accessible Services (Printers, Desktops) Remote Service

Replicated Repositories (Files, Fault Tolerance, Databases) Availabilty

Collaborative

Applications Collaboration among Speedup (Games, Shared distributed users Desktops)

Distributed Sensing Aggregation of (Disaster Prediction) Distributed Data

Computation Remote Distribution High-Performance: (Number, Matrix service, Speedup Multiplication) aggregation,… 4 DIST. VS PARALLEL-DIST. EVOLUTION Non Remotely Accessible Distributed Services (Printers, Existing Desktops) Program

Replicated Repositories (Files, Databases) Distributed Distributed T T T T Collaborative Program Program Applications (Games, Shared Desktops) Non Distributed Non Distributed Distributed Sensing Existing T T (Disaster Prediction) Single-Thread Multi-Thread Program Program

Computation Distribution Distributed Distributed (Number, Matrix T Program T T Program T Multiplication) 5 CONCURRENCY-DISTRIBUTION RELATIONSHIP

Remotely Accessible Threads complement blocking IPC primitives by Services (Printers, improving responsiveness, removing deadlocks, Desktops) and allowing certain kinds of distributed applications that would otherwise not be Replicated possible Repositories (Files, e.g. Server waiting for messages from multiple Databases) blocking sockets, NIP selector threads sending read data to read thread, creating a separate Collaborative thread for incoming remote calls Applications (Games, Shared Desktops) Thread decomposition for speedup and replaced by process decomposition and can bee present in Distributed Sensing decomposed processes. (Disaster Prediction) e.g. Single-process: each row of matrix A multiplied with column of A by separate thread Computation Distribution e.g. Multi-process: Each row of matrix assigned to (Number, Matrix a process, which uses different threads to Multiplication) multiply it with different columns of matrix B. 6 ALGORITHMIC CHALLENGE

Remotely Accessible Services (Printers, Consistency: How to define and implement Desktops) correct coupling among distributed processes? Replicated Repositories (Files, Databases)

Collaborative Applications (Games, Shared Desktops)

Distributed Sensing (Disaster Prediction)

Computation How to parallelize/distribute single- Distribution thread algorithms? (Number, Matrix Multiplication) 7 CENTRAL MEDIATOR

Remotely Accessible Services (Printers, Desktops) Clients often talk to each through a central server whose code is Replicated unaware of specific arbitrary Repositories (Files, client/slave locations and ports Databases)

Collaborative Applications (Games, Shared Desktops)

Distributed Sensing (Disaster Prediction) A central master process/thread often decomposes problem and Computation combines results computed by slave Distribution agents, but decomposer knows (Number, Matrix about the nature of slaves, which are Multiplication) the service providers. 8 IMPLEMENTATION CHALLENGE

Remotely Accessible Services (Printers, How to add to single-process local Desktops) observable/observer, producer- consumer and synchronization Replicated relationships corresponding distributed Repositories (Files, observable/observer, producer- Databases) consumer and synchronization relationships? Collaborative Applications (Games, Shared Distributed separation of concerns! Desktops)

Distributed Sensing (Disaster Prediction)

Computation How to reuse existing single-thread Distribution code in multi-thread/multi-process (Number, Matrix program? Multiplication) 9 SECURITY ISSUES

Remotely Accessible Services (Printers, Communicating processes created Desktops) independently on typically geographically dispersed Replicated autonomous hosts, raising security Repositories (Files, issues Databases)

Collaborative Applications (Games, Shared Desktops)

Distributed Sensing Communicating threads/processes (Disaster Prediction) created on hosts/processors typically under control of one authority. Computation Though crowd problem solving is Distribution (Number, Matrix an infrequent exception: UW Condor – part of Cverse/XSEDE, Multiplication) 10 Wagstaff primes FAULT TOLERANCE VS SECURITY Byzantine and general Usually less autonomy security problems ∝ Fault Tolerance in HPC autonomy

Byzantine Fault Non-Byzantine Tolerance Fault Tolerance

Block chain Two-Phase Commit Paxos

Algorithm Message Loss, Manipulation Computer Failure 11 (Adversary) LIFETIME OF PROCESSES/THREADS

Remotely Accessible Services (Printers, Desktops)

Replicated Repositories (Files, Long-lived, processes Databases) need to be explicitly terminated Collaborative Applications (Games, Shared Desktops)

Distributed Sensing (Disaster Prediction)

Computation Short-lived, terminate Distribution when computation (Number, Matrix complete Multiplication) 12 FAULT TOLERANCE

Remotely Accessible Services (Printers, Faults can be long lived and Desktops) propagate as goal is to couple processes Replicated Repositories (Files, Databases)

Collaborative Applications (Games, Shared Desktops)

Distributed Sensing (Disaster Prediction) More independent work as goal is Computation to divide common work. Thus Distribution faults usually do not propagate. (Number, Matrix Short-term task can be simply Multiplication) completely or partially restarted. 13 COUPLING VS HIGH-PERFORMANCE

Coupling/Consistency High Performance

 Independent, “long-lived”  A single short-lived process processes at different created to perform some autonomous locations made computation made multi- dependent for resource threaded and/or distributed sharing, user collaboration, fault tolerance. to increase performance.  Include consistency algorithms,  Speedup algorithms that possibly in separate threads, to replace logic of existing define dependency that have code, with task distributed producer-consumer, decomposition observer relationship with existing algorithms  May involve a central mediating, distributing  May use additional mediating master code but it can be server and other infrastructure code unaware of specific aware of and creator of clients. specific slave processes  Division of labor between client  Division of labor among and infrastructure an issue master and slave and slave (centralized vs replicated) and slaves an issue 14 EXAMPLE: SERVICE + SPEEDUP

Comp 401 Slave Client Grader Assignment-level Server Master decomposition Slave Client1

TS GraderGrader Server TG Comp 533 Grader

Client2 Slave Master

Slave T1 Slave

4 Comp 533 4 533 T 533 T Test-level T3 T4 Grader T5 T6 decomposition Tester Tester T2 15 ABSTRACTIONS

Remotely Accessible Services (Printers, Desktops) Threads, IPC, Bounded Buffer Distributed Repositories (Files, Databases)

Collaborative Applications (Games, Shared Desktops)

Distributed Sensing (Disaster Prediction)

Computation Distribution Higher-level abstractions for different (Number, Matrix classes of speedup algorithms? Multiplication) 16 COMPLETE AUTOMATION?

Remotely Accessible Modules of Non-Distributed Program Services (Printers, Desktops) Distributed Transparently Loader contacts registry to Distributed determine if local module loaded or Repositories (Files, Databases) remote module accessed Assumes one name space, one Collaborative instance of each service – cannot Applications handle replication (Games, Shared Desktops) Motivates RPC transparency

Distributed Sensing (Disaster Prediction) Parallelizing compilers (Kuck and Kennedy) Computation Distribution Halting problem (Number, Matrix Multiplication) Motivates Declarative Abstractions 17 DECLARATIVE ABSTRACTIONS

Remotely Accessible Services (Printers, Desktops) Threads, IPC, Bounded Buffer Distributed Repositories (Files, Databases)

Collaborative Applications (Games, Shared Desktops)

Distributed Sensing (Disaster Prediction)

Computation Adding Declarative abstractions for Distribution different classes of speedup (Number, Matrix algorithms Multiplication) 18 DECLARATIVE VS IMPERATIVE: COMPLEMENTING

4.8 4.8 Declarative: Specify what we want 5.2 4.5 5.2 floats[] floats = {4.8f, 5.2f, 4.5f}; Type declaration 4.5 Procedural, Imperative : Implement what we want Functional, O-O, …

public static float sum(Float[] aList) { float retVal = (float) 0.0; for (int i = 0; i < aList.length; i++) { retVal += aList[i]; Loop } return retVal; }

Locality relevant to PDC abstractions - later 19 DECLARATIVE VS IMPERATIVE: COMPETING

Declarative: Specify what we want

(0|1)*1 Regular expression

Imperative: Implement what we want

0 1 1 FSA (Finite State Automata)

0

Consistency algorithms Ease of programming? are state machines 20 DECLARATIVE VS IMPERATIVE PDC: COMPETING AND COMPLEMENTING

Declarative: Specify what we want

??? For restricted classes of programs

Imperative: Implement what we want

Thread aThread = new Thread(aRunnable); aThread.start(); Concurrency?

aRegistry.rebind(Server.NAME, aServer) Distribution? Server aServer = (Server) aRegistry.lookup(Server.NAME)

21 DECLARATIVE VS IMPERATIVE CONCURRENCY: COMPETING AND COMPLEMENTING

Declarative: Specify what we want

Arguably easier to ??? For restricted classes program. of programs

Imperative: Implement what we want

Thread aThread = new Thread(aRunnable); aThread.start(); Concurrency?

22 PARALLEL RANDOM

0.6455074599676613 0.14361454218773773 Desired I/O

Declarative Concurrency Specification Called a Pragma or Directive Declarative Concurrency Aware //omp parallel threadNum(2) Code?

System.out.println(Math.random()); Imperative Concurrency- Unaware Code

23 OPENMP

 Language-independent much as RPC and Sockets  Standard implementations exist for:  C/C++  #pragma omp parallel num_threads(2)  FORTRAN  !$OMP PARALLEL NUM_THREADS(2)  No standard implementation for Java  OMP4 implements part of standard (Belohlavek undergrad thesis)  //omp parallel threadNum(2)  Used in programming examples  Like Java annotations associated with variable, class and method declarations  Associated with statements  Unified by set of statement attribute values:  Parallel (boolean)  Number of threads (int)  Syntax for describing them is similar  Only certain sets of attributes can be associated with an imperative statement  Depends on the kind of statement

24 OPENMP STATEMENT ATTRIBUTES (ABSTRACT)

Attribute Parameters Semantics Parallel Create multiple threads to execute attributed statement NumberOfThreads int Number of threads to be created automatically

25 WHAT PROCESSES THE PRAGMAS?

0.6455074599676613 0.14361454218773773

//omp parallel threadNum(2) Processed by what kind of program? System.out.println(Math.random());

26 PRE-COMPILER APPROACH

Source Program in (Concurrency- Unaware/Aware Native New Source Program in Language + Language Concurrency-Aware Concurrency-Aware New Precompiler Native Language Language)

Compiled Program Native (Optional) in Concurrency- Language Interpreter Aware Imperative Compiler Language

omp4j –s Random.java javac Random.java

java Random 27 IMPERATIVE PARALLEL ABSTRACTION?

Statement cannot be compiled individually { int nonStatementNonGlobal = 0; //omp parallel threadNum(2) Parallel method can encapsulate {nonStatementNonGlobal++;} statement in method and class } Interface of class?

Compile errors at runtime

//Thread.parallelomp parallel threadNum(2,” (2) Compilation cost at runtime System.out.println(Math.random()); Non global (stack) referenced variables outside statement scope and not visible “); directly to synthesized class

Need (pre) compiler approach! 28 JAVA PRECOMPILATION OF PARALLEL

Attributed statement block encapsulated in Runnable method and Runnable class { int nonStatementNonGlobal = 0; //omp parallel threadNum(2) Instead of statement run method of new {nonStatementNonGlobal++;} class instance executed threadNum times }

Creates a context class that has an instance Parent thread waits for all threads to finish variable for each non-statement and non- global (stack) variable referenced in statement Executes next statement Before run method executed non- statement, non-global variables in parent thread stack assigned to corresponding instance variables in context object After wait and before next statement instance variables copied back to corresponding statement-non local 29 variables in parent thread stack JAVA PRECOMPILATION OF PARALLEL Assuming non-stack { variables are shared by all int nonStatementNonGlobal = 0; threads, so a single context class OMPContext { for all threads public int local_nonStatementNonGlobal; } final OMPContext ompContext = new OMPContext(); ompContext.local_nonStatementNonGlobal = nonStatementNonGlobal; final org.omp4j.runtime.IOMPExecutor ompExecutor = new org.omp4j.runtime.DynamicExecutor(2); for (int ompI = 0; ompI < 2; ompI++) { ompExecutor.execute(new Runnable() { @Override public void run() { {ompContext.local_nonStatementNonGlobal++}; } }); } ompExecutor.waitForExecution(); nonStatementNonGlobal = ompContext.local_nonStatementNonGlobal; 30 } RELATIVE EXPRESSIBILITY OF TWO ABSTRACTIONS

Can A1 implement A2 ?

Regular Finite State Expressions Automata Theory of Computation Push Down Finite State Automata Automata

OS: Thread Monitors Semaphores Coordination

RMI + Threads NIO + Threads IPC 31 OTHER CONSIDERATIONS

Performance

Ease of Programming (Level)

Usually involves experimental data and arguments rather than proofs

32 EASE OF PROGRAMMING

{ { int nonStatementNonGlobal = 0; int nonStatementNonGlobal = 0; class OMPContext { //omp parallel threadNum(2) public int local_nonStatementNonGlobal{nonStatementNonGlobal; ++;} } } final OMPContext ompContext = new OMPContext(); ompContext.local_nonStatementNonGlobal = nonStatementNonGlobal; final org.omp4j.runtime.IOMPExecutor ompExecutor = new org.omp4j.runtime.DynamicExecutor(2); for (int ompI = 0; ompI < 2; ompI++) { ompExecutor.execute(new Runnable() { @Override public void run() { {ompContext.local_nonStatementNonGlobal++}; } }); } ompExecutor.waitForExecution(); nonStatementNonGlobal = ompContext.local_nonStatementNonGlobal; 33 } MIGRATING FROM C TO JAVA: CASE STUDY

Pthread_create confused with Unix fork

Java API and uses looked up Imperative

Thread start() confused with run()

Long debug time!

#pragma omp parallel num_threads(2) Declarative //omp parallel threadNum(2) 34 TRACING PARALLEL RANDOM

Thread[pool0.6455074599676613-1-thread-2,5,main]2,5,main]Thread[pool - 0.65017129573705580.143614542187737731-thread-1,5,main] 0.9310973090994396 I/O Thread[pool0.31647362613936514-1-thread-1,5,main] 0.6907459159093547

//omp parallel threadNum(2)

{ trace(Math.random()); } public static void trace(Object... anArgs) { System.out.print(Thread.currentThread()); for (Object anArg : anArgs) { System.out.print(" " + anArg); } System.out.println(); } 35 JAVA SOLUTION

Thread[pool-1-thread-2,5,main] 0.6501712957370558 Desired I/O Thread[pool-1-thread-1,5,main] 0.6907459159093547

//omp parallel threadNum(2) Trace may be called in a static method such as main. {synchronized{ (this) OpenMP designed for non OO { trace(trace(Math.randomMath.random());()); languages such as C and }} trace(Math.random()); FORTRAN } public static void trace(Object... anArgs) { System.out.print(Thread.currentThread()); for (Object anArg : anArgs) { System.out.print(" " + anArg); } System.out.println(); } 36 OMP DECLARATIVE SOLUTION

Thread[pool-1-thread-2,5,main] 0.6501712957370558 Thread[pool-1-thread-1,5,main] 0.6907459159093547

//omp parallel threadNum(2)

{ trace(//omp Math.randomcritical ()); } trace(Math.random()); } public static void trace(Object... anArgs) { System.out.print(Thread.currentThread()); for (Object anArg : anArgs) { System.out.print(" " + anArg); } System.out.println(); } 37 OPENMP STATEMENT ATTRIBUTES (ABSTRACT)

Attribute Parameters Semantics Parallel Create multiple threads to execute attributed statement NumberOfThreads int Number of threads to be created automatically Critical Make auto-threads execute attributed statement atomically

38 COMPLETE PROGRAM PARALLELISM

Thread[pool-1-thread-2,5,main] 0.6501712957370558 Thread[pool-1-thread-1,5,main] 0.6907459159093547 Serial + Parallel Program?

//omp parallel threadNum(2)

{ trace(//omp Math.randomcritical ()); } trace(Math.random()); } public static void trace(Object... anArgs) { System.out.print(Thread.currentThread()); for (Object anArg : anArgs) { System.out.print(" " + anArg); } System.out.println(); } 39 PARTIAL PARALLELISM

Thread[main,5,main]Thread[pool-1-thread -Forking2,5,main] Thread[pool0.6501712957370558-1-thread-2,5,main] 0.8112103632254872Thread[pool-1-thread-1,5,main] Thread[pool0.6907459159093547-1-thread-1,5,main] Serial + Parallel Program 0.7339312982272137 Thread[main,5,main] Joined Which thread(s) will print trace("Forking"); “Forking” and “Joined”? //omp parallel threadNum(2)

{ //omp critical trace(Math.random()); } trace("Joined”);

40 ABSTRACT FORK-JOIN

T0 Statement

0 T Fork(n) create thread1..threadn

T1 Make each thread execute Statement forked statement Tn

Make creating thread wait for T0 Join termination of thread1..threadn

0 T Statement More efficient thread creation?

41 EQUIVALENT, MORE EFFICIENT ABSTRACT FORK- JOIN

0 T Statement create thread1..threadn-1

T0 fork(n) Make all thread execute forked statement if it is going to join T0 Statement Make creating thread wait for Tn- termination of thread1..thread-1 1 Implementation requires Statement code to be replicated T0 Join in both the (in Java, Runnable) code executed by new threads and also in the code execute by 0 the existing thread T Statement;

Similar to Unix Fork-Join? 42 UNIX SINGLE PROCESS FORK-JOIN

N-Slaves? P0 Statement

0 System call, returns pid of new P childPidfork()= fork() process in parent and 0 in child

P0 Code executed by both Statement processes P1

P0 If (0 != childPid) P1 Wait for termination of join(childPid) P0 specified child process

43 UNIX MULTIPLE PROCESS FORK-JOIN for (i=1; i < n; 0 i++) P Procedural (error-prone) code if (0 != childPid)

Concept of fork-join at least as 0 P childPid = fork() old as Unix (60s)

P0 Statement Comparison with OpenMP Pn Parallel Attribute? -1

P0 if (0 != childPid) P1 Wait for termination of all child join() P0 processes

44 SAME CODE/INSTRUCTION, SAME DATA CONCURRENCY

T1 Same print random# data T2

45 SAME CODE/INSTRUCTION, SAME DATA CONCURRENCY

T1 print “hello Same or world” No data T2

46 RANGE OF SAME CODE/INSTRUCTION, SAME DATA CONCURRENCY?

T1 Same or Code C No data T2

System.out.println(isPrime(toInt(Math.random()));

System.out.println(“Hello World”); Range of realistic applications? Printing the same value or computing the value is not very useful

Cannot think of other examples Other patterns?

Same data forces some serialization 47 PARALLELIZATION CLASSES

Code/InstructionCode Data (Parts) SISD SameSame Same SIMD Same Different MISD Different Same MIMD Different Different

SISD straightforwardly supported using declarative primitives.

We cannot support all possible concurrencies with pure declarative primitives.

48 OPENMP STATEMENT ATTRIBUTES (ABSTRACT)

Attributec Parameters Semantics Parallel Create multiple threads to execute attributed statement, which are killed after execution of statement NumberOfThreads int Number of threads to be created automatically Critical Make auto-threads execute attributed statement atomically

Imperative primitives that can be added to these attributes to increase flexibility? 49 DIFFERENT CODE, SAME DATA CONCURRENCY

Number-List T1 Sum

Number-List T2 ToString

How to integrate alternation with fork-join Motivated by Unix? which requires a single piece of code?

50 UNIX INSPIRATION for (i=1; i < n; P0 i++) If (getPid() != childPid)

P0 childPid = fork() Task depends on Id

P0 Statement Pn -1

P0 If (getPid() != childPid) P1

join() P0

51 ID-AWARE IMPERATIVE CODE

T1 Number-List Sum

Sum or ToString

T2 Number-List ToString

Allow each member of a thread Different code/data can be sequence (declaratively created) executed/accessed by different execute an imperative step to threads based on their indices determine its index 52 THREADNUM EXAMPLE: DIFFERENT CODE SAME DATA public static void parallelSumAndToText(float[] aList) { // omp parallel threadNum(2) { if (OMP4J_THREAD_NUM == 0) { trace("Sum of rounded:" + sum(aList); } else { trace("ToText of rounded:" + toText(aList)); } } }

OMP4J_THREAD_NUM is a predefined runtime variable with a different value for each forked thread 53 OPENMP OPERATIONS AND STATEMENT ATTRIBUTES (ABSTRACT) Attributes Parameters Semantics Parallel Create multiple threads to execute attributed statement, which are killed after execution of statement NumberOfThreads int Number of threads to be created automatically Critical Make auto-threads execute attributed statement atomically

Operations Signature Semantics GetThreadNum int Returns index of thread created by Parallel pragma GetNumThreads int Returns number of threads created 54 by Parallel pragma DIFFERENT CODE, SAME DATA CONCURRENCY

T1 Code C1

Same or No data

T2 Code C2

55 SAME CODE, DIFFERENT DATA (PARTS) CONCURRENCY

Data T1 Float-List Round (Parts) 1

Data T2 Float-List Round (Parts) 2

Need to divide data among threads.

56 SAME CODE, DIFFERENT DATA (PARTS)

Thread-Aware T1 Float-List Round, Chooses Which Indices to Thread-Aware Process Based on Float-List Round, ThreadNumber, Number of T2 Threads and Loop Params

What do we need besides thread ID to do the division?

57 SINGLE-THREADED ROUND

public static void round(float[] aList) { trace("Round Started:" + Arrays.toString(aList)); for (int i = 0; i < aList.length; i++) { aList[i] = (float) Math.round(aList[i]); trace(aList[i]); } trace("Round Ended:" + Arrays.toString(aList)); }

float vs Float  Garbage collection Parallelized version

Put a compound statement for the body processIteration() passed these and precede it with a parallel pragma values and returns Boolean based on Body of loop is executed based on thread whether a thread should process an number, #threads, and loop parameters iteration 58 MULTI-THREADED ROUND

public static void parallelRound(float[] aList) { // omp parallel threadNum(2) { trace("Round Started:" + Arrays.toString(aList)); for (int i = 0; i < aList.length; i++) { if (processIteration(i, 0, 1, aList.length, OMP4J_THREAD_NUM, OMP4J_NUM_THREADS)) { aList[i] = (float) Math.round(aList[i]); trace(aList[i]); } } trace("Round Ended:" + Arrays.toString(aList)); } }

59 THREAD ASSIGNER

Interface boolean processIteration(int anIndex, int aStart, int aLimit, int aStepSize, int aThreadNum, int aNumThreads);

Different useful implementations of method (possibly chosen by a factory) possible and discussed later boolean processIteration (int aStart, int aLimit, int aStepSize, int aThreadNum, int aNumThreads) { return aNumThreads == 0 || (anIndex % aNumThreads == aThreadNum); }

60 ALTERNATE FOR

public static void parallelRound(float[] aList) { int i = 0; // omp parallel threadNum(2) for (;;) { if (processIteration(i, 0, aList.length, 1, OMP4J_THREAD_NUM, OMP4J_NUM_THREADS)) {

aList[i] = (float) Math.round(aList[i]); trace(aList[i]); } i++; if (i == aList.length) break; } processIteration} () parameters may not be explicit parameters of for construct

Declarative alternative will not have this property 61 TWO INDEPENDENT PROBLEMS

T1 Float-List Round

T2 Float-List Round

Number-List T1 Sum

Number-List T2 ToString 62 PIPELINED FUNCTIONS: SINGLE-THREAD

public static void roundSumAndToText (float[] aList) { round(aList); trace("Sum of rounded:" + sum(aList); trace("ToText of rounded:" + toText(aList)); }

Thread[main,5,main] Round Started:[4.8, 5.2, 4.5, 4.75, 4.7] Thread[main,5,main] Round Ended:[5.0, 5.0, 5.0, 5.0, 5.0] Thread[main,5,main] Sum of rounded:25.0 Thread[main,5,main] ToText of rounded: 5.0 5.0 5.0 5.0 5.0

63 PIPELINED: SEPARATE THREAD TEAMS

T1 Float-List Round

T2 Float-List Round

Number-List T3 Sum

Number-List T4 ToString 64 SEPARATE THREAD TEAM: STEP 1

public static void parallelRound(float[] aList) { // omp parallel threadNum(2) { trace("Round Started:" + Arrays.toString(aList)); for (int i = 0; i < aList.length; i++) { if (processIteration(i, 0, aList.length, 1, OMP4J_THREAD_NUM, OMP4J_NUM_THREADS)) { aList[i] = (float) Math.round(aList[i]); trace(aList[i]); } } trace("Round Ended:" + Arrays.toString(aList)); } }

65 SEPARATE THREAD TEAM: STEP 2

public static void parallelSumAndToText(float[] aList) { // omp parallel threadNum(2) if (OMP4J_THREAD_NUM == 0) { trace("Sum of rounded:" + sum(aList)); } else { trace("ToText of rounded:" + toText(aList)); } }

66 PIPELINED: SEPARATE THREAD TEAMS

T1 Float-List Round

T2 Float-List Round

Number-List T3 Sum More resource- efficient solution?

Number-List T4 ToString 67 PIPELINED: SAME THREAD TEAM

T1 Float-List Round

T2 Float-List Round

Number-List Sum

Number-List ToString 68 REMOVE PARALLEL CONSTRUCT FROM ROUND

public static void roundwithProcessIteration(float[] aList) { { trace("Round Started:" + Arrays.toString(aList)); for (int i = 0; i < aList.length; i++) { if (processIteration(i, 0, aList.length, 1, OMP4J_THREAD_NUM, OMP4J_NUM_THREADS)) { aList[i] = (float) Math.round(aList[i]); trace(aList[i]); } } trace("Round Ended:" + Arrays.toString(aList)); } }

69 SAME TEAM: STEP 1 AND 2 public static void parallelRoundAndSumToText(float[] aList) { // omp parallel threadNum(2) { roundWithProcessIteration(aList); if (OMP4J_THREAD_NUM == 0) { trace("Sum of rounded:" + sum(aList)); } else { trace("ToText of rounded:" + toText(aList)); } } } New declarative construct to fix problem?

Thread[pool-1-thread-1,5,main]Thread[pool-1-thread-2,5,main] Round Started:[4.8, 5.2, 4.5, 4.75, 4.7] Round Started:[4.8, 5.2, 4.5, 4.75, 4.7] Thread[pool-1-thread-1,5,main] Round Ended:[5.0, 5.2, 5.0, 4.75, 5.0] Thread[pool-1-thread-1,5,main] Sum of rounded:24.95 Thread[pool-1-thread-2,5,main] Round Ended:[5.0, 5.0, 5.0, 5.0, 5.0] 70 Thread[pool-1-thread-2,5,main] ToText of rounded: 5.0 5.0 5.0 5.0 5.0 BARRIER public static void parallelRoundAndSumToText(float[] aList) { // omp parallel threadNum(2) { roundWithProcessIteration(aList); // omp barrier if (OMP4J_THREAD_NUM == 0) { trace("Sum of rounded:" + sum(aList)); } else { trace("ToText of rounded:" + toText(aList)); } } }

Thread[pool-1-thread-1,5,main]Thread[pool-1-thread-2,5,main] Round Started:[4.8, 5.2, 4.5, 4.75, 4.7] Round Started:[4.8, 5.2, 4.5, 4.75, 4.7] Thread[pool-1-thread-1,5,main] Round Ended:[5.0, 5.2, 5.0, 4.75, 5.0] Thread[pool-1-thread-2,5,main] Round Ended:[5.0, 5.0, 5.0, 5.0, 5.0] Thread[pool-1-thread-2,5,main] ToText of rounded: 5.0 5.0 5.0 5.0 5.0 Thread[pool-1-thread-1,5,main] Sum of rounded:25.0 71 OPENMP STATEMENT ATTRIBUTES

Attributes Parameters Semantics Parallel Create multiple threads to execute attributed statement, which are killed after execution of statement NumberOfThreads int Number of threads to be created automatically Critical Make auto-threads execute attributed statement atomically Barrier Wait for each sibling created by active Parallel pragma to finish preceding statement before proceeding to next statement

More declarative constructs for patterns we have seen? 72 DECLARATIVE ALTERNATION? public static void parallelRoundAndSumToText(float[] aList) { // omp parallel threadNum(2) { roundWithProcessIteration(aList); // omp barrier if (OMP4J_THREAD_NUM == 0) { trace("Sum of rounded:" + sum(aList)); } else { trace("ToText of rounded:" + toText(aList)); } } }

Can we get rid of the if - use of imperative constructs in this method?

73 DECLARATIVE ALTERNATION public static void sectionRoundAndSumToText(float[] aList) { // omp parallel threadNum(2) { roundWithProcessIteration(aList); // omp barrier // omp sections { // omp section trace("Sum of rounded:" + sum(aList)); // omp section trace("ToText of rounded:" + toText(aList));} } } }

Sections can be executed In parallel rather than sequentially.

74 SEQUENTIAL VS CONCURRENT EXECUTION

Begin

T1 Statement1

T1 Statement2

T1 StatementN

End

75 SEQUENTIAL VS CONCURRENT EXECUTION

Dijsktra (1968) CoBegin

A statement/section is executed T1 Statement1 by a single rather than all threads

T2 Statement2

T1 StatementN Co-Begin can replace Begin if no state changes (Functional programming and eager evaluation) CoEnd

76 OPENMP STATEMENT ATTRIBUTES

Attributes Parameters Semantics Parallel Create multiple threads to execute attributed statement, which are killed after execution of statement NumberOfThreads int Number of threads to be created automatically Critical Make auto-threads execute attributed statement atomically Barrier Wait for each sibling created by Parallel pragma to finish preceding statement before proceeding to next statement Sections Co-Begin around attributed statement Section Allow attributed sub statement to be executed by any thread once 77 IMPERATIVE THREAD ASSIGNER

Interface boolean processIteration(int anIndex, int aStart, int aLimit, int aStepSize, int aThreadNum, int aNumThreads);

Different useful implementations of method (possibly chosen by a factory) possible and discussed later

Boolean processIteration( int aStart, int aLimit, int aStepSize, int aThreadNum, int aNumThreads) { return aNumThreads == 0 || (anIndex % aNumThreads == aThreadNum); }

Suppose the programmer does not care about how iterations are assigned to threads 78 DECLARATIVE PARALLEL ROUND?

boolean processIteration(int anIndex, int aStart, int aLimit, int aStepSize, int aThreadNum, int aNumThreads); public static void parallelRound(float[] aList) { // omp parallel threadNum(2) { for (int i = 0; i < aList.length; i++) { if (processIteration(i, 0, aList.length, 1, OMP4J_THREAD_NUM, OMP4J_NUM_THREADS)) { aList[i] = (float) Math.round(aList[i]); trace(aList[i]); } Can OMP automatically implement procssIteration and } call it based on a loop pragma? } } processIteration() is problem independent but dependent on loop 79 DECLARATIVE PARALLEL ROUND boolean processIteration(int anIndex, int aStart, int aLimit, int aStepSize, int aThreadNum, int aNumThreads);

public static void parallelRound(float[] aList) { // omp parallel for threadNum(2) for (int i = 0; i < aList.length; i++) { aList[i] = (float) Math.round(aList[i]); trace(aList[i]); } }

Conceptually, OpenMP calls processIteration(), actual implementation does not require each thread to determine if it should execute each iteration

Allow pragma to be associated with every loop?

80 DECLARATIVE PARALLEL ROUND boolean processIteration(int anIndex, int aStart, int aLimit, int aStepSize, int aThreadNum, int aNumThreads);

public static void parallelRound(float[] aList) { int i = 0; // omp parallel for threadNum(2) for (;;) { aList[i] = (float) Math.round(aList[i]); trace(aList[i]); i++; if (i == aList.length) break; } }

processIteration() cannot be automatically passed loop index and other parameters

OMP for parallelism needs counter-controlled loops: loop with an index variable, increment, start, and limit known at loop entry 81 OPENMP ATTRIBUTES QUALIFYING PARALLEL ATTRIBUTED LOOPS Attributes Parameters Semantics For Allow different iterations of a counter-controlled loop to execute in parallel based on properties of counter-controlled loops

82 SAME CODE, DIFFERENT DATA (PARTS) CONCURRENCY

Data T1 Float-List Round (Parts) 1

Data T2 Float-List Round (Parts) 2

How should iterations be automatically divided among threads?

83 OUR THREAD ASSIGNER

Interface boolean processIteration(int anIndex, int aStart, int aLimit, int aStepSize, int aThreadNum, int aNumThreads);

Different useful implementations of method (possibly chosen by a factory) possible and discussed later boolean processIteration (int aStart, int aLimit, int aStepSize, int aThreadNum, int aNumThreads) { return aNumThreads == 0 || (anIndex % aNumThreads == aThreadNum); }

84 CLOSEST DISTANCE APPROACH

T1 Float-List Round

T2 Float-List Round

85 FURTHEST DISTANCE DIVISION

T1 Float-List Round A0 A1 A2 A3 T2 Float-List Round

Comparison based on locality arguments? 86 LOCALITY GRANULARITIES

T1 A0

P1 A1 A0 A0 A2 A1 A1 A3 A2 A4

P2 A3 A5 A6 T2 A7

Processor Memory Disk Cache (1 Page)

87 CONFLICTING REPLICA CHANGES

T1 A0

P1 A1 A0 A0 A2 A1 A1 A3 A2 A4

P2 A3 A5 A6 T2 A7

Processor Memory Disk Cache (1 Page)

88 CONFLICTING REPLICA CHANGES

T1 A0

P1 A1 A0 A0 A2 A1 True sharing A1 A3 conflict A2 A4 A0 A3 A5 P2 A1 A6 T2 A7

Cache coherence Processor Memory algorithm guarantees Disk consistency (atomic Cache (1 Page) broadcast, two-phase 89 commit) CLOSEST DISTANCE DIVISION

T1 A0

P1 A1 A0 A0 A2 Consistency is A1 A1 maintained at the cache A3 block level A2 A4 A0 A3 A5 P2 A1 A6 T2 A7

False sharing! Processors Processor Memory writing to different parts Disk of a cache block execute a Cache (1 Page) cache coherence 90 algorithm FURTHEST DISTANCE DIVISION (T1’S PAGE)

T1 A0

P1 A1 A0 A0 A2 A1 A1 A3 A2 A4

P2 A3 A5 A6 T2 A7

Processor Memory Disk Cache (1 Page)

91 FURTHEST DISTANCE DIVISION (T2’S PAGE)

T1 A0

P1 A1 A0 A4 A2 A1 Page A5 A3 Fault! A6 A4 A4 A7 A5 P2 A5 A6 T2 A7

Processor Memory Disk Cache (1 Page)

92 FURTHEST DISTANCE DIVISION (T1’S PAGE)

T1 A0

P1 A1 A0 A0 A2 A1 Page A1 A3 Fault! A2 A4

P2 A3 A5 A6 T2 A7

Processor Memory Disk Cache (1 Page)

93 SAME CODE, DIFFERENT DATA (PARTS) CONCURRENCY

Data T1 Float-List Round (Parts) 1

Data T2 Float-List Round (Parts) 2

How should iterations be automatically divided among threads?

Support division parameters set by the programmer 94 MEMORY ACCESS MODEL

 Data moves from disk to main memory to per-processor cache.  Disk is larger and slower than main memory which is larger and slower than a cache.  Main memory is divided into units called pages.  A cache is divided into units called cache lines or blocks.  If data accessed by a process is not in main memory, an event called a page fault occurs, which is processed by writing some existing page in main memory to disk and loading the accessed data from disk to the freed saved page.  Parts of main memory pages accessed by a processor are loaded into cache lines and accessed from cache lines subsequently if they are loaded in cache lines.  Programs run faster if they have fewer transfers between disk and main memory and main memory and cache.

95 CACHE CONFLICTS

 Two processors can load cache lines holding the same region of main memory.  Concurrent writes to cache lines replicating the same memory blocks results in execution of a cache consistency algorithm a la two phase commit and atomic broadcast.  False sharing occurs from concurrent writes to replicas of the same memory region, and result in unnecessary consistency algorithm executions.  True sharing occurs from concurrent writes to same words in cache replicas of the same memory block– the overhead of consistency algorithms is necessary now.  Reading from cache lines does not cause these algorithms to be executed.  Fewer execution of concurrent consistency algorithms makes program faster. 96 CLOSE VS. FAR DISTANCE SHARING

 Far distance sharing can result in threads concurrently accessing different pages, and thus potentially causing more page faults depending on size of memory compared to size of data accessed.  Close distance sharing can result in more false sharing conflicts as threads access different addresses in cache blocks.  Moral: Need threads to work with on same pages but not write to overlapping loaded cache lines.  Reading from overlapping cache lines does not cause any problem.

97 SAME CODE, DIFFERENT DATA (PARTS) CONCURRENCY

Data T1 Float-List Round (Parts) 1

Data T2 Float-List Round (Parts) 2

How should iterations be automatically divided among threads?

Support division parameters set by the programmer 98 OPENMP ATTRIBUTES QUALIFYING PARALLEL ATTRIBUTED LOOPS Attributes Parameters Semantics For Allow different iterations of a counter-controlled loop to execute in parallel based on properties of counter-controlled loops Schedule (Static, Step) All iterations assigned at start of loop Step Parameter int Each thread assigned step number of consecutive iterations (chunk) at a time

99 STATIC (5) SCHEDULING

I1 I2 I3 I4 I5 I6 I7

T1 I1 I2 I3 I4 I5

T2 I6 I7

Bad load balancing

10 0 STATIC (1) SCHEDULING – CLOSE COUPLING

I1 I2 I3 I4 I5 I6 I7

T1 I1 I3 I5 I7

T2 I2 I4 I6

Ever reasonable given false sharing?

Adjacent reads do not conflict 10 1 STATIC (2) SCHEDULING

I1 I2 I3 I4 I5 I6 I7

T1 I1 I2 I5 I6

T2 I3 I4 I7

Balancing locality and load balancing 10 2 STATIC (2) SCHEDULING – VARIABLE WORK

I1 I2 I3 I4 I5 I6 I7

T1 I1 I2 I5 I6

T2 I3 I4 I7

T2 is Free and T1 is busy

Load imbalance! 10 3 OPENMP ATTRIBUTES QUALIFYING PARALLEL ATTRIBUTED LOOPS Attributes Parameters Semantics For Allow different iterations of a counter-controlled loop to execute in parallel based on properties of counter-controlled loops Schedule (Static/ Static: All iterations assigned at Dynamic, start of loop Step) Dynamic: New iterations assigned dynamically based on progress of previous iterations Step Parameter int Each thread assigned step number of consecutive iterations (chunk) in static and dynamic

10 4 DYNAMIC (2) SCHEDULE

I1 I2 I3 I4 I5 I6 I7

T1 I1 I2 I7

T2 I3 I4 I5 I6

10 5 OPENMP ATTRIBUTES QUALIFYING PARALLEL ATTRIBUTED LOOPS Attributes Parameters Semantics For Allow different iterations of a counter-controlled loop to execute in parallel based on properties of counter-controlled loops Schedule (Static/ Static: All iterations assigned at Dynamic, start of loop int) Dynamic: New iterations assigned dynamically based on progress of previous iterations Step Parameter int Each thread assigned step number of consecutive iterations (chunk) in static and dynamic

Why not always use dynamic? 10 6 STATIC VS DYNAMIC: VARIABLE ITERATION COST

Static Dynamic

 - No load balancing  + Load balancing  + No additional  - After each segment synchronization or must check a shared context switch data structure leading overhead after each to synchronization segment and possibly context switch (if multiple threads assigned to processor)

10 7 DYNAMIC (2) SCHEDULE: 3 THREADS

Solution: smaller step/chunk size?

I1 I2 I3 I4 I5 I6 I7 I8 I9 I10 I11 I12

Smaller step size leads to better load balancing T1 I1 I2 I7 I8

Larger step size leads to less T2 I3 I4 I9 I10 overhead

Variable step size? T3 I5 I6 I11 I12

T2 does I10 even though T1 and T3 are free! 10 8 OPENMP ATTRIBUTES QUALIFYING PARALLEL ATTRIBUTED LOOPS Attributes Parameters Semantics For Allow different iterations of a counter-controlled loop to execute in parallel based on properties of counter-controlled loops Schedule (Static/ Static: All iterations assigned at Dynamic/ start of loop Guided, Dynamic: New iterations assigned Step) dynamically based on progress of previous iterations Guided: Step size changes dynamically based on remaining iterations after each assignment Step Parameter int Each thread assigned step number of consecutive iterations (chunk) in static and dynamic and at least step number in guided 10 9 GUIDED (1) SCHEDULE Chunk size decreases dynamically ∝ Remaining Iterations/#Threads

Chunk Size at least specified size

I1 I2 I3 I4 I5 I6 I7 I8 I9 I10 I11 I12

12/3

T1 I1 I2 I3 I4

8/3 3/3

T2 I5 I6 I7 I10 I12

5/3 T3 I8 I9 I11

11 0 OPENMP ATTRIBUTES QUALIFYING PARALLEL ATTRIBUTED LOOPS Attributes Parameters Semantics For Allow different iterations of a counter-controlled loop to execute in parallel based on properties of counter-controlled loops Schedule (Static/ Static: All iterations assigned at Dynamic/ start of loop Guided, Dynamic: New iterations assigned Step) dynamically based on progress of previous iterations Guides: Step size changes dynamically based on remaining iterations after each assignment Step Parameter int Each thread assigned step number of consecutive iterations (chunk) in static and dynamic and at least step number in guided 11 1 DYNAMIC/GUIDED APPLICATION

Grader assignments and tests

Irregular algorithms in general

11 2 PARALLEL ROUND

public static void parallelRound(float[] aList) { // omp parallel for threadNum(2) schedule (static, 128) for (int i = 0; i < aList.length; i++) { aList[i] = (float) Math.round(aList[i]); trace(aList[i]); } }

Threads write to different locations

11 3 PARALLEL SUM?

public static void parallelSum(float[] aList) { float retVal = (float) 0.0; // omp parallel for threadNum(2) for (int i = 0; i < aList.length; i++) { retVal += aList[i]; } return retVal; }

Threads write to common variable

11 4 PARALLEL ATOMIC SUM

public static void parallelSum(float[] aList) { float retVal = (float) 0.0; // omp parallel for threadNum(2) for (int i = 0; i < aList.length; i++) { // omp critical retVal += aList[i]; } return retVal; }

Computations are serialized!

Alternative algorithm? 11 5 REDUCTION: DIVIDE AND CONQUER

2 + 3 + 4 + 6

(2 + 3 ) (4 + 6 ) Partial reduction

(5 + 10 ) Final reduction

11 6 MULTI-LEVEL PARALLEL REDUCTION OF LIST: SUM

Sum Number-List Sum1 Sum A1

Number-List A2 Sum A3 A4 Number-List Sum2 Sum

Final reduction: Replicate a Partial result computed reducible shared Reduced sum in Reduction: from thread- variable (but not its private variable Partial solution specific private value) for wait-free / of thread of problem, variables by one lockless reducing it or more threads coordination! 11 7 MULTI-LEVEL PARALLEL REDUCTION OF LIST: TOSTRING

String Number-List String1 ToString A1

Number-List A2 Sum A3 A4 Number-List String2 ToString

Number (abstract) + is commutative String + is not commutative and and associative associative 11 8 CHANGE HOW FOR MULTI-LEVEL REDUCTION?

public static void parallelSum(float[] aList) { float retVal = 0.0; // omp parallel for threadNum(2) for (int i = 0; i < aList.length; i++) { // omp critical retVal += aList[i]; } return retVal; }

11 9 2-LEVEL SUM REDUCTION WITH PARALLEL FOR public static void parallelSum(float[] aList) { float retVal = 0.0f; float[] aSums = {0.0f, 0.0f}; // omp parallel for threadNum(2)) for (int i = 0; i < aList.length; i++) { int aThreadNum = 0; aSums[OMP4J_THREAD_NUM] += aList[i]; } for (float aSum:aSums) { retVal += aSum; } return retVal; }

False sharing as all threads Cache issues in allocation of aSums? write to adjacent array slots! 12 0 2-LEVEL SUM REDUCTION WITH PARALLEL public static void parallelSum(float[] aList) { float retVal = 0.0f; // omp parallel private (aSum) { float aSum = 0.0f; // private to each thread for (int i = 0; i < aList.length; i++) { if (processIteration(i, 0, aList.length, 1, OMP4J_THREAD_NUM, OMP4J_NUM_THREADS)) { aSum += aList[i]; // replaced retVal += aList[i] } } // omp critical retVal += aSum; // second reduction } return retVal; }

12 1 OPENMP STATEMENT ATTRIBUTES

Attributes Parameters Semantics Parallel Create multiple threads to execute attributed statement, which are killed after execution of statement NumberOfThreads int Number of threads to be created automatically Critical Make auto-threads execute attributed statement atomically Barrier Wait for each sibling created by Parallel pragma to finish preceding statement before proceeding to next statement Sections Co-Begin around attributed statement Section Allow attributed sub statement to be executed by any thread once Private/Shared String Variable declared in statement 12 private or shared by threads that 2 execute it 2-LEVEL SUM REDUCTION WITH PARALLEL public static void parallelSum(float[] aList) { float retVal = 0.0f; // omp parallel private (aSum) { float aSum = 0.0f; // private to each thread for (int i = 0; i < aList.length; i++) { if (processIteration(i, 0, aList.length, 1, OMP4J_THREAD_NUM, OMP4J_NUM_THREADS)) { aSum += aList[i]; // replaced retVal += aList[i] } } // omp critical retVal += aSum; // second reduction } return retVal; Can we add more parameters to for parallel Product? } to create this automatically?

12 3 2-LEVEL PROD REDUCTION WITH PARALLEL public static void parallelSum(float[] aList) { float retVal = 1.0f; // omp parallel private (aSum) { float aProd = 0.0f; // private to each thread for (int i = 0; i < aList.length; i++) { if (processIteration(i, 0, aList.length, 1, OMP4J_THREAD_NUM, OMP4J_NUM_THREADS)) { aProd *= aList[i]; } } // omp critical retVal *= aProd; // second reduction } return retVal; Can we add more parameters to for parallel } to create this automatically?

12 4 CHANGE HOW FOR MULTI-LEVEL REDUCTION? public static void parallelSum(float[] aList) { float retVal = 0.0; // omp parallel for threadNum(2) reduction(+:retVal) for (int i = 0; i < aList.length; i++) { retVal += aList[i]; } return retVal; }

Private variable created before All occurrences of reduction variable attributed statement entered for replaced with private variable in for loop reduction variable (retVal)

Initialization of private variable? Reduction operator (+) repeatedly applied to private variables and resulting Private variable initialized to value reduced value stored in reduction dependent on reduction operator (+) variable 12 5 PARALLEL WORD COUNT

Words:[the, a, an, the, a] Desired I/O Counts:{the=2, a=2, an=1

public static void add (Map aMap, String aKey, Integer aValue) { Integer anOriginalValue = aMap.get(aKey); Integer vs int space issues? Integer aNewValue = aValue + (anOriginalValue == null?0:anOriginalValue); aMap.put(aKey, aNewValue); Each Integer addition } creates a new object! public static Map wordCount(String[] aWords) { Map aWordCount = new HashMap(); for (int i = 0; i < aWords.length; i++) { add (aWordCount, aWords[i], 1); Reducible for parallelism? } return aWordCount; } Both operands of reducible add should be of the same type 12 6 PARALLEL WORD COUNT

Words:[the, a, an, the, a] Before for loop, private Counts:{the=2, a=2, an=1 variable initialization

// omp declare reduction \ // (mapAdd:java.util.Map:omp_out=mapPlus(omp_out,omp_in)) \ // initializer(omp_priv= new java.util.HashMap()) public static void mapPlus (Map aMap, Map anAddition) { for (String aKey:anAddition.keySet()) { add(aMap, aKey,anAddition.get(aKey)); } } public static Map reducableWordCount(String[] aWords) { Map aWordCount = new HashMap(); // omp parallel for threadNum(aNumThreads) reduction(mapAdd:aWordCount) for (int i = 0; i < aWords.length; i++) { Map anAddedValue = new HashMap(); anAddedValue.put(aWords[i], 1); After for loop, second level reduction carried mapPlus(aWordCount, anAddedValue); out by calling mapPlus with the shared } variable being passed in the position of return aWordCount; } omp_out and private variable being called in12 position of omp_in 7 REDUCTION METHOD QUALIFICATION

// omp declare reduction \ // (:(,)

Before the for loop each private variable is assigned

is used in the for attribute set

The to which it is bound called after the for

Parameter mp_out is replaced with the for reduction variable and mp_in is replaced with the private variable created from it

12 8 OPENMP ATTRIBUTES QUALIFYING FOR AttributesATTRIBUTEDParametersLOOPSSemantics For Allow different iterations of a counter-controlled loop to execute in parallel based on properties of counter-controlled loops Schedule Static/ Static: All iterations assigned at Dynamic/ start of loop Guided, Step Dynamic: New iterations assigned dynamically based on progress of previous iterations Guides: Step size changes dynamically based on remaining iterations after each assignment Step Parameter int Each thread assigned step number of consecutive iterations (chunk) in static and dynamic and at least step number in guided Reduction String: Multi-level reduction using the String operator identified by 1st parameter of value stored by variable identified 12 by 2nd parameter 9 OPENMP STATEMENT ATTRIBUTES

Attributes Parameters Semantics Parallel Create multiple threads to execute attributed statement, which are killed after execution of statement NumberOfThreads int Number of threads to be created automatically Critical Make auto-threads execute attributed statement atomically Barrier Wait for each sibling created by Parallel pragma to finish preceding statement before proceeding to next statement Sections Co-Begin around attributed statement Section Allow attributed sub statement to be executed by any thread once How many threads to create? 13 0 + REDUCTION: DIVIDE AND CONQUER

A = + (a1, a2, a3 a4)

A1 = + (a1, a2) A2 = + (a3, a4) Partial reduction

A = + (A1, A2) Final reduction

13 1 * REDUCTION: DIVIDE AND CONQUER

A = * (a1, a2, a3 a4)

A1 = * (a1, a2) A2 = * (a3, a4) Partial reduction

A = * (A1, A2) Final reduction

13 2 MAPPLUS REDUCTION: DIVIDE AND CONQUER

A = mapPlus (a1, a2, a3 a4)

A1 = mapPlus (a1, a2) A2 = mapPlus (a3, a4) Partial reduction

A = mapPlus (A1, A2) Final reduction

13 3 OPENMP ATTRIBUTES QUALIFYING FOR AttributesATTRIBUTEDParametersLOOPSSemantics For Allow different iterations of a counter-controlled loop to execute in parallel based on properties of counter-controlled loops Schedule Static/ Static: All iterations assigned at Dynamic/ start of loop Guided, Step Dynamic: New iterations assigned dynamically based on progress of previous iterations Guides: Step size changes dynamically based on remaining iterations after each assignment Step Parameter int Each thread assigned step number of consecutive iterations (chunk) in static and dynamic and at least step number in guided Reduction String: Multi-level reduction using the String operator identified by 1st parameter of value stored by variable identified 13 by 2nd parameter 4 OPENMP STATEMENT ATTRIBUTES

Attributes Parameters Semantics Parallel Create multiple threads to execute attributed statement, which are killed after execution of statement NumberOfThreads int Number of threads to be created automatically Critical Make auto-threads execute attributed statement atomically Barrier Wait for each sibling created by Parallel pragma to finish preceding statement before proceeding to next statement Sections Co-Begin around attributed statement Section Allow attributed sub statement to be executed by any thread once How many threads to create? 13 5 DEFAULT VALUES

 NumberOfThreads  Runtime.getRuntime().availableProcessors()  Schedule  Static  Step Size  Static

 Number of iterations/number of threads

 Each thread gets one chunk  Dynamic/Guided

 1

13 6 HOW MANY THREADS?

P1 P2 P3 P4

Runtime.getRuntime().availableProcessors()

A1 T11 T12 T13 T14

Multiple applications? 13 7 MULTIPLE APPLICATIONS

P1 P2 P3 P4

Multi-Process Scheduling Policy?

A1 T11 T12 T13 T14 T15

A2 T21 T22 T23

A3 T31 T32 13 8 COMMON GLOBAL THREAD QUEUE

T22 TP111 TP221 TP313 TP142 T32

T13 Threads to be made current when time quantum allocated for current threads T23 expires (assuming no blocking)? T14

T15

A1 T11 T12 T13 T14 T15

A2 T21 T22 T23

A3 T31 T32 13 9 FIRST THREAD SWITCH

T2142 TP21121 TP23221 TTP13133 TP21432 T3215

131 T Does it matter how threads are mapped in the second T231 context switch?

T3114

T1152

A1 T11 T12 T13 T14 T15

A2 T21 T22 T23

A3 T31 T32 14 0 SECOND THREAD SWITCH: PROCESSOR AGNOSTIC

T2312 TP14111 TP21521 TTP11313 TP21412 T3212

T1313 Cache blocks of P1 and P2 may still hold data of T11 T2323 and T21 when they were scheduled last on them

T1314

T2315

A1 T11 T12 T13 T14 T15

A2 T21 T22 T23

A3 T31 T32 14 1 SECOND CONTEXT SWITCH: PROCESSOR AFFINITY

T2312 TP11111 TP22121 TTP14313 TP15142 T3212

T1313 Affinity-based: Schedule threads on processors on 2323 T which they were scheduled earlier

T1314

T2315

A1 T11 T12 T13 T14 T15

A2 T21 T22 T23

A3 T31 T32 14 2 PROCESSOR AGNOSTIC VS AFFINITY-BASED SCHEDULING

 Affinity-based can increase performance, if load does not get unbalanced, as in our scheme.  Other scheduling schemes that do affinity based scheduling (Linux) need a load balancing step

14 3 COMMON GLOBAL THREAD QUEUE

T22 TP111 TP221 TP313 TP142 T32

T13 Fair?

23 T Applications with more threads get more processor time

T14 Reason: Single queue T15

A1 T11 T12 T13 T14 T15

A2 T21 T22 T23

A3 T31 T32 14 4 APPLICATION-AND THREAD-LEVEL QUEUE

A2 T15 TP111 TP122 TP133 TP144

A3

A1 T11 T12 T13 T14 T15

A2 T21 T22 T23

A3 T31 T32 14 5 THREAD SWITCH

A2 T11 TP151 TP122 TP133 TP144

A3

Process Switch?

A1 T11 T12 T13 T14 T15

A2 T21 T22 T23

A3 T31 T32 14 6 APPLICATION/PROCESS SWITCH

A3 TP211 TP222 TP233 TP314

A1 How to use idle processor?

Threads of next application(s) can use processors not used by current application

Allocated time depends on position Fairness vs throughput? in application queue

A1 T11 T12 T13 T14 T15

A2 T21 T22 T23

A3 T31 T32 14 7 SINGLE-LEVEL VS MULTI-LEVEL QUEUE

 Single-level  Single queue of threads associated with a single time quantum t.  After t units, thread switch.  Applications with more threads get more processor time.  Multi-level  Level 1 queue of processes/applications with time quantum T  Level 2 queue with threads of current application with time quantum t.  Every t units threads in level-2 queue switched.  Every T units processes in level-1 queue switched, which results is new processes' threads to be loaded in second level queue and n threads of next processes’ in level 1 queue if (n = # of processors - #threads of current process) > 0.  The total time a process gets depends on its position in the queue, if it is behind a thread-stingy process, it gets time during its time quantum and also possibly time quantum of processes’ in front of it.  Can use processor agnostic or affinity based scheduling in both 14 8 BETTER USE OF CACHE THAN TIME SCHEDULING?

A3 TP211 TP222 TP233 TP314

A1 How to make sure thread always goes to the same processor to make better use of cache?

Divide processors rather than time among applications?

A1 T11 T12 T13 T14 T15

A2 T21 T22 T23

A3 T31 T32 14 9 SPACE SCHEDULING

TP111 TP122 TP231 TP314

T22 T22 T32

T32 T32

T13

A1 T11 T12 T13 T14 T15

A2 T21 T22 T23

A3 T31 T32 15 0 APPLICATIONS < #PROCESSORS

P1 P2 P3 P4

A1 A2 A3

A1

A2

A3

A4

15 1 APPLICATION = # PROCESSORS

P1 P2 P3 P4

A1 A2 A3 A4

A1

A2

A3

A4

15 2 APPLICATIONS > PROCESSORS

A5 P1 P2 P3 P4

A1 A2 A3 A4

A1

A2

3 Application wait A queue serviced when current A4 applications complete A5 15 3 SPACE SCHEDULING

 #processors - # applications put in application queue, others in wait queue.  Processors assigned to active applications with some applications getting more than others if they do not divide evenly.  An active application continues to use its processors until it completes, when it is replaced with applications in wait queue  Threads of an application use time multiplexing to share its applications

15 4 SPACE VS. TIME MULTIPLEXING

Space Time

 Throughput  Throughput  Better use of cache,  Developed when caches speedup is sublinear were small or non  Response Time existent  Applications in wait  Response times queue must wait until current applications  New small jobs go to finish front of queue and others  Critical mass go in serviced queue  Certain process teams  Critical mass: may not have critical  Applications can get number of processors, critical number of producer/consumer, but processors

15 5 UNCOORDINATED ATOMICITY-AWARE SCHEDULING

TPS111 TP122 TP231 TP314

TC22 T22 T32

Thread spinning on test T32 T32 and set does no work during its time quantum T13 as thread holding the lock is in the queue

A1 T11 T12 T13 T14 T15

A2 T21 T22 T23

A3 T31 T32 15 6 COORDINATED ATOMICITY-AWARE SCHEDULING

TPC111 TP122 TP231 TP314

TS22 T22 T32

T32 T32 Threads in critical region given more precedence T13 and those spinning given less precedence

A1 T11 T12 T13 T14 T15

A2 T21 T22 T23

A3 T31 T32 15 7 MORE COORDINATION: CONTROL THREADS?

[Figure: schedule with application queues A1: T11-T15, A2: T21-T23, A3: T31-T32]

Should the scheduler and the application coordinate on the number of threads?
Can an application go faster if it has fewer threads than it can use?
Given N processors, how many threads should be scheduled at one time?

158 SPEEDUP: THREAD VS. PROCESSORS

One active application with n threads, or two applications with n/2 processors?
Why is speedup not linear when there are enough processors?
Why does speedup go down when #threads > #processors?

Speedup is not linear because of thread-coordination overhead: the cost of splitting and combining work. It goes down when #threads > #processors because of cache invalidation, lack of coordination, and thread switching.

Keep # of threads = # processors
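A small, hedged illustration of this rule of thumb in Java: size a fixed worker pool to the processor count reported by the runtime (the class name and the trivial tasks are illustrative).

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolSizing {
    public static void main(String[] args) {
        // Match the number of worker threads to the number of processors,
        // so no worker needs to be time-sliced against another worker.
        int processors = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(processors);
        for (int i = 0; i < processors; i++) {
            final int id = i;
            pool.submit(() -> System.out.println("worker " + id + " running"));
        }
        pool.shutdown();
    }
}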

A. Tucker and A. Gupta. 1989. Process control and scheduling issues for multiprogrammed shared-memory multiprocessors. In Proceedings of the Twelfth ACM Symposium on Operating Systems Principles (SOSP '89). ACM, New York, NY, USA, 159-166. DOI: https://doi-org.libproxy.lib.unc.edu/10.1145/74850.74866

159 COORDINATED VS UNCOORDINATED ATOMICITY-AWARE SCHEDULING

 Uncoordinated
  The scheduler and the application are independent.
 Coordinated
  The application tells the scheduler whether it is spinning or in a critical section.
  Spinning, that is, continuously executing a test-and-set instruction, is necessary in multiprocessor scheduling to ensure atomic operations in the kernel. On a uniprocessor, interrupts can be disabled on the processor on which a thread is executing, preventing other threads from being scheduled on that processor and thus accessing some shared data structure. Disabling interrupts on one processor does not disable interrupts on other processors, which can schedule threads that access the shared data structure. Hence a thread spins, atomically polling some condition before entering a critical region (in the kernel), until the thread in the critical region resets the condition (see the spin-lock sketch below).
  Atomic access to user-level data structures can be ensured by implementing, in the kernel, blocking semaphores or monitors that use spinning to check the semaphore and monitor blocking conditions.
  If language/library abstractions support atomicity, then they can automatically inform the scheduler, which may be implemented by the language/library.
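A minimal sketch of the spinning referred to above, implemented as a test-and-set style spin lock over java.util.concurrent.atomic.AtomicBoolean; the class name is illustrative, and real kernels use the hardware test-and-set instruction directly.

import java.util.concurrent.atomic.AtomicBoolean;

class SpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    void lock() {
        // compareAndSet is the test-and-set: atomically flip false -> true.
        // The loop is the spinning the slide refers to; a coordinated
        // scheduler could be told that the thread is in this loop.
        while (!locked.compareAndSet(false, true)) {
            Thread.onSpinWait();   // hint to the runtime that we are busy-waiting
        }
    }

    void unlock() {
        locked.set(false);
    }
}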

160 COORDINATED VS UNCOORDINATED THREAD-CONTROLLING SCHEDULING

 Uncoordinated
  The application gets no information about the number of threads it should create.
 Coordinated
  The scheduler keeps #threads = #processors.
  The application gets a callback about the desired number of threads (see the sketch below).
  With dynamic scheduling, it can react to the callback by terminating some threads when they finish their current chunk, that is, when they reach their next safe point. Alternatively, the scheduler can forcibly terminate certain threads immediately, especially those spinning on locks.
  Moral: dynamic scheduling is better on multi-application computers!
  Two applications running n/2 threads each will go faster than one application running n threads, since speedup decreases with the number of threads.
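A hypothetical sketch of the scheduler-to-application callback described above; the interface and class names are illustrative, not a real scheduler API.

import java.util.concurrent.atomic.AtomicInteger;

interface ThreadCountCallback {
    void desiredThreadCount(int n);      // invoked by the (hypothetical) scheduler
}

class DynamicWorkerPool implements ThreadCountCallback {
    private final AtomicInteger target = new AtomicInteger(Integer.MAX_VALUE);
    private final AtomicInteger running = new AtomicInteger();

    @Override
    public void desiredThreadCount(int n) {
        target.set(n);                   // remember the scheduler's latest request
    }

    void workerStarted() {
        running.incrementAndGet();       // each worker calls this once when it begins
    }

    // Each worker calls this when it finishes its current chunk (its safe point);
    // if more workers are running than the scheduler wants, this one volunteers to stop.
    boolean shouldContinue() {
        if (running.get() > target.get()) {
            running.decrementAndGet();
            return false;
        }
        return true;
    }
}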

161 DIST. VS PARALLEL-DIST. EVOLUTION

[Recap figure: distribution vs. parallel-distribution evolution, from non-distributed single-thread and multi-thread programs to distributed programs, for remotely accessible services, replicated repositories, collaborative applications, distributed sensing, and computation distribution]

162

DISTRIBUTED OPENMP ARCHITECTURE

[Figure: distributed OpenMP architecture: input splitting into iterations, local and remote partial reductions (PR) at mapper/reducer-mapper (M/RM) processes, and a final reduction producing the output]

163 DISTRIBUTING OPENMP

The Fork-Join model requires the input and output data to start and end at one process/thread, the one containing the for loop.

Distributing it would need a software implementation of shared memory.

That is inconsistent with big data, where all the data may not fit in the memory of one process, and even if it does, that process becomes a bottleneck.

MapReduce provides a competitor for Fork-Join loop-based reduction (not sections, non-loop parallelism, ...).

164 MAPPLUS REDUCTION: DIVIDE AND CONQUER

A = mapPlus(a1, a2, a3, a4)

A1 = mapPlus(a1, a2)   A2 = mapPlus(a3, a4)   Partial reduction

A = mapPlus(A1, A2) = mapPlus({(k1, v1), (k2, v2)}, {(k1, v3), (k2, v4)})   Final reduction

Collection result! Further parallelism when collections are large?

165 MAPPLUS PARTITION: DIVIDE AND CONQUER

A = mapPlus ({(k1, v1), (k2, v2)}, {(k1, v3), (k2, v4)})

A1 = seqPlus({(k1, v1)}, {(k1, v3)})   A2 = seqPlus({(k2, v2)}, {(k2, v4)})   Split and partial reduction

A = mapPlus(A1, A2)   Final reduction

Values associated with corresponding elements (keys) can be reduced in parallel.

166 MAPREDUCE KEY IDEAS

Assume the data fits in a set of input files rather than in a single process.

Work is split among different processes by giving them different portions of the files, without first loading the data into one process.

Data-based division rather than loop-based division.

Work is combined by multiple processes that write directly to output files rather than sending results to one process.

167 MAPREDUCE COLLECTION-BASED DIVISION

External data structure: a file of text lines

In-memory data structure: (key, value) pairs (a Map)

Why a Map and not a Bean, Array, List, or Set?

A Map can simulate a Bean and any collection (array, list, set) and the PL equivalent of (relational) database tables.

Values associated with the same key can be reduced independently and thus in parallel (see the sketch below).
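A minimal sketch of the earlier mapPlus, assuming it simply adds the values bound to matching keys; the Java types and the use of HashMap are illustrative.

import java.util.HashMap;
import java.util.Map;

public class MapPlus {
    // Reduce two maps key-wise: values bound to the same key are added.
    // Values bound to different keys are independent, so they could be
    // reduced by different threads or processes in parallel.
    static Map<String, Integer> mapPlus(Map<String, Integer> a, Map<String, Integer> b) {
        Map<String, Integer> result = new HashMap<>(a);
        b.forEach((key, value) -> result.merge(key, value, Integer::sum));
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> m1 = Map.of("k1", 1, "k2", 2);
        Map<String, Integer> m2 = Map.of("k1", 3, "k2", 4);
        // Prints the per-key sums, e.g. {k1=4, k2=6} (order may vary).
        System.out.println(mapPlus(m1, m2));
    }
}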

168 DIVIDE AND CONQUER: ITERATOR BASED DIVISION

A = mapPlus(a1, a2, a3, a4)   Division based on the enumeration of keys provided by a for loop or collection iterator

A1 = mapPlus(a1, a2)   A2 = mapPlus(a3, a4)   Partial reduction

A = mapPlus(A1, A2)   Final reduction

Further parallelism when collections are large?

169 DIVIDE AND CONQUER: ITERATOR BASED DIVISION

A = mapPlus ({(k1, v1)}, {(k2, v2)}, {(k1, v3)}, {(k2, v4)}) Expanding the maps, making use of the fact that they are collections

A1 = mapPlus({(k1, v1)}, {(k2, v2)})   A2 = mapPlus({(k1, v3)}, {(k2, v4)})

A = mapPlus(A1, A2) = mapPlus({(k1, v1), (k2, v2)}, {(k1, v3), (k2, v4)})   Further parallelism when collections are large and values associated with different keys are independent?

{(k1, v1 + v3), (k2, v2 + v4)}

170 DIVIDE AND CONQUER: KEY BASED DIVISION

A = seqPlus({(k1, v1)}, {(k2, v2)}, {(k1, v3)}, {(k2, v4)})   Can ask different processes to partition or split the input sequence among key-based reducers

A1 = seqPlus({(k1, v1)}, {(k1, v3)})   A2 = seqPlus({(k2, v2)}, {(k2, v4)})   Plus of a key-value sequence rather than of maps

A = indepMapPlus(A1, A2) = {(k1, v1 + v3), (k2, v2 + v4)}   Simple combination, no computation; it can be done in the file system without sending the partially reduced maps to a process

{(k1, v1 + v3), (k2, v2 + v4)}

171

MAPREDUCE

[Figure: MapReduce architecture with inter-process input splitting, intra-process mapping (M), optional intra-process partial key-based reduction (PR), inter-process key-based shuffling, and final key-based reduction (R) writing the output]

Processes go to the data, or we do load balancing!

Mapper and final-reduction processes; optionally, partial-reduction processes.

Splitting of keys can be different for local and remote reducers as long as the partitions are disjoint.

172 MAP REDUCE ARCHITECTURE: WORD COUNT

The mapper ignores position and converts each input word into (word, 1). (Local partial and remote final) reducers convert input elements with the same key and multiple values into the key with the sum of the values.

Input lines: 1, (the a); 2, (a)

[Figure: word-count data flow: each mapper emits (the,1), (a,1), (an,1), ...; local partial reduction yields per-mapper counts such as (the,2), (a,2), (an,1); remote final reduction sums the counts per word]

Input lines: 3, (the); 4, (an)

173 MAP CLASS   (Loop parallelizable by OpenMP?)

public class WordCount {
  public static class Map extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();
    public void map(LongWritable key, Text value,
        OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
      String line = value.toString();
      StringTokenizer tokenizer = new StringTokenizer(line);
      while (tokenizer.hasMoreTokens()) {
        word.set(tokenizer.nextToken());
        output.collect(word, one);
      }
    }
  }

The mapper transforms input to reducible output. We are parallelizing/distributing a parameterized method rather than a scoped statement; the parameter types and output collection are infrastructure-provided.

174 REMOTE AND LOCAL REDUCE CLASS

  public static class Reduce extends MapReduceBase
      implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values,
        OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
      int sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();   // Loop parallelizable by OpenMP?
      }
      output.collect(key, new IntWritable(sum));
    }
  }

All values associated with a key are automatically collected; the reducer is an n-ary method that could use OpenMP to further reduce in parallel. The output is the set of final (key, value) pairs.

We are parallelizing/distributing a parameterized method rather than a scoped statement; the parameter types and output collection are infrastructure-provided.

175 MAP REDUCE VS. OPENMP CHALLENGES

OpenMP: Write problem from scratch and tell OpenMP about parallelization-relevant aspects

MapReduce: Transform problem into a Map and Reduce Method

Create an FSA for a regular expression

Convert given expression into another known regular expression using a transformation known to be regular expression preserving

Proving from scratch the halting problem is intractable

Reducing a known intractable problem to the given problem.

176 MAP REDUCE: MIN TEMP

The mapper ignores position and converts each input (city, temp) into (city, temp). Reducers convert input elements with the same key and multiple values into the key with the minimum of the values.

Input lines: 1, (A 70); 2, (A 60); 3, (B 55)

[Figure: min-temp data flow: per-mapper partial minima such as (A,60) and (B,55) are reduced remotely to the global minima (A,60) and (B,55)]

Input lines: 4, (B 70); 5, (A 90)

177 MAP REDUCE: SUM

The mapper ignores position and converts each input into (S, int). Reducers convert input elements with the same key and multiple values into the key with the sum of the values.

Input lines: 1, 5; 2, 3; 3, 3

[Figure: sum data flow: the mapper emits (S,5), (S,3), (S,3), (S,2), (S,1); partial sums (S,11) and (S,3) reduce to the final (S,14)]

Input line: 4, (2 1)

178 MAP REDUCE: ROUND

The mapper converts each (position, float) to (position, round(int)). Reducers get a single value for each key and simply output it.

Input lines: 1, 5.2; 2, 3.3; 3, 3.4; 4, (2.2 1.3)

[Figure: round data flow: the mapper emits (1.1,5), (2.1,3), (3.1,3), (4.1,2), (4.2,1), which pass through the reducers unchanged]

Assuming the output is, or can be, sorted by key.

179 MAP REDUCE: ROUND + SUM

The mapper converts each (position, float) to (S, round(int)). Reducers convert input elements with the same key and multiple values into the key with the sum of the values.

Input lines: 1, 5.2; 2, 3.3; 3, 3.4

[Figure: round + sum data flow: the mapper emits (S,5), (S,3), (S,3), (S,2), (S,1); partial sums (S,11) and (S,3) reduce to the final (S,14)]

Input line: 4, (2.2 1.3)

180 MAP REDUCE: COMMON FRIENDS

The mapper converts each S>F1..FN to {(S.F1, F1..FN), ..., (S.FN, F1..FN)} and normalizes each key so that its two components are in alphabetical order. Reducers convert input elements with the same key and multiple set values into the key with the intersection of the values.

Input lines: 1, A>B C; 2, C>A B

[Figure: common-friends data flow: pairs such as (A.B, {B,C}) and (A.B, {A,C,D}) reach the same reducer, which intersects the sets]

Input line: 3, B>A C D

To find the common friends of A and B, we need to associate each of their friend sets with the same key and have reducers intersect the two sets.

181 OPEN MP REDUCTION VS MAPREDUCE MODEL

OpenMP
 A binary commutative and associative reduce operation known to OpenMP.
 A counter-controlled loop known to OpenMP.
 An output variable in the forking thread, known to OpenMP, that is an operand of the operation on each iteration.
 An expression computed by each loop iteration that is the other operand of the operation in that iteration.

MapReduce
 An n-ary reduce function that reduces the list of values bound to a key to a single value, using a commutative and associative binary function unknown to the infrastructure.
 Input text files known to MapReduce.
 A map function known to MapReduce that converts each input line to (key, value) pairs processed by a reduce function.

182 OPEN MP REDUCTION VS MAPREDUCE

OpenMP
 Work divided among different threads/processes consists of reduction.
 OpenMP splits the work among multiple threads/processes based on the loop index.
 It feeds the results of the child threads/processes to one thread, which calls the reduction function to combine the reduced results.

MapReduce
 Work divided among different threads/processes consists of mapping and reduction.
 It splits the application's input lines among different mapper processes based on the location of the input files.
 It splits a mapper process's lines among different mapper threads, which call the mapper function.
 It optionally feeds the results of the mapper threads to reduction threads of the mapper process, which call the reduction function on non-overlapping keys.
 It splits the results of this reduction among different reduction processes, which are responsible for combining the values of non-overlapping sets of keys.
 It splits the keys received by a reduction process among different reduction threads, which directly write the output for each key.

183 MAP/REDUCE FUNCTIONS

 Mapper function:
  The infrastructure can call map in parallel on different input lines.
  It gets as input a line of text (the value parameter) and its offset in the input file (the key), which is like the index in a loop.
  It produces as output (key, value) pairs.
  Each invocation is free to determine how input is transformed to output.
 Reduce function:
  The infrastructure can call the reduce function in parallel with different keys.
  It takes as input a key and the list of map values produced for that key by different map invocations, which correspond to local result variables in OpenMP.
  It produces as output a single reduced value for that key.
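A framework-free sketch of the reduce contract, assuming a word-count-style sum; the names and types are illustrative, not Hadoop's. Because each key's value list is reduced independently, the per-key reductions could run in parallel.

import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ReduceContract {
    // Apply a per-key reduction to every (key, list-of-values) group.
    // Each group is independent, so the groups could be handed to
    // different threads or reducer processes.
    static Map<String, Integer> reduceByKey(Map<String, List<Integer>> grouped) {
        return grouped.entrySet().stream()
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        e -> e.getValue().stream().mapToInt(Integer::intValue).sum()));
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> grouped =
                Map.of("the", List.of(1, 1), "a", List.of(1, 1), "an", List.of(1));
        System.out.println(reduceByKey(grouped)); // e.g. {the=2, a=2, an=1} (order may vary)
    }
}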

184 MAP REDUCE ARCHITECTURE

 Mapper node and phase:
  Executes in parallel the map function on the lines assigned to it.
  Feeds the output pairs to the local reduce function, as an optional step.
  Sends each output pair produced by the map and optional reduce phases to the reducer that handles it.
 Reducer node:
  Bound to some target set of keys K1 to Kn.
  Different reducers handle different sets of keys.
  Executes the reduce function on the data received from remote mappers.
  The output of this function is written to one or more distributed files.

185 MAP REDUCE INFRASTRUCTURE

 Automation is provided based on the Map data structure rather than on a loop.
  A Map can simulate any collection, bean, or array.
 Provides a distributed file system in which different parts of the output can be placed from different locations.
 Automatically determines the number and location of mappers and reducers based on the input-file distribution.
  The process goes to the data rather than vice versa.
 Automatically splits the input data among the mappers.
 Automatically splits keys (as opposed to iterations) among the reducers.
  Computed results can be combined by multiple reducers.
 Reads the input data assigned to a mapper and feeds it into a programmer-defined map method.
 Transmits each output of this method to a local reducer and then to the remote reducer that handles it.
 Calls a programmer-defined reduce method in a remote reducer and supplies it with a key and the sequence of values for that key.
 Combines the output of the reduce methods.

186 MORE ON MAPREDUCE DETAILS

https://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html

187 RUNNING WORDCOUNT

//Run the application:

$ bin/hadoop jar /usr/joe/wordcount.jar mapreduce.WordCount /usr/joe/wordcount/input /usr/joe/wordcount/output

//Output:

$ bin/hadoop dfs -cat /usr/joe/wordcount/output/part-00000
the 2
an 1
a 2

188 COMPILING AND JARING

Usage: Assuming HADOOP_HOME is the root of the installation and HADOOP_VERSION is the installed Hadoop version, compile WordCount.java and create a jar:

$ mkdir wordcount_classes
$ javac -classpath ${HADOOP_HOME}/hadoop-${HADOOP_VERSION}-core.jar -d wordcount_classes WordCount.java
$ jar -cvf /usr/joe/wordcount.jar -C wordcount_classes/ .

189 CREATING INPUT

Assuming that:

/usr/joe/wordcount/input - input directory in HDFS
/usr/joe/wordcount/output - output directory in HDFS

Sample text files as input:

$ bin/hadoop dfs -ls /usr/joe/wordcount/input/
/usr/joe/wordcount/input/file01
/usr/joe/wordcount/input/file02

$ bin/hadoop dfs -cat /usr/joe/wordcount/input/file01
the, a, an
$ bin/hadoop dfs -cat /usr/joe/wordcount/input/file02
the,a

190 MAP CLASS

// Imports needed to compile (not shown on the slides); this code uses the
// old org.apache.hadoop.mapred API.
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class WordCount {
  public static class Map extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();
    public void map(LongWritable key, Text value,
        OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
      String line = value.toString();
      StringTokenizer tokenizer = new StringTokenizer(line);
      while (tokenizer.hasMoreTokens()) {
        word.set(tokenizer.nextToken());
        output.collect(word, one);
      }
    }
  }

191 REMOTE AND LOCAL REDUCE CLASS

  public static class Reduce extends MapReduceBase
      implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values,
        OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
      int sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();
      }
      output.collect(key, new IntWritable(sum));
    }
  }

192 MAIN CLASS

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(WordCount.class);
    conf.setJobName("wordcount");
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);
    conf.setMapperClass(Map.class);
    conf.setCombinerClass(Reduce.class);
    conf.setReducerClass(Reduce.class);
    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
  }
}

193

ASSIGNMENT

[Figure: assignment architecture: bounded-buffer-based input splitting, local partial reduction (PR) at mapper (M) threads, remote partial reduction (R) at registered clients, key-based splitting for the final remote reduction, and output after a barrier/join]

194 PARALLEL-DISTRIBUTED ASSIGNMENT

[Figure: example run: Words = [the, a, a, the, an]; mapper (M) and combiner/reducer (C/R) threads on the server and a dynamically registered client exchange (word, 1) pairs, with (null, null) as the end marker, producing Counts = {the=2, a=2, an=1}]

195 ASSIGNMENT: OPENMP + MAPREDUCE

Like MapReduce, it distributes computation among multiple processes and threads without assuming distributed shared memory.

Like OpenMP, it assumes a single master process is the source and sink of the input and output data, respectively.

Like MapReduce, it splits work based on data rather than on a loop, a la OpenMP dynamic scheduling (1), in which data items rather than loop indices are split for the first reduction, except that there is no incremental reduction, to ease implementation effort while allowing one message to be sent to remote clients for each sublist.

As in MapReduce, Map-based parallelism and distribution, and key-based splitting for the final reduction.

196 ASSIGNMENT: OPENMP + MAPREDUCE

The number of slave threads and the input to process are determined through interactive input handled by an MVC architecture.

Uses a bounded buffer of (key, value) pairs to distribute work to the slave threads of the master process, a la dynamic scheduling (see the sketch below).

Like MapReduce, it does key-based final reduction in parallel.

A slave thread can delegate work to a bound client registered dynamically with the server, so, like MapReduce, it distributes work.
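A minimal sketch of the bounded-buffer style of work distribution using java.util.concurrent; the class, buffer size, and poison-pill convention are illustrative, not the assignment's actual design.

import java.util.AbstractMap.SimpleEntry;
import java.util.Map.Entry;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BoundedBufferDemo {
    // Bounded buffer of (key, value) pairs shared by the master and the slaves.
    static final BlockingQueue<Entry<String, Integer>> buffer = new ArrayBlockingQueue<>(8);
    static final Entry<String, Integer> POISON = new SimpleEntry<>(null, null); // end marker

    public static void main(String[] args) throws InterruptedException {
        int slaves = 2;
        for (int i = 0; i < slaves; i++) {
            new Thread(BoundedBufferDemo::slave).start();
        }
        // The master produces work a la dynamic scheduling: whichever slave
        // is free takes the next pair.
        for (String word : new String[] {"the", "a", "a", "the", "an"}) {
            buffer.put(new SimpleEntry<>(word, 1));
        }
        for (int i = 0; i < slaves; i++) buffer.put(POISON);
    }

    static void slave() {
        try {
            for (Entry<String, Integer> pair = buffer.take(); pair != POISON; pair = buffer.take()) {
                System.out.println(Thread.currentThread().getName() + " reduces " + pair);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}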

197 THE END

198