What Is Parallel Computing?
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
2.5 Classification of Parallel Computers
52 // Architectures 2.5 Classification of Parallel Computers 2.5 Classification of Parallel Computers 2.5.1 Granularity In parallel computing, granularity means the amount of computation in relation to communication or synchronisation Periods of computation are typically separated from periods of communication by synchronization events. • fine level (same operations with different data) ◦ vector processors ◦ instruction level parallelism ◦ fine-grain parallelism: – Relatively small amounts of computational work are done between communication events – Low computation to communication ratio – Facilitates load balancing 53 // Architectures 2.5 Classification of Parallel Computers – Implies high communication overhead and less opportunity for per- formance enhancement – If granularity is too fine it is possible that the overhead required for communications and synchronization between tasks takes longer than the computation. • operation level (different operations simultaneously) • problem level (independent subtasks) ◦ coarse-grain parallelism: – Relatively large amounts of computational work are done between communication/synchronization events – High computation to communication ratio – Implies more opportunity for performance increase – Harder to load balance efficiently 54 // Architectures 2.5 Classification of Parallel Computers 2.5.2 Hardware: Pipelining (was used in supercomputers, e.g. Cray-1) In N elements in pipeline and for 8 element L clock cycles =) for calculation it would take L + N cycles; without pipeline L ∗ N cycles Example of good code for pipelineing: §doi =1 ,k ¤ z ( i ) =x ( i ) +y ( i ) end do ¦ 55 // Architectures 2.5 Classification of Parallel Computers Vector processors, fast vector operations (operations on arrays). Previous example good also for vector processor (vector addition) , but, e.g. recursion – hard to optimise for vector processors Example: IntelMMX – simple vector processor. -
Task Scheduling Balancing User Experience and Resource Utilization on Cloud
Rochester Institute of Technology RIT Scholar Works Theses 7-2019 Task Scheduling Balancing User Experience and Resource Utilization on Cloud Sultan Mira [email protected] Follow this and additional works at: https://scholarworks.rit.edu/theses Recommended Citation Mira, Sultan, "Task Scheduling Balancing User Experience and Resource Utilization on Cloud" (2019). Thesis. Rochester Institute of Technology. Accessed from This Thesis is brought to you for free and open access by RIT Scholar Works. It has been accepted for inclusion in Theses by an authorized administrator of RIT Scholar Works. For more information, please contact [email protected]. Task Scheduling Balancing User Experience and Resource Utilization on Cloud by Sultan Mira A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science in Software Engineering Supervised by Dr. Yi Wang Department of Software Engineering B. Thomas Golisano College of Computing and Information Sciences Rochester Institute of Technology Rochester, New York July 2019 ii The thesis “Task Scheduling Balancing User Experience and Resource Utilization on Cloud” by Sultan Mira has been examined and approved by the following Examination Committee: Dr. Yi Wang Assistant Professor Thesis Committee Chair Dr. Pradeep Murukannaiah Assistant Professor Dr. Christian Newman Assistant Professor iii Dedication I dedicate this work to my family, loved ones, and everyone who has helped me get to where I am. iv Abstract Task Scheduling Balancing User Experience and Resource Utilization on Cloud Sultan Mira Supervising Professor: Dr. Yi Wang Cloud computing has been gaining undeniable popularity over the last few years. Among many techniques enabling cloud computing, task scheduling plays a critical role in both efficient resource utilization for cloud service providers and providing an excellent user ex- perience to the clients. -
Chapter 5 Multiprocessors and Thread-Level Parallelism
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 5 Multiprocessors and Thread-Level Parallelism Copyright © 2012, Elsevier Inc. All rights reserved. 1 Contents 1. Introduction 2. Centralized SMA – shared memory architecture 3. Performance of SMA 4. DMA – distributed memory architecture 5. Synchronization 6. Models of Consistency Copyright © 2012, Elsevier Inc. All rights reserved. 2 1. Introduction. Why multiprocessors? Need for more computing power Data intensive applications Utility computing requires powerful processors Several ways to increase processor performance Increased clock rate limited ability Architectural ILP, CPI – increasingly more difficult Multi-processor, multi-core systems more feasible based on current technologies Advantages of multiprocessors and multi-core Replication rather than unique design. Copyright © 2012, Elsevier Inc. All rights reserved. 3 Introduction Multiprocessor types Symmetric multiprocessors (SMP) Share single memory with uniform memory access/latency (UMA) Small number of cores Distributed shared memory (DSM) Memory distributed among processors. Non-uniform memory access/latency (NUMA) Processors connected via direct (switched) and non-direct (multi- hop) interconnection networks Copyright © 2012, Elsevier Inc. All rights reserved. 4 Important ideas Technology drives the solutions. Multi-cores have altered the game!! Thread-level parallelism (TLP) vs ILP. Computing and communication deeply intertwined. Write serialization exploits broadcast communication -
Cloud Computing and Internet of Things: Issues and Developments
Proceedings of the World Congress on Engineering 2018 Vol I WCE 2018, July 4-6, 2018, London, U.K. Cloud Computing and Internet of Things: Issues and Developments Isaac Odun-Ayo, Member, IAENG, Chinonso Okereke, and Hope Orovwode Abstract—Cloud computing is a pervasive paradigm that is allows access to recent technologies and it enables growing by the day. Various service types are gaining increased enterprises to focus on core activities, instead programming importance. Internet of things is a technology that is and infrastructure. The services provided include Software- developing. It allows connectivity of both smart and dumb as-a-Service (SaaS), Platform-as–a–Service (PaaS) and systems over the internet. Cloud computing will continue to be Infrastructure–as–a–Services (IaaS). SaaS provides software relevant to IoT because of scalable services available on the cloud. Cloud computing is the need for users to procure servers, applications over the Internet and it is also known as web storage, and applications. These services can be paid for and service [2]. utilized using the various cloud service providers. Clearly, IoT Cloud users can access such applications anytime, which is expected to connect everything to everyone, requires anywhere either on their personal computers or on mobile not only connectivity but large storage that can be made systems. In PaaS, the cloud service provider makes it available either through on-premise or off-premise cloud possible for users to deploy applications using application facility. On the other hand, events in the cloud and IoT are programming interfaces (APIs), web portals or gateways dynamic. -
Resource Access Control Facility (RACF) General User's Guide
----- --- SC28-1341-3 - -. Resource Access Control Facility -------_.----- - - --- (RACF) General User's Guide I Production of this Book This book was prepared and formatted using the BookMaster® document markup language. Fourth Edition (December, 1988) This is a major revision of, and obsoletes, SC28-1341- 3and Technical Newsletter SN28-l2l8. See the Summary of Changes following the Edition Notice for a summary of the changes made to this manual. Technical changes or additions to the text and illustrations are indicated by a vertical line to the left of the change. This edition applies to Version 1 Releases 8.1 and 8.2 of the program product RACF (Resource Access Control Facility) Program Number 5740-XXH, and to all subsequent versions until otherwise indicated in new editions or Technical Newsletters. Changes are made periodically to the information herein; before using this publication in connection with the operation of IBM systems, consult the latest IBM Systemj370 Bibliography, GC20-0001, for the editions that are applicable and current. References in this publication to IBM products or services do not imply that IBM· intends to make these available in all countries in which IBM operates. Any reference to an IBM product in this publication is not intended to state or imply that only IBM's product may be used. Any functionally equivalent product may be used instead. This statement does not expressly or implicitly waive any intellectual property right IBM may hold in any product mentioned herein. Publications are not stocked at the address given below. Requests for IBM publications should be made to your IBM representative or to the IBM branch office serving your locality. -
Scalable and Distributed Deep Learning (DL): Co-Design MPI Runtimes and DL Frameworks
Scalable and Distributed Deep Learning (DL): Co-Design MPI Runtimes and DL Frameworks OSU Booth Talk (SC ’19) Ammar Ahmad Awan [email protected] Network Based Computing Laboratory Dept. of Computer Science and Engineering The Ohio State University Agenda • Introduction – Deep Learning Trends – CPUs and GPUs for Deep Learning – Message Passing Interface (MPI) • Research Challenges: Exploiting HPC for Deep Learning • Proposed Solutions • Conclusion Network Based Computing Laboratory OSU Booth - SC ‘18 High-Performance Deep Learning 2 Understanding the Deep Learning Resurgence • Deep Learning (DL) is a sub-set of Machine Learning (ML) – Perhaps, the most revolutionary subset! Deep Machine – Feature extraction vs. hand-crafted Learning Learning features AI Examples: • Deep Learning Examples: Logistic Regression – A renewed interest and a lot of hype! MLPs, DNNs, – Key success: Deep Neural Networks (DNNs) – Everything was there since the late 80s except the “computability of DNNs” Adopted from: http://www.deeplearningbook.org/contents/intro.html Network Based Computing Laboratory OSU Booth - SC ‘18 High-Performance Deep Learning 3 AlexNet Deep Learning in the Many-core Era 10000 8000 • Modern and efficient hardware enabled 6000 – Computability of DNNs – impossible in the 4000 ~500X in 5 years 2000 past! Minutesto Train – GPUs – at the core of DNN training 0 2 GTX 580 DGX-2 – CPUs – catching up fast • Availability of Datasets – MNIST, CIFAR10, ImageNet, and more… • Excellent Accuracy for many application areas – Vision, Machine Translation, and -
3.A Fixed-Priority Scheduling
2017/18 UniPD - T. Vardanega 10/03/2018 Standard notation : Worst-case blocking time for the task (if applicable) 3.a Fixed-Priority Scheduling : Worst-case computation time (WCET) of the task () : Relative deadline of the task : The interference time of the task : Release jitter of the task : Number of tasks in the system Credits to A. Burns and A. Wellings : Priority assigned to the task (if applicable) : Worst-case response time of the task : Minimum time between task releases, or task period () : The utilization of each task ( ⁄) a-Z: The name of a task 2017/18 UniPD – T. Vardanega Real-Time Systems 168 of 515 The simplest workload model Fixed-priority scheduling (FPS) The application is assumed to consist of tasks, for fixed At present, the most widely used approach in industry All tasks are periodic with known periods Each task has a fixed (static) priority determined off-line This defines the periodic workload model In real-time systems, the “priority” of a task is solely derived The tasks are completely independent of each other from its temporal requirements No contention for logical resources; no precedence constraints The task’s relative importance to the correct functioning of All tasks have a deadline equal to their period the system or its integrity is not a driving factor at this level A recent strand of research addresses mixed-criticality systems, with Each task must complete before it is next released scheduling solutions that contemplate criticality attributes All tasks have a single fixed WCET, which can be trusted as The ready tasks (jobs) are dispatched to execution in a safe and tight upper-bound the order determined by their (static) priority Operation modes are not considered Hence, in FPS, scheduling at run time is fully defined by the All system overheads (context-switch times, interrupt handling and so on) are assumed absorbed in the WCETs priority assignment algorithm 2017/18 UniPD – T. -
The CICS/MQ Interface
1 2 Each application program which will execute using MQ call statements require a few basic inclusions of generated code. The COPY shown are the one required by all applications using MQ functions. We have suppressed the list of the CMQV structure since it is over 20 pages of COBOL variable constants. The WS‐VARIABLES are the minimum necessary to support bot getting and putting messages from and to queues. The MQ‐CONN would normally be the handle acquired by an MQCONN function, but that call along with the MQDISC call are ignored by CICS / MQ interface. It is CICS itself that has performed the MQCONN to the queue manager. The MQ‐HOBJ‐I and MQ‐HOBJ‐O represent the handles, acquired on MQOPEN for a queue, used to specify which queue the other calls relate to. The MQ‐COMPCODE and MQ‐REASON are the variables which the application must test after every MQ call to determine its success. 3 This slide contains the message input and output areas used on the MQGET and MQPUT calls. Our messages are rather simple and limited in size. SO we have put the data areas in the WORKING‐STORAGE SECTION. Many times you will have these as copy books rather than native COBOL variables. Since the WORKING‐STORAGE area is being copied for each task using the program, we recommend that larger messages areas be put into the LINKAGE SECTION. Here we use a size of 200‐300 K as a suggestion. However, that size will vary depending upon your CICS environment. When the application is using message areas in the LINKAGE SECTION, it will be responsible to perform a CICS GETMAIN command to acquire the storage for the message area. -
Parallel Programming
Parallel Programming Parallel Programming Parallel Computing Hardware Shared memory: multiple cpus are attached to the BUS all processors share the same primary memory the same memory address on different CPU’s refer to the same memory location CPU-to-memory connection becomes a bottleneck: shared memory computers cannot scale very well Parallel Programming Parallel Computing Hardware Distributed memory: each processor has its own private memory computational tasks can only operate on local data infinite available memory through adding nodes requires more difficult programming Parallel Programming OpenMP versus MPI OpenMP (Open Multi-Processing): easy to use; loop-level parallelism non-loop-level parallelism is more difficult limited to shared memory computers cannot handle very large problems MPI(Message Passing Interface): require low-level programming; more difficult programming scalable cost/size can handle very large problems Parallel Programming MPI Distributed memory: Each processor can access only the instructions/data stored in its own memory. The machine has an interconnection network that supports passing messages between processors. A user specifies a number of concurrent processes when program begins. Every process executes the same program, though theflow of execution may depend on the processors unique ID number (e.g. “if (my id == 0) then ”). ··· Each process performs computations on its local variables, then communicates with other processes (repeat), to eventually achieve the computed result. In this model, processors pass messages both to send/receive information, and to synchronize with one another. Parallel Programming Introduction to MPI Communicators and Groups: MPI uses objects called communicators and groups to define which collection of processes may communicate with each other. -
Computer Hardware Architecture Lecture 4
Computer Hardware Architecture Lecture 4 Manfred Liebmann Technische Universit¨atM¨unchen Chair of Optimal Control Center for Mathematical Sciences, M17 [email protected] November 10, 2015 Manfred Liebmann November 10, 2015 Reading List • Pacheco - An Introduction to Parallel Programming (Chapter 1 - 2) { Introduction to computer hardware architecture from the parallel programming angle • Hennessy-Patterson - Computer Architecture - A Quantitative Approach { Reference book for computer hardware architecture All books are available on the Moodle platform! Computer Hardware Architecture 1 Manfred Liebmann November 10, 2015 UMA Architecture Figure 1: A uniform memory access (UMA) multicore system Access times to main memory is the same for all cores in the system! Computer Hardware Architecture 2 Manfred Liebmann November 10, 2015 NUMA Architecture Figure 2: A nonuniform memory access (UMA) multicore system Access times to main memory differs form core to core depending on the proximity of the main memory. This architecture is often used in dual and quad socket servers, due to improved memory bandwidth. Computer Hardware Architecture 3 Manfred Liebmann November 10, 2015 Cache Coherence Figure 3: A shared memory system with two cores and two caches What happens if the same data element z1 is manipulated in two different caches? The hardware enforces cache coherence, i.e. consistency between the caches. Expensive! Computer Hardware Architecture 4 Manfred Liebmann November 10, 2015 False Sharing The cache coherence protocol works on the granularity of a cache line. If two threads manipulate different element within a single cache line, the cache coherency protocol is activated to ensure consistency, even if every thread is only manipulating its own data. -
Massively Parallel Computing with CUDA
Massively Parallel Computing with CUDA Antonino Tumeo Politecnico di Milano 1 GPUs have evolved to the point where many real world applications are easily implemented on them and run significantly faster than on multi-core systems. Future computing architectures will be hybrid systems with parallel-core GPUs working in tandem with multi-core CPUs. Jack Dongarra Professor, University of Tennessee; Author of “Linpack” Why Use the GPU? • The GPU has evolved into a very flexible and powerful processor: • It’s programmable using high-level languages • It supports 32-bit and 64-bit floating point IEEE-754 precision • It offers lots of GFLOPS: • GPU in every PC and workstation What is behind such an Evolution? • The GPU is specialized for compute-intensive, highly parallel computation (exactly what graphics rendering is about) • So, more transistors can be devoted to data processing rather than data caching and flow control ALU ALU Control ALU ALU Cache DRAM DRAM CPU GPU • The fast-growing video game industry exerts strong economic pressure that forces constant innovation GPUs • Each NVIDIA GPU has 240 parallel cores NVIDIA GPU • Within each core 1.4 Billion Transistors • Floating point unit • Logic unit (add, sub, mul, madd) • Move, compare unit • Branch unit • Cores managed by thread manager • Thread manager can spawn and manage 12,000+ threads per core 1 Teraflop of processing power • Zero overhead thread switching Heterogeneous Computing Domains Graphics Massive Data GPU Parallelism (Parallel Computing) Instruction CPU Level (Sequential -
Overview of Cloud Storage Allan Liu, Ting Yu
Overview of Cloud Storage Allan Liu, Ting Yu To cite this version: Allan Liu, Ting Yu. Overview of Cloud Storage. International Journal of Scientific & Technology Research, 2018. hal-02889947 HAL Id: hal-02889947 https://hal.archives-ouvertes.fr/hal-02889947 Submitted on 6 Jul 2020 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Overview of Cloud Storage Allan Liu, Ting Yu Department of Computer Science and Engineering Shanghai Jiao Tong University, Shanghai Abstract— Cloud computing is an emerging service and computing platform and it has taken commercial computing by storm. Through web services, the cloud computing platform provides easy access to the organization’s storage infrastructure and high-performance computing. Cloud computing is also an emerging business paradigm. Cloud computing provides the facility of huge scalability, high performance, reliability at very low cost compared to the dedicated storage systems. This article provides an introduction to cloud computing and cloud storage and different deployment modules. The general architecture of the cloud storage is also discussed along with its advantages and disadvantages for the organizations. Index Terms— Cloud Storage, Emerging Technology, Cloud Computing, Secure Storage, Cloud Storage Models —————————— u —————————— 1 INTRODUCTION n this era of technological advancements, Cloud computing Ihas played a very vital role in changing the way of storing 2 CLOUD STORAGE information and run applications.