Fiber Optics Based Parallel Computer Architecture
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
IBM System Z Strengths and Values
Front cover IBM System z Strengths and Values Technical presentation of System z hardware and z/OS Enterprise-wide roles for the System z platform Cost of computing considerations Philippe Comte Andrea Corona James Guilianelli Douglas Lin Werner Meiner Michel Plouin Marita Prassolo Kristine Seigworth Eran Yona Linfeng Yu ibm.com/redbooks International Technical Support Organization IBM System z Strengths and Values January 2007 SG24-7333-00 Note: Before using this information and the product it supports, read the information in “Notices” on page ix. First Edition (January 2007) This edition applies to the IBM System z platform and IBM z/OS V1.8. © Copyright International Business Machines Corporation 2007. All rights reserved. Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp. Contents Notices . ix Trademarks . x Preface . xi The team that wrote this redbook. xi Become a published author . xiii Comments welcome. xiii Chapter 1. A business view . 1 1.1 Business drivers . 2 1.2 Impact on IT . 3 1.3 The System z platform . 7 1.3.1 Using System Z technology to reduce complexity . 7 1.3.2 Business integration and resiliency. 8 1.3.3 Managing the System z platform to meet business goals. 11 1.3.4 Security . 12 1.4 Summary . 13 Chapter 2. System z architecture and hardware platform . 15 2.1 History . 16 2.2 System z architecture . 17 2.2.1 Multiprogramming and multiprocessing . 18 2.2.2 The virtualization concept . 19 2.2.3 PR/SM and logical partitions . -
Designing an Ultra Low-Overhead Multithreading Runtime for Nim
Designing an ultra low-overhead multithreading runtime for Nim Mamy Ratsimbazafy Weave [email protected] https://github.com/mratsim/weave Hello! I am Mamy Ratsimbazafy During the day blockchain/Ethereum 2 developer (in Nim) During the night, deep learning and numerical computing developer (in Nim) and data scientist (in Python) You can contact me at [email protected] Github: mratsim Twitter: m_ratsim 2 Where did this talk came from? ◇ 3 years ago: started writing a tensor library in Nim. ◇ 2 threading APIs at the time: OpenMP and simple threadpool ◇ 1 year ago: complete refactoring of the internals 3 Agenda ◇ Understanding the design space ◇ Hardware and software multithreading: definitions and use-cases ◇ Parallel APIs ◇ Sources of overhead and runtime design ◇ Minimum viable runtime plan in a weekend 4 Understanding the 1 design space Concurrency vs parallelism, latency vs throughput Cooperative vs preemptive, IO vs CPU 5 Parallelism is not 6 concurrency Kernel threading 7 models 1:1 Threading 1 application thread -> 1 hardware thread N:1 Threading N application threads -> 1 hardware thread M:N Threading M application threads -> N hardware threads The same distinctions can be done at a multithreaded language or multithreading runtime level. 8 The problem How to schedule M tasks on N hardware threads? Latency vs 9 Throughput - Do we want to do all the work in a minimal amount of time? - Numerical computing - Machine learning - ... - Do we want to be fair? - Clients-server - Video decoding - ... Cooperative vs 10 Preemptive Cooperative multithreading: -
Fiber Context As a first-Class Object
Document number: P0876R6 Date: 2019-06-17 Author: Oliver Kowalke ([email protected]) Nat Goodspeed ([email protected]) Audience: SG1 fiber_context - fibers without scheduler Revision History . .1 abstract . .2 control transfer mechanism . .2 std::fiber_context as a first-class object . .3 encapsulating the stack . .4 invalidation at resumption . .4 problem: avoiding non-const global variables and undefined behaviour . .5 solution: avoiding non-const global variables and undefined behaviour . .6 inject function into suspended fiber . 11 passing data between fibers . 12 termination . 13 exceptions . 16 std::fiber_context as building block for higher-level frameworks . 16 interaction with STL algorithms . 18 possible implementation strategies . 19 fiber switch on architectures with register window . 20 how fast is a fiber switch . 20 interaction with accelerators . 20 multi-threading environment . 21 acknowledgments . 21 API ................................................................ 22 33.7 Cooperative User-Mode Threads . 22 33.7.1 General . 22 33.7.2 Empty vs. Non-Empty . 22 33.7.3 Explicit Fiber vs. Implicit Fiber . 22 33.7.4 Implicit Top-Level Function . 22 33.7.5 Header <experimental/fiber_context> synopsis . 23 33.7.6 Class fiber_context . 23 33.7.7 Function unwind_fiber() . 27 references . 29 Revision History This document supersedes P0876R5. Changes since P0876R5: • std::unwind_exception removed: stack unwinding must be performed by platform facilities. • fiber_context::can_resume_from_any_thread() renamed to can_resume_from_this_thread(). • fiber_context::valid() renamed to empty() with inverted sense. • Material has been added concerning the top-level wrapper logic governing each fiber. The change to unwinding fiber stacks using an anonymous foreign exception not catchable by C++ try / catch blocks is in response to deep discussions in Kona 2019 of the surprisingly numerous problems surfaced by using an ordinary C++ exception for that purpose. -
Fiber-Based Architecture for NFV Cloud Databases
Fiber-based Architecture for NFV Cloud Databases Vaidas Gasiunas, David Dominguez-Sal, Ralph Acker, Aharon Avitzur, Ilan Bronshtein, Rushan Chen, Eli Ginot, Norbert Martinez-Bazan, Michael Muller,¨ Alexander Nozdrin, Weijie Ou, Nir Pachter, Dima Sivov, Eliezer Levy Huawei Technologies, Gauss Database Labs, European Research Institute fvaidas.gasiunas,david.dominguez1,fi[email protected] ABSTRACT to a specific workload, as well as to change this alloca- The telco industry is gradually shifting from using mono- tion on demand. Elasticity is especially attractive to op- lithic software packages deployed on custom hardware to erators who are very conscious about Total Cost of Own- using modular virtualized software functions deployed on ership (TCO). NFV meets databases (DBs) in more than cloudified data centers using commodity hardware. This a single aspect [20, 30]. First, the NFV concepts of modu- transformation is referred to as Network Function Virtual- lar software functions accelerate the separation of function ization (NFV). The scalability of the databases (DBs) un- (business logic) and state (data) in the design and imple- derlying the virtual network functions is the cornerstone for mentation of network functions (NFs). This in turn enables reaping the benefits from the NFV transformation. This independent scalability of the DB component, and under- paper presents an industrial experience of applying shared- lines its importance in providing elastic state repositories nothing techniques in order to achieve the scalability of a DB for various NFs. In principle, the NFV-ready DBs that are in an NFV setup. The special combination of requirements relevant to our discussion are sophisticated main-memory in NFV DBs are not easily met with conventional execution key-value (KV) stores with stringent high availability (HA) models. -
8. IBM Z and Hybrid Cloud
The Centers for Medicare and Medicaid Services The role of the IBM Z® in Hybrid Cloud Architecture Paul Giangarra – IBM Distinguished Engineer December 2020 © IBM Corporation 2020 The Centers for Medicare and Medicaid Services The Role of IBM Z in Hybrid Cloud Architecture White Paper, December 2020 1. Foreword ............................................................................................................................................... 3 2. Executive Summary .............................................................................................................................. 4 3. Introduction ........................................................................................................................................... 7 4. IBM Z and NIST’s Five Essential Elements of Cloud Computing ..................................................... 10 5. IBM Z as a Cloud Computing Platform: Core Elements .................................................................... 12 5.1. The IBM Z for Cloud starts with Hardware .............................................................................. 13 5.2. Cross IBM Z Foundation Enables Enterprise Cloud Computing .............................................. 14 5.3. Capacity Provisioning and Capacity on Demand for Usage Metering and Chargeback (Infrastructure-as-a-Service) ................................................................................................................... 17 5.4. Multi-Tenancy and Security (Infrastructure-as-a-Service) ....................................................... -
A Thread Performance Comparison: Windows NT and Solaris on a Symmetric Multiprocessor
The following paper was originally published in the Proceedings of the 2nd USENIX Windows NT Symposium Seattle, Washington, August 3–4, 1998 A Thread Performance Comparison: Windows NT and Solaris on A Symmetric Multiprocessor Fabian Zabatta and Kevin Ying Brooklyn College and CUNY Graduate School For more information about USENIX Association contact: 1. Phone: 510 528-8649 2. FAX: 510 548-5738 3. Email: [email protected] 4. WWW URL:http://www.usenix.org/ A Thread Performance Comparison: Windows NT and Solaris on A Symmetric Multiprocessor Fabian Zabatta and Kevin Ying Logic Based Systems Lab Brooklyn College and CUNY Graduate School Computer Science Department 2900 Bedford Avenue Brooklyn, New York 11210 {fabian, kevin}@sci.brooklyn.cuny.edu Abstract Manufacturers now have the capability to build high performance multiprocessor machines with common CPU CPU CPU CPU cache cache cache cache PC components. This has created a new market of low cost multiprocessor machines. However, these machines are handicapped unless they have an oper- Bus ating system that can take advantage of their under- lying architectures. Presented is a comparison of Shared Memory I/O two such operating systems, Windows NT and So- laris. By focusing on their implementation of Figure 1: A basic symmetric multiprocessor architecture. threads, we show each system's ability to exploit multiprocessors. We report our results and inter- pretations of several experiments that were used to compare the performance of each system. What 1.1 Symmetric Multiprocessing emerges is a discussion on the performance impact of each implementation and its significance on vari- Symmetric multiprocessing (SMP) is the primary ous types of applications. -
Concurrent Cilk: Lazy Promotion from Tasks to Threads in C/C++
Concurrent Cilk: Lazy Promotion from Tasks to Threads in C/C++ Christopher S. Zakian, Timothy A. K. Zakian Abhishek Kulkarni, Buddhika Chamith, and Ryan R. Newton Indiana University - Bloomington, fczakian, tzakian, adkulkar, budkahaw, [email protected] Abstract. Library and language support for scheduling non-blocking tasks has greatly improved, as have lightweight (user) threading packages. How- ever, there is a significant gap between the two developments. In previous work|and in today's software packages|lightweight thread creation incurs much larger overheads than tasking libraries, even on tasks that end up never blocking. This limitation can be removed. To that end, we describe an extension to the Intel Cilk Plus runtime system, Concurrent Cilk, where tasks are lazily promoted to threads. Concurrent Cilk removes the overhead of thread creation on threads which end up calling no blocking operations, and is the first system to do so for C/C++ with legacy support (standard calling conventions and stack representations). We demonstrate that Concurrent Cilk adds negligible overhead to existing Cilk programs, while its promoted threads remain more efficient than OS threads in terms of context-switch overhead and blocking communication. Further, it enables development of blocking data structures that create non-fork-join dependence graphs|which can expose more parallelism, and better supports data-driven computations waiting on results from remote devices. 1 Introduction Both task-parallelism [1, 11, 13, 15] and lightweight threading [20] libraries have become popular for different kinds of applications. The key difference between a task and a thread is that threads may block|for example when performing IO|and then resume again. -
The Impact of Process Placement and Oversubscription on Application Performance: a Case Study for Exascale Computing
Takustraße 7 Konrad-Zuse-Zentrum D-14195 Berlin-Dahlem fur¨ Informationstechnik Berlin Germany FLORIAN WENDE,THOMAS STEINKE,ALEXANDER REINEFELD The Impact of Process Placement and Oversubscription on Application Performance: A Case Study for Exascale Computing ZIB-Report 15-05 (January 2015) Herausgegeben vom Konrad-Zuse-Zentrum fur¨ Informationstechnik Berlin Takustraße 7 D-14195 Berlin-Dahlem Telefon: 030-84185-0 Telefax: 030-84185-125 e-mail: [email protected] URL: http://www.zib.de ZIB-Report (Print) ISSN 1438-0064 ZIB-Report (Internet) ISSN 2192-7782 The Impact of Process Placement and Oversubscription on Application Performance: A Case Study for Exascale Computing Florian Wende, Thomas Steinke, Alexander Reinefeld Abstract: With the growing number of hardware components and the increasing software complexity in the upcoming exascale computers, system failures will become the norm rather than an exception for long-running applications. Fault-tolerance can be achieved by the creation of checkpoints dur- ing the execution of a parallel program. Checkpoint/Restart (C/R) mechanisms allow for both task migration (even if there were no hardware faults) and restarting of tasks after the occurrence of hard- ware faults. Affected tasks are then migrated to other nodes which may result in unfortunate process placement and/or oversubscription of compute resources. In this paper we analyze the impact of unfortunate process placement and oversubscription of compute resources on the performance and scalability of two typical HPC application workloads, CP2K and MOM5. Results are given for a Cray XC30/40 with Aries dragonfly topology. Our results indicate that unfortunate process placement has only little negative impact while oversubscription substantially degrades the performance. -
And Computer Systems I: Introduction Systems
Welcome to Computer Team! Noah Singer Introduction Welcome to Computer Team! Units Computer and Computer Systems I: Introduction Systems Computer Parts Noah Singer Montgomery Blair High School Computer Team September 14, 2017 Overview Welcome to Computer Team! Noah Singer Introduction 1 Introduction Units Computer Systems 2 Units Computer Parts 3 Computer Systems 4 Computer Parts Welcome to Computer Team! Noah Singer Introduction Units Computer Systems Computer Parts Section 1: Introduction Computer Team Welcome to Computer Team! Computer Team is... Noah Singer A place to hang out and eat lunch every Thursday Introduction A cool discussion group for computer science Units Computer An award-winning competition programming group Systems A place for people to learn from each other Computer Parts Open to people of all experience levels Computer Team isn’t... A programming club A highly competitive environment A rigorous course that requires commitment Structure Welcome to Computer Team! Noah Singer Introduction Units Four units, each with five to ten lectures Computer Each lecture comes with notes that you can take home! Systems Computer Guided activities to help you understand more Parts difficult/technical concepts Special presentations and guest lectures on interesting topics Conventions Welcome to Computer Team! Noah Singer Vocabulary terms are marked in bold. Introduction Definition Units Computer Definitions are in boxes (along with theorems, etc.). Systems Computer Parts Important statements are highlighted in italics. Formulas are written -
IBM Z Server Time Protocol Guide
Front cover Draft Document for Review August 3, 2020 1:37 pm SG24-8480-00 IBM Z Server Time Protocol Guide Octavian Lascu Franco Pinto Gatto Gobehi Hans-Peter Eckam Jeremy Koch Martin Söllig Sebastian Zimmermann Steve Guendert Redbooks Draft Document for Review August 3, 2020 7:26 pm 8480edno.fm IBM Redbooks IBM Z Server Time Protocol Guide August 2020 SG24-8480-00 8480edno.fm Draft Document for Review August 3, 2020 7:26 pm Note: Before using this information and the product it supports, read the information in “Notices” on page vii. First Edition (August 2020) This edition applies to IBM Server Time Protocol for IBM Z and covers IBM z15, IBM z14, and IBM z13 server generations. This document was created or updated on August 3, 2020. © Copyright International Business Machines Corporation 2020. All rights reserved. Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp. Draft Document for Review August 3, 2020 8:32 pm 8480TOC.fm Contents Notices . vii Trademarks . viii Preface . ix Authors. ix Comments welcome. .x Stay connected to IBM Redbooks . xi Chapter 1. Introduction to Server Time Protocol . 1 1.1 Introduction to time synchronization . 2 1.1.1 Insertion of leap seconds . 2 1.1.2 Time-of-Day (TOD) Clock . 3 1.1.3 Industry requirements . 4 1.1.4 Time synchronization in a Parallel Sysplex. 6 1.2 Overview of Server Time Protocol (STP) . 7 1.3 STP concepts and terminology . 9 1.3.1 STP facility . 9 1.3.2 TOD clock synchronization . -
Availability Digest
the Availability Digest Parallel Sysplex – Fault Tolerance from IBM April 2008 IBM’s Parallel Sysplex, HP’s NonStop server, and Stratus’ ftServer are today the primary industry fault-tolerant offerings that can tolerate any single failure, thus leading to very high levels of availability. The Stratus line of fault-tolerant computers is aimed at seamlessly protecting industry- standard servers running operating systems such as Windows, Unix, and Linux. As a result, ftServer does not compete with the other two systems, Parallel Sysplex and NonStop servers, which do compete instead in the large enterprise marketplace. In this article, we will explore IBM’s Parallel Sysplex and its features that address high availability. IBM’s Parallel Sysplex IBM’s Parallel Sysplex systems are multiprocessor clusters that can support from two to thirty-two mainframe nodes (typically S/390 or zSeries systems).1 A Parallel Sysplex system is nearly linearly scalable up to its 32-processor limit. A node may be a separate system or a logical partition (LPAR) within a system. The nodes do not have to be identical. They can be a mix of any servers that support the Parallel Sysplex environment. The nodes in a Parallel Sysplex system interact as an active/active architecture. The system allows direct, concurrent read/write access to shared data from all processing nodes without sacrificing data integrity. Furthermore, work requests associated with a single transaction or database query can be dynamically distributed for parallel execution based on available processor capacity of the nodes in the Parallel Sysplex cluster. Parallel Sysplex Architecture All nodes in a Parallel Sysplex cluster connect to a shared disk subsystem. -
IBM Z Connectivity Handbook
Front cover IBM Z Connectivity Handbook Octavian Lascu John Troy Anna Shugol Frank Packheiser Kazuhiro Nakajima Paul Schouten Hervey Kamga Jannie Houlbjerg Bo XU Redbooks IBM Redbooks IBM Z Connectivity Handbook August 2020 SG24-5444-20 Note: Before using this information and the product it supports, read the information in “Notices” on page vii. Twentyfirst Edition (August 2020) This edition applies to connectivity options available on the IBM z15 (M/T 8561), IBM z15 (M/T 8562), IBM z14 (M/T 3906), IBM z14 Model ZR1 (M/T 3907), IBM z13, and IBM z13s. © Copyright International Business Machines Corporation 2020. All rights reserved. Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp. Contents Notices . vii Trademarks . viii Preface . ix Authors. ix Now you can become a published author, too! . xi Comments welcome. xi Stay connected to IBM Redbooks . xi Chapter 1. Introduction. 1 1.1 I/O channel overview. 2 1.1.1 I/O hardware infrastructure . 2 1.1.2 I/O connectivity features . 3 1.2 FICON Express . 4 1.3 zHyperLink Express . 5 1.4 Open Systems Adapter-Express. 6 1.5 HiperSockets. 7 1.6 Parallel Sysplex and coupling links . 8 1.7 Shared Memory Communications. 9 1.8 I/O feature support . 10 1.9 Special-purpose feature support . 12 1.9.1 Crypto Express features . 12 1.9.2 Flash Express feature . 12 1.9.3 zEDC Express feature . 13 Chapter 2. Channel subsystem overview . 15 2.1 CSS description . 16 2.1.1 CSS elements .