Massively Parallel Artificial Intelligence


Hiroaki Kitano (Chairperson)
Carnegie Mellon University, Pittsburgh, PA 15213, and NEC Corporation, Tokyo 108, Japan ([email protected])

James Hendler, University of Maryland, USA ([email protected])
Tetsuya Higuchi, Electrotechnical Laboratory, Japan ([email protected])
Dan Moldovan, University of Southern California, USA ([email protected])
David Waltz, Thinking Machines Corporation, USA, and Brandeis University, USA ([email protected])

Abstract

Massively Parallel Artificial Intelligence is a new and growing area of AI research, enabled by the emergence of massively parallel machines. It is a new paradigm in AI research. A high degree of parallelism not only affects computing performance, but also triggers a drastic change in the approach toward building intelligent systems; memory-based reasoning and parallel marker-passing are examples of new and redefined approaches. These new approaches, fostered by massively parallel machines, offer a golden opportunity for AI in challenging the vastness and irregularities of real-world data that are encountered when a system accesses and processes Very Large Data Bases and Knowledge Bases. This article describes the current status of massively parallel artificial intelligence research and the position of each panelist.

1 Introduction

The goal of the panel is to highlight current accomplishments and future issues in the use of massively parallel machines for artificial intelligence research, a field generally called massively parallel artificial intelligence. The importance of massively parallel artificial intelligence has been recognized in recent years for three major reasons:

1. the increasing availability of massively parallel machines,
2. increasing interest in memory-based reasoning and other highly parallel AI approaches,
3. development efforts on Very Large Knowledge Bases (VLKB).

Despite wide recognition of massively parallel computing as an important aspect of high performance computing, and general interest in the AI community in highly parallel processing, only a small amount of attention has been paid to exploring the full potential of the massive parallelism offered by currently available machines. One cause of this is that little communication has occurred between hardware architects and AI researchers. Hardware architects design without actually recognizing the processing, memory, and performance requirements of AI algorithms. AI researchers have developed their theories and models assuming idealizations of massive parallelism. Further, with few exceptions, the AI community has often taken parallelism as a mere "implementation detail" and has not yet come up with algorithms and applications which take full advantage of the massive parallelism available.

The panel intends to rectify this situation by inviting panelists knowledgeable and experienced in both the hardware and the application aspects of massively parallel computing in artificial intelligence. There are two interrelated issues which will be addressed by the panel: (1) the design of massively parallel hardware for artificial intelligence, and (2) the potential applications, algorithms, and paradigms aimed at fully exploring the power of massively parallel computers for symbolic AI.

2 Current Research in Massively Parallel AI

2.1 Massively Parallel Machines

Currently, there are a few research projects involving the development of massively parallel machines and a few commercially available machines being used for symbolic AI. Three projects of particular importance are:

♦ The CM-2 Connection Machine (Thinking Machines Corporation),
♦ The Semantic Network Array Processor, SNAP (University of Southern California),
♦ The IXM2 Associative Memory Processor (Electrotechnical Laboratory, Japan).

These machines provide an extremely high level of parallelism (8K to 256K) and promise even more in the future. Table 1 shows the specifications of these machines.

There are two major approaches to designing a massively parallel machine: the array processor and the associative processor. CM-2 and SNAP are examples of the array processor architecture, and IXM2 is an example of the associative processor architecture. While the array processor architecture attains parallelism through the number of physical processors available, the associative processor attains parallelism through the associative memory assigned to each processor. Thus, the parallelism attained by the associative processor architecture goes beyond the number of processors in the machine, whereas the array processor attains parallelism equal to the number of actual processors. This is why the IXM2 attains 256K parallelism with 64 processors. However, the operations carried out by associative memories are limited to bit-marker passing and relatively simple arithmetic operations. When more complex operations are necessary, the parallelism is equal to the number of processors.
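To make the associative-processor idea concrete, here is a minimal sketch (ours, not IXM2's actual instruction set; the word width and the names are assumptions) of the basic associative-memory primitive: a masked key comparison performed against every memory word at once. In hardware the loop body executes for all words simultaneously, which is how the effective parallelism can exceed the processor count.

    #include <stdint.h>
    #include <stddef.h>

    /* Simulation of one associative-memory search cycle.  Each word
     * whose bits match `key` under `mask` gets its response bit set.
     * On an associative processor every iteration of this loop happens
     * simultaneously in hardware, so a 256K-word memory yields
     * 256K-way parallelism even with only 64 processors driving it. */
    void associative_search(const uint32_t *words, size_t n,
                            uint32_t key, uint32_t mask,
                            uint8_t *response)
    {
        for (size_t i = 0; i < n; i++)      /* parallel in hardware */
            response[i] = ((words[i] & mask) == (key & mask));
    }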
Regarding parallelism, the next version of the IXM2 (which may be called IXM3) will aim at a parallelism of over one million using up-to-date processors and high-density associative memory chips. The SNAP project is planning to develop a custom VLSI to attain a one-million-processor-scale machine. DARPA (Defense Advanced Research Projects Agency) is funding a project to attain TeraOps by 1995 [Waltz, 1990].

2.2 Massively Parallel AI Paradigm

In addition to designing new hardware architectures, the strategies, and perhaps even the paradigms, for designing and building AI systems may need to be changed in order to take advantage of the full potential of massively parallel machines. The emergence of massively parallel machines offers a new opportunity for AI in that large-scale DB/KB processing can be made possible in real time. Waltz's talk at AAAI-90 [Waltz, 1990] envisioned the challenge of massively parallel AI.

Two of the major ideas that play central roles in massively parallel AI are memory-based reasoning and marker-passing.

Memory-based reasoning and case-based reasoning assume that memory is the foundation of intelligence. Systems based on this approach store a large number of memory instances of past cases, and modify them to provide solutions to new problems. Computationally, memory-based reasoning is an attractive approach to AI on massively parallel machines due to the memory-intensive and data-parallel nature of its operation. Traditional AI work has been largely constrained by the performance characteristics of serial machines. Thus, for example, the memory efficiency and optimization of serial rule application has been regarded as a central issue in expert systems design. Massively parallel machines, however, may take away such constraints through highly parallel operations based on the idea of data-parallelism. Memory-based reasoning fits perfectly with this idea.
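As a rough illustration of why memory-based reasoning is data-parallel (a sketch under our own assumptions, not code from any of the systems discussed): every stored case is scored against the probe independently, so each case can be mapped to its own (virtual) processor and the best match found by a parallel reduction.

    #include <float.h>
    #include <stddef.h>

    #define NFEAT 16

    /* Memory-based reasoning, serial rendering of the data-parallel
     * step: score every stored case against the probe, then return the
     * index of the best match.  Each outer iteration is independent,
     * so on a data-parallel machine one case maps to one (virtual)
     * processor and the minimum is found by a parallel reduction. */
    size_t best_match(const float cases[][NFEAT], size_t ncases,
                      const float probe[NFEAT])
    {
        size_t best = 0;
        float best_d = FLT_MAX;
        for (size_t c = 0; c < ncases; c++) {   /* one case per processor */
            float d = 0.0f;
            for (size_t f = 0; f < NFEAT; f++) {
                float diff = cases[c][f] - probe[f];
                d += diff * diff;               /* squared distance */
            }
            if (d < best_d) { best_d = d; best = c; }  /* min-reduce */
        }
        return best;
    }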
Another approach is marker-passing. In the marker-passing approach, knowledge is stored in semantic networks, and objects called markers are propagated, in parallel, to perform the inference. Marker-passing is a powerful method of performing inference over large semantic-network knowledge bases on massively parallel machines, due to the high degree of parallelism that can be attained. One obvious application of this approach is the processing of Very Large Knowledge Bases (VLKB) such as MCC's CYC [Lenat and Guha, 1989], EDR's electronic dictionaries [EDR, 1988] (both of which are expected to require millions of network links), and ATR's dialogue database [Ehara et al., 1990]. It is clear that as KBs grow substantially large (over a million concepts), the complex (and often complete) searches used in many traditional inference systems will have to give way to heuristic solutions unless a high degree of parallelism can be exploited. Thus, the use of massively parallel computers for VLKB processing is clearly warranted.

Table 2 shows the massively parallel AI systems developed so far. The list is by no means exhaustive; it lists only the major systems. There are also many other models which match well with massively parallel machines, but we list only the systems that have actually been implemented on massively parallel machines.

3 Associative Memory Architecture

Tetsuya Higuchi
Electrotechnical Laboratory, Japan

In this talk, we consider the architectural requirements for massively parallel AI applications, based on our experience of developing the parallel associative processor IXM2 and applications for it. In addition, we introduce the current status of the Electronic Dictionary Research (EDR) project in Japan, which is a real example of a very large knowledge base containing 400,000 concepts.

We have developed a parallel associative processor, IXM2, which enables 256K parallel operations using a large associative memory. IXM2 consists of 64 associative processors with 256K words of associative memory. [...]

...a difference of two orders of magnitude in execution time between the Cray X-MP and the CM-2 for 64K data, and a difference of three orders between the Cray X-MP and IXM2.

Example 2. Marker Propagation: Marker propagation is used intensively in processing the is-a hierarchy of a knowledge base. It traverses the links of network-structured data. A marker propagation program written in C, which uses recursive procedure calls to traverse links, was run on both a SUN-4 and a Cray X-MP. Even though the program was exactly the same, the Cray was slightly slower than the SUN-4. The main reasons for this are that the overhead of recursive procedure calls is heavy, and that network-structured data [...]
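The C program referred to above is not reproduced in this excerpt; the following is a speculative reconstruction of its general shape (the data layout and names are our assumptions). Marker bits are spread down the is-a hierarchy by a recursive walk over child links, and it is precisely this pointer-chasing recursion that a vector machine such as the Cray X-MP cannot exploit.

    #include <stddef.h>

    /* Reconstruction (our assumption) of the kind of recursive
     * marker-propagation program described above.  Each node of the
     * is-a hierarchy carries a marker bit; propagation walks the child
     * links recursively.  Pointer chasing plus deep recursion defeats
     * vectorization, so a vector machine gains nothing here. */
    #define MAXCHILD 8

    struct node {
        int          marker;
        int          nchild;
        struct node *child[MAXCHILD];   /* is-a links to subconcepts */
    };

    void propagate(struct node *n)
    {
        if (n == NULL || n->marker)     /* already visited */
            return;
        n->marker = 1;                  /* set the marker bit */
        for (int i = 0; i < n->nchild; i++)
            propagate(n->child[i]);     /* recursive link traversal */
    }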
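For contrast, a level-synchronous variant (again our sketch, reusing the node structure above, with an illustrative fixed-size buffer) shows the data-parallel formulation that marker-passing hardware exploits: every node on the current frontier forwards its marker to its children in the same step, so the sequential depth is only the number of levels in the hierarchy.

    /* Level-synchronous (frontier-based) marker passing, the form that
     * maps onto massively parallel hardware: every node in the current
     * frontier forwards its marker to its children in one step.  The
     * inner loops are independent and run simultaneously on a parallel
     * machine; only the number of levels is sequential. */
    void propagate_parallel(struct node **frontier, size_t nfront)
    {
        struct node *next[4096];        /* next wave; size illustrative */
        while (nfront > 0) {
            size_t nnext = 0;
            for (size_t i = 0; i < nfront; i++)     /* parallel over frontier */
                for (int j = 0; j < frontier[i]->nchild; j++) {
                    struct node *c = frontier[i]->child[j];
                    if (!c->marker) {
                        c->marker = 1;              /* mark in parallel */
                        next[nnext++] = c;
                    }
                }
            for (size_t i = 0; i < nnext; i++)      /* swap waves */
                frontier[i] = next[i];
            nfront = nnext;
        }
    }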