1. Introduction
Recommended publications
- Performance Modeling the Earth Simulator and ASCI Q (Darren J. Kerbyson, Adolfy Hoisie, and Harvey J. Wasserman, Performance and Architectures Laboratory, Los Alamos National Laboratory, April 2003)
- Project Aurora 2017 Vector Inheritance: Vector Coding, a Simple User's Perspective (Rudolf Fischer, NEC Deutschland GmbH, Düsseldorf)
- IBM 1401 System Summary (IBM Systems Reference Library, Form A24-1401-1)
- 2020 Global High-Performance Computing Product Leadership Award (Frost & Sullivan award profile of NEC)
- Hardware Technology of the Earth Simulator (Jun Inasaka et al., NEC)
- Extensions to the CERN Computing Facilities (M.G.N. Hine, CERN Scientific Policy Committee, CERN/SPC/220, 28 February 1966)
- An Experimental Time-Sharing System (Fernando J. Corbato, Marjorie Merwin-Daggett, and Robert C. Daley, MIT Computation Center)
- IBM 7090-7040 Direct Couple Operating System: Systems Programmer's Guide (IBM Systems Reference Library, Form C28-6383-2)
- Computing at CERN (a review of the past, present, and future of computing at CERN)
- F. T. McClure Computing Center Development (Robert P. Rich, APL)
- IBM 709 Data Processing System (International Business Machines Corporation)
- Supercomputers – Prestige Objects or Crucial Tools for Science and Industry? (Hans W. Meuer and Horst Gietl)