1984, Instruction Issue Logic for Pipelined Supercomputers

Total Page:16

File Type:pdf, Size:1020Kb

1984, Instruction Issue Logic for Pipelined Supercomputers INSTRUCTION ISSUE LOGIC FOR PIPELINED SUPERCOMPUTERS Shlomo Weiss Computer Sciences Department University of Wisconsin-Madison Madison, WI 53706 James E. Smith Department of Electrical and Computer Engineering University of Wisconsin-Madison Madison, WI 53706 Abs~t 1.1. Tradeo~ Basic principles and design tradeoffs for control ol We begin with a discussion of design tradeoffs that pipelined processors are first discussed. We concentrate centers on four principle issues: on register-register architectures like the GRAY-1 where (1) clock period, pipeline control logic is localized to one or two pipeline stages and is referred to as "instruction issue logic". (2) instruction scheduling, Design tradeoffs are explored by giving designs for a (3) issue logic complexity, and variety of instruction issue methods that represent a (4) hardware cost, debugging, and maintenance. range of complexity and sophistication. These vary from the original CRAY-1 issue logic to a version of Tomasulo's Each of these issues will be discussed in turn. algorithm, first used in the IBM 360/91 floating point Clock Period. In a pipeltned computer, there are a unit. Also studied are Thornton's "scoreboard" algo- number of segments containing combinational logic with rithm used on the CDC 8600 and an algorithm we have latches separating successive segments. All the latches devised. To provide a standard for comparison, all the are synchronized by the same clock, and the pipeline is issue methods are used to implement the GRAY-1 scalar capable of initiating a new instruction every clock architecture. Then, using a simulation model and the period. Hence, under ideal conditions, i.e. no dependen- Lawrence Livermore Loops compiled with the GRAY FOR- cies or resource conflicts, pipeline performance is TRAN compiler, performance results for the various issue directly related to the period of the clock used to syn- methods are given and discussed. chronize the pipe. Even with data dependencies and resource conflicts, there is a high correlation between performance and clock period. Historically, pipelined supercomputers have had shorter clock periods than other computers. This is in part due to the use of the fastest available logic techno- 1. Introduction logies, but it is also due to designs that minimize logic levels between successive latches. Although modern supereomputers are closely asso- ciated with high speed vector operation, it is widely Sehednling of Instructions. Performance of a pipe- recognized that scalar operation is at least of equal lined processor depends greatly on the order of the importance, and pipelining [KOGGS1] is the predominant instructions in the instruction stream. If consecutive technique for achieving high scalar performance. In a instructions have data and control dependencies and pipelined computer, instruction processing is broken contend for resources, then "holes" in the pipeline will into segments and processing proceeds in an assembly develop and performance will suffer. To improve perfor- line fashion with the execution of several instructions mance, it is often possible to arrange the code, or being overlapped. Because of data and control dependen- schedule it, so that dependencies and resource conflicts cies in a scalar instruction stream, interlock logic is are minimized. Registers can also be allocated so that placed between critical pipeline segments to control register conflicts are reduced (register conflicts caused instruction flow through the pipe. In an register-register by data dependencies can not be eliminated in this way, architecture like the CDC 6600 [THOR70], the CDC 7000 however). Because of their close relationship, in the [BONS69], and the GRAY-1 [GRAY77, CRAY79,RUSST8], remainder of the paper we will group code scheduling most of the interlock logic is localized to one segment and register allocation together and refer to them col- early in the pipeline and is referred to as "instruction lectively as "code scheduling". issue" logic. There are two different ways that code scheduling It is the purpose of this paper to highlight some of can be done. First, it can be done at compile time by the the tradeoffs that affect pipeline control, with particular software. We refer to this as "static" scheduling because emphasis on instruction issue logic. The primary vehicle it does not change as the program runs. Second, it can for this discussion is a simulation study of different be done by the hardware at run time. We refer to this as instruction issue methods with varying degrees of com- "dynamic" scheduling. These two methods are not mutu- plexity. These range from the simple and straightfor- ally exclusive. ward as in the GRAY-1 to the complex and sophisticated Most compilers for pipelined processors do some as in the CDC 6600 and the IBM 360/91 floating point unit form of static scheduling to avoid dependencies. This [TOMA67]. Each is used to implement the GRAY-1 scalar adds a new dimension to the optimization problems architecture, and each implementation is simulated faced by a compiler, and occasionally a programmer will using the 14 Lawrence IAvermore Loops [MCMA72] as hand code inner loops in assembly language to arrive at compiled by the Gray Research FORTRAN compiler (cgr). a better schedule than a compiler can provide. II0 0194-7111/84/0000/0110501.00© 1984 IEEE Issue Logic Complexity. By using complex issue algorithm we have devised to allow dynamic scheduling logic, dynamic scheduling of instructions can be while doing away with some of the associative compares achieved. This allows instructions to begin execution required by the other methods that perform dynamic "out-of-order" with respect to the compiled code scheduling. Each of the four CRAY-1 implementations is sequence. This has two advantages. First, it relieves simulated over the same set of benchmarks to allow per- some the burden on the compiler to generate a good formance comparisons. Section 6 contains a further dis- schedule. That is, performance is not as dependent on cussion on the relationship between software and the quality of the compiled code. Second, dynamic hardware code scheduling in pipelined processors, and scheduling at issue time can take advantage of depen- Section 7 contains conclusions. dency information that is not available to the compiler when it does static scheduling. Complex issue logic does 2. The CRAY-1 Architecture and Instruction Issue Algo- require longer control paths, however, which can lead to rithm a longer clock period. 2.1. Overview of the CRAY-1 Hardware Cost, Debugging, and Maintenance. Com- plex issue methods lead to additional hardware cost. The CRAY-I architecture and organization are used More logic is needed, and design time is increased. Com- throughout this paper as a basis for comparison. The plex control logic is also more expensive to debug and CRAY-1 scalar architecture is shown in Figure 1. It con- maintain. These problems are aggravated by issue sists of two sets of registers and functional units for (1) methods that dynamically schedule code because it may address processing and (2) scalar processing. The be difficult to reproduce exact issue sequences. address registers are partitioned into two levels: eight A registers and sixty four B registers. The integer add and multiply functional units are dedicated to address pro- 1.2. Historical PerspeeU~e cessing. Similarly, the scalar registers are partitioned It is interesting to review the way the above into two levels: eight S registers and sixty four T regis- tradeoffs have been dealt with historically. In the late ters. The B and T register files may be used as a 1950's and early 1960's there was rapid movement programmer-manipulated data cache, although this toward increasingly complex issue methods. Important feature is largely unused by the CFT compiler. Four milestones were achieved by STRETCH [BUCH62] in 1961 functional units are used exclusively for scalar process- and the CDC 6600 in 1964. Probably the most sophisti- ing. In addition, three floating point functional units are cated issue logic used to date is in the IBM 360/91 shared with the vector processing section (vector regis- [ANDE67], shipped in 1967. After this first rush toward ters are not shown in Fig. 1). more and more complex methods, there was a retreat toward simpler instruction issue methods that are still in use today. At CDC, the 7600 was designed to issue instructions in strict program sequence with no dynamic ~.~ T REGISTER scheduling. The clock period, however, was very fast, FILE even by today's standards. The more recent CRAY-1 and CRAY-XMP [CRAYS2] differ very little from the CDC7600 in the way they handle scalar instructions. The CDC CYBER205 [CDCS1] scalar unit is also very similar. At IBM, and later at Amdahl Corp., pipelined implementa- tions of the 360/370 architecture following the 360/91 REGIS~R have issued instructions strictly in order. FlU FILE As for the future, both the debug/maintenance problem and the hardware cost problem may be significantly alleviated by using VLSI where logic is much less expensive and where replaceable parts are such that fault isolation does not need to be as precise as with SSI. In addition, there is a trend toward moving software F'UNC1 problems into hardware, and code scheduling seems to be a candidate. Consequently, tradeoffs are shifting and instruction issue logic that dynamically schedules code deserves renewed study. 'LY AL 1.3. Paper Overview {IFT The tradeoffs just discussed lead to a spectrum of instruction issue algorithms. Through simulation we can look at the performance gains that are made possible by Figure 1 - The CRAY-1 Scalar Architecture dynamic code scheduling. Other issues like clock period and hardware cost and maintenance are more difficult The instruction set is designed for efficient pipeline and require detailed design and construction to make processing. Being a register-register architecture, only quantitative assessments.
Recommended publications
  • Contrpl Data Nos Version 2 Administration Handbook
    60459840 CONTRPL DATA NOS VERSION 2 ADMINISTRATION HANDBOOK /fP^v CDC® OPERATING SYSTEMS: CYBER 180 CYBER 170 CYBER 70 MODELS 71, 72, 73, 74 6000 REVISION RECORD T-gSZBZaSESEl jiito wminan REVISION DESCRIPTION Manual released; reflects NOS 2.3 at PSR level 617. Features include default charge restriction, (10-05-84) terminal input and output count at logoff, password randomization, a new CHARGE directive for the SUBMIT command, and support of the Mass Storage Archival Subsystem. Supports CYBER 180 computer systems. B Revision B reflects NOS 2.4.2 at PSR level 642. It incorporates new features such as support of (09-26-85) CYBER 180 Models 840, 850, 860, and 990, Printer Support Utility, and NAM Application Switching. Revision C reflects NOS 2.5.1 at PSR level 664. It documents the personal identification (09-30-86) v a l i d a t i o n , t h e s i n g l e t e r m i n a l s e s s i o n r e s t r i c t i o n , a n d o t h e r m i s c e l l a n e o u s t e c h n i c a l c h a n g e s . Revision D reflects NOS 2.6.1 at PSR level 700. It includes miscellaneous corrections and minor (04-14-88) additions. Publication No, 60459840 REVISION LETTERS I. O. Q. S. X AND Z ARE NOT USED. Address comments concerning this manual to: Control Data Technical Publications 4201 N.
    [Show full text]
  • Mi!!Lxlosalamos SCIENTIFIC LABORATORY
    LA=8902-MS C3b ClC-l 4 REPORT COLLECTION REPRODUCTION COPY VAXNMS Benchmarking 1-’ > .— u) 9 g .— mi!!lxLOS ALAMOS SCIENTIFIC LABORATORY Post Office Box 1663 Los Alamos. New Mexico 87545 — wAifiimative Action/Equal Opportunity Employer b . l)lS(”L,\l\ll K “Thisreport wm prcpmd J, an xcttunt ,,1”wurk ,pmwrd by an dgmcy d the tlnitwl SIdtcs (kvcm. mm:. Ncit her t hc llniml SIJIL.. ( Lwcrnmcm nor any .gcncy tlhmd. nor my 08”Ihcif cmployccs. makci my wur,nly. mprcss w mphd. or JwImL.s m> lcg.d Iululity ur rcspmuhdily ltw Ilw w.cur- acy. .vmplctcncs. w uscftthtc>. ttt”any ml’ormdt ml. dpprdl us. prudu.i. w proccw didowd. or rep. resent%Ihd IIS us wuukl not mfrm$e priwtcly mvnd rqdtts. Itcl”crmcti herein 10 my sp.xi!l tom. mrcial ptotlucr. prtxcm. or S.rvskc hy tdc mmw. Irdcnmrl.. nmu(a.lurm. or dwrwi~.. does nut mmwsuily mnstitutc or reply its mdursmwnt. rccummcnddton. or favorin: by the llniwd States (“mvcmment ormy qxncy thctcd. rhc V!C$VSmd opinmm d .mthor% qmxd herein do nut net’. UMrily r;~lt or died lhow. ol”the llnttcd SIJIL.S( ;ovwnnwnt or my ugcncy lhure of. UNITED STATES .. DEPARTMENT OF ENERGY CONTRACT W-7405 -ENG. 36 . ... LA-8902-MS UC-32 Issued: July 1981 G- . VAX/VMS Benchmarking Larry Creel —. I . .._- -- ----- ,. .- .-. .: .- ,.. .. ., ..,..: , .. .., . ... ..... - .-, ..:. .. *._–: - . VAX/VMS BENCHMARKING by Larry Creel ABSTRACT Primary emphasis in this report is on the perform- ance of three Digital Equipment Corporation VAX-11/780 computers at the Los Alamos National Laboratory. Programs used in the study are part of the Laboratory’s set of benchmark programs.
    [Show full text]
  • Conversion of the COBRA-IV-I Code from CDC CYBER to HP 9000/700 Version
    KAERI/TR-803/97 Conversion of the COBRA-IV-I Code from CDC CYBER to HP 9000/700 Version CDC CYBER Version COBRA-IV-I ^H HP 9000/700 Version ° 1997. 1 Korea Atomic Energy Research Institute £- JiJL>Hl- "CDC CYBER Version COBRA-IV-I 3.B.4) HP 9000/700 Version ±£-2\ ^^" o\] 1997 tf 71 «V Abstract COBRA-IV-I is a multichannel analysis code for the thermal-hydraulic analysis of rod bundle nuclear fuel elements and cores based on the subchannel approach. The existing COBRA-IV-I code is the Control Data Corporation (CDC) CYBER version, which has limitations on the computer core storage and gives some inconvenience to the user interface. To solve these problems, we have converted the COBRA-IV-I code from the CDC CYBER mainframe to an Hewlett Packard (HP) 9000/700-series workstation version, and have verified the converted code. As a result, we have found almost no difference between the two versions in their calculation results. Therefore we expect the HP 9000/700 version of the COBRA-IV-I code to be the basis for the future development of an improved multichannel analysis code under the more convenient user environment. COBRA-IV-I fe COBRA-IV-I £• CDC CYBER-§- SE CDC CYBER oflSlH COBRA-IV-I SHf HP 9000/700 HP 9000/700 -§--^- COBRA-IV-I 3-E.t: ^± Table of Contents Abstract 2 Abstract (in Korean) 3 Table of Contents 4 List of Tables 6 List of Figures 7 1. Introduction 8 2. Conversion of COBRA-IV-1 9 2.1 Precision 9 2.2 Convenience 12 2.3 Emulation 12 2.4 Syntax 13 2.5 Equivalence 17 3.
    [Show full text]
  • ABSTRACT CO-RESIDENT OPERATING SYSTEMS TASK TEAM REPORT April 181 1973
    ABSTRACT CO-RESIDENT OPERATING SYSTEMS TASK TEAM REPORT THIS DOCUI-lENT CONTAINS INFORl-lATiON PROPRIETARY TO THE NATIONAL CASH REGISTER COfoIPAflY AND CONTROL DATA CORPORATION. ITS CONTENTS SHALL NOT BE DIVULGED OUTSIDE OF EITHER CO~lPANY NOR REPROD~CED mTHOUT EXPLICIT PERloJISSION OF THE DIRECTOR AND GENERAL r-lANAGER. NCRICDC ADVANCED SYSTEfolS LABORATORY. April 18 1 1973 · 1 The mission of the co-resident oper.al"ing system task group was: "To define a cost effective mechamism within the new operating system to allow selected 01 d operating syst'ems to operate in their entirety permitting simultaneous execution with no modifications to the old operating system". Four people were assigned to this task, I:wo from NCR wifh ,expBrience on/he NCi\/Century and two from CDC experienced with CYI3ER 70 computer systems. The scope of the probl em was narrowed to a point where productive OLltput could be obtained in the allotted time frame. It was decided the' time could be spent most profitably in a detailad study of emulating the .NCR Century computer systems ,on the IPL.·· This decision was heavily influen'ced by a desire to deal with specific problems , ' ral·her·than'philosophi'cal-cmd incondusive'mattcrs. It was also:f.;:;lt thut The productivity of the team would be greater if it were kept s.mall •. Having limited the problom in scope, vario~sassumptions h.od t:)bo made bdore furi-h.:,:,r progress could be accompl ished. Key in this area wes the assumption thar. ;h:; probkm wos soluble. As a result I Ihe team concentroted on how to effect Iheimp\ernentation, as opposed j'O if if' was possibl e.
    [Show full text]
  • VMS 4.2 Service Begins
    r, University of Minnesota Twin Cities May 1986 VAX News VMS 4.2 Service Begins Marisa Riviere On April16 the VAX 8600, ACSS's passwords on the VAX 8600 are network prompt as described new VX system, became available the same as those you were using above. to our users. This larger, faster on the VAX 11/780 at the time of VAX permits us to offer new VMS the transfer. You may also have to change your services that were difficult or terminal parity; in VMS 4.2 the impossible to offer on the VA, a As previously announced in our parity default is even. smaller VMS machine, and will cost March Newsletter, we discontinued users 20 to 30 percent less than our VMS 3.6 service on the VA a the VA. few days after the VX became Training Software available. The ACSS additions to the VMS We plan to offer several on-line operating system have been Not all the software available on the training packages on the VX. As transferred to the VAX 8600. VAX 11/780 was transferred to the we go to press, two packages are There are, however, some VAX 8600. See the March available: the Introduction to VMS important differences between Newsletterorthe VMS42 writeup and the Introduction to EDT. (EDT VMS 4.2 (the operating system on on the VX for details. is the VMS line and full-screen the VX) and VMS 3.6 (the editor). To use e~her package, first operating system on the VA). type in the set term/vt100 Some changes are discussed later Logging On command.
    [Show full text]
  • Computer Architectures and Aspects of NWP Models
    Computer Architectures and Aspects of NWP models Deborah Salmond ECMWF Shinfield Park, Reading, UK Abstract: This paper describes the supercomputers used in operational NWP together with the computer aspects of NWP models. The IFS from ECMWF and the UM from the UK Met Office are used to illustrate these aspects being typical examples of a spectral model and a grid-point model. The use of message passing and OpenMP to obtain highly scalable parallelised NWP models is shown. Finally code optimization is briefly described. 1. Introduction This paper describes aspects of high-performance computer architectures that are relevant to Numerical Weather Prediction (NWP) model performance. Over the years high-performance computers have been developed based on either scalar or vector processors1. Each has its own particular issues that need to be considered when writing optimized code. For vector processors it is most important to vectorise as much as possible of the code and this can give performance improvements of the order of ten times. Scalar processors have an intermediate cache between the processors and the memory and it is important to consider locality of data and data access and reuse. High performance computers now have a large number of processors and the NWP model code has to be parallelised in a scalable way to take advantage of this. Initially computers with multiple processors were made with all processors connected with equal access to a single shared memory, for example the CRAY C90. This was followed by systems with multiple processors each with its own memory and connected together by a high speed interconnect.
    [Show full text]
  • (Is!!I%Losalamos Scientific Laboratory
    LA-8689-MS Los Alamos Scientific Laboratory Computer Benchmark Performance 1979 Y I I i PERMANENT RETENTION i i I REQUIRED BY CONTRACT ~- i _...._. -.. .— - .. ———--J (IS!!I%LOS ALAMOS SCIENTIFIC LABORATORY Post Office Box 1663 Los Alamos. New Mexico 87545 An AffKmtitive Action/I!qual Opportunity Employer t- Editedby MargeryMcCormick GroupC-2 I)lSCLAIM1.R Thh rqwtl was prtpued as an ‘mount or work sponwrcd by m aguncy or !he UnllcdSIws Gown. mcnt. Nclthcr the Unltcd Stales Govw,lmcnt nor any agency thcr.wf, nor any of lheif cmpluyws, mtkci my warranty, cipre!> or Impllcd, or assumes any Iqd USIWY or rc$ponslbdlty for the Iccur. ncv, mmplclcness, or u~cfulnc$s of any information, appm’’luq product, or process disclosed, oc r+ re$enh thit Its use would not Infrlngc privntcly owned r~hts. Rcfcrcncc hercln 10 my $I!CCUSCcorn. mercltl product, PIOCC$S,or SCNICCby trtdc name, Imdcmtrk, manufaL!urer, or o!hmwisc, does not necenarily mmtitute or imply its cndorscmcnt, retommcndation, or favoring by lhc United Stttcs Government or any agency thereof. The views and opinions of authors expressed herein do not ncc. c:mrily ttmtc or reflect those of the Unltcd Statct Government 01 my s;ency thcceof. UNITED STATES DEPARTMENT OF ENERGY CONTRACT W-7405 -ENG. 36 LA-8689-MS UC-32 Issued: February 1981 Los Alamos Scientific Laboratory Computer Benchmark Performance 1979 Ann H. Hayes Ingrid Y. Bucher — ----- . J E- -“ ““- ”’”-” .- ------- r- ------ . ..- . .. — ,.. LOS ALAMOS SCIENTIFIC LABORATORY COMPUTER BENCHMARK PERFOWCE 1979 by Ann H. Hayes and Ingrid Y. Bucher ABSTRACT Benchmarking of computers for performance evalua- tion and procurement purposes is an ongoing activity of the Research and Applications Group (Group C-3) at the Los Alamos Scientific Laboratory (LASL).
    [Show full text]
  • ' an Evaluation of ^0 Superminicomputers ^ for Thermal Analysis
    X NASA Technical Memorandum 83284 ' An Evaluation of ^0 Superminicomputers ^ for Thermal Analysis Olaf O. Storaasli, James B. Vidal, and Gary K. Jones APRIL 1982 NASA NASA Technical Memorandum 83284 An Evaluation of Superminicomputers for Thermal Analysis Olaf O. Storaasli Langley Research Center Hampton, Virginia James B. Vidal Digital Equipment Corporation Richmond, Virginia Gary K. Jones Goddard Space Plight Center Greenbelt, Maryland NASA National Aeronautics and Space Administration Scientific and Technical Information Branch 1982 INTRODUCTION The computer hardware on which thermal analysis is run now and will be run in the future is undergoing significant changes as shown in figure 1. The past and projected market share for mainframes, minicomputers, and microcomputers is shown on the left of figure 1 (refs. 1 and 2 ). The figure shows a dramatic decrease in market share for mainframe computers as the minicomputers and microcomputers become capable of solving problems formerly solved on mainframes only. Figure 1 shows on the right the most dramatic increase in revenues is projected to be for 32-bit minicomputers to reach 4.3 billion dollars by 1985. In a parallel effort, 32-bit microcomputers (CPU on a chip) with virtual memory and Winchester disk drives are being introduced (micromainframes) which promise to provide mainframe capability in smaller packages at significantly reduced cost (i.e., "VAX on a chip"). The cost vs. capability or "cost-effectiveness" is the driving factor in the choice of future computing capability. The economy of scale criteria used to justify large centralized computer complexes is being challenged by wide-scale use of inexpensive minicomputers which are proliferating in much the same way as hand-held calculators.
    [Show full text]
  • THE USE of MINICOMPUTERS in a DISTRIBUTED INFORMATION PROCESSING SYSTEM-A FEASIBILITY STUDY by Stanley M
    THE USE OF MINICOMPUTERS IN A DISTRIBUTED INFORMATION PROCESSING SYSTEM-A FEASIBILITY STUDY By Stanley M. Lon^will, Jesse M. McNellis, and Dou&las R. Posson U.S. GEOLOGICAL SURVEY Open-File Report 80—326 1980 UNITED STATES DEPARTMENT OF THE INTERIOR CECIL D. ANDRUS, Secretary GEOLOGICAL SURVEY H. William Menard, Director For additional information write to: Chief Hydrolo&ist 437 National Center U.S. Geological Survey, WRD Reston, Virginia 22092 TABLE OF CONTENTS I Introduction ................. 5 II Executive summary ............... 6 III Acquisition of prototype systems .......... 8 IV Prototype minicomputer configuration ......... 9 V Minicomputer installation and operations ........ 13 A. Installation of minicomputer systems ....... 13 1. Logistics ............... 13 2. Environment .............. 13 3. Other ................ 13 B. Operation ................ 14 1. Attended ............... 14 2. Unattended .............. 14 C. Hardware malfunctions ............ 15 1. CPU and memory ............. 15 2. Peripherals .............. 15 D. Hardware deficiencies ............ 16 1. CPU and memory ............. 16 2. Peripherals .............. 16 E. Software deficiencies ............ 17 1. System functions ............ 17 2. Processors .............. 17 a. FORTRAN .............. 17 b. Editor .............. 18 c. HASP emulator ............ 18 d. 200UT emulator ........... 18 e. Tape utilities ........... 18 F. Telecommunications ............. 18 G. Programming ................ 19 H. Training ................ 19 I. Maintenance ................ 20
    [Show full text]
  • Supercomputers: the Amazing Race Gordon Bell November 2014
    Supercomputers: The Amazing Race Gordon Bell November 2014 Technical Report MSR-TR-2015-2 Gordon Bell, Researcher Emeritus Microsoft Research, Microsoft Corporation 555 California, 94104 San Francisco, CA Version 1.0 January 2015 1 Submitted to STARS IEEE Global History Network Supercomputers: The Amazing Race Timeline (The top 20 significant events. Constrained for Draft IEEE STARS Article) 1. 1957 Fortran introduced for scientific and technical computing 2. 1960 Univac LARC, IBM Stretch, and Manchester Atlas finish 1956 race to build largest “conceivable” computers 3. 1964 Beginning of Cray Era with CDC 6600 (.48 MFlops) functional parallel units. “No more small computers” –S R Cray. “First super”-G. A. Michael 4. 1964 IBM System/360 announcement. One architecture for commercial & technical use. 5. 1965 Amdahl’s Law defines the difficulty of increasing parallel processing performance based on the fraction of a program that has to be run sequentially. 6. 1976 Cray 1 Vector Processor (26 MF ) Vector data. Sid Karin: “1st Super was the Cray 1” 7. 1982 Caltech Cosmic Cube (4 node, 64 node in 1983) Cray 1 cost performance x 50. 8. 1983-93 Billion dollar SCI--Strategic Computing Initiative of DARPA IPTO response to Japanese Fifth Gen. 1990 redirected to supercomputing after failure to achieve AI goals 9. 1982 Cray XMP (1 GF) Cray shared memory vector multiprocessor 10. 1984 NSF Establishes Office of Scientific Computing in response to scientists demand and to counteract the use of VAXen as personal supercomputers 11. 1987 nCUBE (1K computers) achieves 400-600 speedup, Sandia winning first Bell Prize, stimulated Gustafson’s Law of Scalable Speed-Up, Amdahl’s Law Corollary 12.
    [Show full text]
  • ADAPTATION of a PROGRAM for NONLINEAR FINITE ELEMENT ANALYSIS to the CDC STAR 100 COMPUTER* Allan B. Pifko Grumman Aerospace
    ADAPTATION OF A PROGRAMFOR NONLINEAR FINITE ELEMENT ANALYSIS TO THE CDC STAR 100 COMPUTER* Allan B. Pifko Grumman Aerospace Corporation Patricia L. Ogilvie Grumman Data Systems Corporation SUMMARY This paper discusses an effort to convert a nonlinear finite element program to the CDC STAR 100 pipeline computer. The program called DYCAST was developed under contract with NASA Langley Research Center for crash simulation of structures. Initial results with the STAR 100 computer indicate that significant gains in computation time are possible for operations on global arrays. However, for element level computations that do not lend themselves easily to long vector processing, the STAR 100 was slower than comparable scalar computers. On this basis it was concluded that in order for pipeline computers to impact the economic feasibility of large nonlinear analyses it is absolutely essential that algorithms be devised to improve the efficiency of element level computations. INTRODUCTION During the last decade, finite element methods, originally developed for linear structural analysis, have been extended to nonlinear problems. These developments have progressed to the point where the available methods are on a firm analytical basis and have been implemented in a number of general purpose computer codes. Since nonlinear analysis with the finite element method is essentially a successive linearization of a nonlinear problem, a given analysis completed in 'In" steps can require a level of computation associ- ated with 'In" linear finite element solutions. Consequently, the problem facing the analyst today is not so much whether the solution to a given problem can be obtained, but whether the computer cost can be afforded.
    [Show full text]
  • A Comparison of Processor Technologies
    Virginia Commonwealth University VCU Scholars Compass Theses and Dissertations Graduate School 1983 A comparison of processor technologies Eddie R. Wachter Follow this and additional works at: https://scholarscompass.vcu.edu/etd Part of the Computer Sciences Commons, and the Mathematics Commons © The Author Downloaded from https://scholarscompass.vcu.edu/etd/5617 This Thesis is brought to you for free and open access by the Graduate School at VCU Scholars Compass. It has been accepted for inclusion in Theses and Dissertations by an authorized administrator of VCU Scholars Compass. For more information, please contact [email protected]. College of Humanities and Sciences Virginia Commonwealth University This is to certify that the thesis prepared by Eddie R. Wachter has been approved by his committee as satisfactory completion of the thesis requirement for the degree of Master of Science. Dr. James E. Ames, IV Director of Thesis / Dr. Richard Allan Committee Member Dr. Francis R. Kane Committee Member Committee Member Dr. William Haver E. Department Chairman Dr. Elkse v. P. Smith Dean, College of Humanities and Sciences A Comparison of Processor Technologies A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science at Virginia Commonwealth University . by Eddie R. Wachter Director : Dr . James E. Ames, IV Affiliate As sistant Professor Department of f'.'Jathematica l Sciences Virginia Commonwealth University Richmond, Virginia May, 1983 ii Acknowledgements I wish to thank Dr . Ames for hi s support and assistance in preparing the study , Dr . Richard Allan for his experience and insights which guided me in the right directions, and above all , my family for understanding the time involved in pursuing a degree at night over the last four years .
    [Show full text]