Chapter 5. Driving Forces on Packaging: Physical Interconnects

The single most important element of the package and interconnect that influences the system clock speed, the performance density, and often the cost of a system is how close together the devices can be placed. The term that best describes this is packaging efficiency. The three fabrication and assembly issues that constrain the packaging efficiency are:

• the I/O off the chip: the number, the form factor, the pitch
• the I/O off the package: the number, the form factor, the pitch
• the interconnect substrate: the pad pitch, the via pitch, the line pitch, the number of layers

Pin Count Requirements and Rent's Rule

As the number of gates on a single die has increased, the number of I/Os required to interface them to the outside world has also increased. In 1960, E.F. Rent of IBM identified a definite empirical relationship between the number of gates in a block, N_gates, and the number of I/Os they required, N_I/O. This relationship, since coined Rent's Rule, has been extended and generalized to encompass a variety of chip types and module sizes. Figure 5-1 is a plot of the signal I/Os required for various gate arrays and microprocessors.

In general, for every 4 to 10 signal I/Os, one power or one ground is used. As the clock frequency goes up, a higher fraction of power and ground pads is required to keep switching noise at acceptable levels. For clock frequencies over 250MHz, the ratio is closer to 3:1 signal to power/ground pads.

Empirically, the Rent's Rule relationship between total signal I/Os required and gate count is:

    N_I/O = k × (N_gates)^p

where k and p are constants that depend on the architecture and the partitioning. Rent's Rule graphically shows the ever greater demand for more I/Os as the number of gates increases.

For the case of microprocessors: k = 0.82, p = 0.45
For the case of already committed gate arrays: k = 1.9, p = 0.5
The original values that Rent found applicable for his systems were: k = 2.5, p = 0.61

Figure 5-1. I/O Pin Count Versus Complexity (log-log plot of the number of signal pins versus the number of gates or bits for bipolar gate arrays, CMOS gate arrays, microprocessors, SRAMs, and DRAMs). Source: University of Arizona/ICE, "Roadmaps of Packaging Technology"

Rent's Rule is a loose, empirical rule of thumb that can be used to roughly predict the number of I/Os a chip or module will need as the number of committed gates increases. Care must be exercised in using it. There is an explicit assumption that the partitioning will remain the same as the gate count increases. Of course, at some point, the functionality must reach the point where the I/O count decreases. After all, even the largest computer has at most a few hundred system-level I/Os for disk drive and keyboard access. The point at which the I/O requirements begin to fall under Rent's Rule is a measure of the degree of partitioned functionality.

Decreasing the I/O count off chip, without resorting to multiplexing or degrading signal integrity, will always increase the performance density per unit cost, if only by reducing the component and assembly costs.
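
To make the scaling concrete, here is a minimal Python sketch that evaluates Rent's Rule with the three coefficient sets quoted above. The function name and the example gate counts are illustrative additions, not part of the original text.

# Rent's Rule: N_io = k * N_gates**p  (coefficient sets quoted in the text)
RENT_COEFFICIENTS = {
    "microprocessors":         (0.82, 0.45),
    "committed gate arrays":   (1.9,  0.5),
    "Rent's original systems": (2.5,  0.61),
}

def rent_io(n_gates, k, p):
    """Predicted number of signal I/Os for a block of n_gates gates."""
    return k * n_gates ** p

if __name__ == "__main__":
    for label, (k, p) in RENT_COEFFICIENTS.items():
        for n_gates in (1_000, 10_000, 100_000, 1_000_000):
            print(f"{label:24s} {n_gates:>9,} gates -> "
                  f"{rent_io(n_gates, k, p):7.0f} signal I/Os")

For example, at one million gates the microprocessor coefficients give roughly 410 signal I/Os, while the committed-gate-array coefficients give about 1,900.
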
Partitioning of gates and chips into functional units that will bring the I/O count under Rent's Rule is a critically important step in the design cycle. Integrating more functionality on chip will most often increase the performance/cost.

As the gate count available on gate arrays increases, the required I/O count will increase. From the gate density, D_gate, in gates/in², the required I/O count for any die size can be calculated with Rent's Rule. For gate arrays (k = 1.9, p = 0.5), the equation would be:

    N_I/O = k × sqrt(D_gate × A_chip) = 1.9 × sqrt(N_gates)

In Figure 5-2, this estimate of required I/O count is compared with a few representative examples of late-1980s vintage gate array families. It is clear that a consequence of smaller design rules and denser gates is more I/O required for the same size chips! This means either finer pitch peripheral I/O or switching to the more efficient area array I/O.

Figure 5-2. I/O Required Increases with Gate Density (required I/O versus the square root of die size, in inches, for CMOS gate densities from 10K to 1,000K gates/in²; annotations mark one-micron CMOS, 486-class (500K gates/in²), and future-generation densities). Source: ICE, "Roadmaps of Packaging Technology"

For gate arrays fabricated with deep submicron design rules, the integration levels are high enough, and bussed I/O are sufficiently prevalent, that Rent's Rule no longer applies. For example, the 1996-generation ASICs, at 0.35 micron design rules, have a gate density of roughly 0.4M gates/cm². The largest ASIC, 18mm on a side, would have about 3.2M gates and, by Rent's Rule, require 3,400 I/O!

By comparison, the LSI Logic G10 family of gate arrays, at 0.35 micron design rules and slightly under 18mm on a side, has a maximum usable number of gates of about 2.5 million. However, it has a capacity of only about 800 I/O. This is significantly lower than the prediction from Rent's Rule. Part of the reason is that 800 I/O were all that could be practically interconnected with current-generation wirebonding equipment; this family is pad limited. Even so, there would never be designs requiring 3,400 I/O, because of the high integration levels and the high level of functionality. Designs of greater than 200,000 gates are approaching the system-on-a-chip, and begin to fall significantly under Rent's Rule.

The coefficients for Rent's Rule were derived empirically from a study of gate arrays built in the 1970s and 1980s. The predictions using these coefficients are a measure of the number of signal I/O needed to fully utilize all the gates as part of a larger "random" logic system. When there is significant integration and mostly busses as the interface, Rent's Rule should be used as an upper limit on the required number of I/O. Chips with this high an integration level do not have the same interconnect requirements as random logic.
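
As a quick check of that arithmetic, the sketch below applies the committed-gate-array coefficients to the gate counts quoted above. Treating the result as an upper limit is the point of the comparison; the 800-I/O figure for the G10 is taken from the text, not computed.

def rent_upper_limit(n_gates, k=1.9, p=0.5):
    """Rent's Rule estimate with committed-gate-array coefficients.

    For highly integrated, bus-oriented chips this is an upper limit
    on the signal I/O actually required, not a requirement.
    """
    return k * n_gates ** p

# Largest 1996-generation 0.35-micron ASIC quoted in the text: ~3.2M gates
print(round(rent_upper_limit(3.2e6)))   # ~3,400 I/O predicted

# LSI Logic G10: ~2.5M usable gates, but only ~800 I/O in practice
print(round(rent_upper_limit(2.5e6)))   # ~3,000 predicted vs ~800 actual
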
The SIA roadmap for pin count takes into account the impact from higher integration levels and the implementation of bussed I/O. Figure 5-3 contrasts the SIA prediction for I/O with the Rent's Rule prediction, based on the SIA values for chip size and gate density. The large and growing discrepancy is a measure of the impact from the two factors of integration functionality and bussing. Even so, the off-chip I/O count is predicted to grow considerably.

Figure 5-3. SIA Off-Chip I/O Compared to Rent's Rule (off-chip I/O versus year of introduction, 1994 to 2010, comparing the Rent's Rule prediction, the SIA prediction, and Rent's Rule refit to the SIA values). Source: ICE, "Roadmaps of Packaging Technology"

Based on the SIA roadmap predictions, we can estimate new values for the Rent's Rule coefficients that match the SIA roadmap. A good approximation is obtained using:

    k = 0.2
    p = 0.5

The match of Rent's Rule with these coefficients to the projected SIA predictions is also shown in Figure 5-3.

IMPLEMENTING OFF CHIP INTERCONNECTS

There are two configurations for I/Os off a chip:

1. a single row, or two staggered rows, of pads around the periphery
2. an array of pads on a grid over the surface of the die

The maximum number of pads, N_pads, on a chip is constrained by the pad pitch and the perimeter for peripheral I/O,

    N_pads = 4 × L_chip / P_pads

and by the grid pitch and the chip area for an area array,

    N_pads = (L_chip)² / (P_pads)²

where L_chip is the die edge length and P_pads is the pad pitch. This is diagrammed in Figure 5-4. The number of pads allowed by these two approaches is shown in Figure 5-5 for various pitches and for one and two rows of pads.

Using the die sizes and pin count predictions of the SIA roadmap, the pad pitches that would be needed can be estimated. For example, if a single peripheral row is used, a pad pitch of 80 microns is required for current-generation ASICs. This is right at the capability of 1996 wirebonding in volume production. To meet future ASIC needs, this pitch must steadily decrease. In contrast, if the I/O were on an area array, the pitch would only have to be 600 microns, or 24 mils, a much more realistic requirement. This is a strong driving force for area array off-chip I/O. In addition to accommodating a higher I/O count without heroic mechanical feats, area array also offers the opportunity for better electrical performance by allowing more power and ground pads distributed over the surface of the chip, where they are needed the most.
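
To see why the area array relieves the pitch requirement, the following sketch inverts the two pad-count formulas to give the pitch needed for a chosen pad count and die size. The 1,000-pad, 18mm-square example is illustrative only, picked to be near the ASIC dimensions discussed above rather than taken from the SIA roadmap.

import math

def peripheral_pitch_um(n_pads, die_edge_mm):
    """Pad pitch (microns) for a single peripheral row: P = 4 * L / N."""
    return 4 * die_edge_mm * 1000.0 / n_pads

def area_array_pitch_um(n_pads, die_edge_mm):
    """Grid pitch (microns) for a full area array: P = L / sqrt(N)."""
    return die_edge_mm * 1000.0 / math.sqrt(n_pads)

# Illustrative example: roughly 1,000 pads on an 18mm-square die.
n_pads, die_edge_mm = 1000, 18.0
print(f"single peripheral row: {peripheral_pitch_um(n_pads, die_edge_mm):5.0f} um pitch")
print(f"full area array:       {area_array_pitch_um(n_pads, die_edge_mm):5.0f} um pitch")

With these numbers the peripheral row needs roughly a 72-micron pitch while the area array needs only about 570 microns, the same order-of-magnitude gap as the 80-micron versus 600-micron figures quoted above.
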