Some Thoughts for Particle-In-Cell on Leadership Class Computers


Some thoughts for particle-in-cell on leadership class computers

W. B. Mori
University of California Los Angeles (UCLA)
Departments of Physics and Astronomy and of Electrical Engineering
Institute for Digital Research and Education
[email protected]

Much of the material from the Joule Metric study is from:
R. A. Fonseca, GoLP/IPFN, Instituto Superior Técnico, Lisboa, Portugal

W. B. Mori | IPAM, May 14th | PLWS 2012

My experimental colleagues' view of the world
The life of an experimentalist vs. the life of a simulationist.

But we face a tsunami of advances...
...in theory, in computational methods, in new hardware, and in new algorithms. Contrary to the perception of my experimental colleagues, we rarely get to "relax". This workshop offers us a rare opportunity to think in a "quiet room".

The evolution of state-of-the-art HPC systems cannot be ignored
[Figure: top system performance (MFlop/s) versus year, 1950-2010, from the IBM 704/709 and CDC 6600/7600 through the Cray-1, Cray X-MP, NEC SX-3, Intel ASCI Red, the Earth Simulator, IBM Blue Gene/L, IBM RoadRunner, Cray XT5-HE and Tianhe-1A. Jaguar (#2, 11/2010), a Cray XT5: Rpeak 2.33 PFlop/s, Rmax 1.76 PFlop/s.]
Memory also has increased!

Petascale computing has arrived and exascale is around the corner
Jaguar (jaguarpf):
• XT5 compute node
  – dual hex-core AMD Opteron 2435 (Istanbul) at 2.6 GHz
  – 16 GB DDR2-800 memory
  – SeaStar2+ network
• Complete system
  – 18,688 compute nodes, 224,256 processing cores
  – dedicated service/login nodes
  – 300 TB of memory
  – peak performance 2.3 PFlop/s

Outline
Introduction/Motivation
The particle-in-cell approach
Code developments for HEDP modeling in HPC systems
Conclusions and perspectives

Advanced ICF modeling is demanding due to the different scales involved, e.g. fast ignition or shock ignition
[Figure: typical compressed target, with H2 gas jet, CO2 laser pulse (duration 10^2 ps - 10 ns) and 400 nm driver pulse.]
Computational requirements for PIC:
• Physical size: box size 1 mm; cell size 5 Å; duration 10 ps; time step 1 as (10^-18 s)
• Numerical size: # cells/dim 2x10^6; # particles/cell 100 (1D), 10 (2D), 1 (3D); # time steps 10^6; particle push time ~1 µs
• Computational time (the arithmetic is sketched below):
  – 1D: 2x10^3 CPU days
  – 2D: 5x10^8 CPU days ~ 10^6 CPU years
  – 3D: 2x10^11 CPU days ~ 7x10^8 CPU years
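These cost figures follow from a simple product: (cells per dimension)^d × particles per cell × time steps × push time. A minimal sketch of that arithmetic, assuming the ~1 µs per particle push quoted above; it reproduces the 1D and 2D entries:

```python
# Back-of-the-envelope PIC cost estimate (assumes ~1 us per particle push).
cells_per_dim = 2e6       # number of cells along each dimension
steps = 1e6               # number of time steps
push_time = 1e-6          # seconds per particle push (assumed ~1 us)

for dim, ppc in [(1, 100), (2, 10)]:
    particles = cells_per_dim**dim * ppc          # total macro-particles
    seconds = particles * steps * push_time       # total CPU time in seconds
    days = seconds / 86400.0
    print(f"{dim}D: {days:.1e} CPU days ({days / 365:.1e} CPU years)")

# Output: 1D ~ 2.3e+03 CPU days, 2D ~ 4.6e+08 CPU days (~1.3e+06 CPU years),
# in line with the slide's 2x10^3 and 5x10^8 CPU-day estimates.
```

Even with only one particle per cell, the 3D estimate is hundreds of millions of CPU years, which is what motivates both reduced models and extreme scalability of the full PIC algorithm.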
What is high energy density plasma physics? Why are both plasma-based acceleration and the nonlinear optics of plasmas considered high-energy-density plasma research?
• High energy density means high pressure.
  – What is a high pressure? MBar? GBar?
  – We need a dimensionless parameter.
• In plasma physics an important parameter is the number of particles in a Debye sphere (which is directly related to the ratio of the kinetic energy of an electron to the potential energy between particles). It measures the discreteness of the plasma:

$$ N_D \equiv \frac{4\pi}{3}\, n \lambda_D^3 = 2.1\times 10^{3}\,\frac{T_{\mathrm{keV}}^{2}}{P_{\mathrm{MBar}}^{1/2}} $$

• When the pressure exceeds ~1 MBar the discrete nature of the plasma becomes important:
  – N_D is not "infinite"
  – discreteness makes developing computational methods difficult

How can we describe the plasma physics?
Hierarchy of descriptions (ignoring quantum effects):
‣ Klimontovich equation ("exact")
‣ Ensemble average the Klimontovich equation
  • leads to the Vlasov Fokker-Planck equation (approximate)
‣ Take the limit that N_D is very, very large
  • Vlasov equation
‣ Take moments of the Vlasov or Vlasov Fokker-Planck equation
  • "two"-fluid (transport) equations
‣ Ignore electron inertia
  • single-fluid or MHD equations

Klimontovich description ("exact")
Klimontovich equation:

$$ \frac{D}{Dt} F = 0, \qquad F(\mathbf{x},\mathbf{v};t) = \sum_{i=1}^{N} \delta(\mathbf{x}-\mathbf{x}_i(t))\,\delta(\mathbf{v}-\mathbf{v}_i(t)) $$

$$ \frac{D}{Dt} \equiv \frac{\partial}{\partial t} + \mathbf{v}\cdot\nabla_{x} + \mathbf{a}\cdot\nabla_{v}, \qquad \mathbf{a} \equiv \frac{d\mathbf{v}}{dt} = \frac{q}{m}\left(\mathbf{E} + \frac{\mathbf{v}}{c}\times\mathbf{B}\right) $$

Maxwell's equations:

$$ \frac{\partial \mathbf{E}}{\partial t} = c\nabla\times\mathbf{B} - 4\pi\mathbf{J}, \qquad \frac{\partial \mathbf{B}}{\partial t} = -c\nabla\times\mathbf{E}, \qquad \mathbf{J}(\mathbf{x},t) = \int d\mathbf{v}\, q\mathbf{v}\, F(\mathbf{x},\mathbf{v},t) $$

Descriptions of a plasma
It is not practical to follow every particle, so take an ensemble average of the Klimontovich equation:

$$ F = F_s + \delta F,\quad F_s \equiv \langle F\rangle, \qquad \mathbf{J} = \mathbf{J}_s + \delta\mathbf{J}, \qquad \mathbf{B} = \mathbf{B}_s + \delta\mathbf{B}, \qquad \mathbf{E} = \mathbf{E}_s + \delta\mathbf{E} $$

Equations for the smooth F and the fluctuation:

$$ \frac{D}{Dt} F_s = -\left\langle \delta\mathbf{a}\cdot\nabla_{v}\,\delta F\right\rangle \equiv \left(\frac{dF_s}{dt}\right)_{\mathrm{col}} $$

$$ \frac{D}{Dt}\,\delta F = -\,\delta\mathbf{a}\cdot\nabla_{v} F_s - \delta\mathbf{a}\cdot\nabla_{v}\,\delta F + \left\langle \delta\mathbf{a}\cdot\nabla_{v}\,\delta F\right\rangle $$

where

$$ \mathbf{a}_s = \frac{q}{m}\left(\mathbf{E}_s + \frac{\mathbf{v}}{c}\times\mathbf{B}_s\right), \qquad \frac{D}{Dt} = \frac{\partial}{\partial t} + \mathbf{v}\cdot\nabla_{x} + \mathbf{a}_s\cdot\nabla_{v} $$

Descriptions of a plasma: where does the Vlasov equation fit into this?
• Smooth-F descriptions:
  – Vlasov Fokker-Planck:
$$ \frac{D}{Dt} F_s = -\left\langle \delta\mathbf{a}\cdot\nabla_{v}\,\delta F\right\rangle \equiv \left(\frac{dF_s}{dt}\right)_{\mathrm{col}}, \qquad \frac{D}{Dt}\,\delta F = -\,\delta\mathbf{a}\cdot\nabla_{v} F_s $$
    (only true if N_D >>> 1; sometimes called the quasi-linear approximation)
  – Vlasov equation:
$$ \frac{D}{Dt} F_s = 0 $$

What computational method do we use to model HEDP including discrete effects?
The particle-in-cell method. Not all PIC codes are the same!
Each time step Δt, the PIC loop advances particles and fields in four phases (a minimal code sketch follows below):
1. Integration of the equations of motion (push particles): dp/dt = q(E + v/c × B); F_k → u_k → x_k
2. Depositing: (x, u)_k → J_ij
3. Integration of the field equations on the grid: ∂E/∂t = c∇×B − 4πJ, ∂B/∂t = −c∇×E; J_ij → (E, B)_ij
4. Interpolating: (E, B)_ij → F_k
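To make the loop concrete, here is a minimal, illustrative PIC skeleton in Python. It is not the OSIRIS algorithm: for brevity it is a 1D electrostatic variant (charge deposit plus a Poisson solve instead of a current deposit plus Faraday/Ampère updates, and a non-relativistic push), and the function names `deposit`, `solve_field`, and `gather` are placeholders of this sketch, not real OSIRIS routines.

```python
import numpy as np

# Minimal 1D electrostatic PIC sketch (illustrative only): linear (CIC)
# weighting, FFT-based Poisson solve, simple explicit push, periodic box.
ng, L, dt = 64, 2 * np.pi, 0.1         # grid points, box length, time step
dx = L / ng
npart = 20 * ng                        # macro-particles
qm = -1.0                              # electron charge-to-mass ratio
q = -L / npart                         # macro-particle charge (unit density)

x = (np.arange(npart) + 0.5) * L / npart        # uniform positions
v = 0.01 * np.sin(2 * np.pi * x / L)            # small velocity perturbation

def deposit(x):
    """Scatter particle charge to the grid with linear (CIC) weights."""
    rho = np.zeros(ng)
    g = x / dx
    i = np.floor(g).astype(int) % ng
    frac = g - np.floor(g)
    np.add.at(rho, i, q * (1 - frac) / dx)
    np.add.at(rho, (i + 1) % ng, q * frac / dx)
    return rho + 1.0                   # uniform neutralizing ion background

def solve_field(rho):
    """Solve dE/dx = rho spectrally (normalized Gauss's law, periodic box)."""
    k = 2 * np.pi * np.fft.fftfreq(ng, d=dx)
    rho_k = np.fft.fft(rho)
    E_k = np.zeros_like(rho_k)
    E_k[1:] = rho_k[1:] / (1j * k[1:])          # drop the k = 0 mode
    return np.real(np.fft.ifft(E_k))

def gather(E, x):
    """Interpolate the grid field back to particle positions (same weights)."""
    g = x / dx
    i = np.floor(g).astype(int) % ng
    frac = g - np.floor(g)
    return E[i] * (1 - frac) + E[(i + 1) % ng] * frac

for step in range(200):
    rho = deposit(x)                   # 1. deposit particle sources on the grid
    E = solve_field(rho)               # 2. solve/advance fields on the grid
    v += qm * gather(E, x) * dt        # 3. gather fields to particles, push v
    x = (x + v * dt) % L               # 4. push positions (periodic wrap)

print("final field energy:", 0.5 * np.sum(E**2) * dx)
```

Each pass through the loop performs the same four phases as the electromagnetic case described above: scatter particle sources to the grid, solve the field equations on the grid, gather the fields back to the particle positions, and push the particles.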
What is the particle-in-cell model? Is it an efficient way of modeling the Vlasov equation?
No. It is a Klimontovich description for finite-size (macro-)particles.
Klimontovich equation of macro-particles:

$$ \frac{D}{Dt} F = 0, \qquad F(\mathbf{x},\mathbf{v};t) = \sum_{i=1}^{N} S_p(\mathbf{x}-\mathbf{x}_i(t))\,\delta(\mathbf{v}-\mathbf{v}_i(t)) $$

$$ \frac{D}{Dt} \equiv \frac{\partial}{\partial t} + \mathbf{v}\cdot\nabla_{x} + \mathbf{a}\cdot\nabla_{v}, \qquad \mathbf{a} \equiv \frac{d\mathbf{v}}{dt} = \frac{q}{m}\left(\mathbf{E} + \frac{\mathbf{v}}{c}\times\mathbf{B}\right) $$

together with Maxwell's equations:

$$ \frac{\partial \mathbf{E}}{\partial t} = c\nabla\times\mathbf{B} - 4\pi\mathbf{J}, \qquad \frac{\partial \mathbf{B}}{\partial t} = -c\nabla\times\mathbf{E}, \qquad \mathbf{J}(\mathbf{x},t) = \int d\mathbf{v}\, q\mathbf{v}\, F(\mathbf{x},\mathbf{v},t) $$

The only change from the exact Klimontovich description is that the point-particle delta function in x is replaced by the finite particle shape S_p.

Does it matter that we agree on what it is? I think it does.
• Yes. It changes your view of what convergence tests should be done to VERIFY the implementation of the algorithm.
• Yes. Today's computers are getting very powerful. We can do runs with smaller particle sizes (compared to the Debye length) and with more particles, converging to "real" conditions.
• Yes. A single PIC simulation is not an ensemble average, so for some problems perhaps we should start thinking about taking ensemble averages.
• Yes. PIC simulations can be used to VALIDATE reduced models.

OSIRIS 2.0
osiris framework:
• massively parallel, fully relativistic particle-in-cell (PIC) code
• visualization and data analysis infrastructure
• developed by the osiris.consortium ⇒ UCLA + IST
New features in v2.0:
• optimized high-order splines
• binary collision module
• PML absorbing BC
• tunnel (ADK) and impact ionization
• dynamic load balancing
• parallel I/O
• hybrid options
Contacts: Ricardo Fonseca ([email protected]); Frank Tsung ([email protected])
http://cfp.ist.utl.pt/golp/epp/
http://exodus.physics.ucla.edu/

Want to discuss what performance is currently possible with PIC; this sets a bar for other kinetic approaches for HEDP. Results from a Joule Metric exercise on Jaguar:
• ~30% of peak speed can be obtained
• dynamic load balancing can be important

High-order particle weighting
Simulations are run for up to ~10^7 iterations, so numerical self-heating must be controlled (stable algorithm, energy conservation, numerical-noise-seeded instabilities). Two remedies:
• increase the number of particles per cell
• use high-order particle weighting
[Figure: energy conservation versus iteration number (10^0 to 10^5) for 1, 4, 16, and 64 particles per cell, comparing linear and quadratic particle weighting.]

Calculations overhead
Interpolation weights for linear, quadratic, and cubic (B-spline) particle shapes, where Δ is the particle offset within the cell (a code sketch of these weights follows below):
• linear (2 points): $W_0 = 1-\Delta$, $W_1 = \Delta$; $A_{\mathrm{int}} = \sum_{i=0}^{1} W_i A_i$
• quadratic (3 points): $W_{-1} = \tfrac{1}{8}(1-2\Delta)^2$, $W_0 = \tfrac{3}{4}-\Delta^2$, $W_1 = \tfrac{1}{8}(1+2\Delta)^2$; $A_{\mathrm{int}} = \sum_{i=-1}^{1} W_i A_i$
• cubic (4 points): $W_{-1} = \tfrac{1}{6}(1-\Delta)^3$, $W_0 = \tfrac{1}{6}(4-6\Delta^2+3\Delta^3)$, $W_1 = \tfrac{1}{6}(1+3\Delta+3\Delta^2-3\Delta^3)$, $W_2 = \tfrac{1}{6}\Delta^3$; $A_{\mathrm{int}} = \sum_{i=-1}^{2} W_i A_i$
[Figure: B-spline blending functions S(x) for x from -2 to 2 cells.]
Operation counts per particle:
  Ops   Linear   Quadratic   Cubic
  2D    219      480         890
  3D    349      1180        2863
Measured performance: 3D quadratic is ~2.0x slower than 3D linear (the operation count alone would suggest ~3.4x), thanks to pipeline and cache effects.
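The weight table above translates directly into code. A minimal sketch using the standard B-spline weights as written above (the `interpolate` helper and its `offset` argument are illustrative, not OSIRIS routines), including a check that each weight set sums to one for any Δ:

```python
import numpy as np

# Interpolation weights for linear, quadratic and cubic particle shapes,
# written directly from the table above (d = particle offset in the cell).
# Note: in practice the quadratic shape is usually evaluated with the offset
# measured from the nearest grid point; the weights sum to 1 either way.
def weights_linear(d):
    return np.array([1 - d, d])

def weights_quadratic(d):
    return np.array([(1 - 2*d)**2 / 8, 0.75 - d**2, (1 + 2*d)**2 / 8])

def weights_cubic(d):
    return np.array([(1 - d)**3 / 6,
                     (4 - 6*d**2 + 3*d**3) / 6,
                     (1 + 3*d + 3*d**2 - 3*d**3) / 6,
                     d**3 / 6])

def interpolate(A, i, d, weights, offset):
    """Interpolate a gridded field A to a particle at grid index i + d."""
    w = weights(d)
    idx = (i + offset + np.arange(len(w))) % len(A)   # periodic grid
    return np.dot(w, A[idx])

# Sanity check: every weight set is a partition of unity (sums to 1).
for d in np.linspace(0.0, 0.99, 5):
    assert abs(weights_linear(d).sum() - 1) < 1e-12
    assert abs(weights_quadratic(d).sum() - 1) < 1e-12
    assert abs(weights_cubic(d).sum() - 1) < 1e-12

A = np.sin(np.linspace(0, 2 * np.pi, 32, endpoint=False))
print(interpolate(A, 5, 0.3, weights_linear, offset=0))      # 2-point stencil: i, i+1
print(interpolate(A, 5, 0.3, weights_quadratic, offset=-1))  # 3-point stencil: i-1 .. i+1
print(interpolate(A, 5, 0.3, weights_cubic, offset=-1))      # 4-point stencil: i-1 .. i+2
```

Going from linear to quadratic or cubic weights raises the per-particle operation count substantially (the 2D/3D table above), which is why a measured slowdown of only ~2.0x for 3D quadratic, helped by pipeline and cache effects, is better than the raw operation count would suggest.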