Developing supercomputers and computational physics
Akira Ukawa Center for Computational Physics University of Tsukuba
Outline:
- Academic HPC environment in Japan
- Center for Computational Physics
- Lattice QCD on the CP-PACS computer
- CP-PACS/GRAPE-6 complex and galaxy formation simulation
- National networks and beyond
1 Academic HPC environment in Japan
Supercomputer resources viewed in the Top 500 list
Japanese systems in the top 100:
- June 1997: 30 systems, 23% of total Rmax
- June 2002: 20 systems, 37% of total Rmax

[Figure: Rmax (Gflops) vs. rank for the Top500 lists of June 1997 and June 2002; Japanese entries highlighted, including CP-PACS at CCP (1997) and the Earth Simulator at JAMSTEC (2002), among systems from the USA, Germany, the UK, and France.]

2 Supercomputers classified by affiliations
- Computer centers of major universities
- Academic research institutes/organizations: each major field has its own supercomputer facility, e.g., high energy physics, condensed matter physics, astrophysics, genetics/bioinformatics
- Government labs: AIST/JAERI/NRIM/…
- The Earth Simulator belongs to JAMSTEC (Japan Marine Science and Technology Center)
3 Resource allocation to users
Computer centers of universities:
- charged by CPU time and disk usage
- special low rates for large-scale usage

Academic research institutes/organizations:
- application by users, selection by a peer committee
- typical large-scale allocations:

  organization (field) | peak | max allocation | #projects/year
  KEK (high energy) | 1.2 Tflops | decided case by case | 20
  NAO (astrophysics) | 0.6 Tflops | 1 week of full system | 20
  ISSP (condensed matter) | 0.6 Tflops | 1 week of full system | 100

Government labs:
- mission-oriented; limited access by outside users
- Earth Simulator: application and selection; 27 projects this year (atmosphere/ocean 15; earth 8; computer science 3; others 1)
4 System procurement and upgrade
- Installed under rental contract with vendors
- Contract renewed every 5-6 years
- System/vendor selected by bidding under strict government regulations
- Rental budget part of the annual budget of the installation sites
System renewal every 5-6 years has successfully provided a high level of HPC resources to general scientific users.
In Tsukuba, a somewhat different tradition has been pursued…
5 Center for Computational Physics, University of Tsukuba
- Founded in 1992
- Emphasis on:
  - development of HPC systems suitable for computational physics
  - close collaboration of physicists and computer scientists
Computing facility (City of Tsukuba):
- CP-PACS parallel system: MPP with 2048 PU / 0.6 Tflops peak; developed at the Center with Hitachi Ltd.; #1 of the Top500, November 1996
- GRAPE-6 system: dedicated to gravity calculations; developed at U. Tokyo; 8 Tflops equivalent
6 History of PACS/PAX computers at Tsukuba
Long history (since the late '70s) of developing parallel machines for scientific calculations.

[Photos: PU board of PACS-9 (1978); PAX-128 (1983) with control unit]
[Photo: CP-PACS (1996)]

Year | Machine | Performance | #PU | Memory | CPU/MPU
1978 | PACS-9 | 7 KFLOPS | 9 | | M6800
1980 | PAX-32 | 0.5 MFLOPS | 32 | 576 KB | M6800/AM9511
1983 | PAX-128 | 4 MFLOPS | 128 | 3 MB | M68B00/AM9511-4
1984 | PAX-32J | 3 MFLOPS | 32 | 4 MB | DCJ-11
1989 | QCDPAX | 14 GFLOPS | 480 | 3 GB | M68020/L64133
1996 | CP-PACS | 307 GFLOPS | 1024 | 64 GB | extended PA-RISC 1.1
1997 | CP-PACS | 614 GFLOPS | 2048 | 128 GB | extended PA-RISC 1.1
[Photos: PU array of PAX-32 (1980); PAX-32J (1984); a cabinet of QCDPAX; QCDPAX (1989)]
7 CP-PACS parallel computer
- MPP with 2048 compute nodes and 128 I/O nodes
- 0.6 Tflops peak, 0.3 Gflops/node, 16x16x17 node array, 3-dim crossbar network, 0.3 GB/sec throughput per link
- Built by a joint effort of computer scientists (about 15 members), physicists (about 15 members), and the vendor (Hitachi Ltd.)
- Developed into Hitachi's SR2201/SR8000 series
- Architecture development: PVP-SW (vector processing on a RISC processor); remote DMA (zero-copy data transfer between nodes)
- Detailed benchmarking on real applications (QCD)
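On a 3-dim crossbar, a message can reach any node by correcting one axis at a time, so any pair of nodes is at most three crossbar hops apart. A minimal sketch of rank/coordinate mapping and dimension-ordered routing on the 16x16x17 array quoted above (the x-fastest layout and function names are illustrative assumptions, not the machine's actual numbering scheme):

```python
# Illustrative sketch: node addressing on a 3-dim crossbar array.
# DIMS follows the 16x16x17 figure quoted above; the linear layout
# (x fastest) is an assumption for illustration.
DIMS = (16, 16, 17)

def rank_to_coord(rank, dims=DIMS):
    """Linear rank -> (x, y, z), with x varying fastest."""
    x = rank % dims[0]
    y = (rank // dims[0]) % dims[1]
    z = rank // (dims[0] * dims[1])
    return (x, y, z)

def coord_to_rank(coord, dims=DIMS):
    """(x, y, z) -> linear rank; inverse of rank_to_coord."""
    x, y, z = coord
    return x + dims[0] * (y + dims[1] * z)

def route(src, dst):
    """Dimension-ordered path: fix x, then y, then z; at most 3 hops."""
    path = [src]
    cur = list(src)
    for axis in range(3):
        if cur[axis] != dst[axis]:
            cur[axis] = dst[axis]
            path.append(tuple(cur))
    return path
```

The same dimension-ordered idea underlies crossbar and mesh routing in many MPPs; here it only illustrates why three crossbar stages suffice.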
8 Computational physics with CP-PACS
Concentrated usage on a few fundamental physics problems which demand large-scale calculations:
High energy physics; QCD
Astrophysics; Radiation hydrodynamics
Condensed matter; phases of solid hydrogen
9 CP-PACS run statistics

[Figure: monthly CP-PACS jobs and used time (10⁶ PU-hours) by partition (PIOU, PU48 up to PU2048), Oct. 1996 to Feb. 2002; averaged usage 87% of the time available after maintenance.]
10 Lattice QCD on the CP-PACS computer
- Lattice Quantum Chromodynamics (QCD) and its goal
- Physics achievements:
  - quenched light hadron spectrum
  - light quark masses in two-flavor full QCD
  - ΔI=1/2 rule and CP violation parameter ε′/ε in weak decays of K mesons
11 Lattice QCD and its goal
Quantum Chromodynamics (QCD):
- Fundamental theory of quarks and gluons and their strong interactions
- Knowing 1 coupling constant α_s and 6 quark masses m_u, m_d, m_s, m_c, m_b, m_t will allow full understanding of strong interactions (Yukawa's dream)

From the computational point of view:
- Relatively simple calculation: uniform mesh, single scale

  S = Σ_plaq tr(UUUU) + Σ ψ̄ (γ·D(U) + m_q) ψ

  ⟨O(U, ψ̄, ψ)⟩ = (1/Z) ∫ dU dψ̄ dψ O(U, ψ̄, ψ) e^(−S)

- Requires much computing power:
  - 4-dim. problem
  - fermions essential
  - physics is at lattice spacing a = 0
  - precision required

12 Quenched hadron mass spectrum

- Benchmark calculation to verify QCD, pursued since 1981 (Weingarten / Hamber-Parisi)
- Quenched: quark-antiquark pair creation/annihilation ignored
- Essential to control various systematic errors down to a % level:
  - finite lattice size: L > 3 fm
  - finite quark mass: m_q → 0
  - finite lattice spacing: a → 0

[Figure: experimental spectrum]

13 CP-PACS result for the quenched spectrum

- General pattern in good agreement with experiment
- Clear systematic deviation below the 10% level
- Indirect evidence of sea quark effects
- Completes the calculation pursued since 1981

[Figure: calculated quenched spectrum]

14 Two-flavor full QCD

Spectrum of quarks:
- 3 light quarks (u, d, s), m < 1 GeV: need dynamical simulation
- 3 heavy quarks (c, b, t), m > 1 GeV: quenching sufficient

Dynamical quark simulation costs 100-1000 times more computing power; the algorithm for an odd number of quarks is still being developed.

First systematic study of two-flavor full QCD:
- u and d quarks: dynamical simulation
- s quark: quenched approximation

15 Quark masses from lattice QCD

- Continuum extrapolation of light quark masses
- Several methods yield a unique value in the continuum limit
- Significant decrease by inclusion of sea quark effects

[Figure: up & down and strange quark masses, quenched value vs. two-flavor full QCD]

16 Summary on quark masses

- Significant sea quark effects: m_s at the lower edge of phenomenological estimates, lighter than believed for a long time
- N_f = 3 simulations being pursued to obtain physical values of light quark masses (the real world has three flavors)

[Figure: quark mass (MeV) for m_s and m_ud (×10), quenched vs. two-flavor full QCD, compared with phenomenological estimates]

17 ΔI=1/2 rule and CP violation in K decays

Weak interaction decays of K mesons:
- ΔI=1/2 rule:

  Re A_0(K→ππ(I=0)) / Re A_2(K→ππ(I=2)) = 1/ω ≈ 22

- CP violation:

  ε′/ε = (ω/√2|ε|) (Im A_2/Re A_2 − Im A_0/Re A_0)
       = (20.7 ± 2.8) × 10⁻⁴  (KTeV experiment, FNAL)
       = (15.3 ± 2.6) × 10⁻⁴  (NA48 experiment, CERN)

- Crucial numbers to verify the Standard Model understanding of CP violation (matter-antimatter asymmetry)
- Two large-scale calculations using domain-wall QCD: RIKEN-BNL-Columbia on QCDSP, and CP-PACS

18 ΔI=1/2 rule

- Reasonable agreement with experiment for I = 2
- About half of experiment for I = 0
- RIKEN-BNL-Columbia obtains a somewhat different result (smaller I = 2 and larger I = 0)

19 CP violation parameter ε′/ε

- Small and negative, in disagreement with experiment; similar result from RIKEN-BNL-Columbia
- Possible reasons, connected with the insufficient enhancement of the ΔI=1/2 rule: the method of calculation (K→π reduction) may have problems
- Still a big problem requiring further work

20 Scale of QCD simulations

Sustained speed:
- quenched QCD, 64³×112 lattice: 53% of peak
- full QCD, 24³×32 lattice: 34% of peak

Total CPU time with CP-PACS (0.6 Tflops peak):
- quenched QCD: 199 days of the full machine
- two-flavor full QCD: 415 days of the full machine
- K decay: 180 days of the full machine

Scaling law:

  #FLOPs = C · (#conf/1000) · ((m_π/m_ρ)/0.6)⁻⁶ · (L/3 fm)⁵ · (a⁻¹/2 GeV)⁷ Tflops·years, C ≈ 2.8

An O(10) Tflops machine is needed for the next step of progress: QCDOC (Mawhinney) / apeNEXT (Jansen).

21 Astrophysics simulation with the CP-PACS/GRAPE-6 complex

- Concept of HMCS (heterogeneous multi-computer system)
- Prototype HMCS with CP-PACS/GRAPE-6
- Galaxy formation in the early universe

22 Concept of HMCS

Two interaction types in physical systems:
- short-ranged: complex and manifold; O(N) calculations, but complex
- long-ranged: universal (gravity, Coulomb); O(N²) calculations, but simple

A hybrid approach:
- a general-purpose MPP to deal with short-ranged interactions
- a special-purpose MPP to accelerate calculations of long-ranged interactions
- coupling of the two systems by a parallel network

23 CP-PACS/GRAPE-6 as a prototype HMCS

GRAPE-6:
- developed by Makino et al. (U. Tokyo); Gordon Bell winner of 2001
- gravity calculation accelerator, computing

  F_i = −G Σ_{j=1}^{N} m_i m_j / r_ij², i = 1, …, N

- 32 Gflops/chip × 32 chips/board = 1 Tflops/board

The prototype couples CP-PACS (general-purpose MPP, 0.6 Tflops) to GRAPE-6 (special-purpose MPP, 8 Tflops) through the PAVEMENT/PIO parallel I/O system over 100base-TX switches and host PCs.

24 Galaxy formation simulation on HMCS

M. Umemura & H. Susa

[Figure: cosmic history from the Big Bang (10⁻⁴⁴ sec) through the radiation and matter (baryon, dark matter) eras; neutral "dark age" from 0.1 Myr (redshift z ≈ 10³); QSOs, ionization, and galaxy formation around 0.5-1 Gyr (z ≈ 10-5); galaxy types dE, E, S0, SB, S, dI at 15 Gyr (the present).]

25 Structure of the calculation

- Hydrodynamic motion of matter: Smoothed Particle Hydrodynamics (SPH) represents the matter density as a collection of particles,

  ρ(r_i) = Σ_j m_j W(|r_i − r_j|), W: kernel function

- Self-gravity acting on matter:

  F_i = −G Σ_{j=1}^{N} m_i m_j / r_ij², i = 1, …, N

  computed directly by GRAPE-6.
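The self-gravity sum is exactly the O(N²) direct summation that GRAPE-6 evaluates in hardware. A minimal Python sketch of the same kernel, written for accelerations a_i = F_i/m_i (the function name, unit G, and Plummer softening `eps` are illustrative assumptions, not part of the talk):

```python
import math

def accelerations(pos, mass, G=1.0, eps=1e-3):
    """Direct O(N^2) gravity: a_i = -G sum_{j!=i} m_j (r_i - r_j) / |r_ij|^3,
    with a small softening eps to avoid the singularity at r_ij = 0."""
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = [pos[i][k] - pos[j][k] for k in range(3)]
            r2 = dx[0] ** 2 + dx[1] ** 2 + dx[2] ** 2 + eps ** 2
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] -= G * mass[j] * dx[k] * inv_r3
    return acc
```

The double loop makes the O(N²) cost explicit: simple, regular arithmetic, which is precisely why a special-purpose pipeline like GRAPE-6 can accelerate it so effectively.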
- Radiative transfer:

  (1/c) ∂I_ν/∂t + n·∇I_ν = χ_ν (S_ν − I_ν)

  solving the interaction of photons with matter.

- Chemical reactions & radiative cooling

26 Algorithm

Per time step: density (SPH) → self-gravity (GRAPE-6); radiative transfer → chemical reactions → temperature, iterated on CP-PACS; then pressure gradient → integration of the equation of motion.

27 Calculation parameters

Processors:
- 1024 PU of CP-PACS: 0.3 TFLOPS
- 4 boards of GRAPE-6: 4 TFLOPS

Particle numbers:
- 64K baryonic matter particles
- 64K dark matter particles

Calculation time per step:
- 4 sec for calculation on CP-PACS
- 3 sec for communication to/from GRAPE-6
- 0.1 sec for calculation on GRAPE-6
- 7.1 sec/time step in total

28 QSO

[Figure]

29 [Figure: simulation snapshots at 0.3, 0.5, 0.7, and 1 Gyr]

30 National network and beyond

- TsukubaWAN
- SuperSINET

31 Tsukuba Science City

High density of HPC-related research organizations:
- high energy physics: KEK / U of Tsukuba / …
- material science: AIST / NRIM / U of Tsukuba / …
- life science: RIKEN / AIST / …
- computer science: AIST / RWCP / …

High density of supercomputers: 12 systems in the Top500 (June 2002), among them:

  rank | organization | vendor | model | Gflops | year
  26 | KEK | Hitachi | SR8000-F1/100 | 917.00 | 2000
  45 | U of Tsukuba | Fujitsu | VPP5000/80 | 730.00 | 2001
  56 | AIST/CBRC | self-made | Magi Cluster PIII | 654.00 | 2001
  57 | RWCP/Tsukuba | self-made | SCore III/PIII | 618.30 | 2001
  89 | NIES | NEC | SX-6/64M8 | 495.20 | 2002
  99 | AIST/TACC | Hitachi | SR8000/64 | 449.00 | 1999
  125 | U of Tsukuba | Hitachi | CP-PACS/2048 | 368.20 | 1996
  160 | NIED | SGI | ORIGIN 3000 500 | 306.30 | 2002
  187 | MRI | Hitachi | SR8000/36 | 255.00 | 1999
  228 | NRIM | NEC | SX-5/32H2 | 241.40 | 2000
  426 | U of Tsukuba | Hitachi | SR8000-G1/12 | 150.10 | 2002

32 TsukubaWAN

City-wide network connecting research institutions:
- optical ring, loop length 30 km, 10 Gbps backbone
- 13 sites at present, attached via OADM/ADM nodes
- operating since March 2002
- will be connected to SuperSINET at U of Tsukuba

[Figure: TsukubaWAN schematic — optical transmission ring linking KEK, NTT, NIED, U. Tsukuba/CCP, GBLAB, NRIM, NIES, AIST, and NARO through Gbit switches and routers, with supercomputers at the sites.]

33 TsukubaWAN configuration

SC: supercomputer; OADM: optical add-drop multiplexer; ADM: add-drop multiplexer.

[Figure: ring of OADM/ADM nodes at the 13 sites (U of Tsukuba, KEK, NIMS, NILIM, AIST, NIES, NTT AS Lab, AFF, NIED, …) carrying a supercomputer network (GbE, 10 Gbps × 6) and a research network (2 Gbps × 1), with 600 Mbps paths from the Tsukuba NOC to SuperSINET, the JGN network, and IMNet.]

34 SuperSINET

Nationwide research network connecting:
- major universities
- research institutes/organizations
- government labs

Optical network with:
- optical cross-connect
- 10 Gbps WDM backbone
- 1 Gbps point-to-point connections

Operating since Jan. 2002.

35 SuperSINET research projects

- High-energy and fusion research: KEK/BELLE and LHC/ATLAS experiment data analysis; fusion data analysis; lattice QCD
- Space and astronomical science: on-line VLBI; GRAPE net; etc.
- Genome information analysis: genome analysis; genome database GRID
- Supercomputer network
- Nanotechnology: material simulation; grid-based large-scale simulations

36 Conclusions

Supercomputer centers in Japan:
- an important component of science and engineering research in Japan
- have worked well in providing HPC resources

Direction pursued at CCP:
- physicist / computer scientist collaboration
- academia / vendor collaboration
- concentration on key problems
- several fundamental issues in physics successfully attacked

With emerging high-speed networks:
- closer collaboration among centers in Japan
- closer collaboration world-wide
Tsukuba/CCP Operating since March GBLAB NRIM 2002 GBLAB NRIM Wil be connected to 光 Gbit-SW 伝 送 SuperSINET at U of 路 NIES NIES Router Tsukuba Super computer AISTAIST NARO NARO 33 TsukubaWan configuration SC:Super-Computer SuperSINET OADM: Optical Add- U of Lib. Info. Drop Multiplexer ADM U of Tsukuba Res. Exch. Center ADM:Add-Drop Multiplexer SC ADM 13 sites KEK SC OADMADM OADM SC NIMS OADMADM OADMADM ADM NILIM SC SC AIST OADMADM OADMADM NIESDP NTT AS Lab To research labs SC OADMADM OADMADM AFF Supercomputer NW Supercomputer NW GbE GbE ADM ADM 10Gbps×6 NIE 10Gbps×6 Gbit Lab 2Gbps×1 ADM 2Gbps×1 researchNW researchNW Tsukuba NOC GbE JGN network IMNet GbE 600Mbps path 600Mbps path 34 SuperSINET Nationwide research network connecting Major universities Research institutes/organizations Government labs Optical network with Optical cross connect 10Gbps WDM backbone 1Gbps point-to-point connection Operating since Jan. 2002 35 SuperSINET research projects High-energy and fusion research KEK/BELLE and LHC/ATLAS experiment data analysis Fusion data analysis Lattice QCD Space and astronomical science On-line VLBI GRAPE net etc Genome information analysis Genome analysis Genome database GRID Supercomputer network Nanotechnology Material simulation Grid-based large-scale simulations 36 Conclusions Supercomputer Centers in Japan Important component of science and engineering research in Japan Have worked well in providing HPC resources Direction pursued at CCP physicist/computer scientist collaboration academia/vendor collaboration concentration on key problems Successfully attacked several fundamental issues in physics With emerging high speed network Closer collaboration among centers in Japan Closer collaboration world-wide 37