
Developing HPC and computational physics

Akira Ukawa
Center for Computational Physics, University of Tsukuba

• Academic HPC environment in Japan
• Center for Computational Physics
• QCD on the CP-PACS computer
• CP-PACS/GRAPE-6 complex and galaxy formation
• National networks and beyond

1 Academic HPC environment in Japan

• HPC resources in Japan as viewed in the Top500 list

June 1997: 30 systems in the top 100, 23% of total Rmax
June 2002: 20 systems in the top 100, 37% of total Rmax

[Figure: Rmax (Gflops) vs. rank for the top 100 of the Top500 lists of June 1997 and June 2002, colored by country (USA, Japan, Germany, UK, France). Japanese entries include CP-PACS at CCP and the Tokyo, Osaka, Kyushu, KEK, JAERI, and NAL centers in 1997, and the Earth Simulator (JAMSTEC) in 2002.]

2 Supercomputers classified by affiliations

• Computer centers of major universities
• Academic research institutes/organizations
  – Each major field has its own supercomputer facility, e.g.,
    · High energy physics (KEK)
    · Astrophysics (NAO)
    · Condensed matter (ISSP)
    · Genetics (NIG)
• Government labs
  – AIST/JAERI/NRIM….
  – The Earth Simulator belongs to JAMSTEC (Japan Marine Science and Technology Center)

3 Resource allocation to users

• Computer centers of universities
  – Charged by CPU time and disk usage
  – Special low rates for large-scale usage
• Academic research institutes/organizations
  – Application by users and selection by a peer committee
  – Typical large-scale allocations:

    organization | field            | peak       | max allocation        | #projects/year
    KEK          | high energy      | 1.2 Tflops | decided case by case  | 20
    NAO          | astrophysics     | 0.6 Tflops | 1 week of full system | 20
    ISSP         | condensed matter | 0.6 Tflops | 1 week of full system | 100

• Government labs
  – Mission-oriented; limited access by outside users
  – Earth Simulator: application and selection; 27 projects this year (atmosphere/ocean 15; earth 8; … 3; others 1)

4 System procurement and upgrade

• Installed under rental contracts with vendors
• Contracts renewed every 5-6 years
• System/vendor selected by bidding under strict government regulations
• Rental budget is part of the annual budget of the installation site

⇒ System renewal every 5-6 years
⇒ Has successfully provided a high level of HPC resources to general scientific users

In Tsukuba, a somewhat different tradition has been pursued.....

5 Center for Computational Physics, University of Tsukuba

• Founded in 1992
• Emphasis on
  – Development of HPC systems suitable for computational physics
  – Close collaboration of physics and computer science

• Facility
  – CP-PACS parallel system
    · MPP with 2048 PU / 0.6 Tflops peak
    · Developed at the Center with Hitachi Ltd.
    · #1 of the Top500 list of November 1996
  – GRAPE-6 system
    · Dedicated to gravity calculations
    · Developed at U. Tokyo
    · 8 Tflops equivalent

6 History of PACS/PAX computers at Tsukuba

• Long history (since the late '70s) of developing parallel machines for scientific calculations

[Photos: PU board of PACS-9 (1978); PAX-128 (1983) with control unit]

[Photo: CP-PACS (1996)]

Year | Machine | Performance | #PU  | Memory | CPU/MPU
1978 | PACS-9  | 7 KFLOPS    | 9    |        | M6800
1980 | PAX-32  | 0.5 MFLOPS  | 32   | 576 KB | M6800/AM9511
1983 | PAX-128 | 4 MFLOPS    | 128  | 3 MB   | M68B00/AM9511-4
1984 | PAX-32J | 3 MFLOPS    | 32   | 4 MB   | DCJ-11
1989 | QCDPAX  | 14 GFLOPS   | 480  | 3 GB   | M68020/L64133
1996 | CP-PACS | 307 GFLOPS  | 1024 | 64 GB  | extended PA-RISC 1.1
1997 | CP-PACS | 614 GFLOPS  | 2048 | 128 GB | extended PA-RISC 1.1

[Photos: PU array of PAX-32 (1980); PAX-32J (1984); a cabinet of QCDPAX (1989)]

7 CP-PACS parallel computer

• MPP with 2048 compute nodes and 128 I/O nodes
  – 0.6 Tflops peak, 0.3 Gflops/node
  – 16×16×17 node array, 3-dimensional crossbar network, 0.3 GB/sec throughput per link
• Built by a joint effort of
  – Computer scientists (about 15 members)
  – Physicists (about 15 members)
  – Vendor (Hitachi Ltd.)
• Developed into the SR2201/SR8000 series
• Architecture development
  – PVP-SW (vector processing on a RISC processor)
  – Remote DMA (zero-copy data transfer between nodes; see the sketch below)
  – Detailed benchmarking on real applications (QCD)
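The remote-DMA idea, one node writing directly into another node's memory without intermediate buffering, survives today in MPI one-sided communication. Below is a minimal sketch in Python/mpi4py illustrating that programming model; it is an analogue for exposition only, not the CP-PACS interface, which was a hardware feature of its crossbar network.

```python
# Conceptual analogue of remote DMA using MPI one-sided communication
# (mpi4py). Rank 0 writes directly into rank 1's exposed memory window,
# with no matching receive call on the target side.
# Run with: mpiexec -n 2 python rdma_sketch.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Each rank exposes a memory window that remote ranks may write into.
buf = np.zeros(8, dtype='d')
win = MPI.Win.Create(buf, comm=comm)

win.Fence()                       # open an access epoch (collective)
if rank == 0:
    data = np.arange(8, dtype='d')
    win.Put(data, target_rank=1)  # one-sided, zero-copy-style write
win.Fence()                       # close the epoch; transfer complete

if rank == 1:
    print("rank 1 window after Put:", buf)
win.Free()
```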

8 Computational physics with CP-PACS

• Concentrated usage on a few fundamental physics problems which demand large-scale calculations

  – High energy physics: QCD

  – Astrophysics: radiation hydrodynamics

  – Condensed matter: phases of solid hydrogen

9 CP-PACS run, Oct. 1996 - Feb. 2002

[Figure: Monthly CP-PACS jobs and used time (×10⁶ hours; days of operation) from Oct. 1996 to Feb. 2002, broken down by partition size (PIOU and 48, 64, 128, 256, 512, 1024, and 2048 PU jobs) and maintenance. Averaged usage: 87%.]

10 Lattice QCD on the CP-PACS computer

• Lattice QCD and its goal
• Physics achievements
  – Quenched light hadron spectrum
  – Light quark masses in two-flavor full QCD
  – ΔI=1/2 rule and CP violation parameter ε'/ε in weak decays of K mesons

11 Lattice QCD and its goal

• Quantum Chromodynamics (QCD)
  – The fundamental theory of quarks and gluons and their strong interactions
  – Knowing one coupling constant $\alpha_s$ and six quark masses $m_u, m_d, m_s, m_c, m_b, m_t$ will allow a full understanding of the strong interactions (Yukawa's dream)
• From the computational point of view
  – Relatively simple calculation
    · Uniform mesh
    · Single scale
  – Requires much computing power
    · 4-dimensional problem
    · Fermions essential
    · The physics is at lattice spacing a = 0
  – Precision required

$$S = \frac{1}{g^2}\sum_{P}\operatorname{tr}\left(UUUU\right) + \sum \bar\psi\left(\gamma\cdot D(U) + m_q\right)\psi$$

$$\left\langle O(U,\bar\psi,\psi)\right\rangle = \frac{1}{Z}\int dU\, d\bar\psi\, d\psi\; O(U,\bar\psi,\psi)\, e^{-S}$$
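To make the gauge part of the action concrete, here is a minimal numpy sketch that evaluates the plaquette term $\sum_P \operatorname{tr}(UUUU)$ on a small random SU(3) configuration. The 4⁴ lattice and Haar-random links are illustrative assumptions, not the CP-PACS production setup.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_su3():
    """Haar-like random SU(3): QR of a complex Gaussian, det fixed to 1."""
    z = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
    q, r = np.linalg.qr(z)
    d = np.diagonal(r)
    q = q * (d / np.abs(d))                      # make distribution uniform
    return q / np.linalg.det(q) ** (1.0 / 3.0)   # enforce unit determinant

L = 4                                  # 4^4 lattice, illustrative only
U = np.empty((4, L, L, L, L, 3, 3), dtype=complex)  # [mu, t, x, y, z, color]
for mu in range(4):
    for site in np.ndindex(L, L, L, L):
        U[(mu,) + site] = random_su3()

def fwd(field, mu):
    """Field shifted to site x+mu (periodic boundary conditions)."""
    return np.roll(field, -1, axis=mu)

def dag(field):
    """Hermitian conjugate at every site."""
    return field.conj().swapaxes(-1, -2)

# Sum over sites and the 6 (mu,nu) planes of
# Re tr[ U_mu(x) U_nu(x+mu) U_mu(x+nu)^dag U_nu(x)^dag ]
total = 0.0
for mu in range(4):
    for nu in range(mu + 1, 4):
        loop = U[mu] @ fwd(U[nu], mu) @ dag(fwd(U[mu], nu)) @ dag(U[nu])
        total += np.trace(loop, axis1=-2, axis2=-1).real.sum()

print("average plaquette:", total / (3 * 6 * L**4))  # ~0 for random links
```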

12 Quenched hadron mass spectrum

• Benchmark calculation to verify QCD
  – Pursued since 1981 (Weingarten, Hamber-Parisi)
• Quenched: quark-antiquark pair creation/annihilation ignored
• Essential to control the various systematic errors down to the % level
  – Finite lattice size: L > 3 fm
  – Finite quark mass: m_q → 0
  – Finite lattice spacing: a → 0

[Figure: Experimental spectrum]
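In practice the a → 0 limit means computing at several lattice spacings and extrapolating. A minimal sketch with synthetic data; the numbers below are invented for illustration and are not CP-PACS results:

```python
import numpy as np

# Continuum extrapolation: hadron masses measured at several lattice
# spacings a are fit to m(a) = m0 + c*a and read off at a = 0.
a = np.array([0.10, 0.08, 0.06, 0.04])       # lattice spacing in fm (synthetic)
m = np.array([0.512, 0.498, 0.487, 0.478])   # hadron mass in GeV (synthetic)

c, m0 = np.polyfit(a, m, 1)                  # linear fit m = c*a + m0
print(f"slope c = {c:.3f} GeV/fm, continuum value m0 = {m0:.3f} GeV")
```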

13 CP-PACS result for the quenched spectrum

• General pattern in good agreement with experiment
• Clear systematic deviation at the 10% level or below
  – Indirect evidence of sea quark effects

• Completes the calculation pursued since 1981

[Figure: Calculated quenched spectrum]

14 Two-flavor full QCD

• Spectrum of quarks
  – 3 light quarks (u, d, s), m < 1 GeV: need dynamical simulation
  – 3 heavy quarks (c, b, t), m > 1 GeV: quenching sufficient
• Dynamical quark simulation
  – Costs 100-1000 times more computing power
  – Algorithms for odd numbers of quarks still being developed

• First systematic study of two-flavor full QCD
  – u and d quarks: dynamical simulation
  – s quark: quenched approximation

15 Quark masses from lattice QCD

• Continuum extrapolation of light quark masses
  – Several methods yield a unique value in the continuum limit
  – Significant decrease upon inclusion of sea quark effects

[Figure: Continuum extrapolation of the up/down and strange quark masses; quenched values vs. two-flavor full QCD]

16 Summary on quark masses

• Significant sea quark effects
• m_s at the lower edge of phenomenological estimates; lighter than believed for a long time
• N_f = 3 simulations (the real world has three flavors) being pursued to obtain physical values of the light quark masses

[Figure: Quark masses in MeV (m_s and m_ud × 10), quenched vs. two-flavor full QCD, compared with the band of phenomenological estimates.]

17 ΔI=1/2 rule and CP violation in K decays

• Weak interaction decays of K mesons
  – ΔI=1/2 rule:
    $$\frac{\operatorname{Re}A_0\!\left(K\to\pi\pi(I=0)\right)}{\operatorname{Re}A_2\!\left(K\to\pi\pi(I=2)\right)} \approx 22$$
  – CP violation:
    $$\frac{\varepsilon'}{\varepsilon} = \frac{\omega}{\sqrt{2}\,|\varepsilon|}\left[\frac{\operatorname{Im}A_2}{\operatorname{Re}A_2}-\frac{\operatorname{Im}A_0}{\operatorname{Re}A_0}\right] = \begin{cases}(20.7\pm 2.8)\times 10^{-4} & \text{KTeV experiment (FNAL)}\\ (15.3\pm 2.6)\times 10^{-4} & \text{NA48 experiment (CERN)}\end{cases}$$
• Crucial numbers to verify the understanding of CP violation (matter-antimatter asymmetry)

• Two large-scale calculations using domain-wall QCD
  – RIKEN-BNL-Columbia on QCDSP
  – CP-PACS

18 ΔI=1/2 rule

• Reasonable agreement with experiment for I=2
• About half of the experimental value for I=0
• RIKEN-BNL-Columbia obtains a somewhat different result (smaller I=2 and larger I=0)

19 CP violation parameter ε’/ε

• Small and negative, in disagreement with experiment:
  $$\frac{\varepsilon'}{\varepsilon} = \frac{\omega}{\sqrt{2}\,|\varepsilon|}\left[\frac{\operatorname{Im}A_2}{\operatorname{Re}A_2}-\frac{\operatorname{Im}A_0}{\operatorname{Re}A_0}\right]$$
• Similar result from RIKEN-BNL-Columbia

• Possible reasons
  – Connected with the insufficient enhancement of the ΔI=1/2 rule
  – The method of calculation (K→π reduction) may have problems

• Still a big problem requiring further work

20 Scale of QCD simulations

• Sustained speed
  – Quenched QCD, 64³×112 lattice: 53% of peak
  – Full QCD, 24³×32 lattice: 34% of peak
• Total CPU time on CP-PACS (0.6 Tflops peak)
  – Quenched QCD: 199 days of the full machine
  – Two-flavor full QCD: 415 days of the full machine
  – K decay: 180 days of the full machine

• Scaling law:

$$\frac{\#\mathrm{FLOPs}}{\mathrm{Tflops}\cdot\mathrm{year}} = C\cdot\left[\frac{\#\mathrm{conf}}{1000}\right]\cdot\left[\frac{m_\pi/m_\rho}{0.6}\right]^{-6}\cdot\left[\frac{L}{3\ \mathrm{fm}}\right]^{5}\cdot\left[\frac{1/a}{2\ \mathrm{GeV}}\right]^{7},\qquad C\approx 2.8$$
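Evaluating the scaling law numerically makes the conclusion below concrete; the parameter choices in this sketch are illustrative:

```python
# Cost estimate from the scaling law above (C = 2.8 Tflops*year).
def cost_tflops_years(n_conf, mpi_over_mrho, L_fm, ainv_gev, C=2.8):
    """Full-QCD cost in Tflops*years for n_conf configurations."""
    return (C * (n_conf / 1000.0)
              * (mpi_over_mrho / 0.6) ** -6
              * (L_fm / 3.0) ** 5
              * (ainv_gev / 2.0) ** 7)

# Reference point of the formula:
print(cost_tflops_years(1000, 0.6, 3.0, 2.0))   # -> 2.8
# Lighter quarks (smaller m_pi/m_rho) raise the cost steeply:
print(cost_tflops_years(1000, 0.5, 3.0, 2.0))   # ~ 8.4
print(cost_tflops_years(1000, 0.4, 3.0, 2.0))   # ~ 32
```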

O(10) Tflops machine needed for next progress

QCDOC (Mawhinney) / ApeNEXT (Jansen)

21 Astrophysics simulation with CP-PACS/GRAPE-6 complex

• Concept of HMCS (heterogeneous multi-computer system)
• Prototype HMCS with CP-PACS/GRAPE-6
• Galaxy formation in the early Universe

22 Concept of HMCS

• Two types of interactions in physical systems

  – Short-ranged: complex and manifold; O(N) calculations, but complex
  – Long-ranged (gravity, Coulomb): universal; O(N²) calculations, but simple

• A hybrid approach:
  – General-purpose MPP to deal with the short-ranged interactions
  – Special-purpose MPP to accelerate the calculation of long-ranged interactions
  – Coupling of the two systems by a parallel network

23 CP-PACS/GRAPE-6 as a prototype HMCS

• GRAPE-6
  – Developed by Makino et al. (U. Tokyo); Gordon Bell winner of 2001
  – Gravity calculation accelerator: computes $-G\sum_{j=1}^{N}\dfrac{m_i m_j}{r_{ij}^{2}},\ i=1,\dots,N$
  – 32 Gflops/chip × 32 chips/board = 1 Tflops/board
• Prototype configuration: CP-PACS (general-purpose MPP, 0.6 Tflops) coupled to GRAPE-6 (special-purpose MPP, 8 Tflops) through the PAVEMENT/PIO parallel I/O system, over 100base-TX links and host PC switches
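For exposition, the O(N²) direct summation that GRAPE-6 implements in hardware can be written in a few lines of numpy. This sketch returns accelerations (the force above divided by m_i) and adds a softening length eps, a standard N-body device that is an assumption of the sketch, not a GRAPE-6 detail:

```python
import numpy as np

def gravity_direct(pos, mass, G=1.0, eps=1e-3):
    """a_i = -G sum_j m_j (r_i - r_j) / (r_ij^2 + eps^2)^{3/2}."""
    dr = pos[:, None, :] - pos[None, :, :]          # (N, N, 3) separations
    r2 = (dr ** 2).sum(axis=-1) + eps ** 2          # softened squared distance
    inv_r3 = r2 ** -1.5
    np.fill_diagonal(inv_r3, 0.0)                   # no self-interaction
    return -G * (inv_r3[:, :, None] * dr * mass[None, :, None]).sum(axis=1)

rng = np.random.default_rng(0)
pos = rng.normal(size=(64, 3))     # 64 particles, illustrative only
mass = np.full(64, 1.0 / 64)
acc = gravity_direct(pos, mass)
print("acceleration of particle 0:", acc[0])
```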

24 Galaxy formation simulation on HMCS (M. Umemura & H. Susa)

[Figure: Cosmic timeline from the Big Bang (10⁻⁴⁴ sec) through radiation and matter (baryon, dark matter) domination, the neutral "dark age" (0.1 Myr, redshift z ≈ 10³), QSO and galaxy formation in an ionized universe (0.5-1 Gyr, z ≈ 10-5), to the present galaxy types dE, E, S0, SB, S, dI at 15 Gyr.]

25 Structure of the calculation

• Hydrodynamic motion of matter
  – Smoothed Particle Hydrodynamics (SPH):
    $$\rho(\mathbf{r}_i) = \sum_j m_j\, W\!\left(|\mathbf{r}_i - \mathbf{r}_j|\right)$$
    with $W$ the kernel function
  – Represents the matter density as a collection of particles

• Self-gravity acting on matter:
  $$-G\sum_{j=1}^{N}\frac{m_i m_j}{r_{ij}^{2}},\qquad i=1,\dots,N$$
  – Direct calculation by GRAPE-6

• Radiative transfer:
  $$\frac{1}{c}\frac{\partial I_\nu}{\partial t} + \mathbf{n}\cdot\nabla I_\nu = \chi_\nu\left(S_\nu - I_\nu\right)$$
  – Solves the interaction of photons with matter

• Chemical reactions & radiative cooling
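A minimal numpy sketch of the SPH density sum shown above. The cubic-spline kernel and the particle data are illustrative assumptions, not details of the actual simulation code:

```python
import numpy as np

def w_cubic_spline(r, h):
    """Monaghan cubic-spline kernel with 3-d normalization 1/(pi h^3)."""
    q = r / h
    w = np.where(q < 1.0, 1.0 - 1.5 * q**2 + 0.75 * q**3,
        np.where(q < 2.0, 0.25 * (2.0 - q)**3, 0.0))
    return w / (np.pi * h**3)

def sph_density(pos, mass, h):
    """rho_i = sum_j m_j W(|r_i - r_j|, h), including the self term."""
    dr = pos[:, None, :] - pos[None, :, :]
    r = np.sqrt((dr ** 2).sum(axis=-1))
    return (mass[None, :] * w_cubic_spline(r, h)).sum(axis=1)

rng = np.random.default_rng(0)
pos = rng.uniform(0.0, 1.0, size=(512, 3))   # 512 particles in a unit box
mass = np.full(512, 1.0 / 512)               # total mass 1
rho = sph_density(pos, mass, h=0.1)
print("mean density ~", rho.mean())          # ~1, up to edge effects
```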

26 Algorithm

One iteration per time step:

1. Self-gravity (on GRAPE-6)
2. Density (SPH) (on CP-PACS, as are the following steps)
3. Radiative transfer
4. Chemical reaction (iterated with radiative transfer)
5. Temperature
6. Pressure gradient
7. Integration of the equation of motion

27 Calculation parameters

• Processors
  – 1024 PU of CP-PACS: 0.3 Tflops
  – 4 boards of GRAPE-6: 4 Tflops

• Particle numbers
  – 64K baryonic matter particles
  – 64K dark matter particles

• Calculation time per step
  – 4 sec for calculation on CP-PACS
  – 3 sec for communication to/from GRAPE-6
  – 0.1 sec for calculation on GRAPE-6
  – Total: 7.1 sec per time step
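The step structure of the two preceding slides can be summarized in a runnable skeleton. Every function below is a stub with an invented name, standing in for the real CP-PACS/GRAPE-6 stages; the timings in the comments are the measured values quoted above:

```python
import numpy as np

# Runnable skeleton of the HMCS iteration. All stage functions are dummy
# stubs for illustration; they do not reflect the real code's interfaces.
N = 64
rng = np.random.default_rng(0)
pos  = rng.normal(size=(N, 3))
mass = np.full(N, 1.0 / N)

def grape6_gravity(pos, mass):     # offloaded O(N^2) part (~0.1 s on GRAPE-6,
    return np.zeros((N, 3))        # plus ~3 s communication per step)
def sph_density(pos, mass):        # short-ranged physics on CP-PACS (~4 s
    return np.ones(N)              # per step for all host-side stages)
def radiative_transfer(rho):
    return np.ones(N)
def chemistry_and_cooling(I):
    return np.ones(N)
def temperature(chem):
    return np.ones(N)
def pressure_gradient(rho, T):
    return np.zeros((N, 3))

dt, vel = 1e-3, np.zeros((N, 3))
for step in range(3):                          # a few illustrative steps
    a_grav  = grape6_gravity(pos, mass)        # self-gravity
    rho     = sph_density(pos, mass)           # SPH density
    I       = radiative_transfer(rho)          # iterated with chemistry
    chem    = chemistry_and_cooling(I)
    T       = temperature(chem)
    a_hydro = pressure_gradient(rho, T)
    vel += (a_grav + a_hydro) * dt             # integrate equation of motion
    pos += vel * dt
print("completed", step + 1, "steps")
```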

28 QSO

29 [Simulation snapshots at 0.3 Gyr, 0.5 Gyr, 0.7 Gyr, and 1 Gyr]

30 National network and beyond

• TsukubaWAN
• SuperSINET

31 Tsukuba Science City

• High density of HPC-related research organizations
  – High energy physics: KEK / U of Tsukuba / …
  – Material science: AIST / NRIM / U of Tsukuba / …
  – Life science: RIKEN / AIST / …
  – Computer science: AIST / RWCP / …
• High density of supercomputers
  – 12 systems in the Top500 (June 2002)

rank | organization | vendor    | model             | Rmax (Gflops) | year
26   | KEK          | Hitachi   | SR8000-F1/100     | 917.0 | 2000
45   | U of Tsukuba | Fujitsu   | VPP5000/80        | 730.0 | 2001
56   | AIST/CBRC    | NEC       | Magi Cluster PIII | 654.0 | 2001
57   | RWCP/Tsukuba | self-made | SCore III/PIII    | 618.3 | 2001
89   | NIES         | NEC       | SX-6/64M8         | 495.2 | 2002
93   | …            | …         | …                 | …     | …
99   | AIST/TACC    | Hitachi   | SR8000/64         | 449.0 | 1999
125  | U of Tsukuba | Hitachi   | CP-PACS/2048      | 368.2 | 1996
160  | NIED         | SGI       | ORIGIN 3000/500   | 306.3 | 2002
187  | MRI          | Hitachi   | SR8000/36         | 255.0 | 1999
228  | NRIM         | NEC       | SX-5/32H2         | 241.4 | 2000
426  | U of Tsukuba | Hitachi   | SR8000-G1/12      | 150.1 | 2002

32 TsukubaWAN

• City-wide network to connect research institutions
  – 13 sites at present
• Optical ring, loop length 30 km
  – 10 Gbps backbone
  – OADM/ADM nodes
• Operating since March 2002
• Will be connected to SuperSINET at U of Tsukuba

[Figure: TsukubaWAN schematic view — an optical transmission loop linking KEK, NTT, NIED, U. Tsukuba/CCP, GBLAB, NRIM, NIES, AIST, and NARO, with Gbit switches, routers, and supercomputers at the sites.]

33 TsukubaWAN configuration

Abbreviations: SC: supercomputer; OADM: optical add-drop multiplexer; ADM: add-drop multiplexer.

[Figure: TsukubaWAN configuration — 13 sites (U of Tsukuba, the Library and Information Research Exchange Center, KEK, NIMS, NILIM, AIST, NIES, NTT AS Lab, AFF, NIED, and others) joined by OADM/ADM nodes, with supercomputers attached over GbE; the supercomputer network carries 10 Gbps × 6 and the research network 2 Gbps × 1; the Tsukuba NOC connects to SuperSINET, the JGN network, and IMNet over a 600 Mbps path.]

34 SuperSINET

• Nationwide research network connecting
  – Major universities
  – Research institutes/organizations
  – Government labs
• Optical network with
  – Optical cross-connects
  – 10 Gbps WDM backbone
  – 1 Gbps point-to-point connections
• Operating since Jan. 2002

35 SuperSINET research projects

• High-energy and fusion research
  – KEK/BELLE and LHC/ATLAS experiments
  – Fusion data analysis
  – Lattice QCD
• Space and astronomical science
  – On-line VLBI
  – GRAPE net, etc.
• Genome information analysis
  – Genome analysis
• GRID
  – Supercomputer network
• Nanotechnology
  – Material simulation
  – Grid-based large-scale simulations

36 Conclusions

• Supercomputer centers in Japan
  – An important component of science and engineering research in Japan
  – Have worked well in providing HPC resources
• Directions pursued at CCP
  – Physics/computer-science collaboration
  – Academia/vendor collaboration
  – Concentration on key problems
  – Several fundamental issues in physics successfully attacked
• With emerging high-speed networks
  – Closer collaboration among centers in Japan
  – Closer collaboration world-wide
