The Hartree Centre: Experience in Addressing Industrial, Societal and Scientific Challenges
Prof Vassil Alexandrov, Chief Science Officer

Our mission
To transform UK industry by accelerating the adoption of high performance computing, data-centric computing, big data and AI.

Lenovo / Arm development system

Research & innovation → development & prototyping → deployment through collaboration: better products and services, delivered faster and cheaper.

What we do
− Collaborative R&D: address industrial, societal and scientific challenges
− Platform as a service: give your own experts pay-as-you-go access to our compute power
− Creating digital assets: license the new industry-led software applications we create with IBM Research
− Training and skills: drop in on our comprehensive programme of specialist training courses and events, or design a bespoke course for your team

Network of expertise
Government & public sector | local business networks | technology partners | universities & academia | international research communities

Hartree research themes
Data science | HPC | AI | Emerging technological paradigms

Our track record

Case study | Code optimisation for aero engines (Collaborative R&D)
Hartree Centre experts optimised codes for modelling component design, speeding up run times by approximately 20%.
“Working with the Hartree Centre, we have quickly made significant improvements to our code, delivering faster turnaround and more capability to our engineers.” − Matthew Street, Rolls-Royce

Case study | Computer aided formulation (Collaborative R&D)
A faster development process for products like shampoo, reducing physical testing.
“The Hartree Centre’s high performance computing capabilities help us achieve better design solutions for our consumers, delivered by more efficient, cost-effective and sustainable processes.” − Paul Howells, Unilever

Case study | Building the cognitive hospital
(Collaborative R&D)
Transforming the patient experience using cognitive technology and data analytics.
“Helping our patients and their families prepare properly for coming into hospital will really reduce their anxiety and could mean they spend more meaningful time with doctors, so we are able to get them better faster.” − Iain Hennessey, Alder Hey Children’s Hospital

Case study | Smart crop protection (Creating digital assets)
Building pest risk prediction models with the potential to:
• Enable the farming industry to plan preventative measures more accurately
• Reduce crop losses
• Drive down insurance rates through a lower probability of crop damage

Our platforms
Intel platforms:
• Bull Sequana X1000 (840 Skylake + 840 KNL processors), one of the largest supercomputers in Europe, focusing primarily on industry-led challenges
• IBM big data analytics cluster | 288 TB
IBM data-centric platforms:
• IBM Power8 + NVLink + Tesla P100
• IBM Power8 + Nvidia K80
Accelerated & emerging technologies:
• Maxeler FPGA system
• ARM 64-bit platform
• Clustervision novel cooling demonstrator

Detailed case studies

HPSE case study:
• Performance: results are needed in time for the forecast, against ever-increasing accuracy goals for climate simulations.
• Productivity: hundreds of people contribute, with different areas of expertise, to 2 million lines of code (the UM).
• Portability: very risky to choose just one platform. It may not be future-proofed, hardware changes more often than software, and there is a procurement negotiation disadvantage if you can only run on one architecture. Difficult to compromise on one.

Domain-Specific Languages: an embedded Fortran-to-Fortran code generation system is used by the UK Met Office's next-generation weather and climate simulation model (LFRic). The Algorithm layer (natural science) operates on full fields; the Kernel layer (computational science) operates on local elements or columns. Given domain-specific knowledge and information about the Algorithm and Kernels, PSyclone can generate the Parallel System layer between them.

Software parallelization and optimization:
• Serial performance optimizations
• Distributed-memory parallelism with MPI
• Shared-memory parallelism
• GPGPU programming
Example: Direct Simulation Monte Carlo (for rarefied gas flows).

PRACE (Partnership for Advanced Computing in Europe)
Currently undertaking an industrial collaboration with Briggs Automotive Company: analysing and improving the design of the BAC Mono single-seat sports car using large-scale CFD computations.

A European project that aims to provide the template for an upcoming exascale system by co-designing and implementing a petascale-level prototype with ground-breaking characteristics, building on a cost-efficient architecture enabled by novel inter-die links and FPGA acceleration.
• Work package 2: Applications, co-design, porting and evaluation
• Work package 3: System software and programming environment

Porting DL_MESO (DPD) on Nvidia GPUs
Jony Castagna

What is DL_MESO (DPD)?
• DL_MESO is a general-purpose mesoscale simulation package developed by Michael Seaton for CCP5 and UKCOMES under a grant provided by EPSRC.
• It is written in Fortran90 and C++ and supports both the Lattice Boltzmann Equation (LBE) and Dissipative Particle Dynamics (DPD) methods.
• https://www.scd.stfc.ac.uk/Pages/DL_MESO.aspx

DPD is similar to MD:
• Free spherical particles which interact over a range that is of the same order as their diameters.
• The particles can be thought of as assemblies or aggregates of molecules, such as solvent molecules or polymers, or more simply as carriers of momentum.
• A cut-off radius is applied for short-range forces between particles i and j; long-range forces are treated separately.
• The force F_i on particle i is the sum of conservative, drag and random (or stochastic) pair forces:
  F_i = Σ_{j≠i} (F_ij^C + F_ij^D + F_ij^R)

Examples of DL_MESO_DPD applications: vesicle formation, phase separation, lipid bilayers, polyelectrolytes.
DL_MESO: highly scalable mesoscale simulations, Molecular Simulation 39 (10), pp. 796-821, 2013.

DL_MESO_DPD on GPU
The host CPU (Fortran) handles initialisation and IO, passes arrays to C and copies them to the GPU device (CUDA C). The main loop then runs entirely on the GPU:
1. First Velocity Verlet step
2. Construct neighbour list and generate random values
3. Find short-range forces, with 1 thread per particle
4. Find long-range forces (FFT)
5. Second Velocity Verlet step
6. Gather statistics
If time < final time the loop repeats; otherwise the arrays are passed back to Fortran and the run ends. IO is 100% compatible with the master version.

Multi-GPU version
• Step 1: compute internal cell forces between particles, overlapping computation with communications!
• Steps 2-3: compute boundary cell forces between particles
The domain is partitioned across GPUs (GPU 0, 1, 2, 3, ...) with halo regions at the boundaries. Communication between GPUs is used:
1) to exchange particle positions using ghost cells (ideally while computing the internal cells!)
2) for particles leaving the domain
3) for gathering statistics
4) to transfer info for FFTW (not implemented yet!)

Exchanging particle positions using ghost cells proceeds plane by plane:
• Step 1: exchange x-y planes (2 communications)
• Step 2: exchange x-z planes (with halo data!)
• Step 3: exchange y-z planes (with halo data!)
Because steps 2 and 3 forward the halo data already received, edge and corner neighbours are reached implicitly: only 6 communications are needed instead of 26.
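The 6-versus-26 message count can be checked with a minimal, self-contained sketch (illustrative only, not DL_MESO source code): a domain with a one-cell ghost shell receives each pair of face planes in turn, and because every later exchange includes the ghost cells filled by the earlier ones, the edge and corner neighbours arrive without any diagonal messages.

```python
# Illustrative sketch: plane-by-plane ghost-cell exchange on a cubic domain.
# Axis convention (hypothetical): step 1 = x-y planes (normal z), step 2 =
# x-z planes (normal y), step 3 = y-z planes (normal x).
n = 4                       # interior cells per side; 0 and n+1 are ghost layers
size = n + 2
filled = [[[1 <= i <= n and 1 <= j <= n and 1 <= k <= n
            for k in range(size)] for j in range(size)] for i in range(size)]
messages = 0

def exchange(axis, done_axes):
    """Receive the two ghost planes perpendicular to `axis`. Incoming data
    spans the full extent (ghosts included) of every axis exchanged earlier,
    but only the interior of axes not yet exchanged."""
    global messages
    for ghost in (0, n + 1):            # one message per neighbour direction
        messages += 1
        for i in range(size):
            for j in range(size):
                for k in range(size):
                    idx = (i, j, k)
                    if idx[axis] != ghost:
                        continue
                    if all(ax == axis or ax in done_axes or
                           1 <= idx[ax] <= n for ax in range(3)):
                        filled[i][j][k] = True

exchange(2, set())       # step 1: exchange x-y planes (2 communications)
exchange(1, {2})         # step 2: exchange x-z planes (with z-halo data)
exchange(0, {1, 2})      # step 3: exchange y-z planes (with y- and z-halo data)

print(messages)          # prints 6
print(filled[0][0][0])   # prints True: the corner arrived with no 7th message
```

After the three sweeps the entire ghost shell is populated, which is why forwarding halos makes the 20 edge and corner exchanges unnecessary.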
Weak scaling
[Figure: weak scaling for 1.2 billion particles in a mixture phase separation, DL_MESO (GPU version) vs ideal, on up to 512 GPUs of Piz Daint (CSCS).]

Strong scaling
1.8 billion particles for a mixture phase separation. No load imbalance across GPUs, other than within a single GPU due to particle clustering!

Case studies | Software Development for Energy and Environment
Xiaohu
Importance: Hartree Centre key technologies, aligned with STFC global challenge schemes.
Methods: Finite Element Method (FEM) and Smoothed Particle Hydrodynamics (SPH).
Applications: nuclear; Schlumberger oil reservoir modelling; wave energy converters; wave impact on oil rigs; NERC ocean roadmap; EPSRC MAGIC; tsunami modelling; CCP-WSI.

Scalable algorithm development
Large-scale application software development and advanced computational methods development:
• FEM: unstructured mesh pre/post processing, mesh topology management, basic FEM math operators, FEM matrix assembly, mesh adaptivity/refinement
• SPH/ISPH: SPH pre/post processing, nearest neighbour list search, basic SPH math operators, SPH particle refinement
• Shared: mesh/particle reordering, sparse linear solvers, DDM/DLB (domain decomposition / dynamic load balancing)
• Parallelism and languages: MPI, OpenMP, CUDA, OpenCL, OpenACC; C/C++, Fortran, Python
Funded activities: unstructured application framework (EPSRC, CCP, SLA); common particle methods kernels and exascale algorithms (EPSRC eCSE); common unstructured application framework (EPSRC, Innovate UK, HRBEU); unstructured mesh performance portability technologies (Innovate UK, GCRF).

Advanced Monte Carlo Methods for Linear Algebra on Advanced Accelerator Architectures
Anton Lebedev (Institute for Theoretical Physics, University of Tuebingen, Germany), [email protected]
Vassil Alexandrov (Hartree Centre STFC, UK and ICREA, Spain), [email protected]
[Figures: speed-ups of the hybrid Monte Carlo MSPAI, comparing OpenMP (Intel Xeon 8160) and GPU (V100 and K80) implementations.]

Data Science
Simon Goodchild
[Figure: classification accuracy (%) with 13 features.]
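As background to the Monte Carlo linear algebra theme above, here is a minimal Python sketch of the classic Ulam-von Neumann random-walk scheme that such methods build on: estimating a component of x = (I − A)⁻¹ b by sampling the Neumann series x = b + Ab + A²b + … The matrix, vector, and parameters below are illustrative assumptions, not data from the MSPAI work.

```python
# Illustrative Monte Carlo solve of x = A x + b, i.e. x = (I - A)^(-1) b.
# Works when the Neumann series converges (spectral radius of A below 1).
import random

A = [[0.1, 0.2, 0.0],
     [0.0, 0.1, 0.3],
     [0.2, 0.0, 0.1]]       # example matrix with spectral radius < 1
b = [1.0, 1.0, 1.0]
n = len(A)

def walk(i, stop_prob=0.5):
    """One random-walk estimate of x_i. Transitions are uniform; the weight
    corrects for both the transition probability and early termination."""
    state, weight, total = i, 1.0, b[i]
    while random.random() > stop_prob:
        nxt = random.randrange(n)
        weight *= A[state][nxt] / ((1.0 - stop_prob) * (1.0 / n))
        total += weight * b[nxt]
        state = nxt
        if weight == 0.0:   # walk can no longer contribute
            break
    return total

random.seed(42)
samples = 100_000
estimate = sum(walk(0) for _ in range(samples)) / samples
# Exact x_0 from solving (I - A) x = b directly is ~1.4644
print(round(estimate, 2))
```

Averaging many independent walks gives an unbiased estimate of a single solution component without ever factorising the matrix, which is what makes the approach attractive on accelerators: walks are independent and map naturally onto GPU threads.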
Hartree Centre impact
“Rolls-Royce has worked with the Hartree Centre for many years due to their combination of highly skilled computational scientists and state-of-the-art HPC.” − Leigh Lapworth, Rolls-Royce
“The Hartree Centre’s expertise made our testing process cheaper and more reliable, enabling us to get our product