The Hartree Centre: Experience in Addressing Industrial, Societal and Scientific Challenges

Prof Vassil Alexandrov, Chief Science Officer

Our mission

Hartree Centre

Mission: To transform UK industry by accelerating the adoption of high performance computing, data-centric computing and AI.

Lenovo / Arm development system

Research & innovation → Development & prototyping → Deployment through collaboration → Better products & services delivered faster & cheaper

What we do

− Collaborative R&D: addressing industrial, societal and scientific challenges

− Platform as a service: give your own experts pay-as-you-go access to our compute power

− Creating digital assets: license the new industry-led software applications we create with IBM Research

− Training and skills: drop in on our comprehensive programme of specialist training courses and events, or design a bespoke course for your team

Network of expertise: government & public sector, local business networks, technology partners, universities, academia, international research communities

Hartree Research Themes: Data Science | HPC | AI | Emerging Technological Paradigms

Our track record

Case study | Code optimisation for aero engines (Collaborative R&D)

Hartree Centre experts optimised codes for modelling component design, speeding up run times by approximately 20%.

“Working with the Hartree Centre, we have quickly made significant improvements to our code, delivering faster turnaround and more capability to our engineers.”

− Matthew Street, Rolls-Royce

Case study | Computer aided formulation (Collaborative R&D)

Faster development process for products like shampoo, reducing physical testing

“The Hartree Centre’s high performance computing capabilities help us achieve better design solutions for our consumers, delivered by more efficient, cost-effective and sustainable processes.”

− Paul Howells, Unilever

Case study | Building the cognitive hospital (Collaborative R&D)

Transforming the patient experience using cognitive technology and data analytics

“Helping our patients and their families prepare properly for coming into hospital will really reduce their anxiety and could mean they spend more meaningful time with doctors so we are able to get them better faster.”

− Iain Hennessey, Alder Hey Children’s Hospital

Case study | Smart crop protection (Creating digital assets)

Building pest risk prediction models with the potential to:

• Enable the farming industry to more accurately plan preventative measures

• Reduce crop losses

• Drive down insurance rates through lower probability of crop damage

Our Platforms


Intel platforms: Bull Sequana X1000 (840 Skylake + 840 KNL processors), one of the largest supercomputers in Europe, focusing primarily on industry-led challenges

IBM big data analytics cluster | 288 TB

IBM data-centric platforms: IBM Power8 + NVLink + Tesla P100; IBM Power8 + K80

Accelerated & emerging tech: Maxeler FPGA system; ARM 64-bit platform; ClusterVision novel cooling demonstrator

Detailed Case Studies

HPSE Case Study:

• Performance: needs to get the results in time for the forecast; ever-increasing accuracy goals for climate simulations.

• Productivity: hundreds of people contributing, with different areas of expertise; 2 million lines of code (UM).

• Portability: very risky to choose just one platform: it may not be future-proofed, hardware changes more often than software, there is a procurement negotiation disadvantage if you can only run on one architecture, and it is difficult to compromise on one.

Domain Specific Languages:

Embedded Fortran-to-Fortran code generation system used by the UK Met Office next-generation weather and climate simulation model (LFRic).

Separation of concerns:

• Algorithm layer: operates on full fields (natural for computational science).
• Parallel System layer (parallel systems science).
• Kernel layer: operates on local elements or columns.

Given domain-specific knowledge and information about the Algorithm and Kernels, PSyclone can generate the Parallel System layer.

Software Parallelization and Optimization
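The algorithm/kernel separation can be sketched in miniature. The toy below is a hedged Python illustration, not PSyclone's actual API (all names are made up for this example): given a kernel that is declared to operate on columns, it emits the Parallel System layer as Fortran source with a shared-memory loop.

```python
# Minimal sketch of the algorithm/PSy/kernel idea (NOT PSyclone's actual
# API): from a kernel's metadata, generate Parallel System layer source
# that loops over columns and inserts shared-memory parallelism.

def generate_psy_layer(kernel_name, openmp=True):
    """Emit Fortran source for a PSy-layer routine invoking one kernel."""
    lines = [f"subroutine invoke_{kernel_name}(field, ncolumns)"]
    if openmp:
        lines.append("  !$omp parallel do default(shared), private(col)")
    lines.append("  do col = 1, ncolumns")
    lines.append(f"    call {kernel_name}_code(field(:, col))")
    lines.append("  end do")
    if openmp:
        lines.append("  !$omp end parallel do")
    lines.append(f"end subroutine invoke_{kernel_name}")
    return "\n".join(lines)

print(generate_psy_layer("matrix_vector_kernel"))
```

The point of the design is that the science code (algorithm and kernel layers) never mentions parallelism; the generated middle layer carries all of it, so the same source can be retargeted to different architectures.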

• Serial performance optimizations
• Distributed-memory parallelism with MPI
• Shared-memory parallelism
• GPGPU programming
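As a small illustration of the first item (an assumed example, not taken from the slides), a typical serial optimisation is replacing an interpreted per-particle loop with a vectorised NumPy reduction before any MPI/OpenMP/GPU work begins:

```python
# Sketch of a serial performance optimisation (illustrative example):
# replace a per-particle Python loop with one vectorised NumPy reduction.
import numpy as np

def kinetic_energy_loop(masses, velocities):
    """Naive per-particle loop."""
    total = 0.0
    for m, v in zip(masses, velocities):
        total += 0.5 * m * (v[0]**2 + v[1]**2 + v[2]**2)
    return total

def kinetic_energy_vectorised(masses, velocities):
    """Single vectorised expression over all particles."""
    return 0.5 * np.sum(masses * np.sum(velocities**2, axis=1))

masses = np.ones(1000)
velocities = np.full((1000, 3), 2.0)
assert np.isclose(kinetic_energy_loop(masses, velocities),
                  kinetic_energy_vectorised(masses, velocities))
```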

Example: Direct Simulation Monte Carlo (for rarefied gas flows)

PRACE (Partnership for Advanced Computing in Europe)

Currently undertaking an industrial collaboration with Briggs Automotive Company, analysing and improving the design of the BAC Mono single-seat sports car using large-scale CFD computations.

A European project that aims to provide the template for an upcoming Exascale system by co-designing and implementing a petascale-level prototype with ground-breaking characteristics, built on a cost-efficient architecture enabled by novel inter-die links and FPGA acceleration.

Work package 2: Applications, Co-design, Porting and Evaluation

Work package 3: System software and programming environment

Porting DL_MESO (DPD) on Nvidia GPUs

Jony Castagna

What is DL_MESO (DPD)?

• DL_MESO is a general purpose mesoscale simulation package developed by Michael Seaton for CCP5 and UKCOMES under a grant provided by EPSRC.

• It is written in Fortran90 and C++ and supports both the Lattice Boltzmann Equation (LBE) and Dissipative Particle Dynamics (DPD) methods.

• https://www.scd.stfc.ac.uk/Pages/DL_MESO.aspx

...similar to MD

- Free spherical particles which interact over a range that is of the same order as their diameters.

- The particles can be thought of as assemblies or aggregates of molecules, such as solvent molecules or polymers, or more simply as carriers of momentum. A cut-off radius limits the short-range forces; long-range forces are treated separately.

F_i is the sum of conservative, drag and random (or stochastic) pair forces:

F_i = Σ_{j≠i} (F^C_ij + F^D_ij + F^R_ij)

Examples of DL_MESO_DPD applications
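The three pair-force terms can be sketched as below. This is the standard Groot-Warren form of DPD; the parameter names and default values (a, gamma, sigma, rc) are illustrative, not DL_MESO's internal API.

```python
# Hedged sketch of the DPD pair force in the standard Groot-Warren form;
# parameters (a, gamma, sigma, rc) are illustrative defaults, not
# DL_MESO's internals. F_i sums this over all neighbours j within rc.
import numpy as np

def dpd_pair_force(ri, rj, vi, vj, theta=0.0,
                   a=25.0, gamma=4.5, sigma=3.0, rc=1.0):
    """Conservative + drag + random force on particle i from particle j.

    theta is the pair's symmetric Gaussian random number (zero here for
    a deterministic example)."""
    rij = ri - rj
    r = np.linalg.norm(rij)
    if r >= rc:                        # short-range cut-off
        return np.zeros(3)
    e = rij / r                        # unit vector from j to i
    w = 1.0 - r / rc                   # weight w^R(r); w^D = w^2
    f_cons = a * w * e                 # soft conservative repulsion
    f_drag = -gamma * w**2 * np.dot(e, vi - vj) * e
    f_rand = sigma * w * theta * e
    return f_cons + f_drag + f_rand
```

At half the cut-off with matching velocities only the conservative term survives, giving a force of a/2 along the separation axis; the drag and random terms satisfy the fluctuation-dissipation relation when sigma² = 2·gamma·kT.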

Vesicle formation

Phase separation

Lipid bilayer

Polyelectrolyte

DL_MESO: highly scalable mesoscale simulations (Molecular Simulation 39 (10), pp. 796-821, 2013)

Porting DL_MESO on NVidia GPU: DL_MESO_DPD on GPU

• Initialisation, IO, etc. run on the host (CPU, Fortran); arrays are passed to C and copied to the device (GPU, CUDA C).

• Main loop, all done by the GPU: first step of Velocity Verlet (VV); construct neighbour list; generate random values; find short-range forces with 1 thread per particle; find long-range forces (FFT); second step of VV; gather statistics (IO 100% compatible with the master version).

• When the final time is reached, arrays are passed back to Fortran and the run ends.

Multi GPU version
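The two VV half-steps that bracket the force evaluation can be sketched as follows (a serial NumPy stand-in for the GPU kernels; the harmonic force law is purely illustrative, not DL_MESO's DPD forces):

```python
# Sketch of the two velocity Verlet (VV) steps that bracket the force
# evaluation in the main loop (serial NumPy stand-in for GPU kernels;
# the harmonic force is illustrative only).
import numpy as np

def force(x):
    return -x                             # illustrative harmonic spring

def vv_step(x, v, f, dt):
    v_half = v + 0.5 * dt * f             # first VV step (half-kick)
    x_new = x + dt * v_half               # drift
    f_new = force(x_new)                  # neighbour list + forces go here
    v_new = v_half + 0.5 * dt * f_new     # second VV step (half-kick)
    return x_new, v_new, f_new

x, v = np.array([1.0]), np.array([0.0])
f = force(x)
for _ in range(1000):
    x, v, f = vv_step(x, v, f, dt=0.01)
# total energy 0.5*(x^2 + v^2) stays near its initial value of 0.5
```

Splitting the velocity update into two half-kicks is what makes the neighbour-list rebuild and both force passes sit naturally between them in the GPU loop above.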

Step 1: compute internal-cell forces between particles.
Step 2: overlap computation with communications!
Step 3: compute boundary-cell forces between particles.

Multi GPU version

[Diagram: 2×2 GPU decomposition, GPU 0-3, with halo regions]

Multi GPU version: use of communication between GPUs

1) to exchange particle positions using ghost cells (ideally while computing the internal cells!)

2) for particles leaving the domain

3) for gathering statistics

4) to transfer info for FFTW (not implemented yet!)

Multi GPU version: exchanging particle positions using ghost cells

Step 1: exchange x-y planes (2 communications)

Step 2: exchange x-z planes (with halo data!)

Step 3: exchange y-z planes (with halo data!)

Only 6 communications are needed instead of 26!

Weak scaling
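Why three plane exchanges reach all 26 neighbours can be demonstrated on a single padded array, with periodic in-place copies standing in for the GPU-to-GPU messages (a sketch of the idea, not the actual CUDA/MPI code): each later step forwards the halo data received earlier, so edge and corner values travel indirectly.

```python
# Why 6 messages cover all 26 neighbours: exchange one axis at a time
# and include already-filled halo data in the later steps, so edge and
# corner values arrive indirectly. One padded NumPy array with periodic
# copies stands in for the GPU-to-GPU messages.
import numpy as np

def exchange_axis(grid, axis):
    """Fill both halo planes along one axis (2 'communications')."""
    lo, hi = [slice(None)] * 3, [slice(None)] * 3
    src_lo, src_hi = [slice(None)] * 3, [slice(None)] * 3
    lo[axis], src_lo[axis] = 0, -2    # low halo <- far interior plane
    hi[axis], src_hi[axis] = -1, 1    # high halo <- near interior plane
    grid[tuple(lo)] = grid[tuple(src_lo)]
    grid[tuple(hi)] = grid[tuple(src_hi)]

n = 4
grid = np.zeros((n + 2,) * 3)
grid[1:-1, 1:-1, 1:-1] = 1.0          # interior cells owned by this GPU
for axis in (0, 1, 2):                # three plane exchanges, 2 msgs each
    exchange_axis(grid, axis)
# every halo cell, including the 8 corners, now holds forwarded data
```

With 26 point-to-point messages each corner and edge would need its own send; the axis-by-axis scheme trades that for a strict ordering of 3 exchange steps.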

1.2 billion particles for a mixture phase separation

[Plot: weak-scaling efficiency vs number of GPUs (up to 512) for DL_MESO (GPU version) on Piz Daint (CSCS), against the ideal line]

Strong scaling

1.8 billion particles for a mixture phase separation

No load imbalance across GPUs, only within a single GPU due to particle clustering!

Xiaohu

Case studies | Software Development for Energy and Environment

IMPORTANCE: these are key Hartree Centre technologies, aligned with STFC global challenge schemes.

Methods: Finite Element Method (FEM); Smoothed Particle Hydrodynamics (SPH)

Applications: nuclear; Schlumberger oil reservoir; wave energy converter; wave impact on oil rig

Programmes: NERC ocean roadmap; EPSRC MAGIC; tsunami; CCP-WSI

Scalable Algorithm Development

Large-scale application software development and advanced computational methods development.

FEM: unstructured mesh pre/post processing; mesh topology management; basic FEM math operators; FEM matrix assembly; mesh adaptivity

SPH/ISPH: SPH pre/post processing; nearest neighbour list search; basic SPH math operators; SPH particle refinement

Shared components: mesh/particle reordering; sparse linear solver; DDM/DLB

Programming models: MPI, OpenMP, CUDA, OpenCL, OpenACC. Languages: C/C++, Fortran, Python.

Funded activities:
• Common particle methods kernels and exascale algorithms (EPSRC, CCP, SLA; EPSRC eCSE)
• Common unstructured application framework (EPSRC, Innovate UK, HRBEU)
• Unstructured mesh performance portability techs (Innovate UK, GCRF)

Advanced Monte Carlo Methods for Linear Algebra
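The nearest-neighbour list search named above is commonly implemented as a cell list; the sketch below uses illustrative names, not the framework's actual API: bin particles into cells no smaller than the cut-off radius, then scan only the 27 surrounding cells per particle instead of all pairs.

```python
# Hedged sketch of a cell-list nearest-neighbour search (illustrative
# names): bin particles into cells of side >= the cut-off radius, then
# only the 27 surrounding cells need scanning for each particle.
import numpy as np
from collections import defaultdict
from itertools import product

def build_cell_list(positions, box, rc):
    """Bin particle indices into cubic cells of side >= rc."""
    ncell = max(1, int(box // rc))
    size = box / ncell
    cells = defaultdict(list)
    for idx, p in enumerate(positions):
        cells[tuple((p // size).astype(int) % ncell)].append(idx)
    return cells, ncell, size

def neighbours(i, positions, cells, ncell, size, rc):
    """Indices of particles within rc of particle i (27-cell stencil)."""
    p = positions[i]
    c = (p // size).astype(int) % ncell
    found = []
    for d in product((-1, 0, 1), repeat=3):
        for j in cells.get(tuple((c + d) % ncell), []):
            if j != i and np.linalg.norm(positions[j] - p) < rc:
                found.append(j)
    return found

positions = np.array([[1.0, 1.0, 1.0], [1.2, 1.0, 1.0], [3.0, 3.0, 3.0]])
cells, ncell, size = build_cell_list(positions, box=4.0, rc=0.5)
# particle 1 is 0.2 away from particle 0; particle 2 is far from both
```

Note this sketch measures plain distances; a production SPH code would also apply the minimum-image convention across periodic boundaries.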

Vassil Alexandrov

Advanced Monte Carlo Methods for Linear Algebra on Advanced Accelerator Architectures

Anton Lebedev (Institute for Theoretical Physics, University of Tuebingen, Germany), [email protected]
Vassil Alexandrov (Hartree Centre STFC, UK and ICREA, Spain), [email protected]
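One classic flavour of Monte Carlo for linear algebra is a random-walk (Neumann-Ulam style) estimator; the sketch below is illustrative only, not the authors' MSPAI implementation: it estimates one component of the solution of x = Lx + f by averaging weighted walks, which is what makes the approach embarrassingly parallel on accelerators.

```python
# Hedged sketch of a Monte Carlo method for linear algebra (random-walk,
# Neumann-Ulam style estimator; illustrative, not the MSPAI code):
# estimate one component of the solution of x = Lx + f.
import numpy as np

def mc_solve_component(L, f, i, n_walks=100_000, p_stop=0.3, seed=42):
    """Estimate x[i] for x = Lx + f (needs spectral radius of L < 1)."""
    rng = np.random.default_rng(seed)
    n = len(f)
    p_trans = 1.0 / n                    # uniform transition probability
    total = 0.0
    for _ in range(n_walks):
        state, weight, acc = i, 1.0, 0.0
        while True:
            acc += weight * f[state]     # score the visited state
            if rng.random() < p_stop:    # absorb: walk terminates
                break
            nxt = int(rng.integers(n))   # uniformly random next state
            # importance weight keeps the estimator unbiased
            weight *= L[state, nxt] / (p_trans * (1.0 - p_stop))
            state = nxt
        total += acc
    return total / n_walks

L = np.array([[0.1, 0.2], [0.3, 0.1]])
f = np.array([1.0, 2.0])
est = mc_solve_component(L, f, i=0)      # compare with np.linalg.solve
```

Each walk is independent, so on a GPU one thread per walk maps naturally; the trade-off is the O(1/√N) statistical error against the ability to estimate single components without factorising the matrix.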

[Plots: speed-ups of the hybrid MSPAI OpenMP implementation vs the GPU implementation, Intel Xeon 8160 vs NVIDIA V100 and K80]

Data Science

Simon Goodchild

[Plot: accuracy (%) with 13 features]

Hartree Centre Impact

“Rolls-Royce has worked with the Hartree Centre for many years due to their combination of highly skilled computational scientists and state-of-the-art HPC.”

- Leigh Lapworth, Rolls-Royce

“The Hartree Centre’s expertise made our testing process cheaper and more reliable, enabling us to get our product to market faster.”

- William Wilson, GLOBAL-365 (evaluation participant)

Training and Education at Hartree Centre

Hartree research and training engagement

UKRI / STFC programs & initiatives

• Strategic partnerships & outreach activities; international collaboration
• HPC RD&I: supercomputing, big data analytics, AI, RSE, visual computing
• Training courses & knowledge transfer
• Commercial & research projects

University liaison

Knowledge Transfer, Events and Activities

Over 3000 person-days of training delivered to industrial and academic audiences.

• Workshops: aimed at decision-makers, these shed light on applications of High Performance Computing (HPC), Data Analytics, Research Software Engineering and Artificial Intelligence (AI), incorporating individual consultations with our Business Development (BD) team.

• Training courses: the usual format is 1 to 3 days, with sessions alternating between the presentation of a novel concept and a technical hands-on session that helps trainees apply the technique or skill on the job. Courses usually end with a “bring your own problem” session to build participants’ confidence in tackling the technology or methodology discussed.

• Upskilling and collaboration for specific projects: a practical workshop with experienced researchers and engineers from the Hartree Centre can kick-start a collaboration by exploring some of the required steps, for example how to collect and prepare your data for AI applications, how to evaluate the performance of software, and how to optimise applications for HPC systems or the cloud.

Impact of Hartree engagement with Universities

• Hartree Annual Doctoral Symposium, open to all of the CDTs collaborating with the Centre. The aim of the Symposium is to provide a forum for students to share the results of their research through talks, poster sessions and discussions.

• Internships & placements, under an Individual Research Training plan. The Hartree Centre guarantees free access to all Hartree training courses and seminars for the students it supervises, and addresses the requirements for their technological skills.

• Functional skills: presenting research results to academic and non-academic audiences is just as important as the ability to produce technical and business reports and to successfully publish research papers, so opportunities to work in collaboration with Hartree researchers on written and verbal communication targeting diverse audiences are part of the PDP.

Thank you

Vassil Alexandrov, [email protected]

[email protected]

[email protected]