The Hartree Centre: Experience in Addressing Industrial, Societal and Scientific Challenges
Prof Vassil Alexandrov, Chief Science Officer

Our mission
Transforming UK industry by accelerating the adoption of high performance computing, big data and AI.

Hartree Centre
Mission: to transform UK industry by accelerating the adoption of high performance computing, data-centric computing and AI.
Lenovo / Arm development system
Research & innovation → development & prototyping → deployment through collaboration → better products & services delivered faster & cheaper.

What we do
− Collaborative R&D: address industrial, societal and scientific challenges.
− Platform as a service: give your own experts pay-as-you-go access to our compute power.
− Creating digital assets: license the new industry-led software applications we create with IBM Research.
− Training and skills: drop in on our comprehensive programme of specialist training courses and events, or design a bespoke course for your team.
Network of expertise: government & public sector, local business networks, technology partners, universities, academia, international research communities.

Hartree research themes: data science, HPC, AI, emerging technological paradigms.

Our track record

Case study | Code optimisation for aero engines | Collaborative R&D
Hartree Centre experts optimised codes for modelling component design, speeding up run times by approximately 20%.
“Working with the Hartree Centre, we have quickly made significant improvements to our code, delivering faster turnaround and more capability to our engineers.”
− Matthew Street, Rolls-Royce

Case study | Computer aided formulation | Collaborative R&D
Faster development process for products like shampoo, reducing physical testing
“The Hartree Centre’s high performance computing capabilities help us achieve better design solutions for our consumers, delivered by more efficient, cost-effective and sustainable processes.”
− Paul Howells, Unilever

Case study | Building the cognitive hospital | Collaborative R&D
Transforming the patient experience using cognitive technology and data analytics
“Helping our patients and their families prepare properly for coming into hospital will really reduce their anxiety and could mean they spend more meaningful time with doctors so we are able to get them better faster.”
− Iain Hennessey, Alder Hey Children’s Hospital

Case study | Smart crop protection | Creating digital assets
Building pest risk prediction models with the potential to:
• Enable the farming industry to more accurately plan preventative measures
• Reduce crop losses
• Drive down insurance rates through lower probability of crop damage

Our Platforms
• Intel platforms: Bull Sequana X1000 (840 Skylake + 840 KNL processors), one of the largest supercomputers in Europe focusing primarily on industry-led challenges
• IBM big data analytics cluster | 288 TB
• IBM data-centric platforms: IBM Power8 + NVLink + Tesla P100; IBM Power8 + NVIDIA K80
• Accelerated & emerging tech: Maxeler FPGA system; Arm 64-bit platform; ClusterVision novel cooling demonstrator

Detailed Case Studies
HPSE Case Study:
• Performance: results must be ready in time for the forecast, with ever-increasing accuracy goals for climate simulations.
• Productivity: hundreds of people contributing with different areas of expertise; 2 million lines of code (UM).
• Portability: very risky to choose just one platform: it may not be future-proofed, hardware changes more often than software, and there is a procurement negotiation disadvantage if you can only run on one architecture; difficult to compromise on one.

Domain Specific Languages:
PSyclone: an embedded Fortran-to-Fortran code generation system used by the UK Met Office's next-generation weather and climate simulation model (LFRic).
Three-layer separation:
• Algorithm layer (natural science): operates on full fields
• Parallel System (PSy) layer (computational science)
• Kernel layer: operates on local elements or columns
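As a toy illustration of this separation (plain Python with invented names, not real PSyclone or LFRic code): the kernel only ever sees one column, the PSy layer owns the loop over the mesh, and the algorithm layer speaks only in whole fields.

```python
import numpy as np

def kernel_scale_column(column, alpha):
    """Kernel layer: operates on a single column of data only."""
    return alpha * column

def psy_layer(field, alpha):
    """PSy layer: owns the loop over columns of the mesh. This is the
    layer PSyclone generates, and where parallelism would be inserted."""
    out = np.empty_like(field)
    for j in range(field.shape[1]):      # loop over columns
        out[:, j] = kernel_scale_column(field[:, j], alpha)
    return out

def algorithm(field):
    """Algorithm layer: written in terms of whole fields only."""
    return psy_layer(field, alpha=2.0)
```

Because the column loop lives in one generated layer, the parallelisation strategy can be changed without touching the science code.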
Given domain-specific knowledge and information about the Algorithm and Kernels, PSyclone can generate the Parallel System (PSy) layer.

Software Parallelization and Optimization
• Serial performance optimizations
• Distributed-memory parallelism with MPI
• Shared-memory parallelism
• GPGPU programming
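As a minimal sketch of the shared-memory level (Python threads standing in for OpenMP threads; the generic chunked-reduction pattern, not any Hartree code):

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

# Chunked parallel reduction: each worker sums one slice of the array and
# the partial sums are combined at the end, the same pattern an OpenMP
# "parallel for" reduction uses.
data = np.arange(1_000_000, dtype=np.float64)
chunks = np.array_split(data, 8)         # one chunk per worker

def partial_sum(chunk):
    return chunk.sum()

with ThreadPoolExecutor(max_workers=8) as ex:
    total = sum(ex.map(partial_sum, chunks))
```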
Example: Direct Simulation Monte Carlo (for rarefied gas flows)

PRACE (Partnership for Advanced Computing in Europe)
Currently undertaking an industrial collaboration with Briggs Automotive Company, analysing and improving the design of the BAC Mono single-seat sports car using large-scale CFD computations.

A European project that aims to provide the template for an upcoming exascale system by co-designing and implementing a petascale-level prototype with ground-breaking characteristics. It builds on a cost-efficient architecture enabled by novel inter-die links and FPGA acceleration.
Work package 2: Applications, Co-design, Porting and Evaluation
Work package 3: System software and programming environment

Porting DL_MESO (DPD) on NVIDIA GPUs
Jony Castagna

What is DL_MESO (DPD)?
• DL_MESO is a general-purpose mesoscale simulation package developed by Michael Seaton for CCP5 and UKCOMES under a grant provided by EPSRC.
• It is written in Fortran90 and C++ and supports both the Lattice Boltzmann Equation (LBE) and Dissipative Particle Dynamics (DPD) methods.
• https://www.scd.stfc.ac.uk/Pages/DL_MESO.aspx

...similar to MD:
- Free spherical particles which interact over a range that is of the same order as their diameters.
- The particles can be thought of as assemblies or aggregates of molecules, such as solvent molecules or polymers, or more simply as carriers of momentum.

[Diagram: particles i and j with a cut-off radius for the short-range forces; long-range forces act beyond it.]
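The three DPD pair forces can be sketched in their standard Groot-Warren form; the parameters (a, gamma, kT, rc, dt) below are illustrative values, not DL_MESO defaults.

```python
import numpy as np

# Illustrative DPD parameters (NOT DL_MESO defaults)
rc, a, gamma, kT, dt = 1.0, 25.0, 4.5, 1.0, 0.01
sigma = np.sqrt(2.0 * gamma * kT)        # fluctuation-dissipation relation

def dpd_pair_force(ri, rj, vi, vj, theta):
    """Force on particle i from j: conservative + drag + random.
    theta is the symmetric Gaussian random number shared by the pair."""
    rij = ri - rj
    r = np.linalg.norm(rij)
    if r >= rc:
        return np.zeros(3)               # short-range cut-off
    e = rij / r                          # unit vector from j to i
    w = 1.0 - r / rc                     # weight function w(r)
    fc = a * w * e                               # conservative (soft repulsion)
    fd = -gamma * w**2 * np.dot(e, vi - vj) * e  # drag (dissipative)
    fr = sigma * w * theta / np.sqrt(dt) * e     # random (stochastic)
    return fc + fd + fr
```

Because theta is shared by the pair, the total force is antisymmetric (F_ij = -F_ji), so momentum is conserved even with the random term.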
F_i is the sum of conservative, drag and random (or stochastic) pair forces: F_i = Σ_{j≠i} (F^C_ij + F^D_ij + F^R_ij).

Examples of DL_MESO_DPD applications
Vesicle Formation
Phase separation
Lipid Bilayer
Polyelectrolyte

DL_MESO: highly scalable mesoscale simulations. Molecular Simulation 39 (10), pp. 796-821, 2013.

Porting DL_MESO on NVIDIA GPU
DL_MESO_DPD on GPU
Initialisation, IO, etc. stay on the host CPU (Fortran); arrays are passed to C and copied to the device GPU (CUDA C).

Main loop, all done by the GPU:
• first velocity Verlet step
• construct neighbour list
• generate random values
• find short-range forces with 1 thread per particle
• find long-range forces (FFT)
• second velocity Verlet step
• gather statistics (IO 100% compatible with the master version)

When the final time is reached, arrays are passed back to Fortran and the run ends.

Multi GPU version
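The loop structure the GPU executes each step can be mimicked in a few lines; a 1-D harmonic force stands in for the DPD neighbour-list and pair-force kernels (a sketch of the flow, not DL_MESO code):

```python
def force(x):
    """Placeholder for the neighbour-list + short/long-range force kernels."""
    return -x            # 1-D harmonic spring, k = m = 1

x, v, dt = 1.0, 0.0, 0.01
f = force(x)
for step in range(1000):
    v += 0.5 * dt * f    # first velocity Verlet step
    x += dt * v
    f = force(x)         # rebuild neighbour list, recompute forces
    v += 0.5 * dt * f    # second velocity Verlet step
    # gather statistics here, as in the flowchart
energy = 0.5 * v**2 + 0.5 * x**2   # conserved to O(dt^2) by the scheme
```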
Step 1: compute internal cell forces between particles, overlapping computation with communications!
Steps 2 and 3: compute boundary cell forces between particles.
[Diagram: 2×2 domain decomposition across GPU 0, GPU 1, GPU 2 and GPU 3, with halo regions at the subdomain boundaries.]

Communication between GPUs is used:
1) to exchange particle positions using ghost cells (ideally while computing the internal cells!)
2) for particles leaving the domain
3) gathering statistics
4) to transfer info for FFTW (not implemented yet!)

Exchanging particle positions using ghost cells:
step 1: exchange x-y planes (2 communications)
step 2: exchange x-z planes (with halo data!)
step 3: exchange y-z planes (with halo data!)
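A single-process mock-up of this plane-by-plane exchange (numpy copies standing in for GPU-to-GPU messages, on a toy 2×2×2 periodic grid of subdomains): because each later step resends halo data received earlier, edge and corner neighbours are reached without any extra messages.

```python
import numpy as np
from itertools import product

P, n = (2, 2, 2), 4                      # 2x2x2 grid of "GPUs", 4^3 interior cells
doms = {}
for c in product(range(2), repeat=3):
    a = np.full((n + 2,) * 3, -1.0)      # one-cell halo, -1 = not yet filled
    a[1:-1, 1:-1, 1:-1] = c[0] * 4 + c[1] * 2 + c[2]   # payload = rank id
    doms[c] = a

def exchange(axis):
    """One plane exchange per direction; slabs include halos filled earlier."""
    for c, src in doms.items():
        nb = list(c); nb[axis] = (nb[axis] + 1) % P[axis]
        dst = doms[tuple(nb)]            # periodic neighbour in +axis
        sl = lambda k: tuple(k if ax == axis else slice(None) for ax in range(3))
        dst[sl(0)] = src[sl(-2)]         # my high face -> neighbour's low halo
        src[sl(-1)] = dst[sl(1)]         # neighbour's low face -> my high halo

for axis in range(3):                    # 3 steps x 2 messages = 6, not 26
    exchange(axis)
```

After the three steps every halo cell, including the edge and corner cells that would otherwise need diagonal messages, holds data from the correct neighbour.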
Only 6 communications are needed instead of 26!

Weak scaling
1.2 billion particles for a mixture phase separation
[Plot: weak-scaling efficiency vs number of GPUs (up to 512) for DL_MESO (GPU version) on Piz Daint (CSCS), against the ideal line.]

Strong scaling
1.8 billion particles for a mixture phase separation
No load imbalance across GPUs, only within a single GPU due to particle clustering!

Case studies | Software Development for Energy and Environment
Xiaohu
IMPORTANCE: key technologies for the Hartree Centre, aligned with STFC global challenge schemes.
Application areas:
• Finite Element Method: nuclear, Schlumberger oil reservoir, NERC ocean roadmap, EPSRC MAGIC
• Smoothed Particle Hydrodynamics: wave energy converter, wave impact on oil rig, tsunami, CCP-WSI

Scalable Algorithm Development
Large scale application software development, advanced computational methods development.
FEM and SPH/ISPH software stack:
• FEM: unstructured mesh pre/post processing, mesh topology management, basic FEM math operators, FEM matrix assembly, mesh adaptivity
• SPH/ISPH: SPH pre/post processing, nearest neighbour list search, basic SPH math operators, SPH particle refinement
• Shared: mesh/particle reordering, sparse linear solvers, DDM/DLB
• Parallelisation: MPI, OpenMP, CUDA, OpenCL, OpenACC
• Languages: C/C++, Fortran, Python
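Of these components, the nearest-neighbour list search is the easiest to sketch. A cell-list version (illustrative Python, not the framework's implementation) finds all pairs within a cut-off in roughly O(N) rather than O(N^2):

```python
import numpy as np
from collections import defaultdict
from itertools import product

def cell_list_pairs(pos, rc):
    """All pairs (i, j), i < j, separated by less than rc.
    Binning particles into cells of side rc means only the 27
    surrounding cells of each cell need checking."""
    cells = defaultdict(list)
    for i, p in enumerate(pos):
        cells[tuple((p // rc).astype(int))].append(i)
    pairs = set()
    for c, members in cells.items():
        for off in product((-1, 0, 1), repeat=3):
            nb = tuple(np.add(c, off))
            for i in members:
                for j in cells.get(nb, ()):
                    if i < j and np.linalg.norm(pos[i] - pos[j]) < rc:
                        pairs.add((i, j))
    return pairs
```

The same idea underlies the neighbour lists in both the SPH codes and the DPD force loops discussed earlier.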
Projects and funding:
• Unstructured Application Framework (EPSRC, CCP, SLA)
• Common Particle Methods Kernels / Exascale Algorithms (EPSRC eCSE)
• Common Unstructured Application Framework (EPSRC, Innovate UK, HRBEU)
• Unstructured Mesh Performance Portability Techs (Innovate UK, GCRF)

Advanced Monte Carlo Methods for Linear Algebra
Vassil Alexandrov

Advanced Monte Carlo Methods for Linear Algebra on Advanced Accelerator Architectures
Anton Lebedev (Institute for Theoretical Physics, University of Tuebingen, Germany), [email protected]
Vassil Alexandrov (Hartree Centre STFC, UK and ICREA, Spain), [email protected]
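As background, the classic Ulam-von Neumann Monte Carlo scheme estimates the solution of x = Hx + f by random walks; the sketch below uses toy values of H and f, and is not the hybrid MSPAI algorithm benchmarked here.

```python
import numpy as np

rng = np.random.default_rng(0)
H = np.array([[0.1, 0.2, 0.1],          # spectral radius < 1, entries >= 0
              [0.2, 0.1, 0.2],
              [0.1, 0.2, 0.1]])
f = np.array([1.0, 2.0, 3.0])

def mc_solve(H, f, n_walks=20_000):
    """Estimate x_i = sum_m (H^m f)_i by absorbing random walks with
    transition probability P(k -> j) = H[k, j] (so all weights are 1)."""
    row_sums = H.sum(axis=1)            # < 1: walk dies with prob 1 - row_sum
    cum = np.cumsum(H, axis=1)          # per-row transition thresholds
    x = np.zeros(len(f))
    for i in range(len(f)):
        total = 0.0
        for _ in range(n_walks):
            k = i
            while True:
                total += f[k]           # score f at every visited state
                u = rng.random()
                if u >= row_sums[k]:    # absorbed
                    break
                k = int(np.searchsorted(cum[k], u, side="right"))
        x[i] = total / n_walks
    return x
```

A direct solve of (I - H) x = f gives the reference answer; the Monte Carlo estimate converges to it at the usual 1/sqrt(n_walks) rate, and each walk is independent, which is what makes the method attractive on many-core accelerators.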
[Plots: speed-ups of the hybrid Monte Carlo MSPAI implementation, OpenMP (Intel Xeon 8160) vs GPU (V100 and K80).]

Data Science
Simon Goodchild

[Chart: classification accuracy (%) using 13 features.]

Hartree Centre Impact
“Rolls-Royce has worked with the Hartree Centre for many years due to their combination of highly skilled computational scientists and state-of-the-art HPC.”
- Leigh Lapworth, Rolls-Royce
“The Hartree Centre’s expertise made our testing process cheaper and more reliable, enabling us to get our product to market faster.”
- William Wilson, GLOBAL-365

Evaluation participants

Training and Education at Hartree Centre
Hartree research and training engagement
• UKRI / STFC programs & initiatives
• Strategic partnerships & international collaboration
• Outreach activities
• HPC RD&I: supercomputing, big data analytics, AI, RSE, visual computing
• Training courses & knowledge transfer
• Commercial & research projects
• University liaison

Knowledge Transfer, Events and Activities
Over 3000 person-days of training delivered to industrial and academic audiences.
• Workshops: aimed at decision-makers, these shed light on high-performance computing (HPC), data analytics, research software engineering and artificial intelligence (AI) applications, incorporating individual consultations with our Business Development (BD) team.
• Training courses: the usual format is 1 to 3 days, with sessions alternating between presenting a novel concept and a technical hands-on session that helps trainees apply the technique or skill on the job. Courses usually end with a "bring your own problem" session to build participants' confidence in tackling the technology or methodology discussed.
• Upskilling and collaboration for specific projects: a practical workshop with experienced researchers and engineers from the Hartree Centre can kick-start a collaboration by exploring some of the required steps, for example how to collect and prepare your data for AI applications, how to evaluate the performance of software, and how to optimise applications for HPC systems or the cloud.

Impact of Hartree engagement with Universities
• Hartree Annual Doctoral Symposium, open to all of the CDTs collaborating with the Centre. The aim of the symposium is to provide a forum for students to share the results of their research through talks, poster sessions and discussions.
• Internships & placements, under an Individual Research Training plan. The Hartree Centre guarantees free access to all Hartree training courses and seminars for the students it supervises, and addresses the requirements for their technological skills.
• Functional skills: as presenting research results to academic and non-academic audiences is as important as the ability to produce technical and business reports and successfully publish research papers, opportunities to work in collaboration with Hartree researchers on written and verbal communication targeting diverse audiences are part of the PDP.

Thank you
Vassil Alexandrov
[email protected]