STFC Scientific Computing Department

Juan Bicarregui Head of Data Division

PLaN-E meeting Copenhagen 9 April 2015

1 Overview

• Science and Technology Facilities Council – Scientific Computing Department Applications Division Systems Division Technology Division Data Division

2 What is STFC?

250m Programme includes:

• Neutron and Muon Source Square• Synchrotron Kilometre Radiation Array Source Large Hadron Collider • Lasers • Space Science • Particle Physics • Compuing and Data Management • Microstructures • Nuclear Physics • Radio Communications

Daresbury Laboratory ESRF & ILL, Grenoble The National Laboratories Overview

• Science and Technology Facilities Council – Scientific Computing Department Applications Division Systems Division Technology Division Data Division

5 Scientific Computing

Established 1st April 2012 from e-Science and Computational Science and Engineering

180 staff supporting over 7500 users • Scientific Applications development and support • Compute and data facilities and services • Systems administration, data services, • high-performance computing, • numerical analysis & software engineering. > 100 publications pa > 3500 training days pa

Maximising the impact of Scientific Computing Big Data for Big Science Scientific Computing Department

Acting Director

4 Divisions: David Corney Group Leader Group Leader Martyn Winn Biology and Life Sciences Shirley Miller DL Administration Hannes Loeffler Eugene Krissinel Ville Uski David Waterman Marcin Wojdyr Charles Ballard Ronan Keegan Narayanan Krishnan Chris Morris Andrey Lebedev Chang Sik Kim Agnel Joseph Chris Wood Valeria Losasso Tom Burnley James Gebbie Christelle Gendrin Laura Johnston Esme Williams Damian Jones

Dave Emerson RAL Division Engineering and Administration Environment Head Benzi John Charles Moulinec Rob Barber Xiaojun Gu Stefano Rolfo Malgorzata Zimon Yin Yue Karen McIntyre Jean Pearce Carol Malpass

Barbara Montanari Applications Theoretical and Finance Paul Sherwood Computational Physics CORE Applications Keith Refson Leonardo Leon Petit Martin Lueders Martin Plummer Dominik Jochym Simone Sturniolo Nic Harrison

Bernasconi Tracey Kelly Jenny Williams

APPLICATIONS ACTIVITIES TBC Computational Chemistry CECAM

Ilian Todorov John Purton David Gunn Laurence Ellison Richard Anderson Micheal Seaton Sebastian Metz Thomas Keal Henry Boateng Chin Yong Dawn Geatches David Bray Paul Durham Leon Petit

Group Leader Chadwick and RAL Group LeaderLibraries Jens Jensen Division Data Services Debbie Franks Head Laboratory Brian Davies Matt Viljoen Bruno Canning Shaun de Witt Christopher Prosser Kashyap Manjusha Carmine Cioffi Roger Downing Kevin O'Neill Juan Sierra Richard Blake Rob Appleyard Cheney Ketley Mark Swaisland

DATA Brian Matthews Data Research Data Rutherford Appleton Laboratory Juan Bicarregui

Data Steve Fisher Alistair Mills Shirley Crompton Kevin Phipps Antony Wilson Vasily Bunakov Simon Lambert Catherine Jones Alastair Duncan Brian Ritchie Wayne Chung Erica Yang Holly Zhen Denise Small Kunal Arora Thomas Kirkham Tracy Colborne Maureen Linda Gilbert Williamson

Group Leader

Dave Cable ActingHigh Performance Systems Division Tim Franks Viliam Kalavsky Colin Morey Tavia Stone Stephen Hill Don Parkin Sarah Steele Karl Richardson Head Nick Hill Research Infrastructure Andrew Sansum Jonathan Churchill Suleman Tariq Ian Johnson Derek Ross Mohit Mittal Sam Worley Dave Meredith Claire Devereux John Kewley Cristina del Cano Stuart Pullinger Ahmed Sajid Matt Langthorpe John Gordon Adrian Coveney Alison Packer Systems Novales

SYSTEMS Systems Andrew Sansum Peta Scale Computing and Storage

Martin Bly Tim Folkes Kashif Hafeez James Adams Dimitrios Zilaskos Ian Collier Catalin Condurache Gareth Smith John Kelly Tiju Idiculla Rob Harper Emile Youmbi Alistair Dewhurst Andrew Lahiff Mbuenmo PPD PPD

Division Group Leader Head

Jennifer Scott Numerical Analysis Michael Gleaves Adrian Toland Terry Hewitt Lee Hannis David Moss Angela Walsh Iain Duff Nick Gould Jonathan Hogg Tyrone Rees Sue Thorne Evgueni Cliff Brereton Ovtchinnikov

CENTRE HARTREE Chris Greenough Division Software Engineering Alan Kyffin Gemma Poulter

Technology Head TES

Frazer Barnsley Greg Corbett George Ryall GRADUA

Mike Ashworth Stephen Pickles Lucian Anton Xiaohu Guo Andrew Porter Andrew Sunderland Rupert Ford Bill Smith Coralia Cartis Peter Oliver Application Performance Liam Jones OCF Dziidka Szotek Jack Dongarra Jeremy Appleyard Technology Engineering

Valerie Burke Kirk Jordan NVIDIA

TS RY RY Paul Kummer TS Stanko Tmoic

Bob McMeeking

TECHNOLOGY

SCIENTIS

VISITORS

HONORA VISITING VISITING John Reid SCIENTIS Jonathan Follows Graham Riley Rob Allan Vendel Szeremi Mark Mawson Paul Strange

Hartree Centre Martin Turner Visualisation

Barry Searle Ronald Fowler Srikanth Nagella September 2014 Overview

• Science and Technology Facilities Council – Scientific Computing Department Applications Division Systems Division Technology Division Data Division

8 Applications Division

• Four groups • Developing and applying computational science software packages • Physical and biological sciences. – Computational Biology (Martyn Winn)  including structural biology, molecular simulation and bioinformatics – Theoretical and Computational Physics (Barbara Montanari),  electronic structure of the solid state and surfaces, atomic and molecular physics – Computational Engineering (David Emerson)  HPC solutions in fluid flow modelling, with particular strength in turbulence and microfluidics – Computational Chemistry (Ilian Todorov)  molecular dynamics, quantum chemistry and QM/MM techniques, and mesoscale methods

9 Research Software in SCD

Time

Seconds

Microseconds

Nanoseconds

Picoseconds

Femtoseconds Distance Nanometer Micrometer Meter

CASTEP DL_POLY DL_MESO etc. Software CRYSTAL Collaborative Computational Projects

CCP4 Prof David Brown Macromolecular Crystallography

CCP5 Prof Stephen Parker The Computer Simulation of Condensed Phases

CCP9 Prof Mike Payne Computational Electronic Structure of Condensed Matter

CCP12 Prof Stewart Cant High Performance Computing in Engineering

CCP-ASEArch Prof Mike Giles Algorithms and Software for Emerging Architectures

CCP-BioSim Prof Adrian Mulholland Biomolecular Simulation at the Life Sciences Interface

CCP-EM Dr Martyn Winn Electron Cryo-Microscopy

CCPi Prof Phillip Withers Tomographic Imaging

CCPN Prof Geerten Vuister NMR

CCP-NC Dr Jonathan Yates NMR Crystallography CCPP Dr Tony Arber Computational Plasma Physics CCPQ * Prof Tania Monteiro Quantum Dynamics in Atomic, Molecular and Optical Physics

CCP-SAS Prof Steve Perkins Structural Data in Chemical Biology and Soft Condensed Matter

CCPForge Prof Chris Greenough Collaborative Software Development Environment Tool11 Overview

• Science and Technology Facilities Council – Scientific Computing Department Applications Division Systems Division Technology Division Data Division

12 UK Tier-1 for the LHC

Hunting the Higgs:

• Low cost commodity hardware

• Open source middleware

• Remote job submission (Grid) • 14 PB disk • 16 PB tape • >99% availability • 10,000 CPU cores • >50 petabytes/year moved globally • 2000 servers • >300 petabytes/year moved internally • 40Gb network Run 2 (doubles data rates and volumes) • 10Gb/s direct optical link to CERN JASMIN

Predictive environmental science (NERC)

JASMIN is a world leading, unique hybrid of: • 16PB high performance storage (~250GByte/s) • High-performance computing (~4,000 cores) • Non-blocking Networking (> 3Tbit/sec), and Optical Private Network WAN’s • Coupled with cloud hosting capabilities

• JASMIN Holds >60% of Data used by the latest IPCC report on Climate change. Emerald Large scale GPU resource National Service Computational Chemistry Software £1M purchased in 2012 Compute, Training and support Oxford, Southampton, Bristol, and UCL SGI Altix UV SMP system Universities 512 CPUs, Diamond Tomography 4TB shared memory Wide range of scientific usage from drug ~70 peer reviewed papers discovery to climate modelling: per year Over 40 applications installed 30 25 20 15 Bristol 10 5 0 Oxford

Southa mpton

Data Archive for BBSRC

Data Support Service for MRC Overview

• Science and Technology Facilities Council – Scientific Computing Department Applications Division Systems Division Technology Division Data Division

16 Software Engineering, Algorithms and Optimisation

Numerical Analysis – Maths Research, Sparse Linear Algebra and Optimisation Software Engineering – Software Engineering Tools and Services, Agent Modelling Application Performance Engineering • Performance tuning and optimisation of HPC Applications, Novel Architectures Visualisation

– Leading edge visualisation facilities – Tomographic Imaging – Collaboration spaces – IMAT Neutron Imaging at ISIS – offering time-of-flight tomography- driven diffraction Overview

• Science and Technology Facilities Council – Scientific Computing Department Applications Division Systems Division Technology Division Data Division

19 Building an Open Data Infrastructure for Research

From Policy to Practice The Innovation Lifecycle

Enabling Enabling Wealth Strategic Knowledge Creation Direction Creation

The The Research Government The Body of Process Process Knowledge

Improved Improved Quality Understanding Quality of Life Assessment

Aggregation of Knowledge lies at the heart of the innovation lifecycle Data centric view of research

Virtual Research the researcher acts Environment through ingest and access Archival

Creation Data Access

Curation Services the researcher shouldn’t have to Information Network Infrastructure worry about the information infrastructureStorage Compute Data Sharing Vision Single Infrastructure  Single User Experience

Raw Data Data Analysed Data Publication Data Publications DifferentCatalogue InfrastructuresAnalysis CatalogueDifferent UserCatalogue ExperiencesCatalogue

Raw Data Analysed Publication Data Publications Analysis Data Data Experiment 1

Analysed Publication Raw Data Data Publications Analysis Data Data Observation 2

Analysed Publication Raw Data Data Publications Analysis Data Data Simulation 3 Capacity Software Data Publications Storage Repositories Repositories Repositories

PaN-Data Infrastructure for Photon and Neutron Sources The 7 C’s

Collaboration Communication

Creation Archival Collection

CreationDataAccess

Curation Services Network Storage Compute

Capacity Curation

Computation Creation

Linked systems for:

• Proposal submission • User management • Data acquisition

Metadata carried from each system to the next

Detectors moving from Hz to KHz, towards MHz,...

Examining the detectors on MAPS instrument on ISIS Capacity 10 x Moore’s Law (2 years) 2 x Moore’s Law (1.5 years) Total Data Stored (TeraBytes) 20PB 10.000 Moore’s law for us } 1.000 is about 15 months

100 Currently store about

20 PetaBytes of data 10

1

2012

Moore’s Law x1000 in 13 years Doubling every 1.3 years Computation

Fitting of experimental data to model Compute intensive components on HPC (BlueGene/Q,1.2 Pflop,#13; Emerald GPGPU cluster at RAL)

Real-time diagnostics of Computational derivation of properties from instrument performance and data flow pipeline. theory Curation

The StorageTek RAL Facility Archives tape robots • All ISIS data (~25 years) > 3,000,000 files 100PB • All Diamond Data (~5 years) > 100,000,000 files Capacity LHC Tier 1 • UK hub for LHC data (30PB) Other UK Research Funders • NERC JASMIN+CEMS super data cluster • BBSRC Institutes data archive • MRC Data Support Service Universities • Imperial College - National Service for Computational Chemistry Software • Oxford, UCL, Southampton Bristol, Emerald GPGPU cluster Publications: • The STFC Publications Archive JASMIN Collection

Record Publication Proposal

Subsequent Approval publication Data registered with Scientist analysis facility submits application for Scheduling Data beamtime Experiment cleansing

Tools for processing made Facility committee available approves application Raw data filtered Scientists visits, and cleansed Facility registers, facility run’s trains, and experiment schedules scientist’s visit Communication

Immense Expectations !

 Web enables: – access to everything  Everything on-demand

 Interlinking enables: – Validation of results – Repetition of experiment STFC’s “e-pubs” Institutional  Discovery enables: Repository has records of – new knowledge from old 30,000 publications spanning 25 years The 7 C’s

Collaboration Communication

Creation Archival Collection

CreationDataAccess

Curation Services Network Storage Compute

Capacity Curation

Computation Overview

• Science and Technology Facilities Council – Scientific Computing Department Applications Division Systems Division Technology Division Data Division

32 Latest News

9 new projects in E-INFRA and FET

EINFRA • EUDAT 2020 • RDA 3 • NFFA • …

FET • NLAFET • SAGE • …

33

Usable

Bio

Sci

Sci

Medicine

Social Social

Humanities

Environment Physical Questions? STFC Scientific Computing Department

Juan Bicarregui Head of Data Division

PLaN-E meeting Copenhagen 9 April 2015

36