STFC Scientific Computing Department
Juan Bicarregui Head of Data Division
PLaN-E meeting Copenhagen 9 April 2015
1 Overview
• Science and Technology Facilities Council – Scientific Computing Department Applications Division Systems Division Technology Division Data Division
2 What is STFC?
250m Programme includes:
• Neutron and Muon Source Square• Synchrotron Kilometre Radiation Array Source Large Hadron Collider • Lasers • Space Science • Particle Physics • Compuing and Data Management • Microstructures • Nuclear Physics • Radio Communications
Daresbury Laboratory ESRF & ILL, Grenoble The National Laboratories Overview
• Science and Technology Facilities Council – Scientific Computing Department Applications Division Systems Division Technology Division Data Division
5 Scientific Computing
Established 1st April 2012 from e-Science and Computational Science and Engineering
180 staff supporting over 7500 users • Scientific Applications development and support • Compute and data facilities and services • Systems administration, data services, • high-performance computing, • numerical analysis & software engineering. > 100 publications pa > 3500 training days pa
Maximising the impact of Scientific Computing Big Data for Big Science Scientific Computing Department
Acting Director
4 Divisions: David Corney Group Leader Group Leader Martyn Winn Biology and Life Sciences Shirley Miller DL Administration Hannes Loeffler Eugene Krissinel Ville Uski David Waterman Marcin Wojdyr Charles Ballard Ronan Keegan Narayanan Krishnan Chris Morris Andrey Lebedev Chang Sik Kim Agnel Joseph Chris Wood Valeria Losasso Tom Burnley James Gebbie Christelle Gendrin Laura Johnston Esme Williams Damian Jones
Dave Emerson RAL Division Engineering and Administration Environment Head Benzi John Charles Moulinec Rob Barber Xiaojun Gu Stefano Rolfo Malgorzata Zimon Yin Yue Karen McIntyre Jean Pearce Carol Malpass
Barbara Montanari Applications Theoretical and Finance Paul Sherwood Computational Physics CORE Applications Keith Refson Leonardo Leon Petit Martin Lueders Martin Plummer Dominik Jochym Simone Sturniolo Nic Harrison
Bernasconi Tracey Kelly Jenny Williams
APPLICATIONS ACTIVITIES TBC Computational Chemistry CECAM
Ilian Todorov John Purton David Gunn Laurence Ellison Richard Anderson Micheal Seaton Sebastian Metz Thomas Keal Henry Boateng Chin Yong Dawn Geatches David Bray Paul Durham Leon Petit
Group Leader Chadwick and RAL Group LeaderLibraries Jens Jensen Division Data Services Debbie Franks Head Daresbury Laboratory Brian Davies Matt Viljoen Bruno Canning Shaun de Witt Christopher Prosser Kashyap Manjusha Carmine Cioffi Roger Downing Kevin O'Neill Juan Sierra Richard Blake Rob Appleyard Cheney Ketley Mark Swaisland
DATA Brian Matthews Data Research Data Rutherford Appleton Laboratory Juan Bicarregui
Data Steve Fisher Alistair Mills Shirley Crompton Kevin Phipps Antony Wilson Vasily Bunakov Simon Lambert Catherine Jones Alastair Duncan Brian Ritchie Wayne Chung Erica Yang Holly Zhen Denise Small Kunal Arora Thomas Kirkham Tracy Colborne Maureen Linda Gilbert Williamson
Group Leader
Dave Cable ActingHigh Performance Systems Division Tim Franks Viliam Kalavsky Colin Morey Tavia Stone Stephen Hill Don Parkin Sarah Steele Karl Richardson Head Nick Hill Research Infrastructure Andrew Sansum Jonathan Churchill Suleman Tariq Ian Johnson Derek Ross Mohit Mittal Sam Worley Dave Meredith Claire Devereux John Kewley Cristina del Cano Stuart Pullinger Ahmed Sajid Matt Langthorpe John Gordon Adrian Coveney Alison Packer Systems Novales
SYSTEMS Systems Andrew Sansum Peta Scale Computing and Storage
Martin Bly Tim Folkes Kashif Hafeez James Adams Dimitrios Zilaskos Ian Collier Catalin Condurache Gareth Smith John Kelly Tiju Idiculla Rob Harper Emile Youmbi Alistair Dewhurst Andrew Lahiff Mbuenmo PPD PPD
Division Group Leader Head
Jennifer Scott Numerical Analysis Michael Gleaves Adrian Toland Terry Hewitt Lee Hannis David Moss Angela Walsh Iain Duff Nick Gould Jonathan Hogg Tyrone Rees Sue Thorne Evgueni Cliff Brereton Ovtchinnikov
CENTRE Hartree Centre HARTREE Chris Greenough Division Software Engineering Alan Kyffin Gemma Poulter
Technology Head TES
Frazer Barnsley Greg Corbett George Ryall GRADUA
Mike Ashworth Stephen Pickles Lucian Anton Xiaohu Guo Andrew Porter Andrew Sunderland Rupert Ford Bill Smith Coralia Cartis Peter Oliver Application Performance Liam Jones OCF Dziidka Szotek Jack Dongarra Jeremy Appleyard Technology Engineering
Valerie Burke Kirk Jordan NVIDIA
TS RY RY Paul Kummer TS Stanko Tmoic
Bob McMeeking
TECHNOLOGY
SCIENTIS
VISITORS
HONORA VISITING VISITING John Reid SCIENTIS Jonathan Follows Graham Riley Rob Allan Vendel Szeremi Mark Mawson Paul Strange
Hartree Centre Martin Turner Visualisation
Barry Searle Ronald Fowler Srikanth Nagella September 2014 Overview
• Science and Technology Facilities Council – Scientific Computing Department Applications Division Systems Division Technology Division Data Division
8 Applications Division
• Four groups • Developing and applying computational science software packages • Physical and biological sciences. – Computational Biology (Martyn Winn) including structural biology, molecular simulation and bioinformatics – Theoretical and Computational Physics (Barbara Montanari), electronic structure of the solid state and surfaces, atomic and molecular physics – Computational Engineering (David Emerson) HPC solutions in fluid flow modelling, with particular strength in turbulence and microfluidics – Computational Chemistry (Ilian Todorov) molecular dynamics, quantum chemistry and QM/MM techniques, and mesoscale methods
9 Research Software in SCD
Time
Seconds
Microseconds
Nanoseconds
Picoseconds
Femtoseconds Distance Nanometer Micrometer Meter
CASTEP DL_POLY DL_MESO etc. Software CRYSTAL Collaborative Computational Projects
CCP4 Prof David Brown Macromolecular Crystallography
CCP5 Prof Stephen Parker The Computer Simulation of Condensed Phases
CCP9 Prof Mike Payne Computational Electronic Structure of Condensed Matter
CCP12 Prof Stewart Cant High Performance Computing in Engineering
CCP-ASEArch Prof Mike Giles Algorithms and Software for Emerging Architectures
CCP-BioSim Prof Adrian Mulholland Biomolecular Simulation at the Life Sciences Interface
CCP-EM Dr Martyn Winn Electron Cryo-Microscopy
CCPi Prof Phillip Withers Tomographic Imaging
CCPN Prof Geerten Vuister NMR
CCP-NC Dr Jonathan Yates NMR Crystallography CCPP Dr Tony Arber Computational Plasma Physics CCPQ * Prof Tania Monteiro Quantum Dynamics in Atomic, Molecular and Optical Physics
CCP-SAS Prof Steve Perkins Structural Data in Chemical Biology and Soft Condensed Matter
CCPForge Prof Chris Greenough Collaborative Software Development Environment Tool11 Overview
• Science and Technology Facilities Council – Scientific Computing Department Applications Division Systems Division Technology Division Data Division
12 UK Tier-1 for the LHC
Hunting the Higgs:
• Low cost commodity hardware
• Open source middleware
• Remote job submission (Grid) • 14 PB disk • 16 PB tape • >99% availability • 10,000 CPU cores • >50 petabytes/year moved globally • 2000 servers • >300 petabytes/year moved internally • 40Gb network Run 2 (doubles data rates and volumes) • 10Gb/s direct optical link to CERN JASMIN
Predictive environmental science (NERC)
JASMIN is a world leading, unique hybrid of: • 16PB high performance storage (~250GByte/s) • High-performance computing (~4,000 cores) • Non-blocking Networking (> 3Tbit/sec), and Optical Private Network WAN’s • Coupled with cloud hosting capabilities
• JASMIN Holds >60% of Data used by the latest IPCC report on Climate change. Emerald Large scale GPU resource National Service Computational Chemistry Software £1M purchased in 2012 Compute, Training and support Oxford, Southampton, Bristol, and UCL SGI Altix UV SMP system Universities 512 CPUs, Diamond Tomography 4TB shared memory Wide range of scientific usage from drug ~70 peer reviewed papers discovery to climate modelling: per year Over 40 applications installed 30 25 20 15 Bristol 10 5 0 Oxford
Southa mpton
Data Archive for BBSRC
Data Support Service for MRC Overview
• Science and Technology Facilities Council – Scientific Computing Department Applications Division Systems Division Technology Division Data Division
16 Software Engineering, Algorithms and Optimisation
Numerical Analysis – Maths Research, Sparse Linear Algebra and Optimisation Software Engineering – Software Engineering Tools and Services, Agent Modelling Application Performance Engineering • Performance tuning and optimisation of HPC Applications, Novel Architectures Visualisation
– Leading edge visualisation facilities – Tomographic Imaging – Collaboration spaces – IMAT Neutron Imaging at ISIS – offering time-of-flight tomography- driven diffraction Overview
• Science and Technology Facilities Council – Scientific Computing Department Applications Division Systems Division Technology Division Data Division
19 Building an Open Data Infrastructure for Research
From Policy to Practice The Innovation Lifecycle
Enabling Enabling Wealth Strategic Knowledge Creation Direction Creation
The The Research Government The Body of Process Process Knowledge
Improved Improved Quality Understanding Quality of Life Assessment
Aggregation of Knowledge lies at the heart of the innovation lifecycle Data centric view of research
Virtual Research the researcher acts Environment through ingest and access Archival
Creation Data Access
Curation Services the researcher shouldn’t have to Information Network Infrastructure worry about the information infrastructureStorage Compute Data Sharing Vision Single Infrastructure Single User Experience
Raw Data Data Analysed Data Publication Data Publications DifferentCatalogue InfrastructuresAnalysis CatalogueDifferent UserCatalogue ExperiencesCatalogue
Raw Data Analysed Publication Data Publications Analysis Data Data Experiment 1
Analysed Publication Raw Data Data Publications Analysis Data Data Observation 2
Analysed Publication Raw Data Data Publications Analysis Data Data Simulation 3 Capacity Software Data Publications Storage Repositories Repositories Repositories
PaN-Data Infrastructure for Photon and Neutron Sources The 7 C’s
Collaboration Communication
Creation Archival Collection
CreationDataAccess
Curation Services Network Storage Compute
Capacity Curation
Computation Creation
Linked systems for:
• Proposal submission • User management • Data acquisition
Metadata carried from each system to the next
Detectors moving from Hz to KHz, towards MHz,...
Examining the detectors on MAPS instrument on ISIS Capacity 10 x Moore’s Law (2 years) 2 x Moore’s Law (1.5 years) Total Data Stored (TeraBytes) 20PB 10.000 Moore’s law for us } 1.000 is about 15 months
100 Currently store about
20 PetaBytes of data 10
1
2012
Moore’s Law x1000 in 13 years Doubling every 1.3 years Computation
Fitting of experimental data to model Compute intensive components on HPC (BlueGene/Q,1.2 Pflop,#13; Emerald GPGPU cluster at RAL)
Real-time diagnostics of Computational derivation of properties from instrument performance and data flow pipeline. theory Curation
The StorageTek RAL Facility Archives tape robots • All ISIS data (~25 years) > 3,000,000 files 100PB • All Diamond Data (~5 years) > 100,000,000 files Capacity LHC Tier 1 • UK hub for LHC data (30PB) Other UK Research Funders • NERC JASMIN+CEMS super data cluster • BBSRC Institutes data archive • MRC Data Support Service Universities • Imperial College - National Service for Computational Chemistry Software • Oxford, UCL, Southampton Bristol, Emerald GPGPU cluster Publications: • The STFC Publications Archive JASMIN Collection
Record Publication Proposal
Subsequent Approval publication Data registered with Scientist analysis facility submits application for Scheduling Data beamtime Experiment cleansing
Tools for processing made Facility committee available approves application Raw data filtered Scientists visits, and cleansed Facility registers, facility run’s trains, and experiment schedules scientist’s visit Communication
Immense Expectations !
Web enables: – access to everything Everything on-demand
Interlinking enables: – Validation of results – Repetition of experiment STFC’s “e-pubs” Institutional Discovery enables: Repository has records of – new knowledge from old 30,000 publications spanning 25 years The 7 C’s
Collaboration Communication
Creation Archival Collection
CreationDataAccess
Curation Services Network Storage Compute
Capacity Curation
Computation Overview
• Science and Technology Facilities Council – Scientific Computing Department Applications Division Systems Division Technology Division Data Division
32 Latest News
9 new projects in E-INFRA and FET
EINFRA • EUDAT 2020 • RDA 3 • NFFA • …
FET • NLAFET • SAGE • …
33
Usable
Bio
Sci
Sci
Medicine
Social Social
Humanities
Environment Physical Questions? STFC Scientific Computing Department
Juan Bicarregui Head of Data Division
PLaN-E meeting Copenhagen 9 April 2015
36