Accelerating Precision Medicine through the UK WGS initiative.

Dr Tim Cutts, Head of Scientific Computing, Wellcome Sanger Institute

Dr Sinan Yavuz, Senior Bioinformatics Scientist, Seven Bridges

Sanger as a Genome Institute

From 1 human genome to 100,000s, to every single cell in the body, to the genome of every species on the planet: DNA and RNA sequence data generation, human cell culture and imaging, supported by computational analysis and data management strategies befitting our scale.

• Ideas factory: leading large projects, long-term national and international scientific collaborations and consortia pushing new frontiers
• Take risks: substantial research studies and technical uncertainties, pushing boundaries at the scientific leading edge
• Adapt: change direction with agility to accommodate new technology and applications
• Large-scale platforms: industrial-scale technology platforms, large IT infrastructure, highly skilled and professional staff

Sanger Scientific Operations and IT

• Cellular Operations
• Animal Facility and Mouse Pipelines*
• Sequencing Facility and Sample Management
• Research & Development
• Software (LIMS) and Informatics
• Quality Assurance
• Strategic Planning, Delivery and Support

• HPC
• Research & Development
• On-premise cloud computing
• Multi-petabyte scientific data quantities
• Informatics assistance

UKB Summary

UK Government Initiative in Precision Medicine

• Large, long-term biobank cohort: improve prevention, diagnosis and treatment of serious and life-threatening illnesses
• Following health and well-being of 500,000 volunteers
• Provides anonymised health information to approved researchers
• Investigate contributions of genetic predisposition and environmental exposure to development of

Clare Bycroft et al., Nature, 11 Oct. 2018

UK Biobank Sequencing Main Phase Project

Key facts and figures:

• Follows on from Vanguard Pilot study of 50,000 whole human genomes
• £200m fund to whole-genome sequence and variant call the 450,000 remaining participants
• Build on health data to identify genomic variations and understand their contribution to health and disease
• Funding from 4 pharmaceutical companies at £25m each
• Wellcome and UKRI joint funding of £100m
• Working in partnership with deCODE (Iceland): 225,000 genomes each
• Analysis following Functional Equivalence (doi:10.1038/s41467-018-06159-4)
• Generate the equivalent of 5,000 billion pages of text
• Key milestones: Joint Variant Calling (JVC) on ~120,000 and at 450,000 genomes
• First JVC data set due to be delivered May 2020
• Sequencing completed Sept 2021
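The funding and cohort figures above can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers quoted on this slide; the function names are illustrative, and the per-genome figure is a derived rough average of the fund over the main-phase genomes, not a cost stated by the project.

```python
# Figures taken from the "Key facts and figures" bullets.
PHARMA_PARTNERS = 4
PHARMA_CONTRIBUTION_GBP = 25_000_000   # per pharmaceutical company
WELLCOME_UKRI_GBP = 100_000_000        # joint Wellcome and UKRI funding
MAIN_PHASE_GENOMES = 450_000           # remaining participants
SEQUENCING_CENTRES = 2                 # Sanger and deCODE


def total_funding_gbp() -> int:
    """Sum the pharmaceutical and Wellcome/UKRI contributions."""
    return PHARMA_PARTNERS * PHARMA_CONTRIBUTION_GBP + WELLCOME_UKRI_GBP


def genomes_per_centre() -> int:
    """Split the main-phase genomes evenly between the two centres."""
    return MAIN_PHASE_GENOMES // SEQUENCING_CENTRES


def funding_per_genome_gbp() -> float:
    """Derived average only: total fund divided by main-phase genomes."""
    return total_funding_gbp() / MAIN_PHASE_GENOMES


if __name__ == "__main__":
    print(f"Total funding: £{total_funding_gbp():,}")
    print(f"Genomes per centre: {genomes_per_centre():,}")
    print(f"Average funding per genome: £{funding_per_genome_gbp():,.2f}")
```

The contributions sum to the £200m fund, and an even split of 450,000 genomes gives the 225,000 per centre quoted for Sanger and deCODE.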

• Data will be made available by UK Biobank for analysis with their other genetic and phenotypic data sets

How have we scaled?

• Rapidly! 4 weeks to scale the operation
• Installed 7 new NovaSeqs in 2 weeks
• Installed and implemented new IT infrastructure: 2 new Lustre systems and a new OpenStack environment in a separate UKB environment, more than 600 separate IT components; 15 weeks 3 days from requisition to handover
• Upgraded Internet link to 20 Gbit/s to cope with the data flow rate
• Implemented a new data analysis pipeline in 4 weeks
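As a rough sanity check on the 20 Gbit/s link, the sketch below converts the line rate into a theoretical daily transfer ceiling and a per-genome transfer budget. It assumes decimal units (1 TB = 10^12 bytes) and ignores protocol overhead; the `utilisation` parameter is a hypothetical knob, and the 3,000 genomes a week used in the example is the sequencing rate quoted elsewhere in this deck. The slides do not state how much data is actually transferred per genome.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def link_capacity_tb_per_day(gbit_per_s: float, utilisation: float = 1.0) -> float:
    """Theoretical ceiling in TB/day for a link of the given line rate."""
    bytes_per_second = gbit_per_s * 1e9 / 8          # bits -> bytes
    return bytes_per_second * SECONDS_PER_DAY * utilisation / 1e12


def budget_gb_per_genome(gbit_per_s: float, genomes_per_week: int,
                         utilisation: float = 1.0) -> float:
    """Upper bound on GB that could be moved per genome at a weekly rate."""
    tb_per_week = link_capacity_tb_per_day(gbit_per_s, utilisation) * 7
    return tb_per_week * 1000 / genomes_per_week


if __name__ == "__main__":
    print(f"{link_capacity_tb_per_day(20):.0f} TB/day at full line rate")
    print(f"{budget_gb_per_genome(20, 3000):.0f} GB/genome ceiling at 3,000 genomes/week")
```

At full line rate this gives a ceiling of about 216 TB a day, or roughly 504 GB per genome at 3,000 genomes a week; real throughput will be lower.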

• Sequence 1,000 30X human genomes each Monday, Wednesday and Friday
• Now run 15 NovaSeq 6000s dedicated to this project
• Sanger has sequenced more DNA in the past 12 months than in the previous 25 years
• First human genome took 10 years
• Sanger is now sequencing ~3,000 human genomes a week on NovaSeq

Timeline: Vanguard starts Aug 2018; Main Phase starts Sep 2019

UKB Ultra High-Throughput Sequencing Pipeline

Stock sample QC → Cherry-picking → Library preparation → Library quantification → Pooling & normalisation → Sequencing run → Primary analysis and data transfer → Secondary analysis (Seven Bridges) → Data review

LIMS tracks samples throughout the process, recording QC values & data status
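To make the idea of per-sample tracking concrete, here is a minimal sketch of a record that moves a sample through the pipeline stages and stores QC values against the stage where they were measured. It is not Sanger's LIMS: the class names, the sample identifier and the QC metric are all hypothetical, and the stage order is read off the pipeline diagram.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Stage(IntEnum):
    """Pipeline stages, in the order shown on the pipeline diagram."""
    STOCK_SAMPLE_QC = 1
    CHERRY_PICKING = 2
    LIBRARY_PREPARATION = 3
    LIBRARY_QUANTIFICATION = 4
    POOLING_AND_NORMALISATION = 5
    SEQUENCING_RUN = 6
    PRIMARY_ANALYSIS_AND_DATA_TRANSFER = 7
    SECONDARY_ANALYSIS = 8
    DATA_REVIEW = 9


@dataclass
class SampleRecord:
    """Hypothetical tracking record for one sample."""
    sample_id: str
    stage: Stage = Stage.STOCK_SAMPLE_QC
    qc_values: dict = field(default_factory=dict)   # stage name -> {metric: value}
    history: list = field(default_factory=list)     # stages already completed

    def record_qc(self, metric: str, value: float) -> None:
        """Store a QC value against the sample's current stage."""
        self.qc_values.setdefault(self.stage.name, {})[metric] = value

    def advance(self) -> Stage:
        """Move to the next stage; the final stage cannot be advanced."""
        if self.stage is Stage.DATA_REVIEW:
            raise ValueError(f"{self.sample_id} is already at the final stage")
        self.history.append(self.stage)
        self.stage = Stage(self.stage + 1)
        return self.stage


if __name__ == "__main__":
    sample = SampleRecord("SAMPLE-0001")                 # hypothetical identifier
    sample.record_qc("concentration_ng_per_ul", 52.3)    # hypothetical metric
    sample.advance()
    print(sample.stage.name, sample.qc_values)
```

Keeping the stages in an ordered enumeration means the record can only move forward one step at a time, which mirrors the linear flow of the diagram; a production LIMS would also need to handle failures, repeats and pooled samples.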

Sanger IT Strategy

Flexible Hybrid Cloud Approach to Data Analysis: Sanger and Seven Bridges

• Sample merging and initial QC at Sanger
• Dedicated UKB MP private cloud is going live this week: 3,000 cores and 5.4 PB of Lustre storage dedicated to UKB
• Further data processing with Seven Bridges
  • sample QC
  • single variant calling
  • data submission to UKB
• Runs on Google Compute Platform
• Advantages:
  • Continuity with Vanguard
  • Reduced capital lead time

Compute Environment

• OpenStack private cloud
• 12,000 vCPUs
• 100 TB RAM
• 5.5 PB object storage
• 100 Gbit software-defined network
• >130,000 VMs run in the first year

Used for Vanguard and early stages of main phase while dedicated capacity built

UKB WGS Data Flow

Multiple Teams Involved

• COO
• SciOps Senior Management
• Director’s Office
• Sequencing Operations
• R&D
• Informatics
• IT
• Legal and Ethics
• Finance
• Procurement
• Communications
• Stores
• Health and Safety
• Scientific Administration

Further information

Wellcome Sanger Institute

• Director of Scientific Operations: Cordelia Langford
• Head of Sequencing Operations: Ian Johnston
• Head of Scientific Computing: Tim Cutts

Seven Bridges

• Dr Sinan Yavuz