Sanger Institute Highlights 2016/17

Contents

What we do
Director’s introduction
Institute in numbers

Our work
Cancer, Ageing and Somatic Mutation
Cellular Genetics
Human Genetics
Infection Genomics
Malaria

Our approach
Scale
Innovation
Culture
Influence
Connections

Other information
Institute information
Image credits

sanger.ac.uk

Wellcome Trust Sanger Institute Highlights 2016/17

Director’s introduction

Through our ability to conduct research at scale, we are able to engage in bold and long-term exploratory projects that are designed to influence and empower medical science globally

Today at the Sanger Institute genomic scientists face unparalleled opportunities – and challenges – offered by the confluence of powerful new technologies, paradigm shifts in understanding, daring scientific ambition and global cooperation.

More than two decades ago the first reference human genome was born from the scientific community coalescing to pool knowledge, technologies and funding at a scale never before seen. The result was truly historic – the publication of humankind’s ‘book of life’. But, just as in nature, this sequence is not static: continual challenge has developed and refined it many times. The same is true of the Sanger Institute.

Running through this Institute Highlights are two entwined narratives – the ever-changing nature of genomes resulting from interplay between organisms and their environment, and our evolving research facilitated by global dialogue between scientists, data sources and institutions.

The dialogue between organisms and their environment is continually shaping human and pathogen genomes, creating a flux that impacts every aspect of disease and health. From the rise of drug resistance in malaria to finding the drivers of cancer, and from the sources of rare developmental disorders to healthy bacterial mixes in the gut microbiome, genomic research plays a pivotal role in understanding health and disease.

The research outlined within the following pages is providing new knowledge and insight to inform approaches to diagnosis, treatment, disease prevention and health promotion.

The only way to explore such a vast world of connections is through inclusive and equitable research dialogue: between scientists and institutes, industry and academe, investigators and healthcare, and researchers and governments. This Institute was established to facilitate such collaborative working – first as part of the Human Genome Project and now by laying the foundations for sustainable global networks for genomic research.

Mike Stratton, Director
Wellcome Trust Sanger Institute

2016 Timeline

Biogen joined pioneering Open Targets Centre. See page 40
Aboriginal Australian Y chromosomes show 50,000 years of independence. See page 23
European Bank of induced pluripotent Stem Cells opened. See page 33
All Party Parliamentary Group on Personalised Medicine launched. See page 38
Gut’s bacterial dark matter catalogued. See page 27
Five new breast cancer genes found. See page 12
Superbug tracked across Europe. See page 27
GA4GH presents vision for genomic and clinical data sharing. See page 39
Acute Myeloid Leukaemia is at least 11 different diseases. See page 13
Cancer cell lines good predictor of drug response. See page 14
2016-2021 Sanger Institute research quinquennium started. See page 8
Human Cell Atlas Initiative launched. See page 17
Human Cancer Model Initiative launched. See page 33
Genetic marker of malaria treatment failure found in Cambodia. See page 28
Roots of childhood congenital heart disease revealed. See page 23
Landmark BLUEPRINT papers published on blood cell development and function. See page 22
Genetic damage of regular cigarette smoking quantified. See page 11
Infection control practices changed by genomic discovery. See page 24
UK Prime Minister opens new Sequencing Facility and Biodata Innovation Centre. See page 32
Sanger spin-out – Kymab – secures $100m. See page 35
Gut microbiome spin-out company launched – Microbiotica. See page 35

2016 at a glance

All facts and figures gathered in December 2016

Sequencing centre
We read the equivalent of one gold-standard human genome every 35 mins
Sequencing centre produces the equivalent of 41.7 gold-standard human genomes a day
Sequencing centre outputs approx. 3,750bn DNA bases a day
A gold standard read of a genome is 30x
The Human genome is approx. 3bn bases long

Data centre
Usable storage in the Data centre: 42PB
Number of compute cores: 22,000
Number of compute jobs: 3m
Electricity used by the Data centre: 1.4 MW

Websites
Page views of Sanger Institute-run websites: 44m

Cellular Generation and Phenotyping
2,418 unique cell lines were cultured in 2016.
Cell lines shared with partner organisations: EBiSC (European Bank for induced pluripotent Stem Cells), ECACC (European Collection of Authenticated Cell Cultures), King’s College London and University of Dundee.
ECACC and EBiSC act as distribution hubs for our induced pluripotent stem cell lines and send them to researchers around the world.

Publications
Institute publications in 2016, with papers in Cell, Nature, Nature Genetics, Science, The Lancet and the New England Journal of Medicine.
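The sequencing figures in this panel can be cross-checked against one another with simple arithmetic. The short Python sketch below is illustrative only: the constant and function names are hypothetical, and it uses just three quoted figures (daily base output, genome length and 30x coverage) to estimate genome equivalents per day and the time taken per genome.

```python
# Figures quoted in the "2016 at a glance" panel (gathered December 2016).
GENOME_LENGTH_BASES = 3e9      # the human genome is approx. 3bn bases long
GOLD_STANDARD_COVERAGE = 30    # a gold-standard read of a genome is 30x
BASES_PER_DAY = 3_750e9        # sequencing output, approx. 3,750bn bases a day


def genomes_per_day(bases_per_day, genome_length, coverage):
    """Number of gold-standard genome equivalents produced each day."""
    return bases_per_day / (genome_length * coverage)


def minutes_per_genome(genomes_each_day):
    """Average time, in minutes, to produce one genome equivalent."""
    return 24 * 60 / genomes_each_day


per_day = genomes_per_day(BASES_PER_DAY, GENOME_LENGTH_BASES, GOLD_STANDARD_COVERAGE)
minutes = minutes_per_genome(per_day)

print(f"{per_day:.1f} gold-standard genome equivalents a day")
print(f"one genome equivalent every {minutes:.1f} minutes")
```

Under these figures the estimate comes to roughly 42 genome equivalents a day, or one about every 35 minutes, which agrees with the "every 35 mins" statement in the panel.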

Our work

With secured funding from Wellcome for the next five years we aim to focus our work in five key research programmes

Cancer, Ageing and Somatic Mutation Programme
Provides leadership in data aggregation and informatics innovation, develops high-throughput cellular models of cancer for genome-wide functional screens and drug testing, and explores somatic mutation’s role in clonal evolution, ageing and development.

Cellular Genetics Programme
Explores human gene function by studying the impact of genome variation on cell biology. Large-scale systematic screens are used to discover the impact of naturally-occurring and engineered genome mutations in human iPS cells, their differentiated derivatives, and other cell types.

Human Genetics Programme
Applies genomics to population-scale studies to identify the causal variants and pathways involved in human disease and their effects on cell biology. It also models developmental disorders to explore which physical aspects might be reversible.

Infection Genomics Programme
Investigates the common underpinning mechanisms of evolution, infection and resistance to therapy in viruses, bacteria and parasites. It also explores the genetics of host response to infection and the role of the microbiota in health and disease.

Malaria Programme
Integrates genomic, genetic and proteomic approaches to develop and enhance high-throughput tools and technologies to study specific biological problems relevant for malaria control and to understand the fundamental science of the human host, the mosquito vector and the Plasmodium pathogen.

212m cases of malaria worldwide in 2015

Cancer, Ageing and Somatic Mutation Programme

Cracking cancer’s conundrums

The Cancer, Ageing and Somatic Mutation Programme has sequenced and analysed cancer genomes at scale to reveal complexity and surprising commonality. Sanger researchers have begun to tease apart the multiple mechanisms disrupting the genome and driving cancer – offering insights that could inform new diagnostics and treatments.

For more than a decade, Sanger researchers have been at the forefront of large-scale cancer genome projects that have generated an ever-lengthening catalogue of genes linked to various cancers. These efforts have revealed a daunting diversity and complexity to cancer’s genomic landscape: mutations in many genes can cause cancer, particular types of cancer can have multiple genetic causes, and cancers are constantly evolving within individual patients.

To meet these challenges, the Cancer, Ageing and Somatic Mutation Programme employs a range of scientific approaches. At the international level the Programme leads and contributes to many worldwide collaborations that generate, share and interpret cancer genome data. Sanger scientists pioneer innovative methods of data analysis and pattern recognition to provide important new insights into cancer cell biology. At the laboratory bench Sanger research groups work with international partners to develop novel cancer cell models that could transform in vitro studies of cancer and the development of therapies.

“All cancers are due to mutations that occur in all of us in the DNA of our cells during the course of our lifetimes. Finding these mutations is crucial to understanding the causes of cancer and to developing improved therapies.”
Professor Sir Mike Stratton
Director of the Sanger Institute

In this section
Mapping DNA damage in the body
Hidden root of skin cancer found
Breast cancer: the next step
Genomic microscope gives better diagnoses
Cell lines point to best drugs
Cancer in 3D: organoids

Mutational signatures: new insights into DNA damage

In recent years, Sanger researchers have made great progress in understanding not only what has gone wrong in cancer cells, but also how. By analysing the altered genetic landscapes of many thousands of cancers, they have identified dozens of distinctive patterns of DNA damage. Each of these ‘mutational signatures’ is the result of a distinct DNA-damaging process. This work is shedding light on the molecular mechanisms that compromise the integrity of the genome – knowledge that could underpin new ways to treat or prevent cancer.

In 2010, Sanger researchers identified distinctive genetic changes in the genome of a smoker’s lung cancer, revealing how the constituent chemicals in tobacco smoke damage DNA in different ways. Extending this approach to other forms of cancer, Sanger scientists systematically analysed data from multiple cancer genomes, identifying a wide range of mutational signatures linked to different DNA-damaging processes. For example, some types of mutation are linked to abnormal activation of an antiviral defence mechanism (the APOBEC system), while others seem to reflect the action of a cellular ‘clock-like’ mechanism, leading to the steady accumulation of mutations over time.

Signatures show extent of smoking’s genetic damage

In 2016 Sanger scientists returned to smoking-associated cancer by leading a study of more than 5,000 genomes of smoking-linked cancers. Computational analysis, published in Science, revealed that smoking is associated with multiple mutational signatures, suggesting that tobacco smoke damages DNA and drives cancer through a diversity of mechanisms. [1]

One common type of damage is chemical modification of DNA bases by compounds in tobacco smoke, which was seen only in cancers of tissues exposed to tobacco smoke, such as lung, larynx and mouth. In cancers of tissues not directly exposed to tobacco smoke, such as pancreas and kidney, smoking affected other cellular processes that introduce mutations into DNA, including APOBEC and clock-like mechanisms. The analysis also revealed that, on average, smokers consuming a pack of cigarettes a day accumulated an extra 150 mutations in every lung cell each year, 23 mutations in each mouth cell and six mutations in each liver cell.

Like tobacco smoke, ionising radiation can damage DNA and increase the risk of cancer. Each year, a small number of patients receiving radiotherapy develop cancers because they are exposed to radiation. In work published in Nature Communications, Sanger scientists compared the genomes of 12 such radiation-induced cancers with those not linked to radiation exposure, identifying two mutational signatures specifically associated with exposure to ionising radiation. [2]

Radiation-induced cancers typically carried approximately 200 small deletions (of up to 100 base pairs) as well as highly unusual short stretches of ‘flipped’ DNA (balanced inversions). These signatures will help clinicians to determine whether cancers are linked to radiation exposure, and enable researchers to explore the possibility of specific types of treatment.

Mutations produced in each cell by smoking a pack of cigarettes a day for a year
Mouth: 23
Pharynx: 39
Larynx: 97
Lungs: 150
Bladder: 18
Liver: 6
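The per-tissue figures quoted for smoking (extra mutations in each cell for one pack a day over one year) lend themselves to a quick back-of-the-envelope estimate. The Python sketch below is illustrative only. It assumes, as a simplification that the report does not state, that the extra mutations scale linearly with both packs per day and years smoked. The function name is hypothetical.

```python
# Extra mutations per cell for one pack of cigarettes a day over one year,
# as quoted for each tissue in the report.
MUTATIONS_PER_PACK_YEAR = {
    "lungs": 150,
    "larynx": 97,
    "pharynx": 39,
    "mouth": 23,
    "bladder": 18,
    "liver": 6,
}


def extra_mutations_per_cell(tissue, packs_per_day, years):
    """Estimate the extra mutations accumulated in each cell of a tissue.

    ASSUMPTION (not from the report): the burden scales linearly with
    packs per day and with years of smoking.
    """
    if packs_per_day < 0 or years < 0:
        raise ValueError("packs_per_day and years must be non-negative")
    return MUTATIONS_PER_PACK_YEAR[tissue] * packs_per_day * years


if __name__ == "__main__":
    # Example: one pack a day for 20 years.
    for tissue in MUTATIONS_PER_PACK_YEAR:
        estimate = extra_mutations_per_cell(tissue, packs_per_day=1, years=20)
        print(f"{tissue}: about {estimate} extra mutations per cell")
```

These are averages across smokers, so the output is a rough guide to scale rather than a prediction for any individual.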


Redheads carry a cancerous legacy

UV radiation is a particular danger to people with fair skin, red hair and freckles, who carry two copies of a variant of the melanocortin 1 receptor (MC1R) gene. Such people are unable to produce a protective skin pigment and are at increased risk of melanoma.

By analysing public databases of melanoma cancer genomes, Sanger researchers reported in Nature Communications that the tumours of patients with the MC1R variant contained markedly more mutations – equivalent to an extra 21 years of sun exposure. [3] Moreover, even people with just one copy of the MC1R variant – who are not redheads – showed high levels of DNA damage, and are at significantly increased risk of melanoma.

Furthermore, as well as the telltale mutational signature associated with solar UV exposure, people with one or two copies of the MC1R variant also showed increased levels of other mutational signatures. Hence the MC1R variant may be affecting cancer risk through multiple mutational processes in skin cells.

“This is one of the first examples of a common genetic profile having a large impact on a cancer genome and could help better identify people at higher risk of developing skin cancer.”
Dr David Adams
Group leader at the Sanger Institute

Breast cancer research enters new era

Mutational signatures have also generated new insight into the origins of breast cancer, one of the most intensively studied cancers. Sanger researchers led the most comprehensive analysis yet of breast cancer genomes, revealing a host of processes disrupting genomic integrity. [4]

In Nature, the international collaboration reported the results of analysing whole-genome sequences from 560 cancer genomes. While previous sequencing efforts have typically concentrated on finding DNA variations in the exome (the protein-coding parts of the genome), this study included non-coding regions to explore the role of gene activity control in triggering cancerous cell growth. By s

The study added five new members to the list of protein-coding genes implicated in breast cancer – now up to 93. It also identified 12 base substitution and six rearrangement mutational signatures associated with breast cancer. Three of the rearrangement signatures were associated with defective DNA repair mechanisms – one linked to the common BRCA1 gene, one to BRCA2 and one of unknown origin.

The study revealed key details of the mutational processes leading to breast cancer and the critical genes they affect. It suggests that most of the genes involved in breast cancer have been identified, and that fusion genes and mutations in non-coding regions are likely to play only a minor role.

“In the future, we’d like to be able to profile individual cancer genomes so that we can identify the treatment most likely to be successful for a woman or man diagnosed with breast cancer. It is a step closer to personalised healthcare for cancer.”
Dr Serena Nik-Zainal
Group leader at the Sanger Institute

Whole-genome profiling
Visualising breast cancer genomes in this way may lead to more effective, personalised treatment decisions

at random across the genome, but show an association with particular genomic landmarks, such as those

By linking mutational signatures to cellular processes, the study provides new insight into the molecular mechanisms by which these mutational processes alter the genome. More

Personalising treatment: developing genomic diagnosis

A key goal of cancer genomics is to create new tools that help clinicians and benefit patients. In particular it is hoped that a deeper understanding of the genetic origins of individual cancers will lead to more targeted drug therapies aimed at specific molecular abnormalities, and diagnostics that will enable clinicians to tailor treatment to the specific genomic landscape of a patient’s cancer. Sanger researchers are in the vanguard of this work.

For decades, cancers have been characterised according to their position in the body and appearance under the microscope. Cancer genomics offers an alternative approach, with individual cancers classified according to the genetic changes that have rendered them cancerous.

Acute myeloid leukaemia

New genomic classifications based on specific combinations of mutations offer a way to discern a person’s likely prognosis and response to therapy

A Sanger-led collaboration reported in The New England Journal of Medicine its analysis of 111 genes implicated in AML in more than 1,500 patients taking part in clinical trials of intensive therapy. [6] The study identified more than 5,000 cancer-causing mutations affecting 76 regions of the genome, with most patients having at least two mutations.

Crucially, the team was able to link the presence of certain combinations of mutations to key clinical factors, such as prognosis and response to treatment. In total the team distinguished 11 different types of AML, some corresponding to existing patient subgroups but also new subtypes that reflect novel mechanisms by which AML may arise.

Although the new classification needs to be verified, it has the potential to offer an important new way to classify AML patients according to their likely prognosis and response to therapy. The findings are also a key step towards more efficient clinical trials in which participants are recruited on the basis of their cancer’s genomic landscape and its likelihood of response to the treatment.

As AML illustrates, a single type of cancer may have multiple genetic causes. Equally, the same genetic abnormality may contribute to more than one type of cancer. This offers the possibility that targeted treatments could be effectively deployed across multiple types of cancer that share cancer-causing mutations.

“We have shown that AML is an umbrella term for a group of at least 11 different types of leukaemia. We can now start to decode
Sanger researchers are deployed across multiple types of cancer one to BRCA2 and one of unknown origin. breast cancer. It is a step in this way may lead to more effective, in the vanguard of this work. that share cancer-causing mutations. In Nature, the international collaboration personalised treatment decisions reported the results of analysing The study revealed key details of the closer to personalised For decades, cancers have been whole-genome sequences from mutational processes leading to breast healthcare for cancer.” characterised according to their position 560 cancer genomes. While previous cancer and the critical genes they affect. in the body and appearance under the “We have shown that sequencing efforts have typically It suggests that most of the genes involved Dr Serena Nik-Zainal microscope. Cancer genomics offers an AML is an umbrella concentrated on finding DNA variations in breast cancer have been identified, Group leader at the Sanger Institute By linking mutational signatures to alternative approach, with individual term for a group of at in the exome (the protein-coding parts and that fusion genes and mutations cellular processes, the study provides cancers classified according to the of the genome), this study included in non-coding regions are likely to play at random across the genome, but new insight into the molecular genetic changes that have rendered least 11 different types non-coding regions to explore the role only a minor role. show an association with particular mechanisms by which these mutational them cancerous. of leukaemia. We can of gene activity control in triggering genomic landmarks, such as those processes alter the genome. More now start to decode cancerous cell growth. By s