<<

Leading the way

Highlights 2018/19 World-class 02 research and What we do innovation 02 Director’s Introduction that impacts 04 At a Glance people’s lives Through our cutting-edge infrastructure, scientific independence and innovative ideas we engage in long-term exploratory projects that drive forward science

Wherever you see this dark blue text or yellow, click to find more information.

Read more about our ground-breaking research into Parasites and Microbes Page 26

sanger.ac.uk What we do Our work Our approach Other information

08

Our work

10 , Ageing and Somatic

16 Cellular

20 Human Genetics

26 Parasites and Microbes

34 Tree of Life

40

Our approach

42 Scale

44 Innovation

46 Culture

48 Influence 52

50 Connections Other info

52 Image Credits

53 Institute Information

Wellcome Sanger Institute Highlights 2018/19 1 What we do

One such project is the UK Director’s Vanguard project, where our pipeline is delivering 50,000 volunteers’ Introduction whole- sequences to aid genetic he past year has been one of great exploration of health and . joy and sadness: we celebrated our By marrying the wealth of genomic data 25th anniversary and we mourned T with electronic health records and other the of our first director, . molecular characterisation tools, our Under John’s guidance, the Sanger Institute researchers have discovered new determined one-third of the human reference variants associated with osteoarthritis, genome, providing a fundamental gift to life developed prognostic calculators for science research worldwide. cancer, and created an artificial intelligence It seemed most fitting to seek to make that predicts new pathogen emergence. another foundational gift to science as our Our established approaches continue to main celebration. Our 25 for 25 deliver important discoveries. Our malaria Years project has delivered high-quality researchers applied genome-wide genomes for UK species for which no saturation mutagenesis to identify reference existed. This work is catalysing Plasmodium falciparium’s essential research and aiding conservation efforts , revealing new therapeutic targets. in areas as diverse as golden eagle Discover While our collaboration with the NHS – resettlement and fen raft spider breeding. how we have the Deciphering Developmental Disorders driven forward There is much that can bring to (DDD) project – celebrated its eighth year genomic the ecological issues facing our planet and of delivering diagnoses to children with understanding our communities, and we are proud to developmental disorders. After 125 papers in 2018 have founded a new research programme and more than 4,500 diagnoses, the project – The Tree of Life. It will lead the UK-wide continues to deliver fresh insights into the collaboration between genomic, conservation mechanisms of inheritance, and has identified and ecology partners to produce reference 49 previously unknown genetic disorders. genomes for all 66,000 species of complex As we move into our next quarter century, organisms found in the British Isles. the promise of genomic science to improve This new work will also serve to strengthen health and conservation is greater than our research into human health. A paradigm ever, and we are privileged to help deliver shift in sequencing technologies has the next wave of discovery. opened up routine, high-throughput whole-genome sequencing of human genomes. Advances in single-cell technologies, machine learning, and cellular modelling are dissecting health and disease within single cells in individual tissues, over time. Our challenge and goal is to lead the way in extracting the maximum value from this data. Professor Sir Mike Stratton, Director Wellcome Sanger Institute

2 Wellcome Sanger Institute Highlights 2018/19 What we do Our work Our approach Other information

Wellcome Sanger Institute Highlights 2018/19 3 At a Glance

In 2018, the In 2018, we read The human genome Sequencing Centre the genomes of is approximately outputted almost 7,558bn 338 3bn DNA bases species bases long a day In 2018, the In 2018, we Sequencing Centre read the equivalent provided the equivalent of of one gold-standard (30x) human genome every 588 gold-standard (30x) 17 mins human genomes a week

encing Cen qu tr Se e

D ata Centre

Centre-based Usable storage high performance in the Data Centre compute cores 20,000 55PB

Total number of Centre-based Network backbone compute cores cloud-based flexible speed compute cores 32,000 12,000 400GB/sec

4 Wellcome Sanger Institute Highlights 2018/19 2018 timeline

Institute’s founding director, John Sulston, died Page 47

Malaria screen Sarah Teichmann finds genes essential awarded Genetics for life Society’s 2018 Page 31 Mary Lyon NCTC 3000 Medal HDR UK funding genomes reveal for Cambridge hub antibiotic resistance Innovations’ history Adrian Ibrahim chairs new BIA Microbiotica Genomics Seeds of kidney and Genentech Advisory cancer sown in sign strategic Committee adolescence partnership

Jan Feb Mar Apr May Jun

New osteoarthritis Celgene joins genes discovered Open Targets

Machine learning flags potential pathogens

Dominic Kwiatkowski Multidrug resistant joins Fellowship malaria spread under of the Royal the radar for years Society in Cambodia Peter Campbell honoured by EMBO

Cholera tracked at household level Page 28

UK Biobank agreement to sequence 50,000 human genomes Page 43 Skin is a battlefield for Page 10

Largest collection of worm reference test genomes produced could predict Page 27 leukaemia Page 11

Human Cell Personalised Atlas boosted by prediction of Wellcome funding blood Kidney cancer’s Sanofi joins developmental Open Targets source revealed

Jul Aug Sep Oct Nov Dec

Genome damage Rare genetic Launch of UK from CRISPR-Cas9 more Darwin Tree of Life editing more complex than Project to support extensive than thought Earth BioGenome thought Project

AI-created CRISPR-Cas9 editing prediction tool

Golden eagle 25 Genomes for genome sequenced 25 Years project Page 39 Page 34

IT’s award winning cloud compute Page 42 What we do Our work Our approach Other information Year in numbers

507 7.398 Institute Petabases publications in 2018 4.639 Pb Cell 4

3.025 Pb Nature 21 1.910 Pb

Nature Genetics 16 1.054 Pb

New England Journal of Medicine 1 Jan 2015 Jan 2016 Jan 2017 Jan 2018 Jan 2019 Science 14

Who published Where Sanger How much DNA our work? staff are from* was sequenced?

1,236 Europe

63 North America 7 Middle 100 East Asia Pacific 16 12 Africa Latin America

*In January 2019

Wellcome Sanger Institute Highlights 2018/19 7 Our work

With secured funding from Wellcome, we are able to strategically focus our work in five key research fields

10 Cancer, Ageing 26 Parasites and Somatic and Microbes Mutation Investigates the common underpinning mechanisms of evolution, infection Provides leadership in data aggregation and resistance to therapy in and informatics innovation, develops and parasites. It also explores the high-throughput cellular models of cancer genetics of host response to infection for genome-wide functional screens and the role of the microbiota in health and drug testing, and explores somatic and disease. mutation’s role in clonal evolution, ageing and development. 34 Tree of Life 16 Cellular Investigates the diversity of complex organisms found in the UK through Genetics sequencing and cellular technologies. It also compares and contrasts species’ Explores human gene function by studying genome sequences to unlock the impact of genome variation on cell evolutionary insights. . Large-scale systematic screens are used to discover the impact of naturally-occurring and engineered genome mutations in human iPS cells, their differentiated derivatives, and other cell types.

20 Human Genetics Applies genomics to population-scale studies to identify the causal variants and pathways involved in human disease and their effects on cell biology. It also models developmental disorders to explore which physical aspects might be reversible.

8 Wellcome Sanger Institute Highlights 2018/19 What we do Our work Our approach Other information

Discover how we tracked cholera house by house Page 28

Read how Himalayan genomes show how humans adapted to high altitudes Page 24

Find out how we use single-cell techniques to explore foetal development Page 19

Wellcome Sanger Institute Highlights 2018/19 9 Cancer, Ageing and Somatic Mutation

In this section 1 Your body: a Darwinian battlefield

2 Blood test could predict leukaemia

3 One-stop shop for cancer models

4 Cancer calculator gives personalised predictions

5 Genomic microscope sorts lions from lambs

6 ‘Fast’ bone cancer has slow origins

7 Cancer’s seeds sown decades earlier

8 Ecological approach reveals hidden stem cell population

1

Your body: a Darwinian battlefield Research by Sanger Institute cancer researchers, published in Science and Cell Stem Cell, reveals that our bodies are evolutionary battlefields where only the fittest cells survive.

s our bodies age, our cells acquire by other clones. Even when the mice were While further work is needed to answer such an increasing amount of mutations exposed to UV light, mimicking the effects questions, the results show the fundamental A in their genomes, making cancer of the sun, the clones with cancer importance of considering healthy tissue development ever more likely. While much mutations didn’t survive and progress when studying the origins of cancer. work has been undertaken to understand to tumour formation. the mutations that lead to tumour formation, much less is known about how healthy Much to the researchers’ surprise, the tissue changes over time. For example, in same was true in oesophageal tissue. 2015, Sanger scientists showed that 25 per Despite being shielded from UV light, References a large proportion of the cells carried Martincorena I et al. High burden and pervasive cent of eyelid cells in people without cancer positive selection of somatic mutations in normal cancer-associated mutations. Yet tumour had at least one cancer-associated mutation. human skin. Science 2015; 348: 880–886. formation had not occurred. In fact, by To address this, the team applied deep- the time we reach middle age, the team Murai K et al. Epidermal tissue adapts to restrain progenitors carrying clonal p53 mutations. Cell Stem targeted single-cell sequencing to mouse discovered, we are likely to have more cells Cell 2018; 23: 687–699.e8. skin samples and human oesophageal with cancer-associated mutations than not. tissue to explore how mutational burden Martincorena I et al. Somatic mutant clones colonize These findings have important ramifications the human esophagus with age. Science 2018; 362: develops in healthy cells. 911– 917. when seeking to understand genetic The work revealed that the tissues were progression to cancer. Mutations in certain made up of a patchwork of clones – many genes commonly associated with cancer, with mutations in cancer-associated genes for example NOTCH1, were more prevalent – all competing for space as they grew. in healthy cells than tumour cells. Does In the mice, cells with a well-documented this mean that genes that are classified as cancer-driving mutation divided rapidly at cancer-driving may, in some circumstances, first but, after six months, were out-competed have a protective effect?

10 Wellcome Sanger Institute Highlights 2018/19 What we do Our work Our approach Other information

Our work Cancer, Ageing and Somatic Mutation

2

Cancer and Nutrition (EPIC). They analysed Blood test could the genomes of blood cells from 124 people with AML that had been taken before they predict leukaemia developed the disease and compared the results with those of 676 people who did Results of a Sanger Institute not go on to develop any form of cancer. co-led study in Nature show When compared with unaffected people, that genomic analysis can the genomes of blood cells from healthy identify those at high risk people who went on to develop AML contained more numerous genetic of developing acute myeloid mutations in their blood cells, showed leukaemia (AML) years before greater clonal expansion of these affected cells and showed enrichment of mutations The blood cell symptoms become apparent. in certain genes. The differences were genomes of he analysis of blood cells can reveal specific enough to allow the team to subtle changes in the genomic accurately predict which people, from landscape and clonal makeup that a separate group who had also provided T blood samples over time, went on to 800 predict AML development. develop AML. people were Often AML doesn’t have any symptoms analysed early on. It is aggressive and usually While the subtlety of the differences appears very suddenly in patients who means that further work is needed to refine present with the acute complications of the prognostic model, the work provides bone marrow failure. Yet the roots of the proof-of-principle that it may be possible disease form much earlier and identification to develop tests to identify people at a high of these genomic markers would enable risk of developing AML. more effective monitoring and treatment.

To understand the genomic changes that precede symptoms, Sanger researchers and colleagues at European Reference Abelson S et al. Prediction of acute myeloid Institute (EMBL-EBI) collaborated with the leukaemia risk in healthy individuals. Nature European Prospective Investigation into 2018; 559: 400–404.

For the first time we can identify people at risk of developing AML many years before they actually develop this life-threatening disease.”

Dr George Vassiliou, Wellcome Sanger Institute and the Wellcome-MRC Cambridge Stem Cell Institute

Wellcome Sanger Institute Highlights 2018/19 11 Our work Cancer, Ageing and Somatic Mutation

3

y applying large-scale high-throughput Being able to study cancer cells in the One-stop shop genome sequencing and analysis, laboratory – how they behave, and what Bresearchers in the Sanger Institute’s drugs they respond to – is critical for cancer for cancer models Cancer, Ageing and Somatic Mutations research. By reducing the time required to Programme have developed a ‘one-stop select the most appropriate model to study, A new online resource shop’ for information on more than 1,200 the work will help to speed research into developed by Sanger Institute cancer models from 43 cancer types in 29 the genes essential for cancer growth tissues. Through the online hub researchers and potential drug targets. In addition, scientists is powering can access centralised high-quality raw and the resource will aid the reproducibility worldwide cancer research processed genome sequence data, details and expansion of cancer research. of key gene mutations, functional datasets by speeding the selection (including drug susceptibility), and details Cell Model Passports will be regularly and application of cancer about the original tumour for each cell line updated with new cell models, and or organoid. genomic and functional datasets as cell lines and organoids: they are generated. Cell Model Passports. Many of the organoids detailed in Cell Model Passports are grown from tumour tissues sent to the Institute from four clinical sites across the UK. This important collaboration is part of the Human Cancer Models Initiative, Reference van der Meer D et al. Cell Model Passports—a hub for an international project to generate new clinical, genetic and functional datasets of preclinical cancer cell models. cancer models. Nucleic Acids Research 2018; gky872. Published online.

Cell Model Passports is a one-stop shop for information on 1,200 cancer models from 43 cancer types in 29 tissues

12 Wellcome Sanger Institute Highlights 2018/19 What we do Our work Our approach Other information

Our work Cancer, Ageing and Somatic Mutation

4

In a further proof-of-principle their underlying causal biology – showed Cancer calculator markedly different blood counts, risk that genomics can power of leukaemic transformation and length gives personalised personalised medicine, of time before cancer recurrence. predictions Sanger Institute researchers Taking the analysis even further, the scientists have developed an accurate combined 63 clinical and genomic variables to develop personalised predictions of online calculator to predict each individual’s outcome. This approach the outcome of disease for successfully produced much more accurate patients with certain types prognoses than current techniques and may help to inform treatment choice. of chronic blood cancer. This work builds on previous work in 2017 eveloped in conjunction with the by the team, where they used a similar Wellcome-MRC Stem Cell Institute approach to develop a knowledge bank Dand the , to determine disease outcome and best the freely available resource combines treatment choice for acute myeloid leukaemia. an individual’s genetic data with their Sequencing clinical information to provide greatly 69 enhanced prognoses. Published in the New England Journal genes in 2,000 people References of Medicine, the team detailed how Grinfield J et al. Classification and personalized revealed 8 genomic sequencing the gene-coding regions of prognosis in myeloproliferative neoplasms. New subtypes of 69 genes associated with cancer in more England Journal of Medicine 2018; 379: 1416–1430. myeloproliferative than 2,000 patients with myeloproliferative Gerstung M et al. Precision oncology for acute cancers neoplasms identified eight genomic myeloid leukemia using a knowledge bank approach. subtypes. People within these eight Nature Genetics 2017; 49: 332–340. groups – defined for the first time by 5

By conducting whole-genome and Genomic sequencing of five human osteoblastomas and another form of bone microscope sorts cancer (osteoid osteoma), the researchers Osteoblastoma discovered rearrangement of the mainly affects lions from lambs factor gene FOS in five of the young people aged tumours and rearrangement of FOSB (its For the first time, a genetic genetic cousin) in the other. By applying marker for osteoblastoma the molecular genomic techniques of FISH (fluorescent in situ hybridization) 10-25 has been discovered. and immunohistochemistry to another 55 osteoblastoma and osteoid osteoma he discovery by scientists at the samples, the genetic changes were Sanger Institute, published in Nature years found to be ubiquitous and diagnostic. T Communications, may lead to a diagnostic tool to distinguish between benign While FOS has been implicated in osteoblastoma and aggressive osteosarcoma tumour formation before – for example tumours. It could help spare young people changes in v-fos has been shown to cause from unnecessary and painful treatment. osteosarcoma in mice – it is the first time that the has been Osteoblastoma is the most common benign associated with human bone cancer. bone tumour, mainly affecting young people aged 10-25 years. It is treated by surgery to remove the tumour and ease symptoms. But its diagnosis is challenging: under the microscope, the tumours can look very Reference similar to osteosarcoma. In contrast to its Fittall MW et al. Recurrent rearrangements of FOS benign lookalike, osteosarcoma is an and FOSB define osteoblastoma.Nature aggressive form of bone cancer that requires Communications 2018; 9: 2150. extensive treatment – sometimes including amputation and intensive chemotherapy.

Wellcome Sanger Institute Highlights 2018/19 13 Our work Cancer, Ageing and Somatic Mutation

6

By sequencing the entire tumour genomes ‘Fast’ bone cancer of 124 children, the collaboration uncovered two key mechanisms at work. The process has slow origins responsible determined the aggressiveness of the resulting tumour. In addition, these Ewing sarcoma is thought rearrangements occurred many years to be a fast-growing cancer before the disease is usually detected. in children, with only harsh While the majority of children had simple chemotherapeutic and surgical gene fusion of EWSR1 and ETS, 42 per cent had undergone chromoplexy. First identified treatment options. in prostate cancer, chromoplexy involves enomic research by the Sanger the formation of dramatic loops of DNA Institute and the Hospital for Sick that disrupt multiple genes. Children (SickKids) in Canada, G Those tumours with chromoplexy were has shown that the cancer actually has found to be more aggressive than those slow-growing roots whose detection due to simple gene fusion. Developing might enable swifter diagnosis and more a genomics-based diagnostic to identify successful treatment. 42% this process might enable tumours to be detected and treated when they are of patients with Ewing Sarcoma Some tumours are driven not by mutations smaller and easier to treat. showed chromoplexy – formation within individual genes, but by chromosomal of loops of DNA that disrupt rearrangements, leading to gene fusions. multiple genes Ewing sarcoma is one such cancer, driven by the fusion of the genes EWSR1 and ETS. But how does this gene fusion occur, and Reference could the causative mechanism be important? Anderson ND et al. Rearrangement bursts generate canonical gene fusions in bone and soft tissue tumors. Science 2018; 361: 6405. 7

Reporting in Cell, they found that the first Cancer’s seeds seeds of kidney cancer are sown in childhood or adolescence, 40 or 50 years before any References sown decades cancer develops – and that many of us will Mitchell TJ et al. Timing the landmark events in the evolution of clear cell renal cell cancer: TracerX Renal. be carrying these cancer-associated Cell 2018; 173: 611–623.e17. earlier changes. Yet most people do not develop kidney cancer, and it is hoped that this Ju YS et al. Somatic mutations reveal asymmetric cellular dynamics in the early human embryo. Nature Sanger researchers, together work will help with mapping the journey 2017; 543: 714–718. with colleagues at the Francis of mutation acquisition that is needed for the cells to become malignant. Stephens PJ et al. Massive genomic rearrangement Crick Institute and the TracerX acquired in a single catastrophic event during cancer development. Cell 2011; 144: 27–40. Renal consortium, have By analysing the whole genomes of 95 biopsies from 33 patients with ccRCC, discovered that the first seeds the researchers found that the first mutation of kidney cancer are sown in more than 90 per cent of people was the years, or even decades, loss of the short arm of chromosome 3, which removed a number of tumour- before symptoms appear. suppressing genes. They also found that 35-40 per cent of these people had also o understand how one type of gained the long arm of chromosome 5 at kidney cancer – clear cell Renal the same time, due to a process known Cell Carcinoma (ccRCC) – develops, T as chromothripsis – where chromosomes the team used genomic archaeology shatter and are put back together. techniques to reconstruct the genetic changes that took place. The approach, By identifying the key genetic changes which they had developed in 2017, is able over time that drive a kidney cell to cancer, to trace back the time when each individual the researchers hope that the knowledge mutation appeared in the body, all the way will help to inform early detection of, and to the first few hours of the developing intervention for, people at high risk of embryo in the womb. developing the cancer.

14 Wellcome Sanger Institute Highlights 2018/19 What we do Our work Our approach Other information

Our work Cancer, Ageing and Somatic Mutation

8 Ecological approach reveals hidden stem cell population

Sanger researchers used novel techniques to estimate the populations of blood stem cells in healthy adults.

he new approach, detailed in Nature, could lead to insights into cancer development, and Twhy some stem cell therapies are more effective than others.

The method was a genomic form of ‘capture-recapture’ – used by ecologists to study animal populations. The team conducted whole-genome sequencing of 198 blood and bone marrow stem cell colonies from a 59-year-old man to identify and assign sets of mutations unique to each stem cell and its descendants. By searching for these mutations in the rest of the blood, the scientists could work out which blood cells had derived from these stem cells and estimate the total stem cell population.

They found that healthy adults have 10 times more blood stem cells than previously thought – somewhere between 50,000 and 200,000.

The technique is very flexible. Not only can researchers now measure how many stem cells exist, but also they can see how related the cells are to each other and what types of blood cells they produce. The approach can also be applied to study the effects of ageing and disease in other organs.

The team will now use this method to understand stem cell populations in patients with blood cancers. They hope to learn how single cells expand their numbers, outcompete normal stem cells and lead to tumour formation.

Reference Lee-Six H et al. Population dynamics of normal human blood inferred from somatic mutations. Healthy adults have Nature 2018; 561: 473–478. 10 times more blood stem cells than previously thought Cellular Genetics

In this section 1 CRISPR-Cas9 damage worse than thought

2 Machine learning improves CRISPR-Cas9 gene editing

3 Human Cell Atlas bears fruit

4 Immune system evolution reveals its Achilles’ heel

5 Immune system secrets unlocked by new CRISPR system

6 How baby placates mum’s immune system

We found that changes 1 in CRISPR/Cas9 edited cells have been seriously CRISPR-Cas9 underestimated damage worse before now.” Professor , than thought Director Emeritus at the Work by Sanger Institute Wellcome Sanger Institute researchers has shown that the widely used CRISPR- Cas9 genome editing method can cause more extensive DNA damage than previously thought. hey found large deletions and Reporting their findings inNature genome rearrangements, in some , the researchers discovered T cases far from the targeted site. These that areas of the genome close to the target changes could lead to dangerous changes site had been deleted. In some cases, these in some cells. were large stretches of DNA, up to several thousand bases long. Further away from Until now, analysis of CRISPR-Cas9 editing the target site, they found complicated has shown that the damage caused near to rearrangements of DNA, where once- the targeted site was low – usually involving distant stretches of the genome had been just a few DNA bases. However, standard fused together. lab-based analysis techniques used to search for this damage require specific sites The team warns that standard tests used on the genome to work and sometimes to detect unwanted DNA changes may CRISPR can remove them. miss important forms of damage caused by CRISPR-Cas9 editing. They counsel that To investigate whether or not the process clinical use of the technique in gene editing had other unwanted effects, the scientists therapies should undergo specific testing. studied – for the first time – the whole genomes of CRISPR-Cas9 edited cells. They looked at mouse embryonic stem cells, mouse blood-making cells, and human retinal cells. In addition, to overcome Reference the shortcomings of standard lab-based Kosicki M et al. Repair of double-strand breaks induced by CRISPR-Cas9 leads to large deletions techniques, the team employed long-read and complex rearrangements. Nature Biotechnology sequencing and long-range genotyping 2018; 36: 765–771. for analysis.

16 Wellcome Sanger Institute Highlights 2018/19 What we do Our work Our approach Other information

Our work Cellular Genetics

2

Writing in Nature Biotechnology, the team CRISPR-Cas9 Machine learning outline how they made over 41,000 gRNAs and applied them to cells with a range off-target 41,630 improves of genetic backgrounds using different damage CRISPR reagents. These edited genomes gRNAs CRISPR-Cas9 were then sequenced at high coverage in synthetic to identify off-target changes. constructs gene editing created Next, the scientists applied machine 1,000, CRISPR-based gene learning to the more than 1,000,000,000 mutational outcomes to develop their editing techniques have computational tool. Dubbed FORECasT 000,000 mutational revolutionised genetic (favoured outcomes of repair events at outcomes research, by reducing the Cas9 targets), the resulting system is able analysed to predict the exact mutations resulting time and cost of gene from CRISPR-Cas9 genetic editing just 6,568 knockout experiments. from the sequence of the target DNA. human gene targeting owever, the process is not perfect Researchers can use the freely available gRNAs studied and can produce off-target mutations program to select the most appropriate in greater Hthat might bias genetic experiments gRNA, and screen out any damaged cells depth or produce harmful outcomes. by using targeted DNA sequencing to look for predicted off-target damage. 58% The amount – and location – of these of off-target mutations can differ depending on the damage are cell type, even within the same organism. deletions of at least 3 bases However, this DNA damage is not random and appears to be related to the guide RNA Reference (gRNA) template being used. Armed with Allen F et al. Predicting the mutations generated by repair of Cas9-induced double-strand breaks. 31% of off-target this knowledge, Sanger scientists sought Nature Biotechnology 2019; 37: 64–72. to develop a reliable and accurate damage occur prediction tool that could help minimise between repeating sequences of at unwanted changes. least 2 bases 3

Now the first fruits of this research have Human Cell Atlas been made available online to the scientific community: the single-cell gene activity bears fruit profiles of more than half a million cells.

Scientists around the world The data comes from a number of sources. are already benefiting from Sanger scientists, collaborating with the MRC Cancer Unit studied thousands of the work of the Human Cell immune and connective tissue cells from Atlas, the ambitious global a widely used strain of mouse used to study skin melanoma. Cells were collected over collaboration to create a 11 days to understand how they respond ‘Google Map’ of the human and adapt to a tumour over time.

body’s 37 trillion cells. In addition, in conjunction with the o-led by the Sanger Institute and Cambridge Repository for Translational the , more than 480 Medicine, Sanger researchers generated the first single-cell data from a donated Gene activity profiles scientists are seeking to discover, C human spleen. They characterised for more than and pinpoint the location of, every cell type in the human body at different stages of life individual cells over time to study the from foetus to adult. With backing from effects of oxygen deprivation on the cells. funders including Wellcome, the Medical The Broad Institute and Harvard 500,000 Research Council (MRC), National Institutes University have supplied data from for Health, and the Chan-Zuckerberg individual cells have individual immune cells from umbilical Initiative the project is also exploring how been released cord and adult bone marrow. these cells respond to disease or trauma.

Wellcome Sanger Institute Highlights 2018/19 17 Our work Cellular Genetics

4

The scientists compared the genomic Immune system activity of the cells in response to stimuli that imitated a pathogen. They showed that evolution reveals genes which are highly divergent between The team species also show a wide variety of activation characterised the its Achilles’ heel between cells within an individual tissue. activity of thousands In comparison, genes which were conserved of genes in more than To understand more about between species appeared to be much the immune system, Sanger more tightly regulated and consistent in Institute and European their activation between cells in a tissue. 250,000 Writing in Nature, the researchers Bioinformatics Institute cells from humans, hypothesise that the balance between rhesus macaques, (EMBL-EBI) researchers divergent and conserved genes has evolved mice, rats, pigs worked with collaborators to to ensure that immune responses are and rabbits study the activity of individual proportionate and confined. The divergent genes have evolved to enable immune cells immune cells in six different to recognise and attack a particular virus species. or bacteria. In contrast, the conserved ones regulate the level of response to prevent n total, the team characterised the over-activation of the immune response, activity of thousands of genes in which could cause inflammatory diseases. I more than 250,000 individual cells from humans, rhesus macaques, mice, However, this balance also has an inherent rats, pigs and rabbits. They studied the weakness. The conserved genes are innate immune system in both immune more tightly regulated and may have other functions in a cell. These represent Reference cells and fibroblasts. Hagai T et al. variability across an Achilles’ heel and are targeted by viruses cells and species shapes innate immunity. and bacteria to subvert the immune system. Nature 2018; 563: 197–202. 5

By combining this tool with a new screening Immune system protocol, the team was able to study the Th2 cell’s genome efficiently and quickly. secrets unlocked The result is an atlas of how the cell develops and signals other immune system by new CRISPR cells to help direct the body’s response. system The researchers identified a number of new genes that regulate Th2 cells’ response Sanger researchers and to infection, providing further avenues 88,000 of research into auto-immune diseases. collaborators have provided In particular, they discovered that a molecule guide RNA templates a much-needed overview involved in controlling production were made to knock – PPARG – plays an important role. out 20,000 of a vital part of the immune mouse genes system: Th2 cells. Until now individual studies have focused on different aspects of the Th2 cells’ hese cells help to coordinate, behaviour, but were difficult to compare and limit, how the body’s immune or combine. This unbiased genome-wide T system fights off an infection. This overview not only confirms many of the understanding could help drive research previous studies’ findings, it also provides into new therapies for auto-immune diseases, a vital map of which genes are switched such as allergies and rheumatoid arthritis. on and off, and when. Reporting in the journal Cell, the scientists detail how they systematically switched off every gene in a mouse Th2 cell, by creating

a new retroviral CRISPR-Cas9 gene editing Reference system. The novel approach was needed Henriksson J et al. Genome-wide CRISPR Screens because mouse Th2 cells cannot be edited in T Helper Cells Reveal Pervasive Crosstalk between using standard lentiviral techniques. In total, Activation and Differentiation.Cell 2019; Available online DOI: 10.1016/j.cell.2018.11.044. the team made 88,000 guide RNA templates to ensure that they could knock out all 20,000 mouse genes. What we do Our work Our approach Other information

6 How baby placates mum’s immune system

The first findings from the – the junction between the placenta and They also undertook microscopy the uterus – during the first trimester of studies. Combining this data, the human developmental cell pregnancy. This is the first time that the scientists discovered how foetal cells atlas has shown how foetal maternal-foetal interface has been studied entering the mother’s uterus lining cells and maternal cells in such detail. stimulate it to produce the blood vessels to supply the placenta. communicate with each The researchers used DNA and RNA other to prevent the sequencing to identify maternal and foetal By providing much-needed data on the cells in the decidua and determine which very first steps of embryo implantation mother’s immune system genes were active in each cell type. and placenta formation1, researchers from attacking the baby’s can better understand how these To systematically study the interactions processes work and what goes wrong placenta. The discoveries between the maternal and foetal cells and in pre-eclampsia and miscarriage. will help future research into uncover those involved in modifying the mother’s immune system, the team first miscarriages and still births. created a publicly available repository he study, published in Nature, of cell-surface and secreted molecules applied genomic and bioinformatic (www.CellPhoneDB.org). They then Reference Vento-Tormo R et al. Single-cell reconstruction techniques to approximately 70,000 interrogated the data to identify molecules involved in signalling between cell types. of the early maternal–fetal interface in humans. Tindividual cells taken from the decidua Nature 2018; 563: 347–353.

Wellcome Sanger Institute Highlights 2018/19 19 Human Genetics

In this section 1 Visionary study powers NHS’ genomics revolution

2 Rare disorders influenced by common variants

3 Recessive genes play minimal role in undiagnosed developmental disorders

4 UK Biobank data reveals new osteoarthritis drug targets

5 Genetic diagnoses in the womb

6 Himalayan genomes show how humans adapted to high altitudes

7 African hepatitis C virus strains evade Western drugs

As a result of the DDD project thousands of children with rare neurodevelopmental disorders now have a diagnosis, and their families can use this to inform their decisions about further children and to guide and target appropriate medical care.”

Dr Jane Hurst, President of the UK Clinical Genetic Society, and Great Ormond Street Hospital

33,500 parents and children

20 Wellcome Sanger Institute Highlights 2018/19 What we do Our work Our approach Other information

Our work Human Genetics

1

The groundbreaking collaboration with Every child in the study has had their genome Visionary study NHS services across the UK and Ireland data analysed, but the work is not yet recruited over 14,000 children and their complete. The project is continuing to powers NHS’ parents for genome sequencing. When deliver new insights. As knowledge builds, they joined, each child had an undiagnosed through DDD and similar studies across the genomics developmental disorder. So far, combining globe, the power to make new discoveries genomic insights with clinical data has increases. A recent reanalysis of 1,133 revolution provided diagnoses to 4,500 children. children’s genomes resulted in an additional For many families, the diagnosis has 182 diagnoses. Most of those were due In October 2010, the pointed to a treatment that may be to new disorder-associated genes Sanger Institute launched beneficial. For others, the diagnosis has discovered in the years since the initial brought relief, ending a years-long search analysis. The researchers, reporting their the Deciphering Developmental for the cause of their child’s condition. findings inGenetics in Medicine, estimate Disorders (DDD) study, a that child-parent genome sequencing as a A diagnosis can help inform family planning first-line diagnostic test for developmental visionary project to demonstrate decisions, as often the analysis will tell how disorders would diagnose over 50 per cent the feasibility of using genome likely it is that future children may be affected. of patients. Many families who received a diagnosis sequencing for diagnosis have connected to others with the same In addition, the Sanger Institute spin-out in the clinic – bringing direct condition – finding support groups around company, Congenica, has built on the benefits to patients. the world. foundations of the DDD study to develop its SapientiaTM clinical genomics analysis t has proven so successful that it The researchers have also identified platform that is helping to power medical spurred the UK Government to initiate 49 completely new disorders, caused by services around the world, including the the 100,000 Genomes Project and to changes in genes previously unconnected UK, China and Portugal. I to developmental disorders. The study has launch the new NHS clinical genomics service, and has founded an analysis also powered international research efforts company that is powering clinical into childhood disorders, resulting in over 125 published research papers. genomics services worldwide. Reference Wright CF et al. Making new genetic diagnoses with old data: iterative reanalysis and reporting from genome-wide data in 1,133 families with developmental disorders. Genetics in Medicine 2018; 20: 1216–1223. 8 years

200 4,500 NHS consultants diagnoses so far

49 completely new disorders

125 research papers 24 NHS Genetics Centres Wellcome Sanger Institute Highlights 2018/19 21 Our work Human Genetics

2

The research, published in Nature, is the Rare disorders first large-scale study of the role of common genetic variants in rare disorders. Using influenced by genome-wide association studies the team tested four million common common variants genetic variants for association with neurodevelopmental disorders. Most rare developmental conditions are thought to Surprisingly, the researchers found that these common DNA variations – for be caused by damaging example those that increase the risk of mutations in a single gene. schizophrenia – also affected the risk of rare disorders. This is the first time that common However, the symptoms genetic variants have been found to have between individuals with a role in rare developmental disorders.

the same mutation can The team hope their findings will form the differ markedly. basis of further research into how common genetic variation affects both the risks and nd some individuals who have symptoms of rare diseases. Sanger ‘disease-causing’ mutations researchers studied A can appear unaffected. To try to the genomes of nearly understand this phenomenom, Sanger Institute researchers studied the genomes Reference of nearly 7,000 children in the Deciphering Niemi MEK et al. Common genetic variants contribute 7,000 Developmental Disorders (DDD) study. to risk of rare severe neurodevelopmental disorders. Nature 2018; 562: 268–271. children They compared these DNA sequences with those of individuals unaffected by developmental conditions. 3

The DDD study aims to find diagnoses Recessive genes for children with previously unknown developmental disorders. Sanger Institute References play minimal role researchers analysed the gene-coding Martin HC et al. Quantifying the contribution of recessive coding variation to developmental DNA of more than 6,000 families to look disorders. Science 2018; 362: 1161–1164. in undiagnosed for recessive causes of their conditions. The team developed a novel bioinformatics Deciphering Developmental Disorders study. Prevalence and architecture of de novo mutations developmental approach that allowed the scientists to in developmental disorders. Nature 2017; search in both known and yet-to-be 542: 433–438. disorders discovered genes.

Researchers working on the The researchers estimate that only 5 per Deciphering Developmental cent had inherited a disease-causing gene mutation from both parents, far fewer than Disorders (DDD) study have previously thought. Separate studies have discovered that only a small estimated that half the children in the DDD study have conditions caused by new Sanger fraction of rare, undiagnosed mutations – mutations that were not Institute researchers inherited from their parents. Together, developmental disorders in analysed the gene-coding these findings indicate that, for many the British Isles are caused DNA of more than patients, more complicated genetic by recessive gene variants. mechanisms may be involved.

he findings, published inScience , The team also identified two recessive 6,000 could help clinicians better predict genes – KDM5B and EIF3F – that had families T the risk of a condition affecting any not been associated with recessive subsequent children, and so help families developmental disorders before. plan for the future. They also reveal that a large proportion of developmental disorders Dr Hilary Martin, the first author on the are due to complex genetic mechanisms paper, is now leading a group in the Sanger that warrant deeper investigation. Institute Human Genetics programme to identify and explore the genetic complexity of health and disease.

22 Wellcome Sanger Institute Highlights 2018/19 What we do Our work Our approach Other information

Our work Human Genetics

4

Scientists at the Sanger Institute, surgery, statistical fine-mapping, and UK Biobank GlaxoSmithKline (GSK) and their evidence from rare human disease and collaborators used the UK Biobank and animal models. The variants identified data reveals new arcOGEN resources to analyse the genomes are in genes involved in bone development, of over 77,000 people with osteoarthritis. collagen formation, or organising the osteoarthritis They carried out a genome-wide extracellular matrix. association study of approximately 17.5 drug targets million single variants in 77,000 By matching the variants’ biological functions people with osteoarthritis and 378,000 to the modes of action of existing or In the largest genetic study healthy individuals. Reporting in Nature experimental drugs, the researchers of osteoarthritis to date, Genetics, they identified 65 genome identified 10 targetable variants. The findings variants associated with osteoarthritis, will play an important role in speeding up researchers have uncovered 52 of which were previously unknown. drug development. 52 new genome variants To understand how these variants linked to the condition. contribute to osteoarthritis, the team

heir findings more than double the integrated data from functional genomics Reference number of variants associated with and gene expression experiments. Tachmazidou I et al. Identification of new therapeutic targets for osteoarthritis through genome-wide osteoarthritis. Of these, 10 of the They also incorporated transcriptomic T characterisation of tissue from osteoarthritis analyses of UK Biobank. Nature Genetics 2019; variants are in genes that can be targeted 51: 230–236. by drugs – either already in use for another patients undergoing joint replacement condition, or in development – accelerating the advance towards new therapies.

Almost 10 million people in the UK have osteoarthritis – a degenerative, painful joint disease with no treatments. The disease indirectly costs the UK economy £14.8 billion per year.

52 new genome variants linked to the condition

10 million people in the UK have osteoarthritis

Wellcome Sanger Institute Highlights 2018/19 23 Our work Human Genetics

5

Institute, Great Ormond Street Hospital, Genetic and Birmingham Women’s Hospital, and collaborators from the Prenatal Assessment Foetal of Genomes and (PAGE) study diagnoses structural anomalies sequenced the genes of 610 developing occur in about in the womb babies and 1,026 biological parents. Each foetus had abnormalities detected Foetal structural anomalies by routine ultrasound – either of the heart, occur in about 3 per cent of brain, skeleton, or in multiple organs. 3% pregnancies and are usually Reporting in The Lancet, the team were able of pregnancies detected by ultrasound. to provide a genetic diagnosis for nearly 10 per cent of pregnancies. Such diagnoses hile ultrasound can pick up potential could be used to provide better information problems, it can’t identify their to parents about how their child is likely The teams hope the method will be cause. Genetic screening is now W to be affected, and guide clinical care. widely used, improving diagnoses for routinely used to provide more accurate families across the UK. The information diagnoses for such abnormalities. This The team focused their analysis on could provide parents with important involves looking for additional copies of 1,628 genes that have a potential role information about the outlook for their a chromosome or copy number variation in developmental disorders. After baby, as well as whether they might face (where sections of a chromosome are bioinformatic filtering and prioritisation, a similar situation in future pregnancies. duplicated or lost). 321 genetic variants representing 255 potential diagnoses were selected However, reading the entire for review by a multidisciplinary clinical of the foetus’ genes would provide far panel. In total, 52 of the 610 pregnancies greater and, potentially, more definitive were diagnosed with a known disorder Reference information on the underlying genetic cause and a further 24 had a genetic variant Lord J et al. Prenatal sequencing analysis in fetal structural anomalies detected of structural anomalies. So, in the largest of uncertain significance but potential by ultrasonography (PAGE): a cohort study. study of its kind, researchers at the Sanger clinical utility. Lancet 2019; S0140–6736(18)31940–8.

6

Researchers from the Sanger Combining four statistical approaches, they Himalayan identified regions of the genome that related Institute and Leiden University to adaptations to life at high altitude. genomes show have performed the most The strongest signals of high-altitude how humans comprehensive survey to date adaptation were near two genes known of genetic variation among to be associated with the body’s response adapted to high to low-oxygen environments. They people living in the Himalayas. discovered eight additional signals of altitudes heir study found that different high-altitude adaptation, five of which have populations living in the region share strong biological links to such functions. a specific ancestral component, The team also uncovered the genetic T ancestry of the different populations and suggesting that genetic adaptation to life at high altitude originated once in the were able to uncover where they had mixed region and subsequently spread as and diverged over the past 2,000 years. populations diverged. Their results suggest the presence of The harsh environment at high altitudes, a single ancestral population carrying due to increased ultraviolet radiation, advantageous variants for high-altitude decreased atmospheric pressure and lower adaptation that separated from populations levels of oxygen, is inescapable. The team living in lowland East Asia, and then spread explored the genetic and physiological and diverged into different populations adaptations of people who have settled in across the Himalayan region. the region, analysing the genomes of 738 individuals from 49 populations in Nepal, Bhutan, North India and Tibet. They looked at 500,000 single nucleotide variants in Reference each individual’s genome and compared Arciero E et al. Demographic History and Genetic Adaptation in the Himalayan Region Inferred from the data to worldwide population data. Genome-Wide SNP Genotypes of 49 Populations. Molecular Biology and Evolution 2018; 35: 1916–1933. What we do Our work Our approach Other information

Our work Human Genetics

7 African hepatitis C virus strains evade Western drugs

A British-Canadian-Ugandan HCV causes hepatitis C – a liver disease mutations in genes known to be that can trigger cirrhosis and liver cancer associated with resistance to several collaboration has identified – and nearly 400,000 people die from it commonly used antiviral drugs, three new strains of the each year. The World Health Organization highlighting that careful approaches hepatitis C virus (HCV) in has set a goal of eliminating the disease are needed to diagnose and treat HCV by 2030 but, at the moment, an estimated effectively in Africa. patients in Uganda, in the 71 million people around the world have largest study of the disease chronic hepatitis C infection. Of these Further research to determine the extent people roughly 10 million live in sub- of HCV genetic diversity in Africa will aid in sub-Saharan Africa. Saharan Africa. the development of an effective vaccine and enhance elimination efforts. It will enome analysis by the Sanger The researchers recruited 7,751 patients reveal recent and historical transmission Institute, the MRC-University from Uganda who were subsequently patterns to inform public health of Glasgow Centre for Virus screened for the virus. They found 20 interventions. GResearch and collaborators suggests that patients had an undiagnosed hepatitis C these African strains may not be susceptible infection. Sequencing the virus genome to antiviral drugs developed to treat HCV from these individuals, together with two infections in Europe and America. samples from the Democratic Republic of Reference the Congo, revealed the three previously Davis C et al. New highly diverse hepatitis C strains undiscovered strains. The strains have detected in sub‐Saharan Africa have unknown susceptibility to direct‐acting antiviral treatments. Hepatology 2018; doi: 10.1002/hep.30342.

World Health Organization has set a goal of eliminating hepatitis C by 2030

Wellcome Sanger Institute Highlights 2018/19 25 Parasites and Microbes

In this section 1 NCTC 3000 reveals history of antibiotic resistance

2 Largest worm genomes study finds new drug targets

3 Real-time, Europe-wide gonorrhoea monitoring is feasible

4 Tracking cholera – house by house

5 Cholera in a conflict zone

6 How malaria jumped to humans

7 Malaria screen finds genes essential for life

8 Flipping malaria’s sex master switch

9 Humans and cows – it’s complicated

10 AI identifies emerging pathogens 1

he first bacteria added to the NCTC Comparing the historical strains to modern NCTC 3000 was a strain of dysentery-causing ones enables studies into epidemiology, T Shigella flexneri isolated in 1915 from virulence, prevention, and treatment reveals history a soldier in the trenches of World War 1. of infectious diseases. The genome The collection also includes 16 samples sequences will support further understanding of antibiotic deposited by Sir Alexander Fleming, of the bacteria and the diseases they including one from his own nose. cause. The sequences will also aid in the resistance development of new diagnostics – to rapidly The NCTC contains deadly strains of test for infections and identify the source The National Collection cholera, plague, MRSA and tuberculosis, of an outbreak. of Type Cultures (NCTC) making it one of the largest collections of clinically relevant species. Scientists Comparing historical and modern strains was set up in 1920 and worldwide use it for medical, scientific of bacteria will also aid in tackling the global has become the longest and veterinary studies. threat of antibiotic resistance. The genome sequences of historical samples, before the established collection of Now, researchers from the Sanger Institute, introduction of antibiotics and vaccines, can bacteria in the world. European Bioinformatics Institute (EMBL-EBI), be compared to those from current strains and Pacific Biosciences – allowing researchers to track a species’ (PacBio) have sequenced 3,000 bacteria evolution in response to antibiotics. from the collection. This included sequencing all the ‘type strains’ of bacteria – those samples which describe a species and are used as a classification standard.

The sequencing was performed using PacBio’s single DNA molecule, long-read sequencing technology, resulting in high-quality reference sequences for these important species.

3,000 bacteria were sequenced

26 Wellcome Sanger Institute Highlights 2018/19 What we do Our work Our approach Other information

Our work Parasites and Microbes

2

Reporting in Nature Genetics, they compared Many of the drug compounds identified Largest worm the genomes of 81 species of roundworms by the team are existing treatments for and flatworms, including 45 that had not other human illnesses, and have the genomes study been sequenced before. Their analysis potential to be re-purposed for deworming. revealed 800,000 new genes in thousands This comparative genomics resource finds new drug of new gene families. They found gene provides a much-needed boost for the families that modulate host immune research community to understand and targets responses, enable the parasite to penetrate combat parasitic worms. though host tissues (such as the gut), or allow A quarter of the world’s the parasite to feed. The insights will enable researchers to better understand how the population are infected worms invade their hosts, and evade with parasitic nematodes Reference immune systems. International Helminth Genomes Consortium. (roundworms) or Comparative genomics of the major parasitic worms. At the moment, few drugs are available Nature Genetics 2019; 51: 163 –174. platyhelminths (flatworms). to treat worm infections. To accelerate nfection is rarely lethal, but can cause the search for new and effective treatments, debilitating pain or chronic diseases such the team mined the genomic data to look as river blindness and schistosomiasis. for potential new drug targets. They studied I all 1.4 million genes across the parasitic To advance research into these neglected tropical diseases, the International Helminth species. Combining this information with Genomes Consortium, led by Sanger Institute the ChEMBL database of bioactive molecules researchers, undertook the largest genomic with drug-like properties, they list 40 study of parasitic worms to date. high-priority drug targets in the worms, and hundreds of possible drugs.

3

Drug-resistant strains of Before the samples were sequenced, Real-time, they were tested locally to identify the the Neisseria gonorrhoeae strain and any antibiotic sensitivity. Europe-wide bacterium are on the rise. Genome sequencing proved to be more accurate than current laboratory techniques gonorrhoea 88 million people worldwide in identifying strains resistant to antibiotics, are infected with the sexually and even highlighted incorrect laboratory monitoring transmitted bacteria, which results. The findings, reported in Lancet Infectious Diseases, will help public health is feasible causes gonorrhoea, and officials monitor emerging resistance and increasing numbers of those doctors prescribe effective antibiotics. infections are difficult to treat. To aid global efforts to control antibiotic resistance, the team has established esearchers at the Centre for Genomic a web-based database of N. gonorrhoeae Pathogen Surveillance (CGPS) and genome sequences. The new, open-access the Sanger Institute have carried out R resource will support real-time surveillance the first pan-European genome-wide survey of gonorrhoea worldwide, enabling of gonorrhoea. Working with the European researchers to rapidly share, compare Centre for Disease Control and collaborators, and interrogate their data. the team combined sequencing the genomes of 1,054 samples of N. gonorrhoeae from 20 countries with the bacteria’s phenotypic and epidemiological data. The result is an easily accessible, online map of antibiotic Reference Harris SR et al. Public health surveillance of 88 resistance of the bacteria across the multidrug-resistant clones of Neisseria gonorrhoeae continent – PathogenWatch. in Europe: a genomic survey. Lancet Infectious million Diseases 2018; 18: 758–768. people worldwide are infected by sexually transmitted bacteria

Wellcome Sanger Institute Highlights 2018/19 27 Our work Parasites and Microbes

4

Some people infected were Tracking cholera asymptomatic – suggesting an important role for immunity in the spread of the – house by house disease. Many people were infected with more than one strain simultaneously. There are three to five million cases of cholera per year To determine the diversity of V. cholerae within a household, the team linked around the globe, with the household contact information to the greatest burden in regions bacteria’s genetic makeup. The data showed a high degree of genetic where it is endemic. relatedness between V. cholerae isolated eporting in Nature Genetics, from members of the same household. researchers at the Sanger Institute This indicates that infections were and collaborators in Bangladesh due to within-household transmission, R or exposure to a common source. studied the Vibrio cholerae bacterium in Dhaka, where the disease is hyper-endemic. To understand the spread of cholera more The city experiences two seasonal widely, the isolates from Dhaka were outbreaks of cholera every year. analysed in the context of a global collection Between 2002 and 2005, samples were of 513 additional 7PET genomes collected taken from cholera patients admitted to the between 1957 and 2014. The results Dhaka Hospital of icddr,b in Bangladesh. highlight a complex history of cholera Over a surveillance period of three weeks, in South Asia, and suggests that samples were also collected from other a connected, regional network of members of each patients’ household. cholera transmission exists. In total, 303 V. cholerae samples were The work highlights the importance of collected from 224 individuals in 103 relevant control strategies. Preventing households. the spread of cholera within the household The team sequenced the genomes of could enormously reduce cholera outbreaks. the V. cholerae samples. Each sample was This could have an impact, not only on mapped to a V. cholerae reference genome the individual households, but also on to determine their genomic similarities. the entire region. DNA analysis showed that six sublineages of the seventh pandemic V. cholerae El Tor (7PET) strain were circulating concurrently in Dhaka during the study. Each sublineage Reference waxed and waned with a different pattern Domman D et al. Defining endemic cholera at three levels of spatiotemporal resolution within Bangladesh. in that time. Non-pandemic strains of the Nature Genetics 2018; 50: 951–955. bacteria were also present. 3-5 million cases of cholera worldwide

28 Wellcome Sanger Institute Highlights 2018/19 What we do Our work Our approach Other information

Our work Parasites and Microbes

The disease has caused over 2,500 in Yemen

5

Previous theories suggested that the two Cholera in a outbreaks of cholera in Yemen in 2016 and 2017 were caused by two different strains conflict zone of the bacteria – but this study revealed they were caused by the same V. cholerae strain. As well as a brutal civil war, The team also uncovered that, unusually, Yemen has faced the worst the Yemeni cholera strain was susceptible epidemic of cholera since to several antibiotics. records began. They hope their work will enable better understanding of how cholera spreads, he disease has affected over one as well as inform outbreak prevention. million people and caused almost T 2,500 deaths. In 2018, the United Nations estimated that 16 million of the

29 million people in Yemen lacked access Reference to safe water and basic sanitation. Weill FX, et al. Genomic insights into the 2016–2017 cholera epidemic in Yemen. Nature 2019; 565: 230–233. Researchers from the Sanger Institute, Institut Pasteur and Médecins Sans Frontières worked together to characterise the cholera outbreak during a peak in 2017. Reporting in Nature they uncovered that the bacteria was likely introduced into Yemen with the movement of people from East Africa.

The team sequenced the genomes of 42 Vibrio cholerae bacterium samples from Yemen, together with 74 cholera samples from South Asia, the Middle East and Eastern and Central Africa. They compared the genomes to a global collection of over 1,000 cholera samples from the current and ongoing pandemic – which is caused by a single lineage of V. cholerae, called 7PET.

Wellcome Sanger Institute Highlights 2018/19 29 Our work Parasites and Microbes

6

Gabon, have studied the genomes of Reporting in the journal Nature Microbiology, How malaria P. falciparum’s relatives to uncover its they compared the genome sequences evolutionary history – including its jump of the Laverania species, building a picture jumped to from infecting gorillas to infecting humans. of their relatedness, and key events in their evolution. They estimate that P. falciparum humans The team sequenced the genomes of all first emerged 50,000 years ago, and fully known Laverania species. Previously, only diverged as a human-specific parasite The Plasmodium falciparum two other genome sequences were available 3,000–4,000 years ago. Contrary to parasite causes the most for these species. A key challenge was previous theories, they estimate that the obtaining the parasites from great apes. parasite jumped from gorilla to human deadly form of malaria in The researchers used blood samples more than once during this time. humans, and is responsible taken as part of routine health checks of chimpanzees and gorillas living in sanctuaries The team also sought to uncover for 90 per cent of malaria in Gabon. The samples were tiny, containing the genomic changes that allowed deaths worldwide. very few parasites, so they devised strategies P. falciparum to jump between species. to amplify the DNA and obtain good quality They detail changes in gene copy number t belongs to the Laverania subgenus genome sequences (see below). and structural variations in gene families. of parasites which infects great apes – In particular, they discovered the movement Iincluding humans, chimpanzees and of one cluster of genes that appears to be gorillas. P. falciparum is the only species a significant event that enabled the parasite of the subgenus to infect humans. to infect human red blood cells. Their work forms the basis for new investigations into Scientists from the Sanger Institute and how malaria parasites adapt to infect humans. their collaborators from the French National Center for Scientific Research (CNRS), French How to get high-quality National Research Institute for Sustainable genomes from Development (IRD), and the International miniscule samples Centre for Medical Research of Franceville, Reference • Deplete the host’s DNA Otto TD et al. Genomes of all known members of a from the sample Plasmodium subgenus reveal paths to virulent human malaria. Nature Microbiology 2018; 3: 687–697. • Apply parasite cell sorting • Amplify parasite DNA templates • Use long-read sequencing (single-molecule real-time Plasmodium falciparum technology) became fully human- specific approximately • Use short-read sequencing 3,000–4,000 years ago

30 Wellcome Sanger Institute Highlights 2018/19 What we do Our work Our approach Other information

Our work Parasites and Microbes

7

The single-celled parasite is technically Malaria screen difficult to study because its genome is rich in the DNA bases and thymine. finds genes However, the teams turned this disadvantage into a positive by utilising piggyBac- essential for life transposon insertional mutagenesis. Developed by the Sanger scientists, Malaria is caused by this technique disrupts and inactivates Plasmodium parasites – the genes at random. most deadly species being The researchers used it to create 38,000 Genes that are current candidates for Plasmodium falciparum. P. falciparum mutants that knocked out drug targets were identified as essential, the action of 87 per cent of the parasite’s including genes involved in the proteasome P. falciparum is responsible genes. Each genome was then sequenced degradation pathway. This is the likely target for half of the estimated with quantitative insertion site sequencing of the current first choice drug for treating 200 million malaria infections to pinpoint the mutations. Those genes malaria – . In contrast, many without mutations were considered of the genes that are targets of current worldwide. essential for survival. vaccines were not shown to be essential. The findings will help guide researchers o understand the parasite better, Reporting their findings inScience , they as they design new treatments. as well as how it might be tackled, identified 2,680 essential genes, and ranked T researchers from the University of their importance. Genes involved in translation, South Florida (USF) and the Sanger Institute RNA metabolism, and cell cycle control have uncovered which of its genes are were found to be the most important. essential for survival in humans. Such Reference 1,000 of the essential genes are conserved Zhang M et al. Uncovering the essential genes of genes might prove to be important targets across all Plasmodium species and have the human malaria parasite Plasmodium falciparum for treatments. completely unknown functions, suggesting by saturation mutagenesis. Science 2018; 360: a great deal of important malaria biology e a a p78 47. is yet to be uncovered.

8

The single-celled Plasmodium AP2-G is necessary for gametocyte Flipping malaria’s production in several Plasmodium species parasites that cause malaria – acting as a master switch to trigger sex master have complex lifecycles. gametocyte formation. Reporting in Nature Microbiology, the researchers developed switch esearchers at the University of Glasgow a system to increase AP2-G activity in and the Sanger Institute previously the rodent-infecting malaria parasite, Ridentified a genetic ‘master-switch’, Plasmodium berghei. They show that when known as AP2-G, involved in a key stage of AP2-G is amplified, it causes the majority parasite development. They have now created of parasites to convert to gametocytes. a system to study and manipulate this master switch in the lab – confirming its importance, This discovery allowed the team to chart and opening up new avenues for research the changes in gene activity through into preventing the spread of malaria. gametocyte development. They showed that gender-specific changes occurred Plasmodium parasites invade their host’s within six hours of induction. They also red blood cells, and then follow one of two identified several potential targets in AP2-G. developmental pathways. Some replicate If these functions were disrupted by drugs, asexually to sustain the infection – causing this could stop the parasite reproducing. the symptoms of malaria. Others convert into male and female gametocytes, the sexual stage of development. These

gametocytes are taken up by mosquitoes Reference and fuse together to form new parasites Kent RS et al. Inducible developmental that then infect the next person. How reprogramming redefines commitment to sexual gametocytes are made is not well understood, development in the malaria parasite Plasmodium berghei. Nature Microbiology 2018; 3: 1206–1213. partly because it is difficult to generate high numbers of male and female parasites in laboratory conditions.

Wellcome Sanger Institute Highlights 2018/19 31 Our work Parasites and Microbes

9

The teams sequenced 800 strains of However, the relationship between antibiotic Humans and S. aureus from 43 different host species, resistance of bacteria in livestock, and those isolated in 50 countries. They used the data in humans, is complex. To explore this cows – it’s to work out the evolutionary relationships further, the Sanger Institute team undertook between the species, pinpointing key a separate study of Escherichia coli, in complicated moments in the bacteria’s evolution. conjunction with researchers at the School of Hygiene and Tropical Medicine Parasitic bacterial species, The researchers show that humans were and Addenbrooke’s Hospital in Cambridge. like Staphylococcus aureus, the original hosts of the bacteria. It has made 14 jumps from humans to cows They studied the genomes of 431 E. coli that can jump between following the domestication of the animals. samples from livestock and 1,517 from livestock and humans pose They estimate 10 strains have jumped back people. All the samples were from the same from cows to humans over the last 5,000 location: the . They found a threat to public health and – 6,000 years, causing infections in that antibiotic resistance genes were not food security. populations around the world. While other shared between the two groups. The results, animals did swap strains, humans and published in mBio, suggests that, at least sually S. aureus is harmless to cows are the main reservoirs. in the populations sampled, it is unlikely humans, residing in the nose. that antibiotic resistance in E. coli bacteria UBut some strains, like MRSA, have Each time the bacteria jumps between is transferred from livestock to people. evolved resistance to antibiotics, and can species, it acquires new genes enabling prove deadly. The bacteria is also a major survival in its new host. The genes are burden for the agricultural industry – causing acquired by horizontal (bacteria-to-bacteria) a range of diseases from mastitis to gene transfer, possibly from other species References septicaemia in livestock. present in the host gut. The team also show Richardson EJ et al. Gene exchange drives the evidence of adaptive evolution of the strains ecological success of a multi-host bacterial pathogen. Researchers from the University of to their host environments. Nature Ecology & Evolution 2018; 2: 1468–1478. Edinburgh’s Roslin Institute, together with Ludden C et al. One health genomic surveillance of collaborators at the Sanger Institute, have In some cases, the genes acquired during Escherichia coli demonstrates distinct lineages and used genome sequencing to understand these processes can also confer resistance mobile genetic elements in isolates from humans how this parasitic bacteria can jump from to commonly used antibiotics. The team versus livestock. mBio 2019; 10: e02693–18. one host species to another. Their findings, found that the antibiotic resistance of human published in Nature Ecology & Evolution, and animal strains of the bacteria varies also show the evolutionary processes that – reflecting the different way antibiotics give rise to new, disease-causing strains. are used in medicine and agriculture.

This study shows how human bacteria can jump to animals and back. This information could help with designing more effective ways to prevent bacterial transfer in farms and better antibiotic practices.”

Professor Julian Parkhill, Wellcome Sanger Institute

32 Wellcome Sanger Institute Highlights 2018/19 What we do Our work Our approach Other information

Our work Parasites and Microbes

200 genes were identified which were involved in controlling the bacterium that causes food poisoning

10 AI identifies emerging pathogens

New, disease-causing Reporting in PLOS Genetics, the hosts and become more invasive – providing scientists developed and trained a important information about the bacteria’s bacteria continuously machine learning program using the functions and biology. This information emerge as strains evolve: genome sequences of Salmonella could help researchers design more some pose a major threat, enterica strains. They chose sample effective treatments in the future. Salmonella strains which live in the gut while others may only and cause mild food poisoning, as well The machine learning tool, which is cause mild symptoms. as those which escape the gut and cause freely available for others, is not limited serious infections, such as typhoid. The to Salmonella. It could be used to study hrough the widespread use of novel approach enabled them to identify emerging antibiotic resistance in any genome sequencing, researchers almost 200 genes involved in controlling bacterium. It could be used in real time, are tracking the spread of bacteria whether the bacterium will cause food to identify a dangerous strain of bacteria Taround the globe to inform public health poisoning or an invasive infection. They before it spreads and causes an outbreak. interventions. identified patterns of genetic changes which were associated with a bacteria This work is starting to form the evolving to become more dangerous. cornerstone of future healthcare Reference initiatives. However, the approach can The tool was then tested on Salmonella Wheeler E et al. Machine learning identifies only monitor the rise of drug resistance strains currently emerging in sub-Saharan signatures of host adaptation in the bacterial or greater virulence in the bacteria, it pathogen Salmonella enterica. PLOS Genetics Africa. It correctly identified two dangerous 2018; 14: e1007333. cannot currently predict when a new strains within a group of commonly epidemic may arise. In order to better circulating infections. The tool also revealed understand how bacteria become the precise genetic changes that enabled dangerous, Sanger Institute researchers the Salmonella strains to adapt to their and colleagues have developed a powerful new machine learning tool.

Wellcome Sanger Institute Highlights 2018/19 33 Tree of Life

In this section 1 25 Genomes for 25 Years

2 Biology’s moonshot: the Earth BioGenome Project

3 Darwin Tree of Life Project will create UK’s genomics ‘Domesday Book’

4 Project produces platinum-quality genomes

5 New approach opens up the insect world

6 Bird genomes aid conservation efforts

1

The 25 new species represent wildlife There have already been intriguing insights 25 Genomes in the UK, from well-loved iconic species from the sequences; king scallops are more to invasive plants which are threatening genetically diverse than humans, and Roesel’s for 25 Years biodiversity. In total, 20 of the species bush-crickets have genomes that are four were chosen by Institute researchers in times the size of the human genome. In 2018, the Sanger Institute collaboration with experts in conservation, turned 25 years old. As part natural history and ecology. The final five Many of the species have intriguing species were chosen by public vote as part adaptations and behaviours that could of its anniversary, it chose a of ‘I’m a Scientist Get Me Out of Here’. be understood by future research based uniquely Sanger-inspired way on their genome sequences. The bat Reading the genomes of such a diverse range genome might shed light on why they are to celebrate. of species brought interesting and surprising resistant to so many diseases, including he Institute sequenced 25 UK challenges. Some species were difficult cancer. While studying the genome of the species for the first time to provide to identify. Others have complex cellular common starfish could help with research a genomics resource for the life structures which made extracting DNA into limb regeneration. T for sequencing a particularly thorny issue. science research community.

The Institute was founded to sequence the The sequencing was completed in a little human genome. The international Human under 10 months, but the work to extract the maximum value from the data continues. Genome Project took 13 years to complete, The sequencing and the Institute became the largest single The sequence and associated data is being made freely available for researchers of all 25 species was contributor. Since then, its scientists have completed in a little under been involved in sequencing important worldwide via European Bioinformatics animal species, including the gorilla, mouse, Institute (EMBL-EBI) repositories. pig and zebrafish. Thousands of microbes, 10 months parasites and disease-causing bacteria have also had their genomes decoded by Institute teams.

34 Wellcome Sanger Institute Highlights 2018/19 What we do Our work Our approach Other information

25 Genomes for 25 Years of the Sanger Institute

King Scallop Carrington’s Featherwort Northern February Red Stonefly

New Zealand Flatworm

Lesser Spotted Golden Catshark Eagle

Summer Truffle

Grey Squirrel European Robin Red Squirrel Common Ringlet Pipistrelle Butterfly Bat

Fen Eurasian Raft Spider Otter

Asian Hornet Turtle Dove

Water Vole Roesel’s Red Mason Bush-Cricket Bee Common Starfish Brown Giant Trout Hogweed

Indian Balsam

Oxford Blackberry Ragwort

Genome size Categories

Flourishing – species on the up in the UK Floundering – endangered and declining species Dangerous – invasive and harmful species Iconic – quintessentially British species that we all recognise <1 GB 1–1.99 GB 2–2.99 GB 3–3.99 GB >4 GB Cryptic – species that are out of sight or indistinguishable from others based on looks alone Chosen by the public

Wellcome Sanger Institute Highlights 2018/19 35 Our work Tree of Life

2

moonshoot, the new sequences will When the began Biology’s revolutionise the understanding of life 25 years ago, it was not possible to imagine on earth. Genome sequences will provide how it would transform research into human moonshot: the insight into evolution and conservation, health and disease. The same is true for and provide new resources for researchers sequencing all life on earth. The discoveries Earth BioGenome in agricultural and medical fields. it could unlock may help to develop new treatments for infectious diseases, help Project This vast project is expected to take just create new bio materials, or generate 10 years to complete and an estimated new approaches to feeding the world. The Sanger Institute is $4.7 billion which, when accounting for leading the UK contribution inflation, is less than the cost of the Human Genome Project. to biology’s most ambitious project yet. The amount of biological data that will be produced and collected by the project he Earth BioGenome Project is a is expected to be on the exascale; more global collaboration of more than 17 than that accumulated by the current T institutions with the extraordinary aim biggest data producers in the world; Twitter, of sequencing the genomes of all complex YouTube and astronomy. The data will be life on the planet: 1.5 million known species stored in public-domain databases and of (animals, plants, fungi and access will be open to all researchers. protozoa). The work will create a genomic foundation for biology that could help preserve biodiversity and the sustainability of human societies.

Currently, fewer than 3,500, or about 0.2 per cent of all known eukaryotic species, have had their genome sequenced. Just 100 of those are high-quality reference sequences. Described as biology’s

The vast project is expected to take 10 years to complete

The project plans to sequence the genomes of 1.5 million known species Globally, more than half of on the planet the vertebrate population has been lost in the past 40 years. Using the biological insights we will get from the genomes of all eukaryotic species, we can look to our responsibilities as custodians of life on this planet.”

Professor Sir Mike Stratton, Director of the Wellcome Sanger Institute 36 Wellcome Sanger Institute Highlights 2018/19 What we do Our work Our approach Other information

Our work Tree of Life

The Sanger Institute is leading the DNA sequencing of all 66,000 species in the UK It is estimated that the Institute’s contribution to the project could cost approximately £100 million

3

he initiative, named the Darwin Tree It is estimated that the Institute’s Darwin Tree of Life Project, brings together the contribution to the project could cost T Sanger Institute; Natural History approximately £100 million over the first of Life Project Museum; Royal Botanic Gardens, Kew; five years, with the entire Darwin Tree of Earlham Institute; European Bioinformatics Life Project expected to run for 10 years. will create Institute (EMBL-EBI); Edinburgh Genomics, and many others. To derive the maximum scientific benefit UK’s genomics from the project, the Institute is establishing To prevent duplication of effort, the Sanger a new research programme, The Tree of ‘Domesday Book’ Institute will coordinate the Darwin Project Life. Organised in the same way as the and will liaise with the other initiatives and Institute’s other research programmes, As part of the Earth consortia that are also contributing to the it will recruit a cohort of Faculty to design BioGenome Project to Earth BioGenome Project, including the and deliver a range of laboratory and G10K Vertebrate Genomes Project and computational experiments using the sequence all complex life on the 10,000 Genomes Plant Project. project’s data and techniques. Their work earth, the Sanger Institute is will unlock insights into areas such as This vast project is only possible because population genomics, evolution and the leading the DNA sequencing of recent advances in sequencing technology, emergence of species, and why some of all 66,000 eukaryote not least in terms of speed. Advances in species develop certain genetic conditions information technology and the analysis of when others do not. species in the UK. biological data are also important factors, and it is expected that this project will accelerate further progress in both of these areas.

Wellcome Sanger Institute Highlights 2018/19 37 Our work Tree of Life

4 23,000 species of vertebrate face the threat The international team is part of the of extinction Project produces Genomes 10K consortium. The 15 new, high-quality reference genomes represent platinum-quality species from all five vertebrate classes; , birds, reptiles, amphibians and genomes fish. All the data is freely available in the Genome Ark database, the project’s In September 2018, the open-access library of genomes. Vertebrate Genomes Project, The researchers hope that their work will The experts concluded that single DNA which aims to provide highlight the current threats to survival that molecule long-read technologies always ‘platinum-quality’ (near so many species face. More than half of the give the highest quality results and should world’s vertebrate population has been lost be coupled with technologies that measure error-free and complete) in the past 40 years, and 23,000 species long-range genome interactions to assemble genome sequences of all face the threat of extinction. Among them the DNA reads into whole chromosomes. 66,000 vertebrate species is the kakapo, whose genome was read by The consortium has found that the common the project; there are less than 150 left alive. practice of merging an individual’s paternal on earth, completed its first and maternal chromosomes into one 15 reference genomes. A key part of the consortium’s work has genome was causing numerous errors. been to develop protocols and pipelines To overcome this, they now assemble anger Institute teams are playing a that can produce ‘platinum-quality’ genome paternal and maternal DNA separately. key role in the project by supplying sequences. The group drew together more SDNA sequencing and computational than 150 experts from academia, industry, analysis to the work. and government, representing over 50 institutions in 12 countries, to compare the major sequencing and analysis technologies.

5

he technique not only allows By removing two steps from the DNA New approach fine-grained exploration of mosquito preparation process – DNA shearing and T populations, it also enables biologists filtering – the team produced the whole opens up the to obtain de novo reference genomes for genome sequence of a single Anopheles individual insects and other small species. coluzzii mosquito at high quality and with insect world few gaps. As a result, nearly half of the Reporting their work in Genes, the team species’ previously unplaced DNA Sanger scientists, collaborating detailed how they generated a whole fragments have been assigned to their with Pacific Biosciences genome sequence from just 100 nanograms correct chromosomal position. of a single mosquito’s DNA – the equivalent (PacBio), have developed of half a mosquito. This is a full order This advance is an important boost for new laboratory preparation of magnitude less than the amount the Earth BioGenome Project, as it improves previously needed. the quality of DNA sequencing. It will also protocols that allow allow researchers to use specimens from researchers to accurately When sequencing a species for the first museums or CryoArks, where using the read the whole genomes time, longer reads of DNA are preferable. least amount of material possible is vital. PacBio’s Single Molecule, Real-Time And it will enable researchers to explore of individual mosquitoes. sequencing method produces long full genetic diversity of insects and other stretches of DNA with high accuracy, arthopods – the most diverse animal group but it requires approximately 5 micrograms in the Tree of Life – by sequencing individual of DNA. For small species this means that members of each species. researchers have to pool DNA from many organisms. The resulting reads can be difficult to align, increasing the chance of errors and gaps. Reference Kingan SB et al. A high-quality de novo genome assembly from a single mosquito using PacBio sequencing. Genes 2019; 10: 62.

38 Wellcome Sanger Institute Highlights 2018/19 What we do Our work Our approach Other information

With the golden eagle genome sequence, we will be able to compare the eagles being relocated to southern to those already in the area to ensure we are creating a genetically diverse population.”

Dr Rob Ogden, Head of Conservation Genetics at the University of Edinburgh and a scientific adviser to the South of Scotland Golden Eagle Project

6 Bird genomes aid conservation efforts

The golden eagle, swiftly project. Conservation scientists will The European robin genome may harbour be able to compare the eagles being the secrets of bird migration. The species followed by the robin and relocated to southern Scotland to live across large swathes of northern turtle dove, were the first those already in the area to ensure Europe and Siberia. Some overwinter in the species to have their that a genetically diverse population UK, escaping colder Scandinavian weather, is being created. while others live in the UK during the genomes sequenced as summer, migrating to southern Europe part of the Sanger Institute’s The European robin and turtle dove in the winter. were sequenced in partnership with the 25 Genomes project. University of Lincoln. The turtle dove Birds can use the earth’s magnetic field is one of the UK’s fastest declining bird for orientation during migratory journeys, he three birds had not previously species – since 1995, 94 per cent of and the magnetic compass in birds was been sequenced, and so the turtle doves have been lost and there first described in a robin. The European resource will be a huge boost to T are fewer than 5,000 breeding pairs left robin genome will allow researchers to those studying these iconic species. in the UK. Its genome sequence will help identify what’s driving migration in birds, The golden eagle sequence was researchers understand the pressures and understand the variability of migration completed together with researchers that are affecting the birds – be that both within a species and more widely. at the University of Edinburgh. It will disease or a decline in food. It will also aid monitoring of existing, reinforced aid practical conservation efforts, and reintroduced populations of golden by enabling conservationists to eagles, such as those in the South of maximise the genetic diversity Scotland Golden Eagle translocation of introduced populations.

Wellcome Sanger Institute Highlights 2018/19 39 Our approach

We foster strong collaborations with scientists, clinicians, institutions, governments and society for mutual benefit

42 Scale 48 Influence Genomic inquiry requires vast By leading global initiatives and facilitating volumes of data, experimental cross-cutting partnerships we seek to models and computational power. lay the foundations for a strong and vital Our Institute’s unique, scalable and future of genomic research, data sharing robust infrastructure delivers – both and clinical application. for us and researchers worldwide.

44 Innovation 50 Connections To take our research findings to the We use the power of the internet and next level and deliver transformative collaboration tools to build genomic technologies we work in collaboration research capacity worldwide and with biotechnology and pharmaceutical facilitate the next wave of discovery. industries and funders.

46 Culture As genomic research begins to impact clinical practice and society, our researchers are crossing traditional divides to work with entrepreneurs, health services and society.

40 Wellcome Sanger Institute Highlights 2018/19 What we do Our work Our approach Other information

Read more about our superfast and super-accurate sequencing Page 43

Discover how we hacked our genomic future Page 51

Find out about the John Sulston’s legacy Page 47

Wellcome Sanger Institute Highlights 2018/19 41 Scale

In this section 1 Flexible cloud computing wins peer award

2 Institute establishes cancer organoid production line

3 Superfast, super-accurate sequencing

1

he team won HPCWire’s 2018 Flexible cloud Readers’ Choice Award for Best Use Genomic and clinical of High Performance Computing in T data challenges computing wins the Cloud. The worldwide community cited the creative way that the Institute combines peer award cloud computing with its onsite data centre Security The sensitive and personal to store, analyse and share genomic data. The Sanger Institute’s High nature of human genome and Performance Computing Genomic data, especially when coupled clinical data demands the with clinical data, poses many challenges highest levels of protection team has been praised by – see box to the right. These concerns can the global IT industry for the often rule out public cloud storage, yet Jurisdiction standard onsite data centre solutions are Such data must be stored in way it has pioneered flexible, increasingly unable to cope with the volume locations that comply with the scalable and secure computing and complexity of data being generated. EU’s General Data Protection Regulation (GDPR) and UK to analyse thousands of The answer was to put the cloud into the data centre. privacy laws human genomes. To ‘put the cloud in a box’ the team has Volume deployed a private OpenStack Cloud The scale of the data is vast, IT environment within its data centre. porting the Institute’s current This allows scientists to run their own genomic data to the external computational experiments, by creating cloud would take years an appropriately sized ‘virtual computer’ that uses the speed and cost-efficiency Cost Storing the Institute’s tens of of the onsite hardware to deliver its results, petabytes of data in the cloud while keeping all sensitive data behind would be prohibitively expensive formidable firewalls.

This Flexible Compute Environment not only provides rapid, reliable results for Sanger scientists, but for worldwide collaborators too. External scientists can upload the software and data needed to run their analysis, which then uses the data centre’s vast compute to query its enormous data sets. The results, shorn of any identifying data, are then sent back.

In this way, only small amounts of Winning the Readers’ non-sensitive data pass over the internet, Choice Award is especially sparing the collaborator’s internet bandwidth and computing facility, humbling as it means that our while also ensuring the highest security. peers worldwide acknowledge the groundbreaking work of our scientific computing and informatics teams.”

Tim Cutts, Head of Scientific Computing at the Wellcome Sanger Institute 42 Wellcome Sanger Institute Highlights 2018/19 What we do Our work Our approach Other information

2 The Sanger Institute has produced The work was a two-year pilot project Institute between the Institute’s cancer researchers, Scientific Operations teams, and Cancer establishes Research UK (CRUK) to develop efficient 100 networks and pipelines for organoid cancer organoids cancer organoid production. It forms part of the Institute’s lead role in the UK arm of the Human production line Cancer Models Initiative that aims to generate approximately 1,000 new models The Sanger Institute has to empower the study of cancer initiation, produced 100 cancer development, and progression. organoids – living 3D models CRUK provided access to its unique clinical that more accurately represent network to provide the tumour samples at Sanger’s researchers have sequenced the the human tissues tumours the time of surgical removal, along with their organoids’ genomes to create profiles of related clinical data. Working with surgeons their mutations, combined this with their grow in – to provide the next and pathologists across the UK, Sanger medical information, and then screened generation of cancer models Institute scientists established an efficient them against hundreds of drugs to identify pipeline that covers ethical agreements, their susceptibilities. The models are now for researchers worldwide. transport infrastructure and robust laboratory available from the ATCC biorepository, pipelines to start culturing the tissues at the and their data is stored on the Cell Model hese 100 cover pancreatic, Institute within 36 hours of surgery. Passports website. oesophageal and colon tumours T at different stages of development – they were chosen because these cancers currently have few treatment options and poor survival rates.

3

machines to the work. Originally each Superfast, machine was expected to read approximately UK Biobank Vanguard 48 gold-standard human genomes per run, project to sequence super-accurate at three runs per week. But, through a the genomes of number of DNA preparation and wash cycle sequencing optimisations introduced by the Sequencing R&D team, the amount of high-quality When the Sanger Institute human genomes produced per run is 50,000 was chosen to deliver the UK expected to rise even further. volunteers Biobank Vanguard project to The Institute is delivering approximately sequence the genomes of 1,200 human genomes to the project each week, and at higher quality and 50,000 volunteers, the scale coverage than required. This is the equivalent of the task was substantial. of one gold-standard human genome every 8.5 minutes, and is expected to become o read each genome to even quicker. gold-standard level requires each T of its three billion bases to be read But the benefits are not just for UK Biobank- 30 times over. Therefore approximately based research: Sanger researchers can 4,500,000,000,000,000 bases of DNA now call upon this same pipeline. By the needs to be sequenced – only 10 per cent end of 2019, the majority of the Institute’s less than the entire amount of DNA the high-throughput DNA sequencing pipelines, Institute had read in its first 24 years. , RNA pipelines and sequencing from the 10X Genomics To achieve this goal, the team has dedicated platform will be carried out solely on eight of their fleet of 10 NovaSeq sequencing its other two NovaSeq machines.

Wellcome Sanger Institute Highlights 2018/19 43 Innovation

In this section 1 Microbiotica – getting to the guts of disease

2 Spinning out success

3 Sanger-developed technology can stop outbreaks

1

Roche Group. The two companies have Microbiotica – signed a strategic partnership to discover biomarkers and treatments for inflammatory getting to the bowel disease, which could see Microbiotica receive up to $534 million in upfront and guts of disease milestone payments. In addition, the company is using Genentech’s patient The Sanger Institute’s samples to further expand its culture microbiome-based spin-out collection and reference genome database. company, Microbiotica, has Microbiotica is also collaborating with the had a stellar year. University of Adelaide to analyse samples that have successfully treated ulcerative t specialises in analysing the gut flora of colitis. The aim is to identify the key bacteria patients, using state-of-the-art culturing, involved to develop a bacterial product that Isequencing and informatic techniques will reset the gut’s microbiome. developed at the Institute. By cataloguing the human gut microbiome at an industrial Based on these developments, the company scale, it has built up an extensive culture successfully completed its second round collection of human gut bacteria and of funding, raising a further £4 million reference genome database of gut from Seventure Partners. bacterial species.

The company uses these tools to identify the makeup of the bacterial community within individual patients. By applying artificial intelligence to the results, it is discovering microbiome ‘signatures’ that can be linked to specific diseases and conditions. These are then validated in mice with a humanised microbiome.

Microbiotica’s platform has attracted the attention of biopharma world leader Genentech, a member of the

Microbiotica has signed a deal worth up to $534 million with Genentech

44 Wellcome Sanger Institute Highlights 2018/19 What we do Our work Our approach Other information

2 Congenica helped to successfully deliver the UK’s by partnering with Digital China Health Spinning out Technologies Cooperation Limited to develop a Chinese version of its 100,000 success SapientiaTM decision support platform. Genomes Project Two established Sanger In 2018, therapeutics company Kymab Institute spin-out companies was tipped by US finance and investment magazine Forbes to become a $1 billion are making headlines valued company. There are only 300 or worldwide as they use so ‘Unicorns’ worldwide. genomic technologies The company generates antibody-based to advance healthcare. treatments and vaccines from mice with a humanised immune system whose origins enome analysis company, lie in the Sanger Institute. Two of its fully Congenica, is leading the way in human monoclonal antibodies are Gpersonalised medicine. In 2018, undergoing clinical trials. it became a key partner to deliver the world’s first routine national genomic The first, KY1005, targets a molecule at medicine service for the UK’s National the heart of a number of T-cell mediated Health Service (NHS). Founded on the inflammatory diseases, including atopic Institute’s Deciphering Developmental dermatitis and rejection of transplanted Disorders collaboration with the NHS organs. The second, KY1044, is to provide genomics-based diagnoses, being trialled in collaboration with the the company helped to successfully deliver pharmaceutical company Roche, to the UK’s 100,000 Genomes Project. treat solid tumours. It boosts the immune response by reducing the number of In addition, Congenica is supporting Regulatory T cells within tumours and China’s 100K Wellness Pioneer Project and stimulating T effector cells. is deepening its relations with the country 3

outbreaks. Where once diagnoses could Sanger- take days, now whole-genome sequencing combined with a world-leading data developed analysis platform can reveal the bacteria’s species, strain and relatedness in a matter technology can of hours. In addition, by comparing different patient’s bacterial genomes down to the stop outbreaks individual DNA , the service flags up emerging patterns of transmission, In 2012, Sanger Institute enabling infection control teams to act to scientists applied genome stop the outbreak. sequencing to track an To deliver a high-volume, low-cost and MRSA outbreak in a UK real-time service Next-Gen Diagnostics combines on-site robotic sample hospital’s neonatal ward preparation and sequencing, with reference and their insights helped genome libraries and artificial intelligence to break the transmission. in a single package.

he study demonstrated that sequencing To demonstrate its capabilities, the bacteria’s whole genomes could company is conducting the first-ever trial produce real-time results with a of prospective whole-genome sequencing T of MRSA samples as an infection control clinical impact. Six years later, Sanger Institute technology has underpinned the system. Funded by Wellcome and the UK formation of a new company based on Government’s Health Innovation Challenge this approach to deliver a fully automated Fund, the study is being run in collaboration service that can be deployed in clinical with the microbiology and infection control settings worldwide. teams at Addenbrooke’s Hospital, Cambridge. The technology is also being Next-Gen Diagnostics offers a trialled at Mayo Clinic in the US. comprehensive genomic analysis service to monitor, diagnose and stop infectious

Wellcome Sanger Institute Highlights 2018/19 45 Culture

In this section 1 Promoting equality, diversity and inclusion

2 Committed to our technicians

3 John Sulston’s legacy – the Sanger Prize

1

To resolve these issues, in January 2019, Promoting the Sanger Institute joined a new initiative that seeks to change the approach and equality, diversity design of science across academic research, funding and the commercial and inclusion research sector.

UK science has a problem The Equality, Diversity and Inclusion – it is biased. But the bias in Science and Health (EDIS) coalition is seeking to create a community of is not only in terms of who organisations to drive forward evidence- conducts the research, but based policies to produce lasting change to the makeup of research teams, scientific also in terms of how it is focus and funding priorities. Its work conducted, on whom, and will also promote greater diversity and on what it is focused. inclusion in volunteer cohorts to ensure that life-changing medical and scientific or example, only 2 per cent of UK research benefits all. professors are black women and female scientists are underrepresented By becoming a member, the Institute F will provide financial support, share best at the highest positions in UK science. Equally, 80 per cent of individuals studied practice ideas, and help build the evidence in genome-wide association studies are base for future change strategies. In 2019, of Western European descent, while most the Institute will attend an EDIS symposium biomedical animal-based studies have to explore how research and experimental 80% a large bias to male populations. design can overcome biases in the sex of cells and animals studied; the ethnicity of individuals studied in and ancestry of participants in genome genome-wide association sequencing projects; and the diversity studies are of Western of participants in clinical trials. European descent

In 2017, scientists studied 234 traits in <50,000 mice, across 10 centres

In control mice, sex had an effect on: In mutant mice, sex modified the effect on:

56.6% 9.9% 17.7% 13.3%

quantitative quantitative quantitative quantitative traits traits traits traits Reference Karp NA et al. Prevalence of sexual diamorphism in mammalian phenotypes. Nature Communications 2017; 8: 15475.

46 Wellcome Sanger Institute Highlights 2018/19 What we do Our work Our approach Other information

2

Supported by the Science Council Committed to and Gatsby Charitable Foundation’s Visibility ‘Technicians make it happen’ campaign, Ensure that all technicians our technicians the Commitment addresses key issues are identifiable and that their facing technicians working in higher The Sanger Institute’s unique contribution is visible within education and research – see box. and beyond the organisation ability to conduct large-scale, Our two-year, technician-led action plan high-throughput genomic Recognition has been developed and will be regularly Ensure that technicians receive experiments is founded evaluated by a steering group of more the acknowledgement and on an army of technicians than 25 technicians and managers from recognition they deserve at across all the Institute’s research areas all levels of the organisation. who support our Faculty and disciplines. They will facilitate wider Provide opportunities for research teams. networking and support between professional registration technicians and create new processes heir dedication and skill to deliver to improve technicians’ training, access Career Development animal care, computational analysis, to professional qualifications, and career Enable career progression T DNA sequencing, data storage, and pathways. In addition, they will create opportunities for technicians wet-lab experiments powers all our research. new ways to celebrate the vital role through the provision of clear, of technicians in delivering the Sanger documented career pathways To ensure that our staff are rightly Institute’s science at scale. recognised and that the Institute is Sustainability equipping, empowering and inspiring Ensure the future sustainability the next generation of technicians, of technical skills across the we are delighted to have signed organisation and that technical up to the Technician Commitment. expertise is fully utilised

3

The first winner, back in 2005, was Anna John Sulston’s Protasio, who came from Uruguay and chose to work on the C. elegans project. legacy – the This sparked a research interest in parasite genomics that she pursued first at the Sanger Prize University of Cambridge, and then for nine years under Matt Berriman at the In 2018 the founding director Sanger Institute. Anna is now working of the Sanger Institute, at the University of Cambridge, but continues her association with the Sir John Sulston, died. Institute as a lead content developer But his legacy lives on in and educator for Advanced Courses a scheme which reflects and Scientific Conferences. many of his qualities. In the years since, winners have come from countries including China, India, Mexico hen Sir John won his Nobel Prize and Uganda. Many have been inspired to in 2002, he donated his prize pursue a career in genomics and are now W money to found a charity that based in institutes in France, the USA, India, enables undergraduate students from low- Uganda and Estonia. They are applying and middle-income countries to come genetics and genomics to antibiotic More than and work at the Institute. The annual resistance in Escherichia coli, drug targets Sanger Prize awards one student a in parasitic flatworms, genome variation three-month internship with one of our across and within species, and the role research groups to practise cutting-edge of epigenetics and gene regulation in 450 genomic research – all travel, living and neurodevelopment. students applied for research expenses paid. More than 450 the 2019 Award students applied for the 2019 Award.

Wellcome Sanger Institute Highlights 2018/19 47 Influence

In this section 1 Cambridge hub of HDR UK aims to transform diagnoses

2 MPs offer scientists top influencing tips

3 GA4GH Connect – making big data a reality

1

The hub has four key areas of activity: 3. Monitor the carriage, transmission and Cambridge hub evolution of infectious bacteria in the 1. Apply genomics, genetics, single cell general population to understand how of HDR UK aims studies, medical imaging and electronic organisms that live harmlessly on large health records to correlate the symptoms numbers of people can give rise to to transform of disparate diseases with shared life-threatening outbreaks and drive cellular signalling pathways and antibiotic resistance. diagnoses molecular processes. 4. Develop IT and data analysis infrastructure Health Data Research UK (HDR 2. Create a knowledge base of the clinical to allow seamless and secure gathering UK) has awarded £5 million to course of rare and extreme diseases. of sensitive information, computational By analysing the health records of exploration, and dissemination of results. the Sanger Institute, European individuals with genomic diagnoses Bioinformatics Institute made by NHS-based studies, including The hub’s findings could drive new the Deciphering Developmental Disorders therapies, optimise treatment strategies, (EMBL-EBI) and the University (DDD) project, the Prenatal Assessment and lay the foundations of truly personalised of Cambridge and Cambridge of Genomes and Exomes (PAGE) and medicine. , it may be possible University Hospitals to form to develop novel therapies and optimise HDR UK’s Cambridge hub. decision making.

he hub is exploring how genomics and single-cell studies can be T combined with NHS health data to reveal the molecular underpinnings of disease symptoms and prognoses. This work could shift the paradigm of medical diagnosis from observed physical characteristics to the underlying molecular processes.

£5 million awarded to the Sanger Institute by Health Data Research UK

48 Wellcome Sanger Institute Highlights 2018/19 What we do Our work Our approach Other information

2

he adage ‘Information is dead Current MPs Heidi Allen and Vicky Ford unless it is read’ is never truer than were joined by minister Nicola Blackwood T when applied to research findings and policy professionals to highlight the and decision makers. most effective ways in which scientists can present their evidence. Locked away in research papers or conference reports, the vital knowledge A key lesson for the delegates was the MPs and policy makers need may never need to target honed and brief messages be seen. Even if the research paper is read, to time-poor MPs. Another insight was the without context or guidance, the reader importance of timing and context: the MPs offer may not be able to extract its policy decision makers highlighted the key stages implications or recommendations. Equally, in the legislative process where scientific scientists top researchers can find the complex workings evidence would have maximum impact. of the UK’s Civil Service, Parliament and influencing tips Government impenetrable. The conference was a resounding success with many delegates wanting to build their Scientific research and To bridge this divide, the Sanger Institute’s engagement with policy makers. Policy team worked with Connecting knowledge has much to Science’s Advanced Courses and Scientific offer the UK’s economy, Conferences to run a two-day course that but it should also help guide brought MPs and scientists together. The course was designed to guide scientists strategic policy making on through the policymaking process to help issues that impact health them engage with decision makers. and science.

3

Significant progress has been made in GA4GH Connect areas ranging from healthcare and research to patient advocacy and information – making big data technology, but the work is far from done. To deliver true interconnectivity, the alliance a reality launched GA4GH Connect – with the aim of enabling responsible sharing of clinical- Big data presents exciting grade data by 2022. new opportunities for genomic The work is being carried out by eight and medical researchers to alliance work streams, but many of their unlock insights into health efforts would not be possible without the administrative and technical support and disease. provided by the Sanger Institute. Staff at ut the hurdles to exploiting its power the Institute help coordinate and guide the 2,000+ are significant: incompatibilities workstreams and others contribute to the subscribers between different approaches to data development of technology standards, B policies and guidance in regulation, ethics, collection, storage and analyses can make combining and comparing results challenging. and data security.

To address this issue, the Sanger Institute In this way, the Sanger Institute is playing 500+ was a founding member of the Global a key role in shaping the future of genomic organisational Alliance for Global Health (GA4GH) in 2013 and medical research worldwide, in areas members and, along with the Broad Institute and as diverse as genomic variation and the Ontario Institute for Cancer Research, annotation information, clinical and provide the Secretariat of GA4GH. The phenotypic data, ethical data gathering, initiative seeks to enable effective and and researcher identification. responsible sharing of health and genomic 71 data by developing a common framework countries of standards and harmonised approaches.

Wellcome Sanger Institute Highlights 2018/19 49 Connections

In this section 1 Campus’ unique collaboration gains new partners

2 Sanger science at the heart of research excellence

3 Hacking our genomic future

1

The initiative seeks to transform drug Institute, allows researchers to explore Campus’ unique discovery by systematically identifying variant-gene-trait associations from and prioritising drug targets. It combines UK Biobank and the GWAS Catalogue. collaboration collaborator’s skills and technologies to create a critical mass of expertise that could The new genetics portal enables gains new not exist in one organisation. Large-scale researchers to browse, visualise and genomic experiments and computational interpret human genetics and genomics partners techniques from the public domain data to unravel the biology of human are blended with pharmaceutical R&D diseases, inform drug repurposing, Open Targets is a pioneering approaches to identify causal links and predict toxicity effects. between targets, pathways and diseases. pre-competitive public-private In addition, new Director Ian Dunham, partnership between the Each new partner shares new expertise an Honorary Faculty at the Sanger Institute, ’ with the collective. Celgene brings new is seeking to increase the collaboration’s approaches to the discovery and development use of single-cell sequencing, CRISPR academic institutes – the of therapies in cancer, immune-inflammatory and artificial intelligence. Sanger Institute and European and other unmet medical needs. While Bioinformatics Institute Sanofi’s involvement increases the initiative’s expertise in immunology, (EMBL-EBI) – and the oncology, neurosciences and . pharmaceutical industry. The freely available Open Targets Platform aunched in 2014, the collaboration contains over 28,000 targets, in excess of The Open Targets has five commercial partners: GSK, 3,000,000 associations, covering more than Platform contains over L Biogen, Takeda, Celgene and Sanofi 10,000 diseases. And a new tool – Open – with the latter two joining in 2018. Targets Genetics – developed by the Genetics Core Team from the Sanger 28,000 targets

We have built strong partnerships and established powerful research programmes that exploit advances in genetics and genomics to improve drug target identification.”

Ian Dunham, Director of Open Targets and Honorary Faculty at the Sanger Institute

50 Wellcome Sanger Institute Highlights 2018/19 What we do Our work Our approach Other information

2

Mitochondrial Biology, Biostatistics and The research will be supported by Sanger science Epidemiology Units. a wide range of Institute researchers. The Functional Genomics work will at the heart of Three of the four key areas of research collaborate with Sarah Teichmann’s team focus for the Centre are led by Sanger and the Human Cell Atlas project, with research Institute-affiliated scientists: additional input from Daniel Gaffney’s and Gosia Trynka’s teams on computational excellence approaches and human induced pluripotent stem cells. The Population and Data Sciences In 2019, the British Heart studies will draw on the knowledge of Matt Foundation (BHF) doubled Cardio-metabolic Hurles’ team when combining genomic data its funding for the Cambridge Medicine with medical records. – Associate Faculty Professor The Centre will also call on state-of-the-art BHF Centre of Research Antonio Vidal-Puig Excellence to enable five more computational analyses conducted on the Functional Genomics Institute’s high-performance computers. years of exploration into the While the Institute’s sequencing, genotyping, – Faculty Professor Nicole underlying biology, diagnosis, single-cell, and genome engineering teams Soranzo will enable experimental exploration and prevention and treatment confirmation of the centre’s discoveries. of . Population Sciences – Faculty Professor John Danesh Through this work, the Sanger Institute will he Sanger Institute has extensive Vascular Medicine help to drive the development of new clinical involvement with the centre, which products, tools and applications, particularly – Professor Ziad Mallat, University brings together teams from the for pulmonary hypertension, and large artery T of Cambridge University of Cambridge, the Sanger and small vessel stroke. Institute, Babraham Institute, and MRC

3

The hackathon had a strong entrepreneurial Hacking our flavour, bringing together academic and industry researchers in 18 multidisciplinary genomic future teams. Each group then tackled one of five sponsored challenges, supported by 20 For two days in July 2018, mentors from incubator and accelerator the Wellcome Genome programmes, and Cambridge health Campus Conference Centre technology companies. became a ‘Dragons’ Den’ After an intense period of analysing vast data sets and resource development with of challenge, innovation, support from mentors, day two saw the and translation of genomic teams pitch their viability and value of their science into medical benefit. solutions to a panel of business leaders, health technology company CEOs and n the UK’s first Genomes and Biodata leading researchers. Hackathon, 111 data scientists, genomics researchers, usability experts, patient The event produced a range of solutions; I from a mobile, AI-augmented device that advocates and entrepreneurs sought to hack the future of healthcare. allows patients with gut diseases to monitor their own microbiome to a solution that The Institute has a strong history of uses known drug-symptom relationships 111 to identify candidates for drug repurposing. developing and scaling genomic technologies data scientists, genomics to benefit healthcare. To drive the next wave researchers, usability experts, of innovation, the Sanger Institute’s The winning solutions are attracting further interest from industry and it is hoped that patient advocates and Enterprise and Innovation team worked with entrepreneurs sought the Genome Campus’ Connecting Sciences a number will be taken forward for further development. In fact, one team to hack the future Advanced Courses and Scientific of healthcare Conferences team to create a heady presented their idea to the NHS Chair of 48 hours of collaboration and ideation. Sir Munir Pirmohamed.

Wellcome Sanger Institute Highlights 2018/19 51 Other information Image Credits

Image Credits All images belong to Wellcome Sanger Institute, Genome Research Limited except where stated below:

Outside Page 22 – Brain PET scan – Jens Maus, Wikimedia Commons Front Cover – Plasmodium parasite inside a red blood cell Child hands and blocks – FeeLoona, pixabay.com – National Institute of Allergy and Infectious Diseases Page 23 – Hip replacement – NIADDK, 9AO4 (Connie Raab- (NIAID)/Rocky Mountain Laboratory contact); NIH Neisseria gonorrhoea bacteria – National Institute of Page 24 – Foetal scan – Dr Wolfgang Moroder, Wikimedia Allergy and Infectious Diseases, National Institutes Commons of Health Mother and baby – iStock Cancer cell – Shutterstock Page 25 – Hepatitis C virus – iStock Inside Page 26 – MRSA cells – National Institute of Allergy and Front Cover – E. coli bacteria on Trichuris egg – Dave Goulding, Infectious Diseases Wellcome Sanger Institute, Genome Research Limited Page 27 – Neisseria gonorrhoea bacteria – National Institute Page 3 – Child having an injection – Shutterstock of Allergy and Infectious Diseases, National Institutes Mother and daughter – iStock of Health Page 5 – Mosquito – Kevin Mackenzie, University of Schistosoma mansoni worm – Dave Goulding, Aberdeen, Wellcome Images Wellcome Sanger Institute, Genome Research Grandmother and daughter – sasint, pixabay.com Limited Plasmodium parasite inside a red blood cell – Page 28 – Vibrio cholerae bacteria – Dave Goulding, Wellcome National Institute of Allergy and Infectious Diseases Sanger Institute, Genome Research Limited (NIAID)/Rocky Mountain Laboratory Dhaka – rahmatullah77, pixabay.com Vibrio cholerae bacteria – Dave Goulding, Wellcome Page 29 – Mother and child hands – iStock Sanger Institute, Genome Research Limited Page 30 – Gorilla – Franck Prugnolle (CNRS/IRD/CIRMF) Page 6 – Acute myeloid leukaemia cells – National Page 31 – Plasmodium parasite inside a red blood cell Cancer Institute – National Institute of Allergy and Infectious Diseases Forearm – stevepb, pixabay.com (NIAID)/Rocky Mountain Laboratory Golden eagle – dawnydawny, pixabay.com Plasmodium parasites inside mosquitoes – Claire Schistosoma mansoni worms – Dave Goulding, Sayers, Wellcome Sanger Institute, Genome Wellcome Sanger Institute, Genome Research Limited Research Limited Page 9 – Mother and baby – iStock Page 32 – Cows – Grant Brookes, unsplash.com Foetal scan – Dr Wolfgang Moroder, Wikimedia Staphylococcus aureus – Annie Cavanagh, Commons Wellcome Images Vibrio cholerae bacteria – Dave Goulding, Wellcome Page 33 – Salmonella bacteria – CDC/James Archer Sanger Institute, Genome Research Limited Page 34 – Greater horseshoe bat – Marie Jullion, Wikimedia Page 11 – Mother and daughter – iStock Commons Acute leukaemia – Armed Forces Institute of Page 35 – Roesel’s Bush-Cricket – Ryan Hodnett, Wikimedia Pathology (AFIP) Commons Page 13 – Myelofibrosis Reticulin Stain – Ed Uthman, Golden eagle – dawnydawny, pixabay.com Wikimedia Commons Robin – Allan Cox, unsplash.com Vertebral osteoblastoma – Dr Henry Knipe, Turtle dove – zoosnow, pixabay.com Radiopedia.org rID 4885 Page 38 – Canadian Lynx – Eric Kilby, Wikimedia Commons Page 14 – Ewing sarcoma in the tibia of a child – Michael Mosquito – James Gathany, CDC/PHIL Richardson, Wikimedia Commons Page 39 – Golden eagle – Kdsphotos, pixabay.com Human kidney cortex – Menna Clatworthy, Page 45 – MRSA cells – National Institutes of Health University of Cambridge Page 48 – Neisseria gonorrhoeae – CDC/PHIL Page 15 – Red blood cells – Annie Cavanagh, Wellcome Page 49 – Houses of Parliament – Marcin Nowak, Images unsplash.com Page 19 – Pregnant woman – iStock Page 50 – Medicine pills – rawpixel, unsplash.com Page 20 – Mother and child – iStock Page 51 – Mouse hearts – NIMRC, MRC Page 21 – Sisters – With special thanks for permission from their mother

52 Wellcome Sanger Institute Highlights 2017/18 What we do Our work Our approach Other information

Other information Institute Information

Wellcome Sanger Institute Highlights 2018/19 The Wellcome Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London NW1 2BE.

First published by the Wellcome Sanger Institute, 2019.

This is an open-access publication and, with the exception of images and illustrations, the content may, unless otherwise stated, be reproduced free of charge in any format or medium, subject to the following conditions: content must be reproduced accurately; content must not be used in a misleading context; the Wellcome Sanger Institute must be attributed as the original author and the title of the document specified in the attribution.

Printed by Park Communications on FSC® certified paper. Park is an EMAS certified company and its Environmental Management System is certified to ISO 14001. This document is printed on Soporset Offset, a paper containing 100% virgin fibre sourced from well-managed, responsible, FSC® certified forests and other controlled sources. Designed and produced by

Wellcome Sanger Institute Highlights 2017/18 53 Wellcome Sanger Institute Tel: +44 (0)1223 834244 sanger.ac.uk