
Science at scale

Highlights 2017/18 Contents

What we do
  Director’s Introduction
  Wellcome Sanger Institute at a Glance

Our work
  Cancer, Ageing and Somatic Mutation
  Cellular Genetics
  Human Genetics
  Infection Genomics
  Malaria

Our approach
  Scale
  Innovation
  Culture
  Influence
  Connections

Other information
  Image Credits
  Institute Information

25 years of pushing the boundaries of breakthrough

Through our big and bold ideas, scientific independence and cutting-edge infrastructure, we engage in long-term exploratory projects that influence science and impact people’s lives on a global scale.


Wellcome Sanger Institute Highlights 2017/18

What we do

Director’s Introduction

This year the Wellcome Sanger Institute celebrates its 25th anniversary. We have come a long way in that time: from helping to deliver the first reference human genome to asking bold questions about the genomics underlying health and disease. We are proud of our past, but excited about our future.

The influence of genomics has progressed in the same way, with society and the NHS embracing genomic technology to transform health and lifestyle. Meanwhile, the interrogation and interpretation of genomic data, once the field solely of data specialists, is now being carried out by schools across the UK. This remarkable shift is encapsulated in Genome Decoders: Whipworm, a pioneering collaboration between Connecting Science, Sanger researchers, the Institute for Research in Schools and school students.

Our researchers are now able to interrogate an adult genome to understand its entire mutational history all the way back to the 4-8 cell stage of the embryo and discern the injuries and insults suffered by the genome on the way. They survey and understand the landscape of diversity and mutation spread across countries and continents to discern the rise of antibiotic and pesticide resistance. They develop new techniques to create new models of disease that more closely mirror the human condition.

Our scientists are setting new challenges and goals that seem as impossible now as the Human Genome Project did just 25 years ago. In scale: from the Human Cell Atlas – that aims to unpick the entire human body cell by cell – to sequencing the genomes of all life on earth. In new scientific fields: such as synthesising genomes. In health: from developing early warning systems for infectious disease outbreaks to penetrating malaria’s ever-changing defensive coat to produce effective vaccines and treatments.

As we look to the next 25 years, the Sanger will continue to evolve its science focus to navigate and lead in these emerging fields.

Professor Sir Mike Stratton, Director Wellcome Sanger Institute

At a Glance

2017 timeline

Sanger responds to Government Life Sciences Industrial Strategy
First mutations in human life discovered
Hidden MRSA outbreaks detected by routine genomic surveillance
14 new childhood developmental disorders found by DDD
Malaria needs 2/3rds of its genes for growth
Takeda joins Open Targets initiative
Isolated Greeks reveal healthy heart secrets
NIHR funds Global Health Research Unit
Insecticide resistance in African mosquitoes mapped
Richard Durbin recognised with Royal Society’s Gabor Medal
Professor Ele Zeggini becomes a World Economic Forum Young Scientist
Sanger wins £20 million CRUK Grand Challenge
Cancer knowledge banks are feasible
Dr Matt Hurles elected to Academy of Medical Sciences
Number of mutations needed for cancer discovered
Dr Sarah Teichmann made Helmholtz International Fellow
New human cell model reveals chlamydia drug targets
HipSci delivers UK’s largest human stem cell resource
Modern genomes show Biblical Canaanites survived
Human Cell Atlas data from first 1 million cells announced
25 Genomes to be sequenced to celebrate 25 years
Type 2 diabetes test may misdiagnose
New malaria vaccine target found
Inflammatory Bowel Disease suspects pinpointed

Year in numbers

Institute outputs: 558 publications in 2017
In 2017, we read the genomes of 547 different species
Sequencing centre: approx. 5,400bn DNA bases a day
The human genome is approx. 3bn bases long
Sequencing centre produces the equivalent of one gold-standard (30x) human genome every 24 mins
We read the equivalent of 420 gold-standard (30x) human genomes a week

Data centre
Usable storage in the Data centre: 55PB
Centre-based high performance compute cores: 20,000
Cloud-based flexible compute cores: 6,000
Total number of compute cores: 26,000
Network backbone speed: 160GB/sec

[Chart: How much DNA was sequenced? Values shown between Jan 2015 and Apr 2018: 1.054 Pb, 1.910 Pb, 3.025 Pb, 4.639 Pb, 5.203 Pb]
[Chart: Who published our work?]
[Chart: Where Sanger staff are from, in December 2017]
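The sequencing figures quoted in the Year in numbers panel can be checked against each other with a few lines of arithmetic. The sketch below is only an illustration. It assumes that a gold-standard (30x) genome means reading each of the roughly 3bn bases of the human genome 30 times over; the function and variable names are hypothetical and chosen for this example.

```python
# Figures quoted in the "Year in numbers" panel.
BASES_PER_DAY = 5_400 * 10**9   # approx. 5,400bn DNA bases read per day
GENOME_LENGTH = 3 * 10**9       # the human genome is approx. 3bn bases long
COVERAGE = 30                   # "30x": assumed to mean 30 reads of every base


def genomes_per_day(bases_per_day: int, genome_length: int, coverage: int) -> float:
    """Number of genome equivalents read per day at the given coverage."""
    return bases_per_day / (genome_length * coverage)


per_day = genomes_per_day(BASES_PER_DAY, GENOME_LENGTH, COVERAGE)
minutes_per_genome = 24 * 60 / per_day   # minutes in a day / genomes in a day
per_week = per_day * 7

print(f"{per_day:.0f} genome equivalents a day")
print(f"one genome equivalent every {minutes_per_genome:.0f} minutes")
print(f"{per_week:.0f} genome equivalents a week")
```

Under these assumptions the daily throughput works out at 60 genome equivalents, which agrees with the panel's "every 24 mins" and "420 a week" figures.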

Our work

With secured funding from Wellcome, we are able to strategically focus our work in five key research fields.

Cancer, Ageing and Somatic Mutation
Provides leadership in data aggregation and informatics innovation, develops high-throughput cellular models of cancer for genome-wide functional screens and drug testing, and explores somatic mutation’s role in clonal evolution, ageing and development.

Cellular Genetics
Explores human gene function by studying the impact of genome variation on cell biology. Large-scale systematic screens are used to discover the impact of naturally-occurring and engineered genome mutations in human iPS cells, their differentiated derivatives, and other cell types.

Human Genetics
Applies genomics to population-scale studies to identify the causal variants and pathways involved in human disease and their effects on cell biology. It also models developmental disorders to explore which physical aspects might be reversible.

Infection Genomics
Investigates the common underpinning mechanisms of evolution, infection and resistance to therapy in bacteria and parasites. It also explores the genetics of host response to infection and the role of the microbiota in health and disease.

Malaria
Integrates genomic, genetic and proteomic approaches to develop and enhance high-throughput tools and technologies to study specific biological problems relevant for malaria control and to understand the fundamental science of the human host, the mosquito vector and the Plasmodium pathogen.

Cancer, Ageing and Somatic Mutation

In this section
1 Bone cancer drug leads
2 How cancer evolves
3 Sanger takes on mutational signatures Grand Challenge
4 First mutations in human life discovered
5 Key genes in cancer suppression found
6 Mutational signatures pinpoint best breast cancer treatments
7 Key role of epitranscriptomics in leukaemia
8 Knowledge bank for precision oncology
9 Antique childhood tumours give new understanding
10 How organoids reveal cancer’s true diversity

Bone cancer drug leads

Applying two of the Sanger Institute’s key strengths – long-term collaboration and whole-genome analysis – to rare bone cancers has revealed how existing drugs may offer new treatment options.

Chordoma and osteosarcoma have been the latest two cancer types to benefit from Sanger scientists’ long-standing relationship with University College London Cancer Institute and the Royal National Orthopaedic Hospital NHS Trust. In two studies published in Nature Communications, the researchers demonstrated how combining clinical samples and knowledge with whole-genome sequencing and analysis delivers insights into the molecular biology of bone cancer that could inform future treatments. [1,2]

In 2013, this approach uncovered cancer-driver mutations shared by almost all forms of chondroblastoma and giant cell bone tumours. [3] In 2017, applying the same method to more than 100 chordomas – rare cancers of the skull and spine – revealed that 16 per cent of patients had mutations affecting the PI3K signalling pathway. [1] PI3K inhibitors are already being used to treat a range of cancers, and represent a possible new option for chordoma patients carrying PI3K pathway mutations.

Approximately one quarter of cases had an additional copy of the brachyury gene, previously implicated in inherited forms of chordoma. In addition, some 10 per cent of cases were linked to mutation of a novel cancer gene, LYST. In the longer term, these genes could be valuable leads for new drug development.

Similarly, the largest whole-genome sequencing study yet undertaken in osteosarcoma – the most common form of bone cancer in children and young adults – could also deliver rapid patient benefits. Of the 112 sequenced tumours investigated, 7 per cent had mutations in genes involved in insulin-like growth factor signalling, which is likely to play a key role in the control of bone growth. [2]

Crucially, drugs targeting this pathway, IGF1R inhibitors, already exist. Although trials of IGF1R inhibitors in osteosarcoma have shown limited benefits, further studies concentrating on patients with mutations affecting insulin-like growth factor signalling could enhance the chances of success, paving the way for personalised treatments.

16% of patients had mutations affecting the PI3K signalling pathway

References
1. Tarpey PS et al. The driver landscape of sporadic chordoma. Nat Commun. 2017; 8: 890.
2. Behjati S et al. Recurrent mutation of IGF signalling genes and distinct patterns of genomic rearrangement in osteosarcoma. Nat Commun. 2017; 8: 15936.
3. Behjati S et al. Distinct H3F3A and H3F3B driver mutations define chondroblastoma and giant cell tumor of bone. Nat Genet. 2013; 45: 1479–82.

How cancer evolves

Innovative methods adapted from evolutionary biology have provided a new way of viewing cancer development. The findings have wide-ranging implications for our understanding of cell ageing, the hunt for cancer driver mutations, and precision oncology.

Cancers can be seen as models of evolution in action, with cancer cells evolving as they accumulate genetic changes that provide a selective growth advantage. Now that abundant cancer genome sequence data are available, Sanger scientists have seized the opportunity to study populations of cancer cells with the same approaches developed to explore the evolution of populations of organisms.

In a landmark study published in Cell, Sanger researchers interrogated data from more than 7,500 cancers across 29 cancer types. [1] Remarkably, species and cancers evolve in diametrically opposed ways. Species evolution is typically characterised by negative selection – the loss of mutations because they lower fitness. Yet this is hardly ever seen in cancer. Instead, cancer development is dominated by positive selection – preserving the handful of driver mutations that give a cell a competitive growth advantage.

The results have profound implications. Since somatic cells have such a remarkable tolerance of mutations, the findings highlight the important role of individual mutations in cellular and organismal ageing.

The team discovered that out of the many accumulated mutations acquired over a person’s lifetime just one to ten key alterations in a cell are required to cause cancer, depending on the tumour type.

The results also reveal that approximately half of positively selected driver mutations are not in genes previously implicated in cancer, suggesting that many cancer-driving changes remain to be discovered.

In addition, these methods provide a way to assess whether or not specific mutations are truly driving cancer – an essential step in generating the robust knowledge base that will be needed to apply genomics to clinical decision making and deliver an era of precision oncology.

“Across cancer types a relatively consistent small number of mutated genes is required to convert a single normal cell into a cancer cell, but the specific genes differ according to cancer type. This increasingly precise understanding provides the foundation for the discovery and use of targeted therapies.”
Professor Sir Mike Stratton
An author of the study and Director of the Wellcome Sanger Institute

[Figure: Number of mutations needed to cause cancer]
[Image: Data from more than 7,500 cancers from 29 tumour types were analysed]

Reference
1. Martincorena I et al. Universal patterns of selection in cancer and somatic tissues. Cell. 2017; 171: 1029–1041.e21.


Sanger takes on mutational signatures Grand Challenge

Sanger researchers are leading an international quest to identify the causes of cancer-driving mutations, funded by a £20 million Grand Challenge award from Cancer Research UK.

It has recently become clear that cancer cell genomes contain distinct patterns of DNA damage – mutational signatures – that give important clues to the origins of cancer. DNA-damaging and cancer-causing insults such as tobacco smoke and ultraviolet radiation all leave distinctive scars in the genome. Cancer genome sequencing and innovative computational tools have revealed dozens of mutational signatures. [1] Some have been linked to external exposures or to mechanistic processes, such as the breakdown of mechanisms that repair DNA and maintain the integrity of the genome. However, the underlying causes of many mutational signatures remain a mystery.

In the Grand Challenge initiative – one of just four supported in the first round of funding – an international team will study 5,000 pancreatic, kidney, oesophageal and bowel cancers from five continents. The study will generate exceptionally high-resolution data on the mutational processes operating in these cancers, of potential value in diagnosis and predicting prognosis. Furthermore, data on exposures could reveal the key factors triggering DNA damage – opening up new opportunities for cancer prevention.

The Grand Challenge in numbers
Partners: 8
Project years: 5
Grant (£ millions): 20
Bowel, kidney, oesophageal and pancreatic cancers: 5,000
Mutational signatures: 50
Signatures with an unknown cause: 12
Continents involved: 5

Reference
1. Alexandrov LB et al. Signatures of mutational processes in human cancer. Nature. 2013; 500: 415–21.

First mutations in human life discovered

In Nature, Sanger researchers reported how they traced, in adult cells, mutations that arose when an embryo was just hours old.

As cells divide, mutations inevitably arise, albeit very rarely. As adults we are, therefore, genetic mosaics, our cells carrying different combinations of mutations depending on their evolutionary journey from a single fertilised egg. Moreover, a mutation that arose early in life will be carried by many cells in the body.

Researchers in the Cancer, Ageing and Somatic Mutation programme have exploited this fact to develop statistical tools that can ‘date’ when mutations arose. They generated whole-genome sequences using samples from 279 individuals, and identified 163 mutations that arose very early in development – remarkably, as far back as the two-, four- and eight-cell stage of development. [1]

Moreover, these mutations could be used as markers to track the descendants of these primordial cells. Strikingly, this revealed that the two cells in a two-cell embryo are not equivalent – one typically gives rise to 70 per cent of adult body tissues. This skewing was also seen at later rounds of cell division.

The work also enabled the team to determine that, on average, three mutations occur at each round of division, more than previously thought. In addition, by applying their understanding of mutational signatures to characterise mechanisms of DNA damage, the researchers found that two particular mutational processes predominate: those responsible for signatures 1 and 5. [2] Ultimately, the techniques will enable studies of adult cells to provide new insights into embryonic development.

Whole-genome sequencing of 279 individuals identified 163 mutations from the earliest stages of life

References
1. Ju YS et al. Somatic mutations reveal asymmetric cellular dynamics in the early human embryo. Nature. 2017; 543: 714–718.
2. Alexandrov LB et al. Signatures of mutational processes in human cancer. Nature. 2013; 500: 415–21.
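To see why a mutation from the first cell divisions can be ‘dated’ from adult tissue, it may help to relate the share of adult cells descended from one embryonic cell to the fraction of sequencing reads expected to carry that cell's mutation. The sketch below is a simplified illustration, not the published statistical method. It assumes a diploid genome, a mutation on one of the two chromosome copies, and a sample that reflects the body-wide mix of cells; the function name expected_vaf is hypothetical.

```python
def expected_vaf(tissue_fraction: float, copies: int = 2) -> float:
    """Expected variant allele fraction (share of reads showing the mutation).

    tissue_fraction: share of sampled cells descended from the mutated cell.
    copies: chromosome copies per cell (2 for a diploid genome); the mutation
            is assumed to sit on exactly one of them.
    """
    if not 0.0 <= tissue_fraction <= 1.0:
        raise ValueError("tissue_fraction must lie between 0 and 1")
    return tissue_fraction / copies


# If every cell contributed equally, a mutation arising in one cell at the
# two-, four- or eight-cell stage would mark 1/2, 1/4 or 1/8 of adult cells.
for stage in (2, 4, 8):
    print(f"{stage}-cell stage, equal contribution: VAF = {expected_vaf(1 / stage):.4f}")

# With the 70 per cent skew reported for the two-cell embryo, the two cells
# would leave different footprints in adult tissue.
print(f"dominant cell (70% of tissue): VAF = {expected_vaf(0.70):.2f}")
print(f"other cell (30% of tissue):    VAF = {expected_vaf(0.30):.2f}")
```

Under these assumptions, earlier mutations appear in a larger share of reads, and an unequal split between sister cells shows up as unequal read fractions for their respective mutations.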


Key genes in cancer suppression found

Large-scale studies in mice have identified genes that slow the emergence of cancer and prevent its deadly spread.

Mouse models are an important complement to human studies for identifying genes involved in cancer – either driving tumour formation or protecting against it – and for exploring their function. In particular, genetic manipulation enables studies to be carried out in mice that would be impossible in humans.

This approach has been used to conduct a large-scale screen for genes that inhibit the development of cancer – so-called tumour suppressor genes – which are frequently lost in cancer cells.

It is increasingly clear that genes involved in cancer do not act in isolation; their impact often depends on other mutated genes present in a cell. For example, mutations in the PTEN tumour suppressor – the second most commonly mutated gene in human cancers – typically act in combination with other mutated genes.

To identify other tumour suppressors that cooperate with Pten, Sanger researchers developed an innovative transposable element which, when mobilised, disrupts Pten as well as any gene into which it inserts. An analysis of 278 prostate, breast and skin cancers that developed after activation of the transposable element, published in Nature Genetics, revealed hundreds of mutations that cooperated with Pten. [1]

Focusing on five of the most promising leads, the team showed that disrupting these genes in human cell lines drove cancerous changes. The work has therefore identified a wealth of new tumour suppressors acting in concert with Pten to prevent cancer developing, opening up new avenues of therapy development.

Studies in mice have also generated important new insights into metastasis, the spread of cancer around the body. Metastasis is driven by genetic changes in cancer cells, but also depends on how receptive body tissues are to colonisation by cancer cells. By systematically assessing 800 mouse strains lacking specific genes, Sanger researchers identified 23 genes whose loss influenced colonisation, either accelerating or inhibiting metastasis. [2]

The biggest impact was linked to loss of the Spns2 gene, previously implicated in immune cell function but not cancer biology. The results, published in Nature, highlight the exciting possibility of targeting Spns2 to influence immune system function and limit the metastatic spread of cancer cells.

[Image: Studying cancer in mouse models has allowed Sanger scientists to discover five genes that help to suppress tumour development]

23 genes identified by Sanger researchers whose loss influenced colonisation, either accelerating or inhibiting metastasis

References
1. de la Rosa J et al. A single-copy Sleeping Beauty transposon mutagenesis screen identifies new PTEN-cooperating tumor suppressor genes. Nat Genet. 2017; 49: 730–741.
2. van der Weyden L et al. Genome-wide in vivo screen identifies novel host regulators of metastatic colonization. Nature. 2017; 541: 233–236.

Mutational signatures pinpoint best breast cancer treatments

Sanger researchers have gained new insights into the mutational processes underlying breast cancer – knowledge that could be used to identify patients likely to benefit from particular anti-cancer drugs.

Sanger research has revealed that mutagenic agents and processes can damage genomes in distinctive ways, creating ‘mutational signatures’ that may shed light on the causes of DNA damage and the origins of cancer. Furthermore, drugs developed for one type of cancer could potentially be active against other types sharing the same mutational signature.

In breast cancer, for example, so-called PARP inhibitors have been developed to treat patients with cancers caused by BRCA1 and BRCA2 mutations, which disrupt DNA repair and are associated with a specific mutational signature. While BRCA1 and BRCA2 mutations are associated with 1–5 per cent of breast cancer cases, a Sanger-led study published in Nature Medicine found that up to 20 per cent of tumours show a BRCA-like mutational signature. [1] PARP inhibitors could therefore potentially be of benefit to many more women.

Similarly, a further study of 640 breast cancer genomes found that 11 had a mutational signature associated with an abnormality in another DNA repair mechanism, mismatch repair, not previously seen in breast cancer. [2] A class of drugs known as checkpoint inhibitors has been used to treat colorectal cancers with defective mismatch repair, and could potentially provide another option for women whose breast cancers show the same patterns of DNA damage.

In addition, using mutational signatures to identify damage to mismatch repair genes was more accurate than using genetic and epigenetic sequencing alone. Standard genetic analysis identified only six out of the 11 people affected; in contrast, mutational signatures highlighted all 11. This suggests that patients could be stratified into different categories for treatment based on their mutational signatures.

References
1. Davies H et al. HRDetect is a predictor of BRCA1 and BRCA2 deficiency based on mutational signatures. Nat Med. 2017; 23: 517–525.
2. Davies H et al. Whole-genome sequencing reveals breast cancers with mismatch repair deficiency. Cancer Res. 2017; 77: 4755–4762.

Key role of epitranscriptomics in leukaemia

Gene editing screens have identified an RNA-modifying enzyme that plays a critical role in acute myeloid leukaemia (AML), and could be an important target for new therapeutics.

Enzymes that modify RNA – for example by adding methyl groups to RNA bases – are turning out to be important regulators of gene activity. To explore their contribution to cancer, Sanger researchers and their colleagues used CRISPR-Cas9 gene editing in systematic screens for genes necessary for the growth of mouse leukaemia cells. This screening, published in Nature, revealed 46 genes coding for RNA-modifying enzymes critical to leukaemia cell growth. [1]

One of the strongest effects was seen after knockout of METTL3, which methylates A residues in RNA transcripts. The team went on to show that loss of METTL3 slowed the proliferation of cultured mouse and human leukaemia cells, and made cells less likely to seed new leukaemias when injected into mice. Crucially, loss of METTL3 had no detrimental impact on normal cells.

METTL3 was found to bind to the promoter region of a suite of genes, in the presence of a protein known as CEBPZ. At these genes, METTL3 methylated the protein-coding section of RNA transcripts, leading to more efficient translation and increased synthesis of the corresponding proteins. Among these proteins were several known to drive proliferation of AML cells.

The study identified METTL3 as part of a pathway critical to the development of AML, making the protein an exciting new target for a cancer that kills two out of every three people affected.

[Image: Acute myeloid leukaemia: METTL3 is part of a pathway that plays an important role in AML development, making the protein an exciting drug target]

Reference
1. Barbieri I et al. Promoter-bound METTL3 maintains myeloid leukaemia by m6A-dependent translation control. Nature. 2017; 552: 126–131.


Knowledge bank for precision oncology

In Nature Genetics, Sanger researchers and clinical colleagues described how they developed a prototype ‘knowledge bank’ to show how genomic information could guide clinical decision-making in acute myeloid leukaemia. [1] The database incorporated genomic and clinical data from more than 1,500 patients, with statistical tools providing insight into likely prognosis and preferred treatment options for individual patients. Building this knowledge bank was an important milestone on the route to precision medicine in oncology, and has pinpointed the challenges that need to be overcome to deliver the vision in medical practice.

More than 1,500 patients incorporated onto the database

Reference
1. Gerstung M et al. Precision oncology for acute myeloid leukemia using a knowledge bank approach. Nat Genet. 2017; 49: 332–340.

Antique childhood tumours give new understanding

Until now, some cancers have been so rare that only limited insight into their genetic basis could be obtained. But now a joint project between Sanger researchers and pathologists at Great Ormond Street Hospital for Children is opening up a century’s worth of hospital sample collections for genomic scrutiny. [1]

Technological advances now enable sequence information to be obtained from tissue samples fixed with formalin and embedded in paraffin wax. In Lancet Oncology the scientists detail likely driver mutations in three rare childhood cancers dating back to the 1920s. The team hopes to apply the same approach to more of the hospital’s archive to explore other exceptionally rare cancers.

Some cancers only appear 3 or 4 times in 50 years

Reference
1. Virasami A et al. Molecular diagnoses of century-old childhood tumours. Lancet Oncol. 2017; 18: e237.

How organoids reveal cancer’s true diversity

All tumours are genetically different, even if they occur in the same organ. And each tumour contains genetically different tribes of cancerous cells competing for dominance. [1] Now Sanger cancer researchers, working closely with researchers at the Hubrecht Institute, have shown that every cell within every tumour is genetically different from the others.

Writing in Nature, the team describes the first study to combine cutting-edge organoid techniques with single-cell approaches to interrogate individual cells within the same colorectal cancer tumour. [2] Their work, which exploits an innovative methodology to overcome many of the problems inherent in current single-cell genomic techniques, has the potential to be applied at scale to investigate the evolution of numerous tumour types.

The researchers isolated individual cells from cancerous and healthy portions of a person’s colon, and then grew them into organoids – 3D clusters of cells that mimic the gut – to stably amplify the numbers of cells for study. In this way the team was able to interrogate the genetic, epigenetic, transcriptomic and functional differences between neighbouring healthy and tumour cells from the same tissue.

They found that the tumour cells had many more mutations than healthy cells and that each cell in a tumour was genetically different from the others. In addition, the mutational processes at work in the cancer cells were markedly distinct from those seen in normal cells. These cancer-specific processes offer opportunities to further understand how cancers develop and provide novel targets for therapeutics.

[Image: Gut organoids for single-cell insights – using mini 3D clusters of cells can overcome the problems with current single-cell techniques]

References
1. Fischer A et al. High-definition reconstruction of clonal composition in cancer. Cell Reports. 2014; 7: 1740–1752.
2. Roerink S et al. Intra-tumour diversification in colorectal cancer at the single cell level. Nature. 2018; Apr 11. doi: 10.1038/s41586-018-0024-3.

Cellular Genetics

In this section
1 One million cells down…
2 High-speed, high-volume cells on demand
3 Growing replacement organs
4 New, ultra-flexible stem cells recreate earliest steps of life
5 HipSci delivers new national stem cell resource
6 Drug resistance: some mutations need a little genetic help

One million cells down…

Just one year after its launch, the Human Cell Atlas, an ambitious international partnership to map the human body cell by cell, has sequenced, analysed and characterised more than one million cells.

Innovative technologies for analysing individual cells are opening up new opportunities to define the myriad cell types of the human body, and to explore how their functions relate to the genes active within them. An atlas mapping every cell type in the body would be a tremendous boon for biomedical research, and is the goal of a global partnership led by an organising committee, co-chaired by Sanger and Broad Institute researchers, encompassing 10 countries. Writing in Nature, the Human Cell Atlas team has set out the goals and challenges facing this potentially transformative initiative. [1]

For phase one, the collaboration seeks to collect and study between 30 and 100 million individual cells from select organs and tissues. The researchers use massively-parallel single-cell RNA sequencing (a suite of genomic techniques capable of identifying gene expression profiles in thousands of individual cells at a time), related technologies to characterise other molecules, and spatial methods to map cells’ locations and interactions. The results will give a firm foundation of understanding of how different cells work with each other both in their home tissues and throughout the body.

Employing these techniques, the Human Cell Atlas has investigated more than one million cells collected from bone marrow and cord blood from healthy human donors. The single-cell RNA expression data of these immune cells form the foundation of an openly available resource that will allow the global scientific community to further their research. Over time, as the initiative’s data are made freely available, an open and globally accessible reference map of the human body will be built, in much the same way that the Human Genome Project unlocked the human genome.

“The data will provide an entry point for deeper study of cells’ functions and interactions, both within their home tissues and throughout the body. Such knowledge will, over time, have a transformative effect on the understanding of human biology and health.”
Dr Sarah Teichmann
Head of Cellular Genetics at the Sanger Institute and co-Chair of the Human Cell Atlas Organising Committee

Reference
1. Rozenblatt-Rosen O, Stubbington MJT, Regev A, Teichmann SA. The Human Cell Atlas: from vision to reality. Nature. 2017; 550: 451–453.

High-speed, high-volume cells on demand

An innovative new method is enabling researchers to generate large numbers of precisely defined cell types in days instead of weeks.

To explore the properties of specialised cells, researchers are increasingly culturing and driving the differentiation of human pluripotent stem cells (hPSCs), embryonic stem cells and induced pluripotent stem cells. However, differentiation of cultured hPSCs is typically slow and inefficient.

Now, Sanger scientists have developed a new method that is both quicker and generates larger numbers of differentiated cells. [1] The technique is based on the introduction of two gene cassettes into ‘genomic safe harbours’ – sites where integration of a new genetic element will have no adverse impact on the cell. The first cassette is permanently active and produces a transcription factor that is activated only in the presence of an antibiotic, doxycycline; the second cassette contains transgenes required to reprogramme the hPSC into a specialised adult cell, which are switched on by the doxycycline-dependent transcription factor. Addition of doxycycline therefore drives rapid and efficient cellular reprogramming.

Using this method, known as OPTi-OX, the Sanger team successfully generated large numbers of neurons and skeletal muscle cells within days, and developed a protocol for production of human oligodendrocytes. The platform could theoretically be adapted to generate many other cell types – potentially including novel cell types identified by the Human Cell Atlas project.

[Image: OPTi-OX is a new method that can generate neuron and skeletal cells in days instead of weeks]
[Image: Nerve fibres of a healthy adult brain: by generating large quantities of neurons, scientists can explore the cellular biology of otherwise difficult-to-access organs]

Reference
1. Pawlowski M et al. Inducible and deterministic forward programming of human pluripotent stem cells into neurons, skeletal myocytes, and oligodendrocytes. Stem Cell Reports. 2017; 8: 803–812.

Growing replacement organs

A multidisciplinary team including Sanger researchers has used tissue engineering to generate artificial bile ducts from cultured cells, and shown that they function effectively in mice.

Bile ducts carry bile from the liver to the intestine. Bile duct disorders are currently difficult to treat, and are responsible for 70 per cent of paediatric liver transplants. Artificial bile ducts created by tissue engineering could offer an alternative approach to treatment. In pursuit of this objective, a Cambridge-based team has been able to extract human cholangiocytes – the cells that line the walls of bile ducts – and grow them up in large numbers in cultures, maintaining their functional properties. [1] The cells spontaneously formed duct-like structures – organoids – and were also able to populate biodegradable scaffolds. As reported in Nature Medicine, these structures could rescue a mouse model of bile duct injury.

The isolation of cholangiocytes from accessible sites in the body, rapid production of large numbers of functioning cells in culture, and successful seeding of scaffolds are significant steps towards the development of replacement tissue for use in regenerative medicine.

[Image: Bile duct cells: organoid technology has been successfully used to grow new bile ducts for transplant surgery in mice]

Bile duct disorders are responsible for 70% of paediatric liver transplants

Reference
1. Sampaziotis F et al. Reconstruction of the mouse extrahepatic biliary tree using primary human extrahepatic cholangiocyte organoids. Nat Med. 2017; 23: 954–963.


4 New, ultra-flexible stem cells recreate earliest steps of life

Sanger researchers have created primordial ‘expanded potential stem cells’ that can generate the placenta and yolk sac, as well as the embryo itself.

Embryonic stem cells and induced pluripotent stem (iPS) cells are highly flexible, able to generate all the different cell types of an adult organism. However, to date, they have not been able to create the extra-embryonic tissues (such as the placenta and yolk sac) that support an embryo’s development. Sanger scientists have developed a technique that completes the full picture and will allow experimental genomic exploration of miscarriage and developmental disorders.

The team inhibited the expression of proteins thought to drive differentiation of extra-embryonic cell lineages (mitogen-activated protein kinases, Src and Wnt/Hippo/TNKS1/2), to extract and propagate cells from even earlier in development – eight-cell mouse embryos.1 As well as embryonic tissues, these ‘expanded potential stem cells’ (EPSCs) can be induced to generate the other two types of blastocyst stem cells that produce the placenta and yolk sac. Writing in Nature, the team has also shown that later-stage embryonic stem cells and iPS cells can be converted into EPSCs.

EPSCs promise to be a valuable tool for understanding the earliest stages of embryonic development and placenta formation. Furthermore, the methods could also be applied to generate EPSCs for mammalian species where attempts to generate embryonic stem cells and iPS cells have not yet succeeded.

[Figure: The different types of stem cell (greater potential for research treatments). Stages shown: fertilised egg; 8-cell embryo; blastocyst; reprogramme. EPSC (Expanded Potential Stem Cells): placenta, yolk sac, all cells in the body. ESC (Embryonic Stem Cells): all cells in the body. iPSC (Induced Pluripotent Stem Cells): all cells in the body.]

Reference
1. Yang J et al. Establishment of mouse expanded potential stem cells. Nature. 2017; 550: 393–397.

5 HipSci delivers new national stem cell resource

A national consortium including Sanger researchers has unveiled a resource of comprehensively characterised induced pluripotent (iPS) stem cell lines.

The Human Induced Pluripotent Stem Cell Initiative (HipSci) has generated and characterised 711 cell lines from 301 healthy donors, publishing their initial analyses in Nature.1 The work has revealed that common genetic variation has a significant impact on the properties of iPS cell lines, and has shed light on gene regulatory regions and the impact of DNA rearrangements commonly seen in such cells. The detailed characterisation provides crucial information for those using the cell lines in their research.

Human Induced Pluripotent Stem Cell Initiative (HipSci) has generated and characterised 711 cell lines from 301 healthy donors

Reference
1. Kilpinen H et al. Common genetic variation drives molecular heterogeneity in human iPSCs. Nature. 2017; 546: 370–375.

6 Drug resistance: some mutations need a little genetic help

High genetic diversity in bacterial and cancer cells results in poorer outcomes for people treated with antibiotics or chemotherapy drugs. A massive study in yeast conducted by Sanger scientists has discovered why.

The team cross-bred two distantly related species of yeast, from Africa and the Americas, to create more than 10 million genetically distinct hybrids and followed their evolution in response to chemotherapeutic drugs.1 This approach revealed which mutations drive resistance, and how they are affected by other mutations already present in the genome.

The results pointed to two types of driver mutations: strong drivers whose presence always produces resistance, and weaker ones that require the assistance of pre-existing background mutations to be effective. Further study will help to identify which genetic profiles are more conducive to resistance than others.

10 million genetically distinct hybrids of yeast from Africa and the Americas

Reference
1. Vázquez-García I et al. Clonal heterogeneity influences the fate of new adaptive mutations. Cell Rep. 2017; 21: 732–744.


Human Genetics

In this section
1 Sequencing reveals new causes of developmental disorders
2 Pinpointing the causes of IBD
3 Baby’s DNA influences mother’s pre-eclampsia risk
4 How losing a gene could help your heart
5 Why HbA1c diabetes test may fail African Americans
6 Greeks bearing (welcome) genetic gifts
7 DNA analyses reveal secrets of human history
8 UK Biobank helps unlock the biology of osteoarthritis

1 Sequencing reveals new causes of developmental disorders

A Sanger-led study of children with undiagnosed developmental disorders has provided hundreds of families with a firm diagnosis.

In the Deciphering Developmental Disorders (DDD) initiative, Sanger researchers and clinical geneticists from across the British Isles are using genome sequencing to identify possible causes of unexplained developmental disorders, and to determine how genome-based approaches could be integrated into routine care.

Focusing on de novo mutations – those not present in either parent but arising spontaneously during cell division after conception – the DDD team sequenced the of the genomes of more than 4,000 affected families (children and their parents). The results, published in Nature, estimated new mutations in 42 per cent of children studied.1 Around half of mutations were predicted to lead to complete loss of gene function and half to altered function. Almost one quarter of cases could be assigned to existing syndromes, while de novo mutations were identified in 14 genes that had not been previously linked to developmental disorders.

The results provide unprecedented insight into the spectrum of developmental disorders across the UK. They suggest that one in every 300 children born in the UK has a rare developmental disorder caused by a new mutation – approximately 2,000 children a year. Collectively, de novo mutations account for more disorders than the common chromosomal disorders caused by the duplication of either chromosome 13, 18 or 21. The researchers estimate that 400,000 babies are likely to be born with a developmental disorder caused by a de novo mutation every year across the globe.

In a follow-on study, also published in Nature, the DDD team explored whether de novo mutations in regulatory elements controlling gene expression in the foetal brain could account for cases where no coding sequence mutations were identified. Such mutations would not affect the function of a protein but might disrupt carefully regulated developmental programmes by affecting when, where, or how much of a protein is produced.

Sequencing of known control regions in nearly 8,000 cases – by far the largest such study ever undertaken – revealed de novo mutations in 1–3 per cent of patients with unexplained neurodevelopmental disorders.

The study proved the hypothesis that noncoding regions of the genome are able to contribute to neurodevelopmental disorders, but also offered cause for optimism: such DNA variations only rarely have such an effect. In fact, the researchers found that less than 1 per cent of noncoding changes were involved in neurodevelopmental disorders.2

14 causative mutations for development disorders identified
200 NHS clinical geneticists
30 Research organisations
4,293 Families whose genomes have been sequenced
20,000 Approximate number of human genes

References
1. Deciphering Developmental Disorders Study. Prevalence and architecture of de novo mutations in developmental disorders. Nature. 2017; 542: 433–438.
2. Short PJ et al. De novo mutations in regulatory elements in neurodevelopmental disorders. Nature. 2018; 555: 611–619.

2 Pinpointing the causes of IBD

Sanger-led analyses are generating a short list of genetic prime suspects that contribute to inflammatory bowel disease.

Inflammatory bowel disease (IBD) is an exemplar of how genetic analyses can provide insight into the causes and mechanisms of a complex disease. So far, more than 200 loci that affect the risk of IBD have been identified using genome-wide association studies (GWAS), and additional locations continue to be discovered. Three studies published in 2017 by Sanger scientists are providing the mathematical tools to narrow down this pool to the most valuable leads for therapeutic intervention.

One Sanger-led study published in Nature Genetics explored the potential of low-coverage whole-genome sequencing to identify low-frequency variants affecting IBD risk.1 Sequencing of more than 4,000 cases revealed one significant finding – variation at the ADCY7 gene that doubled the risk of ulcerative colitis – but suggested that this type of variation makes little contribution to Crohn’s disease risk. Overall, the study demonstrated that future projects should involve larger-scale and deeper sequencing to identify very rare variants of larger effect, and that bigger GWAS were likely to yield greater insights.

In fact a GWAS and meta-analysis of nearly 60,000 subjects, also published in Nature Genetics, revealed 25 new IBD loci.2 Among the most notable new discoveries were disease-linked variants in three integrin genes, which have wide-ranging roles in cell adhesion, cell signalling and immune system function. Notably, integrins are part of pathways already being targeted therapeutically, suggesting that new associations at common variants can provide therapeutically important leads.

Finally, Sanger-led research published in Nature has made important progress in the fine-mapping of susceptibility loci to identify the precise genetic changes underlying increased risk of disease.3 Using high-density tools specific for loci implicated in immune-related disease, the team was able to nail down 18 specific genetic changes with a high degree of confidence and a further 27 with a reasonable degree of certainty (>50 per cent). The results provide 45 specific leads for exploring mechanisms of disease and identifying possible therapeutic targets.

References
1. Luo Y et al. Exploring the genetic architecture of inflammatory bowel disease by whole-genome sequencing identifies association at ADCY7. Nat Genet. 2017; 49: 186–192.
2. de Lange KM et al. Genome-wide association study implicates immune activation of multiple integrin genes in inflammatory bowel disease. Nat Genet. 2017; 49: 256–261.
3. Huang H et al. Fine-mapping inflammatory bowel disease loci to single-variant resolution. Nature. 2017; 547: 173–178.

3 Baby’s DNA influences mother’s pre-eclampsia risk

Why does pre-eclampsia – raised that can lead to complications during pregnancy – occur? Is there a genetic component? InterPregGen – an international study involving Sanger scientists, and geneticists, midwives and obstetricians from , Iceland, Kazakhstan, Norway, the UK and Uzbekistan – has sought to find the answers.

Pregnancy is a two-way biochemical conversation between mother and baby. Yet genome-wide association studies (GWAS) of maternal genomes have failed to conclusively identify any maternal factors that increase the risk of pre-eclampsia.

For this study, published in Nature Genetics, the researchers examined the other side of the coin: how foetal growth impacts maternal biology. The team carried out the first GWAS of the genomes of more than 4,300 affected mothers’ offspring to search for genetic variants that might have contributed to the development of pre-eclampsia.1

The scientists discovered that genetic variants close to the fetal FLT1 gene increased the mother’s risk of pre-eclampsia. The FLT1 gene codes for a protein involved in placental growth and function, enhancing understanding of the mechanisms influencing pre-eclampsia risk.

4,300 affected mothers’ offspring were sequenced to search for genetic variants that might have contributed to the development of pre-eclampsia

Reference
1. McGinnis R et al. Variants in the fetal genome near FLT1 are associated with risk of preeclampsia. Nat Genet. 2017; 49: 1255–1260.


4 How losing a gene could help your heart

Gene loss is not always harmful – sometimes it can even be beneficial.

One way to understand how a gene works is to turn it off and observe what happens. Normally this is only possible in biological models, such as cell lines and organoids, or in model organisms like mouse or zebrafish. Such models are informative, but can never completely replicate the effect of gene loss in a fully functioning human body.

Sometimes, however, nature provides. For a variety of reasons, some people are born with both their copies of a gene not working. The results can be surprising. Although we have only around 20,000 genes, a study published in Nature shows that a surprisingly large number may be inessential.

Working with a population of 10,503 people in , an international consortium that includes Sanger researchers identified more than 1,300 genes whose function could be completely lost without any obvious impact on health.1 Most of the people affected had lost just one gene, but one individual had lost six.

Notably, because data had been collected on multiple aspects of participants’ cardiometabolic health, the team was able to explore whether these natural gene knockouts had any impact on human physiology. At least seven of the disrupted genes showed an association with cardiometabolic traits.

One of the most striking discoveries was an association between loss of the APOC3 gene and clearance rates of dietary fats from the bloodstream. Family members lacking APOC3 cleared fats quicker than their relatives who had a functional APOC3 gene, and are likely to have a reduced risk of developing . The findings suggest that targeting APOC3 could be a therapeutic strategy.

“We’re now entering a new era in genetics where we can systematically examine what it means for humans when parts of the blueprint are missing.”

Professor John Danesh
Director of the MRC/BHF Cardiovascular Epidemiology Unit at Cambridge University and Associate Faculty at the Sanger Institute

Sanger researchers helped identify more than 1,300 genes whose function could be completely lost without any obvious impact on health

Reference
1. Saleheen D et al. Human knockouts and phenotypic analysis in a cohort with a high rate of consanguinity. Nature. 2017; 544: 235–239.

5 Why HbA1c diabetes test may fail African Americans

A genetic variant discovered in 11 per cent of African Americans could be responsible for large numbers of missed diagnoses of diabetes.

Type 2 diabetes is commonly diagnosed and assessed by monitoring levels of glycated haemoglobin (HbA1c) in the bloodstream. As blood sugar levels increase, the glucose molecules attach to haemoglobin, forming HbA1c. Therefore raised HbA1c levels indicate the presence of diabetes.

However, levels of HbA1c in the body are not solely affected by the amount of glucose in the blood. Genome-wide association studies (GWAS) have identified 18 loci that influence HbA1c levels, some by affecting blood glucose control, others by altering red blood cell function. Could such genetic factors affect HbA1c levels so strongly as to mask a person’s diabetes status?

It was this question that spurred 200 scientists, including Sanger researchers, to carry out the largest international meta-analysis of GWAS data on HbA1c levels, published in PLOS Medicine.1 They discovered that 11 per cent of African Americans carry an X chromosome variant in the G6PD gene that leads to more rapid recycling of haemoglobin, generating an artificially low HbA1c reading.

Relying solely on the HbA1c test to diagnose diabetes could mean that 2 per cent of African Americans – approximately 650,000 people – could remain undiagnosed. The results emphasise the importance of alternative tests for assessing type 2 diabetes in African Americans or of combining HbA1c monitoring with G6PD genotyping.

Over 65,000 African Americans could be misdiagnosed using HbA1c diabetes test

Reference
1. Wheeler E et al. Impact of common genetic determinants of Hemoglobin A1c on type 2 diabetes risk and diagnosis in ancestrally diverse populations: A transethnic genome-wide meta-analysis. PLOS Med. 2017; 14: e1002383.

6 Greeks bearing (welcome) genetic gifts

A Sanger-led whole-genome sequencing study of 250 residents of the Greek island of Crete may have revealed why they remain anomalously healthy despite a diet rich in animal fat.

Members of the isolated population of Mylopotamos had high levels of a genetic variant – rs145556679 – associated with low bloodstream levels of triglycerides and very-low-density lipoprotein, both important risk factors for heart disease.1 This variant appears to be almost unique to the Mylopotamos population. The study illustrates how remote and isolated populations may harbour unusual genetic variants that have a significant impact on human health and disease.

250 residents of Crete may have revealed the secret to a healthy long life

Reference
1. Southam L et al. and in isolated populations identify genetic associations with medically-relevant complex traits. Nature Commun. 2017; 8: 15606


7 DNA analyses reveal secrets of human history

Sequencing projects have shone a fascinating light on extreme isolationism, genocide and assimilation around the world.

The first detailed genetic study of the people of Papua New Guinea, published in Science, has provided a new glimpse into ancient populations that have survived almost untouched by the modern world.1

Genotyping of 381 Papua New Guinean people from 85 different language groups, by an international collaboration including Sanger researchers, has revealed that the country’s populations have been genetically independent from Europe and Asia for most of the past 50,000 years.

Within Papua New Guinea, groups speaking different languages were also genetically highly distinct from one another, demonstrating little mixing between tribes. Remarkably, despite being geographically quite close, groups living in the mountainous highlands and lowland populations are genetically separated by some 10–20,000 years.

A further notable feature is the apparent absence in Papua New Guinea of the genetic ‘homogenisation’ of populations seen in Europe and elsewhere following the emergence of agriculture and a switch from a hunter-gatherer lifestyle. Potentially, later Bronze Age and Iron Age technological developments, which did not take hold in Papua New Guinea, may have been more important drivers of homogenisation.

In the Middle East, a comparison of five genomes from 4,000-year-old Canaanite skeletons and 99 modern-day Lebanese genomes suggests that the enigmatic Canaanites have left a substantial modern-day genetic legacy. Bronze Age Canaanites developed an extensive civilisation but have largely vanished from history, in part because they left few written records. According to Biblical accounts, the Canaanites were annihilated, but the genetic evidence points to their survival – 90 per cent of Lebanese ancestry could be traced back to the Canaanites.2

Long-term survival is also the theme of a separate study examining the genetic heritage of the Parsi, a culturally distinct South Asian population. Followers of the ancient religion of Zoroastrianism, the Parsi are believed to have migrated to what is now India from Persia in the 7-10th century. An international collaboration has confirmed that the Parsi are genetically distinct, more related to ancient Neolithic Iranian populations than local Indian and Pakistani populations.3 The genetic findings are consistent with Parsi historical accounts of assimilation into the Indian sub-continent populations in the seventh century, described as ‘Like sugar in milk’.

Papua New Guinea is the most linguistically diverse country in the world with approx. 850 languages
Papua New Guinea accounts for more than 10% of the world’s languages
Papua New Guinea’s first settlers arrived more than 50,000 years ago
381 Papua New Guinean people from 85 language groups were DNA sequenced
More than 1 million genetic positions were compared
Groups with different languages have very strong genetic differences
Tribes stopped mixing between 10,000–20,000 years ago

References
1. Bergström A et al. A Neolithic expansion, but strong genetic structure, in the independent history of New Guinea. Science. 2017; 357: 1160–1163.
2. Haber M et al. Continuity and admixture in the last five millennia of Levantine history from ancient Canaanite and present-day Lebanese genome sequences. Am J Hum Genet. 2017; 101: 274–282.
3. Chaubey G et al. “Like sugar in milk”: reconstructing the genetic history of the Parsi population. Genome Biol. 2017; 18: 110.

8 UK Biobank helps unlock the biology of osteoarthritis

Sanger scientists, with collaborators, have developed an integrated approach that exploits the power of the UK Biobank resource and DECODE database in Iceland to discover nine previously unknown regions of the genome associated with osteoarthritis.1 By combining these findings with further genomic techniques, wet lab experiments and epidemiology the team was able to identify five promising targets for drug development and uncover biological pathways involved in the disease.

This unified approach, deploying the skills of specialists across a range of disciplines and data sources, could help to unlock other, equally opaque complex .

Osteoarthritis is a debilitating disease that affects up to 40 per cent of people over 70 in the UK. No drug treatments are available, with pain relief and surgery the only therapeutic option.

It has a strong genetic component: up to 50 per cent of the variation in a person’s risk of developing the disease is due to genetic factors. Yet the disease has proved remarkably resistant to genomic investigation, yielding only 21 disease-associated locations in the genome.

To unlock this complex condition, the Sanger researchers conducted the largest genomic study of osteoarthritis to date. They studied 16.5 million DNA variations provided by the first release of data from the UK Biobank and identified 173 candidate genetic variants. After using the DECODE data to refine and confirm their findings in an independent population, nine previously unidentified disease-associated genomic regions remained.

To discover the biological pathways involved and offer new targets for drug development, the team conducted functional genomic experiments on diseased and healthy cartilage cells.

By applying proteomics and RNA sequencing, the scientists identified five genes within the candidate regions whose levels of activity were significantly decreased and, therefore, are likely to contribute to disease progression.

In addition to finding new drug targets the scientists discovered that, within the limits of their study, type 2 diabetes and high levels of lipids in the blood do not have causal effects on osteoarthritis, but reaffirmed that obesity does.

Reference
1. Zengini E et al. Genome-wide analyses using UK Biobank data provide insights into the genetic architecture of osteoarthritis. Nature Genetics. 2018; doi: 10.1038/s41588-018-0079-y.


Infection Genomics

In this section
1 Humans, not reservoirs, drive cholera epidemics
2 Human cell model shows our genes affect response to Chlamydia
3 Drug-resistant E. coli tracked in a care home
4 Slave trade helped melioidosis jump continents
5 Genomic surveillance in NHS reveals full picture of MRSA spread
6 Genomics cracks Icelandic horse mystery
7 Disturbed bacterial populations seek stability
8 Penicillin use drove methicillin resistance

1 Humans, not reservoirs, drive cholera epidemics

An analysis of historical Vibrio cholerae isolates has revealed how virulent cholera strains have spread through South America and Africa, with important implications for disease control strategies.

Since the 1800s, the world has been swept by seven waves of cholera epidemics, the latest of which began in the 1960s and has killed many thousands of people worldwide. Important new insights into the recent global spread of cholera have come from a major genomic study, published in two papers in Science, from an international team led by scientists from the Sanger Institute and the Institut Pasteur in France.1,2

The team analysed 1,200 cholera samples from 14 South American and 45 African countries covering a 49-year period, and reconstructed evolutionary trees to show how they were related. The genomic data indicate that the strain responsible for the seventh wave, 7P El Tor (7PET), has been repeatedly introduced from South Asia into Africa since the 1970s and into South America in the 1990s.

Notably, in South America, the pandemic strain seems to behave very differently from those already present. ‘Local’ cholera strains persist in environmental reservoirs and can trigger outbreaks but do not seem to erupt into full-blown epidemics.

By contrast, 7PET appears to seed rapid explosive outbreaks. Identifying which strain is associated with an outbreak may therefore help public health officials to mount the most appropriate response to contain its spread.

The spread of 7PET in Africa has been more complicated, with 11 introductions since the 1970s, triggering epidemics lasting up to 28 years. The last five introductions were all of multidrug-resistant strains originating in South Asia. However, a key similarity with South America is that African epidemics of 7PET appear to be triggered by human transmission of the disease, and are not due to long-term environmental reservoirs or climate events. This discovery will help focus control strategies on areas of greatest impact.

Cholera still affects 47 countries worldwide and kills almost 100,000 people each year

“We are getting a real sense of how cholera is moving across the globe, and with that information we can inform improved control strategies and basic science to better understand how a simple bacterium continues to pose such a threat.”

Professor Nick Thomson
Group Leader at the Sanger Institute

References
1. Domman D et al. Integrated view of Vibrio cholerae in the Americas. Science. 2017; 358: 789–793.
2. Weill FX et al. Genomic history of the seventh pandemic of cholera in Africa. Science. 2017; 358: 785–789.

2 Human cell model shows our genes affect response to Chlamydia

By combining stem cell reprogramming and gene editing, Sanger researchers have created an innovative new model to study host genetic factors that influence macrophage invasion by chlamydia. This model has the potential to be applied to other pathogens and tissue systems to explore a wide range of host-pathogen interactions and the role human genetics plays in influencing infection outcome.

Chlamydia trachomatis is one of the UK’s most common sexually transmitted infections, and also an important cause of blindness worldwide. Chlamydia is difficult to study (and treat) because it tends to invade and replicate within macrophages. Attempts have been made to study the bacteria within mouse macrophages and macrophage cell lines but neither is a true model of natural infections.

To overcome this issue, Sanger researchers and colleagues in Canada have developed a new and more realistic model that provides powerful opportunities to explore genetic factors affecting host responses to infection.1 First, the team used cellular reprogramming technologies to create macrophages from human induced pluripotent cells created by the Sanger co-led HipSci project, and showed that the cells behaved like bona fide macrophages in their interactions with chlamydia.

In a second step, the researchers used CRISPR-Cas9 gene editing to eliminate specific genes potentially involved in host responses to chlamydia infection. These studies highlighted the key role of two genes involved in immune system function, IRF5 and IL-10RA, in preventing Chlamydia from sheltering within macrophages. As well as suggesting new leads for therapy development, the model provides a tool for dissecting host responses to chlamydia – and, moreover, a similar strategy could be adopted for other intracellular pathogens.

Chlamydia: by employing HipSci cells to create genetically reprogrammed macrophages, Sanger scientists identified two key immune genes that help to prevent chlamydia sheltering in macrophage cells

Reference
1. Yeung AT et al. Exploiting induced pluripotent stem cell-derived macrophages to unravel host factors influencing Chlamydia trachomatis pathogenesis. Nat Commun. 2017; 8: 15013.

3 Drug-resistant E. coli tracked in a care home

A Sanger-led study has identified extensive transmission of antibiotic-resistant Escherichia coli in a care home. Care homes are known to be reservoirs of drug-resistant pathogens, but are not included in current surveillance programmes. The study used genomic methods to track drug-resistant and drug-sensitive E. coli strains in 45 care home residents over a six-month period. Slightly more than one third of participants carried antibiotic-resistant E. coli, which was spread both within the care facility and to a local hospital.1 With increasing numbers of people requiring long-term care, the data point to a potentially important new application of genomic surveillance to monitor community reservoirs of drug resistance.

Care homes are an important reservoir of drug-resistant E. coli

Reference
1. Brodrick HJ et al. Longitudinal genomic surveillance of multidrug-resistant Escherichia coli carriage in a long-term care facility in the United Kingdom. Genome Med. 2017; 9: 70.

4 Slave trade helped melioidosis jump continents

Genome sequencing has revealed how and when the bacterium causing melioidosis spread from its base in Australia.

Melioidosis kills approximately 89,000 people a year

Despite a high fatality rate, melioidosis has tended to pass under the global radar. Caused by infection with the soil-dwelling bacterium Burkholderia pseudomallei, melioidosis affects around 165,000 people a year and kills an estimated 89,000. As well as the scale of its impact, its potential use in bioterrorism has recently raised its profile.

To provide insight into its global dissemination, an international team led by Sanger researchers sequenced 469 B. pseudomallei isolates from 30 countries collected over 79 years, reconstructing its evolutionary history.1 The sequence data suggest that B. pseudomallei originated in Australia and was transferred just once into South-East Asia.

From this single introduction, it then spread repeatedly through countries in South-East Asia and also into Africa and South America. The dates for the likely introduction into South America – between 1682 and 1849 – strongly implicate the Slave Trade in its transatlantic spread.

The genetic data also provide a possible answer to a clinical curiosity. Melioidosis symptoms differ geographically, particularly between South-East Asia and Australia – potentially because the genetic make-up of isolates varies markedly between these two locations.

Reference
1. Chewapreecha C et al. Global and regional dissemination and evolution of Burkholderia pseudomallei. Nat Microbiol. 2017; 2: 16263.

5 Genomic surveillance in NHS reveals full picture of MRSA spread

Routinely combining genomics with epidemiology within an NHS region would reveal networks of MRSA transmission in hospitals and community settings that would otherwise go undetected, Sanger scientists have found.

An influential 2013 study demonstrated the power of genome sequencing to identify MRSA outbreaks in hospitals and likely chains of infection.1 But MRSA also circulates outside hospitals, so a full picture of its spread will require analysis of samples from the wider community.

In a study published in Science Translational Medicine, Sanger scientists and their colleagues analysed all MRSA samples handled by a large diagnostic laboratory over the course of a year, encompassing samples from three hospitals and 75 GP practices. The study generated both genomic and epidemiological (time and place of sample collection) information for more than 2,000 isolates.2

The analysis revealed 173 transmission clusters, most of which had not been previously identified, varying in size from two to 44 individuals and involving nearly 600 people in total. Some clusters were centred on hospitals, others were based in the community, and some involved transmission between the two. While MRSA strains are generally thought to be adapted to the hospital environment, hospital-associated lineages were perfectly able to spread in the community, with transmission seen in households, care homes and at GP practices.

The data point to significant shortcomings in current infection control procedures. They also highlight the importance of extending MRSA surveillance outside the hospital setting, as transmission networks clearly extend into the community.

More generally, the study suggests that integrated genomic and epidemiologic surveillance could play a pivotal role in outbreak investigation, to support more effective infection control and rapid halting of transmission. A follow-up study is exploring the potential of real-time sharing of genomic data with clinicians, to enable sequencing to be integrated into routine clinical management.

References
1. Harris SR et al. Whole-genome sequencing for analysis of an outbreak of methicillin-resistant Staphylococcus aureus: a descriptive study. Lancet Infect Dis. 2013; 13: 130–6.
2. Coll F et al. Longitudinal genomic surveillance of MRSA in the UK reveals transmission patterns in hospitals and the community. Sci Transl Med. 2017; 9: pii: eaak9745.

Analysis involved 173 transmission clusters, varying in size from 2-44 individuals

6 Genomics cracks Icelandic horse mystery

Genomic sleuthing by Sanger scientists has revealed the origins of a mysterious epidemic affecting the Icelandic horse population.

Icelandic horses have been isolated from the rest of the equine world for more than 130 years

For more than 130 years, the Icelandic horse population has been isolated from the rest of the equine world. This isolation has kept Iceland free of the major diseases that affect horses, but leaves the horse population highly vulnerable to infections they have not encountered before.

Despite strict biosecurity measures, in 2010 the island’s 77,000 horses were badly affected by a mysterious respiratory disease. When conventional methods failed to identify a cause, the Animal Health Trust and Sanger Institute were brought in to investigate. Genomic analyses of 257 samples from affected animals suggested that a virulent strain of a bacterium found in healthy horses, Streptococcus zooepidemicus, was the likely culprit.1

More conventional epidemiological analysis strongly suggested that an equine rehabilitation centre was the source of the Icelandic epidemic. Horses exercised in a communal water treadmill at the centre, which probably provided an ideal breeding ground for the bacterium to thrive and spread from horse to horse.

Since no horses were imported into Iceland, the epidemic strain of S. zooepidemicus probably arrived via contaminated tack or an asymptomatic human carrier from overseas; the strain has been found in a horse in Sweden and in an infected person in Finland.

Reference
1. Björnsdóttir S et al. Genomic dissection of an Icelandic epidemic of respiratory disease in horses and associated zoonotic cases. MBio. 2017; 8: pii: e00826–17.


7 Disturbed bacterial populations seek stability

As well as antibiotics and vaccination, interaction between different strains of bacteria plays an important role in the survival and make-up of bacterial populations.

The alarming rise in antibiotic resistance is often used to illustrate the power of natural selection – resistant microbes prosper at the expense of the susceptible. However, two studies from Sanger scientists in 2017 paint a more complicated picture, with survival also depending on competition between bacterial strains for space in environmental niches.

The first study analysed more than 1,500 Escherichia coli isolates from national and regional hospital collections of bloodstream infections, from 2001–2011. During this time, a globally disseminated multidrug-resistant strain, ST131, appeared, as did a new drug-sensitive strain, ST69. However, rather than taking over completely, ST131 numbers stabilised within a few years; a similar pattern was seen with ST69. Hence, after a new strain emerges, population equilibrium is initially perturbed but then stabilises.1

The likely explanation is that drug resistance is not always a selective advantage. E. coli is a usually harmless bacterium living in the gut, competing for living space. Drug-resistant strains are strongly favoured when bacteria establish bloodstream infections and are treated with antibiotics. But selection appears to act against strains that become too common (an effect known as ‘negative frequency-dependent selection’), halting their expansion and leading to the maintenance of population diversity.

Interestingly, a similar phenomenon is apparent after pneumococcal vaccination. Multiple strains of Streptococcus pneumoniae exist, only some of which are targeted by pneumococcal vaccination. An important question, therefore, is what happens to the population structure when particular strains are eliminated by vaccination.

In a study combining genetic analysis and modelling, Sanger researchers and their international colleagues took a gene-centred view. They discovered that the frequency of individual genes was remarkably similar before and after vaccination, even though the strain composition changed markedly.2 Hence there again appears to be a population equilibrium, driven by competition between strains within an environmental niche, with negative frequency-dependent selection acting as a key stabilising factor both before and after vaccination.

References
1. Kallonen T et al. Systematic longitudinal survey of invasive Escherichia coli in England demonstrates a stable population structure only transiently disturbed by the emergence of ST131. Genome Res. 2017 Jul 18. doi: 10.1101/gr.216606.116.
2. Corander J et al. Frequency-dependent selection in vaccine-associated pneumococcal population dynamics. Nat Ecol Evol. 2017; 1: 1950–1960.

Penicillin use drove methicillin resistance

Staphylococcus aureus acquired the key gene for methicillin resistance, mecA, years before methicillin began to be used therapeutically, a genomic analysis of more than 200 historical S. aureus samples has revealed.1 The analysis suggests that MRSA emerged in the 1940s, more than a decade before the first medicinal use of methicillin. It is likely that the use of penicillin selected for a genetic element including the mecA gene, and these strains had a selective advantage once methicillin was introduced in 1959 – leading to the detection of methicillin-resistant strains within a year.

Reference
1. Harkins CP et al. Methicillin-resistant Staphylococcus aureus emerged long before the introduction of methicillin into clinical practice. Genome Biol. 2017; 18: 130.

In bloodstream infections, no single E. coli strain ever completely outcompetes all other strains; instead, any new strain finds a stable equilibrium with the others
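The stabilising effect of negative frequency-dependent selection described for E. coli and pneumococcal populations can be illustrated with a toy two-strain model. This is only a sketch under stated assumptions: the fitness rule, the equilibrium frequency (0.2) and the selection strength (0.5) are hypothetical values chosen for illustration, and the model is not the one used in the studies cited above.

```python
def next_frequency(freq, equilibrium, strength):
    """Advance a two-strain population by one generation.

    Hypothetical rule: the focal strain's relative fitness falls as its
    frequency rises above `equilibrium` and rises when it is rarer.
    """
    fitness_focal = 1.0 + strength * (equilibrium - freq)
    fitness_other = 1.0
    mean_fitness = freq * fitness_focal + (1.0 - freq) * fitness_other
    return freq * fitness_focal / mean_fitness


def simulate(start_freq, equilibrium=0.2, strength=0.5, generations=500):
    """Return the focal strain's frequency in every generation."""
    freq = start_freq
    history = [freq]
    for _ in range(generations):
        freq = next_frequency(freq, equilibrium, strength)
        history.append(freq)
    return history


if __name__ == "__main__":
    # A newly emerged (rare) strain and an over-represented strain.
    rare_start = simulate(0.01)
    common_start = simulate(0.90)
    print(f"rare start ends at   {rare_start[-1]:.3f}")
    print(f"common start ends at {common_start[-1]:.3f}")
```

In this sketch a strain that starts rare expands and one that starts common contracts, and both settle near the same intermediate frequency instead of taking over or disappearing. That is the qualitative pattern the text describes for ST131 and ST69.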

More than 200 historical S. aureus samples were analysed

Our work Malaria

In this section
1 Mosquito diversity threatens control efforts
2 Malaria family ties revealed
3 Landmark study of half the genes in the malaria parasite genome
4 Missing link could boost RH5-based malaria vaccines
5 Genes and cell interactions drive T-cell fate
6 Attack all sides of malaria at once
7 Why some people are naturally resistant to malaria
8 Grand scale reveals hidden life of malaria parasite

1 Mosquito diversity threatens control efforts

The largest ever genetic study of malaria-transmitting mosquitoes has found that they are among the most genetically diverse creatures on Earth – a finding with important consequences for mosquito control.

Control of Anopheles mosquitoes, principally through use of insecticides, has made a large contribution to recent falls in malaria infections. However, a surge in insecticide resistance threatens to derail mosquito control efforts.

To gain a better understanding of malaria-transmitting mosquitoes, the international Anopheles gambiae 1000 Genomes Consortium sequenced the genomes of 765 Anopheles mosquitoes from 15 locations in eight countries. As reported in Nature, the Anopheles genome turns out to be remarkably variable, containing more than 50 million sites of single nucleotide variation in the 141 million base pairs of the genome that the team could analyse.1

This variation has significant implications for control. It provides abundant raw material from which resistance to insecticides might derive. In addition, it will be an obstacle to ‘gene drive’ control measures, which seek to spread sterility genes through mosquito populations. Gene drive relies on gene-editing methods that target specific stretches of DNA sequence, many of which show high levels of genetic variability. More positively, the study identified a set of genes free of genetic variability that could be targeted in gene drive projects.

The data revealed that genetic variants responsible for insecticide resistance have been under intensive selective pressures, and spread more readily than previously thought. Indeed, genes seem to flow with remarkable rapidity across the African continent, not only through independent emergence but also by mosquito migration.

Reference
1. The Anopheles gambiae 1000 Genomes Consortium. Genetic diversity of the African malaria vector Anopheles gambiae. Nature. 2017; 552: 96–100.

2 Malaria family ties revealed

Genome sequences of the remaining elusive species of human-infecting malaria parasites have shed new light on Plasmodium family relationships and evolution.

In Nature, an international consortium led by Sanger researchers has reported the genome sequences of three relatively rare malaria parasites causing human disease – Plasmodium malariae and two species of P. ovale.1

Although less common and deadly than species such as P. falciparum, the three still account for some 10 million cases a year and can establish infections that last for years. Because of their rarity and resistance to culturing, very little is known about their biology. The new genome sequences will facilitate the development of new diagnostic tests to distinguish these unusual forms of malaria and potentially also new drugs and vaccines.

Furthermore, comparisons between human-infecting and other malaria parasites have revealed how they are related to one another, and shone light on genetic changes associated with adaptation to human hosts. The data provide insight into rapidly evolving genes involved in host invasion, including two new families of genes coding for proteins resembling the RH5 vaccine target.

Reference
1. Rutledge GG et al. Plasmodium malariae and P. ovale genomes provide insights into malaria parasite evolution. Nature. 2017; 542: 101–104.

3 Landmark study of half the genes in the malaria parasite genome

New research shows why vaccines often fail, but also reveals targets for effective drug treatment.

A mouse model of malaria infection developed by Sanger researchers has enabled innovative genetic techniques to be used to investigate parasite biology. In Cell, Sanger researchers and their colleagues in the PlasmoGEM consortium report the most extensive study ever undertaken of malaria parasite gene function, assessing the contributions to growth made by more than half the 4,600-4,700 genes in the parasite genome.1

Using the mouse model, the team developed an innovative high-throughput gene knockout strategy for Plasmodium berghei. As well as eliminating individual genes, the technique added a unique barcode to the genome of each altered P. berghei cell. The barcodes enabled researchers to count the numbers of progeny cells generated in each knockout line, so the effects of each knockout on parasite growth could be determined.

The good news is that a surprisingly large proportion of the 2,578 genes assayed – around two-thirds – were essential or important to growth. This probably reflects the fact that the parasite genome has been slimmed down to the bare minimum number of genes for survival. Encouragingly, this suggests that a relatively large number of genes could be good targets for drug development.

Less positively, genes that encode proteins visible to the host immune system were much more disposable – potentially explaining why it has proven so challenging to develop malaria vaccines targeting single antigens.

Percentage of genes required for normal growth
S. cerevisiae (Baker’s yeast): 34.9%
Trypanosoma brucei (Sleeping sickness): 36.6%
Toxoplasma gondii (Toxoplasmosis): 40.0%
Plasmodium berghei (Malaria): 62.9%

Reference
1. Bushell E et al. Functional profiling of a Plasmodium genome reveals an abundance of essential genes. Cell. 2017; 170: 260–272.e8.
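The barcode-counting idea in the Plasmodium berghei knockout study can be sketched in a few lines of Python. This is a minimal illustration and not the PlasmoGEM analysis pipeline. The barcode sequences, knockout names and read lists are hypothetical. Comparing each barcode's share of reads between an earlier and a later sample is one simple way, assumed here, to judge whether a knockout line grows faster or slower than the pool as a whole.

```python
from collections import Counter

# Hypothetical barcode -> knockout line mapping (illustrative only).
KNOWN_BARCODES = {
    "ACGTACGT": "knockout_A",
    "TTGACCGA": "knockout_B",
}


def count_barcodes(reads, known_barcodes):
    """Count reads per known barcode, ignoring unrecognised sequences."""
    counts = Counter(read for read in reads if read in known_barcodes)
    return {barcode: counts.get(barcode, 0) for barcode in known_barcodes}


def relative_abundance(counts):
    """Convert raw counts into each barcode's fraction of the pool."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no recognised barcodes in sample")
    return {barcode: n / total for barcode, n in counts.items()}


def abundance_ratio(early_reads, late_reads, known_barcodes):
    """Ratio of late to early abundance for each knockout line.

    Values below 1 suggest the knockout slows growth relative to the pool;
    lines absent from the early sample are skipped.
    """
    early = relative_abundance(count_barcodes(early_reads, known_barcodes))
    late = relative_abundance(count_barcodes(late_reads, known_barcodes))
    return {
        known_barcodes[barcode]: late[barcode] / early[barcode]
        for barcode in known_barcodes
        if early[barcode] > 0
    }


if __name__ == "__main__":
    early_sample = ["ACGTACGT"] * 50 + ["TTGACCGA"] * 50 + ["NNNNNNNN"] * 3
    late_sample = ["ACGTACGT"] * 90 + ["TTGACCGA"] * 10
    print(abundance_ratio(early_sample, late_sample, KNOWN_BARCODES))
```

In the made-up data, knockout_B loses ground between the two samples, which is the kind of signal that would mark the deleted gene as important for growth.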


4 Missing link could boost RH5-based malaria vaccines

Sanger researchers have identified the protein that plays a key role in the malaria parasite’s attachment to red blood cells.

In 2011, Sanger researchers demonstrated that the RH5 protein of Plasmodium falciparum is essential for the parasite’s invasion of red blood cells.1 Disrupting the interaction of this parasite protein with the basigin protein on the surface of human red blood cells completely inhibits parasite proliferation in vitro. Subsequent studies have revealed that targeting this pfRH5-basigin interaction offers great promise for developing both vaccines and drugs.2,3

However, there has been one vital piece missing from the picture. The parasite RH5 protein lacks any structure that would anchor it to the parasite surface, suggesting that it is a secreted protein. But, if this is the case, how can it mediate the parasite’s attachment to the red blood cell?

Sanger researchers have solved this mystery by screening a library of parasite proteins using a sensitive assay, developed at the Institute, to detect those that bind to RH5.4 Their work revealed that a parasite surface protein known as P113 binds to one end of RH5, anchoring it to the parasite. When RH5 then binds to basigin on the surface of human red blood cells, invasion begins. The P113-binding end of RH5 is cleaved off as the parasite enters the red blood cell, with internalisation driven by other protein interactions between parasite and host cell.

The team also chemically synthesised a peptide corresponding to the P113-binding portion of RH5, and showed that it retained the ability to bind P113 and elicited protective antibodies. Such a peptide has exciting potential as a component of a multisubunit vaccine.

The strategy of targeting RH5 has also received a boost from a phase 1 vaccine trial carried out in Oxford, involving Sanger researchers. This study showed that RH5 delivered via viral vectors stimulated anti-RH5 antibody responses far in excess of those seen in adults exposed to natural malaria infections.5

References
1. Crosnier C et al. Basigin is a receptor essential for erythrocyte invasion by Plasmodium falciparum. Nature. 2011; 480: 534–7.
2. Zenonos Z et al. Basigin is a druggable target for host-orientated antimalarial interventions. JEM. 2015; 212: 1145–51.
3. Wanaguru M et al. RH5-Basigin interaction plays a major role in the host tropism of Plasmodium falciparum. Proc Nat Acad Sci. 2013; 110: 20735–40.
4. Galaway F et al. P113 is a merozoite surface protein that binds the N terminus of Plasmodium falciparum RH5. Nat Commun. 2017; 8: 14333.
5. Payne RO et al. Human vaccination against RH5 induces neutralizing antimalarial antibodies that inhibit RH5 invasion complex interactions. JCI Insight. 2017; 2: pii: e96381.

5 Genes and cell interactions drive T-cell fate

Single-cell genomics and computational modelling have been used to track the fate of CD4 T cells after malaria infection in mice.

Single-cell RNA sequencing represents a major technological advance, generating gene expression data on individual cells rather than populations. One key application is in tracking the differentiation pathways followed by cells, including immune cells after infection – work that could identify genes critical to effective immune responses to pathogens.

In an international collaboration, Sanger researchers used single-cell techniques to explore how mouse precursor CD4 T cells – important orchestrators of immune responses – develop into specialised subclasses following infection with the malaria parasite.1

Single-cell RNA sequencing combined with computational modelling of gene expression data revealed patterns of gene activity associated with a split into either Th1 or Tfh cells, which have distinct roles in immune responses. A third cell type, Tr1 cells, emerged as a spur from the Th1 lineage. Development of Th1 cells was dependent on expression of galectin-1, a protein previously implicated in regulating immune responses and a potential drug target.

The results provide new insights that could potentially be exploited to nudge CD4 T-cell differentiation in ways that protect against malarial infection. More generally, the computational tools developed could be used to dissect the differentiation of other important cell types.

Reference
1. Lönnberg T et al. Single-cell RNA-seq and computational analysis using temporal mixture modelling resolves Th1/Tfh fate bifurcation in malaria. Sci Immunol. 2017; 2: pii: eaal2192.

Insights into CD4 T-cell differentiation could point to ways to boost immune response to malaria

6 Attack all sides of malaria at once

Creating a malaria vaccine that targets a number of different parasite proteins – each of which operates at a different stage of red blood cell invasion – could be the secret to effective prevention.

Malaria vaccine development faces two major challenges: the complexity of the parasite life cycle and the parasite’s genetic diversity. This great diversity has meant that there is no highly effective vaccine currently available to combat the disease. Research from the Sanger Institute and its partners in 2017 suggests that a vaccine targeting a number of different proteins, operating at different stages of red blood cell invasion, would be much more powerful.

In 2014 Sanger scientists and their collaborators reported that children whose immune systems responded to several proteins found on the surface of bloodstream stage parasites were at reduced risk of developing malaria over the following six months.1 Furthermore, children with good responses across multiple antigens were almost completely protected against malaria.

This highly encouraging study relied on technology developed by Sanger scientists which enables the external domains of Plasmodium surface proteins to be produced in large quantities. Using this technology, Sanger researchers generated 29 antigens involved in invasion of red blood cells and, in 2017, showed that antibodies against several of these proteins inhibited cell invasion across multiple parasite strains.2

Importantly, a number acted synergistically and high-resolution videomicroscopy suggested that this compounding of effect was due to targeting different aspects of the invasion process. Notably, with colleagues in Mali, the team also found that people carrying combinations of antibodies against these proteins were at reduced risk of malaria.

The study provides further support for multiple antigen targeting, and has identified at least five new potential targets for vaccine development. Moreover, the technology offers opportunities for additional large-scale studies to identify further targets.

References
1. Osier FH et al. New antigens for a multicomponent blood-stage malaria vaccine. Sci Transl Med. 2014; 6: 247ra102.
2. Bustamante LY et al. Synergistic malaria vaccine combinations identified by systematic antigen screening. Proc Natl Acad Sci USA. 2017; 114: 12045–12050.

Sanger researchers generated 29 antigens involved in invasion of red blood cells

“By bringing together multiple areas of expertise, from genomics to large field studies of patients in Mali, and down to advanced video microscopy observing individual parasites, we have discovered several new vaccine targets.”
Dr Julian Rayner
Group Leader at the Sanger Institute


7 Why some people are naturally resistant to malaria

Painstaking genetic reconstructions by Sanger researchers have revealed how rearrangements in glycophorin genes can reduce some people’s risk of developing severe malaria by up to 40 per cent.

In 2015, the Sanger co-led Malaria Genomic Epidemiology Network (MalariaGEN) reported in Nature that variation at the site of glycophorin genes was found to provide significant protection against severe malaria.1 Glycophorins are a family of proteins that the malaria parasite interacts with as it invades host red blood cells, but it was unclear exactly which genetic features were protective.

To find the answers, the MalariaGEN team collected genome sequence information from an additional 765 people from 10 African ethnic groups. By combining this genomic data with information from the 1000 Genomes Project, and imputing the results into a data set of 4,579 people with severe malaria and 5,310 controls, the team was able to piece together the stretch of DNA containing glycophorin genes.2

The critical finding, published in Science, was that resistance to severe malaria was associated with a complex rearrangement, dubbed DUP4. DUP4 includes five glycophorin genes, including two copies of a fusion between the GYPA and GYPB genes. The fusion genes resembled an unusual variant of the MNS blood group system, known as Dantu, and the team went on to show that DUP4 is indeed the basis of the Dantu blood group variant.

Notably, DUP4 was found only in East African populations, suggesting it has arisen recently and been selected for because of the protection it offers against severe malaria. Further work could reveal the mechanism underlying this protective effect, and hence possible new ways to intervene to prevent severe disease.

References
1. Malaria Genomic Epidemiology Network et al. A novel locus of resistance to severe malaria in a region of ancient balancing selection. Nature. 2015; 526: 253–7.
2. Leffler EM et al. Resistance to malaria through structural variation of red blood cell invasion receptors. Science. 2017; 356: pii: eaam6393.

Malaria infects roughly 212 million people a year

Malaria causes an estimated 429,000 deaths each year

765 volunteers from Gambia, Burkina Faso, Cameroon and Tanzania were sequenced

8 Grand scale reveals hidden life of malaria parasite

Two projects in the Malaria programme demonstrate how Sanger scientists are using large-scale sequencing and analysis to discern the most intimate details of the malaria parasite’s evolution and life cycle. One is applying whole-genome sequencing to monitor and understand the evolution of drug resistance across countries, while the other is unlocking the individual actions of single parasites at different stages in their life cycles.

In The Lancet Infectious Diseases, Sanger researchers detailed how their analysis of 1,500 Plasmodium falciparum genomes gathered over 11 years in 11 locations across South-East Asia has revealed how a multidrug-resistant form of the parasite evolved in Cambodia.1 In 2008 a powerful antimalarial drug combination – dihydroartemisinin and piperaquine (DHA-PPQ) – was made the official first-line treatment in Cambodia and, within a year, the first resistant parasites had emerged. For the next five years, this resistant strain spread under the radar across Cambodia, outcompeting all other parasite strains.

The work shows that population-wide genomic surveillance of malaria can detect the emergence of drug resistance much earlier than standard procedures, and supports the need for active genomic surveillance in malaria-affected countries.

At the other end of the scale is research applying single-cell technologies to understand every step of the parasite’s life cycle. As part of the new Malaria Cell Atlas project now being conducted at the Sanger Institute, researchers analysed more than 500 individual Plasmodium parasites during their blood-borne stage.2

On average, the team was able to detect the activity of almost 2,000 genes in each parasite, just under half the genetic complement of the parasite and the largest number surveyed in individual malaria cells. This intimate view of the parasite’s gene activity has revealed that a long-standing view of the genome’s action was wrong. Instead of genes turning on and off in a slow cycle, the scientists discovered that whole sets of genes switched on and off in unison, suggesting that regulatory elements play a vital role.

The information they have gathered is being made freely available on an interactive, open-access Malaria Cell Atlas website.

Visit Malaria Cell Atlas site here
www.sanger.ac.uk/science/tools/mca/mca/

References
1. Amato R et al. Origins of the current outbreak of multidrug-resistant malaria in southeast Asia: a retrospective genetic study. The Lancet Infectious Diseases. 2018; 18: 337–345.
2. Reid A et al. Single-cell RNA-seq reveals hidden transcriptional variation in malaria parasites. eLife. 2018; 7: e33105.

The activity of almost 2,000 genes in each parasite was detected

Our approach

We foster strong collaborations with scientists, clinicians, institutions, governments and society for mutual benefit

Scale
Genomic inquiry requires vast volumes of data, experimental models and computational power. Our Institute’s unique, scalable and robust infrastructure delivers – both for us and researchers worldwide.

Innovation
To take our research findings to the next level and deliver transformative technologies we work in collaboration with biotechnology and pharmaceutical industries and funders.

Culture
As genomic research begins to impact clinical practice and society, our researchers are crossing traditional divides to work with entrepreneurs, health services and society.

Influence
By leading global initiatives and facilitating cross-cutting partnerships we seek to lay the foundations for a strong and vital future of genomic research, data sharing and clinical application.

Connections
We use the power of the internet and collaboration tools to build genomic research capacity worldwide and facilitate the next wave of discovery.

Our approach Scale

Scale

In this section
1 Understanding human life – one cell at a time
2 5,000,000,000,000,000 bases of DNA and counting
3 Sanger partners with UK Biobank to sequence 50,000 people’s whole genomes

1 Understanding human life – one cell at a time

The Sanger Institute, in conjunction with the Broad Institute, is co-leading the Human Cell Atlas initiative, which seeks to create a comprehensive reference map of every cell in the human body. Just like the Human Genome Project more than 25 years ago, the scope of the endeavour is vast, requiring both worldwide collaboration and the development of new cutting-edge techniques and methods of analysis.

The ultimate aim of the Human Cell Atlas is to identify, and then investigate the genetic activity of, every cell type that makes up the estimated 37 trillion cells in the human body. The work is divided into 185 projects, focusing on 22 different tissues in the body. So far, more than 480 scientists, in 44 countries around the world, are involved and more than 1.5 million cells have been studied.

One programme in which the Sanger is participating is the Human Developmental Cell Atlas, which is exploring the cells that are important for human development. By March 2018, researchers from the Sanger and Newcastle University had sequenced a quarter of a million separate cells from developing tissues including the kidney, liver, placenta and skin. The ongoing data analyses from these samples will help to reveal which genes are activated, when, and in which cells as the body forms.

The knowledge this project will generate will help to shine a light on the importance of different biological processes at work within cells during development. This could help to unlock understanding of a range of health issues including miscarriage and children’s developmental disorders. In addition, the discoveries made could reveal important insights into the biological pathways involved in ageing and cell repair, which could lead to advances in regenerative medicine.

Human Cell Atlas in numbers: 185 projects, 482 scientists, 22 tissues, 44 countries.

2 5,000,000,000,000,000 bases of DNA and counting

The Sanger Institute has one of the world’s largest DNA sequencing centres, enabling us to conduct genomic experiments at a scale that few can match. In fact, in 2018, our sequencing teams celebrated a remarkable achievement: they had read more than 5 Petabases of DNA.

To put the figure into context, it is the equivalent of reading 1,666,667 human genomes once. Or, if you strung together all the DNA molecules that would make up 5 Petabases, it would be 3,000km long. That’s the same as travelling from Earth to the International Space Station seven and a half times.

Yet these figures do not tell the whole story. Honed by 25 years of continuous development to extract the maximum output from the latest sequencing machines, Sanger’s sequencing teams are always searching for more ways to increase the speed of delivery. Using high-throughput DNA sequencing, the first petabase of DNA took 1,952 days to read; in contrast, the fifth petabase took just 169 days – a more than 10-fold increase in speed.

In fact, at this current rate, by the time you have finished reading this sentence, the Sanger’s sequencing teams will have read another 273 million bases of DNA code.

Time taken to sequence 1 Petabase by Wellcome Sanger Institute
2009–2014: 1,952 days
2014–2016: 434 days
2016: 329 days
2016/17: 247 days
2017/18: 169 days

3 Sanger partners with UK Biobank to sequence 50,000 people’s whole genomes

The Sanger Institute will be applying its expertise in high-throughput DNA sequencing to read the full genomes of 50,000 people to a gold standard for the UK Biobank. The Sanger was chosen because of its sequencing teams’ proven experience in delivering high-quality human genome data at a scale that few in the world can match.

To deliver this work is a major undertaking, and demonstrates the ongoing revolution in DNA sequencing speeds. Reading 50,000 human genomes 30 times over to deliver gold standard reads will require the sequencing of 4.5 Petabases of DNA code. To put this into context, the project will require the Sanger’s sequencing teams to read almost the same amount of bases as the entire sequencing output of the Institute over its 25-year history.

UK Biobank’s aim is to improve the diagnosis and treatment of a wide range of serious and life-threatening illnesses, including cancer, heart disease, stroke, diabetes, arthritis and osteoporosis. For this reason, all the data generated by the Sanger will be made freely available by the UK Biobank to researchers in the UK and around the world to power future research.

The genomic information generated by Sanger will join UK Biobank’s health and wellbeing databases that contain detailed information on medical histories, lifestyles, imaging and biochemical analyses. It is hoped that this resource will drive worldwide research by providing scientists with the high-quality data they need to conduct their investigations, sparing them the time and money that they would otherwise need to spend conducting experiments to gather the same information.

Also, this work may only be the beginning. UK Biobank contains DNA samples from 500,000 individuals. If successful, this project may lead to the sequencing of every UK Biobank volunteer’s samples, creating the world’s most detailed whole-genome database.
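The headline figures in this section can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes a human genome of roughly 3 billion bases (a figure not stated in the text, but the one implied by "1,666,667 human genomes"), and the function and variable names are invented for this example.

```python
# Back-of-the-envelope check of the sequencing figures quoted above.
# Assumption (not stated in the report): one human genome is ~3 billion bases.
HUMAN_GENOME_BASES = 3_000_000_000
PETABASE = 10**15
SECONDS_PER_DAY = 24 * 60 * 60


def genomes_equivalent(total_bases, genome_bases=HUMAN_GENOME_BASES):
    """How many genomes, read once each, add up to total_bases."""
    return total_bases / genome_bases


def bases_required(n_genomes, coverage, genome_bases=HUMAN_GENOME_BASES):
    """Bases to read when every genome is sequenced `coverage` times over."""
    return n_genomes * coverage * genome_bases


def bases_per_second(total_bases, days):
    """Average read rate if total_bases are read in `days` days."""
    return total_bases / (days * SECONDS_PER_DAY)


# 5 Petabases expressed as human genomes read once.
five_pb_as_genomes = genomes_equivalent(5 * PETABASE)

# UK Biobank project: 50,000 genomes, each read 30 times over.
biobank_bases = bases_required(50_000, 30)

# Speed-up between the first petabase (1,952 days) and the fifth (169 days).
fold = 1952 / 169

# Time to read 273 million bases at the rate of the fifth petabase.
rate = bases_per_second(PETABASE, 169)
seconds_for_273m = 273_000_000 / rate

print(f"5 Pb is about {five_pb_as_genomes:,.0f} genomes")
print(f"UK Biobank needs {biobank_bases / PETABASE} Pb")
print(f"Speed-up: {fold:.1f}-fold")
print(f"273 million bases take about {seconds_for_273m:.0f} s")
```

Under that assumption the numbers agree with the text: 5 Petabases is about 1,666,667 genomes, 50,000 genomes at 30 times coverage is 4.5 Petabases, the speed-up is between 11- and 12-fold, and 273 million bases corresponds to roughly four seconds of sequencing at the most recent rate.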


Innovation

In this section
1 Takeda joins Open Targets
2 Sanger spin-outs go from strength to strength
3 Kymab is one of Deloitte’s UK Technology Fast 50

1 Takeda joins Open Targets

In December 2017 Takeda Pharmaceutical became the fifth member of the pioneering precompetitive partnership – Open Targets – which brings together the Sanger Institute, EMBL-EBI and pharmaceutical companies to speed drug development and delivery through the application of big data.

The Open Targets initiative was launched in 2014 to address a key challenge in drug development: that almost 90 per cent of all compounds entering clinical trials fail to become licensed due to insufficient efficacy or patient safety. The root of this failure is often a lack of understanding of the biological target the drug is acting on.

Open Targets addresses this issue by marrying the expertise of two scientific worlds to create a critical mass of knowledge and data that could not exist in any one organisation. The initiative draws together the public domain expertise of the Sanger Institute and EMBL-EBI in generating and interpreting data from genomics, proteomics, chemistry and disease biology with the private research of Biogen, GSK and Takeda in areas spanning disease epidemiology, preclinical animal modelling, and biological processes.

The result of this public-private enterprise is that Open Targets’ partners are able to systematically identify small molecules that are most tailored to a disease’s biological targets and prioritise those most likely to succeed for development. The goal is to reduce both the time and cost of drug development.

A cornerstone of this unique endeavour, hosted on the Wellcome Genome Campus, is the Open Targets Platform (www.targetvalidation.org). This site openly shares all the sequence data and information gathered by the initiative to the wider scientific community to drive research and avoid duplication. As of February 2018, the platform offers information on 20,974 targets, 2,306,670 associations and 9,728 diseases from 17 data sources.

Figure: expertise contributed by the Open Targets partners
• Experts in life-science data integration and analysis. Access to vast public domain resources
• Expertise in the role of genetics in disease. Extensive experimental capabilities
• Leading neurodegenerative diseases and innovative haemophilia therapies
• Expertise in disease biology. Translational medicine
• Expertise in oncology, gastroenterology and central nervous system disease areas

2 Sanger spin-outs go from strength to strength

Two companies founded on long-term research developed at the Sanger Institute continue to garner success.

Congenica, the genome analysis company founded by Dr Matt Hurles and Dr Richard Durbin, successfully completed an £8 million Series B funding round and partnered with US-based Edico Genomics to offer an all-in-one genome analysis solution. It is already supplying services to […] Ltd to help deliver the UK 100,000 Genomes Project and has secured contracts and partnerships around the world.

A key component of Congenica’s offering is Sapientia™, a clinical decision management support platform that highlights potential disease-causing variants in a patient’s genome and links these to the observed symptoms and supporting research literature. This platform is being used to help deliver China’s 100K Wellness Pioneer Project and Portugal’s In2 Genome genetic diagnosis project, and is part of a memorandum of understanding, signed during the UK Prime Minister’s tour of […], to bring clinical diagnosis support to China’s national digital health platform.

Image: Clinical scientists using Sapientia™ by Congenica

Microbiotica, Sanger’s most recent company, has won a number of awards in its first year of operation. Founded in December 2016 by the Institute’s Dr Trevor Lawley and Professor Gordon Dougan, the spin-out seeks to develop and commercialise bacteriotherapies and biomarkers based on the human gut microbiome. The company has unique access to the techniques and bacterial libraries generated in the Institute to culture, characterise and […] the majority of a patient’s gut bacteria.

Since its inception, Microbiotica has won OBN’s UK Best Start-Up Biotech 2017 award, Biotech and Money’s ‘Life Science Spin-out of the Year’ 2017 award and the One Nucleus Summer 2017 BioNewsRound Award.

3 Kymab is one of Deloitte’s UK Technology Fast 50

Kymab, a company spun out from the Sanger Institute to develop antibody and vaccine treatments, has been named by Deloitte as one of the UK’s 50 fastest growing technology companies in 2017, based on its past four years of revenue growth. The company, founded on genome engineering developed by Sanger’s Director Emeritus Professor […], has also been named one of Labiotech’s top nine companies to watch in 2018.

The company’s core technology is the Kymouse™ platform that recreates the entire diversity of the human immune system’s B lymphocyte component in a humanised mouse. Using this platform, Kymab’s researchers are exploring the role immunological processes play in health and disease to generate new, fully human, monoclonal antibody drugs and vaccines for diseases ranging from cancer to infectious disease.

Founded in 2010, the company has raised a total of $220 million in three rounds of investment funding. To deliver new products, the company has developed strategic partnerships in the fields of autoimmunity, cancer, blood-related diseases and infectious disease with Heptares, Novo Nordisk, the Bill and Melinda Gates Foundation, and MD Anderson Cancer Center.

One of its human monoclonal antibody treatments – KY1005 – is currently in clinical trial, having shown strong potential in dampening exaggerated immune responses in bone marrow transplant patients, a common and potentially deadly complication. The drug works by addressing an underlying immune system imbalance found in many autoimmune conditions that prolongs T-cell response, causing tissue and organ damage.

Another of its antibodies – KY1044 – is on course to enter clinical trial in 2019. This drug works by improving the ability of a person’s immune system to recognise and kill cancer cells. In particular, it has two important effects: it stimulates tumour-fighting immune cells and kills T regulatory cells, which are often found in tumours and reduce the body’s immune response.


Culture

In this section
1 Sanger to train the next generation of scientific leaders
2 Free online course spreads genomic knowledge around the world
3 Sanger hosts UK’s first data scientist degree apprenticeship scheme

1 Sanger to train the next generation of scientific leaders

As life sciences research becomes ever more complex, there is a need to develop leaders who can inspire and build strong communities of researchers dedicated to a common goal – both within single laboratories and across continent-spanning research networks. Yet the skills needed to successfully steer a project to conclusion are not taught to researchers at university, nor are commercially available leadership courses well placed to meet the unique demands of life science research.

To ensure that the next generation of life science leaders are empowered with the management, negotiating and organisational skills needed for success, the Sanger Institute has started to share its bespoke internal leadership training programmes with the wider life sciences community.

The courses have been specifically created to meet the needs of the Sanger Institute as it leads global networks of scientists in delivering cutting-edge science at scale. Sanger’s long experience of coordinating and motivating diverse groups of scientists in the Institute and across global research networks, while also running one of Europe’s largest high-throughput DNA sequencing and analysis pipelines, means that its training is ideally suited to the needs of the genomic and biological research community.

The Institute’s Scientific Education and Excellence Development (SEED) Scheme offers five training programmes. Each one is designed to support life science research leaders at very different stages of their career, from newly promoted supervisors and managers, through to more established managers and group leaders.

All the programmes focus on helping the participants effectively transfer their learning back to their day-to-day roles. Development is supported through interactive training sessions, feedback and personality profiling. In addition, by structuring the courses to encourage face-to-face discussion, the participants are given plenty of opportunities to build meaningful networks, share best practice ideas and develop behaviours that will drive their own success and that of their teams.

2 Free online course spreads genomic knowledge around the world

For the first time, the Sanger Institute has helped to deliver a freely available web-based genomics course that will boost understanding among healthcare professionals and the public. The course covers the key role that DNA sequencing and genetic analysis can play in tracking and limiting disease outbreaks and the spread of antibiotic resistance.

Over the past eight years the Sanger has pioneered the use of genetic sequencing to track the spread of bacterial diseases through hospital wards, countries, continents and around the globe. Groundbreaking research into the spread of MRSA in a hospital ward demonstrated how applying genomics in a healthcare setting can provide early detection of disease transmission and allow clinical intervention to contain its spread. Additional research studies have shown that genetically analysing a bacterium’s drug resistance profile can help to guide clinicians to the most suitable choice of drug therapy far more quickly than current laboratory-based techniques.

Yet understanding of the use and power of genomic techniques among healthcare professionals is limited.

To address this knowledge gap, Sanger scientists from three research groups in the Infection Genomics Programme have contributed to a pioneering online course. The resulting training package, ‘Bacterial Genomes: Disease Outbreaks and […]’, has been created by Advanced Courses and Scientific Conferences in partnership with FutureLearn, a social learning platform founded by the Open University, and the […].

This course is the first of a series of 10 that will be created in conjunction with the Advanced Courses and Scientific Conferences team and FutureLearn. It has been specifically designed for doctors and nurses both in the UK and around the world but, because the course does not assume any knowledge of genomics or bacterial disease, anyone who wishes to explore the power of genomics to transform healthcare could take part.

3 Sanger hosts the UK’s first data scientist degree apprenticeship scheme

As genomics and biodata play an increasing role in life sciences research and healthcare delivery, the need for skilled bioinformaticians becomes increasingly acute. Deloitte estimates that the UK genomics industry will grow by 20 per cent this year, while the report ‘Big Data Analytics: Demand for Labour and Skills’ has calculated that 56,000 UK-based new big data jobs will be created every year until 2020.

To help address this urgent need for computer scientists with big data skills, the Sanger Institute has partnered with Anglia Ruskin University’s Degrees at Work team to run the first-ever UK data scientist degree apprenticeship for the profession. Covering data science, software engineering, biology and mathematics, the BSc (Hons) Bioinformatics undergraduate course enables students to develop the unique skills necessary to visualise, analyse and interpret biological data through in-work training.

A truly Wellcome Genome Campus-wide endeavour, the programme is supported by a number of companies in the Biodata Innovation Centre, including Sanger spin-out company Congenica, Eagle Genomics, Genomics England, Global Gene Corp, SciBite and Specific Technologies.

The scheme will start for the 2019/20 academic year and is funded by a Higher Education Funding Council for England Degree Apprenticeship Development Fund.


Influence

In this section
1 School students join Sanger scientists to decode Whipworm’s DNA
2 25 Genomes for 25 Years
3 Sanger Innovations help guide UK BioIndustry Association policies

1 School students join Sanger scientists to decode Whipworm’s DNA

A unique collaborative effort is bringing the world of neglected tropical diseases and bioinformatics out of the laboratory and into classrooms across the UK and Ireland.

The results of the year-long initiative might even see school students included as contributors on research publications, and could set some of them on the way to filling the UK’s bioinformatics skills gap.

Genome Decoders: Whipworm combines the skills of Sanger scientists and researchers in EMBL-EBI with the enthusiasm and curiosity of 10,000 A-level students across the UK to fully annotate the genome of the human whipworm (Trichuris trichiura). Coordinated by the Campus’ Connecting Science Public Engagement team, in partnership with The Institute for Research in Schools (IRIS), the project aims to annotate all of the estimated 15,000 genes in the worm’s genome.

Infection by the worm is called trichuriasis and it affects millions of children in tropical countries where sanitation is poor. It places a substantial health and economic burden on children by preventing them from attending school and damaging their economic futures. It is hoped that their UK peers will help to alleviate this burden by identifying targets for vaccine development and novel therapies.

Not only does this collaboration help to tackle a debilitating neglected tropical disease, it also addresses the pressing need for bioinformaticians. By engaging the students in a real-world scientific endeavour, the participants experience first-hand the excitement and challenge of a career in […].

The high-quality DNA sequence they are interrogating was recently produced by the Sanger’s Pathogen Genomics research group, and the analysis software that they are using has been supplied by the Wormbase team at EMBL-EBI and is regularly used by professional bioinformaticians.

But perhaps the most innovative result of the project may take longer to deliver: those pupils whose work substantively contributes to any future research papers will be included in the contributors list on the publications.

Image: Whipworm: UK school students are helping to analyse the genome of a parasite that affects millions of children in tropical countries

2 25 Genomes for 25 Years

To celebrate its 25th anniversary, the Sanger Institute is marking the event in a way that uniquely embodies its values and mission – by generating genomes and biodata that are made freely available to the scientific community to drive research that would otherwise not be possible. The resulting project, 25 Genomes for 25 Years, is sequencing the genetic code of 25 UK species whose genomes are not currently available to the scientific community. To read more about the project, see ‘How 25 genomes are connecting Sanger with another world of biological research’ on page 51.

The project also offered the opportunity to embrace a key goal of the Wellcome Genome Campus, on which the Sanger is located: to engage with the public about the power of genomic data and bioinformatics to drive research and, ultimately, improve outcomes.

Organised by the Wellcome Genome Campus Public Engagement Team and partners, an unusual version of ‘I’m a Scientist, Get Me Out of Here’ gave the public the chance to choose five of the species that will be sequenced.

Throughout November 2017 schoolchildren and members of the public went online to quiz scientists about the 42 species that were put forward for inclusion, and what benefits DNA sequencing would provide. Experts from across the country kindly gave up their time to answer the nation’s questions. Some even replied as if they were a member of the species they were championing: for example the Leathery Sea Squirt, Common Crane and Orkney Vole, amongst others.

After five weeks, more than 500 questions, and more than 4,000 votes, the final five species were decided: Common Starfish, Fen Raft Spider, Lesser Spotted Catshark, Asian Hornet and Eurasian Otter.

Image: Asterias rubens: The Common Starfish has remarkable regenerative powers and is one of the genomes being sequenced by the Sanger Institute

3 Sanger Innovations help guide UK BioIndustry Association policies

The Sanger is playing a central role in helping to guide the UK’s support and development of world-leading genomics and biodata businesses. Adrian Ibrahim, Head of the Innovations team at the Sanger Institute, is the inaugural chair of the UK BioIndustry Association’s Genomics Advisory Committee. In this role, he will help to guide the group as it provides the Association with expert guidance on the needs of, challenges to, and opportunities for, genomics companies.

Launched at the BIA’s Committee Summit in early 2018, the group will provide a united voice to raise awareness of the genomics industry and the support it needs to flourish. In particular it will focus on engaging with both the UK Government and international organisations to:

• ensure that the Life Sciences Industrial Strategy and the Sector Deal are delivered in ways that optimally support the launch and growth of UK-based genomics companies
• collaborate with the Healthcare Advanced Research Programme (HARP) to determine the most productive moonshot programmes and ensure that the funding, infrastructure and industry connections are in place to enable success
• ensure that large datasets are used responsibly to deliver medical benefits, while also ensuring data privacy
• support the NHS in embedding genomic medicine into its healthcare systems to improve health outcomes for patients.

Other members of the committee include the Sanger spin-out company Congenica, and Eagle Genomics and Global Gene Corp, which are fellow tenant companies in the Wellcome Genome Campus Biodata Innovation Centre.


Connections

In this section
1 Sanger sets up global antibiotic resistance monitoring network
2 How 25 genomes are connecting Sanger with another world of biological research
3 Sanger is a key partner in Health Data Research UK

1 Sanger sets up global antibiotic resistance monitoring network

A new global genomic surveillance unit is being set up by the Sanger Institute to help tackle the rise and spread of antibiotic-resistant bacteria.

Named as a ‘catastrophic threat’ by the UK Government’s Chief Medical Officer in 2011, the burden of antibiotic-resistant bacteria could cripple healthcare systems if the issue is not successfully managed.

As demonstrated by Sanger researchers, pathogenic bacteria that spontaneously acquire new forms of antibiotic resistance, or resistance to many of the most commonly used drugs, can appear from anywhere in the world at any time. The work of Sanger Infection Genomics scientists has shown how these bacteria are then carried swiftly around the world by international travel, leading to national epidemics and global pandemics. Their research has also proved that these, often silent, developments could be detected early through genomic surveillance – regularly sequencing the genomes of circulating bacteria in populations across the world. This would allow governments to deploy effective and timely international public health strategies.

The first steps towards such a comprehensive, globally connected early warning system are being taken by the Centre for Genomic Pathogen Surveillance, within the Sanger Institute. The Centre is an initiative dedicated to developing and disseminating best practice in genomic epidemiology, laboratory techniques and software engineering to improve the global surveillance of pathogenic bacteria.

Following the award of £6.8 million Official Development Assistance funding by the UK National Institute for Health Research (NIHR), the Centre is now working to establish a global genomic monitoring network spread across low- and middle-income countries.

Four sentinel sites are being established to conduct whole-genome DNA sequencing and genomic surveillance of antibiotic-resistant bacteria in Colombia, India, Nigeria and the Philippines. These sites were selected because they are already involved in their nations’ antimicrobial programmes and are the laboratory testing stations for a number of local hospitals and healthcare networks. They also occupy strategic positions around the world, acting as gateways to understanding the landscape of antibiotic resistance in their respective continents.

The NIHR Global Health Research Unit on Genomic Surveillance of Antimicrobial Resistance, based at Sanger, will supply resources and training in genomic methods to the sentinel sites. This support will enhance their detection and research capacities, and help to embed cutting-edge genomic practice into their national healthcare systems. Because the data generated are digital, and will be uploaded to the UK centre, the sentinel sites will be able to share their information through online databases to provide clear, interpretable information for healthcare professionals and government officials.

It is hoped that, through this online network, these sentinel centres will benefit their nations’ populations by providing cutting-edge insights for local health provision and prevention. In addition, by globally sharing their data, a prototype global early warning system will be established.

Image: The Global Health Research Unit will connect with sentinel stations in Colombia, India, Nigeria and the Philippines to genomically monitor drug resistant bacteria worldwide

2 How 25 genomes are connecting Sanger with another world of biological research

For its 25th anniversary, the Sanger Institute is generating a unique present to the UK and wider world: 25 new reference genomes for species that reflect the diversity of life in Britain. In so doing, the Sanger is forming new partnerships beyond its usual sphere of activity and is employing the power of genomics in new research fields to help understanding of ecosystems and preserve endangered species.

In conjunction with the Natural History Museum, the Sanger contacted more than 400 partner groups of wildlife experts to generate a list of 20 species that would be sequenced and a short list of 42 from which the public would select five (see ‘25 Genomes for 25 Years’ on page 49). Many of these species were then championed by these partners, exposing them and the public to the benefits that genomic sequencing could provide to conservation efforts.

In December 2017, the final list of 25 species was decided. Since this time Sanger scientists have been working in partnership with gardeners, university professors, conservation trusts, commercial organisations, environment charities, the Natural History Museum, and the UK Government’s Natural England, to gather samples for sequencing. This experience is illuminating for everyone involved, and is revealing just how challenging conservation research can be, especially without a […].

For example, gathering the correct blackberry sample for sequencing was no easy matter. There are approximately 360 species of blackberry and even specialists often need to observe the plant for a full year of its life cycle to determine its species. It was by forming connections with experts in soft fruit research that the Sanger was able to obtain a verified sample. It was supplied by the National Clonal Germplasm Repository in Oregon, run by the US Department of Agriculture. And the US research connections for this plant continue courtesy of a serendipitous meeting at the Plant and Animal Genome conference in San Diego. There a Sanger scientist discovered a US-based researcher who had just started to sequence the same species of blackberry. Now both are working together to avoid unnecessary duplication and unlock this fruit’s genomic secrets.

3 Sanger is a key partner in Health Data Research UK

The Sanger, with co-partners EMBL-EBI and the University of Cambridge, has been selected to join the UK’s newest institute for health and biomedical informatics research: Health Data Research UK (HDR UK). The three organisations will form the Cambridge Centre of Excellence, one of six centres established across the UK that will apply big data to challenging healthcare issues.

Coordinated by the Medical Research Council, HDR UK operates in a similar way to Sanger’s innovative Open Targets initiative: it seeks to draw together data scientists from a diverse range of disciplines and organisations to tackle challenges that no one research institute could attempt on its own. It will fund research that develops computational approaches that combine clinical data from routine healthcare records, molecular research data, genomics, environmental and social data to deliver enhanced healthcare approaches and therapeutics.

The Cambridge Hub will blend the genomic talents of the Sanger Institute with the computational biology of EMBL-EBI and the clinical knowledge of the University of Cambridge to deliver fresh insights into health and disease. The ability of these organisations to work in such a complementary fashion demonstrates the value of strategic partnership and long-term investment in genomics and informatics to help to deliver the next generation of precision medicine.

HDR UK encompasses 22 UK research institutes and organisations and is a joint investment coordinated by the Medical Research Council, working in partnership with the British Heart Foundation, the National Institute for Health Research, the Economic and Social Research Council, the Engineering and Physical Sciences Research Council, Health and Social Care Research and Development Division (Welsh Government), Health and Social Care Research and Development Division (Public Health Agency, Northern Ireland), Chief Scientist Office of the Scottish Government Health and Social Care Directorates, and Wellcome.

Other information

Image Credits

All images belong to Wellcome Sanger Institute, Genome Research Limited except where stated below:

Outside Front Cover – Methicillin-resistant Staphylococcus aureus (MRSA) – Shutterstock
Inside Front Cover – Breast cancer cells – Annie Cavanagh, Wellcome Collection
Page 2 – Woman – H. Kuper, Wellcome Collection, Petri Dish – iStock
Page 3 – Mosquito – CDC/Paul Howell, breast cancer cell spheroid, SEM – Khuloud T. Al-Jamal, David McCarthy & Izzat Suffian, Wellcome Collection
Page 5 – Baby music – thedanw, Pixabay, Mosquito head – CDC/Paul Howell, 8-cell embryo – K. Hardy, Wellcome Collection, chlamydia cells – Shutterstock, Cretan woman – iStock, Bowel section slide – Wellcome Collection
Page 6 – Canaanite hieroglyphs – Thameen Darby, Wikimedia Commons, Houses of Parliament – iStock, father and son – Getty Images, MRSA cells – Annie Cavanagh, Wellcome Collection
Pages 8-9 – Health worker and child – Offset and iStock
Page 10 – Osteosarcoma of the thigh – Wellcome Collection
Page 11 – Medical discussion – iStock, breast cancer cells – Izzat Suffian, David McCarthy & Khuloud T. Al-Jamal, Wellcome Collection
Page 13 – 8-cell embryo – K. Hardy, Wellcome Collection
Page 14 – Cancer cell migrating through blood – Annie Cavanagh, Wellcome Collection
Page 15 – Breast cancer cell spheroid, SEM – Khuloud T. Al-Jamal, David McCarthy & Izzat Suffian, Wellcome Collection, acute myeloid leukaemia cells – Paulo Henrique Orlandi Mourao, Wikimedia Commons
Page 16 – Medical computing – iStock, Antique tumour in paraffin wax – Great Ormond Street Hospital
Page 17 – Gut organoid – Anne Rios and Florijn Dekkers, Princess Maxima Center
Page 18 – 3D illustration of T cells – Shutterstock
Page 19 – Nerve fibres in adult healthy brain – Zeynep M Saygin, McGovern Institute, MIT
Page 21 – Yeast cells – Offset
Page 23 – Microscope – iStock, gut – Shutterstock, pregnant woman – iStock
Page 24 – Father and son – Getty Images, heart – Shutterstock
Page 25 – Father and son – Getty Images, Cretan woman – iStock
Page 26 – Papua New Guinean people – Freda Oppenheimer
Page 27 – Hip joint – Tom Turmezei, Ken Poole and Graham Treece, University of Cambridge, Wellcome Collection
Page 28 – Mother and child – H. Kuper, Wellcome Collection
Page 29 – Chlamydia cells – Shutterstock, hands and walking stick – iStock
Page 30 – MRSA cells – Annie Cavanagh, Wellcome Collection
Page 31 – Icelandic horses – Pixabay
Page 32 – E. coli bacterium – iStock
Page 33 – Petri dish – iStock
Page 34 – Mosquito head – CDC/Paul Howell
Page 36 – Plasmodium invasion of red blood cell – NIH/NIAID, Dendritic cell and T-cell – iStock
Page 37 – Injection – iStock
Page 38 – Child – iStock
Page 39 – Malaria parasites – Hilary Hurd, Wellcome Collection
Pages 40-41 – Microscope – Getty Images
Page 45 – Scientists using Sapientia™ – Congenica
Page 47 – MRSA cells – Annie Cavanagh, Wellcome Collection
Page 48 – Whipworm – Dave Goulding, Wellcome Sanger Institute, Genome Research Limited
Page 49 – Common Starfish – Hans Hillewaert, Wikimedia Commons
Page 51 – Blackberry – Pixabay

Institute Information

Wellcome Sanger Institute Highlights 2017/18

The Wellcome Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London NW1 2BE.

First published by the Wellcome Sanger Institute, 2018.

This is an open-access publication and, with the exception of images and illustrations, the content may, unless otherwise stated, be reproduced free of charge in any format or medium, subject to the following conditions: content must be reproduced accurately; content must not be used in a misleading context; the Wellcome Sanger Institute must be attributed as the original author and the title of the document specified in the attribution.

Printed by Park Communications on FSC® certified paper. Park is an EMAS certified company and its Environmental Management System is certified to ISO 14001. This document is printed on Soporset Offset, a paper containing 100% virgin fibre sourced from well-managed, responsible, FSC® certified forests.

Designed and produced by […]

Wellcome Sanger Institute

Tel: +44 (0)1223 834244

sanger.ac.uk