<<

Science with impact

Highlights 2019/20 What we do Our work Our approach Other information What we do Our work Our approach Other information

World-class research and innovation that impacts people’s lives

Read more about what we do at sanger.ac.uk

1 What we do Our work Our approach Other information What we do helps us to understand and improve all life.

Gorilla hand Read how reconstructing a 50,000-year-old parasite reveals how malaria jumped from gorillas to humans on page 32

2 What we do Our work Our approach Other information

Director’s Introduction

As the UK Chief Medical Officer outlined in the 2019 report Health, our global asset – partnering for progress we live in a global village where the health issues of one country swiftly impact another. In this world, – powered by open- access data sharing – is central to delivering the rapid, insightful information needed to shape successful therapies and effective healthcare delivery.

As evidenced by the outbreak of the Covid-19 coronavirus, seamless and open sharing of genomic data is guiding international public health responses and powering the worldwide race to develop vaccines. Here at the Sanger Institute, we analyse malaria parasite from both neighbouring countries and continents to track the emergence and spread of drug resistance, guiding public health strategies. By successfully pooling datasets on samples, databases and analysis we can work with fellow scientists across continents to reveal the causes and potential treatments for tumours, driving the development of precision, personalised medicine.

Research of this scale is a truly collaborative endeavour, both within the Institute, nationally and internationally. It is only through the tireless efforts of researchers in the field or in the wards to collect samples and phenotypic data, combined with the and computational skills of teams around the world, that we can make these scientific advances. To celebrate the invaluable work of our technicians to deliver science at scale, we are proud signatories to the Technician Commitment to develop the talents and careers of our staff. We are also delighted that our International Fellows scheme is enabling new continental networks of genomic research to grow and deliver science.

To deliver truly seamless sharing of genomic and clinical data, we actively support the work of the Global Alliance for Genomics and Health (GA4GH) to create the protocols and frameworks needed to open up the world’s genomic databases to the global scientific community. Through innovations such as GA4GH Passports and Data Use Ontology, the process to gain access to much-needed data will now take a matter of days instead of weeks.

As the impact of genomics is more broadly felt across society as a whole, the Institute plays a pivotal role in helping policymakers and clinicians understand and apply its power. The Institute has a long history of providing expert guidance to the UK’s policy makers and this year has been no exception, with our researchers advising the House of Commons Science and Technology Committee on Commercial Genomics and contributing to the Chief Medical Officer’s annual report. And our researchers have helped to shape online genomics training courses that, in conjunction with Wellcome Campus Advanced Courses and Scientific Conferences, are enabling healthcare professionals to embrace genomic medicine.

Genomic research is still in the foothills of extracting and using the knowledge buried in the six billion letters of code in the . The ever increasing numbers of human genomes sequenced for research or clinical diagnosis will reveal patterns and motifs that will shape health and research for decades to come. When we also consider the rest of the genomes on Earth the potential is vast and the Sanger Institute will be in the vanguard of this revolution in science and society.

Professor Sir Mike Stratton, Director Wellcome Sanger Institute

3 What we do Our work Our approach Other information

At a Glance

In 2019, we read the genomes of 313 species In 2019, the In 2019, we Sequencing Centre read the equivalent provided the equivalent of of one gold-standard (30x) human genome every 2,880 gold-standard 3.5 mins human genomes a week In 2019, the The human genome Sequencing Centre is approximately outputted an average of 31,000bn 3bn DNA bases base pairs a day Sequencing long Centre

Data Centre

Total number of Network backbone compute cores speed 45,000 400GB/sec approx

Centre-based Usable storage high performance in the Data Centre compute cores 65-70Pb 25,000 approx approx Centre-based cloud-based flexible compute cores 20,000 approx

4 What we do Our work Our approach Other information

2019 timeline

Cancer resource reveals bursts of activity

Genome read Global Strep A from just half a vaccine one mosquito-worth step closer of DNA 100 organoid cancer models developed

100 new gut Matt Hurles Nicole Soranzo elected Fellow honoured by discovered of Royal Society EMBO

Molecular blueprint of Gene of early embryo WW1 soldier’s offer MRSA drug development cholera mapped options

Jan Feb Mar Apr May Jun

Ian Dunham Sanger Institute Peter Campbell becomes Open signs up to elected to Targets Director Technician Academy of Commitment Medical Combining Sciences ultrasound Sarah Teichmann with genomics wins 2020 CRISPR catches improves prenatal Biochemical critical cancer diagnoses Society award changes

Sanger Institute joins Building blocks EDIS (Equality, of lungs mapped Diversity and Inclusion for first time in Science and Health) Crusaders made love and war

5 What we do Our work Our approach Other information

Baby biome shows vaginal Drug targets delivery boosts for liver-stage maternal bacteria malaria found transfer

Three CT scans Mathew Garnett could promote receives National cancer Cancer Research development Institute Excellence Award

Jumping can cause rare 140 genes linked Genomes reveal developmental to immune system Africa’s malaria disorders control control is at risk Pathway to colon Successful merge Malaria Cell Atlas cancer mapped of cancer datasets reveals parasite’s creates secrets at every Measles wipes the comprehensive stage of life cycle immune system resource

Jul Aug Sep Oct Nov Dec

Institute research Sanger scientists Wellcome funding featured in CMO’s support LifeLab drives project report event across to map all life in East Anglia Britain and Ireland Genomic monitoring Sanger Sam Behjati is shows spread of co-sequencing recognised with multidrug-resistant 450,000 genomes early career malaria in Asia for UK research award

Houses of Brown Trout Parliament Select genome may Committee reveal species’ evidence on superpowers Commercial Genomics

50,000-year-old Missing gene gorilla to human reveals two IBD malaria gene pathways resurrected

Diarrhoea-causing bacteria adapted to spread in hospitals

6 What we do Our work Our approach Other information

Year in Numbers

568 17.78 research Petabases articles in 2019

Cell 8

7.398 Pb Lancet 1 4.639 Pb Nature 20 3.025 Pb 1.910 Pb

Nature 21 Jan 2016 Jan 2017 Jan 2018 Jan 2019 Jan 2020 Science 10

Who published Where Sanger How much DNA our work? staff are from* was sequenced?

1,199 Europe

51 North America 8 Middle 94 East 14 Asia Pacific 16 Africa Latin America

*In January 2020

7 What we do Our work Our approach Other information Our work is at the forefront of genomic research.

With secured funding from Wellcome, we are able to strategically focus our work in five key research fields

8 What we do Our work Our approach Other information

Tree of Life Discover how we enable researchers to study life at the most intimate detail on page 36

9 What we do Our work Our approach Other information

Cancer, Ageing and Somatic Mutation

The team studied cells from 2,035 colon ‘crypts’

Samples read with 1,000x less DNA than previously

In this section 1 and then fully automated the process 1 Uncovering the pathway Uncovering enabling tens of thousands of genomes to colon cancer to be swiftly sequenced. The method has the pathway to since been adapted for other projects where 2 Why it’s never too late only tiny amounts of DNA are available, for to stop smoking colon cancer example in studies of parasites.

3 Medical imaging radiation The hidden world of genetic mutations In the colon study, published in Nature, promotes cancer capable cells in healthy colon tissue has been researchers found mutations that uncovered. Sanger researchers shed represent the very earliest stages of cancer 4 Dismantling cancer to find light on the very earliest stages of development. The cells were morphologically its weak spots cancer and help to understand how a normal, but had distinct patterns of genetic healthy cell becomes a cancerous one. changes, including in genes known to drive 5 Cancer drug data will power cancer growth. next wave of discovery Researchers at the Sanger Institute have developed technology to sequence the The team also uncovered mutational 6 CRISPR catches out critical genomes of small numbers of cells, allowing signatures in the cells – tell-tale patterns, cancer changes them to study genetic mutations in or ‘fingerprints’ of specific DNA changes unprecedented detail. caused by a specific carcinogenic chemical 7 Clarifying mutations in the driving seat or process. Some signatures were ubiquitous The team studied cells from 2,035 colon and continuous, others were present in only 8 Cancer causing culprits will be ‘crypts’, the tiny cavities that form colon a few individuals, in particular crypts or during caught by their DNA fingerprints tissue. They developed a method to certain periods of life. Six signatures had sequence DNA from just a few hundred cells never been seen before, and further 9 International scientists come together at a time, using over 1,000 times less DNA research is needed to determine their for unprecedented exploration of than standard protocols. The method had cause – potentially uncovering hidden cancer genomes to work on cells, rather than the purified DNA causes of colon cancer. the team would normally use, and it had to be effective without any DNA amplification, which could introduce errors that would obscure mutations. The Sequencing Reference Research and Development team at the Henry L. et al. The landscape of somatic Sanger Institute developed a new workflow mutation in normal colorectal epithelial cells. Nature 2019; 574: 532–537.

0 1 What we do Our work Our approach Other information

2 Lungs can compared with non-smokers. More than a repair themselves Why it’s never quarter of these damaged cells had at least one cancer-driver mutation, explaining the too late to stop increased risk of lung cancer in smokers. Never smoked Unexpectedly, the researchers saw huge smoking genetic variation between individual cells. Smoking added 1,000 to 10,000+ genetic 100% Sanger researchers and their mutations per cell, and cells from the same of cells are collaborators have uncovered protective tiny biopsy of bronchial epithelium could ‘near normal’ cells in the lung that are shielded from vary 10 fold. the harmful effects of tobacco. These cells re-awaken in ex-smokers to The researchers also found a population actively repair the airways after of cells in ex-smokers with less genetic someone stops smoking, protecting damage from tobacco that looked similar against lung cancer. to cells in people who had never smoked. Smokers Ex-smokers had four times more of these When someone stops smoking, regardless cells than people who still smoked – 90-96% of their age or how long they smoked for, accounting for up to 40 per cent of lung of lung cells have up to their risk of lung cancer immediately reduces. cells. These cells have escaped the It continues to fall with time. To understand damaging effects of tobacco, and have a 10,000 why, and to quantify the effects of smoking much lower risk of developing into a cancer. extra mutations on healthy lung cells, researchers at the After stopping smoking, the undamaged compared with non-smokers Sanger Institute and UCL compared the cells can reawaken and repopulate the genetic changes in non-cancerous lung lungs, protecting against cancer. cells from smokers, ex-smokers and people who have never smoked. Ex-smokers

The team sequenced the genomes of Reference 20-40% 632 cell clones from the lungs of 16 people. Yoshida K. et al. Tobacco smoking and somatic of cells are As expected, they found thousands more mutations in human bronchial epithelium. ‘near normal’ Nature 2020; 578: 266–272. genetic mutations in cells from smokers 4x more ‘near normal’ cells than smokers

3 To investigate potentially hidden effects Medical imaging of radiation exposure, the researchers gave mice a 50 milligray dose of radiation, radiation equivalent to three or four CT scans. This caused an increase in the number of promotes cancer cancer-capable cells with mutations in the p53 gene. p53 is mutated in 5–10 per cent p53 capable cells of normal oesophageal tissue, but in almost all is mutated in oesophageal squamous cell carcinomas – 5-10% of normal oesophageal tissue Sanger Institute and University of suggesting that these arise from the Cambridge researchers studied the p53 mutant cell population in normal tissue. effects of low doses of radiation in the oesophagus of mice. Scientists found The researchers then gave the mice an that doses equivalent to three CT scans over-the-counter antioxidant – N-Acetyl promote the growth of cancer-capable Cysteine (NAC) – before the radiation cells in healthy tissue and recommend exposure. The team discovered that the this risk should be considered in antioxidant gave normal cells the boost assessing radiation safety. needed to outcompete and replace the p53 mutant cells. Sanger researchers previously found that our healthy tissues are battlefields, where mutant Until now, the effects of low levels of radiation cells compete with healthy cells for space. on the DNA of healthy cells has remained These mutant cells accumulate as we age – hidden, making assessing the risks difficult. though the vast majority do not go on to Future research will not only uncover the become cancer. longer-term effects of radiation on cancer risk, it will also help scientists understand Reporting in Cell Stem Cell, the team what sets a cell on the journey from healthy investigated the effects of low doses of to becoming cancerous, shedding light on radiation on the cellular dynamics of healthy the very earliest stages of disease. tissue. Low doses of radiation, like those experienced during x-rays and CT scans, are generally considered safe as they cause little DNA damage. Reference David, F. et al. Outcompeting p53-mutant cells in the normal esophagus by redox manipulation. Cell Stem Cell 2019; 25: 3; 299–300. 11 What we do Our work Our approach Other information

4 represent patients’ tumours, including The collaboration between researchers at Dismantling the diversity of molecular features across the Sanger Institute, EMBL-EBI and GSK, different types of cancer. The cell models the Open Targets partners, bolsters the cancer to find include common cancers as well as cancers translation of these research results into its weak spots of particular unmet clinical need. new treatments. After the disruption, the cells’ growth The work is part of the Cancer DepMap Researchers from the Sanger Institute, was assessed and each gene was given project, and the analysis provides a Open Targets and their collaborators a fitness score in each cell model. If cells resource of cancer dependencies, generates used CRISPR-Cas9 technology to didn’t survive, the gene that had been a framework to prioritise cancer drug targets disrupt every single gene in more than disrupted was considered essential for and suggests specific new targets. 300 cancer models, representing that cancer’s survival. This process 30 types of cancer. The team took resulted in a list of over 1,000 genes cancers apart, piece by piece to essential for the different cancers. discover thousands of key survival Reference genes, and prioritised them to produce To prioritise the list, the team systematically Behan, F. et al. Prioritization of cancer therapeutic a hit-list for drug development. integrated additional data, resulting in a targets using CRISPR–Cas9 screens. Nature 2019; : 511–516. ranked list of 628 potential drug targets. 568 Researchers across the globe are designing Their work is published in Nature, and the new targeted therapies that can kill cancer hugely valuable data resource is openly cells and leave healthy tissue unharmed. available to researchers around the world. But developing such treatments is costly, and the vast majority of new drugs fail during One of the top-scoring target genes present development. Finding the correct targets in multiple different cancer types was Werner to start with will massively accelerate the syndrome RecQ helicase (WRN). The team Open Targets is a pioneering public- process, bringing new effective treatments found that cancer cells with a faulty DNA private collaboration that aims to to patients faster. repair pathway, known as microsatellite transform drug discovery by unstable cancers, require WRN for survival. systematically improving the identification To find potential drug targets, the team, Microsatellite instability occurs in many and prioritisation of drug targets and led by researchers from the Sanger Institute, different cancer types, including colon and improving the success rate for developing GSK, the European Institute stomach cancers. The new identification of new medicines. opentargets.org (EMBL-EBI) and Open Targets, used CRISPR WRN as a promising drug target offers an gene editing technology to disrupt each of exciting opportunity to develop the first 18,009 genes in 324 cancer models. The cancer treatments to target WRN – models – cells originally from a tumour but commonly seen in colon, ovarian, now grown in a laboratory – broadly and endometrial cancers.

CRISPR gene 5 editing has identified over Cancer drug data To undertake the hundreds of thousands of experiments needed, the researchers will power next miniaturised the assays and developed 1,000 sophisticated robotics, using acoustic wave of discovery dispensing to ‘jump’ droplets of drugs genes essential onto the cells. in 30 types Sanger researchers have generated and of cancer published the world’s largest dataset The result is a unique, online, open-access showing how a cancer’s underlying resource for the research and medical genomic landscape influences its communities. The data can be used to response to treatment. The data, freely optimise the clinical application of cancer available to the global research drugs as well as the design of clinical trials. community, will drive new developments The previous releases of data have enabled in cancer treatment. discoveries that led to drug trials of PARP inhibitors in childhood bone cancer and Cancer is genetically diverse – hundreds of directly contributed to drug development. different genetic changes can drive or inhibit So far, the resource has powered 70 research a cancer cell’s growth. Predicting which studies across the globe. Working with The cancer treatment will work for which patient remains Massachusetts General Hospital Cancer drug screening a challenge in cancer medicine. Over the Center and as part of the Cancer DepMap data has powered past 10 years, Sanger researchers and their project, the researchers continue to expand partners have shown there is an intimate their work to understand how a cancer’s relationship between the genetic changes underlying genomic landscape influences its 70 present in cancer cells and the way a response to treatments with the ultimate aim drug works. of targeting treatments to individual tumours. research studies across the globe To precisely define those links, the team has screened 498 drugs against 1,000 cancer cell lines, representing 30 different Link types of cancer. The drugs include licensed Genomics of Drug Sensitivity in Cancer resource – treatments and experimental compounds. cancerrxgene.org 12 What we do Our work Our approach Other information

6 The results, published in Nature CRISPR catches Communications, show that 90 per cent Stories of gene fusions do not play an essential role 4-6 out critical in cancer – an important finding for clinicians when inferring causes of a cancer from the cancer changes genome sequence of a patient’s tumour.

Researchers from the Sanger Institute, The team also discovered a new fusion, Open Targets and their collaborators YAP1-MAML2. It is essential for the used CRISPR-Cas9 in a large-scale progression of multiple cancer types, analysis of gene fusions to reveal including some brain and ovarian cancers, Cancer Dependency Map which ones are critical for the growth and so represents a new treatment target. The Sanger Institute and the Broad of cancer cells. As a result, the The researchers also found fusions involving Institute are producing cancer researchers identified a new drug the genes RAF1, BRD4 and ROS1, in cancer dependency maps that will enable target for multiple cancers. types previously unknown to harbour them cancer researchers around the world – pancreatic, breast and lung. This opens to discover new drug targets and Gene fusions, caused by the abnormal up the possibility of re-purposing existing precisely match an individual’s cancer joining of two otherwise different genes, play treatments designed for other cancer mutations to specific vulnerabilities. an important role in cancer. As a cancer’s types for patients with these particular genome becomes increasingly dysregulated, gene fusions. gene fusions are common. Identifying gene fusions aids both cancer diagnosis and The collaboration between researchers at prognosis, and they are the targets of some the Sanger Institute, EMBL-EBI and GSK, of the latest cancer treatments. the Open Targets partners, bolsters the translation of these research results into In the first large-scale analysis of cancer new treatments. gene fusions, researchers at the Sanger Institute, EMBL-EBI, Open Targets and GSK analysed more than 8,000 gene fusions in over 1,000 human cancer cell lines, from Reference 43 different cancer types. Gabriele, P. et al. Functional linkage of gene fusions to cancer cell fitness assessed by pharmacological and CRISPR/Cas9 screening. Nature Communications 2019; The team first analysed RNA-sequencing 10: 2198. (RNA-seq) data to define the gene fusions present in the cell lines. They then used genome-wide CRISPR screens to uncover which gene fusions are critical for each cell line’s survival and tested the cell lines against more than 350 cancer drugs.

Only 10% of gene fusions play a key role in cancer

13 What we do Our work Our approach Other information

7

Clarifying Discovery of cancer drivers, genetic The overall conclusion, however, reaffirms mutations that drive a cancer to grow, the fact that the vast majority of cancer mutations in has largely focused on genes that code drivers occur in -coding regions for . To investigate driver mutations of the human genome. Some non-coding the driving seat in the 98 per cent of the genome that drivers identified in previous studies doesn’t code for proteins, researchers were found to be the result of flaws in A clearer picture of how DNA changes at the Sanger Institute and the Broad methodology. This unexpected finding lead to cancer has emerged, following Institute analysed 2,658 cancer genomes has implications for future research and the most comprehensive evaluation from the Pan-Cancer project. the development of new cancer treatments. of non-coding driver mutations. The study is the most comprehensive For cancer patients, these results mean evaluation of non-coding driver mutations that the vast majority of clinically-relevant to date, in terms of the number of methods mutations in a cancer are likely to be employed, number of samples analysed, found in protein-coding sequences, which and the number of cancer, genome region will simplify efforts for the clinical use of and mutation types studied. Types of genome sequencing in cancer diagnosis mutations studied include promoters, and treatment. enhancers, which control gene activity, non-coding and other elements. The team combined the results of several driver-discovery methods to overcome Reference limitations of any individual method. Rheinbay E et al. Analyses of non-coding somatic drivers in 2,658 cancer whole genomes. Nature 2020; : 102–111. The research, published in Nature, found 578 several new cancer driver mutations in non-coding genes, including a point mutation in the TP53 promoter region, which resulted in inactivation of the gene.

8 cancer. They analysed a total of 84,729,690 Cancer causing genetic mutations, 10 times more than any previous study. References Ludmil A et al. The repertoire of mutational signatures culprits will be in human cancer. Nature 2020; : 94–101. The team used multiple methods to classify 578 caught by their and characterise the mutational signatures. Li Y et al. Patterns of somatic structural variation in They identified 49 single base substitution, human cancer genomes. Nature 2020; 578: 112–121. DNA fingerprints 11 doublet base substitution, four clustered base substitution, and 17 small insertion and Sanger researchers together with deletion signatures. Many had not been seen international collaborators have created before. While the causes of many signatures the most detailed list of mutational remain unknown, this analysis provides a signatures in cancer to date. It will systematic perspective on the range of be used to understand how cancer mutational processes contributing to cancer. 81 develops and discover new causes of The work is part of the Pan-Cancer project, mutational signatures cancer, helping to inform public health published in Nature. were identified by studying strategies to prevent the disease. Another Pan-Cancer study by Sanger The processes that change DNA leave researchers discovered that larger, more 84,729,690 patterns of damage, or mutational signatures, complex rearrangements of DNA could behind. Those factors that can lead to also act as mutational signatures, and mutations cancer, from UV light and tobacco smoke, point towards causes of cancer. to faulty DNA repair, each leave a distinct signature – their unique fingerprints can be It is likely that a substantial proportion of the seen in the DNA. naturally-occurring mutational signatures found in human cancer have now been To build a comprehensive picture of described. This comprehensive repertoire mutational signatures in cancer, researchers provides a foundation for research into why from the Sanger Institute, the cancer rates vary across location and time, and their collaborators analysed almost the mutational processes operating in every publicly available cancer genome normal tissues, clinical and public health sequence. The team studied data from applications of signatures, and mechanistic 4,645 whole genome and 19,184 understanding of the mutational processes sequences, encompassing most types of underlying cancer.

14 What we do Our work Our approach Other information

37 countries 2,600+ genomes

5 tools

1,300 scientists and clinicians 5 data portals 38 tumour types

and passed rigorous quality control tests. International The Pan-Cancer project advanced Stories methods for analysing cancer genomes and 7-9 scientists come the software for genome analysis, together with the raw genome sequencing data, are together for freely available to the scientific community. unprecedented Their research is published in 23 papers in Nature and affiliated journals. Several exploration of themes have emerged; firstly, the cancer genome is finite and knowable. Pan-Cancer Analysis of cancer genomes Researchers can characterise every Whole Genomes Study genetic change found in a cancer and The study aimed to examine the The global Pan-Cancer project has all of the processes that have generated similarities and differences among completed the most comprehensive those mutations, including the order of the genomic and cellular alterations study of whole cancer genomes. key events during a cancer’s life history. found across diverse tumour types. Researchers uncovered causes of previously unexplained cancers, new Secondly, scientists are close to mechanisms of cancer development cataloguing all biological pathways involved and have signposted new directions in cancer – from changes in single DNA for diagnosis and treatment. letters to the reorganisation of whole . Thirdly, through a new The decade-long Pan-Cancer project method of dating mutations, researchers involved over 1,300 scientists and clinicians can identify those which occurred years from 37 countries. Co-founded and co-led before the tumour appeared. This could by researchers at the Sanger Institute, the lead to earlier cancer detection. Finally, collaboration of scientists analysed more tumour types can be identified accurately than 2,600 genomes of 38 tumour types, according to the patterns of genetic including breast, bone and cancers. changes seen across the genome, potentially aiding diagnosis. Cancer is driven by genetic changes, and the rapid advancement of sequencing This landmark project will accelerate technologies means that documenting those the development of precision medicine changes across whole genomes is now for cancer – where patients are matched possible. The Pan-Cancer project explored to therapies targeted at their tumours, the entire genome, including regions that using genomics. don’t code for proteins, regulatory regions that control gene activity, non-coding Ribonucleic acids (RNAs), and large-scale structures. Links The project website docs.icgc.org/pcawg To facilitate the comparison between provides links to data resources for online browsing, analysis and download of data and results diverse tumour types, all of the genome data were subjected to uniform processing, The research papers can be accessed via: nature.com/collections/pcawg 15 What we do Our work Our approach Other information

Cellular Genetics 74,000 skin, kidney and yolk sac cells were studied 140,000 liver cells were studied

In this section 1 Reported in Nature, the team charted 1 First cell map of developing human First cell map changes in the cellular landscape of the liver liver reveals how blood and immune between the first and second trimesters of systems grow of developing pregnancy. They discovered that during development, first generation haematopoietic 2 C hildhood kidney cancer discovery human liver stem cells stay in the liver. The next could revolutionise treatment generation cells – progenitor cells – travel reveals how to other tissues. They mature in places such 3 Human kidney map charts our as the skin, where they develop into red blood growing immune defence blood and cells to help meet the high oxygen demands of a growing foetus. These cells subsequently 4 C han Zuckerberg Initiative boosts immune systems seed other peripheral tissues and finally, Human Cell Atlas research grow bone marrow. 5 F irst lung map uncovers new insights The team also studied genes known to be into asthma Sanger researchers and their colleagues involved in immune deficiencies to see which have mapped the cellular landscape of cells were expressing them. The result is a 6 S cientists hone in on DNA differences the developing liver. The data will comprehensive, high-resolution resource – behind immune enhance understanding of how the the developmental liver cell atlas. It provides blood and immune system develop, crucial insights into how the blood and 7 M ajor stem cell discovery boosts and support efforts to tackle diseases immune systems develop. It will also enhance research into development and such as leukaemia. research aiming to tackle diseases which can regenerative medicine form during development, such as leukaemia. It was not previously known precisely how the blood and immune systems In a related study, Sanger researchers develop in humans – a process known collaborated with a team at the University as haematopoiesis. In adults, bone marrow of Edinburgh and identified new sub-types creates the blood and immune cells. But of cells that, when they interact, accelerate in early embryonic life, the yolk sac and the scarring process in diseased livers. It is liver play a major role. hoped that the findings, also published in Nature, will help researchers develop new To understand the process, Sanger treatments for liver diseases. scientists, together with teams from the Wellcome – MRC Cambridge Stem Cell The work is part of the Human Cell Institute, Newcastle University, University Atlas initiative. of Cambridge and others, used single cell technology to analyse 140,000 developing liver cells and 74,000 skin, kidney and yolk sac cells. References Popescu, D. et al. Decoding human fetal liver They characterised individual cells isolated haematopoiesis. Nature 2019; 574: 365–371.

from the developing liver using morphology, Ramachandran, P. et al. Resolving the fibrotic niche and single-cell RNA sequencing to determine of human liver cirrhosis at single-cell level. Nature which genes were active. Haematopoietic 2019; 575: 512–518. cells in sections of developmental liver were tagged using heavy metal markers so researchers could map each cell to its location.

16 What we do Our work Our approach Other information

2 The team showed that the patches of Childhood genetically abnormal cells within the normal tissue, known as precursor lesions, kidney cancer had developed from a single ‘rogue’ cell. Phylogenetic analyses of the cells, a discovery could technique first pioneered by Sanger researchers in 2012, indicate that they revolutionise evolve in early development, before the treatment left and right kidney diverge. More than 60 per cent of the rogue cells The genetic origins of childhood contained a mutation in the h19 gene. The kidney cancer have been uncovered mutation suppresses h19’s activity, causing Mutations in by Sanger researchers. The findings the cell to multiply more quickly than those h19 were found in indicate precursors to the cancer could around it. These mutated rogue cells are the be removed, rather than sacrificing likely precursors to Wilms’ tumour. the whole kidney, which would Uncovering the genetic root of Wilms’ tumour 60%+ revolutionise treatment. represents a shift in the understanding of of Wilms’ tumour Wilms’ tumour is the most common form childhood cancer – researchers didn’t expect associated of childhood kidney cancer, mainly affecting to find the root of childhood kidney cancer in cells children under five. Most cases are curable normal-looking tissue. The results could help by removing the kidney, but in 10 per improve treatments, as it may be possible cent of cases, both kidneys are affected. to remove precursor lesions, rather than Without complete removal of the affected sacrificing the whole kidney, to effectively kidneys, there is a strong likelihood of the treat the disease. cancer returning. Stories To uncover the causes of Wilms’ tumour, 1-5 Sanger researchers worked together with References colleagues at Addenbrooke’s Hospital Coorens THH et al. Embryonal precursors of in Cambridge and Great Ormond Street Wilms tumour. Science 2019; 366: 1247–1251. Hospital in London. Young, M. et al. Single cell from human kidneys reveal the cellular identity of renal Reporting in Science, the team compared tumours. Science 2018; 361: 6402: 594–599. the whole genome sequences of 66 tumour samples with 163 normal kidney samples. The Human Cell Atlas In two thirds of children with Wilms’ tumour, The Human Cell Atlas is a mission to they found DNA mutations associated with create comprehensive reference maps the disease in both the morphologically of all human cells—the fundamental normal kidney tissue and the tumour tissue. units of life—as a basis for both understanding human health and diagnosing, monitoring, and treating disease. The initiative is coordinated by researchers at the Sanger Institute, the Broad Institute in the US, and the Riken and Karolinska Institutes in Japan and Sweden. humancellatlas.org

17 What we do Our work Our approach Other information

3 To map the position, development and Human kidney functions of immune cells that reside in the kidney, researchers studied nearly 70,000 map charts our individual kidney cells from developmental, child and adult kidney tissue. The research, growing immune conducted by teams from the Sanger Institute, and defence Newcastle University, is part of the Human Cell Atlas – a project to map every cell type Sanger researchers and their in the human body. collaborators mapped 70,000 individual cells to create the first cell atlas of the Cells were assessed with single-cell RNA human kidney. The research reveals sequencing, to identify which genes were 850m zones of immune cells in the kidney and active. The team mapped the cells over time people worldwide the map will help scientists understand and space, to understand the development have kidney disease chronic kidney disease and why and organisation of the kidney’s immune transplants are rejected. system. They discovered that the very earliest cells that populate the developing Kidneys filter the blood, removing waste and kidney are macrophages – large white blood excess water, and maintain the levels of salts, cells that consume pathogens – that remain minerals and water that other organs need in the kidney as we grow older. to function. Worldwide, 850 million people are affected by chronic kidney disease – a The team were able to chart which types The findings, published inScience , will help condition where kidneys gradually lose their of immune cells were present in particular scientists understand the immune system ability to function. It can progress to kidney zones of the kidney. The researchers in the kidneys, what happens when tissue failure, for which dialysis or transplant are the discovered that antimicrobial immunity is is damaged or infection occurs, how this only treatment options. The immune system spatially zonated in the adult kidney, with can lead to chronic kidney disease, and plays a critical role in responding to kidney a strong ‘defence zone’ at the base, near why kidney transplants are rejected. tissue damage, but very little is known where leaves the kidney, which fights about how it works. against urinary tract infections. There were few active immune cells in the developing kidney, which aligns with the view that a Reference developing baby is relatively sterile. Stewart, B. et al. Spatio-temporal immune zonation of the human kidney. Science 2019; 365: 1461–1466.

4 The grants support networks of scientists Chan Zuckerberg from different disciplines who study a The Sanger Institute variety of human organs. Bringing together is involved in five Initiative boosts scientists, computational biologists, software engineers, and physicians, these HCA projects funded Human Cell Atlas collaborative groups will work towards by CZI building an atlas of the whole body – research mapping the types, location and functions of cells in unprecedented detail. Sanger researchers have received funding from the Chan Zuckerberg The funding supports five projects at the Initiative (CZI) for five collaborative Sanger Institute to investigate specific projects supporting the Human Cell tissues in the thymus, lung, liver, kidney Atlas (HCA). and immune system. The aim is to understand the tissues in health and disease. Immune Kidney The HCA is a global initiative to map All the data from the research will be made system every cell type in the human body. In 2019, available via the HCA Data Coordination the Chan Zuckerberg Initiative funded 38 Platform, a unified resource that will enable new collaborative projects to support the free data sharing across researchers and HCA, with grants totalling $68 million. research institutes.

Liver Lung

Thymus

18 What we do Our work Our approach Other information

5 A symptom of asthma is an overproduction First lung map of mucus. The team discovered a previously unknown mucus-creating cell state – the uncovers new mucociliated state – in asthmatic lungs. insights into The researchers also revealed that asthmatic lungs had many more asthma inflammatory Th2 cells than non-asthmatic lungs. These cells sent the vast majority Researchers from the Sanger Institute, of cellular signals in the lungs, dominating Open Targets and their collaborators communication pathways. In normal lungs, mapped the location, type, and activity a broad range of cells send communication of cells in the lungs, uncovering a new signals to keep the airways functioning well. cell state responsible for creating The team’s finding highlights the importance mucus. The results could help pinpoint of Th2 cells and inflammatory signals in new drug targets for asthma. asthma, and that knowledge could help inform drug development. Asthma, a common lung condition that makes breathing difficult and triggers The research from the Sanger Institute, coughing, wheezing and shortness of breath, University Medical Center Groningen, is caused by swelling of the tubes that carry Open Targets, GSK and others was air to the lungs. It affected more than 350 published in Nature Medicine. The large- million people worldwide in 2015, and at least scale data set is openly accessible to 5 million people in the UK are currently researchers worldwide, providing an receiving treatment. important resource for research into lungs, asthma, and potential new treatments. As part of the Human Cell Atlas (HCA) project to map every cell type in the body, researchers studied more than 36,000 individual lung and nasal cells from people Reference with and without asthma. Using single-cell Vieira Braga, F. et al. A cellular census of human sequencing technology, they analysed the lungs identifies novel cell states in health and in asthma. Nature Medicine 2019; : 1153 –1163. gene activity in each cell. The study revealed 25 clear differences between asthmatic and normal lungs.

36,000+ individual lung and nasal cells from people with and without asthma were studied

19 What we do Our work Our approach Other information

6 The variants are frequently in regions of the One particular cell type and its state – early Scientists genome that control gene activity in immune activation of T cells – had the most cells, specifically T cells and macrophages, active DNA across the same regions as the hone in on DNA but the variants’ roles in disease remain genetic variants implicated in immune poorly understood. diseases. Surprisingly, the research showed differences that the cytokines generally only had subtle To understand the significance of genetic effects on the DNA activity, and played a behind immune variants, researchers from the Sanger lesser role in most of the diseases studied. Institute, together with GSK and Biogen diseases within the Open Targets collaboration, The results point towards the initial activation stimulated T cells and macrophages from of these memory T cells being important Discovery that early ‘switch-on’ healthy donors. They included different in disease development. The findings, of certain immune cells triggers combinations of inflammatory molecules, published in Nature Genetics, will enable inflammatory diseases could lead called cytokines, with the activation, researchers to study in great detail how to new drug targets for conditions to mimic 55 different cellular states. the ‘switching on’ of T cells and other like asthma. immune cells is regulated and discover Using advanced DNA sequencing methods, genes and pathways that could be used Immune diseases, including arthritis, the team looked at which parts of the as new drug targets for immune conditions. asthma, multiple sclerosis and inflammatory genome were active in three types of bowel disease, are caused by immune immune cells from healthy volunteers, and cells mistakenly targeting the body’s own cross-checked these positions against all cells, rather than an invading pathogen, the genetic variants implicated in different Reference and causing inflammation. These conditions immune diseases. Soskic B. et al. Chromatin activity at GWAS loci affect millions of people worldwide, yet it identifies T cell states driving complex immune disease. Nature Genetics 2019; : 1486–1493. is not known what triggers the immune To enable this complex analysis, they 51 system to act in this way, or which cell developed a new computational method, types are involved. called CHEERS. Openly available, this resource could also be used to find links Previous research has found thousands between genetic variation and mechanisms of genetic changes, or variants, that are for other complex diseases. more common in patients with immune diseases than in healthy people.

Study mimicked 55 different cellular states

20 What we do Our work Our approach Other information

Stem cells have the ability to develop into Using the same technology, the Major stem cell other cell types, and existing stem cell lines researchers have now derived stem cells – characterised populations of stem cells from early pig embryos, creating porcine discovery boosts maintained in laboratories – are already stem cells – something that had not extremely useful for research into previously been achieved. Domestic research into development and disease. However, the pigs have great potential for biomedical types of stem cell lines currently available research due to their genetic and development and have limitations; they are not usually anatomical similarities to humans, capable of producing extra-embryonic including comparable organ sizes. regenerative cells including the placenta or yolk sac. Under similar conditions, the team also medicine Teams of researchers from LKS Faculty of used human stem cells to create EPSCs Medicine of the University of Hong Kong, that display the molecular and functional Scientists from the University of the Sanger Institute, and the Friedrich- attributes reminiscent of porcine EPSCs. Hong Kong, Sanger Institute and their Loeffler-Institut in Germany have previously The pathway-inhibition method opens collaborators have created human created EPSCs in mice. They achieved this an avenue for generating mammalian and pig Expanded Potential Stem Cells by inhibiting the critical molecular pathways pluripotent stem cells. (EPSCs) – stem cells that look like the that predispose the cells’ differentiation. very first cells in the embryo and can The EPSCs have the features of the very The research, published in Nature Cell develop into any type of cell. The first cells in the developing embryo, and can Biology, offers incredible potential for achievement presents a unique cellular develop into any type of cell, including studying human development and platform for research into placental cells. regenerative medicine. and regenerative medicine.

Reference Gao, X. et.al. Establishment of porcine and human expanded potential stem cells. Nature Cell Biology 2019; 21: 687–699.

21 What we do Our work Our approach Other information

Human Genetics

Measles causes 100,000+ per year worldwide

In this section 1 The team discovered that a person’s 1 Measles infection wipes immune Measles infection specific memory B cells, which swiftly system’s memory tackle infections that have been previously wipes immune encountered, were depleted after a measles 2 O ut-of-Africa and back again: ancient infection. This is despite the fact that total human migrations system’s memory blood cell counts – the measurement taken in clinics – had recovered. The depletion 3 Crusaders made love and war Sanger scientists have found how of memory B cells had not previously been measles causes long-term damage seen in measles patients and this would 4 Rare genetic change gives clues to to the immune system and leaves leave the children vulnerable to a whole pancreas and brain development people vulnerable to other infections, range of infectious diseases they had emphasising the importance previously been immune to. 5 ‘Jumping genes’ can cause rare of vaccination. developmental disorders in children The researchers discovered that the Measles is highly contagious, causing measles virus also affects naïve B cells, 100,000 deaths per year worldwide in which are those yet to be exposed to unvaccinated communities. A measles an infection. After infection with measles, infection weakens the immune system the naïve B cell population was reset to an and leaves people much more susceptible immature ‘baby-like’ state, able to produce to other infectious diseases. This only a limited range of antibodies. This would immunosuppression can last for months leave the children with a reduced ability to After infection or years after the infection has passed. fight any new infections. The team’s work some children were provides the explanation for ‘immune immune suppressed To understand the impact of measles on amnesia’ caused by measles. for up to the immune system, Sanger scientists, together with collaborators at the University The study, published in Science Immunology, of Amsterdam, studied a group of non- has huge implications for vaccination and vaccinated people in the . public health. It shows that not only does 5 measles vaccination protect people from years Blood samples were taken from 26 children measles, it also protects from other before and after a natural measles outbreak. infectious diseases. The researchers sequenced and analysed specific DNA sequences from immune B cells. These B cell receptor (BCR) genes are responsible for producing antibodies which Reference aid the immune system to recognise Petrova V N et al. Incomplete genetic different pathogens. reconstitution of B cell pools contributes to prolonged immunosuppression after measles. Science Immunology 2019; 4: eaay6125.

22 What we do Our work Our approach Other information

2 Three Nigerian men representing this humans out of Africa. Sanger researchers Out-of-Africa lineage were first studied in 2003. Since demonstrate how DNA sequences from then, genome sequencing technology has just three chromosomes can provide deep and back again: advanced, and so the team sequenced the understandings and insights into genomes again. They analysed the genome human evolution. ancient human sequences against the globally-recognised migrations Y family tree. Their analysis, published in Genetics, shows Reference Sanger scientists resolve a long-running that the three Y chromosomes represent Haber M et al. A Rare Deep-Rooting D0 African debate about out-of-Africa migration a new lineage – termed ‘D0’, which the Y-Chromosomal Haplogroup and Its Implications for the Expansion of Modern Humans Out of Africa. team included in an update to the human after genomic analysis of three Nigerian Genetics 2019; 212: 1421–1428. Y chromosomes. ancestral family tree. These results provide key information to add to the debate about According to current models, present-day the factors that influence the geographical humans outside Africa descend mainly locations of the Y lineages 50,000–100,000 from a single migration out of the African years ago and the mode of expansion out continent some 50,000–70,000 years ago. of Africa. The estimation is supported by genetic data from genome-wide, mitochondrial DNA, The researchers considered three scenarios and Y-chromosomal analyses, but many for the expansion of Y lineages out of Africa, details remain unclear, including the incorporating migration back to Africa where movements of the male-specific necessary, to explain present-day Y-lineage Y chromosome lineages at the time. distributions. Considering published archaeological evidence for modern humans To investigate, Sanger researchers studied outside Africa, and mixing with Neanderthals, a rare African Y-chromosomal lineage. the most favoured scenario is an exit from The Y chromosome, passed directly from Africa between 50,300–59,400 years ago. father to son without genetic recombination, gives researchers a clear picture of ancestry This work resolves a long-running debate and the rate of DNA change over time. about Y-chromosomal out-of-Africa/ back-to-Africa migrations, increasing our understanding of the movement of modern

3

Crusaders made where DNA degrades quickly. Only recent advances in DNA extraction and sequencing love and war technology made the study possible.

Sanger researchers sequenced the Genetic analysis showed all of the individuals 800 genomes of 13th century skeletons were males; some were Western Europeans discovered in Lebanon. The skeletons from diverse origins, some were locals year-old DNA were found to be Crusaders who (genetically indistinguishable from present- extracted from had families with local people in day Lebanese), and two individuals were bone in skulls the Near East, and that they fought a mixture of European and Near Eastern and died side by side with their sons. ancestries. This provides direct evidence that the Crusaders mixed and had families During the medieval period, hundreds of with the local population, and that their sons thousands of Europeans migrated to the joined them in battle. Near East to fight in the Crusades. These religious wars, which took place between These ancestral mixtures appear to have 1095 and 1291, were led by Christian had no genetic consequences on Lebanese nobility who tried to claim the Near East. But people today, as genetic signals of admixture historical records lack details of the ordinary with Europeans were quickly lost over time. soldiers who travelled to, lived and died there. The study, published in the American Journal of Human Genetics, shows how To study these important historical events, collaborations between scientists and Sanger scientists worked closely with archaeologists and genomic analysis archaeologists at the Sidon excavation site of ancient DNA can provide exceptional in Lebanon who had recently uncovered 25 historical insights. skeletons, believed to be Crusaders killed in battle in the 13th century. The bones of nine skeletons were transferred to a specialist ancient DNA laboratory in Cambridge. Reference Haber M et al. A Transient Pulse of Genetic Admixture Small portions of the surviving 800-year-old from the Crusaders in the Near East Identified from Ancient Genome Sequences. American Journal of DNA were extracted from the temporal bone Human Genetics 2019; 104: 977–984. in the skulls. The bodies had been burned and buried in a warm and humid climate, 23 What we do Our work Our approach Other information

CNOT1 gene mutation identified in people lacking a pancreas

4 Sanger researchers, together with a team at The CNOT1 gene had previously been Rare genetic the University of Exeter, studied 107 people implicated in keeping human and mice with pancreatic agenesis. In those people embryonic stem cells in a pluripotent state, change gives where the genetic cause of their condition meaning they are capable of developing was unknown, the researchers sequenced into any type of cell. In the mice with the clues to pancreas the protein coding regions of the genome to Cnot1 mutation, the researchers found the uncover the genetic changes responsible. activity of a key developmental factor had and brain They discovered an identical mutation in the changed, which kept stem cells as stem CNOT1 gene of three people. The three cells and prevented them from developing development individuals had similar clinical features, into pancreas cells. including holoprosencephaly – a condition Understanding how the pancreas forms where the forebrain fails to develop The work, published in the American could aid research into treatments for into two hemispheres. Journal of Human Genetics, adds crucial type 1 . understanding to how the pancreas forms. To investigate further, the team bred mice The results could also help researchers Very rarely, babies are born without a to have the same mutation in the equivalent develop replacement insulin-producing pancreas – a condition called pancreatic mouse gene – Cnot1. As a result, the mouse cells to treat patients with type 1 diabetes agenesis. Forming part of the digestive embryos had a much smaller pancreas than in the future. system, the pancreas makes various usual, directly linking Cnot1 with pancreas hormones, including insulin that controls development. The researchers also saw the amount of sugar in the blood. To survive, changes in the brain development of the those with pancreatic agenesis need mice, including structural defects. This is Reference immediate and lifelong treatment with the first time CNOT1 has been identified De Franco E, Watson R. et al. A specific CNOT1 insulin and other hormones. as important in pancreatic and mutation results in a novel syndrome of pancreatic agenesis and holoprosencephaly through impaired neurological development. pancreatic and neurological development. American Journal of Human Genetics 2019; 104: 985–989.

24 What we do Our work Our approach Other information

the children taking part. Dozens of new In this latest study, published in Nature ‘Jumping genes’ conditions have been identified. A genetic Communications, the Sanger team diagnosis gives families improved access developed and validated an analytical can cause rare to support networks and, in some cases, process for the rapid assessment of MEs opportunities to participate in clinical in genome data. They used the software developmental research in the future. to search for MEs in DDD patients and identified four children with MEs that disorders in Mobile genetic elements, also known as were the likely cause of their symptoms, mobile elements, or MEs, are pieces of providing a new diagnosis for three children DNA that can ‘jump’ from place to place of them. in the genome via a process called After analysing the role of ‘jumping retrotransposition. MEs can cause While events like MEs are rare, they genes’, three more children with rare disease if they land in a gene and disrupt represent an important class of genetic developmental diseases received a its function, but it has not been previously variation to investigate in clinical diagnosis for the first time through known if MEs could cause assessments. The software is openly the Deciphering Developmental developmental disorders. available and could be used to help Disorders (DDD) study. diagnose other patient groups. To uncover the potential role of MEs The DDD study began in 2010, and in developmental disorders, Sanger recruited nearly 14,000 children with researchers, together with colleagues rare, undiagnosed, genetic conditions. from NHS Regional Genetics services, Reference Since then, genetic analyses by Sanger studied the protein-coding sequences Gardner E et al. Contribution of retrotransposition scientists, NHS Clinical Geneticists from 9,738 parent and child trios (28,132 to developmental disorders. Nature and researchers around the world have individuals) of DDD participants. Communications 2019; 10: 4630. provided diagnoses for over 4,500 of

14,000 children were studied

4,500 have received a genetic diagnosis

25 What we do Our work Our approach Other information

Parasites and Microbes

100+ new species have been identified

In this section 1 strains. However, these differences 1 How you are born determines How you are born appear to largely even out after a year. your gut bacteria determines your This type of research has previously 2 Diarrhoea causing bacteria been restricted by technical limitations adapted to spread in hospitals gut bacteria of sequencing just a small portion of DNA (16S rRNA gene profiling), small sample size 3 Combination of common drugs could In the largest ever study of new-borns’ or limited sampling during the first month be used to treat superbugs, like MRSA gut bacteria, Sanger researchers of life. Taking a uncovered the different species approach and longitudinal sampling allowed 4 Monitoring dangerous bacteria that colonise a baby’s gut at birth. the researchers to identify the bacteria around the world to stop the spread present. The study draws on the team’s and save lives The gut microbiome – the genetic material recent work to isolate and sequence the of all the microbes living in the gut – is vital genomes of gut bacteria, where they 5 Genomics points to universal for human health. Imbalances are thought identified over 100 new species. Streptococcus A vaccine to contribute to conditions such as obesity, asthma and Crohn’s disease. The very first The consequences of disruptions to 6 Deadly drug-resistant Salmonella micro-organisms to colonise a baby’s gut early-life microbiota, and the presence of discovered by African surveillance could influence the subsequent development immunogenic pathogens during this critical of its microbiome and immune system. window of immune development, are yet 7 World War One soldier’s cholera to be uncovered. The study provides insight revived and sequenced, 100 years on To understand where and how babies into the source and species of a baby’s gut acquire their gut bacteria, and the microbiome, and future work will determine 8 Resurrecting 50,000-year-old subsequent development of their if this early stage has a role to play in health. malaria gene shows jump from microbiomes, Sanger researchers gorillas to humans studied 596 babies and 178 mothers.

9 Success in controlling malaria at risk Reporting in Nature, genomic analysis References from spreading drug resistance of 1,679 faecal samples revealed delivery Shao et al. Stunted gut microbiota and increased method had the biggest impact on gut pathogen colonisation associated with caesarean birth. Nature 2019; 574: 117–121. 10 N ew malaria drug targets identified microbiota during the neonatal period. in liver stage of life cycle Babies born vaginally acquired their mother’s Forster et al. Human Gastrointestinal Bacteria gut microbes, whereas babies born via Genome and Culture Collection. Nature 11 M alaria deconstructed: cell by cell, C-section had more hospital bacteria, Biotechnology 2019; 37: 186–192. stage by stage including pathogenic and antibiotic resistant

12 Multidrug resistant malaria spreading in Asia

26 What we do Our work Our approach Other information

2 Researchers at the Sanger Institute, the The findings give new insights into bacterial Diarrhoea London School of Hygiene & Tropical evolution and demonstrate how human Medicine and their collaborators undertook behaviour shapes species. The team also causing bacteria the largest ever genomic study of C. difficile. demonstrates the importance of genomic They collected 906 strains of bacteria from surveillance of bacteria. Their work could adapted to humans, animals and the environment, help shed light on how other dangerous across 33 countries around the world. pathogens evolve and could help inform spread in Genomic sequencing and analysis showed infection control in hospitals. that C. difficile is currently evolving into two hospitals separate species.

Sanger researchers have discovered Reporting in Nature Genetics, the Reference that the gut-infecting bacterium, team identified genetic changes in the Browne H et al. Adaptation of host transmission cycle Clostridium difficile is evolving into two emerging species that allow it to thrive during Clostridium difficile speciation. Nature Genetics 2019; 51: 1315–1320. separate species. One of these species on Western sugar-rich diets. It has also has adapted to thrive on sugary diets evolved differences in genes involved in and evade disinfectants in order to forming spores, which help the bacterium spread in hospitals. evade common hospital disinfectants, allowing it to remain on surfaces. It has Antibiotic treatments wipe out normal become specialised to spread in gut bacteria, leaving people vulnerable to healthcare environments. C. difficile infection. The infection is difficult to treat, and can cause bowel inflammation Researchers estimated this species first and severe diarrhoea. C. difficile bacteria appeared about 76,000 years ago; it are the leading cause of antibiotic-associated was primed to take advantage of modern diarrhoea worldwide. healthcare practices and diets before hospitals even existed. The bacterium has continued to evolve and is now thriving, accounting for over two thirds of healthcare C. difficile infections.

Babies pick up gut bacteria from their mothers and the environment

Mouth Gut Skin

Breast milk Vagina Maternal bacteria 906 strains of bacteria collected from humans, animals and the environment, across 33 countries

Other sources

Foods Home Hospital

Natural Family, pets, environment other people 27 What we do Our work Our approach Other information

Mutations in MRSA can make it susceptible to penicillin and clavulanic acid combined

3 S. aureus has become resistant to β-lactam other mutations increased the ability of Combination of antibiotics by acquiring a gene encoding PBP2a to bind to penicillin in the presence a β-lactamase, or, in the case or MRSA, of clavulanic acid. As a result, a combination common drugs a gene encoding penicillin-binding protein of penicillin and clavulanic acid could 2a (PBP2a). overcome the resistance to penicillin in some could be used to MRSA strains. The researchers used the Previously, scientists in Cambridge identified drugs to successfully treat MRSA infections treat superbugs, an isolate of MRSA that showed susceptibility in moth larvae and then mice. to penicillin when combined with clavulanic like MRSA acid. Clavulanic acid is a beta-lactamase The team then sequenced the genomes from inhibitor, already used to treat kidney a diverse collection of MRSA strains. They Sanger researchers have revealed many infections during pregnancy. found a significant number of MRSA strains MRSA strains are susceptible to the – including the dominant strain in the USA widely-available antibiotic, penicillin, Reporting in Nature Microbiology the team, – contained both mutations, making them when combined with another drug – together with scientists from the Sanger vulnerable to the combined treatment. clavulanic acid. Institute and others from Denmark, Germany, Portugal, and the USA, sequenced the This makes it possible that one of the most Antibiotic resistance poses a severe threat genome of this particular MRSA strain to widespread strains of MRSA could be treated to the future of medicine. Methicillin-resistant identify which genes make it susceptible by a combination of drugs already licensed Staphylococcus aureus bacterium – MRSA to the combination of drugs. They found for use. The next step is to prepare clinical – is already a serious problem. The superbug several mutations centred around the gene trials in humans. has acquired widespread resistance to that codes for the penicillin-binding protein β-lactam antibiotics which include penicillin 2a (PBP2a). and its derivative, methicillin, the most clinically important and widely-used group PBP2a is crucial to MRSA strains, enabling Reference of antibiotics. them to grow in the presence of penicillin Harrison, EM et al. Genomic identification of cryptic and related antibiotics. Two of the mutations susceptibility to penicillins and β-lactamase inhibitors in methicillin-resistant Staphylococcus aureus. Nature reduced PBP2a expression, whereas two Microbiology 2019; 4: 1680–1691.

28 What we do Our work Our approach Other information

4 In 2007, an estimated 341 people died from Samples were analysed from both before Monitoring infections with carbapenem-resistant K. and after the introduction of the vaccine, pneumoniae; by 2015 the number of deaths allowing the team to see how the bacterium dangerous was 2,094. The increase is down to the fact has evolved in response. The work will help that once carbapenems are no longer effective, predict which strains are likely to cause a bacteria around there are few other treatment options left. threat in the future, and it will be vital for Infants, the elderly and immuno-compromised researchers reformulating the next the world to stop individuals are particularly at risk. generation of vaccines and developing vaccine strategies worldwide. the spread and Researchers at the Centre for Genomic Pathogen Surveillance, based at the Sanger Both studies highlight the importance save lives Institute, and their partners, published their of ongoing global genomic surveillance findings inNature Microbiology. The results to inform practices and save lives. Sanger scientists use genomic will inform public health efforts to control the surveillance to study the global spread spread of these infections in hospitals across of dangerous bacterial pathogens and Europe. improve control and treatment practices. References In a separate global genomic survey, David S et al. Epidemic of carbapenem-resistant During a Europe-wide survey, researchers scientists from the Sanger Institute, Emory Klebsiella pneumoniae in Europe is driven by nosocomial spread. Nature Microbiology 2019; analysed the genomes of almost 2,000 University and the US Centers for Disease 4: 1919–1929. samples of Klebsiella pneumoniae, an Control and Prevention sequenced DNA from opportunistic pathogen that can cause nearly 20,000 samples of Streptococcus Lo S et al. Pneumococcal Lineages Associated with respiratory and bloodstream infections. pneumoniae, the most common bacterial Serotype Replacement and Antibiotic Resistance in They found that antibiotic-resistant cause of pneumonia. The study is the first Childhood Invasive Pneumococcal Disease in the Post-PCV13 Era: An international whole genome strains are spreading through hospitals time S. pneumoniae has been assessed on sequencing study. The Lancet Infectious Diseases in Europe. Some strains are resistant to a global scale. 2019; 19: 759–796. the carbapenem antibiotics that represent the last line of defence in treating infections. Pneumonia is responsible for hundreds of thousands of deaths each year. Many countries have introduced the pneumococcal conjugate vaccine, and while highly effective against some strains, pneumococcal pneumonia rates remain high.

Analysing bacteria genomes across Europe shows how antibiotic resistant strains spread

29 What we do Our work Our approach Other information

5 Reporting in Nature Genetics, the team Genomics points revealed the existence of more than 290 genetically different lineages of clinically to universal important Strep A from countries spanning Africa, the Pacific, New Zealand Streptococcus A and Australia.

vaccine Until now, little has been known about the genetic makeup of Strep A in areas with A decade-long genomic study of the greatest burden – most information has highly diverse, dangerous bacteria come from areas such as the UK and US. has revealed potential vaccine targets. This new data showed that current leading Strep A vaccine candidates would be of Group A Streptococcus bacteria, little use in lower-income areas of the globe commonly known as Strep A, is an unjust where Strep A is endemic. killer. In low-income regions of the world it is thought to be responsible for more than However, the team found a number of half a million deaths per year, where it causes molecular targets that were present in all infections leading to rheumatic heart disease. of the bacterial strains, offering new targets Yet in high-income countries it is more likely to develop a global vaccine. In addition, to cause milder issues, such as a sore throat further analysis of the genomic data may help or impetigo. researchers understand why the bacteria cause such differing symptoms and diseases 290+ To aid the search for a global vaccine, in different regions of the globe. researchers from the Sanger Institute, the genetically different University of Cambridge and collaborators lineages of clinically in Australia, sequenced and analysed important Strep A the genomes of over 2,000 Group A Reference studied Streptococcus samples from 22 countries. Davies MR et al. Atlas of group A streptococcal vaccine candidates compiled using large scale comparative genomics. Nature Genetics 2019; 51: 1035–1043.

6 To understand the disease, its evolution and developed successfully identified the Deadly spread, researchers from the Institut National changes that enabled the bacterium to be de Recherche Biomédicale in the DRC and more dangerous and they hope that such drug-resistant the Institute of Tropical Medicine in Antwerp approaches could predict the emergence established a hospital-based surveillance and spread of drug-resistant strains in the Salmonella system that has been operating for the past future – guiding more effective interventions. 10 years. Working with scientists at the discovered Sanger Institute and others, the teams analysed both the bacterium’s physical by African characteristics and its genome sequence. Reference Van Puyvelde S et al. An African Salmonella surveillance Reporting their findings inNature Typhimurium ST313 sublineage with extensive drug-resistance and signatures of host adaptation. Communications, the teams uncovered Nature Communications 2019; 10: 4280. International collaboration and long- the emergence of a new, extensively term monitoring has revealed the rise drug-resistant form of S. Typhimurium. in dangerous strains of Salmonella Laboratory analysis identified that this bacteria in the Democratic Republic new sub-group, named ST313 lineage II.1, of Congo (DRC) that are now becoming is resistant to all but one of the commonly resistant to the last-line-of-defence drug. available drugs in the DRC; one sample even showed reduced susceptibility to this Every year invasive non-typhoidal Salmonella final antibiotic. The findings also suggest infections affect 3.4 million people, resulting that the bacteria are evolving to become in more than 680,000 deaths. The majority more invasive. of these blood-borne infections are caused by Salmonella Typhimurium. It is a major To see whether genomic insights could health burden in sub-Saharan Africa, where help with effective healthcare planning, it disproportionately affects children under Sanger researchers used machine learning five and adults with HIV. to look for DNA patterns in the Salmonella genome associated with invasiveness and drug resistance. The algorithm they

30 What we do Our work Our approach Other information

7 The results, published in Proceedings of The bacterium from WWI was collected World War One the Royal Society B, show that this strain is during a significant period of cholera history. a unique, non-toxigenic strain of V. cholerae. The disease has killed millions since the 19th soldier’s cholera It was not capable of causing epidemic century and remains a global threat to public cholera, and was unrelated to the classical health today – up to 5,000 people die every revived and V. cholerae that caused the sixth pandemic year from the disease. and was active at the time. The genomic sequenced, findings align with historical records of the soldier’s illness. 100 years on Reference The team also found that this strain of Dorman M et al. The history, genome and biology Sanger researchers have read the V. cholerae possessed a gene for ampicillin of NCTC 30: a non-pandemic Vibrio cholerae isolate from World War One. Proceedings of the Royal genetic code of a century-old Vibrio resistance, adding to evidence that bacteria Society B 2019 DOI: 10.1098/rspb.2018.2025. cholerae bacterium, stored by the possessed genes for antibiotic resistance National Collection of Type Cultures, long before antibiotics were developed. providing insights on bacterial evolution. One explanation is that bacteria needed such genes to protect themselves against The bacterium was isolated during naturally-occurring chemicals. World War One (WWI) from a British soldier convalescing in Egypt. It is the oldest publicly Comparing historical bacterial species with available strain of these cholera-causing their modern-day counterparts can give bacteria. As part of a collaboration to deep insights into the evolution of bacteria, understand bacterial evolution, researchers and the roles they played in human history. revived the bacterium and sequenced its genome.

Ampicillin resistance gene existed before the drug had been developed

31 What we do Our work Our approach Other information

8 P. falciparum is one of seven members of The study, published in PLOS Biology, Resurrecting the sub-genus Laverania, which originated in also identified six differences between the African great apes. P. falciparum only infects current and ancestral RH5 DNA sequence 50,000-year-old humans, after switching host from gorillas in P. falciparum. One of the changes stops around 50,000 years ago. RH5 being able to bind to gorilla basigin, malaria gene explaining how P. falciparum became To investigate the origins of P. falciparum, restricted to humans. shows jump researchers from the Sanger Institute and the University of Montpellier studied the In a separate study, published in Nature from gorillas genomes of all Laverania species. Analysis Communications, a second Sanger Institute revealed a section of DNA, containing the team investigated proteins that interact with to humans rh5 gene that had transferred from a gorilla RH5, and found that a change in the EBA165 parasite to the ancestor of P. falciparum. gene may also have played a key role in the Sanger researchers have uncovered emergence of P. falciparum as a human- the molecular pathway that enabled The gene encodes the RH5 protein that infective species. the world’s deadliest malaria parasite binds to the basigin receptor on human to jump from gorillas to humans. The red blood cells – a critical interaction for research is important for understanding P. falciparum to infect humans. The team how pathogens are able to move used ancestral sequence reconstruction References between species. to ‘resurrect’ the 50,000-year-old rh5 DNA Galaway F et al. Resurrection of the ancestral sequence and created synthetic copies. RH5 invasion ligand provides a molecular explanation for the origin of P. falciparum malaria in humans. Malaria’s most deadly form is caused by The resulting protein was able to bind to both PLoS Biol 2019; 17: e3000490. Plasmodium falciparum. The species has gorilla and human basigin, providing the arguably been responsible for more human molecular explanation as to how the parasite Proto WA et al. Adaptation of Plasmodium falciparum deaths than any other disease. switched host. to humans involved the loss of an ape-specific erythrocyte invasion ligand. Nature Communications 2019; 10: 4512.

Stories 8-12

Ancestral sequence reconstruction was Malaria Malaria, caused by Plasmodium used to ‘resurrect’ the parasites, is a global killer. It is responsible for at least 400,000 deaths a year. Nearly 228 million people were infected in 2018, with children under 50,000 five most at risk. Species of Plasmodium year-old RH5 DNA are rapidly and repeatedly becoming sequence resistant to antimalarial drugs in many areas, meaning elimination efforts are under threat and new treatments are urgently needed.

32 What we do Our work Our approach Other information

9 Over the last 15 years, the ongoing drive to Of most concern were genetic signatures Success in eliminate malaria has seen worldwide deaths detected on chromosome 12 in P. falciparum halve to 429,000 per year. But recently, the from Ghana and Malawi which could controlling decline in prevalence has stalled and malaria compromise the effectiveness of artemisinin- remains a global problem. In 2015, 92 per based combination therapies (ACTs). ACTs malaria at risk cent of global malaria deaths were in Africa, combine multiple antimalarial drugs in one with 74 per cent of these in children under five. treatment to overcome resistance to one or from spreading more individual drugs. Researchers, led by those from the drug resistance Plasmodium Diversity Network Africa, This information is crucial for understanding worked with the Sanger Institute to study how resistance to malaria drugs is Regional populations of malaria genetic features of P. falciparum parasite developing in Africa, and it can help inform parasites in Africa are sharing genetic populations from 15 African countries. control strategies across the continent. material in all directions – including The work is part of the global MalariaGEN Ongoing, large-scale genomic surveillance genes that can confer resistance to data-sharing network, which aims to build is going to be vital to track the emergence anti-malarial drugs. African scientists surveillance capacity in low and middle and spread of drug resistant malaria. together with Sanger Institute income countries. collaborators provide crucial evidence as to how resistance to malaria drugs The findings, published inScience , show is developing across the continent. that P. falciparum parasites are present References in genetically distinct regional populations, Amambua-Ngwa A et. al. Major subpopulations In the first continent-wide genomic study of and these are consistent with human and of Plasmodium falciparum in sub-Saharan Africa. Science 2019; : 813–816. malaria in Africa, scientists have uncovered mosquito population divergence. 365 the genetic features and evolution of the Miotto O et al. Genetic architecture of artemisinin- deadly Plasmodium falciparum parasite, Using single- polymorphism (SNP) resistant Plasmodium falciparum. Nature Genetics 2015; : 226–34. including how drug resistance spreads. analysis on whole genome sequence data, 47 researchers found that the populations are sharing genetic material in all directions – previously it was thought that the transfer of genes went from East to West Africa. Genes that confer resistance to antimalarial drugs are part of that transfer, with new types of drug resistance emerging in different parts of Africa.

10 Building on their previous work, the team New malaria drug studied parasites which had individual genes that had been deleted and tagged using targets identified a molecular barcode. The team assessed the effect of each deletion on the parasite’s in liver stage of behaviour throughout its life cycle, including in the liver stages. They cross-referenced life cycle the results with previously published data on metabolites and parasite gene activity. A huge collaborative project has identified essential malaria survival This computational approach enabled genes as drug targets, aimed at the researchers to identify 461 genes disrupting the parasite’s invasion that are essential for parasite transmission of the liver. The results open the from mosquitoes, through the mouse liver, possibility of new ways to tackle and back into the bloodstream of mice. the disease, alongside the existing The team then verified 20 of the findings in blood-stage drugs. the laboratory – demonstrating the validity of their method. They discovered seven Most current anti-malarial drugs target the metabolic pathways that the parasite blood stage of the parasite’s complicated life needs to infect the liver. cycle, with very few targeting the liver stages. Continent-wide Liver-stage drugs hold the potential to attack The study, published in Cell, is the first study shows genetic malaria parasites that can lay dormant in the systematic study of gene function in the material shared in liver for years causing recurrent symptoms Plasmodium malaria parasite family. all directions – such as Plasmodium vivax.

Using Plasmodium berghei, which infects mice, scientists from the Sanger Institute, the Reference University of Bern, Switzerland and Umeå Stanway R et al. Genome Scale Identification of University, Sweden studied over 1,000 Essential Metabolic Processes for Targeting the Plasmodium Liver Stage. Cell 2019; : 1112–1128. individual genes in the malaria parasites. 179

33 What we do Our work Our approach Other information

11 Previous malaria studies used pooled The research is published in Science, and Malaria samples of parasite cells to study gene all the data is freely available for use. The activity, but this could mask potentially atlas builds on initial work published in 2018 deconstructed: important differences, both between and where the team analysed over 500 individual within life stages, because it was not possible parasites during the blood stages of the cell by cell, stage to ensure all the cells were at exactly the parasite’s life cycle. by stage same point in the cycle. To overcome this, Sanger scientists used Sanger researchers have built the first advanced single-cell RNA sequencing References Malaria Cell Atlas, giving unprecedented technologies to study more than 17,000 Howick VM et al. The Malaria Cell Atlas: Single detail of how the parasite works at each individual parasites from across the life cycle. parasite transcriptomes across the complete Plasmodium life cycle. Science 2019; 365: stage of its complex life cycle. Not only did this provide an accurate picture eaaw26190. of gene activity for each individual stage, it The parasite that causes malaria is complex. also allowed the gene activities of different Reid A et al. Single-cell RNA-seq reveals hidden It lives in both and mosquitoes and parasites within the same stage to be transcriptional variation in malaria parasites. goes through multiple stages in mammalian compared, giving exceptional detail. eLife 2018; 7: e33105. livers and red blood cells, and mosquito Tool: Malaria Cell Atlas guts and salivary glands. The parasite itself The function of 40 per cent of Plasmodium is a single cell, which takes on many forms. genes is unknown but the team were able Understanding which parasite genes are to infer the roles for many, based on their active at each of the different stages of the pattern of activity across the life cycle and life cycle is key to developing much-needed compared to the activity of genes whose antimalarial drugs, vaccines, and function is known. These data could help transmission blocking strategies. researchers understand how parasites evade the host immune system, decide when to switch their developmental trajectory, react to stress or use different red blood cell invasion pathways.

Advanced single-cell RNA sequencing technologies used to study 17,000+ individual parasites from across the life cycle

34 What we do Our work Our approach Other information

Laos Rise of Thailand 0% multidrug 20% 0% resistant 88% KEL1/PLA1 malaria strain

1% 0% 56% Before 100% 2011 After 2016 29% 74% Cambodia

0% 57%

Vietnam

Infectious Diseases, revealed that the Multidrug situation has worsened. The multidrug- resistant KEL1/PLA1 parasites have spread resistant malaria internationally, replacing the local parasite populations in Vietnam, Laos and spreading in Asia Northeastern Thailand.

Sanger scientists used genomic The researchers also discovered that the surveillance to reveal that malaria resistant strain has picked up additional resistance to two first-line antimalarial genetic changes in the chloroquine drugs has spread rapidly from Cambodia resistance transporter gene which may 80%+ to neighbouring countries in be enhancing resistance even further. parasites in some Southeast Asia. A related report on clinical outcomes regions were revealed that these mutations were multidrug- Over the last decade, the first-line associated with complete treatment resistant treatment for malaria in many areas failure of DHA-PPQ. of Asia has been a combination of dihydroartemisinin and piperaquine The findings highlight the rapid and (DHA-PPQ). In 2018, researchers from worrying spread of this deadly pathogen. the Sanger Institute, University of Oxford Active genomic surveillance is now and Mahidol University, Bangkok, identified vital to inform national malaria control a Plasmodium strain resistant to DHA-PPQ, programmes and to help reduce the named KEL1/PLA1. It had been had risk of a major global outbreak. spreading across Cambodia, unnoticed, between 2007 and 2013.

In the most up-to-date and comprehensive Reference genome study of malaria parasites in van der Pluijm RW et al. Determinants of dihydroartemisinin-piperaquine treatment failure Southeast Asia, the team has now in Plasmodium falciparum malaria in Cambodia, sequenced and analysed the DNA of Thailand, and Vietnam: a prospective clinical, 1,673 Plasmodium falciparum parasites. pharmacological, and genetic study. The Lancet Their analysis, reported in The Lancet Infectious Diseases 2019; 19: 952–961.

35 What we do Our work Our approach Other information

Tree of Life 60,000 species in Britain and ireland will be sequenced as part of the Darwin Tree of Life

In this section 1 The Sanger Institute will collaborate with 1 Darwin Tree of Life sets down roots Darwin Tree partners at the Universities of Cambridge, Edinburgh and Oxford, the Earlham Institute, 2 Se quencing an entire ecosystem of Life sets EMBL’s European Bioinformatics Institute (EMBL-EBI), The Marine Biological 3 Br own trout genome will down roots Association, Natural History Museum, help explain species’ genetic and the Royal Botanic Gardens at Kew superpowers In 2019 the Sanger Institute welcomed and Edinburgh. Together, teams will identify Professor Mark Blaxter as the Tree of and collect species, set up new pipelines 4 G enome could hold the key Life Programme Lead and the Darwin and workflows to process large numbers of to red squirrel survival Tree of Life project secured £9.4 million species through DNA preparation, genome in funding from Wellcome. sequencing and assembly, gene finding and annotation. The Darwin Tree of Life project is one of several initiatives across the globe working New methods will be developed for towards the ultimate goal of sequencing the high-throughput and high-quality assembly genomes of all complex life () on of genomes and their annotation, and data Earth. The project is the first effort to will be shared openly through existing data sequence the species of one geographical sharing archives and project specific portals. The Darwin Tree area – Britain and Ireland. The project joins of Life project is part other global initiatives in a historic venture The £9.4 million funding from Wellcome will of a global project known as the Earth BioGenome project, support researchers to launch the first phase to sequence which aims to sequence 1.5 million species of sequencing 60,000 species in Britain and of plants, animals, fungi and protists. Ireland. This will see the teams collect and barcode around 8,000 key species, and Exploring the genomes of these organisms deliver high-quality genomes of 1.5m will give an unprecedented insight into how 2,000 species. species life on Earth evolved and uncover new genes, proteins and metabolic pathways as well These data will be of enormous value to as new drugs for infectious and inherited the international scientific community, diseases. At a time when many species including those working in life sciences, are under threat from climate change and medicine, alternative energy and climate human development, these data will also research. The data will also act as a global help characterise, catalogue and support resource for public engagement experts, conservation of global biodiversity for naturalists, citizen scientists, university future generations. students and schools.

The Sanger Institute will serve as one of the hubs for sequencing and assembling the genomes of 60,000 species from Britain Link and Ireland. Professor Mark Blaxter joined Darwin Tree of Life project news story the Institute to set up and lead the new Tree of Life Programme.

36 What we do Our work Our approach Other information

2 As part of the Darwin Tree of Life project, For most species, good quality genetic or Sequencing an the species living there will be collected and genomic information is unavailable, so the catalogued, and their genome sequences data will be hugely valuable to the research entire ecosystem will add to the wealth of data available on community worldwide. Having data from the ecosystem. an ecosystem will allow scientists to discover Species living in Wytham Woods in how different species have adapted to living Oxfordshire will be among the first It’s thought there are approximately 4,000 in the same environment. For example, to be sequenced for the Darwin Tree eukaryotic species in the woods, though this genome sequence data could give insights of Life project. is likely to be an underestimate. There are into how different insects break down the over 700 species of Lepidoptera (butterflies same defensive plant toxin. It may be that Wytham Woods is one of the most studied and moths), and these will be among the first there are different solutions to the same areas of woodland in the world. Owned and species to have their genomes sequenced. problems; alternatively species may have maintained by the University of Oxford, its evolved the same solutions independently. 1,000 acres are a designated Site of Special The goal over the first two years of the Scientific Interest (SSSI). Home to a rich project will be to sequence the dominant The data will also provide a basis diversity of species, there is a wealth of players in the ecosystem, and a broad for analysing genetic responses to long-term biological data available about taxonomic spread of species. environmental changes and human the ecosystem. disruptions to habitats, as researchers will be able to study and compare how changes over time.

Link Tree of Life Programme at the Sanger Institute

1,000 acres are a designated Site of Special Scientific Interest Approximately 4,000 eukaryotic species in Wytham Woods

The woods contain 700 species of moths and butterflies

37 What we do Our work Our approach Other information

3 Brown trout (Salmo trutta) is one of the most The brown trout genome will enable Brown trout genetically diverse vertebrates. Taxonomists scientists to compare DNA from different once classified brown trout as up to 50 populations of trout to the reference genome genome will distinct species, though now they are sequence – giving insights into how particular considered one. Different populations genetic variations allow certain trout to live help explain have adapted to exploit particular biological in habitats that would be fatal to others. niches, with some living their whole lives Pinpointing genetic variations that allow species’ genetic within a 200 metre stretch of freshwater Scottish loch trout to adapt to living in stream while others migrate from the stream relatively acidic waters, for example, may superpowers where they were born to the open sea. be useful in guiding conservation efforts to protect populations affected by increasing The newly sequenced brown trout Researchers at the Sanger Institute and their acidity in rivers and oceans as a result of genome will help settle the debate collaborators have sequenced the brown global warming. around whether it is one species or trout genome for the first time. The data will several and aid conservation efforts allow scientists and conservationists to better as we face a warming planet. understand the genetic roots of this highly specialised species. The genome will also Link help settle the long-standing debate about Brown trout genome news story whether the physically-varied brown trout is a single species or several.

Taxonomists once classified brown trout as up to 50 distinct species

38 What we do Our work Our approach Other information

Squirrel pox can kill up to 80% of a red squirrel population in an affected area

Sanger scientists extracted DNA from Genome could red and grey squirrel samples sent by Stories ® collaborators. They used PacBio SMRT 3-4 hold the key and Illumina sequencing technology to generate the first, high-quality red and to red squirrel grey squirrel reference genomes.

survival It is hoped that the genomes will be used 25 Genomes for 25 Years – to identify the genetic basis of immunity Institute’s anniversary project Sanger researchers and their to squirrel pox in grey squirrels, as well The brown trout, the red squirrel and the collaborators have sequenced the as in the reds that survive outbreaks. grey squirrel are some of the UK species genomes of red and grey squirrels, The genome can also be used to help to have been sequenced as part of the boosting research and aiding red ensure that populations are genetically Sanger Institute’s 25th anniversary 25 squirrel conservation efforts. mixed, helping to select animals for Genomes project. The project laid the reintroduction based on their genetic groundwork for the ambitious Darwin Red squirrels are native to Britain and compatibility with existing populations Tree of Life project, which will sequence Ireland, but they are under serious threat to give them the best chance of survival 60,000 complex species in the UK. of extinction in the UK because of habitat through successful breeding. loss, competition with grey squirrels and fatal squirrel pox. While grey squirrels The red and grey squirrels are two of are immune to the disease, squirrel pox the species to be sequenced as part outbreaks can kill up to 80 per cent of a of the Sanger Institute’s 25 Genomes red squirrel population in an affected area. project and will contribute to the ambitious Darwin Tree of Life project. Red squirrels are now confined to isolated pockets in northern England and Wales, with a more connected population in and Ireland. They are at great Reference risk of being pushed out of their territories Mead D. et al. The genome sequence of the Eurasian red squirrel Sciurus vulgaris Linnaeus by greys, outbreaks of squirrel pox 1758. Wellcome Open Research. DOI: 10.12688/ and inbreeding. wellcomeopenres.15679.1.

39 What we do Our work Our approach Other information Our approach helps to nurture the next generation.

We foster strong collaborations with scientists, clinicians, institutions, governments and society for mutual benefit.

40 What we do Our work Our approach Other information

Enabling all to thrive Read more about how we empower our staff on pages 46 and 47

41 What we do Our work Our approach Other information

Scale

In this section 1 DepMap Drugs – the research extends 1 Cracking cancer at scale Cracking cancer the Genomics of Drug Sensitivity in Cancer project to systematically screen hundreds 2 C ancer research has a at scale of anti-cancer drugs and drug combinations COSMIC future against thousands of cancer cell lines to The Sanger Institute is making the identify and associate genomic characteristics 3 Sequencing 500,000 genomes Cancer Dependency Map: a genome- with dependencies and treatment options. wide chart of every cancer type’s vulnerabilities to drive DepMap Genes – this stream is using precision medicine. CRISPR technology and programmable single-guide RNAs to knockout every single The mutations that drive a cancer’s gene across the human genome in many development also change the way its cells cancer types to map gene function and function – making the tumour vulnerable in determine individual cancer cell ways that normal tissue isn’t. Charting these dependencies. The results are made cancer-specific dependencies across the available via ProjectScore, which ranks entire genome, in every cancer type and and prioritises vulnerabilities and has already tissue, will provide the rulebook for precision highlighted a promising new drug target. medicine in cancer. Both the Sanger Institute and Broad Institute are producing cancer DepMap Analytics – this work is creating dependency maps that will enable the new computational approaches and researchers around the world to discover software needed to process, combine, new drug targets, precisely match an analyse and visualise the data being individual cancer’s mutations to specific produced by the Models, Drugs and vulnerabilities, and change the way we Genes research. treat cancer. All the models, data and algorithms To create such a map requires bench generated by the Institute are being experimentation at an unparalleled scale made available to the worldwide research and innovative computer-based analytics. community to provide the foundations To achieve this, the Sanger Institute has for precision oncology. founded four streams of research: DepMap Models, DepMap Drugs, DepMap Genes and DepMap Analytics. Link DepMap Models – by building on the Cancer dependency map at 1,600 cell lines the Institute has already the Sanger Institute website made available to scientists worldwide, this stream is creating the cell lines, 3D organoid cultures and genetically engineered cancer models required. The work encompasses 1,000 the Human Cancer Model Initiative, which is new cancer cell generating approximately 1,000 new cancer models are being cell models, and Cell Model Passports, generated which enables researchers to access key information on the genetic changes and drug sensitivities of a particular cancer type.

42 What we do Our work Our approach Other information

2 COSMIC: such global genetic resources Cancer research as NIH GDC, NCI CGC, OpenTargets, then and now cBioPortal, and Ensembl. has a COSMIC April 2004 – Version 1 COSMIC helps to power much early-stage future pharmaceutical research and cancer 57,444 10,647 diagnostics development, and new tumours mutations After more than 15 years of providing resources and insights are regularly vital cancer gene data, the Catalogue of developed. One example is COSMIC-3D, September 2019 – Version 90 Somatic Mutations in Cancer (COSMIC) developed with Astex Pharmaceuticals, database has become self-sustaining which enables visualisation of cancer- associated proteins to aid drug and is driving innovation in development. Another is the integration 1,412,466 precision oncology. samples of methylation and expression information COSMIC started in 2002 as a spreadsheet in collaboration with Bayer. Further updates for the Sanger Institute’s cancer genome and innovations are planned for 2020. project to keep track of the genetic 34,320 mutations being discovered in cancer. The costs of growing and sustaining whole genomes Updated weekly, it proved to be so useful COSMIC are substantial and, while initially that it was developed and launched as a funded by Wellcome, the COSMIC team freely available, open-access database for have deployed a range of commercial the global cancer research community just licensing opportunities. The result is that 26,829 papers two years later. COSMIC is now self-funding by pioneering the balance of commercial data sales with COSMIC has grown rapidly. At launch it academic open-access to all academic contained data on mutations across just four and charitable institutions across the 9,733,455 13,099,101 coding mutations samples key cancer-related genes. It now comprises world. This model will ensure that its data tens of millions of genetic variants across the will continue to power genomics research entire human genome, encompassing every for many years to come. 19,396 1,207,190 type of mutation mechanism, across every gene fusions copy number form of human cancer. variants

Currently more than 4,000 independent Link 9,197,630 7,929,101 researchers use the COSMIC website COSMIC website gene expression differentially every day to help drive innovation in variants methylated CpG precision oncology. Many more benefit from COSMIC’s highly annotated data through

3 The £200 million collaboration brings together Sequencing the UK Government, charities, researchers, and four leading pharmaceutical companies. 500,000 The sequencing work is being shared by the Sanger Institute and deCODE genetics genomes in Iceland.

Having successfully delivered 50,000 The resulting database of anonymised UK Biobank human genome sequences genomic information, linked with on time and on budget, the Sanger comprehensive clinical characterisation, Institute is now using its power to will help to provide a unique insight into why deliver the most ambitious human some people develop particular diseases genome sequencing project to date. and others do not.

The Sanger Institute is helping to produce Building on the work of the pilot programme, a game-changing genomics resource: the the plan is to complete sequencing of the UK Biobank’s Whole Genome Sequencing remaining 450,000 participants in two project of 500,000 volunteers’ genetic code. tranches. The first tranche of up to 125,000 The data will be allied with their clinical and whole-genome sequences is expected to lifestyle data, providing a wealth of be accessible in Spring 2021 and the entire The UK Biobank’s information to power the next wave of cohort should be available in early 2023. Whole Genome Sequencing research into human health and project is sequencing the disease worldwide. genetic code of

The project is the most ambitious Link sequencing programme undertaken by UK Biobank Whole Genome Sequencing project 500,000 a public-private partnership. It follows on at the Sanger Institute volunteers Sanger Institute’s successful initiation of the in total UK Biobank’s proof-of-concept pilot project to sequence 50,000 volunteers’ whole genomes.

43 What we do Our work Our approach Other information

Innovation

In this section 1 The collaboration brings together patient 1 Microbiotica: fast maturing company Microbiotica: samples from Genentech’s clinical trials of grows at a pace potential inflammatory bowel disease (IBD) fast maturing medicines and Microbiotica’s precision 2 Congenica grows globally microbiome platform to company grows discover and develop diagnostic biomarkers 3 Funding future technologies and treatments for the condition. In addition, at a pace Microbiotica is using the samples to expand its bacteria collection and reference Innovative Sanger Institute spin-out genome database. company Microbiotica has established itself as a leading player in microbiome- Using bioinformatics and artificial Multi-year based live therapeutics and biomarkers. intelligence, Microbiotica’s researchers strategic collaboration identify the bacterial species present in with Genentech worth Microbiotica was founded on gut microbiome individuals’ guts to discover microbiome up to research carried out in the Institute’s ‘signatures’ that can be linked to specific Parasites and Microbes Programme. It is diseases and conditions. They then now fast maturing into a major force in validate these signatures in mice with $534 gut research. In 2018, its extensive culture a humanised microbiome. collection of human gut bacteria and million reference genome database of gut bacteria The Microbiotica team and functions are species attracted the interest of biopharma currently located across the Wellcome world leader Genentech and secured Genome Campus in dry-lab compute and multi-year strategic collaboration worth wet-lab experimentation spaces that were up to $534 million. provided to the company as it established itself. To aid the company’s continued In 2019, the magnitude and potential of this growth, they will be moving to the nearby partnership was recognised when it was Chesterford Research Park, releasing space announced as a Scrip Awards finalist in for the next innovative start-up to grow at the Best Partnership Alliance category. the .

Link Microbiotica website 44 What we do Our work Our approach Other information

2 sequencing and targeted copy number Congenica grows analysis in a single assay. The product simplifies generating and interpreting globally molecular and cytogenomic data by removing the need for additional frontline Congenica, the Sanger Institute’s tests such as microarrays. spin-out clinical decision support company, has continued its national The company continues to develop its and international growth through analysis software and has increased the strategic alliances, data integration platform’s power by integrating reference and new funding. data from the DECIPHER project and Mastermind. To enable further growth the Based at the Wellcome Genome Campus’ company secured an additional £13.25 BioData Innovation Centre, Congenica million in equity investment from new delivers accelerated interpretation of strategic investors, including Digital China complex genomic data to improve disease Health Technologies Corporation Limited, diagnosis for the UK NHS and a range of alongside follow-on funding from existing Launched hospitals and research projects across investors to reach £23.3 million in the globe. series B financing. all-in-one During 2019, the company partnered Closer to home, Congenica is also whole with Chinese firm BGI Genomics to launch supporting the Next Generation Children and interpretation service BGI-Xome, an all-in-one whole exome project – a collaboration between the NHS with BGI Genomics sequencing and interpretation service. East Anglian Medical Genetics Service, the The offering combines BGI Genomics’ NIHR Bioresource – Rare Diseases, and laboratories with interpretation by certified the University of Cambridge to investigate clinical scientists using the Congenica the clinical effectiveness of rapid whole analysis and clinical decision support genome sequencing for children in intensive platform to provide a gold-standard care. The approach is providing fast and service for its users. accurate genomic diagnoses that may help children to receive appropriate treatment Congenica also teamed up with Nonacus and avoid further invasive tests, and is Ltd to create Exome CG, a clinically validated likely to become the template for future Link exome capture kit that enables whole exome NHS services. Congenica website: congenica.com

3 FLIP is a technique that dramatically improves Funding future the speed, affordability, scale and efficiency of producing conditional human and mice technologies gene knock-out models. The approach places a cassette of custom DNA sequence In 2019, the Sanger Institute provided into a gene which will not affect the gene’s more than £500,000 in development functioning. However, when a recombinase funding to early stage in-house is used to flip the cassette’s research as part of its commitment orientation, the gene is disrupted and to deliver scientific and knocked out in a single step. The effect healthcare benefits. is also completely reversible – a second recombinase will flip the cassette back The goal of the funding, awarded by the to its original orientation and reinstate Sanger Institute Translation Committee, gene function. is to allow the intellectual property of the discovery, technique or process to be IsoTyper allows researchers to profile sufficiently developed to attract further the antibody diversity and functioning translation by an external partner. The of isotype-specific B cells in a single projects currently being supported cover a experiment. By using specially-designed wide range of scientific endeavour, including barcoded DNA primer sets and DNA equipment design, immunology and novel sequencing, the system can identify gene knock-out library construction. antibody-producing cells that flow cytometry techniques cannot. IsoTyper Examples of the science that are being could be used to monitor leukaemia developed for translation include FLIP progression and responses to therapy, and IsoTyper. to optimise vaccine development, or to provide early detection of transplant rejection or developing autoimmune or anti-drug responses.

Link Sanger Institute Innovations: sanger.ac.uk/innovations 45 What we do Our work Our approach Other information Culture

In this section 1 The commitment is a university and research 1 Institute ramps up its support Institute ramps institution initiative that aims to ensure for technicians visibility, recognition, career development up its support and sustainability for technicians working 2 How DORA supports Sanger in higher education and research, across scientists for technicians all disciplines. In May 2019, the Institute submitted its action plan to address these 3 Enabling all to thrive, valuing diversity The Sanger Institute is indebted to key issues. its highly skilled workforce of almost 500 technicians whose diverse range The plan is now being executed by a of skills are crucial to the delivery of Technician Commitment manager who its science. To nurture their career champions the Institute’s technicians both development, the Institute is a member internally and externally. The Institute has Almost of the Technician Commitment. launched new training courses and has joined Higher Education and Technician’s From cutting-edge cell manipulation carried Educational Development (HEaTED) to out on the latest laboratory equipment, to expand the range of resources, specialist 500 novel bioinformatics analysis of genomic data training, advice, and networking Sanger Institute using our high-performance IT infrastructure, opportunities to support technicians’ technicians make our the Institute’s technicians deliver the development. In addition, the Institute has science happen Institute’s world-leading research. To ensure developed procedures to enable staff to that the Institute’s technicians receive the become members of professional bodies training, career development opportunities covering animal technology, biomedical and recognition they deserve, the Institute sciences and computer technology. signed up to the Technician Commitment in 2018. Over the course of 2020 a programme of events, internal policies, procedures and processes, and seminars will be established to further promote and empower the Institute’s technicians.

Link Technician Commitment at the Sanger Institute 46 What we do Our work Our approach Other information

2 The Institute is keen to ensure that its How DORA researchers are free to produce science that produces the greatest possible supports Sanger benefit to research, healthcare or society, regardless of the form it takes. For this scientists reason, it was delighted to sign up to the San Francisco Declaration on Research Signing up to the Declaration on Assessment in 2019. Research Assessment (DORA) reinforces the Institute’s approach to To avoid placing unnecessary pressure recognising the value of all its staff’s on staff to publish in high-impact journals, scientific contributions. the Institute employs a progressive approach that draws on expert peer The Sanger Institute produces research that review and a range of information to powers genomic and medical exploration assess the scientific merit of a and discovery around the world. Its staff researcher’s work. To this end, the create and disseminate cell lines, genome Sanger Institute broadly defines research manipulation and analysis techniques, data excellence to include data resources, repositories and software tools – available leadership of large consortia and free of charge and open access wherever mentoring, and to recognise societal possible. The scientific impact of the impact through translation into Institute’s research output is much greater healthcare or scientific benefit and than simply counting its published scientific public engagement. papers – and the impact factor of the journals they are in. These guiding principles also align with Wellcome’s new policy on Open Access, which will launch in 2021.

Link DORA at the Sanger Institute

3 The Institute is committed to supporting Enabling all to women’s careers and it has addressed potential obstacles by providing generous thrive, valuing parental leave, return-to work-fellowships, paid carers leave and flexible working. The mean pay gap diversity Our organisation is a bronze award member reduced from of the Athena SWAN Charter, and we have The Sanger Institute is a genomic recently applied for silver. 20% to research ‘ideas factory’, powered by the imagination, skills, experiences and In addition, the Institute has introduced insights of our diverse, interdisciplinary measures that are increasing the proportion 16.2% staff. By valuing difference and – and interview success rate – of women at nurturing talent, we seek to enable senior grades. These include unconscious everyone to reach their full potential bias training, expanding our management and deliver world-changing science. and leadership programmes, and increasing access to promotion opportunities. In common with many industries, scientific research can suffer from a lack of diversity, To further develop our inclusive culture, but the Sanger Institute is committed to the Institute has partnered with AdvanceHE fostering an environment where the talents to embed the principles of the Race Equality of all genders, races and neurodiversity can Charter in our workplace, through staff thrive. At a structural level, the Institute is support and tracking the ethnicity pay gap. addressing inequalities in pay and career In 2020 we will join the Stonewall Diversity opportunities, with the aim to establish Programme, and will use the Stonewall equal pay across all nine pay grades by Workplace Equality Index to benchmark 2022. By updating our staff’s job families our support of our LGBTQ+ staff against to introduce a new job family and link pay best practice. to market rates, the mean pay gap has already reduced from 20% mean (2016) But there is more we can do to support to 16.2% mean (2017). diversity in our workplace and the Sanger Institute is exploring how we can support neurodiversity within our staff.

Link 47 Equality in Science at the Sanger Institute What we do Our work Our approach Other information

Influence

In this section 1 The speed and ease of international 1 Sanger’s work highlighted by Sanger’s work travel means that continued health and Chief Medical Officer prosperity is now a shared global endeavour. highlighted by Focusing solely on domestic diseases brings 2 Institute advises UK Parliament the risk of being unable to respond to global on commercial genomic testing Chief Medical threats when they arise, the CMO warned in her 11th annual report: Health, our global 3 N agoya: how to balance fairness Officer asset – partnering for progress. with effectiveness In her final annual report, the outgoing To highlight the value of international UK Chief Medical Officer (CMO) collaboration, the report featured an article used Sanger Institute research to from a leading Sanger Institute Parasites demonstrate the importance of global and Microbes Programme researcher. collaboration and data sharing for It detailed how a rapid-response global national and international health. team for cholera successfully used genomic surveillance to trace the source of 2017’s Yemen outbreak.

The work brought together skills and experience from around the world. In Yemen, Médecins Sans Frontières (MSF) and local healthcare workers collected Vibrio cholerae samples from the Yemeni population and from a temporary refugee camp. Researchers at the Institute Pasteur, Paris sequenced the bacterial genomes and Sanger scientists analysed the data. By comparing the sequences to those from the current, ongoing global pandemic, the team showed that the Yemen outbreak was caused by the same 7PET lineage, and was likely to have entered the region from Eastern Africa. The team also found that the Yemeni samples were missing four genes responsible for resistance to commonly used antibiotics, making the bacterium more susceptible to treatment.

UK-based researchers play a vital role in helping to understand emerging global health threats. The Institute is a leading player in building the partnerships needed for global health by developing local genomic research capacity in low- to middle-income countries through training and sharing technical knowledge.

Link Sanger Institute contribution to the 2019 CMO report 48 What we do Our work Our approach Other information

2 • diagnostic services for the NHS – including Institute advises prenatal screening, complex disease risk assessment, and searching for rare UK Parliament genetic disorders. on commercial In terms of direct-to-consumer tests, the Institute counselled that an independent genomic testing regulatory body should be set up to oversee testing standards, accuracy and the privacy As at-home DNA testing kits and security of individuals’ genomic data. become increasingly accessible, and commercial providers seek to provide Regarding commercial genomic services genetic testing for the NHS, the Sanger for the NHS, the Institute highlighted the Institute provided expert advice to the need to develop a highly skilled workforce UK Government on the opportunities able to interpret the complex genomic data and risks. being produced with polygenic risk scores to deliver meaningful diagnoses. Without In 2019 the House of Commons Science adequate resources for training or additional and Technology Committee set up an inquiry staff these services may place increased into commercial genomic testing. To help pressure on the healthcare system. guide UK lawmakers through the issues around health diagnoses outside of clinical In addition, the Institute encouraged support, result accuracy, implications for the the NHS to adopt standard file formats NHS, and the safety of sensitive data, the for its genomic data and to store them Institute provided written and oral evidence. in accessible databases. In this way, with appropriate security and governance, The Institute pointed out that commercial the research community could draw on genomic services fall into two main areas: the collective power of these data to further benefit healthcare and develop • direct-to-consumer tests – such as for new treatment options. genomic ancestry

Link Sanger Institute contribution on Commercial Genomics inquiry 3 Much of the Institute’s research uses In counselling the CBD, the Institute Nagoya: how samples from other countries: from cited the vital role of globally-accessible, monitoring the global emergence and free, open access databases such as the to balance spread of infectious diseases to studying European Nucleotide Archive in facilitating the influence of environmental factors on collaborative research. If the ability to share fairness with cancer. The Nagoya Protocol officer works digital sequence information freely and across the Institute at all levels to ensure rapidly through such resources was effectiveness that its researchers develop, and use, restricted, then it might prove impossible best practice structures and processes. to respond effectively to global public As part of its core principles, the health emergencies, deliver actionable Sanger Institute uses the Nagoya Recently the international Convention public health strategies, or address Protocol to handle its genetic samples on Biological Diversity (CBD) proposed conservation challenges. and discoveries fairly and ethically. But extending the Nagoya Protocol to include the Institute is concerned that proposed digital sequence information. While expansions in scope could harm the supporting fair use of samples, the Institute future of global genomic research. is concerned that such a move could cause Link more societal and scientific harm than good. Sanger Institute response to The Sanger Institute is committed to Nagoya Protocol proposal generating and sharing freely available open-access data to benefit the worldwide research community and wider society. To ensure that the Institute leads the way when appropriately gathering and handling samples from around the world, it has embedded a Nagoya Protocol officer into its processes.

49 What we do Our work Our approach Other information

Connections

In this section 1 In the same way that the World Wide Web 1 GA4GH: powering genomic GA4GH: powering Consortium develops the standards needed research’s future to enable the internet to deliver easy-to-find, genomic globally available information that can be read on any computer, the GA4GH is 2 Free online learning for all research’s future developing the standards needed for an internet of genomic data. The Sanger Institute is helping to build the foundations of easy access global The Sanger Institute provides expert genomic data sharing to enable guidance and administrative support to genomic enquiry at scale. the GA4GH. The Institute’s Policy team has helped the Alliance develop its Regulatory The power of genomic research lies in and Ethics Framework, based on human analysing enormous datasets, often created rights. In addition, in 2019, Institute by aggregating data from multiple data researchers helped to deliver the GA4GH sources around the world. Yet this work Passports and Data Use Ontology standards is hampered by the relevant data sources which work in harness to enable all being worldwide, in countries with different researchers to automatically access data regulatory and data protection processes, worldwide by demonstrating that they fulfil and different clinical and genomic the necessary restriction criteria. terminology. Finding, obtaining access to, and aligning the relevant datasets to analyse Life today – powered by the internet – is them are major barriers to future discovery. radically different to life just two decades ago: through standards such as Passports To remove these obstacles, the Sanger and Data Use Ontology, the GA4GH will Institute and the Broad Institute helped enable a paradigm shift in global to found the Genomic Alliance for Global genomic research. Health (GA4GH) in 2013 to enable genomic data sharing for the benefit of human health.

Link GA4GH website

Workflow of accessing and analysing genomic data

Currently

A B C D

Search for individual Apply for access by filling forms to Wait for permission to be Download the data sites for information show that the researcher matches granted by a committee the restrictions of the data in terms overseeing the data of data use agreements, data protection and security

Using GA4GH approaches, including Passports and Data Use Ontology

A B

Use Beacon protocol to quickly Data providers check researcher’s Passport (using Data Use search all relevant databases Ontology to detail organisation, level of access, data protection and regulation criteria) automatically to grant access 50 What we do Our work Our approach Other information

2 Sponsored by ACSC, each three-week-long Free online course is completely free to use, including assessment and certification. The courses learning for all can be taken via computer, mobile phone or tablet and require no genomic knowledge. Campus-based partnership provides Aimed at doctors, nurses, microbiologists, free genomic training for everyone secondary school teachers and pupils, around the world. the content is delivered via a mix of videos, articles, tests and quizzes to validate 13,500 Over the past 15 years, the Wellcome learning. Each course is presented by expert researchers and healthcare Genome Campus Advanced Courses and educators who also support the students. professionals have completed Scientific Courses (ACSC) team and Sanger the online courses scientists, along with researchers around So far, more than 13,500 have benefited from the world, have successfully collaborated the online training and further courses are to train other researchers worldwide in planned throughout 2020 and 2021. bioinformatics and genetic techniques. But the number of people who can benefit To find out more, visit: has always been limited by the physical coursesandconferences. constraints of location and capacity. wellcomegenomecampus.org/ event-type/online-courses To enable everyone to benefit from the revolution in genomic medicine, ACSC has partnered with FutureLearn to harness the power of the internet to provide training across the globe for free. Designed and delivered in partnership with the Institute’s pathogen researchers, the series of four courses explore how genomics is used to understand bacterial genomes to monitor the spread of diseases and drug resistance.

A couple of months

E F G

Convert data to a form Compare ontologies and Analyse data on computer/ Get results that researcher can query try to match terminology organisation’s system

Using GA4GH approaches, including Passports and Data Use Ontology Matter of days

C

Query data in the cloud using Get results online servers (as data is in standard file formats) 51 What we do Our work Our approach Other information

Image Credits

All images belong to Wellcome Sanger Institute, Genome Research Limited except where stated below:

Outside Page 24 – Blood sugar level testing – Adobe Stock Front Cover – Butterfly wing – iStock Page 25 – Child with building blocks – pixabay.com Inside Deciphering Developmental Disorders project logo – Front Cover – Cancer cell – Shutterstock Deciphering Developmental Disorders project Page 26 – Bacteria computer imagery – CDC/ Antibiotic What we do Resistance Coordination and Strategy Unit Page 2 – Gorilla hand – Getty images Mother and newborn baby – Iuliia Bondarendo, Page 4 – Pancreatic cancer cells – Anne Weston, pixabay.com Page 27 – Surgeons washing hands – Shutterstock Page 5 – Mosquito – Emphyrio, pixabay.com Page 28 – Blister packs of medications – Shutterstock Baby ultrasound – Dr Wolfgang Moroder, MRSA scanning electron microscopy – National Wikimedia Commons Institute of Allergy and Infectious Diseases (NIAID) Lebanese Crusader skeleton – Page 29 – Klebsiella pneumoniae scanning electron microscopy Dr Claude Doumet-Serhal – National Institute of Allergy and Infectious Diseases Steptococcus A computer images – CDC/PHIL (NIAID) Child with asthma inhaler – Adobe Stock Page 30 – Strep A scanning electron microscopy – Page 6 – CT scanner – Adobe Stock National Institute of Allergy and Infectious Diseases, Big Ben Elizabeth Tower – Ukyo Katayama, National Institutes of Health pixabay.com Page 31 – World War 1 soldiers – Imperial War Museums Surgeon washing hands – Piron Guillaume, (Wikimedia Commons) Unsplash.com Page 32 – Gorilla – Adobe Stock Brown trout in water – Shutterstock Page 33 – African family receiving antimalarial medication – Newborn baby – Iuliia Bondarendo, Pixabay.com William Brieger, Jhpiego Gorilla – Adobe Stock Page 34 – Malaria parasites in red blood cells computer imagery – Shutterstock Our work Page 35 – Mosquito on arm – Emphyrio, pixabay.com Page 8 – Mosquito – Adobe Stock Page 36 – Bumblebee – iStock Page 11 – CT scanner – Adobe Stock Page 37 – Butterfly on flower – iStock Page 12 – Cancer cells – Anne Weston, Francis Crick Institute Wytham Woods – Sandra Bouwhuis Page 16 – Liver Cell Development Atlas – Newcastle University Page 38 – Brown trout in water – Shutterstock Page 17 – Child’s hand with intravenous line – Adobe Stock Page 39 – Red squirrel – Adobe Stock Page 18 – Kidney cortex – Menna Clatworthy, University of Cambridge Our approach Page 19 – Computer imagery of lungs – Adobe Stock Page 40 Lung cells section – Professor Wim Timens, UMCG and 41 – Child and parent hands – iStock Page 20 – Child using asthma inhaler – Shutterstock Page 45 – Congenica logo – Congenica Ltd Page 21 – Expanded pluripotent stem cells – Xuefei Gao Immunoglobin molecule computer imagery – Page 22 – Human genetics section – measles virus computer Adobe Stock imagery, CDC/PHIL Page 49 – Big Ben Elizabeth Tower – Ukyo Katayama, Child with measles – CDC/Molly Kurnit, M.P.H pixabay.com Page 23 – Crusader skull – Dr Claude Doumet-Serhal World connections map – Pete Linforth, pixabay.com Nigerian man – Ovinuchi Ejiohuo, unsplash.com

52 What we do Our work Our approach Other information

Wellcome Sanger Institute Highlights 2019/20

The Wellcome Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London NW1 2BE.

First published by the Wellcome Sanger Institute, 2020.

This is an open-access publication and, with the exception of images and illustrations, the content may, unless otherwise stated, be reproduced free of charge in any format or medium, subject to the following conditions: content must be reproduced accurately; content must not be used in a misleading context; the Wellcome Sanger Institute must be attributed as the original author and the title of the document specified in the attribution.

Designed and produced by

53 Wellcome Sanger Institute Tel: +44 (0)1223 834244 sanger.ac.uk