
Strategic Plan 2011-2016

Wellcome Trust Sanger Institute Strategic Plan 2011-2016

Mission

The Sanger Institute uses genome sequences to advance understanding of the biology of humans and pathogens in order to improve human health.


CONTENTS

Foreword
Overview

1. History and philosophy
2. Organisation of the science
3. Developments in the scientific portfolio
4. Summary of the Scientific Programmes 2011 – 2016
4.1 Cancer Genetics and Genomics
4.2 Human Genetics
4.3 Pathogen Variation
4.4 Malaria
4.5 Mouse and Zebrafish Genetics
4.6 Genome Informatics
5. Research into diseases of developing countries
6. Core platforms for data generation and analysis
7. Strategic issues across all Programmes
7.1 Strategy for DNA sequencing
7.2 Strategy for data handling and sharing
8. Strategy for translation
9. Developing our people
10. Developing our organisation
11. Premises
12. Strategic relations
13. Spreading the word
14. Societal aspects of the Personal Genome
15. Resources

Supporting documents


Foreword

We present below the Strategic Plan for the Wellcome Trust Sanger Institute for 2011-2016. The centrepiece of the Plan is the portfolio of scientific experiments we propose to conduct over the next quinquennium. However, the document also addresses training of young scientists and the ways in which we will inform and interact with the public about our work. Our science is conducted by people in a complex physical and societal environment. The Plan therefore further reflects on our scientific infrastructure and strategy for its development, the organisation and governance of the Institute, our interactions with other scientific organisations, the development of our people, our buildings, and the resources we are requesting in order to implement the Plan. It has been written subsequent to external review of our proposals and a Site Visit by a review panel at Sanger in May 2010, and incorporates the advice and comments received.


Overview

The Sanger Institute’s primary scientific theme in the 2011-2016 quinquennium will be the study of differences in DNA sequence between individual organisms, understanding the consequences of this variation for the biology of humans and other organisms, and developing approaches by which this understanding can be used to improve human health.

Our mission and strategy are entirely congruent with those of the Wellcome Trust and we aim to contribute as much as we can from our scientific corner to achieving the Trust’s goals.

Why study variation in DNA sequence? Because individual members of a species, whether humans, mice, fish, worms, parasites, bacteria or viruses, differ from one another in their anatomy, physiology, metabolism and behaviour, and differences in their DNA sequences make a substantial contribution to this variation in phenotype.

The overarching aim of our studies is, therefore, to characterise differences between individual genomes and identify the phenotypic changes which are the consequence of genomic variants. Our research will entail exploration of variation in DNA sequence between human genomes, between the genomes of microorganisms that cause infectious disease (pathogens) and between the genomes of individual organisms from two model species, mouse and zebrafish. We will extensively explore naturally occurring DNA variation and will complement it with experimental genomic differences that we ourselves introduce.

Our studies of DNA variation in humans will be wide ranging. They aim to identify DNA variants that cause phenotypic differences which fall within the normal spectrum, for example in hair colour or height, and also variants that result in overt disease. A spectrum of human diseases, rare and common, prevalent in developed and developing countries will be investigated including obesity, inflammatory disease, epilepsy, severity of response to malarial infection and disorders of development. Our studies in humans will also encompass somatic DNA variation, differences between the genomes of individual cells in the body, which underlies the development of all cancers.

Naturally occurring variation will also be studied in many disease-associated microorganisms, allowing us to elucidate the genomic variants that are associated with important phenotypic features for human health such as microbial virulence, resistance to antibiotics and escape from vaccination.

Because DNA variation distinguishes one member of a species from another and tells us how closely they are related, studying DNA sequence differences also allows us to track patterns and changes in the distribution of organisms. For example, it reveals how human populations have migrated in the past and the environmental or lifestyle factors implicated in selecting some humans over others, such as infectious disease and food scarcity. In a similar manner, it will allow us to track the spread of disease-causing microbes, monitor their routes of geographical dissemination and understand the waxing and waning of epidemics. Indeed, we will also be able to track cancer cells using somatically acquired genomic differences, even when they are undetectable by other methods.

Importantly, investigation of natural variation at Sanger is complemented by our ability to introduce artificial changes into genomes. We will knock out genes in many microorganisms, allowing us to identify those that are absolutely required for the organism to survive and which therefore potentially recommend themselves as drug targets. Our studies of experimentally induced variation in zebrafish, mice and embryonic stem cells will systematically and comprehensively explore the changes in phenotype that result when genes are deleted, one by one. We will also use engineered transposons in mice to cause cancers and thus understand in greater depth the genes that are implicated in tumour formation. Of course, these experiments will also provide the animals or cells themselves, which can then be used to investigate further the disturbance in biology that is caused by the DNA variant and which ultimately results in the phenotype.

The results emanating from all these experiments will be made freely available in data repositories. In many cases we will, in addition, organise and annotate the data in intelligible and easily usable databases, transforming the ability of other research scientists to use the information. Physical resources such as animals, cells and DNA clones will also be made available and are already widely used by many research communities around the world.

In some instances, the discoveries we make will lead directly to changes in medical practice which will have impact on the wellbeing of patients. For example, the clinical management of families with a genetic disease is often transformed by the identification of the underlying causal mutated gene. In other cases, the future potential impact is apparent but will require further “translational” studies to bring it into routine clinical practice. For example, sequencing the genomes of bacteria in patients with infections, with its attendant insights into classification, virulence, antibiotic resistance and geographical origin, is likely to transform the way that microbiological diagnostics is conducted in the future. Similarly, tracking cancer cells through the DNA they leak into blood is likely to change the way in which we monitor tumour load in cancer sufferers. For both of these discoveries, and others, translational programmes of research will be initiated to bring them closer to clinical usage.

In all cases, however, our discoveries will provide a new starting point for investigating the disturbance of biological processes that is caused by a DNA variant and which ultimately leads to the disease. This is important, because it is biological understanding which provides us with opportunities for rationally designed therapeutic intervention and hence modification of the disease course.

The sheer scale of the work we are proposing for the next five years is an extraordinary testament to recent changes in technology. We will sequence approximately 40,000 human genomes (completely or in part), 20,000 malaria genomes, 2,500 mosquito genomes and the genomes of 30,000 bacteria, parasites and viruses. We will deliver engineered mutant mice, zebrafish and cells on a scale greater than any other single institute.

Completion of this experimental parade is, in itself, a substantial challenge. Our aim, however, is to remain world leading in the delivery of genome sequence based science and this will require us to maintain and operate, at the highest level of efficiency, the core facilities on which our scientific output is based: the DNA pipelines, the Mouse and Zebrafish facility and the Data Centre, which has to cope with the huge amounts of sequence information they generate.

The success of these ambitious plans is in the hands and minds of the 900 or more people who work at Sanger and for whom we must develop an environment which is intellectually stimulating, facilitative for work, enabling for personal development and is safe. We will develop our Institutional organisation and physical infrastructure in order to achieve this.


We will also continue to nurture the many collaborations and alliances with scientists and institutions external to Sanger upon which we critically depend. In particular, we will develop our scientific links with the family of Wellcome Trust Centres and Overseas Programmes, and with other international centres and consortia in developing countries that confer a truly global dimension to our work.

Centred around the science it conducts, the Institute will enthusiastically continue to train the next generation of genome scientists. It will do so through its excellent Programme for PhD students, through ever increasing training opportunities for clinicians and by opening its doors to many others wishing to taste genomic science. We will elaborate further our activities to inform schoolchildren, teachers and the wider public about genomes through electronic modalities, exhibitions and direct contact, inspiring wonder about the remarkable DNA code that is altering our lives and our perceptions of ourselves.

DNA is an incredibly robust biological molecule. This toughness in the face of the elements, the richness of information harboured in its sequence and the rapid advance of technologies to analyse it mean that we are on the brink of a revolution in the implementation of DNA based approaches in medicine and beyond. The “Personal Genome” is no longer the subject of idle intellectual conjecture, but a reality that we will be confronting ever more frequently as a society in the next few years. The Institute will contribute to the science base for its understanding, will facilitate its usage and will take an active part in the debate on how it should be implemented.


Mike Stratton FMedSci FRS Director


1. History and philosophy

The Sanger Centre (as it was then known) was founded in 1992 by John Sulston and colleagues with a mission to sequence genomes that are relevant to human health. In its first decade Sanger sequenced numerous pathogens that cause human disease (for example, the microorganisms responsible for malaria and tuberculosis) and many key model organisms (including C. elegans, S. pombe and the mouse). In aggregate, this was an immense contribution to medical science that culminated in the historic first draft of the human genome at the turn of the millennium. Around these extraordinary undertakings Sanger and the Wellcome Trust developed principles of open data release that have now been widely adopted. Sequencing genomes also endowed us with a tradition of large scale projects, high throughput biological data generation and informatics which still define our science today.

In 2000 John Sulston stepped down and Allan Bradley was appointed as the second director. His primary goal over the last 10 years has been to convert Sanger into an academic genome institute. In 2001, the Sanger Centre became the Sanger Institute supported by a five year core budget from the Wellcome Trust. New infrastructure has been built and the Institute has evolved into a thriving scientific enterprise with a basic research mission and a commitment to train the next generation of scientists. Our Faculty have established major programmes in human, model organism and pathogen genetics as well as informatics. Today, the Institute is known both for its legacy genome projects and its activities in human, model organism and pathogen genetics.

In 2010, Mike Stratton became Director with the intention of using genome sciences to transform our understanding and management of human disease through basic and applied research. Our scientific mission over the next five years will be to understand the implications of variation between individual genome sequences. In this aspiration we aim to make scientific discoveries that are fundamental and enabling, unleashing new waves of biological exploration by ourselves and others. We also intend, however, to follow through and develop our discoveries closer to the needs of patients.

Our endeavours will continue to be at large-scale, using high throughput approaches and generating large quantities of data, a scientific niche that distinguishes us from most of UK and European science. We remain firmly committed to facilitating science by others through release of our data in a timely manner. Whatever the scientific context in which we work, we aim to be at the cutting edge, leaders in the generation, analysis, interpretation and extraction of biological insight from genome sequences.

2. Organisation of the science

Sanger will focus on four scientific themes: human genetics, pathogen genetics, model organism (mouse and zebrafish) genetics and informatics.

Within these four themes, the portfolio of experiments has been organised into six Programmes. Each Programme defines a major subject of Sanger research which has a particular biological, disease or analytic focus. Thus there are Programmes on broad biological areas such as Human Genetics, Pathogen Genetics and Model Organisms; on specific diseases, including Cancer and Malaria; and using particular scientific approaches, notably Informatics.


Further to the scientific Programmes there are two Programmes which are designed to inform, interest and inspire the world about genomic science and to train the next generation of genomic scientists: our Public Engagement and Graduate Training Programmes.

Each of the scientific Programmes explores more than one of our core scientific themes. For example, the Cancer Genetics and Genomics Programme incorporates studies in both human and mouse genetics; the Human Genetics Programme follows up its discoveries from human studies using mice and zebrafish; the Malaria Programme includes studies in human, Plasmodium and mouse genetics; and the Pathogen Genetics Programme includes experiments in mice and ES cells. All the Programmes are underpinned by Informatics. Thus, across the Sanger portfolio, scientific themes, expertise and people are closely interwoven.

Each scientific Programme is composed of two or three projects, resulting in a total of 16 projects, which are the primary scientific units of the portfolio. Each project has clearly focused aims, has been conceived and designed by multiple Faculty and is supported by one or more of the Institute’s major infrastructure platforms (core facilities). The projects differ in scale and cost such that their requested resources vary from £1 million to £40 million depending on their nature and ambition.

The figure below illustrates the scientific Programmes [blue boxes] with their associated projects [grey boxes].

Most of the projects bring together Sanger Faculty with external groups who contribute expertise, knowledge and biological samples to a scientific question of common interest. All of the human and pathogen genetics projects have these external dependencies involving collaborative partnerships that are well established and with global access. The Mouse and Zebrafish Genetics Programme is also becoming increasingly embedded in a community of researchers who are interested in accessing mutant animals. Much of the activity in the Genome Informatics Programme is jointly run with the European Bioinformatics Institute (EMBL-EBI). In summary, very little of our scientific portfolio is conducted in isolation.

The Programmes and projects were conceived by our Faculty through discussions over several months that were conducted in six Working Groups and which generated multiple iterations of the proposals. Concurrently, there was consideration by the Board of Management, with feedback of strategic advice concerning structure and resources at multiple stages of Plan development. The overall consequence of this process has been a reduction in the number of Programmes and projects from the first to the final version of the Plan. The breadth of our Portfolio is clearly central to our impact. However, to ensure delivery, focus is also critical. Therefore in the course of these discussions, some potential scientific initiatives have been left out. We believe that the balance between breadth and focus is now appropriate.

Oversight and management of the science over the next five years will continue to be the responsibility of the six Working Groups, each managing one of the six Programmes. Each Working Group will be chaired by the Programme Heads and will be composed of the Faculty members who will contribute to the studies.

Interim assessments of scientific progress will be provided by reviews of each Programme undertaken twice during the course of the forthcoming quinquennium at our yearly Faculty retreats. Each Programme will also be scrutinised and discussed twice during the course of the five years by the Institute’s Scientific Advisory Board.

3. Developments in the scientific portfolio

Many of the projects in the Plan reflect continuation and development of previous activities within our thematic areas. However, developments in technology have wrought huge changes on most of our science, such that it is unrecognisable in scale and scope.

Notably, sequencing technology has undergone seismic shifts in the last few years with 10,000-fold increases in sequencing rates at the same cost. This has transformed opportunities for new scientific insights across all our scientific areas. As a result of this single change, most projects in the portfolio have been totally rethought compared to their design just a couple of years ago. Similarly, changes in other areas have had substantial impact, for example technological advances in insertional mutagenesis using the PiggyBac transposon have resulted in expansion of plans in the Cancer Genetics and Genomics Programme for somatic insertional mutagenesis in mice.

There are several new projects, however, compared to the portfolio in the 2005 Strategic Plan. The fact that many have already been initiated and are developing maturity reflects the privilege and strength of our position as a core funded institute: we are able to identify new directions based upon scientific opportunity and medical need and to act on them promptly.

A focus on Malaria was proposed as a new area of investment in 2005, but without a detailed scientific plan. Now, well developed studies of Malaria in humans, parasites, insect vectors and experimental organisms constitute one of our six scientific Programmes. Within the Pathogen Programme, studies on bacteria and parasites continue, but the Virus Genome Diversity Project, initiated 18 months ago, is a new and timely departure as we face the prospect of waves of global virus epidemics and their potential for causing large numbers of deaths.

The Human Genetics Programme has now developed a much broader and more coherent front of research into the genetics of human disease, incorporating both common and rare diseases. However, the continuing strategic development of our studies in Human Genetics is reflected in a new initiative in developing countries, particularly in Africa, extending into the genetics of susceptibility to both communicable and non-communicable diseases.


Translational research also constitutes a much stronger theme than previously, with multiple new lines of enquiry to this end, for example in the Cancer Genetics and Genomics Programme.

The pace of scientific change is furious. Even since we submitted documents for the Quinquennial Review in December 2009, there have been major scientific developments to which we are considering our response. One area of rapid advance is the study of genetic variation in human induced pluripotent stem (iPS) cells, a field in which Sanger Faculty are already making world-leading discoveries (see the Mouse and Zebrafish Genetics Programme). iPS cells are derived from ordinary human cells (for example from blood or skin) but are immortal and can potentially be differentiated into many other cell types such as liver or nerve cells. To exploit the emerging scientific opportunities these new technologies offer on a large scale, i.e. genome-wide and potentially using thousands of human samples, we are developing plans for the introduction of a new core facility dedicated to large scale, high throughput cell culture at Sanger.

4. Summary of the Scientific Programmes 2011 – 2016

The breadth of the science proposed for the next five years at Sanger is outlined below in a synopsis of the major highlights of the six scientific Programmes. For a detailed description of the Programmes, their constituent projects and their aims, see Supporting documents section 1.

4.1 Cancer Genetics and Genomics

All cancers are clones of cells that originate from a single cell which has lost normal mechanisms of growth control. This deficiency in growth regulation is caused by changes in the DNA sequence of the cancer progenitor cell, known as somatic mutations, that have been acquired during the lifetime of the cancer patient and which alter key genes, known as cancer genes.

The aim of this Programme is to elucidate how cancers develop, in particular to identify the mutated genes that cause normal cells to become cancer cells, and then to use this information to improve the care of cancer patients. The Programme is constituted of two projects, one primarily based on whole genome DNA sequencing, the other using experimental models of cancer development in mice.

Figure: Positron emission tomography scan of a patient with metastatic cancer. Black blotches are cancer deposits.

We will sequence the genomes of approximately 2,000 human cancers (and the normal genomes from the same individuals) in the next five years. Our studies will focus on three tumour types: breast cancer, bone cancer and myelodysplasia (a cancer of white cells of the blood). The primary goal of these experiments is to generate essentially complete catalogues of somatic mutations from each individual cancer sequenced.


Through analysis of these catalogues we will identify the mutated cancer genes that have been operative in the development of these three cancer types and hence elucidate the biological pathways that have been subverted in order to convert normal cells in these three tissues into cancers. The proteins encoded by some of these mutated genes may be fruitful targets for the development of new drugs, as has been the case with BRAF mutations in malignant melanoma, which we discovered several years ago.

Furthermore, scrutiny of the types of somatic mutation that are found will provide insights into the processes that actually generated the mutations in the first place, for example exposures to tobacco-smoke carcinogens, to ultraviolet light or to currently unknown agents which may have been active decades before the cancer developed.

Our results will be released alongside data from similar studies on most other human cancer types to be carried out over the next decade by the International Cancer Genome Consortium, an organisation that we were instrumental in initiating in order to render worldwide research coordinated and comprehensive. Together the results will represent an historic and lasting legacy of information on the genetic basis of cancer development, providing the foundation for future research into the biology and epidemiology of the disease, and prompting new strategies for cancer diagnosis, therapy and prevention.

The discoveries we make from sequencing the genomes of human cancers will be considered alongside results obtained through studies of experimentally generated cancers of mice. These cancers will be induced using retrotransposons (DNA sequences that jump out of their positions in the genome and randomly insert elsewhere) and will be carried out in cells from several tissues and on multiple different genetically engineered backgrounds. Sequencing of the genomic insertion sites of the transposons in these cancers will reveal the cancer genes that have been recruited to contribute to tumour development in these animals. This set of experiments in mice, which, to our knowledge, is the largest in the world of its type envisaged in the next decade, is configured to exhaustively explore the sets of cancer genes and biological pathways that can contribute to cancer development. The data from these experimental mouse studies will thus provide supportive evidence for the results of our human sequencing studies and explore in more depth the biological pathways that can contribute to oncogenesis.

Figure: Representation of the complete catalogue of somatic mutations from a human malignant melanoma.

The Programme will translate discoveries from cancer genomes into developments that we envisage will transform patient management. By analysing the genomes of approximately 1,000 immortal cancer cell lines and then examining how each of these cell lines responds in vitro to approximately 400 anti-cancer therapies (separately funded by a Wellcome Trust Strategic Award), we will develop understanding of how the presence of a mutated gene within a cancer influences its response to anticancer drugs. These findings hold the potential, in future, of optimising therapies for individual patients by indicating which drugs to select for cancers carrying particular mutated genes.


We will also develop novel approaches to monitoring the tumour burden in patients. These studies will exploit rearrangements (breakages and abnormal rejoinings) of the genome that are specific to each individual cancer. Many cancers leak naked DNA into the bloodstream and we aim to detect rearranged fragments of DNA, which originate from cancer cells, in plasma samples taken from patients, using them as surrogates for the presence and amount of cancer remaining in the patient’s body. This strategy could have many applications, from assessing the early response of a cancer within hours of starting a new drug treatment to detecting early recurrence of a cancer years after it has apparently gone into remission.

Finally, we will present these data publicly in a comprehensible and usable form to cancer researchers and clinicians. This will be achieved through our COSMIC database (Catalogue of Somatic Mutations in Cancer), the only comprehensive database of cancer somatic mutations which is used by essentially all researchers in the field.

4.2 Human Genetics

In the last few years it has become clear that the chance of acquiring a very wide range of human traits or diseases is influenced by inherited differences (genetic variation) between individuals. In the next five years our studies will investigate the differences between individual human genomes that underlie a broad spectrum of human phenotypic differences, from normal variation (for example, in height or hair colour) that is without obvious adverse consequences to disease syndromes with significant morbidity and mortality.

Figure: Child with a cleft palate: a developmental disorder sometimes caused by genetic abnormalities.

Our studies will provide major insights into the genetic basis of many human diseases. The diseases we will study range from more common, so-called “complex” diseases that may be caused by multiple genetic variants even in a single person to rarer, “simpler” diseases which are likely to be genetically more homogeneous. Indeed, a major aim of our studies will be to define the extent to which “common diseases” may be considered a combination of many, individually rare, disorders, which are distinguishable only at the genetic level.

The Programme is composed of three projects which address naturally occurring variation in normal populations, genetic variants underlying rare diseases and variants underlying common diseases. It will be closely integrated with two recent, separately funded, major initiatives in Human Genetics at Sanger which have already started: the UK10K Project, supported by a Strategic Award from the Wellcome Trust, and the Deciphering Developmental Disorders Project (DDD), supported by the Health Innovation Challenge Fund. The UK10K Project will sequence, over three years, all or part of the genomes of 10,000 individuals, including 4,000 from two cohorts within the UK and 6,000 individuals with disease. The DDD Project will analyse copy number changes in the genomes of 12,000 children with developmental disorders over the next five years. Subsequently, sequencing of cases from the DDD Project in which causative copy number changes have not been found will form a major component of the studies in this Plan.

The Human Genetics Programme is the largest in the Sanger portfolio. The widely encompassing body of work it entails, coupled with the scale of the associated UK10K and DDD Projects, will make it a major influence on the world stage in the next decade in developing ideas relating to human genetics and disease.

The Programme will primarily achieve its aims by large scale sequencing in a total of ~40,000 individuals. These sequencing studies will range from the whole genome at low coverage in some individuals, to the whole genome at high coverage in others and restriction to the coding exome and microRNAs (approximately 1% of the genome) in a further subset of cases. Importantly, the Programme also incorporates experimental exploration of how the abnormal genes which are identified through these sequencing studies result in disease by reconstructing mutations artificially in mice and zebrafish.

We will study genome variation in normal populations. Having examined the UK population through the UK10K Project (see above), we will extend this type of investigation to hundreds of individuals from selected populations in Europe. These populations from Orkney, Finland, Sardinia and other locations are unusual in originating from a few tens to a few hundred individuals. In consequence, by sequencing relatively limited numbers of people within each population, it will be possible to find much of the genomic variation present, which in turn will provide niche opportunities for studying the genetic basis of disease and other phenotypes.

Studies in normal populations will also allow us to detect DNA variants within genes which have been selected during the recent evolution of human populations. As populations migrate and encounter different environments, sequence variants in certain genes provide advantage to those carrying them and, consequently, these individuals tend to prosper reproductively. Environmental or lifestyle influences that exert this selective influence include sunlight, altitude, infectious disease and milk consumption. Thus by searching for evidence of such selection and examining the functional consequences of the sequence variants that underlie it we will be able to shed light on the environmental forces that have been active during human evolution.

[Figure: Variation in normal human populations.]

The set of genomic variants in a population is not static. New genetic variants are continually being added to human populations. Every newborn child has approximately 100 DNA sequence variants that are not present in either parent, but were generated in the sperm or egg from which the child developed. We will examine in detail how many new variants each child has, explore whether there are differences in that number between individual children and, if so, begin to investigate what influences, genetic or otherwise, might be responsible for those differences.
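The comparison described above can be pictured as a set difference: variants seen in the child but in neither parent. The following is a minimal Python sketch under that simplifying assumption. The variant tuples and the function name `de_novo_candidates` are illustrative only and do not represent any Sanger pipeline; a real analysis would also have to allow for sequencing error and uneven coverage.

```python
from typing import Set, Tuple

# Illustrative representation: (chromosome, position, reference allele, alternative allele).
Variant = Tuple[str, int, str, str]


def de_novo_candidates(
    child: Set[Variant], mother: Set[Variant], father: Set[Variant]
) -> Set[Variant]:
    """Return variants present in the child but absent from both parents."""
    return child - (mother | father)


# Toy data, invented for this example.
child = {("1", 1000, "A", "G"), ("2", 500, "C", "T"), ("3", 42, "G", "A")}
mother = {("1", 1000, "A", "G")}
father = {("2", 500, "C", "T")}

candidates = de_novo_candidates(child, mother, father)
print(sorted(candidates))
```

Counting the candidates per child, and comparing those counts between children, corresponds to the first two questions posed in the paragraph above.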

Some of the diseases or disease-related traits we will study are relatively common in developed world populations, including obesity, Crohn’s disease (an inflammatory disease of the bowel) and variation in blood lipid levels. Over the last few years, many DNA variants that confer an increased risk of these three diseases/traits have been identified. However, most of these are relatively common in the general population (i.e. present in more than 5% of the population), cause a very small increased risk and together only explain a limited proportion of the inherited risk of these diseases.

So, in the next five year phase of study we will be investigating the contribution made by rarer causative variants, each of which may well confer a greater risk of these diseases. Ultimately we aim to use the three diseases as illustrative examples which we will explore exhaustively, uncovering essentially their complete genetic landscapes of inherited predisposition and showing how this may differ between different conditions.

An important initiative in this quinquennium is to substantially increase our investigations into the genetic basis of common diseases in African populations. This new wave of studies in Human Genetics is part of a much broader front within the Institute’s research portfolio, including several in the Pathogen and Malaria Programmes, addressing health issues in developing countries. These studies are motivated by their relevance to local disease burden. In addition, however, the ancient origins of African populations confer differences in patterns of genomic variation which provide particularly powerful opportunities for discovering disease-causing variants.

Our studies in Africa will include diseases that are prevalent in the region, for example sickle cell disease. Homozygosity for a DNA variant in the beta-globin gene has, for several decades, been recognised as causing “sickling”, a condition in which the red blood cells are rigid, undergo changes in shape and block small blood vessels. However, there is substantial variation in the severity of sickle cell disease, even though all individuals carry the same variant. Some of this is likely to be due to other genetic variants and we aim to identify these. Our studies will also include diseases that are common in populations of developed countries, such as diabetes and cardiovascular disease, but which are increasing in incidence in Africa. Underpinning these experiments will be our engagement in local capacity building in order to engender local changes in scientific expertise and infrastructure which will then become self sustaining.

In addition to these “common” diseases, we will be extensively exploring the genetic variants that cause rare human genetic disorders. For most of these, we expect there to be a single genetic abnormality underlying the disease in each individual. Nevertheless, their study is often complicated. For example, two individuals with very similar phenotypic abnormalities may have underlying disease-causing mutations in two different genes. Conversely, two individuals with abnormalities in the same gene may appear rather different.

We will use the power of the new generation of sequencing technologies to systematically examine part or all of the genome of thousands of people, investigating a wide range of rare genetic diseases. A large proportion of the cases will be children from the Deciphering Developmental Disorders Project in whom analyses of copy number change have not yielded the causative mutation. In addition, however, children with congenital cardiovascular disorders and individuals with epilepsy and other diseases will be investigated.

Together, these studies addressing the genetic basis of human disease will have two main impacts. For some diseases, discovering the abnormal gene responsible will have immediate clinical impact for the affected families. The field of genetics is remarkable for the speed with which its discoveries are translated into clinical practice. For all the diseases studied, uncovering the underlying abnormal genes will provide biological insights into the pathogenesis of the disease, information upon which future hopes for therapeutic intervention are based.

4.3 Pathogen Variation

The Pathogen Variation Programme will provide a detailed understanding of the key differences between the genomes of individual microorganisms from a wide and diverse set of pathogenic species including eukaryotic parasites, bacteria and viruses. We will provide insight into important consequences of this variation such as virulence and antibiotic resistance.

The Programme will achieve its aims by identifying variation in pathogen populations in various ecological niches ranging from the healthcare system of the UK to many sites internationally, particularly in developing countries. The Programme is composed of three projects, one each for Bacteria, Viruses and Parasites.

[Figure: Scanning electron micrograph of Shigella bacteria interacting with a cell.]

We will sequence the genomes of multiple isolates of many bacterial species which are human pathogens including Salmonella, Staphylococcus aureus, Streptococcus pneumoniae, Mycobacterium tuberculosis and Clostridium difficile providing us with an overview of natural variation in populations of these microorganisms.

Working with researchers from hospitals in disease endemic regions we will map variation in these pathogen genomes onto local geographic coordinates. Then, using the genomic variants as markers of relatedness, we will be able to track the spread of each species; from ward to ward of a single hospital over days or from continent to continent over months and years. Furthermore, by comparing the genome sequences of many well characterised isolates we will be able to identify individual variants that have altered virulence potential, resulted in antibiotic resistance or have caused escape from vaccination.
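One simple way to picture the use of genomic variants as markers of relatedness is to count the sites at which two isolates differ: the fewer the differences, the more closely related the isolates are likely to be. The sketch below is an illustration under that assumption only. The isolate names, the variant profiles and the functions `snp_distance` and `pairwise_distances` are hypothetical, and practical studies would typically build phylogenetic trees rather than rely on raw pairwise counts.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def snp_distance(a: str, b: str) -> int:
    """Count the variant sites at which two equal-length profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same variant sites")
    return sum(1 for x, y in zip(a, b) if x != y)


def pairwise_distances(isolates: Dict[str, str]) -> List[Tuple[str, str, int]]:
    """Return (name, name, distance) for every pair, closest pairs first."""
    pairs = [
        (m, n, snp_distance(isolates[m], isolates[n]))
        for m, n in combinations(sorted(isolates), 2)
    ]
    return sorted(pairs, key=lambda p: p[2])


# Toy profiles, invented for this example: one base per variant site.
isolates = {
    "ward_A_day1": "ACGTA",
    "ward_B_day3": "ACGTT",
    "other_continent": "TCGAT",
}

ranked = pairwise_distances(isolates)
closest = ranked[0]
print(closest)
```

In this toy data the two ward isolates differ at a single site and so rank as the closest pair, which is the kind of signal that would suggest recent transmission within a hospital.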

One set of microorganisms of particular interest are those that inhabit the human body itself, known as the microbiota. In healthy human beings our internal and external body surfaces (for example gut and skin) have been colonised by a complex mixture of bacterial species. We will be sequencing the genomes of bacterial samples from various body surfaces to identify the organisms that are present and will then investigate how the bacterial mixture varies in health and disease, linking with studies in the Human Genetics and Mouse and Zebrafish Genetics Programmes.

In parallel with these studies of natural populations of bacteria, we will investigate the phenotypic consequences of experimentally knocking out genes from these microorganisms. By creating populations of bacteria, each member of which has a different gene knocked out, and then exposing such populations to different experimental conditions in vitro and in vivo we will be able to identify genes which are necessary for survival, virulence and reproduction and assess whether the same or different genes are necessary under different environmental conditions.

For disease-causing protozoa (single celled parasites), we will compare genome sequences between and within species to understand the genetics of drug resistance and virulence. Our primary targets will be Plasmodium (malaria, see below), Leishmania (leishmaniasis) and Trypanosoma (sleeping sickness).

Other parasites, such as helminths (multicellular worms), have large and complex genomes and are currently not sequenced. These include Schistosoma (bilharzia), Echinococcus (tapeworms) and Strongyloides (threadworms). Such neglected pathogens infect millions of people worldwide with substantial morbidity. We will therefore now generate their sequences, as we have for many organisms in the past. The importance of laying this genomic groundwork cannot be overstated. It opens completely new avenues of study, attracting scientists to work on organisms which have previously been difficult to investigate, in essence creating a new community of researchers and a new scientific field.

[Figure: Schistosoma mansoni trematodes under low magnification (78x). Credit: CDC/Marianna Wilson]

We will sequence the complete genomes of viruses from hundreds of patients with diverse viral illnesses. This will allow us to track the development and spread of viral diseases, for example influenza, and provide unprecedentedly fine-grained insights into the evolution of epidemics. An intriguing path of exploration into the unknown will be the search for currently unidentified viruses that cause human disease. We will investigate this by sequencing samples from patients with illnesses that have features of viral infection, looking for new DNA or RNA sequences which may turn out to be from novel viral pathogens.

[Figure: Herpes simplex virus.]

It is important to recognise that many of the investigations outlined here as research studies represent templates for the application of sequencing technologies in routine clinical settings in the future. For example, sequencing the genomes of bacterial samples fresh from a patient and without culture may become the optimal way to identify the pathogenic organism, at the same time providing understanding of where it has come from and predicting responsiveness to therapy. Thus our studies are likely to be forerunners of future diagnostics and we intend to play a major role in facilitating this transition.

A key influence in determining the course and outcome of infection by a pathogen is the response of the host, in particular of its immune system. We will explore the role of host genes in preventing or facilitating pathogen infection by using mice and embryonic stem cells in which individual genes have been knocked out. By challenging these animals and cells with bacteria and viruses, and then monitoring the success and extent of pathogen infection we will be able to identify host genes which modulate the spread of the organism, potentially providing new avenues for prevention and therapy.

4.4 Malaria

Malaria affects half a billion people and is associated with substantial mortality and morbidity in many parts of Africa, Asia and Oceania. The Malaria Programme aims to make a key contribution to the scientific body of knowledge which will underpin global efforts to eliminate the disease.

There is considerable variation in mortality and morbidity due to malaria (both between populations and within populations) that is the consequence of many complex interacting factors. These factors may influence the host (the human), the vector (the Anopheles mosquito) or the Plasmodium parasite itself.

[Figure: Red blood cells infected with malaria parasites. Credit: Public Health Image Library]

Our studies over the next five years will investigate DNA variation in the genomes of all three organisms, the human, vector and parasite, in order to understand the genetic basis of different patterns of malaria infection and its outcome. The Programme is composed of two projects and employs both large scale sequencing of naturally occurring variation and experimental studies in model organisms to generate insights into disease causation and to provide strategies for designing new therapies.

A major obstacle to malaria control is the ability of the malaria parasite to develop resistance to anti-malarial drugs and of the mosquito vector to develop resistance to insecticides. We will be sequencing the genomes of malaria parasites from thousands of patients with the disease and of hundreds of Anopheles mosquitoes. The variation we find in the genomes of both organisms will provide insights into the genetic factors that underlie resistance and will allow us to explore their geographical spread.

In performing these studies, we will also have generated tools that allow us in future to gain early warning of the emergence and spread of new forms of drug or insecticide resistance and to monitor the effects of public health interventions. Moreover, the sequence data accumulated by these studies will provide a remarkable resource for the future study of the basic biology of parasite and vector.

In addition to the effects of genome variation between individual Plasmodium parasites and individual Anopheles mosquitoes, variation in the severity of malaria infection is also attributable to DNA sequence differences between the genomes of the humans who are infected. In order to gain a more complete understanding of this variability, we will be examining thousands of human genomes from multiple populations in Africa, Asia and Oceania generating a detailed catalogue of the genetic determinants of malaria resistance. The ultimate goal of these studies is therefore to identify the molecular mechanisms conferring protective immunity against malaria, how these vary and subsequently to use the insights gained to develop new protective or therapeutic strategies.

Our studies of malaria and other pathogens have been based upon capacity building, infrastructure support and thoughtful models of partnership with colleagues in Africa. These will continue to be themes in this Programme and will be important mainstays in the future expansion of the Human Genetics Programmes in Africa.

Proceeding alongside these studies of naturally occurring variation will be studies using experimental modifications of model organisms to understand better how malaria develops and to identify possible drug targets. Our studies will examine the mouse parasite Plasmodium berghei, which is of the same family as Plasmodium falciparum and Plasmodium vivax, which cause malaria in humans.

Using technology we have developed at Sanger, we will knock out several thousand genes in Plasmodium berghei and investigate how effectively parasites missing each gene are able to infect mice. In principle, P. berghei genes which are required for successful infection constitute plausible drug targets, because anti-malarial drugs would be expected to inhibit the protein encoded by the gene and thus mimic the effect of a gene knockout.

Finally, we will be examining directly the physical interactions between Plasmodium falciparum merozoites (the phase of the malaria life cycle that infects human red blood cells) and the red blood cell itself. The aim of these studies is to identify the proteins on the surface of the parasite and those on the surface of the red blood cell which directly interact in order to allow the parasite entrance to the cell. These host-pathogen interactions might well represent sensitive points in the life cycle for development of new therapies.

4.5 Mouse and Zebrafish Genetics

The aim of the Mouse and Zebrafish Genetics Programme is to use mice, zebrafish and embryonic stem cells to examine systematically the roles of thousands of genes and ultimately all the genes in the genome. The overall strategy we will adopt is to “knock out” or render inactive through mutations both parental copies of each gene (homozygous knockouts) and subsequently observe the effects on the phenotype of the organism or cell. In addition to the systematic studies outlined in this Programme, experiments in model systems which investigate specific genes found through our studies of natural variation in humans and in pathogens are embedded in each of the previously described Programmes.

[Figure: Mice are used as model organisms: obese mouse (left) and its littermate control (right).]

By their nature, the experiments in this Programme will generate physical resources (mice, zebrafish, and cells) which can be subject to further detailed analysis in order to understand how each gene exerts its effects. These will be freely available to scientists around the world.

In the next five years we will identify inactive versions of thousands of zebrafish genes and subsequently generate fish with non-functioning versions of both parental copies of each gene. Indeed, the already ambitious goals of this project have advanced further over the last few months because the major constraints on identification of mutant animals have been the rate of DNA sequencing and sequencing costs. Thus the recent advances in sequencing technology mean that we now realistically anticipate knocking out every protein coding gene in the zebrafish genome over the course of the next quinquennium.

[Figure: Adult zebrafish. Credit: Wellcome Library, London]

Through Sanger’s provision of the reference zebrafish genome sequence (now essentially finished) and our generation of mutant fish for every gene in the zebrafish genome, we anticipate that the Institute will have transformed the use of this model organism in studies of biology. Since basic phenotyping of zebrafish is relatively straightforward and since zebrafish can be studied in their thousands with relative ease, our studies promise to confer an ever more important future role for this model organism in large-scale biology.

Using the resource of mouse ES cells which has been generated in the last five years, which includes targeted heterozygous null alleles of most protein coding genes in the genome, in the next quinquennium the Mouse Genetics project will generate and phenotype 500-1,000 homozygous knockout alleles in mice. These studies will provide insights into genes which are required for normal embryonic development, which are needed for normal fertility and which underpin a range of metabolic and other processes, for example resistance or susceptibility to infection. It is anticipated that these studies will become part of a larger collaboration orchestrated by the International Mouse Phenotyping Consortium providing the foundation of a future systematic genome-wide annotation of mammalian genes with respect to their role in diverse physiological and developmental processes.

Examining the consequences of knocking out genes in whole animals has the important advantage of reporting each gene’s role in the construction of tissues, organs and the complete organism. However, by the nature of experimentation on whole animals, with its requirements for breeding and animal maintenance, studies are slow and resource intensive. Thus in the next five years we will also introduce studies of homozygous gene knockout in embryonic stem cells (ES cells).

ES cells are derived from early embryos. They are immortal and can differentiate into all cell types in the body. Since many ES cell lines can be examined in parallel and at relatively low cost, these studies will allow many more genes and phenotypes to be examined than is possible with whole animals. By differentiating ES cells to various cell types we will, therefore, be able to investigate the role of each gene in multiple different cellular contexts.

[Figure: A colony of undifferentiated mouse embryonic stem cells growing in culture. Credit: Jenny Nichols, Wellcome Images]

Indeed, although we originally proposed to carry out these studies in mouse ES cells, we now believe that recent technological advances will allow us to introduce another model organism into our studies, namely humans, in the guise of human induced pluripotent stem cells (iPS cells). These are human cells derived in vitro from somatic human cells (for example fibroblasts, cells of the haematopoietic system etc.) and which have many features of mouse ES cells.

Although this technology is still in its infancy and uncertainties remain, it appears likely that it will soon be feasible to convert accessible cells from humans (for example from blood or skin) into iPS cells and then to differentiate them into many of the cell types present in the human body, such as nerve or liver cells. Moreover, it also appears likely that these cells will be genetically manipulable in a similar manner to mouse embryonic stem (ES) cells.

Thus a major new avenue of research is opening up in front of us; to study the biological consequences in vitro of both naturally occurring variation (using iPS cells potentially derived from thousands of people) and artificially engineered DNA variation across the spectrum of human cell types. The forms these experiments might take and their ramifications are myriad. In particular, however, they may ultimately offer the opportunity for introducing small molecules or drugs into these systems to assess their biological effects.

To implement this type of experiment on a large scale, we are currently scoping the requirements for a new core facility at Sanger to undertake large scale, high throughput culture and phenotypic monitoring of cells. We are already well-informed and experienced in such endeavours through the cell culture facility used by the to expose cancer cell lines to anticancer drugs. Underpinned by robotic platforms and sophisticated imaging systems, this platform would allow us to culture thousands of cell lines simultaneously, examining how DNA variation and other perturbations cause changes in the biological function and metabolism of human cells.

4.6 Genome Informatics

The Genome Informatics Programme is central to scientific progress at the Institute and elsewhere. A major global challenge over the next five years will be coping with the amount of biological data generated, particularly in genomic and genetic research, which is set to grow exponentially. At Sanger, we will be in the front line of confronting the problems posed by analysis of these data and in providing organised and accessible databases that facilitate global biological innovation and discovery by others around the world.

[Figure: Inside the Data Centre at the Sanger Institute.]

The Programme aims to continue organising and developing important data resources, enabling systematic integration of data and methods, and driving innovation in computational biology. A key theme for the next five years is to develop these resources and analysis strategies taking into account variation in the genome within populations, whether of humans, pathogens or model organisms, and integrating raw phenotype data. This will maximise the potential for interpreting experiments carried out in individuals or in specific cell types. Many of the activities in this Programme in particular are collaborative or aligned with the EMBL-EBI, most notably the Ensembl genome browser.

An important aim over the next five years will be reanalysing and reassembling three key vertebrate genomes, human, mouse and zebrafish, in particular to capture regions of significant variation within the population as alternative haplotypes. Although the human and mouse genomes have been announced as “finished” there are regions within each in which the assembly requires revision and other areas which have simply not yet been sequenced or adequately assembled due to recalcitrant local sequence features. Given their central importance, we will continue to improve the sequences of these genomes.

Equally important, we will continue to provide key reference outputs from these and other genomes. Notably, we will compile complete sets of protein coding and RNA genes together with accurate structures of each gene. In collaboration with other Programmes, we will develop improved ways to represent genome sequences from populations, such as from the Human Genetics and Pathogen Variation Programmes, and will develop better representations of the increased amounts of complex phenotype data linked to individuals being studied in the Human Genetics Programme and from the Mouse and Zebrafish Genetics Programme. These, and the wealth of additional data types which are being associated with each of these genomes by many research groups around the world, will be presented to the research community in an easily navigable and intelligible way through the Ensembl genome browser (in collaboration with the EMBL-EBI).

We further aim to provide databases that organise sequence data and present it in ways that maximise the biological knowledge that can be extracted. For example, we will continue our organisation of proteins into families which are related to each other through sequence similarity (through the database) and similarly organise RNA genes into families through structural similarities. In addition, we will be developing a variety of approaches and tools that will be useful in analysing and understanding genome data. These will include novel and improved ways of assembling individual genomes, ways of identifying regulatory regions within the genome and methods for predicting the likely functional effects of genomic variation.

It is anticipated that organisation of data linking genome variation and phenotype in humans will lead to increased connections between this Programme and healthcare databases. One aspect will be facilitating research involving data from medical records. In many countries electronic medical records systems are being built and programmes are being set up to facilitate research by federating datasets in a secure fashion, such as the UK Research Capability Programme of the National Health Service. These will ultimately want to federate with databases of genetic annotation and we will facilitate this.

Beyond what is planned in this Programme, of major importance in the next five years will be consideration of the types of data resources and tools that could be developed for use by clinicians, researchers and the general public in interpreting results emanating from personal genomes. The integration of raw data concerning the phenotypes associated with DNA variants is already planned. However, extracting validated information from this and developing decision making tools to aid clinicians, and possibly others, is complex. It is a clinically based project that will require additional resources and collaborations, for example with specialised clinical research institutes. As this could ultimately be a major translational output of the overall Programme, we are currently reviewing with EMBL-EBI what the priorities are in this area, what would be required to service them and what could be realistically achieved.

5. Research into diseases of developing countries

Studies of diseases prevalent in the developing world, particularly in Africa and Asia, have been a distinctive feature of the Sanger portfolio for many years. In the past, this has included sequencing the genomes of many pathogens including those that cause malaria, leishmaniasis and schistosomiasis.

More recently, there have been studies of molecular epidemiology, in which genome variation within populations of pathogenic microorganisms has allowed detailed tracking of their geographical spread, with direct implications for public health. Research into diseases of developing countries is primarily instigated by the health needs of the local populations. There are also, however, particular opportunities for scientific insight based, for example, on the fact that Africa is the ancestral origin of all human populations.

[Figure: Infectious diseases, such as malaria, have major health impact on children in Africa.]

Genome sequencing of more pathogens and further molecular epidemiology of key organisms will continue. As previously described, studies of the genomics of Malaria now constitute a Programme in its own right. This is a mature collaborative enterprise based on partnership with workers in the field that has been built over several years and which has accumulated important experience for us to draw on in future.

We are now proposing to initiate further projects in Human Genetics, including an investigation into the genetic basis of variation in disease severity in sickle cell disease and studies of susceptibility to metabolic diseases. Additional potential studies have emerged in the last few months and we intend actively to facilitate and develop contacts that bring new research opportunities.

These developments will be undertaken in close collaboration with the Wellcome Trust Overseas Programmes. They are also firmly based on the Institute’s previous experience in building research partnerships in developing countries and will rely on networks formed by pre-existing programmes such as MalariaGen.

Associated with building scientific research programmes we will contribute to local capacity and expertise by supporting key workers, by exchange programmes with collaborators and through training. Several students on our Graduate Programme are from developing countries, a trend that we will look to further encourage, develop and support. We will also support initiatives in the Wellcome Trust Advanced Courses at sites in developing countries, facilitating their local establishment with the attendant growth of expertise.

6. Core platforms for data generation and analysis

Following its establishment as a genome sequencing centre in 1992, experiments at Sanger have been conducted at large scale and high throughput, resulting in the generation of large amounts of data. The scale of our experiments remains a key defining characteristic that distinguishes Sanger science from that of most biological institutes and universities, and is a feature that we will continue to foster.

Conduct of science on this scale is critically dependent upon the existence of large core facilities and platforms. These require substantial infrastructure, are manned by multiple large teams of scientists and are constituted of complex pipelines which require careful management. These attributes confer upon Sanger a different shape, demographic and financial structure compared to other biological institutes.

The Institute has three major core facilities which are central to our scientific profile and productivity. These are the DNA analysis pipelines that conduct sequencing and genotyping, the Mouse and Zebrafish facility and the Information Technology resource (Data centre) which facilitates storage and processing of data. In addition, there are two smaller facilities for mass spectrometry and cytogenetics.

The current parameters of the three major core facilities are briefly summarised below.

DNA pipelines

The Institute’s raw sequencing output is now ~1,000 gigabases per week, the equivalent of 1,800,000 bases of DNA per second and of two 20-fold coverage human genomes per day. We are one of the largest sequence producers in the world.
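The equivalences quoted for the sequencing output can be checked with a short unit conversion. The sketch below assumes a haploid human genome of about 3.1 gigabases, a figure that is not stated in this Plan, and takes the rounded weekly output of 1,000 gigabases as its input. On those assumptions it gives roughly 1.65 million bases per second and a little over two 20-fold genomes per day, the same scale as the figures quoted above.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60   # 604,800 seconds
DAYS_PER_WEEK = 7

# Assumption (not from the Plan): approximate haploid human genome size in bases.
GENOME_SIZE_BASES = 3.1e9
COVERAGE = 20                          # "20-fold coverage", as quoted in the text

weekly_output_bases = 1_000e9          # ~1,000 gigabases per week, as quoted

# Convert the weekly output to a per-second rate.
bases_per_second = weekly_output_bases / SECONDS_PER_WEEK

# Convert the daily output to the number of 20-fold human genomes it represents.
genomes_per_day = (weekly_output_bases / DAYS_PER_WEEK) / (GENOME_SIZE_BASES * COVERAGE)

print(f"{bases_per_second:,.0f} bases per second")
print(f"{genomes_per_day:.1f} genomes at {COVERAGE}-fold coverage per day")
```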

The main platform in the sequencing facility is currently 36 Illumina GAIIx second generation DNA sequencers which are shortly to be swapped for ~20 HiSeq machines, the latest incarnation of the Illumina workhorse, which we anticipate will provide our main platform for the next 18-24 months. We also have 15 ABI capillary machines and two Roche 454 machines, and will be incorporating one Pacific Biosciences and two Ion Torrent machines by the end of the year. The pipelines for handling and preparation of samples, running of machines and early data processing are staffed by 140 people.

An Illumina GAIIx second generation DNA sequencer.

Mouse and Zebrafish facility

The Mouse and Zebrafish facility has approximately 21,000 mouse cages and 5,000 aquatic tanks, annually producing over 210,000 mice and more than 50,000 fish. It is the largest such facility in Europe with one of the highest knockout mutant production capabilities in the world. The facility is run by more than 40 highly trained licensed animal technicians who supervise a diverse series of technical procedures in a highly automated and carefully tracked unit that is frequently used as an exemplar and model by the Home Office.

Mice exploring environmentally enriched cages in the Institute’s Mouse and Zebrafish facility.

Information Technology

The IT infrastructure at Sanger is one of the most extensive worldwide in the life sciences. It provides storage and compute services for analysing data that flow from all the scientific Programmes, and the infrastructure for externally distributing our data via services such as the Sanger website and the Ensembl genome browser.

By far the largest demand for analysis and storage, however, is generated by the DNA sequencing and genotyping pipelines. To match this demand, we now have seven petabytes (7,000,000,000,000,000 bytes) of online storage, housed in the Campus Data Centre. To keep up with the demands of DNA sequence analysis we have recently increased the number of compute cores in our central analysis farm, bringing the total available to research teams to more than 6,000 (equivalent to 6,000 personal computer hard disks). The facility is maintained by a team of 35 hardware and software support staff.

Increase in requirement for Sanger Data Centre storage 2000-2010.

Maintaining world-leading core facilities

Over the next five years we will maintain and further develop these core facilities in order to sustain our scientific position. In the first instance, this will require careful consideration and management of their size and make-up. The challenges and complexities in making these decisions are discussed below in sections 7.1 and 7.2 on Sequencing strategy and Data strategy.

Equally important, however, will be the daily operation of the core facilities. Over the last few years we have been rationalising parts of the management structure in all the core facilities, most notably in the DNA pipelines, and this process will continue.

The core facilities will be supervised by a cadre of professional senior scientific managers. These individuals will have particular expertise in the operation of complex “pipelines” that may involve various combinations of people, animals, machines, reagents and computers. Given the central importance of the core facilities in the generation of our science, a key aim for the next five years will be to strengthen this cadre further, both through recruitment and development of current personnel.

This cadre will have responsibility for instilling a strong culture of operational efficiency, with attention to timeliness, quality of output, communication to users, budgetary monitoring and succession planning. They will not be members of our scientific Faculty (whose role is to conceive and oversee the science) but will interact closely with and be advised by Faculty, notably through the internal Sanger Committees that oversee Sequencing, Genotyping, IT and the Mouse and Zebrafish facility. Their importance is reflected in the new position of Head of DNA Pipelines, who will be a Board of Management member and who will oversee sample logistics, sequencing, genotyping and microarray expression analysis together with primary data processing from these platforms. We have recently made an appointment to this position.

To provide independent oversight and ensure that the core facilities are effectively operated, each will be subject to formal review at least once during the forthcoming quinquennium by a set of panels composed of internal and external scientists with appropriate expertise.

7. Strategic issues across all Programmes

In considering the future delivery and shape of our science, two major strategic issues that cut across all the scientific Programmes warrant further consideration: our strategies for DNA Sequencing and for the Data we generate. These are, therefore, specifically addressed below.

7.1 Strategy for DNA sequencing

Sequencing is the most rapidly-changing technology used in the Institute. Over the last three years there have been transformative changes in the type of sequencing machine in the Sanger facility. ABI capillary sequencers have been reduced to ~15 (from ~70 five years ago and ~140 at the height of human genome sequencing) and second generation Illumina machines have increased to approximately 40 (from none four years ago). This conversion has been accompanied by major changes in staff requirements, necessitating redundancies which have required sensitive management. The costs, throughput and platforms are likely to change again dramatically and unpredictably over the next five years. The costings used throughout the Plan have been best estimates based on our knowledge of past developments and likely future trajectories.

The sequencing facility at the Sanger Institute. [Wellcome Library, London]

Whatever the challenges in navigating these rapidly shifting sands, leadership in the introduction of second generation sequencing has engendered many of the Institute’s most notable discoveries and publications over the last two years and Sanger is generally viewed as a scientific pioneer in the area. This is a status we wish to preserve. However, there are several questions that will continue to require strategic evaluation over the next five years.

Since its establishment, Sanger has been one of the “big six” global genome centres, the others being the Broad Institute, Washington University, Baylor College of Medicine, the Joint Genome Institute and the Beijing Genomics Institute. Until recently, our sequencing capacity essentially matched or outstripped that of the other genome centres and was far greater than that of ordinary biological institutes or universities. In itself, this provided us with an almost unique niche in global biological science.

The current size of the sequencing facility at Sanger has been pitched to accommodate the scientific projects of Sanger Faculty and their collaborators and is predominantly determined by the consumables budget available for these projects (from core funding, from out-of-envelope Wellcome Trust funding or third party funding). After detailed discussions of this matter, we have decided that we do not wish Sanger to develop as a facility which provides “service” sequencing for projects in which we do not have scientific interest or involvement in the experimental design or analysis of data. This is a different approach from that taken by some other genome centres.

It is worth reflecting, however, on significant recent shifts in the relative size of our sequencing facility compared to that of other genome centres. Eighteen months ago, Sanger sailed the largest fleet of second generation sequencing machines in the world. Now, we are essentially third, with other centres anticipating running more than twice our number of machines. In part, this change is due to extraordinary, and possibly short term, influxes of funding at these centres. In part, it may also be due to their adoption of a “service” model. Indeed, commercial entities whose business incorporates service provision of next generation sequencing have also sprung up, including Complete Genomics, Illumina and others.

Equally important to consider is that many smaller, next generation sequencing centres have mushroomed around the globe, each of which may end up being from 20% to 75% our size. The latter development has occurred, at least in part, because the cumbersome and difficult-to-replicate liquid handling infrastructure that used to define large genome centres like Sanger is not required for next generation sequencing.

Although we understand some of the underlying forces behind these changes, the relative size of our sequencing fleet (albeit a somewhat crude measure) will reflect the proportion of DNA sequence based science that we are contributing globally, and this is apparently diminishing. Moreover, in certain scientific areas, for example cancer genomics, Sanger’s sequence output will likely be similar to that of much smaller centres which have been developed primarily for the purpose of cancer genomics.

We will have to carefully monitor the implications of these changes over the next quinquennium. We anticipate that the focus, scale and scientific impact of the science we are proposing in this Strategic Plan, coupled to our expertise in generating sequence data, in its analysis, in its interpretation and in its public display will sustain our leadership position. There is, however, much more DNA sequence based science that we could be conducting, leveraging further all our expertise and infrastructure. Where possible, therefore, we will seek further funding for sequence based studies and continue to expand and diversify the ways in which we use DNA sequence.

In the light of these changes in the landscape of sequencing provision, one question that we have considered internally is whether Sanger should maintain its own sequencing facility or devolve, fully or partially, to service providers. Our current position is that we should maximise efforts to maintain one of the largest and most efficient sequencing facilities in the world. This position is dictated by the centrality of sequencing to our science, our expertise and infrastructure, the still evolving and unpredictable nature of the field, our need for close daily control of sequencing priorities, the uncertain business models and futures of commercial sequencing providers, the major differences in the nature and quality of sequencing produced by providers, our need to remain close to the sequencing hardware in order to be at the cutting edge of analysis and interpretation, our lower cost, and our wish to remain leaders in the field.

At our recent internal strategic sequencing review in April 2010, we reaffirmed the conclusion that our major second generation platform would be from a single manufacturer and that this would continue to be based on Illumina machines. The next crucial question is likely to be at what point third generation technologies, which are currently in development, will outstrip the throughput and costs of the current upgraded second generation machines. It seems likely that this will be within three years, but may be sooner. Crossing this boundary will probably require a complete technology refresh for the bulk sequencing platform within the Institute. Beyond this, there are suggestions of technologies that could underpin fourth generation machines with sequencing rates a thousand times faster than those currently being explored.

Coping with this uncertainty will require nimble and forward-looking processes within the Institute. Underlying this is the Sequencing Committee (SeqCom), which has members from both the scientific Faculty who are at the forefront of sequencing applications, and the sequencing production and development managers who have close links with the technology providers. SeqCom regularly reviews protocol and technology developments, and has a formal strategic platform review every 18 months. Both Faculty and the sequencing development group interact with current and future technology providers, and we aim to appraise new technologies as, or before, they become public, and bring in beta-test machines from existing and new manufacturers where they are likely to be useful. We also cultivate links with more blue-sky technology developers who have the potential for future sequencing developments.

7.2 Strategy for data handling and sharing

In the next five years the scale of data generated by Sanger is expected to continue increasing exponentially. This increase is largely determined by the changes in technology described above which have multiplied several thousand-fold the amount of sequence information that can be obtained for the same cost.

In addition, however, over the last few years the rates of improvement of hard disk density, CPU speed and network bandwidth have all slowed, thus falling even further behind the growth in our data production. This trend of data production outstripping our ability to store and analyse it is likely to continue in the next quinquennium.

The Ice Cube: external view of the Data Centre at the Sanger.

Our strategy for handling data over the next five years is influenced by this trend, by our continuing adherence to principles of data sharing and by our mission to organise and extract biological information from data (in data resources such as Ensembl or COSMIC) which facilitates scientific usage by others.

With a view to minimising our IT requirements, we will continuously review the nature of data that is stored and the duration of storage in the light of our evolving experience of scientific requirements. With data storage and analysis becoming an ever more significant component of the full process of sequence generation, knowing the disk space and CPU required to process and analyse a unit of genome sequence will allow us to adjust allocation of resources, enabling us to adapt to future jumps in sequencing technology.
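The kind of per-unit accounting described above might be sketched as follows. This is a minimal illustration under stated assumptions, not the Institute's actual planning method: the class and function names are hypothetical, and the per-gigabase disk and CPU figures are placeholders chosen purely to show the arithmetic.

```python
import math
from dataclasses import dataclass


@dataclass
class UnitCost:
    """Resources consumed per gigabase of raw sequence (placeholder values)."""
    disk_tb_per_gigabase: float
    cpu_hours_per_gigabase: float


def weekly_requirements(gigabases_per_week: float, cost: UnitCost) -> tuple[float, float]:
    """Return (disk in TB, CPU-hours) needed for one week of sequence output."""
    disk_tb = gigabases_per_week * cost.disk_tb_per_gigabase
    cpu_hours = gigabases_per_week * cost.cpu_hours_per_gigabase
    return disk_tb, cpu_hours


def cores_needed(cpu_hours_per_week: float, utilisation: float = 0.8) -> int:
    """Cores required to clear a week's CPU-hours within the week.

    `utilisation` is the assumed fraction of time each core does useful work.
    """
    hours_per_week = 7 * 24
    return math.ceil(cpu_hours_per_week / (hours_per_week * utilisation))


if __name__ == "__main__":
    # Hypothetical unit costs: 0.005 TB and 10 CPU-hours per gigabase.
    cost = UnitCost(disk_tb_per_gigabase=0.005, cpu_hours_per_gigabase=10.0)
    disk_tb, cpu_hours = weekly_requirements(1000, cost)
    print(f"Disk per week: {disk_tb:.1f} TB")
    print(f"Cores needed:  {cores_needed(cpu_hours)}")
```

Once the unit costs are measured rather than assumed, a jump in sequencing output only requires changing `gigabases_per_week` to see how storage and compute allocations would need to move.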

Under the Sanger Institute data sharing policy, the raw data from large scale experiments will be shared publicly. Where there are ethical concerns about the potential identification of human subjects, managed data access policies will be employed in which members of the research community will be furnished with data if they are bona fide researchers and agree to adhere to our principles of data usage.

For most data types, data sharing will be achieved through submission to public repository databases, mostly held at the EMBL-EBI, reducing our need to maintain all data in house. Although the amounts of data are large, the expected data output from Sanger is within the projections for data growth expected by EMBL-EBI.

While “repositories” ensure that data is available to other researchers, the data analysis upon which our researchers make discoveries is increasingly dependent on being able to scale our storage and compute resources to handle very large datasets. Thus we must remain competitive in this area and will need to invest with foresight in hardware over the course of the next five years.

Organising and extracting value from raw data in database “resources” enormously empowers the scientific usage of the data. Sanger Institute researchers will continue to maintain our successful database resources which organise and present data in areas where we perceive a need and in which we have specific expertise. These include Pfam (for protein families), COSMIC (for somatic mutations in cancer), GeneDB (for pathogen sequences), DECIPHER (for human germline copy number changes that cause developmental disorders) and Ensembl (for genome viewing, in collaboration with EMBL-EBI). We will, however, regularly review the justification for maintaining these at Sanger and, where appropriate, transfer them to other agencies.

8. Strategy for translation

An example of translation: a tumour is removed from a patient → rearrangements are found → a blood sample is tested for rearranged DNA that has leaked out from the tumour.

At the Institute we seek to maximise the translation of our scientific output to healthcare benefit. A central pillar of our translation strategy has been open release and easily intelligible presentation of information together with provision of physical resources such as mice, zebrafish, mouse embryonic stem cells and DNA clones. Material testimony to the power of this approach is ever busier traffic on our website and the increasing stream of requests for resources. This strategy will continue.

In the past we have also adopted complementary approaches. Where appropriate, we have taken out intellectual property that provides the basis for fostering industrial partnerships. We have also initiated projects within Sanger with explicit translational goals. These include the DECIPHER database, which directly facilitates sharing of genomic data from individuals with developmental abnormalities, and translational projects in cancer genomics searching for genomic features of cancer cells that predict sensitivity to anti-cancer drugs (funded by a Wellcome Trust Strategic Award). In a similar vein, we recently obtained funding from the Health Innovation Challenges Fund for additional Sanger based translational projects including the Deciphering Developmental Disorders Project and for initiatives in the detection of cancer derived DNA in plasma. Most notably, Sanger recently spun out its first company, Kymab, the goal of which is to develop humanised antibodies that can be used as therapeutics.

In the next quinquennium, we aim to ratchet up our translation activities, recognising that our science is likely to provide further opportunities, which we should grasp, and that there are fruitful partnerships with outside entities to be made. This will not entail a shift in our scientific perspective away from basic science. It will, however, involve a greater alertness and opportunistic attitude to translational possibilities and a more directed, rapid and bespoke approach to nurture them.

This shift will be engineered in several ways. Together with our Faculty we will regularly review our scientific portfolio in order to identify, at an early stage, research themes that show promise for translation. Each will be considered on its own merits, with the choice of translation strategy determined by the approach that is likely to maximise benefit to human health.

We will identify scientific gaps that exist between our basic research and research that might be ripe for translation and aim to fill them by providing internal bridging funding for the requisite intervening studies. Finally, we will be creative in this endeavour, developing new models where necessary, as recently illustrated by Kymab, a commercial entity in which access to resources for the research community is built into the strategy.

To provide internal momentum and drive to these activities we are specifically proposing to set up a Translation Office at the Institute and employ a Science Translation Manager, a scientist with previous experience of development and translation, and an intellectual property lawyer. This team will be supported by a junior assistant and an administrator. Their role will be to review the scientific portfolio, to support and inform Faculty with respect to opportunities and approaches to translation, to conduct the diligence and groundwork for translational initiatives and to interact closely with the Wellcome Trust Technology Transfer Division.

We will allocate ~£250K per year from our core budget to provide bridging funding for developing aspects of our scientific portfolio to a point at which they are attractive for translation grants and awards. Our intention is that, eventually, this internal budget will be covered by royalties and revenues received from our translation activities. We will establish a committee that will serve as a forum for discussion of further translational initiatives and will provide oversight and monitoring of the impact of our bridging funding.

We will also increase external interactions between Sanger and the pharmaceutical and biotech industries in order to publicise the opportunities that Sanger science holds for industry and to increase our scientists’ awareness of what is involved in developing and exploiting our science. Early plans to this effect include an Open Day for potential industrial partners that will take place at Sanger in November 2010. We are also investigating ways in which Sanger can partner with the pharmaceutical sector in longer term strategic alliances forged through complementary expertises, requirements and scientific agendas.

9. Developing our people

At Sanger we aim to provide a stimulating and rewarding work environment in which individuals at all levels can learn, develop and grow and, in so doing, to cultivate the highest calibre of scientific staff who can play their part in the delivery of world class science.

Staff distribution May 2010: Staff Scientists & Scientific Managers, 129; IT/Informatics, 215; PhD Students, 50; Postdoctoral, 82; Faculty, 38; Research/Technical Support, 266; Administration, 91.

Central to this mission is our commitment to developing and nurturing future generations of scientists in genomic sciences. We seek to inspire the young, to provide researchers and clinicians with extraordinary opportunities to develop their skills and to foster scientific leadership capability through high quality training and mentoring. In the next five years we will build on existing partnerships and where necessary forge new links to further enhance the Institute’s reputation as a training environment. We will also seek ways in which to support capacity building in developing countries.

The 38 Faculty at Sanger are the creative force behind the Institute’s science and are expected to be leaders in their fields. Over the next five years we anticipate that the number of Faculty will remain more or less steady with similar distribution between the three levels of Career Development (the most junior level, of whom there are 8), Group Leader (9) and Senior Group Leader (21). However, we will recruit in order to refresh the Institute’s intellectual pool and respond to scientific fluxes and opportunities. In consequence some current members of the Institute Faculty will leave. We have a well developed Faculty model, and a process for implementing it, that will allow these changes to be achieved.

Some members of the Institute’s Faculty.

There has been remarkable success in attracting clinically trained scientists to Sanger, an achievement that has brought new directions to our science and which we aim to maintain and develop. However, we are cognisant of some issues that do require consideration. Notably, the number of women Faculty is low (four out of 38). Indeed, taking a broader view, the proportion of women falls as one moves up through the Sanger staff structure. We will review our approaches to recruitment and to managing equality and diversity, in particular our policies with respect to families to make sure that we are doing all we can to support the employment of talented female scientists.

Recruitments which are key to the future of the Institute are the Heads of Human Genetics and of Mouse and Zebrafish Genetics. The process for the Head of Human Genetics is well advanced and we are optimistic that an appointment will be made by the end of 2010. We are currently reviewing the criteria and background we aspire to in a new Head of Mouse and Zebrafish Genetics.

One concept that will be actively developed over the next five years is that of Associate Faculty. These will be scientists who work at other institutions on scientific agendas that would benefit from using Sanger’s expertise and infrastructure and which we would like to foster within the Sanger environment.

While maintaining their positions elsewhere, Associate Faculty will visit Sanger regularly, participate in, contribute to and enrich Sanger’s scientific discourse by bringing in new directions and perspectives, and will be able to apply for grant funding through their Sanger position. We will provide office space for them and their scientific staff, facilitate their travel and accommodation, and support them in using Sanger’s intellectual and physical resources to develop their research aims. We hope that, in this way, we can maximise the engagement of Sanger’s unique expertise and infrastructure with the UK research community whilst widening our scientific activities and creating opportunities for the Institute to develop in new directions.

Educating and training the next generation of genome scientists remains central to Sanger’s mission. Genomic science has already established itself as a major new dimension in basic medical research and will be increasingly important in implementation of healthcare over the next decade. Scientists who are trained in genomic science at Sanger will be in a strong position to assume key roles in the future. We will therefore run a variety of schemes and programmes catering for individuals at different career stages and spanning biological and computational disciplines. Our Graduate Programme has been highly successful and we wish to develop it further. We will aim to obtain additional external funding in order to provide further opportunities for PhD students, will maintain our position of attracting and supporting talented students from all over the world and particularly investigate ways of training more students from developing countries.

Training at the Sanger plays a role in developing all staff. [Wellcome Library, London]

We will develop further the remarkable expansion of PhD training opportunities for clinicians at Sanger in the last couple of years by continuing our partnership with the and Addenbrooke’s Hospital and looking for additional funding from elsewhere. Initially tentative about studying for PhDs at Sanger, the current crop of Sanger clinicians is spreading the word that the Institute is an exciting place at which to study, where they are thoughtfully mentored and supervised with care and consideration for their backgrounds.

One group that we believe will require more attention and mentoring than in the past is the Postdoctoral fellows. We will therefore put in place an overarching organisation and structure to provide oversight for postdoctoral training, based on the model of the Graduate Programme.

There are many other opportunities at Sanger for the development of aspiring scientists through stimulating work placement opportunities for sixth formers, undergraduates, and medical students. We currently offer a range of options to students who wish to gain hands- on experience of working in a scientific research environment while studying. We will review this process to evaluate if there are ways in which to strengthen the programme, for example through summer schools for sixth formers.

Sanger is an institute of about 900 people. While the Faculty, Graduate students and Postdoctoral fellows have been an important focus of our activities in the area of training, career development and mentorship, others at the Institute have been supported through our in-house training programmes for acquisition and refinement of management skills, and through day release for higher degrees and professional qualifications. We are reviewing the ways in which we can support all our staff in developing themselves to the full, including a redesign of the learning curriculum, enhanced management training and the introduction of a new appraisal system.

10. Developing our organisation

Excellence of Sanger as an organisation critically underpins the delivery of our scientific and training missions and creation of a safe and facilitatory working environment for our staff. Therefore we aim for the Institute to be an exemplar of organisational effectiveness. Our corporate governance structures should provide transparent mechanisms for management and accountability, and demonstrate ethical responsibility in all our actions. As employers, we need to have an embedded culture of health and safety, with employee wellbeing as the central focus.

Over the next quinquennium, we will continue to develop our Health and Safety policies, structures and operations which have already been considerably strengthened in recent years. With the recruitment of a new Health and Safety team the Institute has been reviewing and refreshing implementation of procedures, in concert with the Genome Campus and EMBL-EBI. We are now confident that this process is ensuring the highest standards of Health and Safety awareness and culture among the Institute’s staff, together with strong, active support from the Health and Safety team for implementation, oversight and audit.

With extensive programmes in Human Genetics and experimental studies of animals, we have in place individuals and committees that are responsible for apprising the Institute of external legislative changes. The committees also provide oversight to ensure that we are conforming to the highest standards of ethical best practice with respect to human tissue samples, data from human subjects and animal use.

Sanger requires a strong corporate governance and administrative structure that effectively informs and oversees its activities. We have recently reviewed the structure and believe, overall, that it serves the purpose for which it was intended. However, we are cognisant that societal and legislative issues change over time, as do our areas of scientific engagement, requiring introduction of new components to the structure. Conversely, we recognise that bureaucracy for its own sake will not improve the Institute’s operation. Therefore, we will review the corporate management structure on a regular basis to ensure that it remains lean, informed and effective.

The Institute’s Corporate Services (Human Resources, Health and Safety, Finance, Science Administration, the Library, and Media and Public Relations) were very positively reviewed as part of the process leading up to this Strategic Plan. This review cycle will be maintained through the next quinquennium, allowing us to continue striving for further improvement in the light of experience and advice from other institutes.

Alongside this process, to ensure that Core Facilities, Corporate Services and other areas of the Institute are transparently maintaining operation at the highest levels of effectiveness and customer satisfaction, we will consider instituting a limited number of key performance indicators that will provide referable measures of how effectively our organisation is fulfilling its aims.

11. Premises

The Sanger buildings and the Genome Campus are a much envied model of design and installation of scientific premises. They address the practicalities of scientific operation but are also underpinned by a well thought-out, well cared for environment in which people can contentedly spend a substantial proportion of their waking hours. Over the next five years, we will conduct our science within these premises and are not requesting extra space.

An aerial view of part of the Wellcome Trust Genome Campus.

We will, however, be making specific internal alterations, notably to the balance between office and laboratory space. The nature of the research to be carried out over the next five years, with its particularly heavy emphasis on informatics and data analysis by many scientific staff, means that there is considerable pressure on office space. There are also some alterations required to accommodate new forms of laboratory science that will be entering the Institute. We will use the £1m/year budget we usually put aside for building projects to make these changes.

The Sulston Laboratories and the associated West Pavilion are now fifteen years old. There are nooks and corners where they are starting to fall short of the quality environment and image that is such a notable hallmark of the Campus. The more recently constructed SouthField buildings provide a marked contrast in terms of architecture, accommodation and communication spaces.

We are therefore suggesting that a programme of improvements to the external and internal appearance of the older Sanger buildings be implemented to bring them up to the standard of the SouthField buildings and preserve the investment the Wellcome Trust has made at the Hinxton Campus. This will include a facelift to the appearance of the buildings, but will also incorporate state-of-the-art concepts of scientific accommodation with informal communication areas, sustainable work areas and environmentally friendly, energy-efficient workplaces.

12. Strategic relations

In delivering its science the Institute is in constant engagement with other scientists, collaborative groups, scientific organisations, funders, the biotech and pharmaceutical sector, and the general public. We will continue to nurture and further develop these relationships to maximise our scientific output, impact and influence and to ensure that Sanger is optimally serving the needs of UK science.

We believe that there may be substantial potential for further interactions and synergies with the “family” of Wellcome Trust Centres. While engagement and collaboration with the Wellcome Trust Centre for Human Genetics at Oxford is of long standing, we have fewer interactions with other Wellcome Trust Centres in the UK. We will arrange reciprocal visits to the Centres in the next year to explore scientific possibilities of mutual interest and develop them further where they exist. Through a similar approach we aim also to consolidate our many links with the Wellcome Trust Overseas Programmes in Asia and Africa, with a particular view to generating momentum in the new series of Human Genetics and Pathogen Genetics projects.

One notable focus for development over the next five years will be to ensure close interchange and coordination of our science with that of the nascent UK Centre for Medical Research and Innovation (UKCMRI), the edifice for which is soon to be constructed on the St Pancras site in London. We believe that our current scientific focus, infrastructure and scale of science will make us entirely complementary to UKCMRI and we foresee a mutually fruitful and integrated relationship.

Our interactions with the EMBL-EBI have increased substantially over the last two years and now include joint internal lecture series, joint Faculty, shared PhD students, shared Postdoctoral fellows, sharing of the Data Centre and joint involvement in many scientific projects. We envision this alliance, in particular, strengthening further in the next quinquennium as the requirement for a joint approach to presentation and use of Personal Genome information becomes a pressing need. In a similar vein, our proximity to Cambridge has spawned a naturally expanding set of links with the University of Cambridge in Graduate student supervision, Clinical Fellowships, shared Faculty and an ever burgeoning series of scientific collaborations. The relationships with EMBL-EBI and Cambridge are built from the bottom up, the most robust mode, but will also be maintained by regular meetings between the Director of Sanger and the Director of the EMBL-EBI, and with the Cambridge Regius Professor of Medicine.

Large, multicentre, international collaborative studies, such as the and the International Cancer Genome Consortium, are a traditional format of research at Sanger and we expect that our scientists will continue to be leaders in such enterprises. Furthermore, we will naturally maintain, through personal and institutional contacts, our long-standing relationships with other genome centres including the Broad Institute, Washington University, Baylor College of Medicine, the Joint Genome Institute and the Beijing Genomics Institute, continuing our practice of exchanging expertise in the generation and analysis of large-scale genomic data.

A relatively new area of collaboration for Sanger will be explored with the pharmaceutical and biotech sector with the aim of developing both our basic and translational science. We are currently in the process of publicising Sanger’s scientific agenda and approach to the commercial sector through an Industry Open Day. We are also in discussions to understand better the complementarity of strengths and needs between Sanger and the pharmaceutical industry as a basis for forming longer term strategic alliances.


13. Spreading the word

While the Institute disseminates its science to the research community through the standard channels of research publications and databases, there are other important audiences with whom we wish to converse. Our aim is to discuss with the public, in a clear and measured manner, developments in the science of genomes in order to inspire interest and engagement and also to influence policy and opinion through contributions to public debate.

Sanger has a strong track record of activity in informing the general public about genomes and of participating in pertinent issues of discussion and debate through the written, broadcast and electronic media. For a scientific institute we have a brand name that is distinctive and remarkably well-recognised by the lay public. Moreover, the nature of the science in which we are involved will become an increasingly popular focus of public attention as DNA sequencing technologies and personal genomes, with their attendant opportunities and challenges, enter healthcare and societal culture.

Our main voice in the press and media is our Faculty. They will be supported in further developing their media skills, although many are already consummately professional performers.

Much of our activity in this context has, however, been reactive to the “scientific publication of the week” or in response to an enquiry from a journalist needing to generate the next day’s copy. We now aim to construct a more strategic approach to relations with the general public and the media. As part of this, the Institute should come to be seen as a centre of scientific culture with respect to the genome, a geographic and intellectual stopping off point for those wishing to think about genomes.

Figure: Julian Parkhill explaining genomes on the BBC news website. [www.bbc.co.uk/news/]

To this end, for example, we will initiate a series of invited visits by key individuals including politicians, opinion formers in the media, captains of industry, religious figures, and those from the Arts world who are bold enough to cross the cultural divide. These visits will summarise and explain Sanger and genome science, show off our main infrastructure platforms and explore genome-related issues of particular interest through discussions with Faculty. In return, we will encourage our visitors to be forthcoming about their opinions on genome related questions and record these. At the end of the day we hope that visitors will leave the Institute better informed but also thoughtful, reflective and perhaps engaged in the future of genomics.

We will also continue to develop strong relationships with individual journalists. A few, with longstanding interest in our scientific area, now regularly contact us informally for advice and interpretation of new scientific developments. The rapid pace of change has meant that these individuals are keen to visit Sanger on a regular basis, as a convenient way to update themselves on the background to scientific developments as well as what Sanger itself is doing. Through these contacts, some of whom are developing into “colleagues”, we aim to engage in more extensive and well-developed features in written and broadcast media.


The Genome Campus buildings and grounds also offer opportunities for entertaining the public. In conjunction with the we would like to explore the use of the Sanger building and Campus for exhibits and events that relate to genomes, perhaps developing here also the interface of Arts and Science that is the theme of the exhibitions in Euston Road.

Our flourishing Public Engagement Programme has had extraordinary success in communicating genome science to a variety of audiences through exhibitions, electronic resources, visits, lectures and discussions. Undoubtedly, a key group of recipients have been schoolchildren and teachers who have visited Sanger in large numbers to use the resources and tools that we offer, and simply to breathe the air of a scientific environment. This will continue to be strongly supported.

Figure: A school child at the Institute searching for mutations in DNA sequence.

As important as visits are our web-based tools. These offer the potential of being hugely influential across the UK and beyond, in engaging large numbers of teenagers not only in the science and societal aspects of genomes, but in thinking about cancer, microbes, genetic and infectious diseases, gargantuan computers that occupy whole buildings and the diverse range of topics that Sanger science touches on. We are therefore instituting a review of the content of our web resources on all these topics by members of Sanger Faculty, colleagues in the EMBL-EBI and others in order to assess whether we have pitched it right and to build a really coherent web-based curriculum for genome and related sciences that can be used by schools and others.

While use of the internet can inform large numbers of people, direct contact between researchers and the public offers the opportunity of a two way discourse, potentially providing insights into public concerns about the consequences of what we do. We will develop further a series of events at the Institute bringing community groups to personally meet scientists, encouraging them to communicate their views about what we do and helping them understand what makes us tick.

14. Societal aspects of the Personal Genome

The Institute’s historical involvement in sequencing the reference human genome and our future scientific Programmes of research into the nature and role of genetic variation inevitably lead us to confront many of the issues relating to Personal Genomes.

For example, in the course of designing our research we have had to think through and find pragmatic solutions to many ethical conundrums emanating from complete human genome sequencing.

Figure: Learned disputation.

Such challenges include how to handle the discovery of DNA variants that are known to be disease-predisposing in an individual who has donated their sample for research. To inform or not to inform? What level of patient consent do we need for studies on archival samples and how do we share data with scientific colleagues that may theoretically result in identification of the person being studied?

More generally, what are our attitudes to provision of human genome information to individuals; what regulation or monitoring of commercial providers should there be; what is the optimal way to provide reliable interpretation of individual genome sequences; and what should be the response of the NHS to a patient presenting their privately acquired genome to their GP for opinion and action?

Sanger is regularly consulted by external opinion-forming bodies and by government, providing responses on many such issues, with a member of staff dedicated to researching and coordinating our opinions on policy questions. We believe that the Institute should be at the centre of societal debate and discussion on Personal Genomes, contributing thoughtful and well-founded opinion. Personal Genomes should be a key topic for active discussion within Sanger and these discussions should inform our involvement as an Institute in advising and contributing to societal debate. In principle, such discussions should engage all members of the Institute, but we believe they should be particularly orchestrated to involve our PhD students and Postdoctoral fellows in addition to Faculty, senior managers and other interested staff.

To this end, we will convene a Working Group with a remit to coordinate activities on societal issues pertaining to Personal Genomes within Sanger. The Working Group will, for example, organise a series of debates and visiting speakers to reflect on these questions. These events should be entertaining, controversial, thought-provoking and room-filling. Our Public Engagement team will start videoing the responses of Sanger staff and visitors to the question of whether they would have their own genomes sequenced, ultimately assembling the responses into a video collage of opinion and perspective which could be used by schools, the media or others wishing to inform themselves on the dimensions of the issue.

15. Resources

We are requesting £395m over the next five years to deliver the Institute’s scientific and training Programmes. We aim to manage budgets, personnel and space to maximise the scientific output from the allocated resources.

The people and skills at Sanger will develop and change in response to our scientific agendas. Over the last decade there has been an increase in the number of people we employ, although this has slowed recently. We will carefully monitor trends in the composition of our workforce, but anticipate that the number of people at the Institute, currently ~900, of whom ~800 are core funded, will remain at this level or perhaps reduce slightly over the next few years.

Space is at a premium at Sanger. Nevertheless, we intend to carry out the proposed scientific Programme within our current space and buildings. During the course of the quinquennium we will certainly need to manage carefully our space allocation in response to scientific need. As already outlined, we may have to adjust further the balance between laboratory and office space, which is determined by fluxes in the requirement for “wet” laboratory work and the ever-increasing need for desk-bound computational analysts. As part of this process, we will assess the opportunities for home working in order to optimise space usage at Sanger.


As in the past, our budgets will be managed on a yearly basis, with allocation taking into consideration changing scientific priorities. In order to accommodate new scientific opportunities or respond to unexpected need there will be a flexible fund amounting to 3% of the budget, with decisions on its deployment taken at the end of each financial year.

Supplementary to core funding from the Wellcome Trust, the Institute currently receives additional funding, either from the Trust as “out-of-envelope” grants or from third parties. Currently, we have a number of out-of-envelope awards including the UK10K Project, the Deciphering Developmental Disorders Project and two projects in for Discovery of drug-sensitising genotypes in human cancer cells and Quantifying disease burden in patients with cancer. It is anticipated that the opportunity to be part of such projects will continue in the next five years.

Since most of the Institute’s activities are supported by the core budget, we do not expect our Faculty to spend a substantial segment of their time raising funds through third party grants. Rather, we advise pursuit of funding on a strategic and opportunistic basis, with success in a few large awards rather than many smaller ones. Third party funding has certainly increased over the last few years and we aim for it to reach 20% of the Institute’s budget overall, although inclement financial conditions may moderate our success in this aspiration. Third party funding particularly allows us to initiate projects that were previously unanticipated and develop them to a point at which they can be incorporated into the core.

The chart below shows the percentage split of funding (past and predicted) between sources from 2007 – 2016.

Constructing the Budget

The specific studies within our scientific portfolio were developed by the Institute’s Faculty over several months of iterative discussion. During this period, the Board of Management engaged in strategic consideration of the overall shape of Sanger science. The latter deliberations related to focus and impact and thus the appropriate balance between the number and size of projects. They considered our scientific leadership in each proposed area and the feasibility of the studies. They also dwelt on the emphasis that should be given to each of the four themes of Sanger science (Human Genetics, Pathogen Genetics, Mouse and Zebrafish Genetics, and Informatics) and the balance between sequencing, mouse and zebrafish work and IT. Guided by the outcomes of these discussions, the Board of Management dispensed multiple rounds of moderation and advice to the Faculty concerning the evolving set of proposals that were received.


Through this process, the previously outlined Programme and project structure emerged, each project with several aims. As the detailed plans matured, outline budgets were calculated by aim so that value for money considerations for each aim and project, together with the overall cost of the scientific portfolio, could be assessed in the light of the likely budget available. The budgets included all the direct costs of each project: staff, consumables and equipment. There was also an allocation to each project of directly attributable core facilities costs, which included use of the sequencing facility, and the mouse and zebrafish facility. This resulted in adjustments to the overall portfolio based on scientific, budgetary and strategic considerations.

At submission in December 2009, the total budget request to the Wellcome Trust from the Sanger Institute for the years 2011 to 2016 was £387m. This was split as illustrated below. Considering all classes of expenditure, staff salaries represent the largest single component of the Institute’s budget request. Consumables account for 23%, of which 15% is for DNA sequencing. The allocation for consumables may be relatively high compared to other scientific enterprises, reflecting the large scale nature of our experiments.

Percentage split between the different types of expenditure (at £387m):
Staff salaries 42%
Sequencing consumables 15%
Capital 14%
Premises 12%
Consumables 8%
Flex fund 3%
Equipment maintenance 3%
Other staff costs 2%
Travel 1%

The largest scientific Programme budget is for Human Genetics, followed by Mouse and Zebrafish Genetics. Human Genetics has more Faculty than any other Programme and the large size of the human genome dictates large sequencing costs. By contrast, the relatively modest sizes of the Pathogen and Malaria Programme budgets reflect the smaller numbers of Faculty involved and the smaller sizes of the genomes they study.

Percentage split between the different areas (at £387m):
Corporate Services 22%
Human Genetics 19%
Mouse and Zebrafish Genetics 13%
Other IT support 11%
Cancer Genetics and Genomics 9%
Pathogen Variation 8%
Faculty and education 7%
Malaria 4%
Genome Informatics 4%
Flexible funds 3%
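As a rough illustration of the arithmetic behind the expenditure split, the sketch below converts the percentages into approximate amounts at the £387m submission figure. The per-category amounts are derived here and are not stated in the Plan; the function and variable names are hypothetical, and the category-to-percentage pairing is our reading of the expenditure chart.

```python
# Illustrative sketch only: the £m amounts per category are derived values,
# not figures quoted in the Plan.
TOTAL_REQUEST_M = 387  # £m, total budget request at submission (from the text)

# Percentage split by type of expenditure, as read from the chart (assumed pairing).
expenditure_pct = {
    "Staff salaries": 42,
    "Sequencing consumables": 15,
    "Capital": 14,
    "Premises": 12,
    "Consumables": 8,
    "Flex fund": 3,
    "Equipment maintenance": 3,
    "Other staff costs": 2,
    "Travel": 1,
}


def approximate_amounts(pct_by_category, total_m):
    """Return the approximate £m per category, rounded to one decimal place."""
    return {
        name: round(total_m * pct / 100, 1)
        for name, pct in pct_by_category.items()
    }


amounts = approximate_amounts(expenditure_pct, TOTAL_REQUEST_M)

# The text quotes consumables as 23% overall, of which 15% is DNA sequencing.
all_consumables_pct = (
    expenditure_pct["Consumables"] + expenditure_pct["Sequencing consumables"]
)

if __name__ == "__main__":
    for name, value in amounts.items():
        print(f"{name}: ~£{value}m")
    print(f"All consumables: {all_consumables_pct}%")
```

The check that the categories sum to 100% and that the two consumables lines add up to the quoted 23% is a simple consistency test on the chart reading, nothing more.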


Comparing the major core platforms which will generate our data, the largest component of budget has been allocated to DNA sequencing (and other DNA pipeline activities such as genotyping), with IT second and the use of the Mouse and Zebrafish facility third (shown in the pie chart below). Indeed, if consumables, capital, core facility staff, and full IT costs of sequencing are combined, the generation and analysis of DNA sequence accounts for £110m (~30%) of our requested budget allocation.

Cost of the four main core platforms (£m):
Sequencing facility 94
IT systems 44
Mouse and Zebrafish facility 16
Genotyping facility 4
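The ranking of the core platforms described above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the share of the four-platform total is a derived quantity that the Plan does not quote, and the function and variable names are hypothetical.

```python
# Illustrative sketch only: costs are the £m values shown for the four main
# core platforms; the percentage shares are derived here, not quoted in the Plan.
platform_cost_m = {
    "Sequencing facility": 94,
    "IT systems": 44,
    "Mouse and Zebrafish facility": 16,
    "Genotyping facility": 4,
}


def rank_platforms(costs):
    """Return (name, cost, share of the platform total in %) tuples, largest first."""
    total = sum(costs.values())
    ranked = sorted(costs.items(), key=lambda item: item[1], reverse=True)
    return [(name, cost, round(100 * cost / total, 1)) for name, cost in ranked]


ranking = rank_platforms(platform_cost_m)
platform_total_m = sum(platform_cost_m.values())

if __name__ == "__main__":
    for name, cost, share in ranking:
        print(f"{name}: £{cost}m ({share}% of the four platforms)")
    print(f"Total for the four platforms: £{platform_total_m}m")
```

Note that this total covers only the four platform lines shown; the £110m figure quoted for generation and analysis of DNA sequence combines consumables, capital, core facility staff and full IT costs of sequencing, so the two numbers are not directly comparable.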

Subsequent to our original submission two further budget lines have been added. The first is for the Institute to fund its own Technology Office in order to provide a strong foundation for our increasing Translational activities (as described in Supporting documents section 2). The second is to provide a budget for Sulston building upgrades in order to bring it up to the standards seen elsewhere on the Campus (as described in Supporting documents section 7.4). The current total envelope request is therefore £395m.

A further development since the submission is the potential installation of a new core facility for large scale, high throughput cell culture particularly using the technology of human iPS cells, as referred to at the end of section 3. These plans are still at an early stage of development and it is not yet clear what the size, scale and therefore required resource will be to fund these experiments.

Wellcome Trust Sanger Institute Strategic Plan 2011-2016

Supporting documents

Contents

1. Scientific Programmes
1.1 Cancer Genetics and Genomics
1.2 Human Genetics
1.3 Pathogen Variation
1.4 Malaria
1.5 Mouse and Zebrafish Genetics
1.6 Genome Informatics
2. Translating our science
3. Developing our people
4. Developing our organisation
5. Strategic relations
6. Spreading the word
7. Resources


1. Scientific Programmes

1.1 Cancer Genetics and Genomics

Heads of Programme and P Andrew Futreal

1.1.1 Structure of the Programme

Michael Stratton, P Andrew Futreal, David Adams, Allan Bradley, Peter Campbell, Pentao Liu, Ultan McDermott, George Vassiliou

1.1.2 Vision for the next five years

The Cancer Genetics and Genomics Programme aims to systematically and comprehensively catalogue somatic mutations of all classes in human and mouse cancer genomes, integrating and mutually empowering the complementary approaches of large scale sequencing and experimental insertional mutagenesis. The integration and scale of human and mouse cancer genome analysis by both sequencing and insertional mutagenesis is a distinctive, if not unique, feature of this Programme in cancer research worldwide.

The Programme will present these data publicly in a comprehensible and usable form to cancer researchers and clinicians. The Programme will further directly explore translation of information on cancer genomes by examining how somatic mutations influence drug responsiveness and can be used to monitor disease burden.

Through success in these strategic aims, the Programme will generate profound insights into the evolution of the cancers studied, with essentially complete identification of the cancer genes implicated, understanding of the biological pathways subverted and scrutiny of the mutational processes that have been operative. These advances will provide the foundation for further studies by cancer researchers into the biology and epidemiology of these cancer types, and provide new strategies for cancer diagnosis, therapy and prevention.

1.1.3 Detailed aims and deliverables

The Cancer Genetics and Genomics Programme proposes to investigate somatically acquired structural changes in human and mouse cancer genomes in order to reveal the mutated genes that confer abnormal proliferative capacity on cancer cells, elucidate mutational processes implicated in cancer development, organise and present this information publicly, use it to understand responsiveness to anticancer therapies and improve patient management.

Project 1: Profiles of somatic mutations in cancer genomes

This project will use systematic sequencing of human and mouse cancer genomes to discover cancer genes, reveal mutational processes, understand the basis of responsiveness to drugs, develop approaches for monitoring residual disease in cancer patients and present somatic mutations in the COSMIC database.

Aim 1: Identification of human cancer genes and processes of somatic mutation
• Generation of comprehensive catalogues of somatic mutation in hundreds of cases of three classes of human cancer
• Transcriptomic profiles of the same series of cancer samples by RNA sequencing
• Epigenomic profiles of the same series of cancer samples
• Identification of new somatically mutated cancer genes present at ~3% or greater prevalence in the three cancer classes
• Estimation of the number of driver mutations and the biological pathways subverted during the development of individual cancers of these three classes
• Investigation of patterns of somatic mutations to elucidate the mutational exposures and the DNA repair processes operative during the development of individual cancers
• Investigation of the primary phenotypic consequences of driver mutations by correlation with transcript profiles

Aim 2: Identification of cancer genes and processes of somatic mutation in mice
• Generation of comprehensive catalogues of somatic mutation in approximately one hundred cases of engineered murine cancer
• Comparison of the mutations to those found in the human cancers that are being modelled
• Identification of the somatically mutated cancer genes present and their relationship to the engineered initiating events and/or tissue of origin
• Description of the patterns of somatic mutations present and their relationship to the engineered initiating events and/or tissue of origin

Aim 3: Development and maintenance of the COSMIC database
• Maintenance, updating and improvement of the COSMIC database to provide a comprehensive database of somatic mutations for cancer researchers
• Inclusion and integration of data from mouse insertional mutagenesis and mouse cancer genome sequencing into COSMIC
• Presentation of individual cancer genomes with annotation for driver mutations, patterns of somatic mutations and other genomic attributes
• Further development of COSMIC viewing and browsing tools

Aim 4: Genomic analysis of ~1,000 publicly available immortal cancer cell lines
• Sequencing of all coding and miRNA genes in the 1,000 cancer cell lines that constitute the translation project set
• Transcriptomic profiling of the 1,000 cancer cell lines using RNA sequencing
• Genome-wide epigenetic analyses of the 1,000 cancer cell lines

Aim 5: Detection of minimal residual disease in solid tumours
• Assessment of disease burden in patients with cancer by quantification of tumour-specific genomic rearrangements in circulating tumour DNA at multiple time points before, during and after therapy in 150 patients (50 patients each from three types of cancer: breast, colorectal and bone cancer)

Project 2: Somatic insertional mutagenesis in mice for cancer gene discovery

This project will use transposon-based insertional mutagenesis screens to identify new cancer genes in mice and transposon-based approaches to validate cancer genes discovered in human studies.

Aim 1: Establishment of transposon-mediated gain-of-function and loss-of-function models of breast cancer
• Perform large-scale insertional mutagenesis in a gain-of-function and loss-of-function context to identify new cancer genes and genetic interaction in the mammary gland
• Generate a comprehensive collection of transposon insertion sites across a vast panel of genetic backgrounds and tumour types, and a repository of cell lines and tissues derived from these mice

Aim 2: Establishment of transposon-mediated gain-of-function and loss-of-function models of gastrointestinal cancer
• Perform large-scale insertional mutagenesis in a gain-of-function and loss-of-function context to identify new cancer genes and genetic interaction in gastrointestinal cancer
• Generate a comprehensive collection of transposon insertion sites across a vast panel of genetic backgrounds and tumour types, and a repository of cell lines and tissues derived from these mice

Aim 3: Establishment of transposon-mediated gain-of-function and loss-of-function models of hematopoietic cancer
• Perform large-scale insertional mutagenesis in a gain-of-function and loss-of-function context to identify new cancer genes and genetic interaction in hematopoietic cancer
• Generate a comprehensive collection of transposon insertion sites across a vast panel of genetic backgrounds and tumour types, and a repository of cell lines and tissues derived from these mice

Aim 4: Large-scale validation of candidate cancer genes in the mouse
• Validate frequently mutated genes identified from Aims 1-3 and from the Cancer Genome Project (Project 1) as cancer driver genes
• Validation of up to 200 genes per year and up to 20 candidate tumour suppressor genes

1.2 Human Genetics

Heads of Programme Inês Barroso and Richard Durbin

1.2.1 Structure of the Programme

Inês Barroso, Richard Durbin, Carl Anderson, Jeffrey Barrett, Allan Bradley, Nigel Carter, Panos Deloukas, Helen Firth (Honorary), Paul Flicek (Honorary), Matthew Hurles, Dominic Kwiatkowski, Willem Ouwehand (Honorary), Aarno Palotie, Manjinder Sandhu, Charles Shaw-Smith, Nicole Soranzo, Derek Stemple, Chris Tyler-Smith, Eleftheria Zeggini

1.2.2 Vision for the next five years

The goal of the Human Genetics Programme is to exploit new methods of genome analysis to study human genome variation across the spectrum of variation classes and allele frequencies, in order to further understanding of the genetic basis of disease as well as to understand human evolution, demography and the impact of selection on our genome. This Programme includes a new strategic initiative which will strengthen synergistic links with the Malaria and Pathogen Variation Programmes, through expansion of genetic studies in African populations (Genetics of common diseases project, Aim 2).

In this Programme we will investigate the genetic landscape of all forms of phenotypic variation, from normal variation (not related to any adverse effects – Human genetic variation project) to rare monogenic disorders with significant individual morbidity and mortality (Genetics of rare diseases project) and common complex diseases (Genetics of common diseases project). In the context of this proposal we will consider ‘more common, complex diseases’ to be those that may be genetically more heterogeneous (both in terms of alleles and loci), and where the clinical presentation of the disease is phenotypically indistinguishable, precluding a finer classification of the patients into discrete units of investigation, and ‘rarer simpler diseases’ to be those likely to be genetically more homogeneous. A question which our Programme intends to answer is the extent to which ‘common diseases’ may be considered a combination of many, each individually rare, disorders, distinguishable only, at least initially, at the genetic level.

In designing this Programme we have selected our samples to represent a broad range of study designs, including familial genetically “loaded” cases, founder populations, as well as outbred highly diverse populations. We have paid particular attention to select carefully phenotyped samples with rich phenotypic data (including both molecular and lifestyle/environmental traits) that will allow us to tackle in a comprehensive way the genetic complexity of human health and disease.

We have selected a group of rare disorders with the potential to provide important lessons in human biology, some of which will contribute to our understanding of the molecular pathways behind the selected common diseases, others which represent a significant cause of morbidity and mortality in their own right or are expected to provide novel insights into human biology. By identifying genes and mutations behind these phenotypes, we will discover critical molecular mechanisms underlying human disease and health and provide the basis for improved diagnosis, care and prevention. The selection of common diseases and traits to study has been based on the fit with the Institute’s mission combined with the expertise and competence of the Institute’s Faculty.

To identify and characterise the novel genes and biological pathways critical for human health and disease, we have assembled: 1) technologies in high throughput genome analyses, including new and constantly developing sequencing technologies; 2) strategically well-selected and deeply phenotyped study samples and population cohorts; 3) expertise in statistical and bioinformatic analyses, facilitating development and application of novel statistical methods; and 4) pipelines for the production of animal models, including various knockout options in zebrafish and genetically modified mice.

Many of our proposed studies build on the UK10K Strategic Award that has recently been funded by the Wellcome Trust and will run from 2010-2013, overlapping the start of the five year period covered by this proposal (see figure below). In the UK10K project we will conduct genome-wide sequencing of 4,000 UK samples from the deeply phenotyped TwinsUK (www.twinsuk.ac.uk/) and ALSPAC (www.alspac.bris.ac.uk/) cohorts, to provide full sequence genotype data both for direct association analysis of quantitative traits (QTs) measured in the samples, and also a reference resource for imputation into other genotyped samples of UK origin, and as a control set for sequence-based disease association studies. In addition, we will sequence the exomes of 6,000 genetically loaded samples selected for extreme obesity, neurodevelopmental disorders, and rare likely monogenic familial extreme traits, including some of those to be studied in the Genetics of rare diseases project, to identify rare variants contributing to these diseases. The data generated will be an order of magnitude deeper than that relevant to UK samples from the 1000 Genomes Project, and coming from samples with extensive phenotype data including expression and other molecular phenotypes, will provide an underpinning resource for further research.

Supporting some of the activities within the Genetics of rare diseases project is the recently funded Deciphering Developmental Disorders (DDD - Health Innovation Challenge Fund 2010-2015). This project will work with 23 UK Regional Genetics Services to ascertain ~12,000 patients (with a range of developmental disorders) and their parents, establish a ‘Developmental Disorders DNA Biobank’ and provide array-based CNV detection, genome-wide SNP genotyping and the dissemination of the results through DECIPHER. Here we propose to use the DNA Biobank established by this initiative to exome resequence up to 10,000 patients with developmental diseases who remain undiagnosed after array-based analyses.

[Figure: Connections between the Human Genetics Programme and the Strategic Award UK10K]

Samples sequenced within the UK10K proposal will serve as controls for disease studies in the Genetics of rare diseases and the Genetics of common diseases projects, as well as being a source of variants for further analysis in the Human genetic variation project and of potential associations for follow-up in the Genetics of common diseases project.

Through the success of these projects, the Programme will provide a comprehensive insight into the landscape of genetic variation in healthy individuals and in several rare and common diseases or traits in European and African populations. These advances will provide the basis for clinical genetic diagnosis of patients with many inherited diseases and a framework for understanding their biology as a foundation for clinical management and new therapies.

1.2.3 Detailed aims and deliverables

The Human Genetics Programme exploits next generation genome analysis to investigate genetic variation within the context of human evolution and its influence in health and disease. As technologies continue to evolve rapidly, we will carefully monitor their development and employ the technology most suited at the time of the experiment, taking into consideration the cost-benefit of each approach and the need to make scientific progress across all projects and aims in each year of the proposal.

Project 1: Human genetic variation

Variation created by the processes of mutation, selection and drift underlies all research in human genetics. We propose to use high throughput new-technology sequencing to describe the sequence and structural variation in the genomes of at least 17,000 individuals from multiple populations, and couple this to investigations into mutation and selection in humans, and computational methods development to manage and display these data.

Aim 1: Deep characterisation of human genetic variation in targeted populations
• In the first two years we will sequence over 3,500 genomes from Finland and isolate founder populations to leverage their special genetic structure for efficient characterisation of almost all segregating variation and phenotype analysis. Following this we will use over 200 Tbp of sequencing to further investigate UK cohort samples beyond the UK10K project. We will also sequence multiple populations from Africa and other non-European countries to characterise genetic variation to support association studies in the Genetics of common diseases project and the Malaria Programme, and for evolutionary studies in Aim 2. Finally, we will analyse population structure and history using the data collected in this aim and elsewhere.

Aim 2: Study of positive selection in humans
• We will use data from Aim 1 and targeted follow-up sequencing to identify loci under positive selection in humans in the last 100,000 years, and investigate several in further detail using mouse models.

Aim 3: Characterisation of human germline mutational processes
• We will use whole-genome sequencing of samples from both human and mouse pedigrees to observe directly the mutational processes in the germline, looking both at normal rates and patterns and how these vary with age, environment and genetic background.

Aim 4: Development of computational tools for managing and visualising human sequence and genotyping data
• We will develop software for local interactive analysis of full sequence-based genetic data, to support the research in this Programme and also more widely in the research community.

Project 2: Genetics of rare diseases

We propose to discover causal genetic variants underlying rare diseases using microarray-based CNV discovery, exome and whole-genome resequencing. To identify the basic biological function of some of the genes and variants we discover, we will disrupt the expression of orthologous genes in zebrafish and mice.

Aim 1: Identify causal variants by resequencing patients with rare diseases
• Whole-exome sequences of at least 2,000 patients with extreme forms of common diseases studied in the Genetics of common diseases project
• High-depth (>20x) whole-genome sequences of up to 700 patients with a strong family history of extreme forms of common diseases studied in the Genetics of common diseases project but no molecular cause revealed by exome resequencing
• Whole-exome sequences of at least 1,500 patients with congenital heart disease and 2,000 patients with migraine and/or epilepsy
• High-depth (>20x) whole-genome sequences of up to 700 patients with congenital heart disease and up to 750 patients with migraine and/or epilepsy, for those patients with a strong family history but no molecular cause revealed by exome resequencing
• Whole-exome sequences of up to 10,000 patients with undiagnosed developmental diseases
• An expanded DECIPHER database and browser incorporating all forms of genetic variation (CNV, indels and base substitutions) together with clinical phenotypes for patients with developmental disorders

Aim 2: Establish zebrafish and mouse models of human disease genes

Zebrafish
• Morpholino (MO) knockdown studies in 50-100 genes/year
• Specialised phenotype analysis appropriate for the individual disease of 5-10 zebrafish mutants/year (from the Zebrafish Mutation Resource)
• Transcript counting expression analysis on groups of MO injected zebrafish (up to 30-60 Illumina sequencing lanes/year)

Mice
• Establish as mice 20 alleles/year from the KOMP/EUCOMM mutant ES cell resource
• Where appropriate conditionally knock out the gene in relevant tissues using appropriate Cre driver lines
• Establish 2-4 custom alleles/year in mice (deletions/duplications, point mutations and BAC insertions)
• Conduct primary phenotyping for all mice (by the Mouse genetics project)
• Specialised phenotyping on 2-4 mouse models/year

Project 3: Genetics of common diseases

We propose to use next generation high-density GWAS arrays (5-10M variants), whole-exome and whole-genome sequencing approaches to identify additional low-to-rare frequency alleles influencing disease risk and component traits in three disease areas where the Institute’s investigators have established leadership: cardiometabolic quantitative traits (QTs), obesity and Crohn’s disease. We also propose to expand these studies to African diseases and traits.

Aim 1.1: Cardiometabolic QTs
• Replication of ~100K SNPs in large sample collections (up to 20K samples internally and up to 50-100K through large collaborative efforts)
• Locus boundaries and targeted resequencing in population-based cohorts (up to 30Mb at 30x depth in up to 5K samples)
• Targeted resequencing of sub-regions (up to 5Mb at 30x depth in up to 5K cases) in disease cases (e.g. myocardial infarction (MI)/coronary artery disease (CAD), type 2 diabetes (T2D), dyslipidaemias)
• Comprehensive list of validated variants influencing cardiometabolic QTs and influencing risk of related diseases (e.g. MI and T2D)

Aim 1.2: Obesity
• Locus boundaries and targeted resequencing in additional extreme and population samples (up to 5Mb, at 30x depth, 25-multiplex, in up to 10-15K samples)
• Replication genotyping (up to 5-10K SNPs) in large sample collections (15-30K)
• High-depth (>20x) whole-genome resequencing of ~500 obese patients
• Additional targeted resequencing and genotyping for replication testing in large sample collections (up to 50Mb, at 30x depth, 25-multiplex, in up to 10-15K)
• Comprehensive list of variants robustly associated with BMI and risk of obesity

Aim 1.3: Crohn’s disease (CD)
• Genotyping and association results of 5-10M variants on 5K CD samples
• Locus boundaries and targeted resequencing in up to 5K samples (up to 30Mb at 30x depth, 25-multiplex)
• Whole-genome resequencing of 2K CD patients (at 6x)
• Comprehensive list of variants robustly associated with risk of CD

Aim 2: To examine the genetic determinants of cardiometabolic and infectious traits and diseases in African populations
• Wellcome Trust Advanced Course in genomic epidemiology
• Research training exchange Programme
• Network of research collaborators and institutions across East Africa
• Genome-wide data (5-10M SNP) in up to 2,000 study participants from the Tanzanian Sickle Cell Disease (SCD) cohort
• Genome-wide data (5-10M SNP) in up to 5,000 study participants from East African infectious and cardiometabolic cohorts
• Replication and targeted sequencing of associated regions
• Whole-exome/genome sequencing of more than 1,000 participants

Aim 3: To evaluate and develop methodological approaches for the analysis of large-scale data
• Develop and apply high-powered methods for rare variant association study design, analysis and interpretation
• Extend and apply methods for the simultaneous analysis of multiple phenotypes
• Develop and apply computationally efficient methods for accurate imputation of rare variants using large sequencing reference panels
• Develop and apply well-powered methods for the meta-analysis of genome-wide scans across populations within Africa, and across continents
• Generate integrated pipelines and freely available analytical tools for all of the developed methodologies

1.3 Pathogen Variation

Heads of Programme: Julian Parkhill and Gordon Dougan

1.3.1 Structure of the Programme

Julian Parkhill, Gordon Dougan, Matt Berriman, Paul Kellam, Dominic Kwiatkowski, Chris Newbold (Honorary), Sharon Peacock (Honorary)

1.3.2 Vision for the next five years

The rapid development of new sequencing technologies over the past several years has opened up tremendous opportunities for research on pathogens and their hosts. The Institute has been at the forefront of these changes and has pioneered many of them. Understanding the structure and coding capacity of genomes defines the limits of genotype and provides a mappable landscape that can drive genome-wide experiments. In the next five years we can expect genetic manipulation of both pathogens and their hosts to be central to many experimental designs. Further, experiments that were once restricted to particular strains or laboratory-adapted organisms can now be performed using microbial populations. These can be populations of the same species or even populations of different species inhabiting a particular ecological niche on the host. Mapping whole genome variation in microbes will define the ultimate typing schemes and facilitate both genetic association studies and transmission mapping in time and space. Such studies have to be performed at scale and access is required to large research engines such as high throughput sequencing, microbial culture collections, cellular phenotyping and animal challenge facilities and clinical sites, all linked through strong informatics. All of these are available on site at the Institute.

As we move from sequencing to functional genomics, access to biological models is essential. We will achieve this through focused in-house projects linked to external specialist collaborators working in universities, research institutes and sites in disease endemic regions. These networks are now established and work well. One key to success is to allow external scientists access to the core research engines on site at the Institute, including immediate access to data for analysis. Such a relationship is mutually beneficial and greatly expands the impact of Sanger Programmes. The relationships also benefit enormously through the open access policy operated by the Institute.

The Programme, as outlined, is ambitious but we believe it is deliverable. In the first phase we expect to deliver detailed information on the population structure of a number of different microbial populations. Our ability to curate and present these data to the community will benefit from an increasingly visible role for the EMBL-EBI located alongside us at the Hinxton Campus. We will work with EMBL-EBI to enable curation and presentation of genome data, through an expanded EnsemblGenomes covering microbes. This information will be exploitable by a wide range of scientists working at different places around the globe. Our internal functional work, linked to matched external projects, will provide broad coverage in key areas.

We are becoming increasingly involved in translational activities extending beyond the presentation and analysis of sequence information. Thus, we have an opportunity to expand into this area, in line with the direction the Wellcome Trust is taking. We are already contributing to the analysis of microbes in geobiological space, defining pathogen haplotypes circulating in vaccine trial sites and mapping the emergence and transmission of antibiotic resistance. We will expand these interests into the healthcare systems within the UK through collaborations with the HPA and joint funding initiatives. Internationally, we will continue to work with our overseas collaborators. These projects will also provide an opportunity for training and capacity building in disease-endemic areas in the poorer regions of the world. Our helminth projects are a clear example of this commitment.

Interactions with the murine and human Programmes at the Institute will strengthen our interest in the host. The Mouse and Zebrafish Genetics Programme is unique in scale, and infection susceptibility screening is central to it. At the time of writing we have screened close to 200 novel knockout lines and have already identified several novel host susceptibility loci. Linking these discoveries to humans will be simplified by our relationships with the human geneticists, and such links are written into their Programme documents. By the end of the next funding phase we hope to be linking pathogen and human genotypes, identifying co-evolutionary trends operating at a local or global level or driven by human therapies.

1.3.3 Detailed aims and deliverables

We will exploit the considerable core resources and infrastructure available within the Institute alongside our open access philosophy and collaborative network. These factors bring considerable power to the Programme and provide tremendous support to the scientific community. Many of the objectives and deliverables are shared between the individual projects and should be considered as cross-project. We believe that the Institute is unique in the international context in terms of our ability to apply genome-wide analysis from pathogen to host as an integrated process. We also have the ability to work simultaneously on multiple pathogens and disease processes, bringing the power of comparative analysis to the forefront of our approaches.

Project 1: Population genomics, evolution and pathogenicity in bacteria

We will exploit high throughput approaches to link genotype to phenotype and to direct laboratory and field based investigations on infection, transmission, emerging threats and therapeutic challenges.

Aim 1: Comparative high throughput genomic analysis
• Identify natural variation and investigate the population structure and phylogenetics of specific pathogens
• Exploit natural variation (gathered globally and locally) to understand the basis of pathogenicity, host-association and transmission and the evolution of these traits (bacterial association studies)
• Integrate bacterial association studies with association studies of the host, to investigate possible synergistic effects between host and pathogen variation
• Investigate the response of natural populations to vaccine or therapeutic intervention

Aim 2: Real-time evolutionary analyses during natural infection
• Investigate evolution of pathogens in infections and transmission chains to understand the impact of genetic changes
• Use natural infections of farm animals and model animal challenge systems to investigate evolutionary pathways and transmission mechanisms

Aim 3: High throughput phenotypic analysis
• Develop sequence-based transcriptomics to refine genomic analyses, elucidate novel RNA genome components and investigate the transcriptional response in specific experimental systems
• Utilise highly parallel bacterial phenotype microarrays (Biolog) to investigate phenotypic variation in sequenced populations and correlate this with genomic variation
• Develop saturating random mutagenesis protocols for bacterial genomes, using directed high throughput sequencing. Utilise very deep transposon libraries to investigate gene function in in vitro and in vivo challenges
• Support and mutagenesis studies by applying high throughput proteomics tools to the investigation of specific aspects of bacterial host interactions

Aim 4: The host-associated microbiota in health and disease
• Elucidate the deep population structure of the microbiota in human carriage and disease studies and specific murine models using population surveys based on ultra-deep 16S sequencing
• In association with the International Human Microbiome Consortium, generate reference genome sequences for specific components of the human microbiota
• Investigate the potential of future long-read sequencing technologies for deep metagenomic sequencing and reconstruction of the host-associated microbiota

Aim 5: Software tools and databases
• Develop database structures and analysis pipelines to support and extend the specific sequence-based and biological analyses in this Programme
• Integrate external tools to support the analysis where appropriate
• Design and maintain curation tools to enable in-house and community curation of reference genomes and interact with higher-level federated databases, such as EnsemblGenomes

Project 2: Parasite comparative and functional genomics

We will use comparative and functional genomics within and between species to study parasites of global health importance.

Aim 1: De novo genome sequencing for key parasite species
• Generate and maintain reference genomes and high-quality draft genomes for key helminth and protozoan parasites
• Develop software and pipelines to combine sequencing technologies for automated pre-finishing and contiguation of assembled sequence and to refine reference sequences

Aim 2: and variation discovery
• Identify genomic determinants of key biological differences between species
• Investigate the genetic basis for virulence and disease tropism in key parasites
• Investigate population structure in parasite populations and explore host-pathogen co-evolution

Aim 3: Use high throughput RNA sequencing to investigate parasite and regulation
• Identify genes, improve genome assembly, and provide a detailed catalogue of alternative splicing
• Identify genes and ncRNAs undergoing stage- and tissue-specific (in multicellular parasites) expression
• Examine molecular phenotypes of specific isolates and gene knockouts from the Malaria Programme

Aim 4: Develop, consolidate and curate annotation resources for helminths and other parasites across in-house and external databases
• Develop community specific tools and target curation efforts towards translational research, including drug target and vaccine discovery

Project 3: Virus genome diversity in infection and pathogenesis

We will sequence the complete genomes of defined populations and cohorts of selected RNA and DNA viruses of importance for human and veterinary health. The project will investigate aspects of host virus interplay at the genome level and look for potential adventitious agents in complex DNA preparations.

Aim 1: Sequence the complete genomes from large collections of different RNA and DNA viruses
• Develop next generation sequencing pipelines to enable the enrichment, indexing and sequencing of full-length viral genomes from clinical material
• Examine viral genome diversity and dynamics in collections of samples of medical and veterinary importance

Aim 2: How determining the viral metagenome in human and animal reservoirs can lead to the identification of known and new viruses in defined disease settings
• Develop, as part of an international virology consortium, the genome enrichment methods for determining the metavirome in cell free material
• Adapt and utilise RNAseq for infected cell material with the aim of identifying known viruses in different clinical settings and the presence of new viruses in known diseases

Aim 3: How patterns of gene expression and transcript processing dynamics in cells infected with human viruses influence virus pathogenesis
• In collaboration with external groups determine the full transcriptional architecture of cells infected with different human pathogenic viruses
• Determine the transcriptional pattern and coding sequence polymorphisms of genes that respond to viral infection from species with potential for zoonotic transfer to humans

Aim 4: How computational resources and database systems can provide functional insights into virus genome projects
• Produce data resources for processing and visualising multiple finished whole viral genome sequences in collaboration with other international bioinformatics resources
• Provide ways of relating sequence variation to encoded proteins and transcriptional control regions
• Develop computational methods and data resources for investigating host and viral interactions

1.4 Malaria

Heads of Programme: Dominic Kwiatkowski and Oliver Billker

1.4.1 Structure of the Programme

Dominic Kwiatkowski, Oliver Billker, Matt Berriman, Chris Newbold (Honorary), Julian Rayner, Gavin Wright

1.4.2 Vision for the next five years

The primary aim of the Malaria Programme is to assist global efforts to eliminate malaria. We will develop genomic tools to guide public health policymakers in the most effective use of existing anti-malarial drugs and insecticides, and we will undertake a programme of basic research on natural and experimental genetic variation aimed at discovering novel therapeutic and vaccine strategies and providing both physical and bioinformatics resources to help drive experimental genetics research throughout the malaria research community.

A major obstacle to malaria control is the ability of the parasite to develop resistance to anti-malarial drugs, and of the vector to develop resistance to insecticides. It is difficult to combat this process of natural selection without tools to monitor it. We will develop systems to monitor genome variation in Plasmodium and Anopheles populations, with statistical methods and web-based tools that will utilise these real-time population genomic data to gain early warning of the emergence and spread of new forms of drug or insecticide resistance. We will provide tools and resources to underpin research into parasite and vector epidemiology on both large and small scale, to elucidate the transmission dynamics of the disease and the effect of public health interventions. Finally, we expect that the sequencing data accumulated by these studies and made available to the community will provide a precious resource for the study of the basic biology of parasite and vector, and will be highly synergistic with the Pathogen Variation Programme.

We will combine new statistical approaches in genome-wide association analysis with next generation sequencing to generate a detailed catalogue of human genetic determinants of malaria resistance, with the goal of identifying molecular mechanisms of protective immunity. Specifically, we will undertake a trans-ethnic genome-wide association study of resistance to P. falciparum malaria in multiple populations in Africa, Asia and Oceania. More generally, we aim to utilise the high levels of genetic diversity and population structure found in African populations to overcome roadblocks in the fine-mapping of causal variants that are encountered in European populations. This will link in closely with the Human Genetics Programme at multiple levels, particularly in understanding human genome variation in different populations, and in developing capacity and infrastructure for genetic studies of common diseases in Africa.

We will provide technologies and resources to advance Plasmodium experimental genetics beyond the current medium-scale projects, and make these resources freely available. We will use recombineering technology to generate a large open resource of versatile gene modification vectors to catalyse genome-wide targeted mutagenesis studies across the malaria research community. This will have many potential applications, particularly in functional characterisation of potential drug and vaccine targets. We will integrate with the Mouse and Zebrafish Genetics Programme to exploit the Institute’s extensive mouse mutagenesis Programme, including a genome-wide library of conditional homozygous mutant ES cells, to undertake a systematic functional analysis of host genes associated with malaria infection, and publish these phenotypic data in real-time through the Institute’s Mouse Genetics Portal. Our ability to systematically mutate both the parasite and host genome will also allow us to follow up any parasite or host genes identified as functionally significant in our genomic epidemiology work described above.

We will also advance technologies for molecular phenotyping of Plasmodium parasites, and apply these technologies to an in-depth understanding of Plasmodium motility and host cell invasion across Plasmodium species and life cycle stages. We will apply quantitative mass spectrometry of post-translationally modified proteins to a panel of Plasmodium lines with motility defects to generate an understanding of the protein networks that underpin the regulation of motility and invasion. We will also apply novel Sanger protein:protein interaction technology to identify P. falciparum – erythrocyte protein:protein interaction pairs. Both of these aims have the potential to yield novel intervention targets.

1.4.3 Detailed aims and deliverables

Project 1: Genomic Epidemiology of Malaria

Our overarching goal is to understand how natural genome variation in human, Plasmodium and Anopheles populations affects the biology and epidemiology of malaria, and to utilise this knowledge to develop more effective ways of controlling a disease which afflicts half a billion people each year.

We will investigate methods for monitoring genome variation in Plasmodium and Anopheles populations by sequencing large numbers of isolates each year. Our goal is to develop systems to provide early warning of the emergence and spread of new forms of anti-malarial drug resistance and insecticide resistance. If successful, this could transform global strategies for malaria elimination.

To provide new leads for vaccine development, we will generate a detailed catalogue of human genetic determinants of resistance to P. falciparum malaria by trans-ethnic genome-wide association analysis combined with population-specific . Our analytical strategy aims to utilise the genetic diversity of African populations to enable fine-mapping of causal variants. We will work with the Human Genetics Programme to understand how to conduct GWA studies across different populations based on systematic characterisation of population-specific patterns of genome variation; and to develop capacity and infrastructure for genetic studies of common diseases in Africa.

Together with the Pathogen Variation Programme, we will develop new approaches for comparative genomics and the functional analysis of genome variation to gain basic insights into host-parasite biology, including the use of next generation sequencing to investigate natural variation in the Plasmodium transcriptome.

Aim 1: Discover human genomic variants that confer resistance to malaria
• Perform curation and statistical evaluation of an archive of over 60,000 DNA samples together with clinical and epidemiological data from 15 malaria-endemic countries
• Conduct a trans-ethnic GWA study of severe malaria in Africa, Asia and Oceania, with large-scale replication and fine-mapping studies. This will be underpinned by population-specific sequence data generated by the Human genetic variation project
• Conduct functional investigation of putative host resistance genes discovered by GWA, by studying these in the Experimental genetics of malaria project

Aim 2: Use genome sequencing of Plasmodium populations to inform malaria control strategy
• Characterise genome variation and population structure in P. falciparum and P. vivax populations around the world
• Develop informatic tools for monitoring genome variation and evolutionary processes in Plasmodium populations
• Understand how to perform reliable GWA studies of drug resistance and other medically important phenotypes in Plasmodium populations
• Develop standardised tools to quantify phenotypic variation in parasite-erythrocyte interactions, linking work on natural genetic variation to the Experimental genetics of malaria project

Aim 3: Use genome sequencing of Anopheles populations to inform vector control strategy
• Characterise genome variation and population structure in Anopheles gambiae populations around the world
• Develop informatic tools for monitoring genome variation and evolutionary processes in Anopheles gambiae populations
• Understand how to perform reliable GWA studies of insecticide resistance and other medically important phenotypes in Anopheles populations

Aim 4: Use genome and transcriptome sequencing of reference parasites to gain insights into basic biology
• Refine current reference assemblies and annotation by incorporating new data from large-scale malaria genome and transcriptome sequencing projects (will link closely with work on other parasites in the Pathogen Variation Programme)
• Underpin malaria genome variation, comparative genomics and transcript sequencing projects at the Institute and in the research community

Project 2: Experimental Genetics of Malaria

Plasmodium drug resistance is one of the major challenges facing global malaria control and the characterisation of new drug and vaccine targets is an urgent priority. Completion of the P. falciparum genome sequence revealed many new potential targets, but assigning priorities to these targets requires a thorough understanding of the basic biology of Plasmodium and its hosts.

The overall aim of this project is to accelerate this process by generating genome-scale resources of use to the wider research community and by applying these resources to understand, at a molecular level, specific host-parasite interactions that could be targeted by new therapies.

We propose to remove a major roadblock currently limiting large scale Plasmodium experimental genetics by providing the community with genome-wide vector resources for the genetic modification of malaria parasites (Aim 1). We will also apply state-of-the-art proteomics technology to a global analysis of specific Plasmodium protein post-translational modifications and make these data available without delay through public databases where they will aid drug development (Aim 2). We will develop an in-depth understanding of one high-priority intervention target, host cell invasion and parasite motility, both by using the resources generated in Aims 1 and 2 to identify the conserved protein networks regulating motility, and by using large-scale molecular screening technology developed at the Institute to systematically map for the first time the extracellular interactions between P. falciparum merozoites and erythrocytes (Aim 3). Finally, to identify key host genes modulating malaria susceptibility and pathology, we will make use of the Institute’s large resources of mouse mutants and mouse embryonic stem cells by implementing scalable infection protocols for primary screening (Aim 4).

Aim 1: Genome-scale resources for genetic manipulation of Plasmodium • Use recombineering to generate comprehensive libraries of versatile targeting vectors encompassing ~5000 genes in P. berghei and trial application to P. falciparum genome • Make vectors freely available to the research community • Develop improved targeting cassettes and conditional gene knockout technology

Aim 2: Understanding gliding and invasion through molecular phenotyping • Use quantitative mass spectrometry to generate reference datasets of phosphorylated and palmitoylated proteins in P. falciparum and P. berghei • Combine genome-wide resources described above with comparative proteomics to generate an in-depth biological understanding of the role of post-translational modification in regulating parasite motility and host cell invasion

Aim 3: Identifying novel extracellular Plasmodium-erythrocyte interactions • Develop platforms for identifying and quantifying extracellular erythrocyte-merozoite invasion receptor-ligand pairs using recombinant proteins • Apply the platform to all versus all screens to discover new host-parasite interaction partners that function during erythrocyte invasion • Generate an in-depth understanding of identified interactions by mapping binding domains and investigating their function in vitro

Aim 4: Host genes that modulate malaria • Discover host genes important for experimental malaria by challenging knockout mice and homozygous knockout embryonic stem cell lines with rodent malaria (P. berghei) • Develop animal models for functional studies on candidate genes, which the natural genetics project finds to be under parasite selection or linked to malaria disease in humans

1.5 Mouse and Zebrafish Genetics

Head of Programme Derek Stemple (acting)

1.5.1 Structure of the Programme

Derek Stemple David Adams Oliver Billker Allan Bradley Gordon Dougan Pentao Liu Darren Logan William Skarnes Karen Steel Gavin Wright Vijay Yadav

1.5.2 Vision for the next five years

This Programme aims to functionally annotate vertebrate genomes using genetic analysis of mice, zebrafish and ES cells on a very large scale. These will be the largest and most ambitious projects of their type anywhere. The outputs from this Programme are designed to foster a community of researchers, empowering them with structured phenotypic data and providing genetic resources to facilitate further in-depth analysis. Five years from now, this Programme will have examined the function of more than 1,000 genes in mice and 8,000 genes in zebrafish, and many aspects of cellular function will have been probed using genome-wide mutagenesis in ES cells. The genetic resources generated by this Programme will be of long-lasting value. The Mouse genetics project will double the number of available mutants in the archives, the Zebrafish mutation resource will increase the resource by approximately an order of magnitude and the ES cell mutation resource project will be an entirely new resource.

These projects will provide functional information which can be extrapolated to the orthologous genes in the human genome. The mouse and the zebrafish are complementary model systems for this purpose. As a mammal, the mouse shares greater anatomical and physiological parallels with humans. Zebrafish offer the particular advantage of rapid external embryonic development, facilitating discovery and analysis of developmental genes. The Mouse genetics project builds on the phenomenal mouse ES cell resources to provide access to new gene functions in a variety of areas from early embryonic development to adult physiology. The ES mutation resource project will be used to identify genes required for many aspects of cellular function such as self-renewal and differentiation. Both the Mouse genetics and ES mutation resource projects will leverage the Institute’s investment in pathogen genetics and discover host genes involved in infection and immunity. The Zebrafish mutation resource will focus on identifying developmental genes.

This Programme aims to introduce a third model organism, namely humans, in the guise of human iPS cells. As the technology of generating transgene-free human iPS cells is fully developed, we will be well placed to develop robust manipulative genetic systems. In the future, these technologies can be implemented to recover iPS cell lines from human samples, to support studies of the impact of natural genetic variation (see Human Genetics Programme).

1.5.3 Detailed aims and deliverables

The principal objective of this Programme is to discover the function(s) of thousands of vertebrate genes through experimental genetic approaches in mice, ES cells and zebrafish which can be extrapolated to the human genome. These objectives will be achieved through large scale internal programmes and provision of resources to external groups, providing common platforms and working towards the shared vision of functionally annotating vertebrate genomes. The objectives have a firm foundation, building on progress and resources assembled during the current quinquennium. In the next five years, we plan to add a third organism, human, by developing technology and resources to enable facile genetic manipulation of human iPS cells.

Project 1: Mouse genetics project

This project is based on the large scale targeted ES cell resources assembled under the auspices of the KOMP and EUCOMM projects. This project aims to establish more than 1,000 of these alleles in mice and conduct a first pass primary phenotype analysis as part of a coordinated effort under the auspices of the International Mouse Phenotyping Programme. Phenotypic data and genetic resources will be distributed globally.

Aim 1: Establish in mice 1,000 alleles from the targeted ES cell archive • Generate mice from ES cells at the rate of 200 annually • Determine the gene expression patterns in adult tissues and in the embryo • Archive and distribute mice

Aim 2: Determine the primary phenotype of each mutant allele • Assess homozygous embryonic lethality or viability of each allele • Assess male and female fertility • Conduct primary phenotypic screens on mutant mice including challenges

Aim 3: Establish a homozygous mutant cell line bio-bank. • Recover and store tissues and primary cell lines from each homozygous mutant

Aim 4: Towards an electronic encyclopaedia of gene function. • Electronic mouse tracking and data warehouse for health records • Assemble a structured phenotype archive to support the external community • Present data to the scientific community

Project 2: An ES cell mutation resource

The Mouse genetics project supports the assessment of gene function at scale; however, in vivo analysis is limited to just 5% of the genome over the next five years. This project uses high throughput genetics approaches in ES cells as a surrogate for the mouse, enabling the functional assessment of thousands of genes over the next five years. This project will also develop concepts and approaches to efficiently genetically manipulate human iPS cells, providing a foundation for manipulative genetics of the human genome.

Aim 1: Library of conditional homozygous mutant ES cells • Engineer ES cells to have inducible Cre and Flp expression • Generate targeting constructs for both alleles from KOMP/EUCOMM vectors • Serially target ES cells and identify double targeted cells

Aim 2: Growth and differentiation phenotypes • Induce homozygous mutations and assay growth and viability of undifferentiated cells • Assay differentiation potential of homozygous mutants • Generate transcript counting expression profiles of mutants which exhibit phenotypes • Establish databases of mutant ES cells and associated phenotypes

Aim 3: Genome-wide libraries of homozygous mutant ES cells • Transposon mutagenesis in Blm-deficient ES cells and selection of bi-allelic mutants • Complexity assessment of pooled libraries enriched for homozygous mutant ES cells

Aim 4: Genome-wide screens for cellular phenotypes • Drop-out screens to identify mutants sensitive to DNA damage and toxins • Screen libraries to identify genes for viral, bacterial and Plasmodium infection

Aim 5: Human iPS cells • Efficient generation of "transgene-free" human iPS cells • Gene targeting and transposon mutagenesis in human iPS cells • BLM-deficient human iPS cells and their use for genetic screens

Project 3: Zebrafish mutation resource

The Zebrafish mutation resource uses ENU mutagenesis and Illumina sequencing to generate and detect mutations in zebrafish genes and phenotype them. This project has a major role in the provision of requested mutant alleles to the international zebrafish research community.

Aim 1: Mutation discovery • Application of Illumina sequencing to discovery of disruptive alleles • Exploration of new mutagenesis and mutation discovery approaches

Aim 2: Morphological phenotype analysis • Describe the morphological phenotype for mutations in 8,000 protein-coding genes • Controlled vocabulary descriptions of morphological phenotype (~20% of genes) • Maternal-zygotic phenotypes • Pilot scale effort to explore adult phenotypes

Aim 3: Molecular phenotype analysis • For 20% of mutants produce a high-quality gene expression dataset • Development of methods for RNA expression analysis

Aim 4: Phenotype databases • Facilitate searches and links to and from the genome and resource providers • Provide phenotype data to existing databases ZFIN and ArrayExpress • Promote integration across existing model organism and disease databases

1.6 Genome Informatics

Heads of Programme and

1.6.1 Structure of the Programme

Tim Hubbard Alex Bateman Matt Berriman Ewan Birney (EMBL-EBI) Richard Durbin Paul Flicek (Honorary) Jennifer Harrow Ville Mustonen Zemin Ning Stephen Searle Derek Stemple

1.6.2 Vision for the next five years

Biology as a field has reached a point where digital data generation is ubiquitous and the discipline of processing and using biological data, bioinformatics, is a core activity alongside , biophysics etc. The exponential growth in biological data is set to continue and indeed exceeds the increase in computer performance, creating stringent demands on the technical aspects of bioinformatics.

Nowhere is bioinformatics more critical than in genomic and genetic research, as is carried out at the Institute. A large part of the science we undertake depends on the computational processing and analysis of genomic data. Therefore many of the Programmes have bioinformatics components in themselves. This Genome Informatics Programme will play an important central role by:

• Organising and developing important data resources • Enabling systematic integration of data and methods • Driving computational innovation

First, we firmly believe that well organised and accessible databases for key information facilitate biological innovation and discovery at a global scale. Second, biology needs organising principles and concepts to structure and use the vast amounts of data produced. Third, scientific breakthroughs are often conceived at the intersections of disciplines; it is therefore critical to develop frameworks which enable the pursuit of integrative approaches.

Our three guiding principles are aligned with and contribute substantially to the Institute’s overall research strategy as detailed in this and the individual Project documents. However, we also deliver research analysis, tools and resources to the world at large. The figure below shows how the three projects interact with each other and integrate with resources at the EMBL-EBI. There are multiple synergies and interactions between these projects and others at EMBL-EBI, creating a powerful set of integrated biological resources for external users. For example, although the Ensembl infrastructure developed in Project 1 could be applied to all genomes, it is restricted to vertebrates, with much of the effort focused around human, mouse and zebrafish, the core Institute vertebrate model organism systems. EMBL-EBI, which has a remit to provide access to sequence data from all genomes contained in its repositories, has recently started a separate project for non-vertebrates. As a result, the Campus as a whole is able to offer access to genome annotation across biology through the same interface. Ensembl and Ensembl Genomes share infrastructure development. Separately, Ensembl Genomes has links to the GeneDB resource of the Pathogen Variation Programme.

The relationship between Genome Informatics projects and the flow of data to users and into repositories. Project 1 generates vertebrate genome annotation, focused particularly around the Institute model organisms of human, mouse and zebrafish. Project 2 generates functional annotation around families of biological elements. Project 3 contributes novel algorithms and analysis. Institute high throughput data is deposited in repositories, mainly at EMBL-EBI.

A key theme that runs through the projects of this Programme is to develop these resources and our analysis strategies to take into account functional genome variation within populations. Up to now all the resources, algorithms and analysis have been based around a single reference genome and a single set of annotations for each species. Moving toward a population based picture will be essential in order to appropriately organise and interpret data that will be collected in the next quinquennium. Another key theme is a refocusing of the resources in the Programme on the Institute priority organisms: human, mouse, zebrafish and pathogens.

Finally, it is anticipated that the organisation of data linking genotype and phenotype in humans will lead to connections between this Programme and healthcare systems. One aspect will be facilitating research involving data from medical records. In many countries electronic medical records systems are being built and programmes are being set up to facilitate research federating datasets in a secure fashion, such as the UK Research Capability Programme of the National Health Service (NHS). These will ultimately want to federate with databases of genetic annotation. The other aspect will be the integration of a subset of research level genotype-phenotype relationships into decision-making tools to aid clinicians. This will be such a complex and clinically based project that it is likely to be done by collaboration, such as with specialised clinical research institutes. However, it may ultimately be a major translational output of the overall Programme.

1.6.3 Detailed aims and deliverables

The three projects below address our key principles described in 1.6.2: organising and developing important data resources, enabling systematic integration of data and methods, and driving computational innovation.

Project 1: Attachment of function to vertebrate genomes

This project aims to support and develop vertebrate genome resources. It will extend both annotation resources and genome assemblies to represent significant variation across populations for reference genomes.

Aim 1: Develop and maintain genome annotation resources • Provide an up-to-date integrated reference gene resource for “high-quality” sequenced vertebrate genomes (mainly human, mouse and zebrafish) • Provide regular releases of the Ensembl integrated data resource • Provide significantly enhanced variation data resources and connections between genotype and phenotype data

Aim 2: Periodically create new sequence assemblies for Human/Mouse/Zebrafish • Periodically create new genome sequence assemblies in collaboration with Genome Reference Consortium (GRC) that resolve identified assembly/haplotype/indel problems • Carry out informatics based evaluation of existing assemblies using relevant new datasets, such as population sequencing data • Carry out experimental investigations, driven by informatics analysis and community monitoring, to resolve assembly problems

Project 2: Relationship between biological sequences

This project aims to support and develop resources and tools that organise sequence data and present it in a way that maximises the functional knowledge biologists and bioinformaticians can infer.

Aim 1: Production of Pfam, Rfam, TreeFam, MEROPS • Regular releases of the databases • Development of the Pfam protein families database • Development of the MEROPS peptidase database • Development of the Rfam RNA families database • Development of the TreeFam gene trees database

Aim 2: Improve understanding of the evolutionary history of proteins and ncRNAs • Methods to identify distant to identify function • Non-coding RNA identification and variation • Understanding pathogen biology with family resources

Project 3: Computational and statistical methods

This project will support research in computational biology theory and methods at the boundaries of what can be done in areas of importance for Institute science, and develop implementations that advance those boundaries for use both within the Institute and by others.

Aim 1: Develop improved methods • Develop and implement efficient methods to construct overlap assembly graphs using Burrows-Wheeler Transform (BWT) based self-indices • Build population assembly graphs that represent variation within a species • Develop read error-correcting methods based on multiple alignments that make use of sequence quality information • Work with others to establish standards for representation of assembly graphs • Parallelisation of assembly representation and algorithms
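
As a rough illustration of the first bullet under Aim 1 above, the following minimal Python sketch builds a Burrows-Wheeler Transform and uses it as a self-index to count exact occurrences of a query by backward search, the basic operation on which suffix-prefix overlap detection between reads can be built. It is not the Institute's implementation: the function names (bwt, build_index, count_matches), the "$" sentinel and the naive rotation sort are all assumptions chosen for clarity, and a practical assembler would need a far more memory-efficient construction.

```python
def bwt(text, sentinel="$"):
    """Return the Burrows-Wheeler Transform of text.

    Assumption: the sentinel does not occur in text and sorts before
    every other character (true for "$" against A, C, G, T and a-z).
    Sorting all rotations is simple but quadratic in memory; it is
    used here only for illustration.
    """
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def build_index(bwt_str):
    """Build the two tables needed for backward search.

    first[c]  : number of characters in the text strictly smaller than c
    occ[c][i] : number of occurrences of c in bwt_str[:i]
    """
    counts = {}
    for ch in bwt_str:
        counts[ch] = counts.get(ch, 0) + 1

    first = {}
    total = 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]

    occ = {ch: [0] * (len(bwt_str) + 1) for ch in counts}
    for i, ch in enumerate(bwt_str):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)

    return first, occ, len(bwt_str)


def count_matches(pattern, index):
    """Count exact occurrences of pattern using backward search."""
    first, occ, n = index
    lo, hi = 0, n  # current half-open interval of sorted rotations
    for ch in reversed(pattern):
        if ch not in first:
            return 0
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    transformed = bwt("banana")
    index = build_index(transformed)
    print(transformed)                   # annb$aa
    print(count_matches("ana", index))   # 2
```

The search cost depends on the length of the query rather than on the size of the indexed text, which is one reason an index of this kind is attractive when very large read sets must be compared against each other.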

Aim 2: Identify vertebrate gene regulatory signals on a large scale • Generate a comprehensive library of transcription factor binding (TFB) motifs for human, mouse and zebrafish using computational motif discovery algorithms • Identify families of evolutionarily related TFB motifs from within and between species • Annotate regulatory sites in vertebrate genomes at the transcription factor binding motif level • Annotate non-coding variation within annotated regulatory sites • Develop computational motif discovery algorithms to improve sensitivity • Investigate relationships between motif binding sites in large scale regulatory regions such as enhancers

Aim 3: Develop methods to help understand functional genomic variation • Develop molecular phenotype based analysis methods for population variation data • Develop population genetic methods for genomic time-series data of mutations

2. Translating our science

2.1 Vision for the next five years

We will maximise the translation of our scientific output into health benefits: this will include our traditional dissemination of data and biological materials to the wider scientific community. In this five-year period we will be more proactive in our approach to, and involvement in, translation, so that our scientific advances and discoveries will be exploited for healthcare benefits both by the direct translation of our research output into diagnostics and therapeutics and by the development of our data output to support clinical understanding of genetics and genomics. This will involve, where necessary, the protection of our intellectual property and greater involvement in licensing activities and partnering with commercial concerns. The Institute will foster a culture which values, encourages and rewards technology development and innovation with realisable clinical potential. The Institute will facilitate this through regular assessment of our research portfolio and the encouragement and progression of ideas and technologies for translational output.

2.2 Detailed aims and deliverables

Aim 1: Support and develop existing opportunities towards health benefits • Review progress on the translation of scientific opportunities in our current projects and support Faculty in their plans for translation • Establish an academic access committee for Kymab to provide recommendations on the distribution of antibodies and mice • Monitor and advise our Sequencing R&D activities on the protection and exploitation of opportunities for improving high throughput sequencing • Support our projects in receipt of Health Innovation Challenge funding: Deciphering Developmental Disorders, Monitoring Cancer Disease Burden

Aim 2: Create a Translation Office to drive translation • Recruit a business development professional and a lawyer as the core of a technology transfer team at the Institute • Establish processes for the operation of a budget for gap funding to validate and perform proof of principle studies • Build an external support network for the team to provide intellectual property and expert advice • Establish the means of communication and seeking advice on potential translation projects and models with GRL Board and the Wellcome Trust

Aim 3: Encourage and educate Faculty in translating their science • Form a translation committee for oversight of the development of opportunities at the Institute including the provision of external advice • Review with Faculty the translation opportunities they foresee in the Programmes and projects of our 2011-2016 Plan • Build links with industry to understand their needs and establish links for future development collaborations • Attract seminar speakers from the commercial world and academic translation offices to discuss examples of translation

Aim 4: Maximise the clinical benefits of our 2011-2016 scientific projects • Support the translation initiatives of our projects through business development advice and, where necessary, protection of intellectual property • Invite clinical geneticists to visit the Institute to discuss their requirements and help define the supporting genomic information they will require for clinical decisions • Monitor the progress of projects with the potential for improving health benefits and provide resources and advice to ensure translation objectives are met • Provide support and advice to projects on protection of intellectual property, licensing options and relevant industrial partners capable of taking products through the drug development pipeline

Aim 5: Derive guiding principles and criteria for translation • Facilitate a debate amongst Faculty on the principle that translation should achieve the maximum health benefit from our science • Provide guidance on how the principles of open access can be fulfilled whilst exploiting the results of our science through commercial means • Explore models for partnering with industry and/or commercially exploiting our intellectual property

3. Developing our people

3.1 Vision for the next five years

We aim to provide a stimulating and rewarding environment in which individuals at all levels can learn, develop and grow during their time at the Institute while playing their part in the delivery of world class science. Central to this mission is our commitment to, and passion for, developing and nurturing future generations of scientists in genomic sciences. In particular, we seek to inspire the young, to provide researchers and clinicians with unprecedented opportunities to develop their skills and to foster scientific leadership capability through high quality training and mentoring. We will build on existing partnerships and where necessary forge new links to further enhance the Institute’s reputation as a training environment – we will also seek ways in which to support capacity building in developing countries.

3.2 Detailed aims and deliverables

Scientific leadership

The quality, international competitiveness and productivity of the Institute are dependent on its scientific leaders, their vision and the teams who are responsible for executing it.

Aim 1: Maintain strong scientific leadership for our Programmes • Recruit senior leadership for Human Genetics and Mouse and Zebrafish Genetics with the vision to further develop and integrate our research Programmes in these areas • Continue to support outstanding researchers and attract the brightest and best Faculty to the Institute, providing them with a level of support and access to resources which allow them the freedom to explore the most challenging questions in their field • Take careful decisions on the renewal of our Faculty and maintain a career structure that achieves a balance between stability and turnover to enable our research portfolio to remain at the forefront of scientific and medical developments • Promote the professional management of the Institute’s major technology platforms and pipelines by recruiting and retaining high calibre individuals to drive performance and delivery in these areas

Aim 2: Increase the breadth and depth of our scientific connections • Make further joint and Associate Faculty appointments in priority areas • Promote and encourage sabbatical opportunities to bring new intellectual contributions as well as expand the influence of the Institute • Strengthen links with UK and international groups – in particular, by partnering with Wellcome Trust Investigators in other Wellcome Trust Centres and Overseas Programmes to drive scientific output and maximise opportunities for the community of Wellcome Trust scientists

Training the next generation of scientists

Educating and training the next generation of scientists will remain central to the Institute’s mission. Genomic science is increasingly important for UK healthcare and scientists who are trained in high throughput science at this Institute will be in a position to assume key roles in the future. We will continue to run a variety of schemes and programmes catering for individuals at different career stages and spanning biological and computational disciplines. In particular, we will expand our participation in clinical research training schemes by partnering with the University of Cambridge and Addenbrooke’s Hospital. We will also invest in aspiring scientists by providing stimulating work placement opportunities for school-age students and undergraduates, including medical students.

Aim 1: Refine and develop the Institute’s Graduate Programme • Increase resource for the Programme through external funding, in particular targeting high calibre candidates with research interests aligned to our Programmes and projects • Explore opportunities for cross fertilisation in graduate student education between the University of Cambridge and the Institute • Investigate mechanisms for the ongoing support and development of students from developing countries, in particular Africa • Consider options for streamlining our student application procedure to improve tracking of applicants and to reduce duplication of effort between the University of Cambridge and the Institute • Further develop our student alumni database, incorporating an annual newsletter

Aim 2: Enhance postdoctoral training • Evaluate options for an overarching organisation and structure to provide oversight and Faculty leadership for postdoctoral training • Equip individuals with a broad range of transferable skills during their time at the Institute to enable them to achieve their career goals – whether that is preparation for the transition to independence or applying their scientific training in other roles and settings • Encourage our Postdoctoral fellows to plan effectively for their future through regular personal development discussions with supervisors and by providing access to first class career guidance through the University of Cambridge Careers Service (for life science Postdoctoral researchers)

Aim 3: Establish the Institute as a first class training environment for clinicians • Further enhance the Institute’s reputation as a training environment for clinicians by consolidating existing links and forming new connections with clinical PhD and other programmes run by the University of Cambridge • Seek to increase the number of clinical fellows at Postdoctoral and Faculty level by sponsoring applications for recognised clinical fellowship award schemes • Strengthen links between clinical PhD students, other clinical fellows and clinical Faculty members to facilitate clinical mentoring

Aim 4: Provide a range of opportunities for young people to broaden their knowledge of science and scientific careers • Review the opportunities that we offer to young people at various stages of their education in the form of work experience, vacation or longer-term placements as part of a sandwich degree course • Investigate and evaluate options for a graduate recruitment and training scheme


Learning and development

Our people are our greatest asset and we aim to provide learning opportunities for all members of the Institute such that they can perform at their best whatever their role. We have identified a number of priority areas, for example induction and training in health and safety, and will look to our newly appointed Learning and Development Manager to coordinate and drive associated activities and initiatives.

Aim 1: Develop and implement a new learning strategy for the Institute • Devise a new learning curriculum for the Institute, investigate and evaluate different modes of learning delivery and streamline arrangements for learning administration. The new curriculum will incorporate the themes of translation and ethical awareness mentioned elsewhere in this document [see section 8: Translating our science, and Supporting documents section 2] • Implement a new performance appraisal process which provides a framework for objective setting and helps to promote a positive and engaging culture around individual review and personal development • Achieve closer integration between learning and the linked processes of appraisal, career development and succession planning to ensure i) the organisation has the necessary skills to support the delivery of the Institute’s Programmes and ii) refreshment of ideas through the recruitment of ‘new blood’ and turnover

Aim 2: Foster scientific leadership capability at all levels and career stages • Commit resource for and ensure access to high quality leadership development programmes • Continue to work with academic partner organisations (EMBL-EBI, CR-UK, MRC, ICR etc) to develop joint initiatives for leadership training and thus benefit from shared experiences and mixing of individuals across organisation boundaries

Aim 3: Create a culture in which management excellence is recognised and valued • Encourage managers through the provision of high quality training and mentoring to continually refine their managerial skills • Devise a comprehensive modular management training programme for the Institute’s managers and supervisors covering all aspects of the management role

Aim 4: Provide a more structured framework for the mentoring of individuals • Provide information and tools to support mentoring, raise awareness and refine skills • Facilitate a coaching and mentoring culture in which individuals are supported and developed to achieve their potential in roles within and beyond the Institute • Consider the needs of special groups, in particular younger women scientists, clinicians and scientists from developing countries

Aim 5: Promote diversity as part of an inclusive work environment in which there is equality of treatment and opportunity • Undertake a formal review and audit of diversity policy and practices to assess the extent to which inclusiveness and equality of treatment and opportunity are being achieved


• Raise awareness amongst staff at all levels of diversity issues, providing guidance and training as appropriate to ensure the effective management of such issues in the workplace • Take positive action to attract, develop and support clinicians, scientists from developing countries and female scientists


4. Developing our organisation

4.1 Vision for the next five years As a world leading scientific research institute we aim to be an exemplar of organisational effectiveness, ensuring our corporate governance structures provide transparent mechanisms for management and accountability for the Institute and demonstrating ethical responsibility in all our actions. As employers we aim to have an embedded culture of health and safety with employee wellbeing as the central focus.

4.2 Detailed aims and deliverables Aim 1: Improve organisational efficiency and effectiveness and maximise the benefit achieved from investment • Establish a cycle of process and performance review to ensure that our corporate and core services provide best value in fiscal terms and improved productivity • Develop a range of key performance indicators that drive improvements in our organisational effectiveness and, where appropriate, provide quantitative or qualitative measures of outcomes enabling measurement of our success against our stated objectives • Review the administrative support framework for scientific projects with the aim of enabling our scientists to focus on the delivery of the scientific objectives and development of high performance teams • Benchmark our performance against similar organisations to gauge our organisational effectiveness • Adapt examples of best practice from other organisations in the delivery of transactional services to the Institute’s own needs

Aim 2: Continue the development of principles, supporting policies and organisational structures to manage the Institute’s legal and ethical responsibilities • Establish a rolling review of the governance framework, ensuring that all committees have clear terms of reference and a knowledgeable and balanced membership • Review the legislative/policy frame within which the Institute operates to ensure it supports the accomplishment of our legal and ethical obligations. The Institute’s policies will be reviewed to ensure they reflect current legislation and are easily accessible • Actively participate in consultation on national and international issues that directly impact on the work of the Institute and/or its employees, providing constructive comment and influencing outcomes • Review the risk assessment processes to ensure they are robust and facilitate early identification, assessment and prioritisation of foreseeable risks and maintain a timetable for regular review of our Risk Register • Participate in the public debate and discussion on ethical and societal issues related to our programme of science. Facilitate and actively encourage a debate in the Institute on these ethical issues and broaden our scientists’ appreciation, education and knowledge of these issues


Aim 3: Develop our Health and Safety culture • Continue the operational implementation of the Campus Health and Safety Policy. The Policy will be reviewed and updated in line with regulatory changes and will respond to the impact of those changes on the Institute’s operations • Evaluate the model of a framework of local coordinators to determine its potential for deployment more broadly across the Campus • Develop a matrix of health and safety competencies and introduce a training programme in response to needs assessed through consultation with managers • Develop an audit programme and introduce a regime of regular inspection and monitoring as part of compliance verification • Promote, support and monitor occupational health so that a holistic approach to employee wellbeing and work-related health issues is embedded


5. Strategic relations

5.1 Vision for the next five years In delivering its mission the Institute interacts with many different stakeholders, including the scientific community; commercial entities; research funders; other key stakeholders (e.g. MPs, journalists, opinion formers etc.) and the general public [see Supporting documents section 6: Spreading the word]. We will develop and foster these relations with the overall goal of maximising opportunities for scientific collaboration, translation and funding and to foster an environment that is conducive to scientific research. Underpinning this goal is the need for appropriate communication of the Institute’s strategy to the different stakeholder groups [see Supporting documents section 6: Spreading the word].

Collaboration is at the heart of the scientific endeavour bringing additional expertise and/or resources to address specific scientific questions. Institute scientists are engaged in many collaborative research projects from individual, investigator driven collaborations to leading international research consortia. We plan to increase awareness of the Institute’s research activities, to explore the possibilities of new collaborations with other research institutes and to strengthen and build on existing links. Increasingly the Institute will be seeking additional funding from other funders and a more proactive approach will be taken to this. In order to translate the outcomes of its research into real healthcare benefits the Institute must interact with commercial partners who have the R&D capabilities and distribution capacities to take the outputs of basic scientific research through to the market place (vaccines, therapeutics, diagnostics, etc.). We will develop relationships with potential commercial partners to facilitate future translation opportunities.

5.2 Detailed aims and deliverables Aim 1: Build and maintain collaborations • Continue to work with scientists worldwide on investigator driven research collaborations • Develop a plan of visits to other Wellcome Trust Centres and Programmes in the UK and overseas • Explore opportunities for collaboration with other major genome centres • Explore opportunities for interaction with the developing UK CMRI

Aim 2: Strengthen local links • Continue to collaborate closely with the EMBL-EBI through joint Faculty, Postdoctoral fellows, students and projects • Identify opportunities for increasing interactions between the Institute and the EMBL-EBI through regular strategic discussion between the Institutes’ Directors • Continue to build collaborations with scientists and clinicians at the University of Cambridge through Associate and joint Faculty, students and joint projects • Identify opportunities to build new links between the Institute and the University of Cambridge through regular interactions between the Institute Director and the Head of the Clinical School and other key staff

Aim 3: Strengthen links with Africa • Build on the Institute’s expertise and resources along with those of key African partners to identify and address key scientific questions


• Increase communication between Institute staff working in Africa and partners to share experience and expertise and to identify opportunities for coordination of efforts wherever possible • Work with the Wellcome Trust Advanced Courses to identify opportunities for capacity building

Aim 4: Lead international research consortia • Continue to play a key role in the scientific leadership of the 1000 Genomes Project, MalariaGEN, the International Knockout Mouse Consortium and the International Cancer Genome Consortium • Continue to play a key role in the development of the International Mouse Phenotyping Consortium • Influence ethical and policy issues around international research consortia through direct involvement in the debate on these issues • Identify new opportunities to develop and lead international research consortia where this makes scientific, political and practical sense

Aim 5: Facilitate funding opportunities • Maintain close interactions with the Wellcome Trust • Develop links with traditional sources of funding (MRC, BBSRC, CR-UK, EC, NIH, etc.) and explore potential new sources for additional third party funding

Aim 6: Facilitate translational opportunities • Develop relationships with specific companies through targeted visits • Host an Industry Open Day to showcase our research to potential partners • Explore other mechanisms for increasing awareness of Institute science and translation opportunities within the commercial scientific community


6. Spreading the word

6.1 Vision for the next five years Our vision is to empower researchers in this demanding world, enabling them to better communicate their research and engage in public dialogue about its possible implications for society. We will enable and support researchers to maximise the value of their participation in media, public relations or public engagement.

Through our Media and Public Relations activity, we will seek to enhance the Institute’s reputation in order to attract and retain the best researchers and to deliver more, and better, science as a result of increased intellectual, political, financial and public support. The Institute believes it is best placed to translate its findings, knowledge and expertise into products, tools, ethics, guides and understanding, and that Institute researchers are its voice. The Institute will help its researchers to strengthen existing mechanisms for communication, to explore new opportunities and to deliver high-quality and consistent messages across a range of channels.

Through our Communication and Public Engagement activity, we seek to engage with and empower individuals in an age when genetic and genomic applications are increasingly integrated into healthcare and broader society. We will build on the practices and infrastructure developed in this quinquennium to deepen interactions between Institute researchers and non-specialist publics and embed informed discussion of contemporary genomic research into school and adult education. Institute researchers will be provided with training and support to enhance their public engagement activity, and we will seek to maximise the impact of their involvement through collaborations with scientific, educational and cultural institutions.

6.2 Detailed aims and deliverables

Communicating and influencing

The Institute has a clear vision of its work, its goals and its aims that can be encapsulated in a form that can be clearly disseminated to staff, researchers and wider publics. Institute staff must have the full range of support and training needed to communicate most efficiently and take full advantage of the communication channels available. With a distinct, positive and clear voice the Institute will reach new audiences and reach existing audiences in new ways. The health of the Institute is founded, in part, on the breadth of its internal dialogue, and the Institute will use best practice to improve internal communication.

Aim 1: Consistent and efficient processes for a clear voice • Present our mission and vision in ways that allow clear dissemination to external and internal audiences • Broaden media training to a wider group of researchers and to support discussion of a wider range of issues • Develop robust systems of process, guidance and background information that support researchers to make efficient use of their time • Provide guidance and training for web authorship such that the Institute has a group of author/owners whose content will strengthen the Institute’s messages in the Institute voice


Aim 2: Seek and create opportunities to make our voice heard • Develop a programme of visitor invitations, targeted to key influencers across a range of arenas • Develop relationships with journalists and broadcast feature producers • Develop opportunities for feature comment in print and online media

Aim 3: Explore and develop new opportunities for communication • Enhance the ability of our researchers to use new communication channels • Develop a range of new media outputs that differ in tone to suit the communication medium and audience • Explore opportunities to develop documentary themes with broadcast producers • Explore new methods to communicate research outputs through the website, including richer content • Work with others to explore best practice and to review the opportunities offered by emerging communications channels

Aim 4: Support and enable rich and productive internal dialogue • Develop new intranets for the Institute and for the Wellcome Trust Genome Campus • Deliver guidance, documentation and training for the intranet and ensure that the intranet is regarded as a primary but evolving space for internal communication • Examine internal communications mechanisms and develop improved systems and mechanisms • Facilitate effective, consistent, relevant and timely communication by Institute staff that enhances the Institute’s reputation internally

Engaging with publics Through public engagement, the Institute seeks to stimulate public interest in genomic research and to encourage informed debate about associated societal issues (Aims 1 and 2), and foster a community of researchers who can effectively engage with non-specialist audiences (Aim 3). In the next quinquennium, the Programme will extend its reach and influence while maintaining its relevance and reputation through constant interaction with its stakeholders, collaborators, and audiences (Aim 4).

Aim 1: Facilitate dialogue between Institute researchers and non-specialist publics • Host visits from secondary school students, teachers and community groups and provide onsite public events to enable audiences to interact directly with scientific staff • Provide opportunities for discussions between non-specialist publics and Institute scientists through social media tools or in person at scientific and cultural venues across the UK • Seek public response – through dialogue and qualitative research – to key societal questions related to Institute science (e.g. attitudes to personal genome sequencing) • Where possible, facilitate ongoing collaboration between Institute researchers and public stakeholders to guide the effective translation of genomic research


Aim 2: Develop highly sought-after, accurate and engaging activities, resources and experiences on genomic research and issues for education and adult audiences • Collaborate effectively with Institute scientists to develop and maintain targeted resources, spanning the six scientific Programmes and their associated societal issues • Collaborate with an extended network of scientific, cultural and educational institutions to embed contemporary genetics content into existing public programmes and place genomic research within an historical or societal context. Provide intermediaries – such as teachers, science communicators, online and broadcast media producers, artists and academics – with effective access to scientists and appropriate supporting information to develop targeted resources for their audiences • Continue to develop the written, animated and video resources of Yourgenome.org, the Institute’s public engagement website, and enhance these with social media features to enable discussion between researchers and non-specialist audiences

Aim 3: Support a community of researchers who can effectively engage with audiences • Offer a range of training and development opportunities, tools and frameworks for scientific staff to enhance their science communication and public dialogue skills • Provide libraries of activity kits, exhibit items and media to assist scientific staff in their public engagement efforts • Continue rich collaborations with Institute researchers to develop novel strategies and resources that can be used by scientific staff to gain confidence and communicate the nature of their work

Aim 4: Maintain high Programme quality while extending our reach and influence as a leader in genetics and genomics communication • Consult with Institute senior management, Wellcome Trust and other external collaborators, and scientific staff across the major Institute Programmes and sections through steering and stakeholder committees • Conduct timely and appropriate evaluation of the Programme’s products (e.g. resources and events) and its processes (e.g. researcher support and involvement, collaboration, project management) • Contribute to the development of the science communication field by: partnering with science communication courses; sharing learning and principles with other public engagement efforts; and disseminating our models and tools through science communication conferences, journals and portals • Identify additional opportunities to work with or through major public information networks and science education providers


7. Resources

7.1 Vision for the next five years The strategy for the Institute remains that of maximising scientific output from the resource available. The resources which the Institute utilises include people, space and finance.

7.2 Detailed aims and deliverables Aim 1: Manage the Institute’s people resource • Balance stability, flexibility and employee resources such that the Institute, its people and its structures can change and grow in line with scientific needs • Develop and keep under review workforce plans for projects and operational areas to ensure that resources and team structures/profiles are actively managed within budget to meet scientific and business need • Review the team and line management hierarchy to ensure it supports the new scientific Programme/project framework and the successful operation of the Institute’s major technology platforms, pipelines and core facilities

Aim 2: Manage the Institute’s space resource • Remain within the current space available • Change the use and layout of space to optimise utilisation • Develop information and tracking of people including both staff and visiting workers so that teams can better plan the management of space allocated to them • Balance the need for flexibility with forward planning and ensure that the Institute’s priorities are reflected in staffing levels and space allocations between teams • Be creative in the management of space but within allocations, whether by sharing offices, hot-desking within team/group space or, where suitable, working from home

Aim 3: Manage the Institute’s financial resources • Maximise the return on the annual spend • Increase third party funding to 20% per annum • Plan carefully the budgets year to year so that the Institute’s efforts remain focused on the highest scientific priorities within the five year Plan • Remain flexible in the allocation of financial resource between projects and between time periods and manage the flex fund to take advantage of new opportunities as they arise and provide a contingency for changes in budget assumptions

7.3 Financing the Strategic Plan The future scientific strategy set out in this document has been developed primarily as a bottom-up activity but with significant top-down moderation from the Board of Management (BoM). Initially, a programmatic framework was established and within this framework the entire Faculty were charged with working together to bring forward their ideas for the Institute’s scientific plans over the next five to seven years.


As the plans developed and the aims were defined, they were organised into discrete projects, which were progressively honed and refined through an iterative process with substantial input and oversight from the BoM.

As detailed Programmes evolved, detailed outline budgets were calculated aim by aim so that value for money for each aim and project, and the overall cost of the scientific portfolio, could be assessed in the light of the likely budget available. The budgets included all the direct costs of each project allocated between aims. These include staff costs, consumables, sequencing consumables and equipment costs. There is also an allocation to each project of directly attributable core facilities costs, which include use of the sequencing facility and the Mouse and Zebrafish facility.

The BoM considered these budgets, how they mapped onto the scientific aims and issues of scientific leadership and feasibility. They also reviewed the proportions of the budget that would be expended in the Institute’s primary core facilities: sequencing, genotyping, Information Technology and the Mouse and Zebrafish facility. This process resulted in adjustments to the overall portfolio based on scientific, budgetary and strategic considerations. With this process complete, the Faculty added flesh to the bones of their proposals and drafted the project documents which were subjected to several rounds of detailed scrutiny by the BoM. The plans were finalised in December 2009 and submitted to the Wellcome Trust and the Site Visiting Committee. Summary versions are provided in Supporting documents section 1: Scientific Programmes.

At submission, the total budget request for the Institute from the Wellcome Trust for the years 2011 to 2016 was £387m and was split as illustrated below. Subsequent to that submission two further budget lines have been added. The first is for the Institute to fund its own Technology Transfer team (as described in Supporting documents section 2) and the second is to add a budget for Sulston building upgrades (as described in Supporting documents section 7.4). The total envelope request is therefore £395m.

[Pie chart: split of the budget request between Cancer Genetics and Genomics, Human Genetics, Mouse and Zebrafish Genetics, Pathogen Variation, Malaria, Genome Informatics, Other IT support, Corporate Services, Faculty and education, and Flexible funds.]

Percentage split between the different areas (at £387m)


A further development since the submission is the potential for a new core facility for large scale, high throughput cell culture using technology in human induced pluripotent stem (iPS) cells (see section 3: Developments in the scientific portfolio). This technology is still in its infancy and so it is not yet clear what size, scale and therefore resource would be required to fund the experiment.

7.4 Premises

In this five year Plan 2011-2016 we intend to operate within our existing buildings and we are not asking for more space. However, space is a major issue for us and we need to manage its utilisation and refurbish, where necessary, to support the science of this coming period. One matter we are contending with is the shift in our science towards greater reliance on informatics and statistical genetics. We need to provide more informatics office space for our projects as well as refurbishing our laboratories to suit the types of ‘wet-science’ envisaged in our plans.

In our five year budget we have made provision for the refurbishment of our laboratories and work spaces. We have £1m each year for building projects in our budget and we will use this for laboratory conversions and maximising the office space we can provide for our informatics science.

However, the Sulston Laboratories and the associated West Pavilion are now fifteen years old. They are starting to fall short in providing the quality environment and image that is a hallmark of the Campus. The more recently constructed SouthField buildings provide a marked contrast in terms of architecture, accommodation and communication spaces. We are therefore suggesting a budget of £1m a year for projects to improve the external and internal look and quality of the older Institute buildings to bring them up to the standard of the SouthField buildings and preserve the building investment the Wellcome Trust has made at the Hinxton Campus.

The Wellcome Trust already makes available a ‘Lifecycle Budget’ to the Campus for the replacement of worn-out mechanical and electrical systems and the repair of fabric deterioration. This proposal is to complement this investment with a series of projects to modernise the look of the buildings and provide state-of-the-art accommodation that addresses up-to-date considerations such as informal communication areas, sustainable work areas and environmentally friendly, energy-efficient workplaces.

We will consult an architect and produce a programme to modernise and provide a face-lift for our older buildings and produce an action plan for these improvements. The areas and issues we will address are:

• Sulston building external look and image • Sulston reception and the Campus library area • Provision of informal communication areas and upgrading of meeting areas • Energy conservation ideas such as changed glazing and blinds • Modern sustainability ideas for laboratory and office spaces • Improvement of ceilings, flooring, lighting, communal areas to bring a revitalised and modern feel to our buildings.


7.5 Other funding

Supplementary to the envelope funding request, the Institute receives funding from other sources, either from the Wellcome Trust as out-of-envelope grants or from third parties.

The out-of-envelope grants from the Wellcome Trust are for collaborative projects where the Institute is the key lead and/or is providing access to core facilities (e.g. sequencing). Currently the Institute has a number of those awards including: • 10,000 UK genome sequences: accessing the role of rare genetic variants in health and disease – a Strategic Award • Discovery of drug-sensitising genotypes in human cancer cells – a Strategic Award (Technology Transfer) • Deciphering Developmental Disorders – a Health Innovation Challenge Fund Award • Quantifying disease burden in patients with cancer using tumour-specific genomic rearrangements – a Health Innovation Challenge Fund Award

It is anticipated that the Institute will continue to have opportunities to take part, as a member of a collaborative group, in projects funded by the Wellcome Trust. However, it is difficult to estimate the level of funding that will come from that source.

Third party funding has become increasingly important to the Institute as there is an expectation that some portion of its activities will be supported by funders other than the Wellcome Trust. Most of the Institute’s activities are supported by the core budget, so we do not expect our Faculty to spend a significant portion of their time raising funds; rather, we advise pursuit of funding on a strategic and opportunistic basis. Ideally we prefer to have a few large awards rather than many smaller ones. Third party funding can support new projects that were not contemplated when the core budget was prepared, and it can add to existing projects and extend their scope, leveraging the support from the Wellcome Trust.

In the current five year Plan the target for third party funding was 13% and the Institute looks set to achieve around 17%. The plan for the next five years is to increase that to 20% per annum. With the global recession and the cessation of the various stimulus packages injected by governments around the world, it is expected that the funding environment in 2011 – 2016 will be challenging. It will be important for the Institute to be able to demonstrate to funders the unique value of making major grants to the Institute for large scale science.

The chart below shows the percentage split of funding between sources from 2007 – 2016.

7.6 Sensitivity analysis

The budget request is based on a number of underlying assumptions. Given that the view forward is to the year 2016, there will be some uncertainty about those assumptions.

For New Technologies in the core facilities, the impact is assumed to be constant over the five years, with steady improvements and/or increased capacity at a constant cost, i.e. each year more is delivered for the same budget. This assumption covers the costs of sequencing and the costs and requirements of IT storage and compute.

There are also senior recruitments in progress in Human Genetics and in Mouse and Zebrafish Genetics. Both posts are strategically important. It is anticipated that the overall budget and high level strategy would not change as a result of appointments and that any shifts in emphasis within the scientific projects will not affect the total core funded resource requirements.

The budget split between types of expenditure is shown below:

[Pie chart: Staff salaries 42%; Other staff costs 2%; Premises 12%; Travel 1%; Capital, Flex fund, Consumables, Sequencing consumables and Equipment maintenance together 43%.]

7.7 Management of budget

Budget responsibility will rest with the same Principal Investigator (PI) who is responsible for managing the scientific projects. There are also budget holders for core facilities and for corporate services. The PIs who hold budgets are also responsible for managing positions and for ensuring that headcount fits both the funding and the space available.

The BoM will be responsible for monitoring the actual and projected expenditure project by project and for any viring between years or aims and projects within Programmes. The BoM will also have responsibility to set annual budgets for the Institute including non-core funding and for the approval of expenditure against the flex fund.

The Institute introduced a new pay framework in 2006. This facilitates planning for staff resources and provides a stable context for budgeting staff costs in the future. The framework is subject to periodic review and benchmarking against external comparators to ensure the Institute remains competitive in the employment market and can continue to attract and retain talented scientists and support staff at all levels. The prevailing economic climate will continue to be a major influence on our decision-making in the area of reward, including pensions, and, importantly, our budget assumptions provide the necessary tolerances to respond to external factors.
