MRC - The

DOCTORAL TRAINING PROGRAMME

IN PRECISION MEDICINE

PROJECT PROPOSALS

March 2016

Information on the DTP in Precision Medicine: http://www.ed.ac.uk/medicine-vet-medicine/postgraduate/research-degrees/phds/precision-medicine

Apply online: http://www.ed.ac.uk/studying/postgraduate/degrees/index.php?r=site/view&id=919

Project title:

Towards patient stratification at point of care using rapid microRNA detection

Project description:

Background

Paracetamol (acetaminophen) overdose is one of the most common reasons for emergency hospital attendance and is the leading cause of acute liver failure in the Western world. Annually across the UK paracetamol overdose results in approximately 100,000 Emergency Department presentations and 50,000 acute hospital admissions. To put this workload in context, these numbers are larger than admissions due to heart attacks. The current approach to treatment is sub-optimal, predominantly due to an inability to stratify patients and identify those who stand to benefit from treatment. At present, too many patients are admitted to hospital for unnecessary treatment with the antidote (NAC) because current blood tests cannot rule out liver damage in the emergency department. Therefore, there is an unmet clinical need for markers that can target treatment only to those who stand to benefit and rule out liver damage earlier than is currently possible. We have addressed these important limitations by identifying a new blood biomarker (microRNA-122, ‘miR-122’) that is more sensitive and specific for liver injury than all standard tests and accurately reports injury at first presentation to hospital, a time when current blood tests are often still in the normal range (1,2). The key roadblock to further clinical development - specifically using miR-122 in stratified clinical trials led by our pharma partners - is the lack of a rapid point of care assay.

Aims

The proposed PhD project will expand an existing platform that uses electrochemical impedance spectroscopy (EIS) as a point of care test for liquid biopsies, focussing on microRNA biomarkers (3,4). We have demonstrated that EIS detects synthetic miR-122 and reports paracetamol-induced liver injury in pigs and humans; the current project will develop this further, including detection of wider miRNA panels. During the PhD project, the EIS detection platform will be combined in a microfluidic device with an innovative, low-complexity sample preparation step, and the direct, label-free detection approach will be developed further. In the final phase of the project, the integrated assay will be tested using an unrivalled bank of samples from humans, and a range of animal samples including rodents, pigs and zebrafish.

EIS detection of miR-122 has utility beyond our primary clinical indication, paracetamol overdose. Rapid detection of hepatotoxicity has great value in early phase drug development and our application is supported by our pharma partners who are important potential end-users.

Hepatotoxicity is commonly caused by antimicrobial agents. As part of the project the student will perform RNA-Seq studies to identify microRNA markers for bacterial infection to complement our toxicity marker and allow targeted therapy. This work builds on our experience in antibiotic resistance and microRNA.

Training outcomes

The PhD student will be part of a multi-disciplinary team with world-leading expertise in molecular diagnostics, biosensors and clinical toxicology, closely supported by our commercial partner and the university technology transfer office. Specifically, the student will learn RNA-Seq, bioinformatics analysis, microRNA manipulation and clinical study design.

References:

1. Lewis, P. J. S.; Dear, J.; Platt, V.; Simpson, K. J.; Craig, D. G. N.; Antoine, D. J.; French, N. S.; Dhaun, N.; Webb, D. J.; Costello, E. M.; Neoptolemos, J. P.; Moggs, J.; Goldring, C. E.; Park, B. K. Circulating MicroRNAs as Potential Markers of Human Drug-Induced Liver Injury. Hepatology 2011, 54 (5), 1767-1776.

2. Vliegenthart, A. D. B.; Shaffer, J. M.; Clarke, J. I.; Peeters, L. E. J.; Caporali, A.; Bateman, D. N.; Wood, D. M.; Dargan, P. I.; Craig, D. G.; Moore, J. K.; Thompson, A. I.; Henderson, N. C.; Webb, D. J.; Sharkey, J.; Antoine, D. J.; Park, B. K.; Bailey, M. A.; Lader, E.; Simpson, K. J.; Dear, J. W. Comprehensive microRNA profiling in acetaminophen toxicity identifies novel circulating biomarkers for human liver and kidney injury. Scientific Reports 2015, 5.

3. Corrigan, D. K.; Schulze, H.; Henihan, G.; Hardie, A.; Ciani, I.; Giraud, G.; Terry, J. G.; Walton, A. J.; Pethig, R.; Ghazal, P.; Crain, J.; Campbell, C. J.; Templeton, K. E.; Mount, A. R.; Bachmann, T. T. Development of a PCR-free electrochemical point of care test for clinical detection of methicillin resistant Staphylococcus aureus (MRSA). Analyst 2013, 138 (22), 6997-7005.

4. Huang, J. M.; Henihan, G.; Macdonald, D.; Michalowski, A.; Templeton, K.; Gibb, A. P.; Schulze, H.; Bachmann, T. T. Rapid Electrochemical Detection of New Delhi Metallo-beta-lactamase Genes To Enable Point-of-Care Testing of Carbapenem-Resistant Enterobacteriaceae. Anal. Chem. 2015, 87 (15), 7738-7745.

Contact email address(es):

[email protected]

[email protected]

[email protected]

Institute/Centre and/or other useful web addresses:

The University of Edinburgh, College of Medicine and Veterinary Medicine, Edinburgh Medical School – Biomedical Sciences, Division of Infection and Pathway Medicine

www.research.ed.ac.uk/portal/en/persons/till-bachmann(4c731049-5ce7-4f71-9984-fb216ee36fab).html

www.ed.ac.uk/pathway-medicine

www.cvs.ed.ac.uk/users/james-dear

www.ed.ac.uk/pathway-medicine/our-staff/staff-profiles/drkatetempleton

Project title:

Identification of therapeutically-relevant patient subgroups from clinical and biological data

Project description:

The student appointed to this project will develop the skills to extract useful clinical and biological insights from high-dimensionality datasets. Background knowledge in biological sciences and statistics will be required. Training in computational methods and biological techniques will be provided. If successful, this project may identify new treatments and diagnostic tests that can be directly evaluated in clinical practice.

Medicine advances by identifying important similarities between patients. We treat a future patient by making a prediction based on similarity with past patients. Until recently, similarities between patients could only be identified using easily observable features, compiled into patterns in the memory of an observant clinician. Now we have the technology to record and analyse millions of patient measurements.

Many clinical syndromes are loose groupings of patients who have relatively little in common. Perhaps the best example is sepsis, a frequently fatal condition that accounts for 30% of admissions to intensive care units in the UK. Sepsis is a final common pathway from severe infection. It can be caused by infection of any organ with any of an extremely wide range of pathogens. These infections are clearly different, but because the patients are clinically similar, they are all treated as a single disease. If we could stratify patients with sepsis, we could treat them better with drugs that already exist: targeted, narrow-spectrum antibiotics that would eliminate the causative organism without destroying commensals, whilst minimising antibiotic resistance.

The student will develop and evaluate network methods for the detection of subgroups of patients sharing important biological similarities. Initially, analyses will focus on sepsis, before moving on to more generalisable analyses of clinical trial data.

Our previous work has employed sophisticated network analysis tools to detect biologically important subgroups of regulatory regions in the human genome (1), and clinically-distinct syndromes of acute mountain sickness (2), leading to a revision of the global consensus criteria for this condition.

We have extended this theme in unpublished work employing a novel method, exhaustive observation of network space (EONS). When applied to a group of patients with various types of sepsis, for whom high-resolution biological data were available, EONS detects a clear separation of patients with sepsis caused by different types of bacteria (Gram-positive vs. Gram-negative). This signal was not detected by the authors of the original study (3).
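To make the general idea of network-based subgroup detection concrete, the following is a minimal sketch in Python. It is not EONS, which is unpublished and not described here; it is a generic illustration that links patients whose measurement profiles are strongly correlated and reports the connected components of the resulting network as candidate subgroups. The function names, the correlation threshold and the toy patient data are all assumptions made for illustration only.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length measurement profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no correlation information
    return sxy / sqrt(sxx * syy)


def patient_subgroups(profiles, threshold=0.8):
    """Group patients into connected components of a similarity network.

    profiles: dict mapping patient id -> list of measurements.
    threshold: hypothetical cut-off; an edge joins two patients whose
    profiles correlate at or above this value.
    """
    parent = {p: p for p in profiles}  # union-find forest

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path compression
            p = parent[p]
        return p

    for a, b in combinations(profiles, 2):
        if pearson(profiles[a], profiles[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two components

    groups = {}
    for p in profiles:
        groups.setdefault(find(p), []).append(p)
    return sorted(sorted(g) for g in groups.values())


# Toy data (invented): two pairs of patients with opposite trends.
toy_profiles = {
    "P1": [1.0, 2.0, 3.0, 4.0],
    "P2": [1.1, 2.1, 2.9, 4.2],
    "P3": [4.0, 3.0, 2.0, 1.0],
    "P4": [4.2, 2.9, 2.1, 0.9],
}
subgroups = patient_subgroups(toy_profiles)
print(subgroups)
```

A single fixed threshold is the simplest possible design choice; in practice the choice of similarity measure and threshold would itself need to be evaluated, which is part of what the project proposes to study.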

In this project, the student will:

1. Optimise and evaluate network analysis methods for detecting subgroups of patients with sepsis using existing datasets. This will be published during year 1.

2. Generate and analyse additional data from high-resolution phenotyping of confirmed bacteraemic patients in critical care. In addition to detailed clinical information, we will employ a high-resolution transcriptome sequencing methodology, CAGE, which we have recently shown is able to detect cell type-specific promoters and enhancers in numerous different cell types (4), thus enabling the detection of many additional biologically-important signals in patient samples.

3. Employ network methods to detect therapeutically-important subgroups in data from clinical trials, first in permuted data, then in data from completed clinical trials with various levels of biological phenotyping.

References:

1. Forrest, A. R. R., Kawaji, H., Rehli, M., Baillie, J.K., et al. A promoter-level mammalian expression atlas. Nature 507, 462–470 (2014).

2. Hall, D. P. et al. Network Analysis Reveals Distinct Clinical Syndromes Underlying Acute Mountain Sickness. PLoS ONE 9, e81229 (2014).

3. Tang, B. M., Huang, S. J. & McLean, A. S. Genome-wide transcription profiling of human sepsis: a systematic review. Crit. Care 14, R237 (2010).

4. Arner, E. et al. Transcribed enhancers lead waves of coordinated transcription in transitioning mammalian cells. Science 347, 1010–1014 (2015).

Contact email address(es): [email protected] [email protected]

Institute/Centre and/or other useful web addresses:

http://baillielab.net/eons/studentship (requires modern browser: chrome/firefox/safari)

http://www.roslin.ed.ac.uk/ken-baillie/

http://www.cphs.mvm.ed.ac.uk/people/staffProfile.php?profile=nlone

Project title:

Exploring the complexity of cytoplasmic long non-coding RNA in vascular disease

Project description:

The vascular smooth muscle cell is essential to the maintenance of vessel wall structure and function. It is, however, intimately involved in abnormalities of blood vessel function, including the adverse remodelling associated with remodelled vascular grafts, aortic aneurysms and small vessels associated with pulmonary hypertension.

We propose that a radical new approach will be required to identify and quantify the molecular cues that control the function of the vascular smooth muscle cell. Approximately 60% of the human genome is transcribed into protein-coding transcripts, yet only 1.5% actually encodes proteins. Large numbers of RNAs which appear not to encode protein are transcribed from loci that intervene between protein-coding genes. These long noncoding RNAs (lncRNAs) may be essential in modulating gene expression networks that underpin cell function. To understand and unravel this molecular complexity requires an interdisciplinary approach using novel datasets, computational approaches and assessment of lncRNA function in cellular and whole organism experiments. In this project, we will seek to determine whether lncRNAs contribute substantially to smooth muscle cell function and whether this might be exploited for new advances in therapy.

Aims:

(1) Aim 1: Generation of key datasets to inform computational approaches. Using our established in vitro model of primary human vascular smooth muscle cells, we will predict lncRNA using RNAseq datasets (generated by HiSeq 4000) from quiescent cells stimulated with pathogenic growth factors and cytokines which mimic the in vivo response to vascular injury. Critically, we envisage identifying lncRNA only from cytoplasmic cellular fractions because cytoplasmic lncRNA control the cellular transcriptome by modulating mRNA abundance via microRNA-mediated “sponging” (or “decoying”) interactions.

(2) Aim 2: Perform transcriptional network analysis to identify causal pathways. Computationally identify lncRNA and mRNA that are both significantly correlated in expression change, and contain unexpectedly high densities of predicted shared microRNA-binding sites. The transcript abundance of such lncRNA-mRNA pairs could be modulated co-ordinately by microRNAs, thereby altering key downstream cellular processes.

(3) Aim 3: Generate proof-of-concept experiments in primary human cell culture. We will use our expansive in vitro models to ascertain the expression, regulation and function of candidate lncRNA, using qRT-PCR, RNA-FISH and other techniques, potentially using CRISPR-Cas9 approaches. Biotinylated lncRNA will be used for pull downs to identify and quantify binding partners and their interaction kinetics and downstream consequences.

(4) Aim 4: Validate in vitro concepts using intact vessel injury models in vivo and ex vivo. We will use mouse models (genetic/injury models) to assess functional consequence of manipulation in the intact vessel wall.

(5) Aim 5: Assess relevance to human vascular pathology using material from patients with cardiovascular disease. We possess ethical approval for RNA samples from patients with and without diseases of the vasculature, and will use these for assessment of lncRNA expression.

Training outcomes: The student will develop an interdisciplinary skill set, as evidenced by the complementary research of both supervisors (Tan et al. 2014, Deng et al. 2015, McDonald et al. 2015, Tan et al. 2015) involving in depth and transcriptomics, human physiology and pathology.
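As an illustration of the kind of screen outlined in Aim 2, the sketch below pairs each lncRNA with each mRNA, computes the correlation of their expression changes across conditions, and counts predicted microRNA families shared by both transcripts per kilobase of combined length. It is a simplified sketch under stated assumptions, not the supervisors' pipeline: the data layout, the function names, the thresholds `min_r` and `min_density`, and the toy transcripts and microRNA labels are all hypothetical. A real analysis would replace the fixed thresholds with significance tests against a suitable null model and a multiple-testing correction.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def shared_site_density(mirnas_a, mirnas_b, kb_a, kb_b):
    """Shared predicted microRNA families per kb of combined length."""
    return len(set(mirnas_a) & set(mirnas_b)) / (kb_a + kb_b)


def candidate_pairs(lncrnas, mrnas, min_r=0.9, min_density=0.5):
    """Return (lncRNA, mRNA, r, density) tuples passing both cut-offs.

    min_r and min_density are illustrative thresholds only.
    """
    hits = []
    for l_name, l in lncrnas.items():
        for m_name, m in mrnas.items():
            r = pearson(l["expr"], m["expr"])
            d = shared_site_density(l["mirnas"], m["mirnas"], l["kb"], m["kb"])
            if r >= min_r and d >= min_density:
                hits.append((l_name, m_name, round(r, 3), round(d, 3)))
    return hits


# Toy input (invented): expression change across four conditions,
# predicted microRNA families, and transcript length in kb.
lncrnas = {
    "lncA": {"expr": [0.1, 0.8, 1.5, 2.2], "mirnas": {"miR-a", "miR-b", "miR-c"}, "kb": 2.0},
    "lncB": {"expr": [1.0, 1.1, 0.9, 1.0], "mirnas": {"miR-d"}, "kb": 1.0},
}
mrnas = {
    "mRNA1": {"expr": [0.2, 1.0, 1.6, 2.5], "mirnas": {"miR-a", "miR-b"}, "kb": 2.0},
    "mRNA2": {"expr": [2.0, 1.5, 1.0, 0.4], "mirnas": {"miR-a", "miR-b"}, "kb": 2.0},
}
hits = candidate_pairs(lncrnas, mrnas)
print(hits)
```

With this toy input only the lncA/mRNA1 pair is both positively correlated and shares enough predicted sites to pass the screen.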

References:

Deng, L., et al. (2015). "miR-143 Activation Regulates Smooth Muscle and Endothelial Cell Crosstalk in Pulmonary Arterial Hypertension." Circulation Research 117(10): 870-883. (doi:10.1161/circresaha.115.306806).

McDonald, R. A., et al. (2015). "Reducing In-Stent Restenosis: Therapeutic Manipulation of miRNA in Vascular Remodeling and Inflammation." Journal of the American College of Cardiology 65(21): 2314-2327. ( http://www.sciencedirect.com/science/article/pii/S0735109715016496 ).

Tan, J. Y., et al. (2015). "Extensive microRNA-mediated crosstalk between lncRNAs and mRNAs in mouse embryonic stem cells." Genome Research 25(5): 655-666. ( http://genome.cshlp.org/content/25/5/655.full )

Tan, J. Y., et al. (2014). "Crosstalking noncoding RNAs contribute to cell-specific neurodegeneration in SCA7." Nature Structural & Molecular Biology 21(11): 955-961. ( http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4255225/ )

Contact email address(es):

Professor Andrew Baker - [email protected]

Professor Chris Ponting - [email protected]

Institute/Centre and/or other useful web addresses:

http://www.cvs.ed.ac.uk/users/andrew-baker (under construction)

http://www.hgu.mrc.ac.uk/people/chris.ponting.html

Project title:

Induced TEC as a tool for repairing adaptive immunity in patients

Project description:

Several primary immunodeficiencies (PID) arise from problems in development or function of the thymus, the organ responsible for T cell production throughout the lifespan. An example is DiGeorge Syndrome (DGS), one of the most common congenital abnormalities, where affected individuals can partially or wholly lack thymus function. Diminished thymus function is also a major complication in adult haematopoietic stem cell transplantation (HSCT). Proof-of-principle that thymic deficiency can be treated by thymus transplantation is well established, but currently depends on transplantation of neonatal human thymus tissue and therefore is limited by tissue availability. T cell development depends on a set of highly specialized epithelial cells found only in the thymus (thymic epithelial cells; TEC). Through previous work, we identified the forkhead transcription factor (TF), FOXN1, as a master regulator of TEC differentiation (Nowell, 2011). Recently, we showed that FOXN1 up-regulation can rejuvenate the thymus in aged animals (Bredenkamp 2014a). We also showed that forced expression of FOXN1 in a non-thymic cell type converts these cells into functional TEC (Bredenkamp 2014b). The ‘induced TEC’ (iTEC) can support T cell development in vitro and, on transplantation, generate a properly organized and fully functional thymus that can populate the immune system with functional T cells (Bredenkamp, 2014b). By providing readily available, transplantable TEC, iTEC may therefore overcome the current barriers to broader clinical use of thymus transplantation.

The aim of this PhD project is to determine whether iTEC-based thymus transplantation can be used to treat thymus-dependent primary immunodeficiency. You will test whether transplantation of autologous iTEC-derived thymi can repair the immune defect in a mouse model of DGS, by investigating the quality and functionality of the T cell repertoires produced, and the stability and longevity of the iTEC-derived thymic transplants. You will also use genome editing in conjunction with iTEC to test whether gene-corrected mouse and/or human DGS iPS cells can generate functional iTEC upon transplantation. If time permits, you will also explore clinically compatible methods for iTEC generation, including the development and use of synthetic TF as reprogramming tools.

You will be trained in stem cell biology and relevant aspects of cellular immunology. You will develop skills and receive training in analysis of cell phenotype and function, including generation and analysis/visualization of RNAseq and ChIPseq data; cellular reprogramming techniques; genome editing using CRISPR/Cas9; analysis of immune system functionality; and advanced imaging, advanced flow cytometry, and gene expression analysis by RT-qPCR. All of the required approaches are available within the Blackburn lab or through one of the secondary supervisors. The student will also benefit from training and exchange opportunities available through the EU-funded consortium ThymiStem (www.thymistem.org), in the CRM and the wider University, and from close collaboration with Prof SE Jacobsen at the Karolinska Institute, Stockholm.

References:

Ulyanchenko, S., O’Neill, K.E., Medley, T., Farley, A.M., Vaidya, H.J., Cook, A.M., Blair, N.F., and Blackburn, C.C. (2016) Identification of a bipotent epithelial progenitor population in the adult thymus. Cell Reports, in press.

Bredenkamp, N., Jin, X., Liu, D., O'Neill, K.E., Manley, N.R., and Blackburn, C.C. (2015) Construction of a functional thymic microenvironment from pluripotent stem cells for the induction of central tolerance. Regen Med. 10:317-29. PMID: 25933240

Bredenkamp, N., Nowell, C.S., and Blackburn, C.C. (2014a) Regeneration of the aged thymus by a single transcription factor. Development 141:1627-1637. PMID: 24715454

Bredenkamp, N., Ulyanchenko, S., O’Neill, K.E., Manley, N.R., Vaidya, H.J., and Blackburn, C.C. (2014b) An organized and functional thymus generated from FOXN1-reprogrammed fibroblasts. Nature Cell Biology 16:902-8. PMID: 25150981

Farley, A.M., Morris, L.X., Vroegindeweij, E., Depreter, M.L.G., Vaidya, H., Stenhouse, F.H., Tomlinson, S.R., Anderson, R.A., Cupedo, T., Cornelissen, J., and Blackburn, C.C. (2013) Dynamics of thymus organogenesis and colonization in early human development. Development 140:2015-26. PMID: 23571219

Nowell, C.S., Bredenkamp, N., Tetélin, S., Jin, X., Tischner, C., Vaidya, H., Sheridan, J.M., Stenhouse, F.H., Heussen, R., Smith, A.J.H., and Blackburn, C.C. (2011) Foxn1 regulates lineage progression in cortical and medullary thymic epithelial cells but is dispensable for medullary sublineage divergence. PLOS Genetics 7:e1002348. PMID: 22072979

Bonfanti, P., Claudinot, S., Amici, A.W., Farley, A., Blackburn, C.C., and Barrandon, Y. (2010) Microenvironmental reprogramming of thymic epithelial cells to skin multipotent stem cells. Nature 466:978-982. PMID: 20725041

Contact email address(es): [email protected]

Institute/Centre and/or other useful web addresses: http://www.crm.ed.ac.uk/research/group/thymus-generation-and-regeneration

Project title:

Assessment of clinical discriminatory performance of CRC pathological staging and related uncertainty using omic data and Bayesian methodology

Project description:

Colorectal cancer (CRC) is the second most common cause of cancer death in the UK. Whilst pathological staging stratifies prognostic groups, it does not perform well in categorising poor/good prognosis tumours at individual level, and the genetic predisposition to CRC prognosis has not been extensively investigated. In this project we will combine data from the UK Biobank and Scottish Colorectal Cancer Study (SOCCS), and will fit robust statistical models to investigate the association of genotype with mortality, adjusting for age, gender and stage of cancer and accounting for the tumour MSI status. The clinical variation within pathological staging groups with respect to prognostic categorization and chemotherapy administration at individual level will also be addressed. We aim to appraise the clinical usefulness of identified genetic, socio-demographic and biomarker risk factors in discriminating high and low risk tumours within a given stage, and also to help inform the decision on chemotherapy administration, given side effects and cost implications. We will assess statistical measures related to predictive ability and prognostic accuracy using methodology relying on posterior predictive checking and latent residual processes, under a Bayesian statistical approach that allows for the uncertainty associated with predictions to be quantified. The application of such methodology in the context of this project is novel and at the forefront of current practice.
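For readers unfamiliar with posterior predictive checking, the following minimal Python sketch shows the basic mechanics on a deliberately simple model. It is an illustration only and does not reproduce the latent residual methodology mentioned above. It assumes a single shared mortality rate across several patient groups with a Beta prior (a conjugate Beta-Binomial model), draws rates from the posterior, simulates replicated death counts, and asks how often the replicated spread in group mortality is at least as large as the observed spread. The function name, the prior parameters `a` and `b`, the test statistic and the toy counts are all assumptions.

```python
import random


def posterior_predictive_pvalue(deaths, sizes, n_draws=2000, a=1.0, b=1.0, seed=0):
    """Posterior predictive p-value for a single-rate Beta-Binomial model.

    deaths, sizes: observed deaths and group sizes, one entry per group.
    a, b: Beta prior parameters (a flat prior here, as an assumption).
    Test statistic: range of group-level mortality proportions.
    """
    rng = random.Random(seed)
    total_deaths, total_n = sum(deaths), sum(sizes)

    def spread(counts):
        rates = [d / n for d, n in zip(counts, sizes)]
        return max(rates) - min(rates)

    observed = spread(deaths)
    exceed = 0
    for _ in range(n_draws):
        # Draw the shared rate from its Beta posterior.
        p = rng.betavariate(a + total_deaths, b + total_n - total_deaths)
        # Simulate replicated death counts for each group at that rate.
        replicated = [sum(rng.random() < p for _ in range(n)) for n in sizes]
        if spread(replicated) >= observed - 1e-12:
            exceed += 1
    return exceed / n_draws


# Toy counts (invented): three groups of 50 patients each.
p_similar = posterior_predictive_pvalue([10, 12, 11], [50, 50, 50])
p_different = posterior_predictive_pvalue([2, 10, 25], [50, 50, 50])
print(p_similar, p_different)
```

A p-value near zero, as expected for the second toy dataset, indicates that a single shared rate cannot reproduce the observed differences between groups, which is the kind of model inadequacy a posterior predictive check is designed to expose.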

The project will be funded by a DTP in Precision Medicine (http://www.ed.ac.uk/medicine-vet-medicine/news-events/latest-news/mrc-award-for-precision-medicine-doctoral-training) and will be supervised jointly by the Usher Institute of Population Health Sciences and Informatics, University of Edinburgh and the School of Mathematical and Computer Sciences, Heriot-Watt University.

Training outcomes

The student will be trained in the core disciplines of informatics, data science, mathematics and statistics placed in the CRC context and will develop skills related to novel analytical and quantitative methods underpinning modern data-driven approaches. They will also get further cross-disciplinary biomedical science training to determine knowledge value.

References:

1. Dunlop MG, Dobbins SE, et al. Common variation at 6p21 (CDKN1A), 11q13.4 (POLD3) and Xp22.2 influences colorectal cancer risk. Nature Genetics, 2012; 44: 770-776 (PMID: 22634755)

2. Phipps AI, Passarelli MN et al. 2015. Common Genetic Variation and Survival after Colorectal Cancer Diagnosis: A Genome-Wide Analysis. Carcinogenesis. 2016; 37(1):87-95 (PMID: 26586795)

3. Smith CG, Fisher D, et al. Analyses of 7,635 Patients with Colorectal Cancer Using Independent Training and Validation Cohorts Show That rs9929218 in CDH1 Is a Prognostic Marker of Survival. Clin Cancer Res. 2015;21: 3453-61 (PMID: 25873087)

4. Zgaga L*, Theodoratou E*, et al. Plasma Vitamin D Concentration Influences Survival Outcome Following a Diagnosis of Colorectal Cancer. Journal of Clinical Oncology, 2014; 32(23):2430-9 (PMID: 25002714)

5. Lau, M.S.Y., Marion, G., Streftaris, G. and Gibson, G.J. New model diagnostics for spatio-temporal systems in epidemiology and ecology. Journal of the Royal Society Interface, 2014; 11: 20131093 (DOI 10.1098/rsif.2013.1093)

Contact email address(es): [email protected] [email protected] [email protected]

Institute/Centre and/or other useful web addresses:

Usher Institute of Population Health Sciences and Informatics, University of Edinburgh

Institute of Genetics and Molecular Medicine, University of Edinburgh

School of Mathematical and Computer Sciences, Heriot-Watt University

Project title:

Single cell heterogeneity in tumour suppression and ageing

Project description:

Why do cells age, and not live for ever? We are seeking an enthusiastic student who would use the latest single cell genomic and epigenomic approaches to address this question.

Cellular ageing is a heterogeneous field of research, drawing insights from diverse model systems such as cells isolated from older individuals or from prematurely ageing (progeroid) patients. Other approaches are based on the observation that cells isolated from healthy young individuals stop dividing after a certain number of cell divisions (replicative senescence) or after exposing the cells to certain cellular stresses (stress-induced senescence, which also plays a key role in tumour suppression). Many rival theories exist on the cause of ageing, and many scientists today agree that there is no single cause of age-related phenotypes. One proposed mechanism of ageing is a gradual loss of epigenetic control and maintenance (for example through a reduction in maintenance methylation). A stochastic loss of epigenetic maintenance would create a more heterogeneous cell population, leading in some cells to a loss of functionality. While the loss of epigenetic control is an attractive concept, it was very difficult to address before the advent of high-throughput single cell technologies.

This project will test this hypothesis by applying state-of-the-art single cell genomic and epigenomic profiling to compare the heterogeneity of daughter cells from isolated stem cells and fibroblasts. We will contrast cells isolated from young and old individuals, cells from patients with premature ageing syndromes, and cells subjected to cellular stress, such as oncogene activation. We will also test the effect of modulating epigenetic control and nuclear landscape on cellular heterogeneity using small molecules and transduced RNA interference. Finally, we will attempt to rescue the loss of epigenetic control in these cellular ageing models through epigenetic interventions. The computational analysis of the data will be jointly supervised by Professor Chris Ponting.

Since this project involves experimental as well as computational genomics, we would like to encourage students with quantitative or biological backgrounds to apply for the position.

References:

Chandra T et al. Global Reorganization of the Nuclear Landscape in Senescent Cells. Cell Rep 2015; 10:471–83.

Chandra T et al. Independence of repressive histone marks and compaction during senescent heterochromatic layer formation. Mol Cell 2012; 47:203–14.

Macaulay IC et al. G&T-seq: parallel sequencing of single-cell genomes and transcriptomes. Nat Methods 2015; 12(6):519-22

Lay summary of our previous work

Contact email address(es) (hyperlinked):

Dr Tamir Chandra

Institute/Centre and/or other useful web addresses:

MRC Human Genetics Unit

Institute for Genetics and Molecular Medicine

Project title:

Precision medicine approaches to prevent colorectal cancer: Characterising the molecular determinants of vitamin D effects on chromatin accessibility, transcription factor binding and the transcriptomic landscape of human large bowel mucosa.

Project description:

Colorectal cancer (CRC) is a common fatal cancer (41,000 cases/yr in UK). Compelling evidence indicates that CRC is largely preventable. We seek to fully elucidate the underlying genetic and environmental basis of the disease, with the ultimate aim of prevention through modifying causally implicated molecular pathways.

Employing genome-wide association (GWAS) approaches, we identified 35 common genetic risk loci that each impart modest CRC risk. Tagging SNPs lie almost exclusively outwith coding regions; many affect genomic control regions, impacting on gene expression in large bowel mucosa. These eQTLs are likely influenced by many factors, including transcription factor binding, dynamics of chromatin accessibility and differential methylation, but also by effects mediated through the vitamin D receptor (VDR), a ligand-activated transcription factor. We have also shown that vitamin D deficiency is strongly associated with CRC risk (2). Furthermore, in epidemiological studies, we have shown that vitamin D interacts with DNA sequence variants to modify CRC risk, as well as cancer survival (3). We have shown that calcitriol (active vitamin D) modifies gene expression in large bowel epithelial organoids, whilst published in vitro ChIPseq experiments have identified hundreds of VDR response elements influenced by calcitriol (4).

This project will investigate the relationship of vitamin D and genomic control elements and apply “precision medicine” principles to define DNA sequence and/or transcriptomic features in human large bowel epithelium that impart vitamin D responsiveness. The project offers cross-cutting training in computational genomics analytical skills, linked with opportunity for wet-lab experience in a vibrant, well-funded, laboratory environment.

The lab is focused on understanding the molecular basis of CRC and intervening to modify risk. We have assembled an unparalleled human sample resource and affiliated datasets, including gene expression datasets from >500 large bowel samples (array-based and total RNAseq), GWAS and exome genotyping array data, whole genome sequencing, ATACseq from large bowel epithelium and cell lines, ChIP and ChIPseq. This is linked to ongoing human large bowel epithelial organoid culture experiments that enable direct testing of the validity of any genomic features identified by data analysis approaches.

Specific aims

• To interrogate whole genome sequencing and genotyping array data and describe (cis- and trans-) genomic control regions influenced by vitamin D (vitD-eQTL) in human large bowel epithelium, in relation to CRC risk loci.

• To define the mechanism underlying vitD-eQTL effects through analysis of matched plasma/blood DNA/normal mucosa samples from cancer subjects and cancer-free subjects and relate this to expression data (array and RNAseq) and chromatin accessibility (ATACseq) data.

• To describe and calibrate the molecular signature of responsiveness to the effects of vitamin D on gene expression and validate these associations through analysis of human rectal biopsy samples collected for a vitamin D intervention study (before/after treatment) using gene expression as readout.

The student will be trained in computational genomics, analytical techniques and statistical methodologies required for genome data science. He/she will also be exposed to wet-lab biomedical science training.

The successful candidate is expected to have a 1st or 2:1 BSc degree (or equivalent). An MSc in biomedical sciences/bioinformatics would be preferred.

References:

1. Dunlop MG, et al. Common variation at 6p21 (CDKN1A), 11q13.4 (POLD3) and Xp22.2 influences colorectal cancer risk. Nature Genetics, 2012; 44: 770-776. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4747430/

2. Zgaga L, et al, and Dunlop MG. Plasma Vitamin D Concentration Influences Survival Outcome Following a Diagnosis of Colorectal Cancer. Journal of Clinical Oncology, 2014; 32(23):2430-9. http://jco.ascopubs.org/content/32/23/2430.long

3. Zgaga L, et al. Model selection approach suggests causal association between 25-hydroxyvitamin D and colorectal cancer. (2013) PLoS One 8:e63475. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3663843/

4. Ramagopalan SV et al. A ChIP-seq defined genome-wide map of vitamin D receptor binding: associations with disease and evolution. Genome Res. 2010 Oct;20(10):1352-60. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2945184/

Contact email address(es):

Prof Malcolm Dunlop - [email protected]

Professor Chris Ponting - [email protected]

Dr Martin Taylor - [email protected]

Institute/Centre and/or other useful web addresses: http://www.hgu.mrc.ac.uk/people/m.dunlop.html http://www.hgu.mrc.ac.uk/people/chris.ponting.html http://www.hgu.mrc.ac.uk/people/taylor.html https://www.hgu.mrc.ac.uk/ http://www.igmm.ac.uk/


Project title:

Mining phenotypic data to aid genetic diagnoses in developmental disorders

Project description:

Deciphering Developmental Disorders (DDD) is a project jointly funded by the Wellcome Trust and the National Institute for Health Research (NIHR). It is the largest of its kind and aims to identify causative genetic variants in children with currently undiagnosed severe or extreme developmental disorders. Samples from ~14,000 probands and both parents have been collected from clinical genetics services throughout the UK and Ireland. Associated with each of the affected individuals is detailed clinical information, which includes ontologically-coded clinical terms and quantitative information on growth and developmental milestones. Validated methods have been developed to filter genomic variants in order to enrich for causative mutations [1, 2]. However, the next challenge is to incorporate the rich phenotypic data into the computational approaches to finding new disease loci [3]. This MRC-funded Doctoral Training Programme (DTP) in Precision Medicine project will have access to the unique data resources of DDD, including genomic data, quantitative and terminologically-encoded phenotypic data, and, potentially, data on probands' facial features from image processing. Machine learning methods are increasingly being incorporated into data processing pipelines for the interpretation of such complex genomic data sets. Challenges to be addressed here include defining features that capture the detail of the genetic and other data appropriately, and the selection and combination of learning algorithms. The supervisors of this project have substantial experience in optimizing machine learning approaches to integrative modeling in genomics [4].

Our aim is to identify robust systems that can detect distinctive patterns of clinical parameters that predict a molecularly defined diagnosis. Distance estimates can then be used to rank candidate variants from a genome-wide sequence analysis of an individual on the basis of the available phenotypic information. Initial analyses will utilise advanced statistical and machine learning algorithms for regression and classification to deepen our understanding of the links between mutations affecting open reading frames (single nucleotide variants (SNVs), larger structural variants) and phenotype. This will then be extended to include putative regulatory variants involving noncoding regions. We will also assess the power of analytical models to predict disease-associated genotypes (particularly de novo variants) using phenotypic data prior to sequence data generation; this approach will have particular utility in “proband only” sequencing strategies that are commonly used in clinical laboratories and hence will have wide applicability.
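The ranking step described above can be illustrated with a minimal sketch. It assumes that each candidate gene comes with a set of ontology-coded phenotype terms and uses the Jaccard distance as a stand-in for whichever distance measure the project eventually adopts. The gene names, term sets and function names are hypothetical and are not taken from DDD data.

```python
def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two sets of phenotype terms."""
    union = a | b
    if not union:
        return 0.0  # two empty profiles are treated as identical
    return 1.0 - len(a & b) / len(union)


def rank_candidates(proband_terms, gene_profiles):
    """Order candidate genes by phenotypic distance to the proband, closest first."""
    scored = [
        (jaccard_distance(proband_terms, terms), gene)
        for gene, terms in gene_profiles.items()
    ]
    return [gene for _, gene in sorted(scored)]


# Hypothetical phenotype profiles, for illustration only.
proband = {"microcephaly", "seizures", "short stature"}
profiles = {
    "GENE_A": {"microcephaly", "seizures", "short stature", "hearing loss"},
    "GENE_B": {"seizures", "macrocephaly"},
    "GENE_C": {"cleft palate"},
}
ranking = rank_candidates(proband, profiles)
```

A set-based distance ignores the hierarchy of an ontology and the quantitative growth data, so a real analysis would probably need a richer similarity measure; the sketch only shows how a distance turns phenotypic information into a ranking of candidates.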

The student will learn and apply concepts from probability and information theory. This will inform their use of existing statistical packages in R, and allow the development of novel analyses. The application of advanced data processing techniques in a medical informatics context will provide the student with a diverse range of transferable skills. Computational approaches to understanding and predicting candidate disease-causing variants are likely to become an important component in genetic diagnostics.


References:

1. Large-scale discovery of novel genetic causes of developmental disorders. Nature, 519 (2015), 223-228. PMID: 25533962

2. Wright CF, Fitzgerald TW, Jones WD et al. Genetic diagnosis of developmental disorders in the DDD study: a scalable analysis of genome-wide research data. Lancet, 385 (2015), 1305-1314. PMID: 25529582

3. Akawi N, McRae J, Ansari M et al. Discovery of four recessive developmental disorders using probabilistic genotype and phenotype matching among 4,125 families. Nat Genet, 47 (2015), 1363-1369. PMID: 26437029

4. Moore BL, Aitken S, Semple CA. Integrative modeling reveals the principles of multi-scale chromatin boundary formation in human nuclear organization. Genome Biol, 16 (2015), 110. PMID: 26013771

Contact email address(es):

Prof David FitzPatrick: [email protected]

Dr Stuart Aitken [email protected]

Prof Colin Semple [email protected]

Institute/Centre and/or other useful web addresses:

Website: http://www.hgu.mrc.ac.uk/Users/

http://www.ddduk.org


Project title:

Role of the Intestinal Microbiota in Clostridium difficile Infection and Recurrence

Project description:

Over the last few years, our knowledge of the diversity of microorganisms that live within and on the human body (the human microbiome) has expanded enormously. The intestinal microbiome is especially abundant and diverse and contributes substantially to gut health, as well as being crucial to the development of the immune system and of other organs such as the brain. Conversely, dysbiosis, a disrupted, generally lower-diversity state of the intestinal microbiota, is associated with bacterial infection, conditions such as inflammatory bowel disease, and the usage of broad-spectrum antibiotics.

Our ever-increasing understanding of how the intestinal microbiome varies between individuals, with age and with diet should allow the design of personalised microbiome-based therapies to prevent or treat dysbiosis with various causes. However, this requires specific knowledge of how variations in the microbiota affect the predisposition to, or outcome of treatment for, dysbiosis. Using patient samples and data from the NHS Lothian Regional Infectious Diseases Unit (RIDU), together with state-of-the-art metataxonomic and metagenomic analysis techniques and bioinformatics, this project seeks to acquire such knowledge for a major healthcare-associated dysbiosis, Clostridium difficile infection (CDI).

During this project, the student will:

• Obtain faecal samples from NHS patients across the adult age range comprising healthy age-matched controls, C. difficile carriers without disease, patients with C. difficile active disease and patients with C. difficile relapsing disease.

• Use metataxonomic profiling of the microbiota within these samples to determine changes in the microbial species composition associated with CDI, CDI relapse and non-pathological carriage of C. difficile in comparison to healthy controls.

• Monitor changes in the intestinal microbiota of individual patients during active CDI, treatment and CDI relapse, and assess the role of age-related microbiota changes in the risk of recurrence (which increases by 17% per decade of age).

• Relate the success of faecal transplantation in restoring normal intestinal microbiota in CDI patients to pre-existing microbiota composition, age and other factors.

• Perform targeted, high-resolution metagenomic analysis of the intestinal microbiota of patients with characteristics of particular interest (e.g. repeated recurrence, carriage of C. difficile without disease) to determine specific genetic markers which can be used in preventative diagnosis or treatment.

• Develop a set of rules for the eventual patient-specific diagnosis (via microbiome profiling), prevention and treatment of CDI using knowledge of the predictive or consequent changes in intestinal microbiota composition obtained during the project.
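The "lower-diversity state" that characterises dysbiosis can be made concrete with a diversity index computed from metataxonomic counts. The following is a minimal sketch that uses the Shannon index; the choice of index, the function name and the read counts are illustrative assumptions and do not come from the project's own pipelines.

```python
import math


def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0  # no reads: report zero diversity rather than fail
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical read counts per taxon, for illustration only.
healthy = [25, 25, 25, 25]   # four taxa, evenly represented
dysbiotic = [97, 1, 1, 1]    # one taxon dominates the sample

h_healthy = shannon_diversity(healthy)
h_dysbiotic = shannon_diversity(dysbiotic)
```

With these invented counts the evenly distributed sample scores higher than the sample dominated by a single taxon, which is the pattern the bullet points above would look for when comparing CDI patients with healthy controls.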

As well as receiving training in molecular biology and microbiology techniques, the student will learn to use standard bioinformatics pipelines to analyse next-generation Illumina sequence data from metataxonomic and metagenomic studies, in addition to statistical techniques to analyse the outputs of these pipelines. S/he will also modify and apply in-house Python scripts used in the Free lab to predict functional characteristics of the microbiota from metataxonomic data, as well as employing techniques to mine metagenomic datasets for functional information. Interactions with both clinicians (NHS Lothian) and mathematical modellers developing general models of microbial functional interaction and specific functional models of the human gut (School of Physics, BioSS) will help develop the student’s skill set further.

References:

• Glendinning, L. and Free, A. (2014) Supra-organismal interactions in the human intestine. Front Cell Infect Microbiol 4: 47. doi: 10.3389/fcimb.2014.00047.

• Buffie, C.G. et al. (2012) Profound alterations of intestinal microbiota following a single dose of clindamycin results in sustained susceptibility to Clostridium difficile-induced colitis. Infect Immun 80: 62-73.

• Buffie, C.G. et al. (2015) Precision microbiome reconstitution restores bile acid mediated resistance to Clostridium difficile. Nature 517: 205-208.

• Theriot, C.M. and Young, V.B. (2015) Interactions between the gastrointestinal microbiome and Clostridium difficile. Annu Rev Microbiol 69: 445-461.

Contact email address(es):

Dr. Andrew Free ([email protected]), School of Biological Sciences, University of Edinburgh (primary supervisor)

Dr. Olga Moncayo ([email protected]), NHS Lothian Laboratory of Medicine and MRC Centre for Inflammation Research, University of Edinburgh (secondary supervisor)

Institute/Centre and/or other useful web addresses:

Free group web page. MRC Centre for Inflammation Research.


Project title:

Predictive immune-pathway biology: investigation of the systemic response to infection and sterile inflammation in early life

Project description:

Infection remains an important cause of morbidity and mortality, especially in early life (1). Natural defenses to infection are mediated by intrinsic/innate and adaptive immune responses. While our understanding is considerable, it is incomplete, and emerging areas of research such as those related to the immune-metabolic axis are only beginning to be appreciated (2,3). Much of this new insight has come recently from studies that combine in-depth, hypothesis-driven experimental approaches with well-designed, unbiased systems-level investigations.

Understanding how neonates and infants respond to infection from a molecular perspective (metabolic and immune aspects) has great future potential and clinical use: it supports the development of host-directed strategies not only to understand but also to diagnose and treat infectious and related inflammatory diseases (1). Such host-directed diagnostic and anti-infective strategies will enable a significant enhancement in clinical precision for diagnosing and ultimately treating infections, and thereby help reduce the non-beneficial and adverse consequences of antibiotic overuse.

This studentship project combines world-leading expertise in immune pathway biology and of infection in humans with groundbreaking work unraveling the beneficial and harmful role of immune-metabolism and molecular profiling of whole blood from patients (Ghazal, Edinburgh Medical School), and with computational statistics and machine learning expertise with pioneering studies in Bayesian modeling of biological pathways (Husmeier, Mathematics and Statistics at Glasgow).

Aims

This studentship will develop systematic review and analyses of in-house databases (over 2000 patient profiles – previously funded by grants from the MRC and CSO to Ghazal) and community resources of clinical molecular and metabolic profiling of infants and children with infectious and sterile inflammation. State-of-the-art methods from Machine Learning and Computational Statistics will be used to infer the signaling, metabolic and regulatory pathways linked to bacterial, viral or sterile inflammatory response by systematic integration of data and prior knowledge from the literature. The inferred pathways will then be tested and validated experimentally, forming part of the iterative precision medicine approach. Finally, this study will explore how these molecular (metabolic and transcriptomic) responses may impact on restricting the provision of antibiotics to a patient.

Training outcomes

The student will learn how to combine hypothesis-driven research questions with data-driven computer-intensive inference. Expertise will be gained in acquiring, annotating, cleaning and processing human genomic (transcriptomic) and metabolic data. The studies will develop an in-depth understanding of the underlying data structure and characteristics of biological/clinical data with attendant quality assessment. The student will also gain a deeper understanding of state-of-the-art computational inference in nonlinear multivariate statistics, and how to balance predictive accuracy versus computational efficiency within a post-genomic application context. Their studies will form a strong foundation for enhancing precision in diagnosis and treatment of infection and inflammatory diseases.

References:

1. Ghazal, P; P. Dickinson and CL. Smith. Early life response to infection. Current Opinion in Infectious Diseases. 26: 213-218. 2013

2. Smith, CL; Dickinson, P; Forster, T; Craigon, M; Ross, AJ; Khondoker, MR; France, R; Ivens, A; Lynn, DJ; Orme, J; Jackson, A; Lacaze, P; Flanagan, KL; Stenson, BJ; Ghazal, P. Identification of a human neonatal immune-metabolic network associated with bacterial infection. Nature Commun. 5:4649 doi: 10.1038/ncomms5649 (2014).

3. Blanc, M; Hsieh, WY; Robertson, KA; Kropp, KA; Forster, T; Shui, G; Lacaze, P; Watterson, S; Griffiths, SJ; Spann, NJ; Meljon, A; Talbot, S; Krishnan, K; Covey, DF; Wenk, MR; Craigon, M; Ruzsics, Z; Haas, J; Angulo, A; Griffiths, WJ; Glass, CK; Wang, Y; Ghazal, P. The Transcription Factor STAT-1 Couples Macrophage Synthesis of 25-Hydroxycholesterol to the Interferon Antiviral Response. Immunity. 38: 106–118. 2013

4. Grzegorczyk, M., and Husmeier, D. (2013) Regularization of non-homogeneous dynamic Bayesian networks with global information-coupling based on hierarchical Bayesian models. Machine Learning, 91(1), pp. 105-154

Contact email address(es): [email protected]

Institute/Centre and/or other useful web addresses:

http://www.ed.ac.uk/pathway-medicine

http://www.research.ed.ac.uk/portal/en/organisations/division-of-infection-and-pathway-medicine(f9a4c808-718c-49b9-bc07-e26086ca8d14)/publications.html


Project title:

Identification and characterisation of 3D structural variants linked to human disease

Project description:

This is an imaginative multi-disciplinary PhD project based at the Institute of Genetics and Molecular Medicine (IGMM) and the School of Physics at the University of Edinburgh. The project is highly computational and will suit a candidate with a strong background in mathematics, physics or a related area with an interest in biology.

Genome-wide association studies (GWAS) have identified numerous genomic loci linked to human diseases. However, DNA variants that help explain disease susceptibility mostly tag long-range regulatory elements, such as transcription factor (TF) binding sites, rather than protein-coding changes (Buckle et al., 2015). In parallel, epigenetic variants have been reported (McDaniell et al., 2010) across large individual cohorts, indicating that GWAS hits could also tag variant epigenetic features or even contribute to inherent epigenetic variation by affecting TF binding. We recently developed a thermodynamics-based polymer model to predict chromatin structure, enabling us to generate 3D models of individual genomic loci. This software takes 1D epigenetic and bioinformatic data, such as chromatin accessibility and ChIP-seq data for chromatin-binding architectural proteins such as CTCF, and predicts 3D structures.

Precision medicine needs to take into account individual variability; this is already well understood at the level of genetics but the role of epigenetic and 3D structural variation and how it can impact on disease treatment and prevention is poorly understood. We hypothesise that many hits from human disease GWAS will tag genomic loci that encode epigenetic variants (e.g. TFs or architectural proteins) that give rise to an altered 3D chromatin landscape and influence gene expression. To test this hypothesis we have developed an exciting PhD project that brings together comprehensive bioinformatics, computer simulations and experimental validation to identify and characterise genomic loci that show 3D chromatin architecture variation and enable a new stratification of clinical samples that can better inform about disease predisposition and treatment.

The PhD student will mine available and novel data sets corresponding to SNPs of biomedical relevance, and analyse how these correlate with histone modifications, chromatin accessibility and ChIP-seq tracks. These data will then be used as an input for computer simulations of chromatin structure, to be performed using software developed in Physics (Brackley and Marenduzzo, see Brackley et al., 2015). The student will further develop this model to exploit its predictive power to survey which SNPs (either individually or in combination) either tag epigenetic variants or are themselves likely to disrupt 3D chromatin architecture (e.g., because they have compromised accessibility or the binding of specific architectural proteins). These simulations will allow a fast and efficient scan of biologically relevant SNPs to identify 3D chromatin structure variants that can be used to stratify different sample cohorts, and promising variants will be prioritized for further study.

The successful candidate will develop a deep understanding of the quantitative skills required to mine complex genetic and epigenetic datasets, analysis of epigenetic data, and thermodynamic-based polymer modelling to predict 3D chromatin architecture. Computational training will be combined with experience in laboratory approaches using imaging and molecular biology techniques to validate their findings.

References:


C. A. Brackley et al., Predicting the three-dimensional folding of cis-regulatory regions in mammalian genomes using bioinformatic data and polymer models, under review at Genome Biology (2016).

C. A. Brackley, J. Johnson, S. Kelly, P. R. Cook, D. Marenduzzo. Binding of bivalent transcription factors to active and inactive regions folds human chromosomes into loops, rosettes and domains, arXiv:1511.01848 (2015).

McDaniell R, Lee BK, Song L, Liu Z, Boyle AP, Erdos MR, Scott LJ, Morken MA, Kucera KS, Battenhouse A, Keefe D, Collins FS, Willard HF, Lieb JD, Furey TS, Crawford GE, Iyer VR, Birney E. Heritable individual-specific and allele-specific chromatin signatures in humans. Science. 2010 Apr 9;328(5975):235-9.

Buckle, A., van Heyningen, V., Kleinjan, D-J., Gilbert, N. Putative common variants associated with BMI are linked to novel pancreatic enhancers within the PAX6 regulatory domain, under review (2016).

Contact email address(es):

[email protected] [email protected] [email protected]

Institute/Centre and/or other useful web addresses: www.chromatinlab.org www.hgu.mrc.ac.uk http://www2.ph.ed.ac.uk/~dmarendu/


Project title:

Leveraging proteomic data to identify loci underlying complex disease

Project description:

The aim of this project is to use health-associated proteins and traits such as obesity and growth to gain insight into the mechanisms that control health status. The student will combine data on genetics, levels of health-related proteins and health traits from up to 30,000 individuals to build pathways connecting genes and their expression with measures of phenotypic variation related to disease. This will aid in the identification of the genetic variation underlying disease and illuminate disease biology.

Background:

Genome-wide studies of complex traits and diseases such as growth and obesity have identified many single-nucleotide polymorphisms (SNPs) associated with complex traits and diseases. Although a number of these point to potentially causative loci, only for a very few SNPs has this association been converted into a causal link and for even fewer has a causative DNA variant been identified. One potentially fruitful route to fill the gap between genotype and high-level (i.e. complex) phenotype is to build the biological pathway between these points using data on intermediate (potentially less complex) phenotypes such as gene expression and protein or metabolite levels. Subsequently, it is possible to utilize the principles of Mendelian randomization to explore potentially causal relationships between DNA variation, intermediate phenotypes and final phenotypic consequence.
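As an illustration of the Mendelian randomization principle mentioned above, the sketch below computes a single-instrument Wald ratio: the SNP's effect on the final phenotype divided by its effect on the intermediate phenotype (for example a protein level). The text does not say which estimator the project will use, so the choice of the Wald ratio, the first-order standard error and all numbers are assumptions made for illustration.

```python
def wald_ratio(beta_snp_exposure, beta_snp_outcome, se_snp_outcome):
    """Single-SNP Mendelian randomization estimate of the exposure -> outcome effect.

    The estimate is beta_snp_outcome / beta_snp_exposure. The standard error
    uses the common first-order approximation, which ignores uncertainty in
    the SNP-exposure effect.
    """
    if beta_snp_exposure == 0:
        raise ValueError("the SNP must be associated with the exposure")
    estimate = beta_snp_outcome / beta_snp_exposure
    std_error = se_snp_outcome / abs(beta_snp_exposure)
    return estimate, std_error


# Hypothetical per-allele effects, for illustration only:
# SNP -> protein level = 0.5, SNP -> BMI = 0.1 (standard error 0.02).
estimate, std_error = wald_ratio(0.5, 0.1, 0.02)
```

The ratio is only interpretable as a causal effect if the SNP influences the final phenotype solely through the intermediate phenotype, which is the main assumption any such analysis would need to examine.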

More information:

This project will utilize data from up to 30,000 individuals from mainland Scotland, the Northern Isles and Croatia. These samples have high-density SNP coverage and over 200 traits recorded per individual including a set related to obesity (e.g. weight, BMI, BIA Fat, etc.). We have recently measured circulating levels of circa 270 proteins that relate to cardiovascular disease and inflammation in 2,000 individuals from the Northern Isles and Croatia cohorts using OLink technology (1). Such intermediate phenotypes have been demonstrated to often have higher heritabilities and stronger influences for individual SNPs than higher level phenotypes (2) and to provide a potential pathway from DNA sequence to phenotypic consequence (3). This project will test this approach for development of such pathways in our own populations.

The project will suit a student with an interest in harnessing computational skills to understand the underlying biology of complex trait variation and disease, with a focus on obesity-related traits. Candidates with a background in statistical modelling, informatics or mathematics or physics with an interest in applying their skills to biological problems as well as those coming from a biological background with strong numerical skills are encouraged to apply. The student will gain understanding in the genetics of complex traits and diseases and genetic approaches to analysis of such traits. In addition they will develop knowledge of large scale genetic data analysis (association analyses, heritability estimation) and gain computational skills including programming and use of major computational packages (PLINK, GenABEL, GCTA, etc.) and experience with high-performance and parallel computation. A good student would be able to publish several high-impact papers from these studies and will have gained knowledge and experience making them very employable in research or commercial laboratories world-wide.


References:

1. Assarsson et al. (2014) PLoS ONE 9(4): e95192. doi:10.1371/journal.pone.0095192

2. Enroth, S. et al. (2014) Nat. Commun. 5:4684. doi:10.1038/ncomms5684

3. Shin, S-Y et al. (2014) Nature Genetics 46, 543–550. doi:10.1038/ng.2982

Contact email address(es): [email protected] [email protected] [email protected]

Institute/Centre and/or other useful web addresses: http://www.hgu.mrc.ac.uk/people/c.haley.html http://ki.se/en/people/xiash http://www.shen.se/homepage/Startpage.html


Project title:

A single-cell sequencing approach to identify new therapeutic targets to drive liver regeneration

Project description:

Background:

Liver disease is a major cause of morbidity and mortality worldwide. Transplantation remains the only effective treatment for end-stage liver disease. Nevertheless, limited donor organ availability and the high cost and morbidity of transplantation mean that more effective pro-regenerative therapies are urgently required.

The liver has a unique ability to regenerate, and hepatocyte replication is key to this process as hepatocytes comprise 80% of the liver mass. Hepatocytes are unusual cells in that they display different levels of ploidy (the number of sets of chromosomes in a cell), varying from 2N, 4N and 8N up to 16N, and can also be mono-nucleate or bi-nucleate. However, the mechanisms controlling liver regeneration remain poorly understood. In particular, we still do not understand why certain hepatocytes, and not others, enter the cell cycle and replicate, or what signals and factors within hepatocytes regulate both the commencement and conclusion of the hepatic regenerative response. Therefore, by deepening our understanding of hepatocyte biology during liver regeneration, we seek to identify new, rational therapeutic strategies to harness and promote the intrinsic properties of the native hepatocyte regenerative response, thereby negating the need for liver transplantation.

Current dogma suggests that hepatocytes are a highly homogeneous population of cells with very similar function. Nevertheless, why they then exhibit variable ploidy and nuclear number is unknown. This project will therefore investigate first whether hepatocytes display functional heterogeneity, and then how ploidy level, nuclear number and chromosomal copy number regulate hepatocyte biology including RNA transcript abundance. This project will employ a broad range of cutting-edge techniques, spanning both ‘wet lab’ and ‘dry lab’ approaches to answer these complex and important fundamental biological questions.

Aims:

1. Cell sorting of single mouse and human hepatocytes by ploidy level and degree of nuclearity from both uninjured and regenerating mouse and human liver.

2. Generation of sequencing libraries for DNA and RNA (using G&TSeq [Nat Methods. 2015 12:519-22]) from the single hepatocytes generated in Aim 1.

3. Mapping, analysis and interpretation of short sequencing reads to determine the DNA chromosomal copy number and to estimate the RNA transcript abundance for 2N or polyploid genomes and mononucleate or multinucleate hepatocytes.
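The copy-number part of Aim 3 can be sketched in a few lines. The example assumes that mapped reads have already been counted in fixed-size genomic bins, that the cell's ploidy is known from the sorting step in Aim 1, and that the median bin reflects the baseline copy number. It ignores GC content, mappability and amplification bias, which a real single-cell analysis would have to correct for. The function name and read counts are hypothetical.

```python
from statistics import median


def estimate_copy_number(bin_counts, ploidy):
    """Scale per-bin read counts so that the median bin equals the cell's ploidy."""
    baseline = median(bin_counts)
    if baseline <= 0:
        raise ValueError("median read count must be positive")
    return [round(ploidy * count / baseline) for count in bin_counts]


# Hypothetical read counts in seven genomic bins of one hepatocyte.
counts = [100, 102, 98, 150, 101, 50, 99]

cn_diploid = estimate_copy_number(counts, ploidy=2)     # cell sorted as 2N
cn_tetraploid = estimate_copy_number(counts, ploidy=4)  # same counts, cell sorted as 4N
```

The same read-depth profile gives different integer copy numbers depending on the assumed ploidy, which is one reason why sorting hepatocytes by ploidy before sequencing matters for interpreting the data.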

Training will be provided in a broad range of cutting-edge lab and computational techniques:

1) Mouse models of liver regeneration – 2/3 partial hepatectomy. BL6 mice can regenerate their entire liver mass following 2/3 partial hepatectomy in 5-7 days. This is a standard murine model of hepatic regeneration and routinely performed in the Henderson lab.

2) Isolation of primary hepatocytes from mouse and human liver using flow cytometry assisted cell sorting to segregate hepatocytes by ploidy level and also nuclearity.

3) Training in state-of-the-art computational and statistical tools to allow the analysis and interpretation of single cell transcriptomic and genomic data (Ponting lab).


References: Macaulay IC et al. G&T-Seq: parallel sequencing of single-cell genomes and transcriptomes. Nature Methods 2015 Jun;12(6):519-22.

Contact email address(es): [email protected] [email protected]

Institute/Centre and/or other useful web addresses: http://www.cir.ed.ac.uk/investigator/professor-neil-henderson http://www.hgu.mrc.ac.uk/people/chris.ponting.html


Project title:

Genetics of Hair Colour and Associated Traits

Project description:

Hair colour is one of the most striking human physical characteristics. The inheritance of hair colour has always been interesting to geneticists, but surprisingly little is known about it (1). Some time ago we identified variants in the MC1R gene that are associated with red hair (2), but in most cases these alone do not account for red hair. The genetics of blonde hair is understood even less. There are also many anecdotes, mostly probably not true, associating hair colour, particularly red hair, with all kinds of traits or medical conditions. This project will use the UK Biobank, a cohort of 500,000 individuals for whom a lot of medical and other data has been collected, including DNA and genotypes. There are more than 22,000 redheads and 56,000 blondes in the cohort on which we can perform genome-wide association studies. This is by far the largest population ever collected in which hair colour genetics can be studied. Preliminary analysis on a subset of 150,000 UK Biobank participants has found several genes which are candidates for interacting with MC1R to produce red hair. In addition, we have identified new variants near to MC1R that are likely to affect the expression of the gene. We have also found over 20 loci which contribute to blonde hair. The project will extend the analysis to the whole Biobank population and will identify likely causative genetic changes that result in hair colour variation. The best candidates will be tested by genomic engineering in mice, using CRISPR/Cas9, in collaboration with colleagues at MRC Human Genetics Unit (3).
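At its simplest, a genome-wide association study of a binary trait such as red hair repeats a single-variant test at every genotyped position. The sketch below shows one such test, an allelic chi-square test on a 2x2 table of allele counts. It is an illustrative assumption rather than the method used in the preliminary analysis, which would more typically rely on regression models with covariates; the allele counts are invented.

```python
import math


def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    Returns the test statistic and its p-value. For one degree of freedom the
    chi-square survival function equals erfc(sqrt(x / 2)).
    """
    a, b, c, d = case_alt, case_ref, control_alt, control_ref
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column of the table must be non-empty")
    chi2 = n * (a * d - b * c) ** 2 / denominator
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical allele counts: redheads (cases) versus other hair colours (controls).
chi2, p_value = allelic_chi_square(case_alt=30, case_ref=70,
                                   control_alt=10, control_ref=90)
```

Because the test is repeated for a very large number of variants, p-values from a real scan are judged against a much stricter threshold than the usual 0.05.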

The project will also study suggested associations between red hair and other traits in Biobank. We have data on melanoma and other skin cancers and sensitivity to sunburn, which are likely to show associations with hair colour genes. We also have birth weight, blood pressure, haematological data, hearing, eye parameters and a host of disease and behavioural data that have been suggested to have an association with red hair in particular. The project will look for associations between genetic hair colour variants and these putative associated traits.

The student will receive training in analysis of complex genetic traits, in analysis of genomes and genomic sequences, in genome engineering and in phenotypic analysis.

The project is a collaboration between the Roslin Institute and the MRC Human Genetics Unit in the Institute of Genetics and Molecular Medicine. Both Institutes have international reputations in the analysis of genomic associations with physical traits (4,5) and in the functional analysis of genetic variation.

References:

1. Sturm (2009) Human Molecular Genetics 18 R9-17

2. Valverde et al (1995) Nature Genetics 11 328-330

3. Jackson et al (2007) Human Molecular Genetics 16 2341-8

4. Wilkinson et al (2013) PLoS Genetics 9 e1003453

5. Tenesa et al (2016) Genome Biology 15 269

Contact email address(es): [email protected] [email protected]

Institute/Centre and/or other useful web addresses: www.hgu.mrc.ac.uk http://www.hgu.mrc.ac.uk/people/i.jackson.html www.igmm.ac.uk www.roslin.ed.ac.uk http://www.roslin.ed.ac.uk/albert-tenesa/


Project title:

Human-Robot Interaction for Dementia Prevention and Research

Project description:

The prevalence and the trend towards substantial increases in the incidence of dementia make its prevention a major societal challenge. Cognitive activity, physical activity and social engagement are among the strategies identified as promising for dementia prevention [3]. From a research perspective, major efforts are underway which seek to implement targeted collection of observational data in order to assemble knowledge that can be translated into new interventions, in terms of therapy as well as public health. Interactive computing and robotics technology could contribute significantly to these objectives. Developments in computing technology have opened new opportunities for assessing the health and cognitive well-being of older persons through embedded sensors. This can be done longitudinally, more conveniently and more frequently than is currently possible with existing procedures. Multimodal signal processing and machine learning methods have been successfully employed in different areas for inference of high-level interaction data [2], and could be used for aggregating and analysing data in assisted living settings. In separate developments, assistive social robots [1] have been increasingly used in elderly care, targeting psychological well-being and maintenance of cognitive function, as well as novel therapies for dementia patients [4].

The overall goal of this project is to investigate novel approaches for gathering detailed physiological and cognitive data through human-robot interaction, and assess the applicability of the resulting data sets to dementia prevention research. More specifically, it will seek to answer a number of interrelated questions, including:

• How to design human-robot interaction systems that are engaging and can serve as a means for interaction and cognitive stimulation to the participants.

• How health and well-being related data can be collected through the robot, in the context of its interaction with the user. These data will typically include physiological data, possibly acquired through sensors built into the robot, and cognitive data, possibly acquired through speech, gaze and gesture signals acquired through microphones and cameras as the user engages in interaction with the robot.

• How these data collected through daily-life interaction between older persons and robots could be incorporated into larger datasets, for research and health monitoring purposes.

• How the resulting data and technologies can underpin the development of methods for predicting health and well-being trajectories of participants.

The PhD student will gain knowledge of advanced interactive systems for assistive social robotics, and machine learning methods to infer the relative importance of different categories of participant-generated signals in predicting health and well-being trajectories. The PhD student will also be trained in methods for aggregation of different sensor and cognitive (e.g. speech, language) data, and devise strategies for integrating such data into existing population data resources.

References:

1. Broekens J, Heerink M, Rosendal H. “Assistive social robots in elderly care: a review”. Gerontechnology. 2009;8(2). doi:10.4017/gt.2009.08.02.002.00

2. Luz S. Automatic Identification of Experts and Performance Prediction in the Multimodal Math Data Corpus Through Analysis of Speech Interaction. In: Proceedings of the 15th ACM on International Conference on Multimodal Interaction. ICMI ’13. New York, NY, USA: ACM; 2013:575-582. doi:10.1145/2522848.2533788.

3. Middleton LE, Yaffe K. Promising strategies for the prevention of dementia. Arch Neurol. 2009;66(10):1210-1215. doi:10.1001/archneurol.2009.201.

4. Wada K, Shibata T, Musha T, Kimura S. Robot therapy for elders affected by dementia. IEEE Engineering in Medicine and Biology Magazine. 2008;27(4):53-60. doi:10.1109/MEMB.2008.919496.

Contact email address(es):

• Dr Saturnino Luz ([email protected])
• Prof Craig Ritchie
• Prof David Robertson

Institute/Centre and/or other useful web addresses:

• The Usher Institute for Population Health Sciences and Informatics
• Centre for Dementia Prevention
• The School of Informatics

Project title:

Structural analysis and prediction of dominant negative mutations in protein complexes associated with developmental disorders

Project description:

Many developmental disorders are caused by rare inherited or de novo mutations in protein-coding genes. With the huge reduction in the cost of sequencing, it is now becoming feasible to identify causal mutations in many cases. For example, the ongoing UK- and Ireland-wide Deciphering Developmental Disorders (DDD) project seeks to identify the genetic causes for >12,000 children and adults with previously undiagnosed developmental disorders [1].

Given this explosion in human genome sequence data and the fact that many disease-associated mutations have never been identified previously, there is a pressing need for computational methods to prioritise genetic variants most likely to cause disease.

An often neglected factor that can influence mutation pathogenicity is the assembly of proteins into complexes [2,3]. For instance, when multiple copies of a protein are present in the same complex, a heterozygous mutation can result in most assembled complexes containing a mixture of wildtype and mutated proteins. If a single mutated protein disrupts the function of the entire complex, this leads to a “dominant negative” mechanism, whereby a heterozygous mutation can cause an almost complete loss of function.
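The scale of the dominant negative effect can be illustrated with a short calculation. The sketch below is a simplification and not a method taken from the cited work: it assumes that wildtype and mutant subunits are produced in equal amounts and assemble at random into a complex containing n copies of the protein. The function name and parameters are hypothetical.

```python
def fraction_fully_wildtype(n_subunits: int, mutant_fraction: float = 0.5) -> float:
    """Probability that a randomly assembled complex contains no mutant subunit.

    Assumption: each of the n subunits is drawn independently, and is mutant
    with probability `mutant_fraction` (0.5 for a heterozygous mutation with
    equal expression of both alleles).
    """
    if n_subunits < 1:
        raise ValueError("n_subunits must be at least 1")
    if not 0.0 <= mutant_fraction <= 1.0:
        raise ValueError("mutant_fraction must lie between 0 and 1")
    return (1.0 - mutant_fraction) ** n_subunits


if __name__ == "__main__":
    for n in (1, 2, 4, 6):
        wt = fraction_fully_wildtype(n)
        print(f"{n} subunit(s): {wt:.1%} fully wildtype, "
              f"{1 - wt:.1%} with at least one mutant subunit")
```

Under these assumptions only 0.5^4 = 6.25% of tetramers are fully wildtype, so if one mutant subunit is enough to inactivate the complex, a heterozygous mutation removes most of the functional complex rather than half of it.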

In this project, we will first perform a detailed large-scale structural bioinformatic investigation into the locations of human disease-associated mutations with respect to experimentally determined and homology modelled protein complex structures. We will use various computational methods to model the effects of each mutation on the stability of individual protein subunits and the assembly of protein complexes. This will allow us to identify protein structural features that can best discriminate between different types of mutations.

The second part of this project will involve developing a new computational predictor of mutation pathogenicity using machine-learning techniques. Rather than simply predicting the probability of a mutation being pathogenic, our new method will assign a relative probability that a mutation involves a dominant negative mechanism vs. a gain- or loss-of-function mechanism. This will be particularly useful for identifying new de novo variants, which are more likely to manifest their phenotypic effect via a dominant negative mechanism. We will apply this method to mutations identified by the DDD project to assess the practical benefits in terms of identifying causative mutations and predicting molecular mechanisms.

References:

[1] DDD Study (2015) Nature 519:223
[2] Ahnert, Marsh et al (2015) Science 350:1331
[3] Wells et al (2016) Cell Reports 14:679

Contact email address(es): [email protected] [email protected]

Institute/Centre and/or other useful web addresses: MRC Human Genetics Unit Institute of Genetics and Molecular Medicine

Project title:

Stratifying Type 1 diabetes by residual insulin secretion: implications for risk management, therapy and disease prevention

Project description:

Background

The objective of this project is to investigate how stratifying patients with Type 1 diabetes by residual insulin secretion affects their management and outcome, and how this residual insulin secretion can be preserved, allowing better control, prevention and ultimately even cure of type 1 diabetes (T1DM). T1DM is an auto-immune disease characterised by destruction of the pancreatic beta cells, leading to depletion of insulin. Until recently it was believed that all beta cells were destroyed. However, the development of ultra-sensitive assays for C-peptide (a marker of insulin production) has led to the realization that some people with long-standing T1DM maintain residual insulin production. This is important since it gives hope for therapies not just to prevent T1DM before clinical onset but also to reverse it even after long duration. If we can understand the determinants, underlying mechanisms and consequences of such persistent C-peptide, this will inform the development of therapies to preserve and even regenerate beta cells in people with T1DM, leading to an improvement in, and possible reversal of, diabetes.

Aims

i) To understand the genetic and environmental determinants of persistent C-peptide in people with type 1 diabetes
ii) To understand whether there are immunologically-defined substrata of type 1 diabetes in which persistent C-peptide is more prevalent
iii) To delineate the relevance of such determinants for intervention pathways
iv) To quantify the impact of persistent C-peptide on diabetes complications

To study these and other questions, a large bioresource of biosamples from people with type 1 diabetes has been linked retrospectively and prospectively to their clinical records, and C-peptide and auto-antibody levels are being measured. In this project the student will work with these data to explore the genetic and environmental determinants of persistent C-peptide, the relationship to auto-antibody profile and the relationship to diabetes complications. This will lead to further studies in which specific subsets of patients will have detailed immune biomarker profiles measured.

This project will involve learning and using standard statistical methods for gene discovery data analysis. It will then extend to more novel approaches, such as machine learning, that combine information about many genes to extract additional information on relevant genes. The pathways underlying associations will then be further explored using a range of bioinformatics techniques. These further analyses include seeking evidence on the mechanisms and pathways involved by combining the genetic association with other large publicly available datasets on relevant traits such as gene expression in immune cells. The project will also involve using statistical methods, including survival analysis, for modelling the relationship of C-peptide to prevalent and incident diabetes complications, since such data are key to informing what future levels of beta cell recovery, if not complete, would nonetheless have important clinical impact.
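As an illustration of the survival analysis mentioned above, the following is a minimal sketch of a Kaplan-Meier estimator for time to a first complication. It is not the project's analysis pipeline: the function name is hypothetical, the follow-up data are invented toy values, and a real analysis would normally use an established statistics package and adjust for covariates.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the probability of remaining complication-free.

    times  -- follow-up time for each participant (e.g. years)
    events -- 1 if a complication was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    observed = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # events and censorings both leave the risk set
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = observed.get(t, 0)
        if d:
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


if __name__ == "__main__":
    # Invented toy data: five participants, two of them censored.
    follow_up_years = [2, 3, 3, 5, 8]
    complication = [1, 0, 1, 1, 0]
    for t, s in kaplan_meier(follow_up_years, complication):
        print(f"t = {t}: estimated complication-free fraction {s:.2f}")
```

Comparing such curves between participants with and without persistent C-peptide would be one simple way to look at the relationship to incident complications, before moving to regression-based survival models.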
Professor Colhoun is an internationally recognised leader in the field of diabetes epidemiology, including the application of large-scale genetic and other ‘omic discovery techniques for understanding pathogenesis and improving prediction. Professor McKeigue is an internationally recognised leader in the fields of epidemiology and statistical genetics.

Training Outcomes

Applicants for this project must be numerate and could have a background in epidemiology, genetics, bioinformatics, mathematics, statistics or computer science. They will receive training in epidemiology, statistical genetics and bioinformatics. At the end of the PhD the student will be competent to derive valid phenotypes from longitudinal electronic health record data, conduct a genome wide association study, conduct meta-analysis of GWAS studies, conduct bioinformatics analysis of reported associations, be familiar with publicly available data resources relevant to immune disorders and will be able to use a range of statistical techniques on longitudinal anonymised data. They will also receive training in compliance with data governance requirements.

References:

Oram RA, Jones AG, Besser RE, Knight BA, Shields BM, Brown RJ, Hattersley AT, McDonald TJ. The majority of patients with long-duration type 1 diabetes are insulin microsecretors and have functioning beta cells. Diabetologia. 2014 Jan;57(1):187-91.

Roep BO, Peakman M. Surrogate end points in the design of immunotherapy trials: emerging lessons from type 1 diabetes. Nat Rev Immunol. 2010 Feb;10(2):145-52. doi: 10.1038/nri2705. PubMed PMID: 20098462.

Looker HC, Colombo M, Hess S, Brosnan MJ, Farran B, Dalton RN, Wong MC, Turner C, Palmer CN, Nogoceke E, Groop L, Salomaa V, Dunger DB, Agakov F, McKeigue PM, Colhoun HM; SUMMIT Investigators. Biomarkers of rapid chronic kidney disease progression in type 2 diabetes. Kidney Int. 2015 Oct;88(4):888-96. doi: 10.1038/ki.2015.199. Epub 2015 Jul 22. PubMed PMID: 26200946

Postmus I, Trompet S, Deshmukh HA, et al. Pharmacogenetic meta-analysis of genome-wide association studies of LDL cholesterol response to statins. Nat Commun. 2014 Oct 28;5:5068. doi: 10.1038/ncomms6068. PubMed PMID: 25350695; PubMed Central PMCID: PMC4220464.

Contact email address(es): Principal supervisor: Professor Helen Colhoun, Institute of Genetics and Molecular Medicine (moving from Dundee on 1 April): [email protected]

Co-supervisor: Professor Paul McKeigue, Usher Institute of Population Health Sciences: [email protected]

Centre or other useful web address(es):

Institute of Genetics and Molecular Medicine (http://www.igmm.ac.uk/) Usher Institute of Population Health Sciences and Informatics (http://www.ed.ac.uk/molecular-clinical-medicine/usher-institute)

Project title:

Integrated systems biology of multiple organ dysfunction in acute pancreatitis

Project description:

Background

Acute pancreatitis (AP) is a common and devastating inflammatory condition of the pancreas that is considered to be a paradigm of sterile inflammation leading to systemic multiple organ dysfunction syndrome (MODS) and death. To date, there are no specific therapies available that protect individuals against AP-MODS. We have recently reported that kynurenine-3-monooxygenase (KMO), a key enzyme of tryptophan metabolism, is central to the pathogenesis of AP-MODS (Mole et al., Nature Medicine 22, 202–209 (2016)). This work is part of a drug discovery collaboration with GSK (Discovery Partnerships with Academia, DPAc, scheme), which is in transition from the preclinical to clinical phases.

Stratifying which patients might benefit most from this new medicine is challenging, and our collaborative epidemiological studies to date have not defined a pre-existing demographic which can predict AP-MODS during AP. There is therefore a significant unmet medical need for a rapid-turnaround precision medicine approach to define precisely which individuals will benefit most from this new medicine. As part of this study, we have performed comprehensive metabolomic phenotyping of an entire cohort of persons with AP, and in addition are doing array-based genotyping and RNASeq transcriptomic sequencing for this cohort. We will extend this analysis during the run-up to the studentship by performing cap analysis of gene expression (CAGE). This RNA sequencing technology enables identification of transcription start sites at single base resolution. We have recently applied this technique, in the FANTOM5 consortium, to detect numerous alternative promoter regions activated in innate immune cells exposed to a variety of stimuli (Forrest, Kawaji, Rehli, Baillie et al, Nature (2014) 507:462-70; Arner et al, Science (2015) 347:1010-14), and to detect activity of enhancer elements genome-wide (Andersson et al, Nature 2014 507:455-61).

The bioinformatics opportunity created by this dataset is substantial. The student will contribute to the analysis of a globally-unique dataset of extremely high biological resolution data, with serial sampling in an extensively clinically-phenotyped cohort. The primary aim of the studentship will be to identify groups of patients sharing therapeutically-important similarities – a fundamental problem facing critical care medicine.

The student will apply existing methods, and contribute to the development of novel analytic approaches, to stratify this patient cohort at key clinical decision points (e.g. presentation to hospital, referral to critical care), drawing on key elements of genomic, transcriptomic, metabolomic and clinical data.

Training outcomes

• Biomarker discovery and validation techniques
• Computational methods in Python and R, biostatistics, and pathway analysis
• Data visualisation and presentation using command-line graphing software, BioLayout Express 3D, d3.js and JavaScript
• Exposure to, and experience of, industry and the relevance of integrated systems biology to the pharmaceutical industry, provided by a short period at GSK in the bioinformatics division during the studentship

Summary

This project bridges the gap between fundamental biology, a deep understanding of disease, clinical translational medicine, precision medicine and the pharmaceutical industry in a real-world environment in which the team is engaged in clinical trials of a ground-breaking new medicine in AP.

References:

1. Kynurenine-3-monooxygenase inhibition prevents multiple organ failure in rodent models of acute pancreatitis. Damian J Mole, Scott P Webster, Iain Uings et al. Nature Medicine 22, 202–209 (2016)
2. A promoter-level mammalian expression atlas. Forrest, Kawaji, Rehli, Baillie et al, Nature (2014) 507:462-70
3. Transcribed enhancers lead waves of coordinated transcription in transitioning mammalian cells. Arner et al, Science (2015) 347:1010-14
4. An atlas of active enhancers across human cell types and tissues. Andersson et al. Nature 2014 507:455-61

Contact email address(es): [email protected] [email protected] [email protected]

Institute/Centre and/or other useful web addresses: http://www.cir.ed.ac.uk http://www.roslin.ed.ac.uk/ken-baillie/ www.dpac.gsk.com

Project title:

Identifying new diabetic retinopathy biomarkers based on Optical Coherence Tomography Angiography (OCT-A), advanced image processing, and computational modelling

Project description:

Background

Diabetic retinopathy (DR) is the leading cause of visual loss in developed countries worldwide. Previous studies have reported haemodynamic changes in the diabetic eye that precede clinically evident pathological alterations of the retinal microvasculature. Therefore, new methods that allow greater understanding of these early haemodynamic changes may enable earlier detection of DR, helping to limit vision loss.

Optical Coherence Tomography Angiography (OCT-A) [1] is a non-invasive technique for retinal imaging. OCT-A can resolve the microvasculature of the eye to a level of detail never seen before. Only very recently, OCT-A commercial products have started to be utilised in clinical centres. One of the first OCT-A devices installed in the UK is hosted in Edinburgh at the Clinical Research Imaging Centre (CRIC). CRIC’s Image Analysis Laboratory Manager, Dr MacGillivray, is one of the co-supervisors of this project and works closely with a team at Queen’s University in Belfast, where a second OCT-A device is hosted. Furthermore, co-supervisors Prof. Dhillon and Dr MacGillivray have over 12 years’ experience developing a successful research programme around retinal image analysis and are behind the software package VAMPIRE used in research centres worldwide [2].

In recent work [3], co-supervisor Dr Bernabeu and colleagues showed that it is possible to build computational blood flow models from high-resolution images of the parafoveal region of the retina (of paramount importance for sharp central vision and visual detail). These computational models can provide a detailed haemodynamic characterisation of the region. This will enable us to run longitudinal studies featuring patients with diabetes and identify flow-based early indicators of the structural changes that lead to visual loss in advanced DR stages. We now have a unique opportunity to combine this technology with the OCT-A datasets at Edinburgh and Belfast and the VAMPIRE toolkit in order to realise the full potential of this patient-specific modelling approach to achieving Precision Medicine for eye care.

The supervisory team brings together a unique set of skills in order to lead a programme of research of high calibre: Prof. Andrew Morris (Health Informatics, Data Science), Dr Miguel O. Bernabeu (computational modelling in Biology and Medicine), Dr Tom MacGillivray (multimodal image acquisition and retinal image processing), and Prof. Baljean Dhillon (Clinical Ophthalmology and Brain Sciences).

Aims

In this project, the student will bring together cutting-edge image processing and computational modelling methods in order to characterise the early changes in microvascular haemodynamics associated with DR. The project will take advantage of the new OCT-A datasets in Edinburgh and Belfast, which are world-leading. The main objective is to investigate clinically relevant diabetic retinopathy biomarkers and design the computational pipelines necessary to facilitate their use in future large-scale clinical studies.

Training outcomes

The student will receive state-of-the-art training in the core disciplines of image analysis, computational modelling, statistical methods, and data science while gaining expert knowledge in the context of diabetic retinopathy. The student will develop the essential soft and domain-specific skills necessary to design and implement novel quantitative and computational methods that could solve challenging problems across the entire spectrum of vascular medicine both in academic and industrial settings.

References: [1] de Carlo et al. “A review of optical coherence tomography angiography (OCTA)” International Journal of Retina and Vitreous 2015 1:5. DOI:10.1186/s40942-015-0005-8

[2] E. Trucco, A. Giachetti, L. Ballerini, D. Relan, A. Cavinato, T. MacGillivray, Morphometric Measurements of the Retinal Vasculature in Fundus Images with VAMPIRE, in Biomedical Image Understanding: Methods and Applications, J. Lim, S. Ong, W. Xiong, Eds., John Wiley & Sons, 2015. DOI:10.1002/9781118715321.ch3

[3] Bernabeu et al. “Characterization of parafoveal hemodynamics associated with diabetic retinopathy with adaptive optics scanning laser ophthalmoscopy and computational fluid dynamics” in Engineering in Medicine and Biology Society (EMBC), 2015 37th Annual International Conference of the IEEE, pp.8070-8073, 25-29 Aug. 2015. DOI:10.1109/EMBC.2015.7320266

Contact email address(es):

[email protected] [email protected] [email protected]

Institute/Centre and/or other useful web addresses:

VAMPIRE retinal image processing platform: http://vampire.computing.dundee.ac.uk

HemeLB blood flow simulation software: https://github.com/UCL/hemelb

Usher Institute of Population Health Sciences and Informatics: http://www.ed.ac.uk/molecular-clinical-medicine/usher-institute

Clinical Research Imaging Centre: http://cric.ed.ac.uk

Edinburgh Data Science: http://www.ed.ac.uk/data-science

Project title:

Identification, functional characterization and in vivo mouse CRISPR transgene modeling of novel human genome wide association variants for metabolic disease/ageing

Project description:

Background: Advancing age is the greatest risk factor driving the metabolic dysregulation underpinning diabetes, cardiovascular disease and cancers. However, the molecular basis of this relationship is poorly understood. Identification of genes and genetic risk factors interlinking longevity, fat mass/distribution and cellular metabolism will help to advance our understanding of the ageing process. The Wilson/Joshi group have led genome wide association (GWA) meta-analysis to reveal the impact of homozygosity on complex traits (Joshi PK et al. and Wilson JF, Nature 2015) and have already established an association between lifespan and two DNA variants in apoE alleles previously associated with chronic disease (hypercholesterolaemia and Alzheimer's disease; Joshi et al., Nature Communications, accepted/in revision), emphasizing the interconnectedness of ageing and metabolic processes. The studentship will extend successful GWA and deep-phenotyping (e.g. accurate determination of fat distribution and parent life history traits), using computational/statistical genetics methods to discover novel human DNA variants linked to fat distribution and blood pressure in the unique population isolates of Orkney/Shetland (Schraut et al., manuscript in preparation). Population isolates allow alleles that are rare in the general population to rise to detectable frequencies.

An integrated approach to understanding ageing/metabolic disease risk: We are currently investigating the molecular basis of the lead adiposity/blood pressure variants, situated in a chromosome 4 gene intron, on promoter/enhancer function with the leading functional genomics group of Bickmore (Williamson et al., Genes & Development 2014) and the student will have the opportunity to extend this analysis to novel hits they discover in the project. The student will then contribute to CRISPR genome-editing to generate in vivo mouse models of the genes/variants with the leading-edge Wood laboratory (Wood AJ et al., Science 2011). Completion of the gene-to-function story will involve determining the metabolic impact of editing the identified candidate genes using “gold-standard” mouse metabolic phenotyping techniques with the Morton group (Morton et al., Nature Medicine accepted Feb 2016) in collaboration with the expertise of the Selman group in mammalian ageing and metabolism (Selman et al. Science, 2009). This powerful integrated training approach will take the student from identification of new variants/genes through to the understanding of their molecular and physiological mechanisms and illumination of potential new therapeutic targets for metabolic disease and ageing.

Aims and Training outcomes: This project will train the student in 3 critical areas relevant to precision medicine:

1. Bioinformatic analysis of large-scale human genetics datasets.
2. Functional genomics of the association intervals using bioinformatics and in vitro approaches (3D-FISH, chromosome conformation capture technologies) with rapid development of new CRISPR-targeted mouse models of the novel variants that they discover.
3. Cutting-edge phenotypic assessment, initially training on mature projects within the supervisors' laboratories (unique candidates already discovered above).

Together, this studentship project will provide a broad, highly dynamic and unique set of skills that will be highly competitive in the burgeoning genomics discovery era.

References: Directional dominance on stature and cognition in diverse human populations. Joshi, P, et al., and Wilson JF Nature. 2015 Jul 23;523(7561):459-62. doi: 10.1038/nature14618.

Spatial genome organization: contrasting views from chromosome conformation capture and fluorescence in situ hybridization. Williamson I, Berlivet S, Eskeland R, Boyle S, Illingworth RS, Paquette D, Dostie J, Bickmore WA. Genes Dev. 2014 28:2778-91. doi: 10.1101/gad.251694.114. PMID: 25512564

Ribosomal protein S6 kinase 1 signaling regulates mammalian life span. Selman C, Tullet JM, Wieser D, Irvine E, Lingard SJ, Choudhury AI, Claret M, Al-Qassab H, Carmignac D, Ramadani F, Woods A, Robinson IC, Schuster E, Batterham RL, Kozma SC, Thomas G, Carling D, Okkenhaug K, Thornton JM, Partridge L, Gems D, Withers DJ. Science. 2009 7;334(6052):39.

Targeted genome editing across species using ZFNs and TALENs. Wood AJ, Lo TW, Zeitler B, Pickle CS, Ralston EJ, Lee AH, Amora R, Miller JC, Leung E, Meng X, Zhang L, Rebar EJ, Gregory PD, Urnov FD, Meyer BJ. Science. 2011 Jul 15;333(6040):307. doi: 10.1126/science.1207773.

Morton, N.M et al., Genetic identification of an adipocyte expressed anti-diabetic target in mice selected for resistance to diet-induced obesity. Nature Medicine manuscript accepted Feb 2016.

Contact email address(es): [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]

Institute/Centre and/or other useful web addresses: http://www.cvs.ed.ac.uk/users/nik-morton http://www.orcades.ed.ac.uk/team/ruth.html www.hgu.mrc.ac.uk/people/a.wood.html http://www.hgu.mrc.ac.uk/people/w.bickmore.html http://www.gla.ac.uk/researchinstitutes/bahcm/staff/colinselman/

Project title:

Use of -omics and retinal image data to improve vascular risk prediction in type 2 diabetes

Project description:

There is a need for improved accuracy of vascular risk prediction and stratification in people with type 2 diabetes by identifying ‘novel’, potentially causal vascular risk factors which can be targeted through preventive strategies aimed at reducing the incidence of both the macro- and micro-vascular complications of diabetes. Precision Medicine is an emerging approach to disease prevention as well as treatment, and its research techniques, including analysis of high density ‘-omics’ data from well-phenotyped, longitudinal cohorts, are ideal for addressing this broad research need.

Using data from the Edinburgh Type 2 Diabetes Study (ET2DS), which includes serial data (4 time-points over 10 years) on a wide range of clinical variables, retinal images, stored blood and urine and record linkage to routine hospital discharges for over 1000 men and women with type 2 diabetes [1], the project will focus on one or more of the following specific research questions:

(1) identification of long-term trajectories of modifiable biomarkers which increase the risk of micro- and macro-vascular complications in people with type 2 diabetes, focusing initially on glycemic control, blood pressure, lipid profile, obesity, inflammatory mediators and kidney function;
(2) identification of ‘novel’ cardiometabolic and proteomic vascular biomarkers, using high resolution metabolic and proteomic phenotyping of stored serum samples on a high-throughput 1H-nuclear magnetic resonance (NMR) spectroscopy platform [3] and a high-multiplex immunoassay platform [4] respectively;
(3) identification of retinal vessel traits associated with micro- and macro-vascular disease using state-of-the-art VAMPIRE (Vascular Assessment and Measurement Platform for Images of the Retina) retinal trait imaging software;
(4) combination of sub-clinical markers of vascular disease with biomarker profiles to identify people with diabetes most at risk of vascular events.
The student will be trained in advanced computational, statistical and epidemiological methods as applied to hypothesis-driven research of public health importance. They will benefit from the epidemiological, statistical and imaging expertise of the supervisors as well as the availability of considerable and wide-ranging methodological expertise from senior members of the Usher Institute molecular epidemiology research group. They will also have access to methodologists using cutting-edge ‘-omics’ analysis techniques within major consortia involving the ET2DS [2] and to cross-disciplinary researchers (including clinicians) who form the ET2DS research team.

References:

1. Price JF, Reynolds RM, et al. The Edinburgh Type 2 Diabetes Study: study protocol. BMC Endocrine Disorders, 2008; 8: 18
2. Shah T, Engmann J, et al. Population Genomics of Cardiometabolic Traits: Design of the University College London-London School of Hygiene and Tropical Medicine-Edinburgh-Bristol (UCLEB) Consortium. PLoS ONE 2013; 8(8): e71345
3. Ala-Korpela M. Critical evaluation of 1H NMR metabonomics of serum as a methodology for disease risk assessment and diagnostics. Clin Chem Lab Med, 2008; 46: 27-42
4. Goncalves I, Bengtsson E, et al. Elevated Plasma Levels of MMP-12 Are Associated With Atherosclerotic Burden and Symptomatic Cardiovascular Disease in Subjects With Type 2 Diabetes. Arterioscler Thromb Vasc Biology, 2015; 35: 1723-1

Contact email address(es): [email protected]

Institute/Centre and/or other useful web addresses: http://www.ed.ac.uk/molecular-clinical-medicine/usher-institute

Project title:

Stromal transcriptomic and epigenetic signatures in breast cancer metastasis and chemotherapy resistance.

Project description:

This project aims to provide novel insights into the mechanism of breast cancer metastasis and chemotherapy resistance and to train the PhD candidate to become a cutting-edge expert in both computational and biomedical research.

A tremendous amount of high-throughput data has been generated with the intention of stratifying cancer patients for personalized diagnosis and treatment. However, almost all studies have utilized whole tumour tissues that often contain a large number of tumour stromal cells, greatly complicating data interpretation. Furthermore, studies pioneered by the primary supervisor [1,2] and others have demonstrated that tumour stromal cells play critical roles in cancer metastasis and therapy response that have been largely overlooked in the past. It is therefore imperative to better understand the ‘-omics’ data of cancer stromal cells in order to fully understand the disease mechanism. However, it is difficult and expensive to isolate different stromal cell types from patient samples in omics studies. As an alternative, transcriptome signatures of tumour-associated immune cells have been extrapolated from cancer transcriptome data, based on immune cell specific markers, using bioinformatics approaches [3]. Recent studies have also illustrated the potential to use DNA methylation as an epigenetic marker to specify tumour stromal cells [4]. Together, these studies establish the possibility of utilizing molecular data from tumour stromal cells to improve our understanding of the disease mechanism and to assist personalized cancer therapy. However, these approaches rely on a limited number of lineage markers, which introduces bias and makes the results difficult to verify with functional assays.
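To make the marker-based idea concrete, the following is a minimal sketch of how a per-sample score for one stromal cell type might be derived from bulk tumour expression data. It is an illustration only and not the method of the cited studies: the gene names are placeholders, the expression values are invented, and the scoring rule (mean z-score across marker genes) is an assumption chosen for simplicity.

```python
from statistics import mean, pstdev


def marker_score(expression, markers):
    """Mean z-score of the marker genes in each sample.

    expression -- dict mapping gene name to a list of values, one per sample
    markers    -- gene names assumed to be specific to the cell type of interest
    Genes that are missing or show no variation across samples are skipped.
    """
    zscores = []
    for gene in markers:
        values = expression.get(gene)
        if not values:
            continue
        mu, sd = mean(values), pstdev(values)
        if sd == 0:
            continue  # a constant gene carries no information about samples
        zscores.append([(v - mu) / sd for v in values])
    if not zscores:
        raise ValueError("no usable marker genes found in the expression data")
    # Average across genes, sample by sample.
    return [mean(per_sample) for per_sample in zip(*zscores)]


if __name__ == "__main__":
    # Invented toy data: three tumour samples, placeholder gene names.
    bulk = {
        "MARKER_A": [1.0, 2.0, 3.0],
        "MARKER_B": [2.0, 4.0, 6.0],
        "UNRELATED": [5.0, 5.0, 5.0],
    }
    scores = marker_score(bulk, ["MARKER_A", "MARKER_B", "NOT_MEASURED"])
    for i, s in enumerate(scores, start=1):
        print(f"sample {i}: marker score {s:+.2f}")
```

The sketch also shows the limitation noted above: the score depends entirely on which marker genes are supplied, so a short or poorly chosen marker list biases the result.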

Aims: Clinically relevant in vivo models have led to significant breakthroughs in tumour stroma studies. The current project aims to develop novel computational approaches to integrate the transcriptome and epigenome data of tumour stromal cells generated from in vivo preclinical models of breast cancer and bone metastasis. These signatures will then be used to extrapolate from public databases (The Cancer Genome Atlas, Cancer Genome Project and International Cancer Genome Consortium, etc.) the key gene networks associated with stromal cells in patients. We will focus on the stromal gene networks that relate to metastatic disease and chemotherapy resistance and test them using in vitro and in vivo functional assays established in the laboratory. This work will provide novel insights into the disease mechanism and identify potential therapeutic targets. Machine learning approaches will also be used to translate the functionally validated signatures into diagnostics.

Training outcomes: Our lack of understanding of tumour stroma is a fundamental gap in our ability to deliver precision cancer medicine. The proposed project will identify key transcriptomic and epigenetic signatures of stromal cells associated with the major clinical challenges in breast cancer medicine: metastasis and therapy resistance. The supervision team provides complementary knowledge in tumour microenvironment, in vivo research (BZQ), bioinformatics (AS) and epigenetics (DS) to provide extensive training and skills development in molecular pathology, data processing/analysis and machine learning. The PhD candidate will become an expert in combining in vivo skills and computational methods to maximize the impact of multidisciplinary research and precision medicine.

References: 1. Qian, B.-Z. et al. CCL2 recruits inflammatory monocytes to facilitate breast-tumour metastasis. Nature 475, 222–225 (2011).

2. Hughes, R. et al. Perivascular M2 Macrophages Stimulate Tumor Relapse after Chemotherapy. Cancer Res. 75, 3479–3491 (2015).

3. Ogino, S., Galon, J., Fuchs, C. S. & Dranoff, G. Cancer immunology—analysis of host and tumor factors for personalized medicine. Nat. Rev. Clin. Oncol. 8, 711–719 (2011).

4. Jeschke, J., Collignon, E. & Fuks, F. DNA methylome profiling beyond promoters - taking an epigenetic snapshot of the breast tumor microenvironment. FEBS J. 282, 1801–14 (2015).

Contact email address(es): Binzhi Qian [email protected]

Institute/Centre and/or other useful web addresses: http://www.crh.ed.ac.uk/binzhi-qian/ www.ecrc.ed.ac.uk/Researchers/item/Dr-Andy-Sims.html http://www.hgu.mrc.ac.uk/people/d.sproul.html

Project title:

Development of genomic profiling to improve clinical outcome and personalize treatment for osteoporosis

Project description:

Osteoporosis affects about 30% of women and 12% of men at some point during life and incurs treatment costs of almost £2 billion in the UK alone. At present, the targeting of drug treatment lacks sophistication: treatment is offered to patients with clinical risk factors for fracture and low bone mineral density [1]. The response to these treatments is variable and many patients develop drug-related adverse effects, indicating that there is a clinical need for more refined methods of targeting treatment. Genetic factors play a key role in regulating susceptibility to osteoporosis [2, 3] and there is increasing evidence that genetic factors also play a role in predicting treatment response. For example, we recently identified one genome-wide significant hit for response to Teriparatide (TPTD) therapy (5 × 10⁻⁹) in patients with osteoporosis, along with several suggestive loci [4]. The aim of this project is to identify genetic predictors of therapeutic response to anti-osteoporosis therapies as well as to identify predictors of adverse effects.

The project will utilise a unique bio-resource of serum samples, DNA and clinical data from 5000 patients who have been referred to the osteoporosis service in NHS Lothian over the past 10 years. From a methodological point of view, the project will use advanced statistical analysis and machine learning methods that exploit high-dimensional biomarkers to cluster individuals into clinical subtypes from which drug response and other outcomes can be predicted. The project will also use NHS record linkage to explore relationships between genotype, clinical risk factor analysis and clinical outcomes of importance such as fracture and adverse effects. Finally, the project will apply experimental medicine and wet laboratory based approaches (cell culture, molecular biology) to explore the physiological basis of drug-related adverse effects, particularly to understand the mechanisms underlying gastrointestinal intolerance with oral bisphosphonates, which we think may be due to genetic differences in the response of gamma delta T cells to isopentenyl pyrophosphate, which accumulates in patients who are given bisphosphonates [5].

Training outcomes: The successful candidate will join a dynamic team who are internationally recognised leaders in the genetic basis of bone disease and gain experience in a broad range of techniques, from statistical analysis and machine learning, through human molecular genetics, bioinformatics and cell biology, to clinical epidemiology and experimental medicine. In addition to advancing understanding of the responses to osteoporosis treatments, the project could be successful in developing intellectual property for new diagnostic tests that could be offered to patients in collaboration with industrial partners.

References: 1. Management of Osteoporosis and the Prevention of Fragility Fractures. SIGN, 1-128. 2015.

2. Zheng H.F. et al. (2015) Whole-genome sequencing identifies EN1 as a determinant of bone density and fracture. Nature. http://www.ncbi.nlm.nih.gov/pubmed/26367794?dopt=Citation

3. Estrada K, et al. (2012) Genome-wide meta-analysis identifies 56 bone mineral density loci and reveals 14 loci associated with risk of fracture. Nat.Genet. 44:491-501. http://www.ncbi.nlm.nih.gov/pubmed/22504420?dopt=Citation

4. Alonso N, Riches PL, Langdahl BL, Ralston SH (2015) Genome-Wide Analysis Identifies Significant Predictors of Therapeutic Response to Teriparatide in Severe Osteoporosis. Journal of Bone & Mineral Research 30 (abstract). http://www.asbmr.org/education/2015-abstracts

5. Hewitt RE, Lissina A, Green AE, Slay ES, Price DA, Sewell AK (2005) The bisphosphonate acute phase response: rapid and copious production of proinflammatory cytokines by peripheral blood gamma delta T cells in response to aminobisphosphonates is inhibited by statins. Clin.Exp.Immunol. 139:101-111. http://www.ncbi.nlm.nih.gov/pubmed/15606619?dopt=Citation

Contact email address(es): [email protected]

Institute/Centre and/or other useful web addresses: http://www.cgem.ed.ac.uk/research

Project title:

Determining the cause of failure to respond to tyrosine kinase inhibitor therapy in chronic myeloid leukaemia

Project description:

The leukaemic cells in Chronic Myeloid Leukaemia (CML) are characterised by a reciprocal translocation t(9;22) which fuses BCR (chromosome 22) to the oncogene ABL (chromosome 9) leading to the generation of a fusion transcript (BCR-ABL). The chromosome 22 involved in the rearrangement (the “derivative 22”), also known as the Philadelphia chromosome in karyotype analyses after the US city in which it was first identified, was the first genetic defect ever to be associated with a malignancy1. The BCR-ABL fusion transcript produces a protein with enhanced tyrosine kinase activity. Haematopoietic stem cells with the translocation have a growth advantage to the extent that, at initial presentation, most patients have high myeloid blood cell counts, and 100% of cells are Philadelphia chromosome positive. If left untreated the chronic phase eventually transforms to a form of acute myeloid leukaemia characterised by increased numbers of undifferentiated cells (myeloblasts). This is usually fatal. The mechanism of transformation is not well understood but it probably results from the acquisition of a number of different secondary mutations. These mutations also promote growth, and they cause the Philadelphia chromosome positive cells to be less dependent on BCR-ABL alone for their growth advantage.

In the chronic phase of CML, before secondary mutations have developed, treatment with small molecule inhibitors of the BCR-ABL tyrosine kinase activity is very effective and has revolutionised the management of this condition. The tyrosine kinase inhibitor (TKI) used most frequently in the CML clinic is Imatinib. Patients generally respond well and lose Philadelphia chromosome positivity by karyotype analysis after 1 year on treatment2. Indeed, some patients even become negative by more sensitive molecular techniques (RT-PCR for the fusion transcript). Acute transformation is now quite rare, whereas at one time it was inevitable. However, there are patients who do not respond well from the outset, and others who respond initially but then lose their response. In the latter group of patients, acquired mutations in the kinase domain (TKD) of BCR-ABL can be identified in 50% of patients3. These mutations prevent the small molecule inhibitor from binding stably to the BCR-ABL protein. The aim of the project is to identify the genetic causes of failure to respond to Imatinib in the 50% of non-responders who do not have ABL TKD mutations.

The UK Spirit 2 trial compared Imatinib with Dasatinib at presentation4. Samples were collected from all 800 patients enrolled into the trial at 3-monthly intervals. We have performed a mutation analysis of the BCR-ABL tyrosine kinase domain in all Spirit 2 patients with a poor response at 1 year5. Those patients who developed treatment resistance that cannot be explained by mutations in the BCR-ABL TKD will be whole genome sequenced. Uncovering the genetic basis of poor response should inform the appropriate management of these patients, either with second or third generation tyrosine kinase inhibitors, or with other treatment modalities.

References: 1. http://www.cancer.gov/publications/dictionaries/cancer-terms?cdrid=561237

2. http://www.ncbi.nlm.nih.gov/pubmed/26434969

3. http://www.ncbi.nlm.nih.gov/pubmed/25971713

4. http://www.spirit-cml.org/spirit-2-home.aspx

5. http://www.cgem.ed.ac.uk/research/people/bernard-ramsahoye

Contact email address(es):

[email protected]

Institute/Centre and/or other useful web addresses: http://www.cgem.ed.ac.uk/research

Project title:

Dynamics of mitochondrial genome complexity in trypanosomes

Project description: Trypanosomatid parasites cause human and animal diseases with devastating health and economic consequences (1). Although there are curative drugs, they have many undesirable side-effects. The viability of these parasites - like that of all other eukaryotes - crucially depends on mitochondrial (mt) function and, in most cases, expression of mitochondrially encoded genes. The aim of this project is to increase our understanding of the genomic structure and function of trypanosomal mtDNA (the kinetoplast). A quantitative analysis of how genetic information is stored and recombined in the kinetoplast, and how it is expressed during the parasite’s complex life cycle, is critical for this understanding. The kinetoplast is remarkable and unlike any other mtDNA (2). It is made up of thousands of interlocked DNA rings, like chainmail armour. One type of ring, the maxicircle DNA, contains genes for subunits of the mitochondrial respiratory chain. The other type of ring, called minicircle DNA, encodes hundreds of different guide RNAs (gRNAs) that are required for post-transcriptional editing of the maxicircle-encoded mRNA. In some genes almost half the genetic information is edited into the mRNA. Our groups (one theoretical and another experimental) are interested in determining the complexity of the kinetoplast at the genome and transcriptome levels and how it evolves over time (3,4). To this end we use a combination of mathematical modelling, bioinformatics, next-generation sequencing and experimentation. Our preliminary data reveal an extraordinary dynamic complexity and the existence of novel small RNAs, the function of which remains to be determined.

This project is ideal for someone interested in working at the intersection of wet lab and computational biology. A strong background in mathematics or programming is highly desirable. The candidate will obtain training in cutting edge molecular biology as well as in mathematical biology and scientific programming.

References: (1) Stuart K, Brun R, Croft S, Fairlamb A, Gürtler RE, McKerrow J, Reed S, Tarleton R. (2008). Kinetoplastids: related protozoan pathogens, different diseases. Journal of Clinical Investigation 118(4):1301-10.

(2) Jensen RE, Englund PT (2012). Network news: the replication of kinetoplast DNA. Annual Reviews in Microbiology 66:473-91.

(3) Savill NJ and Higgs PG (1999). A theoretical study of random segregation of minicircles in trypanosomatids. Proceedings of the Royal Society B – Biological Sciences, 266:611–620.

(4) Dean S, Gould MK, Dewar CE, Schnaufer AC (2013). Single point mutations in ATP synthase compensate for mitochondrial genome loss in trypanosomes. Proceedings of the National Academy of Sciences U S A 110(36):14741-6.

Contact email address(es): [email protected]

Institute/Centre and/or other useful web addresses:

Institute of Immunology and Infection Research www.ed.ac.uk/biology/immunology-infection Centre for Immunity, Infection and Evolution ciie.bio.ed.ac.uk

Project title:

Computational dissection of regulatory variation in motor neurone disease

Project description: Motor neurone disease (MND; aka amyotrophic lateral sclerosis, ALS) is an untreatable, rapidly progressive and fatal neurodegenerative disorder. Major advances in understanding the pathogenic basis of the disease, particularly the genetic aetiology, have been made in the last 10 years but the majority of familial cases and over 80% of sporadic cases remain without a genetic diagnosis. A repeat expansion in the first intron of the C9orf72 gene is considered to be the best candidate causal variant in a minority of MND cases, emphasizing the role of rare, regulatory variation; but many novel loci are expected to emerge via whole genome sequencing (WGS) studies (Diekstra et al, 2014). In concert with WGS of MND patients now being carried out in Europe and the USA, we will soon generate WGS of MND samples collected in Scotland over the past thirty years, as part of the Scottish Genomes Partnership for medical genomics. This collection benefits from deep longitudinal phenotype and follow-up data through Scottish electronic healthcare records and the dedicated Scottish Motor Neurone Disease Audit, Research and Trials (SMART-MND) register. This project will aim to define the complete allele spectrum of Scottish MND patients, including all coding and non-coding sequence variants. Allele frequencies will be compared with those in the Scottish population cohorts previously genotyped at the MRC Institute of Genetics and Molecular Medicine (IGMM). This will extend knowledge of genotype-phenotype correlations in Scottish patients with MND, through correlation of coding and non-coding sequence variants with disease course data for this cohort from the SMART-MND register. The WGS data will be integrated with a variety of other datasets to shed light on variant functions.
In particular, we will exhaustively annotate non-coding single nucleotide and structural variants according to their overlaps with neural eQTL and chromatin data. This will allow us to calculate the mutational burden of rare variants at different classes of regulatory sites using established methodologies (e.g. Melton et al, 2015). Using integrated regulatory compendia (e.g. Moore et al, 2015) we will also seek evidence of common regulatory dysfunction underlying constellations of disease-associated non-coding variants. Overall, we anticipate that the data generated will increase the proportion of Scottish MND patients with a genetic diagnosis, will enhance understanding of genotype-phenotype relationships in MND, including knowledge of genetic indicators of prognosis, and will ultimately contribute to the international efforts to find treatments for this presently incurable disease.

Professor Semple is Head of Bioinformatics at the IGMM. Professor Aitman is Director of the Centre for Genomic and Experimental Medicine and jointly leads the SGP. Professor Chandran is Director of the Centre for Clinical Brain Sciences, Director of the Euan MacDonald Centre for MND Research, and co-Director of the Anne Rowling Regenerative Neurology Clinic.

The student undertaking this project will gain advanced bioinformatics skills and join a thriving community of computational biologists, in the context of a multidisciplinary biomedical research institute (IGMM). It would suit a biologist with computing experience or those from a computing/maths/physics background with a strong interest in biomedicine.

References:

Diekstra FP, Van Deerlin VM, van Swieten JC, Al-Chalabi A, Ludolph AC, Weishaupt JH, Hardiman O, Landers JE, Brown RH Jr, van Es MA, et al. 2014. C9orf72 and UNC13A are shared risk loci for amyotrophic lateral sclerosis and frontotemporal dementia: a genome-wide meta-analysis. Ann Neurol 76: 120-33.

Melton C, Reuter JA, Spacek DV, Snyder M. 2015. Recurrent somatic mutations in regulatory regions of human cancer genomes. Nat Genet 47: 710-6.

Moore BL, Aitken S, Semple CA. 2015. Integrative modeling reveals the principles of multi-scale chromatin boundary formation in human nuclear organization. Genome Biol 16: 110.

Contact email address(es):

[email protected]

Institute/Centre and/or other useful web addresses: http://www.hgu.mrc.ac.uk/Users/Colin.Semple/

http://www.cgem.ed.ac.uk/research/people/tim-aitman http://www.siddharthanchandran.org.uk/ http://www.igmm.ac.uk/ http://www.scottishgenomespartnership.org/

Project title:

Identifying neurobiological consequences of genetic risk for schizophrenia on traits associated with Major Depressive Disorder (MDD)

Project description:

Background: Major depressive disorder (MDD) is a heritable, disabling psychiatric disorder, renowned for clinical and aetiological heterogeneity. This heterogeneity has severely hampered progress in the understanding of the disorder. MDD is characterised by low mood, cognitive deficits and altered personality traits, such as neuroticism. These traits have a polygenic architecture and, importantly, also differ between cases and controls in several other major psychiatric disorders such as schizophrenia (SCZ) and bipolar disorder (BD). The degree to which genetic risk for SCZ and BD contributes to mood, cognitive deficits and personality in MDD is unclear, but such risk may account for some of the heterogeneity seen in MDD.

The current proposal is based on our previous findings in two important cohorts. Firstly, in the Lothian Birth Cohort (LBC) we demonstrated that polygenic risk for SCZ is associated with lower cognitive ability in healthy controls at age 70 years, and with greater cognitive decline (1). Secondly, in another large population-based sample (Generation Scotland; GS), we found that mechanisms underlying cognitive impairments in MDD, specifically those relating to processing speed, are related to increased genetic loading for SCZ, over and above the impact seen in healthy controls (2).

Aims: This proposal seeks to explore the neurobiological consequences of genetic loading for SCZ and its impact on cognition and other traits associated with MDD using the extensive imaging, phenotypic and genetic data available through UK Biobank, LBCs and STRADL (GS). This directly follows work we conducted previously linking polygenic risk scores with imaging markers of psychiatric disease (3), and from work in LBC and GS cohorts, whilst also maximizing potential by the additional use of UK Biobank. UK Biobank is a newly available major national health resource with the aim of improving prevention, diagnosis and treatment of a wide range of illnesses including depression, and includes data on 500,000 individuals. The use of such samples permits analysis of genetic neuroimaging in sample sizes not previously possible. It also permits the creative application of more sophisticated computerised modeling techniques (such as tests of mediation, prediction, and stratification) to combined multi-modal brain MRI, genetic and phenotypic data (4). Ultimately, this proposal offers the opportunity to stratify MDD based on defining subgroups of individuals with common mechanistic origins related to schizophrenia risk. The potential impact of this research could transform the understanding of MDD.

Training outcomes: Extensive training in MR image processing and sophisticated image analysis would be undertaken, including widely used software packages such as SPM, and FSL. It would also be expected that the PhD candidate would develop a thorough understanding of the modeling procedures offered by the statistical package ‘R’, as well as familiarisation with access procedures and use of these large cohorts. The candidate would also develop a sound understanding of procedures necessary for generating polygenic risk scores and of psychiatric genetics in general. Training outcomes would culminate in a sound understanding of techniques at the forefront of neuroimaging and psychiatric genetics.
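
As a minimal sketch of what generating a polygenic risk score involves (an illustration only, not the pipeline used in the cited studies): an individual's score is commonly computed as the sum of their risk-allele dosages, each weighted by an effect size estimated in an independent genome-wide association study. The variant identifiers, weights and genotypes below are invented for illustration.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages.

    dosages: variant id -> number of risk alleles carried (0, 1 or 2)
    weights: variant id -> per-allele effect size from an independent GWAS
    Variants missing from either input are ignored.
    """
    shared = dosages.keys() & weights.keys()
    return sum(weights[v] * dosages[v] for v in shared)


# Hypothetical effect sizes (e.g. log odds ratios); not real SCZ estimates.
gwas_weights = {"rs_a": 0.5, "rs_b": -0.25, "rs_c": 0.125}

# Hypothetical genotypes for two individuals.
individuals = {
    "person_1": {"rs_a": 2, "rs_b": 0, "rs_c": 1},
    "person_2": {"rs_a": 0, "rs_b": 2, "rs_c": 0},
}

scores = {name: polygenic_risk_score(g, gwas_weights)
          for name, g in individuals.items()}
```

In practice the scores would then be standardised across the cohort and entered as predictors in models of imaging or cognitive traits; choices such as the p-value threshold used to select variants are left out of this sketch.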

References:

1) McIntosh et al., “Polygenic risk for schizophrenia is associated with cognitive change between childhood and old age”. Biological Psychiatry, 2013. 73(10): 938-43.

2) Whalley et al., “Polygenic dissection of cognition and personality traits in Major Depressive Disorder using independently generated scores for schizophrenia & bipolar disorder”. (In Prep)

3) Whalley et al., “Polygenic Risk and White Matter Integrity in Individuals at High Risk of Mood Disorder”. Biological Psychiatry 2013 (74): 280–286

4) Cox et al., “Compensation or inhibitory failure? Testing hypotheses of age-related right frontal lobe involvement in verbal memory ability using structural and diffusion MRI”. Cortex. 2015 (63): 4–15

Contact email address(es): [email protected], [email protected], [email protected]

Institute/Centre and/or other useful web addresses: www.pst.ed.ac.uk http://www.ccace.ed.ac.uk/

Project title:

Using big data to identify precision medicine targets for

Project description: The UK has the highest prevalence of asthma and some of the poorest health outcomes from asthma in the world. This results in a substantial healthcare, personal and societal burden for the UK [1]. It is now widely recognised that asthma is in fact a heterogeneous group of conditions [2], and greater appreciation of asthma phenotypes and endotypes has the potential to allow improved tailoring of treatments. This understanding has not yet been translated into primary care, where over 90% of asthma patients are exclusively managed. General practice databases are a rich source of phenotypic/endotypic information [3] and are further enriched through linkage to other routinely collected sources of data or by machine learning (ML) and natural language processing (NLP) of electronic free text recorded in the clinical record. These databases are also increasingly being used in the support of clinical trials. For instance, the increasing amount of electronic health data being collected for patient care can be used to support clinical research by identifying potential study participants and following them up for outcomes [4].

The primary aim of this PhD will be to investigate approaches to deeper characterisation of asthma phenotypes/endotypes using electronic health records in primary care with a view to identifying those who have the potential to benefit from precision medicine treatment approaches.

• The PhD will involve exploration and interrogation of very large GP databases (SIVE II, OPCRD and SAIL) using a range of epidemiological, machine learning and NLP techniques and then validating the potential signals in a second independent database;

• Building on this database infrastructure, a computerised decision support algorithm will be constructed that will help frontline clinicians to identify those who may benefit from further investigation/specific interventions;

• The extent to which routine electronic medical record systems can support the identification of potential study participants (e.g. by characterising the allergic asthma phenotype) and follow them for longer-term outcomes via a pilot clinical trial will be explored.

Based within the Usher Institute of Population Health Sciences and Informatics, the PhD will capitalise on The Farr Institute’s and the Asthma UK Centre for Applied Research’s (AUKCAR) extensive informatics and disease-specific infrastructures and the synergistic relationship between these UK-wide endeavours. These organisations have embedded patient and public involvement and knowledge exchange, in addition to strong links with industry which will be encouraged to develop through placements or workshops. AUKCAR members also sit on National Institute for Health and Care Excellence (NICE) and British Thoracic Society/ Scottish Intercollegiate Guidelines Network (BTS/SIGN) asthma guidelines and they will, therefore, be ideally placed to translate the findings from this work into national asthma guidelines.

The PhD student will be trained in advanced epidemiological and statistical techniques, natural language processing, machine learning and informatics. The PhD student will also gain knowledge of current Randomised Clinical Trials (RCTs) for people with asthma, of RCT design methodology and the use of data in trials, and of the use of linked electronic health records, and will emerge as a leader in the field of population-based informatics and precision medicine of asthma.

References:

1. Gupta R, Sheikh A, Strachan DP, et al. Burden of allergic disease in the UK: secondary analyses of national databases. Clin Exp Allergy 2004; 34, 520-6

2. Bush A, Kleinert S, Pavord ID. The asthmas in 2015 and beyond: a Lancet Commission. Lancet 2015;385:1273-5

3. Simpson CR, et al. Seasonal Influenza Vaccination Effectiveness II (SIVE II): Use of a large national primary care and laboratory-linked dataset. NIHR-HTA.

4. Wallace P, Delaney B, Sullivan F. Unlocking the research potential of the GP electronic care record. Br J Gen Pract. 2013;63:284-5

Contact email address(es): Dr Colin Simpson Prof Aziz Sheikh Dr Saturnino Luz

Institute/Centre and/or other useful web addresses: The Usher Institute for Population Health Sciences and Informatics The Farr Institute Asthma UK Centre for Applied Research

Project title:

Using machine-learning analyses of epigenetic data to complement eye disease genetics

Project description:

Our risk of developing common diseases such as cancer, diabetes and heart disease is influenced by our genetics. The key recent development in our understanding of this genetic risk is the demonstration that it is conferred by DNA sequence polymorphisms that affect the transcriptional regulation of genes rather than the proteins they encode1. However, while our ability to understand the molecular impact of polymorphisms on proteins is well developed, we cannot reliably evaluate how they affect gene regulation. This means that it is very difficult to determine which polymorphisms confer the genetic risk of common diseases.

Regulatory elements in the human genome are demarcated by chromatin and DNA modifications collectively known as epigenetic marks. Thus, the effects of DNA polymorphisms on epigenetic marks can be used as a surrogate marker of their effect on gene regulation. Consequently, epigenomic profiling of disease-free individuals can be used to determine which polymorphisms affect epigenetic marks and to delineate those that confer a risk of disease.

This approach suffers from two problems: it is limited to easily accessible cell types such as those in the blood and difficult to apply to diseases affecting organs such as the eye and brain. It can also only be applied to polymorphisms that are commonly observed in the population and not to evaluate isolated rare mutations.

We have shown that it is possible to computationally model the relationship between DNA sequence and epigenetic marks using machine learning2. Others have shown that this approach can be extended to predict the effect of sequenced polymorphisms on the local levels of these marks3. Thus it is possible to leverage limited epigenomic data collected from cell lines or a small number of patient samples to evaluate the potential regulatory impact of polymorphisms in silico.

Aims

The proposed project aims to apply recently developed machine-learning approaches to understand the genetic basis of eye disease, particularly focusing on age-related macular degeneration for which the genetic risk is well established:

1. Develop a computational model to predict the effect of sequence polymorphisms on epigenetic marks from available retinal cell lines.

2. Apply these models to delineate potential causative polymorphisms for macular degeneration risk.
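
The two aims can be illustrated with a deliberately simplified sketch, which is not the published method of any of the cited references. It assumes a sequence-based model has already been trained and reduced to a table of k-mer weights (`toy_weights` below is invented), and scores a polymorphism as the difference in summed weights between the alternative and reference alleles over the k-mers that overlap the variant. All function and variable names are hypothetical.

```python
def sequence_score(seq, kmer_weights, k):
    """Sum the model weight of every k-mer in seq (unseen k-mers score 0)."""
    return sum(kmer_weights.get(seq[i:i + k], 0.0)
               for i in range(len(seq) - k + 1))


def variant_effect(ref_seq, pos, alt_base, kmer_weights, k):
    """Predicted change in mark signal when ref_seq[pos] becomes alt_base.

    Only k-mers overlapping the variant can change, so both alleles are
    scored on the window spanning k-1 bases either side of pos.
    """
    start = max(0, pos - k + 1)
    end = min(len(ref_seq), pos + k)
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    return (sequence_score(alt_seq[start:end], kmer_weights, k)
            - sequence_score(ref_seq[start:end], kmer_weights, k))


# Invented weights standing in for a model trained on epigenomic data.
toy_weights = {"ACG": 1.0, "CGT": 0.5, "ATG": -0.5}

# C>T substitution at index 2 of a toy reference sequence.
effect = variant_effect("TACGTA", pos=2, alt_base="T",
                        kmer_weights=toy_weights, k=3)
```

A negative value here would be read as the alternative allele being predicted to reduce the mark at that site; ranking disease-associated polymorphisms by the magnitude of such scores is one way candidate causative variants could be prioritised.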

Training outcomes

This project uses macular degeneration as a paradigm to construct a computational framework that can be broadly applied to human disease genetics. It is a collaboration between two world-leading centres, the Medical Research Council Human Genetics Unit and the University of Edinburgh School of Informatics. This provides the student with excellent interdisciplinary research training in both genetics and informatics. The student will be embedded within Dr Sproul’s research laboratory that undertakes both computational and experimental epigenomics research. The supervisory team offers complementary skills in machine learning (Dr Sanguinetti) and human genetics with a particular focus on eye disease (Drs Hayward and Vitart).

References:

1. Maurano et al (2012) Systematic localization of common disease-associated variation in regulatory DNA. Science 337:1190-5.

2. Benveniste et al (2014) Transcription factor binding predicts histone modifications in human cell lines. PNAS 111:13367-72.

3. Lee et al (2015) A method to predict the impact of regulatory variants from DNA sequence. Nature Genetics 47:955-61.

Contact email address(es):

Duncan Sproul: [email protected]

Institute/Centre and/or other useful web addresses: Human Genetics Unit:

Individual researcher websites: Duncan Sproul: http://www.hgu.mrc.ac.uk/people/d.sproul.html Guido Sanguinetti: http://www.inf.ed.ac.uk/people/staff/Guido_Sanguinetti.html Veronique Vitart: http://www.hgu.mrc.ac.uk/people/v.vitart.html Caroline Hayward: http://www.hgu.mrc.ac.uk/people/c.hayward.html

Project title:

MUHUNT – the ultra sensitive & robust detection of mosaicism and genetic heterogeneity in cancer

Project description:

Genetic mutations provide the raw material for evolution; they are responsible for heritable disease and drive the development of cancer. A new mutation arises in a single DNA duplex in a single cell and, despite the remarkable fidelity of DNA replication, such a mistake occurs in most human cell divisions. As a consequence we are all mosaics of cells with constellations of different mutational changes. Most of these changes are of little functional consequence but some disrupt normal cellular control processes and can ultimately lead to the development of cancer. But it is not just cancer: a somatic mutation occurring early in development may be inherited by many daughter cells in a particular tissue and can manifest as a congenital disease. We have previously found such cases (Ansari et al, 2014) but, due to the cellular heterogeneity, they are readily missed in standard genetic tests. Special strategies are required to reliably and robustly detect rare mutations in a background of more abundant non-mutated sequences (Kinde et al, 2011), often necessitating modified sequencing strategies. We have prototyped a means of analysis that can identify rare, even unique (single DNA duplex) mutations with high confidence genome-wide from standard whole genome and some types of exome sequencing data. The first aim of this project would be to further develop the computational approach and apply it to the huge resources of genome sequence data already available and being produced at an accelerating rate. We will also develop strategies of modified sequence library preparation allowing us to enhance our resolution at specific sites or categories of sites across the genome.
Achieving these initial aims will allow the candidate to survey mutation patterns across different cell types and to quantify the genetic heterogeneity and mutation patterns of tumours (samples available through Malcolm Dunlop and major consortia) and, it is envisaged, will lead directly to the genetic diagnosis of disease previously missed in mosaic individuals (cohorts available through David FitzPatrick and major consortia). This study will complement ongoing work investigating germline mutation patterns in the Taylor group (Reijns et al, 2015) and has potential for wide application to cancer genomics. There is the possibility of an industry placement during this project.
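
The prototyped analysis itself is not described here, so the following is only a generic illustration of why rare mutations need special statistical treatment, not the MUHUNT method. It asks how likely it is that sequencing error alone would produce at least the observed number of variant-supporting reads at a site, using a binomial model. The per-base error rate and significance threshold are assumptions chosen for the example.

```python
from math import exp, lgamma, log


def log_binom_pmf(i, n, p):
    """Log of P(X = i) for X ~ Binomial(n, p), with 0 < p < 1."""
    return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log(1 - p))


def upper_tail(k, n, p):
    """P(X >= k): chance of k or more variant reads arising from error alone."""
    return sum(exp(log_binom_pmf(i, n, p)) for i in range(k, n + 1))


def is_candidate(alt_reads, depth, error_rate=0.001, alpha=1e-6):
    """Flag a site whose variant read count is unlikely under the error model.

    error_rate and alpha are illustrative assumptions, not calibrated values.
    """
    return upper_tail(alt_reads, depth, error_rate) < alpha


# At 100x depth a single variant read is compatible with error; five are not.
single_read = is_candidate(1, 100)
five_reads = is_candidate(5, 100)
```

Under these assumed parameters a mutation present in only a small fraction of cells needs several supporting reads before it can be separated from noise, which is one reason modified library preparation or deeper sequencing is often required.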

References:

Ansari et al. Genetic heterogeneity in Cornelia de Lange syndrome (CdLS) and CdLS-like phenotypes with observed and predicted levels of mosaicism. Journal of Medical Genetics. 2014. https://dx.doi.org/10.1136/jmedgenet-2014-102573

Kinde I, Wu J, Papadopoulos N, Kinzler KW, Vogelstein B. Detection and quantification of rare mutations with massively parallel sequencing. PNAS. 2011. https://dx.doi.org/10.1073/pnas.1105422108

Reijns MA, Kemp H, Ding J, de Proce SM, Jackson AP, Taylor MS. Lagging-strand replication shapes the mutational landscape of the genome. Nature. 2015. https://dx.doi.org/10.1038/nature14183

Meynert AM, Ansari M, FitzPatrick DR, Taylor MS. Variant detection sensitivity and biases in whole genome and exome sequencing. BMC Bioinformatics. 2014. https://dx.doi.org/10.1186/1471-2105-15-247

Contact email address(es): [email protected] [email protected] [email protected]

Institute/Centre and/or other useful web addresses: https://www.hgu.mrc.ac.uk/ http://www.igmm.ed.ac.uk/ https://taylor.igmm.ed.ac.uk/

Project title:

Disentangling the role of functional variation in disease

Project description:

In recent years a large number of genetic loci linked to complex diseases such as cancer have been identified. However, despite this success, challenges remain. Not only are most of the genomic regions linked to disease yet to be discovered, but for those that have been identified, the mechanisms by which they mediate disease generally remain unclear. To address these gaps in understanding and gain further biological insights, we propose a new approach that takes advantage of publicly available tissue-specific expression data and health-related phenotypes and genotypes from the UK Biobank. Previously we have shown that it is possible to accurately predict tissue-specific intermediate phenotypes (such as gene expression levels) using the genotype of an individual, and that these predicted phenotypes can subsequently be tested for links to important phenotypes [2]. Using this approach, we propose to use the large number of tissues represented in the GTEx Project to impute expression levels into the 500,000 individuals from the UK Biobank. This will allow us to model health-related phenotypes gathered in the UK Biobank as a function of the imputed tissue-specific intermediate (expression) traits, thus gaining information on the biological mechanisms underpinning the phenotype that is not available from the use of genotypes alone. The results of this project have the potential to increase significantly our understanding of the mechanisms underlying disease risk and to have an impact on the methodology used to predict disease status, both of which are fundamental in precision medicine.
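
The two-step idea described above (predict an intermediate trait from genotype, then test the predicted trait against a phenotype) can be sketched as follows. This is a toy illustration only, not DISSECT or the pipeline used in [2]: the additive linear model, the SNP identifiers, the weights and the use of a plain Pearson correlation as the association test are all assumptions. A real analysis would fit a regression with covariates.

```python
from math import sqrt


def impute_expression(genotypes, weights):
    """Predict expression per individual as a weighted sum of allele dosages.

    genotypes: list of dicts {snp_id: dosage in 0..2}, one per individual.
    weights:   dict {snp_id: effect size}, assumed to have been trained in a
               reference panel that has both genotypes and expression.
    """
    return [sum(w * g.get(snp, 0.0) for snp, w in weights.items())
            for g in genotypes]


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical SNP identifiers and weights, for illustration only.
toy_weights = {"snp_a": 0.5, "snp_b": -0.2}
toy_genotypes = [
    {"snp_a": 0, "snp_b": 0},
    {"snp_a": 1, "snp_b": 2},
    {"snp_a": 2, "snp_b": 1},
]
toy_phenotype = [1.0, 1.3, 2.4]  # made-up trait values

predicted = impute_expression(toy_genotypes, toy_weights)
print(pearson(predicted, toy_phenotype))
```

Because the weights are learned once in the reference panel, the imputation step needs only genotypes in the target cohort, which is what makes it applicable to a resource where expression was not measured.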

This big data project will involve handling large datasets and performing computationally demanding statistical analysis within the high performance computing environment of the UK national supercomputer ARCHER (http://www.archer.ac.uk). Although we have previously developed software to run this type of analysis [3] within a high performance environment, experience of, or willingness to learn, coding software for a high performance computing environment would be desirable.

The successful candidate will benefit from the interactions with the systems biology lab of Tom Michoel (Secondary supervisor) and close collaborators in the fields of bioinformatics (James Prendergast) and quantitative genetics (Pau Navarro).

References:

[1] Eichler et al. 2010. Missing heritability and strategies for finding the underlying causes of complex disease. Nat Rev Genet.

[2] Rowe et al. 2013. Complex Variation in Measures of General Intelligence and Cognitive Change. PLoS One.

[3] Canela-Xandri et al. 2015. A new tool called DISSECT for analyzing large genomic datasets using a Big Data approach. Nat Commun.

Contact email address(es): [email protected]

Institute/Centre and/or other useful web addresses:

Dissect software. http://www.dissect.ed.ac.uk Albert Tenesa's Roslin Institute web page. Tom Michoel’s web page. Albert Tenesa's MRC-HGU web page.

Project title:

Using genome sequence data to combat antimicrobial resistance

Project description:

Whole genome sequencing (WGS) is now routinely applied to characterize pathogenic bacteria in multiple ways, including their antimicrobial resistance (AMR) profiles. However, the sample-to-actionable genome (StAG) pipeline is still too slow to be clinically relevant. This project will tackle a key element of StAG by developing a suite of bioinformatics algorithms that convert raw sequence data to comprehensive typing information on pathogenic bacteria in a faecal, blood or respiratory sample. To be clinically useful (i.e. fast) and to supersede traditional laboratory culturing of bacteria, we need to be able to use culture-free data obtained by ‘agnostic’ direct sequencing of samples. This represents an intriguing and challenging bioinformatics problem, which is the key element of this project.
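
As a rough indication of what converting raw reads to typing information can involve, the sketch below screens unassembled reads for known marker sequences by shared k-mers. It is an assumed, simplified stand-in and not the algorithms this project will develop: the marker names and sequences are invented, and reverse-complement matching, sequencing errors and mixed-species samples are all ignored.

```python
def kmers(seq: str, k: int) -> set:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def screen_reads(reads, markers, k=5, min_fraction=0.8):
    """Report markers whose k-mers are largely present in the reads.

    reads:   iterable of read sequences (strings).
    markers: dict {marker_name: marker_sequence}.
    Returns a dict {marker_name: fraction of marker k-mers seen in reads}
    for markers at or above `min_fraction`. Forward strand only.
    """
    read_kmers = set()
    for read in reads:
        read_kmers |= kmers(read.upper(), k)

    hits = {}
    for name, seq in markers.items():
        marker_kmers = kmers(seq.upper(), k)
        if not marker_kmers:
            continue  # marker shorter than k
        fraction = len(marker_kmers & read_kmers) / len(marker_kmers)
        if fraction >= min_fraction:
            hits[name] = fraction
    return hits


# Invented toy sequences; not real resistance genes.
toy_markers = {"toy_marker_A": "ACGTACGGTTCA", "toy_marker_B": "TTTTGGGGCCCC"}
toy_reads = ["ACGTACGGT", "ACGGTTCA"]
print(screen_reads(toy_reads, toy_markers))
```

Working directly on reads avoids an assembly step, which matters when speed is the constraint; the price is that short k-mers shared between unrelated sequences can produce false hits.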

WGS data can also be used to track the movement of bacteria and/or resistance determinants between different populations, here between humans and livestock populations in Kenya. This is important because the extent to which antibiotic usage in livestock contributes to the burden of resistance in human populations has yet to be quantified in this setting, and will have practical implications in terms of regulating antibiotic usage.

Microbiological validation will take place at a laboratory in Nairobi. Bacteria will be typed for pathotype and virulence factors, as well as AMR profiles, particularly to drugs used in Kenya as first line treatments for sepsis and urinary tract infections (chloramphenicol, gentamicin, amoxicillin and cotrimoxazole). The specificity and sensitivity of sequence-based typing will be quantified.
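
Sensitivity and specificity of sequence-based typing against the laboratory result follow the standard definitions, which the short sketch below makes explicit. The function name and the boolean encoding (True meaning resistant) are assumptions for illustration, and the example calls are made up.

```python
def sensitivity_specificity(predicted, reference):
    """Compare sequence-based calls with laboratory reference calls.

    predicted, reference: equal-length sequences of booleans, where True
    means "resistant" for a given isolate and drug.
    Returns (sensitivity, specificity).
    """
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)
    fn = sum(1 for p, r in pairs if not p and r)
    tn = sum(1 for p, r in pairs if not p and not r)
    fp = sum(1 for p, r in pairs if p and not r)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one reference positive and one negative")
    return tp / (tp + fn), tn / (tn + fp)


# Made-up calls for five isolates.
sequence_calls = [True, True, False, False, True]
culture_calls = [True, False, False, False, True]
print(sensitivity_specificity(sequence_calls, culture_calls))
```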

The project is expected to result in several high impact publications in scientific journals. In addition, by linking the bioinformatics pipeline to portable, rapid sequencing platforms there is the opportunity to help develop a one-stop bacterial typing system suitable for use even in remote settings, providing vital clinical information in real time without the need to submit specimens to a central laboratory facility, a step change in diagnostic capability that will help save lives.

The project will provide the student with multiple training opportunities in microbiology, bioinformatics and diagnostics. The student will gain experience of field work overseas (in Kenya) and, importantly, of integrating field and laboratory studies – involving both bacterial typing and genome sequencing – with state-of-the-art quantitative analysis, especially bioinformatics but also elements of statistical epidemiology and phylogeographic analyses. Formal training in bioinformatics will be provided during Year 1 and support for the student will be available throughout. The student will be part of a large group of researchers working on related problems that will provide a stimulating, supportive and world class academic environment.

References:

Woolhouse, M., Ward, M., van Bunnik, B. and Farrar, J. (2015). Antimicrobial resistance in humans, livestock and the wider environment. Philosophical Transactions of the Royal Society, Biological Sciences 370: e20140083 (7pp).

McAdam, P.R., Richardson, E.J. and Fitzgerald, J.R. (2014). High-throughput sequencing for the study of bacterial pathogen biology. Current Opinion in Microbiology 19: 106-113.

Ward, M.J., Gibbons, C.L., McAdam, P.R., van Bunnik, B.A.D., Girvan, E.K., Edwards, G.F., Fitzgerald, J.R. and Woolhouse, M.E.J. (2014). Time-scaled evolutionary analysis of the transmission and antibiotic resistance dynamics of Staphylococcus aureus CC398. Applied and Environmental Microbiology 80: 7275-7282.

Contact email address(es): [email protected] [email protected] [email protected]

Institute/Centre and/or other useful web addresses: www.epigroup.biology.ed.ac.uk http://ciie.bio.ed.ac.uk/ http://www.eid.ed.ac.uk/
