An Investigation into the Signatures of Evolution in Pathogen Effector Genes Naveed Ishaque Thesis submitted for the degree of Doctor of Philosophy University of East Anglia The Sainsbury Laboratory January 2012 © This copy of the thesis has been supplied on the condition that anyone who consults it is understood to recognize that its copyright rests with the author and that no quotation from the thesis, nor any information derived therefrom, may be published without the author’s prior written consent. a b Abstract The oomycete Hyaloperonospora arabidopsidis (Hpa) is a pathogen of Arabidopsis thaliana and a model for dissection of A. thaliana pathogen response networks. Hpa suppresses plant immunity by secreting effector proteins into the host thereby interfering with the host defence response and facilitating its own growth. The host’s resistance genes are able to recognise certain alleles of some effectors and trigger an immune response. This interaction exerts selection pressure on effectors and resistance genes and has been likened to an evolutionary arms race. It has been shown that some effectors of Hpa that increase virulence activity and have certain alleles recognised by the host are under positive selection. I investigated sequence variation in Hpa using the Illumina second generation sequencing platform. I was involved in the Hpa genome sequencing project. Using Illumina sequenced reads I isolated and removed contaminations, identified and integrated 4 Mb of novel sequence and developed new methods to evaluate genome completeness. I then trained and used various gene prediction algorithms to predict gene models for Hpa. Annotation and analysis of the gene models revealed interesting aspects about Hpa biology, including incomplete nitrogen and sulphur assimilation pathways, a reduced complement of effectors compared to other similar pathogens and a significant increase of sequence variation in candidate effectors (Baxter et al., 2010). For ease of visualisation I implemented a genome browser, which displays the gene models, sequence variation, and expression data of Hpa. I developed a novel pipeline that performs phylogenetic and evolutionary analysis to identify genes under selection. Comparative genomics analysis using this pipeline revealed that some effectors and under higher selective pressure compared to other genes. Analysis of the most highly evolving genes reveals a novel class of effectors, providing a valuable resource for further elucidating mechanisms of effector biology. c d Table of Content Acknowledgements ................................................................................................................. i Publications arising from this thesis ....................................................................................... ii Chapter 1 – General Introduction .......................................................................................... 1 1.1 Introduction and rationale ........................................................................................... 1 1.2 Evolution, variation and selection ................................................................................ 2 1.2.1 Variation ................................................................................................................ 2 1.2.2 Mutation ................................................................................................................ 2 1.2.3 Modelling variation ............................................................................................... 3 1.2.3.1 Genetic drift .................................................................................................... 3 1.2.3.2 Hardy-Weinberg equilibrium .......................................................................... 3 1.2.3.3 Coalescent theory ........................................................................................... 4 1.2.4 Evolution ................................................................................................................ 4 1.2.4.1 Natural selection ............................................................................................ 4 1.2.4.2 Neutral theory of molecular evolution ........................................................... 5 1.2.5 Evolutionary tests .................................................................................................. 5 1.2.5.1 ω ratio ............................................................................................................. 5 1.2.5.2 Evolutionary modelling with variable ω ......................................................... 5 1.2.5.3 Tests of neutrality ........................................................................................... 6 1.2.5.3.1 Tajima’s D ................................................................................................ 7 1.2.5.3.1 Fu and Li’s D, D*, F and F*. ...................................................................... 7 1.2.5.3.1 Fu’s Fs ...................................................................................................... 7 1.3 Plant-pathogen co-evolution ....................................................................................... 7 1.3.1 Plant immunity ...................................................................................................... 8 1.3.2 Effector translocation and structure ..................................................................... 9 1.3.3 Hpa effector evolution ........................................................................................ 11 1.4 Genomics and Sequencing ......................................................................................... 14 1.4.1 Second generation sequencing ........................................................................... 14 1.4.1.1 454 Pyrosequencing ..................................................................................... 15 1.4.1.2 Sequencing by synthesis ............................................................................... 16 1.4.2 Second generation sequencing applications and tools ....................................... 17 1.4.2.1 Genome assembly ........................................................................................ 17 1.4.2.2 Alignment and variant calling ....................................................................... 18 e 1.4.2.3 Transcriptomics ............................................................................................ 19 1.5 Objectives of the project ............................................................................................ 20 Chapter 2 – Materials and Methods .................................................................................... 21 2.1 Biological material and sequencing ........................................................................... 21 2.1.1 Sanger sequencing............................................................................................... 21 2.1.2 Illumina Sequencing ............................................................................................ 21 2.1.2.1 DNA extraction. ............................................................................................ 21 2.1.2.2 Illumina DNA library preparation and sequencing. ...................................... 21 2.1.2.3 Quality checking the Illumina preparation and sequencing......................... 22 2.1.3 Illumina cDNA sequencing. .................................................................................. 23 2.1.4 454 Sequencing ................................................................................................... 23 2.2 Software and protocols .............................................................................................. 24 2.2.1 Assembly ............................................................................................................. 24 2.2.1.1 Assembly of Sanger reads ............................................................................ 24 2.2.1.2 Short read assembly ..................................................................................... 25 2.2.1.3 Hybrid assembly ........................................................................................... 25 2.2.2 Alignment ............................................................................................................ 25 2.2.2.1 DNA alignment ............................................................................................. 25 2.2.2.2 cDNA alignment ............................................................................................ 28 2.2.3 Gene predictions ................................................................................................. 29 2.2.4 Gene annotation.................................................................................................. 29 2.2.5 Evolutionary analysis ........................................................................................... 30 Chapter 3 – Use of sequencing by synthesis to evaluate and improve the Hyaloperonospora arabidopsidis genome assembly .......................................................................................... 31 3.1 Introduction ............................................................................................................... 31 3.2 Results and discussion ............................................................................................... 32 3.2.1 Establishing a short read de-novo assembly for Hpa Emoy2 .............................

Table of Contents

Sequencing and Comparative Analysis of <I>De Novo</I> Genome

Assessment of Next Generation Sequencing Technologies for <I>De Novo</I> and Hybrid Assemblies of Challenging Bacter

Genome Evolution of Symbiodiniaceae

Nring Thesis Corrected 161020.Pdf

Optimizing Experimental Design for Genome Sequencing and Assembly with Oxford Nanopore Technologies John M

Genomic Comparison of Mycoplasma Species Isolated from Commercial Chickens in South Africa by Amanda Beylefeld Submitted in Fulf

2014 Adetunjimodupeo

Design and Evaluation of Two Hybrid Genome Assembly Approaches

Long-Read Sequencing in Deciphering Human Genetics to a Greater Depth

The Utility of Long Read Sequencing for the Discovery of Genomic Retroviral Insertions and for Hybrid Genome Assembly

History and Current Approaches to Genome Sequencing and Assembly

Genome Assembly and Discovery of Structural Variation in Cultivated Potato Taxa