A Computational Genomics Perspective

Abstract of thesis entitled

Understanding the Pathogenic Fungus Penicillium marneffei: A Computational Genomics Perspective

by James J. Cai for the degree of Doctor of Philosophy at The University of Hong Kong in May 2006

Penicillium marneffei, a thermally dimorphic fungus that alternates between a filamentous and a yeast growth form in response to changes in its environmental temperature, has become an emerging fungal pathogen endemic in Southeast Asia. Defining the genomics of P. marneffei will provide a better understanding of the fungus.

This thesis reports the draft sequence of the P. marneffei genome assembled from 6.6× coverage of the genome through whole genome shotgun sequencing. The 31 Mb genome obtained from the assembly contains 10,060 protein-coding genes. The complete mitochondrial genome is 35 kb long and its gene content and gene order are very similar to those of Aspergillus.

An annotation system and P. marneffei genome database (PMGD) were developed to allow a preliminary annotation of the sequences and provide an intuitive graphic interface to give curators and users ready access to the annotation and the underlying evidence, and a Matlab-based software package, MBEToolbox, was developed for data analysis in phylogenetics and comparative genomics. A well-designed and structured annotation system and powerful sequence analysis software are essential requirements for the success of large-scale genome analysis projects.

Analysis of the gene set of P. marneffei provided insights into the adaptations required by a fungus to cause disease. The genome encodes a diverse set of putative virulence genes such as proteinase, phospholipase, metacaspase and agglutinin, which may enable the fungus to adhere to, colonise and invade the host, adapt to the tissue environment, and avoid the host's humoral and cellular defences of the innate and adaptive immune responses.
A gene cluster involved in biosynthesis of melanin, a known virulence factor in some other pathogenic fungi, was also identified in the genome, indicating that P. marneffei may produce melanin or melanin-like immunosuppressive compounds that protect the fungus against immune effector cells. More interestingly, the P. marneffei genome contains more intragenic tandem repeats (IntraTRs) than other fungi. These IntraTRs encoding repeat domains/motifs may create quantitative variation in surface proteins, allowing the fungus to 'disguise' itself to slip past the vigilant defences of the host immune system. The genome sequence of P. marneffei also revealed a number of genes associated with mating processes and sexual development, suggesting an unidentified sexual cycle in the fungus.

The extent and evolutionary patterns of duplicate genes in P. marneffei and other ascomycetes were compared. All ascomycetes show a certain degree of redundancy (though its extent can vary considerably), which may provide the foundation for the specialisation of fungal genes and form the basis for fungal diversification. An inverse relationship between the lineage specificity of a gene and the gene's evolutionary rate was also discovered, implying that an accelerated evolutionary rate may be responsible for the emergence of lineage-specific genes.

The genome sequence of P. marneffei has provided our first glimpse into the genomic basis of the physiology of the dimorphic filamentous fungus.

Understanding the Pathogenic Fungus Penicillium marneffei: A Computational Genomics Perspective

BY
James J. Cai
M.D., Henan Medical University, 1996
M.S., University of New South Wales, 2001

THESIS
Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy at The University of Hong Kong
May 2006

To Yan

"Any living cell carries with it the experiences of a billion years of experimentation by its ancestors."
Max Delbrück (1949)

DECLARATION

I declare that this thesis represents my own work, except where due acknowledgement is made, and that it has not been previously included in a thesis, dissertation or report submitted to this University or to any other institution for a degree, diploma or other qualifications.

Signature:
Date:

ACKNOWLEDGEMENTS

First of all, special thanks go to my principal supervisor, Professor Kwok-yung Yuen, for his enthusiasm and support during the course of my study.

My heartfelt thanks go to Dr. David K. Smith and Dr. Xuhua Xia, who introduced me to the fascinating world of bioinformatics and molecular evolution.

Thanks to my friends and colleagues for their moral support and technical assistance over the past four years, especially Dr. Patrick Woo, Dr. Sussana Lau, and Jade, Huang Yi, Ken, Haw, Candy, Rachel ...

I am also grateful to my external mentor Dr. Gavin Huttley and fellow colleagues Peter, Ray, Helen and Brett at the Australian National University.

Finally, I am very grateful to my wife and my parents. Without their support, this work would not have been possible.

TABLE OF CONTENTS

Declaration
Acknowledgements
List of Figures
List of Tables
Abbreviations
Glossary
Introduction
Chapter 1: The draft genome sequence of Penicillium marneffei
  1.1 Introduction
  1.2 Literature Review
    1.2.1 General fungal biology
    1.2.2 P. marneffei, as an important fungal pathogen
    1.2.3 Penicilliosis marneffei
    1.2.4 Fungal genome projects
  1.3 Materials and Methods
    1.3.1 Strain and DNA preparation
    1.3.2 Library construction, shotgun sequencing
    1.3.3 Sequence assembly
    1.3.4 Data release
  1.4 Results
    1.4.1 Assembly and general characteristic
    1.4.2 Genome architecture and co-linearity
    1.4.3 Gene duplications (multigene families) and comparisons
    1.4.4 Interspecies proteome comparison
    1.4.5 Lineage-specific genes
    1.4.6 Cell signalling and morphogenesis
    1.4.7 Potential mating ability
    1.4.8 Putative virulence genes
    1.4.9 Cell wall antigens and biosynthetic genes
  1.5 Discussion
Chapter 2: Penicillium marneffei genome database and annotation pipeline
  2.1 Introduction
  2.2 Literature Review
    2.2.1 Methods for predicting protein function
    2.2.2 Software/database systems for protein function prediction
    2.2.3 The art of gene finding
  2.3 Implementation
    2.3.1 Annotation pipeline
    2.3.2 Assembly process
    2.3.3 Gene finding
    2.3.4 Database and databank to store results
    2.3.5 Perl source code collection
    2.3.6 Genome browser configuration
    2.3.7 Synteny identification
  2.4 Results
    2.4.1 Statistics of assembly
    2.4.2 Genome size estimation
    2.4.3 Accuracy of gene finding
    2.4.4 Combination of gene finding
    2.4.5 Database and databank to store results
  2.5 Discussion
Chapter 3: Mitochondrial genome of Penicillium marneffei
  3.1 Introduction
  3.2 Materials and Methods
    3.2.1 Library construction and sequence assembly
    3.2.2 Mitochondrial DNA sequence annotation
    3.2.3 Phylogenetic analysis
    3.2.4 Mitochondrial DNA sequences in nuclear genome
  3.3 Results and Discussion
    3.3.1 Gene content and genome organisation
    3.3.2 Protein coding genes
    3.3.3 Genetic code and codon usage
    3.3.4 tRNA genes
    3.3.5 Other RNA genes
    3.3.6 Group I introns
    3.3.7 Mitochondrial DNA sequences in nuclear genome
Chapter 4: Genomic evidence for the presence of melanin biosynthesis gene cluster in Penicillium marneffei
  4.1 Introduction
  4.2 Literature Review
    4.2.1 Potential virulence factors
    4.2.2 Genomic approaches in identification of virulence factors
  4.3 Materials and Methods
    4.3.1 Identification of melanin biosynthesis genes in P. marneffei
    4.3.2 Multiple alignments and phylogenetic analyses
  4.4 Results and Discussion
    4.4.1 Melanin gene cluster present in P. marneffei
    4.4.2 Disrupted aflatoxin biosynthesis gene cluster in P. marneffei
    4.4.3 Absence of penicillin biosynthesis genes in P. marneffei
Chapter 5: Mating abilities in Penicillium marneffei
  5.1 Introduction
  5.2 Literature Review
    5.2.1 Mating in hemiascomycete yeasts
    5.2.2 Mating in filamentous ascomycetes
  5.3 Materials and Methods
  5.4 Results and Discussion
    5.4.1 Homologs of known sexual genes
    5.4.2 Mating type genes
    5.4.3 Mating pheromone precursor genes
    5.4.4 Mating pheromone processing genes
    5.4.5 Mating pheromone receptor and other GPCRs
Chapter 6: Exploring the genetic components associated with the dimorphism of Penicillium marneffei
  6.1 Introduction
  6.2 Materials and Methods
    6.2.1 Sequence similarity
    6.2.2 Phylogenetic Analysis
  6.3 Results and Discussion
    6.3.1 Perception of external stimuli by cellular sensors
    6.3.2 Transduction of biochemical signal
    6.3.3 Alteration of the genomic expression
    6.3.4 Structural reorganization towards the morphological change
Chapter 7: Intragenic tandem repeats in Penicillium marneffei and other ascomycetes
  7.1 Introduction
  7.2 Materials and Methods
    7.2.1 Identification of coding tandem repeats
    7.2.2 Sequence analysis
  7.3 Results and Discussion
Chapter 8: Extent and evolutionary pattern of duplicate genes in Penicillium marneffei and other ascomycetes
  8.1 Introduction
