
The Pennsylvania State University

The Graduate School

Huck Institute of the Life Sciences

A COMPARATIVE GENOMIC INVESTIGATION OF

NICHE ADAPTATION IN FUNGI

A Dissertation in

Integrative Biosciences

by

Venkatesh Moktali

© 2013 Venkatesh Moktali

Submitted in Partial Fulfillment of the Requirements for the Degree of

Doctor of Philosophy

August 2013

The dissertation of Venkatesh Moktali was reviewed and approved* by the following:

Seogchan Kang, Professor of Plant Pathology and Environmental Microbiology, Dissertation Advisor, Chair of Committee

David M. Geiser, Professor of Plant Pathology and Environmental Microbiology

Kateryna Makova, Professor of Biology

Anton Nekrutenko, Associate Professor of Biochemistry and Molecular Biology

Yu Zhang, Associate Professor of Statistics

Peter Hudson, Department Head, Huck Institute of the Life Sciences

*Signatures are on file in the Graduate School

Abstract

The Kingdom Fungi contains a diverse array of members adapted to disparate and often hostile surroundings, including living plant and animal tissues, soil, aquatic environments, other microorganisms, dead animals, exudates of plants and animals, and even nuclear reactors. The ability of fungi to survive in these niches is supported by key enzymes and proteins that can metabolize harmful extraneous factors.

Characterizing these key proteins gives a glimpse of the molecular mechanisms underpinning adaptation in these organisms. Cytochrome P450 proteins (CYPs) are among the most diversified protein families and are involved in a number of processes that are critical to fungi. I evaluated the evolution of CYPs in order to understand niche adaptation in fungi. Towards this goal, a previously developed database, the Fungal Cytochrome P450 Database (FCPD), was improved, and several features were added to allow systematic comparative genomic and phylogenomic analysis of CYPs from numerous fungi. Specifically, an in-house platform was developed to standardize and improve the protein sequence clustering procedure; more than 100 fungal and non-fungal genomes were added; a putative functional classification of CYPs into three broad categories was added; and the CYP clan/family classification was extended to 117 CYP clans and 292 families. With these new features, FCPD 1.2 (http://p450.riceblast.snu.ac.kr/) was published with a systematic classification of 22,940 CYPs from 213 species.

Using the CYP data from FCPD 1.2, I carried out a detailed phylogenomic analysis of 6108 CYPs belonging to 51 species from the subphylum Pezizomycotina. The analysis revealed CYPomes that were specific to the niches occupied by the different fungi. There was preferential presence of CYP families among pathogenic species. I introduced a putative functional diversification ratio to identify divergence of “CYPomes”; the ratio suggests that the non-pathogenic fungi tended to have more CYP families/clans than the pathogenic fungi. I also identified the CYP clans ancestral to Pezizomycotina fungi; the results confirm the previously estimated ancestral CYP clans CYP51 and CYP61 and add CYP52 and CYP58 to the set of clans present in the last common ancestor of Pezizomycotina fungi. Putative metabolic classification of CYPs in the group suggests an increase in CYPs involved in secondary metabolism among fungal plant pathogens. I also carried out family evolution analysis with the CYP data; the analyses suggested a high number of gains and losses among serious plant pathogens such as Magnaporthe oryzae and Fusarium spp.

Calcium is a ubiquitous ion that plays a major role in myriad processes in the fungal cell. I developed the Fungal Calcium Signaling Database (FCSD: http://fcsd.ifungi.org/) in order to understand the roles of genes interacting directly or indirectly with calcium. The computational pipeline standardized in the FCPD was used to cluster the calcium signaling gene dataset. Comparative genomic analyses of calcium signaling genes indicated varying complexity of the calcium-signaling pathway across fungal phyla. I also carried out detailed gene family analyses of different calcium signaling genes; niche-specific gene gains were observed through these analyses. I believe that the various features built into FCSD encourage community participation and research on understanding calcium signaling in fungi.


Contents

List of Tables
List of Figures
Acknowledgements

1 Introduction
1.1 History of Fungal Research
1.2 Diversity in Kingdom Fungi
1.3 Variation in Fungal genome content
1.4 Emergence of Comparative Genomics and “Phylogenomics”
1.5 Investigating evolution via comparative analysis of protein families

2 Pilot Project: Optimization of Cytochrome P450 sequence clustering procedure
2.1 Background
2.1.1 Cytochrome P450 proteins
2.1.2 Cytochrome P450 nomenclature system
2.1.3 FCPD Pipeline
2.1.4 Fungal cytochrome P450 database: drawbacks
2.2 Results and Discussion
2.2.1 Genus Aspergillus
2.2.2 P450 Analysis Platform
2.2.3 Species-specific CYPs
2.2.4 CYPs involved in Secondary Metabolism Clusters
2.2.5 N. fischeri cluster
2.2.6 Putative secretory CYPs
2.3 Materials & Methods
2.3.1 Data
2.3.2 Analyses Tools
2.4 Conclusion

3 Systematic and searchable classification of cytochrome P450 proteins encoded by fungal and oomycete genomes
3.1 Background
3.2 Results and Discussion
3.2.1 Identification of CYPs and optimization of clustering parameters
3.2.2 CYP clustering in fungal cytochrome P450 database (FCPD) 1.2
3.2.3 Wide variation of the CYPome
3.2.4 Phyletic distribution of CYP families and clans in fungi
3.2.5 Functional annotation and classification of CYP clusters
3.2.6 Detailed analysis of specific clans
3.2.6a Clans 51 and 61
3.2.6b Clans 65 and 68
3.2.6c Clan 505
3.2.6d Clan 52
3.2.6e Clan 53 and Clan 504
3.2.6f Clan 533
3.2.7 CYPs in Mucoromycotina, Blastocladiomycota and Oomycota
3.2.8 CYPs with unusual phyletic profiles
3.3 Conclusion
3.4 Materials and methods
3.4.1 Acquisition of sequence data and phylogenetic analyses
3.4.2 Clustering the CYPs using BLASTp and the optimized Tribe-MCL algorithm
3.4.3 Clan Identification
3.4.4 Classification of CYPs into putative functional categories
3.4.5 Online database architecture
3.5 Tables
3.7 Figures

4 Phylogenomic investigation of Cytochrome P450 proteins from 51 Pezizomycotina species
4.1 Background
4.2 Results
4.2.1 CYP clan/family diversity in Pezizomycotina
4.2.2 Metabolic distribution of CYPs
4.2.3 CYP clan gains and losses in Pezizomycotina
4.2.4 CYPs under selection pressure
4.2.5 CYP duplication in Pezizomycotina
4.2.6 Taxa and species-specific CYPs
4.2.7 Secretory CYPs of Pezizomycotina
4.3 Discussion
4.3.1 CYP functional diversity is correlated with pathogenic lifestyles
4.3.2 CYP clan losses and gains are correlated with fungal lifestyles
4.3.3 Preferential gains/losses of CYPs in specific metabolic categories
4.3.4 CYP clan evolution
4.3.5 Species and Class-specific CYPs indicate niche specific diversification
4.3.6 Reasons to find species-specific CYPs
4.4 Conclusion
4.5 Materials and methods
4.5.1 Description of the computational pipeline
4.5.2 Comparison of Pezizomycotina with characterized CYPs
4.5.3 Calculating dN/dS ratio
4.5.4 Phylogenetic tree construction
4.5.5 CYP clan gain and loss analysis
4.5.6 Finding secretory CYPs
4.5 Supplementary Figures
4.6 Supplementary Tables

5 Fungal Calcium Signaling Database (FCSD): A community resource for calcium signaling in fungi
5.1 Background
5.2 Construction and content
5.2.1 Computational pipeline utilized in FCPD
5.2.2 Searching and accessing sequence data in FCSD
5.2.3 BLAST utility
5.2.4 Exploring the calcium-signaling pathway
5.2.5 Phenotype data exploration tool
5.2.6 Blog and twitter feeds on calcium signaling
5.2.7 Reference page for identifying references
5.3 Utility and Discussion
5.3.1 Diversity in the calcium-signaling pathway in fungal phyla
5.3.2 Accuracy and annotation correction
5.3.3 Simplified reference search
5.3.4 Community applications for research
5.4 Evolution and comparative genomics of calcium signaling genes
5.4.1 Calcium permeable channel proteins (CCH1, MID1, ECM7, FIG1 and YVC1)
5.4.2 Calcium pumps (PMR1, PMC1, ENA1, SPF1 and NEO1)
5.4.3 Calmodulin and Calcineurin
5.4.4 Calcium exchanger proteins (VCX1, VNX1)
5.4.5 Phospholipase C
5.5 Conclusions
5.6 Tables
5.7 Figures

6 Summary and Conclusion
6.1 A repository of fungal cytochrome P450 protein sequences
6.2 Phylogenomic analyses of CYPs reveal taxa specific gains and losses
6.3 Community platform for fungal calcium signaling
6.4 Calcium signaling complexity varies greatly between fungal phyla
6.5 Future work

Bibliography


List of Tables

2.1 Distribution of key protein families in different fungi [1]

2.2 Features of the 13 published genomes belonging to the genus Aspergillus [2].

2.3 Genome comparison between and within species of Aspergillus [3]

2.4 Number of species-specific P450s in the genus Aspergillus

2.5 Genes of the gibberellin biosynthesis gene cluster [4].

2.6 Overlap between the genes present in the clusters of A. nidulans and N. fischeri. The colors indicate similar function of the members.

2.7 Number of putative secretory CYPs across the Aspergillus species

2.8 Average ratio of secretory proteins and ratio of secretory CYPs

2.9 Functional domains present among the secretory CYPs in different Aspergilli

3.1 CYP families and clans

3.2 Characterized CYPs used for functional classification

3.3 Parameter optimization for clustering

3.4 The largest 30 clusters that contain only fungal and oomycete sequences.

3.5 Blast hits to characterized CYPs.

3.6 Clans involved in primary, secondary and xenobiotic metabolism

3.7 Top ten CYP families in fungi

4.1 Average gain/loss in every class/order in Pezizomycotina

5.1 32 putative calcium-signaling genes of the core calcium-signaling pathway

5.2 Distribution of functionally characterized families of P-type ATPases (families 1–9) encoded within the genomes of 26 eukaryotes[5]


List of Figures

1.1 Genome content and size variation across the different fungal phyla

1.2 Variation in the genome and gene content across various fungi

1.3 Number of genomes sequenced since the first genome of S. cerevisiae in 1996.

1.4 Phylogenomic pipeline used across various analyses

2.1 Action of CYPs in the endoplasmic reticulum.

2.1 The Species tree for Aspergillus genus

2.2 The Fungal cytochrome P450 database pipeline [2]

2.3 The P450 Analysis Platform Pipeline

2.4 CYP Name Analysis

2.5 CYP cluster analysis

2.6 CYP search

2.7 Submit form

2.8 Genome view of the A. flavus gene (AFL2G_07229, see (a)) & A. terreus gene (ATEG_04721.1, see (b)) with the exon/intron structure and the configuration of the Interpro domains associated with the exons.

2.9 (a) Gibberellin cluster in G. fujikuroi, (b) cluster from A. oryzae similar to the viridin biosynthesis cluster from Trichoderma virens, (c) a putative trichothecene biosynthesis cluster observed in N. fischeri, (d) a putative broken gibberellin biosynthesis-like cluster seen in A. nidulans.

2.10 Dot plots of (a) A. nidulans chromosome 6 against the rest of the Aspergilli, (b) A. nidulans chromosome 8 against the rest of the Aspergilli, (c) N. fischeri against the rest of the Aspergilli (dot plots taken from the Broad Institute Aspergillus comparative genomics page - http://www.broad.mit.edu/annotation/genome/aspergillus_terreus/Dotplot.html)

3.1 Phylogenetic relationships among taxa in FCPD 1.2 and the number of CYPs and clusters in each taxon


3.2 Range of CYPome sizes across various kingdoms and fungal phyla

3.3 Optimizing parameters for clustering

3.4 Earliest CYP families in fungi

3.5 CYP families follow power law distribution

3.6 Pipeline employed in FCPD 1.2 version.

3.7 Phylogenetic tree of CYP65 in Pezizomycotina.

3.8 Phylogenetic tree of CYP68

3.9 Neighbor joining tree of CYP51

3.10 Neighbor joining tree of CYP61

3.11 Neighbor joining tree of CYP65

3.12 Neighbor joining tree of CYP505-CYP541

3.13 Maximum-likelihood tree of CYP52

3.14 Maximum-likelihood tree of CYP504

3.15 Phylogenetic tree of CYP540

3.16 Phylogenetic tree of CYP544

3.17 Phylogenetic tree of CYP5025

3.18 Phylogenetic tree of CYP645

4.1 Properties of fungi juxtaposed with phylogeny, CYP clans and number of CYPs belonging to three metabolic categories

4.2 CYP clan gain and loss with species-specific CYPs, Duplicated CYPs, and Secretory CYPs in Pezizomycotina

5.1 FCSD computational pipeline

5.2 Distribution of calcium signaling genes across organisms


5.3 Distribution of core calcium signaling genes among fungal orders

5.4 The Diagram tab

5.5 Phenotype data management tab

5.6 Calcium signaling genes in 5 classes across different fungi

5.7 Gene family gain and loss in P-type ATPase PMC1 gene family.

5.8 Gene family gain and loss in P-type ATPase ENA1 gene family.

5.9 Gene family gain and loss in P-type ATPase VCX1 gene family.

5.10 Gene family gain and loss in PLC1 gene family.


Acknowledgements

I am greatly indebted to a number of people who have made this thesis possible. Foremost, I would like to express my deepest gratitude to my advisor and mentor Prof. Seogchan Kang for providing an excellent environment for graduate research. It is difficult to imagine my PhD without his encouragement and guidance. I would like to thank Prof. David Geiser for reminding me of the biological relevance of several of my questions. His courses on fungal biology and mycotoxins have been critical in widening my understanding of fungi. I deeply appreciate my committee members Prof. Kateryna Makova, Dr. Anton Nekrutenko and Dr. Yu Zhang for helping me become a better scientist with their questions and suggestions. I am very thankful to Dr. Yong-Hwan Lee for providing me the wonderful opportunity to visit his lab at Seoul National University and work on the Fungal Cytochrome P450 Database.

I want to sincerely thank the Kang lab members, Bongsoo Park, Vasileios Bitas, Hye-Seon Kim, Jung-Eun Kim, and Douglas Whalen, for all the engrossing discussions, fun, support and critique of my research, and especially Bongsoo for supporting and helping me in learning PHP and building database systems. I want to thank my collaborators at Dr. Lee's lab at SNU: Dr. Jongsun Park for helping me enormously when I began my thesis research, and Jaeyoung Choi for his infectious enthusiasm about being in this wonderful era of genomics and for pointing me to so many useful tools and skills that I was able to use in this thesis.

I would like to thank my colleagues in the Bioinformatics and Genomics (BG) program, Ti-cheng Chang, Zhaorong Ma and Zhenhai Zhang, for all the fun we had working together on assignments and class projects. I also thank BG program seniors Dr. Yogeshwar Kelkar, Dr. Melissa Wilson, Dr. Chungoo Park and Dr. Sunando Roy for guiding me in the right direction at the beginning of my PhD and for all the brainstorming and feedback during my candidacy and comprehensive examinations.

I also want to thank my wonderful friends Samudra Sengupta, Jesal Kanani, Kshitij Hasabnis, Navneet Chahal, Manasi Kamat, Vamsi Krishna Adhikamsetty, Suhas Bambardekar, Ujjawal Gandhi, Vivek Narayan, Ninad Mokhariwale and Pooja Nadkarni; life at Penn State would not have been the same without them. From them I learnt to take life less seriously, and they have contributed indirectly to my thesis.

My sincere thanks to the ever-smiling Janice Kennedy and Mike Radis at the Huck Institutes of the Life Sciences; they have been phenomenal in their help to new and old graduate students of the BG program. I thank BG program directors Prof. John Carlson and Dr. Shashikant Cooduvalli for being so wonderful in supporting BG graduate students, for giving me the opportunity to host journal clubs, and for guiding me in organizing the first ever student-organized Bioinformatics and Genomics retreat at Penn State.

Most importantly, I would like to thank my family for supporting me in this endeavor: my dad for letting me choose a PhD in bioinformatics and genomics over taking charge of his company, and my mother, who always believed in me and has been a source of comfort and assurance through the difficult phases. I want to thank my sisters Meena Bhujang and Rashmi Khasnis for always being there for me and being proud of even the smallest of my achievements. My deepest thanks to my lovely wife, Bhargavi, for her support and patience through the last years of my graduate school at Penn State. I want to thank my high school teacher Ms. Bapat, who believed in my abilities and inspired me to achieve my potential.

Lastly, I want to convey my heartfelt thanks and gratitude to Rajini and Suresh Pachchapur, my aunt and uncle, who tutored me and instilled in me the values and ingredients to be a respectful and successful person; to them I dedicate my thesis.


Chapter 1 Introduction

1.1 History of Fungal Research

Humans have known fungi since ancient times, as attested by ancient wall paintings of mushrooms throughout the world. The early Romans and the Egyptians used yeast for baking and fermentation; bread was baked as early as 10,000 BC. Probably the earliest attempt at understanding fungi came from Giambattista della Porta in 1588, who described fungal spores as “tiny black seeds.” With the invention of microscopes people were able to observe fungi more closely; subsequently Joseph Pitton de Tournefort drew germinating spores in 1707. In 1729, the Florentine scientist Pier Antonio Micheli demonstrated that spores were responsible for producing new generations of organisms [6]. Much later, in the early 1800s, Prevost illustrated the several stages of germinating and growing bunt spores in detail using his microscope and concluded that these spores carried the disease. In 1879, Anton de Bary coined the term “symbiosis” after looking at the mutually beneficial relationship between fungi and algae; at around the same time de Bary named the potato fungus Phytophthora infestans, thus beginning the surge in investigations of fungal infections in crop plants and other commercially significant plants. The first major fungal infestation was seen in the rubber plantations in the British colonies across the world; Thomas Petch realized that the same fungus was infecting rubber leaves all over the world, and it was known to mycologists as Dothidella ulei [6]. With each epidemic of crop plants and commercially important trees, mycologists learnt more about the abilities of fungi and their importance in industry [7].


1.2 Diversity in Kingdom Fungi

There are an estimated 5.1 million fungal species on the planet [8], roughly six times the number of plant species. Among all organisms, perhaps only the bacteria rival this diversity. Fungal diversity can be observed in the variety of habitats that fungi live in: they can be found in marine water, fresh water and soils, on other fungi, with other fungi on plant roots, as plant pathogens, as lichens, as pathogens of arthropods and invertebrate animals, and inside plant leaves and stems [8]. Inside plant leaves and stems they live as endophytes that help the plant tolerate abiotic stress [9] such as drought [10], high temperature [11] and salinity [12]. Fungal endophytes also help plants withstand biotic stress by activating the plant defense response [13]. More than 100 species of fungi are pathogenic to humans [14]; these fungi cause skin rashes and infections of the skin, nails, and even hair [15]. In addition, they are involved not only in a host of respiratory disorders caused by inhaling fungal spores [16] but also in food poisoning due to mycotoxins such as aflatoxins [17]. In arthropods [18] they colonize the tissues and cause serious lesions as well as death.

The diversity is also seen in the myriad applications of fungi in industry, including fermentation and food processing [19], biofuels, and production of industrial-scale enzymes [20]. The fermentation industry uses fungi to produce ethanol, a variety of organic acids [21], antibiotics, and beverages. Penicillium and related species have been used extensively to produce various types of cheese and are responsible for the characteristic taste of these cheeses [22, 23]. A number of Aspergilli are used for the production of commercially important enzymes and other compounds: citric acid is mainly produced by Aspergillus niger all over the world, and cellulase produced by many fungi is used to give the stone-washed texture to denim jeans as well as to make fabrics cleaner and shinier [24]. Cyclosporin A, used as an immunosuppressant, is commercially produced from strains of Tolypocladium inflatum [25]. White rot fungi such as Phanerochaete chrysosporium [26] and brown rot fungi such as Postia placenta [27] have been used to degrade lignin in wood to yield cellulose in the paper industry. Secondary metabolites produced by Aspergillus and Fusarium species are used as drugs for several human diseases; lovastatin is one such secondary metabolite, used to lower blood cholesterol levels [28].

Fungi have also been found in extreme environments, and some have garnered the name “extremophiles” [29]. A spectacular example is the fungus Cladosporium sphaerospermum, which can survive in the radiation-heavy environment of the Chernobyl nuclear reactor in Ukraine. The melanin pigment in this fungus can protect it from harmful radiation. Surprisingly, this fungus can utilize this radiation as a source of energy to grow and survive [30]. The tremendous potential of fungi in industry and their ability to survive in the harshest surroundings make them fascinating organisms to work on.

1.3 Variation in Fungal genome content

The Fungal Kingdom consists of 7 phyla, namely Ascomycota, Blastocladiomycota, Chytridiomycota, Glomeromycota, Microsporidia, Neocallimastigomycota, and Basidiomycota. It also contains 4 taxa with uncertain placements (“incertae sedis”). The diversity of fungal niches is in turn reflected in the variation in genome size and architecture between fungal species and within fungal genera (Figure 1.1). Fungal genome sizes vary from 9-124 Mb in the phylum Ascomycota to 8-176 Mb in the phylum Basidiomycota (JGI MycoCosm; http://genome.jgi-psf.org/programs/fungi/index.jsf). The main contributors to the variation in genome size are the differences in the complexity of these fungi and the niches they occupy. For instance, there has been a large expansion in the genome size of filamentous fungal pathogens compared to their non-pathogenic counterparts [31]. Similarly, the Microsporidia, which have developed a predominantly simple and parasitic lifestyle, have extremely reduced genomes and have consequently lost important protein families and functions [32]. A comparison of fungal lifestyles such as biotrophy (a lifestyle dependent on a living host organism), hemibiotrophy (a lifestyle that depends on a living host and continues on dead tissue), necrotrophy (a life cycle on dead tissue) and symbiotrophy (nourishment through symbiosis) indicates varying genome sizes and content (Figure 1.2). While biotrophs seem to have larger genomes with higher repeat content and fewer genes, hemibiotrophs have more genes in comparatively smaller genomes. Thus there appears to be a direct correlation between genome content and the adaptability and pathogenicity of fungi [31].


Figure 1.1: Genome content and size variation across the different fungal phyla, indicating the divergence in genome size and content across all fungi.


Figure 1.2: Variation in the genome and gene content across various fungi that span different niches and lifestyles [33].

There have been at least three independent ancestral genome duplication events in fungi, one each in the Ascomycota, Mucoromycotina (incertae sedis) and Blastocladiomycota [2]. These whole-genome duplication events, as well as polyploidy events in several individual species, have further increased the variation in genome size and content, giving rise to the current diversity. It has also recently been observed that fungi carry mobile genetic material [34], similar to bacterial plasmids, in the form of dispensable or supernumerary chromosomes, which have helped fungi such as Fusarium species [35], Mycosphaerella graminicola [36], Cochliobolus heterostrophus, Fusarium solani and Leptosphaeria maculans adapt to their niches. These dispensable chromosomes typically carry genes that play an important role in the niche occupied by the respective fungi. Some of these chromosomes have also been shown to transfer horizontally between strains, allowing the transfer of pathogenicity and other important traits [37]. Such phenomena lead to variation in genomes from strain to strain within a species.

1.4 Emergence of Comparative Genomics and “Phylogenomics”

With the recent innovations in sequencing methods, a number of fungal genomes have been sequenced [38]. The first fungal genome to be sequenced was that of Saccharomyces cerevisiae [39] in 1996, followed by the genome of the fission yeast Schizosaccharomyces pombe [40] in 2002 (Figure 1.3). Several filamentous fungal genomes [41], including those of Aspergillus nidulans [42], Magnaporthe grisea [43] and Ustilago maydis [44], were also sequenced subsequently. With the introduction of new sequencing technologies in 2007 there was a sudden rise in genome sequencing in all organisms. More than 200 fungal genomes have now been sequenced, and 1000 more (http://1000.fungalgenomes.org/) are slated to follow in the next 5 years. This explosion in fungal genome sequencing has given rise to a flurry of research based on these genomic sequences [45, 46]. As researchers compare the diverse fungal genomes, they are discovering the myriad mechanisms that these fungi have amassed in order to adapt to their hosts and environments. This comparison of genomes to identify phylum- or organism-specific adaptation or specialization is called comparative genomics.

Figure 1.3: Number of genomes sequenced since the first genome of S. cerevisiae in 1996. (y-axis: No. of Genomes Sequenced, 0 to 30; x-axis: year, 1994 to 2014.)

Many comparative genomic analyses have been carried out to provide critical insights into pathogenicity and population divergence [47], identification of horizontally acquired novel virulence genes [48] and niche-specific colonization strategies [49]. These comparisons of genomes involve comparing gene/protein sequence datasets and building multi-gene phylogenies to identify patterns of relevance. This building of multi-gene phylogenies to understand evolutionary events is called phylogenomics. The pipeline shown in Figure 1.4 has been used extensively throughout the different projects in this thesis. These multi-gene phylogenies can help us understand the evolution of fungal niches by examining gene gain, loss or amplification.
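As a rough illustration of what examining gene gain, loss or amplification can look like in practice, the following Python sketch tallies gene-family copy numbers per species and labels each family relative to a reference species. It is only a simplified pairwise comparison, not the phylogeny-aware analysis carried out with the pipeline in Figure 1.4; the species labels, gene identifiers and copy counts are made-up toy data, and the function names are hypothetical.

```python
from collections import Counter

# Toy (species, gene_id, family) records; all values are invented for illustration.
TOY_ASSIGNMENTS = [
    ("species_A", "a1", "CYP51"),
    ("species_A", "a2", "CYP61"),
    ("species_A", "a3", "CYP52"),
    ("species_B", "b1", "CYP51"),
    ("species_B", "b2", "CYP51"),
    ("species_B", "b3", "CYP65"),
    ("species_B", "b4", "CYP52"),
]


def family_counts(assignments):
    """Count gene copies per family for every species."""
    counts = {}
    for species, _gene_id, family in assignments:
        counts.setdefault(species, Counter())[family] += 1
    return counts


def compare_to_reference(counts, reference, species):
    """Label each family in `species` relative to `reference` by copy number."""
    ref, other = counts[reference], counts[species]
    labels = {}
    for family in sorted(set(ref) | set(other)):
        r, o = ref.get(family, 0), other.get(family, 0)
        if r == 0 and o > 0:
            labels[family] = "gain"           # family absent from the reference
        elif r > 0 and o == 0:
            labels[family] = "loss"           # family absent from the compared species
        elif o > r:
            labels[family] = "amplification"  # more copies than the reference
        elif o < r:
            labels[family] = "reduction"      # fewer copies than the reference
        else:
            labels[family] = "unchanged"
    return labels


if __name__ == "__main__":
    counts = family_counts(TOY_ASSIGNMENTS)
    for family, label in compare_to_reference(counts, "species_A", "species_B").items():
        print(family, label)
```

A pairwise comparison like this cannot say on which branch of the species tree a change happened; placing gains and losses on branches requires mapping the counts onto a phylogeny.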


Figure 1.4: Phylogenomic pipeline used across various analyses

1.5 Investigating evolution via comparative analysis of protein families

The availability of genomic data has made it convenient to explore the roles of certain protein families in the adaptation of fungi to niche environments. The gene-for-gene concept introduced by H. H. Flor in 1942 elucidated the classic battle between plants and their pathogens, with each evolving genes targeted at overcoming the effects of the other. Flor used the flax rust fungus Melampsora lini as a model [50]; for each resistance gene in the plant Linum usitatissimum there was an equivalent virulence gene in the fungus. Certain protein families have been more widely observed in this war of genes between host and pathogen. A recent analysis by Soanes et al. [51] examined the expansions in gene families found to be responsible for pathogenesis. They found that certain motifs such as fungal transcription factor, trypsin, cutinase and peroxidase were more abundant in pathogenic fungi than in non-pathogenic ones, suggesting that certain protein families spearhead this war between genes.


Chapter 2

Pilot Project: Optimization of Cytochrome P450 sequence clustering procedure

Cytochrome P450 proteins (CYPs) are heme-thiolate proteins that are involved in critical fungal biological processes such as growth, virulence and degradation of xenobiotic compounds. With the sequencing of several fungal genomes it has become convenient to query cytochrome P450 data across different species to understand the role of CYPs in these species. Comparative genomic analyses of protein families like CYPs have revealed several important virulence-associated features. The Fungal Cytochrome P450 Database (http://p450.riceblast.snu.ac.kr/), a repository of CYP-associated data such as sequence data, protein family data and references for over 60 species of fungi and oomycetes, was built to allow comparative analysis of CYP data among sequenced fungi. The database organized CYP sequences into groups using a clustering algorithm called TribeMCL. However, clustering with the default parameters of the clustering tool resulted in suboptimal grouping of the CYPs into protein families. We also wanted to add several other features to the database to allow for better analyses. The goal of this pilot project was to optimize the clustering procedure and improve the computational pipeline. We followed this with some preliminary analyses that resulted in ideas for new features to be added. The optimization procedure and the preliminary analyses are described in this chapter.
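To give a feel for why clustering parameters matter, the sketch below implements the basic Markov clustering (MCL) iteration that TribeMCL builds on: repeated expansion (matrix squaring) and inflation (element-wise powering followed by column normalization) of a similarity graph. This is a minimal pure-Python illustration, not the TribeMCL program or the FCPD pipeline. The toy similarity matrix stands in for BLAST-derived scores, and treating the inflation value as the parameter of interest is an assumption made for illustration only.

```python
# Toy symmetric similarity matrix: two tight groups (0-2 and 3-5) joined by one weak link.
# The values are invented; a real run would derive them from all-vs-all sequence comparisons.
TOY_SIMILARITY = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0.1, 0, 0],
    [0, 0, 0.1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]


def normalize_columns(matrix):
    """Scale every column in place so that it sums to 1."""
    n = len(matrix)
    for j in range(n):
        total = sum(matrix[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                matrix[i][j] /= total
    return matrix


def mcl(weights, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster a weighted graph with a basic Markov clustering iteration."""
    n = len(weights)
    m = [[float(w) for w in row] for row in weights]
    for i in range(n):
        m[i][i] = 1.0  # self-loops, as is customary before running MCL
    normalize_columns(m)
    for _ in range(max_iter):
        # Expansion: square the matrix (flow along two-step walks).
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
                    for i in range(n)]
        # Inflation: strengthen strong flows and weaken weak ones, then renormalize.
        inflated = normalize_columns([[x ** inflation for x in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Rows with a non-zero diagonal are attractors; their non-zero columns form a cluster.
    clusters = set()
    for i in range(n):
        if m[i][i] > 1e-6:
            clusters.add(frozenset(j for j in range(n) if m[i][j] > 1e-6))
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    print(mcl(TOY_SIMILARITY, inflation=2.0))
```

In MCL, larger inflation values generally yield more and smaller clusters, which is why a single default setting may split or merge protein families in ways that do not match their known classification.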


2.1 Background

2.1.1 Cytochrome P450 proteins

The cytochrome P450s (CYPs) are cellular proteins named for the absorption peak at 450 nm shown by their reduced, carbon monoxide-bound form. They are often found in secondary metabolism clusters and are involved in paracetamol (acetaminophen) metabolism, aflatoxin biosynthesis, terpene biosynthesis, xenobiotic compound degradation and a host of other fungal defense mechanisms [52]. P450s generally catalyze the monooxygenase reaction (1):

$$\mathrm{RH} + \mathrm{O_2} + 2\mathrm{H^+} + 2e^- \rightarrow \mathrm{ROH} + \mathrm{H_2O} \tag{1}$$

They obtain their electrons from donor enzymes such as cytochrome P450 reductase (CPR), a flavoprotein containing FAD and FMN domains that is localized in the endoplasmic reticulum, where most drug detoxification occurs [53] (Figure 2.1).

Figure 2.1: Action of CYPs in the endoplasmic reticulum. (Image from Edwards’s lab homepage - http://www.bio.ilstu.edu/Edwards/Projects/P450.shtml)


Most P450s possess the signature motif “[FW]-[SGNH]-x-[GD]-{F}-[RKHPT]-{P}-C-[LIVMFAP]-[GAD]”; this motif is often found in conjunction with other domains that define the characteristic properties of the P450. P450s are often categorized as ‘heme-thiolate proteins’. Their functional diversity makes P450 systems difficult to define, but efforts have been made to define them and characterize their domains [54, 55]. Two different P450 subclasses have been found: (i) nitric oxide reductases, which can reduce nitric oxide without monooxygenase activity (B-class P450s), and (ii) E-class P450s that do not require molecular oxygen for their catalytic activity; these are termed self-sufficient P450s. Certain P450s in Bacillus species contain a CPR domain, which makes them self-sufficient. Most P450s, however, have characteristic domains such as an FAD domain, a ferredoxin domain, an FMN domain and a cytochrome b5 domain [54] that specify the particular function they perform.
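As an illustration of how the signature motif quoted above can be used to screen sequences, the following minimal sketch translates the motif into a regular expression and scans a protein sequence for it. It assumes the motif is written in PROSITE-style notation (square brackets for allowed residues, curly braces for excluded residues, x for any residue); the function names and the toy sequence are hypothetical and are not part of the FCPD pipeline.

```python
import re

# Signature motif from the text, assumed to follow PROSITE-style notation.
P450_SIGNATURE = "[FW]-[SGNH]-x-[GD]-{F}-[RKHPT]-{P}-C-[LIVMFAP]-[GAD]"


def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.split("-"):
        if element == "x":
            parts.append(".")  # any residue
        elif element.startswith("{"):
            parts.append("[^" + element[1:-1] + "]")  # any residue except these
        else:
            parts.append(element)  # a [..] residue set or a single literal residue
    return "".join(parts)


def find_signature(sequence):
    """Return (start position, matched text) for every motif occurrence."""
    regex = re.compile(prosite_to_regex(P450_SIGNATURE))
    return [(match.start(), match.group()) for match in regex.finditer(sequence)]


# Hypothetical toy sequence containing one copy of the motif.
toy_sequence = "MKTLLAFGSGPRQCIGAELV"
print(find_signature(toy_sequence))  # [(6, 'FGSGPRQCIG')]
```

A motif hit alone does not establish that a protein is a CYP; in the pipeline described in this chapter, recruitment is based on InterPro terms.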

Malachite green, a widely used dye, is an antifungal agent used in fish culture. CYPs identified from elegans could reduce this dye to leucomalachite green and to primary and secondary arylamines [54]. Gibberella fujikuroi secretes gibberellins that can modulate its host's growth and development, rendering the host susceptible to infection by the fungus [56]. Rojas et al. found that a CYP (P450-1), a multifunctional enzyme, is part of the cluster responsible for gibberellin production; knocking out this gene shut down the entire pathway [57]. Phytoalexins are a class of antibiotics secreted by plants when invaded by a pathogen. They can severely damage the pathogen by puncturing the cell wall, delaying maturation, disrupting metabolism and even preventing reproduction [58]. Nectria haematococca was reported to have evolved a P450 belonging to a new class, pisatin demethylase, which can detoxify phytoalexins secreted by pea plants [59]. These examples stress the importance of CYPs for the successful adaptation of fungi to severe surroundings.

P450s have played a pivotal role in diversifying the fungal detoxification enzyme repertoire and in the adaptation of fungi to hostile surroundings [60]. Their roles range from housekeeping biochemical functions [61, 62] and virulence in pathogenic fungi [63-66] to the detoxification of antifungals [59, 63, 67, 68] and species-specific roles [51, 69] (Table 2.1). Many of these genes have duplicated and diversified in function to become the frontline arsenal in the co-evolutionary warfare between prey and predator. In their seminal paper on the evolution of this gene family, Lake et al. [70] map the timeline of CYP evolution, starting from the archaebacteria 3,500 million years ago, and trace the changes in the structure of these proteins as environmental conditions shifted from predominantly oxygen-limiting to oxygen-rich. They also suggest that the diversification of P450s is correlated with the emergence of new animal forms, which in turn is correlated with the increase in atmospheric oxygen levels over the past 370 million years. The phylogenetic tree of the P450s, from the prokaryotic ancestral gene to the present-day ones, underlines the value of studying P450s to understand species-specific adaptation.


Table 2.1: Distribution of key protein families in different fungi [71]

Protein family        Cochliobolus  Fusarium  Botrytis  Neurospora  Ashbya  Saccharomyces
Peptide synthetases             30        37        29           7       0              0
Polyketide synthases            40        35        42           7       0              0
ABC transporters                51        54        46          39      17             29
Cytochrome P450s                63        40        33          44      nd              4
Protein kinases                112        94        70         120      nd            117

2.1.2 Cytochrome P450 nomenclature system

The cytochrome P450 protein sequence data are organized by a nomenclature [72] based on sequence identity. According to this nomenclature, any two CYPs with >40% identity are grouped into a CYP family, and CYPs within a family with >55% identity are placed into a subfamily. Based on this nomenclature, a manually curated database (http://drnelson.uthsc.edu/CytochromeP450.html) of CYP families and subfamilies from all organisms has been maintained by Dr. David Nelson. The database currently contains 276 CYP families from fungi. With the rise of CYP data from massive genome sequencing projects, keeping track of the CYP family and subfamily names became difficult. To overcome this problem, a higher-order clan classification was introduced [73]. Since the introduction of the clan and family classification, various efforts have been undertaken to classify and annotate CYPs from various fungal subgroups [74, 75].
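The identity thresholds of the nomenclature can be expressed as a small decision rule. The sketch below is only an illustration of the >40% and >55% cutoffs stated above; the function and constant names are hypothetical, and real family assignment involves curation beyond a single pairwise identity value.

```python
FAMILY_THRESHOLD = 40.0     # percent identity above which two CYPs share a family
SUBFAMILY_THRESHOLD = 55.0  # percent identity above which they share a subfamily


def relationship(percent_identity):
    """Classify a pair of CYPs from their pairwise percent identity."""
    if percent_identity > SUBFAMILY_THRESHOLD:
        return "same subfamily"
    if percent_identity > FAMILY_THRESHOLD:
        return "same family"
    return "different families"


for identity in (62.0, 47.5, 31.0):
    print(identity, relationship(identity))
```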


2.1.3 FCPD Pipeline

However, even after the introduction of a robust nomenclature and classification system, the effort to classify all CYPs in fungi has been fragmented. Moreover, most efforts at CYP clan/family classification have been based on manual curation, which is not scalable, especially considering the growth in genome data. In 2008, our group introduced the first automated classification of fungal cytochrome P450 proteins, along with a comprehensive database for storing CYP data, the Fungal Cytochrome P450 Database (FCPD) [76]. The database accumulated 4,538 sequences from 66 fungal and 4 oomycete species. The FCPD (http://p450.riceblast.snu.ac.kr/) [76] pipeline consisted of four major steps. (1) CYPs from more than 150 fungal species were recruited based on 16 InterPro terms associated with cytochrome P450 proteins (Figure 2.2). False positives (i.e., short domains) were avoided by imposing a domain length restriction of 25 amino acids; sequences that were recruited despite these tight restrictions were categorized as questionable CYPs. (2) The recruited CYPs were subjected to a Bi-BLAST procedure to generate pairwise identities and e-values. (3) These pairwise e-values were submitted to the TribeMCL algorithm for clustering (described in greater detail in chapter 3) using the default parameters: e-value=1e-05 and inflation factor=5. (4) Finally, the clustered CYPs were assigned CYP family and clan names.


Figure 2.2: The Fungal cytochrome P450 database pipeline [76]

2.1.4 Fungal cytochrome P450 database: drawbacks

Despite this robust and commonly used pipeline, the FCPD had certain drawbacks: (a) the computational pipeline (Figure 2.2) generated suboptimal clustering of the CYP sequences; (b) there was no putative functional annotation to aid phylogenomic or comparative genomic analyses; (c) the database needed to be updated because new CYP clans and families had been identified, and its FCPD-specific nomenclature could compound identification issues unless the CYP family and clan were also indicated for each CYP; and (d) the rise of genome sequencing necessitated the addition of more sequences.


To overcome these drawbacks we devised several steps, of which the primary and most important was to optimize the CYP clustering procedure. We undertook a pilot project with sample data to optimize the clustering. This chapter explains the data, the pipeline and the combinations of parameters we used. The parameter optimization is described in greater detail in chapter 3.

2.2 Results and Discussion

2.2.1 Genus Aspergillus

The pilot project started with finding a dataset for the pipeline optimization; we started with the 10 Aspergillus species (Figure 2.1). This dataset was chosen because of the diversity of the Aspergillus species. After S. cerevisiae, the most well-studied fungus is A. nidulans, which belongs to the genus Aspergillus. The genus has 8 members that have been completely sequenced by the Broad Institute at MIT, giving it one of the largest numbers of sequenced genomes of any genus in the fungal kingdom. There are several reasons for the large number of sequenced genomes in this genus. (i) Aspergillus species are important either for the diseases they are involved in or for their commercial role. Citric acid, a very common ingredient of many food products, is produced industrially using A. niger. Strains of A. oryzae are used to make many Japanese and other Asian foods, notably sake (an alcoholic beverage) and shoyu (soy sauce), and A. oryzae has been listed as Generally Recognized As Safe (GRAS) by the Food and Drug Administration (FDA). (ii) Some aspergilli are known to produce potent toxins. A. flavus produces aflatoxins, which have proven to be highly carcinogenic [77]. (iii) Aspergillus species are also responsible for aspergillosis in immunocompromised humans [78, 79]. A. fumigatus is one of the major causes of aspergillosis. This fungus can also cause mycotoxicosis and can affect people who work in mushroom farms or who have compromised immune function, with mortality in the latter case as high as 50%.

Figure 2.1: The Species tree for Aspergillus genus

(iv) Some Aspergillus species have been studied as model fungi. A. nidulans is a model fungus and was one of the earliest fungal genomes to be sequenced [42]; it has been an excellent model for studying DNA repair, mutation, cell cycle control, pathogenesis and metabolism. N. fischeri, the teleomorph of Aspergillus fischerianus, is another member of the genus that has drawn a lot of interest because of its closeness to A. fumigatus. It is a rare pathogen that occasionally causes keratitis and pulmonary aspergillosis.


Table 2.2: Features of the 13 published genomes belonging to the genus Aspergillus [46].

Species              Genome size (Mb)  GC ratio (%)  # of ORFs  # of exons
A. clavatus                     27.86        49.21       9,121      27,959
A. carbonarius                  34.87        51.818         0+          0*
A. flavus                       36.79        48.347     12,604      40,971
A. niger CBS513.88              33.98        50.373     14,086      50,371
A. niger ATCC1015               37.2         50.356     11,200      34,971
A. oryzae                       37.12        48.242     12,063      35,319
A. terreus                      29.33        52.898     10,406      33,116
N. fischeri                     32.55        49.433     10,403          0*
A. fumigatus A1163              29.21        49.546      9,929      29,094
A. fumigatus Af293              29.38        49.798      9,887      28,164
A. nidulans                     30.07        50.324     10,701      35,525
P. chrysogenum                  32.22        48.96      12,791      40,441
P. marneffei                    28.64        46.668     10,638      34,306

+ORF numbers were not provided.
*Exon numbers were not provided.

The N. fischeri genome is 10% larger than that of A. fumigatus and has many genes unique to its lifestyle (Table 2.3), making it an interesting species from an evolutionary adaptation perspective [80]. This wide range of applications and disparate lifestyles within a single genus makes Aspergillus very suitable for studying evolution and adaptation.


Table 2.3: Genome comparison between and within species of Aspergillus [80]

                     Within A. fumigatus  Within A. niger          A. flavus vs.  A. fumigatus vs.
Species              Af293 vs. A1163      ATCC1015 vs. CBS 513.88  A. oryzae      N. fischeri
Genome vs. genome    99.80%               99.30%                   99.50%         92.40%
CDS vs. genome       99.60%               99.10%                   99.10%         94.30%
Protein vs. protein  99.50%               96.70%                   98.00%         93.40%

In total, 1,353 sequences from 8 published Aspergillus genomes were taken for the optimization procedure. These CYPs were submitted to OrthoMCL, an ortholog detection tool based on reciprocal BLAST hits and Markov clustering [81]. OrthoMCL carries out a reciprocal BLAST protocol to generate all possible ortholog and paralog pairs among the sequences, so that the list of homologues contains all proteins created by possible duplication and speciation events. OrthoMCL clusters sequences based on two parameters: e-value and inflation factor (described further in chapter 3). Sequences excluded from the OrthoMCL result are those that could not be categorized under the default settings of these parameters (e-value=1e-05, inflation factor=5). We examined the result of clustering with the default parameters and found it suboptimal: many CYP families accumulated in the same clusters. For instance, cluster 1 contained at least two clans and cluster 7 included three clans that should ideally be in different clusters. Our first idea was to vary the parameters used in OrthoMCL. We tried various combinations of parameters and examined the clustering results. Every combination produced a different grouping, with the number of clusters increasing as we decreased the e-value and increased the inflation factor. However, at every iteration we observed a number of false positives generated by the e-value criterion. These false positives were sequences that passed the e-value cutoff but showed good identity over only a short stretch of sequence. Thus, we realized that the e-value alone was not enough to generate confident clustering of protein sequences.
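The idea of a reciprocal BLAST hit mentioned above can be illustrated with a minimal sketch. It assumes that the best hit of every sequence has already been extracted from the BLAST output into a dictionary; the function name and sequence identifiers are hypothetical, and OrthoMCL's actual procedure (score normalization followed by Markov clustering) is considerably more involved.

```python
def reciprocal_best_hits(best_hit):
    """Return the pairs of sequences that are each other's best hit.

    best_hit maps a sequence ID to the ID of its best-scoring hit.
    """
    pairs = set()
    for query, subject in best_hit.items():
        if query != subject and best_hit.get(subject) == query:
            # Sort so that (A, B) and (B, A) are stored as one pair.
            pairs.add(tuple(sorted((query, subject))))
    return pairs


# Hypothetical best hits among four sequences.
best_hit = {"A1": "B1", "B1": "A1", "A2": "B1", "B2": "A2"}
print(reciprocal_best_hits(best_hit))  # {('A1', 'B1')}
```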

2.2.2 P450 Analysis Platform

To overcome the problem of false positives, we introduced another parameter into the clustering criteria. This parameter was “coverage”, defined as the percentage of the query length that is similar to the hit returned. We now had three parameters, namely e-value, inflation factor and coverage, that could be used to control the clustering results. We again tried all combinations of these three parameters. The clustering results improved markedly with respect to false positives. However, we still saw aggregation at higher e-values. We decided to match our clusters against the manually curated CYP clan and family groupings defined earlier [74, 82] and to choose the parameters that best matched these curated groupings. To try the various combinations of clustering parameters, we built a platform called the P450 Analysis Platform (http://p450analysis.riceblast.snu.ac.kr/). The pipeline for the P450 Analysis Platform remained highly similar to that of the FCPD, except that we added the coverage parameter (Figure 2.3). We also built features into the P450 Analysis Platform to aid in examining the generated clusters. These features included (a) CYP Name analysis, which allows the user to check the occurrence of a CYP clan or family among all the clusters generated (Figure 2.4). This feature allowed us to see how many subtypes (clusters) may exist for a given CYP clan or family. The CYP clans and families were also labeled “low” or “high” depending on the CYPs in the cluster: CYPs labeled low have a lower probability of belonging to a particular clan or family than those labeled high. We observed that CYP clans typically had many more clusters than CYP families, whereas the species-specific CYP families were present in single clusters.
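The coverage criterion described in this section can be sketched as a filter applied to pairwise hits before clustering. This is a minimal illustration under stated assumptions: the Hit record, its field names and the cutoff values in the usage example are hypothetical, and the actual implementation in the P450 Analysis Platform may differ.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    query: str
    subject: str
    evalue: float
    aligned_length: int  # number of query residues covered by the alignment
    query_length: int    # full length of the query sequence


def coverage(hit):
    """Percentage of the query length that is similar to the hit returned."""
    return 100.0 * hit.aligned_length / hit.query_length


def passes(hit, max_evalue, min_coverage):
    """Keep a hit only if it satisfies both the e-value and the coverage cutoff."""
    return hit.evalue <= max_evalue and coverage(hit) >= min_coverage


# A short, highly similar stretch passes the e-value cutoff but fails on coverage.
short_hit = Hit("cyp_a", "cyp_b", 1e-60, 80, 500)
full_hit = Hit("cyp_a", "cyp_c", 1e-60, 450, 500)
print(passes(short_hit, max_evalue=1e-5, min_coverage=60.0))  # False
print(passes(full_hit, max_evalue=1e-5, min_coverage=60.0))   # True
```

Filtering on both criteria removes the false positives that arise from good identity over a small part of the sequence.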

Figure 2.3: The P450 Analysis Platform Pipeline


Figure 2.4: CYP Name Analysis

(b) The CYP cluster analysis tab shows the list of generated clusters in a tabular format (Figure 2.5) that includes the number of fungal and non-fungal CYPs in each cluster. It also allows the user to check the CYPs in each cluster and to explore the species distribution in every cluster (by clicking on the cluster). Another way to explore the species distribution in the clustering result is (c) the CYP search option, with which users can search the clustering result by phylum/subphylum or by locus/sequence ID. This search function is vital for finding the species-, order-, subphylum- or phylum-wide distribution of CYPs; such distributions help in closely examining the match between the manually curated groupings and the automated clustering. With all these features we were able to check the accuracy and suitability of the parameters for the clustering of CYP sequences. Finally, we also created a submit page (Figure 2.7) that accepts the e-value, inflation factor, coverage and the dataset for clustering.

Figure 2.5: CYP cluster analysis

We tried at least 5 different combinations of values for each parameter (described in greater detail in chapter 3). We found that the parameter combination of e-value=1e-50, coverage=60% and inflation factor=5 gave the best clustering result. This combination closely matched the manually curated data in Nelson's database as well as the semi-automated procedure adopted by Deng et al. [74].
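One simple way to quantify how closely a clustering matches curated groupings is to compute, for each cluster, the share of its members that carry the most common curated label. The sketch below uses this purity measure purely as an illustration; it is an assumption, not the comparison procedure used in this work, and the cluster and sequence names are hypothetical.

```python
from collections import Counter


def cluster_purity(clusters, curated):
    """Return {cluster ID: (majority curated label, fraction of labeled members)}.

    clusters maps a cluster ID to a list of sequence IDs;
    curated maps a sequence ID to its manually curated family or clan.
    """
    result = {}
    for cluster_id, members in clusters.items():
        labels = Counter(curated[m] for m in members if m in curated)
        if not labels:
            continue  # no curated members in this cluster, nothing to compare
        label, count = labels.most_common(1)[0]
        result[cluster_id] = (label, count / sum(labels.values()))
    return result


# Hypothetical clustering result and curated family labels.
clusters = {"c1": ["s1", "s2", "s3"], "c2": ["s4", "s5"]}
curated = {"s1": "CYP51", "s2": "CYP51", "s3": "CYP52", "s4": "CYP61", "s5": "CYP61"}
for cluster_id, (label, purity) in cluster_purity(clusters, curated).items():
    print(cluster_id, label, round(purity, 2))
```

A parameter combination that merges several clans into one cluster, as observed with the default settings, would show up as low purity values.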

With these parameters we went ahead and clustered a larger dataset containing 22,940 sequences from more than 300 genomes, including fungi, bacteria, eukaryotic pathogens, animals and plants. We observed similar results, with a high degree of match to the manually curated data. Now that we had optimized the clustering procedure, we wanted to identify additional features that could be added to the database by exploring its uses.

Figure 2.6: CYP search

Figure 2.7: Submit form


The Aspergillus dataset was therefore explored further, and I identified some of the ways in which the P450 dataset could be used to build meaningful hypotheses. The following sections describe some of the analyses that were done.

2.2.3 Species-specific CYPs

Among the clustering results were singlet clusters that contained just one CYP sequence; we called these sequences species-specific CYPs because they did not group with any other cluster and were unique to the species they came from. A total of 83 species-specific CYPs from 8 genomes of the genus Aspergillus were obtained from the pipeline employed in the pilot project (Table 2.4).
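The singlet-cluster rule described above can be sketched as follows. The data structures and names are hypothetical and serve only to show how singlet clusters translate into per-species counts of species-specific CYPs.

```python
from collections import Counter


def species_specific_counts(clusters, species_of):
    """Count singlet clusters (exactly one sequence) per species.

    clusters maps a cluster ID to a list of sequence IDs;
    species_of maps a sequence ID to the species it comes from.
    """
    counts = Counter()
    for members in clusters.values():
        if len(members) == 1:
            counts[species_of[members[0]]] += 1
    return counts


# Hypothetical clustering result with two singlet clusters.
clusters = {"c1": ["s1", "s2"], "c2": ["s3"], "c3": ["s4"]}
species_of = {"s1": "A. niger", "s2": "A. flavus", "s3": "A. niger", "s4": "A. terreus"}
print(species_specific_counts(clusters, species_of))
```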

Table 2.4: Number of species-specific P450s in the genus Aspergillus

Species              # of species-specific P450s  Total # of P450s  Genome size (Mb)  Most common function
A. nidulans                                   13               122             30.07  GA14 synthase
A. flavus                                      8               159             36.79  Aflatoxin biosynthesis
N. fischeri                                    8                99             32.55  Terpene biosynthesis
A. clavatus                                    6                97             27.86  Fumonisin biosynthesis, Aflatoxin biosynthesis
A. fumigatus A1163                             3                79             29.21  Benzoate-4-monooxygenase
A. oryzae                                     10               163             37.12  Benzoate-4-monooxygenase
A. niger CBS513.88                            22               151             33.98  Benzoate-4-monooxygenase, Aflatoxin biosynthesis
A. terreus                                    13               125             29.33  Benzoate-4-monooxygenase

The number of species-specific P450s was fairly proportional to the total number of P450s and the genome size of each member of the genus. The most commonly observed functions were also a fair indicator of the lifestyles of these individual fungi (Table 2.4). A. niger had the highest number of species-specific CYPs while A. fumigatus had the lowest; this pattern is reflected to some extent in the most common function observed among these species-specific CYPs. A. niger, which is used extensively in the enzyme industry, seems to have a rich and unique set of genes involved in synthesizing and degrading various enzymes. One of the species-specific proteins in A. flavus (AFL2G_07229) had two domains, a P450 domain and an MFS domain (Figure 2.8a). This looked like a fusion protein that could have polygenic effects. The genes neighboring it include one P450 protein and one MFS transporter. However, after examining the paper by Yu et al. [83], which compared the aflatoxin gene clusters of A. flavus and A. parasiticus, we concluded that this gene was fused by incorrect automated gene calling. Two other proteins that contain both a P450 domain and an MFS domain were identified; however, they display different features from the previous case. One is ATEG_04721.1 (Figure 2.8b) from A. terreus and the other is EEA23504 from P. marneffei. ATEG_04721.1 has both the P450 and the MFS domain on a single exon, in contrast to the A. flavus gene, where these domains lie on different exons. The InterPro domain labeling for the P. marneffei gene is not currently available for analysis and is hence not described here.

Figure 2.8: Genome view of (a) the A. flavus gene AFL2G_07229 and (b) the A. terreus gene ATEG_04721.1, showing the exon/intron structure and the configuration of the InterPro domains associated with the exons. (These diagrams were generated by the SNU Genome Browser (http://genomebrowser.snu.ac.kr/))


It is also of interest that the most common function in five of the eight cases in Table 2.4 is a secondary metabolism activity. These aspergilli may therefore have species-specific secondary metabolites.

2.2.4 CYPs involved in Secondary Metabolism Clusters

We also wanted to identify notable CYPs that are part of secondary metabolism clusters. Three paralogous genes in A. nidulans and two duplicates in N. fischeri were observed that could be part of secondary metabolism clusters similar to those seen in this genus and in some of its closer relatives. The duplicated P450 genes in N. fischeri (NFIA_098920 and NFIA_098930) exhibit similarities to the trichothecene biosynthesis gene cluster (Figure 2.9c), whereas the three P450 genes in A. nidulans (AN9248.3, AN9251.3 and AN9253.3) seem to belong to a gene cluster with similarities to the gibberellin/diterpenoid biosynthesis cluster in G. fujikuroi [56, 57]. The A. nidulans genes similar to those in the gibberellin biosynthesis gene cluster are located in what could be a broken cluster distributed over two chromosomes, namely chromosomes 6 and 8 (five genes in each cluster, Figure 2.9d). The structure and order of the genes on these two chromosomes are very characteristic of a gibberellin biosynthesis cluster. The three duplicated, paralogous CYP genes lie next to each other on chromosome 8, and next to them are genes similar to phytanoyl-CoA dioxygenases, which are oxidoreductases. Chromosome 6 has yet another species-specific P450 (AN3253.3), which is similar to a geranyl-geranyl diphosphate synthase (GGDP synthase). Neighboring it are an MFS protein (AN3254.3), a protein very similar to the CPS/KS protein (AN3252.3; the fusion of ent-copalyl synthase and ent-kaurene synthase found in fungi), a glutathione S-transferase (AN3255.3) and an ent-kaurene oxidase (AN3256.3). These genes are described in Table 2.5. In a typical gibberellin biosynthesis pathway, as for all other diterpenoids, gibberellins (GA) are produced from hydroxymethylglutaryl (HMG) coenzyme A via mevalonic acid, isopentenyl diphosphate, geranyl diphosphate (GDP) and geranyl-geranyl diphosphate (GGDP). The genes from the A. nidulans cluster are also present in the gene cluster from G. fujikuroi (Table 2.5) [56].

Figure 2.9: (a) Gibberellin cluster in G. fujikuroi, (b) cluster from A. oryzae similar to the viridin biosynthesis cluster from Trichoderma virens, (c) a putative trichothecene biosynthesis cluster observed in N. fischeri, (d) a putative broken gibberellin biosynthesis-like cluster seen in A. nidulans.


Table 2.5: Genes of the gibberellin biosynthesis gene cluster [56].

Gene             Enzyme function                                           Regulation of gene expression  Reference

General isoprenoid pathway
hmgr             HMG-CoA reductase                                         Constitutive expression        [84]
fpps             Farnesyl diphosphate synthase                             Constitutive expression        [85]
ggs (ggs1)       General GGDP-synthase, primary metabolism                 Constitutive expression        [86]

GA biosynthetic gene cluster
ggs2             GA-specific GGDP-synthase                                 AREA-control                   [87]
cps/ks           Bifunctional ent-copalyl-ent-kaurene synthase             AREA-control                   [88]
P450-4           ent-kaurene oxidase (P450 monooxygenase)                  AREA-control                   [89]
P450-1           Multifunctional CYP monooxygenase, GA14 synthase          AREA-control                   [57]
P450-2           GA 20-oxidase, oxidative elimination of carbon 20         AREA-control                   [90]
des              GA4 1,2-desaturase, conversion of GA4 to GA7              AREA-control                   [91]
P450-3           13-hydroxylase, conversion of GA7 to GA3                  No N-metabolite repression     [91]
smt              Sugar membrane transporter upstream of the gene cluster   Weak nitrogen regulation       [92]
orf1, orf2 mfs   Genes of unknown function (downstream of gene cluster)    Constitutive expression        B. Tudzynski et al., unpublished

2.2.5 N. fischeri cluster

N. fischeri also has a cluster consisting of 8 genes (Figure 2.9c). This cluster shares similarity with the trichothecene biosynthesis clusters reported in Trichoderma virens and A. oryzae [93]. Some of its genes overlap with the gibberellin biosynthesis-like cluster seen in A. nidulans. These overlaps are described in Table 2.6.

Table 2.6: Overlap between the genes present in the clusters of A. nidulans and N. fischeri. The colors indicate similar function of the members.

Gibberellin/Trichothecene                                                   Locus name
biosynthesis cluster member  Protein function                               A. nidulans             N. fischeri
CPS/KS protein               Bifunctional ent-copalyl-ent-kaurene synthase  AN3252
ggs2                         GGDP synthase                                  AN3253                  NFIA_98920, NFIA_98930
mfs                          Major Facilitator Superfamily protein          AN3254                  NFIA_98950
P450-4                       Ent-kaurene oxidase                            AN3256
oxidoreductases              Phytanoyl CoA dioxygenase                      AN9246, AN9254
P450-1                       GA-14 synthase-like proteins                   AN9248, AN9251, AN9253  NFIA_98910
GAPDH                        C-3 sterol dehydrogenase                                               NFIA_98940
reductase                    C14 sterol reductase                                                   NFIA_98900
regulator protein            C6 transcription factor                                                NFIA_98960
cyclase                      NAD-binding oxidoreductase                                             NFIA_98970

To verify the presence of these clusters in other members of the genus, the genomes of A. nidulans and N. fischeri were compared with those of the other Aspergillus species using the genome dot plot program on the Broad Institute website. The cluster from A. nidulans had no conserved synteny with any of the other members of the genus (Figure 2.10a and 2.10b). The same was true of N. fischeri, except that, interestingly, its cluster fell into a gap within a very large block conserved with A. fumigatus (Figure 2.10c). These observations could indicate novel clusters that are unique to these fungi and that may have arisen through lateral transfer or segmental duplication events.

Figure 2.10: Dot plots of (a) A. nidulans chromosome 6 against rest of the Aspergilli (b) Dot plot of A. nidulans chromosome 8 against rest of the Aspergilli (c) Dot plot of N. fischeri against the rest of the Aspergilli (Dotplots taken from the Broad Institute Aspergillus comparative genomics page - http://www.broad.mit.edu/annotation/genome/aspergillus_terreus/Dotplot.html)


2.2.6 Putative secretory CYPs

A number of CYPs are secretory proteins, and it is worthwhile to determine their function in the survival of fungi (in this context, Aspergillus) inside the host environment. Secretory proteins can be predicted using the seven signal peptide prediction programs hosted on the secretome database (http://fsd.snu.ac.kr; Choi et al., under review). The CYPs from the pilot project were analyzed with these tools to estimate the number of secretory CYPs (Table 2.7).

Table 2.7: Number of putative secretory CYPs across the Aspergillus species

Species name     # of putative secretory CYPs (Class SP)  # of CYPs  Ratio of secretory to total CYPs
A. nidulans                                           24        122                             0.197
A. flavus                                             20        159                             0.126
A. oryzae                                             25        163                             0.153
A. clavatus                                           23         97                             0.237
A. niger                                              36        154                             0.234
N. fischeri                                           21         99                             0.212
A. terreus                                            12        125                             0.096
A. fumigatus                                          12         79                             0.152
P. chrysogenum                                        19        101                             0.188
P. marneffei                                          15        120                             0.125
Total                                                207      1,219                             0.170
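The ratios in Table 2.7 are the number of putative secretory CYPs divided by the total number of CYPs for each species. The short sketch below recomputes them from the counts in the table; the variable and function names are hypothetical.

```python
# (putative secretory CYPs, total CYPs) per species, copied from Table 2.7
TABLE_2_7 = {
    "A. nidulans": (24, 122),
    "A. flavus": (20, 159),
    "A. oryzae": (25, 163),
    "A. clavatus": (23, 97),
    "A. niger": (36, 154),
    "N. fischeri": (21, 99),
    "A. terreus": (12, 125),
    "A. fumigatus": (12, 79),
    "P. chrysogenum": (19, 101),
    "P. marneffei": (15, 120),
}


def secretory_ratio(secretory, total):
    """Fraction of CYPs predicted to be secretory, rounded to three decimals."""
    return round(secretory / total, 3)


ratios = {species: secretory_ratio(s, t) for species, (s, t) in TABLE_2_7.items()}
total_secretory = sum(s for s, _ in TABLE_2_7.values())
total_cyps = sum(t for _, t in TABLE_2_7.values())
total_ratio = secretory_ratio(total_secretory, total_cyps)

print(ratios["A. nidulans"])                     # 0.197
print(total_secretory, total_cyps, total_ratio)  # 207 1219 0.17
```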


Interestingly, the ratio of putative secretory CYPs is much higher than the average ratio of putative secretory proteins at the genome-wide level, even though there is little experimental evidence that CYPs can be secreted (Table 2.8).

Table 2.8: Average ratio of secretory proteins and ratio of secretory CYPs

No.  Species name     Secretory CYPs (SP)  Total sec/total CYPs  Total ORF/all ORFs  Total No. of CYPs  Ratio (SP/Total)
1    A. nidulans                       24                 0.590               0.200                122             0.197
2    A. flavus                         20                 0.528               0.203                159             0.126
3    A. oryzae                         25                 0.583               0.209                163             0.153
4    A. clavatus                       23                 0.660               0.196                 97             0.237
5    A. niger                          36                 0.597               0.184                154             0.234
6    N. fischeri                       21                 0.566               0.200                 99             0.212
7    A. terreus                        12                 0.512               0.206                125             0.096
8    A. fumigatus                      12                 0.570               0.205                 79             0.152
9    P. chrysogenum                    19                 0.723               0.181                101             0.188
10   P. marneffei                      15                 0.583               0.171                120             0.125

We also investigated the functional domains present in these secretory CYPs using the CFGP and found functions common across them (Table 2.9). Whereas most functions were common to all 8 species, we also found species-specific functions among the secretory CYPs.


Table 2.9: Functional domains present among the secretory CYPs in different Aspergilli

A.n – A. nidulans, A.ni – A. niger, N.f – N. fischeri, A.fl – A. flavus, A.fu – A. fumigatus, A.o – A. oryzae, A.t – A. terreus, A.c – A. clavatus. Red asterisks denote proteins that are both putatively secretory and species-specific.

Functional domains of putative secretory CYPs A.n A.ni N.f A.fl A.fu A.o A.t A.c

Immunoglobulin/ major histocompatibility complex * *

Pisatin demethylase * * * * * *

CYP52 * * * * * * * *

E-class P450 group 1 * * * * * * * *

E-class P450 group 4 * * * * * * *

Legume lectin, beta domain *

Haem peroxidase, plant/fungal/bacterial *

Ribosomal protein S2 * *

Peptidase aspartic, active site

Integrins alpha chain *

Protein kinase * *

Homeobox protein, antennapedia type *

E-class P450 group 2 *


Based on our observations, we realized that it would be worthwhile to give users of the FCPD access to a putative functional annotation for every CYP, as well as to tools for carrying out the analyses described above. We set out to make these features available in the next version of the FCPD.

2.3 Materials & Methods

2.3.1 Data

The data consisted of 17,329 CYP sequences from genomes of fungi, bacteria, plants and metazoan species. The fungal species included all the published genomes available. The other species were included to allow for comparison of CYP clusters. These genomes were imported from the Comparative Fungal Genomics Platform (CFGP; http://cfgp.riceblast.snu.ac.kr) archive.

2.3.2 Analysis Tools

BLASTMatrix, implemented in the Comparative Fungal Genomics Platform (CFGP; http://cfgp.snu.ac.kr/) [46], was used to examine the conservation of these P450s across the fungal kingdom and in selected vertebrate and bacterial genomes. BLASTMatrix compares protein sequences across genomes and outputs a color-coded similarity table that shows the conservation of a set of sequences across genomes. It then calculates the number of proteins that are specific to a species (after comparison with every genome selected) and presents the result as a pie chart. The BLASTMatrix results were used to confirm the species-specific proteins generated.

ClustalX [94], the desktop version of ClustalW, and MEGA [95] were used to build and display phylogenetic trees. All trees were built using the maximum-likelihood algorithm with 1,000 bootstrap replicates. The Aspergillus comparative database hosted by the Broad Institute was used for checking the secondary metabolism clusters; the chromosome data and the gene browser feature of the database were used for this purpose. The interactive Dot Plot tool was used to assess the conservation of synteny in clusters similar to those involved in secondary metabolism. Finally, all clustering-related analyses, such as finding species-, class-, subphylum- or phylum-specific (singlet) clusters and CYP name analysis, were carried out using the in-house P450 Analysis Platform.

2.4 Conclusion

CYPs have been classified into clusters based on sequence similarity via a modified (optimized) form of TribeMCL (as per the FCPD) [76]. An in-house database with analysis features, called the P450 analysis platform, was created to carry out detailed cluster analysis. Using this platform we clustered 22,940 CYP sequences from more than 300 genomes into 2,579 clusters.

The clustering provides vital information regarding the grouping of CYPs into families and subfamilies. Going forward, we plan to (i) add putative functional classification of CYP clans and families based on characterized CYP sequences; (ii) expand the current knowledge of CYP clan and family groupings by applying the existing nomenclature system to the clusters generated by the pipeline, which will make the CYP data in the database consistent with that available in other resources; and (iii) allow browsing of the CYP data by CYP clan and family classification.


Chapter 3 Systematic and searchable classification of cytochrome P450 proteins encoded by fungal and oomycete genomes

Cytochrome P450 proteins (CYPs) play diverse and pivotal roles in fungal metabolism and adaptation to specific ecological niches. Fungal genomes encode extremely variable "CYPomes" ranging from one to more than 300 CYPs. Despite the rapid growth of sequenced fungal and oomycete genomes and the resulting influx of predicted CYPs, the vast majority of CYPs remain functionally uncharacterized. To facilitate the curation and functional and evolutionary studies of CYPs, we previously developed Fungal Cytochrome P450 Database (FCPD), which included CYPs from 70 fungal and oomycete species. Here we present a new version of FCPD (1.2) with more data and an improved classification scheme. The new database contains 22,940 CYPs from 213 species divided into 2,579 clusters and 115 clans. By optimizing the clustering pipeline, we were able to uncover 36 novel clans and to assign 153 orphan CYP families to specific clans. To augment their functional annotation, CYP clusters were mapped to David Nelson's P450 databases, which archive a total of 12,500 manually curated CYPs. Additionally, over 150 clusters were functionally classified based on sequence similarity to experimentally characterized CYPs. Comparative analysis of fungal and oomycete CYPomes revealed cases of both extreme expansion and contraction. The most dramatic expansions in fungi were observed in clans CYP58 and CYP68 (Pezizomycotina), clans CYP5150 and CYP63 (Agaricomycotina), and family CYP509 (Mucoromycotina). Although much of the extraordinary diversity of the pan-fungal CYPome can be attributed to and adaptive divergence, our analysis also suggests a few potential horizontal gene transfer events. Updated families and clans can be accessed through the new version of the FCPD database. FCPD version 1.2 provides a systematic and searchable catalogue of 9,550 fungal CYP sequences (292 families) encoded by 108 fungal species and 147 CYP sequences (9 families) encoded by five oomycete species. In comparison to the first version, it offers a more comprehensive clan classification, is fully compatible with Nelson's P450 databases, and has expanded functional categorization. These features will facilitate functional annotation and classification of CYPs encoded by newly sequenced fungal and oomycete genomes. Additionally, the classification system will aid in studying the roles of CYPs in the evolution of fungal adaptation to specific ecological niches.


3.1 Background

Cytochrome P450 proteins (CYPs) are found in all domains of life [96] and comprise one of the largest protein families. Their existence predates the emergence of oxygen metabolizing life forms [70]. CYPs are defined by the absorption of light at 450nm by the heme cofactor, which facilitates the oxidation of a very diverse array of metabolic intermediates and environmental compounds. CYPs are responsible for a large number of different metabolic reactions in a cell [97], including primary, secondary and xenobiotic metabolisms [68, 98, 99].

The evolution of CYPs has been intimately intertwined with organismal adaptation to new ecological niches, through their participation in pathogenicity, utilization of specific substrates, and/or detoxification of xenobiotics. Many CYPs are hypothesized to have evolved through the chemical warfare between plants, animals, insects and microbes, which involves biosynthesis and detoxification of toxic metabolites [70, 100]. In fungi, several CYPs have been implicated in fungal virulence and neutralization of antifungal compounds resulting from host defense responses [59, 65, 101]. Expansions and diversifications in several CYP families have been associated with the evolution of fungal pathogenicity [51]. Therefore, functional and evolutionary analyses of CYPs are useful in predicting the ecological specialization and diversification of individual fungal taxa [102].

The extraordinary functional and evolutionary diversity of fungal CYPomes presents a major hurdle to CYP classification [74]. They share little sequence similarity, except for a few conserved residues that are characteristic of CYPs. The most conserved region is involved in binding of a heme cofactor. Substrate binding regions are much more variable but may possess a signature motif. This motif is often found in conjunction with one or more domains such as cytochrome b5, ferredoxin, and binding sites for FAD (flavin adenine dinucleotide) and FMN (flavin mononucleotide) [54].

Another challenge in developing a comprehensive CYP classification system is the rapidly increasing number of sequenced fungal genomes. Currently, more than 250 genomes are present in the public domain [45, 46], but this number is predicted to increase rapidly (e.g., http://1000.fungalgenomes.org). The rapid influx of genome sequences calls for robust computational tools that can effectively support large-scale comparative analyses of genomes and specific gene families.

The first nomenclature/grouping schema for CYPs, based on amino acid sequence similarity, was proposed by Nebert et al. in 1987 [72]. According to this schema, any two CYPs with sequence identity greater than 40% belong to a single CYP family, and any two CYPs with sequence identity greater than 55% belong to a subfamily. A manually curated database based on this approach has been maintained at http://drnelson.uthsc.edu/CytochromeP450.html to curate CYPs in multiple kingdoms [82, 103]. The database also serves as a repository of CYP nomenclature. Unfortunately, this schema cannot be efficiently applied to curate and classify the rapidly increasing number of CYPs uncovered through fungal genome sequencing.
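The identity thresholds of this schema can be illustrated with a minimal Python sketch. The function name and the returned labels are hypothetical and chosen only for illustration; the 40% and 55% cutoffs are the ones stated above, and the treatment of values exactly at a cutoff follows the "greater than" wording.

```python
def classify_pair(percent_identity: float) -> str:
    """Relationship implied by the pairwise amino acid identity (in percent).

    Hypothetical helper: thresholds follow the schema described in the text.
    """
    if percent_identity > 55.0:
        return "same subfamily"
    if percent_identity > 40.0:
        return "same family"
    return "different families"


# Three illustrative identity values, one per outcome.
for identity in (62.0, 47.5, 31.0):
    print(identity, classify_pair(identity))
```

Because every new sequence has to be compared against existing family members and then named by a curator, a rule of this kind does not scale well to tens of thousands of sequences.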

The clan system approach was developed to support higher-level grouping of families identified via the sequence-similarity based schema. This approach, which places all CYP families with a monophyletic origin into a single clan, has been successfully applied to classify CYP families in metazoa [73] and four fungal species [74]. For example, if a new CYP has equal identity to two or more CYP families, it can be assigned to the clan to which these families belong. Since the introduction of the “clan concept” in 1998 to classify metazoan CYPs [73], additional clans in vertebrates (9), plants (11) [104], arthropods and bivalves (4) [105], and fungi (115) [74] have been identified.

However, phylogeny-based clan classification has become problematic for the pan-fungal CYPome, because the number of fungal CYPs is too large to conduct phylogenetic analyses efficiently. Automated clustering based on sequence similarity is the gold standard approach for fast classification of large protein sets [106, 107]. This approach does not require any prior knowledge and allows for rapid clustering of large protein families such as CYPs.

In 2008, we employed an automated clustering approach to build the Fungal Cytochrome P450 Database (FCPD) [76]. Since then the number of sequenced fungal genomes has increased substantially, which necessitated the development of a new classification system. Additionally, the original FCPD classification generated several mega clusters, indicating that clustering parameters should be optimized.

Here we present FCPD release 1.2 (http://p450.riceblast.snu.ac.kr) and the new CYP classification pipeline based on the modified TRIBE-MCL algorithm. The pipeline allowed for a larger number of CYP families to be included into existing clans as well as supporting the discovery of potential new clans. To aid functional annotation, putative functional roles were assigned to over 150 clusters using literature surveys of previously characterized fungal CYPs.

The families and clans are accessible through the FCPD database, which offers global viewing and analysis of fungal CYPs.

3.2 Results and Discussion

3.2.1 Identification of CYPs and optimization of clustering parameters

We first extracted all proteins that contained InterPro (http://www.ebi.ac.uk/interpro/) terms associated with CYPs from 324 genomes corresponding to 113 fungal and Oomycete species, 94 other eukaryotic species, and six bacterial species (Figure 3.1) as previously described [76]. While our main focus was on curating fungal and Oomycete CYPs, other eukaryotic species and selected bacterial species were included to aid in comparative evolutionary studies across kingdoms. Although Oomycetes are fungus-like in that they produce hyphae and spores, they reside in a more basally derived eukaryotic lineage that includes chromophyte algae (Figure 3.1). However, because mycologists have traditionally studied Oomycetes, we analyzed CYPs from both true fungi and Oomycetes. This data extraction resulted in 22,940 CYPs, including 9,697 CYPs from fungi and Oomycetes and 13,243 CYPs from other organisms (Figure 3.1).

Extracted proteins were then split into clusters using an optimized protocol based on reciprocal pair-wise BLASTp all-against-all comparisons [108] followed by Tribe-MCL clustering [107] (see Methods and Figure 3.6 for details). The revision of the clustering pipeline used for the original FCPD classification [76] was motivated by a few factors, including the presence of many mega clusters with over 100 members, singlet clusters, and clusters that did not match families in Nelson’s P450 database. While there is no absolute “best” criterion to optimize clustering, our main goal was to achieve more uniform grouping by minimizing the fraction of very large (>100 members) and singlet clusters.

Three parameters (E-value, inflation factor, and a new parameter called “coverage”) were evaluated and adjusted to optimize the performance (Table 3.3). Coverage was defined as the percentage of the query sequence matched by the sequence from the database; the higher the coverage, the lower the possibility of false positives. To find optimal values for these parameters, we tested patterns of clustering with various combinations of parameters in the optimum plane of a three-parameter space (Figure 3.3), which resulted in the following combination of parameters: E-value = 1e-50, inflation factor = 5 and coverage = 60%. The coverage parameter was instrumental in filtering out many false positives that displayed highly significant E-values over short regions of similarity.
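The filtering step that precedes clustering can be sketched in Python as follows. This is an illustration under stated assumptions rather than the pipeline's actual code: the `Hit` record layout and all function names are hypothetical, coverage is assumed to be the aligned span of the query divided by the query length, and a pair is assumed to count as reciprocal only when the hits in both directions pass both cutoffs. The default thresholds are the optimized values given above; the inflation factor belongs to the subsequent Tribe-MCL step and is not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One BLASTp alignment (hypothetical record layout)."""
    query: str
    subject: str
    evalue: float
    query_start: int   # 1-based, inclusive
    query_end: int     # 1-based, inclusive
    query_length: int


def coverage(hit: Hit) -> float:
    """Percentage of the query sequence spanned by the alignment."""
    return 100.0 * (hit.query_end - hit.query_start + 1) / hit.query_length


def passes(hit: Hit, max_evalue: float = 1e-50, min_coverage: float = 60.0) -> bool:
    """Apply the E-value and coverage cutoffs to a single hit."""
    return hit.evalue <= max_evalue and coverage(hit) >= min_coverage


def reciprocal_pairs(hits):
    """Keep unordered pairs whose hits pass the cutoffs in both directions."""
    kept = {(h.query, h.subject) for h in hits if h.query != h.subject and passes(h)}
    return {tuple(sorted(pair)) for pair in kept if (pair[1], pair[0]) in kept}


# Toy hits: A/B match well in both directions; A->C is significant but covers
# only a short region of A; B->D misses the E-value cutoff.
hits = [
    Hit("A", "B", 1e-80, 1, 450, 500),
    Hit("B", "A", 1e-78, 10, 480, 510),
    Hit("A", "C", 1e-60, 1, 120, 500),
    Hit("C", "A", 1e-60, 1, 120, 130),
    Hit("B", "D", 1e-20, 1, 500, 510),
]
print(reciprocal_pairs(hits))
```

In the toy data, the A to C hit has a significant E-value but spans only 24% of the query, so the coverage cutoff removes it; this is the kind of short-region match the coverage parameter is meant to exclude.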

3.2.2 CYP clustering in fungal cytochrome P450 database (FCPD) 1.2

Using the optimized protocol, we categorized 22,940 CYPs into 2,579 clusters (Figure 3.1): fungal and Oomycete CYPs belong to 1,090 (42%) clusters, while 1,489 (58%) clusters contained only non-fungal CYPs. Although there are a few clusters that contain CYPs in more than one kingdom, most of them are kingdom specific. All Oomycete clusters consist of CYPs in Oomycete species with the exception of one that also contains CYPs in plants, fungi and protists. Among the non-fungal clusters, 778 clusters contained plant CYPs and 652 clusters contained CYPs from Metazoa.
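A check for kingdom-specific clusters of the kind summarized above could look like the following Python sketch. The two input mappings (CYP to cluster, CYP to kingdom), the function names, and the toy identifiers are all hypothetical; they do not reflect the FCPD schema.

```python
from collections import defaultdict


def kingdoms_per_cluster(cluster_of, kingdom_of):
    """Map each cluster ID to the set of kingdoms its member CYPs come from."""
    kingdoms = defaultdict(set)
    for cyp, cluster in cluster_of.items():
        kingdoms[cluster].add(kingdom_of[cyp])
    return dict(kingdoms)


def kingdom_specific(kingdoms):
    """Return the cluster IDs whose members all belong to a single kingdom."""
    return sorted(c for c, ks in kingdoms.items() if len(ks) == 1)


# Toy data: cluster 2 mixes fungal and plant CYPs, the others do not.
cluster_of = {"cypA": 1, "cypB": 1, "cypC": 2, "cypD": 2, "cypE": 3}
kingdom_of = {
    "cypA": "Fungi",
    "cypB": "Fungi",
    "cypC": "Fungi",
    "cypD": "Plantae",
    "cypE": "Metazoa",
}
kingdoms = kingdoms_per_cluster(cluster_of, kingdom_of)
print(kingdom_specific(kingdoms))
```

The same grouping works at any taxonomic rank by swapping the kingdom mapping for a phylum or subphylum mapping.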

To validate our clustering approach and to link resulting clusters to previous classifications, the clusters were compared with CYP families and clans identified in previous studies [73, 74]. Our clusters were consistent in most instances with previous classifications, indicating good concordance between FCPD clusters and known families and clans. Out of 459 fungal CYP families identified in Nelson’s database, 292 CYP families matched CYPs in the FCPD. Those that did not match corresponded to CYPs in species that are not currently covered in our database.

At the clan level, 77 clusters matched with 115 clans from the previous clan classification (Table 3.1) with some clusters including multiple clans. In only three instances our clustering results suggested that two or more clans needed to be merged: (i) clans CYP531 and CYP532; (ii) CYP619 and CYP530; and (iii) CYP567, CYP561, CYP563 and CYP60. Orphan clans identified in previous classification [74] were assigned to some of the non-orphan clans through our clustering. We identified 38 new putative clans and validated existing clans, which brought the total number of clans in FCPD to 115.

As a result of this expanded clan classification, 131 additional CYP families were put into new and existing clans (Table 3.1; marked in bold). Of those, six families that correspond to singlet FCPD clusters were classified as orphan clans. The resulting clans vary in size and the number of CYP families included. The largest clans (CYP531 and CYP58) contain 14 families each. The size distribution analysis showed that, like many other protein families [109, 110], CYP clusters follow a power law distribution (Figure 3.5). Only 37 clusters with more than 100 members were observed. In contrast, 1,726 clusters comprised a single CYP. This again reflects the common evolutionary pattern found in many protein families.
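The size summary used to judge a clustering run (number of singlets, number of clusters with more than 100 members, and the overall size histogram) can be sketched in Python as below. The function name and the toy sizes are hypothetical; the threshold of 100 members is the one used in the text.

```python
from collections import Counter


def summarize_sizes(cluster_sizes, mega_threshold=100):
    """Summarize a list of cluster sizes (one integer per cluster)."""
    histogram = Counter(cluster_sizes)
    return {
        "clusters": len(cluster_sizes),
        "singlets": histogram[1],
        "mega": sum(1 for size in cluster_sizes if size > mega_threshold),
        "histogram": dict(sorted(histogram.items())),
    }


# Toy sizes only; these are not FCPD numbers.
summary = summarize_sizes([1, 1, 1, 2, 2, 5, 150])
print(summary)
```

Plotting the histogram on log-log axes is one common way to inspect whether the sizes are compatible with a power law distribution.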

Clusters, families and clans have been made accessible through the FCPD database, which enables global viewing and analysis of fungal CYPs. New CYPs can be annotated using the BLAST search function. The FCPD 1.2 includes sequences from newly sequenced fungi as well as non-fungal species and provides a higher order (clan) classification of CYP families.

Also, hits to previously characterized CYPs can be explored, and hits to newly characterized CYPs can be identified, under the characterized CYP tab. More details regarding the FCPD database have been previously described [76]. To assist users, we added a video tutorial on how to search the database.

3.2.3 Wide variation of the CYPome

The total number of CYPs and their relative fraction within the proteome varied widely among eukaryotic kingdoms and phyla. The boxplots in Figure 3.2A show that plants have the largest CYPome (0.82%), protists have the smallest CYPome (0.05%), and fungi are placed in the middle (0.40%). The soybean has the largest CYPome, comprising 699 CYPs.
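The relative CYPome size is the number of CYPs expressed as a percentage of all predicted proteins in a genome. A minimal Python sketch of this calculation follows; the function name and the example counts are hypothetical and are not taken from any genome in the database.

```python
def cypome_fraction(n_cyps: int, n_proteins: int) -> float:
    """Return the CYPome size as a percentage of the proteome."""
    if n_proteins <= 0:
        raise ValueError("n_proteins must be positive")
    return 100.0 * n_cyps / n_proteins


# Hypothetical genome with 40 CYPs among 10,000 predicted proteins.
print(f"{cypome_fraction(40, 10_000):.2f}%")
```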

The size of the CYPome of individual species within kingdoms also varied drastically, presumably reflecting their diverse lifestyles and ecological niches. The largest variation was observed in fungi and plants. In fungi, Pezizomycotina and Basidiomycota have the largest and most variable CYPomes (Figure 3.2B). The CYPomes of certain Basidiomycota fungi such as the brown rot fungus Postia placenta (353 CYPs) and the cocoa tree pathogen Moniliophthora perniciosa (307 CYPs) are almost as big as plant CYPomes. In these species, massive expansions of CYPs involved in oxidizing complex hydrocarbons were observed [111]. In contrast, some Basidiomycota fungi, such as Puccinia graminis (18 CYPs) and Malassezia globosa (6 CYPs), have undergone massive reductions, probably reflecting their obligate pathogenic lifestyles. Members of the Chytridiomycota and Oomycota also showed small CYPomes. Members of Saccharomycotina and Taphrinomycotina have the smallest CYPomes among fungi (2-3 CYPs).

3.2.4 Phyletic distribution of CYP families and clans in fungi

Our phyletic analysis showed uneven distribution of CYP cluster sizes among taxa, which is consistent with extreme expansions and contractions of certain CYP families in the course of evolution. Seven out of 30 largest fungal-specific clusters were exclusively comprised of CYPs from the subphylum Pezizomycotina. The most dramatic expansions were observed in Pezizomycotina (clans CYP58 and CYP68), Agaricomycotina (clans CYP5150 and CYP63) and Mucoromycotina (family CYP509). Some small clusters contained only species-specific CYPs; such clusters were especially prevalent in members of Oomycota and Mucoromycotina.


The five largest fungal-specific clusters in FCPD had 1,056, 472, 452, 322 and 319 CYPs, respectively. These clusters represent some of the largest CYP families in fungi (Table 3.4, Table 3.7). The largest cluster (Cluster # 3) contains CYPs from the subphyla Agaricomycotina (Basidiomycota) and Pezizomycotina (Ascomycota). In this cluster, most Pezizomycotina CYPs (100) correspond to members of family CYP620, whereas a number of Agaricomycotina CYPs (508) belong to family CYP5144. Some members of both families, especially those in Agaricomycotina, are known to be involved in xenobiotic metabolism [112]. Additionally, this cluster includes 156 CYPs from Heterobasidion annosum and 122 CYPs from Postia placenta, and six basidiomycete species contribute more than 50 CYPs each, which suggests an expansion of CYPs involved in lignin degradation in these wood-rotting fungi.

The second largest fungal-specific cluster (# 11) has CYPs from Saccharomycotina and Pezizomycotina. It is comprised of families CYP52, CYP548, CYP539, and CYP655 as well as a few other families involved in alkane assimilation (Table 3.4). The third largest cluster (# 12) consists of CYPs from Pezizomycotina. Its most dominant family is CYP65, which contains CYPs predicted to function in secondary metabolism.

In addition to fungal-specific clusters, six clusters that contain both fungal and non-fungal CYPs were found. Many of them are involved in evolutionarily conserved core metabolic roles and are likely derived from common ancestral proteins. Cluster 17 contains family CYP61, one of the most conserved CYP families in fungi and beyond. The cluster has CYPs from all sub-phyla of fungi, Amoebozoa, the unicellular opisthokont Capsaspora owczarzaki and one CYP from the alga Coccomyxa sp. Cluster 23 includes families CYP505 and CYP541, and CYPs from all fungal taxa, actinobacteria, Bacillariophyta, and the plant Populus trichocarpa. Cluster 7 includes CYPs from Zygomycota and Blastocladiomycota as well as Oomycetes, protists and plants. Cluster 8 includes a single family from the chytrid Spizellomyces punctatus and many CYPs from chordate eukaryotes. Cluster 13 contains members of CYP51, which are implicated in sterol biosynthesis and antifungal drug resistance in all fungal phyla [61] as well as various CYPs from Amoebozoa, Bacillariophyta, Euglenozoa and Chordata. Lastly, cluster 69 contains CYP55, in which fungal and bacterial CYPs are clustered together. Some of these families will be discussed in more detail below.

Our clustering approach also revealed 959 phylum-specific clusters as well as 1,044 CYPs that did not belong to any previously defined CYP families, out of which 560 were present in singlet clusters. CYP families present in individual phyla and subphyla (excluding Saccharomycotina) were also examined. Five CYP families were present in all Pezizomycotina species, while four families were present in all Basidiomycete species. Among these, three families (CYP51, CYP61, and CYP53) were common to both taxa. The CYP53 family is absent among the chytrid fungi (Figure 3.4). The most parsimonious explanation is that CYP51, CYP61 and CYP530 were present in the last common ancestor of all fungi. Indeed, CYP51 is thought to be present even in early eukaryotes, and it has been hypothesized that CYP61 evolved from CYP51. On the other hand, family CYP530 seems to be specific to fungi and is known to be involved in degradation of various fatty acids and hydrocarbons as nutrient sources (Table 3.6: xenobiotic metabolism). It might have evolved in the last common ancestor of all fungi to adapt to a new nutritional niche.

3.2.5 Functional annotation and classification of CYP clusters

To assign putative functional roles to individual clusters, we conducted a comprehensive literature review for functionally characterized fungal CYPs. This survey led to the identification of 54 CYPs that had been shown to be involved in (i) primary metabolism (15 CYPs), (ii) secondary metabolism (28) or (iii) xenobiotic metabolism (11) (Table 3.2). We then used BLASTp to search the FCPD database with these CYPs as queries (Methods). A total of 2,457 hits (E-value cutoff of 1e-100) were generated with the CYPs involved in the primary metabolism. This high number of hits is mainly due to the presence of well conserved, housekeeping families such as CYP51 and CYP61, which are involved in ergosterol biosynthesis [61, 62]. Additionally, we found 544 and 642 hits with those involved in the secondary and xenobiotic metabolisms, respectively (Table 3.5).
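The transfer of functional categories from characterized CYPs to clusters can be sketched in Python as follows. This is an illustrative reconstruction, not the FCPD code: the function name, the tuple layout of a hit, and all identifiers in the toy data are hypothetical, and applying the 1e-100 cutoff to every category is an assumption, since the text states it explicitly only for the primary metabolism queries.

```python
from collections import defaultdict


def annotate_clusters(hits, category_of, cluster_of, max_evalue=1e-100):
    """Collect, per cluster, the categories of characterized CYPs that hit it.

    hits        : iterable of (characterized_query, database_cyp, evalue)
    category_of : characterized query -> functional category
    cluster_of  : database CYP -> cluster ID
    """
    categories = defaultdict(set)
    for query, subject, evalue in hits:
        if evalue <= max_evalue and subject in cluster_of:
            categories[cluster_of[subject]].add(category_of[query])
    return dict(categories)


# Toy data only; names and values are illustrative.
category_of = {
    "ref_sterol": "primary",
    "ref_toxin": "secondary",
    "ref_benzoate": "xenobiotic",
}
cluster_of = {"cyp1": "c1", "cyp2": "c1", "cyp3": "c2", "cyp4": "c3"}
hits = [
    ("ref_sterol", "cyp1", 1e-150),
    ("ref_sterol", "cyp2", 1e-120),
    ("ref_toxin", "cyp3", 1e-110),
    ("ref_benzoate", "cyp4", 1e-40),  # fails the E-value cutoff
]
annotation = annotate_clusters(hits, category_of, cluster_of)
print(annotation)
```

Keeping a set of categories per cluster makes clusters that attract hits from more than one category easy to spot for manual review.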

The relatively small number of hits to experimentally characterized CYP clusters involved in secondary metabolism suggests that many fungi have evolved a lineage-specific repertoire of CYPs to produce specific secondary metabolites. Notably, only one family (CYP58) contained CYPs involved in both the secondary and xenobiotic metabolisms. It has been hypothesized that CYP58 from Phanerochaete chrysosporium includes a member that functions as benzoate 4-hydroxylase (Xenobiotic metabolism) and is also involved in trichothecene biosynthesis (Secondary Metabolism) [113].

Excluding CYP58, we found 12, 30 and 12 CYP families that uniquely matched to the primary, secondary and xenobiotic metabolism categories, respectively. These 54 CYP families were then used to assign putative functional roles to the respective clans. Using this approach we tentatively classified a total of 34 clans into primary metabolism (5 clans), secondary metabolism (17), and xenobiotic metabolism (12) (Table 3.6).

3.2.6 Detailed analysis of specific clans

Selected CYP clans and families were analyzed in detail to augment and validate previous evolutionary studies [61, 62, 112-116] and to highlight notable features.


3.2.6a Clans 51 and 61

Our clustering analysis fully supported families CYP51 and CYP61, which are comprised of housekeeping CYPs found in almost all fungi, plants and animals. CYP51 is a lanosterol 14-alpha demethylase involved in 14-demethylation of sterol precursors, and this demethylation step is common throughout all organisms [117]. To better understand its evolution, we constructed a phylogenetic tree with CYP51s from fungi, the early opisthokonts and other single-celled eukaryotes (Figure 3.9).

Most yeasts have a single gene for CYP51, whereas most Pezizomycotina species have two genes, except that Fusarium species and Aspergillus carbonarius have three genes. Basidiomycetes also have a single gene for CYP51 with the exception of Postia placenta and Coprinus cinereus (two genes). Rhizopus oryzae encodes two CYP51s. Allomyces macrogynus, the earliest chytrid, and the diatom Fragilariopsis cylindrus have two genes. These species do not have any CYP61, which is consistent with the view that the CYP51 gene duplicated very early in fungal evolution and then one of the duplicates might have given rise to CYP61 [62].

The phylogenetic analysis of CYP61 (Figure 3.10) revealed the presence of a single gene in all yeasts and most basidiomycetes, with P. placenta having two genes. Most Pezizomycotina have at least two genes. The rust fungi Puccinia graminis and Melampsora laricis-populina (Pucciniomycotina) do not have a gene for CYP61. The absence of CYP61 genes in these two species could be due to their obligate biotrophic lifestyle, wherein they can utilize essential sterols from the plant hosts.


3.2.6b Clans 65 and 68

These clans consist of CYPs that belong to the secondary metabolism category. CYP65 has been found to catalyze the epoxidation reaction in trichothecene biosynthesis as well as radicicol biosynthesis (Table 3.1, Figures 3.7 and 3.11), while CYP68 carries out the C-8 oxygenation reaction in trichothecene biosynthesis (Table 3.1, Figure 3.8) and an oxidation reaction in gibberellin biosynthesis [56]. The phylogenetic trees of CYP65 and CYP68 reveal multiple recent duplications and expansions (Figures 3.7, 3.8 and 3.11). These clans are absent in Ascomycete yeasts and Basidiomycete species, suggesting that they might have emerged in the ancestor of Pezizomycotina.

Among Pezizomycotina, there is a wide variation in the number of CYPs in these clans. The Coccidioides species have just one gene for CYP65, whereas Dothideomycetes and Aspergillus species have on average 8-10 genes for CYP65s and 3-4 genes for CYP68s. Dothideomycetes have on average at least 5-6 more genes than other fungi, which is consistent with their secretion of diverse host selective toxins (HSTs, [118]). Many of these HSTs are products of secondary metabolism pathways.

The highest number of CYP65 and CYP68 clan members is seen in Magnaporthe oryzae, Colletotrichum graminicola and Colletotrichum higginsianum (Figures 3.7 and 3.8). All three fungi form appressoria (specialized infection structures formed by germinating spores) to enter the plant cell. Expression studies have demonstrated that secondary metabolism pathways are active during the infection process [64], suggesting that the increased number of CYP65 and CYP68 family members in these fungi might be linked to their pathogenicity.


3.2.6c Clan 505

CYP505 members are fatty acid hydroxylases and carry out the sub-terminal omega hydroxylation of fatty acids, a step required for using them as an energy source. It was hypothesized that CYP505 in fungi has evolved from bacterial CYP450BM3 via a horizontal gene transfer (HGT) event [114]. This hypothesis is supported by the fact that both types have a fused NADPH CPR domain (http://drnelson.uthsc.edu/P4503d.html).

To test this HGT hypothesis, we performed a phylogenetic analysis of this clan (which includes 161 CYPs from families CYP505 and CYP541). Contrary to the hypothesis, the tree topology (Figure 3.12) suggests an ancient origin of this clan in eukaryotes and subsequent loss. The earliest members seem to be present in the unicellular opisthokont Capsaspora owczarzaki and the unicellular alga Fragilariopsis cylindrus, in addition to bacteria. There are at least two genes for CYP505 in most fungi, while the early eukaryotes F. cylindrus and A. macrogynus have 5 and 4 genes, respectively, suggesting an early increase in copy number and subsequent gene losses. CYP505s are absent in Ascomycete yeasts. Among Pezizomycotina, A. flavus and Podospora anserina have 5 genes, and M. grisea has 4 genes. Basidiomycetes also have at least 2 genes, with the white rot fungus P. chrysosporium containing 6 genes. It has been hypothesized [119] that CYP505 is used by plant-associated fungi to degrade plant cuticle, which is synthesized by in-chain hydroxylation of fatty acids [120].

3.2.6d Clan 52

Cluster 11 contained all the CYPs belonging to this clan; we built a neighbor-joining tree to look at their relationships (Figure 3.13). CYP52 members are found in Candida species that are known to metabolize alkanes and other hydrocarbons, but are absent in Saccharomyces cerevisiae and Schizosaccharomyces pombe [121]. As many as 12 CYP52 proteins are encoded by Yarrowia lipolytica; the family is, however, absent in Basidiomycetes. The most parsimonious evolutionary scenario suggests that the family evolved in the ancestor of budding yeasts and was lost in the S. cerevisiae lineage but expanded in the Pezizomycotina. There is wide variation within the sub-phylum Pezizomycotina. The highest numbers of CYP52 proteins (12) are seen in Aspergillus flavus, A. niger CBS 513.88, Trichoderma virens Gv29-8, Botrytis cinerea and Magnaporthe oryzae. Talaromyces stipitatus and Penicillium marneffei have 10 and 11 members of CYP52, respectively. This pattern suggests that expansion of this family allowed these Ascomycete fungi to efficiently metabolize various hydrocarbon compounds. In M. oryzae, CYP52 is upregulated during the penetration of the plant cuticle, which is made up of hydrocarbons [122]. Similar processes could be happening in B. cinerea and A. flavus, both of which are known to be pathogenic to plants. Trichoderma virens Gv29-8, T. reesei (9 genes) and T. atroviride (6 genes) are known to penetrate fungal cell walls [123] as well as plant roots [124] and might be using their CYP52 repertoire to support these processes.

3.2.6e Clan 53 and Clan 504

CYP53 is a benzoate-para-hydroxylase enzyme that was first discovered in Aspergillus niger [125]. This benzoate detoxification occurs via the beta-ketoadipate pathway [126], which is present in many soil microbes that degrade aromatic compounds, some of which are released by plants [127]. Although benzoate detoxification appears to be the main function of this CYP, it has also been found to exhibit O-demethylation activity [112]. Clan 53 is a single-family clan in cluster 37 and contains 89 CYPs. This family is absent in Ascomycete yeasts. A wide variation in its size was observed in the wood-decaying fungi Postia placenta (14 genes), Pleurotus ostreatus (3 genes) and Phanerochaete chrysosporium (1 gene). Considering their proposed role in degrading plant-based aromatic compounds that are released by plants into the soil or are present in dead plant material, this wide variation is puzzling. They are also present in several plant pathogenic fungi such as Fusarium oxysporum (3), F. graminearum (4), Puccinia graminis (1), Moniliophthora perniciosa (2), Cochliobolus heterostrophus (3), and Botrytis cinerea (2), suggesting the possibility that the benzoate-degrading activity may contribute to pathogenesis.

Clan CYP504 includes CYPs that are involved in phenylacetate catabolism [128]. Specifically, they are involved in the ortho-hydroxylation of phenylacetate, which is a precursor in penicillin production. Like Clan 53, this clan is a single-family cluster (cluster 29; Figure 3.14). The family is found in many saprophytic species as well as a number of Basidiomycete fungi that can degrade phenol derivatives as a source of carbon [68]. This family is also present in a number of human and plant pathogenic fungi such as Stagonospora nodorum (3), C. heterostrophus (4), Penicillium marneffei (5), Fusarium oxysporum (3), F. graminearum (4) and F. solani (5). Both CYP53 and CYP504 family members were found to be upregulated during cuticle infection by the insect pathogenic fungi Metarhizium anisopliae (4 genes) and M. acridum (2 genes) [67]. It was suggested that in these insect pathogens these CYP families carry out detoxification of insect-released phenylacetate [67, 129].

3.2.6f Clan 533

This clan forms one of the largest fungal clusters. It contains 15 CYP families; of these, two are specific to Ascomycota, 10 are specific to Basidiomycota, and three (CYP533, CYP620 and CYP621) are common to both. The three common families form clan 533 in the previous classification by Deng et al. [74]. CYPs belonging to the CYP533 family seem to be involved in secondary metabolism since they show similarity to CYPs involved in sterigmatocystin and aflatoxin biosynthesis. The largest Basidiomycete-specific family in this clan is the CYP5144 family that has 354 members, some of which have been found to be involved in degradation of polyaromatic hydrocarbons (PAH) [112]. Many CYPs in this cluster exist in the brown rot Postia placenta (120 CYPs), the forest pathogen Heterobasidion annosum (78 CYPs), the saprophytic Coprinus cinereus (61), the edible mushroom Pleurotus ostreatus (60), the white rot Phanerochaete chrysosporium (56), and the dry rot Serpula lacrymans (55). Among Ascomycete fungi, Aspergillus flavus (8), A. oryzae (8), A. niger (5), Fusarium verticillioides (6), F. oxysporum (7), F. graminearum (7) and Trichoderma virens (5), all of which are known for their secondary metabolite repertoire, have the largest numbers. The presence of CYP5144 (PAH and xenobiotics degradation) and CYP533 (secondary metabolite biosynthesis) in this cluster indicates that these families might have evolved from a common ancestral CYP family.

3.2.7 CYPs in Mucoromycotina, Blastocladiomycota and Oomycota

Certain clusters contained CYPs from Mucoromycotina, Blastocladiomycota and Oomycota. CYPs from Mucoromycotina were grouped into 28 clusters, including three clusters that also contained non-fungal CYPs (CYP51, CYP61, and CYP505) and 22 clusters containing CYPs only from Mucoromycotina. One of the clusters (# 7) had CYPs from Mucoromycotina as well as CYPs from Oomycota, Blastocladiomycota, protists, plants and Ustilago maydis (Basidiomycota). Plant CYPs in this cluster (clan CYP86) included characterized enzymes that modify fatty acid and alkane substrates. This pattern is consistent with an ancient origin of this alkane-metabolizing CYP clan, perhaps predating the split of the eukaryotes into Unikonts, Plantae and Chromalveolates.

Blastocladiomycota CYPs follow a pattern similar to that seen in Mucoromycotina. Only three clusters contain CYPs from other phyla. Interestingly, no CYP61s are present in Blastocladiomycota, possibly indicating the loss of the ability to synthesize ergosterol. There are 14 clusters that contain CYPs from Blastocladiomycota only. Most CYPs from Mucoromycotina and Blastocladiomycota exhibited low similarity to CYPs in Nelson’s classification system.

As expected, Oomycota CYPs gather into separate clusters (18 out of 19), with the exception of cluster 7. Eleven CYPs do not show any significant similarity to CYPs in Nelson’s classification system. Only four known CYP families, all with unknown functions, were identified in Oomycetes (CYP5014-5017). A Blast search with CYP5015 as the query showed 30% identity over 89% coverage to a CYP94 that is involved in fatty acid metabolism. Similarly, CYP5014 shows 34% identity over 89% coverage to fatty acid omega hydroxylases (CYP86). CYP5016 and CYP5017 also show similar levels of identity to fatty acid hydroxylases. Thus, most CYPs from Oomycota, which have about 30-40 CYPs per genome, could be involved in fatty acid metabolism. Our observations are consistent with previous studies [102, 130] that predict the absence of extensive secondary metabolism clusters (and consequently of the associated CYPs) in these organisms.

3.2.8 CYPs with unusual phyletic profiles

To better understand CYP evolution, we examined several clusters that contained CYPs from more than one kingdom. We observed several patterns that may have emerged through horizontal gene transfer (HGT), which has been implicated as a contributing factor in fungal adaptation to new ecological niches [131-136], or through rapid birth–death evolution. Our analysis of clusters 23 and 69 was consistent with previously published examples of HGT in Fusarium oxysporum [114] and Phanerochaete chrysosporium [75]. Cluster 69 contains CYP55s from Phanerochaete chrysosporium, Pezizomycotina, and Streptomyces spp. Similarly, cluster 23 (clan CYP505) contains CYPs from bacteria, plants, early opisthokonts and fungi.

Additional examples of CYPs with unusual phyletic profiles are described below. However, in all cases, the evolutionary relationships within these clusters have been difficult to establish firmly because of low taxon sampling. Cluster 46 has 72 CYP540 members, including five CYPs from Mucoromycotina species that show high sequence similarity to bacterial CYPs. Phylogenetic analysis shows two branches, one with only fungal CYPs and another with bacterial and Mucoromycotina CYPs (Figure 3.15).

Clan CYP5081 (Cluster 126) is composed of 18 intron-less CYPs, including four from Aspergillus spp. and three from Microsporum spp. The CYPs from A. fumigatus are predicted to be involved in helvolic acid biosynthesis [137], and their orthologs in the insect pathogens Metarhizium anisopliae and M. acridum are expressed during cuticle infection [138]. The observed phyletic pattern is consistent with massive gene loss in most fungi, although HGT from nitrogen-fixing bacteria that also synthesize helvolic acid [137] cannot be completely excluded.

Clan CYP544 (Cluster 109) contains 21 CYPs, mainly from plant pathogens and epiphytes (fungi that survive on the surface of plants). Some members share sequence similarity with CYPs involved in the biosynthesis of camptothecin [139], an alkaloid secreted by plants that has anti-cancer properties. There are two homologs from Fusarium solani in this cluster; one of them has been identified as a pseudoparalog [140]. This pseudoparalog lies on a dispensable chromosome in F. solani and shows similarity to CYP94 family members from plants [141]. Other CYPs in the cluster also show similarity to plant CYPs belonging to the CYP86 clan. Our phylogenetic analysis (Figure 3.16) is consistent with previously published evidence [139] of HGT from plants to fungi intimately associated with them.

We also analyzed clusters 173 and 212, which contain 10 and 7 CYPs, respectively, from plant-pathogenic and plant-associated fungi. While Cluster 173 has CYPs from four different Basidiomycota fungi, all seven CYPs in Cluster 212 are from Puccinia graminis. All the CYPs in these clusters belong to families CYP5025 and CYP5026, which share significant similarity with CYP86 and CYP704, families that metabolize fatty acids and function in the biosynthesis of plant cutin, among other roles [142]. The phylogenetic analysis (Figure 3.17) suggests that clan CYP86 in plants and families CYP5025/CYP5026 in fungi arose from a common ancestral CYP family that was involved in the metabolism of complex hydrocarbons.

Finally, three CYPs from Fusarium species (Cluster 416, Clan CYP645) showed sequence similarity to bacterial P450RhF proteins [143]. The RhF CYPs represent the first example of bacterial CYPs that receive electrons from an FMN- and Fe/S-containing reductase fused to them [144]. No other fungi have this type of CYP. The phylogenetic tree (Figure 3.18) is consistent with the presence of this type of CYP in the ancestor of F. oxysporum and F. graminearum.

3.3 Conclusion

Here we present a new version of FCPD, which holds 9,697 CYPs from 113 fungal and oomycete species in addition to CYPs from selected species in other kingdoms. There is no perfect solution to clustering proteins as diverse and numerous as CYPs, but we believe that our clustering pipeline provides an improved CYP classification system. Using this pipeline, we have identified new clans and families. To our knowledge, this study represents the most extensive classification of fungal and oomycete CYPs, which will facilitate the functional annotation and classification of putative CYPs encoded by newly sequenced fungal and oomycete genomes. The FCPD 1.2 pipeline can efficiently group CYPs from newly sequenced genomes and help predict their functions.

The CYP number for certain species may have been overestimated because of two factors: (i) heterozygous alleles of the same gene and (ii) artifacts created during genome assembly and annotation being counted as unique genes. Some species are diploids with a certain degree of heterozygosity between alleles, which might have been counted as unique genes, thus inflating the total number of CYPs. In some cases, gene fragments (arising from errors during genome assembly) have been counted as separate genes. Rectifying these potential artifacts manually is challenging, owing in part to the very large amount of data in FCPD and to the difficulty of validating individual entries.

There is also CYP redundancy in the database because CYP sequences from multiple strains of several species are included: the database contains data from 112 strains of 26 species. In the case of Postia placenta, which encodes the largest CYPome among fungi, we identified eight alleles that had been counted as separate genes. A similar analysis of the Solanum phureja CYPome (the largest among plants) showed four alleles that had been identified as distinct genes. Users should keep these caveats in mind when using the database.

Our analysis of fungal CYPs points to a number of notable evolutionary patterns. Gene duplication and subsequent modification of the duplicated copies seem to have played a major role in creating the observed CYP diversity. The CYP family expansions seen in some basidiomycetes, such as Postia placenta, Heterobasidion annosum, and Phanerochaete chrysosporium, as well as in ascomycetes such as Magnaporthe oryzae, Stagonospora nodorum, Fusarium solani, and F. oxysporum, may have helped these fungi adapt to their current ecological niches. Although massive CYP gene losses probably underpin unusual phyletic profiles, horizontal gene transfer cannot be completely discounted as a mechanism. The curated CYP dataset in FCPD 1.2 provides a solid foundation for in-depth studies of myriad evolutionary patterns, which will contribute to understanding fungal evolution.

3.4 Methods

3.4.1 Acquisition of data and phylogenetic analyses

In total, 323 genomes stored in the Comparative Fungal Genomics Platform (CFGP) [46] were searched for CYPs. Sixteen InterPro domains associated with CYP proteins were used for this identification. To filter out false positives, domains that spanned fewer than 25 amino acids were labeled as “questionable” and manually evaluated as previously described [76]. The filtered sets of protein sequences were used for clustering (Figure 3.6).
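The length-based filter can be illustrated with a minimal Python sketch. It is not the FCPD implementation: the `DomainHit` record, its field names, and the toy hits are hypothetical, and only the 25-amino-acid threshold comes from the text.

```python
from dataclasses import dataclass

MIN_DOMAIN_LENGTH = 25  # amino acids; threshold stated in the text


@dataclass(frozen=True)
class DomainHit:
    """One domain match on a protein (hypothetical record layout)."""
    protein_id: str
    domain_id: str  # InterPro accession; the value used below is illustrative
    start: int      # 1-based, inclusive
    end: int        # 1-based, inclusive

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def split_hits(hits):
    """Separate accepted hits from 'questionable' ones that need manual review."""
    accepted, questionable = [], []
    for hit in hits:
        if hit.length < MIN_DOMAIN_LENGTH:
            questionable.append(hit)
        else:
            accepted.append(hit)
    return accepted, questionable


# Toy data, invented for illustration only.
hits = [
    DomainHit("protA", "IPR001128", 40, 480),
    DomainHit("protB", "IPR001128", 10, 30),
]
accepted, questionable = split_hits(hits)
```

In this toy input, the 21-residue match on `protB` would be set aside for manual evaluation, while `protA` would pass directly to clustering.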

Phylogenetic analyses were performed using the neighbor-joining (NJ), minimum evolution (ME), and maximum-likelihood (ML) methods as implemented in MEGA version 5.05, with 1,000 bootstrap replicates [145]. To deal with alignment gaps, we used pair-wise deletion for NJ and ME trees, whereas complete deletion was used in building ML trees. Default parameter values were used for all the phylogenetic methods. The alignments were constructed with the ClustalW option of MEGA, using the Gonnet matrix and default parameter values. In each case, the most prevalent phylogenetic tree with the best bootstrap support was chosen for further analysis. In some cases, such as Figures 3.15, 3.16, 3.17, and 3.18, phylogenetic trees were built with GenBank sequences extracted via Blast with selected CYP queries. This was done to include CYPs from species that were not represented in the FCPD.
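The difference between the two gap-handling options can be sketched in Python as follows. This is an illustration under simplifying assumptions, not MEGA's code: it uses a plain p-distance (proportion of differing sites) rather than any substitution model, and the three toy sequences are invented.

```python
from itertools import combinations

GAP = "-"


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites; sites with a gap in either sequence are skipped."""
    pairs = [(x, y) for x, y in zip(a, b) if x != GAP and y != GAP]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def complete_deletion(alignment: dict) -> dict:
    """Drop every alignment column that contains a gap in any sequence."""
    length = len(next(iter(alignment.values())))
    keep = [i for i in range(length)
            if all(seq[i] != GAP for seq in alignment.values())]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


# Toy alignment, invented for illustration only.
alignment = {
    "seq1": "MKT-LLA",
    "seq2": "MKTALLV",
    "seq3": "M-TALIV",
}

# Pair-wise deletion: gaps are ignored separately for each pair of sequences.
pairwise = {(a, b): p_distance(alignment[a], alignment[b])
            for a, b in combinations(alignment, 2)}

# Complete deletion: gapped columns are removed once, for all sequences.
trimmed = complete_deletion(alignment)
complete = {(a, b): p_distance(trimmed[a], trimmed[b])
            for a, b in combinations(trimmed, 2)}
```

Pair-wise deletion retains more sites per comparison (six for `seq1` versus `seq2` here, against five after complete deletion), at the cost of each distance being computed over a different set of columns.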

3.4.2 Clustering of the CYPs using BLASTp and the optimized Tribe-MCL algorithm

CYP sequences were clustered using the optimized Tribe-MCL algorithm [107]. Reciprocal Blast searches were performed to identify putative ortholog groups to be submitted to the clustering algorithm. The Tribe-MCL clustering procedure is governed by two main parameters: (i) the E-value obtained from the pair-wise BLASTp comparison of all CYPs (default value 1e-5 or lower) and (ii) the inflation factor (indicating the “tightness” of the clusters), with 5 as the highest value [107]. To improve the classification, we added one more parameter, “coverage”, defined as the percentage of the query sequence matched by a sequence from the database. To find optimal conditions for these three parameters, we tested the efficiency of clustering with various combinations: (i) E-values between 1e-10 and 1e-100, decreasing by a factor of 1e-10 at each step; (ii) nine coverage values from 20% to 100% at intervals of 10%; and (iii) inflation factors from 1 to 5. We empirically chose the optimal parameters as E-value = 1e-50, coverage = 60%, and inflation factor = 5 (Table 3.3).
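The E-value and coverage thresholds act as a pre-filter on the BLASTp hits before clustering. A minimal Python sketch of such a filter is given below; it is not the FCPD pipeline. The dictionary keys, the toy hits, and the computation of coverage from the aligned query interval are assumptions, and the inflation factor is only listed because it is consumed by the clustering step itself, which is not reproduced here.

```python
from itertools import product


def query_coverage(q_start: int, q_end: int, q_len: int) -> float:
    """Percentage of the query sequence covered by the aligned region."""
    return 100.0 * (q_end - q_start + 1) / q_len


def filter_hits(hits, max_evalue, min_coverage):
    """Return (query, subject, evalue) edges that pass both thresholds."""
    edges = []
    for hit in hits:
        if hit["query"] == hit["subject"]:
            continue  # self-hits are skipped in this sketch
        coverage = query_coverage(hit["q_start"], hit["q_end"], hit["q_len"])
        if hit["evalue"] <= max_evalue and coverage >= min_coverage:
            edges.append((hit["query"], hit["subject"], hit["evalue"]))
    return edges


# Parameter ranges described in the text; integer inflation steps are assumed.
EVALUES = [10.0 ** -exp for exp in range(10, 101, 10)]
COVERAGES = list(range(20, 101, 10))
INFLATIONS = [1, 2, 3, 4, 5]
GRID = list(product(EVALUES, COVERAGES, INFLATIONS))

# Toy hits, invented for illustration only.
toy_hits = [
    {"query": "A", "subject": "A", "q_start": 1, "q_end": 500, "q_len": 500, "evalue": 0.0},
    {"query": "A", "subject": "B", "q_start": 1, "q_end": 400, "q_len": 500, "evalue": 1e-80},
    {"query": "A", "subject": "C", "q_start": 1, "q_end": 200, "q_len": 500, "evalue": 1e-90},
    {"query": "B", "subject": "D", "q_start": 1, "q_end": 450, "q_len": 500, "evalue": 1e-20},
]
edges = filter_hits(toy_hits, max_evalue=1e-50, min_coverage=60)
```

With the chosen thresholds, only the A-B hit survives: A-C fails on coverage (40%) and B-D fails on E-value. `GRID` enumerates the full cross product of the stated ranges, whereas Table 3.3 reports results in which one parameter is varied at a time.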

3.4.3 Clan identification

We were able to expand the clans identified in earlier studies [74, 113, 146] through our optimized clustering procedure. We searched our database for each clan using a search function that was built to query the database by various terms (e.g., Sequence ID, taxonomic group, and CYP family). We followed this step for all the clans mentioned in previous studies [73, 74, 113, 146], which allowed us to identify novel clans and assign CYP families to previously identified orphan clans (Table 3.1). A number of CYPs did not show any significant similarity to any of the CYP families in Nelson’s P450 databases, indicating that they are members of novel CYP families. Most of them were present in singlet clusters.

3.4.4 Classification of CYPs into putative functional categories

An extensive literature search was performed to identify 54 functionally characterized fungal CYPs. These CYPs were then matched to CYPs in FCPD using BLASTp with an E-value cutoff of 1e-100. This stringent E-value was chosen after empirical testing of several E-values. Based on similarity to the characterized CYPs, CYP families were classified into three broad functional categories: (i) primary metabolism, (ii) secondary metabolism, and (iii) xenobiotic metabolism. Many of the hits occurred in more than one category. To link CYP clans to these functional categories, we transferred the functional annotations described above to the respective clans. The BLASTp hits and the characterized set of CYPs can be accessed at http://p450.riceblast.snu.ac.kr/char_p450.php.
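The annotation transfer described above can be sketched in Python as follows. This is a simplified illustration rather than the FCPD code: the hit tuples, CYP identifiers, and family labels (`FAM_A` to `FAM_D`) are invented placeholders, while the three gene-to-category assignments follow Table 3.2 and the 1e-100 cutoff follows the text.

```python
from collections import defaultdict

EVALUE_CUTOFF = 1e-100  # cutoff stated in the text

# Category of each characterized query CYP (three examples from Table 3.2).
CHARACTERIZED = {
    "Erg11": "primary",
    "Tri4": "secondary",
    "phacA": "xenobiotic",
}


def classify_families(hits, family_of, cutoff=EVALUE_CUTOFF):
    """Map each CYP family to the set of categories supported by its BLASTp hits."""
    categories = defaultdict(set)
    for query_gene, subject_cyp, evalue in hits:
        if evalue <= cutoff:
            categories[family_of[subject_cyp]].add(CHARACTERIZED[query_gene])
    return dict(categories)


# Toy data, invented for illustration only.
family_of = {"cyp_001": "FAM_A", "cyp_002": "FAM_B",
             "cyp_003": "FAM_C", "cyp_004": "FAM_D"}
hits = [
    ("Erg11", "cyp_001", 1e-150),
    ("Tri4", "cyp_002", 1e-120),
    ("phacA", "cyp_002", 1e-105),
    ("phacA", "cyp_003", 1e-110),
    ("Tri4", "cyp_004", 1e-40),  # fails the cutoff, so FAM_D stays unclassified
]
family_categories = classify_families(hits, family_of)
```

Storing a set of categories per family reflects the observation that some hits fall into more than one category; clan-level labels could then be derived by pooling the sets of the families in each clan.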

3.4.5 Online database architecture

FCPD was developed using PHP scripts with a MySQL database [76]. A Linux-based Apache web server and task management system support BLAST analysis and MCL clustering. Middleware written in Perl simultaneously executes the bioinformatics pipelines for the query submitted by the end user and retrieves the archived CYP dataset. The pipeline for FCPD is shown in Figure 3.6.

3.5 Tables

Table 3.1: CYP families and clans

Clan | Families
51 | 51
52 | 52, 538, 539, 584, 585, 655, 5087, 5113, 656, 5203
53 | 53
54 | 54, 503, 560, 599, 602, 604, 649, 5204, 5213, 5085, 5086, 5103, 601
55 | 55
56 | 56, 661, 509, 5210, 5211, 5212, 5099
58 | 58, 542, 551, 552, 681, 682, 580, 680, 579, 5094, 5095, 5096, 5105, 5112
59 | 59, 586, 587, 662
61 | 61
62 | 62, 684, 626
63 | 63
64 | 5206, 5207, 5208, 5209
65 | 65, 561, 562, 563, 564, 565, 567, 568, 5117, 5118, 60
68 | 68, 595, 596, 622, 650, 651, 652, 597, 598, 5061, 5067, 5073, 5074
504 | 504
505 | 505, 541, 5205
506 | 506
507 | 507, 527, 535, 570
512 | 512
526 | 526, 591, 638, 644
528 | 528
529 | 529, 543, 545, 592
530 | 530, 5027, 5065, 5066, 5068, 5069, 5148, 619, 663, 665, 5093, 5098, 5119
531 | 531, 631, 532, 57, 536, 629, 674, 675, 676, 5028, 5077, 5078, 5080, 5104
533 | 533, 502, 620, 621, 64, 5037, 5144, 5145, 5146, 5147, 5149, 5152
534 | 534
537 | 537, 577
540 | 540
544 | 544
546 | 546, 5053
547 | 547, 581, 582, 616, 617, 618, 5070
548 | 548, 5114, 5115
549 | 549
550 | 550, 553, 633, 634, 635, 636, 660, 610, 611, 612
559 | 559, 606, 623, 647
566 | 566
572 | 572, 573, 5109
574 | 574, 5029, 5076, 628, 669, 670, 671
575 | 575
576 | 576
578 | 578, 625
589 | 589, 5075, 614
590 | 590
593 | 593
603 | 603
605 | 605
607 | 607
608 | 608
609 | 609
613 | 613, 686, 685, 5082
615 | 615
624 | 624
627 | 627, 5030
630 | 630
632 | 632
637 | 637
639 | 639, 5100
640 | 640
642 | 642
643 | 643
645 | 645
646 | 646
648 | 648
653 | 653, 654
657 | 657, 641
659 | 659, 5090, 5111
664 | 664
666 | 666
667 | 667
672 | 672
673 | 673
677 | 677, 5064, 5142
678 | 678
683 | 683
687 | 687
698 | 698
5014 | 5014, 5015
5016 | 5016
5017 | 5017
5025 | 5025, 5026
5031 | 5031
5032 | 5032
5035 | 5035, 5036
5042 | 5042
5052 | 5052
5058 | 5058
5063 | 5063
5071 | 5071, 5106
5081 | 5081
5083 | 5083
5084 | 5084, 5121
5089 | 5089
5091 | 5091
5092 | 5092
5097 | 5097
5101 | 5101
5102 | 5102
5108 | 5108
5110 | 5110
5116 | 5116
5136 | 5136, 5137
5139 | 5139, 5151, 5034, 5033, 5138
5140 | 5140
5141 | 5141, 5154
5143 | 5143
5150 | 5150, 5155
5153 | 5153
5156 | 5156
5157 | 5157

Certain families were found in singlet clusters (containing a single CYP); these were left as orphan CYP clans: CYP511, CYP67, CYP583, CYP658, CYP668, CYP679, CYP5120, and CYP5125. The following CYP families from Nelson’s database did not have matches in FCPD 1.2: CYP5160-CYP5190, CYP5200-CYP5400, CYP6000, CYP501, CYP5038, CYP5039, CYP5040, CYP5043, CYP5044, CYP5045, CYP5046, CYP5047, CYP5048, CYP5049, CYP5050, CYP5051, CYP5054, CYP5055, CYP5056, CYP5057, CYP5060, CYP5062, CYP510, CYP5107, CYP5127, CYP5128, CYP5129, CYP5130, CYP5131, CYP5132, CYP5133, CYP5134, CYP5135, CYP5159, CYP557, CYP5667, CYP66, CYP69, CYP697 and CYP699.

Table 3.2: Characterized CYPs used for functional classification

Functional Category | Gene symbol | FCPD ID | GenBank ID | Function | PMID
Primary Metabolism | Erg11 | P450_Scl3004 | NP_011871.1 | Ergosterol Biosynthesis | 12140549
 | 14-alpha demethylase | P450_Caw007 | ACT21069 | Ergosterol Biosynthesis | 8277826, 18627475, 11600353
 | Lanosterol-14alpha demethylase | | AAP33132.1 | Ergosterol Biosynthesis | 14599667
 | CYP51a | P450_Af1075 | ACF17705.1 | Ergosterol Biosynthesis | 15917566
 | CYP51b | P450_Af1029 | AAK73660.1 | Ergosterol Biosynthesis | 11427550
 | Erg11 | P450_CnD008 | AAF35366.1 | Ergosterol Biosynthesis | 15474487
 | CYP51Ap | P450_Afl251 | EED56341.1 | Ergosterol Biosynthesis | 18775650
 | CYP51Bp | P450_Afl201 | EED50354.1 | Ergosterol Biosynthesis | 18775650
 | CYP51 | P450_Pc030 | ACI23621.1 | Ergosterol Biosynthesis | 18853217
 | Eln2 | P450_Cc206 | BAA33717 | Involved in pattern formation | 10779399
 | ahbB | P450_An062 | AAR15377 | Sphingolipid synthesis | 15465388
 | Erg11 | P450_Pc030 | ACI23621.1 | Ergosterol Biosynthesis | 18853217
Secondary Metabolism | CypA | | AAS90045 | Aflatoxin Biosynthesis | 15528514
 | AF115 | P450_Afl243 | AAT65721 | Aflatrem Biosynthesis | 15528556, 19801473
 | GliC | P450_Af1066 | EDP49542 | Gliotoxin Biosynthesis | 15979823
 | GliF | P450_Af116 | AAW03300.1 | Gliotoxin Biosynthesis | 15979823
 | Tri1 | P450_Fg059 | AAQ02672.1 | Trichothecene Biosynthesis | 15066795, 16604118
 | Tri11 | P450_Fg035 | BAC22120.1 | Trichothecene Biosynthesis | 12650935
 | Tri4 | P450_Fg036 | AAK53584.1 | Trichothecene Biosynthesis | 11425709, 11976083
 | StcL | P450_An104 | EAA61601.1 | Sterigmatocystin Biosynthesis | 16372000
 | StcS | P450_An076 | EAA61596.1 | Sterigmatocystin Biosynthesis | 16372000
 | Tri4 | | AAB72032 | Trichothecene Biosynthesis | 7651333
 | Tri4 | | AAC49958 | Trichothecene Biosynthesis | 9529523
 | SirB | | AAS92544 | Sirodesmin Biosynthesis | 15387811
 | SirC | | AAS92547 | Sirodesmin Biosynthesis | 15387811
 | SirE | | AAS92549 | Sirodesmin Biosynthesis | 15387811
 | PaxP | | AAK11528 | Paxilline Biosynthesis | 11169115
 | PaxQ | | AAK11527 | Paxilline Biosynthesis | 11169115
 | StcF | P450_An077 | AAC49196 | Sterigmatocystin Biosynthesis | 8643646, 7486998
 | StcB | | AAC49192 | Sterigmatocystin Biosynthesis | 8643646, 7486999
 | apdB | P450_An081 | CBF80479 | Oxidation of tetramic acid | 16372000, 19146970
 | orf1 | | BAI52800 | Brassicicene C Biosynthesis | 19700326
 | CYP619C2 | P450_Acl183 | ACG60892.1 | Patulin Biosynthesis | 19383676
 | CYP619C3 | P450_Acl182 | ACG60891.1 | Patulin Biosynthesis | 19383676
 | ftmE | P450_Af1055 | BAH23999 | Fumitremorgin Biosynthesis | 19226505
 | ftmC | P450_Af1054 | BAH23996 | Fumitremorgin Biosynthesis | 19226505
 | ftmG | P450_Af1056 | BAH24001 | Fumitremorgin Biosynthesis | 19226505
 | Dit2 | P450_Caw004 | CAK54651 | coat formation | 18663031
 | RadP | | ACM42407.1 | CYP Epoxidase | 19101477
 | pikC | | AAC68886.1 | Hydroxylation of macrolactones | 9831532
 | PpoC | P450_An066 | AAT36614 | Fatty acid oxygenase | 19878096
 | PpoA | P450_An029 | AAR88626 | Fatty acid oxygenase | 14699095, 15941990, 16040966
Xenobiotic Metabolism | PDA (pisatin demethylase) | P450_Fs117 | AAC01762.1 | Pisatin demethylase activity | 8208242
 | P450alk | | CAA39367.1 | Alkane inducible P450 | 7865134
 | phacA | P450_An043 | CAB43093 | Phenylacetate 2-hydroxylase | 10329644
 | phacB | P450_An086 | ABB20530 | 3-hydroxyphenylacetate 6-hydroxylase | 17189487
 | Alk8 | P450_Caw009 | CAA75058 | Alkane assimilation | 11536334
 | ivoC | P450_An059 | CBF77085 | Phenol oxidase | McCorkindale et al., Phytochemistry, 1983
 | nicA | | BAC01275.1 | Nitric oxide reductase | 15502348
 | Bph | | P17549 | Benzoate para hydroxylase | 2250647
 | P450nor | | BAB60855 | denitrification | 1138972
 | CYP52A3-b | | AAC60531 | Alkane inducible P450 | 1368716
 | CYP52A4 | | P16141.3 | Alkane inducible P450 | 8645001
 | bzuA | P450_An048 | AAL10516.1 | Benzamide Utilization | 11848676

Table 3.3: Parameter optimization for clustering

(a) E-value

Parameter set ID | E-value, coverage, inflation factor | Large clusters (>100) (A) | Singlet clusters (B) | Total clusters (C) | % Large clusters (>100) | % Singlet clusters
1 | 1e-10, 70, 5 | 30 | 911 | 1,220 | 2 | 75
2 | 1e-20, 70, 5 | 33 | 769 | 1,230 | 3 | 63
3 | 1e-30, 70, 5 | 35 | 1,208 | 1,754 | 2 | 69
4 | 1e-40, 70, 5 | 35 | 1,628 | 2,276 | 2 | 72
5 | 1e-50, 70, 5 | 39 | 2,049 | 2,814 | 1 | 73
6 | 1e-60, 70, 5 | 38 | 2,484 | 3,332 | 1 | 75
7 | 1e-70, 70, 5 | 37 | 2,922 | 3,886 | 1 | 75
8 | 1e-80, 70, 5 | 39 | 3,351 | 4,471 | 1 | 75
9 | 1e-90, 70, 5 | 31 | 3,830 | 5,066 | 1 | 76
10 | 1e-100, 70, 5 | 21 | 4,327 | 5,727 | 0 | 76

(b) Inflation factor

Parameter set ID | Parameters | Large clusters (>100) (A) | Singlet clusters (B) | Total clusters (C) | % Large clusters (>100) | % Singlet clusters
1 | 1e-70, 60, 2 | 40 | 2,776 | 3,406 | 1 | 82
2 | 1e-70, 60, 3 | 38 | 2,782 | 3,516 | 1 | 79
3 | 1e-70, 60, 4 | 41 | 2,800 | 3,629 | 1 | 77
4 | 1e-70, 60, 5 | 38 | 2,816 | 3,744 | 1 | 75

(c) Coverage

Parameter set ID | Parameters | Large clusters (>100) (A) | Singlet clusters (B) | Total clusters (C) | % Large clusters (>100) | % Singlet clusters
1 | 1e-70, 20, 5 | 40 | 2,657 | 3,551 | 1 | 75
2 | 1e-70, 30, 5 | 38 | 2,663 | 3,554 | 1 | 75
3 | 1e-70, 40, 5 | 38 | 2,679 | 3,569 | 1 | 75
4 | 1e-70, 50, 5 | 38 | 2,720 | 3,621 | 1 | 75
5 | 1e-70, 60, 5 | 38 | 2,816 | 3,744 | 1 | 75
6 | 1e-70, 70, 5 | 37 | 2,922 | 3,886 | 1 | 75
7 | 1e-70, 80, 5 | 38 | 3,064 | 4,073 | 1 | 75
8 | 1e-70, 90, 5 | 38 | 3,398 | 4,515 | 1 | 75
9 | 1e-70, 100, 5 | 1 | 9,058 | 12,406 | 0 | 73
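The two percentage columns in Table 3.3 appear to be derived from columns A, B, and C as 100·A/C and 100·B/C, rounded to the nearest integer. The short Python check below rests on that reading (the rounding rule is an assumption, and the function name is hypothetical); it reproduces three rows of the table.

```python
def cluster_percentages(large: int, singlet: int, total: int):
    """Return (% large clusters, % singlet clusters), rounded to whole percent."""
    return round(100 * large / total), round(100 * singlet / total)


# (A, B, C) values copied from Table 3.3.
row_a1 = cluster_percentages(30, 911, 1220)     # set 1 of (a): 1e-10, 70, 5
row_b1 = cluster_percentages(40, 2776, 3406)    # set 1 of (b): 1e-70, 60, 2
row_c9 = cluster_percentages(1, 9058, 12406)    # set 9 of (c): 1e-70, 100, 5
```

Under this reading, the three rows evaluate to (2, 75), (1, 82), and (0, 73), matching the tabulated percentages.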

! 72 !

Table 3.4: The largest 30 clusters that contain only fungal and oomycete sequences. (T indicates the total number of CYPs belonging to that cluster.)

Cluster ID | CYP family | T | Dominant subphylum | Functional category
3 | CYP502, CYP5037, CYP5065, CYP5144, CYP5145, CYP5146, CYP5147, CYP5149, CYP5152, CYP5158, CYP533, CYP620, CYP621, CYP64 | 1056 | Agaricomycotina | Secondary Metabolism
9 | CYP5117, CYP5118, CYP561, CYP562, CYP563, CYP564, CYP565, CYP567, CYP568, CYP60, CYP65 | 472 | Pezizomycotina | Secondary Metabolism
11 | CYP5087, CYP5113, CYP52, CYP538, CYP539, CYP584, CYP585, CYP655, CYP656 | 452 | Pezizomycotina | Xenobiotic Metabolism
14 | CYP5028, CYP5077, CYP5078, CYP5080, CYP5104, CYP531, CYP532, CYP536, CYP57, CYP631, CYP674, CYP675, CYP676 | 321 | Pezizomycotina | Pisatin Demethylase
15 | CYP5094, CYP5095, CYP5096, CYP5105, CYP542, CYP551, CYP552, CYP579, CYP58, CYP580, CYP680, CYP681, CYP682 | 320 | Pezizomycotina |
16 | CYP61 | 246 | Saccharomycotina | Primary Metabolism
18 | CYP5065, CYP5066, CYP5068, CYP5093, CYP5098, CYP5119, CYP5152, CYP530, CYP619, CYP663, CYP665 | 217 | Pezizomycotina | Secondary Metabolism
19 | CYP5061, CYP5067, CYP5073, CYP5074, CYP595, CYP596, CYP597, CYP598, CYP622, CYP650, CYP651, CYP652, CYP68 | 211 | Pezizomycotina | Secondary Metabolism
20 | CYP5014 | 186 | Pezizomycotina | Primary Metabolism
24 | CYP507, CYP527, CYP535, CYP570 | 169 | Pezizomycotina |
25 | CYP5150, CYP5155 | 166 | Agaricomycotina |
26 | CYP548, CYP5114, CYP5115 | 154 | Pezizomycotina |
28 | CYP5070, CYP547, CYP581, CYP582, CYP617 | 148 | Pezizomycotina |
29 | CYP504 | 147 | Pezizomycotina | Xenobiotic Metabolism
30 | CYP503, CYP5086, CYP5103, CYP54, CYP560, CYP599, CYP601, CYP602, CYP604, CYP653, CYP654 | 146 | Pezizomycotina | Secondary Metabolism
31 | CYP512 | 144 | Agaricomycotina |
33 | CYP5099, CYP56, CYP661 | 141 | Saccharomycotina | Primary Metabolism
34 | CYP5029, CYP5076, CYP628, CYP670, CYP671 | 133 | Pezizomycotina | Secondary Metabolism
37 | CYP53 | 121 | Pezizomycotina | Xenobiotic Metabolism
38 | CYP5139, CYP5151, CYP5034, CYP5033, CYP5138 | 97 | Agaricomycotina |
39 | CYP63 | 96 | Agaricomycotina |
40 | CYP578 | 94 | Pezizomycotina |
42 | CYP5141 | 93 | Agaricomycotina |
43 | CYP5137, CYP5136 | 93 | Agaricomycotina |
46 | CYP540 | 78 | Pezizomycotina |
47 | CYP62, CYP684, CYP626 | 74 | Pezizomycotina | Secondary Metabolism
49 | CYP5014, CYP5015 | 72 | Oomycota* | Primary Metabolism
51 | CYP526, CYP644, CYP591 | 69 | Pezizomycotina |
56 | CYP5035, CYP5036 | 64 | Agaricomycotina | Xenobiotic Metabolism
57 | CYP59, CYP587, CYP662 | 63 | Pezizomycotina |

Table 3.5: Blast hits to characterized CYPs.

Functional Category | Principal function | P450 Genes | CYP families | # of hits to characterized CYPs
Primary Metabolism | Ergosterol Biosynthesis | Erg11 | CYP51 | 259
 | Ergosterol Biosynthesis | 14-alpha demethylase | CYP51 | 257
 | Ergosterol Biosynthesis | CYP51a | CYP51 | 259
 | Ergosterol Biosynthesis | CYP51b | CYP51 | 263
 | Ergosterol Biosynthesis | Erg11 | CYP51 | 258
 | Ergosterol Biosynthesis | CYP51Ap | CYP51 | 260
 | Ergosterol Biosynthesis | CYP51Bp | CYP51 | 119
 | Ergosterol Biosynthesis | CYP51 | CYP51 | 258
 | Pattern Formation | Eln2 | CYP5037, CYP502 | 63
 | Sphingolipid Biosynthesis | ahbB | CYP657 | 12
 | Involved in spore coat formation | Dit2 | CYP56 | 105
 | Ergosterol Biosynthesis | Erg5 | CYP61 | 227
 | Fatty Acid Oxygenase | PpoC | CYP5014 | 173
 | Fatty Acid Oxygenase | PpoA | CYP5014 | 174
Secondary Metabolism | Aflatoxin Biosynthesis | CypA | | 0
 | Aflatrem Biosynthesis | AF115 | CYP660 | 27
 | Gliotoxin Biosynthesis | GliC | CYP613 | 5
 | Gliotoxin Biosynthesis | GliF | CYP5085 | 6
 | Trichothecene Biosynthesis | Tri1 | CYP68 | 22
 | Trichothecene Biosynthesis | Tri11 | CYP65 | 52
 | Trichothecene Biosynthesis | Tri4 | CYP579, CYP58 | 31
 | Trichothecene Biosynthesis | Tri4 | CYP579, CYP58 | 34
 | Trichothecene Biosynthesis | Tri4 | CYP579, CYP58 | 22
 | Sirodesmin Biosynthesis | SirB | CYP5093 | 7
 | Sirodesmin Biosynthesis | SirC | | 1
 | Sirodesmin Biosynthesis | SirE | | 0
 | Paxilline Biosynthesis | PaxP | CYP653, CYP503 | 16
 | Paxilline Biosynthesis | PaxQ | CYP698 | 3
 | Sterigmatocystin Biosynthesis | StcF | CYP60, CYP65 | 5
 | Sterigmatocystin Biosynthesis | StcB | CYP62 | 1
 | Brassicicene C Biosynthesis | orf1 | CYP551 | 23
 | Patulin Biosynthesis | CYP619C2 | CYP619 | 24
 | Patulin Biosynthesis | CYP619C3 | CYP619 | 23
 | Fumitremorgin Biosynthesis | ftmE | CYP5066 | 4
 | Fumitremorgin Biosynthesis | ftmC | CYP5076, CYP628 | 33
 | Fumitremorgin Biosynthesis | ftmG | CYP5067 | 4
 | CYP Epoxidase | RadP | CYP65 | 10
Xenobiotic Metabolism | Pisatin demethylase activity | PDA | CYP5080, CYP57 | 7
 | Benzamide Utilization | bzuA | CYP53 | 66
 | Alkane inducible P450 | P450alk | CYP52 | 82
 | Phenylacetate 2-hydroxylase | phacA | CYP504 | 91
 | 3-hydroxyphenylacetate 6-hydroxylase | phacB | CYP504 | 94
 | Alkane inducible P450 | Alk8 | CYP52 | 76
 | Phenol oxidase | ivoC | CYP682 | 9
 | Nitric oxide reductase | nicA | CYP55 | 29
 | Benzoate para hydroxylase | Bph | CYP53 | 87
 | P450 involved in denitrification | P450nor | CYP55 | 27
 | Alkane inducible P450 | CYP52A3-b | CYP52, CYP584 | 92
 | Alkane inducible P450 | CYP52A4 | CYP52, CYP584 | 82

Table 3.6: Clans involved in primary, secondary and xenobiotic metabolism Primary Metabolism Secondary Metabolism Xenobiotic Metabolism CYP51: CYP51 CYP54: CYP54, CYP503, CYP52: CYP52, CYP538, CYP560, CYP599, CYP602, CYP539, CYP584, CYP585, CYP604, CYP649, CYP5085, CYP655, CYP5015, CYP5087, CYP5086, CYP5103, CYP601 CYP5113, CYP656 CYP61: CYP61 CYP58: CYP58, CYP542, CYP53: CYP53 CYP551, CYP552, CYP681, CYP682, CYP580, CYP680, CYP579, CYP5094, CYP5095, CYP5096, CYP5105, CYP5112, CYP5142 CYP56: CYP56, CYP59: CYP59, CYP586, CYP504: CYP504 CYP661, CYP5099 CYP587, CYP662 CYP540: CYP540 CYP68: CYP68, CYP595, CYP505: CYP505, CYP541 CYP596, CYP622, CYP650, CYP651, CYP652, CYP597, CYP598, CYP5061, CYP5067, CYP5073, CYP5074 CYP657: CYP657, CYP526: CYP526, CYP591, CYP531: CYP531, CYP631, CYP641 CYP638, CYP639, CYP644 CYP532, CYP57, CYP536, CYP629, CYP674, CYP675, CYP676, CYP5028, CYP5077, CYP5078, CYP5080, CYP5104 CYP62: CYP62, CYP684, CYP55: CYP55 CYP626 CYP613: CYP613, CYP686, CYP537: CYP537, CYP577 CYP685, CYP5082 CYP547: CYP547, CYP581, CYP548: CYP548, CYP5114, CYP582, CYP616, CYP617, CYP5115 CYP618, CYP5070 CYP550: CYP550, CYP553, CYP630: CYP630 CYP633, CYP634, CYP635, CYP636, CYP660, CYP611, CYP610, CYP612 CYP578: CYP578, CYP625 CYP507: CYP507, CYP527, CYP535, CYP570 CYP653: CYP653, CYP654 CYP530:CYP530, CYP5027, CYP5065, CYP5066, CYP5068,

CYP5144, CYP5145, CYP5147, CYP5148, CYP619 CYP65 : CYP65, CYP561, CYP533: CYP533, CYP5014, CYP562, CYP563, CYP564, CYP620, CYP621 CYP565, CYP567, CYP568, CYP5117, CYP5118, CYP60 CYP574: CYP574, CYP5029, CYP63: CYP63 CYP5076, CYP628, CYP669, CYP670, CYP671 CYP603: CYP603 CYP605: CYP605 CYP698 CYP5093

Table 3.7: Top ten CYP families in fungi

CYP Family | CYPs
CYP51 | 270
CYP65 | 250
CYP61 | 236
CYP5144 | 206
CYP5150 | 149
CYP52 | 132
CYP504 | 128
CYP620 | 126
CYP505 | 115
CYP584 | 114

! 77 !

3.6 Figures

Figure 3.5: CYP families follow a power-law distribution

Figure 3.6: Pipeline employed in FCPD 1.2 version.

Figure 3.7: Phylogenetic tree of CYP65 in Pezizomycotina.

Figure 3.8: Phylogenetic tree of CYP68

Figure 3.9: Neighbor-joining tree of CYP51 http://www.biomedcentral.com/content/supplementary/1471-2164-13-525-s10.pdf

Figure 3.10: Neighbor-joining tree of CYP61 http://www.biomedcentral.com/content/supplementary/1471-2164-13-525-s11.pdf

Figure 3.11: Neighbor-joining tree of CYP65 http://www.biomedcentral.com/content/supplementary/1471-2164-13-525-s13.pdf

Figure 3.12: Neighbor-joining tree of CYP505-CYP541 http://www.biomedcentral.com/content/supplementary/1471-2164-13-525-s15.pdf

Figure 3.13: Maximum-likelihood tree of CYP52 http://www.biomedcentral.com/content/supplementary/1471-2164-13-525-s16.pdf

Figure 3.14: Maximum-likelihood tree of CYP504 http://www.biomedcentral.com/content/supplementary/1471-2164-13-525-s17.pdf

Figure 3.15: Phylogenetic tree of CYP540 http://www.biomedcentral.com/content/supplementary/1471-2164-13-525-s18.pdf

Figure 3.16: Phylogenetic tree of CYP544 http://www.biomedcentral.com/content/supplementary/1471-2164-13-525-s19.pdf

Figure 3.17: Phylogenetic tree of CYP5025 http://www.biomedcentral.com/content/supplementary/1471-2164-13-525-s20.pdf

Figure 3.18: Phylogenetic tree of CYP645 http://www.biomedcentral.com/content/supplementary/1471-2164-13-525-s21.pdf

Chapter 4

Phylogenomic investigation of Cytochrome P450 proteins from 51 Pezizomycotina species

The subphylum Pezizomycotina displays a vast diversity of the ecological niches and biochemical processes observed across fungal subphyla. Changes in members of the cytochrome P450 (CYP) superfamily appear to have played key roles in fungal niche adaptation and evolution. The availability of genomic data from many species in this subphylum has enabled comprehensive phylogenomic studies to understand the taxon-specific genetic changes that potentially underpin the observed functional and ecological diversity. A total of 6,108 CYPs from 51 Pezizomycotina species were analyzed to study the patterns of gene birth and death. This analysis revealed niche- and class-specific CYP family expansions and contractions. Putative metabolic functions were assigned to individual CYPs in each species based on sequence similarity to functionally characterized CYP proteins. Also, pathogenic Pezizomycotina fungi were divided into nine lifestyle classes (Hemibiotrophs, Biotrophs, Saprotrophs, Endophytes, Necrotrophs, Non-pathogenic, Human-pathogenic, Plant-pathogenic, and Mycopathogenic) to identify CYP family expansions and innovations potentially associated with these classes. Rapid gains and losses of CYP clans were observed among hemibiotrophs and necrotrophs, while large losses were observed among saprophytic species. Functional innovation in the form of species-specific CYP families was also observed. Examination of the classes/divisions within Pezizomycotina suggested a number of independent losses and gains of CYP families.

4.1 Background

Cytochrome P450 proteins (CYPs) constitute a protein family involved in several activities vital to fungi. They are key components of the ergosterol metabolism pathway, which defines fungal cell membranes [61, 62], and they participate in the synthesis of the spore coat in Saccharomycotina fungi [147], the biosynthesis of secondary metabolites [1, 93, 148], defense against xenobiotic compounds [59], and the degradation of insect cuticle [149]; some also act as virulence factors [63, 66, 101]. These activities cover almost all the important defense and virulence mechanisms adopted by fungi to colonize various host organisms. This diversity of activities is matched by diversity in CYP sequences; consequently, fungi have the largest number of CYP families among all groups of organisms, followed by plants and animals. Fungi also have the largest number of CYP sequences in proportion to genome size.

In order to organize the diverse CYP sequence data, Nebert and Nelson introduced a sequence identity based nomenclature system in 1993 [72]. According to this nomenclature, CYP sequences with more than 40% identity are grouped into a CYP family, and those with more than 55% identity are grouped into a subfamily. Subfamilies are designated by letters and the individual members of a subfamily by numbers. Later, as the number of CYP families and subfamilies grew with increasing sequence data, a higher-order clan classification was introduced [73]. CYP clans are defined as groups of CYPs that cluster together consistently on phylogenetic trees (http://drnelson.uthsc.edu/protocols.chapter.2006.pdf). Currently, there are 459 CYP families and 117 clans in fungi [150]. CYPs have also been classified into three functional categories based on their role in cellular metabolism, namely primary, secondary and xenobiotic metabolism[99]. Primary metabolism CYPs are involved in some of the housekeeping functions of the fungal cell, e.g. CYP51 and CYP61 family CYPs that are involved in the ergosterol metabolism pathway[61, 62]. Secondary metabolism CYPs have been found to be part of secondary metabolism clusters in fungi, e.g. CYP65. Xenobiotic metabolism CYPs degrade environmental and secreted xenobiotics.
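As a minimal sketch of the identity thresholds in the nomenclature described above, the following Python function maps a pairwise percent identity to the level at which two CYPs would be grouped. The function name and the returned labels are illustrative assumptions; actual family and subfamily assignments are curated and do not rest on a single pairwise comparison.

```python
def shared_rank(percent_identity: float) -> str:
    """Return the nomenclature level shared by two CYPs (illustrative only)."""
    if not 0.0 <= percent_identity <= 100.0:
        raise ValueError("percent identity must lie between 0 and 100")
    if percent_identity > 55.0:
        # More than 55% identity: same subfamily (and therefore same family).
        return "subfamily"
    if percent_identity > 40.0:
        # More than 40% identity: same family, different subfamilies.
        return "family"
    return "different families"


# Hypothetical identities, chosen only to exercise each branch.
for identity in (35.0, 47.5, 62.0):
    print(identity, shared_rank(identity))
```

The comparisons are strict because the text specifies "more than" 40% and 55%; whether the boundary values themselves qualify is not stated in the source.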

Recently, we released a systematic and comprehensive classification of CYPs from more than 200 species of fungi, organized it into a pre-existing database called the Fungal Cytochrome P450 Database (FCPD)[76], and published the updated version as FCPD 1.2 [150]. We described the data in the previous paper; here we use CYP data from selected sequenced fungi belonging to the subphylum Pezizomycotina (from the database) to understand the evolution and the niche-specific patterns of the CYPs.

The subphylum Pezizomycotina arguably contains the most representative diversity among fungi. There are human pathogens (Aspergillus fumigatus, dermatophytes such as Microsporum canis, Blastomyces dermatitidis, Coccidioides immitis), plant pathogens (Magnaporthe oryzae, Sclerotinia sclerotiorum, Stagonospora nodorum), as well as endophytes (Fusarium oxysporum, Chaetomium globosum) and mycoparasitic fungi (Trichoderma species), thus covering a large range of fungal habitats. It is therefore an ideal dataset to investigate the possible correlation between the diversity of niches and innovation in CYP families.

In this paper we have analyzed 6108 CYPs from 51 sequenced genomes of fungi from the subphylum Pezizomycotina. We found wide variation in CYPs among taxa, and patterns specific to diverse ecologies. The focus of this paper was to perform detailed phylogenomic analyses to identify patterns of gene birth and death, and to look for correlations between CYP repertoire and fungal lifestyles and evolution.


4.2 Results

4.2.1 CYP clan/family diversity in Pezizomycotina

For the analysis we took 6108 CYP sequences from 51 published genomes belonging to the subphylum Pezizomycotina. The CYP clustering utilized in the FCPD 1.2 was used for analyses of CYP clusters[150]. The 6108 CYP sequences were distributed into a total of 609 clusters; among these, CYPs from the Sordariomycetes fell into 298 clusters, while those from the Eurotiomycetes and the Dothideomycetes fell into 274 and 143 clusters respectively (Supplementary Figure 4.1). The number of clusters belonging to Sordariomycetes was highest because of the larger number of genomes considered from this class. We found a total of 61 clusters that were common to all three classes; among them were 30 of the largest clusters from the result. There were also clusters specific to a single class: 187 clusters were specific to Sordariomycetes, 169 to Eurotiomycetes and 58 to Dothideomycetes. There were 96 clusters common to Sordariomycetes and Eurotiomycetes, 76 clusters common to Sordariomycetes and Dothideomycetes, and 70 clusters common to Dothideomycetes and Eurotiomycetes. Among Eurotiomycetes, there were 114 clusters specific to Eurotiales and 50 clusters specific to Onygenales. About 65 clusters were present in Onygenales but not in Eurotiales, and about 159 clusters were present in Eurotiales but not in Onygenales. These numbers indicate the extent of diversity and genomic activity in the order Eurotiales.
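The class-level cluster counts above can be cross-checked with a short inclusion-exclusion calculation. The sketch below assumes that each pairwise count (96, 76 and 70) includes the 61 clusters shared by all three classes; that reading and the variable names are assumptions, not statements from the source.

```python
# Cluster counts reported in the text for the three main classes.
shared_all = 61
specific = {"Sordariomycetes": 187, "Eurotiomycetes": 169, "Dothideomycetes": 58}
pairwise = {
    frozenset({"Sordariomycetes", "Eurotiomycetes"}): 96,
    frozenset({"Sordariomycetes", "Dothideomycetes"}): 76,
    frozenset({"Dothideomycetes", "Eurotiomycetes"}): 70,
}


def class_total(name: str) -> int:
    """Rebuild the number of clusters that contain CYPs from one class."""
    # Assumption: pairwise counts include the clusters common to all three
    # classes, so those are subtracted to get "exactly two classes" counts.
    two_way_only = sum(n - shared_all for pair, n in pairwise.items() if name in pair)
    return specific[name] + two_way_only + shared_all


totals = {name: class_total(name) for name in specific}
union = (
    sum(specific.values())
    + sum(n - shared_all for n in pairwise.values())
    + shared_all
)
print(totals)
print(union)
```

Under that assumption the rebuilt totals are 298, 274 and 143, matching the per-class counts reported in the text, and the three classes together account for 534 of the 609 clusters.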

CYP clan and family assignments from the FCPD 1.2 were analyzed for CYPs from Pezizomycotina. At the class/order level, we observed that on average Onygenales encoded smaller CYPomes (the collection of CYPs from a species) whereas the Eurotiales had the largest CYPomes among Pezizomycotina classes/orders. At the species level, the Sordariomycete pathogen Colletotrichum higginsianum encoded the largest CYPome at 230 CYPs, and the Eurotiomycete saprophyte Uncinocarpus reesii encoded the smallest CYPome at 41 CYPs. The clan/family distribution followed a similar pattern, with Onygenales having the lowest number of CYP clans and families and Eurotiales the highest. However, at the class level Dothideomycetes and Sordariomycetes had the largest number of families and clans respectively (Supplementary Table 4.1).

Fusarium and Aspergillus species contained the largest number of CYP families among the Pezizomycotina, possibly because of the higher number of CYPs encoded by the genomes of these two genera. Colletotrichum higginsianum contained the largest number of CYP families at 102; it also encodes the highest number of CYPs among Pezizomycotina. Stagonospora nodorum was the other Pezizomycotina fungus among the top 10 species by number of CYP families. On the other hand, six of the ten species with the fewest CYP families belonged to the order Onygenales, and the lowest number of CYP families was found in Paracoccidioides brasiliensis (Supplementary Table 4.1).

Because the CYP family and clan distribution mostly correlated with the total number of CYPs in Pezizomycotina genomes, we could not discern any patterns based on the numbers themselves. To address this, we defined a ratio as a proxy for the functional diversity of the CYPs in a species, calculated by dividing the number of clans in each species by the number of families present in that species. The higher the ratio, the higher the functional diversity; conversely, a small ratio signifies low functional diversity (Figure 4.1).
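The ratio described above is simple enough to state as code. The following sketch computes it and ranks species by it; the function name is an assumption. The counts for C. higginsianum (102 families, 44 clans) are taken from the text, reading the second figure as a clan count, while `species_A` is a hypothetical species added only for contrast.

```python
def functional_diversity(n_clans: int, n_families: int) -> float:
    """Ratio of CYP clans to CYP families in one species."""
    if n_families <= 0:
        raise ValueError("a species needs at least one CYP family")
    return n_clans / n_families


# (clans, families) per species; species_A is hypothetical.
counts = {"species_A": (40, 50), "C. higginsianum": (44, 102)}
ratios = {
    name: round(functional_diversity(clans, families), 3)
    for name, (clans, families) in counts.items()
}
# Highest ratio first, i.e. highest functional diversity first.
ranked = sorted(ratios, key=ratios.get, reverse=True)
print(ratios)
print(ranked)
```

A species whose many families collapse into few clans gets a low ratio, which is the sense in which the text describes low functional diversity.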

Among the top ten species with high CYP functional diversity, Neurospora tetrasperma had the highest diversity, while the other Neurospora species, N. discreta and N. crassa, encoded the second and third most diverse CYP repertoires. The only Onygenales species in this group was Paracoccidioides brasiliensis, while Aspergillus fumigatus and Penicillium chrysogenum were the only representatives from Eurotiales. The top ten also included the thermophilic species Sporotrichum thermophile and Thielavia terrestris. On the other hand, Mycosphaerella fijiensis displayed the lowest CYP functional diversity among the Pezizomycotina fungi. With the exception of Fusarium graminearum, all the Fusarium species displayed low functional diversity. The second lowest diversity was seen in Colletotrichum higginsianum, whose 102 CYP families are divided into only 44 CYP clans; the other Colletotrichum species, C. graminicola, had the eighth lowest diversity. Almost all the Dothideomycete fungi showed low functional diversity.

4.2.2 Metabolic distribution of CYPs

In order to identify the putative metabolic CYP repertoire of every species, we compared our CYP data with the putative metabolic classification of CYPs in the FCPD 1.2. We grouped CYPs from each of the species into three functional categories (primary, secondary and xenobiotic metabolism) to generate putative CYP metabolic profiles for each species (Figure 4.1). Genomes that are known for their secondary metabolism repertoire, such as M. oryzae[151], A. oryzae[152] and Colletotrichum species[153], encode large numbers of secondary metabolism CYPs. Fusarium encoded a relatively higher number of xenobiotic CYPs compared to secondary and primary metabolism CYPs. Among the Onygenales, the Coccidioides species encoded a high number of CYPs involved in primary metabolism and had severely reduced numbers of CYPs in other metabolic categories. On the other hand, Trichophyton, Microsporum and Histoplasma fungi have a much larger number of CYPs involved in secondary metabolism. Among the three Trichoderma species, Trichoderma virens has a high number of CYPs involved in secondary metabolism as well as a high number of CYPs overall, whereas T. atroviride and T. reesei have more xenobiotic CYPs. Cochliobolus heterostrophus has the highest number of xenobiotic CYPs among Dothideomycetes. Alternaria brassicicola, Leptosphaeria maculans, Stagonospora nodorum and Mycosphaerella fijiensis all have comparatively more CYPs involved in secondary metabolism among the seven species in the subgroup.

4.2.3 CYP clan gains and losses in Pezizomycotina

In order to identify the loss and gain of various CYP functions, we calculated CYP clan gains and losses using the tool CAFE[154]. The average expansions and contractions of CYP clans were calculated over Pezizomycotina classes/orders (Table 4.1). The average number of expansions was highest in Dothideomycetes and lowest in Eurotiomycetes. Among the top ten species with maximum CYP clan expansion, five were Sordariomycetes, three were Dothideomycetes, and one each came from Eurotiales and Onygenales (Supplementary Table 4.2). On the other hand, no expansions were observed in 11 species, of which again five were Sordariomycetes and three were Dothideomycetes. The average loss of CYP clans was greatest in Sordariomycetes and lowest in Eurotiomycetes. The top ten species with maximum CYP clan loss included five Eurotiales, while there was no loss among three Sordariomycete species. There was also wide variation in the number of CYP clans retained among the Pezizomycotina species. Among Eurotiomycetes orders, Onygenales on average have retained the largest number of CYP clans since the last common ancestor (LCA), while Eurotiales have undergone considerable reduction since the LCA. The Sordariomycete species have retained the fewest CYP clans compared to other classes. In order to identify the core number of clans present in the last common ancestor of Pezizomycotina, we performed a simple parsimony analysis. We identified the clans that were present in all the species under a monophyletic branch to find a core set of clans common to all the species in that clade; we then identified those clans that were present in all of the Pezizomycotina except for the species in a particular clade, and these were designated as clans lost from that clade (Figure 4.2). We found four clans that were common to all Pezizomycotina fungi, namely CYP51, CYP61, CYP52 and CYP58.
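The presence/absence logic of the parsimony analysis described above can be sketched with plain set operations. The function names and the toy presence table are hypothetical; only the two definitions (core clans present in every species of a clade, lost clans present in every species outside the clade and in none inside it) follow the text.

```python
def shared_by_all(clan_sets):
    """Clans found in every one of the given sets (empty input gives an empty set)."""
    clan_sets = list(clan_sets)
    return set.intersection(*clan_sets) if clan_sets else set()


def core_clans(presence, clade):
    """Clans present in every species of the clade."""
    return shared_by_all(presence[sp] for sp in clade)


def lost_clans(presence, clade):
    """Clans present in all species outside the clade but in none inside it."""
    inside = set(clade)
    outside = [sp for sp in presence if sp not in inside]
    present_inside = set().union(*(presence[sp] for sp in clade))
    return shared_by_all(presence[sp] for sp in outside) - present_inside


# Hypothetical presence table: species -> set of CYP clans it encodes.
presence = {
    "sp1": {"CYP51", "CYP61", "CYP52", "CYP58", "CYP65"},
    "sp2": {"CYP51", "CYP61", "CYP52", "CYP58", "CYP65"},
    "sp3": {"CYP51", "CYP61", "CYP52", "CYP58"},
    "sp4": {"CYP51", "CYP61", "CYP52", "CYP58", "CYP53"},
}
clade = ["sp3", "sp4"]
core = core_clans(presence, clade)
lost = lost_clans(presence, clade)
print(sorted(core), sorted(lost))
```

Applying `core_clans` to all species at once gives the set common to the whole dataset, which is how the four universal clans named in the text would be recovered from real presence data.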

Table 4.1: Average gain/loss in every class/order in Pezizomycotina

Class/order (number of species)      Average Expansion   Average Remain   Average Decrease
Sordariomycetes (20)                 2.72                88.54            4.72
Dothideomycetes (6)                  4.66                76.41            14.91
Leotiomycetes (2)                    6.25                80.1             9.65
Eurotiomycetes: Onygenales (11)      5                   70.5             20.5
Eurotiomycetes: Eurotiales (12)      3                   90               3

4.2.4 CYPs under selection pressure

Rapid gains and losses are often accompanied by high selection pressure, which is also seen in regions that are being selected for. CYPs are among the prime families that have been part of horizontal gene transfer events as well as gene-for-gene evolutionary warfare between hosts and fungi. In order to capture such events, we examined the dN/dS values across CYP clans. We aligned the CYP sequences belonging to various clans and used the Site-wise Likelihood Ratio (SLR) tool to calculate the dN/dS ratio for every clan. We selected all clans that had CYPs in at least 10 out of the 51 species in our dataset; this resulted in the consideration of a total of 30 CYP clans (Supplementary Table 4.4). We found five CYP clans that seemed to be under positive selection pressure, namely CYP548, CYP547, CYP657, CYP677 and CYP5063. There were 31 clans that had a dN/dS value less than 1 and five clans that had a dN/dS value of 0, indicating purifying and neutral selection respectively. There thus seems to be general purifying selection on most CYP families, indicating pressure to maintain sequence function. The overarching mechanism thus appears to be duplication and diversification followed by purifying selection.
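The clan selection step described above (keeping clans with CYPs in at least 10 of the 51 species) can be sketched as a small filter. The record layout, the function name and the clan label `CYP999` are hypothetical; only the threshold of 10 species comes from the text.

```python
from collections import defaultdict

MIN_SPECIES = 10  # threshold stated in the text


def clans_for_selection_analysis(records, min_species=MIN_SPECIES):
    """Return clans represented in at least `min_species` distinct species.

    `records` is an iterable of (clan, species) pairs, one pair per CYP.
    """
    species_by_clan = defaultdict(set)
    for clan, species in records:
        species_by_clan[clan].add(species)
    return sorted(
        clan for clan, species in species_by_clan.items()
        if len(species) >= min_species
    )


# Hypothetical records: CYP51 occurs in 51 species, CYP999 in only 9.
records = [("CYP51", f"species_{i}") for i in range(51)]
records.append(("CYP51", "species_0"))  # a second copy in one species counts once
records += [("CYP999", f"species_{i}") for i in range(9)]

selected = clans_for_selection_analysis(records)
print(selected)
```

Counting distinct species rather than CYPs keeps a clan with many paralogs in a few genomes from passing the filter.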

4.2.5 CYP duplication in Pezizomycotina

One of the routes to functional divergence of genes is gene duplication followed by subfunctionalization or neofunctionalization of the duplicated gene copy [155]. Thus, the presence of duplicated genes is suggestive of an evolutionary event (sub- or neofunctionalization) that is ready to occur. In order to capture such events, we looked at the total number of duplicated CYPs present in the species in our dataset (Supplementary Table 4.5). We also calculated the ratio of these duplicated CYPs to the total number of CYPs present in every species (Figure 4.2). The largest number of duplicated CYPs was seen in the genomes of Histoplasma capsulatum, F. oxysporum and F. solani. Most species in the Onygenales had no duplicated CYPs; Neurospora species also had no duplicated CYPs, owing to the active repeat-induced point mutation (RIP) mechanism that prevents accumulation of duplicated sequences. On average, the Eurotiales fungi had the maximum number of duplicated CYPs.


4.2.6 Taxa and species-specific CYPs

The clustering of CYPs from the Pezizomycotina species generated several singlet clusters, each containing a single CYP from one species. In most cases these CYPs are placed into individual clusters because they are highly divergent. We called these CYPs species-specific (Supplementary Table 4.5); similarly, we also identified class- and order-specific CYPs for our analysis. Colletotrichum higginsianum encoded the highest number of species-specific CYPs (31 CYPs), followed by Fusarium oxysporum (22 CYPs), Sclerotinia sclerotiorum (21 CYPs) and Stagonospora nodorum (19 CYPs). Most Onygenales encoded no species-specific CYPs in their CYP repertoire. We calculated the ratio of species-specific CYPs to the total CYPome in each Pezizomycotina species to understand their relative abundance (Figure 4.2). On comparing the ratio, S. sclerotiorum had the highest abundance of species-specific CYPs in its CYPome, followed by the other three species mentioned above.
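The identification of species-specific CYPs and their share of each CYPome, as described above, reduces to counting single-member clusters per species. The sketch below uses a hypothetical cluster list and function name; it assumes every CYP of a species appears in exactly one cluster, so that CYPome size can be read off the clustering itself.

```python
from collections import Counter


def species_specific_summary(clusters):
    """Map each species to (number of singlet CYPs, fraction of its CYPome).

    `clusters` is a list of clusters; each cluster is a list of
    (species, cyp_id) pairs. A singlet cluster has exactly one member.
    """
    cypome_size = Counter(sp for cluster in clusters for sp, _ in cluster)
    singlets = Counter(cluster[0][0] for cluster in clusters if len(cluster) == 1)
    return {sp: (singlets[sp], singlets[sp] / cypome_size[sp]) for sp in cypome_size}


# Hypothetical clustering of eight CYPs from two species.
clusters = [
    [("sp_A", "a1"), ("sp_B", "b1")],
    [("sp_A", "a2")],
    [("sp_B", "b2")],
    [("sp_B", "b3")],
    [("sp_A", "a3"), ("sp_A", "a4"), ("sp_B", "b4")],
]
summary = species_specific_summary(clusters)
print(summary)
```

Ranking by the count and ranking by the fraction can give different orders, which is the effect the text reports when S. sclerotiorum overtakes C. higginsianum on relative abundance.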

4.2.7 Secretory CYPs of Pezizomycotina

Arguably the best-known example of a secreted CYP that is also a pathogenicity determinant is the pisatin demethylase secreted by the fungus Fusarium solani [59]. Apart from this example, very little is known about the possibility of CYPs being secreted outside the cellular environment, although other studies have pointed to this possibility [26, 47]. We wanted to identify whether some of the CYPs in our data could be secreted outside the cell. We used the SignalP program to identify the probable secretory CYPs for every Pezizomycotina fungus (Supplementary Table 4.5). Though the presence of a signal peptide merely suggests CYPs located on cellular membranes, some of them might be secreted, as has been suggested before. However, experimental evidence will be required to prove that these CYPs are secreted. We found species- and taxon-specific patterns in the presence of CYPs with a SignalP-predicted signal peptide. For instance, Verticillium species have been found to produce the highest number of secretory proteins among Sordariomycetes, and they also appear in the list of species with the highest number of secretory CYPs. Industrially used fungi such as Aspergillus niger and the closely related Aspergillus carbonarius encoded a large repertoire of putative secreted CYPs, along with C. higginsianum and F. solani. When these secretory CYPs were compared to the total number of CYPs in every species (Figure 4.2), we found the highest proportion of secretory CYPs in Thielavia terrestris, followed by A. carbonarius, A. fumigatus, Verticillium dahliae, V. albo-atrum, Sporotrichum thermophile and Mycosphaerella graminicola, in that order. T. terrestris [156], A. carbonarius, A. fumigatus and S. thermophile are thermophilic fungi, and some of them produce enzymes that have been predicted to have super-catalytic abilities with possible applications in industry; their putative secretory CYP repertoire therefore seems to be correlated with their enzymatic abilities. No taxon-wide patterns were found; however, Onygenales encoded far fewer putative secretory CYPs than the other taxa.

4.3 Discussion

4.3.1 CYP functional diversity is correlated with pathogenic lifestyles

We wanted to identify whether CYP functional diversity was correlated with the lifestyles of the fungi. Nine broad fungal lifestyles were chosen, namely saprophytic, hemibiotrophic, necrotrophic, endophytic, biotrophic, non-pathogenic, human-pathogenic, plant-pathogenic and mycopathogenic (Figure 4.1). We found that 7 out of the top 10 species with high functional diversity were saprophytic, and 6 out of the top 10 species with low functional diversity were hemibiotrophic in lifestyle. Interestingly, all the Onygenales grouped in the middle of the table, between the high and low values of functional diversity. Among individual species, Mycosphaerella fijiensis had the lowest CYP functional diversity, suggesting redundancy in CYP functions; this may be due to the overall repetitive content of the genome, which is predicted to be as high as 51%[157]. The other species with low CYP functional diversity, C. higginsianum and C. graminicola, had repeat content of 9.1% and 22.3% respectively; the former has also been found to encode far more CYPs than the latter[153]. The industrially used fungi Aspergillus niger, A. terreus and Talaromyces stipitatus were part of the low functional diversity group; the reason could be artificial selection for industrial enzymes, which may be amplified in these species. Saprophytic species such as Neurospora species, Podospora anserina, T. terrestris and Sporotrichum thermophile showed far higher CYP functional diversity. We found that on average the pathogenic species had lower CYP functional diversity than non-pathogenic fungi. This observation is possibly due to pathogenic fungi needing to accrue more CYPs involved in defense against host enzymes and in better colonization of hosts; such disproportionate expansions have also been observed in other gene families of pathogenic species[158]. On the other hand, non-pathogenic fungi seem to have higher functional diversity and consequently encode a balanced number of CYP families and clans.

4.3.2 CYP clan losses and gains are correlated with fungal lifestyles

We identified 7 biotrophs, 3 endophytes, 11 hemibiotrophs, 5 necrotrophs and 25 saprotrophs among the 51 Pezizomycotina species (Figure 4.1, Figure 4.2, Supplementary Table 4.2). We examined the CYP clan gains and losses in each of these lifestyle categories. Biotrophs retain a large part of the CYP clans and show few large clan gains or losses. The largest is the loss of 15 CYP clans in Penicillium marneffei, a majority of which are involved in secondary metabolism. Among the seven, Trichophyton equinum has retained the largest number of clans. The endophytic Trichoderma reesei and T. atroviride both show a loss of 12 CYP clans, while T. virens shows a gain of 7 clans (4 of them secondary metabolism clans) as well as an overall retention of CYP clans since the LCA. This is in agreement with the higher number of secondary metabolism genes in T. virens compared to other Pezizomycotina species[124]. Seven out of the 11 hemibiotrophic species have lost or gained, or in some cases both lost and gained, more than 10 CYP clans, suggestive of more genomic activity compared to the other pathogens. C. parasitica shows the highest CYP clan loss at 41; among the 11 lost clans that could be assigned to a metabolic category, 7 belong to the secondary metabolism category. Hemibiotrophs also show the least retention of CYP clans from the LCA: 6 of the 11 species are among the top ten with the least retention of CYP clans.

4.3.3 Preferential gains/losses of CYPs in specific metabolic categories

We used the aforementioned metabolic classification to identify CYP clan gains and losses among the metabolic categories (Supplementary Table 4.3). The Dothideomycete Mycosphaerella graminicola retained the largest number of CYP clans in the secondary metabolism category, while M. oryzae retained the fewest clans among hemibiotrophs. We found that the hemibiotrophic fungi M. oryzae and Fusarium species had gained as well as lost more secondary metabolism CYPs compared to xenobiotic or primary metabolism CYPs. There were no major patterns among the saprophytic species except for the gain of about five clans in the secondary and xenobiotic metabolism categories in Podospora anserina and P. brasiliensis, and the loss of 11 and 7 secondary metabolism clans in Aspergillus nidulans and P. chrysogenum respectively. Among the endophytic Trichoderma, the only noticeable expansion was in Trichoderma virens, which showed a gain of 5 clans involved in secondary metabolism, while T. atroviride and T. reesei lost at least 6 secondary metabolism clans. All three species retained almost the same number of CYP clans across all categories. There is a decrease in CYPs involved in secondary and xenobiotic metabolism in Coccidioides species; this reduction is also seen in other gene families in these species, owing to their lifestyle oriented around keratin-rich animals[159].

The necrotrophic Dothideomycete fungi C. heterostrophus and P. tritici-repentis showed the largest gain of clans (11) and the largest loss of clans (13) respectively. Among the 11 clans gained by the former, 5 are involved in xenobiotic metabolism, possibly suggestive of a need for xenobiotic degradation in the species. B. cinerea seems to have retained the largest number of CYP clans compared to the other necrotrophs. Saprotrophs have the fewest CYP clan gains or losses; 6 of the 25 saprophytic Pezizomycotina species in the dataset had zero clan gain or loss. Overall, we found higher loss of clans involved in secondary metabolism compared to other metabolic categories, while there was equal gain of clans in both secondary and xenobiotic metabolism.

Among biotrophs, Penicillium marneffei showed a gain of 4 clans involved in xenobiotic metabolism. The biotrophs seemed to have lost a number of CYP clans involved in secondary metabolism, with Microsporum canis having lost as many as 6 clans; this may be suggestive of host dependence for those select CYPs. Among necrotrophic fungi, Cochliobolus heterostrophus showed a gain of 5 xenobiotic CYP clans, while Stagonospora nodorum gained 4 secondary metabolism clans. The necrotrophic species retained more secondary metabolism clans compared to the other two categories. Recent evidence suggests a role for secondary metabolism genes in important necrotrophic processes[160, 161]; the retention and gain of secondary metabolism CYPs seem to support this hypothesis.

4.3.4 CYP clan evolution

The four clans common to all of Pezizomycotina were the CYP51 and CYP61 clans involved in the essential ergosterol metabolism pathway, CYP52, involved in xenobiotic (e.g. alkane) degradation [162], and the CYP58 clan, which contains CYPs that are part of several secondary metabolism clusters [163, 164]. The number of core clans varied among the different classes/orders: all Dothideomycetes species contained a core of 19 CYP clans, Sordariomycetes contained 10 clans, and Onygenales and Eurotiales contained 7 and 18 core clans respectively (Figure 4.2). The difference in the core conserved clans is indicative of the niche that may have been occupied by the ancestor of the three main classes.

The dN/dS analysis identified five CYP clans that seemed to be under positive selection pressure, namely CYP548, CYP547, CYP657, CYP677 and CYP5063. Among these five clans, two (CYP548, CYP677) seem to be involved in xenobiotic metabolism and two (CYP547, CYP5063) in secondary metabolism, while CYP657 contains CYPs carrying out primary metabolism activities. The CYP5063 clan is specific to Pezizomycotina and consists mostly of CYPs without any introns, indicating a possible need for rapid transcription. Based on the gene neighborhood, these CYPs also seem to be part of a putative secondary metabolism cluster. Interestingly, the Eurotiomycete-specific clan CYP657 mentioned above also contains CYPs under positive selection pressure. There were 31 clans that had a dN/dS value less than 1 and five clans that had a dN/dS value of 0, indicating purifying and neutral selection respectively. There thus seems to be general purifying selection on most CYP families, indicating pressure to maintain sequence function. The overarching mechanism thus appears to be duplication and diversification followed by purifying selection.

Thus, in order to find the extent of CYP duplication among Pezizomycotina species, we identified duplicated CYPs from every species (Figure 4.2). This duplication of CYPs may reflect the need to duplicate a certain function; e.g. most of the duplications seen in Eurotiales are in CYP51 and CYP61 (two copies of each) as well as in CYPs involved in secondary metabolism. CYP duplication was higher in pathogenic species than in non-pathogenic species; this phenomenon has again been observed for other gene families in fungi[158].

4.3.5 Species and Class-specific CYPs indicate niche specific diversification

The clustering of CYPs from the Pezizomycotina species generated several clusters that had CYPs from a specific class/order as well as from specific species. These species- and class-specific CYPs are described in the class-specific sections below:

Eurotiomycetes

We found 22 CYPs from Onygenales species (one copy in every species) in a single cluster; these CYPs seemed to belong to the CYP657 clan, which contains CYPs involved in sphingolipid biosynthesis. Sphingolipid biosynthesis plays a critical role in signaling and pathogenicity, especially in human pathogenic fungi. While there has been research on the role of sphingolipids in pathogens such as Candida albicans, Cryptococcus neoformans and Fusarium species, there has been no work on the involvement of sphingolipids in the pathogenicity of dermatophytes[165]. These CYPs may help in understanding the role of sphingolipids in Onygenales fungi.


Similarly, a cluster of CYPs belonging to the CYP5081 clan contained CYPs from some of the dermatophytes as well as from A. fumigatus and N. fischeri; members of this clan have been found to be highly expressed during the cuticle-degrading phase in entomopathogenic fungi. We described these CYPs earlier[150], and their separate clustering suggests a preferential presence of this clan to utilize cuticle-like substances in the host organisms.

CYPs belonging to the CYP666 clan from Aspergillus species were found in a single cluster. We found that CYPs belonging to this clan were also specifically found in the Basidiomycete fungus Volvariella volvacea, wherein they have been implicated in biodegradation[166]. The presence of this clan in different fungal phyla may be indicative of convergent evolution to metabolize xenobiotics. Similarly, CYPs belonging to the CYP672 clan, which is involved in benzoate-4-monooxygenase activity, were specifically found in Aspergillus species. We also found several other clusters containing CYPs belonging to various families/clans (CYP5089, CYP646, CYP5069, CYP5101 and CYP639) that were specific to Eurotiomycetes.

Sordariomycetes

Among Sordariomycetes we found CYPs from the CYP544 clan that were specific to the species from this class (except F. oxysporum and V. albo-atrum); one of these was a pseudoparalog from F. solani (Nectria haematococca)[140]. The clan seems to be related to the plant CYP86 clan, which is involved in fatty acid metabolism[142], and also seems to be present in oomycetes and some of the Basidiomycete species. The pattern suggests the possible retention of fatty acid metabolism from an ancestral CYP among some of the Sordariomycete fungi; the other possibility is the utilization of these fatty acids in plant pathogenicity by mimicking plant fatty acids.


CYPs from the CYP618 family belonging to the CYP547 clan were present only among some Sordariomycetes; these CYPs had homologues in Mucor circinelloides and plants. The family seems to be involved in phenolic degradation [167]; the presence of this CYP family in fungi from a basal lineage may be suggestive of specific loss in other fungi since divergence from plants. We also found CYPs present only in Fusarium species and Epichloe festucae that belong to the CYP526 clan; the CYP family has been implicated in trichothecene biosynthesis [168]. These CYPs may be indicative of secondary metabolites specific to these species.

Similarly, we found CYPs present only in Fusarium and Trichoderma species that belong to the CYP642 clan; these CYPs show more than 50% identity to CYPs from the Basidiomycete fungus Auricularia delicata, possibly indicating lateral transfer of these genes into the A. delicata genome. We also found other Sordariomycete-specific clusters containing CYPs belonging to families/clans CYP637, CYP549, CYP639 and more.

Dothideomycetes

Finally, we also found class-specific clusters belonging to Dothideomycetes. (a) The largest class-specific clusters contained CYPs belonging to the CYP641 family (CYP657 clan); we found these CYPs to be divergent members of the family that were specific to Dothideomycetes. (b) Among a number of clusters that contained CYPs specific to P. tritici-repentis were members of the CYP5116 clan; again, these CYPs represent highly divergent members of the clan.

4.3.6 Reasons to find species-specific CYPs

We identified species-specific CYPs as CYPs that are clustered into individual single-member clusters. Most Onygenales had none to very few species-specific CYPs; this could be partially due to comparatively reduced genome size as well as the possibility of limited usage of CYPs. The highest numbers of species-specific CYPs were found in Colletotrichum higginsianum, F. oxysporum and Sclerotinia sclerotiorum, in that order. However, on comparing the number of species-specific CYPs with total CYPs, the trend reverses. The presence of a high number of species-specific CYPs in Sclerotinia sclerotiorum is partially due to the phylogenetic distance of the species from the rest of the Sordariomycetes. The reason for finding such CYPs in the other two species, however, is partially the presence of lineage-specific chromosomes. For instance, in F. oxysporum, out of 22 species-specific CYPs, 7 are located on lineage-specific chromosomes and 3 are on unpositioned scaffolds. Finally, we also wanted to find the reasons for finding these species-specific CYPs in our dataset. For this we collected all the species-specific CYPs from the singlet clusters and examined them. We found that the largest share, about 34% of these CYPs, were highly divergent compared to the other CYPs; about 32% were pseudoparalogues generated for more than one reason[169]; interestingly, about 24% were generated by structural annotation errors; and about 10% were possible pseudogenes. These numbers not only suggest the various reasons for observing these curious CYPs but also show that such analyses can serve as a good way to curate genome data.

4.4 Conclusion

This is the first comprehensive phylogenomic analysis of CYPs from the subphylum Pezizomycotina. We were able to find several interesting facets of the cytochrome P450 gene content across the Pezizomycotina genomes. The putative functional and metabolic annotation of CYPs from every species indicates CYPomes that satisfy the needs of the niches occupied by the fungi in the dataset. We believe that analysis of such datasets using several databases and toolsets may be useful for curation of annotation errors generated in large genome sequencing projects. With more transcriptome and other expression data, analysis of CYP and other protein families in the context of fungal genome evolution may become more meaningful.

4.5 Materials and methods

4.5.1 Description of the computational pipeline

The computational pipeline consisted of retrieving 6,108 CYP sequences belonging to 52 Pezizomycotina species from the Fungal Cytochrome P450 Database [76, 150]. These sequences were then compared with each other using Bi-BLAST to generate a dataset of pairwise E-values. The data were then submitted for clustering using a modified version of TribeMCL [107, 150] with the following parameter values: E-value = 1e-50, inflation factor = 5 and coverage = 60%. The clustering generated 609 CYP clusters, which were then used for various comparisons and analyses.
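The thresholding step of this pipeline can be sketched as follows. This is a minimal illustration, not the modified TribeMCL used here: pairwise hits are filtered with the stated E-value and coverage cut-offs, and the surviving pairs are grouped by simple connected components as a stand-in for Markov clustering. MCL additionally uses the inflation parameter to split loosely connected groups, which this sketch does not attempt. The tuple layout and sequence names are assumptions made for the example.

```python
from collections import defaultdict


def filter_hits(hits, max_evalue=1e-50, min_coverage=0.60):
    """Keep pairs passing both thresholds.

    Each hit is (query, subject, evalue, coverage), coverage as a fraction.
    """
    return [
        (q, s)
        for q, s, evalue, coverage in hits
        if q != s and evalue <= max_evalue and coverage >= min_coverage
    ]


def connected_components(nodes, edges):
    """Group nodes linked by edges (union-find); a stand-in for MCL."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    # Largest clusters first; ties broken alphabetically for stable output.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))


# Toy input with hypothetical sequence names.
hits = [
    ("cypA", "cypB", 1e-80, 0.90),
    ("cypB", "cypC", 1e-60, 0.70),
    ("cypC", "cypD", 1e-10, 0.90),  # fails the E-value threshold
    ("cypD", "cypE", 1e-70, 0.40),  # fails the coverage threshold
]
nodes = {"cypA", "cypB", "cypC", "cypD", "cypE"}
clusters = connected_components(nodes, filter_hits(hits))
```

In this toy case cypD and cypE end up in singlet clusters, analogous to the singlet clusters from which species-specific CYPs were collected.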

4.5.2 Comparison of Pezizomycotina with characterized CYPs

The CYP families and clans categorized into the three putative metabolic categories (Primary, Secondary and Xenobiotic) in the FCPD 1.2 were used for classifying the CYP data from Pezizomycotina. These putative categories were used to annotate every CYP clan/family belonging to the 52 species.

4.5.3 Calculating dN/dS ratio

Alignments were generated for each of the top 30 CYP clans using the sequences belonging to that clan. Extremely divergent sequences had to be excluded for the sake of the alignment. The MUSCLE package in the MEGA suite was used to generate these alignments, which were then converted into codon alignments using the online tool PAL2NAL [170]. The codon alignments were submitted to the SLR [171] program (site-wise likelihood-ratio estimation) for calculation of the omega value (dN/dS).
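The conversion of a protein alignment into a codon alignment amounts to threading each coding sequence onto its aligned protein, three nucleotides per residue and three gap characters per gap. The Python sketch below illustrates that idea only; it is not PAL2NAL, it does not check that codons actually translate to the aligned residues, and the function name is hypothetical.

```python
def thread_codons(aligned_protein, cds, gap="-"):
    """Map an aligned protein sequence onto its coding sequence, codon by codon.

    aligned_protein: protein sequence with gap characters from the alignment.
    cds: unaligned coding sequence; a trailing stop codon is ignored.
    """
    n_residues = sum(1 for ch in aligned_protein if ch != gap)
    if len(cds) < 3 * n_residues:
        raise ValueError("coding sequence is shorter than the protein requires")

    codons = []
    pos = 0
    for ch in aligned_protein:
        if ch == gap:
            codons.append(gap * 3)          # one protein gap becomes three gaps
        else:
            codons.append(cds[pos:pos + 3])  # next codon for this residue
            pos += 3
    return "".join(codons)


# Toy example: two residues separated by one alignment gap.
codon_row = thread_codons("M-K", "ATGAAATAA")
```

Every row of the resulting codon alignment has three times the length of the protein alignment, so columns remain in frame for site-wise dN/dS estimation.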

4.5.4 Phylogenetic tree construction

The phylogenetic species tree was created using concatenated sequences of the RPB1 and RPB2 genes. The sequences were aligned using the ClustalW tool, and the maximum-likelihood method in MEGA [95] was used to build the tree. The gene trees were derived from the Newick-format gene trees produced by CAFE [154].
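Concatenation of the two marker genes can be illustrated with a short Python sketch. It assumes that each gene has already been aligned separately and that the per-gene alignments are then joined species by species; the text does not state whether concatenation preceded or followed alignment, so this ordering is an assumption, as are the function name and the toy sequences.

```python
def concatenate_alignments(alignments, gap="-"):
    """Join per-gene alignments into one supermatrix.

    alignments: list of dicts {species: aligned_sequence}, one dict per gene.
    Species missing from a gene are padded with gaps of that gene's width.
    """
    species = sorted(set().union(*(aln.keys() for aln in alignments)))
    parts = {sp: [] for sp in species}
    for aln in alignments:
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = widths.pop()
        for sp in species:
            parts[sp].append(aln.get(sp, gap * width))
    return {sp: "".join(chunks) for sp, chunks in parts.items()}


# Toy alignments with hypothetical species labels.
rpb1 = {"sp_A": "ATG-", "sp_B": "ATGC"}
rpb2 = {"sp_A": "GG", "sp_B": "GA", "sp_C": "G-"}
supermatrix = concatenate_alignments([rpb1, rpb2])
```

Padding missing genes with gaps keeps every row the same length, which tree-building programs generally require.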

4.5.5 CYP clan gain and loss analysis

CYP clan gain and loss were calculated using the tool CAFE. An ultrametric species tree was created from RPB1 and RPB2 using MEGA. The branch lengths were then rounded off using the tool r8s [172] with the penalized likelihood (PL) method. The species tree with rounded-off branch lengths and a table with the distribution of CYP clans in each species were submitted to CAFE to generate predictions. The CYP clans and families were identified from four data sources: the FCPD 1.2 [150], Nelson's database (http://drnelson.uthsc.edu/CytochromeP450.html), and the clans identified by Deng et al. [74] and Doddapaneni et al. [113].
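The table of clan distribution per species can be assembled from a list of per-CYP clan assignments, as sketched below in Python. The tab-delimited layout with a description column followed by a family identifier and one count column per species is an assumption about the input expected by CAFE and should be checked against the manual of the version in use; the species labels and function name are hypothetical.

```python
from collections import Counter


def clan_count_table(assignments, species_order):
    """Build a tab-delimited clan-by-species count table.

    assignments: iterable of (species, clan) pairs, one pair per CYP.
    species_order: column order, matching the tip labels of the species tree.
    """
    counts = Counter(assignments)
    clans = sorted({clan for _, clan in counts})
    # Header layout is an assumption; verify against the CAFE documentation.
    lines = ["\t".join(["Desc", "Family ID", *species_order])]
    for clan in clans:
        row = [str(counts[(sp, clan)]) for sp in species_order]  # 0 if absent
        lines.append("\t".join(["(null)", clan, *row]))
    return "\n".join(lines)


# Toy assignments with hypothetical species labels.
assignments = [
    ("Ncra", "CYP51"),
    ("Ncra", "CYP65"),
    ("Fgra", "CYP65"),
    ("Fgra", "CYP65"),
]
table = clan_count_table(assignments, ["Fgra", "Ncra"])
```

Clans absent from a species receive an explicit zero, so that losses are represented in the table rather than silently dropped.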

4.5.6 Finding secretory CYPs

Secretory CYPs are defined here as CYPs in which a signal peptide domain was found. The CYP sequences were collected as favorites in the FCPD 1.2 and checked for the presence of a signal peptide domain using the SignalP program in the CFGP.

Supplementary Figures

Supplementary figure 4.1: Venn Diagram of CYP cluster distribution

Supplementary Tables

Supplementary Table 4.1: Number of CYPs, CYP families, Total ORFs in each of the 52 fungi

Species                        CYP families  CYP clans  Total CYPs  Total ORFs
Aspergillus oryzae             95            48         163         12,063
Aspergillus flavus             95            50         159         12,604
Aspergillus terreus            80            36         125         10,406
Aspergillus niger              86            40         154         11,200
Aspergillus carbonarius        87            41         134         11,624
Aspergillus nidulans           91            44         120         10,568
Aspergillus fumigatus          56            39         77          9,887
Neosartorya fischeri           68            39         94          10,406
Aspergillus clavatus           70            39         97          9,121
Penicillium chrysogenum        64            40         101         12,791
Talaromyces stipitatus         80            36         165         13,252
Penicillium marneffei          72            34         120         10,638
Trichophyton equinum           42            22         62          8,576
Trichophyton tonsurans         42            23         60          8,245
Trichophyton rubrum            44            24         62          8,643
Microsporum canis              56            29         85          8,777
Microsporum gypseum            46            27         70          8,876
Coccidioides immitis           33            18         45          10,457
Coccidioides posadasii         33            18         45          10,125
Uncinocarpus reesii            31            18         41          7,798
Histoplasma capsulatum         35            20         46          8,038
Blastomyces dermatitidis       31            19         38          9,587
Paracoccidioides brasiliensis  30            20         38          9,136
Sclerotinia sclerotiorum       32            34         96          14,522
Botrytis cinerea               65            54         136         16,448
Sporotrichum thermophile       45            28         49          8,806
Chaetomium globosum            69            37         92          11,124
Thielavia terrestris           50            33         61          9,815
Podospora anserina             72            37         115         10,596
Neurospora crassa              39            27         43          9,935
Neurospora tetrasperma         37            36         42          10,640
Neurospora discreta            41            39         43          9,948
Cryphonectria parasitica       74            39         121         11,184
Magnaporthe oryzae             77            41         135         12,991
Fusarium oxysporum             96            43         169         17,735
Fusarium verticillioides       83            39         129         14,199
Fusarium graminearum           79            45         118         13,339
Fusarium solani                93            42         162         15,707
Trichoderma virens             80            42         120         11,643
Trichoderma reesei             51            30         73          9,129
Trichoderma atroviride         46            29         70          11,100
Verticillium dahliae           55            33         69          10,575
Verticillium albo-atrum        52            33         69          10,239
Colletotrichum graminicola     81            38         148         12,022
Colletotrichum higginsianum    102           44         230         16,150
Pyrenophora tritici-repentis   68            32         113         12,169
Cochliobolus heterostrophus    80            38         139         9,633
Leptosphaeria maculans         50            28         62          12,469
Stagonospora nodorum           91            44         149         15,983
Mycosphaerella graminicola     63            30         81          11,395
Mycosphaerella fijiensis       68            25         94          10,327

Supplementary Table 4.2: The expansion, decrease and remaining CYP clans among the Pezizomycotina fungi

Species Expansion Remain Decrease

L. maculans 0 74 22

M. fijiensis 8 79 9

M. graminicola 2 89 5

P. tritici-repentis 7 76 13

S. nodorum 8 78 10

C. heterostrophus 11 82 3

P. marneffei 5 76 15

A. nidulans 3 50 43

P. chrysogenum 6 50 40

A. terreus 7 67 22

A. clavatus 3 77 16

A. carbonarius 4 78 14

A. fumigatus 1 87 8

A. niger 9 81 6

N. fischeri 3 87 6

A. oryzae 1 92 3

T. stipitatus 7 86 3

A. flavus 2 92 2

S. sclerotiorum 1 87 8

B. cinerea 0 93 3

M. canis 2 85 9

T. rubrum 1 90 5

T. tonsurans 1 93 2

T. equinum 0 95 1

B. dermatitidis 0 91 5

P. brasiliensis 14 77 5

U. reesii 2 89 5

H. capsulatum 1 92 3

M. gypseum 0 93 3

C. posadasii 1 94 1

C. immitis 0 96 0

V. alboatrum 1 89 6

V. dahliae 0 92 4

T. atroviride 4 80 12

T. reesei 1 83 12

T. virens 7 87 2

C. parasitica 9 46 41

M. oryzae 30 31 35

C. graminicola 4 70 22

F. verticillioides 2 77 17

F. graminearum 16 69 11

F. solani 12 74 10

C. higginsianum 7 84 5

F. oxysporum 2 92 2

S. thermophile 0 83 13

T. terrestris 1 83 12

P. anserina 12 75 9

C. globosum 7 87 2

N. crassa 0 96 0

N. discreta 0 96 0

N. tetrasperma 0 96 0

Supplementary Table 4.3: Expansions and contractions of CYP clans from Pezizomycotina involved in Primary, Secondary and Xenobiotic metabolism.

Class/Order      Species              Expansion                        Contraction
                                      Primary    Secondary  Xenobiotic Primary    Secondary  Xenobiotic
Eurotiales       A. carbonarius       1          1          1          0          4          3
Eurotiales       A. clavatus          0          1          2          0          2          4
Eurotiales       A. flavus            0          0          2          0          1          1
Eurotiales       A. fumigatus         0          0          1          0          2          3
Eurotiales       A. nidulans          1          2          0          1          11         3
Eurotiales       A. niger             1          3          3          1          2          0
Eurotiales       A. oryzae            0          1          0          0          0          0
Eurotiales       A. terreus           2          2          2          1          2          2
Eurotiales       N. fischeri          0          1          1          0          0          0
Eurotiales       P. chrysogenum       1          2          3          1          7          0
Eurotiales       P. marneffei         0          1          4          2          4          1
Eurotiales       T. stipitatus        0          3          1          0          2          0
Onygenales       B. dermatitidis      0          0          0          0          1          2
Onygenales       C. immitis           0          0          0          0          0          0
Onygenales       C. posadasii         0          0          0          1          0          0
Onygenales       H. capsulatum        0          0          0          0          1          1
Onygenales       M. canis             0          1          0          0          6          2
Onygenales       M. gypseum           0          0          0          0          1          0
Onygenales       P. brasiliensis      3          5          5          0          1          1
Onygenales       T. equinum           0          0          0          0          0          1
Onygenales       T. rubrum            0          0          1          2          2          0
Onygenales       T. tonsurans         0          0          1          0          2          0
Onygenales       U. reesii            1          1          0          1          1          0
Dothideomycetes  C. heterostrophus    1          3          5          0          1          0
Dothideomycetes  L. maculans          0          0          0          3          3          4
Dothideomycetes  M. fijiensis         0          3          2          1          2          1
Dothideomycetes  M. graminicola       0          0          2          0          1          0
Dothideomycetes  P. tritici-repentis  1          2          1          1          2          3
Dothideomycetes  S. nodorum           0          4          2          0          2          1
Leotiomycetes    B. cinerea           0          0          0          0          0          1
Leotiomycetes    S. sclerotiorum      0          0          0          0          4          2
Sordariomycetes  C. globosum          1          3          2          0          2          0
Sordariomycetes  C. graminicola       0          0          1          1          9          3
Sordariomycetes  C. higginsianum      2          5          0          0          2          2
Sordariomycetes  C. parasitica        0          6          2          2          7          2
Sordariomycetes  F. graminearum       2          6          3          0          2          3
Sordariomycetes  F. oxysporum         0          1          1          0          1          0
Sordariomycetes  F. solani            0          3          5          0          2          0
Sordariomycetes  F. verticillioides   0          1          1          0          5          4
Sordariomycetes  M. oryzae            3          8          5          1          6          5
Sordariomycetes  N. crassa            0          0          0          0          0          0
Sordariomycetes  N. discreta          0          0          0          0          0          0
Sordariomycetes  N. tetrasperma       0          0          0          0          0          0
Sordariomycetes  P. anserina          1          5          6          0          4          0
Sordariomycetes  S. thermophile       0          0          0          1          4          2
Sordariomycetes  T. atroviride        1          1          1          0          7          2
Sordariomycetes  T. reesei            0          0          0          1          6          2
Sordariomycetes  T. terrestris        0          0          0          2          4          2
Sordariomycetes  T. virens            1          4          2          0          1          0
Sordariomycetes  V. alboatrum         0          1          0          0          3          1
Sordariomycetes  V. dahliae           0          0          0          0          3          0

Supplementary Table 4.4: dN/dS values across the top 30 CYP clans in Pezizomycotina

CYP Family  No. of Genomes  dN/dS  No. of Sequences  Selection based on dN/dS  Putative Function

CYP65* 42 0.01 463 Purifying Selection Secondary

CYP52* 42 0.05 374 Purifying Selection Xenobiotic

CYP58* 43 0.11 310 Purifying Selection Secondary

CYP68 37 0.02 212 Purifying Selection Secondary

CYP530 37 0.10 176 Purifying Selection Xenobiotic

CYP507 37 0.00 168 Neutral Xenobiotic

CYP548 42 2.67 150 Positive Xenobiotic

CYP531 42 0.07 147 Purifying Selection Xenobiotic

CYP547 39 1.78 146 Positive Secondary

CYP505 39 0.10 143 Purifying Selection Xenobiotic

CYP54 39 0.00 133 Neutral Secondary

CYP504 36 0.02 129 Purifying Selection Xenobiotic

CYP574 37 0.00 125 Neutral Secondary

CYP51 42 0.14 119 Purifying Selection Primary

CYP578 39 0.10 94 Purifying Selection Secondary

CYP61 43 0.10 93 Purifying Selection Primary

CYP53 35 0.00 75 Neutral Xenobiotic

CYP62 22 0.81 73 Purifying Selection Secondary

CYP540 28 0.10 73 Purifying Selection Primary

CYP526 28 0.09 66 Purifying Selection Secondary

CYP59 26 0.04 55 Purifying Selection Secondary

CYP56 17 0.78 42 Purifying Selection Primary

CYP537 30 0.11 40 Purifying Selection Xenobiotic

CYP589 14 0.00 37 Neutral Unknown

CYP559 30 0.05 34 Purifying Selection Unknown

CYP534 28 0.70 34 Purifying Selection Unknown

CYP55 18 0.10 34 Purifying Selection Xenobiotic

CYP613 21 0.09 32 Purifying Selection Secondary

CYP657 24 49.99 29 Positive Primary

CYP659 11 0.04 27 Purifying Selection Unknown

CYP639 17 0.09 26 Purifying Selection Secondary

CYP533 32 0.10 25 Purifying Selection Xenobiotic

CYP546 16 0.04 25 Purifying Selection Unknown

CYP528 20 0.11 24 Purifying Selection Unknown

CYP677 13 12.89 23 Positive Unknown

CYP529 21 0.10 22 Purifying Selection Unknown

CYP544 17 0.18 20 Purifying Selection Unknown

CYP5063 10 50.00 16 Positive Unknown

CYP572 19 0.10 13 Purifying Selection Unknown

CYP550 34 0.10 10 Purifying Selection Secondary

CYP642 10 0.06 7 Purifying Selection Unknown

Supplementary Table 4.5: CYP duplicates, Species-specific CYPs and CYPs with signalP domain from species in Pezizomycotina

Species                        Duplicate CYPs (pairs)  Species-specific CYPs  CYPs with SignalP  Total CYPs  Total ORFs
Aspergillus oryzae             4                       6                      29                 163         12,063
Aspergillus flavus             4                       7                      23                 159         12,604
Aspergillus terreus            4                       8                      15                 125         10,406
Aspergillus niger              8                       4                      35                 154         11,200
Aspergillus carbonarius        7                       3                      45                 134         11,624
Aspergillus nidulans           1                       6                      26                 120         10,568
Aspergillus fumigatus          3                       1                      26                 77          9,887
Neosartorya fischeri           3                       3                      22                 94          10,406
Aspergillus clavatus           0                       6                      24                 97          9,121
Penicillium chrysogenum        3                       6                      19                 101         12,791
Talaromyces stipitatus         7                       7                      34                 165         13,252
Penicillium marneffei          5                       8                      18                 120         10,638
Trichophyton equinum           0                       0                      14                 62          8,576
Trichophyton tonsurans         0                       0                      14                 60          8,245
Trichophyton rubrum            0                       0                      7                  62          8,643
Microsporum canis              0                       3                      18                 85          8,777
Microsporum gypseum            0                       3                      12                 70          8,876
Coccidioides immitis           0                       2                      6                  45          10,457
Coccidioides posadasii         0                       0                      10                 45          10,125
Uncinocarpus reesii            0                       0                      8                  41          7,798
Histoplasma capsulatum         4                       1                      6                  46          8,038
Blastomyces dermatitidis       0                       0                      7                  38          9,587
Paracoccidioides brasiliensis  0                       4                      3                  38          9,136
Sclerotinia sclerotiorum       0                       21                     12                 96          14,522
Botrytis cinerea               1                       10                     19                 136         16,448
Sporotrichum thermophile       0                       1                      14                 49          8,806
Chaetomium globosum            1                       8                      22                 92          11,124
Thielavia terrestris           0                       0                      26                 61          9,815
Podospora anserina             3                       8                      25                 115         10,596
Neurospora crassa              0                       1                      9                  43          9,935
Neurospora tetrasperma         0                       1                      8                  42          10,640
Neurospora discreta            0                       0                      9                  43          9,948
Cryphonectria parasitica       1                       8                      23                 121         11,184
Magnaporthe oryzae             5                       5                      20                 135         12,991
Fusarium oxysporum             13                      22                     28                 169         17,735
Fusarium verticillioides       2                       8                      21                 129         14,199
Fusarium graminearum           2                       8                      20                 118         13,339
Fusarium solani                10                      8                      36                 162         15,707
Trichoderma virens             2                       2                      17                 120         11,643
Trichoderma reesei             0                       0                      9                  73          9,129
Trichoderma atroviride         2                       0                      14                 70          11,100
Verticillium dahliae           0                       3                      23                 69          10,575
Verticillium albo-atrum        0                       4                      23                 69          10,239
Colletotrichum graminicola     5                       4                      35                 148         12,022
Colletotrichum higginsianum    7                       31                     58                 230         16,150
Pyrenophora tritici-repentis   5                       9                      23                 113         12,169
Cochliobolus heterostrophus    8                       5                      20                 139         9,633
Leptosphaeria maculans         0                       3                      13                 62          12,469
Stagonospora nodorum           1                       19                     28                 149         15,983
Mycosphaerella graminicola     0                       2                      21                 81          11,395
Mycosphaerella fijiensis       0                       11                     19                 94          10,327

Chapter 5

Fungal Calcium Signaling Database (FCSD): A community resource for calcium signaling in fungi

Calcium is probably one of the most versatile elements in biological systems. It serves as a pivotal signal in controlling diverse cellular and developmental processes to ensure the healthy functioning of every organism. The mechanism of translating external stimuli into specific cellular and developmental responses via changes in calcium ions plays an essential role in plant-microbe and microbe-environment interactions. Accordingly, many genes of the calcium-signaling pathway have been found to be virulence factors of fungal pathogens. How this simple and ubiquitous ion has evolved to control so many processes is one of the central questions in biology, with many practical implications. Rapid advances in genome sequencing of many fungal and oomycete species have uncovered conserved core calcium signaling genes as well as lineage-specific features. To support systematic studies on this evolutionary variability in fungi and oomycetes and on the functional roles of individual genes, we built the Fungal Calcium Signaling Database (FCSD; http://fcsd.ifungi.org/), an online platform that categorizes and annotates key calcium signaling proteins from more than 100 published fungal and oomycete genomes. The database also archives experimental results from studies on mutants of calcium signaling genes and the resulting calcium signatures in both video and picture formats. The FCSD will greatly support the fungal community in studying and understanding calcium signaling.

5.1 Background

Calcium is perhaps the most ubiquitous ion found in biological systems. Calcium signaling is involved in sensing extracellular signals and translating them into intracellular responses that in turn modulate various pathways in order to tackle abiotic [173] and biotic stresses. This translation happens via complex cascades involving various molecules that pump and exchange calcium to achieve cellular homeostasis [174]. In fungi, the calcium-signaling pathway has been implicated in survival under extreme stress during host-pathogen interactions [175, 176], as well as in coordinating critical cellular processes [177]. Genes involved in the calcium-signaling pathway have also been identified as virulence factors of fungal pathogens [178, 179]. Despite the essential role played by calcium signaling across critical cellular processes and the availability of fungal genomic resources, the identification and characterization of calcium signaling proteins in fungi have not been addressed intensively.

Genome sequencing has been rising steadily, especially with the lowering of sequencing costs and growing awareness of the impact of sequenced genomes [180]. Currently, over 300 fungal genomes are available from online public databases. With this rise in genome data, comparative genomics studies have been carried out to understand the critical differences between species, thereby elucidating the various fungal niches. Similar approaches have been adopted to understand the calcium-signaling pathway; these approaches have revealed interesting patterns of calcium signaling gene distribution among filamentous fungi [176, 181, 182]. However, despite the advantages of such analyses, there are no existing resources that allow detailed comparative analysis of genes involved in the calcium-signaling pathway. To address this problem we developed the Fungal Calcium Signaling Database (FCSD; http://fcsd.ifungi.org/). The FCSD archives 4,187 calcium genes from 116 sequenced fungal and oomycete genomes. Although a myriad of genes interact with calcium, recent studies have identified a core set of calcium signaling genes in fungi [176, 183]. Based on these studies we defined 32 genes (Table 5.1) that are important for the calcium signaling machinery.

This core set of genes was categorized into five classes, namely calcium permeable channels, calcium pumps, calcium exchangers/antiporters, calcium signaling proteins and CaM-binding proteins (downstream proteins). We added a phenotype management tool that allows users to add phenotype data in the form of gene knockout data or videos of pulsatile calcium signatures that illustrate the role of calcium in specific processes. The FCSD also allows the calcium signaling data to be analyzed in several ways. The FCSD is integrated with a blog where users can post comments and feedback. We hope that the blog will also serve as a platform to discuss the data archived in the database as well as the current status of calcium signaling in fungi. Ultimately, the FCSD with its set of tools and features has been designed to become a community resource for calcium signaling in fungi.

5.2 Construction and content

5.2.1 Computational pipeline utilized in FCSD

A computational pipeline (Figure 5.1) was designed to identify the core set of calcium signaling genes from fungal and oomycete genomes. (a) The first step consisted of identifying characterized calcium signaling genes for the core calcium signaling pathway. For this we scanned the literature and found several important reports of essential genes that are part of the core calcium signaling pathway (references). We identified 32 characterized calcium-signaling genes from the genome of Saccharomyces cerevisiae (http://www.yeastgenome.org). (b) In the second step, these 32 characterized calcium signaling genes were compared with sequences from 116 fungal and oomycete genomes present in the Comparative Fungal Genomics Platform (CFGP; http://cfgp.riceblast.snu.ac.kr/) data warehouse [184]. This comparison was carried out using the BLASTMatrix tool within the CFGP with an E-value of 1e-05. (c) The third step consisted of collecting the best-matched protein sequences for the identification of orthologous gene clusters using a modified form of the TRIBE-MCL algorithm [107, 150] with an E-value of 1e-50, an inflation factor of 5 and a coverage of 60%. Broad clusters were identified that corresponded with the five classes defined earlier. (d) In the last step, these clusters were checked manually for the presence of sequences with annotation errors. The annotation errors are flagged explicitly for the convenience of database users. Using this pipeline we identified a total of 4,187 calcium signaling protein sequences.
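The collection of best-matched sequences between steps (b) and (c) can be sketched in Python as below. This is one possible reading of "best-matched", stated here as an assumption: every subject protein passing the E-value cut-off is assigned to the yeast query gene that matches it with the lowest E-value, so that paralogs in a genome are all retained. The tuple layout, identifiers and function name are hypothetical and do not describe the BLASTMatrix output format.

```python
def assign_best_query(hits, max_evalue=1e-5):
    """Assign each subject protein to its best-matching query gene.

    hits: iterable of (query_gene, subject_id, evalue).
    Returns {subject_id: (query_gene, evalue)} for hits with evalue <= max_evalue.
    """
    best = {}
    for query, subject, evalue in hits:
        if evalue > max_evalue:
            continue  # hit does not pass the search threshold
        if subject not in best or evalue < best[subject][1]:
            best[subject] = (query, evalue)
    return best


# Toy hits with hypothetical protein identifiers.
hits = [
    ("CCH1", "sp1_p1", 1e-40),
    ("MID1", "sp1_p1", 1e-8),   # weaker match to the same protein
    ("CCH1", "sp1_p2", 1e-3),   # fails the E-value threshold
    ("MID1", "sp2_p9", 1e-20),
]
best = assign_best_query(hits)
```

The retained sequences would then be passed to the clustering step with its stricter E-value and coverage thresholds.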

5.2.2 Searching and accessing sequence data in FCSD

There is more than one way to explore the calcium signaling protein sequence data stored in the FCSD. The Summary Table gives access to the sequence data via a table that lists each species with the corresponding number of sequences in each of the five categories mentioned above. On clicking a number, users are taken to a page where the protein and nucleotide sequences for each of the calcium signaling proteins/genes can be accessed. The entire list of protein sequences can also be downloaded from the table itself using the download feature, which allows sequences from multiple species to be selected and downloaded in a single file. The Protein List tab consists of a list of the 32 calcium-signaling genes with links. On clicking a particular gene/protein name, the user is directed to a page that displays details for that gene by showing the information from the Saccharomyces Genome Database [185] in the FCSD browser window. A link at the top also allows users to access a table that shows the comparative genomic distribution of the gene/protein across 116 fungal and oomycete species.

5.2.3 BLAST utility

We built a BLAST [108] utility into the FCSD to allow users to identify putative calcium signaling sequences. The utility can also be used to find sequences already in the database. Currently, it accepts only protein sequences and hence supports only BLASTp. The search parameters can be adjusted to refine the hits returned. Users can download the hits returned from the database using a custom download option.

5.2.4 Exploring the calcium-signaling pathway

One of the main features of the FCSD is a diagrammatic representation of the calcium-signaling pathway, intended to help users understand the pathway better. The Diagram tab allows the user to pick a species or browse through taxa; once a species is picked, its specific calcium signaling genes can be explored in the pathway. The feature then also allows the specific protein/nucleotide sequence to be downloaded. Users can thus pick genes based on their role and location in the pathway (Figure 5.4). The utility also allows examination of increases in the number of genes related to a certain function in the pathway, as well as comparison of these numbers across species.

5.2.5 Phenotype data exploration tool

In order to support some of the sequence data available in the FCSD we also added phenotype data, in the form of various gene knockouts under different conditions. The Phenotype Data Management button allows users to access these data. The interface allows users not only to view existing data but also to submit their own phenotype data in the form of images (jpeg, gif, tiff, pdf). The view option displays the available phenotype data by species, for knockouts of single genes or of multiple genes. The images may consist of photos of Petri plates or plant specimens, or even videos of experimental protocols (Figure 5.5).

5.2.6 Blog and Twitter feeds on calcium signaling

A WordPress (http://www.wordpress.org) blog was built into the FCSD to foster outreach and provide a platform for scientific discussions pertaining to calcium signaling in fungi. The blog also displays the latest tweets on calcium signaling, selected using appropriate hashtags. The goal of the blog is to allow users, especially plant pathologists, to share the latest news and views in the field of calcium signaling. Contributors can create an account to add content as well as post comments on existing content.

5.2.7 Reference page for identifying references

A References tab was also added for browsing references related to calcium signaling; the utility allows references to be browsed as well as searched by title or author information. We added a total of 1,521 reference papers for the users' perusal. The community can add more references using the add paper function. Each reference highlights the calcium signaling genes that feature in it, and these genes can be highlighted for every newly added reference.

5.3 Utility and Discussion

The collection of fungal and oomycete calcium signaling genes in the FCSD shows a number of interesting patterns. For instance, based on the data in the database, one can observe marked changes in the calcium signaling gene repertoire among all the major orders of fungi (Figure 5.3). Basidiomycota fungi have a total of 25-43 core calcium-signaling genes; in comparison, Ascomycota fungi have 21-54 and the Oomycetes have 14-46. Within the phylum Ascomycota the range varies further, with Saccharomycotina fungi having 21-49 genes and Pezizomycotina fungi having 32-52 core calcium-signaling genes. These numbers reflect the rise in the number of paralogs of these core calcium-signaling genes in species across different phyla.

5.3.1 Diversity in the calcium-signaling pathway in fungal phyla

We also performed a comparison of gene numbers in the five broad functional categories mentioned above (Figure 6). We observed patterns of gene expansion and contraction across different fungal orders. In all categories, Microsporidia show the lowest number of genes, owing to their parasitic lifestyle and an overall reduction in genome size. Curiously, the Sordariomycete Fusarium and Colletotrichum species show an amplification of genes belonging to three out of five categories. In all three cases Colletotrichum higginsianum shows the maximum numbers; the high number of calcium signaling genes may thus have a role in pathogenicity and the hemibiotrophic lifestyle of the fungus [153]. Taking genome size into account, the subphylum Saccharomycotina has undergone the greatest expansion of the calcium signaling repertoire, followed by Pezizomycotina and Basidiomycete fungi. The largest proportion of calcium signaling genes was found in the Saccharomycete fungus Saccharomyces bayanus, whereas the smallest proportion was found in the Blastocladiomycota fungus Allomyces macrogynus. The Microsporidia show no orthologs of calcium permeable channels or calcium exchangers/antiporters, indicating a possible loss of these two classes of calcium signaling genes/proteins. The Phytophthora species show a high number of calcium pumps compared to all other species, ranging from 10 to 24. Rhizopus oryzae has the maximum number of calcium signaling and CaM-binding proteins; one likely reason for this observation is the genome duplication event that has occurred in the species [186]. Colletotrichum higginsianum has the second largest calcium-signaling repertoire in the database; it is also the eleventh largest in terms of the number of ORFs encoded in the genome.

5.3.2 Accuracy and annotation correction

The protein sequences are clustered using a previously optimized procedure based on the Tribe-MCL algorithm [107, 150]. The pipeline generates clusters that coincide with the defined putative functional classes, along with some clusters that seem to contain species-specific calcium-associated genes. The clustering also generated false positives, in the form of genes that were placed in a cluster but should not have been, and false negatives, consisting of genes that were categorized into different clusters (in most cases into individual clusters). We analyzed these manually and found that almost all such genes had structural annotation errors. We believe that such clustering of sequences helps in identifying annotation errors and in better curation of the dataset. The species-specific clusters contained genes that were either highly divergent or had annotation errors.

5.3.3 Simplified reference search

The FCSD stores references that mention or contain information regarding the calcium-signaling pathway. The References tab leads the user to a utility that systematically stores references, which can be searched in the ways described above. The reference utility also shows the calcium signaling genes referred to in each paper as buttons under the reference. On clicking a button, the user is taken to a list of information regarding that gene as well as a list of sequences available in the database. This utility helps the user identify the genes before reading the paper and thus makes it easier to search for the appropriate references.

5.3.4 Community applications for research

Many tools and databases have been built to address the burgeoning sequence data, and more are on the way. Often, however, users do not get enough support to use these tools or databases. In many cases the tools are outdated and users have to interact with each other on various forums to solve usability issues.

The FCSD blog serves as an avenue for us to accept feedback on the database and improve it continuously. The blog contains, as posts, detailed tutorials on the usage of every feature of the database. Users are encouraged to post their questions regarding any particular feature in the comments section. These questions will be answered and archived into an FAQ section once enough queries have accumulated.

Since its creation, Twitter has been used increasingly to share scientific data, conference-related news, the latest findings and other happenings. Our Twitter account allows followers to learn about the latest additions to the FCSD. Twitter feeds related to various topics have proved to be a vital source of information. The Twitter feed in the FCSD blog gives users the latest tweets in the field of calcium signaling. It also serves as another way of addressing user queries and feedback, apart from the comments on the blog.

5.4 Evolution and comparative genomics of calcium signaling genes

The calcium signaling sequence data in the FCSD facilitate various types of phylogenomic and comparative genomics analyses. Here we summarize some of the patterns observed in the data stored in the database. In the following paragraphs, some critical protein families are examined from an evolutionary standpoint. We observed species-specific, order-specific, phylum-specific and organism-specific patterns. There are several such patterns; some of them are covered briefly in the following sections.

5.4.1 Calcium permeable channel proteins (CCH1, MID1, ECM7, FIG1 and YVC1)

The calcium permeable channels are voltage-gated calcium channel proteins that carry out the influx/efflux of calcium to/from the cell. The five genes in this category are the high-affinity calcium uptake system (HACS) genes [187] CCH1, MID1 and ECM7, the low-affinity calcium uptake system (LACS) gene FIG1 [188] and the vacuolar calcium influx gene YVC1 [189]. These voltage-gated Ca2+ channels have been hypothesized to have evolved from single-domain potassium channels before the separation of eukaryote species [190].

The HACS consists of the CCH1, MID1 and ECM7 genes, which work together and have an essential role in calcium uptake [187]. All three genes share similarity to animal voltage-gated calcium channels (VGCCs) [191]. The MID1 protein is homologous to the alpha2delta subunit, CCH1 shares high homology with the alpha1 subunit, and ECM7 has similarity to the gamma subunit of the VGCCs. The HACS is activated instantly whenever the fungal cell undergoes abiotic stress in the form of low pH, administered antifungals or hypertonic shock [173], as well as during mating [187]. In our clustering result, most fungi had a single copy of the CCH1 gene. However, we also found duplicated CCH1 members (e.g. Postia placenta, A. macrogynus). The chytrid fungus A. macrogynus has 6 copies of CCH1 homologs and two copies of MID1. Most fungi and Oomycetes have a single copy of the MID1 gene. However, Oomycetes and Zygomycota seem to have 2 or more copies of CCH1 genes. Curiously, apart from Saccharomycotina fungi, all the species in the dataset seem to lack the ECM7 gene; even the early-diverging chytrid fungi show no copies of ECM7. This could indicate a divergence of the HACS in filamentous fungi and a remarkable conservation of the HACS in yeasts from the Saccharomycotina.

FIG1 is a part of the LACS and is critical for mating; its role has been characterized in Saccharomyces cerevisiae[192], Candida albicans[193] and Fusarium graminearum[188], thus covering the two major subphyla of Ascomycota. The FIG1 protein belongs to the PMP22_Claudin superfamily of proteins, which also includes ECM7 mentioned above. It seems to play a major role in the sexual development of Saccharomycotina fungi as well as F. graminearum. It is mainly involved in calcium uptake when calcium is available in excess. In the FCSD, we found homologs of the FIG1 gene in all Ascomycota but none in any other fungal phylum or in Oomycetes. The LACS is probably replaced by a different system in Basidiomycota, or may have been lost completely in other fungi. However, this needs to be tested experimentally in Basidiomycetes and Oomycetes.

YVC1 is a mechanosensitive transient receptor potential (TRP) channel homolog identified in S. cerevisiae that releases calcium ions from the vacuole into the cytoplasm during hyperosmotic shock [194, 195]. There is no YVC1 homolog in the Taphrinomycotina, Microsporidia, Blastocladiomycota, Zygomycota or Oomycota clades. All fungi in the Pezizomycotina have YVC1 homologues, but some Saccharomycotina fungi lack homologs of the protein (e.g. Ashbya gossypii, Debaryomyces hansenii, gossypii, Kluyveromyces polysporus). Except for Phanerochaete chrysosporium, which has two copies of YVC1, the other Agaricomycotina fungi have one copy. The gene has been annotated as non-essential[189] and thus may have been lost in fungi outside Ascomycota and Basidiomycota. This selective absence of calcium permeable channel proteins indicates divergence of the calcium influx/efflux system in fungi.
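The copy-number comparisons in this section reduce to counting, for each species, how many clustered homologs fall into each gene family. A minimal Python sketch of that tabulation follows. The function names and the input format (one (species, family) pair per homolog) are assumptions made for illustration and do not describe the FCSD pipeline itself; the example input reproduces only the A. macrogynus counts quoted in the text.

```python
from collections import Counter, defaultdict


def copy_number_table(assignments):
    """Count homologs per species and gene family.

    `assignments` is an iterable of (species, family) pairs, one pair per
    homologous protein. This input format is an assumption of the sketch.
    """
    counts = Counter(assignments)
    table = defaultdict(dict)
    for (species, family), n in counts.items():
        table[species][family] = n
    return {species: dict(families) for species, families in table.items()}


def missing_families(table, families):
    """For each species, list the families with no detected homolog."""
    return {
        species: [f for f in families if found.get(f, 0) == 0]
        for species, found in table.items()
    }


# Hypothetical input: six CCH1 and two MID1 homologs for A. macrogynus,
# matching the copy numbers quoted in the text.
assignments = [("A. macrogynus", "CCH1")] * 6 + [("A. macrogynus", "MID1")] * 2
table = copy_number_table(assignments)
absent = missing_families(table, ["CCH1", "MID1", "ECM7", "FIG1"])
```

A table of this shape makes clade-level statements, such as the absence of ECM7 outside Saccharomycotina, straightforward to check once every species has been added.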

5.4.2 Calcium pumps (PMR1, PMC1, ENA1, SPF1 and NEO1)

Calcium pumps belong to the superfamily of P-type ATPases, which hydrolyse ATP to drive the active transport of calcium from the cytoplasm either out of the cell or into internal stores such as the ER or Golgi. The P-type designation indicates that these integral membrane proteins form a phosphorylated intermediate during ATP hydrolysis. Most P-type ATPases transport small cations, and some move phospholipids from one side of a membrane bilayer to the other. Based on substrate specificities, the P-type ATPases are divided into five classes[196]. PMR1 is a type IIA ATPase, PMC1 is a type IIB ATPase, NEO1 is a type IV ATPase, and ENA1 and SPF1 are type IID and type V ATPases, respectively. Among these, NEO1 is the only essential P-type ATPase[197]. Based on the families represented, P-type ATPases in fungi seem to resemble animal ATPases more than plant ATPases [5]. It also seems that fungi encode many more ATPases than other eukaryotic species; in fact, the human genome encodes only four more P-type ATPases than the fungus Aspergillus fumigatus (24 versus 20; Table 5.2) [5].

PMR1 is an ATPase involved in calcium transport at the Golgi complex. Knockdown of PMR1 in Magnaporthe oryzae resulted in slow growth and a shutdown of conidiation [183]. Filamentous fungi have been reported to have one homologue of the PMR1 gene [198]; however, our clustering of P-type ATPases shows two homologues in all Pezizomycotina and one homologue in most Saccharomycotina fungi. There are at least three copies of the PMR1 gene in Basidiomycota, with some species such as Postia placenta having as many as five copies. The Microsporidia do not contain PMR1, PMC1 or ENA1 homologues. Interestingly, the Phytophthora species show a high number of PMR1 homologues, with P. infestans containing 13 of them. The uniformity in the number of PMR1 genes in Ascomycota may indicate a housekeeping role in this phylum, whereas in Basidiomycota the role may go beyond housekeeping functions.

PMC1, on the other hand, works in coordination with VCX1 to remove excess calcium at the vacuolar membrane[195]. PMC1 mutants in M. oryzae have malformed conidia [183]; however, the mutants in S. cerevisiae do not show any discernible phenotype[198]. In the FCSD, we find high variation in PMC1 copy number across Ascomycota. The Pezizomycotina fungi contain from one to as many as six copies of the PMC1 gene, whereas the Saccharomycotina fungi have only one copy across all the species. It is thus surprising that the PMC1 deletion mutant in S. cerevisiae shows no phenotype. We also carried out gene family gain and loss analysis of Pezizomycotina fungi using the tool CAFE[154] to identify patterns of birth and death of genes involved in calcium signaling. The PMC1 family shows a loss on the branch leading to the Onygenales and a gain on the branch leading to the Sordariomycetes; the branch leading to the Fusarium species seems to have gained one extra gene compared to the rest of the fungi, with the exception of the Colletotrichum species (Figure 5.7). The Basidiomycota fungi, on the other hand, contain only one copy of the PMC1 gene, while Rhizopus oryzae contains seven copies and the Phytophthora species contain four copies each.
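Once family sizes have been inferred for ancestral nodes, the gain or loss on each branch is simply the difference between a node and its parent. The Python sketch below illustrates only this final bookkeeping step. It is not CAFE's birth-death likelihood model, and the node names, the two-branch tree and all counts are placeholder values chosen for illustration, not results from Figure 5.7.

```python
def branch_changes(parent_of, counts):
    """Return {child: size(child) - size(parent)} for every branch.

    `parent_of` maps each non-root node to its parent, and `counts` maps
    every node (tips and ancestors) to a gene family size. Ancestral sizes
    are assumed to have been inferred beforehand by a dedicated tool.
    """
    return {
        child: counts[child] - counts[parent]
        for child, parent in parent_of.items()
    }


def classify(changes):
    """Split branches into those with a net gain and those with a net loss."""
    gains = sorted(node for node, delta in changes.items() if delta > 0)
    losses = sorted(node for node, delta in changes.items() if delta < 0)
    return gains, losses


# Placeholder tree and counts (illustrative only, not measured values).
parent_of = {"Sordariomycetes": "Pezizomycotina", "Onygenales": "Pezizomycotina"}
counts = {"Pezizomycotina": 3, "Sordariomycetes": 4, "Onygenales": 2}

changes = branch_changes(parent_of, counts)
gains, losses = classify(changes)
```

A full analysis would also need the branch lengths and a statistical test of whether each change is larger than expected by chance, which is what the likelihood framework supplies.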

ENA1 is a P-type ATPase that pumps sodium out of the cell across the plasma membrane. In the FCSD, there is again marked variability in the number of ENA1 homologues. Thielavia terrestris is an exception in that it does not contain any homolog of ENA1, and Cryphonectria parasitica is another in that it contains only one copy of this pump. The rest of the Pezizomycotina fungi contain two to five homologs, and yet again the Fusarium clade shows a gain in gene copy number compared to the rest of the fungi in the group (Figure 5.8). Most Saccharomycotina fungi contain a single homolog of ENA1, with the exception of Saccharomyces cerevisiae S288C, which contains three copies. Interestingly, most Basidiomycete fungi do not contain any ENA1, with a few exceptions. The gene also has no homologs in the Oomycete species, while chytrid fungi seem to carry at least two copies.

The SPF1 gene product, located in the ER, works along with PMR1, much like the association between PMC1 and VCX1. SPF1 is involved in protein folding and glycosylation, and knocking out the gene has effects similar to knocking out PMR1[199]. The gene knockout in M. oryzae shows a number of defects, including in growth, sporulation and appressorium formation[183]. Almost all fungi in the FCSD contain a single copy of the gene, except a couple of Candida species (genome duplication might be responsible for this observation). The Oomycetes, however, contain two homologues of this gene.

Lastly, NEO1 is the only essential P-type ATPase among the ATPases included in our database. It is required for retrograde transport between the Golgi and the ER, and the knockout shows severe growth and pathogenicity effects[183, 197]. Most fungi contain two homologues of NEO1, with some containing three copies. The Oomycetes, however, contain more than three homologues of the gene; again P. infestans contains the most, with six homologues of NEO1. Thus, it seems that, for the most part, calcium pump evolution and usage has been different in Basidiomycota compared to Ascomycota. This could have happened before the diversification of the two fungal phyla, a view that has also been advocated in a recent paper[200].

5.4.3 Calmodulin and Calcineurin

Calmodulin is a calcium ion binding protein that regulates both calcium-independent and calcium-dependent processes, including those involving calcineurin. It is also one of the most conserved proteins across the fungal kingdom. Both calmodulin and calcineurin belong to the EF-hand (EFh) domain superfamily of proteins, which contain a calcium-ion binding site. These motifs have a helix-loop-helix structure with an inter-helical loop. Calcineurin A consists of several domains, including the calmodulin and calcineurin B binding domains. Calmodulin interacts with many proteins and regulates CaM-related proteins, with diverse cellular effects such as muscle contraction, fertilization, cell proliferation, vesicular fusion and apoptosis[201]. Calmodulin- and calcineurin-mediated calcium signaling also plays a critical role in virulence and pathogenicity[202]. There is a single copy of calmodulin in all fungi and oomycetes with the exception of the chytrid fungi. Rhizopus oryzae has as many as four calmodulin homologues and Phycomyces blakesleeanus has two copies. One reason could be the genome duplication in Rhizopus oryzae, and another a duplication of calmodulin in the ancestor of the chytrid fungi. Calcineurin A and B are also single copy in most fungi, except for some Saccharomycotina fungi and, again, the chytrid fungi. The oomycetes also possess more than one copy of these proteins.

5.4.4 Calcium exchanger proteins (VCX1, VNX1)

Calcium exchanger proteins move calcium ions between the cytoplasm and the various organelles in order to maintain homeostasis. VCX1 is one such calcium exchanger; it is located in the vacuole and works in coordination with PMC1 to efflux calcium across the vacuolar membrane[195]. Another protein that is homologous to VCX1 and is also located on the vacuolar membrane is VNX1. The VNX1 protein belongs to the type II calcium exchanger family and is a cation/H+ antiporter [203]. Reverse genetics studies have shown that VNX1 and VCX1 together sequester ions, especially K+ and Ca2+, inside the vacuole, which is a major sink for cations [204]. The number of VCX1 homologues varies widely in Ascomycota, from one in most Saccharomycotina to six in some of the Sordariomycete fungi. The Oomycetes seem to contain only one homologue each. In contrast, there is only one copy of VNX1 in all fungi except the chytrids, which have two copies each. The Oomycetes do not show any homologues of this gene. The gene family analysis shows the same patterns as mentioned above, with losses observed in the Onygenales and large gains in the Sordariomycetes, especially on the Fusarium branch (Figure 5.9).

5.4.5 Phospholipase C

Phospholipase C (PLC) converts phospholipids into fatty acids and other lipophilic substances. PLC contributes two second messengers to the calcium-signaling pathway: inositol 1,4,5-trisphosphate (IP3) and diacylglycerol (DAG). However, IP3 has not yet been found in the fungal system. The knockout of PLC in M. oryzae causes a decrease in pigmentation as well as growth rate[183]. The deletion also causes defects in appressorium formation and pathogenicity; for these reasons PLC has been identified as playing a key role in fungal development and pathogenicity[178]. There are no PLC homologues in Oomycetes or in early-diverging fungi, including the chytrids and the zygomycetes, suggesting that PLC may have been a late acquisition. Almost all Saccharomycotina fungi have a single copy of the PLC gene, whereas the rest of the Ascomycota have at least two copies. Most Basidiomycota have between one and three PLC genes. The gene family analysis revealed that Sordariomycetes have almost twice as many PLC genes as the other taxa. Yet again, there is a gain in gene copies on the Fusarium branch (Figure 5.10).

5.5 Conclusions

The calcium signaling pathway plays a pivotal role in the smooth functioning of the cell, yet so far no resource has been available specifically for accessing information on it. The FCSD was created with the goal of building such a resource for calcium signaling in fungi and stimulating involvement and participation from the scientific community. In the FCSD, we aim to provide a repository of calcium signaling data (including sequences, phenotypes and references) that will help the scientific community better understand the role of calcium signaling in the fungal cell. All the tools and features present in the FCSD are thus aimed at achieving such participation. The FCSD is also the first database of its kind; no other such repertoire of calcium signaling genes from fungi exists. The calcium signaling gene repertoire in the FCSD also allows for comparative and phylogenomic analyses that aid in understanding the evolution and niche adaptation of fungi. Our analysis reveals specific patterns of gene family gain and loss that may have allowed the respective fungi to adapt better to their surroundings.

5.6 Tables

Table 5.1: 32 putative calcium-signaling genes that possibly constitute the core calcium-signaling pathway

Functional class                   Gene     Function                                                 Ref
Calcium permeable channels         CCH1     Voltage-gated high affinity Ca2+ channel                 [187]
Calcium permeable channels         MID1     Stretch-activated Ca2+-permeable channel                 [205]
Calcium permeable channels         ECM7     Integral membrane protein - Ca2+ uptake                  [187]
Calcium permeable channels         FIG1     Integral membrane protein - regulate Ca2+ influx         [192]
Calcium permeable channels         YVC1     Vacuolar cation channel - Ca2+ release from vacuole      [206]
Calcium pumps                      PMR1     High affinity Ca2+/Mn2+ P-type ATPase                    [207]
Calcium pumps                      PMC1     Vacuolar Ca2+ ATPase                                     [208]
Calcium pumps                      ENA1     P-type ATPase sodium pump                                [209]
Calcium pumps                      SPF1     P-type ATPase - ER membrane ion transporter              [210]
Calcium pumps                      NEO1     Putative aminophospholipid translocase, ATPase like      [183]
Calcium exchangers / antiporter    VCX1     Vacuolar membrane antiporter, Ca2+ exchange              [211]
Calcium exchangers / antiporter    VNX1     Ca2+/H+ antiporter located in ER membrane                [204]
Calcium exchangers / antiporter    MCU      Mitochondrial Calcium Uniporter                          [212]
                                   MICU1                                                             [213]
Calcium signaling                  CMD1     Calmodulin, Ca2+ binding protein                         [214]
Calcium signaling                  CNA1     Calcineurin A, catalytic subunit of calcineurin          [215]
Calcium signaling                  CNB1     Calcineurin B, regulatory subunit of calcineurin         [216]
Calcium signaling                  CNE1     Calnexin-like, integral membrane ER chaperone            [217]
Calcium signaling                  RCN1     Involved in calcineurin regulation in Ca2+ signaling     [218]
Calcium signaling                  CRZ1     Calcineurin-responsive zinc finger transcription factor  [219]
Calcium signaling                  FPR1     Peptidyl-prolyl cis-trans isomerase                      [4]
Calcium signaling                  CPR1     Cytoplasmic peptidyl-prolyl cis-trans isomerase          [4]
Calcium signaling                  PLC1     Phospholipase C, hydrolyses PIP2 to IP3 and DAG          [178]
CaM binding proteins (Downstream)  CMK1&2   Calmodulin dependent protein kinase 1 & 2                [220]
CaM binding proteins (Downstream)  RCK1&2   Protein kinases responsive to oxidative stress           [221]
CaM binding proteins (Downstream)  DUN1     Required in DNA damage-induced transcription             [222]
CaM binding proteins (Downstream)  RAD53    Protein kinase required for cell cycle arrest            [223]
CaM binding proteins (Downstream)  PAK1     Regulates organization and function of cytoskeleton      [224]
CaM binding proteins (Downstream)  END3     Involved in endocytosis, and cell wall morphogenesis     [225]
CaM binding proteins (Downstream)  KIN4     Involved in mitosis exit network                         [226]
CaM binding proteins (Downstream)  ARP1     Required for spindle orientation and nuclear migration   [227]

Table 5.2: Distribution of functionally characterized families of P-type ATPases (families 1–9) encoded within the genomes of 26 eukaryotes [5]

                                             Full P-type ATPases/family (n)
Organism type     Organism                      1   2   3   4   5   6   7   8   9  Total
Animal            Anopheles gambiae             1   3   0   0   1   0   0   5   0     10
Animal                                          2   4   0   0   1   0   0   6   0     13
Animal                                          2   3   0   0   1   0   0   6   0     12
Animal            Homo sapiens                  6   8   0   0   2   0   0   8   0     24
Total                                          11  18   0   0   5   0   0  25   0     59
Plant             Arabidopsis thaliana          0  14  12   0   4   4   0  12   0     46
Plant             Oryza sativa                  0  14   9   0   6   3   0   9   0     41
Total                                           0  28  21   0  10   7   0  21   0     87
Fungi             Aspergillus fumigatus         2   5   3   0   3   0   0   4   3     20
Fungi             Aspergillus nidulans          0   7   2   0   2   0   0   4   3     18
Fungi             Aspergillus oryzae            2   6   3   0   4   0   0   4   3     22
Fungi             Cryptococcus neoformans       0   3   2   0   1   0   0   4   1     11
Fungi             Coccidioides posadasii        0   4   1   1   3   0   0   4   3     16
Fungi             Neurospora crassa             0   4   1   0   3   0   0   5   3     16
Fungi             Saccharomyces cerevisiae      0   2   2   0   3   0   0   5   2     14
Fungi             Schizosaccharomyces pombe     0   2   2   0   1   0   0   5   1     11
Total                                           4  33  16   1  20   0   0  35  19    128
1-Cell eukaryote  Cyanidioschyzon merolae       0   3   1   0   1   0   0   2   0      7
1-Cell eukaryote  Cryptosporidium parvum        0   2   0   0   0   0   0   3   0      5
1-Cell eukaryote  Dictyostelium discoideum      2   3   1   1   3   0   0   9   0     19
1-Cell eukaryote  Encephalitozoon cuniculi      0   0   0   0   0   0   0   2   0      2
1-Cell eukaryote  Entamoeba histolytica         0   4   0   0   0   0   0   9   1     14
1-Cell eukaryote  Leishmania major              0   4   1   0   2   0   0   5   1     13
1-Cell eukaryote  Plasmodium falciparum         0   2   0   0   0   0   0   3   0      5
1-Cell eukaryote  Trypanosoma brucei            0   5   3   0   0   0   0   4   1     13
1-Cell eukaryote  Trypanosoma cruzi             0   3   1   0   1   0   0   7   1     13
1-Cell eukaryote  Theileria parva               0   2   0   0   0   0   0   2   0      4
1-Cell eukaryote  Thalassiosira pseudonana      0   4   1   0   0   0   0   1   1      7
Total                                           2  32   8   1   7   0   0  47   5    102
Ciliate           thermophila                  21  11   0   0   0   0   0  23   0     55
Total                                          38 122  45   2  42   7   0 151  24    431
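The totals in Table 5.2 can be checked mechanically. The short Python sketch below re-adds the nine family counts for a few rows copied from the table and compares each sum with the printed total. The helper names are hypothetical and only a subset of rows is included.

```python
# Family counts (families 1-9) and the printed total, copied from Table 5.2.
ROWS = {
    "Homo sapiens": ([6, 8, 0, 0, 2, 0, 0, 8, 0], 24),
    "Aspergillus fumigatus": ([2, 5, 3, 0, 3, 0, 0, 4, 3], 20),
    "Aspergillus oryzae": ([2, 6, 3, 0, 4, 0, 0, 4, 3], 22),
    "Saccharomyces cerevisiae": ([0, 2, 2, 0, 3, 0, 0, 5, 2], 14),
}


def inconsistent_rows(rows):
    """Return the organisms whose family counts do not add up to the total."""
    return [
        name
        for name, (families, total) in rows.items()
        if sum(families) != total
    ]


def total_difference(rows, first, second):
    """Difference in the total number of P-type ATPases between two organisms."""
    return rows[first][1] - rows[second][1]


bad_rows = inconsistent_rows(ROWS)
human_vs_fumigatus = total_difference(ROWS, "Homo sapiens", "Aspergillus fumigatus")
```

The same check extends to the subtotal rows by summing each family column within an organism type.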

5.7 Figures

Figure 5.1: FCSD computational pipeline

Figure 5.2: Distribution of calcium signaling genes across organisms

Figure 5.3: Distribution of core calcium signaling genes among fungal orders

Figure 5.4: The Diagram tab

Figure 5.5: Phenotype data management tab

Figure 5.6: Calcium signaling genes in 5 classes across different fungi

Figure 5.7: Gene family gain and loss in P-type ATPase PMC1 gene family.

Figure 5.8: Gene family gain and loss in P-type ATPase ENA1 gene family.

Figure 5.9: Gene family gain and loss in the calcium exchanger VCX1 gene family.

Figure 5.10: Gene family gain and loss in PLC1 gene family.

Chapter 6

Summary and Conclusion

This is perhaps the best time to be working in fungal genomics: the potential applications of fungal enzymes in the biofuels, bioremediation, pharmaceutical and synthetic biology industries are now widely recognized. A key ingredient in generating this excitement is the relative ease of sequencing a fungal genome and the plethora of discoveries that follow. To exploit this, a number of genomes have been sequenced, with more to be sequenced as part of the 1000 fungal genomes project (http://1000.fungalgenomes.org/). With this explosion of data there is a need, first, (a) to create tools that allow interpretation of sequence data and, second, (b) to draw meaningful hypotheses that help in understanding the underpinnings of the fungal lifestyle and any possible applications. Several databases and tools such as Galaxy, the UCSC genome browser, FungiDB and ACGT have been built over the past few years to address the former need. Comparative genomics and phylogenomics are two relatively recent methodologies for building hypotheses from genomic data. Comparative genomic analysis, a term coined perhaps in 1995, involves comparing the genomes of organisms to explain the differences in their characteristics and features. Phylogenomics, a term coined by Jonathan Eisen around 1998, combines phylogenetics and comparative genomics, specifically to answer questions regarding organismal evolution based on genome comparisons. These two approaches have revolutionized the way genomic data can be interpreted. Among their many applications, comparative genomics and phylogenomics have made it possible to reveal entire pathways[228, 229] and to find disease/pathogenicity factors[230-232] and drug targets[232, 233].

This dissertation focuses on (a) building databases that store genomic sequences in ways that aid comparative and phylogenomic analysis, and (b) carrying out comparative genomic and phylogenomic analyses using this bioinformatics cyber-infrastructure to understand fungal evolution. Towards the first aim I have developed and maintained two databases: the Fungal Cytochrome P450 Database (FCPD 1.2), described in chapter 3, and the Fungal Calcium Signaling Database (FCSD), described in chapter 5. Though most fungi have a conserved core genome, it is the species-specific features and genome content that make each one unique. Identifying these species-specific changes can help in understanding the adaptation of fungi to their niches.

6.1 A repository of fungal cytochrome P450 protein sequences

CYPs are some of the most widely studied protein families; in fungi they have been found to be vital for a variety of functions. The fungal cytochrome P450 database (FCPD) was developed in 2007 to store CYP sequences and other information from fungal genomes, and it has been a widely used resource since its release. In the first part of my thesis I optimized the computational pipeline used in the FCPD to generate better clustering of CYP sequences. For this optimization and clustering, an in-house clustering portal called the Cytochrome P450 analysis platform was built. With the improved clustering I published a new release of the database, FCPD 1.2, in October 2012. The new release stores 22,972 CYP sequences from 217 species, including fungi, oomycetes and other species for comparative analysis. I also added a putative metabolic classification as well as a comprehensive family/clan classification of CYP sequences to the database. I added 117 CYP clans and 292 families, and among them I putatively grouped 34 clans and 159 families into three functional classes: primary, secondary and xenobiotic metabolism. This putative functional categorization can help in understanding the functional CYP repertoire of fungi; as more CYPs are characterized, further putative functional annotation can be added to the database. I also added easy-to-use tutorials to aid database users, a feature for adding characterized CYPs to the database, and a complete catalogue of CYP clans and families with corresponding links to explore CYPs. Since its publication, the article describing the database has become highly accessed, with a couple of citations already. This shows the interest of the scientific community in understanding fungal CYPs. In the future, the FCPD will be continually updated with the latest genome releases and maintained to address the needs of the community.

6.2 Phylogenomic analyses of CYPs reveal taxa specific gains and losses

The subphylum Pezizomycotina is one of the major groups of fungi and contains some of the most devastating plant and human pathogens. The group also contains fungi with a number of industrial applications. The new release of the database and the in-house platform were used to carry out an in-depth phylogenomic analysis of CYPs from 51 species of Pezizomycotina. The analyses revealed species-specific and taxon-specific gains and losses. I concluded that saprophytic and non-pathogenic fungi tended to have a larger number of CYP families and clans than pathogenic fungi with hemibiotrophic/biotrophic or necrotrophic lifestyles. The most successful pathogens with diverse hosts tend to have a very dynamic and plastic CYPome that may have undergone gains and losses suiting a cosmopolitan niche. Genomes with repeat-induced point mutation (RIP) tended to have comparatively greater diversity in CYP clans and families, possibly because RIP actively prevents the accumulation of duplicated sequences. One overarching feature of phylogenomic analyses is that most inferences of pseudoparalogues arising from possible lateral transfer depend on the number of genomes sequenced at the time of analysis. In our analyses we find that previously suspected lateral transfers could be discarded given better phylogenetic representation of species. I mention one such example involving the CYP505 family in chapter 3 (FCPD 1.2). I also found some secondary metabolism clusters specific to particular organisms; for instance, by way of the CYPs shared by these two closely related fungi, I found a secondary metabolism cluster in Neosartorya fischeri that seems to have been completely deleted from Aspergillus fumigatus. I also devised a putative metric of the functional diversity of CYPs in a species, calculated as the ratio of CYP families to clans: the higher the ratio, the lower the functional diversity of CYPs in the species. The availability of more expression data will help in the functional annotation of CYP families, which can in turn help expand the analysis capabilities of the FCPD 1.2. In the future, experimental validation of phenomena observed via comparative genomics can help in finding novel gene clusters as well as enzymes. Transcriptomic and microarray studies of a species' CYPome could also be very helpful in getting a snapshot of the functional CYP repertoire of an organism.
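The family-to-clan ratio described in this section can be written as a short function. This is a minimal Python sketch assuming that each CYP in a species is already labelled with a family and a clan; the function name is hypothetical, and the labels F1, C1 and so on are placeholders rather than real CYP families or clans.

```python
def family_to_clan_ratio(cyps):
    """Ratio of distinct CYP families to distinct CYP clans in one species.

    `cyps` is an iterable of (family, clan) pairs, one pair per CYP.
    """
    pairs = list(cyps)  # materialize so the input can be scanned twice
    families = {family for family, _ in pairs}
    clans = {clan for _, clan in pairs}
    if not clans:
        raise ValueError("at least one clan assignment is required")
    return len(families) / len(clans)


# Placeholder labels: three families (F1, F2, F3) spread over two clans.
example = [("F1", "C1"), ("F2", "C1"), ("F3", "C2"), ("F1", "C1")]
ratio = family_to_clan_ratio(example)
```

Here three families spread over two clans give a ratio of 1.5. Duplicate CYPs from the same family do not change the value, because only distinct families and clans are counted.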

6.3 Community platform for fungal calcium signaling

Calcium signaling affects fungal pathogenicity in many ways because it is required across several cellular pathways. The fungal calcium signaling database was built to address the need to understand the components that mediate calcium signaling across all fungi. Thirty-two core calcium-signaling genes from Saccharomyces cerevisiae that interact directly or indirectly with calcium were chosen; we called this set the core calcium-signaling pathway. Some of these core genes have been described earlier as critical for growth [183, 188], virulence [179, 210, 234], sporulation and overcoming abiotic stresses [202, 210]. We picked more than 116 fungal genomes in which to identify homologues of these 32 S. cerevisiae genes. The optimized pipeline used in the FCPD 1.2 was used to identify these homologues and cluster them based on sequence identity. I annotated these clusters based on the 32 genes and created larger functional groupings, as mentioned in chapter 5. The data is presented in the FCSD in ways that allow comparative viewing: for instance, a tabular format in which calcium signaling gene numbers across five functional groups can be compared between species, or the diagram tab, which displays calcium signaling components on a pictorial representation of the hypha. I also added two major features to the database. First, a phenotype submission and viewing tool allows users to find knockout/phenotype data associated with each of the 32 genes across various conditions. Second, a blog was connected to the FCSD to enhance community involvement, to address questions regarding the features of the FCSD, and to gather suggestions that help improve the resource.
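The step of clustering homologues by sequence identity can be illustrated with a small Python sketch. It uses single-linkage clustering over pairwise identities, implemented with a union-find structure. This technique is a stand-in chosen for illustration: the clustering method actually used in the FCPD 1.2 pipeline is not restated here, and the sequence IDs, identity values and the 40% threshold are all assumptions.

```python
def cluster_by_identity(pairs, threshold):
    """Single-linkage clustering of sequences by pairwise identity.

    `pairs` is an iterable of (seq_a, seq_b, percent_identity). Two
    sequences are linked when their identity is at least `threshold`.
    Returns a sorted list of clusters, each a sorted list of sequence IDs.
    """
    parent = {}

    def find(x):
        # Register unseen sequences as their own cluster root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, identity in pairs:
        root_a, root_b = find(a), find(b)
        if identity >= threshold and root_a != root_b:
            parent[root_a] = root_b

    clusters = {}
    for seq in list(parent):
        clusters.setdefault(find(seq), []).append(seq)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical pairwise identities (percent) between four sequences.
pairs = [("s1", "s2", 62.0), ("s2", "s3", 45.0), ("s3", "s4", 20.0)]
clusters = cluster_by_identity(pairs, threshold=40.0)
```

With these values s1, s2 and s3 form one cluster and s4 remains alone. Single linkage tends to chain distant sequences together through intermediates, so a production pipeline may prefer a stricter criterion.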

6.4 Calcium signaling complexity varies greatly between fungal phyla

A comparison of the calcium signaling genes across fungi occupying diverse niches can provide insights into pathogenicity mechanisms. I performed phylogenomic analysis of the 32 genes across more than 115 genomes (described in chapter 5). I found that Ascomycota seemed to have a far more complex calcium-signaling network than Basidiomycota. I divided these 32 genes into five major classes (described in chapter 5) based on their broad role in calcium movement. Within Ascomycota, the Pezizomycotina fungi have a far denser calcium signaling pathway than the Saccharomycotina members; this difference in complexity within Ascomycota has been identified before[176, 198]. Conserved members of the core calcium-signaling pathway such as calmodulin and calcineurin, which are single copy throughout most of the fungal kingdom, are surprisingly maintained in multiple copies among Zygomycota. Considering the role of calcium pumps and calcium exchangers in virulence and growth, it is worthwhile comparing them among fungi with different lifestyles. I found that there were more calcium exchangers and calcium pumps in hemibiotrophs and necrotrophs than in the other types of fungi, that the biotrophs maintained high copy numbers of calcium permeable channels, and that the saprophytes were rich in the calcium signaling group of genes. I also carried out gene family evolution analysis to identify gene gains and losses. The analysis suggested that there have been a number of gains in the calcium pump group of genes among the Fusarium and Colletotrichum (hemibiotroph) species (details in chapter 5), whereas these genes are highly reduced among the Onygenales. Oomycetes also have an enormous repertoire of calcium exchangers and pumps. These niche-specific and species-specific expansions and losses suggest adaptive changes in calcium signaling genes. Systematic knockout of these genes across various fungi may allow a better understanding of these expansions and how they benefit fungi.

6.5 Future work

It is quite clear that there is a need for computational resources, such as databases and programs, that store genomic data in ways that make it easy to understand basic differences between organisms. Through my thesis work I have found that sequence databases are an excellent way to curate sequence data, which is inevitably released with several errors. Protein family specific databases such as the Fungal cytochrome P450 database (http://p450.riceblast.snu.ac.kr/) and TransportDB (http://www.membranetransport.org/), and transcription factor databases such as TFDB (http://115.156.249.50/TFDB/) and FTFD (http://ftfd.snu.ac.kr/), help in systematic post-sequencing analyses, especially for groups that are not directly involved in sequencing the respective genomes. However, I have found that very few tools and resources come with the necessary support. Often these resources are not looked after following publication, and more effort is needed to ensure their maintenance. The FCPD 1.2 and the FCSD developed in our group do not suffer from these issues, owing to competent IT support from the lab and its collaborators.

Through my work on the FCPD 1.2 I have provided the scientific community with a comprehensive resource for analyzing cytochrome P450 proteins. The extended CYP clan and family classification has led to the comprehensive identification of CYP clans and families in almost all recently sequenced fungal genomes. The putative functional groupings will help cytochrome P450 researchers identify the functional significance of having a certain set of CYP families in their favorite species, as well as add more characterized CYPs. Such features have allowed users to interact with and take ownership of the resource. Similarly, the FCSD is built to contribute to community-wide collaboration and research on calcium signaling in fungi. The blog and Twitter features enhance such collaboration and interaction. The phenotype addition tool lets people upload their datasets and also get a sense of the importance of some of the genes to various species. Ultimately, these tools will contribute to more discoveries and knowledge in fungal biology.

I have also performed detailed phylogenomic analyses based on the above-mentioned protein families. These protein family analyses are hampered by the limited number of species that have been sequenced, which often leads to a limited understanding of the significance of an observed pattern or, worse, to misinterpretation. The release of several genomes through the 1000 fungal genomes project, together with the individual genome sequencing initiatives undertaken worldwide, will help in painting a comprehensive picture.

Bibliography

1. Arvas M, Kivioja T, Mitchell A, Saloheimo M, Ussery D, Penttila M, Oliver S: Comparison of protein coding gene contents of the fungal phyla Pezizomycotina and Saccharomycotina. BMC Genomics 2007, 8:325.
2. Albertin W, Marullo P: Polyploidy in fungi: evolution after whole-genome duplication. Proceedings Biological sciences / The Royal Society 2012, 279(1738):2497-2509.
3. Andrade AC, Van Nistelrooy JG, Peery RB, Skatrud PL, De Waard MA: The role of ABC transporters from Aspergillus nidulans in protection against cytotoxic agents and in antibiotic production. Mol Gen Genet 2000, 263(6):966-977.
4. Arevalo-Rodriguez M, Wu X, Hanes SD, Heitman J: Prolyl isomerases in yeast. Frontiers in bioscience: a journal and virtual library 2004, 9:2420-2446.
5. Thever MD, Saier MH, Jr.: Bioinformatic characterization of p-type ATPases encoded within the fully sequenced genomes of 26 eukaryotes. The Journal of membrane biology 2009, 229(3):115-130.
6. Money NP: The triumph of the fungi: a rotten history. New York; Oxford: Oxford University Press; 2007.
7. Prescott DDP (ed.): The Triumph of the Fungi: Oxford university press; 2007.
8. Blackwell M: The fungi: 1, 2, 3 ... 5.1 million species? American journal of botany 2011, 98(3):426-438.
9. Singh LP, Gill SS, Tuteja N: Unraveling the role of fungal symbionts in plant abiotic stress tolerance. Plant signaling & behavior 2011, 6(2):175-191.
10. Bae H, Sicher RC, Kim MS, Kim SH, Strem MD, Melnick RL, Bailey BA: The beneficial endophyte Trichoderma hamatum isolate DIS 219b promotes growth and delays the onset of the drought response in Theobroma cacao. Journal of experimental botany 2009, 60(11):3279-3295.
11. Redman RS, Sheehan KB, Stout RG, Rodriguez RJ, Henson JM: Thermotolerance generated by plant/fungal symbiosis. Science 2002, 298(5598):1581.
12. Waller F, Achatz B, Baltruschat H, Fodor J, Becker K, Fischer M, Heier T, Huckelhoven R, Neumann C, von Wettstein D et al: The endophytic fungus Piriformospora indica reprograms barley to salt-stress tolerance, disease resistance, and higher yield. Proc Natl Acad Sci U S A 2005, 102(38):13386-13391.
13. Rodriguez R, Redman R: More than 400 million years of evolution and some plants still can't make it on their own: plant stress tolerance via fungal symbiosis. Journal of experimental botany 2008, 59(5):1109-1114.
14. Bowman BH, Taylor JW, White TJ: Molecular evolution of the fungi: human pathogens. Mol Biol Evol 1992, 9(5):893-904.
15. Baxter M, Mann PR: Electron microscopic studies of the invasion of human hair in vitro by three keratinophilic fungi. Sabouraudia 1969, 7(1):33-37.
16. Lacey J, Crook B: Fungal and actinomycete spores as pollutants of the workplace and occupational allergens. Ann Occup Hyg 1988, 32(4):515-533.
17. Eaton DL, Gallagher EP: Mechanisms of aflatoxin carcinogenesis. Annu Rev Pharmacol Toxicol 1994, 34:135-172.
18. Xiao G, Ying SH, Zheng P, Wang ZL, Zhang S, Xie XQ, Shang Y, St Leger RJ, Zhao GP, Wang C et al: Genomic perspectives on the evolution of fungal entomopathogenicity in Beauveria bassiana. Scientific reports 2012, 2:483.
19. Hesseltine CW: A Millennium of Fungi, Food, and Fermentation. Mycologia 1965, 57:149-197.


20. Punt PJ, van Biezen N, Conesa A, Albers A, Mangnus J, van den Hondel C: Filamentous fungi as cell factories for heterologous protein production. Trends Biotechnol 2002, 20(5):200-206.
21. Takao S: Organic acid production by Basidiomycetes. I. Screening of acid-producing strains. Appl Microbiol 1965, 13(5):732-737.
22. Kinsella JE, Hwang DH: Enzymes of Penicillium roqueforti involved in the biosynthesis of cheese flavor. CRC Crit Rev Food Sci Nutr 1976, 8(2):191-228.
23. Suzzi G, Lanorte MT, Galgano F, Andrighetto C, Lombardi A, Lanciotti R, Guerzoni ME: Proteolytic, lipolytic and molecular characterisation of Yarrowia lipolytica isolated from cheese. Int J Food Microbiol 2001, 69(1-2):69-77.
24. Campos R, Kandelbauer A, Robra KH, Cavaco-Paulo A, Gubitz GM: Indigo degradation with purified laccases from Trametes hirsuta and Sclerotium rolfsii. J Biotechnol 2001, 89(2-3):131-139.
25. Hamlyn PF: Fungal Biotechnology. In: NWFG Newsletter. April 1997.
26. Martinez D, Larrondo LF, Putnam N, Gelpke MD, Huang K, Chapman J, Helfenbein KG, Ramaiya P, Detter JC, Larimer F et al: Genome sequence of the lignocellulose degrading fungus Phanerochaete chrysosporium strain RP78. Nat Biotechnol 2004, 22(6):695-700.
27. Martinez D, Challacombe J, Morgenstern I, Hibbett D, Schmoll M, Kubicek CP, Ferreira P, Ruiz-Duenas FJ, Martinez AT, Kersten P et al: Genome, transcriptome, and secretome analysis of wood decay fungus Postia placenta supports unique mechanisms of lignocellulose conversion. Proc Natl Acad Sci U S A 2009, 106(6):1954-1959.
28. Manzoni M, Rollini M: Biosynthesis and biotechnological production of statins by filamentous fungi and application of these cholesterol-lowering drugs. Appl Microbiol Biotechnol 2002, 58(5):555-564.
29. Gostincar C, Grube M, de Hoog S, Zalar P, Gunde-Cimerman N: Extremotolerance in fungi: evolution on the edge. FEMS microbiology ecology 2010, 71(1):2-11.
30. Mironenko NV, Alekhina IA, Zhdanova NN, Bulat SA: Intraspecific variation in gamma-radiation resistance and genomic structure in the filamentous fungus Alternaria alternata: a case study of strains inhabiting Chernobyl reactor no. 4. Ecotoxicol Environ Saf 2000, 45(2):177-187.
31. Raffaele S, Kamoun S: Genome evolution in filamentous plant pathogens: why bigger can be better. Nat Rev Microbiol 2012, 10(6):417-430.
32. Corradi N, Slamovits CH: The intriguing nature of microsporidian genomes. Briefings in functional genomics 2011, 10(3):115-124.
33. Schmidt SM, Panstruga R: Pathogenomics of fungal plant parasites: what have we learnt about pathogenesis? Curr Opin Plant Biol 2011, 14(4):392-399.
34. Covert SF: Supernumerary chromosomes in filamentous fungi. Curr Genet 1998, 33(5):311-319.
35. Ma LJ, van der Does HC, Borkovich KA, Coleman JJ, Daboussi MJ, Di Pietro A, Dufresne M, Freitag M, Grabherr M, Henrissat B et al: Comparative genomics reveals mobile pathogenicity chromosomes in Fusarium. Nature 2010, 464(7287):367-373.
36. Stukenbrock EH, Jorgensen FG, Zala M, Hansen TT, McDonald BA, Schierup MH: Whole-genome and chromosome evolution associated with host adaptation and speciation of the wheat pathogen Mycosphaerella graminicola. PLoS Genet 2010, 6(12):e1001189.
37. van der Does HC, Rep M: Horizontal transfer of supernumerary chromosomes in fungi. Methods in molecular biology 2012, 835:427-437.
38. Mardis ER: The impact of next-generation sequencing technology on genetics. Trends Genet 2008, 24(3):133-141.
39. Goffeau A, Barrell BG, Bussey H, Davis RW, Dujon B, Feldmann H, Galibert F, Hoheisel JD, Jacq C, Johnston M et al: Life with 6000 genes. Science 1996, 274(5287):546, 563-547.


40. Wood V, Gwilliam R, Rajandream MA, Lyne M, Lyne R, Stewart A, Sgouros J, Peat N, Hayles J, Baker S et al: The genome sequence of Schizosaccharomyces pombe. Nature 2002, 415(6874):871-880.
41. Galagan JE, Calvo SE, Borkovich KA, Selker EU, Read ND, Jaffe D, FitzHugh W, Ma LJ, Smirnov S, Purcell S et al: The genome sequence of the filamentous fungus Neurospora crassa. Nature 2003, 422(6934):859-868.
42. Galagan JE, Calvo SE, Cuomo C, Ma LJ, Wortman JR, Batzoglou S, Lee SI, Basturkmen M, Spevak CC, Clutterbuck J et al: Sequencing of Aspergillus nidulans and comparative analysis with A. fumigatus and A. oryzae. Nature 2005, 438(7071):1105-1115.
43. Dean RA, Talbot NJ, Ebbole DJ, Farman ML, Mitchell TK, Orbach MJ, Thon M, Kulkarni R, Xu JR, Pan H et al: The genome sequence of the rice blast fungus Magnaporthe grisea. Nature 2005, 434(7036):980-986.
44. Kamper J, Kahmann R, Bolker M, Ma LJ, Brefort T, Saville BJ, Banuett F, Kronstad JW, Gold SE, Muller O et al: Insights from the genome of the biotrophic fungal plant pathogen Ustilago maydis. Nature 2006, 444(7115):97-101.
45. Jung K, Park J, Choi J, Park B, Kim S, Ahn K, Choi D, Kang S, Lee YH: SNUGB: a versatile genome browser supporting comparative and functional fungal genomics. BMC genomics 2008, 9:586.
46. Park J, Park B, Jung K, Jang S, Yu K, Choi J, Kong S, Kim S, Kim H, Kim JF et al: CFGP: a web-based, comparative fungal genomics platform. Nucleic Acids Res 2008, 36(Database issue):D562-571.
47. Manning VA, Pandelova I, Dhillon B, Wilhelm LJ, Goodwin SB, Berlin AM, Figueroa M, Freitag M, Hane JK, Henrissat B et al: Comparative genomics of a plant-pathogenic fungus, Pyrenophora tritici-repentis, reveals transduplication and the impact of repeat elements on pathogenicity and population divergence. G3 2013, 3(1):41-63.
48. Gardiner DM, McDonald MC, Covarelli L, Solomon PS, Rusu AG, Marshall M, Kazan K, Chakraborty S, McDonald BA, Manners JM: Comparative pathogenomics reveals horizontally acquired novel virulence genes in fungi infecting cereal hosts. PLoS pathogens 2012, 8(9):e1002952.
49. Suzuki H, MacDonald J, Syed K, Salamov A, Hori C, Aerts A, Henrissat B, Wiebenga A, VanKuyk PA, Barry K et al: Comparative genomics of the white-rot fungi, Phanerochaete carnosa and P. chrysosporium, to elucidate the genetic basis of the distinct wood types they colonize. BMC Genomics 2012, 13:444.
50. Flor HH: The Gene for Gene concept. Phytopathology 1942, 32.
51. Soanes DM, Alam I, Cornell M, Wong HM, Hedeler C, Paton NW, Rattray M, Hubbard SJ, Oliver SG, Talbot NJ: Comparative genome analysis of filamentous fungi reveals gene family expansions associated with fungal pathogenesis. PLoS ONE 2008, 3(6):e2300.
52. Morrissey JP, Osbourn AE: Fungal resistance to plant antibiotics as a mechanism of pathogenesis. Microbiol Mol Biol Rev 1999, 63(3):708-724.
53. van den Brink HM, van Gorcom RF, van den Hondel CA, Punt PJ: Cytochrome P450 enzyme systems in fungi. Fungal Genet Biol 1998, 23(1):1-17.
54. Cha CJ, Doerge DR, Cerniglia CE: Biotransformation of malachite green by the fungus Cunninghamella elegans. Appl Environ Microbiol 2001, 67(9):4358-4360.
55. Degtyarenko KN: Structural domains of P450-containing monooxygenase systems. Protein Eng 1995, 8(8):737-747.
56. Tudzynski B: Gibberellin biosynthesis in fungi: genes, enzymes, evolution, and impact on biotechnology. Appl Microbiol Biotechnol 2005, 66(6):597-611.
57. Rojas MC, Hedden P, Gaskin P, Tudzynski B: The P450-1 gene of Gibberella fujikuroi encodes a multifunctional enzyme in gibberellin biosynthesis. Proc Natl Acad Sci U S A 2001, 98(10):5838-5843.


58. Kuc J: Phytoalexins, stress metabolism, and disease resistance in plants. Annu Rev Phytopathol 1995, 33:275-297.
59. Maloney AP, VanEtten HD: A gene from the fungal plant pathogen Nectria haematococca that encodes the phytoalexin-detoxifying enzyme pisatin demethylase defines a new cytochrome P450 family. Mol Gen Genet 1994, 243(5):506-514.
60. Kis-Papo T, Oren A, Wasser SP, Nevo E: Survival of filamentous fungi in hypersaline Dead Sea water. Microb Ecol 2003, 45(2):183-190.
61. Aoyama Y, Noshiro M, Gotoh O, Imaoka S, Funae Y, Kurosawa N, Horiuchi T, Yoshida Y: Sterol 14-demethylase P450 (P45014DM*) is one of the most ancient and conserved P450 species. J Biochem 1996, 119(5):926-933.
62. Kelly SL, Lamb DC, Baldwin BC, Corran AJ, Kelly DE: Characterization of Saccharomyces cerevisiae CYP61, sterol delta22-desaturase, and inhibition by azole antifungal agents. J Biol Chem 1997, 272(15):9986-9988.
63. Coleman JJ, White GJ, Rodriguez-Carres M, Vanetten HD: An ABC transporter and a cytochrome P450 of Nectria haematococca MPVI are virulence factors on pea and are the major tolerance mechanisms to the phytoalexin pisatin. Mol Plant Microbe Interact 2011, 24(3):368-376.
64. Collemare J, Pianfetti M, Houlle AE, Morin D, Camborde L, Gagey MJ, Barbisan C, Fudal I, Lebrun MH, Bohnert HU: Magnaporthe grisea avirulence gene ACE1 belongs to an infection-specific gene cluster involved in secondary metabolism. New Phytol 2008, 179(1):196-208.
65. Leal GA, Gomes LH, Albuquerque PS, Tavares FC, Figueira A: Searching for Moniliophthora perniciosa pathogenicity genes. Fungal Biol 2010, 114(10):842-854.
66. Yan X, Ma WB, Li Y, Wang H, Que YW, Ma ZH, Talbot NJ, Wang ZY: A sterol 14alpha-demethylase is required for conidiation, virulence and for mediating sensitivity to sterol demethylation inhibitors by the rice blast fungus Magnaporthe oryzae. Fungal Genet Biol 2011, 48(2):144-153.
67. Gao Q, Jin K, Ying SH, Zhang Y, Xiao G, Shang Y, Duan Z, Hu X, Xie XQ, Zhou G et al: Genome sequencing and comparative transcriptomics of the model entomopathogenic fungi Metarhizium anisopliae and M. acridum. PLoS genetics 2011, 7(1):e1001264.
68. Lah L, Podobnik B, Novak M, Korosec B, Berne S, Vogelsang M, Krasevec N, Zupanec N, Stojan J, Bohlmann J et al: The versatility of the fungal cytochrome P450 monooxygenase system is instrumental in xenobiotic detoxification. Mol Microbiol 2011, 81(5):1374-1389.
69. Martinez DA, Oliver BG, Graser Y, Goldberg JM, Li W, Martinez-Rossi NM, Monod M, Shelest E, Barton RC, Birch E et al: Comparative genome analysis of Trichophyton rubrum and related dermatophytes reveals candidate genes involved in infection. mBio 2012, 3(5):e00259-00212.
70. Lewis DF, Watson E, Lake BG: Evolution of the cytochrome P450 superfamily: sequence alignments and pharmacogenetics. Mutat Res 1998, 410(3):245-270.
71. Yoder OC, Turgeon BG: Fungal genomics and pathogenicity. Curr Opin Plant Biol 2001, 4(4):315-321.
72. Nebert DW, Adesnik M, Coon MJ, Estabrook RW, Gonzalez FJ, Guengerich FP, Gunsalus IC, Johnson EF, Kemper B, Levin W et al: The P450 gene superfamily: recommended nomenclature. DNA 1987, 6(1):1-11.
73. Nelson DR: Metazoan cytochrome P450 evolution. Comp Biochem Physiol C Pharmacol Toxicol Endocrinol 1998, 121(1-3):15-22.
74. Deng J, Carbone I, Dean RA: The evolutionary history of cytochrome P450 genes in four filamentous Ascomycetes. BMC Evol Biol 2007, 7:30.
75. Yadav JS, Doddapaneni H, Subramanian V: P450ome of the white rot fungus Phanerochaete chrysosporium: structure, evolution and regulation of expression of genomic P450 clusters. Biochem Soc Trans 2006, 34(Pt 6):1165-1169.


76. Park J, Lee S, Choi J, Ahn K, Park B, Kang S, Lee YH: Fungal cytochrome P450 database. BMC genomics 2008, 9:402.
77. Davis ND, Diener UL, Eldridge DW: Production of aflatoxins B1 and G1 by Aspergillus flavus in a semisynthetic medium. Appl Microbiol 1966, 14(3):378-380.
78. Denning DW: Invasive aspergillosis. Clin Infect Dis 1998, 26(4):781-803; quiz 804-785.
79. Denning DW, Follansbee SE, Scolaro M, Norris S, Edelstein H, Stevens DA: Pulmonary aspergillosis in the acquired immunodeficiency syndrome. N Engl J Med 1991, 324(10):654-662.
80. Rokas A, Payne G, Fedorova ND, Baker SE, Machida M, Yu J, Georgianna DR, Dean RA, Bhatnagar D, Cleveland TE et al: What can comparative genomics tell us about species concepts in the genus Aspergillus? Stud Mycol 2007, 59:11-17.
81. Li L, Stoeckert CJ, Jr., Roos DS: OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res 2003, 13(9):2178-2189.
82. Nelson DR, Koymans L, Kamataki T, Stegeman JJ, Feyereisen R, Waxman DJ, Waterman MR, Gotoh O, Coon MJ, Estabrook RW et al: P450 superfamily: update on new sequences, gene mapping, accession numbers and nomenclature. Pharmacogenetics 1996, 6(1):1-42.
83. Yu J, Chang PK, Cary JW, Wright M, Bhatnagar D, Cleveland TE, Payne GA, Linz JE: Comparative mapping of aflatoxin pathway gene clusters in Aspergillus parasiticus and Aspergillus flavus. Appl Environ Microbiol 1995, 61(6):2365-2371.
84. Woitek S, Unkles SE, Kinghorn JR, Tudzynski B: 3-Hydroxy-3-methylglutaryl-CoA reductase gene of Gibberella fujikuroi: isolation and characterization. Curr Genet 1997, 31(1):38-47.
85. Homann V, Mende K, Arntz C, Ilardi V, Macino G, Morelli G, Bose G, Tudzynski B: The isoprenoid pathway: cloning and characterization of fungal FPPS genes. Curr Genet 1996, 30(3):232-239.
86. Mende K, Homann V, Tudzynski B: The geranylgeranyl diphosphate synthase gene of Gibberella fujikuroi: isolation and expression. Mol Gen Genet 1997, 255(1):96-105.
87. Tudzynski B, Holter K: Gibberellin biosynthetic pathway in Gibberella fujikuroi: evidence for a gene cluster. Fungal Genet Biol 1998, 25(3):157-170.
88. Tudzynski B, Kawaide H, Kamiya Y: Gibberellin biosynthesis in Gibberella fujikuroi: cloning and characterization of the copalyl diphosphate synthase gene. Curr Genet 1998, 34(3):234-240.
89. Tudzynski B, Homann V, Feng B, Marzluf GA: Isolation, characterization and disruption of the areA nitrogen regulatory gene of Gibberella fujikuroi. Mol Gen Genet 1999, 261(1):106-114.
90. Tudzynski B, Rojas MC, Gaskin P, Hedden P: The gibberellin 20-oxidase of Gibberella fujikuroi is a multifunctional monooxygenase. J Biol Chem 2002, 277(24):21246-21253.
91. Tudzynski B, Mihlan M, Rojas MC, Linnemannstons P, Gaskin P, Hedden P: Characterization of the final two genes of the gibberellin biosynthesis gene cluster of Gibberella fujikuroi: des and P450-3 encode GA4 desaturase and the 13-hydroxylase, respectively. J Biol Chem 2003, 278(31):28635-28643.
92. Voss T, Schulte J, Tudzynski B: A new MFS-transporter gene next to the gibberellin biosynthesis gene cluster of Gibberella fujikuroi is not involved in gibberellin secretion. Curr Genet 2001, 39(5-6):377-383.
93. Mukherjee M, Horwitz BA, Sherkhane PD, Hadar R, Mukherjee PK: A secondary metabolite biosynthesis cluster in Trichoderma virens: evidence from analysis of genes underexpressed in a mutant defective in morphogenesis and antibiotic production. Curr Genet 2006, 50(3):193-202.
94. Jeanmougin F, Thompson JD, Gouy M, Higgins DG, Gibson TJ: Multiple sequence alignment with Clustal X. Trends Biochem Sci 1998, 23(10):403-405.
95. Kumar S, Nei M, Dudley J, Tamura K: MEGA: a biologist-centric software for evolutionary analysis of DNA and protein sequences. Brief Bioinform 2008, 9(4):299-306.


96. Bernhardt R: Cytochromes P450 as versatile biocatalysts. J Biotechnol 2006, 124(1):128-145.
97. Guengerich FP: Cytochrome p450 and chemical toxicology. Chem Res Toxicol 2008, 21(1):70-83.
98. Schuler MA, Werck-Reichhart D: Functional genomics of P450s. Annu Rev Plant Biol 2003, 54:629-667.
99. Cresnar B, Petric S: Cytochrome P450 enzymes in the fungal kingdom. Biochim Biophys Acta, 1814(1):29-35.
100. Gonzalez FJ, Nebert DW: Evolution of the P450 gene superfamily: animal-plant 'warfare', molecular drive and human genetic differences in drug oxidation. Trends Genet 1990, 6(6):182-186.
101. Siewers V, Viaud M, Jimenez-Teja D, Collado IG, Gronover CS, Pradier JM, Tudzynski B, Tudzynski P: Functional analysis of the cytochrome P450 monooxygenase gene bcbot1 of Botrytis cinerea indicates that botrydial is a strain-specific virulence factor. Mol Plant Microbe Interact 2005, 18(6):602-612.
102. Soanes DM, Richards TA, Talbot NJ: Insights from sequencing fungal and oomycete genomes: what can we learn about plant disease and the evolution of pathogenicity? Plant Cell 2007, 19(11):3318-3326.
103. Nelson DR, Kamataki T, Waxman DJ, Guengerich FP, Estabrook RW, Feyereisen R, Gonzalez FJ, Coon MJ, Gunsalus IC, Gotoh O et al: The P450 superfamily: update on new sequences, gene mapping, accession numbers, early trivial names of enzymes, and nomenclature. DNA Cell Biol 1993, 12(1):1-51.
104. Nelson D, Werck-Reichhart D: A P450-centric view of plant evolution. Plant J 2011, 66(1):194-211.
105. Feyereisen R: Arthropod CYPomes illustrate the tempo and mode in P450 evolution. Biochim Biophys Acta 2011, 1814(1):19-28.
106. Krause A, Stoye J, Vingron M: Large scale hierarchical clustering of protein sequences. BMC Bioinformatics 2005, 6:15.
107. Enright AJ, Van Dongen S, Ouzounis CA: An efficient algorithm for large-scale detection of protein families. Nucleic acids research 2002, 30(7):1575-1584.
108. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol 1990, 215(3):403-410.
109. Reed WJ, Hughes BD: A model explaining the size distribution of gene and protein families. Math Biosci 2004, 189(1):97-102.
110. Unger R, Uliel S, Havlin S: Scaling law in sizes of protein sequence families: from super-families to orphan genes. Proteins 2003, 51(4):569-576.
111. Syed K, Doddapaneni H, Subramanian V, Lam YW, Yadav JS: Genome-to-function characterization of novel fungal P450 monooxygenases oxidizing polycyclic aromatic hydrocarbons (PAHs). Biochem Biophys Res Commun 2010, 399(4):492-497.
112. Ide M, Ichinose H, Wariishi H: Molecular identification and functional characterization of cytochrome P450 monooxygenases from the brown-rot basidiomycete Postia placenta. Arch Microbiol 2012, 194(4):243-253.
113. Doddapaneni H, Chakraborty R, Yadav JS: Genome-wide structural and evolutionary analysis of the P450 monooxygenase genes (P450ome) in the white rot fungus Phanerochaete chrysosporium: evidence for gene duplications and extensive gene clustering. BMC genomics 2005, 6:92.
114. Kitazume T, Takaya N, Nakayama N, Shoun H: Fusarium oxysporum fatty-acid subterminal hydroxylase (CYP505) is a membrane-bound eukaryotic counterpart of Bacillus megaterium cytochrome P450BM3. J Biol Chem 2000, 275(50):39734-39740.


115. Zimmer T, Ohkuma M, Ohta A, Takagi M, Schunck WH: The CYP52 multigene family of Candida maltosa encodes functionally diverse n-alkane-inducible cytochromes P450. Biochem Biophys Res Commun 1996, 224(3):784-789.
116. Craft DL, Madduri KM, Eshoo M, Wilson CR: Identification and characterization of the CYP52 family of Candida tropicalis ATCC 20336, important for the conversion of fatty acids and alkanes to alpha,omega-dicarboxylic acids. Appl Environ Microbiol 2003, 69(10):5983-5991.
117. Aoyama Y: Recent progress in the CYP51 research focusing on its unique evolutionary and functional characteristics as a diversozyme P450. Front Biosci 2005, 10:1546-1557.
118. Ciuffetti LM, Manning VA, Pandelova I, Betts MF, Martinez JP: Host-selective toxins, Ptr ToxA and Ptr ToxB, as necrotrophic effectors in the Pyrenophora tritici-repentis-wheat interaction. New Phytol 2010, 187(4):911-919.
119. Deng J: Structural, Functional and Evolutionary Analyses of the Rice Blast Fungal Genome. Raleigh: North Carolina State University; 2006.
120. Salaun JP, Helvig C: Cytochrome P450-dependent oxidation of fatty acids. Drug Metabol Drug Interact 1995, 12(3-4):261-283.
121. Ohkuma M, Muraoka S, Tanimoto T, Fujii M, Ohta A, Takagi M: CYP52 (cytochrome P450alk) multigene family in Candida maltosa: identification and characterization of eight members. DNA Cell Biol 1995, 14(2):163-173.
122. Oh Y, Donofrio N, Pan H, Coughlan S, Brown DE, Meng S, Mitchell T, Dean RA: Transcriptome analysis reveals new insight into appressorium formation and function in the rice blast fungus Magnaporthe oryzae. Genome Biol 2008, 9(5):R85.
123. Gruber S, Seidl-Seiboth V: Self vs. non-self: Fungal cell wall degradation in Trichoderma. Microbiology 2012.
124. Kubicek CP, Herrera-Estrella A, Seidl-Seiboth V, Martinez DA, Druzhinina IS, Thon M, Zeilinger S, Casas-Flores S, Horwitz BA, Mukherjee PK et al: Comparative genome sequence analysis underscores mycoparasitism as the ancestral life style of Trichoderma. Genome Biol 2011, 12(4):R40.
125. van Gorcom RF, Boschloo JG, Kuijvenhoven A, Lange J, van Vark AJ, Bos CJ, van Balken JA, Pouwels PH, van den Hondel CA: Isolation and molecular characterisation of the benzoate-para-hydroxylase gene (bphA) of Aspergillus niger: a member of a new gene family of the cytochrome P450 superfamily. Mol Gen Genet 1990, 223(2):192-197.
126. Harwood CS, Parales RE: The beta-ketoadipate pathway and the biology of self-identity. Annu Rev Microbiol 1996, 50:553-590.
127. Bénigne-Ernest Amborabéa PF-L, Jean-François Cholletb, Gabriel Roblina: Antifungal effects of salicylic acid and other benzoic acid derivatives towards Eutypa lata: structure–activity relationship. Plant Physiology and Biochemistry 2003, 40(12):1051-1060.
128. Mingot JM, Penalva MA, Fernandez-Canon JM: Disruption of phacA, an Aspergillus nidulans gene encoding a novel cytochrome P450 monooxygenase catalyzing phenylacetate 2-hydroxylation, results in penicillin overproduction. J Biol Chem 1999, 274(21):14545-14550.
129. Mendonca Ade L, da Silva CE, de Mesquita FL, Campos Rda S, Do Nascimento RR, Ximenes EC, Sant'Ana AE: Antimicrobial activities of components of the glandular secretions of leaf cutting ants of the genus Atta. Antonie Van Leeuwenhoek 2009, 95(4):295-303.
130. Tyler BM, Tripathy S, Zhang X, Dehal P, Jiang RH, Aerts A, Arredondo FD, Baxter L, Bensasson D, Beynon JL et al: Phytophthora genome sequences uncover evolutionary origins and mechanisms of pathogenesis. Science 2006, 313(5791):1261-1266.
131. Goddard MR, Burt A: Recurrent invasion and extinction of a selfish gene. Proc Natl Acad Sci U S A 1999, 96(24):13880-13885.


132. Holst-Jensen A, Vaage M, Schumacher T, Johansen S: Structural characteristics and possible horizontal transfer of group I introns between closely related plant pathogenic fungi. Mol Biol Evol 1999, 16(1):114-126.
133. Collins RA, Saville BJ: Independent transfer of mitochondrial chromosomes and plasmids during unstable vegetative fusion in Neurospora. Nature 1990, 345(6271):177-179.
134. Khaldi N, Wolfe KH: Elusive origins of the extra genes in Aspergillus oryzae. PLoS ONE 2008, 3(8):e3036.
135. Lawrence JG, Roth JR: Selfish operons: horizontal transfer may drive the evolution of gene clusters. Genetics 1996, 143(4):1843-1860.
136. Campbell MA, Rokas A, Slot JC: Horizontal transfer and death of a fungal secondary metabolic gene cluster. Genome Biol Evol 2012, 4(3):289-293.
137. Lodeiro S, Xiong Q, Wilson WK, Ivanova Y, Smith ML, May GS, Matsuda SP: Protostadienol biosynthesis and metabolism in the pathogenic fungus Aspergillus fumigatus. Org Lett 2009, 11(6):1241-1244.
138. Mitsuguchi H, Seshime Y, Fujii I, Shibuya M, Ebizuka Y, Kushiro T: Biosynthesis of steroidal antibiotic fusidanes: functional analysis of oxidosqualene cyclase and subsequent tailoring enzymes from Aspergillus fumigatus. J Am Chem Soc 2009, 131(18):6402-6411.
139. Rehman S, Shawl AS, Verma V, Kour A, Athar M, Andrabi R, Sultan P, Qazi GN: An endophytic Neurospora sp. from Nothapodytes foetida producing camptothecin. Prikl Biokhim Mikrobiol 2008, 44(2):225-231.
140. Coleman JJ, Rounsley SD, Rodriguez-Carres M, Kuo A, Wasmann CC, Grimwood J, Schmutz J, Taga M, White GJ, Zhou S et al: The genome of Nectria haematococca: contribution of supernumerary chromosomes to gene expansion. PLoS genetics 2009, 5(8):e1000618.
141. Temporini ED, VanEtten HD: An analysis of the phylogenetic distribution of the pea pathogenicity genes of Nectria haematococca MPVI supports the hypothesis of their origin by horizontal transfer and uncovers a potentially new pathogen of garden pea: Neocosmospora boniensis. Curr Genet 2004, 46(1):29-36.
142. Pinot F, Beisson F: Cytochrome P450 metabolizing fatty acids in plants: characterization and physiological roles. FEBS J 2011, 278(2):195-205.
143. Roberts GA, Celik A, Hunter DJ, Ost TW, White JH, Chapman SK, Turner NJ, Flitsch SL: A self-sufficient cytochrome p450 with a primary structural organization that includes a flavin domain and a [2Fe-2S] redox center. J Biol Chem 2003, 278(49):48914-48920.
144. Hunter DJ, Roberts GA, Ost TW, White JH, Muller S, Turner NJ, Flitsch SL, Chapman SK: Analysis of the domain properties of the novel cytochrome P450 RhF. FEBS Lett 2005, 579(10):2215-2220.
145. Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S: MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol 2011, 28(10):2731-2739.
146. Petric S, Hakki T, Bernhardt R, Zigon D, Cresnar B: Discovery of a 11alpha-hydroxylase from Rhizopus oryzae and its biotechnological application. J Biotechnol 2010, 150(3):428-437.
147. Melo NR, Moran GP, Warrilow AG, Dudley E, Smith SN, Sullivan DJ, Lamb DC, Kelly DE, Coleman DC, Kelly SL: CYP56 (Dit2p) in Candida albicans: characterization and investigation of its role in growth and antifungal drug susceptibility. Antimicrobial agents and chemotherapy 2008, 52(10):3718-3724.
148. Georgianna DR, Fedorova ND, Burroughs JL, Dolezal AL, Bok JW, Horowitz-Brown S, Woloshuk CP, Yu J, Keller NP, Payne GA: Beyond aflatoxin: four distinct expression patterns and functional roles associated with Aspergillus flavus secondary metabolism gene clusters. Mol Plant Pathol, 11(2):213-226.


149. Lin L, Fang W, Liao X, Wang F, Wei D, St Leger RJ: The MrCYP52 cytochrome P450 monoxygenase gene of Metarhizium robertsii is important for utilizing insect epicuticular hydrocarbons. PLoS One 2011, 6(12):e28984.
150. Moktali V, Park J, Fedorova-Abrams ND, Park B, Choi J, Lee YH, Kang S: Systematic and searchable classification of cytochrome P450 proteins encoded by fungal and oomycete genomes. BMC genomics 2012, 13:525.
151. Collemare J, Billard A, Bohnert HU, Lebrun MH: Biosynthesis of secondary metabolites in the rice blast fungus Magnaporthe grisea: the role of hybrid PKS-NRPS in pathogenicity. Mycological research 2008, 112(Pt 2):207-215.
152. Kobayashi T, Abe K, Asai K, Gomi K, Juvvadi PR, Kato M, Kitamoto K, Takeuchi M, Machida M: Genomics of Aspergillus oryzae. Bioscience, biotechnology, and biochemistry 2007, 71(3):646-670.
153. O'Connell RJ, Thon MR, Hacquard S, Amyotte SG, Kleemann J, Torres MF, Damm U, Buiate EA, Epstein L, Alkan N et al: Lifestyle transitions in plant pathogenic Colletotrichum fungi deciphered by genome and transcriptome analyses. Nat Genet 2012, 44(9):1060-1065.
154. De Bie T, Cristianini N, Demuth JP, Hahn MW: CAFE: a computational tool for the study of gene family evolution. Bioinformatics 2006, 22(10):1269-1271.
155. Ohta T: Role of gene duplication in evolution. Genome / National Research Council Canada = Genome / Conseil national de recherches Canada 1989, 31(1):304-310.
156. Rosgaard L, Pedersen S, Cherry JR, Harris P, Meyer AS: Efficiency of new fungal cellulase systems in boosting enzymatic degradation of barley straw lignocellulose. Biotechnology progress 2006, 22(2):493-498.
157. Santana MF, Silva JC, Batista AD, Ribeiro LE, da Silva GF, de Araujo EF, de Queiroz MV: Abundance, distribution and potential impact of transposable elements in the genome of Mycosphaerella fijiensis. BMC Genomics 2012, 13:720.
158. Powell AJ, Conant GC, Brown DE, Carbone I, Dean RA: Altered patterns of gene duplication and differential gene gain and loss in fungal pathogens. BMC genomics 2008, 9:147.
159. Sharpton TJ, Stajich JE, Rounsley SD, Gardner MJ, Wortman JR, Jordar VS, Maiti R, Kodira CD, Neafsey DE, Zeng Q et al: Comparative genomic analyses of the human fungal pathogens Coccidioides and their relatives. Genome Res 2009, 19(10):1722-1731.
160. Islam MS, Haque MS, Islam MM, Emdad EM, Halim A, Hossen QM, Hossain MZ, Ahmed B, Rahim S, Rahman MS et al: Tools to kill: genome of one of the most destructive plant pathogenic fungi Macrophomina phaseolina. BMC Genomics 2012, 13:493.
161. Amselem J, Cuomo CA, van Kan JA, Viaud M, Benito EP, Couloux A, Coutinho PM, de Vries RP, Dyer PS, Fillinger S et al: Genomic analysis of the necrotrophic fungal pathogens Sclerotinia sclerotiorum and Botrytis cinerea. PLoS genetics 2011, 7(8):e1002230.
162. Iida T, Sumita T, Ohta A, Takagi M: The cytochrome P450ALK multigene family of an n-alkane-assimilating yeast, Yarrowia lipolytica: cloning and characterization of genes coding for new CYP52 family members. Yeast 2000, 16(12):1077-1087.
163. Hohn TM, Desjardins AE, McCormick SP: The Tri4 gene of Fusarium sporotrichioides encodes a cytochrome P450 monooxygenase involved in trichothecene biosynthesis. Mol Gen Genet 1995, 248(1):95-102.
164. Ehrlich KC, Chang PK, Yu J, Cotty PJ: Aflatoxin biosynthesis cluster gene cypA is required for G aflatoxin formation. Appl Environ Microbiol 2004, 70(11):6518-6524.
165. Rhome R, Del Poeta M: Sphingolipid signaling in fungal pathogens. Advances in experimental medicine and biology 2010, 688:232-237.


166. Bao D, Gong M, Zheng H, Chen M, Zhang L, Wang H, Jiang J, Wu L, Zhu Y, Zhu G et al: Sequencing and Comparative Analysis of the Straw Mushroom (Volvariella volvacea) Genome. PLoS One 2013, 8(3):e58294.
167. Subramanian V, Yadav JS: Role of P450 monooxygenases in the degradation of the endocrine-disrupting chemical nonylphenol by the white rot fungus Phanerochaete chrysosporium. Appl Environ Microbiol 2009, 75(17):5570-5580.
168. Brown DW, McCormick SP, Alexander NJ, Proctor RH, Desjardins AE: Inactivation of a cytochrome P-450 is a determinant of trichothecene diversity in Fusarium species. Fungal Genet Biol 2002, 36(3):224-233.
169. Makarova KS, Wolf YI, Mekhedov SL, Mirkin BG, Koonin EV: Ancestral paralogs and pseudoparalogs and their role in the emergence of the eukaryotic cell. Nucleic Acids Res 2005, 33(14):4626-4638.
170. Suyama M, Torrents D, Bork P: PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res 2006, 34(Web Server issue):W609-612.
171. Massingham T, Goldman N: Detecting amino acid sites under positive selection and purifying selection. Genetics 2005, 169(3):1753-1762.
172. Sanderson MJ: r8s: inferring absolute rates of molecular evolution and divergence times in the absence of a molecular clock. Bioinformatics 2003, 19(2):301-302.
173. Cunningham KW: Acidic calcium stores of Saccharomyces cerevisiae. Cell calcium 2011, 50(2):129-138.
174. Schwaller B: The regulation of a cell's Ca(2+) signaling toolkit: the Ca(2+) homeostasome. Advances in experimental medicine and biology 2012, 740:1-25.
175. Vadassery J, Oelmuller R: Calcium signaling in pathogenic and beneficial plant microbe interactions: what can we learn from the interaction between Piriformospora indica and Arabidopsis thaliana. Plant signaling & behavior 2009, 4(11):1024-1027.
176. Rispail N, Soanes DM, Ant C, Czajkowski R, Grunler A, Huguet R, Perez-Nadales E, Poli A, Sartorel E, Valiante V et al: Comparative genomics of MAP kinase and calcium-calcineurin signalling components in plant and human pathogenic fungi. Fungal genetics and biology: FG & B 2009, 46(4):287-298.
177. Kim HS, Czymmek KJ, Patel A, Modla S, Nohe A, Duncan R, Gilroy S, Kang S: Expression of the Cameleon calcium biosensor in fungi reveals distinct Ca(2+) signatures associated with polarized growth, development, and pathogenesis. Fungal genetics and biology: FG & B 2012, 49(8):589-601.
178. Choi J, Kim KS, Rho HS, Lee YH: Differential roles of the phospholipase C genes in fungal development and pathogenicity of Magnaporthe oryzae. Fungal genetics and biology: FG & B 2011, 48(4):445-455.
179. Zhang J, Silao FG, Bigol UG, Bungay AA, Nicolas MG, Heitman J, Chen YL: Calcineurin is required for pseudohyphal growth, virulence, and drug resistance in Candida lusitaniae. PloS one 2012, 7(8):e44192.
180. Shendure J, Lieberman Aiden E: The expanding scope of DNA sequencing. Nature biotechnology 2012, 30(11):1084-1094.
181. Bencina M, Bagar T, Lah L, Krasevec N: A comparative genomic analysis of calcium and proton signaling/homeostasis in Aspergillus species. Fungal genetics and biology: FG & B 2009, 46 Suppl 1:S93-S104.
182. Nagamune K, Sibley LD: Comparative genomic and phylogenetic analyses of calcium ATPases and calcium-regulated proteins in the apicomplexa. Molecular biology and evolution 2006, 23(8):1613-1627.

!163 !

183.! Nguyen!QB,!Kadotani!N,!Kasahara!S,!Tosa!Y,!Mayama!S,!Nakayashiki!H:!Systematic! functional! analysis!of!calciumJsignalling!proteins!in!the!genome!of!the!riceJblast!fungus,!Magnaporthe! oryzae,! using! a! highJthroughput! RNAJsilencing! system.! Molecular# microbiology# 2008,! 68(6):1348Q1365.! 184.! Choi!J,!Cheong!K,!Jung!K,!Jeon!J,!Lee!GW,!Kang!S,!Kim!S,!Lee!YW,!Lee!YH:!CFGP!2.0:!a!versatile! webJbased! platform! for! supporting! comparative! and! evolutionary! genomics! of! fungi! and! Oomycetes.!Nucleic#acids#research#2013,!41(Database!issue):D714Q719.! 185.! Engel!SR,!Cherry!JM:!The!new!modern!era!of!yeast!genomics:!community!sequencing!and!the! resulting! annotation! of! multiple! Saccharomyces! cerevisiae! strains! at! the! Saccharomyces! Genome! Database.! Database# :# the# journal# of# biological# databases# and# curation# 2013,! 2013:bat012.! 186.! Ma!LJ,!Ibrahim!AS,!Skory!C,!Grabherr!MG,!Burger!G,!Butler!M,!Elias!M,!Idnurm!A,!Lang!BF,!Sone!T# et#al:!Genomic!analysis!of!the!basal!lineage!fungus!Rhizopus!oryzae!reveals!a!wholeJgenome! duplication.!PLoS#genetics#2009,!5(7):e1000549.! 187.! Martin!DC,!Kim!H,!Mackin!NA,!MaldonadoQBaez!L,!Evangelista!CC,!Jr.,!Beaudry!VG,!Dudgeon!DD,! Naiman!DQ,!Erdman!SE,!Cunningham!KW:!New!regulators!of!a!high!affinity!Ca2+!influx!system! revealed! through! a! genomeJwide! screen! in! yeast.! The# Journal# of# biological# chemistry# 2011,! 286(12):10744Q10754.! 188.! Cavinder!B,!Trail!F:!Role! of! Fig1,! a! component! of! the! lowJaffinity! calcium! uptake! system,! in! growth!and!sexual!development!of!filamentous!fungi.!Eukaryotic#cell#2012,!11(8):978Q988.! 189.! Chang!Y,!Schlenstedt!G,!Flockerzi!V,!Beck!A:!Properties!of!the!intracellular!transient!receptor! potential!(TRP)!channel!in!yeast,!Yvc1.!FEBS#letters#2010,!584(10):2028Q2032.! 190.! Hille!B:!Ionic!channels:!molecular!pores!of!excitable!membranes.!Harvey#lectures#1986,!82:47Q 69.! 191.! 
Paidhungat!M,!Garrett!S:!A!homolog!of!mammalian,!voltageJgated!calcium!channels!mediates! yeast! pheromoneJstimulated! Ca2+! uptake! and! exacerbates! the! cdc1(Ts)! growth! defect.! Molecular#and#cellular#biology#1997,!17(11):6339Q6347.! 192.! Muller!EM,!Mackin!NA,!Erdman!SE,!Cunningham!KW:!Fig1p!facilitates!Ca2+!influx!and!cell!fusion! during! mating! of! Saccharomyces! cerevisiae.! The# Journal# of# biological# chemistry# 2003,! 278(40):38461Q38469.! 193.! Yang!M,!Brand!A,!Srikantha!T,!Daniels!KJ,!Soll!DR,!Gow!NA:!Fig1!facilitates!calcium!influx!and! localizes! to! membranes! destined! to! undergo! fusion! during! mating! in! Candida! albicans.! Eukaryotic#cell#2011,!10(3):435Q444.! 194.! Palmer! CP,! Zhou! XL,! Lin! J,! Loukin! SH,! Kung! C,! Saimi! Y:! A! TRP! homolog! in! Saccharomyces! cerevisiae!forms!an!intracellular!Ca(2+)Jpermeable!channel!in!the!yeast!vacuolar!membrane.! Proceedings# of# the# National# Academy# of# Sciences# of# the# United# States# of# America# 2001,! 98(14):7801Q7805.! 195.! Denis! V,! Cyert! MS:! Internal! Ca(2+)! release! in! yeast! is! triggered! by! hypertonic! shock! and! mediated!by!a!TRP!channel!homologue.!The#Journal#of#cell#biology#2002,!156(1):29Q34.! 196.! Axelsen! KB,! Palmgren! MG:! Evolution! of! substrate! specificities! in! the! PJtype! ATPase! superfamily.!Journal#of#molecular#evolution#1998,!46(1):84Q101.! 197.! Hua!Z,!Graham!TR:!Requirement!for!neo1p!in!retrograde!transport!from!the!Golgi!complex!to! the!endoplasmic!reticulum.!Molecular#biology#of#the#cell#2003,!14(12):4971Q4983.! 198.! Zelter!A,!Bencina!M,!Bowman!BJ,!Yarden!O,!Read!ND:!A!comparative!genomic!analysis!of!the! calcium! signaling! machinery! in! Neurospora! crassa,! Magnaporthe! grisea,! and! Saccharomyces! cerevisiae.!Fungal#genetics#and#biology#:#FG#&#B#2004,!41(9):827Q841.!

!164 !

199.! Vashist!S,!Frank!CG,!Jakob!CA,!Ng!DT:!Two! distinctly! localized! pJtype! ATPases! collaborate! to! maintain! organelle! homeostasis! required! for! glycoprotein! processing! and! quality! control.! Molecular#biology#of#the#cell#2002,!13(11):3955Q3966.! 200.! Cai!X,!Clapham!DE:!Ancestral!Ca2+!signaling!machinery!in!early!animal!and!fungal!evolution.! Molecular#biology#and#evolution#2012,!29(1):91Q100.! 201.! Hoeflich! KP,! Ikura! M:! Calmodulin! in! action:! diversity! in! target! recognition! and! activation! mechanisms.!Cell#2002,!108(6):739Q742.! 202.! Kraus!PR,!Heitman!J:!Coping!with!stress:!calmodulin!and!calcineurin!in!model!and!pathogenic! fungi.!Biochemical#and#biophysical#research#communications#2003,!311(4):1151Q1157.! 203.! Cagnac!O,!Leterrier!M,!Yeager!M,!Blumwald!E:!Identification!and!characterization!of!Vnx1p,!a! novel! type! of! vacuolar! monovalent! cation/H+! antiporter! of! Saccharomyces! cerevisiae.! The# Journal#of#biological#chemistry#2007,!282(33):24284Q24293.! 204.! Cagnac!O,!ArandaQSicilia!MN,!Leterrier!M,!RodriguezQRosales!MP,!Venema!K:!Vacuolar!cation/H+! antiporters! of! Saccharomyces! cerevisiae.! The# Journal# of# biological# chemistry# 2010,! 285(44):33914Q33922.! 205.! Cavinder!B,!Hamam!A,!Lew!RR,!Trail!F:!Mid1,!a!mechanosensitive!calcium!ion!channel,!affects! growth,! development,! and! ascospore! discharge! in! the! filamentous! fungus! Gibberella! zeae.! Eukaryotic#cell#2011,!10(6):832Q841.! 206.! Bouillet! LE,! Cardoso! AS,! Perovano! E,! Pereira! RR,! Ribeiro! EM,! Tropia! MJ,! Fietto! LG,! Tisi! R,! Martegani!E,!Castro!IM#et#al:!The! involvement! of! calcium! carriers! and! of! the! vacuole! in! the! glucoseJinduced! calcium! signaling! and! activation! of! the! plasma! membrane! H(+)JATPase! in! Saccharomyces!cerevisiae!cells.!Cell#calcium#2012,!51(1):72Q81.! 207.! Strayle!J,!Pozzan!T,!Rudolph!HK:!SteadyJstate!free!Ca(2+)!in!the!yeast!endoplasmic!reticulum! 
reaches!only!10!microM!and!is!mainly!controlled!by!the!secretory!pathway!pump!pmr1.!The# EMBO#journal#1999,!18(17):4733Q4743.! 208.! Li! X,! Qian! J,! Wang! C,! Zheng! K,! Ye! L,! Fu! Y,! Han! N,! Bian! H,! Pan! J,! Wang! J# et# al:! Regulating! cytoplasmic! calcium! homeostasis! can! reduce! aluminum! toxicity! in! yeast.! PloS# one# 2011,! 6(6):e21148.! 209.! Krauke!Y,!Sychrova!H:!Cnh1!Na(+)!/H(+)!antiporter!and!Ena1!Na(+)!JATPase!play!different!roles! in! cation! homeostasis! and! cell! physiology! of! Candida! glabrata.! FEMS# yeast# research# 2011,! 11(1):29Q41.! 210.! Yu!Q,!Wang!H,!Xu!N,!Cheng!X,!Wang!Y,!Zhang!B,!Xing!L,!Li!M:!Spf1!strongly!influences!calcium! homeostasis,! hyphal! development,! biofilm! formation! and! virulence! in! Candida! albicans.! Microbiology#2012,!158(Pt!9):2272Q2282.! 211.! Kmetzsch! L,! Staats! CC,! Simon! E,! Fonseca! FL,! de! Oliveira! DL,! Sobrino! L,! Rodrigues! J,! Leal! AL,! Nimrichter! L,! Rodrigues! ML# et# al:! The! vacuolar! Ca(2)(+)! exchanger! Vcx1! is! involved! in! calcineurinJdependent! Ca(2)(+)! tolerance! and! virulence! in! Cryptococcus! neoformans.! Eukaryotic#cell#2010,!9(11):1798Q1805.! 212.! Raffaello! A,! De! Stefani! D,! Rizzuto! R:! The! mitochondrial! Ca(2+)! uniporter.! Cell# calcium# 2012,! 52(1):16Q21.! 213.! Perocchi!F,!Gohil!VM,!Girgis!HS,!Bao!XR,!McCombs!JE,!Palmer!AE,!Mootha!VK:!MICU1!encodes!a! mitochondrial!EF!hand!protein!required!for!Ca(2+)!uptake.!Nature#2010,!467(7313):291Q296.! 214.! Desrivieres! S,! Cooke! FT,! MoralesQJohansson! H,! Parker! PJ,! Hall! MN:! Calmodulin! controls! organization! of! the! actin! cytoskeleton! via! regulation! of! phosphatidylinositol! (4,5)J bisphosphate! synthesis! in! Saccharomyces! cerevisiae.! The# Biochemical# journal# 2002,! 366(Pt! 3):945Q951.!

!165 !

215.! GarrettQEngele!P,!Moilanen!B,!Cyert!MS:!Calcineurin,!the!Ca2+/calmodulinJdependent!protein! phosphatase,!is!essential!in!yeast!mutants!with!cell!integrity!defects!and!in!mutants!that!lack!a! functional!vacuolar!H(+)JATPase.!Molecular#and#cellular#biology#1995,!15(8):4103Q4114.! 216.! Cruz! MC,! Goldstein! AL,! Blankenship! JR,! Del! Poeta! M,! Davis! D,! Cardenas! ME,! Perfect! JR,! McCusker! JH,! Heitman! J:! Calcineurin! is! essential! for! survival! during! membrane! stress! in! Candida!albicans.!The#EMBO#journal#2002,!21(4):546Q559.! 217.! Parlati!F,!Dominguez!M,!Bergeron!JJ,!Thomas!DY:!Saccharomyces!cerevisiae!CNE1!encodes!an! endoplasmic! reticulum! (ER)! membrane! protein! with! sequence! similarity! to! calnexin! and! calreticulin!and!functions!as!a!constituent!of!the!ER!quality!control!apparatus.!The#Journal#of# biological#chemistry#1995,!270(1):244Q253.! 218.! Miyazaki! T,! Izumikawa! K,! Nagayoshi! Y,! Saijo! T,! Yamauchi! S,! Morinaga! Y,! Seki! M,! Kakeya! H,! Yamamoto!Y,!Yanagihara!K#et#al:!Functional!characterization!of!the!regulators!of!calcineurin!in! Candida!glabrata.!FEMS#yeast#research#2011,!11(8):621Q630.! 219.! Choi! J,! Kim! Y,! Kim! S,! Park! J,! Lee! YH:! MoCRZ1,! a! gene! encoding! a! calcineurinJresponsive! transcription! factor,! regulates! fungal! growth! and! pathogenicity! of! Magnaporthe! oryzae.! Fungal#genetics#and#biology#:#FG#&#B#2009,!46(3):243Q254.! 220.! Moser! MJ,! Geiser! JR,! Davis! TN:! Ca2+Jcalmodulin! promotes! survival! of! pheromoneJinduced! growth! arrest! by! activation! of! calcineurin! and! Ca2+JcalmodulinJdependent! protein! kinase.! Molecular#and#cellular#biology#1996,!16(9):4824Q4831.! 221.! Ramne!A,!BilslandQMarchesan!E,!Erickson!S,!Sunnerhagen!P:!The!protein!kinases!Rck1!and!Rck2! inhibit!meiosis!in!budding!yeast.!Molecular#&#general#genetics#:#MGG#2000,!263(2):253Q261.! 222.! Maringele!L,!Lydall!D:!EXO1Jdependent!singleJstranded!DNA!at!telomeres!activates!subsets!of! 
DNA!damage!and!spindle!checkpoint!pathways!in!budding!yeast!yku70Delta!mutants.!Genes#&# development#2002,!16(15):1919Q1933.! 223.! Crider!DG,!GarciaQRodriguez!LJ,!Srivastava!P,!PerazaQReyes!L,!Upadhyaya!K,!Boldogh!IR,!Pon!LA:! Rad53! is! essential! for! a! mitochondrial! DNA! inheritance! checkpoint! regulating! G1! to! S! progression.!The#Journal#of#cell#biology#2012,!198(5):793Q798.! 224.! Nath! N,! McCartney! RR,! Schmidt! MC:! Yeast! Pak1! kinase! associates! with! and! activates! Snf1.! Molecular#and#cellular#biology#2003,!23(11):3909Q3917.! 225.! Morishita!M,!Engebrecht!J:!End3pJmediated!endocytosis!is!required!for!spore!wall!formation!in! Saccharomyces!cerevisiae.!Genetics#2005,!170(4):1561Q1574.! 226.! D'Aquino!KE,!MonjeQCasas!F,!Paulson!J,!Reiser!V,!Charles!GM,!Lai!L,!Shokat!KM,!Amon!A:!The! protein! kinase! Kin4! inhibits! exit! from! mitosis! in! response! to! spindle! position! defects.! Molecular#cell#2005,!19(2):223Q234.! 227.! Igarashi! R,! Suzuki! M,! Nogami! S,! Ohya! Y:! Molecular! dissection! of! ARP1! regions! required! for! nuclear!migration!and!cell!wall!integrity!checkpoint!functions!in!Saccharomyces!cerevisiae.!Cell# structure#and#function#2005,!30(2):57Q67.! 228.! Wattam!AR,!Inzana!TJ,!Williams!KP,!Mane!SP,!Shukla!M,!Almeida!NF,!Dickerman!AW,!Mason!S,! Moriyon! I,! O'Callaghan! D# et# al:! Comparative! genomics! of! earlyJdiverging! Brucella! strains! reveals!a!novel!lipopolysaccharide!biosynthesis!pathway.!mBio#2012,!3(5):e00246Q00211.! 229.! Ferrer! L,! Shearer! AG,! Karp! PD:! Discovering! novel! subsystems! using! comparative! genomics.! Bioinformatics#2011,!27(18):2478Q2485.! 230.! Donkor! ES,! Stabler! RA,! Hinds! J,! Adegbola! RA,! Antonio! M,! Wren! BW:! Comparative! phylogenomics! of! Streptococcus! pneumoniae! isolated! from! invasive! disease! and! nasopharyngeal!carriage!from!West!Africans.!BMC#Genomics#2012,!13:569.! 231.! Bushley!KE,!Turgeon!BG:!Phylogenomics! reveals! subfamilies! of! fungal! nonribosomal! peptide! 
synthetases!and!their!evolutionary!relationships.!BMC#Evol#Biol#2010,!10:26.!

!166 !

232.! Ali!A,!Soares!SC,!Santos!AR,!Guimaraes!LC,!Barbosa!E,!Almeida!SS,!Abreu!VA,!Carneiro!AR,!Ramos! RT,!Bakhtiar!SM#et#al:!Campylobacter!fetus!subspecies:!comparative!genomics!and!prediction! of!potential!virulence!targets.!Gene#2012,!508(2):145Q156.! 233.! Butt! AM,! Nasrullah! I,! Tahir! S,! Tong! Y:! Comparative! genomics! analysis! of! Mycobacterium! ulcerans! for! the! identification! of! putative! essential! genes! and! therapeutic! candidates.!PLoS# One#2012,!7(8):e43080.! 234.! Kmetzsch!L,!Staats!CC,!Rodrigues!ML,!Schrank!A,!Vainstein!MH:!Calcium!signaling!components! in!the!human!pathogen:!Cryptococcus!neoformans.!Communicative#&#integrative#biology#2011,! 4(2):186Q187.!

!

VITA

Venkatesh Moktali

EDUCATION

2008-2013 The Pennsylvania State University, University Park, PA
Ph.D. in Bioinformatics & Genomics
Dissertation topic: “A Comparative Genomic Investigation of Niche Adaptation in Fungi”

2002-2006 Vellore Institute of Technology, Vellore, India
Bachelor of Technology in Bioinformatics

AWARDS & ACHIEVEMENTS

• USDA-AFRI Fellow, Microbial Genomics grant (2010-2012)
• Evolutionary Genomics Workshop student award, Max Planck Institutes, Germany (2011)
• Fungal Genetics Conference Travel award (2009, 2011)
• University Graduate Fellowship (2007)
• Finalist at the Dow Sustainability Innovation Challenge Award (2012)

PROFESSIONAL AFFILIATIONS

• Student member of the International Society for Computational Biology
• Member of the Bioinformatics & Genomics Association, the College of Agricultural Sciences Advisory Committee, and the Plant Pathology Association at Penn State University
• Former President of the Indian Graduate Students Association at Penn State University
• Co-founder of the Strategy Club at Penn State University

TEACHING EXPERIENCE

• Instructor for a 200-level Biochemistry laboratory course, BMB 201 (Fall 2009)

PUBLICATIONS

• Venkatesh Moktali, Seogchan Kang. “Phylogenomic analysis of Cytochrome P450 proteins from 50 Pezizomycotina fungi.” (Manuscript in preparation)
• Venkatesh Moktali, Bongsoo Park, Seogchan Kang. “The Fungal Calcium Signaling Database: A community resource for calcium signaling in fungi.” (Manuscript in preparation)
• Natalie D. Fedorova, Venkatesh Moktali and Marnix H. Medema. “Bioinformatics approaches and software for detection of secondary metabolic gene clusters.” Fungal Secondary Metabolism, Methods in Molecular Biology, October 14, 2012, Springer Publications, ISBN 978-1-62703-121-9.
• Venkatesh Moktali, Jongsun Park, Yong-Hwan Lee, Seogchan Kang. “Systematic and searchable classification of cytochrome P450 proteins encoded by fungal and oomycete genomes.” BMC Genomics, October 4, 2012, 13:525 (Highly accessed).
