Genome-Wide Upstream Motif Analysis
Total Page:16
File Type:pdf, Size:1020Kb
Oberstaller et al. BMC Genomics 2013, 14:516 http://www.biomedcentral.com/1471-2164/14/516 RESEARCH ARTICLE Open Access Genome-wide upstream motif analysis of Cryptosporidium parvum genes clustered by expression profile Jenna Oberstaller1,2, Sandeep J Joseph1 and Jessica C Kissinger1,2,3* Abstract Background: There are very few molecular genetic tools available to study the apicomplexan parasite Cryptosporidium parvum. The organism is not amenable to continuous in vitro cultivation or transfection, and purification of intracellular developmental stages in sufficient numbers for most downstream molecular applications is difficult and expensive since animal hosts are required. As such, very little is known about gene regulation in C. parvum. Results: We have clustered whole-genome gene expression profiles generated from a previous study of seven post-infection time points of 3,281 genes to identify genes that show similar expression patterns throughout the first 72 hours of in vitro epithelial cell culture. We used the algorithms MEME, AlignACE and FIRE to identify conserved, overrepresented DNA motifs in the upstream promoter region of genes with similar expression profiles. The most overrepresented motifs were E2F (5′-TGGCGCCA-3′); G-box (5′-G.GGGG-3′); a well-documented ApiAP2 binding motif (5′-TGCAT-3′), and an unknown motif (5′-[A/C] AACTA-3′). We generated a recombinant C. parvum DNA-binding protein domain from a putative ApiAP2 transcription factor [CryptoDB: cgd8_810] and determined its binding specificity using protein-binding microarrays. We demonstrate that cgd8_810 can putatively bind the overrepresented G-box motif, implicating this ApiAP2 in the regulation of many gene clusters. Conclusion: Several DNA motifs were identified in the upstream sequences of gene clusters that might serve as potential cis-regulatory elements. These motifs, in concert with protein DNA binding site data, establish for the first time the beginnings of a global C. parvum gene regulatory map that will contribute to our understanding of the development of this zoonotic parasite. Keywords: Apicomplexa, Transcription, Gene regulation, Motif, AP2, E2F, G-box Background recognition of a life-threatening form of infection in im- The AIDS-related protist parasite Cryptosporidium parvum munocompromised individuals [3]. Cryptosporidium was primarily infects the microvillous border of the intes- also recently implicated as a significant pathogen contrib- tinal epithelium, and to a lesser extent extraintestinal uting to moderate-to-severe diarrhea in children under epithelia, causing acute gastrointestinal disease in a two years of age in sub-Saharan Africa, second only to wide range of mammalian hosts. The first case of hu- rotavirus [4]. Seroprevalence rates of 25-35% in the United man Cryptosporidium infection was reported in 1976 States indicate that infection with Cryptosporidium is very [1], and only seven additional cases were documented common among healthy persons [5]. before 1982 [2]. Since then the number of cases iden- C. parvum has a complex, obligate-intracellular life tified has increased dramatically, largely due to the cycle involving both asexual and sexual developmental stages. Transmission of Cryptosporidium occurs through the fecal-oral route where an infection is initiated by the * Correspondence: [email protected] ingestion of oocysts, which release sporozoites capable 1Center for Tropical and Emerging Global Diseases, University of Georgia, of invading intestinal epithelial cells. The parasite’s obli- Athens, GA 30602, USA 2Department of Genetics, University of Georgia, Athens, GA 30602, USA gate intracellular developmental stages are exceedingly Full list of author information is available at the end of the article difficult to study. The volume of parasite material relative © 2013 Oberstaller et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Oberstaller et al. BMC Genomics 2013, 14:516 Page 2 of 20 http://www.biomedcentral.com/1471-2164/14/516 to host cell is vanishingly small. Each parasite is 5 μmor functionally related groups of genes and subsequently smaller (depending on lifecycle stage; [6]) in a host cell characterized the function of some of these conserved that is a hundred to thousand times larger in volume. elements using reporter assays in the parasite [18]. Given these complications of size, the post-infection Behnke et al. (2010) used T. gondii tachyzoite gene ex- parasite cannot be isolated from host cells in sufficient pression profiles to predict regulatory elements in their numbers, nor can sufficient post-infection parasite pro- upstream sequences [19]. Guo and Silva (2008) mined tein or RNA be obtained for most downstream molecular the non-coding sequences in two Theileria genomes applications. Currently, C. parvum is not amenable to and predicted the presence of five putative cis-regula- continuous in vitro cultivation or genetic dissection [7,8]. tory elements [20]. Two previous studies characterized Given the above-mentioned difficulties, transcriptional putative regulatory elements in upstream sequences in regulation in this parasite is largely unknown. Indeed, C. parvum. They grouped genes based on function and transcriptional regulation across the entire apicomplexan looked for conserved DNA motifs in the promoter re- phylum is still poorly understood, though the combin- gions, then correlated these conserved motifs with the ation of computational and bench analyses have yielded RT-PCR expression profiles of the genes examined significant discoveries in the distantly related parasites [21,22]. Many of these classical techniques for the ex- Plasmodium falciparum and Toxoplasma gondii.Genome- perimental analysis of promoters and gene expression wide scans of the phylum for proteins containing possible are not feasible in C. parvum. Alternate approaches are DNA-binding domains revealed several families of DNA- required. The availability of several genome sequences binding proteins including a significant expansion of the [23,24] enabled the design of primers and the quan- Apicomplexan AP2 (ApiAP2) family of transcriptional tification of expression for each gene using semi- regulators [9]. Subsequent experimental analyses con- quantitative-RT-PCR [25]. These transcriptome data lay firmed regulatory roles for several of these ApiAP2 pro- a foundation for inference of gene regulatory mecha- teins [10-12]. Campbell et al. (2010) [13] determined nisms since they can be used in conjunction with the DNA-binding specificities for 20/27 identified members of genome sequence to identify putative cis-acting pro- this family in P. falciparum by generating recombinant moter elements. ApiAP2 proteins and testing them on protein-binding We utilize expression profiles from a study that gener- microarrays (PBMs) [14]. These experiments identified ated whole genome expression data for C. parvum using binding site sequences matching several previously deter- semi-quantitative RealTime-PCR of RNA from seven mined Plasmodium cis-elements. Militello et al. (2004) post-infection time points [25]. Out of 3,805 annotated computationally predicted a cis-regulatory element in the protein-encoding genes, expression data were generated upstream sequences of 8/18 P. falciparum heat shock for 3,281. We standardized these data and clustered gene genes (called the G-Box) and subsequently demonstrated expression profiles using fuzzy c-means (FCM) cluster- the importance of this element through transient transfec- ing. We identified groups of genes with similar expres- tions and mutational analyses [15]. Similarly, Young et al. sion patterns throughout the first 72 hours of the (2008) predicted several cis regulatory elements upstream intracellular life cycle in HCT-8 epithelial cell culture. of Plasmodium genes clustered based on similarity of gene We used motif-finding algorithms to identify conserved, expression profile (21 clusters total) and demonstrated the overrepresented DNA motifs in the upstream region of regulatory importance of one of the predicted elements genes with very similar expression profiles. A recombin- (PfM18.1, 5′-GTGCA-3′) in vitro [16]. Elemento et al. ant C. parvum DNA-binding protein domain from a pu- (2007) developed a powerful bioinformatic approach tak- tative ApiAP2 transcription factor [CryptoDB: cgd8_810] ing advantage of mutual information (expression infor- was generated and tested on PBMs to determine its mation and overrepresentation of short DNA sequences binding specificity. We demonstrate that cgd8_810 can upstream of potentially co-regulated genes) to predict putatively bind an overrepresented G-box motif, provid- several additional putative cis-regulatory elements ing support for our methods and potentially implicating [17]. The fact that Campbell et al. (2010) could iden- this ApiAP2 protein in the regulation of many gene tify specific trans factors that bound many of these mo- clusters. We additionally investigate Cryptosporidium- tifs [13] confirms the power of computational methods specific functionally related genes (Cryptosporidium oo- to predict cis-regulatory elements in Plasmodium. cyst wall proteins), genes found to be co-regulated in Computational methods have been used successfully other organisms (ribosomal proteins), or