YY1's DNA-Binding Motifs in Mammalian Olfactory Receptor Genes Christopher D Faulk and Joomyeong Kim*
BMC Genomics (BioMed Central)
Research article, Open Access

Address: Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
Email: Christopher D Faulk - [email protected]; Joomyeong Kim* - [email protected]
* Corresponding author

Published: 3 December 2009
Received: 21 April 2009
Accepted: 3 December 2009

BMC Genomics 2009, 10:576 doi:10.1186/1471-2164-10-576
This article is available from: http://www.biomedcentral.com/1471-2164/10/576
© 2009 Faulk and Kim; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background: YY1 is an epigenetic regulator for a large number of mammalian genes. While performing genome-wide YY1 binding motif searches, we discovered that the olfactory receptor (OLFR) genes have an unusual cluster of YY1 binding sites within their coding regions. The statistical significance of this observation was further analyzed.

Results: About 45% of the olfactory genes in the mouse have a range of 4-8 YY1 binding sites within their respective 1 kb coding regions. Statistical analyses indicate that this enrichment of YY1 motifs has likely been driven by unknown selection pressures at the DNA level, but not serendipitously by some peptides enriched within the OLFR genes. Similar patterns are also detected in the OLFR genes of all mammals analyzed, but not in the OLFR genes of the fish lineage, suggesting a mammal-specific phenomenon.

Conclusion: YY1, or YY1-related transcription factors, may help regulate olfactory receptor genes.
Furthermore, the protein-coding regions of vertebrate genes can contain cis-regulatory elements for transcription factor binding as well as codons.

Background

The transcription factor YY1 is a Gli-Kruppel type zinc finger protein that is highly conserved from insects through vertebrates [1]. YY1 can function as an activator, repressor, or initiator depending upon the other regulatory elements in the region [2]. YY1 also interacts with a variety of proteins including components of RNA polymerase II complex, transcription factors, and histone-modifying complexes [3-5]. According to genome-wide surveys, about 10% of all human genes contain YY1 binding motifs in their promoter regions [6]. Functionally, YY1 is involved in many biological processes, including embryonic development, cell cycle progression, apoptosis, B cell development, Polycomb group Gene (PcG)-mediated repression, genomic imprinting, and X chromosomal inactivation [2,3,7]. YY1 was also initially identified as a factor controlling the transcriptional activity of the murine retrotransposon 'Intracisternal A Particle' [8]. Since then, many retroposons, including SINE, LINE, and endogenous retrovirus families, have been shown to contain YY1 binding sites in their promoter regions [3,4]. Due to this ubiquitous presence of YY1 binding sites in genome-wide repeats, YY1 has also been regarded as a surveillance gene that is responsible for repressing transcriptional background noise from these repeats [9].

The olfactory receptor (OLFR) genes of mammals encode short, single coding exon, G protein-coupled receptors that are responsible for sensing a large number of airborne scents [10]. This gene family is comprised of over 800 and 1,300 gene members in human and mouse respectively, forming the largest gene family in mammalian genomes [10-13].
The aquatic vertebrates, the teleost fish lineage, also have a similar odorant receptor gene family [14]. However, the odorant receptor (OR) family of the fish lineage consists of a much smaller number of genes than that of the mammals, and these OR genes are also much more diverse in sequence identity than those of mammals. Mammalian OLFR genes are divided into Class I and Class II groups based on sequence identity [15]. Class II genes make up ~90% of OLFRs and are thought to have expanded during the transition to land-based living.

In mammals these olfactory receptors presumably expanded due to the selective advantage conferred by a well-developed sense of smell [15]. While mice and other mammals retain function and expression of almost all OLFRs, the majority of these are pseudogenized in humans [16]. The mammalian OLFR genes are highly tissue-specific and are expressed primarily in the olfactory epithelium, though a subset is expressed in a chemosensory role in other tissues such as kidney and sperm [17-19]. Furthermore, only one copy (allele) out of all 1,000 OLFR genes is selected and expressed in each neuron of the olfactory tissue [20]. The unusual transcriptional control of the OLFR gene family is likely mediated through unknown trans-acting factors [21]. The tissue-specific nature of their expression coupled with their widespread duplication requires a mitotically-stable global silencing mechanism in all cell types. According to recent studies, potential cis-regulatory elements recruiting these trans factors are hypothesized to be located within the protein-coding regions of the OLFR genes rather than their surrounding genomic regions [22,23].

While performing genome-wide searches of the DNA-binding motifs of YY1, we discovered that the mammalian OLFR genes contain unusual clusters of YY1 binding sites within their protein-coding regions, whereas most YY1 binding sites are solitary and upstream of a regulated gene. In this study, we further analyzed the significance of this discovery with several bioinformatic and statistical measures, which will be described below. Specifically, we test whether the presence of the YY1 binding sites could be explained by DNA sequence or amino acid motif conservation.

Results

YY1 DNA-binding motifs in the mammalian OLFR genes

YY1 is predicted to be a global epigenetic regulator based on its ubiquitous expression and interaction with many histone-modifying enzymes [4]. As part of the efforts exploring this possibility, we have performed several series of YY1 binding motif searches using the genome sequences of mammals (human, mouse, and cow) [24-26]. Here we first scanned the genome sequences of human and mouse using a Position Weight Matrix (PWM)-based Perl script. Repetitive elements are known to contain YY1 binding sites, so the RepeatMasked genome was used [27]. Later, the results of these searches were visualized using the repeat-masked Custom Track of the UCSC genome browser [28]. While inspecting global localization patterns of YY1 binding motifs in each genome, we noticed that clusters of YY1 binding motifs are co-localized with the genomic regions harboring olfactory receptor (OLFR) genes. The mammalian OLFR genes show a single coding exon structure, and they are also localized as gene clusters in specific regions of mammalian chromosomes. One such example is shown using a 100-kb genomic region from mouse chromosome 7 (Mmu 7) (Figure 1). This figure shows a representative sample of 5 OLFR genes, and the locations of these genes coincide with those of YY1 binding motifs. Each OLFR gene appears to contain a range of 4 to 8 YY1 binding motifs within its 1-kb-long Open Reading Frame (ORF). As expected, the coding regions correlate well to the placental mammal conservation plot, a default track available on the UCSC genome browser. Also, the identified YY1 binding motifs appear to be random in location within the coding regions but do show a bias in orientation with respect to the direction of OLFR gene transcription, which will be described later.

The unusual clustering of YY1 binding motifs within the protein-coding regions of the OLFR genes was further analyzed to test if this pattern is unique to only the OLFR genes or also found in the ORFs of other genes. For this analysis, the entire set of the mouse mRNA database was scanned with the PWM-based Perl program to identify YY1 binding motifs. The number of the identified YY1 binding motifs within a given DNA sequence corresponding to the transcript was further divided by the size of the mRNA sequence, yielding a YY1 density score. The mouse mRNA sequences (total number = 20,191) were subsequently binned based on their relative YY1 density scores (the X-axis on Figure 2a). The Y-axis of Figure 2 represents the number of genes within a given range of the YY1 density score. This analysis indicated that the majority of the non-olfactory mouse mRNA sequences (19,083 sequences) were distributed evenly and randomly throughout the varying ranges of the YY1 density score (0 to 0.114) (Figure 2a). In contrast, our detailed inspection revealed that about half of the OLFR gene set (1,108) show very high YY1 density scores. To better visualize this unusual pattern, we separated only the OLFR gene set from the rest of the mRNA set, and derived another histogram (Figure 2b). As shown in Figure 2b, about 45% (or 496 of 1,108) of the OLFR genes have YY1 density scores ranging from 0.032 to 0.094, which are equivalent to 4 to 8 (or more) YY1 binding sites per 1-kb mRNA sequence. In contrast, a scan of the 1 kb upstream regions of 20,419 Refseq genes revealed that only 10% of these putative pro-

Figure 1. YY1 binding sites in genomic context (mouse chromosome 7, 100-kb region). YY1 binding sites are mapped onto the Custom Track of the UCSC Genome Browser http://genome.ucsc.edu/.
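
The PWM-based motif scan and the YY1 density score described in the Results can be illustrated with a minimal Python sketch. This is not the authors' Perl script: the weight matrix `TOY_PWM`, the cutoff `TOY_THRESHOLD`, the choice to scan both strands, and all function names are assumptions made for illustration only. In particular, the density here is plain hits per base; the exact normalization behind the reported density values is not spelled out in the text, so the numbers produced by this sketch should not be compared directly with those in Figure 2.

```python
from collections import Counter

# Toy 4-column weight matrix (hypothetical values, NOT the paper's YY1 matrix).
# Each column scores +1 for one preferred base and -1 otherwise, so only an
# exact "CCAT" window reaches the maximum score of 4.
TOY_PWM = [
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 1.0},
]
TOY_THRESHOLD = 4.0  # hypothetical score cutoff

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def scan_strand(seq, pwm, threshold):
    """Slide the matrix along one strand; return start positions of hits."""
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        # Bases missing from the matrix (e.g. masked 'N') can never score a hit.
        score = sum(col.get(base, float("-inf")) for col, base in zip(pwm, window))
        if score >= threshold:
            hits.append(start)
    return hits


def count_motifs(seq, pwm, threshold):
    """Count hits on both strands (scanning both strands is an assumption)."""
    seq = seq.upper()
    forward = scan_strand(seq, pwm, threshold)
    reverse = scan_strand(reverse_complement(seq), pwm, threshold)
    return len(forward) + len(reverse)


def density_score(seq, pwm, threshold):
    """Number of motif hits divided by sequence length."""
    if not seq:
        raise ValueError("empty sequence")
    return count_motifs(seq, pwm, threshold) / len(seq)


def bin_scores(scores, bin_width):
    """Histogram: map bin index -> number of sequences in that score range."""
    return Counter(int(score // bin_width) for score in scores)
```

A histogram in the spirit of Figure 2 would then come from calling `density_score` on every transcript and passing the resulting list to `bin_scores`, once for the non-olfactory set and once for the OLFR set.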