
CALIFORNIA STATE UNIVERSITY, NORTHRIDGE

Bioinformatic Comparison of the EVI2A Promoter and Coding Regions

A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Biology

By Max Weinstein

May 2020

The thesis of Max Weinstein is approved:

Professor Rheem D. Medh
Professor Virginia Oberholzer Vandergon
Professor Cindy Malone, Chair

California State University Northridge

Table of Contents

Signature Page
List of Figures
Abstract
Introduction
  Ecotropic Viral Integration Site 2A, a Gene within a Gene
Materials and Methods
  PCR and Cloning of Recombinant Plasmid
  Transformation and Cell Culture
  Generation of Deletion Constructs
  Transfection and Luciferase Assay
  Identification of Transcription Factor Binding Sites
  Determination of Region for Analysis
  Multiple Sequence Alignment (MSA)
  Model Testing
  Tree Construction
  Promoter and CDS Conserved Motif Search
Results and Discussion
  Choice of Species for Analysis
  Mapping of Potential Transcription Factor Binding Sites
  Confirmation of Plasmid Generation through Gel Electrophoresis
  Analysis of Deletion Constructs by Transient Transfection
  EVI2A Coding DNA Sequence Phylogenetics
  EVI2A Promoter Phylogenetics
  EVI2A Conserved Leucine Zipper
  EVI2A Conserved Casein kinase II phosphorylation site
  EVI2A Conserved Sox-5 Binding Site
  EVI2A Conserved HLF Binding Site
  EVI2A Conserved cREL Binding Site
  EVI2A Conserved CREB Binding Site
Summary
Appendix: Figures
Literature Cited

List of Figures

Figure 1. EVI2A is nested within the gene NF1
Figure 2. Putative Transcription Start Site
Figure 3. Gel Electrophoresis of pGC Blue cloned, Restriction Digested Plasmid
Figure 4. Gel Electrophoresis of pGL3 Basic cloned, Restriction Digested Plasmids
Figure 5. Non-Significant differences in activity between EVI2A promoter deletion constructs
Figure 6. Phylogenetic Tree of EVI2A CDS Generated Through Bayesian Inference
Figure 7. Phylogenetic Tree of EVI2A CDS Generated Through Maximum Likelihood
Figure 8. Phylogenetic Tree of EVI2A Putative Promoter Generated Through Bayesian Inference
Figure 9. Phylogenetic Tree of EVI2A Putative Promoter Generated Through Maximum Likelihood
Figure 10. MSA of the amino acids in the leucine zipper motif of the EVI2A protein
Figure 11. MSA of the amino acids of the casein kinase II phosphorylation site
Figure 12. MSA of nucleotides composing the Sox-5 Transcription Factor Binding Site
Figure 13. MSA of nucleotides composing the HLF Transcription Factor Binding Site
Figure 14. MSA of nucleotides composing the cREL Transcription Factor Binding Site
Figure 15. MSA of nucleotides composing the CREB Transcription Factor Binding Site

Abstract

Bioinformatic Comparison of the EVI2A Promoter and Coding Regions

By Max Weinstein

Master of Science in Biology

Evolution is driven by natural selection operating on the genes present in a given species. However, it is unclear whether the different parts of a gene (coding and non-coding) are under the same types of selection as each other. The similarity of selection between the coding and non-coding regions of a gene can be assessed using bioinformatics, by analyzing the differences in conserved sites between the coding and non-coding (promoter) regions. Ecotropic Viral Integration Site 2A (EVI2A) was analyzed at both its putative promoter region and its coding DNA sequence to determine its similarity of selection. After the putative promoter region of the EVI2A gene was identified through tracking of conserved regions, the EVI2A putative promoter and coding sequences were compared across several vertebrate species to generate genetic phylogenies, which were then compared against each other and against known evolutionary phylogenetic trees.
Conserved sequences were identified in both the putative promoter regions and the coding sequences. The motifs conserved in the coding sequence were much more stringently conserved than those in the putative promoter region, suggesting that the two regions of the EVI2A gene are under different types of selective pressure.

Introduction

For transcription of a gene to begin, RNA polymerase must combine with auxiliary transcription factors to form a transcription preinitiation complex, which binds to the gene locus upstream of the transcription start site, in an area approximately 1000 base pairs long called the promoter region (Vo et al. 2017). This process of RNA polymerase binding is tightly regulated to control when a given gene is expressed in a cell. A failure of this regulation can have far-ranging consequences for the human body. Diseases that can result from genetic mis-regulation include autoimmune inflammation, neurodegenerative disorders, and cancer (Lee and Young, 2013).

RNA polymerase II binds to the DNA strand upstream of the DNA sequence to be transcribed. The transcription start site marks the location where RNA polymerase II begins transcribing the DNA blueprint into mRNA. This site is distinct from the site where translation of mRNA into protein begins, which always encodes a methionine amino acid. To bind properly, the polymerase requires a number of other proteins that regulate its binding. These proteins, known as transcription factors, must also bind to the DNA upstream of the transcription start site, in the promoter region. Within the promoter region is the core promoter, an area approximately 40 base pairs long that is the minimum continuous stretch of DNA necessary for the proper binding of RNA polymerase II to initiate transcription (Butler et al. 2002).
In addition to the promoter itself, other regulatory DNA sequences and the transcription factors they recruit help to control when a gene is transcribed and its protein produced. “Enhancers” are sequences of DNA that recruit proteins known as “activators” to the promoter region. These activators, in turn, act to recruit RNA polymerase to the promoter (Maston et al. 2006). Conversely, “silencer” sequences of DNA recruit “repressor” transcription factors, which act to block transcription of the gene. Factors including environmental stimuli, response to infection, and intercellular signaling control when these transcription factors bind to their transcription factor binding sites (TFBS).

The core promoter structure varies between genes, and it is this variation that helps to allow genes to be differentially regulated. Though core promoters as a whole differ between genes, they are usually made up of a number of similar sequence motifs. It is the combination of these motifs that differentiates core promoters from each other. In general, there are two distinct types of core promoters: focused and dispersed. Focused promoters have a single, strong transcription start site, or a group of tightly clustered start sites, at which transcription factors bind to guide RNA polymerase II into place (Juven-Gershon et al. 2008). In these focused promoters, the binding site is usually a TATA box. Despite being the oldest known form of core promoter, focused promoters are in the minority in vertebrates: only about 35 percent of vertebrate promoters are focused promoters (Dikstein 2011). The other 65 percent are dispersed promoters. Dispersed promoters, unlike focused promoters, make use of a number of weaker transcription start sites spread out over an area of 50-100 nucleotides (Juven-Gershon et al. 2008).
The binding sites for dispersed promoters are often (approximately 50% of the time) associated with CpG islands, areas of high G/C concentration (Juven-Gershon et al. 2008, Mahpour et al. 2018).

The most ancestral motif found in focused core promoters, but not dispersed promoters, is the “TATA box”. The TATA box is identified by the consensus sequence “TATAWAAR” (Juven-Gershon et al. 2008); in IUPAC notation, W denotes A or T and R denotes a purine (A or G), here at the downstream end of the motif. Up to two mismatches can be present in the TATA box while still maintaining its function (Dikstein 2011). The TATA box acts as a binding site for Transcription Factor IID (TFIID), a functional unit made up of a number of proteins. The protein “TATA Binding Protein” (TBP) binds to the TATA box, and the TBP-Associated Factors bind onto TBP, building TFIID. TFIID, once assembled, aids in the binding of RNA polymerase II to the DNA at the proper location to begin transcription. Another protein, known as TRF3 or TBP2, functions as an analog of TBP. TRF3, like TBP, is able to bind to the TATA box. TRF3 also interacts with some of the same transcription factors as TBP, such as TFIIA and TFIIB, and is capable of performing basal transcription, making it a replacement for TBP (Goodrich and Tjian 2010). The TATA box is often accompanied by a TFIIB Recognition Element, or “BRE”. The BRE can be located either upstream or downstream of the TATA box (Juven-Gershon et al. 2008).

Another motif seen in focused promoters is the initiator. The initiator motif spans the transcription start site with a consensus sequence of YYANWYY, though the sequence is so variable that it has been suggested that the consensus should be only YR, with R corresponding to the transcription start site and replacing “ANWYY” (Dikstein 2011). By itself, the initiator is bound only weakly by RNA polymerase II.
A much stronger bond is formed when RNA polymerase II forms a complex with TFIIB (binding to the BRE), TFIID (binding to the TATA box), and TFIIF.

A third motif important for basal transcription activity in focused promoters is the Downstream Promoter Element (DPE). It is highly conserved across animal species (Juven-Gershon et al. 2008). As the name suggests, this element is found downstream of the transcription start site, at approximately position +33 relative to the +1 site of transcriptional initiation. The DPE acts as a binding site for Transcription Factor IID.

A motif unique to dispersed core promoters is the “CGCG Element”, with the consensus sequence TCTCGCGAGA (Mahpour et al.
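The consensus sequences quoted above (TATAWAAR, YYANWYY, TCTCGCGAGA) are written in IUPAC nucleotide codes, in which a single letter can stand for several bases. As an illustrative sketch only, and not the method used in this thesis, the following Python code shows how a DNA sequence could be scanned for such a consensus while tolerating a fixed number of mismatches, as is reported for the TATA box. The function names `count_mismatches` and `find_motif` and the toy sequence are hypothetical and were chosen for this example.

```python
# IUPAC nucleotide codes mapped to the bases each code can represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_mismatches(window, consensus):
    """Count positions where a base is not allowed by the consensus code."""
    return sum(
        base not in IUPAC[code]
        for base, code in zip(window.upper(), consensus.upper())
    )


def find_motif(sequence, consensus, max_mismatches=0):
    """Return (start, matched_text, mismatches) for every window of the
    sequence that fits the consensus within the mismatch allowance.
    Start positions are 0-based offsets into the sequence."""
    sequence = sequence.upper()
    width = len(consensus)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        mismatches = count_mismatches(window, consensus)
        if mismatches <= max_mismatches:
            hits.append((start, window, mismatches))
    return hits


# Toy sequence invented for this example (not EVI2A data).
toy_sequence = "GCGCTATAAAAGGCGC"

# Exact matches to the TATA box consensus.
print(find_motif(toy_sequence, "TATAWAAR"))

# Allowing up to two mismatches reports additional, weaker candidates.
print(find_motif(toy_sequence, "TATAWAAR", max_mismatches=2))
```

A scan like this only flags candidate sites by sequence similarity. Whether a candidate is a functional binding site would still need to be supported by conservation across species or by experimental evidence.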