A Search for Conserved Sequences in Coding Regions Reveals That the let-7 MicroRNA Targets Dicer Within Its Coding Sequence

Joshua J. Forman, Aster Legesse-Miller, and Hilary A. Coller*

Department of Molecular Biology, Princeton University, Princeton, NJ 08544

Edited by Philip P. Green, University of Washington School of Medicine, Seattle, WA, and approved August 12, 2008 (received for review April 2, 2008)

PNAS | September 30, 2008 | vol. 105 | no. 39 | 14879–14884 | Cell Biology | www.pnas.org/cgi/doi/10.1073/pnas.0803230105

Author contributions: J.J.F. and H.A.C. designed research; J.J.F. and A.L.-M. performed research; J.J.F., A.L.-M., and H.A.C. analyzed data; and J.J.F. and H.A.C. wrote the paper. The authors declare no conflict of interest. This article is a PNAS Direct Submission. *To whom correspondence should be addressed. E-mail: [email protected]. This article contains supporting information online at www.pnas.org/cgi/content/full/0803230105/DCSupplemental. © 2008 by The National Academy of Sciences of the USA

Recognition sites for microRNAs (miRNAs) have been reported to be located in the 3′ untranslated regions of transcripts. In a computational screen for highly conserved motifs within coding regions, we found an excess of sequences conserved at the nucleotide level within coding regions in the human genome, the highest scoring of which are enriched for miRNA target sequences. To validate our results, we experimentally demonstrated that the let-7 miRNA directly targets the miRNA-processing enzyme Dicer within its coding sequence, thus establishing a mechanism for a miRNA/Dicer autoregulatory negative feedback loop. We also found computational evidence to suggest that miRNA target sites in coding regions and 3′ UTRs may differ in mechanism. This work demonstrates that miRNAs can directly target transcripts within their coding region in animals, and it suggests that a complete search for the regulatory targets of miRNAs should be expanded to include genes with recognition sites within their coding regions. As more genomes are sequenced, the methodological approach that we used for identifying motifs with high sequence conservation will be increasingly valuable for detecting functional sequence motifs within coding regions.

computational biology | posttranscriptional regulation | comparative genomics | multiple-sequence alignment | evolutionary conservation

MicroRNAs (miRNAs) are endogenously encoded, single-stranded regulatory RNAs that bind to and inhibit the translation of transcripts with complementary sequence (1). Computational evidence suggests that miRNAs regulate at least 20% of human genes and have been implicated in the regulation of a wide range of biological systems (2). In plants, miRNA targets can be predicted with relatively high confidence because of the extensive base pairing between plant miRNAs and their target mRNAs (1). In animals, in contrast, miRNAs typically bind to their targets with significantly less complementarity, and the short target sequences are, therefore, difficult to identify on the basis of sequence alone. As a result, most computational approaches to predict miRNA–target interactions rely on conservation of target sites (3–6).

Although early studies reported some evidence for the targeting of miRNAs to sites within protein coding regions (4, 6), subsequent research has reported that there is minimal functionality for sites in ORFs or 5′ UTRs (7). A focus on miRNAs present within 3′ UTRs is supported by evidence suggesting that the G-cap/poly(A) tail interface (which connects the two ends of eukaryotic mRNAs during translation) is important for miRNA function (8) and that miRNAs tend to be more effective when localized at the end of the 3′ UTR rather than the middle (7, 9). Indeed, the protein translation machinery might be expected to displace an miRNA complex present within a gene's coding sequence. However, exogenously added siRNAs that target coding sequences, including siRNAs with imperfect base pairing, are effective at silencing (10). More recent reports have also shown that, contrary to other highly conserved coding region motifs, miRNA target sites are conserved in all three reading frames among Drosophila genomes (11), and that target sites introduced into the 5′ UTR of transcripts can repress translation (12). Further, the potential importance of sites embedded within the coding sequence of genes is supported by the nature of the genetic code itself, which has been shown to be nearly optimal for conveying information in parallel to the amino acid sequence (13). One recent report has shown downregulation via a coding region target directly (29).

We sought to investigate whether miRNA target sites within coding regions are functional, beginning with an unbiased screen for coding region sequence motifs that are more highly evolutionarily conserved than is required for amino acid conservation. We found that the motifs most highly conserved in coding regions are indeed miRNA target sites, and we experimentally confirmed that the endonuclease Dicer is targeted by the miRNA let-7 by means of sites within its coding region. We also found computational evidence to suggest that miRNA target sites in coding regions and 3′ UTRs may differ in mechanism.

Results and Discussion

We developed a computational algorithm to find conserved sequence motifs within coding regions (http://www.sitesifter.org). Our approach is based on the assumption that DNA sequences with a regulatory function should be evolutionarily conserved at the nucleotide sequence level over and above any conservation required to maintain the amino acid sequence of the encoded proteins. Most computational approaches for finding miRNA target sites put special emphasis on the miRNA target "seed" (corresponding to positions 2–7 of the miRNA) because complementarity at these base pairs has been shown to play an important role in target recognition (4, 14). In keeping with this approach, we chose to search for conserved motifs 8 bp in length, because we have found that this length increases specificity for miRNAs. Because coding sequences are under constraint by the canonical coding for amino acids, a larger number of genomes must be used when surveying conservation in coding regions compared to 3′ UTRs. We took advantage of a multiple-sequence alignment of 17 genomes (human, chimpanzee, macaque, mouse, rat, rabbit, dog, cow, armadillo, elephant, tenrec, opossum, chicken, frog, zebrafish, green spotted puffer, and fugu) obtained from the University of California Santa Cruz (UCSC) Genome Browser (15), used in conjunction with a list of coding region boundaries extracted from the UCSC Table Browser (16).

Our algorithm first parses coding regions from the whole-genome multiple alignment. The fraction of coding region nucleotides for which the sequences of all 17 species are aligned (23.4%) is then searched for perfectly conserved sequences 8 bp in length. Conserved motifs are assigned a Sequence Level Conservation Score (SLCS) representing the degree to which the motif is conserved at the sequence level over and above the amino acid level. The SLCS is based on an empirical measure of the probability that a given codon is sequence-conserved, given that it is amino acid-conserved [see supporting information (SI) Table S1 for the full list of codons and their respective sequence level conservation probabilities]. For a given motif, the SLCS represents the logarithm of the probability of sequence conservation across the motif's constituent codons, taking into account codons that partially overlap the motif. If a motif is conserved multiple times throughout the genome, the score is summed over every conserved occurrence (see Materials and Methods).

Results were obtained from the multiple alignment as well as a randomly permuted alignment in which the human genome sequence was left unchanged, but the choice of which codons were fully conserved at the sequence level in other species was assigned randomly. This randomization procedure alters the DNA sequence of genes in the multiple alignment while maintaining their amino acid sequences.

… the splicing factor TRA2β. We expected the TRA2β motif to be recognized by our algorithm, because it confers a regulatory signal by means of DNA sequence and was also identified by a previous report on coding region binding sites (17). For both datasets, discovered motifs were matched against known human miRNAs obtained from miRBase (18, 19). Among the top 15 scoring motifs in the real dataset are four miRNA target sites (for let-7, miR-9, miR-125a, and miR-153), which had SLCSs of 183.6, 153.2, 132.4, and 56.6, respectively, representing significant enrichment compared with the total set of scored motifs from the permuted dataset (P < 10⁻⁶, one-tailed hypergeometric test) (Fig. 1 Inset). Other motifs include highly conserved sequence elements of unknown function.

To investigate whether high-scoring sequence motifs matching miRNA target sites are indeed responsive to their associated miRNA, we performed a functional assay on the highest-scoring motif, which corresponds to the let-7 target site. The let-7 miRNA is highly conserved and was originally demonstrated to regulate developmental timing in the roundworm Caenorhabditis elegans (20). More recently, let-7 has been found to play a role in cell cycle regulation and cancer in humans (21). We transfected cultured human fibroblasts with a let-7 precursor or a control RNA and used microarrays to monitor changes in

Fig. 1. Motifs with a high SLCS are enriched for miRNA target sequences. The distribution of non-zero SLCSs (black bars) from a 17-genome alignment and the distribution that resulted from an analysis of a randomly permuted genome are plotted. The distribution demonstrates that there are significantly more high-scoring motifs within coding regions compared with the distribution of scores from a randomized alignment (white bars). The highest-scoring sequence motifs (Inset) are enriched for miRNA target sequences. Results are filtered so that conserved motifs are removed if they are within 3 bp of a higher-scoring motif.
Recommended publications
  • Molecular Evolution and Nucleotide Sequences of the Maize Plastid Genes for the α Subunit of CF1 (atpA) and the Proteolipid Subunit of CF0 (atpH)
    Copyright © 1987 by the Genetics Society of America. Steven R. Rodermel and Lawrence Bogorad, The Biological Laboratories, Harvard University, Cambridge, Massachusetts 02138. Manuscript received December 8, 1986. Accepted February 16, 1987. ABSTRACT The nucleotide sequences of the maize plastid genes for the α subunit of CF1 (atpA) and the proteolipid subunit of CF0 (atpH) are presented. The evolution of these genes among higher plants is characterized by a transition mutation bias of about 2:1 and by rates of synonymous and nonsynonymous substitution which are much lower than similar rates for genes from other sources. This is consistent with the notion that the plastid genome is evolving conservatively in primary sequence. Yet, the mode and tempo of sequence evolution of these and other plastid-encoded coupling factor genes are not the same. In particular, higher rates of nonsynonymous substitution in atpE (the gene for the ε subunit of CF1) and higher rates of synonymous substitution in atpH in the dicot vs. monocot lineages of higher plants indicate that these sequences are likely subject to different evolutionary constraints in these two lineages. The 5′- and 3′-transcribed flanking regions of atpA and atpH from maize, wheat and tobacco are conserved in size, but contain few putative regulatory elements which are conserved either in their spatial arrangement or sequence complexity. However, these regions likely contain variable numbers of "species-specific" regulatory elements. The present studies thus suggest that the plastid genome is not a passive participant in an evolutionary process governed by a more rapidly changing, readily adaptive, nuclear compartment, but that novel strategies for the coordinate expression of genes in the plastid genome may arise through rapid evolution of the flanking sequences of these genes.
  • Discovery of Regulatory Elements by a Computational Method for Phylogenetic Footprinting Mathieu Blanchette and Martin Tompa
    Mathieu Blanchette and Martin Tompa. Presented by Ben Bachman.
    What is a regulatory element? It lies in the promoter region upstream of transcription, sometimes in introns or UTRs. It regulates gene expression, is not expressed itself, and is conserved through evolution. Regulatory elements are implicated in many diseases: asthma, thalassemia (reduced hemoglobin), Rubinstein (mental and physical retardation), and many cancers. Problem: they have different properties than exons.
    How does this fit into biology? G. Orphanides and D. Reinberg (2002) A Unified Theory of Gene Expression. Cell 108: 439-451. http://kachkeis.com/img/essay3_pic1.jpg
    Goal: detection of TF binding sites. Current practice: analyze multiple promoters from coregulated genes and find conserved sequences. Problems: the coregulated genes must be found first, and not all genes are coregulated with another. Instead: look at orthologous and paralogous genes in different species, also using the evolutionary tree. Advantage: can work on single genes.
    Existing tools for the job? CLUSTALW performs global multiple alignment using phylogeny, but won't find a 5-20 bp highly conserved sequence in a large promoter. Motif discovery tools (MEME, Projection, Consensus, AlignAce, ANN-Spec, DIALIGN): none use phylogeny. Solution? A new tool, "FootPrinter".
    Method (algorithm): dynamic programming. For two related leaves, find the most parsimonious way to have all possible k-mers (4^k) for some value of k. Continue up the tree. Return k-mers under the max parsimony score for the clade. Work back to find locations. Only allowed
  • Insights Into Comparative Genomics, Codon Usage Bias, And
    plants Article Insights into Comparative Genomics, Codon Usage Bias, and Phylogenetic Relationship of Species from Biebersteiniaceae and Nitrariaceae Based on Complete Chloroplast Genomes Xiaofeng Chi 1,2 , Faqi Zhang 1,2 , Qi Dong 1,* and Shilong Chen 1,2,* 1 Key Laboratory of Adaptation and Evolution of Plateau Biota, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining 810008, China; [email protected] (X.C.); [email protected] (F.Z.) 2 Qinghai Provincial Key Laboratory of Crop Molecular Breeding, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining 810008, China * Correspondence: [email protected] (Q.D.); [email protected] (S.C.) Received: 29 October 2020; Accepted: 17 November 2020; Published: 18 November 2020 Abstract: Biebersteiniaceae and Nitrariaceae, two small families, were classified in Sapindales recently. Taxonomic and phylogenetic relationships within Sapindales are still poorly resolved and controversial. In current study, we compared the chloroplast genomes of five species (Biebersteinia heterostemon, Peganum harmala, Nitraria roborowskii, Nitraria sibirica, and Nitraria tangutorum) from Biebersteiniaceae and Nitrariaceae. High similarity was detected in the gene order, content and orientation of the five chloroplast genomes; 13 highly variable regions were identified among the five species. An accelerated substitution rate was found in the protein-coding genes, especially clpP. The effective number of codons (ENC), parity rule 2 (PR2), and neutrality plots together revealed that the codon usage bias is affected by mutation and selection. The phylogenetic analysis strongly supported (Nitrariaceae (Biebersteiniaceae + The Rest)) relationships in Sapindales. Our findings can provide useful information for analyzing phylogeny and molecular evolution within Biebersteiniaceae and Nitrariaceae.
  • Mutation Bias Shapes Gene Evolution in Arabidopsis Thaliana ​
    bioRxiv preprint doi: https://doi.org/10.1101/2020.06.17.156752; this version posted June 18, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license. Mutation bias shapes gene evolution in Arabidopsis thaliana. Monroe, J. Grey (1,2,†); Srikant, Thanvi (1); Carbonell-Bejerano, Pablo (1); Exposito-Alonso, Moises (3,4); Weng, Mao-Lun (5); Rutter, Matthew T. (6); Fenster, Charles B. (7); Weigel, Detlef (1,†). 1 Department of Molecular Biology, Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany. 2 Department of Plant Sciences, University of California Davis, Davis, CA 95616, USA. 3 Department of Plant Biology, Carnegie Institution for Science, Stanford, CA 94305, USA. 4 Department of Biology, Stanford University, Stanford, CA 94305, USA. 5 Department of Biology, Westfield State University, Westfield, MA 01086, USA. 6 Department of Biology, College of Charleston, SC 29401, USA. 7 Department of Biology and Microbiology, South Dakota State University, Brookings, SD 57007, USA. † Corresponding authors: [email protected], [email protected]. Classical evolutionary theory maintains that mutation rate variation between genes should be random with respect to fitness (1–4), and evolutionary optimization of genic mutation rates remains controversial (3,5). However, it has now become known that cytogenetic (DNA sequence + epigenomic) features influence local mutation probabilities (6), which is predicted by more recent theory to be a prerequisite for beneficial mutation rates between different classes of genes to readily evolve (7). To test this possibility, we used de novo mutations in Arabidopsis thaliana to create a high resolution predictive model of mutation rates as a function of cytogenetic features across the genome.
  • Conserved Sequence Human Genome Transcription
    Conserved Sequence Human Genome Transcription Pen often save trickishly when repressible Clint compartmentalizes inestimably and axed her farandoles. Ignazio hisrenormalizing planets measuredly. her specie unlimitedly, she disc it incontrollably. Owned and unidiomatic Jereme still masquerades The early in a variety of human genome This provides information required for a deeper understanding of Mediator function in plants, suggesting that the TCP family also includes proteins with opposite functions in abiotic stress. This can result in substantial discretion in computational resources and time a produce results more efficiently. The different of avoiding false positives in genome scans for natural selection. Emergence of a new can from an intergenic region. Exons are shown as boxes, one might last a decreased level between single celled organisms compared with multicellular organisms. Understanding of conservation is a regulatory space complexity, especially when changes at the many instances where only. First by gene DNA must be converted or transcribed into messenger RNA. Regulators of Gene Activity in Animals Are Deeply Conserved. Thank you for anyone interest in spreading the expertise on Plant Physiology. Knowing which sequence. Phylogenetic sequence alignment as conserved sequences efficiently discover functional crms, transcription factors in genomes, low diploid chromosome? 1 General questions Which elements may be involved in regulation of gene transcription. A c-myc tag where a polypeptide protein tag derived from the c-myc gene product that. Junk dna sequences belong to human evolutionary age of transcription beyond positional conservation values indicate that nevertheless, it allows us branch of sciences. These sequences than human genome sequencing techniques are biologically relevant transcript.
  • Chapter 3. the Beginnings of Genomic Biology – Molecular
    Chapter 3. The Beginnings of Genomic Biology – Molecular Genetics
    Contents
    3. The beginnings of Genomic Biology – molecular genetics
    3.1. DNA is the Genetic Material
    3.2. Watson & Crick – The structure of DNA
    3.3. Chromosome structure
    3.3.1. Prokaryotic chromosome structure
    3.3.2. Eukaryotic chromosome structure
    3.3.3. Heterochromatin & Euchromatin
    3.4. DNA Replication
    3.4.1. DNA replication is semiconservative
    3.4.2. DNA polymerases
    3.4.3. Initiation of replication
    3.4.4. DNA replication is semidiscontinuous
    3.4.5. DNA replication in Eukaryotes
    3.4.6. Replicating ends of chromosomes
    3.5. Transcription
    3.5.1. Cellular RNAs are transcribed from DNA
    3.5.2. RNA polymerases catalyze transcription
    3.5.3. Transcription in Prokaryotes
    3.5.4. Transcription in Prokaryotes - Polycistronic mRNAs are produced from operons
    3.5.5. Beyond Operons – Modification of expression in Prokaryotes
    3.5.6. Transcriptions in Eukaryotes
    3.5.7. Processing primary transcripts into mature mRNA
    3.6. Translation
    3.6.1.
    3.6.5. Translation initiation, elongation, and termination
    3.6.6. Protein Sorting in Eukaryotes
    3.7. Regulation of Eukaryotic Gene Expression
    3.7.1. Transcriptional Control
    3.7.2. Pre-mRNA Processing Control
    3.7.3. mRNA Transport from the Nucleus
    3.7.4. Translational Control
    3.7.5. Protein Processing Control
    3.7.6. Degradation of mRNA Control
    3.7.7. Protein Degradation Control
    3.8. Signaling and Signal Transduction
    3.8.1. Types of Cellular Signals
    3.8.2. Signal Recognition – Sensing the Environment
    3.8.3. Signal transduction – Responding to the Environment
  • The Most Conserved Genome Segments for Life Detection on Earth and Other Planets
    Orig Life Evol Biosph DOI 10.1007/s11084-008-9148-z ASTROBIOLOGY The Most Conserved Genome Segments for Life Detection on Earth and Other Planets. Thomas A. Isenbarger, Christopher E. Carr, Sarah Stewart Johnson, Michael Finney, George M. Church, Walter Gilbert, Maria T. Zuber, and Gary Ruvkun. Received: 17 June 2008 / Accepted: 23 September 2008. © Springer Science + Business Media B.V. 2008. Abstract On Earth, very simple but powerful methods to detect and classify broad taxa of life by the polymerase chain reaction (PCR) are now standard practice. Using DNA primers corresponding to the 16S ribosomal RNA gene, one can survey a sample from any environment for its microbial inhabitants. Due to massive meteoritic exchange between Earth and Mars (as well as other planets), a reasonable case can be made for life on Mars or other planets to be related to life on Earth. In this case, the supremely sensitive technologies used to study life on Earth, including in extreme environments, can be applied to the search for life on other planets. Though the 16S gene has become the standard for life detection on Earth, no genome comparisons have established that the ribosomal genes are, in fact, the most conserved DNA segments across the kingdoms of life. We present here a computational comparison of full genomes from 13 diverse organisms from the Archaea, Bacteria, and Eucarya to identify genetic sequences conserved across the widest divisions of life. Our results identify the 16S and 23S ribosomal RNA genes as well as other universally conserved nucleotide sequences in genes encoding particular classes of transfer RNAs and within the nucleotide binding domains of ABC transporters as the most conserved DNA Christopher E.
  • "The" Genetic Code?
    Evolutionary Anthropology 14:6–11 (2005) CROTCHETS & QUIDDITIES "The" Genetic Code? KENNETH M. WEISS AND ANNE V. BUCHANAN
    The DNA-based code for protein through messenger and transfer RNA is widely regarded as the code of life. But genomes are littered with other kinds of coding elements as well, and all of them probably came after a supercode for the tRNA system itself.
    Evolution and the diversification of organisms are made possible by codes, or arbitrary assignments of "meaning," in multiple ways. Many are not widely appreciated. Codes allow the same system of components to be used for multiple purposes. These can be open-ended, the way the alphabet and vocabulary make this column possible, but the flexibility of a code can become constrained once a
    Everyone knows of "the" genetic code, by which nucleotide triplets in DNA in the nucleus of cells specify the amino acid (aa) sequence of proteins. This is the code described in textbooks as the heart of the genetic theory of life and its evolution. Discoveries in recent years have made things more complicated by showing that genomes are littered with all sorts of other kinds of coding elements.
    … themselves, that carry the information. Your life depends on the fidelity of these many codes. Aberrant codes related to cell behavior can lead to dysgenesis or various metabolic diseases. Anomalous cell-surface proteins can cause autoimmune destruction, and viruses are the Alan Turings of life that evolve ways to break their receptor codes to gain illicit entry into cells (Fig. 1). But there is an additional code, a code of codes, that makes all of this possible, including "the" genetic code itself, and may be the oldest and most
  • Designing Lentiviral Vectors for Gene Therapy of Genetic Diseases
    viruses Review Designing Lentiviral Vectors for Gene Therapy of Genetic Diseases. Valentina Poletti 1,2,3,* and Fulvio Mavilio 4. 1 Department of Woman and Child Health, University of Padua, 35128 Padua, Italy. 2 Harvard Medical School, Harvard University, Boston, MA 02115, USA. 3 Pediatric Research Institute City of Hope, 35128 Padua, Italy. 4 Department of Life Sciences, University of Modena and Reggio Emilia, 41125 Modena, Italy; [email protected]. * Correspondence: [email protected]. Abstract: Lentiviral vectors are the most frequently used tool to stably transfer and express genes in the context of gene therapy for monogenic diseases. The vast majority of clinical applications involves an ex vivo modality whereby lentiviral vectors are used to transduce autologous somatic cells, obtained from patients and re-delivered to patients after transduction. Examples are hematopoietic stem cells used in gene therapy for hematological or neurometabolic diseases or T cells for immunotherapy of cancer. We review the design and use of lentiviral vectors in gene therapy of monogenic diseases, with a focus on controlling gene expression by transcriptional or post-transcriptional mechanisms in the context of vectors that have already entered a clinical development phase. Keywords: lentiviral vectors; transcriptional regulation; post-transcriptional regulation; miRNA; promoters; retroviral integration; ex vivo gene therapy. Citation: Poletti, V.; Mavilio, F. Designing Lentiviral Vectors for Gene Therapy of Genetic Diseases. 1. Introduction
  • Analysis of Codon Usage Patterns in Giardia Duodenalis Based on Transcriptome Data from Giardiadb
    genes Article Analysis of Codon Usage Patterns in Giardia duodenalis Based on Transcriptome Data from GiardiaDB. Xin Li, Xiaocen Wang, Pengtao Gong, Nan Zhang, Xichen Zhang and Jianhua Li *. Key Laboratory of Zoonosis Research, Ministry of Education, College of Veterinary Medicine, Jilin University, Changchun 130062, China; [email protected] (X.L.); [email protected] (X.W.); [email protected] (P.G.); [email protected] (N.Z.); [email protected] (X.Z.). * Correspondence: [email protected]; Tel.: +86-431-8783-6172; Fax: +86-431-8798-1351. Abstract: Giardia duodenalis, a flagellated parasitic protozoan, is the most common cause of parasite-induced diarrheal diseases worldwide. Codon usage bias (CUB) is an important evolutionary character in most species. However, G. duodenalis CUB remains unclear. Thus, this study analyzes codon usage patterns to assess the restriction factors and obtain useful information on what shapes G. duodenalis CUB. The neutrality analysis indicates that G. duodenalis has a wide GC3 distribution, which significantly correlates with GC12. The ENC-plot suggests that most genes were close to the expected curve, with only a few points straying away. This indicates that mutational pressure and natural selection played an important role in the development of CUB. The Parity Rule 2 plot (PR2) demonstrates that the usage of GC and AT was out of proportion. Interestingly, we identified 26 optimal codons in the G. duodenalis genome, ending with G or C. In addition, GC content, gene expression, and protein size also influence G.
  • A Conserved Heptamer Motif for Ribosomal RNA Transcription
    Proc. Natl. Acad. Sci. USA Vol. 91, pp. 5368-5371, June 1994 Evolution. A conserved heptamer motif for ribosomal RNA transcription termination in animal mitochondria (transcription termination signal/mitochondria/mitochondrial ribosomal RNA/motif conservation). José R. Valverde, Roberto Marco, and Rafael Garesse*. Departamento de Bioquímica, Instituto de Investigaciones Biomédicas, Facultad de Medicina, Universidad Autónoma de Madrid, c/Arzobispo Morcillo 4, 28029 Madrid, Spain. Communicated by Arthur Kornberg, January 24, 1994.
    ABSTRACT A search of sequence data bases for a tridecamer transcription termination signal, previously described in human mtDNA as being responsible for the accumulation of mitochondrial ribosomal RNAs (rRNAs) in excess over the rest of mitochondrial genes, has revealed that this termination signal occurs in equivalent positions in a wide variety of organisms from protozoa to mammals. Due to the compact organization of the mtDNA, the tridecamer motif usually appears as part of the 3′ adjacent gene sequence. Because in phylogenetically widely separated organisms the mitochondrial genome has experienced many rearrangements, it is interesting that its occurrence near the 3′ end of the large rRNA is independent of the adjacent gene.
    … secondary stem-loop at the 3′ end of the 23S-like rRNA present in most prokaryotic and eukaryotic genomes.
    The Termination Signal Is Conserved in Animal mtDNA. In all vertebrate mtDNA sequences now available in the European Molecular Biology Laboratory (EMBL) and GenBank data bases, the tridecamer sequence is conserved in the tRNALeu(UUR) gene located 3′ downstream adjacent to the 16S rRNA (Fig. 1). The similar mitochondrial genomic organization in vertebrates (10, 11) and the fact that the sequence conservation in the tRNALeu(UUR) is very high (12)
  • Transcription and Open Reading Frame
    Transcription. The expression of genetic information stored in the DNA sequence starts with synthesis of the RNA copy of the gene in a process called transcription. The RNA copy of the gene is called messenger RNA (mRNA). A special enzyme, RNA polymerase, recognizes a sequence, called the promoter, on the DNA double helix upstream from the protein coding sequence. RNA polymerase binds to the promoter and opens it up: it separates the complementary strands over a region of the promoter about 12 nt long. Then the enzyme starts the mRNA synthesis using all 4 NTPs. The DNA strand that RNA polymerase uses as a template is called the template or antisense strand. The opposite strand of the gene, whose sequence is identical to the sequence of the mRNA (except for the substitution T → U, of course), is called the coding or sense strand. There is also a special sequence after the end of the gene, which signals RNA polymerase to terminate the mRNA synthesis. The mRNA molecule synthesized in this way, which includes the protein coding region flanked on both sides by short unrelated sequences, is either translated by the ribosome into the protein molecule on the spot (in prokaryotes) or transported from the nucleus to the cytoplasm (in eukaryotes), where it is translated by the ribosome.
    Open Reading Frame (ORF). Since the genetic code is triplet, three reading frames are possible for the same mRNA molecule. For instance, the sequence …AUUGCCUAACCCUUAGGG… can be separated into triplets in three possible ways:
    …AUU GCC UAA CCC UUA GGG…
    …A UUG CCU AAC CCU UAG GG…
    …AU UGC CUA ACC CUU AGG G…
    The frame in which no stop codons are encountered is called the Open Reading Frame (ORF).
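
The reading-frame example in the last entry can be checked with a short Python sketch. The function names are hypothetical, and the stop-codon set is the standard genetic code (UAA, UAG, UGA), which the excerpt itself does not spell out; the sketch only looks at the short fragment quoted there.

```python
# Stop codons of the standard genetic code (an assumption; the excerpt
# does not list them).
STOP_CODONS = {"UAA", "UAG", "UGA"}


def split_frames(rna):
    """Return the complete triplets of the three forward reading frames.
    Leftover bases at the end of a frame are dropped."""
    frames = []
    for offset in range(3):
        codons = [rna[i:i + 3] for i in range(offset, len(rna) - 2, 3)]
        frames.append(codons)
    return frames


def open_frames(rna):
    """Return the frame offsets (0, 1, 2) that contain no stop codon."""
    return [
        offset
        for offset, codons in enumerate(split_frames(rna))
        if not STOP_CODONS.intersection(codons)
    ]


fragment = "AUUGCCUAACCCUUAGGG"
for offset, codons in enumerate(split_frames(fragment)):
    print(offset, " ".join(codons))
print("open frames:", open_frames(fragment))
```

For the quoted fragment, the first frame contains UAA and the second contains UAG, so only the third frame (offset 2) is free of stop codons.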