2021.01.13.426573V1.Full.Pdf

bioRxiv preprint doi: https://doi.org/10.1101/2021.01.13.426573; this version posted January 13, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. EVOLUTIONARY PERSPECTIVE AND EXPRESSION ANALYSIS OF INTRONLESS GENES HIGHLIGHT THE CONSERVATION ON THEIR REGULATORY ROLE Katia Aviña-Padilla1,2, José Antonio Ramírez-Rafael3, Gabriel Emilio Herrera-Oropeza1,4, Vijaykumar Muley1, Dulce I. Valdivia2, Erik Díaz-Valenzuela2, Andrés García-García3, Alfredo Varela-Echavarría1* and Maribel Hernández-Rosales2*. 1Instituto de Neurobiología, Universidad Nacional Autónoma de México, Querétaro, México. 2Centro de Investigación y de Estudios Avanzados del IPN, Unidad Irapuato, Guanajuato, México. 3Centro de Física Aplicada y Tecnología Avanzada, Universidad Nacional Autónoma de México, Querétaro, México. 4Centre for Developmental Neurobiology, Institute of Psychiatry, Psychology, and Neuroscience, King's College London, London, United Kingdom. * Correspondence: Maribel Hernández-Rosales [email protected] Alfredo Varela-Echavarría [email protected] Keywords: intronless genes1, exon-intron architecture2, embryonic telencephalon3, protocadherins4, histones5, transcription factors6, evolutionary histories7, microproteins8. bioRxiv preprint doi: https://doi.org/10.1101/2021.01.13.426573; this version posted January 13, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International licenseIntronless. genes regulatory role Abstract Eukaryotic gene structure is a combination of exons generally interrupted by intragenic non-coding DNA regions termed introns removed by RNA splicing to generate the mature mRNA. Thus, eukaryotic genes can be either single exon genes (SEGs) or multiple exon genes (MEGs). Among SEGs, intronless genes (IGs) are a subgroup that additionally lacks introns at their UTRs, and code for proteins essentially involved in development, growth, and cell proliferation. Gene expression of IGs has been proposed to be highly specialized for neuro-specific functions and linked to cancer, neuropathies, and developmental disorders. The abundant presence of introns in eukaryotic genomes is pivotal for the precise control of gene expression. Notwithstanding, IGs exempting splicing events entail a higher transcriptional fidelity, making them even more valuable for regulatory roles. This work aimed to infer the functional role and evolutionary history of IGs using the mouse genome. Intronless protein-coding genes consist of a subgroup of ~6 % of a total of 21,527 genes with one exon. To understand the prevalence, biological relevance, and evolution, we identified and studied their 1,116 functional proteins. We validated differential expression in transcriptomics data of early embryo stages using mouse telencephalon tissue. Our results showed that expression levels of IGs are lower compared to MEGs. However, strongly upregulated IGs include transcription factors (TFs) such as the class 3 of POU (HMG Box), Neurog1, Olig1, and BHLHe22, BHLHe23, among other essential genes including the beta cluster of protocadherins. Most striking was the finding that IG-encoded BHLH TFs qualify the criteria to be referred to as microprotein candidates. Finally, predicted protein orthologs in other six genomes confirmed a high conservancy of IGs associated with regulating neurobiological processes and with chromatin organization and epigenetic regulation in Vertebrata. Moreover, this study highlights that IGs are essential modulators of regulatory processes, as Wnt signaling pathway and biological processes as pivotal as sensory organs developing at a transcriptional and post-translational level. Overall, our results suggest that IG proteins have specialized, prevalent, and unique biological roles and that functional divergence between IGs and MEGs is likely to be the result of specific evolutionary constraints. bioRxiv preprint doi: https://doi.org/10.1101/2021.01.13.426573; this version posted January 13, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International licenseIntronless. genes regulatory role Introduction Genes with one exon are characteristic of prokaryotic genomes. Most eukaryotic genes, in contrast, contain introns, nucleotide DNA sequences that after transcription as part of the messenger RNA are removed by splicing during its maturation. The latter are thus termed multiple exon genes (MEGs), although eukaryotic genomes also contain an important proportion of single-exon genes (SEGs). Diverse studies of SEGs in eukaryotes have been performed over the past decades (Tine et al. 2011; Zou, Guo, and He 2011; K. R. Sakharkar et al. 2006; Yan et al. 2014). A precise definition, however, has only been recently formulated. The improved ontology defines an SEG as a nuclear gene whose protein-coding sequence comprises only one exon. Pseudogenes, functional RNAs, tRNA, rRNA, ribozyme long non-coding RNAs, and miRNAs are excluded from this definition. This classification scheme includes two main groups of SEGs; those having introns in their untranslated regions ‘uiSEGs’; and those lacking introns in the entire gene, termed ‘Intronless Genes’ (IGs) (Jorquera et al. 2018). Because of their prokaryotic-like architecture, IGs in eukaryotic genomes, provide interesting datasets for computational analysis in comparative genomics and evolutionary trajectories. Comparative analysis of their sequences between different genomes could help to identify their unique and conserved features, thus providing insights into the role of introns in gene evolution and lead to a better understanding of genome architecture and arrangement. The abundant presence of introns in most genes of multicellular organisms entails regulatory processes associated with the generation of multiple splice variants not present in intronless genes. In consequence, IGs not requiring splicing events represents a higher transcriptional fidelity, which makes them even more valuable for regulatory roles. To date, more than 2000 protein-coding genes in the human genome have been determined as SEGs (https://www.ensembl.org/index.html). Among them, a considerable amount of IGs is responsible for encoding proteins such as G-protein-coupled receptors (GPCRs), canonical histones, transcription factors, many genes involved in signal transduction, as well as some involved in the regulation of development, growth, and cell proliferation (Meena K Sakharkar et al. 2005; Grzybowska 2012). The expression of IGs has been proposed to be highly specialized for neuro-specific functions and linked to diseases such as cancer, neuropathies, and developmental disorders. Examples of IGs with bioRxiv preprint doi: https://doi.org/10.1101/2021.01.13.426573; this version posted January 13, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International licenseIntronless. genes regulatory role clinical relevance are the RPRM gene related to gastric cancer which causes increased cell proliferation and possesses tumor suppression activity (Amigo et al. 2018) and the protein kinase CK2α gene which is up-regulated in all human cancers (Hung et al. 2010). Other IGs linked to cancer include CLDN8 in colorectal carcinoma and renal cell tumors, ARLTS1 in melanoma, and PURA and TAL2 in leukemia (Grzybowska 2012). IGs have also been associated with neuropathies, such as ECDR1, a cerebellar degeneration-related protein, and NPBWR2, a neuropeptide B/W receptor type (Louhichi, Fourati, and Rebaï 2011). Regarding their role in the diseases described above, IGs in humans are potential clinical biomarkers and drug targets that deserve careful consideration (Grzybowska 2012; Ohki et al. 2000; Liu et al. 2017). Their functional role and their evolutionary conservation in other genomes, however, remains poorly understood. Furthermore, there is a current debate between related theories that place the origin of introns, early or late during the evolutionary history of eukaryotes (Fedorova and Fedorov 2003). With this backdrop, our work aimed to characterize the functional role of mouse IGs and to infer their evolutionary pattern across six additional vertebrate genomes. We have analyzed their expression, particularly during brain development at early embryonic stages, and their potential as transcriptional as well as post-translational modulators. Overall, this study sheds light on the concerted role played by this peculiar group of genes and helps contrast the functional features of intron-containing and intronless genes across vertebrate species and their collective evolutionary roadmaps. bioRxiv preprint doi: https://doi.org/10.1101/2021.01.13.426573; this version posted January 13, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International licenseIntronless.

2021.01.13.426573V1.Full.Pdf

Genomic Correlates of Relationship QTL Involved in Fore- Versus Hind Limb Divergence in Mice

Environmental Influences on Endothelial Gene Expression

A Computational Approach for Defining a Signature of Β-Cell Golgi Stress in Diabetes Mellitus

A Clinicopathological and Molecular Genetic Analysis of Low-Grade Glioma in Adults

Supplementary Table 1: Adhesion Genes Data Set

LETTER Doi:10.1038/Nature09515

Identification of Gene Modules Associated with Warfarin Dosage by a Genome-Wide DNA Methylation Study

Sporadic Autism Exomes Reveal a Highly Interconnected Protein Network of De Novo Mutations

Pathogenetic Subgroups of Multiple Myeloma Patients

Nº Ref Uniprot Proteína Péptidos Identificados Por MS/MS 1 P01024

PCDHB3 Polyclonal Antibody (A01) Cell-Cell Connections

The Consensus Coding Sequences of Human Breast and Colorectal Cancers Tobias Sjöblom,1* Siân Jones,1* Laura D