The Simple Biology of Flipons and Condensates Enhances the Evolution of Complexity
Total Page:16
File Type:pdf, Size:1020Kb
molecules Review The Simple Biology of Flipons and Condensates Enhances the Evolution of Complexity Alan Herbert Unit 3412, Discovery, InsideOutBio 42 8th Street, Charlestown, MA 02129, USA; [email protected] Abstract: The classical genetic code maps nucleotide triplets to amino acids. The associated sequence composition is complex, representing many elaborations during evolution of form and function. Other genomic elements code for the expression and processing of RNA transcripts. However, over 50% of the human genome consists of widely dispersed repetitive sequences. Among these are simple sequence repeats (SSRs), representing a class of flipons, that under physiological conditions, form alternative nucleic acid conformations such as Z-DNA, G4 quartets, I-motifs, and triplexes. Proteins that bind in a structure-specific manner enable the seeding of condensates with the potential to regulate a wide range of biological processes. SSRs also encode the low complexity peptide repeats to patch condensates together, increasing the number of combinations possible. In situations where SSRs are transcribed, SSR-specific, single-stranded binding proteins may further impact condensate formation. Jointly, flipons and patches speed evolution by enhancing the functionality of condensates. Here, the focus is on the selection of SSR flipons and peptide patches that solve for survival under a wide range of environmental contexts, generating complexity with simple parts. Keywords: Z-DNA; Z-RNA; flipons; simple repeats; condensates; G4; evolution; MYC; non-coding RNA; DNA conformation; complexity; phase separation; enhancersome; nucleosome Citation: Herbert, A. The Simple Biology of Flipons and Condensates Enhances the Evolution of Complexity. Molecules 2021, 26, 4881. 1. Starting Simple https://doi.org/10.3390/ The theme of this article is presented in Figure1. Here, a nucleic acid structural molecules26164881 motif is recognized by a structure-specific protein interaction. The nucleic acid acts as a scaffold and the protein as an anchor for cellular machines. Patches on the anchor Academic Editor: Akinori Kuzuya protein provide a docking site for other proteins (Figure1, left panel). The outcome depends on the functions of the assembled proteins, any one of which may have peptide Received: 25 June 2021 patches for the attachment of additional proteins. In the simplest case, both the nucleic Accepted: 10 August 2021 acid structures and the peptide patches are encoded by simple sequence repeats (SSRs) Published: 12 August 2021 (Figure1, right panel). The approach to build complex biological machines is adaptative for ever-changing environments. Publisher’s Note: MDPI stays neutral The scheme exploits the properties of SSRs that enable them to encode alternative with regard to jurisdictional claims in nucleic acid structures, called flipons, to specify simple peptide patches that fold in different published maps and institutional affil- ways and to engage sequence-specific, single-stranded binding proteins that play multiple iations. roles in condensate biology. We will focus in this review on the biology of flipons and peptide patches: how they can initiate and how they can promote condensate formation to optimize responses. We will discuss the role of SSRs in disease and their evolutionary impact. To introduce the concepts, we will start with a description of condensates and their Copyright: © 2021 by the author. role in classical genetics and then move on to SSRs and flipon genetics. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/). Molecules 2021, 26, 4881. https://doi.org/10.3390/molecules26164881 https://www.mdpi.com/journal/molecules Molecules 2021, 26, x FOR PEER REVIEW 2 of 17 Molecules 2021, 26, 4881 2 of 16 Figure 1. Simple nucleic acid structures andand structure-specificstructure-specific proteins proteins with with low-complexity low-complexity patches patches enable enable many many combina- combi- nationstions that that enhance enhance rapid rapid evolution evolution based based on provenon proven functional functional domains. domains. The The left panelleft panel displays displays flipons flipons that formthat form alternative alter- nativenucleic nucleic acid structures acid structures under physiological under physiological conditions conditions and are recognized and are recognized by structure-specific by structure proteins-specif nucleateic proteins condensates nucleate condensates that perform specific functions. The condensates assembled depend on the flipon conformation and protect that perform specific functions. The condensates assembled depend on the flipon conformation and protect the alternative the alternative structures by walling them off to maintain genome integrity. The right panel displays how simple sequence structures by walling them off to maintain genome integrity. The right panel displays how simple sequence repeats impact repeats impact condensate formation through structural motifs, peptide patches (highlighted in yellow), and sequence- specificcondensate interactions formation with through single structural-stranded motifs, binding peptide proteins patches that can (highlighted also alter inthe yellow), condensates and sequence-specific formed. interactions with single-stranded binding proteins that can also alter the condensates formed. 2. Condensates 2. Condensates Condensates are defined as reversible protein assemblies. As the importance of con- Condensates are defined as reversible protein assemblies. As the importance of densates has become better understood, much attention has focused on the way peptide- condensates has become better understood, much attention has focused on the way peptide- patch-dependent assemblies work, especially those interactions involving intrinsically patch-dependent assemblies work, especially those interactions involving intrinsically disordered regions (IDRs) [1,2] and low-complexity domains (LCDs) of proteins, with the disordered regions (IDRs) [1,2] and low-complexity domains (LCDs) of proteins, with the former term applying to structure and the later to sequence [3].The field is advancing fast, former term applying to structure and the later to sequence [3].The field is advancing with the view that multivalent, low-affinity interactions mediated by IDRs allow the con- fast, with the view that multivalent, low-affinity interactions mediated by IDRs allow the centrationconcentration and andretention retention of proteins of proteins once once a acondensate condensate is is nucleated nucleated by by a a h high-affinityigh-affinity event.event. Recent Recent findings findings also also suggest suggest th thatat other other interactions interactions based based on IDR IDRss and LCDs help initiallyinitially localiz localizee high high-affinity-affinity components to their site of action. CondensatesCondensates are are often often discussed discussed in relationship to phase separation, an event where prproteinsoteins partition into discrete compartments. Phase separation has many advantages. It allowsallows concentration concentration of of protein activities within a defineddefined milieu. This localization can affectaffect protein protein function function through through differences differences in in free free water, water, salt salt concentrations, concentrations, lipid lipid profile, profile, effectiveeffective pH pH,, and and other chemical properties properties specific specific to that environment [4] [4].. Initiall Initially,y, the formationformation of of condensates condensates is is nucleated nucleated by by high high-affinity-affinity interactions interactions and and then then extended extended by weakby weak,, multivalent, multivalent, reversible reversible interactions interactions of high of high avidity avidity [5]. Multivalent [5]. Multivalent interactions interactions me- diatedmediated by IDRs by IDRs and andLCDs LCDs are important are important at both at bothstage stagess [6,7]. [IDR6,7]. appear IDR appear essential essential to local- to izinglocalizing transcription transcription factors factors to the to sites the sites where where they they are areobserved observed to act to actin cells in cells [8], [ 8while], while in otherin other examples, examples, LCD LCDss are are essential essential for for seeding seeding condensate condensate formation formation [9]. [ 9With]. With aging aging of condensatesof condensates,, the theplasticity plasticity and andreversible reversible nature nature of their of their formation formation is sometimes is sometimes lost, pro- lost, ducingproducing insoluble insoluble aggregates aggregates similar similar to tothose those observed observed in in many many diseases diseases [10] [10].. The The more generalgeneral consequences consequences of of phase phase separation separation depend depend on on what what else else is is happening happening in in the the cell, cell, particularlyparticularly on the availability of each component and its ability to partition at that time. RNAsRNAs also also influence influence condensate condensate formation formation [11 [11––1414]].. RNAs RNAs can can act act in in a a non non-sequence--sequence- specificspecific manner,manner, withwith interactions interactions often often involving involving repetitive repetitive RNAs RNAs [12 [12,13,13]. For]. For example, example, the thenucleolar nucleolar size size depends depends