The Simple Emergence of Complex Molecular Function
Susanna Manrubia
Department of Systems Biology, National Centre for Biotechnology (CSIC). c/ Darwin 3, 28049 Madrid, Spain
Interdisciplinary Group of Complex Systems (GISC), Madrid, Spain
(Dated: May 26, 2021)
arXiv:2105.11784v1 [q-bio.PE] 25 May 2021

At odds with a traditional view of molecular evolution that seeks a descent-with-modification relationship between functional sequences, new functions can emerge de novo with relative ease. At early times of molecular evolution, random polymers could have sufficed for the appearance of incipient chemical activity, while the cellular environment harbors a myriad of proto-functional molecules. The emergence of function is facilitated by several mechanisms intrinsic to molecular organization, such as redundant mapping of sequences into structures, phenotypic plasticity, modularity, or cooperative associations between genomic sequences. It is the availability of niches in the molecular ecology that filters new potentially functional proposals. New phenotypes and subsequent levels of molecular complexity could be attained through combinatorial explorations of currently available molecular variants. Natural selection does the rest.

I. INTRODUCTION

Half a century ago, the idea that gene specificity could rely on a unique protein sequence raised concerns regarding the coming into being of functional genes. Natural selection would be ineffective if the raw material on which it had to act were random sequences, given that a myriad of universes as old and as large as ours, where random sequences would be systematically generated, appeared absolutely insufficient to produce the tiniest functional molecule [1]. The apparent paradox, whose assumptions echoed intelligent design arguments [2], overlooked the overwhelming number of neutral and quasi-neutral mutations [3–5] and the fact that partial functionality is better than no functionality at all [6]. Actually, molecular complexity might be relatively straightforward to achieve.

Molecular function is highly redundant. The same molecular phenotype can be obtained from an astronomically large number of different genotypes, as revealed by computational and empirical studies of complete genome spaces for short sequences [7–9]. Such genotypes are organized as networks-of-networks [10, 11], a non-trivial topology with important dynamic consequences [11] that also points to the need for an updated metaphor to represent adaptive landscapes [12]. The abundance of phenotypes is not homogeneous in sequence space [8, 13–15], since most sequences are mapped into a small fraction of very large phenotypes. This fraction, however, suffices to guarantee a spectrum of functions that is diverse and efficient enough to sustain life as we know it. The high dimensionality of genotype spaces and the fact that the networks of abundant phenotypes percolate the space of sequences under very general conditions [16] further ensure that different functions may be waiting just a few mutations away [17, 18].

The genotype-to-function map is redundant in several different ways beyond neutral and quasi-neutral mutations. The many layers of expression from genotype to function [19] act towards increasing the ensemble of possible genotypes coding for comparable phenotypes [20, 21]. Further, function is flexible, so phenotypes admit a range of variation, and phenotypes are plastic, so their expression adapts to different environments [22, 23]. Beyond multiple inconsequential variations in genotypes, changes in molecular structure or composition might also be irrelevant for the functionality of a phenotype.

Molecular mimicry [24], protein moonlighting [25, 26] and enzyme promiscuity [27] are three widespread expressions of phenotypic redundancy and functional flexibility. Molecular mimicry was originally defined in the context of immunology: occasionally, self-peptides are sufficiently similar in sequence to be mistaken for pathogen-derived peptides and, consequently, trigger an autoimmune response [24]. The concept, however, can be extended to embrace the many instances where molecules of dissimilar composition come to resemble the structure of others [28]. In protein moonlighting, the same protein may have multiple, context-dependent functions; this property has also been reported for RNA molecules [29]. Enzyme promiscuity is a conceptually related notion that refers to the ability of an enzyme to catalyze reactions other than those for which it was in principle selected. RNA promiscuity naturally appears when structures compatible with a given sequence, but different from the minimum-free-energy secondary structure, are considered [30]. These mechanisms illustrate complementary properties of relevance in evolution: on the one hand, molecules of different origins might perform similar functions; on the other hand, the same molecule can be recruited to perform a different function. Phenotype tolerance to endogenous and exogenous variation implies that genes and, in general, any sort of functional molecule may not need initial adjustments to engage in a secondary function [31]. But, once in place, natural selection can act towards optimization, if needed.

The skewed distribution of phenotype sizes conditions what is visible to evolution [32] but may, under certain circumstances, also facilitate the appearance of simple molecular functions de novo. The emergence of function has in all likelihood been relevant in the origin of an RNA world [33, 34] and in the genesis of simple replicators [35]. Still, single-gene or single-molecule redundancy, and the selective improvement of function through point mutations, represent only the fine-tuning of molecular evolution. Once originated and optimized, small functional sequences might act as the basic bricks of multi-purpose molecules [36] through a modular constructive principle that applies from proteins [37] to organisms [38, 39].

Ensembles of agents that replicate independently (and which, in a certain sense, may act as competitors) can integrate to form a new, more complex entity [40] through what is known as a major evolutionary transition [41]. Many of these transitions are cooperative [42], a paradigmatic example at the molecular level being the emergence of chromosomes. However, there are multiple examples, especially in the viral world, where flexible cooperation, that is, without irreversible fusion of the parts, appears as a successful adaptive strategy. In viral quasispecies, ecological roles can be allocated in different mutant classes [43], and collective cooperation may be needed to maintain pathogenesis [44]. In viruses, horizontal gene transfer (HGT) is not only a common mechanism for adaptation, but probably also a way to generate new viral species [45]. A remarkable form of distributed cooperation is that of multipartite viruses, whose genes are propagated in independent capsids [46, 47]. Viruses are a powerful system for generating new molecular function. Together with other mobile elements, they function as transporters of genomic sequences and may assist their integration in higher organisms. In the coevolution of viruses with hosts, the former may even promote increases in host complexity [48] and spur major evolutionary transitions [49].

In the forthcoming sections, we will focus on specific examples that illustrate how some of the mechanistic principles outlined in this introduction [...]

[...] quantifying the bias in phenotype (structure) abundance. Accordingly, common phenotypes are orders of magnitude more frequent than average-sized or small phenotypes (which are, by any reasonable measure, invisible to evolution). RNA structure is tightly related to molecular function and, as such, it may have played a major role at the early stages of chemical evolution and, especially, in an RNA world predating modern cells [51]. The repertoire of secondary structures in large populations of short RNA polymers is limited [52]: topologically simple RNA modules are abundant. In open RNA chains, there is a predominance of stem–loops (composed of a stem and a hairpin loop) and hairpin structures [53], while closed RNA chains preferentially fold into rod-like structures (a stem closed by two hairpin loops) [54]. This fact rests solely on thermodynamic principles [55] and implies that, by default, short RNA sequences will predominantly yield a handful of structures.

The relevant implication of this high redundancy is that some of those abundant structures might be the end result of random RNA polymerization (as could have been the case in prebiotic environments [56]). With no previous selection, random polymers might thus have covered an array of incipient functions. Such is the case of a variety of hairpin-like structures able to promote ligation reactions [57, 58], or of hammerhead structures involved in cleavage [59]. RNA self-ligation might indeed be instrumental in the modular construction of more complex ribozymes, as theoretically proposed [33] and empirically shown [60, 61].

There are reasons to believe that the severe phenotypic bias of the sequence-to-structure RNA map, as quantified through the log-normal distribution of phenotype sizes, is a general property of genotype-to-phenotype maps [9, 15, 19, 21]. If this is the case, the scenario described for RNA should hold broadly, and apply to other polynucleotides, to peptides, and to polymers at large. The next step in the construction of chemical complexity