Ancestral Sequence Reconstruction As a Tool to Understand Natural History and Guide Synthetic Biology: Realizing and Extending the Vision of Zuckerkandl and Pauling
Total Page:16
File Type:pdf, Size:1020Kb
CHAPTER 2 Ancestral sequence reconstruction as a tool to understand natural history and guide synthetic biology: realizing and extending the vision of Zuckerkandl and Pauling Eric A. Gaucher 2.1 Historical context opportunity to ‘‘replay’’ the molecular tape of life The recent accumulation of DNA sequence data, (Gould, 1989). combined with advances in evolutionary theory It is probably appropriate that any discussion of and computational power, have paved the way for ancestral sequence reconstruction be placed within innovative approaches to understanding the ori- a historical framework. By doing so in this chapter, gins, evolution, and distribution of life and its con- I hope to convey how the field has made recent stituent biomolecules (Pauling and Zuckerkandl, advances, and emphasize the need to continue this 1963; Benner et al., 2002; Gaucher et al., 2004). progress so that our concepts enjoy wider recog- One approach to understanding ancestral states nition and create greater impact than the achieve- follows a present-day-backwards strategy, where- ments to date. The chapter will begin with a brief by genomic sequences from extant (modern) account of the field during the 1990s. I will con- organisms are incorporated into evolutionary tinue with a discussion on one of the resurrected models that estimate the extinct (ancient) character gene families that has been studied, and then states of genes no longer present on Earth (Fitch, conclude with a discussion on future directions in 1971; Shih et al., 1993; Benner, 1995; Koshi and the field, with a particular emphasis on evolu- Goldstein, 1996; Schultz et al., 1996; Cunningham, tionary synthetic biology. 1999; Omland, 1999; Pagel, 1999; Schultz and In 1963, Emile Zuckerkandl and Linus Pauling Churchill, 1999; Chang and Donoghue, 2000; published an intriguing article entitled ‘‘Chemical Thornton, 2004; Hall, 2006). These inferred ances- paleogenetics molecular restoration studies of tral gene sequences act as hypotheses that can be extinct forms of life’’ (Pauling and Zuckerkandl, tested in the laboratory through the resurrection of 1963). In it, they put forward the notion of recon- the ancestral proteins themselves (paleogenetics). structing amino acid sequences of ancestral pro- Results from functional assays of the protein prod- teins by virtue of a comparison between sequences ucts from these ancient genes permit us to accept of related proteins found in contemporary organ- or reject null hypotheses about the sequences isms and subsequent synthesis (and thereby res- themselves, or about their interactions, bind- urrection) of these sequences in the laboratory. ing specificities, environments, etc. And beyond While limits in technology prohibited the actual narratives relating ancient phenotypes and resurrection, Zuckerkandl and Pauling presented a environments, paleogenetics provides the unique sequence reconstruction of ancient mammalian 20 EXTENDING ZUCKERKANDL AND PAULING’S VISION 21 hemoglobins. The duo then suggested that a future simulations and experimental phylogenetics were resurrection of ancient hemoglobins assayed for used to differentiate accuracy and consistency dioxygen affinity and pH dependence would between distance-based (e.g. neighbor-joining) generate higher-order inferences of biological and character-based (e.g. parsimony, likelihood, interpretation. Or, more broadly, the joining of and Bayesian) approaches to phylogenetics. Fur- chemical, biological, and structural models to ther, several discussions about reconstructing natural history would provide a more accurate ancestral character states were also appearing in description of macromolecular behavior beyond the literature during this time. For instance, that supplied by studying individual molecules a symposium lead by Cunningham, Oakley, disconnected from the selective forces governing Omland, and others on behalf of the Society of their evolution (this approach is also termed Systematic Biology focused on this topic (Omland, planetary biology). 1999). Meanwhile, Pagel, Goldstein, Yang, It is important to note the significance of and others were independently developing max- Zuckerkandl and Pauling’s proposal. Molecular imum-likelihood approaches based on Bayesian reconstructions and experimental resurrections are statistics (called ML or empirical Bayesian recon- fundamentally exercises in theory/computation/ struction; Yang et al., 1995; Koshi and Goldstein, algorithmic development and technological 1996; Pupko et al., 2000). These authors advocated advances, respectively. Proposing these approa- the use of maximum likelihood over parsimony for ches would have been sufficiently significant. But inferring ancestral states (see Pagel et al., 2004, and Zuckerkandl and Pauling realized the potential of Schluter, 1995, for a criticism of previous work on creating a paradigm that attempted to connect resurrected proteins). Similarly, Ronquist, Huel- chemical structure to natural selection at the senbeck, Schultz, and others were developing molecular, cellular, organismal, population, and hierarchical Bayesian algorithms for character- planetary levels. state inferences (Schultz and Churchill, 1999; The first examples of molecular resurrections Huelsenbeck and Bollback, 2001; Ronquist, 2004). were performed on artiodactyl pancreatic ribonucle- A fuller discussion of methods for ancestral ases and lysozymes (Malcolm et al., 1990; Stack- sequence reconstruction is presented in Chapters house et al., 1990; Jermann et al., 1995; see Chapters 4–9 in this volume. 1 and 18 in this volume for further discussion). The Regardless of the growing literature, it remained former study was particularly notable in that it not unclear at the time which approach would best only put Zuckerkandl and Pauling’s theory into guide an experimental design of ancestral practice, it did so in a manner that connected sequences. Maximum likelihood’s superior per- chemical reactivity (RNA hydrolysis), molecular formance under certain models of evolution was biology (single- versus double-stranded RNA- apparent from numerous simulation studies and binding affinities), organismal evolution (bacterial laboratory evolution experiments with viruses fermentation and foregut digestion in ruminants), (Bull et al., 1993; Zhang and Nei, 1997). Hier- and planetary biology (diversification of grasses archical Bayesian methods could theoretically during the Oligocene cooling, and the ability of outperform likelihood approaches, although only artiodactyls to digest and extract nutrients from Yang’s empirical approach was available to us at these grasses via bacterial fermentation). These the time. Ideally, a hierarchical Bayesian approach connections were the first to exemplify Zuck- accounts for uncertainty from phylogenetic hypo- erkandl and Pauling’s proposed paradigm, and by theses (these include uncertainty in the topology, the late 1990s we were intent on extending the branch lengths, and any other parameter esti- paradigm as far back in history as possible— mates). A heuristic work-around, however, could ideally to the earliest life forms on Earth. take advantage of an empirical Bayesian approach Successful studies on ancestral reconstruc- in which alternative models, parameters, and tions require a sufficient base of knowledge in topologies could be analyzed separately. The out- evolutionary theory and models. Computational put from these separate analyses could then be 22 ANCESTRAL SEQUENCE RECONSTRUCTION combined with the expectation of resolving Recent discussions over which of these two uncertainty or bias arising from any of the individ- sequences introduces less bias during the recon- ual analyses alone. struction procedure have resulted in lively The heuristic hierarchical Bayesian approach exchanges, and will undoubtedly continue to do so was initially applied to the alcohol dehydrogenase for the next few years. The center issue remains (ADH) gene family (the same approach was later unresolved: under what phylogenetic and evolu- applied to elongation factors (EFs) and seminal tionary conditions will the two sequences be ribonucleases). Ancestral character states were biased and thus no longer able to capture the true inferred from the ADH phylogeny using multiple ancestral behavior/function? To what extent do models each for DNA-, codon-, and amino acid- sequence divergence, episodes of adaptive evolu- based evolution, all for two competing topologies tion (e.g. high nonsynonymous/synonymous (Thomson et al., 2005). The results from these ratios or heterotachy), heterogeneous processes, separate analyses required that 12 putative ances- etc., lead to bias during the reconstruction process? tral genes be synthesized to account for the dif- For instance, consider an ancient gene- ferences (or uncertainty) at individual sites among duplication event resulting in neofunctionalization the various analyses. Although this approach did whereby the novel behavior is determined by not eliminate all uncertainty and bias associated amino acid replacements at two sites only. Each of with the ancestral reconstruction methods, it was the two sites experiences an alanine-to-serine an improvement in the ability to capture the true replacement along the branch leading to the new ancestral state. function. Reconstructing the ancestral sequence at During our gene synthesis experiments of the node representing the common ancestor of the ancestral ADHs, Chang and colleagues had for- paralogs infers alanine at 80% and serine at 20%. mulated a similar heuristic