KENYATTA UNIVERSITY DEPARTMENT OF BIOCHEMISTRY AND BIOTECHNOLOGY SBC 471: MOLECULAR EVOLUTION AND ADVANCED BIOINFORMATICS FEBRUARY - AUGUST 2018

TOPIC 2: DESIGN OF NEW MOLECULES

(a) Introduction  The structure and properties of new molecules and molecular entities are crucial to studying the basics of underlying modes of action of identified disease targets.  The molecular entities can be small synthetic molecules, but also larger molecular architectures like peptides, proteins, DNA, RNA or membranes.  Molecular design focuses on the discovery of new molecular entities using innovative design strategies, high-throughput analysis and advanced biocatalysis and synthesis strategies.  Virtual screening and diversity-oriented synthetic strategies will facilitate the search in chemical space for appropriate scaffolds and the construction of valuable focused libraries.  Computational rationalization and prediction of biological properties of new drug candidates are based on the molecular structure and dynamics of targets. This results in more focused drug discovery and development efforts.  The design also includes research in the mechanistic approach in drug disposition, drug effects and drug toxicity to assess the efficacy and safety balance of novel molecular entities as potential drugs and therapeutics in selected disease areas.

(b) Prebiotic Chemistry  Prebiotic chemistry refers to inorganic or organic chemistry in the natural environment before the advent of life on Earth. Prebiotic (nutrition), non-digestible food ingredients.

1) The RNA world and the Origins of Life  According to this hypothesis, RNA stored both genetic information and catalyzed the chemical reactions in primitive cells.  Only later in evolutionary time did DNA take over as the genetic material and proteins became the major catalyst and structural component of cells.  Biologists used to view RNA as a lowly messenger — the molecule that carries information from DNA to the protein-building centres of the cell. But discoveries since the early 1980s have shown that RNA can do much more.  In addition to carrying genetic information, RNA can fold up into a complex structure that catalyzes a chemical reaction or binds another molecule, linking up with it in a way that allows the other molecule to be identified, activated, or deactivated.  Once it was discovered that RNA could both carry information and cause chemical reactions (like those that would be required to copy a molecule), RNA became the prime suspect for the earliest self-replicating molecule.  In fact, biologists hypothesize that early in life's history, RNA occupied centre stage and performed most jobs in the cell, storing genetic information, copying itself, and performing basic metabolic functions.  This is the "RNA world" hypothesis. Today, these jobs are performed by many different sorts of molecules (DNA, RNA, and proteins, mostly), but in the RNA world, RNA did it all.  To fully understand the processes occurring in present-day living cells, we need to consider how they arose in evolution.  The most fundamental of all such problems is the expression of hereditary information, which today requires extraordinarily complex machinery and proceeds from DNA to protein through an RNA intermediate.

Page 1 of 26

KENYATTA UNIVERSITY DEPARTMENT OF BIOCHEMISTRY AND BIOTECHNOLOGY SBC 471: MOLECULAR EVOLUTION AND ADVANCED BIOINFORMATICS FEBRUARY - AUGUST 2018

 How did this machinery arise? One view is that an RNA world existed on Earth before modern cells arose (Figure 2.1). According to this hypothesis, RNA stored both genetic information and catalyzed the chemical reactions in primitive cells.  Only later in evolutionary time did DNA take over as the genetic material and proteins become the major catalyst and structural component of cells.  If this idea is correct, then the transition out of the RNA world was never complete; as RNA still catalyzes several fundamental reactions in modern-day cells, which can be viewed as molecular fossils of an earlier world.

Figure 2.1: Time line for the universe, suggesting the early existence of an RNA world of living systems  In support of the RNA world hypothesis, the more surprising features of modern-day cells, such as the ribosome and the pre-mRNA splicing machinery, are most easily explained by viewing them as descendants of a complex network of RNA-mediated interactions that dominated cell metabolism in the RNA world.  We also discuss how DNA may have taken over as the genetic material, how the genetic code may have arisen, and how proteins may have eclipsed RNA to perform the bulk of biochemical catalysis in modern-day cells.

2) Life Requires Autocatalysis  It has been proposed that the first “biological” molecules on Earth were formed by metal- based catalysis on the crystalline surfaces of minerals.  In principle, an elaborate system of molecular synthesis and breakdown (metabolism) could have existed on these surfaces long before the first cells arose.  But life requires molecules that possess a crucial property: the ability to catalyze reactions that lead, directly or indirectly, to the production of more molecules like themselves.  Catalysts with this special self-promoting property can use raw materials to reproduce themselves and thereby divert these same materials from the production of other substances.  But what molecules could have had such autocatalytic properties in early cells? In present-day cells the most versatile catalysts are polypeptides, composed of many different amino acids with chemically diverse side chains and, consequently, able to adopt diverse three- dimensional forms that bristle with reactive chemical groups.  But, although polypeptides are versatile as catalysts, there is no known way in which one such molecule can reproduce itself by directly specifying the formation of another of precisely the same sequence.

3) Polynucleotides can both store information and catalyse chemical reactions  Polynucleotides have one property that contrasts with those of polypeptides: they can directly guide the formation of exact copies of their own sequence.  This capacity depends on complementary base pairing of nucleotide subunits, which enables one polynucleotide to act as a template for the formation of another. Such complementary

Page 2 of 26

KENYATTA UNIVERSITY DEPARTMENT OF BIOCHEMISTRY AND BIOTECHNOLOGY SBC 471: MOLECULAR EVOLUTION AND ADVANCED BIOINFORMATICS FEBRUARY - AUGUST 2018

templating mechanisms lie at the heart of DNA replication and transcription in modern-day cells.  But the efficient synthesis of polynucleotides by such complementary templating mechanisms requires catalysts to promote the polymerization reaction; without catalysts, polymer formation is slow, error-prone, and inefficient.  Today, template-based nucleotide polymerization is rapidly catalyzed by protein enzymes— such as the DNA and RNA polymerases. How could it be catalyzed before proteins with the appropriate enzymatic specificity existed?  The beginnings of an answer to this question were obtained in 1982, when it was discovered that RNA molecules themselves can act as catalysts.  For example, a molecule of RNA is the catalyst for the peptidyl transferase reaction that takes place on the ribosome. The unique potential of RNA molecules to act both as information carrier and as catalyst forms the basis of the RNA world hypothesis.  RNA therefore has all the properties required of a molecule that could catalyze its own synthesis (Figure 2.2). Although self-replicating systems of RNA molecules have not been found in nature, scientists are hopeful that they can be constructed in the laboratory.  While this demonstration would not prove that self-replicating RNA molecules were essential in the origin of life on Earth, it would certainly suggest that such a scenario is possible.

Figure 2.2: An RNA molecule that can catalyze its own synthesis  This hypothetical process in Figure 2.2 would require catalysis of the production of both a second RNA strand of complementary nucleotide sequence and the use of this second RNA molecule as a template to form many molecules of RNA with the original sequence. The projected rays (in red) represent the active site of this hypothetical RNA enzyme.

4) A pre-RNA world probably predates the RNA world  Although RNA seems well suited to form the basis for a self-replicating set of biochemical catalysts, it is unlikely that RNA was the first kind of molecule to do so.  From a purely chemical standpoint, it is difficult to imagine how long RNA molecules could be formed initially by purely nonenzymatic means.  For one thing, the precursors of RNA, the ribonucleotides, are difficult to form non- enzymatically.  Moreover, the formation of RNA requires that a long series of 3′ to 5′ phosphodiester linkages form in the face of a set of competing reactions, including hydrolysis, 2′ to 5′ linkages, 5′ to 5′ linkages, and so on.  Given these problems, it has been suggested that the first molecules to possess both catalytic activity and information storage capabilities may have been polymers that resemble RNA but are chemically simpler (Figure 2.3).

Page 3 of 26

KENYATTA UNIVERSITY DEPARTMENT OF BIOCHEMISTRY AND BIOTECHNOLOGY SBC 471: MOLECULAR EVOLUTION AND ADVANCED BIOINFORMATICS FEBRUARY - AUGUST 2018

 We do not have any remnants of these compounds in present-day cells, nor did such compounds leave fossil records.  Nonetheless, the relative simplicity of these “RNA-like polymers” make them better candidates than RNA itself for the first biopolymers on Earth that had both information storage capacity and catalytic activity.

Figure 2.3: Structures of RNA and two related information-carrying polymers  In each case, B indicates the positions of purine and pyrimidine bases. The polymer p-RNA (pyranosyl-RNA) is RNA in which the furanose (five-membered ring) form of ribose has been replaced by the pyranose (six-membered ring) form.  In PNA (peptide nucleic acid), the ribose phosphate backbone of RNA has been replaced by the peptide backbone found in proteins.  Like RNA, both p-RNA and PNA can form double helices through complementary base- pairing, and each could therefore in principle serve as a template for its own synthesis (see Figure 2.2 above).  The transition between the pre-RNA world and the RNA world would have occurred through the synthesis of RNA using one of these simpler compounds as both template and catalyst.  The plausibility of this scheme is supported by laboratory experiments showing that one of these simpler forms (PNA—see Figure 2.3) can act as a template for the synthesis of complementary RNA molecules, because the overall geometry of the bases is similar in the two molecules.  Presumably, pre-RNA polymers also catalyzed the formation of ribonucleotide precursors from simpler molecules.

Page 4 of 26

KENYATTA UNIVERSITY DEPARTMENT OF BIOCHEMISTRY AND BIOTECHNOLOGY SBC 471: MOLECULAR EVOLUTION AND ADVANCED BIOINFORMATICS FEBRUARY - AUGUST 2018

 Once the first RNA molecules had been produced, they could have diversified gradually to take over the functions originally carried out by the pre-RNA polymers, leading eventually to the postulated RNA world.

5) Single-stranded RNA molecules can fold into highly elaborate structures  Complementary base-pairing and other types of hydrogen bonds can occur between nucleotides in the same chain, causing an RNA molecule to fold up in a unique way determined by its nucleotide sequence.  Comparisons of many RNA structures have revealed conserved motifs, short structural elements that are used over and over again as parts of larger structures. Some of these RNA secondary structural motifs are illustrated in Figure 2.4.

Figure 2.4: Common elements of RNA secondary structure

 In addition, a few common examples of more complex and often longer-range interactions, known as RNA tertiary interactions, are shown in Figure 2.5. Conventional, complementary base-pairing interactions are indicated by “steps” (red) in double-helical portions of the RNA.

Figure 2.5: Examples of RNA tertiary interactions  Some of these interactions can join distant parts of the same RNA molecule or bring two separate RNA molecules together.

Page 5 of 26

KENYATTA UNIVERSITY DEPARTMENT OF BIOCHEMISTRY AND BIOTECHNOLOGY SBC 471: MOLECULAR EVOLUTION AND ADVANCED BIOINFORMATICS FEBRUARY - AUGUST 2018

 Protein catalysts require a surface with unique contours and chemical properties on which a given set of substrates can react. In exactly the same way, an RNA molecule with an appropriately folded shape can serve as an enzyme (Figure 2.6).  Like some proteins, many of these ribozymes work by positioning metal ions at their active sites. This feature gives them a wider range of catalytic activities than can be accounted for solely by the limited chemical groups of the polynucleotide chain.

Figure 2.6  This simple RNA molecule catalyzes the cleavage of a second RNA at a specific site. This ribozyme is found embedded in larger RNA —called viroids—which infect plants.  The cleavage, which occurs in nature at a distant location on the same RNA molecule that contains the ribozyme, is a step in the replication of the viroid . Although not shown in the figure, the reaction requires a molecule of Mg positioned at the active site.

Page 6 of 26

KENYATTA UNIVERSITY DEPARTMENT OF BIOCHEMISTRY AND BIOTECHNOLOGY SBC 471: MOLECULAR EVOLUTION AND ADVANCED BIOINFORMATICS FEBRUARY - AUGUST 2018

 Relatively few catalytic RNAs exist in modern-day cells, however, and much of our inference about the RNA world has come from experiments in which large pools of RNA molecules of random nucleotide sequences are generated in the laboratory.  Those rare RNA molecules with a property specified by the experimenter are then selected out and studied (Figure 2.7).  Experiments of this type have created RNAs that can catalyze a wide variety of biochemical reactions (Table 2.1), and suggest that the main difference between protein enzymes and ribozymes lies in their maximum reaction speed, rather than in the diversity of the reactions that they can catalyze.

Figure 2.7

Page 7 of 26

KENYATTA UNIVERSITY DEPARTMENT OF BIOCHEMISTRY AND BIOTECHNOLOGY SBC 471: MOLECULAR EVOLUTION AND ADVANCED BIOINFORMATICS FEBRUARY - AUGUST 2018

 Beginning with a large pool of nucleic acid molecules synthesized in the laboratory, those rare RNA molecules that possess a specified catalytic activity can be isolated and studied.  Although a specific example (that of an autophosphorylating ribozyme) is shown (Figure 2.7), variations of this procedure have been used to generate many of the ribozymes listed in Table 2.1.  During the autophosphorylation step, the RNA molecules are sufficiently dilute to prevent the “cross”-phosphorylation of additional RNA molecules.  In reality, several repetitions of this procedure are necessary to select the very rare RNA molecules with catalytic activity.  Thus the material initially eluted from the column is converted back into DNA, amplified many fold (using reverse transcriptase and PCR), transcribed back into RNA, and subjected to repeated rounds of selection.

Table 2.1: Some Biochemical Reactions That Can Be Catalyzed by Ribozymes

ACTIVITY RIBOZYMES Peptide bond formation in protein synthesis ribosomal RNA RNA cleavage, RNA ligation self-splicing RNAs; also in vitro selected RNA DNA cleavage self-splicing RNAs RNA splicing self-splicing RNAs, perhaps RNAs of the spliceosome RNA polymerizaton in vitro selected RNA RNA and DNA phosphorylation in vitro selected RNA RNA aminoacylation in vitro selected RNA RNA alkylation in vitro selected RNA Amide bond formation in vitro selected RNA Amide bond cleavage in vitro selected RNA Glycosidic bond formation in vitro selected RNA Porphyrin metalation in vitro selected RNA  Like proteins, RNAs can undergo allosteric conformational changes, either in response to small molecules or to other RNAs.  One artificially created ribozyme can exist in two entirely different conformations, each with a different catalytic activity (Figure 2.8).  Moreover, the structure and function of the rRNAs in the ribosome alone have made it clear that RNA is an enormously versatile molecule. It is therefore easy to imagine that an RNA world could reach a high level of biochemical sophistication.  This 88-nucleotide RNA (Figure 2.8 below), created in the laboratory, can fold into a ribozyme that carries out a self-ligation reaction (left) or a self-cleavage reaction (right).  The ligation reaction forms a 2′,5′ phosphodiester linkage with the release of pyrophosphate.  This reaction seals the gap (gray shading), which was experimentally introduced into the RNA molecule. In the reaction carried out by the (Hepatitis Delta Virus) HDV fold, the RNA is cleaved at this same position, indicated by the arrowhead.  This cleavage resembles that used in the life cycle of HDV, a hepatitis B satellite virus, hence the name of the fold. Each nucleotide is represented by a colored dot, with the colors used simply to clarify the two different folding patterns.

Page 8 of 26

KENYATTA UNIVERSITY DEPARTMENT OF BIOCHEMISTRY AND BIOTECHNOLOGY SBC 471: MOLECULAR EVOLUTION AND ADVANCED BIOINFORMATICS FEBRUARY - AUGUST 2018

Figure 2.8: An RNA molecule that folds into two different ribozymes  The folded structures illustrate the secondary structures of the two ribozymes with regions of base-pairing indicated by close oppositions of the colored dots. Note that the two ribozyme folds have no secondary structure in common. (Adapted from E.A. Schultes and D.P. Bartel, Science 289:448–452, 2000.)

6) Self-Replicating Molecules Undergo Natural Selection  The three-dimensional folded structure of a polynucleotide affects its stability, its actions on other molecules, and its ability to replicate. Therefore, certain polynucleotides will be especially successful in any self-replicating mixture.  Because errors inevitably occur in any copying process, new variant sequences of these polynucleotides will be generated over time.  Certain catalytic activities would have had a cardinal importance in the early evolution of life.  Consider in particular an RNA molecule that helps to catalyze the process of templated polymerization, taking any given RNA molecule as a template. (This ribozyme activity has been directly demonstrated in vitro, albeit in a rudimentary form that can only synthesize moderate lengths of RNA.)  Such a molecule, by acting on copies of itself, can replicate. At the same time, it can promote the replication of other types of RNA molecules in its neighborhood (Figure 2.9).  If some of these neighboring RNAs have catalytic actions that help the survival of RNA in other ways (catalyzing ribonucleotide production, for example), a set of different types of RNA molecules, each specialized for a different activity, may evolve into a cooperative system that replicates with unusually great efficiency.

Page 9 of 26

KENYATTA UNIVERSITY DEPARTMENT OF BIOCHEMISTRY AND BIOTECHNOLOGY SBC 471: MOLECULAR EVOLUTION AND ADVANCED BIOINFORMATICS FEBRUARY - AUGUST 2018

Figure 2. 9: A family of mutually supportive RNA molecules, one catalyzing the reproduction of the others  One of the crucial events leading to the formation of effective self-replicating systems must have been the development of individual compartments.  For example, a set of mutually beneficial RNAs (such as those of Figure 2.9) could replicate themselves only if all the RNAs were to remain in the neighborhood of the RNA that is specialized for templated polymerization.  Moreover, if these RNAs were free to diffuse among a large population of other RNA molecules, they could be co-opted by other replicating systems, which would then compete with the original RNA system for raw materials.  Selection of a set of RNA molecules according to the quality of the self-replicating systems they generated could not occur efficiently until some form of compartment evolved to contain them and thereby make them available only to the RNA that had generated them.  An early, crude form of compartmentalization may have been simple adsorption on surfaces or particles.  The need for more sophisticated types of containment is easily fulfilled by a class of small molecules that has the simple physicochemical property of being amphipathic, that is, consisting of one part that is hydrophobic (water insoluble) and another part that is hydrophilic (water soluble).  When such molecules are placed in water they aggregate, arranging their hydrophobic portions as much in contact with one another as possible and their hydrophilic portions in contact with the water.  Amphipathic molecules of appropriate shape spontaneously aggregate to form bilayers, creating small closed vesicles whose aqueous contents are isolated from the external medium (Figure 2.10).  The phenomenon can be demonstrated in a test tube by simply mixing phospholipids and water together: under appropriate conditions, small vesicles will form.

Page 10 of 26

KENYATTA UNIVERSITY DEPARTMENT OF BIOCHEMISTRY AND BIOTECHNOLOGY SBC 471: MOLECULAR EVOLUTION AND ADVANCED BIOINFORMATICS FEBRUARY - AUGUST 2018

 All present-day cells are surrounded by a plasma membrane consisting of amphipathic molecules—mainly phospholipids. This is discussed further in the origin of cells topic below.

Figure 2.10: Formation of membrane by phospholipids  Because these molecules have hydrophilic heads and lipophilic tails, they align themselves at an oil/water interface with their heads in the water and their tails in the oil.  In the water they associate to form closed bilayer vesicles in which the lipophilic tails are in contact with one another and the hydrophilic heads are exposed to the water.  Presumably, the first membrane-bounded cells were formed by the spontaneous assembly of a set of amphipathic molecules, enclosing a self-replicating mixture of RNA (or pre-RNA) and other molecules.  It is not clear at what point in the evolution of biological catalysts this first occurred. In any case, once RNA molecules were sealed within a closed membrane, they could begin to evolve in earnest as carriers of genetic instructions: they could be selected not merely on the basis of their own structure, but also according to their effect on the other molecules in the same compartment.  The nucleotide sequences of the RNA molecules could now be expressed in the character of a unitary living cell.

7) How did Protein Synthesis Evolve?  The molecular processes underlying protein synthesis in present-day cells seem inextricably complex. Although we understand most of them, they do not make conceptual sense in the way that DNA transcription, DNA repair, and DNA replication do.  It is especially difficult to imagine how protein synthesis evolved because it is now performed by a complex interlocking system of protein and RNA molecules; obviously the proteins could not have existed until an early version of the translation apparatus was already in place.  Although we can only speculate on the origins of protein synthesis and the genetic code, several experimental approaches have provided possible scenarios.  In vitro RNA selection experiments of the type summarized previously in Figure 2.7 have produced RNA molecules that can bind tightly to amino acids.

Page 11 of 26

KENYATTA UNIVERSITY DEPARTMENT OF BIOCHEMISTRY AND BIOTECHNOLOGY SBC 471: MOLECULAR EVOLUTION AND ADVANCED BIOINFORMATICS FEBRUARY - AUGUST 2018

 The nucleotide sequences of these RNAs often contain a disproportionately high frequency of codons for the amino acid that is recognized. For example, RNA molecules that bind selectively to arginine have a majority of Arg codons and those that bind tyrosine have a majority of Tyr codons.  This correlation is not perfect for all the amino acids, and its interpretation is controversial, but it raises the possibility that a limited genetic code could have arisen from the direct association of amino acids with specific sequences of RNA, with RNAs serving as a crude template to direct the non-random polymerization of a few different amino acids.  In the RNA world described previously, any RNA that helped guide the synthesis of a useful polypeptide would have a great advantage in the evolutionary struggle for survival.  In present-day cells, tRNA adaptors are used to match amino acids to codons, and proteins catalyze tRNA aminoacylation. However, ribozymes created in the laboratory can perform specific tRNA aminoacylation reactions, so it is plausible that tRNA-like adaptors could have arisen in an RNA world.  This development would have made the matching of “mRNA” sequences to amino acids more efficient, and it perhaps allowed an increase in the number of amino acids that could be used in templated protein synthesis.  Finally, the efficiency of early forms of protein synthesis would be increased dramatically by the catalysis of peptide bond formation.  This evolutionary development presents no conceptual problem since, as we have seen, this reaction is catalyzed by rRNA in present-day cells.  One can envision a crude peptidyl transferase ribozyme, which, over time, grew larger and acquired the ability to position charged tRNAs accurately on RNA templates—leading eventually to the modern ribosome.  Once protein synthesis evolved, the transition to a protein-dominated world could proceed, with proteins eventually taking over the majority of catalytic and structural tasks because of their greater versatility, with 20 rather than 4 different subunits.

8) All Present-day Cells use DNA as their Hereditary Material  The cells of the RNA world would presumably have been much less complex and less efficient in reproducing themselves than even the simplest present-day cells, since catalysis by RNA molecules is less efficient than that by proteins.  They would have consisted of little more than a simple membrane enclosing a set of self- replicating molecules and a few other components required to provide the materials and energy for their replication.  If the evolutionary speculations about RNA outlined above are correct, these early cells would also have differed fundamentally from the cells we know today in having their hereditary information stored in RNA rather than in DNA (Figure 2.11).

Page 12 of 26

KENYATTA UNIVERSITY DEPARTMENT OF BIOCHEMISTRY AND BIOTECHNOLOGY SBC 471: MOLECULAR EVOLUTION AND ADVANCED BIOINFORMATICS FEBRUARY - AUGUST 2018

Figure 2.11: The hypothesis that RNA preceded DNA and proteins in evolution  In the earliest cells, pre-RNA molecules would have had combined genetic, structural, and catalytic functions and these functions would have gradually been replaced by RNA.  In present-day cells, DNA is the repository of genetic information, and proteins perform the vast majority of catalytic functions in cells.  RNA primarily functions today as a go-between in protein synthesis, although it remains a catalyst for a number of crucial reactions.  Evidence that RNA arose before DNA in evolution can be found in the chemical differences between them.  Ribose, like glucose and other simple carbohydrates, can be formed from formaldehyde (HCHO), a simple chemical which is readily produced in laboratory experiments that attempt to simulate conditions on the primitive Earth.  The sugar deoxyribose is harder to make, and in present-day cells it is produced from ribose in a reaction catalyzed by a protein enzyme, suggesting that ribose predates deoxyribose in cells.  Presumably, DNA appeared on the scene later, but then proved more suitable than RNA as a permanent repository of genetic information.

Page 13 of 26

KENYATTA UNIVERSITY DEPARTMENT OF BIOCHEMISTRY AND BIOTECHNOLOGY SBC 471: MOLECULAR EVOLUTION AND ADVANCED BIOINFORMATICS FEBRUARY - AUGUST 2018

 In particular, the deoxyribose in its sugar-phosphate backbone makes chains of DNA chemically more stable than chains of RNA, so that much greater lengths of DNA can be maintained without breakage.  The other differences between RNA and DNA—the double-helical structure of DNA and the use of thymine rather than uracil—further enhance DNA stability by making the many unavoidable accidents that occur to the molecule much easier to repair, as discussed in detail in Chapter 5 (see pp. 269–272).

9) Summary  From our knowledge of present-day organisms and the molecules they contain, it seems likely that the development of the directly autocatalytic mechanisms fundamental to living systems began with the evolution of families of molecules that could catalyze their own replication.  With time, a family of cooperating RNA catalysts probably developed the ability to direct synthesis of polypeptides.  DNA is likely to have been a late addition: as the accumulation of additional protein catalysts allowed more efficient and complex cells to evolve, the DNA double helix replaced RNA as a more stable molecule for storing the increased amounts of genetic information required by such cells.

(c) Origin of cells  The appearance of the first cells marked the origin of life on Earth. However, before cells could form, the organic molecules must have united with one another to form more complex molecules called polymers. Examples of polymers are polysaccharides and proteins.  In the 1950s, Sidney Fox placed amino acids in primitive Earth conditions and showed that amino acids would unite to form polymers called proteinoids. The proteinoids were apparently able to act as enzymes and catalyze organic reactions.  More recent evidence indicates that RNA molecules have the ability to direct the synthesis of new RNA molecules as well as DNA molecules.  Because DNA provides the genetic code for protein synthesis, it is conceivable that DNA may have formed in the primitive Earth environment as a consequence of RNA activity. Then DNA activity could have led to protein synthesis (see Chapter 10).  For a cell to come into being, some sort of enclosing membrane is required to hold together the organic materials of the cytoplasm.  A generation ago, scientists believed that membranous droplets formed spontaneously. These membranous droplets, called protocells, were presumed to be the first cells.  Modern scientists believe, however, that protocells do not carry any genetic information and lack the internal organization of cells.  Thus the protocell perspective is not widely accepted. Several groups of scientists are currently investigating the synthesis of polypeptides and short nucleic acids on the surface of clay. The origin of the first cells remains a mystery.

1) Evolution of Cells  Cells are divided into two main classes; Prokaryotic cells (bacteria) that lack a nuclear envelope and Eukaryotic cells that have a nucleus in which the genetic material is separated from the cytoplasm.

Page 14 of 26

KENYATTA UNIVERSITY DEPARTMENT OF BIOCHEMISTRY AND BIOTECHNOLOGY SBC 471: MOLECULAR EVOLUTION AND ADVANCED BIOINFORMATICS FEBRUARY - AUGUST 2018

 Prokaryotic cells are generally smaller and simpler than Eukaryotic cells; in addition to the absence of a nucleus, their genomes are less complex and they do not contain cytoplasmic organelles or a cytoskeleton.  In spite of these differences, the same basic molecular mechanisms govern the lives of both prokaryotes and eukaryotes, indicating that all present-day cells are descended from a single primordial ancestor. How did this first cell develop? And how did the complexity and diversity exhibited by present-day cells evolve?

The First cell  It appears that life first emerged at least 3.8 billion years ago, approximately 750 million years after Earth was formed (Figure 2.12).  How life originated and how the first cell came into being are matters of speculation, since these events cannot be reproduced in the laboratory. Nonetheless, several types of experiments provide important evidence bearing on some steps of the process.  It was first suggested in the 1920s that simple organic molecules could form and spontaneously polymerize into macromolecules under the conditions thought to exist in primitive Earth's atmosphere.  At the time life arose, the atmosphere of Earth is thought to have contained little or no free oxygen, instead consisting principally of CO2 and N2 in addition to smaller amounts of gases such as H2, H2S, and CO.  Such an atmosphere provides reducing conditions in which organic molecules, given a source of energy such as sunlight or electrical discharge, can form spontaneously.  The spontaneous formation of organic molecules was first demonstrated experimentally in the 1950s, when Stanley Miller (then a graduate student) showed that the discharge of electric sparks into a mixture of H2, CH4, and NH3, in the presence of water, led to the formation of a variety of organic molecules, including several amino acids (Figure 2.13).  Although Miller's experiments did not precisely reproduce the conditions of primitive Earth, they clearly demonstrated the plausibility of the spontaneous synthesis of organic molecules, providing the basic materials from which the first living organisms arose.

Page 15 of 26

KENYATTA UNIVERSITY DEPARTMENT OF BIOCHEMISTRY AND BIOTECHNOLOGY SBC 471: MOLECULAR EVOLUTION AND ADVANCED BIOINFORMATICS FEBRUARY - AUGUST 2018

Figure 2.12: Time scale of evolution: The scale indicates the approximate times at which some of the major events in the evolution of cells are thought to have occurred.

Page 16 of 26

KENYATTA UNIVERSITY DEPARTMENT OF BIOCHEMISTRY AND BIOTECHNOLOGY SBC 471: MOLECULAR EVOLUTION AND ADVANCED BIOINFORMATICS FEBRUARY - AUGUST 2018

Figure 2.13: Spontaneous formation of organic molecules: Water vapor was refluxed through an atmosphere consisting of CH4, NH3, and H2, into which electric sparks were discharged. Analysis of the reaction products revealed the formation of a variety of organic molecules, including the amino acids alanine, aspartic acid, glutamic acid, and glycine.  The next step in evolution was the formation of macromolecules. The monomeric building blocks of macromolecules have been demonstrated to polymerize spontaneously under plausible prebiotic conditions.  Heating dry mixtures of amino acids, for example, results in their polymerization to form polypeptides.  But the critical characteristic of the macromolecule from which life evolved must have been the ability to replicate itself and further evolution.  A critical step in understanding molecular evolution was thus reached in the early 1980s, when it was discovered in the laboratories of Sid Altman and Tom Cech that RNA is capable of catalyzing a number of chemical reactions, including the polymerization of nucleotides. RNA is thus uniquely able both to serve as a template for and to catalyze its own replication (Figure 2.14).  Consequently, RNA is generally believed to have been the initial genetic system, and an early stage of chemical evolution is thought to have been based on self-replicating RNA molecules—a period of evolution known as the RNA world.

Page 17 of 26

KENYATTA UNIVERSITY DEPARTMENT OF BIOCHEMISTRY AND BIOTECHNOLOGY SBC 471: MOLECULAR EVOLUTION AND ADVANCED BIOINFORMATICS FEBRUARY - AUGUST 2018

 Ordered interactions between RNA and amino acids then evolved into the present-day genetic code, and DNA eventually replaced RNA as the genetic material.

Figure 2.14: Self-replication of RNA: Complementary pairing between nucleotides (adenine [A] with uracil [U] and guanine [G] with cytosine [C]) allows one strand of RNA to serve as a template for the synthesis of a new strand with the complementary sequence.  The first cell is presumed to have arisen by the enclosure of self-replicating RNA in a membrane composed of phospholipids (Figure 2.15). Phospholipids are the basic components of all present-day biological membranes, including the plasma membranes of both prokaryotic and eukaryotic cells.  The key characteristic of the phospholipids that form membranes is that they are amphipathic molecules, meaning that one portion of the molecule is soluble in water and another portion is not.  Phospholipids have long, water-insoluble (hydrophobic) hydrocarbon chains joined to water- soluble (hydrophilic) head groups that contain phosphate.  When placed in water, phospholipids spontaneously aggregate into a bilayer with their phosphate-containing head groups on the outside in contact with water and their hydrocarbon tails in the interior in contact with each other.

Figure 2.15: Enclosure of self-replicating RNA in a phospholipid membrane: The first cell is thought to have arisen by the enclosure of self-replicating RNA and associated molecules in a membrane composed of phospholipids. Each phospholipid molecule has two long hydrophobic tails attached to a hydrophilic head group. The hydrophobic tails are buried in the lipid bilayer; the hydrophilic heads are exposed to water on both sides of the membrane.

Page 18 of 26

KENYATTA UNIVERSITY DEPARTMENT OF BIOCHEMISTRY AND BIOTECHNOLOGY SBC 471: MOLECULAR EVOLUTION AND ADVANCED BIOINFORMATICS FEBRUARY - AUGUST 2018

 Such a phospholipid bilayer forms a stable barrier between two aqueous compartments—for example, separating the interior of the cell from its external environment.  The enclosure of self-replicating RNA and associated molecules in a phospholipid membrane would thus have maintained them as a unit, capable of self-reproduction and further evolution.  RNA-directed protein synthesis may already have evolved by this time, in which case the first cell would have consisted of self-replicating RNA and its encoded proteins.

(d) The Selfish Gene  The Selfish Gene is a 1976 book on evolution by Richard Dawkins, in which the author builds upon the principal theory of George C. Williams's Adaptation and Natural Selection (1966).  Dawkins uses the term "selfish gene" as a way of expressing the gene-centred view of evolution as opposed to the views focused on the organism and the group, popularising ideas developed during the 1960s by W. D. Hamilton and others.  From the gene-centred view, it follows that the more two individuals are genetically related, the more sense (at the level of the genes) it makes for them to behave selflessly with each other.  A lineage is expected to evolve to maximise its inclusive fitness—the number of copies of its genes passed on globally (rather than by a particular individual). As a result, populations will tend towards an evolutionarily stable strategy.  The book also coins the term meme for a unit of human cultural evolution analogous to the gene, suggesting that such "selfish" replication may also model human culture, in a different sense.  Memetics has become the subject of many studies since the publication of the book. In raising awareness of Hamilton's ideas, as well as making its own valuable contributions to the field, the book has also stimulated research on human inclusive fitness.  In the foreword to the book's 30th-anniversary edition, Dawkins said he "can readily see that [the book's title] might give an inadequate impression of its contents" and in retrospect thinks he should have taken Tom Maschler's advice and called the book The Immortal Gene.  In July 2017, The Selfish Gene was listed as the most influential science book of all time in a poll to celebrate the 30th anniversary of the Royal Society science book prize.  Dawkins builds upon George C. Williams's book Adaptation and Natural Selection (1966), which argued that altruism is not based upon group benefit per se, but is a result of selection that occurs "at the level of the gene mediated by the phenotype" and any selection at the group level occurred only under rare circumstances.  This approach was developed further during the 1960s by W. D. Hamilton and others who opposed group selection and selection aimed directly at benefit to the individual organism.  Dawkins considers DNA’s role in evolution and its organisation into chromosomes and genes, which in his view behave selfishly.

Summary  Dawkins begins by discussing the altruism that people display, indicating that he will argue it is explained by gene selfishness, and attacking group selection as an explanation. He considers the origin of life with the arrival of molecules able to replicate themselves.  From there, he looks at DNA's role in evolution, and its organisation into chromosomes and genes, which in his view behave selfishly.

Page 19 of 26

KENYATTA UNIVERSITY DEPARTMENT OF BIOCHEMISTRY AND BIOTECHNOLOGY SBC 471: MOLECULAR EVOLUTION AND ADVANCED BIOINFORMATICS FEBRUARY - AUGUST 2018

 He describes organisms as apparently purposive but fundamentally simple survival machines, which use negative feedback to achieve control. This extends, he argues, to the brain's ability to simulate the world with subjective consciousness, and signalling between species.  He then introduces the idea of the evolutionarily stable strategy, and uses it to explain why alternative competitive strategies like bullying and retaliating exist. This allows him to consider what selfishness in a gene might actually mean, describing W. D. Hamilton's argument for kin selection, that genes for behaviour that improves the survival chances of close relatives can spread in a population, because those relatives carry the same genes.  He examines childbearing and raising children as evolutionary strategies. He attacks the idea of group selection for the good of the species as proposed by V. C. Wynne-Edwards, arguing instead that each parent necessarily behaves selfishly.  A question is whether parents should invest in their offspring equally or should favour some of them, and explains that what is best for the survival of the parents' genes is not always best for individual children.  Similarly, Dawkins argues, there are conflicts of interest between males and females, but he notes that R. A. Fisher showed that the optimal sex ratio is 50:50.  He explains that this is true even in an extreme case like the harem-keeping elephant seal, where 4% of the males get 88% of copulations.  In that case, the strategy of having a female offspring is safe, as she'll have a pup, but the strategy of having a male can bring a large return (dozens of pups), even though many males live out their lives as bachelors.  Amotz Zahavi's theory of honest signalling explains stotting as a selfish act, he argues, improving the springbok's chances of escaping from a predator by indicating how difficult the chase would be.  Dawkins discusses why many species live in groups, achieving mutual benefits through mechanisms such as Hamilton's selfish herd model: each individual behaves selfishly but the result is herd behaviour.  Altruism too can evolve, as in the social insects such as ants and bees, where workers give up the right to reproduce in favour of a sister the queen; in their case, the unusual (haplodiploid) system of sex determination may have helped to bring this about, as females in a nest are exceptionally closely related.  The final chapter of the first edition introduced the idea of the meme, a culturally-transmitted entity such as a hummable tune, by analogy to genetic transmission.  Dawkins describes God as an old idea which probably arose many times, and which has sufficient psychological appeal to survive effectively in the meme pool. The second edition (1989) added two more chapters.

(e) Molecular innovations 1) Early timeline  1856-1863: Mendel studied the inheritance of traits between generations based on experiments involving garden pea plants. He deduced that there is a certain tangible essence that is passed on between generations from both parents. Mendel established the basic principles of inheritance, namely, the principles of dominance, independent assortment, and segregation.  1866: Austrian Augustinian monk Gregor Mendel's paper, Experiments on Plant Hybridization, published.

Page 20 of 26

KENYATTA UNIVERSITY DEPARTMENT OF BIOCHEMISTRY AND BIOTECHNOLOGY SBC 471: MOLECULAR EVOLUTION AND ADVANCED BIOINFORMATICS FEBRUARY - AUGUST 2018

 1869: Friedrich Miescher discovers a weak acid in the nuclei of white blood cells that today we call DNA. In 1871 he isolated cell nuclei, separated the nucleic cells from bandages and then treated them with pepsin (an enzyme which breaks down proteins). From this, he recovered an acidic substance which he called "nuclein."  1880-1890: Walther Flemming, Eduard Strasburger, and Edouard Van Beneden elucidate chromosome distribution during cell division  1889: Richard Altmann purified protein free DNA. However, the nucleic acid was not as pure as he had assumed. It was determined later to contain a large amount of protein.  1889: Hugo de Vries postulates that "inheritance of specific traits in organisms comes in particles", naming such particles "(pan)genes"  1902: Archibald Garrod discovered inborn errors of metabolism. An explanation for epistasis is an important manifestation of Garrod’s research, albeit indirectly. When Garrod studied alkaptonuria, a disorder that makes urine quickly turn black due to the presence of gentesate, he noticed that it was prevalent among populations whose parents were closely related.  1903: Walter Sutton and Theodor Boveri independently hypothesizes that chromosomes, which segregate in a Mendelian fashion, are hereditary units; see the chromosome theory. Boveri was studying sea urchins when he found that all the chromosomes in the sea urchins had to be present for proper embryonic development to take place. Sutton's work with grasshoppers showed that chromosomes occur in matched pairs of maternal and paternal chromosomes which separate during meiosis. He concluded that this could be "the physical basis of the Mendelian law of heredity."  1905: William Bateson coins the term "genetics" in a letter to Adam Sedgwick (Zoologist) and at a meeting in 1906.  1908: G.H. Hardy and Wilhelm Weinberg proposed the Hardy-Weinberg equilibrium model which describes the frequencies of alleles in the gene pool of a population, which are under certain specific conditions, as constant and at a state of equilibrium from generation to generation unless specific disturbing influences are introduced.  1910: Thomas Hunt Morgan shows that genes reside on chromosomes while determining the nature of sex-linked traits by studying Drosophila melanogaster. He determined that the white-eyed mutant was sex-linked based on Mendelian's principles of segregation and independent assortment.  1911: Alfred Sturtevant, one of Morgan's students, invented the procedure of linkage mapping which is based on the frequency of recombination.  1913: Alfred Sturtevant makes the first genetic map of a chromosome.  1913: Gene maps show chromosomes containing linear arranged genes.  1918: Ronald Fisher publishes "The Correlation Between Relatives on the Supposition of Mendelian Inheritance" the modern synthesis of genetics and evolutionary biology starts.  1920: Lysenkoism Started, during Lysenkoism they stated that the hereditary factor are not only in the nucleus, but also in the cytoplasm, though they called it living protoplasm.  1923: Frederick Griffith studied bacterial transformation and observed that DNA carries genes responsible for pathogenicity.  1928: Frederick Griffith discovers that hereditary material from dead bacteria can be incorporated into live bacteria.  1930s–1950s: Joachim Hämmerling conducted experiments with Acetabularia in which he began to distinguish the contributions of the nucleus and the cytoplasm substances (later discovered to be DNA and mRNA, respectively) to cell morphogenesis and development.

Page 21 of 26

KENYATTA UNIVERSITY DEPARTMENT OF BIOCHEMISTRY AND BIOTECHNOLOGY SBC 471: MOLECULAR EVOLUTION AND ADVANCED BIOINFORMATICS FEBRUARY - AUGUST 2018

 1931: Crossing over is identified as the cause of recombination; the first cytological demonstration of this crossing over was performed by Barbara McClintock and Harriet Creighton  1933: Jean Brachet, while studying virgin sea urchin eggs, suggested that DNA is found in cell nucleus and that RNA is present exclusively in the cytoplasm. At the time, "yeast nucleic acid" (RNA) was thought to occur only in plants, while "thymus nucleic acid" (DNA) only in animals. The latter was thought to be a tetramer, with the function of buffering cellular pH.  1933: Thomas Morgan received the Nobel prize for linkage mapping. His work elucidated the role played by the chromosome in heredity.  1941: Edward Lawrie Tatum and George Wells Beadle show that genes code for proteins.  1943: Luria–Delbrück experiment: this experiment showed that genetic mutations conferring resistance to arise in the absence of selection, rather than being a response to selection.

2) The DNA era  1944: The Avery–MacLeod–McCarty experiment isolates DNA as the genetic material (at that time called transforming principle).  1947: Salvador Luria discovers reactivation of irradiated phage, stimulating numerous further studies of DNA repair processes in bacteriophage, and other organisms, including humans.  1948: Barbara McClintock discovers transposons in maize.  1950: Erwin Chargaff determined the pairing method of nitrogenous bases. Chargaff and his team studied the DNA from multiple organisms and found three things (also known as Chargaff's rules). First, the concentration of the pyrimidines (guanine and adenine) are always found in the same amount as one another. Second, the concentration of purines (cytosine and thymidine) are also always the same. Lastly, Chargaff and his team found the proportion of pyrimidines and purines correspond each other.  1952: The Hershey–Chase experiment proves the genetic information of phages (and, by implication, all other organisms) to be DNA.  1952: an X-ray diffraction image of DNA taken by Raymond Gosling in May 1952, a student supervised by Rosalind Franklin.  1953: DNA structure is resolved to be a double helix by Rosalind Franklin, James Watson and Francis Crick.  1955: Alexander R. Todd determined the chemical makeup of nitrogenous bases. Todd also successfully synthesized adenosine triphosphate (ATP) and flavin adenine dinucleotide (FAD) . He was awarded the Nobel prize in Chemistry in 1957 for his contributions in the scientific knowledge of nucleotides and nucleotide co-enzymes.  1955: Joe Hin Tjio, while working in Albert Levan's lab, determined the number of chromosomes in humans to be of 46. Tjio was attempting to refine an established technique to separate chromosomes onto glass slides by conducting a study of human embryonic lung tissue, when he saw that there were 46 chromosomes rather than 48. This revolutionized the world of cytogenetics.  1957: Arthur Kornberg with Severo Ochoa synthesized DNA in a test tube after discovering the means by which DNA is duplicated . DNA polymerase 1 established requirements for in vitro synthesis of DNA. Kornberg and Ochoa were awarded the Nobel Prize in 1959 for this work.  1957/1958: Robert W. Holley, Marshall Nirenberg, proposed the nucleotide sequence of the tRNA molecule. Francis Crick had proposed the requirement of

Page 22 of 26

KENYATTA UNIVERSITY DEPARTMENT OF BIOCHEMISTRY AND BIOTECHNOLOGY SBC 471: MOLECULAR EVOLUTION AND ADVANCED BIOINFORMATICS FEBRUARY - AUGUST 2018

some kind of adapter molecule and it was soon identified by Holey, Nirenberg and Khorana. These scientists help explain the link between a messenger RNA nucleotide sequence and a polypeptide sequence. In the experiment, they purified tRNAs from yeast cells and were awarded the Nobel prize in 1968.  1958: The Meselson–Stahl experiment demonstrates that DNA is semiconservatively replicated.  1960: Jacob and collaborators discover the operon, a group of genes whose expression is coordinated by an operator.  1961: Francis Crick and Sydney Brenner discovered frame shift mutations. In the experiment, proflavin-induced mutations of the T4 bacteriophage gene (rIIB) were isolated. Proflavin causes mutations by inserting itself between DNA bases, typically resulting in insertion or deletion of a single base pair. The mutants could not produce functional rIIB protein. These mutations were used to demonstrate that three sequential bases of the rIIB gene’s DNA specify each successive amino acid of the encoded protein. Thus the genetic code is a triplet code, where each triplet (called a codon) specifies a particular amino acid.  1961: Sydney Brenner, Francois Jacob and Matthew Meselson identified the function of messenger RNA.  1961-1967: Combined efforts of scientists "crack" the genetic code, including Marshall Nirenberg, Har Gobind Khorana, Sydney Brenner & Francis Crick.  1964: Howard Temin showed using RNA viruses that the direction of DNA to RNA transcription can be reversed 1964: Lysenkoism Ended.  1966: Marshall W. Nirenberg, Philip Leder, Har Gobind Khorana cracked the genetic code by using RNA homopolymer and heteropolymer experiments, through which they figured out which triplets of RNA were translated into what amino acids in yeast cells.  1969: Molecular hybridization of radioactive DNA to the DNA of cytological preparation. By Pardue, M. L. and Gall, J. G.  1970: Restriction enzymes were discovered in studies of a bacterium, Haemophilus influenzae, by Hamilton O. Smith and Daniel Nathans, enabling scientists to cut and paste DNA.  1972: Stanley Norman Cohen and Herbert Boyer at UCSF and Stanford University constructed Recombinant DNA which can be formed by using restriction Endonuclease to cleave the DNA and DNA ligase to reattach the "sticky ends" into a bacterial plasmid.

3) The era  1972: Walter Fiers and his team were the first to determine the sequence of a gene: the gene for bacteriophage MS2 coat protein.  1976: Walter Fiers and his team determine the complete nucleotide-sequence of bacteriophage MS2-RNA  1976: Yeast genes expressed in E. coli for the first time.  1977: DNA is sequenced for the first time by Fred Sanger, Walter Gilbert, and Allan Maxam working independently. Sanger's lab sequence the entire genome of bacteriophage Φ-X174.  In the late 1970s: nonisotopic methods of nucleic acid labeling were developed. The subsequent improvements in the detection of reporter molecules using immunocytochemistry and immunofluorescence,in conjunction with advances in fluorescence microscopy and image analysis, have made the technique safer, faster and reliable  1980: Paul Berg, Walter Gilbert and developed methods of mapping the structure of DNA. In 1972, recombinant DNA molecules were produced in Paul Berg’s

Page 23 of 26

KENYATTA UNIVERSITY DEPARTMENT OF BIOCHEMISTRY AND BIOTECHNOLOGY SBC 471: MOLECULAR EVOLUTION AND ADVANCED BIOINFORMATICS FEBRUARY - AUGUST 2018

Stanford University laboratory. Berg was awarded the 1980 Nobel Prize in Chemistry for constructing recombinant DNA molecules that contained phage lambda genes inserted into the small circular DNA mol.  1980: Stanley Norman Cohen and Herbert Boyer received first U.S. patent for gene cloning, by proving the successful outcome of cloning a plasmid and expressing a foreign gene in bacteria to produce a "protein foreign to a unicellular organism." These two scientist were able to replicate proteins such as HGH, Erythropoietin and Insulin. The patent earned about $300 million in licensing royalties for Stanford.  1982: The U.S. Food and Drug Administration (FDA) approved the release of the first genetically engineered human insulin, originally biosynthesized using recombination DNA methods by Genentech in 1978. Once approved, the cloning process lead to mass production of humulin (under license by Eli Lilly & Co.).  1983: Kary Banks Mullis invents the polymerase chain reaction enabling the easy amplification of DNA.  1983: Barbara McClintock was awarded the Nobel Prize in Physiology or Medicine for her discovery of mobile genetic elements. McClintock studied transposon-mediated mutation and chromosome breakage in maize and published her first report in 1948 on transposable elements or transposons. She found that transposons were widely observed in corn, although her ideas weren't widely granted attention until the 1960s and 1970s when the same phenomenon was discovered in bacteria and Drosophila melanogaster.  1985: Alec Jeffreys announced DNA fingerprinting method. Jeffreys was studying DNA variation and the evolution of gene families in order to understand disease causing genes. In an attempt to develop a process to isolate many mini-satellites at once using chemical probes, Jeffreys took x-ray films of the DNA for examination and noticed that mini-satellite regions differ greatly from one person to another. In a DNA fingerprinting technique, a DNA sample is digested by treatment with specific nucleases or Restriction endonuclease and then the fragments are separated by electrophoresis producing a template distinct to each individual banding pattern of the gel.  1986: Jeremy Nathans found genes for color vision and color blindness, working with David Hogness, Douglas Vollrath and Ron Davis as they were studying the complexity of the retina.  1987: Yoshizumi Ishino accidentally discovers and describes part of a DNA sequence which later will be called CRISPR.  1989: Thomas Cech discovered that RNA can catalyze chemical reactions, making for one of the most important breakthroughs in molecular genetics, because it elucidates the true function of poorly understood segments of DNA.  1989: The human gene that encodes the CFTR protein was sequenced by Francis Collins and Lap-Chee Tsui. Defects in this gene cause cystic fibrosis.  1992: American and British scientists unveiled a technique for testing embryos in-vitro (Amniocentesis) for genetic abnormalities such as Cystic fibrosis and Hemophilia.  1993: Phillip Allen Sharp and Richard Roberts awarded the Nobel Prize for the discovery that genes in DNA are made up of introns and exons. According to their findings not all the nucleotides on the RNA strand (product of DNA transcription) are used in the translation process. The intervening sequences in the RNA strand are first spliced out so that only the RNA segment left behind after splicing would be translated to polypeptides.  1994: The first breast cancer gene is discovered. BRCA I, was discovered by researchers at the King laboratory at UC Berkeley in 1990 but was first cloned in 1994. BRCA II, the second

Page 24 of 26

KENYATTA UNIVERSITY DEPARTMENT OF BIOCHEMISTRY AND BIOTECHNOLOGY SBC 471: MOLECULAR EVOLUTION AND ADVANCED BIOINFORMATICS FEBRUARY - AUGUST 2018

key gene in the manifestation of breast cancer was discovered later in 1994 by Professor Michael Stratton and Dr. Richard Wooster.  1995: The genome of bacterium Haemophilus influenzae is the first genome of a free living organism to be sequenced.  1996: Saccharomyces cerevisiae , a yeast species, is the first eukaryote genome sequence to be released.  1996: Alexander Rich discovered the Z-DNA, a type of DNA which is in a transient state, that is in some cases associated with DNA transcription. The Z-DNA form is more likely to occur in regions of DNA rich in cytosine and guanine with high salt concentrations.  1997: Dolly the sheep was cloned by Ian Wilmut and colleagues from the Roslin Institute in Scotland.  1998: The first genome sequence for a multicellular eukaryote, Caenorhabditis elegans, is released.  2000: The full genome sequence of Drosophila melanogaster is completed.  2001: First draft sequences of the human genome are released simultaneously by the Human Genome Project and Celera Genomics.  2001: Francisco Mojica and Rudd Jansen propose the acronym CRISPR to describe a family of bacterial DNA sequences that can be used to specifically change genes within organisms.  2003 (14 April): Successful completion of Human Genome Project with 99% of the genome sequenced to a 99.99% accuracy.  2004: Merck introduced a vaccine for Human Papillomavirus which promised to protect women against infection with HPV 16 and 18, which inactivates tumor suppressor genes and together cause 70% of cervical cancers.  2007: Michael Worobey traced the evolutionary origins of HIV by analyzing its genetic mutations, which revealed that HIV infections had occurred in the United States as early as the 1960s.  2007: Timothy Ray Brown becomes the first person cured from HIV/AIDS through a Hematopoietic stem cell transplantation.  2008: Houston-based Introgen developed Advexin (FDA Approval pending), the first gene therapy for cancer and Li-Fraumeni syndrome, utilizing a form of Adenovirus to carry a replacement gene coding for the p53 protein.  2010: transcription activator-like effector nucleases (or TALENs) are first used to cut specific sequences of DNA.  2016: A genome is sequenced in outer space for the first time, with NASA astronaut Kate Rubins using a MinION device aboard the International Space Station.

4) Clinical Applications of Molecular Genetic Discoveries 1. Genetic testing

 Best applied in familial setting and through trio or cascade screening.  Best applied in single gene disorders with Mendelian patterns of inheritance.  Preclinical diagnosis in familial setting enabling interventions to reduce the risk.  Unambiguous assignment of causality is almost impossible in a single individual in most situations.  Limited applications in complex phenotypes, whether as a single variant or as a genetic risk score.

Page 25 of 26

KENYATTA UNIVERSITY DEPARTMENT OF BIOCHEMISTRY AND BIOTECHNOLOGY SBC 471: MOLECULAR EVOLUTION AND ADVANCED BIOINFORMATICS FEBRUARY - AUGUST 2018

2. Accurate diagnosis and disease classification

 Distinction of the disease from phenocopy conditions, which enables implementation of proper treatment.  Possible genetic-based categorization of the disease, which might enable rendering appropriate therapy.

3. Prognostication  With few exceptions, limited impact beyond the clinical means and often unreliable.

4. Response to therapy

 Early applications in cancer therapy.  Potential for individualized therapy and prevention of side effects, albeit not proven to be superior to conventional medical approaches.

5. Insight into fundamental mechanism(s) of disease  This is likely the primacy of the genetic data, enabling discovery of new mechanisms and targets for prevention and treatment.

Page 26 of 26