
An Introduction to Synthetic Biology and iGEM

UNBC-Canada iGEM 2017

TABLE OF CONTENTS

1. INTRO TO BACTERIAL GENETICS & MICROBIOLOGY
2. PROTEINS & ENZYMES
3. ANTIBIOTICS & ANTIBIOTIC RESISTANCE
4. NUCLEIC ACIDS & PLASMID DESIGN
5. TRANSFORMATION, TRANSDUCTION & TRANSFECTION
6. DIGESTION & LIGATION REACTIONS
7. PCR & AGAROSE GELS
8. CRISPR, GENE EDITING, & GENE SILENCING TECHNOLOGY
9. DNA SEQUENCING TECHNOLOGIES
10. APPLICATIONS OF SYNTHETIC BIOLOGY & IGEM
LIST OF PHOTO REFERENCES

Chapter 1. Intro to Bacterial Genetics & Microbiology

As far as genetics goes, bacteria are quite simple to understand. This simplicity arises from the fact that, in most cases, what you see is what you get with bacteria. Simply put, there are significantly fewer variables to consider when working with bacterial systems in the lab than with eukaryotic systems. Some examples include reproduction by binary fission, ease of gene expression, and the natural tendency of bacteria to interact with each other. An extremely important component of bacterial DNA is the plasmid, a small circular DNA molecule carried separately from the main chromosome. As you will learn, plasmids are some of the most important tools utilized in synthetic biology.

Firstly, bacteria reproduce to form exact clones of themselves when expanding and colonizing an area (for example, a Petri dish). This is very convenient for us as synthetic biologists, as we can grow large colonies of bacteria that all have the same characteristics we are interested in. For example, we can engineer bacteria to produce a certain protein or carry a certain gene, grow a few litres of culture (a volume of growing bacteria), and isolate the protein or DNA of interest from the culture in large amounts to work with later. In fact, this is how synthetic insulin is produced for the treatment of diabetes.

The fact that gene expression is very straightforward in bacteria, compared to their eukaryotic counterparts, is perhaps the reason that biotechnology has been so successful up to this point. Bacteria do not possess nearly as many regulatory elements and processing proteins as eukaryotes. Also, they transcribe messenger RNA (mRNA) and translate that mRNA into protein simultaneously, which avoids the post-transcriptional processing steps that modify the mRNA and complicate things. It is very simple to “plug and play” with most bacteria, and various bacterial strains are actually optimized for this ease of use; examples include DH5α and Rosetta pLysS, strains of E. coli optimized for plasmid propagation and protein expression, respectively.
What is meant by “plug and play” is that a gene of interest can be introduced to a bacterium and expressed soon thereafter; typically, these genes are introduced on plasmids. For example, a plasmid containing the reporter GFP (Green Fluorescent Protein) can be introduced to a bacterium and cause the resulting colony to glow green. In fact, this procedure is so easy a five-year-old could do it, provided they know proper pipetting technique.

Lastly, understanding interactions between bacteria is important for understanding some of the processes that could account for error in the lab. Bacteria can share or obtain DNA through the processes of conjugation and/or transformation. Conjugation is the fancy way of saying “bacterial sex.” When we think of sexual reproduction, we typically recall two haploid gametes coming together and sharing genes that have been “mixed” by the process of crossing over. However, in the bacterial system, this is far from what happens, so we will be referring to it as the less-racy “conjugation”, which is one form of “horizontal gene transfer (HGT)”. In this process, bacteria share plasmid DNA. In short, some plasmids carry genes that “hijack” the bacterium’s machinery in order to provide copies of the plasmid to neighbouring bacteria. The plasmid simultaneously replicates itself and feeds the copy through a protein channel into the neighbouring bacterium; this process typically takes about 2 hours. Incidentally, this process is responsible for the propagation of antibiotic resistance genes, which will be covered in Chapter 3.

Transformation is not as specific as conjugation; transformation is the process where bacteria take in DNA from their surroundings. You can think of transformation as being analogous to levelling up in a video game and acquiring a new ability, in that if the DNA turns out to be beneficial, the bacterium will hold onto it. Examples include antibiotic resistance genes, or genes that make the organism either more efficient or able to outcompete its neighbours. In a natural system, transformation works through an array of proteins embedded in the outermost membrane of the bacteria. These proteins bind to the DNA and then create a pore in the membrane to allow entry of the DNA into the cell. In the lab, this technique is absolutely essential to genetically modifying a bacterium or a eukaryotic cell, such as a human or yeast cell; this process will be covered in depth in Chapter 5.

In the lab, we grow bacteria in liquid or solid media. Media is, essentially, the food we provide to the bacteria so that they grow optimally. Luria Broth (LB), for example, is a standard choice for growing E. coli; LB contains the salts and nutrients required for growth. There are many different kinds of media, with ingredients specifically tailored to the bacteria you would like to grow. After cells are introduced to media and begin to grow, the mixture is collectively referred to as “cell culture”, or simply “culture”. The media for growing cells in a Petri dish contains a certain amount of agar. Agar is a gelling substance harvested from certain algae; its main component is the sugar polymer agarose. The purpose of including agar in your media is so that, when you pour the liquid media into the Petri dish, it will solidify and give the cells a surface to adhere to, which is necessary for them to form colonies (collections of bacteria bound together).
Interestingly, agar behaves much like the gelatin in Jell-O, turning a liquid into a soft solid. Liquid media, or “broth culture”, is simply the same media you would use in a Petri dish without the agar. This type of media is useful when you want to grow massive quantities of cells to isolate DNA or protein from.

One important step that should not be overlooked is called autoclaving. Autoclaving is the process of using very high temperature (121°C) and pressure to sterilize your media; this is essential, as it ensures that you are growing only your bacteria of interest once you introduce your cells, a step known as inoculation.

There are many ways to grow bacteria in the lab, and the method you choose depends on the nutritional type of your bacteria, as well as whether they require oxygen (aerobic) or no oxygen (anaerobic). For aerobic growth, as with E. coli, broth cultures and plate cultures (on solid media in a Petri dish) are the norm. For anaerobic bacteria, such as Porphyromonas gingivalis, a bacterium associated with gum disease, a deep or a slant is commonly used. These two involve making a solid media, as with a Petri dish, in the bottom of a test tube, then inoculating the bacteria using a needle to ensure they are at the bottom of the media, where no oxygen is present.

Concept Check

1. What makes bacteria cells easier to study than human cells?

2. Describe a plasmid.

3. Compare and contrast transformation and conjugation.

Lab scenario fill-in-the-blank: How would you grow E. coli to purify a protein?

E. coli is an ______ bacterium, meaning it (does/does not) require oxygen.

Therefore, I want to eventually make a ______ culture using ______ media. First, I will sterilize my media by the process of ______. Then, I will grow the cells in a Petri dish, select one colony after they grow overnight, and ______ into ______.

Chapter 2. Proteins & Enzymes

Before we can properly talk about proteins, we must first establish a basic understanding of what is known as the Central Dogma of Molecular Biology. The Central Dogma governs how we understand life itself, and the role of DNA, RNA, and proteins in what makes what: DNA is transcribed into RNA, and RNA is translated into protein. As we know, DNA is the main genetic material, which can be thought of as the blueprint for an organism. Protein is the final product of the blueprint and gives cells their functions. It is important to understand the process laid out by the Central Dogma, as we will revisit it numerous times.

Proteins are macromolecules that perform nearly every function imaginable inside cells. They are made up of a sequence of amino acids, which are carbon- and nitrogen-based units that link together to ultimately form a protein. The key parts constituting an amino acid are an N-terminus (N = nitrogen), a C-terminus (C = carbon), and an R group (R = “the rest”) on the alpha carbon, the carbon adjacent to the carboxylic acid (figure 1). These units linked together form what is known as the “primary structure” of a protein, which can be thought of as a linear sequence of amino acids; the C-terminus of one amino acid links to the N-terminus of the adjacent amino acid, forming a sort of backbone, with R groups contributing to the chemical properties of the entire entity. It is these R groups that determine what the protein will bind to, be it DNA, RNA, other proteins, or organic molecules, such as antibiotics.

Primary structure then, through different molecular forces, twists and folds based on favourable interactions with other amino acids to form the local shapes that constitute the “secondary structure”. There are two main “motifs”, or common ways that proteins fold into secondary structure. The first, the α-helix (pronounced “alpha helix”), is shaped sort of like the coil springs on a car, with the R groups all pointing outwards. This motif is very common in proteins that cross cell membranes. The next most common motif is the β-sheet (pronounced “beta sheet”). β-sheets are the result of primary structures folding back on themselves multiple times, analogous to folding a piece of paper many times over. These are very common in structural proteins, and they commonly come together to form other structures, such as a β-barrel, which can form a pore through a membrane to allow entry of DNA or other macromolecules, for example.

Figure 1¹. The basic structure of an amino acid; the N-terminal amino group is shown on the left, the C-terminal carboxyl group on the right, and the alpha carbon & R group in the centre.

Collections of secondary structure elements are joined together via β-turns to create “tertiary structure”. Tertiary structure is absolutely necessary for enzymatic activity, as the structure creates pockets inside the protein for reactions to occur. Some large proteins, such as hemoglobin (the protein responsible for transporting oxygen in the blood), are built from several units of tertiary structure, which collectively form quaternary structure. Each unit of tertiary structure is referred to as a “subunit”, or “monomer”; if these subunits are the same, the protein is known as a homomer, and if they are different, a heteromer. Therefore, a heterotrimeric protein would have quaternary structure composed of three different monomers, and a homotetrameric protein would be composed of four of the same monomers. Hemoglobin is a heterotetramer: it contains two copies each of two different subunits.

Besides structural importance, proteins speed up and/or perform many chemical reactions in the cell; these proteins are known as enzymes. Within the human body, different cell types express a smattering of different enzymes that are specific to the function of that cell. In bacteria, every cell produces the same set of enzymes typical of that organism. Enzymes act as catalysts, which, essentially, means they speed up a reaction that would otherwise occur very slowly, if at all, under normal conditions, and they are not used up in the process. Enzymes have a few typical modes of action: they can alter the pH of an isolated pocket to promote a chemical reaction, bring two molecules closer together to increase the chance of a reaction, or physically bind to a molecule to change its conformation (shape) and lower the activation energy for a reaction. An important note is that most enzyme names end in the suffix “-ase”, as in ATP synthase, which synthesizes ATP.

More examples of protein function include modifying DNA, generating ATP for energy, and ensuring that other proteins being made fold into their tertiary structure properly. There are enzymes that degrade foreign DNA, known as restriction enzymes. There are also enzymes that join DNA fragments together, called ligases. Some enzymes create RNA from DNA, and vice versa. Proteins are themselves made from RNA by the ribosome, a large complex of protein and RNA. The list continues indefinitely.

Concept Check

1. Describe how a protein is made from basic amino acid units, all the way until quaternary structure.

2. List 3 ways enzymes catalyze chemical reactions and define a catalyst.

Chapter 3: Antibiotics & Antibiotic Resistance

Before turning to the ever-important topic of antibiotic resistance, we must first discuss antibiotics. Antibiotics are, by definition, small organic molecules that either kill or inhibit the growth of bacteria. Think back to the last time you had to take antibiotics; penicillin may be a familiar-sounding one. Penicillin is a perfect example of an antibiotic; it was the first to be discovered, by Alexander Fleming in 1928. Penicillin, as with many antibiotics, originates from fungi. Penicillin works by preventing the amino acid cross-links from forming between sugar chains in the bacterial cell wall, and it is most effective against Gram-positive bacteria. Without an intact cell wall, these bacteria are exposed directly to the environment and cannot survive.

Since their discovery, many antibiotics have been improved upon by organic chemists. Staying with our example, penicillin has been modified to produce ampicillin and methicillin, both improved forms of the drug. For lab purposes, penicillin is quite expensive; therefore, ampicillin is generally used. Methicillin and its relatives have been widely used to treat bacterial infections in hospital settings; however, many bacteria have become resistant to them (as is the story with penicillin and ampicillin), notably methicillin-resistant Staphylococcus aureus (MRSA). Other lab-use antibiotics, such as chloramphenicol and kanamycin, work differently from penicillin and its relatives, which attack the cell wall. Chloramphenicol, for example, prevents translation of mRNA into protein, thus rendering the bacterium non-viable. Kanamycin works similarly, but rather than completely inhibiting translation, it causes the ribosome to read the mRNA improperly, leading to non-functional proteins.

At the heart of genetic engineering and biotechnology is our ability to introduce into bacteria DNA that codes for the proteins or enzymes we are interested in, typically by transformation of a plasmid.
One of the most typical examples you will encounter if you choose to do lab work is antibiotic selection. Plasmids contain an antibiotic resistance gene that allows selection of successful “clones” by growing the bacteria in the presence of antibiotics; bacteria that contain the plasmid will be able to grow, because they have the DNA that codes for proteins giving them antibiotic resistance. These proteins are almost always enzymes that inactivate the antibiotic; a notable exception is PBP2A (Penicillin Binding Protein 2A, the protein responsible for methicillin resistance), which does not destroy the drug. For example, some plasmids contain the gene for chloramphenicol acetyltransferase, which detoxifies chloramphenicol, or for β-lactamase, which breaks down penicillin and its relatives. In the face of the antibiotic resistance crisis in healthcare today, you can rest assured that this process is completely safe and does not generate “superbugs” or anything of the like.

Concept Check

1. Compare and contrast Chloramphenicol and Kanamycin.

Clonal selection fill-in-the-blank: How do you know you have successful clones?

You introduce a plasmid with an ampicillin resistance factor, which codes for the protein ______, into a culture of E. coli. To know whether or not you were successful, you grow the E. coli in a (broth/Petri dish) containing ______. If you were successful, you should see ______.

Chapter 4: Nucleic Acids and Plasmid Design

So far, we have talked a lot about DNA, RNA, and plasmids. At this point, we all know a bit about the structure of DNA and RNA, and the purpose of both. We also know that plasmids, although important in nature, are essential for you as budding synthetic biologists. Now, we may begin our in-depth discussion on nucleic acids.

Let us start by defining a nucleotide: a nitrogenous base, a five-carbon sugar (ribose in RNA, deoxyribose in DNA), and a phosphate group constitute a nucleotide. A nucleoside is all of the above, without the phosphate group. DNA is the blueprint for all the proteins, and therefore functions, of each cell in every living organism. Important structural notes include: a double-stranded, antiparallel helical structure; nitrogenous bases pointing inwards and hydrogen bonding with one another; and a sugar-phosphate backbone forming the outside of the helix (figure 2a). Antiparallel refers to the fact that one strand of DNA runs in the opposite direction to the other. By convention, we call one strand the 5’ – 3’ coding strand (read “5 prime – 3 prime”), and the other is the 3’ – 5’ template strand. 5’ and 3’ refer to the numbered carbon of the sugar that is exposed at each end of a strand, which depends on the orientation of the strand. The reasoning behind these names is that the coding strand has the same sequence as the mRNA that will be made (with U in place of T), whereas the template strand is the strand read by RNA polymerase to create the mRNA (don’t worry if this is confusing; some 4th year undergraduate students still have a problem with this). If you recall from chemistry, phosphate groups are negatively charged, giving DNA a strongly negative overall charge; the linkages formed through these phosphates create what is known as a “phosphodiester backbone”.

The four nitrogenous bases found in DNA are adenine (A) and guanine (G) (the purine bases), and cytosine (C) and thymine (T) (the pyrimidine bases; figure 2b). A set of rules known as Chargaff’s rules (after Erwin Chargaff, who discovered them) tells us that the amount of A will always equal the amount of T in an organism’s genome, as with G and C. The explanation is that A always pairs with T, and G always pairs with C; this is what is known as “complementary base pairing”. These rules become very important when discussing the genetics of bacteria, as GC content varies widely between groups; Streptomyces species, for example, have high GC content, whereas Staphylococcus species have low GC content. You may be wondering what the importance of that is; after all, DNA is DNA, right? Well, DNA with a higher GC content has a greater propensity to “stick together”. This is a direct result of the hydrogen bonding difference between AT base pairs and GC base pairs. A has one hydrogen bond donor and one acceptor while T has the opposite, for a total of two hydrogen bonds per AT base pair. In contrast, G has two hydrogen bond donors and one acceptor, and C has the complementary number, resulting in three hydrogen bonds per GC base pair. Over a sequence of millions of bases, one can see how that extra hydrogen bond would significantly contribute to greater overall stability of the genome. In eukaryotes, DNA is linear and organized into chromosomes; in bacteria, the chromosome is usually circular and is clumped in the nucleoid region, since bacteria lack true nuclei.
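The base-pairing rules above can be tried out in a few lines of Python. This is only a minimal sketch: the function names and the example sequence are made up for illustration, and the hydrogen bond count simply applies the "two per AT pair, three per GC pair" rule from the text.

```python
# Hypothetical helpers illustrating complementary base pairing and GC content.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}  # hydrogen bonds each base forms with its partner

def reverse_complement(strand: str) -> str:
    """Return the partner strand, written 5'->3' (the strands are antiparallel)."""
    return "".join(PAIR[base] for base in reversed(strand.upper()))

def gc_content(strand: str) -> float:
    """Fraction of bases that are G or C."""
    strand = strand.upper()
    return (strand.count("G") + strand.count("C")) / len(strand)

def hydrogen_bonds(strand: str) -> int:
    """Total hydrogen bonds holding this strand to its partner."""
    return sum(H_BONDS[base] for base in strand.upper())

coding = "ATGGGA"                      # made-up example sequence
partner = reverse_complement(coding)
duplex = coding + partner

print(partner)                                   # TCCCAT
print(duplex.count("A") == duplex.count("T"))    # True (Chargaff: A = T)
print(duplex.count("G") == duplex.count("C"))    # True (Chargaff: G = C)
print(gc_content(coding))                        # 0.5
print(hydrogen_bonds(coding))                    # 15
```

Note that Chargaff's rules hold for the double-stranded molecule as a whole; a single strand on its own (here, `coding`) need not contain equal amounts of A and T.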

Figure 2. a²) The structure of DNA, showing the phosphodiester backbone and nitrogenous bases. b³) Hydrogen bonding within the two base pairs, AT and GC.


RNA, although theoretically similar to DNA, is quite different in terms of function and structure. The most notable differences are: RNA is single stranded, the nitrogenous base uracil (U) is used instead of T, and the ribose sugar has all of its hydroxyl groups (hence, ribonucleic acid vs. deoxyribonucleic acid; figure 3a). The single-stranded structure and the extra hydroxyl group both significantly reduce the stability of RNA. However, this is fantastic for cells to exploit; once DNA has been transcribed into mRNA, the mRNA can be degraded much faster than the DNA, which adds a controlling mechanism to gene expression. It would be highly detrimental to cells if an RNA was constantly being translated into protein, as it is both energy inefficient and simply not needed if the cell doesn’t need to be expressing that protein at that time.

Furthermore, the single-stranded nature of RNA allows for modification in eukaryotes. For example, a poly-A tail is added to prevent premature degradation before the mRNA exits into the cytoplasm. Another example is known as pre-mRNA splicing, in which non-coding regions (introns) of RNA are removed from the mRNA; depending on which segments are removed, the same piece of RNA can code for many different types of proteins. Structurally, the single-stranded nature of RNA means that, if there are regions that would form complementary base pairs (i.e., GC or AU), then the RNA will fold back on itself to form what are known as “hairpin structures” (figure 3b). These can contribute to various characteristics of the RNA, such as shape and catalytic activity. The catalytic activities of some RNAs are comparable to those of enzymes. In fact, a major part of the ribosome is catalytic RNA.

One final important note about RNA is that, in bacteria, mRNA is translated into protein at the same time it is being transcribed from DNA. This allows for ease of translation of related proteins, as well as the ability to compact the genome in such a small organism.
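If the coding strand versus template strand business is still confusing, a short Python sketch may help. It is an illustration only: the function names, the six-base gene, and the ten-base RNA are all invented, and the hairpin check is deliberately simplistic (it only asks whether the two ends of an RNA could pair with each other).

```python
# Hypothetical helpers; names and example sequences are illustrative only.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}  # template base -> mRNA base
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}    # RNA-RNA base pairs

def transcribe(template_3_to_5: str) -> str:
    """Build the 5'->3' mRNA by pairing each template base (read 3'->5') with its RNA partner."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())

def has_hairpin_stem(rna: str, stem_length: int) -> bool:
    """True if the first and last `stem_length` bases can pair when the RNA folds back on itself."""
    rna = rna.upper()
    if not 0 < stem_length <= len(rna) // 2:
        raise ValueError("stem_length must be between 1 and half the RNA length")
    head = rna[:stem_length]
    tail = rna[-stem_length:]
    folded_tail = "".join(RNA_PAIR[base] for base in reversed(tail))
    return head == folded_tail

coding = "ATGGCC"    # 5'->3' coding strand (made-up sequence)
template = "TACCGG"  # 3'->5' template strand, base-paired to the coding strand

mrna = transcribe(template)
print(mrna)                                # AUGGCC
print(mrna == coding.replace("T", "U"))    # True: mRNA matches the coding strand, with U for T

print(has_hairpin_stem("GGGAAAUCCC", 3))   # True: GGG can pair with CCC around an AAAU loop
print(has_hairpin_stem("GGGAAAUAAA", 3))   # False: GGG cannot pair with AAA
```

The second `print` is the whole point of the naming convention: RNA polymerase reads the template strand, but the message it produces looks like the coding strand.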

Figure 3. a⁴) The single-stranded structure of RNA, with the 2’ hydroxyl highlighted. b⁵) A hairpin structure formed by RNA folding in on itself due to complementary base pairing.


Plasmids are closed, circular, double-stranded DNA molecules most commonly found in bacteria. In nature, they can contain genes that allow for their propagation via conjugation, as discussed earlier, or genes that give their host some sort of advantage over its counterparts, such as a gene for an enzyme that allows it to metabolize a new set of molecules for energy. In the lab, plasmids are an invaluable tool when doing anything synthetic biology related. We sometimes refer to them as “vectors”, as they provide a mode of transport for getting our gene of interest into our cells. As previously mentioned, they have a useful purpose in determining successful clones via antibiotic selection. There are a few other key components of plasmids that are essential to their usefulness: an origin of replication, a multiple cloning site, and a promoter region. Let’s dissect these one at a time.

An origin of replication, as the name suggests, is absolutely essential for the plasmid to be able to reproduce. It is essentially a region of DNA that recruits enzymes from the bacterium to replicate the plasmid within the host; without it, a single plasmid would not be able to replicate and spread to other cells. Without it, the plasmid also could not be passed down to all the progeny of the original parent cell that carries it, so even if it confers some sort of advantage, only one organism in a colony of millions would be able to benefit, and the plasmid would be lost. Various origins of replication have been isolated from plasmids found in nature and chosen to optimize the amount of plasmid present in each cell. For example, the common plasmid pUC19 (p = plasmid; pronounced “puck 19”) has an origin of replication derived from pMB1, naturally found in E. coli, which results in a high number of copies of the plasmid per cell compared to other origins of replication. On a plasmid map, an origin of replication is represented by ori (figure 4).
A multiple cloning site (MCS) is the region of the plasmid where you would insert your gene of interest (figure 4). This region contains many sites that are recognized and cut by restriction endonucleases to create overhanging complementary ends for your gene to be inserted; this will be covered in greater detail in our future discussion on digestion and ligation reactions. You can think of it as if you had a piece of string that you tied together to form a circle, then untied the knot and tied each end of the original piece of string to a smaller piece of string to re-join the circle, now with two knots and an inserted piece. In some plasmids, cutting the MCS will disrupt what is known as a reporter gene; these reporter genes are typically Green Fluorescent Protein (GFP) or Red Fluorescent Protein (RFP). If these reporter genes remain intact, the cells will express the protein and glow either green or red. Inserting a gene in the MCS will result in normal-coloured colonies, which is another way to select successful clones. For now, an understanding of the purpose of an MCS is more than sufficient.

A promoter region is the region just before the gene of interest; we refer to entities that come before the entity of interest as “upstream”. It is a region that recruits the enzyme RNA polymerase to make an mRNA copy of our gene of interest so it may be expressed in the cell. In synthetic biology, promoter characterization and design is an important area of interest, as we can optimize expression based on what promoter system we use. For example, the lac operon, which is a series of genes required for lactose metabolism found in the organism E. coli, contains two different promoters, and what are known as repressors. Repressor proteins bind at the promoter of the sequence of genes to inhibit expression when there is no lactose present; when lactose is present, however, it binds to the repressor protein and allows expression to occur. Such promoters are referred to as “inducible”. The repressor gene, found just upstream of the operon, is also under the control of its own promoter. As one can see, promoters are absolutely essential when it comes to gene expression. Not only do plasmids and bacterial genes contain them, but all organisms, including humans, do as well. Human promoters are typically much larger than bacterial ones, however, and they contain many more complex regulatory regions that go far beyond the scope of this book.

Figure 4⁶. A map of a typical plasmid, showing the promoter, MCS, and antibiotic resistance factors. Note that under the name pGEX-3X, the size is shown as 4952 base pairs.

Concept Check

1. Describe the important structural aspects of DNA. Compare and contrast these with RNA.

2. What is the rule that allows DNA to be double stranded and that allows RNA to fold in on itself? Who discovered this rule?

3. You are an undergraduate researcher wanting to express and purify a newly discovered protein. How do you design your plasmid? What essential elements do you need to include? Where would you insert the gene that codes for your protein?

Chapter 5. Transformation, Transduction, and Transfection

So far, we have covered many aspects of typical cloning experiments, including plasmid design, antibiotic selection, and protein expression. However, we have yet to cover in detail how to introduce your DNA into your cells. Transformation, transduction, and transfection are different methods of getting DNA into cells, depending on your cell type and application.

Transformation is the most common way of getting DNA into your cells, and it is used chiefly for bacterial applications. This process happens in nature, albeit rarely, as previously mentioned in Chapter 1. In the lab, we can increase its efficiency by a factor of millions, or even billions. First, we must make the cells “competent”. This means they are able to take in DNA from their environment with ease. There are two ways of doing this: using chemicals, or using a high-voltage electric shock, which is referred to as “electroporation”. In terms of efficiency, electroporation is approximately 10-100 fold better than chemical transformation.

Chemical competency works by creating pores in the membranes of the cells with rubidium chloride or calcium chloride. The cells can then be frozen at -80°C. When you are ready to transform DNA into them, they can be thawed; you then put some DNA into the tube with the cells, shock the cells with 42°C heat, and you’re done. This is a highly convenient way of doing things once the initial work of making the cells competent is done. The heating step, known as a “heat shock”, creates disorder in the membrane. After the cells cool, their membranes re-form and encapsulate the DNA inside the cells. With chemical transformations, you want to use 1-100 nanograms of plasmid DNA; the volume that you add is dependent on the concentration of the DNA. Typically, adding about 1 microliter of DNA works well, since such a small volume keeps the mixture of DNA and cells from becoming too dilute.

Electroporation is both easier and more of a hassle than chemical transformation.
It involves mixing DNA with cells and shocking them with a voltage that depends on what cell type you are using. For example, a typical setting for E. coli is a voltage of 2500 V for 5 milliseconds. This shock causes the membrane to be briefly disordered, as in chemical transformations. Afterwards, the cell membranes close and encapsulate the DNA inside the cell. Typically, you want to use less DNA for electroporation than for chemical competency. Why would you want to use less DNA, but still get higher transformation efficiency? Well, the answer stems from the fact that DNA is negatively charged. If there is too much charged DNA relative to the neutral cells, it increases the conductivity of the sample. Thus, when the shock is delivered, the sample can “arc”, which is essentially frying your cells with a miniature lightning bolt. In summary, electroporation can be much more efficient and rapid than chemical transformation; however, it can also be more finicky, and much of your success depends on the settings on your machine.

Transduction uses a virus to deliver DNA to bacterial cells: the DNA is encapsulated in a protein coat, and millions of virus particles are introduced into the media with the cells. The viruses then inject their DNA into the host cells, leaving the protein coat behind and providing a one-shot delivery system.

Transfection is essentially transformation for mammalian cells. Transfection encompasses chemical and electrical methods, as well as many other methods that are specific to mammalian cells; liposomal delivery and microinjection are two widely used methods. Liposomal delivery involves packaging DNA in cell-like vesicles that are essentially cell membrane wrapped around DNA. These liposomes fuse with the cell membrane of the target cell and dump their DNA into the target cell. Microinjection involves using a microscopic needle to literally inject DNA into the target cell. Microinjection is always done with the aid of a high-power microscope, and is widely used in cancer research to obtain a single modified cell and analyze its effects on tumor growth.
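For chemical transformations, the relationship between the mass of plasmid you want and the volume you pipette is simple division: volume = mass / concentration. A minimal Python sketch follows; the function name and the example numbers (a 25 ng/µL plasmid prep) are assumptions made up for illustration, not recommended values.

```python
def volume_to_add_ul(mass_ng: float, concentration_ng_per_ul: float) -> float:
    """Volume of DNA stock (in microliters) that contains the desired mass of DNA."""
    if mass_ng <= 0 or concentration_ng_per_ul <= 0:
        raise ValueError("mass and concentration must both be positive")
    return mass_ng / concentration_ng_per_ul

# Hypothetical example: you want 50 ng of plasmid from a 25 ng/uL stock.
print(volume_to_add_ul(50, 25))    # 2.0

# The same stock, aiming for the low end of the 1-100 ng range.
print(volume_to_add_ul(1, 25))     # 0.04
```

The second result, 0.04 µL, is far too small to pipette accurately; in a case like that you would dilute the stock first rather than try to measure such a tiny volume.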

Concept Check

1. What is the difference between transformation and transfection?

2. Compare and contrast chemical transformations to electroporation. Discuss what is most advisable when using bacteria versus eukaryotes.

Chapter 6. Digestion and Ligation Reactions

Molecular cloning is the process of inserting a piece of DNA into a vector and subsequently introducing that vector into your cell of choice. We have covered vectors and methods of introducing DNA into your cells. Digestion and ligation reactions are the missing pieces of the proverbial puzzle; these are the tools you use to insert your DNA into your vector.

Digestion reactions can be thought of as a process analogous to, as the name suggests, your stomach digesting food; except in our sense, we are digesting DNA. Carefully think about the process of digestion: what is happening in your body, strictly in the digestion process, before anything is being converted into energy? Just as your digestive system uses enzymes, such as amylase, lipase, and pepsin, to break down the macromolecules entering it, digestion reactions utilize enzymes that perform a specific function, which is to cut DNA at highly precise locations. These enzymes are what are known as “restriction endonucleases”, or “restriction enzymes”. Let’s dissect what is meant by restriction endonuclease. Firstly, endo is the prefix that typically refers to “on the inside”; here, it tells us that these enzymes cut within a strand of DNA, rather than chewing it back from the ends. Nuclease is the term for any enzyme that breaks down nucleic acids such as DNA by cutting the phosphodiester backbone. Note, nucleases do not alter the bases or sugars of DNA. The endonucleases we are interested in are called restriction endonucleases because, when they were found in nature, they were found “restricting” the entry of foreign DNA into bacterial cells. Further analysis found that these enzymes have evolved to target very specific sequences of DNA and have characteristic cleavage sites, known as “cut sites”. As one could imagine, these are highly useful in the lab for cutting specific regions of DNA. As discussed in Chapter 4, plasmids contain a multiple cloning site, which is a region of DNA with many restriction enzyme cut sites.
In digestion reactions, we combine DNA with enzymes and a special buffer in a tube, and the enzymes work their magic by opening up the plasmids and creating ends into which your gene can be inserted. There are two different kinds of ends left behind after a restriction enzyme digests your DNA. First, there are “sticky ends”. These are created when the enzyme cuts the two strands of DNA at staggered positions, leaving overhanging segments of single-stranded DNA (figure 5a). Second, there are “blunt ends”, which are created when the enzyme cuts both strands at the same position and which do not have any overhanging segments of DNA (figure 5b). When performing digestion reactions, you must be aware of the overlap you wish to create between your destination plasmid and your insert (or multiple inserts, as is often the case); this will determine how your pieces of DNA fit together.

Figure 5. Two different restriction sites. The green line outlines where the enzyme will cut. a) The restriction site for the enzyme EcoRI (photo reference 7). b) The restriction site for the enzyme SmaI (photo reference 8).
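To make the idea of a cut site more concrete, here is a minimal Python sketch of a digestion reaction on a short, linear piece of DNA. It is only an illustration: the recognition sequences for EcoRI (G^AATTC) and SmaI (CCC^GGG) are the ones shown in figure 5, but the example sequence and the function names are made up for this sketch, and only the top strand is tracked.

```python
# Recognition sequence and the offset (from the start of the site) at which
# the top strand is cut. EcoRI cuts G^AATTC (sticky ends); SmaI cuts CCC^GGG
# (blunt ends).
ENZYMES = {
    "EcoRI": ("GAATTC", 1),
    "SmaI": ("CCCGGG", 3),
}


def find_cut_positions(sequence, enzyme):
    """Return the top-strand cut positions for one enzyme in a linear sequence."""
    site, offset = ENZYMES[enzyme]
    sequence = sequence.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start + offset)
        start = sequence.find(site, start + 1)
    return positions


def digest(sequence, enzyme):
    """Cut a linear sequence at every site and return the top-strand fragments."""
    fragments = []
    previous = 0
    for cut in find_cut_positions(sequence, enzyme):
        fragments.append(sequence[previous:cut])
        previous = cut
    fragments.append(sequence[previous:])
    return fragments


# Hypothetical sequence containing one EcoRI site and one SmaI site.
example = "ATGCGAATTCTTAGCCCGGGATCA"
print(digest(example, "EcoRI"))  # ['ATGCG', 'AATTCTTAGCCCGGGATCA']
print(digest(example, "SmaI"))   # ['ATGCGAATTCTTAGCCC', 'GGGATCA']
```

A real plasmid is circular, so one cut linearizes it rather than producing two fragments; the sketch ignores this to keep the logic easy to follow.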

So now, you have a slew of digested pieces of DNA; how do you finally piece everything together? The answer: ligation reactions. Ligation reactions are named after the main enzyme used, DNA ligase. As with restriction enzymes, there are many forms of DNA ligase from different organisms; however, the job of ligase is the exact opposite of the job of a restriction enzyme. We will first discuss the job of ligase in nature, so we can appreciate its usefulness in the lab. When DNA is replicated, short fragments (known as Okazaki fragments) are left un-joined next to one another. The function of ligase is to join these fragments together by forming the missing phosphodiester bonds in the backbone. Therefore, we can use ligase in the lab to join together two fragments that are adjacent to one another, but do not have their phosphate backbones linked together.

How do we know that the two exact strands of DNA we want to join together are not only adjacent, but in the right orientation? After all, you have millions of pieces of DNA floating around in your solution, analogous to you floating on an inner tube in the Pacific Ocean. The answer comes from hydrogen bonding between nitrogenous bases in DNA. When two pieces of DNA have overhanging ends that are complementary to one another, they form hydrogen bonds that hold them together in solution just long enough for ligase to recognize them and fuse them together.
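The pairing rule described above can be sketched in a few lines of Python. This is a simplified illustration, not a cloning tool: it assumes both overhangs are written 5' to 3' and only asks whether one is the exact reverse complement of the other. The function names are hypothetical, and the GATC overhang (the one left by the enzyme BamHI) is included only for contrast with the AATT overhang left by EcoRI.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence.upper()))


def overhangs_compatible(overhang_a, overhang_b):
    """Two sticky ends can base-pair if one is the reverse complement of the other."""
    return overhang_a.upper() == reverse_complement(overhang_b)


# EcoRI leaves a 5'-AATT-3' overhang on both ends it creates, so two
# EcoRI-cut ends can pair with each other.
print(overhangs_compatible("AATT", "AATT"))  # True

# An EcoRI end (AATT) cannot pair with a BamHI end (GATC).
print(overhangs_compatible("AATT", "GATC"))  # False
```

Because an overhang such as AATT pairs with itself, an insert cut with a single enzyme can ligate in either orientation; this is one reason to plan the ends on each side of your insert carefully.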

Concept Check

1. How were restriction endonucleases discovered?

Molecular cloning fill-in-the-blank: How do you insert genes into your vector?

You are interested in inserting a gene that codes for ______ (be creative) into your expression plasmid. First, you identify ______ on either side of your gene to perform a ______.

Next, you add your DNA to a new tube, along with the enzyme ______.

You then purify your constructed plasmid and ______into E. coli.

Chapter 7. PCR & Agarose Gels

Some of the most useful tools we have in the lab are PCR and agarose gels. Briefly, the polymerase chain reaction, or PCR, amplifies specific segments of DNA, whereas agarose gels are useful for checking whether our digestion and ligation reactions worked properly.

PCR is a technique of DNA amplification invented by Kary Mullis in 1983. It works by using the enzyme DNA polymerase from the organism Thermus aquaticus (Taq polymerase) to generate millions of copies of a single gene. Even though all organisms have DNA polymerase, it will become apparent why we use the enzyme from T. aquaticus, an organism that lives in high temperature environments. First, DNA polymerase is responsible for reading a template strand of DNA and synthesizing a new, complementary strand of DNA. This requires a “primer”, which is a short strand of DNA complementary to your target that binds to the template after the two strands have been separated by heating the tube containing the DNA. Two primers are used, one for each strand, so that together they flank the region you want to amplify. When the polymerase gets to the end of the template, it simply dissociates and goes on to find another strand with a primer bound to it.

In PCR, the use of a “thermocycler” is absolutely necessary. This machine repeatedly heats and cools the tubes containing the reaction mixture. The purpose of this, as mentioned earlier, is to repeatedly separate the two strands of DNA so that new strands can be built on each of them. Because Taq polymerase survives the high temperatures needed to separate the strands, it does not have to be replaced after every cycle. A typical cycle runs as follows: heating the sample to 95°C for 15-30s to denature the DNA and expose the bases so they can bind to primers, then cooling to 45°C-65°C for 15-60s to allow for primer binding (annealing), then heating to 68°C for an extension time that depends on the length of the target (roughly one minute per kilobase for Taq polymerase, so up to 5min for long targets) to allow Taq polymerase to synthesize new strands of DNA. At the end of each cycle, every newly synthesized strand is paired with its template as normal, double-stranded DNA, which is denatured again at the start of the next cycle. These cycles are usually repeated about 30 times.
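The cycling program and the resulting amplification can be sketched in Python as follows. This is a rough illustration under stated assumptions: the temperatures and times are single values picked from the ranges above, the 60s extension assumes a target of about one kilobase, and the copy count assumes every template is duplicated in every cycle, which real reactions only approximate. The names CYCLE, ideal_copies, and total_hold_time_minutes are invented for this sketch.

```python
# Each step: (name, temperature in degrees C, duration in seconds).
# Values are illustrative choices, not a recommended protocol.
CYCLE = [
    ("denature", 95, 30),
    ("anneal", 55, 30),
    ("extend", 68, 60),
]


def ideal_copies(starting_copies, cycles):
    """Copies after `cycles` rounds if every template is duplicated each cycle."""
    return starting_copies * 2 ** cycles


def total_hold_time_minutes(cycle, cycles):
    """Time spent holding temperatures; ignores ramping between steps."""
    seconds_per_cycle = sum(duration for _, _, duration in cycle)
    return seconds_per_cycle * cycles / 60


print(ideal_copies(1, 30))                 # 1073741824
print(total_hold_time_minutes(CYCLE, 30))  # 60.0
```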
As one can see, if you start with one piece of DNA, there will be two pieces after one cycle, four after the second, and so on, resulting in 2^30, or greater than one billion, pieces of DNA after 30 cycles!

Agarose gels are extremely useful for checking whether or not we performed a successful digestion and/or ligation reaction. They work by applying an electric field to the gel containing DNA, which causes the negatively charged DNA to migrate down the gel toward the positive electrode; the distance migrated by each piece of DNA depends on its size (i.e., smaller pieces move further down the gel in the same amount of time). There are many components to an agarose gel. First, the gel itself is made by dissolving powdered agarose, a polysaccharide (sugar polymer) extracted from seaweed, in running buffer. The mixture is heated, typically in a microwave, and poured into a gel box (figure 6a). You then put a “comb” into the gel box, and when the gel has solidified, the comb leaves “wells” behind, into which you load your samples. A key ingredient to add to the gel before you pour it is ethidium bromide, or EtBr. EtBr is a mutagen and suspected carcinogen: it embeds itself (intercalates) between the bases of DNA, which can cause mutations when DNA is replicated. It is very useful for visualizing DNA on an agarose gel for exactly this reason, because it binds to the DNA in the gel and fluoresces under UV light (figure 6b). Next, when preparing your samples to load onto the gel, you must add a tracking dye so you can see roughly how far the smallest pieces of DNA in your sample have travelled; you do not want to run all your DNA off the end of your gel!

When you are ready to load your samples and run your gel, you must add what is known as a running buffer, either TBE or TAE, to carry the electric current; you want to add enough running buffer to cover the surface of your gel. Finally, using a fine pipette tip, you load your samples into the wells created by the comb, attach the top plate and connect it to your power source, turn the voltage up to 80-130V, and let it run. When the dye reaches 75-80% of the distance down the gel, you take the gel to a special UV camera to visualize where your DNA is. Based on the distance your DNA has migrated compared to a standard of known sizes, called a “ladder”, you can estimate the size of your DNA. This is useful for checking whether or not you successfully inserted your gene into a plasmid. For example, to do that experiment, you would load the following gel, with the numbers corresponding to the lanes left to right:

1. Ladder

2. Undigested plasmid without insert

3. Undigested plasmid with insert

4. Digested plasmid with insert

5. Insert

You may also run a gel to check that you had a successful PCR amplification and that your sample contains only the gene that you were interested in amplifying.
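Estimating the size of a band from the ladder can be sketched in Python as shown below. The sketch relies on a common rule of thumb, namely that the logarithm of a fragment's size falls off roughly linearly with the distance it has migrated, which only holds over a limited size range. The ladder values are invented round numbers chosen to make the arithmetic easy to follow, not measurements from a real gel, and the function names are hypothetical.

```python
import math


def fit_ladder(ladder):
    """Least-squares line through (distance, log10(size)) for the ladder bands."""
    xs = [distance for distance, _ in ladder]
    ys = [math.log10(size) for _, size in ladder]
    n = len(ladder)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_size(distance, slope, intercept):
    """Estimate fragment size in base pairs from its migration distance."""
    return 10 ** (slope * distance + intercept)


# Hypothetical ladder: (distance migrated in mm, band size in base pairs).
ladder = [(10, 10000), (20, 1000), (30, 100)]
slope, intercept = fit_ladder(ladder)

# A band that migrated 25 mm lies between the 1000 bp and 100 bp ladder bands.
print(round(estimate_size(25, slope, intercept)))  # 316
```

In practice, you would measure the distances from your gel photo and use the band sizes printed on the data sheet for your particular ladder.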

Figure 6. a) A photo of an agarose gel box, with a gel and running buffer inside, as well as the top plate with the electrodes (photo reference 9). b) A UV photo of an agarose gel (photo reference 10). Note the ladder on the far right, and the bands corresponding to a plasmid being cut into different sized pieces in the 3 lanes to the left of the ladder.

Concept Check

1. Describe the step-by-step process you would follow to make and run an agarose gel.

2. In PCR, why do we use DNA polymerase from Thermus aquaticus?

Chapter 8. CRISPR, Gene Editing, and Gene Silencing Technologies

Perhaps the most controversial topic in biotechnology today is the use of gene editing technologies to alter DNA in whatever way one chooses. CRISPR, short for Clustered Regularly Interspaced Short Palindromic Repeats, has been one of the most popular topics of research over the past decade as a tool for making these edits. An obvious ethical dilemma arises here; if used for the wrong application, it can have devastating effects. Some examples of negative uses include designer babies and raising a “super race” of humans. However, its use in medicine could provide an answer to genetically based diseases, including cancer. Here, we will focus the rest of our time on the science behind it.

CRISPR was discovered in much the same way as restriction endonucleases: as a natural system that bacteria use to defend themselves against foreign DNA. The difference between CRISPR and simple restriction enzymes is that with CRISPR, the organism develops an immunity of sorts. Think about how the human body acquires immunity; it must be exposed to the infectious agent before it can build memory. Prokaryotes cut up foreign DNA and integrate pieces of it into clusters inside their own genome as interspaced elements, separated by short palindromic repeats at regular intervals. The bacteria then produce what are known as CRISPR RNAs (crRNAs) from the DNA inserted into their genome. These crRNAs interact with the protein Cas-9 (an endonuclease) to give Cas-9 specificity for that same piece of foreign DNA, should it ever re-enter the cell. This is particularly useful for bacteria protecting against viral invasion, as the virus cannot propagate and kill the bacteria if its DNA is degraded as soon as it enters the cell. We can use this in the lab by introducing CRISPRs that code for crRNAs into cells via transformation, transduction, or transfection, along with a plasmid that codes for the Cas-9 protein. The crRNAs determine which gene is to be edited, and Cas-9 then cuts the DNA for either insertion or deletion of target genes.

Gene silencing can be accomplished in other ways besides using the CRISPR/Cas-9 system. For example, there are small interfering RNA (siRNA) mediated gene silencing and RNA interference (RNAi); these are both examples of post-transcriptional gene silencing. CRISPR/Cas-9 acts on the DNA itself, before transcription, and so belongs to the group of methodologies known as pre-transcriptional; post-translational silencing acts on the finished protein and involves using drugs. siRNA and RNAi are similar in their mode of action; however, they differ in which organisms they work in. siRNA gene silencing involves the introduction of a small RNA (~30 bases) that binds to complementary sequences of mRNA to prevent the ribosome from translating it into protein. Additionally, these siRNAs contain binding regions that proteins can recognize; namely, proteins that affect the degradation of the mRNA-siRNA duplex. For example, in bacteria, a protein known as Hfq binds to double stranded regions of RNA, which would be the case if an siRNA bound to its target mRNA. It then wraps around the duplex and keeps the two strands bound together long enough for a nuclease, RNase III, to degrade the double stranded RNA. RNAi is very similar to siRNA, the difference being that RNAi is exclusive to eukaryotes.
RNAi involves a protein called Dicer cutting foreign double-stranded RNA into smaller chunks, which are the siRNAs. These siRNAs interact with the protein complex RISC, which, analogous to Cas-9, uses the siRNA to recognize foreign mRNAs and degrade them by cutting them into small fragments. We use this in the lab the same way we use CRISPR; however, it is used exclusively for post-transcriptional shutdown of genes.
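The base-pairing that gives an siRNA (or a crRNA) its specificity can be sketched in Python. This is a simplified illustration that assumes perfect complementarity along the whole length of the small RNA; real pairing tolerates some mismatches and depends on RNA structure. The mRNA sequence and the function names are made up for this sketch.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def antisense(rna):
    """Return the reverse complement of an RNA sequence written 5' to 3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna.upper()))


def find_binding_site(mrna, sirna):
    """Return the index in the mRNA where the siRNA pairs perfectly, or -1."""
    target = antisense(sirna)  # the stretch of mRNA the siRNA would pair with
    return mrna.upper().find(target)


# Hypothetical mRNA and an siRNA designed against positions 3 to 14 of it.
mrna = "AUGGCUAGCUUAGGCAUCGA"
sirna = antisense(mrna[3:15])

print(sirna)                           # GCCUAAGCUAGC
print(find_binding_site(mrna, sirna))  # 3
```

A result of -1 means the small RNA has no perfectly matching site in that mRNA, which in this simplified picture means the gene would not be silenced.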

Concept Check

1. Compare and contrast CRISPR to RNAi.

2. What is the full name of CRISPR?

Chapter 9. DNA Sequencing Technologies

We already know how to use sequence data in digestion and ligation reactions, as well as with CRISPR. We know how to visualize our DNA and estimate its size. But how do we actually know the sequence of base pairs in a gene? DNA sequencing, pioneered by Fred Sanger in the 1970s, gives us this ability. Since its original use, many faster, higher-throughput methods have been developed, utilizing the most modern technology.

Originally, Sanger sequencing (named after Fred Sanger) was a very expensive and slow way of sequencing DNA. Sanger sequencing is similar to PCR in that it involves using DNA polymerase to generate numerous new strands of DNA based on the template you provide. In the reaction mixture, you add the building blocks of DNA, as with PCR; however, you also include what are known as dideoxyribonucleotides, which lack the 3’ hydroxyl group necessary to form the phosphodiester bond between molecules. Whenever one of these is added to a growing strand, the strand cannot be extended any further. Each of the four dideoxyribonucleotides is labelled with a different dye to allow differentiation between them. Thus, the product is millions of pieces of DNA of different lengths, each ending in a labelled base. Originally, these were run on a gel, as previously discussed, and the sequence was worked out manually from the pattern of bands. Now, this method is automated, and computers recognize the differences between the different dyes to reconstruct a sequence.

There are a few newer sequencing technologies that have revolutionized how quickly and inexpensively we can sequence not only genes, but also entire genomes. Two interesting methods are Nanopore sequencing and Roche 454 sequencing. Nanopore sequencing involves applying a voltage across a nanometer sized protein pore (hence, Nanopore) so that an electric current flows through it; this current is disrupted when DNA is fed through the pore. As it turns out, each base has a characteristic disruption pattern; therefore, a computer can reconstruct a sequence based on the disruptions in the current caused by each base.
This method is very compact and involves a machine the size of a USB flash drive, which can be plugged into a laptop in the field for many different applications, including analysis of soil microbiomes, monitoring of pathogens like Ebola, and so on.

Roche 454 sequencing was the first sequencing method to be given the designation of a “next generation sequencing” method. It begins with breaking up the target DNA into smaller fragments. Short, specific pieces of DNA, known as adaptors, are then attached to these fragments. The adaptors facilitate binding to extremely small beads, which have sequences on them complementary to the adaptor. Upon binding to the beads, the interaction destabilizes and forms single-stranded regions of DNA. These regions are then amplified by PCR. The building blocks of DNA, deoxyribonucleotide tri-phosphates (dNTPs), are then added one type at a time. Each time a base is incorporated into the growing strand, light is emitted (in 454 sequencing, the light comes from an enzymatic reaction triggered by the incorporation, a method known as pyrosequencing, rather than from a dye attached to the dNTP). Because the computer knows which dNTP was added when the light was emitted, it knows that, say, a G was added to the strand. Looking at the light emission data allows us to either manually elucidate the sequence, or have a computer do it for us. An example read from a 454 sequencing run is shown in figure 7.

Both Nanopore sequencing and Roche 454 sequencing greatly reduce the amount of time sequencing takes and allow for massive amounts of DNA to be sequenced at once, known as “high-throughput” sequencing. For example, using next generation sequencing, a human genome can be sequenced in a small fraction of the time it took the Human Genome Project; using Sanger sequencing, that project took scientists about thirteen YEARS.

Figure 7. A typical read from a Roche 454 sequencing run (photo reference 11). Note the differing peaks corresponding to the signal from different bases. The letters on the bottom correspond to the most likely sequence gathered from the data.
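How a computer turns a read like the one in figure 7 into a sequence can be sketched in Python. This is a simplified illustration under two assumptions that are not spelled out in the text above: the four dNTPs are added in a fixed, repeating order (T, A, C, G is used here), and the strength of each signal is roughly proportional to how many identical bases were added in a row. The signal values are invented, and the function name is hypothetical.

```python
def decode_flowgram(signals, flow_order="TACG"):
    """Convert one signal per nucleotide flow into a base sequence.

    Each signal is rounded to the nearest whole number of bases; a signal
    near 0 means the flowed base was not incorporated.
    """
    bases = []
    for index, signal in enumerate(signals):
        count = int(signal + 0.5)  # round a non-negative signal to nearest integer
        bases.append(flow_order[index % len(flow_order)] * count)
    return "".join(bases)


# Hypothetical signals for two rounds of the flow order T, A, C, G.
signals = [1.02, 0.05, 0.97, 2.10, 0.10, 1.00, 0.03, 0.90]
print(decode_flowgram(signals))  # TCGGAG
```

The signal of about 2 in the fourth flow is read as two Gs in a row; telling long runs of the same base apart in this way becomes harder as the runs get longer.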

Concept Check

1. What reaction, covered in Chapter 7, is absolutely essential to the DNA sequencing technologies described here?

2. What are the advantages of NGS (next generation sequencing) methods? Can you think of any disadvantages?

Chapter 10. Applications of Synthetic Biology & iGEM

So far, we have covered a lot of how to genetically engineer things. But we have not yet covered the most important question: why? Why would one want to genetically manipulate life itself to do something it isn’t supposed to? What is the benefit? What are the limitations we face in trying to tackle real-world issues using synthetic biology and bioengineering? Let us discuss each of the questions you may have surrounding “why”.

The purpose of synthetic biology research is to solve real-world problems using life itself. Nature provides us with an endless list of structural proteins, enzymes, and genomic elements that allow us to address almost any physical issue imaginable using biological molecules. Many times, synthetic biologists will find a protein from an organism that tolerates extreme environments, and will then introduce it into a harmless, non-pathogenic organism, such as E. coli K-12, so it can perform its function and have no further impact. As for the limitations of synthetic biology, engineered organisms continue to perform better and better in their new environments, suggesting that many current limitations can be overcome. In many cases, engineered organisms also outperform traditional chemical methods by being more flexible, cleaner, and more cost effective, which lends further support to the world of synthetic biology.

iGEM, or the International Genetically Engineered Machine Competition, aims to advance the field of applied synthetic biology on a global scale. Teams from all over the world research and develop synthetic biology solutions to major problems and present these solutions at a conference, known as the Giant Jamboree, every fall in Boston, Massachusetts. In the past, teams have addressed a wide range of issues, from decomposing and gum, to purifying water from uranium contamination, to developing an organism-based detection method for chlamydia, to purifying polluted air.
Many teams begin with an idea that becomes a marketable product by the time the Giant Jamboree is held. There are overgraduate, undergraduate, and even high school teams that participate in iGEM. However, science is only half of the iGEM experience. The other half is known as Human Practices, which focuses on how the projects covered in iGEM affect the world, as well as ethical issues surrounding synthetic biology, and education. This is all part of iGEM being an “open source” competition, in which rivalry between teams is set aside to allow for maximum scientific advancement. Teams collaborate with one another to help accomplish each other’s goals and advance the field of synthetic biology to benefit humankind.

This year is the second year UNBC is participating in iGEM. Previously, we attempted to engineer E. coli to produce extracellular copper binding proteins to purify contaminated water. This year, we are using siRNA mediated gene silencing to try to combat the antibiotic resistant superbug Methicillin-resistant Staphylococcus aureus (MRSA), which is a global health concern, including within our own . We have identified Hfq from S. aureus and purified the protein, and have measured the protein’s binding affinity to our custom designed siRNA binding regions. We will then test the effect of our siRNAs in MRSA before attending iGEM this fall.

Concept Check

1. What does iGEM stand for?

2. What is an issue you would like to address using synthetic biology? Be creative. Attempt to design how you would engineer your organism based on all previously mentioned information.

Photo references list. All photos are reusable under the Creative Commons agreement.

1. https://commons.wikimedia.org/wiki/File:L-alpha-amino-acid-2D-skeletal.png

2. https://commons.wikimedia.org/wiki/File:DNA_structure_formula_virgin.svg

3. https://qph.ec.quoracdn.net/main-qimg-f1d6ed0e5d353d582d476ab278734933

4. https://upload.wikimedia.org/wikipedia/commons/7/7f/Rna-structure.jpg

5. https://upload.wikimedia.org/wikipedia/commons/thumb/3/3f/Stem-loop.svg/1280px-Stem-loop.svg.png

6. https://upload.wikimedia.org/wikipedia/commons/c/c6/PGEX-3X_cloning_vector.png

7. https://upload.wikimedia.org/wikipedia/commons/thumb/7/78/EcoRI_restriction_enzyme_recognition_site.svg/2000px-EcoRI_restriction_enzyme_recognition_site.svg.png

8. https://upload.wikimedia.org/wikipedia/commons/thumb/9/9f/SmaI_restriction_enzyme_recognition_site.svg/500px-SmaI_restriction_enzyme_recognition_site.svg.png

9. https://upload.wikimedia.org/wikipedia/commons/8/86/Large_Gel_Electrophoresis_Chamber_with_Agarose_gel_inside_-_%283%29.jpg

10. https://upload.wikimedia.org/wikipedia/commons/e/e6/DNAgel4wiki.png

11. http://forensics.psu.edu/research/dr.-mitchell-holland/projects/documents/NextGenerationDNASequencingDataFlowgram.png
