Biol 200 – Professor Richard Roy

Lecture 17 – Molecular Genetics Methods II

E. coli and bacteriophage lambda are the workhorses of molecular biology

• Most higher organisms use processes similar to those of bacteria and viruses (phage) to replicate, transcribe, and translate their essential cellular molecules
• Purified components (e.g. enzymes) facilitated rapid progress
• Paul Berg and others were the first to successfully introduce exogenous DNA fragments into bacteria to produce a recombinant gene product
• This milestone ushered in the era of recombinant DNA technology

Cloning – Easy to Misunderstand

• Clone:
  o Group of organisms produced from one stock or ancestor
  o One such organism
  o Person or thing regarded as identical to another
• The term is used in different circumstances to refer to different procedures
  o Early frog embryo
    ▪ Clone by physically splitting the embryo
    ▪ Two frogs develop from the same zygote
    ▪ A single blastomere can still develop into a complete frog
  o Cloning by replacing the nucleus of an egg cell with the nucleus of an adult somatic cell, then implanting the resulting embryo
    ▪ Remove the nucleus from an oocyte and take a nucleus from a fully differentiated cell, e.g. a cell from mammary tissue
    ▪ Put that nucleus into the oocyte, which reprograms it
    ▪ E.g. the cloning of Dolly
    ▪ This reprogramming mechanism can give rise to totipotent cells
  o Certain mutants have been identified with strange morphological changes
    ▪ E.g. a Drosophila head with legs growing where the antennae should be (Antennapedia)
    ▪ Geneticists can identify the affected chromosome
    ▪ They take chunks of the chromosome and break them down into simpler bits of DNA that can be worked with
    ▪ Find the region of interest in that DNA and find a way to make large amounts of it
    ▪ The DNA can be put into plasmids and amplified
  o Lots of material is needed to understand sequences
    ▪ We use plasmids that can replicate autonomously to produce large quantities
    ▪ Analyses can then be carried out on that large quantity

Bacterial restriction enzymes cut DNA at specific sites

• EcoRI (and other enzymes like it) cuts DNA at a specific six-nucleotide palindromic sequence (GAATTC) that occurs relatively infrequently throughout the genome
• The reaction products have overhanging ends that are fully complementary to one another and are thus “cohesive”
• Specialized plasmids called vectors were engineered to carry out specific functions
• Polylinkers (or multiple cloning sites) facilitate the introduction of DNA fragments
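The recognition-site idea can be sketched in a few lines of Python. This is only an illustration, not part of the lecture: the function names and the example sequence are made up, and only the GAATTC site comes from the notes. The cut position used (after the G, i.e. G^AATTC on each strand) is the standard one for EcoRI. For a random sequence with equal base frequencies, a given six-nucleotide site is expected roughly once every 4^6 = 4096 bp, which is why such sites are relatively infrequent.

```python
# Hypothetical helper names; only the GAATTC site itself comes from the notes.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A<->T, C<->G)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(site):
    """A site is palindromic if it reads the same 5'->3' on both strands."""
    return site.upper() == reverse_complement(site)


def find_sites(dna, site="GAATTC"):
    """Return the 0-based start position of every occurrence of the site."""
    dna = dna.upper()
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start)
        start = dna.find(site, start + 1)
    return positions


def digest(dna, site="GAATTC", cut_offset=1):
    """Split the top strand at each site; cut_offset=1 cuts after the G."""
    dna = dna.upper()
    fragments, previous = [], 0
    for position in find_sites(dna, site):
        cut = position + cut_offset
        fragments.append(dna[previous:cut])
        previous = cut
    fragments.append(dna[previous:])
    return fragments


if __name__ == "__main__":
    example = "AAGAATTCGGGAATTCTT"  # made-up sequence with two EcoRI sites
    print(is_palindromic("GAATTC"))
    print(find_sites(example))
    print(digest(example))
```

Each internal fragment begins with AATTC on the top strand; the single-stranded AATT stretch left by the staggered cut is the cohesive overhang that can pair with any other EcoRI-cut end.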

Differential digestion facilitates directional insertion of desired DNA at specific sites

• Remember that a vector can close back on itself because its cohesive ends can stick back together
• If only one enzyme is used, the insert can go in in either orientation
• Digestion of a vector with two different enzymes facilitates directional cloning
  o Asymmetry in the digestion
  o The plasmid can’t close up on itself because the ends are different
  o The DNA of interest can only go in one way: EcoRI ends will only pair with EcoRI ends and SphI ends will only pair with SphI ends

Studying Genes of Interest: DNA Libraries

• Permanent collections of genes can be obtained and maintained in DNA libraries
  o Purify the DNA or chromosomes of an organism
  o Cut that DNA with a restriction enzyme
  o Take the whole mixture and clone it into vectors so that every little piece of the chromosomes is incorporated into a vector
  o Join the inserts to the vectors using DNA ligase
  o Maintain all the plasmids as a library
  o All of the DNA should be represented in a library that you can grow and maintain
  o To identify a specific DNA sequence, you can take the library at any time and fish for the plasmid whose insert corresponds to the DNA of interest
• Genomic libraries contain copies of the DNA present in its genomic/chromosomal context (intergenic regions, introns, exons, repetitive sequence)
• cDNA libraries represent mRNA (tissue specificity, abundance)

mRNA can be enriched from a starting population of total RNA (tRNA, rRNA)

• You want to purify the mRNA from a given source
  o It has qualities that allow us to enrich it away from the more abundant ribosomal RNAs and tRNAs in the cell
• The poly-A tail of mRNAs enables their purification via oligo-dT nucleotides linked to a solid support
• RNA is run over a column carrying oligo-dT: RNA with a poly-A tail binds to the oligo-dT in the column while everything else flows through
• The column can then be eluted, and the eluate will be greatly enriched in mRNA, all because of the poly-A tail
• The mRNA in that tube should be representative of all the mRNAs present in the cell at a given time
• But mRNAs degrade very rapidly, so they must be converted into cDNA
• Using the oligo-dT paired to the poly-A tail as a primer, reverse transcriptase (RT) is used to synthesize the complementary DNA (cDNA)
  o RT is a viral enzyme that allows viruses to use RNA as a template to make DNA
  o The first-strand cDNA is single-stranded and can act as a template to make dsDNA, which can be maintained

mRNA can be converted to cDNA using retroviral Reverse Transcriptase (RT)

• Once the first strand of DNA has been made from an mRNA template, you can make a double-stranded molecule
• You need a primer that anneals to that single-stranded DNA molecule so that synthesis can proceed 5’ to 3’
• Bacterial enzymes will elongate from the primer and make DNA using the single-stranded DNA as a template
• Once you have a few copies of the DNA, you can amplify them by PCR to make many copies of that cDNA
• You can put this into a library by introducing the DNA into bacterial vectors
  o In the RT step, a poly-dT oligonucleotide hybridizes with the poly-A tail of all the mRNA molecules in the complex sample
  o Extend it with RT and remove the RNA by alkali treatment or with RNase
  o This leaves a single-stranded DNA copy complementary to the RNA that you started with
• mRNA is converted to complementary DNA (cDNA) by priming the poly-A tail with a single-stranded poly-dT oligonucleotide
• RT uses this primer to initiate single-strand DNA synthesis that is fully complementary to the mRNA template
• The RNA is then removed and a poly-dG adapter (a small bit of DNA with a known sequence) is attached to the 3’ end. A poly-dC primer is used to initiate synthesis of the second DNA strand
  o The single-stranded DNA molecules have unknown ends, so poly-dG is added to all of them
  o Use DNA ligase so you have a bunch of molecules with a poly-dG end (3’ end)
  o The end is now known, so the second DNA synthesis step can be primed with a poly-dC oligonucleotide that recognizes the added sequence
  o It base pairs with the adapter to prime second-strand synthesis by E. coli DNA polymerase I, forming a dsDNA molecule representative of the mRNA present in the original sample
• E. coli DNA polymerase I progresses through any remaining hybrid regions and extends the second strand
  o The number of cDNA molecules that correspond to any given mRNA will also reflect that mRNA’s abundance
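The base-pairing logic of first- and second-strand synthesis can be illustrated with a small Python sketch. It models only complementarity, not the enzymology (no priming, adapters or RNase step), and the function names and example mRNA are hypothetical.

```python
# Illustrative only: models strand complementarity, not the enzymes.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna):
    """Return the first-strand cDNA (written 5'->3') for an mRNA written 5'->3'.

    The cDNA is the reverse complement of the mRNA, so a 3' poly-A tail
    becomes a 5' run of T (the extended oligo-dT primer).
    """
    return "".join(RNA_TO_DNA[base] for base in reversed(mrna.upper()))


def second_strand(first_strand):
    """Return the second DNA strand (5'->3'), complementary to the first."""
    return "".join(DNA_TO_DNA[base] for base in reversed(first_strand.upper()))


if __name__ == "__main__":
    mrna = "AUGGCCAAAAAA"  # made-up transcript with a short poly-A tail
    first = first_strand_cdna(mrna)
    print(first)
    print(second_strand(first))
```

The second strand has the same sequence as the original mRNA with T in place of U, which is why the double-stranded cDNA is a faithful, stable record of the transcript.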

Studying genes of interest: Genomic and cDNA libraries

• cDNA libraries correspond to the population of mRNA molecules present at a given time, or in a given tissue (or both)
• Permanent, reusable record of the mRNA expressed in a given tissue/stage
• Every single mRNA (cDNA) should be represented at least once among the phage plaques → complexity
• Probes allow you to find specific cDNAs: they identify specific DNA sequences by hybridization (complementarity)

Polymerase Chain Reaction (PCR)

• A technique that has revolutionized molecular biology
• PCR became feasible for routine lab work with the discovery of thermostable DNA polymerases
  o These were isolated from extreme thermophiles that live in very harsh conditions, e.g. the hot springs and geysers of Yellowstone National Park, Wyoming, USA
• PCR allows exponential amplification of target DNA molecules from a minimal amount of starting template, in the extreme case from a single molecule (the ice-man, forensic science, Neanderthal ancestors)
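“Exponential amplification” can be made concrete with a short calculation. The sketch below assumes the idealized case in which every template is copied once per cycle; the `efficiency` parameter (1.0 = perfect doubling) is an assumption added for illustration and does not come from the notes.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Idealized number of target molecules after a given number of cycles.

    efficiency is the fraction of templates copied per cycle
    (1.0 means perfect doubling). Real reactions plateau; this does not.
    """
    return initial_copies * (1 + efficiency) ** cycles


if __name__ == "__main__":
    # Starting from a single molecule, as in the extreme case above.
    for n in (10, 20, 30):
        print(n, pcr_copies(1, n))
```

With perfect doubling, 30 cycles turn one molecule into 2^30 (about a billion) copies, which is why a single template can be enough.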

Studying genes of interest: RT-PCR

• All mRNAs present in a given cell have a poly-A tail → poly-T primers → RT-PCR
• Abundance of cDNA is representative of transcript number

cDNA: What to do with it

• Recombinant proteins can be overexpressed in E. coli to make large quantities of protein, e.g. insulin
• The lac promoter provides an effective means of inducing gene expression
  o The lac operon in bacteria turns on the lacZ gene
  o The lacZ gene can be taken out and any cDNA put in its place (e.g. insulin, growth hormone)
    ▪ Induction of lac expression then makes large amounts of the product of that cDNA

Bacterial expression vectors: over-expression of recombinant protein

• Recombinant proteins can be overexpressed in E. coli to make large quantities of protein (insulin, hGH, EPO, interferon)
• The lac promoter provides an effective means of inducing gene expression

Lecture 18 – Molecular Genetics Methods II (continued)

Specialized vectors permit efficient expression in higher eukaryotic cells

• Vectors are often built for a specific function and require elements important for transcription (promoters)
• Viral signals encoded in these vectors can specify where to put the poly-A tail
• You don’t always want to express a protein in bacteria
• Expression vectors carrying cDNAs can be introduced into eukaryotic cells in a process similar to transformation in bacteria: transfection
• Transient transfection: only some cells express the transgene
  o The vector expresses the cDNA of interest for a short period of time, until the cells get rid of it
• Stable transfection: all the cells express the transgene
  o Select for the presence of the expression vector using drugs that kill cells that don’t carry it
  o All the remaining cells will express the cDNA of interest

Expression of fusion proteins facilitates functional analysis

• Key elements at the end of an RNA (stops) signal where synthesis terminates
• A piece of another protein can be added onto the gene product of interest
  o Introduce domains at the C-terminus of the protein by eliminating the stop, so that the reading frame continues into the added sequence
  o The coding region of interest can also be put at the beginning of a transcript
  o This makes a hybrid mRNA that becomes a hybrid protein
• Aequorea victoria produces a fluorescent protein that has been named Green Fluorescent Protein (GFP)
  o You can figure out where GFP is being expressed by looking at the fluorescence
  o GFP can be placed downstream of a promoter to find out where that promoter is being activated
• Fusion proteins:
  o Promoter driving GFP (gene expression patterns)
  o GFP added to proteins (protein detection)
  o GFP-tagged proteins such as histones decorate the chromosomes and the cytoskeleton during mitosis, in a living cell in real time

Optogenetic approaches take advantage of the properties of channelrhodopsin

• Channelrhodopsin is a microbial channel that allows the movement of ions into cells
• It is sensitive to light

Optogenetic approaches are based on introducing a microbial Channelrhodopsin into specific neurons

• If Channelrhodopsin is introduced into specific neurons, it will allow ions to enter these neurons and activate them
• Transgenes are made in which promoters that may be neuron-specific drive this microbial Channelrhodopsin
• These are introduced into animals (e.g. mice) so that the promoter is active in those specific neurons, which then produce the foreign microbial Channelrhodopsin
• With a blue light stimulus on the animal, the neurons carrying Channelrhodopsin open the channels and allow ions to flow in
• This causes an action potential, which activates the neuron

Molecular biological techniques advance our analytical capabilities

• Qualitative analysis: the nature of the molecules in question
  o Size
  o Nucleotide composition
  o Conformation/configuration
  o Structure
• Quantitative analysis: molecular approaches can be used to determine the levels of specific gene products
  o e.g. tumour markers (p53, BRCA1/2)

Molecular probes can be used to find a needle in a haystack

• Changes in nucleic acids can be visualized using these probes
• Separate complex mixtures on an agarose gel according to size to reduce complexity
• The separated molecules can then be stuck onto a solid support, e.g. nitrocellulose or nylon
• At this point little is known about them except their size
• Use a labelled, sensitive probe that can detect a few molecules
• Radioactivity is usually used to label nucleic acid probes, which makes them very sensitive
• Mix the solid support carrying the separated mixture with the probe
• The probe will only bind its target through Watson-Crick base pairing; if you wash it really well, you can detect specifically where the molecule is

Single-stranded oligonucleotides can be labelled using polynucleotide kinase

• Start from a known sequence that corresponds to the gene or gene product of interest
• Synthesize an oligonucleotide complementary to the specific region of the gene of interest
• Then you need to label it
• Polynucleotide kinase (PNK) transfers the terminal (γ) phosphate of ATP to the free hydroxyl at the 5’ end of the oligonucleotide
• The oligonucleotide can be purified on a column, leaving you with the sequence of interest
• When the radioactive probe is mixed with a complex mixture containing a nucleic acid with that sequence, it sticks to it very well and everything else can be removed by washing

PCR can be used to make radiolabelled DNA probes

• Probes are made by incorporating deoxyribonucleotides that carry a radiolabel on the α-phosphate into PCR-amplified DNA
  o You reduce the concentration of one of the dNTPs
• Unincorporated radioactive nucleotides are removed and the radiolabelled DNA can then be used as a probe
• It must be rendered single-stranded prior to use so that it can act as a complementary probe

Non-Radioactive Nucleotide Probes

• Give off light instead of radioactivity
• Particular molecules are incorporated into the DNA strands and then have to be detected somehow
  o Detection is based on antibodies or on affinity for the molecule in question
• You do the PCR reaction and incorporate the molecules (e.g. biotin); reagents such as antibodies then recognize them

PCR can also be used to make fluorescently labelled DNA probes

• Incorporation of conjugated deoxyribonucleotides into PCR-amplified DNA
• The non-radioactive probe can be used in hybridizations just like a radioactively labelled DNA probe
• Detection requires a second step that includes antibodies or fluorescently labelled conjugates

Nucleic acid hybridization techniques enable DNA and RNA detection

• Nucleic acids of complementary sequences can base pair to form double-stranded hybrids

Analysis of nucleic acids by transferring to solid state support (nylon, nitrocellulose)

• Both DNA and RNA can be separated according to size using an agarose gel. DNA is first cut with a restriction enzyme.
• mRNAs of different sizes can be separated on an agarose gel
• The separated molecules are transferred to a solid-state support where they are maintained in a denatured (single-stranded) state
• Treating them with UV ensures that they are covalently linked to the support
• Before DNA is put on the solid-state support, it has to be denatured in the gel using an alkaline bath
• Solid-state supports = blots
• DNA → nylon, nitrocellulose
• RNA → nitrocellulose

Once covalently bound, the levels and positions are permanently recorded

• The nucleic acids are then bound covalently
• Permanent record of the abundance and size of the molecules following separation
• The “blot” can be “hybridized” with probes
  o A DNA blot is called a Southern blot; an RNA blot is called a Northern blot
  o You do this by mixing your blot with your probe, which must be single-stranded
  o PCR products must be boiled beforehand so that they can interact with the targets
• Washing removes non-specific signal, so only complementary sequences will be detectable on the blot following autoradiography

Nucleic acid hybridization techniques and DNA detection

• Nucleic acids of complementary sequences can base pair to form double-stranded hybrids, e.g. in a Southern blot
• Southern blots are useful for detecting differences between genomes
• You take DNA from individuals (a pedigree or family) to find out where a disease gene may come from
  o Cut the DNA with a restriction enzyme and run it through an agarose gel to separate the fragments by size
  o Render it single-stranded, transfer it to nylon, then hybridize with a probe
  o The probe corresponds to a region you’re interested in, so it will only light up a few fragments in that complex mixture
  o You can ask whether there are differences between various members of the family
  o Differences in fragments can be correlated with prevalence of the disease: Restriction Fragment Length Polymorphism (RFLP)

PCR can also be used for genotyping

• An example: a specific allele of the Breast Cancer Related 1 (BRCA1) gene can be detected by the presence of a particular restriction enzyme site, i.e. as a restriction fragment length polymorphism (RFLP)
• Amplify the region as a PCR fragment
• Cut the amplified fragment with the enzyme
• If the enzyme doesn’t cut it, the mutation isn’t there (wild type)
• If only part of the product is cut, there are two kinds of DNA, one with the mutation and one without: one copy of the gene is mutant (heterozygous)
• Full digestion shows that both copies carry the mutation, so that person is at very high risk of developing breast cancer
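The decision rule above (uncut, partly cut, fully cut) can be written as a small Python function. The fragment sizes are hypothetical numbers chosen for illustration; the notes do not give the real amplicon or fragment lengths. As in the notes, the sketch assumes that the mutant allele is the one that carries the restriction site.

```python
def call_genotype(bands, uncut=500, cut=(300, 200)):
    """Call a genotype from the band sizes (bp) seen after digestion.

    uncut: size of the undigested PCR product (hypothetical value)
    cut:   sizes produced when the site is present (hypothetical values)
    """
    observed = set(bands)
    has_uncut = uncut in observed
    has_cut = set(cut) <= observed
    if has_uncut and has_cut:
        return "heterozygous"
    if has_cut:
        return "homozygous mutant"
    if has_uncut:
        return "homozygous wild type"
    return "undetermined"


if __name__ == "__main__":
    print(call_genotype([500]))            # enzyme does not cut
    print(call_genotype([500, 300, 200]))  # partial digestion
    print(call_genotype([300, 200]))       # full digestion
```

In practice an incomplete digest can mimic the heterozygous pattern, so a real assay would include a control known to cut fully; the sketch ignores this.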

Nucleic acid hybridization techniques enable RNA detection: Northern Analysis

• Run RNA through an agarose gel to reduce complexity, transfer it to nitrocellulose, and use a probe complementary to the RNA of interest
• All the RNAs are on the nitrocellulose and only one band corresponds to the RNA of interest
• Hybridize the blot with the probe and wash it to get an idea of where the RNA is being expressed
• Nucleic acids of complementary sequences can base pair to form double-stranded hybrids

Overview of the RNA-seq Procedure

• Northern blot → can only look at one gene at a time
• Next-generation sequencing technologies look at all RNAs expressed in a given tissue at a given time
• Take all the RNA, make cDNA, make small libraries, and do next-generation sequencing
• You can quantify each of those cDNAs and link them to their position in the genome based on the sequence you find
• You end up with RNA expression level as a function of nucleotide position
• You can say which genes are expressed, and in how many copies, in a particular tissue at a certain time

Lecture 19 – Proteins: Structure, Function and Separation Strategies

Some proteins have very interesting properties

• Make fusion proteins by linking GFP to another protein of interest
• In doing so, you label that protein and can follow it in real time using a microscope
• Special vectors (reporter vectors) can be made using the GFP sequence
  o These reporter vectors have GFP downstream of a promoter that you suspect is active in a given tissue type
  o You can put that promoter in and ask when GFP is being expressed, i.e. whenever you see green fluorescence
• Transgenes can be made to find out when specific genes are being expressed
• The GFP molecule was characterized structurally using crystallization and X-ray diffraction
• It’s composed of ribbons (beta sheets) and emits fluorescence

Crystallography and X-ray diffraction have been useful to identify

• To understand the structure of a protein, you need to purify large amounts of it
  o Then try different combinations of salts to see if you can get crystals
• High concentrations of purified protein can organise to form crystal lattices
• These crystals can be bombarded with high-energy beams (e.g. X-rays), which are scattered according to the arrangement of the atoms within the crystal
  o Crystals are placed on a small scaffold in the path of giant beams of radiation
  o X-rays hit the sample and are deflected, giving information on how the molecules are structured; this relies on electron density
• The scatter pattern is recorded by detectors and the data are analyzed by complex programs to provide a prediction of electron density
• “Like deciphering the shape of a rock by its ripple pattern”

Electron density maps provide a skeleton upon which one can build a model

• By recording the electron density maps, structural biologists can begin to build models by filling in the observed density with amino acids that correspond to such shapes
• Amino acids are linked into ball-and-stick models to complete the chain; secondary and tertiary structures, along with individual interactions between residues (hydrogen bonding, van der Waals interactions…), can then be highlighted

Structural characterisation links primary sequence to protein topology

• Structure and function are not always revealed by primary sequence, which is just a chain of amino acids
• By determining protein structure, we can better understand how proteins carry out their respective functions

Proteins can bind other molecules

• Proteins may interact specifically with other molecules; ligands = binding entities
• Specific association between a protein and its ligand may control the output of that protein
• Some common ligands include:
  o Growth factors → receptor
  o Steroid hormones → receptor
  o Cytokines → receptor
• Ligand binding can change protein conformation considerably: allosteric switches
  o The protein will either take on a function or have a function blocked
  o Cells use proteins to dictate these types of switches
• Strength of a protein-protein interaction can be expressed as:

Kd – Dissociation constant

  o If Kd is small, the proteins like to stick together (high affinity)
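To see why a small Kd means tight binding, it helps to compute how much of a protein is bound at a given ligand concentration. The sketch below uses the standard definition for simple one-to-one binding, Kd = [P][L]/[PL], which gives a bound fraction of [L]/(Kd + [L]). The function name and the example concentrations are illustrative, and the ligand concentration is assumed to be the free (unbound) concentration.

```python
def fraction_bound(free_ligand, kd):
    """Fraction of protein bound to ligand for simple 1:1 binding.

    From Kd = [P][L] / [PL]: bound fraction = [L] / (Kd + [L]).
    free_ligand and kd must be in the same concentration units.
    """
    return free_ligand / (kd + free_ligand)


if __name__ == "__main__":
    ligand = 1e-6  # 1 micromolar free ligand (made-up value)
    for kd in (1e-9, 1e-6, 1e-3):
        print(kd, fraction_bound(ligand, kd))
```

When the ligand concentration equals Kd, exactly half of the protein is bound; the smaller the Kd, the less ligand is needed to reach that point.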

Proteins can be catalytic – Enzymes

• Enzymes reduce the energy required to carry out a given reaction
  o Proteins can drive reactions that require a lot of energy; without the enzyme these reactions would not normally occur
• Often, enzymes enhance substrate interactions such that reactions are favoured
  o They need substrate binding sites
  o They also need a catalytic site: a collection of amino acids that interact with the substrate when it is in the cleft, forcing the reaction forward to form products
  o It is responsible for the conversion of the substrate to the final product
• Critical residues in the active site may be involved in ternary complex formation and drive reactions forward

Quantification of catalytic efficiency can reflect specific properties of a protein

• The enzyme must first find its substrate
• Once the enzyme binds the substrate, it can carry out the catalytic reaction and form a product (at least 2 steps)
• After that, the product is released, freeing the enzyme for another round of reaction
• Formation of product can be described by looking at the rate of product accumulation as a function of the starting substrate concentration, for a given amount of enzyme
• The rate of product formation increases very rapidly as you increase the substrate, then reaches a plateau that corresponds to the maximal velocity (Vmax)
• For all of this to happen, the enzyme must find the substrate and bind to it
• The enzyme’s ability to bind a substrate is described by Km (the Michaelis constant)
• Km = the substrate concentration that gives half of the maximal velocity
• Good substrate = low Km, bad substrate = high Km
• Km is constant for a given enzyme and substrate (under given conditions): it’s a characteristic of that protein and that substrate
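The saturation curve and the meaning of Km follow from the standard Michaelis-Menten rate law, v = Vmax[S]/(Km + [S]). The short Python sketch below evaluates it; the numerical values and units are arbitrary and chosen only for illustration.

```python
def velocity(substrate, vmax, km):
    """Michaelis-Menten rate: v = Vmax * [S] / (Km + [S])."""
    return vmax * substrate / (km + substrate)


if __name__ == "__main__":
    vmax, km = 10.0, 2.0  # arbitrary units, made-up values
    for s in (0.5, 2.0, 20.0, 200.0):
        print(s, velocity(s, vmax, km))
```

At [S] = Km the rate is exactly half of Vmax, and at high [S] the rate approaches the Vmax plateau. At the same substrate concentration, an enzyme-substrate pair with a lower Km runs closer to Vmax, which is the sense in which a low Km marks a good substrate.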

Calcium, Calmodulin and conformation

• Influx of calcium ions into neurons can activate them
• Many proteins require Ca2+ for optimal function. Often this requires a Ca2+-binding protein called calmodulin.
• Calmodulin radically changes its conformation when bound to Ca2+.
  o Calmodulin → allosteric trigger
• Calcium interacts with calmodulin (which has EF-hand domains)
  o The EF domains bind calcium, which changes the conformation of the domain
  o It is an allosteric regulator that depends on Ca2+
• In the Ca2+-bound state, calmodulin can recognise specific regions of proteins to which it binds, thereby altering protein function in a Ca2+-dependent manner

Proteins can act as switches

• Switches involve turning a protein on or off using small molecules
• Active proteins often exist in an “on” and an “off” state and fluctuate between these states in response to intra- or extracellular cues
• GTP-binding proteins can possess a GTP-hydrolysing activity (GTPase) that converts GTP to GDP
  o This turns the protein off: it goes to the GDP-bound state, which is inactive
• GTPase activity is influenced by GTPase-activating proteins (GAPs)
  o GAPs turn the GTP-bound molecules off by flipping them into the inactive GDP-bound state
  o Guanine nucleotide exchange factors (GEFs) displace GDP so that GTP can bind, which activates the switch
• Phosphorylation by protein kinases can also act as a trigger by causing changes in protein conformation
  o Through the activity of a protein kinase, a phosphate is transferred to an acceptor molecule
  o The conformational changes can result in an increase or decrease in activity
• These changes are counteracted by protein phosphatases, which remove these modifications

Centrifugation

• Proteins need to be purified
• Differential centrifugation
  o Spin solutions at different RPMs to sediment particles of different sizes
  o Larger particles sediment faster while smaller particles stay in solution
  o This gives fractions of separated proteins (large, small)
• Rate-zonal centrifugation
  o Layer the sample on a density gradient
  o Density is greatest at the bottom and lowest at the top
  o Particles settle according to mass
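Because the force on a particle depends on the rotor as well as the speed, RPM values are usually converted to relative centrifugal force (RCF, in multiples of g) when comparing spins. The sketch below uses the standard conversion RCF = 1.118 × 10^-5 × r × RPM^2, with r the rotor radius in centimetres. It is an addition for illustration: the notes mention only RPM, and the radius and speeds shown are made-up values.

```python
import math

RCF_CONSTANT = 1.118e-5  # standard factor for radius in cm and speed in RPM


def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g) for a given speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf, radius_cm):
    """Speed (RPM) needed to reach a given RCF at a given rotor radius."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


if __name__ == "__main__":
    radius = 10.0  # cm, hypothetical rotor
    for rpm in (1000, 10000, 40000):
        print(rpm, rcf_from_rpm(rpm, radius))
```

Since the force grows with the square of the speed, a tenfold increase in RPM gives a hundredfold increase in force, which is what lets successive spins pellet progressively smaller particles.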

SDS-Polyacrylamide Gel Electrophoresis

• Does not use agarose; uses a polyacrylamide gel
• If you mix acrylamide with crosslinkers, it forms a gel matrix with pores
• When you run proteins through, they go in and out of these pores and eventually separate
• They will separate based on their conformation, size, etc.
• Heat the protein with SDS
  o SDS disrupts all hydrophobic regions of the protein and opens the protein up
  o SDS adds negative charge to all of the polypeptides, so that their charge is uniform
• Reducing agents break the disulphide bonds, so each complex falls apart into its individual components, which are completely opened up
• The mass-to-charge ratio will be more or less identical for all of the protein entities in the starting sample
• The negatively charged molecules will migrate in the gel and resolve according to their mass

Southern, Northern and Western blots

• An SDS gel can be transferred onto a solid-state membrane
• Western blot → proteins
• Incubate the blot with an antibody directed against the protein of interest
  o Because of the specificity of the protein-antibody interaction, you can eliminate all the background and light up where that antibody is
  o You can focus on only the proteins of interest

2D Gel electrophoresis

• When 2 proteins are similar in mass, they cannot be separated in an SDS gel
• They can instead be separated according to charge
• Proteins are separated first according to charge and then according to mass → 2D gel electrophoresis
• Isoelectric focusing (initial step)
  o Form a gel strip that contains a pH gradient set up by ampholytes
  o Ampholytes carry both negative and positive charges and establish varying pHs along the strip
  o Proteins migrate in the gel to the point where they are no longer charged: they arrive at the pH where they have no net charge (isoelectric point)
• The strip can then be put in SDS solution and placed on top of a polyacrylamide gel, and the proteins that have been separated according to charge are run through SDS-PAGE
  o First resolved according to charge, then by size

Gel filtration or size exclusion chromatography separates according to size

• Polypeptides can be separated by column chromatography
• The matrix consists of beads containing pores or indents of a specific size (agarose, Sephadex)
  o The indents can be made in different sizes
  o A complex mixture of proteins can be run through and will make its way to the bottom
• Proteins or complexes flow through the channels between the beads
• Small proteins make it into the pores while big proteins are confined to the space between the beads.
  o Big proteins could be the native proteins of the crude product; they are too big to go into the dimples so they go right through
  o Small proteins go in and out of the dimples until they elute from the column, where they can be collected in different fractions (different fractions = different sizes)
• Very large proteins simply flow through the column while smaller proteins take the long way
  o Larger proteins come off first while smaller proteins come off last
• Longer columns provide greater separation efficiency

Ion exchange chromatography: separation based on electrochemical properties

• Entirely based on the protein’s ability to interact with the matrix
• Chemical groups on the gel itself make it negatively charged or positively charged
• For a negatively charged gel, positively charged proteins will bind, while negatively charged proteins will just go through
• If you run a buffer over that column and increase the ionic strength (usually NaCl or KCl), you increase the concentration of ions
• In doing so, the positive ions in the buffer compete with the bound positively charged proteins and kick them out
• You end up displacing the positively charged proteins initially bound to the negative column by increasing the concentration of positive ions in the buffer you’re using to elute the column
• Proteins that are weakly bound will come off early because they don’t compete as well
• Fractions can be collected at the end of the column

Affinity Chromatography

• The most efficient method of purifying proteins
• An affinity tag can be added to engineer a translational fusion variant of any protein of interest. This facilitates rapid purification based on affinity between molecules.
  o Introduce the affinity tag into a DNA sequence that you know corresponds to the protein (the cDNA of the protein of interest)
  o Disrupt the translation stop in the cDNA and fuse it to a sequence encoding a series of amino acids that corresponds to a known epitope (a small peptide that antibodies have been raised against)
  o The antibody recognizes that small peptide
  o When the construct is introduced into cells by transfection, the cells can produce a lot of this protein with the affinity tag
  o We can crush up those cells and put that complex mixture on an affinity column
  o If we have an antibody that recognizes that particular tag, we can conjugate that antibody to small agarose beads in the column
  o The antibody interacts with the affinity tag and grabs onto the fusion protein; everything else flows through
  o Lock-and-key mechanism of enriching the protein population on the column
  o A low-pH buffer can then be run through the column so that the protein is no longer recognized by the antibody and comes off the column; you then quickly want to bring the pH back up
• Some affinity combinations:
  o GST/glutathione agarose
  o 6His/nickel columns (IMAC)
    ▪ Binding to a column with a metal
  o Antigen/antibody columns (protein A linkage): HA, Myc, FLAG

Lecture 20 – Eukaryotic Transcription I

Transcription in eukaryotes is compartmentalized

Transcription in eukaryotes is restricted to three organelles:

• Plastids (plants, algae)
  o Mechanistically resembles prokaryotic transcription
• Mitochondria
  o Mechanistically resembles prokaryotic transcription
• Nucleus
  o The nucleus is the most important site for transcription in eukaryotes

The three eukaryotic RNA polymerases have specialized tasks

• RNA polymerase I: transcription of the ribosomal precursor RNA genes in the nucleolus (28S, 5.8S, 18S)
• RNA polymerase II: transcription of all protein-encoding genes (mRNA), some RNAs required for splicing (U1-U5), other small non-coding RNAs
• RNA polymerase III: transcription of ribosomal RNA genes outside the nucleolus (5S), transfer RNA (tRNA) genes, and small stable RNAs including U6, which is required for splicing

Eukaryotes possess three RNA polymerases

 All three eukaryotic RNA polymerases share common features: o All of them are multimeric protein complexes o Some subunits show significant homology with bacterial RNA polymerase o All of these subunits are more or less essential  RNA pol II has a carboxy terminal domain (CTD) while RNA pol I and III do not o It is made up of a repeated sequence that occurs 26 times in yeast and around 50-52 times in mammals o These repeats on the large subunit are very important and it has become clear that the CTD is critical for a number of downstream events that occur during and after transcription

TATA sequence directs transcription at the promoters of some protein coding genes

 If you have a purified fraction of RNA pol II and you put it in an enzymatic reaction, it will just initiate transcription o If you have a nick in the DNA, that’s sufficient for RNA pol II to do transcription  It’s one of these enzymes that needs to be disciplined  There are elements that regulate pol II so that it transcribes only where it’s supposed to  One of the first elements identified corresponded to the nucleotide sequence TATAA, and it was upstream of the few genes identified before any genome sequencing  TATA box is usually found upstream (-35) of the first nucleotide that corresponds to the 5’ end of the pre-mRNA (+1 site) o It is very important to the initiation of transcription  TATA box is absent in some genes but they are actively transcribed as well  Another series of elements is equally important for the transcription of genes
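As a rough illustration of the positional convention above, the following Python sketch scans a promoter sequence for the TATAA motif and reports each hit relative to the +1 site. The function name find_tata_boxes, the tss_index argument and the toy sequence are assumptions made for this example and are not part of the lecture; real promoter analysis would normally use a position weight matrix rather than an exact string match.

```python
def find_tata_boxes(promoter, tss_index, motif="TATAA"):
    """Return the position of each motif start relative to the +1 site.

    tss_index is the 0-based index of the +1 nucleotide in `promoter`.
    Upstream positions are negative and there is no position 0.
    """
    promoter = promoter.upper()
    hits = []
    start = promoter.find(motif)
    while start != -1:
        offset = start - tss_index
        # Index tss_index - 1 is position -1; index tss_index is position +1.
        position = offset if offset < 0 else offset + 1
        hits.append(position)
        start = promoter.find(motif, start + 1)
    return hits


# Toy promoter (hypothetical): TATAA starts 35 nt upstream of +1.
toy = "G" * 10 + "TATAA" + "C" * 30 + "A" + "T" * 5
print(find_tata_boxes(toy, tss_index=45))  # [-35]
```

A promoter that returns an empty list here would be a candidate "TATA-less" promoter in the sense used in these notes.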

Other proximal elements are equally important in transcriptional initiation

 There are a number of DNA elements which contribute to efficient transcription o Initiator (-2 to +4) o Downstream core elements (+28 to +32) o TFIIB recognition element (-37 to -32)  Other things can compensate for the lack of a TATA box

Elements in the promoter affect transcriptional efficiency

 Some genes have nicely-placed TATA boxes  Other genes don’t have TATA boxes but they have CpG elements, which are regions in the proximal promoter that are enriched in CpG (CGCGCGCG elements) – the sequence reads the same going the other way on the opposite strand  Most of the genome is transcribed in both directions
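A minimal Python sketch of how CpG enrichment in a promoter region might be measured. It computes GC content and the commonly used observed/expected CpG ratio, and shows that a CGCGCGCG element reads the same on the opposite strand. The function names are assumptions for this example, and no particular threshold for calling a region CpG-rich is implied.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def cpg_stats(seq):
    """Return (GC content, observed/expected CpG ratio) for a sequence.

    Expected CpG count is taken as (#C * #G) / length.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c = seq.count("C")
    g = seq.count("G")
    observed = seq.count("CG")
    expected = c * g / n
    ratio = observed / expected if expected else 0.0
    return (c + g) / n, ratio


print(cpg_stats("CGCGCGCG"))           # (1.0, 2.0)
print(reverse_complement("CGCGCGCG"))  # CGCGCGCG
```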

Transcription occurs in both directions off some promoters

 TATA boxes will be associated with the genes at distance 0 or close to it

TATA-box binding protein (TBP)

 The TBP is a core polypeptide in a larger complex called TFIID  TFIIs are general transcription factors required for RNA Pol II or Class II transcription  The conserved C-terminal domain of TBP binds to the minor groove and distorts the double helix – it makes a kink in the DNA  Some promoters are “TATA-less”, but still require TBP for initiation  It is that kink in the DNA as well as TBP that is required to recruit all other transcription factors

RNA polymerase II: it’s complex

1. Kink in DNA after binding of TBP 2. TFIIB will recognize that complex and interact directly with TBP 3. A complex of RNA Pol II associated with another transcription factor called TFIIF will interact with TFIIB which is linked to TBP, which is linked to the TATA box 4. TFIIE comes into the complex 5. It is followed by TFIIH which has enzymatic activities associated with it  It has 2 DNA helicase activities and 1 protein kinase activity associated with different components of TFIIH  The complex becomes very stable – preinitiation complex  Once ATP is provided to the complex in addition to NTPs, the complex will open by melting duplex DNA – the template strand will be exposed so RNA pol II can start to read off its template and start making mRNA  Switch from pre-initiation to initiation  The protein kinase activity associated with TFIIH will phosphorylate key residues on the carboxy terminal domain of RNA polymerase II and this phosphorylation is also associated with the switch from initiation to elongation  Helicase opens up a bubble, protein kinase is responsible for the phosphorylation
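The stepwise assembly listed above can be sketched as a simple ordered checklist in Python. This is only a bookkeeping model that treats the order in these notes as strict; the names PIC_ORDER, assemble and is_preinitiation_complex are assumptions for illustration, and the sketch says nothing about binding kinetics.

```python
# Order of arrival as listed in the notes (treated here as strict).
PIC_ORDER = ["TBP (TFIID)", "TFIIB", "Pol II-TFIIF", "TFIIE", "TFIIH"]


def assemble(steps):
    """Add factors one at a time, rejecting any factor that arrives
    before the factors it depends on."""
    bound = []
    for factor in steps:
        expected = PIC_ORDER[len(bound)] if len(bound) < len(PIC_ORDER) else None
        if factor != expected:
            raise ValueError(f"{factor!r} cannot join yet; expected {expected!r}")
        bound.append(factor)
    return bound


def is_preinitiation_complex(bound):
    """True only when every factor, ending with TFIIH, has joined."""
    return bound == PIC_ORDER


print(is_preinitiation_complex(assemble(PIC_ORDER)))      # True
print(is_preinitiation_complex(assemble(PIC_ORDER[:3])))  # False
```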

Components of the basal transcription machinery play additional roles

 TFIIH is the only general/basal transcription factor that has ATP-dependent enzymatic activities  TFIIH contains two DNA helicases involved in Xeroderma pigmentosum  Xeroderma pigmentosum patients have a faulty nucleotide excision repair (NER) system  Some patients have defects that are much more severe

TFIIH may couple basal transcription with DNA repair

 Heavily transcribed regions are repaired more effectively  TFIIH may affect transcription-coupled repair o Would repair thymine dimer

Developmental gene expression is synchronized through release of paused elongation complexes

 Transcription-initiation is probably the rate-limiting step, but it may not be a general case  Gene transcription may be regulated downstream of initiation at elongation

 Instead of immediately going through the gene, the complex may get to the first nucleosome and wait for particular contingencies to switch over to an elongation phase of active transcription  A number of genes tend to be released into elongation at the same time  There is a coordination of gene expression associated with elongation controls as opposed to initiation controls

What genes are being actively transcribed?

 Nuclear Run on or Genome-wide Nuclear Run on (GRO)  You purify nuclei and provide those nuclei with a radioactively-labelled nucleotide which will be incorporated into a growing mRNA chain of RNA pol complexes that are actively transcribing any gene at any given time  You let that go for a while then you stop it and add elongation inhibitors  You can purify RNA from that reaction and get an idea of how many RNA pol molecules were transcribing – should be more or less related to the amount of radioactive RNA that you get from that particular sample  GRO can be combined with RNAseq – a genome-wide run-on assay to figure out how well all the genes are being transcribed at one single time  Instead of radiolabelling all the RNAs at once, you can use antibodies o Antibodies against Phospho-RNA Polymerase II (elongation specific) . Antibodies recognize elongating RNA Pol II . Smash up nuclei and add antibodies so they can interact with its antigen RNA Pol II in its elongating form . You can get rid of everything else and pull down RNA pol II and presumably all the RNA still associated with it o Perform immunoprecipitation and recover RNA . Get rid of everything else . Can bind antibody to certain matrices . Spin complexes and precipitate them . Precipitate has beads, antibodies, and RNA Pol II o Make cDNA and perform RNAseq . Make libraries and do RNAseq to find out how many cDNA sequences correspond to the genes in the genome . Can find out what genes are being transcribed and at what level all at one time o Protein A can bind to agarose beads and interact with antibody with high affinity which interacts with RNA pol II at high affinity o Can precipitate by centrifugation o Can also find out the direction of transcription

RNA polymerase I transcription requires the assembly of factors on a Pol I promoter

 Requires DNA elements to recruit RNA polymerase I to the appropriate Class I genes that need to be transcribed  To work properly, it needs TBP, which is surprising  No need for ATP for Class I (unlike Class II)  Two elements drive transcription of Pol I promoters: o The core element situated at -40 to +5 is essential for Pol I transcription o The upstream element at -155 to -60 greatly enhances Pol I transcription efficiency o Assembly in vitro begins with recognition of the upstream element by a protein complex called the upstream activating factor (UAF)  A trimeric complex called core factor, which contains TBP (TATA-binding protein), then associates with the growing assembly o Following this association, Pol I joins the complex

Pol III promoters are really different

 TBP is also required for optimal efficiency  No ATP is required for Class III  Pol III promoters control the transcription of the tRNA genes and the 5S rRNA gene  Unlike Pol I and Pol II genes, Pol III promoters are located within the transcribed regions  Conserved internal promoter elements called the A Box and the B Box of tRNA genes also encode key structural elements of the tRNA  The C Box acts as a promoter element in 5S gene transcription  Initiation requires sequential assembly: TFIIIC (with TFIIIA in 5S), TFIIIB then Pol III

Lecture 21 – Eukaryotic Transcription II

TATA-box Binding Protein (TBP)

 The conserved C-terminal domain of TBP binds to the minor groove and distorts the double helix.  Some promoters are “TATA-less”, but still require TBP for initiation  TBP is a component in all classes of eukaryotic transcription

Formation of a Pre-initiation Complex: the General (or Basal) Transcription Factors

 TBP (or TFIID) binds the TATA box and recruits the other general factors and Pol II  TFIIH is required for open complex formation and initiation o It melts the DNA and forms an open complex o It is dependent on ATP as an energy source  If the pre-initiation complex has NTPs, it will initiate transcription  Once elongation starts, RNA pol II changes to a phosphorylated form where residues in the carboxy terminal domain of the large subunit become heavily phosphorylated and these phosphorylations are dependent on a protein kinase present in TFIIH  Once that occurs, we switch from initiation to elongation  When people were doing in vitro assays using basal transcription factors with purified components, it became clear that you cannot get a strong activation of transcription  TBP is sufficient for basal transcription activity but cannot support activated transcription  TBP on its own will interact with the DNA but lacks many of the other factors required for transcription activation, which is more important for abundant transcription of genes

TFIID…

 TBP is just one small player in the TFIID complex  The other components, the TBP-associated factors (TAFs), all contribute to efficient transcription in vivo  This may be due to the large multi-protein complex interacting with transcriptional activators working upstream

Mediator…

 Large multi-protein complex required for activating transcription  Bridges what’s going on in the proximal regions in a promoter with more distal sites in the promoter  Important for changing the configuration of the chromatin to allow for efficient transcriptional activation

Transcription factors recognize specific DNA sequence motifs

 Activated transcription makes those general transcription factors more efficient o Affects re-initiation so transcription factors can re-initiate o Ensures that those factors come together more efficiently to form pre-initiation complexes  Activated transcription depends on elements that may be further upstream within the promoters and very distal o Changes the topology of the chromosomes o Looping of chromatin regions to allow accessibility of these factors o Turning on transcription depends on DNA-binding factors which will interact with the DNA in a specific manner due to interactions between specific domains

 These structures have alpha-helical domains that will interact in a non-covalent manner with the bases in the major groove of the DNA helix  Commonly, transcription factors possess an alpha-helical domain, the so-called recognition helix  These recognition helices, by interacting with those bases, will change the configuration of that region and allow for either more pre-initiation complexes or some aspect of basal transcription o The recognition helix can recognize a particular DNA sequence motif through non-covalent interactions with atoms in the bases.  Recognition usually occurs through interaction of the helix with the major groove of DNA  Not all have to interact with the major groove, e.g. TBP interacts with the minor groove

Transcription factor binding sites can be found through linker scanning mutations

 Linker scanning mutations identify important sites required for transcriptional activation of a downstream gene  To identify where the DNA elements are within a promoter region, we can carry out a number of experiments  One of the most efficient means is to scan through the DNA sequence and remove chunks using specific enzyme digests o Cut a region, take out the DNA that corresponds to that region and then patch it back up o Can test different regions to see their effect on transcription efficiency o If you remove certain regions, you can see the effect on the output  Reporter genes facilitate the relative quantification of transcriptional efficacy o Type of indicator o Tells us that you’re making a protein based on transcription or translation of this transgene o Driven by a promoter o Put it into a vector and introduce it to a host o With the entire control region, you see efficient activation of the reporter gene o If we take out chunks of that control region by targeted enzyme digest, we can test every one of those variants for its ability to activate transcription of that reporter gene o If you remove key elements required for efficient transcription of the reporter, this will show up in how strongly you can stain for β-galactosidase activity, for example  In the control region, there are at least 3 sub-regions in the DNA that are critical for the transcription of a downstream gene o These regions of the DNA may bind specific DNA-binding transcription factors  Some common “reporters”: o Green fluorescent protein (GFP) o β-galactosidase (lacZ) o Thymidine kinase (tk) o Luciferase (luc) o Chloramphenicol acetyltransferase (CAT)
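The logic of a linker scanning experiment can be sketched in Python: each mutant replaces one window of the control region with a linker of the same length, and windows whose replacement lowers reporter output are flagged as critical. Everything named here is an assumption for illustration. The linker sequence is arbitrary, and toy_reporter is a hypothetical stand-in for a measured reporter signal (for example β-galactosidase staining), not a model of any real promoter.

```python
def linker_scan_mutants(control_region, linker):
    """Replace successive, non-overlapping windows of the control region
    with a linker of the same length, one window per mutant."""
    window = len(linker)
    mutants = []
    for start in range(0, len(control_region) - window + 1, window):
        mutant = control_region[:start] + linker + control_region[start + window:]
        mutants.append((start, mutant))
    return mutants


def critical_windows(control_region, linker, reporter_activity, threshold=0.5):
    """Return window starts whose replacement drops reporter activity
    below threshold * wild-type activity."""
    wild_type = reporter_activity(control_region)
    return [
        start
        for start, mutant in linker_scan_mutants(control_region, linker)
        if reporter_activity(mutant) < threshold * wild_type
    ]


def toy_reporter(sequence):
    # Hypothetical readout: high only if an arbitrary motif is intact.
    return 1.0 if "GGGCGG" in sequence else 0.1


control = "A" * 10 + "GGGCGGTTTT" + "A" * 10
print(critical_windows(control, "CTCGAGCTCG", toy_reporter))  # [10]
```

Because every mutant keeps the original length, the spacing between the remaining elements is preserved, which is the main reason to substitute a linker rather than simply delete a chunk.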

Electrophoretic mobility shift assays (EMSA) – DNA binding activity

 Assesses the interaction of DNA-binding proteins with specific DNA elements  EMSA or gel/band/mobility shift assays to assay for DNA binding activity o Assess a complex mixture or a purified sample of protein and its ability to interact with a specific double-stranded DNA region that you’ve identified by linker scanning or some other means  A radiolabelled dsDNA segment is used as a probe – it corresponds to a region of DNA that you’re interested in that potentially could bind a transcription factor o In each fraction you add the radioactive probe and set up a condition so any DNA-binding protein can interact with that DNA and if it does, it will form a complex of the DNA probe and the DNA-binding protein o You take all of these reactions and run them on a polyacrylamide gel which acts as a molecular sieve so larger complexes will not run through that sieve as efficiently as the free DNA probe o When you run that gel and visualize where the radioactivity is, the free probe runs rapidly through the gel in most of the fractions, but in certain fractions, you see shifted complexes suggesting that there is some entity that interacts specifically with the probe and changes its mobility – these are most likely complexes of the DNA probe and a DNA-binding protein  Protein: DNA mobility is altered in the non-denaturing polyacrylamide gel  EMSA/Gel shifts cannot reveal the precise sequence that is bound by the protein

DNA binding activity can be visualized in vitro

 Can find actual binding sites – the physical position where the protein binds on the DNA strand  One way to do this is with a DNA footprint assay o You subject a DNA element to a gentle DNase treatment o One cut per molecule o Use a 5’ end-labelled oligonucleotide combined with a complementary oligonucleotide to make it double-stranded o If you subject it to DNase treatment, you will get random cuts throughout the probe o If you were to denature it and run it on a gel, you would see a small fragment if it cuts at the beginning and a larger fragment if it cuts a bit farther down, etc. o You will get a ladder which will contain the region where you think the protein is binding o Add the protein you think is interacting with a specific DNA region, e.g. TBP o If the protein is interacting with a certain sequence, DNase cannot cut there o Regions of DNA are protected by the DNA-binding protein against DNase digestion

o Can run sequencing gel beside and find exact sequence that is being protected
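The footprint logic can be sketched in Python under simplifying assumptions: exactly one cut per molecule, every unprotected position cut with equal likelihood, and a cut after nucleotide i of the 5’ end-labelled strand giving a labelled fragment of length i. The function names and the protected interval are hypothetical and chosen only for illustration.

```python
def footprint_ladder(probe_length, protected=None):
    """Lengths of labelled fragments seen on the gel.

    protected is an optional (first, last) pair of cut positions that a
    bound protein shields from DNase, inclusive.
    """
    lengths = []
    for cut in range(1, probe_length):
        if protected and protected[0] <= cut <= protected[1]:
            continue  # DNase cannot cut where the protein sits
        lengths.append(cut)
    return lengths


def footprint(probe_length, protected):
    """Bands present without protein but missing with protein."""
    free = set(footprint_ladder(probe_length))
    bound = set(footprint_ladder(probe_length, protected))
    return sorted(free - bound)


print(footprint_ladder(10))          # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(footprint_ladder(10, (4, 6)))  # [1, 2, 3, 7, 8, 9]
print(footprint(10, (4, 6)))         # [4, 5, 6]
```

The gap in the second ladder is the footprint; reading a sequencing ladder alongside it assigns those missing band lengths to actual nucleotides.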

Transcription factors are modular

 Most transcription factors have several domains that each perform distinct functions  Characteristics of DNA-binding domains will dictate which sequences in the DNA they’ll interact with  Just interacting with DNA is insufficient to turn on transcription – there are other functional regions within transcription activators that are required to activate transcription  Example: the GAL4 transcription factor from yeast o Zinc-finger transcription factor

o Contains a DNA binding domain to bind UASGAL in the proximal region and activates a downstream reporter gene o Contains an activation domain to stimulate transcription o The very N-terminus of GAL4 is required to interact with the UAS o The N-terminus possesses the DNA-binding function of GAL4 o On its own, the N-terminal region can interact with DNA but it never turns on transcription; the activation domain needs to be brought to the DNA to activate transcription  Just binding DNA is not enough to activate the transcription of a reporter gene  You require a DNA-binding domain coupled with a transcriptional activation domain

Modular structure is very common in eukaryotic transcription factors

 Transcription factors can possess domains for: o DNA binding . Modular in nature . Need something to bring them in proximity to DNA o Transcription activation o Transcription repression o Chromatin remodelling o Nuclear import o Protein interaction  Activation domain is important for turning on transcription downstream

The Drosophila Antennapedia mutant

 An extremely important family of transcription factors was identified through genetic mutants that had homeotic transformations.  Homeotic transformation: parts of your body forming in the wrong place o Mutation of the Drosophila gene Antennapedia leads to the formation of legs instead of antennae on a Drosophila head segment  This and many other genes that caused related transformations were cloned and found to encode transcription factors with a highly conserved DNA binding domain referred to as the homeodomain  These factors are highly conserved from flies to humans and in many lower metazoa

Homeodomain Proteins

 Genes affected by homeotic mutations encoded DNA-binding transcription factors o They fall into a class of transcription activators called homeodomain proteins o Bind to a very defined set of nucleotides that make up their DNA element o Bind to elements by using particular helices and activate transcription of downstream genes  The homeodomain was named due to its presence in several transcription factors that give rise to homeotic transformations when mutated at particular residues  The DNA binding elements are not very complex

Different types of zinc finger DNA binding domains exist

 C2H2 types usually contain three or more finger units and bind to DNA as monomers

 C4 types usually contain only two finger units and bind to DNA as homo- or heterodimers, e.g. steroid hormone receptors

 The C6 Zinc Finger transcription factor is a variation wherein six cysteine metal ligands coordinately bind two Zn2+ ions

Leucine zipper proteins

 Leucine zipper proteins bind DNA exclusively as homo- or heterodimers with their extended alpha-helices, which bind the DNA’s major groove  They contain a leucine or a different hydrophobic amino acid in every seventh position in the C-terminal region of the DNA binding domain – bZIP proteins  These hydrophobic residues form a domain, which is required for dimerization
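The "every seventh position" rule can be checked with a short Python sketch that looks for a run of hydrophobic residues spaced seven apart. The set of residues counted as hydrophobic, the minimum of four repeats and the test sequences are assumptions for this example; the sequences are synthetic and do not come from any real bZIP protein.

```python
# Assumption: one-letter codes treated as hydrophobic for this sketch.
HYDROPHOBIC = set("LIVMFWA")


def heptad_run_length(seq, start):
    """Count consecutive hydrophobic residues at start, start+7, start+14, ..."""
    count = 0
    i = start
    while i < len(seq) and seq[i] in HYDROPHOBIC:
        count += 1
        i += 7
    return count


def find_zipper(seq, min_repeats=4):
    """Return the 0-based start of the first heptad run with at least
    min_repeats hydrophobic positions, or None if there is none."""
    seq = seq.upper()
    for start in range(len(seq)):
        if heptad_run_length(seq, start) >= min_repeats:
            return start
    return None


print(find_zipper("GG" + "LKKEEQR" * 4))  # 2
print(find_zipper("KKEEQRS" * 4))         # None
```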

Helix-Loop-Helix Proteins

 HLH proteins are very similar to leucine zipper proteins  Instead of an extended alpha-helix they are characterized by two alpha-helices, which are connected by a short loop  HLH proteins contain hydrophobic amino acids spaced at intervals characteristic of an amphipathic alpha-helix in the C-terminal region of the DNA binding domain

Combinatorial possibilities greatly extend the potential for diversified gene regulation

 The combination of transcription factor binding sites in promoters leads to a diversity of transcriptional responses  Homo- and heterodimer formation is common among transcription factors  Three transcription factors that can homodimerize or heterodimerize give 6 different possible combinations
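The count of six can be verified by enumerating unordered pairs with repetition, which in general gives n(n + 1)/2 dimers for n factors. A minimal Python sketch follows; the factor names A, B and C are placeholders, and it assumes every pair is able to dimerize.

```python
from itertools import combinations_with_replacement


def dimer_combinations(factors):
    """All homo- and heterodimers, ignoring the order within a pair."""
    return list(combinations_with_replacement(factors, 2))


dimers = dimer_combinations(["A", "B", "C"])
print(len(dimers))  # 6
print(dimers)
# [('A', 'A'), ('A', 'B'), ('A', 'C'), ('B', 'B'), ('B', 'C'), ('C', 'C')]
```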

Transcription factors of unrelated classes can also bind cooperatively

 Two completely different classes of transcription factors can help each other out to turn on transcription of a downstream gene  NFAT and AP1 interact at the IL2 promoter – neither binds strongly to the DNA alone, but if they work together, they can stabilize the formation of a ternary complex  They act very strongly to enhance transcription of the IL2 gene

An example: specification of floral organs in Arabidopsis

 Arabidopsis flowers consist of concentric whorls of organs: four sepals in the outer whorl surround four petals in the next whorl, followed by six stamens and two carpels  These organs are specified by the combinatorial action of MADS box class DNA binding transcription factors: Agamous, Apetala 1, Apetala 2, Pistillata and Sepallata 3