Learning Objectives:

• Understand the basic differences between genomic and cDNA libraries
• Understand how genomic libraries are constructed
• Understand the purpose for having overlapping DNA fragments in genomic libraries and how they are generated
• Understand how cDNA libraries are constructed and the use of reverse transcriptase for their construction
• Understand the rationale for library screening
• Understand the method of plaque hybridization
• Understand the four methods for library screening and when they are put into use

Molecular cloning in bacterial cells…

This strategy can be applied to genomic DNA as well as cDNA

Library construction
• two types of libraries
• a genomic library contains fragments of genomic DNA (genes)
• a cDNA library contains DNA copies of cellular mRNAs
• both types are usually cloned in vectors

Construction of a genomic library

vector DNA (bacteriophage lambda)
• lambda has a linear double-stranded DNA genome
• the left and right arms are essential for the phage replication cycle
• the internal fragment is dispensable for phage growth

[Figure: lambda genome showing the “left arm”, the internal fragment flanked by BamHI sites, and the “right arm”]

Vector: cut lambda DNA with BamHI (6-base cutter) and remove the internal fragment, keeping the “left arm” and “right arm”
BamHI sites:
NNG GATCCNN
NNCCTAG GNN

Insert: cut human genomic DNA (isolated from many cells) with Sau3A (4-base cutter), which has ends compatible with BamHI, and isolate ~20 kb fragments
Sau3A sites:
NNN GATCNNN
NNNCTAG NNN

combine and treat with DNA ligase

package into bacteriophage and infect E. coli

• genomic library of human DNA fragments in which each phage contains a different human DNA sequence

Partial restriction enzyme digestion allows cloning of overlapping fragments
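To make the idea of a partial digest concrete, here is a minimal Python sketch. It assumes a random stand-in "genome", treats every GATC (the Sau3A site) as a possible cut position, and cuts each site independently with a fixed probability. The function names and the 30% cutting probability are illustrative assumptions, not part of the original notes.

```python
import random

def find_sites(seq, site="GATC"):
    """Return the 0-based start position of every occurrence of the site."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions

def partial_digest(seq, cut_probability, rng, site="GATC"):
    """Cut each site independently with the given probability.

    Sau3A cuts just before the G of GATC, so the cut coordinate is the
    site's start position. Returns (start, end) fragment coordinates.
    """
    cuts = [p for p in find_sites(seq, site) if rng.random() < cut_probability]
    boundaries = [0] + cuts + [len(seq)]
    return [(a, b) for a, b in zip(boundaries, boundaries[1:]) if b > a]

# Digest three copies of the same (random, hypothetical) sequence.
# Each copy is cut at a different subset of sites, so fragments from
# different copies overlap one another.
rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(5000))
for copy in range(3):
    print(copy, partial_digest(genome, 0.3, rng)[:4])
```

Because each DNA copy is cut at a different subset of sites, the fragment boundaries differ between copies, which is what produces overlapping clones.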

[Figure: overlapping fragments forming a “contig”]

• isolation of ~20 kb fragments provides optimally sized DNAs for cloning in bacteriophage
• partial digestion with a frequent cutter (4-base cutter) allows production of overlapping fragments, since not every site is cut
• overlapping fragments ensure that all sequences in the genome are cloned
• overlapping fragments allow larger physical maps to be constructed, as contiguous chromosomal regions (contigs) are put together from the sequence data
• number of clones needed to fully represent the human genome (3 × 10^9 bp), assuming ~20 kb fragments
– theoretical minimum = ~150,000
– 99% probability that any given sequence is represented = ~800,000

All possible sites:

Results of a partial digestion: [figure legend: uncut sites and cut sites]

Genomic Library making…

The partial digest is one of the most important steps. Why? Because it produces overlapping DNA fragments.

The production of a cDNA library

Construction of a cDNA library

• reverse transcriptase makes a DNA copy of an RNA

The life cycle of a retrovirus depends on reverse transcriptase

1. the retrovirus enters the cell and loses its envelope
2. the capsid is uncoated, releasing genomic RNA and reverse transcriptase
3. reverse transcriptase makes a DNA copy of the RNA
4. it then copies the DNA strand to make double-stranded DNA, removing the RNA with RNase H
5. the DNA is then integrated into the host cell genome, where it is transcribed by host RNA polymerase II
6. the RNA is translated into viral proteins and assembled into new virus particles

• cDNA library construction

5’ AAAAA 3’ mRNA (all mRNAs in cell)

anneal oligo(dT) primers of 12-18 bases in length

5’ AAAAA 3’ (mRNA)
3’ TTTTT 5’ (oligo(dT) primer)

add reverse transcriptase and dNTPs

5’ AAAAA 3’ (mRNA)
3’ TTTTT 5’ (cDNA)

add RNaseH (specific for the RNA strand of an RNA-DNA hybrid) and carry out a partial digestion

short RNA fragments serve as primers for second strand synthesis using DNA polymerase I

DNA polymerase I removes the remaining RNA with its 5’ to 3’ exonuclease activity and continues synthesis

DNA ligase seals the remaining nicks

5’ AAAAA 3’
3’ TTTTT 5’ (double-stranded cDNA)

EcoRI linkers are ligated to both ends using DNA ligase

5’ AATTCNNNNNNNN...AAAAANNNNNNNNG 3’
3’     GNNNNNNNN...TTTTTNNNNNNNNCTTAA 5’
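The relationship between the mRNA and the two cDNA strands can be sketched in a few lines of Python. This is only an illustration of base pairing, not of the enzymology; the function names and the example message are hypothetical.

```python
# DNA base that pairs with each RNA base.
COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}

def first_strand_cdna(mrna):
    """Strand made by reverse transcriptase, written 5' to 3'.

    It is the reverse complement of the mRNA, so it begins with the
    oligo(dT) primer that annealed to the poly(A) tail.
    """
    return "".join(COMPLEMENT[base] for base in reversed(mrna))

def second_strand_cdna(mrna):
    """Strand made by DNA polymerase I: the mRNA sequence with T in place of U."""
    return mrna.replace("U", "T")

mrna = "AUGGCCUACAAAAA"  # hypothetical message with a short poly(A) tail
print(first_strand_cdna(mrna))
print(second_strand_cdna(mrna))
```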

• double-stranded cDNA copies of mRNA with EcoRI cohesive ends are now ready to ligate into a bacteriophage lambda vector cut with EcoRI

[Figure: cDNAs and the lambda “left arm” and “right arm”, with EcoRI sites marked]

combine cDNAs with lambda arms and treat with DNA ligase

package into bacteriophage and infect E. coli

• cDNA library in which each phage contains a different human cDNA

DNA libraries

• Genomic DNA libraries contain introns, exons, promoters, etc.
– Usually made with 4-base cutters that cut frequently (every 275 bases or so; the expectation for random sequence is 4^4 = 256).
– The production of overlapping sequences is due to partial digestion.
– Library complexity is important to make sure that the sequence you are looking for is found in the DNA that has been sampled.
– N = ln(1 − P) / ln(1 − f), where N = number of clones, P = probability that the DNA fragment is found in your library, and f = the frequency of the DNA in your library.

Genomic DNA complexity

• To screen for a clone in a library, you usually want a 99% probability that your clone is found there.
• Frequency is the size of the DNA fragment in the library divided by the size of the haploid genome. For a lambda library, 17 kb (1.7 × 10^4 bp) is the average insert size. The size of the genome is 3 × 10^9 bp.

f = (1.7 × 10^4) / (3 × 10^9) = 0.567 × 10^-5 (taken as 0.56 × 10^-5 below)
N = ln(1 − 0.99) / ln(1 − [1.7 × 10^4 / 3 × 10^9])
N = ln(0.01) / ln(1 − 0.56 × 10^-5)
N = −4.60517 / −0.0000056
N ≈ 822,351 clones (about 813,000 if f is not rounded)

Genome equivalents
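The clone-number formula and the genome-equivalent calculation can be checked with a few lines of Python. This is a minimal sketch; the function names are illustrative and not part of the original notes. Because f is not rounded here, N comes out near 813,000 rather than the 822,351 quoted in the worked example.

```python
import math

def clones_needed(probability, insert_bp, genome_bp):
    """N = ln(1 - P) / ln(1 - f), with f = insert size / genome size."""
    f = insert_bp / genome_bp
    return math.log(1 - probability) / math.log(1 - f)

def genome_equivalents(n_clones, insert_bp, genome_bp):
    """Total cloned DNA expressed as multiples of the haploid genome."""
    return n_clones * insert_bp / genome_bp

# Values from the worked example: P = 0.99, 17 kb inserts, 3 x 10^9 bp genome.
n = clones_needed(0.99, 1.7e4, 3e9)
print(round(n))
print(round(genome_equivalents(n, 1.7e4, 3e9), 2))

# Theoretical minimum for ~20 kb fragments: genome size / fragment size.
print(round(3e9 / 2e4))
```

For small f, N × f is approximately −ln(1 − P), so a 99% library always holds roughly 4.6 genome equivalents regardless of the insert size.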

• How many genome equivalents are there in this library? How do you calculate this?

822,351 clones × 1.7 × 10^4 bp = 1.40 × 10^10 bp
Divide by the genome size: 1.40 × 10^10 bp / 3.0 × 10^9 bp ≈ 4.67 genome equivalents

How many positives will you get if you screen for a single copy gene?

Insertional mutagenesis

• All vectors in current use include a system that can either identify or select for vectors containing inserts. This is the backbone of recombinant DNA technology.
• Early vectors relied on cloning into an antibiotic resistance gene, so that bacteria carrying a vector with an inserted DNA fragment became sensitive to that antibiotic. This is not the best situation. Why?

Insertional mutagenesis II

• Using the beta-galactosidase gene as the insertional mutagenesis target allows all clones to be screened for inserts with a simple blue-white color assay. The enzyme encoded by this gene cleaves X-gal (a chromogen) to give a blue dye that colors the bacterial colony or phage plaque. Bacteria or phage particles that carry DNA disrupting the target gene can therefore be picked out because they stay white.

Insertional mutagenesis III

• In addition, suppressor tRNA genes can be used to identify YACs that contain an insert. The suppressor tRNA suppresses the effects of an Ade2 ochre mutation, which gives a white yeast colony. When the tRNA gene is disrupted, the colonies are pink due to the accumulation of an adenine precursor. Pink colonies are what is desired. See Figure 4.16.

Clones are usually characterized first by restriction digestion. This DNA fragment was digested with various enzymes, giving rise to fragments of specific sizes. These can be used to generate a restriction map.

Vectors for library construction

• Plasmid vectors
– Small circles of DNA that contain a selection marker such as antibiotic resistance.
– Insertional mutagenesis target with a multiple cloning site.
– A variety of DNA replicons (bacterial, yeast).
• Maximum size of insert is about 10 kb.

Lambda and Cosmid vectors

• Bacteriophage lambda can be used as a cloning vector. It has a genome of about 50 kb of linear DNA. Its life cycle is conducive to use as a cloning vector: the lytic cycle can be supported by only a portion of the genes found in the lambda genome.

Lambda life cycle. The lytic life cycle produces phage particles immediately. The lysogenic life cycle requires genes in the middle of the genome, which can be replaced.

Lambda insertion and replacement vectors

• Only DNA molecules of 37 to 52 kb can be packaged into the lambda head. This can be done in vitro. Because the middle portion of the lambda genome can be replaced when only the lytic life cycle is needed, up to 23 kb of DNA can be inserted into the lambda genome. These replacement vectors are used for genomic DNA libraries.
• Insertion vectors can hold up to 7 kb of cDNA.

[Figures: Lambda genome; In vitro packaging of ligated lambda DNA]

Cosmid vectors

• A cosmid is a hybrid between a lambda vector and a plasmid. The COS sites are the only sequences necessary for lambda DNA packaging. Therefore, if COS sites are ligated about 50 kb apart, the ligation products can be packaged in vitro. Cosmid vectors can therefore carry inserts of 33 to 45 kb.

[Figure: Cosmid vector ligation]

Making a Genomic Library with Cosmids

• Partial digest

[Figure: cosmid vector map with labels TetR, 21.5 kb, cos, and EcoRI]

• Ligation into EcoRI site

Final Steps of a Genomic Library

• Package into heads and plug with tails
• Transduce E. coli recipient cell
• Select white colonies with tetR
• Check for plasmid
• Screen in your mutant for phenotype restoration

Things You Should Remember

• Some plasmids are used as vectors or cloning vehicles, but they are limited in the amount of DNA that can be cloned.
• A cosmid is a plasmid that has at least one COS (cohesive end site).
• COS comes from a bacteriophage.
• A genomic library contains at least one copy of every gene in an organism.

Cloning large DNA fragments

• Because the human genome is large, many genes are very large, and some DNA fragments cannot be replicated in lambda, other vector systems needed to be developed.
• Bacterial artificial chromosome (BAC) vectors
– These vectors are based on the E. coli F factor. They are maintained at 1-2 copies per cell and can hold > 300 kb of insert DNA.
– The main problem is low DNA yield from host cells, due to the low copy number compared with about 300 copies per cell for a plasmid vector like pUC19.

Cloning large DNA fragments II

• Bacteriophage P1
– These vectors are like lambda and can hold 110 to 115 kb of DNA. This DNA can then be packaged by the P1 phage protein coat.
– The use of T4 in vitro packaging systems can enable the recovery of 122 kb inserts.
– See Figure 4.15, Bacteriophage P1 vector system.

Cloning large DNA fragments III

• Yeast Artificial Chromosomes (YACs)
– Many DNA fragments cannot be propagated in bacterial cells. Therefore yeast artificial chromosomes can be built with a few specific components:
1. Centromere
2. Telomere
3. Autonomously replicating sequence (ARS)
• Genomic DNA is ligated between two telomeres and the ligation products are transformed into yeast cells using the spheroplast method.

[Figure: YAC cloning system]

Cloning systems
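As a rough summary of the insert sizes quoted in these notes, the sketch below picks the smallest vector type whose stated maximum accommodates a given insert. It is illustrative only: the names are hypothetical, minimum insert sizes (for example the 33 kb lower bound for cosmids) are ignored, and YACs are left out because the notes give no size for them.

```python
# Approximate maximum insert sizes in kb, taken from the figures quoted in these notes.
VECTOR_MAX_INSERT_KB = [
    ("lambda insertion vector", 7),
    ("plasmid", 10),
    ("lambda replacement vector", 23),
    ("cosmid", 45),
    ("bacteriophage P1", 115),
    ("BAC", 300),  # the notes say "> 300 kb"; 300 is used as a conservative figure
]

def smallest_suitable_vector(insert_kb):
    """Return the first vector type whose quoted maximum fits the insert, else None."""
    for name, max_kb in VECTOR_MAX_INSERT_KB:
        if insert_kb <= max_kb:
            return name
    return None

for size in (5, 20, 40, 100, 250):
    print(size, "kb ->", smallest_suitable_vector(size))
```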

[Table: Vector systems that can be used to clone DNA]

Plaque hybridization

• This is a general technique required for a number of specific approaches for isolating cDNA or genomic clones
• Generally, one starts by
1). isolating a cDNA sequence from a cDNA library, then
2). isolating the gene from a genomic library using the cDNA as a probe
• Information gained from cDNA and genomic clones
1). cDNA clones provide the amino acid sequence of the full-length protein, unencumbered by intron sequences
2). Genomic clones provide the control regions and are required for searching for mutations

Library screening: four experimental approaches

• Starting with a protein
1). Synthetic oligonucleotide - plaque hybridization
2). Antibody - variation of plaque hybridization
• Starting with mRNA
1). Differential cDNA library screening - yet another variation
2). Expression screening - does not utilize plaque hybridization

Library screening

• plaque hybridization
• plate phage library on lawn of E. coli (bacteria >>> phage)
• plaques form as a consequence of a spreading lytic infection starting with a single phage-infected bacterial cell
• each phage plaque is a clone of identical recombinant phage
• prepare replica of phage plaques and hybridize DNA with probe

An E. coli lawn is grown on an agar plate and then overlaid with the recombinant phage library.

Wherever a single bacteriophage particle infects a bacterial cell, a plaque will form. This is a clear area caused by the lysis of bacteria on the lawn of E. coli.

A replica of the agar plate is made on a nitrocellulose sheet; the DNA is denatured and adheres to the nitrocellulose. The nitrocellulose is hybridized with a labeled DNA probe (such as an oligonucleotide) and then exposed to X-ray film. A spot on the X-ray film indicates a plaque containing the DNA of interest.

[Figure: labeled probe in solution, e.g. a 32P-labeled “oligo” probe]

Hybridization of probe to immobilized DNA

Probe hybridized to immobilized DNA forming double-stranded region

How does one isolate a gene for an inherited disorder?

• Start with a candidate protein
• If a protein candidate has been identified for a genetic disease, it can be used to make a probe to screen for the gene

1. oligonucleotide probe
• purify the protein of interest
• partially sequence the protein
• find a region having amino acids with the fewest possible codons
• predict a DNA sequence that could represent a gene region encoding a portion of the protein
• synthesize a set of degenerate oligonucleotides for that region
• hybridize the labeled oligonucleotide to the phage library

amino acids:  MET GLU PHE TYR ILE CYS GLN LYS
codons:       AUG GAA UUU UAU AUU UGU CAA AAA
                    G   C   C   C   C   G   G
                                A
all possible oligonucleotides: 1 × 2 × 2 × 2 × 3 × 2 × 2 × 2 = 192-fold degenerate

2. antibody probe
• purify the protein of interest
• make an antibody to that protein
• construct cDNA library to express recombinant proteins in E. coli
• use the antibody to detect the protein made from the cloned cDNA carried by the recombinant phage in E. coli, using the plaque hybridization method modified for antibody probes
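The 192-fold degeneracy of the oligonucleotide probe can be reproduced by multiplying the number of codons for each residue. The sketch below uses only the codons listed for the example peptide; the dictionary and function names are illustrative.

```python
from itertools import product

# Codon choices for each residue of the example peptide, as listed in the notes.
CODONS = {
    "Met": ["AUG"],
    "Glu": ["GAA", "GAG"],
    "Phe": ["UUU", "UUC"],
    "Tyr": ["UAU", "UAC"],
    "Ile": ["AUU", "AUC", "AUA"],
    "Cys": ["UGU", "UGC"],
    "Gln": ["CAA", "CAG"],
    "Lys": ["AAA", "AAG"],
}

def degeneracy(peptide):
    """Number of distinct coding sequences: the product of the codon counts."""
    count = 1
    for residue in peptide:
        count *= len(CODONS[residue])
    return count

def all_coding_sequences(peptide):
    """Enumerate every RNA sequence that could encode the peptide."""
    return ["".join(codons) for codons in product(*(CODONS[r] for r in peptide))]

peptide = ["Met", "Glu", "Phe", "Tyr", "Ile", "Cys", "Gln", "Lys"]
print(degeneracy(peptide))  # 192
print(all_coding_sequences(peptide)[0])
```

Choosing a region rich in Met (one codon) and poor in residues with many codons keeps this product, and therefore the size of the oligonucleotide mixture, small.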

[Figure: lambda expression vector with a bacterial promoter and Shine-Dalgarno sequence upstream of the human cDNA insert, between the left arm and right arm]

How does one isolate a gene for an inherited disorder?

• Start with a candidate mRNA
• mRNA candidates can be identified by comparing mRNA populations between normal and abnormal tissues, or by looking for a specific function encoded by the mRNA

1. differential cDNA library screening
• prepare duplicate plaque replica plates
• hybridize one with a labeled cDNA probe made to all the mRNAs in the normal cell and hybridize the other (duplicate) with the corresponding probe to the abnormal cell
• differences in cell function should be reflected by differences in the mRNA populations
• any plaques showing differential hybridization are candidates

[Figure: duplicate replicas hybridized with cDNA probe from normal cells and with cDNA probe from abnormal cells; a differentially expressed clone shows no hybridization to its plaque on one replica]

2. expression screening
• develop a cell-based functional assay for the abnormality (e.g., a transport assay)
• construct cDNA library in a way that will allow expression of protein in mammalian cells
• inject groups of cDNA clones into cells and assay function
• narrow down cDNA clones using smaller groups of clones until the function is observed with a single cDNA species

• inject groups of cDNA clones
• if the function being assayed is observed, divide the group of clones into smaller groups and retest
• continue process of testing smaller groups until the function being assayed is obtained with one clone
• for example, inject clones and test cells for transport activity

[Figure: lambda expression vector with a mammalian promoter upstream of the human cDNA insert, between the left arm and right arm; pools of 10^4 and 10^2 clones scored + or − for function]

Expression Cloning
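The pool-and-subdivide procedure described for expression screening can be sketched as a simple search. This is a hedged illustration, assuming exactly one active clone and an assay that detects it in any pool that contains it; `find_active_clone` and `has_activity` are hypothetical names, with `has_activity` standing in for the functional assay (for example a transport assay).

```python
def find_active_clone(clones, has_activity, n_groups=10):
    """Narrow a pool down to one clone by repeatedly testing smaller groups.

    has_activity(group) stands in for injecting a group of clones and
    assaying the cells for function. Returns the active clone, or None
    if no group scores positive.
    """
    pool = list(clones)
    while len(pool) > 1:
        size = -(-len(pool) // n_groups)  # ceiling division
        groups = [pool[i:i + size] for i in range(0, len(pool), size)]
        positive = next((g for g in groups if has_activity(g)), None)
        if positive is None:
            return None
        pool = positive  # retest using only the positive group
    return pool[0] if pool and has_activity(pool) else None

# Hypothetical library of 10^4 clones in which clone 4242 encodes the function.
library = range(10_000)
print(find_active_clone(library, lambda group: 4242 in group))
```

With ten groups per round, a pool of 10^4 clones shrinks to 10^3, 10^2, 10 and finally a single clone.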

• Certain vector systems can be used to produce specific products.
• The type of expression product
– RNA… riboprobes
– Protein product…
• The type of environment
– In vitro… cell free
– In vivo… mammalian or prokaryotic cells
• Purpose of the expression system
– To produce large quantities of proteins for protein studies or antibody production.

cDNA expression libraries

• The gene for a specific protein can be cloned from an expression cDNA library if an antibody to the protein is available. A variety of vectors can be used to produce fusion proteins, which can be detected with the antibody (Ab) in question. See Figure 4.18, Expression for Ab detection.

Expression in Eukaryotic cells

• Many proteins need specific modifications to work properly… expression in bacterial cells is not sufficient.
• Plasmid-based eukaryotic expression systems that work after transient transfection into mammalian cell lines have been produced.
• Virus-based systems are also popular.