CMB 621 Fogelgren Lecture #1

From DNA to RNA: RNA and mRNA processing

Ben Fogelgren, Ph.D.
Dept. of Anatomy, Biochemistry, and Physiology
John A. Burns School of Medicine, University of Hawaii
phone: 692-1420

(Thanks to Marla Berry and Dulal Borthakur for many slides!)

My Objectives (2 lectures):

• 9/7/17: Chapter 6 of Alberts, Part 1: DNA to RNA
  • RNA Transcription
  • RNA Processing
  • A few relevant CMB techniques
• 9/12/17: Chapter 7, Control of Gene Expression + Article Discussion

** Obviously I can’t compress everything in these chapters into 2 lectures! So, I want you to read these sections, but the lectures will highlight what I think is most important.

An overview of the flow of information through the cell

• An mRNA is made as a complementary copy of one of the two DNA strands (the template strand) of a gene.
• The use of mRNA allows the cell to separate information storage from information utilization.
• DNA remains stored in the nucleus, while mRNA can be exported to the cytoplasm for translation.
• RNA also allows the cell to greatly amplify its output, because many RNA copies can be made from one gene.
• mRNA is translated into proteins by ribosomes, which contain both proteins and RNA.
(Figure 6-3, Mol Biol of the Cell, © Garland Science 2008)

Gene Transcription
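A minimal Python sketch of the base-pairing logic behind transcription, for illustration only. It assumes sequences are plain strings of A/C/G/T written 5’→3’ and ignores promoters, start sites, and termination; the function names `transcribe` and `coding_strand_to_rna` are hypothetical. Because RNA polymerase reads the template strand 3’→5’ and builds RNA 5’→3’, the RNA is the reverse complement of the template strand, which equals the coding (non-template) strand with U in place of T.

```python
# Each template-strand base specifies its complementary ribonucleotide;
# an A on the template specifies U (not T) in the RNA.
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_5_to_3: str) -> str:
    """Return the RNA (5'->3') made from a DNA template strand given 5'->3'.

    The polymerase reads the template 3'->5', so we complement each base
    and reverse the order. Hypothetical helper; no promoter/terminator logic.
    """
    template = template_5_to_3.upper()
    return "".join(DNA_TO_RNA_PAIR[base] for base in reversed(template))


def coding_strand_to_rna(coding_5_to_3: str) -> str:
    """The RNA has the same sequence as the coding strand, with U replacing T."""
    return coding_5_to_3.upper().replace("T", "U")
```

For example, `transcribe("CTTAAAGGCCAT")` returns `"AUGGCCUUUAAG"`, the same as `coding_strand_to_rna("ATGGCCTTTAAG")`.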

• Copying a portion of the DNA code into RNA.
• The enzymes responsible for transcription are called RNA polymerases.
• The site on which an RNA polymerase binds prior to initiating transcription is called the promoter.
• Cellular RNA polymerases require the help of additional proteins, called transcription factors, to recognize promoters.

2 differences between DNA and RNA molecules:

(A) RNA is built from ribonucleotides (ribose sugar instead of deoxyribose).
(B) RNA contains uridine (U) instead of thymidine (T).

Classes of RNA
• mRNAs, rRNAs, tRNAs
• snRNAs - small nuclear RNAs
• snoRNAs - small nucleolar RNAs
• tmRNA - transfer/messenger RNA (bacteria)
• lincRNAs - long intergenic non-coding RNAs
• miRNAs - microRNAs
• siRNAs - small interfering RNAs
• shRNAs - short hairpin RNAs
(miRNAs, siRNAs, and shRNAs: Lecture #2)
RNAs can form 3D structures.

• The RNAs of a ribosome are called rRNAs.
• tRNAs constitute a third major class of RNA that is required during protein synthesis.
• Both rRNAs and tRNAs owe their activity to their complex secondary and tertiary structures.
• Many RNAs fold into complex 3-dimensional shapes.
• RNA folding is driven by the formation of regions having complementary base pairs.

RNA can fold into specific structures

[Figure: Watson-Crick base pair interactions; non-conventional base pair interactions; structure of an actual RNA]

(Fig. 6.6. Mol Biol of the Cell)

Some examples of non-Watson-Crick or Hoogsteen base pairing found in tRNA

Other RNAs in eukaryotic cells

• Small nuclear RNAs (snRNAs): Located in the nucleus; involved in RNA splicing, maintaining telomeres etc.

• Small nucleolar RNAs (snoRNAs): small RNA molecules that guide chemical modifications of other RNAs, including rRNAs, tRNAs, and snRNAs.

• Small interfering RNAs (siRNAs): a class of double-stranded RNAs, 20-25 nt in length. siRNAs are involved in the RNAi pathway, where they interfere with the expression of a specific gene.

• MicroRNAs (miRNAs): single-stranded RNA molecules, 21-23 nt in length. Their main function is to down-regulate gene expression. Our genome encodes hundreds of these tiny microRNAs.

Chain elongation during transcription

The RNA polymerase covers approximately 35 bp of DNA. The transcription bubble contains about 15 bp of single-stranded DNA and a DNA-RNA hybrid of about 9 bp. The enzyme generates overwound (positively supercoiled) DNA ahead of itself and underwound (negatively supercoiled) DNA behind itself.

RNA polymerase in the act of transcription elongation

The DNA makes a sharp turn in the region of the active site.

Transcription Cycle in Bacteria

[Figure: the core enzyme (subunits α, α, β, β’, ω) plus the σ factor forms the holoenzyme, which binds the promoter]

Transcription in bacteria: after ~10 nt of RNA synthesis, the core enzyme breaks its interactions with the promoter and with the σ factor.

Promoter and terminator sequences are heterogeneous. (Fig. 6.11. Mol Biol of the Cell)

Transcription and RNA processing in eukaryotic cells

• Eukaryotic cells have three distinct RNA polymerases. • In eukaryotes, a large variety of accessory proteins or transcription factors are also required for transcription. • All three major types of RNAs - mRNAs, rRNAs, and tRNAs - are derived from precursor RNA molecules or primary transcripts. • Primary transcripts have a fleeting existence.

Major differences in the transcription machinery between prokaryotes and eukaryotes

• While bacteria have only one type of RNA polymerase, eukaryotes have three RNA polymerases.
• While bacterial RNA polymerase requires only a single σ factor for transcription initiation, eukaryotic RNA polymerases require many additional proteins.
• Eukaryotic initiation must deal with the packing of DNA into nucleosomes and higher-order chromatin structures, features that are absent in bacteria.

The machinery for mRNA transcription

• All eukaryotic mRNA precursors are synthesized by RNA polymerase II.
• Initiation of transcription by RNA polymerase II occurs in cooperation with a number of general transcription factors (GTFs).
• GTFs are composed of a dozen different subunits, designated as “TFII…”.
• GTFs carry out functions similar to the σ factor in bacteria.
• TFIIF has the same three-dimensional structure as the equivalent portions of the σ factor.

• In the vast majority of genes studied, a critical portion of the promoter lies between 24 and 32 bases upstream from the start site of transcription.
• This region often contains the sequence 5’-TATAAA-3’, known as the TATA box.
• The TATA box of the DNA is the site of assembly of a preinitiation complex that contains the GTFs and the polymerase; this complex must assemble before gene transcription can be initiated.

The TATA box
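A minimal Python sketch of how the position rule above can be checked on a sequence, for illustration only. It assumes a coding-strand DNA string with a known index for the start site (+1), matches only the exact TATAAA consensus, and uses the 24-32 base range from this slide as its search window; `find_tata_box` and its parameters are hypothetical.

```python
TATA_CONSENSUS = "TATAAA"


def find_tata_box(dna: str, tss_index: int,
                  min_upstream: int = 24, max_upstream: int = 32) -> list[int]:
    """Return how far upstream of +1 each exact TATAAA match begins.

    dna          : coding-strand DNA, 5'->3'
    tss_index    : 0-based index of the transcription start site (+1) in `dna`
    min/max      : search window, taken from the 24-32 base range on the slide
    Distance is counted from the first T of the motif to the +1 base.
    Only the exact consensus is matched; real TATA boxes often deviate from it.
    """
    dna = dna.upper()
    hits = []
    for upstream in range(min_upstream, max_upstream + 1):
        start = tss_index - upstream
        if start >= 0 and dna[start:start + len(TATA_CONSENSUS)] == TATA_CONSENSUS:
            hits.append(upstream)
    return hits
```

For a made-up promoter `"G" * 10 + "TATAAA" + "C" * 19 + "ATGCCC"` with +1 at index 35, `find_tata_box(dna, 35)` returns `[25]`.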

[Figure: example promoters with the TATA box 24-26 bp upstream of the start site of transcription]

The TATA box is the site of assembly of the preinitiation complex that contains GTFs and the polymerase.

Initiation of transcription from a eukaryotic polymerase II promoter

• The first step in assembly of the preinitiation complex is binding of a protein, called TATA-binding protein (TBP), that recognizes the TATA box of eukaryotic promoters.
• TBP is present as a subunit of a much larger protein complex called TFIID (transcription factor for polymerase II, fraction D).
• TFIID also includes a number of other proteins.
• TFIID causes a large distortion in the DNA of the TATA box.

Initiation of transcription from a eukaryotic promoter

[Figure: GTFs assembling upstream of the start site of transcription]
• TFIID includes the TBP subunit, which binds to the TATA box.
• TFIIB is thought to provide a binding site for RNA polymerase.

• TFIIA binds directly to TBP and stabilizes its binding to DNA.
• TFIIF contains a subunit homologous to bacterial σ factor.
• TFIIH contains 9 subunits, 3 of which possess enzymatic activities. TFIIH’s activities include DNA-dependent ATPase, helicase, and C-terminal domain (CTD) kinase.
• TFIIE is an α/β heterodimer, and it modulates the helicase and kinase activities of TFIIH.

(Fig. 6.16. Mol Biol of the Cell)

[Figure: helicase activity and phosphorylation; addition of UTP, ATP, CTP, GTP; disassembly of most GTFs]

RNA Transcription (Fig. 6.16. Mol Biol of the Cell)

The core promoter elements
[Figure: core promoter elements relative to the transcription start site (TSS): BRE (-35), TATA (-30), Inr (overlapping the TSS), DPE (+30)]
• The B recognition element (BRE) is found immediately upstream of the TATA box and consists of 7 nts: G/C G/C G/A C G C C. TFIIB recognizes the BRE and binds to it.
• Some genes do not have a TATA box and use an initiator element (Inr) for transcription initiation. The Inr element overlaps the TSS.
• The downstream promoter element (DPE) is within the transcribed portion of a gene.

The Inr enhances the strength of a promoter that contains a TATA box.
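Because the BRE consensus above is degenerate (G/C G/C G/A C G C C), it can be written as a pattern with character classes. A minimal sketch using Python's standard `re` module, scanning one strand only and checking whether a match sits immediately upstream of an exact TATAAA; the helper names are hypothetical and the adjacency check is a simplification of "immediately upstream".

```python
import re

# BRE consensus from the slide, G/C G/C G/A C G C C, as a regular expression.
# The lookahead (?=...) lets overlapping matches be reported.
BRE_REGEX = re.compile(r"(?=[GC][GC][GA]CGCC)")
TATA_CONSENSUS = "TATAAA"


def find_bre(dna: str) -> list[int]:
    """0-based start positions of BRE consensus matches on the given strand only."""
    return [m.start() for m in BRE_REGEX.finditer(dna.upper())]


def bre_next_to_tata(dna: str) -> bool:
    """True if a 7-nt BRE match is followed immediately by an exact TATAAA."""
    dna = dna.upper()
    return any(dna[pos + 7:pos + 7 + len(TATA_CONSENSUS)] == TATA_CONSENSUS
               for pos in find_bre(dna))
```

For the made-up sequence `"AAAGGGCGCCTATAAAAGG"`, `find_bre` returns `[3]` and `bre_next_to_tata` returns `True`.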

(Fig. 6.17. Mol Biol of the Cell)

TBP bends the DNA molecule approximately 80° and allows TFIIB to bind to the DNA both upstream and downstream of the TATA box.

[Figure: TATA-binding protein (TBP, a subunit of TFIID) bound at the TATA box together with TFIIA and TFIIB] (Fig. 6-18, Molecular Biology of the Cell)

Transcription
• Together, RNA polymerase and GTFs are sufficient to promote a low, basal level of transcription from most promoters under in vitro conditions.
• Once transcription begins, certain GTFs, including TFIID, may be left behind at the promoter, while others are released from the complex.
• As long as TFIID remains bound to the promoter, additional RNA polymerase molecules may be able to attach and initiate additional rounds of transcription.
Multiple rounds of transcription of a gene can occur at the same time.

Gene control region
[Figure: gene control region with enhancers, promoter-proximal elements (GC box, CAAT box), and the TATA box, upstream of a gene with alternating exons and introns]
The promoter-proximal elements include the CCAAT box (or CAAT box), at about 70 bp upstream of +1, and the GC box (consensus GGGCGG), in a region 100-150 bp further upstream from the TATA box.

The CAAT box and GC boxes are also often required for a polymerase to initiate transcription.

Besides CAAT and GC boxes, there may be CpG islands: CG-rich regions of 20-50 nt within ~100 bp upstream of +1. In mammals, 70-80% of CpG cytosines are methylated. (Ref: Fig. 7.16, Mol Cell Biol, 6th Ed.)

CpG islands

[Figure: a CpG island (CGCGCGCGCG….CGCG) ~100 bp upstream from the start site, multiple start sites for transcription, an enhancer, and a gene with exons and introns]
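One common way to put numbers on a "CG-rich" region is to compute its GC fraction and its CpG observed/expected ratio, (CpG count × length) / (C count × G count). This is a swapped-in metric, not a definition given on these slides; the 50 nt window (the upper end of the 20-50 nt range above) and the 10 nt step are assumptions, and no cutoff for calling an island is applied because published criteria differ.

```python
def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (GC fraction, CpG observed/expected ratio) for a DNA sequence.

    observed/expected = (number of CpG dinucleotides * length) / (#C * #G)
    """
    seq = seq.upper()
    n = len(seq)
    c_count, g_count = seq.count("C"), seq.count("G")
    cpg_count = sum(1 for i in range(n - 1) if seq[i:i + 2] == "CG")
    gc_fraction = (c_count + g_count) / n if n else 0.0
    obs_exp = (cpg_count * n) / (c_count * g_count) if c_count and g_count else 0.0
    return gc_fraction, obs_exp


def cpg_windows(seq: str, window: int = 50, step: int = 10) -> list[tuple[int, float, float]]:
    """Slide a window along `seq`; return (start, GC fraction, CpG obs/exp) per window."""
    last_start = max(len(seq) - window, 0)
    return [(start, *cpg_stats(seq[start:start + window]))
            for start in range(0, last_start + 1, step)]
```

The repeat drawn in the figure, `"CGCGCGCGCG"`, gives `(1.0, 2.0)`: every base is G or C, and CpG appears twice as often as the base composition alone would predict.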

Some housekeeping genes that are expressed at low levels have multiple possible transcription start sites spread over an extended region, 20-200 bp in length. These genes give rise to mRNAs with multiple alternative 5’ ends.

They do not have a TATA box or Inr element, and their transcription can begin at any of the possible start sites within this 20-200 bp region.

Most genes of this type contain CpG islands, CG-rich regions of 20-50 nt within ~100 bp upstream of +1.

CTD phosphorylation
• The carboxyl-terminal domain (CTD) of the largest subunit of polymerase II has an unusual structure.
• It consists of repeats of a seven-amino-acid sequence (Tyr-Ser-Pro-Thr-Ser-Pro-Ser, or YSPTSPS).
• In mammals, the CTD consists of 52 repeats of this heptapeptide.
• Of the seven amino acids, Ser2 and Ser5 are phosphorylated by at least 4 different kinases, including TFIIH.
• TFIIH phosphorylates Ser5.
• PTEFb (an elongation factor) phosphorylates Ser2.

Why CTD phosphorylation?

• Phosphorylation allows the polymerase to escape from the preinitiation complex and move down the DNA template.
• It allows a new set of proteins to associate with the tail of RNA polymerase II for transcription elongation and RNA processing.
• It serves as a tether, holding a variety of proteins close by until they are needed.
SII, ELL, and PTEFb are three of a number of elongation factors that may be associated with the polymerase as it moves along the DNA.
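An idealized Python sketch of the CTD layout described above, for illustration only. It assumes all 52 mammalian repeats are exact copies of the YSPTSPS consensus (in the real CTD many repeats deviate from it) and maps the two kinases named on these slides to the heptad position each targets. The dictionary and function names are hypothetical, and residue numbers count from the start of the CTD, not of the whole polymerase subunit.

```python
CTD_CONSENSUS = "YSPTSPS"  # Tyr1-Ser2-Pro3-Thr4-Ser5-Pro6-Ser7
HEPTAD = len(CTD_CONSENSUS)

# Kinase -> heptad position it phosphorylates, as listed on the slide
# (at least 4 kinases act on the CTD; only these two are named here).
KINASE_TARGET = {"TFIIH": 5, "PTEFb": 2}


def idealized_ctd(n_repeats: int = 52) -> str:
    """Mammalian-length CTD built from exact consensus repeats (real repeats vary)."""
    return CTD_CONSENSUS * n_repeats


def target_residues(ctd: str, kinase: str) -> list[int]:
    """1-based residue numbers within `ctd` of the serines a kinase targets."""
    pos = KINASE_TARGET[kinase] - 1  # 0-based offset inside a heptad
    return [start + pos + 1
            for start in range(0, len(ctd), HEPTAD)
            if start + pos < len(ctd) and ctd[start + pos] == "S"]
```

With 52 exact repeats the CTD is 364 residues long; TFIIH's Ser5 targets start at residues 5, 12, 19, and PTEFb's Ser2 targets at 2, 9, 16.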

PTEFb is a kinase that phosphorylates the Ser2 residues of the CTD after elongation begins.

Post-transcriptional processing of eukaryotic mRNAs

Capping occurs early in mRNA synthesis

Polyadenylation occurs at the last step of transcription, but mRNAs can later be deadenylated and readenylated in the cytoplasm (e.g. microRNA-mediated silencing)

Splicing of different introns can occur both before and after polyadenylation

Transport out to the cytoplasm is needed for translation to occur

(Figure 24.2, Genes VIII, p. 698)

Properties of eukaryotic mRNA
• They contain a continuous nucleotide sequence encoding a specific polypeptide.
• They are found in the cytoplasm.
• They are attached to ribosomes when they are translated.
• Noncoding portions are found at both the 5' and 3' ends of an mRNA and contain sequences that have important regulatory roles (the 5’ and 3’ UTRs, untranslated regions).
• Eukaryotic mRNAs have special modifications at their 5' and 3' termini that are found neither on prokaryotic messages nor on tRNAs or rRNAs:
  • The 3' end of nearly all eukaryotic mRNAs has a string of 50-250 adenosine residues that forms the poly(A) tail.
  • The 5' end has a methylated guanosine cap.

5' end modification

• The 5' methylguanosine cap forms very soon after RNA synthesis begins.
• The 5' end initially has a triphosphate from the first nucleoside triphosphate incorporated at the initiation site of RNA synthesis.
• Once the 5' end of an mRNA precursor is synthesized, several enzymes act on this end of the molecule.
• These enzymatic modifications at the 5' end occur while the RNA is still in its very early stages of synthesis.

5' methylguanosine cap

The mRNA contains a 5’ methylguanosine cap.

It is referred to as a 7-methylguanosine cap, abbreviated m7G.

The guanine nucleotide is connected to the mRNA via an unusual 5' to 5' triphosphate linkage. This guanosine is methylated at the 7 position directly after capping. The adjacent ribose(s) of the mRNA can also be methylated at the 2' position.

• First, the last of the three phosphates (γ) is removed by a phosphatase, leaving behind a diphosphate.
• Then, GMP is added in inverted orientation by a guanylyl transferase, so that the 5' end of the guanosine faces the 5' end of the RNA chain. Thus the first 2 nucleosides are joined by an unusual 5'-5' triphosphate bridge.

• Next, a guanine-7-methyltransferase adds a methyl group to the terminal inverted guanosine at position 7 of the guanine base. A 2'-O-methyltransferase adds a methyl group (CH3) to the 2' hydroxyl group of the first one or two riboses at the 5' end of the mRNA.

Functions of the 5’ cap

(a) Recruit cap binding complex, CBC = CBP 80 and CBP 20

(b) Enhance splicing – interactions between CBP 80 and splicing factors at 1st exon-intron junction

(c) Enhance cleavage prior to polyadenylation – immunodepletion of CBP 80 reduces cleavage

Proudfoot et al., Cell 108:501-512, 2002.

Functions of the 5’ cap (continued)

(d) Protect from degradation: cap-binding proteins (CBPs) block the decapping enzyme (DCP1). The cap’s 5’-5’ linkage is resistant to 5’-3’ exonucleases even if CBP80 and CBP20 (the cap-binding complex, CBC) are absent.

[Figure: CBP80 and CBP20 bound to the 7mGpppG cap of an mRNA, blocking the DCP1 decapping enzyme or a 5’-3’ exonuclease]

(e) Transport of mRNA from nucleus through nuclear pore – CBC implicated, along with factors deposited during splicing

(f) Initiating the first round of translation. The first ribosome(s) are recruited by the CBC, but this recruitment is inefficient; exchange of the CBC for eIF4E allows efficient ribosome recruitment.

[Figure: in the nucleus, the 7mG cap is bound by CBP80/CBP20 and the poly(A) tail (AAAAAAAAA…) by PABII; in the cytoplasm, the cap is bound by eIF-4E/eIF-4G and the poly(A) tail by PABP]

3' end modification

• The poly(A) tail: the 3' end of mRNA contains a string of adenosine residues that forms a tail.
• The tail invariably starts ~20 bases downstream from AAUAAA, a recognition site in the primary transcript for the assembly of a protein complex that carries out processing reactions at the mRNA 3' end.

The poly(A) tail

• The poly(A) processing complex is also physically associated with the RNA polymerase as it synthesizes the primary transcript.
• Included among the proteins of the processing complex is an endonuclease that cleaves the pre-mRNA just downstream from the recognition site (CFI and CFII).
• After nuclease cleavage, poly(A) polymerase (PAP) adds ~250 adenosines without the need for a template.
• The poly(A) tail, together with an associated protein, protects the mRNA from premature degradation by exonucleases.

Addition of poly(A) tail
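A toy Python model of the cleavage and tail-addition steps described above, for illustration only. It assumes an RNA string written 5’→3’, takes the first AAUAAA it finds (real transcripts can contain several), measures the ~20 nt offset from the end of the signal (an assumption; the slide gives only an approximate distance), and ignores the protein factors entirely. The function name is hypothetical.

```python
from typing import Optional

POLYA_SIGNAL = "AAUAAA"  # recognition site from the slide


def cleave_and_polyadenylate(pre_mrna: str, offset: int = 20,
                             tail_length: int = 250) -> Optional[str]:
    """Toy model of 3' end processing of a pre-mRNA (5'->3', RNA letters).

    1. Find the first AAUAAA signal.
    2. Cleave `offset` nt after the end of the signal.
    3. Discard the downstream fragment and add a template-free poly(A) tail.
    Returns None if no signal is present.
    """
    pre_mrna = pre_mrna.upper()
    signal_start = pre_mrna.find(POLYA_SIGNAL)
    if signal_start == -1:
        return None
    cleavage_site = signal_start + len(POLYA_SIGNAL) + offset
    upstream_fragment = pre_mrna[:cleavage_site]   # kept; the downstream piece is degraded
    return upstream_fragment + "A" * tail_length   # added by poly(A) polymerase (PAP)
```

For the made-up transcript `"AUGGCC" + "AAUAAA" + "C" * 20 + "GUGUGU"`, the GUGUGU stretch is cut off and 250 A's are added after the 20 C's.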

CstF: Cleavage stimulation factor
CPSF: Cleavage and polyadenylation specificity factor

Functions of 3’ polyadenylation

• Enhance splicing – poly A polymerase (PAP) interacts with U2AF at 3’ terminal intron-exon junction

Proudfoot et al., Cell 108, 501-512, 2002

• Prevent RNA degradation: PABs block 3’→5’ exonucleases

[Figure: poly(A) tail bound by PABs, protected from a 3’→5’ exonuclease]

Functions of polyadenylation (continued)

• Interactions between nuclear PABII and CBC link transcription and 3’ end processing • Interactions between cytoplasmic PABP and eIF4G are involved in mRNA stability, ribosome recruitment and possibly ribosome recycling • Interactions between 5’ and 3’ end have been shown to be important in many RNA regulatory processes

Split genes: introns and exons

• Pre-mRNAs are several times the size of mRNAs, so processing is needed, as with rRNA precursors.
• Early rRNA processing studies showed that mature RNAs were carved from larger precursors.
• Large segments were removed from both the 5' and 3' sides of various rRNA intermediates to yield the final, mature rRNA products.
• Exons: those parts of the gene that contribute to the mature RNA product.
• Introns: intervening sequences.

Introns
• Introns are found in all types of eukaryotic genes, including those that code for tRNAs, rRNAs, and mRNAs.
• The type I collagen gene contains >50 introns.
• The average human gene contains ~9 introns.

Number and size of introns
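A quick worked calculation using the averages on these slides (~9 introns per human gene, exons ~150 nt, introns ~3,500 nt) shows how little of a typical pre-mRNA survives splicing. This is an idealized sketch: it assumes every exon and intron has exactly the average length and ignores the cap and poly(A) tail; the function name is hypothetical.

```python
AVG_EXON_NT = 150     # average exon length (slide)
AVG_INTRON_NT = 3500  # average intron length (slide)
AVG_INTRONS = 9       # average introns per human gene (slide)


def idealized_gene(n_introns: int = AVG_INTRONS,
                   exon_nt: int = AVG_EXON_NT,
                   intron_nt: int = AVG_INTRON_NT) -> tuple[int, int, float]:
    """Return (spliced exon length, pre-mRNA length, fraction kept after splicing)."""
    n_exons = n_introns + 1            # introns interrupt exons, so one more exon
    spliced = n_exons * exon_nt
    pre_mrna = spliced + n_introns * intron_nt
    return spliced, pre_mrna, spliced / pre_mrna


if __name__ == "__main__":
    spliced, pre, kept = idealized_gene()
    print(f"spliced ~{spliced} nt, pre-mRNA ~{pre} nt, {kept:.1%} retained")
```

With the slide's averages this gives about 1,500 nt of exon sequence out of a ~33,000 nt pre-mRNA, i.e. under 5% of the primary transcript is retained.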

• Individual exons average ~150 nucleotides.
• Individual introns average ~3,500 nucleotides.
• This explains why pre-mRNAs are much longer than mRNAs.
• The human dystrophin gene extends for roughly 100 times the length needed to code for its corresponding message.

RNA splicing pathways

• Several methods of RNA splicing occur in nature.
• The type of splicing depends on the structure of the spliced intron and the catalysts required for splicing to occur.
• Mechanisms exist for: (i) nuclear pre-mRNA, or spliceosomal, introns; (ii) tRNA introns and self-splicing introns: Group I and Group II.
• We’ll talk mainly about pre-mRNA…
Most pre-mRNAs can undergo “alternative splicing”.

Splicing mechanisms

• Spliceosomal introns often reside in eukaryotic protein-coding genes.

• The 5’ end of an intron typically starts with the sequence GU, and the 3’ end of an intron terminates with AG.
• In addition, short stretches of bases adjacent to these GU and AG sequences tend to be similar among different introns.
• Upstream from the polypyrimidine tract (a pyrimidine-rich stretch just upstream of the 3’ AG) is the branch point, which includes an important adenine nucleotide.
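A toy Python sketch of what splicing accomplishes, not how the spliceosome does it. It assumes the intron coordinates are already known (real cells locate introns using the splice sites, branch point, and polypyrimidine tract), checks only the GU...AG rule, and joins the flanking exons. The function name and the 0-based, end-exclusive coordinate convention are hypothetical.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove introns from a pre-mRNA and join the flanking exons.

    introns: (start, end) pairs, 0-based and end-exclusive, one per intron.
    Each intron must begin with GU and end with AG (the consensus on the slide).
    """
    pre_mrna = pre_mrna.upper()
    exons, prev_end = [], 0
    for start, end in sorted(introns):
        if start < prev_end or end <= start:
            raise ValueError(f"bad or overlapping intron coordinates {start}-{end}")
        intron = pre_mrna[start:end]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron {start}-{end} does not follow the GU...AG rule")
        exons.append(pre_mrna[prev_end:start])  # exon upstream of this intron
        prev_end = end
    exons.append(pre_mrna[prev_end:])           # last exon
    return "".join(exons)                        # exons joined at new junctions
```

For a made-up pre-mRNA `"AUGCCC" + "GUAAGUUACUAACUUUUUUCCCAG" + "GGAUAA"` with one intron at (6, 30), `splice` returns `"AUGCCCGGAUAA"`.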

Fig. 8-7. Mol Cell Biol

RNA splicing mechanism

• The spliceosome is a complex of small nuclear ribonucleoproteins (snRNPs), consisting of five kinds of snRNA (U1, U2, U4, U5, U6) combined with more than 200 proteins.
• The spliceosome assembles de novo at each round of splicing.
• Spliceosomes assemble on pre-mRNA while the pre-mRNA molecule is still being transcribed.
• Each snRNP contains one or two molecules of uridine-rich snRNA (small nuclear RNA U1, etc.).

Coordination of splicing with transcription

• Some of the components of the splicing apparatus appear to be tethered to the phosphorylated CTD of RNA polymerase II

• As the first splice junction is synthesized, it is bound to a spliceosome

• The second junction is captured by the spliceosome as it is synthesized.

Fig. 26-16c. Lehninger Biochem

Splicing always involves two events
• Phosphodiester bonds at the exon-intron boundaries are broken.
• A new phosphodiester bond is formed between the 3'-end of the upstream exon and the 5'-end of the downstream exon.

The pre-mRNA splicing reaction

[Figure: first transesterification; 2nd transesterification]

Fig. 6-26. Mol Biol of the Cell; Fig. 8-7. Mol Cell Biol

Splicing mechanism in the mRNA primary transcript

• Spliceosomes are assembled by sequential binding of snRNPs to pre-mRNA.
• The first step is binding of a snRNP, called U1, whose RNA contains a nucleotide sequence that allows it to base-pair with the 5’ splice site.
• A second snRNP, called U2, then binds to the pre-mRNA in a way that causes a specific ‘A’ residue to bulge out of the surrounding helix.
• This is the site that later becomes the branch point of the lariat.

Fig. 26-16a. Lehninger Biochem

Spliceosome coordination of mRNA splicing

Human mutations can cause disease by changing gene splicing

Example: β thalassemia

Synthesis and processing of rRNAs and tRNAs

• Eukaryotic cells contain several million ribosomes. • Each ribosome consists of several molecules of rRNA + dozens of r-proteins. • >80% of the RNA in most cells consists of rRNA. • DNA sequences encoding rRNA (rDNA) are normally repeated hundreds of times.

Main reason: each mRNA can be translated into many protein copies, but for rRNA and tRNA the RNA is the final product, and they are needed in vast quantities.

The macromolecular composition of the eukaryotic ribosome

• Eukaryotic ribosomes have four distinct rRNAs, three in the large subunit and one in the small subunit.
• The small (40S) subunit contains an 18S rRNA molecule and 33 ribosomal proteins.
• The large (60S) subunit contains 28S, 5.8S, and 5S rRNA molecules and 49 ribosomal proteins.
• Together, the two subunits form the 80S ribosome.
• The 28S, 18S, and 5.8S rRNAs are carved by various nucleases from a single 45S primary transcript, which is transcribed by polymerase I.
• The 5S rRNA is synthesized from a separate RNA precursor in the nucleoplasm outside the nucleolus; it is transcribed by polymerase III.

Synthesis and processing of the 45S rRNA precursor in mammals
[Figure: rRNA gene DNA is transcribed into a ~13,000 nt primary transcript, which undergoes chemical modification and processing involving snoRNAs; degraded regions of the primary transcript are discarded; 18S rRNA is incorporated into the small ribosomal subunit, and 5.8S and 28S rRNAs (with 5S rRNA made elsewhere) into the large ribosomal subunit]

rRNA and the Nucleolus

• rRNA transcription and processing occur in the nucleolus, a membraneless organelle within the nucleus.

• In humans, there are about 200 copies of the 45S rRNA gene, spread out in 5 chromosomal gene clusters. The chromatin containing these clusters resides in the nucleolus.

Transfer RNAs

• There are ~50 different tRNAs in plant and animal cells, each encoded by DNA sequences repeated a number of times within the genome.
• Repeat number varies with organism (yeast ~275 total tRNA genes, fruit flies ~850, humans ~1,300).
• Genes are found in small clusters scattered throughout the genome.
• A single cluster typically contains multiple copies of different tRNA genes.
• tRNA genes are transcribed by polymerase III.
tRNAs will be covered more in the translation lectures…

Let’s take a break….

Some unsolicited advice from a former CMB graduate student…

• When choosing a research lab, the project itself is not the most important thing!
 - The PI, the productivity of the lab, the stability of the funding, the “fit”…
 - Also, you can still choose to join another lab if you don’t find a good fit during your 3 rotations.

• During graduate school, learning non-science stuff is as important as learning the science… (How does the University work? Grantsmanship, Management, Budgeting, Politics, Scientific Writing, Presentation, Supervising, Lab organization, etc.)

• Find successful people who are 1 or 2 steps ahead of you in the career path, and learn from them (“mentors”).
 - Young grad students: find older successful grad students.
 - Older grad students: find successful postdocs.
 - Postdocs: find successful young faculty.

• Choosing a Dissertation Committee - don’t wait too long. - the committee is FOR YOUR PROTECTION AND BENEFIT

• Try to figure out what you want to do after graduate school If you want to go into academic science: • 2 most difficult transition times (plan ahead) • Stay in Hawaii???

• Aim to submit a graduate fellowship application in year 3-4 of your PhD program.

• At the end of your graduate career (with PhD), you will be judged mainly on quality and quantity of publications. Also your PI’s recommendation will be very important. Fellowship applications are the only place that will ask for grades.

• What impresses your PI? Work your butt off, read lots of articles, keep a good attitude, and constantly work on your communication skills!

• Remember the PI is the captain of the ship - everyone should ask questions and give their opinions, but the PI makes the final decision!

- He/she is looking at the big picture, and may be thinking on very different levels than the people working in the lab.

• Remember the lab is a TEAM. The more you work together, the more data and publications. Look for opportunities to help other people in their projects, and possibly get on their papers. And let others help you, and put them on your papers. Do favors, and don’t be shy about asking for favors.

• For experiments - think about CONTROLS all the time. What will be your negative controls, what will be your positive controls? - If you don’t use proper controls - you cannot draw any valid conclusions from your experiment!

• Keep your data and samples well organized; it’s worth the time.

Some common molecular techniques used to investigate transcription….

Reverse Transcription PCR (RT-PCR)

• “RT” because it goes from RNA to DNA (against the central dogma of molecular biology!).
• Uses the reverse transcriptase enzyme, discovered in and cloned from RNA retroviruses (this work led to the 1975 Nobel Prize).
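A minimal Python sketch of the base pairing in the reverse transcription step, for illustration only. It assumes an RNA string written 5’→3’ and models only the first-strand copy; priming and second-strand synthesis are left out, and the function name is hypothetical.

```python
# Reverse transcriptase pairs each RNA base with its DNA complement;
# an A in the RNA specifies T (not U) in the DNA copy.
RNA_TO_DNA_PAIR = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(rna_5_to_3: str) -> str:
    """Return the complementary DNA (cDNA) copy of an RNA, written 5'->3'.

    The cDNA is antiparallel to the RNA, so the complement is reversed.
    Priming and second-strand synthesis are not modeled.
    """
    return "".join(RNA_TO_DNA_PAIR[base] for base in reversed(rna_5_to_3.upper()))
```

For example, `first_strand_cdna("AUGGCCUUUAAG")` returns `"CTTAAAGGCCAT"`, which can then serve as the template for PCR.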

In the lab: Real Time PCR (qRT-PCR, qPCR)

• Essentially a PCR reaction designed to measure the quantity of a specific template molecule in a heterogeneous sample.

• Using the cDNA from an RT reaction as template, you can measure the concentration of a specific mRNA, which correlates to that gene’s “expression level”.

• Accomplished using fluorescent probes and a real time PCR machine that can read fluorescence after every cycle in the PCR reaction.

• A known amount of template can be used as a standard for comparison with unknown samples to deduce the starting copy number.

• The simplest readouts often use DNA-binding dyes, most commonly SYBR Green.

• More complex assays use sequence-specific fluorescent probes, most commonly TaqMan probes.
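Ct (threshold cycle) is the PCR cycle at which fluorescence first crosses a set detection threshold. Because each cycle ideally doubles the product, Ct falls linearly with log10 of the starting copy number, with an ideal slope of about -3.32 (= -1/log10 2). A minimal sketch of the standard-curve approach described on these slides, assuming made-up Ct values for a 10-fold dilution series of known template; the function names are hypothetical, and in practice this fit is usually done by the instrument software.

```python
import math


def fit_standard_curve(copies: list[float], cts: list[float]) -> tuple[float, float]:
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_copies(ct: float, slope: float, intercept: float) -> float:
    """Invert the standard curve: starting copies for an unknown sample's Ct."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope: float) -> float:
    """E = 10^(-1/slope) - 1; E close to 1.0 means the product doubles every cycle."""
    return 10 ** (-1 / slope) - 1


if __name__ == "__main__":
    # Hypothetical standards: 10-fold dilutions with made-up Ct values.
    standards = [10, 100, 1_000, 10_000]
    standard_cts = [34.7, 31.4, 28.0, 24.7]
    slope, intercept = fit_standard_curve(standards, standard_cts)
    print(f"slope={slope:.2f}, efficiency={amplification_efficiency(slope):.0%}")
    print(f"unknown with Ct 30.0 ~ {estimate_copies(30.0, slope, intercept):.0f} copies")
```

A slope far from about -3.3 (efficiency far from ~100%) is a warning that the reaction or the standards need attention before unknowns are interpreted.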

[Figure: linear regression of Ct values (from NCBI); numbers are # of template molecules]

Transcription Reporter Constructs
• Used to measure a specific segment of DNA’s ability to activate transcription, or to measure a protein’s transcriptional activity

(from www.piercenet.com)

Electrophoretic mobility shift assay (EMSA)

• Used for detecting proteins bound to specific DNA or RNA segments

[Figure: a labeled RNA (or DNA) probe (some type of label: biotin, radioactivity, etc.) incubated with cell extract or purified proteins]

(from www.piercenet.com)

Chromatin Immunoprecipitation (ChIP)
1) Proteins are crosslinked to DNA strands with formaldehyde.
2) DNA is isolated and broken up into small pieces with sonication or digestion.
3) A specific antibody against the protein of interest is used to immunoprecipitate (IP) the DNA/protein complexes.
4) Precipitated DNA can then be PCR amplified and sequenced.

• Can identify which DNA segments target proteins are bound to in vivo under specific conditions.

• Can be adapted to a wide variety of useful experiments. - including identifying RNA-protein binding