Origin of Replication - I

Signals and Systems in Biology
Kushal Shah @ EE, IIT Delhi

DNA Replication

Bacterial DNA

Ori : Some facts

- Replication may proceed unidirectionally or bidirectionally
- Usually an AT-rich region (A-T pairs, with two hydrogen bonds, are easier to unwind than G-C pairs)
- Bacteria: circular DNA and a single origin of replication
- Archaea: circular DNA, often with multiple origins
- Eukaryotes: linear DNA and multiple origins
- The firing time of each Ori may be different
- Modeling this 'firing' phenomenon is a challenging task
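As a purely illustrative toy (not a model taken from these slides), assume linear DNA, origins at known positions $x_i$ that fire at given times $t_i$, and forks that move outward in both directions at one constant speed $v$. Position $x$ is then first copied at $\min_i \left(t_i + |x - x_i|/v\right)$, and the latest such time is the total replication time. The function name and all numbers below are hypothetical.

```python
def replication_times(length, origins, fire_times, fork_speed):
    """Time at which each position 0..length-1 is first replicated.

    Toy assumptions (hypothetical, for illustration only):
    - linear DNA, positions measured in nt
    - origin i sits at origins[i] and fires at fire_times[i]
    - forks leave each origin in both directions at a constant fork_speed
    An origin that is passively replicated before its firing time would not
    fire in reality, but it would not change the minimum below anyway.
    """
    if len(origins) != len(fire_times) or not origins:
        raise ValueError("need one firing time per origin and at least one origin")
    return [
        min(t + abs(x - o) / fork_speed for o, t in zip(origins, fire_times))
        for x in range(length)
    ]


if __name__ == "__main__":
    # Two origins at the ends of a 101-nt segment; the second fires 20 time units late.
    times = replication_times(101, origins=[0, 100], fire_times=[0, 20], fork_speed=1.0)
    print(max(times), times.index(max(times)))  # the forks meet at x = 60, t = 60
```

Changing only the firing times moves the point where forks meet and changes the total time, which is one reason firing-time variability matters.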

Ori : Cell Cycle

- G1 phase: initiation of the major replication regulatory processes
- S phase: actual DNA replication
- G2 phase: correction of replication errors or other damage
- M phase: segregation of the duplicated chromosomes and division of the parent cell into two daughter cells

Prokaryotic cell (e.g. E. coli in rich medium): ~20 min
Eukaryotic cell (e.g. cultured mammalian cells): ~18 to 24 h!

Ori : Prokaryotes

- Circular DNA and a single origin of replication (oriC in E. coli)
- 9-mer and 13-mer repeats (N = any nucleotide):
  - DnaA boxes (4 copies): 5' - TTATCCACA - 3'
  - AT-rich 13-mers (3 copies): 5' - GATCTNTTNTTTT - 3'
- DnaA protein binds to the 9-mers and stimulates the 13-mer region to unwind
- DnaC loads the DnaB helicase onto each of the two unwound strands
- SSB (single-strand binding protein) prevents the single strands from forming secondary structures
- DNA gyrase relieves the torsional stress (supercoiling) that builds up ahead of the unwinding!
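The oriC motifs listed above can be located with a simple pattern search. The sketch below is an illustration, not a published ori-finding tool: it translates the IUPAC letters (N = any base, plus W, Y, R) into regular-expression character classes and scans both strands, since a motif may lie on either strand. The helper names and the toy sequence are hypothetical.

```python
import re

# IUPAC codes (N = any base; W = A/T; Y = C/T; R = A/G)
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "W": "[AT]", "Y": "[CT]", "R": "[AG]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_motif(seq, motif):
    """Return sorted (start, strand) hits of an IUPAC motif in seq.

    Starts are 0-based positions on the given (forward) strand; strand '-'
    means the motif reads 5'->3' on the complementary strand. A lookahead
    is used so that overlapping hits are also reported.
    """
    seq = seq.upper()
    pattern = re.compile("(?=" + "".join(IUPAC[b] for b in motif.upper()) + ")")
    n, k = len(seq), len(motif)
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    rc = reverse_complement(seq)
    # A hit at position p of the reverse complement covers seq[n-p-k : n-p]
    hits += [(n - m.start() - k, "-") for m in pattern.finditer(rc)]
    return sorted(hits)


if __name__ == "__main__":
    toy = "GGTTATCCACAGG" + "CC" + "GATCTATTATTTT"  # made-up sequence, not real oriC
    print(find_motif(toy, "TTATCCACA"))      # DnaA box: [(2, '+')]
    print(find_motif(toy, "GATCTNTTNTTTT"))  # 13-mer:   [(15, '+')]
```

Real DnaA boxes tolerate mismatches, so an exact consensus search like this one is only a first pass.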

Prokaryotic DNA replication | Eukaryotic DNA replication
Occurs inside the cytoplasm | Occurs inside the nucleus
Only one origin of replication | Many origins of replication
Ori length about 100-200 nt | Each Ori about 150 nt
DnaA boxes (9-mers) and AT-rich 13-mers | No consensus sequence conserved across eukaryotes (S. cerevisiae: WTTTAYRTTTW, where W = A/T, Y = C/T, R = A/G)
Initiation by DnaA and DnaB | Initiation by ORC (origin recognition complex)
Replication is very rapid | Replication is very slow

How long does the replication process take?

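Speed:
- Prokaryotes: ~1000 nt per second. E. coli: ~4000 kb ⇒ ~4000 seconds for a single fork (about half that with two forks moving bidirectionally from the origin)
- Eukaryotes: ~50 nt per second. If only one origin were present, the time would be ~35 days!! The presence of multiple Ori reduces the overall time.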
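These estimates follow from time ≈ genome length / (number of forks × fork speed). A minimal sketch, assuming evenly spaced origins that all fire at once and forks that move at one constant speed (real origins fire at different times, so real S phases are longer); the function names and the 8-hour target are hypothetical.

```python
import math


def replication_time_s(genome_nt, speed_nt_per_s, n_origins=1, bidirectional=True):
    """Idealised time (s) to copy a genome: origins evenly spaced, all firing
    at t = 0, every fork moving at the same constant speed."""
    forks = n_origins * (2 if bidirectional else 1)
    return genome_nt / (forks * speed_nt_per_s)


def origins_needed(genome_nt, speed_nt_per_s, max_time_s, bidirectional=True):
    """Smallest number of evenly spaced origins that finishes within max_time_s."""
    nt_per_origin = (2 if bidirectional else 1) * speed_nt_per_s * max_time_s
    return math.ceil(genome_nt / nt_per_origin)


if __name__ == "__main__":
    # E. coli numbers from the slide: ~4000 kb at ~1000 nt/s
    print(replication_time_s(4_000_000, 1000, bidirectional=False))  # 4000.0 s, one fork
    print(replication_time_s(4_000_000, 1000))                       # 2000.0 s, two forks
    # Hypothetical target: a 3e9-nt genome at 50 nt/s finished within 8 hours
    print(origins_needed(3e9, 50, 8 * 3600))                         # 1042
```

For comparison, the same idealisation gives 3 × 10^7 s (roughly a year) for a hypothetical 3 × 10^9 nt genome copied from one origin by two forks at 50 nt/s.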

Error rate:
- The error rate during replication is about 1 per 100000 nt ($10^{-5}$). Humans: 6 billion nt ⇒ 60000 errors!!
- Error-correcting mechanisms: proofreading and mismatch repair
- Reported final error rates: ~$10^{-9}$ per nt or lower
- Not all errors are bad!!
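The error counts above are simply genome length × per-nucleotide error rate. A small sketch (hypothetical function names) that also uses a Poisson approximation, an added assumption that errors occur independently, to estimate the chance of an error-free copy:

```python
import math


def expected_errors(genome_nt, error_rate):
    """Expected number of errors when copying genome_nt nucleotides once."""
    return genome_nt * error_rate


def prob_error_free(genome_nt, error_rate):
    """Poisson approximation (assumes independent errors): P(no errors) = exp(-mean)."""
    return math.exp(-expected_errors(genome_nt, error_rate))


if __name__ == "__main__":
    human_nt = 6e9  # diploid human genome, as on the slide
    print(round(expected_errors(human_nt, 1e-5)))  # before correction: 60000
    print(round(expected_errors(human_nt, 1e-9)))  # after proofreading + mismatch repair: 6
    print(prob_error_free(human_nt, 1e-9))         # about 0.0025
```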

Ori finding method : Skews

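- Chargaff's parity rule: A ≈ T and G ≈ C for the whole genome. Used by Watson and Crick in their discovery of the DNA structure (this first rule follows from base pairing in double-stranded DNA; Chargaff's second parity rule states that A ≈ T and G ≈ C also hold approximately within a single strand)
- Over smaller windows of a single strand, however, G ≠ C and A ≠ T, due to:
  - Different mutational pressures on the leading and lagging strands, arising from differences in the replication machinery
  - Differences in selective pressures due to the inhomogeneous distribution of genes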

GC skew and Sliding window method

ATATGTAGCAGTGAGTACGAGATCGAGAGTCGAGA

A window of fixed length slides along the sequence, and within each window:

$$\text{GC skew} = \frac{C - G}{C + G}, \qquad \text{AT skew} = \frac{A - T}{A + T}$$

where A, C, G and T are the base counts in that window.
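The sliding-window computation can be sketched as follows. This is an illustrative implementation (function and parameter names are hypothetical) that uses the slide's sign convention (C − G)/(C + G); a window with no C or G is given a skew of 0, an arbitrary choice.

```python
def window_skews(seq, window, step):
    """GC and AT skew, (C-G)/(C+G) and (A-T)/(A+T), for each full window.

    Windows start at 0, step, 2*step, ...; a trailing partial window is dropped.
    A window with no C/G (or no A/T) gets a skew of 0.0.
    """
    seq = seq.upper()
    gc, at = [], []
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        a, c, g, t = (w.count(b) for b in "ACGT")
        gc.append((c - g) / (c + g) if c + g else 0.0)
        at.append((a - t) / (a + t) if a + t else 0.0)
    return gc, at


if __name__ == "__main__":
    seq = "ATATGTAGCAGTGAGTACGAGATCGAGAGTCGAGA"  # the example sequence above
    gc, at = window_skews(seq, window=10, step=10)
    print(gc)  # three non-overlapping 10-nt windows, all G-rich (negative skew)
    print(at)
```

Setting `step` smaller than `window` gives overlapping windows and a smoother skew profile, at the cost of more computation.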

J. R. Lobry, Science 1996; J. Mrazek and S. Karlin, PNAS 1998

GC Skew

[Figure: GC skew (C−G)/(G+C) vs. window number for E. coli]

Arrow : ter (terminus of replication)

Entropy?

CGC : Cumulative GC skew
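The cumulative GC skew adds up the window skews along the sequence; in cumulative skew diagrams such as Grigoriev's, the extreme points of this curve are used as candidate locations of ori and ter. Whether ori sits at the maximum or the minimum depends on the sign convention ((C − G) here versus (G − C) elsewhere), so this minimal sketch simply reports both extrema. Names and the toy sequence are hypothetical.

```python
from itertools import accumulate


def cumulative_gc_skew(seq, window):
    """Running sum of (C-G)/(C+G) over consecutive non-overlapping windows."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        w = seq[start:start + window]
        c, g = w.count("C"), w.count("G")
        skews.append((c - g) / (c + g) if c + g else 0.0)
    return list(accumulate(skews))


def extrema(values):
    """Indices of the minimum and maximum of a list (first occurrence)."""
    return values.index(min(values)), values.index(max(values))


if __name__ == "__main__":
    # Toy sequence: a G-rich half followed by a C-rich half (made up for illustration)
    toy = "GGGC" * 5 + "CCCG" * 5
    cgc = cumulative_gc_skew(toy, window=4)
    print(extrema(cgc))  # (4, 9): the minimum falls at the last G-rich window
```

Summing the skews turns a noisy, sign-switching profile into a curve with one clear turning point, which is why cumulative plots are easier to read than raw window skews.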

A. Grigoriev, Nucleic Acids Research 1998

Z-curve method

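$$x_n = (A_n + G_n) - (C_n + T_n), \qquad y_n = (A_n + C_n) - (G_n + T_n), \qquad z_n = (A_n + T_n) - (C_n + G_n)$$

where $A_n$, $C_n$, $G_n$ and $T_n$ are the numbers of A, C, G and T among the first $n$ bases of the sequence.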
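The three Z-curve components can be accumulated in one pass over the sequence. A minimal sketch (hypothetical function name) that returns $x_n$, $y_n$ and $z_n$ for n = 1, ..., N following the definitions above; characters other than A, C, G, T raise an error, an assumption the slides do not address.

```python
def z_curve(seq):
    """Return lists x, y, z where x[n-1], y[n-1], z[n-1] use the first n bases.

    x_n = (A_n + G_n) - (C_n + T_n)   purine vs pyrimidine
    y_n = (A_n + C_n) - (G_n + T_n)   amino vs keto
    z_n = (A_n + T_n) - (C_n + G_n)   weak vs strong hydrogen bonding
    """
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    xs, ys, zs = [], [], []
    for base in seq.upper():
        if base not in counts:
            raise ValueError(f"unexpected base {base!r}")
        counts[base] += 1
        a, c, g, t = counts["A"], counts["C"], counts["G"], counts["T"]
        xs.append((a + g) - (c + t))
        ys.append((a + c) - (g + t))
        zs.append((a + t) - (c + g))
    return xs, ys, zs


if __name__ == "__main__":
    print(z_curve("ACGT"))  # ([1, 0, 1, 0], [1, 2, 1, 0], [1, 0, -1, 0])
```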

R. Zhang and C. T. Zhang, BBRC 2002

Correlation:

$$C(k) = \frac{1}{N-k} \sum_{j=1}^{N-k} a_j \, a_{j+k}, \qquad C_G = \sum_{k=1}^{N-1} \lvert C(k) \rvert$$
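The correlation measure can be computed directly from its definition. The slides do not say how the sequence is turned into the numbers $a_j$, so the sketch below assumes, purely for illustration, a 0/1 indicator for G; the helper `indicator` and its default base are hypothetical and not necessarily the mapping used by Shah and Krishnamachari. For a window-based scan like the plots, apply the measure to each window separately.

```python
def autocorrelation(a, k):
    """C(k) = 1/(N-k) * sum_{j=1}^{N-k} a_j a_{j+k}  (0-based indexing in code)."""
    n = len(a)
    return sum(a[j] * a[j + k] for j in range(n - k)) / (n - k)


def correlation_measure(a):
    """C_G = sum_{k=1}^{N-1} |C(k)|. Direct evaluation costs O(N^2) per window."""
    return sum(abs(autocorrelation(a, k)) for k in range(1, len(a)))


def indicator(seq, base="G"):
    """Hypothetical numeric mapping: 1 where seq has `base`, else 0."""
    return [1 if b == base else 0 for b in seq.upper()]


if __name__ == "__main__":
    window = "ATATGTAGCAGTGAGTACGA"  # made-up 20-nt window
    print(correlation_measure(indicator(window)))
```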

[Figure: B. subtilis. (a) GC skew (C−G)/(G+C) vs. window number; (b) Correlation (C_G) vs. window number]

K. Shah and A. Krishnamachari, BioSystems 2012

[Figure: P. falciparum apicoplast. (a) GC skew (C−G)/(G+C) vs. window number; (b) Correlation (C_G) vs. window number]

K. Shah and A. Krishnamachari, BioSystems 2012