arXiv:physics/0609139v2 [physics.bio-ph] 14 Nov 2006 ∗ function biological and conformations DNA Single lcrncades [email protected] address: Electronic e od:DA igemlcls N opn,DAdenatura DNA looping, DNA nanosensing molecules, in single DNA DNA, of words: use Key the for physical suitable The particularly be initiation. to transcription on nanobubbles as site DNA such binding intermittent processes, specific of their formation relat spontaneous for be the proteins may of DNA-looping process random search how revi t discuss this relevance We biophysical whose In appreciated. the DNA-knots, investigate DNA. of We configurations of equilibrium phenomena: mole conformations following DNA the physical involving sin on the reactions or to few biochemical reactions robust are highly example for prime examples proces cellular extraordinary perspective, provide nanoscience a From Abstract. 75083 TX Richardson, nttt fBoeia cecsadTcnlg n Depart and Technology and Sciences Biomedical of Institute Levene C Einstein Stephen Albert Biophysics, and Physiology of Department Zhang Yongli Universi Technology, and at Texas Sciences of Biomedical University of Astronomy, Institute and Physics of Department Hanke Andreas Techn address) of (New Institute 02139 Massachusetts Chemistry, Blegda of Physics, Department Theoretical for Institute NORDITA—Nordic Past Ambj¨ornsson Louis Tobias 150 Ottawa, of University Physics, Blegda of Physics, Department Theoretical for Institute NORDITA—Nordic Metzler Ralf ∗ rpriso N a nedtr out turn indeed may DNA of properties e n hi eue nvtoimitations vitro in reduced their and ses h N oeue n edelon dwell we And molecule. DNA the ilgclpoessaeincreasingly are processes biological o yo ea tDla,Rcado,T 75083 TX Richardson, Dallas, at Texas of ty eto oeua n elBooy nvriyo ea tDa at Texas of University Biology, Cell and Molecular of ment l oeuerato ahas A pathways. reaction molecule gle rpriso N-opn n the and DNA-looping of properties n hi motnefrbiological for importance their and applications. lg,7 ascuet vne abig,Massachusetts Cambridge, Avenue, Massachusetts 77 ology, se 7 10Cpnae ,Dnakand Denmark Ø, Copenhagen 2100 17, msvej and Denmark Ø, Copenhagen 2100 17, msvej leeo eiie rn,N 10461 NY Bronx, Medicine, of ollege w esmaiercn results recent summarise we ew, u,Otw,Otro 1 N,Cnd Nwaddress) (New Canada 6N5, K1N Ontario, Ottawa, eur, ue,adteculn fthese of coupling the and cules, dt h ffiinyo h target the of efficiency the to ed rwsil,8 otBon rwsil,T 82 and 78520 TX Brownsville, Brown, Fort 80 Brownsville, in nt,gn regulation gene knots, tion, llas, 2

Contents

I. Introduction 3

II. Physical properties and biological function of DNA 4

III. DNA-looping 7 A. Biological significance of DNA looping 7 B. DNA loop model 9 C. Simplified protein geometries and flexibility parameters 9 D. DNA loops having zero end-to-end distance and antiparallel helical axes 9 E. DNA looping with finite end-to-end distance, antiparallel helical axes, and in-phase torsional constraint 10 F. DNA looping in synapsis 11 G. Conclusion 13

IV. DNA knots and their consequences: entropy and targeted knot removal 13 A. Physiological background of knots 13 B. Classification of knots 15 C. Long chains are almost always entangled. 16 D. Entropic localisation in the figure-eight slip-link structure. 16 E. Simulations of entropic knots in 2D and 3D. 18 F. Flattened knots in dilute and dense phases. 20 1. Flat knots in dilute phase. 21 2. Flat knots under Θ and dense conditions. 23 G. 3D knots defy complete analytical treatment. 25

V. DNA breathing: local denaturation zones and biological implications 26 A. Physiological background of DNA denaturation 26 B. The Poland-Scheraga model of DNA melting 28 C. Fluctuation dynamics of DNA bubbles: DNA breathing 30 D. Probabilistic modelling—the master equation (ME) 31 E. Stochastic modelling—the Gillespie algorithm 32 F. Bacteriophage T7 promoter sequence analysis. 33 G. Interaction of DNA bubbles with selectively single-strand binding proteins 34

VI. Role of DNA conformations in gene regulation 35 A. Physiological background of gene regulation and expression 35 B. Binding proteins: Specific and nonspecific binding modes 36 C. The search process for the specific target sequence 37 D. A unique situation: Pure one-dimensional search of SSB mutants 38 E. L´evy flights and target search 38 F. Viruses—extreme nanomechanics 40

VII. Functional molecules and nanosensing 40 A. Functional molecules 40 B. Nanosensing 41

VIII. Summary 42

Acknowledgments 42

Biographical notes 43

A polymer primer. 45 Polymer networks. 46

References 48 3

I. INTRODUCTION in the virus capsid and prevent ejection of the DNA into a host, and thereby infection. DNA knots are also be- Deoxyribonucleic acid (DNA) is the molecule of life ing recognised as a potential complication in the use of as we know it.1 It contains all information of an entire nanochannels for DNA separation and sequencing. In organism.2 This information is copied during cell divi- such confined geometries DNA knots are created with sion with an extremely high fidelity by the replication appreciable probability, affecting the reliability of these mechanism. Despite the rather high chemical and phys- techniques. Similarly to DNA knots, DNA looping is ical stability of DNA, due to constant action of enzymes intimately connected to the function of DNA. Current and other binding proteins (mismatches, rupture) as well results on DNA looping and DNA knot behaviour are as potential environmentally induced damage (radiation, summarised in the first parts of this review. chemicals), this low error rate, i.e., the suppression of the The Watson-Crick double-helix represents the thermo- liability to mutations, is only possible with the constant dynamically stable state of DNA at moderate salt con- action of repair mechanisms (1; 2; 3; 4). Although DNA’s centrations and below the melting temperature. This structural and mechanical properties are rather well es- stability is effected by Watson-Crick hydrogen bonding tablished for isolated DNA molecules (starting with Ros- and the stronger base stacking of neighbouring base-pairs alind Franklin’s X-ray diffraction images (5)), the charac- (bps). However, even at room temperature DNA locally terisation of DNA in its cellular environment, and even in opens up intermittent flexible single-stranded domains, vitro during interaction with binding proteins, is subject so-called DNA-bubbles. Their size typically ranges from of ongoing investigations. a few broken bps, increasing to some 200 broken bps Recent advances in experimental techniques such as closer to the melting temperature. The thermal melt- fluorescence methods, atomic force microscopy, or opti- ing of DNA has traditionally been used to obtain the cal tweezers have leveraged the potential to both probe sequence-dependent stability parameters of DNA. More and manipulate the equilibrium and out of equilibrium recently, the role of intermittent bubble domains has behaviour of single DNA molecules, making it possible been investigated with respect to the liability of DNA- to explore DNA’s physical and mechanical properties as denaturation induced by proteins that selectively bind to well as its interaction with other biopolymers, such as the single-stranded DNA. It has been speculated that due to DNA-protein interplay during gene regulation or repair the liability to denaturation of the TATA motif bubble processes. An important ingredient is the coupling to formation may add in transcription initiation. The dy- thermal activation due to the highly Brownian environ- namics of single bubbles can be monitored by fluorescence ment. Although mostly performed in vitro, these exper- methods, opening a window to both study the breathing iments provide access to increasingly refined information of DNA experimentally, but also to obtain high precision on the nature of DNA and its environment-controlled be- DNA stability data. Finally, bubble dynamics has been haviour. suggested as a useful tool in optical nanosensing. DNA In addition to chromosomal packaging inside the nu- breathing is the topic of the second part of this work. cleus of eukaryotic cells and the concentration of DNA Essentially all the biological functions of DNA rely on in the membraneless nucleoid region of prokaryotes, the site-specific DNA-binding proteins locating their targets global structure of the DNA molecule can be affected (cognate sites) on the DNA molecule, and therefore re- by topological entanglements. Thus, by error or design quire searching through megabases of non-target DNA in a DNA molecule can attain a knotted or concatenated a highly efficient manner. For instance, gene regulation state, reducing or inhibiting biologically relevant func- is performed by specific regulatory proteins. On binding tions, for instance, replication or transcription. Such en- to a promoter area on the DNA, they recruit or inhibit tangled states can be actively reduced by enzymes of the binding of RNA polymerase and subsequent transcrip- topoisomerase family. Their precise action, in particular, tion of the associated gene. The search for the cognate how they determine the presence of an entangled state, is site is in fact facilitated by the DNA molecule: in addi- not fully known. Current studies therefore aim at shed- tion to three-dimensional search it enables the proteins ding light on possible mechanisms, in particular, in view to also move one-dimensionally along the DNA while be- of the importance of topoisomerase action (or better, its ing non-specifically bound. Moreover, at points where inhibition) in tumour proliferation. Other applications the DNA loops back on itself, this polymeric conforma- may be directed towards the treatment of viral deceases tion provides shortcuts for the proteins in the chemical by modifying the packaging of viral DNA to create knots coordinate along the DNA, approximately giving rise to search-efficient L´evy flights. Target search is currently a very active field of research, and single molecule meth- ods have been shown to provide essential new informa- 1 Our DNA world during biotic and prebiotic evolution was sup- tion. Moreover, the architecture of more complex pro- posedly preceded by an RNA world and, quite likely, by sugarless moters relying on the simultaneous presence of several nucleic acids. 2 A small fraction of genetic information is stored on DNA that regulatory proteins is being investigated to create in sil- is kept at other regions of the cell and not replicated on cell ico circuits for highly sensitive chemical probes in small division, such as mitochondrial or ribosomal DNA. volumes. Such nanosensing applications are expected to 4 be of great importance in microarrays or other nano- and microapplications. The third part of this review deals with diffusional aspects of gene regulation. DNA At the same time DNA’s role in classical polymer physics is increasingly appreciated. With the possibility to reproduce DNA with extremely low error rate by the PCR3, monodisperse samples can be prepared. While shorter single-stranded DNA can be used as a model for flexible polymers, the double strand exhibits a semi- flexible behaviour with a persistence length, that can be easily probed experimentally. Moreover, DNA is orders RNA PROTEIN of magnitude longer than conventional polymers. Com- bined with the potential of single molecule probing, DNA is advancing as a model polymer. After an introduction to the properties of DNA we ad- FIG. 1 Central dogma of molecular biology after F. Crick: dress these functional properties of DNA from the per- Potentially, information flow is completely symmetric between spective of biological relevance, physical behaviour and the three levels of cellular biopolymers (DNA, RNA, pro- nanotechnological potential. Most emphasis will be put teins). However, the recognised pathways are only those rep- on the single molecular aspects of DNA. We note that resented here, where solid lines represent probable transfers, and dotted lines for (in principle) possible transfers (8). this is not intended to be an exhaustive review on the physical properties of DNA. Rather, we present some im- Sugar− Sugar− portant features and their consequences from a personal phosphate Base pairs phosphate perspective. backbone backbone A T P S S Hydrogen bonds II. PHYSICAL PROPERTIES AND BIOLOGICAL P G C FUNCTION OF DNA P S S P Biomolecules, that occur naturally in biological sys- TA P tems, can be grouped into unspecific oligo- and macro- S molecules and biopolymers in the stricter sense (1). Un- S P A T specific biomolecules are produced by biological organ- P isms in a large range of molecular weight and structure, S S such as polysaccharides (cellulose, chitin, starch, etc.), P C G higher fatty acids, actin filaments or microtubules. Also P Base pair S the natural ‘-rubber’ from the Hevea Brasiliensis S tree, historically important for both industrial purposes P G C and the development of polymer physics (7) belongs to P S this group. S Biopolymers in the stricter sense we are going to as- Nucleotide sume here comprise the polynucleotides DNA and RNA FIG. 2 Ladder structure of the DNA formed by its four consisting of the four-letter nucleotide alphabet with building-blocks A, G, C, and T, giving rise to the typical A-T and G-C (A-U and G-C for RNA) bps, and the double-helical structure of DNA. A-T bps establish 2 H- polypeptidic proteins consisting of 20 different amino bonds, G-C bps 3 H-bonds. acids, each coded for by 3 bases (codons) in the RNA (1; 2; 3; 4). We will come back to proteins later when reviewing binding protein-DNA interactions. Biopoly- code stored in the DNA (in some cases in RNA) DNA mers are copied and/or created according to the infor- is copied by DNA polymerase (replication), and the pro- mation flow sketched in figure 1, the so-called central teins as the actually task-performing biopolymers are cre- dogma of molecular biology, a term originally coined by ated via messenger RNA (created by DNA transcription Frances Crick (9). Accordingly, starting from the genetic through RNA polymerase) and further by translation in ribosomes to proteins.4 DNA is made up of the four bases (1; 2; 3; 4; 10;

3 Polymerase Chain Reaction: thermal denaturation of a DNA molecule into two single strands and subsequent cooling in a solution of single nucleotides and invariable primers, produces two new complete double-stranded DNA molecules. Cycling of 4 Alternatively, the genetic code can be transcribed into transfer this process produces large, monodisperse quantities of DNA. and ribosomal RNA that is not translated into proteins. 5

11): A(denine), G(uanine), C(ytosine), and T(hymine) cells it can reach the order of a few m, roughly 2 m in a that form the DNA ladder structure shown in figure 2. human cell and 35 m in a cell of the South American lung- These building-blocks A, G, C, T bp according to the fish, albeit split up into the individual chromosomes (3). key-lock principle as A-T and G-C, where the AT bond DNA in bacteria in vivo, or extracted from bacteria and is weaker than the GC bond in terms of stability. Apart higher cells for our purposes can therefore be viewed a from the Watson-Crick base-pairing energy, the stability fully flexible polymer with a persistence length of roughly of dsDNA is effected by the stacking interactions, the 50 nm, being governed by generic effects independent of specific matching of subsequent bps along the double- the detailed sequence. On short scales DNA becomes strand, i.e., bp-bp interactions. In standard literature, semiflexible and governed by the worm-like chain model the stacking interactions are listed for pairs of bps (e.g., (Kratky-Porod model) (24); on even shorter scales, lo- for AT-GC, AT-AT, AT-TA, etc.), see below.5 cal structural elements become important (in particular, Based on this AGCT alphabet, the primary structure for recognition by binding proteins (1)), and eventually of DNA can be specified. DNA’s six local structural ele- molecular resolution is reached. ments twist, tilt, roll, shift, slide, and rise are effected by Stacking interactions govern the local structure of ds- the stacking interactions between vicinal bps. In figure 3, DNA. Globally, an additional constraint arises due to we show a map with the structure elements of the entire the circular nature of the DNA, since it has to satisfy the E.coli genome, demonstrating the degree of structural in- conservation law (25; 26; 27) formation currently available. These structural elements Lk = Tw+Wr, (1) define the local geometrical structure of DNA within a typical correlation (persistence) length6 of about 150 bps where Lk stands for the linking number, Tw for the twist, corresponding to 50 nm (the bp-bp distance measures 3.4 and Wr for the writhe of the double helix. The linking A,˚ reflecting the rather complex chemical structure of a number Lk is an integer and formally given by one-half nucleotide in comparison to the monomer size of man- the number of signed crossings of one DNA strand with made polymers such as polyethylene) (12; 13; 14; 15). On the other in any regular projection of the molecule. Lk a larger scale, much longer than the persistence length, is a topological property, and no deformation of a closed DNA becomes flexible. On this level, tertiary structural DNA, without breaking and rejoining the DNA strands, elements come into play. One example is DNA looping, will alter it. T w is equal to the number of times that that is the formation of polymeric lasso loops induced the two strands of DNA wind about the central axis of by chemical bonds between binding proteins attached to the molecule, and W r is a number whose absolute value the DNA at specific bps which are remote along the DNA equals approximately the number of times that the DNA backbone (1; 2; 16; 17; 18; 19; 20). An extreme limit of axis winds about itself.8 Whereas Tw is a property of tertiary structure is the packaging of DNA onto histones the double-helical structure of DNA, Wr is a property and further wrapping into the chromosomes of eukary- of the DNA axis alone. Tw and Wr do not need to be otic cells (1; 21; 22). At the same time, dsDNA may integers and are not conserved, but coupled through Lk locally open into floppy ssDNA bubbles, with a persis- by equation (1). A nicked circular DNA, i.e., when the 7 tence length of a few bases. These fluctuation-induced twist can fully relax, carries Lk0 = N/h links, where N is bubbles increase their statistical weight at higher tem- the number of bp and h (h ≃ 10.5 in B-DNA) the number peratures, until the dsDNA fully denatures (melts). We of bps per turn. will come back to DNA denaturation bubbles below. De- The degree of supercoiling of DNA can be expressed in pending on the external conditions, DNA occurs in sev- terms of the linking number difference, ∆Lk = Lk −Lk0. eral configurations. Under physiological conditions, one The DNA of virtually all terrestrial organisms is under- is concerned with B-DNA, but there are other states such wound or negatively supercoiled, i.e., ∆Lk < 0 (figures 4 9 as A, B’, Z, ps, triplex DNA, quadruplex DNA, cruciform, and 5). Often, the superhelical density σ = ∆Lk/Lk0 and H, reviewed, for instance, in (12; 13). DNA occurs is used; most supercoiled DNA molecules isolated from naturally in a large range of length scales. In viruses, either prokaryotes or eukaryotes have σ values between DNA is of the order of a few µm long. In bacteria, it al- −0.05 and −0.07 (29). Negative supercoiling is regulated ready reaches lengths of several mm, and in mammalian in prokaryotes by DNA gyrase; eukaryotes lack gyrase but maintain negative supercoiling through winding of DNA around nucleosomes and interactions with DNA- unwinding proteins. There are two forms of intracellu- 5 Longer ranging bp-bp interactions are most likely small in com- lar supercoiling, the plectonemic form, characteristic of parison. 6 The persistence length of a polymer chain defines the charac- teristic length scale above which the polymer is susceptible to bending induced by thermal fluctuations, i.e., it is the length scale above which the tangent-tangent correlation decays along 8 For details about the calculation of Tw and Wr for representative the chain, see the Appendix. models of DNA, see (28). 7 In fact, it has been questioned whether there is a meaningful 9 An exception are thermophilic organisms living near undersea value of the persistence length of ssDNA at all, due to its signif- geothermal vents that have positively supercoiled DNA in order icant apparent sequence dependence (23). to stabilise the double helix at extreme temperatures. 6

FIG. 3 Structure atlas of the E.coli genome. Figure courtesy David Ussery, Technical University of Denmark. The structure atlas is available under the URL www.cbs.dtu.dk/services/GenomeAtlas/.

DNA and is largely responsible for the extraordinary de- gree of compaction required to condense typical genomes into the cell’s nucleus.10 Negative supercoiling facilitates the local unwinding of DNA by providing a ubiquitous source of free energy that augments the unwinding free energy accompanying the interactions of many proteins with their cognate DNA sequences. The local unwind- ing of DNA, in turn, is an integral part of many bio- logical processes such as gene regulation and DNA repli- cation (see section VI). Therefore, understanding the FIG. 4 Right-handed (negative), normal, and left-handed interplay of supercoiling and local helical structure is (positive) superhelix. The DNA of virtually all terrestrial essential to the understanding of biological mechanisms organisms is negatively supercoiled. (15; 31; 32; 33; 34).

Ribonucleic acid (RNA) consists of the same build- plasmid DNA and accessible, nucleosome-free regions of ing blocks as DNA, with the exception that T(hymine) chromatin, and the toroidal or solenoidal form, where su- is replaced by U(racile) (11). RNA typically occurs in percoiling is attained by DNA wrapped around histone single-stranded form. Therefore, its secondary structure octamers or prokaryotic non-histone DNA-binding pro- teins (figure 6). The former is the active form of su- percoiled DNA and is freely accessible to proteins in- volved in transcription, replication, recombination and 10 The nucleus of a human cell has a radius of circa 5 µm and stores DNA repair. The latter is the stored form of supercoiled the 2 m of the human genome (30). 7

from the glnALG operon (44). The size of DNA loops formed in these systems varies between approximately 100 and 600 bps. In eukaryotes, a variety of transcrip- tion factors bind to enhancers that are hundreds to sev- eral thousand bps away from their promoters and inter- act with RNA polymerases directly or through media- tors in order to achieve combinatorial gene regulation (45). DNA looping is required to juxtapose two recombi- nation sites in intramolecular site-specific recombination (46; 47; 48) and is also employed by a number of re- striction endonucleases such as SfiI and NgoMIV, which recognise and cut two copies of well-separated cognate sites simultaneously (49; 50; 51). Here we describe a re- cent statistical-mechanical theory of loop formation that connects global mechanical and geometric properties of both DNA and protein and demonstrates the importance FIG. 5 Electron micrographs of nicked (left) and supercoiled of protein flexibility in loop-mediated protein-DNA inter- (right) 6996-bp plasmid DNAs. The supercoiled example is from a population of DNA molecules with an average super- actions (52; 53). helix density,σ ¯= -0.027, close to the value expected in vivo.

A. Biological significance of DNA looping

The biological importance of DNA loop formation is underscored by the abundance of architectural proteins in the cell such as HU, IHF, and HMG, which facili- tate looping by bending the intervening DNA between protein-recognition sites (54). Moreover, DNA looping has been shown to be subject to regulation through the binding of effector molecules that alter protein conforma- tion or protein-DNA interactions (55). FIG. 6 Toroidal (left) and plectonemic (right) forms of su- Two characteristics of DNA looping have been demon- percoiled DNA. strated by in vitro and in vivo experiments. One is co- operative binding of a protein to its two cognate sites, which can be demonstrated by footprinting methods (56). is richer, being characterised by sequences of hairpins: DNA looping can increase the occupancies of both bind- Smaller regions in which chemically remote sequences ing sites; in particular, it can significantly enhance pro- of bases match, pair and form hairpins which are stiff tein association to the lower-affinity site because of the and energy-dominated, similar to dsDNA. The remain- tethering effect of DNA looping. This is a general mech- ing regions form entropy-dominated floppy loops, analo- anism by which many transcription factors recruit RNA gous to the ssDNA bubbles. Additional tertiary struc- polymerases in gene regulation. Another hallmark is ture in RNA comes about by the formation of so-called the helical dependence of loop formation (38; 40), which pseudoknots, chemical bonds established between bases arises because of DNA’s limited torsional flexibility and sitting on chemically distant segments of the secondary the requirement for correct torsional alignment of the two structure. In RNA-modelling the incorporation of pseu- protein-binding sites. Although many methods have been doknots is a non-trivial problem, which currently re- developed to directly observe DNA looping in vitro, such ceives considerable interest; see, for instance, references as scanning-probe (44) and electron microscopy (10), and (11; 35; 36; 37). single-molecule techniques (57), assays based on helical dependence have been the only way to identify DNA looping in vivo. In these experiments, the DNA length III. DNA-LOOPING between two protein binding sites is varied and the yield of DNA loop formation is monitored, for example by the The formation of DNA loops mediated by proteins repression or activation of a reporter gene (58). Using bound at distant sites along a single molecule is an es- this helical-twist assay, DNA looping in the ara operon sential mechanistic aspect of many biological processes was first discovered (40). including gene regulation, DNA replication, and recom- Our knowledge about the roles of DNA bending, twist, bination (for reviews, see (38; 39)). In E. coli, DNA and their respective energetics in DNA looping has come looping represses gene expression at the ara, gal, lac, and largely from analyses of DNA cyclisation (38; 59; 60). deo operons (40; 41; 42; 43) and activates transcription Circularisation efficiencies of DNA fragments, which are 8 quantitatively described by J-factors, oscillate with DNA The basis of the extension of the model to DNA looping length and therefore torsional phase (61; 62). The J- (53) is to treat the protein subunits as connected rigid factor is defined as the ratio of the partition function bodies and to allow for a limited number of degrees of of a circularised polymer chain to that of an open chain. freedom between the subunits. Motions of the subunits Since there is a dimension reduction due to circularisation are assumed to be governed by harmonic potentials and constraints (two polymer ends have to meet), the ratio an associated set of force constants, neglecting the an- has a unit of concentration, or 1/L3 with L represent- harmonic terms often required for proteins undergoing ing length; see (52) for details. In the present context, large conformational fluctuations among their modular the J-factor is equal to the free DNA-end concentration domains. Indeed, the use of a harmonic approximation whose bimolecular ligation efficiency equals that of the is supported by the success of continuum elastic mod- two ends of a cyclising DNA molecule (63). For short els that are based only on shape and mass-distribution DNA fragments J-factors are limited by the significant information in descriptions of protein motion (74). Sim- bending and twisting energies required to form closed cir- ilar to the description used for individual DNA bps in cles, whereas for long DNA, the chain entropy loss during the model, protein geometry and dynamics are described circularisation exceeds the elastic-energy decrease and re- by three rigid-body rotation angles (tilt, roll, and twist). duces the J-factor. Because of this competition between Therefore, DNA looping can be viewed as a generalisa- bending and twisting energetics and entropy, there is an tion of DNA cyclisation in which the protein component optimal DNA length for cyclisation (52). Analogous be- is characterised by a particular set of local geometric con- haviour has been expected for DNA looping, especially straints and elastic constants. This treatment not only with respect to the helical dependence discussed above. unifies the theoretical descriptions of DNA cyclisation Quantitative analyses of DNA looping and cyclisation and looping, but also allows consideration of flexibilities are challenging problems in statistical mechanics and at protein-DNA and protein-protein interfaces and appli- have been largely limited to Monte Carlo or Brownian cation of the concepts of linking number and writhe. In dynamics simulations (64; 65; 66; 67; 68). Analytical previous work, proteins were considered rigid and their solutions are available only for some ideal and special effects on DNA configuration were represented by a set cases. An important contribution in this area is the the- of constraints applied to DNA ends (38; 75; 76). With ory of Shimada and Yamakawa (69), which is based on a the present approach, programs developed for analysing homogeneous and continuous elastic rod model of DNA. DNA cyclisation can be used to analyse DNA looping This theory has been applied extensively to DNA cycli- with only minor modifications. sation (61; 70) and also DNA looping (59; 60; 71). The Shimada-Yamakawa theory makes use of a perturbation approach, in which small configurational fluctuations of a DNA chain around the most probable configuration are accounted for in the evaluation of the partition function. The elastic-equilibrium conformation is obvious for the The new method (52; 53) is most applicable to the homogeneous DNA circle studied by Shimada and Ya- problem of short DNA loops, in which the free energy makawa (69). However, the search for the elastic-energy of a wormlike chain is dominated by bending and tor- minimum of homogeneous DNA molecules with complex sional elasticity (52; 53). Possible modes of DNA self con- geometry, such as in DNA looping, supercoiling, and tact and contacts between protein and DNA at positions the case of inhomogeneous DNA sequences containing other than the binding sites are not considered. For large curvature and nonuniform DNA flexibility, is not triv- loops contributions to the free energy from chain entropy ial (41; 72; 73). Recently, a statistical-mechanical theory and DNA-DNA contacts can become highly significant. for sequence-dependent DNA circles has been developed Several alternative treatments of DNA looping have ap- (52) and applied to the problem of DNA cyclisation (52) peared recently. One of these addresses the excluded- and DNA looping (53). In this model, the DNA configu- volume contribution to DNA looping within large open- ration is described by parameters defined at dinucleotide circular molecules (20), whereas two others consider the steps, i.e., tilt, roll, and twist, which allows straightfor- effect on looping of traction at the ends of a DNA chain ward incorporation of intrinsic or protein-induced DNA (77; 78). None of these treatments includes helical phas- curvature at the bp level. Following Shimada and Ya- ing effects on DNA looping. In contrast, a method based makawa’s method, the theory first determines the me- on the Kirchhoff elastic-rod model, which includes the chanical equilibrium configuration in small DNA circles helical-phase dependence, has been presented (76; 79). (i.e., less than ∼ 1000 bp) under certain constraints; fluc- However, this approach does not include thermal fluctu- tuations around the equilibrium configuration are then ations per se and therefore is not directly applicable to taken into account using an harmonic approximation. calculations of the J-factor. The comprehensive treat- The new method is much more computationally efficient ment of small DNA loops described in (52; 53) is thus far than Monte Carlo simulation, has comparable accuracy, unique to the extent that it accounts for sequence- and and has been applied successfully to analyse experimen- protein-dependent conformational and flexibility param- tal results from DNA cyclisation (52). eters, thermal fluctuations, and helical phasing effects. 9

B. DNA loop model

The protein subunits that mediate loop formation are modelled as two identical and connected rigid bodies, as shown in figure 7 (53). There are three additional sets of rigid-body rotation angles that are defined in addition to those for dinucleotide steps: two sets for the interfaces between protein and the last (DP) and first bps (PD) of the DNA and one set for the interface between the two protein domains (PP), where the symbols in parentheses are used to indicate the corresponding angles through subscripts. The local Cartesian-coordinate frames for protein subunits are defined such that their origins co- incide with vertices of a circular chain and their z-axes point toward the next vertex in succession. Thus protein dimensions can be modelled in terms of a non-canonical value for the helix rise corresponding to particular seg- ments within a circular polymer chain. Angles are expressed in degrees, and length in units of the DNA helical rise, ℓbp = 3.4 A.˚ All calculations used canonical mechanical parameters for duplex DNA: helical ◦ twist τ0 = 34.45 , a sequence-independent twist-angle ◦ standard deviation, or twisting flexibility, στ = 4.388 , and standard deviations, or bending flexibilities, for all ◦ tilt and roll angles, σθ and σφ, respectively, of 4.678 (equivalent to a persistence length of 150 bp). Except for specific cases where intrinsic DNA bending is considered, the average values of tilt and roll are taken to be zero.

FIG. 7 Rigid-body models for studies of protein-mediated DNA looping. (a) A prototype 137-bp DNA loop generated by interactions with a pair of rigid, DNA-binding protein sub- C. Simplified protein geometries and flexibility parameters units is shown. DNA bps are represented by rectangular slabs (red) with axes (blue) that indicate the orientation of the lo- cal Cartesian coordinate frame whose origin lies at the centre For DNA loops with either zero or nonzero end-to-end of each bp. Two sets of coordinate axes (green) represent distances, constraints are directly applied to the DNA the local coordinate frames embedded in the protein subunits ends, as in the case of DNA cyclisation. We modelled (gold ellipsoids) that mediate DNA looping. The coupling DNA loops formed during site synapsis using protein- of protein and DNA geometry is characterised by tilt, roll, ◦ dependent parameters roll = φDP = φP D = 90 and and twist values for the DNA-protein, protein-protein, and ◦ twist = τDP = τP D = 34.45 . The angle was consid- protein-DNA interfaces. Three of these variables are shown ered an adjustable parameter that we denote the axial here: the DNA-protein roll angle, φDP ; the protein-protein angle and, unless specified, all other protein-related an- twist angle, τPP ; and the protein-DNA roll angle, φPD. (b) Prototype 179-bp loop with protein-protein twist angle, τ , gular parameters were set equal to 0◦. In these cases the PP equal to −60 degrees. The view is from the base of the loop DNA ends (the centres of two protein-binding sites on toward the DNA apex. (c) Loop conformation shown in (b) DNA) are separated by twice the protein-arm length ℓp viewed from the side, perpendicular to the loop dyad axis. and displaced from one another along the +x direction, or toward the major groove of DNA. Projected along the x-axis, the axial angle is the included angle between the tangents to the DNA at the two protein binding sites and is altered by varying the twist between protein subunits D. DNA loops having zero end-to-end distance and (figure 7 b, c). An axial angle equal to 0◦ corresponds to antiparallel helical axes antiparallel axes at the ends as shown in figure 7a. The case of a rigid protein assembly is modelled by setting the standard deviations of the DP, PP, and PD sets of DNA loops containing N bps in which the two ends rigid-body rotation angles to 1 · 10−8 deg. meet in an antiparallel orientation can be empirically de- 10 scribed by the following formula:

Tilt : θi = −Ai cos(180 + δ) (2)

Roll : φi = Ai sin(180 + δ) 0 Twist : τi = τ where τ 0 is the intrinsic DNA twist and δ an arbitrary an- gle related to the unconstrained torsional degree of free- dom of DNA. The coefficients Ai are given by 1 i A = f , i =0,...,N − 1 (3) i N N − 1   with g(x) , 0 ≤ x ≤ 0.5 f(x)= (4) g(0.5 − x) , 0.5 < x ≤ 1  where

5 i g(x)= ai x , 0 ≤ x ≤ 0.5 . (5) i=1 X The coefficients in equation (5) were obtained by fit- ting the space curve corresponding to the DNA heli- cal axis that gives the minimum elastic energy confor- mation of DNA loops of different sizes and are as fol- lows: a0 = −335.0142, a1 = 2318.881, a2 = −1299.164, a3 = −4483.366, a4 = 38169.74, a5 = −54753.5. The error for end-to-end distances computed using equation (2) is less than 2% of DNA length from 50 bp to 100 bp, and less than 0.5% from 100 bp to 500 bp. The torsional phase angle between two ends is ξ = − (N − 2) τ − 2δ. FIG. 8 Conformation of an antiparallel, 150-bp DNA loop The entire loop lies in a plane, and the angle between with zero end-to-end distance. (a) Computed space-filling the normal vector of the plane and the x-axis of the ex- model of the loop generated with 3DNA (82). The ends of ternal coordinate can be shown to be ψ = 180+ τ − δ. the DNA juxtapose exactly with antiparallel helical axes and The expressions for ξ and ψ suggest that δ is related to exact torsional phasing. (b) Equilibrium roll and magnitude of the loop shown in (a). The bending magnitude of each DNA bending isotropy. Loop configurations with differ- 2 2 ent δ values are related to each other by globally twisting dinucleotide step is defined as pθi + φi where θi and φi are the tilt and roll of i-th dinucleotide step, respectively. DNA molecules. Since the orientation of the first bp is fixed, this global twist is equivalent to rotation of the loop plane, which corresponds to the rotational symme- try met in DNA cyclisation of homogeneous DNA with J-factor occurs at approximately the same DNA length, bending isotropy (52). Therefore, J-factors for configu- or 460 bp (data not shown), as in DNA cyclisation (52). rations with different δ values are identical. This can be partly explained by the fact that the total If DNA looping needs to be torsionally in-phase, only bending magnitude of the loop is 290 degrees, close to a two degenerate loop configurations are available, break- full circle, instead of 180 degrees. ing the rotational symmetry. These loop geometries can be expressed by equation (2) with two different δ values: δ1 = −(N − 2)τ/2 and δ2 = 180 − (N − 2)τ/2, which E. DNA looping with finite end-to-end distance, satisfy the torsional phase requirement ξ = 360 · n, n = antiparallel helical axes, and in-phase torsional constraint 0, ±1, ±2,... In contrast to DNA cyclisation, no twist change is involved in forming these ideal DNA loops for Separation of the DNA ends breaks the rotational sym- any DNA length and thus the helical dependence van- metry, restoring the dependence on helical twist. Figure ishes in this case. From the expression given above for ψ 9a shows the J-factor as a function of DNA length for it is clear that the helical axes of the two loops are coin- end-to-end distances of 10 bp and 30 bp. The helical cident and their directions are reversed. Figure 8 shows dependence increases with end-to-end separation. Start- the bending profile of the loop configuration correspond- ing from the two loop configurations (corresponding to δ1 ing to δ1 for a 150 bp DNA. Surprisingly, the maximal and δ2) with zero end-to-end distance and in-phase tor- 11

sional alignment as initial configurations, two mechani- cal equilibrium configurations are obtained by using the iterative algorithm described in (52). The J-factor in figure 9a is the sum of separate J-factors calculated for the two configurations. Note that in all cases involving configurations that differ in linking number, equilibration between the two forms requires breakage of at least one of the protein-DNA interfaces. The contributions from each of these configurations are shown in detail for the case where the ends are separated by 10 bp. Interestingly, the length dependence of J computed from the individual configurations are out of phase and have a periodicity of 2 helical turns, which results from the half-twist depen- dence of the phase angles δ1 and δ2. However, their sum displays a periodicity of one helical turn. Figures 9 b and c show two such configurations for DNA molecules that are torsionally in-phase (N = 210 bp) or out-of-phase (N = 215 bp). In the case of cyclisation, the helical-phase dependence of the J-factor persists at DNA lengths well beyond that corresponding to the maximum value of J, which lies near 500 bp. This is clearly not the case for DNA looping. In figure 9a, the periodic dependence of J on DNA length for 10-bp end-to-end separation decays nearly to zero well before the maximum J value is reached. Although the periodicity of J is not attenuated quite as strongly for 30- bp end separation, there is less than four-fold variation in the value of J near 300 bp, as opposed to the more than ten-fold variation in cyclisation J-factors expected in this length range. The differences between looping and cyclisation are largely due to substantial differences in the relative contributions of DNA writhe in the two processes, as discussed below.

F. DNA looping in synapsis

Intramolecular reactions of most site-specific recombi- nation systems (46; 47; 48) and a number of DNA restric- tion endonucleases such as SfiI and NgoMIV (49), pro- ceed through protein-mediated intermediate structures in FIG. 9 The DNA-length-dependent J-factor and loop config- which a pair of DNA sites are brought together in space uration as a function of end-to-end separation (the J-factor is and the intervening DNA is looped out. The intermediate defined in section III.A). (a) The helical dependence of DNA nucleoprotein complex involved in site pairing and strand looping is shown for values of the end-to-end separation equal cleavage (and also exchange, in the case of recombinases) to 10 bp and 30 bp. The two configurations for the 10-bp sep- is termed the synaptic complex. In these systems, two aration are obtained from corresponding configurations with characteristic geometric parameters are of interest: the zero end-to-end separation by using an iterative algorithm. average through-space distance between the sites and the Therefore the two configurations are designated by the initial ◦ average crossing angle between the two ends of the loop, configurations with phase angles δ = −(N − 2)τ/2+0 (0 , which we denote the axial angle (see section III.C). The dashed line) and δ = −(N − 2)τ/2 + 180 (180◦, solid line) as latter quantity can be described in terms of the twist an- described in the text. (b) and (c) show stereo models of the two equilibrium configurations for 210-bp (b) and 215-bp (c) gle between the protein domains, τP P (figure 7b), and we antiparallel DNA loops with end-to-end separation equal to use these terms interchangeably. 10 bp. The 210- and 215-bp DNA correspond to an adjacent Figure 10 shows the helical dependence of looping (fig- peak and valley of the curve in (a), respectively. Conforma- ure 10a) and the elastic-minimum configuration of DNA tions shown in blue correspond to δ = 0; those shown in red loops (figure 10b) for different values of the axial angle. ◦ are for δ = 180 . Note that for N-bp DNA, the chain contour The most prominent feature of these results is that the length is equal to (N − 1)ℓbp. phase of the helical dependence is shifted as a function 12 of the axial angle, characterised by a relative global shift of the curve along the x-axis. This implies that DNA looping does not always occur most efficiently when two sites are separated by an integral number of helical turns, as has been suggested for some simple DNA looping sys- tems studied previously. The axial angle also globally modulates J-factors, which is apparent from the verti- cal shift in the J versus length curve and effects on the amplitude of the helical dependence. The torsion-angle- independent value of J, averaged over a full helical turn, decreases with increasing axial angle, whereas the am- plitude of the helical dependence increases. The above observations can be qualitatively explained by analogous results from DNA cyclisation. As in cyclisation, DNA forms loops most efficiently when the number of helical turns in the loop is close to an integer value. It is there- fore appropriate to consider this issue in terms of the linking number for the looped conformation, Lk, which involves contributions from the geometries of both the protein and DNA.

We define the loop helical turn Ht,loop as the sum of the DNA twist and the twist introduced by the protein subunits, divided by 360. Therefore, changing the twist angle, the axial angle will shift the phase of the helical dependence relative to that of the DNA alone. For a loop with N = 179 bp and τP P = 0, the total twist is simply equal to that for the DNA loop. Because this loop has 17.0 helical turns, only one loop topoisomer contributes to the J-factor. The value of J is a local maximum at τP P = 0 and, as shown in figure 11a, decreases mono- tonically for both τP P > 0 and τP P < 0. Contributions to J from other topoisomers of the 179-bp loop are less ◦ ◦ than 5 percent over the range −135 < τP P < +120 . The twist for the planar equilibrium conformation of a 173-bp loop is 16.5 helical turns; thus there are two alter- FIG. 10 Dependence of the J-factor on axial angle (the J- native loops that can be efficiently formed (figure 11a): factor is defined in section III.A, and the axial angle is defined either a loop with Ht,loop = 17.0 and τP P > 0, or a loop as the average crossing angle between the two ends of the loop, with Ht,loop = 16.0 and τP P < 0. The J value at τP P =0 see section III.C). (a) DNA-length dependence of J for axial is a local minimum and there is a bimodal dependence angles of 0◦, 60◦, and 120◦ with the end-to-end separation set on axial angle for loops in which the DNA twist is half- equal to 40 bp. Note that the positions of the extrema shift to integral. We investigated the phase shift of the J-factor the left with increasing values of the axial angle. (b) Stereo and found that this quantity is a non-linear function of models of minimum elastic-energy conformations of 179-bp the axial angle. From figure 10a, the calculated phase loops colour coded in accord with the corresponding axial- shifts for 60◦ and 120◦ axial angles relative to 0◦ are ap- angle values in (a). proximately 52◦ and 103◦, respectively. Moreover, the local maxima for the total J curve for N = 173 shown in figure 11a are located at −58.5◦ and 63◦, positions that the DNA. The relative contributions of loop writhe and are not in agreement with predicted angle values based twist for the Lk = 16 topoisomer of a 173-bp loop are ◦ ◦ solely on Ht,loop (−166 and 194 , respectively). shown as a function of axial angle in figure 11b. These deviations can be explained by the fact that In figure 11c, we plot the axial-angle-dependent values writhe makes an important contribution to the overall of the bending and twisting free energies for the Lk = 16 Lk for the loop. This aspect of DNA looping is dra- topoisomer and their sum, which is the total elastic-free matically different from that in the cyclisation of small energy of the loop. The minimum value of the total elas- ◦ DNA molecules. The conformations of small DNA cir- tic energy occurs at τP P = −58.5 , coincident with the cles are close to planar and the writhe contribution is position of the J-factor maximum for this topoisomer small relative to DNA twist (52; 67; 80; 81). In the case (figure 11a). This mechanical state can be achieved with of protein-mediated looping, nonzero values of the axial very little twist deformation of the loop, but at the ex- angle impose an intrinsically nonplanar conformation on pense of significant bending energy. Further reduction of 13

way that the loop geometry can compensate for this is through twist deformation. This asymmetry arises be- cause we are considering the contribution of only one loop topoisomer to the elastic free energy.

G. Conclusion

The statistical-mechanical theory for DNA looping dis- cussed above (52; 53) suggests that the helical depen- dence of DNA looping is affected by many factors and leads to the conclusion that whereas a positive helical- twist assay can often confirm DNA looping, a negative result cannot exclude DNA looping. Since it is difficult to explore the architecture of DNA loops with current experimental techniques, this theory will be useful for more reliably analysing DNA looping with limited exper- imental data. The model has advantages over previous approaches based exclusively on DNA mechanics, partic- ularly when protein flexibility is taken into account. In these cases, entropy effects become important and are responsible for the observed decay of looping efficiency with DNA length.

IV. DNA KNOTS AND THEIR CONSEQUENCES: ENTROPY AND TARGETED KNOT REMOVAL

Bacterial DNA occurs largely in circular form. No- tably, instead of a simply connected ring shape (the un- knot), the DNA often exhibits permanently entangled FIG. 11 J-factor, loop-geometry parameters, and elastic-free states, such as catenated and knotted DNA. An example energies as functions of axial angle; compare figure 10. (a) for a DNA trefoil knot is shown in figure 12. Such config- J-factor values for loop topoisomers corresponding to 179-bp urations have potentially devastating effects on the cell and 173-bp loops in figure 10. The principal contribution to development. Conversely, however, knots might have de- J for N = 179 bp comes from a single loop topoisomer with signed purposes in gene regulation, separating different Lk = 17. For N = 173 bp, the overall J-factor is the sum of contributions from two loop topoisomers with Lk values regions of the genome, or, alternatively, locking chemi- of 16 and 17, generating a bimodal dependence of J on axial cally remote parts of the genome proximate in geomet- angle as described in the text. (b) Excess helical twist, ∆Ht, rical space. In eukaryotic cells additional topological ef- and writhe of the loop formed by the Lk = 16 topoisomer fects occur in the likely entanglement of individual chro- for N = 173 bp as a function of axial angle. Excess twist mosomes. Here, we concentrate on the prokaryotic case. is computed from the expression Ht,loop − 16, where Ht,loop is the loop helical turn value described in the text, and de- pends linearly on the axial angle. The writhing number of the A. Physiological background of knots loop was calculated using the method of Vologodskii (65; 83). (c) Elastic-free energies of the Lk = 16 loop topoisomer for The discovery how one can use molecular biological N = 173 bp calculated according to equation 38 of Zhang and tools to create knotted DNA resolved a long-standing ar- Crothers (52). The individual contributions of bending and gument against the Watson-Crick double helix picture of twisting energies are shown along with their sum. DNA (12), namely that the replication of DNA could not work as the opening up of the double helix would pro- duce a superstructure such that the two daughter strands the axial angle requires even less twisting energy; how- could not be separated. In fact, the topology of both ss- ever, the bending energy increases monotonically. In con- DNA and dsDNA is continuously changed in vivo, and ◦ trast, for τP P > −58.5 , somewhat less bending energy is this can readily be mimicked in vitro, although the ac- required, but the twisting energy begins to increase sig- tivity of enzymes in vivo is much more restricted than in nificantly with increasing axial angle. Since the sense of vitro (85; 86): Different concentrations of enzymes ver- the bending deformation for τP P > 0 opposes the needed sus knotted DNA molecules accessible in vitro, that is, reduction in loop linking number, the elastic energy can- makes it possible to probe topology-altering effects by not be decreased by increasing the axial angle. The only enzymes which in vivo do not contribute to such effects. 14

(A) Topo I Topo IV Topo IV

Gyrase Gyrase

(B)

Topo IV

FIG. 12 Electron microscope image of a DNA trefoil knot, from (84). c Science, with permission.

(C)

Although it would be likely with a probability of Topo IV 1 roughly 2 that the linear DNA injected by bacterio- phage λ into its host E.coli would create a knot before cyclisation, it turned out to be difficult to detect (12). First studies therefore concentrated on the fact that un- der physiological conditions knots are introduced by en- zymes, DNA replication and recombination, DNA repair, FIG. 13 Enzymes changing the topology of dsDNA by cutting and pasting of one or both strands (example for E.coli ): (A) and topoisomerisation, using these enzymes to prove Torsional stress resulting from the Lk deficit causes the DNA both knotting and unknotting (84; 87; 88; 89; 90; 91; 92). double helix to writhe about itself (negative supercoiling). In DNA-knotting is also prone to occur behind a stalled E.coli , gyrase introduces negative supercoils into DNA and replication fork (93; 94). Some of the typical topology- is countered by topoisomerase I (topo I) and topo IV, which altering reactions undergoing in E.coli are summarised relax negative supercoils. (B) Topo IV unlinks catenanes gen- in figure 13. Knots can efficiently be created from erated by replication or recombination in vivo. (C) Topo IV nicked11 dsDNA under action of topoisomerase I at non- unknots DNA in vivo. After (85). physiological concentrations (95). Another possibility is by active packaging of a DNA mutant into phage cap- sids (96), and then denaturing the capsid proteins. Both space. Finally, knots may lead to double-strand breaks, methods produce a distribution of different knot types. as they weaken biopolymers considerably due to creation They can be separated by electrophoresis (97). of localised sharp bends (101; 102; 103; 104) as well as The existence of DNA-knots has far-reaching effects macroscopic lines and ropes (105).12 on physiological processes, and knottedness of DNA has Above we said that knots can be introduced, inter alia, therefore to be eliminated in order to maintain proper by the different enzymes of the topoisomerase family. To functioning of the cell. Among other possible effects, it remove a knot from a dsDNA, it is necessary to cut both is immediately clear that the presence of a knot in a cir- strands, and then pass one segment through the created cular DNA impedes replication of the DNA, i.e., the full gap, before resealing the two open ends. In vivo, this separation of the two daughter strands (1; 12). More- is usually achieved by topoisomerases II and IV. A re- over, even transcription is impaired (98). The presence construction of topo II is shown in figure 14, indicating of knots inhibits the assembly of chromatin (99), knotted the upper clamp holding a segment of the DNA, while chromosomes cannot be separated during mitosis (1), and the bulge-clamp introduces the cut through which the knots in a chromosome may serve as topological barri- upper segment is passed. In the figure, the segment vis- ers between different sections of chromosomes, such that ible in the pocket of the lower clamp has already been the genomic structural organisation is altered, and cer- passed through the gap. After resealing, topo II de- tain sections of the chromosomal DNA may no longer taches. This process requires energy, provided by ATP. interact (100). Conversely, it is conceivable that knots, Notably, topo II is extremely efficient, for circular ds- analogously to protein induced DNA looping, lock re- mote segments of the genome close together in geometric

12 The weakness of strings at the site of the knot can be experienced easily by pulling apart a linear nylon string in comparison to a 11 One of the two strands is cut. knotted one (102). 15

nection with the Bridges of K¨onigsberg problem (112), determining a closed path by crossing each K¨onigsberg bridge exactly once. However, the first investigations of topological problems in modern science is most proba- bly due to Kepler, who studied surface tiling to great detail (therefore the notion of Kepler tiling in mathe- matical literature) (113). Further initial steps were due to Leibniz, Vandermonde and Gauss, in whose collection of papers drawings of various knots were found13 whose linking (‘Umschlingungen’=windings) number is indeed a knot invariant (114; 115; 116). Gauss’ student, List- ing, in fact introduced the term ‘topology’, and his work on knots may be viewed as the real starting point of knot theory (117), although his complexions number was proved by Tait not to be an invariant. Inspired by Helmholtz’ theory of an ideal fluid and building on Listing’s early contributions to knot the- ory, Scotsmen and chums Maxwell, Tait and Thom- son (Lord Kelvin) started to discuss the possible im- plications of knottedness in physics and chemistry, ul- timately distilled into Thomson’s theory of vortex atoms FIG. 14 Topoisomerase II. This enzyme can actively change (118; 119). Out of this endeavour emerged Tait’s inter- the topology of DNA by cutting the double-strand and pass- est in knots, and he devoted most of his career on the ing another segment of double-stranded DNA through the gap classification of knots. Numerous charts and still un- before resealing it. The image depicts a short stretch of DNA resolved conjectures on knots document his pioneering (horizontally at the bulge of the enzyme, as well as another work (120; 121; 122; 123). The studies were carried on segment in the lower clamp (perpendicular to the image) after by Kirkman and Little (124; 125; 126; 127). A more de- passage through the gap from the upper clamp. This mech- tailed historical account of knot theory may be found in anism makes sure that no additional strand passage through the review article by van de Griend (128), and on the St. the open gap can take place (110; 111). Figure courtesy James Andrews history of mathematics webpages14. M Berger, UC Berkeley. Planar projections of knots were rendered unique by Listing’s introduction of the handedness of a crossing, DNA of length ≃ 10 kbp it was found that topo reduced i.e., the orientational information assigned to a point the knotted state in between 50 and 100-fold, in com- where in the projection two lines intersect. With this parison to a ‘dumb’ enzyme, which would simply pass information, projections are the standard representation segments through at random (106). We note that the for knot studies. On their basis, the minimum number of step-wise action of topoisomerase II was recorded in a crossings (‘essential crossings’) can be immediately read single molecule setup using magnetic tweezers (107; 108). off as one of the simplest knot invariants. To arrive at Topoisomerases are surveyed in the review of (109). the minimum number, one makes use of the Reidemeister moves, three fundamental permitted moves of the lines in a knot projection, as shown in figure 15. More com- plex knot invariants include polynomials of the Alexan- B. Classification of knots der, Kauffman and HOMFLY types (114; 115; 116).15 Here, we will only employ the number of essential cross- Knottedness can only be defined on a closed (circular) ings as classification of knots, in particular, we do not chain. This is intuitively clear as in an open linear chain concern ourselves with the question of degeneracy for a a knot can always be tied, or an existing knot released. given knot invariant. However, the bookkeeping of knot Mathematically, this means that knot invariants are only well-defined for a closed space-curve. However, a linear chain whose ends are permanently attached to one, or two walls, or whose ends are extended towards infinity, can 13 Probably copies from an English original. be considered as (un)knotted in the proper mathematical 14 The MacTutor History of Mathematics archive, URL: sense, i.e., their knottedness cannot change. In a loser http://turnbull.mcs.st-and.ac.uk/∼history/ sense, we will also speak of knots on an open piece of 15 These polynomials all start to be degenerate for higher order DNA, appealing to intuition. knots, i.e., above a certain knot complexity several knots may correspond to one given polynomial (114; 115). In the case of the The classification of knots, or graphs in general, in simpler knots attained in most DNA configurations and in knot terms of invariants can essentially be traced back to Eu- simulations, the Alexander polynomials are unique, in contrast to ler, recalling his graph theoretical elaboration in con- the Gauss or Edwards invariant, compare, e.g., reference (129). 16

n

N−n

FIG. 16 Figure-eight structure, in which a slip-link separates two loops of size n and N − n, such that they can freely exchange length among each other, but none of the loops can completely retract from the slip-link. On the right, a schematic drawing of a slip-link, which may be thought of as a small belt buckle.

required to form a knot K, without the closing segment (145). The tendency towards knotting during polymer cyclisation creates problems in industrial and laboratory FIG. 15 The three Reidemeister moves. All topology- processes. preserving moves of a knot projection can be decomposed into these three fundamental moves. D. Entropic localisation in the figure-eight slip-link structure. types is vital in knot simulations. To obtain a feeling for how and when entropy leads to the localisation of a permanently entangled structure, C. Long chains are almost always entangled. we consider the simplest polymer object with non-trivial (non-unknot) geometry, the figure-eight structure (F8) During the polymerisation and final cyclisation of a displayed in figure 16. In this compound, a pair con- polymer grown in a solvent under freely floating condi- tact is enforced by a slip-link, separating off two loops tions, a knot is created with probability 1. This Frisch- in the circular polymer, such that none of the loops can Wassermann-Delbr¨uck conjecture (130; 131) could be fully retract, and both loops can freely exchange length mathematically proved for a self-avoiding chain (132; among each other. We denote the loop sizes by n and 133), compare also (134). This is consistent with nu- N − n, where N is the (conserved) total length of the merical findings that the probability of unknot forma- polymer chain. For such an object, we can actually per- tion decreases dramatically with chain length (129; 135). form a closed statistical mechanical analysis based on Indeed, recent simulations results indicate that the prob- results from scaling theory of polymers, and compare the ability of finding the unknot in such a cyclisised polymer result with Monte Carlo simulations of the F8. decays exponentially with chain length (136; 137; 138; The statistical quantities that are of particular interest 139): are the gyration radius, Rg, and the number of degrees of N freedom, ω (146). Rg, as defined in equation (61), mea- P∅(N) ∝ exp − . (6) sures the root mean squared distance of the monomers N  c  along the chain to the gyration centre, and is therefore However, there exist theoretical arguments and simula- a good measure of its extension. It can, for instance, be tions results indicating that the characteristic number measured by light scattering experiment. The degrees of of monomers Nc occurring in this relation may become freedom ω count all possible different configurations of surprisingly large (140; 141; 142; 143). The probability the chain. For a circular polymer (i.e., a polymer with to find a given knot type K on random circular poly- r(0) = r(N)), the gyration radius becomes mer formation has been fitted with the functional form ν (143; 144; 145) Rg ≃ AN (8)

c b N with exponent ν = 1/2 for a Gaussian chain, and ν = P (N)= a N − N exp − , (7) K 0 d 0.588 in the 3D excluded volume case (ν = 3/5 in the    Flory model, and ν = 3/4 in 2D). Whereas in 2D this where a, b, and d are free parameters depending on K, scaling contains truly a ring polymer, in 3D the exponent and c ≈ 0.18. N0 is the minimal number of segments ν emerges from averaging over all possible topologies, and 17

necessarily includes knots of all types (146; 147; 148). For a circular chain, the number of degrees of freedom contains the number of all possible ways to place an N- step walk on the lattice with connectivity µ (e.g., µ =2d on a cubic lattice in d dimensions), µN , and the entropy loss for requiring a closed loop, N −dν , involving the same Flory exponent ν. For the Gaussian case, we recognise in this entropy loss factor the returning probability of the random walk. In the excluded volume case, N −dν is an analogous measure (24; 146; 150). Thus, for a circular chain embedded in d-dimensional space, the number of degrees of freedom is

ω ≃ µN N −dν . (9)

Let us evaluate these measures for the F8 from figure 16. FIG. 17 Bead-and-tether chain used in Monte Carlo simula- As a first approximation, consider the F8 as a Gaus- tion, showing a typical equilibrium configuration for a self- sian (phantom) random walk, demonstrating that, like avoiding chain: the localisation of the smaller loop is distinct. in the charged knot case (149), entropic effects give rise Note that in this 2D simulation the slip-link is represented by the three tethered black beads. to long-range interactions. The two loops correspond to returning random walks, i.e., the number of degrees of freedom for the F8 in the phantom chain case becomes loose 16 large ( ) loop, compare figure 17. This can be quan- (24; 146; 151) tified in terms of the average size of the smaller loop,

N −d/2 −d/2 ωF8,PC ≃ µ n (N − n) , (10) L/2 hℓi< ≡ 2 ℓp(ℓ)dℓ . (12) where d is the embedding dimension. We note that nor- Za malisation of this expression produces the probability In d = 2, we obtain density function for finding the F8 with a given loop size ℓ = na (L = Na), L hℓi< ∼ , (13) −d/2 −d/2 | log(a/L)| pF8,PC(ℓ) ≃ N ℓ (L − ℓ) , (11) such that with the logarithmic correction the smaller loop N where denotes a normalisation factor. The conversion is only marginally smaller than the big one. In contrast, from expressing the chain size in terms of the number of one observes weak localisation monomers to its actual length is of advantage in what fol- 1/2 1/2 lows, as it allows to more easily keep track of dimensions. hℓi< ∼ a L (14) Here, we use the length unit a, which may be interpreted as the monomer size (lattice constant), or as the size of in d = 3, in the sense that the relative size hℓi 4 one To classify different grades of localisation, we follow encounters hℓi< ∼ a, corresponding to strong localisation, the convention from references (152; 153). The average as the size of the smaller loop does not depend on L L−a but is set by the short-distance cutoff a. Above four loop size hℓi determined through hℓi = a ℓp(ℓ)dℓ is trivially hℓi = L/2 by symmetry of the structure. Here, dimensions, excluded volume effects become negligible, we introduce a short-distance cutoff setR by the lattice and therefore both Gaussian and self-avoiding chains are constant a. However, as the probability density func- strongly localised in d ≥ 4.17 tion is strongly peaked at ℓ = 0 and ℓ = L, the two To include self-avoiding interactions, we make use of poles caused by the returning probabilities, and there- results for general polymer networks obtained by Du- fore a typical shape consists of one small (tight) and one plantier (147; 154), which are summarised in the ap- pendix at the end of this review. In terms of such net- works, our F8-structure corresponds to the following pa- rameters: the number N = 2 of polymer segments with 16 Here and in the following we consider two configurations of a lengths s1 = ℓ = na and s2 = L − ℓ = (N − n)a, form- polymer chain different if they cannot be matched by transla- ing L = 2 physical loops, connected by n4 = 1 vertex tion. In addition, the origin of a given structure is fixed by a vertex point (see below), i.e., a point where several legs of the polymer chain are joint. In the F8-structure, this vertex natu- rally coincides with the slip-link. For a simply connected ring polymer, such a vertex is a two-vertex anywhere along the chain. 17 Consideration of higher than the physical 3 space dimensions is often useful in polymer physics. 18 of order four. By virtue of equation (79), the number of 600 configurations of the F8 with fixed ℓ follows the scaling form 500 ℓ ω (ℓ) ≃ µL(L − ℓ)γF8−1X , (15) 400 F8 F8 L − ℓ   300 with the configuration exponent γF8 = 1 − 2dν + σ4. In the limit ℓ ≪ L, the contribution of the large loop 200 in equation (15) should not be affected by a small ap- pendix, and therefore should exhibit the regular Flory 100 scaling ∼ (L − ℓ)−dν (155; 156; 210). This fixes the scal- ing behaviour of the scaling function X (x) ∼ xγF8−1+dν F8 0 in this limit (x → 0 in dimensionless variable x), such 0 2e+07 4e+07 6e+07 8e+07 that FIG. 18 Monte Carlo simulation of an F8-structure in 2D: L −dν −c ωF8(ℓ) ≃ µ (L − ℓ) ℓ , ℓ ≪ L, (16) loop sizes ℓ and L − ℓ as a function of Monte Carlo steps for a chain with 512 monomers. The symmetry breaking after the where c = −(γF8 −1+dν)= dν −σ4. Using σ4 = −19/16 symmetric initial condition is distinct. and ν =3/4 in d = 2 (147; 154), we obtain 100 c = 43/16=2.6875, d =2. (17) slope=−2.68

In d = 3, σ4 ≈ −0.48 (155; 156; 157) and ν ≈ 0.588, so −1 that 10 c ≈ 2.24. (18) 10−2 In both cases the result c > 2 enforces that the loop of length ℓ is strongly localised in the sense defined above. This result is self-consistent with the a priori assumption 10−3 ℓ ≪ L. Note that for self-avoiding chains, in d = 2 the localisation is even stronger than in d = 3, in contrast to the corresponding trend for ideal chains. 10−4 We performed Monte Carlo (MC) simulations of the 1 10 100 2D figure-eight structure, in which the slip-link was rep- resented by three tethered beads enforcing a sliding pair FIG. 19 Power-law fit to the probability density function of contact such that the loops cannot fully retract (see fig- the smaller loop. The fit produces a slope of -2.68, in excellent ure 18). We used a 2D hard core bead-and-tether chain agreement with the calculated value. with 512 monomers, starting off from a symmetric initial condition with ℓ = L/2. Self-crossings were prevented by keeping a maximum bead-to-bead distance of 1.38 times ted chains. Before going further into the theoretical mod- the bead diameter, and a maximum step length of 0.15 elling of knotted chains, we report some of the results times the bead diameter. As shown in figure 19, the size based on simulations studies of both Gaussian and self- distribution for the small loop can be fitted to a power avoiding walks.18 Such simulations either start with a law with exponent c =2.68 in good agreement with equa- given knot configuration and then perform moves of spe- tion (17). cific segments, each time making sure that the topol- An experimental study of entropic tightening of ogy is preserved; or, each new configuration emanates a macroscopic F8-structure was reported in reference from a new random walk, whose correct topology may (158). There, a granular chain consisting of hollow steel be checked by calculating the corresponding knot invari- spheres connected by steel rods was once twisted and ant, usually the Alexander polynomial, and created con- then put on a vibrating table. From digital imaging, the figurations that do not match the desired topology are distribution of loop sizes could be determined and com- discarded. We note that it is of lesser significance that pared to a power-law with index 43/16 as calculated for knot invariants such as the Alexander polynomials in fact the 2D excluded volume chain. The agreement was found are no longer unique for more complex knots, because for to be consistent (158). typical chain lengths with the highest probability simpler

E. Simulations of entropic knots in 2D and 3D. 18 Although per se a Gaussian chain cannot have a fixed topology Much of our knowledge about the interaction of knots due to its phantom character, such simulations introduce a fixed with thermal fluctuations is based on simulations of knot- topology by rejecting moves that result in a different knot type. 19

knots are created, for which the invariants are unique. by the size of the tight knot is a confluent effect, such For more details we refer to the works quoted below. that the confluent exponent ∆ should be related to the In fact, the fixed topology turns out to have a highly size distribution of the knot region. Not surprisingly, non-trivial effect on chains without self-excluded volume. the connectivity factor µK ≈ 4.68 was found to be in- As conjectured in (159), a Gaussian circular chain, whose dependent of K, assuming the standard value for a cu- permitted set of configurations is restricted to a fixed bic lattice (167). Also the amplitude AK and the expo- topology, will exhibit self-avoiding behaviour. This was nent ∆K of the confluent correction turned out to be K- proved in a numerical analysis in (148). The required independent. We note that a similar analysis in (pseudo) number of monomers to reach this self-avoiding expo- 2D20 also strongly points towards tight localisation of the nent was estimated to be of the order of 500. Keeping knot (168). this non-trivial scaling of a Gaussian chain at fixed topol- In contrast to the above results, 3D simulations under- ogy in mind, knot simulations on the basis of phantom taken in (169) (also compare (170)) show the dependence Gaussian chains were performed in (160), always making 3/5 −4/15 sure that the configurations taken into the statistics fulfil Rg ≃ N C (22) the desired knot topology. The dependence of the gyration radius Rg on the knot of the gyration radius on the knot type, characterised type was investigated for simpler knots in 3D in reference by the number C of essential crossings. Rg, that is, (139). On the basis of the expansion decreases as a power-law with C, where the exponent −4/15 = 1/3 − ν (169). The functional form (22)) was 2 −∆ −1 2νK Rg ≃ AK 1+ BKN + CKN + o(1/N) N , (19) derived from a Flory-type argument for a polymer con-   struct of C interlocked loops of equal length N/C by ar- including a confluent correction (139; 161; 162) in com- guing that each loop occupies a volume ≃ (N/C)3ν , and parison to the standard expression (8), it was found that the volume of the knot is given by V ≃ C(N/C)3ν (i.e., the Flory exponent νK is independent of the knot type assuming that due to self-avoiding repulsion the volume K and has the 3D value 0.588. This was interpreted via of individual loops adds up to the total volume). Equa- a localisation of knots such that the influence of tight tion (22) then follows immediately. This model of equal knots on Rg is vanishingly small. In fact, ∆ is of the loop sizes is equivalent to a completely delocalised knot. order of 0.5 according to the investigations in references It may therefore be speculated, albeit rather long chain (161; 162; 163; 164). Based on longer chains in compar- sizes of up to 400 were used, whether the numerical algo- ison to reference (139), the study of (165) thus corrobo- rithm employed for the simulations in (169) causes finite- rates the independence of νK ≈ 0.588 of the knot type K. size effects that, in turn, prevent a knot localisation. We In recent AFM experiments analysing single DNA knots, note that the Flory-type scaling assumed to derive ex- ν the Flory scaling Rg ≃ N was confirmed for both simple pression (22) is consistent with a modelling brought for- and complex knots (166). ward in reference (171), in which the knot is quantified For the number of degrees of freedom ωK, it was found by the aspect ratio in a configuration corresponding to a 19 for the form maximally inflated tube with the given topology (i.e., a state corresponding to complete delocalisation). In refer- αK−2 N BK ωK ≃ AKN µ 1+ + ... (20) K N ∆K ence (169), the temporal relaxation behaviour of a given   knot was also studied. While regular Rouse behaviour with confluent corrections, that while for the unknot with was found for the case of the unknot, the knotted chains α∅ ≈ 0.27 expression (20) is consistent with the standard displayed somewhat surprising long time contributions to result (9) ([0.27 − 2]/3 ≈ −0.58 ≈ ν), for prime knots the relaxation time spectrum (7; 172; 173; 174), a phe- αK = α∅ + 1, and for composite knots with Nf prime nomenon already pointed out by de Gennes within an ac- components, tivation argument to create free volume in a tight knot in α = α + N . (21) order to move along the chain (175). Note that relatively K ∅ f lose knots in shorter chains do not appear to exhibit such This finding is in agreement with the view that each extremely long relaxation time behaviour (176). prime component of a knot K is tightly localised and Simulation of a 3D knot with varying excluded vol- statistically able to move around one central loop, each ume showed, if only the excluded volume becomes large prime component counting an additional factor N of de- enough, the gyration radius of the knot is independent grees of freedom. The fact that for a chain of finite thick- of the knot type (177). The picture of tight knots is fur- ness the size of the big central loop is in fact diminished ther corroborated in the study by Katritch et al. using

19 Note that we changed the exponent by 1 in comparison to the 20 The simulated polymer chain moves in 2D, however, crossings original work, making the counting of non-translatable configu- are permitted at which one chain passes underneath another. rations consistent with the counting convention specified in foot- In that, the simulated polymers are in fact equivalent to knot note 16. projections with a certain orientations of individual crossings. 20 a Gaussian chain model with fixed topology to demon- a consistent result by a variant of the knot scale method. strate that the size distribution of the knot is distinctly Another recent study uses a more realistic model for a peaked at rather small sizes (144). polymer chain, namely, a simplified model of polyethy- Apart from determining the statistical quantities Rg lene; with up to 1000 monomers in the simulation, the and ωK from simulations, there also exist indirect meth- exponent t ≈ 0.65 is found (and delocalisation is obtained ods for quantifying the size of the knot region in a knotted in the dense phase) (184). polymer. One such method is to confine an open chain Thus, there exist simulations results pointing in both containing a knot between two walls, and measuring the directions, knot localisation and delocalisation. As the finite size corrections of the force-extension curve due to latter may be explained by finite size effects, it seems the knot size. This is based on the idea that the gyration likely that (at least simple) knots in 2D and 3D localise radius for a system depending on more than one length in the sense that the knot region occupies a portion of scale (i.e., apart from the chain length N) shows above the chain that is significantly smaller in comparison to mentioned confluent corrections, such that (178) the entire chain. In particular, this would imply that the average size of the knot region hℓi scales with the chain N N R = AN ν Φ 0 , 1 ,... ≃ AN ν 1 − BN −∆ (23) length Na with an exponent less than one, such that g N N     hℓi when only the largest correction is considered, and in 3D lim =0. (26) N→∞ Na ∆ ≈ 0.5 is supposed to be universal (161; 162; 163; 164). If this leading correction is due to the argument N0/N Below, we show from analytical grounds that such a lo- in the scaling function Φ, the length scale N0 depends t calisation is a natural consequence of interactions of a on N through the scaling N0 ∼ N with ∆ = 1 − t. chain of fixed topology with fluctuations. We note, how- From Monte Carlo simulations of a bead and tether chain ever, that conclusive results for knot localisation may model, it could then be inferred that the size of the knot in fact come from experiments: Manipulation of single scales like (178) chains such as DNA can be performed for rather long chains, making it possible to reach beyond the finite-size N ∼ N t, t =0.4 ± 0.1 (24) k corrections inherent in, e.g., the force-extension simula- This, in turn, enters the force-extension curve f ′ = G(R′) tions mentioned above. The aforementioned AFM stud- ′ ν ies on single DNA knots indeed reveal knot localisation with the dimensionless force f = fAN /(kBT ) and dis- tance R′ = R/(AN ν ) of the walls, in the form with con- of flattened knots (166); due to experimental limitations, fluent correction presently only one DNA length was investigated, such that the scaling exponent t currently cannot be obtained. f ′ ≃ G(R′) 1+ g(R′)N −∆ . (25) Before proceeding to these analytical approaches, we note that there have also been performed simulations of   From the simulation, t = 0.4 corresponds to the best knotted chains under non-dilute conditions (185; 186). data collapsing, assuming the validity of the scaling ar- In (pseudo) 2D, these have found delocalisation of the guments. An argument in favour of this approach is the knot, i.e., limN→∞hℓi/N = const. We come back to consistency of the exponent t = 0.4 with the inferred these simulations below in connection with the modelling ∆=0.6, which is close to the known value. Note that of dense and Θ-knots. the force-extension of a chain with a slip-link was dis- cussed in reference (179) and shown that a loop sepa- rated off by a slip-link is confined within a Pincus-de F. Flattened knots in dilute and dense phases. Gennes blob. We also note that results corresponding to delocalisation in force-size relations were reported in Analytically, knots are a hard problem to tackle. Sta- (180; 181). An entropic scale was conceived in (182): tistical mechanical treatments of permanently entangled Separating two chains with fixed topology but allowing polymers are so difficult to treat since topological restric- them to exchange length (e.g., through a small hole in a tions cannot be formulated as a Hamiltonian problem but wall) would enable one to infer the localisation behaviour appear as hard constraints partitioning the phase space of a knot by comparing the equilibrium balance of this (24; 146; 187; 188).21 A segment of a 3D knot, in other knot with a slip-link construct of known degrees of free- words, can move without feeling the constraints due to dom until the average length on both sides coincides. the non-trivial topology of a knotted state, until it actu- The preliminary results in (182) are shadowed by finite- ally collides with another segment. The accessible phase size effects of the accessible system size, as limited by space of degrees of freedom is therefore characterised by computation power. The analysis in reference (183) of a self-avoiding polygon model uses the method of closure of a short fragment of the knot and subsequent determi- nation of its Alexander polynomial to obtain the scaling 21 For comparison, self-avoidance in 3D is usually treated as a per- exponent t = 0.75; in a second variant, the authors find turbation, i.e., as a “soft constraint”, in analytical studies (146). 21

inequalities.22 Consequently, only a relatively small range of prob- lems have been treated analytically, starting with the seminal papers by Edwards (189; 190), in which he considers the classification of topological constraints in polymer physics. De Gennes addressed the problem of tight knot motion along a polymer chain using scaling arguments for the activation of free length inside the knot region, producing a double-exponential expression for the corresponding time scale (175), which might ex- plain the extreme long-time contributions in the relax- ation time spectrum of permanently entangled polymers (7; 172; 173; 174; 191). Some analytical results were ob- tained for a pair, or an ‘Olympic’ gel of entangled poly- mer rings, see for instance, (192; 193; 194; 195; 196). In a mean field approach based on the Kauffman invari- ant the entropy of knots was investigated in references (197; 198; 199). Similarly, some statistical properties of random knot diagrams were investigated in (200; 201). However, some insight can be gained on the basis of phe- nomenological models, which we will come back to below. Here, we continue with an analytical study of flat knots. FIG. 20 AFM tapping-mode images of flattened, complex One possibility to treat knotted polymer chains ana- DNA-knots with approximately 30-40 essential crossings, see lytically is to confine the degrees of freedom of the knot (166). The substrate surface used is AP-mica (freshly cleaved to motion in 2D, only. The knot, that is, is preserved, mica reacted with an amino terminal silane to obtain a pos- as at the crossings the chain is allowed to form an over- itively charged surface). The DNA knots used are extracted /underpassing, while the rest of the knot is confined to from bacteriophage P4; the DNA is a 11.4 kbp molecule (with 2D. Such a confinement can in fact be experimentally a 1.4 kbp deletion resulting in a final length of 10 kbp) which realised in various ways. Thus, the chain can be con- has two cohesive ends. They are not covalently closed, thus fined between two close-by glass slabs, as demonstrated no supercoiling is present. The knot adsorbed out of the 3D bulk on to the surface is strongly trapped, i.e., the knot is in (202); it can be pressed flat on a surface by gravita- ‘projected’ onto the surface without any equilibration. The tion or similar forces, for instance in macroscopic sys- knot appears rather delocalised. Courtesy F. Valle and G. tems (158; 203); the chain can be adhesively bound to a Dietler. membrane and still reach configurational equilibrium, as experimentally shown for DNA in references (166; 204). Or it can be adsorbed to a mica surface either by APTES transient additional loops are sufficiently short-lived so coating or by providing bivalent Mg ions in solution, as that we can neglect them in our analysis. Then, we can shown in figure 20. From such flat knots as discussed apply results from scaling analysis of polymer networks of in the remainder of this section, we will be able to infer the most general type shown in figure 79, see the primer certain generic features also for 3D knots. in the appendix. We note that from the Monte Carlo A flat knot therefore corresponds to a polymer network simulations we performed it may be concluded that such in 2D, but the orientation of the crossings is preserved, additional vertices can in fact be neglected. such that the network graph actually coincides with a typical knot projection (114; 115; 116), as shown in fig- ure 21 on the left. This projection of the trefoil, and sim- 1. Flat knots in dilute phase. ilar projections for all knots, displays the knot with the essential crossings. A flat knot can, in principle acquire We had previously found that for the F8-structure the an arbitrary number of crossings by Reidemeister moves; probability density function for the size of each loop is for instance, the bottom left segment of the flat trefoil peaked at ℓ → 0 and ℓ → L. From the scaling anal- can slide under the vicinal segment, creating a new pair ysis for self-avoiding polymer networks, we concluded of vertices, and so on. However, we suppose that such strong localisation of one subloop. For more complicated structures, the joint probability to find the individual segments with given lengths si is expected to peak at the edges of the higher-dimensional configuration hyper- 22 Although a similar statement is true for polymer networks in space. Some analysis is necessary to find the characteris- 3D, the field theoretical results for their critical exponents are in fact obtained as averages over all topologies. For instance, the tic shapes. Let us consider here the simplest non-trivial exponent ν entering the gyration radius of a a 3D polymer ring knot, the (flat) trefoil knot 31 shown in figure 21. Each counts all knotted states (147). of the three crossings is replaced with a vertex with four 22

s3 six different contractions, as grouped in figure 22 around the original flat trefoil at position III. As an example, in the top row of figure 22 contraction VI follows from s4 the original trefoil III if the uppermost segment becomes very small, and similarly the network VII emanates from s5 contraction VI if one of the four symmetric segments be- s6 comes very small. For each of these networks, one can calculate the corresponding exponent c in a similar way s1 s2 as above, leading to the general expression

N FIG. 21 Flat trefoil knot with segment labels. On the right, c =2+ nN dν − 1 + |σN |− dν . (30) a schematic representation of a localised flat trefoil with one 2 N≥4      large segment is shown. X The σN are given in equation (80). In figure 22, the various contractions are arranged in increasing exponent outgoing legs, and the resulting network is assumed to c. separate into a large loop and a multiply connected re- 5 Our scaling analysis relies on an expansion in a/ℓ ≪ 1, gion which includes the vertices. Let ℓ = i=1 si be and the values of c determine a sequence of contractions the total length of all segments contained in the multiply smallest P according to higher orders in a/ℓ: The value of connected knot region. Accordingly, the length of the c corresponds to the most likely contraction, while the large loop is s6 = L − ℓ. others represent corrections to this leading scaling be- In the limit ℓ ≪ L, the number of configurations of this haviour, and are thus less and less probable (see figure network can be derived in a similar way as in the scaling 22). To lowest order, the trefoil behaves like a large ring approach followed for the F8. This procedure determines polymer at whose fringe the point-like knot region is lo- the concrete behaviour of the scaling form cated. At the next level of resolution, it appears con- s s s s tracted to the figure-eight shape GI. For more accurate ω ≃ µLW L − ℓ, ℓ, 1 , 2 , 3 , 4 (27) III III ℓ ℓ ℓ ℓ data, the higher order shapes II to VII may be found with   decreasing probability. Interestingly, the original uncon- including the scaling function W that depends on alto- tracted trefoil configuration ranks third in the hierarchy gether six arguments. The index III is chosen according of shapes. to figure 22, where the flat trefoil configuration in the These predictions were checked by MC simulations dense phase appears at position III of the scheme (ex- with the same conditions as described above, to pre- plained below). After some manipulations, the number vent intersection. The flat trefoil knot was prepared of degrees of freedom yields in the form (152) from a symmetric, harmonic 3D representation with 512 L −dν −c monomers, which was collapsed and then kept on a hard ωIII(ℓ,L) ∼ µ (L − ℓ) ℓ , (28) ∗ wall by the “gravitational” field V = −kBTh/h perpen- with the scaling exponent dicular to the wall, where h is the height of a monomer, and h∗ was set to 0.3 times the bead diameter. Configu-

c = −(γIII − 1+ dν) − m, m =4. (29) rations corresponding to contraction I are then selected by requiring that besides a large loop, they contain only Here, m = 4 corresponds to the number of independent one segment larger than a preset cutoff length (taken to integrations over the segments si (i = 1,..., 4) of the be 5 monomers), and similarly for contraction II. The knot region, as we only retain the cumulative size ℓ = size distributions for such contractions, as well as for all 5 i=1 si of the knot region. Putting numerical values, we possible knot shapes are shown in figure 23. The tails of find c = 65/16, i.e., strong localisation. the distributions are indeed consistent with the predicted PHowever, some care is necessary in performing these power laws, although the data (especially for contraction integrations, since the scaling function WIII may exhibit II) is too noisy for a definitive statement. non-integrable singularities if one or more of the argu- Our scaling results pertain to all flat prime knots. In ments si/ℓ tend to 0. The geometries corresponding to particular, the dominating contribution for any prime these limits (edges of the configuration hyperspace) rep- knot corresponds to the figure-eight contraction GI, as resent contractions of the original trefoil network GIII in equation (30) predicts a larger value of the scaling ex- the sense that the length of one or more of the segments ponent c for any network G other than GI. Accordingly, si is of the order of the short-distance cutoff a. If such a figure 24 demonstrates the tightness of the prime knot short segment connects different vertices, they cannot be 819. Composite knots, however, can maximise the num- resolved on larger length scales, but appear as a single, ber of configurations by splitting into their prime factors new vertex. Thus, each contraction corresponds to a dif- as indicated in figure 24 for 31#31. Each prime factor ferent network G, which may contain a vertex with up to is tight and located at the fringe of one large loop, and eight outgoing legs. For the flat trefoil knot, there exist accounts for an additional factor of L for the number of 23

I II III IV V VI VII Dilute

c 2.6875 3.375 4.0625 5.5 6.1875 9.0625

Dense

c 0.125 0.75 1.375 1.875 2

Θ−point

c 0.714 1.143 1.571 2.095 2.524 3.857

FIG. 22 Hierarchy of the flat trefoil knot 31. Upper row: dilute phase. Middle row: Θ-phase. Bottom row: dense phase. To the left of each row, the trefoil projection is shown. It splits up into the hierarchies of configurations, with exponents c below each contraction. The small protruding legs represent the big central loop, compare, for instance, figure 21 on the right with position III of the top row. See text for details.

by ‘projecting’ a dilute 3D knot from the bulk onto a −1 fig. 8 slip−link 10 contraction I mica surface, on which the knot is adsorbed. Variation trefoil knot of the ionic strength in the solution determines whether contraction II slope=−2.7 the knot is going to be strongly trapped on the surface slope=−3.4 such that, once captured on the surface, it is completely 10−2 immobilised (small ionic strength); or whether the ad- sorption is weaker such that the knot can (partially) equi- librate while being confined to 2D, i.e., equilibrate as a flat knot. Figure 20 shows a strongly trapped complex 10−3 knot, whereas figure 25 depicts a weakly adsorbed simple knot, compare (166).

−4 10 2. Flat knots under Θ and dense conditions. 10 100 In many situations, polymer chains are not dilute. Polymer melts, gels, or rubbers exhibit fairly high densi- ties of chains, and the behaviour of an individual chain FIG. 23 Power law tails in probability density functions for the size ℓ of tight segments: As defined in the figure, we in such systems is significantly different compared to the show results for the smaller loop in a figure-eight structure, dilute phase (146; 172; 191). Similar considerations ap- the overall size of the trefoil knot, as well as the two leading ply to biomolecules: in bacteria, the gyration radius of contractions of the latter. the almost freely floating ring DNA may sometimes be larger than the cell radius itself. Moreover, under certain conditions, there is a non-negligible osmotic pressure due configurations as compared to a ring of length L without to vicinal layers of protein molecules, which tends to con- a knot. Indeed, this gain in entropy leads to the tight- fine the DNA (205; 206; 207). In protein folding studies, ness of knots. Flat knots can experimentally be produced globular proteins in their native state are often modelled 24

FIG. 25 Flat knot imaged by AFM, similar to the one shown in figure 20. However, this knot is rather simple (likely a trefoil) and was allowed to relax while attaching to freshly cleaved mica in the presence of bivalent Mg counterions. Courtesy E. Ercolini, J. Adamcik, and G. Dietler. Note the close resemblance to the trefoil configuration shown in figure 24.

related to 2D vesicles and lattice animals (branched poly- mers) (217; 218; 219; 220). As studied in reference (221), the value of the exponent c for the 2D dense F8 is (compare to the appendix)

c = −γF8 = 11/8=1.375, (31) implying that the smaller loop is weakly localised. This means that the probability for the size of each loop is FIG. 24 Typical configurations of 256-mer chains for the tre- peaked at ℓ = 0 and, by symmetry, at ℓ = L. An analo- foil 31, the prime knot 819, and the composite knot 31#31 gous reasoning for the 2D F8 at the Θ point gives consisting of two trefoils, in d = 2. The initial conditions were symmetric in all cases. c = 11/7=1.571. (32) In both cases the smaller loop is weakly localised in the sense that hli 0. This ure 26, the lines represent the bonds (tethers) between can be obtained by considering a single polymer of total the monomers (beads, not shown here). The three black length L inside a box of volume V and taking the limit dots mark the locations of the tethered beads forming L → ∞, V → ∞ in such a way that f = L/V remains the slip-link in 2D. The initial symmetric configuration finite (209; 210; 211). Alternatively, dense polymers can soon gives way to a configuration with ℓ ≪ L on ap- be obtained in an infinite volume through the action of proaching equilibrium. Figure 27 shows the development an attractive force between monomers. Then, for tem- of this symmetry breaking as a function of the number peratures T below the collapse (Theta) temperature Θ, of MC steps. We note, however, that the fluctuations of the polymers collapse to a dense phase, with a density the loop sizes in the “stationary” regime appear to be f > 0, which is a function of T (210; 212; 213; 214). larger in comparison to the dilute case studied in refer- For a dense polymer in d dimensions, the exponent ν, ence (153), compare figure 18. We checked that for den- ν defined by the radius of gyration Rg ∼ L , becomes sities (area coverage) above 40% the scaling behaviour ν = 1/d. The limit f = 1 is realised in Hamiltonian becomes independent of the density. (The above simu- paths, where a random walk visits every site of a given lation results correspond to a density of 55%.) The size lattice exactly once (215; 216). Dense polymers may be distribution data is well fitted to a power law (for over 25

10−1 slope=−1.375

10−2

10−3

10−4

FIG. 26 Symmetric (ℓ = L/2 = 128) initial configuration 10−5 of a 2D dense F8 (left) and equilibrium configuration (right) 1 10 100 1000 with periodic boundary conditions. The two different grey values correspond to the two subloops created by the slip- link. The slip-link itself is represented by the three (tethered) FIG. 27 The loop size probability distribution p(ℓ) at ρ = black dots. 55% area coverage, for the F8 with 512 (top) and 1024 (bot- tom) monomers. The power law with with the predicted ex- ponent c = 1.375 in equation (31) is indicated by the dotted 1.5 decades with 1024 monomers), and the corresponding line. exponent with 512 and 1024 monomers in figure 27 is in good agreement with the predicted value (31). For our MC analysis, we again used a hard core bead- the dense 2D trefoil, we predict that one mainly observes and-tether chain, in which self-crossings were prevented delocalised shapes corresponding to the original trefoil by keeping a maximum bead-to-bead distance of 1.38 and position II in figure 22, and further, with decreasing times the bead diameter, and a maximum step length probability, the weakly localised F8 and the other shapes of 0.15 times the bead diameter. To create the dense F8 of the hierarchy (top part) in figure 22. initial condition, a free F8 is squeezed into a quadratic These predictions are consistent with the numerical box with hard walls. This is achieved by starting off from simulations of reference (185), who observe that the mean the free F8, surrounding it by a box, and turning on a value of the second largest segment of the simulated 2D force directed towards one of the edges. Then, the op- dense trefoil configurations grows linearly with L, and posite edge is moved towards the centre of the box, and conjecture the same behaviour also for the other seg- so on. During these steps, the slip-link is locked, i.e., the ments, corresponding to the delocalisation of the trefoil chain cannot slide through it, and the two loops are of obtained above. equal length during the entire preparation. Finally, when An analogous reasoning can be applied to the 2D tre- the envisaged density is reached, the hard walls are re- foil in the Θ phase. We find that in this case that the placed by periodic boundary conditions, and the slip-link leading shape is again the original (uncontracted) trefoil, is unlocked. After each step, the system is allowed to re- with c = 5/7 < 1. This implies that the 2D trefoil is lax for times larger than the localisation times occurring delocalised also at the Θ point. All other shapes are at at the main stage of the run. least weakly localised, and subdominant to the leading A similar analysis as for the dense/Θ-F8 structure and scaling order represented by the original trefoil. The re- the dilute flat trefoil above, reveals the number of degrees sulting hierarchy of shapes is shown in figure 22 (bottom of freedom for the flat dense trefoil in the form (221) part). −c ω3(ℓ,L) ∼ ω0(L)ℓ (33) with c = −γ3 − m, where γ3 = −33/8 from equation G. 3D knots defy complete analytical treatment. (82) in the appendix (L = 4, n4 = 3) and m = 4 is the number of independent integrations over chain segments. As already mentioned, 3D knots correspond to a prob- Thus, c = 1/8 < 1 which implies that the dense 2D lem involving hard constraints that defy a closed analyt- trefoil is delocalised. As above, we have to consider the ical treatment. It may be possible, however, that by a various possible contractions of the flat knot. For dense suitable mapping to, for instance, a field theory, an an- polymers, the present scaling results show that both the alytical description may be found. This may in fact be original trefoil shape (c =1/8 < 1, see above) and posi- connected to the study of knots in diagrammatic solu- tion II (c =3/4 < 1) are in fact delocalised and represent tions in high energy physics (222). There exists a fun- equally the leading scaling order (cf. top part of figure damental relation between knots and gauge theory as 22). The F8 is only found at the third position and is knot projections and Feynman graphs share the same weakly localised (c = 11/8 > 1). In an MC simulation of basic ingredients corresponding to a Hopf algebra (115). 26

However, up to now no such mapping has been found, simulations study in (178), the advantage of experiments and a theoretical description of 3D knots based on first being the fact that it should be possible to go towards principles is presently beyond hope. To obtain some in- rather high chain lengths that are inaccessible in simula- sight into the statistical mechanical behaviour of knotted tions. To overcome similar difficulties in the context of chains, one therefore has to resort to simulations stud- the entropic elasticity for rubber networks, Ball, Doi, Ed- ies or experiments. In addition, a few phenomenologi- wards and coworkers replaced permanent entanglements cal models for both the equilibrium and dynamical be- by slip-links (225; 226; 227; 228). Gaussian networks haviour of knots have been suggested such as in references containing slip-links have been successful in the predic- (141; 169; 171; 176; 180; 181; 223; 224). tion of important physical quantities of rubber networks When discussing numerical knot studies, we already (172), and they have been used to study a small num- mentioned the Flory-type model brought leading to equa- ber of entangled chains (229). In a similar fashion, one tion 22 (169; 170). One may argue that the differences in may investigate the statistical behaviour of single poly- the knot size for the different knot types corresponding mer chains in which a fixed topology is created by a to the same C may be included in the prefactor, that number of slip-links. Such ‘paraknots’ can be studied is independent of the chain length N. Obviously, this analytically using the Duplantier scaling results (153). model of equal loop sizes is equivalent to a completely As mentioned previously, knowledge of the statistical be- delocalised knot. This statement is in fact equivalent haviour of paraknots can be used to create a knot scale to another Flory-type approach to knotted polymers re- for calibrating the degrees of freedom of real knots, and ported in (171). In this model, the knot is thought of as therefore also important to understand or design indirect an inflatable tube: for a very thin tube diameter, the tube experiments on knot entropy, such as by force-extension is equivalent to the original knot conformation; inflating measurements (179). Paraknots may also be useful in the the tube more and more will increasingly smoothen out design of entropy-based functional molecules (230; 231). the shape until a maximally inflated state is reached. The knot is then characterised by the aspect ratio V. DNA BREATHING: LOCAL DENATURATION ZONES L p = , therefore 1 ≤ p ≤ N, (34) AND BIOLOGICAL IMPLICATIONS D between length L and maximum tube diameter D. It ”A most remarkable physical feature of the DNA helix, appears that p is a (weak) knot invariant, and can be and one that is crucial to its functions in replication and used to characterise the gyration radius of the knot. It is transcription, is the ease with which its component chains clear that, by construction, the aspect ratio described a can come apart and rejoin. Many techniques have been totally delocalised knot, and indeed it turns out that in used to measure this melting and reannealing behaviour. good solvent, the gyration radius shows the dependence Nevertheless, important questions remain about the ki- 3/5 1/5 −4/15 netics and thermodynamics of denaturation and renatu- Rg ≃ AN τ p , where τ is the (dimensionless) deviation from the Θ temperature (171). Obviously, the ration and how these processes are influenced by other aspect ratio appears to be proportional to the number molecules in the test tube and cell” (3). This remark- of essential crossings in comparison to expression (22). able quotation, despite 30 years old, still summarises the We note that similar considerations are employed in ref- challenge of understanding local and global denaturation erence (224), including a comparison to the entropy of a of DNA, in particular, its dynamics. In this section, we tight knot, finding comparable entropic likelihood. The report recent findings on the spontaneous formation of modelling based on the aspect ratio p is further refined intermittent denaturation zones within an intact DNA bubbles in (141). double helix. Such denaturation fluctuate in size Knot localisation is a subtle interplay between the de- by (random) motion of the zipper forks relative to each grees of freedom of one big loop, and the internal degrees other. The opening and subsequent closing of DNA bub- DNA breathing of freedom of the various segments in the knot region. bles is often called . Under localisation, the number of degrees of freedom ω ≃ µN N 1−dν (35) A. Physiological background of DNA denaturation includes an additional factor N from the knot region en- The Watson-Crick double-helix is the thermodynami- circling the big loop. For flat knots, the competition cally stable configuration of a DNA molecule under physi- between the single big loop and the knot region is in- ological conditions (normal salt and room/body temper- deed won by the big loop. In the case of 3D knots, this ature). This stability is effected (a) by Watson-Crick balance is presently not resolved for knots of all complex- H-bonding, that is essential for the specificity of base- ity. Probably only detailed simulations studies of higher pairing, i.e., for the key-lock principle according to which order knots will make it possible to decide for the var- the nucleotide Adenine exclusively binds to Thymine, ious models of 3D knots. Major contributions are also and Guanine only to Cytosine. Base-pairing therefore expected from single molecule experiments, for instance, guarantees the high level of fidelity during replication and from force-extension measurements along the lines of the transcription. (b) The second contribution to DNA-helix 27

1.0

0.8

θ 0.6 h 0.4

0.2

0

355360 365 370 375 T [K]

FIG. 28 Fraction θh of double-helical domains within the DNA as a function of temperature. Schematic representa- FIG. 29 Overstretching of double-stranded DNA. The black tion of θh(T ), showing the increased formation of bubbles and unzipping from the ends, until full denaturation has been curve shows the typical force-extension behaviour of DNA reached. following the rapid worm-like chain increase until at around 65 pN a plateau is reached. Crossing of the plateau corresponds to progressive mechanical denaturation. See text for details. Figure courtesy Mark C. Williams. stability comes from base-stacking between neighbouring bps: through hydrophobic interactions between the pla- nar aromatic bases, that overlap geometrically and elec- (234). Careful melting studies allow one to obtain accu- tronically, the bp stacking stabilises the helical structure rate values for the stacking energies of the various combi- against the repulsive electrostatic force between the neg- nations of neighbouring bps, a basis for detailed thermo- atively charged phosphate groups located at the outside dynamic modelling of DNA-melting and DNA-structure of the DNA double-strand. While hydrogen bonds con- per se (235; 236). In fact, thermal melting data have tribute only little to the helix stability, the major support been successfully used to identify coding sequences of the comes from base-stacking (3; 233). genome due to the different G-C content (237; 238; 239). The quoted ease with which its component chains can Complementary to thermal or pH induced denatura- come apart and rejoin, without damaging the chemical tion, dsDNA can be driven toward denaturation mechan- structure of the two single-strands, is crucial to many ically, by applying a tensional stress along the DNA in physiological processes such as replication via the pro- an optical tweezer trap (240). As shown in figure 29, the teins DNA helicase and polymerase, and transcription force per extension increases in worm-like chain fashion, through RNA polymerase. During these processes, the until a plateau at approximately 65 pN is reached. This proteins unzip a certain region of the double-strand, to plateau is sometimes interpreted as new DNA configu- obtain access to the genetic information stored in the ration, the S form (241). By a series of experiments, it bases in the core of the double-helix (3; 6; 232). This appears more likely that the plateau corresponds to the unzipping corresponds to breaking the hydrogen bonds mechanical denaturation transition (242). To first order, between the bps. Classically, the so-called melting and the effect of the longitudinal pulling translates into an reannealing behaviour of DNA has been studied in solu- external torque T, whose effect is a decrease in the free tion in vitro by increasing the temperature, or by titra- energy for melting a bp: tion with acid or alkali. During thermal melting, the stability of the DNA duplex is related to the content of T triple-hydrogen-bonded G-C bps: the larger the fraction ∆GF = ∆GF =0 − θ0, (36) of G-C pairs, the higher the required melting tempera- ture or pH value. Thus, under thermal melting, dsDNA where θ0 = 2π/10.35 is the twist angle per bp of the starts to unwind in regions rich in A-T bps, and then double helix (243). proceeds to regions of progressively higher G-C content An important application of thermal DNA melting is (3; 233). Conversely, molten, complementary chains of the Polymerase Chain Reaction (PCR). In PCR, dsDNA single-stranded DNA (ssDNA) begin to reassociate and is melted at elevated temperatures into two strands of eventually reform the original double-helix under incu- ssDNA. By lowering the temperature in a solution of in- ◦ bation at roughly 25 below the melting temperature Tm variable primers and single nucleotides, each ssDNA is (3). The relative amount of molten DNA in a solution can completed to dsDNA by the key-lock principle of base- be measured by UV spectroscopy, revealing large changes pairing (244; 245). By many such cycles, of the order of in absorption in the presence of perturbed base-stacking 109 copies of the original DNA can be produced within 28

the range of hours.23 Again, the error rate due to the to the understanding of the binding of single-stranded underlying biochemistry can be considered negligible for DNA binding proteins (SSBs) that selectively bind to ss- most purposes. In particular, from the viewpoint of poly- DNA, and that play important roles in replication, re- mer physics/chemistry, the obtained sample is monodis- combination and repair of DNA (4). One of the key perse and free of parasitic reactions, creating (almost) tasks of SSBs is to prevent the formation of secondary ideal samples for physical studies, in particular, as any structure in ssDNA (1; 2). From the thermodynami- designed sequence of bases can be custom-made in mod- cal point of view one would therefore expect SSBs to ern molecular biology labs (2). be of an effectively helix-destabilising nature, and thus While the double-helix is the thermodynamically sta- to lower Tm (255). However, it was found that neither the gp32 protein from the T4 phage nor E.coli SSBs do ble configuration of the DNA molecule below Tm (at non- denaturing pH), even at physiological conditions there (255; 256; 257). An explanation to this apparent para- exist local denaturation zones, so-called DNA-bubbles, dox was suggested to consist in a kinetic block, i.e., a ki- predominantly in A-T-rich regions of the genome (234; netic regulation such that the rate constant for the bind- 262). Driven by ambient thermal fluctuations, a DNA- ing of SSBs is smaller than the one for bubble closing bubble is a dynamical entity whose size varies by ther- (257; 258). This hypothesis could recently be verified in mally activated zipping and unzipping of successive bps extensive single molecule setups using mechanical over- at the two forks where the ssDNA-bubble is bordered by stretching of dsDNA by optical tweezers in the presence the dsDNA-helix. This incessant zipping and unzipping of T4 gene 32 protein (259; 260; 261), as detailed below. leads to a random walk in the bubble-size coordinate, and to a finite lifetime of DNA-bubbles under non-melting conditions, as eventually the bubble closes due to the en- ergetic preference for the closed state (234; 262). DNA- breathing typically opens up a few bps (246; 247). It has B. The Poland-Scheraga model of DNA melting been demonstrated recently that by fluorescence correla- tion methods the fluctuations of DNA-bubbles can be ex- The most widely used approach to DNA melting in plored on the single molecule level, revealing a multistate bioinformatics is the statistical, Ising model-like Poland- kinetics that corresponds to the picture of successive zip- 24 Scheraga model (sometimes also referred to as Bragg- ping and unzipping of single bps. At room tempera- Zimm model) and its variations (234; 262; 263); see also ture, the characteristic closing time of an unbounded bp (155; 156; 264; 265). It defines the partition function Z was found to be in the range 10 to 100 µsec corresponding of a DNA molecule in a grand canonical picture with ar- to an overall bubble lifetime in the range of a few msec bitrary many bubbles. For simplicity, we will restrict the (249). The multistate nature of the DNA-breathing was following discussion to a single bubble. Below the melt- confirmed by a UV-light absorption study (250). The ing temperature Tm, the one bubble picture is a good zipping dynamics of DNA is also investigated by NMR approximation: due to the high energy cost of bubble ini- methods (251; 252; 253), revealing considerably shorter tiation, the distance between bubbles on a DNA molecule time scales than the fluorescence experiments. An in- is large, and bubbles behave statistically independently. teresting finding from NMR studies is the dramatically In typical experimental setups for measuring the bub- different denaturation dynamics in B’ DNA, where more ble dynamics (see below), the used DNA construct is than three AT bps occur in a row (254). It is conceiv- actually designed to host an individual bubble. For a able that fluorescence correlation and NMR probe differ- homopolymer stretch of double-stranded DNA with 400 ent levels of the denaturation dynamics. Our analysis of bps, figure 30 shows the probabilities to find zero, one the single DNA fluorescence data reported below demon- or, or two bubbles as a function of the Boltzmann fac- strates that, albeit the much longer time scale, the de- tor u = exp(∆G/RT ) for denaturation of a single bp.25 pendence of the measured autocorrelation function on the Even at the denaturation transition ∆G = 0, it is quite stacking along the sequence is very sensitive, and agrees unlikely to find two bubbles simultaneously. well with the quantitative behaviour predicted from the stability data. The free energy ∆G to break an individual bp are constructed as follows. We mention two different ap- The presence of fluctuating DNA-bubbles is essential proaches. Common for both is the Poland-Scheraga con- struction of the partition function. We start with the case that a linear DNA molecule denatures from one of its

23 Most proteins denature at temperatures between 40 to 60◦C, in- cluding polymerases. In early PCR protocols, after each heating step new polymerase had to be washed into the reaction cham- ber. Modern protocols make use of heat-resistant polymerases 25 In biochemistry, energies are usually measured in calories per that survive the temperatures necessary in melting. Such heat- mol. Instead of the Boltzmann factor β = 1/kB T commonly used resistant proteins occur, for instance, in bacteria dwelling near in physics and engineering, it is therefore convenient to replace undersea thermal vents. the Boltzmann constant kB by the gas constant R = kBNA, 24 Essentially, the zipper model advocated by Kittel (248). where NA is the Avogadro-Loschmidt number. 29

1 the free energy (247) 0.9 (0) Sep ST HB P ∆Gx,x+1 = ∆Gx,x+1 + ∆Gx (39) 0.8 ST HB 0.7 where the Gibbs free energies ∆Gx,x+1 and ∆Gx mea- 0.6 sure the stacking of bps x and x + 1 and the hydrogen bonding of bp x including the entropy release on disrup- 0.5 HB P(u) tion. Note that ∆Gx is chosen such that it only depends 0.4 on the broken bp and has two values for AT and GC bps, irrespective of the orientation (3’ or 5’). The stacking free 0.3 energies ∆GST were determined from denaturation at a 0.2 DNA nick and show a pronounced asymmetry between P(1) 0.1 (2) AT/TA and TA/AT bonds (247). For an end-denaturing P DNA both descriptions are equivalent (though somewhat 0 0 0.2 0.4 0.6 0.8 1 1.2 1.4 different when one puts numbers), as the breaking of each u bp involves the disruption of one hydrogen bonds of bp FIG. 30 Probability of having 0, 1, or 2 bubbles as a function x and one stacking with its neighbour. of u for a DNA region of chain length 400 bps. The coop- The difference between the two approaches becomes −3 erativity parameter was σ0 = 10 and the loop correction apparent when we consider the initiation of a bubble, i.e., exponent c=1.76 (see text). a denatured coil enclosed by intact double-helix. Now, the partition function for a bubble with left fork position at xL and consisting of m broken bps, ends. The corresponding partition function is (234; 262) xL+m Z ∆Gx,x+1/RT m mid(xL,m) = ΛΣ(m) e , (40) Z ∆Gx,x+1/RT x=xL end(m)= e (37) Y x=1 Y differs from (37) in three respects: (i) While the bub- ble consists of m molten bps, m + 1 stacking interactions where m is the number of broken bps, and ∆Gx,x+1 is the stacking free energy for disruption of the bp at posi- need to be broken to create two boundaries between in- tion x measured from the end of the DNA. The notation tact double-strand and the single-strand in the bubble; explicitly refers to the stacking between the bp at x and the extra stacking interaction is effectively incorporated x + 1. The first closed bp is located at x = m +1. For into Λ. (ii) The polymeric nature of the flexible single- stranded bubbles involves the entropy loss factor Σ(m)= a homonucleotide, ∆Gx,x+1 = ∆G, while for a given se- 1/(m + D)c with critical exponent c and the parameter quence of bps, there come into play the different stacking 27 energies for the possible combinations of pairs of bps in D to take care of finite size effects (235; 264; 266); 26 (iii) the factor Λ: In the standard notation, Λ ≡ σ0 = sequence The stacking energies ∆Gx,x+1 have the fol- −4 −5 exp(−Fs/RT ) ≃ 10 − 10 (234; 235; 262; 268), while lowing contributions. −3 The more traditional way to determine the stacking according to (247), Λ = ξ ≃ 10 is called the ring factor. interactions is by fit of bulk melting curves of DNA con- Interestingly, the cooperativity parameter σ0 is of the or- structs containing exclusively pairs of the specific bp- der of what corresponds to the singular unbalanced stack- ing enthalpy for breaking the first bp to initiate the bub- bp combination such as (AT/TA)n (see, e.g., (233; 267) and references therein). The free energy used in this ble. The new stability data lead to a more pronounced automated fit procedure using the MELTSIM algorithm asymmetry in opening probabilities between different bp- (235), bp combinations. The analysis in references (269; 270) demonstrates that the parameters from (247) appear to Mix ST support better the biological relevance of the TATA mo- ∆G = ∆H − T ∆Sx,x+1, (38) x,x+1 x,x+1 tif28 in natural sequences, that is, show a more pro- nounced simultaneous opening probability for the TATA combines the stacking enthalpy difference ∆HST for x,x+1 motif. both hydrogen bond and actual stacking energies, and As demonstrated for the autocorrelation function mea- the entropy difference ∆S chosen to explicitly de- x,x+1 suring the breathing dynamics in figure 32, the descrip- pend on the nature of the broken bp. A recent alterna- tion in terms of the partition function Z based on the tive to determine the stability parameters of DNA was developed in the group of Frank-Kamenetskii, leading to

27 Usually, D = 1 is chosen. 28 ·TATA· 26 I.e., an AT bp followed by another AT as different from an AT The four bp ·ATAT· sequence is one of the typical codes marking followed by a TA, etc. where RNA polymerase starts the transcription process (1; 2). 30

stability parameters from (247) reproduces the experi- =stacking bond u (1) u (M+1) mental data well. The analysis in (269; 270) also indi- st =hydrogen bonds st   cates that the accuracy of the model predictions for the       bubble dynamics is rather sensitive to the parameters. It    x L   x   R  is therefore conceivable that improved fluorescence mea-         surement of the bubble dynamics may be employed to ob-  x T tain accurate DNA, stability parameters, complementing u hb(1) u hb(M) the more traditional melting, NMR, and gel electrophore- x= 01 2 3 M M+1 sis bulk methods. It should be noted that the dynamics is strongly influenced by local deviations from the B config- uration of the DNA double helix. Thus, in local stretches FIG. 31 Clamped DNA domain with internal bps x = 1to M, of more than three AT bps in sequence, the B’ structure statistical weights uhb(x), ust(x), and tag position xT . The is assumed, leading to pronouncedly different zipping dy- DNA sequence enters through the statistical weights ust(x) and uhb(x) for disrupting stacking and hydrogen bonds re- namics (254). spectively. The bubble breathing process consists of the ini- Two major questions remain in the thermodynamic tiation of a bubble and the subsequent motion of the forks at formulation of DNA denaturation and its dynamics. positions xL and xR. See (270) for details. Namely, the exact origin of the bubble initiation factor σ0 (or, alternatively, the ring factor ξ from (247)), and a method to properly calibrate the zipping rate k. The fac- (272; 273). Applying the MELTSIM algorithm to typical tor σ0 is related to the entropic imbalance on opening the sequences, it was found that there is a connection be- first bp of a bubble: While this requires the breaking of tween the fit result for the cooperativity parameter σ0, two stacking interactions, only one bp has access to an in- whose value is reduced from ≈ 10−5 to ≈ 10−3 by as- creased amount of degrees of freedom. Still, these degrees suming c =2.12 instead of 1.76 (236). Below the melting of freedom must be influenced by the fact that the single transition, the typical bubble size is only a few bps, and in open bp is coupled to two zipper forks. Currently, σ0 re- that regime the polymeric treatment of the loop entropy mains a fit parameter. The exact value of the zipping rate loss is of approximative nature. Indeed, in the analysis of k remains open. While NMR experiments indicate much (247) no entropy loss due to polymer ring formation was 8 −1 faster rates in the nanosecond range (≃ 10 sec ), the included. For the breathing dynamics, we include c, to fluorescence correlation measurements produce values in cover higher temperatures with somewhat larger bubbles, 4 5 −1 the microsecond range (≃ 10 − 10 sec ). This large but find no significant change in the behaviour between discrepancy may be based on the fact that both meth- c < 2 and c > 2, as long as the exponent is sufficiently ods have different sensitivity to the amplitude of intra-bp close to 2. We therefore use the value c = 1.76, that is separation. Currently, k is taken as a fit parameter. In consistent with the traditional data fits employed in the the analysis in (269; 270), we use the stacking parame- determination of the stacking parameters. ters from (247) including the value of the ring factor ξ, so that k (a shift along the logarithmic abscissa) is the only adjustable parameter. C. Fluctuation dynamics of DNA bubbles: DNA breathing It has been under debate what exact value should be taken for the critical exponent c entering in the entropy Below the melting temperature T , DNA bubbles are loss factor for a denaturation bubble. This is connected m intermittent, i.e., they form spontaneously due to ther- to the fact that c > 2 would imply a first order denat- mal fluctuations, and after some time close again. DNA- uration transition on melting, while 1 2 are either based on the inclusion of corresponds to the survival time of the first passage prob- polymeric self-avoidance interactions of the bubble with lem of relaxing to the m = 0 state after a random walk the remainder of the chain (155); or built on a directed in the m > 0 halfspace (269; 270; 275; 276; 277). Bub- polymer model (271). Despite the elegance of both ap- ble breathing on the single DNA-bubble level was mea- proaches, it is an open question how truly they repre- sured by fluorescence correlation spectroscopy in (249). sent the detailed denaturation behaviour of real DNA This technique employs a designed stretch of DNA, in which weaker AT bps form the bubble domain, that is clamped by stronger GC bonds. In the bubble domain, a fluorophore-quencher pair is attached. Once the bubble 29 Due to the rather small DNA samples used in melting experi- ments (5000 bp and less (234)), claims about the order of the is created, fluorophore and quencher are separated, and underlying thermodynamical phase transition should be consid- fluorescence occurs. A schematic of this setup is shown ered with some care. in figure 31. 31

The zipper forks move stepwise xL/R → xL/R ± 1 with 1.2 5'-GGCGCCCATATATATATATATATATGCGCtt- ± 3'-CCGCGGGTATATATATATATATATACGCGtt- (AT9) rates tL/R(xL/R,m). We define for bubble size decrease 1 22C 22C repeat + − 26C t (xL,m)= t (xL,m)= k/2 (m ≥ 2) (41) L R 0.8 30C 33C 37C 30 for the two forks. The rate k characterises a single bp 0.6 f(τ) 41C 0.6 45C zipping. Its independence of x corresponds to the view 0.5 45C repeat that bp closure requires the diffusional encounter of the A(t) 49C 0.4 0.4 T=49oC ME 49C two bases and bond formation; as sterically AT and GC T=33oC ME 33C 0.3 T=22oC ME 49Ccorr bps are very similar, k should not significantly vary with 0.2 bp stacking. The rate k is the only adjustable parameter 0.2 of our model, and has to be determined from experiment 0 0.1 τ -1 or future MD simulations. The factor 1/2 is introduced 0 [k ] 0 2 4 6 8 10 12 14 for consistency (269; 270; 276; 277; 278). Bubble size -0.2 increase is controlled by 0.001 0.01 0.1 1 t [1/k] − tL (xL,m) = kust(xL)uhb(xL)s(m)/2, + FIG. 32 Time-dependence of the autocorrelation function t (xL,m) = kust(xR + 1)uhb(xR)s(m)/2, (42) R At(xT ,t) for the sequence AT9 measured in the FCS setup c of reference (249) at 100mM NaCl. The full lines show the for m ≥ 1, where s(m) = {(1 + m)/(2 + m)} . Finally, result from the master equation, based on the DNA stability bubble initiation and annihilation from and to the zero- parameters from Krueger et al. (247). The inset shows the bubble ground state, m =0 ↔ 1 occur with rates broadening of the relaxation time spectrum with increasing temperature. + ′ tG(xL) = kξ s(0)ust(xL + 1)uhb(xL + 1)ust(xL + 2) t−(x ) = k. (43) G L D. Probabilistic modelling—the master equation (ME) The rates t fulfil detailed balance conditions. The an- DNA breathing is described by the probability distri- nihilation rate t−(x ) is twice the zipping rate of a sin- G L bution P (x ,m,t) to find a bubble of size m located at gle fork, since the last open bp can close either from L xL whose time evolution follows the ME (269; 270; 276; the left or right. Due to the clamping, xL ≥ 0 and − 277; 278) xR ≤ M + 1, ensured by reflecting conditions tL (0,m)= + t (xL,M−xL) = 0. Therates t together with the bound- ∂ R P (x ,m,t)= WP (x ,m,t). (45) ary conditions fully determine the bubble dynamics. ∂t L L In the FCS experiment fluorescence occurs if the bps The transfer matrix W incorporates the rates in a ∆-neighbourhood of the fluorophore position xT are open (249). Measured fluorescence time series thus corre- t. Detailed balance guarantees equilibration to- Z Z spond to the stochastic variable I(t), that takes the value ward limt→∞ P (xL,m,t) = (xL,m)/ , with Z = Z (xL,m) (280). The ME and the 1 if at least all bps in [x − ∆, x + ∆] are open, else it is xL,m T T W 0. The time averaged ( · ) fluorescence autocorrelation explicit construction of are discussed at length in referencesP (270; 276). Eigenmode analysis and matrix 2 diagonalisation produces all quantities of interest such At(xT ,t)= I(t)I(0) − I(t) (44) as the ensemble averaged autocorrelation function

2 for the sequence AT9 from (249) are rescaled in figure 32. A(xT ,t)= hI(t)I(0)i− (hIi) . (46) We note that an alternative method to obtain precise DNA stability data may be provided by a DNA construct hI(t)I(0)i is proportional to the survival density that the with two AT-rich zones between which a shorter GC-rich bp is open at t and that it was open initially (269; 270). bridge is located. The first passage problem correspond- In figure 32 the blue curve shows the predicted be- ◦ ing to bubble merging at temperatures between the melt- haviour of A(xT ,t), calculated for T = 49 C with the ing temperatures of the AT and GC zones was recently parameters from (247). As in the experiment we as- calculated (279), and provides the framework for mod- sumed that fluorophore and quencher attach to bps xT ified fluorescence correlation setups similar to the one and xT + 1, that both are required open to produce a from reference (249). fluorescence signal. From the scaling plot, we calibrate the zipping rate as k = 7.1 × 104/s, in good agreement with the findings from reference (249). The calculated behaviour reproduces the data within the error bars, ◦ 30 Due to intrachain coupling (e.g., Rouse), larger bubbles may in- while the model prediction at T = 35 C shows more volve an additional ‘hook factor’ m−µ (276). pronounced deviation. Potential causes are destabilising 32

effects of the fluorophore and quencher, and additional E. Stochastic modelling—the Gillespie algorithm modes that broaden the decay of the autocorrelation. The latter is underlined by the fact that for lower temper- Despite its mathematically simple form, the master atures the relaxation time distribution f(τ), defined by equation (45) needs to be solved numerically by invert- A(xT ,t)= exp(−t/τ)f(τ)dτ, becomes narrower (figure ing the transfer matrix (270; 276). Moreover, it produces 32 inset). Deviations may also be associated with the ensemble-averaged information. Given the access to sin- correction forR diffusional motion of the DNA construct, gle molecule data, it is of relevance to obtain a model measured without quencher and neglecting contributions for the fully stochastic time evolution of a single DNA- from internal dynamics (281). Indeed, the black curve bubble, providing a description for pre-noise-averaged shown in figure 32 was obtained by a 3% reduction of the quantities such as the step-wise (un)zipping. With this diffusion time;31 see details in (270). scope, we introduced a stochastic simulation scheme for A remark on a prominent alternative approach to DNA the (un)zipping dynamics, using the Gillespie algorithm breathing appears in order. This is the Peyrard Bishop to update the state of the system by determining (i) the Dauxois (PBD) model (282; 283) based on the set of random time between individual (un)zipping events, and Langevin equations (284) (ii) which reaction direction (zipping, ←, or unzipping, →) will occur (288). This scheme is efficient computa- tionally, easy to implement, and amenable to generalisa- 2 d yn dV (yn) dW (yn+1,yn) dW (yn,yn−1) tion. m 2 = − − − dt dyn dyn dyn To define the model, we denote a bubble state of m dyn broken bps by the occupation numbers bm = 1 and bm′ = −mγ = ξ (t). (47) ′ dt n 0 (m 6= m). The stochastic simulation then corresponds to the nearest-neighbour jump process 2 Here, V (yn) = Dn [exp(−anyn) − 1] is a Morse po- b0 ⇄ b1 ⇄ ... ⇄ bm ⇄ ... ⇄ bM−1 ⇄ bM , (48) tential for the hydrogen bonding, Dn and an assuming two different values for AT and GC bps; W (y,y′) = with reflecting boundary conditions at b0 and bM . Each k [1+ ρ exp{−β(y + y′)}] (y − y′)2 is a nonlinear poten- 2 jump away from state bm occurs after a random time τ, tial to include bp-bp stacking interactions between ad- ′ and in random direction to either bm−1 or bm+1, governed jacent bps y and y . The parameters k, ρ, β, γ, and by the reaction probability density function32 (289; 290) m are invariant of the sequence. Usually, the stochas- tic equations (47) is integrated numerically (284). Due µ −(t+(m)+t−(m))τ to its formulation in terms of a set of Langevin equa- P (τ,µ)= t (m)e , (49) tions, the DPB model is very appealing, and it is a useful where µ ∈ {+, −} denotes the unzipping (+) or zipping model to study some generic features of DNA denatura- ± tion. The disadvantage of the current formulation of the (−) of a bp, and the jump rates t (m) are defined be- DBP model is the fact that it does not include enough low. From the joint probability density function (49), parameters to account for the known independent sta- the waiting time probability density function that a jump bility constants of double stranded DNA (in fact, only away from bm occurs is given by ψ(τ)= µ P (τ,µ), i.e., two parameters are allowed to vary with the sequence, it is Poissonian. The probability that the bubble size does not change in the time interval [0P,t] is given by in contrast to the 12 independent parameters needed t to fully describe the bp stacking and hydrogen bonding φ(t)=1 − 0 ψ(τ)dτ. The fork position xL (and thereby (247)). Moreover, there appear to be certain ambiguities the sequence of bps) is straightforwardly incorporated R in the proper formulation of boundary conditions in the (269; 270). stochastic integration (286), and also with respect to the We start the simulations from the completely zipped interpretation of the biological relevance and computa- state, b0 = 1 at t = 0, and measure the bubble size at M tional limitations of the PBD model (287). The master time t in terms of m(t) = m=0 mbm(t). The time se- equation and Gillespie approach brought forth in refer- ries of m(t) for a single stochastic realization is shown ences (269; 270; 276; 277; 278) bridges the gap between in figure 33. It is distinctP that the bubble events are the thermodynamic data for the bp stacking and hydro- very sharp (note the time windows of the zoom-ins), and gen bonding obtained by various experimental methods, most of the time the zero-bubble state b0 prevails due to and the dynamical nature of DNA breathing. It is hoped σ0 ≪ 1. Moreover, raising the temperature increases the that both dynamic models will synergetically be devel- bubble size and lifetime, as it should. By construction of oped further and eventually lead to a better understand- ing of DNA denaturation fluctuations.

32 The original expression for the reaction probability density func- tµ − tµ tion, P (τ, µ) = bm (m) exp “ τ m,µ bm (m)”, that is rele- 31 P For diffusion time τD = 150µs measured for an RNA construct vant for consideration of multi-bubble states, simplifies here due of comparable length in (281). to the particular choice of the bm. 33

20 20 18 (a) 18 (b) 16 16 14 14 12 12 10 10 m(t) m(t) 8 8 6 6 4 4 2 2 0 0 0 20 40 60 80 100 0 20 40 60 80 100

20 20 18 18 16 16 14 14 12 12 10 10 m(t) m(t) 8 8 6 6 4 4 2 2 0 0 17.9852 17.9854 17.9856 17.9858 17.986 17.9862 39.5912 39.592 39.5928 39.5936 39.5944 39.5952 time × 10-4 (in units of 1/k) time × 10-4 (in units of 1/k)

−3 FIG. 33 Time series of single bubble-breathing dynamics for σ0 = 10 , M = 20, and (a) u = 0.6 and (b) u = 0.9. The lower panel shows a zoom-in of how single bubbles of size m(t) open up and close. the simulation procedure, it is guaranteed that an occu- TATA in comparison to the vicinal GC-rich domain (note pation number bm = 1 (m 6= 0) corresponds to exactly that AT/TA bps are particularly weak (247)). This is one bubble. quantified by the waiting time density ψ(τ), whose char- In a careful analysis, it was shown that the stochas- acteristic time scale is more than an order of magnitude tic simulation method provides accurate information of longer for the xT = 41 position. In contrast, we observe the statistical quantities of the bubble, such as opening almost identical behaviour for the bubble survival den- probability and autocorrelation function (288). It can sity φ(τ). Due to the proximity of xT = 41 to TATA, the therefore be used to obtain the same information as the typical bubble sizes for both tag positions are similar, master equation with the advantage of also giving access and therefore the relaxation time. However, as shown in to the noise in the system. With the Gillespie technique, figure 34 bottom, the variation of the mean lifetime ob- we also obtained the data points in the graphs in this tained from the master equation is quite small (within a section. factor 2) for the entire sequence. The latter graph also shows the insignificant variation according to the earlier stability parameters by Blake et al. (235). F. Bacteriophage T7 promoter sequence analysis. The results summarised in figure 34 and further stud- An example from the analysis of the promoter sequence ies in (269; 270) may indicate that it is not solely the in- AAAA1AAAAAAAAAAAAAAAAAA20 creased opening probability at the TATA motif, as stud- AAAA|AAAAAAAAAAAAAAAAAA|AAAAAAAAAAAA ied in (291). Given the rather short bubble opening times 5’-aTGACCAGTTGAAGGACTGGAAGTAATACGACTC of order of a few k−1, it might be sufficient to induce bind- (50) AAAAGTATAGGGACAATGCTTAAGGTCGCTCTCTAGGAg-3’ ing of transcription enzymes (or other single stranded AAAAA|AAA| AAAAAAAAAAAAAAAAAAAAAAAAAA|AAA AAAAA38AA41AAAAAAAAAAAAAAAAAAAAAAAAAA68AAA DNA binding proteins) if only bubble events are repeated often enough. In the present example, the waiting time of bacteriophage T7 is shown in figure 34 (269). It depicts between individual bubble events is increased by a factor the time series of I(t) for the tag positions xT = 38 at the of 25 inside the TATA motif. Guided by such results, beginning of TATA, and xT = 41 at the first GC bp af- detailed future studies combining optical tweezers over- ter TATA. It is distinct how frequent bubble events are in stretching and monitoring transcription initiation may 34 ¼  25 38/41

I(t)

ψ Ò φ(t), x =38 (t), xT=41 φ T ψ 20 (t), xT=41 (t), xT=38 Ñ;

Z n=10 τ -1 τ 4 -1

(t) =3.00 k =2.29 10 k 38 -1 38 5 -1 n=0

ψ τ τ 41=2.21 k 41=5.82 10 k 15 n=1

(t), φ

¼ n=5

 10

2 4 6 8 1000 10000 100000 1e+06 average free energy,

Ò

¬ F Ñ Z Ñ 2.2 τ∆ [1/k] K 5 ad =2 B 1.8 Ñ;

1.4 F

¬ 1 0 10 20 30 40 50 60 0 10 20 30 40 m ¼  25 FIG. 34 Time series I(t) for the T7 promoter, with xT = 38,

41. Middle: Waiting time (ψ(τ)) and bubble survival time Ò (ϕ(τ)) densities. Bottom: Mean bubble survival time, ∆ = 2. 20 Ñ; Z

15 n=0 be a step toward better understanding of this important n=1

biochemical process. ¼ We note that the influence of noise (e.g., due to rep-  10 n=10 etition of single molecule experiments) on the bubble

dynamics can also be studied in the weak noise limit n=5 Ò 5 by a WKB method (292). This model provides infor- Ñ; mation, for instance, about the time it takes a DNA average free energy,

F

¬ F Ñ Z Ñ

ad

to denature under temperatures above T (mathemat- ¬ 0 m 0 10 20 30 40 ically corresponding to a finite time singularity). Bubble m breathing can be mapped on the Coulomb problem of the Schr¨odinger equation, and the corresponding phase FIG. 35 Effective free energy in the limit γ ≫ 1 (—), and transition studied (293). ‘free energy’ for various fixed n (u = 0.6, M = 40, c = 1.76, λ = 5). Top: κ = 0.5; bottom: stronger binding, κ = 1.5.

G. Interaction of DNA bubbles with selectively single-strand binding proteins than one. In order to be able to bind in between two al- ready bound SSBs, the distance between these two SSBs Let us now come back to the destabilising effect of must be larger than λ. The larger λ the less efficient single-stranded DNA binding proteins (SSBs) mentioned the SSB-binding becomes, similar to parking large cars in section V.A. In a homopolymer approach, this was on a parking lot desgined for small vehicles. Apart from studied in a master equation approach in references the binding size λ of the SSBs, two additional physical (276; 277). The quantity of interest is the joint prob- parameters come into play: the unbinding rate q of the eq ability P (m,n,t) to have a bubble consisting of m bro- SSBs, and their binding strength κ = c0K consisting ken bps, and n SSBs bound to the two arches of the of the volume concentration c0 of SSBs and the equilib- ± eq bubble. In addition to the rates t for bubble increase rium binding constant K = v0 exp(β|ESSB|), with the ± and decrease, the rates r for SSB binding and unbind- typical SSB volume and binding energy ESSB. ing are necessary to define the breathing dynamics in the The coupled dynamics of SSB-binding and bubble presence of SSBs. On the statistical level, the effect of breathing is discussed in references (276; 277); similar the SSBs becomes coupled to the motion of the zipper effects in end-denaturing DNA was studied in (294) in forks. Thus, the rate for bubble size decrease is propor- detail. Here, we report the behaviour of the effective free tional to the probability that no SSB is located right next energy landscape in the limit of fast SSB-binding in the to the corresponding zipper fork; and the rate for SSB sense that the dimensionless parameter γ ≡ q/k of SSB- binding is proportional to the probability that there is unbinding and bubble zipping rates is large, γ ≫ 1. This sufficient unoccupied space on the bubble. Binding is al- limit allows one to average out the SSB-dynamics and to lowed to be asymmetric, and is related to a parking lot calculate an effective free energy, in which the bubble dy- problem in the following sense. The number λ of bps oc- namics with the slow variable m runs off. The result for cupied by a bound SSB is usually (considerably) larger two different binding strengths κ is shown in figure 35, 35

along with the free energies corresponding to keeping n fixed. It is distinct that while for lower κ the presence of SSBs diminishes the slope of the effective free energy, for larger κ the slope actually becomes negative. In the first case, that is, the bubble opening is more likely, but still globally unfavourable. In the latter case, the presence of SSBs indeed leads to full denaturation. One observes dis- tinct finite size effects due to λ> 1: only when the bubble reaches a minimal size m ≥ λ, SSB-binding may occur, a second SSB is allowed to bind to the same arch only once m ≥ 2λ, etc. This effect also produces the nucleation bar- rier for full denaturation in the lower plot of figure 35. Similar finite size effects were investigated for biopoly- mer translocation in references (295; 296). We note that the transition to denaturation could also be achieved by reaching a smaller positive slope of the effective free en- ergy in the presence of SSBs, and additional titration or change of the effective temperature through actual tem- perature change or mechanical stretching as performed in the experiments reported in references (259; 260; 261).

VI. ROLE OF DNA CONFORMATIONS IN GENE REGULATION FIG. 36 Gene regulation, here the example of the (diver- gent) bacteriophage λ switch after infection of E.coli . Figure Our current understanding of gene regulation to large from (16), with permission from M. Ptashne. This figure was extent is based on the experiments by Andr´eLwoff at modified by the author from the corresponding figure in M. Institut Pasteur more than 50 years ago (297). Lwoff Ptashne, A Genetic Switch: Phage Lambda and Higher Or- and his collaborators discovered that while a strain of ganisms, 2nd edition c Blackwell Science, Malden, MA and E.coli, a common intestinal bacterium, divided regularly Cell Press, Cambridge, MA, with permission. when undisturbed, an unexpected phenomenon occurred when the strain was exposed to UV light: the bacte- ria stop growing and after some 90 minutes they burst protein. Cro bound to the operator then recruits RNA (lyse), releasing a load of viruses. These viruses then in- polymerase to the operator, stabilising the Cro produc- vade new E.coli.33 Some of the newly infected bacteria tion and blocking cI. Simultaneously, a whole sequence of immediately lyse again, while the rest divides normally— genes is being expressed, and the virus reproduces itself while carrying the virus in them. This dormant state inside E.coli until lysis occurs. UV light flips the switch (lysogeny) of the bacterium can then be driven toward from transcription of the gene cI maintaining the dor- lysis by renewed UV exposure. mant lysogenic pathway, inducing lysis that is fostered The UV exposure-induced transition from lysogeny to by transcription of the cro gene (16; 298). lysis occurs as sketched in figure 36. On infection, the The activity of a gene can be monitored even on the bacteriophage λ injects its DNA into E.coli . In the lyso- single genome level, by combining the targeted gene gI genic pathway, the viral DNA is integrated into the host with the gene leading to synthesis of GFP, the green flu- DNA. During lysogeny, repressor dimers bind to certain orescent protein, i.e., when gI is transcribed, then so is operator sites on the λ part of the DNA, recruiting RNA the gene for GFP. Occurrence of fluorescence then reports polymerase to bind to the overlapping promoter region(s) transcription of gI. Connected to questions such as the and blocking of the vicinal promoter for the divergent cro stability of a genetic pathway is the search process of a gene. RNA polymerase then transcribes the cI gene to specific gene by regulatory proteins, that is, how dynam- the left of the operator, leading to the expression of new ically the binding protein actually locates the operator repressor molecules. UV light, however, leads to cleavage on the genome. We address these points in what follows. of the repressor dimers.34 Now the basal transcription of the gene cro, opposite to the cI gene with respect to the operator region, leads to the expression of the Cro A. Physiological background of gene regulation and expression

The λ switch from figure 36 is an example of a rel- 33 Often these viruses are called phages or bacteriophages—bacteria atively simple mechanism. Even simpler is the well- eaters. studied Lac repressor. There, the lacZ gene is expressed 34 By activation of RecA proteins. by recruitment through the CAP protein when E.coli is 36 starved of glucose and exposed to lactose. This enables E.coli to digest lactose. In absence of lactose, lacZ is blocked by the rep protein. In general, the expression of a certain gene is just one element in a cascade of simulta- neous and/or hierarchical control units, such as in the de- velopmental regulatory network of the sea urchin embryo (299). The basic physiological background is common to all of them: Genes are the blueprints of proteins. They control physiological processes but also developmental pathways: from a fertilised egg cell, eventually all cell types (skin, hair, liver, brain, etc cells) of a human body emerge, or a skin cell changes colour on sunlight exposure. A gene is but a stretch of a DNA molecule, typically comprising some 200-1000 bps. Roughly speaking, a gene is on when it is being transcribed by RNA polymerase, otherwise it is off. RNA polymerase binds at the promoter region consisting of some 60 bps close to the beginning of the gene. It then converts the A, T, G, C code of the gene FIG. 37 Activity of the two λ promoters as function of re- into a complementary messenger RNA (from which, in pressor concentration for vanishing Cro concentration. The turn, the protein is produced during translation). The full line corresponds to wild type data, whereas the dashed stop of transcription is triggered by a certain sequence at lines correspond to ”mutations”. The thin vertical line corre- the end of the gene. Depending on specific conditions of sponds to lysogenic CI concentration. See reference (309) for the recruitment by regulatory proteins, RNA polymerase details. binding to the promoter of a certain gene is either blocked (the gene is off), facilitated (high binding affinity of RNA tain mutations can even be compensated by parallel mu- polymerase due to the (simultaneous) presence of certain tations influencing other binding energies (suppressors) protein(s)), or basal (in absence of any bound regulatory (309; 310). A typical result is shown in figure 37. protein, RNA polymerase can still have a minor affinity to the promoter and then autonomously start transcrip- tion). The transcription mechanism is part and parcel B. Binding proteins: Specific and nonspecific binding of the central dogma of molecular biology summarised in modes figure 1. Molecular switches such as the λ switch are surpris- Given their very specific function, DNA-binding pro- ingly stable against noise, despite the fact that there are teins must recognise a specific (cognate) sequence of nu- only about 100 repressor dimers in the entire bacteria cell cleotides along the genome. In fact, without opening the (300). Thus, apart from external induction, lysis occurs double helix, the outside of the DNA can be read by pro- by spontaneous induction due to absence of CI from the teins, as the edge of each bp is exposed at the surface. operators (301; 302). Such noise-induced errors are esti- These patterns are unique only in the major groove of the mated to occur once in 107 cell generations (303; 304). DNA, this being the reason why gene regulatory proteins The stability of the λ switch against noise was analysed generally bind to the major groove. Apart from single in terms of a Wentzell-Freidlin approach (305) and by a bp pattern recognition, the protein binding is sensitive simulation analysis (306). The latter confirmed that the to the special surface features of a certain DNA region. currently known molecular mechanisms used in modelling This local structure of DNA needs to be complementary the λ switch appear sufficient. While the classical Shea- to the protein structure. Typical structure patterns (mo- Ackers model based on a statistical mechanical approach tifs) include helix-turn-helix, zinc fingers, leucine zipper, (307) is well established and studied numerically (308), and helix-loop-helix motifs (1). In bacteria, typical DNA- it relies on the knowledge of 13 fundamental Gibbs free binding proteins cover some 20 bps or less. For instance, binding energies composed to 40 different binding states the lac repressor has a cognate sequence of 21 base pairs, of regulatory proteins and RNA polymerase at the two the CAP protein 16, and the λ repressor cI 17 bps (1). promoters of λ. Simulation of the complete λ regulatory Although the interaction with a single nucleotide within system proved the understanding of the mechanisms of such a DNA-protein bond is relatively weak, the sum of the switch (308). Two more recent studies show that the all matching nucleotides reaches appreciable values for λ switch remains stable even when each of the fundamen- the overall binding enthalpy, see below. Moreover, reg- tal Gibbs free energies is varied within its (appreciable) ulatory proteins bound simultaneously can significantly experimental error. Moreover, effects of potential muta- enhance the stability of their individual bonds. tions resulting in more significant changes of the bind- A simple model for the binding interaction goes back ing energies were studied, and it was shown that cer- to the work of Berg and von Hippel (311; 313). Ac- 37 cordingly, the binding free energy is comprised of two nbulk DB contributions: (i) the (average) non-specific binding free koff energy due to electrostatic interaction with the DNA; and (ii) additional binding free energy if the sequence of Bulk the binding site is sufficiently close to the best (perfectly excursion matching) sequence. The transition to the non-specific Sliding binding is supposed to occur via a conformational change kon of the regulatory protein from one that allows more hy- drogen bond-formation to another that permits closer contact between the positive charges of the binding pro- DL tein to the negatively charged DNA backbone (312). This Intersegmental is supported by more recent structural studies: While in transfer the non-specific binding mode the Lac repressor is bound to DNA in a rather loose and fuzzy way (314), it ap- Target pears much more ordered in the specific mode. In fact, in finding j(t) O the latter the protein induces a bend in the DNA (315). In case (ii), the additional binding free energy can, to good approximation, be considered independent and ad- FIG. 38 Schematic of the search mechanisms in equation (52). ditive. reference (316) provides a review of these issues, and derives the following result. Accordingly, to satisfy both the thermodynamic and kinetic constraints of the mechanisms and should not just cut the cell’s own DNA DNA-binding protein interaction, each additional base (320). This does not mean that restriction enzymes do mismatch in comparison to the best sequence amounts not bind non-specifically—in fact, this is an important ingredient of their search process in total analogy to reg- to the loss of roughly 2 kBT , and the optimal value for the transition between best specific binding to the cog- ulatory proteins. However, their sole active role occurs nate site and non-specific binding is shown to be some on complete matching. 16 kBT below the energy of the best binding. This value is quite close to the ≈ 14 kBT found for the difference between specific and nonspecific binding in (317; 318). C. The search process for the specific target sequence The fact that regulatory proteins bind with varying To find their specific (cognate) binding site along the affinity is an important ingredient in gene regulation: genome, DNA-binding proteins such as restriction en- Not all promoters should have the same activity, because zymes or transcription factors have to search megabases some proteins are required by the cell at much higher along the DNA molecule. The high accuracy of gene levels than others. Thus, one given regulatory protein, expression control by binding proteins such as in the λ- that controls the recruitment to the promoters of several switch requires a fast search and recognition of the target genes, can act with different strength depending on the sequence by the proteins. A simple 3-dimensional (3D) degree of matching with the local sequence. search of the target sequence by the proteins is not suf- Non-specific binding can become quite appreciable. It ficient to explain experimentally measured target search was discovered in reference (319) that in the case of rates. It has been suggested relatively early (321; 322) the Lac repressor less than 10% of the proteins were that additional search mechanisms such as 1D sliding unbound. In a more recent study using in vivo data along the genome are needed to account for the actual of the λ switch, it was found that in a lysogen nearly efficiency of the search process. In their pioneering work, 90% of the repressor protein cI is non-specifically bound. Berg, von Hippel and coworkers established a statistical This implies that only 10-20 free cI dimers exist in the model for target search comprising the four fundamental E.coli cell at any time, pointing at the important role steps, as shown in figure 38: (i) 3D macrohops during of non-specific binding in the search process of the cog- which the protein fully detaches from the genome un- nate site addressed in the following subsection. Under til after a volume excursion it rebinds to the DNA (as different conditions, both cI and Cro are always non- a good approximation, the landing site on the DNA af- specifically bound by more than 50%. The corresponding ter a macrohop can be assumed to be equidistributed non-specific binding energies were estimated as 7 kBT and uncorrelated); (ii) microhops during which the pro- (317; 318). tein detaches from the DNA but always stays very close We note that in contrast to regulatory proteins, restric- to it (i.e., the microhop takes place within a cylinder tion enzymes have an approximate all-or-nothing match- whose radius corresponds to the escape distance of the ing condition: If a defined sequence matches the restric- protein from the DNA, see (311)); (iii) 1D sliding along tion enzyme, it will cut, otherwise not. Even a single the genome (while preserving a certain bonding to the mismatch reduces the action of the restriction enzyme DNA due to nonspecific binding); and (iv) intersegmen- by orders of magnitude. This distinction from regulatory tal jumps. The latter are mediated by DNA-loops bring- protein makes sense as restriction enzymes are survival ing two chemically remote segments of the DNA close 38

in Euclidean space, see, for instance, (20) and references therein. A protein like Lac repressor, which can estab- 100000 Simulation data lish bonds to two different stretches of dsDNA simulta- Experimental data, 100mM neously, can then jump from one to the neighbouring 35 segment. This process might lead to a paradoxical dif- 10000 fusion behaviour (323; 324). However, if the conforma- tional changes in the DNA are not too slow, both the ] 1000 bulk mediated macrohops and the intersegmental trans- -1 [s fer lead to fast mixing of the enzymes’ positions along a k the chain (as it was shown for the related problem in 100 (325)), and on the mean-field level can be described by a

desorption followed by the absorption at a random place. 10 Recently, there has been renewed interest in the target- ing problem, both theoretically (see, for instance, (316; 1 326; 327; 328)) and experimentally (e.g., (329; 330)), 1e-09 1e-08 1e-07 including single molecule studies (260; 261; 331). De- C [M] spite the extensive knowledge of specific binding rates and both specific and non-specific binding free energies, the precise relative contributions of the different search FIG. 39 Dimensional binding rate ka in 1/s as function of mechanisms (and, to some extent, also the stringent cri- protein concentration C in M, for parameters corresponding to 100 mM salt. The fitted 1D diffusion constant for sliding teria to define these four elementary interactions) are not −9 2 along the dsDNA is D1d = 3.3 · 10 cm /sec, located nicely fully resolved. Moreover, it has been suggested that un- within the experimental value 10−8 . . . 10−9cm2/sec (259). der tight(er) binding conditions, the sliding of the protein becomes subdiffusive due to the local structure landscape of a heteropolymer DNA (332). This complication, how- able to quickly find specific locations on DNA molecules ever, is expected to be relaxed in a more loosely bound that are undergoing replication, and which have large search mode of the searching protein (326). We here sections of single-stranded DNA exposed for gp32 bind- adopt the latter view of normal diffusion, which is cor- ing. The resulting nonlinear concentration dependence of roborated by the experimental study in the next subsec- gp32 binding will likely have significant effects on gp32’s tion. ability to find its replication sites as well as its ability to recruit other proteins during replication. If these non- linear effects also occur for TFs, this characteristic will strongly affect regulatory processes governed by protein D. A unique situation: Pure one-dimensional search of SSB mutants binding. Results from the single DNA overstretching experi- ment are shown in figure 39 along with the results from In previous studies, the 1D sliding problem had al- the theoretical and simulations analysis from references ways been considered as a problem of 3D diffusion which (333). The scaling of search rate as function of concen- is enhanced by 1D diffusion. Thus, workers such as Berg, tration is described by the relation Winter, and von Hippel (311) assumed that proteins non- specifically bound would on average unbind before find- 2 ka = D1dn0 (51) ing their specific binding sites. This results in an en- hancement of specific binding rates that is proportional obtained for the pure 1D search of random walkers of 2 to the 1D sliding rate, but the overall specific binding line density n0 searching along the DNA. For low con- rate depends linearly on protein concentration. These centrations, the McGhee and von Hippel isotherm (334) studies neglect the possibility that the protein finds its predicts a linear relation between n0 and the volume con- 2 specific site before unbinding. Given the experimental centration C; thus, ka ∝ C . The experimental evidence conditions under which transcription factor binding has for the purely linear search process, as shown in figure 39 been previously studied, this approximation is appropri- for 100 mM salt, was found for a large range of salt con- ate. However, as demonstrated in (333), this mechanism, centrations, see references (261; 333) for details. The case in which the unbinding rate is much lower than the spe- of high line density of proteins was discussed in (335). cific binding rate, occurs for the 1D search of DNA by the single-stranded DNA binding protein T4 gene 32 protein (gp32). This fast 1D search rate is essential for gp32 to be E. L´evy flights and target search

We now address the general search process with inter- change of 1D and 3D diffusion, and intersegmental jumps. 35 Possibly, also other binding proteins are able to perform inter- To this end, we first quickly review the definition of L´evy segmental jumps. flights (336; 337; 338; 339; 340). 39

1

= ¿=¾¸ EÕº ´½4µ

L´evy flights (LFs) are random walks whose jump «

= ¿=4¸ EÕº ´½4µ

lengths x are distributed like λ(x) ≃ |x|−1−α with ex- 0.9 «

« = ½=4¸ EÕº ´½4µ

= ¿=4¸ EÕº ´½6µ

ponent 0 < α < 2 (150). Their probability density 0.8 «

« = ½=4¸ EÕº ´½7µ to be at position x at time t has the characteristic 0.7 ∞ iqx α function P (q,t) ≡ e P (x, t)dx = exp(−DL|q| t), 0.6

−∞ ¼ ÓÒ

a consequence of the generalised central limit theorem =k 0.5 ÓÔØ R Ó« (342; 343); in that sense, LFs are a natural extension k of normal Gaussian diffusion (α = 2). LFs occur in a 0.4

wide range of systems (338); in particular, they repre- 0.3

sent an optimal search mechanism in contrast to locally 0.2 oversampling Gaussian search (341). Dynamically, LFs 0.1 can be described by a space-fractional diffusion equa- 0 −2 −1 0 1 2

α α 10 10 10 10 10

« =¾ ´½ « =¾µ

tion ∂P/∂t = D ∂ P (x, t)/∂|x| , a convenient basis to ¼

D =´D µ

L k

ÓÒ

Ä B introduce additional terms, as shown below. DL is a α diffusion constant of dimension cm /sec, and the frac- FIG. 40 Optimal choice of off rate koff as function of the LF tional derivative is defined via its Fourier transform, diffusion constant, from numerical evaluation of the model in α α α opt F {∂ P (x, t)/∂|x| } = −|q| P (q,t) (338; 339; 340). reference (344). The circle on the abscissa marks where koff LFs exhibit superdiffusion in the sense that h|x|ζ i2/ζ ≃ becomes 0 in the case α< 1/2. 2/α DLt (0 <ζ<α) (338), spreading faster than the linearly growing mean squared displacement of standard diffusion (α = 2). A prime example of an LF is linear first arrival time T to the target; in particular, to find the particle diffusion to next neighbour sites on a fast folding value of koff that minimises T . (‘annealed’) polymer that permits intersegmental jumps The various regimes of target search embodied in equa- at chain contact points (see figure 38) caused by polymer tion (52) are discussed in detail in references (344; 345). looping (323; 324). In fact, the contour length |x| stored The main result for the efficiency of the related search in a loop between such contact points is distributed in 3D process is summarised in figure 40, i.e., which protein like λ(x) ≃ |x|−1−α, where α = 1/2 for Gaussian chains unbinding rate koff optimises the mean search time T . Three regimes can be distinguished: (θ solvent), and α ≈ 1.2 for self-avoiding walk chains opt ′ (good solvent) (154). (i) Without L´evy flights, we obtain koff = kon: the In our description of the target search process, we use proteins should spend equal amounts of time in bulk and the density per length n(x, t) of proteins on the DNA on the DNA. This corresponds to the result obtained for as the relevant dynamical quantity (x is the distance single protein searching on a long DNA (326; 327). along the DNA contour). Apart from intersegmental (ii) For α > 1, i.e., when DNA is in the self-avoiding transfer, we include 1D sliding along the DNA with dif- regime, we find fusion constant DB, protein dissociation with rate koff opt ′ koff ∼ (α − 1)kon : (53) and (re)adsorbtion with rate kon from a bath of proteins of concentration nbulk. The dynamics of n(x, t) is thus The optimal off rate shrinks linearly with decreasing α. governed by the equation (344) (iii) For α< 1, i.e., when DNA leaves the self-avoiding phase (e.g., by lowering the temperature or introducing ∂ ∂2 ∂α opt n(x, t)= D + D − k n(x, t) attractive interactions) the value of koff approaches zero ∂t B ∂x2 L ∂|x|α off   as the frequency of intersegmental jumps (∝ DL) in- +konnbulk − j(t)δ(x). (52) creases: The L´evy flight mechanism becomes so efficient that bulk excursions become irrelevant. At α =1/2, the Here, j(t) is the flux into the target located at x = 0. case of the ideal Gaussian chain, we observe a qualitative opt We determine the flux j(t) by assuming that the tar- change: When α < 1/2, the rate koff reaches zero for get is perfectly absorbing: n(0,t) = 0 (346). Be ini- finite values of the rate for intersegmental jumps. Note tially the system at equilibrium, except that the tar- that when α < 1, the spread of the L´evy flight (≃ t1/α) get is unoccupied; then, the initial protein density is grows faster than the number of sites visited (≃ t), 36 n0 = n(x, 0) = konnbulk/koff . The total number of rendering the mixing effect of bulk excursions insignif- particles that have arrived at the target up to time t is icant. A scaling argument to understand the crossover J(t) = t dt′ j(t′). We derive explicit analytic expres- at α =1/2 relates the probability density of first arrival 0 1/α sions for J(t) in different limiting regimes, and study the with the width (≃ t ) of the Green’s function of a L´evy −1/α general caseR numerically. We use J(t) to obtain the mean flight pfa ≃ t . We see that the associated mean ar- rival time becomes finite for 0 <α< 1/2, even for the infinite chain limit considered here. We remark that this model is valid for an annealed 36 Note that the dimension of the on and off rates differ; while DNA only. This means that the chain can equilibrate (at −1 2 [koff ] = sec , we chose [kon]= cm /sec. least, locally) on the typical time scale between interseg- 40

mental jumps. Even though real DNA in solution might ceeded in 1958 (357). Modern organic chemistry has seen not be fully annealed, features of this analysis will reflect the development of refined synthesis methods to generate on the target search. A more detailed study of different topological molecules. regimes of DNA is under way.

A. Functional molecules F. Viruses—extreme nanomechanics In parallel to the miniaturisation in electronics (358) Viruses have played an important role in the discovery and the possibility of manipulating single (bio)molecules of the mechanisms underlying gene regulation, see, for in- (359), supramolecular chemistry which makes use of stance, reference (297). From a nanoscience perspective, chemical topology properties is coming of age (360; 361). viruses are of interest on their own part. During the as- Thus, rotaxane-type molecules are believed to be the sembly of many viruses, the viral DNA of several µm building blocks for certain nanoscale machines and mo- length is packaged into the capsid, the protein container tors (362), so-called hermaphrodite molecules have been making up most of the virus, by a motor protein. This shown to perform linear relative motion (“contraction motor packages the DNA by exerting forces of up to 60 and stretching”) (363), and pirouetting molecules have pN or more, causing pressures building up in the capsid been synthesised (364). Moreover, topological molecules of the order of 6 MPa (347; 348). The size of the cap- are thought to become components for molecular elec- sid spans few tens of µm, and is therefore comparable to tronics switching devices in memory and computing ap- the persistence length of DNA (347; 349; 350; 351; 352). plications (365; 366). These molecular machines are usu- Therefore, fluctuation-based undulations are suppressed, ally of lower molecular weight, and their behaviour is es- and the chain can be approximately thought of as being sentially energy-dominated in the sense that their confor- wound up helically like thread on a bobbin, or like a ball mations and dynamical properties are governed by exter- of yarn. Ultimately, a relatively highly ordered 3D con- nal and thermal activation in an energy landscape. The figuration of the DNA inside the capsid is achieved, which understanding of the physical properties and the theoreti- under certain conditions may even lead to local crystalli- cal modelling of such designer molecules and their natural sation of the DNA (349; 350; 352; 353; 354; 355; 356). It biological counterparts has increasingly gained momen- is generally argued that this ordered arrangement helps tum, and the stage is already set for the next generation to avoid the creation of entanglements or even knots of of applications (358; 359; 360; 361; 362; 363; 364; 365; the wound-up DNA, thus enabling easy ejection, i.e., re- 366; 367; 368; 369; 370; 371; 372). lease of the DNA once the phage docks to a new host cell; In reference (230) we introduced some basic concepts this ejection is not assisted by the packaging motor, but for functional molecules whose driving force is entropic it can be facilitated by host cellular DNA polymerase, rather than energetic, see also the more recent pub- which starts to transcribe the DNA and thereby pulls it lications in chemistry journals (373; 374). Entropy- out of the capsid (1; 2; 96; 354). Details on the packaging functional molecules will be of higher molecular weight energetics can be found in reference (353), and the works (hundred monomers or above) in order to provide suf- cited therein. Model calculations for the entropy loss, ficient degrees of freedom such that entropic effects can binding and twist energy, and electrostatic forces that determine the behaviour of the molecule. The potential need to be overcome on packaging reveal, that at higher for such entropy-driven functional molecules can be an- packaging ratios the packaging force almost exclusively ticipated from the classical Gibbs Free energy comes from the electrostatic repulsion. F = U − TS ; (54)

VII. FUNCTIONAL MOLECULES AND NANOSENSING in functional molecules, F is minimised mainly by vari- ation of the internal energy U representing the shape Complex molecules can be endowed with the distinct of the energy landscape of the functional unit. New feature that they contain subunits which are linked to types of molecules were proposed for which F is min- each other mechanically rather than chemically (357). imised by variations of the entropy S, while the energies The investigation of the structure and properties of such and chemical bondings are left unchanged (230). The interlocked topological molecules is subject of the grow- entropy-functional units of such molecules can be specifi- ing field of chemical topology (130); while speculations cally controlled by external parameters like temperature, about the possibility of catenanes 37 (Olympic rings) date light flashes, or other electromagnetic fields (360; 361). back to the early 20th century lectures of Willst¨atter, We note that DNA is already being studied as a macro- the actual synthesis of catenanes and rotaxanes 38 suc- molecular prototype building block for molecular ma- chines (369). A typical example is the molecule shown in figure 41. According to the arrangement of the sliding rings 1 and 2, 37 catena (lat.), the chain. this compound exhibits the unique feature of a molecule 38 rota (lat.), the wheel; axis (lat.), the axle. that it can slide laterally. Suggested as precursors of 41

  3 2                1        FIG. 41 Molecular muscle consisting of two interlocked rings   1 and 2 with attached rod-like molecules. Within this struc-   ture, sliding rings, 3, can be placed, which, if activated, tend   to contract the muscle by entropic forces.      molecular muscles (363), this compound could be pro-   pelled with internal entropy-motors, which entropically  x=xT  adjust the elongation of the muscle. In the configuration   shown in figure 41, the sliding ring 3 creates, if activated,  an entropic force which tends to contract the “muscle”; at T = 300K and on a typical scale x = 10 nm, the entropic force kB T/x is of the order of pN, and thus comparable to the force created in biological muscle cells (375). Molecular muscles of such a make can be viewed FIG. 42 Molecular beacon based on local DNA denaturation. The green blobs may represent single-stranded DNA bind- as the nano-counterpart of macroscopic muscle models ing proteins, or more specifically binding proteins binding or proposed by de Gennes (376), in which the contraction other molecule to a custom designed DNA sequence along the is based on the entropy difference between the isotropic denaturation fork. Bound proteins stabilize the denatured and nematic phases in liquid crystalline elastomer films fork and change the spectrum of the beacon. (377). Similarly, one might speculate whether the DNA helix- coil transition (262) in multiplication setups could be fa- similar to the ones described in references (249; 281) cilitated in the presence of pre-ring molecules which in works as follows. As long as the dsDNA is intact, flu- vitro attach to an opened loop of the double strand and orophore and quencher are in close proximity. Once close, creating an entropy pressure which tends to open they come apart from one another when the denaturation up the vicinal parts of the DNA which are still in the he- wedge opens up, the incident laser light causes fluores- lix state. Finally, considering molecular motors, it would cence of the dye. The on/off blinking of this ”molecular be interesting to design an externally controllable, purely beacon” can be monitored in the focus of a confocal mi- entropy-driven rotating nanomotor. croscope, or, depending on the intensity of the emitted Numerous additional nanoapplications of biopolymers light, by a digital camera. The blinking renders immedi- appear in current literature. An interesting example is ate information about the state of the bp, that is tagged the nanomotor created by a DNA ring in a periodically by the dye-quencher pair. Fluorescence, that is, indi- driven external field, for instance, a focused light beam cates that the bp is currently broken. It is therefore ad- inducing localised temperature variations (378; 379). vantageous to define the random variable I(t) with the The speeds possibly attained by such a device are of the property order of those reached by biological organisms. Such a 0 if base-pair at x = x is closed nanorotor could be used to stir smallest volumes in higher I(t)= T , (55) viscous environments. 1 if base-pair at x = xT is open  and in experiments one typically measures the corre- B. Nanosensing sponding blinking autocorrelation function

The advances in minituarization of reactors and de- A(t)= hI(t)I(0)i − hIieq2 , (56) vices also brings along the need of probes, by which where hIi is the (ensemble) equilibrium value, or its smallest volumes can be tested. For instance, microar- eq spectral decomposition rays used in genomics require sensors to detect the pres- ence of certain proteins (often at small concentrations) ∞ t in a microdish, without disturbing the environment in A(t)= f(τ)exp − dτ, (57) τ the small volume too much. Similarly, single molecule Z0   experiments require specific local detection possibilities. where A fine example for a potential nanosensore is the blink- 2 ing behaviour of a fluorophore-quencher pair mounted on f(τ)= Tp δ (τ − τp) . (58) the denaturation wedge as shown in figure 42. This setup, Xp6=0 42

1 croscopes become important tools to manipulate and 0.9 probe biomolecules and their interaction even on the sin- κ=0.1 0.8 gle molecule level. Secondly, biomolecules are entering κ=0.5 the stage as nanotools such as nanosensors, functional

) 0.7 p molecules, or highly sensitive force transducers. τ κ=1 − τ 0.6 The possibility to perform controlled experiments on ( κ δ =3

2

) 0.5 biomolecules, for instance, to measure the force-extension p no SSBs curves of single biopolymers, also opens up novel possi-

)=(T 0.4 τ

f( bilities to test new physical theories. The foremost ex- 0.3 amples may be the exploration of persistence lengths and 0.2 other polymer physics properties, and the statistical me-

0.1 chanical concepts relevant for small system sizes. The latter are known under the keyword of the Jarzynski re- 0 0 50 100 150 200 250 300 350 400 450 lation connecting the non-equilibrium work performed on τ −1 (in units of k ) a physical system with the difference in the thermody- 1 namic (i.e., equilibrium) potential between initial and fi- 0.9 nal states (380; 381). However, there exist by now several similar theories addressing different physical quantities, 0.8 γ=2 such as the concept of entropy production along a single 0.7 particle trajectory (382). 0.6 This review summaries fundamental physical proper-

A(t) 0.5 κ=0.1 ties of DNA, and their relevance for both biological pro- cesses and technological applications. The extensive list κ 0.4 =0.5 of references will be useful for further studies on specific 0.3 κ=1 topics covered herein. We are confident that the role of 0.2 κ=3 biomolecules in technology, not at least for biomedical

0.1 no SSBs applications, will experience a dramatic increase during the coming years and will enable us to extend current 0 −2 −1 0 1 2 3 10 10 10 10 10 10 physical understanding of fundamental processes. t (in units of k−1)

FIG. 43 Spectral response of the denaturation beacon in Acknowledgments the presence of single-stranded DNA binding proteins. Top: Relaxation time spectrum, bottom: blinking autocorrelation RM and TA acknowledge many helpful and enjoy- function. able discussions with Jozef Adamcik, Audun Bakk, Suman Banik, Erika Ercolini, Giovanni Dietler, Hans Fogedby, Yacov Kantor, Mehran Kardar, Joseph Klafter, is called the relaxation time spectrum. Oleg Krichevsky, Michael Lomholt, Maxim Frank- Figure 43 shows an example for the achievable sen- Kamenetskii, Igor Sokolov, Andrzej Stasiak, Francesco sitivity of such nanobeacons, in an example where the Valle, and Mark Williams. RM acknowledges partial denaturation wedge is in solution together with a certain funding from the Natural Sciences and Engineering Re- concentration (proportional to κ, compare Sec. IVG) search Council (NSERC) of Canada and the Canada of selectively single-stranded DNA binding proteins, as Research Chairs programme. TA acknowledges partial discussed previously. It is distinct how both measur- funding from the Wallenberg foundation. AH acknowl- able signals, A(t) and f(τ) change with varying SSB- edges funding by the AFOSR (FA9550-05-1-0472) and concentration. by the NIH (SCORE program GM068855-03S1). SDL acknowledges funding by the Joint DMS/NIGMS Math- ematical Biology Initiative (NIH GM 67242) and by the VIII. SUMMARY National Foundation for Cancer Research through the Yale-NFCR Center for Protein and Nucleic Acid Chem- Biopolymers such as DNA, RNA, and proteins are istry. indispensable for their specificity and robustness in all forms of life. Given their detailed physical proper- ties such as DNA’s persistence length of some 50nm or its local denaturation in nano-bubbles already at room temperature, and biochemically relevant interfaces such as 10-20 bps, they deeply stretch into the nanoscience domain. This statement is twofold in the following sense. Firstly, nanotechniques such as atomic force mi- 43

Biographical notes Andreas Hanke received his doctoral degree in physics from the University of Ralf Metzler received his Wuppertal in Germany, doctoral degree in physics FRG. He then went to the from the University of Ulm, Massachusetts Institute of FRG. He then went to Tel Technology (MIT) for his Aviv University as Humboldt postdoctoral studies with Feodor Lynen fellow and later Mehran Kardar. After a as Minerva Amos de Shalit period as visiting scientist fellow, to work with Joseph with Michael Schick at the Klafter. As DFG Emmy University of Washington, Se- Noether fellow, after a pe- attle, he moved to a postdoc- riod as visiting scientist at toral position in John Cardy’s group at the University the University of Illinois at of Oxford, UK, followed by postdocs with Udo Seifert in Urbana-Champaign with Pe- Stuttgart, FRG, and again with John Cardy in Oxford. ter Wolynes, Ralf moved to In 2004 Andreas became Assistant Professor of Physics the Massachusetts Institute of at the University of Texas at Brownsville. He is also Ad- Technology (MIT) in Cambridge, MA, where he worked junct Assistant Professor at the Department of Physics with Mehran Kardar. In 2002 Ralf was appointed As- at The University of Texas at Dallas and a member of sistant Professor at the Nordic Institute for Theoretical the Institute of Biomedical Sciences and Technology at Physics (NORDITA) in Copenhagen, Denmark. In sum- UT Dallas. In addition, his research includes summer mer 2006, Ralf assumed his post as Associate Professor appointments at the UT Dallas NanoTech Institute. An- and Canada Research Chair in Biological Physics at the dreas has worked in the fields of mesoscopic quantum University of Ottawa, Canada. Ralf works extensively systems, soft condensed matter physics, and biological on biological physics problems and anomalous stochas- physics. Currently he is building up a theory division tic processes. These include DNA physics such as DNA in Molecular Biophysics and Nanoscience at the Univer- denaturation, DNA topology and the role of DNA confor- sity of Texas at Brownsville. In his spare time, Andreas mations in gene regulation, as well as anomalous diffusion enjoys latin and salsa music and dancing, all kinds of in biological systems. The latter questions are connected outdoor activities, and excursions to Mexico. with L´evy statistics leading to, respectively, subdiffusion or L´evy flights. In his spare time, Ralf enjoys the com- pany of his daughter and wife, he listens to classical music and is a keen reader of murder mysteries. In Canada, Ralf is looking forward to nature walks, cross-country skiing and ice-skating. Yongli Zhang is a postdoctoral fellow of the Jane Coffin Childs Memorial Fund for Med- Tobias Ambj¨ornsson re- ical Research in Carlos ceived his PhD from Chalmers Bustamantes group at and Gothenburg University, University of California where he worked on the at Berkeley. He received electromagnetic response of his Ph.D. in Molecular matter. In 2003 he be- Biophysics and Biochem- came a NORDITA postdoc- Biochemistry from in 2003, under super- toral fellow, working with vision of Donald Crothers. In December 2006, he will Ralf Metzler on the modelling move to the Department of Physiology and Biophysics of single biomolecule prob- at Albert Einstein College of Medicine as an assistant lems such as DNA breathing professor. Yongli has been doing both experimental and and biopolymer translocation theoretical research in DNA mechanics, chromatin dy- through nanopores. Tobias re- namics, and mechanism of chromatin remodeling. His recently received a prestigious fellowship from the Wal- lab will mainly use single-molecule techniques, such as lenberg foundation, allowing him to join the group of optical tweezers and atomic force microscopy, to study Robert Silbey at MIT in autumn 2006, to pursue studies mechanism of molecular motors and dynamics of macro- on nanosensors. Tobias enjoys listening to pop music, molecular assemblies. travelling, and spending time with friends. 44

Stephen Levene is Associate Professor of Molecular and Cell Biology at The Univer- sity of Texas at Dallas. Dr. Levene received his Ph.D. in Chemistry from Yale University in 1985 and was an American Cancer Soci- ety postdoctoral fellow with Bruno Zimm at University of California, San Diego until 1989. He then spent one year as a staff scientist at the Human Genome Center at Lawrence Berkeley Laboratory and was a Program in Mathematics and Molecular Biology Fellow at University of California, Berkeley in Nicholas Cozzarellis laboratory. Dr. Levene’s research interests are in the area of nucleic-acid structure and flexibility, mechanisms of DNA recombination, the structural organization of human telomeres, and applica- tions of these areas to biotechnology. He leads the focus group in Molecular Diagnostics and Bioimaging in the UT-Dallas Institute of Biomedical Sciences and Technol- ogy, is recipient of an Obermann Interdisciplinary Re- search Fellowship, a member of the Biophysical Society, and has served on several NIH study sections. In addi- tion to scientific pursuits, Dr. Levene is an avid snow skier and cyclist, having previously competed in both disciplines. 45

APPENDIX: A polymer primer. For large N, due to the independence of individual ai, this probability density function will acquire a Gaussian In this section, we introduce some basic concepts from shape, polymer physics. Starting from the random walk model, we define the fundamental measures of a polymer chain, d d/2 dr2 p(r)= exp − , (66) before introducing excluded volume. For more details, 2πNa2 2Na2 we refer to the monographs (24; 146; 191; 383).     The simplest polymer model is due to Orr (384). It where the normalisation is such that hr2i = Na2. From models the polymer chain a a random walk on a periodic this expression, we can deduce that the number of degrees lattice with lattice spacing a. Then, each monomer of of freedom of a closed random walk chain is proportional −d/2 index i is characterised by a position vector Ri with i = to N , the entropy loss suffered by a chain subject to 0, 1,...,N. The distance between monomers i and i +1 the constraint r = 0. On a general lattice, is called ai+1 = Ri+1 −Ri. Consequently, the end-to-end N −d/2 vector of the polymer is ω ≃ µ N (67)

r = ai. (59) with the connectivity constant µ, a measure for in how i many different directions the next bond vector can point X (µ =2d in a cubic lattice). At fixed end-to-end distance, Different ai have completely independent orientations, the entropy of the random walk becomes S(r) = S0 − such that we immediately obtain the average (h·i over dr2/(2Na2) where S absorbs all constants. For the free different configurations) squared end-to-end distance 0 energy F (r)= E − kBTS(r) we therefore obtain 2 2 2 2 R0 = hr i = hai · aj i = hai i = Na . (60) 2 F F dkBT r i,j i (r)= 0 + 2 , (68) X X 2R0 1/2 R0 ≃ N a is a measure for the size of the random walk. An alternative measure of the size of a polymer chain i.e., the random walk likes to coil, the restoring force −∇F (r) being linear in r. This is often called the en- is provided by its radius of gyration Rg, which may be measured by light scattering experiments. It is defined tropic spring character of a Gaussian polymer. Note that by the ‘spring constant’ increases with temperature (‘en- tropy elasticity’). N In this random walk model of a polymer chain, it is 2 1 2 R = h(Ri − RG) i, (61) straightforward to define the persistence length of the g 1+ N i=0 a X chain. By this we mean that successive vectors i are and measures the average squared distance to the centre not independent, but tend to be parallel. Over long of gravity, distance, this correlation is lost, and the chain behaves like a random walk. Due to the quantum chemistry of N 1 the monomers, an adjacent pair of vectors ai, ai+1 in- R = R . (62) G 1+ N i cludes preferred angles, for carbon chains leading to the i=0 X trans/gauche configurations. This feature is captured Expression (61) can be rewritten as schematically in the freely rotating chain as depicted in figure 44. Following (191), we can obtain the correlation N−1 N han · ami as follows. If we fix all vectors am,..., an−1, 2 −2 2 Rg =(1+ N) h(Ri − Rj) i. (63) then the average hani = an−1 cos θ. am,am+1,...,an−1 fixed i=0 j=i+1 X X Multiplication by am produces j With Rj − Ri = an, one can easily show that n=i+1 ham · ania a = am · an−1 cos θ. (69) 2 2 m,..., n−1 fixed Rg = a N(N + 2)/[6(N + 1)]. For large N, that is, 2 P a Averaging over the a ,..., a leads to the recursion Rg ≃ 6 N, and therefore: m n−1 relation ham · ani = ham · an−1i cos θ. With the initial 1/2 2 2 Rg ∼ R0 ∼ aN . (64) condition ha i = a , we find

On a cubic lattice in d dimensions, each step can go 2 |n−m| ham · ani = a cos θ. (70) in 2d directions, and for a general lattice, each vector ai will have µ possible directions. The number of distinct Thus, if θ = 0, we obtain a rigid rod behaviour, while N N walks with N steps is therefore µ . Denote N (r) the for θ 6= 0, there occurs an exponential decay of the cor- number of distinct walks with end-to-end vector r, the relation between any two bond vectors an and am. This probability density function for a given r is defines a length scale

NN (r) a p(r)= . (65) ℓp ≡ , (71) r NN (r) log cos θ P 46

The total free energy becomes an F N 2 R2 ≃ v(T ) + , (75) T Rd Na2

d+2 2 3 with a minimum at RF = v(T )a N , so that the Flory an−1 radius scales like 3 R ∼ AN ν , therefore ν = . (76) F 2+ d

an−2 The values of the exponent ν(d =2)=3/4 and ν(d = θ 3) = 3/5 are extremely close to the best known values 0.75 and 0.588.39

FIG. 44 Freely jointed chain, in which successive bond vectors include an angle θ. Polymer networks.

A linear excluded volume polymer chain has the size the ‘persistence length’ of the chain. It diverges for ◦ 2 2ν θ → 0, while for θ = 90 , it vanishes, corresponding to Rg ≃ AN (77) the random walk model discusses above (‘freely jointed chain’). As with ν = 0.75 in d = 2, and ν = 0.588 in d = 3. Its number of degrees of freedom is given in terms of the ∞ ∞ 2 k 2 1+cos θ configuration exponent γ such that han+k · ani = a 1+2 cos θ = a , ! 1 − cos θ k=−∞ k=1 ω ≃ µN N γ−1, (78) X X (72) 2 2 we find R0 = a N(1+cos θ)/(1 − cos θ), i.e., statistically, the freely jointed chain behaves the same as the random where γ =1.33 in d = 2 and γ =1.16 in d = 3. walk chain, but with a rescaled monomer length. The Remarkably, similar critical exponents can be obtained statistical unit in a polymer chain is often taken to be for a general polymer network of the type shown in figure 79, as originally by Duplantier (147; 154), compare also the Kuhn length ℓK =2ℓp. Above chain models are often referred to as being and in references (157; 385): In a network G consisting of N chain segments of lengths s1,...,sN and total length phantom, i.e., the chain can freely cross itself. A physical N polymer possesses an excluded volume and behaves like L = i=1 si, the number of configurations ωG scales as a so-called self-avoiding chain. Mathematically, this can P s s be modelled by self-avoiding walks. To include the ma- ω (s ,...,s )= µLsγG −1Y 1 ,..., N−1 , (79) jor effects, it is sufficient to follow a simple argument due G 1 N N G s s  N N  to Flory. Consider a chain with unknown radius R and d internal monomer concentration cint ≃ N/R . Assum- where YG is a scaling function, and µ is the effective con- ing that the self-avoiding character is due to monomer- nectivity constant for self-avoiding walks. The exponent monomer interactions, the repulsive energy is propor- γG is given by γG =1 − dνL + N≥1 nN σN , where ν is tional to the squared concentration, i.e., the swelling exponent, L is the number of independent P loops, nN is the number of vertices with N outgoing legs, 1 2 Frep = T v(T )c , (73) and σN is an exponent associated with such a vertex. In 2 d = 2, this exponent is given by (147; 154) with the excluded volume parameter v(T ) (v(T ) ≡ (1 − 2χ)ad in Flory’s notation, where the θ condition (2 − N)(9N + 2) σN = . (80) χ = 1/2 corresponds to ideal chain behaviour). To ob- 64 tain the total averaged repulsive energy Frep|tot, we need to average over c2. In a mean field approach, we take In the dense phase in 2D (209; 210; 212; 213; 214; 386), hc2i −→ hci2 ∼ c2 . We therefore obtain and at the Θ transition (387), analogous results can be int obtained. N 2 F ≃ T v(T )c2 Rd = T v(T ) , (74) rep|tot int Rd favouring large values of R. This ‘swelling’ competes with 39 An interesting discussion about the flaws underlying this reason- 2 2 the entropic elasticity contribution Fel ≃ T R /(Na ). ing can be found in reference (146). 47

monomers, a different scaling behaviour emerges if the system is not below but right at the Θ point (210). In this case the number of configurations of a general net- work G is given by

γ −1 s s ω (s ,...,s ) ∼ µL s G Y 1 ,..., N−1 , (84) G 1 N N G s s  N N  with the network exponent

γG =1 − dνL + nN σN . (85) NX≥1 FIG. 45 Polymer network G with vertices (•) of different order N, where N self-avoiding walks are joined (n1 = 5, n3 = 4, Overlined symbols refer to polymers at the Θ point. In n4 = 3, n5 = 1). d = 2, ν =4/7 and σN = (2 − N)(2N + 1)/42 (210).

First, consider the dense phase in 2D. If all segments have equal length s and L = N s, the configuration num- 40 ber ωG of such a network scales as (209; 210)

γG ωG (s) ∼ ω0(L) s , (81)

where ω0(L) is the configuration number of a simple ring of length L. For dense polymers, and in contrast to the dilute phase or at the Θ point, ω0(L) (and thus ωG) de- pends on the boundary conditions and even on the shape of the system (210; 212; 213; 214; 386). For exam- ple, for periodic boundary conditions (which we focus on in this study) corresponding to a 2D torus, one finds L Ψ−1 ω0(L) ∼ µ L with a connectivity constant µ and Ψ = 1 (210). However, the network exponent

γG =1 − L + nN σN (82) NX≥1 is universal and depends only on the topology of the network by the number L of independent loops, and by the number nN of vertices of order N with vertex expo- 2 nents σN = (4 − N )/32 (209; 210). For a linear chain, the corresponding exponent γlin = 19/16 has been veri- fied by numerical simulations (210; 388). For a network made up of different segment lengths {si} of total length N L = i=1 si, equation (81) generalises to (cf. section 4 in reference (210)) P s s ω (s ,...,s ) ∼ ω (L) sγG Y 1 ,..., N−1 , G 1 N 0 N G s s  N N  (83) which involves the scaling function YG . For polymers in an infinite volume and endowed with an attractive interaction between neighbouring

40 Note that due to the factor ω0(L) the exponent of s is γG , and not γG − 1 like in the expressions used in the dilute phase (154) −dν or at the Θ point, for which ω0(L) ∼ L . However, for 2D dense polymers one has dν = 1, so that both definitions of γG are equivalent, cf. section 3 in reference (210). 48

References [33] A. V. Vologodskii, S. D. Levene, K. V. Klenin, M. Frank-Kamenetskii, and N. R. Cozzarelli, [1] B. Alberts, A. Johnson, J. Lewis, M. Raff, K. Roberts, J. Mol. Biol. 227, 1224 (1992). and P. Walter, Molecular biology of the cell (Garland, [34] H. Tsen, A. Hanke, and S. D. Levene, in preparation New York, 2002). (2006). [2] D. P. Snustad and M. J. Simmons, Principles of Genet- [35] M. Pillsbury, H. Orland, and A. Zee, Phys. Rev. E. 72, ics (John Wiley & Sons, New York, 2003). 011911 (2005). [3] A. Kornberg, DNA synthesis (W. H. Freeman, San [36] H. Orland and A. Zee, Nucl. Phys. B 620, 456 (2002). Francisco, CA, 1974). [37] M. Baiesi, E. Orlandini, and A. L. Stella, Phys. Rev. [4] A. Kornberg and T. A. Baker, DNA Replication Lett. 91, 198102 (2003). (W. H. Freeman, New York, 1992). [38] R. Schleif, Annu. Rev. Biochem. 61, 199 (1992). [5] R. E. Franklin and R. G. Gosling, Nature 171, 740 [39] K. S. Matthews, Microbiol. Rev. 56 123 (1992). (1953). [40] T. M. Dunn, S. Hahn, S. Ogden, and R. F. Schleif, Proc. [6] J. D. Watson and F. H. C. Crick, Nature 171, 737 Nat. Acad. Sci. USA 81, 5017 (1984). (1953). [41] M. Geanacopoulos, G. Vasmatzis, V. B. Zhurkin, and [7] L. R. G. Treloar, The physics of rubber elasticity S. Adhya, Nat. Struct. Biol. 8, 432 (2001). (Clarendon Press, Oxford, 1975). [42] M. C. Mossing and M. T. Record, Science 233 889 [8] F. Crick, Nature 227, 561 (1970). (1986). [9] F. C. H. Crick, in Symp. Soc. Exp. Biol., The Biological [43] P. Valentin-Hansen, B. Albrechtsen, and J. E. Love Replication of Macromolecules, XII, 138 (1958). Larsen, Embo J. 5 2015 (1986). [10] V. A. Bloomfield, D. M. Crothers, and I. Tinoco, Phys- [44] K. Rippe, M. Guthold, P. H. von Hippel, and C. Bus- ical Chemistry of Nucleic Acids (Harper & Row, New tamante, J. Mol. Biol. 270, 125 (1997). York, 1974). [45] K. Ogata, K. Sato, and T. Tahirov, Curr. Opin. Struct. [11] R. F. Gesteland and J. F. Atkins, editors, The RNA Biol. 13, 40 (2003). world (Cold Spring Harbor Laboratory Press, Cold [46] G. D. van Duyne, Annu. Rev. Biophys. Biomolec. Spring Harbor, New York, 1993). Struct. 30, 87 (2001). [12] M. D. Frank-Kamenetskii, Unraveling DNA: The Most [47] H. Benjamin and N. Cozzarelli, in The Robert A. Welch Important Molecule of Life (Perseus, Cambridge, MA, Foundation Conferences on Chemical Research, vol. 26, 1997). p. 107 (1986). [13] M. D. Frank-Kamenetskii, Phys. Rep. 288, 13 (1997). [48] H. Tsen and S. D. Levene, Proc. Natl. Acad. Sci. USA. [14] J. F. Marko and E. D. Siggia, Macromolecules 28, 8759 94, 2817 (1997). (1995). [49] M. A. Watson, D. M. Gowers, and S. E. Halford, J. Mol. [15] J. F. Marko and E. D. Siggia, Phys. Rev. E 52, 2912 Biol. 298, 461 (2000). (1995). [50] M. L. Embleton, S. A. Williams, M. A. Watson, and [16] M. Ptashne and A. Gunn, Genes and signals (Cold S. E. Halford, J. Mol. Biol. 289, 785 (1999). Spring Harbor Laboratory Press, Cold Spring Harbor, [51] M. Deibert, S. Grazulis, G. Sasnauskas, V. Siksnys, and New York, 2002). R. Huber, Nat. Struct. Biol. 7, 792 (2000). [17] B. R´evet, B. v. Wilcken-Bergmann, H. Besset, A. [52] Y. L. Zhang and D. M. Crothers, Biophys. J. 84, 136 Barker, and B. M¨uller-Hill, Curr. Biol. 9, 151 (1999). (2003). [18] C. E. Bell and M. Lewis, J. Mol. Biol. 314, 1127 (2001). [53] Y. Zhang, A. E. McEwen, D. M. Crothers, and S. D. [19] C. E. Bell, P. Frescura, A. Hochschild, and M. Lewis, Levene, Biophys. J. 90, 1903 (2006). Cell 101, 801 (2000). [54] E. D. Ross, P. R. Hardwidge, and L. J. Maher, Mol. [20] A. Hanke and R. Metzler, Biophys. J. 85, 167 (2003). Cell. Biol. 21, (2001). [21] H. Schiessel, J. Phys. Cond Mat. 15, R699 (2003). [55] R. Schleif, Trends Genet. 16, 559 (2000). [22] G. Kreth, J. Finsterle, J. von Hase, M. Cremer, and C. [56] A. Hochschild, Method Enzymol. 208, 343 (1991). Cremer, Biophys. J. 86, 2803 (2004). [57] L. Finzi and J. Gelles, Science 267, 378 (1995). [23] N. L. Goddard, G. Bonnet, O. Krichevsky, and A. [58] J. M¨uller, S. Oehler, and B. M¨uller-Hill, J. Mol. Biol. Libchaber, Phys. Rev. Lett. 85, 2400 (2000). 257, 21 (1996). [24] A. Yu. Grosberg and A. R. Khokhlov, Statistical Me- [59] K. Rippe, P. H. von Hippel, and J. Langowski, Trends chanics of Macromolecules (AIP Press, New York, Biochem. Sci. 20, 500 (1995). 1994). [60] K. Rippe, Trends Biochem. Sci. 26, 733 (2001). [25] G. C˘alug˘areanu, Czech. Math. J. 11, 588 (1961). [61] D. Shore, J. Langowski, and R. L. Baldwin, Proc. Nat. [26] J. H. White, Am. J. Math. 91, 693 (1969). Acad. Sci. USA. 78, 4833 (1981). [27] F. B. Fuller, Proc. Natl. Acad. Sci. USA 75, 3557 [62] D. Shore and R. L. Baldwin, J. Mol. Biol. 170, 957 (1971). (1983). [28] J. H. White and W. R. Bauer, J. Mol. Biol. 189, 329 [63] D. M. Crothers, J. Drak, J. D. Kahn, and S. D. Levene, (1986). Method Enzymol. 212, 3 (1992). [29] W. R. Bauer, Ann. Rev. Biophys. Bioeng. 7, 287 (1978). [64] K. Klenin, H. Merlitz, and J. Langowski, Biophys. J. [30] H. B. Sun, J. Shen, and H. Yokata, Biophys. J. 79, 184 74, 780 (1998). (2000). [65] A. V. Vologodskii, V. V. Anshelevich, A. V. Lukashin, [31] C. J. Benham, Comput. Appl. Biosci. 12, 375 (1996). and M. D. Frank-Kamenetskii, Nature 280, 294 (1979). [32] S. Goetze, A. Gluch, C. Benham, and J. Bode, Bio- [66] P. J. Hagerman, Biopolymers 24, 1881 (1985). chemistry 42, 154 (2003). [67] S. D. Levene and D. M. Crothers, J. Mol. Biol. 189, 61 (1986). 49

[68] L. M. Edelman, R. Cheong, and J. D. Kahn, Biophys. V. Katritch, J. Dubochet, and A. Stasiak, J. Mol. Biol. J. 84, 1131 (2003). 278, 1 (1998). [69] J. Shimada and H. Yamakawa, Macromolecules 17 689 [98] J. Portugal and A. Rodriguez-Campos, Nucleic Acids (1984). Res. 24, 4890 (1996). [70] A. Bacolla, R. Gellibolian, M. Shimizu, S. Amirhaeri, [99] A. Rodriguez-Campos, J. Biol. Chem. 271, 14150 S. Kang, K. Ohshima, J. E. Larson, S. C. Harvey, B. D. (1996). Stollar, and R. D. Wells, J. Biol. Chem. 272, 16783 [100] P. Staczek and N. P. Higgins, Mol. Microbiol. 29, 1435 (1997). (1998). [71] S. M. Law, G. R. Bellomy, P. J. Schlax, and M. T. [101] Y. Arai, R. Yasuda, K.-I. Akashi, Y. Harada, H. Miyata, Record, J. Mol. Biol. 230, 161 (1993). K. Kinosita Jr., and H. Itoh, Nature 399, 446 (1999). [72] M. H. Hao and W. K. Olson, Macromolecules 22, 3292 [102] P. Pieranski, S. Kasas, G. Dietler, J. Dubochet, and A. (1989). Stasiak, New J. Phys. 3, 101 (2001). [73] B. D. Coleman, W. K. Olson, and D. Swigon, J. Chem. [103] A. M. Saitta, P. D. Soper, E. Wasserman, and M. L. Phys. 118, (2003). Klein, Nature 399, 46 (1999). [74] D. Ming, Y. Kong, M. A. Lambert, Z. Huang, and J. [104] A. Stasiak, A. Dobay, J. Dubochet, and G. Dietler, Sci- Ma, Proc. Natl. Acad. Sci. USA 99, 8620 (2002). ence 286, 11a (1999). [75] I. Tobias, B. D. Coleman, and W. K. Olson, J. Chem. [105] T. McNally, The complete book of fly fishing (Mountain Phys. 101, 10990 (1994). Press, Camden, ME). [76] A. Balaeff, L. Mahadevan, and K. Schulten, Phys. Rev. [106] V. V. Rybenkov, C. Ullsperger, A. V. Vologodskii, and Lett. 83, 4900 (1999). N. R. Cozzarelli, Science 277, 690 (1997). [77] J. Yan, R. Kawamura, and J. F. Marko, Phys. Rev. E [107] A. Stasiak, Curr. Biol. 10, R526 (2000). 71, 061905 (2005). [108] T. R. Strick, V. Croquette, and D. Bensimon, Nature [78] S. Blumberg, A. V. Tkachenko, and J. C. Meiners, Bio- 404, 901 (2000). phys. J. 88, 1692 (2005). [109] J. C. Wang, Annu. Rev. Biochem. 65, 635 (1996). [79] A. Balaeff, C. R. Koudella, L. Mahadevan, and K. Schul- [110] J. M. Berger, S. J. Gamblin, S. C. Harrrison, and J. C. ten, Philos. Transact. A 362, 1355 (2004). Wang, Nature 379, 225 (1996). [80] D. Shore and R. L. Baldwin, J. Mol. Biol. 170, 983 [111] J. C. Wang, Quart. Rev. Biophys. 31, 107 (1998). (1983). [112] L. Euler, Solutio problematis ad geometriam situs per- [81] D. S. Horowitz and J. C. Wang, J. Mol. Biol. 173, 75 tinentis Comment. Academiae Sci. Imp. Petropolitanae (1984). 8, 128 (1736); English translation: Sci. Amer. 189, 66 [82] X. J. Lu and W. K. Olson, Nucleic Acids Res. 31, 5108 (1953). (2003). [113] J. Kepler, in W. v. Dyck and M. Caspar (Editors), Jo- [83] K. Klenin and J. Langowski, Biopolymers 54, 307 hannes Kepler, Gesammelte Werke (Beck, M¨unchen, (2000). 1937). [84] S. A. Wassermann, J. M. Dungan, and N. R. Cozzarelli, [114] A. C. C. Adams, The knot book: an elementary intro- Science 229, 171 (1985). duction to the mathematical theory of knots (Freeman, [85] R. W. Deibler, S. Rahmati, and E. L. Zechiedrich, Genes New York, 1994). & Development 15, 748 (2001). [115] L. H. Kauffman, Knots and physics, Series on Knots and [86] N. R. Cozzarelli and J. C. Wang, DNA topology and its Everything, Vol. I (World Scientific, Singapore, 1993). biological effects (Cold Spring Harbor Laboratory Press, [116] K. Reidemeister, Knotentheorie (Springer, Berlin, 1931) Cold Spring Harbor, NY, 1990). [Knot theory (BSC Assocs., Moscow, Idaho, 1983)]. [87] L. F. Liu, C.-C. Liu, and B. M. Alberts, Cell 19, 697 [117] J. B. Listing, Vorstudien zur Topologie, G¨ottinger Stu- (1980). dien (Vandenhoeck und Ruprecht, G¨ottingen, 1848). [88] K. Mizuuchi, L. M. Fisher, M. H. O. Dea, and M. [118] W. Thomson, Lord Kelvin, Phil. Mag. 34, 15 (1867). Gellert, Proc. Natl. Acad. Sci. 77, 1847 (1980). [119] W. Thomson, Proc. Roy. Soc. Edinb. 27, 59 (1875-76). [89] T. J. Pollock and H. A. Nash, J. Mol. Biol. 170, 1 [120] P. G. Tait, Trans. Roy. Soc. Edinb. 28, 145 (1876-77). (1983). [121] P. G. Tait, Trans. Roy. Soc. Edinb. 32, 327 (1883-4). [90] S. J. Spengler, A. Stasiak, and N. R. Cozzarelli, Cell [122] P. G. Tait, Trans. Roy. Soc. Edinb. 33, 493 (1884-85). 42, 325 (1985). [123] P. G. Tait, Scientific papers (Cambridge University [91] S. A. Wassermann and N. R. Cozzarelli, Science 232, Press, London, 1898). 951 (1986). [124] T. P. Kirkman, Proc. Royal Soc. Edinb. 13, 120, 363 [92] S. A. Wassermann and N. R. Cozzarelli, Proc. Natl. (1884-5). Acad. Sci. USA 82, 1079 (1984). [125] T. P. Kirkman, Proc. Royal Soc. Edinb. 32, 483 (1884- [93] E. Viguera, P. Hernandez, D. B. Krimer, A. S. Boistov, 85). R. Lurz, J. C. Alonso, and J. B. Schvartzman, J. Biol. [126] C. N. Little, Trans. Acad. Sci. 18, 7, 27 Chem. 271, 22414 (1996). (1885). [94] J. Sogo, A. Stasiak, M. L. Mart´ınez-Robles, D. B. [127] C. N. Little, Trans. Roy. Soc. Edinb. 35. 771 (1890). Krimer, P. H´ernandez, and J. B. Schvartzman, J. Mol. [128] P. van de Griend, in History and science of knots, edited Biol. 286, 637 (1999). by J. C. Turner and P. van de Griend (World Scientific, [95] F. B. Dean, A. Stasiak, T. Koller, and N. R. Cozzarelli, Singapore, 1996). J. Biol. Chem. 260, 4975 (1985). [129] A. V. Vologodskii, A. V. Lukashin, M. D. Frank- [96] J. Arsuaga, M. Vazquez, S. Trigueros, D. Sumners, and Kamenetskii, and V. V. Anshelvich, Sov. Phys. JETP J. Roca, Proc. Natl. Acad. Sci. USA 99, 5373 (2002). 39, 1059 (1974). [97] A. V. Vologodskii, N. J. Crisona, B. Laurie, P. Pieranski, [130] H. L. Frisch and E. Wassermann, J. Am. Chem. Soc. 50

83, 3789 (1961). Madras, J. Phys. A 23, 1589 (1990). [131] M. Delbr¨uck, in Mathematical problems in biological sci- [164] B. Li, S. N. Madras, and A. D. Sokal, J. Stat. Phys. 80, ences (Proc. Symp. Appl. Math. 14, 55 (1962)), edited 661 (1995). by R. E. Bellman. [165] E. Orlandini, M. C. Tesi, E. J. Janse van Rensburg, and [132] D. W. Sumners and S. G. Whittington, J. Phys. A 21, S. G. Whittington, J. Phys. A 31, 5953 (1998). 1689 (1988). [166] E. Ercolini, F. Valle, J. Adamcik, G. Witz, R. Met- [133] N. Pippenger, Discrete Appl. Math. 25, 273 (1989). zler, P. de los Rios, J. Roca, and G. Dietler, E-print [134] C. Vanderzande, J. Phys. A 28, 3681 (1995). cond-mat/0609084. [135] M. D. Frank-Kamenetskii, A. V. Lukashin, and A. V. [167] A. J. Guttmann, J. Phys. A 22, 2807 (1989). Vologodskii, Nature 258, 398 (1975). [168] E. Guitter and E. Orlandini, J. Phys. A 32, 1359 (1999). [136] K. Koniaris and M. Muthukumar, Phys. Rev. Lett. 66, [169] S. R. Quake, Phys. Rev. Lett. 73, 3317 (1994). 2211 (1991). [170] S. R. Quake, Phys. Rev. E 52, 1176 (1986). [137] J. P. J. Michels and F. W. Wiegel, Proc. Roy. Soc. (Lon- [171] A. Yu. Grosberg, A. Feigel, and Y. Rabin, Phys. Rev. don) A 403, 269 (1986). E 54, 6618 (1996). [138] J. P. J. Michels and F. W. Wiegel, Phys. Lett. 90A, 381 [172] B. Erman and J. E. Mark, Structures and properties of (1982). rubberlike networks (Oxford, New York, 1997). [139] E. J. Janse van Rensburg and S. G. Whittington, J. [173] J. D. Ferry, Viscoelastic properties of polymers (Wiley, Phys. A 24, 3935 (1991). New York, 1970). [140] M. D. Frank-Kamenetskii, A. V. Lukashin, V. V. An- [174] I. M. Ward and D. Hadley, An Introduction to the Me- shelevich, and A. V. Vologodskii, J. Biomol. Struct. chanical Properties of Solid Polymers (John Wiley and Dyn. 2, 1005 (1985). Sons Ltd., New York). [141] A. Yu. Grosberg, E-print cond-mat/0207427 (2002). [175] P.-G. de Gennes, Macromolecules 17, 703 (1984). [142] K. V. Klenin, A. V. Vologodskii, V. V. Anshelevich, [176] P. K. Lai, Y. J. Sheng, and H. K. Tsao, Phys. Rev. Lett. A. M. Dykhne, and M. D. Frank-Kamenetskii, J. 87, 175503 (2001). Biomol. Struct. Dyn. 5, 1173 (1988). [177] M. K. Shimamura and T. Deguchi, Phys. Rev. E 64, [143] M. K. Shimamura and T. Deguchi, Phys. Lett. A 274, 020801 (2001). 184 (2000). [178] O. Farago, Y. Kantor, and M. Kardar, Europhys. Lett. [144] V. Katritch, W. K. Olson, A. Vologodskii, J. Dubochet, 60, 53 (2002). and A. Stasiak, Phys. Rev. E 61, 5545 (2000). [179] R. Metzler, Y. Kantor, and M. Kardar, Phys. Rev. E [145] A. Dobay, P. E. Sottas, J. Dubochet, and A. Stasiak, 66, 022102 (2002). Lett. Math. Phys. 55, 239 (2001). [180] Y. J. Sheng, P. K. Lai, and H. K. Tsao, Phys. Rev. E [146] P. G. de Gennes, Scaling concepts in polymer physics 61, 2895 (2000). (Cornell University Press, Ithaca, New York, 1979). [181] P. K. Lai, Y. J. Sheng, and H. K. Tsao, Macromol. [147] B. Duplantier, J. Stat. Phys. 54, 581 (1989). Theor. Simul. 9, 578 (2000). [148] J. M. Deutsch, Phys. Rev. E 59, R2539 (1999). [182] R. Zandi, Y. Kantor, and M. Kardar, E-print [149] P. G. Dommersnes, Y. Kantor, and M. Kardar, Phys. cond-mat/0306587 (2003). Rev. E 66, 031802 (2002). [183] B. Marcone, E. Orlandini, A. L. Stella, and F. Zonta, [150] B. D. Hughes, Random Walks and Random Environ- J. Phys. A 38, L15 (2005). ments, Volume 1: Random Walks (Oxford University [184] P. Virnau, Y. Kantor, and M. Kardar, J. Amer. Chem. Press, Oxford, 1995). Soc. 127, 15102 (2005). [151] P. J. Flory, Statistical Mechanics of Chain Molecules [185] E. Orlandini, A. L. Stella, and C. Vanderzande, Phys. (Oxford University Press, Oxford, 1989). Rev. E 68, 031804 (2003). [152] R. Metzler, A. Hanke, P. G. Dommersnes, Y. Kantor, [186] E. Orlandini, A. L. Stella, and C. Vanderzande, J. Stat. and M. Kardar, Phys. Rev. Lett. 88, 188101 (2002). Phys. 115, 681 (2004). [153] R. Metzler, A. Hanke, P. G. Dommersnes, Y. Kantor, [187] T. A. Vilgis, Phys. Rep. 336, 167 (2000). and M. Kardar, Phys. Rev. E 65, 061103 (2002). [188] A. L. Kholodenko and T. A. Vilgis, Phys. Rep. 298, 251 [154] B. Duplantier, Phys. Rev. Lett. 57, 941 (1986). (1998). [155] Y. Kafri, D. Mukamel, and L. Peliti, Phys. Rev. Lett. [189] S. F. Edwards, Proc. Phys. Soc. (London) 91, 513 85, 4988 (2000). (1967). [156] Y. Kafri, D. Mukamel, and L. Peliti, Eur. Phys. J. B [190] S. F. Edwards, J. Phys. A 1, 15 (1968). 27, 135 (2002). [191] M. Doi and S. F. Edwards The Theory of Polymer Dy- [157] L. Sch¨afer, C. v. Ferber, U. Lehr, and B. Duplantier, namics (Clarendon Press, Oxford, 1986). Nucl. Phys. B 374, 473 (1992). [192] M. Otto and T. A. Vilgis, Phys. Rev. Lett. 80, 881 [158] M. B. Hastings, Z. A. Daya, E. Ben-Naim, and R. E. (1998). Ecke, Phys. Rev. E 66, 025102(R) (2002). [193] M. Otto and T. A. Vilgis, J. Phys. A 29, 3893 (1996). [159] J. des Cloizeaux, J. Phys. (France) Lett. 42, L433 [194] T. A. Vilgis and M. Otto, Phys. Rev. E 56, R1314 (1981). (1997). [160] A. Dobay, J. Dubochet, K. Millett, P.-E. Sottas, and A. [195] F. Ferrari, H. Kleinert, and I. Lazzizzera, Int. J. Mod. Stasiak, Proc. Natl. Acad. Sci. USA 100, 5611 (2003). Phys. B 14, 3881 (2000). [161] J. C. LeGuillou and J. Zinn-Justin, Phys. Rev. B 21, [196] F. Ferrari, H. Kleinert, and I. Lazzizzera, Phys. Lett.A 3976 (1989). 276, 31 (2000). [162] J. C. LeGuillou and J. Zinn-Justin, J. Physique (France) [197] A. Yu. Grosberg and S. Nechaev, Adv. Polym. Sci. 106, 50, 1365 (1989). 1 (1993). [163] E. J. Janse van Rensburg, S. G. Whittington, and N. [198] A. Yu. Grosberg and S. Nechaev, Europhys. Lett. 20, 51

613 (1992). [233] S. G. Delcourt and R.D. Blake, J. Biol. Chem. 266, [199] A. Yu. Grosberg and S. Nechaev, J. Phys. A 25, 4659 15160 (1991). (1992a). [234] R. M. Wartell and A. S. Benight, Phys. Rep. 126, 67 [200] S. Nechaev, in Topological aspects of low dimensional (1985). systems, Lecture notes of Les Houches 1998 sum- [235] R. D. Blake, R. D., Bizzaro, J. W., Blake, J. D., Day, mer school; extended version available as E-print G. R., Delcourt, S. G., Knowles, J., Marx, K. A., & cond-mat/9812205. SantaLucia, J., Jr. (1999) Bioinf. 15, 370 (1999). [201] O. Vasilyev and S. Nechaev, JETP 93, 1119 (2001). [236] R. Blossey and E. Carlon, Phys. Rev. E 68, 061911 [202] S. W. P. Turner, M. Cabodi, and H. G. Craighead, Phys. (2003). Rev. Lett. 88, 128103 (2002). [237] E. Yeramian, Gene 255, 139 (2000). [203] E. Ben-Naim, Z. A. Daya, P. Vorobieff, and R. E. Ecke, [238] E. Yeramian, Gene 255, 151-168 (2000). Phys. Rev. Lett. 86, 1414 (2001). [239] E. Carlon, M. L. Malki, and R. Blossey, Phys. Rev. Lett. [204] B. Maier and J. O. R¨adler, Phys. Rev. Lett. 82, 1911 94, 178101 (2005). (1999). [240] M. C. Williams, Biophys. Textbook online Op- [205] H. Walter and D. E. Brooks, FEBS Lett. 361, 135 tical tweezers: measuring piconewton forces, (1995). www.biophysics.org/btol/. [206] V. V. Vasilevskaya, A. R. Khokhlov, Y. Matsuzawa, and [241] S. B. Smith, Y. J. Cui, and C. Bustamante, Science 271, K. Yoshikawa, J. Chem. Phys. 102, 6595 (1995). 795 (1996). [207] L. Lerman, Proc. Natl. Acad. Sci. USA 68, 1886 (1971). [242] M. C. Williams, I. Rouzina, and V. A. Bloomfield, Ac- [208] T. Garel, Remarks on homo- and hetero-polymeric cts. Chem. Res. 35, 159 (2002). aspects of protein folding, E-print cond-mat/0305053 [243] T. Hwa, E. Marinari, K. Sneppen, and L.-H. Tang, Proc. (2003). Natl. Acad. Sci. USA 100, 4411 (2003). [209] B. Duplantier, J. Phys. A 19, L1009 (1986). [244] C. W. Dieffenbach and G. S. Dycksler, PCR primer: [210] B. Duplantier and H. Saleur, Nucl. Phys. B 290, 291 a laboratory manual (Cold Spring Harbor Laboratory (1987). Press, Cold Spring Harbor, NY, 1995). [211] J. L. Jacobsen, N. Read, and H. Saleur, Phys. Rev. Lett. [245] M. J. McPherson and S. G. Møller, PCR Basics 90, 090601 (2003). (Springer-Verlag Telos, New York, 2000. [212] B. Duplantier, Phys. Rev. Lett. 71, 4274 (1993). [246] M. Gu´eron, M. Kochoyan, and J. L. Leroy, Nature 328 [213] A. L. Owczarek, T. Prellberg, and R. Brak, Phys. Rev. 89 (1987). Lett. 70, 951 (1993). [247] A. Krueger, E. Protozanova, and M. D. Frank- [214] A. L. Owczarek, T. Prellberg, and R. Brak, Phys. Rev. Kamenetskii, Biophys. J., at press. (2006). Lett. 71, 4275 (1993). [248] C. Kittel, Am. J. Phys. 37, 917 (1969). [215] B. Duplantier and F. David, J. Stat. Phys. 51, 327 [249] G. Altan-Bonnet, A. Libchaber, and O. Krichevsky, (1988). Phys. Rev. Lett. 90, 138101 (2003). [216] J. Kondev and J. L. Jacobsen, Phys. Rev. Lett. 81, 2922 [250] Y. Zeng, A. Montrichok and G. Zocchi, J. Mol. Biol. (1998). 339, 67 (2004). [217] M. E. Fisher, Physica D 38, 112 (1989). [251] M. Gu´eron, M. Kochoyan, and J. L. Leroy, Nature 328, [218] F. Orlandini, F. Seno, A. L. Stella, and M. C. Tesi, 89 (1987). Phys. Rev. Lett. 68, 488 (1992). [252] M. Gu´eron and J. L. Leroy, Methods Enzymol. 261, 383 [219] R. Dekeyser, E. Orlandini, A. L. Stella, and M. C. Tesi, (1995). Phys. Rev. E 52, 5214 (1995). [253] I. M. Russu, Methods Enzymol. 261, 383 (1995). [220] J. Cardy, J. Phys. A 34, L665 (2001). [254] C. Chen and I. M. Russu, Biophys. J. 87, 2545 (2004). [221] A. Hanke, R. Metzler, P. G. Dommersnes, Y. Kantor, [255] D. E. Jensen and P. H. von Hippel, J. Biol. Chem. 251, and M. Kardar, Eur. Phys. J. E 12, 347 (2003). 7198 (1976). [222] R. Gambini and J. Pullin, Loops, knots, gauge theo- [256] R. L. Karpel, in reference (232). ries and quantum gravity (Cambridge University Press, [257] D. E. Jensen, R. C. Kelly and P. H. von Hippel, J. Biol. Cambridge, UK, 1996). Chem. 251, 7215 (1976). [223] R. Metzler, New J. Phys. 4, 91.1 (2002). [258] R. L. Karpel, IUBMB Life 53, 161 (2002). [224] A. Yu. Grosberg, Phys. Rev. Lett. 85, 3858 (2000). [259] K. Pant, R. L. Karpel and M. C. Williams, J. Mol. Biol. [225] R. C. Ball, M. Doi, S. F. Edwards, and M. Warner, 327, 571 (2003). Polymer 22, 1010 (1981). [260] K. Pant, R. L. Karpel, I. Rouzina and M. C. Williams, [226] M. Doi and S. F. Edwards, J. Chem. Soc. Farad. Trans. J. Mol. Biol. 336, 851 (2004). 274, 1802 (1978). [261] K. Pant, R. L. Karpel, I. Rouzina, and M. C. Williams, [227] S. F. Edwards and T. A. Vilgis, Polymer 27, 483 (1986). J. Mol. Biol. 349, 317 (2005). [228] P. G. Higgs and R. C. Ball, Europhys. Lett. 8, 357 [262] D. Poland and H. A. Scheraga, Theory of Helix- (1989). Coil Transitions in Biopolymers (Academic Press, New [229] J. U. Sommer, J. Chem. Phys. 97, 5777 (1992). York, 1970). [230] A. Hanke and R. Metzler, Chem. Phys. Lett. 359, 22 [263] C. R. Cantor and P. R. Schimmel, Biophysical Chem- (2002). istry (W. H. Freeman and Company, New York, NY, [231] R. Metzler and T. Ambj¨ornsson, J. Comp. Theoret. 1980). Nanosc. 2, 389 (2005). [264] C. Richard and A. J. Guttmann, J. Stat. Phys. 115, 925 [232] A. Revzin, Editor, The biology of non-specific DNA- (2004). protein interactions (CRC Press, Boca Raton, FL, [265] A. Hanke, J. Phys.: Condens. Matter 17, S1731 (2005). 1990). [266] M. Fixman and J. J. Freiere, Biopol. 16, 2693 (1977). 52

[267] J. SantaLucia Jr., Proc. Natl. Acad. Sci. 95, 1460 U.S.A. 68, 2185 (1971). (1998). [301] R. Metzler, Phys. Rev. Lett. 87, 068103 (2001). [268] Yu. S. Lazurkin, M. Frank-Kamenetskii, and E. N. Tri- [302] R. Metzler and P. G. Wolynes, Chem. Phys. 284, 469 fonov, Biopol. 9, 1253 (1970). (2002). [269] T. Ambj¨ornsson, S. K. Banik, O. Krichevsky, and [303] J. W. Little, D. P. Shipley, and D. W. Wert, EMBO J. R. Metzler, Phys. Rev. Lett. 97, 128105 (2006). 18, 4299 (1999). [270] T. Ambj¨ornsson, S. K. Banik, O. Krichevsky, and [304] D. V. Rozanov, R. D’Ari, and S. P. Sineoky, J. Bacte- R. Metzler, in preparation. riol. 180, 6306 (1998). [271] T. Garel, C. Monthus, and H. Orland, Europhys. Lett. [305] E. Aurell and K. Sneppen, Phys. Rev. Lett. 88, 048101 55, 132 (2001). (2002). [272] A. Hanke and R. Metzler, Phys. Rev. Lett. 90, 159801 [306] E. Aurell, S. Brown, J. Johanson, and K. Sneppen, (2003). Phys. Rev. E. 65, 051914 (2002). [273] S. M. Bhattacharjee, Europhys. Lett. 57, 772. [307] M. A. Shea and G. K. Ackers, J. Mol. Biol. 181, 211 [274] L. D. Landau and E. M. Lifshitz, Physical Kinetics (1985). (Butterworth-Heinemann, Oxford, UK, 1981). [308] A. Arkin, J. Ross, and H. H. McAdams, Genetics 149, [275] A. Hanke and R. Metzler, J. Phys. A 36, L473 (2003). 1633 (1998). [276] T. Ambj¨ornsson and R. Metzler, J. Phys. Cond. Mat. [309] A. Bakk, R. Metzler, and K. Sneppen, Biophys. J. 86, 17, S1841 (2005). 58 (2004). [277] T. Ambj¨ornsson and R. Metzler, Phys. Rev. E 72, [310] A. Bakk, R. Metzler, and K. Sneppen, Isr. J. Chem. 44, 030901(R) (2005). 309 (2004). [278] R. Metzler and T. Ambj¨ornsson, J. Biol. Phys. 31, 339 [311] O. G. Berg, R. B. Winter, and P. H. von Hippel, (2005). Biochem. 20, 6929 (1981). [279] T. Novotn´y, J. N. Pedersen, M. S. Hansen, [312] R. B. Winter, O. G. Berg, and P. H. von Hippel, T. Ambj¨ornsson, and R. Metzler, E-print Biochem. 20, 6961 (1981). cond-mat/0610752. [313] P. H. von Hippel and O. G. Berg, Proc. Natl. Acad. Sci. [280] N. G. van Kampen, Stochastic Processes in Physics and U.S.A. 83, 1608 (1986). Chemistry (North-Holland, Amsterdam, 1992). [314] C. G. Kalodimos, N. Biris, A. M. Bonvin, M. M. Levan- [281] O. Krichevsky and G. Bonnet, Rep. Prog. Phys. 65, 251 doski, M. Guennuegues, R. Boelens, and R. Kaptein, (2002). Science 305, 350 (2004). [282] M. Peyrard and A. R. Bishop, Phys. Rev. Lett. 62, 2755 [315] M. Lewis, G. Chang, N. C. Horton, M. A. Kercher, (1989). H. C. Pace, M. A. Schumacher, R. G. Brennan, and [283] T. Dauxois, M. Peyrard, and A. R. Bishop, Phys. Rev. P. Lu, Science 271, 1247 (1996). E 44, R44 (1993). [316] U. Gerland, J. D. Moroz, and T. Hwa, Proc. Natl. Acad. [284] B. S. Alexandrov, L. T. Wille, K. Ø. Rasmussen, A. R. Sci. U.S.A. 99, 12015 (2002). Bishop, and K. B. Blagoev, E-print cont-mat/0601555. [317] A. Bakk and R. Metzler, FEBS Lett. 563, 66 (2004). [285] A. Campa and A. Giansanti, Phys. Rev. E 58, 3585 [318] A. Bakk and R. Metzler, J. Theor. Biol. 231, 525 (1998). (2004). [286] S. Ares, N. K. Voulgarakis, Ø. Rasmussen, and A. R. [319] Y. Kao-Huang, A. Revzin, A. P. Butler, P. O’Conner, Bishop, Phys. Rev. Lett. 94, 035504 (2005); cf. reference D. W. Noble, and P. H. von Hippel, Proc. Natl. Acad. [19]. Sci. U.S.A. 74, 4228 (1977). [287] T. S. van Erp, S. Cuesta-Lopez, J.-G. Hagmann, and [320] G. D. Stormo, Bioinformatics 16, 16 (2000). M. Peyrard, Phys. Rev. Lett. 95, 218104 (2005). [321] G. Adam and M. Delbr¨uck, in A. Rich, and N. Davidson, [288] S. K. Banik, T. Ambj¨ornsson, and R. Metzler, Euro- Eds., Structural Chemistry and Molecular Biology (W. phys. Lett. 71, 852 (2005). H. Freeman, San Francisco, CA, 1968). [289] D. T. Gillespie, J. Comp. Phys. 22, 403 (1976). [322] P. H. Richter and M. Eigen, Biophys. Chem. 2, 255 [290] D. T. Gillespie, J. Phys. Chem. 81, 2340 (1977). (1974). [291] Ch. H. Choi, G. Kalosakas, K. Ø. Ramsussen, M. Hiro- [323] I. M. Sokolov, J. Mai and A. Blumen, Phys. Rev. Lett. mura, A. R. Bishop, and A. Usheva, Nucl. Acids Res. 79, 857 (1997). 32, 1584 (2004). [324] D. Brockmann and T. Geisel, Phys. Rev. Lett. 91, [292] H. C. Fogedby and R. Metzler, in preparation. 048303. [293] H. C. Fogedby and R. Metzler, E-print [325] I.M. Sokolov, J. Mai, and A. Blumen, J. Luminesc. 76- cond-mat/0608458. 77, 377 (1998). [294] T. Ambj¨ornsson and R. Metzler, J. Phys. Cond. Mat. [326] M. Slutsky and L. A. Mirny, Biophys. J. 87, 4021 17, S4305 (2005). (2004). [295] T. Ambj¨ornsson, M. A. Lomholt, and R. Metzler, J. [327] M. Coppey, O. B´enichou, R. Voituriez, and M. Moreau, Phys. Cond. Mat. 17, S3945 (2005). Biophys. J. 87, 1640 (2004). [296] T. Ambj¨ornsson and R. Metzler, Phys. Biol. 1, 77 [328] J. Yan and J. F. Marko, Phys. Rev. Lett. 93, 108108 (2004). (2004). [297] A. Lwoff, Bacteriol. Rev. 17, 269 (1953). [329] A. O. Grillo, M. P. Brown, and C. A. Royer, J. Mol. [298] M. Ptashne, A Genetic Switch, 3rd edition (Cold Spring Biol. 287, 539 (1999). Harbor Laboratory Press, Cold Spring Harbor, New [330] R. S. Spolar and M. T. Record, Science 263, 777 (1994). York, 2004) [331] N. Shimamoto, J. Biol. Chem. 274, 15293 (1999). [299] E. H. Davidson et al., Science 295, 1669 (2002). [332] M. Slutsky, M. Kardar, and L. A. Mirny, Phys. Rev. E [300] L. Reichardt and A. D. Kaiser, Proc. Natl. Acad. Sci. 69, 061903 (2004). 53

[333] I. M. Sokolov, R. Metzler, K. Pant, and M. C. Williams, [361] J.-P. Sauvage and C. Dietrich-Buchecker (editors), Biophys. J. 89, 895 (2005). Molecular catenanes, rotaxanes, and knots: a journey [334] J. D. McGhee and P. H. von Hippel, J. Mol. Biol. 86, through the world of molecular topology, Wiley-VCH, 469 (1974). Weinheim, 1999. [335] I. M. Sokolov, R. Metzler, K. Pant, and M. C. Williams, [362] M.-J. Blanco, M. C. Jim´enez, J.-C. Chambron, V. Heitz, Phys. Rev. E 72, 041102 (2005). M. Linke, J.-P. Sauvage, Chem. Soc. Rev. 28 (1999) 293. [336] M. F. Shlesinger, G. M. Zaslavsky, and J. Klafter, Na- [363] M. C. Jim´enez, C. Dietrich-Buchecker, J.-P. Sauvage, ture 363, 31 (1993). Angew. Chem. Int. Ed. 39 (2000) 3284. [337] J. Klafter, M. F. Shlesinger and G. Zumofen, Phys. To- [364] L. Raehm, J.-M. Kern, J.-P. Sauvage, Chem. Eur. J. 5 day 49 (2), 33 (1996). (1999) 3310. [338] R. Metzler and J. Klafter, Phys. Rep. 339, 1 (2000). [365] A. R. Pease, J. O. Jeppesen, J. F. Stoddart, Y. Luo, C. [339] R. Metzler and J. Klafter, J. Phys. A 37, R161 (2004). P. Collier, J. R. Heath, Acc. Chem. Res. 34 (2001) 433. [340] A. V. Chechkin, V. Yu. Gonchar, R. Metzler, and J. [366] J.-M. Lehn, Chem. Eur. J. 6 (2000) 2097. Klafter, Adv. Chem. Phys., at press (2006). [367] C. Montemagno, G. Bachand, Nanotechnology 10 [341] G. M. Viswanathan, S. V. Buldyrev, S. Havlin, (1999) 225. M. G. .E . da Luz, E. P. . Raposo, and H. E. Stanley, [368] R. K. Soong, G. D. Bachand, H. P. Neves, A. G. Nature 401, 911 (1999). Olkhovets, H. G. Craighead, C. D. Montemagno, Sci- [342] P. L´evy, Th´eorie de l’addition des variables al´eatoires ence 290 (2000) 1555. (Gauthier-Villars, Paris, 1954). [369] B. Yurke, A. J. Turberfield, A. P. Mills jr., F. C. Simmel, [343] B. V. Gnedenko and A. N. Kolmogorov Limit Distribu- J. L. Neumann, Nature 406 (2000) 605. tions for Sums of Random Variables (Addison–Wesley, [370] R. Metzler, Phys. Rev. E 63 (2001) 12103. Reading, 1954). [371] A. Ajdari, F. J¨ulicher, J. Prost, Rev. Mod. Phys. 69 [344] M. A. Lomholt, T. Ambj¨ornsson, and R. Metzler, Phys. (1998) 1269. Rev. Lett. 95, 260603 (2005). [372] A. Vilfan, E. Frey, F. Schwabl, M. Thormahlen, Y. H. [345] M. A. Lomholt, T. Ambj¨ornsson, and R. Metzler, in Song, E. Mandelkow, J. Mol. Biol. 312 (2001) 1011. preparation [373] G. Bottari, F. Dehez, D. A. Lehigh, P. J. Nash, E. M. [346] A. V. Chechkin, R. Metzler, J. Klafter, V. Y. Gonchar, Perez, J. K. Y. Wong, and F. Zerbetto, Angew. Chemie and L. V. Tanatarov, J. Phys. A 36, L537 (2003). Intl. Ed. 42, 5886 (2003). [347] D. E. Smith, S. J. Tans, S. B. Smith, S. Grimes, D. L. [374] S. Garaudee, S. Silvi, M. Venturi, A. Credi, A. H. Flodd, Anderson, and C. Bustamante, Nature 413, 748 (2001). and J. F. Stoddart, Chem. Phys. Chem. 6, 2145 (2005). [348] R. Zandi and D. Reguera, Phys. Rev. E , (2005). [375] F. Gittes, E. Meyhofer, S. Baek, J. Howard, Biophys. J. [349] C. E. Catalano, D. Cue, and M. Feiss, Mol. Microbiol. 70 (1996) 418. 16, 1075 (1995). [376] P.-G. de Gennes, Comptes Rendus Acad. Sci. Paris S´er. [350] H. Fujisawa and M. Morita, Genes to Cells 2, 537 II 324 (1997) 343. (1997). [377] D. L. Thomsen, P. Keller, J. Naciri, R. Pink, H. Jeon, [351] N. V. Hud and K. H. Downing, Proc. Natl. Acad. Sci. D. Shenoy, B. R. Ratna, Macromol. 34 (2001) 5868. USA 98, 14925 (2001). [378] I. M. Kulic, R. Thaokar, and H. Schiessel, Europhys. [352] N. V. Hud, Biophys. J. 69, 1355 (1995). Lett. 72, 527 (2005). [353] M. E. Cerritelli, N. Cheng, A. H. Rosenberg, C. E. [379] I. M. Kulic, R. Thaokar, and H. Schiessel, J. Phys. McPherson, F. P. Booy, and A. C. Steven, Cell 17, 271 Cond. Mat. 17, S3965 (2005). (1997). [380] C. Jarzinsky, Phys. Rev. Lett. 78, 2690 (1997). [354] J. Kindt, S. Tzlil, A. Ben-Shaul, and W. M. Gelbert, [381] O. Braun, A. Hanke, and U. Seifert, Phys. Rev. Lett. Proc. Natl. Acad. Sci. USA 98, 13671 (2001). 93, 158105 (2004). [355] P. K. Purohit, M. M. Inamdar, P. D. Grayson, T. M. [382] U. Seifert, Phys. Rev. Lett. 95, 040602 (2005). Squires, J. Kondev, and R. Phillips, Biophys. J. 88, 851 [383] P. J. Flory, Principles of Polymer Chemistry (Cornell (2005). University Press, Ithaca, New York, 1953). [356] I. Ali, D. Marenduzzo, and J. M. Yeomans, J. Chem. [384] W. J. Orr, Trans. Faraday Soc. 43, 12 (1947). Phys. 121, 8635 (2004). [385] K. Ohno and K. Binder, J. Phys. (Paris) 49, 1329 [357] G. Schill, Catenanes, rotaxanes and knots, Academic (1988). Press, New York, 1971. [386] B. Duplantier and F. David, J. Stat. Phys. 51, 327 [358] D. Bishop, P. Gammel, C. R. Giles, Phys. Today 54(10) (1988). (2001). [387] B. Duplantier and H. Saleur, Phys. Rev. Lett. 59, 539 [359] T. Strick, J.-F. Allemand, V. Croquette, D. Bensimon, (1987). Phys. Today 54(10) (2001) 46. [388] P. Grassberger and R. Hegger, Ann. Phys. 4, 230 (1995). [360] J.-M. Lehn, Supramolecular Chemistry, VCH, Wein- heim, 1995.