THE CATHOLIC UNIVERSITY OF AMERICA

Development and Use of Novel Transverse Magnetic Tweezers for Single-Molecule Studies of DNA-Protein Interactions

A DISSERTATION

Submitted to the Faculty of the
Department of Biomedical Engineering
School of Engineering
of The Catholic University of America
In Partial Fulfillment of the Requirements
For the Degree
Doctor of Philosophy

By Christopher D. Tyson

Washington, D.C.

2016

Development and Use of Novel Transverse Magnetic Tweezers for Single-Molecule Studies of DNA-Protein Interactions

Christopher D. Tyson, Ph.D.

Director: Abhijit Sarkar, Ph.D.

I describe several contributions to single molecule experiments. A transverse magnetic tweezers is presented that enables in-plane micromechanical manipulation of a single DNA molecule. This includes a new method for tethering DNA utilizing two labeled beads and a functionalized glass micro-rod. The attachment chemistry reported here enables rapid capture of multiple DNA tethers in parallel, overcomes the difficulties associated with bead aspiration, and preserves the ability to perform differential extension measurements from the bead centroids. Combined with micro-injection pipettes, a new sample cell design, and a buffer exchange system, the components increase the ease-of-use and experimental throughput of the magnetic tweezers device. On the software side, several unique computational methods for interrogating single molecule data are described. First, a technique that uses the diffraction pattern of beads to perform sub-pixel, ~10 nm-level localization of the bead centroids is explained. Second, a novel method for automatically detecting steps in DNA extension data is presented. This algorithm is well-suited for analyzing experiments involving binding and force-induced unbinding of DNA-protein complexes, which produce flat extension regions – steps – corresponding to the times between individual protein association or dissociation events. Finally, a new algorithm for tracking densely-populated, fast-spawning, indistinguishable objects moving unidirectionally at high velocities is developed and its performance thoroughly characterized. Together, these results should improve single molecule micromanipulation techniques by providing a hardware and software combination that can be implemented and used relatively easily, while enabling near-Brownian-noise limit force and extension measurements on DNA and DNA-protein complexes.

This dissertation by Christopher D. Tyson fulfills the dissertation requirement for the doctoral degree in Biomedical Engineering approved by Otto Wilson, Ph.D., as Advisor, and by Abhijit Sarkar, Ph.D., and Lorenzo Resca, Ph.D. as Readers.

______

Otto Wilson, Ph.D., Advisor

______

Abhijit Sarkar, Ph.D., Reader

______

Lorenzo Resca, Ph.D., Reader


Acknowledgements

For their contributions in support of my work, I would like to thank:

Roberto Fabian

Christopher McAndrew, Ph.D.

Anneliese Striz

Prof. Pamela Tuma, Ph.D.

Prof. Ian L. Pegg, Ph.D.

Additionally, funding from the Vitreous State Laboratory and The Catholic University of America is gratefully acknowledged.


Table of Contents

Chapter 1: Introduction

1.1 Single-molecule Biology

1.2 DNA

1.3 Proteins

1.4 Response of DNA to Micromechanical Manipulation

1.5 Studying DNA-Protein Interactions with Single Molecule Methods

1.6 Research Problem and Approach

1.7 Plan of Research

Chapter 2: Transverse Magnetic Tweezers

2.1 Introduction to Methods for Single-molecule Micromechanical Manipulation

2.2 Horizontal Magnetic Tweezers Methodology

2.2.1 Attachment Protocol

2.2.2 Horizontal Magnetic Tweezers Device Design

2.2.3 Optical Calibration with Graticule

2.2.4 Force Calculation via Fluctuation-dissipation Theorem

2.2.5 Force Calibration via Stokes Law

2.3 Experimental Results


2.3.1 DNA tether extension

2.3.2 Force Calibration results

2.3.3 Determination of Experimental Precision

2.3.4 DNA-protein complexation

2.4 Discussion

Chapter 3: Step-finding Algorithm

3.1 Introduction to Step-finding Algorithms

3.2 Step-finding Algorithm Methodology

3.2.1 Step-finding Algorithm description

3.2.2 Step-trace simulations

3.2.3 Performance Analysis

3.2.4 Experimental Data

3.3 Results

3.3.1 Simulation Results

3.3.2 Algorithm Performance

3.4 Discussion

Chapter 4: Object Tracking Algorithm


4.1 Introduction to Tracking Algorithms

4.2 Object Tracking Algorithm Methodology

4.2.1 Tracker Algorithm Description

4.2.2 Trajectory Simulations

4.2.3 Performance Analysis

4.2.4 Experimental Data

4.3 Results

4.3.1 Total Number of Objects

4.3.2 Initial x-position

4.3.3 Initial y-position

4.3.4 Initial x-velocity

4.3.5 Initial y-velocity

4.3.6 y-acceleration

4.3.7 Noise and Spawn rate

4.3.8 Experimental Parameter Simulations

4.4 Discussion

Chapter 5: Conclusions

5.1 Summary of Results


5.2 Future Directions

Appendices

A.1 Protocol for surface functionalization of glass micro-rod

A.2 Protocol for DNA End-functionalization

A.3 Protocol for labeling of superparamagnetic beads with anti-digoxigenin

A.4 Magnetic Tweezers Component List

A.5 Protocol for preparation of

A.6 Bead Centroid Localization code

A.7 Force Calculation code

A.8 Step-generator Simulation code

A.9 Step-finding Algorithm code

A.10 Object Tracking code

A.11 Trajectory Simulation code


Chapter 1 - Introduction

1.1 Single Molecule Techniques – An Overview

Single molecule techniques involve high precision measurements of mechanical and optical signals from individual biological macromolecules in vitro and inside living cells, in vivo. Traditionally, proteins and DNA have been studied using ensemble techniques that involve large numbers of molecules. The resulting data are population averages, making it difficult to infer information about the distributions around mean molecular responses or to detect and characterize unusual but biologically-relevant subpopulations of molecules. Single molecule techniques are powerful because they provide a means to overcome these limitations. For instance, using single molecule methods, rare or short-lived intermediate macromolecular conformations that would otherwise be averaged out in an ensemble assay can be discovered and thoroughly investigated. When protein-DNA interactions display heterogeneity in their kinetics, single molecule approaches are well-suited to distinguish alternate reaction pathways.
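As a minimal illustration of this averaging effect, consider the following simulation sketch (illustrative only; the parameter values and function names are my own, not taken from any experiment described here). Each simulated fluorophore blinks between an on-state and an off-state; any single trace shows discrete transitions, while the ensemble mean is a nearly featureless signal.

```python
# Illustrative sketch: why ensemble averaging hides single-molecule behavior.
# Each simulated fluorophore is a two-state (on/off) telegraph signal.
import random

def blinking_trace(n_steps, p_switch=0.02):
    """Two-state on/off trace with per-step switching probability p_switch."""
    state, trace = 1, []
    for _ in range(n_steps):
        if random.random() < p_switch:
            state = 1 - state
        trace.append(state)
    return trace

n_molecules, n_steps = 1000, 500
traces = [blinking_trace(n_steps) for _ in range(n_molecules)]
ensemble = [sum(t[i] for t in traces) / n_molecules for i in range(n_steps)]

print(traces[0][:20])  # a single molecule: discrete on/off steps
print(ensemble[:5])    # the ensemble: values near 0.5, steps averaged away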

In ensemble assays, on the other hand, a very large number of molecules would have to have their activities synchronized to obtain comparable data, something nearly impossible to do in those experiments (Nahas et al. 2004, 1107-1113). Moreover, direct measurement of displacements, torques, twists, forces, and free energies of interacting biomacromolecules is only possible at the single molecule level. Another advantage is that single molecule approaches naturally take into account the role of fluctuations in in vivo processes. Fluctuations arise because biopolymers are sometimes present inside cells in low copy numbers, thus giving concentration fluctuations a large role (Ghaemmaghami et al. 2003, 737-741). Figure 1.1 shows a conceptual comparison between ensemble assays (1.1A) and single molecule (1.1B) approaches.

Figure 1.1 Comparison between Ensemble Assays and Single-molecule Techniques. (A) Ensemble assays observe multiple events simultaneously, leading to a signal which is an average over individual events. In this case, the photon counts from a large number of fluorescent proteins produce a noisy, averaged signal. (B) A single molecule approach allows individual fluorescence on- and off-states to be observed. (Image courtesy Arrondo, Jose Luis R., and Alicia Alonso. 2006. Advanced Techniques in Biophysics. Berlin Heidelberg: Springer-Verlag.)

Single molecule methods can be divided into two categories: fluorescence microscopy and force spectroscopy. Fluorescence techniques involve the imaging of single fluorescent molecules, fluorophores, at room temperature. The key challenge here is the very low photon signal-to-noise ratio that makes detection, imaging, and localization of individual, densely-clustered fluorophores extremely technically demanding. Furthermore, these experiments must be performed with samples kept at room temperature, in aqueous buffers, and at high-enough acquisition rates to assay fast DNA and protein conformational fluctuations. However, recent advances in (a) microscope objective lens designs, (b) charge-coupled device (CCD) and complementary metal oxide semiconductor (CMOS) detectors, (c) fluorescent dyes, (d) protein- and DNA-dye attachment chemistries, (e) computational algorithms, and (f) experimental designs have now enabled three dimensional, sub-diffraction-limit detection, localization, and imaging of individual, densely-clustered fluorophores in biologically-relevant contexts (Brooks Shera et al. 1990, 553-557; Betzig and Chichester 1993, 1422-1425).

Force spectroscopy techniques are used to investigate the micromechanical responses of biomacromolecules, mainly DNA and DNA-protein complexes. The general approach involves tethering a single linear DNA molecule to a glass coverslip at one end and a micron-sized polystyrene bead at the other end. An important variation involves replacing the coverslip with a second bead, resulting in dual-bead tethers. The beads act as macroscopic handles that can be manipulated in a variety of ways. The major micromanipulation approaches use (a) optical traps, (b) magnetic tweezers (or magnetic traps), (c) atomic force microscopy-based (AFM) methods, (d) micropipette-aspiration-based techniques, and (e) flow-induced forces. No matter the particular approach, the micromanipulation apparatus allows forces to be measured as a function of DNA's extension – the so-called fixed extension ensemble or extension clamp experiments – or extensions to be measured as a function of applied forces – also known as fixed force ensemble or force clamp experiments. The measured quantities, forces in the fixed extension ensemble and extensions in the fixed force ensemble, are thermally averaged. Besides forces and extensions, the linking number, which is made up of the twist and writhe of a DNA molecule, and its thermodynamically-conjugate quantity, torque, can also be modulated, resulting in experiments in the fixed-linking number or fixed-torque ensembles.
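For reference, the decomposition just mentioned is the standard one from the geometry of closed ribbons (White's theorem), and it is conventional to quote torsional state as a superhelical density:

$$ \mathrm{Lk} = \mathrm{Tw} + \mathrm{Wr}, \qquad \sigma = \frac{\mathrm{Lk} - \mathrm{Lk}_0}{\mathrm{Lk}_0}, $$

where Tw counts the helical turns of the strands about the axis, Wr measures the coiling of the axis about itself, and Lk₀ ≈ N/10.5 is the relaxed linking number of an N-base-pair molecule. Fixed-torque experiments control the thermodynamic conjugate of Lk.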


Optical tweezers utilize highly focused lasers as a means to trap and manipulate one or two dielectric beads that have been attached to a biomacromolecule, usually DNA (Ashkin et al. 1986, 288). Magnetic tweezers systems replace the laser with a permanent or electro-magnet and the polystyrene bead with a superparamagnetic bead (when using a single bead tether), resulting in a force on the DNA-tethered particle, and thus the DNA molecule itself, in the presence of an inhomogeneous magnetic field produced by the magnet (Smith, Finzi, and Bustamante 1992, 1122-1126). AFM is performed by attaching one end of a biomolecule to a functionalized surface and the other end to a microscopic cantilever. A piezo-drive motor can be used to displace the surface relative to the cantilever thereby producing a force on the DNA tether connecting the two (Binnig, Quate, and Gerber 1986, 930-933). Stiff micropipettes can be used to aspirate on beads that are attached to the ends of a DNA tether; the extension of the molecule can then be manipulated by changing the position of the micropipettes.

Finally, flow-based stretching uses biomolecules anchored to a surface at one end with a microsphere at the other end. A laminar flow imparts a drag force on the microsphere which extends the biomolecule. Figure 1.2 is a drawing that shows the basic principle behind each of the force spectroscopy methods mentioned.

While single molecule approaches offer many benefits, they can be more difficult to perform than ensemble assays, are less standardized, and often require unique instrumentation and procedures. However, continuous advances in hardware, experimental protocols, and computational techniques are lowering the entry barrier to single-molecule research, making it easier to isolate, manipulate, and record the behavior of individual molecules in aqueous buffers, at fast sampling rates, and at room temperature.

Figure 1.2 Principles of Force Spectroscopy Methods. (A) Optical tweezers use highly-focused lasers to manipulate dielectric beads. The surfaces of the beads are functionalized to allow binding of biomacromolecules. Optical tweezers may use dual traps (on the right side of the panel) or a single optical trap. In case of the latter design, the other end of the biomolecule is fixed to a functionalized glass coverslip. (B) AFM methods attach one end of the biomacromolecule to the tip of a microscopic cantilever and the other end to a substrate. The cantilever tip deflects based on the tension in the molecule and this deflection is observed by a laser beam. (C) Magnetic tweezers utilize permanent or electro-magnets to generate an inhomogeneous magnetic field. The biomacromolecule is functionalized and a paramagnetic bead attached to one end. When placed in the magnetic field, the paramagnetic bead has a force applied to it, which is transmitted to the molecule. (D) Flow stretching utilizes fluid flow that produces a drag force on a bead attached to a biomacromolecule. (Image courtesy Deniz et al. 2015)

1.2 DNA

DNA is the primary storage system for genetic information and is the means for passing hereditary traits between generations. Its sequence specifies the blueprints for all proteins required within the cell, and also contains signals that help synchronize and regulate cellular processes (Saenger 1984). Physically, DNA is a long, semi-flexible, negatively charged, heteropolymer with lengths in vivo ranging from several microns to several centimeters (Saenger 1984). For instance, each individual chromosome in a human cell contains DNA that is on the order of centimeters long (Saenger 1984). A polymer is a molecule comprised of repeated monomer subunits that are covalently bonded together. When the monomers are all identical, we refer to the polymer as a homopolymer; when the monomers are not identical, as in DNA or proteins or RNA, the polymer is labeled a heteropolymer. An important characteristic of a polymer is its topology, which describes how many bonds a monomer makes with neighboring monomers. Examples are linear, in which every monomer has two neighbors except the first and the last monomers which have only one; circular, with all monomers having two neighbors; star; brush; branched; and so on. DNA in cells is found in either linear or circular topologies.

The study of DNA began when Friedrich Miescher discovered a substance in cellular nuclei that was rich in phosphorus and nitrogen and called it nuclein (Dahm 2007, 565-581). Albrecht Kossel (Kossel 1886, 248-264) and Phoebus Levene (Levene 1919, 415-424) determined that nuclein was composed of base, sugar, and phosphate groups. The role of DNA as the carrier of genetic information was established by Avery, MacLeod, and McCarty in 1944 when they transfected one strain of common bacteria with DNA from another strain leading to the former strain displaying phenotypes of the latter, donor strain (Avery, MacLeod, and McCarty 1944, 137-158). Several years later, Erwin Chargaff discovered that the nucleobases cytosine (C), guanine (G), thymine (T), and adenine (A) were found in DNA in specific ratios (Chargaff, Zamenhof, and Green 1950, 756-757): cytosine and guanine were found in a ratio of 1:1 as were thymine and adenine. This formed the background to the seminal work of Francis Crick and James Watson (Watson and Crick 1953, 737-738), who used X-ray diffraction data from single crystals of DNA acquired by Franklin and Gosling (Franklin and Gosling 1953, 740-741) to propose the double helical structure of DNA.

A DNA molecule is comprised of a pair of chains intertwined in the form of a right-handed double helix. Each chain consists of monomer repeat units (nucleotides) A, T, C, or G covalently linked together in various sequences by phosphodiester bonds. The two chains in DNA are complementary, i.e. an A in one chain always faces a T on the other; a G in one chain faces a C on the other. This arrangement maximizes hydrogen bonding between complementary bases and provides greater stability to the double helical structure. Three hydrogen bonds are made between C-G and two between A-T (Saenger 1984). The double-helical structure prevents relaxation of torsional stress by rotation about the covalent bonds in the backbone, and the base-stacking interaction leads to large flexional rigidity in DNA (Strick et al. 2000, 115-140).

The base-pair spacing is 3.4 angstroms and the helical twist rate of DNA is one complete revolution for every 10.5 base pairs, giving a pitch of 3.6 nm (Saenger 1984). This is a mean twist rate around which thermally-driven fluctuations produce small variations. DNA molecules carry a charge of -2e per base pair. Free positively charged ions in solution are attracted to the DNA and generate an electrostatic screening effect that prevents the two negatively charged strands in DNA from repelling each other. The counter-ions contribute to the radius of the polymer, resulting in a hydrodynamic or van der Waals radius of DNA of 1 nm in physiological conditions (Saenger 1984). Physiological conditions are taken to be pH 7.1 and 150 mM monovalent salt.

Long DNA molecules display semiflexibility: on length scales greater than 50 nm (equivalent to about 150 base pairs), DNA behaves as a random walk, while on shorter length scales, DNA is a rigid rod. Thus, 50 nm is DNA's persistence length and gives the length scale beyond which thermal bending fluctuations become significant.
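These numbers can be checked against one another with a few lines of arithmetic. The sketch below assumes λ-phage DNA (48,502 base pairs) as the example molecule, since it is the construct used later in this dissertation:

```python
# Consistency check of the quoted DNA geometry, assuming lambda-phage DNA.
RISE_NM = 0.34           # rise per base pair, from the text
PERSISTENCE_NM = 50.0    # persistence length, from the text

n_bp = 48_502                                      # lambda-phage DNA length
contour_nm = n_bp * RISE_NM                        # ~16,500 nm = ~16.5 um
bp_per_persistence = PERSISTENCE_NM / RISE_NM      # ~147 bp, i.e. ~150 bp
kuhn_segments = contour_nm / (2 * PERSISTENCE_NM)  # ~165 independent segments

print(f"contour length: {contour_nm / 1000:.1f} um")
print(f"bp per persistence length: {bp_per_persistence:.0f}")
print(f"Kuhn segments: {kuhn_segments:.0f}")
```

The ~165 statistically independent (Kuhn) segments are what justify treating the molecule as a random walk on scales well above 50 nm.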

The chemical structure of the nucleotides has been well-characterized. Their composition is modular with each nucleotide having a common core of a 5-carbon sugar ring called deoxyribose and a phosphate group. Nucleotides get their unique properties from four different bases that are linked to the sugar. The carbons in the sugar are labeled as 1’ – 5’ (read as “one-prime”, “five-prime”, and so on). Within a single nucleotide, the phosphate group is attached at the 5’ site of the deoxyribose, a hydroxyl group to the 3’ site, while the base is covalently bonded to the 1’ carbon.

A key feature of DNA is that the complementary strands are chiral, displaying a well-defined directionality. This comes about from the manner in which the phosphodiester bonds are made. Each nucleotide binds to the next nucleotide in a chain via a link from the phosphate group on the 5' carbon of the first nucleotide to the hydroxyl group on the 3' carbon of the next nucleotide. Thus, a single strand of linear DNA contains a backbone with free 5' and 3' ends. Furthermore, the resulting sugar-phosphate backbone has a directionality specified by the order of travel along the chain. Two possible orientations are specified as follows: from the 3' carbon to the 5' carbon on consecutive nucleotides (3'-5') or in the opposite direction, from the 5' carbon to the 3' carbon (5'-3'). Complementary strands of DNA are spatially organized with opposed chiralities: one strand is oriented 3'-5' while the other is 5'-3' as this ensures maximum structural stability.
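A minimal sketch of this base-pairing and antiparallel orientation in code (illustrative only; the test sequence is invented):

```python
# Watson-Crick complementarity: the opposing strand is antiparallel, so
# reading it 5'->3' requires reversing the sequence as well as pairing bases.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(seq):
    """Return the 5'->3' sequence of the antiparallel complementary strand."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

strand = "ATGCGTA"                 # read 5'->3'
print(reverse_complement(strand))  # TACGCAT: the opposing strand, read 5'->3'
```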

The linear sequence of nucleotides is known as the primary structure of DNA, while the double helical structure is referred to as its secondary structure. DNA can also assume tertiary structures, for instance when it supercoils or is arranged in complex spatial configurations inside the cellular nucleus (Saenger 1984). Figure 1.3 shows the various components of DNA.

1.3 Proteins

Proteins are heteropolymers of amino acids which perform a wide array of functions in cells. As enzymes, they are involved in virtually every process in the cell, for instance in metabolic cycles; manipulating DNA during transcription, cell division, replication, and other genetic processes; breaking down macromolecules; and so on (Kessel and Ben-Tal 2010).

Figure 1.3 Structural Components of DNA. This cartoon illustrates the various structural characteristics of DNA. The rise, or vertical distance between neighboring base pairs, is 0.34 nm. The helical pitch, or the rise for one full turn, is 10 bases or 3.4 nm. The van der Waals radius of DNA is 1 nm. Each backbone consists of nucleotides linked together by covalent phosphodiester bonds between the 3' hydroxyl group and the 5' phosphate group. The two opposing and complementary strands are connected by hydrogen bonds between the bases. (Image courtesy Reece, Jane B., and Neil A. Campbell. 2011. Campbell Biology. Boston: Benjamin Cummings / Pearson.)

Protein-protein interactions are also very important as these can produce rigid structural components of the cell, regulate enzymatic activity, modulate the cell cycle, and define inter- and intra-cellular signaling networks (Mathews et al. 2012). In their role in cell signaling or binding, proteins can transmit information from one cell to another, they can act as membrane receptors that bind signaling molecules to induce a change within a single cell, or as antibodies they may bind to foreign objects (antigens) as part of the immune system (Murray et al. 2006). Ligand transport proteins are able to bind small molecules and shuttle them across cell membranes; hemoglobin, which transports oxygen from the lungs to other tissues throughout the organism, is a classic example (Mathews et al. 2012).

Structural proteins often join together in large protein complexes to provide support (as in actin or tubulin), or as connective tissue (collagen and elastin), and in some cases motility (myosin or kinesin) (Murray et al. 2006).

There are 21 types of amino acids that are commonly found in eukaryotes, and they are linked together in various sequences by peptide bonds to form proteins (Kessel and Ben-Tal 2010). The primary structure of proteins consists of the amino acid sequence. The secondary structure of proteins is formed by hydrogen bonding between different segments of the polymer, i.e. by intra-chain hydrogen bonding, leading to the formation of three structural motifs: alpha helices, beta sheets, and turns. These may all be found in a single protein molecule. Proteins display a further level of spatial organization referred to as their tertiary structure, which relates closely to the function of the protein. Tertiary structures can take many different forms and are generally stabilized by nonlocal interactions such as salt bridges, disulfide bonds, or, commonly, a hydrophobic core (Murray et al. 2006). However, tertiary structures can be grouped into three main classes (and functions): globular (enzymes), fibrous (structural), and membrane (receptors and channels) (Mathews et al. 2012). There is also a quaternary structure which arises when several individual proteins form complexes. Figure 1.4 illustrates these ideas.

Figure 1.4 Four Levels of Protein Structure. The primary structure of proteins consists of chains of individual amino acids. These chains can form three-dimensional shapes held together by hydrogen bonds between amino acids, referred to as the secondary structure. The secondary structures are further organized into a tertiary structure. Certain proteins then utilize a fourth level of structure when multiple polypeptides join together. (Image courtesy Reece, Jane B., and Neil A. Campbell. 2011. Campbell Biology. Boston: Benjamin Cummings / Pearson.)

1.4 Response of DNA to Micromechanical Manipulation

Initial single molecule experiments on the application of tension to individual DNA molecules were performed by Smith, Finzi, and Bustamante in 1992 (Smith, Finzi, and Bustamante 1992, 1122-1126). In these early experiments, DNA was shown to behave as a three-dimensional random walk on length scales much larger than 50 nm. Furthermore, two distinct regimes were identified. First, the entropic regime at low forces (< 3 pN) occurs when thermal fluctuations, which tend to randomly coil the molecule, are balanced by the applied tension. As the applied tension is increased past 3 pN, the DNA enters a Hookean regime in which it acts elastically according to Hooke's Law.

Increasing the force towards 60 pN and beyond revealed a surprising "over-stretching" transition in DNA (Cluzel et al. 1996, 792-794; Smith, Cui, and Bustamante 1996, 795-799). This reversible transition results in DNA quickly jumping to 1.7 times its normal contour length near a force of 65 pN. Figure 1.5 shows an example force-extension plot for DNA, with the three different regimes labeled: entropic elasticity, Hookean, and over-stretching.

Figure 1.5 Example Force-Extension Plot for DNA. Shown is the relationship between applied force and extension for a single DNA molecule. The three main regimes of the DNA force response curve are shown. DNA displays entropic elasticity for forces between 0 and 5 pN. At this force, DNA has reached full extension, in this case 16.4 µm for λ-DNA. Between 5 pN and 60 pN, the DNA backbone is stretched and the molecule behaves as a Hookean spring. The highly-nonlinear overstretching transition occurs as forces reach 65 pN, and here DNA extends to 1.7 times its contour length. (Image courtesy Bongini, L., Lombardi, V., and Bianco, P. 2014. The transition mechanism of DNA overstretching: a microscopic view using molecular dynamics. J. R. Soc. Interface. DOI: 10.1098/rsif.2014.0399.)

The structure of the DNA in the over-stretching transition is still not completely understood (Williams, Rouzina, and McCauley 2009, 18047-18048), although recent experiments have attempted to shed further light on the question. One potential explanation is that the double-helix structure of DNA uncoils into some other unidentified quasi-helical structure in which the helical pitch is lengthened. Another idea is that the high tensions on the double helix impart significant stress on the hydrogen bonds between base pairs and cause them to rupture, referred to as DNA melting (Mameren et al. 2009, 18231-18236). Melting, however, may not be a requirement for the over-stretching transition and may only become significant when nicks or free-ends are present in the DNA molecule (Paik and Perkins 2011, 3219-3221). Also, stretching experiments on short segments of DNA with very specific base pair compositions (A-T vs. G-C) point to the presence of a structurally and thermodynamically well-defined form of DNA in the overstretched state (Bosaeus et al. 2012, 15179-15184). More recent research has suggested that perhaps the competing explanations for the overstretching transition may all be valid but in a manner contingent upon experimental details such as the presence or absence of nicks and free ends, or the sequence of the DNA (Zhang et al. 2013, 3865-3870).


Attempts to model DNA as a freely-jointed chain led to significant deviation from the measured force-extension response in the entropic and Hookean regimes. This was because the semiflexibility of long DNA molecules was not taken into account. When this effect was incorporated into the model, leading to the so-called worm-like chain model of DNA, excellent agreement between theory and experiments resulted, with a DNA persistence length of 50 nm in physiological conditions providing the best fit (Bustamante et al. 1994, 1599-1600; Bouchiat et al. 1999, 409-413). Based on the model, an expression for the force response of a single DNA molecule has been obtained (Marko and Siggia 1995, 8759-8770):

$$ f = \frac{k_B T}{b}\left[\frac{1}{4\left(1 - \frac{z}{L_0} + \frac{f}{K_0}\right)^{2}} - \frac{1}{4} + \frac{z}{L_0} - \frac{f}{K_0}\right] $$

Here, f is the applied force, k_B is Boltzmann's constant, T is the temperature (300 K is often used for physiological conditions), b is the persistence length (50 nm as discussed previously), L_0 is the contour length of the DNA molecule (which is 16.4 µm for λ-phage DNA), K_0 is the elastic modulus (1000 pN for dsDNA), and z is the observed end-to-end extension. This model is useful in the force range up to about 40 pN. Models also exist to explain the overstretching transition and the DNA's elastic behavior for forces up to ~100 pN.
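Because f appears on both sides of the expression, the force at a given extension must be found numerically. The following sketch (an illustrative example of mine, not the force-calculation code of Appendix A.7) solves the relation by bisection using the parameter values quoted above:

```python
# Solve the extensible worm-like chain (Marko-Siggia) relation for force
# at a given extension by bisection. Parameters follow the text.
kB_T = 4.14                      # k_B * 300 K, in pN*nm
b, L0, K0 = 50.0, 16.4e3, 1000.0  # persistence length (nm), contour (nm), pN

def wlc_residual(f, z):
    """Zero when force f (pN) is consistent with extension z (nm)."""
    u = z / L0 - f / K0  # effective fractional extension
    return (kB_T / b) * (0.25 / (1.0 - u) ** 2 - 0.25 + u) - f

def wlc_force(z, f_lo=1e-6, f_hi=100.0, tol=1e-9):
    """Bisect for the force that yields end-to-end extension z."""
    for _ in range(200):
        f_mid = 0.5 * (f_lo + f_hi)
        if wlc_residual(f_lo, z) * wlc_residual(f_mid, z) <= 0:
            f_hi = f_mid  # root lies in [f_lo, f_mid]
        else:
            f_lo = f_mid  # root lies in [f_mid, f_hi]
        if f_hi - f_lo < tol:
            break
    return 0.5 * (f_lo + f_hi)

print(wlc_force(0.9 * L0))  # force at 90% extension: roughly 2 pN
```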

The torque response of DNA has also been observed using single molecule techniques. As a single DNA molecule is over-wound—that is, twisted in the same direction as the double helix structure—plectonemic structures are generated (Strick et al. 1996a, 1835-1837). These can be described as DNA spiraling around itself, in a similar way that a phone cord will bunch up around itself as it is twisted. Formation of plectonemes results in an abrupt change in extension (Maffeo et al. 2010, 158101; Forth et al. 2008, 148301; Oberstrass, Fernandes, and Bryant 2012, 6106-6111), and the release of torque accumulated along the DNA molecule from being overwound (Marko and Neukirch 2012, 011908; Oberstrass, Fernandes, and Bryant 2012, 6106-6111). DNA under negative torques, in which the DNA molecule is twisted counter to the helical direction, has a twist stiffness significantly greater than would be expected from strand-separated DNA (Sheinin et al. 2011, 108102; Oberstrass, Fernandes, and Bryant 2012, 6106-6111).

This suggests that DNA may be able to exist in structural conformations characterized by mixed torsional states (Bryant, Oberstrass, and Basu 2012, 304-312).

Additionally, double-stranded DNA can be unzipped by grasping the two backbones separately and pulling in opposite directions. This ruptures the hydrogen bonds between the base pairs. Sequence-dependent force responses have been observed using optical tweezers (Bockelmann et al. 2002, 1537-1553), AFM (Rief, Clausen-Schaumann, and Gaub 1999, 346-349), and magnetic tweezers (Danilowicz et al. 2003, 1694-1699), with force signals from distinct 10 base pair sequences being measured. It has been found that 10 base pair (or longer) regions of a DNA molecule with predominantly C-G base pairs have a higher unzipping force compared to A-T dominated sequences (Bockelmann et al. 2002, 1537-1553). Also, unzipping DNA can cause stresses to propagate along the DNA and influence its conformation beyond the unzipping fork (Bockelmann et al. 2002, 1537-1553). Unzipping experiments with magnetic tweezers have also been used to characterize jumps and pauses and their dependence on sequence (Danilowicz et al. 2003, 1694-1699).
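Since C-G pairs carry three hydrogen bonds and A-T pairs two, a simple sliding-window GC fraction already gives a crude map of where the unzipping force should rise and fall along a molecule. The sketch below (illustrative only, with an invented test sequence) uses the ~10 base pair resolution mentioned above:

```python
# Crude proxy for the sequence-dependent unzipping signal: GC fraction in
# sliding windows, at the ~10 bp resolution of the cited experiments.
def gc_profile(seq, window=10):
    """GC fraction in each window-bp stretch of seq."""
    return [
        sum(base in "GC" for base in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]

seq = "ATATATATATGCGCGCGCGCATATATATAT"  # invented: A-T ends, G-C middle
print(gc_profile(seq))  # low (easy unzipping) at the ends, high in the middle
```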

1.5 Studying DNA-Protein Interactions with Single Molecule Micromanipulation Methods

Proteins must interact with the DNA to carry out critical genetic processes, and in binding to DNA often produce changes in its extension, twist, or linking number.

Single molecule methods have been used to monitor and control these changes in the presence of applied mechanical stresses and strains. Some proteins are processive in nature, translocating along the DNA in discrete steps after binding. For these proteins, single molecule techniques have been used to directly measure parameters such as the mean step size and its distribution, duration and distribution of spontaneous pauses, and the force required to stall the motion.

The first protein to be studied at the single molecule level was RNA Polymerase (RNAP) (Wang et al. 1998, 902-907). This protein is an enzyme that uses the DNA as a template for transcribing the genetic sequence by generation of complementary RNA polymers (thus the name RNA polymerase). Optical tweezers (Abbondanzieri et al. 2005, 460-465; Neuman et al. 2003, 437-447) and later single molecule fluorescence (Friedman and Gelles 2012, 679-689; Kapanidis et al. 2006, 1144-1147) as well as magnetic tweezers (Revyakin et al. 2006, 1139-1143) have been used to elucidate the sub-steps involved with the transcription process, and to analyze how forces affect transcription (Herbert et al. 2006b, 1083-1094; Shaevitz et al. 2003, 684-687). Optical tweezers have also been used to investigate how RNAP's transcription efficiency is modulated by its interactions with other proteins (Zhou et al. 2011a, 635-646; Herbert et al. 2010, 17-30). Forces that RNAP is able to apply to the DNA molecule have also been measured and found to be around 5 pN (Wang et al. 1998, 902-907).

The process of DNA replication can involve dozens of proteins in eukaryotes (Johnson and O'Donnell 2005, 283-315) including helicases, topoisomerases, DNA gyrases, single strand binding (SSB) proteins, and DNA polymerases. Single molecule micromanipulation has been used to investigate the motor activity for DNA helicases (Manosas et al. 2009, 904-912; Ribeck et al. 2010, 2170-2179; Gollnick et al. 2015, 1273-1284; Dessinges et al. 2004, 6439-6444) and DNA polymerases (Maier, Bensimon, and Croquette 2000, 12002-12007; Wuite et al. 2000, 103-106). Recently, optical tweezers were combined with fluorescence to investigate the role of single-strand binding protein, a protein which prevents the non-template DNA strand from re-hybridizing with the template strand that is being replicated (Zhou et al. 2011b, 222-232). Magnetic tweezers have also been implemented to examine the torsional response of DNA as a result of topoisomerase and DNA gyrase activity (Lipfert et al. 2010, 977-980; Gore et al. 2006, 100-104; Neuman 2010, 22363-22364).

Histones are proteins which aid in the formation of the tertiary structure of DNA and have been studied extensively with single molecule techniques. These proteins condense long DNA molecules so that they can fit within the cell nucleus. Five varieties of histones exist—H2A, H2B, H3, H4, and H1 (or H5)—which interact with DNA to form nucleosomes (Murray et al. 2006). Nucleosomes consist of a core particle of eight histones, referred to as an octamer or octameric core particle, bound to DNA. The core particle is composed of two each of histones H2A, H2B, H3, and H4. Around this core, 146 base pairs of DNA wrap around ~1.65 times in a left-handed manner (Luger et al. 1997, 251-260). In the presence of H1 (or H5), nucleosomes form sequentially along the DNA molecule in a "beads on a string" arrangement, resulting in a compact structure called chromatin fiber. Single molecule studies have characterized the micromechanical properties of chromatin, nucleosomal arrays, mononucleosomes, and even whole chromosomes. Early single molecule studies of chromatin fiber deduced a Hookean spring response at low forces (< 5 pN), but irreversible extension at forces approaching 20 pN (Cui and Bustamante 2000, 127-132). Later experiments using optical tweezers reported the dissociation of individual nucleosomes over a range of forces (Bennink et al. 2001, 606-610; Brower-Toland et al. 2002, 1965; Pope et al. 2005, 3572-3583). The use of magnetic tweezers allowed a much lower loading rate to be used, with detection of dissociation events at 4 pN (Yan et al. 2007, 464-474). Most recently, nucleosome rupture events have been observed near the theoretical force of 1.5 pN (Marko and Siggia 1997, 2173-2178) with a horizontal magnetic tweezers device (McAndrew 2012).

Other DNA-protein interactions have been probed using single molecule techniques. The recA protein polymerizes onto DNA under tension and can be driven off of the DNA template at sufficiently high forces (Leger et al. 1998, 12295-12299).

Proteins such as HMGB1, HU, and NHP6A have been shown to result in bending and looping of the DNA molecule (Skoko et al. 2004, 13867-13874), and the concentration of proteins has been found to affect the binding energies of the DNA-protein complexes (Skoko et al. 2006, 777-798).

These are just some examples of DNA-binding proteins that have been studied using micromanipulation techniques. Much information about the function and mechanism of these proteins is at present unknown, and single molecule approaches remain to be exploited to analyze the activity of a large number of other DNA-binding proteins.

1.6 Research Problem and Approach

While existing single-molecule techniques have provided many important insights into biological processes, they tend to be difficult to reproduce, challenging to use, often involve very complex and specialized instrumentation, and are very expensive to implement. One of my goals here is to present a new horizontal magnetic tweezers that makes substantial improvements along all these dimensions. After describing the design in detail, I present data used to validate the system and quantify the limits of its performance.

Data from single molecule experiments have to be analyzed using sophisticated image and data processing software, with new algorithms and implementations being developed to keep pace with advances in instrumentation. For instance, step finding algorithms have been designed to analyze single molecule experiments on processive motors like myosin, while particle tracking algorithms have been inspired by applications in biological and soft condensed matter physics. These techniques are often based on assumptions about the shape of the steps or the number of particles to be tracked, limiting their use to small sets of experiments where the relevant assumptions are satisfied. Here, I describe several unique algorithms for step finding and particle tracking that improve upon existing approaches by relaxing a large number of restrictive assumptions. After describing their details, I will present how their performance was validated and also illustrate their use on experimental data obtained using the horizontal tweezers.

1.7 Plan of Research

This dissertation is structured as follows. In Chapter 2, after reviewing single molecule techniques in more detail, I discuss the design and performance of the horizontal magnetic tweezers. Chapter 3 presents a method for detecting step-like features in data sets and describes a systematic investigation of its performance. In Chapter 4, a new tracking algorithm designed to assemble trajectories for densely-populated, high-velocity, rapidly-spawning, indistinguishable objects is presented and its performance investigated. I will conclude in Chapter 5 with a brief summary of the results presented here, how they impact the field of single molecule biology, and future directions for this research.

Chapter 2 – Transverse Magnetic Tweezers

2.1 Introduction to Experimental Methods for Single-Molecule Micromanipulation of DNA Tethers

The major single molecule force spectroscopy methods generally use optical traps (tweezers), magnetic tweezers, and AFM. All have been used to successfully manipulate single DNA tethers or DNA with bound proteins by applying piconewton forces and nanometer displacements.

Optical tweezers exploit the fact that light has the ability to apply forces and torques on dielectric materials with refractive indices significantly larger than that of air or vacuum (Ashkin 1970, 156-159). High numerical aperture objectives are used with near-infrared laser light to create a large spatial gradient in light intensity at a very specific position in 3-dimensional space (Ashkin et al. 1986, 288; Neuman and Block 2004, 2787-2809). At the point of focus, a dielectric microsphere can be "trapped" in a Hookean well (Moffitt et al. 2008, 205-228). The microspheres can be functionalized on the exterior to allow binding of various biomacromolecules – DNA, RNA, proteins – allowing the optical tweezers to manipulate and apply forces in the range of 1 pN per milliwatt of laser light on the molecules (Moffitt et al. 2008, 205-228). The dielectric microsphere serves as a handle with which to manipulate the biomacromolecule, and also as a probe that enables molecular forces and extensions to be monitored. In state-of-the-art optical tweezers, the location of the microsphere is determined using back-focal plane interferometry (Allersma et al. 1998, 1074-1085; Gittes and Schmidt 1998b, 7-9; Pralle et al. 1999, 378-386) which can now allow the detection of bead displacements on the order of 1 angstrom (Moffitt et al. 2008, 205-228) under physiological conditions and at room temperature. The resulting system has provided many key insights into the molecular mechanisms of important genetic processes in biology (Neuman, Lionnet, and Allemand 2007, 33-67).

Although optical traps are naturally-suited for experiments in the fixed-extension ensemble, variations have been developed to use them as force clamps or to apply controlled torques on single DNA molecules. For instance, Stone et al. used an optical trap on one end of a DNA molecule and an aspiration micropipette on the other end to devise a DNA torque wrench (Stone et al. 2003, 8654-8659). Other variations of the basic design include one that uses four traps to manipulate two DNA molecules simultaneously in order to observe DNA's interaction with the structuring protein H-NS (Dame, Noom, and Wuite 2006, 387-390). The combination of fluorescence and optical tweezers has been utilized by several groups to investigate DNA polymer physics (Perkins et al. 1994, 822-826) as well as interactions between proteins and DNA (Handa et al. 2005, 745-750; Mameren et al. 2006, 78).

Several issues arise when using optical tweezers. First, it is difficult to use them in fixed-force mode. Although force feedback methodologies have been devised using real-time measurements of the trap stiffness to continuously adjust the position of the trap so as to maintain constant force (Visscher and Block 1998, 460-489), these are known to be subject to a variety of important limitations. Sample chamber drift can adversely impact optical tweezers which do not use dual traps with two-bead DNA tethers, and in these cases it may be necessary to quantify the drift (Visscher, Gross, and Block 1996, 1066-1076; Nugent-Glandorf and Perkins 2004, 2611-2613). Optical traps are complex systems and each component contributes to the overall noise in the data. These noise contributions have to be controlled to obtain high precision force and extension data, and, thus, noise characterization and attenuation is a major design challenge in these systems (Nugent-Glandorf and Perkins 2004, 2611-2613; Shaevitz et al. 2003, 684-687; Abbondanzieri et al. 2005, 460-465; Carter et al. 2007, 421-427). Calibration of voltages and other experimental outputs to the desired physical parameters like forces and extensions is also highly non-trivial and subject to a variety of noise considerations (Moffitt et al. 2006, 9006-9011; Gittes and Schmidt 1998a, 75-81). The incident laser light of optical traps can also impart heat energy into the sample which can destroy sensitive biomolecules, and when using fluorescence, special techniques are required to avoid photobleaching by the trapping laser and overlap (Dijk et al. 2004, 6479-6484; Hohng et al. 2007, 279-283; Brau et al. 2006, 1069-1077). In summary, an optical tweezers device that is capable of accurate measurements is often highly complex in both design and operation, requiring specialized components to construct it and a steep learning curve to operate it.

Atomic Force Microscopy (AFM) utilizes a microscopic cantilever with a tip that is rastered over a sample producing topographic maps of the surface (Dai et al. 1993, 7401-7407; Binnig, Garcia, and Rohrer 1985, 1336-1338). As the tip passes over the sample surface, it is deflected by microscopic forces and the tip deflection is determined by a laser reflected off a highly-polished segment of the cantilever or a microfabricated mirror affixed to the cantilever. By calibrating the cantilever stiffness, these deflections can be used to determine the topology of the object.

While AFMs were originally devised to study surfaces, the ability to determine forces and positions makes them well suited for DNA and protein micromanipulation (Marti et al. 1988, 803-809; Bustamante et al. 1995, 263-272; Shao, Yang, and Somlyo 1995, 241-265; Lee, Chrisey, and Colton 1994, 771-773; Yu et al. 2007, 284-289). For these applications, the DNA tether must be anchored at two points, one being the AFM cantilever tip and the other a substrate or surface on the piezo-stage. By moving the piezo-stage vertically by known distances, the molecular tether can be extended and the deflection of the cantilever used to measure the force (Engel 1991, 79-108). The sub-nanometer displacement capability of the piezo-stage and the ability to resolve cantilever deflection with the laser beam allow determination of high-resolution force versus extension curves of single molecules, especially when the vertical displacement of the tip with respect to the piezo-stage surface can be determined with a calibrated evanescent field (Neuman and Nagy 2008, 491-505).
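The bookkeeping behind an AFM force-extension point is just Hooke's law plus the stage geometry. A minimal sketch, with invented numbers (the stiffness and displacements here are illustrative, not values from any instrument described in this dissertation):

```python
# AFM force-extension bookkeeping: Hooke's law on the calibrated cantilever,
# and tether extension from the piezo travel. All numbers are invented.
k_cantilever = 10.0    # calibrated cantilever stiffness, pN/nm
deflection_nm = 1.5    # measured cantilever deflection
stage_move_nm = 300.0  # known piezo-stage displacement

force_pN = k_cantilever * deflection_nm       # F = k * d
extension_nm = stage_move_nm - deflection_nm  # tether extension is stage
                                              # travel minus tip deflection
print(force_pN, extension_nm)                 # 15.0 pN at 298.5 nm
```

The 5%–20% calibration error in k discussed next propagates directly into the computed force, which is one reason AFM force resolution is limited.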

This approach does pose some difficulties though. The cantilever stiffness must be known for accurate force and extension calculations; however, the cantilever calibration will have errors ranging from 5% to 20% (Cumpson, Zhdan, and Hedley 2004, 241-251). Moreover, forces computed based on the cantilever stiffness have limited resolution (Neuman and Nagy 2008, 491-505). While displacements in the same direction as the cantilever deflection (the z-axis) are readily measured in AFM, it is very difficult to determine the lateral (x- and y-axis) position of the anchor points (Neuman and Nagy 2008, 491-505). Additionally, non-specific interactions between the molecule and the surface substrate can present a challenge and can be difficult to eliminate (Neuman and Nagy 2008, 491-505). These considerations necessitate additional procedures to ensure that only a single molecule is being manipulated and add to the complexity of the experiments (Marszalek et al. 1999, 100-103; Carrion-Vazquez et al. 2000, 63-91).


A magnetic tweezers-based DNA micromanipulation system presents an excellent alternative to other methods due to its relatively simple design and implementation (Smith, Finzi, and Bustamante 1992, 1122-1126; Strick et al. 1996b, 1835-1837; Gosse and Croquette 2002, 3314-3329; Neuman and Nagy 2008, 491-505). In comparison to AFM and optical tweezers, which operate in a fixed extension ensemble, magnetic tweezers operate in a fixed force ensemble and are excellent for maintaining controlled forces on DNA tethers. The general design of these instruments consists of DNA tethers with superparamagnetic particles attached to one end (the other end fixed to a glass coverslip) placed in inhomogeneous magnetic fields resulting in forces in the 0.01 pN to 100 nN range on the magnetic particles, and thus the DNA tethers (Strick et al. 2000, 115-140; Kollmannsberger and Fabry 2007, 114301). The magnetic fields can be generated using either permanent or electro-magnets (Haber and Wirtz 2000, 4561-4570; Zacchia and Valentine 2015, 053704). Magnetic tweezers can be used to apply torques to the DNA tether and can also independently alter forces and torques on a DNA molecule (Mosconi, Allemand, and Croquette 2011, 034302; Janssen et al. 2012, 3634-3639). Other variations include temperature-controlled magnetic tweezers used to study heat-sensitive proteins as they interact with DNA (Gollnick et al. 2015, 1273-1284) and tweezers using proteins (instead of DNA molecules) as the tethered biomolecules (You et al. 2015, 2424).
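The standard textbook expression for the force on a small superparamagnetic bead in the linear (unsaturated) regime, which underlies all of these designs, is

$$ \mathbf{F} = \nabla\left(\mathbf{m} \cdot \mathbf{B}\right), \qquad \mathbf{m} = \frac{V\chi}{\mu_0}\,\mathbf{B} \;\Rightarrow\; \mathbf{F} = \frac{V\chi}{2\mu_0}\,\nabla\left|\mathbf{B}\right|^{2}, $$

where V is the bead volume, χ its effective magnetic susceptibility, and μ₀ the vacuum permeability. The force points up the gradient of the field intensity, which is why an inhomogeneous field is essential: a uniform field applies torque on the induced moment but no net force.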


In many designs, a linear DNA molecule is tethered between a superparamagnetic bead and a functionalized cover-slip or other substrate (Smith, Finzi, and Bustamante 1992, 1122-1126). In this case, a magnet system is usually located above the sample cell. This vertical geometry is widely used due to the simplicity and relative ease of locating a DNA tether (compared to optical tweezers or AFM). Its popularity also facilitates inter-lab data reproducibility studies. There are potential issues with vertical magnetic tweezers, however. The force range of vertical magnetic tweezers can be limited in some designs (Smith, Finzi, and Bustamante 1992, 1122-1126), the magnetic beads move in and out of the focal plane during an experiment, and sample cell drift is a concern, especially for high precision extension/force measurements. Advanced designs can deal with these issues, including producing large forces (Strick et al. 2000, 115-140; Haber and Wirtz 2000, 4561-4570), more precise extension measurements using diffraction fringe counting (Klaue and Seidel 2009, 028302; Huhle et al. 2015), and using feedback-based active compensation to control sample cell drift (Chen et al. 2011, 517-523). These improvements, however, contribute to additional design complexity.

Another approach involves applying forces to the DNA tether in the horizontal plane (Sun et al. 2008, 3279-3287; Danilowicz et al. 2009, 19824-19829; Yan, Skoko, and Marko 2004, 011905). With a horizontal configuration, the extension measurements are performed easily since the bead(s) are confined in the focal plane and calibration between vertical displacement and diffraction pattern is not necessary. Furthermore, a dual bead construct allows differential measurement which offers excellent passive drift cancellation improving precision and accuracy, a principle exploited mainly in optical tweezers. While micropipette aspiration has been used (Yan, Skoko, and Marko 2004, 011905) with a dual bead construct, the vast majority of horizontal tweezers utilize a single-bead tether (with the other end of the DNA anchored to a glass surface). As with vertical tweezers, such a design makes accurate extension measurements difficult since the anchor point is often hard to locate. As a result, a number of techniques including fringe counting (Danilowicz et al. 2009, 19824-19829) or the over-stretching transition (Chen et al. 2011, 517-523) have been used to calibrate extensions. However, such methods still require active stage stabilization to compensate for drift.
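The differential-measurement idea is worth making explicit: with two beads on one tether, common-mode drift shifts both centroids equally and cancels in their separation. A toy sketch with invented numbers:

```python
# Passive drift cancellation in a two-bead tether: any common-mode shift of
# the sample cell moves both bead centroids identically, so the extension
# (the bead-to-bead separation) is unaffected. Numbers are invented.
drift = 25.0                    # common-mode sample-cell drift, nm
x_polystyrene = 1000.0 + drift  # centroid of the rod-anchored bead, nm
x_magnetic = 17000.0 + drift    # centroid of the superparamagnetic bead, nm

extension = x_magnetic - x_polystyrene  # drift cancels in the difference
print(extension)                        # 16000.0 nm, independent of 'drift'
```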

2.2 Horizontal Magnetic Tweezers

The horizontal magnetic tweezers device I have designed should enhance the repeatability and accuracy of force and extension measurements, while increasing the ease with which experiments can be performed. The following sections will discuss the various components of the instrument. In Figure 2.1 I show the basic principle of the horizontal magnetic tweezers: a single molecule of DNA has two microspheres, one polystyrene and one superparamagnetic, attached one at each end. The polystyrene microsphere is also attached to a glass micro-rod. A magnetic field generated by a permanent bar magnet is used to impart a force on the DNA molecule.

Figure 2.1 Principle and design of transverse magnetic tweezers device. (A) The tweezers uses a non-magnetic polystyrene microsphere to attach a DNA tether to a glass micro-rod. The other end of the tether has a superparamagnetic microsphere attached to it. Forces ranging from 0.1 pN to 20 pN (and higher) can be realized by adjusting the distance between the DNA tether and a permanent bar magnet.


2.2.1 DNA Tether Capture Concept

As discussed in section 2.1, magnetic tweezers—in both horizontal and vertical geometries—generally involve attaching the DNA molecule directly to a stationary substrate. An obstacle with these designs is the calibration of extensions, often requiring a fiducial marker or some other extra calibration scheme. On the other hand, a few designs have utilized a pipette that can aspirate on one bead of a dual bead construct, which simplifies extension measurements, but the process of creating pipettes and using them to manipulate DNA constructs can be very problematic and difficult to reproduce. In my device, instead of using a micropipette, one bead is attached simultaneously to the DNA and to the functionalized surface of a glass micro-rod; another, superparamagnetic, bead is attached to the other free end of the DNA. This arrangement eliminates the requirement to aspirate on a bead, providing the benefits of a two-bead tether without the experimental difficulties associated with micropipette aspiration. Indeed, the result provides passive drift compensation along with the improved stability, ease, and repeatability of a surface-anchored tether.

A solid, square cross-section, commercially-available glass micro-rod of size 0.25 mm and length 100 mm is held in a custom designed 3-D printed holder. The surface of the glass micro-rod is functionalized using silane-PEG-biotin (see Appendix A.1 for protocol); the ethoxyl/methoxyl groups on the silane attach to the hydroxyl groups on the glass surface, leaving the biotin free to bind to streptavidin. A dual-bead DNA tether construct is created using λ-phage DNA that is end-functionalized with biotin on one end and digoxigenin on the other end (see Appendix A.2 for protocol). The biotin on the DNA binds to a non-magnetic microsphere that is coated with streptavidin, while the digoxigenin on the other end of the DNA binds to a superparamagnetic microsphere that has amine surface groups that are crosslinked with anti-digoxigenin (see Appendix A.3 for protocol). The functionalized glass micro-rod is placed into the sample cell and then the DNA-bead constructs are introduced. The streptavidin-coated beads—which already have DNA attached to the surface—bind to the biotinylated surface of the glass micro-rod, while the anti-digoxigenin-coated superparamagnetic beads attached to the other end of the DNA are unconstrained and can have forces applied to them in a suitable magnetic field. Figure 2.2 shows the attachment concept.

2.2.2 Horizontal Magnetic Tweezers Design

The tweezers utilizes a Nikon Diaphot TMD inverted light microscope as the basis for the design. A 40× bright field objective with a 0.65 numerical aperture is used to image beads. A halogen lamp with condenser is used for illuminating the sample.


Figure 2.2 DNA tether attachment protocol concept. A series of steps are required to attach a dual bead DNA tether to the surface of a glass micro-rod. The glass micro-rod must be surface-functionalized with silane-PEG-biotin (see Appendix A.1 for details). Then, DNA molecules must be end-functionalized with biotin and digoxigenin at separate ends by way of 12 base-pair oligos (see Appendix A.2 for details). Once the DNA is end-functionalized, non-magnetic polystyrene microspheres with streptavidin and super-paramagnetic microspheres with anti-digoxigenin (see Appendix A.3 for details) are allowed to bind to the ends of the DNA, producing a dual bead construct. In the experimental sample cell, the end of the DNA with the non-magnetic polystyrene microsphere coated with streptavidin binds with the biotin on the glass surface.


Experiments are video recorded using a Point Grey Grasshopper3 camera GS3-U3-23S6M-C, which connects to a Windows PC using a USB3 connection. The Grasshopper3 camera is capable of a maximum image resolution of 1920 x 1200 pixels at 16-bits mono and a maximum frame rate of 162 Hz. Since recording hours-long experiments would result in extremely large data files, only a cropped, 480 x 960 pixel region of interest is typically stored, a pixel bit-depth of 8 is used, and the frame rate is set to between 30 Hz – 60 Hz depending on the experiment. An adjustable zoom lens is located between the microscope side port and the camera; however, after insertion and calibration, the zoom level is not changed during or between experiments. The Point Grey camera interfaces with my own LabView Virtual Instrument program for monitoring and recording the experiments.
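A back-of-the-envelope calculation shows why the cropped, 8-bit region of interest is stored rather than full frames (the helper function below is my own illustration, not part of the recording software):

```python
# Uncompressed video data rates for the recording settings quoted above.
def rate_MB_per_s(width, height, bit_depth, fps):
    """Raw mono video throughput in MB/s."""
    return width * height * (bit_depth / 8) * fps / 1e6

full = rate_MB_per_s(1920, 1200, 16, 162)  # ~746 MB/s at full resolution
cropped = rate_MB_per_s(480, 960, 8, 60)   # ~28 MB/s as actually recorded
print(f"full: {full:.0f} MB/s, cropped: {cropped:.0f} MB/s")
```

At roughly 746 MB/s, an hour of full-resolution 16-bit video would exceed 2.5 TB, whereas the cropped 8-bit stream at 60 Hz stays near 100 GB per hour, which makes hours-long recordings practical.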

The components necessary for micromanipulation of single molecules are assembled upon the microscope as follows. Optical breadboard side platforms from Sutter Instruments are mounted to the right and left of the microscope, while a pair of linear rails spans the two breadboards. The glass micro-rod and injection pipettes are clamped to a hydraulic micromanipulator using a custom holder. The hydraulic micromanipulator is itself fixed to the middle of a custom-made rectangular aluminum fixture that is bolted to carriages affixed to the two linear rails. The position of the carriages can be adjusted by sliding them on the rails. The hydraulic micromanipulator and associated assembly is positioned to the left of the microscope objective. The hydraulic manipulator is used to manually adjust the position of the glass micro-rod during experiments. I find that the hydraulic micromanipulator has lower levels of mechanical noise and drift compared to its motorized counterparts.

The motorized stage is bolted directly on to the right platform opposite the hydraulic micromanipulator and is characterized by excellent orthogonality of the motion control system, mechanical stability, and superior low-speed control—velocities in the range of 200 nm per second are achievable. At the lowest stepping velocities, loading rates of femtonewtons per second can be attained allowing the DNA molecule or DNA-protein complex to be manipulated in quasi-static conditions. The stage system can be controlled manually with a joystick or from a PC. The control software allows more consistent loading rates and other force- and position-dependent experimental parameters to be realized.

In Figure 2.3 the relationship of the various components of the horizontal magnetic tweezers device is displayed. Figure 2.3A shows a conceptual block diagram of the component organization. In Figure 2.3B, I present a photograph of the actual device, annotated to help identify the components: the colors of the names at the bottom correspond to the colors of the outlines of the parts in the image. A detailed list of the components, model numbers, vendors, and other details is presented in Appendix A.4.

Figure 2.3 Placement of the components of the horizontal magnetic tweezers. (A) A block diagram showing the layout of the system. (B) An annotated image shows the physical sizes and relationships of the various components that comprise the horizontal magnetic tweezers. Names of items are colored to match the line that surrounds the item in the image.

A specially-designed sample cell holder is attached to the z-axis of the motorized stage and mechanically isolates the sample cell affixed to it. Figure 2.4A shows a CAD drawing of the sample cell holder, while a photograph of the actual holder is shown in Figure 2.4B. The sample cell holder is 3-D printed from a carbon-fiber reinforced plastic (CFRP) using the selective laser sintering (SLS) process, which offers superior tensile strength compared to typical 3-D printed plastics, lighter weight than metal (to enable the motion control system to operate most efficiently), and better vibration damping than metal.

Figure 2.4 The sample cell holder. (A) Computer-aided design (CAD) was used to create a sample cell holder with precise dimensions and characteristics. (B) Production of the sample cell holder utilized selective laser sintering and carbon-fiber reinforced plastic to produce a stiff and lightweight component.


The sample cell has a 3D printed spacer with integrated buffer exchange channels. Figure 2.5A shows the CAD drawing of the sample cell, while a photograph of the actual assembly is shown in Figure 2.5B. The combined process of CAD and 3D printing allows for consistent sample cell construction compared to methods in which the sample cell is built by hand with tapes, pieces of glass, or other parts, which can often lead to gaps that allow increased buffer evaporation and significant variations in sample cell volume or in the relative positions of the components inside the cell. While the overall dimensions of the spacer are approximately 60 mm × 40 mm, the actual sample cell, i.e. the region where the glass micro-rod and DNA tethers are located during an experiment, is merely 5 mm × 10 mm, which gives a sample cell volume of just 50 microliters. This small sample cell volume allows higher concentrations to be achieved in the sample cell with lower volumes of proteins or DNA. Permanent magnets are arranged side-by-side at the edge of the sample cell chamber and project a magnetic field into the sample chamber.

2.2.3 Optical Calibration with Graticule

When the sample is imaged with a CCD- or CMOS-based camera, there is a specific ratio between the pixel size of the sensor and the physical area of the sample from which each pixel collects photons; the value of this ratio is determined by the optics used to image the sample.

Figure 2.5 The experimental sample cell. (A) CAD drawings show the design of the sample cell, from the top on the left and the bottom on the right. Important features are the buffer outlet (hose barb) and buffer inlet with associated fluid channels, the sample cell area, and the magnet notch. (B) The resulting part produced using the stereolithography (SLA) method. In this picture, the sample cell is being tested with the original buffer being withdrawn in exchange for the buffer that is colored brown.

This ratio must be determined in order to calculate the DNA tether extension and forces, and is found by imaging stationary NIST-certified graticules, which consist of markings of known dimensions and spacing etched into glass. A count of the number of pixels between the centers of two adjacent marks gives the required ratio, as the physical distance between the marks is known; see Figure 2.6A and Figure 2.6B for images of the graticules and magnified markings.

Figure 2.6 Optical calibration graticules. (A) The first of two graticules used to calibrate the optical pixel ratio (top), and under 40× magnification (bottom) showing the 0.0001 inch divisions. (B) The second graticule used to calibrate the optical pixel ratio (top), again with the markings under magnification (bottom). In this graticule, the distance between markings is 100 µm.


The calibration procedure was as follows. Three different images of each of the two graticules were acquired and analyzed independently by three different people at three unique spots in each image, leading to (2 × 3 × 3 × 3) = 54 counts of the pixel ratio. Analysis of these counts determined a physical pixel size ratio of 0.058 µm (58 nm). That is, each pixel collects photons from a square in the sample cell that is approximately 58 nm per side at the focal plane. Using this value, a lambda-phage DNA molecule that is fully extended at 16.4 µm should span approximately 16400/58 ≈ 283 pixels between the two beads attached to the molecule (when measuring the distance between bead centroids, corrections for the finite bead radii must be made).

2.2.4 Computational Methods for Force and Extension Calculations

Since a single DNA molecule cannot be imaged directly using optical microscopy, tether extensions and forces are found from precise measurements of the positions of the optical centroids (or optical centers of mass) of the two micron-sized beads (which can be visualized using light). Since typical displacements of a DNA molecule in my studies are of the order of 10 nm, while the pixel size is 58 nm, the beads must be localized to a level better than a single pixel, a process called sub-pixel localization. For instance, in the case of a single histone octamer dissociating from the DNA, the ends of the DNA will shift by ~50 nm upon unbinding of the octamer, which would typically not show up as a detectable displacement of the bead centroids; that is, the centroid would remain in the same pixel before and after the octamer popped off.

My scheme for sub-pixel localization is based upon fitting circles to the maxima of the diffraction pattern, i.e. Airy disks, generated by the beads. It is known that a point source will generate a diffraction pattern of concentric circles with a central maximum and alternating minima and maxima moving outwards. By locating several concentric circles, a common center point can be calculated with greater precision than allowed by the size of a single pixel. The first step is to use a Circular Hough transform to detect the concentric diffraction rings. The Hough transform is a well-established process from which the specific Circular Hough transform can be generated (Ballard 1981, 111-122). At edges of objects in images, the gradient of the image should be relatively large, as the intensity of neighboring pixels changes rapidly. The algorithm for detecting the circular features in an image then goes as follows: 1) calculate the image intensity gradient; 2) where the absolute value of the gradient is larger than some threshold, create a circle according to the equation (i − a)² + (j − b)² = r², where (i, j) is the pixel in question and (a, b) are points which satisfy the equation for a given radius r; 3) the accumulator cells at the points (a, b) which satisfy the equation for a circle are incremented; 4) repeat this process for all pixels which have gradients above the threshold; 5) cells which have accumulated votes above a certain limit are considered to be the centers of circular features. At this point, the circular objects of interest have been identified. From there, the pixels that represent the edge of the circle can be identified by examining the pixel intensity gradients again. Once these edge pixels are identified, a circle can be fit and an optical centroid located to sub-pixel precision.
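To make the voting stage concrete, a minimal MATLAB sketch of the accumulator voting for a single test radius follows. The variable names (img, rTest, gradThresh) are illustrative and not taken from my actual implementation; a full implementation would repeat the vote over a range of candidate radii.

    % Minimal sketch of Circular Hough voting for one test radius rTest.
    % img: grayscale image (double); gradThresh: gradient-magnitude threshold.
    [Gx, Gy] = gradient(img);                  % step 1: intensity gradient
    gradMag  = sqrt(Gx.^2 + Gy.^2);
    acc      = zeros(size(img));               % accumulator array
    [iE, jE] = find(gradMag > gradThresh);     % step 2: strong-gradient pixels
    theta    = linspace(0, 2*pi, 90);          % sample points around the circle
    for k = 1:numel(iE)
        % steps 2-3: vote for centers (a, b) with (i - a)^2 + (j - b)^2 = rTest^2
        a  = round(iE(k) - rTest*sin(theta));
        b  = round(jE(k) - rTest*cos(theta));
        ok = a >= 1 & a <= size(img,1) & b >= 1 & b <= size(img,2);
        idx = sub2ind(size(img), a(ok), b(ok));
        for m = 1:numel(idx)                   % step 4: repeated for all edge pixels
            acc(idx(m)) = acc(idx(m)) + 1;
        end
    end
    % step 5: accumulator cells above a vote limit are taken as circle centers.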

In Figure 2.7A, an example frame grab is shown from a movie in which a single DNA molecule is attached to two beads, with the non-magnetic bead also attached to a glass micro-rod at the top of the image. The accumulator array is shown in Figure 2.7B, where the values in the plot represent the pixels in the accumulator that are most likely to be the center of a circular object. In comparing the accumulator array to the initial image, it is evident that there are several peaks that correspond not only to the desired microspheres but also to neighboring objects. However, the microspheres of interest are often (as in this case) represented by the largest peak values in the accumulator array, since they show clear focus and diffraction patterns. Using this information, the accumulator array is analyzed and the two primary microsphere centroid localizations are determined. Finally, in Figure 2.7C, I show with red markers how the peaks in Figure 2.7B correspond to the original image; the optical centroids of the two beads are highlighted by these red markers.


Figure 2.7 Localization of the microspheres using a Circular Hough Transform. (A) Each individual video frame is analyzed by the microsphere localization procedure. (B) The accumulation array generated by analyzing the image with the Circular Hough Transform, showing the various regions that represent potential circular features. Usually the two highest peaks correspond to the desired microspheres, which are then used to calculate the centroids of the microspheres to sub-pixel resolution. Note the diffraction rings that appear around the primary peaks in the accumulation array. (C) The centroid locations of the two primary microspheres are plotted against the original image.

As implemented in my software, each frame is analyzed to detect the two beads. This process is repeated for every frame in the movie, which generates a list of times (frame numbers) and coordinates (x, y) for both beads. The coordinates can be converted to physical units using the conversion ratio discussed earlier, and the extension can be obtained by calculating the Euclidean distance between the two bead centroids.

The applied tension on the DNA molecule is found by analyzing the Brownian fluctuations of the tethered magnetic bead in the direction transverse to the force, i.e. in the x-direction. The variance of these fluctuations is inversely proportional to the force on the tether, as given by the Fluctuation-Dissipation (FD) theorem: F = k_BT〈z〉/〈δx²〉, where k_BT is the thermal energy, 〈z〉 is the mean tether extension, and 〈δx²〉 is the variance of the transverse bead position. As implemented in my program, the force value is evaluated as follows: 1) select a window size in frames (in which the number of frames is related to real time by the observation frequency, or frame rate) with the current frame at the center of the window; 2) calculate the average microsphere centroid x-position for all observations in that window, 〈x〉, as well as the mean extension for that same set of observations, 〈z〉; 3) calculate the deviation of each observation in the window from the mean position, δx = x − 〈x〉; 4) calculate the mean of the squared deviations, 〈δx²〉; and 5) use the FD theorem to determine the force at the current frame. This process is repeated by shifting the window by one frame, recalculating all the values for the FD theorem, and continuing for all frames throughout the entire movie. See Appendix A.6 for the complete MATLAB code that calculates the bead positions and the forces applied to the DNA tethers.
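In condensed form, the sliding-window force computation amounts to the sketch below (the complete code is in Appendix A.6). Here x and z are the per-frame transverse bead position and tether extension in meters, and win is an illustrative window size in frames.

    % Minimal sketch of the sliding-window FD-theorem force estimate.
    % x: transverse bead position per frame (m); z: tether extension per
    % frame (m); win: window size in frames (odd).
    kBT  = 4.1e-21;                      % thermal energy at room temperature, J
    half = floor(win/2);
    F    = nan(size(x));                 % force per frame, N
    for n = (1 + half):(numel(x) - half)
        w    = (n - half):(n + half);    % window centered on frame n
        dx   = x(w) - mean(x(w));        % deviations from the mean position
        F(n) = kBT * mean(z(w)) / mean(dx.^2);   % F = kBT<z>/<dx^2>
    end
    F_pN = F * 1e12;                     % convert N to pN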

2.2.5 Force Calibration via Stokes Law

I checked the performance of the forces calculated by the Fluctuation-Dissipation theorem by comparing them to drag forces computed from the terminal velocities of untethered superparamagnetic microspheres moving in an inhomogeneous magnetic field. These velocities allowed me to determine the magnetic forces on the particles at various distances from the magnet. Good agreement between the two sets of measurements would indicate that the calculated DNA tether forces were determined accurately.

These experiments were carried out as follows. Using micropipettes with a 15 µm – 20 µm opening, I released 2.8 µm magnetic beads 300 µm from the magnet, halfway between the floor and the roof of the cell, and then at distances from the magnet increasing in 100 µm increments up to 2500 µm. The buffers used were a low-viscosity, 1.5 centipoise (cP), 25% w/v CaCl2 solution and a high-viscosity, 7 cP, 55% w/v glycerol solution. Magnetic microspheres (bead density 1.22 g/cm³) are neutrally buoyant in the CaCl2 solution, while glycerol retards sedimentation. The ejected beads quickly reach terminal velocity within the field of view. The spatial rate of change of the component of the magnetic field pointing towards the magnet does not vary greatly over a distance of 20 µm – 30 µm; consequently, the force is effectively constant over 20 µm – 30 µm changes in the distance between the beads and the magnet, even as close as 300 µm from the magnet. Since 20 µm – 30 µm is also the approximate z-field of view (the direction of the force is defined to be the z-axis), I use the terminal velocity in Stokes drag law (Batchelor 2000) to evaluate the force at that location: F = 3πη·d·vz. Here, η is the viscosity of the medium, vz is the velocity in the direction of the force, and d is the bead diameter, d = 2.8 µm. Because the velocity of the beads is ~10 µm/s and, thus, the Reynolds number is small, use of Stokes drag law is valid. Further, the effect of the vertical bounding surfaces is negligible because particle trajectories are confined to a plane well separated from them. The buffer viscosities were measured using a Thermo Haake RheoStress 600 viscometer (Thermo Scientific, Pittsburgh, PA).
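As a representative worked example using the values above: a d = 2.8 µm bead settling at vz = 10 µm/s in the 7 cP glycerol buffer experiences a drag force of F = 3π(7 × 10⁻³ Pa·s)(2.8 × 10⁻⁶ m)(10⁻⁵ m/s) ≈ 1.8 pN, squarely in the force range of interest for the DNA experiments.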

The resulting movies are processed frame-by-frame to determine the optical centroids of the particles using the method of Crocker and Grier (Crocker and Grier 1996, 298-310). Note that only the image processing and feature localization subroutines are used; the trajectory determination procedures are not called. The MATLAB code utilized for this is courtesy of Blair and Dufresne (Blair and Dufresne 2008). Once particle coordinates are determined, I utilize my own tracking algorithm (see Chapter 4 of this dissertation) to determine the likely trajectories of the microspheres at the various pipette-magnet separation distances, which are then used to calculate the velocities of the particles. The inhomogeneity in the magnetic properties of the particles leads to a histogram of terminal velocities; the peak velocity of the histogram is used as the terminal velocity of the beads at that distance. The forces are computed using Stokes drag law as described above and then checked against the force-distance relationship from the single molecule experiments.

2.3 Experimental Results

The magnetic tweezers device was tested to demonstrate proper functionality. Experiments were performed with single tethers of DNA, and also with proteins and DNA to detect DNA-protein interactions.

2.3.1 DNA tether extension

I show a series of frame images from a DNA pulling experiment in Figure 2.8. Here, the dual bead construct is evident, with the non-magnetic polystyrene bead fixed to the surface of a glass micro-rod at the top of the image while the superparamagnetic bead is connected to the non-magnetic bead by way of the DNA molecule.


Figure 2.8 Attached DNA tether undergoing extension. These are frame grabs from an experiment in which dual-bead DNA tethers were attached to the surface of a glass rod through a link between the nonmagnetic bead and the glass micro-rod. From left to right, the distance between the magnet and the glass micro-rod is decreased, causing an increase in the force on the DNA tether and extending the molecule until its full contour length of 16.4 µm is reached.

At a low force (~2 pN), the left-most panel shows a tether extension of approximately 14 µm.

Going from left to right, the distance between the tether and the magnet is decreased by moving the magnet closer to the fixed glass micro-rod. Ultimately, the full contour length of 16.4 µm is attained as the force approaches 30 pN in the right-most panel.


A hallmark of DNA micromanipulation is the ability to reversibly extend and contract a single DNA molecule. Since in thermal equilibrium DNA shows no hysteresis, the molecule should follow the same force-extension path however it is manipulated. In Figure 2.9, I plot results from an experiment in which the DNA molecule is moved back and forth between different distances from the magnet. Figure 2.9A shows the change in the tether extension over time. Beginning at an extension of 14 µm, the magnet and DNA are brought closer together (blue) and the end-to-end length of the DNA molecule increases up to ~15.5 µm. At this point, the experiment pauses for several minutes before the magnet and DNA are moved apart from each other (green). This continues into a region of entropic elasticity, down to ~13 µm extension. The experiment then pauses before once again moving the magnet and DNA tether closer together (red), leading to the DNA molecule reaching an extension of ~15.7 µm. The forces calculated during this experiment are plotted in Figure 2.9B. As with Figure 2.9A, the blue and red points are data taken while the DNA tether and the magnet are being brought closer together, while the green data are for when the magnet is being moved further away from the DNA molecule. As expected, the force increases in the blue and red regions while it decreases in the green region. The blue region covers a range of forces from ~1 pN to ~5 pN, after which the green region goes all the way down to ~0.5 pN, followed by an increase of force in the red region to nearly 12 pN.


Figure 2.9 Reversible stretching of a single DNA molecule. (A) The DNA tether extension is shown for 3 separate parts of a single experiment. In blue, the tether extension increases as the magnet is moved closer to the molecule. Then, in green, the magnet is moved away from the molecule, resulting in a decrease in extension. Finally, the magnet and tether are brought closer once again and the DNA molecule extends again. (B) The forces calculated for the experiment phases as described, resulting in an increase of force as the magnet is moved closer to the tether (blue), a decrease in force as the two are moved apart (green), and an increase in force as the molecule and magnet are brought together (red). (C) A plot of DNA force versus extension. As in parts A and B, the colors represent changing the distance between the magnet and the DNA tether. The Worm-Like Chain model is plotted as a black dashed line.

Finally, in Figure 2.9C I plot force and extension together. Here, each data point has coordinates corresponding to the force and extension from an individual frame. Again, the blue and red data occur as the DNA and magnet move closer together, while the green data occur as they move apart. Additionally, the Worm-Like Chain model for DNA is plotted as a dashed black line. It is clear that the three different regions of the experiment overlap with each other, showing no hysteresis, and that all of these regions also show excellent agreement with the Worm-Like Chain model.


2.3.2 Force Calibration Results

I present the results of force calibration trials in Figure 2.10. The single molecule experimental data came from a series of DNA micromanipulation experiments. In the bead drop experiments, superparamagnetic beads identical to the ones used in the DNA experiments (as described in section 2.2.5) were subject to the same forces at the same distances from the magnet as the microspheres used in the DNA tether experiments. For the DNA tether data, forces were calculated using the Fluctuation-Dissipation theorem; for the bead drop experiments, Stokes drag law was used to relate the terminal velocity of the superparamagnetic beads to the magnetic force.

A comparison of the two methods is given in Figure 2.10. Plotted in solid black are the results from single DNA tether force calculations as a function of distance from the magnet. Similarly plotted are four separate sets of data collected from the bead drop experiments, again showing force as a function of distance from the magnet: three trials using glycerol-based buffer solution and one trial using calcium-chloride buffer solution. Also plotted is a 1/r⁴ force curve with two fitting parameters A and B: F(r) = A/|r + B|⁴. Based on my data, I found that A = 2.15 × 10¹³ pN·m⁴ and B = 4 × 10⁻⁷ m fit the results very well. This curve gives the expected theoretical force versus distance relationship for a magnet with the geometry I used; see (Yung, Landecker, and Villani 1998, 39-52).
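Such a two-parameter fit can be reproduced with a few lines of MATLAB. In the sketch below, rData and fData stand in for the measured distance and force vectors from the bead drop trials, and the starting guesses are purely illustrative.

    % Sketch: fit F(r) = A ./ abs(r + B).^4 to bead-drop force-distance data.
    % rData and fData are placeholder vectors of distances and forces.
    model = @(p, r) p(1) ./ abs(r + p(2)).^4;          % p = [A, B]
    sse   = @(p) sum((fData - model(p, rData)).^2);    % sum of squared errors
    p0    = [1e13, 1e-7];                              % illustrative initial guess
    pFit  = fminsearch(sse, p0);                       % base-MATLAB optimizer
    fprintf('A = %.3g, B = %.3g\n', pFit(1), pFit(2));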


Figure 2.10 Force Calibration via Stokes Law. This plot shows the results from the bead drop experiments, which utilized Stokes drag law as a way to measure force at various distances from the magnet. Three trials used a glycerol buffer while one trial used a calcium-chloride buffer. In black, 7 different DNA tether extensions were used to find the force on the DNA molecule as calculated using the fluctuation-dissipation theorem; the black line represents the mean calculated force for those trials. Finally, the dashed grey line shows a 1/r⁴ fit line, which is the expected force-distance relationship for a permanent bar magnet.

For medium forces and distances (500 µm – 1000 µm), the glycerol bead drop trials in blue and yellow follow the 1/r⁴ curve and also align well with the Fluctuation-Dissipation-based calculations. Between 1000 µm and 2000 µm, I also see very good agreement between the forces found using DNA micromanipulation experiments, the bead drop trials, and theory. These results show the consistency of the experimental data and the tracking-based methods, since all approaches agree on the force-distance relationship. This is especially important since the forces here, 0.1–10 pN, are in the range relevant to DNA-protein interaction studies carried out using the magnetic tweezers. However, at closer distances to the magnet (< 500 µm), the DNA-based forces do not agree with those from the bead drop experiments or the 1/r⁴ fit. At these distances, the microsphere experiments and the theoretical fit show that forces are > 50 pN, a range where the bead-DNA bonds break easily, making it difficult to acquire sufficient data.

2.3.3 Determination of Experimental Precision

Extension and force data are subject to thermal fluctuations, which set the fundamental limit on precision. Additional noise components include localization error from video-microscopy-based centroid detection and other non-thermal (instrumental) noise sources. To quantify the precision, nine replications of an experiment were performed in which a DNA molecule was held at specific distances from the magnet for five minutes at each location while force and extension data were recorded. The distances used for these experiments were 2000 µm, 1750 µm, 1500 µm, 1250 µm, 1000 µm, and 900 µm. Analysis of the data involved calculating the average extension and force for each of these distances, as well as the standard error (S.E.) of the mean. I plot the results of these experiments in Figure 2.11; the values are summarized below.

Distance (µm)   Mean extension (µm)   S.E. (µm)   Mean force (pN)   S.E. (pN)
2000            13.08                 0.030       0.59              0.050
1750            13.53                 0.024       0.72              0.05
1500            13.86                 0.020       0.77              0.060
1250            14.12                 0.017       0.87              0.0638
1000            14.38                 0.0078      1.13              0.080
900             14.68                 0.0065      1.43              0.10

Figure 2.11 Extension and force measurement precision. Experiments were performed in which individual DNA tethers were held at six distances from the magnet. The corresponding extensions and forces were calculated and averaged across all experiments. The error bars represent the standard error of the mean in both extension and force.

These standard errors suggest that in the range at which typical DNA-protein experiments occur (<10 pN), force precision is on the order of ~0.05 pN and extension precision ~10 nm (and less than 10 nm for forces >1 pN).

2.3.4 DNA-protein complexation

Experiments were performed in which histones were introduced into the sample cell in an effort to observe complexation of proteins with a single DNA molecule. Upon binding of the proteins, the tether should become significantly shorter. Indeed, Figure 2.12 plots the DNA extension as a function of time, and the change in extension due to the binding and unbinding of histones is evident. With the DNA molecule initially held at ~1 pN, proteins were introduced into the sample cell and interacted with the DNA, causing an abrupt compaction.


Figure 2.12 Protein complexation with a single DNA tether. The DNA tether extension is plotted as a function of time. Here, the molecule is held at approximately 1 pN as proteins (histones) are introduced to the sample cell. When the histones bind to the DNA molecule, the tether undergoes compaction from ~15.5 µm to ~3.8 µm. After resting for several minutes, the force on the DNA molecule is slowly increased. As the force increases, the histones dissociate from the DNA molecule when their binding energy is no longer sufficient to remain attached. Eventually all histones are ruptured and the DNA resumes its original length.

This compaction reduces the DNA molecule's end-to-end length from ~15.5 µm to ~3.8 µm. After compaction, the force on the DNA molecule is slowly increased by moving the magnet and the DNA tether closer together. As this happens, the histones begin to dissociate from the DNA molecule when their binding energy is no longer sufficient to remain complexed with the DNA. Since native histones unbind from DNA at a range of kinetic rates, the histone ruptures do not occur simultaneously; rather, they occur continually as the force is increased. Eventually the DNA molecule regains its original end-to-end length once all bound histones are driven off the tether at sufficiently high forces.

Examining the histone rupture sequence in greater detail provides insight into how these events occur. As described in section 1.5, histone octamers typically wrap 146 base pairs of DNA around their core. At 0.34 nm per base pair, this is equivalent to ~50 nm of tether length. Thus, as the force is increased and the binding energy of each histone is no longer sufficient, there should be jumps of ~50 nm in the DNA tether end-to-end length as the histone octamers rupture from the DNA molecule. In Figure 2.13, I plot several regions of the histone dissociation sequence that show these characteristic 50 nm jumps as detected by the step-finding algorithm described in Chapter 3 of this dissertation. First, in Figure 2.13A, I show a region of the protein experiment just above the critical force, between frames 44,400 and 45,000. Here, four individual 50 nm jumps are evident, as is a single event in which two histone octamers simultaneously ruptured, causing a jump of 100 nm in the DNA tether.


Figure 2.13 Individual histone dissociation events quantified. In this figure, different regions of the protein complexation and dissociation experiments are investigated in greater detail. The visual distance between steps in the different plots may vary due to disparate axis scales. (A) Several histone octamer ruptures are identified, including an event in which two histone octamers are released, causing a 100 nm change in DNA extension. (B) The step-finding algorithm allows unbiased detection of events that may be difficult to differentiate manually. Here, several 50 nm jumps are detected. (C) Multiple histone octamers may rupture simultaneously. In this portion of the experiment, two events of ~200 nm are shown, which correspond to simultaneous ruptures of 4 histone octamers, as well as a jump of ~150 nm, or 3 histone octamers. (D) Three examples of dual ruptures and two examples of single ruptures are evident here.

Moving to Figure 2.13B, in frames 48,800 to 49,300, the force has increased and four individual histone octamer dissociation events are observed, each causing a release of 50 nm of the DNA tether. It is possible for even greater numbers of histone octamers to dissociate simultaneously. This is especially true at higher forces, as I show in Figure 2.13C. Here, between frames 52,700 and 53,000, two distinct events in which four histone octamers ruptured simultaneously are detected, each causing the end-to-end length of the DNA to grow by 200 nm. Additionally, there is a release of ~150 nm of DNA, corresponding to a triple dissociation event. Again, in Figure 2.13D, several ruptures of multiple histone octamers are evident: three events releasing 100 nm each occur. Interestingly, in this region, between frames 54,000 and 54,400, individual histone octamer dissociations still occur.

2.4 Discussion

The transverse magnetic tweezers device that I have developed utilizes several novel aspects in an effort to achieve greater simplicity, reliability, precision, and reproducibility. Attaching a dual-bead DNA tether construct to the surface of a movable glass micro-rod allows easy capture and the ability to examine several DNA tethers for optimal performance. Dual-bead DNA tethers are ideal for passive drift compensation and precise extension measurements. Previous efforts to use dual-bead constructs were mainly limited to optical tweezers and aspiration-based magnetic tweezers. As mentioned in the introduction, these methods have drawbacks which my method works to overcome. For instance, while the loading rate of optical tweezers can be large and difficult to control, the transverse magnetic tweezers I present here is capable of quasi-static conditions. From section 2.3.2, I can characterize the force as a function of distance from the magnet by using the calibration fit line F(r) = A/|r + B|⁴, which has the derivative dF/dr = −4A/|r + B|⁵, where r is in microns and F is in pN. The linear stage system is capable of moving at a speed of approximately 200 nm/s. The loading rate follows from the chain rule, dF/dt = |dF/dr| × (dr/dt); with these values and at 1000 microns distance from the magnet, the tweezers device is theoretically loading the DNA molecule at 0.0001 pN/s, or 0.1 femtonewton per second. A change in force at that level is far too slow for any device to detect, and thus for all intents and purposes the tweezers is operating in quasi-static equilibrium. This is especially beneficial for DNA-protein experiments, as it allows the device to delicately increase the tension to disrupt the binding interactions, which allows for the observation of distinct features in the force and extension data during protein experiments.

Aspiration pipette magnetic tweezers eliminate a large portion of the complexity of optical tweezers and in some instances can enable loading rates that approach the level of my device, but the ability to consistently create and implement the micropipettes can prove a significant hurdle in achieving reproducibility from trial to trial. Eliminating the variable process of creating individual micropipettes and replacing them with standardized, manufactured glass micro-rods helps to increase experimental consistency. Additionally, I have found that attaching the dual-bead constructs directly to the surface of the glass micro-rod allows multiple dual-bead DNA tethers to be examined in a matter of minutes, which permits the user to quickly choose which DNA molecule to use for the experiment. This is in contrast to optical tweezers or aspiration magnetic tweezers, in which users must repeatedly "fish" for DNA tethers individually.

I have also demonstrated the repeatability of my device by performing two types of experiments. First, forces can be repeatedly applied to and released from individual DNA tethers by varying the distance between the magnet and the DNA tether. An example of this experiment was shown in Figure 2.9. Since DNA does not show hysteresis, the expected behavior of a single DNA molecule is that it should follow the Worm-Like Chain model regardless of the manner in which forces are applied. Thus, the close agreement between the force-extension measurements that I show and the Worm-Like Chain model is evidence that the device and the DNA tethers function appropriately. Secondly, experiments were performed in which single DNA tethers were moved at a constant rate towards the magnet and a broad spectrum of forces calculated. These results, as a complement to the bead-drop calibration experiments, show consistent agreement with the expected force relationship and also with the bead-drop experiments. Thus, the ability to repeatedly manipulate single tethers of DNA across a wide range of forces is demonstrated with my device.

It is also important to discuss the performance limitations of the device. Brownian motion of solvent molecules results in an uncertainty in the precision and accuracy of measurements of bead positions at room temperature. Moreover, as a bead tethered to a single DNA molecule in a magnetic trap may be modeled as a bead attached to a spring, thermal-fluctuation-actuated displacements of the bead will lead to random extensions of the spring, and thus to a random component of the force on the bead (by Hooke's law, the force is proportional to the displacement). Thus, force measurements on micron-sized objects are also subject to a fundamental measurement limit. As a result, the accuracy and precision of micromechanical extension and force measurements cannot be improved beyond the thermal limit without changing the bead size, buffer viscosity, or temperature.

However, the bead size can be altered only within narrow limits. On the one hand, beads have to be large enough to be detected optically; on the other hand, very large beads will not tether to DNA properly. For my experiments, beads with diameters of ~3 µm have been found to be the right size. The viscosity of the buffer is also set by the requirement of performing experiments in physiological conditions; this means single molecule experiments have to be performed in aqueous buffers with viscosities close to that of water. Additionally, as the experiments are performed on single DNA molecules and proteins, the temperature has to remain constant at room temperature.

The thermal resolution limit of extension measurements can be estimated using δx = (1/α)√(4k_BT·γ·∆f), while the smallest measurable force can be found from δF = √(4k_BT·γ·∆f), as discussed in (Neuman and Nagy 2008, 491-505). Here, α is the (magnetic) trap stiffness, γ = 6πηa is the drag coefficient, a is the bead radius, η is the viscosity of the buffer, ∆f is the cut-off frequency, and k_BT = 4.1 × 10⁻²¹ J. At 10 pN, α = 2.66 × 10⁻⁵ N/m, a = 1.5 × 10⁻⁶ m, η = 8.9 × 10⁻⁴ Pa s, and ∆f = 15 Hz, leading to a thermal limit for extension resolution of δx = 2.96 nm and δF = 0.079 pN.
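These numbers follow directly from the parameters just quoted, as the short MATLAB calculation below verifies:

    % Thermal resolution limits from the parameters quoted above.
    kBT   = 4.1e-21;            % thermal energy at room temperature, J
    alpha = 2.66e-5;            % trap stiffness at 10 pN, N/m
    a     = 1.5e-6;             % bead radius, m
    eta   = 8.9e-4;             % buffer viscosity, Pa s
    df    = 15;                 % cut-off frequency, Hz
    gamma = 6*pi*eta*a;         % Stokes drag coefficient, N s/m
    dF = sqrt(4*kBT*gamma*df);  % smallest resolvable force, N
    dx = dF/alpha;              % thermal limit on extension resolution, m
    fprintf('dx = %.2f nm, dF = %.3f pN\n', dx*1e9, dF*1e12);
    % prints: dx = 2.96 nm, dF = 0.079 pN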

Use of video-microscopy to localize bead centroids further limits the precision with which beads can be located. The magnitude of the error thus introduced may be estimated using δx ≈ Γ/(NΓ·√SNR), as discussed in Bobroff (Bobroff 1986, 1152-1157); see also the following reports (Pertsinidis, Zhang, and Chu 2010, 647-651; Thompson, Larson, and Webb 2002, 2775-2783; De Vlaminck and Dekker 2012, 453-472). Here, δx is the systematic error in centroid localization, Γ is the half-width (or radius) of the image spot (or object of interest), SNR is the signal-to-noise ratio of the detector (the CCD or CMOS camera in this case), and NΓ is the number of samples in the distance Γ. As expected, δx depends on the camera parameters SNR and NΓ and on the microscope objective parameter Γ. This equation states that the accuracy of centroid localization is proportional to the spot size, and inversely proportional to the SNR of the device and the number of samples (pixels) used in the localization scheme. Thus, to minimize the localization error one should: 1) increase the SNR; 2) decrease the spot size, which decreases the absolute amount of noise that exists in the region of interest; and 3) increase the number of observation points in the spot size, which improves the quality of the fit.

For the tweezers described here, the SNR of the Point Grey Grasshopper3 camera is 45.12 dB (Point Grey 2015); the half-width of the central maximum of the diffraction pattern is Γ ≈ 0.8 µm; and the number of pixels in the distance Γ is approximately 13.86 (with a pixel-size ratio of 0.058 µm per pixel). This leads to an error in centroid localization of δx ≈ 8.6 nm.

In some cases, additional corrections due to finite integration (or exposure) time, aliasing, and motion blurring become important and have to be taken into account. CCD- or CMOS-based cameras have a finite integration time, which is the time it takes for the sensor to image a single frame. For the camera used here, the exposure time can be set in the range 0.005 ms to 3.2 s (Point Grey 2015). Aliasing refers to the introduction of spurious frequency components in the power spectrum of the position variance 〈δx²〉; this is because a camera with frame rate fs can only correctly sample motion at frequencies less than or equal to fs/2, by Nyquist's theorem. Motion blurring refers to position measurements being averages over the bead position between two successive frames; the longer the time between successive frames, the greater the averaging. The combined effect is to introduce biases, or systematic offsets, in the estimate for 〈δx²〉 and, thus, in the measured force. However, these are mainly of concern for experiments with extremely short tethers (less than 1000 bp), small microspheres (less than a micron), and higher applied forces (De Vlaminck and Dekker 2012, 453-472), and thus do not influence my results in a substantial way.

Comparison of the calculated limits to the experimental precision values from Figure 2.11 shows that the transverse magnetic tweezers device operates very near the theoretical limits. From section 2.3.3, I report experimental precision in extension approaching 10 nm, depending on the force applied to the DNA tether, which compares favorably with the combined theoretical resolution limit of ~10 nm due to thermal noise (δx ≈ 3 nm) and localization uncertainty (δx ≈ 8.6 nm). Similarly, I find experimental precision in force to be ~0.1 pN, which again is very near the suggested theoretical limit of 0.08 pN. These results suggest that the physical device and software methods are operating at a level very near the limit of physical possibilities.

The precision of the device is particularly important when performing DNA-protein complexation experiments. Proteins that interact with DNA will often cause changes in extension on the order of 10 nm, which is the theoretical limitation of the transverse magnetic tweezers. I examined the ability of my device to observe individual DNA-protein interactions by allowing histones to bind with single DNA tethers. As shown in Figure 2.12, this results in the expected compaction of the DNA tether and subsequent dissociation of histone octamers as the tension on the molecule is increased. Additionally, in Figure 2.13, tether extension jumps in multiples of 50 nm are resolved repeatedly at various forces throughout the entire experiment, in agreement with previously reported results.

Chapter 3 – Step-finding Algorithm

3.1 Review of Step-finding Algorithms

Methods to analyze time series or other signals to detect regions in which the signal is constant are useful in many different fields of science and engineering. Single-molecule DNA-protein micromanipulation experiments, for instance, often involve studying the binding and force-induced unbinding of proteins from a single tethered DNA molecule. After the DNA tether is extended under a small force, proteins are introduced into the sample cell. These proteins may bind to the DNA and compact it by stabilizing loops, bends, or kinks. Once the DNA is compacted, the force can be increased until it is just large enough to drive off the bound proteins. When the end-to-end tether extension is measured, the unbinding events will be seen as jumps between constant-extension regions. This type of step-like structure can also be found in other types of single molecule experiments in biology, and often arises in areas such as electrical engineering or econometrics. In this chapter, I present a novel method for automatically finding steps in such data series.

Algorithms that perform analyses of this kind are often referred to as "step-finding," "step-fitting," or "step detection" algorithms. The general goal of algorithms in this family is to identify regions of a data trace where the signal maintains a certain level for some time and then changes to another level. This change is frequently abrupt, as in a step described by the Heaviside function. In such a case, the linking segment between successive steps has an infinite slope. While it is rare to encounter signals in nature with purely step-function type jumps, it is common to model step data as a series of such functions. Indeed, many existing step detection algorithms have utilized this principle as a basis for their analysis of the data (Neuman et al. 2005, S3811-S3820).

A key consideration in using such techniques is how well the signal can be modeled this way and what effect departures from the model have on the results. Another important consideration is preventing over-fitting. For instance, by using a suitable number of parameters, it may be possible to detect all steps in a particular type of noisy signal with very high confidence. However, when these settings are applied to other signals, the algorithm's performance may decline rapidly. Thus, a step finding method should be robust enough to be applied to large classes of signals without requiring re-adjustment of parameters to ensure accurate results. Finally, the signals of interest inevitably have large noise components. The noise may or may not be Gaussian, and the algorithm should, ideally, perform well in all cases.

A number of previous studies have examined different approaches to the step finding problem. Kalafut and Visscher describe the application of the Schwarz Information Criterion (SIC) to fitting step functions to a data trace (Kalafut and Visscher 2008, 716-723). The SIC is used as a way to prevent over-fitting by penalizing fits that utilize too many parameters. Carter and Cross propose an algorithm that uses the two-sample Student's t-test to evaluate steps (Carter and Cross 2005, 308-312). This algorithm compares a given data point with other individual data points within a certain time range, resulting in the algorithm categorizing each point as belonging to a dwell (plateau), forward step, or downward step. A method utilizing wavelet transform multiscale products was proposed by Sadler and Swami (Sadler and Swami 1999, 1043-1051). Multiscale products were shown to be effective in isolating the abrupt step transitions in signals, which in turn allows identification of the steps. Kerssemakers et al. analyzed step data using a chi-squared reduction principle (Kerssemakers et al. 2006, 709-712). By comparing a series of step best-fits to a series of counter-fits, in which the fitted steps are displaced to be in between the best fits, the algorithm is able to approach an optimal number of steps fitted to the data.

Velocity calculation and thresholding is the method described by Levi et al. (Levi et al. 2006, L7-L9). This algorithm calculates local velocities of the data, i.e., the change in position over time. If the velocity is below a certain threshold, that region of the data is assumed to be a level step. If the velocity is above the threshold, the data must be sub-divided and the analysis repeated. This method also uses the Akaike Information Criterion (AIC) to penalize over-fitting of the data.

Arunajadai and Cheng utilize a combination of Generalized Least Squares and the Bayesian Information Criterion to successfully reduce over-fitting of the step functions to the data (Arunajadai and Cheng 2013, e59279). Their method is notable for its avoidance of assuming a particular type of noise; rather, it observes and determines the noise correlation throughout the data to more accurately detect the underlying signal. Pair-wise distribution functions have also been used to map steps in a data trace by Kuo (Kuo et al. 1991, 135-138) and Block (Block et al. 2003, 2351-2356). Such algorithms can be useful when the data contain step sizes that are relatively constant; step sizes that vary over a wide range are not handled well by such methods. Other groups, including Milescu (Milescu et al. 2006, 1156-1168), have based their step detection algorithms on knowledge of the underlying principles that generate the data and a Markovian model process to identify steps in the data. These methods are intended for very specific applications, and as such it is difficult to compare them to more generalized algorithms.

Finally, I refer to the method of Herbert et al. (Herbert et al. 2006a, 1083-1094), which utilizes an 8-fold averaged log-dwell histogram to extract peaks corresponding to dwells of RNA polymerase movement along base pairs of a DNA tether. The first step of this method is similar to mine (calculation of relative probabilities for extensions), but it diverges quickly after that in order to meet the specific needs of the algorithm's application in that situation.

Many of the aforementioned techniques utilize statistical methods to fit a series of step functions to the data, which means that the transition between steps is instantaneous. For data in which the steps do not feature a linking segment of infinite slope, that is, data for which the linking segments may have a variable, finite slope, such algorithms may not be as effective. Furthermore, several of the methods make assumptions about the underlying noise properties. While independent Gaussian white noise is the predominant type of noise encountered experimentally, it is not always the case; opto-mechanical systems that are used in biology, biological physics, and electrical engineering can introduce frequency-dependent, or colored, noise. Analyzing a data trace that features correlated noise with an algorithm that assumes uncorrelated noise can lead to problems.

3.2 Step-finding Algorithm Methodology

The intent of the step-finding algorithm is to detect steps or plateaus in a noisy data trace. This algorithm should not assume that the signal can be modeled as a series of step functions (e.g. Heaviside functions) and does not assume any specific type of noise. The algorithm must be capable of detecting plateaus in the data that are not strictly step-like in nature and may instead be linked by segments whose trend lines have finite slopes. This method should be generalized, not requiring any a priori knowledge about the data signal, such as step sizes, distributions, frequencies, durations, or noise.

The algorithm methodology will be broken into three parts: step-trace simulations, step-finding algorithm description, and performance analysis. The step-trace simulation section will lay the groundwork for characterizing steps based on several parameters. The step-finding algorithm description section will demonstrate the theoretical background and components of the actual detection algorithm. Finally, the performance analysis section will discuss how the effectiveness of the algorithm is to be judged.

3.2.1 Step-trace simulations

The simulated signals consist of a series of individual "steps", which are horizontal line segments or "plateaus" of adjustable length, separated by line segments, also of adjustable lengths, with positive slopes. Each step is characterized by three quantities: (a) the step length, which is the length of the plateau; (b) the step height, which is the vertical distance between the end of one plateau and the start of the next; and (c) the plateau separation, which is the linear distance between the end of one plateau and the start of the next one. The slopes of the connecting segments are not specified explicitly; they can range from near zero to arbitrarily large values, and are calculated from the ratios of the step heights to the plateau separations. The goal is to determine the locations of the steps, which are specified in arbitrary units corresponding to the y-coordinate value. Figure 3.1 shows an example data trace with the various step parameters labeled.

Figure 3.1 Defining the various step parameters. An example of a typical data trace and the various parameters used to describe the steps is shown, with arbitrary units of position and time along the y and x axes respectively. Step length represents the duration of the horizontal component of the step. Step height refers to the vertical distance between subsequent steps. Step separation is used to describe the horizontal distance between subsequent steps. These three parameters can be used to define an entire data trace of unique steps.

A MATLAB program was written to generate simulated data traces based on the parameters described above (see Appendix A.7 for the code). The input variables consist of: 1) the number of steps to be generated, 2) the step separation, 3) the step height, 4) the step length, 5) whether or not to include normally-distributed noise in the final signal, and 6) the noise width σ, which is the standard deviation of the normal distribution. Input parameters 2, 3, 4, and 6 are specified in arbitrary units. Similar methodology was used to generate data sets with Poisson-distributed noise, the only difference being in input 6, where random noise drawn from the Poisson distribution is added rather than Gaussian noise.

From the step separation and step height, a slope is calculated for the link between successive plateaus, which is then used to determine the y-coordinates from the x-coordinates for the segments linking the plateaus. Once these segments have been determined, the plateaus are found by extending the y-coordinate of the last point of the linking segment for the number of points specified by the step length parameter. This generates the solid line data trace of Figure 3.1. From here, the noisy data trace (dots in Figure 3.1) is created by adding to each point of the base signal a normally distributed random number with zero mean and standard deviation σ. Thus, for σ = 4, most points of the base signal are perturbed by an additive value within about ±4 (one standard deviation), with larger excursions occurring with the probabilities dictated by the normal distribution. I show in Figure 3.2 how the addition of Poisson-distributed noise can alter the baseline signal of the step simulation process; note the presence of clustered data points, a hallmark of the discrete noise generated using the Poisson distribution.

Figure 3.2 Example of Poisson distributed noise. Subtle differences are evident in comparing the Poisson distributed noise of this figure with the Gaussian noise of Figure 3.1. The discrete nature of the Poisson distribution results in clusters of points that are close together yet separated more from other clusters.


All three step parameters in the simulation program can be assigned randomized values from a user-defined interval within a single simulated signal, which allows better emulation of experimental data. Step lengths, step heights, and step separations may be drawn from uniform or normal distributions, and the user may independently specify the range for each parameter: for instance, [20, 100] for step length, [50, 200] for step height, and [40, 80] for step separation. The data traces for such simulations are calculated in essentially the same way as for constant-parameter traces, except that for each step a random parameter is chosen prior to calculating the slope of the linking segment and the length of the plateau; the noisy data trace is generated in exactly the same way as in the other examples.
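A minimal sketch of such a trace generator is given below; the parameter ranges are the illustrative ones quoted above, and the full program appears in Appendix A.7.

    % Minimal sketch of the simulated step-trace generator.
    nSteps = 20;  sigma = 4;
    len = randi([20, 100], 1, nSteps);    % step lengths (plateau durations)
    hgt = randi([50, 200], 1, nSteps);    % step heights
    sep = randi([40, 80],  1, nSteps);    % plateau separations
    base = [];
    y = 0;
    for s = 1:nSteps
        ramp = y + (1:sep(s)) * (hgt(s) / sep(s));   % linking segment
        y    = ramp(end);
        base = [base, ramp, y * ones(1, len(s))];    % ramp, then plateau
    end
    noisy = base + sigma * randn(size(base));        % Gaussian noise, std sigma
    % For Poisson-distributed noise instead (Statistics Toolbox):
    % noisy = base + poissrnd(sigma, size(base));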

3.2.2 Step-finding Algorithm description

In Figure 3.3A I plot in blue a portion of a data trace that is to be evaluated by the algorithm, with the baseline signal shown as a solid red line in Figure 3.3B. The raw data (blue) shown in Figure 3.3A have noise σ = 3 and are unfiltered. The first stage of the step-detection algorithm is determining the discrete probability density function (PDF) of the data trace (Figure 3.3C, red). The discrete PDF is a histogram that is calculated by counting the number of data points in discrete bins, with each bin defined by an interval of y-coordinates. The user may generate the PDF in a completely automated way; in this case, the algorithm finds the mean difference between the y-values of successive points throughout the entire data set, which gives a rough idea of the "width" of the data, and this number is then used as the bin size. Alternatively, the user may specify a bin size to be used. The result is the transformation of the data from position-time space to probability-y-position space.

Next, the second derivative of the PDF is calculated using finite differences (Figure 3.3D, green). This helps to emphasize changes in the direction of the signal, which is the core information used to identify values of the y-coordinates where there is a high density of data points and therefore a significant probability of the existence of a plateau. Then, the second derivative is added back into the original PDF (Figure 3.3E, magenta) and the result is squared (Figure 3.3F, blue) to make all values positive. This step of summing the original PDF and its own second derivative is performed because I find that it helps to amplify the portions of the signal where there are simultaneously large changes in direction and large numbers of points, while reducing the prominence of the signal where there are fewer points and smaller changes of direction.

After these steps, the data have been transformed into a system of peaks and valleys, in which the peaks represent y-coordinate values with a higher probability of data points residing at that value than at the surrounding values, and the valleys represent y-coordinate values with a low probability of points relative to the surrounding values.


Figure 3.3 Step-finding algorithm demonstration. This series of plots shows a step-by-step breakdown of the method by which my algorithm obtains the locations of plateaus in an unfiltered data trace with noise σ = 3. For each plot, the thick (bold) colored line represents the signal at that specific stage of the algorithm, with previous stages shown in thinner colored lines for perspective. (3A) [Blue] A segment of a noisy data trace with randomly assigned step parameters. (3B) [Red] The baseline signal for that data set shows four (4) areas where the signal is constant. (3C) [Red] The initial probability density function (PDF) result is plotted. (3D) [Green] The second derivative of the PDF is shown. (3E) [Magenta] The second derivative from 3D is added back into the original PDF. (3F) [Blue] The result of 3E is squared. (3G) [Black] The resulting signal from 3F is modified using Local Extrema Interpolation Averaging. (3H) After identifying the significant peaks from 3G and determining the centers of those peaks, the results [dashed black] are plotted against the raw data from 3B.

While identification of peaks is possible using the resulting distribution, nearly all examples show better results if the PDF is smoothed. A very effective way to smooth this signal is to use my own Local Extrema Interpolation Averaging (LEIA). The LEIA method finds all local maxima and local minima separately and then uses a common interpolation method (linear or cubic spline) to determine two smoothed distributions, one from the local maxima and one from the local minima. Once these two signals are obtained, they are averaged over the entire data range to obtain a new data distribution (Figure 3.3G, black).

Next, a MATLAB routine, extrema.m, is used to find the peaks, which are then analyzed by a peak-scoring method based on the arc lengths of all peaks in the data series. For each local maximum, the arc length along the LEIA-smoothed function from the minimum preceding the maximum to the minimum succeeding it is calculated. The standard deviation of the set of arc lengths is computed; the largest value is then trimmed from the set and the standard deviation is recalculated on the trimmed set. This is repeated over the entire set, and the percent difference of the standard deviation is calculated after each trimming iteration. The trim at which the largest percent difference between standard deviations occurs is used to separate significant peaks from noise peaks. The reasoning is that the standard deviation shows the greatest change when the peaks with large arc lengths are trimmed from the set and only small, relatively constant peaks remain. This is generally sufficient to determine which peaks are significant enough to be the result of a plateau in the data.
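The trimming rule can be sketched as follows (a minimal sketch; arcLen is assumed to hold the minimum-to-minimum arc length of each peak in the LEIA-smoothed signal):

    % Sketch of the arc-length trimming criterion described above.
    arcLen  = sort(arcLen(:), 'descend');    % largest arc lengths first
    nPeaks  = numel(arcLen);
    pctDiff = zeros(nPeaks - 1, 1);
    for k = 1:nPeaks - 1
        s0 = std(arcLen(k:end));             % std before trimming
        s1 = std(arcLen(k + 1:end));         % std after trimming the largest
        pctDiff(k) = abs(s0 - s1) / s0;      % relative change from the trim
    end
    [~, kCut]   = max(pctDiff);              % largest jump marks the cutoff
    significant = arcLen(1:kCut);            % peaks classified as true steps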

These significant peaks are then the output of the algorithm, representing the points at which the data signal has a local plateau. Figure 3.3H plots the original signal (from Figure 3.3B) along with the locations (dashed black lines) of the peaks from Figure 3.3G, showing that the peaks correspond to sections of the signal that contain steps.

An optional component of the step-finding algorithm is a data-preprocessing step that applies a filter. While the step-finding algorithm is capable of identifying steps in raw data, there are many cases in which pre-processing can be helpful. For such cases, an edge-preserving bilateral filter was adapted from the DIRART image-processing MATLAB toolbox (Yang et al. 2009, 844-847) in order to best preserve the step-like features in the data. The bilateral filter assigns a new coordinate to each point in a time series based upon a two-dimensional Gaussian-weighted kernel. The basic inputs to the bilateral filter are the data, the filter width, and the filter height. The filter width and height determine the size of the Gaussian-weighted kernels in the horizontal (x-coordinate) and vertical (y-coordinate) directions, respectively. Increasing the filter width includes more points in the x-coordinate range when calculating the new value for each point, while increasing the filter height utilizes more points in the vertical y-coordinate. As a special case, using a width of zero for either parameter eliminates the use of any points in that dimension: a filter height of zero results in a purely horizontal Gaussian kernel, while a filter width of zero results in a purely vertical Gaussian kernel. When the data were filtered using this method, a filter width and height of 10 and 10 were used. Figure 3.4 shows an example data trace pre- and post-filtering.

Figure 3.4 Bilateral filter applied to simulated noisy data. The effects of the bilateral filter pre-processing on noisy data are shown. The solid black line represents the base signal with randomly chosen step parameters. The empty circles show the signal after normally-distributed noise with σ = 9 is added. The black dots show the noisy data after application of the bilateral filter with filter height and width both equal to 10. Position and time are in arbitrary units.
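A filter of this general form can be sketched as follows (a minimal sketch, not the DIRART code itself; w and h are the filter width and height, both assumed positive, and y is assumed to be a row vector):

    % Sketch of a 1-D bilateral filter of the kind described above.
    function yf = bilateralSketch(y, w, h)
        n  = numel(y);
        yf = zeros(size(y));
        for i = 1:n
            j  = max(1, i - 3*w):min(n, i + 3*w);     % local window in time
            wt = exp(-(j - i).^2 / (2*w^2)) ...       % closeness in time
               .* exp(-(y(j) - y(i)).^2 / (2*h^2));   % closeness in value
            yf(i) = sum(wt .* y(j)) / sum(wt);        % Gaussian-weighted average
        end
    end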


3.2.3 Performance Analysis

Determining the performance of the algorithm on the simulated data requires comparing the steps detected by the algorithm against the known steps in the base signal of the generated data. This involves the simple process of comparing two lists and finding the closest elements of each. The standard for accuracy of a detected step was whether it fell within one-half noise width of the corresponding base-signal step. Using this criterion, three performance metrics were computed: detected steps, false positives, and false negatives. Detected steps are those which fall within one-half noise width of the known steps. False positives are results that do not correspond to any known step, and false negatives are known steps not properly identified. When multiple steps were inferred by the algorithm within one-half noise width of a single real step, the inferred step closest to the real step was counted as a true detected step and the other(s) as false positive(s).
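A simplified sketch of this scoring (a first-match variant of the closest-match rule; variable names are hypothetical):

    % Sketch of the step-matching metrics; trueSteps and foundSteps hold
    % the known and inferred step positions, sigma is the noise width.
    tol      = sigma / 2;                    % half-noise-width criterion
    matched  = false(size(trueSteps));
    falsePos = 0;
    for s = foundSteps(:)'                   % loop over inferred steps
        [d, k] = min(abs(trueSteps - s));    % nearest known step
        if d <= tol && ~matched(k)
            matched(k) = true;               % counted as a detected step
        else
            falsePos = falsePos + 1;         % duplicate or unmatched
        end
    end
    falseNeg = sum(~matched);                % known steps never matched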

3.3 Results

3.3.1 Simulation Results

Simulated data were generated as described in the previous section to test the performance of the algorithm in response to changes in each individual step parameter: length, height, and separation. I refer again to Figure 3.1 to define these parameters: the step length describes the duration of the dwell at a certain position or y-value, the step height describes the vertical distance between successive dwells, and the step separation describes the horizontal distance between successive dwells.

Additionally, for all permutations of these parameters tested, the noise width was varied from 1 to 9 in increments of 2. Tests were carried out by holding two parameters constant at 50 (arbitrary units, as discussed in the methods section) and varying the other parameter from 0 or 1 to 90. Since a step length or height of zero eliminates any plateaus in the data, those parameters start at 1. However, a step separation of 0 is possible, as it corresponds to a canonical step function, and so for step separation the simulated data begin at zero. While holding two parameters constant at 50 and varying the other from 0 or 1 to 90, each parameter triplet is repeated 100 times by drawing from a distribution with a given noise width.

For example, the first parameter set of [1,50,50] for [height, length, separation] is generated 100 times by drawing the noise component from a normal distribution of width 1. This parameter set is then repeated 100 times with noise of width 3; then 100 times with width 5; and so on until noise width of 9.

Next, the independent parameter (in this case height) is increased to 10, and a parameter set of [10,50,50] is simulated 100 times, first with noise width 1, then with noise width 3, and so on until noise width 9. This process is repeated until finally a parameter set of [90,50,50] is generated. The process is similarly repeated for step length by varying the parameter triplet from [50,1,50] to [50,90,50], with each parameter triplet used to generate 100 traces for each of the 5 noise widths. Therefore, each single parameter is tested for 10 different values, at 5 different noise widths, and for 100 replications (up to noise-induced fluctuations); in all, this led to 5000 runs for each parameter, and 15000 data sets in total. For all of these trials, the data were analyzed without any pre-processing or filtering.
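The structure of the sweep can be sketched as three nested loops (makeStepTrace and findSteps are hypothetical stand-ins for the trace generator described in the methods section and the detection algorithm):

    % Sketch of the parameter sweep for the step-height trials; the other
    % two parameters are held at 50.
    heights     = [1 10:10:90];              % 10 values of the free parameter
    noiseWidths = 1:2:9;                     % sigma = 1, 3, 5, 7, 9
    for hVal = heights
        for sigma = noiseWidths
            for rep = 1:100                  % 100 noise realizations
                [y, trueSteps] = makeStepTrace(hVal, 50, 50, sigma); % hypothetical
                foundSteps     = findSteps(y);                       % hypothetical
                % ...score foundSteps against trueSteps as in section 3.2.3
            end
        end
    end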

To test the algorithm in more realistic cases, it was used on both randomized simulated data and experimental data. The randomized simulated data were generated as described in the methods section, with input parameters of [20, 200] for all three parameters of length, height, and separation. Thus, each step is randomly assigned a unique linking segment slope and step length. With parameters for step separation and step height ranging from 20 to 200, linking segment slopes between 10 and 1/10 are possible, for instance. These ranges were chosen because they are typical of the parameters observed in the DNA-protein experiments. Normally-distributed noise was added with noise width σ ranging from 1 to 5. These trials were analyzed both without and with the bilateral filter pre-processing step.


3.3.2 Algorithm Performance

The relationship between algorithm effectiveness and step height is shown in Figure 3.5A. This figure plots the percent of steps found against the step height for various noise widths σ. The data sets for these trials were not filtered prior to analysis. At very low step heights—in this case, a step height of 1 is equal to or less than every noise width tested—the algorithm has difficulty identifying steps, with only about 10 percent of steps found for all σ. A general trend evident in Figure 3.5A is that the larger the σ, the larger the step height needs to be for the algorithm to correctly identify the steps. For σ = 9, the algorithm has difficulty identifying more than 60% of steps for any step height value. Correctly identifying more than 90% of steps for noise widths of 1, 3, and 5 requires step heights of 20, 50, and 80, respectively. This relationship between step height and noise is expected since noise introduces variability of the signal in the vertical direction, and it follows that greater noise widths require greater vertical step heights in order for the algorithm to discriminate between neighboring steps.

In Figure 3.5B and Figure 3.5C I plot the false positive and false negative rates for these trials, respectively. For σ = 1, both rates fall rapidly to less than 5% by a step height of 20, and eventually reach zero false step identifications. When σ = 3, the algorithm requires a step height of 50 before its false positive and false negative rates drop below 5%. In general, I observe an inverse relationship between the percent of steps found and the rates of false positives and false negatives. I also find that the rates of false positives and false negatives are essentially equal to each other for these trials.

Figure 3.5 Algorithm performance as a function of step height. (A) The average percent of steps found for trials with variable step height is plotted for several noise widths σ. The unfiltered data were analyzed for these simulations. In these trials, step length and step separation were held at 50 points each while step height was assigned values of 1, 10, 20, 30, 40, 50, 60, 70, 80, and 90. Each step parameter tuple, e.g. [40,50,50], was tested with 100 different sets of normally-distributed noise. (B) The average number of false positive step identifications is plotted for the same trials. (C) The average number of false negatives for the same trials is plotted. Step height is in arbitrary units.

In Figure 3.6A, the performance of the algorithm as a function of step length and noise is plotted. Again, the raw data were analyzed without any filtering. As with step height, I found that a longer step length—and therefore more points at that y-coordinate value—increased the likelihood that the algorithm would identify steps correctly. For a step length of 1, it is extremely difficult for the algorithm to extract steps since the "gap" between successive sloped linking segments is minimal and steps are nearly non-existent. For each σ, there comes a point beyond which an increase in step length has little impact on the ability of the algorithm to correctly identify the steps. With σ = 1, the algorithm reaches 100% effectiveness at a step length of 30. For σ = 3, the algorithm reaches its maximum effectiveness of ~92% at a step length of 50 and remains close to that for the rest of the step lengths. Similarly, increasing σ to 5 shows a maximum effectiveness at a step length of 70, with higher step lengths remaining close to ~60% effectiveness.

Figure 3.6 Algorithm performance as a function of step length. (A) The average percent of steps found for trials with variable step length is plotted for several noise widths σ. The unfiltered data were analyzed for these simulations. In these trials, step height and step separation were held at 50 points each while step length was assigned values of 1, 10, 20, 30, 40, 50, 60, 70, 80, and 90. Each step parameter tuple, e.g. [50,40,50], was tested with 100 different sets of normally-distributed noise. (B) The average number of false positive step identifications is plotted for the same trials. (C) The average number of false negatives for the same trials is plotted. Step length is in arbitrary units.

Figure 3.6B shows the rates of false positives for these trials, and Figure 3.6C plots the false negatives. These plots show behavior similar to the response to step height. For σ = 1, the rates decline to near zero as the step length approaches 20. For σ = 3, the false positive rate reaches 5% at a step length of 50 and then remains relatively constant. Similarly, when σ = 5, the algorithm maintains an essentially constant rate of 50% false positives for step lengths above 60. For high noise, such as σ = 9, the rates of false positives and false negatives both fluctuate very little over the full range of step lengths, remaining around 80%. Again, as with step height, these plots for false positives and false negatives essentially mirror the plots for the percent of steps found.


The algorithm performance as a function of step separation is shown in Figure 3.7A. As in Figure 3.5 and Figure 3.6, the data in these trials were processed without any filtering. In contrast to the other two parameters, algorithm effectiveness here is inversely related to step separation. For very low noise (σ = 1), the algorithm can identify nearly all steps at every step separation value in the plot. As the noise width σ increases, the decrease in algorithm effectiveness occurs at lower step separation values, as expected. For noise width σ = 3, the algorithm shows reasonable performance of greater than 95% recognition until the step separation reaches 50. Step recognition for σ = 5 drops below 95% when step separation is greater than 20, and at σ = 7 the recognition rate is above 95% only for a step separation of 0. A larger σ prevents the algorithm from successfully identifying more than 90% of steps at any step separation.

Figure 3.7B and Figure 3.7C plot the false positive and false negative rates, respectively, for these trials. At σ = 1, the algorithm is able to limit false identifications of both kinds to under 5% for all step separation values. For σ = 3, the algorithm exceeds 5% false identifications of both kinds only beyond a step separation of 50. Increasing the noise width to σ = 5, I find that the algorithm exceeds a 5% false identification rate beyond a step separation of 30. As with the trials that varied step height and length, the rates of false identifications as step separation varies are essentially inverse to the rates of steps found plotted in Figure 3.7A.

Figure 3.7 Algorithm performance as a function of step separation. (A) The average percent of steps found for trials with variable step separation is plotted for several noise widths σ. The unfiltered data were analyzed for these simulations. In these trials, step length and step height were held at 50 points each while step separation was assigned values of 1, 10, 20, 30, 40, 50, 60, 70, 80, and 90. Each step parameter tuple, e.g. [50,50,40], was tested with 100 different sets of normally-distributed noise. (B) The average number of false positive step identifications is plotted for the same trials. (C) The average number of false negatives for the same trials is plotted. Step separation is in arbitrary units.

In Figure 3.8A, I plot results for step-trace simulations with added Poisson noise. The percent of proper step identifications is plotted as a function of step height for variable λ. Similar to the results of the Gaussian noise simulations, the correct identification of steps is rather low for step heights of 1 and 10. However, the detection of steps for higher step heights increases rapidly, with the algorithm detecting greater than 80 percent of steps for heights greater than 30 points for all noise widths λ. Furthermore, as expected, for a given parameter triplet the algorithm generally performs better for lower λ. The number of false positives for these trials is plotted in Figure 3.8B. While the number of false positives is initially high for all noise widths λ, with counts above 50 for each λ at a step height of 1, they quickly drop to levels of 20 or lower for all step heights above 30 and for all noise widths.

Figure 3.8 Algorithm performance for simulations with Poisson-distributed noise. (A) The average percent of steps found for trials with variable step height is plotted for several Poisson noise widths λ. The unfiltered data were analyzed for these simulations. In these trials, step length and step separation were held at 50 points each while step height was assigned values of 1, 10, 20, 30, 40, 50, 60, 70, 80, and 90. Each step parameter tuple, e.g. [40,50,50], was tested with 100 different sets of Poisson-distributed noise. (B) The average number of false positive step identifications is plotted for the same trials as in A. (C) The average percent of steps found for trials with variable step length is plotted for several Poisson noise widths λ; here step height and step separation were held at 50 points each while step length was assigned values of 1, 10, 20, 30, 40, 50, 60, 70, 80, and 90, and each tuple, e.g. [50,40,50], was tested with 100 different sets of Poisson-distributed noise. (D) The average number of false positive step identifications is plotted for the same trials as in C. (E) The average percent of steps found for trials with variable step separation is plotted for several Poisson noise widths λ; here step length and step height were held at 50 points each while step separation was assigned values of 1, 10, 20, 30, 40, 50, 60, 70, 80, and 90, and each tuple, e.g. [50,50,40], was tested with 100 different sets of Poisson-distributed noise. (F) The average number of false positive step identifications is plotted for the same trials as in E.


The percent of steps found for trials in which step length and λ were varied is plotted in Figure 3.8C. There I observe poor performance for step lengths of 1, but a sharp increase in performance when moving to a step length of 10. For λ of 1 to 5, more than 90 percent of steps are found for all lengths of 20 or greater, and at step lengths of 30 or more, at least 80 percent of steps are found for all noise widths. Furthermore, I see that the performance increases steadily, or remains relatively constant, for each λ as step length increases. Figure 3.8D plots the number of false positives found for these trials. Similar to the results of the step height trials shown in Figure 3.8B, the number of false positives for low step lengths is quite high, but quickly drops to fewer than 20 for all noise widths at step lengths greater than 30.

The percent of steps found for trials in which step separation and λ were varied is plotted in Figure 3.8E. Interestingly, the percent of steps found for each λ remains relatively constant across the full range of step separations tested. Over this range, the percent of steps found varies by roughly 5 percent in nearly all cases, with a few exceptions. Similarly, the number of false positives, shown in Figure 3.8F, is also relatively constant, remaining below 20 for all λ and all step separations.

In Figure 3.9, the results of randomized step parameter testing are plotted; these trials most closely represent actual data from single-molecule DNA extension experiments. The solid line represents the percent of steps found when the raw unfiltered data are analyzed by the algorithm, while the dashed line uses bilateral filter preprocessing. Each point represents the average percent of steps found over 100 replications at each noise width σ. The unfiltered analysis shows ~90% of steps found from σ = 1 to σ = 4, dropping to 80% at σ = 5. However, with preprocessing, performance improves to greater than 90% of steps found at all noise widths; the exception is for σ ≤ 2, where using the unfiltered data gives better performance. Also, note that the unfiltered data show increasing standard error in the percent of steps found as the noise width grows, while the filtered data sets maintain a relatively constant standard error for all σ. Also in Figure 3.9, I show the false positive results from these trials. The dotted line shows the false positives from the trials on raw data. The false positives remain few on average for noise widths below 2, growing in number to 27 at a noise width of 5. Conversely, the false positives for the trials in which bilateral filter preprocessing was used are plotted with the dash-dot line. Interestingly, the number of false positives decreases over the range of noise widths tested here, going from about 12 false positives at σ = 1 to 8 false positives at σ = 5.

Figure 3.9 Algorithm performance with randomized step parameters. For these trials, the step parameters were assigned values randomly in the interval [20,200]. Thus, each step has a step height between 20 and 200 points, a step length between 20 and 200 points, and a step separation between 20 and 200 points. These parameters closely mimic the values seen in the single-molecule DNA-protein experimental data. For each noise width σ, the trials are repeated with 100 different sets of normally-distributed noise. This figure shows the average percent of steps found for these trials for raw data (solid line) and data pre-processed with the bilateral filter (dashed line) using a filter height of 10 and a filter width of 10. Also plotted are the average false positives for the raw data analysis (dotted line) and the average false positives for the pre-processed data (dash-dot line). Noise width is in arbitrary units.

Real-world usability of the step-finding algorithm was assessed by applying it to DNA-protein experimental data, which I show in Figure 3.10. (Chapter 2 of this dissertation provides a detailed explanation of the experiments, but I will briefly recapitulate the relevant ideas here.) The DNA-protein data are the result of experiments in which histone proteins were introduced to single molecules of DNA. The histones bind to the DNA, and upon increasing the force applied to the DNA molecule, these proteins rupture when their binding energies are surpassed. These dissociation events are characterized by distinct increases in the DNA tether end-to-end length of approximately 50 nm, or multiples thereof. Thus, using the step-finding algorithm to analyze the DNA-protein data should lead to identification of these 50 nm jumps.

Figure 3.10 Application of the step-finding algorithm to locate features in DNA-protein experiments. (A) The step-finding algorithm is applied to the section of the protein extension plot during which the histones rupture from the DNA molecule. The identified steps are plotted as horizontal black lines against the experimental data in blue. The inset shows which portion of the entire experiment the data are drawn from. (B) Further magnifying the data, individual steps identified by the algorithm are evident. The step-finding algorithm automatically determines the locations of these events, which occur in multiples of approximately 45-50 nm.

In Figure 3.10A, the DNA-protein extension data (blue) and the step-finding algorithm results (black horizontal lines) are plotted together. The region of the entire experiment from which the histone rupture sequence is extracted is shown in the inset. Since the entire histone dissociation sequence covers a change in DNA tether extension of nearly 12 µm, it is difficult to see the individual 0.05 µm steps with the axes set to the full range. However, after zooming in closer to the data, Figure 3.10B shows that the step-finding algorithm is able to identify individual histone rupture events. Four distinct individual histone octamer dissociation events of 45-50 nm are evident, as is a single event in which two histone octamers ruptured simultaneously and caused a 100 nm jump in extension.

Simulation trials were also performed to analyze the speed at which the algorithm evaluates data sets. I find that the speed of the algorithm depends on two primary steps: the calculation of the modified PDF (see Figure 3.3) and the determination of the significant peaks. The speed with which the modified PDF is calculated is a function of the number of points in the data trace, which in turn depends on several variables: the number of steps, the step length, and the step separation. Determination of the significant peaks, on the other hand, varies with the number of steps in the data trace.

For example, a data set with 100 steps with height, length, and separation all set to 50 has some 9850 points. Increasing the step length to 100 while keeping the height and separation at 50 yields a data set of 14750 points. Increasing the step separation also to 100 (while length is also 100, but height is still 50) creates a data set of 19700 points. Applying my algorithm to such data sets on an Intel E6750 2.66 GHz Linux machine with 2 GB RAM leads to run times of roughly 0.15 s for the 9850 points, 0.20 s for 14750 points, and 0.25 s for the trials consisting of 19700 points. Note that changes in step height do not change the number of points, and thus have minimal effect on running time.


Additionally, I calculated run times for a variable number of steps while holding the step parameters constant at 50 points each. For 100 steps, as above, the running time is about 0.15 s. For 200 steps (corresponding to 19850 points, an additional 10000 points), the running time increases to just over 0.50 s. At 300 steps and 29850 points, the running time is around 1.18 s. Increasing the number of steps to 500 (49850 points) requires about 3.25 s. Finally, trials consisting of 1000 steps and nearly 100000 points resulted in running times in the 11 s range.

Most of the calculations involved in the algorithm are relatively simple. It is instructive to compare the timing of the 100-step trials that generated 19700 points (using step length and separation of 100, with height at 50) to the 200-step trials that generated 19850 points: only 150 points' difference, yet the 200-step data set requires twice as much time (0.5 s versus 0.25 s). A likely explanation is that determining the significant peaks is the most computationally intensive part of the algorithm, as increasing the number of peaks has a far greater influence on the running time than altering the step parameters for a given number of points.

3.4 Discussion

The step-detection algorithm I present provides a tool for unbiased, automatic detection of plateaus in a data signal. I have demonstrated that it is able to accurately determine plateaus in a data set for certain ranges of step height, length, separation, and noise. Because I approach the step detection problem in a novel way, I believe that my algorithm provides a robust alternative to existing step-detection algorithms.

An important feature of my algorithm is that it does not rely on statistical fitting of canonical step functions to the data. Most of the step-detection algorithms discussed in the introduction use such an approach to fit a piecewise series of canonical step functions, such as the Heaviside function, to the data. For a data set that consists of strict step functions, such an algorithm would be appropriate. However, for data sets in which the plateaus of the signal are not canonical, i.e. two adjacent plateaus are connected by a line with a finite slope, such fitting algorithms will not be ideal. My method, on the other hand, relies on relative probabilities within the data to determine regions expected to have a plateau. As shown in Figure 3.9, even when the slopes of lines connecting plateaus change from step to step, the algorithm correctly identifies plateaus in the data set.

The trials plotted in Figure 3.9 most closely resemble the data traces for single- molecule DNA experiments and the ability of the algorithm—when utilized with pre- processed data—to correctly identify and locate greater than 90% of steps in these trials is critical to a faithful analysis of experimental data.


When I applied my method to analyze data from single molecule micromanipulation experiments, I found that the results were consistent with parameters reported in the literature. For instance, the lengths of DNA wrapped around each histone octameric core particle were found at the correct values: approximately 50 nm and multiples of the same. Also, the number of such events detected was consistent with the observed change in DNA tether extension and the estimated maximum number of octamers that could populate the tether. These results strongly suggest that when my algorithm is applied to experimental data, the constant-extension regions are correctly enumerated and quantified, even in the presence of unpredictable noise and also when the steps cannot be characterized with any specificity.

Another aspect of many step-detection algorithms is that they rely on numerous user-defined parameters to achieve a good fit of the step-functions to the data. Aside from making such algorithms complicated to implement for people unfamiliar with the definitions of the parameters, manipulation of these parameters can lead to over-fitting of the data. From the beginning, my approach was designed to minimize the difficulty of using the algorithm and to reduce the chance of over-fitting of the data. My method can be implemented easily and quickly since any user input or modification is optional.

The algorithm I have presented also does not involve fitting a specific function to the data, which eliminates that route to over-fitting. Rather, I base my approach on finding the regions of the data more likely to represent a plateau, and in this sense my approach may be deemed non-parametric.

I sought to minimize user input so as to reduce user bias. In searching for certain features of the data, a user may be able to adjust algorithm parameters to confirm a specific result that they would like to find. It is all too easy to dismiss a small step in the data as noise when it should not be, or vice versa, and to adjust the parameters to include or exclude those steps. By eliminating most user input and relying on the mathematical properties of the data itself, my step-detection algorithm offers a substantial reduction in the possibility for user bias to alter results.

With regard to user-defined parameters, the primary values in my algorithm that can be adjusted are the bin size for the discrete probability distribution function and the tolerance for determining which peaks are to be taken as identifying steps in the data. My method for automatically determining the bin size for the distribution function is sufficient for many applications, but in situations where the steps are closely spaced or the noise is substantial (and no filtering is desired), it can be useful to decrease or increase the bin size parameter. The peak tolerance parameter can be adjusted, or the value determined by comparison to the median path lengths of all peaks (see the Materials & Methods section) can be used as the default. It can be advantageous for the user to set this value when automatic determination fails to recognize data plateaus that are significantly shorter than the majority of the plateaus in the data. A user may wish to modify this tolerance parameter to expand the number of peaks classified as representing a step in the data, but adjusting it too far can lead to false positives.

While my algorithm is designed to detect data “plateaus” which may or may not be true step functions, the algorithms discussed in the introduction are specifically tailored to fit a series of step functions to the data. As I have shown, my algorithm performs very well when analyzing data which are not a sequence of pure step- functions. To the best of my knowledge, there are no other well-characterized methods capable of doing so. This lack of other methods was a primary driver behind developing my algorithm, but it also makes direct performance comparisons to existing algorithms difficult.

For the trials plotted in Figures 3.5 through 3.7, in which a single step parameter was altered with Gaussian noise, I find that the false positives equal the false negatives. (The number of false negatives is simply the difference between the total number of steps and the number of steps found.) It should be noted that in principle the false positives can be larger or smaller in number than the false negatives. That they happen to be equal in number strongly suggests that the algorithm is finding only those steps present in the signal—no more and no fewer—but for some of the steps is not assigning them accurately enough to the correct locations. This result is likely due to the criterion that I use to declare that a step has been correctly detected, viz. that the step must lie within one-half σ of the true location of the signal.

The relationship between false positives and negatives is most likely unique to the type of signals used in Figures 3.5 through 3.7—signals in which all step parameters are the same for all steps within an individual signal. Specifically, if the calculated arc lengths for all significant peaks are relatively equal—as would be the case for a data trace in which all steps have the same configuration—it is trivial to classify which peaks correspond to actual steps as opposed to which peaks result from fluctuations due to noise. Indeed, the results plotted in Figure 3.9 show that when step parameters are not uniform throughout the data trace, the number of false positives is larger than the number of missed steps. This indicates that the algorithm is inferring steps where there are none, rather than being unable to accurately locate the actual steps as in Figures 3.5 through 3.7.

Comparison between the Gaussian noise results and the Poisson noise results shows similar trends, but also some interesting differences. For both Poisson and Gaussian noise, when either the step height or the step length increases, there tends to be an increase in the percent of steps found and a decrease in the number of false positives. However, in trials with Poisson-distributed noise, the performance of the algorithm is much more robust to changes in step separation than in runs with Gaussian noise. Another important difference between the Poisson and Gaussian noise trials is the absolute value of the percent of steps found and the false positives. In general, for comparable variances, Poisson noise resulted in a greater percent of actual steps found than Gaussian noise, although operating on Gaussian noise data sets resulted in a higher maximum percent of steps found.

Also of note are the false positives from the Poisson noise trials. Whereas the Gaussian noise trials generated approximately equal numbers of false positives and false negatives—implying that the algorithm was detecting the steps, but without sufficient accuracy—I find that the Poisson noise trials produce excessive false positives, particularly for low step heights and lengths, an indication that the algorithm is finding steps where none exist at all. I attribute this significant difference in performance to the discrete nature of the Poisson distribution. Unlike the continuous Gaussian distribution, the Poisson distribution shifts the baseline data by discrete integer amounts. Furthermore, for a given λ, these amounts are drawn from a relatively small set of integers, resulting in a much greater likelihood of a specific value being repeated within a certain number of draws. On the other hand, it is highly unlikely for draws from the Gaussian distribution to be exactly equal over the range of points in the steps considered by my method. Thus, it is possible for the Poisson distribution to artificially create intermediate steps.

For example, suppose the transient region between neighboring step dwells (one step located at 40 and the other at 50) has a slope of 1, so that for 5 successive points the baseline data are [44,45,46,47,48]. It is conceivable that the Poisson distribution could return a set of noise values [+2,+1,0,-1,-2] for those five points, in which case the resulting raw data would be [46,46,46,46,46]. An artificial step at a location of 46 would thus be created between two actual steps. While this exact case would not happen frequently, any scenario in which several data points in close time proximity are roughly equal in value due to noise could lead to the creation of an artificial step. In Figure 3.2, for instance, there are several clusters of points at a similar position within a short period of time that do not correspond to a real step in the baseline data.

Part of my method that may seem counterintuitive is the addition of the second derivative of the PDF back into the original PDF. This is the opposite of common practice in image processing, where the second derivative is frequently subtracted from the image to enhance edge transitions (Levi 1974, 163-177). It is important to note that I am applying the second derivative to the PDF and not to the data (or image) itself. With that clarified, my justification for this step is primarily empirical: I have run multiple trials utilizing subtraction of the second derivative, as well as skipping the second derivative step altogether, and find optimal performance when the second derivative of the PDF is added back to the signal. Figure 3.3E illustrates this effect, where the magenta signal is the sum of the PDF (red) and its second derivative (green), resulting in better isolation of the significant peaks in the signal. This is an effect I see repeatedly, and it is the reason I include this step in the algorithm. A theoretical justification can be found by examining individual peaks in Figure 3.3E. It is evident that peaks in the PDF correspond to large negative values in the second derivative. The sharpness of the peaks will generally be uncorrelated with the size of the peaks. Therefore, when the second derivative is added back to the PDF, the peaks in the PDF see a moderate reduction in magnitude. However, since most peaks in the PDF are about equally sharp but differ in height, addition of the second derivative amounts to subtracting a relatively constant quantity from the peaks. This affects the shallower peaks more profoundly than the taller peaks, helping to reduce the identification of non-significant peaks.

Chapter 4: Object Tracking Algorithm

4.1 Introduction to Tracking Algorithms

The ability to automatically track objects in images acquired through video microscopy is critical in many fields of engineering and science. The track estimation task in general involves two distinct steps (Thomann et al. 2003, 230-248). First, objects of interest—for example, particles, microspheres, cells, fluorophores—must be found and annotated in each image frame. Second, by comparing successive frames, trajectories for the annotated objects must be built up. The second task becomes easier if the objects are distinguishable; in that case, each object can be identified unambiguously in the next frame, and the list of frame-to-frame coordinates of each object gives the trajectories I wish to determine. However, the problem becomes much more challenging when the objects are indistinguishable, and it is thus a priori not apparent how to map an object in one frame to itself in the next frame. If the objects are closely spaced, further complications arise since trajectory assignments for two neighboring objects may be switched by an algorithm without the altered trajectories deviating significantly from the true ones, making such errors especially hard to detect computationally.

A number of track determination algorithms have been developed and have performed with varying degrees of success, especially in the most challenging case of densely clustered, identical particles moving at high speed—see the reviews by Meijering and by Chenouard and references 32 through 57 therein (Meijering, Dzyubachyk, and Smal 2012, 183-200; Chenouard et al. 2014, 281-289). These algorithms may be generally divided into two categories: predictive tracking and measurement assignment tracking.

In predictive tracking, image data are used to estimate an ensemble of kinematic models, and based on this ensemble, the algorithms determine probabilities for the current observations conditional on each possible trajectory in the ensemble. These probabilities can also be augmented with additional data—for example, Anderson et al. use image intensity information (Anderson et al. 1992, 425), while other methods have taken into account distributions of errors for observed parameters and kinematic model fits (Bar-Shalom and Tse 1975, 451-460). For densely-clustered particles or significant noise, predictive tracking can offer an advantage. However, if the exact kinematic model is unknown or not accurately estimated—which can often be the case—the effectiveness of these algorithms will suffer.

Measurement assignment tracking algorithms typically involve "scoring" each potential trajectory assignment and then using the scores to determine the best assignment. Velocities, trajectories, smoothness, shape, and size of the objects of interest are often used to compute scores and must be estimated from the data. A number of techniques use a Kalman filter (Kalman 1960, 45) for this purpose (Cerveri, Pedotti, and Ferrigno 2003, 377-404). Constraints on the acceleration/deceleration, radius of turn, or inertia can be used to isolate only the objects of interest, increasing computational efficiency (Barniv 1985, 144-156; Fortmann, Bar-Shalom, and Scheffe 1983, 173-184; Hashirao, Kawase, and Sasase 2002, 29-37; Logothetis, Krishnamurthy, and Holst 2002, 473-490). However, one limitation is that these methods are primarily intended for tracking single objects under low-noise conditions, although modifications exist to remove these constraints (Blanding, Willett, and Bar-Shalom 2007, 1994-2006; Chen and Tugnait 2001, 239-249; Hong et al. 1998, 55-77). An alternative to Kalman-filter-based methods is the Multiple Hypothesis Tracking algorithm (Reid 1979, 843-854; Cox and Hingorani 1996, 138-150; Noyes and Atherton 2004, 115-120) in which all measurements prior to the current observations are compared against a predefined kinematic model to generate a set of parent hypotheses. Measurements in the current observation, as well as dummy measurements for noise, new objects, and false positives, are assigned a hypothesis with respect to the parent hypotheses, and Bayes' theorem is used to calculate the probabilities of each current measurement based on the prior measurements.

While numerous scoring methods have been developed, one of the simplest involves classifying trajectories based on distances in trajectory space, which may be computed in terms of image or motion parameters using a suitable Euclidean metric (Bonneau, Dahan, and Cohen 2005, 141-141). When multiple parameters are used, individual parameters can be weighted according to a variety of schemes (Veenman, Reinders, and Backer 2003, 2049-2067). Various score parameterizations are possible, and several methods (Sethi and Jain 1987, 56-73; Chetverikov and Verestói 2014, 321-338; Sbalzarini and Koumoutsakos 2005, 182-195) have implemented a score function that is mapped to the unit interval. As the scoring step alone may not yield optimal results, additional algorithms have been developed to select the final track assignments. The simplest multi-particle method is the greedy algorithm (Anderson et al. 1992, 425; Ghosh and Webb 1994, 1301-1318), which works iteratively by examining assignment possibilities for a single trajectory and choosing the one with the best score. If a potential assignment conflicts with previously made assignments, the algorithm moves to the next best assignment, until the best "free" assignment is reached. This is a simple, yet often effective, way to sort through trajectory scores. An improvement can be achieved by incorporating exchanges within this framework (Sethi and Jain 1987, 56-73; Veenman, Reinders, and Backer 2001, 54-72; Shafique and Shah 2005, 51-65). The exchange step involves examining whether a trajectory assignment swap can improve the total score. If so, the swap is made and a new score is calculated. This process is repeated until no suitable exchanges exist, and it offers substantial improvement over the basic greedy algorithm. It can, however, be computationally intensive in its most rudimentary form, although dynamic programming has been found to reduce the computational load (Sage et al. 2003, 586 Vol.1; Rink et al. 2005, 735-749).

In the specialized case where the number of objects from frame-to-frame is conserved, there are no noise-induced false object detections, and observation sequences are of short duration, the Hungarian algorithm (Kuhn 1955, 83-97) can be used to compute optimum trajectory assignments far more efficiently than greedy algorithms. However, it is difficult to guarantee that all requirements are met, and thus it is not widely used as a general tracking algorithm.

A notable example of a measurement assignment algorithm is the method of Crocker and Grier, which is based upon minimizing the total distance traveled by all particles between consecutive frames within the constraints of the dynamics of non-interacting Brownian particles (Crocker and Grier 1996, 298-310). It is still widely used in microscopy and can be effective for objects that are not highly clustered and do not move rapidly. In situations with high noise, high velocity, or densely-clustered objects that may be spawning at fast rates, however, the Crocker algorithm may not be appropriate.

Velocimetry methods are a subset of tracking algorithms that seek to identify the overall velocity of object groupings rather than individual object trajectories (Adrian 1991, 261-304; Huang, Dabiri, and Gharib 1997, 1427). They are quite useful for determining the velocities of densely-clustered and high-velocity objects, but tend to suffer in performance when object characteristics and movements are non-uniform. Moreover, inferring individual object trajectories from them is challenging.

These techniques, although powerful, tend not to perform well for densely- clustered, indistinguishable particles moving unidirectionally at high speed and spawned at high rates, and therefore a new algorithm may be required. I present an algorithm that fills the gap. My method can be thought of as a hybrid. Its core is a measurement assignment tracking method in that I do not use the data to fit an explicit kinematic model. However, the method takes advantage of the underlying dynamics of objects moving in a unidirectional force field, much like a predictive tracking algorithm would, in order to provide an initial set of trajectories for the measurement-assignment- tracking-inspired step.

4.2 Object Tracking Algorithm Methodology

A description of the algorithm will be presented first, followed by an illustrative example to demonstrate the method. Additionally, object trajectory simulations and the metrics used to quantify the performance of the algorithm will be discussed.


4.2.1 Tracker Algorithm Description

I wish to analyze scenes consisting of densely-clustered, indistinguishable objects moving at high speed nearly exclusively in a single direction, henceforth labeled (unless otherwise noted) the negative-y direction. I allow objects to be spawned at high rates, and thus their number frame-to-frame is not conserved as newly-born objects add to the object population in successive frames and others exit the scene before traveling the entirety of the sensor’s field of view. Physically, this may happen for a variety of reasons. One example comes from tracking polystyrene superparamagnetic beads moving in viscous buffer at high speeds under the influence of an inhomogeneous magnetic field, as in the case of the calibration method used with the magnetic tweezers as described in section 2.2.5. If the beads are not neutrally buoyant, in addition to the motion within the field of view, they will also move downwards and some may fall out of focus before traveling the full length of the imaged scene.

The tracking algorithm involves two conceptually distinct computations: the first is a scoring function, which generates trajectory assignments for the particles, and the second is the back-tracking method, which further refines or possibly modifies those trajectories. (I assume that the optical centroids of the objects have been computationally determined in a pre-processing step, and take them as inputs to my algorithm.) The scoring function, S(p, q), calculates a score for each object p at time 1 with respect to each object q at time 2, i.e. it assigns a real number to every possible pairing of objects in two consecutive frames. The score is not bounded, can take negative, positive, and zero values, and is defined as follows:

S(p, q) = sgn(y_p − y_q) · √((x_q − x_p)² + (y_q − y_p)²) / |cos θ_pq| − DB_y.

The first factor is the signum function, which gives the score a positive value for movements in the direction of the force and a negative value for movements in the opposite direction. The numerator is the Euclidean distance between the two observations. The denominator involves the angle θ_pq formed between the direction of the force and the vector that connects the two objects, so that pairings aligned with the force score lower than pairings at large angles. The final term, DB_y, is calculated by finding the difference in the mean y-coordinates of all objects in two successive frames. However, since the location of a particle in the next frame has not been definitively assigned at this stage, it is calculated for different particle pairings and the minimum value is used. This term penalizes observation pairs that do not show substantial movement in the direction of the force, and approximates the minimum distance an object is expected to move between successive observations based on the gross behavior of all objects. The scoring function is constructed such that small, positive scores are preferred, as these indicate movement in the direction of the force; large or negative scores, on the other hand, imply displacements in orthogonal or opposite directions.
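A sketch of this score in MATLAB form follows (a minimal sketch based on the reconstructed expression above, with a hypothetical helper name; p and q are [x y] centroids at times 1 and 2, the force points along −y, and DBy is the frame-pair offset just defined):

    % Sketch of the pairwise scoring function.
    function s = pairScore(p, q, DBy)
        d    = q - p;                               % displacement vector
        dist = hypot(d(1), d(2));                   % Euclidean distance
        cosT = dot(d, [0 -1]) / max(dist, eps);     % cosine of angle to the force
        s    = sign(-d(2)) * dist / max(abs(cosT), eps) - DBy;
    end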

Scores for two consecutive times are arranged in a matrix of size m × n, where m is the number of objects at time 1 and n is the number of objects at time 2. The resulting matrices will not in general be square, since the number of objects may change from frame to frame. If an object at time 1 has negative scores for all possible matches with objects at time 2—this involves scanning along a row—I conclude that it has exited the field-of-view. Similarly, scanning down a column, if an object at time 2 has negative scores for all possible matches with objects at time 1, the object is taken to have just spawned or entered the field-of-view, and the algorithm assigns it a new tag or ID.

The likely trajectory links are determined by beginning with the object at time 2 that is closest to leaving the field-of-view in the direction of the force and finding which pairing with an object at time 1 yields the smallest positive score. These two objects are then assigned to the same trajectory. The algorithm then moves to the object at time 2 that is next closest to leaving the field-of-view and finds which of the remaining objects at time 1 gives the smallest positive score. This process is repeated for all objects at time 2, and then for all remaining frames. The end result is a unique trajectory for each object that exists for two or more frames, and these trajectories are the inputs to the back-tracking step discussed next.
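A sketch of this assignment pass for a single frame pair follows (S is the m × n score matrix from above; y2 is assumed to hold the y-coordinates of the time-2 objects, so small y2 means closest to exiting along −y):

    % Sketch of the score-based linking for one frame pair.
    [~, order] = sort(y2, 'ascend');       % objects nearest the exit first
    assign = zeros(1, numel(y2));          % time-1 partner of each time-2 object
    free   = true(size(S, 1), 1);          % time-1 objects still unassigned
    for q = order(:)'
        col = S(:, q);
        col(~free | col <= 0) = Inf;       % keep only free, positive scores
        [best, p] = min(col);              % smallest positive score wins
        if isfinite(best)
            assign(q) = p;                 % link q to p's trajectory
            free(p)   = false;
        end                                % otherwise q is newly spawned
    end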

The back-tracking approach provides a check on the quality of the score-based track assignments. It builds on the idea that the shortest path between two points is a straight line. Thus, if an object is observed at (x1, y1) at time 1 and subsequently at (x3, y3) at time 3, then for an observation at an intermediate time 2 the coordinates (x2, y2) should fall close to the straight line connecting (x1, y1) and (x3, y3). In my method, this is quantified by calculating the following:

D = [√((x2 − x1)² + (y2 − y1)²) + √((x3 − x2)² + (y3 − y2)²)] − √((x3 − x1)² + (y3 − y1)²).

This quantity is associated with a frame triplet: it is the difference between the sum of the distances from the object's position at the intermediate time 2 to its positions at times 1 and 3, and the distance between its positions at times 1 and 3. Large D-values represent an intermediate observation that departs widely from the direct path connecting the observations at times 1 and 3, while small values correspond to an observation at time 2 that is closely aligned with that path; the limit D = 0 represents the case of three collinear points.
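In code, the cost for one frame triplet is simply (positions p1, p2, p3 are [x y] observations at times 1, 2, and 3):

    % Sketch of the back-tracking cost D for one frame triplet; D = 0 for
    % three collinear points and grows as the detour through time 2 lengthens.
    segLen = @(a, b) hypot(b(1) - a(1), b(2) - a(2));
    D = (segLen(p1, p2) + segLen(p2, p3)) - segLen(p1, p3);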

The D-computation would normally be computationally very expensive if all possible ways to organize a set of objects across three time slices were examined.


However, I have implemented a procedure to dramatically cut down on the combinatorial complexity of this step, allowing its use in scenes with hundreds of objects. The first step is finding the D-values for the scoring-function-based trajectory assignments spanning three time slices (t = 1, t = 2, t = 3, for instance) and using these to find the initial total cost DR = Σn Dn, where n indexes the object assignments.

This serves as the initial total reference cost for those three time slices. Initial trajectories with D-values close to zero are very likely to be correct. However, the back-tracking method will investigate potential object assignment swaps as follows. A list of potential alternate trajectories is created, each differing from the initial trajectories by only one assignment. For instance, suppose there are five objects labeled A, B, C, D, and E, and the initial trajectories from the scoring step are designated A-A-A, B-B-B, and so on; the list of potential trajectories could then include A-A-C and A-E-A, but not A-D-E or A-C-B. The reason for not considering the latter variations is that by the time an object assignment has reached the second slot of a triplet in the D-value calculation, it has already been "vetted" by the scoring function and any previous back-tracking evaluations; spending computational resources on re-evaluating such assignments is likely to be wasteful. Then, from this list of potential alternate trajectories, D-values are calculated for each individual alternate trajectory A-A-B, A-A-E, A-C-A, A-D-A, and so on. Once these individual trajectory D-values are calculated, they are compared to those of the initial trajectories A-A-A, B-B-B, etc., and only those which are smaller than the initial trajectories' D-values are evaluated for complete triplet swaps. That is, if the individual trajectory A-E-A has a D-value less than the initial A-A-A, the entire triplet swap is carried out (thus E-A-E is evaluated, as well as any other alternate assignments that arise from such a swap) and a new total cost DR is determined. If this is larger than the reference DR, the potential alternate assignment is discarded. If the new total D-value is smaller than the reference DR, the swap is accepted: the swapped trajectories become the new reference trajectories and the reference DR takes on the new total D-value. If no potential alternate assignment produces a smaller DR, the initial assignments are confirmed as the best possible assignment and the algorithm moves forward a single time step. This process is then repeated for all time steps.
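The acceptance rule can be summarized in a short sketch (alternatesOf, applySwap, and totalD are hypothetical helpers standing in for the bookkeeping just described):

    % Sketch of the back-tracking swap loop for one frame triplet.
    DR    = totalD(traj);                    % reference total cost
    cands = alternatesOf(traj);              % single-assignment variants
    for k = 1:numel(cands)
        newTraj = applySwap(traj, cands{k}); % carry out the full triplet swap
        if totalD(newTraj) < DR              % accept only if total cost drops
            traj = newTraj;
            DR   = totalD(newTraj);
        end
    end                                      % otherwise the initial assignments stand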

In addition to providing a check on the results from the scoring method, an important advantage of the back-tracking method is that it places no constraints on the movement of the objects. Thus, while the scoring part of the algorithm biases decisions to promote movement along the force field, the back-tracking method can recover movements against the direction of the force, as well as significant lateral deviations, and accept such moves if they satisfy the acceptance criterion just discussed.


4.2.2 Illustrative Example

I illustrate the scoring method in Figure 4.1 with a simple example consisting of two objects, drawn as a square and a circle for clarity but considered identical by the algorithm, that have been observed at three different times; the times are color coded as follows: t = 1 in red, t = 2 in green, and t = 3 in blue. The force field points downwards. Arrows connecting the objects at two consecutive times are characterized by their magnitude and the angle θ that they make with the force direction. Starting with the red circle at t = 1, the algorithm must decide at t = 2 whether to assign the green circle or the green square to the red circle. This is done by computing the score for all vectors from the red circle to the green objects. As drawn, the link from the red circle to the green square (the dashed yellow vector) has a y-component that points against the direction of the force. Thus, even though the green square is closer to the red circle than the green circle is, the green square link is rejected due to the "backwards" movement, and the green circle is assigned to the red circle's trajectory. Accordingly, the green square would be linked to the red square's trajectory.
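To make the decision rule concrete, the following minimal MATLAB sketch reproduces the choice in Figure 4.1; the specific penalty used here (link length scaled by a quadratic function of θ, with weight wTheta) and all variable names are illustrative assumptions, not the actual scoring function defined earlier in this chapter.

    % Illustrative link-scoring decision for the red circle at t = 1.
    f = [0 1];                           % unit vector along the force direction
    redCircle   = [0 0];                 % object at t = 1
    greenCircle = [0.5 2];               % candidate at t = 2, along the force
    greenSquare = [1 -0.5];              % candidate at t = 2, closer but "backwards"
    cand = [greenCircle; greenSquare];
    wTheta = 2;                          % assumed weight on the angle penalty
    scores = inf(size(cand, 1), 1);
    for k = 1:size(cand, 1)
        v = cand(k,:) - redCircle;       % candidate link vector
        if dot(v, f) < 0                 % reject motion against the force outright
            continue;
        end
        theta = atan2(norm(v - dot(v, f) * f), dot(v, f));   % angle to the force
        scores(k) = norm(v) * (1 + wTheta * theta^2);        % length plus angle cost
    end
    [~, best] = min(scores);             % best = 1: the green circle wins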

In the next iteration, the blue objects (at t = 3) have to be linked to the green objects.

Again, the blue square is slightly closer to the green circle than the blue circle is, and furthermore in this case the blue square does not show movement against the direction of the force. Yet the angle formed by the vector connecting the green circle and the blue square is considerably larger than the angle for the vector connecting the circles, and thus the score for the blue square to the green circle would be appreciably larger than the score for the blue circle to the green circle. Thus the algorithm would assign the blue circle to the trajectory of the green circle (and therefore also the red circle) while assigning the blue square to the trajectory defined by the previous squares.

Figure 4.1 Example of scoring function. This figure highlights the method by which the scoring function assigns trajectories. The two identical objects, depicted as a square and a circle for clarity, are observed at three times, represented as red at time t = 1, green at time t = 2, and blue at time t = 3. The objects move in a directional force field as shown. The black arrows that connect the objects symbolize the trajectory links that must be determined. The various parameters for the scoring function are displayed.

The application of the back-tracking method to this situation is shown in Figure 4.2. The actual trajectories are drawn as solid lines while alternate trajectory assignments are indicated with dashed lines (Figure 4.2A). These have also been color coded: lines connecting red (t = 1) to green (t = 2) are yellow, lines connecting green (t = 2) to blue (t = 3) are cyan, and lines connecting red (t = 1) to blue (t = 3) are purple.

Referring to Figure 4.2B, I see that the computed D for the solid yellow and cyan vectors, which form the square's true path, is rather small, since the green square is only slightly removed from the vector (purple) connecting the square object at times t = 1 and t = 3. The true cost D for the circle is even smaller, since its intermediate observation almost lies on the purple vector. (If it were exactly on that purple line, the cost for that trajectory would be zero.) Thus, the total cost DR for the correct trajectories is quite small.

However, the algorithm does not know this a priori and so must make local changes to the tracks to explore ways to reduce DR even further. For instance, selecting (in no particular order) t = 1, the red square and red circle are swapped in Figure 4.2C, so that the green square is connected to the red circle at t = 1 and the green circle is connected to the red square at t = 1, while the links between t = 2 and t = 3 are unchanged. As is apparent, although the new trajectory initiated at the red square has a small D, the revised track starting from the red circle now has a much higher D, since the cyan vector link alone is larger than the purple vector link. Therefore, the exchange is not accepted. Another possibility is shown in Figure 4.2D, where the blue objects (at t = 3) are swapped. It is evident that the sum of the magnitudes of the yellow and cyan vector links for both trajectories is significantly larger than the magnitude of the purple vector link; this also results in a higher total cost and is rejected. Figure 4.2E presents one more example, in which t = 2 (the green observations) is swapped. This results in the cyan vector, which now connects the green square to the blue circle, having a larger magnitude than the purple vector connecting the red circle to the blue circle. Again, the cost is substantially larger compared to the true case (Figure 4.2B). While this covers all possible swaps in the simple two-object case, scenes with larger numbers of objects proceed in a similar fashion, examining other potential swaps until the trajectory assignments with the minimum DR are found.

Figure 4.2 Example illustrating the back-tracking method. Two objects, represented by a square and a circle, are observed at three times: red at time t = 1, green at time t = 2, and blue at time t = 3. The back-tracking method must determine whether the initial trajectories determined by the scoring function are appropriate and, if not, utilize a greedy exchange method to determine which trajectory assignments minimize the total cost function (see text for the definition of the cost). (A) All possible trajectory links are shown. The links are color coded according to conventional RGB color mixing: the link that connects red and green is yellow, the link connecting green and blue is cyan, and the link connecting red to blue is purple. The initial trajectories that result from the scoring function are shown as solid lines while alternate trajectory links are shown as dashed lines. The individual trajectory cost D is the sum of the magnitudes of the yellow and cyan links minus the magnitude of the purple link. (B) Examining the initial trajectory assignments, it is evident that the total cost for this arrangement is rather small. (C) The cyan trajectory link has been preserved but the yellow and purple trajectory links have been swapped as a result of exchanging the object assignments at time t = 1. A brief examination of the trajectory link magnitudes leads to the conclusion that the total cost for this configuration is substantially larger than that of the configuration shown in 4.2B. (D) In this configuration, the yellow links have been preserved from the original state, but the purple and cyan links are different due to an exchange of the objects at time t = 3 (blue). The green objects (time t = 2) stray further from the "ideal" straight-line path connecting the objects at times t = 1 (red) and t = 3 (blue) in comparison to the original configuration in 4.2B. (E) The final permutation of the potential trajectory links is shown. In this case the intermediate object observation (green at time t = 2) has been swapped, preserving the purple link but altering the yellow and cyan links. The sum of the magnitudes of the yellow and cyan trajectory links is substantially larger than the magnitude of the purple link, thus confirming the original trajectory assignments as the optimal case.

4.2.3 Particle Trajectory Simulations

Performance of the algorithm was tested against simulated trajectories generated using the following inputs: (a) the total number of objects generated over the course of a simulation, np; (b) the x- and y-dimensions of the "image"; (c) a value defining the typical time between successive objects entering or spawning within the field of view, which I refer to as the spawn rate; (d) the initial position (x0, y0); (e) the initial velocity (vx0, vy0); (f) the acceleration (ay) of each object; and (g) noise components drawn independently for the x- and y-object coordinates from a Normal distribution with mean zero and variance σ, a measure of the signal-to-noise ratio. x0 is defined as the row coordinate of the pixel where a particle enters the simulated field of view; y0, the column coordinate. The inputs can be single-valued or defined as an interval from which values are drawn with uniform probability. The latter allows me to assess how well the algorithm deals with objects with non-uniform behavior and also to perform sensitivity analyses. The simulations continue until the final object exits the field of view of the "image." Object coordinates are determined using the basic kinematical equations:

\( x_{t_2} = x_{t_1} + v_{x0}\,(t_2 - t_1) \)

\( y_{t_2} = y_{t_1} + v_{y0}\,(t_2 - t_1) + \tfrac{1}{2}\,a_y\,(t_2 - t_1)^2 \)

Here, xt1 is the position of the particle at time t1, xt2 its position at t2, vx0 its initial velocity along x, vy0 its initial velocity along y, and ay its acceleration along y. Position is measured in pixels, velocity in pixels/frame, and acceleration in pixels/frame^2. Given a specific pixel size and frame rate, these values can be converted to physical units. As the time slices are uniformly separated, if an object spawns at a time between two time steps, its motion to the next time step is determined using the kinematical equations.

Subsequently, it is simply a matter of stepping through time to calculate the spatial coordinates of the trajectory for each object. Noise is then added, and the final output is a list of observations (x, y, t) and trajectory ID numbers for each observation.
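A minimal MATLAB sketch of this generation loop follows, with positions evaluated from each object's spawn time using the kinematical equations above; all names and the fixed parameter values are illustrative assumptions of this sketch, not the original implementation.

    % Illustrative generator for one baseline-like configuration.
    np = 60; sigma = 1;                     % object count and noise strength
    ydim = 800; spawnRate = 0.3;            % image height; frames between spawns
    obs = zeros(0, 4);                      % output rows: [x, y, t, trackID]
    for id = 1:np
        t0 = (id - 1) * spawnRate;          % spawn time in frames
        x0 = 1 + 399 * rand; y0 = 1;        % uniform spawn along the top edge
        vx0 = 0; vy0 = 0; ay = 9;           % baseline kinematics
        t = ceil(t0); y = y0;
        while y <= ydim                     % step until the object exits the bottom
            dt = t - t0;                    % time elapsed since spawning
            x = x0 + vx0 * dt;
            y = y0 + vy0 * dt + 0.5 * ay * dt^2;
            if y <= ydim                    % record the noisy observation
                obs(end+1, :) = [x + sigma * randn, y + sigma * randn, t, id];
            end
            t = t + 1;
        end
    end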

I discuss the spawn rate further in order to clarify some points. As defined, a value of 1 indicates a single new object being added in each time slice; a value of 0.25 initiates 4 objects per time slice; an input of 0.30 implies 3 particles appearing in some frames and 4 in others. In physical units, at a spawn rate of 1/16 and a sampling rate of 30 Hz, 480 objects would spawn per second. Small fluctuations in the rate can also be introduced by randomly drawing a value from the Normal distribution with mean 0 and variance 0.01. This allows the time between successive spawns to be closely centered on the desired mean value while preventing the algorithm from exploiting underlying patterns in the spawn rates to anticipate the emergence of new particles.

Additionally, sampling from a uniform distribution of spawn rates defined by an upper and lower bound is also built in.
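Continuing the sketch above, the jittered spawn times could be generated as follows (np as before; clamping the gaps at zero is an assumption of this sketch):

    % Jittered spawn times: successive gaps of 'rate' frames plus Gaussian
    % fluctuations with variance 0.01 (standard deviation 0.1).
    rate = 0.25;                                  % 4 objects per frame on average
    gaps = rate + sqrt(0.01) * randn(np - 1, 1);  % jittered inter-spawn gaps
    spawnTimes = [0; cumsum(max(gaps, 0))];       % monotone spawn times in frames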

Besides the spawn rate, I selected simulation parameters to cover a range of cases that might exist in experiments or data processing tasks. The "image" dimensions for all simulations are x = 400 and y = 800. The x0 values are drawn with uniform probability from the intervals [200, 200], [150, 250], [100, 300], [50, 350], and [1, 400], while the y0 values are drawn from [1, 1], [1, 50], [1, 100], [1, 150], and [1, 200]. This notation for specifying parameter intervals is to be interpreted as follows: y0 = [1, 1] means that all objects spawn at the top of the image; y0 = [1, 200] means that spawn locations are picked uniformly between the two limits, i.e. within the top quartile of the image; and so on. The initial x-velocities are vx0 = [0, 0]; [-1, 1]; [-2, 2]; [-3, 3]; [-4, 4]. The first selection, vx0 = [0, 0], corresponds to the objects moving exactly parallel to the direction of the force; in the other cases, the x-velocity is drawn from a uniform distribution defined over the specified interval. A similar procedure is used for vy0: [0, 0]; [0, 2]; [0, 4]; [0, 6]; [0, 8], and for ay = [9, 9]; [7, 11]; [5, 13]; [3, 15]; [1, 17]. Finally, to take into account uncontrolled technical sources of variation, noise is applied to the deterministic paths by drawing numbers from Gaussian distributions with zero mean and variances taken from σ = [0, 1, 2, 3] for both x- and y-coordinates.


Investigating the performance for all possible combinations of parameter values would imply testing ~\(10^5\) unique parameter sets, which would amount to nearly \(10^7\) simulations (when repeating each parameter combination 50 times). Since this is not computationally feasible, I performed simulations in which each input was varied while holding the others at a baseline configuration, which was set as follows: x0 = [1, 400], y0 = [1, 1], vx0 = [0, 0], vy0 = [0, 0], and ay = [9, 9]. This corresponds to particles initiated at random positions along the top edge with zero initial y-velocity, no drift in the x-direction, and a constant y-acceleration. For each set of parameter values, 50 trials were performed, for a total of 35000 trials [5 (parameters) × 5 (values/parameter) × 50 (trials) × 7 (spawn rates) × 4 (noise strengths)]. In order to determine a suitable value of np, trials with the baseline setting were run with np drawn from [10, 20, 40, 80, 160, 320]. The results of these simulations, which I discuss shortly, led to np = 60 as a reasonable compromise between experimentally representative complexity and computational time; unless otherwise noted, np was fixed at 60 for all simulations reported here.

I also performed simulations in which the input parameters were chosen to recapitulate the bead drop experiments. These typically capture around 100 particles, so I set np = 100. Since the particles are injected from the micro-pipette in the manner of a vertical line source, I set x0 = [100, 100] and y0 = [1, 100], which resulted in particles spawning from a line one pixel wide and 100 pixels high. The magnetic beads have a horizontal velocity (x-direction) component; however, I found minimal initial velocity in the y-direction: vx0 = [-0.5, 2.5] and vy0 = [0, 0]. For the spawn rate, there are a few instances in which as many as 6 particles appear simultaneously out of the pipette, which would imply a rate of 1/6. However, most often particles are separated by a dozen or more frames, which would be a rate of 12 or greater. Since I wanted to challenge the algorithm, I drew spawn rates from [1/6, 2]. Variations in the magnetic moments of the superparamagnetic particles imply variations in the magnetic forces on them; thus, I expect small fluctuations around the mean acceleration. Moreover, the mean acceleration depends on the distance from the magnet. Simulations were therefore run with ay = [1, 2]; [5, 6]; [9, 10]; [13, 14]; [17, 18]; [21, 22]; [25, 26]. Simulations were also performed at the 4 different noise strengths described previously.

Figure 4.3A shows a sample simulation data set created using the baseline parameter configuration discussed previously and with zero noise. The markers in this figure show only the coordinates of an arbitrary object, not a specifically-shaped particle. The algorithm must parse this 3-dimensional data set into complete trajectories. In Figure 4.3B, I show a sample data set in which the simulation parameters were set to recapitulate the bead drop experiments. As mentioned before, these data were generated as 100 objects spawning at a rate of [1/6, 2], with initial positions of x0 = [100, 100] and y0 = [1, 100], initial velocities of vx0 = [-0.5, 2.5] and vy0 = 0, and accelerations of [9, 10]. To demonstrate how the addition of noise can impact trajectories, I plot a sample of 5 objects in Figure 4.3C at the 4 noise levels of 0, 1, 2, and 3. Each object is denoted by a different marker shape, and the color of the markers and lines denotes the noise level: black for zero noise, red for noise strength of 1, green for noise strength of 2, and blue for noise strength of 3. The figure provides a head-on view of the trajectories in space. At a noise level of 2, an obvious departure from the ideal kinematic model of zero noise is observed, and at a noise level of 3 the trajectories can overlap and intersect repeatedly.

Figure 4.3 Simulation data set samples. (A) The baseline parameters for the trajectory simulations produce a set of coordinates in (x, y, t) that can be plotted as in this figure. The markers are not indicative of any specific shape, but simply indicate the coordinates of some arbitrary object. (B) An example simulation using parameters similar to the experimental bead drop trials. These data were generated as 100 objects spawning at a rate of [1/6, 2], with initial positions of x0 = [100, 100] and y0 = [1, 100], initial velocities of vx0 = [-0.5, 2.5] and vy0 = 0, and accelerations of [9, 10]. (C) The effect of noise on trajectory coordinates. Five objects, denoted by different markers (circle, diamond, triangle, dot, and square), are displayed with the noise-free trajectory in black, noise strength of 1 in red, noise strength of 2 in green, and noise strength of 3 in blue.


Finally, I note that the axes for the simulations were set by taking the direction of the force to be the positive y-direction, with the origin at (1, 1). When the data were processed through the tracking algorithm, however, the trajectories, i.e. the lists of object centroids over time, were transformed so that the particles moved in the negative y-direction, which is the direction of the force assumed in the tracking program.

4.2.4 Performance Analysis Metrics

Two criteria were used to compare the results of the algorithm to the ground-truth trajectories known from the simulations, and to assay the degree of agreement with experiments. First, the number of correct individual observation links is determined by comparing the simulated particle coordinates at each time step (the ground truth) to the coordinates assigned to the same particles by the algorithm. From this, I determine whether a particle has been correctly linked to itself at the next time step, and the total number of correctly identified links is recorded as a percentage of the known links. A higher percentage is therefore desirable, indicating few incorrect link identifications.
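A minimal sketch of this link count, assuming each observation carries a ground-truth track ID (trueID) and the algorithm-assigned track ID (estID) in time order; the names and toy data are illustrative.

    % Percent correct links: a link is counted correct when two consecutive
    % observations of the same true particle stay on one algorithm track.
    trueID = [1 1 1 2 2 2]'; estID = [1 1 2 2 2 2]';   % toy example
    correct = 0; total = 0;
    for id = unique(trueID)'
        idx = find(trueID == id);            % this particle's observations, in time order
        for k = 1:numel(idx) - 1
            total = total + 1;
            correct = correct + (estID(idx(k)) == estID(idx(k+1)));
        end
    end
    pctCorrect = 100 * correct / total;      % 75% for this toy example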

The second approach utilizes the Variation of Information (VI) metric. This compares two partitions X and Y of a set A, where each partition consists of disjoint subsets X={x1, x2,..., xe} and Y={y1, y2,...,yf}, by computing the following quantity:

\[ VI(X; Y) = -\sum_{i,j} r_{ij} \left[ \log\!\left(\frac{r_{ij}}{p_i}\right) + \log\!\left(\frac{r_{ij}}{q_j}\right) \right] \]

where \( r_{ij} = |x_i \cap y_j| / n \), \( p_i = |x_i| / n \), \( q_j = |y_j| / n \), and \( n = \sum_i |x_i| = \sum_j |y_j| = |A| \).

In my case, the set A consists of all observations in the simulations. The partition X is the correct trajectory assignments from the simulations and the partition Y is the algorithm's output trajectories. The disjoint subsets within X and Y are the individual object trajectories. As defined, VI is zero if the two partitions are identical. As the difference between the partitions grows, VI grows as well, implying that low values of VI are desirable.

While these two metrics are related, they capture different aspects of the same algorithm output. The Variation of Information provides a more holistic measure of tracking performance than simply counting the number of correct trajectory links, because it takes the distribution of incorrect links into account. Consider two scenarios in which 25 incorrect trajectory links occur during analysis of 50 objects: in one, all of the errors occur within the trajectories of 3 objects; in the other, the incorrect links are spread out, with half of the objects each having one incorrect link. These two scenarios have the same number of incorrect trajectory links but different Variation of Information values. The difference is subtle, but the Variation of Information provides a more comprehensive characterization of trajectory completeness than a count of correct trajectory links alone.
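For concreteness, the VI computation can be sketched in MATLAB as follows, reusing the toy labelings from the sketch above; accumarray builds the joint distribution r_ij, and the names are illustrative.

    % Variation of Information between the true and estimated partitions.
    trueID = [1 1 1 2 2 2]'; estID = [1 1 2 2 2 2]';   % toy labelings
    n = numel(trueID);
    [~, ~, xi] = unique(trueID);            % relabel true tracks as 1..e
    [~, ~, yj] = unique(estID);             % relabel estimated tracks as 1..f
    R = accumarray([xi(:) yj(:)], 1) / n;   % r_ij = |x_i intersect y_j| / n
    p = sum(R, 2); q = sum(R, 1);           % marginals p_i and q_j
    VI = 0;
    [I, J] = find(R > 0);
    for k = 1:numel(I)
        r = R(I(k), J(k));
        VI = VI - r * (log(r / p(I(k))) + log(r / q(J(k))));
    end                                     % VI = 0 only for identical partitions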

4.3 Results

The results of simulations that varied each parameter (other than the total number of particles) while holding the others at their baseline values are shown in Figures 4.5 – 4.9. Each figure has eight plots organized into two columns displaying how the algorithm's performance varies as the parameter of interest is changed. The plots in the left column present the results in terms of the percent correct trajectory links averaged over 50 identical simulation runs; on the right, results using the VI score are shown. The latter are not averaged; they are computed by grouping the 50 replications into a superset and applying the VI score to this set. Each column has four plots, with results arranged by noise strength: zero noise in the top row, increasing by one per row to the maximum noise strength of 3 at the bottom. The x-axis for all plots is the same: the spawn rate. Within every plot, data for a specific spawn rate are shown as a group of bars of different colors, where each colored bar represents a different value of the simulation parameter that was varied for that set of trials. The values for each color are shown in the legend at the bottom of the figure; these colors and values apply to all plots in Figures 4.5 – 4.9.

4.3.1 Total Number of Objects

In order to determine how sensitive the algorithm is to the total number of particles to be tracked, I carried out simulations varying this parameter over [10, 20, 40, 80, 160, 320]; see Figure 4.4. The spawn rate was 0.3 and the other parameters were set to their baseline values. A spawn rate of 0.3 implies that some frames had 3 and others 4 new objects introduced in them. In terms of the percent correct links, the performance is excellent, with close to 100% correct links over the range of particle numbers tested. I also find that the results are only weakly sensitive to noise, falling to about 90% at noise strength 3 for all np. These conclusions also hold when performance is measured in terms of VI, with some degradation in performance with increasing np, as expected. However, the change in VI is not generally large (recall that smaller VI scores are preferred), with VI staying below 1 for all noise and np values. For noise strengths of 1 and 2, the VI behaves non-monotonically with np, increasing from np = 20 to 40 and then decreasing somewhat from 40 to 160 before rising again at 320.


Figure 4.4 Algorithm performance for simulations with various total numbers of objects. The total number of objects was varied from 10 to 320, while all other simulation parameters remained in the baseline configuration with x0 = [1, 400], y0 = [1, 1], vx0 = [0, 0], vy0 = [0, 0], ay = [9, 9], and a spawn rate of 0.3, corresponding to 10 new objects every 3 frames. The simulations were run with added noise of strengths 0, 1, 2, and 3 as indicated by the legend.

4.3.2 Initial x-Position

In Figure 4.5 I alter x0 by drawing from the intervals [200, 200], [150, 250], [100, 300], [50, 350], and [1, 400]. I remind the reader that x0 = [200, 200] means all particles enter from the same location; [50, 350] means that they enter anywhere within an interval of size 300 starting at x = 50 (y = 1) and ending at x = 350 (y = 1), the exact position being chosen by sampling from a uniform distribution; and so on. In the following, I describe the results in terms of the length of the sampling interval, which varies here from 0 to 400. In general, I find that the performance depends on an interaction between the length of x0's sampling interval and the spawn rate, with weak sensitivity to noise across the parameter sets tested. Performance is best when x0 is drawn from a wide distribution and the spawn rate is small. In fact, for rates of 1 and lower, I see nearly 100% correct link identification with very little sensitivity to the width of the distribution for x0. For rates in the 2 to 8 range, I find that wider sampling in x0 leads to better performance; for instance, for a rate of 4, the percentage of correct links varies from 70% to 95% as x0 is sampled from intervals of length 0 to 400. Holding rate and sampling length constant, no statistically significant changes in these results are detectable as noise increases. In terms of VI, the poorest performance is seen for the combination of a high spawn rate and a point source, where I find a VI close to 3. This is, of course, consistent with the results in terms of percent links. However, without noise, the performance improves dramatically for rates of 8 and less, with VI less than 1; for very low spawn rates, the VI is almost 0. The overall trend remains the same with the introduction of noise; however, with increasing noise, the performance holds up best when x0 is drawn from longer intervals, approaching the values for zero noise.

Figure 4.5 Results from simulations examining the initial x-position variation. While initial x-position variation was tested with values of 0, 100, 200, 300, and 400 (units of pixels), the remaining simulation parameters were held in the baseline configuration with np = 60, y0 = [1, 1], vx0 = [0, 0], vy0 = [0, 0], and ay = [9, 9]. The two columns show the two performance metrics: percent correct trajectory links in the left column, variation of information in the right column. Four rows correspond to the four different noise strengths: 0, 1, 2, and 3. The x-axis for all plots is the spawn rate in units of objects/frame. Each spawn rate grouping contains 5 bars of different colors which represent the different initial x-position variations as shown in the legend at the bottom.

4.3.3 Initial y-Position

In Figure 4.6, I present the results of simulations in which the initial y-position was changed by drawing from intervals of lengths 0, 50, 100, 150, and 200. The other simulation parameters were held in their baseline configuration. The overall performance is very good by either metric, reaching close to 100% correct identification and VIs less than 1 for a wide range of noise strengths, spawn rates, and y0 interval lengths. However, I see a decline in performance as the interval length increases, although this decrease is modulated by the spawn rate and is not very sensitive to noise when holding spawn rate and interval length constant. For a high noise strength of 3 and high spawn rates of 8 and 16, the performance can dip below 50 percent of trajectory links correctly identified, particularly when the y-position variation is relatively high. Interestingly, a change in noise strength does not seem to have a tremendous effect on the algorithm's performance: for a spawn rate of 2, for instance, VI does not change appreciably for any noise strength except for an interval of length 0, going from VI = 0.1 at zero noise to VI = 0.25 at noise strength 3. At lower spawn rates, the differences between VI for the various noise values are negligible.

Figure 4.6 Results from simulations examining the initial y-position variation. While initial y-position variation was tested with values of 0, 50, 100, 150, and 200 (units of pixels), the remaining simulation parameters were held in the baseline configuration with np = 60, x0 = [1, 400], vx0 = [0, 0], vy0 = [0, 0], and ay = [9, 9]. The two columns show the two performance metrics: percent correct trajectory links in the left column, variation of information in the right column. Four rows correspond to the four different noise strengths: 0, 1, 2, and 3. The x-axis for all plots is the spawn rate in units of objects/frame. Each spawn rate grouping contains 5 bars of different colors which represent the different initial y-position variations as shown in the legend at the bottom.

4.3.4 Initial x-Velocity

Figure 4.7 shows the results of simulations in which the initial x-velocity was altered. The input parameters for these simulations were vx0 = [0, 0]; [-1, 1]; [-2, 2]; [-3, 3]; [-4, 4], which correspond to variations of 0, 2, 4, 6, and 8, respectively. For fixed noise and spawn rates, I see that an increase in the variation generally results in a decrease in algorithm performance. At noise strength of zero and a spawn rate of 16, the percent correct trajectory links measure decreases from 95% to 75%, and VI rises from under 0.5 to close to 1.5, as vx0 is drawn from wider intervals. As the spawn rate decreases, these differences also decrease while overall performance improves, nearing 100% correct identification and VI close to zero for spawn rates of 2 and less. These results are robust to increases in the noise amplitude. The worst performance occurs at a spawn rate of 1 and noise strength of 3, but even in that case over 95% of trajectory links are identified correctly and VI is less than 0.25.

Figure 4.7 Results from simulations examining the initial x-velocity variation. While initial x-velocity variation was tested with values of 0, 2, 4, 6, and 8 (units of pixels/frame), the remaining simulation parameters were held in the baseline configuration with np = 60, x0 = [1, 400], y0 = [1, 1], vy0 = [0, 0], and ay = [9, 9]. The two columns show the two performance metrics: percent correct trajectory links in the left column, variation of information in the right column. Four rows correspond to the four different noise strengths: 0, 1, 2, and 3. The x-axis for all plots is the spawn rate in units of objects/frame. Each spawn rate grouping contains 5 bars of different colors which represent the different initial x-velocity variations as shown in the legend at the bottom.

4.3.5 Initial y-Velocity

In Figure 4.8, I present results from simulations in which vy0 was selected from [0, 0]; [0, 2]; [0, 4]; [0, 6]; [0, 8], which correspond to interval lengths of 0, 2, 4, 6, and 8, respectively. There are several interesting trends to note here. First, at low noise strengths of 0 and 1, I observe a slight decrease in algorithm performance as the initial y-velocity variation is increased: the percent correct trajectory links drops from the mid-90% range to the mid-80% range for spawn rates of 16, 8, and 4. However, at higher noise strengths of 2 and 3, the performance difference between y-velocity variations disappears. The VI measures for those noise levels do not show a discernible trend across spawn rates; for noise strength of 3 and spawn rate of 4, for instance, the two highest VI values occur at initial y-velocity variations of 0 and 8, and this is also the case for noise strength of 2 and spawn rate of 16. It is evident that at high noise strengths, the departure from the ideal deterministic trajectory overwhelms any difference in initial y-velocity.

Figure 4.8 Results from simulations examining the initial y-velocity variation. While initial y-velocity variation was tested with values of 0, 2, 4, 6, and 8 (units of pixels/frame), the remaining simulation parameters were held in the baseline configuration with np = 60, x0 = [1, 400], y0 = [1, 1], vx0 = [0, 0], and ay = [9, 9]. The two columns show the two performance metrics: percent correct trajectory links in the left column, variation of information in the right column. Four rows correspond to the four different noise strengths: 0, 1, 2, and 3. The x-axis for all plots is the spawn rate in units of objects/frame. Each spawn rate grouping contains 5 bars of different colors which represent the different initial y-velocity variations as shown in the legend at the bottom.

4.3.6 y-Acceleration

In Figure 4.9 I plot the results of simulations in which the y-acceleration, ay, was altered over the following intervals: ay = [9, 9]; [7, 11]; [5, 13]; [3, 15]; [1, 17]. These correspond to variations of 0, 4, 8, 12, and 16, respectively. I see that for zero noise, a y-acceleration variation of 0 leads to notably better performance in both percent correct trajectory links and VI. However, for y-acceleration variations greater than 0, there is no discernible trend in performance. At a spawn rate of 16 and noise strength of 0, for instance, variations of 8 and 12 have the two largest VI values, while at a spawn rate of 2 they perform better than a variation of 16. Moving to higher noise strengths, the difference between a y-acceleration variation of 0 and the larger values becomes less noticeable. When the noise strength is 3 and the spawn rate is 16, the VI for a y-acceleration variation of 0 is actually greater than that for the maximum variation of 16. In general, there seems to be a slight improvement in performance when the y-acceleration variation is low, but the difference is minimal and often falls within the standard deviation of the other values.

Figure 4.9 Results from simulations examining the y-acceleration variation. While y-acceleration variation was tested with values of 0, 4, 8, 12, and 16 (units of pixels/frame^2), the remaining simulation parameters were held in the baseline configuration with np = 60, x0 = [1, 400], y0 = [1, 1], vx0 = [0, 0], and vy0 = [0, 0]. The two columns show the two performance metrics: percent correct trajectory links in the left column, variation of information in the right column. Four rows correspond to the four different noise strengths: 0, 1, 2, and 3. The x-axis for all plots is the spawn rate in units of objects/frame. Each spawn rate grouping contains 5 bars of different colors which represent the different y-acceleration variations as shown in the legend at the bottom.


4.3.7 Experimental Parameter Simulations

Figure 4.10 plots results from simulations that used input parameters similar to those I observe in the bead drop experiments. To recapitulate, these simulations varied the mean acceleration while keeping all other parameters constant at experimentally representative values. Again, the overall performance is excellent and insensitive to choices of the other parameters within a wide range. For instance, at an acceleration of 5.5 and zero noise, the percent correct trajectory links is about 92%, while at the highest acceleration of 25.5 that value is about 86%, a difference of 6%. The introduction of noise lowers performance somewhat (as expected) and decreases the sensitivity of the results to changes in the acceleration: a mean acceleration of 5.5 at noise strength 3 has 74% correct trajectory links, and higher accelerations are within 2% of that value. The outlier occurs at a mean object acceleration of 1.5, for which the percentage is substantially higher at zero noise but decreases at high noise.

When the performance is measured using the VI, the trends discussed above hold, but I find that the sensitivity to noise depends strongly on the acceleration. For example, at a mean acceleration of 1.5 the VI changes by 2 when the noise strength is increased from zero to 3, while at a mean acceleration of 25.5 the difference is less than 0.5.

I find that performance results are somewhat dependent on the choice of metric. For instance, the difference in percent correct trajectory links between mean accelerations of 1.5 and 25.5 at noise strength 3 is less than 10%, yet the difference in VI for these parameters is much more significant, ranging from greater than 2 for a mean acceleration of 1.5 to less than 1 for a mean acceleration of 25.5. However, on the whole, performance trends are consistent when examined using either metric, and the noise strength has a much greater influence on algorithm performance at low accelerations than it does at high accelerations.

Figure 4.10 Results from simulations with parameters suitable for recapitulating the microsphere tracking experiments. These trials varied the mean acceleration to represent the change in force on the superparamagnetic microparticles observed in the experiments. Mean acceleration values of 1.5, 5.5, 9.5, 13.5, 17.5, 21.5, and 25.5 were used, all in units of pixels/frame^2, and are plotted on the x-axis. Due to the inhomogeneity of the superparamagnetic microparticles, the variation around each mean acceleration value spanned ±0.5 pixels/frame^2. Thus, a mean acceleration of 1.5 could vary on the interval [1, 2] pixels/frame^2, a mean acceleration of 5.5 on the interval [5, 6] pixels/frame^2, and so on. The other simulation input values for these trials were np = 100, x0 = [100, 100], y0 = [1, 100], vx0 = [-0.5, 2.5], vy0 = [0, 0], and rate = [1/6, 2]. All simulations were run with noise strengths of 0, 1, 2, and 3 as shown in the figure legend.

Finally, algorithm run-times were recorded for the exhaustive set of simulations performed. Using a desktop PC with an Intel E6750 processor and 4 GB of RAM, running MATLAB 2010a on Ubuntu 12.04 LTS, the median run-time was 4.041 seconds, while the mean run-time was 55.759 seconds with a standard deviation of 133.849 seconds. The significant difference between the mean and median, as well as the large standard deviation, indicates that a handful of simulation trials likely required considerably longer to execute than most. However, the median run-time of around 4 seconds indicates that the majority of the analyses were performed within a very reasonable time frame.


4.4 Discussion

I have presented a tracking algorithm that specializes in determining the trajectories of closely spaced, indistinguishable objects moving in a directional force field at high speeds and spawning at fast rates. The algorithm combines a scoring function that takes into account the expected motion due to the force field with a back-tracking method inspired by measurement-assignment techniques. In order to test its performance, I carried out simulation-based validation and sensitivity studies in which I systematically varied the spawn rate, the initial conditions on object position and velocity (x0, y0, vx0, vy0), the object acceleration (ay), and the noise strength, or signal-to-noise ratio (σ). I also looked at how the performance scaled with the total number of particles tracked, np. These studies were complemented by additional simulations designed to recapitulate the microsphere tracking experiments, which, together with the single-molecule DNA micromanipulation experiments, provide independent assays of the quality of the track determinations. Results were quantified in terms of two measures: (a) the percentage of correct links identified and (b) the Variation of Information (VI) score.

Starting with np, I found a weak dependence on particle number. This is not surprising, since I expect the entry of new objects to be compensated to some extent by the exit of others, leaving roughly the same number of particles to be tracked in each frame independent of the total number of particles in the simulation. The performance was generally robust as a function of the initial kinematical states of the particles (x0, y0, vx0, vy0, ay), with very high rates of successful link detection, 90% or higher, over a wide range of values for these quantities, even when I simulated objects with large non-uniformities in velocities or accelerations (i.e., when I drew these values from intervals of varying lengths). Instead, I found that the spawn rate and, to a lesser extent, the noise strength played the biggest roles in limiting the algorithm.

Considering the initial x-position x0 first: when it is drawn from a wide interval, the resulting trajectories tend to be laterally spaced apart, and this helps with the identification task. Misidentifications would require an object to undergo a large lateral displacement between frames, and such displacements are heavily penalized by the scoring function, leading to their rejection. Conversely, a point source leads to tightly clustered paths (the location of the point source does not matter – data not shown), and this case is difficult to parse because the lateral separation between links may be only a few pixels, making scoring-based discrimination less effective. This also explains why the performance measured in VI for the point-source case is very sensitive to noise: even for zero noise, the trajectories are packed together tightly, leading to a challenging discrimination task, and when noise is added, the chances of erroneous link assignments go up even more, making it that much harder to recover the true trajectories.


For y0, at high spawn rates, increasing the length of the interval increases the probability that a new object A will spawn at time tj very close to the position of object B at time ti. If A is closer to B's location at tj-1 than B at tj is to its previous position, the algorithm will determine that A should be assigned to the trajectory of object B, while assigning B at tj to a new trajectory, leading to link (and trajectory) misidentifications.

When I looked at the effect of increasing the variation in vx0, I expected improved performance, since greater non-uniformity in vx0 can increase the horizontal separation between objects. However, because I allow objects to have both positive and negative x-velocities, the chance of multiple path intersections increases. The algorithm deals with path intersections very well when the trajectories are adequately spaced in time (lower spawn rates), but as the objects are grouped closer and closer together (at high spawn rates), it can become quite challenging to discern between trajectories that intersect frequently within a small space. These trends were recapitulated for vy0, where an increase in the vy0 variation led to a slight decrease in performance. I attribute this difference to the design of the scoring function: motion in directions not aligned with the force carries increasingly larger penalties compared to motion more closely aligned with the force, so variations in object motion along the force direction produce a smaller change in performance, since the algorithm does not penalize these movements as heavily. As for ay, the results seemed robust to non-uniformities in the acceleration of the objects.

When looking at how performance depends on noise and spawn rate, I generally found a more prominent role for the spawn rate, although the signal-to-noise ratio affected results as expected. Indeed, increasing noise strength correlates with a decrease in performance. This is especially true for closely spaced trajectories, since densely clustered objects can essentially swap positions and still have trajectories close to the true ones. This is evident in Figure 4.5, which plots the results of altering the initial x-position variation. In these simulations, the most densely clustered objects correspond to an initial x-position variation of 0 and a spawn rate of 16, with 16 objects appearing from the exact same point in every frame. Looking at the change in performance metrics from noise strength 0 to noise strength 1, I see a tremendous decrease in performance, with VI values increasing 2-fold. As the objects increase in separation (larger initial x-position variation), the effect of noise on algorithm performance is less profound. Furthermore, when the objects are separated both spatially and temporally, the noise strength has even less influence: again referring to Figure 4.5, initial x-position variations of 300 or 400 and spawn rates of 1 or less lead to barely noticeable changes in VI values.


One of the fundamental limits on tracking algorithms is how closely the objects are spaced. This is a function not only of the objects' motion in space, but also of the rate at which they enter or exit the sensor's field of view, i.e., the rate at which they spawn. The role of the objects' spatial density was evident in my simulations of the algorithm's sensitivity to variations in the initial positions, velocities, and accelerations: these three sets of parameters had the most significant effect on object density, with greater variation in their values leading to greater spatial separation. (The caveat is that greater variation in x-velocity and y-acceleration can also increase the frequency of trajectory intersections.)

The spawn rate also played a significant role in the ability of the algorithm to discriminate trajectories. Indeed, in any of Figures 4.5 – 4.9, an examination of the same-colored bars of any sub-plot shows that the performance of the algorithm improves consistently as the object spawn rate is decreased. For object spawn rates of 1 or lower, the worst performance of the algorithm across all variables occurs at an initial x-position variation of 0 and noise strength of 3; yet even in this case, perhaps one of the most difficult to parse accurately, 70% of trajectory links were correctly identified and a VI of 1.4 was achieved. In fact, the only other case in which VI exceeds 1 for a spawn rate of 1 is when the initial x-position variation is 0 and the noise strength is 2. For all other parameter value combinations, VI is below 1. Even for a spawn rate of 2, the algorithm only has significant issues when the initial x-position variation is 0; for all other parameter combinations, VI at spawn rates of 2 does not significantly exceed 1. In an experimental setting, this means that if the frequency of observation (frame rate) is high enough to reduce the spawn rate to a value around 2, the algorithm should perform very well regardless of object dynamics.

Additionally, the algorithm has been used successfully to parse the trajectories of real indistinguishable objects moving in clusters at high speeds. Referring back to sections 2.2.5 and 2.3.2, the tracking algorithm was used on video captures of superparamagnetic microspheres travelling through the experimental medium under the influence of a magnetic field. Figure 2.9 shows that the tracking-algorithm results and the fluctuation-dissipation analysis of the DNA tether agree in the calculated force as a function of distance from the magnet. This shows that the tracking algorithm can be deployed effectively in real-world scenarios.

Chapter 5: Conclusions

5.1 Summary of Results

In this dissertation I have presented a complete transverse magnetic tweezers device for the investigation of single-molecule DNA-protein interactions. My device takes advantage of a novel attachment concept that preserves the advantages of a dual-bead DNA tether—namely, the ability to easily resolve tether extension and to passively compensate for experimental drift—while addressing the difficulties that arise when implementing a dual-bead DNA tether in optical tweezers and aspiration-pipette magnetic tweezers. This attachment concept uses a surface-functionalized glass micro-rod to which the dual-bead DNA tether can bind. The glass micro-rod can be moved independently of the other experimental components, allowing the user to quickly examine several DNA tethers and select the best one for experimentation.

Computational methods were also developed to accurately analyze the experimental output. Sub-pixel localization of the bead centroids is accomplished by examining the concentric rings of the beads' diffraction patterns. Fitting circles to these diffraction rings allows localization of the beads' positions at the ~10 nm level, which is necessary to accurately determine the extension of the DNA tether as well as to quantify the force applied to the molecule using the fluctuation-dissipation theorem.


Without sub-pixel localization, the physical pixel size of 58 nm would severely limit the accuracy of the measurements.

Furthermore, sub-pixel bead localization is required to observe the changes in end-to-end tether length during DNA-protein interaction experiments. These extension changes are typically on the order of 10 nm. Extracting them from the data can prove challenging, as the data are noisy and the jumps in tether length vary in size and duration. Thus, an algorithm capable of identifying these jumps without any a priori knowledge of the characteristics of the data set is necessary.

By examining the relative probability densities of the data set, along with some associated mathematical processing, I created an algorithm that is capable of identifying these step-like features in data.

The forces calculated from the transverse fluctuations of the tethered superparamagnetic particles were confirmed by taking advantage of Stokes' law, which relates the terminal velocity of a particle to the drag force on it. By allowing many superparamagnetic microspheres to move freely in a magnetic field at various distances from the magnet, the forces of the DNA tether experiments were verified. This process required the creation of a unique tracking algorithm able to parse position data from high-velocity, indistinguishable, densely clustered objects into a set of trajectories.

This allowed the determination of the terminal velocities of the microspheres, which were then used to calculate the effective magnetic force at various distances from the magnet. The agreement between the two methods of force calculation serves as a testament to the capabilities of both the tweezers device and the tracking algorithm.
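For reference, Stokes' law for a sphere of radius \(r\) moving at terminal velocity \(v_t\) through a fluid of viscosity \(\eta\) gives the drag force

\( F = 6\pi \eta r v_t, \)

and at terminal velocity this drag balances the applied magnetic force, so a measurement of \(v_t\) yields the force directly.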

The culmination of all of these efforts is apparent in the DNA-histone binding experiments. The ability to precisely and automatically quantify individual histone binding and unbinding events would not be possible without a system that can slowly and repeatedly manipulate a single molecule of DNA, precisely quantify the extensions of and forces on the DNA molecule, automatically analyze the data to identify the sub-pixel extension changes, and do so with the knowledge that the calculated values of force and extension are valid. Indeed, as seen in Figure 2.12, this transverse magnetic tweezers can meet all of these requirements and allows the examination of individual DNA-protein interactions with ease and precision.

5.2 Future Directions

There are several areas of this research that could potentially be improved with additional effort.

First, incorporating a method to examine twists and torques in DNA would be highly useful. Particularly as proteins interact with DNA, observing or imparting twists and torques on the molecule could provide interesting insight into the micromechanical operation of this machinery. Such an addition would necessarily increase the complexity of the device, but that is true of any device that examines this aspect of the DNA molecule.

The protocol for attachment of the DNA tethers to the glass micro-rod could also use some refinement. While multiple DNA molecules were captured (up to 20 in some cases), this was achieved only after several tens of thousands of molecules were introduced into the sample cell. Trying different ratios of the constituent components or adopting a different process order may yield improved attachment statistics. Additionally, the ability to attach DNA molecules close enough together to test proteins that interact with two separate strands of DNA would be significant; at this moment such experiments can be performed by only one extremely complex quad-trap optical tweezers device in the world.

The step-finding algorithm could potentially be developed further in two ways.

First, it may be interesting to expand its abilities to multi-dimensional data; that is, to try to identify "plateaus" in 3-dimensional data, or "cubes" in 4-dimensional data.

Second, the general method of examining relative probability densities may accommodate other shapes of data, such as quadratics or harmonics. Such shapes would likely have unique signatures in the modified probability density functions, and searching for these signatures may allow the detection of data features beyond basic steps.

The primary issue with the tracking algorithm lies with its execution time. While most situations can be analyzed in just a few seconds, computation times can grow substantially for complex scenes. This is likely due to inefficient programming or mathematical procedures. Addressing these issues so that the algorithm runs much faster would be highly beneficial to other potential applications that require quicker turn-around times for scene analysis.

Appendices

A.1 Protocol for surface functionalization of glass micro-rod

All tools and containers should be sterilized, either from the factory or by autoclave.

1) Clean the glass micro-rod by submerging in pure acetone for 2 hours. Remove and allow to dry.

2) Create 5 mL of pegylation buffer (ethanol/water 95%/5% w/w).

3) Add 1 mL of pegylation buffer to the stock 100 mg of silane-PEG-biotin and vortex. Return the complete volume of the solution to the pegylation buffer for a total of 5 mL of 20 mg/mL silane-PEG-biotin.

4) Aliquot 500 µL of the silane-PEG-biotin in pegylation buffer into 5 separate microtubes, each with 100 µL. Place 3 glass micro-rods into each tube. This should be enough for 2 weeks of experiments, the expected shelf-life of functionalized glass micro-rods.

a) Aliquot the remaining 4.5 mL into individual 500 µL volumes. Freeze the microtubes for future use, picking up the protocol at step 4.

5) Slowly agitate glass micro-rods in silane-PEG-biotin mixture for 2 hours.

6) Remove glass micro-rods and wash with copious amounts of distilled, de-ionized water.


7) Place functionalized glass micro-rods in passivation buffer (1x PBS, 1% w/v BSA, 0.04% Tween 20). Allow passivation to occur overnight at 4°C. Remove glass micro-rods from passivation buffer and allow to dry at 4°C.


A.2 Protocol for DNA End-functionalization

All tools and containers should be sterilized, either from the factory or by autoclave.

I) Prepare in advance 100x Tris-EDTA (TE) buffer.

1) Dissolve 18.2 g (150 mmol) of Tris in 100 mL of sterilized, twice-filtered, de-ionized water.

2) Adjust to pH 8 using HCl to lower or NaOH to raise the pH.

3) Add 0.56 g of EDTA and complete to 150 mL volume with water.

4) Sterilize and filter again.

II) λ-DNA ligation with oligomer 5’-GGG CGG CGA CCT-dig (ONLdig)

1) Place 10 µg (20 µL) of λ-DNA into a 200 µL tube.

2) Add 15 µL of ONLdig in solution. (The concentration of the ONLdig solution should be such that the number of oligos in 15 µL of solution is ten times the number of λ-DNAs. The mass of one λ-DNA is ~10^-11 µg, so the number of λ-DNAs in 10 µg is ~10^12. The stock lyophilized ONLdig should be dissolved in an appropriate amount of 1x TE such that 15 µL of the final solution contains 10^13 ONLdig oligos.) Meanwhile, set the incubator to 65°C.

3) Mix the solution with the pipettor and incubate at 65°C for 5 minutes.


4) Let the DNA solution cool down at room temperature for 5 minutes. Reduce the incubator to room temperature.

5) Thaw and vortex 10x T4 ligase buffer (ligase buffer is stored at -20°C).

6) Add 4 µL of T4 ligase buffer.

7) Add 1 µL of ligase enzyme. (Minimize the exposure of ligase enzyme to room temperature.) Incubate for 30 minutes at room temperature.

III) Extraction of λ-DNA-ONLdig constructs with the QIAEX II Gel Extraction Kit

1) Add 120 µL of QX1 buffer and 80 µL of water to the 40 µL sample from Part II. Set the incubator temperature to 65°C.

2) Vortex the QIAEX II gel suspension and add 20 µL to the sample (10 µL per 5 µg DNA).

3) Incubate at room temperature for 10 minutes, flicking with finger repeatedly to ensure the gel remains suspended.

4) Spin for 1 minute and remove supernatant.

5) Wash sample twice with 500 µL PE buffer.

6) Air dry sample for at least 10 minutes, or until all PE buffer is evaporated.

7) Add 40 µL of TE buffer.

8) Incubate for 10 minutes at 65°C.

9) Centrifuge for 1 minute.


10) The DNA is now extracted in the solution (supernatant); remove the supernatant to a clean tube for further steps.

IV) λ-DNA ligation with oligomer 5'-AGG TCG CCG CCC-biotin (ONRbio)

1) Add 3 µL of ONRbio in solution to the 40 µL sample (from Part III). (The 3 µL of ONRbio in solution should contain the same number of oligos as was used in Part II, step 2. Thus, dissolve the stock ONRbio oligos in an appropriate amount of 1x TE buffer such that 3 µL of ONRbio in solution contains 10^13 oligos.)

2) Mix the solution with the pipettor and incubate at 65°C for 10 minutes. After incubation, immediately cool down the incubator to room temperature.

3) Let sample cool down to room temperature for 5 minutes.

4) Thaw and vortex 10x T4 ligase buffer and add 5 µL of buffer to the sample, bringing the total volume to 48 µL.

5) Add 2 µL of ligase enzyme. (Minimize the exposure of ligase enzyme to room temperature.) Incubate for 30 minutes at room temperature.

V) Extraction of λ-DNA-ONLdig-ONRbio constructs with the QIAEX II Gel Extraction Kit

1) Add 150 µL of QX1 buffer and 100 µL of water to the 50 µL sample from Part IV. Set the incubator temperature to 65°C.

2) Vortex QIAEX II gel and add 20 µL to the sample (10 µL per 5 µg DNA).


3) Incubate at room temperature for 10 minutes, flicking with finger repeatedly to ensure the gel remains suspended.

4) Spin for 1 minute and remove supernatant.

5) Wash sample twice with 500 µL PE buffer.

6) Air dry sample for at least 10 minutes, or until all PE buffer is evaporated.

7) Add 100 µL of TE buffer.

8) Incubate for 10 minutes at 65°C.

9) Centrifuge for 1 minute.

10) The DNA is now extracted in the solution (supernatant); remove the supernatant to a clean tube.

11) Aliquot the DNA solution in 200 µL tubes, each containing 5 µL DNA stock. Freeze and store the stock at -20°C for up to 6 months.

VI) Attaching Beads to End-functionalized lambda-DNA

1) Add ~100 µL of 1x TE buffer to 5 µL DNA stock.

2) Add 12.5 µL of the Bangs Labs streptavidin-coated microspheres and 25 µL of the anti-digoxigenin-labeled superparamagnetic microspheres.

3) Mix briefly with the pipettor by drawing the solution in and out 3-5 times. Incubate for 10-20 minutes at room temperature. The DNA-bead constructs should then be ready for use.


A.3 Protocol for labeling of superparamagnetic beads with anti-digoxigenin

All tools and containers should be sterilized, either from the factory or by autoclave.

1) Wash 1 mL of stock 2.8 µm MagnaLink Amino Magnetic Beads using 1x PBS.
a) Vortex beads in vial, then transfer 1 mL of beads from the vial to a 15 mL tube. Centrifuge and remove supernatant (stock buffer).
b) Add 2 mL PBS to beads and vortex. Centrifuge and remove supernatant (PBS).
c) Repeat step (b) two more times (three times in total).

2) After the third PBS rinse, resuspend the beads in a 1x PBS, 5% glutaraldehyde solution overnight at 4°C.
a) 1x PBS, 5% glutaraldehyde can be made by combining a 10% glutaraldehyde solution at a 1:1 ratio with 2x PBS. (Dilute higher-percent glutaraldehyde with DI water to obtain a 10% solution if necessary; dilute 10x PBS to 2x; then combine equal amounts of each.)

3) Wash the beads 5 times with 1x PBS using the same method as above (2 mL 1x PBS for each wash) to completely remove the excess glutaraldehyde.

4) Resuspend the beads in a solution of 0.2 mg/mL anti-digoxigenin in 1x PBS. Incubate at room temperature for 4 hours.


a) The anti-dig solution can be made by adding 1 mL 1x PBS to the 200 µg (0.2 mg) powdered (lyophilized) anti-dig from Roche; add PBS directly to vial and vortex to completely dissolve the powdered anti-dig.

5) Centrifuge the bead-anti-dig solution and discard the supernatant. Resuspend the beads in 2 mL of 0.5 M ethanolamine in 1x PBS. Incubate for 30 minutes at room temperature.
a) Ethanolamine has a MW of 61.08 and a density of 1.01 g/mL. To make 10 mL of 0.5 M ethanolamine in 1x PBS, add 9.698 mL of 1x PBS to 0.302 mL (302 µL) of stock ethanolamine.
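As a check on step 5a, the dilution follows directly from the quoted molecular weight and density; a two-line MATLAB sketch of the arithmetic (illustrative variable names only):

mass_g = 0.5*(10/1000)*61.08;  % grams of ethanolamine in 10 mL of 0.5 M (~0.305 g)
vol_mL = mass_g/1.01;          % stock volume at 1.01 g/mL (~0.302 mL = 302 uL)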

6) Centrifuge the ethanolamine-bead solution and discard the supernatant.

7) Resuspend the beads in 2 mL of 10 mg/mL bovine serum albumin (BSA) in PBS. Incubate for 30 minutes at room temperature.
a) Each vial of Thermo Imject BSA contains 20 mg of lyophilized BSA, as well as salts for PBS. Add 2 mL of sterilized DI water directly to the vial to obtain 10 mg/mL BSA in PBS.

8) Centrifuge the bead-BSA solution and remove the supernatant. Resuspend the beads again in BSA-PBS solution as in step 7. Store beads in refrigerator for up to 6 months.


A.4 Magnetic Tweezers Component List

Component | Vendor | Item Details | Price (USD)
Optical Table with Active Pneumatic Isolation | Thorlabs | T46HK | $6700
Inverted Light Microscope | Nikon | Eclipse TS100 | $8000
Objective, 40x, 0.65 NA | Olympus | RMS40X | $785
CMOS Camera with USB3 | Point Grey | Grasshopper3 GS3-U3-23S6M-C | $1350
Motorized Stage System | Zaber | 3 × LSM Stage / 3 × XMCB Electronic Linear Controller / 1 × XJOY 3 / Cables | $7262
Hydraulic Micromanipulator | Siskiyou | MX630L S3432 | $4050
Structural Rail and Carriages | Thomson | 521H15A300 (rail), 511H15A0 (carriage) | $480
Micromanipulator Platform | Sutter | MD54 | $480
Syringe Pump (x2) | New Era Pump | NE-1000 | $750 (x2)
Vertical Puller | Narishige | PC-10 | $2973
Glass Micro-rods, Square, 0.25 mm | Vitrocom | custom | $470 (per 500)
Glass Capillary, 1 mm OD, 6 in length | WPI | 1B100-6 | $65 (per 500)
Tygon Tubing | US Plastics | ND-100-80, 0.04 ID | $212
Magnets (x10) | Indigo Instruments | N42 NdFeB, 4 x 2 x 1 mm | $0.14 (x10)
Desktop PC System | HP | Z230 | $1200
External RAID Storage | CFI | B8283ERGG | $380
Hard Drives (x8) | Seagate | ST2000DM001 | $76 (x8)
MATLAB | Mathworks | R2015a Academic | $500
LabView | National Instruments | LabView 2015 SP1 | $1000
Sample Cell Holder | Custom 3D printed | Carbon fiber reinforced plastic | $233
Hydraulic Micromanipulator Mounting Bracket | Custom machined | Aluminum stock | $70
Sample Cells | Custom 3D printed | Somos 11122XC | $240 (per 6)
Glass Micro-rod Holder | Custom 3D printed | VeroWhite plastic | $70
Total | | | $37,347

Where items are no longer produced or available, comparable items are specified. A partial list of consumables includes cover slips (#1), RTV silicone sealant, syringes, and syringe needles.


A.5 Protocol for preparation of histones

Reagents and Antibodies: F12 (Coon’s) medium was purchased from Sigma-Aldrich (St. Louis, MO) and fetal bovine serum (FBS) from Gemini Bio-Products (Woodland, CA).

I) Cell Culture

1) WIF-B cells were grown in a humidified 7% CO2 incubator at 37°C.

2) Briefly, cells were grown in F12 medium, pH 7.0, supplemented with 5% FBS, 10 µM hypoxanthine, 40 nM aminopterin and 1.6 µM thymidine.

3) Cells were seeded at 1.3 × 10⁴ cells/cm² and grown for 8-12 days until they reached maximum density and polarity.

II) Histone Purification

1) Confluent monolayers of WIF-B cells grown in 10 cm dishes were lysed in 0.8 mL of extraction buffer, and histones were purified using the spin column-based Histone Purification Mini Kit (Active Motif, Carlsbad, CA) according to the manufacturer’s instructions.

2) Concentration was assessed by analyzing the eluted proteins via Bradford Assay.

3) The final histone solution was stored at -20°C until required for experimental use.


4) When preparing histone samples for an experiment, the individual samples were further diluted to a concentration of 0.2 mg/mL.


A.6 Bead Centroid Localization code

% --- Executes bead centroid localization method % --- on button press in calcbutton. function calcbutton_Callback(hObject, eventdata, handles) % hObject handle to calcbutton (see GCBO) % eventdata reserved - to be defined in a future version of MATLAB % handles structure with handles and user data (see GUIDATA) set(findobj('tag','stopbutton'),'enable','on'); data=get(gcf,'userdata'); %thresh = data.thresh; %experimental threshold dim = data.cropDim; ah = handles.MovDisplay; cla(ah); %determine bead radius in pixels %and calculate appropriate radius search values %data.magnif=40; data.dpp = 0.0577; %distance per pixel ~ pixel size in microns % revised to 0.0577 for new camera, 6/8/15 if (data.arad < data.mrad) minrad = round(((data.dpp)/(data.arad*0.25))^(-1)); else minrad = round(((data.dpp)/(data.mrad*0.25))^(-1)); end if (data.arad > data.mrad) maxrad = round(((data.dpp)/(data.arad*0.5))^(-1)); else maxrad = round(((data.dpp)/(data.mrad*0.5))^(-1)); end tic; for ind=data.frameIndex:data.nframes; data2=get(gcf,'userdata'); try tempnum = data.frameIndex; singleFrame = read(data.mov,data.frameIndex); data.frameIndex=data.frameIndex+1;


img = singleFrame(:,:,1); clear singleFrame; % set up the axes cla(ah); %imshow(img,'Parent',ah,'InitialMagnification','fit'); %axis tight;

set(findobj('tag','text1'),'String',[num2str(data.frameIndex-1),... '/',num2str(data.nframes)]); %apply the crop dimensions img2=img(dim(2):dim(2)+dim(4),dim(1):dim(1)+dim(3)); %call Circular Hough Xfrm with parameters to %optimize speed and bead recognition %(see CircularHough_Grd.m for specifics) [accum, circen, cirrad] = ... CircularHough_Grd(img2, [minrad maxrad], 2, 20, 0.9); [dist, beads] = GetBeads(accum, circen, cirrad); imshow(img2,'Parent',ah,'InitialMagnification','fit'); hold on; plot(beads(1,1), beads(1,2), 'r+'); plot(beads(2,1), beads(2,2), 'r+'); hold off; % here we subtract out the radii of the beads data.dists(data.frameIndex-1)= ... dist-((data.arad+data.mrad)/(data.dpp)); data.bead1(data.frameIndex-1,:)=beads(1,:); data.bead2(data.frameIndex-1,:)=beads(2,:); catch EX %#ok<*NASGU> disp(['read failed: ',num2str(data.frameIndex-1),' in ',data.file]); disp([EX.message]) if(tempnum==data.frameIndex) data.frameIndex = data.frameIndex + 1; end %disp([EX.identifier,' ',EX.message,' ',EX.stack]); end if(isfield(data2,'pause') && data2.pause)


        data.pause = 0;
        set(gcf,'userdata',data);
        return;
    end
    %set(gcf,'userdata',data);
    clear img img2 BW block dist p1 p2 data2; %cx cy
    pause(1.0/data.framerate);
end
set(gcf,'userdata',data);
data.tElapsed=toc;
disp('done');

% This function takes the Circular Hough transform output and determines
% which point is the aspirated bead and which is the magnetic bead
function [ dist, beads ] = GetBeads( accum, circen, cirrad )
lookfor = 2;
beads = zeros(lookfor);
avgrad = mean(cirrad);
m = size(accum,1);
n = size(accum,2);
% find tallest peak in accumulation array
% which should be one of the two beads
[svals,idx] = sort(accum(:),'descend'); % sort to vector
svals(1); % largest value
[II,JJ] = ind2sub([m,n],idx(1)); % position in the matrix
k = dsearchn(circen, [JJ,II]);
beads(1,:) = circen(k,:);
found = 1;
z = 1;
% search for other peaks in the accumulation array
% outside of 3x radii of beads and also only
% to 40% of accumulation array size
while found < lookfor && z < (0.4*m),
    [II,JJ] = ind2sub([m,n],idx(z));
    if ( ((JJ-beads(1,1))^2 + (II-beads(1,2))^2) > ((3*avgrad)^2) )
        found = found+1;
        k = dsearchn(circen, [JJ,II]);
        beads(found,:) = circen(k,:);
    else
        z = z+1;
    end
end
% Check to make sure beads exist
% and that the magnetic bead is closer to the magnet
% if not, switch bead indices
if ( (beads(1,:) ~= zeros) & (beads(2,:) ~= zeros) )
    if ( beads(1,2) > beads(2,2) ),
        temp = beads(1,:);
        beads(1,:) = beads(2,:);
        beads(2,:) = temp;
    end
    dist = sqrt( (beads(1,1)-beads(2,1))^2 + (beads(1,2)-beads(2,2))^2 );
else
    dist = 0;
end
end

% This function implements a Circular Hough transform to % detect circular shapes in a grayscale image and to resolve % their center positions and radii. function [accum, varargout] = CircularHough_Grd(img, radrange, varargin) % [accum, circen, cirrad, dbg_LMmask] = CircularHough_Grd( % img, radrange, grdthres, fltr4LM_R, multirad, fltr4accum) % % INPUT: (img, radrange, grdthres, fltr4LM_R, multirad, fltr4accum) % img: A 2-D grayscale image (NO B/W bitmap) % radrange: The possible minimum and maximum radii of the circles % to be searched, in the format of % [minimum_radius , maximum_radius] (unit: pixels) % **NOTE**: A smaller range saves computational time and % memory. % grdthres: (Optional, default is 10, must be non-negative)


% The algorithm is based on the gradient field of the % input image. A thresholding on the gradient magnitude % is performed before the voting process of the Circular % Hough transform to remove the 'uniform intensity' % (sort-of) image background from the voting process. % In other words, pixels with gradient magnitudes smaller % than 'grdthres' are NOT considered in the computation. % **NOTE**: The default parameter value is chosen for % images with a maximum intensity close to 255. For cases % with dramatically different maximum intensities, e.g. % 10-bit bitmaps in stead of the assumed 8-bit, the default % value can NOT be used. A value of 4% to 10% of the maximum % intensity may work for general cases. % fltr4LM_R: (Optional, default is 8, minimum is 3) % The radius of the filter used in the search of local % maxima in the accumulation array. To detect circles whose % shapes are less perfect, the radius of the filter needs % to be set larger. % multirad: (Optional, default is 0.5) % In case of concentric circles, multiple radii may be % detected corresponding to a single center position. This % argument sets the tolerance of picking up the likely % radii values. It ranges from 0.1 to 1, where 0.1 % corresponds to the largest tolerance, meaning more radii % values will be detected, and 1 corresponds to the smallest % tolerance, in which case only the "principal" radius will % be picked up. % fltr4accum: (Optional. A default filter will be used if not given) % Filter used to smooth the accumulation array. Depending % on the image and the parameter settings, the accumulation % array built has different noise level and noise pattern % (e.g. noise frequencies). The filter should be set to an % appropriately size such that it's able to suppress the % dominant noise frequency. % OUTPUT: [accum, circen, cirrad, dbg_LMmask] % accum: The result accumulation array from the Circular Hough


% transform. The accumulation array has the same dimension % as the input image. % circen: (Optional) % Center positions of the circles detected. Is a N-by-2 % matrix with each row contains the (x, y) positions % of a circle. For concentric circles (with the same center % position), say k of them, the same center position will % appear k times in the matrix. % cirrad: (Optional) % Estimated radii of the circles detected. Is a N-by-1 % column vector with a one-to-one correspondance to the % output 'circen'. A value 0 for the radius indicates a % failed detection of the circle's radius. % dbg_LMmask: (Optional, for debugging purpose) % Mask from the search of local maxima in the accumulation % array. if ndims(img) ~= 2 || ~isnumeric(img), error('CircularHough_Grd: ''img'' has to be 2 dimensional'); end if ~all(size(img) >= 32), error('CircularHough_Grd: ''img'' has to be larger than 32-by-32'); end if numel(radrange) ~= 2 || ~isnumeric(radrange), error(['CircularHough_Grd: ''radrange'' has to be ', ... 'a two-element vector']); end prm_r_range = sort(max( [0,0;radrange(1),radrange(2)] )); % Parameters (default values) prm_grdthres = 10; prm_fltrLM_R = 8; prm_multirad = 0.5; func_compu_cen = true; func_compu_radii = true;

% Validation of arguments vap_grdthres = 1;


if nargin > (1 + vap_grdthres), if isnumeric(varargin{vap_grdthres}) && ... varargin{vap_grdthres}(1) >= 0, prm_grdthres = varargin{vap_grdthres}(1); else error(['CircularHough_Grd: ''grdthres'' has to be ', ... 'a non-negative number']); end end vap_fltr4LM = 2; % filter for the search of local maxima if nargin > (1 + vap_fltr4LM), if isnumeric(varargin{vap_fltr4LM}) && varargin{vap_fltr4LM}(1) >= 3, prm_fltrLM_R = varargin{vap_fltr4LM}(1); else error(['CircularHough_Grd: ''fltr4LM_R'' has to be ', ... 'larger than or equal to 3']); end end vap_multirad = 3; if nargin > (1 + vap_multirad), if isnumeric(varargin{vap_multirad}) && ... varargin{vap_multirad}(1) >= 0.1 && ... varargin{vap_multirad}(1) <= 1, prm_multirad = varargin{vap_multirad}(1); else error(['CircularHough_Grd: ''multirad'' has to be ', ... 'within the range [0.1, 1]']); end end vap_fltr4accum = 4; % filter for smoothing the accumulation array if nargin > (1 + vap_fltr4accum), if isnumeric(varargin{vap_fltr4accum}) && ... ndims(varargin{vap_fltr4accum}) == 2 && ... all(size(varargin{vap_fltr4accum}) >= 3), fltr4accum = varargin{vap_fltr4accum}; else


error(['CircularHough_Grd: ''fltr4accum'' has to be ', ... 'a 2-D matrix with a minimum size of 3-by-3']); end else % Default filter (5-by-5) fltr4accum = ones(5,5); fltr4accum(2:4,2:4) = 2; fltr4accum(3,3) = 6; end func_compu_cen = ( nargout > 1 ); func_compu_radii = ( nargout > 2 ); % Reserved parameters dbg_on = false; % debug information dbg_bfigno = 4; if nargout > 3, dbg_on = true; end % Building accumulation array % Convert the image to single if it is not of % class float (single or double) img_is_double = isa(img, 'double'); if ~(img_is_double || isa(img, 'single')), imgf = single(img); end % Compute the gradient and the magnitude of gradient if img_is_double, [grdx, grdy] = gradient(img); else [grdx, grdy] = gradient(imgf); end grdmag = sqrt(grdx.^2 + grdy.^2); % Get the linear indices, as well as the subscripts, of the pixels % whose gradient magnitudes are larger than the given threshold grdmasklin = find(grdmag > prm_grdthres); [grdmask_IdxI, grdmask_IdxJ] = ind2sub(size(grdmag), grdmasklin); rr_4linaccum = double( prm_r_range ); linaccum_dr = [ (-rr_4linaccum(2) + 0.5) : -rr_4linaccum(1) , ... (rr_4linaccum(1) + 0.5) : rr_4linaccum(2) ];


lin2accum_aJ = floor( ... double(grdx(grdmasklin)./grdmag(grdmasklin)) * linaccum_dr + ... repmat( double(grdmask_IdxJ)+0.5 , [1,length(linaccum_dr)] ) ... ); lin2accum_aI = floor( ... double(grdy(grdmasklin)./grdmag(grdmasklin)) * linaccum_dr + ... repmat( double(grdmask_IdxI)+0.5 , [1,length(linaccum_dr)] ) ... ); % Clip the votings that are out of the accumulation array mask_valid_aJaI = ... lin2accum_aJ > 0 & lin2accum_aJ < (size(grdmag,2) + 1) & ... lin2accum_aI > 0 & lin2accum_aI < (size(grdmag,1) + 1); mask_valid_aJaI_reverse = ~ mask_valid_aJaI; lin2accum_aJ = lin2accum_aJ .* mask_valid_aJaI + mask_valid_aJaI_reverse; lin2accum_aI = lin2accum_aI .* mask_valid_aJaI + mask_valid_aJaI_reverse; clear mask_valid_aJaI_reverse; % Linear indices (of the votings) into the accumulation array lin2accum = sub2ind( size(grdmag), lin2accum_aI, lin2accum_aJ ); lin2accum_size = size( lin2accum ); lin2accum = reshape( lin2accum, [numel(lin2accum),1] ); clear lin2accum_aI lin2accum_aJ; % Weights of the votings, currently using the gradient maginitudes % but in fact any scheme can be used (application dependent) weight4accum = ... repmat( double(grdmag(grdmasklin)) , [lin2accum_size(2),1] ) .* ... mask_valid_aJaI(:); clear mask_valid_aJaI; % Build the accumulation array using Matlab function 'accumarray' accum = accumarray( lin2accum , weight4accum ); accum = [ accum ; zeros( numel(grdmag) - numel(accum) , 1 ) ]; accum = reshape( accum, size(grdmag) ); % Locating local maxima in the accumulation array % Stop if no need to locate the center positions of circles if ~func_compu_cen, return; end


clear lin2accum weight4accum; % Parameters to locate the local maxima in the accumulation array % -- Segmentation of 'accum' before locating LM prm_useaoi = true; prm_aoithres_s = 2; prm_aoiminsize = floor(min([ min(size(accum)) * 0.25, ... prm_r_range(2) * 1.5 ])); % Filter for searching for local maxima prm_fltrLM_s = 1.35; prm_fltrLM_r = ceil( prm_fltrLM_R * 0.6 ); prm_fltrLM_npix = max([ 6, ceil((prm_fltrLM_R/2)^1.8) ]); % Lower bound of the intensity of local maxima prm_LM_LoBndRa = 0.2; % minimum ratio of LM to the max of 'accum' % Smooth the accumulation array fltr4accum = fltr4accum / sum(fltr4accum(:)); accum = filter2( fltr4accum, accum ); % Select a number of Areas-Of-Interest from the accumulation array if prm_useaoi, % Threshold value for 'accum' prm_llm_thres1 = prm_grdthres * prm_aoithres_s; % Thresholding over the accumulation array accummask = ( accum > prm_llm_thres1 ); % Segmentation over the mask [accumlabel, accum_nRgn] = bwlabel( accummask, 8 ); % Select AOIs from segmented regions accumAOI = ones(0,4); for k = 1 : accum_nRgn, accumrgn_lin = find( accumlabel == k ); [accumrgn_IdxI, accumrgn_IdxJ] = ... ind2sub( size(accumlabel), accumrgn_lin ); rgn_top = min( accumrgn_IdxI ); rgn_bottom = max( accumrgn_IdxI ); rgn_left = min( accumrgn_IdxJ ); rgn_right = max( accumrgn_IdxJ ); % The AOIs selected must satisfy a minimum size if ( (rgn_right - rgn_left + 1) >= prm_aoiminsize && ...


(rgn_bottom - rgn_top + 1) >= prm_aoiminsize ), accumAOI = [ accumAOI; ... rgn_top, rgn_bottom, rgn_left, rgn_right ]; end end else % Whole accumulation array as the one AOI accumAOI = [1, size(accum,1), 1, size(accum,2)]; end % Thresholding of 'accum' by a lower bound prm_LM_LoBnd = max(accum(:)) * prm_LM_LoBndRa; % Build the filter for searching for local maxima fltr4LM = zeros(2 * prm_fltrLM_R + 1); [mesh4fLM_x, mesh4fLM_y] = meshgrid(-prm_fltrLM_R : prm_fltrLM_R); mesh4fLM_r = sqrt( mesh4fLM_x.^2 + mesh4fLM_y.^2 ); fltr4LM_mask = ... ( mesh4fLM_r > prm_fltrLM_r & mesh4fLM_r <= prm_fltrLM_R ); fltr4LM = fltr4LM - ... fltr4LM_mask * (prm_fltrLM_s / sum(fltr4LM_mask(:))); if prm_fltrLM_R >= 4, fltr4LM_mask = ( mesh4fLM_r < (prm_fltrLM_r - 1) ); else fltr4LM_mask = ( mesh4fLM_r < prm_fltrLM_r ); end fltr4LM = fltr4LM + fltr4LM_mask / sum(fltr4LM_mask(:));

% **** Debug code (begin) if dbg_on, dbg_LMmask = zeros(size(accum)); end % **** Debug code (end) % For each of the AOIs selected, locate the local maxima circen = zeros(0,2); for k = 1 : size(accumAOI, 1), aoi = accumAOI(k,:); % just for referencing convenience % Thresholding of 'accum' by a lower bound


accumaoi_LBMask = ... ( accum(aoi(1):aoi(2), aoi(3):aoi(4)) > prm_LM_LoBnd ); % Apply the local maxima filter candLM = conv2( accum(aoi(1):aoi(2), aoi(3):aoi(4)) , ... fltr4LM , 'same' ); candLM_mask = ( candLM > 0 ); % Clear the margins of 'candLM_mask' candLM_mask([1:prm_fltrLM_R, (end-prm_fltrLM_R+1):end], :) = 0; candLM_mask(:, [1:prm_fltrLM_R, (end-prm_fltrLM_R+1):end]) = 0; % **** Debug code (begin) if dbg_on, dbg_LMmask(aoi(1):aoi(2), aoi(3):aoi(4)) = ... dbg_LMmask(aoi(1):aoi(2), aoi(3):aoi(4)) + ... accumaoi_LBMask + 2 * candLM_mask; end % **** Debug code (end) % Group the local maxima candidates by adjacency, compute the % centroid position for each group and take that as the center % of one circle detected [candLM_label, candLM_nRgn] = bwlabel( candLM_mask, 8 ); for ilabel = 1 : candLM_nRgn, % Indices (to current AOI) of the pixels in the group candgrp_masklin = find( candLM_label == ilabel ); [candgrp_IdxI, candgrp_IdxJ] = ... ind2sub( size(candLM_label) , candgrp_masklin ); % Indices (to 'accum') of the pixels in the group candgrp_IdxI = candgrp_IdxI + ( aoi(1) - 1 ); candgrp_IdxJ = candgrp_IdxJ + ( aoi(3) - 1 ); candgrp_idx2acm = ... sub2ind( size(accum) , candgrp_IdxI , candgrp_IdxJ ); % Minimum number of qulified pixels in the group if sum(accumaoi_LBMask(candgrp_masklin)) < prm_fltrLM_npix, continue; end % Compute the centroid position candgrp_acmsum = sum( accum(candgrp_idx2acm) );


cc_x = sum( candgrp_IdxJ .* accum(candgrp_idx2acm) ) / ... candgrp_acmsum; cc_y = sum( candgrp_IdxI .* accum(candgrp_idx2acm) ) / ... candgrp_acmsum; circen = [circen; cc_x, cc_y]; end end % **** Debug code (begin) if dbg_on, figure(dbg_bfigno); imagesc(dbg_LMmask); axis image; title('Generated map of local maxima'); if size(accumAOI, 1) == 1, figure(dbg_bfigno+1); surf(candLM, 'EdgeColor', 'none'); axis ij; title('Accumulation array after local maximum filtering'); end end % **** Debug code (end) % Estimation of the Radii of Circles % Stop if no need to estimate the radii of circles if ~func_compu_radii, varargout{1} = circen; return; end % Parameters for the estimation of the radii of circles fltr4SgnCv = [2 1 1]; fltr4SgnCv = fltr4SgnCv / sum(fltr4SgnCv); % Find circle's radius using its signature curve cirrad = zeros( size(circen,1), 1 ); for k = 1 : size(circen,1), % Neighborhood region of the circle for building the sgn. curve circen_round = round( circen(k,:) ); SCvR_I0 = circen_round(2) - prm_r_range(2) - 1; if SCvR_I0 < 1, SCvR_I0 = 1; end


SCvR_I1 = circen_round(2) + prm_r_range(2) + 1; if SCvR_I1 > size(grdx,1), SCvR_I1 = size(grdx,1); end SCvR_J0 = circen_round(1) - prm_r_range(2) - 1; if SCvR_J0 < 1, SCvR_J0 = 1; end SCvR_J1 = circen_round(1) + prm_r_range(2) + 1; if SCvR_J1 > size(grdx,2), SCvR_J1 = size(grdx,2); end % Build the sgn. curve SgnCvMat_dx = repmat( (SCvR_J0:SCvR_J1) - circen(k,1) , ... [SCvR_I1 - SCvR_I0 + 1 , 1] ); SgnCvMat_dy = repmat( (SCvR_I0:SCvR_I1)' - circen(k,2) , ... [1 , SCvR_J1 - SCvR_J0 + 1] ); SgnCvMat_r = sqrt( SgnCvMat_dx .^2 + SgnCvMat_dy .^2 ); SgnCvMat_rp1 = round(SgnCvMat_r) + 1; f4SgnCv = abs( ... double(grdx(SCvR_I0:SCvR_I1, SCvR_J0:SCvR_J1)) .* SgnCvMat_dx + ... double(grdy(SCvR_I0:SCvR_I1, SCvR_J0:SCvR_J1)) .* SgnCvMat_dy ... ) ./ SgnCvMat_r; SgnCv = accumarray( SgnCvMat_rp1(:) , f4SgnCv(:) ); SgnCv_Cnt = accumarray( SgnCvMat_rp1(:) , ones(numel(f4SgnCv),1) ); SgnCv_Cnt = SgnCv_Cnt + (SgnCv_Cnt == 0); SgnCv = SgnCv ./ SgnCv_Cnt; % Suppress the undesired entries in the sgn. curve % -- Radii that correspond to short arcs SgnCv = SgnCv .* ( SgnCv_Cnt >= (pi/4 * [0:(numel(SgnCv_Cnt)-1)]') ); % -- Radii that are out of the given range SgnCv( 1 : (round(prm_r_range(1))+1) ) = 0; SgnCv( (round(prm_r_range(2))+1) : end ) = 0; % Get rid of the zero radius entry in the array SgnCv = SgnCv(2:end); % Smooth the sgn. curve


SgnCv = filtfilt( fltr4SgnCv , [1] , SgnCv ); % Get the maximum value in the sgn. curve SgnCv_max = max(SgnCv); if SgnCv_max <= 0, cirrad(k) = 0; continue; end % Find the local maxima in sgn. curve by 1st order derivatives % -- Mark the ascending edges in the sgn. curve as 1s and % -- descending edges as 0s SgnCv_AscEdg = ( SgnCv(2:end) - SgnCv(1:(end-1)) ) > 0; % -- Mark the transition (ascending to descending) regions SgnCv_LMmask = [ 0; 0; SgnCv_AscEdg(1:(end-2)) ] & (~SgnCv_AscEdg); SgnCv_LMmask = SgnCv_LMmask & [ SgnCv_LMmask(2:end) ; 0 ]; % Incorporate the minimum value requirement SgnCv_LMmask = SgnCv_LMmask & ... ( SgnCv(1:(end-1)) >= (prm_multirad * SgnCv_max) ); % Get the positions of the peaks SgnCv_LMPos = sort( find(SgnCv_LMmask) ); % Save the detected radii if isempty(SgnCv_LMPos), cirrad(k) = 0; else cirrad(k) = SgnCv_LMPos(end); for i_radii = (length(SgnCv_LMPos) - 1) : -1 : 1, circen = [ circen; circen(k,:) ]; cirrad = [ cirrad; SgnCv_LMPos(i_radii) ]; end end end % Output varargout{1} = circen; varargout{2} = cirrad; if nargout > 3, varargout{3} = dbg_LMmask; end
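For reference, a minimal sketch of how the two routines above chain together (the search radii, gradient threshold, and the cropped frame img2 are illustrative values, not settings from a specific experiment):

% [accum, circen, cirrad] = CircularHough_Grd(img2, [8 16], 2, 20, 0.9);
% [dist, beads] = GetBeads(accum, circen, cirrad);
% dist_um = dist * 0.0577;   % bead-center separation in microns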


A.7 Force calculation code

% This function takes the output from the movie analysis and calculates the
% force for each frame based on the fluctuation-dissipation theorem. Input
% "data" is the MATLAB structure from the beadTracker, and input "window"
% is the number of frames to study for calculation of the force. Typical
% input for window would be between 150-300 frames (10 to 20 seconds).
function out = dynforce(data,window,smopt,loop)
% establish KBT constant value
KB=1.3806503e-23;
T=297;
KBT=KB*T;
a.dpp=0.0577; % distance per pixel ~ pixel size in microns
              % revised for new camera, 6/8/15
a.fname=data.fname;
a.window=window;
% Smooth DNA extension
data.dists=specsmooth(data.dists,30,1);
% Isolate x-values for magnetic bead
a.b2x=data.bead2(:,1);
% calculate deviation for each frame over
% the window to which it belongs
for j=1:length(a.b2x),
    % Establish start and stop points for each window
    start=round(j-(window/2));
    if start <= 0, start=1; end;
    stop=round(j+(window/2));
    if stop > length(a.b2x), stop=length(a.b2x); end;
    % Calculate average value for x-position over window
    avgxv=mean(a.b2x(start:stop));
    % Calculate deviation through window
    dx=(a.b2x(start:stop))-avgxv;
    % Calculate square of deviations
    dx2=(a.dpp*(1E-06)*dx).^2;
    % Calculate fluctuation
    a.flx(j)=(mean(dx2));
    % Calculate individual frame deviation
    a.dx(j)=(a.b2x(j)-avgxv)*((a.dpp)*1E-06);
    a.dx2(j)=a.dx(j).^2;
end;
a.flx=a.flx';
a.dx=(a.dx)';
a.dx2=(a.dx2)';
% Convert dists to metric units; establish normalized extension
a.dists=(data.dists)*((a.dpp)*1E-06);
a.ext=a.dists/(16.4E-06);
a.force=(1E12)*(KBT*a.dists)./(2*(a.flx));
% Smoothing options
if smopt==1 || smopt==12 || smopt==13 || smopt==123,
    a.ssflx=specsmooth(a.flx,window,loop);
    a.ssfforce=(1E12)*(KBT*a.dists)./(2*(a.ssflx));
    a.ssforce=specsmooth(a.force,window,loop);
end
if smopt==2 || smopt==12 || smopt==23 || smopt==123,
    [a.gsflx,~]=gaussmooth(a.flx,round(window),loop);
    a.gsfforce=(1E12)*(KBT*a.dists)./(2*(a.gsflx));
    a.gsforce=gaussmooth(a.force,round(window),loop);
end
if smopt==3 || smopt==13 || smopt==23 || smopt==123,
    [~,a.lsflx]=lsmooth(a.flx);
    a.lsflx=specsmooth(a.lsflx,round(window),loop);
    a.lsfforce=(1E12)*(KBT*a.dists)./(2*(a.lsflx));
    [~,a.lsforce]=lsmooth(a.force);
    a.lsforce=specsmooth(a.lsforce,round(window),loop);
end
out = a;
end
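A minimal usage sketch for dynforce (trackerData is a placeholder for the structure saved by the bead-tracking routine of A.6; the 200-frame window and smoothing options are illustrative, within the 150-300 frame range suggested above):

% res = dynforce(trackerData, 200, 1, 3);
% plot(res.ext, res.force, 'b.');   % force [pN] vs. normalized extension
% The a.force line above evaluates F = kB*T*x/(2*<dx^2>), the equipartition
% (transverse-fluctuation) force estimate, with x the tether extension and
% <dx^2> the windowed variance of the magnetic bead's x-position.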


function out=specsmooth(data,n,loop) n=floor(n/2); l=length(data); for k=1:loop, if mod(k,10)==0, k end newdata=zeros(l,1); tempdata=zeros(l+2*n,1); tempdata(1:n)=data(1); tempdata((n+1):(l+n))=data; tempdata((l+n+1):length(tempdata))=data(l); for i=(n+1):(l+n), avg=mean(tempdata((i-n):(i+n))); med=median(tempdata((i-n):(i+n))); newdata(i-n)=(avg+med)/2; end data=newdata; clear tempdata newdata end out=data; end function [newdata,h]=gaussmooth(data,window,loop) if mod(window,2)==0, window=window+1; end sigma=1; l=length(data); newdata=zeros(l,1); h=fspecial('gaussian',[window 1],sigma); while h(1)==0, sigma=sigma+1; h=fspecial('gaussian',[window 1],sigma); end


for k=1:loop, tempdata=zeros(l+window-1,1); tempdata(1:floor(window/2))=data(1); tempdata(ceil(window/2):(l+floor(window/2)))=data(:); tempdata((l+ceil(window/2)):length(tempdata))=data(l); for i=ceil(window/2):(l+floor(window/2)), newdata(i-floor(window/2))=sum(h'*tempdata(i- floor(window/2):i+floor(window/2))); end data=newdata; if mod(k,50)==0, k end end end function [iter,out]=lsmooth(data) iter=0; temp=data; newtemp=data; temparray=ones(length(data),1); %cmap=colormap(lines(20)); close all %figure(3), hold on, plot(temp,'Color',cmap(1,:),'Marker','.'); while ~isempty(newtemp), iter=iter+1; [newtemp,~,~]=leia(temp); if ~isempty(newtemp), if iter==1, temparray(:,1)=newtemp; else temparray=horzcat(temparray,newtemp); end temp=newtemp; elseif isempty(newtemp), iter=iter-1;


end end %sta=size(temparray) temp1array=temparray(:,(3:(iter-3))); h=fspecial('gaussian',[2*size(temp1array,2) 1],(size(temp1array,2))/2); h=2*h; h=h(1:size(temp1array,2)); %st1a=size(temp1array) out=zeros(size(temp1array,1),1); for i=1:size(temp1array,1), out(i)=sum(h'.*temp1array(i,:)); end clear temp1array temparray temp newtemp if size(out,2)~=1, out=out'; end end function [xmax,imax,xmin,imin]=extrema(x) % This function analyzes a signal to determine local extrema (value and % location) xmax = []; imax = []; xmin = []; imin = []; Nt = numel(x); if Nt ~= length(x) error('Entry must be a vector.') end inan = find(isnan(x)); indx = 1:Nt; if ~isempty(inan) indx(inan) = []; x(inan) = []; Nt = length(x); end


% Difference between subsequent elements: dx = diff(x); if ~any(dx) return end % Flat peaks are associated with the middle of that section a = find(dx~=0); lm = find(diff(a)~=1) + 1; d = a(lm) - a(lm-1); a(lm) = a(lm) - floor(d/2); a(end+1) = Nt; % Determine other peaks xa = x(a); b = (diff(xa) > 0); xb = diff(b); imax = find(xb == -1) + 1; % maxima indexes imin = find(xb == +1) + 1; % minima indexes imax = a(imax); imin = a(imin); nmaxi = length(imax); nmini = length(imin); % Analyze the boundaries of the signal if (nmaxi==0) && (nmini==0) if x(1) > x(Nt) xmax = x(1); imax = indx(1); xmin = x(Nt); imin = indx(Nt); elseif x(1) < x(Nt) xmax = x(Nt); imax = indx(Nt); xmin = x(1); imin = indx(1); end return end


% Maximum or minumim at the ends if (nmaxi==0) imax(1:2) = [1 Nt]; elseif (nmini==0) imin(1:2) = [1 Nt]; else if imax(1) < imin(1) imin(2:nmini+1) = imin; imin(1) = 1; else imax(2:nmaxi+1) = imax; imax(1) = 1; end if imax(end) > imin(end) imin(end+1) = Nt; else imax(end+1) = Nt; end end xmax = x(imax); xmin = x(imin); % Clean up NaNs if ~isempty(inan) imax = indx(imax); imin = indx(imin); end imax = reshape(imax,size(xmax)); imin = reshape(imin,size(xmin)); % Sort results [~,inmax] = sort(-xmax); xmax = xmax(inmax); imax = imax(inmax); [xmin,inmin] = sort(xmin); imin = imin(inmin); end


function [out,nmax,nmin]=leia(data)
% This function performs Local Extrema Interpolation Averaging
[xmax,imax,xmin,imin]=extrema(data);
[imax,maxidx]=sort(imax);
xmax=xmax(maxidx);
[imin,minidx]=sort(imin);
xmin=xmin(minidx);
nmax=length(imax);
nmin=length(imin);
if nmax>=3 && nmin>=3,
    maxfunc=interp1(imax,xmax,(1:length(data)),'cubic',NaN);
    minfunc=interp1(imin,xmin,(1:length(data)),'cubic',NaN);
    for i=1:length(data),
        if isnan(maxfunc(i)), maxfunc(i)=data(i); end
        if isnan(minfunc(i)), minfunc(i)=data(i); end
    end
    out=(0.5)*(maxfunc+minfunc);
else
    out=[];
end
if size(out,2)~=1, out=out'; end
end


A.8 Step-generator Simulation code

% This program generates synthetic step data. The inputs are: % size: the number of steps, a positive integer scalar % freq: how much time between successive steps in points. Scalar inputs % result in constant frequency of that value. A two value input of [a,b] % will automatically choose random step frequencies within that range % where a is the minimum and b is the maximum. % vert: vertical distance between successive steps. Inputs options % are the same as with the freq input; scalar for constant, or range % for random. % horz: length of each step plateau, aka duration, in points. Inputs % options are the same as with freq input; scalar for constant, or % range for random. % noise: to determine type of noise, inputs of "1" for clean, "2" for % uniform distribution, "3" for normal distribution. % sigma: to specify width (standard deviation) of noise. % % Output consists of the base vector which is the signal with no noise, and % the data vector which has the noise added. function [base,data,steps] = newStepGen(size,freq,vert,horz,noise,sigma,plotopt) x=2; y(1)=0; z=1; while z<=size, if ~isscalar(freq), lowt=freq(1,1); hight=freq(1,2); StSp=lowt+(hight-lowt).*rand(1,1); else StSp=freq; end if ~isscalar(horz), lowh=horz(1,1); highh=horz(1,2);


StLe=lowh+(highh-lowh).*rand(1,1);
else
    StLe=horz;
end
if ~isscalar(vert),
    lowv=vert(1,1);
    highv=vert(1,2);
    StVe=lowv+(highv-lowv).*rand(1,1);
else
    StVe=vert;
end
if StSp~=0,
    slope=StVe/StSp;
elseif StSp==0,
    slope=1e4;
end
a=0;
b=0;
while b<(StLe),
    y(x)=y(x-1);
    if b==0,
        steps(z)=y(x);
        z=z+1;
    end
    b=b+1;
    x=x+1;
end
if z==size, break, end;
if StSp~=0,
    % ramp toward the next plateau, one slope increment per point
    while a<(StSp),
        y(x)=y(x-1)+slope;
        a=a+1;
        x=x+1;
    end
end
end


base=y'; if noise==1, data=base; end if noise==2, data=base+uniformnoise(sigma,length(base)); end if noise==3, data=base+gaussnoise(sigma,length(base)); end if noise==4, data=base+poissnoise(sigma,length(base)); end if plotopt==1, figure(1), plot(data,'b.'), hold on, plot(base,'r-'); end end function r=gaussnoise(sigma,size) r=sigma.*randn(size,1); end function r=uniformnoise(sigma,size) r=(sigma.*rand(size,1))-(0.5*sigma); end function r=poissnoise(sigma,size) r=poissrnd(sigma,size,1)-sigma; end
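A minimal usage sketch for the step generator (all numbers are illustrative):

% [base,data,steps] = newStepGen(20, [50 100], [5 20], [100 300], 3, 2, 1);
% 20 steps, 50-100 points between steps, step heights of 5-20, plateaus of
% 100-300 points, Gaussian noise with sigma = 2, plotted over the clean base.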


A.9 Step-finding Algorithm code

% This function implements the algorithm to detect step-like features in % generalized data sets. The only necessary input is the data signal that % is to be analyzed. If there is a "base" data set from the step generator % simulation, that can optionally be plotted against the found steps. function out = getstep2(data,base,plotopt) % Find maximum and minimum of data dmin=min(data); dmax=max(data); % Establish PDF via finite difference CDF cumudist=cdf(data,dmin,dmax); fdcdf=findif(cumudist); if plotopt==1, figure(9), plot(fdcdf(:,1),fdcdf(:,2),'r-'), hold on, end; if plotopt==1, figure(9), plot(fdcdf(:,1),del2(fdcdf(:,2)),'g-'), hold on, end; % Sum PDF and 2nd derivative of PDF fdcdf(:,2)=(fdcdf(:,2)+del2(fdcdf(:,2))); if plotopt==1, figure(9), plot(fdcdf(:,1),fdcdf(:,2),'m-'), hold on, end; % Square of PDF+PDF'' fdcdf(:,2)=fdcdf(:,2).^2; if plotopt==1, figure(9), plot(fdcdf(:,1),fdcdf(:,2),'b-'), hold on, end; sigsum(:,1)=cumudist(:,1); % Perform Local Extrema Interpolation Averaging to smooth high frequency % noise sigsum(:,2)=leia(fdcdf); if plotopt==1, figure(9), plot(sigsum(:,1),sigsum(:,2),'k-'), end; % Find peaks in resulting signal steps1=findpeaks(sigsum); if ~isempty(steps1), steps1=data(dsearchn(data,steps1')); steps1=sort(steps1); % Plot data signal with detected steps if plotopt==1, figure(3); hold on;


xmax=length(data(:,1));
for i=1:length(steps1),
    if steps1(i)~=0,
        b=steps1(i);
        plot([0 xmax],[b,b],'Color','k');
    end;
end;
plot(data(:,1),'b-');
plot(base,'r-');
hold off;
end
end
out.data=data;
out.steps1=steps1;
out.cdf=cumudist;
out.fdcdf=fdcdf;
out.sigsum=sigsum;
end

function out = cdf(data,dmin,dmax)
% This function calculates the cumulative distribution function
temp=diff(data);
avgtemp=mean(abs(temp));
stdtemp=std(abs(temp));
ind=avgtemp/stdtemp;
val=dmin-rand(1);
k=1;
while val<=dmax+ind,
    out(k,1)=val;
    out(k,2)=sum(data<=val); % count of points at or below val
    val=val+ind;
    k=k+1;
end
end

function out = findif(data)
% This function performs a finite difference calculation
out=zeros(length(data),2);
out(:,1)=data(:,1);
for i=1:length(data)-1,
    dx=data(i+1,1)-data(i,1);
    dy=data(i+1,2)-data(i,2);
    out(i,2)=dy/dx;
end
end


d = a(lm) - a(lm-1); a(lm) = a(lm) - floor(d/2); a(end+1) = Nt; % Determine other peaks xa = x(a); b = (diff(xa) > 0); xb = diff(b); imax = find(xb == -1) + 1; % maxima indexes imin = find(xb == +1) + 1; % minima indexes imax = a(imax); imin = a(imin); nmaxi = length(imax); nmini = length(imin); % Analyze the boundaries of the signal if (nmaxi==0) && (nmini==0) if x(1) > x(Nt) xmax = x(1); imax = indx(1); xmin = x(Nt); imin = indx(Nt); elseif x(1) < x(Nt) xmax = x(Nt); imax = indx(Nt); xmin = x(1); imin = indx(1); end return end % Maximum or minumim at the ends if (nmaxi==0) imax(1:2) = [1 Nt]; elseif (nmini==0) imin(1:2) = [1 Nt]; else if imax(1) < imin(1) imin(2:nmini+1) = imin;


imin(1) = 1; else imax(2:nmaxi+1) = imax; imax(1) = 1; end if imax(end) > imin(end) imin(end+1) = Nt; else imax(end+1) = Nt; end end xmax = x(imax); xmin = x(imin); % Clean up NaNs if ~isempty(inan) imax = indx(imax); imin = indx(imin); end imax = reshape(imax,size(xmax)); imin = reshape(imin,size(xmin)); % Sort results [~,inmax] = sort(-xmax); xmax = xmax(inmax); imax = imax(inmax); [xmin,inmin] = sort(xmin); imin = imin(inmin); end function peeks1 = findpeaks(signal) % This function analyzes the final signal to determine likely peaks that % correspond to significant steps/dwells peeks1=[]; ydata=signal(:,2); dev=std(ydata); [~,imax,~,imin]=extrema(ydata); % Optional diagnostic plotting


% figure(2), plot(signal(:,1),signal(:,2),'r-'), hold on; % figure(2), plot(signal(imax,1),xmax,'kx'), hold on; % figure(2), plot(signal(imin,1),xmin,'go'), hold on; imin=sort(imin); imax=sort(imax); diffarr=zeros(length(imax),3); pathint=zeros(length(imax),4); for i=1:length(imax), % Find boundaries of peaks -- minimum on each side indleft=find(iminimax(i),1,'first'); % Find elements from each minimum to the maximum lpath=signal(imin(indleft):imax(i),:); rpath=signal(imax(i):imin(indrite),:); % Calculate arc length pathint(i,1)=signal(imax(i),1); pathint(i,2)=pathcalc(lpath); pathint(i,3)=pathcalc(rpath); pathint(i,4)=mean(pathint(i,2:3)); % Peaks at boundaries may have only one minimum left=ydata(imin(indleft)); if isempty(left), left=0; end rite=ydata(imin(indrite)); if isempty(rite), rite=0; end % Collect arc lengths for each peak diffarr(i,1)=signal(imax(i),1); diffarr(i,2)=abs(ydata(imax(i))-left); diffarr(i,3)=abs(ydata(imax(i))-rite); end % Optional diagnostic plotting % figure(2), plot(pathint(:,1),pathint(:,2),'-bd'), hold on; % figure(2), plot(pathint(:,1),pathint(:,3),'-md'), hold on;


k=1;
for i=1:length(imax),
    if diffarr(i,2)>dev,
        j=0;
        if diffarr(i,3)>dev,
            peeks1(k)=diffarr(i,1);
            k=k+1;
        end
        if diffarr(i,3)<=dev,
            % scan forward while the right-hand drop stays within the
            % noise level, so that shoulder maxima merge into one peak
            while diffarr(i+j,3)<=dev && (i+j)<length(imax),
                j=j+1;
            end
            if diffarr(i+j,3)>dev,
                peeks1(k)=diffarr(i,1);
                k=k+1;
            end
        end
    end
end
end

function out=leia(data)
% This function performs Local Extrema Interpolation Averaging
[xmax,imax,xmin,imin]=extrema(data(:,2));
maxfunc=interp1(data(imax),xmax,data(:,1),'cubic');
minfunc=interp1(data(imin),xmin,data(:,1),'cubic');
for i=1:length(maxfunc),
    if isnan(maxfunc(i)),
        k=dsearchn(maxfunc,maxfunc(i));
        maxfunc(i)=maxfunc(k);
    end
    if isnan(minfunc(i)),
        k=dsearchn(minfunc,minfunc(i));
        minfunc(i)=minfunc(k);
    end
end
out=(0.5)*(maxfunc+minfunc);
% Optional diagnostic plotting
% figure(4), plot(data(:,1),data(:,2),'r-'), hold on;
% figure(4), plot(data(:,1),maxfunc,'b-'), hold on;
% figure(4), plot(data(:,1),minfunc,'g-'), hold on;
% figure(4), plot(data(:,1),out,'k-');
end
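A minimal sketch chaining the simulator of A.8 to this detector (inputs are illustrative):

% [base,data] = newStepGen(10, [50 100], [5 20], [100 300], 3, 1, 0);
% res = getstep2(data, base, 1);
% res.steps1   % extension levels detected in the noisy signal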


A.10 Object Tracking Algorithm code

% The function ptrack sorts and organizes a list of coordinates collected % from indistinguishable particles moving in a unidirectional force field. % The force field should be oriented such that the direction of the force % is in the direction of decreasing y. % % Input poslist consists of three columns (x, y, t), where each row is an % observation of a particle position at a specific time. The input poslist % does not have to be in any particular order. % % The output from the algorithm is a list of each particle observation % assigned an ID. function out = ptrack(poslist) poslist=sortrows(poslist,3); ptcllist=zeros(length(poslist),5); tmax=max(poslist(:,3)); tmin=min(poslist(:,3)); ptclinit=poslist(find(poslist(:,3)==tmin),:); initsize=size(ptclinit); ptcllist(1:initsize(1),1:3)=ptclinit(:,:); ptcllist(1:initsize(1),4)=(1:initsize); yBias=forceBias(poslist); for i=tmin+1:tmax, nextline=1+length(find(ptcllist(:,1))); elemA=find(ptcllist(:,3)==(i-1)); ptclA=ptcllist(elemA,:); elemB=find(poslist(:,3)==i); ptclB=poslist(elemB,:); numA=size(ptclA,1); numB=size(ptclB,1); ptclcount=max(ptcllist(:,4)); ptclB=horzcat(ptclB,zeros(numB,2)); ptclB=sortrows(ptclB,2); PtclArr=zeros(numA,numB);


for j=1:numA, for k=1:numB, PtclArr(j,k)=surfcalc(ptclA(j,1),ptclA(j,2),ptclB(k,1),ptclB(k,2),yBias); end end

% Rows all negative correspond to particle exit
oldPtcl=find(sum(PtclArr<=0,2)==numB);
if ~isempty(oldPtcl),
    for z=1:length(oldPtcl),
        ptcllist(elemA(oldPtcl(z)),5)=3;
    end
end
% Version 2: ascend particles based on Y coordinate, select smallest
% remaining positive value in PtclArr for new assignment. This shall
% minimize the total Y displacement across all particles.
posPA=PtclArr>0;
tempPA=PtclArr.*posPA;
idxtrk=zeros(numA,1);
for n=1:numB,
    if sum(tempPA(:,n))==0,
        ptclcount=ptclcount+1;
        ptclB(n,4)=ptclcount;
        ptclB(n,5)=2;
    else
        nneg=sum(tempPA(:,n)==0);
        [~,ixtemp]=sort(tempPA(:,n));
        idx=0;
        q=1;
        while idx==0,
            if q<=(numA-nneg),
                ixrow=ixtemp(nneg+q);
                idxtemp=ptclA(ixrow,4);
                if idxtrk(ixrow)~=0,
                    q=q+1;


else idx=idxtemp; idxtrk(ixrow)=1; flag=1; end end if q>(numA-nneg), ptclcount=ptclcount+1; idx=ptclcount; flag=0; end end ptclB(n,4)=idx; ptclB(n,5)=flag; end end ptcllist(nextline:nextline+numB-1,:)=ptclB(:,:); end ptcllist=sortrows(ptcllist,4); out.res=ptcllist; % Backtrack method out.btres=backtrack(ptcllist); end function out = backtrack(ptcllist) % Backtrack method ptcllist=sortrows(ptcllist,3); tmax=max(ptcllist(:,3)); tmin=min(ptcllist(:,3)); prevPtclMat=cell(1); p=tmin matNum=1; while p<=tmax-2, obs0=find(ptcllist(:,3)==p); obs1=find(ptcllist(:,3)==(p+1)); obs2=find(ptcllist(:,3)==(p+2));


ptcl0=ptcllist(obs0,:); endPtcl=find(ptcl0(:,5)~=3); ptcl0=ptcl0(endPtcl,:); ptcID=unique(vertcat(ptcllist(obs1,4),ptcllist(obs2,4))); ptcls=[]; for q=1:length(ptcID), rowIDtemp=unique(find(ptcllist(min(obs1):size(ptcllist,1),4)==ptcID(q),2,'first')); if length(rowIDtemp)==2, ptcls=vertcat(ptcls,ptcllist(rowIDtemp(1)+max(obs0),:)); ptcls=vertcat(ptcls,ptcllist(rowIDtemp(2)+max(obs0),:)); elseif length(rowIDtemp)==1, ptcls=vertcat(ptcls,ptcllist(rowIDtemp+max(obs0),:)); exitX=ptcls(size(ptcls,1),1); exitY=0; exitT=ptcls(size(ptcls,1),3)+1; exitID=ptcls(size(ptcls,1),4); exitFlag=4; ptcls=vertcat(ptcls,horzcat(exitX,exitY,exitT,exitID,exitFlag)); end end pNext=1; if ~isempty(ptcl0) && ~isempty(ptcls), darr=abs(dcalc(ptcl0,ptcls)); for z=1:size(darr,3), darr(:,:,z)=sortrows(darr(:,:,z),1); end refPtclMat(:,:,1)=horzcat(ptcl0(:,3),ptcl0(:,4)); refPtclMat(:,:,2)=horzcat(ptcl0(:,3)+1,ptcl0(:,4)); refPtclMat(:,:,3)=horzcat(ptcl0(:,3)+2,ptcl0(:,4)); [refRows,refDARRsum]=getDARRsum(darr,refPtclMat); if sum(refRows)~=length(refRows), np=1; while np<=length(refRows) && pNext==1, if refRows(np)~=1, npdr=1;


while npdr


reID2new=find(ptcllist(:,4)==pNew(2)); reTIME2new=find(ptcllist(:,3)>=tNew(2)); reassign2new=intersect(reID2new,reTIME2new); reID1old=find(ptcllist(:,4)==pOld(1)); reTIME1old=find(ptcllist(:,3)==tNew(1)); reassign1old=intersect(reID1old,reTIME1old); reID2old=find(ptcllist(:,4)==pOld(2)); reTIME2old=find(ptcllist(:,3)>=tNew(2)); reassign2old=intersect(reID2old,reTIME2old); ptcllist(reassign1new,4)=pt1old; ptcllist(reassign2new,4)=pt2old; ptcllist(reassign1old,4)=pt1temp; ptcllist(reassign2old,4)=pt2temp; pNext=0; end end end end npdr=npdr+1; end end np=np+1; end end end if pNext==1, p=p+1 matNum=1; clear prevPtclMat prevPtclMat=cell(1); clear reassign1new reassign2new reassign1old reassign2old end clear ptcl0 ptcl1 ptcl2 ptcls darr refPtclMat tempPtclMat clear refDARRsum tempDARRsum end


% Compact Particle IDs idlist=unique(ptcllist(:,4)); for t=1:length(idlist), if t~=idlist(t), reassign=find(ptcllist(:,4)==idlist(t)); ptcllist(reassign,4)=t; end end out=sortrows(ptcllist,4); end function outPtclMat=swapPtclAssign(inPtclMat,t1,p1,t2,p2) savePtclMat=inPtclMat; for n=1:length(t1), tt1idx=find(inPtclMat(:,1,:)==t1(n)); pt1idx=find(inPtclMat(:,2,:)==p1(n)); tt2idx=find(inPtclMat(:,1,:)==t2(n)); pt2idx=find(inPtclMat(:,2,:)==p2(n)); pt1row=intersect(tt1idx,pt1idx); pt2row=intersect(tt2idx,pt2idx); if ~isempty(pt1row) && ~isempty(pt2row), v1=ceil(pt1row/size(inPtclMat,1)); r1=mod(pt1row,size(inPtclMat,1)); if r1==0, r1=size(inPtclMat,1); end v2=ceil(pt2row/size(inPtclMat,1)); r2=mod(pt2row,size(inPtclMat,1)); if r2==0, r2=size(inPtclMat,1); end inPtclMat(r1,:,v1)=[t2(n),p2(n)]; inPtclMat(r2,:,v2)=[t1(n),p1(n)]; elseif isempty(pt1row) && ~isempty(pt2row), v2=ceil(pt2row/size(inPtclMat,1)); r2=mod(pt2row,size(inPtclMat,1));


if r2==0, r2=size(inPtclMat,1); end inPtclMat(r2,:,v2)=[t1(n),p1(n)]; elseif isempty(pt2row) && ~isempty(pt1row),

v1=ceil(pt1row/size(inPtclMat,1)); r1=mod(pt1row,size(inPtclMat,1)); if r1==0, r1=size(inPtclMat,1); end inPtclMat(r1,:,v1)=[t2(n),p2(n)]; end end tCount=0; for n=1:size(inPtclMat,1), if inPtclMat(n,1,2)>inPtclMat(n,1,1) && inPtclMat(n,1,3)>inPtclMat(n,1,2), tCount=tCount+1; end end if tCount==size(inPtclMat,1), outPtclMat=inPtclMat; else outPtclMat=savePtclMat; end end function [outrows,outsum]=getDARRsum(darr,ptclmat) outrows=0; outsum=0; m=1; for i=1:size(ptclmat,1), for j=1:size(darr,3), for k=1:size(darr,1), if darr(k,3,j)==ptclmat(i,2,1) && darr(k,6,j)==ptclmat(i,2,2) && darr(k,9,j)==ptclmat(i,2,3), if darr(k,2,j)==ptclmat(i,1,1) && darr(k,5,j)==ptclmat(i,1,2) && darr(k,8,j)==ptclmat(i,1,3),


outsum=outsum+darr(k,1,j); outrows(m)=k; m=m+1; end end end end end end function dmat = dcalc(ref0,ptclist) % This function calculates the linedist function for a reference particle % and a set of particles from two different frames. ref is the particle in % question at time t; ptclist1 is a list of particles at time t-1, and % ptclist2 is a list of particles at time t-2. dmat=[]; for h=1:size(ref0,1), k=1; for i=1:size(ptclist,1), for j=1:size(ptclist,1), if ptclist(j,3)>ptclist(i,3), if ptclist(j,5)~=2 && ptclist(i,5)~=2, dmat(k,1,h)=linedist(ptclist(i,1),ptclist(i,2),ref0(h,1),ref0(h,2),ptclist(j,1),ptclist(j,2)); dmat(k,2:4,h)=ref0(h,3:5); dmat(k,5:7,h)=ptclist(i,3:5); dmat(k,8:10,h)=ptclist(j,3:5); k=k+1; end end end end end if isempty(dmat), dmat=zeros(1,10); end


end function d=linedist(x1,y1,x2,y2,x0,y0) % This function calculates a weighted value relating a point existing at a % time between two other points, one earlier in time and one later in time. % VerC: difference of distances from points d02=sqrt((x0-x2)^2+(y0-y2)^2); d01=sqrt((x1-x0)^2+(y1-y0)^2); d12=sqrt((x1-x2)^2+(y1-y2)^2); dC=((d01+d12)-d02); d=dC; if d>999, d=999; end end function out = forceBias(data) %This function calculates the bias to use for the tracker weighting fxn for i=1:max(data(:,3)), yCoM(i)=mean(data(data(:,3)==i,2)); end out=min(abs(diff(yCoM))); end function out=surfcalc(xp,yp,xq,yq,yBias) % This function calculates the weighted values for all points in the frame % (xmat,ymat) based upon the location of the input point (xp,yp) %Version 2: utilize rows/columns all neg out=-yBias+((abs(xp-xq).^2+(yp-yq).^2).^(0.5))./abs(cos((atan2(xp-xq,yp-yq))))*(sign(yp- yq)); end
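A minimal usage sketch (poslist is any N-by-3 array of (x, y, t) observations with integer frame times, as described in the header comment of ptrack):

% res = ptrack(poslist);
% ids = res.btres(:,4);   % particle IDs after the backtracking pass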


A.11 Trajectory Simulation code

function [info,output] = newTrackerSim(nPtcl,xdim,ydim,rate,xi,yi,vxi,vyi,ayi,xnoise,ynoise,anim)
% nPtcl: the number of particles to simulate
% xdim: the x-dimension for the simulation frame
% ydim: the y-dimension for the simulation frame
% rate: the time between spawns
% xi, yi are spawn coordinates
% vxi, vyi are initial velocities
% ayi is y-acceleration
% xnoise, ynoise: number defines variance that adds normally-distributed
% noise to final particle positions. X and Y can be operated on separately.
% anim is 0 for no animation, 1 to view animation
% For rate, xi, yi, vxi, vyi, and ayi, the input can be scalar or as: [min max].
% In the case of a scalar, the value will remain constant throughout the
% simulation. When the input is a vector, the values will have a
% random distribution between min and max. These randomly distributed
% values then remain characteristic of the particle throughout the
% simulation.
t0=zeros(nPtcl,1);
x0=zeros(nPtcl,1);
y0=zeros(nPtcl,1);
vx0=zeros(nPtcl,1);
vy0=zeros(nPtcl,1);
ay=zeros(nPtcl,1);
t0(1)=1;
for i=1:nPtcl,
    if i~=1 && ~isscalar(rate),
        a=rate(2)-rate(1);
        b=rate(1);
        t0(i)=t0(i-1)+(a*rand(1,1)+b);
    elseif i~=1 && isscalar(rate),
        t0(i)=t0(i-1)+rate+(0.01).*randn(1,1);


end if ~isscalar(xi), a=xi(2)-xi(1); b=xi(1); x0(i)=a*rand(1,1)+b; if x0(i)<0, x0(i)=0; end if x0(i)>xdim, x0(i)=xdim; end elseif isscalar(xi), x0(i)=xi; end if ~isscalar(yi), a=yi(2)-yi(1); b=yi(1); y0(i)=a*rand(1,1)+b; if y0(i)<0, y0(i)=0; end if y0(i)>ydim, y0(i)=ydim; end elseif isscalar(yi), y0(i)=yi; end if ~isscalar(vxi), a=vxi(2)-vxi(1); b=vxi(1); vx0(i)=a*rand(1,1)+b; elseif isscalar(vxi), vx0(i)=vxi; end if ~isscalar(vyi), a=vyi(2)-vyi(1);


        b=vyi(1);
        vy0(i)=a*rand(1,1)+b;
    elseif isscalar(vyi),
        vy0(i)=vyi;
    end
    if ~isscalar(ayi),
        a=ayi(2)-ayi(1);
        b=ayi(1);
        ay(i)=a*rand(1,1)+b;
    elseif isscalar(ayi),
        ay(i)=ayi;
    end
end
poslist=cell(nPtcl,1);
for j=1:nPtcl,
    if isinteger(t0(j)),
        poslist{j}(1,3)=t0(j);
        poslist{j}(1,2)=y0(j);
        poslist{j}(1,1)=x0(j);
    elseif ~isinteger(t0(j)),
        poslist{j}(1,3)=ceil(t0(j));
        tdiff=ceil(t0(j))-t0(j);
        poslist{j}(1,2)=ycalc(y0(j),tdiff,vy0(j),ay(j));
        poslist{j}(1,1)=xcalc(x0(j),tdiff,vx0(j));
    end
end
linecount=1;
for k=1:nPtcl,
    j=1;
    x=poslist{k}(1,1);
    y=poslist{k}(1,2);
    t=poslist{k}(1,3);
    % propagate the particle until it leaves the simulation frame
    while x<xdim && x>0 && y<ydim && y>0,
        j=j+1;
        t=t+1;
        y=ycalc(y,j,vy0(k),ay(k));
        x=xcalc(x,j,vx0(k));
        poslist{k}(j,1)=x;
        poslist{k}(j,2)=y;
        poslist{k}(j,3)=t;
    end
    nObserv=size(poslist{k},1)-1;
    data(linecount:linecount+nObserv-1,1:3)=poslist{k}(1:nObserv,1:3);
    data(linecount:linecount+nObserv-1,4)=k;
    linecount=linecount+nObserv;
end
info.poslist=poslist;
data(:,2)=abs(ydim-data(:,2));
if sum(xnoise~=0)~=0 || sum(ynoise~=0)~=0,
    numNoise=max(length(xnoise),length(ynoise));
    dataN=zeros(size(data,1),size(data,2),numNoise);
    for nn=1:numNoise,
        dataN(:,:,nn)=data;
    end
    for xn=1:length(xnoise),
        dataN(:,1,xn)=xnoise(xn).*randn(size(dataN,1),1)+dataN(:,1,xn);
    end
    for yn=1:length(ynoise),
        dataN(:,2,yn)=ynoise(yn).*randn(size(dataN,1),1)+dataN(:,2,yn);
    end
    output=dataN;
else
    output=data;
end
if anim==1,
    for j=1:size(data,1),
        figure(1), plot3(data(j,3),data(j,1),data(j,2),'k.'), hold on;
        grid on;
        xlabel('t'); ylabel('x'); zlabel('y');


axis([0 t 0 xdim 0 ydim]); end if sum(xnoise~=0)~=0 || sum(ynoise~=0)~=0, cmap=colormap(hsv(size(output,3))); for i=1:size(output,3), for j=1:size(output,1), figure(1), plot3(output(j,3,i),output(j,1,i),output(j,2,i),'Color',cmap(i,:),'Marker','o'), hold on; grid on; xlabel('t'); ylabel('x'); zlabel('y'); axis([0 t 0 xdim 0 ydim]); end end end end end function out=xcalc(x,t,vx) out=x+t*vx; end function out=ycalc(y,t,vy,ay) out=y+t*vy+(1/2)*(ay)*(t^2); end
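A minimal end-to-end sketch (frame size, spawn rate, kinematics, and noise are all illustrative): 25 particles are spawned near y = 0 of a 200 × 400 frame with positive vy and ay, so that after the simulator's y-flip the output motion is toward decreasing y, as ptrack of A.10 expects.

% [info, obs] = newTrackerSim(25, 200, 400, [2 6], [0 200], 1, 0, 2, 0.2, 0.5, 0.5, 0);
% res = ptrack(obs(:,1:3,1));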


Bibliography

Abbondanzieri, Elio A., William J. Greenleaf, Joshua W. Shaevitz, Robert Landick, and

Steven M. Block. 2005. "Direct Observation of Base-Pair Stepping by RNA

Polymerase." Nature 438 (7067): 460-465. doi:10.1038/nature04268.

Adrian, Ronald J. 1991. "Particle-Imaging Techniques for Experimental Fluid

Mechanics." Annual Review of Fluid Mechanics 23 (1): 261-304.

doi:10.1146/annurev.fl.23.010191.001401.

Allersma, M. W., F. Gittes, M. J. deCastro, R. J. Stewart, and C. F. Schmidt. 1998. "Two-

Dimensional Tracking of Ncd Motility by Back Focal Plane Interferometry."

Biophysical Journal 74 (2 Pt 1): 1074-1085. doi:10.1016/S0006-3495(98)74031-7.

Anderson, C. M., G. N. Georgiou, I. E. G. Morrison, G. V. W. Stevenson, and R. J.

Cherry. 1992. "Tracking of Cell-Surface Receptors by Fluorescence Digital Imaging

Microscopy using a Charge-Coupled Device Camera - Low-Density-Lipoprotein

and Influenza-Virus Mobility at 4-Degrees-C." Journal of Cell Science 101:

425.

Arunajadai, Srikesh G. and Wei Cheng. 2013. "Step Detection in Single-Molecule Real

Time Trajectories Embedded in Correlated Noise." Plos One 8 (3): e59279.

Ashkin, A. 1970. "Acceleration and Trapping of Particles by Radiation Pressure."

Physical Review Letters 24 (4): 156-159. doi:10.1103/PhysRevLett.24.156.

Ashkin, A., J. M. Dziedzic, J. E. Bjorkholm, and S. Chu. 1986. "Observation of a Single-

Beam Gradient Force Optical Trap for Dielectric Particles." Optics Letters 11 (5): 288.

doi:10.1364/OL.11.000288.

Avery, Oswald T., Colin M. MacLeod, and Maclyn McCarty. 1944. "Studies on the

Chemical Nature of the Substance Inducing Transformation of Pneumococcal

Types." The Journal of Experimental Medicine 79 (2): 137-158. doi:10.1084/jem.79.2.137.

Ballard, Dana H. 1981. "Generalizing the Hough Transform to Detect Arbitrary Shapes."

Pattern Recognition 13 (2): 111-122.

Barniv, Y. 1985. "Dynamic Programming Solution for Detecting Dim Moving Targets."

IEEE Transactions on Aerospace and Electronic Systems AES-21 (1): 144-156.

doi:10.1109/TAES.1985.310548.

Bar-Shalom, Yaakov and Edison Tse. 1975. "Tracking in a Cluttered Environment with

Probabilistic Data Association." Automatica 11 (5): 451-460. doi:10.1016/0005-

1098(75)90021-7.


Batchelor, G. K. 2000. An Introduction to Fluid Dynamics. Cambridge: Cambridge

University Press.

Bennink, M. L., S. H. Leuba, G. H. Leno, J. Zlatanova, B. G. de Grooth, and J. Greve.

2001. "Unfolding Individual Nucleosomes by Stretching Single Chromatin Fibers

with Optical Tweezers." Nature Structural Biology 8 (7): 606-610. doi:10.1038/89646.

Betzig, Eric and Robert J. Chichester. 1993. "Single Molecules Observed by Near-Field

Scanning Optical Microscopy." Science 262 (5138): 1422-1425.

doi:10.1126/science.262.5138.1422.

Binnig, G., N. Garcia, and H. Rohrer. 1985. "Conductivity Sensitivity of Inelastic

Scanning Tunneling Microscopy." Physical Review B 32 (2): 1336-1338.

doi:10.1103/PhysRevB.32.1336.

Binnig, G., C. F. Quate, and Ch Gerber. 1986. "Atomic Force Microscope." Physical

Review Letters 56 (9): 930-933. doi:10.1103/PhysRevLett.56.930.

Blair, Daniel and Eric Dufresne. "Matlab Locating and Tracking Code."

http://site.physics.georgetown.edu/matlab/code.html.


Blanding, W. R., P. K. Willett, and Y. Bar-Shalom. 2007. "Offline and Real-Time Methods

for ML-PDA Track Validation." IEEE Transactions on Signal Processing 55 (5): 1994-

2006. doi:10.1109/TSP.2007.893212.

Block, S. M., C. L. Asbury, J. W. Shaevitz, and M. J. Lang. 2003. "Probing the Kinesin

Reaction Cycle with a 2D Optical Force Clamp." Proceedings of the National Academy

of Sciences of the United States of America 100 (5): 2351-2356.

Bobroff, Norman. 1986. "Position Measurement with a Resolution and Noise-Limited

Instrument." Review of Scientific Instruments 57 (6): 1152-1157.

Bockelmann, U., Ph Thomen, B. Essevaz-Roulet, V. Viasnoff, and F. Heslot. 2002.

"Unzipping DNA with Optical Tweezers: High Sequence Sensitivity and Force

Flips." Biophysical Journal 82 (3): 1537-1553. doi:10.1016/S0006-3495(02)75506-9.

Bonneau, S., M. Dahan, and L. D. Cohen. June 2005. "Tracking Single Quantum Dots in

Live Cells with Minimal Paths." doi:10.1109/CVPR.2005.546.

Bosaeus, Niklas, Afaf H. El-Sagheer, Tom Brown, Steven B. Smith, Björn Åkerman,

Carlos Bustamante, and Bengt Nordén. 2012. "Tension Induces a Base-Paired


Overstretched DNA Conformation." Proceedings of the National Academy of Sciences

109 (38): 15179-15184. doi:10.1073/pnas.1213172109.

Bouchiat, C., M. D. Wang, J. -F Allemand, T. Strick, S. M. Block, and V. Croquette. 1999.

"Estimating the Persistence Length of a Worm-Like Chain Molecule from Force-

Extension Measurements." Biophysical Journal 76 (1): 409-413. doi:10.1016/S0006-

3495(99)77207-3.

Brau, Ricardo R., Peter B. Tarsa, Jorge M. Ferrer, Peter Lee, and Matthew J. Lang. 2006.

"Interlaced Optical Force-Fluorescence Measurements for Single Molecule

Biophysics." Biophysical Journal 91 (3): 1069-1077. doi:10.1529/biophysj.106.082602.

Brooks Shera, E., Newton K. Seitzinger, Lloyd M. Davis, Richard A. Keller, and Steven

A. Soper. 1990. "Detection of Single Fluorescent Molecules." Chemical Physics Letters

174 (6): 553-557. doi:10.1016/0009-2614(90)85485-U.

Brower-Toland, B. D., C. L. Smith, R. C. Yeh, J. T. Lis, C. L. Peterson, and M. D. Wang.

2002. "Mechanical Disruption of Individual Nucleosomes Reveals a Reversible

Multistage Release of DNA." Proceedings of the National Academy of Sciences of the

United States of America 99 (4): 1960-1965. doi:10.1073/pnas.022638399.


Bryant, Zev, Florian C. Oberstrass, and Aakash Basu. 2012. "Recent Developments in

Single-Molecule DNA Mechanics." Current Opinion in Structural Biology 22 (3): 304-

312. doi:10.1016/j.sbi.2012.04.007.

Bustamante, C., J. F. Marko, E. D. Siggia, and S. Smith. 1994. "Entropic Elasticity of

Lambda-Phage DNA." Science 265 (5178): 1599-1600.

Bustamante, J. O., A. Liepins, R. A. Prendergast, J. A. Hanover, and H. Oberleithner.

1995. "Patch Clamp and Atomic Force Microscopy Demonstrate TATA-Binding

Protein (TBP) Interactions with the Nuclear Pore Complex." The Journal of Membrane

Biology 146 (3): 263-272. doi:10.1007/BF00233946.

Carrion-Vazquez, Mariano, Andres F. Oberhauser, Thomas E. Fisher, Piotr E.

Marszalek, Hongbin Li, and Julio M. Fernandez. 2000. "Mechanical Design of

Proteins Studied by Single-Molecule Force Spectroscopy and Protein Engineering."

Progress in Biophysics and Molecular Biology 74 (1–2): 63-91. doi:10.1016/S0079-

6107(00)00017-1.


Carter, Ashley R., Gavin M. King, Theresa A. Ulrich, Wayne Halsey, David

Alchenberger, and Thomas T. Perkins. 2007. "Stabilization of an Optical Microscope

to 0.1 nm in Three Dimensions." Applied Optics 46 (3): 421-427.

Carter, N. J. and R. A. Cross. 2005. "Mechanics of the Kinesin Step." Nature 435 (7040):

308-312.

Cerveri, P., A. Pedotti, and G. Ferrigno. 2003. "Robust Recovery of Human Motion from

Video using Kalman Filters and Virtual Humans." Human Movement Science 22 (3):

377-404. doi:10.1016/S0167-9457(03)00004-6.

Chargaff, Erwin, Stephen Zamenhof, and Charlotte Green. 1950. "Composition of

Human Desoxypentose Nucleic Acid." Nature 165 (4202): 756-757.

doi:10.1038/165756b0.

Chen, Bing and Jitendra K. Tugnait. 2001. "Tracking of Multiple Maneuvering Targets in

Clutter using IMM/JPDA Filtering and Fixed-Lag Smoothing." Automatica 37 (2):

239-249. doi:10.1016/S0005-1098(00)00158-8.

Chen, Hu, Hongxia Fu, Xiaoying Zhu, Peiwen Cong, Fumihiko Nakamura, and Jie Yan.

2011. "Improved High-Force Magnetic Tweezers for Stretching and Refolding of


Proteins and Short DNA." Biophysical Journal 100 (2): 517-523.

doi:10.1016/j.bpj.2010.12.3700.

Chenouard, Nicolas, Ihor Smal, Fabrice de Chaumont, Martin Maška, Ivo F. Sbalzarini,

Yuanhao Gong, Janick Cardinale, et al. 2014. "Objective Comparison of Particle

Tracking Methods." Nature Methods 11 (3): 281-289. doi:10.1038/nmeth.2808.

Chetverikov, D. and J. Verestóy. 1999. "Feature Point Tracking for Incomplete

Trajectories." Computing 62 (4): 321-338. doi:10.1007/s006070050027.

Cluzel, P., A. Lebrun, C. Heller, R. Lavery, J. L. Viovy, D. Chatenay, and F. Caron. 1996.

"DNA: An Extensible Molecule." Science (New York, N.Y.) 271 (5250): 792-794.

Cox, I. J. and S. L. Hingorani. 1996. "An Efficient Implementation of Reid's Multiple

Hypothesis Tracking Algorithm and its Evaluation for the Purpose of Visual

Tracking." IEEE Transactions on Pattern Analysis and Machine Intelligence 18 (2): 138-

150. doi:10.1109/34.481539.

Crocker, John C. and David G. Grier. 1996. "Methods of Digital Video Microscopy for

Colloidal Studies." Journal of Colloid and Interface Science 179 (1): 298-310.

doi:10.1006/jcis.1996.0217.


Cui, Y. and C. Bustamante. 2000. "Pulling a Single Chromatin Fiber Reveals the Forces

that Maintain its Higher-Order Structure." Proceedings of the National Academy of

Sciences of the United States of America 97 (1): 127-132.

Cumpson, Peter J., Peter Zhdan, and John Hedley. 2004. "Calibration of AFM Cantilever

Stiffness: A Microfabricated Array of Reflective Springs." Ultramicroscopy 100 (3–4):

241-251. doi:10.1016/j.ultramic.2003.10.005.

Dahm, Ralf. 2007. "Discovering DNA: Friedrich Miescher and the Early Years of Nucleic

Acid Research." Human Genetics 122 (6): 565-581. doi:10.1007/s00439-007-0433-0.

Dai, P., S. -K Wang, H. Taub, J. E. Buckley, S. N. Ehrlich, J. Z. Larese, G. Binnig, and D.

P. E. Smith. 1993. "X-Ray-Diffraction and Scanning-Tunneling-Microscopy Studies

of a Liquid-Crystal Film Adsorbed on Single-Crystal Graphite." Physical Review B 47

(12): 7401-7407. doi:10.1103/PhysRevB.47.7401.

Dame, Remus T., Maarten C. Noom, and Gijs J. L. Wuite. 2006. "Bacterial Chromatin

Organization by H-NS Protein Unravelled using Dual DNA Manipulation." Nature

444 (7117): 387-390. doi:10.1038/nature05283.


Danilowicz, C., C. H. Lee, K. Kim, K. Hatch, V. W. Coljee, N. Kleckner, and M. Prentiss.

2009. "Single Molecule Detection of Direct, Homologous, DNA/DNA Pairing."

Proceedings of the National Academy of Sciences 106 (47): 19824-19829.

doi:10.1073/pnas.0911214106.

Danilowicz, Claudia, Vincent W. Coljee, Cedric Bouzigues, David K. Lubensky, David

R. Nelson, and Mara Prentiss. 2003. "DNA Unzipped Under a Constant Force

Exhibits Multiple Metastable Intermediates." Proceedings of the National Academy of

Sciences 100 (4): 1694-1699. doi:10.1073/pnas.262789199.

De Vlaminck, Iwijn and Cees Dekker. 2012. "Recent Advances in Magnetic Tweezers."

Annual Review of Biophysics 41: 453-472. doi:10.1146/annurev-biophys-122311-

100544.

Dessinges, Marie-Noëlle, Timothée Lionnet, Xu Guang Xi, David Bensimon, and

Vincent Croquette. 2004. "Single-Molecule Assay Reveals Strand Switching and

Enhanced Processivity of UvrD." Proceedings of the National Academy of Sciences of the

United States of America 101 (17): 6439-6444. doi:10.1073/pnas.0306713101.


Dijk, Meindert A. van, Lukas C. Kapitein, Joost van Mameren, Christoph F. Schmidt,

and Erwin J. G. Peterman. 2004. "Combining Optical Trapping and Single-Molecule

Fluorescence Spectroscopy: Enhanced Photobleaching of Fluorophores." The Journal

of Physical Chemistry B 108 (20): 6479-6484. doi:10.1021/jp049805+.

Engel, A. 1991. "Biological Applications of Scanning Probe Microscopes." Annual Review

of Biophysics and Biophysical Chemistry 20 (1): 79-108.

doi:10.1146/annurev.bb.20.060191.000455.

Forth, Scott, Christopher Deufel, Maxim Y. Sheinin, Bryan Daniels, James P. Sethna, and

Michelle D. Wang. 2008. "Abrupt Buckling Transition Observed during the

Plectoneme Formation of Individual DNA Molecules." Physical Review Letters 100

(14): 148301. doi:10.1103/PhysRevLett.100.148301.

Fortmann, Thomas E., Y. Bar-Shalom, and M. Scheffe. 1983. "Sonar Tracking of Multiple

Targets using Joint Probabilistic Data Association." IEEE Journal of Oceanic

Engineering 8 (3): 173-184. doi:10.1109/JOE.1983.1145560.

Franklin, Rosalind E. and R. G. Gosling. 1953. "Molecular Configuration in Sodium

Thymonucleate." Nature 171 (4356): 740-741. doi:10.1038/171740a0.


Friedman, Larry J. and Jeff Gelles. 2012. "Mechanism of Transcription Initiation at an

Activator-Dependent Promoter Defined by Single-Molecule Observation." Cell 148

(4): 679-689. doi:10.1016/j.cell.2012.01.018.

Ghaemmaghami, Sina, Won-Ki Huh, Kiowa Bower, Russell W. Howson, Archana Belle,

Noah Dephoure, Erin K. O'Shea, and Jonathan S. Weissman. 2003. "Global Analysis

of Protein Expression in Yeast." Nature 425 (6959): 737-741. doi:10.1038/nature02046.

Ghosh, R. N. and W. W. Webb. 1994. "Automated Detection and Tracking of Individual

and Clustered Cell Surface Low Density Lipoprotein Receptor Molecules."

Biophysical Journal 66 (5): 1301-1318.

Gittes, Frederick and C. F. Schmidt. 1998a. "Thermal Noise Limitations on

Micromechanical Experiments." European Biophysics Journal 27 (1): 75-81.

doi:10.1007/s002490050113.

Gittes, Frederick and Christoph F. Schmidt. 1998b. "Interference Model for Back-Focal-

Plane Displacement Detection in Optical Tweezers." Optics Letters 23 (1): 7-9.

doi:10.1364/OL.23.000007.


Gollnick, Benjamin, Carolina Carrasco, Francesca Zuttion, Neville S. Gilhooly, Mark S.

Dillingham, and Fernando Moreno-Herrero. 2015. "Probing DNA Helicase Kinetics

with Temperature-Controlled Magnetic Tweezers." Small 11 (11): 1273-1284.

doi:10.1002/smll.201402686.

Gore, Jeff, Zev Bryant, Michael D. Stone, Marcelo Nöllmann, Nicholas R. Cozzarelli, and

Carlos Bustamante. 2006. "Mechanochemical Analysis of DNA Gyrase using Rotor

Bead Tracking." Nature 439 (7072): 100-104. doi:10.1038/nature04319.

Gosse, Charlie and Vincent Croquette. 2002. "Magnetic Tweezers: Micromanipulation

and Force Measurement at the Molecular Level." Biophysical Journal 82 (6): 3314-

3329.

Haber, Charbel and Denis Wirtz. 2000. "Magnetic Tweezers for DNA

Micromanipulation." Review of Scientific Instruments 71 (12): 4561-4570.

doi:10.1063/1.1326056.

Handa, Naofumi, Piero R. Bianco, Ronald J. Baskin, and Stephen C. Kowalczykowski.

2005. "Direct Visualization of RecBCD Movement Reveals Cotranslocation of the


RecD Motor after χ Recognition." Molecular Cell 17 (5): 745-750.

doi:10.1016/j.molcel.2005.02.011.

Hashirao, Masataka, Tetsuya Kawase, and Iwao Sasase. 2002. "Maneuver Target

Tracking with an Acceleration Estimator using Target Past Positions." Electronics

and Communications in Japan (Part I: Communications) 85 (12): 29-37.

doi:10.1002/ecja.10026.

Herbert, K. M., A. La Porta, B. J. Wong, R. A. Mooney, K. C. Neuman, R. Landick, and S.

M. Block. 2006a. "Sequence-Resolved Detection of Pausing by Single RNA

Polymerase Molecules." Cell 125 (6): 1083-1094.

Herbert, Kristina M., Arthur La Porta, Becky J. Wong, Rachel A. Mooney, Keir C.

Neuman, Robert Landick, and Steven M. Block. 2006b. "Sequence-Resolved

Detection of Pausing by Single RNA Polymerase Molecules." Cell 125 (6): 1083-1094.

doi:10.1016/j.cell.2006.04.032.

Herbert, Kristina M., Jing Zhou, Rachel A. Mooney, Arthur La Porta, Robert Landick,

and Steven M. Block. 2010. "E. Coli NusG Inhibits Backtracking and Accelerates


Pause-Free Transcription by Promoting Forward Translocation of RNA

Polymerase." Journal of Molecular Biology 399 (1): 17-30. doi:10.1016/j.jmb.2010.03.051.

Hohng, Sungchul, Ruobo Zhou, Michelle K. Nahas, Jin Yu, Klaus Schulten, David M. J.

Lilley, and Taekjip Ha. 2007. "Fluorescence-Force Spectroscopy Maps Two-

Dimensional Reaction Landscape of the Holliday Junction." Science

318 (5848): 279-283. doi:10.1126/science.1146113.

Hong, Lang, Ningzhou Cui, Shan Cong, and Devert Wicker. 1998. "An Interacting

Multipattern Data Association (IMPDA) Tracking Algorithm." Signal Processing 71

(1): 55-77. doi:10.1016/S0165-1684(98)00134-0.

Huang, H., D. Dabiri, and M. Gharib. 1997. "On Errors of Digital Particle Image

Velocimetry." Measurement Science and Technology 8 (12): 1427. doi:10.1088/0957-

0233/8/12/007.

Huhle, Alexander, Daniel Klaue, Hergen Brutzer, Peter Daldrop, Sihwa Joo, Oliver

Otto, Ulrich F. Keyser, and Ralf Seidel. 2015. "Camera-Based Three-Dimensional

Real-Time Particle Tracking at kHz Rates and Ångström Accuracy." Nature

Communications 6. doi:10.1038/ncomms6885.


Janssen, Xander J. A., Jan Lipfert, Tessa Jager, Renier Daudey, Jaap Beekman, and

Nynke H. Dekker. 2012. "Electromagnetic Torque Tweezers: A Versatile Approach

for Measurement of Single-Molecule Twist and Torque." Nano Letters 12 (7): 3634-

3639. doi:10.1021/nl301330h.

Johnson, Aaron and Mike O'Donnell. 2005. "Cellular DNA Replicases: Components and

Dynamics at the Replication Fork." Annual Review of Biochemistry 74: 283-315.

doi:10.1146/annurev.biochem.73.011303.073859.

Kalafut, Bennett and Koen Visscher. 2008. "An Objective, Model-Independent Method

for Detection of Non-Uniform Steps in Noisy Signals." Computer Physics

Communications 179 (10): 716-723.

Kalman, R. E. 1960. "A New Approach to Linear Filtering and Prediction Problems."

Journal of Basic Engineering 82 (1): 35-45. doi:10.1115/1.3662552.

Kapanidis, Achillefs N., Emmanuel Margeat, Sam On Ho, Ekaterine Kortkhonjia,

Shimon Weiss, and Richard H. Ebright. 2006. "Initial Transcription by RNA

Polymerase Proceeds through a DNA-Scrunching Mechanism." Science 314 (5802): 1144-1147. doi:10.1126/science.1131399.


Kerssemakers, Jacob W. J., E. Laura Munteanu, Liedewij Laan, Tim L. Noetzel, Marcel

E. Janson, and Marileen Dogterom. 2006. "Assembly Dynamics of Microtubules at

Molecular Resolution." Nature 442 (7103): 709-712.

Kessel, Amit and Nir Ben-Tal. 2010. Introduction to Proteins: Structure, Function, and

Motion. Boca Raton, FL: CRC Press.

Klaue, Daniel and Ralf Seidel. 2009. "Torsional Stiffness of Single Superparamagnetic

Microspheres in an External Magnetic Field." Physical Review Letters 102 (2): 028302.

doi:10.1103/PhysRevLett.102.028302.

Kollmannsberger, Philip and Ben Fabry. 2007. "High-Force Magnetic Tweezers with

Force Feedback for Biological Applications." Review of Scientific Instruments 78

(11): 114301. doi:10.1063/1.2804771.

Kossel, Albrecht. 1886. "Weitere Beiträge zur Chemie des Zellkerns (Further Contributions to the Chemistry of the Cell Nucleus)." Zeitschrift für Physiologische Chemie 10: 248-264.

Kuhn, H. W. 1955. "The Hungarian Method for the Assignment Problem." Naval

Research Logistics Quarterly 2 (1-2): 83-97. doi:10.1002/nav.3800020109.


Kuo, S. C., J. Gelles, E. Steuer, and M. P. Sheetz. 1991. "A Model for Kinesin Movement

from Nanometer-Level Movements of Kinesin and Cytoplasmic Dynein and Force

Measurements." Journal of Cell Science. Supplement 14: 135-138.

Lee, G. U., L. A. Chrisey, and R. J. Colton. 1994. "Direct Measurement of the Forces

between Complementary Strands of DNA." Science 266 (5186): 771-773.

doi:10.1126/science.7973628.

Leger, J. F., J. Robert, L. Bourdieu, D. Chatenay, and J. F. Marko. 1998. "RecA Binding to

a Single Double-Stranded DNA Molecule: A Possible Role of DNA Conformational

Fluctuations." Proceedings of the National Academy of Sciences 95 (21): 12295-12299.

doi:10.1073/pnas.95.21.12295.

Levene, P. A. 1919. "The Structure of Yeast Nucleic Acid." Journal of Biological Chemistry

40 (2): 415-424.

Levi, Leo. 1974. "Unsharp Masking and Related Image Enhancement Techniques."

Computer Graphics and Image Processing 3 (2): 163-177.

doi:10.1016/S0146-664X(74)80005-5.


Levi, V., V. I. Gelfand, A. S. Serpinskaya, and E. Gratton. 2006. "Melanosomes

Transported by Myosin-V in Xenopus Melanophores Perform Slow 35 nm Steps."

Biophysical Journal 90 (1): L7-L9.

Lipfert, Jan, Jacob W. J. Kerssemakers, Tessa Jager, and Nynke H. Dekker. 2010.

"Magnetic Torque Tweezers: Measuring Torsional Stiffness in DNA and RecA-DNA

Filaments." Nature Methods 7 (12): 977-980. doi:10.1038/nmeth.1520.

Logothetis, Andrew, Vikram Krishnamurthy, and Jan Holst. 2002. "A Bayesian EM

Algorithm for Optimal Tracking of a Maneuvering Target in Clutter." Signal

Processing 82 (3): 473-490. doi:10.1016/S0165-1684(01)00198-0.

Luger, K., A. W. Mader, R. K. Richmond, D. F. Sargent, and T. J. Richmond. 1997.

"Crystal Structure of the Nucleosome Core Particle at 2.8 Angstrom Resolution."

Nature 389 (6648): 251-260.

Maffeo, Christopher, Robert Schöpflin, Hergen Brutzer, René Stehr, Aleksei

Aksimentiev, Gero Wedemann, and Ralf Seidel. 2010. "DNA-DNA Interactions in

Tight Supercoils are Described by a Small Effective Charge Density." Physical

Review Letters 105 (15): 158101.


Maier, Berenike, David Bensimon, and Vincent Croquette. 2000. "Replication by a Single

DNA Polymerase of a Stretched Single-Stranded DNA." Proceedings of the National

Academy of Sciences 97 (22): 12002-12007. doi:10.1073/pnas.97.22.12002.

Mameren, Joost van, Peter Gross, Geraldine Farge, Pleuni Hooijman, Mauro Modesti,

Maria Falkenberg, Gijs J. L. Wuite, and Erwin J. G. Peterman. 2009. "Unraveling the

Structure of DNA during Overstretching by using Multicolor, Single-Molecule

Fluorescence Imaging." Proceedings of the National Academy of Sciences 106 (43):

18231-18236. doi:10.1073/pnas.0904322106.

Mameren, Joost van, Mauro Modesti, Roland Kanaar, Claire Wyman, Gijs J. L. Wuite,

and Erwin J. G. Peterman. 2006. "Dissecting Elastic Heterogeneity Along DNA

Molecules Coated Partly with Rad51 using Concurrent Fluorescence Microscopy

and Optical Tweezers." Biophysical Journal 91 (8): L78-L80.

doi:10.1529/biophysj.106.089466.

Manosas, Maria, Michelle M. Spiering, Zhihao Zhuang, Stephen J. Benkovic, and

Vincent Croquette. 2009. "Coupling DNA Unwinding Activity with Primer

Synthesis in the Bacteriophage T4 Primosome." Nature Chemical Biology 5 (12): 904-

912. doi:10.1038/nchembio.236.


Marko, J. F. and E. D. Siggia. 1997. "Driving Proteins Off DNA using Applied Tension."

Biophysical Journal 73 (4): 2173-2178.

Marko, John F. and Sébastien Neukirch. 2012. "Competition between Curls and

Plectonemes Near the Buckling Transition of Stretched Supercoiled DNA." Physical

Review E 85 (1): 011908.

Marko, John F. and Eric D. Siggia. 1995. "Stretching DNA." Macromolecules 28 (26): 8759-8770. doi:10.1021/ma00130a008.

Marszalek, Piotr E., Hui Lu, Hongbin Li, Mariano Carrion-Vazquez, Andres F.

Oberhauser, Klaus Schulten, and Julio M. Fernandez. 1999. "Mechanical Unfolding

Intermediates in Titin Modules." Nature 402 (6757): 100-103. doi:10.1038/47083.

Marti, O., V. Elings, M. Haugan, C. E. Bracker, J. Schneir, B. Drake, S. A. C. Gould, et al.

1988. "Scanning Probe Microscopy of Biological Samples and Other Surfaces."

Journal of Microscopy 152 (3): 803-809. doi:10.1111/j.1365-2818.1988.tb01452.x.

Mathews, Christopher K., Kensal E. Van Holde, Dean R. Appling, and Spencer J.

Anthony-Cahill. 2012. Biochemistry. 4th ed. California: Prentice Hall-Pearson.


McAndrew, Christopher P. 2012. "Studies of Single DNA-Histone Binding Events using

a Novel Magnetic Tweezer." Ph.D. diss., The Catholic University of America.

Meijering, Erik, Oleh Dzyubachyk, and Ihor Smal. 2012. "Chapter Nine - Methods for

Cell and Particle Tracking." In Methods in Enzymology, edited by P. Michael Conn.

Vol. 504, 183-200. Waltham, MA: Academic Press.

Milescu, Lorin S., Ahmet Yildiz, Paul R. Selvin, and Frederick Sachs. 2006. "Maximum

Likelihood Estimation of Kinetics from Staircase Dwell-Time

Sequences." Biophysical Journal 91 (4): 1156-1168.

Moffitt, Jeffrey R., Yann R. Chemla, David Izhaky, and Carlos Bustamante. 2006.

"Differential Detection of Dual Traps Improves the Spatial Resolution of Optical

Tweezers." Proceedings of the National Academy of Sciences of the United States of

America 103 (24): 9006-9011. doi:10.1073/pnas.0603342103.

Moffitt, Jeffrey R., Yann R. Chemla, Steven B. Smith, and Carlos Bustamante. 2008.

"Recent Advances in Optical Tweezers." Annual Review of Biochemistry 77 (1): 205-

228. doi:10.1146/annurev.biochem.77.043007.090225.


Mosconi, Francesco, Jean François Allemand, and Vincent Croquette. 2011. "Soft

Magnetic Tweezers: A Proof of Principle." Review of Scientific Instruments 82 (3):

034302. doi:10.1063/1.3531959.

Murray, Robert K., Darryl K. Granner, Peter A. Mayes, and Victor W. Rodwell. 2006.

Harper's Illustrated Biochemistry. 27th ed. New York: McGraw-Hill.

Nahas, Michelle K., Timothy J. Wilson, Sungchul Hohng, Kaera Jarvie, David M. J.

Lilley, and Taekjip Ha. 2004. "Observation of Internal Cleavage and Ligation

Reactions of a Ribozyme." Nature Structural & Molecular Biology 11 (11): 1107-1113.

doi:10.1038/nsmb842.

Neuman, K. C., T. Lionnet, and J. -F Allemand. 2007. "Single-Molecule

Micromanipulation Techniques." Annual Review of Materials Research 37 (1): 33-67.

doi:10.1146/annurev.matsci.37.052506.084336.

Neuman, K. C., O. A. Saleh, T. Lionnet, G. Lia, J. F. Allemand, D. Bensimon, and V.

Croquette. 2005. "Statistical Determination of the Step Size of Molecular Motors."

Journal of Physics: Condensed Matter 17 (47): S3811-S3820.


Neuman, Keir C. 2010. "Evolutionary Twist on Topoisomerases: Conversion of Gyrase

to Topoisomerase IV." Proceedings of the National Academy of Sciences of the United

States of America 107 (52): 22363-22364. doi:10.1073/pnas.1016041108.

Neuman, Keir C., Elio A. Abbondanzieri, Robert Landick, Jeff Gelles, and Steven M.

Block. 2003. "Ubiquitous Transcriptional Pausing is Independent of RNA

Polymerase Backtracking." Cell 115 (4): 437-447.

Neuman, Keir C. and Steven M. Block. 2004. "Optical Trapping." Review of Scientific

Instruments 75 (9): 2787-2809. doi:10.1063/1.1785844.

Neuman, Keir C. and Attila Nagy. 2008. "Single-Molecule Force Spectroscopy: Optical

Tweezers, Magnetic Tweezers and Atomic Force Microscopy." Nature Methods 5 (6):

491-505. doi:10.1038/nmeth.1218.

Noyes, S. P. and D. P. Atherton. 2004. "Control of False Track Rate using Multiple Hypothesis Confirmation."

Nugent-Glandorf, Lora and Thomas T. Perkins. 2004. "Measuring 0.1-nm Motion in 1 ms in an Optical Microscope with Differential Back-Focal-Plane Detection." Optics

Letters 29 (22): 2611-2613. doi:10.1364/OL.29.002611.


Oberstrass, Florian C., Louis E. Fernandes, and Zev Bryant. 2012. "Torque

Measurements Reveal Sequence-Specific Cooperative Transitions in Supercoiled

DNA." Proceedings of the National Academy of Sciences of the United States of America

109 (16): 6106-6111. doi:10.1073/pnas.1113532109.

Paik, D. Hern and Thomas T. Perkins. 2011. "Overstretching DNA at 65 pN does Not

Require Peeling from Free Ends or Nicks." Journal of the American Chemical Society

133 (10): 3219-3221. doi:10.1021/ja108952v.

Perkins, T. T., S. R. Quake, D. E. Smith, and S. Chu. 1994. "Relaxation of a Single DNA

Molecule Observed by Optical Microscopy." Science 264 (5160): 822-826.

doi:10.1126/science.8171336.

Pertsinidis, Alexandros, Yunxiang Zhang, and Steven Chu. 2010. "Subnanometre Single-

Molecule Localization, Registration and Distance Measurements." Nature 466 (7306):

647-651. doi:10.1038/nature09163.

Point Grey. 2015. Camera Sensor Review. Richmond, BC, CA: Point Grey.

http://www.ptgrey.com/sensor-review.


Pope, L. H., M. L. Bennink, K. A. van Leijenhorst-Groener, D. Nikova, J. Greve, and J. F.

Marko. 2005. "Single Chromatin Fiber Stretching Reveals Physically Distinct

Populations of Disassembly Events." Biophysical Journal 88 (5): 3572-3583.

doi:10.1529/biophysj.104.053074.

Pralle, A., M. Prummer, E. L. Florin, E. H. Stelzer, and J. K. Hörber. 1999. "Three-

Dimensional High-Resolution Particle Tracking for Optical Tweezers by Forward

Scattered Light." Microscopy Research and Technique 44 (5): 378-386. doi:10.1002/(SICI)1097-0029(19990301)44:5<378::AID-JEMT10>3.0.CO;2-Z.

Reid, D. B. 1979. "An Algorithm for Tracking Multiple Targets." IEEE Transactions on

Automatic Control 24 (6): 843-854. doi:10.1109/TAC.1979.1102177.

Revyakin, Andrey, Chenyu Liu, Richard H. Ebright, and Terence R. Strick. 2006.

"Abortive Initiation and Productive Initiation by RNA Polymerase Involve DNA

Scrunching." Science (New York, N.Y.) 314 (5802): 1139-1143.

doi:10.1126/science.1131398.


Ribeck, Noah, Daniel L. Kaplan, Irina Bruck, and Omar A. Saleh. 2010. "DnaB Helicase

Activity is Modulated by DNA Geometry and Force." Biophysical Journal 99 (7):

2170-2179. doi:10.1016/j.bpj.2010.07.039.

Rief, Matthias, Hauke Clausen-Schaumann, and Hermann E. Gaub. 1999. "Sequence-

Dependent Mechanics of Single DNA Molecules." Nature Structural & Molecular

Biology 6 (4): 346-349. doi:10.1038/7582.

Rink, Jochen, Eric Ghigo, Yannis Kalaidzidis, and Marino Zerial. 2005. "Rab Conversion

as a Mechanism of Progression from Early to Late Endosomes." Cell 122 (5): 735-749.

doi:10.1016/j.cell.2005.06.043.

Sadler, B. M. and A. Swami. 1999. "Analysis of Multiscale Products for Step Detection

and Estimation." IEEE Transactions on Information Theory 45 (3): 1043-1051.

Saenger, Wolfram. 1984. Principles of Nucleic Acid Structure. New York: Springer.

doi:10.1007/978-1-4612-5190-3.

Sage, D., F. Hediger, S. M. Gasser, and M. Unser. 2003. "Automatic Tracking of Particles

in Dynamic Fluorescence Microscopy." In Proceedings of the 3rd International Symposium on Image and Signal Processing and Analysis (ISPA 2003). doi:10.1109/ISPA.2003.1296963.


Sbalzarini, I. F. and P. Koumoutsakos. 2005. "Feature Point Tracking and Trajectory

Analysis for Video Imaging in Cell Biology." Journal of Structural Biology 151 (2):

182-195. doi:10.1016/j.jsb.2005.06.002.

Sethi, I. K. and Ramesh Jain. 1987. "Finding Trajectories of Feature Points in a

Monocular Image Sequence." IEEE Transactions on Pattern Analysis and Machine

Intelligence PAMI-9 (1): 56-73. doi:10.1109/TPAMI.1987.4767872.

Shaevitz, Joshua W., Elio A. Abbondanzieri, Robert Landick, and Steven M. Block. 2003.

"Backtracking by Single RNA Polymerase Molecules Observed at Near-Base-Pair

Resolution." Nature 426 (6967): 684-687. doi:10.1038/nature02191.

Shafique, Khurram and Mubarak Shah. 2005. "A Noniterative Greedy Algorithm for

Multiframe Point Correspondence." IEEE Transactions on Pattern Analysis and

Machine Intelligence 27 (1): 51-65. doi:10.1109/TPAMI.2005.1.

Shao, Zhifeng, Jie Yang, and Andrew P. Somlyo. 1995. "Biological Atomic Force

Microscopy: From Microns to Nanometers and Beyond." Annual Review of Cell and

Developmental Biology 11 (1): 241-265. doi:10.1146/annurev.cb.11.110195.001325.


Sheinin, Maxim Y., Scott Forth, John F. Marko, and Michelle D. Wang. 2011.

"Underwound DNA Under Tension: Structure, Elasticity, and Sequence-Dependent

Behaviors." Physical Review Letters 107 (10): 108102.

Skoko, Dunja, Ben Wong, Reid C. Johnson, and John F. Marko. 2004. "Micromechanical

Analysis of the Binding of DNA-Bending Proteins HMGB1, NHP6A, and HU

Reveals their Ability to Form Highly Stable DNA-Protein Complexes." Biochemistry

43 (43): 13867-13874. doi:10.1021/bi048428o.

Skoko, Dunja, Daniel Yoo, Hua Bai, Bernhard Schnurr, Jie Yan, Sarah M. McLeod, John

F. Marko, and Reid C. Johnson. 2006. "Mechanism of Chromosome Compaction and

Looping by the Escherichia Coli Nucleoid Protein Fis." Journal of Molecular Biology

364 (4): 777-798. doi:10.1016/j.jmb.2006.09.043.

Smith, S. B., Y. Cui, and C. Bustamante. 1996. "Overstretching B-DNA: The Elastic

Response of Individual Double-Stranded and Single-Stranded DNA Molecules."

Science 271 (5250): 795-799.


Smith, S. B., L. Finzi, and C. Bustamante. 1992. "Direct Mechanical Measurements of the

Elasticity of Single DNA Molecules by using Magnetic Beads." Science 258 (5085):

1122-1126. doi:10.1126/science.1439819.

Stone, Michael D., Zev Bryant, Nancy J. Crisona, Steven B. Smith, Alexander

Vologodskii, Carlos Bustamante, and Nicholas R. Cozzarelli. 2003. "Chirality

Sensing by Escherichia Coli Topoisomerase IV and the Mechanism of Type II

Topoisomerases." Proceedings of the National Academy of Sciences of the United States of

America 100 (15): 8654-8659. doi:10.1073/pnas.1133178100.

Strick, T. R., J. -F Allemand, D. Bensimon, A. Bensimon, and V. Croquette. 1996a. "The

Elasticity of a Single Supercoiled DNA Molecule." Science 271 (5257): 1835-1837.

doi:10.1126/science.271.5257.1835.

Strick, Terence, Jean-François Allemand, Vincent Croquette, and David Bensimon. 2000.

"Twisting and Stretching Single DNA Molecules." Progress in Biophysics and

Molecular Biology 74 (1): 115-140. doi:10.1016/S0079-6107(00)00018-3.

Sun, Bo, Kong-Ji Wei, Bo Zhang, Xing-Hua Zhang, Shuo-Xing Dou, Ming Li, and Xu

Guang Xi. 2008. "Impediment of E. Coli UvrD by DNA-Destabilizing Force Reveals


a Strained-Inchworm Mechanism of DNA Unwinding." The EMBO Journal 27 (24):

3279-3287. doi:10.1038/emboj.2008.240.

Thomann, D., J. Dorn, P. K. Sorger, and G. Danuser. 2003. "Automatic Fluorescent Tag

Localization II: Improvement in Super-Resolution by Relative Tracking." Journal of

Microscopy 211: 230-248.

Thompson, Russell E., Daniel R. Larson, and Watt W. Webb. 2002. "Precise Nanometer

Localization Analysis for Individual Fluorescent Probes." Biophysical Journal 82 (5):

2775-2783. doi:10.1016/S0006-3495(02)75618-X.

Veenman, C. J., M. J. T. Reinders, and E. Backer. 2001. "Resolving Motion Correspondence for Densely Moving Points." IEEE Transactions on Pattern Analysis and Machine Intelligence 23 (1): 54-72. doi:10.1109/34.899946.

Veenman, C. J., M. J. T. Reinders, and E. Backer. 2003. "Motion Tracking as a Constrained Optimization Problem." Pattern Recognition 36 (9): 2049-2067. doi:10.1016/S0031-3203(03)00037-2.

Visscher, K. and S. M. Block. 1998. "Versatile Optical Traps with Feedback Control."

Methods in Enzymology 298: 460-489.


Visscher, K., S. P. Gross, and Steven M. Block. 1996. "Construction of Multiple-Beam

Optical Traps with Nanometer-Resolution Position Sensing." IEEE Journal of Selected

Topics in Quantum Electronics 2 (4): 1066-1076. doi:10.1109/2944.577338.

Wang, M. D., M. J. Schnitzer, H. Yin, R. Landick, J. Gelles, and S. M. Block. 1998. "Force

and Velocity Measured for Single Molecules of RNA Polymerase." Science 282 (5390): 902-907.

Watson, J. D. and F. H. C. Crick. 1953. "Molecular Structure of Nucleic Acids: A

Structure for Deoxyribose Nucleic Acid." Nature 171 (4356): 737-738.

doi:10.1038/171737a0.

Williams, Mark C., Ioulia Rouzina, and Micah J. McCauley. 2009. "Peeling Back the

Mystery of DNA Overstretching." Proceedings of the National Academy of Sciences 106

(43): 18047-18048. doi:10.1073/pnas.0910269106.

Wuite, Gijs J. L., Steven B. Smith, Mark Young, David Keller, and Carlos Bustamante.

2000. "Single-Molecule Studies of the Effect of Template Tension on T7 DNA

Polymerase Activity." Nature 404 (6773): 103-106. doi:10.1038/35003614.


Yan, Jie, Thomas J. Maresca, Dunja Skoko, Christian D. Adams, Botao Xiao, Morten O.

Christensen, Rebecca Heald, and John F. Marko. 2007. "Micromanipulation Studies

of Chromatin Fibers in Xenopus Egg Extracts Reveal ATP-Dependent Chromatin

Assembly Dynamics." Molecular Biology of the Cell 18 (2): 464-474.

doi:10.1091/mbc.E06-09-0800.

Yan, Jie, Dunja Skoko, and John F. Marko. 2004. "Near-Field-Magnetic-Tweezer

Manipulation of Single DNA Molecules." Physical Review E 70 (1): 011905.

Yang, Deshan, Issam El Naqa, Aditya Apte, Yu Wu, Murty Goddu, Sasa Mutic, Joseph O. Deasy, and Daniel A. Low. 2009. "DIRART – A Software Suite for Deformable Image Registration and Adaptive Radiotherapy Research." In World Congress on Medical Physics and Biomedical Engineering, September 7-12, 2009, Munich, Germany, edited by Olaf Dössel and Wolfgang C. Schlegel, 844-847. Berlin: Springer. doi:10.1007/978-3-642-03474-9_236.

You, Huijuan, Jingyuan Wu, Fangwei Shao, and Jie Yan. 2015. "Stability and Kinetics of

C-MYC Promoter G-Quadruplexes Studied by Single-Molecule Manipulation."

Journal of the American Chemical Society 137 (7): 2424.


Yu, Junping, Yaxin Jiang, Xinyong Ma, Yi Lin, and Xiaohong Fang. 2007. "Energy

Landscape of Aptamer/Protein Complexes Studied by Single-Molecule Force

Spectroscopy." Chemistry – an Asian Journal 2 (2): 284-289.

doi:10.1002/asia.200600230.

Yung, Kar W., Peter B. Landecker, and Daniel D. Villani. 1998. "An Analytic Solution for

the Force between Two Magnetic Dipoles." Magnetic and Electrical Separation 9: 39-

52.

Zacchia, Nicholas A. and Megan T. Valentine. 2015. "Design and Optimization of

Arrays of Neodymium Iron Boron-Based Magnets for Magnetic Tweezers

Applications." Review of Scientific Instruments 86 (5): 053704. doi:10.1063/1.4921553.

Zhang, Xinghua, Hu Chen, Shimin Le, Ioulia Rouzina, Patrick S. Doyle, and Jie Yan.

2013. "Revealing the Competition between Peeled ssDNA, Melting Bubbles, and S-

DNA during DNA Overstretching by Single-Molecule Calorimetry." Proceedings of

the National Academy of Sciences 110 (10): 3865-3870. doi:10.1073/pnas.1213740110.

Zhou, Jing, Kook Sun Ha, Arthur La Porta, Robert Landick, and Steven M. Block. 2011a.

"Applied Force Provides Insight into Transcriptional Pausing and its Modulation


by Transcription Factor NusA." Molecular Cell 44 (4): 635-646.

doi:10.1016/j.molcel.2011.09.018.

Zhou, Ruobo, Alexander G. Kozlov, Rahul Roy, Jichuan Zhang, Sergey Korolev,

Timothy M. Lohman, and Taekjip Ha. 2011b. "SSB Functions as a Sliding Platform

that Migrates on DNA Via Reptation." Cell 146 (2): 222-232.

doi:10.1016/j.cell.2011.06.036.
