Optimal Estimation of Diffusion Coefficients from Noisy Time-Lapse-Recorded Single- Particle Trajectories
Total Page:16
File Type:pdf, Size:1020Kb
Downloaded from orbit.dtu.dk on: Dec 20, 2017 Optimal Estimation of Diffusion Coefficients from Noisy Time-Lapse-Recorded Single- Particle Trajectories Vestergaard, Christian L.; Flyvbjerg, Henrik Publication date: 2012 Document Version Publisher's PDF, also known as Version of record Link back to DTU Orbit Citation (APA): Vestergaard, C. L., & Flyvbjerg, H. (2012). Optimal Estimation of Diffusion Coefficients from Noisy Time-Lapse- Recorded Single-Particle Trajectories. General rights Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights. • Users may download and print one copy of any publication from the public portal for the purpose of private study or research. • You may not further distribute the material or use it for any profit-making activity or commercial gain • You may freely distribute the URL identifying the publication in the public portal If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim. Optimal Estimation of Diffusion Coefficients from Noisy Time-Lapse-Recorded Single-Particle Trajectories Christian L. Vestergaard Kongens Lyngby 2012 DTU Nanotech Department of Micro- and Nanotechnology Technical University of Denmark Building 345B, DK-2800 Kongens Lyngby, Denmark Phone +45 4525 5700, Fax +45 4588 7762 [email protected] www.nanotech.dtu.dk DTU Nanotech-PhD-2012 Summary (English) Optimal Estimation of Diffusion Coefficients from Noisy Time-Lapse- Measurements of Single-Particle Trajectories Single-particle tracking techniques allow quantitative measurements of diffu- sion at the single-molecule level. Recorded time-series are mostly short and contain considerable measurement noise. The standard method for estimat- ing diffusion coefficients from single-particle trajectories is based on least- squares fitting to the experimentally measured mean square displacements. This method is highly inefficient, since it ignores the high correlations inherent in these. We derive the exact maximum likelihood estimator for the diffusion coefficient, valid for short time-series, along with an exact benchmark for the maximum precision attainable with any unbiased estimator, the Cram´er-Rao bound. We propose a simple analytical and unbiased covariance-based esti- mator based on the autocovariance function and derive an exact analytical expression of its moment generating function. We find that the maximum likelihood estimator exceeds the precision set by the Cram´er-Rao bound, but at the cost of a small bias, while the covariance-based estimator, which is born unbiased, is almost optimal for all experimentally relevant parameter values. We extend the methods to particles diffusing on a fluctuating substrate, e.g., flexible or semiflexible polymers such as DNA, and show that fluctuations in- duce an important bias in the estimates of diffusion coefficients if they are not accounted for. We apply the methods to obtain precise estimates of diffusion coefficients of hOgg1 repair proteins diffusing on stretched fluctuating DNA from data previously analyzed using a suboptimal method. Our analysis shows that the proteins have different effective diffusion coefficients and that their diffusion coefficients are correlated with their residence time on DNA. These results imply a multi-state model for hOgg1's diffusion on DNA. ii Summary (Danish) Optimal estimering af diffusionskoefficienter fra tidsseriem˚alinger af enkeltpartikeltrajektorier med m˚alestøj Enkeltpartikelteknikker gør det muligt kvantitativt at følge enkelte moleky- lers diffusion. Tidsserier m˚alt med disse teknikker er overvejende korte og indeholder meget m˚alestøj. Standardmetoden for estimation af diffusionsko- efficienter fra enkeltpartikeltrajektorier er baseret p˚amindste kvadraters fit til eksperimentelt m˚alte mean squared displacements. Denne metode er højst ineffektiv, da den ignorerer de høje korrelationer mellem disse. Vi udleder den eksakte maksimum likelihood estimator for diffusionskoefficienten, sam- men med et eksakt benchmark for den maksimale præcision nogen middelret estimator kan opn˚a,Cram´er-Rao grænsen. Vi foresl˚aren simpel analytisk og middelret covariansbaseret estimator, baseret p˚aautocovariansfunktionen, og udleder et eksakt analystisk udtryk for dens momentgenererende funktion. Vi finder at maksimum likelihood estimatorens præcision er højere end Cram´er- Rao grænsen, p˚abekostning af en lille skævhed, mens den covariansbaserede estimator, som er født middelret, er optimal for alle eksperimentelt relevante parameterværdier. Vi udvider metoderne til partikler, som diffunderer p˚aet fluktuerende substrat, s˚asomDNA, og viser at hvis der ikke tages højde for fluktuationerne, inducerer de en skævhed i de estimerede diffusionskoefficien- ter. Vi anvender metoderne til præcis estimering af diffusionskoefficienter for hOgg1 proteiner, som diffunderer p˚astrukket fluktuerende DNA, udfra data, som tidligere er blevet analyseret med en suboptimal metode. Vores analyse viser at proteinerne har forskellige effektive diffusionskoefficienter og at deres diffusionskoefficienter er korrelerede med deres residenstider p˚aDNA'et. Disse resultater peger p˚aen multitilstandsmodel for hOgg1's diffusion p˚aDNA. iv Preface Ph.D. thesis Optimal Estimation of Diffusion Coefficients from Noisy Time-Lapse- Recorded Single-Particle Trajectories This thesis has been written as a partial fulfillment of the requirements for obtaining a PhD degree at the Technical University of Denmark (DTU). The PhD project has been conducted in the period of December, 2008 to March, 2012 at Department of Micro- and Nanotechnology, DTU Nanotech Technical University of Denmark DK-2800 Kongens Lyngby Denmark under supervision of Henrik Flyvbjerg, associate professor, dr. scient. Department of Micro- and Nanotechnology, DTU Nanotech Technical University of Denmark DK-2800 Kongens Lyngby Denmark This work was funded 1/3 by Rigshospitalet, and 2/3 jointly by the Danish Agency for Science, Technology and Innovation (Forsknings- og innovations- styrelsen) and DTU Nanotech. Christian L. Vestergaard vi Acknowledgements I would like to thank Paul Blainey, Broad Institute, MA, for providing ex- perimental data, insightful discussions, and for pointing out important details related to collection of real data that a theoretician tends to forget. I wish to thank Christian Gradinaru for his collaboration in deriving statistical proper- ties of the mean squared displacements and mean squared displacement-based estimators, and my collegues Kim Mortensen and Jonas N. Pedersen for numer- ous discussions, on- and off-topic, and for critical lecture of this manuscript. Last, but not least I wish to thank my supervisor Henrik Flyvbjerg for invalu- able advice, guidance and countless late hours spent in front of the blackboard. viii List of Figures 2.1 Experimental MSDs, variances of MSD-based estimators as a function the number of MSD points included in the fit, and variance of the GLS estimator. .8 2.2 Mean plus/minus standard error of the MLE and CVE applied to an ensemble of 1,000 Monte Carlo generated time-series for unknown σ2............................. 13 2.3 Mean plus/minus standard error of the MLE and CVE applied to an ensemble of 10,000 Monte Carlo generated time-series for a priori determined σ2........................ 13 2.4 Power spectra of measured displacements of different types of stochastic movement. 16 2.5 Mean plus/minus standard error of the MLE, which accounts for DNA fluctuations, and the CVE, which does not, applied to an ensemble of 1,000 Monte Carlo generated time-series of diffusion on a fluctuating substrate, where both positional noise amplitude σ2 and parameters determining the substrate motion (c1,X1;1,Y1) are unknown. 19 2.6 Mean plus/minus standard error of the MLE and CVE, which account for DNA fluctuations, applied to an ensemble of 1,000 Monte Carlo generated time-series of diffusion on a fluctuat- ing substrate, where positional noise amplitude σ2 is unknown, while parameters determining the substrate motion, (c1,X1;1,Y1) are known a priori. 20 x LIST OF FIGURES 2.7 Mean plus/minus standard error of the MLE and CVE, which account for DNA fluctuations, applied to an ensemble of 1,000 Monte Carlo generated time-series of diffusion on a fluctuating substrate, where both positional noise amplitude σ2 and pa- rameters determining substrate motion (c1,X1;1,Y1) are known apriori. ............................... 21 2.8 Experimental setup and raw data. 23 2.9 Distribution of the residence times on DNA. 24 2.10 Trajectories of hOgg1 proteins diffusing on flow-stretched DNA. 26 2.11 Diffusion coefficient estimates Dp versus protein residence time on DNA t............................... 28 2.12 Results from Monte Carlo simulations of a random walker in a rugged energy landscape. 29 A.1 Mean plus/minus standard error of the approximate and exact maximum likelihood estimates of the diffusion coefficient. 53 A.2 Actual SNR as a function of SNR0 and Cram´er-Rao bound as a function of SNR0......................... 58 A.3 Standard deviation and probability density of the CVE. 58 A.4 Absolute values of Si;¨ as a function of ! and characteristic function pr of the CVE. 64 B.1 Correlation coefficient ck as a function of mode number k and relative contributions to the variance of the transversal motion of the three lowest modes for DNA pulled by the ends. 70 B.2 Relative contributions to of the three lowest modes to the power spectrum of the measured transversal motion of a DNA strand pulled by the ends. 70 B.3 Correlation coefficient ck as a function of mode number k and relative contributions to the variance of the transversal motion of the three lowest modes for DNA in plug flow. 74 B.4 Relative contributions to of the three lowest modes to the power spectrum of the measured transversal motion of a DNA strand stretched by a plug flow. 74 LIST OF FIGURES xi B.5 Correlation coefficient ck as a function of mode number k and relative contributions to the variance of the transversal motion of the three lowest modes for DNA in shear flow.