An Open Source Toolkit for Analysis of Freely-Diffusing Single

bioRxiv preprint doi: https://doi.org/10.1101/039198; this version posted July 8, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

FRETBursts: An Open Source Toolkit for Analysis of Freely-Diffusing Single-Molecule FRET

Antonino Ingargiola∗1, Eitan Lerner1, SangYoon Chung1, Shimon Weiss1, and Xavier Michalet1

1Dept. Chem. & Biochem, Univ. California Los Angeles, Los Angeles, CA, USA.

∗[email protected]

July 8, 2016

Abstract

Single-molecule Förster Resonance Energy Transfer (smFRET) allows probing intermolecular interactions and conformational changes in biomacromolecules, and represents an invaluable tool for studying cellular processes at the molecular scale. smFRET experiments can detect the distance between two fluorescent labels (donor and acceptor) in the 3-10 nm range. In the commonly employed confocal geometry, molecules are free to diffuse in solution. When a molecule traverses the excitation volume, it emits a burst of photons, which can be detected by single-photon avalanche diode (SPAD) detectors. The intensities of donor and acceptor fluorescence can then be related to the distance between the two fluorophores.

While recent years have seen a growing number of contributions proposing improvements or new techniques in smFRET data analysis, rarely have those publications been accompanied by a software implementation. In particular, despite the widespread application of smFRET, no complete software package for smFRET burst analysis is freely available to date.

In this paper, we introduce FRETBursts, open source software for the analysis of freely-diffusing smFRET data. FRETBursts allows executing all the fundamental steps of smFRET burst analysis using state-of-the-art as well as novel techniques, while providing an open, robust and well-documented implementation. Therefore, FRETBursts represents an ideal platform for comparison and development of new methods in burst analysis.

We employ modern software engineering principles in order to minimize bugs and facilitate long-term maintainability. Furthermore, we place a strong focus on reproducibility by relying on Jupyter notebooks for FRETBursts execution. Notebooks are executable documents capturing all the steps of the analysis (including data files, input parameters, and results) and can be easily shared to replicate complete smFRET analyses. Notebooks allow beginners to execute complex workflows and advanced users to customize the analysis for their own needs. By bundling analysis description, code and results in a single document, FRETBursts allows seamless sharing of analysis workflows and results, encourages reproducibility and facilitates collaboration among researchers in the single-molecule community.

Contents

1 Introduction
1.1 Open Science and Reproducibility
1.2 Paper Overview
2 FRETBursts Overview
2.1 Technical Features
2.2 Software Availability
3 Architecture and Concepts
3.1 Photon Streams
3.2 Background Definitions
3.3 The Data Class
3.4 Introduction to Burst Search
3.5 Corrected Burst Sizes and Weights
4 smFRET Burst Analysis
4.1 Loading Data
4.2 Alternation Parameters
4.3 Background Estimation
4.4 Burst Search
4.5 Bursts Corrections
4.6 Burst Selection
4.7 Population Analysis
4.8 FRET Dynamics
5 Implementing Burst Variance Analysis
5.1 BVA Overview
5.2 BVA Implementation
6 Conclusions
Supporting Information
S1 Notebook Workflow
S2 Development and Contributions
S3 Timestamps and Burst Data
S4 Plotting Data
S5 Background Estimation With Optimal Threshold
S6 Burst Weights
S6.1 Theory
S6.2 Weighted FRET estimator
S6.3 Weighted FRET histogram

1 Introduction

1.1 Open Science and Reproducibility

Over the past 20 years, single molecule FRET (smFRET) has grown into one of the most useful techniques in single-molecule spectroscopy [1, 2]. While it is possible to extract information on sub-populations using ensemble measurements (e.g. [3, 4]), smFRET's unique feature is its ability to straightforwardly resolve conformational changes of biomolecules or measure binding-unbinding kinetics in heterogeneous samples [5–9]. smFRET measurements on freely diffusing molecules (the focus of this paper) have the additional advantage, over measurements performed on immobilized molecules, of allowing molecules and processes to be probed without perturbation from surface immobilization or from the additional functionalization needed for surface attachment [10, 11].

The increasing amount of work using freely-diffusing smFRET has motivated a growing number of theoretical contributions to the specific topic of data analysis [12–24]. Despite this profusion of publications, most research groups still rely on their own implementation of a limited number of methods, with very little collaboration or code sharing. To clarify this statement, let us point out that our own group's past smFRET papers merely mention the use of custom-made software without additional details [16, 17]. Even though some of these software tools are made available upon request, or sometimes shared publicly on websites, it remains hard to reproduce and validate results from different groups, let alone build upon them. Additionally, as new methods are proposed in the literature, it is generally difficult to quantify their performance compared to other methods. An independent quantitative assessment would require a complete reimplementation, an effort few groups can afford. As a result, potentially useful analysis improvements are either rarely or slowly adopted by the community. In contrast with other established traditions such as sharing protocols and samples, in the domain of scientific software, we have relegated ourselves to islands of non-communication.

From a more general standpoint, the non-availability of the code used to produce scientific results hinders reproducibility, makes it impossible to review and validate the software's correctness and prevents improvements and extensions by other scientists. This situation, common in many disciplines, represents a real impediment to scientific progress. Since the pioneering work of the Donoho [...] maintaining open source scientific software for reproducible research is a critical requirement of the modern scientific enterprise [26, 27].

Other disciplines have started tackling this issue [28], and even in the single-molecule field a few recent publications have provided software for analysis of surface-immobilized experiments [29–33]. For freely-diffusing smFRET experiments, although it is common to find mention of "code available from the authors upon request" in publications, there is a dearth of such open source code, with, to our knowledge, the notable exception of a single example [34].

To address this issue, we have developed FRETBursts, open source Python software for the analysis of freely-diffusing single-molecule FRET measurements. FRETBursts can be used, inspected and modified by anyone interested in using state-of-the-art smFRET analysis methods or implementing modifications or completely new techniques. FRETBursts therefore represents an ideal platform for quantitative comparison of different methods for smFRET burst analysis.

Technically, a strong emphasis has been given to the reproducibility of complete analysis workflows. FRETBursts uses Jupyter Notebooks [35], interactive and executable documents containing textual narrative, input parameters, code, and computational results (tables, plots, etc.). A notebook thus captures the various analysis steps in a document which is easy to share and execute. To minimize the possibility of bugs being introduced inadvertently [36], we employ modern software engineering techniques such as unit testing and continuous integration [28, 37]. FRETBursts is hosted on GitHub [38, 39], where users can write comments, report issues or contribute code. In a related effort, we recently introduced Photon-HDF5 [40], an open file format for timestamp-based single-molecule fluorescence experiments. Another related open source tool is PyBroMo [41], a freely-diffusing smFRET simulator which produces Photon-HDF5 files that are directly analyzable with FRETBursts. Together with all the aforementioned tools, FRETBursts contributes to the growing ecosystem of open tools for reproducible science in the single-molecule field.

1.2 Paper Overview

This paper is written as an introduction to smFRET burst analysis and its implementation in FRETBursts. The aim is to illustrate the specificities and trade-offs involved in various approaches in sufficient detail to enable readers to customize the analysis for their own needs.

After a brief overview of FRETBursts features (section 2), we introduce essential concepts and terminology for smFRET burst analysis (section 3). In section 4, we illustrate the steps involved in smFRET burst analysis: (i) data loading (section 4.1), (ii) definition of the excitation alternation periods (section 4.2), (iii) background correction (section 4.3), (iv) burst search (section 4.4), (v) burst selection (section 4.6) and
