FlyClockbase
bioRxiv preprint doi: https://doi.org/10.1101/099192; this version posted January 9, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

Watching the clock for 25 years in FlyClockbase: Variability in circadian clocks of Drosophila melanogaster as uncovered by biological model curation

Katherine S. Scheuer*,†, Bret Hanlon‡, Jerdon W. Dresel*,†, Erik D. Nolan*,†, John C. Davis‡, Laurence Loewe*,†

* Systems Biology Theme, Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715
† Laboratory of Genetics, University of Wisconsin-Madison, Madison, WI 53706
‡ Department of Statistics, University of Wisconsin-Madison, Madison, WI 53706

General Article Summary

Circadian clocks impact health and fitness by controlling daily rhythms of gene-expression through complex gene-regulatory networks. Deciphering how they work requires experimentally tracking changes in amounts of clock components. We designed FlyClockbase to simplify data-access for biologists and modelers and curated over 400 time series observed in wildtype fruit flies from 25 years of research on clocks. We found differences in peak time variance of the clock-proteins ‘PERIOD’ and ‘TIMELESS’, which probably stem from differences in phosphorylation-network complexity. Combining in-depth circadian-biology, model-curation, and compiler logic, our trans-disciplinary research shows how biology-friendly compilers could simplify model curation enough to democratize it.

POST StabliizingZone Level: QualityQuest. VersionVariant Number: QQv1r0p0_2017m01d05_LL
Running title: FlyClockbase time series variance curation

Keywords / Key phrases:
Drosophila melanogaster circadian clock model biological data input review,
experimental time series peak-valley variance and outlier clock observations,
differential variance hypothesis on PERIOD - TIMELESS amount peak times,
data repository for estimating parameters of mechanistic simulation models,
compiler logic enabling human error analysis simplifying biological model curation.

Corresponding author:
Laurence Loewe
Wisconsin Institute for Discovery, University of Wisconsin-Madison,
330 N Orchard St, Madison, WI, 53715
Email and Phone: [email protected] (608) 316-4324

Statement of data availability and stabilizing versioning number: QQv1
For review purposes: QQv1 zip-archive in Supplemental Material or upon request.
Before final publication: FlyClockbase will be made available on http://github.com/

Abbreviations: Table 1: Core clock components; Table 2: Concepts in FlyClockbase

Supporting Material:
Supporting Text and Tables, Supplemental Statistical Analysis (87 pages),
R-Script zip file (>12K lines, QQv1), FlyClockbase zip file (QQv1).

Abstract

High-quality model curation provides insights by organizing biological knowledge-fragments.
We aim to integrate published results about circadian clocks in Drosophila melanogaster while exploring economies of scale in model curation. Clocks govern rhythms of gene-expression that impact fitness, health, cancer, memory, mental functions, and more. Insights into human clocks were pioneered in flies. Flies simplify investigating complex gene regulatory networks, which express proteins cyclically using environmentally entrained interlocking feedback loops. Simulations could simplify research further, but currently few models test their quality directly against experimentally observed time series scattered across publications. We designed FlyClockbase for robust, efficient access to such scattered data for biologists and modelers, prioritizing simplicity and openness to encourage experimentalists to preserve more annotations and raw-data. Such details could multiply long-term value for modelers interested in meta-analyses, parameter estimates, and hypothesis testing. Currently, FlyClockbase contains over 400 wildtype time series of core circadian components systematically curated from 86 studies published between 1990 and 2015. Using FlyClockbase, we show that PERIOD protein amount peak time variance unexpectedly exceeds that of TIMELESS. We hypothesize that PERIOD’s far more complex phosphorylation rules are responsible. Human error analysis improved data quality and revealed significance-degrading outliers, possibly violating the presumed absence of wildtype heterogeneity or lab evolution. We found that PCR-measured peak time variances exceed those from other methods, pointing to initial count stochasticity. Our trans-disciplinary analyses demonstrate how compilers with more biology-friendly logic could simplify, guide, and naturally distribute biological model curation. The resulting quality increases and cost reductions benefit curation-dependent grand challenges like personalizing medicine.
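To make the peak time variance comparison in the abstract concrete, the following Python sketch shows one way such a comparison could be set up. It is an illustration under stated assumptions, not the authors' R analysis: the use of circular variance on a 24 h cycle, the permutation test, and all function names are hypothetical choices made here because peak times wrap around midnight.

```python
import math
import random

PERIOD_H = 24.0  # assumed cycle length in hours (Zeitgeber time)


def _mean_vector(times_h):
    """Mean cosine and sine of the times mapped onto the 24 h circle."""
    angles = [2.0 * math.pi * (t % PERIOD_H) / PERIOD_H for t in times_h]
    n = len(angles)
    return sum(map(math.cos, angles)) / n, sum(map(math.sin, angles)) / n


def circular_variance(times_h):
    """1 - R, with R the mean resultant length: 0 = identical, 1 = fully spread."""
    c, s = _mean_vector(times_h)
    return 1.0 - math.hypot(c, s)


def center(times_h):
    """Signed deviations (hours, in [-12, 12)) from the group's circular mean."""
    c, s = _mean_vector(times_h)
    mean_h = (math.atan2(s, c) * PERIOD_H / (2.0 * math.pi)) % PERIOD_H
    half = PERIOD_H / 2.0
    return [((t - mean_h + half) % PERIOD_H) - half for t in times_h]


def permutation_p_value(group_a, group_b, n_perm=2000, seed=0):
    """One-sided p-value for 'group_a peak times vary more than group_b'.

    Hypothetical helper: both groups are centered first so that a difference
    in mean peak time is not mistaken for a difference in spread.
    """
    a, b = center(group_a), center(group_b)
    observed = circular_variance(a) - circular_variance(b)
    pooled = a + b
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = circular_variance(pooled[:len(a)]) - circular_variance(pooled[len(a):])
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)
```

With two lists of peak times in hours, for example one for PERIOD and one for TIMELESS, `permutation_p_value(per_peaks, tim_peaks)` returns a one-sided p-value. Treating the times as circular avoids the artifact where peaks at hour 23 and hour 1 appear far apart on a linear scale.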
Table of Contents

INTRODUCTION
Challenges --- Circadian clocks --- Model organisms --- Math models
Estimating unknown rates from observed time series --- Biological example --- Models and reality
Reproducibility of research --- Firm foundations --- Problems with label reproducibility
Statistical reproducibility --- Statistical error iceberg --- Reproducibility in genetics
Versioned Biological Information Resources (VBIRs) --- Importance of versioned data integrators
Flexibility of VBIRs --- Database integration --- Cochrane review
Genome projects as a model for VBIRs development on a broader scale
Genome projects have revolutionized biology --- Importance of biological model curation
How a compiler could help in biological model curation --- Enforce best practices
Efficiencies of scale --- Opportunity --- Purpose of this study --- Formal organization of Versioned Biological Information Resources (VBIRs) --- Data integrated by curation --- Hypothesis tested: biological variability --- Hypothesis tested: observation method --- Human error analysis
Compiler logic design --- Importance of efficient biological model curation --- Overview of Sections

MODELS
Biological model of fly circadian clocks --- Main loop --- Other loops
In silico models integrating fly clock observations --- Biological results overview
Role of stochasticity --- Shared problems --- Parameter estimation in complex models
Using abstract time series traits --- Using complete observed time series --- Using both, complete observed time series and abstract traits --- Complete experimentally observed time series --- Working with abstract time series traits --- Distances --- Both sides offer advantages
Studies integrating time series data
FlyClockbase data model overview --- Overview --- SumS, DetS --- Logic in biology
Other design decisions --- Simple file system storage --- Flexibility --- Data types for organizing content --- Content, Attributes, Traits --- Identification of TimeSeries --- IDL, IDF, IDM, Ref.Fig.TS --- Raw, Mod, and Odd Observations --- Data types of time measurements
ZT, DZT, CZT --- Data types of amounts --- Current definition of scope --- Collection of data

MATERIALS AND METHODS
Literature search --- Initial eligibility assessment --- Biological eligibility assessment
TimeSeries data extraction --- Accuracy estimates of digitized TimeSeries data
Extracting TimeSeries Attributes
TimeSeries Traits analysis of Peaks and Valleys refined by ObsOdd checks
Measuring a limit for maximal peak time variance
Factors contributing to increased trait variance --- Linearizing Time Series Data
Statistical Analyses --- Automated analysis with R script --- Outlier analysis
Testing differences in variance --- Testing differences in mean
Supplemental Statistical Analysis

RESULTS
Experimental observations used in modeling
FlyClockbase is a new resource enabling studies of circadian clock TimeSeries
Hypothesis testing --- Error analysis --- Overview of