Minas et al. BMC Bioinformatics (2017) 18:316 DOI 10.1186/s12859-017-1695-8

SOFTWARE Open Access ReTrOS: a MATLAB toolbox for reconstructing transcriptional activity from gene and protein expression data Giorgos Minas1,2*† ,HiroshiMomiji1†, Dafyd J. Jenkins1,4†, Maria J. Costa1,2†, David A. Rand1,3 and Bärbel Finkenstädt2*

Abstract Background: Given the development of high-throughput experimental techniques, an increasing number of whole genome profiling time series data sets, with good temporal resolution, are becoming available to researchers. The ReTrOS toolbox (Reconstructing Transcription Open Software) provides MATLAB-based implementations of two related methods, namely ReTrOS–Smooth and ReTrOS–Switch, for reconstructing the temporal transcriptional activity profile of a gene from given mRNA expression time series or protein reporter time series. The methods are based on fitting a differential equation model incorporating the processes of transcription, translation and degradation. Results: The toolbox provides a framework for model fitting along with statistical analyses of the model with a graphical interface and model visualisation. We highlight several applications of the toolbox, including the reconstruction of the temporal cascade of transcriptional activity inferred from mRNA expression data and protein reporter data in the core in , and how such reconstructed transcription profiles can be used to study the effects of different cell lines and conditions. Conclusions: The ReTrOS toolbox allows users to analyse gene and/or protein expression time series where, with appropriate formulation of prior information about a minimum of kinetic parameters, in particular rates of degradation, users are able to infer timings of changes in transcriptional activity. Data from any organism and obtained from a range of technologies can be used as input due to the flexible and generic nature of the model and implementation. The output from this software provides a useful analysis of time series data and can be incorporated into further modelling approaches or in hypothesis generation. Keywords: Gene transcription, Time series, Transcriptional switches, Circadian timing

Background generation of large numbers of high-resolution genome- Analyzing the temporal dynamics of mRNA and pro- scaletimeseriesdatasets.Theprocessing,analysisand tein expression is a key ingredient to the study of gene summarising of such time series data has a number of the- function within the cell. The widespread adoption of high- oretical and computational difficulties to overcome. If, for throughput transcriptomic and proteomic technologies, example, a protein reporter construct is used, what is the such as microarrays, fluorescent imaging, transcriptional relationship between the observed reporter protein and reporter constructs and sequencing, has enabled the the mRNA expression dynamics of the gene of interest? Moreover, allowing for the process of mRNA and protein *Correspondence: [email protected]; degrading, what is the actual transcriptional activity? [email protected] Here, we present a software toolbox called ReTrOS †Equal contributors (Reconstructing Transcription Open Software) which 1Systems Biology Centre, University of Warwick, Coventry CV34 4HD, UK 2Department of Statistics, University of Warwick, Coventry CV34 4HD, UK provides several approaches for processing and analysing Full list of author information is available at the end of the article both gene or protein expression time series data sets, with

© The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Minas et al. BMC Bioinformatics (2017) 18:316 Page 2 of 11

an easy-to-use graphical interface for user interaction. Let y(ti), i = 1, ..., n, be the discretely observed (not The software is written in the cross-platform MATLAB® necessarily at equidistant time points) imaging protein environment. The approach used in ReTrOS is based on time series. Assuming that the mean of the data are pro- a differential equation model [1] to account for transcrip- portional with unknown factor κ to the concentration of tion and degradation of mRNA molecules and translation the reporter protein, we have and degradation of protein molecules. The model has y(t ) = P˜(t ) + (t ),whereP˜(t) = κP(t),(3) been the basis to a number of novel methodologies to i i i study the dynamics of gene expression [2–6]. The ReTrOS and (ti), i = 1, ..., n are independent normally dis- software currently provides two methods for the pro- tributed random variables each with mean zero and 2 cessing, reconstruction and analysis of gene transcription unknown variance σ (ti). If the observations are mRNA dynamics: expression levels (e.g. measured using microarrays tech- nology), instead of (Eq. 3) we have • ReTrOS-Smooth: based upon the algorithm ˜ ˜ introduced in Harper et al. (2011) [2]. y(ti) = M(ti) + (ti),whereM(t) = κM(t),(4) ReTrOS-Smooth outputs a smooth reconstruction of with (t ), i = 1, ..., n again independent normally transcription activity from a non-parametric i distributed random variables each with mean zero and representation of the transcriptional process. The unknown variance σ 2(t ). algorithm is extended to incorporate both variability i The main difference between ReTrOS-Smooth and of measurement error and uncertainty about model ReTrOS-Switch lies in the formulation of the transcription parameters with credibility envelopes simulated function τ(t) as follows. through a bootstrap procedure. • ReTrOS-Switch: based upon the algorithm ReTrOS-smooth: reconstruction of a smooth transcription introduced in Jenkins et al. (2013) [3]. function ReTrOS-Switch outputs a reconstruction of Here we use a kernel smoothing function that we found transcriptional activity using a temporal “switch” to be more flexible outperforming the spline approach model where transcription rates are subject to particularly when the data changes rapidly over short temporal jumps at unknown time points. A Bayesian time-scales. The user is assumed to have prior informa- inference algorithm using reversible jump Markov tion about the values of the degradation rates and has Chain Monte Carlo is provided for modeling mRNA to provide these through appropriate distributions. The data as in [3], and has, for the purpose of this article, algorithm extends the transcription reconstruction algo- been extended to model protein dynamics. rithm originally introduced in [2] by incorporating both This article introduces and describes the underlying variability of measurement error and model parameter model and the two algorithms implemented in ReTrOS. uncertainty through a bootstrap procedure. ˜( ) We briefly discuss the data requirements and compare the Rewriting (Eq. 2) in terms of P t we first note that, for ˜( ) δ applicability of the two methods. Following the methods a given solution path P t and degradation rate P we can overview, we present several cases of applying ReTrOS compute the mRNA path by concluding with some final remarks summarising the soft- ˜( ) ˜ dP t ˜ ware and its uses. M(t) := καM(t) = + δPP(t).(5) dt ˜ Implementation In ReTrOS-Smooth, the continuous function P(t) is mRNA and protein expression dynamics replaced using kernel smoothing [7] with a bandwidth The basic model underlying ReTrOS is an ordinary differ- optimized by leave–one–out cross–validation. One can ential equation (ODE) model introduced in [1]: then recover the transcriptional dynamics by inserting the estimated M˜ (t) into dM = τ( ) − δ ( ) ˜ ( ) t MM t (1) d M t ˜ dt τ(˜ t) := κατ(t) = + δMM(t) .(6) dP dt = αM(t) − δPP(t) (2) κ α dt If the scaling factors and are unknown, the profile of the reconstructed time paths representing the mRNA where M(t) is the amount of mRNA transcript at time t, and transcriptional dynamics are computed at an arbi- P(t) is the amount of protein at time t, τ(t) istherateof trarylevel.InReTrOSwesetκα = 1 as default parameter transcription/mRNA synthesis, δM istherateofmRNA values. transcript decay, α is the rate of translation/protein syn- This two-step back-calculation is applied if the observed thesis and δP istherateofproteindecay. time courses represent protein data arising, for example, Minas et al. BMC Bioinformatics (2017) 18:316 Page 3 of 11

from reporter protein or fluorescently tagged functional series data. Here it is further extended to include protein protein. If the data y(ti) represent mRNA levels, the pro- dynamics through (Eq. 2). We assume that the transcrip- cedure involves only the one step of the back-calculation tional rate function τ(t) in (Eq. 1) has the profile of a step in (Eq. 6) and instead of (Eq. 3), y(ti) are assumed to be function where the rate is constant between time points, related to the unknown mRNA expression levels, M(t), si, where it changes due to unobserved transcriptional through (Eq. 4). A prior distribution for the mRNA degra- events (for example, activation or inhibition): dation rate, δM, is provided by the user here. The next τ(t) = τi for si < t ≤ si+1, i = 1, ..., k, t ∈[0,L](7) paragraph details the steps of the ReTrOS-Smooth imple- mentation algorithm. with L the total length of the time interval observed. Note that transcription might not be fully turned off and that Variability of the reconstructed profile The variability there may be more than just two states. The posterior dis- τ(˜ ) of the profiles for t depends on the variance of the fit- tributions for the switch-times s1, ..., sk and number of ted kernel function given the sample size and variance of switches k are estimated by the software. Extending this the residuals. ReTrOS-Smooth incorporates uncertainty model to incorporate protein expression dynamics (Eq. 2) about knowledge of the degradation rate parameters. Let gives the solution: ˆ =ˆ( ) − ( )  ti y ti y ti denote the residuals between the data t −δ t −δ t δ u and the fitted smooth kernel function at time ti. The influ- P(t) = P(0)e P + αe P e P M(u) du (8) ence of both sources of variability can be estimated in a 0 straightforward way via bootstrap simulation methods [8]: with M(t) satisying (Eq. 5) and τ(t) as in (Eq. 7). A fast estimation algorithm can be implemented due to ˆ( ) (1) Compute y t using kernel smoothing with a the fact that the ODE system, for given values of the degra- bandwidth optimized by leave–one–out dation parameters during an iteration of the estimation cross–validation. In order to estimate a model for the sampler,canbewrittenasalinearmodelforfunctionsof variance we apply kernel estimation at the same the parameters P(0), M(0) and all τ.Thatis,(Eq.8)canbe σˆ 2( ) bandwidth to fit a smooth function t to the written as a linear regression model, which for the protein ˆ2 squared residuals i . is of the form: (2) Obtain a resampled data profile of same sample size P P P P ∗ P(t) = β0X + β1X + β2X +···+βk+2X + (9) y (ti) =ˆy(ti) + e(ti), ti = 1, ..., n where 0 1 2 k 2 2 τ e(ti) ∼ N(0, σˆ (ti)), i.e. using the variance function where β = P(0), β = M(0), β = 0 , β = 0 1 2 δM 3 ( ) τ −τ τ −τ − evaluated at ti, and such that e ti has the same sign 1 0 , ... ., β + = k k 1 and the regressors are δM k 2 δM as ˆt . i −δ (3) Find yˆ∗(t) using kernel smoothing at the same XP = e Pt 0   α −δ −δ bandwidth as used in step 1. P = Mt − Pt X1 e e (4) Draw values for δP and δM from their prior δP − δM  −δ −δ −δ  distributions. 1 − e Pt e Mt − e Pt ˜ P = α − (5) Calculate M and τ˜ using(Eq.5)and(Eq.6), X2 δP δP − δM respectively. . .   Steps (2) to (5) are repeated R times (by default set to −δ ( − ) −δ ( − ) −δ ( − ) − P t sk M t sk − P t sk 99 in ReTrOS-Smooth), after which point-wise mean esti- P = α 1 e − e e Xk+2 δ δ − δ mates for M˜ and τ˜ are plotted along with estimated 95% P P M confidence envelopes for the reconstructed curves. The and, analogously, for the mRNA [3] values of δ and δ in step (4) are drawn from a gamma P M ( ) = β M + β M +···+β M distribution with mean and standard deviation provided M t 1X1 2X2 k+2Xk+2 (10) bytheuser.IfthetimecourserepresentsmRNAlevels with then steps (1) to (5) are implemented analogously where −δ −δ XM = e Mt, XM = 1 − e Mt, only a prior distribution for δM is required. An exam- 1 2 −δ ( − ) −δ ( − ) M = − M t s1 ... M = − M t sk ple output from the ReTrOS-Smooth method is shown in X3 1 e , , Xk+2 1 e . Fig. 1. Parameter estimation and model output Parameter ReTrOS-switch: reconstruction of a switching transcription estimation is performed using a Markov Chain Monte profile Carlo (MCMC) sampler following the methodological This method is an implementation of the transcriptional approach introduced in [3]. Like for ReTrOS-Smooth the switch model inference introduced in Finkenstädt et al. [1] user is required to provide prior means and standard devi- and Jenkins et al. [3] where it was applied to mRNA time ations so that an informative Gamma distribution can be Minas et al. BMC Bioinformatics (2017) 18:316 Page 4 of 11

a WT:CAB (4) 1.5

1 Protein expression 0.5 10 20 30 40 50 60 70 80 90 b 0.25

0.2

0.15

0.1

0.05 mRNA expression 0 10 20 30 40 50 60 70 80 90 c 0.8

0.6

0.4

0.2 Transcription rate 0 10 20 30 40 50 60 70 80 90 Timescale (hours) Fig. 1 Example output from ReTrOS-Smooth applied to protein data. Data is from a luciferase reporter for the CAB clock gene in wild-type Arabidopsis plants (ROBuST Project) [15]. Panel a) shows the raw input data (blue circles), the kernel-smoothed input data (green)andthemedian fitted protein output (black). Panel b) shows the back-calculated median reporter mRNA expression (red) with 2.5% and 97.5% percentiles in dashed lines.Panelc) shows the back-calculated median transcription activity (blue) with the same percentiles as (b). The model in Eqs. (5) and (6) are used here. Degradation rates (mean, std. dev.) of mRNA and protein are (2.3, 0.46) and (0.13, 0.010), respectively, set by choosing luc reporter in the parameter setting GUI (Fig. 5, upper panel). For other parameter values, see the middle panel of Fig. 5

δ δ = ˆ formulated for the degradation rates, P and M, which are we(i) Me(i) , an estimate of the given expression at updated using a standard Metropolis-Hastings acceptance time ti, derived by a smoothing spline kernel. If mRNA scheme. For protein reporter observations (Eq. 3) is used expression data are provided, the reduced model given M while (Eq. 4) is used for mRNA level observations, where in (Eq. 10) can be inferred using δM and the X design the translation rate parameters, α and κ, if unknown, are matrix only. fixedto1asintheReTrOS-Smoothmodel. Given the nature of the sampling methodology, we have The number and positions of the switch points are to summarise the posterior distributions obtained from sampled using the reversible jump methodology [9]. Esti- the simulated chains. Univariate parameter distributions mates of the values for the β0 to βk+2 coefficients, are summarised in terms of percentiles, whilst the switch representing P(0), M(0) and τ0, ..., τk,areobtained time distribution is summarised as a Gaussian mixture through linear regression of the model (Eq. 9) given δP model, such that each estimated switch event has a mean and δM, protein expression data and the design matrix and a standard deviation derived from the marginal switch XP. The regression can be performed using ordinary time probability density (see [3] for further details). In least squares or a parametric weighted least squares addition to summarising the sampled distributions across method (see, for example, [10] for details) with weights, all model sizes, the sampled switch time density for Minas et al. BMC Bioinformatics (2017) 18:316 Page 5 of 11

specific model sizes are also shown in order of sampled Input data format The data format used is a simple table frequency. Example output summary from the ReTrOS- generated by standard spreadsheet software. A header Switch method applied to microarray mRNA expression row defines the observation time and each row contains data is shown in Fig. 2 and to protein reporter (luciferase) the observed data for a single sample, gene or protein. data in Fig. 3. The N columns in the table can be used to specify the name of the gene/protein or other annotation data and Running ReTrOS the remaining columns contain the observed values for The ReTrOS toolbox provides both a graphical and a the corresponding observation time in the header row command-line interface for selecting input data files and (see Fig. 4). Replicate samples can be either treated as specifying algorithm parameters. Users can either run individual time series, or combined together. the analysis directly from the graphical user interface, or alternatively from the command-line or batch scripts Algorithm parameters A range of user-definable param- generated from the GUI. eters are available for both methods. The user interface

a 4 d 2 10 ELF4 107 sigma (thinned) 10 6 5 8 4 3 6 2 4 mRNA expression a 1

5 1015202530354045 0 2000 4000 6000 b Degradation rate (thinned) 0.15 1.5

0.1 b 1 0.05 Switch PDF 0.5

0 5 1015202530354045 0 2000 4000 6000 c Model size frequency 5000 0.3 c 4000 0.2 3000

2000 0.1 1000

Switch samples (thinned) 0 5 1015202530354045 2 4 6 8 10 Timescale (hours) Number of switches Fig. 2 Example output summary from ReTrOS-Switch applied to mRNA microarray data. Data is from a microarray time-series for the ELF4 clock gene in wild-type Arabidopsis plants (PRESTA project) [12]. Panel a) shows the raw input data (blue circles, extremes of replicates shown in shaded region and median shown with line), the fitted mRNA expression (black line) and the estimated switch events (μ ± 1.96σ as vertical red/green lines). Panel b) shows the baseline-removed estimated switch time probability density (black line) and the estimated switch times Gaussian mixture model (blue line, with shaded red/green region indicating μ ± 1.96σ ). Panel c) shows the accepted switch time samples, along with the switch events 2 (red/green). The burn-in period of the chain is shown by the dashed black line.Paneld) shows the accepted samples from the precision (σ ), δm and δp parameter chains in blue and a histogram of the accepted model size. Summaries of the parameter chains are shown in black (median and lower/upper quartiles) and red (μ and 1.96σ ). The model in Eq. (10) is used here. The default switch parameter values in the parameter setting GUI are used, with the prior mRNA degradation rate (mean = 1.5, std. dev. = 0.18), set by choosing Arabidopsis (see Fig. 5, upper panel) Minas et al. BMC Bioinformatics (2017) 18:316 Page 6 of 11

a e LHY sigma2 (thinned) 2.5 0.09

2 0.08 0.07 1.5 0.06 1 0.05

Protein expression 0.5 10 20 30 40 50 60 70 80 90 100 0 2000 4000 6000 b mRNA degradation rate 7 1 6 0.8 5 0.6 4 0.4 3 0.2

mRNA expression 0 2000 4000 6000 10 20 30 40 50 60 70 80 90 100 protein degradation rate c 0.22 0.15 0.2 0.18 0.1 0.16 0.14 0.05 0.12

Switch PDF 0 2000 4000 6000 0 Model size frequency 10 20 30 40 50 60 70 80 90 100 0.4 d 5000 0.3 4000 0.2

3000 0.1 2000 0 1000 7 9 11 13 15 Number of switches 10 20 30 40 50 60 70 80 90 100

Switch samples (thinned) Timescale (hours)

Fig. 3 Example output summary from ReTrOS-Switch applied to protein data. Data is luciferase tagged protein for the LHY clock gene in wild-type Arabidopsis plants (ROBuST Project) [15]. Panel a) shows the raw input data (blue circles, extremes of replicates shown in shaded region and median shown with line), the fitted protein expression (black line) and the estimated switch events (μ ± 1.96σ as vertical red/green lines). Panel b)showsthe back-calculated mRNA expression profiles (black line) with the estimated switch times. Panel c) shows the baseline-removed estimated switch time probability density (black line) and the estimated switch times Gaussian mixture model (blue line, with shaded red/green region indicating μ ± 1.96σ ). Panel d) shows the accepted switch time samples, along with the switch events (red/green). The burn-in period of the chain is shown by 2 the dashed black line.Panele) shows the accepted samples from the precision (σ ), δm and δp parameter chains in blue and a histogram of the accepted model size. Summaries of the parameter chains are shown in black (median and lower/upper quartiles) and red (μ and 1.96σ ). If mRNA expression data is used, specific output panels are removed accordingly. The model in Eq. (9) is used here. The default switch parameter values in the parameter setting GUI are used, with the prior mRNA and protein degradation rates set by choosing luc reporter provides default values for required parameters (such more appropriate to use. The ReTrOS-Smooth method as number of iterations or bootstrap samples to run) does not require, but can handle, replicate data and due and each method provides default values for all other to the non-parametric approach has fewer restrictions parameters. Most algorithmic parameters can be modi- on data requirements. Model fitting using the ReTrOS- fied through the user interface and all parameters can be Switch method is improved by the use of replicate data modified through the batch scripts (see Fig. 5). samples, but requires that the replicate data samples are reasonably synchronised (both in expression range and Algorithm comparison dynamics) and that changes in transcription activity are Whilst the two algorithms in ReTrOS use the same expected. The algorithmic complexity of the ReTrOS- underlying transcription/translation model, there are a Smooth method is lower than for the ReTrOS-Switch number of considerations as to which method may be method,whichalsorequiresalargenumberofMCMC Minas et al. BMC Bioinformatics (2017) 18:316 Page 7 of 11

Fig. 4 The ReTrOS GUI environment for importing data. The user can import data by selecting the data file directory of a spreadsheet file that is in the same format as in the example data provided with our software. The software automatically detects the number of time-points and replicates, the profiles with unique names, the column that defines the name of each gene/protein. The profile type can be defined by the user as either mRNA or protein reporter (for more details see ReTrOS User Manual)

iterations to estimate the parameter distributions accu- distributions over the timing and type of changes in tran- rately. As such, the computational time required for scriptional activity and therefore it is more appropriate the ReTrOS-Smooth method is far smaller than for the when switching events are of interest. ReTrOS-Switch method. ReTrOS-Smooth processes time series such as in Fig. 1 (4 replicates of time-series of 46 Results data-points) with the default parameter setting (see Fig. 5) The algorithms used in the ReTrOS toolbox have previ- in about 20 s (3.1-GHz Intel Core i7 processor), about 8 ously been applied to a number of different mRNA and times faster on average than ReTrOS-Switch. The com- protein reporter data sets, in addition to those in [1]. putations can be parallelised, which improves processing The study in [2] used the ReTrOS-Smooth to investigate times particularly for large number of protein reporters or the transcriptional dynamics of two protein-reporter sys- genes, and the current ReTrOS implementation provide tems, firefly luciferase (luc) and destabilized enhanced the option to use workers of the parallel pool of the local green fluorescent protein (EGFP), under identical pro- machine. Whilst both methods require informative prior moter control in mammalian cells. The study in [3] used distributions of δm and δp, the mRNA and protein degra- ReTrOS-Switch to explore correlations between discrete dation rates, ReTrOS-Switch samples the parameter dis- transcriptional ‘switch events’ and structure tributions through the Metropolis-Hastings acceptance and also obtained updated degradation rate estimates for scheme and provides an updated posterior estimate of 200 genes displaying a in Arabidopsis the distributions, whereas ReTrOS-Smooth draws directly thaliana leaf samples. Here we also present a new case from the given distributions only. Finally, in addition to study applying ReTrOS to recently published mRNA and the continuous estimate of transcription activity gener- protein-reporter time series of a selection of central Ara- ated by both algorithms, ReTrOS-Switch also generates bidopsis thaliana circadian clock-related genes. Circadian Minas et al. BMC Bioinformatics (2017) 18:316 Page 8 of 11

Fig. 5 The GUI environment for selecting parameter values of ReTrOS-Smooth. Default parameter values are available to the user. The rate parameter values can also be imported by selecting the directory of the corresponding file. A similar GUI environment is available for the ReTrOS-Switch method (for more details see ReTrOS User Manual)

clocks and rhythms are present in most living organ- transcriptional switches, at approximately 6, 10 and 20 h, isms and provide a regulatory method for many important during the first 24-h observation period, followed by three processes. A common feature of circadian clocks is that switches of the same types at approximately the same they consist of both transcriptional and translational com- times during the second 24-h period. However, not all ponents. We use the ReTrOS software to explore the of the analysed profiles displayed identifiable expression oscillatory nature of the mRNA expression and protein dynamics (specifically ELF3) and as such a clear tran- reporter time series data from the model plant Arabidop- scriptional switch profile could not be obtained. The sis thaliana. putative interactions of the repressilator model can be visually identified in many cases: for example the regu- Identifying temporal events in the Arabidopsis thaliana latory interactions by the LHY/CCA1 group (shown in circadian ‘repressilator’ circuit black) can be identified with positive regulation of sev- Using the simplified core ‘repressilator’ model (shown in eral PRR genes (increasing transcription switch event to Fig. 6 top) introduced by Pokhilko et al. [11], we aim increasing transcriptional switch event) and negative reg- to identify the direct and indirect regulatory interac- ulation of several EC genes (increasing transcriptional tions caused by the double-negative in terms of switch event to decreasing transcriptional switch event). transcriptional switch timing. We selected 10 core clock The temporal flow of transcriptional regulation in waves genes (Table 1) including several from each broad gene can also be identified, with the LHY/CCA interactions fol- group, LHY/CCA1, PRRs and Evening Complex (EC), and lowed by the PRR interactions which are followed by the analysed mRNA microarray data from the 2-day wild- EC interactions. type time series dataset [12] using ReTrOS-Switch (shown in Fig. 6). The temporal switch distributions (shown at Effects of light and temperature conditions on circadian ±1.96σ with darker colours representing more frequently marker genes sampled switches) clearly show the circadian nature of We analysed protein luciferase reporter data from a 4-day the transcriptional profiles: for example LHY has three time series dataset of the circadian-controlled CCR2 and Minas et al. BMC Bioinformatics (2017) 18:316 Page 9 of 11

Fig. 6 Arabidopsis clock repressilator circuit model and estimated transcriptional switch times. The top panel shows the core Arabidopsis clock repressilator circuit model from Pokhilko et al. [11], showing the 3 gene groups: LHY/CCA1 (black), PRRs (blue), EC (dark green). Direct regulatory interactions between the groups are shown by the solid lines, while the indirect interactions caused by the double-negative interactions are shown by dotted lines.Thebottom panel shows the switch times and switch types heatmap estimated using the ReTrOS-Switch method for the selected 10 clock genes. The data used here is from microarray time-series for the clock genes in wild-type Arabidopsis plants (PRESTA project) [12]. Switches increasing the transcription rate are shown in red, decreasing transcription switches are shown in green. Darker colours represent more frequently sampled switch times. The direct interactions from the repressilator circuit model are shown by the black, blue and dark green solid lines and the indirect interactions are shown in dashed lines in the same colours. The oscillatory nature of the genes’ transcription is clearly identified in most of the genes and the putative interactions from the repressilator can also be determined in many cases. The default switch parameter values in the parameter setting GUI are used, with the prior mRNA degradation rate set by choosing Arabidopsis

Table 1 Selected Arabidopsis thaliana clock genes CAB2 promoters in a range of wild-type and mutant lines, Gene Group Locus temperature conditions and light regimes [13]. Apply- LHY LHY/CCA1 AT1G01060 ing the ReTrOS-Smooth algorithm to the detrended data CCA1 LHY/CCA1 AT2G46830 and combining replicate observations from the same lines yields clear rhythmic back-calculated transcriptional pro- PRR9 PRRs AT2G46790 files under most experimental conditions. Figure 7 shows PRR7 PRRs AT5G02810 the back-calculated transcription profiles for the wild- PRR5 (NI) PRRs AT5G24470 type CCR2(3) line (shown by the black line) and the cry1 TOC1 - AT5G61380 cry CCR2(3) double-mutant line (shown by the red line at GI - AT1G22770 temperatures of 12, 17 and 27 °C and blue (BL), red (RL) LUX EC AT3G46640 and mixed red-blue (RBL) light conditions. We observe similar behaviours to those identified within the original ELF4 EC AT2G40080 study, such as an increased period in the cry1 cry2 CCR2 ELF3 EC AT2G25930 mutant line at 27 °C under RBL conditions, however, Minas et al. BMC Bioinformatics (2017) 18:316 Page 10 of 11

pipeline as either, a data preprocessing step for rapidly 12oC BL 12oC RBL 12oC RL 1 1 1 obtaining high-resolution back-calculated transcriptional profiles which are then used in other analysis steps, or 0.5 0.5 0.5 directly as an analysis tool extracting distributions of tran- scriptional ‘switching’ activity and degradation rate esti- 0 0 0 mates. There are a number of possible extensions to the 20 40 60 80 20 40 60 80 20 40 60 80 Transcription (norm.) analysis including parallelisation of the ReTrOS-Smooth 17oC BL 17oC RBL 17oC RL 1 1 1 bootstrap procedure and parallel MCMC sampling in the ReTrOS-Switch algorithm, the application of further anal- 0.5 0.5 0.5 ysis methods such as clustering on the model outputs and the use of Bayesian hierarchical modelling for multiple 0 0 0 time series [3]. The methodology for a stochastic tran- 20 40 60 80 20 40 60 80 20 40 60 80 Transcription (norm.) scriptional switch model is considerably more challenging 27oC BL 27oC RBL 27oC RL 1 1 1 and has recently been developed in [5] and [6].

0.5 0.5 0.5 Availability and requirements The ReTrOS toolbox, user manual and example data 0 0 0 are freely available from http://www2.warwick.ac.uk/fac/ 20 40 60 80 20 40 60 80 20 40 60 80 Transcription (norm.) Timescale (hours) Timescale (hours) Timescale (hours) sci/systemsbiology/research/software/ and https://github. Fig. 7 Back-calculated transcription profiles of the wild-type and cry1 com/giorgosminas/ReTrOS under the GNU General Pub- cry2 double-mutant CCR2 reporter lines under different temperature lic License https://www.gnu.org/licenses/gpl-3.0.en.html. and light conditions. Each panel shows the back-calculated wild-type The toolbox runs in MATLAB® that can operate in Win- CCR2 (black line) and the double-mutant cry1 cry2 CCR2 (red line) dows, Mac and Linux (see https://uk.mathworks.com/ transcription profiles at 12 (top row), 17 (middle row)and27°C support/sysreq.html). (bottom row) with blue (BL, left column), mixed red and blue (RBL, middle column)andred light (RL, right column) conditions. The rhythmic activity is observable in all profiles, but with strong damping Additional file at 17 and 27 °C across most light conditions. A lengthening of oscillatory period is particularly observable in the mutant line at 27 °C Additional file 1: Toolbox and data. The MATLAB code for running ReTrOS under RBL conditions. Degradation rates (mean, std. dev.) of mRNA toolbox. This also includes the data used in the illustrative examples and the and protein are respectively (2.8, 0.63) and (0.098, 0.0058) at 12 °C, user manual that provides the instructions on how to run the toolbox. (2.3, 0.46) and (0.13, 0.010) at 17 °C, and (1.2, 0.11) and (0.19, 0.020) at (ZIP 52838 kb) 27 °C. For other parameter values, see the middle panel of Fig. 5 Abbreviations BL: Blue light; CAB2: Chlorophyll A/B-binding protein 2; CCA1: Circadian clock associated 1; CCR2: Cold, circadian rhythm, and RNA binding 2; EC: Evening complex; EGFP: Destabilized enhanced green fluorescent protein; GUI: the back-calculated transcription profile of the cry1 cry2 Graphical user interface; LHY: Late elongated Hypocotyl; luc: firefly luciferase; CCR2 still shows damped rhythmic dynamics. As the MCMC: Markov Chain Monte Carlo; mRNA: Messenger Ribonucleic acid; ODE: back-calculation model takes into account the degrada- Ordinary differential equation; PRR: Pseudo-response regulator; RBL: Mixed red-blue light; ReTrOS: Reconstructing transcription open software; RL: Red tion processes of the luceriferase reporter, finer-scale light structures in the time series are able to be extracted which may allow, for instance, increased accuracy in periodic- Acknowledgments This research is part of the PRESTA (Plant Responses to Environmental STress in ity inference when using methods robust to asymmetric Arabidopsis) and the ROBUST (Regulation Of Biological Signalling by oscillations such as spectrum resampling [14]. Temperature) projects and we thank the groups for high-resolution Arabidopsis thaliana microarray and luciferase time series data and for helpful discussions and advice. We thank S. Calderazzo, K. Hey, P. Brown and D. Conclusions Woodcock for discussions on the methodology and software. Analysis of large-scale and high-throughput data is becoming an increasingly common task for many Funding This work was supported through providing funds by the Biotechnology and researchers. We provide an easy-to-use toolbox for the Biological Sciences Research Council [BB/F005806/1, BB/F005237/1]; and the analysis of mRNA or protein-reporter time series data, Engineering and Physical Sciences Research Council [EP/C544587/1 to DAR]. that generates a fine-scale profile of transcriptional activ- None of the funding bodies have played any part in the design of the study, in the collection, analysis, and interpretation of the data, or in the writing the ity by removing the effects of degradation processes from manuscript. the observed data (Additional file 1). The ReTrOS tool- box has been applied to a variety of datasets from a Authors’ contributions GM developed ReTrOS-Switch for protein data; DJJ developed ReTrOS-Switch range of different technologies and platforms. ReTrOS can for mRNA data; HM and MJC developed ReTrOS-Smooth; DJJ, HM, MJC and be easily incorporated into a computational or analysis GM all programmed and implemented relevant parts of the software; DJJ Minas et al. BMC Bioinformatics (2017) 18:316 Page 11 of 11

implemented final version of software with user manual; GM tested the final KJ, Hall AJ. Network balance via CRY signalling controls the Arabidopsis version of software; DAR provided guidance on mathematical modeling; BF circadian clock over ambient temperatures. Mol Syst Biol. 2013;9(1):650. suggested and provided statistical guidance on statistical methodology; GM, 14. Costa MJ, Finkenstädt B, Roche V, Lévi F, Gould PD, Foreman J, Halliday K, BF and DJJ wrote the manuscript with the help from all authors; All authors Hall A, Rand DA. Inference on periodicity of circadian time series. read and approved manuscript submitted. Biostatistics. 2013;14(4):792–806. 15. Gould PD, Ugarte N, Domijan M, Costa M, Foreman J, MacGregor D, Ro Competing interests se K, Griffiths J, Millar AJ, Finkenstädt B, Penfield S, Rand DA, Halliday KJ, The authors declare that they have no competing interests. Hall AJW. Network balance via CRY signalling controls the Arabidopsis circadian clock over ambient temperatures. Mol Syst Biol. 2013;9(1). Consent for publication doi:10.1038/msb.2013.7. Not applicable.

Ethics approval and consent to participate Not applicable. Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Author details 1Systems Biology Centre, University of Warwick, Coventry CV34 4HD, UK. 2Department of Statistics, University of Warwick, Coventry CV34 4HD, UK. 3Mathematics Institute, University of Warwick, Coventry CV34 4HD, UK. 4School of Life Sciences, University of Warwick, Coventry CV34 4HD, UK.

Received: 5 December 2016 Accepted: 18 May 2017

References 1. Finkenstädt B, Heron EA, Komorowski M, Edwards K, Tang S, Harper CV, Davis JRE, White MRH, Millar AJ, Rand DA. Reconstruction of transcriptional dynamics from gene reporter data using differential equations. Bioinformatics. 2008;24(24):2901–7. 2. Harper CV, Finkenstädt B, Woodcock DJ, Friedrichsen S, Semprini S, Ashall L, Spiller DG, Mullins JJ, Rand DA, Davis JRE, White MRH. Dynamic analysis of stochastic transcription cycles. PLoS Biol. 2011;9(4):1000607. 3. Jenkins DJ, Finkenstädt B, Rand DA. A temporal switch model for estimating transcriptional activity in gene expression. Bioinformatics. 2013;29(9):1158–65. 4. Woodcock DJ, Vance KW, Komorowski M, Koentges G, Finkenstädt B, Rand DA. A hierarchical model of transcription dynamics allows robust estimation of transcription rates in populations of single cells with variable gene copy number. Bioinformatics. 2013;29(12):1519–25. 5. Hey K, Momiji H, Featherstone K, Davis JRE, White MRA, Rand DA, Finkenstädt B. A stochastic transcriptional switch model for single cell imaging data. Biostatistics. 2015;16(4):655–69. 6. Featherstone K, Hey K, Momiji H, McNamara A, Patist AL, Woodburn J, Spiller D, Christian H, McNeilly A, Mullins J, Finkenstädt B, Rand DA, White MRA, Davis JRE. Spatially coordinated dynamic gene transcription in living pituitary tissue. eLife. 2015;16:5–08494. 7. Wand MP, Jones MC. Kernel Smoothing. Monographs on Statistics and Applied Probability, vol 60. Boca Raton: Chapman and Hall/CRC; 1994. 8. Efron B, Tibshirani RJ. An introduction to the bootstrap. Chapman & Hall/CRC monographs on statistics & applied probability. Boca Raton: Taylor & Francis; 1994. 9. Green PJ. Reversible jump Markov chain Monte Carlo computations and Submit your next manuscript to BioMed Central Bayesian model determination. Biometrika. 1995;82:711–32. 10. Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB. and we will help you at every step: Bayesian data analysis. Boca Raton: Chapman & Hall/CRC; 2004. • We accept pre-submission inquiries 11. Pokhilko A, Piñas Fernández A, Edwards KD, Southern MM, Halliday KJ, Millar AJ. The clock gene circuit in Arabidopsis includes a repressilator • Our selector tool helps you to find the most relevant journal with additional feedback loops. Mol Syst Biol. 2012;8:574. • We provide round the clock customer support 12. Windram O, Madhou P, McHattie S, Hill C, Hickman R, Cooke E, Jenkins DJ, • Convenient online submission Penfold CA, Baxter L, Breeze E, Kiddle SJ, Rhodes J, Atwell S, Kliebenstein DJ, Kim YS, Stegle O, Borgwardt K, Zhang C, Tabrett A, • Thorough peer review Legaie R, Moore J, Finkenstädt B, Wild DL, Mead A, Rand D, Beynon J, • Inclusion in PubMed and all major indexing services Ott S, Buchanan-Wollaston V, Denby KJ. Arabidopsis defense against • Maximum visibility for your research Botrytis cinerea: Chronology and regulation deciphered by high-resolution temporal transcriptomic analysis. Plant Cell. 2012;24(9):3530–57. Submit your manuscript at 13. Gould PD, Ugarte N, Domijan M, Costa M, Foreman J, Macgregor D, www.biomedcentral.com/submit Rose K, Griffiths J, Millar AJ, Finkenstädt B, Penfield S, Rand DA, Halliday