entropy
Article Testing of Multifractional Brownian Motion
Michał Balcerek *,† and Krzysztof Burnecki † Faculty of Pure and Applied Mathematics, Hugo Steinhaus Center, Wroclaw University of Science and Technology, Wyspianskiego 27, 50-370 Wroclaw, Poland; [email protected] * Correspondence: [email protected] † These authors contributed equally to this work.
Received: 18 November 2020; Accepted: 10 December 2020; Published: 12 December 2020
Abstract: Fractional Brownian motion (FBM) is a generalization of the classical Brownian motion. Most of its statistical properties are characterized by the self-similarity (Hurst) index 0 < H < 1. In nature one often observes changes in the dynamics of a system over time. For example, this is true in single-particle tracking experiments where a transient behavior is revealed. The stationarity of increments of FBM restricts substantially its applicability to model such phenomena. Several generalizations of FBM have been proposed in the literature. One of these is called multifractional Brownian motion (MFBM) where the Hurst index becomes a function of time. In this paper, we introduce a rigorous statistical test on MFBM based on its covariance function. We consider three examples of the functions of the Hurst parameter: linear, logistic, and periodic. We study the power of the test for alternatives being MFBMs with different linear, logistic, and periodic Hurst exponent functions by utilizing Monte Carlo simulations. We also analyze mean-squared displacement (MSD) for the three cases of MFBM by comparing the ensemble average MSD and ensemble average time average MSD, which is related to the notion of ergodicity breaking. We believe that the presented results will be helpful in the analysis of various anomalous diffusion phenomena.
Keywords: multifractional Brownian motion; autocovariance function; power of the statistical test; Monte Carlo simulations
1. Introduction
Over the last decades, massive advances in single-particle tracking (SPT), partially based on superresolution microscopy of fluorescently tagged tracers, or fluorescence correlation spectroscopy have allowed experimentalists to obtain insight into the motion of submicron tracer particles or even single molecules in complex environments, such as living biological cells, down to nanometer precision and at submillisecond time resolution [1,2]. The data obtained in SPT experiments often show pronounced deviations from Brownian motion, namely, anomalous diffusion of the power-law form
$$\mathbb{E}X^2(t) \simeq K_\alpha t^{\alpha} \tag{1}$$
of the mean-squared displacement (MSD) is observed [3–8]. Here $K_\alpha$ is the anomalous diffusion coefficient. Depending on the magnitude of the anomalous diffusion exponent $\alpha$ we distinguish subdiffusion for $0 < \alpha < 1$ from superdiffusion with $\alpha > 1$ [5,8]. Subdiffusion is typically observed for submicron particles in both bacterial and eukaryotic cells [6,9–13], in artificially crowded [14,15] and structured [16–19] liquids, in pure and protein-crowded lipid bilayer systems [12,20–25], as well as in groundwater systems [26]. Superdiffusion occurs in the presence of active motion, for instance, in living biological cells [27–29] or due to bulk-surface exchange [30,31]. While typically anomalous
diffusion refers to the power-law behavior (1) with a fixed $\alpha$, an increasing number of systems are reported in which the local scaling exponent of the MSD (1) is an explicit function of time, $\alpha(t)$. Such transient behavior has, for instance, been observed for green fluorescent proteins in cells or for the motion of lipid molecules in protein-crowded bilayer membranes [25,32].
Entropy 2020, 22, 1403; doi:10.3390/e22121403 www.mdpi.com/journal/entropy
Fractional Brownian motion (FBM), introduced by Kolmogorov in 1940 and rediscovered by Mandelbrot and van Ness in 1968 [33–35], is a generalization of the classical Brownian motion (BM). Most of its statistical properties are characterized by the self-similarity (Hurst) index $0 < H < 1$. FBM is $H$-self-similar, namely for every $c > 0$ we have $B_H(ct) \stackrel{D}{=} c^H B_H(t)$ in the sense of all finite dimensional distributions, and has stationary increments. It is the only Gaussian process satisfying these properties. FBM is the overdamped description for viscoelastic motion and is thus intimately connected to the fractional Langevin processes, an attractive framework for many physical systems [36], for instance, of lipid molecules in bilayer membranes [22,25,37]. The second moment of FBM reads $\mathbb{E}B_H^2(t) = \sigma^2 t^{2H}$, where $\mathbb{E}B_H^2(1) = \sigma^2 > 0$. As a consequence, for $H < 1/2$ we obtain subdiffusive dynamics with antipersistent motion, whereas for $H > 1/2$ the process is superdiffusive and persistent. Since FBM is the classical model for power-law dependence, a number of statistical tests have already been introduced for this process in the literature. Let us mention here the tests based on the autocovariance function (ACVF), MSD, and detrending moving average statistics [38–40]. The stationarity of the increments of FBM does not allow us to model processes whose regularity of paths and “memory depth” change in time [41]. Several generalizations of FBM have been proposed recently.
One of these, called multifractional Brownian motion (MFBM), was proposed by Peltier and Véhel [42] with a time-varying Hurst exponent $H(t)$ which is a Hölder function. The variance at time $t$ of the MFBM $B_{H(t)}(t)$ is given by $\mathrm{Var}(B_{H(t)}(t)) = \sigma^2 t^{2H(t)}$ [43]. The time-varying Hurst exponent $H(t)$ characterizes the path regularity of the process at time $t$: sample paths near $t$ with small $H(t)$, close to 0, are space-filling and highly irregular, while paths with large $H(t)$, close to 1, are very smooth. The variance constant $\sigma^2$ determines the “energy level” of the process. This natural extension of FBM results in the loss of some of FBM's basic properties; in particular, the increments of MFBM are non-stationary and the process is no longer self-similar. Other, similar generalizations are limited to a piecewise constant $H$ [44] but, importantly from a data analysis point of view, they lead to continuous Gaussian processes with stationary increments. Let us also mention an idea involving an appropriate class of covariance functions. Ryvkina [45] uses such covariance functions to define Gaussian processes that extend FBM and MFBM to a class of fractional Brownian motions with a variable Hurst parameter, parameterized by the set of all measurable functions with values in $(1/2, 1)$, and different from MFBMs. However, from a biological data point of view, such a range of $H$ values is not practical since it only corresponds to the superdiffusive (long-range dependent) case. MFBMs have become popular as flexible models for describing real-life signals with high-frequency features in geoscience, microeconomics, and turbulence, to name a few [43]. They are closely related to the notion of transient diffusion dynamics observed in biological experiments.
The article is structured as follows. In Section 2, the MFBM is defined and its basic properties are presented.
We also recall formulas for the ensemble average MSD and the time average MSD, and present three Hurst exponent functions that will be analyzed in the sequel. In Section 3, the main results are presented. We introduce a statistical test on MFBM based on its ACVF which is presented as a quadratic form. Next, the power of the test is studied for the three cases corresponding to different Hurst exponent functions. We show the areas where the test is very strong in distinguishing between the processes and the cases when it fails in this respect. Finally, Section 4 summarizes and concludes our work.
2. Model and Methods
Let us start with a definition of the MFBM.
Definition 1. (Multifractional Brownian motion). A process $(B_{H(t)}(t))_{t \ge 0}$ is called a multifractional Brownian motion (MFBM) if it is a centered Gaussian process with covariance function
$$\mathrm{Cov}(B_{H(t)}(t), B_{H(s)}(s)) = D(H(t), H(s)) \left( t^{H(t)+H(s)} + s^{H(t)+H(s)} - |t - s|^{H(t)+H(s)} \right), \tag{2}$$
where
$$D(x, y) = \frac{\sigma^2 \sqrt{\Gamma(2x+1)\,\Gamma(2y+1)\,\sin(\pi x)\,\sin(\pi y)}}{2\,\Gamma(x+y+1)\,\sin\left(\pi \frac{x+y}{2}\right)} \tag{3}$$
for some $\sigma > 0$ and Hölder function $H : [0, \infty) \to [a, b] \subset (0, 1)$ of some exponent $\beta > 0$ [46].
The second moment of MFBM scales as $\mathbb{E}B_{H(t)}^2(t) = \sigma^2 t^{2H(t)}$. Hence, we will call $H(t)$ the Hurst exponent function. Furthermore, for $H(t) \equiv H \in (0, 1)$ MFBM becomes standard FBM. In general, MFBM has non-stationary increments. Its increment process $Y(t) \stackrel{\mathrm{def}}{=} B_{H(t+1)}(t+1) - B_{H(t)}(t)$ possesses the long-range dependence property, in the sense that
$$\forall \delta > 0,\ \forall s \ge 0 \quad \sum_{k=0}^{\infty} \left| \mathrm{Corr}(Y(s), Y(s + k\delta)) \right| = +\infty,$$
where Corr is the correlation function, i.e., $\mathrm{Corr}(Y(t), Y(s)) = \frac{\mathrm{Cov}(Y(t), Y(s))}{\sqrt{\mathbb{E}Y^2(t)\,\mathbb{E}Y^2(s)}}$ [46]. Furthermore, the function $H(t)$ can be pointwise interpreted as a local self-similarity parameter, i.e.,
$$\lim_{\varepsilon \to 0^+} \left( \frac{B_{H(u+\varepsilon t)}(u + \varepsilon t) - B_{H(u)}(u)}{\varepsilon^{H(u)}} \right)_{t \in \mathbb{R}_+} = s(u) \left( B_H(t) \right)_{t \in \mathbb{R}_+},$$
where $B_H$ is a fractional Brownian motion with index $H \equiv H(u)$, $s(u)$ is a scaling function [47] and the convergence is on the space of continuous functions endowed with the topology of the uniform convergence on compact sets.
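To make Definition 1 concrete, the following minimal Python sketch evaluates the covariance (2)–(3) on a time grid and draws trajectories through a Cholesky factorization of the covariance matrix. The function names (`mfbm_cov_matrix`, `simulate_mfbm`), the short integer grid, and the small diagonal `jitter` are illustrative assumptions. Cholesky sampling is one standard way to simulate a Gaussian process from its covariance and is not necessarily the method used to produce the figures in this paper.

```python
import math
import numpy as np

_gamma = np.vectorize(math.gamma)  # elementwise Gamma function


def mfbm_cov_matrix(times, hurst, sigma2=1.0):
    """Covariance matrix of MFBM at `times`, following Equations (2)-(3).

    `hurst` is a callable t -> H(t) with values in (0, 1).
    """
    t = np.asarray(times, dtype=float)
    h = np.array([hurst(u) for u in t])
    x, y = h[:, None], h[None, :]      # H(t) along rows, H(s) along columns
    tt, ss = t[:, None], t[None, :]
    # D(x, y) from Equation (3)
    num = sigma2 * np.sqrt(_gamma(2 * x + 1) * _gamma(2 * y + 1)
                           * np.sin(np.pi * x) * np.sin(np.pi * y))
    den = 2 * _gamma(x + y + 1) * np.sin(np.pi * (x + y) / 2)
    e = x + y                          # exponent H(t) + H(s)
    return num / den * (tt ** e + ss ** e - np.abs(tt - ss) ** e)


def simulate_mfbm(times, hurst, n_paths, rng, jitter=1e-12):
    """Draw `n_paths` trajectories as X = L Z, where L L^T is the covariance."""
    cov = mfbm_cov_matrix(times, hurst)
    # The jitter is an assumption of this sketch; it guards against round-off.
    chol = np.linalg.cholesky(cov + jitter * np.eye(len(cov)))
    z = rng.standard_normal((n_paths, len(cov)))
    return z @ chol.T                  # each row is one trajectory


# Scaled-down analogue of a linearly increasing Hurst exponent (hypothetical grid).
times = np.arange(1, 51)
paths = simulate_mfbm(times, lambda t: 0.3 + 0.3 * t / 50, n_paths=3,
                      rng=np.random.default_rng(0))
```

On the diagonal, $D(x, x) = \sigma^2/2$, so the sketch reproduces the variance $\sigma^2 t^{2H(t)}$, and a constant $H$ recovers the FBM covariance.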
2.1. Mean-Squared Displacement
Let us now recall different estimators of the MSD for a sample of n trajectories X1, X2, ... Xn, each with N observations, that is, Xi consists of Xi(t1), Xi(t2), ... , Xi(tN) equally spaced in time, i = 1, 2, . . . , n. Ensemble average MSD (EAMSD) is defined as follows:
$$\mathrm{EAMSD}(\tau = m\Delta t) = \frac{1}{n} \sum_{k=1}^{n} \left( X_k(t_1 + \tau) - X_k(t_1) \right)^2, \tag{4}$$
where $m = 1, \ldots, N - 1$ and $\Delta t = t_2 - t_1$. Time average MSD (TAMSD) is defined for each trajectory as
$$\mathrm{TAMSD}(\tau = m\Delta t, k) = \frac{1}{N - m} \sum_{j=1}^{N-m} \left( X_k(t_j + m\Delta t) - X_k(t_j) \right)^2. \tag{5}$$
Finally, we consider ensemble and time average MSD (EATAMSD) which is an average of TAMSDs:
$$\mathrm{EATAMSD}(\tau) = \frac{1}{n} \sum_{k=1}^{n} \mathrm{TAMSD}(\tau, k). \tag{6}$$
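The three estimators (4)–(6) can be sketched in a few lines of NumPy. The array layout (one trajectory per row, equally spaced observations in columns) and the function names are assumptions made for illustration only.

```python
import numpy as np


def eamsd(trajs, m):
    """Ensemble average MSD, Equation (4): displacement from the first observation."""
    trajs = np.asarray(trajs, dtype=float)
    return float(np.mean((trajs[:, m] - trajs[:, 0]) ** 2))


def tamsd(traj, m):
    """Time average MSD of a single trajectory, Equation (5), for lag m >= 1."""
    traj = np.asarray(traj, dtype=float)
    disp = traj[m:] - traj[:-m]        # the N - m displacements at lag m
    return float(np.mean(disp ** 2))


def eatamsd(trajs, m):
    """Ensemble average of the TAMSDs, Equation (6)."""
    return float(np.mean([tamsd(traj, m) for traj in trajs]))


# Toy check on deterministic "trajectories" X_k(t_j) = j: every lag-m
# displacement equals m, so all three estimators return m squared.
toy = np.tile(np.arange(1.0, 11.0), (3, 1))
print(eamsd(toy, 2), tamsd(toy[0], 3), eatamsd(toy, 3))
```

For a process with stationary increments the EAMSD and EATAMSD estimate the same quantity; for MFBM they can differ because the EAMSD uses only displacements from $t_1$, while the TAMSD averages displacements along the whole trajectory.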
Physicists often observe systems where EAMSD and EATAMSD are different. Such behavior is called weak ergodicity breaking. In mathematics, the notion of ergodicity is restricted to stationary processes. Since the increments of MFBM lack stationarity, one has to be careful when analyzing the results.
2.2. Three Cases of the Hurst Exponent Function
Following [48], we consider three basic families of the function $H(t)$, namely
linear: $H(t) = at + b$, $t \in [0, T]$,
logistic: $H(t) = \dfrac{c - b}{1 + \exp\left\{-d\,\frac{t - t_0}{T}\right\}} + b$, $t \in [0, T]$,
periodic: $H(t) = a \sin\left(4\pi \frac{t}{T}\right) + b$, $t \in [0, T]$,
for some time horizon $T > 0$. Furthermore, in the sequel we consider only the case with parameter $\sigma^2 = 1$ in (2). Such functions are continuously differentiable and, as a consequence, satisfy the Hölder condition on $[0, T]$, so for MFBM to be properly defined we only require $H(t) \in (0, 1)$ for all $t \in [0, T]$ [42]. This choice of functions can be interpreted as follows. In the linear case, MFBM can switch steadily from short- to long-range dependence or vice versa, whereas in the logistic case such a change is quite rapid and happens between two levels. The latter case closely resembles an instantaneous change in dependence or jump-type regime switching (such cases would lead to a non-Hölder function). An alternative to the logistic function which is also considered in the literature is the arctan function [49]. Finally, the periodic case represents a situation where such changes are gradual and repetitive. In the paper, we focus on the following special cases with specified parameters:
linear function: $H^{(1)}(t) = \dfrac{0.3}{1000}\,t + 0.3$, $t \in [0, 1000]$,
logistic function: $H^{(2)}(t) = \dfrac{0.3}{1 + \exp\left\{-100\,\frac{t - 500}{1000}\right\}} + 0.3$, $t \in [0, 1000]$,
periodic case: $H^{(3)}(t) = 0.15 \sin\left(4\pi \frac{t}{1000}\right) + 0.45$, $t \in [0, 1000]$.
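The three special cases can be written down directly. The sketch below uses only the Python standard library; the function names and the helper `is_admissible` are hypothetical and serve only to check the requirement $H(t) \in (0, 1)$ on a grid.

```python
import math

T = 1000.0  # time horizon of the three special cases


def hurst_linear(t, a=0.3 / 1000, b=0.3):
    """H^(1)(t) = a t + b."""
    return a * t + b


def hurst_logistic(t, b=0.3, c=0.6, d=100.0, t0=500.0, horizon=T):
    """H^(2)(t): rapid switch from level b to level c around t0."""
    return (c - b) / (1.0 + math.exp(-d * (t - t0) / horizon)) + b


def hurst_periodic(t, a=0.15, b=0.45, horizon=T):
    """H^(3)(t): two full sine periods on [0, horizon]."""
    return a * math.sin(4.0 * math.pi * t / horizon) + b


def is_admissible(hurst, times):
    """MFBM is well defined only if 0 < H(t) < 1 at every grid point."""
    return all(0.0 < hurst(t) < 1.0 for t in times)


grid = range(0, 1001)
for h in (hurst_linear, hurst_logistic, hurst_periodic):
    values = [h(t) for t in grid]
    print(h.__name__, min(values), max(values), sum(values) / len(values))
```

With the default parameters all three functions stay within $[0.3, 0.6]$ and average to about 0.45 on this grid. A choice such as `a = 1 / 1000`, `b = 0.3` in the linear family leaves $(0, 1)$ and is therefore not admissible.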
We choose those specific parameters so that all of the cases have a similar “average” behavior of the function $H(t)$, i.e., its mean is close to 0.45, and the function itself has values in the interval $[0.3, 0.6]$. We illustrate those cases in Figures 1–3. On the top left panel of each of these figures, we can see three simulated trajectories. The function $H(t)$ is presented on the top right panel, whereas on the bottom panel we can see the behavior of the corresponding MSDs. It is important to note that EAMSD (blue line) is directly related to the variance of the model at time $\tau$, i.e., $\mathrm{EAMSD}(\tau) = \widehat{\mathrm{Var}}(X(\tau))$; thus, from (2), it should behave like $\tau^{2H(\tau)}$. For the linear case, see Figure 1, since $H^{(1)}(t)$ increases steadily from 0.3 to 0.6, we can see that trajectories exhibit more variability at later times. This is also related to the EAMSD dynamics. In addition, such a model exhibits weak ergodicity breaking behavior (i.e., lack of equality between EATAMSD and EAMSD), which can be inferred from the bottom panel. Next, for the logistic case, see Figure 2, $H^{(2)}(t)$ increases quite rapidly from 0.3 to 0.6 near $t = 500$. As a consequence, we can see a switch in the behavior of simulated trajectories: for times $t < 500$ they exhibit far less variability than for $t > 500$. Intuitively, for times $t < 500$ trajectories locally exhibit short-range dependence, whereas for $t > 500$ they locally exhibit long-range dependence. Again, we can see weak ergodicity breaking on the bottom panel.
Figure 1. MFBM with the linear Hurst exponent function. Top left panel: three simulated trajectories. Top right panel: illustration of the function $H(t)$ used in simulations. Bottom panel: comparison of EAMSD (solid blue line) with EATAMSD (dashed red line) and its 95% confidence interval (red shaded area). EAMSD and EATAMSD with confidence interval were calculated on the basis of 1000 simulated trajectories of MFBM.
Figure 2. MFBM with the logistic Hurst exponent function. Top left panel: three simulated trajectories. Top right panel: illustration of the function $H(t)$ used in simulations. Bottom panel: comparison of EAMSD (solid blue line) with EATAMSD (dashed red line) and its 95% confidence interval (red shaded area). EAMSD and EATAMSD with confidence interval were calculated on the basis of 1000 simulated trajectories of MFBM.
Finally, for the periodic case, see Figure 3, $H^{(3)}(t)$ varies between 0.3 and 0.6. We can clearly see two different regimes of behavior: for times when $H^{(3)}(t)$ is bigger, trajectories are smoother and generally have larger values, in contrast to times when $H^{(3)}(t)$ is smaller. Despite the lack of stationarity, here EAMSD almost always lies in the confidence region of EATAMSD, which could suggest there is no weak ergodicity breaking.
Figure 3. MFBM with the periodic Hurst exponent function. Top left panel: three simulated trajectories. Top right panel: illustration of the function $H(t)$ used in simulations. Bottom panel: comparison of EAMSD (solid blue line) with EATAMSD (dashed red line) and its 95% confidence interval (red shaded area). EAMSD and EATAMSD with confidence interval were calculated on the basis of 1000 simulated trajectories of MFBM.
3. Results
In applications, it is crucial to be able to check whether a stochastic model describes empirical data well. Despite dedicated identification methods for the MFBM [50–53], to the best of the authors’ knowledge, there is no rigorous statistical test designed for such a process. Here, we propose an approach using a simple test statistic which also contains useful information about the process itself.
3.1. Test
For testing purposes, we follow an approach based on the ACVF which was introduced by Balcerek and Burnecki [38]. ACVF is a very popular statistic and it is also one of the simplest quadratic forms. For a random sample $X_N = \{X(1), X(2), \ldots, X(N)\}$ and $\tau \in \{1, 2, \ldots, N - 1\}$, it is defined as follows:
$$\mathrm{ACVF}_N(\tau) = \frac{1}{N - \tau} \sum_{i=1}^{N-\tau} X(i + \tau) X(i). \tag{7}$$
Here, we only consider a version of ACVF without subtracting the sample mean, as it does not influence the performance of tests based on this statistic for a centered process [38] and it makes the formulas much simpler. Let us now introduce a matrix $A(\tau) = \{a(\tau; i, j)\}_{i,j=1}^{N}$, where
$$a(\tau; i, j) = \begin{cases} \frac{1}{N}\, I(i = j) & \text{if } \tau = 0, \\ \frac{1}{2} \frac{1}{N - \tau}\, I(|i - j| = \tau) & \text{if } \tau = 1, 2, \ldots, N - 1, \\ 0 & \text{otherwise,} \end{cases} \tag{8}$$
and $I$ is the indicator. To summarize, the matrix $A(\tau)$ is either diagonal (for $\tau = 0$) with elements $\frac{1}{N}$ on the diagonal, or Toeplitz with only two nonzero subdiagonals (starting at the $(1 + \tau)$th row and the $(1 + \tau)$th column) with elements $\frac{1}{2} \frac{1}{N - \tau}$. The statistic $\mathrm{ACVF}_N$ can now be expressed as a quadratic form, $\mathrm{ACVF}_N(\tau) = X_N^{T} A(\tau) X_N$, which (as shown in [38]) has a generalized $\chi^2$ distribution, that is
$$\mathrm{ACVF}_N(\tau) = \sum_{i=1}^{N} \lambda_i(\tau) Z_i^2, \tag{9}$$
where the $Z_i$ are i.i.d. standard normal variables (so $Z_i^2$ has a $\chi^2_1$ distribution) and $\lambda_i(\tau)$ are the eigenvalues of the matrix $\Sigma_N(\tau) = \Sigma^{1/2} A(\tau) \Sigma^{1/2}$, with $\Sigma$ being the (theoretical) autocovariance matrix of our trajectory $X_N$. It is important to note that this result is true regardless of whether the considered model is stationary or not. Let us now formulate a test for checking whether a random sample $X_N$ comes from the MFBM with function $H : [0, T] \to (0, 1)$, where $T$ is the time horizon:
$H_0$: sample comes from the model with function $H(t)$ versus

$H_1$: sample comes from the model with function different from $H(t)$.
We will use $\mathrm{ACVF}_N$ as a test statistic with its distribution given by Equation (9) to calculate critical regions of such a test for a given significance level. Naturally, the eigenvalues $\lambda_i(\tau)$ depend on the matrix $A(\tau)$ as well as on the matrix $\Sigma$. Elements of $\Sigma$ are given by the covariance function (2) and are calculated using the function $H(t)$ from the null hypothesis.
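The construction above can be sketched as follows. Since $X_N = \Sigma^{1/2} Z$ for a standard normal vector $Z$, we have $X_N^{T} A(\tau) X_N = Z^{T} \Sigma^{1/2} A(\tau) \Sigma^{1/2} Z$, and diagonalizing the symmetric matrix in the middle gives (9). The code below is a hedged illustration, not the authors' implementation: the function names are hypothetical, the quantiles of (9) are approximated by Monte Carlo sampling, and a two-sided critical region is assumed, none of which is specified in the text. The eigenvalues are computed from $L^{T} A(\tau) L$ with the Cholesky factor $\Sigma = L L^{T}$, which has the same spectrum as $\Sigma^{1/2} A(\tau) \Sigma^{1/2}$ and avoids a matrix square root.

```python
import numpy as np


def acvf_stat(x, tau=1):
    """Sample ACVF of Equation (7), without subtracting the sample mean."""
    x = np.asarray(x, dtype=float)
    return float(np.mean(x[tau:] * x[:len(x) - tau]))


def acvf_matrix(n, tau=1):
    """Matrix A(tau) of Equation (8)."""
    if tau == 0:
        return np.eye(n) / n
    a = np.zeros((n, n))
    i = np.arange(n - tau)
    a[i, i + tau] = 0.5 / (n - tau)
    a[i + tau, i] = 0.5 / (n - tau)
    return a


def null_eigenvalues(sigma, tau=1):
    """Eigenvalues lambda_i(tau) in Equation (9), computed via L^T A(tau) L."""
    chol = np.linalg.cholesky(sigma)
    return np.linalg.eigvalsh(chol.T @ acvf_matrix(len(sigma), tau) @ chol)


def critical_values(eigs, alpha=0.05, n_mc=20000, rng=None):
    """Monte Carlo quantiles of sum_i lambda_i Z_i^2 (two-sided region assumed)."""
    rng = np.random.default_rng() if rng is None else rng
    draws = (rng.standard_normal((n_mc, len(eigs))) ** 2) @ eigs
    return np.quantile(draws, [alpha / 2, 1 - alpha / 2])


def acvf_test(x, sigma, tau=1, alpha=0.05, rng=None):
    """Return True if H0 (covariance matrix `sigma`) is rejected at level alpha."""
    lo, hi = critical_values(null_eigenvalues(sigma, tau), alpha, rng=rng)
    stat = acvf_stat(x, tau)
    return bool(stat < lo or stat > hi)


# Stand-in example: standard Brownian motion, i.e. the H = 1/2 case of (2)
# with sigma^2 = 1, for which Cov(X(t), X(s)) = min(t, s).
rng = np.random.default_rng(0)
t = np.arange(1.0, 21.0)
sigma_bm = np.minimum.outer(t, t)
x = rng.standard_normal(20).cumsum()
print(acvf_test(x, sigma_bm, rng=rng))
```

In practice `sigma` would be filled with the covariance (2) evaluated for the function $H(t)$ from the null hypothesis.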
3.2. Three Power Case Studies
The power of the test is the probability of rejecting the null hypothesis when the alternative is true. The power is an important characteristic of any statistical test. We consider the following null hypotheses, which correspond to the examples presented in Figures 1–3.
linear function null hypothesis $H_0 : H(t) = H^{(1)}(t)$, $t \in [0, 1000]$,
logistic function null hypothesis $H_0 : H(t) = H^{(2)}(t)$, $t \in [0, 1000]$,
periodic case null hypothesis $H_0 : H(t) = H^{(3)}(t)$, $t \in [0, 1000]$.
In our studies, for all considered cases, we calculate the power of the test by using Monte Carlo simulations. We assume that the significance level is equal to 5%. In our Monte Carlo simulations we consider the time horizon $T = 1000$ and equally spaced time points $t = 1, 2, \ldots, 1000$. For each set of parameters from the alternative hypothesis, we simulate $n = 1000$ trajectories, calculate the test statistic (7), and check if the null hypothesis is rejected at the 5% significance level. In the test statistic, we consider only $\tau = 1$ since other choices of $\tau$ lead to worse results. Finally, we estimate the power of this test for each considered case by calculating the fraction of rejected null hypotheses. We present the results in the form of power functions with arguments being the parameters of the function $H(t)$ from the alternative hypothesis. For all of the cases, we considered the alternative coming from the same family of functions as the function $H(t)$ in the null hypothesis, i.e., a linear alternative for the linear null, etc. First, let us consider testing MFBM with the linear Hurst exponent function. We can see the power function related to that case in Figure 4. The left panel presents the power function with respect to parameters $a$ and $b$ from the alternative hypothesis $H_1 : H(t) = at + b$, and the right panel shows the corresponding heat map. We can see “layers” (regions on the heat map with the same color) for which our test has a very similar power. For example, the deep blue region on the right panel corresponds to the processes indistinguishable from the null hypothesis process. We believe that the shape of the region is related to the construction of our test, namely our test statistic $\mathrm{ACVF}_N$ takes into account all terms $X(t)X(t + \tau)$ with the same weight, thus it is not that relevant whether $t$ is big or not. In the case of MFBM, for which neither the process nor its increments are stationary, it might be an important factor.
As a consequence, we can see that the test has difficulty in distinguishing between MFBMs with increasing and decreasing Hurst exponent functions if their means are similar. However, this conjecture is not very precise, namely the mean case $H \equiv 0.45$, which matches the alternative hypothesis with $a = 0$, $b = 0.45$, yields a much higher power of the test than the significance level. We also note that for the parameters from the null hypothesis, $a = 0.3/1000$ and $b = 0.3$, the power of the test is approximately equal to 5%, which is the assumed significance level. On the heat map, white regions represent areas for which the process MFBM is not well defined, i.e., $H(t) \notin (0, 1)$ for some $t \in [0, 1000]$. Let us now consider the second case, that is MFBM with the logistic function $H(t)$. We can observe the power function in Figure 5. The left panel presents the power function with respect to parameters $b$ and $c$ from the alternative hypothesis $H_1 : H(t) = \frac{c - b}{1 + \exp\left\{-d\,\frac{t - t_0}{T}\right\}} + b$, and the right panel shows the corresponding heat map. The parameter $b$ is related to the local self-similarity parameter for $t < 500$, and $c$ for $t > 500$. Similarly to the case with the linear function null hypothesis, here we can observe “layers” in which parameters $b$ and $c$ are almost symmetric (e.g., the null hypothesis case where $b = 0.3$ and $c = 0.6$ is closely related to the case $b = 0.6$ and $c = 0.3$). Moreover, we can see that the power function seems to be quite high in the cases when a tested sample has the $b$ parameter close to the value 0.3 from the null hypothesis, but $c$ is far from 0.6, or vice versa.
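The Monte Carlo power estimate described in this section can be sketched as follows. This is a scaled-down illustration under stated assumptions rather than the authors' code: it uses a grid of 100 time points instead of 1000 to keep the run time short, Cholesky sampling from the covariance (2)–(3), Monte Carlo quantiles for the null distribution (9), and a two-sided critical region. All function names are hypothetical.

```python
import math
import numpy as np

_gamma = np.vectorize(math.gamma)  # elementwise Gamma function


def mfbm_cov(times, hurst):
    """MFBM covariance matrix from Equations (2)-(3) with sigma^2 = 1."""
    t = np.asarray(times, dtype=float)
    h = np.array([hurst(u) for u in t])
    x, y = h[:, None], h[None, :]
    e = x + y
    d = (np.sqrt(_gamma(2 * x + 1) * _gamma(2 * y + 1)
                 * np.sin(np.pi * x) * np.sin(np.pi * y))
         / (2 * _gamma(e + 1) * np.sin(np.pi * e / 2)))
    tt, ss = t[:, None], t[None, :]
    return d * (tt ** e + ss ** e - np.abs(tt - ss) ** e)


def acvf1(paths):
    """ACVF_N(1) of Equation (7) for each row of `paths`."""
    return np.mean(paths[:, 1:] * paths[:, :-1], axis=1)


def estimate_power(times, h_null, h_alt, n_paths=1000, alpha=0.05,
                   n_mc=20000, seed=0):
    """Fraction of trajectories simulated under `h_alt` that reject `h_null`."""
    rng = np.random.default_rng(seed)
    n = len(times)
    # A(1) from Equation (8) and the eigenvalues of the null quadratic form.
    a = np.zeros((n, n))
    i = np.arange(n - 1)
    a[i, i + 1] = 0.5 / (n - 1)
    a[i + 1, i] = 0.5 / (n - 1)
    l_null = np.linalg.cholesky(mfbm_cov(times, h_null))
    eigs = np.linalg.eigvalsh(l_null.T @ a @ l_null)
    # Critical values from Monte Carlo draws of Equation (9).
    null_draws = (rng.standard_normal((n_mc, n)) ** 2) @ eigs
    lo, hi = np.quantile(null_draws, [alpha / 2, 1 - alpha / 2])
    # Simulate under the alternative and count rejections.
    l_alt = np.linalg.cholesky(mfbm_cov(times, h_alt))
    stats = acvf1(rng.standard_normal((n_paths, n)) @ l_alt.T)
    return float(np.mean((stats < lo) | (stats > hi)))


times = np.arange(1, 101)
h0 = lambda t: 0.3 + 0.3 * t / 100           # scaled-down linear null
print("size :", estimate_power(times, h0, h0))
print("power:", estimate_power(times, h0, lambda t: 0.05, seed=1))
```

When the alternative coincides with the null, the estimated rejection rate should be close to the significance level. Scanning `h_alt` over a grid of parameters yields a power surface of the kind shown in the figures.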
Figure 4. Power of the introduced test for the linear Hurst exponent function $H(t) = at + b$ with respect to parameters $a$ and $b$. The null hypothesis is $a = \frac{0.3}{1000}$ and $b = 0.3$. The right panel depicts the results in the form of a heat map with the red ’x’ sign representing the parameters in the null hypothesis. White regions represent areas for which MFBM is not well defined. The powers were calculated by means of Monte Carlo simulations on the basis of simulated data from the MFBM with different values of $a$ and $b$.
Figure 5. Power of the introduced test for the logistic Hurst exponent function $H(t) = \frac{c - b}{1 + \exp\left\{-100\,\frac{t - 500}{1000}\right\}} + b$ with respect to parameters $c$ and $b$. The null hypothesis is $c = 0.6$ and $b = 0.3$. The right panel depicts the results in the form of a heat map with the red ’x’ sign representing the parameters in the null hypothesis. White regions represent areas for which MFBM is not well defined. The powers were calculated by means of Monte Carlo simulations on the basis of simulated data from the MFBM with different values of $c$ and $b$.
Lastly, let us consider the case of MFBM with the periodic function $H(t)$. We can observe the power function in Figure 6. The left panel presents the power function with respect to parameters $a$ and $b$ from the alternative hypothesis $H_1 : H(t) = a \sin\left(4\pi \frac{t}{T}\right) + b$, and the right panel shows the corresponding heat map. Parameter $b$ is related to the “mean” behavior of the function $H(t)$, whereas the parameter $a$ corresponds to its amplitude. Again, similarly to the two previous null hypotheses, we can observe “layers” of similar power values. Those layers are symmetric with respect to $a = 0$. This means that for alternatives with opposite values of the parameter $a$ the test seems to return the same power. Let us note that this is not intuitive, since such opposite values of $a$ correspond to completely different local behaviors of the self-similarity parameter. On the heat map, white regions represent areas for which the process MFBM is not well defined, i.e., $H(t) \notin (0, 1)$ for some $t \in [0, 1000]$.