PHYSICAL REVIEW X 8, 031038 (2018)
Multiplex Decomposition of Non-Markovian Dynamics and the Hidden Layer Reconstruction Problem
Lucas Lacasa,1,* Inés P. Mariño,2,3,12 Joaquin Miguez,4 Vincenzo Nicosia,1 Édgar Roldán,5,6 Ana Lisica,7 Stephan W. Grill,8,9 and Jesús Gómez-Gardeñes10,11

1School of Mathematical Sciences, Queen Mary University of London, Mile End Road, London E1 4NS, United Kingdom
2Department of Biology, Geology, Physics & Inorganic Chemistry, Universidad Rey Juan Carlos, 28933 Móstoles, Madrid, Spain
3Institute for Women's Health, University College London, London WC1E 6BT, United Kingdom
4Department of Signal Theory & Communications, Universidad Carlos III de Madrid, 28911 Leganés, Madrid, Spain
5The Abdus Salam International Center for Theoretical Physics, Strada Costiera 11, 34151 Trieste, Italy
6Max Planck Institute for the Physics of Complex Systems, Nöthnitzerstraße 38, 01187 Dresden, Germany
7London Center for Nanotechnology, University College London, London, United Kingdom
8Biotechnology Center, Technical University Dresden, Tatzberg 47/49, 01309 Dresden, Germany
9Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstraße 108, 01307 Dresden, Germany
10Department of Condensed Matter Physics, University of Zaragoza, 50009 Zaragoza, Spain
11GOTHAM Lab, Institute for Biocomputation and Physics of Complex Systems (BIFI), University of Zaragoza, E-50018 Zaragoza, Spain
12Department of Applied Mathematics, Lobachevsky State University of Nizhny Novgorod, Nizhny Novgorod, Russia
(Received 2 February 2018; revised manuscript received 7 June 2018; published 7 August 2018)
Elements composing complex systems usually interact in several different ways, and as such, the interaction architecture is well modeled by a network with multiple layers—a multiplex network—where the system's complex dynamics is often the result of several intertwined processes taking place at different levels. However, only in a few cases can such multilayered architecture be empirically observed, as one usually only has experimental access to such structure from an aggregated projection. A fundamental challenge is thus to determine whether the hidden underlying architecture of complex systems is better modeled as a single interaction layer or if it results from the aggregation and interplay of multiple layers. Assuming a prior of intralayer Markovian diffusion, here we show that by using local information provided by a random walker navigating the aggregated network, it is possible to determine, in a robust manner, whether these dynamics can be more accurately represented by a single layer or if they are better explained by a (hidden) multiplex structure. In the latter case, we also provide Bayesian methods to estimate the most probable number of hidden layers and the model parameters, thereby fully reconstructing its architecture. The whole methodology enables us to decipher the underlying multiplex architecture of complex systems by exploiting the non-Markovian signatures on the statistics of a single random walk on the aggregated network. In fact, the mathematical formalism presented here extends above and beyond detection of physical layers in networked complex systems, as it provides a principled solution for the optimal decomposition and projection of complex, non-Markovian dynamics into a Markov switching combination of diffusive modes.
We validate the proposed methodology with numerical simulations of both (i) random walks navigating hidden multiplex networks (thereby reconstructing the true hidden architecture) and (ii) Markovian and non-Markovian continuous stochastic processes (thereby reconstructing an effective multiplex decomposition where each layer accounts for a different diffusive mode). We also state and prove two existence theorems guaranteeing that an exact reconstruction of the dynamics in terms of these hidden jump-Markov models is always possible for arbitrary
*Corresponding author. [email protected]
Published by the American Physical Society under the terms of the Creative Commons Attribution 4.0 International license. Further distribution of this work must maintain attribution to the author(s) and the published article’s title, journal citation, and DOI.
finite-order Markovian and fully non-Markovian processes. Finally, we showcase the applicability of the method to experimental recordings from (i) the mobility dynamics of human players in an online multiplayer game and (ii) the dynamics of RNA polymerases at the single-molecule level.
DOI: 10.1103/PhysRevX.8.031038 Subject Areas: Biological Physics, Complex Systems, Interdisciplinary Physics
I. INTRODUCTION

Network science has emerged as a powerful unifying framework for studying the emergence of collective phenomena in real complex systems from different domains [1,2] and has allowed us to increase the accuracy and predictive power of minimal models of complex dynamics, including epidemic spreading [3], synchronization [4], or social dynamics [5]. One of the most fascinating challenges faced in the last few years by network science is the need to incorporate and couple several network structures in order to correctly capture the inherently multidimensional nature of interaction patterns in real-world systems. As a result, much effort has been recently devoted to the definition and study of multilayer and multiplex networks [6–8]. The ubiquity of such structures in social, biological, and technological systems has required the revision of several canonical dynamical models that were previously studied only on isolated complex networks, including percolation [9–13], diffusion dynamics [14,15], navigation [16–18], epidemics [19–22], evolutionary games [23–25], synchronization [26], or opinion dynamics [27,28]. Notably, the collective behavior of such complex systems depends strongly on whether they can be described by isolated networks or by coupled networks [29], highlighting the importance of the multilayer architecture of real-world systems.

Multilayer network models of real-world systems face two fundamental and dual challenges. The first one is the necessity to assess, in a systematic way, whether a multilayer network model is adequate to represent the system and when such a model gives redundant information. This challenge was first addressed in Ref. [30] and constitutes an intense field of research today. The dual challenge aims at understanding whether an empirical network whose multilayer character is not directly observable is genuinely monolayer or is only an aggregated projection of a hidden multilayer network [see Fig. 1(a) for an illustration of a multiplex network with L = 2 layers and its aggregated projection]. Such a scenario has received much less attention despite being, for instance, central for networks arising in natural systems whose architecture is not directly observable, as in genetic networks or in brain functional networks, where pairs of nodes modeling different brain areas can interact according to an a priori unknown range of different biological pathways [31].

In this article, we provide a method to identify the hidden multilayer structure of a complex system from coarse-grained dynamical measurements of its state. We show that, by using only local information extracted from simple random-walk statistics, it is possible to discriminate whether the underlying structure of the system is actually a single-layer or a multilayer network and, in the latter case, to estimate the number of interacting layers in the system. Note that methods to infer network topological properties via random-walk statistics have been explored previously [32,33]. Notably, our discrimination method exploits the breaking of Markovianity occurring in a coarse-grained multilayer random walk, while the method to estimate the most probable number of layers is based on a maximum a posteriori (MAP) probabilistic criterion, which can be implemented via numerical integration methods, including conventional grid-based approximations [34] or more sophisticated Monte Carlo algorithms [35], which we show increase the computational efficiency.

FIG. 1. (a) A multiplex network with L = 2 layers (L1 and L2) and K = 7 nodes. A random walker diffusing over this structure generates a two-dimensional time series {X(t), l(t)}_{t=1}^N, where X(t) and l(t) are the vertex and layer locations of the walker at time t. In many real-world cases, the layer indicator l(t) is hidden, and one has access only to {X(t)}_{t=1}^N, i.e., to the series of states of the walker on the projected network (P) shown at the bottom of the figure. For L > 1, the resulting trajectory is non-Markovian: We rely on this Markovianity-breaking property to detect multiplexity and to provide an estimate of the number of layers in the system by observing only {X(t)}_{t=1}^N. (b) Simple canonical model where we fix a prior on the topology of each layer: a cycle graph with homogeneous transition rates, with uniform interlayer transition rates. This model serves as a basis for a generic stochastic decomposition of non-Markovian dynamics on the aggregated network.
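The generative model of Fig. 1(a) is straightforward to simulate. The following minimal Python sketch (function and parameter names are our own, not the paper's code) produces the two-dimensional series {X(t), l(t)} for a two-layer multiplex ring; discarding the hidden layer labels yields the observable series used throughout.

```python
import numpy as np

def simulate_multiplex_walk(T, R, n_steps, rng):
    """Markov-switching random walk over L layers of K nodes.

    T : (L, K, K) array of row-stochastic intralayer transition matrices.
    R : (L, L) row-stochastic interlayer switching matrix.
    Returns the node series X(t) (observable) and layer series l(t) (hidden).
    """
    L, K, _ = T.shape
    x, layer = 0, 0
    X, lab = [x], [layer]
    for _ in range(n_steps):
        layer = rng.choice(L, p=R[layer])   # (i) possibly switch layer
        x = rng.choice(K, p=T[layer, x])    # (ii) diffuse within the chosen layer
        X.append(x)
        lab.append(layer)
    return np.array(X), np.array(lab)

def ring(p_fwd, K):
    """Cycle-graph layer: probability p_fwd to step forward, 1 - p_fwd backward."""
    T = np.zeros((K, K))
    for a in range(K):
        T[a, (a + 1) % K] = p_fwd
        T[a, (a - 1) % K] = 1 - p_fwd
    return T

# Two-layer multiplex ring: layer 1 biased (forward prob. 1/3), layer 2 unbiased.
K, r = 7, 0.1
rng = np.random.default_rng(0)
T = np.stack([ring(1 / 3, K), ring(1 / 2, K)])
R = np.array([[1 - r, r], [r, 1 - r]])
X, hidden = simulate_multiplex_walk(T, R, n_steps=10_000, rng=rng)
```

The choice of ring layers and the switching rate r = 0.1 mirror the proof-of-concept example discussed in Sec. II.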
Interestingly, this paper not only deals with a particular problem of network science. As a matter of fact, we show that the mathematical formulation can indeed be applied to signals of arbitrary origin—not necessarily random walkers navigating a network—and the multiplexity estimation framework reduces in the general case to a stochastic decomposition of the signal in terms of an effective multiplex network, whose layers play the role of independent dynamical modes. More concretely, non-Markovian dynamics can thereby be decomposed into a stochastic combination of diffusive modes by projecting the dynamics into an appropriate hidden jump-Markov model.

We validate the proposed methodology with numerical simulations of (i) random walks navigating hidden multiplex networks (thereby reconstructing the true architecture of the hidden multiplex networks) and (ii) both Markovian and non-Markovian continuous stochastic processes (thereby reconstructing an effective multiplex decomposition where each layer accounts for a different dynamical mode). Furthermore, we state and prove two existence theorems, guaranteeing that such a multiplex decomposition is always possible for any finite-order Markovian and infinite-order (fully non-Markovian) process. Specifically, we show that random sequences generated by those processes can be exactly reconstructed as a random walk over an effective multiplex network. Finally, we apply our method to experimental recordings of two complex systems of different nature and show that the method can be leveraged to decompose noisy, non-Markovian processes into alternating combinations of simpler dynamics, and we extract valuable information accordingly.

The article is structured as follows: In Sec. II, we propose the methods for multiplexity detection and layer estimation, and we explore their performance in a few examples. In Sec. III, we discuss the analogy between multiplexity unfolding and the decomposition of non-Markovian processes as multiple-layer Markovian processes; therefore, we extend the methodology to continuous-time processes. We also state and discuss the implications of two existence theorems, whose proofs are put in two appendixes for readability. In Sec. IV, we address real-world scenarios, where we analyze two sets of experimental recordings, namely, human mobility in an online environment and traces of RNA polymerase, which further showcase the applicability of the method. In Sec. V, we provide a discussion of our results. Mathematical details and additional examples can be found in the appendixes.

II. DESCRIPTION OF THE METHODS AND ILLUSTRATIVE EXAMPLES

Multiplex networks are the most ubiquitous class of multilayer networks. They are a natural model for online social networks [36], where a given individual can communicate with others via different platforms (e.g., Facebook, Twitter, email, etc.), or transportation networks [37,38], where a set of locations can be connected in a multimodal way (e.g., bus, train, underground, etc.).

A multiplex network is defined by a set of L ≥ 1 interaction layers (networks), all of them having the same set of K nodes but different topology (different edge set), with the peculiarity that each node has a replica in each layer [see Fig. 1(a) for an illustration]. This structure is thereby fully described by a set of adjacency matrices {A^(l)}_{l=1}^L, where A^(l)_{αβ} = 1 if there is an edge between nodes α and β at layer l and zero otherwise. For simplicity, we label the different layers of the multiplex network with Roman letters (i, j, etc.) and the nodes of each layer with Greek letters (α, β, etc.).

We consider a random walker navigating a multiplex [16] defined as follows: Jumps between layers are governed by a Markov chain with an L × L transition matrix R (R_ij is the probability to jump from layer i to layer j), while the dynamics within each individual layer is also Markovian and determined by a K × K transition matrix T^(l) (where T^(l)_{αβ} is the probability to walk from node α to node β at layer l). For simplicity, we only consider diffusive dynamics, where at each time step, the walker at node α on layer l (i) remains in the same layer with probability R_ll = 1 − r or instantaneously jumps with uniform probability R_ll′ = r/(L − 1) to a different layer l′ and, subsequently, (ii) diffuses to one of the neighbors of node α in the chosen layer according to the layer's internal dynamics (given by T^(l) or T^(l′)). Notice that this type of dynamical model can be mathematically formalized in terms of a jump-Markov affine system, as defined in the field of control theory (see Ref. [39] and references therein for a review).

When r ≪ 1, i.e., when walkers tend to remain in the same layer, this navigation model might mimic, for instance, human mobility in multilayered transportation networks [40,41], where multimodality is minimized to avoid waiting times related to connections between different modes.

In the particular case when layers have a simple cycle-graph topology, the jump-Markov model described above is also reminiscent of the so-called discrete flashing ratchet model [42,43], better known as a Parrondo game [44,45]. This is a paradoxical gambling strategy that allows winning in losing scenarios, where gamblers can alternate between two different strategies (game A, layer 1; game B, layer 2), each of them having different rules and winning probabilities. Our model for L = 2 can be seen as a variant of a Parrondo gambler that plays with probabilities r and 1 − r with two different biased coins.

More generally, Brownian ratchets are paradigmatic models used in nonequilibrium physics to describe the transport of Brownian particles embedded in periodic, asymmetric energy potentials, a paradigm originally proposed by Smoluchowski [46] and popularized by Feynman [47] in the context of thermodynamic engines, and further shown to be a minimal model system for molecular motors in biophysics [43].
Again, a Brownian particle subject to a periodic asymmetric potential that is switched on and off stochastically is formally equivalent to our random walker navigating over a multiplex with L = 2 cycle graphs with different transition matrices [48].

The general navigation process on a multiplex network discussed above can be expressed as a stochastic process fully described by an infinite two-dimensional time series {X(t), l(t)}_{t=1}^∞, where X(t) ∈ {1, …, K} and l(t) ∈ {1, …, L}. As in real-world scenarios the multiplex nature of the system is not always empirically accessible, the layer indicator l(t) is usually hidden, and the only observable is the sequence of node locations X(1), X(2), …. In such a situation, we only have experimental access to partial information of the process, described by a finite sequence of observations of the variable X: O = {X(t)}_{t=1}^N. This is formally equivalent to observing a dynamical process on the aggregated (projected) network. Hence, the question is as follows: Is it possible to discern if the system is multiplex and, in that case, to estimate the most probable number of layers if we only have access to O? We now propose and test a novel method to achieve this highly nontrivial task.

A. Method to detect multiplexity

Consider the Markov switching walker discussed above, navigating on a (multiplex) network from which we only have access to coarse-grained information given by O. Initially assuming X(t) is a Markov process, we can estimate directly from O the (monoplex) transition matrix Q that would describe such a Markovian dynamics. Accordingly, we can define a Markov chain associated to Q and generate a Markovian surrogate of the original process {Y(t)}_{t=1}^N. If the underlying network was truly monoplex, then X(t) would actually be Markov, and X(t) and Y(t) would then have asymptotically equivalent statistics, P_X(Z_1, …, Z_m) = P_Y(Z_1, …, Z_m), for all possible sequences Z_1, …, Z_m of any arbitrary length m. For multiplex structures, however, losing information of l(t), in general, breaks Markovianity, and therefore, X(t) is typically non-Markovian. Accordingly, X(t) and Y(t) now share the same joint distributions only up to blocks of size m = 2. For blocks of size m ≥ 3, their statistics may differ, P_X(Z_1, …, Z_m) ≠ P_Y(Z_1, …, Z_m). To quantify such a difference, we make use of the mth-order Kullback-Leibler divergence rate [49,50] between data blocks of size m ≥ 1:

D(m) := (1/m) Σ_{B(m)} P_X(Z_1, …, Z_m) log [P_X(Z_1, …, Z_m) / P_Y(Z_1, …, Z_m)],  (1)

where B(m) enumerates all the blocks of size m. The statistic D(m) is semipositive definite and vanishes only when the joint probabilities coincide [51]. Thus, by construction, D(1) = D(2) = 0. The Markovianity-breaking criterion implies that if D(m) > 0 for m ≥ 3, then the underlying dynamics is multiplex [52].

As a proof of concept, we first consider the simple scenario where a random walker navigates over a two-layer multiplex ring (each layer is a cycle graph of K nodes), a model compatible with a discrete flashing ratchet as mentioned before. In the first layer, we define a Markov chain with homogeneous transition probabilities T^(1)_{α+1,α} = 2/3, T^(1)_{α,α+1} = 1/3, and T^(1)_{αβ} = 0 if β ≠ (α + 1) mod K and β ≠ (α − 1) mod K. A random walker diffusing in this layer will have an induced current in the direction of decreasing node indices. In the second layer, we define a different Markov chain with homogeneous transition probabilities T^(2)_{α+1,α} = 1/2, T^(2)_{α,α+1} = 1/2, and T^(2)_{αβ} = 0 otherwise, i.e., an unbiased random walk. While we can always estimate Q numerically from the observed time series, in this simple case it is easy to derive it analytically: Q_{αβ} = W_1 T^(1)_{αβ} + W_2 T^(2)_{αβ}, where W_1 (W_2) is the probability of finding the walker in layer l = 1 (l = 2). Now, since in this case R_ij = R_ji = r, the system is symmetric with respect to (w.r.t.) the switching process, and the walker spends, on average, the same amount of time in each of the two layers, W_1 = W_2 = 1/2, and then Q_{αβ} = T^(1)_{αβ}/2 + T^(2)_{αβ}/2. For this specific example, we thus find Q_{α+1,α} = 7/12, Q_{α,α+1} = 5/12.

The left panel of Fig. 2 shows numerical results of D(m) as a function of the block size m for different switching rates r. In order to deal with finite-size effects (which increase exponentially with m), we systematically increase the size of the walker series under study as a function of m, taking series of size N(m) = N_0 · 2^m data extracted from the original system and from the corresponding Markovian surrogates. We use N_0 = 10^5, although smaller values yield qualitatively equivalent results. As expected, D(1) = D(2) = 0, meaning that Y(t) is a faithful Markovian surrogate of X(t). Furthermore, D(m) > 0 for m ≥ 3, meaning that X(t) is non-Markovian [49]; hence, the underlying network is correctly identified as a multiplex. This result is robust for a quite large range of values of r (see Appendix B, and note that r = 0.5 is a trivial exception), meaning that the method works even if the walker makes fast switches between layers. A similar scenario is found if we tune the transition probabilities such that no net induced current is found (see Appendix B), pointing out that multiplexity can be unraveled even in that case. In the right panel of Fig. 2, we plot D(3) for different series sizes to showcase how finite-size effects vanish as the series gets larger. Notably, with a few thousand data points, we can already accurately detect multiplexity. Furthermore, we have demonstrated the flexibility and robustness of our method to detect multiplexity in a range of additional scenarios, including (i) layers with increasingly different and disordered topologies controlled by both rewiring and edge addition, (ii) Erdős-Rényi graphs, and (iii) similar scenarios on larger graphs.
FIG. 2. (Left panel) Multiplexity detection statistic D(m) [see Eq. (1)] for X(t), in a case where X(t) shows an induced current, for different values of the switching rate r. The series X(t) records the position of a walker diffusing over two layers, with transition probabilities in the layers given by T^(1)_{α,α+1} = 1/3 and T^(2)_{α,α+1} = 1/2, respectively, such that the transition probability in the Markovian surrogate series is Q_{α,α+1} = 5/12. We correctly find that X(t) is non-Markovian even for large values of r, as D(m)|_{m≥3} > 0, which suggests an underlying multiplex structure. (Right panel) Finite-size-scaling analysis where we compute D(3) as a function of the series size N, for the multiplex considered in the left panel (r = 0.1, green dots) and a null model with equivalent monoplex dynamics (red squares) for which D(3) should vanish for large values of N. In both panels, the symbols represent the mean, and the error bars represent the standard deviation calculated over 10 different realizations.
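In practice, Eq. (1) can be approximated with plug-in estimates of the block probabilities, comparing the observed series with a surrogate resampled from the empirical transition matrix Q. A minimal sketch (helper names are ours; unseen blocks in the surrogate are floored at a small constant rather than smoothed):

```python
import numpy as np
from collections import Counter

def transition_matrix(X, K):
    """Maximum-likelihood (monoplex) transition matrix Q estimated from X."""
    Q = np.zeros((K, K))
    for a, b in zip(X[:-1], X[1:]):
        Q[a, b] += 1
    Q /= np.maximum(Q.sum(axis=1, keepdims=True), 1)  # normalize visited rows
    return Q

def markov_surrogate(Q, n, rng):
    """Markovian surrogate series Y(t) generated from Q."""
    K = Q.shape[0]
    y = int(rng.integers(K))
    Y = [y]
    for _ in range(n - 1):
        y = int(rng.choice(K, p=Q[y]))
        Y.append(y)
    return np.array(Y)

def block_probs(X, m):
    """Empirical probabilities of contiguous blocks of size m."""
    blocks = Counter(tuple(X[i:i + m]) for i in range(len(X) - m + 1))
    total = sum(blocks.values())
    return {b: c / total for b, c in blocks.items()}

def D(X, Y, m, eps=1e-12):
    """Plug-in estimate of the mth-order KL divergence rate of Eq. (1)."""
    PX, PY = block_probs(X, m), block_probs(Y, m)
    return sum(p * np.log(p / PY.get(b, eps)) for b, p in PX.items()) / m
```

For a genuinely Markovian X, D(m) computed this way fluctuates around zero at all orders (up to finite-size noise), while a hidden multiplex shows D(m) > 0 from m = 3 onward.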
For all cases, we find a correct multiplexity detection and good scalability (see Appendix B).

B. Method to quantify multiplexity

The Markovianity-breaking phenomenon that we have exploited only provides a means to discriminate whether hidden layers do exist in the multiplex, but not to quantify the number of underlying layers. In order to bridge this gap, we now make use of statistical inference tools to define a model selection scheme [53]. We assume that two models are different if they have a different number of layers. Accordingly, the number of layers L is now modeled as a random variable with prior probability mass function P_0(L). Given the value of L, the motion of the random walker is determined by the Markov switching of layers, with transition matrix R_L, and Markov walks within each layer characterized by T_L = {T^(l)}_{l=1}^L. Assuming prior probability density functions for these parameters, p_{0,R}(R_L) and p_{0,T}(T_L), the likelihood of a given model with L layers conditional on the observed data {X(t)}_{t=1}^N reads

P({X(t)}_{t=1}^N | L) = ∫ P({X(t)}_{t=1}^N | T_L, R_L) p_{0,T}(T_L) p_{0,R}(R_L) μ(dT_L × dR_L),  (2)

where μ is a suitable reference measure for T_L and R_L (see Appendixes C and D for technical details). In general, this multidimensional integral cannot be computed exactly, and it needs to be approximated numerically. Then, the number of layers L in the system that generated the data {X(t)}_{t=1}^N can be detected using a MAP criterion, i.e.,

L̂^MAP = arg max_{L ∈ {1, 2, …, L_+}} P({X(t)}_{t=1}^N | L) P_0(L),  (3)

where L̂^MAP is the estimate of the actual number of layers, 2 ≤ L_+ < ∞ is the (assumed) maximum possible value of L, and P(L | {X(t)}_{t=1}^N) ∝ P({X(t)}_{t=1}^N | L) P_0(L) is the a posteriori probability mass function of L given the observations {X(t)}_{t=1}^N.

The practical computation of the MAP estimator in Eq. (3) can be addressed in different ways. The classical literature on hidden Markov models (HMMs) [54–56] suggests the use of the expectation-maximization (EM) algorithm (in various forms) to compute approximate maximum likelihood (ML) estimators, T̂_L and R̂_L, of the parameters and then assume that P({X(t)}_{t=1}^N | L) ≈ P({X(t)}_{t=1}^N | T̂_L, R̂_L) in order to compare the models (i.e., one tries to optimize the parameters instead of averaging over them). This approach has also been applied to jump-Markov affine systems [57], and it relies on standard techniques; however, it has several drawbacks and is therefore not adopted here. Actually, equating the model likelihood with the parameter likelihood easily breaks down when the parameter estimates are poor (e.g., because of overfitting). Another major disadvantage of EM is that it converges locally and thus performs badly when the parameter likelihood is multimodal or when the parameter dimension varies significantly for different models. More sophisticated parametric schemes have been proposed (see, e.g., Ref. [58]); however, they are still subject to these fundamental limitations. Integration in Eq. (2), which we adopt here, has been favored theoretically but criticized practically because of the computational cost of approximating P({X(t)}_{t=1}^N | L) numerically [56].
However, we have found that state-of-the-art variational Bayes [59] or adaptive importance sampling [60] methods can be applied effectively up to moderate values of L. See Appendixes C3 and D for further discussion, including examples of using both deterministic integration and the adaptive Monte Carlo sampler from Ref. [61].

To illustrate the MAP model selection method given by Eq. (3), we consider again the discrete flashing ratchet model formed now by L = 3 ring-shaped layers with K = 3 nodes and homogeneous transition probabilities [see Fig. 1(b) for an illustration and Appendix D for other examples]. In this example, the probability to stay in the same layer is R_ll = 1 − r = 0.84, and the probabilities for the walker to move from node α → α + 1 are T^(1)_{α,α+1} = 0.16, T^(2)_{α,α+1} = 0.76, and T^(3)_{α,α+1} = 0.24, respectively (T^(l)_{α+1,α} = 1 − T^(l)_{α,α+1} for every l, and T^(l)_{αβ} = 0 for any other α and β).

In order to evaluate the likelihood function P({X(t)}_{t=1}^N | L), we need to approximate the integral in Eq. (2) for each value L = 1, …, 4, as discussed previously. Here, we assume L_+ = 4. A very simple strategy is to evaluate it via numerical integration over a deterministic grid of 19 points on the interval (0,1) for each unknown parameter. For the case L = 1, this reduces to a single unknown, T^(1)_{α,α+1}, and for L > 1, there are L + 1 unknowns: r and T^(l)_{α,α+1} for l = 1, …, L. Note that the particular choice of transition probabilities was taken to make the problem more challenging, as these values are not commensurate with the integration grid points. We use an unbiased prior probability density function p_{0,R}(R_L) given by a uniform probability density function on (0,1) for the unknown parameter r, while the prior p_{0,T}(T_L) is used to penalize system configurations with two or more identical layers [62]. For this particular numerical experiment, the penalizing prior is p_{0,T}(T_L) ∝ min_{(l,l′)} |T^(l)_{α,α+1} − T^(l′)_{α,α+1}|; i.e., the prior probability density function of a given configuration T_L is proportional to the minimum distance between any pair of matrices T^(l) and T^(l′). Since in this example each layer T^(l) is fully characterized by a single scalar, this prior simply penalizes configurations where two or more layers are very similar and thus avoids overfitting. In general, the prior p_{0,T}(T_L) is set to penalize models where pairs of layers have similar transition matrices, to avoid redundancy, so p_{0,T}(T_L) ∝ min_{(l,l′)} ||T^(l) − T^(l′)||, for some chosen norm ||·||.

FIG. 3. Posterior probability P(L | {X(t)}_{t=1}^N) as a function of the number of layers L, computed from a trajectory of 2 × 10^4 time steps generated using a model with L = 3 cycle graphs (K = 3) and parameters R_ii = 1 − r = 0.84 for i = 1, 2, 3, T^(1)_{α,α+1} = 0.16, T^(2)_{α,α+1} = 0.76, and T^(3)_{α,α+1} = 0.24 (see the text and Appendix B for details). The algorithm easily estimates the correct model L = 3. (Inset) A linear-log plot of the same graph. Note that the probability for L = 1 is zero up to the computer's accuracy; hence, this point is not shown in logarithmic scales.

Figure 3 shows that the true model is easily estimated with the proposed scheme, as the posterior probability emphatically peaks at L = 3, which implies that L̂^MAP = 3. Without penalization—i.e., with a uniform prior p_{0,T}—we obtain multiple equivalent solutions involving layers with identical values of the estimated parameters.

Now, it is well known that direct, grid-based deterministic integration of the posterior probability is intuitive but computationally inefficient. Accordingly, we have further considered alternative approximations of the integral in Eq. (2) using a nonlinear population Monte Carlo (NPMC) algorithm [61], and we have effectively reduced the runtime by a factor close to 100 on the same computer for the example of Fig. 3 (see Appendixes C and D for details). Actually, our NPMC is a sophisticated importance sampling procedure by which an efficient grid of the parameter space is obtained. This procedure meshes, in a tight way, the regions where the posterior probability density of the parameters is high and, in a sparse way, the regions where the posterior probability density is low. Accordingly, the nodes of this mesh cover all the regions of the parameter space where the likelihood is sufficiently large (this approach ensures a global exploration of the parameter space, at odds with the EM algorithm, which can only perform a local search). A simple inspection of the likelihoods of each node in the grid allows us to robustly decide which is the one with highest likelihood. Accordingly, we are able to infer that the maximum of the posterior probability density P({X(t)}_{t=1}^N | T_L, r) p_{0,T}(T_L) [i.e., the integrand of Eq. (2) with uniform prior for r] is indeed attained at the true value of T_3 (see Appendix D5 for details). We therefore conclude that our model selection scheme correctly estimates not only the number of layers but also the transition probabilities within each layer, i.e., the full architecture. Finally, we have also verified that the posterior probability density is smooth close to its maximum, as perturbations T̃_3 = T_3 + δT systematically yield a smaller posterior probability density function for sufficiently large N (see Fig. 21).
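The grid-based evaluation of Eq. (2) only requires the likelihood P({X(t)} | T_L, R_L), which is computable exactly with a standard HMM forward recursion over the hidden layer label. The sketch below (our own schematic parametrization for ring-shaped layers, not the paper's code) drops the prior normalization constants, so its output is meaningful only for comparing models evaluated on the same grid:

```python
import numpy as np
from itertools import product

def ring(p, K):
    """Cycle-graph layer: probability p to step forward, 1 - p to step backward."""
    T = np.zeros((K, K))
    for a in range(K):
        T[a, (a + 1) % K] = p
        T[a, (a - 1) % K] = 1 - p
    return T

def log_likelihood(X, T, R):
    """log P(X | T_L, R_L): forward recursion marginalizing the hidden layer."""
    L = R.shape[0]
    alpha = np.full(L, 1.0 / L)            # uniform initial layer occupation
    ll = 0.0
    for a, b in zip(X[:-1], X[1:]):
        alpha = (alpha @ R) * T[:, a, b]   # (i) switch layer, (ii) step a -> b
        s = alpha.sum()
        if s == 0.0:
            return -np.inf                 # transition impossible under this model
        ll += np.log(s)
        alpha /= s
    return ll

def log_evidence(X, L, K, grid):
    """Grid approximation of Eq. (2): average the likelihood over a deterministic
    grid of (r, p_1, ..., p_L), weighted by the (unnormalized) penalizing prior
    min_{l<l'} |p_l - p_l'|."""
    terms = []
    for r in (grid if L > 1 else [0.0]):
        R = np.full((L, L), r / max(L - 1, 1))
        np.fill_diagonal(R, 1.0 - r)
        for ps in product(grid, repeat=L):
            pen = min((abs(ps[i] - ps[j]) for i in range(L)
                       for j in range(i + 1, L)), default=1.0)
            if pen == 0.0:
                continue  # identical layers: prior density is zero
            T = np.stack([ring(p, K) for p in ps])
            terms.append(np.log(pen) + log_likelihood(X, T, R))
    m = max(terms)
    return m + np.log(np.mean(np.exp(np.array(terms) - m)))  # log-mean-exp
```

Maximizing `log_evidence` over L then plays the role of Eq. (3) with a uniform P_0(L).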
III. MULTIPLEX DECOMPOSITION OF NON-MARKOVIAN DYNAMICS

As happens for any Bayesian inference-based method [53,63,64], our approach would, in principle, provide a conclusive indication of the hidden multiplexity in the case where the prior—intralayer dynamics are diffusive—is a reasonable assumption. Notwithstanding, the methodology proposed here actually extends above and beyond the reconstruction of hidden multiplex architectures using walkers with partial information. As a matter of fact, a similar approach can be considered even when the architecture is truly single layered. Suppose, for instance, that a given observed time series is truly the outcome of a non-Markovian dynamics running on a physical single-layered network. In that case, our multiplex reconstruction method would still provide the most probable multiplex model, with L > 1 due to the lack of the walker's Markovianity. The key difference is that now the layers in the hidden multiplex would be effective: the method would be providing the most probable multiplex reconstruction with Markovian intralayer dynamics which would yield such complex dynamics. Incidentally, note that this brand of effective models is used in community detection in single-layered networks, which results from finding the optimal number of effective groups of nodes which maximize a certain likelihood function [53].

In what follows, we first capitalize on this new interpretation to extend our previous analysis of random walks on graphs to continuous stochastic processes, and then we present two existence theorems that guarantee that this stochastic decomposition is universally applicable to random sequences with arbitrary memory.

A. Extension to continuous processes

When the original dynamics is continuous, we can discretize the motion and embed the original time series into a simple graph topology, for example, via the transformation {X(t)} → {X̃(t)} given by X̃(0) = 0 and

X̃(t+1) = [X̃(t) + 1] mod K, if X(t+1) > X(t);
X̃(t+1) = [X̃(t) − 1] mod K, if X(t+1) < X(t).   (4)

We test this construction on two canonical continuous processes: the Ornstein-Uhlenbeck process,

ẋ = −x + ξ,   (5)

with ξ a Gaussian white noise, and a Langevin equation, where the noise term is not white anymore but has some color. Such a process can be described by the following Langevin equation:

ẋ = −x + η,   (6)

with η(t) being defined by an Ornstein-Uhlenbeck process such that ⟨η(t)⟩ = 0 and ⟨η(t)η(t′)⟩ ∝ exp(−|t − t′|/τ)/τ, with τ > 0 the correlation time of the noise. Note that in Eqs. (5) and (6), the dot denotes the time derivative. When the noise term η is not white anymore, the Langevin equation (6) generates non-Markovian trajectories for the variable x. Interestingly, Eq. (6) has been attracting considerable attention in soft matter as a minimal model for the non-Markovian dynamics of the position of a passive particle immersed in an active (e.g., bacterial) bath [65,66].

We perform numerical simulations of these processes using an Euler-Maruyama algorithm, subsequently embed the resulting trajectories of 10^4 data in a cycle-graph topology [see Fig. 1(b)] via Eq. (4) with K = 4, and then apply our multiplex detection and estimation protocol. In the top panels of Fig. 4, we plot D(m) and the outcome of a layer estimation. For the Ornstein-Uhlenbeck process, the method does not detect multiplexity (and a layer estimation confirms that a model with L = 1 layer is the most probable one), in good agreement with the fact that the Ornstein-Uhlenbeck process is indeed a Markovian process. On the other hand, the Langevin equation with colored noise has a nontrivial memory kernel and generates non-Markovian dynamics, which is correctly captured by the fact that D(3) > 0. In this case, the dynamics optimally decomposes into a Markov switching combination of L = 2 layers.

To further validate these results, we now generate synthetic trajectories from the estimated multiplex models with L layers. Importantly, we should recall that our layer estimation method makes use of an importance sampling algorithm to concentrate the Monte Carlo search in the regions of the parameter space with large likelihood. As a by-product, we are able to estimate—given the number of layers L—the parameters of the most likely model.
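The simulation-and-embedding pipeline just described can be sketched as follows. This is a minimal illustration with our own function names and parameter values (the 10^4-point trajectory length and K = 4 are taken from the text; the discretization used here is a simple Euler-Maruyama scheme for Eq. (6), with the tie case X(t+1) = X(t), which Eq. (4) does not specify, mapped to a backward step):

```python
import numpy as np

def colored_noise_langevin(n, dt=1e-2, tau=1.0, seed=0):
    """Euler-Maruyama integration of Eq. (6): dx/dt = -x + eta, where
    eta is an Ornstein-Uhlenbeck noise of correlation time tau, so that
    <eta(t) eta(t')> = exp(-|t - t'| / tau) / tau."""
    rng = np.random.default_rng(seed)
    x = np.empty(n)
    xi, eta = 0.0, 0.0
    for t in range(n):
        x[t] = xi
        eta += (-eta / tau) * dt + (np.sqrt(2 * dt) / tau) * rng.standard_normal()
        xi += (-xi + eta) * dt
    return x

def embed_in_ring(x, K=4):
    """Eq. (4): X~(0) = 0; X~(t+1) = X~(t) +/- 1 (mod K), depending on
    whether the continuous signal increased or decreased."""
    xt = np.zeros(len(x), dtype=int)
    for t in range(len(x) - 1):
        step = 1 if x[t + 1] > x[t] else -1
        xt[t + 1] = (xt[t] + step) % K
    return xt

traj = colored_noise_langevin(10_000)
symbols = embed_in_ring(traj, K=4)   # symbolic input for the detection protocol
assert set(int(v) for v in symbols) <= {0, 1, 2, 3}
```

Taking the limit of vanishing correlation time (white noise driving x directly) recovers the Markovian benchmark of Eq. (5).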
FIG. 4. (Top-left panel) Multiplexity detection statistic D(m), applied to a trajectory generated by (blue) an Ornstein-Uhlenbeck process (Markovian; see the text) and (green) a Langevin equation with colored noise (non-Markovian; see the text) after embedding the trajectories in a ring topology via Eq. (4). The multiplexity detection is negative in the Markovian case and positive for the Langevin equation with a nonwhite noise term, as expected. (Top-right panel) Posterior probability of a multiplex model with L layers, confirming that L = 1 (i.e., a monoplex) is the most probable model for the Markovian case (Ornstein-Uhlenbeck) and that a model with L = 2 is the most likely model for the case of a Langevin equation with correlated noise. (Bottom-left panel) D(m) (order-m Kullback-Leibler divergence) between the Ornstein-Uhlenbeck (Markovian) process and a synthetic series generated by the most likely model with L layers (for L = 1, 2, 3). Estimation of the model parameters (the architecture) is possible thanks to the importance sampling procedure, which preselects parameter configurations with high likelihood, in such a way that one can find the global optimum by searching in this reduced set. The synthetic series that shows the highest similarity with the original process is found for the model with L = 1 (only improved by the comparison with a Markovianized time series), coinciding with the prediction found by the model selection scheme. (Bottom-right panel) Similar measures as in the left panel, performed on the series extracted from the Langevin equation with colored noise (non-Markovian). In this case, the synthetic series that shows the highest similarity with the actual non-Markovian series is the one generated by the most likely model with L = 2 layers.

Indeed, the colored-noise dynamics of Eq. (6) admits a Markovian embedding in an enlarged state space, as shown in the following 2D (Markovian) stochastic differential equation:

ẋ = −x + y,   ẏ = −βy + ξ,   (7)

with ξ a Gaussian white noise process and β > 0 a constant.

In even more general terms, our methodology provides a mathematically sound solution for the stochastic projection of non-Markovian dynamics onto a base of simple diffusive dynamical modes. For random sequences that were originally generated by a random walker navigating a hidden multiplex network, the method easily reconstructs such a hidden architecture, whereas for general random sequences [such as the ones generated by Eqs. (5) and (6)], the effective multiplex model provides a good reconstruction of the original dynamics. Hence, the question is as follows: Is this type of hidden jump-Markov model able to fully (i.e., exactly) reconstruct any random sequence (i.e., high-order Markovian and fully non-Markovian)? These questions are answered affirmatively in the next subsection, where we state and prove two theorems that address these matters.

B. Exact decomposition of Markovian and non-Markovian dynamics via multiplex models

In this subsection, we pose the question of whether the statistics of arbitrary random sequences {X(t)}_{t≥0}, possibly with long memory, can be recovered exactly using a hidden
jump-Markov model of the class described in Sec. II (from now on, we refer to it as a multiplex model). The answer is positive (in a probabilistic sense, to be made precise), as summarized by two representation theorems.

In particular, we first state and prove that every Markov model of finite (but arbitrarily large) order h can be recast into a multiplex model. Then, we address the representation of models with infinite memory, by letting h → ∞, and show that, under mild regularity assumptions, they admit a compact representation in the form of a model with an uncountable (continuous) set of hidden layers. For readability, we have relegated as many technical details as possible (including all the mathematical proofs) to Appendixes F and G, and we only discuss the key results and implications here.

1. Representation of Markov models of order h

Let us consider a discrete state space with K elements, K = {1, …, K} (this will be the node set of the multiplex). A Markov sequence of (integer) order h ≥ 1 is defined by the set of K^{h+1} probability masses

P_K^h(i_t | i_{t−h:t−1}) := P(X(t) = i_t | X(t−h:t−1) = i_{t−h:t−1}),

where X(t−h:t−1) = {X(t−h), …, X(t−1)} and i_{t−h:t} = {i_{t−h}, …, i_t} ∈ K^{h+1} is a sequence of h+1 state-space observations. If we fix i_{t−h:t−1}, then P_K^h(i_t | i_{t−h:t−1}) is a probability mass function (PMF) and, as a consequence, P_K^h(i_t | i_{t−h:t−1}) ∈ [0,1] and Σ_{i_t ∈ K} P_K^h(i_t | i_{t−h:t−1}) = 1. Now, any Markov model of finite order h can be transformed into an equivalent multiplex model with a sufficiently large, but finite, number of layers L, as formally stated below.

Theorem 1: For every Markov model of order h < ∞ on the state space K = {1, …, K}, there exists an equivalent multiplex model, with observation space K and L = K^h layers.

The proof of Theorem 1 is technical and is therefore put in Appendix F. Theorem 1 guarantees that every random Markov sequence with finite memory h can be represented by a multiplex model with K^h layers (i.e., this is an existence theorem). The theorem does not state that this representation is unique, though, and it does not state that it is minimal either. According to the numerical evidence given in the previous sections, we conjecture that a suitable selection of R_L and T_L (e.g., using the estimation methods described in this paper) can yield an accurate representation of a sequence of order h with considerably fewer than K^h layers.

Let us also remark that there is no contradiction between Theorem 1 and our earlier claim that multiplex models can represent systems with infinite memory. Indeed, depending on the choice of its parameters (L, R_L, and T_L), a multiplex system can yield random sequences with either finite or infinite memory. For example, in the proof of Theorem 1, we explicitly construct a multiplex model that matches the transition probabilities of a Markov model of order h. On the other hand, multiplex models where the transitions between layers are independent of the walker's past trajectory yield sequences with infinite memory (except for pathological cases).

2. Representation of sequences with infinite memory

In the second part, we extend the previous existence theorem to a wide class of infinite-memory (i.e., fully non-Markovian) models. Let {X_h(t)}_{t≥0} denote a random sequence on K = {1, …, K} that can be represented exactly by a Markov model of order h. We consider here the class of sequences with infinite memory that can be obtained from Markov models as h → ∞ and refer to them as "Markov-∞" sequences. To be precise, we say that {X(t)}_{t≥0} is the limit of {X_h(t)}_{t≥0} as h → ∞, and we write

X =(d) lim_{h→∞} X_h   (8)

(equality in distribution) when we can approximate the transition probabilities of the sequence {X(t)}_{t≥0} with an arbitrarily small error using a Markov model of sufficiently large order. Specifically, we need to introduce the following technical definition:

Definition 1: The random sequence {X(t)}_{t≥0} is Markov-∞ if it satisfies the regularity conditions below:

(C1) The joint probability of any sequence of states vanishes uniformly with the length of the sequence, i.e.,

lim_{t→∞} sup_{i_{0:t} ∈ K^{t+1}} P(X(0:t) = i_{0:t}) = 0.

(C2) There exists a sequence of Markov models {X_h(t)}_{t≥0}, h = 1, 2, …, such that for any ϵ > 0 (arbitrarily small), there exists h_0 ∈ ℕ (sufficiently large) that guarantees

sup_{i_{0:t} ∈ K^{t+1}} |P(X(t) = i_t | X(0:t−1) = i_{0:t−1}) − P_K^h(i_t | i_{t−h:t−1})| < ϵ,

for every t > h, and

sup_{i_{0:t} ∈ K^{t+1}} |P(X(t) = i_t | X(0:t−1) = i_{0:t−1}) − P_K^h(i_t | i_{t−h:t−1})| = 0,

for every t ≤ h, whenever h > h_0.

Let {X_L(t)} represent a random sequence generated by a multiplex model with L layers. Since every Markov-∞ model is the limit of a sequence of Markov systems with increasing order, h → ∞, it can also be obtained (via Theorem 1) as the limit of a sequence of multiplex systems as the number of layers grows, L → ∞. Moreover, it turns out that the limit lim_{L→∞} {X_L(t)}_{t≥0} can be interpreted
itself as a multiplex model with an uncountable set of layers and a first-order Markov system on K associated with each layer. This is made precise by the following theorem:

Theorem 2: Let {X(t)}_{t≥0} be a Markov-∞ random sequence on K. There exists a Markov kernel

M: B([0,1)) × [0,1) → [0,1],   (9)

where B([0,1)) denotes the Borel σ-algebra of subsets of [0,1), a probability measure M_0: B([0,1)) → [0,1], and an uncountable family of K × K transition matrices

T(y) = [ T_11(y) ⋯ T_1K(y) ; ⋮ ⋱ ⋮ ; T_K1(y) ⋯ T_KK(y) ],   y ∈ [0,1),   (10)

such that

P(X(0:t) = i_{0:t}) = P_{i_0} ∫⋯∫ T_{i_1 i_0}(y_0) [ ∏_{k=1}^{t−1} T_{i_{k+1} i_k}(y_k) M(dy_k | y_{k−1}) ] M_0(dy_0),

for every i_{0:t} ∈ K^{t+1}, where P_{i_0} = P(X(0) = i_0).

See Appendix G for a proof. Theorem 2 indicates that multiplex network models can be generalized to obtain probabilistic systems with an infinite and uncountable number of layers, which are flexible enough to represent (i.e., exactly recover) a broad class of random sequences with infinite memory.

IV. APPLICATIONS TO EXPERIMENTAL DATA

We now apply our methodology to two experimental recordings of completely different nature: the mobility dynamics of human players of an online video game and the dynamics of polymerases during RNA transcription.

A. Application to mobility: Pardus universe

Human and animal mobility [67–69] is often described by dynamical processes on single-layer networks and, interestingly, has been found to display signatures of memory [70–72], which can be interpreted as a deviation from Markovianity. Can such lack of Markovianity be interpreted as the result of a Markovian dynamics taking place on a hidden multiplex network? In the case where mobility takes place across (hidden) multimodal transportation systems (as when we collect GPS traces of urban mobility), layers could be physical (underground, bus, car, etc.). On the other hand, animal foraging dynamics are clearly different and alternate between day and night [73]. In a similar vein, human mobility patterns change when switching from work to leisure styles [74]: These would be cases where layers would be effective rather than physical. There are a large variety of problems involving the aforementioned scenarios which could be amenable to our approach. To guarantee computational efficiency, we would only require that the network over which the agents move is not too large, something that in the general case can be achieved by coarse graining the network via community detection. Mathematically, within this context, a non-Markovian process running on a monoplex and a Markov switching process running on a multiplex are equally valid models, much in the same way a function is equivalent to its Fourier series representation. Still, we consider this new interpretation not just suggestive but also parsimonious from a cognitive point of view, and therefore, it might be of relevance in the study of memory in search processes.

To illustrate this type of application, we consider experimental mobility trajectories performed by players (agents) in a virtual environment: the Pardus universe (see Ref. [75] for details). The Pardus universe is a multiplayer online role-playing video game, which is used as a large-scale socioeconomic laboratory to study mobility in a controlled way. It consists of an (online) physical network with K = 400 nodes, a networked universe with social and economic activities, where the players of the game move around. The mobility traces of these players can then be naturally symbolized by coarse graining the original network, via community detection, into a network of 20 nonoverlapping communities wired by a 20 × 20 weighted adjacency matrix with complex topology [75].

FIG. 5. Multiplexity detection statistic D(m), applied to experimental mobility trajectories on the Pardus universe [75]. We find D(3) > 0, which suggests that mobility series are non-Markovian, in agreement with independent evidence [75].

Szell et al. analyzed online players' trajectories and reported evidence of long-term memory in the diffusing patterns of agent mobility in this universe [75]. Here, we apply the multiplexity detection statistic to a long time series obtained by concatenating individual traces. In Fig. 5, we show the numerical results (black dots), along with the null case of a Markovian diffusion (red squares) over the same network. We find D(3) > 0 as a footprint of multiplexity, which can be interpreted here as the hidden presence of
several effective layers. Szell et al. introduced a long-term memory model to account for the mobility dynamics and the heterogeneity of players. An alternative interpretation is that players are performing simple diffusion dynamics but are switching stochastically between different effective dynamical regimes. The challenge, which we leave as an open question for future work, would be to assign a social or perhaps cognitive meaning to the different layers. In this sense, our method here is closer to an unsupervised clustering paradigm than to a supervised one. Since our experimental trace is given by the concatenation of the traces of different walkers, the different effective layers could also reveal a taxonomy of different game strategies.

B. Application to biology: Transcription by eukaryotic RNA polymerases

It has been shown that the ratchet mechanism plays an important role in active transport by molecular motors in living cells [43]. A paradigmatic example of a molecular motor that can be well described by a ratchet mechanism is RNA polymerase (RNAP). RNAPs are macromolecular enzymes responsible for the transcription of the genetic information encoded in the DNA into RNA [76]. During transcription, the spatial location of RNAPs along the DNA template exhibits noise due to thermal fluctuations. In addition, the dynamics of polymerases exhibits switching between two different dynamical regimes: an active polymerization state called elongation and a passive or diffusive state called backtracking [77]. While elongating, RNAP moves along the DNA template with a net velocity of the order of 20 nucleotides per second, with 0.34 nm being the distance between two nucleotides. In the backtracking regime, the polymerization reaction is stalled, and RNAPs perform passive Brownian diffusion due to several types of noise (e.g., thermal and chemical). Optical tweezers enable measuring the motion of a single RNAP during transcription of a single DNA template at base-pair resolution [77,78]. Single-molecule experimental RNAP traces are, however, extremely noisy, and the problem of identifying the possible mixture of different underlying dynamical processes—including transitions from elongation to backtracking as well as the hidden presence of other regimes—from a single experimental trace is a challenging task.

At the single-molecule level, the RNAP dynamics corresponds to a molecular motor producing a non-Markovian dynamics, which can thus be modeled as a Markov switching random-walk dynamics in a biochemical
FIG. 6. (Upper panels) Experimental RNA polymerase I trace X(t) (in base-pair units), from which we extract an excerpt of the first 10^4 time steps (highlighted in red in the original trace, shown in the middle panel) and a sample of the first 100 data of its symbolized trace X̃(t) (right). Experimental traces of RNA polymerase I are extremely noisy, and one cannot always easily distinguish different dynamical regimes (see Appendix E for additional traces studied in this work). (Bottom-left panel) Multiplexity detection statistic D(m) applied to five (symbolized) experimental RNAP traces. We consistently find D(3) > 0, which suggests that to correctly describe the dynamics of RNA polymerase I, we need at least two diffusive layers. (Bottom-right panel) Posterior probability of the layer estimation applied to the five experimental series. We confirm that, with overwhelming probability, the most likely number of layers is L = 2. One possibility is that the identification of L > 1 is the consequence of the presence of colored noise in the single-molecule optical-tweezer transcription experiment, whereas another possibility is that L = 2 corresponds to the elongation (active state) and backtracking (passive state) dynamical modes (in the inset, the same results are shown in log-linear scale; the probability associated with L = 1 is zero and hence not defined in a log scale).
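The posterior probabilities of the layer estimation are obtained from log-evidences log P(O|L): under a uniform prior over L, normalizing the exponentiated log-evidences yields P(L|O). That conversion is a standard log-sum-exp computation, sketched here with hypothetical log-evidence values (the numbers below are illustrative only, not the experimental ones):

```python
import math

def posterior_from_log_evidence(log_ev):
    """Map log P(O|L), for L = 1..len(log_ev), to posterior probabilities
    P(L|O) under a uniform prior on L. The maximum is subtracted before
    exponentiating for numerical stability (log-sum-exp trick)."""
    m = max(log_ev)
    weights = [math.exp(v - m) for v in log_ev]
    z = sum(weights)
    return [w / z for w in weights]

# Hypothetical log-evidences for L = 1, 2, 3 (illustration only).
post = posterior_from_log_evidence([-1500.0, -40.0, -46.0])
assert abs(sum(post) - 1.0) < 1e-12
assert post[1] == max(post)  # L = 2 dominates in this toy example
```

Subtracting the maximum before exponentiating matters in practice, since raw log-evidences of long series are large negative numbers whose direct exponentials underflow to zero.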
multiplex with L = 2 layers (corresponding to elongation and backtracking, respectively) [79]. This suggests that our methodology (using cycle graphs with homogeneous transition rates as the prior topology) is applicable, and under the aforementioned assumptions, we should find both that D(3) > 0 and that the model with L = 2 has a clear maximum likelihood. To test such a prediction, we have applied our complete methodology to five single-molecule experimental traces of the position X of RNA polymerase I (Pol I) from yeast S. cerevisiae, obtained with a dual-trap optical tweezer setup in the assisting force mode [80]. We systematically choose the first 10^4 data points at a 1-kHz sampling rate (see Fig. 6 and Appendix E) to keep the time series short and make the inference problem harder (for instance, for the series shown in Fig. 6, it is not easy to visually distinguish the elongation and backtracking regimes). Since the original traces have a continuous state space, we discretize these series by embedding the experimental recordings into a cycle graph via the transformation proposed in Eq. (4) for K = 4. An example is shown in the top-right panel of Fig. 6. In the bottom-left panel of the same figure, we plot D(m) applied to {X̃(t)} for each of the five experimental series, consistently showing D(3) > 0. For comparison and control of finite-size effects, a similar measure is computed on a null model: a time series of 10^4 data points generated by a (monoplex) Markov chain whose transition matrix has been estimated from X̃(t) (red dashed line). We can conclude that the underlying dynamics requires at least two alternating dynamics—a stochastic alternation between two diffusive layers—hence the projection onto an effectively multiplex model.

Subsequently, in the bottom-right panel of the same figure, we provide the results on the layer estimation using our nonlinear population Monte Carlo algorithm (convergence after 12 iterations, 10^2 samples per iteration). The algorithm directly provides log P(O|L), so, assuming a uniform prior on the number of layers, log P(O|L) ∝ log P(L|O), i.e., the logarithm of the a posteriori probability of model L. The true probability of the model (in natural units) is subsequently extracted and plotted accordingly. We find that the probability is essentially 1 for the model with L = 2 layers and negligible for the rest. Our algorithm thus reveals that the optimal hidden multiplex has L = 2 effective layers. One possibility is that the identification of L > 1 is the consequence of the presence of colored noise in the single-molecule optical-tweezer transcription experiment. Quite intriguingly, additional evidence points to the presence of non-Markovianity also in the backtracking regime (see Appendix E), which might suggest the presence of colored noise even in the backtracking mode, as in the example in Sec. III. Another possibility is, as previously discussed, that the two effective layers correspond to the biochemical mechanisms of elongation (active state) and backtracking (passive state) dynamical modes. Finally, we expect our approach to also be applicable to more complicated scenarios, such as identifying the number of different nucleotides in copolymerization processes of templates with strong disorder [81].

V. DISCUSSION AND CONCLUSION

In this work, we have introduced a method that both detects and quantifies the degree of multiplexity in the hidden underlying structure of a networked system by only having access to local and partial statistics of a random walker. Our working hypothesis (prior) is that there is a hidden multiplex where walkers diffuse, switching layers stochastically and diffusing over each layer. Under these circumstances, any random walker for which we only see a projection of its trajectory in the aggregated network is necessarily non-Markovian if the number of layers is larger than one. Hence, our algorithm for multiplexity detection exploits such breaking of Markovianity as a means to detect multiplexity. Incidentally, here we have focused on the specific case of multiplex networks, where every layer has the same number of (replica) nodes. Actually, in a multiplex, one can even have the same topology in each layer, where only transition weights differ, as in the case of the multiplex cycle graphs considered above. On the other hand, in a generic multilayer network, each layer will have, in general, a different number of nodes and a different topology. This latter situation can be reinterpreted as having a multiplex where, in each layer, we can have isolated nodes that are never reached by a walker, or forbidden transitions. Accordingly, we envisage that layers in a multiplex would be, in general, harder to distinguish via our method than layers in a generic multilayer network, and as such, we expect the detection method to be easily generalizable to the multilayer case, something that should be studied in the future.

In a second step, we have introduced a probabilistic scheme to estimate the most probable number of layers composing the hidden multiplex. Note that probabilistic model selection is not new in network inference; for instance, in Ref. [82], a similar concept was used to estimate the most probable combination of basic block models that accounts for a certain network topology (see, also, Refs. [83–86]), whereas in Ref. [53], a probabilistic framework was developed to estimate the most probable number of communities in a single-layer network. Formally similar strategies to estimate model parameters based on ϵ-machines or jump-Markov system identification have also been put forward in the nonlinear dynamics [87] and control theory [57] communities, respectively. In our case, the posterior probabilities quantify the likelihood of having a hidden multiplex with L layers; i.e., our approach for multiplex model estimation is purely Bayesian. We were able to demonstrate the validity and accuracy of this second part in simple synthetic networks (multiplex cycles) up to L = 5 layers, as reported in the main text and appendixes. Since the model selection protocol can be seen as a multidimensional Bayesian inference problem, these schemes—similarly to hidden Markov models and other methods in statistical inference—suffer from poor
scalability: Essentially, the computation of the model posterior probability explodes with the number of unknowns. This is a limitation of the method, and its optimization is therefore an open problem for future work. As a matter of fact, a simple and intuitive (although inefficient) way to estimate these posteriors is to use a (deterministic) grid integration scheme. In an effort to improve scalability and optimize such calculations, we have proposed a nonlinear population Monte Carlo algorithm (described in full detail in Appendixes C and D), which reduces the computer runtime by a factor close to 10^2, without performance loss, for all the examples where we had previously used deterministic integration. Interestingly enough, our model selection scheme not only selects the most probable number of hidden layers: By capitalizing on an importance sampling routine that focuses on a small region of the parameter space where the likelihood is concentrated, we can also provide a Bayesian estimation of the model parameters (i.e., the topology and transition rates), thereby estimating not only the most probable number of layers but, given that number of layers, the full architecture.

We have thoroughly illustrated the validity and scalability of the whole method by addressing several synthetic systems of varying complexity, as depicted in Sec. II and the appendixes, where we show that we can reconstruct the full architecture of the hidden multiplex network by analyzing the statistics of random walks over the projected network. Interestingly, our method can be applied to signals of arbitrary origin, extending the formalism to also deal with continuous processes—i.e., time series that are not necessarily random walkers navigating a network—after a simple graph embedding (series discretization). Under this extension, our hidden jump-Markov model provides a decomposition of a given non-Markovian dynamics into a Markov switching combination of diffusive modes and thus enjoys larger generality: The reconstructed multiplex model embeds the originally non-Markovian signal into a random walk navigating an effective multiplex network, where each layer accounts for a different type of diffusive dynamics. We validated this extension by analyzing canonical continuous processes (the Ornstein-Uhlenbeck process and a Langevin equation with colored noise). Furthermore, we have proved two existence theorems which guarantee that an exact reconstruction is always possible, for any type of (arbitrary) finite-order Markovian and fully non-Markovian (i.e., infinite-memory) process. Finally, we applied this methodology in two experimental scenarios (a case of human mobility in an online universe and the analysis of the dynamics of RNA polymerase) and hence showcased its applicability to real, experimental data.

To conclude, starting from the question of whether it is possible to disentangle the hidden multiplex architecture of a complex system if one only has experimental access to a projection of this architecture, in this work we have elaborated a mathematical and computational framework that actually deals more generally with the decomposition of non-Markovian dynamics. When these series are indeed traces of walkers navigating a network, under the premise of having intralayer diffusion, our approach provides a workable solution for the unfolding of a multiplex network from its aggregated projection. We should make it clear that, generally speaking, it is not possible to assert that the effective multiplex representation is the true underlying architecture, much like one cannot typically claim that there exist true communities in an observed network. However, one can determine in a principled way whether the available observations are more reliably represented by a multiplex model or by a monoplex one, in the same vein as models of networks with community structure sometimes reproduce the observations more faithfully than models that lack such community structure.

In the general case, our method provides a potentially useful approach to disentangle the combination of dynamical regimes that appear intertwined in noisy dynamics; for instance, this is the case of RNA polymerase moving in a noisy environment and stochastically switching between an active and a passive state. This suggests that applications of this work not only include network science but extend to other fields in biophysics, condensed matter physics, gambling, or mathematical finance, where non-Markovian signals pervade.

ACKNOWLEDGMENTS

We sincerely thank Michael Szell, Roberta Sinatra, and Vito Latora for sharing data on the Pardus universe and for fruitful discussions, and we thank the anonymous referees for useful comments. L. L. acknowledges funding from EPSRC Grant No. EP/P01660X/1. I. P. M. acknowledges the Spanish Ministry of Economy and Competitiveness (Projects No. TEC2015-69868-C2-1-R ADVENTURE and No. TEC2017-86921-C2-1-R CAIMAN) for financial support. J. M. acknowledges the Spanish Ministry of Economy and Competitiveness (Project No. TEC2015-69868-C2-1-R ADVENTURE) and the Office of Naval Research (ONR) Global (Grant No. N62909-15-1-2011) for financial support. I. P. M. also acknowledges support from the grant of the Ministry of Education and Science of the Russian Federation Agreement No. 074-02-2018-330. J. G. G. acknowledges financial support from MINECO (Projects No. FIS2014-55867-P and No. FIS2017-87519-P) and from the Departamento de Industria e Innovacion del Gobierno de Aragon y Fondo Social Europeo (FENOL group E36_17R). L. L., I. P. M., and J. M. contributed equally to this work.

APPENDIX A: A FEW NUMERICAL CONSIDERATIONS

Bounds on switching rate.—Numerically, the problem of detecting multiplexity via statistical differences between the Markovian surrogates Y(t) and X(t) should, in principle, be easier when the switching rate r is small enough
031038-13 LUCAS LACASA et al. PHYS. REV. X 8, 031038 (2018) such that we allow trends associated with each layer to method correctly detects multiplexity in this arguably more build up in the series, but large enough such that Dð1Þ¼ complicated scenario. Dð2Þ¼0 (that is, large enough such that the one- and two- In the next sections, we extend the initial study on step joint distributions are still equivalent). A simple inferring multiplexity to the case where layers have theoretical lower bound is r<1=m (as the average size different complex topologies, departing from the situation of a trend is 1=r, and this should be at least as large as the where each layer is a cycle graph (ring). Note that when block size). Note that this bound is not tight in what we each layer has a different topology, we expect the refer to as real-world scenarios (as for human mobility), algorithm to more easily detect the underlying multiplex where the switching rate is normally low in order to avoid character (in this sense, extension of this formalism to the unnecessary delays or has characteristic timescales that are more general case of a multilayer network is very much lower than the diffusion timescales (as for changing promising). In particular, we explore the following addi- the foraging mode between day and night, in animal tional scenarios: mobility). (i) Scenario 2: Two complete graphs with different Finite size bias in KLD.—As a technical remark, note transition matrices. that KLDðpjjqÞ diverges if the distributions p and q have (ii) Scenario 3: Two layers with cycle graphs, where in different supports [i.e., if qðmÞ¼0, pðmÞ ≠ 0 or one of the layers we introduce a shortcut, which is pðmÞ¼0, qðmÞ ≠ 0 for some value m]. In order to crossed with a probability ϵ. 
This scenario allows appropriately weight this possibility while maintaining us to explore small perturbations in the transition the finite measure in pathological cases, a common matrices with respect to the original scenario studied procedure [49] is to introduce a small bias that allows in the main text. for the possibility of having a small uncertainty for every (iii) Scenario 4: Each layer has a different topology and contribution. Here, we introduce a bias of order Oð1=n2Þ, dynamics, the first being a cycle graph (ring) with where n is the series size (i.e., we replace all vanishing positive net current and the second layer being a frequencies with 1=n, and we normalize the frequency complete graph with null net current. histogram appropriately). (iv) Scenario 5: Initially having two identical layers (two cycle graphs), we introduce a number of additional edges (shortcuts) in the second one to APPENDIX B: INFERRING MULTIPLEXITY IN explore topological perturbations on our initial SOME ADDITIONAL SCENARIOS scenario. We also explore the scalability of this We start by providing, in Fig. 7, an additional analysis general scenario by considering the effect of in- similar to Fig. 2 but where there is no induced current. The creasing the number of nodes. (v) Scenario 6: Initially, we consider two identical layers formed by Erdos-Renyi graphs and then rewire a percentage of the edges in one of the layers. Transition matrices are obtained here by unbiasing the walker Tij ¼ Aij=ki. We also explore the scal- ability of this scenario by considering the effect of increasing the size of the graphs.
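The diffusive process analyzed throughout these scenarios (a walker that occasionally switches layer and otherwise moves within its current layer) can be sketched in a few lines. The function name and the example matrices below are ours, not from the paper; the layer switch is modeled as in Appendix D (jump with probability r, uniformly over the other layers):

```python
import random

def simulate_multiplex_walk(T, r, n_steps, seed=0):
    """Simulate the projected series X(t) of a walker on a multiplex.

    T : list of K x K row-stochastic matrices, one per layer.
    r : probability of switching layer at each step.
    Returns the visited nodes only, i.e., the aggregated observation X(t).
    """
    rng = random.Random(seed)
    L, K = len(T), len(T[0])
    layer, node = 0, 0
    xs = []
    for _ in range(n_steps):
        # Layer switch: stay with probability 1 - r, else jump to another layer
        if L > 1 and rng.random() < r:
            layer = rng.choice([l for l in range(L) if l != layer])
        # Intralayer Markovian move within the current layer
        node = rng.choices(range(K), weights=T[layer][node])[0]
        xs.append(node)
    return xs

# Two 4-node rings: unbiased in layer 1, right-biased in layer 2 (our example)
T1 = [[0, .5, 0, .5], [.5, 0, .5, 0], [0, .5, 0, .5], [.5, 0, .5, 0]]
T2 = [[0, 2/3, 0, 1/3], [1/3, 0, 2/3, 0], [0, 1/3, 0, 2/3], [2/3, 0, 1/3, 0]]
x = simulate_multiplex_walk([T1, T2], r=0.1, n_steps=1000)
```

The projected series x is what the observer sees; the layer label is hidden.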
1. Scenario 2: Complete graphs

In this scenario, we consider two layers with identical topology but different transition matrices. Here, we build two replicas of a complete graph. In the first layer, we define an unbiased random walker with transition matrix
FIG. 7. The normalized Kullback-Leibler divergence D(m) between X(t) and its Markovian surrogate Y(t), in the case where X(t) does not show any induced current, for different values of the switching rate r. The series X(t) records the position of a walker diffusing over two layers, where transition probabilities in the layers are T^(1)_{i,i+1} = 1/3 and T^(2)_{i,i+1} = 2/3. We correctly find that X(t) is non-Markovian even for large values of r, as D(m>2) > 0, which suggests an underlying multiplex structure.

$$T^{(1)} = \begin{pmatrix} 0 & 1/3 & 1/3 & 1/3 \\ 1/3 & 0 & 1/3 & 1/3 \\ 1/3 & 1/3 & 0 & 1/3 \\ 1/3 & 1/3 & 1/3 & 0 \end{pmatrix},$$

whereas in the second layer, we introduce a parametric deviation
$$T^{(2)} = \begin{pmatrix} 0 & 1/3+2\epsilon & 1/3-\epsilon & 1/3-\epsilon \\ 1/3+2\epsilon & 0 & 1/3-\epsilon & 1/3-\epsilon \\ 1/3+2\epsilon & 1/3-\epsilon & 0 & 1/3-\epsilon \\ 1/3+2\epsilon & 1/3-\epsilon & 1/3-\epsilon & 0 \end{pmatrix}.$$

The statistics of a Markov chain over each layer separately differ more for larger ϵ. For ϵ = 0, both layers are identical, and a walker diffusing over the multiplex (switching layers at a constant rate r) reduces to a Markov chain over one layer, whereas for ϵ > 0, the process is non-Markovian if we only have access to the state X(t). In Fig. 8, we plot the results for D(m), which show that multiplexity can always be detected, and such detection is easier as ϵ increases.

FIG. 8. Scenario 2. Normalized Kullback-Leibler divergence D(m) between X(t) and its Markovianized surrogate Y(t), in the case where both layers are complete graphs with different transition matrices and the switching rate is r = 0.1. The difference is parametrized by ϵ (when ϵ = 0, both matrices are identical, and they increasingly differ with increasing values of ϵ). The method detects multiplexity [D(m>2) > 0] with larger values as ϵ increases.

2. Scenario 3: Controlled perturbation on one layer

In this scenario, we explore the effect of a controlled perturbation in the topology of one of the layers. We start by defining L = 2 identical layers (two rings with the same transition matrix, with a homogeneous probability T_{i,i+1} = 1/3 to flow from i to i+1). In the second replica, we introduce a shortcut between two nodes, weighting the probability of traversing this shortcut as 2ϵ and biasing the rest of the edges accordingly. Thus, the transition matrices of both layers read

$$T^{(1)} = \begin{pmatrix} 0 & 1/3 & 0 & 2/3 \\ 2/3 & 0 & 1/3 & 0 \\ 0 & 2/3 & 0 & 1/3 \\ 1/3 & 0 & 2/3 & 0 \end{pmatrix}, \qquad T^{(2)} = \begin{pmatrix} 0 & 1/3-\epsilon & 2\epsilon & 2/3-\epsilon \\ 2/3+\epsilon & 0 & 1/3-\epsilon & 0 \\ 2\epsilon & 2/3-3\epsilon & 0 & 1/3+\epsilon \\ 1/3+\epsilon & 0 & 2/3-\epsilon & 0 \end{pmatrix}.$$

When ϵ = 0, this extra edge has no effect, and we should expect that the multiplex is effectively a monoplex; however, for ϵ > 0, both layers show gradually different structures, and as such, X(t) is non-Markovian. Similar to the previous case, our methodology predicts that the network is a multiplex when D(m ≤ 2) = 0 and D(m > 2) > 0.

FIG. 9. Scenario 3. Normalized Kullback-Leibler divergence D(m) between X(t) and its Markovianized surrogate Y(t) in the case where the second layer has a shortcut, which is traversed with probability ϵ, and the switching rate is r = 0.1. For ϵ = 0, both layers are identical, and the walker is essentially navigating over a monoplex; therefore, X(t) is Markovian, and D(m) = 0 ∀ m. For ϵ > 0, the graph is a multiplex; therefore, D(m>2) > 0, and such a feature is detected with quantitatively larger fingerprints as ϵ increases.

FIG. 10. Scenario 4. Normalized Kullback-Leibler divergence D(m) between X(t) and its Markovianized surrogate Y(t) in the case where the second layer is a complete graph.
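When scanning over ϵ, it is convenient to assemble the perturbed scenario-3 matrices programmatically and to verify that they remain row stochastic; a minimal sketch (function name ours):

```python
def scenario3_matrices(eps):
    """Scenario 3: two 4-node rings with T_{i,i+1} = 1/3 and T_{i,i-1} = 2/3;
    the second layer adds a shortcut of weight 2*eps between nodes 0 and 2,
    rebalancing the remaining edges accordingly."""
    T1 = [[0, 1/3, 0, 2/3],
          [2/3, 0, 1/3, 0],
          [0, 2/3, 0, 1/3],
          [1/3, 0, 2/3, 0]]
    T2 = [[0, 1/3 - eps, 2 * eps, 2/3 - eps],
          [2/3 + eps, 0, 1/3 - eps, 0],
          [2 * eps, 2/3 - 3 * eps, 0, 1/3 + eps],
          [1/3 + eps, 0, 2/3 - eps, 0]]
    return T1, T2

T1, T2 = scenario3_matrices(0.05)
# Every row of a transition matrix must sum to 1
assert all(abs(sum(row) - 1) < 1e-9 for row in T1 + T2)
```

The assertion is a cheap guard against mistyped perturbations when sweeping ϵ.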
Intuitively, as ϵ increases, the effect of the shortcut should be stronger, and thus detecting the multiplex nature of the network should be easier. In Fig. 9, we plot D(m) for a Markov chain walking over this topology with a constant switching rate r = 0.1 and different values of ϵ. The results support the accuracy of the method; even for small values of ϵ, the multiplex nature of the network is detected.
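The detection statistic is a Kullback-Leibler divergence between m-block statistics of the observed series and of its Markovian surrogate. A minimal sketch of the estimator, including the replacement of vanishing frequencies by 1/n described in Appendix A (function names ours):

```python
from collections import Counter
from math import log

def block_counts(series, m):
    """Empirical counts of overlapping length-m blocks of a discrete series."""
    return Counter(tuple(series[i:i + m]) for i in range(len(series) - m + 1))

def kld_biased(p_counts, q_counts, n):
    """KLD(p||q) over m-blocks, replacing vanishing frequencies with 1/n and
    renormalizing, following the finite-size regularization of Appendix A."""
    support = set(p_counts) | set(q_counts)

    def normalize(counts):
        freq = {b: counts.get(b, 0) / n for b in support}
        freq = {b: f if f > 0 else 1 / n for b, f in freq.items()}
        tot = sum(freq.values())
        return {b: f / tot for b, f in freq.items()}

    p, q = normalize(p_counts), normalize(q_counts)
    return sum(p[b] * log(p[b] / q[b]) for b in support)
```

A usage example: `kld_biased(block_counts(x, m), block_counts(y, m), len(x))`, where y is a Markovian surrogate of x; the bias keeps the estimate finite when a block appears in one series but not the other.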
3. Scenario 4: Ring versus complete graph

In this scenario, we focus on a multiplex with L = 2, where each layer is totally different. The first layer is a ring with a net current, described by the transition matrix

$$T^{(1)} = \begin{pmatrix} 0 & 1/3 & 0 & 2/3 \\ 2/3 & 0 & 1/3 & 0 \\ 0 & 2/3 & 0 & 1/3 \\ 1/3 & 0 & 2/3 & 0 \end{pmatrix},$$

whereas the second layer is a complete graph with no net current and detailed balance everywhere:

$$T^{(2)} = \begin{pmatrix} 0 & 1/3 & 1/3 & 1/3 \\ 1/3 & 0 & 1/3 & 1/3 \\ 1/3 & 1/3 & 0 & 1/3 \\ 1/3 & 1/3 & 1/3 & 0 \end{pmatrix}.$$

Again, we find that the method can clearly detect the multiplex nature of the network, as D(m>2) > 0 (see Fig. 10).

4. Scenario 5: Sequentially introducing shortcuts

In this scenario, we consider two identical replicas (a multiplex with L = 2 layers), where in the second layer we add a certain number of additional edges. Originally, both replicas are rings with detailed balance (unbiased walker),

$$T^{(1)} = T^{(2)} = \begin{pmatrix} 0 & 1/2 & 0 & 1/2 \\ 1/2 & 0 & 1/2 & 0 \\ 0 & 1/2 & 0 & 1/2 \\ 1/2 & 0 & 1/2 & 0 \end{pmatrix}.$$

In this initial case, we expect D(m) = 0 ∀ m (the multiplex is effectively a monoplex). We then add to this benchmark a different number of edges [and, accordingly, we expect D(m>2) > 0, with larger values when the number of added edges increases], effectively interpolating between scenario 2 and scenario 3. The adjacency matrices for three concrete cases (with no added edges, one added edge, and two added edges, this latter case being equivalent to a complete graph) are depicted in Fig. 11, and in Fig. 12 we show the values of D(m).

FIG. 12. Scenario 5. Normalized Kullback-Leibler divergence D(m) between X(t) and its Markovianized surrogate Y(t) for a two-layer multiplex, where the second layer is a replica of the first layer in which a number of shortcuts have been introduced. When no edges have been introduced, both replicas are identical, X(t) is Markovian, and D(m) = 0 ∀ m; otherwise, the process is non-Markovian, and the algorithm detects multiplexity by finding D(m>2) > 0. Such detection improves as the topology of both layers is increasingly different.

5. Scenario 5b: Introducing edges on larger graphs

Here, we investigate the scalability of scenario 5; in particular, we explore (i) the effect of increasing the number of nodes K in each layer and (ii) the effect of increasing the number of rewired edges on the detectability. We start by defining two identical replicas of a ring with K nodes, with unbiased transition matrix T_ij = 1/2 for neighboring nodes i ≠ j. In the
FIG. 11. Scenario 5. Adjacency matrices of each layer for three cases: (i, left panel) No shortcuts are introduced, and both layers are identical; (ii, middle panel) one shortcut has been introduced; (iii, right panel) two shortcuts have been introduced. The transition matrices for each case are T_ij = A_ij/k_i, where k_i is the degree of node i.
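The rule T_ij = A_ij/k_i quoted in the caption amounts to row normalizing the adjacency matrix; a minimal sketch (function name ours):

```python
def transition_from_adjacency(A):
    """Unbiased random-walk transition matrix T_ij = A_ij / k_i,
    where k_i is the degree (row sum) of node i."""
    return [[a_ij / sum(row) for a_ij in row] for row in A]

# 4-node ring with one added shortcut between nodes 0 and 2 (our example)
A = [[0, 1, 1, 1],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [1, 0, 1, 0]]
T = transition_from_adjacency(A)
```

Node 0 now spreads probability 1/3 over its three neighbors, while node 1 still splits 1/2 between its two ring neighbors.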
second layer, we introduce a number of shortcuts, and we analyze two particular behaviors as follows.

Effect of node increase.—We fix the number of shortcuts p = 2 and vary the number of nodes K, and we explore the dependence of D(m) on K (see Fig. 13). As we keep the series size N(m) = 10^5 × 2^m independent from K, we expect that as K increases, the statistics are poorer, as we need larger series to capture an equivalent number of transitions.

FIG. 13. Scenario 5b. Normalized Kullback-Leibler divergence D(m) between X(t) and its Markovianized surrogate Y(t) for a two-layer multiplex where each layer has K nodes, and the second layer is a replica of the first layer in which p = 2 shortcuts have been introduced.

Detectability as a function of the number of shortcuts introduced.—Here, we fix K = 10 and explore the multiplex detectability as the number of shortcuts p is increased in the second layer. Multiplexity is detected when D(3) > 0, and such detection is easier with larger D(3). In Fig. 14, we plot D(3) as a function of the number of shortcut edges p.

FIG. 14. Scenario 5b. Detectability measure D(3) for a two-layer multiplex with K = 10 nodes per layer, where the second layer is a replica of the first layer in which a number of shortcuts p have been introduced. Results have been averaged over 10 network realizations for each case.

6. Scenario 6: Rewiring edges

In this scenario, we initially consider two replicas of the same Erdos-Renyi graph (where nodes i and j are connected with probability p = 0.8, above the percolation threshold, to have a connected graph). We consider three different situations: (i) Both layers are kept identical, (ii) we rewire one edge at random, and (iii) we rewire two edges at random. The adjacency matrices of each layer for these three cases are represented in Fig. 15, and we choose unbiased random walkers with layer transition matrices P_ij = A_ij/k_i. In Fig. 16, we plot the values of D(m) for these three cases. As expected, when the layers are identical, we find D(m) = 0 ∀ m, whereas when we rewire edges from a layer, the network is converted into a multiplex one, and D(m>2) > 0. Also, D(m>2) takes larger values (and thus multiplex detection is easier) when the layers are increasingly different.

7. Scenario 6b: Rewiring edges on larger graphs

Finally, we consider Erdos-Renyi graphs (linking probability 0.65) with K = 10 nodes per layer and explore the multiplex detectability as we rewire a percentage of nodes
FIG. 15. Scenario 6. Adjacency matrices of each layer for three cases: (i, left panel) No rewiring takes place, and both layers are identical; (ii, middle panel) one edge has been rewired; (iii, right panel) two edges have been rewired. The transition matrices for each case are T_ij = A_ij/k_i, where k_i is the degree of node i.
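A sketch of the scenario-6 construction (an Erdos-Renyi layer plus a rewired replica). Function names and the way candidate slots are drawn are our illustrative assumptions, not the paper's code:

```python
import random

def er_adjacency(K, p, rng):
    """Adjacency matrix of an Erdos-Renyi graph G(K, p): symmetric, no self-loops."""
    A = [[0] * K for _ in range(K)]
    for i in range(K):
        for j in range(i + 1, K):
            if rng.random() < p:
                A[i][j] = A[j][i] = 1
    return A

def rewire(A, n_edges, rng):
    """Copy of A with n_edges existing edges moved to currently empty node
    pairs; the perturbation applied to the second layer in scenario 6."""
    B = [row[:] for row in A]
    K = len(B)
    pairs = [(i, j) for i in range(K) for j in range(i + 1, K)]
    for _ in range(n_edges):
        i, j = rng.choice([e for e in pairs if B[e[0]][e[1]] == 1])
        u, v = rng.choice([e for e in pairs if B[e[0]][e[1]] == 0])
        B[i][j] = B[j][i] = 0
        B[u][v] = B[v][u] = 1
    return B

rng = random.Random(42)
layer1 = er_adjacency(10, 0.8, rng)
layer2 = rewire(layer1, 2, rng)  # same graph with 2 edges moved
```

Because rewiring only moves edges, the number of edges (and hence the average degree) of the two layers stays equal, so any detected multiplexity is due to topology, not density.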
p in the second layer. Multiplexity is detected when D(3) > 0, and the larger D(3), the easier such detection is. In Fig. 17, we plot D(3) as a function of p. Dots are the result of an ensemble average over ER graph realizations. For an ensemble, we keep the number of rewired edges fixed and compute the effective average percentage of rewired edges (which fluctuates because each realization of an ER graph will have a different total number of edges).

FIG. 16. Scenario 6. Normalized Kullback-Leibler divergence D(m) between X(t) and its Markovianized surrogate Y(t) for a two-layer multiplex, where the second layer is a replica of the first layer in which a number of edges have been rewired. When no edges have been rewired, both replicas are identical, X(t) is Markovian, and D(m) = 0 ∀ m; otherwise, the process is non-Markovian, and the algorithm detects multiplexity by finding D(m>2) > 0. Such detection improves as the topology of both layers is increasingly different.

FIG. 17. Scenario 6b. Detectability measure D(3) for a two-layer multiplex with K = 10 nodes per layer, where each layer is a connected Erdos-Renyi graph (link probability 0.65). The second layer is a replica of the first layer, where a percentage p of nodes have been rewired. Results have been averaged over 10^2 network realizations for each case.

APPENDIX C: MATHEMATICAL AND ALGORITHMIC FRAMEWORK FOR LAYER ESTIMATION

Here, we formalize the problem and provide a detailed derivation of the model, the probabilistic model selection scheme, and some possible algorithmic implementations of this scheme. Let us remark that the probabilistic framework described herein includes the case in which the path followed by the walker across the multiplex network cannot be observed exactly (i.e., there are observation errors), even if this is not addressed in the main text.

We use an argument-wise notation to denote PMFs and probability density functions (PDFs). If X and Y are discrete random variables (RVs), then P(X) and P(Y) are the PMF of X and the PMF of Y, respectively, and are possibly different. Similarly, P(X, Y) and P(X|Y) denote the joint PMF of the two RVs and the conditional PMF of X given Y, respectively. We use lowercase p for PDFs. If X and Y are continuous RVs, then p(X) and p(Y) are the corresponding densities, which are possibly different, and p(X, Y) and p(X|Y) denote the joint and conditional PDFs. We may have a PDF of a continuous RV X conditional on a discrete RV Y, p(X|Y), as well as a PMF of Y given X, P(Y|X). Most RVs are indicated with uppercase letters, e.g., X. If we need to denote a specific realization of the RV, then we use the same letter but in lowercase, e.g., X = x or Y = y. Matrices and vectors are indicated with a boldface font, e.g., T.

1. The model

We assume that a walker travels through a multiplex network, taking random moves between neighboring nodes and, occasionally, between layers. Let L denote the number of layers in the multiplex network, and let K be the number of nodes per layer. At discrete time t, the RV X(t) denotes the in-layer walker position. Therefore, X(t) ∈ {0, …, K−1}, and X(t) = k means that the particle is located at node number k at time t, irrespective of the layer. The RV l(t) indicates the layer at time t, i.e., l(t) ∈ {1, 2, …, L}, and l(t) = l means that the walker is found in layer l at time t. The state of the walker, therefore, is given by the 2 × 1 vector Z(t) = [X(t), l(t)]^⊤.

At each time step, the walker may jump across layers. This motion is assumed to be Markov, and hence it can be characterized by an L × L stochastic transition matrix R_L, where the entry R_ij, (i, j) ∈ {0, …, L−1}^2, represents the probability of moving from layer i to layer j. Subsequently, the particle diffuses within the new layer to one of its neighbors. Within each single layer, the motion of the walker is also assumed to be Markovian. Hence, in the lth layer, it is governed by a K × K transition matrix T^(l), such that T^(l)_ij is the probability of a particle lying in layer l to diffuse from node i to node j. These probabilities are constant over time t. The complete Markov model is
characterized by the set of matrices {R_L, T^(0), …, T^(L−1)}, and we denote T_L = {T^(l)}_{l=0}^{L−1} in the sequel for convenience. We can think of this model as a discrete-time state-space dynamical system, where the state variables at time t are

$$\mathcal{X}(t) = \{R_L, \mathbf{T}_L, X(t), l(t)\}, \quad t = 0, 1, 2, \ldots, \qquad (C1)$$

and we assume there is a sequence of observations {Y(t)}_{t≥1} taking values in the set of node labels {0, 1, …, K−1}. In the main text, we assume that the observations are exact and, therefore, Y(t) = X(t). However, we can handle a more general class of models in which Y(t) is a RV with conditional PMF P(Y(t)|X(t)) (independently of the current or past layers). If the observation is exact, then

$$P(Y(t) = j \mid X(t) = i) = \begin{cases} 1 & \text{if } j = i \\ 0 & \text{otherwise.} \end{cases}$$

However, the proposed model (and related numerical methods) admits the cases in which observation errors may occur, and hence, P(Y(t)|X(t)) is a nondegenerate PMF. We assume there are known and independent a priori PMFs for the node and layer at time t = 0, P_0(X(0), l(0)) = P_0(X(0)) P_0(l(0)). In practical problems, the parameters T_L and R_L are unknown, and we also endow them with prior PDFs p_0(T_L, R_L) = p_0(T_L) p_0(R_L) w.r.t. a suitable reference measure μ(dT_L × dR_L). Most often, and for a general scenario, μ can be the Lebesgue measure on R^{L×K×K} × R^{L×L}, but other choices may be possible if we wish to impose constraints on T_L and R_L. For the case of the network with ring-shaped layers in the main text, μ can be reduced to the Lebesgue measure on R^{L+1}.

2. Bayesian model selection

Assume that we have collected a sequence of N observations, which we now label

Y(1:N) = {Y(1), Y(2), …, Y(N)} ∈ {0, 1, …, K−1}^N.

We wish to make a decision as to what model is the best fit for that sequence. We adopt the view that two models are different when they have a different number of layers; hence, if model A has L layers and model B has L′ layers, A = B ⇔ L = L′. A convenient way to tackle this problem is to model the total number of layers L as a RV, in such a way that each possible value of L corresponds to a different model. If we define
(i) a prior probability mass function for L, say, P_0(L), for L ∈ {1, 2, …, L_+}, where L_+ is the maximum admissible number of layers, and
(ii) a likelihood function

$$P(Y(1{:}N) \mid L) = \int P(Y(1{:}N) \mid \mathbf{T}_L, R_L)\, p_0(\mathbf{T}_L)\, p_0(R_L)\, \mu(d\mathbf{T}_L \times dR_L), \qquad (C2)$$

then we can aim at computing the posterior PMF of the number of layers,

$$P(L \mid Y(1{:}N)) \propto P(Y(1{:}N) \mid L)\, P_0(L),$$

and choose the model according to the MAP criterion

$$\hat{L}_{\mathrm{MAP}} = \arg\max_{L \in \{1, 2, \ldots, L_+\}} P(L \mid Y(1{:}N)) = \arg\max_{L \in \{1, 2, \ldots, L_+\}} P(Y(1{:}N) \mid L)\, P_0(L), \qquad (C3)$$

i.e., we choose the value of L that is more probable given the available observations. Expression (C3) yields the optimal solution to the problem of selection of the number of layers in the multiplex from a probabilistic Bayesian point of view. As discussed below, one can find alternatives to this approach in the literature on HMMs [54,56]; however, the latter suffer from a number of theoretical and practical limitations, and we strongly advocate for the Bayesian solution (C3).

3. Connections with hidden Markov model estimation theory

The problem of selecting the number of layers L in the multiplex model can be cast as one of selecting a HMM where the complete state is Z(t) = [X(t), l(t)]^⊤ and the transition from time t to time t+1 is governed by the (unknown) stochastic matrices R_L, T^(0), …, T^(L−1). Let us adapt the notation a bit in order to make it closer to the classical HMM theory. We only have partial observations of the Markov chain, namely, the sequence O = Y(1:N), with "emission" probabilities P(Y(t)|X(t)), while the sequence of layer labels {l(t)}_{t=1}^N remains unobserved. The goal is to estimate the total number of layers L from the observations O.

The theory of HMMs has received considerable attention in the literature since the 1970s because of their application in a variety of fields, including speech processing, molecular biology, data compression, or artificial intelligence. The problem of fitting a HMM, i.e., estimating its unknown parameters, has been thoroughly researched [54].
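In practice, the MAP rule of Eq. (C3) is applied to estimated log evidences; a toy sketch with invented numbers and a uniform prior over L = 1, 2, 3 (function name and values ours):

```python
from math import log

def map_number_of_layers(log_evidence, log_prior):
    """MAP criterion of Eq. (C3): pick the L that maximizes
    log P(Y(1:N)|L) + log P0(L). Inputs are dicts keyed by L."""
    return max(log_evidence, key=lambda L: log_evidence[L] + log_prior[L])

# Hypothetical log-evidence values (e.g., from a Monte Carlo estimator)
log_ev = {1: -1250.4, 2: -1180.9, 3: -1183.2}
log_p0 = {L: log(1 / 3) for L in log_ev}   # uniform prior over candidates
L_map = map_number_of_layers(log_ev, log_p0)
```

Working in log space avoids underflow, since the evidences of long observation sequences are astronomically small numbers.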
The is to model the total number of layers L as a RV, in such a classical technique is the Baum-Welch algorithm, which way that each possible value of L corresponds to a different is actually an instance of the expectation-maximization model. If we define (EM) method [54,88]. Indeed, the general EM methodol- (i) a prior probability mass function for L, say, P0ðLÞ, ogy, in several forms, is the standard approach to the ∈ 1 2 … for L f ; ; ;Lþg, where Lþ is the maximum problem of fitting HMMs, often combined with other admissible number of layers, and techniques for its implementation, such as the Viterbi
algorithm, the forward-backward algorithm, or the Kalman smoother (see Ref. [56] for an excellent survey). Many of these techniques adopt the general form of a space-alternating EM algorithm, where the unobserved states and the unknown parameters are iteratively estimated, one at a time. This methodology was introduced in Ref. [55], and it provides a common framework for many current algorithms for fitting HMMs.

(ii) The EM framework yields local optimization algorithms. Even if the EM scheme converges, it may yield a local maximizer of the likelihood for L_1 and, perhaps, a global maximizer for L_2; in this case, comparing the resulting parameter likelihoods can again be misleading.
(iii) Even if we manage to obtain accurate maximum likelihood estimates of T_{L_1}, R_{L_1} and T_{L_2}, R_{L_2}, there is no guarantee that P(O|T̂_{L_1}, R̂_{L_1}) > P(O|T̂_{L_2}, R̂_{L_2}) implies P(O|L_1) > P(O|L_2).

[…] with up to L ≥ 4 layers are shown.
(iv) Conventional Monte Carlo integration, which suffers from a similar complexity limitation. Classical Markov chain Monte Carlo (MCMC) samplers [92,93] could be well suited to solve integrals with respect to the posterior PDF p(T_L, R_L|O); however, the integral in Eq. (C5) is actually the normalizing constant of this posterior, which turns out to be hard to estimate via MCMC, which limits its application to model selection, in general [93].

The classical alternative to MCMC in Monte Carlo integration is importance sampling (IS) [93]. While conventional IS suffers from a problem called weight degeneracy, which translates into poor scaling with the dimension of the parameters in the integral, recently, families of much more efficient adaptive IS schemes have been introduced [60,61,94,95]. These techniques yield estimates of the integral in Eq. (C5) in a simple way (unlike MCMC) and can potentially work in high dimensions [95].

Below, we present a detailed description of the nonlinear population Monte Carlo (PMC) scheme of Ref. [61] and show an example of model selection with up to 10 layers (L ≤ 10). While conventional (and even state-of-the-art) importance samplers are based on the computation of weights of the form w(z) ∝ p(z)/q(z), where p(z) is the target PDF and q(z) is a proposal density, the key feature of the nonlinear PMC scheme is to compute transformed weights w̄(z) ∝ ϕ(p(z)/q(z)), where ϕ(·) is a nonlinear function, in order to reduce the variance of the weights [if Z is a random variable, then U = w(Z) is random as well]. This very simple transformation, if properly chosen, significantly improves the numerical stability of the algorithm when the dimension of Z grows, while preserving the convergence properties of conventional IS. The examples presented below, for the nonlinear PMC and a deterministic scheme based on regular grids, show that this Monte Carlo integration scheme can be as effective as a deterministic integrator with just a fraction of the running time.

4. Computation of the posterior probabilities via Monte Carlo integration

Let us return to the original notation where Y(1:N) denotes the sequence of observations. In order to select the number of layers L in the multiplex according to the Bayesian criterion in Eq. (C3), we need the ability to evaluate the posterior probability

$$P(L \mid Y(1{:}N)) \propto P(Y(1{:}N) \mid L)\, P_0(L),$$

where the prior P_0(L) is known (chosen by design) but the model likelihood P(Y(1:N)|L) is an integral given by Eq. (C2), namely,

$$P(Y(1{:}N) \mid L) = \int P(Y(1{:}N) \mid \mathbf{T}_L, R_L)\, p_0(\mathbf{T}_L)\, p_0(R_L)\, \mu(d\mathbf{T}_L \times dR_L). \qquad (C6)$$

Using the Bayes theorem, we realize that the integrand in Eq. (C6) is proportional to the posterior density of the parameters, given the observations Y(1:N), i.e.,

$$p(\mathbf{T}_L, R_L \mid Y(1{:}N)) \propto P(Y(1{:}N) \mid \mathbf{T}_L, R_L)\, p_0(\mathbf{T}_L)\, p_0(R_L). \qquad (C7)$$

Taken together, Eqs. (C6) and (C7) indicate that the model likelihood P(Y(1:N)|L) is the normalization constant of the posterior PDF of the parameters, p(T_L, R_L|Y(1:N)). This normalization constant is often termed the model evidence in the Bayesian terminology. An efficient way of computing the normalization constant of a target PDF via Monte Carlo integration is by using the importance sampling (IS) method.

a. Importance sampling in a nutshell

Let p(z) be a target PDF that we can evaluate up to a normalization constant c; i.e., we have the ability to compute

$$\tilde{p}(z) = c\, p(z)$$

pointwise, but c is unknown. The IS method [93] enables the estimation of c [actually, it enables the estimation of integrals of the form ∫ f(z) p(z) dz in general, for any integrable test function f] by sampling from an alternative PDF, q(z), often called the proposal density or importance function. We assume that q(z) is chosen to satisfy

$$w(z) = \frac{\tilde{p}(z)}{q(z)} < \infty, \qquad (C8)$$

where w(z) is the weight function. The inequality in Eq. (C8) typically implies, at least, that q(z) > 0 whenever p(z) > 0.

The basic IS algorithm proceeds as follows:
(1) Draw M independent samples z^1, …, z^M from q(z).
(2) Compute weights w̃^i = w(z^i) = p̃(z^i)/q(z^i), for i = 1, …, M.
(3) Normalize the weights, w^i = w̃^i / Σ_{m=1}^M w̃^m.

It is a straightforward application of the strong law of large numbers [93] to prove that
$$\lim_{M \to \infty} \sum_{i=1}^{M} w^i f(z^i) = \int f(z)\, p(z)\, dz \quad \text{almost surely (a.s.)}$$

for any square-integrable test function f. However, the most relevant result for the purpose of this paper is that

$$\hat{c}^M = \frac{1}{M} \sum_{i=1}^{M} \tilde{w}^i \qquad (C9)$$

is an unbiased, consistent estimator of the normalization constant c. By the strong law of large numbers again,

$$\lim_{M \to \infty} \hat{c}^M = \int w(z)\, q(z)\, dz = \int \tilde{p}(z)\, dz = c \quad \text{a.s.}, \qquad (C10)$$

since p̃(z) = c p(z) and p(z) is a PDF (hence, it integrates to 1).

In the Bayesian model selection problem at hand, the target non-normalized function is given by p̃(z) ≡ P(Y(1:N)|T_L, R_L) p_0(T_L) p_0(R_L), which we can evaluate (as will be shown below), and c = ∫ p̃(z) dz ≡ P(Y(1:N)|L) is the model likelihood.

b. Nonlinear population Monte Carlo technique

The main drawback of standard IS is that, whenever there is a significant mismatch between p(z) and q(z), the variance of the weights becomes very large. As a consequence, estimators converge very slowly (with M), and they become of little use. This issue is usually referred to as weight degeneracy [96]. It typically happens when the dimension of the random variable Z is large or when, simply, the target PDF is very narrow.

To mitigate degeneracy, a number of adaptive IS schemes have been proposed, especially since the publication of Ref. [97]. Here, we resort to one such method that introduces a nonlinear transformation of the weights to control their variability and, hence, degeneracy. The technique is called the NPMC method, and it was originally proposed in Ref. [61]. It consists of J iterations, each involving the computation of both conventional importance weights (IWs) and transformed importance weights (TIWs). The transformation is a clipping or truncation operation, denoted ϕ(·, ·). For a set of M ordered IWs, w̃^{i_1} > w̃^{i_2} > … > w̃^{i_M}, we obtain a set of M TIWs, with clipping of order M_c = √M < M, as

$$\bar{w}^i = \phi(i, \{\tilde{w}^m\}_{m=1}^{M}) = \begin{cases} \tilde{w}^{i_{M_c}} & \text{if } \tilde{w}^i \geq \tilde{w}^{i_{M_c}} \\ \tilde{w}^i & \text{otherwise.} \end{cases}$$

This operation truncates the M_c biggest weights. A general algorithm is outlined below, with M samples per iteration and an instrumental Markov sampling kernel K(·, ·|T^i_L, R^i_L) centered at T^i_L, R^i_L.

(1) Initialization.
(a) Draw M independent samples (T̃^i_{L,0}, R̃^i_{L,0}), i = 1, …, M, from the prior PDFs p_0(T_L) and p_0(R_L).
(b) Compute non-normalized IWs, w̃^i_0 = P(Y(1:N)|T̃^i_{L,0}, R̃^i_{L,0}), i = 1, …, M.
(c) Compute non-normalized TIWs, w̄^i_0 = ϕ(i, {w̃^m_0}_{m=1}^M), i = 1, …, M.
(d) Normalize the TIWs, w^i_0 = w̄^i_0 / Σ_{m=1}^M w̄^m_0, i = 1, …, M.
(e) Resample M times the set {T̃^i_{L,0}, R̃^i_{L,0}}_{i=1}^M, with replacement and using the normalized TIWs as probability masses, to yield an unweighted sample set {T^i_{L,0}, R^i_{L,0}}_{i=1}^M.
(2) Iteration. For j = 1:J:
(a) Draw M independent samples (T̃^i_{L,j}, R̃^i_{L,j}) ∼ K(T_L, R_L|T^i_{L,j−1}, R^i_{L,j−1}), i = 1, …, M.
(b) Compute non-normalized IWs,

$$\tilde{w}^i_j = \frac{P(Y(1{:}N) \mid \tilde{\mathbf{T}}^i_{L,j}, \tilde{R}^i_{L,j})\, p_0(\tilde{\mathbf{T}}^i_{L,j})\, p_0(\tilde{R}^i_{L,j})}{K(\tilde{\mathbf{T}}^i_{L,j}, \tilde{R}^i_{L,j} \mid \mathbf{T}^i_{L,j-1}, R^i_{L,j-1})}, \quad i = 1, \ldots, M.$$

(c) Compute non-normalized TIWs, w̄^i_j = ϕ(i, {w̃^m_j}_{m=1}^M), i = 1, …, M.
(d) Normalize the TIWs, w^i_j = w̄^i_j / Σ_{m=1}^M w̄^m_j, i = 1, …, M.
(e) Resample M times the set {T̃^i_{L,j}, R̃^i_{L,j}}_{i=1}^M, with replacement and using the normalized TIWs as probability masses, to yield an unweighted sample set {T^i_{L,j}, R^i_{L,j}}_{i=1}^M.

After the Jth iteration, we have the IS estimator of the model likelihood

$$P^M(Y(1{:}N) \mid L) = \frac{1}{M} \sum_{i=1}^{M} \tilde{w}^i_J.$$

This is a consistent estimator that converges with optimal Monte Carlo error rates. In particular, assuming that the IWs are bounded away from zero, Theorem 1 of Ref. [98] states that for any arbitrarily small ϵ < 1/2, there exists an a.s. finite random variable U^ϵ such that
$$\left| P^M(Y(1{:}N) \mid L) - P(Y(1{:}N) \mid L) \right| < \frac{U^\epsilon}{M^{\frac{1}{2}-\epsilon}}$$

and, therefore,

$$\lim_{M \to \infty} P^M(Y(1{:}N) \mid L) = P(Y(1{:}N) \mid L) \quad \text{a.s.}$$

All that remains is to show that the parameter likelihood function P(Y(1:N)|T_{L,j}, R_{L,j}), and hence the IWs and the TIWs, can be evaluated exactly.

c. Exact calculation of the parameter likelihood P(Y(1:N)|T_L, R_L)

We start with the factorization

$$P(Y(1{:}N) \mid \mathbf{T}_L, R_L) = \prod_{t=1}^{N} P(Y(t) \mid Y(1{:}t-1), \mathbf{T}_L, R_L), \qquad (C11)$$

where each factor

$$P(Y(t) \mid Y(1{:}t-1), \mathbf{T}_L, R_L) \qquad (C12)$$

is a PMF that can be computed exactly using a recursive algorithm. We can write the function P(Y(t)|Y(1:t−1), T_L, R_L) as a marginal of the mass P(Y(t), X(t), l(t)|Y(1:t−1), T_L, R_L), namely,

$$P(Y(t) \mid Y(1{:}t-1), \mathbf{T}_L, R_L) = \sum_{X(t)} \sum_{l(t)} P(Y(t) \mid X(t))\, P(X(t), l(t) \mid Y(1{:}t-1), \mathbf{T}_L, R_L), \qquad (C13)$$

where we have used the fact that P(Y(t)|X(t), l(t)) = P(Y(t)|X(t)). The conditional observation PMF P(Y(t)|X(t)) is one of the building blocks of the state-space model, so it can be readily evaluated. The predictive PMF P(X(t), l(t)|Y(1:t−1), T_L, R_L), on the other hand, can be decomposed as
$$P(X(t), l(t) \mid Y(1{:}t-1), \mathbf{T}_L, R_L) = \sum_{X(t-1)} \sum_{l(t-1)} P(X(t) \mid X(t-1), l(t-1), \mathbf{T}_L) \times P(l(t) \mid l(t-1), R_L) \times P(X(t-1), l(t-1) \mid Y(1{:}t-1), \mathbf{T}_L, R_L), \qquad (C14)$$

which depends on the filtering PMF at time t − 1:
$$P(X(t-1), l(t-1) \mid Y(1{:}t-1), \mathbf{T}_L, R_L) \propto P(Y(t-1) \mid X(t-1)) \times P(X(t-1), l(t-1) \mid Y(1{:}t-2), \mathbf{T}_L, R_L). \qquad (C15)$$
Note that Eqs. (C13)–(C15) are recursively related. If we start from the prior PMF P_0(X(0), l(0)) = P_0(X(0)) P_0(l(0)), then we can recursively compute the required sequence of PMFs for t = 1, 2, …, N as

$$P(X(t), l(t) \mid Y(1{:}t-1), \mathbf{T}_L, R_L) = \sum_{X(t-1)} \sum_{l(t-1)} P(X(t) \mid X(t-1), l(t-1), \mathbf{T}_L) \times P(l(t) \mid l(t-1), R_L) \times P(X(t-1), l(t-1) \mid Y(1{:}t-1), \mathbf{T}_L, R_L), \qquad (C16)$$

$$P(Y(t) \mid Y(1{:}t-1), \mathbf{T}_L, R_L) = \sum_{X(t)} \sum_{l(t)} P(Y(t) \mid X(t)) \times P(X(t), l(t) \mid Y(1{:}t-1), \mathbf{T}_L, R_L), \qquad (C17)$$
$$P(X(t), l(t) \mid Y(1{:}t), \mathbf{T}_L, R_L) \propto P(Y(t) \mid X(t))\, P(X(t), l(t) \mid Y(1{:}t-1), \mathbf{T}_L, R_L). \qquad (C18)$$
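The recursions (C16)–(C18) define a standard forward (filtering) algorithm. A sketch specialized to exact observations Y(t) = X(t), where only the hidden layer needs filtering (function name ours; the observed path is assumed to have nonzero probability, and, following the factorization in Eq. (C16), the intralayer move is governed by the layer occupied at time t − 1):

```python
from math import log

def log_likelihood(xs, T, R, p0_node=None, p0_layer=None):
    """Log of the parameter likelihood P(Y(1:N)|T_L, R_L) via the forward
    recursions (C16)-(C17), specialized to exact observations Y(t) = X(t).
    T: list of K x K intralayer matrices; R: L x L layer-switch matrix."""
    L, K = len(T), len(T[0])
    p0_node = p0_node or [1.0 / K] * K
    p0_layer = p0_layer or [1.0 / L] * L
    loglik = log(p0_node[xs[0]])   # node prior taken layer-independent here
    filt = p0_layer[:]             # filtering PMF over the hidden layer
    for prev, cur in zip(xs, xs[1:]):
        # joint[l] = P(X(t)=cur, l(t)=l | past), cf. Eq. (C16)
        joint = [sum(filt[lp] * T[lp][prev][cur] * R[lp][l] for lp in range(L))
                 for l in range(L)]
        step = sum(joint)          # predictive factor, cf. Eq. (C17)
        loglik += log(step)
        filt = [p / step for p in joint]   # filtering update, cf. Eq. (C18)
    return loglik
```

Evaluating this function at sampled parameters is exactly what the IWs of the NPMC scheme require; working in log space keeps the products in Eq. (C11) numerically stable.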
Substituting Eq. (C17) into Eq. (C11) for each t = 1, …, N yields the parameter likelihood.

APPENDIX D: QUANTIFYING COMPLEXITY: ADDITIONAL RESULTS

In this section, we present some additional computer simulation results that complement those shown in the main text. For the simulations, we assume (the same as in the main text) a simple ring topology for the layers of the multiplex, as well as a particular structure for the matrix R_L, in such a way that it can be parametrized by a single scalar r ∈ (0, 1). This is described in Appendix D1. In Appendix D2, we show how, given the ring topology of the layers and assuming exact observations Y(t) = X(t), it is possible to construct a relatively simple deterministic approximation to the posterior probabilities P(L|Y(1:N)). A NPMC algorithm for this specific model is also detailed.
Finally, our numerical results are displayed and discussed in Appendix D5.

1. Multiplex of ring-shaped layers

From now on, we assume a ring topology (cycle graph) for every layer. In this particular case, the particle may jump from one node to its neighbors (left or right on the ring) and from one layer to another layer. This motion is random, and we model it by way of the following probabilities,

R_ij = P(l(t) = j | l(t−1) = i),   (D1)
T^(l) = P(X(t) = X(t−1) + 1 (mod K) | l(t−1) = l),   (D2)

where R_ij represents the probability of moving from layer i to layer j and T^(l) is the probability of moving rightwards in layer l, i.e., of jumping from node i to node i + 1 (mod K). The probability of moving leftwards within layer l is, therefore,

1 − T^(l) = P(X(t) = X(t−1) − 1 (mod K) | l(t−1) = l).

These rightwards-jump probabilities for layers l = 1, …, L are put together in the vector [99]

T_L = [T^(1), …, T^(L)]^⊤.   (D3)

In order to simplify the structure of the stochastic matrix R, we assume that, at each time t, the walker may stay in the same layer (as in t − 1) with probability 1 − r (r ∈ (0, 1)) or may jump to a different layer with probability r/(L − 1). Hence, the transition matrix R_L becomes

R_L = [ 1 − r      r/(L−1)   ⋯   r/(L−1)
        r/(L−1)    1 − r     ⋯   r/(L−1)
        ⋮          ⋮         ⋱   ⋮
        r/(L−1)    ⋯   r/(L−1)   1 − r ],

and there is a single unknown parameter, the probability to jump, r.

2. Approximation of the posterior model probabilities

In the main text, we have focused on scenarios where the node visited at time t can be observed exactly; i.e., Y(t) = X(t) and

P(Y(t) = j | X(t) = i) = 1 if j = i, and 0 otherwise.

This constraint simplifies some of the general equations in Appendix C2. Specifically, the recursions given by Eqs. (C16)–(C18) for the evaluation of the PMF P(Y(t)|Y(1:t−1), T_L, r) (note that we have reduced R_L to the scalar r and the set of layer transition matrices to the vector T_L) become
P(X(t)|X(1:t−1), T_L, r) = Σ_{l(t−1)} P(X(t)|l(t−1), X(t−1), T_L) × P(l(t−1)|X(1:t−1), T_L, r),   (D4)
P(l(t)|X(1:t), T_L, r) = P(X(t)|l(t), X(1:t−1), T_L, r) P(l(t)|X(1:t−1), T_L, r) / P(X(t)|X(1:t−1), T_L, r)
= [P(X(t)|l(t), X(1:t−1), T_L, r) / P(X(t)|X(1:t−1), T_L, r)] × Σ_{l(t−1)} P(l(t)|l(t−1), r) P(l(t−1)|X(1:t−1), T_L, r).   (D5)
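Under exact observations, the layer-posterior recursion of Eqs. (D4) and (D5) and the likelihood factorization of Eqs. (D8) and (D9) can be sketched in a few lines. This is our own illustration (names are ours, a uniform prior over the initial layer is assumed, and the node move is conditioned on the previous layer, as in Eq. (D2)):

```python
import numpy as np

def ring_log_likelihood(X, K, T_l, r):
    """Evaluate log P(X(1:N) | T_L, r) for the ring multiplex with exact
    observations, via the recursions of Eqs. (D4)-(D5) and the
    factorization of Eqs. (D8)-(D9).  Sketch only.

    X   : observed node sequence X(0:N) on a ring of K nodes
    T_l : rightward-jump probabilities, one per layer
    r   : layer-switch probability
    """
    T_l = np.asarray(T_l, dtype=float)
    L = len(T_l)
    # layer-switching matrix R_L: 1-r on the diagonal, r/(L-1) elsewhere
    R = np.full((L, L), r / (L - 1))
    np.fill_diagonal(R, 1.0 - r)
    p = np.full(L, 1.0 / L)             # layer posterior, uniform prior on l(0)
    loglik = 0.0
    for x_prev, x in zip(X[:-1], X[1:]):
        right = (x - x_prev) % K == 1   # rightward or leftward step?
        lik_step = T_l if right else 1.0 - T_l
        px = lik_step @ p               # Eq. (D4): P(X(t) | X(1:t-1))
        loglik += np.log(px)            # accumulate the log-likelihood, Eq. (D9)
        p = (lik_step * p) / px         # Bayes update of the layer posterior
        p = R.T @ p                     # propagate one layer-switch step
    return loglik
```

When all layers share the same bias T^(l) = p, the layer posterior stays uniform and the log-likelihood collapses to that of a single biased ring walk, which is the monoplex degeneracy that the prior p_0(T_L) of Appendix D4 is designed to penalize.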
Equations (D4) and (D5) can be applied recursively, with initial conditions given by the prior PMFs P_0(X(0)) and P_0(l(0)). In particular, note that

P(X(1)|T_L, r) = Σ_{X(0)} Σ_{l(0)} P(X(1)|X(0), l(0), T_L) × P(l(0)) P(X(0)),   (D6)

while

P(l(1)|T_L, r) = Σ_{l(0)} P(l(1)|l(0), r) P(l(0)).   (D7)

These priors are assumed to be known as part of the model specification (typically, they are uniform PMFs, as selected in our computer experiments). Given Eq. (D4), we can evaluate the likelihood of the parameters T_L and r. In particular,

P(X(1:N)|T_L, r) = P(X(1)|T_L, r) ∏_{t=2}^{N} P(X(t)|X(1:t−1), T_L, r),   (D8)

with P(X(1)|T_L, r) computed as in Eq. (D6). Numerically, it is convenient to work with the log-likelihood
log P(X(1:N)|T_L, r) = log P(X(1)|T_L, r) + Σ_{t=2}^{N} log P(X(t)|X(1:t−1), T_L, r)   (D9)

and transform it into natural units only when strictly needed.

The likelihood of the RV L, given the observation record X(1:N), is, therefore, given by the integral

P(X(1:N)|L) = ∫ P(X(1:N)|T_L, r) p_0(T_L) p_0(r) dT_L dr,   (D10)

where we have assumed prior densities w.r.t. the Lebesgue measure. Indeed, in the absence of any prior knowledge, it is natural to choose p_0(r) to be uniform over the interval (0,1), i.e., p_0(r) = 1. The prior p_0(T_L) is important because it is used to penalize system configurations with two or more identical layers. Note that a multiplex with L = 2 and identical transition matrices T^(1) = T^(2) = T̃ is equivalent to a monoplex with transition matrix T̃. Similarly, any multiplex with L layers, out of which 0 ≤ j ≤ L have identical transition matrices, is fully equivalent to a reduced system with L − j + 1 layers [100]. For our computer simulations, we have chosen

p_0(T_L) ∝ min_{l≠l′} |T^(l) − T^(l′)|,   (D11)

i.e., we penalize systems with at least two layers that have similar transition probabilities.

3. Deterministic integration

One conceptually simple way to approximate the integral in Eq. (D10) is

P(X(1:N)|L) ≈ Σ_{T^(1)∈G} ⋯ Σ_{T^(L)∈G} Σ_{r∈G} P(X(1:N)|T_L, r) p_0(T_L),   (D12)

where G = {g_1, …, g_H} is a grid of H points over the interval (0,1), i.e., 0 < g_1 < ⋯ < g_H < 1.

4. NPMC algorithm for Monte Carlo integration

Let us specify a practical NPMC algorithm for the multiplex composed of ring-shaped layers. The prior densities are p_0(r) = U(0,1) and p_0(T_L) ∝ min_{l≠l′} |T^(l) − T^(l′)|. We approximate the normalization constant Ĉ of the latter PDF by standard Monte Carlo integration [simulating the T^(l)'s uniformly over (0,1)]. The sampling kernels K(·, ·|T_L, r) are truncated Gaussian PDFs. Specifically, if TN(x|μ, σ², a, b) denotes the truncated Gaussian density

TN(x|μ, σ², a, b) = exp{−(1/2σ²)(x − μ)²} / ∫_a^b exp{−(1/2σ²)(u − μ)²} du,   for x ∈ (a, b),

then

K(T_L, r | T̆_L, r̆) = TN(r|r̆, σ², 0, 1) ∏_{l=1}^{L} TN(T^(l)|T̆^(l), σ², 0, 1),

where σ² = 0.005. The number of clipped weights is M_c = ⌊√M⌋. Recall that the function ϕ(i, {w̃^m}_{m=1}^{M}) truncates the M_c biggest weights to be equal (and identical to the M_c-th largest weight).

(1) Initialization.
  (a) Draw M independent samples (T̃_{L,0}^i, r̃_0^i), i = 1, …, M, from the prior PDFs p_0(T_L) and p_0(r) = 1 for r ∈ (0, 1).
  (b) Compute non-normalized IWs, w̃_0^i = P(X(1:N)|T̃_{L,0}^i, r̃_0^i), i = 1, …, M.
  (c) Compute non-normalized TIWs, w̄_0^i = ϕ(i, {w̃_0^m}_{m=1}^{M}), i = 1, …, M.
  (d) Normalize the TIWs,

w_0^i = w̄_0^i / Σ_{m=1}^{M} w̄_0^m,   i = 1, …, M.

  (e) Resample M times the set {T̃_{L,0}^i, r̃_0^i}_{i=1}^{M}, with replacement and using the normalized TIWs as probability masses, to yield an unweighted sample set {T_{L,0}^i, r_0^i}_{i=1}^{M}.
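The initialization step (1) above can be sketched as follows. This is our own illustration (names are ours): the prior p_0(T_L) is handled by importance reweighting of uniform draws rather than by direct sampling, and the log-likelihood routine is passed in as an argument:

```python
import numpy as np

def npmc_init(X, K, L, M, loglik_fn, rng=None):
    """Initialization of the NPMC scheme: draw M samples, compute clipped
    (nonlinearly transformed) importance weights, and resample.
    Sketch only; p_0(T_L) ~ min_{l!=l'} |T^(l)-T^(l')| enters as a weight
    on uniform draws instead of being sampled directly.

    loglik_fn(X, K, T_l, r) must return log P(X(1:N) | T_L, r).
    """
    rng = np.random.default_rng(rng)
    Mc = int(np.floor(np.sqrt(M)))          # number of clipped weights
    r = rng.random(M)                       # r^i from p_0(r) = U(0,1)
    T = rng.random((M, L))                  # T_L^i uniform over (0,1)^L
    # unnormalized prior density, penalizing near-identical layers
    def p0T(t):
        return min(abs(t[a] - t[b]) for a in range(L) for b in range(a + 1, L))
    # non-normalized log-IWs: likelihood times prior weight
    logw = np.array([loglik_fn(X, K, T[i], r[i]) + np.log(p0T(T[i]))
                     for i in range(M)])
    w = np.exp(logw - logw.max())           # stabilized IWs
    # TIWs: clip the Mc largest weights to the Mc-th largest value
    kth = np.sort(w)[-Mc]
    wbar = np.minimum(w, kth)
    wbar /= wbar.sum()                      # normalized TIWs
    idx = rng.choice(M, size=M, p=wbar)     # resample with replacement
    return T[idx], r[idx]
```

The clipping of the M_c largest weights is what makes the scheme "nonlinear" population Monte Carlo: it limits the weight degeneracy that plain importance sampling suffers when M is moderate and the likelihood is peaked.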