<<

Number of hidden states needed to physically implement a given conditional distribution

Jeremy A. Owen,1 Artemy Kolchinsky,2 and David H. Wolpert2, 3, 4, ∗ 1Physics of Living Systems Group, Department of Physics, Massachusetts Institute of Technology, 400 Tech Square, Cambridge, MA 02139. 2Santa Fe Institute 3Arizona State University 4http://davidwolpert.weebly.com† (Dated: October 15, 2019) We consider the problem of how to construct a physical process over a finite state space X that applies some desired conditional distribution P to initial states to produce final states. This problem arises often in the thermodynamics of computation and nonequilibrium statistical physics more generally (e.g., when designing processes to implement some desired computation, feedback controller, or Maxwell demon). It was previously known that some conditional distributions cannot be implemented using any master equation that involves just the states in X. However, here we show that any conditional distribution P can in fact be implemented—if additional “hidden” states not in X are available. Moreover, we show that is always possible to implement P in a thermodynamically reversible manner. We then investigate a novel cost of the physical resources needed to implement a given distribution P : the minimal number of hidden states needed to do so. We calculate this cost exactly for the special case where P represents a single-valued function, and provide an upper bound for the general case, in terms of the nonnegative rank of P . These results show that having access to one extra binary degree of freedom, thus doubling the total number of states, is sufficient to implement any P with a master equation in a thermodynamically reversible way, if there are no constraints on the allowed form of the master equation. (Such constraints can greatly increase the minimal needed number of hidden states.) Our results also imply that for certain P that can be implemented without hidden states, having hidden states permits an implementation that generates less heat.

I. INTRODUCTION conditional distribution of the system state at the final time given its state at the initial time. The entry Pij is Master equation dynamics over a discrete state space the probability that the system is in state i at the final play a fundamental role in nonequilibrium statistical time given that it was in state j at the initial time. physics and stochastic thermodynamics [1–3], and are The problem of characterizing the set of all stochas- used to model a wide variety of physical systems. Such tic matrices P that can arise this way from a (time- dynamics can arise after coarse-graining the continu- inhomogeneous) continuous-time master equation is ous phase space of a Hamiltonian system coupled to a known as the embedding problem for Markov chains [8– heat bath [4–6], or as semiclassical approximations of 13], which can be viewed as the classical and time- the evolution of an open quantum system with discrete inhomogeneous analogue of the Markovianity problem states [1, 7]. studied in quantum information [14, 15]. The linearity of the master equation implies a linear Despite the ubiquity of master equation dynamics in relationship between the distribution over the system’s models of physical systems, it turns out that many P are states at the initial time t = 0 and some final time (here non-embeddable, i.e., they cannot be implemented with taken to be t = 1 without loss of generality). This is any master equation. As we discuss below, examples true even if the master equation is time-inhomogeneous of non-embeddable matrices include common operations (i.e., non-autonomous). We can express this relation as such as bit erasure and the bit flip (a logical NOT),

p(1) = P p(0) 1 1 0 1 P = ,P = . (1) erase 0 0 flip 1 0 where p(0) and p(1) are column vectors whose entries sum arXiv:1709.00765v4 [cond-mat.stat-mech] 14 Oct 2019 to one, representing probability distributions at times t = While neither of these operations are embeddable, 0 and t = 1 respectively. If our system has n states, the there is a crucial difference between them. Bit erasure is P is an n × n , representing the infinitesimally close to being embeddable, meaning that it can be implemented by a master equation if one is will- ing to tolerate some finite, but arbitrarily small, probabil- ∗ Massachusetts Institute of Technology ity of error. We call such matrices limit-embeddable. † This is a post-peer-review version of an article published in New On the other hand, as we show below, many stochas- Journal of Physics. The final, published version is available at tic matrices P , such as the bit flip, are not even limit- https://doi.org/10.1088/1367-2630/aaf81d embeddable. In light of this, suppose we encounter a 2

physical system with n “visible states” that we know while generating less heat, as we illustrate in detail in an obeys a continuous-time master equation, but is ob- example below. served to evolve according to a P that is not even limit- Note that our results assume complete freedom to use embeddable between t = 0 and t = 1. In such a situation any master equation to implement a given P . However, in we can conclude that the master equation dynamics must the real world there will often be major constraints on the actually take place over some larger state space, including form of the master equation that can be considered, e.g., some unseen hidden states in addition to the n visible due to known properties of an observed system we wish states. to model using a master equation, or due to limitations Alternatively, we can imagine that instead of observ- on what kind of system we can build. An example of the ing some stochastic dynamics, we attempt to build a sys- former is if we know that the system’s Hamiltonian can tem that carries out a specified P using master equation only couple degrees of freedom in certain restricted ways. dynamics. This is the challenge we might face, for ex- An example of the latter is if P must be implemented ample, if P is the update function of the logical state using a digital circuit made out of separate gates. In of a computer we wish to build. In this scenario, the cases where there are such constraints on the form of the minimal number of hidden states needed to implement master equation, our results can be considered as upper P with master equation dynamics emerges as a funda- bounds on lower bounds of the number of hidden states mental “state space cost” of the computation P . (See that are really required. (See Section VI C below.) also [16].) Previous research has shown how to carry out arbitrary In fact, by adding hidden states one can construct mas- P thermodynamically reversibly [18–21]. Moreover, the ter equations that meet more desiderata than just imple- constructions in those papers can all be formulated in menting a given P . Specifically, below we show that for terms of master equation dynamics.2 In addition, they all any given P and initial distribution p(0), one can con- exploit what in our terminology are called hidden states. struct a master equation that implements P on p(0) while However, none of those papers considered the issue of the being arbitrarily close to thermodynamically reversible.1 minimal number of hidden states needed to implement P , In this paper, we establish upper bounds on the num- i.e., the state space cost of implementing P , which is the ber of hidden states required to implement any stochas- focus of this paper. tic matrix P using a master equation, and also establish In a companion paper [16] we focus on the special case bounds when we require that P be implemented in a where the stochastic matrix P is “single-valued”, i.e., it thermodynamically reversible way. We do this using ex- represents a deterministic function. It turns out that for plicit constructions that show any stochastic matrix P that case at least, there is a second kind of cost aris- can be implemented by composing some appropriate set ing in master equations dynamics that implement P , in of fundamental transformations that we call local relax- addition to the state space cost which is the focus of ations, defined in detail below. this paper. Roughly speaking, that second cost is the Our main result is that any n × n stochastic matrix minimal number of times that the set of allowed state- P can be implemented with no more than r − 1 hidden to-state transitions changes (which in a physics context states, where r is the nonnegative rank of P . Because may correspond to raising or lowering infinite energy bar- r ≤ n, this result implies a simple corollary that one riers between states). This can be viewed as a “timestep additional binary degree of freedom (which doubles the cost” of implementing P . Interestingly, there is a tradeoff number of available states) is sufficient to carry out any between the timestep cost of implementing any (single- P in a thermodynamically reversible way. We also derive valued) P and the state space cost of implementing that exact results for some particular kinds of P , including P . those representing single-valued functions, which show This paper focuses on the general problem of bounding that the number of required hidden states can be much the state space cost of arbitrary (not necessarily single- smaller than r − 1. valued) stochastic matrices and the relationship of this cost to thermodynamic reversibility. In the next section Finally, we show that if P can be implemented with we provide relevant background. Then in Section III, some number of hidden states while incurring some we define what it means for a master equation to im- nonzero entropy production, then it can be implemented plement a stochastic matrix P to arbitrary precision. In without any entropy production at the cost of at most Section IV, we define “local relaxations”, which are the one additional hidden state. Such results imply, for in- building blocks of all our constructions. We present our stance, that for a system coupled to a heat bath, adding main results in Section V. We also investigate how to ex- hidden states can allow one to implement the same P tend our framework beyond finite state spaces to count- ably infinite state spaces, for the special case where P is

1p(0) must be specified in addition to P since in general, the same master equation dynamics run on different initial distributions p(0) 2Specifically, the processes considered in those papers are all (in our will implement the same P , but with different amounts of irre- terminology) “limit-embeddable”, and so could be expressed using versible entropy production. See [17]. master equations. 3 a single-valued map over X. Since this topic is a bit dif- ferent from the main focus of the paper, it can be found in Appendix F. The other appendices contain proofs that are not in the main text. Time II. BACKGROUND

A. Master equations 0 1 Consider a physical system with a finite state space Probability that bit is 1 X of size n. We write the of the state at time t as p(t), where pi(t) is the probabil- FIG. 1. The difference of two solutions to Eq. (4) always ity that the system is in state i at time t. We suppose decreases over time but it can never change sign. that p(t) evolves as a time-inhomogeneous continuous- time Markov chain (CTMC),

dp(t) Pflip. To see why, note that the space of distributions = M(t)p(t) . (2) dt over two states is one-dimensional. The bit flip over a pair of states requires two distinct solutions to a differ- The elements in each column of the rate matrix M(t) ential equation, with different starting and ending con- must sum to zero and its off-diagonal entries are positive. ditions, to cross at some time, which is impossible (see The entry Mij(t) is the transition rate from state j to Fig. 1). (Note that this same argument does not apply state i at time t. Eq. (2) is commonly referred to as “the to bit erasure.) master equation”. For any CTMC, we can relate the distributions at the Some broadly applicable necessary conditions for a initial time t and some later time t0 by a linear map stochastic matrix to be embeddable are known. For ex- ample, by Liouville’s formula, we have, 0 0 p(t ) = TM (t, t )p(t) , (3)

0 Z 1  where TM (t, t ) is known as a transition matrix, and det TM (0, 1) = exp tr M(t) dt . (5) equals the time-ordered exponential of M(t). Where 0 M(t) is clear from context, for notational convenience 0 0 we will sometimes write TM (t, t ) simply as T (t, t ). Note 0 that if M(t) = M is constant, then T (t, t0) = e(t −t)M . Since the exponential function is strictly positive, any Finally, note that we can rescale time arbitrarily by mul- embeddable matrix must have strictly positive determi- tiplying M(t) by an appropriate constant. Therefore, nant [12, 22]. An immediate corollary is that a map like without loss of generality we will take t = 0 and t0 = 1 Pflip (which has determinant −1) is not embeddable. In- from now on. deed, by the continuity of the determinant, Pflip is not even infinitesimally close to an embeddable matrix. (In terms of our formal framework introduced below, Pflip is B. Embeddability not “limit-embeddable”.) Q It is also known that P must obey det P ≤ i Pii in Only some stochastic matrices P are embeddable, order to be embeddable [12, 22]. Note that any (non- meaning that they can be written as P = TM (0, 1) for identity) has determinant that is ei- some rate matrix M(t). As an illustration of a non- ther 1 or −1, while the product of diagonal entries is embeddable matrix, consider a bit flip in a two state sys- 0. Thus, no permutation matrix is embeddable, or even 0 1 tem, represented by Pflip = ( 1 0 ). Now the most general infinitesimally close to an embeddable matrix. master equation for a two state system is These two necessary conditions on the determinant

p˙1(t) = −r(t)p1(t) + s(t)(1 − p1(t)) , (4) of a matrix for it to be embeddable are not sufficient. Even if we restrict attention to the time-homogeneous where p1(t) indicates the probability that the bit is case, where transition rates between states are constant, in state 1 at time t, and r(t) and s(t) indicate time- the problem of whether an arbitrary matrix is embed- dependent rates of the 1 → 0 and 0 → 1 transitions, dable is unsolved for matrices larger than 3 × 3. (On respectively. It can be shown that there is no choice of the other hand, if one imposes the further constraint non-negative functions r(t), s(t) such that the solution of that the dynamics must obey detailed balance, the time- Eq. (4) simultaneously satisfies p1(1) = 0 when p1(0) = 1 homogeneous embedding problem has been solved for all and p1(1) = 1 when p1(0) = 0, as required to implement finite-size state spaces [13].) 4

C. Entropy production where p(0) and p(1) are the distributions over system states at the beginning and end of the physical process The field of stochastic thermodynamics [3] has devel- under consideration. oped a consistent thermodynamic interpretation of mas- In the previous section, we discussed the embedding ter equation dynamics, including a way to quantify en- problem, which considers what P can be implemented tropy production (EP), the overall increase of entropy in using a continuous-time master equation. A related, the system and coupled reservoirs. For a physical sys- thermodynamically motivated, question concerns which tem evolving according to a master equation like Eq. (2), P can be implemented using a master equation that we define the instantaneous rate of entropy production at achieves 0 total EP, in the case where there is a single, time t to be [23] isothermal heat bath. Generally, EP vanishes in such a situation if and only if p(t) is an equilibrium distribu- X Mij(t)pj(t) tion of M(t) for all t ∈ [0, 1]. Formally, this means that Σ(˙ p(t),M(t)) = M (t)p (t) ln (6) ij j Mij(t)pj(t) = Mji(t)pi(t) for all i, j and t ∈ [0, 1]. In Mji(t)pi(t) i,j general, this condition requires that the initial distribu- tion p(0) is an equilibrium distribution and that we go The total EP over the time interval [0, 1], given some to the “quasistatic limit” where the parameters change initial distribution p(0), is infinitesimally slowly in relation to the relaxation time of the system [27–29]. This limit can be achieved either Z 1 Σ(p(0),M(t), [0, 1]) = Σ(˙ p(t),M(t)) dt . (7) by implementing P increasingly slowly (in “wall clock” 0 time), and/or by making the transition rates (which con- trol the relaxation time) increasingly large. The rate of EP is nonnegative, and therefore so is to- While a given CTMC M(t) may achieve zero (more tal EP. A process that achieves 0 total EP is said to be accurately, arbitrarily small) total EP for some initial thermodynamically reversible. distribution p(0), in general it cannot generate zero EP If the system is coupled to a single reservoir, a heat for more than one such initial distribution. (See [17] for bath at temperature T , then the total EP can be written a detailed investigation of the dependence of total EP on as the initial distribution.) So our thermodynamically mo- 1 tivated question is whether a given P has the property Σ(p(0),M(t), [0, 1]) = [S(p(1)) − S(p(0))]+ hQi . (8) that for all initial distributions p(0), there is some asso- kT ciated set of M(t) that implement P with zero total EP, where S(·) is Shannon entropy, k is Boltzmann’s con- for that particular initial distribution p(0). stant, hQi indicates the average heat transferred to the thermal reservoir over all trajectories of states from t = 0 to t = 1 [23]. III. DEFINITIONS To use a well-known example from the thermodynam- ics of computation [24, 25], suppose we wish to perform A. Limits of CTMCs a bit erasure (Perase), in which both initial states of a bit, {0, 1}, get mapped to a single final value, 0. Since this As discussed in the Section I, many common operations is a many-to-one map, it reduces the entropy of the de- such as bit erasure are not embeddable (e.g. no CTMC vice (e.g., a computer) in which it is performed. By the has T (0, 1) = P ) but are nonetheless infinitesimally second law of thermodynamics, at least as much entropy erase close to a stochastic matrix that is embeddable. This must be produced in the environment. Often this occurs motivates the following definition: by the transfer of some amount of heat hQi from the sys- tem into a heat bath at temperature T . In this case, Definition 1. A stochastic matrix P is limit- assuming the initial probability over the states of the bit embeddable (LE) if for all  > 0, there is a rate matrix is uniform, the second law yields hQi ≥ kT ln 2, which is M(t) such that |T (0, 1) − P | < . the well-known bound on the amount of heat produced M in bit erasure that Landauer derived using semi-formal Our definition of limit embeddable makes no restric- 3 reasoning. More generally, an immediate consequence tions on the thermodynamic properties of the implemen- of Eq. (8) and the non-negativity of EP is tation of a given P . Our next task is to formalize an additional restriction that reduces the set of all LE ma- hQi ≥ kT [S(p(0)) − S(p(1))] , (9) trices to a subset that can be implemented in a ther- modynamically reversible manner. Rather than focus on whether some P can be implemented that way for one specific initial distribution, we focus on whether it can be 3It is now understood that a logically irreversible process like bit implemented that way for any initial distribution. This erasure can be done in a thermodynamically reversible manner [26], motivates the following definition: despite how many have interpreted Landauer’s reasoning. 5

Definition 2. A stochastic matrix P is quasistatically (state 1). The dot is brought into contact with a metallic embeddable (QE) if for all initial distributions p, all lead at temperature T which (for an appropriate state of  > 0, and all δ > 0, there is a rate matrix M(t) with the dot) may transfer an electron into the dot or out of |TM (0, 1) − P | <  and Σ(p, M(t), [0, 1]) < δ. it, thereby changing the state of the bit. At time t, the propensity of the lead to give or receive We emphasize that the rate matrix M(t) that implements an electron is set by its chemical potential, indicated by a given P must generally be specialized to the initial dis- µ(t), and the energy of an electron in the dot, indicated by tribution p in order to make Σ(p, M(t), [0, 1]) arbitrarily E(t). Specifically, let p(t) indicate the two-dimensional small. vector of probabilities, with p0(t) and p1(t) being the prob- There are a few properties of QE matrices that are ability of an empty and full dot, respectively. These prob- central to our analysis. The first one is the following abilities evolve according to the rate matrix [31] lemma: −w(t) 1 − w(t)  Lemma 1. The set of QE matrices is closed under mul- M(t) = C , (10) tiplication. w(t) −(1 − w(t)) It is impossible to implement any matrix without entropy where C sets the timescale of the exchange of electrons production except in a limit. This is illustrated by the between the dot and the lead and w(t) is the Fermi dis- next proposition, which is proved using a result from [30]: tribution of the lead, −1 w(t) = [exp((E(t) − µ(t))/kBT ) + 1] . Proposition 2. If M(t) is the rate matrix of a CTMC with T (0, 1) = P , I, then for any initial distribution p, Using Eq. (2), Eq. (10) and conservation of probability (i.e., p0(t) + p1(t) = 1), we can write |P p − p|2 Σ(p, M(t), [0, 1]) ≥ − 1 , p˙ (t) = C(w(t) − p (t)) (11) 2 ln(det P ) 1 1

4 Suppose that µ(t) and E(t) are chosen in a way that where |·|1 is `1 norm. depends on p1(0), so that w(t) = (1 − t)p1(0) + tδ for The inequality in Proposition 2 (which may not be some constant δ. In this case, Eq. (11) can be explicitly tight) shows that only those P with determinants very solved for p1: close to 0 can achieve small EP for arbitrary ini- −1 −Ct tial distributions. Recall that by Eq. (5), det P = p1(t) = w(t) + C (p1(0) − δ) 1 − e . (12) R 1  exp 0 tr M(t) dt . This means that a small determi- In the limit where C → ∞, p1(1) → w(1) = δ, so the nant can be achieved if the transition rates (the off- transition matrix T (0, 1) becomes diagonal entries of M(t)) are very large (so that tr M(t) 1 − δ 1 − δ is very negative). Note that since we have scaled time out T (0, 1) = . to fix the final time t0 = 1, we are effectively measuring δ δ rates in units of 1/t0. Therefore, the observation that the Furthermore, by Eq. (12), in that limit p (t) → w(t), entries of M(t) must be very large to achieve small EP is 1 and so p (t) → 1 − w(t). That means that for all i, j, consistent with the well-known fact that quasistatic pro- 0 M (t)p (t) → M (t)p (t). In this limit, the total entropy cesses are infinitely slow. (See also our discussion of the ij j ji i production over the course of the process, quasistatic limit in the previous section.) An important corollary of Proposition 2 is that any Z 1   X Mij(t)pj(t) QE matrix must be singular. Σ = dt Mij(t)pj(t) ln , Mji(t)pi(t) 0 i,j∈{0,1} Corollary 3. Any QE matrix, except the identity, has determinant zero. approaches zero (we show this rigorously, in the context of proving a more general result, in Appendix D). These points are illustrated in the following example Since δ is arbitrary, this means that we can make of a canonical QE process, bit erasure, implemented with T (0, 1) arbitrarily close to bit erasure (δ = 0), while hav- a quantum dot: ing arbitrarily small total EP. This establishes that bit Example 1. In the model of bit erasure described in [31], erasure is QE. a classical bit is implemented as a quantum dot, which Note that if we cannot control C directly, we can still can be either empty (state 0) or filled with an electron achieve the same effect as the limit C → ∞ by running the process with some fixed C for longer and longer times (that is, by changing the endpoint from t = 1 to some t > 1). This demonstrates the equivalence of going to the 4 For two probability distributions p, q over n states, the `1 norm is limit of infinitely-long time versus infinitely-fast rates for defined |p − q| := Pn |p − q |. 1 i=1 i i achieving vanishing EP. 6

B. Embedding with hidden states that way for any initial distribution. With this defini- tion, the “state space cost” given by the number of hid- Many stochastic matrices P are not QE. Our main re- den states would then be a function of P and the initial sult is that despite this, any P is a (principal) submatrix distribution p, not just of P . Our definition can be seen of a larger matrix P˜ which is QE. We formalize this prop- as a worst-case version of this weaker definition; we mea- erty as follows: sure the “state space cost” of a stochastic matrix P as the maximal number of hidden states we might need if we Definition 3. An n × n stochastic matrix P is qua- were given some specific initial distribution p and wanted sistatically embeddable with m hidden states if to construct a master equation that implements P with there exists some (n + m) × (n + m) matrix P˜ that is no EP for that p. ˜ QE, and for all i, j ∈ 1 . . . n, Pij = Pij (i.e., P is a In addition, we note that any matrix which is QE ac- principal submatrix of P˜). cording to our stronger definition would also satisfy this To understand the motivation for this definition, imag- weaker definition of QE. Thus, the upper bounds on the ine that there is a matrix P˜ over state space Y that is minimal number of hidden states required to quasistati- QE, and that P is a principal submatrix of P˜, corre- cally embed a given P (according to Definition 2) which sponding to the subset of states X ⊆ Y . If at t = 0 the we derive below are also upper bounds for the number process that implements P˜ is in some state i ∈ X, then of hidden states required under the alternative weaker the distribution over the states in X at the end of the definition of QE. Note also that Corollary 3 holds un- process at t = 1 will be exactly as specified by P . Fur- der the weaker definition, as long as the desired initial P ˜ distribution p is not the fixed point of P . thermore, because j∈X Pji = 1, Pji = 0 for any i ∈ X and j < X. This means that if the process is started on some i ∈ X, no probability can “leak” out into the hidden states by the end of the process, although it may IV. LOCAL RELAXATIONS pass through them at intermediate times. Mathematically, the hidden states in Definition 3 are In Example 1 we showed that bit erasure is QE. In fact, additional to the states whose evolution is controlled by bit erasure is part of a much larger family of stochastic P . However, there are multiple ways to map this math- matrices which can be shown to be QE, which we call “lo- ematical structure into a particular physical system. In cal relaxations”. These will serve as the “building blocks” particular, hidden states can arise when Y is a set of of the constructions we use below: microstates of a system and X is an associated set of macrostates; we elaborate this point in Section V. Definition 4. A stochastic matrix P is a local relax- Note that any P that is QE with m hidden states is ation if there is some permutation matrix Q (which may also QE with p hidden states for all p ≥ m. Note as well be the identity) such that: that by Lemma 1, if an n × n stochastic matrix P can be factored into a product of n×n matrices, each of which is 1. QP Q−1 has a block diagonal structure, QE with m hidden states, then P itself is QE with m hid- den states. More generally, the number of hidden states 2. Each block of QP Q−1 has rank one. required to quasistatically embed a product of stochastic matrices is no more than the maximum number required Note that the role of the permutation matrix Q is to just for each of the matrices in that product. In Section V, we rearrange rows and columns, i.e., to relabel states. exploit this fact to derive upper bounds on the minimal Loosely speaking, when a local relaxation (LR) ma- number of hidden states needed to quasistatically embed trix is used to evolve a system, states of the system are a given matrix. grouped into different “blocks” between which no prob- The definition of QE may seem very strict, but from ability flows, while the states within each block relax to the perspective of the number of hidden states needed the same final, block-specific distribution. for embedding, it is only slightly more costly than limit embedding: Example 2. The following 4 × 4 stochastic matrices are all local relaxations (for each matrix, we assume that Proposition 4. If a matrix P is limit-embeddable, it is a, b, c, d are chosen to be nonnegative and that the sum QE with at most one hidden state. of each column equals 1): This means the minimal number of hidden states re- quired for limit-embedding vs. quasistatic-embedding are a a a a a a 0 0 a 0 a 0 at most one apart. b b b b b b 0 0 0 c 0 c A =   ; B =   ; C =   Note that it might be fruitful to consider a weaker def- c c c c 0 0 c c b 0 b 0 inition of QE than Definition 2, in which a matrix would d d d d 0 0 d d 0 d 0 d be considered “QE” if it is limit-embeddable while achiev- ing vanishingly small EP for some specific initial distribu- The block structure in matrices A and B is immedi- tion p, rather than requiring that it is limit-embeddable ately apparent. Matrix C is a local relaxation since 7

B = QCQ−1, where Q is the permutation matrix To begin, note that if a single-valued map P corre- sponds to an invertible function, then P is a permutation 1 0 0 0 and is not limit-embeddable, as discussed in Section I. If, 0 0 1 0 Q =   . on the other hand, a single-valued map P corresponds 0 1 0 0 to a non-invertible function, then it has determinant 0. 0 0 0 1 Moreover, we can use LR matrices to establish that any such P is limit-embeddable, and in fact QE: (In other words, C can be arranged to have the block structure of B by switching rows/columns 2 and 3.) Proposition 6. Any single-valued map with determinant zero is QE. In Appendix D, we prove the following result, which es- Proof. Consider a single-valued map P over some finite tablishes that we can use LR matrices as building blocks set X with determinant zero, which represents some to construct QE matrices: non-invertible function f : X → X. It is known that Proposition 5. Any local relaxation is QE. any non-invertible function f : X → X is a compo- sition of finitely many idempotent X → X functions,5 By performing one local relaxation followed by an- f = fN ◦ . . . f2 ◦ f1 [32]. In our context, this means that other one involving a different partition of X into blocks, P can be represented as a product of N single-valued Q (k) (k) we can quasistatically implement matrices that are not maps P = k M , each M representing the idempo- themselves local relaxations. This means the converse of tent function fk. Note that the image of an idempotent Proposition 5 is false—not every QE matrix is a local function consists entirely of fixed points, thus each M (k) relaxation. can be written as However, it is the case that any 2 × 2 QE matrix P ( 1 if j ∈ f −1(i) and i = f (i) is a local relaxation. If P is the identity, it is a local M (k) = k k , relaxation. If not, then by Corollary 3, its determinant ij 0 otherwise is zero, which implies it has rank one and so is a local relaxation (with a single block). It can be verified that any such M (k) is a local relax- Since they are all QE, products of LR matrices have ation, since it consists of blocks corresponding to the set determinant zero (again, except for the identity). On of pre-images {f −1(i): i ∈ f(X)}, and each block f −1(i) the other hand, not all singular matrices are products of contains identical columns: all 0s, except for a 1 in the LRs. To see this, note that a (non-identity) product of row corresponding to i. LRs must always send at least two initial states to the The result then follows by applying first Proposition 5 same final distribution (since there must at least one LR and then Lemma 1.  in the product with a block of size larger than one). So such a product must have at least two identical columns. While a single-valued map with nonzero determinant Therefore, a matrix like is not QE (by Corollary 3), it can always be made QE by adding a single hidden state. Before establishing this for 1 0 1/2 the general case, we illustrate the basic proof technique with an example showing that the bit flip is QE with one 0 1 1/2 0 0 0 hidden state: 0 1 Example 3. The bit flip Pflip = ( 1 0 ) is a map over is not a product of LRs, even though it is singular. a binary space X. It is not QE because it has negative To establish the results in the next section, we will determinant. However, it is QE with one hidden state. repeatedly make use of Proposition 5 and Lemma 1 to This follows by constructing a special set of three LR prove that a given matrix is QE by writing it as a finite matrices, each defined over a space Y that has three ele- product of local relaxations. ments, such that the restriction to the first two elements One might conjecture that a matrix is QE only if it is of the product of those matrices is the bit flip: a product of LRs. However, we do not establish this, and         our results below do not rely on it. 0 1 0 1 0 0 1 1 0 0 0 0 1 0 1 = 0 1 1 0 0 0 0 1 0 . (13) 0 0 0 0 0 0 0 0 1 1 0 1 V. UPPER BOUNDS ON MINIMAL NUMBER OF HIDDEN STATES The restriction of the matrix on the LHS to its first two elements (i.e., the upper left block) is the bit flip oper- ation, by inspection. In addition, the first two matrices A. Single-valued maps

We refer to a stochastic matrix that represents a de- terministic function (i.e., all its entries are either 0 or 1) 5A function is idempotent if applying it twice is the same as applying as a single-valued map. it once, f ◦ f = f. 8 on the RHS are LR, by inspection. The third matrix on can be interpreted as representing transfers of probability the RHS is LR as well, but to see that we need to relabel between disjoint sets of states: the first matrix transfers the elements of Y in such a way that that matrix becomes probability from the n original states to r hidden ones, block-diagonal. Specifically, if we permute the second and and the second map transfers probability back to the the third elements of Y , we transform the third matrix as: n original states. These transfers of probability can be     shown to be QE, establishing that no more than r hidden 0 0 0 0 0 0 states are needed to implement P in a QE manner. 0 1 0 → 1 1 0 As we show in Appendix E, it is possible to reduce 1 0 1 0 0 1 this number of hidden states needed by one, which yields Theorem 8. which confirms that the rightmost of the three matrices is Since the nonnegative rank of an n × n matrix is n or LR. less, Theorem 8 implies that any n × n stochastic matrix So the RHS is a product of three LR matrices, and is QE with at most n − 1 hidden states. This is fewer therefore the LHS must be QE. This confirms that the bit than the number of hidden states provided by adding flip is QE with one hidden state. a single independent, extra binary degree of freedom to As an aside, note that the matrix on the LHS is not the system, since adding such a bit doubles the size of itself LR, even though it is a product of three LR matri- the state space. Thus, simply by adding a hidden bit ces. (This follows by verifying that there is no relabeling to a system, we can implement any stochastic matrix in of the elements of Y that changes the matrix on the LHS a thermodynamically reversible manner (presuming we of Eq. (13) into block-diagonal form.) have freedom to use arbitrary sequences of LR matrices We can easily generalize this technique to establish to implement the stochastic matrix). that any transposition is QE with one hidden state. Since Recall from Proposition 2 that some stochastic ma- any permutation can be written as a product of transpo- trices P cannot be directly implemented with zero EP, sitions, and since the product of QE matrices is QE, the even if they are embeddable. Theorem 8 tells us that next result follows immediately: it is sometimes possible to reduce entropy production of implementing an embeddable P by adding hidden states. Proposition 7. Any permutation is QE with one hidden This is illustrated with the following example. state. Example 4. The “partial” bit flip Together Corollary 3 and Proposition 7 imply that any permutation matrix requires exactly one hidden state to 2/3 1/3 P = be QE. 1/3 2/3 As mentioned in Section I, it is possible to extend our analysis of single-valued maps to countably infinite is embeddable [9] but has nonzero determinant (det P = spaces X, for suitable extensions of our definitions. Ap- 1/3), and so cannot be QE by Corollary 3. Specifically, pendix F introduces one such extension of our definitions, for any initial probability distribution p(0) (except possi- and then proves that with those extended definitions, any bly the fixed point of P , p(0) = (1/2, 1/2)|), there is some single-valued function over a countably infinite X can be unavoidable EP. For example, by Proposition 2, for the implemented with at most one hidden state. initial distribution p(0) = (1, 0)|,

4/9 Σ ≥ − ≈ 0.2 . B. General case 2 ln 1/3 However, by Theorem 8, P is QE with one hidden We now present our main result for arbitrary stochastic state. Thus, by adding a hidden state, the partial bit matrices. To begin, recall that the nonnegative rank of an flip can be carried out while achieving zero EP. Note n × n stochastic matrix P is the smallest m such that P that since the change in the Shannon entropy S(P p(0))− can be written as P = RS, where R is an n×m stochastic S(p(0)) is independent of how P is implemented, the re- matrix and S is an m×n stochastic matrix [33]. Roughly duction in EP realized when implementing P using a hid- speaking, the nonnegative rank of a matrix is analogous den state implies a decrease in the amount of entropy to the number of independent mixing components in a produced in the environment (e.g., as a reduction in gen- mixture distribution. erated heat). Theorem 8. An n × n stochastic matrix P with non- negative rank r is quasistatically embeddable with r − 1 As a final comment, while the nonnegative rank pro- hidden states. vides a general bound on the minimal number of hidden states needed to quasistatically embed a given matrix, To understand this bound intuitively, suppose we have other properties of the matrix can sometimes be used to decomposed a given n × n stochastic matrix P into a further reduce the bound. As a simple illustration, sup- product of two rectangular stochastic matrices of dimen- pose that we know the nonnegative rank of some stochas- sions n × r and r × n. These two rectangular matrices tic matrix P —but also know that P is block diagonal (up 9 to a rearrangement of rows/columns by some permuta- Type of matrix P Min. # hidden states tion matrix), and that the greatest nonnegative rank of Single-valued any of the blocks is k. Now we can implement P by implementing each block in P independently, one after Invertible (, I) 1 the other. Implementing P this way would allow us to Non-invertible 0 repeatedly “reuse” whatever hidden states we have, for Lower bound Upper bound each successive implementation of a block. This means Noisy (LE) (QE) (QE) that P is quasistatically embeddable using k − 1 hidden det P = 0 0 states. This is true even if the nonnegative rank of P is 0 (substantially) larger than k, in which case the direct ap- det P > 0 1 r − 1 plication of Theorem 8 would give a much weaker bound det P < 0 on the minimal required number of hidden states. Q 1 1 det P > i Pii Limit Embeddable 0 0 1 C. Coarse-grained states TABLE I. Minimum number of hidden states required to limit-embed [LE] or quasistatically embed [QE] a given When introducing our notion of hidden states, we in- stochastic matrix P . r is the nonnegative rank of P . dicated that our upper bounds would also apply if the stochastic matrix P described the evolution of probabil- ity over some number m of coarse-grained “macrostates” this minimal number of states needed for these two kinds which collectively have internal to them n “microstates”. of implementation, stated in terms of basic properties of To see this, it suffices to notice that a single local re- P . laxation could send all the microstates associated to Our results for different kinds of P are summarized a macrostate to one of its microstates. This concen- in Table I, which lists upper and lower bounds for both trates probability in as many microstates as there are limit-embedding (i.e., implementing to arbitrary accu- macrostates, and the resulting “empty” m−n microstates racy) and quasistatic embedding (i.e., implementing to can now be used as hidden states in the sense of our def- arbitrary accuracy in a thermodynamically reversible initions and constructions. manner) for different classes of matrices. Any QE ma- For example, consider a system with three microstates trix is limit-embeddable, so the upper bounds established {a, b, c} which are grouped into two macrostates 0 = {a} for quasistatic embedding also apply to limit-embedding. and 1 = {b, c}, representing the two states of a bit, that All the lower bounds in this table are tight, in the sense we wish to flip. This can be done by first applying the that in each class of matrices listed, there is a matrix local relaxation {a} 7→ a, {b, c} 7→ b, leaving c with no that satisfies the lower bound. probability. Now, by carrying out the bit flip opera- tion on microstates a and b using c as a hidden state (as shown in Example 3), we carry out a bit flip over the A. Interpretation of hidden states macrostates 0 and 1. Often in physics there are some states of a system or even entire physical variables that are hard to ob- VI. DISCUSSION serve and control experimentally. Such “hidden states” are often a problem to be circumvented, e.g., by coarse- In this paper, we consider implementing stochastic ma- graining. Whatever method we use for dealing with hid- trices P using a master equation. For some P , this is not den states invariably affects the predictions we make possible. However, we show that it is always possible to (e.g., coarse-graining can result in entropy increasing implement any P (to arbitrary accuracy), if we have suf- with time). However, such methods can allow us to pro- ficiently many hidden states, in addition to the visible ceed in an analysis despite our incomplete knowledge. states that P works on. We also show that it is always At other times, the presence of hidden states can be possible not only to implement P (to arbitrary accuracy), a crucial property of a system, necessary for us to use but to do so in a thermodynamically reversible manner a master equation to model the dynamics. Our results by using hidden states. The minimal number of required highlight the extent to which this is the case in differ- hidden states for such thermodynamically reversible im- ent situations. In some cases, hidden states cannot be plementation of a given stochastic matrix P is a novel ignored—an engineer designing a physical system to im- and fundamental kind of “state space cost” of carrying plement a given P must ensure that there is appropriate out P . dynamic coupling between the hidden states and the visi- We go on to derive some bounds on the minimal num- ble ones that P operates on, either implicitly or explicitly. ber of such hidden states required for any given P , either In a different context, the number of hidden states are a just to implement it, or to do so in a thermodynamically measure of how many internal states are being overlooked reversible manner. In particular, we derive bounds on by a scientist who notices that some naturally occurring 10

system evolves (over discrete time) according to P , and states according to the cyclic permutation: wants to presume that there is a master equation dynam- ics underlying that evolution. 0 0 1

It is important to emphasize that the role for hidden Ptick = 1 0 0 states uncovered in our analysis is different from the role   0 1 0 of the states of the history tape in Bennett’s reversible computation construction [25, 34, 35], or the states of each time it ticks. As we observe in Section I, this is im- the extra bits in reversible Toffoli gates [36]. Like hidden possible (since the product of the diagonal entries of Ptick states, the states used in reversible computation “facili- is less than its determinant [12, 22]). In fact, we cannot tate” the desired dynamics P , in a broad sense, without even approximate these clock dynamics by allowing an being in the space that P runs over. However, the role of arbitrarily small amount of error. For example, a “lazy” those states in reversible computation is to allow logical clock Plazy = (1 − )Ptick + I, which fails to tick with reversibility of the conditional distributions giving the probability  and slowly dephases, is not embeddable. update rules, and therefore to lower the minimal ther- However, as we show above, all permutations, includ- modynamic work (rather than the entropy production) ing Ptick, are QE with one hidden state, which means required to perform certain computations. In contrast, that if there is a hidden state in the clock (which may the conditional distributions we construct here are made be occupied between ticks) it can be arbitrarily precise from LR matrices and are not logically reversible in gen- and remember its phase, while producing arbitrarily lit- eral. tle entropy. Our more general results (e.g. Theorem 8) Indeed, the state space cost of a conditional distribu- say that periodic driving can be used to implement any tion P in some ways behaves in a manner “opposite” to variant of the stochastic matrix Ptick (e.g., modeling a the costs of P discussed in the early thermodynamics three state clock that loses coherence in a specific way or of computation literature. A logically reversible func- rate), as long as at least two hidden states are available. tion needs a hidden state to be implementable by a CTMC, while a function that is logically irreversible does not. Therefore, as far as the number of hidden states is C. Future technical directions concerned, there are advantages to being logically non- invertible rather than being logically invertible. Many of our results establish upper bounds on the number of hidden states required to quasistatically em- bed a given stochastic matrix P . We generally do not B. Biochemical oscillations know how tight these upper bounds are, even in the worst cases. In fact, we have no example of a matrix which re- One possible application of our results is to re- quires more than one hidden state for limit-embedding cent stochastic thermodynamics analyses of “Brownian or quasistatic embedding. One strategy to prove lower clocks” [37, 38]. These are biochemical oscillations in bounds would be to show that all QE matrices can be which a component (e.g. a protein) undergoes Marko- written as products of local relaxations, and then im- vian transitions (governed by some rate matrix M(t)) prove our arguments to establish that the number of hid- through a cycle of discrete states.6 den states used in our constructions (or modifications Such clock-like oscillations are forbidden at equilib- thereof) is minimal. This is a natural target for future rium. However, they can be sustained out of equilibrium work. two ways: By fixed driving forces (M(t) = M is con- We are also keen to explore the potentially fruitful con- stant but not detailed balanced), or by periodic driving nections between the classical questions we have tackled (time-dependent variation of M(t)). in this paper and questions arising in the study of open In the former case, oscillations invariably dephase, quantum systems. The state of such systems can be de- though coherence can be preserved over many periods scribed by a , and the transformation of if the system is driven strongly and has many states [38]. state between two times is given by a completely posi- Even in the latter, more general case, it is challenging to tive trace-preserving map, or “quantum channel”, which preserve phase information. In particular, if we try to de- is the quantum analogue of a stochastic matrix. The sign a Brownian clock that remembers its phase exactly, analogue of the classical master equation in this context we run into precisely the embedding problem discussed is the Lindblad master equation [39], and the analogue in this paper. For example, a clock with three states that of the embedding problem is known as the Markovianity remembers its phase (perfectly) must transform its own problem [14]. Prior work related to our concerns in this paper include [40], which studies the relationship between time-homogeneous and inhomogeneous embedding in the quantum context, as well the results in [41] on mapping 6Note that the rate equation of a monomolecular chemical reaction certain non-Lindblad master equations to systems obey- network is also of the form of Eq.(2), so the same mathematics can ing a Lindblad master equation coupled to a single extra arise with a different interpretation. “ancillary” qubit. 11

Another possible direction involves pursuing continu- space case. ous space generalizations of our results. Although master Even in naturally discrete systems, our ability to inde- equations over a finite number of states are well-suited pendently control transition rates between states could to represent many systems at some scales, continuous be constrained, which could give rise to state space costs state spaces unavoidably appear in microscopic, classical greater than the very modest upper bound (one hidden descriptions. One way to handle this would be to intro- bit) we establish here for any matrix P , and deserve fur- duce a fine discretization of space (see e.g. [42]) to turn ther exploration. say, diffusion in a compact region into a finite-state mas- Our proofs involve implementing a given conditional ter equation. Although our results presented here would distribution P via a sequence of discrete steps—local re- apply to such an approximation, they would not be ad- laxations. The number of such discrete steps required to dressing the natural questions for such a context, because implement a given P presents an interesting way of mea- they ignore the continuous structure of state space. For suring the difficulty of physically implementing a stochas- example, we have assumed throughout that the entries tic matrix. Analyzing the number of steps required to of M(t) can be controlled freely and entirely indepen- implement a given P , and how this number depends on dently of each other. In a discretized model of a con- the number of hidden states and the details of P , is an tinuous system, this would be like supposing that force important direction for future work. As a initial result in fields or diffusivities could vary arbitrarily over the (very this direction, we note that our construction in the proof small) discretization scale. As an alternative, one could of Theorem 8 requires 4rn − 10 local relaxations to be consider the embedding problem for Fokker-Planck equa- performed in sequence. Such investigations are closely tions, characterizing the set of conditional probability linked to the state space size versus timestep tradeoff for density functions that can be implemented with Fokker- single-valued stochastic matrices, which we analyze in a Planck equations, and seeing whether adding additional companion paper [16]. (See also work on a related ques- dimensions can enlarge this set. tion for the special case n = 3 in [43].) Of course, there is also an “intermediate” regime, in which the state space is infinite, but still countable. We ACKNOWLEDGMENTS present a very preliminary investigation of that regime in Appendix F. Some possibly fruitful future work would in- We would like to thank Max Tegmark for useful discus- volve extending the investigations there, e.g., to apply to sions, and the Santa Fe Institute for helping to support noisy P as well as single-valued P . It might also be fruit- this research. This paper was made possible through the ful to consider alternatives to the ways that the analysis support of Grant No. FQXi-RFP-1622 from the FQXi in Appendix F extends our basic concepts of “implement- foundation, and Grant No. CHE-1648973 from the U.S. ing P ” from the finite state space case to the infinite state National Science Foundation.

[1] Christian Van den Broeck et al. Stochastic thermody- [8] G Elfving. Zur theorie der Markoffschen ketten. Acta namics: A brief introduction. Physics of Complex Col- Soc SciFennicae n Ser A2, 1937. loids, 184:155–193, 2013. [9] J. F. C. Kingman. The imbedding problem for finite [2] C. Van den Broeck and M. Esposito. Ensemble and tra- Markov chains. Zeitschrift f¨urWahrscheinlichkeitstheorie jectory thermodynamics: A brief introduction. Physica und Verwandte Gebiete, 1(1):14–24, January 1962. A: Statistical Mechanics and its Applications, 418:6–16, [10] Halina Frydman and Burton Singer. Total positivity and January 2015. the embedding problem for Markov chains. Mathemat- [3] Udo Seifert. Stochastic thermodynamics, fluctuation the- ical Proceedings of the Cambridge Philosophical Society, orems and molecular machines. Reports on Progress in 86(02):339–344, September 1979. Physics, 75(12):126001, 2012. [11] B. Fuglede. On the imbedding problem for stochastic [4] N. G. Van Kampen. Stochastic processes in chemistry and doubly stochastic matrices. Probability Theory and and physics. Amsterdam: North Holland, 1:120–127, Related Fields, 80(2):241–260, December 1988. 1981. [12] Pedro Lencastre, Frank Raischel, Tim Rogers, and Pe- [5] Dror Givon, Raz Kupferman, and Andrew Stuart. Ex- dro G Lind. From empirical data to time-inhomogeneous tracting macroscopic dynamics: model problems and al- continuous markov processes. Physical Review E, gorithms. Nonlinearity, 17(6):R55, 2004. 93(3):032135, 2016. [6] Pierre Gaspard. Hamiltonian dynamics, nanosystems, [13] Chen Jia. A solution to the reversible embedding problem and nonequilibrium statistical mechanics. Physica A: for finite markov chains. & Probability Letters, Statistical Mechanics and its Applications, 369(1):201– 116:122–130, 2016. 246, 2006. [14] Toby S Cubitt, Jens Eisert, and Michael M Wolf. [7] Massimiliano Esposito, Ryoichi Kawai, Katja Linden- The complexity of relating quantum channels to mas- berg, and Christian Van den Broeck. Finite-time ther- ter equations. Communications in Mathematical Physics, modynamics for a single-level quantum dot. EPL (Euro- 310(2):383–418, 2012. physics Letters), 89(2):20003, 2010. [15] Johannes Bausch and Toby Cubitt. The complexity of 12

divisibility. Linear algebra and its applications, 504:64– bennett’s time-space tradeoff for reversible computation. 107, 2016. SIAM Journal on Computing, 19(4):673–677, 1990. [16] David H. Wolpert, Artemy Kolchinsky, and Jeremy A. [36] Edward Fredkin and Tommaso Toffoli. Conservative Owen. A space–time tradeoff for implementing a func- logic. International Journal of Theoretical Physics, tion with master equation dynamics. Nature Communi- 21(3):219–253, 1982. cations, 10(1):1727, 2019 [37] Andre C Barato and Udo Seifert. Cost and precision of [17] Artemy Kolchinsky and David H Wolpert. Dependence of brownian clocks. Physical Review X, 6(4):041053, 2016. dissipation on the initial distribution over states. Journal [38] Andre C Barato and Udo Seifert. Coherence of biochem- of Statistical Mechanics: Theory and Experiment, page ical oscillations is bounded by driving force and network 083202, 2017. topology. Physical Review E, 95(6):062409, 2017. [18] S. Turgut. Relations between entropies produced in non- [39] G. Lindblad. On the generators of quantum dynamical deterministic thermodynamic processes. Physical Review semigroups. Comm. Math. Phys., 48(2):119–130, 1976. E, 79(4):041102, April 2009. [40] Alexander Schnell, Andr´eEckardt, and Sergey Denisov. [19] O.J.E. Maroney. Generalizing landauer’s principle. Phys- Is there a floquet lindbladian? arXiv preprint ical Review E, 79(3):031105, 2009. arXiv:1809.11121, 2018. [20] David H Wolpert. Extending landauer’s bound from [41] Michael R. Hush, Igor Lesanovsky, and Juan P. Garra- bit erasure to arbitrary computation. arXiv preprint han. Generic map from non-lindblad to lindblad master arXiv:1508.05319, 2015. equations. Phys. Rev. A, 91:032113, Mar 2015. [21] David H Wolpert. The free energy requirements of bi- [42] Todd R Gingrich, Grant M Rotskoff, and Jordan M ological organisms; implications for evolution. Entropy, Horowitz. Inferring dissipation from current fluctua- 18(4):138, 2016. tions. Journal of Physics A: Mathematical and Theo- [22] Gerald S. Goodman. An intrinsic time for non-stationary retical, 50(18):184004, 2017. finite Markov chains. Probability Theory and Related [43] Søren Johansen and Fred L Ramsey. A bang-bang repre- Fields, 16(3):165–180, 1970. sentation for 3× 3 embeddable stochastic matrices. Prob- [23] Massimiliano Esposito and Christian Van den Broeck. ability Theory and Related Fields, 47(1):107–118, 1979. Three faces of the second law. i. master equation formu- [44] Søren Johansen. A central limit theorem for finite semi- lation. Physical Review E, 82(1):011143, 2010. groups and its application to the imbedding problem for [24] Rolf Landauer. Irreversibility and heat generation in the finite state markov chains. Zeitschrift f¨urWahrschein- computing process. IBM journal of research and devel- lichkeitstheorie und Verwandte Gebiete, 26(3):171–190, opment, 5(3):183–191, 1961. 1973. [25] Charles H Bennett. The thermodynamics of computation—a review. International Journal of Theoretical Physics, 21(12):905–940, 1982. [26] Takahiro Sagawa. Thermodynamic and logical reversibil- Appendix A: Proof of Lemma 1 ities revisited. Journal of Statistical Mechanics: Theory and Experiment, 2014(3):P03025, 2014. [27] Peter Salamon and R. Stephen Berry. Thermodynamic Lemma 1. The set of QE matrices is closed under mul- length and dissipated availability. Physical Review Let- tiplication. ters, 51(13):1127, 1983. [28] Bjarne Andresen, R. Stephen Berry, Mary Jo Ondrechen, Proof. If T (t, t0) is the transition matrix associated with and Peter Salamon. Thermodynamics for processes in rate matrix M(t) and S(t, t0) is the transition matrix as- finite time. Accounts of Chemical Research, 17(8):266– sociated with N(t), then S(1/2, 1)T (0, 1/2) = U(0, 1), 271, 1984. 0 [29] Patrick R Zulkowski and Michael R DeWeese. Optimal where U(t, t ) is associated with rate matrix: finite-time erasure of a classical bit. Physical Review E, 89(5):052140, 2014. ( 1 M(t) if t ∈ [0, 2 ] [30] Naoto Shiraishi, Ken Funo, and Keiji Saito. Speed L(t) = 1 limit for classical stochastic processes. Phys. Rev. Lett., N(t) if t ∈ ( 2 , 1] 121:070601, Aug 2018. [31] Giovanni Diana, G Baris Bagci, and Massimiliano Espos- To complete the proof note that the entropy production ito. Finite-time erasing of information stored in fermionic over the whole process is the sum of entropy production bits. Physical Review E, 87(1):012111, 2013. over both subintervals.  [32] John M. Howie. The subsemigroup generated by the idempotents of a full transformation semigroup. Jour- nal of the London Mathematical Society, 1(1):707–716, 1966. [33] Joel E. Cohen and Uriel G. Rothblum. Nonnegative Appendix B: Proof of Proposition 2 ranks, decompositions, and factorizations of nonnegative matrices. Linear Algebra and its Applications, 190:149– Proposition 2. If M(t) is the rate matrix of a CTMC 168, 1993. with T (0, 1) = P , I, then for any initial distribution p, [34] Charles H Bennett. Logical reversibility of computation. IBM journal of Research and Development, 17(6):525– 2 532, 1973. |P p − p| Σ(p, M(t), [0, 1]) ≥ − 1 , [35] Robert Y Levine and Alan T Sherman. A note on 2 ln(det P ) 13

7 00 where |·|1 is `1 norm. Now choose such a P within /2 of P . If we wish to implement P within  while producing total entropy Proof. We begin with a result from [30]. Eq. 14 in that less than δ, with one hidden state, we can do so with a paper, with rearranging, states rate matrix M(t) that implements P 00 to within /2 while producing less than δ entropy, with one hidden state. |p(τ) − p(0)|2 Σ(p(0),M(t), [0, τ]) ≥ 1 , (B1) There always is one since P 00 is QE with one hidden state. 2τhAiτ This establishes the result.  where hAiτ is the so-called “dynamical activity”,

1 Z τ X hAiτ := Mij(t)pj(t)dt Appendix D: Proof of Proposition 5, that any LR τ 0 matrix is QE i,j 1 Z τ X = − M (t)p (t)dt . τ ii i Proposition 5. Any local relaxation is QE. 0 i Proof. We first show this for local relaxations that have Since for all i, pi(t) ≤ 1 and Mii(t) ≤ 0, we can upper a single block, that is, those of the form bound the dynamical activity as P = π1T , 1 Z τ X 1 hAi ≤ − M (t)dt = − ln(det P ) , (B2) where π is an n-dimensional column-vector with positive τ τ ii τ 0 i entries that sum to 1, and 1 is an n-dimensional column- vector of 1s. We construct a sequence of CTMCs that where we’ve used P = TM (0, τ) and Eq. (5). The result approximates P arbitrarily well while achieving arbitrar- follows by combining Eqs. (B1) and (B2) and taking p = ily small total EP for some initial distribution q. p(0), P p = p(τ).  Consider the rate matrix whose off-diagonal entries are given by Mij(t) = αwi(t), where wi(t) = (1 − t)qi + tπi. Corollary 3, that any non-identity QE matrix P is sin- The associated master equation decouples, gular, follows directly from Proposition 2. Choose p to be any non-fixed-point of P . Then, in order for entropy p˙i(t) = α(wi(t) − pi(t)) , (D1) production to be made arbitrarily small, as required by the definition of QE, it must be that det P = 0. and can be solved explicitly (using the initial condition p(0) = q), p (t) = w (t) + α−1(q − π )(1 − e−αt) . (D2) Appendix C: Proof of Proposition 4 i i i i It is clear that by making α sufficiently large, this CTMC Proposition 4. If a matrix P is limit-embeddable, it is will transform any initial distribution at t = 0 arbitrarily QE with at most one hidden state. close to final distribution π at t = 1. That is,

lim pi(1) = lim wi(1) = πi(t) , Proof. Let P be a limit-embeddable matrix. By defini- α→∞ α→∞ tion, there is an embeddable matrix P 0 within any desired Thus, it will approximate P arbitrarily well. We must distance 1 of P . Now consider a stochastic matrix which keeps every now show that by making α large we can also make the state fixed except state i, which it sends to j with prob- total EP as small we like. ability α, and leaves alone with probability 1 − α. In Since the rates Mij(t) satisfy detailed balance at all control theory, such matrices are called Poisson matri- times, Mij(t)wj(t) = Mji(t)wi(t), the total entropy pro- ces. It is known that any embeddable matrix P 0 can be duction can be written as approximated arbitrarily closely by a finite product of Z 1 X pi(t) such matrices [44]. Let P 00 indicate a product of Poisson Σ = − p˙i(t) ln dt 0 0 wi(t) matrices which approximates P within a distance 2. i A Poisson matrix is QE with one hidden state by We split this integral into two parts, Lemma 9. Since P 00 is a product of Poisson matrices, P 00 is QE with one hidden states. Thus, P is arbitrarily Z 1 Z 1 close (less than  +  ) to P 00, a matrix which is QE p˙i(t) ln pi(t) dt − p˙i(t) ln wi(t) dt 1 2 0 0 with one hidden state. The first integral can be evaluated as, Z 1 Z 1 d p˙i(t) ln pi(t)dt = [pi(t) ln pi(t) − pi(t)] dt 7 0 0 dt For two probability distributions p, q over n states, the `1 norm is defined |p − q| := Pn |p − q |. 1 i=1 i i = pi(1) ln pi(1) − qi ln qi + qi − pi(1) , 14 which, as α → ∞, converges to

πi ln πi − qi ln qi + qi − πi . (D3) Step 1 (a) (b) j r The second integral can be written as Z 1 Z 1 −αt p˙i(t) ln wi(t) dt = (πi − qi) (1 − e ) ln wi(t) dt , Step 2 0 0 r r where we’ve plugged Eq. (D2) into Eq. (D1). As α → ∞, the integrand converges pointwise and monotonically to ln wi(t), which is bounded for all t ∈ (0, 1). By the Step 3 dominated convergence theorem, as α → ∞, the second r integral converges to i Z 1

(πi − qi) ln wi(t) dt (D4) 0 Z 1 = (πi − qi) ln ((1 − t)qi + tπi) dt 0 FIG. 2. (a) A composition of LRs can be used to effect a transfer of probability α from i to j, using a relay state r. (b) = πi ln πi − qi ln qi + qi − πi . (D5) A sequence of these operations can be composed to effect any Comparing Eq. (D3) and Eq. (D5), we see that in the transfer T . limit of α → 0, the two integrals cancel, and so Σ → 0. The general case of a local relaxation with multi- 1. Transfers of probability using a relay state ple blocks follows immediately, because a block diago- nal matrix is QE if the blocks are—a block diagonal In what follows, it will be useful to consider stochastic matrix A with blocks Bi can be factored as a product matrices of the form: A = (B1 ⊕ I ⊕ · · · )(I ⊕ B2 ⊕ · · · ) ··· .  " # " # IP D 0 or Appendix E: Proof of main result in Section V 0 D PI

To establish our main result we first establish some where P is an m×p matrix with positive entries with col- preliminary results. umn sums less than or equal to one, D is a p×p that makes the whole stochastic, and Lemma 9. Any 2 × 2 stochastic matrix is QE with one I is the m × m . hidden state. We call such matrices transfers, because they can be viewed as representing transfers of probability between Proof. For any real number p ∈ [0, 1], definep ¯ 1 − p. B two disjoint sets of states of sizes p and m. One can Without loss of generality, write P in matrix notation as show that: " # p¯ q P = Proposition 10. Any transfer T is QE with one hidden p q¯ state.

i→j We consider two cases. If p ≤ q¯, then if x = p/q¯ and We will make repeated use of the map, R (α), acting y = q, we have on a set with two elements, which keeps state j fixed, but sends state i to j with probability α and leaves it as i p¯ q p¯ 1 0 1 y y 0 x 0 x with probability 1 − α. Such a matrix, which is called a Poisson matrix in control theory, is QE with one hidden p q¯ p = 0 1 0 y¯ y¯ 0 0 1 0 .         state by Lemma 9. 0 0 0 0 0 0 0 0 1 x¯ 0x ¯ Proof of Proposition 10. Without loss of generality8, let If instead p > q¯, then if x =p ¯, y =q/p ¯ , we have " # D 0         T = p¯ q q 1 0 1 x x 0 1 0 0 PI         p q¯ q¯ = 0 1 0 x¯ x¯ 0 0 y y . 0 0 0 0 0 0 0 0 1 0y ¯ y¯

In both cases, the factors on the right hand side are local 8The other kind of transfer can be rewritten in this form by per- relaxations, so the result is established.  muting its rows and columns. 15

If P has only one nonzero row, whose entries are pi, where D is the diagonal matrix that makes the second i = 1, . . . , n, then factor stochastic, and I is an identity matrix of dimen- sions suitable to where it appears. To show that P can n Y i→j be implemented with one hidden state, it suffices to show T = R (pi) that each of the factors on the right hand side can be. i=1 The middle two factors represent transfers, in the sense where the indices i and j are now labeling rows described above, so they can implemented with one hid- (e.g. states) in the larger matrix T . The first n rows den state. The first and last factors represent operations performed on just two states, so by Lemma 9 can also be that i ranges over become the rows of D, and j is the −1 index of the single nonzero row of P , now as a subma- implemented with one hidden state. Note that R2D trix of T . So by construction j > n ≥ i, and the maps is stochastic, because the column sums of R2 are exactly i→j the diagonal entries of D (recall D was chosen to make R (pi) for different i do not interfere with one another (they commute and fix each others images). the second factor stochastic).  Now suppose P has two nonzero rows. Write P as the We are now ready to prove Theorem 8. sum of two matrices P = P1 + P2, each zero except for one of the rows of P . Let D1 be the diagonal matrix Theorem 8. An n × n stochastic matrix P with non- which makes negative rank r is quasistatically embeddable with r − 1 hidden states. " # D 0 1 Proof. Suppose r > 2 (if not, the result follows from P1 I Lemma 11). As before, write π = RS where R and S are stochastic of dimensions n×r and r×n, respectively. But stochastic, and write this time, decompose R and S differently: R = [Rm P ] " # " # " # " #" # Sm D 0 D D 0 D 0 D 0 and S = , where P is n × 2 and Q is 2 × n. Note 2 1 2 1 Q = = −1 PI P2 + P1 I P2D I P1 I 1 that π = RmSm + PQ.

" # " #" −1 #" # where D2 is chosen to make the matrix is appears in PQ + RmSm Rm IRm P QD 0 D 0 stochastic (this is always possible since the column sums = 0 0 0 0 0 I S I of P are all less than 1). Since the product of stochastic m matrices is stochastic, the matrix appearing on the left where D is the diagonal matrix that makes the second hand side is T , establishing the proposition for P with factor stochastic, and I is an identity matrix of dimen- two nonzero rows. The result for general P follows by sions suitable to where it appears. induction. To prove the Theorem, it suffices show that each of the  (n + k − 2) × (n + k − 2) matrices on the right hand side care QE with one hidden state. The first and last factors represent transfers from r −2 2. Minimum number of hidden states is less than hidden states to the original n states (and vice versa), nonnegative rank and can be implemented using one hidden state, as de- scribed earlier. The middle factor, which represents an Lemma 11. An n × n stochastic matrix P with nonneg- operation performed only on the original states, has non- ative rank 2 is QE with one hidden state. negative rank 2 (note that P and QD−1 are stochastic), so by Lemma 11 can also be implemented with one hid- Proof. Write the nonnegative rank decomposition P = den state. This establishes the result.  RS where R and S are stochastic matrices of dimensions n × 2 and 2 × n, respectively. We further decompose R " # Rm Appendix F: Countably infinite state spaces and S into sub-matrices: R = and S = [Sm S2], R2 We now consider the case where the state space of our where R2 and S2 are 2 × 2. Now note that system, X, is countably infinite, and restrict attention " # R S R S to the implementation of single-valued maps f : X → X P = m m m 2 over such spaces. R2Sm R2S2 In our results above, we establish quasistatic embed- dability of various matrices by writing them as products and of local relaxations. In the infinite case, we can simi- " # " #" #" #" # larly ask what is possible by composing local relaxations, R S R S I 0 IR 0 0 I 0 m m m 2 = m which in the context of single-valued maps are exactly −1 R2Sm R2S2 0 R2D 0 D Sm I 0 S2 the idempotent functions. 16

One natural way to extend our analysis to the case of Note that a cyclic permutation like [−j] and [α] af- countably infinite X is to consider a sequence of idem- fect only a finite number of points in Z, and thus can be potent functions over some Y ⊃ X that, when restricted written as a product of a finite number of transpositions, to X, gives the desired f. To allow the sequence of func- which can in turn be written as a product of three idem- tions to implement f(x) for all x ∈ X, in general we potents that use one hidden state (e.g., as in Example 3). must consider sequences that are infinite. However, the This hidden state is provided by Y , which lets us con- infinite product of local relaxations need not be QE (or struct a sequence of idempotents {gi} over Y with the even be well-defined). To circumvent this issue we im- property we desire.  pose a “practical” interpretation of what it means to im- plement f, by requiring that any particular input x ∈ X We can now use a “dovetailing” algorithm to construct is mapped to f(x) after finitely many idempotents. a sequence of local idempotents that implements any Adopting this interpretation, in this appendix we es- specified function, thereby establishing Proposition 12: tablish the following result: Proof of Proposition 12. Suppose first that f is a bijec- Proposition 12. Let X be a countable set, and take tion. Partition X according to the orbits of f—that is, Y := X ∪ {z}. For any function f : X → X there is a two points x and y are in the same part if there is some n such that y = f n(x) or x = f n(y). sequence {gi} of idempotent functions gi : Y → Y such that for all x ∈ X ⊂ Y , there is an m (which can depend There will be countably many orbits Aj. Each of the Aj is either finite and f restricted to it is a permutation, on x) such that for all r ≥ m, (gr ◦ gr−1 ◦ · · · ◦ g1)(x) = f(x). or else Aj is countably infinite and there is a numbering of its elements (using positive and negative integers) such Proposition 12 is a kind of infinite analog of Proposi- that f restricted to that orbit is n → n+1. In either case, tions 6 and 7, which showed that any function over a using Proposition 6 or Lemma 13, we can form for each finite X can be implemented with at most one hidden orbit a sequence {(gj)i} that implements the restriction state. of f to that orbit. Extend the maps in each sequence To prove Proposition 12 we begin with the following to all of Y by having them fix the elements in all other lemma: orbits. Now form a new sequence by interleaving these (count- Lemma 13. Let Y be Z with one point added. There is ably many) sequences in a way that preserves the order a sequence {gi} of idempotent functions gi : Y → Y such of elements in each individual sequence. For example, that for all n ∈ Z ⊂ Y , there is an m (which can depend on n) such that for all r ≥ m, (gr ◦ gr−1 ◦ · · · ◦ g1)(n) = {(g1)1, (g1)2, (g2)1, (g1)3, (g2)2, (g3)1,... }. n + 1. This sequence satisfies the requirements of the theorem. Proof. First, let [α] indicate the cyclic permutation Any x ∈ X is in some orbit Aj, and so there is some (−1, 0, 1) and, for any integer i ∈ Z, let [i] indicate the element of the associated sequence (gj)m after the appli- cyclic permutation (−i, i + 1, −i − 1). cation of which x is mapped to f(x) and remains so. We For any particular j ∈ Z, consider the following com- use here the fact that the interleaving to form the larger position of permutations, sequence preserves order, and that idempotents that are members of sequences corresponding to the other orbits M (j) := [j] ... [2][1][α] . fix Aj. If f is not a bijection, consider the idempotent map We show by induction that M (j) sends each state i ∈ a that, for all x ∈ X, sends all elements in the inverse {−j −1, . . . , j} to i+1 (while also sending j +1 to −j −1 image f −1(x) to a distinguished element w(x) ∈ f −1(x). as a ‘side effect’). First, it can be verified by inspection Write Z = img f ∪ img a. We can write f = h ◦ a, where that it holds true for j = 0 (when M (j) = [α]). Second, h : Z → Z is a bijection. So using the construction assume it holds true for M (j) and then observe that above, we can make a sequence of idempotents for h with M (j+1) = [j + 1]M (j) = (−j − 1, j + 2, −j − 2)M (j) . the desired property. Adding a to the beginning of the sequence forms a sequence that implements f. For each x So, applying [j + 1] after M (j) results in: there exists an w(x), and under the implementation of h, w(x) 7→ f(w(x)) once some number m idempotents are (a) −j − 2 being sent to −j − 1; applied. But note that x was first mapped to w(x), under the initial application of a. Thus, once m idempotents (j) (j) (b) the ‘side effect’ of M being fixed, in that, M of h have been applied, the combined effect on x is x 7→ mapped j + 1 7→ −j − 1, but [j + 1] maps −j − 1 7→ w(x) 7→ f(w(x)), and f(w(x)) = f(x), by the definition j + 2, so the combined result is j + 1 7→ j + 2. of w(x), so this completes the argument.  Thus, M (j+1) sends each state i ∈ {−(j + 1) − 1,..., (j + 1)} to i + 1 (while now sending (j + 1) + 1 to −(j + 1) − 1 as a ‘side effect’).