Generalized Continued Fractions: Definitions, Convergence and Applications to Markov Chains
Generalized continued fractions: Definitions, Convergence and Applications to Markov Chains

Habilitationsschrift, Hendrik Baumann, 2017

Contents

1 Introduction – Markov chains and continued fractions
  1.1 Markov chains: Invariant measures and hitting times
  1.2 Notes on the history of continued fractions
  1.3 Block matrices and Markov chains
  1.4 Organization of the results

I A general approach

2 Defining generalized continued fractions
  2.1 Markov chains and continued fractions
  2.2 S-series
  2.3 Definition of generalized continued fractions
  2.4 Literature review

3 Pringsheim-type convergence criteria for generalized continued fractions
  3.1 Equivalence transformations for gcfs
  3.2 Pringsheim-type convergence criteria
  3.3 Pringsheim-type criteria: Special cases
  3.4 Literature review

4 GCFs and infinite systems of equations
  4.1 The system $Hx = 0$
  4.2 Special cases
  4.3 Equivalence transformations for $w$
  4.4 Convergence criteria involving S-series
  4.5 Pringsheim-type convergence criteria for $w$
  4.6 Inhomogeneous systems
  4.7 Transposed systems
  4.8 Criteria using positivity and minimality
  4.9 Systems with block matrices
  4.10 GCFs and Gaussian elimination
  4.11 Literature review

5 Application to Markov chains
  5.1 Kernels
  5.2 Application to discrete-time Markov processes
  5.3 Application to continuous-time Markov chains
  5.4 Literature review

6 Algorithms for Markov chains
  6.1 Absorption and hitting probabilities
  6.2 Computing invariant measures and long-run averages
  6.3 Minimality, subdominance and stability
  6.4 Literature review

II Generalized continued fractions defined by sequences of numerators and denominators

7 GCFs generated by upper Hessenberg matrices
  7.1 Representation by numerators and denominators
  7.2 Equivalence transformations revisited
  7.3 Pringsheim-type convergence criteria: Alternative proof
  7.4 Literature review

8 Dominance and subdominance
  8.1 Miller's algorithm
  8.2 Adjoint systems
  8.3 Pincherle-type convergence criteria
  8.4 Literature review

9 Periodic GCFs
  9.1 Periodicity
  9.2 Periodic GCFs in $\mathbb{C}$
  9.3 Periodic GCFs in $\mathbb{C}^{d \times d}$
  9.4 Application: Waiting time distribution for M/D/1
  9.5 Literature review

10 Further research and open problems
  10.1 Minimal roots
  10.2 Poincaré-type equations
  10.3 Analyticity
  10.4 Application to differential equations
  10.5 Transient distributions
  10.6 Parameter estimation in Markov models

A Convergence of series in Banach spaces and algebras
  A.1 Convergence and unconditional convergence
  A.2 Absolute convergence

B Operator algebras and positive operators
  B.1 Operator algebras
  B.2 Positivity on Banach lattices
  B.3 Positive operators

C Markov chains
  C.1 Stochastic processes and the Markov property
  C.2 Discrete-time Markov processes
  C.3 Continuous-time Markov chains

D Codes
  D.1 Code for computing stationary distributions numerically

E Symbols and notations

Chapter 1

Introduction – Markov chains and continued fractions

Markov chains are used as mathematical models in various areas of application. Among these are
• queueing systems (production networks, telecommunication, ...),
• epidemiology,
• biochemical stochastic reaction networks,
• ...
In practical applications, we are interested in computing long-run characteristics of Markov chains, for example the long-run average number of customers in a queueing system. Unfortunately, in most situations we are not able to obtain explicit representations of these characteristics, and thus we have to use numerical procedures. In most realistic, and thus detailed, models, the state space of the Markov chain is very large; often, there are infinitely many states. In these situations, the application of numerical methods becomes difficult. If the transition probability matrix or generator matrix of the Markov chain has a block-tridiagonal structure, the literature [BT95] suggests using matrix-analytic solution techniques for computing the invariant distribution (from which we can obtain long-run characteristics), and it is well established that these methods are strongly related to matrix-valued continued fractions [Han99]. Similarly, for band-structured matrices, techniques relying on appropriate generalizations of (real-valued) continued fractions were introduced [Han92]. The block-tridiagonal or band structure of a transition probability or generator matrix can be interpreted as a restriction on the dynamic behaviour of the Markov chain: a transition from state $i$ to state $j$ can only occur if state $j$ lies in some kind of neighbourhood of state $i$. In this thesis, we will drop this restriction: we will introduce an appropriate definition of generalized continued fractions (gcfs) which enables us to represent long-run characteristics of Markov chains with arbitrary transition structures in terms of gcfs.
We will discuss
• practical issues, that is, benefits of these representations, and
• theoretical issues, that is, we will compare our definition of gcfs (which is motivated by the application to Markov chains) with generalizations of continued fractions found in the literature, and we will derive convergence criteria and speed-of-convergence estimates for gcfs which are independent of the application to Markov chains.

In chapter 2, we will discuss the relationship between continued fractions and Markov chains with tridiagonal transition structures in a detailed manner, and we will introduce our definition of gcfs. Before we start with these technical details, we will briefly present the evolution of continued fractions throughout the last centuries (see section 1.2), and we will demonstrate which problems can arise when computing long-run characteristics of Markov chains with large state spaces by means of numerical procedures.

Note that in the applications which we have mentioned above, we usually use continuous-time Markov chains as mathematical models. Concerning the computation of long-run characteristics, we will see that the same problems arise for discrete-time and continuous-time Markov chains, and the same methods for solving these problems apply; see chapter 5 and chapter 6. Therefore, in this introductory chapter, we will focus on Markov chains in discrete time. For details on some terms which we will use in the next sections (irreducibility, recurrence, positive recurrence, ...), we refer to appendix C, in particular to section C.2.

1.1 Markov chains: Invariant measures and hitting times

The dynamics of a discrete-time Markov chain $(X_m)_{m \in \mathbb{N}_0}$ with discrete state space $E$ is characterized by the matrix $P = (p_{ij})_{i,j \in E}$ of the one-step transition probabilities $p_{ij} = \mathbb{P}(X_{m+1} = j \mid X_m = i)$. Many long-run characteristics of the process can be written in terms of
• invariant measures,
• hitting probabilities, and
• mean hitting times.
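As a small numerical illustration of the hitting probabilities just mentioned (the four-state chain and its transition probabilities below are assumed for demonstration, not taken from this thesis), first-step analysis yields a linear system $(I - Q)h = r$, where $Q$ is the restriction of $P$ to the non-absorbed states and $r$ collects the one-step probabilities into the target state:

```python
import numpy as np

# Hypothetical 4-state chain: compute the probability h_i of hitting
# state 0 before state 3, starting from the interior states 1 and 2.
# States 0 and 3 are made absorbing for the analysis.
P = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.4, 0.3, 0.3, 0.0],
              [0.0, 0.5, 0.2, 0.3],
              [0.0, 0.0, 0.0, 1.0]])

interior = [1, 2]
Q = P[np.ix_(interior, interior)]   # one-step moves among interior states
r = P[interior, 0]                  # one-step jump straight into state 0

# First-step analysis: h = Q h + r, i.e. (I - Q) h = r.
h = np.linalg.solve(np.eye(len(interior)) - Q, r)
print(dict(zip(interior, h)))       # h_1 = 0.32/0.41, h_2 = 0.20/0.41
```

For finite systems this is a routine linear solve; the point of the thesis is precisely that for infinite state spaces such systems cannot be solved directly, which is where gcf representations enter.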
1.1.1 Invariant measures

If the Markov chain is irreducible and recurrent, the system $\mu P = \mu$ of linear equations has a solution which is unique up to constant multiples, and which can be chosen strictly positive. Any vector $\mu > 0$ with this property is said to be an invariant measure. If additionally $\sum_{i \in E} \mu_i = 1$, we refer to $\mu$ as the invariant distribution or stationary distribution. Such an invariant distribution exists in case of positive recurrence.

Under the conditions of irreducibility and recurrence, the Ergodic Theorem holds (we refer to appendix C for more details and more limit theorems for Markov chains): If $\mu$ is any invariant measure, and $f, g : E \to \mathbb{R}$ are functions such that the sums $\mu f := \sum_{i \in E} \mu_i f(i)$ and $\mu g$ converge absolutely with $\mu g \neq 0$, we have
$$\lim_{m \to \infty} \frac{\sum_{k=0}^{m-1} f(X_k)}{\sum_{k=0}^{m-1} g(X_k)} = \frac{\mu f}{\mu g}$$
almost surely. In particular, in case of positive recurrence, let $\pi$ denote the invariant distribution. If $\pi f$ converges, we have
$$\lim_{m \to \infty} \frac{1}{m} \sum_{k=0}^{m-1} f(X_k) = \pi f = \frac{\mu f}{\mu \mathbf{1}}$$
almost surely, where $\mathbf{1}$ denotes the constant function with value 1.

As an example, let $X_n$ denote the number of customers in a discrete-time queueing system at time $n$. In the long run, we may be interested in
• the average proportion of time in which the server is idle, that is,
$$\lim_{m \to \infty} \frac{1}{m} \sum_{k=0}^{m-1} \mathbf{1}_{\{0\}}(X_k) = \pi \mathbf{1}_{\{0\}} = \pi_0 = \frac{\mu \mathbf{1}_{\{0\}}}{\mu \mathbf{1}},$$
where $\mathbf{1}_A$ is the indicator function of the set $A$;
• the average number of customers in the system, that is,
$$\lim_{m \to \infty} \frac{1}{m} \sum_{k=0}^{m-1} X_k = \pi \, \mathrm{id} = \frac{\mu \, \mathrm{id}}{\mu \mathbf{1}},$$
where $\mathrm{id}$ is the identity.

These considerations show that in applications, it is highly important to be able to compute an invariant measure $\mu$ or the value of $\mu f$ for some function $f$. Unfortunately, in most applications, there is no explicit representation of $\mu$ or $\mu f$, and thus we have to use numerical methods. For applying these methods to Markov chains with (infinitely) large state spaces, we have to truncate the state space in an appropriate way. Without restriction, we assume $E = \mathbb{N}_0$.
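The Ergodic Theorem can be observed numerically. The following sketch (the 3-state chain is an assumed example, not from this thesis) computes the stationary distribution $\pi$ as the normalized left eigenvector of $P$ for eigenvalue 1, and compares $\pi f$ for $f = \mathrm{id}$ with the time average along one simulated trajectory:

```python
import numpy as np

# Assumed 3-state birth-death-type chain for illustration.
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])

# Stationary distribution: left eigenvector of P for eigenvalue 1,
# i.e. right eigenvector of P^T, normalized to sum 1.
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
pi = pi / pi.sum()                      # here pi = (0.25, 0.5, 0.25)

# Ergodic-theorem estimate of pi f for f = identity (mean state):
rng = np.random.default_rng(0)
m, x, total = 200_000, 0, 0.0
for _ in range(m):
    total += x                          # accumulate f(X_k) = X_k
    x = rng.choice(3, p=P[x])           # one transition of the chain
print(total / m, pi @ np.arange(3))     # time average vs pi f (both near 1.0)
```

The time average converges almost surely to $\pi f$; for a chain this small the agreement is visible already after a modest number of steps.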
Then it is reasonable to choose some $N$ and use the matrix $(p_{ij})_{i,j=0}^{N}$ for computing an approximation for $\mu$, say $\mu^{(N)}$. Finally, we approximate
$$\mu f \approx \mu^{(N)} f := \sum_{n=0}^{N} \mu_n^{(N)} f(n).$$

A further problem which arises in the context of computing invariant measures numerically is instability. In order to demonstrate this effect, we consider a Markov chain with state space $E = \mathbb{N}_0$ and tridiagonal transition probability matrix
$$P = \begin{pmatrix} p_{00} & p_{01} & & & \\ p_{10} & p_{11} & p_{12} & & \\ & p_{21} & p_{22} & p_{23} & \\ & & \ddots & \ddots & \ddots \end{pmatrix}.$$
Let $\mu$ be an invariant measure with $\mu_0 = 1$. Then $\mu P = \mu$ can be rewritten as
$$\mu_0 p_{00} + \mu_1 p_{10} = \mu_0, \tag{1.1.1}$$
$$\mu_{n-1} p_{n-1,n} + \mu_n p_{nn} + \mu_{n+1} p_{n+1,n} = \mu_n, \quad n \in \mathbb{N}. \tag{1.1.2}$$
Due to $p_{00} + p_{01} = 1$ and $p_{n,n-1} + p_{nn} + p_{n,n+1} = 1$ for $n \geq 1$, we obtain
$$\mu_n = \prod_{k=1}^{n} \frac{p_{k-1,k}}{p_{k,k-1}}, \quad n \in \mathbb{N}_0,$$
with an easy induction.