RESEARCH ARTICLE Marathon: An Open Source Software Library for the Analysis of Markov-Chain Monte Carlo Algorithms
Steffen Rechner1*, Annabell Berger1,2
1 Institute of Computer Science, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany, 2 German Centre of Integrative Biodiversity Research (idiv) Halle-Jena-Leipzig, Leipzig, Germany
Abstract
We present the software library marathon, which is designed to support the analysis of sampling algorithms that are based on the Markov-Chain Monte Carlo principle. The main application of this library is the computation of properties of so-called state graphs, which represent the structure of Markov chains. We demonstrate applications and the usefulness of marathon by investigating the quality of several bounding methods on four well-known Markov chains for sampling perfect matchings and bipartite graphs. In a set of experiments, we compute the total mixing time and several of its bounds for a large number of input instances. We find that the upper bound gained by the famous canonical path method is often several magnitudes larger than the total mixing time and deteriorates with growing input size. In contrast, the spectral bound is found to be a precise approximation of the total mixing time.

OPEN ACCESS

Citation: Rechner S, Berger A (2016) Marathon: An Open Source Software Library for the Analysis of Markov-Chain Monte Carlo Algorithms. PLoS ONE 11(1): e0147935. doi:10.1371/journal.pone.0147935

Editor: Hans A Kestler, University of Ulm, GERMANY

Received: August 19, 2015
Accepted: January 11, 2016
Published: January 29, 2016

Copyright: © 2016 Rechner, Berger. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability Statement: Source code is available at https://github.com/srechner/marathon.

Funding: The authors have no support or funding to report.

Competing Interests: The authors have declared that no competing interests exist.

Introduction

The task of random sampling is to return a randomly selected object from a typically large set of objects according to a specified probability distribution. Such tasks often arise in practical applications like network analysis, where properties of a certain network of interest are to be compared with those of a random null model network [1, 2]. Another application is approximate counting of combinatorial objects. While this is typically a hard problem, the number of solutions of a self-reducible problem can be approximated in polynomial time using randomly sampled objects [3]. We consider sampling methods which follow the so-called Markov-Chain Monte Carlo (MCMC) approach. An MCMC algorithm can be considered as a random walk on a directed graph whose vertices correspond to the set of objects to be sampled from. Starting at an arbitrary vertex, we go to a randomly chosen adjacent vertex and continue our random walk from this vertex; after some time, we stop the random walk. The object representing the final vertex is returned as a random sample. An overview of the surprisingly versatile applications of the MCMC approach was given by Diaconis [4].
PLOS ONE | DOI:10.1371/journal.pone.0147935 January 29, 2016 1/22 Marathon
Motivation

In general, an infinite number of steps will lead to a truly random sample. But what happens if we stop the random walk after a finite number of steps? The number of steps that are necessary to sample from a probability distribution which is close to the desired distribution is known as the total mixing time of a Markov chain and is of central interest for the applicability of an MCMC algorithm. Several methods are able to gain upper bounds on the total mixing time, such as the canonical path method; Sinclair [5] gives an excellent overview of the different bounding techniques. When such a bounding method is applied successfully to a Markov chain, the resulting upper bound is most often a high-degree polynomial in the size of the input. This is far too large to be applicable in practice, where more than a linear number of steps is infeasible for large input sizes. In fact, software tools like mfinder [1], which can be used for motif search in large networks, use an MCMC sampling approach with a linear number of steps as the default for the generation of null models. This leads to the problem that the sampling result might not be as random as desired, resulting in non-optimal or even incorrect behaviour of the application.

On the other hand, it may be the case that the upper bounds on the total mixing time are just too pessimistic. From a purely theoretical perspective it is often already a breakthrough to establish that a given Markov chain is rapidly mixing, that is, to establish a polynomial mixing time bound. Since the bounding techniques in Markov chain analysis are often fairly general and worst-case instances in terms of total mixing time are not known, it is not clear whether the upper bounds gained by such methods are tight for some worst-case instance. However, for practical applicability, it is of eminent importance to find bounds that are as sharp as possible.
Up to now, there is very little knowledge about the gap between the established upper bounds and tight bounds for actual worst-case instances. Most researchers in this field probably suspect that a considerable gap exists, but for specific Markov chains it is in general completely open how many orders of magnitude this gap spans. For this reason, we believe that the true total mixing time might be much lower than what the theoretical methods prove. Therefore, our working questions are: a) Is the total mixing time really as large as the bounding methods suggest, or is it possible that the bounding methods are just not precise enough to capture the real total mixing time? b) Which bounding method has the best potential and could lead to better results when further information about the structure of state graphs is given?
Contribution

In trying to answer these questions, we developed the C++ library marathon, which has been designed to compute several structural properties of Markov chains and their corresponding state graphs. The library offers the following features:

• Construction of state graphs from arbitrary input instances for a set of user-definable transition rules.
• Computation of structural properties of state graphs.
• Computation of the total mixing time for arbitrary ε.
• Computation of the congestion bound gained by the canonical path method with an arbitrary, user-definable path construction scheme.
• Computation of the upper and lower spectral bound.
• Computation of network properties like diameter and average path length.
We built this library so that algorithms for network analysis can easily be added. We expect that our library can be helpful both for theoretical scientists and for practitioners who implement MCMC algorithms and have to choose an appropriate number of steps.

As a demonstration of the kind of research that can be done with the help of marathon, we analysed structural properties and investigated the quality of the total mixing time bounds on the example of four well-known Markov chains. From our experiments, we gained several insights.

• We found that the congestion bound is multiple times larger than the corresponding total mixing time for almost all instances. In addition, the quality of the congestion bound deteriorates with growing input size. This indicates that the congestion bound is a bad approximation of the total mixing time and, since this bound is often used for theoretical analysis, that the high-degree polynomial bounds from theory may be too pessimistic.
• In contrast, the upper spectral bound is close to the total mixing time in all observations. Even though its quality also deteriorates with growing instance size, the spectral bound stays close to the total mixing time much longer than the congestion bound.
• The lower spectral bound can be used as a very good approximation for all input instances we investigated in this work. The data suggests an almost linear relationship between the lower spectral bound and the total mixing time.
• We found several new structural insights about the Markov chains we investigated. For example, the second largest eigenvalue in magnitude is almost always positive, and the total mixing time of instances of the same input size can be very diverse, depending on the vertex degrees in the state graphs. This shows that the experimental approach provided by our library may lead to structural insights that can be useful for developing more precise bounding techniques.
From our experiments we conclude that the canonical path method will not lead to tight bounds on the total mixing time, even when further information about a state graph's structure is included. Instead, developing new methods based on the spectral gap seems more promising.
Structure

The remainder of this article is structured as follows: In the next section we give a brief introduction to the theory of Markov chains, in particular the concepts of total mixing time and its bounds. Thereafter, we present the main features of the marathon library. In the second part of this article, we demonstrate the applicability of the library in a set of experiments while trying to answer the questions described above. The results of these experiments are shown and discussed in the final section.
Methods and Materials

An important tool for understanding MCMC-based sampling processes is the Markov chain. To make the following sections more understandable, we briefly introduce the most important concepts of the theory of Markov chains. For a less steep introduction to the topic, we recommend the lecture notes of Sinclair [6], the textbook by Levin, Peres and Wilmer [7], and the survey of Lovász [8].
Theory of Markov Chains A Markov chain can be seen as a random walk on a set O of combinatorial objects, the so-called states. Two states x, y 2 O can be connected via a transition arc (x, y)2C, when x can be
PLOS ONE | DOI:10.1371/journal.pone.0147935 January 29, 2016 3/22 Marathon
transformed into y via small local changes. This definition induces a so-called state graph Γ = (O, C), representing the objects and their adjacencies. We define for each pair of states x, y 2 O the so-called proposal probability κ: O × O ! [0, 1] and a weight function w : O ! Rþ. A step from x to y in a random walk is done with transition probability P(x, y)=κ(x, y) min(1, w(y)/
w(x)). The matrix P =(P(x, y)x, y 2 O) is called transition matrix. Algorithm 1 shows the classical random walk based sampling method. A random state b 2 O ðtÞ ’ is returned, sampled from a probability distribution pa which describes a state s probability after t steps when starting at state a 2 O. The fundamental theorem for Markov chains (see for ðtÞ example [7]) says that the distribution pa converges for t !1to the unique stationary distri- bution π with pðyÞ¼PwðyÞ for all y 2 O and initial states a 2 O, if the state graph Γ is non- wðxÞ x2O bipartite, connected and reversible with respect to π, i.e. π(x)P(x, y)=π(y)P(y, x) for each pair of states x, y 2 O. Markov chains which fulfil these properties are called ergodic. Algorithm 1 Random Walk
Input: State a 2 Ω, number t of steps. b Ω ðtÞ Output: State 2 , according to probability distribution pa . 1: x a 2: for i 1 to t do y x κ x y 3: Neighbour selection: Pick a neighbour of with probability ( , ). 4: : x y with probability 1; wðyÞ : Metropolis rule min wðxÞ 5: end for 6: return x
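Algorithm 1 can be sketched in a few lines of C++. The sketch below is our own illustration and not part of the marathon API: states are the integers 0..n−1, the state graph is given by adjacency lists, and the proposal κ(x, ·) is uniform over the neighbours of x, which is symmetric on a regular state graph, so the Metropolis rule yields the stationary distribution π(y) ∝ w(y).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

// Illustrative sketch of Algorithm 1 (Random Walk); function name and
// data layout are our own assumptions, not the marathon library's API.
// adj: adjacency lists of the state graph, w: state weights,
// a: start state, t: number of steps.
int randomWalk(const std::vector<std::vector<int>>& adj,
               const std::vector<double>& w,
               int a, int t, std::mt19937& rng) {
    std::uniform_real_distribution<double> coin(0.0, 1.0);
    int x = a;
    for (int i = 0; i < t; ++i) {
        // Neighbour selection: pick y with probability kappa(x, y) = 1/deg(x).
        std::uniform_int_distribution<int> pick(
            0, static_cast<int>(adj[x].size()) - 1);
        int y = adj[x][pick(rng)];
        // Metropolis rule: move to y with probability min(1, w(y)/w(x)).
        if (coin(rng) < std::min(1.0, w[y] / w[x]))
            x = y;
    }
    return x;
}
```

For instance, on a weighted 4-cycle with weights (1, 2, 3, 2), detailed balance gives the stationary distribution π = (1/8, 1/4, 3/8, 1/4), and the empirical distribution of the returned state approaches π after a few dozen steps.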
Total Mixing Time. The total variation distance measures the distance between two probability distributions μ and η:

    ||μ − η|| := (1/2) · Σ_{x ∈ Ω} |μ(x) − η(x)|        (1)
We define τ_a(ε) := min{ t ∈ ℕ : ||p_a^(t) − π|| ≤ ε } as the minimal number of steps a random walk has to take to reach a distribution which is ε-close to its stationary distribution when starting at state a ∈ Ω. The total mixing time τ(ε) of a Markov chain is defined as

    τ(ε) := max_{a ∈ Ω} τ_a(ε)        (2)
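To make Eqs (1) and (2) concrete, the following sketch computes the total mixing time of a small chain by brute force: for every start state a it iterates the distribution p_a^(t) = p_a^(t−1) P until the total variation distance to π drops below ε. This naive reference computation is our own illustration, not the algorithm used inside marathon; the linear scan over t is valid because the total variation distance to the stationary distribution never increases with t.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Total variation distance between two distributions, Eq (1).
double totalVariation(const std::vector<double>& mu,
                      const std::vector<double>& eta) {
    double d = 0.0;
    for (std::size_t i = 0; i < mu.size(); ++i)
        d += std::fabs(mu[i] - eta[i]);
    return d / 2.0;
}

// One step of the chain: p' = p * P.
std::vector<double> step(const std::vector<double>& p, const Matrix& P) {
    std::vector<double> q(p.size(), 0.0);
    for (std::size_t x = 0; x < p.size(); ++x)
        for (std::size_t y = 0; y < p.size(); ++y)
            q[y] += p[x] * P[x][y];
    return q;
}

// Brute-force total mixing time, Eq (2): the maximum over all start
// states a of the smallest t with ||p_a^(t) - pi|| <= eps.
int totalMixingTime(const Matrix& P, const std::vector<double>& pi,
                    double eps) {
    const std::size_t n = P.size();
    int tmax = 0;
    for (std::size_t a = 0; a < n; ++a) {
        std::vector<double> p(n, 0.0);
        p[a] = 1.0;  // start as a point mass at state a
        int t = 0;
        while (totalVariation(p, pi) > eps) {
            p = step(p, P);
            ++t;
        }
        tmax = std::max(tmax, t);
    }
    return tmax;
}
```

As a sanity check, for the lazy two-state chain P = ((3/4, 1/4), (1/4, 3/4)) with π = (1/2, 1/2), the distance after t steps is exactly (1/2)^(t+1), so the total mixing time for ε = 0.01 is 6.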
A Markov chain is known as rapidly mixing if τ(ε) can be bounded from above by a polynomial which depends on the input size n and the parameter ε⁻¹. The total mixing time can be bounded by several techniques. We briefly present two widely used bounding methods which we investigate in the remainder of this article.

Spectral Bound. Let λ₁ = 1 > λ₂ ≥ ... ≥ λ_|Ω| > −1 be the real eigenvalues of the transition matrix P and let λ_max be defined as λ_max = max{ |λ₂|, |λ_|Ω|| }. The total mixing time can be bounded by the lower and upper spectral bound [5], i.e.,
    τ(ε) ≤ (1 − λ_max)⁻¹ · ( ln(ε⁻¹) + ln(π_min⁻¹) )        (3)

    τ(ε) ≥ ( λ_max / (2 · (1 − λ_max)) ) · ln((2ε)⁻¹)        (4)
where π_min denotes the smallest component of π. The transition matrix P is not necessarily symmetric. However, it can be transformed into a symmetrical matrix: