The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18)

Counting Linear Extensions in Practice: MCMC versus Exponential Monte Carlo

Topi Talvitie, Dept. of Computer Science, University of Helsinki (topi.talvitie@helsinki.fi)
Kustaa Kangas, Dept. of Computer Science, Aalto University (juho-kustaa.kangas@aalto.fi)
Teppo Niinimäki, Dept. of Computer Science, Aalto University (teppo.niinimaki@aalto.fi)
Mikko Koivisto, Dept. of Computer Science, University of Helsinki (mikko.koivisto@helsinki.fi)

Abstract

Counting the linear extensions of a given partial order is a #P-complete problem that arises in numerous applications. For polynomial-time approximation, several Markov chain Monte Carlo schemes have been proposed; however, little is known of their efficiency in practice. This work presents an empirical evaluation of the state-of-the-art schemes and investigates a number of ideas to enhance their performance. In addition, we introduce a novel approximation scheme, adaptive relaxation Monte Carlo (ARMC), that leverages exact exponential-time counting algorithms. We show that approximate counting is feasible up to a few hundred elements on various classes of partial orders, and within this range ARMC typically outperforms the other schemes.

1 Introduction

A partially ordered set or poset is a set of elements coupled with a binary relation that defines the mutual order between some pairs of elements while leaving other pairs incomparable. A fundamental property of a poset is the set of its linear extensions, the possible ways to extend the poset into a linear order, where all pairs of elements are comparable. This paper concerns the problem of counting the linear extensions of a given poset, which arises in various applications such as sorting (Peczarski 2004), sequence analysis (Mannila and Meek 2000), convex rank tests (Morton et al. 2009), preference reasoning (Lukasiewicz, Martinez, and Simari 2014), partial order plans (Muise, Beck, and McIlraith 2016), and learning graphical models (Wallace, Korb, and Dai 1996; Niinimäki, Parviainen, and Koivisto 2016).

Computing the number of linear extensions of a given poset is #P-complete (Brightwell and Winkler 1991) and thus likely to be intractable in the general case. The fastest known algorithms are based on dynamic programming and require exponential time and space (De Loof, De Meyer, and De Baets 2006; Kangas et al. 2016). Polynomial-time algorithms are known for certain notable special cases, such as when the poset is series-parallel (Möhring 1989), when its Hasse diagram is a tree (Atkinson 1990), or when a certain structural parameter is bounded, such as the width (De Loof, De Meyer, and De Baets 2006), the treewidth of the Hasse diagram (Kangas et al. 2016), or the treewidth of the incomparability graph (Eiben et al. 2016).

If an approximation with given probability is sufficient, the problem admits a polynomial-time solution for all posets using Markov chain Monte Carlo (MCMC) methods. The first fully polynomial-time approximation scheme of this kind was based on algorithms for approximating the volumes of convex bodies (Dyer, Frieze, and Kannan 1991). Later improvements were based on rapidly mixing Markov chains in the set of linear extensions (Karzanov and Khachiyan 1991; Bubley and Dyer 1999) combined with Monte Carlo counting schemes (Brightwell and Winkler 1991; Banks et al. 2017). The mixing times of the Markov chains were recently studied empirically by Talvitie, Niinimäki, and Koivisto (2017) and found to be often much better than the worst-case bounds. However, when approximation guarantees are desired, the degrees of the polynomial worst-case bounds can be prohibitively high. It is thus currently unclear to what extent the MCMC-based schemes are usable in practical applications, especially in comparison to the recent improvements in exact counting.

In this work we investigate the feasibility of approximate counting of linear extensions in practice by presenting what is, to our knowledge, the first empirical evaluation of the state-of-the-art approximation schemes. In addition to this empirical review, we present two algorithmic contributions. First, we propose practical amendments to improve the performance of certain MCMC-based schemes. Second, we extend the exact exponential-time algorithms to approximate counting via a suitable relaxation of the problem, combined with a Monte Carlo approach. As a result, we obtain a novel approximation scheme, adaptive relaxation Monte Carlo (ARMC). Through the empirical evaluation, we show that approximate counting is feasible on posets up to a few hundred elements. We also demonstrate that within this feasible region our ARMC scheme vastly outperforms the other approximation schemes.

We introduce some central concepts of partially ordered sets in Section 2. In Section 3 we review the state-of-the-art MCMC schemes and present our practical improvements. In Section 4 we review the exact exponential-time algorithms and present our ARMC scheme that leverages them for approximate counting. Section 5 presents an empirical evaluation of the schemes described in Sections 3 and 4.

Copyright © 2018, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

2 Preliminaries

A partial order ≺ is an irreflexive and transitive binary relation. A partially ordered set or poset is a pair P = (A, ≺), where A is a set and ≺ is a partial order on A. A pair (x, y) ∈ ≺ is called an (ordering) constraint and denoted x ≺ y for short; we say that x is a predecessor of y and y is a successor of x. A pair of elements x, y ∈ A are comparable if x ≺ y or y ≺ x; otherwise they are incomparable. If all pairs of elements are comparable, we say that P is a linear order. We visualize a poset as a directed graph with A as the vertex set and ≺ as the arc set (Fig. 1). For all x ≺ y ≺ z, the arc x → z is implied by transitivity and can be omitted for clarity, producing the cover graph of P.

Figure 1: An eight-element poset (left) and its cover graph.

A poset P' = (A, ≺') is an extension of P = (A, ≺) if x ≺ y implies x ≺' y for all x, y ∈ A; dually, in this case we say that P is a relaxation of P'. An extension that is also a linear order is called a linear extension. We denote by L(P) the set of linear extensions and by ℓ(P) = |L(P)| the number of linear extensions of P. When there is no ambiguity about P, we may extend this notation to any subset B ⊆ A, denoting ℓ(B) := ℓ(B, ≺_B) for short, where ≺_B is the restriction of ≺ to the set B.

Throughout the paper, we consider the problem of producing an (ε, δ)-approximation of ℓ(P): For given ε, δ > 0 and poset P, produce an estimate ℓ̄ of ℓ(P) such that

    Pr( (1 + ε)^{-1} < ℓ̄ / ℓ(P) < 1 + ε ) ≥ 1 − δ.

The approximation schemes considered yield slight variations of this error bound, but they can be parameterized to satisfy the above problem statement.

Reduction Rules

Let P = (A, ≺) be a poset. We attribute to folklore the following equalities that reduce the problem of computing ℓ(P) to counting linear extensions in subsets of A.

First, if {A1, A2} is a partition of A such that no element in A1 is comparable with an element in A2, then

    ℓ(P) = ℓ(A1) ℓ(A2) (|A| choose |A1|).    (1)

Second, if {A1, A2} is a partition of A such that a1 ≺ a2 for all a1 ∈ A1 and a2 ∈ A2, then

    ℓ(P) = ℓ(A1) ℓ(A2).    (2)

Third, if A is non-empty, then, denoting by min P the set of elements of P with no predecessors, we have that

    ℓ(P) = Σ_{x ∈ min P} ℓ(A \ {x}).    (3)

3 Markov Chain Monte Carlo

The state-of-the-art polynomial-time randomized schemes for approximate counting of linear extensions have been designed from the viewpoint of asymptotic worst-case time complexity. The traditional approach was to build a telescopic product estimator based on self-reducibility and rapidly mixing Markov chains (Brightwell and Winkler 1991; Huber 2006; Bubley and Dyer 1999). A more recent approach, the Tootsie Pop algorithm, is based on random sampling from an embedding of the set of linear extensions into a continuous space (Banks et al. 2017; Huber and Schott 2010). We next review these approaches. For the former approach, we also present a number of ideas to enhance the existing schemes when the interest is in practical performance.

The Telescopic Product Estimator

Brightwell and Winkler (1991) presented the following reduction to (approximately) uniform sampling. Construct a sequence of posets (P_i)_{i=0}^{k}, starting from P_0 = P and ending in a linearly ordered set P_k, such that each P_i is obtained from the previous poset P_{i−1} by adding an appropriately chosen ordering constraint a_i ≺ b_i and those that follow by transitivity. Now, write ℓ(P)^{−1} as a telescopic product ∏_{i=1}^{k} μ_i, where μ_i = ℓ(P_i)/ℓ(P_{i−1}). For each factor μ_i, use the zero–one Monte Carlo estimate μ̄_i obtained by drawing s independent samples from L(P_{i−1}) (almost) uniformly at random and computing the proportion of samples that fall in L(P_i). This yields an estimator μ̄ = ∏_{i=1}^{k} μ̄_i whose expected value is ℓ(P)^{−1}. The smaller the factors μ_i are, the higher the number of samples s per factor has to be to estimate them up to sufficient relative error. Supposing the factors μ_i are bounded from below by a positive constant, putting s = O(ε^{−2} k log δ^{−1}) guarantees that 1/μ̄ is an (ε, δ/2)-approximation of ℓ(P) (Talvitie, Niinimäki, and Koivisto 2017, proof of Thm. 1).

Construction by Sorting. For the reduction to be efficient, Brightwell and Winkler (1991) choose the pairs of elements (a_i, b_i) such that k is small while ensuring that the factors μ_i are bounded from below by a positive constant. This is achieved by simulating any O(n log n)-time comparison sorting algorithm on the set of elements A, iteratively constructing the sequence (P_i)_{i=0}^{k} starting from the original poset P. For each comparison between a pair of elements (x, y) made by the algorithm, we look up the ordering from the currently last poset P_i in the sequence. If the elements are incomparable, then we use the Monte Carlo method with a sufficient number of samples to find out which order between the elements is more probable in the linear extensions of P_i, and add P_i augmented with that ordering constraint to the end of the sequence, i.e., (a_{i+1}, b_{i+1}) is either (x, y) or (y, x). An analysis shows that in total O(ε^{−2} n² log² n log δ^{−1}) samples from the uniform distribution of linear extensions (of the varying posets) suffice for an (ε, δ)-approximation of ℓ(P) (Talvitie, Niinimäki, and Koivisto 2017); if the samples are from an almost uniform
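To make the notation concrete, here is a small brute-force counter that also checks reduction rule (1) on a toy poset. This is our illustrative sketch, not the paper's implementation: the element names and the helper are ours, and the approach is workable only for tiny posets.

```python
from itertools import permutations
from math import comb

def count_linear_extensions(elements, less):
    """Count linear extensions by brute force: an order is a linear extension
    iff no later element is constrained to precede an earlier one.
    `less` is a transitively closed set of pairs (x, y) meaning x ≺ y."""
    return sum(
        all((order[j], order[i]) not in less
            for i in range(len(order)) for j in range(i + 1, len(order)))
        for order in permutations(elements)
    )

A = ['a', 'b', 'c', 'd']

# a ≺ c, b ≺ c, c ≺ d (with the transitive arcs a ≺ d and b ≺ d):
# c must follow both a and b and precede d, so only 'abcd' and 'bacd' remain.
prec = {('a', 'c'), ('b', 'c'), ('c', 'd'), ('a', 'd'), ('b', 'd')}
print(count_linear_extensions(A, prec))  # 2

# Reduction rule (1): two unrelated chains a ≺ b and c ≺ d give
# ℓ(P) = ℓ({a,b}) · ℓ({c,d}) · (4 choose 2) = 1 · 1 · 6.
prec2 = {('a', 'b'), ('c', 'd')}
print(count_linear_extensions(A, prec2))  # 6
```

Such a counter is useful as a ground-truth oracle when testing the approximation schemes discussed below on small instances.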

distribution, then the bound becomes slightly larger. Using Huber's (2006) perfect sampler, which generates a random linear extension in O(n³ log n) expected time, the whole algorithm runs in O(ε^{−2} n⁵ log³ n log δ^{−1}) expected time.

Structure Decomposition. We next propose a way to improve the performance of the telescopic product scheme. The idea is to take into account the special structure that appears in the posets in the tail of the sequence (P_i)_{i=0}^{k}, provided that we use Quicksort as the sorting algorithm: In the first at most m = O(n) operations, Quicksort chooses a pivot element p ∈ A and compares all other elements to it. Thus after m operations, p divides the rest of the elements into its predecessors and successors, and we can use the reduction rule (2) to continue the algorithm in the set of predecessors and the set of successors separately. This idea repeats recursively, and using a variant of Quicksort that always uses the median element as the pivot, the structure is divided in half every O(n) comparisons. The time complexity of obtaining a single sample after h halvings is O(2^h (2^{−h} n)³ log(2^{−h} n)), which decays geometrically as h increases. Thus the time complexity of the O(n log n) comparisons is dominated by the first O(n) comparisons, which saves a factor of log n in the running time and yields the best worst-case bound we are currently aware of:

Proposition 1. There is a randomized algorithm that, given ε, δ > 0 and a poset on n elements, computes an (ε, δ)-approximation of the number of linear extensions of the poset in O(ε^{−2} n⁵ log² n log δ^{−1}) expected time.

In practice, we can often significantly expedite the algorithm by decomposing the poset into disjoint parts as soon as it is possible using reduction rules (1) and (2). For example, in the (extreme) case of series–parallel posets the reduction rules are enough to completely decompose the poset without adding new ordering constraints. To avoid the O(n) comparisons required to find the median element, we choose the pivot for Quicksort heuristically based on the current poset instead. The heuristic chooses the element with the largest minimum of the number of predecessors and the number of successors in the current poset. This choice aims to reduce the number of steps required to decompose the poset into two parts. Theoretically, choosing the pivot this way may cause Quicksort to reach its worst case of O(n²) comparisons, causing a near-linear slowdown in the algorithm, but we found this choice of pivot to work well in practice. An example of the algorithm is shown in Fig. 2.

Figure 2: An illustration of structure decomposition, with the proposed improvements. The algorithm chooses the pivot d and compares it to the only remaining incomparable element c. After adding the ordering constraint c ≺ d (probability μ_1 = 15/29 ≥ 1/2), the poset is split into three components using the reduction rule (2). One of the resulting components, {e, f, g, h}, is further split into two components after adding the ordering constraint e ≺ f (probability μ_2 ≥ 3/5). The remaining subproblems are solved by applying the reduction rules (1) and (2). The number of linear extensions is μ_1^{−1} μ_2^{−1} (3 choose 2)(3 choose 1) = (29/15) × (5/3) × 3 × 3 = 29.

Sampling Linear Extensions. Early proposals of the telescopic product scheme relied on sampling from an almost uniform distribution of linear extensions by simulating a suitable Markov chain for a given number of iterations. The number of iterations required depends linearly on the mixing time of the Markov chain, for which an upper bound has to be known in advance. For the classical chain of Karzanov and Khachiyan (1991) the mixing time is known to be Θ(n³ log n) (Bubley and Dyer 1999; Wilson 2004), which remains the best known worst-case bound. Huber's (2006) perfect sampler runs in O(n³ log n) expected time and is the fastest known algorithm for sampling exactly from the uniform distribution of linear extensions. It currently yields the best asymptotic running time bound for approximate counting based on the telescopic product scheme. However, Huber's (2014) later perfect sampler, which is based on Gibbs sampling, was recently shown to run significantly faster on various instances of practical size (Talvitie, Niinimäki, and Koivisto 2017), even if the sampler is currently not known to run in polynomial time in the worst case.

The Tootsie Pop Algorithm

Huber and Schott (2010) presented a randomized approximation scheme, called the Tootsie Pop algorithm, for estimating the measure μ(A') of a given set A' in a continuous space under certain conditions. The algorithm estimates the ratio μ(A')/μ(A''), where A'' is another set whose measure is assumed to be easy to compute, by considering a family of intermediate sets {A_β | β ∈ R} such that

• A_{β1} ⊂ A_{β2} for all β1 < β2,
• the function β → μ(A_β) is continuous,
• A_{β'} = A' and A_{β''} = A'' for some β'' > β',
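The telescopic product with the construction-by-sorting idea can be sketched end to end on a toy instance. In the following sketch (ours, under stated assumptions), a brute-force enumerator stands in for the perfect samplers discussed above, which is feasible only for tiny posets; at each step we pick an incomparable pair, keep the more probable order so that μ_i ≥ 1/2, and add the constraint with its transitive consequences.

```python
import random
from itertools import permutations

def linear_extensions(elements, less):
    """All linear extensions of a tiny poset; drawing uniformly from this
    list stands in for a perfect sampler (our simplification)."""
    return [o for o in permutations(elements)
            if all((o[j], o[i]) not in less
                   for i in range(len(o)) for j in range(i + 1, len(o)))]

def transitive_closure(rel):
    """Naive closure: add (a, d) whenever (a, b) and (b, d) are present."""
    rel = set(rel)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(rel):
            for (c, d) in list(rel):
                if b == c and (a, d) not in rel:
                    rel.add((a, d))
                    changed = True
    return rel

def telescopic_estimate(elements, less, s=3000, seed=0):
    """Estimate ℓ(P): repeatedly pick an incomparable pair, estimate the
    factor μ_i with s samples, and add the more probable ordering.
    The product of the μ̄_i estimates 1/ℓ(P)."""
    rng = random.Random(seed)
    rel = transitive_closure(less)
    inv = 1.0
    while True:
        pair = next(((x, y) for x in elements for y in elements if x != y
                     and (x, y) not in rel and (y, x) not in rel), None)
        if pair is None:
            return 1.0 / inv
        exts = linear_extensions(elements, rel)  # stand-in perfect sampler
        x, y = pair
        frac = sum(o.index(x) < o.index(y)
                   for o in (rng.choice(exts) for _ in range(s))) / s
        if frac < 0.5:                 # keep the more probable ordering
            x, y, frac = y, x, 1.0 - frac
        inv *= frac
        rel = transitive_closure(rel | {(x, y)})

# A four-element antichain has 4! = 24 linear extensions;
# the estimate should land close to 24.
print(telescopic_estimate(['a', 'b', 'c', 'd'], set()))
```

Note that regardless of which constraints are chosen, the exact factors multiply to 1/ℓ(P); the heuristics above only control the variance of the estimate.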

• there exists an algorithm that, for a given β ∈ R, draws from the uniform distribution on A_β.

The algorithm constructs a random sequence (β_i)_{i=0}^{∞} iteratively by setting β_0 = β'' and β_{i+1} = inf{β ∈ R | X_i ∈ A_β}, where X_i is drawn uniformly at random from A_{β_i}. Now the random variable Z = min{i ≥ 0 | β_{i+1} ≤ β'} is Poisson distributed with expected value log(μ(A'')/μ(A')). The expected value is estimated by averaging multiple independent realizations of Z.

Banks et al. (2017) apply the Tootsie Pop algorithm to count linear extensions by embedding the set of linear extensions into a continuous space. Roughly speaking, they define the set A_β as a set of pairs (P', [0, w')), where P' is a linear extension and w' is a distance of P' to an arbitrarily chosen fixed linear extension; the distance is modulated by the parameter β. They also give an algorithm that, for any fixed β, samples exactly from the uniform distribution on the set A_β in O(n³ log n) average time, and prove that it yields an (ε, δ)-approximation of ℓ(P) in O(ε^{−2} n³ log² ℓ(P) log n log δ^{−1}) average time.

4 Adaptive Relaxation Monte Carlo

In this section we present a novel scheme for approximate counting of linear extensions, which we dub adaptive relaxation Monte Carlo (ARMC). This scheme leverages an exact dynamic programming (DP) approach for counting linear extensions (De Loof, De Meyer, and De Baets 2006), which can be faster than MCMC schemes on posets of moderate size, despite its exponential complexity. We begin by briefly reviewing the DP algorithm and then describe how the ARMC scheme uses it in approximate counting.

Exact Dynamic Programming

Given a poset P = (A, ≺), the DP algorithm computes ℓ(P) by applying the reduction rule (3) recursively. This computation involves solving the subproblem ℓ(U) exactly for each upset U of P, a set such that x ∈ U and x ≺ y imply y ∈ U. By caching the subproblem solutions the algorithm can be made to run in O(|𝒰| w) time, where 𝒰 is the set of upsets and w is the width of P. Dilworth's theorem (1950) notably implies the bound |𝒰| = O(n^w) on the number of upsets, though in the worst case we still have |𝒰| = 2^n. Kangas et al. (2016) note further that 𝒰 can often be pruned with the reduction rule (1): Whenever through application of rule (3) the cover graph of a subproblem becomes disconnected, rule (1) can be applied to solve the problem independently for each connected component. This modification brings the worst-case running time to O(2^n n²) due to finding the connected components, but may in return reduce the number of subproblems exponentially.

Importantly, once the DP algorithm has finished, it also enables straightforward sampling of linear extensions from the uniform distribution: If A is connected, the first element x of the linear extension is drawn from min P with probability p(x) = ℓ(A \ {x})/ℓ(A), and the order on the remaining elements is sampled by recursing on A \ {x}. If A is not connected, a linear extension is sampled recursively for each connected component and the sampled extensions are then interleaved uniformly at random. In this manner a single sample can be drawn in O(n²) time.

Monte Carlo via Exact Counting

We now propose a way to apply the DP algorithm to approximate counting. Given a poset P, the basic idea is to relax P by removing ordering constraints until the counting problem becomes feasible for the DP algorithm. We then estimate the error introduced by the removal of constraints using Monte Carlo, similarly to Section 3, where constraints were added instead. Specifically, let R be a relaxation of P, and let μ = ℓ(P)/ℓ(R) be the probability that a linear order sampled uniformly from L(R) is also in L(P). To approximate ℓ(P), we first use the DP algorithm to compute ℓ(R) and then sample m linear orders from L(R) to compute the estimate μ̄ = (1/m) Σ_{i=1}^{m} X_i, where X_i = 1 if the ith sample is in L(P) and X_i = 0 otherwise. As m grows, μ̄ concentrates around μ and thus μ̄ ℓ(R) concentrates around ℓ(P). Again, the smaller μ is, the more samples we need for an accurate approximation. The efficiency of this scheme thus depends crucially on how the relaxation R is chosen, as it determines both μ and the time required to run the DP algorithm.

Before discussing the choice of R in detail, consider the number of samples required to reach the desired approximation guarantees. Chebyshev's inequality yields that

    Pr(1 − ε ≤ μ̄/μ ≤ 1 + ε) ≥ 1 − δ    (4)

if we take m to be at least (1 − μ)/(ε² δ μ). However, this quantity depends on the unknown μ and therefore cannot be used directly as a stopping criterion. We use instead an algorithm by Dagum et al. (2000), which adaptively determines when to stop sampling, based on the samples it has seen so far. For any random variable X distributed in [0, 1] with expected value μ > 0, this algorithm requests a number of samples and eventually outputs an approximation μ̄ with the guarantee (4). The number of samples requested is guaranteed to be optimal within a constant factor among all algorithms that achieve the same error bound. In our case we may use a simpler precursor of the general algorithm, which additionally requires that X be a Bernoulli variable. This algorithm stops as soon as it has seen m_{ε,δ} samples that fall in L(P), where m_{ε,δ} depends only on ε and δ. Under the practical assumption that both ε and δ are bounded from above by a constant, we have that m_{ε,δ} = O(ε^{−2} log δ^{−1}).

Partition Relaxations

We now address the crucial step of choosing the relaxation R. The choice of R is effectively a tradeoff between the time spent running the DP algorithm (DP phase) and the time spent drawing the samples (sampling phase). If R is taken to be close to P, then μ is large and the expected number of samples required until m_{ε,δ} samples hit L(P) is small. On the other hand, the more we allow R to deviate from P, the more constraints we can remove to speed up the DP phase. Since finding an optimal balance between the two phases is presumably intractable, we propose an adaptive heuristic strategy for selecting the relaxation.
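The DP recursion and the induced uniform sampler can be sketched compactly. The following is our illustration of the idea on small posets; it memoizes over the subsets reachable by deleting minimal elements (exactly the upsets), and omits the connected-component pruning via rule (1) for brevity.

```python
import random
from functools import lru_cache

def make_counter(elements, less):
    """Memoized DP over upsets via rule (3): ℓ(U) = Σ_{x minimal} ℓ(U \\ {x}).
    `less` is a transitively closed set of pairs (x, y) meaning x ≺ y."""
    pred = {x: {a for (a, b) in less if b == x} for x in elements}

    @lru_cache(maxsize=None)
    def count(remaining):
        if not remaining:
            return 1
        rem = set(remaining)
        return sum(count(frozenset(rem - {x}))
                   for x in rem if not (pred[x] & rem))  # minimal elements

    return count, pred

def sample_extension(elements, count, pred, rng):
    """Uniform linear extension from the cached counts: the first element x
    is chosen with probability ℓ(A \\ {x}) / ℓ(A), then recurse."""
    rem, order = set(elements), []
    while rem:
        minimal = [x for x in rem if not (pred[x] & rem)]
        weights = [count(frozenset(rem - {x})) for x in minimal]
        x = rng.choices(minimal, weights=weights)[0]
        order.append(x)
        rem.remove(x)
    return order

less = {('a', 'c'), ('b', 'c'), ('c', 'd'), ('a', 'd'), ('b', 'd')}
count, pred = make_counter('abcd', less)
print(count(frozenset('abcd')))  # 2
ext = sample_extension('abcd', count, pred, random.Random(0))
assert all(ext.index(x) < ext.index(y) for (x, y) in less)
```

Once the cache is warm, each sample only performs the weighted draws, matching the O(n²)-per-sample behavior described above.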

Figure 3: Two relaxations of the poset in Fig. 1, obtained by partitioning the elements into two sets of size k = 4 (shown in black and white) and removing all constraints between the sets (dashed arcs). The original poset has 88 linear extensions while the relaxations have 140 and 1260, respectively.

Figure 4: The time spent in the DP phase and the sampling phase as a function of k on two randomly generated posets on 64 elements, a bipartite poset (left) and a poset of average degree 5 (right). The lines show the median and the quartiles over 99 runs. The variance is due to hillclimbing starting at random partitions and converging to different relaxations.

First, consider choosing R so as to restrict the time spent in the DP phase. A natural way to accomplish this is to remove a subset of ordering constraints such that the cover graph is split into multiple connected components. Specifically, we partition A into sets A_1, A_2, ..., A_{⌈n/k⌉} of size k (the last set may be smaller) and then take R = (A, ≺'), where x ≺' y if and only if x ≺ y and x and y are in the same set A_i (Fig. 3). To compute ℓ(R), the DP algorithm may now immediately apply the reduction rule (1) and thus solves the problem in O(2^k k²) time for each of the ⌈n/k⌉ components. The parameter k controls the tradeoff: increasing k causes the DP phase to take longer, but typically brings R closer to P and thus saves time in the sampling phase.

Assume for now that we are able to choose a good value for k, and consider the problem of choosing a partition so that μ is maximized. This is equivalent to minimizing ℓ(R), which may vary by orders of magnitude between different R even for a fixed value of k, as illustrated in Fig. 3. Aiming to minimize ℓ(R), we employ a search in the space of possible partitions. Since computing ℓ(R) is still relatively expensive, we require a more computationally feasible function for comparing candidate partitions. We choose the number of ordering constraints removed in the relaxation as the function to minimize, since removing a constraint from a poset always increases the number of linear extensions. While the number of constraints induced by a partition is easy to compute, finding an optimal partition is presumably hard; we therefore use hillclimbing that starts from a random partition (cf. Fig. 4).

Figure 4 illustrates the tradeoff between the two phases on two posets used in the experiments in Section 5. We rely on the empirical observation that a good choice of k for minimizing the total running time tends to be where the two phases take roughly the same amount of time. To find such a point, we propose the following procedure that starts at a low value of k and iteratively increases it until the phases are in balance:

1. Choose a partition relaxation R with the parameter k.
2. Run the DP algorithm on R to obtain ℓ(R) and to enable efficient sampling from L(R).
3. Try running the sampling phase for R tentatively, but stop sampling after time αT_2, where T_2 is the time spent in step 2 and α ≤ 1 is a constant.
4. Let t_3 = m_{ε,δ} T_3 / m be the estimated time to run the sampling phase in full, where T_3 is the time spent sampling in step 3, and m is the number of samples that fell in L(P).
5. Let t_2 = βT_2 be the estimated time to run the DP algorithm for a larger k, where β > 1 is a constant.
6. If t_3 is at most t_2, complete the sampling phase with the current relaxation and output the estimate μ̄ ℓ(R); otherwise, increase k (controlled by the constant κ below) and return to step 1.
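A minimal sketch of the partition relaxation and the resulting estimate μ̄ ℓ(R) follows. This is our illustration on a toy poset: per-block extensions are enumerated by brute force instead of the DP algorithm, ℓ(R) is obtained from rule (1), and sampling from L(R) interleaves per-block extensions uniformly at random.

```python
import random
from itertools import permutations
from math import factorial

def extensions(subset, less):
    """All linear extensions of a small block (brute force)."""
    return [o for o in permutations(subset)
            if all((o[j], o[i]) not in less
                   for i in range(len(o)) for j in range(i + 1, len(o)))]

def relax_and_estimate(elements, less, k, m=4000, seed=0):
    """Estimate ℓ(P) ≈ μ̄ · ℓ(R) for the partition relaxation with block
    size k: keep only within-block constraints, count ℓ(R) exactly by
    rule (1), and estimate μ by sampling from L(R)."""
    rng = random.Random(seed)
    blocks = [elements[i:i + k] for i in range(0, len(elements), k)]
    within = {(x, y) for (x, y) in less
              if any(x in b and y in b for b in blocks)}
    exts = [extensions(b, within) for b in blocks]

    # ℓ(R) = ∏_i ℓ(A_i) × (number of interleavings of the blocks)
    ell_R = factorial(len(elements))
    for b in blocks:
        ell_R //= factorial(len(b))
    for e in exts:
        ell_R *= len(e)

    hits = 0
    for _ in range(m):
        labels = [i for i, b in enumerate(blocks) for _ in b]
        rng.shuffle(labels)                      # uniform interleaving
        picked = [list(rng.choice(e)) for e in exts]
        order = [picked[i].pop(0) for i in labels]
        pos = {x: j for j, x in enumerate(order)}
        hits += all(pos[x] < pos[y] for (x, y) in less)
    return hits / m * ell_R

less = {('a', 'c'), ('b', 'c'), ('c', 'd'), ('a', 'd'), ('b', 'd')}
# k = 2 keeps only c ≺ d, so ℓ(R) = 12 and μ = ℓ(P)/ℓ(R) = 2/12.
print(relax_and_estimate(['a', 'b', 'c', 'd'], less, k=2))
```

On this toy instance the estimate should concentrate near ℓ(P) = 2; hillclimbing over partitions, as described above, would then try to lower ℓ(R) further for a fixed k.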

In the experiments we set α = 0.1, β = 10, and κ = 5, observed to perform well on average. As a practical optimization, we also keep the previous relaxation found in step 2 if it has fewer linear extensions than the new relaxation. Further, if k becomes so large that the DP algorithm starts running out of memory, we proceed directly to the sampling phase with the best relaxation found so far.

While the adaptive strategy enjoys empirical success, it can perform poorly in certain extreme cases. In particular, it fails to quickly detect cases where relaxations offer no benefit over the exact DP algorithm (posets of very low width, for which k = n is thus the best choice). Very skewed structures may also cause t_2 to be underestimated and thus k to be increased too much. Depending on the application, these flaws can be partially remedied by employing hybrid solutions that apply specialized algorithms for edge cases and ARMC otherwise.

5 Experiments

We empirically evaluated the following MCMC schemes and the ARMC scheme for counting linear extensions described in the previous sections:

Telescopic Product. The basic scheme due to Brightwell and Winkler (1991), revisited by Talvitie et al. (2017).

Decomposition Telescopic Product. Like the Telescopic Product scheme, but with structure decomposition.

Decomposition Telescopic Product w/ Gibbs. Like the Decomposition Telescopic Product scheme, but replacing Huber's (2006) O(n³ log n) average time linear extension sampler by Huber's (2014) newer Gibbs sampler.

Tootsie Pop. The scheme due to Banks et al. (2017).

Exact Dynamic Programming. The algorithm obtained by combining the reduction rules (3) and (1), as presented by Kangas et al. (2016).

Adaptive Relaxation Monte Carlo. The scheme presented in Section 4.

We evaluated the schemes on instances of different sizes generated from the following classes of posets:

• AvgDeg(k) for k ∈ {3, 5}: The poset is generated as the transitive closure of a directed acyclic graph on the vertices {1, 2, ..., n} with expected average degree k, generated by adding each edge (i, j), where 1 ≤ i < j ≤ n, independently at random.

We generated five posets from each instance class and size between 8 and 512 elements. We will make these instances as well as all algorithm implementations publicly available. We ran each algorithm on each instance in a single thread. Each run was limited to 24 hours of CPU time and 8 GB of RAM; all running times were measured. Each algorithm was instantiated to produce a (1, 1/4)-approximation of the number of linear extensions, i.e., an estimate within a factor of 2 with probability at least 3/4. Note that this particular choice of the parameters ε and δ enables extrapolation also to other pairs of values: for all algorithms except exact dynamic programming, the running times are roughly multiplied by ε^{−2} log_4 δ^{−1}. Figure 5 summarizes the results.

The results are generally very similar on all instance classes. The Tootsie Pop algorithm is faster than the basic telescopic product scheme, but the structure decomposition optimization improves the performance significantly, enough to make it faster than Tootsie Pop in most cases. The curves for the MCMC algorithms resemble straight lines in the log-scale plots, which means that their growth is roughly polynomial. Both the structure decomposition optimization and the Gibbs sampler reduce the slope of the curve, each effectively improving the performance by 1–2 orders of magnitude in the largest cases solved by the MCMC algorithms. Within our time limit, adaptive relaxation Monte Carlo is clearly the fastest: it solves in seconds instances that take hours with MCMC. However, by extrapolating the curves, it seems likely that the fastest MCMC scheme would overtake the exponential algorithms if the computations were allowed to take weeks. On small instances, even the exact dynamic programming algorithm is very fast compared to the approximate MCMC algorithms.

6 Conclusions

We have investigated the problem of approximate counting of linear extensions from a practical perspective. We presented several ideas to enhance state-of-the-art MCMC schemes and, as a side result, improved the best known worst-case bound (Proposition 1). We also introduced a novel scheme, adaptive relaxation Monte Carlo (ARMC), which exploits the fact that small instances can often be handled very fast by dynamic programming (Kangas et al. 2016). Our empirical evaluation on various instances showed that approximate counting is feasible on posets with up to a few hundred elements, and that within this range ARMC typically outperforms the other schemes.
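A generator in the spirit of the AvgDeg(k) class can be sketched as follows. This is our reconstruction: the edge probability k/(n − 1) is an assumption of ours (chosen so that the expected average degree of the DAG is k), not a parameter stated in the paper.

```python
import random

def avg_deg_poset(n, k, seed=0):
    """Generate an AvgDeg(k)-style instance: a random DAG on {0, ..., n-1},
    then take its transitive closure as the partial order. The edge
    probability k/(n-1) is our assumption; it makes the expected average
    (in + out) degree of the DAG equal to k."""
    rng = random.Random(seed)
    p = min(1.0, k / (n - 1))
    edges = {(i, j) for i in range(n) for j in range(i + 1, n)
             if rng.random() < p}
    changed = True                     # transitive closure (naive)
    while changed:
        changed = False
        for (a, b) in list(edges):
            for (c, d) in list(edges):
                if b == c and (a, d) not in edges:
                    edges.add((a, d))
                    changed = True
    return edges

order = avg_deg_poset(32, 3)
assert all(i < j for (i, j) in order)                      # acyclic
assert all((a, d) in order                                 # transitive
           for (a, b) in order for (c, d) in order if b == c)
```

Restricting edges to i < j guarantees acyclicity, so the transitive closure is always a valid partial order.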

[Figure 5 spans this page: a 3×3 grid of log–log plots (panels AvgDeg(3), AvgDeg(5), Andes; Bipartite(0.2), Bipartite(0.5), Diabetes; Link, Munin, Pigs) showing running time (0.1 s to 8 h) against the number of elements (8 to 512) for the six algorithms: Telescopic Product, Decomposition Telescopic Product, Decomposition Telescopic Product w/ Gibbs, Tootsie Pop, Exact Dynamic Programming, and Adaptive Relaxation Monte Carlo.]

Figure 5: The running times of approximation algorithms for counting linear extensions on different classes of posets as functions of the number of elements in the poset. All algorithms estimate the count within a factor of 2 with probability at least 3/4. The markers show the median of five independent runs, and the surrounding shaded area shows the range of the values. The time limit of 24 hours is the limiting factor for all algorithms except exact dynamic programming, which always reaches the memory limit first. Both axes are logarithmic, which means that polynomial growth appears as a straight line. Very small running times are outside the visible area.

We believe our instantiation of the ARMC scheme leaves room for further improvements. The key step in the scheme is the choice of the relaxation R, which determines the time required by the two main phases: computing ℓ(R) and sampling from L(R). The present work employed a simple local search in a subspace of relaxations, but we surmise that a more intelligent search strategy could improve the choice substantially. Further, to compute ℓ(R) we relied on the exponential dynamic programming algorithm, which is applicable to all posets. Alternatively, one could choose a special R that admits polynomial-time counting and sampling, such as a tree or a series–parallel poset; though our tentative experiments suggest such relaxations tend to require impractically many samples, a more careful analysis is warranted.

Our results on counting linear extensions raise questions about other intractable counting problems. On one hand,

can existing MCMC schemes for which known worst-case bounds are impractical be enhanced by ideas similar to the present work? On the other hand, do other problems admit practical exponential-time schemes analogous to ARMC?

Finally, we note that there has been recent progress in alternative paradigms for approximate counting, which leverage the power of modern SAT and ILP solvers for hard decision and optimization problems (Gomes, Sabharwal, and Selman 2006; Chakraborty, Meel, and Vardi 2014; Chakraborty et al. 2015; Kim, Sabharwal, and Ermon 2016). We find it an intriguing open question whether such tools can be successfully applied to counting linear extensions.

Acknowledgments

This work was supported in part by the Academy of Finland, under Grants 276864, 303816, and 284642 (Finnish Centre of Excellence in Computational Inference Research COIN).

References

Atkinson, M. D. 1990. On computing the number of linear extensions of a tree. Order 7(1):23–25.

Banks, J.; Garrabrant, S.; Huber, M. L.; and Perizzolo, A. 2017. Using TPA to count linear extensions. arXiv preprint arXiv:1010.4981.

Brightwell, G., and Winkler, P. 1991. Counting linear extensions. Order 8(3):225–242.

Bubley, R., and Dyer, M. 1999. Faster random generation of linear extensions. Discrete Mathematics 201(1–3):81–88.

Chakraborty, S.; Fried, D.; Meel, K. S.; and Vardi, M. Y. 2015. From weighted to unweighted model counting. In Proceedings of the 24th International Joint Conference on Artificial Intelligence, 689–695. AAAI Press.

Chakraborty, S.; Meel, K. S.; and Vardi, M. Y. 2014. Balancing scalability and uniformity in SAT witness generator. In Proceedings of the 51st Annual Design Automation Conference, 60:1–60:6.

Dagum, P.; Karp, R. M.; Luby, M.; and Ross, S. M. 2000. An optimal algorithm for Monte Carlo estimation. SIAM Journal on Computing 29(5):1484–1496.

De Loof, K.; De Meyer, H.; and De Baets, B. 2006. Exploiting the lattice of ideals representation of a poset. Fundamenta Informaticae 71(2,3):309–321.

Dilworth, R. P. 1950. A decomposition theorem for partially ordered sets. Annals of Mathematics 51(1):161–166.

Dyer, M.; Frieze, A.; and Kannan, R. 1991. A random polynomial-time algorithm for approximating the volume of convex bodies. Journal of the ACM 38(1):1–17.

Eiben, E.; Ganian, R.; Kangas, K.; and Ordyniak, S. 2016. Counting linear extensions: Parameterizations by treewidth. In Proceedings of the 24th Annual European Symposium on Algorithms, volume 57 of Leibniz International Proceedings in Informatics, 39:1–39:18. Schloss Dagstuhl–Leibniz-Zentrum für Informatik.

Garey, M. R.; Johnson, D. S.; and Stockmeyer, L. J. 1976. Some simplified NP-complete graph problems. Theoretical Computer Science 1(3):237–267.

Gomes, C. P.; Sabharwal, A.; and Selman, B. 2006. Model counting: A new strategy for obtaining good bounds. In Proceedings of the 21st National Conference on Artificial Intelligence, 54–61.

Huber, M., and Schott, S. 2010. Using TPA for Bayesian inference. Bayesian Statistics 9:257–282.

Huber, M. 2006. Fast perfect sampling from linear extensions. Discrete Mathematics 306(4):420–428.

Huber, M. 2014. Near-linear time simulation of linear extensions of a height-2 poset with bounded interaction. Chicago Journal of Theoretical Computer Science.

Kangas, K.; Hankala, T.; Niinimäki, T.; and Koivisto, M. 2016. Counting linear extensions of sparse posets. In Proceedings of the 25th International Joint Conference on Artificial Intelligence, 603–609. AAAI Press.

Karzanov, A., and Khachiyan, L. 1991. On the conductance of order Markov chains. Order 8(1):7–15.

Kim, C.; Sabharwal, A.; and Ermon, S. 2016. Exact sampling with integer linear programs and random perturbations. In Proceedings of the 30th AAAI Conference on Artificial Intelligence, 3248–3254. AAAI Press.

Lukasiewicz, T.; Martinez, M. V.; and Simari, G. I. 2014. Probabilistic preference logic networks. In Proceedings of the 21st European Conference on Artificial Intelligence, volume 263 of Frontiers in Artificial Intelligence and Applications, 561–566. IOS Press.

Mannila, H., and Meek, C. 2000. Global partial orders from sequential data. In Proceedings of the Sixth International Conference on Knowledge Discovery and Data Mining, 161–168. ACM.

Möhring, R. H. 1989. Computationally tractable classes of ordered sets. In Rival, I., ed., Algorithms and Order, 105–193. Kluwer Academic Publishers.

Morton, J.; Pachter, L.; Shiu, A.; Sturmfels, B.; and Wienand, O. 2009. Convex rank tests and semigraphoids. SIAM Journal on Discrete Mathematics 23(3):1117–1134.

Muise, C. J.; Beck, J. C.; and McIlraith, S. A. 2016. Optimal partial-order plan relaxation via MaxSAT. Journal of Artificial Intelligence Research 57:113–149.

Niinimäki, T.; Parviainen, P.; and Koivisto, M. 2016. Structure discovery in Bayesian networks by sampling partial orders. Journal of Machine Learning Research 17:57:1–57:47.

Peczarski, M. 2004. New results in minimum-comparison sorting. Algorithmica 40(2):133–145.

Talvitie, T.; Niinimäki, T.; and Koivisto, M. 2017. The mixing of Markov chains on linear extensions in practice. In Proceedings of the 26th International Joint Conference on Artificial Intelligence. IJCAI.

Wallace, C. S.; Korb, K. B.; and Dai, H. 1996. Causal discovery via MML. In Proceedings of the 13th International Conference on Machine Learning, 516–524. Morgan Kaufmann.

Wilson, D. B. 2004. Mixing times of lozenge tiling and card shuffling Markov chains. The Annals of Applied Probability 14(1):274–325.
