Central Moment Analysis for Cost Accumulators in Probabilistic Programs

Di Wang, Carnegie Mellon University, USA
Jan Hoffmann, Carnegie Mellon University, USA
Thomas Reps, University of Wisconsin, USA

Abstract

For probabilistic programs, it is usually not possible to automatically derive exact information about their properties, such as the distribution of states at a given program point. Instead, one can attempt to derive approximations, such as upper bounds on tail probabilities. Such bounds can be obtained via concentration inequalities, which rely on the moments of a distribution, such as the expectation (the first raw moment) or the variance (the second central moment). Tail bounds obtained using central moments are often tighter than the ones obtained using raw moments, but automatically analyzing central moments is more challenging.

This paper presents an analysis for probabilistic programs that automatically derives symbolic upper and lower bounds on variances, as well as higher central moments, of cost accumulators. To overcome the challenges of higher-moment analysis, it generalizes analyses for expectations with an algebraic abstraction that simultaneously analyzes different moments, utilizing relations between them. A key innovation is the notion of moment-polymorphic recursion, and a practical derivation system that handles recursive functions.

The analysis has been implemented using a template-based technique that reduces the inference of polynomial bounds to linear programming. Experiments with our prototype central-moment analyzer show that, despite the analyzer's upper/lower bounds on various quantities, it obtains tighter tail bounds than an existing system that uses only raw moments, such as expectations.

CCS Concepts: • Theory of computation → Probabilistic computation; Program analysis; Automated reasoning.

Keywords: Probabilistic programs, central moments, cost analysis, tail bounds

ACM Reference Format:
Di Wang, Jan Hoffmann, and Thomas Reps. 2021. Central Moment Analysis for Cost Accumulators in Probabilistic Programs. In Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation (PLDI '21), June 20–25, 2021, Virtual, Canada. ACM, New York, NY, USA, 15 pages. https://doi.org/10.1145/3453483.3454062

PLDI '21, June 20–25, 2021, Virtual, Canada. © 2021 Copyright held by the owner/author(s). ACM ISBN 978-1-4503-8391-2/21/06.

1 Introduction

Probabilistic programs [16, 23, 28] can be used to manipulate uncertain quantities modeled by probability distributions and random control flows. Uncertainty arises naturally in Bayesian networks, which capture statistical dependencies (e.g., for diagnosis of diseases [21]), and cyber-physical systems, which are subject to sensor errors and peripheral disturbances (e.g., airborne collision-avoidance systems [27]). Researchers have used probabilistic programs to implement and analyze randomized algorithms [2], cryptographic protocols [3], and machine-learning algorithms [15]. A probabilistic program propagates uncertainty through a computation, and produces a distribution over results.

In general, it is not tractable to compute the result distributions of probabilistic programs automatically and precisely: Composing simple distributions can quickly complicate the result distribution, and randomness in the control flow can easily lead to state-space explosion. Monte-Carlo simulation [33] is a common approach to study the result distributions, but the technique does not provide formal guarantees, and can sometimes be inefficient [5].

In this paper, we focus on a specific yet important kind of uncertain quantity: cost accumulators, which are program variables that can only be incremented or decremented through the program execution. Examples of cost accumulators include termination time [5, 10, 11, 22, 31], rewards in Markov decision processes (MDPs) [32], position information in control systems [4, 7, 34], and cash flow during bitcoin mining [41]. Recent work [7, 24, 29, 41] has proposed successful static-analysis approaches that leverage aggregate information of a cost accumulator 푋, such as 푋's expected value 피[푋] (i.e., 푋's "first moment"). The intuition why it is beneficial to compute aggregate information, in lieu of distributions, is that aggregate measures like expectations abstract distributions to a single number, while still indicating non-trivial properties. Moreover, expectations are transformed by statements in a probabilistic program in a manner similar to the weakest-precondition transformation of formulas in a non-probabilistic program [28].

One important kind of aggregate information is moments. In this paper, we show how to obtain central moments (i.e., 피[(푋 − 피[푋])ᵏ] for any 푘 ≥ 2), whereas most previous work focused on raw moments (i.e., 피[푋ᵏ] for any 푘 ≥ 1). Central moments can provide more information about distributions. For example, the variance 핍[푋] (i.e., 피[(푋 − 피[푋])²], 푋's "second central moment") indicates how 푋 can deviate from its mean, the skewness (i.e., 피[(푋 − 피[푋])³]/(핍[푋])^(3/2), 푋's "third standardized moment") indicates how lopsided the distribution of 푋 is, and the kurtosis (i.e., 피[(푋 − 피[푋])⁴]/(핍[푋])², 푋's "fourth standardized moment") measures the heaviness of the tails of the distribution of 푋. One application of moments is to answer queries about tail bounds, e.g., the assertions about probabilities of the form ℙ[푋 ≥ 푑], via concentration-of-measure inequalities [12]. With central moments, we find an opportunity to obtain more precise tail bounds of the form ℙ[푋 ≥ 푑], and become able to derive bounds on tail probabilities of the form ℙ[|푋 − 피[푋]| ≥ 푑].

Central moments 피[(푋 − 피[푋])ᵏ] can be seen as polynomials of raw moments 피[푋], ···, 피[푋ᵏ]; e.g., the variance 핍[푋] = 피[(푋 − 피[푋])²] can be rewritten as 피[푋²] − 피²[푋], where 피ᵏ[푋] denotes (피[푋])ᵏ. To derive bounds on central moments, we need both upper and lower bounds on the raw moments, because of the presence of subtraction. For example, to upper-bound 핍[푋], a static analyzer needs to have an upper bound on 피[푋²] and a lower bound on 피²[푋].

In this work, we present and implement the first fully automatic analysis for deriving symbolic interval bounds on higher central moments for cost accumulators in probabilistic programs with general recursion and continuous distributions. One challenge is to support interprocedural reasoning to reuse analysis results for functions. Our solution makes use of a "lifting" technique from the natural-language-processing community. That technique derives an algebra for second moments from an algebra for first moments [25]. We generalize the technique to develop moment semirings, and use them to derive a novel frame rule to handle function calls with moment-polymorphic recursion (see §2.2).

Recent work has successfully automated inference of upper [29] or lower bounds [41] on the expected cost of probabilistic programs. Kura et al. [24] developed a system to derive upper bounds on higher raw moments of program runtimes. However, even in combination, existing approaches cannot solve tasks such as deriving a lower bound on the second raw moment of runtimes, or deriving an upper bound on the variance of accumulators that count live heap cells.

Fig. 1(a) summarizes the features of related work on moment inference for probabilistic programs. To the best of our knowledge, our work is the first moment-analysis tool that supports all of the listed programming and analysis features. Fig. 1(b) and (c) compare our work with related work in terms of tail-bound analysis on a concrete program (see §5). The bounds are derived for the cost accumulator tick in a random-walk program that we will present in §2. It can be observed that for 푑 ≥ 20, the most precise tail bound for tick is the one obtained via an upper bound on the variance 핍[tick] (tick's second central moment).

(a) [table: the rows are the features loop, recursion, continuous distributions, non-monotone costs, higher moments, and interval bounds; the columns are the systems [7], [29], [24], [41], and this work; this work supports all six features]

(b)
                            [29, 41]              [24]                         this work
  Derived bound             피[tick] ≤ 2푑 + 4     피[tick²] ≤ 4푑² + 22푑 + 28    핍[tick] ≤ 22푑 + 28
  Moment type               raw                   raw                          central
  Concentration inequality  Markov (degree = 1)   Markov (degree = 2)          Cantelli
  Tail bound ℙ[tick ≥ 4푑]   ≈ 1/2                 ≈ 1/4                        → 0 as 푑 → ∞

(c) [plot: the three derived tail-bound curves ℙ[tick ≥ 푑] for 푑 ranging from 20 to 80; axis residue removed]

Figure 1. (a) Comparison in terms of supporting features. (b) Comparison in terms of moment bounds for the running example. (c) Comparison in terms of derived tail bounds.

Our work incorporates ideas known from the literature:
- Using the expected-potential method (or ranking supermartingales) to derive upper bounds on the expected program runtimes or monotone costs [9–11, 13, 24, 29].
- Using the Optional Stopping Theorem from probability theory to ensure the soundness of lower-bound inference for probabilistic programs [1, 17, 35, 41].
- Using linear programming (LP) to efficiently automate the (expected) potential method for (expected) cost analysis [18, 19, 40].

The contributions of our work are as follows:
• We develop moment semirings to compose the moments for a cost accumulator from two computations, and to enable interprocedural reasoning about higher moments.
• We instantiate moment semirings with the symbolic interval domain, use that to develop a derivation system for interval bounds on higher central moments for cost accumulators, and automate the derivation via LP solving.
• We prove the soundness of our derivation system for programs that satisfy the criterion of our recent extension to the Optional Stopping Theorem, and develop an algorithm for checking this criterion automatically.
• We implemented our analysis and evaluated it on a broad suite of benchmarks from the literature. Our experimental results show that on a variety of examples, our analyzer is able to use higher central moments to obtain tighter tail bounds on program runtimes than the system of Kura et al. [24], which uses only upper bounds on raw moments.
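The comparison in Fig. 1(b) can be reproduced with a short calculation. The sketch below is ours (not part of the paper's artifact): it plugs the three moment bounds from the table into the corresponding concentration inequalities. Cantelli's inequality is applied with deviation 푎 = 4푑 − (2푑 + 4); using the upper bounds on the mean and the variance is sound here because Cantelli's bound is increasing in the variance and decreasing in 푎.

```python
# Tail bounds on P[tick >= 4d] from the moment bounds in Fig. 1(b):
#   E[tick]   <= 2d + 4             (raw moment, degree 1)
#   E[tick^2] <= 4d^2 + 22d + 28    (raw moment, degree 2)
#   V[tick]   <= 22d + 28           (central moment, this work)

def markov1(d):
    # Markov's inequality: P[tick >= 4d] <= E[tick] / (4d)
    return (2 * d + 4) / (4 * d)

def markov2(d):
    # Markov on the second raw moment: P[tick >= 4d] <= E[tick^2] / (4d)^2
    return (4 * d * d + 22 * d + 28) / (4 * d) ** 2

def cantelli(d):
    # Cantelli's inequality: P[X - E[X] >= a] <= V[X] / (V[X] + a^2),
    # instantiated with a = 4d - (2d + 4) = 2d - 4 (positive for d > 2)
    var, a = 22 * d + 28, 2 * d - 4
    return var / (var + a * a)

for d in [20, 40, 60, 80]:
    print(d, round(markov1(d), 3), round(markov2(d), 3), round(cantelli(d), 3))
```

At 푑 = 20 the three bounds are roughly 0.55, 0.32, and 0.27, so the variance-based bound is already the tightest, and it is the only one that tends to 0 as 푑 grows, matching Fig. 1(c).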

1  func rdwalk() begin
2    { 2(푑 − 푥) + 4 }
3    if 푥 < 푑 then
4      { 2(푑 − 푥) + 4 }
5      푡 ∼ uniform(−1, 2);
6      { 2(푑 − 푥 − 푡) + 5 }
7      푥 ≔ 푥 + 푡;
8      { 2(푑 − 푥) + 5 }
9      call rdwalk;
10     { 1 }
11     tick(1)
12     { 0 }
13   fi
14 end

1# XID = {푥, 푑, 푡}
2# FID = {rdwalk}
3# pre-condition: 푑 > 0
4  func main() begin
5    푥 ≔ 0;
6    call rdwalk
7  end

Figure 2. A bounded, biased random walk, implemented using recursion. The annotations show the derivation of an upper bound on the expected accumulated cost.

2 Overview

In this section, we demonstrate the expected-potential method for both first-moment analysis (previous work) and higher central-moment analysis (this work) (§2.1), and discuss the challenges to supporting interprocedural reasoning and to ensuring the soundness of our approach (§2.2).

Example 2.1. The program in Fig. 2 implements a bounded, biased random walk. The main function consists of a single statement "call rdwalk" that invokes a recursive function. The variables 푥 and 푑 represent the current position and the ending position of the random walk, respectively. We assume that 푑 > 0 holds initially. In each step, the program samples the length of the current move from a uniform distribution on the interval [−1, 2]. The statement tick(1) adds one to a cost accumulator that counts the number of steps before the random walk ends. We denote this accumulator by tick in the rest of this section. The program terminates with probability one and its expected accumulated cost is bounded by 2푑 + 4.

2.1 The Expected-Potential Method for Higher-Moment Analysis

Our approach to higher-moment analysis is inspired by the expected-potential method [29], which is also known as ranking super-martingales [9, 10, 24, 41], for expected-cost bound analysis of probabilistic programs.

The classic potential method of amortized analysis [36] can be automated to derive symbolic cost bounds for non-probabilistic programs [18, 19]. The basic idea is to define a potential function 휙 : Σ → ℝ⁺ that maps program states 휎 ∈ Σ to nonnegative numbers, where we assume each state 휎 contains a cost-accumulator component 휎.훼. If a program executes with initial state 휎 to final state 휎′, then it holds that 휙(휎) ≥ (휎′.훼 − 휎.훼) + 휙(휎′), where (휎′.훼 − 휎.훼) describes the accumulated cost from 휎 to 휎′. The potential method also enables compositional reasoning: if a statement 푆₁ executes from 휎 to 휎′ and a statement 푆₂ executes from 휎′ to 휎″, then we have 휙(휎) ≥ (휎′.훼 − 휎.훼) + 휙(휎′) and 휙(휎′) ≥ (휎″.훼 − 휎′.훼) + 휙(휎″); therefore, we derive 휙(휎) ≥ (휎″.훼 − 휎.훼) + 휙(휎″) for the sequential composition 푆₁; 푆₂. For non-probabilistic programs, the initial potential provides an upper bound on the accumulated cost.

This approach has been adapted to reason about expected costs of probabilistic programs [29, 41]. To derive upper bounds on the expected accumulated cost of a program 푆 with initial state 휎, one needs to take into consideration the distribution of all possible executions. More precisely, the potential function should satisfy the following property:

  휙(휎) ≥ 피_{휎′ ∼ ⟦푆⟧(휎)}[퐶(휎, 휎′) + 휙(휎′)],  (1)

where the notation 피_{푥∼휇}[푓(푥)] represents the expected value of 푓(푥), where 푥 is drawn from the distribution 휇, ⟦푆⟧(휎) is the distribution over final states of executing 푆 from 휎, and 퐶(휎, 휎′) ≝ 휎′.훼 − 휎.훼 is the execution cost from 휎 to 휎′.

Example 2.2. Fig. 2 annotates the rdwalk function from Ex. 2.1 with the derivation of an upper bound on the expected accumulated cost. The annotations, taken together, define an expected-potential function 휙 : Σ → ℝ⁺ where a program state 휎 ∈ Σ consists of a program point and a valuation for program variables. To justify the upper bound 2(푑 − 푥) + 4 for the function rdwalk, one has to show that the potential right before the tick(1) statement should be at least 1. This property is established by backward reasoning on the function body:
• For call rdwalk, we apply the "induction hypothesis" that the expected cost of the function rdwalk can be upper-bounded by 2(푑 − 푥) + 4. Adding the 1 unit of potential needed by the tick statement, we obtain 2(푑 − 푥) + 5 as the pre-annotation of the function call.
• For 푥 ≔ 푥 + 푡, we substitute 푥 with 푥 + 푡 in the post-annotation of this statement to obtain the pre-annotation.
• For 푡 ∼ uniform(−1, 2), because its post-annotation is 2(푑 − 푥 − 푡) + 5, we compute its pre-annotation as

  피_{푡 ∼ uniform(−1,2)}[2(푑 − 푥 − 푡) + 5]
    = 2(푑 − 푥) + 5 − 2 · 피_{푡 ∼ uniform(−1,2)}[푡]
    = 2(푑 − 푥) + 5 − 2 · (1/2) = 2(푑 − 푥) + 4,

which is exactly the upper bound we want to justify.

1  func rdwalk() begin
2    {⟨1, 2(푑−푥) + 4, 4(푑−푥)² + 22(푑−푥) + 28⟩}
3    if 푥 < 푑 then
4      {⟨1, 2(푑−푥) + 4, 4(푑−푥)² + 22(푑−푥) + 28⟩}
5      푡 ∼ uniform(−1, 2);
6      {⟨1, 2(푑−푥−푡) + 5, 4(푑−푥−푡)² + 26(푑−푥−푡) + 37⟩}
7      푥 ≔ 푥 + 푡;
8      {⟨1, 2(푑−푥) + 5, 4(푑−푥)² + 26(푑−푥) + 37⟩}
9      call rdwalk;
10     {⟨1, 1, 1⟩}
11     tick(1)
12     {⟨1, 0, 0⟩}
13   fi
14 end

Figure 3. Derivation of an upper bound on the first and second moment of the accumulated cost.

Our approach. In this paper, we focus on derivation of higher central moments. Observing that a central moment 피[(푋 − 피[푋])ᵏ] can be rewritten as a polynomial of raw moments 피[푋], ···, 피[푋ᵏ], we reduce the problem of bounding central moments to reasoning about upper and lower bounds on raw moments. For example, the variance can be written as 핍[푋] = 피[푋²] − 피²[푋], so it suffices to analyze the upper bound on the second moment 피[푋²] and the lower bound on the square of the first moment 피²[푋]. For higher central moments, this approach requires both upper and lower bounds on higher raw moments. For example, consider the fourth central moment of a nonnegative random variable 푋:

  피[(푋 − 피[푋])⁴] = 피[푋⁴] − 4피[푋³]피[푋] + 6피[푋²]피²[푋] − 3피⁴[푋].

Deriving an upper bound on the fourth central moment requires lower bounds on the first (피[푋]) and third (피[푋³]) raw moments.

We now sketch the development of moment semirings.
• We first consider only the upper bounds on higher moments of nonnegative costs. To do so, we extend the range of the expected-potential function 휙 to real-valued vectors (ℝ⁺)^(푚+1), where 푚 ∈ ℕ is the degree of the target moment. We update the potential inequality (1) as follows:

  휙(휎) ≥ 피_{휎′ ∼ ⟦푆⟧(휎)}[⟨퐶(휎, 휎′)ᵏ⟩_{0≤푘≤푚} ⊗ 휙(휎′)],  (2)

where ⟨푣ₖ⟩_{0≤푘≤푚} denotes an (푚+1)-dimensional vector, the order ≤ on vectors is defined pointwise, and ⊗ is some composition operator. Recall that ⟦푆⟧(휎) denotes the distribution over final states of executing 푆 from 휎, and 퐶(휎, 휎′) describes the cost for the execution from 휎 to 휎′. Intuitively, for 휙(휎) = ⟨휙(휎)ₖ⟩_{0≤푘≤푚} and each 푘, the component 휙(휎)ₖ is an upper bound on the 푘-th moment of the cost for the computation starting from 휎. The 0-th moment is the termination probability of the computation, and we assume it is always one for now.

We cannot simply define ⊗ as pointwise addition because, for example, (푎 + 푏)² ≠ 푎² + 푏² in general. If we think of 푏 as the cost for some probabilistic computation, and we prepend a constant cost 푎 to the computation, then by linearity of expectations, we have

  피[(푎 + 푏)²] = 피[푎² + 2푎푏 + 푏²] = 푎² + 2 · 푎 · 피[푏] + 피[푏²],

i.e., reasoning about the second moment requires us to keep track of the first moment. Similarly, we should have

  휙(휎)₂ ≥ 피_{휎′ ∼ ⟦푆⟧(휎)}[퐶(휎, 휎′)² + 2 · 퐶(휎, 휎′) · 휙(휎′)₁ + 휙(휎′)₂]

for the second-moment component, where 휙(휎′)₁ and 휙(휎′)₂ denote 피[푏] and 피[푏²], respectively. Therefore, the composition operator ⊗ for second-moment analysis (i.e., 푚 = 2) should be defined as

  ⟨1, 푟₁, 푠₁⟩ ⊗ ⟨1, 푟₂, 푠₂⟩ ≝ ⟨1, 푟₁ + 푟₂, 푠₁ + 2푟₁푟₂ + 푠₂⟩.  (3)

Example 2.3. Fig. 3 annotates the rdwalk function from Ex. 2.1 with the derivation of an upper bound on both the first and second moment of the accumulated cost. To justify the first and second moment of the accumulated cost for the function rdwalk, we again perform backward reasoning:
• For tick(1), it transforms a post-annotation 푎 by 휆푎.(⟨1, 1, 1⟩ ⊗ 푎); thus, the pre-annotation is ⟨1, 1, 1⟩ ⊗ ⟨1, 0, 0⟩ = ⟨1, 1, 1⟩.
• For call rdwalk, we apply the "induction hypothesis", i.e., the upper bound shown on line 2. We use the ⊗ operator to compose the induction hypothesis with the post-annotation of this function call:

  ⟨1, 2(푑−푥)+4, 4(푑−푥)² + 22(푑−푥) + 28⟩ ⊗ ⟨1, 1, 1⟩
    = ⟨1, 2(푑−푥)+5, (4(푑−푥)² + 22(푑−푥) + 28) + 2·(2(푑−푥)+4) + 1⟩
    = ⟨1, 2(푑−푥)+5, 4(푑−푥)² + 26(푑−푥) + 37⟩.

• For 푥 ≔ 푥 + 푡, we substitute 푥 with 푥 + 푡 in the post-annotation of this statement to obtain the pre-annotation.
• For 푡 ∼ uniform(−1, 2), because the post-annotation involves both 푡 and 푡², we compute from the definition of uniform distributions that 피_{푡∼uniform(−1,2)}[푡] = 1/2 and 피_{푡∼uniform(−1,2)}[푡²] = 1. Then the upper bound on the second moment is derived as follows:

  피_{푡 ∼ uniform(−1,2)}[4(푑−푥−푡)² + 26(푑−푥−푡) + 37]
    = (4(푑−푥)² + 26(푑−푥) + 37) − (8(푑−푥) + 26) · 피_{푡∼uniform(−1,2)}[푡] + 4 · 피_{푡∼uniform(−1,2)}[푡²]
    = 4(푑−푥)² + 22(푑−푥) + 28,

which is the same as the desired upper bound on the second moment of the accumulated cost for the function rdwalk. (See Fig. 3, line 2.)

We generalize the composition operator ⊗ to moments with arbitrarily high degrees, via a family of algebraic structures, which we name moment semirings (see §3.2). These semirings are algebraic in the sense that they can be instantiated with any partially ordered semiring, not just ℝ⁺.
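Eq. (3) can be cross-checked against a direct computation. The snippet below is our own sketch (plain Python, not the paper's LP-based analyzer): it composes two independent discrete random costs that execute in sequence, and compares the result of ⊗ on their annotations ⟨1, 피[cost], 피[cost²]⟩ with the exact moments of the total cost. The two example distributions are made up for illustration.

```python
from itertools import product

def otimes(u, v):
    # eq. (3): compose second-moment annotations <1, r, s>, where r bounds
    # E[cost] and s bounds E[cost^2] of the respective computations
    _, r1, s1 = u
    _, r2, s2 = v
    return (1, r1 + r2, s1 + 2 * r1 * r2 + s2)

def moments(dist):
    # <1, E[X], E[X^2]> for a discrete distribution {value: probability}
    m1 = sum(p * x for x, p in dist.items())
    m2 = sum(p * x * x for x, p in dist.items())
    return (1, m1, m2)

# two independent random costs, executed one after the other
b1 = {0: 0.5, 2: 0.5}
b2 = {1: 0.25, 3: 0.75}

# exact distribution (and hence exact moments) of the total cost b1 + b2
total = {}
for (x, p), (y, q) in product(b1.items(), b2.items()):
    total[x + y] = total.get(x + y, 0.0) + p * q

assert otimes(moments(b1), moments(b2)) == moments(total)
print(otimes(moments(b1), moments(b2)))
```

The check relies on the two costs being independent, which mirrors how the continuation's moments enter the composition in eq. (3) via linearity of expectations.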

Interval bounds. Moment semirings not only provide a general method to analyze higher moments, but also enable reasoning about upper and lower bounds on moments simultaneously. The simultaneous treatment is also essential for analyzing programs with non-monotone costs (see §3.3). We instantiate moment semirings with the standard interval semiring I = {[푎, 푏] | 푎 ≤ 푏}. The algebraic approach allows us to systematically incorporate the interval-valued bounds, by reinterpreting the operations in eq. (3) under I:

  ⟨[1,1], [푟₁ᴸ, 푟₁ᵁ], [푠₁ᴸ, 푠₁ᵁ]⟩ ⊗ ⟨[1,1], [푟₂ᴸ, 푟₂ᵁ], [푠₂ᴸ, 푠₂ᵁ]⟩
    ≝ ⟨[1,1], [푟₁ᴸ, 푟₁ᵁ] +_I [푟₂ᴸ, 푟₂ᵁ], [푠₁ᴸ, 푠₁ᵁ] +_I 2 ·_I ([푟₁ᴸ, 푟₁ᵁ] ·_I [푟₂ᴸ, 푟₂ᵁ]) +_I [푠₂ᴸ, 푠₂ᵁ]⟩
    = ⟨[1,1], [푟₁ᴸ + 푟₂ᴸ, 푟₁ᵁ + 푟₂ᵁ], [푠₁ᴸ + 2·min 푆 + 푠₂ᴸ, 푠₁ᵁ + 2·max 푆 + 푠₂ᵁ]⟩,

where 푆 ≝ {푟₁ᴸ푟₂ᴸ, 푟₁ᴸ푟₂ᵁ, 푟₁ᵁ푟₂ᴸ, 푟₁ᵁ푟₂ᵁ}. We then update the potential inequality eq. (2) as follows:

  휙(휎) ⊒ 피_{휎′ ∼ ⟦푆⟧(휎)}[⟨[퐶(휎, 휎′)ᵏ, 퐶(휎, 휎′)ᵏ]⟩_{0≤푘≤푚} ⊗ 휙(휎′)],

where the order ⊑ is defined as pointwise interval inclusion.

Example 2.4. Suppose that the interval bound on the first moment of the accumulated cost of the rdwalk function from Ex. 2.1 is [2(푑 − 푥), 2(푑 − 푥) + 4]. We can now derive the upper bound on the variance 핍[tick] ≤ 22푑 + 28 shown in Fig. 1(b) (where we substitute 푥 with 0 because the main function initializes 푥 to 0 on line 5 in Fig. 2):

  핍[tick] = 피[tick²] − 피²[tick]
    ≤ (upper bnd. on 피[tick²]) − (lower bnd. on 피[tick])²
    = (4푑² + 22푑 + 28) − (2푑)² = 22푑 + 28.

In §5, we describe how we use moment bounds to derive the tail bounds shown in Fig. 1(c).

2.2 Two Major Challenges

Interprocedural reasoning. Recall that in the derivation of Fig. 3, we use the ⊗ operator to compose the upper bounds on moments for call rdwalk and its post-annotation ⟨1, 1, 1⟩. However, this approach does not work in general, because the post-annotation might be symbolic (e.g., ⟨1, 푥, 푥²⟩) and the callee might mutate referenced program variables (e.g., 푥). One workaround is to derive a pre-annotation for each possible post-annotation of a recursive function, i.e., the moment annotations for a recursive function are polymorphic. This workaround would not be effective for non-tail-recursive functions: for example, we would need to reason about the rdwalk function in Fig. 3 with infinitely many post-annotations ⟨1, 0, 0⟩, ⟨1, 1, 1⟩, ⟨1, 2, 4⟩, ..., i.e., ⟨1, 푖, 푖²⟩ for all 푖 ∈ ℤ⁺.

Our solution to moment-polymorphic recursion is to introduce a combination operator ⊕ in a way that if 휙₁ and 휙₂ are two expected-potential functions, then

  휙₁(휎) ⊕ 휙₂(휎) ≥ 피_{휎′ ∼ ⟦푆⟧(휎)}[⟨퐶(휎, 휎′)ᵏ⟩_{0≤푘≤푚} ⊗ (휙₁(휎′) ⊕ 휙₂(휎′))].

We then use the ⊕ operator to derive a frame rule:

  {푄₁} 푆 {푄₁′}    {푄₂} 푆 {푄₂′}
  ---------------------------------
  {푄₁ ⊕ 푄₂} 푆 {푄₁′ ⊕ 푄₂′}

We define ⊕ as pointwise addition, i.e., for second moments,

  ⟨푝₁, 푟₁, 푠₁⟩ ⊕ ⟨푝₂, 푟₂, 푠₂⟩ ≝ ⟨푝₁ + 푝₂, 푟₁ + 푟₂, 푠₁ + 푠₂⟩,  (4)

and because the 0-th-moment (i.e., termination-probability) component is no longer guaranteed to be one, we redefine ⊗ to consider the termination probabilities:

  ⟨푝₁, 푟₁, 푠₁⟩ ⊗ ⟨푝₂, 푟₂, 푠₂⟩ ≝ ⟨푝₁푝₂, 푝₂푟₁ + 푝₁푟₂, 푝₂푠₁ + 2푟₁푟₂ + 푝₁푠₂⟩.  (5)

Remark 2.5. As we will show in §3.2, the composition operator ⊗ and combination operator ⊕ form a moment semiring; consequently, we can use algebraic properties of semirings (e.g., distributivity) to aid higher-moment analysis. For example, a vector ⟨0, 푟₁, 푠₁⟩ whose termination-probability component is zero does not seem to make sense, because moments with respect to a zero distribution should also be zero. However, by distributivity, we have

  ⟨1, 푟₃, 푠₃⟩ ⊗ ⟨1, 푟₁ + 푟₂, 푠₁ + 푠₂⟩
    = ⟨1, 푟₃, 푠₃⟩ ⊗ (⟨0, 푟₁, 푠₁⟩ ⊕ ⟨1, 푟₂, 푠₂⟩)
    = (⟨1, 푟₃, 푠₃⟩ ⊗ ⟨0, 푟₁, 푠₁⟩) ⊕ (⟨1, 푟₃, 푠₃⟩ ⊗ ⟨1, 푟₂, 푠₂⟩).

If we think of ⟨1, 푟₁ + 푟₂, 푠₁ + 푠₂⟩ as a post-annotation of a computation whose moments are bounded by ⟨1, 푟₃, 푠₃⟩, the equation above indicates that we can use ⊕ to decompose the post-annotation into subparts, and then reason about each subpart separately. This fact inspires us to develop a decomposition technique for moment-polymorphic recursion.

Example 2.6. With the ⊕ operator and the frame rule, we only need to analyze the rdwalk function from Ex. 2.1 with three post-annotations: ⟨1, 0, 0⟩, ⟨0, 1, 1⟩, and ⟨0, 0, 2⟩, which form a kind of "elimination sequence." We construct this sequence in an on-demand manner; the first post-annotation is the identity element ⟨1, 0, 0⟩ of the moment semiring.

For post-annotation ⟨1, 0, 0⟩, as shown in Fig. 3, we need to know the moment bound for rdwalk with the post-annotation ⟨1, 1, 1⟩. Instead of reanalyzing rdwalk with the post-annotation ⟨1, 1, 1⟩, we use the ⊕ operator to compute the "difference" between it and the previous post-annotation ⟨1, 0, 0⟩. Observing that ⟨1, 1, 1⟩ = ⟨1, 0, 0⟩ ⊕ ⟨0, 1, 1⟩, we now analyze rdwalk with ⟨0, 1, 1⟩ as the post-annotation:

  1 call rdwalk; {⟨0, 1, 3⟩}  # = ⟨1, 1, 1⟩ ⊗ ⟨0, 1, 1⟩
  2 tick(1) {⟨0, 1, 1⟩}

Again, because ⟨0, 1, 3⟩ = ⟨0, 1, 1⟩ ⊕ ⟨0, 0, 2⟩, we need to further analyze rdwalk with ⟨0, 0, 2⟩ as the post-annotation:

  1 call rdwalk; {⟨0, 0, 2⟩}  # = ⟨1, 1, 1⟩ ⊗ ⟨0, 0, 2⟩
  2 tick(1) {⟨0, 0, 2⟩}

With the post-annotation ⟨0, 0, 2⟩, we can now reason monomorphically without analyzing any new post-annotation! We can perform a succession of reasoning steps similar to what we have done in Ex. 2.2 to justify the following bounds ("unwinding" the elimination sequence):

• {⟨0, 0, 2⟩} rdwalk {⟨0, 0, 2⟩}: Directly by backward reasoning with the post-annotation ⟨0, 0, 2⟩.
• {⟨0, 1, 4(푑−푥) + 9⟩} rdwalk {⟨0, 1, 1⟩}: To analyze the recursive call with post-annotation ⟨0, 1, 3⟩, we use the frame rule with the post-call-site annotation ⟨0, 0, 2⟩ to derive ⟨0, 1, 4(푑−푥) + 11⟩ as the pre-annotation:

  1 {⟨0, 1, 4(푑−푥) + 11⟩}  # = ⟨0, 1, 4(푑−푥) + 9⟩ ⊕ ⟨0, 0, 2⟩
  2 call rdwalk;
  3 {⟨0, 1, 3⟩}  # = ⟨0, 1, 1⟩ ⊕ ⟨0, 0, 2⟩

• {⟨1, 2(푑−푥) + 4, 4(푑−푥)² + 22(푑−푥) + 28⟩} rdwalk {⟨1, 0, 0⟩}: To analyze the recursive call with post-annotation ⟨1, 1, 1⟩, we use the frame rule with the post-call-site annotation ⟨0, 1, 1⟩ to derive ⟨1, 2(푑−푥) + 5, 4(푑−푥)² + 26(푑−푥) + 37⟩ as the pre-annotation:

  1 {⟨1, 2(푑−푥) + 5, 4(푑−푥)² + 26(푑−푥) + 37⟩}
  2   # = ⟨1, 2(푑−푥) + 4, 4(푑−푥)² + 22(푑−푥) + 28⟩ ⊕ ⟨0, 1, 4(푑−푥) + 9⟩
  3 call rdwalk;
  4 {⟨1, 1, 1⟩}  # = ⟨1, 0, 0⟩ ⊕ ⟨0, 1, 1⟩

In §3.3, we present an automatic inference system for the expected-potential method that is extended with interval-valued bounds on higher moments, with support for moment-polymorphic recursion.

Soundness of the analysis. Unlike the classic potential method, the expected-potential method is not always sound when reasoning about the moments for cost accumulators in probabilistic programs.

1 func geo() begin {⟨1, 2^푥⟩}
2   푥 ≔ 푥 + 1; {⟨1, 2^(푥−1)⟩}
3   # expected-potential method for lower bounds:
4   # 2^(푥−1) < 1/2 · (2^푥 + 1) + 1/2 · 0
5   if prob(1/2) then {⟨1, 2^푥 + 1⟩}
6     tick(1); {⟨1, 2^푥⟩}
7     call geo {⟨1, 0⟩}
8   fi
9 end

Figure 4. A purely probabilistic loop with annotations for a lower bound on the first moment of the accumulated cost.

Counterexample 2.7. Consider the program in Fig. 4 that describes a purely probabilistic loop that exits the loop with probability 1/2 in each iteration. The expected accumulated cost of the program should be one [17]. However, the annotations in Fig. 4 justify a potential function 2^푥 as a lower bound on the expected accumulated cost, no matter what value 푥 has at the beginning, which is apparently unsound.

Why does the expected-potential method fail in this case? The short answer is that dualization only works for some problems: upper-bounding the sum of nonnegative ticks is equivalent to lower-bounding the sum of nonpositive ticks; lower-bounding the sum of nonnegative ticks (the issue in Fig. 4) is equivalent to upper-bounding the sum of nonpositive ticks; however, the two kinds of problems are inherently different [17]. Intuitively, the classic potential method for bounding the costs of non-probabilistic programs is a partial-correctness method, i.e., derived upper/lower bounds are sound if the analyzed program terminates [30]. With probabilistic programs, many programs do not terminate definitely, but only almost surely, i.e., they terminate with probability one, but have some execution traces that are non-terminating. The programs in Figs. 2 and 4 are both almost-surely terminating. For the expected-potential method, the potential at a program state can be seen as an average of potentials needed for all possible computations that continue from the state. If the program state can lead to a non-terminating execution trace, the potential associated with that trace might be problematic, and as a consequence, the expected-potential method might fail.

Recent research [1, 17, 35, 41] has employed the Optional Stopping Theorem (OST) from probability theory to address this soundness issue. The classic OST provides a collection of sufficient conditions for reasoning about expected gain upon termination of stochastic processes, where the expected gain at any time is invariant. By constructing a stochastic process for executions of probabilistic programs and setting the expected-potential function as the invariant, one can apply the OST to justify the soundness of the expected-potential function. In a companion paper [39], we study and propose an extension to the classic OST with a new sufficient condition that is suitable for reasoning about higher moments; in this work, we prove the soundness of our central-moment inference for programs that satisfy this condition, and develop an algorithm to check this condition automatically (see §4).

푆 ::= skip | tick(푐) | 푥 ≔ 퐸 | 푥 ∼ 퐷 | call 푓 | while 퐿 do 푆 od
    | if prob(푝) then 푆₁ else 푆₂ fi | if 퐿 then 푆₁ else 푆₂ fi | 푆₁; 푆₂
퐿 ::= true | not 퐿 | 퐿₁ and 퐿₂ | 퐸₁ ≤ 퐸₂
퐸 ::= 푥 | 푐 | 퐸₁ + 퐸₂ | 퐸₁ × 퐸₂
퐷 ::= uniform(푎, 푏) | · · ·

Figure 5. Syntax of Appl, where 푝 ∈ [0, 1], 푎, 푏, 푐 ∈ ℝ, 푎 < 푏, 푥 ∈ VID is a variable, and 푓 ∈ FID is a function identifier.

3 Derivation System for Higher Moments

In this section, we describe the inference system used by our analysis. We first present a probabilistic programming language (§3.1). We then introduce moment semirings to compose higher moments for a cost accumulator from two computations (§3.2). We use moment semirings to develop our derivation system, which is presented as a declarative program logic (§3.3). Finally, we sketch how we reduce the inference of a derivation to LP solving (§3.4).
Natural numbers ℕ exclude 0, i.e., ℕ ≝ {1, 2, 3, …} ⊆ ℤ₊ ≝ {0, 1, 2, …}. The Iverson brackets [·] are defined by [φ] = 1 if φ is true and [φ] = 0 otherwise. We denote updating an existing binding of x in a finite map f to v by f[x ↦ v]. We also use the following standard notions from probability theory: σ-algebras, measurable spaces, measurable functions, random variables, probability measures, and expectations; we include a review of these notions in the technical report [38].

Fig. 5 presents the syntax of Appl, where the metavariables S, L, E, and D stand for statements, conditions, expressions, and distributions, respectively. Each distribution D is associated with a probability measure μ_D ∈ 𝔻(ℝ); we write 𝔻(X) for the collection of all probability measures on the measurable space X. For example, uniform(a, b) describes a uniform distribution on the interval [a, b], and its corresponding probability measure is the integral of its density: μ_uniform(a,b)(O) = ∫_O [a ≤ x ≤ b]/(b − a) dx.

3.2 Moment Semirings

As discussed in §2.1, we want to design a composition operation ⊗ and a combination operation ⊕ that compose and combine higher moments of accumulated costs. In eqs. (4) and (5), we gave definitions of ⊗ and ⊕ suitable for first and second moments, respectively; in this section, we generalize them to reason about upper and lower bounds of higher moments. Our approach is inspired by the work of Li and Eisner [25], which develops a method to “lift” techniques for first moments to those for second moments. Instead of restricting the elements of semirings to be vectors of numbers, we propose algebraic moment semirings that can also be instantiated with vectors of intervals, which we need for the interval-bound analysis demonstrated in §2.1.

Definition 3.1. The m-th order moment semiring MR^(m) = (|R|^(m+1), ⊕, ⊗, 𝟎, 𝟏) is parametrized by a partially ordered semiring R = (|R|, ≤, +, ·, 0, 1), where

  ⟨u_k⟩_{0≤k≤m} ⊕ ⟨v_k⟩_{0≤k≤m} ≝ ⟨u_k + v_k⟩_{0≤k≤m},                      (6)
  ⟨u_k⟩_{0≤k≤m} ⊗ ⟨v_k⟩_{0≤k≤m} ≝ ⟨Σ_{i=0}^{k} C(k,i) × (u_i · v_{k−i})⟩_{0≤k≤m},  (7)

C(k,i) is the binomial coefficient; the scalar product n × u abbreviates Σ_{i=1}^{n} u, for n ∈ ℤ₊ and u ∈ |R|; 𝟎 ≝ ⟨0, 0, …, 0⟩; and 𝟏 ≝ ⟨1, 0, …, 0⟩.
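Definition 3.1 is directly executable. The following sketch (ours, in Python; the paper's tool itself is written in OCaml) instantiates the moment semiring with real numbers: ⊕ adds moment vectors componentwise, and ⊗ is a binomial convolution. As a sanity check, composing the moment vectors of two deterministic costs yields the moments of their sum, which is exactly the composition behavior the semiring is designed for.

```python
from math import comb

def mplus(u, v):
    # Eq. (6): componentwise addition of two moment vectors.
    return [a + b for a, b in zip(u, v)]

def mtimes(u, v):
    # Eq. (7): binomial convolution, sum_{i=0..k} C(k,i) * u_i * v_{k-i}.
    m = len(u) - 1
    return [sum(comb(k, i) * u[i] * v[k - i] for i in range(k + 1))
            for k in range(m + 1)]

# Deterministic costs u = 2 and v = 3, moment vectors up to degree 3:
u_vec = [2.0 ** k for k in range(4)]
v_vec = [3.0 ** k for k in range(4)]
assert mtimes(u_vec, v_vec) == [(2.0 + 3.0) ** k for k in range(4)]
assert mplus([0, 1, 1], [1, 0, 0]) == [1, 1, 1]
```

The last assertion mirrors the ⊕-decomposition ⟨1, 1, 1⟩ = ⟨1, 0, 0⟩ ⊕ ⟨0, 1, 1⟩ used in the rdwalk derivation above.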
The statement “x ∼ D” is a random-sampling assignment: it draws from the distribution μ_D to obtain a sample value and then assigns it to x. The statement “if prob(p) then S₁ else S₂ fi” is a probabilistic-branching statement, which executes S₁ with probability p, or S₂ with probability (1 − p). The statement “call f” makes a (possibly recursive) call to the function with identifier f ∈ FID; in this paper, we assume that functions manipulate only states that consist of global program variables. The statement tick(c), where c ∈ ℝ is a constant, is used to define the cost model: it adds c to an anonymous global cost accumulator. Our implementation also supports local variables, function parameters, return statements, and accumulation of non-constant costs; the restrictions imposed here are not essential, and are introduced solely to simplify the presentation. We use a pair ⟨𝒟, S_main⟩ to represent an Appl program, where 𝒟 is a finite map from function identifiers to their bodies and S_main is the body of the main function. We present an operational semantics for Appl in §4.1.

We define the partial order ⊑ as the pointwise extension of the partial order ≤ on |R|. Intuitively, the definition of ⊗ in eq. (7) can be seen as the multiplication of two moment-generating functions for distributions with moments ⟨u_k⟩_{0≤k≤m} and ⟨v_k⟩_{0≤k≤m}, respectively. We prove a composition property for moment semirings.

Lemma 3.2. For all u, v ∈ |R|, it holds that ⟨(u+v)^k⟩_{0≤k≤m} = ⟨u^k⟩_{0≤k≤m} ⊗ ⟨v^k⟩_{0≤k≤m}, where uⁿ abbreviates ∏_{i=1}^{n} u, for n ∈ ℤ₊ and u ∈ |R|.

3.3 Inference Rules

We present the derivation system as a declarative program logic that uses moment semirings to enable compositional reasoning and moment-polymorphic recursion.
Interval-valued moment semirings. Our derivation system infers upper and lower bounds simultaneously, rather than separately; this is essential for non-monotone costs. Consider a program “tick(−1); S” and suppose that ⟨1, 2, 5⟩ and ⟨1, −2, 5⟩ are the upper and lower bounds on the first two moments of the cost of S, respectively. If we use only the upper bound, we derive ⟨1, −1, 1⟩ ⊗ ⟨1, 2, 5⟩ = ⟨1, 1, 2⟩, which is not an upper bound on the moments of the cost of the whole program: if the actual moments of the cost of S are ⟨1, 0, 5⟩, then the actual moments of the cost of “tick(−1); S” are ⟨1, −1, 1⟩ ⊗ ⟨1, 0, 5⟩ = ⟨1, −1, 4⟩ ≰ ⟨1, 1, 2⟩. Thus, in the analysis, we instantiate moment semirings with the interval domain I. For the program “tick(−1); S”, the interval-valued bound on the first two moments is ⟨[1,1], [−1,−1], [1,1]⟩ ⊗ ⟨[1,1], [−2,2], [5,5]⟩ = ⟨[1,1], [−3,1], [2,10]⟩.

The expected-potential functions φ, φ₁, φ₂ in the conditions
  φ(σ) ⊒ 𝔼_{σ′∼⟦S⟧(σ)}[⟨C(σ, σ′)^k⟩_{0≤k≤m} ⊗ φ(σ′)],
  φ₁(σ) ⊕ φ₂(σ) ⊒ 𝔼_{σ′∼⟦S⟧(σ)}[⟨C(σ, σ′)^k⟩_{0≤k≤m} ⊗ (φ₁(σ′) ⊕ φ₂(σ′))]
therefore map program states to interval-valued vectors, where C(σ, σ′) is the cost of the computation from σ to σ′ and m is the degree of the target moment.

Template-based expected-potential functions. The basic approach to automated inference using potential functions is to introduce a template for the expected-potential functions.
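The interval computation above can be replayed mechanically. The sketch below (ours, not the paper's implementation) instantiates ⊗ with standard interval arithmetic and reproduces ⟨[1,1], [−1,−1], [1,1]⟩ ⊗ ⟨[1,1], [−2,2], [5,5]⟩ = ⟨[1,1], [−3,1], [2,10]⟩.

```python
from math import comb

def iadd(a, b):
    # Interval addition.
    return (a[0] + b[0], a[1] + b[1])

def imul(a, b):
    # Interval multiplication: min/max over the four end products.
    ps = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(ps), max(ps))

def iscale(n, a):
    # Scalar product n x [lo, hi] for a nonnegative integer n.
    return (n * a[0], n * a[1])

def motimes(u, v):
    # Eq. (7) with semiring elements taken to be intervals.
    m = len(u) - 1
    out = []
    for k in range(m + 1):
        acc = (0, 0)
        for i in range(k + 1):
            acc = iadd(acc, iscale(comb(k, i), imul(u[i], v[k - i])))
        out.append(acc)
    return out

q_tick = [(1, 1), (-1, -1), (1, 1)]   # exact moments of the constant cost -1
q_S = [(1, 1), (-2, 2), (5, 5)]       # interval bounds for the moments of S
assert motimes(q_tick, q_S) == [(1, 1), (-3, 1), (2, 10)]
```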
Let us fix m ∈ ℕ as the degree of the target moment. Because we use M_I^(m)-valued expected-potential functions, whose range is vectors of intervals, the templates are vectors of intervals whose ends are represented symbolically. In this paper, we represent the ends of the intervals by polynomials in ℝ[VID] over program variables.

More formally, we lift the interval semiring I to a symbolic interval semiring PI by representing the ends of the k-th interval by polynomials in ℝ_{kd}[VID], i.e., polynomials up to degree kd for some fixed d ∈ ℕ. Let M_PI^(m) be the m-th order moment semiring instantiated with the symbolic interval semiring. A potential annotation is then represented as Q = ⟨[L_k, U_k]⟩_{0≤k≤m} ∈ M_PI^(m), where the L_k's and U_k's are polynomials in ℝ_{kd}[VID]. Q defines an M_I^(m)-valued expected-potential function φ_Q(σ) ≝ ⟨[σ(L_k), σ(U_k)]⟩_{0≤k≤m}, where σ is a program state, and σ(L_k) and σ(U_k) are L_k and U_k evaluated over σ, respectively.

Selected inference rules (Fig. 6):
(Q-Tick): from Q = ⟨[c^k, c^k]⟩_{0≤k≤m} ⊗ Q′, conclude Δ ⊢_h {Γ; Q} tick(c) {Γ; Q′}.
(Q-Sample): from Γ ⊨ ∀x ∈ supp(μ_D): Γ′ and Q = 𝔼_{x∼μ_D}[Q′], conclude Δ ⊢_h {Γ; Q} x ∼ D {Γ′; Q′}.
(Q-Loop): from Δ ⊢_h {Γ ∧ L; Q} S {Γ; Q}, conclude Δ ⊢_h {Γ; Q} while L do S od {Γ ∧ ¬L; Q}.
(Q-Seq): from Δ ⊢_h {Γ; Q} S₁ {Γ′; Q′} and Δ ⊢_h {Γ′; Q′} S₂ {Γ″; Q″}, conclude Δ ⊢_h {Γ; Q} S₁; S₂ {Γ″; Q″}.
(Q-Prob): from Δ ⊢_h {Γ; Q₁} S₁ {Γ′; Q′}, Δ ⊢_h {Γ; Q₂} S₂ {Γ′; Q′}, and Q = P ⊕ R with P = ⟨[p, p], [0,0], …, [0,0]⟩ ⊗ Q₁ and R = ⟨[1−p, 1−p], [0,0], …, [0,0]⟩ ⊗ Q₂, conclude Δ ⊢_h {Γ; Q} if prob(p) then S₁ else S₂ fi {Γ′; Q′}.
(Q-Call-Mono): from (Γ; Q, Γ′; Q′) ∈ Δ_m(f), conclude Δ ⊢_m {Γ; Q} call f {Γ′; Q′}.
(Q-Call-Poly): from h < m, Δ_h(f) = (Γ; Q₁, Γ′; Q₁′), and Δ ⊢_{h+1} {Γ; Q₂} 𝒟(f) {Γ′; Q₂′}, conclude Δ ⊢_h {Γ; Q₁ ⊕ Q₂} call f {Γ′; Q₁′ ⊕ Q₂′}.

Figure 6. Selected inference rules of the derivation system.

Inference rules. We formalize our derivation system for moment analysis in a Hoare-logic style.
The judgment has the form Δ ⊢_h {Γ; Q} S {Γ′; Q′}, where S is a statement, {Γ; Q} is a precondition, {Γ′; Q′} is a postcondition, Δ = ⟨Δ_k⟩_{0≤k≤m} is a context of function specifications, and h ∈ ℤ₊ specifies restrictions on Q and Q′ that we explain below. The logical context Γ : (VID → ℝ) → {⊤, ⊥} is a predicate that describes the reachable states at a program point. The potential annotation Q ∈ M_PI^(m) specifies a map from program states to the moment semiring that is used to define interval-valued expected-potential functions. The semantics of the triple {·; Q} S {·; Q′} is that if the moments of the accumulated cost of the computation after executing S are bounded by φ_{Q′}, then the moments of the accumulated cost of the whole computation are bounded by φ_Q. The parameter h requires every i-th-moment component of Q and Q′ with i < h to be [0, 0]; we call such potential annotations h-restricted. This construction is motivated by an observation from Ex. 2.6, where we illustrated the benefits of carrying out interprocedural analysis using an “elimination sequence” of annotations for recursive function calls, in which successive annotations have a greater number of zeros, filling from the left.

Function specifications are valid pairs of pre- and post-conditions for all declared functions in a program. For each k such that 0 ≤ k ≤ m and each function f, a valid specification (Γ; Q, Γ′; Q′) ∈ Δ_k(f) is justified by the judgment Δ ⊢_k {Γ; Q} 𝒟(f) {Γ′; Q′}, where 𝒟(f) is the function body of f, and Q, Q′ are k-restricted. To perform context-sensitive analysis, a function can have multiple specifications.

Fig. 6 presents some of the inference rules. The rule (Q-Tick) is the only rule that deals with costs in a program; to accumulate the moments of the cost, we use the ⊗ operation of the moment semiring M_PI^(m). The rule (Q-Sample) accounts for sampling statements. Because “x ∼ D” randomly assigns a value to x in the support of the distribution D, we quantify x out universally from the logical context. To compute Q = 𝔼_{x∼μ_D}[Q′], where x is drawn from the distribution D, we assume that the moments of D are well-defined and computable, and substitute each x^i, i ∈ ℕ, with the corresponding moment in Q′. We can make this assumption because every component of Q′ is a polynomial over program variables. For example, if D = uniform(−1, 2), we know that
  𝔼_{x∼μ_D}[x⁰] = 1,  𝔼_{x∼μ_D}[x¹] = 1/2,  𝔼_{x∼μ_D}[x²] = 1,  𝔼_{x∼μ_D}[x³] = 5/4.
Then, for Q′ = ⟨[1, 1], [1 + x², xy + x³y]⟩, by the linearity of expectations, we compute Q = 𝔼_{x∼μ_D}[Q′] as follows:
  𝔼_{x∼μ_D}[Q′] = ⟨[1, 1], [𝔼_{x∼μ_D}[1 + x²], 𝔼_{x∼μ_D}[xy + x³y]]⟩
               = ⟨[1, 1], [1 + 𝔼_{x∼μ_D}[x²], y·𝔼_{x∼μ_D}[x] + y·𝔼_{x∼μ_D}[x³]]⟩
               = ⟨[1, 1], [2, 1/2·y + 5/4·y]⟩.
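The moment substitution above can be checked with exact arithmetic. In this sketch (ours), `umoment(k)` computes 𝔼[x^k] for x ∼ uniform(−1, 2) in closed form, and both ends of the second component of Q′ are evaluated by linearity at an arbitrary concrete value of y.

```python
from fractions import Fraction as F

def umoment(k):
    # E[x^k] for x ~ uniform(-1, 2): integrate x^k / 3 over [-1, 2].
    return F(2 ** (k + 1) - (-1) ** (k + 1), 3 * (k + 1))

assert [umoment(k) for k in range(4)] == [1, F(1, 2), 1, F(5, 4)]

# Q' = <[1,1], [1 + x^2, x*y + x^3*y]>: substitute moments of x by linearity.
y = F(3)                                    # an arbitrary concrete value for y
low_end = 1 + umoment(2)                    # E[1 + x^2]     = 2
high_end = y * umoment(1) + y * umoment(3)  # E[x*y + x^3*y] = (1/2 + 5/4) * y
assert low_end == 2
assert high_end == F(7, 4) * y
```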
The other probabilistic rule (Q-Prob) deals with probabilistic branching. Intuitively, if the moments of the executions of S₁ and S₂ are q₁ and q₂, respectively, and the moments of the accumulated cost of the computation after the branch statement are bounded by φ_{Q′}, then the moments of the whole computation should be bounded by a “weighted average” of (q₁ ⊗ φ_{Q′}) and (q₂ ⊗ φ_{Q′}) with respect to the branching probability p. We implement the weighted average by applying the combination operator ⊕ to ⟨[p, p], [0,0], …, [0,0]⟩ ⊗ q₁ ⊗ φ_{Q′} and ⟨[1−p, 1−p], [0,0], …, [0,0]⟩ ⊗ q₂ ⊗ φ_{Q′}, because the 0-th moments denote probabilities.

The rules (Q-Call-Poly) and (Q-Call-Mono) handle function calls. Recall that in Ex. 2.6, we use the ⊕ operator to combine multiple potential functions for a function to reason about recursive function calls. The restriction parameter h ensures that the derivation system only needs to reason about finitely many post-annotations for each call site. In rule (Q-Call-Poly), where h is smaller than the target moment m, we fetch the pre- and post-condition Q₁, Q₁′ for the function f from the specification context Δ_h.

func rdwalk() begin
  { x < d + 2;  ⟨[1,1], [2(d−x), 2(d−x)+4], [4(d−x)²+6(d−x)−4, 4(d−x)²+22(d−x)+28]⟩ }
  if x < d then
    { x < d;  ⟨[1,1], [2(d−x), 2(d−x)+4], [4(d−x)²+6(d−x)−4, 4(d−x)²+22(d−x)+28]⟩ }
    t ∼ uniform(−1, 2);
    { x < d ∧ t ≤ 2;  ⟨[1,1], [2(d−x−t)+1, 2(d−x−t)+5], [4(d−x−t)²+10(d−x−t)−3, 4(d−x−t)²+26(d−x−t)+37]⟩ }
    x := x + t;
    { x < d + 2;  ⟨[1,1], [2(d−x)+1, 2(d−x)+5], [4(d−x)²+10(d−x)−3, 4(d−x)²+26(d−x)+37]⟩ }
    call rdwalk;
    { ⊤;  ⟨[1,1], [1,1], [1,1]⟩ }
    tick(1)
    { ⊤;  ⟨[1,1], [0,0], [0,0]⟩ }
  fi
end

Figure 7. The rdwalk function with annotations for the interval bounds on the first and second moments.

In Ex. 3.4, the pre-annotation ⟨[0,0], [0, q_{x²}·x² + q_x·x + q_1]⟩ and post-annotation ⟨[0,0], [0, q′_{x²}·x² + q′_x·x + q′_1]⟩ use polynomials in x up to degree 2 as templates, where q_{x²}, q_x, q_1, q′_{x²}, q′_x, q′_1 are unknown numeric coefficients. By (Q-Sample), we generate constraints that perform “partial evaluation” on the polynomials by substituting x with the moments of uniform(−1, 2). Let D denote uniform(−1, 2). Because
  𝔼_{x∼μ_D}[q′_{x²}·x² + q′_x·x + q′_1] = q′_{x²}·1 + q′_x·(1/2) + q′_1·1,
we generate the linear constraints q_{x²} = 0, q_x = 0, and q_1 = q′_{x²} + q′_x/2 + q′_1.

Constraint generation for the other inference rules is similar. For example, consider the loop rule (Q-Loop): instead of computing the loop invariant Q explicitly, our system represents Q directly as a template with unknown coefficients, uses Q as the post-annotation to analyze the loop body and obtain a pre-annotation, and finally generates linear constraints stating that the pre-annotation equals Q.
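The interval bounds annotated in Fig. 7 can be sanity-checked by simulation. The following sketch (ours) encodes rdwalk as a plain loop instead of Appl's recursion, starts from x = 0 with d = 5, and compares the empirical first and second moments of tick against the upper interval ends 2(d−x)+4 and 4(d−x)²+22(d−x)+28.

```python
import random
random.seed(1)

def rdwalk_ticks(x, d):
    # One run of rdwalk: step x by uniform(-1, 2) until x >= d; tick(1) per step.
    ticks = 0
    while x < d:
        x += random.uniform(-1, 2)
        ticks += 1
    return ticks

d = 5.0
runs = [rdwalk_ticks(0.0, d) for _ in range(20000)]
m1 = sum(runs) / len(runs)                  # empirical E[tick]
m2 = sum(t * t for t in runs) / len(runs)   # empirical E[tick^2]
assert m1 <= 2 * d + 4                      # upper end of first-moment interval
assert m2 <= 4 * d * d + 22 * d + 28        # upper end of second-moment interval
```

Since the mean step size is 1/2, the empirical mean is close to 2d, comfortably within the derived bound 2d + 4.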
We then combine it with a frame of (h+1)-restricted potential annotations Q₂, Q₂′ for the function f. The frame accounts for the interval bounds on the moments of the computation after the function call, which is needed for most non-tail-recursive programs. When h reaches the target moment m, we use the rule (Q-Call-Mono) to reason moment-monomorphically, because setting h to m+1 would imply that the frame can only be ⟨[0,0], [0,0], …, [0,0]⟩.

Example 3.3. Fig. 7 presents the logical context and the complete potential annotation for the first and second moments of the cost accumulator tick of the rdwalk function from Ex. 2.1. Similarly to the reasoning in Ex. 2.6, we can justify the derivation using moment-polymorphic recursion and the moment bounds for rdwalk with post-annotations ⟨[0,0], [1,1], [1,1]⟩ and ⟨[0,0], [0,0], [2,2]⟩.

Details of the linear-constraint generation of our system are included in the technical report [38].

The LP solver not only finds assignments to the coefficients that satisfy the constraints; it can also optimize a linear objective function. In our central-moment analysis, we construct an objective function that tries to minimize imprecision. For example, consider upper bounds on the variance. We randomly pick a concrete valuation of program variables that satisfies the precondition (e.g., d > 0 in Fig. 2), and then substitute the program variables with the concrete valuation in the polynomial for the upper bound on the variance (obtained from bounds on the raw moments). The resulting linear combination of coefficients, which we set as the objective function, stands for the variance under the concrete valuation. Thus, minimizing the objective function produces the most precise upper bound on the variance under that specific valuation.
Also, we can extract a symbolic upper bound on the variance from the assignments to the coefficients. Because the derivation of the bounds uses only the given precondition, the symbolic bounds apply to all valuations that satisfy the precondition.

3.4 Automatic Linear-Constraint Generation

We adapt existing techniques [8, 29] to automate our inference system by (i) using an abstract interpreter to infer logical contexts, (ii) generating templates and linear constraints by inductively applying the derivation rules to the analyzed program, and (iii) employing an off-the-shelf LP solver to discharge the linear constraints. During the generation phase, the coefficients of the monomials in the polynomials at the ends of the intervals in every quantitative context Q ∈ M_PI^(m) are recorded as symbolic names, and the inequalities among those coefficients (derived from the inference rules in Fig. 6) are emitted to the LP solver.

Example 3.4. We demonstrate linear-constraint generation for the upper bound on the first moment of the sampling statement x ∼ uniform(−1, 2), with a pre-annotation ⟨[0,0], [0, q_{x²}·x² + q_x·x + q_1]⟩ and a post-annotation ⟨[0,0], [0, q′_{x²}·x² + q′_x·x + q′_1]⟩.

4 Soundness of Higher-Moment Analysis

In this section, we study the soundness of our derivation system for higher-moment analysis. We first present a Markov-chain semantics for the probabilistic programming language Appl in order to reason about how stepwise costs contribute to the globally accumulated cost (§4.1). We then formulate higher-moment analysis with respect to this semantics and prove the soundness of our derivation system based on a recent extension of the Optional Stopping Theorem (§4.2). Finally, we sketch the algorithmic approach for ensuring the soundness of our analysis (§4.3).

4.1 A Markov-Chain Semantics

Operational semantics. We start with a small-step operational semantics with continuations, which we use later to construct the Markov-chain semantics. We follow a distribution-based approach [6, 23] to define an operational cost semantics for Appl; full details of the semantics are included in the technical report [38]. A program configuration σ ∈ Σ is a quadruple ⟨γ, S, K, α⟩, where γ : VID → ℝ is a program state that maps variables to values, S is the statement being executed, K is a continuation that describes what remains to be done after the execution of S, and α ∈ ℝ is the global cost accumulator. An execution of an Appl program ⟨𝒟, S_main⟩ is initialized with ⟨λ_.0, S_main, K_stop, 0⟩, and the termination configurations have the form ⟨_, skip, K_stop, _⟩, where K_stop is the empty continuation.

Unlike a standard semantics, in which each program configuration steps to at most one new configuration, a probabilistic semantics may pick several different new configurations.

A Markov-chain semantics. In this work, we harness Markov-chain-based reasoning [22, 31] to develop a Markov-chain cost semantics for Appl, based on the evaluation relation σ ↦ μ. An advantage of this approach is that it allows us to study how the cost of every single evaluation step contributes to the accumulated cost at the exit of the program. Details of this semantics are included in the technical report [38]. Let (Ω, F, ℙ) be the probability space in which Ω ≝ Σ^{ℤ₊} is the set of all infinite traces over program configurations, F is a σ-algebra on Ω, and ℙ is the probability measure on (Ω, F) obtained from the evaluation relation σ ↦ μ and the initial configuration ⟨λ_.0, S_main, K_stop, 0⟩. Intuitively, ℙ specifies the distribution over all possible executions of a probabilistic program. The probability of an assertion θ with respect to ℙ, written ℙ[θ], is defined as ℙ({ω | θ(ω) is true}).

To formulate the accumulated cost at the exit of the program, we define the stopping time T : Ω → ℤ₊ ∪ {∞} of a probabilistic program as a random variable on the probability space (Ω, F, ℙ) of program traces:
  T(ω) ≝ inf {n ∈ ℤ₊ | ω_n = ⟨_, skip, K_stop, _⟩},
i.e., T(ω) is the number of evaluation steps before the trace ω reaches some termination configuration ⟨_, skip, K_stop, _⟩. We define the accumulated cost A_T : Ω → ℝ with respect to the stopping time T as A_T(ω) ≝ A_{T(ω)}(ω), where A_n : Ω → ℝ captures the accumulated cost at the n-th evaluation step for n ∈ ℤ₊, defined as A_n(ω) ≝ α_n, where ω_n = ⟨_, _, _, α_n⟩. The m-th moment of the accumulated cost is then given by the expectation 𝔼[A_T^m] with respect to ℙ.
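To make T and A_T concrete, consider a loop that at each step exits with probability 1/2 and otherwise ticks cost 1 (the shape of the loop from Counterexample 2.7; the per-iteration cost of 1 is our assumption for illustration). Then ℙ[A_T = n] = (1/2)^{n+1}, and the first two moments of A_T follow by summing the series, truncated here at a depth where the remaining mass is negligible:

```python
# P[A_T = n] = (1/2)^(n+1): n ticks of cost 1, then an exit with probability 1/2.
dist = [(n, 0.5 ** (n + 1)) for n in range(200)]
mass = sum(p for _, p in dist)
first = sum(n * p for n, p in dist)       # E[A_T]
second = sum(n * n * p for n, p in dist)  # E[A_T^2]
assert abs(mass - 1.0) < 1e-12
assert abs(first - 1.0) < 1e-12           # matches the expected cost of one
assert abs(second - 3.0) < 1e-12
```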
The evaluation relation for Appl has the form σ ↦ μ, where μ ∈ 𝔻(Σ) is a probability measure over configurations. Below are two example rules. The rule (E-Prob) constructs a distribution whose support has exactly two elements, which stand for the two branches of the probabilistic choice. We write δ(σ) for the Dirac measure at σ, defined as λA.[σ ∈ A], where A is a measurable subset of Σ. We also write p·μ₁ + (1−p)·μ₂ for the convex combination of measures μ₁ and μ₂ with p ∈ [0, 1], defined as λA. p·μ₁(A) + (1−p)·μ₂(A). The rule (E-Sample) “pushes” the probability distribution of D to a distribution over post-sampling program configurations.

(E-Prob): if S = (if prob(p) then S₁ else S₂ fi), then
  ⟨γ, S, K, α⟩ ↦ p·δ(⟨γ, S₁, K, α⟩) + (1−p)·δ(⟨γ, S₂, K, α⟩).
(E-Sample):
  ⟨γ, x ∼ D, K, α⟩ ↦ λA. μ_D({r | ⟨γ[x ↦ r], skip, K, α⟩ ∈ A}).

Example 4.1. Suppose that a random-sampling statement is being executed, i.e., the current configuration is ⟨{t ↦ t₀}, (t ∼ uniform(−1, 2)), K₀, α₀⟩. The probability measure for the uniform distribution is λO. ∫_O [−1 ≤ x ≤ 2]/3 dx. Thus, by the rule (E-Sample), we derive the post-sampling probability measure over configurations:
  λA. ∫_ℝ [⟨{t ↦ r}, skip, K₀, α₀⟩ ∈ A] · [−1 ≤ r ≤ 2]/3 dr.

4.2 Soundness of the Derivation System

Proofs for this section are included in the technical report [38].

The expected-potential method for moment analysis. We fix a degree m ∈ ℕ and let M_I^(m) be the m-th order moment semiring instantiated with the interval semiring I. We now define M_I^(m)-valued expected-potential functions.

Definition 4.2. A measurable map φ : Σ → M_I^(m) is said to be an expected-potential function if
(i) φ(σ) = 𝟏 if σ = ⟨_, skip, K_stop, _⟩, and
(ii) φ(σ) ⊒ 𝔼_{σ′∼↦(σ)}[⟨[(α′−α)^k, (α′−α)^k]⟩_{0≤k≤m} ⊗ φ(σ′)], where σ = ⟨_, _, _, α⟩ and σ′ = ⟨_, _, _, α′⟩, for all σ ∈ Σ.
Intuitively, φ(σ) is an interval bound on the moments of the accumulated cost of the computation that continues from the configuration σ. We define Φ_n and Y_n, for n ∈ ℤ₊, to be M_I^(m)-valued random variables on the probability space (Ω, F, ℙ) of the Markov-chain semantics:
  Φ_n(ω) ≝ φ(ω_n),   Y_n(ω) ≝ ⟨[A_n(ω)^k, A_n(ω)^k]⟩_{0≤k≤m} ⊗ Φ_n(ω).
In the definition of Y_n, we use ⊗ to compose the powers of the accumulated cost at step n with the expected-potential function that stands for the moments of the accumulated cost of the rest of the computation.

Lemma 4.3. By the properties of expected-potential functions, we can prove that 𝔼[Y_{n+1} | Y_n] ⊑ Y_n almost surely, for all n ∈ ℤ₊.

We call {Y_n}_{n∈ℤ₊} a moment invariant. Our goal is to establish that 𝔼[Y_T] ⊑ 𝔼[Y_0], i.e., that the initial interval-valued potential
  𝔼[Y_0] = 𝔼[𝟏 ⊗ Φ_0] = 𝔼[Φ_0]
brackets the higher moments of the accumulated cost:
  𝔼[Y_T] = 𝔼[⟨[A_T^k, A_T^k]⟩_{0≤k≤m} ⊗ 𝟏] = ⟨[𝔼[A_T^k], 𝔼[A_T^k]]⟩_{0≤k≤m}.

Soundness. The soundness of the derivation system is proved with respect to the Markov-chain semantics. Let ‖⟨[a_k, b_k]⟩_{0≤k≤m}‖_∞ ≝ max_{0≤k≤m} max{|a_k|, |b_k|}.

Theorem 4.4. Let ⟨𝒟, S_main⟩ be a probabilistic program. Suppose that Δ ⊢ {Γ; Q} S_main {Γ′; 𝟏}, where Q ∈ M_PI^(m) and the ends of the k-th interval in Q are polynomials in ℝ_{kd}[VID]. Let {Y_n}_{n∈ℤ₊} be the moment invariant extracted from the Markov-chain semantics with respect to the derivation of Δ ⊢ {Γ; Q} S_main {Γ′; 𝟏}. If the following conditions hold:
(i) 𝔼[T^{md}] < ∞, and
(ii) there exists C ≥ 0 such that for all n ∈ ℤ₊, ‖Y_n‖_∞ ≤ C·(n+1)^{md} almost surely,
then ⟨[𝔼[A_T^k], 𝔼[A_T^k]]⟩_{0≤k≤m} ⊑ φ_Q(λ_.0).

The intuitive meaning of ⟨[𝔼[A_T^k], 𝔼[A_T^k]]⟩_{0≤k≤m} ⊑ φ_Q(λ_.0) is that each moment 𝔼[A_T^k] of the accumulated cost upon program termination is bounded by the interval in the k-th-moment component of φ_Q(λ_.0), where Q is the quantitative context and λ_.0 is the initial state.

As we discussed in §2.2 and Ex. 2.7, the expected-potential method is not always sound for deriving bounds on the higher moments of cost accumulators in probabilistic programs. The extra conditions (i) and (ii) of Thm. 4.4 impose constraints on the analyzed program and on the expected-potential function, which allow us to reduce soundness to the optional-stopping problem from probability theory. We extend the Optional Stopping Theorem (OST) with a new sufficient condition that allows us to prove Thm. 4.4; we discuss the details of our extended OST in a companion paper [39], and in this work we focus on the derivation system for central moments.

4.3 An Algorithm for Checking Soundness Criteria

Termination analysis. We reuse our system for automatically deriving higher moments, developed in §3.3 and §3.4, to check whether 𝔼[T^{md}] < ∞ (Thm. 4.4(i)). To reason about termination time, we assume that every program statement increments the cost accumulator by one. For example, the inference rule (Q-Sample) becomes: from Γ ⊨ ∀x ∈ supp(μ_D): Γ′ and Q = ⟨1, 1, …, 1⟩ ⊗ 𝔼_{x∼μ_D}[Q′], conclude Δ ⊢ {Γ; Q} x ∼ D {Γ′; Q′}. However, we cannot apply Thm. 4.4 itself to justify the soundness of the termination-time analysis, because that would introduce a circular dependence.
Instead, we use a different proof technique to reason about 𝔼[T^{md}], taking into account the monotonicity of runtimes. Because upper-bound analysis of higher moments of runtimes has been studied by Kura et al. [24], we skip the details here and include them in the technical report [38].

Boundedness of ‖Y_n‖_∞, n ∈ ℤ₊. To ensure that condition (ii) of Thm. 4.4 holds, we check whether the analyzed program satisfies the bounded-update property: every (deterministic or probabilistic) assignment to a program variable changes the variable by at most a constant C almost surely. Then the absolute value of every program variable at evaluation step n is bounded by C·n = O(n), and hence a polynomial of degree up to ℓ ∈ ℕ over program variables is bounded by O(n^ℓ) at evaluation step n. As observed by Wang et al. [41], bounded updates are common in practice.
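The bounded-update property lends itself to a simple syntactic check. The sketch below (ours; it assumes a hypothetical tuple-based mini-AST rather than the tool's CFG-based IR) accepts increments by a constant and samples from distributions whose values lie within the allowed change, and conservatively rejects everything else:

```python
def bounded_update(stmt, c):
    """Return True if the assignment changes its variable by at most c (a.s.).

    stmt is a hypothetical mini-AST node:
      ('incr', var, k)                      x := x + k, k a numeric constant
      ('sample', var, ('uniform', a, b))    x ~ uniform(a, b)
    """
    kind = stmt[0]
    if kind == 'incr':
        _, _, k = stmt
        return abs(k) <= c
    if kind == 'sample':
        _, _, (dist, a, b) = stmt
        # uniform(a, b) has bounded support, so the update is bounded.
        return dist == 'uniform' and max(abs(a), abs(b)) <= c
    return False  # conservatively reject anything else

assert bounded_update(('incr', 'x', 1.0), 2.0)
assert bounded_update(('sample', 't', ('uniform', -1, 2)), 2.0)
assert not bounded_update(('incr', 'x', 5.0), 2.0)
```

A real checker would traverse all assignments in the program and take C as the maximum over the per-statement constants.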
Optional stopping. Let us represent the moment invariant {Y_n}_{n∈ℤ₊} as
  {⟨[L_n^(0), U_n^(0)], [L_n^(1), U_n^(1)], …, [L_n^(m), U_n^(m)]⟩}_{n∈ℤ₊},
where L_n^(k), U_n^(k) : Ω → ℝ are real-valued random variables on the probability space (Ω, F, ℙ) of the Markov-chain semantics, for n ∈ ℤ₊ and 0 ≤ k ≤ m. We then have the following observations as direct corollaries of Lem. 4.3:
• For each k, the sequence {U_n^(k)}_{n∈ℤ₊} satisfies 𝔼[U_{n+1}^(k) | U_n^(k)] ≤ U_n^(k) almost surely for all n ∈ ℤ₊, and we want to find sufficient conditions for 𝔼[U_T^(k)] ≤ 𝔼[U_0^(k)].
• For each k, the sequence {L_n^(k)}_{n∈ℤ₊} satisfies 𝔼[L_{n+1}^(k) | L_n^(k)] ≥ L_n^(k) almost surely for all n ∈ ℤ₊, and we want to find sufficient conditions for 𝔼[L_T^(k)] ≥ 𝔼[L_0^(k)].
These questions reduce to the optional-stopping problem from probability theory. Recent research [1, 17, 35, 41] has used the Optional Stopping Theorem (OST) to establish sufficient conditions for the soundness of analyses of probabilistic programs. However, the classic OST turns out to be unsuitable for higher-moment analysis.

5 Tail-Bound Analysis

One application of our central-moment analysis is to bound the probability that the accumulated cost deviates from a given quantity. In this section, we sketch how we produce the tail bounds shown in Fig. 1(c). There are many concentration-of-measure inequalities in probability theory [12]; among them, one of the most important is Markov's inequality:

Proposition 5.1. If X is a nonnegative random variable and a > 0, then ℙ[X ≥ a] ≤ 𝔼[X^k]/a^k for any k ∈ ℕ.

Recall that Fig. 1(b) presents the upper bounds 𝔼[tick] ≤ 2d + 4 and 𝔼[tick²] ≤ 4d² + 22d + 28 on the raw moments of the cost accumulator tick. With Markov's inequality, we derive the following tail bounds:
  ℙ[tick ≥ 4d] ≤ 𝔼[tick]/(4d) ≤ (2d + 4)/(4d) → 1/2 as d → ∞,              (8)
  ℙ[tick ≥ 4d] ≤ 𝔼[tick²]/(4d)² ≤ (4d² + 22d + 28)/(16d²) → 1/4 as d → ∞.  (9)

Note that (9) provides an asymptotically more precise bound on ℙ[tick ≥ 4d] than (8) does as d approaches infinity. Central-moment analysis can obtain an even more precise tail bound. As presented in Fig. 1(b), our analysis derives 핍[tick] ≤ 22d + 28 for the variance of tick. We can now employ concentration inequalities that involve the variances of random variables. Recall Cantelli's inequality:

Proposition 5.2. If X is a random variable and a > 0, then ℙ[X − 𝔼[X] ≥ a] ≤ 핍[X]/(핍[X] + a²) and ℙ[X − 𝔼[X] ≤ −a] ≤ 핍[X]/(핍[X] + a²).

With Cantelli's inequality, we obtain the following tail bound, where we assume d ≥ 2:
  ℙ[tick ≥ 4d] = ℙ[tick − (2d + 4) ≥ 2d − 4]
              ≤ ℙ[tick − 𝔼[tick] ≥ 2d − 4]
              ≤ 핍[tick]/(핍[tick] + (2d − 4)²)
              = 1 − (2d − 4)²/(핍[tick] + (2d − 4)²)
              ≤ (22d + 28)/(4d² + 6d + 44) → 0 as d → ∞.                    (10)
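Bounds (8), (9), and (10) are straightforward to compare numerically. The following sketch evaluates them at d = 20, with the raw-moment and variance bounds taken from Fig. 1(b); at this d, the Cantelli-based bound (10) is the tightest of the three.

```python
d = 20
e1 = 2 * d + 4                        # E[tick]   <= 2d + 4          (Fig. 1(b))
e2 = 4 * d * d + 22 * d + 28          # E[tick^2] <= 4d^2 + 22d + 28 (Fig. 1(b))
var = 22 * d + 28                     # V[tick]   <= 22d + 28        (Fig. 1(b))

b8 = e1 / (4 * d)                     # eq. (8),  Markov with E[X]
b9 = e2 / (4 * d) ** 2                # eq. (9),  Markov with E[X^2]
b10 = var / (var + (2 * d - 4) ** 2)  # eq. (10), Cantelli
assert b10 < b9 < b8
```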
We include [ ] + ( − ) the case study in the technical report [38]. 2푑 4 2 22푑 28 푑 = 1 ( − ) + →∞ 0. For the third question, we conducted case studies on two − 핍 tick 2푑 4 2 ≤ 4푑2 6푑 44 −−−−−→ [ ] + ( − ) + + sets of synthetic benchmark programs: For all 푑 15, (10) gives a more precise bound than both coupon-collector programs with 푁 coupons (푁 1, 103 ), ≥ • ∈ [ ] (8) and (9). It is also clear from Fig.1(c), where we plot the where each program is implemented as a set of tail- three tail bounds (8), (9), and (10), that the asymptotically recursive functions, each of which represents a state of most precise bound is the one obtained via variances. coupon collection, i.e., the number of coupons collected In general, for higher central moments, we employ Cheby- so far; and shev’s inequality to derive tail bounds: random-walk programs with 푁 consecutive one- • dimensional random walks (푁 1, 103 ), each of which Proposition 5.3. If 푋 is a random variable and 푎 > 0, then ∈ [ ] 피 푋 피 푋 2푘 starts at a position that equals the number of steps taken ℙ 푋 피 푋 푎 [( − [ ]) ] for any 푘 ℕ. [| − [ ]| ≥ ] ≤ 푎2푘 ∈ by the previous random walk to reach the ending position In our experiments, we use Chebyshev’s inequality to (the origin). Each program is implemented as a set of derive tail bounds from the fourth central moments. We will non-tail-recursive functions, each of which represents a show in Fig.8 that these tail bounds can be more precise random walk. The random walks in the same program than those obtained from both raw moments and variances. can have different transition probabilities. The largest synthetic program has nearly 16,000 LOC. We 6 Implementation and Experiments then ran our tool to derive an upper bound on the fourth (resp., second) central moment of the runtime for each Implementation. Our tool is implemented in OCaml, and coupon-collector (resp., random-walk) program. consists of about 5,300 LOC. 
The tool works on imperative The experiments were performed on a machine with an arithmetic probabilistic programs using a CFG-based IR [37]. Intel Core i7 3.6GHz processor and 16GB of RAM under The language supports recursive functions, continuous dis- macOS Catalina 10.15.7. tributions, unstructured control-flow, and local variables. To infer the bounds on the central moments for a cost accumu- Results. Some of the evaluation results to answer the sec- lator in a program, the user needs to specify the order of ond research question are presented in Tab.1. The programs the analyzed moment, and a maximal degree for the poly- (1-1) and (1-2) are coupon-collector problems with a total nomials to be used in potential-function templates. Using of two and four coupons, respectively. The programs (2-1) APRON [20], we implemented an interprocedural numeric and (2-2) are one-dimensional random walks with integer- analysis to infer the logical contexts used in the derivation. valued and real-valued coordinates, respectively. We omit We use the off-the-shelf solver Gurobi [26] for LP solving. three other programs here but include the full results in the Evaluation setup. We evaluated our tool to answer the technical report [38]. The table contains the inferred upper following three research questions: bounds on the moments for runtimes of these programs, and Central Moment Analysis for Cost Accumulators in Probabilistic Programs PLDI ’21, June 20–25, 2021, Virtual, Canada
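The comparison among the tail bounds (8), (9), and (10) is easy to check numerically. The following is a minimal sketch (not part of the tool) that evaluates the three bounds using the moment bounds from Fig. 1(b), namely 피[tick] ≤ 2푑 + 4, 피[tick²] ≤ 4푑² + 22푑 + 28, and 핍[tick] ≤ 22푑 + 28:

```python
# Sketch: the three upper bounds on P[tick >= 4d] from the text.

def markov_raw1(d):
    # bound (8): Markov on the first raw moment
    return (2 * d + 4) / (4 * d)

def markov_raw2(d):
    # bound (9): Markov on the second raw moment
    return (4 * d * d + 22 * d + 28) / (16 * d * d)

def cantelli(d):
    # bound (10): Cantelli with the variance bound, valid for d >= 2
    v = 22 * d + 28            # variance bound V[tick] <= 22d + 28
    a = 2 * d - 4              # deviation: 4d - (2d + 4)
    return v / (v + a * a)

for d in (15, 100, 1000):
    print(d, markov_raw1(d), markov_raw2(d), cantelli(d))
```

At 푑 = 15 the Cantelli bound already beats both Markov-style bounds, and as 푑 grows the three bounds approach 1/2, 1/4, and 0, respectively, matching the limits stated in (8)–(10).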

Table 1. Upper bounds on the raw/central moments of runtimes, with comparison to Kura et al. [24]. "T/O" stands for timeout after 30 minutes. "N/A" indicates that the tool is not applicable. "-" indicates that the tool fails to infer a bound. Entries with more precise results or less analysis time are marked in bold. Full results are included in the technical report [38].

                              this work              Kura et al. [24]
program  moment       upper bnd.   time (sec)   upper bnd.   time (sec)
(1-1)    2nd raw           201        0.019          201        0.015
         3rd raw          3829        0.019         3829        0.020
         4th raw         90705        0.023        90705        0.027
         2nd central        32        0.029          N/A          N/A
         4th central      9728        0.058          N/A          N/A
(1-2)    2nd raw          2357        1.068         3124        0.037
         3rd raw        148847        1.512       171932        0.062
         4th raw      11285725        1.914     12049876        0.096
         2nd central       362        3.346          N/A          N/A
         4th central    955973        9.801          N/A          N/A
(2-1)    2nd raw          2320        0.016         2320       11.380
         3rd raw        691520        0.018            -       16.056
         4th raw     340107520        0.021            -       23.414
         2nd central      1920        0.026          N/A          N/A
         4th central 289873920        0.049          N/A          N/A
(2-2)    2nd raw          8375        0.022         8375       38.463
         3rd raw       1362813        0.028            -       73.408
         4th raw     306105209        0.035            -      141.072
         2nd central      5875        0.029          N/A          N/A
         4th central 447053126        0.086          N/A          N/A

[Figure 8 (plots omitted): Upper bound of the tail probability ℙ[푇 ≥ 푑] as a function of 푑 for programs (1-1), (1-2), (2-1), and (2-2), with comparison to Kura et al. [24]. Each gray line is the minimum of tail bounds obtained from the raw moments of degree up to four inferred by Kura et al. [24]. Green lines and red lines are the tail bounds given by 2nd and 4th central moments inferred by our tool, respectively. We include the plots for other programs in the technical report [38].]

We compared our results with Kura et al.'s inference tool for raw moments [24]. Our tool is as precise as, and sometimes more precise than, the prior work on all the benchmark programs. Meanwhile, our tool is able to infer an upper bound on the raw moments of degree up to four on all the benchmarks, while the prior work reports failure on some higher moments for the random-walk programs. In terms of efficiency, our tool completed each example in less than 10 seconds, while the prior work took more than a few minutes on some programs. One reason why our tool is more efficient is that we always reduce higher-moment inference with non-linear polynomial templates to efficient LP solving, whereas the prior work requires semidefinite programming (SDP) for polynomial templates.

Besides raw moments, our tool is also capable of inferring upper bounds on the central moments of runtimes for the benchmarks. To evaluate the quality of the inferred central moments, Fig. 8 plots the upper bounds of tail probabilities on runtimes 푇 obtained by Kura et al. [24], and those obtained by our central-moment analysis. Specifically, the prior work uses Markov's inequality (Prop. 5.1), while we are also able to apply Cantelli's and Chebyshev's inequalities (Props. 5.2 and 5.3) with central moments. Our tool outperforms the prior work on programs (1-1) and (1-2), and derives better tail bounds when 푑 is large on program (2-2), while it obtains similar curves on program (2-1).
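To illustrate how the Tab. 1 entries translate into the Fig. 8 curves, the sketch below compares Markov's inequality on the fourth raw moment against Chebyshev's inequality on the fourth central moment for program (1-1). The mean 피[푇] = 13 is an assumption on our part (it is consistent with the tabulated bounds, since 201 − 13² = 32), not a value reported by the tool:

```python
# Tail bounds for program (1-1), using the Tab. 1 moment bounds:
#   E[T^4] <= 90705 (4th raw), E[(T - E[T])^4] <= 9728 (4th central).
raw4, central4 = 90705.0, 9728.0
mean = 13.0  # assumed mean, consistent with E[T^2] <= 201 and V[T] <= 32

def markov4(d):
    # Markov (Prop. 5.1): P[T >= d] = P[T^4 >= d^4] <= E[T^4] / d^4
    return min(1.0, raw4 / d**4)

def chebyshev4(d):
    # Chebyshev (Prop. 5.3): P[T >= d] <= P[|T - E[T]| >= d - mean]
    #                                  <= E[(T - E[T])^4] / (d - mean)^4
    return min(1.0, central4 / (d - mean)**4)  # requires d > mean

for d in (20, 30, 40):
    print(d, markov4(d), chebyshev4(d))
```

As in Fig. 8, the raw-moment bound wins for small 푑, but the central-moment bound becomes tighter once 푑 is sufficiently far above the mean (here, around 푑 ≈ 30).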
Scalability. In Fig. 9, we demonstrate the running times of our tool on the two sets of synthetic benchmark programs; Fig. 9a plots the analysis times for coupon-collector programs as a function of 푁 (the total number of coupons), and Fig. 9b plots the analysis times for random-walk programs as a function of 푁 (the total number of random walks). The evaluation shows that our tool achieves good scalability in both case studies: the running time is almost a linear function of the program size, which is the number of recursive functions for both case studies. Two reasons why our tool is scalable on the two sets of programs are (i) our analysis is compositional and uses function summaries to analyze function calls, and (ii) for a fixed set of templates and a fixed diameter of the call graph, the number of linear constraints generated by our tool grows linearly with the size of the program, and the LP solvers available nowadays can handle large problem instances efficiently.

Discussion. Higher central moments can also provide more information about the shape of a distribution, e.g., the skewness (i.e., 피[(푇 − 피[푇])³]/(핍[푇])^(3/2)) indicates how lopsided the distribution of 푇 is, and the kurtosis (i.e., 피[(푇 − 피[푇])⁴]/(핍[푇])²) measures the heaviness of the tails of the distribution of 푇. We used our tool to analyze two variants of the random-walk program (2-1). The two random walks have different transition probabilities and step lengths, but they have the same expected runtime 피[푇].
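Skewness and kurtosis are obtained from the first four moments by the standard raw-to-central conversions; a minimal sketch (independent of our tool, with illustrative inputs) is:

```python
# Shape statistics from the first four raw moments m1..m4,
# via the standard raw-to-central moment conversions.
def shape_stats(m1, m2, m3, m4):
    var = m2 - m1**2                           # variance (2nd central moment)
    c3 = m3 - 3*m1*m2 + 2*m1**3                # 3rd central moment
    c4 = m4 - 4*m1*m3 + 6*m1**2*m2 - 3*m1**4   # 4th central moment
    return c3 / var**1.5, c4 / var**2          # skewness, kurtosis

# Example: Exponential(1) has raw moments k! = 1, 2, 6, 24,
# which yields skewness 2 and kurtosis 9.
print(shape_stats(1.0, 2.0, 6.0, 24.0))
```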

Tab. 2 presents the skewness and kurtosis derived from the moment bounds inferred by our tool.

Table 2. Skewness & kurtosis.

program    skewness    kurtosis
rdwalk-1     2.1362     10.5633
rdwalk-2     2.9635     17.5823

A positive skew indicates that the mass of the distribution is concentrated on the left, and a larger skew means the concentration is more to the left. A larger kurtosis, on the other hand, indicates that the distribution has fatter tails. Therefore, as the derived skewness and kurtosis indicate, the distribution of the runtime 푇 for rdwalk-2 should be more left-leaning and have fatter tails than the distribution for rdwalk-1. Density estimates for the runtime 푇, obtained by simulation, are shown in Fig. 10.

[Figure 9 (plots omitted): Running times of our tool on two sets of synthetic benchmark programs: (a) coupon collector, (b) random walk. Each figure plots the running time (ms) as a function of the size of the analyzed program (푁 from 0 to 1000).]

[Figure 10 (plot omitted): Density estimation for the runtime 푇 of two variants rdwalk-1 and rdwalk-2 of (2-1).]

Our tool can also derive symbolic bounds on higher moments. The table below presents the inferred upper bounds on the variances for the random-walk benchmarks, where we replace the concrete inputs with symbolic pre-conditions.

program    pre-condition    upper bnd. on the variance
(2-1)      푥 ≥ 0            1920푥
(2-2)      푥 ≥ 0            2166.6667푥 + 1541.6667

7 Conclusion

We have presented an automatic central-moment analysis for probabilistic programs that support general recursion and continuous distributions, by deriving symbolic interval bounds on higher raw moments for cost accumulators. We have proposed moment semirings for compositional reasoning and moment-polymorphic recursion for interprocedural analysis. We have proved soundness of our technique using a recent extension to the Optional Stopping Theorem. The effectiveness of our technique has been demonstrated with our prototype implementation and the analysis of a broad suite of benchmarks, as well as a tail-bound analysis.

In the future, we plan to go beyond arithmetic programs and add support for more datatypes, e.g., Booleans and lists. We will also work on other kinds of uncertain quantities for probabilistic programs. Another research direction is to apply our analysis to higher-order functional programs.

Acknowledgments

This article is based on research supported, in part, by a gift from Rajiv and Ritu Batra; by ONR under grants N00014-17-1-2889 and N00014-19-1-2318; by DARPA under AA contract FA8750-18-C0092; and by the NSF under SaTC award 1801369, SHF awards 1812876 and 2007784, and CAREER award 1845514. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors, and do not necessarily reflect the views of the sponsoring agencies.

We thank Jason Breck and John Cyphert for formulating, and providing us with an initial formalization of, the problem of timing-attack analysis.

References

[1] Gilles Barthe, Thomas Espitau, Luis María Ferrer Fioriti, and Justin Hsu. 2016. Synthesizing Probabilistic Invariants via Doob's Decomposition. In Computer Aided Verif. (CAV'16). https://doi.org/10.1007/978-3-319-41528-4_3
[2] Gilles Barthe, Thomas Espitau, Marco Gaboardi, Benjamin Grégoire, Justin Hsu, and Pierre-Yves Strub. 2018. An Assertion-Based Program Logic for Probabilistic Programs. In European Symp. on Programming (ESOP'18). https://doi.org/10.1007/978-3-319-89884-1_5
[3] Gilles Barthe, Benjamin Grégoire, and Santiago Zanella Béguelin. 2009. Formal Certification of Code-Based Cryptographic Proofs. In Princ. of Prog. Lang. (POPL'09). https://doi.org/10.1145/1594834.1480894
[4] Ezio Bartocci, Laura Kovács, and Miroslav Stankovič. 2019. Automatic Generation of Moment-Based Invariants for Prob-Solvable Loops. In Automated Tech. for Verif. and Analysis (ATVA'19). https://doi.org/10.1007/978-3-030-31784-3_15
[5] Kevin Batz, Benjamin Lucien Kaminski, Joost-Pieter Katoen, and Christoph Matheja. 2018. How long, O Bayesian network, will I sample thee?. In European Symp. on Programming (ESOP'18). https://doi.org/10.1007/978-3-319-89884-1_7
[6] Johannes Borgström, Ugo Dal Lago, Andrew D. Gordon, and Marcin Szymczak. 2016. A Lambda-Calculus Foundation for Universal Probabilistic Programming. In Int. Conf. on Functional Programming (ICFP'16). https://doi.org/10.1145/2951913.2951942
[7] Olivier Bouissou, Eric Goubault, Sylvie Putot, Aleksandar Chakarov, and Sriram Sankaranarayanan. 2016. Uncertainty Propagation Using Probabilistic Affine Forms and Concentration of Measure Inequalities. In Tools and Algs. for the Construct. and Anal. of Syst. (TACAS'16). https://doi.org/10.1007/978-3-662-49674-9_13
[8] Quentin Carbonneaux, Jan Hoffmann, Thomas Reps, and Zhong Shao. 2017. Automated Resource Analysis with Coq Proof Objects. In Computer Aided Verif. (CAV'17). https://doi.org/10.1007/978-3-319-63390-9_4
[9] Aleksandar Chakarov and Sriram Sankaranarayanan. 2013. Probabilistic Program Analysis with Martingales. In Computer Aided Verif. (CAV'13). https://doi.org/10.1007/978-3-642-39799-8_34
[10] Krishnendu Chatterjee, Hongfei Fu, and Amir Kafshdar Goharshady. 2016. Termination Analysis of Probabilistic Programs Through Positivstellensatz's. In Computer Aided Verif. (CAV'16). https://doi.org/10.1007/978-3-319-41528-4_1
[11] Krishnendu Chatterjee, Hongfei Fu, Petr Novotný, and Rouzbeh Hasheminezhad. 2016. Algorithmic Analysis of Qualitative and Quantitative Termination Problems for Affine Probabilistic Programs. In Princ. of Prog. Lang. (POPL'16). https://doi.org/10.1145/2837614.2837639
[12] Devdatt P. Dubhashi and Alessandro Panconesi. 2009. Concentration of Measure for the Analysis of Randomized Algorithms. Cambridge University Press. https://doi.org/10.1017/CBO9780511581274
[13] Luis María Ferrer Fioriti and Holger Hermanns. 2015. Probabilistic Termination: Soundness, Completeness, and Compositionality. In Princ. of Prog. Lang. (POPL'15). https://doi.org/10.1145/2676726.2677001
[14] Dustin Fraze. 2015. Space/Time Analysis for Cybersecurity (STAC). Available on https://www.darpa.mil/program/space-time-analysis-for-cybersecurity.
[15] Zoubin Ghahramani. 2015. Probabilistic machine learning and artificial intelligence. Nature 521 (May 2015). https://doi.org/10.1038/nature14541
[16] Andrew D. Gordon, Thomas A. Henzinger, Aditya V. Nori, and Sriram K. Rajamani. 2014. Probabilistic Programming. In Future of Softw. Eng. (FOSE'14). https://doi.org/10.1145/2593882.2593900
[17] Marcel Hark, Benjamin Lucien Kaminski, Jürgen Giesl, and Joost-Pieter Katoen. 2019. Aiming Low Is Harder: Induction for Lower Bounds in Probabilistic Program Verification. Proc. ACM Program. Lang. 4, POPL (December 2019). https://doi.org/10.1145/3371105
[18] Jan Hoffmann and Martin Hofmann. 2010. Amortized Resource Analysis with Polymorphic Recursion and Partial Big-Step Operational Semantics. In Asian Symp. on Prog. Lang. and Systems (APLAS'10). https://doi.org/10.1007/978-3-642-17164-2_13
[19] Martin Hofmann and Steffen Jost. 2003. Static Prediction of Heap Space Usage for First-Order Functional Programs. In Princ. of Prog. Lang. (POPL'03). https://doi.org/10.1145/604131.604148
[20] Bertrand Jeannet and Antoine Miné. 2009. Apron: A Library of Numerical Abstract Domains for Static Analysis. In Computer Aided Verif. (CAV'09). https://doi.org/10.1007/978-3-642-02658-4_52
[21] Xia Jiang and Gregory F. Cooper. 2010. A Bayesian spatio-temporal method for disease outbreak detection. J. American Medical Informatics Association 17, 4 (July 2010). https://doi.org/10.1136/jamia.2009.000356
[22] Benjamin Lucien Kaminski, Joost-Pieter Katoen, Christoph Matheja, and Federico Olmedo. 2016. Weakest Precondition Reasoning for Expected Run-Times of Probabilistic Programs. In European Symp. on Programming (ESOP'16). https://doi.org/10.1007/978-3-662-49498-1_15
[23] Dexter Kozen. 1981. Semantics of Probabilistic Programs. J. Comput. Syst. Sci. 22, 3 (June 1981). https://doi.org/10.1016/0022-0000(81)90036-2
[24] Satoshi Kura, Natsuki Urabe, and Ichiro Hasuo. 2019. Tail Probabilities for Randomized Program Runtimes via Martingales for Higher Moments. In Tools and Algs. for the Construct. and Anal. of Syst. (TACAS'19). https://doi.org/10.1007/978-3-030-17465-1_8
[25] Zhifei Li and Jason Eisner. 2009. First- and Second-Order Expectation Semirings with Applications to Minimum-Risk Training on Translation Forests. In Empirical Methods in Natural Language Processing (EMNLP'09). https://dl.acm.org/doi/10.5555/1699510.1699517
[26] Gurobi Optimization LLC. 2020. Gurobi Optimizer Reference Manual. Available on https://www.gurobi.com.
[27] Guido Manfredi and Yannick Jestin. 2016. An Introduction to ACAS Xu and the Challenges Ahead. In Digital Avionics Systems Conference (DASC'16). https://doi.org/10.1109/DASC.2016.7778055
[28] Annabelle K. McIver and Carroll C. Morgan. 2005. Abstraction, Refinement and Proof for Probabilistic Systems. Springer Science+Business Media, Inc. https://doi.org/10.1007/b138392
[29] Van Chan Ngo, Quentin Carbonneaux, and Jan Hoffmann. 2018. Bounded Expectations: Resource Analysis for Probabilistic Programs. In Prog. Lang. Design and Impl. (PLDI'18). https://doi.org/10.1145/3192366.3192394
[30] Van Chan Ngo, Mario Dehesa-Azuara, Matthew Fredrikson, and Jan Hoffmann. 2017. Verifying and Synthesizing Constant-Resource Implementations with Types. In Symp. on Sec. and Privacy (SP'17). https://doi.org/10.1109/SP.2017.53
[31] Federico Olmedo, Benjamin Lucien Kaminski, Joost-Pieter Katoen, and Christoph Matheja. 2016. Reasoning about Recursive Probabilistic Programs. In Logic in Computer Science (LICS'16). https://doi.org/10.1145/2933575.2935317
[32] Martin L. Puterman. 1994. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc. https://dl.acm.org/doi/book/10.5555/528623
[33] Reuven Y. Rubinstein and Dirk P. Kroese. 2016. Simulation and the Monte Carlo Method. John Wiley & Sons, Inc. https://doi.org/10.1002/9781118631980
[34] Sriram Sankaranarayanan, Aleksandar Chakarov, and Sumit Gulwani. 2013. Static Analysis for Probabilistic Programs: Inferring Whole Program Properties from Finitely Many Paths. In Prog. Lang. Design and Impl. (PLDI'13). https://doi.org/10.1145/2491956.2462179
[35] Anne Schreuder and Luke Ong. 2019. Polynomial Probabilistic Invariants and the Optional Stopping Theorem. https://arxiv.org/abs/1910.12634
[36] Robert Endre Tarjan. 1985. Amortized Computational Complexity. SIAM J. Algebraic Discrete Methods 6, 2 (August 1985). https://doi.org/10.1137/0606031
[37] Di Wang, Jan Hoffmann, and Thomas Reps. 2018. PMAF: An Algebraic Framework for Static Analysis of Probabilistic Programs. In Prog. Lang. Design and Impl. (PLDI'18). https://doi.org/10.1145/3192366.3192408
[38] Di Wang, Jan Hoffmann, and Thomas Reps. 2021. Central Moment Analysis for Cost Accumulators in Probabilistic Programs. https://arxiv.org/abs/2001.10150
[39] Di Wang, Jan Hoffmann, and Thomas Reps. 2021. Expected-Cost Analysis for Probabilistic Programs and Semantics-Level Adaption of Optional Stopping Theorems. https://arxiv.org/abs/2103.16105
[40] Di Wang, David M. Kahn, and Jan Hoffmann. 2020. Raising Expectations: Automating Expected Cost Analysis with Types. Proc. ACM Program. Lang. 4, ICFP (August 2020). https://doi.org/10.1145/3408992
[41] Peixin Wang, Hongfei Fu, Amir Kafshdar Goharshady, Krishnendu Chatterjee, Xudong Qin, and Wenjun Shi. 2019. Cost Analysis of Nondeterministic Probabilistic Programs. In Prog. Lang. Design and Impl. (PLDI'19). https://doi.org/10.1145/3314221.3314581