Central Moment Analysis for Cost Accumulators in Probabilistic Programs
Total Page:16
File Type:pdf, Size:1020Kb
Central Moment Analysis for Cost Accumulators in Probabilistic Programs Di Wang Jan Hoffmann Thomas Reps Carnegie Mellon University Carnegie Mellon University University of Wisconsin USA USA USA Abstract ACM Reference Format: For probabilistic programs, it is usually not possible to au- Di Wang, Jan Hoffmann, and Thomas Reps. 2021. Central Moment tomatically derive exact information about their properties, Analysis for Cost Accumulators in Probabilistic Programs. In Pro- ceedings of the 42nd ACM SIGPLAN International Conference on such as the distribution of states at a given program point. Programming Language Design and Implementation (PLDI ’21), June Instead, one can attempt to derive approximations, such 20–25, 2021, Virtual, Canada. ACM, New York, NY, USA, 15 pages. as upper bounds on tail probabilities. Such bounds can be https://doi.org/10.1145/3453483.3454062 obtained via concentration inequalities, which rely on the moments of a distribution, such as the expectation (the first raw moment) or the variance (the second central moment). 1 Introduction Tail bounds obtained using central moments are often tighter Probabilistic programs [16, 23, 28] can be used to manipulate than the ones obtained using raw moments, but automati- uncertain quantities modeled by probability distributions cally analyzing central moments is more challenging. and random control flows. Uncertainty arises naturally in This paper presents an analysis for probabilistic programs Bayesian networks, which capture statistical dependencies that automatically derives symbolic upper and lower bounds (e.g., for diagnosis of diseases [21]), and cyber-physical sys- on variances, as well as higher central moments, of cost ac- tems, which are subject to sensor errors and peripheral dis- cumulators. To overcome the challenges of higher-moment turbances (e.g., airborne collision-avoidance systems [27]). analysis, it generalizes analyses for expectations with an Researchers have used probabilistic programs to implement algebraic abstraction that simultaneously analyzes different and analyze randomized algorithms [2], cryptographic proto- moments, utilizing relations between them. A key innova- cols [3], and machine-learning algorithms [15]. A probabilis- tion is the notion of moment-polymorphic recursion, and a tic program propagates uncertainty through a computation, practical derivation system that handles recursive functions. and produces a distribution over results. The analysis has been implemented using a template- In general, it is not tractable to compute the result distribu- based technique that reduces the inference of polynomial tions of probabilistic programs automatically and precisely: bounds to linear programming. Experiments with our pro- Composing simple distributions can quickly complicate the totype central-moment analyzer show that, despite the ana- result distribution, and randomness in the control flow can lyzer’s upper/lower bounds on various quantities, it obtains easily lead to state-space explosion. Monte-Carlo simula- tighter tail bounds than an existing system that uses only tion [33] is a common approach to study the result distribu- raw moments, such as expectations. tions, but the technique does not provide formal guarantees, and can sometimes be inefficient [5]. CCS Concepts: Theory of computation Probabilistic • In this paper, we focus on a specific yet important kind computation Program analysis Automated! reasoning ; ; . of uncertain quantity: cost accumulators, which are pro- gram variables that can only be incremented or decremented Keywords: Probabilistic programs, central moments, cost through the program execution. Examples of cost accumu- analysis, tail bounds lators include termination time [5, 10, 11, 22, 31], rewards in Markov decision processes (MDPs) [32], position infor- mation in control systems [4, 7, 34], and cash flow during Permission to make digital or hard copies of part or all of this work for bitcoin mining [41]. Recent work [7, 24, 29, 41] has proposed personal or classroom use is granted without fee provided that copies are successful static-analysis approaches that leverage aggregate not made or distributed for profit or commercial advantage and that copies information of a cost accumulator -, such as -’s expected bear this notice and the full citation on the first page. Copyrights for third- value E - (i.e., -’s “first moment”). The intuition why it » ¼ party components of this work must be honored. For all other uses, contact is beneficial to compute aggregate information—in lieu of the owner/author(s). distributions—is that aggregate measures like expectations PLDI ’21, June 20–25, 2021, Virtual, Canada © 2021 Copyright held by the owner/author(s). abstract distributions to a single number, while still indi- ACM ISBN 978-1-4503-8391-2/21/06. cating non-trivial properties. Moreover, expectations are https://doi.org/10.1145/3453483.3454062 transformed by statements in a probabilistic program in a PLDI ’21, June 20–25, 2021, Virtual, Canada Di Wang, Jan Hoffmann, and Thomas Reps manner similar to the weakest-precondition transformation feature [7] [29] [24] [41] this work of formulas in a non-probabilistic program [28]. loop X X X X One important kind of aggregate information is moments. recursion X X In this paper, we show how to obtain central moments (i.e., continuous distributions X X X X non-monotone costs X X X E - E - : for any : 2), whereas most previous work »¹ − » ¼º ¼ ≥ higher moments X X X focused on raw moments (i.e., E - : for any : 1). Central » ¼ ≥ interval bounds X X X moments can provide more information about distributions. (a) For example, the variance V - (i.e., E - E - 2 , -’s » ¼ »¹ − » ¼º ¼ “second central moment”) indicates how - can deviate from [29, 41] [24] this work E - E - 3 2 its mean, the skewness (i.e., »¹ − » 3 ¼º2 ¼ , -’s “third standard- Derived E tick V tick V - / E tick 23 4 » ¼ ≤ » ¼ ≤ ¹ » ¼º bound » ¼ ≤ ¸ 432 223 28 223 28 ized moment”) indicates how lopsided the distribution of - ¸ ¸ ¸ E - E - 4 Moment type raw raw central is, and the kurtosis (i.e., »¹ − » ¼º ¼ , -’s “fourth standard- V - 2 Concentration Markov Markov ¹ » ¼º Cantelli ized moment”) measures the heaviness of the tails of the inequality (degree = 1) (degree = 2) distribution of -. One application of moments is to answer Tail bound 1 1 3 tail bounds 2 4 !1 0 queries about , e.g., the assertions about proba- P tick 43 ≈ ≈ −−−−−! bilities of the form P - 3 , via concentration-of-measure » ≥ ¼ » ≥ ¼ (b) inequalities from probability theory [12]. With central mo- 0.75 ments, we find an opportunity to obtain more precise tail ¼ [29, 41] bounds of the form P - 3 , and become able to derive 3 » ≥ ¼ 4 0.5 bounds on tail probabilities of the form P - E - 3 . ≥ »j − » ¼j ≥ ¼ Central moments E - E - : can be seen as polyno- [24] tick »¹ − » ¼º ¼ » . mials of raw moments E - , , E - : , e.g., the variance 0 25 » ¼ ··· » ¼ P this work V - = E - E - 2 can be rewritten as E - 2 E2 - , » ¼ »¹ − » ¼º ¼ » ¼ − » ¼ where E: - denotes E - : . To derive bounds on central » ¼ ¹ » ¼º 20 40 60 80 3 moments, we need both upper and lower bounds on the raw (c) moments, because of the presence of subtraction. For exam- Figure 1. (a) Comparison in terms of supporting features. ple, to upper-bound V - , a static analyzer needs to have an » ¼ (b) Comparison in terms of moment bounds for the running upper bound on E - 2 and a lower bound on E2 - . » ¼ » ¼ example. (c) Comparison in terms of derived tail bounds. In this work, we present and implement the first fully automatic analysis for deriving symbolic interval bounds terms of tail-bound analysis on a concrete program (see §5). on higher central moments for cost accumulators in prob- The bounds are derived for the cost accumulator tick in a abilistic programs with general recursion and continuous random-walk program that we will present in §2. It can be distributions. One challenge is to support interprocedural observed that for 3 20, the most precise tail bound for ≥ reasoning to reuse analysis results for functions. Our solution tick is the one obtained via an upper bound on the variance makes use of a “lifting” technique from the natural-language- V tick (tick’s second central moment). processing community. That technique derives an algebra »Our¼ work incorporates ideas known from the literature: for second moments from an algebra for first moments [25]. - Using the expected-potential method (or ranking super- We generalize the technique to develop moment semirings, martingales) to derive upper bounds on the expected pro- and use them to derive a novel frame rule to handle function gram runtimes or monotone costs [9–11, 13, 24, 29]. calls with moment-polymorphic recursion (see §2.2). - Using the Optional Stopping Theorem from probability Recent work has successfully automated inference of up- theory to ensure the soundness of lower-bound inference per [29] or lower bounds [41] on the expected cost of prob- for probabilistic programs [1, 17, 35, 41]. abilistic programs. Kura et al.[24] developed a system to - Using linear programming (LP) to efficiently automate derive upper bounds on higher raw moments of program run- the (expected) potential method for (expected) cost analy- times. However, even in combination, existing approaches sis [18, 19, 40]. cannot solve tasks such as deriving a lower bound on the The contributions of our work are as follows: second raw moment of runtimes, or deriving an upper bound We develop moment semirings to compose the moments • on the variance of accumulators that count live heap cells. for a cost accumulator from two computations, and to Fig.1(a) summarizes the features of related work on mo- enable interprocedural reasoning about higher moments. ment inference for probabilistic programs. To the best of our We instantiate moment semirings with the symbolic in- • knowledge, our work is the first moment-analysis tool that terval domain, use that to develop a derivation system supports all of the listed programming and analysis features. for interval bounds on higher central moments for cost Fig.1(b) and (c) compare our work with related work in accumulators, and automate the derivation via LP solving.