<<

Reprinted from the Conference Record of the 29th Annual ACM Symposium on Principles of Programming Languages (POPL’02)

Stochastic Lambda Calculus and Monads of Probability Distributions

Norman Ramsey and Avi Pfeffer
Division of Engineering and Applied Sciences, Harvard University

Abstract

Probability distributions are useful for expressing the meanings of probabilistic languages, which support formal modeling of and reasoning about uncertainty. Probability distributions form a monad, and the monadic definition leads to a simple, natural semantics for a stochastic lambda calculus, as well as simple, clean implementations of common queries. But the monadic implementation of the expectation query can be much less efficient than current best practices in probabilistic modeling. We therefore present a language of measure terms, which can not only denote discrete probability distributions but can also support the best known modeling techniques. We give a translation of stochastic lambda calculus into measure terms. Whether one translates into the probability monad or into measure terms, the results of the translations denote the same probability distribution.

1. Introduction

Researchers have long modeled the behavior of agents using logic, but logical models are a poor choice for dealing with the uncertainty of the real world. For dealing with inherent uncertainty and with incomplete knowledge, probabilistic models are better.

There are a variety of representations and reasoning techniques for probabilistic models. These techniques, which include Bayesian networks (Pearl 1988) and other kinds of graphical models (Jordan 1998), are centered around the structuring and decomposition of probability distributions. Recent work focuses on scaling up the techniques to deal with large, complex domains. Domains that require large-scale modeling techniques include medical diagnosis (Jaakkola and Jordan 1999) and military intelligence (Mahoney and Laskey 1998).

As models grow large, it becomes more important to be able to build them easily and to reuse the parts—but considered as programming languages, the techniques used to build probabilistic models are weak. We would like to draw on the large body of knowledge about programming languages to design a good language that supports probabilistic modeling. Stochastic lambda calculus, in which the denotations of expressions are probability distributions, not values, is a suitable basis for such a language. To express the semantics of a stochastic lambda calculus, we exploit the monadic structure of probability distributions (Giry 1981; Jones and Plotkin 1989).

The contributions of this paper are:

• We show that the probability monad leads to simple, elegant implementations of three queries commonly posed of probabilistic models: expectation, sampling, and support. Using the monad as an intermediate form simplifies proofs of desirable properties, e.g., that the sampling algorithm draws values from any model using appropriate probabilities.

• We show that the monadic implementation of expectation is potentially much less efficient than techniques currently used in probabilistic reasoning. The problem arises because a monad does not exploit intensional properties of functions; it only applies functions. To support an alternative implementation of expectation, we translate stochastic lambda calculus into a simple language we call measure terms. By algebraic manipulation of measure terms, we can express variable elimination, which is the standard technique for efficiently computing expectation. Measure terms denote measures, and our translation into measure terms is consistent with our monadic definition of stochastic lambda calculus.

Our work has implications for both the design and the implementation of probabilistic languages. For design, we show that one can support efficient probabilistic reasoning simply by adding a choose operator to an ordinary functional language; it is not necessary to include language features that expose common implementation techniques such as variable elimination. For implementation, we show that standard techniques of programming-language implementation—monadic interpreters and algebraic manipulation of programs (including common-subexpression elimination)—can apply to probabilistic languages. Probabilistic reasoners can enjoy the benefits of higher-order, typed languages, without requiring undue effort from implementors.

2. Probabilistic models and queries

The simplest language we have found to describe probabilistic models is a lambda calculus in which expressions denote probability distributions. The primary means of creating interesting probability distributions is a new construct that makes a probabilistic choice. choose p e1 e2 represents a linear combination of two distributions. Operationally, to take a value from the combined distribution, with probability p we take a value from e1 and with probability 1 − p we take a value from e2.

2.1. An example model

To illustrate our ideas, we present a simple model of traffic lights and drivers, using a Haskell-like notation. Traffic lights are probabilistically red, yellow, or green.

⟨traffic example⟩≡
  light1 = dist [ 0.45 : Red, 0.1 : Yellow, 0.45 : Green ]

dist is a version of choose that is extended to combine two or more weighted distributions. Here it means that light1 has value Red with probability 0.45, value Yellow with probability 0.1, and value Green with probability 0.45.

Drivers behave differently depending on the colors of the lights they see. A cautious driver is more likely to brake than an aggressive driver.

⟨traffic example⟩+≡
  cautious_driver light =
    case light of
      Red    -> dist [ 0.2 : Braking, 0.8 : Stopped ]
      Yellow -> dist [ 0.9 : Braking, 0.1 : Driving ]
      Green  -> Driving
  aggressive_driver light =
    case light of
      Red    -> dist [ 0.3 : Braking, 0.6 : Stopped, 0.1 : Driving ]
      Yellow -> dist [ 0.1 : Braking, 0.9 : Driving ]
      Green  -> Driving

We estimate that if two drivers go through a single light from different streets, there is a 90% probability of a crash.

⟨traffic example⟩+≡
  crash d1 d2 light =
    dist [ 0.90 : d1 light == Driving && d2 (other light) == Driving,
           0.10 : False ]
    where other Red    = Green
          other Green  = Red
          other Yellow = Yellow

[Figure 1: Implementation paths. Ovals are representations; boxes are queries. A model written in the language for models and queries is translated into stochastic lambda calculus, and from there into either the probability monad or measure terms; the probability monad supports the expectation, sampling, and support queries, while measure terms support expectation.]

2.2. Queries

Having defined a probabilistic model, we might wish to ask questions about it. For example, if two drivers, one cautious and one aggressive, are approaching light1, what is the probability of a crash? This question and many others can be answered using three kinds of queries: expectation, sampling, and support.

The expectation of a function h is the mean of h over the distribution. Expectation subsumes some other queries as special cases. The mean value of a distribution is the expectation of the identity function. The probability of an outcome satisfying a predicate p is the expectation of the function \x -> if p x then 1 else 0. Conditional probability can be computed from probability, since P(p | q) = P(p ∧ q) ÷ P(q). In the example above, we can answer the question about the probability of a crash by building the probability distribution of crash cautious_driver aggressive_driver light1 and computing the probability of the identity predicate.

Sampling means drawing a value from the distribution. By sampling repeatedly, we can not only approximate expectation but also get an idea of the shape of a distribution, or of some function over the distribution. Like true experimental data, samples can be fit to analytic solutions to equations; "Monte Carlo" techniques used in the physical sciences rely on sampling.

Support tells us from what subset of the entire sample space a sample might be drawn with nonzero probability. It is seldom interesting by itself, but a good implementation of support can make it easier to compute expectations efficiently. Support therefore plays a significant role in our implementation.
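As a worked calculation of our own (not spelled out in the paper), the crash query above can be evaluated by expanding the expectation of the identity predicate by cases on light1. The cautious driver sees the light itself and the aggressive driver sees the light on the other street:

  P(crash) = 0.9 · Σ_light P(light) · P(cautious Driving | light) · P(aggressive Driving | other light)
           = 0.9 · (0.45 · 0 · 1 + 0.1 · 0.1 · 0.9 + 0.45 · 1 · 0.1)
           = 0.9 · 0.054 = 0.0486.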
2.3. Implementing probabilistic models

This paper presents two representations that are useful for answering queries about probabilistic models: probability monads and measure terms. Figure 1 shows the translations involved.

1. A user writes a model and a query using a domain-specific, probabilistic language. In this paper, we take the language to be the stochastic lambda calculus that is defined formally in Section 4. In practice, we would prefer a richer language, e.g., one providing types, modules, and either an explicit fixed-point operator or recursion equations. Even in practice, however, stochastic lambda calculus is a useful intermediate form.

2. We translate the model into a more restricted target form: a value in the probability monad, or a measure term. Section 4 gives a translation into the probability monad, the semantics of which we explain in Section 3. Section 6 gives a translation into measure terms.

3. We use the target form to answer the query. The probability monad can answer all three kinds of query; measure terms are designed to answer expectation queries efficiently.

The probability monad is easy to implement in Haskell (Section 5), so we could also use Haskell as an embedded domain-specific language for probabilistic modeling. Efficient implementation of measure terms is more difficult; our current implementation is written in Objective Caml in order to exploit mutable state (Pfeffer 2001).

3. Semantics of probability and the probability monad

3.1. Measure theory

Both discrete and continuous probability are easily described using measure theory; we borrow notation from Rudin (1974). The values over which a probability distribution is defined are drawn from some space X. "Observable events" in an experiment are typically represented by subsets of X. We call these subsets the measurable sets of X, and we require that the entire space be measurable and that the measurable sets be closed under complement and countable union, i.e., that the measurable sets form a σ-algebra. The classical definition of a measurable function is a function from X to a topological space (e.g., the reals) such that the inverse images of open sets are measurable. We restrict our attention to functions between measure spaces and define the measurable functions as those such that inverse images of measurable sets are measurable; that way the composition of measurable functions is measurable. Finally, a measure is a function µ that maps each measurable set to a real number in the range [0, ∞] and that is countably additive. That is, if {Ai} is a disjoint countable collection of measurable sets, µ(⋃i Ai) = Σi µ(Ai). A probability distribution is a measure such that µ(X) = 1. We use Greek letters ν and µ to stand for measures.

Abstract integration plays a significant role in probability. If f is a real-valued, measurable function on a measurable space X with measure µ, and if A is a measurable subset of X, we write ∫_A f dµ for the Lebesgue integral of f over set A. If e is an expression in which x appears free, we often write ∫_A e dµ(x) instead of ∫_A (λx.e) dµ.

3.2. Queries

We define our queries in terms of measure theory and abstract integration. The simplest query to define is expectation. If we have a probability distribution ν over space X, then the expectation of a function h is ∫_X h dν.

A support of a distribution is a measurable set outside which the distribution is zero. A set S is a support of a distribution ν if for any measurable set A, ν(A) = ν(A ∩ S). In the language of real analysis, S is a support of ν iff ν is "concentrated on S." A good support can make it easier to compute expectations, by exploiting ∫_X h dν = ∫_S h dν.

There are many ways to formulate sampling. We could define sampling as a function from a probability distribution to an infinite sequence of values, but we prefer to define sampling directly in terms of measure theory. Accordingly, we say that a measurable function s : [0, 1] → X is a sampling function for ν if for any measurable set A ⊆ X, ν(A) = µ(s⁻¹(A)), where µ is the Lebesgue measure, which describes uniform distributions. We use this formulation for its practical value: if you give us a way of sampling uniformly on the unit interval, by applying s we'll give you a way of sampling from the probability distribution ν. The equation implies that if r is drawn uniformly from the unit interval, the probability that s(r) ∈ A is the same as the probability that r ∈ s⁻¹(A).

3.3. The probability monad

It has long been known that probability distributions form a monad (Lawvere 1962; Giry 1981; Jones and Plotkin 1989). In this section, we recapitulate some previous work, discussing the relationship of the monad operations to probability. The monadic bind operation, in particular, provides much of the functionality one needs to create interesting probability distributions for use in models. In Section 5, we extend the probability monad with additional operations that support queries. Throughout the paper, rather than use category-theoretic notation, we write return for the unit operation and >>= for bind. This notation will be more familiar to Haskell programmers and to readers of Wadler (1992).

The definition of the unit operation is simplicity itself: return x stands for a distribution—that is, a measure—which assigns unit probability to any measurable set containing x and zero probability to any measurable set not containing x:

  M[[return x]](A) = 1, if x ∈ A; 0, if x ∉ A.

This measure is sometimes called the "unit mass concentrated at x." It enjoys the property that for any measurable function f,

  ∫_X f dM[[return x]] = f(x).    (1)

The proof appeals directly to the definition of Lebesgue integration. Property (1) plays a key role in the proofs of the monad laws for probability measures.

To motivate the definition of monadic bind, we appeal to conditional probability. If d denotes a probability measure over space X and k denotes a function from values in X to probability measures over space Y, then k may be interpreted as defining a "conditional probability of Y given X." Because d denotes the probability of X, the chain rule says that the joint probability of X and Y is equal to the probability of X times the conditional probability of Y given X. We can get the probability of Y by integrating over X. Since we wish d >>= k to denote the probability of Y, we define

  M[[d >>= k]](A) = ∫_X M[[k(x)]](A) dM[[d]](x).

The notation may be confusing; the integral uses the measure M[[d]], and the function being integrated is λx.M[[k(x)]](A).

The following property of monadic bind plays a significant role in several of our proofs:

  ∫_Y g(y) dM[[d >>= k]](y) = ∫_X ∫_Y g(y) dM[[k(x)]](y) dM[[d]](x).    (2)
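As a small worked illustration of how these properties are used (this step is ours, not spelled out in the paper), property (1) gives the left-identity monad law directly. Taking f(y) = M[[k(y)]](A) in (1),

  M[[return x >>= k]](A) = ∫_X M[[k(y)]](A) dM[[return x]](y) = M[[k(x)]](A),

so return x >>= k and k x denote the same measure.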

Given properties (1) and (2), it is straightforward to prove that the definitions of return and >>= satisfy the monad laws.

To create some interesting distributions, we need at least one more operation in our monad. The choose function is a two-way version of the dist used in Section 2.1. Probabilistic choice is linear combination of measures:

  M[[choose p d1 d2]](A) = p · M[[d1]](A) + (1 − p) · M[[d2]](A),

where 0 ≤ p ≤ 1 and the · symbol stands for multiplication. choose suffices to create any distribution with finite support. Creating other kinds of distributions, including interesting continuous distributions, would require additional operations. (One can also create continuous distributions by taking limits of discrete distributions with finite support.)

It is easy to show by induction on the structure of the monad that any value created using return, >>=, and choose denotes a probability measure; the only interesting step uses property (2) with g(y) = 1.

3.4. Probability mass and density functions

Although it is possible to work directly with probability measures, it is often easier to work with functions that map values to their probability or probability density. Any nonnegative, Lebesgue integrable function f defines a measure, because if µ is a bounded measure, then the function ν defined by ν(A) = ∫_A f dµ is also a measure. Furthermore, for any measurable A, ∫_A g dν = ∫_A g · f dµ. This fact plays a significant role in the proof of the associativity law for the probability monad.

Even better, many measures can be described by a function like f. The Radon-Nikodym theorem says that if ν is bounded and if µ(A) = 0 implies that ν(A) = 0 (i.e., ν is absolutely continuous with respect to µ), then there exists a nonnegative integrable f such that ν(A) = ∫_A f dµ. In this case, we write the function f as ν̇; ν̇ is the Radon-Nikodym derivative of ν.

Most implementations of probability use functions such as ν̇; this use is justified when the measure ν has a Radon-Nikodym derivative with respect to an underlying measure µ. Whether probability is "discrete" or "continuous" depends on the underlying measure.

• For discrete probability, countable sets are measurable, and the appropriate measure µ is the counting measure. This measure is useful for countable spaces; the measure of a set is the number of elements it contains. Because the counting measure assigns measure zero only to the empty set, every discrete probability distribution has a Radon-Nikodym derivative, i.e., every such distribution can be represented as a probability-mass function f that assigns a nonnegative probability to each element of X and which satisfies ∫_X f dµ = 1. When using the counting measure, we may use a summation symbol instead of the integral: ∫_A f dµ = Σ_{xi∈A} f(xi).

• For continuous probability over real variables, which may model such processes as queueing or radioactive decay, the appropriate measure is the Lebesgue measure on real numbers. The Lebesgue measure of an interval is the length of the interval; see Rudin (1974) for an explanation of how to extend this measure to a much larger class of sets. Provided it assigns zero probability to sets of Lebesgue measure zero, a continuous probability distribution ν can be represented as an integrable function ν̇ that assigns a nonnegative probability density to each point of X and which satisfies ∫_X ν̇ dµ = 1. Many interesting continuous distributions fall into this class.

• For computing probabilities over infinite data structures, neither the counting measure nor the Lebesgue measure is necessarily useful. For example, if we consider infinite sequences of flips of a fair coin, there are uncountably many such sequences, and the probability of any particular sequence is zero, so the counting measure is useless. To know what questions we can sensibly ask about infinite data structures, we need to know what are the measurable sets. A good place to start is the smallest σ-algebra containing the sets Ux defined by Ux = {y | x ⊑ y}, where x is finite. Subject to some technical conditions, this is the Borel algebra of the Scott topology on a domain (Smyth 1983). We can use probability measures over this σ-algebra to answer such questions as "what is the probability that a sequence begins with three consecutive heads?" It is not clear to us when Radon-Nikodym derivatives (mass or density functions) exist for these kinds of measures.

Because it is often easier to work with probability mass functions instead of measures, we give definitions of the monad operations in terms of these functions. These definitions apply only to discrete probability.

  M[[return v]](x) = 1, if x = v; 0, if x ≠ v
  M[[d >>= k]](x) = Σ_v M[[d]](v) · M[[k(v)]](x)
  M[[choose p d1 d2]](x) = p · M[[d1]](x) + (1 − p) · M[[d2]](x)
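The discrete definitions above can be transcribed almost literally into Haskell. The following sketch is ours, not the paper's (the paper's own instances appear in Appendix A); it represents a distribution by a finite list of value-probability pairs, in the Haskell-98 style used elsewhere in the paper, and assumes the Probability type and ProbabilityMonad class introduced in Section 5.

  -- Sketch (ours): a probability-mass-function representation of the monad,
  -- following the discrete equations above.  Duplicate values are not merged,
  -- in the same spirit as the list-based support of Section 5.
  newtype Dist a = Dist [(a, Probability)]

  instance Monad Dist where
    return x = Dist [(x, 1)]                 -- unit mass at x
    Dist d >>= k =
      Dist [ (y, p * q) | (x, p) <- d, let Dist d' = k x, (y, q) <- d' ]

  instance ProbabilityMonad Dist where
    choose p (Dist d1) (Dist d2) =
      Dist ([ (x, p * q) | (x, q) <- d1 ] ++ [ (x, (1-p) * q) | (x, q) <- d2 ])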
4. A stochastic lambda calculus and its semantics

Here we present a formal calculus for expressing probability distributions, and we give a denotational definition using the probability monad. To simplify the presentation, we use pairs instead of general products, binary sums instead of general sums, and binary choice (choose) instead of the general dist.

  e ::= x | v | λx.e | e1 e2 | let x = e′ in e
      | choose p e1 e2
      | (e1, e2) | e.1 | e.2
      | L e | R e | case e el er

The case form may need explanation; e is an expression of sum type, which is either left (L v) or right (R v). If left, we apply el to the carried value; otherwise we apply er. Our case is analogous to the either function found in the standard Haskell Prelude.

In our semantics, the probability monad is a type constructor M, together with unit, bind, and choose operations. If X is a type, M X is the type of probability measures over X. Semantically, we wish to limit our attention to measures that restrict to continuous evaluations, so that M X is a probabilistic powerdomain (Jones and Plotkin 1989). In Figure 2, we use the probability monad to define the semantics of the stochastic lambda calculus.

  P[[x]]ρ = return (ρ x)
  P[[v]]ρ = return v
  P[[λx.e]]ρ = return (λv.P[[e]]ρ{x ↦ v})
  P[[let x = e′ in e]]ρ = P[[e′]]ρ >>= λv.P[[e]]ρ{x ↦ v}
  P[[e1 e2]]ρ = P[[e1]]ρ >>= λv1.P[[e2]]ρ >>= λv2.v1 v2
  P[[(e1, e2)]]ρ = P[[e1]]ρ >>= λv1.P[[e2]]ρ >>= λv2.return (v1, v2)
  P[[e.1]]ρ = P[[e]]ρ >>= (return ∘ fst)
  P[[e.2]]ρ = P[[e]]ρ >>= (return ∘ snd)
  P[[choose p e1 e2]]ρ = choose p (P[[e1]]ρ) (P[[e2]]ρ)
  P[[L e]]ρ = P[[e]]ρ >>= (return ∘ Left)
  P[[R e]]ρ = P[[e]]ρ >>= (return ∘ Right)
  P[[case e el er]]ρ = P[[e]]ρ >>= either (λv.P[[el]]ρ >>= λf.f v) (λv.P[[er]]ρ >>= λf.f v)

Figure 2: Translation of stochastic calculus into the probability monad

By using suitable domains with the translation in Figure 2, we can make a denotational semantics for the stochastic lambda calculus (Jones 1990, Chapter 8). This translation has two significant properties:

• The denotations of expressions are not the same kinds of objects that appear in environments. Expressions denote probability distributions, but environments bind identifiers to values, not to distributions.

• The denotation of a lambda abstraction of type a -> b is a function from values of type a to probability distributions over values of type b.

These properties, as well as the rest of the translation, are what you would expect to find in a monadic semantics of a call-by-value language with imperative features, except that we use the probability monad instead of a state or I/O monad. We expect that an analogous lazy semantics should also exist, but we have not explored that avenue.

Denotational definitions are out of fashion. Why don't we have a syntactic theory with an operational semantics? Because a denotational definition makes it easier to show how the mathematical foundations of probability apply to the semantics of probabilistic languages. Our denotational definitions also correspond closely to the structures of our implementations. It is a drawback that a reader has to work harder to see that the denotation of (λx.x − x)(choose ½ 0 1) is the unit mass concentrated at 0. In other words, when reducing lambda terms, it is not permissible to duplicate a redex.
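The translation in Figure 2 can be read directly as a monadic interpreter. The fragment below is our own sketch, not the paper's code: it assumes the ProbabilityMonad class of Section 5, invents a small abstract syntax (Term, Value) and environment representation, and covers only literals, variables, abstraction, application, and choose.

  -- Sketch (ours): a few clauses of Figure 2 as a monadic interpreter.
  data Term = Lit Int
            | Var String
            | Lam String Term
            | App Term Term
            | Choose Probability Term Term

  data Value m = IntV Int
               | Fun (Value m -> m (Value m))

  type Env m = [(String, Value m)]

  interp :: ProbabilityMonad m => Term -> Env m -> m (Value m)
  interp (Lit n)          rho = return (IntV n)
  interp (Var x)          rho = case lookup x rho of
                                  Just v  -> return v
                                  Nothing -> error ("unbound variable " ++ x)
  interp (Lam x e)        rho = return (Fun (\v -> interp e ((x, v) : rho)))
  interp (App e1 e2)      rho = interp e1 rho >>= \(Fun f) ->
                                interp e2 rho >>= \v ->
                                f v
  interp (Choose p e1 e2) rho = choose p (interp e1 rho) (interp e2 rho)

For example, interp (Choose 0.5 (Lit 0) (Lit 1)) [] denotes the uniform choice between 0 and 1 in whichever query monad we instantiate.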
5. Probability monads in Haskell

Using Haskell, we have implemented the probability monad for discrete distributions. We have taken advantage of Haskell's system of type and constructor classes to provide three specialized implementations: one for each kind of query.

The probability monad provides choose; it inherits return and >>= from class Monad. For simplicity, we represent probabilities as double-precision floating-point numbers. By using Haskell's type classes, we could support more general representations, but the details would be distracting.

⟨probability monads⟩≡
  type Probability = Double -- number from 0 to 1
  class Monad m => ProbabilityMonad m where
    choose :: Probability -> m a -> m a -> m a

Like any other monad, the probability monad needs at least one operation that observes what is inside a monadic value (Hughes 1995, §5). The obvious observations are our three queries. Rather than require one implementation to support all three queries, however, we structure the code so that we can provide three different implementations, one for each kind of query.

Since distributions created with this interface have countable support, it is easiest to represent a support of a distribution by a set of elements. To make the code more readable, we use lists instead of sets, and we permit that some elements appear more than once; at need, Haskell programmers can eliminate duplicates using List.nub.

⟨probability monads⟩+≡
  class ProbabilityMonad m => SupportMonad m where
    support :: m a -> [a]
    -- support (return x) = [x]
    -- support (d >>= k) =
    --   concat [support (k x) | x <- support d]
    -- support (choose p d d') =
    --   support d ++ support d'

The comments show algebraic laws that support must satisfy; the support monad is a simple extension of the list monad. In the law we give for choose, the value of support (choose p d d') does not depend on p, even when p is 0 or 1. Indeed, if we remove p from the signature of choose, we get the well-known nondeterminism monad.

Our law for choose is sound, and we have chosen it for its simplicity, but it is really an oversimplification. In practice it is important to add the side condition 0 < p < 1 and to add laws to cover the cases p = 0 and p = 1. Probabilities of 0 and 1 can arise in real models from, e.g., idiomatic translations of Bayesian networks, and cutting down the support as much as possible can save significant computation.

It is easy to show that any implementation of support satisfying the laws above produces a list of values that, when taken as a set, forms a support for the measure denoted by the monadic value. The proof relies on the discreteness of the measure; the inductive hypothesis is that M[[d]](x) > 0 implies x ∈ support d.

For simplicity, we define expectation only of real-valued functions. It is not difficult to define a version of the expectation monad that can compute the expectation of any function whose range is a vector space over the reals.
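Appendix A gives instances only for the expectation and sampling monads, so for completeness here is a sketch of our own (not the paper's code) of a list-based instance that satisfies the support laws above:

  -- Sketch (ours): a list-based SupportMonad instance, in the same
  -- Haskell-98 style as Appendix A.
  newtype Support a = Support [a]

  instance Monad Support where
    return x = Support [x]
    Support xs >>= k = Support (concat [ ys | x <- xs, let Support ys = k x ])

  instance ProbabilityMonad Support where
    choose _ (Support xs) (Support ys) = Support (xs ++ ys)

  instance SupportMonad Support where
    support (Support xs) = xs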

The expectation monad and its laws are as follows.

⟨probability monads⟩+≡
  class ProbabilityMonad m => ExpMonad m where
    expectation :: (a -> Double) -> m a -> Double
    -- expectation h (return x) = h x
    -- expectation h (d >>= k) = expectation g d
    --   where g x = expectation h (k x)
    -- expectation h (choose p d d') =
    --   p * expectation h d +
    --   (1-p) * expectation h d'

Using property (2) from Section 3.3, it is easy to prove that any implementation of expectation satisfying these laws computes expectation as defined by the measure.

Stipulating true real-number arithmetic, with infinitely many bits of precision, we present laws for a sampling function. If d is a distribution, fst ∘ sample d is a sampling function in the sense of Section 3.2.

⟨probability monads⟩+≡
  -- sample (return x) r = (x, r)
  -- sample (d >>= k) r =
  --   let (x, r') = sample d r in sample (k x) r'
  -- sample (choose p d d') r =
  --   if r < p then sample d (r/p)
  --   else sample d' ((1-r)/(1-p))

The law for choose is the interesting law; it uses as many bits of precision as are needed to compare r and p, then renormalizes so that the remaining bits can be used for further samples. The computation is like what happens to the output of an arithmetic coder (Witten, Neal, and Cleary 1987).

The proof of correctness of the sampling laws is the most difficult proof in this paper. The key is finding a good induction hypothesis: for any distribution d and any real function f defined on the product space X × I,

  ∫_I f(sample d r) dµ(r) = ∫_I ∫_X f(x, r) dM[[d]](x) dµ(r),

where I is the unit interval and µ is Lebesgue measure. From this hypothesis it is straightforward to show that µ((fst ∘ sample d)⁻¹(A)) = M[[d]](A), so that fst ∘ sample d is a sampling function for M[[d]].

This definition of sampling would not be very useful in an implementation, because it is based on a single random number with arbitrarily many bits of precision. It is more consistent with Haskell's standard library to use a random-number generator for sampling:

⟨probability monads⟩+≡
  class ProbabilityMonad m => SamplingMonad m where
    sample :: RandomGen g => m a -> g -> (a, g)
    -- sample (return x) g = (x, g)
    -- sample (d >>= k) g =
    --   let (x, g') = sample d g in sample (k x) g'
    -- sample (choose p d d') g =
    --   let (x, g') = random g in
    --   sample (if x < p then d else d') g'

Although we can give a denotational semantics to the sampling monad purely in terms of sampling functions, the sampling monad has an appealing operational interpretation as a monad of random experiments. If m is in the class SamplingMonad, then a value of type m α represents an experiment that returns a value of type α. The unit computation return x represents the experiment that always returns x. A computation produced by bind, d >>= k, represents the experiment that begins by running d to generate an intermediate value, applies k to the value to get a second experiment, and returns the result of the second experiment. choose p d1 d2 represents the experiment that runs d1 with probability p and d2 with probability 1 − p.

The algebraic laws above lead directly to implementations of the monads. Additional notation is needed to make legal Haskell; the code appears in Appendix A.

5.1. Performance

The monads above compute support and sampling about as efficiently as possible, asymptotically speaking. The expectation monad, by contrast, is inherently inefficient, and for some terms in the stochastic lambda calculus, it may perform exponentially worse than other methods. The problem with the monadic computation of expectation is that when we compute expectation h d, we don't have any information about the structure of the function h. We must therefore apply h to every value in its domain,¹ at a cost proportional to the size of that domain. But if h is defined over a product domain, it may be possible to compute the expectation of h at a lower cost, depending on the structure of h. For example, if the domain of h is the product space X × Y, and if there exist functions h_{1,i} and h_{2,i} such that h(x, y) = Σ_i h_{1,i}(x)·h_{2,i}(y), and if µ can be similarly split, then ∫_{X×Y} h dµ(x, y) = Σ_i (∫_X h_{1,i}(x) dµ_1(x)) · (∫_Y h_{2,i}(y) dµ_2(y)). The cost of computing the left-hand side is O(|X × Y|), but the cost of computing the right-hand side is O(|X| + |Y|). If we are computing the expectation of a function of a large number of variables (e.g., the expected number of heads in a sequence of 10 coin flips), the monad can take exponential time to solve a linear problem.

Many functions over which we might wish to take expectations have a structure that can be exploited. For example, in a probabilistic grammar for natural language, a model might say that a sentence is made up of a noun phrase and a verb phrase. In many models, such as probabilistic context-free grammars (Charniak 1993), the two phrases define independent probability distributions, so the probability distribution over the string generated by the verb phrase does not depend on that generated by the noun phrase. If we want to compute the expected number of words in a sentence, h is the number of words in the noun phrase plus the number of words in the verb phrase, and it has the structure required. Even if the noun phrase and verb phrase are not quite independent, but the verb phrase is influenced by the first word of the noun phrase, the verb phrase is still conditionally independent of the remaining words of the noun phrase given the first word, and the independence can be exploited. The probability monad cannot exploit the independence, because it must produce the entire sentence, including both phrases, before it can apply h.

¹ Actually it suffices to apply h only to those values that are in the support of d. This refinement is important in practice, but it does not affect the asymptotic costs.
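To see the blowup concretely, here is a small sketch of our own, using the Exp instance from Appendix A; the names coin, heads, slowExpected, and fastExpected are ours. The monadic expectation of the number of heads in n flips forces all 2^n outcomes, while linearity of expectation gives the same answer in O(n).

  -- Sketch (ours): exponential vs. linear computation of the same expectation.
  coin :: ProbabilityMonad m => m Int
  coin = choose 0.5 (return 1) (return 0)

  heads :: ProbabilityMonad m => Int -> m Int      -- number of heads in n flips
  heads 0 = return 0
  heads n = coin >>= \h -> heads (n - 1) >>= \rest -> return (h + rest)

  slowExpected :: Int -> Double                    -- explores all 2^n outcomes
  slowExpected n = expectation fromIntegral (heads n :: Exp Int)

  fastExpected :: Int -> Double                    -- exploits linearity: O(n)
  fastExpected n = fromIntegral n * expectation fromIntegral (coin :: Exp Int)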

  t ::= t1 × t2 | t1 + t2 | Σ_{y::Y} t | ⟨φ : w⟩

  T[[t1 × t2]]Γ(ρ) = T[[t1]]Γ(ρ) · T[[t2]]Γ(ρ)
  T[[t1 + t2]]Γ(ρ) = T[[t1]]Γ(ρ) + T[[t2]]Γ(ρ)
  T[[Σ_{y::Y} t]]Γ(ρ) = Σ_{v∈Y} T[[t]]({y :: Y} ⊎ Γ)(ρ{y ↦ v})
  T[[⟨φ : w⟩]]Γ(ρ) = w · sat(φ, ρ)

  sat(φ, ρ) = 1, if formula φ is satisfied by the assignment ρ; 0, otherwise

The bound variable in a sum must be distinct from other variables; the union {y :: Y} ⊎ Γ is defined only if variable y does not appear in any pair in Γ.

Figure 3: Syntax and semantics of measure terms

6. A term language for computing discrete expectation

The probability monad leads to elegant implementations of our three queries, but a monadic computation of expectation over a product space can take time exponential in the number of dimensions in the space. The rest of this paper is devoted to a different abstraction: measure terms. Using measure terms, we can expose the structure of h and rewrite sums (integrals) over product spaces into products of sums, thereby reducing the cost of computing the expectation of h.

Measure terms define measures over product spaces. The simplest kind of term is a weighted formula ⟨φ : w⟩, which is a generalization of unit mass; it assigns a real, nonnegative weight w to a set in the product space. The set is defined by a formula φ, which is simply a predicate. The angle brackets and colon serve only to delimit the weighted formula; they have no semantic significance. Measure terms may be multiplied and added, and we can also sum over variables. The syntax of measure terms is as follows:

  t ::= ⟨φ : w⟩ | t1 × t2 | t1 + t2 | Σ_{x::X} t

We write terms assuming that × binds more tightly than +, and the scope of Σ extends as far to the right as possible. Our notation has one subtlety; the colons in Σ_{x::X} indicate that this summation symbol is syntax. When we mean summation in a semantic domain, over a measurable space X, we write Σ_{x∈X}.

One reason to use measure terms is to be able to choose the order in which we do sums. We therefore use a representation of product spaces in which we identify dimensions by name, not by the way they are ordered. In particular, we represent an n-dimensional product space as a set of pairs xi :: Xi, where xi is a name and Xi is a measurable space. We represent a value in this product space by an environment mapping names to values such that the domain of the environment is {xi | 1 ≤ i ≤ n}. Every name defined in the environment identifies one dimension of the product space.

A measure term denotes a measure over a product space, but its denotation may depend on context, i.e., what product space we are interested in. We therefore specify a product space when giving the semantics of a term. By analogy with type environments, we write this product space as Γ = {xi :: Xi | 1 ≤ i ≤ n}. We write a value in the product space as ρ = {x1 ↦ v1, . . . , xn ↦ vn}. Figure 3 gives the meanings of terms. The function T[[t]]Γ defines a probability-mass function on Γ; to get a measure, we integrate using the counting measure. That is, T[[t]]Γ(A) = Σ_{ρ∈A} T[[t]]Γ(ρ). The context Γ plays no role in the evaluation of T, but we need it to prove that our translation of stochastic lambda calculus into measure terms is equivalent to our translation into the probability monad.

Measure terms obey useful algebraic laws:

• The term ⟨true : 1⟩ is a unit of term product ×, and ⟨false : w⟩ is a unit of term addition +.
• Term product × and sum + obey associative, commutative, and distributive laws.
• We can rename variables in sums; the x in Σ_{x::X} is a binding instance.
• We can interchange sums over different variables.
• We can sometimes move terms outside sums, i.e., we have Σ_{x::X} t1 × t2 = t1 × Σ_{x::X} t2, provided x is not free in t1.

The soundness of these laws follows immediately from the definition of T. The laws make it possible to implement variable elimination, which is a code-improving transformation that reduces work by moving products outside sums. A full treatment of variable elimination is beyond the scope of this paper, but we can state the idea briefly. If each of two terms t1 and t2 contains free variables that do not appear free in the other, the laws enable us to convert a sum of products to a product of sums. For example, if x1 is not free in t2 and x2 is not free in t1, then Σ_{x1::X1} Σ_{x2::X2} t1 × t2 = (Σ_{x1::X1} t1) × (Σ_{x2::X2} t2). The cost of computing T on the left-hand side is O(|X1| · |X2|), but the cost of computing T on the right-hand side is O(|X1| + |X2|). Rewriting a term to minimize the cost of computing T is an NP-hard problem, but there are heuristics that work well in practice (Jensen 1996).

6.1. Measure terms for probability

To represent a probability distribution, we use a measure term with a distinguished free variable, arbitrarily called ∗ (pronounced "result"). Figure 4 shows how to translate a term in the stochastic lambda calculus into a measure term with the additional free variable ∗. We have left the domains of variables out of the sums; the domains are computed using the support function S. Figure 4 uses two metanotations that may require explanation. We write substitution using superscripts and subscripts; t^e_x stands for the measure term t with expression e substituted for variable x.
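As a concrete rendering of Figure 3, here is a small sketch of our own (not the paper's implementation, which is written in Objective Caml): measure terms restricted to Boolean-valued variables, with formulas represented as predicates on assignments.

  -- Sketch (ours): measure terms and the evaluator T of Figure 3, restricted
  -- to Boolean-valued variables so that syntactic sums range over [False, True].
  type Name = String
  type Assignment = [(Name, Bool)]                   -- the environment rho

  data MTerm = Weighted (Assignment -> Bool) Double  -- <phi : w>
             | MTerm :*: MTerm                       -- t1 x t2
             | MTerm :+: MTerm                       -- t1 + t2
             | SumOver Name MTerm                    -- sum over x :: Bool

  evalT :: MTerm -> Assignment -> Double             -- T[[t]]Gamma(rho)
  evalT (Weighted phi w) rho = if phi rho then w else 0
  evalT (t1 :*: t2)      rho = evalT t1 rho * evalT t2 rho
  evalT (t1 :+: t2)      rho = evalT t1 rho + evalT t2 rho
  evalT (SumOver x t)    rho = sum [ evalT t ((x, v) : rho) | v <- [False, True] ]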

  E[[x]]ρ = ⟨∗ = x : 1⟩
  E[[v]]ρ = ⟨∗ = v : 1⟩
  E[[λx.e]]ρ = ⟨∗ = λv.E[[e]]ρ{x ↦ {v}} : 1⟩
  E[[let x = e′ in e]]ρ = Σ_x (E[[e′]]ρ)^x_∗ × E[[e]]ρ{x ↦ S[[e′]]ρ}
  E[[e1 e2]]ρ = Σ_{xf} Σ_{xa} (E[[e1]]ρ)^{xf}_∗ × (E[[e2]]ρ)^{xa}_∗ ×
                ∏_{vf∈S[[e1]]ρ, va∈S[[e2]]ρ} ((vf va × ⟨xf = vf ∧ xa = va : 1⟩) + ⟨¬(xf = vf ∧ xa = va) : 1⟩)
  E[[choose p e1 e2]]ρ = (E[[e1]]ρ × ⟨true : p⟩) + (E[[e2]]ρ × ⟨true : 1 − p⟩)
  E[[(e1, e2)]]ρ = (E[[e1]]ρ)^{∗.1}_∗ × (E[[e2]]ρ)^{∗.2}_∗
  E[[e.1]]ρ = Σ_{xnew} (E[[e]]ρ)^{(∗, xnew)}_∗
  E[[e.2]]ρ = Σ_{xnew} (E[[e]]ρ)^{(xnew, ∗)}_∗
  E[[L e]]ρ = (E[[e]]ρ)^{left ∗}_∗ × ⟨isLeft ∗ : 1⟩
  E[[R e]]ρ = (E[[e]]ρ)^{right ∗}_∗ × ⟨isRight ∗ : 1⟩
  E[[case e el er]]ρ = E[[either el er e]]ρ

Here S is the support function, which gives us the set of possible values to which an expression e could evaluate; the environment ρ is used only to compute support. The variables xnew, xf, and xa are unique, fresh variables. We notate substitution using superscripts and subscripts; t^e_x stands for the measure term t with expression e substituted for variable x. Functions either, left, right, isLeft, and isRight are predefined functions that support sum types; the name either represents a literal value, not a variable.

Figure 4: Translation of stochastic calculus into measure terms

In the rule for application, the product symbol ∏ is a metalanguage symbol and not part of the measure-term language; it stands for a large multiplication. This multiplication is a device for applying functions to get measure terms. If we write the multiplication as t = ∏ · · ·, then given any record ρ that maps xf to vf and xa to va, T[[t]]Γ(ρ) = T[[vf va]]Γ(ρ).

Making variable elimination work effectively on the results of the translation requires two steps not shown in Figure 4: introducing multiple variables for product spaces and manipulating the translation of choose to keep independent terms separate.

Because the single free variable ∗ often stands for a value in a product space, we need a way to "split up" the variable into its constituent parts. The product spaces Γ ⊎ {x :: X1 × X2} and Γ ⊎ {x1 :: X1, x2 :: X2} are isomorphic; we take them to be equivalent, and we use the equality

  T[[t]](Γ ⊎ {x :: X1 × X2})(ρ{x ↦ (v1, v2)}) = T[[t^{(x1,x2)}_x]](Γ ⊎ {x1 :: X1, x2 :: X2})(ρ{x1 ↦ v1, x2 ↦ v2})

to change variables within terms. This equality supports the following additional algebraic law:

  Σ_x t = Σ_{x1} Σ_{x2} t^{(x1,x2)}_x.

In the translation of choose shown in Figure 4, terms E[[e1]]ρ and E[[e2]]ρ are combined using addition. As a consequence, even if each of the terms contains free variables not found in the other, we will not be able to move the addition outside sums over those free variables. In our implementation, we apply the following law to the translation of choose:

  t × ⟨true : p⟩ + t′ × ⟨true : 1 − p⟩ =
    Σ_{xnew::Bool} (t × ⟨xnew : p⟩ + ⟨¬xnew : 1⟩) × (t′ × ⟨¬xnew : 1 − p⟩ + ⟨xnew : 1⟩).

On the right-hand side, we now have a chance to move other sums inside Σ_{xnew}.

6.2. Equivalence of two translations

A term in the stochastic lambda calculus denotes the same probability measure whether we compute the measure using the probability monad or using measure terms. This claim is central to our contribution, linking up elegant techniques for defining languages with efficient techniques for probabilistic reasoning.

The proof requires some technicalities associated with denotations of functions. The problem is this: in the P translation, the denotation of a lambda abstraction is made using a function returning a value in the probability monad; in the E translation, the denotation of a lambda abstraction is made using a function returning a measure term; to prove them equivalent, we need to reason about functions returning probability measures. We therefore have three different spaces of values, which we call M (monadic) space, T (term) space, and V (value) space. To show the two translations are equivalent, we need to be able to map both M space and T space into V space.

We define the mappings we need using a type-indexed family of transformations we call lift. If v is an atomic, zero-order value (integer, string, etc.), then lift F v = v. If v is a pair, then lift F (v1, v2) = (lift F v1, lift F v2), and similarly for sum types.

Finally, if v is a function, we define lift such that (lift F v) (lift F v′) = F(v v′). The lift transformation is closely related to the reify transformation used in type-directed partial evaluation (Danvy 1996).

Having defined lift, we extend it to expressions, formulas, and environments by saying that lift F e is the expression obtained by replacing all literal values in e with their lifted forms, and similarly for formulas and environments. In particular, when λx.e is syntax, lift F (λx.e) = λx.lift F e.

Finally, we have to adjust the definitions of M and T:

  M[[return v]](x) = 1, if x = lift M v; 0, if x ≠ lift M v
  M[[d >>= k]](x) = Σ_v M[[d]](lift M v) · M[[k(v)]](x)
  T[[⟨φ : w⟩]]Γ(ρ) = w · sat(lift T φ, ρ)

With the new definitions, a lambda term e that is translated via the probability monad has literal values that live in M space, a lambda term that is translated via measure terms has literal values that live in T space, but the denotations of both are probability measures that apply to values in V space.

With lift, we have the machinery we need to state that the two translations are equivalent. The proof is by structural induction on lambda terms, and the induction hypothesis is that if em and et are two lambda terms such that lift M em = lift T et, then

  M(P[[em]]ρ)(v) = T(E[[et]]ρ)(Γρ ⊎ {∗ :: Y})((lift M ρ){∗ ↦ v}),

where Γρ gives the names and types of variables in the domain of ρ, and Y is the type of e (and also the measurable space from which v is drawn). It follows that if e is a lambda term in which the only literals are zero-order literals such as integers, we can translate it either way and get the same probability distribution. A lemma that applies frequently in the proof is that adding variables to the environment of a term doesn't matter:

  T[[t]]Γ(ρ) = T[[t]](Γ ∪ {y})(ρ{y ↦ v}),

provided y does not appear in Γ, in the domain of ρ, or free in t. In this lemma, v is arbitrary.

6.3. Expectation

Using measure terms, we can speed up the computation of the expectation of a function h, provided we expose the structure of h to the variable-elimination algorithm. Our implementation doesn't compute expectations of general functions expressed in lambda terms; it computes expectations only of functions that can be expressed in the form h(v) = w1 · sat(φ1, {∗ ↦ v}) + · · · + wn · sat(φn, {∗ ↦ v}). For such a function, we define

  expectation′ h t = T[[Σ_∗ t × (⟨φ1 : w1⟩ + · · · + ⟨φn : wn⟩)]]{}{}.

Using the equivalence of the previous section, it is not hard to show that for a suitable closed lambda term e,

  expectation′ h (E[[e]]{}) = expectation h (P[[e]]{}).

Using variable elimination on the left-hand side can produce exponential speedups for some functions h.

7. Related work

Monads are used both in the study of language features and in the implementation of programming languages; Wadler (1992) introduces and motivates the topic. Monads come to us from category theory, and category theorists have long been aware of the monadic structure of probability (Lawvere 1962; Giry 1981). The support and sampling monads are also well known in the programming community, especially the support monad, because it describes nondeterministic choice. It appears that the expectation monad is not well known, perhaps because of its inherent inefficiency.

A tool called QuickCheck uses the sampling monad to generate random values for testing functional programs (Claessen and Hughes 2000). QuickCheck's probabilistic-choice operator, although also called choose, is somewhat different from ours; it produces an integer distributed uniformly over a specified range. Either version of choose can be simulated using the other. QuickCheck also provides a frequency function, which is analogous to dist. Claessen and Hughes (2000) also presents generator transformers, which use existing generators to build new ones in ways that cannot be done using monadic bind alone. It is not obvious how the idea relates to the general probability monad.

Substantial effort has been devoted to developing Scott-style domain theory that could help specify semantics of probabilistic languages. Saheb-Djahromi (1980) shows the complete-partial-order structure of probability distributions on a domain. Jones and Plotkin (1989) uses evaluations rather than measures to build probabilistic powerdomains. Jones's (1990) doctoral thesis is slightly more accessible to the amateur; Section 1.4 provides a brief guide to the literature.

It would be pleasant to extend our calculus with recursive functions, recursion equations, or a fixed-point operator, but to give a careful semantics to such an extended calculus would require domain theory, category theory, and analysis that are beyond the scope of this paper. The particularly sticky bits have to do with permitting probability distributions over functions, which we wish to do in our models. We have identified two related languages that support recursion.

Saheb-Djahromi (1978) presents a probabilistic version of LCF, in which expressions of base types (integer or Boolean) denote probability distributions but expressions of function type denote functions, not distributions over functions. The language includes both call-by-name and call-by-value abstraction constructs; only values of base types may be passed to call-by-value functions. The paper presents both denotational and operational semantics and shows them equivalent.

Jones (1990), Chapter 8, presents a call-by-value language that is much closer to ours. The major distinctions are that the language includes a fixed-point operator and that it has not one but two syntactic categories: expressions and function expressions. Expressions denote probability distributions; function expressions denote functions from values to probability distributions. The syntax is structured such that only function expressions can be applied, and the fixed-point operator can be applied only to function expressions, not to ordinary expressions. Using syntax whose meaning corresponds to the monadic unit operation, every function expression can be made into an ordinary expression, but there is no reverse path. Thus, although the language does support recursive functions, it is not possible to apply a function that is stored in a tuple or passed as an argument. The thesis presents both denotational and operational semantics for the language and shows them equivalent.

To avoid difficulties with product spaces, the denotational semantics uses evaluations, not measures, to represent probability distributions. Modulo this difference and our omission of recursive functions, the denotational semantics appears to be isomorphic to ours.

Benveniste et al. (1995) and Gupta, Jagadeesan, and Panangaden (1999) discuss languages for the description of concurrent probabilistic systems. Their work differs in flavor from ours, since it combines the probabilistic specification of stochastic behavior with unquantified, constraint-based non-determinism. As a result, a program may or may not define a single probability distribution. In our language, there is only probabilistic non-determinism, and a program defines a unique probability measure.

There is a significant body of work available on variable elimination. Pearl (1988) popularized graphical models, including Bayesian networks, as well as the polytree reasoning algorithm. Commercial reasoning systems commonly use the junction-tree algorithm of Lauritzen and Spiegelhalter (1988). Zhang and Poole (1994) describes a variable-elimination algorithm for Bayesian networks; Li and d'Ambrosio (1994) presents an algebraic variation. Dechter (1996) shows that the polytree and junction-tree algorithms are also forms of variable elimination, and that the variable-elimination framework unites probabilistic reasoning with a variety of other tasks such as constraint satisfaction. Arnborg (1985) lays the graph-theoretic foundations for variable elimination.

Most implementations of variable elimination seem to be based on more specialized representations than we use. For example, the basic unit of representation may be a table that corresponds to a measure term of the form Σ_{i=1}^n ⟨∧_{j=1}^m πj = vij : wi⟩, where π is a path x.k1.k2 . . . kn and vij is a value or "don't care." The generality of measure terms significantly simplifies the implementation, and it appears not to impose unacceptable efficiency costs.

Learning algorithms that use variable elimination often combine it with memoization (frequently called "caching" in the probabilistic-reasoning literature). We believe that we can achieve similar performance gains, without introducing mutable state into an implementation, by introducing let-binding for measure terms and by using hash-consing to implement common-subexpression elimination.

Other researchers in probabilistic modeling have used languages that resemble our stochastic lambda calculus. Koller, McAllester, and Pfeffer (1997) presents a simple Lisp-like language with a coin-toss function, giving it an operational semantics based on sampling experiments. Muggleton (2001) presents a similar extension to logic programs. Luger and Pless (2001) proposes using a stochastic lambda calculus, with a traditional reduction semantics, as a foundation for probabilistic-modeling languages. Lambda terms are reduced to sets of (value, probability) pairs, and additional rules are needed to distribute function application over these sets; in our framework, the monadic bind solves this problem. The paper argues that deBruijn indices should make it easy to identify equal values and to support memoization.

8. Discussion

We have elucidated connections between the measure-theoretic basis of probability, monadic techniques for describing programming languages, and variable-elimination techniques used for efficient probabilistic reasoning. Using a monad to describe the stochastic lambda calculus is not only entertaining—it reduces proof obligations. For example, given our theorems about the various probability monads, we can take any probabilistic language, translate it into the probability monad, and be sure sampling is consistent with expectation. Using a monad enables us to separate the structure of our semantic domain from the details of any particular probabilistic language.

It is not clear whether we can retain these advantages and also exploit the greater efficiency of measure terms. We can create an instance of the probability monad that produces measure terms, but the cost of the obvious algorithm is proportional to the size of the product space. The open question is whether we can create an instance of the probability monad that produces measure terms at a cost no greater than the cost of evaluating those terms after variable elimination. Techniques inspired by type-directed partial evaluation, which can produce abstract syntax for native functions, might solve this problem (Danvy 1996).

Another challenge is to add recursion to our calculus. From the point of view of the probability monad, this would appear to present few difficulties. Suppose we wish to define a recursive function f = λx.e, where both f and x appear free in e. The key is that when we consider the meaning of e, f must stand for a function, not a probability distribution over functions. It would therefore be difficult to use the classical style of fixed-point combinator. It is easy, however, to use a style that takes fixed points only of syntactic lambda abstractions; we write fix f x e, with P[[fix f x e]]ρ = fix(λw.λv.P[[e]]ρ{f ↦ w, x ↦ v}), where fix(g) = ⨆_{i=0}^∞ g^(i)(⊥). Generalizing such a construct to mutually recursive functions is straightforward (Appel 1992). Jones (1990) explains why the fixed point exists.
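A hedged sketch of how such a fix construct could look in the interpreter sketched after Figure 2 (ours, not the paper's code): we take the fixed point with Haskell's own recursion and, by analogy with the interpreter's rule for λ, wrap the resulting function value in return.

  -- Sketch (ours): a clause for 'fix f x e', assuming the Term type is
  -- extended with a constructor  Fix String String Term.
  interpFixClause :: ProbabilityMonad m => String -> String -> Term -> Env m -> m (Value m)
  interpFixClause f x e rho = return w
    where w = Fun (\v -> interp e ((f, w) : (x, v) : rho))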

We would like to extend our monadic implementation to incorporate such infinite models and to answer such queries effectively. The natural measurable space over which such queries would make sense should be the Borel sets of the Scott topology.

In the long term, we would like to explore Bayesian learning in a monadic framework. In the Bayesian paradigm, the probabilities p passed to choose are not known exactly, but are themselves defined only by a probability distribution. A Bayesian experiment consists of a model, a prior distribution over parameters p, and a set of observations; the result is a posterior distribution over the parameters. For example, we could use Bayesian experiments to estimate the probability that an aggressive driver runs a red light. To incorporate the Bayesian approach into our monadic framework, we would need to make the probability p an expression, not a value. Such an extension might provide a useful declarative foundation for an active area of machine learning.
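One special case is already expressible in the monad (a sketch of ours, not a proposal from the paper): when the prior over a parameter is itself a discrete distribution, the parameter can be drawn inside the monad and then passed to choose, so the probability is computed rather than written as a literal. The names prior and biasedCoin below are hypothetical; the harder part of the Bayesian extension, conditioning on observations to obtain a posterior, is not addressed by this sketch.

    -- A sketch assuming the ProbabilityMonad class and the Exp instance
    -- of Appendix A; not part of the paper's code.
    prior :: Exp Double                -- discrete prior over the bias p
    prior = choose 0.5 (return 0.9) (return 0.1)

    biasedCoin :: Exp Bool
    biasedCoin =
      do p <- prior
         choose p (return True) (return False)

    -- expectation (\b -> if b then 1 else 0) biasedCoin
    --   evaluates to 0.5 * 0.9 + 0.5 * 0.1 = 0.5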

Acknowledgments

Simon Peyton Jones helped unstick our discussions of measure terms and variable elimination. Bob Muller helped us work out lift and guided us through the thickets of domain theory. We enjoyed many stimulating discussions with members of the CS 252 seminar, especially Chung-chieh Shan and Dylan Thurston. Jon Eddy, Simon Peyton Jones, and Chung-chieh Shan provided helpful comments on the manuscript. The anonymous referees not only directed us to useful related work but also made suggestions that helped us improve the paper significantly.

This work was supported by NSF grant CCR-0096069, by an Alfred P. Sloan Research Fellowship, and by Harvard University.

References

Appel, Andrew W. 1992. Compiling with Continuations. Cambridge: Cambridge University Press.
Arnborg, Stefan. 1985. Efficient algorithms for combinatorial problems on graphs with bounded decomposability. BIT, 25(1):2-23.
Benveniste, Albert, Bernard C. Levy, Eric Fabre, and Paul Le Guernic. 1995. A calculus of stochastic systems for the specification, simulation, and hidden state estimation of mixed stochastic/nonstochastic systems. Theoretical Computer Science, 152(2):171-217.
Charniak, Eugene. 1993. Statistical Language Learning. MIT Press.
Claessen, Koen and John Hughes. 2000 (September). QuickCheck: a lightweight tool for random testing of Haskell programs. Proceedings of the Fifth ACM SIGPLAN International Conference on Functional Programming (ICFP'00), in SIGPLAN Notices, 35(9):268-279.
Danvy, Olivier. 1996. Type-directed partial evaluation. In Conference Record of the 23rd Annual ACM Symposium on Principles of Programming Languages, pages 242-257, New York, NY.
Dechter, Rina. 1996 (August). Bucket elimination: A unifying framework for probabilistic inference. In Proceedings of the 12th Conference on Uncertainty in Artificial Intelligence (UAI-96), pages 211-219, San Francisco.
Giry, Michèle. 1981. A categorical approach to probability theory. In Banaschewski, Bernhard, editor, Categorical Aspects of Topology and Analysis, Vol. 915 of Lecture Notes in Mathematics, pages 68-85. Springer Verlag.
Gupta, Vineet, Radha Jagadeesan, and Prakash Panangaden. 1999 (January). Stochastic processes as concurrent constraint programs. In Conference Record of the 26th Annual ACM Symposium on Principles of Programming Languages, pages 189-202.
Hughes, John. 1989 (April). Why functional programming matters. The Computer Journal, 32(2):98-107.
Hughes, John. 1995. The design of a pretty-printing library. In Jeuring, Johan and Erik Meijer, editors, Advanced Functional Programming, Vol. 925 of Lecture Notes in Computer Science. Springer Verlag.
Jaakkola, Tommi S. and Michael I. Jordan. 1999. Variational probabilistic inference and the QMR-DT network. Journal of Artificial Intelligence Research, 10:291-322.
Jensen, Finn V. 1996. An Introduction to Bayesian Networks. New York: Springer.
Jones, Claire. 1990 (July). Probabilistic Non-determinism. PhD thesis, Department of Computer Science, University of Edinburgh. Also Laboratory for the Foundations of Computer Science technical report ECS-LFCS-90-105. Available online.
Jones, Claire and Gordon D. Plotkin. 1989. A probabilistic powerdomain of evaluations. In Proceedings of the Fourth Annual IEEE Symposium on Logic in Computer Science, pages 186-195.
Jordan, Michael I., editor. 1998. Learning in Graphical Models. Kluwer.
Koller, Daphne, David McAllester, and Avi Pfeffer. 1997. Effective Bayesian inference for stochastic programs. In Fourteenth National Conference on Artificial Intelligence (AAAI), pages 740-747.
Lauritzen, Steffen L. and David J. Spiegelhalter. 1988. Local computations with probabilities on graphical structures and their application to expert systems. Journal of the Royal Statistical Society, pages 157-224.
Lawvere, F. William. 1962. The category of probabilistic mappings. Unpublished.
Li, Zhaoyu and Bruce d'Ambrosio. 1994. Efficient inference in Bayes nets as a combinatorial optimization problem. International Journal of Approximate Reasoning, 11(1):55-81.
Luger, George and Dan Pless. 2001. A stochastic λ-calculus. Technical Report TR-CS-2001-04, Department of Computer Science, University of New Mexico.
Mahoney, Suzanne M. and Kathryn Blackmond Laskey. 1998 (July). Constructing situation specific belief networks. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (UAI-98), pages 370-378. Morgan Kaufmann.
Muggleton, Stephen. 2001. Stochastic logic programs. Journal article, accepted subject to revision.
Pearl, Judea. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Mateo, CA: Morgan Kaufmann.
Pfeffer, Avi. 2001 (August). IBAL: A probabilistic rational programming language. In Seventeenth International Joint Conference on Artificial Intelligence (IJCAI), pages 733-740, Seattle.
Pfeffer, Avi and Daphne Koller. 2000 (July). Semantics and inference for recursive probability models. In Proceedings of the 17th Conference on Artificial Intelligence (AAAI-00), pages 538-544, Menlo Park, CA.
Ramsey, Norman. 1994 (September). Literate programming simplified. IEEE Software, 11(5):97-105.
Rudin, Walter. 1974. Real and Complex Analysis. Series in Higher Mathematics. Second edition. New York: McGraw-Hill.
Saheb-Djahromi, N. 1978 (September). Probabilistic LCF. In Winkowski, Józef, editor, Proceedings of the 7th Symposium on Mathematical Foundations of Computer Science, Vol. 64 of Lecture Notes in Computer Science, pages 442-451. Springer.
Saheb-Djahromi, N. 1980 (September). CPO's of measures for nondeterminism. Theoretical Computer Science, 12(1):19-37.
Smyth, Michael B. 1983 (July). Power domains and predicate transformers: A topological view. In Díaz, Josep, editor, Automata, Languages and Programming, 10th Colloquium (ICALP-83), Vol. 154 of Lecture Notes in Computer Science, pages 662-675, Barcelona, Spain.
Wadler, Philip. 1992 (January). The essence of functional programming (invited talk). In Conference Record of the 19th Annual ACM Symposium on Principles of Programming Languages, pages 1-14. New York, NY: ACM Press.
Witten, Ian H., Radford M. Neal, and John G. Cleary. 1987 (June). Arithmetic coding for data compression. Communications of the ACM, 30(6):520-540.
Zhang, Nevin L. and David Poole. 1994. A simple approach to Bayesian network computations. In Tenth Biennial Canadian Artificial Intelligence Conference.

Addendum

We overlooked at least two relevant pieces of related work. Andrzej Filinski's doctoral dissertation presents something about the probability monad. David Monniaux published something relevant in ESOP'01.

A. Implementations of probability monads

These implementations are derived from the algebraic laws in Section 5, using techniques explained by Hughes (1989). This paper was prepared using the Noweb system for "literate programming" (Ramsey 1994), and all the Haskell code in the paper has been automatically extracted and run through the Glasgow Haskell Compiler, version 4.06.

The support monad  In the support monad, we represent a distribution by a list of values it could contain.

⟨probability monads⟩+≡
newtype Support a = Support [a]
instance Monad Support where
  return x = Support [x]
  (Support l) >>= k =
    Support (concat [s | x <- l, let Support s = k x])
instance ProbabilityMonad Support where
  choose _ (Support l) (Support l') = Support (l ++ l')
instance SupportMonad Support where
  support (Support l) = l

The expectation monad  We represent the expectation monad by a function that computes expectation directly.

⟨probability monads⟩+≡
newtype Exp a = Exp ((a -> Double) -> Double)
instance Monad Exp where
  return x = Exp (\h -> h x)
  (Exp d) >>= k =
    Exp (\h -> let apply (Exp f) arg = f arg
                   g x = apply (k x) h
               in d g)
instance ProbabilityMonad Exp where
  choose p (Exp d1) (Exp d2) =
    Exp (\h -> p * d1 h + (1-p) * d2 h)
instance ExpMonad Exp where
  expectation h (Exp d) = d h

The sampling monad  Again, we represent the sampling monad as a suitable function.

⟨probability monads⟩+≡
newtype Sample a = Sample (RandomGen g => g -> (a, g))
instance Monad Sample where
  return x = Sample (\ g -> (x, g))
  (Sample s) >>= k =
    Sample (\ g -> let (a, g') = s g
                       Sample s' = k a
                   in s' g')
instance ProbabilityMonad Sample where
  choose p (Sample s1) (Sample s2) =
    Sample (\g -> let (x, g') = Random.random g
                  in (if x < p then s1 else s2) g')
instance SamplingMonad Sample where
  sample (Sample s) g = s g

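As a usage sketch (ours, not from the paper), the same model can be run through all three implementations above by writing it against the type class rather than a particular monad. The names twoCoins and bothHeads are hypothetical; the sketch assumes the ProbabilityMonad, SupportMonad, ExpMonad, and SamplingMonad classes used in this appendix, plus a standard random-number generator such as mkStdGen.

    -- A usage sketch, not part of the paper's code.
    twoCoins :: (Monad m, ProbabilityMonad m) => m (Bool, Bool)
    twoCoins =
      do a <- choose 0.5 (return True) (return False)
         b <- choose 0.5 (return True) (return False)
         return (a, b)

    bothHeads :: (Bool, Bool) -> Double
    bothHeads (a, b) = if a && b then 1 else 0

    -- support (twoCoins :: Support (Bool, Bool))
    --   ==> [(True,True), (True,False), (False,True), (False,False)]
    -- expectation bothHeads (twoCoins :: Exp (Bool, Bool))
    --   ==> 0.25
    -- sample (twoCoins :: Sample (Bool, Bool)) (mkStdGen 42)
    --   ==> one of the four pairs, together with a new generator

Note also that the expectation law for choose, namely expectation h (choose p d1 d2) = p * expectation h d1 + (1-p) * expectation h d2, can be read off directly from the Exp instance above.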