arXiv:2011.13171v2 [cs.LO] 14 May 2021

Universal Semantics for the Stochastic λ-Calculus

Pedro H. Azevedo de Amorim∗, Dexter Kozen∗, Radu Mardare†, Prakash Panangaden‡, Michael Roberts∗
∗Cornell University   †University of Strathclyde   ‡McGill University

978-1-6654-4895-6/21/$31.00 ©2021 IEEE

Abstract—We define sound and adequate denotational and operational semantics for the stochastic λ-calculus. These two semantic approaches build on previous work that used similar semantics to reason about higher-order probabilistic programs with an explicit source of randomness.

I. INTRODUCTION

Probabilistic programming has enjoyed a recent resurgence of interest, buoyed by applications in machine learning and the statistical analysis of large datasets. Recent foundational research has focused on new languages and semantic models for higher-order functional languages.

One approach that is radically different from other approaches is that of [1], which involves Boolean-valued models of the λ-calculus. Based on an original idea of Scott [2], the development of [1] succeeded in incorporating random variables in a set-theoretic model of the untyped λ-calculus. The language contains a binary probabilistic choice operator ⊕, which captures the idea that a choice is to be made between two terms based on a random process. The source of randomness is called a tossing process, a random variable T : Ω → 2^ω giving a sequence of independent fair coin flips. The primary goal was to develop an equational theory in which equations between stochastic λ-terms have probabilistic meaning and take values in a complete Boolean algebra. The intention was to provide reasoning principles for evaluating the equality of stochastic λ-terms under various program transformations.

Boolean-valued models were first introduced by Scott [3] as an alternative to the forcing technique introduced by Cohen for obtaining independence results in set theory. The independence of the Continuum Hypothesis was obtained by introducing an arbitrarily large set of independent real-valued random variables. The standard Borel measure algebra of the space was used as a set of generalized truth values instead of the two-element Boolean algebra. The approach was formulated in terms of nonstandard models of ZF set theory based on a Boolean-valued interpretation of the membership relation (see [4]). Scott also observed that these ideas could be given a probabilistic interpretation instead of the usual set-theoretic one. The basic intuitions were laid out briefly in [2], and the formal development was carried out in [1].

The semantics presented in this paper differ from those of [1] in several key ways. The semantics of [1] use static scoping of random coins. This causes β-reduction for unrestricted terms to be unsound, as a random coin may be used in more than one probabilistic decision, breaking independence. This is resolved in [1] by a certain restriction, namely that a coin be used in only one probabilistic decision. This restriction is a major impediment to the development of an operational semantics: indeed, an operational semantics that is sound and adequate with respect to the denotational semantics of [1] cannot be given.

A more operational approach was taken in [5]. That work presented an operational semantics for the stochastic λ-calculus and proved the correctness of an implementation of trace Markov chain Monte Carlo. That work did not define a denotational semantics, which obliged them to reason combinatorially about programs.

In this paper we modify the approaches described above to conform to each other. We amend the stochastic denotational semantics of [1] to alter the scoping discipline of random coins, so that the source of randomness is dynamically supplied to functions at call time, solving the main problem left open in [1], yet in a way that still permits the Boolean-valued view mentioned above. We prove soundness and adequacy of an operational semantics, with rules similar to those of [5] but without the artificial restriction of [5], with respect to the reformulated stochastic semantics. The nonstandard Boolean-valued foundations of [1] further complicate the development of an operational semantics, allowing at best what one would call a Boolean-valued operational semantics; in this work, we present a semantics with simpler domain-theoretic foundations and a standard operational semantics. We also use capsules [7] to represent recursive functions instead of an explicit fixpoint constructor.

The organization of this paper and our main contributions are as follows.

Syntax: In §II we review the syntax of the stochastic λ-calculus as presented in [1], but with one change: we use capsules [7] instead of an explicit fixpoint constructor. A capsule is a pair ⟨M, σ⟩, where M is a λ-term and σ is an environment, such that FV(M) ⊆ dom σ and ∀x ∈ dom σ, FV(σ(x)) ⊆ dom σ. A capsule is a finite coalgebraic representation of a regular closed λ-coterm (an infinite λ-term). This representation obviates the need for an explicit fixpoint constructor, yet allows the formulation of big- and small-step operational rules.

Tossing Processes: In §III we undertake a comprehensive exposition of tossing processes, or measure-preserving transformations of the Cantor space of infinite coin sequences. These processes arise in the study of behavioral invariance of programs, i.e. programs that behave the same way except for coin usage. We characterize the computable and continuous processes, both partial and total, and show their relationship to prefix codes. We also identify a general class of processes called tree processes that we later use in §VII to characterize the relationship between the coin usage patterns of our big- and small-step operational semantics.

Scott topology of tossing processes: Also in §III, we show how to embed the Cantor space 2^ω in a Scott domain in a natural way, thereby laying the groundwork for our modified semantics. Consider the set 2^{≤ω} of finite and infinite binary strings ordered by the prefix relation. This is an algebraic DCPO whose compact elements are the finite strings. Define x↑ = {y ∈ 2^{≤ω} | x ⪯ y} for x ∈ 2^{≤ω}, where ⪯ is the prefix relation. The basic Scott-open sets are x↑ for x ∈ 2^*. These are well-known folklore results; the domain is usually known as the domain of binary streams.¹ The infinite streams or sequences, with the subspace topology inherited from the Scott topology, are homeomorphic to Cantor space. Lemmas 7 and 8 establish a formal relationship between these two spaces and their continuous maps. The "Scottified" Cantor space gives an explicit characterization of functions that behave continuously with respect to coin usage in the sense that halting computations depend only on finite prefixes of the coin sequence. This allows us to discuss computable and continuous tossing processes. All the tree processes are Scott-continuous.

In the treatment of [1], general β-reduction is unsound, precluding any standard operational semantics. This is because the source of randomness used by a function in the evaluation of its body is a coin sequence packaged with the function at the site of the function's definition. Thus randomness, like environments, is statically scoped. This can lead to the reuse of coins at different locations in the program, thereby breaking linearity. For example, in the evaluation of (λx.xa(xb))(λy.cy ⊕ dy), the same coin is used twice in the resolution of the two ⊕'s when the body of the first term is evaluated. To achieve adequacy with respect to an operational semantics, we modify the denotational semantics of functions to allow the random source to be supplied as a parameter at the call site.

A Simplified Stochastic Semantics: In §IV, we review the stochastic denotational semantics of [1]. That semantics is based on a semantic map

  ⟦−⟧ : Exp → Env → Cont → Toss → RV

where
• RV is the set of random variables Ω → Val from a sample space Ω taking values in a reflexive CPO Val,
• Exp is the set of stochastic λ-terms M,
• Env is the set of environments E : Var → RV,
• Cont is the set of continuations C : RV → RV, and
• Toss is the set of tossing processes T : Ω → 2^ω.
Thus ⟦M⟧ECT : RV. We can simplify the exposition as follows:
• Suppose we restrict continuations to be of the form SF = λf.λω.Fω(fω) for some F : Ω → [Val → Val], where [Val → Val] denotes the Scott-continuous deterministic maps.² Then all continuations that arise in the inductive definition of ⟦M⟧ are also of this form. Formally adopting this restriction allows us to eliminate continuations altogether.
• A tossing process determines how a supplied source of randomness is used in a computation. In [1] they are of type Ω → 2^ω, where Ω is an abstract sample space. For our purposes, there is no reason not to assume that the sample space is 2^ω with Lebesgue measure, so a tossing process is now any measurable T : 2^ω → 2^ω such that T^{-1} preserves measure. Examples are tl(α) = α_1α_2α_3··· and evens(α) = α_0α_2α_4···. This allows a more concrete treatment as developed in §III.
• We can omit the fixpoint operator of [1] using capsules [7], as described in §III.

Deterministic Denotational Semantics: In §V, we observe that in the stochastic semantics, the value of ⟦M⟧ETω depends not on the whole tossing process T nor the environment E, which are random variables parameterized by a sample point ω ∈ Ω, but only on their values. Intuitively, each run of the program corresponds to one trial, which is determined by a single sample point ω. This is the same observation used to eliminate continuations. This allows us to develop an intermediate deterministic denotational semantics in which probabilistic choices are resolved in advance, after which the program runs deterministically, making probabilistic decisions based on a presampled infinite stack of random bits.

The deterministic denotational semantics is built on a reflexive domain of values constructed using the Scottified Cantor space of §III. In §VI, we prove the equivalence of the stochastic denotational semantics of [1] (as modified in §IV) and the deterministic semantics of §V (Theorem 16).

Operational Rules: In §VII, we give big-step and small-step structured operational semantics in the style of [8]. The big-step rules take the form ⟨M,e⟩ ⇓α ⟨v,f⟩, which means that ⟨M,e⟩ reduces to normal form ⟨v,f⟩ with coins α ∈ 2^ω. The small-step rules take the form ⟨M,e⟩ →x ⟨N,f⟩, which means that ⟨M,e⟩ reduces to ⟨N,f⟩ via a reduction that consumes exactly a prefix x ∈ 2^* of the infinite coin sequence. In Theorem 17, we prove the equivalence of the big- and small-step rules, which use their random coins in a different pattern. The relationship is characterized by a tree process as described in §III.

Soundness and Adequacy: In §VIII, we prove the soundness and adequacy of our denotational semantics with respect to our big-step operational semantics (Theorem 18). Unlike most adequacy proofs that use logical relations, this proof is a relatively straightforward inductive argument, as the deterministic

¹ https://en.wikipedia.org/wiki/Scott_domain
² The operation S is the familiar S-combinator from combinatory logic.

denotational semantics and the big-step operational semantics use their coins in the same pattern.

II. SYNTAX

Let Var be a countable set of program variables x, y, .... Let Exp denote the set of untyped λ-terms M, N, K, ... with the usual abstraction and application operators plus an additional binary operator ⊕ for probabilistic choice. Let Λ denote the set of λ-abstractions, λ-terms of the form λx.M.

A. Capsules

A capsule is a pair ⟨M, σ⟩, where M ∈ Exp and σ : Var ⇁ Λ is a capsule environment, such that
(i) FV(M) ⊆ dom σ
(ii) ∀x ∈ dom σ, FV(σ(x)) ⊆ dom σ.
Here dom σ refers to the domain of σ and FV(M) refers to the set of free variables of M. A capsule is reduced if its first component is in Λ. Reduced capsules are denoted with lowercase letters, as ⟨v, σ⟩.

A capsule is a finite coalgebraic representation of a regular closed λ-coterm (infinitary λ-term), which is an element of the final coalgebra for the signature of the λ-calculus. Capsules give a convenient representation of recursive functions without the need of fixpoint combinators.

Capsules are considered equivalent modulo α-conversion, including α-conversion of the variables used in σ. In terms of nominal sets with the variables as atoms, the support of a capsule is ∅. Capsules are also considered equivalent modulo garbage collection in the sense that we can assume without loss of generality that dom σ is a minimal set of variables satisfying (i) and (ii). The capsule β-reduction rule is

  ⟨(λx.M) v, σ⟩ → ⟨M[y/x], σ[v/y]⟩   (y fresh)

applied in a call-by-value evaluation order. This mechanism captures static scoping without closures, heaps, or stacks [7]. Here we are using the notation [−/−] for both substitution (as in M[y/x]) and rebinding (as in σ[v/y]).

Capsules were introduced in [7]. For the stochastic λ-calculus, we augment the system with the new syntactic construct M ⊕ N for probabilistic choice.

III. TOSSING PROCESSES

The Cantor space 2^ω is the space of infinite bitstreams. It is the topological power of ω copies of the two-element discrete space 2 = {0, 1}. Elements of 2^ω are denoted α, β, .... The topology is generated by basic open sets I_x = {α ∈ 2^ω | x ≺ α}, where x ∈ 2^* and ≺ denotes the strict prefix relation. The sets I_x are called intervals. The topology is also generated by the standard metric d(α, β) = 2^{-n}, where n is the length of the longest common prefix of α and β, or 0 if α = β.

The Borel sets B of the Cantor space are the smallest σ-algebra containing the open sets. The uniform (Lebesgue) measure Pr on (2^ω, B) is generated by its values on intervals: Pr({α | x ≺ α}) = 2^{-|x|}. The Lebesgue measurable sets are the smallest σ-algebra containing the Borel sets and all subsets of null sets. The set of null sets is denoted N.

A tossing process is any measurable map T : 2^ω → 2^ω such that T^{-1} preserves measure; that is, for all A ∈ B, Pr(T^{-1}(A)) = Pr(A). Given an infinite bitstream α = α_0α_1α_2···, we can define the examples tl(α) = α_1α_2α_3··· and evens(α) = α_0α_2α_4···. A tossing process determines how a supplied source of randomness is used in a computation.

Lemma 1. T is a tossing process iff for all x ∈ 2^*,

  Pr({α | x ≺ T(α)}) = 2^{-|x|}.

Proof. We have x ≺ T(α) iff α ∈ T^{-1}({γ | x ≺ γ}), therefore

  T is a tossing process
  ⇔ ∀x ∈ 2^*  Pr(T^{-1}({γ | x ≺ γ})) = Pr({γ | x ≺ γ})
  ⇔ ∀x ∈ 2^*  Pr({α | x ≺ T(α)}) = 2^{-|x|}.

A. Computable and Continuous Processes

For a function f : 2^ω → 2^ω to be computable, it must be possible to emit each digit of the output stream after reading only finitely many digits of the input stream. For example, one can emit the nth digit of tl α after reading n + 1 digits of α, and one can emit the nth digit of evens α after reading the first 2n − 1 digits of α.

Lemma 2. All computable tossing processes T : 2^ω → 2^ω are continuous. All continuous functions 2^ω → 2^ω are uniformly continuous with respect to the standard metric.

Proof. To be computable, it must be the case that any finite prefix x ≺ Tα of the output is determined by some finite prefix of the input α. This implies that x ≺ Tβ for any β that agrees with α on a sufficiently long prefix; in other words, {β | y ≺ β} ⊆ T^{-1}({γ | x ≺ γ}) for some y ≺ α. Thus T^{-1}({γ | x ≺ γ}) is open. As x was arbitrary, T is continuous.

It is a standard result that any continuous function on a compact metric space is uniformly continuous.

For example, tl is Lipschitz with constant 2: d(tl α, tl β) ≤ 2d(α, β). The maps evens and odds are not Lipschitz, but they are Hölder of order 1/2; that is, both maps satisfy d(Tα, Tβ) ≤ d(α, β)^{1/2}.

There is a subtle distinction between "reading" and "consuming" a digit. The latter refers to using the digit to make a probabilistic choice. One can read digits without consuming them; they can be saved to make probabilistic choices later, at which point they are consumed. It is important for independence that digits not be consumed more than once.

Uniform continuity fails if we allow tossing processes to be partial. A partial tossing process is a measure-preserving partial measurable function T : 2^ω ⇀ 2^ω. Such a function is necessarily almost everywhere defined, since dom T = T^{-1}(2^ω), which must have measure 1.
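Lemma 1's interval criterion can be checked concretely for the example processes tl and evens by exact enumeration, since each emits its first n output bits after reading a fixed finite number m of input bits. The sketch below (an illustration, not part of the paper) represents finite prefixes as tuples and verifies that every length-n output prefix receives probability exactly 2^{-n} under the uniform measure.

```python
from itertools import product

def tl(bits):              # tl(a0 a1 a2 ...) = a1 a2 a3 ...
    return bits[1:]

def evens(bits):           # evens(a0 a1 a2 ...) = a0 a2 a4 ...
    return bits[0::2]

def prefix_mass(T, n, m):
    """Exact Pr({alpha | x is a prefix of T(alpha)}) for every length-n
    output prefix x, computed by enumerating all 2^m length-m input
    prefixes (m must be large enough for T to emit n digits)."""
    counts = {}
    for alpha in product((0, 1), repeat=m):
        x = tuple(T(alpha)[:n])
        counts[x] = counts.get(x, 0) + 1
    return {x: c / 2**m for x, c in counts.items()}

# Lemma 1's criterion: every length-n output prefix has probability 2^-n.
for T, m in [(tl, 4), (evens, 6)]:
    masses = prefix_mass(T, 3, m)
    assert len(masses) == 8
    assert all(p == 2**-3 for p in masses.values())
```

The same enumeration applied to a map that is not measure-preserving, e.g. one that prepends a constant bit to its input, fails the 2^{-n} test immediately.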

Lemma 3. All computable partial tossing processes T : 2^ω ⇀ 2^ω are continuous. There is a computable partial tossing process that is continuous but not uniformly continuous.

Proof. Computable partial tossing processes T : 2^ω ⇀ 2^ω are continuous for the same reason that total ones are.

For the second statement, define T coinductively as follows:

  T(0^*10α) = 0 T(α)    T(0^*11α) = 1 T(α).

The domain of definition of T is (0^*1)^ω, the measure-1 set of streams containing infinitely many 1's. It is continuous, since if α and β share a prefix with at least 2n 1's, then Tα and Tβ share a prefix of length n. It is not uniformly continuous, as there is no bound on the number of input digits that need to be read before producing the next output digit.

The T of the previous lemma is undefined on the nullset 2^*0^ω. One can define T arbitrarily on this set, but Lemma 2 says that the resulting total tossing process cannot be continuous. This does not rule out the possibility that every total tossing process might be equivalent modulo N to some continuous partial tossing process. However, this too is false.

Lemma 4. There is a tossing process that is not equivalent modulo N to any continuous partial tossing process.

Proof. The proof uses [9, Exercises 7 and 8, p. 59]. A complete proof can be found in the Appendix, which can be found in the complete version of this paper.

Lemma 5. All continuous tossing processes, partial or total, are surjective.

Proof. For any β ∈ 2^ω, T^{-1}({β}) = ⋂_n T^{-1}({α | α_n = β_n}). This is the intersection of a collection of closed sets with the finite intersection property in a compact space, therefore it is nonempty.

Every tossing process is equivalent modulo N to a partial T′ that is "almost continuous" in the sense that all T′^{-1}({α | x ≺ α}) are ∆⁰₂, that is, both G_δ and F_σ. One can obtain T′ from T by deleting countably many nullsets G_x \ F_x, where G_x and F_x are G_δ and F_σ, respectively, such that Pr(F_x) = Pr(G_x) and F_x ⊆ T^{-1}({α | x ≺ α}) ⊆ G_x. The sets F_x and G_x exist by Lebesgue measurability.

The following theorem gives a characterization of the continuous partial and total tossing processes T : 2^ω → 2^ω. A binary prefix code is a nonempty set of prefix-incomparable finite-length binary strings. A binary prefix code P is exhaustive if all α ∈ 2^ω have a prefix in P. An exhaustive prefix code is necessarily finite by compactness.

If P and Q are two binary prefix codes, write P ⪯ Q if every element of Q is an extension of some element of P; that is, for every y ∈ Q, there exists x ∈ P such that x ⪯ y. A coding function is a map x ↦ P_x, where P_x is a prefix code, such that
• P_ε = {ε};
• if x and y are prefix-incomparable, then P_x ∩ P_y = ∅;
• if x ⪯ y, then P_x ⪯ P_y.
In addition, x ↦ P_x is said to be exhaustive provided
• if P is an exhaustive prefix code, then so is ⋃_{x∈P} P_x.

Theorem 6. For every continuous partial tossing process T, there is a unique coding function x ↦ P_x such that
(i) x ≺ Tα iff y ≺ α for some y ∈ P_x; in other words,
  T^{-1}(I_x) = {α | x ≺ Tα} = ⋃_{y∈P_x} I_y;
(ii) P_x is ⪯-minimal among prefix codes satisfying (i);
(iii) Pr(⋃_{y∈P_x} I_y) = 2^{-|x|}.
If T is total, then x ↦ P_x is exhaustive. Moreover, every coding function of this form gives rise to a continuous partial or total tossing process.

Proof. A proof can be found in the Appendix, which can be found in the complete version of this paper.

B. Tree Processes

Let t : 2^* → ω be a labeled tree with no repetition of labels along any path; that is, if x, y ∈ 2^* with x ≺ y, then t(x) ≠ t(y). Each such tree gives rise to a continuous tossing process T : 2^ω → 2^ω as follows. Given α = α_0α_1α_2··· ∈ 2^ω, let T(α) = β_0β_1β_2···, where β_n = α_{t(β_0β_1···β_{n−1})}. Thus the bit of the input sequence α that is tested in the nth step can depend on the outcomes of previous tests as determined by t. The restriction "no repetition of labels along any path" ensures that no coin is used more than once.

Every such T is measurable and measure-preserving, thus a tossing process:

  T^{-1}({γ | β_0···β_{n−1} ≺ γ}) = {α | β_0···β_{n−1} ≺ T(α)}
  = ⋂_{i=0}^{n−1} {α | T(α)_i = β_i} = ⋂_{i=0}^{n−1} {α | α_{t(β_0···β_{i−1})} = β_i}

and

  Pr(T^{-1}({γ | β_0···β_{n−1} ≺ γ})) = Pr(⋂_{i=0}^{n−1} {α | α_{t(β_0···β_{i−1})} = β_i})
  = ∏_{i=0}^{n−1} Pr({α | α_{t(β_0···β_{i−1})} = β_i}) = 2^{-n}.

Such processes are called tree processes.

Tree processes are uniformly continuous in the standard metric: T(α) and T(β) agree on their length-n prefixes provided α and β agree on their length-m prefixes, where m is the supremum of the labels on all nodes of depth n or less in the tree t.

C. Scottifying the Cantor Space

We can embed the Cantor space 2^ω in a Scott domain in a natural way. Consider the set 2^{≤ω} of finite and infinite binary strings ordered by the prefix relation. This is an algebraic CPO whose compact elements are the finite strings. Define x↑ = {y ∈ 2^{≤ω} | x ⪯ y} for x ∈ 2^{≤ω}. The basic Scott-open sets are x↑ for x ∈ 2^*.

Lemma 7.
(i) If B is a Scott-open set of 2^{≤ω}, then B ∩ 2^ω is a Cantor-open set of 2^ω.
(ii) If A is a Cantor-open set of 2^ω, then {x ∈ 2^{≤ω} | x↑ ⊆ A} is a Scott-open set of 2^{≤ω}, and is the largest Scott-open set B such that B ∩ 2^ω = A.

Thus the Cantor space 2^ω is a subspace of the Scott space 2^{≤ω}. The "Scottified" Cantor space gives an explicit characterization of functions that behave continuously with respect to coin usage in the sense that computations depend only on finite prefixes of the coin sequence.

Let D be a continuous ω-CPO ordered by ⊑ with a meet operation ⊓. Let ≺ be the proper prefix relation on strings.

Lemma 8.
(i) If f : 2^{≤ω} → D is Scott-continuous, then f ↾ 2^ω : 2^ω → D is Cantor-continuous.
(ii) If g : 2^ω → D is Cantor-continuous, then g extends to a Scott-continuous map λx.⊓_{x≺α} g(α) : 2^{≤ω} → D.

[Figure: a commuting diagram relating a Scott-continuous f : 2^{≤ω} → D to its restriction f ↾ 2^ω, and a Cantor-continuous g : 2^ω → D to its extension λx.⊓_{x≺α} g(α).]

Proof. A proof can be found in the Appendix, which can be found in the complete version of this paper.

IV. STOCHASTIC SEMANTICS

We review briefly the stochastic semantics from [1]. This semantics was based on a map

  ⟦−⟧ : Exp → Env → Cont → Toss → RV

where
• RV is the set of random variables Ω → Val from a sample space Ω taking values in a reflexive CPO Val,
• Exp is the set of stochastic λ-terms M,
• Env is the set of environments E : Var → RV,
• Cont is the set of continuations C : RV → RV, and
• Toss is the set of tossing processes T : Ω → 2^ω.
Thus ⟦M⟧ECT : RV. The Boolean-valued semantics interpreted properties in the Boolean algebra of measurable sets of Ω.

We can simplify the definition of [1] with a few observations.
(i) In [1], the map ⟦−⟧ is parameterized by continuations C : (Ω → Val) → (Ω → Val). Suppose we restrict continuations to be of the form SF = λf.λω.Fω(fω) for some F : Ω → [Val → Val], where [Val → Val] denotes the Scott-continuous deterministic maps.³ Then all continuations that arise in the inductive definition of ⟦−⟧ are also of this form. Formally adopting this restriction allows us to eliminate continuations altogether, thereby simplifying the presentation. This also makes sense at an intuitive level: a single trial is a single evaluation of the program and depends only on one sample from Ω.
(ii) Tossing processes in [1] are of type Ω → 2^ω, where Ω is an abstract sample space. A large part of the development of [1] was concerned with invariance properties of measure-preserving transformations of Ω. For our purposes, there is no reason not to take the sample space to be 2^ω with the standard Lebesgue measure. Thus tossing processes become measure-preserving maps T : 2^ω → 2^ω. This allows a more concrete treatment. A comprehensive characterization of such processes is given in §III.
(iii) The definition of [1] included a fixpoint operator. Our use of capsules allows us to eliminate this operator without loss of expressiveness.

In addition to these simplifications, we introduce a more radical change that will admit a full-fledged operational semantics, namely the dynamic scoping of the random source. The type of the semantic map is now

  ⟦−⟧ : Exp → Env → Toss → RV.

The values RV do not form a reflexive CPO; however, they are built out of a reflexive CPO, as explained below in §VI-A.

  Fun : RV → [Toss → RV → RV]
  Lam : [Toss → RV → RV] → RV

We will define these functions explicitly below in §VI-A.

Definition 9.
(i) ⟦x⟧ET = E(x)
(ii) ⟦MN⟧ET = Fun(⟦M⟧E(π_0^3 ∘ T))(π_1^3 ∘ T)(⟦N⟧E(π_2^3 ∘ T))
(iii) ⟦λx.M⟧ET = Lam(λTv.⟦M⟧E[v/x]T)
(iv) ⟦M ⊕ N⟧ET = λω. hd(Tω) ? ⟦M⟧E(tl ∘ T)ω : ⟦N⟧E(tl ∘ T)ω

where clause (ii) uses the notation π_i^3(α) (α_i when evident from the context) to refer to the subsequence of α consisting of bits whose indices are congruent to i mod 3; thus π_1^3(α_0α_1α_2···) = α_1α_4α_7···, and clause (iv) uses the ternary conditional

  b ? s : t = { s, if b = 1,    (1)
              { t, if b = 0.

In ⟦M⟧E, we assume that FV(M) ⊆ dom E.

V. DETERMINISTIC SEMANTICS

The observation of §IV that allowed continuations to be eliminated can be carried further. All components in the definition of ⟦−⟧ are parameterized by sample points ω ∈ Ω, but as observed, there is no resampling in the course of a single trial; it is the same ω. The function ⟦−⟧ does not really depend on the whole tossing process T or the whole environment E, which are random variables, but only on their values.

³ The operation S is the familiar S-combinator from combinatory logic.

This observation allows us to develop an intermediate deterministic denotational semantics in which all probabilistic choices are resolved in advance. The program runs deterministically, resolving probabilistic choices by consulting a preselected stack of random bits. In this section we introduce this semantics and develop some of its basic properties. Later, in §VI, we will prove that it is equivalent to the stochastic semantics of [1] as modified in §IV (Theorem 16).

A. A Domain of Values

Barendregt [10, §5] presents several constructions of reflexive CPOs that can serve as denotational models of the untyped λ-calculus. One concrete such model, due to Engeler [11], [12], is a reflexive ω-algebraic CPO P(Q) ordered by inclusion, where Q is a certain countable set. The basic Scott-open sets are a↑ = {b | a ⊆ b}, where a is a finite subset of Q. A function P(Q) → P(Q) is continuous if it is continuous in this topology; equivalently, if fb = ⋃_{c ∈ P_fin(b)} fc.

In this section we present a version of the Engeler model modified to include a random source as an argument to continuous functions using the Scottified Cantor space of §III-C. Define

  Q_0 = {∅}   Q_{n+1} = Q_n ⊎ (2^* × P_fin(Q_n) × Q_n)   Q = ⋃_n Q_n

and let Val = P(Q), ordered by inclusion. The basic Scott-open sets of Val are a↑ = {b | a ⊆ b}, where a ∈ P_fin(Q). A function Val → Val is continuous if it is continuous in this topology.

A function f : 2^ω → Val → Val is continuous if it is continuous in both variables with respect to the Scott topology on Val and the Cantor topology on 2^ω. The continuous functions of this type are denoted [2^ω → Val → Val]. Intuitively, f is continuous if its value on α ∈ 2^ω and b ∈ Val depends only on finite prefixes of α and finite subsets of b.

Lemma 10. Let f : [2^ω → Val → Val]. Then

  fαb = ⋃_{x≺α} ⋃_{c ∈ P_fin(b)} ⋂_{β ∈ I_x} fβc.

Proof. Let c ∈ Val. By the continuity of f in its first argument, λα.fαc : 2^ω → Val is continuous, therefore for any basic open set a↑,

  α ∈ (λα.fαc)^{-1}(a↑) ⇔ ∃x ≺ α  I_x ⊆ (λα.fαc)^{-1}(a↑).

Then

  a ⊆ fαc ⇔ fαc ∈ a↑
  ⇔ α ∈ (λα.fαc)^{-1}(a↑)
  ⇔ ∃x ≺ α  I_x ⊆ (λα.fαc)^{-1}(a↑)
  ⇔ ∃x ≺ α ∀β ∈ I_x  β ∈ (λα.fαc)^{-1}(a↑)
  ⇔ ∃x ≺ α ∀β ∈ I_x  a ⊆ fβc
  ⇔ a ⊆ ⋃_{x≺α} ⋂_{β ∈ I_x} fβc.

As a was arbitrary, for any b ∈ Val,

  fαb = ⋃_{c ∈ P_fin(b)} fαc = ⋃_{x≺α} ⋃_{c ∈ P_fin(b)} ⋂_{β ∈ I_x} fβc.

To obtain a reflexive domain, we need to construct continuous maps

  fun : Val → [2^ω → Val → Val]
  lam : [2^ω → Val → Val] → Val

such that fun ∘ lam = id. Define

  fun a = λαb.{q ∈ Q | ∃x ≺ α ∃c ∈ P_fin(b) (x, c, q) ∈ a}
        = λαb.⋃_{x≺α} ⋃_{c ∈ P_fin(b)} {q | (x, c, q) ∈ a}   (2)

  lam f = {(x, c, q) ∈ 2^* × P_fin(Q) × Q | ∀β ∈ I_x  q ∈ fβc} ∪ {∅}.   (3)

Then

  fun(lam f)αb = ⋃_{x≺α} ⋃_{c ∈ P_fin(b)} {q | (x, c, q) ∈ lam f}
  = ⋃_{x≺α} ⋃_{c ∈ P_fin(b)} {q | ∀β ∈ I_x  q ∈ fβc}
  = ⋃_{c ∈ P_fin(b)} ⋃_{x≺α} ⋂_{β ∈ I_x} fβc = fαb

by Lemma 10. Also, note that since ⊥ = ∅ in Val, fun ⊥ = λαb.⊥, but lam ⊥ = {∅} ≠ ⊥. This is important for call-by-value, as we must distinguish Ω from λx.Ω for our adequacy result of §VIII.

B. The Semantic Function

For partial functions f : D ⇀ E, define dom f = {x ∈ D | f(x) is defined}. Equivalently, for functions f : D → E_⊥, define dom f = {x ∈ D | f(x) ≠ ⊥}. Let FV(M) denote the free variables of M. The type of our deterministic semantic function is

  ((−)) : Exp → Env′ → 2^ω → Val,

where Env′ = Var → Val is the set of (deterministic) environments.

Definition 11.
(i) ((x))eα = e(x)
(ii) ((MN))eα = fun(((M))e(π_0^3(α)))(π_1^3(α))(((N))e(π_2^3(α)))
(iii) ((λx.M))eα = lam(λβv.((M))e[v/x]β)
(iv) ((M ⊕ N))eα = hd α ? ((M))e(tl α) : ((N))e(tl α)

where clause (iv) uses the ternary conditional (1) and fun and lam are defined in (2) and (3), respectively. In ((M))e, we assume that FV(M) ⊆ dom e. In (iii), we interpret the metaexpression λβv.((M))e[v/x]β as strict; thus (λβv.((M))e[v/x]β) α ⊥ = ⊥.

Note that this is completely deterministic. Probabilistic choices are resolved by consulting a preselected stack of random bits α.
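The coin discipline of Definition 11 — three-way splitting at applications, supplying the random source at the call site, and consuming exactly the head coin at ⊕ — can be mirrored in a toy evaluator. This is only an illustrative sketch: Python closures stand in for Val, finite bit lists stand in for 2^ω, and a hypothetical "lit" constructor is added for observability; it is not the Engeler-model semantics above.

```python
def pi3(i, alpha):
    return alpha[i::3]

def ev(M, e, a):
    tag = M[0]
    if tag == "lit":                 # observable constant (not in the paper)
        return M[1]
    if tag == "var":                 # ((x))e alpha = e(x)
        return e[M[1]]
    if tag == "lam":                 # a function receives its coins at call time
        x, body = M[1], M[2]
        return lambda b, v: ev(body, {**e, x: v}, b)
    if tag == "app":                 # coins split three ways, as in clause (ii)
        f = ev(M[1], e, pi3(0, a))
        v = ev(M[2], e, pi3(2, a))
        return f(pi3(1, a), v)       # random source supplied at the call site
    if tag == "choice":              # M (+) N consumes exactly the head coin
        return ev(M[1] if a[0] == 1 else M[2], e, a[1:])

# (lambda x. x (+) 0) 1 : the choice is decided by the head of pi3(1, a),
# i.e. by coin a[1] of the outer sequence.
prog = ("app", ("lam", "x", ("choice", ("var", "x"), ("lit", 0))), ("lit", 1))
assert ev(prog, {}, [0, 1] + [0] * 10) == 1
assert ev(prog, {}, [0] * 12) == 0
```

Because the coins for a function body arrive at the call site rather than being captured at the λ's definition site, two different applications of the same function consume disjoint coins — exactly the linearity that static scoping breaks.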

the evaluation of M, the evaluation of N, and the application of the value of M to the value of N.

This definition is well founded, but the resulting metaexpression is a λ-term that must be evaluated in the metasystem, and that evaluation may not terminate. We define the value to be ⊥ when that happens. For example, consider ((Ω))eα, where Ω = (λx.xx)(λx.xx). Define

  u ≜ ((λx.xx))eα
    = lam(λβv.((xx))e[v/x]β)
    = lam(λβv.fun(((x))e[v/x]β₀) β₁ (((x))e[v/x]β₂))
    = lam(λβv.fun v β₁ v) ≠ ⊥.

Note that this value is independent of α, due to the fact that coins are dynamically scoped. Then

  ((Ω))eα = (((λx.xx)(λx.xx)))eα
    = fun(((λx.xx))eα₀) α₁ (((λx.xx))eα₂)
    = fun u α₁ u
    = fun(lam(λβv.fun v β₁ v)) α₁ u
    = (λβv.fun v β₁ v) α₁ u
    = fun u α₁₁ u
    = ···

and the metaevaluation does not terminate, so ((Ω))eα = ⊥.

Lemma 12. For e₁, e₂ : Var → Val, if e₁ ⊑ e₂ and FV(M) ⊆ dom e₁, then ((M))e₁ ⊑ ((M))e₂.

Proof. Note that if e₁ ⊑ e₂, then dom e₁ ⊆ dom e₂ and e₁(x) ⊑ e₂(x) for all x ∈ dom e₁. The proof is a straightforward induction on the structure of M. Suppose e₁ ⊑ e₂. For variables, with x ∈ dom e₁,

  ((x))e₁α = e₁(x) ⊑ e₂(x) = ((x))e₂α.

For applications,

  ((MN))e₁α = fun(((M))e₁(π₀³(α))) (π₁³(α)) (((N))e₁(π₂³(α)))
           ⊑ fun(((M))e₂(π₀³(α))) (π₁³(α)) (((N))e₂(π₂³(α)))
           = ((MN))e₂α.

For abstractions,

  ((λx.M))e₁α = lam(λβv.((M))e₁[v/x]β)
             ⊑ lam(λβv.((M))e₂[v/x]β)
             = ((λx.M))e₂α.

For choice,

  ((M ⊕ N))e₁α = hd α ? ((M))e₁(tl α) : ((N))e₁(tl α)
              ⊑ hd α ? ((M))e₂(tl α) : ((N))e₂(tl α)
              = ((M ⊕ N))e₂α.  □

To extend ((−)) to capsules, we combine a semantic environment Var → Val as used in Definition 11 and a capsule environment Var → Λ_⊥ in a single mixed environment σ : Var → Val + Λ_⊥.⁴ From this we can obtain a new semantic environment σ* : Var → Val as follows. Consider the map

  P_σ : (Var → Val) → (Var → Val)

  P_σ(ℓ)(x) = { σ(x)         if σ(x) ∈ Val
              { ((σ(x)))ℓα   if σ(x) ∈ Λ
              { ⊥            if σ(x) = ⊥

where in the second case we use Definition 11(iii). The α there can be any element of 2^ω, as ((λx.M))ℓ : 2^ω → Val is a constant function of α. The third case is already included in the first, so henceforth we omit explicit mention of it. Note that dom P_σ(ℓ) = dom σ.

Lemma 13. P_σ(ℓ) is monotone in both σ and ℓ.

Proof. Since Λ_⊥ is a flat domain, if σ₁ ⊑ σ₂ and σ₁(x) ∈ Λ, then σ₁(x) = σ₂(x). It follows that for all ℓ : Var → Val and x ∈ Var,

  P_{σ₁}(ℓ)(x) = { σ₁(x)         if σ₁(x) ∈ Val
               { ((σ₁(x)))ℓα    if σ₁(x) ∈ Λ
             ⊑ { σ₂(x)          if σ₂(x) ∈ Val          (4)
               { ((σ₁(x)))ℓα    if σ₂(x) ∈ Λ
             = { σ₂(x)          if σ₂(x) ∈ Val
               { ((σ₂(x)))ℓα    if σ₂(x) ∈ Λ
             = P_{σ₂}(ℓ)(x).

If ℓ₁ ⊑ ℓ₂, then by Lemma 12, ((σ(x)))ℓ₁α ⊑ ((σ(x)))ℓ₂α for σ(x) ∈ Λ, therefore P_σ(ℓ₁)(x) ⊑ P_σ(ℓ₂)(x).  □

By the Knaster–Tarski theorem, P_σ has a least fixpoint

  σ*(x) = { σ(x)          if σ(x) ∈ Val          (5)
          { ((σ(x)))σ*α   if σ(x) ∈ Λ

and we define ((M))σα = ((M))σ*α, where the right-hand side is by Definition 11. With this formalism, we have

  ((rec f.λx.M))σ = ((f))σ[λx.M/f].

Lemma 14. If σ₁, σ₂ : Var → Val + Λ_⊥ are mixed environments with σ₁ ⊑ σ₂, then σ₁* ⊑ σ₂*. In addition, if σ₁(x) = σ₂(x) for all x ∈ dom σ₁, then σ₁*(x) = σ₂*(x) for all x ∈ dom σ₁ = dom σ₁*.

Proof. From (5) and the fact that ((λx.M))eα ≠ ⊥, we have that dom σᵢ = dom σᵢ*. By Lemma 13, we have P_{σ₁}(σ₂*) ⊑ P_{σ₂}(σ₂*) = σ₂*, so σ₂* is a prefixpoint of P_{σ₁}. Since σ₁* is the least prefixpoint, σ₁* ⊑ σ₂*.

In addition, if σ₁ and σ₂ agree on dom σ₁, then for x ∈ dom σ₁, equality holds in (4), thus P_{σ₁}(ℓ)(x) = P_{σ₂}(ℓ)(x). As this is true for all ℓ, we have

  σ₁*(x) = sup_α P_{σ₁}^α(⊥)(x) = sup_α P_{σ₂}^α(⊥)(x) = σ₂*(x).  □

⁴ In the coproduct, the ⊥'s of the two domains are coalesced.
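When everything is finite, the least fixpoint of P_σ promised by Knaster–Tarski can be reached by plain Kleene iteration from the everywhere-⊥ environment. The following miniature is illustrative only: Val is modeled as a flat domain with None as ⊥, and a "term" bound in the mixed environment is modeled as a bare reference to another variable, evaluated by looking it up in the current semantic environment.

```python
# Kleene iteration to the least fixpoint of P_sigma in a tiny model.
# A binding is either ('val', v), a value, or ('ref', y), a "term"
# whose meaning is looked up in the current semantic environment.

BOT = None                             # bottom of the flat value domain

def make_P(sigma):
    def P(ell):
        out = {}
        for x, (kind, payload) in sigma.items():
            if kind == 'val':          # sigma(x) already a value
                out[x] = payload
            else:                      # sigma(x) is a term: evaluate in ell
                out[x] = ell.get(payload, BOT)
        return out
    return P

def lfp(P, vars_):
    ell = {x: BOT for x in vars_}      # start from the bottom environment
    while True:
        nxt = P(ell)
        if nxt == ell:                 # reached the least fixpoint
            return ell
        ell = nxt

sigma = {'a': ('val', 3), 'b': ('ref', 'a'), 'c': ('ref', 'b'),
         'd': ('ref', 'd')}            # a genuine cycle
star = lfp(make_P(sigma), sigma)
```

The self-referential binding d stays at ⊥, reflecting the fact that σ* is the least fixpoint rather than an arbitrary one.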

Lemma 15. Let σ : Var → Val + Λ_⊥ be a mixed environment. If v ∈ Λ, y ∉ FV(v), and y ∉ dom σ, then σ[v/y]* = σ*[((v))σ*α/y].

Proof.

  σ[v/y]*(y) = P_{σ[v/y]}(σ[v/y]*)(y)
    = { σ[v/y](y)               if σ[v/y](y) ∈ Val
      { ((σ[v/y](y)))σ[v/y]*α   if σ[v/y](y) ∈ Λ
    = ((v))σ[v/y]*α                                  (6)
    = ((v))σ*α                                       (7)
    = σ*[((v))σ*α/y](y),

where the inference (6) is because σ[v/y](y) = v and v ∈ Λ, and the inference (7) is by Lemma 14. For x ≠ y, x ∈ dom σ,

  σ[v/y]*(x) = P_{σ[v/y]}(σ[v/y]*)(x)
    = { σ[v/y](x)               if σ[v/y](x) ∈ Val
      { ((σ[v/y](x)))σ[v/y]*α   if σ[v/y](x) ∈ Λ
    = { σ(x)                    if σ(x) ∈ Val          (8)
      { ((σ(x)))σ[v/y]*α        if σ(x) ∈ Λ
    = { σ(x)                    if σ(x) ∈ Val          (9)
      { ((σ(x)))σ*α             if σ(x) ∈ Λ
    = P_σ(σ*)(x)
    = σ*(x)
    = σ*[((v))σ*α/y](x),

where the inference (8) is because σ[v/y](x) = σ(x), and the inference (9) is by Lemma 14.  □

VI. RELATING THE STOCHASTIC AND DETERMINISTIC SEMANTICS

In Definition 9, we have reworked the semantic function of [1] to be of type

  ⟦−⟧ : Exp → Env → Toss → RV,

and in Definition 11 we have given a deterministic semantics of type

  ((−)) : Exp → Env′ → 2^ω → Val,

where
• RV are the random variables Ω → Val from a sample space Ω taking values in a reflexive CPO Val,
• Exp are the stochastic λ-terms M,
• Env are the (stochastic) environments E : Var → RV,
• Env′ are the (deterministic) environments e : Var → Val,
• Toss are the tossing processes T : Ω → 2^ω.

In this section we establish the formal relationship between these two semantics (Theorem 16). The idea is that in the stochastic semantics, although all data are parameterized by a sample point ω ∈ Ω, it is actually the same ω throughout a single run of the program. All independence requirements are satisfied by the way randomness is allocated to the different tasks. For example, in the clause for ⟦MN⟧, the coin sequence is broken into three disjoint sequences to use in three distinct tasks (evaluation of M, evaluation of N, and application of M to N). This is equivalent to three independent tossing processes. In the clause for ⊕, we resolve the probabilistic choice using the head coin, but then throw it away and continue with the tail of the coin sequence, so the head coin is not reused. Because of these considerations, linearity is maintained.

A. Fun and Lam

We first show how to define Fun and Lam with the desired properties from fun and lam. Recall that

  fun : Val → [2^ω → Val → Val]   and   lam : [2^ω → Val → Val] → Val,

and we need

  Fun : (Ω → Val) → [(Ω → 2^ω) → (Ω → Val) → (Ω → Val)]
  Lam : [(Ω → 2^ω) → (Ω → Val) → (Ω → Val)] → (Ω → Val).

We define Fun and Lam in two steps:

  Fun = Fun₂ ∘ Fun₁        Lam = Lam₁ ∘ Lam₂,

where

  Fun₁ : (Ω → Val) → [Ω → (2^ω → Val → Val)]
  Lam₁ : [Ω → (2^ω → Val → Val)] → (Ω → Val)
  Fun₂ : [Ω → (2^ω → Val → Val)] → [(Ω → 2^ω) → (Ω → Val) → (Ω → Val)]
  Lam₂ : [(Ω → 2^ω) → (Ω → Val) → (Ω → Val)] → [Ω → (2^ω → Val → Val)].

Fun₁ and Lam₁ are just the covariant hom-functor in Set applied to fun and lam, respectively:

  Fun₁ = Set(Ω, fun) = fun ∘ −
  Lam₁ = Set(Ω, lam) = lam ∘ −

Then

  (Fun₁ ∘ Lam₁)f = ((fun ∘ −) ∘ (lam ∘ −))f = (fun ∘ −)(lam ∘ f) = fun ∘ lam ∘ f = f,

so Fun₁ ∘ Lam₁ = id. Also,

  Lam₁(λω.f) = (lam ∘ −)(λω.f) = lam ∘ λω.f = λω.(lam ∘ λω.f)ω
             = λω.lam((λω.f)ω) = λω.lam(f).                        (10)

We define

  Fun₂ = λfg.S(Sfg),

where S = λghω.gω(hω) is the familiar S-combinator from combinatory logic. Then

  Fun₂ fghω = S(Sfg)hω = Sfgω(hω) = (fω)(gω)(hω).                  (11)
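Equation (11) can be sanity-checked directly: with S = λghω.gω(hω), the definition Fun₂ = λfg.S(Sfg) satisfies Fun₂ f g h ω = (fω)(gω)(hω). A quick check, with curried functions over integers standing in for the actual domains:

```python
# The S-combinator and Fun2 = \f g. S (S f g), checking equation (11):
# Fun2 f g h w == (f w)(g w)(h w) for curried f, g, h.

S = lambda g: lambda h: lambda w: g(w)(h(w))
Fun2 = lambda f: lambda g: S(S(f)(g))

# arbitrary curried test data over integers
f = lambda w: lambda y: lambda z: w + 10 * y + 100 * z
g = lambda w: w + 1
h = lambda w: w * 2

for w in range(5):
    assert Fun2(f)(g)(h)(w) == f(w)(g(w))(h(w))
```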

The function Fun₂ is injective:

  Fun₂f₁ = Fun₂f₂
    ⇒ λg.S(Sf₁g) = λg.S(Sf₂g)
    ⇒ ∀g ∀h ∀ω  (f₁ω)(gω)(hω) = (f₂ω)(gω)(hω)
    ⇒ ∀y ∀z ∀ω  f₁ωyz = f₂ωyz
    ⇒ f₁ = f₂.

We define Lam₂ to be the inverse of Fun₂ on the image of Fun₂. Thus

  Lam₂(λghω.(fω)(gω)(hω)) = Lam₂(Fun₂f) = f.                       (12)

Also,

  Fun₂(Lam₂(λghω.(fω)(gω)(hω))) = Fun₂f = λghω.(fω)(gω)(hω),

so Fun₂ ∘ Lam₂ = id on its domain. Then

  Fun ∘ Lam = Fun₂ ∘ Fun₁ ∘ Lam₁ ∘ Lam₂ = Fun₂ ∘ Lam₂ = id.

Moreover, using (10), (11), and (12),

  Fun fghω = Fun₂(Fun₁f)ghω = Fun₂(fun ∘ f)ghω
           = (fun ∘ f)ω(gω)(hω) = fun(fω)(gω)(hω)                  (13)

  Lam(λghω.f(gω)(hω)) = Lam₁(Lam₂(λghω.(λω.f)ω(gω)(hω)))
                      = Lam₁(λω.f) = λω.lam(f).                    (14)

Note that the domain Ω → Val is not reflexive with respect to [(Ω → 2^ω) → (Ω → Val) → (Ω → Val)] under Fun and Lam, but only with respect to [Ω → (2^ω → Val → Val)] (and its image in [(Ω → 2^ω) → (Ω → Val) → (Ω → Val)] under Fun₂). However, this is all we need for Theorem 16.

B. Relating the Deterministic and Stochastic Semantics

The following theorem gives the formal relationship between the stochastic and deterministic denotational semantics.

Theorem 16. λω.((M))e(Tω) = ⟦M⟧(λxω.ex)T.

Proof. The proof is by case analysis. A complete proof can be found in the Appendix of the complete version of this paper.

VII. OPERATIONAL SEMANTICS

In this section we give big- and small-step operational rules in the style of [8] and prove their equivalence. The two styles use their coins in different patterns, and the relationship must be formally specified. This is done using the tree processes of §III-B.

A. Big-step rules

The notation ⟨M, e⟩ ⇓_α ⟨v, f⟩ means that ⟨M, e⟩ reduces to ⟨v, f⟩ under the big-step rules below with coins α ∈ 2^ω.

  ⟨x, e⟩ ⇓_α ⟨e(x), e⟩

  ⟨M, e⟩ ⇓_α ⟨v, f⟩                      ⟨N, e⟩ ⇓_α ⟨v, f⟩
  ──────────────────────                 ──────────────────────
  ⟨M ⊕ N, e⟩ ⇓_{0α} ⟨v, f⟩              ⟨M ⊕ N, e⟩ ⇓_{1α} ⟨v, f⟩

  ⟨M, e⟩ ⇓_{π₀³(α)} ⟨λx.K, e₀⟩
  ⟨N, e₀⟩ ⇓_{π₁³(α)} ⟨u, e₁⟩
  ⟨K[y/x], e₁[u/y]⟩ ⇓_{π₂³(α)} ⟨v, f⟩
  ─────────────────────────────────────
  ⟨MN, e⟩ ⇓_α ⟨v, f⟩

where in the third premise of the last rule, y is a fresh variable.

B. Small-step rules

The notation ⟨M, e⟩ →_x ⟨N, f⟩ means that ⟨M, e⟩ reduces to ⟨N, f⟩ under the small-step rules below via a computation that consumes exactly the coins x ∈ 2* in order from left to right. The notation ⟨M, e⟩ →_α ⟨N, f⟩ means that ⟨M, e⟩ →_x ⟨N, f⟩ for some x ≺ α, where α ∈ 2^ω.

  ⟨M, e⟩ →_ε ⟨M, e⟩        ⟨x, e⟩ →_ε ⟨e(x), e⟩

  ⟨M, e⟩ →_x ⟨M′, f⟩                     ⟨N, e⟩ →_x ⟨N′, f⟩
  ──────────────────────                 ──────────────────────
  ⟨MN, e⟩ →_x ⟨M′N, f⟩                  ⟨vN, e⟩ →_x ⟨vN′, f⟩

  ⟨(λx.M)v, e⟩ →_ε ⟨M[y/x], e[v/y]⟩   (y fresh)

  ⟨M ⊕ N, e⟩ →_0 ⟨M, e⟩        ⟨M ⊕ N, e⟩ →_1 ⟨N, e⟩

  ⟨M, e⟩ →_x ⟨N, f⟩   ⟨N, f⟩ →_y ⟨K, g⟩        ⟨M, e⟩ →_x ⟨N, f⟩   x ≺ α
  ─────────────────────────────────────        ─────────────────────────
  ⟨M, e⟩ →_{xy} ⟨K, g⟩                         ⟨M, e⟩ →_α ⟨N, f⟩

C. Relation of Big- and Small-Step Semantics

The big- and small-step operational semantics use their coins in different patterns, and we need a way to characterize how they relate. The big-step rule for application breaks its coin sequence up into three independent coin sequences: to evaluate the function, to evaluate the argument, and to apply the function to the argument, respectively; whereas the small-step rules just use their coins sequentially. The relationship is characterized by a tree process as described in §III. The construction is given in the proof of the following theorem.

Theorem 17. For all ⟨M, e⟩ there exists a tree process T_{⟨M,e⟩} such that for all α, v, f,

  ⟨M, e⟩ ⇓_α ⟨v, f⟩  ⇔  ⟨M, e⟩ →_{T_{⟨M,e⟩}(α)} ⟨v, f⟩.

Proof. The rules for the big-step semantics define proof trees by which one concludes that an instance of the big-step relation holds. We proceed by induction on the structure of these proof trees. The base case corresponds to reading a variable from the environment: ⟨x, e⟩ ⇓_α ⟨e(x), e⟩. This case is immediate, since we can just take the tree process to be the

one that defines the identity function; note that the environment stores only values, so there is no further reduction.

For the case

  ⟨M, e⟩ ⇓_α ⟨v, f⟩
  ─────────────────────────
  ⟨M ⊕ N, e⟩ ⇓_{0α} ⟨v, f⟩

we have, by induction, a tree process T_{⟨M,e⟩}, call it T′ for short, such that

  ⟨M, e⟩ →_{T′(α)} ⟨v, f⟩,

and analogously for the other branch of the choice. We can define the tree t_{⟨M⊕N,e⟩} by

  t_{⟨M⊕N,e⟩}(α) = { t_{⟨M,e⟩}(α′)   if α = 0α′
                   { t_{⟨N,e⟩}(α′)   if α = 1α′

It is a routine calculation to verify the result in this case.

For the case ⟨MN, e⟩, take the tree

  t_{⟨MN,e⟩}(w) =
    { 3 · t_{⟨K[y/x],e₁[u/y]⟩}(z) + 2,  if w = xyz ∧ ⟨M, e⟩ →_x ⟨λx.K, e₀⟩
    {                                    and ⟨N, e₀⟩ →_y ⟨u, e₁⟩,
    { 3 · t_{⟨N,e₀⟩}(y) + 1,            if w = xy ∧ ⟨M, e⟩ →_x ⟨λx.K, e₀⟩
    {                                    and ⟨N, e₀⟩ →_y NV,
    { 3 · t_{⟨M,e⟩}(w),                 if ⟨M, e⟩ →_w NV,

where NV means "some capsule that is not reduced," and let T_{⟨MN,e⟩} be the associated tossing process. Then ⟨MN, e⟩ ⇓_α ⟨v, f⟩ occurs iff there exist K, u, e₀, e₁, and y fresh such that

  ⟨M, e⟩ ⇓_{π₀³(α)} ⟨λx.K, e₀⟩        ⟨N, e₀⟩ ⇓_{π₁³(α)} ⟨u, e₁⟩
  ⟨K[y/x], e₁[u/y]⟩ ⇓_{π₂³(α)} ⟨v, f⟩.

By the induction hypothesis, this occurs iff there exist x, y, z such that

  ⟨M, e⟩ →_x ⟨λx.K, e₀⟩               x ≺ T_{⟨M,e⟩}(π₀³(α))
  ⟨N, e₀⟩ →_y ⟨u, e₁⟩                 y ≺ T_{⟨N,e₀⟩}(π₁³(α))
  ⟨K[y/x], e₁[u/y]⟩ →_z ⟨v, f⟩        z ≺ T_{⟨K[y/x],e₁[u/y]⟩}(π₂³(α)).

By construction of t_{⟨MN,e⟩}(α), xyz ≺ T_{⟨MN,e⟩}(α), so this occurs iff

  ⟨MN, e⟩ →_x ⟨(λx.K)N, e₀⟩ →_y ⟨(λx.K)u, e₁⟩
          →_ε ⟨K[y/x], e₁[u/y]⟩ →_z ⟨v, f⟩,

which occurs iff ⟨MN, e⟩ →_{T_{⟨MN,e⟩}(α)} ⟨v, f⟩.  □

VIII. SOUNDNESS AND ADEQUACY

The following theorem asserts the soundness and adequacy of our denotational semantics with respect to our big-step operational semantics.

Theorem 18.
(i)  If ⟨M, σ⟩ ⇓_α ⟨λx.N, τ⟩, then for any γ,
     ((M))σ*α = ((λx.N))τ*γ = lam(λβv.((N))τ*[v/x]β).
(ii) If ((M))σ*α = lam f for some f : [2^ω → Val → Val], then ⟨M, σ⟩ ⇓_α.

Proof. (i) The proof is by induction on the derivation of ⟨M, σ⟩ ⇓_α ⟨v, τ⟩. Let us do the easy cases first. For variables, we have ⟨x, σ⟩ ⇓_α ⟨σ(x), σ⟩ and

  ((x))σ*α = σ*(x) = P_σ(σ*)(x) = ((σ(x)))σ*β.

For abstractions, we have ⟨λx.M, σ⟩ ⇓_α ⟨λx.M, σ⟩ and

  ((λx.M))σ*α = ((λx.M))σ*β

for any β, since the semantics of abstractions does not depend on α.

For choice, suppose ⟨M ⊕ N, σ⟩ ⇓_α ⟨v, τ⟩. If hd α = 0, then ⟨M, σ⟩ ⇓_{tl α} ⟨v, τ⟩. By the induction hypothesis, ((M))σ*(tl α) = ((v))τ*β, so ((M ⊕ N))σ*(0 · tl α) = ((v))τ*β. By a similar argument, if hd α = 1, then ((M ⊕ N))σ*(1 · tl α) = ((v))τ*β. Thus in either case, ((M ⊕ N))σ*α = ((v))τ*β.

The most involved case is application. Suppose ⟨MN, σ⟩ ⇓_α ⟨v, τ⟩. Then for some λx.K, σ₀, u, and σ₁ such that σ ⊑ σ₀ ⊑ σ₁ ⊑ τ,

  ⟨M, σ⟩ ⇓_{π₀³(α)} ⟨λx.K, σ₀⟩        ⟨N, σ₀⟩ ⇓_{π₂³(α)} ⟨u, σ₁⟩
  ⟨K[y/x], σ₁[u/y]⟩ ⇓_{π₁³(α)} ⟨v, τ⟩,

where y ∈ Var is fresh. By the induction hypothesis,

  ((M))σ*(π₀³(α)) = ((λx.K))σ₀*β                                   (15)
  ((N))σ₀*(π₂³(α)) = ((u))σ₁*β                                     (16)
  ((K[y/x]))σ₁[u/y]*(π₁³(α)) = ((v))τ*β.                           (17)

Then

  fun(((M))σ*(π₀³(α)))
    = fun(((λx.K))σ₀*γ)                            by (15)
    = fun(((λy.K[y/x]))σ₀*γ),   y fresh, by α-conversion
    = fun(lam(λβv.((K[y/x]))σ₀*[v/y]β))
    = λβv.((K[y/x]))σ₀*[v/y]β.                                     (18)

By (16) and Lemma 14, since σ₀ is an extension of σ,

  ((N))σ*(π₂³(α)) = ((N))σ₀*(π₂³(α)) = ((u))σ₁*β.                  (19)

By Lemma 15, since y is fresh,

  σ₁[u/y]* = σ₁*[((u))σ₁*γ/y].                                     (20)

Using (17)–(20) and Lemma 14,

  ((MN))σ*α
    = fun(((M))σ*(π₀³(α))) (π₁³(α)) (((N))σ*(π₂³(α)))
    = (λβv.((K[y/x]))σ₀*[v/y]β) (π₁³(α)) (((u))σ₁*γ)
    = ((K[y/x]))σ₀*[((u))σ₁*γ/y](π₁³(α))
    = ((K[y/x]))σ₁*[((u))σ₁*γ/y](π₁³(α))
    = ((K[y/x]))σ₁[u/y]*(π₁³(α))
    = ((v))τ*β.
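The bookkeeping behind the 3·z + j re-indexing in the tree-process construction of Theorem 17 is the bijection between one coin stream and three disjoint substreams: position i of substream j sits at position 3i + j of the merged stream. A sketch of that round trip (the indexing convention here is an assumption of the sketch; the paper's π_i³ may be defined differently):

```python
# Round trip between one coin stream and its three substreams:
# split3(alpha)[j](i) == alpha(3*i + j), and merging the three
# substreams recovers alpha. This is the bookkeeping behind the
# 3*z + j re-indexing in the tree-process construction.

def split3(alpha):
    # j=j freezes the loop variable in each closure
    return [lambda i, j=j: alpha(3 * i + j) for j in range(3)]

def merge3(streams):
    return lambda n: streams[n % 3](n // 3)

alpha = lambda n: (n * n + 1) % 2        # an arbitrary bit stream
back = merge3(split3(alpha))
assert all(back(n) == alpha(n) for n in range(60))
```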

(ii) For variables, we have ((x))σ*α = σ*(x) = lam f for some f : [2^ω → Val → Val]. Since dom σ = dom σ*, we must have σ(x) = λx.K for some λx.K ∈ Λ. By definition of σ*,

  σ*(x) = ((λx.K))σ*α = lam(λβv.((K))σ*[v/x]β).

As lam is injective, f = λβv.((K))σ*[v/x]β, and ⟨x, σ⟩ ⇓_α ⟨λx.K, σ⟩.

For λ-abstractions, ((λx.M))σ*α = lam(λβv.((M))σ*[v/x]β) and ⟨λx.M, σ⟩ ⇓_α ⟨λx.M, σ⟩.

For choice, we have ((M ⊕ N))σ*α = hd α ? ((M))σ*(tl α) : ((N))σ*(tl α) = lam f. Either hd α = 1, in which case ((M))σ*(tl α) = lam f and ⟨M, σ⟩ ⇓_{tl α} by the induction hypothesis, or hd α = 0, in which case ((N))σ*(tl α) = lam f and ⟨N, σ⟩ ⇓_{tl α} by the induction hypothesis. In either case, ⟨M ⊕ N, σ⟩ ⇓_α by the big-step rule for choice.

Finally, for applications, suppose ((MN))σ*α = lam f. We have

  ((MN))σ*α = fun(((M))σ*α₀) α₁ (((N))σ*α₂).

If ((M))σ*α₀ = ⊥, then

  ((MN))σ*α = fun(((M))σ*α₀) α₁ (((N))σ*α₂)
            = fun ⊥ α₁ (((N))σ*α₂)
            = (λβv.⊥) α₁ (((N))σ*α₂)
            = ⊥,

contradicting our assumption. Similarly, if ((M))σ*α₀ = lam g but ((N))σ*α₂ = ⊥, then

  ((MN))σ*α = fun(((M))σ*α₀) α₁ (((N))σ*α₂)
            = fun(lam g) α₁ ⊥
            = g α₁ ⊥
            = ⊥,

again contradicting our assumption. So we can assume that ((M))σ*α₀ = lam g and ((N))σ*α₂ ≠ ⊥. By the induction hypothesis and Lemma 15,

  ⟨M, σ⟩ ⇓_{α₀} ⟨λx.K, σ₀⟩        ⟨N, σ₀⟩ ⇓_{α₂} ⟨u, σ₁⟩
  g = λβv.((K))σ₀*[v/x]β          ((N))σ₀*α₂ = ((u))σ₁*(−).

Then

  ((MN))σ*α = fun(((M))σ*α₀) α₁ (((N))σ₀*α₂)
            = fun(lam g) α₁ (((u))σ₁*(−))
            = g α₁ (((u))σ₁*(−))
            = (λβv.((K))σ₀*[v/x]β) α₁ (((u))σ₁*(−))
            = ((K))σ₀*[((u))σ₁*(−)/x]α₁
            = ((K[y/x]))σ₀*[((u))σ₁*(−)/y]α₁
            = ((K[y/x]))σ₁*[((u))σ₁*(−)/y]α₁
            = ((K[y/x]))σ₁[u/y]*α₁,

and by the induction hypothesis, ⟨K[y/x], σ₁[u/y]⟩ ⇓_{α₁}. By the big-step rule for application, ⟨MN, σ⟩ ⇓_α.  □

Corollary 19. For every capsule ⟨M, σ⟩,

  {α ∈ 2^ω | ⟨M, σ⟩ ⇑_α} = {α ∈ 2^ω | ((M))σα = ⊥}.

IX. RELATED WORK AND CONCLUDING REMARKS

In probability theory, stochastic processes are modeled as random variables (measurable functions) defined on a probability space, which is viewed as the source of randomness. It is natural to think of probabilistic programming in a similar vein, and in many of the related approaches one sees a programming formalism augmented by a source of randomness.

The idea of modeling probabilistic programs with a stream of random data in the λ-calculus has been used in [5] as well. In that work, the authors define big-step and small-step operational semantics for an idealized version of the probabilistic language Church. Their operational semantics, like ours, is a binary relation parameterized by a source of randomness. Although their language can handle continuous distributions and soft conditioning, they have not given a denotational semantics. Our approach could accommodate continuous distributions simply by changing the source of randomness to R. We might interpret R either as the usual real numbers or as its constructive version as in Real PCF [13].

To accommodate soft conditioning, we could adopt the solution proposed in [14] of adding a write-only state cell to store the weight of the execution trace, which can be done by slightly changing our domain equation. These extensions would complicate our semantics and its presentation, so we leave them for future work.

In a similar vein, the category of Quasi-Borel spaces (Qbs) defined in [15] also assumes that probability comes from an ambient source of randomness, which they model as a set of random variables of type R satisfying certain properties. Furthermore, they show that in Qbs there is a Giry-like monad that uses the set of random variables in its definition. In [16], to accommodate arbitrary recursion, they equip every Quasi-Borel space with a complete partial order and require the set of random variables to be closed with respect to directed suprema. It would be interesting to better understand how our requirement of continuity of coin usage relates to their construction.

There have also been alternative operational semantics for languages similar to ours. In [17], an operational semantics is defined in terms of Markov kernels over the values. Since the focus of that work is on syntactic methods to reason about contextual equivalence, a denotational semantics is not defined. However, by our adequacy and soundness results, we can also use our semantics to reason about contextual equivalence. Furthermore, since the set {α ∈ 2^ω | ⟨M, σ⟩ ⇓_α} is measurable, one can prove by induction on reduction sequences that the semantics of [17] and ours are equivalent.

An alternative domain-theoretical tool that has been used to interpret randomness is the probabilistic powerdomain construction. Recently the Jung–Tix problem [18] has been solved [19], showing that it is possible to define a commutative probabilistic monad in a cartesian closed category of continuous domains. We tackle the problem from a different perspective. We leave for future work to understand the connections between the probabilistic powerdomain and our func-

tor MX = 2^{≤ω} → X. It is worth noting that this functor is the Reader monad from functional programming. Unfortunately, if we were to use the same monad multiplication as the Reader monad (i.e., μ_X(t) = λα.t α α), we would reuse the same source of randomness twice, breaking linearity of usage of random data and probabilistic independence. Furthermore, for any other natural transformation M² ⇒ M that preserves such linearity conditions, the associativity monad law holds only up to a measure-preserving function. As an example, suppose that we chose the natural transformation μ_X(t) = λα : 2^ω. t α₀ α₁ as our monad multiplication. In this case the associativity law becomes:

  λt : M³(X). λα : 2^ω. t α₀₀ α₀₁ α₁  =  λt : M³(X). λα : 2^ω. t α₀ α₁₀ α₁₁.

Obviously the equation above does not hold.

As a final example of a related formalism, we mention probabilistic coherence spaces [20], [21], which use the decomposition of the usual function space into a linear function space and an exponential comonad. In [21], a fully abstract semantics is given for a probabilistic extension of PCF. They model higher-order probability by using a generalization of transition matrices. Cones of measures have also been used to construct a model of higher-order probabilistic computation [22]. It is a fascinating question to understand precisely the relationship between all these formalisms for higher-order probabilistic computation.

To conclude, while other approaches to denotational semantics for higher-order probabilistic computation have been taken, no such construction is the obviously "correct" one. To clarify this matter, the connections between the different approaches would need to be well understood. But this is a hard open problem that requires in-depth understanding of the various possible approaches and how they relate, and the field is not there yet. Even though the approaches mentioned above are interesting, we do not see any compelling argument suggesting that one of them is the only "right" semantics for probabilistic higher-order computation. In this paper we contributed to the area by focusing on the Boolean-valued semantics of [1] and modifying it to accommodate a call-by-value operational semantics, which we proved sound and adequate with respect to the modified denotational semantics, solving the main open problem from that work.

ACKNOWLEDGMENTS

Thanks to Giorgio Bacci, Fredrik Dahlqvist, Robert Furber, and Arthur Azevedo de Amorim. Special thanks to … for many inspiring conversations. Thanks to the Bellairs Research Institute of McGill University for providing a wonderful research environment. … not necessarily reflect the views of the National Science Foundation. Panangaden is funded by NSERC (Canada).

REFERENCES

[1] G. Bacci, R. Furber, D. Kozen, R. Mardare, P. Panangaden, and D. Scott, "Boolean-valued semantics for the stochastic λ-calculus," in Logic in Computer Science (LICS), 2018.
[2] D. S. Scott, "Stochastic λ-calculi," Journal of Applied Logic, vol. 12, no. 3, pp. 369–376, 2014.
[3] J. L. Bell, Set Theory: Boolean-Valued Models and Independence Proofs. Oxford University Press, 2011, vol. 47.
[4] D. Scott, "A proof of the independence of the continuum hypothesis," Mathematical Systems Theory, vol. 1, no. 2, pp. 89–111, 1967.
[5] J. Borgström, U. Dal Lago, A. D. Gordon, and M. Szymczak, "A lambda-calculus foundation for universal probabilistic programming," in International Conference on Functional Programming (ICFP), 2016.
[6] N. Goodman, V. Mansinghka, D. M. Roy, K. Bonawitz, and J. B. Tenenbaum, "Church: a language for generative models," arXiv preprint arXiv:1206.3255, 2012.
[7] J.-B. Jeannin and D. Kozen, "Computing with capsules," in International Workshop on Descriptional Complexity of Formal Systems. Springer, 2012, pp. 1–19.
[8] G. D. Plotkin, A Structural Approach to Operational Semantics. Aarhus University, 1981.
[9] W. Rudin, Real and Complex Analysis, 1974.
[10] H. P. Barendregt, The Lambda Calculus. North-Holland, Amsterdam, 1984, vol. 3.
[11] E. Engeler, "Algebras and combinators," Algebra Universalis, vol. 13, no. 1, pp. 389–392, 1981.
[12] G. Longo, "Set-theoretical models of λ-calculus: theories, expansions, isomorphisms," Annals of Pure and Applied Logic, vol. 24, no. 2, pp. 153–188, 1983.
[13] M. H. Escardó, "PCF extended with real numbers," Theoretical Computer Science, vol. 162, no. 1, pp. 79–115, 1996.
[14] S. Staton, F. Wood, H. Yang, C. Heunen, and O. Kammar, "Semantics for probabilistic programming: higher-order functions, continuous distributions, and soft constraints," in Logic in Computer Science (LICS), 2016.
[15] C. Heunen, O. Kammar, S. Staton, and H. Yang, "A convenient category for higher-order probability theory," in Logic in Computer Science (LICS), 2017.
[16] M. Vákár, O. Kammar, and S. Staton, "A domain theory for statistical probabilistic programming," in Principles of Programming Languages (POPL), 2019.
[17] V. Vignudelli, "Behavioral equivalences for higher-order languages with probabilities," Ph.D. dissertation, Università di Bologna, 2017.
[18] A. Jung and R. Tix, "The troublesome probabilistic powerdomain," Electronic Notes in Theoretical Computer Science, vol. 13, pp. 70–91, 1998.
[19] X. Jia, B. Lindenhovius, M. Mislove, and V. Zamdzhiev, "Commutative monads for probabilistic programming languages," in Logic in Computer Science (LICS), 2021.
[20] V. Danos and T. Ehrhard, "Probabilistic coherence spaces as a model of higher-order probabilistic computation," Information and Computation, vol. 209, no. 6, pp. 966–991, 2011.
[21] T. Ehrhard, C. Tasson, and M. Pagani, "Probabilistic coherence spaces are fully abstract for probabilistic PCF," in Principles of Programming Languages (POPL), 2014.
[22] T. Ehrhard, M. Pagani, and C. Tasson, "Measurable cones and stable, measurable functions: a model for probabilistic higher-order programming," in Principles of Programming Languages (POPL), 2017.

APPENDIX
Proof of Lemma 4. Given any real 0 < a < 1, we construct a dense open set E_a with Pr(E_a) = a. Write the binary expansion of a as Σ_{i=0}^∞ 2^{-n_i}, n₀ < n₁ < ···, using the form with trailing 1's when there are two representations. At stage i, let I_x be the next unoccupied interval in the list. Set E_a ≜ E_a ∪ I_{xy}, where n_i = |xy|. This is always possible, since the procedure maintains the invariant that the largest unoccupied interval is at least twice the size of the next occupying interval. This is true initially, since the first occupying interval is of size at most 1/2 and the first unoccupied interval is of size 1, and it is preserved in each step, since populating an interval I with an interval I′ at most half the size leaves an unoccupied subinterval of I at least the same size as I′, and the next occupying interval is at most half the size of I′. After each stage, remove all newly occupied intervals from the list and repeat. Every interval in the list eventually becomes occupied after all the intervals before it become occupied or are removed from the list.

This construction results in a dense open set E_a such that Pr(E_a) = a. The set E_a is the union of countably many pairwise disjoint intervals I_x for x ∈ H_a, where H_a is some countable binary prefix code.

Now let a_n be any sequence of reals 1/2 < a_n < 1 such that Π_{n=0}^∞ a_n = 1/2, e.g. a_n = (2^{n+1} + 1)/(2^{n+1} + 2), and let

  A = Π_{n=0}^∞ H_{a_n} = {x₀x₁x₂ ··· | x_n ∈ H_{a_n}, n ≥ 0}.

The set A is the intersection of a descending chain of dense open sets A_m = ⋃{I_x | x ∈ Π_{n=0}^m H_{a_n}}, m ≥ 0, with Pr(A_m) = Π_{n=0}^m a_n. Thus A is a dense G_δ set with Pr(A) = 1/2.

We also claim that for all intervals I, 0 < Pr(A ∩ I) < Pr I. Note that 1/2 = Π_{n=0}^∞ a_n < Π_{n=1}^∞ a_n < 1, thus if Pr(A₀ ∩ I) = ε, then Pr(A ∩ I) = ε Π_{n=1}^∞ a_n, so 0 < ε/2 ≤ Pr(A ∩ I) < ε ≤ Pr(I). Thus both A and Ā = 2^ω \ A are dense and of measure 1/2.

Since both A and Ā are dense, there is no interval contained in either one of these sets. Moreover, no tossing process T with T⁻¹({α | 0 ≺ α}) = A can be made continuous by deleting a nullset N from the domain, since then we must have I ∩ N̄ ⊆ A for some interval I, thus I ∩ Ā ⊆ N, contradicting the fact that Pr(Ā ∩ I) > 0 for all intervals I. Such a tossing process T exists; set T⁻¹({α | 0x ≺ α}) = A ∩ {α | β_a ≤_lex α < …}.

Intuitively, P_x is the set of ≺-minimal prefixes of input streams that produce x as a prefix of the output. This is a coding function, and Pr({α | x ≺ Tα}) = 2^{−|x|} by definition of tossing process. Moreover, if T is total, then x ↦ P_x is exhaustive by uniform continuity.

Conversely, every coding function satisfying (i)–(iii) gives rise to a continuous tossing process T by defining Tα to be the unique stream containing all prefixes x such that P_x ∩ {y | y ≺ α} ≠ ∅, or undefined if no such stream exists. This is a tossing process by Lemma 1.  □

Proof of Lemma 8. (i) By Lemma 7(i), the inclusion map 2^ω → 2^{≤ω} is continuous, thus its composition with f is.
(ii) For any basic open set x↑ of D,

  y ∈ (λz.⊓_{z≺α} g(α))⁻¹(x↑)
    ⇔ ⊓_{y≺α} g(α) ∈ x↑
    ⇔ x ⊑ ⊓_{y≺α} g(α)
    ⇔ ∀α (y ≺ α ⇒ x ⊑ g(α))
    ⇔ ∀α (α ∈ y↑ ⇒ α ∈ g⁻¹(x↑))
    ⇔ y↑ ⊆ g⁻¹(x↑),

and {y | y↑ ⊆ g⁻¹(x↑)} is Scott-open by Lemma 7(ii).  □

Proof of Theorem 16. For variables x,

  λω.((x))e(Tω) = λω.ex = (λyω.ey)x = ⟦x⟧(λyω.ey)T.

For choice,

  λω.((M ⊕ N))e(Tω)
    = λω. hd(Tω) ? ((M))e(tl(Tω)) : ((N))e(tl(Tω))                  (21)
    = λω. hd(Tω) ? ⟦M⟧(λxω.ex)((tl ∘ T)ω) : ⟦N⟧(λxω.ex)((tl ∘ T)ω)  (22)
    = ⟦M ⊕ N⟧(λxω.ex)T.                                             (23)

Step (21) is by definition of ((−)) (Definition 11). Step (22) is by the induction hypothesis for M and N and the fact that tl(Tω) = (tl ∘ T)ω. Step (23) is by definition of ⟦−⟧ (Definition 9).

For application,

  λω.((MN))e(Tω)
    = λω. fun(((M))e(π₀³(Tω))) (π₁³(Tω)) (((N))e(π₂³(Tω)))          (24)
    = λω. fun(⟦M⟧(λxω.ex)((π₀³ ∘ T)ω)) ((π₁³ ∘ T)ω) (⟦N⟧(λxω.ex)((π₂³ ∘ T)ω))
    = ···
    = ⟦MN⟧(λxω.ex)T,

using (13), the induction hypothesis for M and N, and the fact that π_i³(Tω) = (π_i³ ∘ T)ω.

Finally, for λ-abstraction, we first need a property of rebinding:

  (λxω.ex)[R/x] = λy.(y = x) ? R : (λxω.ex)y
               = λy.(y = x) ? λω.Rω : λω.ey
               = λyω.(y = x) ? Rω : ey
               = λyω.(e[Rω/x]y).                                    (29)

Then

  λω.((λx.M))e(Tω)
    = λω.lam(λβv.((M))(e[v/x])β)                                    (30)
    = Lam(λTRω.((M))(e[Rω/x])(Tω))                                  (31)
    = Lam(λTR.⟦M⟧(λyω.e[Rω/x]y)T)                                   (32)
    = Lam(λTR.⟦M⟧((λxω.ex)[R/x])T)                                  (33)
    = ⟦λx.M⟧(λxω.ex)T.                                              (34)

Step (30) is by definition of ((−)). Step (31) is by (14). Step (32) is by the induction hypothesis for M. Step (33) is by (29). Step (34) is by definition of ⟦−⟧.  □
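The proof of Lemma 4 starts from the binary expansion a = Σᵢ 2^{-n_i} and realizes each summand as an interval of measure 2^{-n_i}. The arithmetic backbone, extracting the exponents n₀ < n₁ < ⋯ and checking that the corresponding measures sum back to a, can be sketched as follows. Two hedges: this truncates at a fixed depth, and it uses the terminating expansion for dyadic rationals, whereas the proof adopts the form with trailing 1's.

```python
from fractions import Fraction

def binary_exponents(a, depth):
    """Exponents n_0 < n_1 < ... with a = sum of 2^(-n_i), truncated at depth."""
    exps, rest = [], Fraction(a)
    for n in range(1, depth + 1):
        p = Fraction(1, 2 ** n)
        if rest >= p:          # greedy extraction of the binary expansion
            exps.append(n)
            rest -= p
    return exps

# a dyadic rational terminates exactly
a = Fraction(11, 16)                    # 0.1011 in binary
exps = binary_exponents(a, 10)
assert exps == [1, 3, 4]
# each exponent n contributes an interval of measure 2^(-n); they sum to a
assert sum(Fraction(1, 2) ** n for n in exps) == a
```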
