arXiv:1902.11189v1 [cs.LO] 28 Feb 2019

Semantics of higher-order probabilistic programs with conditioning

Fredrik Dahlqvist, University College London
Dexter Kozen, Cornell University

Abstract: We present a denotational semantics for higher-order probabilistic programs in terms of linear operators between Banach spaces. Our semantics is rooted in the classical theory of Banach spaces and their tensor products, but bears similarities with the well-known semantics of higher-order programs à la Scott through the use of ordered Banach spaces which allow definitions in terms of fixed points. Being based on a monoidal rather than cartesian closed structure, our semantics effectively treats randomness as a resource.

I. INTRODUCTION

Probabilistic programming has enjoyed a recent resurgence of interest driven by new applications in machine learning and statistical analysis of large datasets. The emergence of probabilistic programming languages such as Church and Anglican, which allow statisticians to construct and sample distributions and perform Bayesian inference, has created a need for sound semantic foundations and tools for specification and reasoning. Several recent works have approached this task from various perspectives [1]–[4].

One of the earliest works on the semantics of probabilistic programs was [5], in which operational and denotational semantics were given for an idealized first-order imperative language with random number generation. Programs and data were interpreted over ordered Banach spaces. Programs were modelled as positive and continuous linear operators on an ordered Banach space of measures. In [6], an equivalent predicate-transformer semantics was introduced based on ordered Banach spaces of measurable functions and shown to be dual to the measure-transformer semantics of [5].

In this paper we revisit this approach. We identify a symmetric monoidal closed category RoBan of ordered Banach spaces and regular maps that can serve as a foundation for higher-order probabilistic programming with sampling, conditioning, and Bayesian inference. Bayesian inference can be viewed as reversing the computation of a probabilistic program to infer information about a prior distribution from observations. We model Bayesian inference as computing the adjoint of a linear operator and show how it corresponds to computing a disintegration.

The extension to higher types is achieved through a tensor product construction in the category RoBan that gives symmetric monoidal closure. Although not cartesian, the construction does admit an adjunction with homsets enriched with an ordered Banach space structure acting as internalized exponentials. To accommodate conditioning and Bayesian inference, we introduce 'Bayesian types', in which values are decorated with a prior distribution. Based on this foundation, we give a type system and denotational semantics for an idealized higher-order probabilistic language with sampling, conditioning, and Bayesian inference.

We believe our approach should appeal to computer scientists, as it is true to traditional Scott-style denotational semantics (see § IV-D). It should also appeal to mathematicians, statisticians and machine learning theorists, as it uses very familiar mathematical objects from those fields. For example, a traditional perspective is that a Markov process is just a positive linear operator of norm 1 between certain Banach lattices [7, Ch. 19]. These are precisely the morphisms of our semantics. Similarly, classical ergodic theory, which is key to proving the correctness of important algorithms like Gibbs sampling, is an important part of the theory of these operators [8]. Our semantics therefore connects seamlessly with a wealth of results from functional analysis, ergodic theory, statistics, etc. We believe that this will greatly simplify the task of validating stochastic machine learning algorithms.

We should also mention that our semantics fits well with the view of entropy (randomness) as a computational resource, like time or space. True random number generators can only produce randomness at a limited rate; physically, randomness is a resource [9]. Our type system, being resource-sensitive, has some nice cryptographic properties: it is forbidden by construction to use a sample more than once; that is, each operation consuming a random sample requires a fresh sample (component in a tensor product).

Related works: Two very powerful semantics for higher-order probabilistic programming have been recently developed in the literature. In [1], [2], a semantics is given in terms of so-called quasi-Borel spaces. These form a Cartesian closed category and admit a notion of probability distribution and of a Giry-like monad of probability distributions. In [4] the authors develop a semantics in terms of measurable cones. These form a cpo-enriched Cartesian closed category which provides a semantics to a probabilistic extension of PCF that includes conditioning. The key differences with the present semantics are the following. First, these proposed mathematical universes come directly from the world of theoretical computer science, whilst, as mentioned above, our semantics is rooted in the traditional mathematics of the objects being constructed by the programs. Second, quasi-Borel spaces and measurable cones form Cartesian closed categories, whereas we work [...]

[...] define a measure kernel to be a measurable map $f : X \to \mathcal{M}Y$ such that for all $x \in X$, $f(x)$ [...]

B. Ordered Banach spaces

1) Regular Ordered Banach spaces: An ordered vector space $V$ is a vector space together with a partial order $\le$ which is compatible with the linear structure in the sense that for all $u, v, w \in V$, $\lambda \in \mathbb{R}^+$

$$u \le v \Rightarrow u + w \le v + w \quad\text{and}\quad u \le v \Rightarrow \lambda u \le \lambda v$$

A vector $v$ in an ordered vector space $V$ is called positive if $v \ge 0$, and the collection of all positive vectors is called the positive cone of $V$ and denoted $V^+$. The positive cone is said to be generating if $V = V^+ - V^+$, that is to say if every vector can be expressed as the difference of two positive vectors. An ordered normed vector space $V$ is an ordered vector space in which the positive cone $V^+$ is closed for the topology generated by the norm. A subset of the positive cone of particular importance will be the positive unit ball $B^+(V) = \{v \ge 0 : \|v\| \le 1\}$. An ordered Banach space is an ordered normed vector space which is complete. We can now describe the central class of objects of this work: an ordered normed space is said to be regular if it satisfies [15, Ch. 9]:

R1 if $-y \le x \le y$ then $\|x\| \le \|y\|$
R2 $\|x\| = \inf\{\|y\| : -y \le x \le y\}$

In particular, a regular ordered Banach space is an ordered Banach space which is regular. A few comments are in order. First note that if $-y \le y$ then $0 \le 2y$, and thus $y$ is positive, so R2 says that the norm of any vector can be approximated arbitrarily well by the norm of positive vectors. Note also that R2 implies that the positive cone is generating: for any $x \in V$, fix $\epsilon > 0$, then by R2 there exists $y$ with $-y \le x \le y$ whose norm is $\epsilon$-close to that of $x$. Since $x = \frac{y + x}{2} - \frac{y - x}{2}$, and since it follows from $-y \le x \le y$ that both $y + x$ and $y - x$ are positive, $x$ can indeed be expressed as the difference of two positive vectors. Regularity can be understood as the fact that the space is fully characterised by its positive unit ball [16].

2) Regular operators and RoBan: As mentioned above, regular ordered Banach spaces are determined in a very strong sense by their positive cone, which is generating and determines the norm (axiom R2). It is therefore natural to consider linear operators $f : U \to V$ between regular ordered Banach spaces which send positive vectors to positive vectors, i.e. such that $u \ge 0 \Rightarrow f(u) \ge 0$. Such operators are called positive operators and constitute a field of mathematical research in their own right [17], [18]. The collection $[U, V]_p$ of positive operators between two regular ordered Banach spaces clearly does not form a vector space, since it is not closed under scalar multiplication by negative reals. We therefore consider the span of this collection, that is to say the operators $f : U \to V$ which can be expressed as the difference between two positive operators, i.e. $f = h - g$ with $h, g \in [U, V]_p$. Such operators are called regular operators, and we define the category RoBan as the category whose objects are regular ordered Banach spaces and whose morphisms are regular operators.

Proposition 1. Regular operators on regular ordered Banach spaces are (norm) bounded.

Theorem 2 ([16]). If $U, V$ are regular ordered Banach spaces and $[U, V]$ is equipped with the obvious linear structure, pointwise order and the regular norm

$$\|f\|_r = \inf\{\|g\| : -g \le f \le g\}$$

where $\|g\| = \sup\{\|g(u)\| : \|u\| \le 1\}$ is the usual operator norm, then $[U, V]$ is a regular ordered Banach space.

This result justifies the following notation: we will denote the regular ordered Banach space of operators between the regular ordered Banach spaces $U, V$ by $[U, V]$.

3) Banach lattices: We now describe a particularly important class of regular ordered Banach spaces: the class of Banach lattices. Although this class of objects lacks the categorical closure properties that we seek (see § II-D), most of the objects we will be dealing with are Banach lattices. An ordered vector space $(V, \le)$ is a Riesz space if its partial order is a lattice. This allows the definition of the positive and negative part of a vector $v \in V$ as $v^+ = v \vee 0$, $v^- = (-v) \vee 0$ and its modulus as $|v| = v \vee (-v)$. Note that $v = v^+ - v^-$, with $v^+, v^-$ positive, and the positive cone of a Riesz space is thus generating. A Riesz space is order complete or Dedekind-complete (resp. σ-order complete or σ-Dedekind complete) if every non-empty (resp. non-empty countable) subset of $V$ which is order bounded has a supremum. A normed Riesz space is a Riesz space equipped with a lattice norm, i.e. a norm satisfying axiom R1 above. A normed Riesz space is called a Banach lattice if it is (norm-) complete. As stated, Banach lattices form a special class of regular ordered Banach spaces:

Proposition 3. Banach lattices are regular.

Example 4. Given a measurable space $(X, \mathcal{F})$, the space $\mathcal{M}(X, \mathcal{F})$ can be shown [7, Th 10.56] to be a Banach lattice. The Banach space structure was described above and the lattice structure is given by

$$(\mu \vee \nu)(A) = \sup\{\mu(B) + \nu(A \setminus B) \mid B \text{ measurable}, B \subseteq A\}$$

and dually for meets. The Hahn-Jordan decomposition theorem defines the positive and negative part of a measure in the Banach lattice $\mathcal{M}(X, \mathcal{F})$.

Example 5. Given a measured space $(X, \mathcal{F}, \mu)$ and $1 \le p \le \infty$, the Lebesgue space $L_p(X, \mu)$ is a Banach lattice with the pointwise order. In particular, for any $f \in L_p(X, \mu)$, the positive and negative parts $f^+$ and $f^-$ of a function used in the definition of the Lebesgue integral define the positive-negative decomposition of $f$ in the Banach lattice $L_p(X, \mu)$.

We will say that $p, q \in \mathbb{N} \cup \{\infty\}$ are Hölder conjugate if either of the following conditions hold: (i) $1 < p, q < \infty$ and $\frac{1}{p} + \frac{1}{q} = 1$, or (ii) $\{p, q\} = \{1, \infty\}$.
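The lattice structure of Example 4 can be checked by hand on a finite space. The following is a minimal sketch, assuming a three-point space with the discrete σ-algebra and two arbitrarily chosen signed measures (the names `X`, `mu`, `nu` and all helper functions are hypothetical, not taken from the paper). It evaluates the supremum formula for $\mu \vee \nu$ by enumerating subsets and compares it with the pointwise maximum of the point masses, which is what the formula reduces to in the discrete case.

```python
from itertools import combinations


def subsets(points):
    """Yield every subset of a finite set of points as a frozenset."""
    points = sorted(points)
    for size in range(len(points) + 1):
        for combo in combinations(points, size):
            yield frozenset(combo)


def evaluate(weights, event):
    """Value on `event` of the signed measure with point masses `weights`."""
    return sum(weights[x] for x in event)


def join(mu, nu, event):
    """(mu v nu)(A) = sup { mu(B) + nu(A minus B) : B subset of A }, by enumeration."""
    return max(evaluate(mu, b) + evaluate(nu, event - b) for b in subsets(event))


# Hypothetical data: a three-point space and two signed measures on it.
X = frozenset({0, 1, 2})
mu = {0: 2, 1: -1, 2: 0}
nu = {0: 1, 1: 3, 2: -2}
zero = {x: 0 for x in X}
neg_mu = {x: -w for x, w in mu.items()}

# On a finite discrete space the supremum is attained pointwise.
for A in subsets(X):
    assert join(mu, nu, A) == sum(max(mu[x], nu[x]) for x in A)

# Positive and negative parts: mu+ = mu v 0 and mu- = (-mu) v 0.
mu_plus = join(mu, zero, X)
mu_minus = join(neg_mu, zero, X)
assert mu_plus - mu_minus == evaluate(mu, X)
```

The last assertion is the identity $v = v^+ - v^-$ evaluated on the whole space; in this discrete setting it is the Hahn-Jordan decomposition of `mu`.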

The examples of Banach lattices described above are instances of an even better behaved class of objects called abstract Lebesgue spaces or AL spaces. They are defined by the following property of the norm: a Banach lattice $V$ is an AL space if for all $u, v \in V^+$

$$\|u + v\| = \|u\| + \|v\| \tag{AL}$$

Not surprisingly, the Lebesgue spaces $L_1(X, \mu)$ are examples of AL spaces, as are the Banach lattices $\mathcal{M}(X)$.

Theorem 6 ([18], Sec. 4.1). AL spaces are order-complete.

4) Bands: The order structure of Riesz spaces gives rise to classes of subspaces which are far richer than the traditional linear subspaces. An ideal of a Riesz space $V$ is a linear subspace $U \subseteq V$ with the property that if $|u| \le |v|$ and $v \in U$ then $u \in U$. An ideal $U$ is called a band when for every subset $D \subseteq U$, if $\bigvee D$ exists in $V$, then it also belongs to $U$. Every band in a Banach lattice is itself a Banach lattice. Of particular importance in what follows will be the principal band generated by an element $v \in V$, which we denote $V_v$ and can be described explicitly by

$$V_v = \{w \in V \mid (|w| \wedge n|v|) \uparrow |w|\}$$

Example 7. Given a measure $\mu \in \mathcal{M}X$, the band $(\mathcal{M}X)_\mu$ generated by $\mu$ is the set of signed measures of bounded variation which are absolutely continuous w.r.t. $\mu$ [7, Th. 10.61]. The ordered version of the Radon-Nikodym theorem states that $(\mathcal{M}X)_\mu \simeq L_1(X, \mu)$ as Banach lattices [7, Th. 13.19].

5) Köthe duals: There are two modes of 'convergence' in an ordered Banach space: order convergence and norm convergence. The latter is well-known, the former less so. Let $D$ be a directed set, and let $\{v_\alpha\}_{\alpha \in D}$ be a net in an ordered Banach space $V$. We say that $\{v_\alpha\}$ converges in order to $v$ if there exists a decreasing net $\{u_\alpha\}_{\alpha \in D}$ with $\bigwedge u_\alpha = 0$, notation $u_\alpha \downarrow 0$, such that

$$-u_\alpha \le v_\alpha - v \le u_\alpha \quad\text{for all } \alpha \in D$$

If the directed set $D$ is $\mathbb{N}$ we get the notion of order-convergent sequence. Order and norm convergence of sequences are disjoint concepts, i.e. neither implies the other (see [17, Ex. 15.2] for two counter-examples). However, if a sequence converges both in order and in norm then the limits are the same (see [17, Th. 15.4]). Moreover, for monotone sequences norm convergence implies order convergence [17, Th. 15.3].

It is well known that bounded operators are continuous, i.e. preserve norm-converging sequences. The corresponding order-convergence concept is defined as follows: an operator $T : V \to W$ between Riesz spaces is said to be σ-order continuous if whenever $v_n \downarrow 0$, $Tv_n \downarrow 0$ (equivalently, if $v_n \uparrow v$, i.e. $v_n$ is an increasing sequence with supremum $v$, implies $Tv_n \uparrow Tv$; note the similarity with Scott-continuity, the only difference being the condition that sequences must be order-bounded). We can thus consider two types of dual spaces on an ordered Banach space $V$: on the one hand we can consider the norm-dual:

$$V^* = \{f : V \to \mathbb{R} : f \text{ is norm-continuous}\}$$

and the σ-order-dual:

$$V^\sigma = \{f : V \to \mathbb{R} : f \text{ is σ-order continuous and regular}\}$$

The latter is also known as the Köthe dual of $V$ [17], [19].

Theorem 8. The Köthe dual $V^\sigma$ of a regular ordered Banach space $V$ is an order-complete Banach lattice.

Example 9. It is shown in e.g. [17], [20] that

$$L_p(X, \mu)^\sigma = L_q(X, \mu) \tag{2}$$

for any Hölder conjugate pair $1 \le p, q \le \infty$. In particular the spaces $L_1(X, \mu)$ and $L_\infty(X, \mu)$ are Köthe duals of each other. Note that they are not ordinary duals.

6) Categorical connections: We conclude this section by a summary of some results from [14] which provide a categorical connection between most of the topics covered so far. The category Krn is the category whose objects are pairs $(X, \mu)$ where $X$ is a standard Borel space [12] (in fact any class of measurable spaces for which disintegrations exist will do) and $\mu \in \mathcal{M}X$. A morphism between $(X, \mu)$ and $(Y, \nu)$ is a measure kernel $f : X \to \mathcal{M}Y$ such that $f_*(\mu) = \nu$ (where $f_*$ is defined in (1)), in which case the morphism is denoted $f$ as well. As was shown in [14], any two morphisms which disagree only on a null set can be identified, and the morphisms of Krn thus become equivalence classes of almost everywhere equal measure kernels (see [14] for the technical details of this construction).

Now, we define some functors. First, as was shown in [14], the Bayesian inversion operation described in § II-A3 defines a functor $(-)^\dagger : \mathbf{Krn} \to \mathbf{Krn}^{op}$ which leaves objects unchanged and sends a morphism $f : (X, \mu) \to (Y, \nu)$ to its Bayesian inverse $f^\dagger : (Y, \nu) \to (X, \mu)$ (we drop the subscript $\mu$ of $f^\dagger_\mu$ because it is made explicit from the typing). Note that $(f^\dagger)^\dagger = f$ [14]. We also define the functor $(-)^\sigma : \mathbf{RoBan} \to \mathbf{RoBan}^{op}$ which sends a regular ordered Banach space $X$ to its Köthe dual $X^\sigma$, and a regular operator $T : U \to V$ to its adjoint $T^\sigma : V^\sigma \to U^\sigma$ defined in the usual way. Note that just as taking the Köthe dual gives an order-complete space, the adjoint $T^\sigma$ of a regular operator is an order-continuous regular operator [17, Ch. 26].

Connecting the categories, we define for each $1 \le p \le \infty$ the functor $L_p : \mathbf{Krn} \to \mathbf{RoBan}^{op}$ which sends a Krn-object $(X, \mu)$ to the Lebesgue space $L_p(X, \mu)$ and a Krn-arrow $f : (X, \mu) \to (Y, \nu)$ to the operator $L_p f : L_p(Y, \nu) \to L_p(X, \mu)$, $\varphi \mapsto \lambda x.\, \int_Y \varphi \, df(x)$. We also define the functor $\mathcal{M}{-} : \mathbf{Krn} \to \mathbf{RoBan}$ which sends an object $(X, \mu)$ to the band $(\mathcal{M}X)_\mu$ and a morphism $f : (X, \mu) \to (Y, \nu)$ to the operator $\mathcal{M}f : (\mathcal{M}X)_\mu \to (\mathcal{M}Y)_\nu$, $\rho \mapsto \lambda B.\, \int_X f(x)(B) \, d\rho$.

The functors $\mathcal{M}{-}$, $L_1 \circ (-)^\dagger$ and $(-)^\sigma \circ L_\infty$ of type $\mathbf{Krn} \to \mathbf{RoBan}$ are related via natural transformations which play a major role in measure theory [14]:

• RN : $\mathcal{M}{-} \to L_1 \circ (-)^\dagger$ acts at $(X, \mu)$ by sending a measure $\nu \ll \mu$ to its Radon-Nikodym derivative $\frac{d\nu}{d\mu}$.
• MR : $L_1 \circ (-)^\dagger \to \mathcal{M}{-}$ acts at $(X, \mu)$ by sending an $L_1$-map $f$ to its Measure Representation $f\mu$.

• FR : $\mathcal{M}{-} \to (-)^\sigma \circ L_\infty$ acts at $(X, \mu)$ by sending a measure $\mu$ to its Functional Representation $\lambda\varphi.\, \int \varphi \, d\mu$.
• RR : $(-)^\sigma \circ L_\infty \to \mathcal{M}{-}$ acts at $(X, \mu)$ by sending an $L_\infty$-functional $F$ to its Riesz Representation $\lambda B.\, F(1_B)$.

The natural transformations RN and MR are inverse of each other, as are FR and RR, proving natural isomorphisms between the three functors. We summarize these relationships in the following diagram:

[Diagram (3): the functors $\mathcal{M}{-} : \mathbf{Krn} \to \mathbf{RoBan}$, $L_\infty : \mathbf{Krn} \to \mathbf{RoBan}^{op}$, $(-)^\sigma : \mathbf{RoBan}^{op} \to \mathbf{RoBan}$, $(-)^\dagger : \mathbf{Krn} \to \mathbf{Krn}^{op}$ and $L_1 : \mathbf{Krn}^{op} \to \mathbf{RoBan}$, with the natural isomorphisms FR, RR and MR, RN between the three resulting composites.]

C. Tensor products of ordered Banach spaces

We start by describing the tensor product of vector spaces from the perspective of computer science. We will then discuss how the tensor product can be normed and ordered.

1) Introduction to the tensor product: As was already highlighted in [5] in the case of probabilistic programming, and subsequently in the development of semantics for quantum programming languages (e.g. [21]), it may be desirable to interpret programs as linear operators in a category of vector spaces. Indeed, this is precisely what this paper advocates for probabilistic programming languages. However, a difficulty quickly emerges if one wants to include higher-order features. Consider a map in two arguments $f : U \times V \to W$. The most basic facility provided by higher-order reasoning is the ability to curry such a map and define the two curried maps

$$\hat{f} : U \to W^V \quad\text{and}\quad \tilde{f} : V \to W^U$$

by fixing one argument or the other. Since we want both curried maps $\hat{f}$ and $\tilde{f}$ to be linear, it is easy to see that $f$ must be linear in each argument separately, in particular

$$f(\lambda u, v) = \lambda f(u, v) \quad\text{and}\quad f(u, \lambda v) = \lambda f(u, v)$$

Such a map is referred to as a bilinear map: it is linear in each argument separately. However, being bilinear is incompatible with being linear: by definition of the product linear structure, if $f$ were also linear we would have

$$\lambda f(u, v) = f(\lambda(u, v)) = f(\lambda u, \lambda v) = \lambda f(u, \lambda v) = \lambda^2 f(u, v)$$

which is a contradiction whenever $\lambda \notin \{0, 1\}$ and $f(u, v) \ne 0$. Thus $f$ is not a valid morphism if we want our semantic universe to consist of linear maps between vector spaces. Fortunately, for any pair of vector spaces $U, V$ there exists a special object, the tensor product $U \otimes V$, which linearizes bilinear maps, i.e. such that any bilinear map $f : U \times V \to W$ corresponds to a unique linear map $\bar{f} : U \otimes V \to W$ (and vice-versa). This can be phrased in terms of a universal property: there exists a universal bilinear map $\otimes : U \times V \to U \otimes V$, $(u, v) \mapsto u \otimes v$ such that for any bilinear map $f : U \times V \to W$ there exists a unique linear map $\bar{f} : U \otimes V \to W$ making the following diagram commute:

[Diagram (4): the triangle $U \times V \xrightarrow{\otimes} U \otimes V \xrightarrow{\bar{f}} W$ with $f : U \times V \to W$ as its third side, i.e. $f = \bar{f} \circ \otimes$.]

The tensor product can be built explicitly as follows: it is the free vector space over $U \times V$ quotiented by the following identities:

$$(u + u', v) = (u, v) + (u', v), \quad (u, v + v') = (u, v) + (u, v'), \quad (\lambda u, v) = (u, \lambda v) = \lambda(u, v) \tag{5}$$

It is not too hard to see that the last identity is precisely what is needed to fix the contradiction $\lambda f(u, v) = \lambda^2 f(u, v)$ described above and reconcile currying with linearity. By definition, an element $x \in U \otimes V$ will be a (finite) linear combination of equivalence classes of the generators $(u, v)$, denoted $u \otimes v$, under the identities (5), formally $x = \sum_i u_i \otimes v_i$.

2) Tensor product of Banach spaces: Suppose now that both $U$ and $V$ are Banach spaces, in particular that they carry a norm. How do we define a norm on $U \otimes V$, and how do we ensure that the space is complete for this norm? For Hilbert spaces there is a straightforward construction, but since we shall be dealing with Banach spaces which are not Hilbert spaces, we will require the much more subtle theory of tensor products of Banach spaces originally developed by Grothendieck [22]. We refer to [23] for a good introduction.

The initial difficulty with the construction of a norm is that by definition of the tensor product, each vector has many representations. For example, the vector $(2, 2) \otimes (3, 3) \in \mathbb{R}^2 \otimes \mathbb{R}^2$, i.e. the equivalence class of the pair $((2, 2), (3, 3))$ under the equations (5), can also be expressed as $(1, 1) \otimes (6, 6)$, and these representations are built from vectors with very different norms. Assuming that we want the norm of the tensor to be defined from the norms of its components, which representation do we choose? There is no unique solution to this question, but Grothendieck proposed one extremal solution by defining for any $x \in U \otimes V$

$$\|x\|_\pi = \inf\left\{\sum_{i=1}^n \|u_i\|\|v_i\| : x = \sum_{i=1}^n u_i \otimes v_i\right\} \tag{6}$$

This definition of $\|\cdot\|_\pi$ defines a norm [23, Prop 2.1] which is called the projective norm. However, the space $U \otimes V$ equipped with the projective norm is in general not complete. One therefore defines the projective tensor product of two Banach spaces $U, V$ as the completion of $U \otimes V$ under the projective norm, i.e. the space of equivalence classes of Cauchy sequences in $U \otimes V$ converging to the same point. This space will be denoted $U \mathbin{\widehat{\otimes}_\pi} V$ and one can describe the projective norm of elements in this space as follows:

$$\|x\|_\pi = \inf\left\{\sum_{i=1}^\infty \|u_i\|\|v_i\| : \sum_{i=1}^\infty \|u_i\|\|v_i\| < \infty,\ x = \sum_{i=1}^\infty u_i \otimes v_i\right\}$$
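The non-uniqueness of representations that motivates the infimum in (6) can be made concrete in $\mathbb{R}^2 \otimes \mathbb{R}^2$. The sketch below is illustrative only: it identifies an elementary tensor $u \otimes v$ with the matrix of products $u_i v_j$ (the standard identification of $\mathbb{R}^m \otimes \mathbb{R}^n$ with $m \times n$ matrices), and it assumes the $\ell_1$ norm on each factor, a choice made here for the example and not prescribed by the text. All function names are hypothetical.

```python
def outer(u, v):
    """Coordinates of the elementary tensor u (x) v as a nested tuple."""
    return tuple(tuple(a * b for b in v) for a in u)


def scale_vec(lam, u):
    """Scalar multiple of a vector."""
    return tuple(lam * a for a in u)


def scale(lam, t):
    """Scalar multiple of a tensor given in coordinates."""
    return tuple(tuple(lam * a for a in row) for row in t)


def tensor_of(rep):
    """Sum of the elementary tensors u_i (x) v_i listed in `rep`."""
    rows, cols = len(rep[0][0]), len(rep[0][1])
    total = [[0] * cols for _ in range(rows)]
    for u, v in rep:
        for i, a in enumerate(u):
            for j, b in enumerate(v):
                total[i][j] += a * b
    return tuple(tuple(row) for row in total)


def l1(u):
    """l1 norm of a vector (an assumption of this sketch)."""
    return sum(abs(a) for a in u)


def cost(rep):
    """The quantity sum_i ||u_i|| ||v_i|| appearing inside the infimum in (6)."""
    return sum(l1(u) * l1(v) for u, v in rep)


# The example from the text: two representations of the same tensor.
assert outer((2, 2), (3, 3)) == outer((1, 1), (6, 6))

# The last identity in (5): scalars move freely across the tensor sign.
u, v, lam = (1, -2), (3, 4), 5
assert outer(scale_vec(lam, u), v) == outer(u, scale_vec(lam, v)) == scale(lam, outer(u, v))

# Two representations of one tensor with different costs.
rep_short = [((1, 0), (0, 1))]
rep_long = [((1, 1), (0, 1)), ((0, -1), (0, 1))]
assert tensor_of(rep_short) == tensor_of(rep_long)
assert cost(rep_short) < cost(rep_long)
```

Each `cost` value is only an upper bound on the projective norm of the tensor; the norm itself is the infimum over all representations, which is why a redundant representation such as `rep_long` does not inflate it.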

In § II-C1 we saw how the tensor product can be used as a way to linearize bilinear maps. This property extends naturally to the normed case, and it can be shown that the projective tensor product $U \mathbin{\widehat{\otimes}_\pi} V$ linearizes bounded bilinear maps [23, Th 2.9] in the sense that there exists a universal bounded bilinear map $U \times V \to U \mathbin{\widehat{\otimes}_\pi} V$ satisfying the universal property of (4) w.r.t. bounded bilinear maps.

The projective tensor product of two Banach spaces is in general fairly inscrutable. However, one can explicitly describe projective tensor products involving important objects for our semantics. When one component is an $L_1$-space we have:

Theorem 10 (Radon-Nikodym and [23] p. 43). For finite measures $\mu, \nu$ on measurable spaces $X, Y$ respectively, $(\mathcal{M}X)_\mu \mathbin{\widehat{\otimes}_\pi} (\mathcal{M}Y)_\nu \simeq (\mathcal{M}(X \times Y))_{\mu \times \nu}$.

The operator $\times : \mathcal{M}X \times \mathcal{M}Y \to \mathcal{M}(X \times Y)$ taking the product of measures is bilinear. Therefore there exists a unique map $\mathcal{M}X \mathbin{\widehat{\otimes}_\pi} \mathcal{M}Y \to \mathcal{M}(X \times Y)$ mapping any tensor $\mu \otimes \nu$ to the product measure. In this sense, $\mathcal{M}X \mathbin{\widehat{\otimes}_\pi} \mathcal{M}Y$ is the subspace of $\mathcal{M}(X \times Y)$ which is generated by taking linear combinations of product measures, and then closing under the projective norm.

Theorem 11. The projective tensor product $\mathcal{M}X \mathbin{\widehat{\otimes}_\pi} \mathcal{M}Y$ is isometrically embedded in $\mathcal{M}(X \times Y)$.

3) Tensor product of ordered Banach spaces: We conclude this brief description of tensor products by examining the case of interest to us, namely regular ordered Banach spaces. In the most important examples the construction is isomorphic as Banach spaces to the unordered case, and we will therefore not dwell too long on the theory of tensor products of ordered Banach spaces developed in [24]–[26]. The main idea of the definition is to reflect the central role of positive vectors in the theory of regular ordered Banach spaces, and in particular the fact that the positive cone is generating and determines the norm (axioms R1, R2 above). The same should hold for any ordered tensor product.

Given two regular ordered Banach spaces $U, V$, their tensor product $U \otimes V$ is equipped with the positive projective norm $\|\cdot\|_{|\pi|}$ defined as

$$\|x\|_{|\pi|} = \inf\left\{\sum_{i=1}^n \|u_i\|\|v_i\| : u_i \in U^+, v_i \in V^+,\ -\sum_{i=1}^n u_i \otimes v_i \le x \le \sum_{i=1}^n u_i \otimes v_i\right\}$$

Note the similarity with (6), and the role played by positive vectors in this definition. As in the unordered case $U \otimes V$ is not complete for the positive projective norm, and we must therefore take its completion, which we call the positive projective tensor of $U$ and $V$ and denote by $U \mathbin{\widehat{\otimes}_{|\pi|}} V$.

Since $L_1$-spaces and $\mathcal{M}(X, \mathcal{F})$-spaces are examples of AL-spaces, the following result shows that we can in practice often ignore the subtleties of the positive projective tensor product and rely on the descriptions of the ordinary projective tensor products.

Theorem 12 ([25], Th. 2B). If $E$ is an AL-space and $F$ is any regular ordered space, then $E \mathbin{\widehat{\otimes}_{|\pi|}} F$ and $E \mathbin{\widehat{\otimes}_\pi} F$ are isomorphic as Banach spaces.

D. The closed monoidal structure of RoBan

1) Tensor product of regular operators: In § II-C1 and § II-C2 we saw how the tensor and projective tensor products can be used to linearize bilinear and bounded bilinear maps respectively. The positive projective tensor product fulfils the same role for positive (and thus bounded by Prop. 1) bilinear maps: there exists a universal positive bilinear map $U \times V \to U \mathbin{\widehat{\otimes}_{|\pi|}} V$ satisfying the universal property of (4) w.r.t. positive bilinear maps [26, 2.7]. This universal property of tensor products provides a definition of $\widehat{\otimes}_{|\pi|}$ as a bifunctor on RoBan. Let $f : U \to X$, $g : V \to Y$ be positive operators, then the map

$$\widehat{\otimes}_{|\pi|} \circ (f \times g) : U \times V \to X \times Y \to X \mathbin{\widehat{\otimes}_{|\pi|}} Y$$

is positive and bilinear, and thus there exists a unique positive operator $U \mathbin{\widehat{\otimes}_{|\pi|}} V \to X \mathbin{\widehat{\otimes}_{|\pi|}} Y$ which is denoted $f \mathbin{\widehat{\otimes}_{|\pi|}} g$. This provides the definition of the bifunctor $\widehat{\otimes}_{|\pi|}$ on morphisms.

2) The closed monoidal structure: As we saw in Th. 2, the category RoBan has internal homs, and these interact correctly with positive projective tensor products.

Theorem 13 ([16]). For every regular ordered Banach space $U$, the tensoring and homming operations $- \mathbin{\widehat{\otimes}_{|\pi|}} U$ and $[U, -]$ define functors $\mathbf{RoBan} \to \mathbf{RoBan}$ such that

$$- \mathbin{\widehat{\otimes}_{|\pi|}} U \dashv [U, -]$$

The positive projective tensor defines a symmetric monoidal structure on RoBan with $\mathbb{R}$ as its unit (since $\mathbb{R} \otimes U \simeq U \otimes \mathbb{R} \simeq U$ at the level of the underlying vector spaces) and the obvious isomorphisms $U \mathbin{\widehat{\otimes}_{|\pi|}} V \to V \mathbin{\widehat{\otimes}_{|\pi|}} U$ inherited from the isomorphism $U \otimes V \to V \otimes U$ between the tensor products of the underlying vector spaces. The category RoBan is thus symmetric monoidal closed.

III. A HIGHER-ORDER LANGUAGE WITH CONDITIONING

A. A type system

We start by defining a type system for our language. Our aims are to (a) have enough types to write some realistic programs, for example including multivariate normal or chi-squared distributions, (b) have higher-order types, (c) provide special types for Bayesian learning: Bayesian types.

Our type grammar is given as follows:

$$\mathtt{T} ::= \mathtt{m} \mid \mathtt{int}^n \mid \mathtt{real}^n \mid \mathtt{PosDef}(n) \mid (\mathtt{T}, \mu) \mid \mathtt{T} \otimes \mathtt{T} \mid \mathtt{T} \to \mathtt{T} \mid \mathtt{M}\mathtt{T} \tag{7}$$

where $1 \le m, n \in \mathbb{N}$ and $\mu : \mathtt{T}$. We will refer to $\mathtt{m}$, $\mathtt{int}^n$, $\mathtt{real}^n$, $\mathtt{PosDef}(n)$ as ground types. As their name suggests, they are to be regarded as the types of (possibly random) elements of finite sets, vectors of integers, vectors of reals and positive semi-definite matrices, that is to say covariance matrices. We will write $\mathtt{int}^1$ and $\mathtt{real}^1$ as $\mathtt{int}$ and $\mathtt{real}$.

This is by no means an exhaustive set of ground types, but it is sufficiently rich to consider some realistic probabilistic programs. The type $\mathtt{1} \in \mathbb{N}$ will be referred to as the unit type and denoted $\mathtt{unit}$, and the type $\mathtt{2} \in \mathbb{N}$ will be referred to as the boolean type and denoted $\mathtt{bool}$.

The type constructors are the following. First, given a term $\mu$ of type $\mathtt{T}$, we can build the pointed type $(\mathtt{T}, \mu)$. We will call these types Bayesian types because $\mu : \mathtt{T}$ will be interpreted as a prior. Bayesian types will support conditioning and thus Bayesian learning. As we shall see, our Bayesian types also fulfil a role in the semantics of variable assignment. As is the tacit practice in Anglican, we will consider that assigning a (possibly random) value to a variable is equivalent to assigning a prior to the type of this variable. For example the program x := 2.5 which assigns the value 2.5 to the variable x can be understood as placing a (deterministic) prior on the reals, namely $\delta_{2.5}$. Similarly, the program x := sample(normal(0, 1)) which assigns to x a value randomly sampled from the normal distribution $\mathcal{N}(0, 1)$ with mean 0 and standard deviation 1 can be understood as setting the prior $\mathcal{N}(0, 1)$ on the reals. In a slogan:

Bayesian types = Assigned types

Note however that this slogan is only valid for assignments without free variables: a prior cannot be parametric in some variables, it represents definite information. This caveat will be reflected in the type system.

We then have two binary type constructors: tensor types and function types, which together will support higher-order reasoning. Finally, we have a unary type constructor used to define higher-order probabilities.

We isolate the following two sub-grammars of types whose semantic properties will be essential to the typing of certain operations. First we define order-complete types as the types generated by the grammar

$$\mathtt{S} ::= \mathtt{G} \mid (\mathtt{G}, \mu) \mid (\mathtt{G}, \mu) \otimes (\mathtt{G}, \mu) \qquad \mathtt{G} \text{ in ground types} \tag{8}$$
$$\mathtt{T} ::= \mathtt{S} \mid \mathtt{S} \to \mathtt{S} \mid \mathtt{M}\mathtt{S} \tag{9}$$

Second, we define measure types as the types generated by all the constructors of grammar (7) except the function-type constructor.

Remark 14. We could easily add product types to our type system since the category in which we interpret types (RoBan) is complete, but we feel that this would distract from the central role played by the tensor product. It would also introduce the possibility of copying, which from the perspective of randomness as a resource is problematic. This is why we have 'hard-coded' the products which we do need (tuples of integers and reals) as ground types.

Subtyping relation: We will need to formalise the fact that a Bayesian type $(\mathtt{T}, \mu)$ is a subtype of the type $\mathtt{T}$. For this we introduce a subtyping relation denoted $\mathrel{<:}$ freely generated by the rules

$$\frac{}{(\mathtt{T}, \mu) \mathrel{<:} \mathtt{T}} \qquad \frac{\mathtt{S} \mathrel{<:} \mathtt{S}' \quad \mathtt{T} \mathrel{<:} \mathtt{T}'}{\mathtt{S} \otimes \mathtt{T} \mathrel{<:} \mathtt{S}' \otimes \mathtt{T}'} \qquad \frac{\mathtt{S}' \mathrel{<:} \mathtt{S} \quad \mathtt{T} \mathrel{<:} \mathtt{T}'}{\mathtt{S} \to \mathtt{T} \mathrel{<:} \mathtt{S}' \to \mathtt{T}'}$$

As will become clear when we define the semantics of types, we can also use the subtyping relation to add information about the absolute continuity of one built-in measure w.r.t. another in the type system. For example, since a beta distribution is absolutely continuous w.r.t. a normal distribution, if $e_1$ is a program constructing a beta distribution and $e_2$ is a program constructing a normal distribution, we could add the rule $(\mathtt{real}, e_1) \mathrel{<:} (\mathtt{real}, e_2)$.

Contexts: Contexts are maps $\Gamma : \mathbb{N} \to \mathrm{Types}$, where $\mathrm{Types}$ is the free algebra of all types generated by (7), which send cofinitely many integers to the unit type $\mathtt{unit}$. We will write $\mathrm{supp}(\Gamma)$ for the set $\{i \mid \Gamma(i) \ne \mathtt{unit}\}$ and use the traditional notation $\Gamma[i \mapsto \mathtt{T}]$ to denote the context mapping $i$ to $\mathtt{T}$ and all $j \ne i$ to $\Gamma(j)$. To each context we can associate the finite tensor type $\bigotimes_{i=1}^n \Gamma(i)$ where $n = \sup \mathrm{supp}(\Gamma)$. When a context $\Gamma$ appears to the right of a turnstile in the typing rules which follow we will implicitly perform this conversion from context to type.

Our contexts are a dynamic version of the static context of [5], which consists of a constant map on $\mathbb{N}$ to a single type. They are in some respects similar to the heap models of separation logic, and for notational clarity we will require similar operations on contexts as on heaps: a notion of compatibility, of union and of difference. Given two contexts $\Gamma_1, \Gamma_2$ we will say that they are compatible if $\Gamma_1(i) = \Gamma_2(i)$ for all $i \in \mathrm{supp}(\Gamma_1) \cap \mathrm{supp}(\Gamma_2)$, and we will then write $\Gamma_1 \pitchfork \Gamma_2$. For any two compatible contexts $\Gamma_1 \pitchfork \Gamma_2$ we define the union context $\Gamma_1 \oplus \Gamma_2$ as the union of their graphs, which is a function by the compatibility assumption. We define the difference context $\Gamma_1 \ominus \Gamma_2$ as the map sending $i \mapsto \Gamma_1(i)$ if $i \notin \mathrm{supp}(\Gamma_2)$ and to $\mathtt{unit}$ otherwise. In particular $\Gamma_1 \ominus \Gamma_2 = \Gamma_1$ if the supports are disjoint.

B. Syntax

We define an ML-like language allowing imperative features like variable assignments, conditionals and while loops within a functional language.

1) Expressions:

$$\begin{array}{rll}
e ::= & n \in \mathbb{N}^k \mid r \in \mathbb{R}^k \mid m \in \mathrm{PosDef}(n) & \text{Constants} \\
\mid & \mathtt{op}(e, \ldots, e) & \text{Built-in operations} \\
\mid & x_i, \quad i \in \mathbb{N} & \text{Variables} \\
\mid & x_i := e & \text{Assignment} \\
\mid & e;\ e & \text{Sequential composition} \\
\mid & \mathtt{let}\ x_i = e\ \mathtt{in}\ e & \text{Sequencing} \\
\mid & \mathtt{fn}\ x_i\ .\ e & \lambda\text{-abstraction} \\
\mid & e(e) & \text{Function application} \\
\mid & \mathtt{if}\ e\ \mathtt{then}\ e\ \mathtt{else}\ e & \text{Conditional} \\
\mid & \mathtt{while}\ e\ \mathtt{do}\ e & \text{Iterations} \\
\mid & \mathtt{sample}(e) & \text{Sampling} \\
\mid & \mathtt{sampler}(e) & \text{Packages a program as a sampler} \\
\mid & \mathtt{observe}(e) & \text{Conditioning}
\end{array}$$

Every built-in operation must come equipped with typing instructions, which we will write as an $(n + 1)$-tuple $(\mathtt{S}_1, \ldots, \mathtt{S}_n, \mathtt{T})$ where the first $n$ components are ground types specifying the input types and the last component is a ground type or measures over a ground type specifying

the output type. For example the boolean connective or would come with typing $(\mathsf{bool}, \mathsf{bool}, \mathsf{bool})$, the sine function sin with typing $(\mathsf{real}, \mathsf{real})$ and the function normal constructing a normal distribution would come with typing $(\mathsf{real}, \mathsf{PosDef}(1), \mathsf{M}\,\mathsf{real})$, where the first input is the mean, the second is the standard deviation (a 1-dimensional covariance matrix, i.e. a positive real) and the output is a measure over the reals.

2) Well-typed expressions: The typing rules for our language are gathered in Fig. 1. We will discuss these rules in detail when we define the denotational semantics of our language in §IV, but we can already make some observations.

It is important to realize that memory-manipulating rules in effect have a sequent on the right of the turnstile, formally represented by an integer-indexed tensor product type (see §III-A), whilst the other rules just have a type. Syntactically and semantically however, we make no distinction between these two cases. A useful way to think about our system is as follows: a purely functional computation $\Gamma \vdash e : T$ will consume a context $\Gamma$ and output a value of type $T$. A program with imperative features $\Gamma \vdash e : \Delta$ modifying an internal store $\Gamma$ to a new store $\Delta$ should be thought of as consuming a context $\Gamma$ and outputting the totality of its internal state $\Delta$.

The only way to explicitly create a Bayesian type is through a variable assignment without free variables: a prior must contain definite information, not information which is parametric in variables. Only 'measure types' can form Bayesian types.

The sequential composition rule looks daunting, but it is simply a version of the cut rule with a bit of bookkeeping to make sure contexts do not conflict with one another.

Note finally that our observe statement applies to a term of type $T$, intuitively we observe a possibly random element of type $T$. This is slightly different from the syntax of observe in Anglican where a distribution is observed. Semantically, the difference disappears since a possibly random element is modelled by a distribution.

3) A simple example: It is not hard (but notationally cumbersome) to type-check the following simple Gaussian inference program against the inference rules of Fig. 1.

  let x=sample(normal(0,1)) in
  observe(sample(normal(x,1)))

In the empty context the program above evaluates to a function of type

  (real, sample(normal(sample(normal(0, 1)), 1))) → (real, sample(normal(0, 1)))    (10)

which, as we will see in §IV, is what we want semantically.

IV. DENOTATIONAL SEMANTICS

As the reader will have guessed we will now provide a denotational semantics for the language described in §III in the category RoBan of regular ordered Banach spaces.

A. Semantics of types

For ground types we define
• $\llbracket \mathsf{m} \rrbracket = \mathcal{M}\{1,\ldots,m\}$ where $\{1,\ldots,m\}$ is equipped with the discrete σ-algebra. Note that $\llbracket \mathsf{m} \rrbracket \simeq \mathbb{R}^m$, and thus $\llbracket \mathsf{unit} \rrbracket \simeq \mathbb{R}$, the unit of the positive projective tensor.
• $\llbracket \mathsf{int} \rrbracket = \mathcal{M}\mathbb{N}$, where $\mathbb{N}$ is equipped with the discrete σ-algebra.
• $\llbracket \mathsf{real} \rrbracket = \mathcal{M}\mathbb{R}$, where $\mathbb{R}$ is equipped with its usual Borel σ-algebra.
• $\llbracket \mathsf{PosDef}(n) \rrbracket = \mathcal{M}\,\mathrm{PosDef}(n)$, where $\mathrm{PosDef}(n)$ is the space of positive semi-definite $n \times n$ matrices equipped with the Borel σ-algebra inherited from $\mathbb{R}^{n \times n}$.

As expected, the tensor and function type constructors are interpreted by the monoidal closed structure of RoBan, i.e.

$$\llbracket S \otimes T \rrbracket := \llbracket S \rrbracket \,\widehat{\otimes}_{|\pi|}\, \llbracket T \rrbracket \qquad \text{and} \qquad \llbracket S \to T \rrbracket := [\llbracket S \rrbracket, \llbracket T \rrbracket]$$

The higher-order probability type constructor $\mathsf{M}$ is interpreted as follows. For any regular ordered Banach space $V$ we consider the underlying set together with the Borel σ-algebra induced by the norm. We then apply the functor $\mathcal{M}$ to this measurable space. This construction is functorial and we overload $\mathcal{M}$ to denote the resulting regular ordered Banach space by $\mathcal{M}V$. Using this convenient notation we define

$$\llbracket \mathsf{M}T \rrbracket := \mathcal{M}\llbracket T \rrbracket$$

For Bayesian types note that the type system in Fig. 1 can only produce a Bayesian type $(T,\mu)$ if $T$ is a measure type and $\mu$ has no free variables, i.e. if $\emptyset \vdash \mu : T$ is derivable. We will therefore only need to provide a semantics to Bayesian types of this shape. Our semantics of Bayesian types is in some respect similar to that of pointed types used in homotopy type theory [27]. Indeed, at the type-theoretic level they are defined identically as a type together with a term inhabiting this type. However, the ordered vector space structure allows us to provide a semantics which is much richer than a space with a distinguished point. Given a measure type $T$ and a sequent of the type $\emptyset \vdash \mu : T$, we will see in §IV-B that $\mu$ is interpreted as an operator $\llbracket \mu \rrbracket : \mathbb{R} \to \llbracket T \rrbracket$, which is uniquely determined by $\llbracket \mu \rrbracket(1)$. For notational clarity we will often simply write $\mu$ for the measure $\llbracket \mu \rrbracket(1)$. We define the denotation of the Bayesian type $(T,\mu)$ as the principal band in $\llbracket T \rrbracket$ (see §II-B4) generated by the measure $\mu$ (i.e. $\llbracket \mu \rrbracket(1)$). Formally:

$$\llbracket (T,\mu) \rrbracket = \llbracket T \rrbracket_{\mu} \tag{11}$$

For this semantics to be well-defined it is necessary that $\llbracket T \rrbracket$ be a Riesz space, since bands are defined using the lattice structure. This is indeed the case:

Theorem 15. The semantics of a measure type is a Banach lattice.

The function type constructor is the only operation in the type system which forces us to leave the category of Banach lattices and enter the much larger category RoBan. As shown in [18, Ex. 1.17], the space of regular operators between two

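Constants:

$$\frac{i \in n}{\emptyset \vdash i : \mathsf{n}} \qquad \frac{n \in \mathbb{N}^k}{\emptyset \vdash n : \mathsf{int}^k} \qquad \frac{r \in \mathbb{R}^k}{\emptyset \vdash r : \mathsf{real}^k} \qquad \frac{M \in \mathrm{PosDef}(n)}{\emptyset \vdash M : \mathsf{PosDef}(n)}$$

Variables and subtyping:

$$\frac{}{[i \mapsto T] \vdash x_i : T} \qquad \frac{\Gamma[i \mapsto S] \vdash e : \Delta[j \mapsto T]}{\Gamma[i \mapsto S'] \vdash e : \Delta[j \mapsto T']}\; S' <: S,\ T <: T' \qquad \frac{\emptyset \vdash e : T}{\emptyset \vdash e : (T,e)}\; T \text{ measure type}$$

Built-in operations:

$$\frac{\Gamma_1 \vdash e_1 : S_1 \quad \cdots \quad \Gamma_n \vdash e_n : S_n}{\Gamma_1, \ldots, \Gamma_n \vdash op(e_1, \ldots, e_n) : T}\; op \text{ of type } (S_1, \ldots, S_n, T),\ \mathrm{supp}(\Gamma_i) \cap \mathrm{supp}(\Gamma_j) = \emptyset,\ i \neq j$$

Assignment:

$$\frac{\Gamma \vdash e : T}{\Gamma[i \mapsto T] \vdash x_i := e : T} \qquad \frac{\emptyset \vdash e : T}{[i \mapsto T] \vdash x_i := e : (T,e)}\; T \text{ measure type}$$

Sequencing:

$$\frac{\Gamma \vdash e_1 : S \quad \Delta[i \mapsto S] \vdash e_2 : T}{\Gamma \oplus \Delta \vdash \mathtt{let}\ x_i = e_1\ \mathtt{in}\ e_2 : T}\; \mathrm{supp}(\Gamma) \cap \mathrm{supp}(\Delta) = \emptyset$$

Sequential composition:

$$\frac{\Gamma_1 \vdash e_1 : \Delta_1 \quad \Gamma_2 \vdash e_2 : \Delta_2}{\Gamma_1 \oplus (\Gamma_2 \ominus \Delta_1) \vdash e_1 ; e_2 : (\Delta_1 \ominus \Gamma_2) \oplus \Delta_2}\; \Delta_1 \pitchfork \Gamma_2,\ \Gamma_1 \pitchfork (\Gamma_2 \ominus \Delta_1),\ \Delta_2 \pitchfork (\Delta_1 \ominus \Gamma_2)$$

λ-abstraction and function application:

$$\frac{\Gamma[i \mapsto S] \vdash e : T}{\Gamma \vdash \mathtt{fn}\ x_i\ .\ e : S \to T}\; i \notin \mathrm{supp}(\Gamma) \qquad \frac{\Gamma \vdash e_1 : S \quad \Delta \vdash e_2 : S \to T}{\Gamma, \Delta \vdash e_2(e_1) : T}$$

Imperative control flow:

$$\frac{\Gamma \vdash e_1 : \mathsf{bool} \quad \Gamma \vdash e_2 : T \quad \Gamma \vdash e_3 : T}{\Gamma \vdash \mathtt{if}\ e_1\ \mathtt{then}\ e_2\ \mathtt{else}\ e_3 : T}\; \Gamma \text{ order-complete type} \qquad \frac{\Gamma \vdash e_1 : \mathsf{bool} \quad \Gamma \vdash e_2 : \Gamma}{\Gamma \vdash \mathtt{while}\ e_1\ \mathtt{do}\ e_2 : \Gamma}\; \Gamma \text{ order-complete type}$$

Probabilistic operations:

$$\frac{\Gamma \vdash e : T}{\Gamma \vdash \mathtt{sampler}(e) : \mathsf{M}T}\; T \text{ measure type} \qquad \frac{\Gamma \vdash e : \mathsf{M}T}{\Gamma \vdash \mathtt{sample}(e) : T}\; T \text{ measure type}$$

$$\frac{[i \mapsto (S,\mu)] \vdash e : T}{[i \mapsto (S,\mu)] \vdash \mathtt{observe}(e) : (T, e[x_i/\mu]) \to (S,\mu)}\; S, T \text{ measure type},\ S \text{ order-complete type}$$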

Fig. 1. Typing rules
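The context bookkeeping in the sequential composition rule of Fig. 1 can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: a context is modelled as a dictionary from variable indices to type names, indices mapped to unit are simply omitted, and types are plain strings. The names `compatible`, `union`, `difference` and `seq_compose` are hypothetical helpers, not part of the paper.

```python
from typing import Dict, Tuple

# Assumption: a context is a finite map from variable indices to type names;
# indices sent to unit are left out of the dictionary.
Context = Dict[int, str]


def compatible(g1: Context, g2: Context) -> bool:
    """Compatibility: the two contexts agree wherever their supports overlap."""
    return all(g1[i] == g2[i] for i in g1.keys() & g2.keys())


def union(g1: Context, g2: Context) -> Context:
    """Union of the graphs, defined only for compatible contexts."""
    if not compatible(g1, g2):
        raise ValueError("contexts are not compatible")
    return {**g1, **g2}


def difference(g1: Context, g2: Context) -> Context:
    """Keep g1(i) outside supp(g2); every other index becomes unit (omitted)."""
    return {i: t for i, t in g1.items() if i not in g2}


def seq_compose(gamma1: Context, delta1: Context,
                gamma2: Context, delta2: Context) -> Tuple[Context, Context]:
    """Input and output contexts of e1; e2 given e1 : gamma1 -> delta1, e2 : gamma2 -> delta2."""
    side_conditions = (
        compatible(delta1, gamma2)
        and compatible(gamma1, difference(gamma2, delta1))
        and compatible(delta2, difference(delta1, gamma2))
    )
    if not side_conditions:
        raise ValueError("side conditions of the rule fail")
    return (union(gamma1, difference(gamma2, delta1)),
            union(difference(delta1, gamma2), delta2))


# e1 is x1 := 3.5 and e2 is x2 := 7.3, each typed in its own one-variable context.
inp, out = seq_compose({1: "real"}, {1: "(real, 3.5)"},
                       {2: "real"}, {2: "(real, 7.3)"})
```

With these inputs the composite consumes the context [1 ↦ real, 2 ↦ real] and outputs a store holding the two Bayesian types, which matches the shape of the conclusion of the rule.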

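Riesz spaces need not even be a lattice. The non-closure of Banach lattices under taking internal homs is one of the technical reasons for our use of 'measure types' ⁴.

We introduced order-complete types in §III-A because of a 'dual' non-closure property: order-complete spaces are not closed under the positive projective tensor operation. As shown in [25, 4C] the product $L_2([0,1]) \,\widehat{\otimes}_{|\pi|}\, L_2([0,1])$ is not order-complete, even though $L_2([0,1])$ is. Order-completeness will be important in the semantics of while loops.

Theorem 16. The semantics of an order-complete type is an order-complete space.

Subtypes and contexts: The subtyping relation will simply be interpreted as subspace inclusion. For example the relation $(T,\mu) <: T$ is interpreted as the inclusion of the principal band $\llbracket T \rrbracket_{\llbracket \mu \rrbracket} \hookrightarrow \llbracket T \rrbracket$. A context $\Gamma$ will be interpreted as the positive projective tensor

$$\llbracket \Gamma \rrbracket = \widehat{\bigotimes}_{|\pi|,\; i=1}^{\sup \mathrm{supp}(\Gamma)} \llbracket \Gamma(i) \rrbracket$$

and we put $\llbracket \emptyset \rrbracket := \mathbb{R}$. A typing rule $\Gamma \vdash e : T$ will be interpreted as a regular (in fact positive, see Th. 23) operator $\llbracket e \rrbracket : \llbracket \Gamma \rrbracket \to \llbracket T \rrbracket$.

⁴ Note that we could in principle extend (11) to all types by considering subsets generated by a single element which exist in all regular ordered Banach spaces, for example the closure of ideals.

B. Semantics of well-formed expressions

Let us now turn to the semantics of terms.

1) Constants: A constant $c \in G$ whose ground type $\mathsf{G}$ is interpreted as the space $\mathcal{M}G$ will be interpreted as the operator

$$\llbracket c \rrbracket : \llbracket \emptyset \rrbracket = \mathbb{R} \longrightarrow \llbracket \mathsf{G} \rrbracket = \mathcal{M}G, \qquad \lambda \mapsto \lambda\,\delta_c$$

2) Built-in operations: Recall that every built-in operation op comes with typing information $(\mathsf{G}_1, \ldots, \mathsf{G}_n, T)$ where each $\mathsf{G}_i$ is of ground type and $T$ is either of ground type or of type $\mathsf{M}\mathsf{G}$. Each such operation is interpreted via a function $f_{op} : G_1 \times \ldots \times G_n \to X$, with $X = G$ or $X = \mathcal{M}G$, as the unique regular operator which linearizes $\mathcal{M}f_{op} \circ \times$ according to the universal property (4) of $\widehat{\otimes}_{|\pi|}$:

[Commutative diagram: the universal map $\mathcal{M}G_1 \times \ldots \times \mathcal{M}G_n \to \llbracket \mathsf{G}_1 \rrbracket \,\widehat{\otimes}_{|\pi|} \ldots \widehat{\otimes}_{|\pi|}\, \llbracket \mathsf{G}_n \rrbracket$ on top; the product-measure map $\times : \mathcal{M}G_1 \times \ldots \times \mathcal{M}G_n \to \mathcal{M}(G_1 \times \ldots \times G_n)$ followed by $\mathcal{M}f_{op} : \mathcal{M}(G_1 \times \ldots \times G_n) \to \mathcal{M}X$ on the left; and the dashed arrow $\llbracket op \rrbracket$ from the tensor product to $\mathcal{M}X$ making the triangle commute.]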

For example the boolean operator or of type $(\mathsf{bool}, \mathsf{bool}, \mathsf{bool})$ would be interpreted, via the function $f_{or} : 2 \times 2 \to 2$ implementing the boolean join, as the linearisation of $\mathcal{M}f_{or} \circ \times$ (which is bilinear). Similarly, the operation normal of type $(\mathsf{real}, \mathsf{PosDef}(1), \mathsf{M}\,\mathsf{real})$ building a normal distribution would be interpreted, via the obvious function $f_{normal} : \mathbb{R} \times \mathbb{R}^+ \to \mathcal{M}\mathbb{R}$, as the linearisation of $\mathcal{M}f_{normal} \circ \times$. Note that if the inputs are deterministic, i.e. a tensor $\delta_\mu \otimes \delta_\sigma$ for a mean $\mu$ and a standard deviation $\sigma$ (as would usually be the case), then $\llbracket normal \rrbracket(\delta_\mu \otimes \delta_\sigma)$ outputs a Dirac delta over the distribution $\mathcal{N}(\mu, \sigma)$. Note how we interpret the deterministic construction of a distribution over $X$ differently from sampling an element of $X$ according to this distribution: the former is a distribution over distributions, the latter just a distribution.

3) Variables and assignments: A variable on its own acts like a variable declaration and introduces a context (see Fig. 1). Its semantics is simply given by the identity operator on the type of the variable, formally if $[i \mapsto T] \vdash x_i : T$ then $\llbracket x_i \rrbracket = \mathrm{Id}_{\llbracket T \rrbracket}$. In order to define the semantics of variable assignment we need the following result.

Theorem 17. The denotation of any type $T$ admits a strictly positive functional $\phi_{\llbracket T \rrbracket}$.⁵

⁵ As a consequence of this theorem, the semantics of all our types are Archimedean ordered vector spaces.

The strictly positive functional $\phi_{\llbracket T \rrbracket}$ can be thought of as a generalisation to all types of the functional on measures which consists in evaluating the mass of the whole space, i.e. of $ev_X : \mathcal{M}X \to \mathbb{R}, \mu \mapsto \mu(X)$. With this notion in place we can provide a semantics to assignments. Given a sequent $\Gamma \vdash e : T$, let $\phi_{\llbracket T \rrbracket}$ be the strictly positive functional on $\llbracket T \rrbracket$ constructed in Th. 17 and let us write $\llbracket \Gamma[i \mapsto T] \rrbracket$ as $\llbracket \Gamma_1 \rrbracket \,\widehat{\otimes}_{|\pi|}\, \llbracket T \rrbracket \,\widehat{\otimes}_{|\pi|}\, \llbracket \Gamma_2 \rrbracket$. We now define the multilinear map

$$\llbracket \Gamma_1 \rrbracket \times \llbracket T \rrbracket \times \llbracket \Gamma_2 \rrbracket \longrightarrow \llbracket T \rrbracket, \qquad (\gamma_1, t, \gamma_2) \mapsto \begin{cases} \phi_{\llbracket T \rrbracket}(t)\, \llbracket e \rrbracket(\gamma_1 \,\widehat{\otimes}_{|\pi|}\, t \,\widehat{\otimes}_{|\pi|}\, \gamma_2) & \text{if } \Gamma(i) = T \\ \phi_{\llbracket T \rrbracket}(t)\, \llbracket e \rrbracket(\gamma_1 \,\widehat{\otimes}_{|\pi|}\, \gamma_2) & \text{else} \end{cases}$$

This defines the unique linearizing operator⁶

$$\llbracket x_i := e \rrbracket : \llbracket \Gamma[i \mapsto T] \rrbracket \longrightarrow \llbracket T \rrbracket$$

⁶ In fact a nuclear operator [28].

In the case where the context $\Gamma$ is empty, the premise of the typing rule for assignments is interpreted as an operator $\llbracket e \rrbracket : \mathbb{R} \to \llbracket T \rrbracket$ and we can therefore strengthen the definition of $\llbracket x_i := e \rrbracket$ as follows:

$$\llbracket x_i := e \rrbracket : \llbracket T \rrbracket \to \llbracket T \rrbracket_{\llbracket e \rrbracket(1)}, \qquad t \mapsto \phi_{\llbracket T \rrbracket}(t)\, \llbracket e \rrbracket(1)$$

In the empty context, the general rule for variable assignment is a consequence of the rule creating Bayesian types since $(T,e) <: T$, i.e. there is no disagreement between the two rules.

As a simple example, it is easy to type-check the program x:=3.5 and see by unravelling the definition that it is interpreted as the operator $\mathcal{M}\mathbb{R} \to (\mathcal{M}\mathbb{R})_{\delta_{3.5}}, \mu \mapsto \mu(\mathbb{R})\,\delta_{3.5}$. In particular any probability distribution gets mapped to $\delta_{3.5}$. This is the semantics of assignment of [5].

4) Sequencing and sequential composition: These are conceptually straightforward as they essentially implement some form of function composition. The only difficulty resides in the bookkeeping of contexts which is a bit cumbersome. Given $\llbracket e_1 \rrbracket : \llbracket \Gamma \rrbracket \to \llbracket S \rrbracket$ and $\llbracket e_2 \rrbracket : \llbracket \Delta[i \mapsto S] \rrbracket \to \llbracket T \rrbracket$ we define $N = \sup(\mathrm{supp}(\Gamma) \cup \mathrm{supp}(\Delta))$, we assume w.l.o.g. that $i \notin \mathrm{supp}(\Gamma)$ and define the semantics of let $x_i = e_1$ in $e_2$ as the unique operator which linearizes the multilinear map

$$\prod_{i=1}^{N} \llbracket (\Gamma \oplus \Delta)(i) \rrbracket \to \llbracket T \rrbracket, \qquad (x_1, \ldots, x_N) \mapsto e_2\Big( \bigotimes_{i \in \mathrm{supp}(\Delta)} x_i,\ e_1\Big( \bigotimes_{i \in \mathrm{supp}(\Gamma)} x_i \Big) \Big)$$

The disjointness condition on the contexts $\Gamma$ and $\Delta$ implements the resource-awareness of the system: $e_1$ consumes the context $\Gamma$, so no part of it can be re-used in the computation $e_2$.

Sequential composition is just a generalisation of sequencing where $e_1$ outputs a store and the 'cut' can take place over more than one type. As a simple example consider the program x1 := 3.5 ; x2 := 7.3. Using the sequential composition rule we can derive the following type-checking proof

$$\dfrac{\dfrac{\emptyset \vdash 3.5 : \mathsf{real}}{[1 \mapsto \mathsf{real}] \vdash x_1 := 3.5 : (\mathsf{real}, 3.5)} \qquad \dfrac{\emptyset \vdash 7.3 : \mathsf{real}}{[2 \mapsto \mathsf{real}] \vdash x_2 := 7.3 : (\mathsf{real}, 7.3)}}{[1 \mapsto \mathsf{real}, 2 \mapsto \mathsf{real}] \vdash x_1 := 3.5 \,;\, x_2 := 7.3 : (\mathsf{real}, 3.5) \otimes (\mathsf{real}, 7.3)}$$

The semantics of the program is the operator defined by

$$\mathcal{M}\mathbb{R} \,\widehat{\otimes}_{|\pi|}\, \mathcal{M}\mathbb{R} \to (\mathcal{M}\mathbb{R})_{\delta_{3.5}} \,\widehat{\otimes}_{|\pi|}\, (\mathcal{M}\mathbb{R})_{\delta_{7.3}}, \qquad \mu \otimes \nu \mapsto \mu(\mathbb{R})\,\delta_{3.5} \otimes \nu(\mathbb{R})\,\delta_{7.3}$$

5) λ-abstraction and function application: These are interpreted exactly as expected in a monoidal closed category, namely via the adjunction $- \,\widehat{\otimes}_{|\pi|}\, \llbracket S \rrbracket \dashv [\llbracket S \rrbracket, -]$ and ordinary function application.

Remark 18. While the denotation of λ-abstraction is immediately given by the monoidal closed structure of RoBan, the following point is worth making. Assume a context of ground types only. In the system of [5], such a context is of the shape $\mathcal{M}(X_1 \times \ldots \times X_n)$, i.e. any joint distribution over the variables can be considered as an input to the program. If we were Cartesian closed, a context would be of the shape $\mathcal{M}X_1 \times \ldots \times \mathcal{M}X_n$, i.e. only product distributions would be considered as potential inputs to the program. Our semantics lies somewhere in between these two possibilities since a context is of the shape $\mathcal{M}X_1 \,\widehat{\otimes}_{|\pi|} \ldots \widehat{\otimes}_{|\pi|}\, \mathcal{M}X_n$. This means that not all joint probabilities can be λ-abstracted on, only those which live in the tensor product (i.e. limits of Cauchy sequences of linear combinations of product measures). Put differently, we can only λ-abstract if the probabilistic state of the machine is prepared (to use a quantum analogy) to a distribution in the positive projective tensor product.
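To make the operators of this subsection tangible, here is a minimal sketch restricted to finitely supported measures, where a measure is just a dictionary from points to masses. This restriction, and the helper names `total_mass`, `assign`, `tensor`, `pushforward` and `boolean_or`, are assumptions made for illustration; the paper works with general measurable spaces and the positive projective tensor product.

```python
from itertools import product
from typing import Callable, Dict, Hashable

# Assumption: a finitely supported measure, stored as point -> mass.
Measure = Dict[Hashable, float]


def total_mass(mu: Measure) -> float:
    """The evaluation functional mu |-> mu(X), strictly positive on measures."""
    return sum(mu.values())


def assign(c: Hashable) -> Callable[[Measure], Measure]:
    """Assignment of the constant c in the empty context: mu |-> mu(X) * delta_c."""
    return lambda mu: {c: total_mass(mu)}


def tensor(mu: Measure, nu: Measure) -> Measure:
    """Product measure on pairs, the image of (mu, nu) under the universal bilinear map."""
    return {(x, y): p * q for (x, p), (y, q) in product(mu.items(), nu.items())}


def pushforward(f: Callable, mu: Measure) -> Measure:
    """Pushforward of mu along f: mass of each point is moved to its image."""
    out: Measure = {}
    for x, p in mu.items():
        out[f(x)] = out.get(f(x), 0.0) + p
    return out


def boolean_or(mu: Measure, nu: Measure) -> Measure:
    """Built-in 'or' on a pure tensor: push the product measure along the boolean join."""
    return pushforward(lambda xy: xy[0] or xy[1], tensor(mu, nu))


prob = {0.0: 0.25, 1.0: 0.75}      # a probability measure
sub = {0.0: 0.5}                   # a sub-probability measure
assigned = assign(3.5)(prob)       # all mass moved to the point 3.5
assigned_sub = assign(3.5)(sub)    # lost mass stays lost
both = tensor(assign(3.5)(prob), assign(7.3)(sub))
joined = boolean_or({True: 0.5, False: 0.5}, {True: 0.25, False: 0.75})
```

In this finite setting a probability measure is sent to the Dirac mass at 3.5 and a sub-probability measure keeps its deficit, as in the semantics of assignment described above; the two-assignment program acts on pure tensors componentwise.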

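6) Conditionals and while loops: We provide the semantics of conditionals, the semantics of while loops then follows exactly as in [5]: by first writing the fixpoint equation in terms of conditionals and then solving it using the side condition that $\llbracket \Gamma \rrbracket$ is order-complete.

Given a boolean test $\Gamma \vdash e : \mathsf{bool}$ interpreted as an operator $\llbracket e \rrbracket : \llbracket \Gamma \rrbracket \to \mathcal{M}2$, the order-completeness of $\llbracket \Gamma \rrbracket$ allows us to define the maps

$$T_e : \llbracket \Gamma \rrbracket^+ \to \llbracket \Gamma \rrbracket^+, \qquad \gamma \mapsto \bigwedge \{ 0 \le \gamma' \le \gamma : \llbracket e \rrbracket(\gamma')(1) = \llbracket e \rrbracket(\gamma)(1) \}$$

$$F_e : \llbracket \Gamma \rrbracket^+ \to \llbracket \Gamma \rrbracket^+, \qquad \gamma \mapsto \bigwedge \{ 0 \le \gamma' \le \gamma : \llbracket e \rrbracket(\gamma')(0) = \llbracket e \rrbracket(\gamma)(0) \}$$

Proposition 19. The maps $T_e$ and $F_e$ are additive and linear over $\mathbb{R}^+$.

Since regular ordered Banach spaces have a generating cone, we can uniquely extend $T_e$ and $F_e$ to the entire space $\llbracket \Gamma \rrbracket$ and define the semantics of the conditional $\Gamma \vdash$ if $e_1$ then $e_2$ else $e_3 : T$ as the operator

$$\llbracket \Gamma \rrbracket \to \llbracket T \rrbracket, \qquad \gamma \mapsto \llbracket e_2 \rrbracket \circ T_{e_1}(\gamma) + \llbracket e_3 \rrbracket \circ F_{e_1}(\gamma)$$

To see why this definition makes sense we will briefly show that it recovers the semantics of [5]. In [5], $\llbracket \Gamma \rrbracket$ is a measure space $\mathcal{M}X$ and $\llbracket e_1 \rrbracket : \mathcal{M}X \to \mathcal{M}2$ is of the shape $\mathcal{M}b$ for a measurable map $b : X \to 2$ which specifies a measurable subset $B$ of $X$. We claim that $T_e : \mathcal{M}X \to \mathcal{M}X$ sends a probability measure $\mu$ to the measure $\mu_B$ defined by $\mu_B(A) = \mu(A \cap B)$, exactly as the operator $e_B$ of [5, 3.3.4]. By unravelling the definition we want to show that

$$\mu_B = \bigwedge \{ 0 \le \nu \le \mu : \nu(B) = \mu(B) \}$$

Note first that $\mu_B$ belongs to the set above, so it remains to show that it is its minimal element. Let $\nu$ also belong to this set, and let $A$ be a measurable set. We decompose $A$ as $A = (A \cap B) \uplus (A \cap B^c)$. By definition

$$\mu_B(A \cap B^c) = 0 \le \nu(A \cap B^c) \qquad \text{since } 0 \le \nu.$$

Moreover we have

$$\mu_B(A \cap B) = \mu(A \cap B) = \nu(A \cap B).$$

For if we had $\nu(A \cap B) < \mu(A \cap B)$, then in order to keep $\mu(B) = \nu(B)$ we would need $\nu(A^c \cap B) > \mu(A^c \cap B)$, a contradiction with $\nu \le \mu$. Thus $\mu_B \le \nu$ as claimed and the semantics of if $e_1$ then $e_2$ else $e_3 : T$ becomes the operator

$$\mu \mapsto \llbracket e_2 \rrbracket(\mu_B) + \llbracket e_3 \rrbracket(\mu_{B^c})$$

exactly as in [5].

7) sampler and sample: These are given a semantics which can be understood as generalisations of the unit and co-unit of the Giry monad [10] respectively. First we need the following easy result.

Theorem 20. The semantics of every measure type is isometrically and monotonically embedded in a space of measures $\mathcal{M}X$.

We can now define the semantics of sampler. Suppose we have $\llbracket e \rrbracket : \llbracket \Gamma \rrbracket \to \llbracket T \rrbracket$ and, by Th. 20, that $\llbracket T \rrbracket$ is isometrically and monotonically embedded in the space $\mathcal{M}X$. Now consider the map $\eta : X \to \mathcal{M}X, x \mapsto \delta_x$ (which is not an operator) and define the denotation of sampler(e) as the positive operator

$$\llbracket \mathtt{sampler}(e) \rrbracket : \llbracket \Gamma \rrbracket \to \mathcal{M}\llbracket T \rrbracket, \qquad \gamma \mapsto \mathcal{M}\eta(\gamma).$$

The semantics of sample works in the opposite direction. Suppose we have $\llbracket e \rrbracket : \llbracket \Gamma \rrbracket \to \mathcal{M}\llbracket T \rrbracket$ with $\llbracket T \rrbracket$ isometrically and monotonically embedded in $\mathcal{M}X$, then each element of $\mathcal{M}\llbracket T \rrbracket$ is also an element of $\mathcal{M}\mathcal{M}X$. We can define a map

$$m_X : \mathcal{M}\mathcal{M}X \to \mathcal{M}X, \qquad \rho \mapsto \lambda B. \int_{B^+(\mathcal{M}X)} ev_B(\mu)\, d\rho$$

where we recall that $B^+(\mathcal{M}X)$ is the positive unit ball of the space $\mathcal{M}X$. This map clearly defines a positive operator, and in the case where $\rho$ is supported by the set of probability distributions, i.e. the shell $\{ \mu \in (\mathcal{M}X)^+ : \|\mu\| = 1 \}$ of the positive unit ball, $m_X$ coincides with the multiplication of the Giry monad. We are now ready to define

$$\llbracket \mathtt{sample}(e) \rrbracket : \llbracket \Gamma \rrbracket \to \llbracket T \rrbracket, \qquad \gamma \mapsto m_X(\gamma).$$

We can now interpret the type of the small Gaussian inference program of §III-B3. In defining the semantics of built-in operations we saw that the semantics of $\emptyset \vdash$ normal(0, 1) $: \mathsf{M}\,\mathsf{real}$ is the linear map $\mathbb{R} \to \mathcal{M}\mathcal{M}\mathbb{R}$ mapping 1 to the Dirac delta over the normal distribution $\mathcal{N}(0,1)$. It follows that

$$\llbracket \mathtt{sample(normal(0, 1))} \rrbracket(1) = \mathcal{N}(0,1)$$

and by unravelling the definition we similarly find that

$$\llbracket \mathtt{sample(normal(sample(normal(0, 1)), 1))} \rrbracket(1) = \lambda B. \int_{x \in \mathbb{R}} \mathcal{N}(x,1)(B)\, d\mathcal{N}(0,1) = \mathcal{N}(0, \sqrt{2})$$

which is the pushforward of $\mathcal{N}(0,1)$ by the kernel $\lambda x. \mathcal{N}(x,1)$ described in (1). It follows that the output of the Gaussian inference program is interpreted as a linear operator

$$(\mathcal{M}\mathbb{R})_{\mathcal{N}(0,\sqrt{2})} \longrightarrow (\mathcal{M}\mathbb{R})_{\mathcal{N}(0,1)}.$$

We now turn to the semantics of the observe statement, which will show us what this operator actually is.

8) Semantics of observe: Assume that we have $\llbracket e \rrbracket : \llbracket (S,\mu) \rrbracket \to \llbracket T \rrbracket$ with $S, T$ of measure type, i.e. we can type $\llbracket e \rrbracket$ as a regular operator $(\mathcal{M}X)_\mu \to \mathcal{M}Y$. We now make the assumption, which we justify in Th. 23 below, that $\llbracket e \rrbracket$ is positive. Under this assumption if $\nu \le K\mu$, then $\llbracket e \rrbracket(\nu) \le K\llbracket e \rrbracket(\mu)$, i.e. $\llbracket e \rrbracket$ restricts to an operator

$$\llbracket e \rrbracket : (\mathcal{M}X)^{UB}_{\mu} \to (\mathcal{M}Y)^{UB}_{\llbracket e \rrbracket(\mu)}$$

where $(\mathcal{M}X)^{UB}_{\mu}$ is the set of measures uniformly bounded by a multiple of $\mu$ (see [20]). The semantics of observe(e) is then fundamentally contained in the Köthe dual operator

$$\llbracket e \rrbracket^{\sigma} : \left( (\mathcal{M}Y)^{UB}_{\llbracket e \rrbracket(\mu)} \right)^{\sigma} \to \left( (\mathcal{M}X)^{UB}_{\mu} \right)^{\sigma}.$$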
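Returning briefly to the conditional of §IV-B6, the formula µ ↦ ⟦e2⟧(µ_B) + ⟦e3⟧(µ_{B^c}) can be sketched on finitely supported measures. The following Python is an illustration only: measures are dictionaries from points to masses, the test is a predicate standing in for the measurable subset B, and `restrict`, `add` and `conditional` are hypothetical helper names.

```python
from typing import Callable, Dict, Hashable

# Assumption: a finitely supported measure, stored as point -> mass.
Measure = Dict[Hashable, float]
Operator = Callable[[Measure], Measure]


def restrict(mu: Measure, event: Callable[[Hashable], bool]) -> Measure:
    """mu_B: keep only the mass sitting on points where the event holds."""
    return {x: p for x, p in mu.items() if event(x)}


def add(mu: Measure, nu: Measure) -> Measure:
    """Pointwise sum of two measures."""
    out = dict(mu)
    for x, p in nu.items():
        out[x] = out.get(x, 0.0) + p
    return out


def conditional(test: Callable[[Hashable], bool],
                then_op: Operator, else_op: Operator) -> Operator:
    """mu |-> then_op(mu_B) + else_op(mu_{B^c}) for B = {x : test(x)}."""
    def run(mu: Measure) -> Measure:
        mu_b = restrict(mu, test)
        mu_bc = restrict(mu, lambda x: not test(x))
        return add(then_op(mu_b), else_op(mu_bc))
    return run


def reset_to_zero(mu: Measure) -> Measure:
    """Then-branch: assignment of the constant 0, mu |-> mu(X) * delta_0."""
    return {0: sum(mu.values())}


def identity(mu: Measure) -> Measure:
    """Else-branch: leave the measure unchanged."""
    return dict(mu)


program = conditional(lambda x: x % 2 == 0, reset_to_zero, identity)
uniform = {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25}
result = program(uniform)
```

On the uniform measure over {1, 2, 3, 4} the even half of the mass is collapsed onto 0 and the odd half is left in place, so the total mass is preserved.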

It is not hard to check by using the Riesz Representation and Functional Representations natural transformations (RR and FR in Diagram (3)) that $\left( (\mathcal{M}Y)^{UB}_{\nu} \right)^{\sigma} \simeq (L_p(Y,\nu))^{\sigma} \simeq (\mathcal{M}Y)_{\nu}$, and thus $\llbracket e \rrbracket^{\sigma}$ can, modulo these isomorphisms, be typed as an operator

$$\llbracket e \rrbracket^{\sigma} : (\mathcal{M}Y)_{\llbracket e \rrbracket(\mu)} \to (\mathcal{M}X)_{\mu} \tag{12}$$

which is what the typing rule for observe requires.

To illustrate how this semantics really implements the Bayesian inversion described in §II-A3, let us again consider our simple Gaussian inference program. The underlying Bayesian model is given by the probability kernel $\mathcal{N}(-,1) : \mathbb{R} \to \mathcal{M}\mathbb{R}$ and the prior $\mathcal{N}(0,1)$ on $\mathbb{R}$. Together these define a Krn-arrow $(\mathbb{R}, \mathcal{N}(0,1)) \to (\mathbb{R}, \mathcal{N}(0,\sqrt{2}))$ which is implemented by the program

  [x ↦ (real, normal(0, 1))] ⊢ sample(normal(x, 1)) : real

whose denotation is the positive operator

$$\mathcal{M}_{-}(\mathcal{N}(-,1)) : (\mathcal{M}\mathbb{R})_{\mathcal{N}(0,1)} \to \mathcal{M}\mathbb{R}.$$

Using the same argument as above, we can restrict this operator as follows

$$\mathcal{M}_{-}(\mathcal{N}(-,1)) : (\mathcal{M}\mathbb{R})^{UB}_{\mathcal{N}(0,1)} \to (\mathcal{M}\mathbb{R})^{UB}_{\mathcal{N}(0,\sqrt{2})}.$$

As stated above, all the information about the semantics of observe(sample(normal(x, 1))) is contained in the Köthe dual of this operator, which, through the Riesz Representation and Functional Representations natural transformations, can be typed modulo isomorphism as

$$(\mathcal{M}_{-}(\mathcal{N}(-,1)))^{\sigma} : (\mathcal{M}\mathbb{R})_{\mathcal{N}(0,\sqrt{2})} \to (\mathcal{M}\mathbb{R})_{\mathcal{N}(0,1)}.$$

Using the other half of diagram (3), that is to say the Radon-Nikodym and Measure Representation natural transformations (RN and MR in diagram (3)), this operator is equal, modulo isomorphism, to the operator

$$\mathcal{M}_{-}(\mathcal{N}(-,1)^{\dagger}) : (\mathcal{M}\mathbb{R})_{\mathcal{N}(0,\sqrt{2})} \to (\mathcal{M}\mathbb{R})_{\mathcal{N}(0,1)}.$$

Here the Bayesian inverse of our original probability kernel appears explicitly, showing that our semantics indeed captures the notion of Bayesian inverse.

There is one final subtlety which we need to account for. Given $\llbracket e \rrbracket : \llbracket (S,\mu) \rrbracket \to \llbracket T \rrbracket$, the typing rule for observe in fact makes the whole semantics described above parametric in a choice of measure absolutely continuous w.r.t. the prior $\mu$ (see Fig. 1). This is a simple technicality: morally and practically the parameter will always be set to the prior itself, in which case we get as output of the program the Köthe dual described by (12). Mathematically however, we can choose as input any $\nu \ll \mu$; the output operator is then defined by the following tortuous journey (similar to constructions in [20])

$$(\mathcal{M}Y)_{\llbracket e \rrbracket(\mu)} \xrightarrow{\ \simeq\ } \left( (\mathcal{M}Y)^{UB}_{\llbracket e \rrbracket(\mu)} \right)^{\sigma} \xrightarrow{\ i^{\sigma}\ } \left( (\mathcal{M}Y)^{UB}_{\llbracket e \rrbracket(\nu)} \right)^{\sigma} \xrightarrow{\ \llbracket e \rrbracket^{\sigma}\ } \left( (\mathcal{M}X)^{UB}_{\nu} \right)^{\sigma} \xrightarrow{\ \simeq\ } (\mathcal{M}X)_{\nu} \xrightarrow{\ j\ } (\mathcal{M}X)_{\mu}$$

where $i, j$ are the obvious inclusions.

Remark 21. The semantics of observe via the Köthe dual is more general than a semantics in terms of Bayesian inversion/disintegration. Nothing prevents the introduction of ground types which stand for measurable spaces in which disintegrations do not exist. However, the Köthe dual will still exist. Thus our semantics is free of some of the 'pointful' technicalities surrounding the existence of disintegrations, and follows the 'pointless' perspective advocated in [13]. Similarly, we do not have to worry about the ambiguity caused by the fact that disintegrations are only defined up to a null set: the Köthe dual of an operator between regular ordered Banach spaces exists completely unambiguously.

C. Some properties of the semantics

The development of the semantics in the previous section has built-in soundness:

Theorem 22. The semantics is sound w.r.t. the typing rules of Fig. 1.

More importantly, we can extend [5, Th. 3.3.8] by a straightforward induction and show that:

Theorem 23. The semantics of any program is a positive operator of norm $\le 1$.

However another result of [5], namely that the denotation of a program is entirely determined by its action on point masses [5, Th. 6.1], does not hold any more. The reason is interesting and is worth a few words. It is immediate from the type system (Fig. 1) and the denotation of Bayesian types that the domain of the semantics of an observe statement may not contain any point masses at all. For example in the case of the Gaussian inference program of §III-B3, this domain is $(\mathcal{M}\mathbb{R})_{\mathcal{N}(0,1)}$ which contains no point masses at all.

D. Comparison with semantics à la Scott.

There are interesting parallels to be drawn between our semantics and the Scott-Strachey semantics in terms of domains. Looking at ground types first, it is worth noting that just like the flat domain functor turns a set (of integers for example) into a valid semantic object (a domain), so the functor $\mathcal{M}$ turns a measurable set into a valid semantic object (a regular ordered Banach space). Similarly, just like the flat domain functor can turn a partial map between sets into a total map between domains, so the functor $\mathcal{M}$ turns a partial measurable map into a linear operator. Partiality is encoded by the presence of the bottom element in the case of domains, and by the possibility to lose mass (i.e. get subdistributions) in the case of spaces of measures.

We do not know yet if the semantics of every program in our language is σ-order continuous, which would be the equivalent of Scott-continuous in our setting. The fundamental difference however is that our semantic category is not Cartesian closed, but monoidal closed.
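As a closing illustration of the Gaussian inference example of §IV-B8, the Bayesian inverse can be checked numerically at the level of densities. The sketch below assumes the standard closed form for this conjugate model, which is not stated explicitly in the text: with prior N(0, 1) and kernel N(x, 1), the marginal is N(0, √2) and the inverse kernel sends y to N(y/2, 1/√2), with the second parameter read as a standard deviation as elsewhere in the paper. The function names are hypothetical.

```python
import math


def normal_pdf(x: float, mean: float, std: float) -> float:
    """Density of the normal distribution with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def joint_forward(x: float, y: float) -> float:
    """Joint density built from the prior N(0, 1) and the kernel N(x, 1)."""
    return normal_pdf(x, 0.0, 1.0) * normal_pdf(y, x, 1.0)


def joint_backward(x: float, y: float) -> float:
    """Joint density built from the marginal N(0, sqrt 2) and the assumed inverse kernel."""
    marginal = normal_pdf(y, 0.0, math.sqrt(2.0))
    inverse_kernel = normal_pdf(x, y / 2.0, 1.0 / math.sqrt(2.0))
    return marginal * inverse_kernel


def max_gap(points) -> float:
    """Largest disagreement between the two factorisations on a grid of test points."""
    return max(abs(joint_forward(x, y) - joint_backward(x, y))
               for x in points for y in points)


gap = max_gap([-2.0, -0.5, 0.0, 1.0, 3.0])
```

Agreement of the two factorisations of the joint density is exactly the statement that the second kernel is a Bayesian inverse of the first with respect to the prior; the gap on the test grid is at the level of floating-point rounding.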

ACKNOWLEDGMENT

The authors would like to thank Ilias Garnier for bringing [16] to their attention.

REFERENCES

[1] C. Heunen, O. Kammar, S. Staton, and H. Yang, “A convenient category for higher-order probability theory,” in LICS. IEEE, 2017, pp. 1–12.
[2] A. Ścibior, O. Kammar, M. Vákár, S. Staton, H. Yang, Y. Cai, K. Ostermann, S. K. Moss, C. Heunen, and Z. Ghahramani, “Denotational validation of higher-order bayesian inference,” Proceedings of the ACM on Programming Languages, vol. 2, no. POPL, p. 60, 2017.
[3] S. Staton, “Commutative semantics for probabilistic programming,” in ESOP. Springer, 2017, pp. 855–879.
[4] T. Ehrhard, M. Pagani, and C. Tasson, “Measurable cones and stable, measurable functions: a model for probabilistic higher-order programming,” Proceedings of the ACM on Programming Languages, vol. 2, no. POPL, p. 59, 2017.
[5] D. Kozen, “Semantics of probabilistic programs,” J. Comput. Syst. Sci., vol. 22, no. 3, pp. 328–350, June 1981.
[6] D. Kozen, “A probabilistic PDL,” J. Comput. Syst. Sci., vol. 30, no. 2, pp. 162–178, April 1985.
[7] C. Aliprantis and K. Border, Infinite dimensional analysis. Springer, 1999, vol. 32006.
[8] T. Eisner, B. Farkas, M. Haase, and R. Nagel, Operator theoretic aspects of ergodic theory. Springer, 2015, vol. 272.
[9] B. Hayes, “Computing science: Randomness as a resource,” American Scientist, vol. 89, no. 4, pp. 300–304, 2001.
[10] M. Giry, “A categorical approach to probability theory,” in Categorical aspects of topology and analysis. Springer, 1982, pp. 68–85.
[11] J. T. Chang and D. Pollard, “Conditioning as disintegration,” Statistica Neerlandica, vol. 51, no. 3, pp. 287–317, 1997.
[12] A. S. Kechris, Classical descriptive set theory, ser. Graduate Text in Mathematics. Springer, 1995, vol. 156.
[13] F. Clerc, F. Dahlqvist, V. Danos, and I. Garnier, “Pointless learning,” in Foundations of Software Science and Computation Structures - 20th International Conference, FOSSACS 2017. Proceedings, 2017.
[14] F. Dahlqvist, V. Danos, I. Garnier, and A. Silva, “Borel kernels and their Approximations, Categorically,” in Mathematical Foundations of Programming Semantics (MFPS), 2018.
[15] Y.-C. Wong and K.-F. Ng, Partially ordered topological vector spaces. Oxford University Press, 1973.
[16] K. C. Min, “An exponential law for regular ordered banach spaces,” Cahiers de Topologie et Géométrie Différentielle Catégoriques, vol. 24, no. 3, pp. 279–298, 1983.
[17] A. C. Zaanen, Introduction to operator theory in Riesz spaces. Springer Science & Business Media, 2012.
[18] C. D. Aliprantis and O. Burkinshaw, Positive operators. Springer Science & Business Media, 2006, vol. 119.
[19] J. Dieudonné, “Sur les espaces de Köthe,” Jour. d'Analyse Math., vol. 1, no. 1, pp. 81–115, 1951.
[20] P. Chaput, V. Danos, P. Panangaden, and G. Plotkin, “Approximating Markov Processes by averaging,” Journal of the ACM, vol. 61, no. 1, Jan. 2014.
[21] P. Selinger and B. Valiron, “On a fully abstract model for a quantum linear functional language,” Electronic Notes in Theoretical Computer Science, vol. 210, pp. 123–137, 2008.
[22] A. Grothendieck, Produits tensoriels topologiques et espaces nucléaires. American Mathematical Soc., 1955, vol. 16.
[23] R. A. Ryan, Introduction to tensor products of Banach spaces. Springer Science & Business Media, 2013.
[24] D. Fremlin, “Tensor products of archimedean vector lattices,” American Journal of Mathematics, vol. 94, no. 3, pp. 777–798, 1972.
[25] D. Fremlin, “Tensor products of banach lattices,” Mathematische Annalen, vol. 211, no. 2, pp. 87–106, 1974.
[26] G. Wittstock, “Ordered normed tensor products,” in Foundations of quantum mechanics and ordered linear spaces. Springer, 1974, pp. 67–84.
[27] D. Licata and E. Finster, “Eilenberg-maclane spaces in homotopy type theory,” in LICS. ACM, 2014, p. 66.
[28] S. Abramsky, R. Blute, and P. Panangaden, “Nuclear and trace ideals in tensored ⋆-categories,” Journal of Pure and Applied Algebra, vol. 143, no. 1-3, pp. 3–47, 1999.
[29] E. Davies, “The structure and ideal theory of the predual of a Banach lattice,” Transactions of the American Mathematical Society, vol. 131, no. 2, pp. 544–555, 1968.
[30] R. Dudley, B. Bollobas, and W. Fulton, Real Analysis and Probability, ser. Cambridge Studies in Advanced Mathematics. Cambridge University Press, 2002.

APPENDIX

E. Proofs

Proof of Proposition 1. By the triangle inequality it is enough to reason about positive operators. The proof is by contradiction. Suppose that $f : U \to V$ is positive but not bounded, then we can find a sequence $\{x_n\}$ in $U$ with $\|x_n\| = 1$ such that $\|f(x_n)\| \ge n^3$. By the axioms R1 and R2 and the positivity of $f$ we can assume w.l.o.g. that $x_n \ge 0$. Since $\sum_{n=1}^{\infty} \left\| \frac{x_n}{n^2} \right\|$ converges, we have by completeness of Banach spaces that $\sum_{n=1}^{\infty} \frac{x_n}{n^2}$ converges to an element $x$ of $U$. Since every $x_n \ge 0$, it follows that $0 \le \frac{x_n}{n^2} \le x$. Since positive operators are automatically monotone it follows from axiom R1 that:

$$n = \frac{n^3}{n^2} \le \left\| f\!\left( \frac{x_n}{n^2} \right) \right\| \le \| f(x) \|$$

for every $n \in \mathbb{N}$, a contradiction. Thus $f$ is (norm-) bounded and therefore continuous.

Sketch of the proof of Theorem 8. The proof of [17, Th. 20.2, Cor. 20.3] also holds when the domain space is a regular ordered space because the only property of the domain being used is that the positive cone is generating. Thus the set $V^{\sim}$ of regular functionals on $V$ forms an order-complete Riesz space. Similarly, the proof of [18, Th. 4.74] holds when the domain space is a regular ordered space. Thus $V^{\sim}$ forms a Banach space as well. From [17, Th. 22.2] we know that the set $V^{\sigma}$ is a band in $V^{\sim}$. Now assume that $v_n$ is a Cauchy sequence in $V^{\sigma}$. We know that it converges to a $v \in V^{\sim}$. By [17, Th. 15.6] every norm convergent sequence in a Banach lattice has a subsequence converging in order. Let $u_n$ be this sequence, i.e. $u_n$ converges in order to $v$. Since $V^{\sim}$ is order complete we can further assume by the lim inf construction that $u_n$ is increasing, i.e. $u_n \uparrow v$. But since $V^{\sigma}$ is a band, it must follow that $v \in V^{\sigma}$ as desired.

Proof of Theorem 10. By Th. 25 $(\mathcal{M}X)_\mu \simeq L_1(X,\mu)$ and $(\mathcal{M}Y)_\nu \simeq L_1(Y,\nu)$. For each $u \in L_1(X,\mu) \,\widehat{\otimes}_{\pi}\, L_1(Y,\nu)$, let $f_u \in L_1(X, L_1(Y,\nu), \mu)$ be the associated Bochner µ-integrable function given by Th. 27. This map in turn defines $\phi_u : X \times Y \to \mathbb{R}$ by $\phi_u(x,y) = f_u(x)(y)$. It is easy to see that $\phi_u \in L_1(X \times Y, \mu \times \nu)$:

$$\int_{X \times Y} \phi_u(x,y)\, d(\mu \times \nu) = \int_X \int_Y f_u(x)(y)\, d\nu\, d\mu$$

By Bochner's integrability theorem [7, 11.44] $f_u$ is Bochner µ-integrable iff $\|f_u\|_1$ is Lebesgue integrable which shows that the integral above is finite. The fact that the operation $u \mapsto f_u \mapsto \phi_u$ is an isometry is obvious, and it has an inverse which associates to $\phi \in L_1(X \times Y, \mu \times \nu)$ the Bochner µ-integrable function $f_\phi : X \to L_1(Y,\nu), x \mapsto \phi(x, \cdot)$.

Proof of Theorem 11. By Th. 28 $\mathcal{M}X \,\widehat{\otimes}_{\pi}\, \mathcal{M}Y$ can be embedded in $\mathcal{M}(X, \mathcal{M}Y)$. This allows us to define an embedding $J : \mathcal{M}X \,\widehat{\otimes}_{\pi}\, \mathcal{M}Y \to \mathcal{M}(X \times Y)$ as follows: for each $u \in \mathcal{M}X \,\widehat{\otimes}_{\pi}\, \mathcal{M}Y$, let $\mu_u$ denote the corresponding vector-valued measure in $\mathcal{M}(X, \mathcal{M}Y)$, we can then define a measure on $X \times Y$ by putting for any rectangle $A \times B$ of $X \times Y$

$$J(\mu_u)(A \times B) := \mu_u(A)(B)$$

It is easy to check that $\|J\| = 1$.

Proof of Theorem 15. By induction on the structure of the type. For ground types, it follows from the fact that their denotations are spaces of the shape $\mathcal{M}X$ which is always a Banach lattice (see §II-B3). The same argument holds in the inductive case if the outermost constructor is $\mathsf{M}$. For tensor product it follows from the fact that the positive projective tensor product of two Banach lattices is a Banach lattice [25]. Finally, any band in a Banach lattice is a Banach lattice (see §II-B4).

Proof of Theorem 16. By induction on the structure of the type. We start with the first layer of the grammar (9). Ground types are interpreted as spaces of the shape $\mathcal{M}X$ which are AL spaces, and thus order-complete by Th. 6. Bayesian types $(G,\mu)$ built from ground types are of the shape $(\mathcal{M}X)_\mu$, and thus isomorphic to an $L_1$-space as was shown in Ex. 7. These are AL spaces, and are thus order-complete. The case of positive projective tensors of the shape $(G,\mu) \otimes (G,\mu)$ then follows from Th. 10. Finally, for the outer layer of the syntax, it is clear that whatever $S$ is, since $\llbracket \mathsf{M}S \rrbracket = \mathcal{M}\llbracket S \rrbracket$, we get an order-complete space. The case of internal homs follows from the well-known result from the theory of Riesz spaces which states that if $V$ is an order-complete space, so is $[U,V]$ (for a net $\{T_\alpha\}$ in $[U,V]$, the join is defined at each $u \in U^+$ as $\bigvee_\alpha T_\alpha(u)$ by order-completeness of $V$, this map is linear and extends to an operator $U \to V$ by the fact that regular ordered spaces have a generating cone, for more details see the proof of [18, Th. 1.18]).

Proof of Theorem 17. By induction on the structure of the types. For the base case note that all ground types are interpreted as spaces of the shape $\mathcal{M}G$, on which the evaluation function $ev_G : \mathcal{M}G \to \mathbb{R}, \mu \mapsto \mu(G)$ is defined and is a strictly positive functional. The same clearly holds for types of the shape $\mathsf{M}T$. Suppose now that $\llbracket T \rrbracket$ has a strictly positive functional, then for any $\emptyset \vdash \mu : T$, since $\llbracket (T,\mu) \rrbracket$ is a subspace of $\llbracket T \rrbracket$ it clearly inherits this strictly positive functional. Now assume both $\llbracket S \rrbracket$ and $\llbracket T \rrbracket$ have strictly positive functionals $\phi$ and $\psi$ respectively. Then the map $\llbracket S \rrbracket \times \llbracket T \rrbracket \to \mathbb{R}$ defined by $(s,t) \mapsto \phi(s)\psi(t)$ is bilinear and strictly positive. It follows that there exists a strictly positive functional $\phi \,\widehat{\otimes}_{|\pi|}\, \psi$ on $\llbracket S \rrbracket \,\widehat{\otimes}_{|\pi|}\, \llbracket T \rrbracket = \llbracket S \otimes T \rrbracket$. For internal homs we proceed as follows: since $[\llbracket S \rrbracket, \llbracket T \rrbracket]$ is generated by positive operators, let us first consider a strictly positive operator $T$ and define

$$\chi(T) = \sup \{ \psi(T(s)) \mid s \in \llbracket S \rrbracket^+ \setminus \{0\},\ \phi(s) \le 1 \}$$

(note how this definition mimics the definition of the operator norm). Since $\phi$ is strictly positive, the set over which the supremum is taken is not empty, and since $T$ and $\psi$ are strictly positive, $0 < \chi(T) < \infty$ [29, p. 545]. Moreover, since the supremum of a set of sums equals the sum of the suprema, $\chi(T + S) = \chi(T) + \chi(S)$, and similarly since scalar multiplication distributes over suprema $\chi(\lambda T) = \lambda \chi(T)$, it follows that $\chi$ is linear on strictly positive operators. We can then extend $\chi$ to all regular operators by putting $\chi(T) := \chi(T^+) - \chi(T^-)$.

Proof of Proposition 19. To see that it is additive, note first that since $\llbracket \Gamma \rrbracket$ is order-complete it is in particular a Riesz space, and it therefore has the Riesz decomposition property [7, 8.9], which means that if $0 \le z \le x + y$ there exist $0 \le z_1, z_2$ such that $z_1 + z_2 = z$ and $z_1 \le x$, $z_2 \le y$. From this and the linearity of $\llbracket e \rrbracket$ it now follows easily that $T_e$ and $F_e$ are additive. The linearity poses no problem.

Proof of Theorem 20. By induction on the structure of the type, with the base case being tautological. Similarly, the case of Bayesian types and types of the shape $\mathsf{M}T$ are also tautological. The case of tensor product types follows from Th. 11 and Th. 12 (the monotonicity of the embedding is obvious).

Proof of Theorem 23. The proof is by induction on the derivation tree of the program (see Fig. 1). The result holds trivially for constants, variables and subtyping. For built-in operations it follows from the fact that the pushforward operation has norm 1. For assignments, one can easily show that the strictly positive functional built in Th. 17 and used in defining the semantics has norm 1 and assignment therefore also has norm 1. The case of sequencing and sequential composition follows from the fact that the composition of operators of norm $\le 1$ has norm $\le 1$. For λ-abstraction, the result follows from the fact that the universal bilinear map has norm 1, and the result is trivial for function application. For conditionals it follows from the definition of $T_e$ and $F_e$ and the fact that the norm is monotone in a regular ordered Banach space. For while loops the result follows from [5]. Finally, for the observe statement the result follows from the fact that the dual of an operator of norm $\le 1$ has norm $\le 1$.

F. Background material on measure theory and Lebesgue integration.

containing the open subsets of $\mathbb{R}$ with the usual topology. In general, the Borel sets of a topological space are the smallest σ-algebra containing the open sets.

A signed measure on a σ-algebra $\mathcal{F}$ over $X$ is a map $\mu : \mathcal{F} \to [-\infty, +\infty]$ associating to every event $B$ a “weight” $\mu(B)$ satisfying (i) $\mu(\emptyset) = 0$; (ii) $\mu$ can assume at most one of the values $-\infty$ and $+\infty$; and (iii) $\mu$ is σ-additive, that is, $\mu(\bigcup_{i=1}^{\infty} A_i) = \sum_{i=1}^{\infty} \mu(A_i)$ for any countable pairwise disjoint collection $(A_i)_{i \in \mathbb{N}}$ of measurable sets. A signed measure is called a measure if it assumes values in $[0, \infty]$, a finite signed measure if it assumes values in $(-\infty, \infty)$ (equivalently, if $|\mu|(X) < \infty$), and a probability measure if it is a measure and $\mu(X) = 1$.

The study of signed measures can largely be reduced to the study of measures as the following result shows.

Theorem 24 (Hahn-Jordan decomposition [30, §5.6.1]). Every signed measure $\mu$ has a unique decomposition as a difference $\mu = \mu^+ - \mu^-$ of two measures, at least one of which is finite.

The total variation measure of a signed measure is defined as

$$|\mu|(A) = \sup \left\{ \sum_{n=1}^{\infty} |\mu(A_n)| : \{A_n\} \subseteq \mathcal{F} \text{ a partition of } A \right\}$$

and the total variation of $\mu$ is then defined as $\|\mu\| = |\mu|(X)$. If $\mu$ is a measure, then $\|\mu\| = \mu(X)$. From Th. 24 we have

$$\|\mu\| = \sup_{A \in \mathcal{F}} |\mu(A)| + |\mu(X \setminus A)|$$

The last measure-theoretical definition we need is the following: if $\mu, \nu$ are measures on a σ-algebra $\mathcal{F}$, $\mu$ is said to be absolutely continuous w.r.t. $\nu$, notation $\mu \ll \nu$, if $\mu(B) = 0$ whenever $|\nu|(B) = 0$.

A map $f : (X, \mathcal{F}) \to (Y, \mathcal{G})$ between measurable spaces is called measurable if for every measurable subset $U \in \mathcal{G}$, $f^{-1}(U) \in \mathcal{F}$. Given such a measurable map and a measure $\mu$ on $X$, one defines the pushforward measure $f_*(\mu)$ on $(Y, \mathcal{G})$ by $f_*(\mu)(B) = \mu(f^{-1}(B))$.

2) Lebesgue integration [30, Ch. 4]: Given a measurable space $(X, \mathcal{F})$, a simple function is any real-valued function $f : X \to \mathbb{R}$ expressible as a sum of the form
→ 1) Measures: A measurable space is a set X equipped with n

½ (13) a collection of subsets—called the measurable subsets— f = αi Bi F i=1 which can intuitively be understood as the observable parts of X

R ½ X, and are therefore also referred to as events in the prob- where αi , Bi , and Bi is the characteristic function ∈ ∈F abilistic literature. The collection of measurable sets must of Bi. A simple function can be equivalently expressed in F contain the empty set and be closed under complementation many ways, depending on the choice of n and the αi and and countable union. It follows from the de Morgan laws that Bi. Given a signed measure µ on , the integral of a simple F is also closed under countable intersection. Any collection function (13) w.r.t. to µ is given by F of subsets satisfying these closure properties is called a σ- n 7 algebra. A σ-algebra is thus an ω-complete Boolean algebra f dµ = αiµ(Bi) (14) of sets. An important example of a measurable space is the i=1 Z X set R together with its Borel σ-algebra, the smallest σ-algebra It can be shown that (14) is independent of the specific representation (13). Note that (14) is linear as a function of 7The “σ” in σ-algebra refers to “countable unions” in the same way as . It thus follows from Theorem 24 that integration w.r.t. to Fσ sets are countable unions of closed sets in descriptive set theory, with σ µ standing for the German Summe, union. signed measures is uniquely determined by integration w.r.t.

to measures. Similarly, (14) is linear and positive w.r.t. simple functions (positive simple functions have positive integrals). Since any simple function can be expressed as the difference of two positive simple functions, it follows that integrating simple maps w.r.t. signed measures is completely determined by integrating positive simple maps w.r.t. measures. In this spirit, given a positive measurable function $f : X \to \mathbb{R}^{+}$, we define
\[
\int f \, d\mu = \sup \left\{ \int g \, d\mu \;\middle|\; 0 \le g \le f,\ g \text{ simple} \right\}
\]
For a general function $f : X \to \mathbb{R}$, we define $f^{+}(x) = \max(f(x), 0)$ and $f^{-}(x) = -\min(f(x), 0)$. Clearly $f^{+}, f^{-}$ are positive and $f = f^{+} - f^{-}$. We now define the integral of $f$ w.r.t. a signed measure $\mu$ as
\[
\int f \, d\mu = \int f^{+} \, d\mu - \int f^{-} \, d\mu
= \int f^{+} \, d\mu^{+} - \int f^{+} \, d\mu^{-} - \int f^{-} \, d\mu^{+} + \int f^{-} \, d\mu^{-}
\]
We shall return to the order-theoretic property of integration in §II-B. We conclude this summary of Lebesgue integration with one of the most important theorems in measure theory.

Theorem 25 (Radon-Nikodym [30, Th 5.5.4]). Let $(X, \mathcal{F})$ be a measurable space, $\mu$ a finite measure$^{8}$ on $X$, and $\nu$ a finite signed measure on $\mathcal{F}$ such that $\nu \ll \mu$. Then there exists a measurable function $f : X \to \mathbb{R}$, unique up to a $\mu$-nullset, such that
\[
\nu(B) = \int \mathbf{1}_B \cdot f \, d\mu = \int_B f \, d\mu
\]
The function $f$ is called the Radon-Nikodym derivative of $\nu$ w.r.t. $\mu$ and is denoted $\frac{d\nu}{d\mu}$. It was shown in [14] that the Radon-Nikodym derivative defines a natural transformation.

$^{8}$ The result holds more generally for any dominating $\sigma$-finite measure. We will only need finite signed measures in this paper.

G. Supplementary material on projective tensor products

1) Definition of a Banach space: We only consider real vector spaces in this paper. We therefore simply say 'vector space' with the understanding that the scalar field is $\mathbb{R}$. A Banach space is a vector space $V$ equipped with a norm $\|\cdot\| : V \to [0, \infty)$ such that $V$ is complete for the metric induced by the norm, i.e. such that every Cauchy sequence in $V$ has a limit in $V$. The vector spaces $\mathbb{R}^n$ equipped with the usual Euclidean norm are Banach spaces.

2) Bochner integration: One can generalise the ideas behind the Lebesgue integral to give a definition of the integral of a function taking its values in an arbitrary Banach space $V$. As in the case of the Lebesgue integral, we start with simple functions. Given a measurable space $(X, \mathcal{F})$, we generalise (13) and say that a function $f : X \to V$ is simple if it is expressible as a sum
\[
f = \sum_{i=1}^{n} v_i \mathbf{1}_{B_i}
\]
where $v_i \in V$, $B_i \in \mathcal{F}$. We define the Bochner integral of this simple function as
\[
\int f \, d\mu = \sum_{i=1}^{n} v_i \mu(B_i)
\]
that is to say the weighted sum of the vectors $v_i$ with the weights $\mu(B_i)$. We will always assume that a Banach space comes equipped with its Borel $\sigma$-algebra. Extending Bochner integration to arbitrary measurable functions $X \to V$ is impossible in general since $V$, unlike $\mathbb{R}$, may have a very high dimension. Intuitively, it is in general not possible to approximate a function $f : X \to V$ with simple ones if the dimensionality of $V$ is too high. We thus restrict the definition of the Bochner integral to a limited class of measurable functions taking values in a separable subspace. Formally, a function $f : X \to V$ is $\mu$-essentially separably valued if there exist $E \in \mathcal{F}$ and a separable subspace $Y \subseteq V$ such that $\mu(E^{c}) = 0$ and $f(E) \subseteq Y$. We can now state:

Theorem 26 (Pettis Measurability Theorem [23, Prop 2.15]). Let $(X, \mathcal{F})$ be a measurable space, let $\mu$ be a finite measure on $\mathcal{F}$, and let $f : X \to V$ be a function taking values in a Banach space. Then $f$ is Borel measurable and $\mu$-essentially separably valued iff there exists a sequence $(f_n)$ of simple functions converging $\mu$-a.e. to $f$, in which case $f$ is said to be $\mu$-measurable.

A $\mu$-measurable function $f : X \to V$ will be called Bochner $\mu$-integrable if the sequence of simple functions $(f_n)$ converging $\mu$-a.e. to $f$ also satisfies
\[
\lim_{n \to \infty} \int \| f - f_n \| \, d\mu = 0
\]
where the integral is the ordinary Lebesgue integral. The Bochner integral of $f$ is then defined as the vector
\[
\int f \, d\mu = \lim_{n \to \infty} \int f_n \, d\mu
\]
Following the example of Lebesgue integration, we define the Lebesgue-Bochner space $L_1(X, V, \mu)$ as the Banach space of (equivalence classes of) Bochner $\mu$-integrable functions $f : X \to V$ equipped with the norm
\[
\| f \|_1 = \int \| f \|_V \, d\mu
\]

Theorem 27 ([23] Ex. 2.19). $L_1(X, \mu) \mathbin{\widehat{\otimes}_\pi} V \simeq L_1(X, V, \mu)$, the space of Bochner $\mu$-integrable maps $f : X \to V$.

3) Vector-valued measures: One can also characterise projective tensor products of the shape $\mathcal{M}(X) \mathbin{\widehat{\otimes}_\pi} V$. For this we need a couple of definitions which generalise the notion of measure to Banach spaces. Given a measurable space $(X, \mathcal{F})$ and a Banach space $V$, a $V$-valued vector measure on $\mathcal{F}$ is a $\sigma$-additive map $\mu : \mathcal{F} \to V$, where $\sigma$-additivity means that if $(A_i)$ is a countable sequence of pairwise disjoint measurable sets, then the series $\sum_i \mu(A_i)$ converges to $\mu(\bigcup_i A_i)$. The variation norm of a $V$-valued measure is defined exactly like in the scalar-valued case:
\[
\| \mu \|_1 = \sup \left\{ \sum_{A \in \mathcal{B}} \| \mu(A) \| \;\middle|\; \mathcal{B} \text{ is a measurable partition of } X \right\}
\]
where the norm $\| \mu(A) \|$ is of course taken in $V$.

Vector-valued measures and Bochner integration are related by a Radon-Nikodym type construction which can briefly be described as follows. Given a scalar measure $\lambda \in \mathcal{M}(X, \mathcal{F})$ and a Bochner $\lambda$-integrable function $f : X \to V$, we can use $f$ as a 'density' and define a $V$-valued measure $\mu$ in the obvious way by
\[
\mu(A) = \int_A f \, d\lambda. \tag{15}
\]
The notion of absolute continuity for vector-valued measures is defined as follows: a $V$-valued measure $\mu$ on $(X, \mathcal{F})$ is absolutely continuous w.r.t. a scalar measure $\lambda \in \mathcal{M}(X, \mathcal{F})$ if $\lambda(E) = 0$ implies $\mu(E) = 0$.$^{9}$ It is now natural to ask whether the Radon-Nikodym Theorem generalises to vector-valued measures and Bochner integrable densities. The answer is usually negative (see [23, Ex. 5.13] for an example), and we therefore start from (15) to justify the following definition: a $V$-valued measure $\mu$ on $(X, \mathcal{F})$ has the Radon-Nikodym property if it has bounded variation and if for every finite scalar measure $\lambda \in \mathcal{M}(X, \mathcal{F})$ there exists a Bochner $\lambda$-integrable function $f : X \to V$ such that (15) holds. We can now state a characterisation of tensor products of the shape $\mathcal{M}X \mathbin{\widehat{\otimes}_\pi} V$.

Theorem 28 ([23] Th. 5.22). $\mathcal{M}X \mathbin{\widehat{\otimes}_\pi} Y$ is isometrically isomorphic to the Banach space of $Y$-valued measures with the Radon-Nikodym property together with the variation norm.

$^{9}$ Note that $\mu(E) = 0$ is the zero of $V$ whilst $\lambda(E) = 0 \in \mathbb{R}$.
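The density construction (15) and the variation norm can be checked by hand on a toy example. The following is a minimal sketch, assuming a finite set $X$ with its power-set $\sigma$-algebra and $V = \mathbb{R}^2$ with the Euclidean norm; the names `vector_measure`, `set_partitions` and `variation_norm` are illustrative and do not come from the paper. In this finite setting the supremum over measurable partitions is a maximum over finitely many partitions, and it should agree with the $L_1$ norm $\int \|f\|_V \, d\lambda$ of the density, in the spirit of Theorems 27 and 28.

```python
import math

def euclid(v):
    """Euclidean norm on V = R^d (assumed choice of Banach space)."""
    return math.sqrt(sum(c * c for c in v))

def vector_measure(density, lam):
    """Finite analogue of (15): mu(A) = sum over x in A of density[x] * lam[x]."""
    dim = len(next(iter(density.values())))
    def mu(A):
        return tuple(sum(density[x][k] * lam[x] for x in A) for k in range(dim))
    return mu

def set_partitions(points):
    """Yield every partition of a list of points as a list of blocks."""
    if not points:
        yield []
        return
    first, rest = points[0], points[1:]
    for part in set_partitions(rest):
        # Either `first` forms its own block ...
        yield [[first]] + part
        # ... or it joins one of the existing blocks.
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]

def variation_norm(mu, points):
    """Maximum over all partitions B of X of the sum of ||mu(A)|| for A in B."""
    return max(sum(euclid(mu(block)) for block in partition)
               for partition in set_partitions(points))

# Hypothetical data: a scalar measure lam on X = {a, b, c} and an R^2-valued density.
lam = {"a": 0.5, "b": 0.25, "c": 0.25}
density = {"a": (1.0, 0.0), "b": (0.0, 2.0), "c": (-1.0, 0.0)}

mu = vector_measure(density, lam)
var = variation_norm(mu, list(lam))
l1 = sum(euclid(density[x]) * lam[x] for x in lam)  # integral of ||f|| d(lam)
```

Here the maximum is attained on the partition into singletons, since merging blocks can only decrease the sum by the triangle inequality; brute-force enumeration is used only because $X$ is tiny.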
